  1. May 2026
    1. On 2018-08-29 14:16:01, user Eryn McFarlane wrote:

      Dear Authors,

      We are a group of PhD students and postdocs at the University of Edinburgh who meet weekly to discuss life history papers. We noticed that the tone of our discussions could be a little negative, so, to counteract this, we decided that the most positive thing would be to review preprints and then share our reviews with the authors. We hope this both gives us experience as reviewers and provides feedback to researchers who have posted their manuscripts on bioRxiv.

      We hope that this review is useful to you, and will help to improve your paper. Please feel free to contact us if you have any questions or need any clarification.

      Best wishes,

      Eryn McFarlane<br /> eryn.mcfarlane@ed.ac.uk<br /> on behalf of UoE Life History Journal Club

      Major Comments on ‘Latitudinal variation in Life History Reveals a Reproductive Disadvantage in the Texas Horned Lizard (Phrynosoma cornutum)’

      This paper addresses some interesting questions about whether life history trade-offs are expected to vary according to latitude (and associated environmental variables), particularly in ectotherms, which are expected to be physiologically sensitive to heat gradients. Hughes et al. use museum samples from four different populations, collected over 52 years, to answer this question. Below, in no particular order, are our main suggestions to improve this manuscript.

      Eggs in the ovary vs. clutch size during a breeding season. We would like to see this metric verified against lizards that have laid a full clutch. In other systems, we are aware of 1) biological constraints that limit the number of visible eggs in the oviduct, meaning that as the visible eggs are laid, more eggs will be produced, and 2) the possibility for females to re-absorb eggs/fetuses they choose not to bring to maturity. For these reasons, we would want to see that the number of eggs measured here is correlated with the number of eggs a female would lay during her breeding season.

      Statistics: We think it would be appropriate to account for both the month (or day, if possible) and the year the animal was collected in all models. Given that animals were collected at different points in the breeding season, without accounting for date, it’s difficult to know if any differences are just due to a bias in when the animals were collected. This would be particularly true if there was a bias in when animals in each site were collected (e.g. were those in Kansas just sampled later in the year?). Similarly, the authors should account for year in their models. Given that climate change has been demonstrated to have affected phenology in a number of different species, we would expect that the window for laying may have changed as the study has gone on.<br /> There are a number of places where the authors don’t employ statistics and we believe they should. For example, the differences in breeding season between the study sites should be assessed statistically, rather than by eye, while accounting for sampling dates and years.<br /> From the description of the counties for sampling in each state, there appears to be a lot of variation in the lat/long that each animal was sampled at. Why not use the lat/long for each animal, rather than lumping them into three state-level populations? This would allow for a finer-scale assessment of the effects of latitude, particularly since New Mexico and Texas are much closer together than they are to Kansas. <br /> In some cases (e.g. for individual egg measurements), it seems that multiple eggs are taken from the same female. These eggs are not independent. This needs to be controlled for statistically, such as by using a random effect of female ID. Otherwise, there is pseudoreplication in the model.

      Red thread: We saw the interesting potential in this paper (it’s why we picked it!), but we found it quite hard to follow in some places. For example, the introduction could follow a more traditional structure of starting very broadly and narrowing down to your questions, while trying to keep each paragraph quite focussed. Similarly, we found it difficult to link the methods to the results in some places, as results are reported that were not clear from the methods. For example, we’re not sure where ‘Monthly activity’ comes from (i.e. are there analyses done here? What are the sample sizes for each month?). For age/body size at sexual maturity, we’re not sure how these were inferred at all, given that all samples were dead. We think that more clarity in the methods, and a one-to-one correspondence between methods and results, would help to make the story clearer.

      Collection description: We have a lot of questions about where your samples came from, and how they were collected and stored. It seems likely that there is a lot more data than has been reported here (e.g. exact coordinates, exact dates of sample collection). Were all of the samples collected the same way? How was that done? If they were, for example, all roadkill, how would that bias your results? How were they dissected? Were they all dissected by the same person?<br /> Similarly, we suspect that there is a lot more data out there that could help to answer some of the questions posed here. For example, is there NOAA data for each of the sites? This would help to pull apart differences due to latitude per se and differences due to temperature or precipitation. Such analyses would be really interesting, although perhaps too data-demanding. As it is now, without this information, it’s difficult to tell if the patterns reported are due to latitude, or something correlated with latitude.

    1. On 2018-07-31 05:50:17, user Guillaume Petit wrote:

      Dear Hayden Schmidt, Robin Betz, Ron Dror and Andrew Kruse,

      We found your article on bioRxiv, read it and discussed it at our research group journal club. Our comments are summarised below.

      Overall, we thought this manuscript was very interesting with new and original results, and everyone enjoyed reading it.

      In the introduction, perhaps it would be worth mentioning for the broader readership why the σ1 receptor is so named.

      For the crystallography and structural biology section: <br /> We appreciate how much hard work it takes to produce and study membrane proteins. Nevertheless, we thought that some of the data in Table 1 could be expanded and commented on. For example, the signal-to-noise ratio and half-set correlation coefficients are very low in the highest-resolution shell, and the B-factors are very high for all three structures. We recommend including R(pim) and Wilson B values as well as R(free) values for the high-resolution shells. Given the weak data, the structures appear to have been refined carefully: the geometry statistics show that they were tightly restrained. We recommend providing a comment in the results section on the quality of the raw data and another one in the discussion section on the implications of the data quality for the modelled structures. This would help readers and end-users who might not be familiar with crystallography statistics to use the coordinates with appropriate caution. We also wondered why 7 crystals were needed for the NE-100 complex structure but only one for each of the other two complexes. Could a comment be added to explain that? Additionally, we think Fig. 1d and 1e would benefit from being presented in stereo and perhaps with electron density displayed around the residues and ligands. In Fig. 2b, 2d and 2f of the supplementary data, the electron density quality suggests that the orientation of the ligand may have been difficult to determine. Can you comment on how the final orientation was selected and whether other orientations were trialled? Finally, it would be helpful to include a comment on the quality of these three liganded structures in comparison to the previously published σ1 receptor-ligand structure. Are they better? Worse? Similar?

      Kinetics analysis: More background information would be helpful to understand the kinetics and scintillation experiments. What formula was used for the two-step model? What is the relationship between Kd, kfast and kslow? In Table 2, the errors for each measurement and a description of how they were calculated need to be added. We also recommend that you spell out the N.D. abbreviation in the figure description. Finally, we suggest that you move the “Saturation binding in Sf9 membranes” and “Measurement of ligand dissociation in Sf9 membranes” sections from the online methods chapter to the supplementary data, since the corresponding results are also found there.

      Molecular dynamics: We recommend including a comment on the limitations of using a simulation of the monomer when the protein forms a trimer in in vitro experiments. Are there plans to simulate the trimer in the future? If so, how would that be done, or why can’t it be done now/easily? For the future it would be worth defining what mutations on helix alpha 4 could be done to confirm the simulation results. Are there other antagonist molecules that could be used to validate the results observed with (+)-pentazocine? What would be the next step to confirm the models?

      Small corrections: <br /> In the second sentence of the discussion section, “agonist” is used twice; we believe that one of these should probably be “antagonist”. <br /> In the online methods section: in the first sentence, a word is missing after the “and”. In the same sentence, we suggest changing the text to say that the receptor is expressed in Sf9 cells and then purified (not “expressed and purified in Sf9 cells”).

      Hopefully you will find these comments helpful. We wish you all the best for the publication of this interesting article!

      Kind regards

      The Martin Lab (https://www.griffith.edu.au...

      (Draft prepared by PhD student Guillaume Petit: guillaume.petit@griffithuni.edu.au, with contributions from all of the Martin Lab)

    1. On 2018-05-09 16:50:03, user Ben Tully wrote:

      Journal Club Preprint Review<br /> Parallel Evolution of Key Genomic Features and Cellular Bioenergetics Across the Marine Radiation of a Bacterial Phylum <br /> Reviewed by:<br /> Drs. Benjamin Tully, Michael Morando, Sarah Hu, Michael D. Lee; Graduate students Heidi Aronson, Gerid Ollison, Asa Conover; & One Anonymous reviewer

      Overall, the group thought the paper was interesting and well executed, and that the most powerful result was the evidence of convergent genomic evolution. The following comments are for the few areas where we had questions or thought the paper would be stronger with some clarification or adjustments. We have broken our commentary down into major and minor comments, with major comments structured around larger ideas and minor comments designed to ask specific questions brought up during the journal club.

      Major comments:<br /> Two main topics and recurring themes could have been more effectively communicated – specifically “streamlining” and “switching”. While these terms are fleshed out over the course of the manuscript (either in the main text or methods), there are portions of the manuscript where it is unclear what their significance is, or it is poorly constrained.

      “Streamlining”<br /> There was a prolonged discussion about the term ‘streamlining’ amongst the group. It was the understanding of the group that this term as generally defined (such as in the cited Giovannoni et al. 2014) includes concepts regarding how changes in closely related organisms towards lower GC, a reduction in pseudogenes and paralogs, high coding density and a possible reduction in genome size could be markers for “streamlining”. <br /> There was concern that the measures used to assess streamlining were not accurate. The metrics provided (intergenic spacer, N-ARSC, C-ARSC), while related to these features, do not address them directly. For example, in Ln. 89-92: “with the genomes of epipelagic Marinimicrobia containing signatures of streamlining such as lower % GC content and shorter intergenic regions”, it seems to us that this application of intergenic-region length is a little ambiguous without gene count and/or genome size accounted for, and that a better metric is coding percentage. The Giovannoni et al. 2014 paper mentions intergenic regions only in the context of ratios of non-coding to coding (“low ratios of intergenic spacer DNA to coding DNA”). Interpretation of this result would also be assisted by either including clarification in Figure 1B that “Size” is “Estimate Size” or providing the raw %completeness and %redundancy values.<br /> Ln 117-9: “…streamlined in all other aspects.” While there is some evidence suggesting this, we think that intergenic space length is inherently not the most accurate measure (as detailed above) and that N-ARSC and C-ARSC may suggest adaptation to environmental niche/nutrient regimes, which may be related to genomic streamlining but is not itself evidence of it.<br /> Another concern raised amongst the group was that our currently understood evolutionary mechanisms for streamlining involve either extremely large effective population sizes (such as with marine picocyanobacteria) or extremely small effective population sizes (such as with obligate endosymbionts). Is there evidence or data that can suggest which avenue may be driving streamlining in epipelagic clades?
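
      To illustrate why coding percentage and intergenic-spacer length can disagree, here is a minimal sketch. The gene coordinates and genome lengths are invented toy values, and the function names are hypothetical; this is not the manuscript's pipeline. Two toy genomes share the same mean spacer length but differ in coding percentage once genome size is taken into account.

```python
def merge_intervals(gene_coords):
    """Merge overlapping or adjacent gene intervals (1-based, inclusive)."""
    merged = []
    for start, end in sorted(gene_coords):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def coding_percentage(gene_coords, genome_length):
    """Percentage of the genome covered by genes, counting overlaps once."""
    covered = sum(end - start + 1 for start, end in merge_intervals(gene_coords))
    return 100.0 * covered / genome_length


def mean_intergenic_length(gene_coords):
    """Mean length of the gaps between consecutive (merged) genes."""
    merged = merge_intervals(gene_coords)
    gaps = [nxt[0] - prev[1] - 1 for prev, nxt in zip(merged, merged[1:])]
    return sum(gaps) / len(gaps) if gaps else 0.0


# Toy genomes (hypothetical values): identical spacers, different gene lengths.
genome_a = {"length": 1000, "genes": [(1, 450), (501, 950)]}
genome_b = {"length": 400, "genes": [(1, 150), (201, 350)]}

for name, g in (("A", genome_a), ("B", genome_b)):
    print(
        name,
        "mean spacer (bp):", mean_intergenic_length(g["genes"]),
        "coding %:", coding_percentage(g["genes"], g["length"]),
    )
```

      Both toy genomes have a 50 bp spacer between their genes, yet their coding percentages differ, which is the ambiguity raised above for incomplete genomes of unknown size.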

      “Switching”<br /> The term ‘switching’ was problematic for most of the group. The group interpreted this term in the sense of the example provided by Simon et al. 2017 (https://www.ncbi.nlm.nih.go... in regard to Roseobacter, with a specific example such as: a marine clade holds a subclade that diverged to freshwater, and within that freshwater clade there are members that “switched” back to a marine lifestyle. It wasn’t clear to us that there were specific examples provided here of this type of ‘switching’, as opposed to divergence from a common ancestor into new niches more than once. We feel the convergent genomic evolution is clear and solid, but we had a hard time teasing apart whether “switching” was meant how we were interpreting it, and then finding evidence for it. It was suggested that providing a detailed tree with loss/gain markers may assist in this, but it was also discussed that there may be issues due to the incomplete nature of the genomes. In this same vein, the term “parallel evolution” is used in the title, so we think you may want either to increase discussion of evidence of ‘switching’ and change the title, or to keep the title the same and use examples of convergent evolution.

      Minor comments:<br /> Genome quality. A cutoff of 40% completeness seems a bit low. How would a higher cutoff affect the number of genomes analyzed? While completeness is included in the table of genomes used, it may be useful to have a figure (suggested: histogram) showing it. There are several traits that are determined by presence/absence (proteorhodopsin) or based on the fraction of genes recovered (NDH) – does such a low cutoff affect these conclusions?

      Methodology questions that should be addressed:<br /> • What were the cutoffs used to determine if a genome was epipelagic vs mesopelagic? <br /> • How were polytomies on the genome tree reconciled with the TPM calculation? Wouldn’t genomes that appear identical on a genome tree of 120 markers competitively recruit the same reads? There was a discussion of selecting a representative of identical MAGs – was this applied evenly to all Tara originating MAGs? From TOBG (Tully), TARA.MAG (Delmont), and UBA (Parks)?<br /> • Explicitly stating how the heatmaps in Fig 1 and 2 were normalized would be helpful, i.e. by row, by column, or across the entire heatmap.

      NADH Respiratory Complexes. For the difference between canonical and noncanonical NDH – are these being visualized? As it stands, using a cutoff of 6 and 5 subunits, respectively, on incomplete genomes seems to imply that a genome can have many of the subunits still missing. Can you tell if nuoDEF is ‘missing’ without a visualization of the contig? If you are visualizing these contigs, it would be great to include a figure that displays this. Could nuoDEF be encoded in a different part of the genome and just not recovered? Is nuoA-M normally syntenic? It is tough to glean this information from the methods. For the presence of NQR, can organisms utilize the established sodium motive force without a sodium pumping ATPase? Can sodium pumping via NDH be a form of osmotic stabilization?

      Fraction of recruited reads. As it stands there is not enough information in Figure 1 to determine how abundance values/TPM/relative fraction of Marinimicrobia differ between the epipelagic and mesopelagic samples. It would be great to detail the scales of ‘Genome Abundance’ in the heatmap of Figure 1 and provide more details as to what is being displayed. Readers in the group came up with two completely valid ways of interpreting the data, one assuming raw relative abundance values, the other assuming normalized relative abundance values – this needs to be clarified. Also, ‘Genome Abundance’ should be explicitly defined as ‘Relative Fraction’ of recruited reads or TPM if that is what is being displayed.
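
      The two readings of ‘Genome Abundance’ can give different pictures for the same recruitment data. Here is a minimal sketch of the distinction, assuming a length-normalized, TPM-style calculation; the manuscript's exact formula is not known to us, and the MAG names, read counts and genome lengths are invented.

```python
def relative_fraction(read_counts):
    """Raw fraction of recruited reads assigned to each genome."""
    total = sum(read_counts.values())
    return {genome: n / total for genome, n in read_counts.items()}


def tpm(read_counts, lengths_bp):
    """TPM-style abundance: reads per kilobase, rescaled to sum to one million."""
    rate = {g: read_counts[g] / (lengths_bp[g] / 1000) for g in read_counts}
    scale = sum(rate.values())
    return {g: 1e6 * r / scale for g, r in rate.items()}


# Hypothetical example: equal read counts, different (estimated) genome sizes.
reads = {"MAG_A": 1000, "MAG_B": 1000}
lengths = {"MAG_A": 1_000_000, "MAG_B": 3_000_000}

print(relative_fraction(reads))  # both genomes recruit half of the reads
print(tpm(reads, lengths))       # the smaller genome scores higher after length normalization
```

      With equal read counts, the raw fractions are identical, while the length-normalized values differ threefold, so the heatmap's meaning depends on which quantity is plotted.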

      Are the 90 metagenome samples from Tara the same ones referenced in Delmont et al.? Is it just the bacterial/archaeal fraction? There are 100+ more Tara samples that could be analyzed, so some discussion as to why these were chosen over others would be helpful.

      Very minor grammatical issues:

      Line 52: recent work has

      Line 63 and elsewhere: “Marinimicrobial” with an “L” at the end is just an ad hoc adjective, it isn’t the proper noun/phylum, so shouldn’t be capitalized

      Line 73 got displaced in this version, probably not actually missing

      Line 102: Fig. 2B

      Line 119: “to be more”

      Line 121: “requires a particular coding potential, that” (I think a comma after “potential” helps this sentence)

    1. On 2018-04-19 13:36:27, user Joe Pomerening wrote:

      Great work -- I'll be sure to read the whole article word-for-word. Just wanted to add a comment: while I've been out of the academic game for some time, particularly the cell cycle field, I think investigators are going to need to accept that the boundaries between *all* phases are not nearly as discrete as we schematize and *want* to believe. In fact, both S- and M-phase entries are likely more fuzzy than one would imagine, though the outcomes are clearly quite all-or-none. The linking of post-M-phase progression is an incredibly interesting area, and I believe it is where the greatest potential for oncogenesis lies. A really nice story by García-Higuera's group showed that if Cdh1 is ablated, cells start replication way too soon and suffer from a shortage of dNTPs, showing that, indeed, timing is everything. Anyway, cheers on a nicely done story, good luck with the publication, and may the field be able to accept/comprehend that phase initiation/physiologic response to catalytic enzymes (CDKs, APCs, etc.) will not be as neatly delineated as crossing a line shown in a G1->S->G2->M schematic.

    1. On 2018-04-02 17:04:21, user Pat Schloss wrote:

      My research group reviewed the preprint by Hirten and colleagues as part of our journal club and prepared this collaborative review. None of us have been asked to provide a review of the manuscript for a journal and we do not know its status.

      This preprint aims to describe the outcomes of fecal microbiota transplant (FMT) as a treatment for recurrent Clostridium difficile infection (CDI) in patients with and without IBD. The authors have a specific focus on whether IBD patients receiving FMTs are more or less likely to respond to, or have complications arising from, the procedure as a treatment for recurrent CDI. Overall, we think the longitudinal clinical component of the study was particularly well-suited to address the questions laid forth by the investigators; however, the subsequent analysis and grammatical errors within the paper made it difficult to both follow the narrative and reach the same conclusions. As an example, while engraftment of microbial communities following FMT was stated as one of the secondary outcomes the authors were interested in, the data presented aren't sufficient for making any conclusions regarding colonization of donor-derived microbes. Likewise, there are several instances of missing information such as adequate background and justification in the introduction, experimental details in the materials and methods, and the requisite information to interpret the plots in the figure legends. We believe that while the research is worthwhile, the aforementioned issues significantly hinder any conclusions made in the manuscript and need to be addressed.

      General comments:

      1. In the study, they find that "23 out of 118 (19.5%) patients with follow up at 2 months and 31 out of 83 (37.3%) patients with follow up at 6 months suffered from recurrent CDI after the initial FMT." These failure rates make us wonder if the FMTs can even be considered successful. Given these high failure rates, we wonder how meaningful the results of this study are. The low success rates are addressed briefly in the discussion by saying "Many studies exclude subjects with severe CDI, a known predictor of CDI recurrence, which may explain the lower success rate of FMT observed in our cohort compared to others." This explanation, however, is somewhat contradicted when discussing predictors of short-term relapse (failed FMT), which included "severe CDI", suggesting that other studies must have also included severe cases in their cohorts. Some clarity regarding these points would be greatly appreciated.

      2. Although "microbiome engraftment" is a critical concept in this paper, it is never specifically defined or discussed in the context of current literature. Likewise, while engraftment is listed as a secondary outcome, they did not define what a successful engraftment would actually look like. The authors should expand upon the meaning of the phrase and they should also review CDI FMT literature, framing this study in terms of what has been seen previously.

      3. Throughout the manuscript, FMT is being used to refer to materials used for the transplant while it really means the process. For example, "FMT was obtained from healthy donors..." should be "Material for FMT was obtained...." This needs to be corrected in several instances.

      4. The Methods section needs additional details on how 16S rRNA gene sequences were processed. Additionally, specific details regarding software (parameters used, version, etc.) are absent but are required for proper interpretation of the analysis pipeline. The details that are provided in the Supplemental Material are inadequate and they really should be in the main body of the paper given their importance to the overall story.

      5. The last section within the results where the functional analysis of the patient microbiomes is described doesn't warrant its own subsection. The information contained is too broad and disjointed as it's presented and either needs to be expanded on, included elsewhere, or removed altogether. While interesting, it doesn't appear to contribute anything to the main narrative as defined by the primary and secondary outcomes laid out in the introduction.

      6. The figure legends were missing almost all information required to interpret the plots and, in some cases, the labels provided were even in the wrong order. Specific methodological and statistical details need to be stated for each panel or group of panels.

      7. The manuscript needs thorough editing for language and grammar. There are multiple places where it is unclear what pronouns refer to, mixtures of tenses, and confusing sentence structure.

      8. For the microbiome analysis, the study included a 9 vs. 9 study design but there isn't any indication whether that would be sufficient to detect the effects of interest. Power calculations need to be provided indicating what effect sizes the study design would allow the investigators to detect for their primary research questions.

      9. If the authors used phylogenetic methods, how did they get OTUs for the Random Forest approach? If they generated OTUs, then why not also analyze OTU-based metrics of alpha- and beta-diversity? The reference-based UniFrac methods that the authors likely used are strongly biased by what is in the database and are known to have numerous problems relative to de novo clustering methods.

      10. Comparison of the microbiota over time should be done with paired tests using each subject as their own control.

      11. In the text relating to Figure 2A, the authors use "bacterial alpha diversity", but the y-axis of the figure says "phylogenetic diversity" while an identical plot in Figure 2C uses "alpha diversity" for the y-axis. In addition to this, the y-axes in these plots should start at zero, not 5 or 2.5. The same is true for Figure 3A.
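
      For comment 8, a rough sense of what a 9 vs. 9 design can detect can be sketched as follows. This assumes a two-sided, two-sample comparison of a single continuous outcome and uses a normal approximation, which is optimistic at such small sample sizes. The function name and the default alpha and power are our own illustrative choices, not values taken from the manuscript.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_effect(n_per_group, alpha=0.05, power=0.80):
    """Smallest standardized mean difference (Cohen's d) detectable with equal
    group sizes in a two-sided, two-sample comparison (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile corresponding to the target power
    return (z_alpha + z_power) * sqrt(2 / n_per_group)


d = min_detectable_effect(9)
print(f"Approximate minimum detectable effect with 9 per group: d = {d:.2f}")
```

      Under these assumptions, only differences well over one standard deviation would be reliably detected, which is why an explicit power calculation from the authors would help readers judge the null results.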

      Further Questions:

      1. Did the 18 people subjected to the microbiome analyses have a difference in recurrence rates?

      2. What percent of the IBD patients went into remission?

      3. Was there a relationship between donor and the requirement for escalation of medication (IBDe) vs. the stable group (IBDs)?

      Specific Comments (P = page, p = paragraph):

      1. P4p2: "reducing bacterial diversity and the abundance of Bacteroidetes and Firmicutes phyla"

      2. P4p2: The authors reference an "aberrant microbiome with a donor-like microbiome," however, the supporting data are mixed and don't present a clear conclusion. The aberrant microbiota are supplemented with microbiota from a donor, but it doesn't consistently take on the structure of the donor.

      3. P4p2: When stating, "frequent use of concomitant immunosuppressive agents" we assumed that this is in reference to IBD and not CDI treatment, but the writing could be made clearer.

    1. On 2018-03-27 17:35:43, user Haley Dylewski wrote:

      Hello! We are graduate students at the University of Tennessee reading bioRxiv submissions and we enjoyed your paper! We have compiled our thoughts into a review and hope you find it helpful!

      The authors hypothesize that initial TLR4 expression and concentration changes dictate how sepsis syndrome is initiated. A model consisting of three ODEs was constructed to simulate initial TLR4 flux between relevant compartments of the cell. The three ODEs describe three regions in phase space, each representing a unique cellular compartment relevant to TLR4 activation. For the most part the authors do a good job explaining the physiological basis of their model as well as discussing their assumptions and reasoning.<br /> The paper starts off strong, providing detailed rationale for the study and a strong biological background for the process described. The description of the model and the biological processes it represents seem sound; however, the paper becomes weaker after the model is presented. The authors do not discuss the significance of the model’s outputs in a meaningful way: What information do the phase portraits offer about the system dynamics? What can be interpreted biologically and how exactly was the model applied to clinical situations? <br /> Overall, the authors seem to be addressing an impactful problem and I encourage them to pursue this work further.

      Major Comments:

      1. Alternative models were not considered in this paper. In the application of the model, the authors ultimately relate mRNA production rate to patient outcomes. Such a relationship seems like it could be represented by a linear regression; I think it is important that the authors compare their model to a simpler one to justify using their model and ultimately make their paper stronger.

      2. Though the model design is based on biology, the application of the output is not adequately addressed. The authors predict patient outcomes using the model but do not specify how they do so (it seems that the signal resolution of patient data was compared to the simulated systems). What is the rationale for choosing phase portraits as the output? Does the model provide insight into the system or is its purpose simply to be a predictor of patient outcomes?

      3. Some of the Greek letters are missing from the text. This may be due to the format of bioRxiv but makes some discussion unclear. We would suggest checking the produced PDF when approving the manuscript on bioRxiv.

      4. Phi is the crux of this model; however, the variable is not sound:

      a. There are no units associated with it. Without units it is hard to interpret what the variable means. <br /> b. The use of phi in the model is not adequately justified. Phi represents the rate at which tlr4 mRNA is produced. This parameter appears to be treated functionally as the TLR4 flux entering the system as a result of LPS stimulation. The authors did not clearly explain the simplifications assumed in this relationship. Eukaryotic protein production is complex; is the rate of mRNA production representative of the flux in this system? Or is this variable chosen for convenience? There needs to be some kind of mathematical relationship linking the rate of mRNA production to the rate of protein production; otherwise, justification for omitting such a critical relationship should be stated.<br /> c. Three different response systems are modeled, representing varying levels of LPS stimulation. The parameter differentiating these conditions is phi; however, how the values were chosen is not clear to me. How was the cutoff for each response determined? It is said that phi “has been determined experimentally”. There is a reference for the measurement but, because of the importance of this parameter, there should perhaps be a short explanation of how the measurement was done.

      5. Similar to 4-a, none of the parameters have units. If the model was made dimensionless, this needs to be clearly illustrated. This is critical for physiological interpretation and reproducibility.

      6. The values for the parameters are said to have been “varied until a stable limit cycle was attained”. What does this mean? And does this method yield parameter values that adequately describe the physical system? It would be good to provide a comparison between the experimentally quantified parameters and the parameters chosen based on the stable limit cycle.

      7. At the bottom of page 5, it is mentioned that the model leaves out additional populations of TLR4 in the cell. The reason for omitting this from the model should be discussed.

      8. Is it possible that the model predictions would become negative? It appears that in eq. 1 yz are subtracted independently of x - is that well justified?

      Minor Comments:<br /> 1. The Supplemental data file does not contain the Tables referenced. It does contain access to the actual model code which is helpful.

    1. On 2018-03-23 18:51:44, user Pat Schloss wrote:

      The manuscript by Contijoch and colleagues presents a very intriguing collection of experiments that evaluate the variation in DNA density within the fecal material of sixteen mammalian species. I am excited about this work because it highlights that microbial density may be a confounding variable in microbiome studies. Although it would be difficult to ascertain by this method, I have wondered whether how much of a disease like cystic fibrosis is driven by bacterial density rather than a specific set of pathogens. Although I think this is an important contribution to the field, I have several concerns about the methods and the interpretation of the data. Needless to say, the experiments and analysis have really piqued my curiosity.

      1. To throw a wet blanket on the analysis, I could argue that the differences and changes in density are of questionable biological significance. If we assume that DNA density is a proxy for bacterial density, then the differences the authors see are much less than a single log in density. Although the numbers are a bit questionable, if we assume that typical feces has 10^12 bacteria per gram, a change of 10-50% would still leave a significant amount of microbial biomass. It would be interesting to get the authors' thoughts on their data from a cell count perspective rather than the more abstract DNA density. I am curious what they think is a biologically meaningful difference in density.

      2. The authors appear to assume that fecal DNA is coming from living organisms. However, previous studies have indicated that a minority of bacteria in feces are from intact cells. Ben-Amor and colleagues found that 49% of cells were intact, 19% were injured or damaged cells, and 32% were dead (doi: 10.1128/AEM.71.8.4679-4689.2005). This is important because it would impact their transfer experiments and it may be possible that different mammalian species have different fractions of live/dead bacteria in their feces. In spite of this potential confounder, it is still interesting that the DNA load varies so much across species and between individuals of the same species.

      3. The authors did not comment on the fact that some mammalian species vary widely in their DNA density. The density in rats, pigs, and mice varies more than their median density. It would be nice to know whether all individuals within the species consumed the same diet or whether there were other differences that may account for intra-species variation.

      4. I fear the authors may have over interpreted the meaning of the variation in carrying capacity. They imply that the carrying capacity is an intrinsic variable for each species. I wonder if it isn't also a product of the taxa within the sample. Some taxa may have "sharp elbows" and exclude other taxa and their density. It would be interesting to see something like a Mantel test relating the difference in density to the difference in beta-diversity within each species. Is density related to community structure?

      5. Throughout the manuscript, the authors describe differences in "community fitness". The authors need to provide a better definition of fitness in this context. I am unclear whether it's a measure of the ability of a transferred community to have the same carrying capacity as the host (or the donor) or whether it's a measure of something else. Regardless, I worry that if it is tied to the ability to colonize a germ free animal of a different species that it is a poor metric. Again, we know that much of the DNA in feces is from dead bacteria, there is host-dependent selection for what type of microbes can live in an environment, and a one-time gavage of microbiota is unlikely to enable taxa that are part of a climax community to colonize. It's a bit too artificial of a measure of fitness with a bunch of troubling caveats.

      6. The authors show that differences in density relate to differences in gene expression and host response. Unfortunately, there is no commentary on what the genes they identified were and whether there are plausible relationships between differences in microbiota and their density and host response. A fear is that such analyses are prone to false positives - even with correction for multiple comparisons. Given that the authors used host tissue from the cecum and the rest of the gastrointestinal tract for these analyses, it would be nice to see confirmation that the DNA density in the feces correlated with its density in other locations.

      7. Throughout the manuscript the authors measure density as the "ug of DNA per mg feces". It is unclear to me whether this was based on the wet or dry weight of the fecal material. I would argue that it should be on a dry basis in all of the analyses to control for differences in stool consistency. I know the authors have presented evidence that there's no correlation between density and water content, but it would be interesting to see the authors correct their data for moisture content. For example, is the variation in mouse microbiota density partly attributed to variation in water content? The colors in 1C do not allow me to easily discriminate between the 16 species, but it appears that there is more variation in moisture content than density for the mouse samples and several other species' samples as well.
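The Mantel test suggested in comment 4 above could be sketched roughly as follows. This is a minimal permutation version using only the Python standard library; the density values and the beta-diversity matrix are hypothetical placeholders, not data from the manuscript, and the function names are illustrative.

```python
import math
import random


def upper_triangle(matrix, order):
    """Flatten the upper triangle of a square matrix after reordering samples."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mantel(dist_a, dist_b, permutations=999, seed=0):
    """One-sided Mantel test: correlate two distance matrices and assess
    significance by permuting the sample labels of the second matrix."""
    rng = random.Random(seed)
    identity = list(range(len(dist_a)))
    flat_a = upper_triangle(dist_a, identity)
    observed = pearson(flat_a, upper_triangle(dist_b, identity))
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)
        if pearson(flat_a, upper_triangle(dist_b, order)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical example: five samples from one host species.
density = [0.8, 1.1, 1.9, 2.4, 3.0]  # assumed DNA densities, arbitrary units
density_dist = [[abs(a - b) for b in density] for a in density]
beta_dist = [  # assumed pairwise beta-diversity distances
    [0.00, 0.20, 0.45, 0.60, 0.70],
    [0.20, 0.00, 0.35, 0.50, 0.65],
    [0.45, 0.35, 0.00, 0.25, 0.40],
    [0.60, 0.50, 0.25, 0.00, 0.30],
    [0.70, 0.65, 0.40, 0.30, 0.00],
]
r, p = mantel(density_dist, beta_dist)
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

A large positive r with a small p would suggest that samples with similar density also have similar community structure; with real data the test would be run separately within each host species.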

      Other comments...

      1. L45 "rDNA" should be "rRNA gene" throughout the manuscript.

      2. L65 "sixteen different mammals". Should be "sixteen mammal species"

      3. Throughout the authors use the mean and SEM to report their results. These data do not appear to be normally distributed. I would find the results more compelling if they presented the median and interquartile range.
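To illustrate the last point, here is a small sketch of how the two summaries can diverge for skewed data. The values are hypothetical, not taken from the manuscript, and the helper names are illustrative.

```python
import math
import statistics


def mean_sem(values):
    """Mean and standard error of the mean."""
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return statistics.mean(values), sem


def median_iqr(values):
    """Median and interquartile range as a (q1, q3) pair."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return statistics.median(values), (q1, q3)


# Hypothetical right-skewed densities: one high sample pulls the mean upward.
values = [0.2, 0.3, 0.3, 0.4, 0.5, 0.6, 3.5]
print("mean, SEM:", mean_sem(values))
print("median, IQR:", median_iqr(values))
```

With a single extreme sample, the mean moves well above the bulk of the data while the median and quartiles stay put, which is why the latter are often preferred for non-normal data.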

    1. On 2018-03-18 13:46:56, user Aurelien ROUX wrote:

      I have read this paper with a critical mind. I provide here a list of concerns about this manuscript, based on previous publications using similar techniques, of many of which I am a co-author. Also, for the sake of transparency, I am in direct competition with this study, as colleagues in my group have obtained opposite results (no fission and no force change with the same set of proteins) with similar tube pulling assays. However, I think that some of the claims made by this study are sufficiently incompatible with the known mechanics of membranes to warrant comment, so that the reader can form an educated opinion with the help of a detailed comment. I provide here only a list of concerns, and I do not comment on the quality of the work, besides the points raised below. As such, this is not a formal review, and thus does not aim at proposing experiments or controls to improve the quality of the work.<br /> There are several main problems with this study:<br /> The force increase: <br /> One of the important claims of the authors is the change in tube force (force needed to hold the tube) upon local protein binding. Force increase or decay as a response of protein binding to membranes has already been observed for dynamin (Roux et al., 2010), BAR proteins (Prevost et al., 2015; Simunovic et al., 2016; Singh et al., 2012; Sorre et al., 2012; Wu and Baumgart, 2014), Epsin (Capraro et al., 2010), clathrin (Saleem et al., 2015) and the ESCRT-III protein Snf7 (Chiaruttini et al., 2015). Using the tube pulling assay described in this study, the tube force always drops when a protein binds specifically to the tube (curvature sensor), because the protein stabilizes the tube shape. A force increase is seen in cases where the protein binds only onto the GUV, as is the case for Snf7, because it increases membrane tension, which has a direct effect on the tube force (tube force = F = 2π√(2κσ), where σ is the tension and κ the bending rigidity (Chiaruttini et al., 2015)). 
Cases where protein binds to both GUV and tube are more complex to describe, but usually lead to tube force drop because the stabilizing effect of the protein overcomes the increased tension of proteins binding to the GUV (Saleem et al., 2015; Sorre et al., 2012).

      1-The authors claim that they see an increase of tube force, correlated with a local polymerization of ESCRT-III proteins into the tube. First, a polymerization of ESCRT-III in the tube, based on the above-mentioned studies, is expected to drop the force, not to increase it, as the ESCRT scaffold should stabilize the tube shape. The most likely explanation for this force increase is that the ESCRT proteins polymerize onto the GUV, increasing its tension, as seen in Chiaruttini et al. Cell 2015 (Fig 5G-H). Also, the authors have a clear dependence of the force increase on the ATPase activity of Vps4 (fig 2), which was shown to activate turnover and polymerization of large ESCRT-III assemblies (Mierzwa et al., 2017), reinforcing the idea that the force is arising from the polymerization of ESCRT onto the GUV, and not on the tube. Unfortunately, the poor quality of the confocal images provided in fig 4A-F (see following comments below) does not allow one to see whether ESCRT proteins polymerize onto the GUV membrane. In fig 4I, the authors plot the fluorescence intensity of the Snf7 at the GUV membrane as a function of time, but from the images shown in 4C and 4F, it is impossible to distinguish the membrane-bound pool from the bulk solution in the GUV lumen. <br /> Vps2/Vps24 have an inhibitory function on Snf7 polymerization that can be relieved by the addition of Vps4 and ATP (Mierzwa et al. NCB (2017)). Thus, a possible mechanism to explain the authors’ data is that upon ATP uncaging, Vps4 removes the inhibition of Vps2/Vps24 and promotes polymerization of the ESCRT-III proteins onto the GUV. This increases the tension, and the tube force.

      2-Second, when a protein partially polymerizes onto the membrane tube it cannot change the force, because the membrane is fluid and cannot act as a force transmitter (Roux et al., 2010; Simunovic et al., 2016). The protein thus acts as a swimmer in the swimming pool, moving the lipids around rather than applying forces on the membrane boundaries. Thus, it is very unlikely that the force increase measured by the authors is generated by the ESCRT-III punctae along the tube.

      3-The force values. The authors claim that the force generated by the ESCRT-III polymerization is responsible for fission. However, when they have 2uM of proteins, they have a high force value (max 65pN, Fig 2), which should be more effective for fission, but observe no fission. On the contrary, at a lower ESCRT-III concentration, 200nM, they have a much lower force increase (below 20pN, Fig 3D), which should be much less effective in promoting fission, and they observe fission in 50% of the cases. Knowing the force and the bending rigidity of POPC membranes used in this study (10 kT, see (Marsh, 2006)), one can calculate the resulting membrane tension at the highest tube force of 65 pN using the formula F = 2π√(2κσ). It gives values of tension on the order of mN/m, which are in the range of lysis tension values (usually 1-10 mN/m). It is thus expected that at the highest force (with 2uM proteins), the authors should see fission, just because they should break the membrane by reaching lysis tension.
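The tension estimate in point 3 can be reproduced with a short calculation. This is a sketch that inverts F = 2π√(2κσ) to σ = F²/(8π²κ); the temperature of 298 K is an assumption (the comment does not state it), and the function name is illustrative.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
TEMPERATURE = 298.0  # assumed room temperature in K (not stated in the comment)


def tension_from_tube_force(force_newton, kappa_joule):
    """Invert F = 2*pi*sqrt(2*kappa*sigma) to get the membrane tension sigma in N/m."""
    return force_newton ** 2 / (8 * math.pi ** 2 * kappa_joule)


kappa = 10 * K_B * TEMPERATURE  # bending rigidity of 10 kT, as quoted for POPC
for force_pn in (20, 65):
    sigma = tension_from_tube_force(force_pn * 1e-12, kappa)
    print(f"F = {force_pn} pN -> sigma = {sigma * 1e3:.2f} mN/m")
```

Under these assumptions, 65 pN corresponds to roughly 1.3 mN/m, at the lower end of the quoted 1-10 mN/m lysis range, while 20 pN gives a tension about ten times smaller.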

      In conclusion to the force comments, the authors do not provide a mechanism (or model) to explain how local polymerization of the ESCRT-III complex could generate a force increase in the tube. Nor do they provide an explanation of how the force increase could participate in fission. In particular, the increase in tube force suggests the force is applied along the tube axis, whereas fission requires forces perpendicular to the tube axis (constriction), which normally do not affect the tube force. For instance, in fission events mediated by punctual clusters of dynamin along the tube, no force change is observed prior to fission (Morlot et al., 2012).

      The fission rate, efficiency and localization.

      The time of fission is somewhat surprising. All events reported in this paper apparently take more than 150 seconds, and some are very long, about 600 seconds. About 50% of the tubes do not break after 1000s. In comparison (numbers taken from (Morlot et al., 2012)), the average time for dynamin-mediated fission is below 10s if the GTP concentration is above 150uM, with 100% efficiency. The dynamin fission time is still below 100s, with more than 65% efficiency, if the GTP concentration is only 5uM. Thus, if the events reported in this manuscript are really fission events (see comments 2 below), the rate is extremely low and the reaction inefficient. This may indicate a missing factor. <br /> An inherent limitation of using the force trace to follow fission of the tube is that an instantaneous force drop can be due to detachment of the membrane tube from the bead, and not to fission. Because the force is rising in this manuscript, increasing the probability of detachment of the membrane from the bead, and the time for fission is above 100s, the probability is high that the events shown in figure 3 are tube detachments and not fission. Of course, the fact that the authors do not observe such events with 2uM proteins, while having higher force, supports the idea that the membrane/bead link is solid, but from experience, the solidity of such a link varies a lot from experiment to experiment.<br /> Thus, to fully validate fission, it is required to show time-lapse imaging that correlates the position of the protein coat with the fission event, identified by a clear break in the tube with visible stumps after fission, one attached to the bead (Morlot et al., 2012; Simunovic et al., 2017). In this study, the fission event is not clearly identifiable on the images presented in Fig 4F. Instead of a clear cut in the membrane channel, one can see the tube fading away rather than breaking. 
This is consistent with the hypothesis that the force increases because of a rise in membrane tension, which would progressively reduce the tube radius. <br /> But the most awkward images are those of the punctual localization of the ESCRT-III protein coat in the tube (fig 4F). It consists of a single pixel, certainly very bright (see arrows in images 49.10s to 50.95s in fig 4F). One would expect that if a diffraction-limited spot of ESCRT-III protein was seen, it would cover at least several pixels. Moreover, images and movies look like they have been time averaged (see below), which could artificially increase the apparent lifetime of a punctum present in the bulk solution that coincidentally localized with the tube on a single image. <br /> For instance, movie S2 shows some unexpected noise correlations between frames (compare noise around the time stamp in images of the Vps2 channel between images at time -1.96s and -2.33).

      This strongly suggests that the movies have been time interpolated. This may not be a problem if it is clearly stated and provided that the quantifications and conclusions are not affected by the interpolation. But the time averaging is not stated in the text nor in the Mat&Meth. If time averaging has been performed, this is clearly a problem for the interpretation of Fig 4F: the single pixel interpreted as localized polymerization of the ESCRT-III proteins and visible on multiple frames could be in fact a noise pixel present on a single image and time averaged.

      There are also unanswered questions, which affect the reliability of the study.

      1-The group of Jim Hurley published two papers (Wollert and Hurley, 2010; Wollert et al., 2009) in which the main claim was that fission was Vps2/Vps24 dependent while it was Vps4 independent, and that Vps4+ATP was only needed for recycling. The only sentence mentioning the discrepancy between their previous studies and their new findings is line 56-59: “Early attempts at in vitro reconstitution of ESCRT-mediated budding and scission using giant unilamellar vesicles (GUVs) suggested that the process was independent of Vps4 and ATP (21,22), except for the final post-scission recycling step.” The authors must provide the fundamental differences between the two assays that explain this discrepancy.<br /> 2-Why is the reaction blocked at higher protein concentration?<br /> 3-What’s the cATP concentration? Is it constant in all experiments? How much is uncaged during UV illumination?

      Minor points:<br /> -While I understand the interpretation that instantaneous intensity drops correlate with fission of the tube in figure H, J, L (intensities measured on the tube), I do not understand what the instantaneous force drops observed in figure I, K and M (measured on the GUVs) correlate with, as no fission or explosion of the GUVs is expected.<br /> -Line 119: ”Snf7 intensity in the GUV, however, is essentially unchanged (Fig. 4I).” The quality of the images presented is too low (saturation is too high) to support this statement.<br /> -Line 124: ”We can rule out such a bulk stiffening mechanism in our system given the lack of recruitment of Snf7 to the GUV membrane and the lack of correlation between GUV Snf7 intensity and force generation.” As said above, it is impossible to see whether ESCRT-III proteins are binding or not to the GUV on the images presented in this manuscript, and how and where the fluorescence was measured on the GUV remains unclear in the text and Mat&Meth.<br /> -Line 156: “We also found that scission by ESCRT III and Vps4 can occur mid tube” this statement is difficult to verify because the quality of images presented is too low to conclude anything.

      REFERENCES:<br /> Capraro, B.R., Yoon, Y., Cho, W., and Baumgart, T. (2010). Curvature sensing by the epsin N-terminal homology domain measured on cylindrical lipid membrane tethers. Journal of the American Chemical Society 132, 1200-1201.<br /> Chiaruttini, N., Redondo-Morata, L., Colom, A., Humbert, F., Lenz, M., Scheuring, S., and Roux, A. (2015). Relaxation of Loaded ESCRT-III Spiral Springs Drives Membrane Deformation. Cell 163, 866-879.<br /> Marsh, D. (2006). Elastic curvature constants of lipid monolayers and bilayers. Chem Phys Lipids 144, 146-159.<br /> Mierzwa, B.E., Chiaruttini, N., Redondo-Morata, L., von Filseck, J.M., Konig, J., Larios, J., Poser, I., Muller-Reichert, T., Scheuring, S., Roux, A., et al. (2017). Dynamic subunit turnover in ESCRT-III assemblies is regulated by Vps4 to mediate membrane remodelling during cytokinesis. Nat Cell Biol 19, 787-798.<br /> Morlot, S., Galli, V., Klein, M., Chiaruttini, N., Manzi, J., Humbert, F., Dinis, L., Lenz, M., Cappello, G., and Roux, A. (2012). Membrane shape at the edge of the dynamin helix sets location and duration of the fission reaction. Cell 151, 619-629.<br /> Prevost, C., Zhao, H., Manzi, J., Lemichez, E., Lappalainen, P., Callan-Jones, A., and Bassereau, P. (2015). IRSp53 senses negative membrane curvature and phase separates along membrane tubules. Nat Commun 6, 8529.<br /> Roux, A., Koster, G., Lenz, M., Sorre, B., Manneville, J.B., Nassoy, P., and Bassereau, P. (2010). Membrane curvature controls dynamin polymerization. Proc Natl Acad Sci U S A 107, 4141-4146.<br /> Saleem, M., Morlot, S., Hohendahl, A., Manzi, J., Lenz, M., and Roux, A. (2015). A balance between membrane elasticity and polymerization energy sets the shape of spherical clathrin coats. Nat Commun 6, 6249.<br /> Simunovic, M., Evergren, E., Golushko, I., Prevost, C., Renard, H.F., Johannes, L., McMahon, H.T., Lorman, V., Voth, G.A., and Bassereau, P. (2016). 
How curvature-generating proteins build scaffolds on membrane nanotubes. Proc Natl Acad Sci U S A 113, 11226-11231.<br /> Simunovic, M., Manneville, J.B., Renard, H.F., Evergren, E., Raghunathan, K., Bhatia, D., Kenworthy, A.K., Voth, G.A., Prost, J., McMahon, H.T., et al. (2017). Friction Mediates Scission of Tubular Membranes Scaffolded by BAR Proteins. Cell 170, 172-184 e111.<br /> Singh, P., Mahata, P., Baumgart, T., and Das, S.L. (2012). Curvature sorting of proteins on a cylindrical lipid membrane tether connected to a reservoir. Phys Rev E Stat Nonlin Soft Matter Phys 85, 051906.<br /> Sorre, B., Callan-Jones, A., Manzi, J., Goud, B., Prost, J., Bassereau, P., and Roux, A. (2012). Nature of curvature coupling of amphiphysin with membranes depends on its bound density. Proc Natl Acad Sci U S A 109, 173-178.<br /> Wollert, T., and Hurley, J.H. (2010). Molecular mechanism of multivesicular body biogenesis by ESCRT complexes. Nature 464, 864-869.<br /> Wollert, T., Wunder, C., Lippincott-Schwartz, J., and Hurley, J.H. (2009). Membrane scission by the ESCRT-III complex. Nature 458, 172-177.<br /> Wu, T., and Baumgart, T. (2014). BIN1 membrane curvature sensing and generation show autoinhibition regulated by downstream ligands and PI(4,5)P2. Biochemistry 53, 7297-7309.

    1. On 2018-02-14 16:38:34, user Pat Schloss wrote:

      I was excited to see the preprint from Calus and colleagues describing NanoAmpli-Seq. This is a method of sequencing long amplicons using the Oxford Nanopore sequencing platform. For my set of applications within microbial ecology, this exciting sequencing platform still appears to be a method in search of an application. This preprint lays out an improved method of sequencing full-length 16S rRNA genes. This is an important issue because (as they note) the number of full-length sequences going into our reference databases is slowing and is unlikely to be representative of the diversity we are now seeing in surveys using MiSeq to sequence fragments of the 16S rRNA gene. Further, we'd really like to have longer reads for improved classifications. Reading the Introduction one will see that my previous work developing methods for sequencing 16S rRNA genes using the MiSeq and PacBio figures prominently in their motivation. It should also be noted that I do not know the current status of this manuscript and have not been invited to review it for a journal.

      The authors do an admirable job of tempering expectations and pointing out that the sequence quality is still not to the level that we find on other platforms. The authors mention that they get a sequencing accuracy of 99.5%, in contrast to the 99.98% accuracy we see with the other methods. In some ways this manuscript reads like, "We've done our best to solve the error rate problem, here's where we are, take it from here." These types of "landmark" papers are important, but I can't help but think of things to try. Perhaps other approaches were attempted (they mention three INC-Seq aligners), but they don't seem to be mentioned and there is not an extensive description of any parameter sweep tests.

      I think it would be helpful if the authors could improve their legend for Figure 3 - this is the critical figure for describing the method. The authors should note that the A, B, C of the legend seem to correspond with the three shaded boxes, not the A, B, C, ... J within those boxes. The method appears to run the output for the Nanopore sequencer through the INC-Seq software and use that as the starting point for their flow with chopSeq. My understanding of the first step in D is to re-orient the reads and trim the reads to start and end with the correct primers. They then remove the tandem repeats. Instead, I wonder why the authors didn't start over with the INC-Seq software to make a better assembler that is aware of the primers and other issues from sequencing 16S rRNA genes. In our development of the PacBio pipeline, the creation of the consensus sequence made the biggest impact. As PacBio improved their assembler, the data quality became far better than anything we could do. If they did this, the authors could calculate better quality scores and assess an aggregate consensus sequence quality score that could be used to filter the consensus sequences.

      On P14 they state, "This suggests that consensus sequence accuracy is reliably high only for OTUs where a minimum of 50 reads are available for use in constructing the consensus sequence" and on the next page that they used a three concatemer threshold set for INC-Seq. Given the ability to generate massively long reads on the Nanopore, why not run the sequencer longer to sequence more concatemers? Also, what happens to the error rate when the authors require more concatemers? Again, the PacBio aggregate quality score for a consensus sequence is linked to the level of coverage. I'm wondering if such information could be obtained either from the INC-Seq software or from making their own version of the assembler.

      As mentioned above, I found the overall description of the bioinformatics methods to be jargony and a bit glossy on details. First, I was a bit confused by the authors' description of why they partitioned the consensus reads into thirds for the nanoClust step. I'm also not clear how this would work - did they cluster the three partitions separately and then bring them back together somehow? Second, they removed singletons, which probably deflates their error rate relative to my reported PacBio error rates. I know that this is contentious, but I think that removing singletons from a 'real' sample would be pretty risky and likely to create a bias against rare organisms in poorly sequenced samples. Third, I wonder why the authors didn't align the sequences prior to getting a consensus sequence using something like a NAST-based profile alignment. They could then cluster similar sequences together using something like oligotyping or our pre-clustering method. This should be considerably faster (and I suspect more robust) than VSEARCH followed by MAFFT.

      Another problem that the authors do not mention is the possible biased abundances generated by RCA. They assume that RCA followed by fragmentation and debranching would yield the same number of fragments per piece of circularized DNA. I don't know that this is true. I wonder whether random barcodes could be added to the PCR primers so that when the fragments are amplified, circularized, fragmented, and sequenced, it would be possible to know which fragments came from the same RCA reaction. That way each RCA reaction could only be used once in downstream analyses.
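The random-barcode idea could be sketched as follows. This is only an illustration of the counting logic: the input format (pairs of barcode and assigned taxon) and the function name are hypothetical, and it assumes the barcode has already been extracted from each consensus read.

```python
from collections import defaultdict


def collapse_by_barcode(reads):
    """Count each random barcode once so that an over-amplified RCA product
    does not inflate the abundance of its taxon.

    reads: iterable of (barcode, taxon) pairs (hypothetical input format).
    """
    taxa_per_barcode = defaultdict(set)
    for barcode, taxon in reads:
        taxa_per_barcode[barcode].add(taxon)

    counts = defaultdict(int)
    for taxa in taxa_per_barcode.values():
        # Skip barcodes linked to more than one taxon (collisions or chimeras).
        if len(taxa) == 1:
            counts[next(iter(taxa))] += 1
    return dict(counts)


# Hypothetical reads: barcode AAC was sequenced three times but counts once.
reads = [
    ("AAC", "Bacteroides"),
    ("AAC", "Bacteroides"),
    ("AAC", "Bacteroides"),
    ("GTT", "Escherichia"),
    ("CCA", "Bacteroides"),
]
print(collapse_by_barcode(reads))
```

Comparing these collapsed counts with the raw read counts per taxon would give a direct estimate of how uneven the RCA yield is.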

      I wonder whether the authors included chimeric sequences when calculating their error rates. Chimeras are not sequencing errors and should not be included in calculating the error rates. This may help to reduce their error rate a bit.

      The authors are to be commended for providing their detailed methods as supplemental materials, this is excellent. One thing we learned from publishing our Kozich methods was that in addition to this, it would be great to provide a link to a GitHub page that has the "live" version of the method with any recent updates they've made to the protocol. We have a GitHub page for ours now, but wish we would have included the link to the page in our paper since the one in the supplement is now quite out of date.

      Some smaller points...

      1. There are other methods beside PacBio for generating full length 16S rRNA gene sequences using HiSeq. Perhaps it would be worth mentioning these in passing? They cite using EMIRGE to extract 16S rRNA gene sequences from metagenomic libraries, but it has also been used by stitching together short amplicon data (doi: 10.1371/journal.pone.0056018).

      2. It's hard to keep track of what generation we're on! Instead of using "Second" and "Third" generation in the abstract and introduction, how about just using the platform names. Also, the generation model implies one generation is better than another when the authors' data indicates that "better" still depends on the application.

      3. The Abstract is jargony. There are a lot of terms used that are not defined when someone reads only the abstract. What is INC-Seq? The acronym is spelled out, but can the authors give a brief description of what it is? What is "chopSeq"? What is "nanoClust"?

      4. On P11. "Inspections of the read to reference alignment length ratio indicated that the major source of sequence error for both INC-Seq and chopSeq corrected reads originated from deletions; i.e. percent similarity of the read to the reference decreased in proportion to the read to reference alignment ratio for all experiments and INC-Seq aligners us". I don't see how the "i.e." explains the first sentence

    1. On 2018-01-25 20:11:24, user Heather Bruce wrote:

      This comment was posted a few versions back and isn't showing up here, but I think the discussion is important, so I'm reposting it.

      SPXR said:<br /> Two shortcomings are: (1) lack of explicit comparison of the "large" and "small" plates of Oncopeltus with respect to the pleurae of other insects, and (2) assumption that the abdominal appendages of insects and other Pancrustacea are uniramous. With respect to 1, the pleurae of insects comprise the two pleural arcs (coxosternite: trochantin, precoxa; anapleurite: episternum, epimeron), as defined by Snodgrass (1935, Fundamentals of Insect Morphology). It is frustrating to read a paper making claims about the "body wall" of insects without ever using the term "pleuron", which appears to betray lack of comparative morphological knowledge. That said, shortcoming 2 above is less grievous, but still disheartening: The authors claim that the abdominal styli of Archaeognatha and Zygentoma are epipods, apparently forgetting that the abdominal appendages of Pancrustacea are biramous, with an endopod ("telopod") and exopod, with a number of protopodal epipods.

      Please pardon the tone of this message. The work is very encouraging for the resolution of leg and pleural homologies overall!

      My response:<br /> Thank you for your comments, I’m very happy for the opportunity to discuss this!

      Regarding (1), we made a decision to remove as much jargon as possible so that the paper would be accessible to a wide audience. As you are probably aware, insect and crustacean morphology nomenclature can be quite daunting, and we didn’t want to lose the reader at Figure 1! In the original version of the manuscript, I went with the terms in Snodgrass 1927, where he makes a nice case for the insect subcoxa theory. So, I homologized the crustacean coxa with the insect trochantin, and the crustacean precoxa with the insect epimeron/episternum. Terminology aside, another reason not to use the insect nomenclature is that there may not be terms that correspond precisely to the ancestral crustacean structures. For example, the epimeron/episternum might only represent the lateral/pleural portion of the precoxa leg segment that was incorporated into the insect body wall, but the precoxa might also include a portion of the notum, above and adjacent to the wing. Another issue was that I did not come across a term that distinguishes the body wall part of the trochantin from the plate-like outgrowth of the trochantin, which extends over the insect coxa. It was important to distinguish these two regions, because only the plate-like outgrowth of the trochantin is deleted following the loss of wing/epipod genes (Clark-Hachtel 2013, Ohde 2013, Medved 2015, Wang, 2017), and therefore it was to this region only that I homologized the crustacean coxal epipod. Our solution to these problems was to use pictures and plain language to show our homology schema. However, I’m happy to wade into the jargon weeds with you here :)

      As you probably know, “biramous” refers to a leg with an exopod, while “uniramous” legs lack an exopod, but may have epipods and endites (Boxshall 2004, Boxshall 2009). Regarding your point (2), we do not claim that the abdominal appendages of Archaeognatha or Pancrustacea are uniramous (Parhyale abdominal appendages are quite biramous, as are the thoracic and/or abdominal appendages of many crustaceans). From our manuscript, “the thoracic stylus of jumping bristletails (Fig. 4, st) is the epipod of the crustacean basis”. From a morphological standpoint, Tiegs 1940 says that Archaeognathan thoracic styli are unsegmented, and do not have intrinsic musculature, which are hallmarks of epipods (Boxshall 2004, Boxshall 2009). In contrast, the abdominal styli, while they may or may not be segmented (Matsuda 1976 vs Staniczek 2014), apparently have intrinsic musculature (Matsuda 1976, Matsuda 1957, Tiegs 1940), which suggests that they are exopods (Boxshall 2004, Boxshall 2009). However, since thoracic and abdominal styli both emerge from the insect coxa/crustacean basis (our manuscript, and following discussion), it is somewhat curious if they are not homologous structures. I certainly welcome any good sources you may have on this subject!

      Basal hexapods aside, a more satisfying answer regarding the identity of the lateral nubs of insect embryonic abdominal appendages lies in a comparison of Sp6-9/btd and Dll expression. The crustacean basis/insect coxa expresses Sp6-9 and btd (our manuscript). It carries the exopod and endopod, which each express Dll (Fig. S1, Hejnol 2004, Williams 2002, Panganiban 1995). Thus, if insect abdominal appendages had exopods, they should express Dll. However, most insects do not express Dll in the abdomen (this is why many molecular researchers didn’t know that insects form abdominal appendages as embryos and regarded them in adults as novel structures). While Dll is not expressed in insect A2-8, and exopods are thus absent there, there are paired, leg-like domains of btd expression on each abdominal segment of some insects (Schaeper 2010), which, according to our leg segment homology model (Fig. 4), suggests that these appendages are comprised of three leg segments: the precoxa (pink), crustacean coxa (red), and insect coxa (orange, expresses Sp6-9/btd), but not the trochanter (yellow, expresses Dll). See also the beautiful SEM images of the lateral nubs of the embryonic abdominal appendages in terrestrial carabid beetles in Kobayashi 2013 compared to the gills in aquatic carabid beetles in Komatsu 2012. Komatsu 2012 note that the gills of the aquatic beetle do not develop from the insect coxa, but from a proximal region, the subcoxa, which fuses to the body wall. This is most apparent by examining the nub/gill of the A1 pleuropod, which emerges from a position proximal to the insect coxa. Notably, the A1 pleuropod, which is longer, expresses both Sp6-9/btd and Dll at the tip, while the A2-8 appendages, which are shorter, express only Sp6-9/btd (Schaeper 2010, Beerman 2004, Beerman 2001, Rogers 2002). This is explained by our model (Fig. 4): the longer A1 pleuropod is comprised of at least four leg segments (precoxa, crustacean coxa, insect coxa, trochanter) expressing both Sp6-9/btd and Dll, while the shorter A2-8 appendages have three leg segments (precoxa, crustacean coxa, insect coxa), expressing only Sp6-9/btd. Since the A2-8 appendages do not express Dll, they do not have exopods, and cannot be considered biramous.

      We are planning to submit this manuscript to a journal soon, and due to space limits, we could not include as much of the background supporting information as we would have liked. However, I am currently writing a review that more fully discusses the morphological, molecular, and embryological evidence for the model we propose in this manuscript. I hope this was helpful :)

    1. On 2018-01-22 03:30:20, user BenjaminSchwessinger wrote:

      Thanks for posting this preprint. The detail of analysis and the availability of all code is great. It is excellent to see more plant pathogenic obligate biotrophic fungi sequenced. My 'feel' is that these genomes may well look pretty different to some of the better studied non-obligate oomycetes and fungi e.g. 'two speed' genome with effectors clustering to TEs. I could conceive that at least a subset of effectors may well be required in obligate biotrophs as they have to infect the host to complete the life-cycle.

      Some thoughts and questions:

      • Would be great to see some read length statistics on your PacBio sequencing to get a better understanding why the genome is still in a good number of contigs.

      l. 146 Instead of beginning and end of contig I would use 5' and 3' ends of the sequence.

      l. 185 ff. I got confused here as the numbers didn't add up for me: 6039 single-copy groups give rise to 6,844 one-to-one mappings? I think I get it after reading it several times, yet some rephrasing may well help. Else proteinortho with the synteny flag may have also been an option for doing this analysis.

      l. 223ff: The observation of smaller parts of the genome being reshuffled in DH14 vs. RACE1 is pretty interesting. We saw something similar comparing the two haploid genomes in wheat stripe rust fungus (see Figure 2, https://www.biorxiv.org/con...). Wonder how this all happens. Else http://assemblytics.com/ may also be a useful tool in future to compare two genomes with each other in regards to structural variations.

      l. 265ff: Great analysis on paralogs. We still need to do this for our candidate effectors, yet we saw an overall 'clustering' of candidate effectors. I liked the part of looking if SPs are enriched on certain contigs. Does this also hold true if you consider gene content and not only contig length?

      Figure 4A would be easier to interpret if it were normalized to the number of genes analyzed and n given within the figure.

      l. 353 ff: Mirrors what we found in wheat stripe rust and others in P. coronata, where candidate effectors do not reside close to TEs in general and not in gene sparse regions. We also see that candidate effectors such as CSEPs in Figure S2 C have no really close neighbours. This is pretty intriguing to me. Any thoughts on this? Have you tested if CSEPs are somewhat linked to BUSCOs, following the idea that some effectors are necessary in obligate biotrophs? If that is the case for you guys as well, I would be happy to look into whether the BUSCOs or effectors for which this is true are conserved.

      l. 380 ff: The analysis of a TE burst in Bgh is very interesting indeed. I think it would profit from a bit more detail on what kinds of TEs were found and how much each family covered. Figure 5 also lacks some details about the acronyms used in the figure, e.g. BOTR. Increasing font size and including a key in the legend would be great.

      What I wonder with BGH is where did all the old TEs go? Wouldn't you expect to have some of the older TEs still present around the same age/%id as in the other Blumeria? Within the Blumeria how many TE families were specific to each species? Could it be that your database does not include the most recent TEs from other fungi?

      Supplemental figure[:-3]: Not sure that joyplots are the best representation here. A circos plot may be a better visualization.

      Great work. Gave me some good pointers for my own work.

    1. On 2018-01-04 21:08:32, user Jeffrey Ross-Ibarra wrote:

      Although current data strongly suggest a single domestication of maize (Matsuoka et al. 2002), knowing the geographic location of domestication is of interest for multiple reasons. It may be of use agronomically, allowing us to identify portions of the range of maize’s wild ancestor teosinte most likely to harbor novel genetic diversity. But it is also of interest scientifically in terms of our understanding of how domestication occurs. Is maize descended primarily from a single population on one hillside and spread from there? Or was maize domestication a more dispersed process, involving selection and gene flow across a number of populations by multiple groups?

      By studying a nice sampling of maize and teosinte populations from across Mexico, Moreno Letelier et al. (2017) seek to reassess the genetic evidence for specific geographic origins of maize domestication. Using a number of different methods, they claim “the likely ancestor of maize may be an extinct population of teosinte from Jalisco or the Pacific coast”.

      I should state from the start that I don’t know where maize was domesticated. The Southwest Mexican lowlands <1800m seem pretty likely given all the evidence, but whether Jalisco or Michoacan or Balsas I don’t think the genetic data have yet said with any certainty.

      Below I detail some concerns with the analyses presented here.

      Jalisco as ancestor

      Moreno Letelier et al. (2017) build dendrograms of genetic distance (Figure 3) among all their samples, finding that parviglumis from Jalisco is closer to maize than populations from the Balsas. I don’t doubt this result, but as we discuss in Van Heerwaarden et al. (2011), this could be due to gene flow instead of ancestry. Current gene flow from parviglumis to maize is known in Jalisco (see e.g. discussion in Serratos (1997)), and should be discounted as an explanation before trying to infer ancestry from genetic distance alone. Indeed, in their own TreeMix analysis (Figure 4), Jalisco populations of teosinte form a single group with other teosintes, and are thus no more “ancestral” than any other (but see below for issues with TreeMix analyses). Given the really nice data the authors have, I’d be tempted to do something like redoing the analyses of Van Heerwaarden et al. (2011), especially if combined with denser geographic sampling.

      I’m not sure where the inference of an “extinct” population comes from, as this idea seems mentioned only in the abstract.

      TreeMix

      The authors use TreeMix (Pickrell and Pritchard 2012) to test for gene flow. This method first builds a population tree using allele frequencies, then adds edges (arrows) of migration to account for excess covariance in allele frequencies between populations. However, the authors chose to compare all domesticated maize as a single group to individual populations of teosinte. This means any post-domestication gene flow between maize and teosinte (which is presumably restricted to sympatric populations) is either missed entirely or interpreted as gene flow between all maize and teosinte. Indeed, the gene flow shown on Fig. 4 is between maize and mexicana, as has been well documented in the highlands of central Mexico (Hufford et al. 2013), but is limited to populations there and perhaps the Southwest US (Fonseca et al. 2015).

      A clue that this analysis might be problematic comes from the monophyletic grouping of all teosinte (both mexicana and parviglumis) separate from maize. Taking this at face value would suggest those subspecies split after domestication, which seems somewhat unlikely given both genetic (Ross-Ibarra, Tenaillon, and Gaut 2009) and ecological (Hufford et al. 2012) evidence they’ve been distinct for some time.

      I think it would be preferable to sample a number of maize populations and include each in the analysis, hopefully allowing TreeMix to do a better job building the correct tree and localizing gene flow. SeeDs of Discovery data, for example, provides publicly-available SNP data for ~5,000 maize landraces.

      ABBA-BABA

      The authors then apply the ABBA-BABA test (Durand et al. 2011), which tests for asymmetry in counts of shared derived alleles between two taxa in an ingroup with a third taxon. If the tree depicting the relationship between species is correct, then both ingroup taxa should share similar numbers of derived alleles with the third taxon. Asymmetry in numbers of shared derived alleles then suggests gene flow. Here, the authors use only maize from the highlands of central Mexico for this test, citing Freitas et al. (2003) that these landraces were likely the first to be domesticated. But the widespread gene flow from mexicana into highland maize makes them a problematic choice for understanding the origin of maize domestication (Van Heerwaarden et al. 2011). Moreover, both trees show teosinte populations sharing a common ancestor more recently than either does with maize, which seems problematic. The first tree (((Jalisco,Balsas),maize),Tripsacum) shows the two parviglumis populations splitting post maize domestication, which is only plausible if one is a very recently derived colonist. The second tree (((mexicana,Balsas),maize),Tripsacum) shows parviglumis and mexicana diverging after their common ancestor with maize, which as discussed above is likely wrong. Significant D (or fd) statistics here may thus mainly reflect that the tree is wrong. Perhaps instead the question of maize origin might be one of comparing a “Jalisco-ancestral” tree (((Jalisco,maize),Balsas),Tripsacum) to a “Balsas-ancestral” tree (((Balsas,maize),Jalisco),Tripsacum) – I’m dubious ABBA-BABA is the appropriate way to go about this though.
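
      For readers less familiar with the test, here is a minimal sketch of how the D statistic is counted for a tree (((P1,P2),P3),O). It assumes a single allele call per taxon per biallelic site, and the site patterns in the usage lines are made up for illustration; it is not the authors' pipeline and ignores the block-jackknife step normally used to assess significance.

```python
def d_statistic(sites):
    """Patterson's D for the tree (((P1, P2), P3), O).

    `sites` is an iterable of 4-character strings giving the allele carried
    by P1, P2, P3 and the outgroup O at each biallelic site, e.g. "ABBA".
    """
    abba = baba = 0
    for p1, p2, p3, out in sites:
        # Informative sites: P3 carries the derived allele and P1, P2 differ.
        if p3 == out or p1 == p2:
            continue
        if p2 == p3:      # ABBA: P2 shares the derived allele with P3
            abba += 1
        elif p1 == p3:    # BABA: P1 shares the derived allele with P3
            baba += 1
    if abba + baba == 0:
        raise ValueError("no informative ABBA/BABA sites")
    return (abba - baba) / (abba + baba)


# Hypothetical counts: 6 ABBA sites, 4 BABA sites, 10 uninformative sites.
example_sites = ["ABBA"] * 6 + ["BABA"] * 4 + ["AABA"] * 10
example_d = d_statistic(example_sites)
```

      A D near zero is what the test expects when the assumed tree is right and there is no gene flow, which is why a wrong input tree can produce a significant D on its own.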

      From the lit

      Both Van Heerwaarden et al. (2011) and Hufford et al. (2013) are papers produced by my lab, so I’m clearly not objective, but in several places the authors seem to ignore or misinterpret results from these papers, highlighting instead results from their own work which are pretty similar.

      Recognizing that gene flow from mexicana likely causes biases in identifying ancestral maize populations, Van Heerwaarden et al. (2011) used a broad sampling of >1,000 landraces to estimate ancestral maize allele frequencies. We identified numerous samples from Western Mexico (including multiple samples from Jalisco) as those most genetically similar to the putative ancestor of modern maize. Notably, however, we did not suggest “ancestral teosinte alleles in the Western region, rather than the Balsas Basin” (emphasis mine) – we actually didn’t have the resolution to really say one way or the other (see our Figure 3B). In fact, in spite of our lack of resolution, we mostly interpreted our data as consistent with archaeology and previous genetics as supporting a Balsas origin. In spite of its inclusion as evidence supporting a possible Jalisco origin, Moreno Letelier et al. (2017) seem to forget our paper later, however, claiming that “dense enough sampling in the mountains of Jalisco… were not considered in previous studies as a potential center of domestiation”, and noting “the inclusion of Jalisco populations here, which have not been used previously in other studies”.

      Hufford et al. (2013) used the same genotyping platform as Moreno Letelier et al. (2017) to test for gene flow between mexicana and highland maize. But while Moreno Letelier et al. (2017) claim “previous studies could not differentiate between contemporary processes and ancestral introgression”, we explicitly used HapMix (Price et al. 2009) to estimate the timing of admixture from tracts of inferred ancestry. Our analysis was problematic for a number of reasons – for example assuming a single bout of admixture – but nonetheless revealed that maize alleles in mexicana were mostly young while mexicana alleles in maize could be quite old, consistent with adaptive introgression from mexicana into maize upon colonization of the highlands and selection against gene flow from maize into mexicana (see Fig. S4 in Hufford et al. (2013)). The authors later compare their inferred 9.6% introgression from mexicana into maize to experimental results showing 1-2% (citing our review (Hufford et al. 2012), but presumably referring to results from Ellstrand et al. (2007)), but don’t mention the nearly identical 9.8% estimate from Hufford et al. (2013) using STRUCTURE (Pritchard, Stephens, and Donnelly 2000) (our HapMix estimate was 19.1%). Their result that “there are more introgressed alleles from mexicana to maize than in the opposite direction” also echoes our finding that “gene flow appeared asymmetric, favoring teosinte introgression into maize”.

      Finally, Moreno Letelier et al. (2017) seem to imply that climate data pointing to the existence of refugia in Western Mexico favor a Jalisco origin for maize. But the paper they cite – Hufford et al. (2012) – instead argues “there has been little change in the subspecies’ ranges from the time of domestication to the present”, and at least by my reading makes no reference to specific geographic areas as more likely domestication origins.

      References

      Durand, Eric Y, Nick Patterson, David Reich, and Montgomery Slatkin. 2011. “Testing for Ancient Admixture Between Closely Related Populations.” Molecular Biology and Evolution 28 (8). Oxford University Press: 2239–52.

      Ellstrand, Norman C, Lauren C Garner, Subray Hegde, Roberto Guadagnuolo, and Lesley Blancas. 2007. “Spontaneous Hybridization Between Maize and Teosinte.” Journal of Heredity 98 (2). Oxford University Press: 183–87.

      Fonseca, Rute R da, Bruce D Smith, Nathan Wales, Enrico Cappellini, Pontus Skoglund, Matteo Fumagalli, José Alfredo Samaniego, et al. 2015. “The Origin and Evolution of Maize in the Southwestern United States.” Nature Plants 1. Nature Publishing Group: 14003.

      Freitas, Fabio Oliveira, Gerhard Bendel, Robin G Allaby, and Terence A Brown. 2003. “DNA from Primitive Maize Landraces and Archaeological Remains: Implications for the Domestication of Maize and Its Expansion into South America.” Journal of Archaeological Science 30 (7). Elsevier: 901–8.

      Hufford, Matthew B, Paul Bilinski, Tanja Pyhäjärvi, and Jeffrey Ross-Ibarra. 2012. “Teosinte as a Model System for Population and Ecological Genomics.” Trends in Genetics 28 (12). Elsevier: 606–15.

      Hufford, Matthew B, Pesach Lubinsky, Tanja Pyhäjärvi, Michael T Devengenzo, Norman C Ellstrand, and Jeffrey Ross-Ibarra. 2013. “The Genomic Signature of Crop-Wild Introgression in Maize.” PLoS Genetics 9 (5). Public Library of Science: e1003477.

      Matsuoka, Yoshihiro, Yves Vigouroux, Major M Goodman, Jesus Sanchez, Edward Buckler, and John Doebley. 2002. “A Single Domestication for Maize Shown by Multilocus Microsatellite Genotyping.” Proceedings of the National Academy of Sciences 99 (9). National Acad Sciences: 6080–4.

      Moreno Letelier, Alejandra, Jonas A. Aguirre Liguori, Maud I Tenaillon, Daniel Piñero, Brandon S Gaut, Alejandra Vazquez Lobo, and Luis E Eguiarte. 2017. “Was Maize Domesticated in the Balsas Basin? Complex Patterns of Genetic Divergence, Gene Flow and Ancestral Introgressions Among Zea Subspecies Suggest an Alternative Scenario.” BioRxiv. Cold Spring Harbor Laboratory. doi:10.1101/239707.

      Pickrell, Joseph K, and Jonathan K Pritchard. 2012. “Inference of Population Splits and Mixtures from Genome-Wide Allele Frequency Data.” PLoS Genetics 8 (11). Public Library of Science: e1002967.

      Price, Alkes L, Arti Tandon, Nick Patterson, Kathleen C Barnes, Nicholas Rafaels, Ingo Ruczinski, Terri H Beaty, Rasika Mathias, David Reich, and Simon Myers. 2009. “Sensitive Detection of Chromosomal Segments of Distinct Ancestry in Admixed Populations.” PLoS Genetics 5 (6). Public Library of Science: e1000519.

      Pritchard, Jonathan K, Matthew Stephens, and Peter Donnelly. 2000. “Inference of Population Structure Using Multilocus Genotype Data.” Genetics 155 (2). Genetics Soc America: 945–59.

      Ross-Ibarra, Jeffrey, Maud Tenaillon, and Brandon S Gaut. 2009. “Historical Divergence and Gene Flow in the Genus Zea.” Genetics 181 (4). Genetics Soc America: 1399–1413.

      Serratos, J Antonio. 1997. Gene Flow Among Maize Landraces, Improved Maize Varieties, and Teosinte: Implications for Transgenic Maize. CIMMYT.

      Van Heerwaarden, Joost, John Doebley, William H Briggs, Jeffrey C Glaubitz, Major M Goodman, Jose de Jesus Sanchez Gonzalez, and Jeffrey Ross-Ibarra. 2011. “Genetic Signals of Origin, Spread, and Introgression in a Large Sample of Maize Landraces.” Proceedings of the National Academy of Sciences 108 (3). National Acad Sciences: 1088–92.

    1. On 2017-12-28 19:16:57, user Larry Matt York wrote:

      The preprint entitled, "Trait components of whole plant water use efficiency are defined by unique, environmentally responsive genetic signatures in the model C4 grass Setaria" investigates components of water use efficiency in 189 genotypes of a recombinant inbred line population using a controlled environment automated phenotyping system that controls water content of pots and measures plant size using imagery. Overall, the paper is well-written, methods are largely satisfactory, and conclusions are valid. However, there may be some gaps in the explanation of the experimental design and, while it's understandable given that this is a new system generating a ton of data, I don't feel enough is being done to use the time series data; the analysis focuses on making daily calculations. Further discussion of the SLOD method would be appreciated as it may account for some time dependence. These are explained more below.

      Methods are quoted largely from previous related manuscripts, which is fine with me. However, the number of replicates was not reported. Based on the number of genotypes, 189, and the stated number of individuals, 1138, we can assume there were 3 reps or blocks within which the water treatment levels were randomized along with the genotypes (although there is a remainder when dividing 1138 by 189 - why?). However, stating the number of replicates in the methods would be standard. One sentence is confusing, though, "This strategy effectively…within both treatment blocks." Does this imply only two blocks, one for well watered and one for water limited? In this case, there would only be one replicate of WW and WL, so impossible to do statistical comparisons of the water treatment levels. Personally, I feel some type of schematic of physical layout is always necessary to ensure correct description of design.
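
      To make the arithmetic behind that guess explicit, here is a quick sketch. The only inputs are the two numbers reported in the preprint and the assumption (mine) that there are exactly two water treatments, well watered and water limited.

```python
genotypes = 189
individuals = 1138
treatments = 2  # assumed: well watered (WW) and water limited (WL)

# Plants available per genotype, ignoring treatment.
plants_per_genotype, leftover = divmod(individuals, genotypes)

# Replicates per genotype-by-treatment combination.
reps_per_cell, unassigned = divmod(individuals, genotypes * treatments)

print(plants_per_genotype, leftover)  # 6 plants per genotype, 4 left over
print(reps_per_cell, unassigned)      # 3 reps per combination, 4 left over
```

      The four extra plants are the remainder I am asking about; they might be parental lines or extra checks, but that is a guess.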

      Line 171: I believe multiple linear regression is a more common term, multivariate would imply multiple dependent variables but I think you only had mass. Given the confusion you should also specify if fresh and dry weight were estimated simultaneously (multivariate) or separately (multiple). As a side note, you could try models that include the interactions of the predictors, which is the same as multiplying predictors together to create a new term.
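
      As a sketch of what I mean by interaction terms, the snippet below expands one row of predictors with all pairwise products before fitting. The predictor values are hypothetical image-derived measurements, not values from the preprint, and the fitting step itself is left to whatever regression routine you already use.

```python
from itertools import combinations


def with_interactions(row):
    """Return the predictors in `row` followed by every pairwise product."""
    return list(row) + [a * b for a, b in combinations(row, 2)]


# Hypothetical predictors for one plant: side-view area, top-view area, height.
expanded = with_interactions([2.0, 3.0, 5.0])
print(expanded)  # [2.0, 3.0, 5.0, 6.0, 10.0, 15.0]
```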

      Line 224: Maybe I'm missing something, but I don't see how calculations every other day limit replication? Are you saying you use the values for each day as replicates? Is that accurate? Replication should be the statistical replication, which I think is 3. That would seem an odd choice to me, based on my understanding. The explanation of equation 1 partially answers this, but I'm not familiar with that approach. Has it been used elsewhere other than in your own work?

      Line 257: Seems like doing analysis for each time point individually is sort of the obvious way, but I'm not sure it leverages the power of the time series data the most? Are there not more complex models that include time series for QTL analysis? How to more effectively handle time series data will be a major consideration for the future of phenomics.

      Lines 293-315: Redundant with methods, which should not be necessary in the Results section. If some of the information is not in methods, put it there and delete from here.

      Line 209: The talk of both treatment blocks is confusing, as described earlier for statistical design. I think you have three blocks with water and genotype randomized within (at least I hope so).

      Line 328: For discussion, is it possible to update the water weight during the experiment using the biomass estimates?

      Line 331: Have you considered non-linear curve fitting? Loess shows it's variable during the life cycle, but also looks like a saturating curve might approximate. Then, the parameter estimates of the curve could be new traits.

      Figures. Put legends on the figures, not just in text. For a good plot, you shouldn't need to read the caption.

      -Larry M York, Noble Research Institute

    1. On 2017-11-07 01:27:39, user Gustavo Rocha wrote:

      The work is not ground-breaking and the methodology is relatively simple, but the results do certainly expand on the knowledge we Brazilian researchers have on our own flora. I was wondering whether it was considered or not to carry out similar studies with other ginger variants; it is known that the major compound of Zingiber officinale, for instance, is gingerol. Even though these different plants are still named “ginger”, chemical compositions may vary greatly. Along the same train of thought, I was also wondering whether it would be better to assess MIC and MBC of the whole oil, and not of zerumbone by itself. It was fortunate that this compound was the major one found in the extracted oil, but other flavonoids found in the composition of the whole oil might improve the antimicrobial action of zerumbone due to synergistic mechanisms. It would have been interesting to see a comparison between zerumbone and the whole oil. Should MBC and MIC be the same, there would be no need to extract and isolate zerumbone should it ever come to be used as a therapeutic agent, saving money and time for the industries. Also, other ginger samples obtained at different seasons of the year could have different compositions of essential oil, which could cause zerumbone to not necessarily be the major compound, and in these circumstances, having the whole oil characterized might be better than just the isolated compound. I also think it is a bit “bold” to claim zerumbone could be used specifically to treat tooth decay diseases; it does work against a strain of bacteria, but zerumbone is a long way from actually being used in a formulation on human beings to treat this kind of disease; the potential is certainly there, however, as it is with many of our yet to be studied plants.

    1. On 2017-10-11 15:38:59, user Pat Schloss wrote:

      The preprint by Robert Edgar sets out to take on the issue of what similarity threshold should be used to delineate bacterial species using partial and full-length 16S rRNA gene sequences. This is well covered territory and I'm not sure that many people would defend to the death the assertion that a 97% cutoff describes species-level taxa. It is helpful to have a discussion about the various thresholds people use to bin sequences into OTUs. I think that the broader discussion and the discussion in this specific preprint in favor of a high threshold (e.g. 99.9 or 100%) has come off as being rather dogmatic. My comments below include suggestions for taking a more nuanced view. Ultimately, I think Edgar's and others' goal of pushing the field to a high threshold is an attempt to get a tool to do something it is not capable of doing. Specifically, 16S rRNA gene fragment sequences cannot delineate bacterial species and cannot tell us about phenotype. If scientists have these types of questions there are far more powerful tools at their disposal than debating the appropriate threshold for defining OTUs.

      To be transparent, a considerable amount of the material that Edgar uses as a point of contrast to his work are papers that I have published over the past few years and I am the creator of mothur. As of writing this review, I have not been asked to review this manuscript for a journal, but would be happy for any editor to use my comments. Judging from the style of writing, my sense is that this preprint is unlikely to have already been submitted to a journal.

      Major comments.

      1. The general approach Edgar has taken is to use a variety of metrics to compare the composition of operational taxonomic units (OTUs) generated by database-independent approaches to the taxonomic assignment for those sequences. By identifying the distance that optimizes these metrics, he arrives at the conclusion that the widely used 97% threshold is too low. Although this approach may be new, this conclusion is not (see the numerous papers published by Tiedje and Konstantinidis). I have significant concerns about his method and do not think Edgar has appropriately described the limitations of his approach. His is a problematic approach because systematists are inconsistent in how they lump and split strains into bacterial species. From the perspective of the 16S rRNA gene, some species are finely split (e.g. Bacillus cereus, subtilis, anthracis) and others are lumped (e.g. Pseudomonas putida). There is broad consensus within microbiology that the 16S rRNA gene is unable to delineate bacterial species or phenotype. Furthermore, a 250 nt region of that gene is even less able to delineate a species. Considering that a minority of bacteria have actually been assigned a species-level classification, using taxonomy as the ground truth for assessing a threshold is problematic. Previous attempts have replaced the DNA-DNA hybridization approach of Stackebrandt and Goebel with genome-scale phylogenies and attempted to correlate that structure with 16S rRNA gene sequence diversity. These caveats as well as a more thorough review of attempts to find a better cutoff are warranted in a revised manuscript.

      2. One of the reasons to favor a less restrictive threshold (e.g. 97%) is that there is considerable intragenomic variation in addition to considerable intraspecies variation. Using a higher threshold risks splitting sequences from the same genome into different OTUs. Previously, Edgar has indicated that he thinks this variation is the result of sequencing artifacts or contamination (see bottom of page 9, https://doi.org/10.1101/081...); they are not. As an example of intragenomic variation, E. coli ATCC 70096 has 7 copies of the 16S rRNA gene and 6 of these are different from each other in the full length version of the gene. Fortunately, within the V4 region the 7 copies are identical. Alternatively, Staphylococcus aureus ATCC BAA-1718 and Staphylococcus epidermidis ATCC 12228 both have 5 copies of the 16S rRNA gene. Considering the V4 region of these species, 4 of the 5 copies in each genome are identical between the two species. The remaining S. aureus copy is 1 nt different from the other S. aureus copies; however the remaining S. epidermidis copy is 1.7 and 2.0% different from the other S. epidermidis and S. aureus copies. The less restrictive threshold would lump the two species together; however, the more restrictive threshold suggested by Edgar would generate 3 OTUs. None of these reflect the biology he claims and the method would split sequences from the same strain into different OTUs. Given the ubiquity of these strains in skin-associated communities, it would make sense to take a more guarded recommendation than to make dogmatic pronouncements about using high thresholds. In the Discussion, Edgar brushes off intraspecies variation concerns and seems to ignore the case where an investigator would like to make an inference regarding the association between the relative abundance of individual OTUs and different treatment groups. Furthermore, he seems to think it would be possible to correct for the inflated alpha diversity metrics obtained by splitting sequences from the same species into different OTUs - the same seems reasonable to say about a lower threshold. Although Edgar's Pcs calculations seem to account for intraspecies variation, they do not seem to factor in intrastrain variation.
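
      A toy sketch of the Staphylococcus example may help show how the OTU count moves with the cutoff. It makes several assumptions of mine: the three distinct V4 sequence types are reduced to labels, the 1 nt difference is taken as roughly 0.4% of a ~250 nt region, the 1.7% and 2.0% values are assigned to the pairs as I read them, and clustering is plain single linkage rather than the algorithm any particular package uses.

```python
def count_otus(names, distances, cutoff):
    """Single-linkage clustering: join any pair whose distance is <= cutoff."""
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links up to the representative of the cluster.
        while parent[name] != name:
            name = parent[name]
        return name

    for (a, b), dist in distances.items():
        if dist <= cutoff:
            parent[find(a)] = find(b)
    return len({find(name) for name in names})


# Three distinct V4 sequence types across the two genomes (labels are mine).
names = ["shared", "aureus_variant", "epidermidis_variant"]
distances = {
    ("shared", "aureus_variant"): 0.004,               # assumed: 1 nt of ~250
    ("shared", "epidermidis_variant"): 0.017,
    ("aureus_variant", "epidermidis_variant"): 0.020,
}

otus_at_97 = count_otus(names, distances, cutoff=0.03)   # 97% identity
otus_at_100 = count_otus(names, distances, cutoff=0.0)   # 100% identity
print(otus_at_97, otus_at_100)  # 1 3
```

      Neither answer matches the two species that are actually present, which is the point of the example.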

      3. Edgar states "Also, state-of-the-art denoisers have been shown to accurately recover biological sequences from 454 and Illumina amplicon reads (Quince et al., 2009; Callahan et al., 2016; Edgar, 2016) suggesting that the best strategy for amplicon reads is to cluster denoised sequences, in which case the clustering problem is well-modeled by error-free sequences from known species." Again, I would encourage caution in pushing these methods as the strengths and weaknesses of the approaches are not well established. Some of the methods are aggressive in removing rare sequences that may be true sequences, others seem to overfit complicated models, and as described above, others may be splitting 16S rRNA genes from the same genome into different OTUs. Furthermore, the lack of randomness in sequencing errors has not been addressed thoroughly, which creates the possibility that a spurious sequence with sufficient sequencing coverage could be treated as a new OTU rather than be folded into a similar OTU. Finally, these methods have not been well validated for the breadth of sequencing platforms that people are using. I am far more confident in the quality of sequences generated from fully overlapping 250 nt MiSeq reads for the V4 region than I am for single HiSeq reads of the V4 region. There is a trend for people to push the length of the region and throughput at the expense of quality. In short, I agree that a species likely requires a very high threshold for 16S rRNA gene sequences; however, I am not convinced by the papers he has cited that the data accumulated in the literature is of sufficient quality to trust OTUs generated with high thresholds. Combined with the reality of intragenomic variation, I see value in having a more nuanced recommendation.

      4. I am happy to receive Edgar's critique regarding the methods used in mothur. I do not see how his section comparing mothur and pairwise alignments or adverse triplets helps make his points about the OTU threshold. I would suggest removing these sections unless he can find a way to tie them in better to his bigger claims - I certainly wouldn't lead off the Discussion with a critique of my use of the Matthews correlation coefficient. That is a weak way to summarize his story. The following two comments will address these specific comments that, again, I do not feel have a direct connection to the goal of the paper.

      A. The comparison between NAST-based profile alignments and pairwise alignments has previously been published. We too saw that pairwise alignment had smaller distances than profile alignments (doi: 10.1371/journal.pcbi.1000844 and doi: 10.1371/journal.pone.0008230). By definition, a pairwise alignment optimizes the similarity between the two sequences. In contrast, by using a profile-based alignment where the reference is aligned to the secondary structure of the 16S rRNA molecule, additional information is incorporated. This frequently increases the distance between sequences because it incorporates this extra information. I have also addressed this previously in the literature (doi: 10.1038/ismej.2012.102). I agree that the example Edgar shows is a problem. It is a well-known issue with profile alignments - if there are problems in the reference, there will be problems with the alignment. When using the SILVA reference alignment, such errors can be corrected by fixing the reference alignment. Furthermore, I would point out that an advantage of using a profile alignment like the NAST aligner in mothur is that it is considerably faster than a pairwise alignment. Generating pairwise alignments for N sequences would take N times longer than a profile alignment (i.e. profile alignments scale linearly while pairwise alignments scale quadratically). With large datasets pairwise alignments can be prohibitive while it only takes seconds with a profile alignment.
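
      To put rough numbers on the scaling argument, the sketch below simply counts alignments: one per sequence for a profile aligner versus one per pair for all-vs-all pairwise alignment. It assumes each alignment costs about the same, which is a simplification, and the dataset size is hypothetical.

```python
def alignment_counts(n):
    """Alignments needed for n sequences: (profile, all-vs-all pairwise)."""
    return n, n * (n - 1) // 2


profile, pairwise = alignment_counts(1000)
print(profile, pairwise)  # 1000 499500
```

      The ratio grows as (N - 1) / 2, so it is on the order of N, consistent with the statement above.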

      B. Regarding the section, "Comments on the MCCsw metric"... I readily acknowledge that because evolution does not care to conform to a similarity threshold when creating species, there will be "adverse triplets" around any threshold. As I've pointed out above, there are adverse triplets in the case of S. epidermidis V4 sequences and full length E. coli 16S rRNA gene sequences. In fact, this is why we have developed the MCC metric. It evaluates how well an algorithm balances the need to split and lump similar 16S rRNA gene sequences when assigning sequences into a bin. We have used MCC in a fundamentally different method than Edgar has in this paper. We used it assuming that the taxonomic databases are not helpful. He uses it assuming that it is the ground truth. Perhaps there is room for both views, but given the points I raised above, I am happy to stick with my approach over Edgar's.
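
      To make the MCC idea above concrete, here is a minimal sketch in Python. It assumes every pair of sequences is scored by two yes/no questions: is the pair within the distance threshold, and did the algorithm put the pair in the same bin? The function name `pairwise_mcc`, the toy distances, and the 0.03 threshold are hypothetical choices for illustration and are not taken from mothur or from Edgar's paper.

```python
from itertools import combinations
from math import sqrt


def pairwise_mcc(distances, bins, threshold):
    """Matthews correlation coefficient over all sequence pairs.

    distances: dict mapping frozenset({a, b}) -> distance between a and b
    bins:      dict mapping sequence name -> bin (OTU) label
    """
    tp = tn = fp = fn = 0
    for a, b in combinations(sorted(bins), 2):
        similar = distances[frozenset((a, b))] <= threshold
        together = bins[a] == bins[b]
        if similar and together:
            tp += 1  # correctly lumped
        elif not similar and not together:
            tn += 1  # correctly split
        elif together:
            fp += 1  # lumped although too distant
        else:
            fn += 1  # split although similar enough
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy data: a, b, c form an adverse triplet at a 0.03 threshold
# (a is close to both b and c, but b and c are too far apart).
toy = {
    frozenset("ab"): 0.01, frozenset("ac"): 0.02, frozenset("bc"): 0.04,
    frozenset("ad"): 0.10, frozenset("bd"): 0.10, frozenset("cd"): 0.10,
}
lumped = pairwise_mcc(toy, {"a": 1, "b": 1, "c": 1, "d": 2}, 0.03)
split = pairwise_mcc(toy, {"a": 1, "b": 1, "c": 2, "d": 3}, 0.03)
```

      In this toy case neither assignment can reach an MCC of 1, because the adverse triplet forces either one lumping error or one splitting error; the score only says which assignment balances the two better.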

    1. On 2017-09-14 03:15:20, user Yuri Lazebnik wrote:

      Dear Naomi,

      Thank you very much for your insightful comment and for your interest in the R-factor.

      Please let us reply point by point.

      “The idea of classifying hypotheses as supported or refuted by ongoing works, as a means to identify "strongly supported" or "strongly refuted" claims is an interesting one. I would like to see further discussion of how this could be applied.”

      Thank you for considering our idea interesting! We will be happy to discuss it.

      We would prefer to avoid qualifiers, such as “strongly”, because what is strong evidence for one scientist can be nonsense to another, as many a scientific discussion or a set of opposing reviews would testify.

      “Namely, it seems the R-factor is something that should be applied to a specific scientific claim, as opposed to a whole research article. Being able to quickly identify the evidence that supports (green), refutes (red) or relates unclearly (yellow) to a claim, directly from the claim in said literature, could aid comprehension (not to mention discoverability) of the surrounding literature, and highlight claims that are well-supported or lacking in independent replications. Do the authors feel that one paper is sufficiently related to one central claim for application of the R-factor the paper? Alternatively, I would argue that judging the "veracity" of component evidence presented within an article could be more informative.”

      We agree completely and tried to emphasize the focus on a claim as a unit of evaluation in our preprint. We will update the preprint to articulate this focus explicitly. Whether an article would have one claim or more depends on the report. In the latter case, applying the R-factor to all claims would be reasonable.

      In the examples mentioned in our preprint and in our current research we deal with the main claims because these claims are commonly articulated in the titles of the articles by their authors. This choice minimizes the possibility of misunderstanding what the authors concluded and facilitates the automation of identifying what an article claims.

      “Further, limiting these data to the "cited by" literature from that paper could skew the perspective, depending on which article you are viewing the claim in - to understand the overall "veracity" of a claim, it seems the reader would need to navigate back to the first mention of that claim in order to find the longest chain of evidence. Instead, I would be interested to explore the feasibility of a claim-centric (as opposed to paper-centric) count, and to understand whether this is already achieved by existing practises (such as meta-analyses of the literature). Perhaps an alternative approach would be to ensure that meta-analyses that include an article are more clearly visible from that article (e.g. highlighted in a "cited-by" section), and an extension to that would be to link that more recent work to the specific assertions that it relates to in the current article.”

      The point about the “trees” of evidence for a claim is indeed excellent! We envision that these trees will be extractable from the R-factor resource and would be one of its most powerful features, enabling the user to grasp quickly the history of the claim and thus the novelty or the lack thereof of the articles referring to it. We are beginning to build a prototype of the “tree viewer”, which we call the Linker: http://bit.do/mock_up (you can zoom in and out, click on the links and nodes, and move around the graph). We keep in mind the century long history of ignoring the claim that the ulcer disease is caused by a bacterium as an example of how a timely reconstruction of the “trees” of evidence could help accelerate discovery.

      “I would also be interested in whether the authors have any thoughts on the reporting bias towards positive results (it may be hard to judge replicability, if failed replications remain in desk drawers), as well as on more nuanced evaluations of related evidence: is some evidence stronger than others? Is it feasible to define a scientific claim, or is it dependent on context/species/other factors?”

      Indeed, the R-factor can only reflect what scientists have published, which means that the results that are now in the drawers would not be considered. However, we anticipate that the use of the R-factor and the increasing popularity of preprints can change this. Currently “negative” results stay in the drawers because the value of reporting them is uncertain while the effort of reporting them is substantial. We think that the opportunity to affect the R-factor of a praised paper that everyone in the field knows is wrong and the ease of reporting the results through a preprint service like bioRxiv would help to keep the drawers empty.

      We would like to emphasize that the R-factor of a claim does not measure the replicability of the study that reported it, but whether the claim has been confirmed. For example, testing a claim in a different experimental system, which is a common practice, is not a replication by definition and thus the result of such testing would be missed by the replication approaches, but would be included in the R-factor. Likewise, a replication study can test whether the reported result is reproducible, but not whether it is misinterpreted. The R-factor evaluates the chance that a scientific claim, which is an interpretation of the results, is correct, irrespective of whether this claim is based on valid results, a guess, or misunderstanding, which all have their role in science.

      “Finally, I would be concerned about applying such a metric to individual researchers. An examination of unintended consequences for such a metric would be useful to discuss.”

      We welcome this discussion, but the R-factor of researchers will be derived from the R-factors of their reports by extension. We do not see how this extension can be blocked and question whether it should be blocked. We would suggest that an open and transparent score could be better than a reputation based on the grapevine, membership in old boys' clubs, or unqualified praise in the media. We envision that once the dust settles, people will see in numbers what they already know intuitively, namely that no one is perfect in their scientific judgment and that some outliers on both sides of the distribution are present. We would also like to emphasize that the R-factor will be but one of the measures used to evaluate scientists and hope that non-quantifiable evaluating criteria will also stay in place.

      Thanks again for your insightful comments, which made us think and wish to discuss the issues you raised further.

      Best regards,

      The Verum team.

    1. On 2017-05-02 14:56:24, user Peter Doshi wrote:

      I very much enjoyed reading this proposal. I agree with the need for dramatic improvements in the way journals handle the post-publication modification to articles.

      Some general reactions/thoughts:

      1. Newspapers vs. Scientific Journals. I find it difficult to pinpoint the key difference between newspaper articles and scientific journals, making me wonder why the simple approach many newspapers have taken to amending articles cannot work for scientific journals, or at least be the basis for the approach journals take? I think the key would be that journals would do a better job at allowing an audit trail. The audit trail would make transparent the nature and timing of the changes, make obvious to the reader which version they are viewing at any given time, and add on electronic features such as the ability to compare any two versions a reader wants to compare, showing a tracked-changes version between those two versions.

      2. What to call it. I agree with the authors that terms like “correction” and, in particular, “retraction” are problematic in that they are perceived to necessarily convey more than the straightforward act of a post-publication change to an article. I agree that the term “amendment” is neutral, but I have trouble with it. My trouble is that one dictionary definition of “amendment” is a “minor change in a document”, whereas you are conceiving this as an overarching term including changes that may be large. My second difficulty is that it suggests adding something to the document, as in an appendix, not necessarily changing the document itself (albeit in ways that are transparent). Did you consider “alteration”, “version”, or “revision”? Journals generally use “revision” to characterize different versions of a manuscript in the pre-publication phase. Is there any good reason not to use it post-publication? There can be minor revisions and major revisions, insubstantial, substantial, and complete revisions… so the term has the flexibility I think you’re looking for.

      3. In terms of contextualizing the topic, I think something needs to be said describing the old world of print only vs. the new world of print and online. With print, there was a clear ORIGINAL publication. No matter how flawed, once printed on paper, one couldn’t amend the true original in the library stacks, but only issue further statements ABOUT the article (corrections, editorials, notices of concern, retraction NOTICES). With online comes the possibility of editing the original – or at least what appears to be the original (i.e. the version people will see when they attempt to access the publication from the publisher’s website). This added complexity can help reduce the propagation of errors (small or large) that may have existed in a publication.

      4. This point is not just stylistic. I think it gets at the heart of whether or not the first published version should mean anything special. We are used to thinking it does. But the authors discuss protocols (under “A proposal for the future”) which raises the question of publishing documents that have mostly not undergone editorial processes. So when does the first version start, and what does it represent? Is it the first version that the authors ever drafted or is it the version accepted for publication i.e. the culmination of many revisions pre-publication? Is the idea that once a document is made public (i.e. published), then thereafter, all changes, big or small, will be tracked?

      5. What does one do in the event that editors are convinced of an error that needs correcting but the author(s) adamantly refuse? I know the correction notice can carry words to the effect that the editors are fixing what they deem to be an error but if the authors disagree, will the actual article be amended yet still carry the authors' names in the byline? That seems highly problematic as it attributes words to them that they do not stand by. The only way I can see to deal with this is to issue, depending on the severity of the error, a retraction against the authors' will or, alternatively, a linked ‘expression of concern’ but not change the actual text of the article. Any kind of modification of the original where even one author dissents seems problematic. It would get even more complex if some authors agree with the amendment yet others do not. Do we get version forking with editors to decide which version is served up as “current”?

      6. I would avoid introducing protocols into the single-stream publication proposal. Protocols are one of many essential documents involved in research. A research paper is another one of those essential documents – but again there are many important documents in research. Protocols often go through their own many revisions and, practically speaking, are different files on one’s computer that may exist simultaneously with a manuscript that is in preparation. It seems to me your proposal should track the versioning of a single document – e.g. the journal-destined research report – possibly pre-, but definitely post-publication in a journal.

      7. Stylistically, the concept the authors put forward (of conceptualizing the amendment process as containing two distinct elements of (1) editing articles and (2) publishing a notice about the edit) is important and I think can be made clearer earlier on, perhaps giving it its own heading “The amendment model: publish a notice and edit the article itself”.

      8. Also stylistically, I would put more emphasis on the point that in the case of a correction, irrespective of the size of the correction, articles available online will be edited so that by default, the version served up to a reader when they visit the article on the web will be the CURRENT version (reflecting all edits/corrections to date), not the version that appeared when the article was first published (with a notice that a correction exists). I think this is a big break from current practices at many journals.

      9. The authors note that every publisher has their own strategy for content delivery, and do not make specific recommendations for or against how to display amendments. But it seems to me that display strategies are part of what has got many people feeling uncomfortable about corrections. Many authors probably would prefer to avoid a big bold all caps notice that THERE HAS BEEN A CORRECTION TO THIS ARTICLE at the top of their published article. I think therefore that a specific recommendation for how the reader should be alerted about the existence of amendments should be made. The authors suggest a difference in display between minor vs. other amendments, but I wonder if there should just be one approach in line with the notion of de-stigmatizing amendments.

      10. As far as locations for noting the existence of post-publication amendments, newspapers are doing it below the article, often set apart with italicized text. Where will this flagging occur in scholarly publishing? What about proposing certain article meta-data become standard. Just the way Acknowledgements and COI declarations are now fairly standard elements of articles, could a “Current version: X (version history)” line become standard, with a link to a separate document that contains the explanation of the corrections? Or perhaps all articles should contain a "Version history (up to [YYYYMMDD]): On YYYYMMDD, we fixed a typo. On YYYYMMDD, we removed an author for reasons described here (DOI-to-editorial-note-about-research-misconduct)..."
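
      The audit trail in point 1 and the "Version history" line in point 10 could be prototyped along these lines. This is only a sketch under stated assumptions: the `Version` record, its field names, the example dates, and the example text are all hypothetical, and the tracked-changes view simply uses a unified diff from Python's standard `difflib` module rather than anything a publisher actually runs.

```python
import difflib
from dataclasses import dataclass
from datetime import date


@dataclass
class Version:
    number: int
    published: date
    note: str  # reader-facing explanation of the change
    text: str  # full article text for this version


def version_history(versions):
    """One line per amendment after the first publication, oldest first."""
    return [f"On {v.published:%Y%m%d}, {v.note}" for v in versions[1:]]


def tracked_changes(old, new):
    """Unified diff between any two versions a reader chooses to compare."""
    return list(difflib.unified_diff(
        old.text.splitlines(), new.text.splitlines(),
        fromfile=f"v{old.number}", tofile=f"v{new.number}", lineterm=""))


# Hypothetical article with a single typo fix.
v1 = Version(1, date(2017, 1, 5), "first published.",
             "Mice were treatd daily.\nResults follow.")
v2 = Version(2, date(2017, 3, 1), "we fixed a typo.",
             "Mice were treated daily.\nResults follow.")
history = version_history([v1, v2])
diff = tracked_changes(v1, v2)
```

      Keeping the full text of every version, rather than only the notices, is what lets a reader compare any two versions and not just adjacent ones.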

    2. On 2017-03-29 05:37:38, user M Hooper wrote:

      Excellent and radical suggestion about amendments. For me, it raised three suggestions and questions:

      1. If I see an article that has been subject to a wholesale amendment, I will wonder why. I don’t think it’s possible to remove the stigma of amendments without providing the reasons for the amendment. I think reasons could be provided immediately below the “Declaration”.

      The article convinces me that drawing a distinction between (i) amendments, and (ii) the reasons for them, will be good for the culture of making amendments. Consequently, it will be good for the accuracy of the academic record more broadly, and good for the community, practitioners, and policy makers. But it’s not so obvious to me that such a sharp distinction will be good for ethics. A reader deserves to know if the reason for some amendment is that the authors fabricated data. Even admitting the many faults of the term “retraction”, it has sometimes played this helpful role, and it has sometimes been a good stick for ethics to brandish.

      One of the problems your suggestion solves is that the term “retraction” has harmed innocent-authors, and it has deterred innocent-authors from doing good things (like correcting the record). But “retraction” has also helped stigmatise some very bad practices, which is a nice effect.

      The reason “retraction” is a bad term is that it applies to both cases involving fault and cases involving no-fault. The most obvious solution is to just say we need two terms instead of one. You didn’t choose that option, and I think you were wise not to. Choosing a neutral umbrella term has all the benefits you describe.

      Nonetheless, *something* has to do the job of separating one from the other - cases of misconduct from cases involving no fault. As I said, I think this should be done by providing the reason *immediately below* the declaration of amendment.

      2. Will it be possible for an author of a paper to lose her authorship in a later amendment? Suppose I authored the original paper, but have since refused to be involved in substantial amendments. Moreover, suppose my original contribution was to the sections of the paper that have since been amended such that my original contribution is now eliminated from the current version. Should I still be an author?

      3. If amendments become acceptable and normal, journals will have to decide the kinds of amendments that they will allow. This will be tricky if authors want to make amendments for minor things. What if I just change my mind about something minor in the discussion, for example? What if I have since thought of a more elegant way to phrase my introduction? What if critics, in subsequent published works, have accused me of treating the literature carelessly; may I now just amend my literature review to make their criticisms false? (I imagine these issues will be more problematic in the humanities, but could well be wrong.) Anyway, great paper.

    1. On 2017-05-02 14:16:16, user Peter Civáň wrote:

      Dear Jae and Michael,

      Thanks for this interesting paper! I’m glad the debate goes on and people are trying to make sense out of the contradictions.

      You made several good points here and I totally agree that the genomic window size of the CLDGRs is critical for clustering patterns that are based on genetic distance.

      However, the situation is not as simple as “the smaller genomic windows provide more correct genealogy”. Surely, we know that sh4 and prog1 coding sequences are fixed in all cultivated rice, so if we focus on a genomic window narrow enough (e.g. just the coding sequence, or in an extreme case just the FNP), we will inevitably recover a monophyletic O. sativa group (or better say paraphyletic O. rufipogon). The question is, how far from the gene can we go and collect genealogically informative signal (undisturbed by recombination)? Neither I nor you have answered this question.

      Consider the situation on the attached figure. Keep in mind that the “domestication gene” can be a very ancient allelic variant that emerged in wild populations long before the domestication, and also keep in mind that wild rice populations are quite dynamic (in terms of recombination and glacial-interglacial movement). Then we can imagine a situation where we have multiple combinations of alleles (Xa–Xe) with different genetic backgrounds within the wild population. Let’s imagine two independent domestication events, leading to two cultigens (I and II). The allele Xd is selected in both cases and fixed in both cultigens. If we focus on the narrow window, we recover monophyly of the cultigens. If we focus on the large window, we recover polyphyletic cultigens. In this particular cartoon, the latter would be correct.

      I cannot be sure that this is indeed the case of indica and japonica, because I did not identify the entire haplotypes with their recombination points (the quality of the data just doesn’t allow that). I think both of us may be over-interpreting the selective sweep analysis a bit. Maybe we need to focus on other kinds of data and methods (your recent coalescence analysis is one example). Maybe our paper (Civan and Brown. Origin of rice (Oryza sativa L.) domestication genes. Genet Resour Crop Evol. In press) will bring some new insights, and hopefully, there will be more stuff coming from me and Terry soon.

      https://uploads.disquscdn.c...

    1. On 2017-04-13 02:21:17, user Dave Baltrus wrote:

      As the "big data revolution" progresses and biology is confronted with ever more complicated patterns to interpret, evolutionary terms are being increasingly invoked to explain perceived patterns. "Frequency dependence" is one of these terms. The purpose of this manuscript from Brisson is to begin to clarify when it is/is not appropriate to use the term "negative frequency dependent selection (NFDS)" in the context of evolutionary explanations. Brisson does a great job of laying out definitions and explanations for use of this term over the last century or so, and does so while describing how such selection regimes could help to explain the amount of diversity we see in the world. I'm a proponent of clearly laying out the case for when nuanced evolutionary terms are applied inappropriately, and Brisson does a good job of describing instances where patterns may suggest negative frequency dependent selection but where this specific evolutionary model doesn't apply. He makes this case throughout the manuscript and does so in a way that is clear and concise. I think this manuscript could go a long way towards clearing up some confusion in the literature if the right people see it at the right time.

      I have no major qualms with this preprint; it's laid out and written quite well. However, I do think that the case would be slightly clearer if, where the patterns falsely suggest NFDS, some examples were imagined that would allow the patterns to fall under the purview of NFDS. For example... what would need to happen to make the "killing the winner" scenario actually fall under NFDS? I'm not sure if there is actually a clear way to do this or if it would muddle things, but if possible it would be good to include additions that could make these situations fall under NFDS as counterpoints.

      I really enjoyed this preprint both for its subject matter and clarity, and I hope to see it well received across communities.

    1. On 2017-04-11 15:14:46, user Rahul Nahar wrote:

      We have seen such a phenomenon even on the HiSeq 2500, which also uses bridge amplification, so I think it might be present on the NextSeq 500 as well, though maybe to a slightly lower extent than on the HiSeq 4000.

    1. On 2017-03-07 13:14:22, user Pat Schloss wrote:

      The preprint from Herren and McMahon describes a new metric - cohesion - to describe the overall connectedness within a community using temporal data. I was excited to see this preprint because I am familiar with McMahon's long history of developing rich time series data for microbial communities in Wisconsin lakes. I also have a lot of my own time series data from humans and mice where we struggle to incorporate time into the analysis to understand the interactions between bacterial populations.

      A significant struggle in analyzing time course community data is the ability to synthesize observations for large numbers of taxa over time. Many of the existing methods people use attempt to adapt methods from cross-sectional studies. For example, a study may sample a large number of lakes, people, soils, etc. and characterize their microbial communities. They'll then calculate correlations across those samples based on the relative abundance of the populations. Alternatively, they'll use presence/absence data to generate co-occurrence matrices. The problem with these studies is that the next step is often to infer something about the interactions between the populations - even if the populations would never possibly co-occur. Herren and McMahon's efforts to study the connectedness of individual populations and their cohesion are very welcome because they have the potential to get us closer to describing the actual interactions between populations.

      To briefly summarize the approach, the method starts by calculating the Pearson correlation between all pairs of populations across time and then discounts the correlation that would be expected if all interactions were random. This is important because of the compositional nature of the data and the effects of different population sizes. Next, the method calculates the average positive and negative corrected correlation for each population. These become the positive and negative connectedness values for each population. Finally, the positive and negative cohesion values for each community are calculated by determining the sum of the product of the connectedness value and the relative abundance for that population.
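
      The steps summarized above can be sketched in a few lines of Python. This is a minimal illustration of the summary as written, not the authors' implementation: the function names, the toy abundances, and the `expected` argument (a matrix of null-model correlations supplied by the caller, treated as zero when omitted) are all assumptions, and the preprint's actual null model is not reproduced here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx and syy else 0.0


def connectedness(samples, expected=None):
    """samples[t][i] is the relative abundance of population i at time t.

    Returns (positive, negative) connectedness per population: the mean of
    the positive and of the negative null-corrected correlations.
    """
    n_taxa = len(samples[0])
    series = [[s[i] for s in samples] for i in range(n_taxa)]
    pos, neg = [], []
    for i in range(n_taxa):
        corrected = []
        for j in range(n_taxa):
            if i != j:
                null = expected[i][j] if expected else 0.0  # assumed null input
                corrected.append(pearson(series[i], series[j]) - null)
        p = [c for c in corrected if c > 0]
        q = [c for c in corrected if c < 0]
        pos.append(sum(p) / len(p) if p else 0.0)
        neg.append(sum(q) / len(q) if q else 0.0)
    return pos, neg


def cohesion(samples, expected=None):
    """(positive, negative) cohesion for each time point."""
    pos, neg = connectedness(samples, expected)
    return [(sum(a * c for a, c in zip(s, pos)),
             sum(a * c for a, c in zip(s, neg))) for s in samples]


# Hypothetical toy community: populations 0 and 1 rise and fall together,
# population 2 moves in the opposite direction.
samples = [[0.1, 0.2, 0.7], [0.2, 0.4, 0.4], [0.3, 0.6, 0.1], [0.2, 0.4, 0.4]]
result = cohesion(samples)
```

      Because cohesion weights connectedness by relative abundance, the same connectedness values give a different cohesion at each time point, which is why the method returns one pair of values per sample rather than one per lake.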

      The following are general critiques and questions, which I appreciate may be beyond the scope of the current manuscript (note, I am not a reviewer for the manuscript at a journal):

      1. To develop the cohesion metric for a community, the authors sum over all of the populations in the community. This raised three questions for me. First, independent of the relative abundances in each sample, is the *number* of positive and negative connections for each population relevant? It might be worthwhile exploring which populations have more positive/negative connections than others. What does that distribution look like? Second, does the connectedness metric value itself have any value? What are the populations that are highly connected with other populations? Finally, the method generates a cohesion value for each time point. If I think of Lake Mendota as a community that was sampled over time, it would be interesting to know whether it has been more cohesive than Lake Monona over the 19 years of sampling. Thinking of my own work, I would be interested in knowing whether mice that are more susceptible to C. difficile colonization are less cohesive than those that are resistant. Again, this would require a composite score, not individual scores for each time point.

      2. Continuing on my self-serving thread, I wonder how sensitive the method is to the time interval between samples and the number of samples. In my experiments I may have 20 daily samples from a mouse - is this sufficient? What if we miss a day - how will having a jump between points affect the metrics? As the authors state, the Lake Mendota dataset has 293 samples collected over 19 years (i.e. 1.3 samples/month). This is a unique dataset that is unlikely to be repeated elsewhere. What if we were to get more frequent samples? What if they were more spaced out? What if we only had a year's worth of data? It would be interesting to see the authors describe how their cohesion values change when they subset the dataset to simulate more realistic sampling schemes.

      3. A significant challenge in developing these types of metrics is not knowing what the true value of the metric is in nature. I appreciate Herren and McMahon's effort to validate the metrics by comparing their results to count data and by explaining the variation in Bray-Curtis distances. The manuscript reads almost like they want their method to recapitulate what is seen with those distances. But we already have Bray-Curtis distances; if that's the goal, then why do we need the cohesion metric? It would be interesting to see the authors simulate data from communities with varying levels of cohesion and abundance to see that the method gets back the expected cohesion value. Perhaps it would be possible to generate an ODE-based model to generate the data instead of variance/covariance data. There is one simulation described at the end of the Results (L300); however, it is unclear whether the lack of a meaningful R-squared value was the expected result or not.

      4. Throughout the manuscript, the authors make use of parametric statistics such as Pearson's correlation coefficients and the arithmetic mean. Given that relative abundance data are generally not normally distributed and are likely zero-inflated, I wonder why the authors made these choices. I would encourage the authors to instead use Spearman correlation coefficients and median values. Related to this point, a concern with using these correlation coefficients is the problem of double zeros where two populations may be absent from the same communities. These will appear to be more correlated with each other than they really are, which is why we don't use these metrics for community comparison - we use things like Bray-Curtis. I wonder whether subtracting the null model counteracts the problem of double zeroes.

      5. The authors translate their count data into relative abundance data before calculating their correlation and Bray-Curtis values. I wonder if the authors subsampled or rarefied their data to a common number of individuals. Both of these metrics are sensitive to uneven sampling. Even if the counts are converted to relative abundances, this would not remove the effects. For example, if one sample has 1000 individuals and another has 100, the first can detect populations that are 10-fold rarer than the second can (0.1% vs. 1% of the community). Populations that represent 0.5% of both communities could be seen in the first but would not be seen in the second. If they haven't already, I would encourage the authors to subsample their dataset to a common number of individuals.

      6. The "Description of datasets" section of the Methods describes the various datasets in general terms, but what is the nature of the data? How were the phytoplankton counted? How many individuals were sampled from each sample?

      7. It would be great to have the code that was used made publicly available on GitHub.

      8. The authors present the material in a format that I have not previously seen in the microbial ecology literature (i.e. ISMEJ where this appears to be destined for review). The authors flip back and forth between presenting a different stage of the algorithm and validating that step. I think this is a bit more aligned with how one would present the material in a talk than in a paper. I've seen similar methods development described before where there might be a methods section on algorithm development and then the results section would test the assumptions and performance of the algorithm. I'm curious to see whether this structure persists through the editorial process.
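
      The subsampling suggested in point 5 can be illustrated with a short Python sketch. It assumes each sample is a vector of integer counts per population and draws individuals without replacement down to the smallest sample size; the function name `rarefy`, the fixed seed, and the toy counts are hypothetical and chosen only to mirror the 1000-versus-100 example.

```python
import random


def rarefy(counts, depth, seed=0):
    """Randomly draw `depth` individuals without replacement from a count vector."""
    if depth > sum(counts):
        raise ValueError("depth exceeds the number of individuals in the sample")
    # Expand the counts into one entry per individual, labeled by population.
    pool = [taxon for taxon, n in enumerate(counts) for _ in range(n)]
    drawn = random.Random(seed).sample(pool, depth)
    out = [0] * len(counts)
    for taxon in drawn:
        out[taxon] += 1
    return out


deep = [900, 95, 5]    # 1000 individuals; the rarest population is at 0.5%
shallow = [90, 10, 0]  # 100 individuals; a 0.5% population falls below detection
common = min(sum(deep), sum(shallow))
deep_rarefied = rarefy(deep, common)
shallow_rarefied = rarefy(shallow, common)
```

      After rarefying, both samples contain 100 individuals, so a population missing from one of them can no longer be explained simply by the difference in sampling depth.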

    1. On 2017-02-23 21:54:41, user Dave Baltrus wrote:

      (Stepping up to break the ice and comment formally here instead of just on twitter)

      1. I think the Ben Schwessinger experience described here (https://blushgreengrassataf...) is worth a mention for a couple of different reasons. It's the first time that I can recall that a journal had to step up and actually deal with a situation where scooping by preprint (or because of preprint) may have occurred. As such the policy at PLoS has been refined. When things change, there are always the uneasy situations like this that force people to make difficult (and sometimes wrong) decisions.

      2. I think it's also worthwhile to mention sites like PubPeer. Public reviews and comments on preprints are part of overlapping discussions but aren't necessarily the same discussion. Feels like there's something to be said about that although I'm not sure what that is right now.

      3. My whole take on "but it's not peer reviewed" is that those that will be reading the preprints in order to cite them are well qualified as reviewers themselves. If you don't trust the paper or don't like it, don't cite it. If you read through the paper and don't see fault with experiments, why not cite it? We all have blindspots but it's not like we don't review papers all the time and critique them anyway even if they've been through peer review.

      4. I think we should make a greater effort to write positive comments on preprints and not just use this as a forum for review. Positive comments can help those who maybe aren't in the literature figure out which preprints are great and which have holes (by their lack of positive comments). I see this as important if preprints are going to be written about by the popular press and digested by those who aren't necessarily experts. We as experts need to endorse good papers just as we will trash the bad papers.

      5. I had the first preprint in bioRxiv under Microbiology, why are you taking this achievement away from me Schloss?

      6. Looping back on number 4...if we are going to be the ones reviewing grants and papers and we see a preprint cited, we can actually review this work. Some are going to use it to get around page limits but, like you point out, we as scientists should be pretty good at sniffing out shoddy and rushed work, so this could also theoretically backfire on the person trying an end run on page limits. Sure it may give you more space to write, but if you do a terrible job you may poison the impression of a grant reviewer who might otherwise like your grant. I'm tired of having to see (in press) or (in prep) when work is cited in a paper or grant. If it's an important enough story for the grant, I want to be able to read the story myself and preprints allow this.

      7. There are different costs and benefits for preprints depending on the field you are in and the point in your career. I don't know that we've figured this out at all yet or if there is a great answer across the board. It seems as though the pop gen fields have taken to preprints more than other fields, but in my experience evolutionary biology in general tends to be less "scoopy" or "eat their young" than other fields. I'd like the world to exist where everyone can freely post preprints and get credit, but I can see this going horribly wrong in fields that are much more competitive and potentially containing more selfish PIs. I mean this not as a positive or negative commentary on different fields, but it's quite obvious to me that some fields are more cutthroat than others for a variety of reasons and the cost/benefit analysis for preprints in these fields will be different.

    1. On 2017-01-30 20:23:17, user Alexandro Rodriguez-Rojas wrote:

      This is a nice idea. However, the authors say 'We assume that interactions between the strains are solely due to resource competition'. I think that competition is often more complex than the speed of resource use. Let's imagine a few different situations. What would happen if one strain is more susceptible to metabolic wastes or acidification of the medium, or has altered quorum sensing? What if growth depends on a public good such as siderophores? In this last example the strain that grows worse alone, when co-cultured with the one that grows better alone, would cheat by stealing the public good that it is unable to produce, outcompeting its peer. What if one of the microbes produces a toxin, an antibiotic or a phage? Validation of the model in these kinds of scenarios would be a great plus for this new technique. I hope this helps and I'm looking forward to seeing whether the model is independent of more complex interactions or whether it may even be useful to unveil them.

    1. On 2017-01-04 19:35:12, user JN wrote:

      Nice paper--I like the valacyclovir question/angle on a familiar theme. The paper can trace its scientific lineage to the work of Montaner, Lima, Williams and others in the 2000s. Although younger and perhaps unaware of "ancient history", the authors could consider acknowledging earlier work which in turn was built on Anderson and May's early work on HIV and AIDS epidemiologic modeling. When we did our Lancet paper in 2008 my only regret is not discussing how it was a logical outcome of two decades of research/modelling around HIV natural history and the potential for treatment. It may have made it less shocking to the HIV community--the fierce and, in some settings, ongoing resistance to the idea that treatment has an important role in both keeping people healthy and preventing HIV transmission has directly translated into delayed implementation of test and treat services for people living with HIV--see www.hivpolicywatch.org for latest policy status. This is reflected in many models that restrict treatment while focusing on scaling prevention--interesting idea but it is not realistic to deprioritize treating people with HIV rather than treating first and then layering on prevention modalities.

      Not sure where the 70% ART coverage comes from but it is important to think carefully about the impact of ART--many models downgrade ART efficacy or effectiveness by loading in pessimistic reduction in transmission risk and retention parameters that are not supported by the latest population based studies from Botswana and other countries. Most major models out there are not that clear about the definitions of coverage and/or the actual risk reduction parameters. I suspect that this will continue to be a battleground as some modelers prefer pessimistic assumptions for ART (but not PrEP!) and others choose well-performing program data.

      The good news is that the models will continue to help us think about the optimal strategy to both keep the 37M people alive while controlling and eliminating the epidemic in many settings....

    1. On 2016-12-27 23:16:53, user Peter Ellis wrote:

      Another comment that occurs to me at this point - when you were looking at the "co-amplified" X genes for signatures of selection, how did you define co-amplification? If I am reading the paper right, it looks like you looked specifically at the direct homologues of the Y-linked ampliconic genes, i.e. Sstx, Slx, Slxl1 and Srsx.

      In looking for signatures of selection around X-linked genes, I think it is imperative to first consider which X genes are likely to be affected by the conflict. The Slx/Sly conflict seems to be mediated by varying the strength of PSCR, i.e. a GLOBAL regulation of sex chromosome expression in spermatids. The prediction therefore is that if Sly-mediated repression increases, EVERY dosage-sensitive, spermatid-expressed gene on the X and Y will come under selection to increase its activity.

      This is what we saw in our 2011 paper - the proliferation of Slx and Sly in the Palaearctic clade is associated with an increase in copy number at almost all the X-linked ampliconic genes, not just the direct homologues of the Y-linked ampliconic genes. We also showed that the net transcription level of the X amplicons stayed approximately constant across species despite an increase in copy number. We interpreted this as showing that the X linked genes are being selected to maintain functionality despite increasing postmeiotic repression.

      In your data we would therefore predict a signature of selection not just at the specific homologues of Y-linked ampliconic genes, but at many of the other X-ampliconic genes. This would confound attempts to detect selection by comparing the X-Y homologous genes to the rest of the chromosome.

      Similarly, a selective signature from the conflict may not be restricted to ampliconic genes. All we can predict is that as Sly repression increases, X- and Y-linked genes are forced to respond _in some way_. That does not only mean gene amplification. Any given gene could respond by an increase in copy number (more copies) - but it could also respond with an increase in promoter strength (more transcripts per copy), improved translation efficiency (more protein per mRNA molecule) or an increase in protein function (more functional activity per protein molecule).

      For example, there is a single Zfy gene in rat. In mouse this has become duplicated to give Zfy1 and Zfy2 (gene copy number change), Zfy2 has acquired a new spermatid-specific promoter (increased transcription from one gene copy), and Zfy2 has additionally become a stronger transcriptional activator (increased function per mRNA transcript). I can't prove (yet) that this is linked to the Slx/Sly conflict, but it looks to me like it may be.

      Whatever the form of response, if it was driven by selection, it should in principle leave some signature around many of the spermatid-expressed genes on the X. How does the analysis in figure S4 change if you look at the DNA surrounding all the spermatid-expressed genes on the X? Given that there are rather a lot of them(!) it may be that they all run into one and you won't be able to find a specific loss of diversity around each gene, just a loss of diversity across the X as a whole.

      If you do try this, you may need to treat spermatid-specific genes separately from genes expressed more widely. Widely-expressed genes will be constrained by the fact that increasing their activity in spermatids may also increase their expression in other cell types, however spermatid-specific genes will be freer to respond to the conflict. I think this is what's going on in Larson et al 2016a when they report that some genes show transcriptional alteration in pre-meiotic spermatogonia in the different species and F1 hybrids. I think what may have happened here is that some widely-expressed X-linked genes have been selected for stronger promoter activity to overcome Sly-mediated repression in spermatids. This keeps overall transcription reasonably constant in spermatids, but now leaves them overdosed in the spermatogonia.

      And finally (!)
      The potential selective signature of the Slx/Sly conflict may not be restricted to the sex chromosomes - there are also a few ampliconic autosomal loci that appear to be regulated by Slx and Sly. These include Speer genes (Cocquet et al 2009, 2012) and a block of genes on chromosome 14 (Larson et al 2016a, fig 4C). It might be that a look at these areas would show something interesting. Possibly it would even be easier to see a signature of selection here, since so far as I'm aware these are discrete blocks of genes rather than chromosome-wide regulatory effects.

    2. On 2016-12-23 13:38:50, user Peter Ellis wrote:

      What an absolutely fascinating paper.

      I have a few questions and comments - some of them likely quite naive as statistical genetics is not my area!

      ********************************

      Lines 167-176:
      You counted gene copies directly and confirmed the earlier finding that musculus has much higher copy numbers of Slx and Sly relative to domesticus (Fig. 5). However you also showed that the proportion of the red/blue/yellow amplicons is the same in each species (Fig. S3A), and that the domesticus Yq is if anything slightly larger on average than musculus Yq (Fig. S3B).

      How can these observations be squared with each other? If domesticus Yq is larger than musculus Yq, and they both have the same proportion of the red amplicon containing Sly - how can musculus have a larger copy number of Sly? I'm not sure what these different measurements are telling us.

      ********************************

      Lines 184-190:
      You looked for a signal of selection at the co-amplified loci on the X, but observed no reduction in genetic diversity surrounding the X ampliconic regions.

      Given that one mode (most likely mode?) of expansion of these clusters is by nonallelic homologous recombination, does this affect the calculation? It seems plausible that it would, since the same effective mutation - expansion of the cluster - can occur recurrently on different haplotypes and also spread horizontally between haplotypes by recombination within the gene cluster. This doesn't apply to males, and so the males would have a much greater reduction in diversity associated with selection on the amplicons.

      ********************************

      Lines 194-199:
      You find that for X and autosomal genes, there is more variation between tissues and less between species, compared to the Y chromosome. e.g. PC1 and PC2 for X+A genes represent tissue specificity whereas PC2 for Y genes represents species differences.

      To what extent is this due to Y genes being almost exclusively testis specific? When genes are expressed in multiple tissues, there's room for a lot of complexity, which will show up in the PCA analysis. When genes are expressed in only one tissue, one principal component is sufficient to encapsulate that fact, and so PC2 will necessarily relate to something else.
      What happens if you compare Y-linked genes to testis-specific (or spermatid-specific) genes on the X and autosomes? Does the Y still show up as having increased expression divergence between species? I suspect it will, but it would be nice to check.

      ********************************

      Lines 217 onwards:
      Yes, there's definitely more complexity here and it's not just a linear function of Slx:Sly ratio. In our original paper (Cocquet et al 2012) and the preceding shSLX knockdown paper, we found that knocking down Slx on its own didn't really affect X gene expression as a whole, although there was a sex ratio skew. It may be that there are thresholding effects - e.g. as long as you have "enough" Sly around to prevent Slx from accessing chromatin, then adding more Sly beyond that point won't affect X gene expression any more.

      Deficiency in the multicopy Sycp3-like X-linked genes Slx and Slxl1 causes major defects in spermatid differentiation.
      Cocquet J, Ellis PJ, Yamauchi Y, Riel JM, Karacs TP, Rattigan A, Ojarikre OA, Affara NA, Ward MA, Burgoyne PS.
      Mol Biol Cell. 2010 Oct 15;21(20):3497-505.

      Julie has also recently shown that SSTY proteins interact with all the Slx/Slxl1/Sly family and may affect their ability to enter the nucleus - so not only do Slx/Slxl1/Sly likely compete for binding to particular chromatin sites, they may also compete for some factor that transports them into the nucleus.

      SSTY proteins co-localize with the post-meiotic sex chromatin and interact with regulators of its expression.
      Comptour A, Moretti C, Serrentino ME, Auer J, Ialy-Radio C, Ward MA, Touré A, Vaiman D, Cocquet J.
      FEBS J. 2014 Mar;281(6):1571-84. doi: 10.1111/febs.12724.

      ********************************

      Lines 248-251 and Figure 7B:
      What is the gene copy number in the lines used for the DXD and MXM crosses? Given that you've now documented extensive variability within as well as between species, I think it would be useful to include this information in the figure.

      What was the denominator for the Slx/y family here? Did you count both Slx and Slxl1, or just Slx? Does the interpretation change if you use just one or the other? It's not clear to me that we yet know whether Sly is directly competing with Slx, Slxl1 or both.

      It might also be interesting to normalise the activity for the gene copy number in each case. Slx and Sly have a fundamental difference in that (if the underlying hypothesis is true that Slx promotes expression from the sex chromosomes and Sly represses it), Slx has a positive feedback on itself, while Sly has a negative feedback on itself.

      Thus a comparatively small change in Slx copy number could have a disproportionately large effect, while a large change in Sly copy number will be "buffered" by the negative feedback. In the 2/3 Yq deletion mice, expression of Yq genes drops by less than 50%, since each copy is transcribed at an intrinsically higher level. I suspect this may be a contributory factor to the sheer size of the Yq amplicons - a small amplification on the X triggers a much greater degree of amplification on the Y, because the Y has to fight through the fog of its own negative feedback.

      ********************************

      Lines 308-310
      Do you mean to say that you cannot detect a signature of sex ratio skewing, or that you can definitively rule out sex ratio skewing? In a conflict scenario such as the one hypothesised, then the historical situation could well be one of constant change - sometimes the X has the upper hand and the sex ratio is female biased, sometimes the other way round. Would that not obscure the signature of any particular episode of skewing?

      ********************************

      Lines 370-371
      You say, "Sex-ratio distortion has been observed in the offspring of males with X:Y copy-number mismatch in some experiments (Cocquet et al., 2009; Case et al., 2015) but not in others (Turner et al., 2012; Albrechtová et al., 2012)."

      The Turner paper did show some effects in the predicted direction. From their Table 2:
      Domesticus offspring = 31/59 = 52.5% female
      Hybrid domesticus (with domesticus Y) = 58/105 = 55.2% female
      Hybrid musculus (with musculus Y) = 88/206 = 42.7% female
      Musculus = 42/76 = 55.3% female

      With low numbers in each group, these differences are not all significant (power calculation for a 10% skew requires 400 in each group for 80% power), but I certainly don't think it can be ruled out, particularly since they didn't explicitly break down the groups based on the proportion of the X chromosome coming from the introgressed background in each case, i.e. their hybrid groups may include animals where only autosomal loci have introgressed and the sex chromosomes are congruent. Indeed, their own conclusion was that "the trend in our data is consistent with a sex ratio distorter on the musculus Y which is effective only on a partially domesticus background".
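      For readers who want to check the arithmetic, here is a rough sketch of the percentages and the power calculation. It makes assumptions the comment does not state: that a "10% skew" means 50% versus 60% female offspring, that the test is a two-sided two-proportion z-test at alpha = 0.05, and that the usual normal-approximation formula is adequate. The function name `n_per_group` is introduced here for illustration.

      ```python
      import math
      from statistics import NormalDist


      def n_per_group(p1, p2, alpha=0.05, power=0.80):
          """Approximate per-group sample size for a two-sided two-proportion z-test."""
          z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
          z_beta = NormalDist().inv_cdf(power)
          # Sum of the binomial variances of the two groups (unpooled form).
          variance = p1 * (1 - p1) + p2 * (1 - p2)
          return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


      # (female, total) offspring counts as quoted above from Turner et al., Table 2.
      groups = {
          "domesticus": (31, 59),
          "hybrid domesticus Y": (58, 105),
          "hybrid musculus Y": (88, 206),
          "musculus": (42, 76),
      }
      percent_female = {name: round(100 * f / n, 1) for name, (f, n) in groups.items()}

      # Assumed reading of a "10% skew": 50% vs 60% female.
      required = n_per_group(0.50, 0.60)

      for name, pct in percent_female.items():
          print(f"{name}: {pct}% female (n = {groups[name][1]})")
      print("animals needed per group:", required)
      ```

      Under these assumptions the requirement comes out a little under 400 per group, in line with the figure quoted, and well above any of the group sizes in the table.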

      The Albrechtova paper made no measurements of sex ratio and I'm not sure why you're citing it here. They looked at sperm counts and sperm velocity in an area with an introgressed Y which is already known to affect sex ratio, and found that,

      "In the section of the HMHZ we studied, the YMUS chromosome has introgressed across the zone in apparent disregard of Haldane's rule and this introgression is associated with a shift in the sex ratio in favour of males [6]. In the current study, we find that in the presence of the invading Y chromosome the most extreme reduction of SC in hybrid individuals is more than rescued, to the extent that an apparently domesticus male with the introgressed YMUS chromosome is expected to have higher SC than one with its consubspecific Y."

      i.e. introgression of the musculus Y is favoured because it rescues adverse sperm phenotypes in hybrid males. Their reference 6 is to the following paper, which is relevant and should be cited in your paper.

      Macholán M., Baird S. J. E., Munclinger P., Dufková P., Bímová B., Piálek J. 2008. Genetic conflict outweighs heterogametic incompatibility in the mouse hybrid zone? BMC Evol. Biol. 8, 271–284

      ********************************

      Lines 378-383
      Here, there are two independent deletions of ~2/3 of Yq that should be cited - the one from Conway et al that you already have, which arose on an RIII background, and also one from Josefa Styrna that arose on a B10.BR background. The paper from Macholán et al is also probably best mentioned here as a "real world" example of sex ratio alteration associated with Y chromosome introgression.

      Influence of partial deletion of the Y chromosome on mouse sperm phenotype.
      Styrna J, Klag J, Moriwaki K.
      J Reprod Fertil. 1991 May;92(1):187-95.

      Regarding the paper by Fischer et al on C57Bl/6JBomTac, all they say in the paper is that there are no reports of sperm abnormalities or sex ratio skewing in this line. So far as I'm aware, nobody's looked yet, so this is certainly worth checking. I don't think we can assume anything from the current absence of evidence, though.

    1. On 2016-09-28 08:27:01, user Gordana Rasic wrote:

      Gordana Rašić
      In the spirit of open science, we are sharing the reviewers' comments on this paper and our responses.

      Enjoy!

      Reviewer #1: This is a straightforward, clear presentation. It addresses an important issue and the conclusions are supported by the data. Thus I have no problem recommending it be published.

      One sentence, however, confused me (line 255-258). I do not believe the cited paper showed Aaa and Aaf are "one genetic cluster". More accurately, that sentence could read: "At least in one locality in Africa (Senegal) the two established subspecies Aaa and Aaf are integrating with no sign of genetic subdivision when brought into sympatry, so it is not surprising....."

       We thank the reviewer for the overall positive assessment of our work. We agree that the stated sentence is confusing (line 255-258), and we have changed it following the reviewer’s suggestion (line 265-269).

      Might also cite Tsuda et al. Japan Society of Medical Entomology and Zoology 54:73 (2003).

       As per reviewer’s recommendation, we now cite the paper by Tsuda et al. (line 236-238).

      Reviewer #2: This manuscript investigates to what extent worldwide disseminated domestic Ae. aegypti specimens morphologically identified as the pale variety queenslandensis and the type form from Australia and Singapore are reproductively isolated ("how freely they interbreed"). A total of 74 sympatric pale and type Ae. aegypti were genotyped for a 1170 bp-long mitochondrial sequence and 16,569 nuclear SNPs.
      Although I am not an expert on Aedes taxonomy, I have identified a few issues that should be better explained/corrected before this manuscript is considered as publishable material.

      1. Published references are cited in a very loose manner. Sometimes even with disregard to their original meaning (i.e. Powell & Tabachnick 2013).

       While we appreciate reviewer’s critical assessment of our work, we cannot agree with this conclusion and are not able to address it without a concrete example of what is being disregarded.

      Our citation of, for example, the Powell & Tabachnick (2013) paper refers to their statement regarding McClelland’s 1967 conclusion that the classical subdivision within Aedes aegypti is a gross oversimplification and that “...Aedes aegypti cannot be split into definite interspecific entities...” Powell & Tabachnick (2013) conclude on page 16, paragraph 1: “...In the 45 years since, this advice has often been ignored, even in recent times”.

      Our reference (underlined) to the Powell & Tabachnick conclusion (above) states:
      ...“McClelland [7] suggested that subdivision into forms seems oversimplistic and should be abandoned unless correlation between genetic and color variation can be demonstrated [7]. His recommendations have been largely disregarded [9] despite the fact that multiple genetic marker systems (allozymes, microsatellites, nuclear and mitochondrial SNPs) have failed to find a clear differentiation between forms and markers [10][11][12].” (page 4, line 82-87).

      We have now changed this sentence to: “In their latest review of Ae. aegypti history, Powell and Tabachnick [9] point out that McClelland’s recommendations have often been ignored for the past 45 years...”, to further clarify the context for this citation (line 88-89).

      2. Chan et al (2014) did not consider Ae. aegypti aegypti and Ae. aegypti queenslandensis as separate entities.

       This is correct and we did not attempt to argue that Chan et al. (2014) considered them as separate entities. We said that their finding of a relatively high mitochondrial divergence between the two forms “..., although lower than a commonly adopted threshold of 3% for species delineation in insects [14], suggests that the two forms may not freely interbreed. ” (line 94-97). Hence, we decided to further test this hypothesis.

      3. No significant differences in oral infection of DENV-2 between pale (Ae. aegypti queenslandensis) and dark (Ae. aegypti aegypti) were ever observed (Wasinpiyamongkol et al. 2003).

       That is correct, but nowhere in the text do we argue to the contrary. In fact, the findings of Wasinpiyamongkol et al. (2003) further support our conclusions and we have now included this citation (line 268-269).

      4. The taxonomy of the variation seen within Ae. aegypti, as presented, is flawed and incomplete.
      I feel that the scientific issue selected to be addressed has not been properly defined or characterized.

       It is a little hard to respond to this. Our work was not intended to present a taxonomic description of phenotypic variation, and we followed the well-established color/scaling criteria of Mattingly and McClelland (described in the text). The focus of our work was to test if the two forms are genetically distinct using the mtDNA and nuclear SNP variation.

      Reviewer #3: An interesting and useful contribution to understanding of Aedes aegypti population biology / genetics, and the data presented further allay any potential concerns that deployment of Wolbachia or RIDL-based control may be stymied by mating barriers, at least in Asia or Latin America. The study seems well conducted and methodologically sound.

       We thank the reviewer for the overall positive assessment of our work.

      85 'his recommendations have been largely disregarded' - I don't think this is really true - it is not a widely held view among contemporary mosquito biologists that Ae. aegypti outside of Africa should be divided into forms based on colour or exist as reproductively isolated sub-populations, especially given several recent population genetic papers providing evidence to the contrary. The Chan paper seems an exception in this respect; perhaps the anomalous results in that paper may have been a result of collections conducted over a period of a number of years.

       Please see above our explanation of this statement (citation) and the rationale for further testing of the Chan et al (2014) findings.

      It might be useful in intro or discussion to give a little more information on the putative queenslandensis form, e.g. Mediterranean populations were recorded as belonging to this light form prior to their eradication, and possible behavioural / oviposition differences.

       As per reviewer’s recommendations, we have added more information on the Mediterranean light form in the Introduction (line 81-84).

      268-9 do the light colour variants ever arise in the lab populations used for release? (which will have been outcrossed with wild material).

       The light color individuals indeed show up (albeit very rarely) in our laboratory populations originating from the release areas. We have now added this statement to Discussion (line 239-242).

      Table 1 - suggest move to online.

       Done.

      Minor
      68 replace urged with e.g. caused

       Replaced with “motivated” (line 68).

      79 Ae.

       Corrected (line 79).

      81 aegypty

       Corrected (line 81).

      201 Moore not More; refs 10 & 35 same

       Corrected (line 204) and removed a duplicate reference (ref. 35).

      282 aegupti

       Corrected (line 292).

      references e.g. some paper titles in title case, some lowercase

       Corrected throughout.

      End of comments.

    1. On 2016-07-15 09:05:13, user Adam Eyre-Walker wrote:

      We have known since Seglen’s seminal paper in the 1990s that the distribution of the number of citations for papers published in a journal is highly skewed, that there is considerable overlap in the citation distribution between journals and that there is a poor correlation between the number of citations a paper receives and the journal IF. These observations have been used to suggest that the journal IF should not be used to assess the merit or quality of a particular paper. Usually it is suggested that either the paper is read or that article level metrics are used to assess the merit or quality.

      Reading the paper may be considered the gold standard but it is impractical in many circumstances in which one is interested in assessing merit; if for example, you have 100 CVs to look through, you can’t possibly read all their papers, or even the best three. Even the papers of those on the shortlist may be too many and you may not be an expert in the field under consideration.

      As for citations, as all researchers know, articles are cited for all sorts of reasons, often incorrectly. The only quantitative analysis I know of concluded that the vast majority of the variation in the number of citations a paper receives is just noise, and has nothing to do with the underlying merit of the paper (http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1001675). I suspect the same is true for other article level metrics.

      I find there is a strange disconnect in arguments about the IF. The journal IF must contain some information about the merit of the papers published in a journal because we, the scientific community, are the ones that determine where things get published and what gets cited. We don’t publish any old paper in Nature and Science; we publish what we believe is the best and most interesting science. Now sometimes, maybe even often, we will get this wrong, but an informed decision is made to publish a paper in a particular journal. In a sense all the IF represents is someone else’s opinion about the merit of a paper. I think this might be one of the reasons people are uncomfortable with the IF along with the fact that the IF is clearly subject to error as a measure of merit. However, all measures of merit are subject to error and there is no evidence that the IF is any worse (http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1001675). I’m not suggesting that the IF should be used blindly to assess papers and researchers, but suggesting that it contains little or no information about the merit of a paper seems illogical to me.

    1. On 2016-03-18 13:55:54, user Fabien Campagne wrote:

      Interesting visualization work. In addition to the stated aim, and from my point of view potentially as important, is the visualization of workflows under execution. Developing workflows would be helped by looking at such plots annotated with timing info, or success/failure conditions, because the workflow may not work right away and better development tools would make the process easier. I think aggregation of provenance data, if error conditions are captured, would be very useful as a workflow development and debugging help.

      If this is of interest, please contact me, we are looking for good ways to visualize workflows as they are executing/being developed. See GobyWeb (http://arxiv.org/abs/1211.6666) and its successor, NextflowWorkbench (http://biorxiv.org/content/early/2016/02/24/041236).

    1. On 2016-02-25 00:05:06, user Meru Sadhu wrote:

      Thank you, David, for the kind words and comments. We agree that the most immediate applications of the CRISPR-based recombination mapping will be in unicellular organisms and cell culture. We also think the method holds a lot of promise for research in multicellular organisms, although we did not mean to imply that it “will be an efficient mapping method for all multicellular organisms”. Every organism will have its own set of constraints as well as experimental tools that will be relevant when adapting a new technique. To best help experts working on these organisms, here are our thoughts on your questions.

      You asked about mutagenesis during recombination. We Sanger sequenced 72 of our LOH lines at the recombination site and did not observe any mutations, as described in the supplementary materials. We expect the absence of mutagenesis is because we targeted heterozygous sites where the untargeted allele did not have a usable PAM site; thus, following LOH, the targeted site is no longer present and cutting stops. In your experiments you targeted sites that were homozygous; thus, following recombination, the CRISPR target site persisted, and continued cutting ultimately led to repair by NHEJ and mutagenesis.

      As to the more general question of the optimal mapping strategies in different organisms, they will depend on the ease of generating and screening for editing events, the cost and logistics of maintaining and typing many lines, and generation time, among other factors. It sounds like in Drosophila today, your related approach of generating markers with CRISPR, and then enriching for natural recombination events that separate them, is preferable. In yeast, we’ve found the opposite to be the case. As you note, even in Drosophila, our approach may be preferable for regions with low or highly non-uniform recombination rates.

      Finally, mapping in sterile interspecies hybrids should be straightforward for unicellular hybrids (of which there are many examples) and for cells cultured from hybrid animals or plants. For studies in hybrid multicellular organisms, we agree that driving mitotic recombination in the early embryo may be the most promising approach. Chimeric individuals with mitotic clones will be sufficient for many traits. Depending on the system, it may in fact be possible to generate diploid individuals with uniform LOH genotype, but this is certainly beyond the scope of our paper. The calculation of the number of lines assumes that the mapping is done in a single step; as you note in your earlier comment, mapping sequentially can reduce this number dramatically.

    2. On 2016-02-20 22:58:29, user David Stern wrote:

      This is a lovely method and should find wide applicability in many settings, especially for microorganisms and cell lines. However, it is not clear that this approach will be, as implied by the discussion, an efficient mapping method for all multicellular organisms. I have performed similar experiments in Drosophila, focused on meiotic recombination, on a much smaller scale, and found that CRISPR-Cas9 can indeed generate targeted recombination at gRNA target sites. In every case I tested, I found that the recombination event was associated with a deletion at the gRNA site, which is probably unimportant for most mapping efforts, but may be a concern in some specific cases, for example for clinical applications. It would be interesting to know how often mutations occurred at the targeted gRNA site in this study.

      The wider issue, however, is whether CRISPR-mediated recombination will be more efficient than other methods of mapping. After careful consideration of all the costs and the time involved in each of the steps for Drosophila, we have decided that targeted meiotic recombination using flanking visible markers will be, in most cases, considerably more efficient than CRISPR-mediated recombination. This is mainly due to the large expense of injecting embryos and the extensive effort and time required to screen injected animals for appropriate events. It is both cheaper and faster to generate markers (with CRISPR) and then perform a large meiotic recombination mapping experiment than it would be to generate the lines required for CRISPR-mediated recombination mapping. It is possible to dramatically reduce costs by, for example, mapping sequentially at finer resolution. But this approach would require much more time than marker-assisted mapping. If someone develops a rapid and cheap method of reliably introducing DNA into Drosophila embryos, then this calculus might change.

      However, it is possible to imagine situations where CRISPR-mediated mapping would be preferable, even for Drosophila. For example, some genomic regions display extremely low or highly non-uniform recombination rates. It is possible that CRISPR-mediated mapping could provide a reasonable approach to fine mapping genes in these regions.

      The authors also propose the exciting possibility that CRISPR-mediated loss of heterozygosity could be used to map traits in sterile species hybrids. It is not entirely obvious to me how this experiment would proceed and I hope the authors can illuminate me. If we imagine driving a recombination event in the early embryo (with maternal Cas9 from one parent and gRNA from a second parent), then at best we would end up with chimeric individuals carrying mitotic clones. I don't think one could generate diploid animals where all cells carried the same loss of heterozygosity event. Even if we could, this experiment would require construction of a substantial number of stable transgenic lines expressing gRNAs. Mapping an ~20Mbp chromosome arm to ~10kb would require on the order of two-thousand transgenic lines. Not an undertaking to be taken lightly. It is already possible to perform similar tests (hemizygosity tests) using D. melanogaster deficiency lines in crosses with D. simulans, so perhaps CRISPR-mediated LOH could complement these deficiency screens for fine mapping efforts. But, at the moment, it is not clear to me how to do the experiment.
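      The line-count estimate above, and the remark in both comments that sequential mapping can reduce it, can be checked with a short calculation. This is a minimal sketch: the function names and the sequential model (each round uses `fold` lines to narrow the current interval `fold`-fold) are illustrative assumptions, not a protocol described by either commenter.

```python
import math


def lines_single_step(region_bp, resolution_bp):
    """Lines needed if one targeted site is placed every `resolution_bp` across the region."""
    return math.ceil(region_bp / resolution_bp)


def lines_sequential(region_bp, resolution_bp, fold):
    """Lines and rounds needed under an ASSUMED sequential scheme.

    Assumption (hypothetical model): every round uses `fold` lines and
    narrows the candidate interval by a factor of `fold`.
    Returns (total_lines, rounds).
    """
    if fold < 2:
        raise ValueError("fold must be at least 2")
    interval = region_bp
    rounds = 0
    while interval > resolution_bp:
        interval = interval / fold
        rounds += 1
    return rounds * fold, rounds


# ~20 Mbp chromosome arm mapped to ~10 kb, as in the comment above.
single = lines_single_step(20_000_000, 10_000)
sequential, rounds = lines_sequential(20_000_000, 10_000, fold=10)
print(single, sequential, rounds)
```

      Under these assumptions the single-step design needs about 2,000 lines, matching the estimate in the comment, while a ten-fold-per-round sequential design needs about 40 lines spread over four rounds. The saving in lines is paid for in additional rounds, which is the time cost the comment raises.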

    1. On 2016-02-24 21:32:08, user Fabien Campagne wrote:

      My lab developed the Goby framework, which you included in the benchmark.

      Could you clarify which command line options you used when running each tool for these comparisons?

      For Goby, you need to know that default options are equivalent to GZIP compression. They are not the state of the art approaches that we published in Campagne et al PLOS 2013. If you want these, you need to activate them (see command line flags described in our paper).

      On page 4, you write "Goby were run with Java v1.7. All were run with default parameters", so I think you may have benchmarked against the GZIP codec.

      The data you present seem to suggest this as well, since our prior evaluations comparing CRAM and Goby found a large compression efficiency difference for Goby on RNA-Seq reads (of course, it is possible CRAM has made major progress since we conducted our benchmark).

    1. On 2015-12-17 23:02:04, user Jon Brock wrote:

      Thanks for sharing this. I'm really glad that autism researchers are starting to (a) look at cognitive heterogeneity; and (b) use preprint servers!

      Some comments:

      First, this is a bugbear of mine, but I find it quite unhelpful to talk about the RMET as a measure of mentalizing. In truth, it's a (relatively difficult) 4AFC test of emotion recognition. We can argue about whether learning the meanings of certain emotion words used in the test is contingent on having a fully functioning "theory of mind". But it's clear that the RMET is measuring something very different to other "mentalizing" tests in which the participant infers mental states based on the protagonist's behaviour and/or events that are witnessed or described.

      Second, I agree that there's potentially useful information at the item level that is lost by just totting up the number of correct items. But it's not clear to me that your study is demonstrating this to be true. In other words, what does subdividing the ASD group into "impaired" and "unimpaired" subgroups based on the clustering algorithm tell us that we wouldn't get by subdividing them according to some cut-off based on raw score? We learn that the "unimpaired" group have higher overall scores and higher VIQs, but we kind of know that already.

      Third, related to the previous point, you show that a classifier trained on your subgroups in one dataset does a good job of predicting subgroup in an independent dataset; but how much of this "replicability" is driven by differences in overall performance? It would be helpful to get some more explicit details of what went into the classifier, but I assume that it's essentially providing a threshold on a weighted sum of all the items in the test. You've already shown that your subgroups (on which the classifier is trained) differ in overall performance (ie the unweighted sum of all the items). So it would be pretty odd if the classifier *didn't* perform well in a replication sample where subgroups also differed in overall performance. Indeed, in the TD group, where there aren't huge differences in overall performance, the classifier doesn't translate to the replication sample.

      Hopefully my comment will help you clarify the article. I really like the approach of digging into the item-level data. At the very least I think it tells us something useful about the structure of the RMET - and which items are discriminating well between people who do versus do not have difficulties with labelling complex emotions. I'm just not convinced (yet) of some of the bolder claims you're making!

      Finally, some references you may find useful:

      Roach, N. W., Edwards, V. T., & Hogben, J. H. (2004). The tale is in the tail: An alternative hypothesis for psychophysical performance variability in dyslexia. Perception, 33(7), 817-830.

      Towgood, K. J., Meuwese, J. D., Gilbert, S. J., Turner, M. S., & Burgess, P. W. (2009). Advantages of the multiple case series approach to the study of cognitive deficits in autism spectrum disorder. Neuropsychologia, 47(13), 2981-2988.

      Brock, J. (2011). Commentary: complementary approaches to the developmental cognitive neuroscience of autism – reflections on Pelphrey et al. (2011). Journal of Child Psychology and Psychiatry, 52(6), 645-646.

    1. On 2015-11-23 15:28:13, user Philippe Fort wrote:

      Yes, interesting story. Your paper is very exhaustive in the analysis of the p53 retrogene family in Proboscideans.

      I myself looked at these pseudogene sequences several years ago in the elephant genome and found that they had a much higher dN/dS ratio than the active gene, suggesting that they had no particular role in the cell metabolism. However, this was a global analysis and did not explore the possibility that only part of the protein may be important and that a single retrogene may be expressed and be under selection. So nice job for the identification of all copies in Proboscideans.

      Nevertheless, I think your paper needs more robustness on the biological role of TAP53RTG12, since a major experiment is missing and the most important figures are not totally convincing.

      - The missing experiment should answer: "Does knocking down TAP53RTG12 in elephant dermal cells reduce mitomycin D sensitivity?" (By the way, Figure 6, which shows hypersensitivity to DNA damage, is not included in the PDF.)
      - Figures 4B and 7: Could you explain which data are shown in Figure 4B? Besides, I was expecting a graph showing gene copy number vs body mass (in the present panel, we don't know which samples are paired!). Figure 7 is not clearly explained. Since the data are dose-responses, it would be better to treat them as such (non-linear fit and F-test).

      It is not clear to me why drug doses at which TAP53RTG12-expressing cells display a maximal p53 response elicit such a small effect on caspase activation (even if it is statistically significant, is it biologically relevant?).

    1. On 2015-07-16 04:41:22, user Michael Eisen wrote:

      Vale has put his finger on an important problem. The process of publication has far too great an influence on the way we do science, let alone communicate it. And it would be great if we all used preprint servers and strived to publish work faster and in a less mature form than we currently do. I am very, very supportive of Vale’s quest (indeed it has been mine for the past twenty years) – if it is successful, the benefits to science and society would be immense.

      However, in the spirit of the free and open discussion of ideas that Vale hopes to rekindle, I should say that I didn’t completely buy the specific arguments and conclusions of this paper.

      My first issue is that the essay misdiagnoses the problem. Yes, it is bad that we require too much data in papers, and that this slows down the communication of science and the progress of people’s careers. But this is a symptom of something more fundamental – the wildly disproportionate value we place on the title of the journal in which papers are published rather than on the quality of the data or its ultimate impact.

      If you fixed this deeper problem by eliminating journals entirely and moving to a system of post-publication review, it would remove the perverse incentives that produce the effects Vale describes. However, Vale proposes a far more modest solution – the use of pre-print servers. The odd thing with this proposal, as Vale admits, is that pre-print servers don’t actually solve the problem of needing a lot of data to get something published. It would be great for all sorts of reasons if every paper were made freely available online as early as possible – and I strongly support the push for the use of pre-print servers. But Vale’s proposal seems to assume that the existing journal hierarchy would remain in place, and that most papers would ultimately be published in a journal. And this wouldn’t fundamentally alter the set of incentives to journals and authors that has led to the problems Vale writes about. To do that you have to strip journals of the power to judge who is doing well in science – not just have them render that decision after articles are posted in a pre-print server. Unless the rules of the game are changed, with hiring, funding and promotion committees looking at quality instead of citation, universal adoption of pre-print servers will both be harder to achieve and have a limited effect on the culture of publishing.

      Indeed, I would argue that we don’t need “pre-print” servers. What we need is to treat the act of posting your paper online in some kind of centralized server as the primary act of publication. Then it can be reviewed for technical merit, interest and importance starting at the moment it is “published” and continuing for as long as people find the paper worth reading.

      Giving people credit for the impact their work has over the long-term would encourage them to publish important data quickly, and to fill in the story over time, rather than wait for a single “mature” paper. Similarly, rather than somewhat artificially create a new type of paper to publish “key findings” I think people will naturally write the kind of paper Vale wants if we change the incentives around publication by destroying the whole notion of “high-impact publications” and the toxic glamour culture that surrounds it.

      Another concern I have about Vale’s essay is that he bases his argument for pre-print servers on a set of data analyses that, while interesting, I didn’t find compelling. I think I get what Vale’s doing. He wants to promote the use of pre-print servers, and realizes that there is a lot of resistance. So he is trying to provide data that will convince people that there are real problems in science publishing so that they will endorse his proposals. But by basing calls for change on data, there is the real risk that other people will also find the data less than compelling and will dismiss Vale’s proposed solutions as unnecessary as a result, when in fact the things Vale proposes would be just as valuable even if all the data trends he cites weren’t true.

      So let’s delve into the data a bit. First, in an effort to test the widely held sentiment that the amount of data required for a paper has increased over time, he attempted to compare the amount of data contained in papers published in Cell, Nature and JCB during the first six months of 1984 and of 2014 (it’s not clear why he chose these three journals).

      The first interesting observation is that the number of biology papers published in Nature has dropped slightly over thirty years, and the number of papers published in JCB has dropped in half (presumably as the result of increased competition from other journals). To quantify the amount of data a paper contained, Vale analyzed figures in each of the papers. The total number of figures per paper was largely unchanged (a product, he argues, of journal policies), but the number of subpanels in each figure went up dramatically – two to four-fold.

      I am inclined to agree with him, but it is worth noting that there are several alternative explanations for these observations.

      As Vale acknowledges, practices in data presentation could have changed, with things that used to be listed as “data not shown” now being presented in figures. I would add that maybe the increase in figure complexity reflects the fact that it is far easier to make complex figures now than it was in 1984. For example, when I did my graduate work in the early 1990’s it was very difficult to make figures showing aspects of protein structure. Now it is simple. Authors may simply be more inclined to make relatively minor points in a figure panel now because it’s easier.

      A glance at any of these journals will also tell you that the complexity of figures varies a lot from field to field. Developmental biologists, for example, seem to love figures with ten or twenty subpanels. Maybe Cell, Nature and JCB are simply publishing more papers from fields where authors are inclined to use more complex figures.

      Finally, the real issue Vale is addressing is not exactly the amount of data included in a paper, but rather the amount of data that had to be collected to get to the point of publishing a paper. It’s possible that authors don’t actually spend more time collecting data, but that they used to leave more data “in the drawer”.

      The real point is that it’s really hard to answer the question of whether papers now contain more data than they used to. And it’s even harder to determine whether the amount of data required to get a paper published is more or less of an obstacle now than it was thirty years ago.

      I think I understand why Vale did this analysis. His push to reform science publishing is based on a hypothesis – that the amount of data required to publish a paper has increased over time – and, as a good scientist, he didn’t want to leave this hypothesis untested. However, I would argue that differences between 1984 and today are irrelevant. Making it easier to publish work, and giving people incentives to publish their ideas and data earlier, is simply a good idea – and would be equally good even if papers published in 1984 required more data than they do today.

      Vale goes on to speculate about why papers today require more data, and chalks it up primarily to the increased size of the biomedical research community, which has increased competition for coveted slots in high-ranking journals while also increasing the desire for such publications, and that this has allowed journals to be even more selective and to put more demands on authors. (It’s really quite interesting that the number of papers in Cell, Nature and (I assume) Science has not increased in 30 years even as the community has grown).

      This certainly seems plausible, but I wonder if it’s really true. I wonder if, instead, the increase in expectations of “mature” work has to do with the maturation of the fields in question. Nature has pretty broad coverage in biology (although its coverage is by no means uniform), but Cell and JCB both represent fields (molecular biology and cell biology) that were kind of in their infancies, or at least early adolescences, 30 years ago. And as fields mature, it seems quite natural for papers to include more data, and for journals to have higher expectations for what constitutes an important advance. You can see this happening over much shorter timeframes. Papers on the microbiome for example used to contain very little experimental data – often a few observations about the microbial diversity of some niche – but within just a few years, expectations for papers in the field have changed, with the papers getting far more data-dense. It would be interesting to repeat the kind of analysis Vale did, but to try and identify “new” fields (whatever that means), and see whether fields that were “new” in 2014 have papers of similar complexity to “new” fields in 1984.

      The second bit of data Vale produced is on the relationship between publications and the amount of time spent in graduate school. Using data from UCSF’s graduate program, he found that current graduate students “published fewer first/second author papers and published much less frequently in the three most prestigious journals.” The average time to a first-author paper for UCSF students in the 80’s was 4.7 years, and now it is 6.0. And the number of students with Science, Nature or Cell papers has fallen in half.

      Again, one could pick this analysis apart a bit. Even if you accept the bogus notion that SNC publications are some kind of measure of quality, there are more graduate students both in the US and elsewhere, but the number of slots in those journals has remained steady. Even if criteria for publication were unchanged over time, one would have expected the number of SNC papers for UCSF graduate students to have gone down simply because of increased competition. If SNC papers are what these students aspire to (which is probably sadly largely true) then it makes sense that they would spend more time trying to make better papers that will get into these journals. It’s not clear to me that this requires that papers have more data, but rather that they have better data. But either way, one could look at this and argue that the problem isn’t that we need new ways of publishing, but rather that we need to stop encouraging students to put their papers into SNC. I suspect that all of the trends Vale measures here would be reversed if UCSF faculty encouraged all of their graduate students to publish all of their papers in PLOS ONE.

      One could also argue that the trends reflect not a shift in publishing, but rather a degradation in the way we train graduate students. In my experience most graduate student papers reflect data that was collected in the year preceding publication. Maybe UCSF faculty, distracted perhaps by grant writing, aren’t getting students to the point where they do the important, incisive experiments that lead to publication until their fifth year, instead of their fourth.

      And again, while the time to first publication has increased dramatically in the last 30 years, it’s hard to point to 1984 as some kind of Golden Age. That typical students back then weren’t publishing at all until the end of their fifth year in graduate school is still bad.

      So, in conclusion, I think there is a lot to like in this essay. Without explicitly making this point, the observations, data and discussion Vale presents make a compelling case that publishing is having a negative impact on the way we do science and the way we train the next generation. I have some issues with the way he has framed the argument and the degree of conservativeness in his solutions. But I think Vale has made an important contribution to the now decades-old fight to reform science publishing, and we would all be better off if we heeded his advice.

    2. On 2015-07-15 21:25:25, user Stephen Curry wrote:

      This is an excellent contribution to the live and ongoing debate about the problems in scholarly publishing. The idea of preprints is not new, though it’s a fairly late arrival in the life sciences, incarnated here in bioRxiv and in PeerJ Preprints.

      In the UK the Royal Society (think 'National Academy') held a two-part meeting in April and May of this year to discuss the Future of Scholarly Scientific Communications (https://royalsociety.org/ev...). It covered a broad range of issues (peer review, research assessment, reproducibility, fraud, the journal article and publisher profits) but on a surprising number of occasions the debate circled back to the problem of perverse incentives, the most notable one being the hold of the impact factor on people's careers. This retards science and encourages a spectrum of fraudulent behaviours as researchers strive to get into the 'best' journals. The wider adoption of preprints would clearly help to mitigate some of the worst effects by tapping into the nascent culture of openness and by enabling much more rapid dissemination of results than at present. (See my digest of the meeting here: http://occamstypewriter.org...)

      The adoption of preprints represents a cultural shift, to be sure, but if the physicists can manage it, there's no good reason for life scientists not to be able to follow suit! We need to start rewarding people for publishing stuff quickly – and for participating in the open commentary that preprints invite.

  2. Apr 2026
    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      In this important work, it is demonstrated that certain high-resolution cryo-EM structures can be obtained by using concentrated cell extracts without purification. The compelling results with the mammalian ribosomes demonstrate the utility of this approach for this molecule and complexes with elongation factor 2. Moreover, this work also demonstrates the utility of 2D template matching for particle picking for structure determination by single-particle averaging pipelines.

      We thank the reviewers for their valuable comments and suggestions, which have helped us to improve the manuscript. We provide a response to the referees’ comments below.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Seraj et al. introduces a transformative structural biology methodology termed "in extracto cryo-EM." This approach circumvents the traditional, often destructive, purification processes by performing single-particle cryo-EM directly on crude cellular lysates. By utilizing high-resolution 2D template matching (2DTM), the authors localize ribosomal particles within a complex molecular "crowd," achieving near-atomic resolution (~2.2 Å). The biological centerpiece of the study is the characterization of the mammalian translational apparatus under varying physiological states. The authors identify elongation factor 2 (eEF2) as a nearly universal hibernation factor, remarkably present not only on non-translating 80S ribosomes but also on 60S subunits. The study provides a detailed structural atlas of how eEF2, alongside factors like SERBP1, LARP1, and IFRD2, protects the ribosome's most sensitive functional centers (the PTC, DC, and SRL) during cellular stress.

      Strengths:

      The "in extracto" approach is a significant leap forward. It offers the high resolution typically reserved for purified samples while maintaining the "molecular context" found in in situ studies. This addresses a major bottleneck in structural biology: the loss of transiently bound or labile factors during biochemical purification.

      The finding that eEF2 binds and sequesters 60S subunits is a major biological insight. This suggests a "pre-assembly" hibernation state that allows for rapid mobilization of the translation machinery once stress is relieved, which was previously uncharacterized in mammalian cells.

      The authors successfully captured eIF5A and various hibernation factors in states that are traditionally disrupted. The identification of eIF5A across nearly all translating and non-translating states highlights the power of this method to detect ubiquitous but weakly bound regulators.

      The manuscript beautifully illustrates the "shielding" mechanism of the ribosome. By mapping the binding sites of eEF2 and its co-factors, the authors provide a clear chemical basis for how the cell prevents nucleolytic cleavage of ribosomal RNA during nutrient deprivation.

      Weaknesses:

      (1) While 2DTM is a powerful search tool, it inherently relies on a known structural "template." There is a risk that this methodology may be "blind" to highly divergent or novel macromolecular complexes that do not share sufficient structural similarity with the search model. The authors should discuss the limitations of using a vacant 60S/80S template in identifying highly remodeled stress-induced complexes. For instance, what happens if an empty 40S subunit is used as a template? In the current work, while 60S and 80S particles are picked, none are 40S. The authors should comment on this.

      Thank you for your comment. As noted by the reviewer, 2DTM inherently favors particles that share sufficient similarity with the search template and may underrepresent highly remodeled or structurally divergent complexes. Importantly, once particles are identified, subsequent 2D/3D classification and refinement are not constrained by the template used for particle picking. Consistent with this, we observe classes displaying additional or altered densities absent in the original template, indicating that template matching does not preclude the detection of remodeled ribosomal states, although highly divergent species may still escape detection.

      Regarding the use of a 40S subunit as a template for 2DTM, we tested two templates: a complete 40S subunit and the 40S body alone. Using these 40S templates, we captured several 40S-, 43S-, and 48S-containing complexes, as well as 80S particles. As expected, no individual 60S classes emerged with 40S-TM. 40S-TM yielded 80S classes similar to those obtained with 60S-TM, although the number of particles was lower than with 60S template matching, resulting in lower resolution of these classes. Since this study focuses on ribosome hibernation, we chose to proceed with the 60S-TM results and do not report results using 40S-TM. We reported 40S-TM results in another study from our groups (Zottig et al., bioRxiv, 2025), which focuses on translation initiation on 40S subunits and was deposited as a preprint after this submission.

      We have added a comment and reference describing the use of the 40S template in the initial section of Results and Discussion: “This result echoes our concurrent finding that using 40S or partial 40S templates yields a variety of initiation complexes and 80S classes, revealing densities beyond those in the template [44].”

      (2) In the GTPase center, the authors identify density for "DRG-like" proteins. However, due to limited local resolution in that specific region, they are unable to definitively distinguish between DRG1 and DRG2. While the structural similarity is high, the functional implications differ, and the identification remains somewhat speculative. The authors should acknowledge this in the text.

      We agree with this comment and address it in the main text:

      “Whereas the overall shape and secondary structure resemble DRG1 or DRG2, the local resolution is insufficient to distinguish between these or other similarly structured proteins. Both yeast and mammalian counterparts are reported to function with a companion factor (Tma146p or Gir2 in yeast; or DFRP1 and DFRP2 in mammals), but our maps do not contain density that could correspond to DFRP1/2 near the putative DRG1/2 density. Future work will elucidate the function of these or other DRG-like GTPases in the context of an elongation complex.”

      (3) While "in extracto" is superior to purified SPA, the act of cell lysis (even rapid permeabilization) still involves a change in the chemical environment (pH, ion concentration, and dilution of metabolites). The authors could strengthen the manuscript by discussing how post-lysis changes might affect the occupancy of factors like GTP vs. GDP states.

      Thank you for pointing this out. Cell lysis can indeed lead to a change in the chemical environment, although we do not know how post-lysis changes may specifically affect the occupancy of factors, such as GTP- vs. GDP-bound states. We tried to minimize this effect by performing a rapid permeabilization. Our efforts to optimize our protocols are ongoing, and we expect to have a better answer to this question in the future.

      Nevertheless, to address this reviewer’s concern, our discussion states: “Additional optimization of buffer conditions may be required to more accurately represent the translation states observed in cells, as ionic conditions are known to affect the conformation of the ribosomes (e.g. rotated/non-rotated) and binding of protein factors”.

      (4) The study provides excellent snapshots of stationary states (translating vs. hibernating), but the kinetic transition, specifically how the 60S-eEF2 complex is recruited back into active translation, is not well discussed. On page 13, the authors present eEF2 bound to 60S but do not mention anything regarding which nucleotide is bound to the factor. It only becomes clear that it is GDP after looking at Figure S9. This should be clarified in the text. Similarly, the observations that eEF2 is bound to GDP in the 60S and 80S raise questions as to how the factor dissociates from the ribosome. This could also be discussed.

      Thank you for bringing this to our attention. We now state in the main text that eEF2 is bound with GDP on the 60S subunit.

      As for the kinetic transitions of 60S-eEF2 complexes, like this reviewer, we are fascinated by the possible roles and mechanisms of the 60S-eEF2 complex. The averaged particle ensembles derived from cryo-EM data do not report on the kinetics or transition pathways directly. We acknowledge in the main text that “Future studies will bring insights into the roles of the protein(s) and into the functions and transitions of 60S•eEF2 complexes to the pool of translating ribosomes”.

      Overall Assessment:

      The work reported in this manuscript likely represents the future of structural proteomics. The combination of high-resolution structural biology with minimal sample perturbation provides a new standard for investigating the cellular machines that govern life. After addressing minor points regarding template bias, protein identification, and transition dynamics, this work may become a landmark in the field of translation.

      Reviewer #2 (Public review):

      In this manuscript, the authors describe using "in extracto" cryo-EM to obtain high-resolution structures of mammalian ribosomes from concentrated cell extracts without further purification or reconstitution. This approach aims to solve two related problems. The first is that purified ribosomes often lose cellular cofactors, which are often reconstituted in vitro; this precludes the ability to find novel interactions. The second is that while it is possible to perform cryo-EM on cellular lamella, FIB milling is a slow and laborious process, making it unfeasible to collect datasets sufficiently large to allow for high-resolution structure determination. Extracts should contain all cellular cofactors and allow for grid preparation similar to standard single-particle analysis (SPA) approaches. While cryo-EM of cell extracts is not in itself novel, this manuscript uses 2D template matching (2DTM) for particle picking prior to structure determination using more standard SPA pipelines. This should allow for improved picking over other approaches in order to obtain large datasets for high-resolution SPA.

      This manuscript has two main results: novel structures of ribosomes in hibernating states; and a proof-of-principle for in extracto cryo-EM using 2DTM. Overall, I think the results presented here are strong and serve as a proof-of-principle for an approach that may be useful to many others. However, without presenting the logic of how parameters were optimized, this manuscript is limited in its direct utility to readers.

      Thank you for this valuable comment. We have expanded our Methods section “Optimization of 2DTM in RRL data” to present the logic behind parameter optimization, in the paragraph beginning with “We optimized high-resolution template matching procedures…”.

      Reviewer #3 (Public review):

      Summary:

      The authors describe a new structural biology framework termed "in extracto cryo-EM," which aims to bridge the gap between single-particle cryo-EM of purified complexes and in situ cryo-electron tomography (cryo-ET). By utilizing high-resolution 2D template matching (2DTM) on mammalian cell lysates, the authors sought to visualize the translational apparatus in a near-native environment while maintaining near-atomic resolution. The study identifies elongation factor 2 (eEF2) as a major hibernation factor bound to both 60S and 80S particles and describes a variety of hibernation scenarios involving factors such as SERBP1, LARP1, and CCDC124.

      Strengths:

      (1) The use of 2DTM effectively overcomes the signal-to-noise challenges posed by the dense and viscous nature of cellular extracts, yielding maps as high as 2.2 Å.

      (2) The discovery of eEF2-GDP as a ubiquitous shield for ribosomal functional centers, particularly its unexpected stabilization on the 60S subunit, provides a compelling model for ribosome preservation during stress.

      Weaknesses:

      (1) Representative nature of cell samples and lower detection limit

      The cells used in this study (MCF-7, BSC-1, and RRL) are either fast-growing cancer cell lines or specialized protein-synthetic systems. For cells with naturally low ribosomal abundance (such as quiescent primary cells), achieving the target concentration (e.g., A260 > 1000 ng/uL) would require an exponentially larger starting cell population.

      Is there a defined lower limit of ribosomal concentration in the raw lysate below which the 2DTM algorithm fails to yield high-resolution classes? In ribosome-sparse lysates, A260 becomes an unreliable proxy for ribosome density due to the high background of other RNA species and proteins. How do the authors estimate specific ribosome abundance in such heterogeneous fields?

      We have not tested these specific points, but we found that 2DTM can yield high-resolution reconstructions even with 1-2 particles per micrograph. This would require a substantially larger dataset than in this work, yet could provide a viable strategy for diluted or low-abundance samples. Other optimizations, including lysate concentration, may help as well. We have added the following text to reflect these points:

      “Additional optimization of buffer conditions may be required to more accurately represent the translation states observed in cells, as ionic conditions are known to affect the conformation of the ribosomes (e.g. rotated/non-rotated) and binding of protein factors [91-94]. For cells or samples with lower abundance of ribosomes or other macromolecules/complexes of interest, a lysate concentration step or collection of a larger dataset may be considered.”

      (2) Quantitation in heterogeneous lysates and crowding effects

      The authors utilize A260 as a key quality control measure before grid preparation. However, if extreme physical concentration is required to see enough particles, the background concentration of other cytoplasmic components also increases. This may lead to molecular crowding or sample viscosity that interferes with the formation of optimal thin ice. How do the authors calculate or estimate the specific abundance of ribosomes in the cryo-EM field of view when they represent a much smaller percentage of the total cellular content?

      We reported A260 as a reference that may be useful to achieve particle distributions resembling those in our work, rather than as a key quality control measure. Accordingly, we do not use it to estimate ribosome concentration or the specific abundance of ribosomes; instead, we’d recommend adjusting the sample concentration/dilution by grid screening.

      This reviewer mentions the important aspect of ice thickness. We found that the highest population of ribosome particles is found in thicker ice regions, and these particles have been used to make up the majority of our datasets leading to high-resolution reconstructions. We have added this observation to “Optimization of 2DTM in RRL data”.

      (3) Optimization of sample preparation

      The authors describe lysates as dense and viscous, requiring multiple blotting steps (2-3 times) for 3-8 seconds. Have the authors tested whether a larger molecular weight cutoff (e.g., 100 kDa) during concentration could improve the ribosome-to-background ratio without losing small factors like eIF5A (approx. 17 kDa)? Could repeated blotting of a concentrated, viscous lysate introduce shearing forces or increased exposure to the air-water interface that perturbs the native conformation of the complexes?

      We strived to minimize the number of steps in sample preparation, so we did not extensively test concentration steps. We also found that a concentration step can be omitted; the eIF5A-containing structure from the RRL dataset was determined without this step. We agree with the reviewer that repeated blotting may change ribosome complex equilibrium and result in a different distribution of functional states than in cells. However, we did not find evidence of perturbation of the native conformations of complexes, as the positions of ribosomes and factors are nearly identical to those observed in previous studies, including the recent high-resolution structures from cells that we cite.

      (4) The regulatory switch and mechanism of eEF2

      The finding that eEF2-GDP occupies dormant ribosomes is striking. What drives eEF2 from its canonical role in translocation to this hibernation state? Is this transition purely driven by stoichiometry (lack of mRNA/tRNA) and the GDP/GTP ratio, or is there a role for post-translational modifications? How do these eEF2-bound dormant ribosomes rapidly re-enter the translation pool upon stress relief?

      We are glad that this reviewer is fascinated by the eEF2-GDP occupancy on dormant ribosomes (just like we are)! These are important open questions that require further research, as our cryo-EM analyses cannot directly address the kinetic or mechanistic aspects of the mentioned processes. We did explore the known modification/phosphorylation sites in eEF2 densities but did not find evidence for such modifications, which does not rule out the possibility of transient or new modifications.

      (5) Hibernation diversity and LARP1 contextualization

      The study reveals that hibernation strategies vary across cell types. Does the high hibernation rate in RRL reflect a physiological state, or does it hint at “preparation-induced stress” due to resource exhaustion or mRNA degradation in the cell-free system? How do the authors reconcile their discovery of LARP1 on 80S particles with recent 2024 reports that primarily describe LARP1 as an SSU-bound repressor?

      Based on the high abundance of hibernating ribosomes in RRL (relative to many other samples we have tested so far), we speculate that this scenario may result from the stresses induced during lysate preparation: first, the rabbits are treated with phenylhydrazine inducing cell stress, then lysates are treated with micrococcal nuclease to degrade endogenous mRNAs. In addition, the specialization of reticulocytes may contribute to the distinct expression of stress/hibernation factors.

      As for LARP1, our finding is consistent with the 2024 work by Saba et al., who reported LARP1 binding to both 40S subunits and 80S ribosomes. They also noted that LARP1-bound ribosomes are “non-translating”, consistent with our structures.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure 3, it would be easier for the reader if the authors would report the % of particles in each class. Also, indicating body rotation and head swiveling values would help.

      Because our high-resolution maps result from a combination of data sets (e.g., RRL with an mRNA and RRL without an mRNA), we specify the particle percentages in the corresponding classification schemes in supplemental figures. To avoid excessive labeling in this figure, body rotation and head swiveling values for the new classes are shown in Figure 4.

      (2) Page 16, what is 'elongation factor 1'? It doesn't seem the authors refer to eEF1A?

      Thank you for pointing out this inconsistency; this is indeed eEF1A. We have corrected the text.

      (3) Page 16, after 'individual 60S subunits', there is a missing full stop.

      Thanks. Corrected.

      Reviewer #2 (Recommendations for the authors):

      I am not an expert in ribosome biology and do not have any specific comments on the various states presented here. Instead, I will mainly focus on the image processing aspects of this manuscript.

      Major points:

      (1) Were any AI-based particle pickers, such as crYOLO, topaz, or warp tested? While more traditional template-based or LoG pickers were shown to be inferior to 2DTM, it is unclear if AI methods would perform just as well. Given that a major point of this manuscript is the image processing pipeline, and that these AI tools have been widely adopted in the field, I think this is an important consideration.

      We used other particle pickers before using 2DTM and have listed them in the Supplementary Information: please see Table S1 for a complete list of particle pickers evaluated in this study. Since our present work focuses on a sample preparation method, a more extensive evaluation of particle picking methods is beyond the scope of this study.

      (2) While the methods used to obtain the structures presented are detailed, I think it would also be useful to provide some logic for how parameters were determined or optimized. This would serve as a useful foundation for readers who wish to try out this in extracto approach on their own specimens. Some of these optimizations seem quite specific, such as optimization of angular search parameters, but with no clear logic: e.g., why is the out-plane search coarser than the in-plane search; what is the effect of increasing the angular step sizes? Some seem inconsistent, e.g., why is e2pdb2mrc.py sometimes used and the cisTEM simulate used other times? Some are poorly described, such as "the defocus search turned on for micrographs with thicker ice" where there is no mention of how ice thickness is assessed and how thick is too thick. I think a workflow figure with accompanying text would help the reader understand the logic used in this work and how to apply that logic to their own projects.

      To address the comments in (2), we provide separate responses addressing each comment:

      (1) Provide some logic for how parameters were determined or optimized:

      The logic behind determining and optimizing search parameters is a balance between search precision and computational cost. In practice, users must weigh the benefit of finer sampling against the substantial increase in runtime, particularly for large datasets. For example, enabling defocus searching with a 200 Å step size and a 1000 Å range increases the computational time approximately 11-fold compared with running the same search with defocus disabled (since every defocus plane in both the positive and negative directions is searched), making such searches prohibitive when GPU resources are limited. In such cases, reducing the defocus search to a 250 Å step size and a 500 Å range can dramatically shorten runtime while preserving nearly the same number of reliable matches. In summary, we found that optimizing the defocus search, the in-plane and out-of-plane angles, and the image/micrograph pixel size can substantially reduce the processing time while sacrificing only a small percentage of particles.

      We have expanded our parameter optimization paragraph in “Optimization of 2DTM in RRL data”, as mentioned in a previous response.
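
      The roughly 11-fold figure quoted above follows from counting defocus planes. The sketch below is only an illustration of that arithmetic, assuming the search is symmetric around the estimated defocus and that runtime scales linearly with the number of planes; the function name `defocus_planes` is hypothetical and is not part of cisTEM or any other package.

```python
def defocus_planes(search_range: int, step: int) -> int:
    """Count the defocus planes in a symmetric search.

    Assumption: planes are sampled every `step` angstroms from
    -search_range to +search_range, including the central
    (zero-offset) plane.
    """
    if search_range < 0 or step <= 0:
        raise ValueError("search_range must be >= 0 and step must be > 0")
    planes_per_side = search_range // step
    return 2 * planes_per_side + 1


# Defocus search disabled: only the central plane is evaluated.
baseline = defocus_planes(0, 200)

# 200 A step over a 1000 A range: 5 planes on each side plus the centre.
fine = defocus_planes(1000, 200)

# 250 A step over a 500 A range: 2 planes on each side plus the centre.
coarse = defocus_planes(500, 250)

print(f"fine search cost relative to no defocus search: {fine / baseline:.0f}x")
print(f"coarse search cost relative to no defocus search: {coarse / baseline:.0f}x")
```

      Under these assumptions, the coarser setting evaluates 5 planes instead of 11, which is consistent with the shorter runtime described in the response.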

      (2) Some seem inconsistent, e.g., why is e2pdb2mrc.py sometimes used and the cisTEM simulate used other times?

      e2pdb2mrc.py is simpler to use and was used at the beginning of the project. Later, we switched to the simulate program since it performed slightly better. Either software is suitable to generate templates for 2DTM.

      (3) Some are poorly described, such as "the defocus search turned on for micrographs with thicker ice" where there is no mention of how ice thickness is assessed and how thick is too thick.

      We did not quantitatively assess ice thickness; instead, we tested whether it is advantageous to include the defocus search. To this end, we first performed CTF estimation and grouped micrographs based on their fit resolution. From each group, we selected ten micrographs representing the highest and lowest fit resolutions. Template matching was then performed using identical parameters, once with defocus search enabled and once with it disabled. The number of picked particles for each micrograph under both conditions was compared. When a significant difference was observed (most commonly for icy micrographs with low fit resolution), we enabled defocus search for that group of images. Enabling the defocus search sometimes yielded twice as many matches. We found that these images/datasets appeared to have a higher background than in vitro reconstituted samples. The template-matching results from these micrographs were subsequently combined with results from groups processed with defocus search disabled.

      To address this point, we have included this description in “Optimization of 2DTM in RRL data”.
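
      The per-group comparison described above could be sketched as follows. This is a minimal illustration, not the procedure's actual implementation: the function name, the example counts, and the `min_gain` threshold of 1.5 are all assumptions, since the response does not state what counted as a significant difference.

```python
def should_enable_defocus_search(picks_off, picks_on, min_gain=1.5):
    """Decide whether a micrograph group benefits from defocus search.

    picks_off / picks_on: particle counts for the same test micrographs,
    matched with defocus search disabled and enabled respectively.
    min_gain: hypothetical threshold on the ratio of total picks; the
    source only says the difference should be "significant".
    """
    if len(picks_off) != len(picks_on):
        raise ValueError("both runs must cover the same micrographs")
    total_off = sum(picks_off)
    total_on = sum(picks_on)
    if total_off == 0:
        # Any match at all is a gain when the plain search finds nothing.
        return total_on > 0
    return total_on / total_off >= min_gain


# Hypothetical counts for ten test micrographs from a low-fit-resolution group.
thick_ice_off = [12, 9, 15, 7, 10, 11, 8, 14, 6, 9]
thick_ice_on = [25, 17, 31, 15, 19, 24, 15, 27, 13, 18]

# Hypothetical counts for ten test micrographs from a high-fit-resolution group.
thin_ice_off = [40, 38, 45, 41, 39, 44, 37, 42, 40, 43]
thin_ice_on = [41, 38, 46, 42, 40, 44, 38, 43, 40, 44]

print(should_enable_defocus_search(thick_ice_off, thick_ice_on))
print(should_enable_defocus_search(thin_ice_off, thin_ice_on))
```

      Groups for which the function returns true would be processed with defocus search enabled, and their matches would then be merged with those from the remaining groups.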

      (4) I think a workflow figure with accompanying text would help the reader understand the logic used in this work and how to apply that logic to their own projects.

      Thanks for this suggestion. We have added a workflow figure as Figure 1—figure supplement 2.

      Minor Points:

      (1) While the image processing described seems appropriate, I think it is still necessary to include Fourier shell correlation plots for the final structures as supplemental data.

      Thank you for pointing out this inadvertent omission. We have added FSC curves in Figure 3—figure supplement 3.

      (2) One of the initial workflows used is a Relion 3 pipeline, which is, at this point, quite dated. Is there a reason Relion 4 or 5 was not used instead?

      The project started when Relion 3 was the latest version.

  3. social-media-ethics-automation.github.io social-media-ethics-automation.github.io
    1. In some cases we might want a social media company to be able to see our “private” messages, such as if someone was sending us death threats. We might want to report that user to the social media company for a ban, or to law enforcement (though many people have found law enforcement to be not helpful), and we want to open access to those “private” messages to prove that they were sent.

      Many people assume that if someone wants privacy, they must be doing something suspicious, but this chapter shows that privacy is often about dignity, safety, and control over personal information. For example, people may want private conversations to avoid embarrassment, protect themselves from harassment, or separate different parts of their lives. I think this is especially relevant today because social media often pressures people to share everything publicly. Sometimes choosing privacy is actually a healthy boundary.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Manuscript number: RC-2025-02954

      Corresponding author(s): Ana-Maria Lennon-Duménil and Sandra Iden

      1. General Statements [optional]

      We thank the three reviewers for the time and care taken to assess our manuscript, and for their constructive feedback that will help improve the study. We herewith provide a revised manuscript that addresses the key points raised by the reviewers.

      2. Point-by-point description of the revisions

      __Reviewer #1 (Evidence, reproducibility and clarity (Required)): __

      Summary: The manuscript by Delgado et al. reports the role of the actin remodeling Arp2/3 complex in the biology of Langerhans cells, which are specialized innate immune cells of the epidermis. The study is based on a conditional KO mouse model (CD11cCre;Arpc4fl/fl), in which the deletion of the Arp2/3 subunit ArpC4 is under the control of the myeloid cell specific CD11c promoter.

      In this model, the assembly of LC networks in the epidermis of ear and tail skin is preserved when examining animals immediately after birth (up to 1 week). Subsequently, however, LCs from ArpC4-deleted mice start displaying morphological aberrations (reduced elongation and number of branches at 4 weeks of age). Additionally, a profound decline in LC numbers is reported in the skin of both the ear and tail of young adult mice (8-10 weeks).

      To explore the cause of such decline, the authors then opt for the complementary in vitro study of bone-marrow derived DCs, given the lack of a model to study LCs in vitro. They report that ArpC4 deletion is associated with aberrantly shaped nuclei, decreased expression of the nucleoskeleton proteins Lamin A/C and B1, nuclear envelope ruptures and increased DNA damage as shown by γH2Ax staining. Importantly, they provide evidence that the defects evoked by ArpC4 deletion also occur in the LCs in situ (immunofluorescence of the skin in 4-week-old mice).

      Increased DNA damage is further documented by staining differentiating DCs from ArpC4-deleted mice with the 53BP1 marker. In parallel, nuclear levels of DNA repair kinase ATR and recruitment of RPA70 (which recruits ATR to replicative forks) are reduced in the ArpC4-deleted condition. In vitro treatment of DCs with the topoisomerase II inhibitor etoposide and the Arp2/3 inhibitor CK666 induces comparable DNA damage, as well as multilobulated nuclei and DNA bridges. The authors conclude that the ArpC4-KO phenotype might stem, at least in part, from a defective ability to repair DNA damage occurring during cell division.

      The study is enriched by an RNA-seq analysis that points to increased expression of genes linked to IFN signaling, which the authors hypothetically relate to overt activation of innate nucleic acid sensing pathways.

      The study ends by an examination of myeloid cell populations in ArpC4-KO mice beyond LCs. Skin cDC2 and cDC2 subsets display skin emigration defects (like LCs), but not numerical defects in the skin (unlike LCs). Myeloid cell subsets of the colon are also present in normal numbers. In the lungs, interstitial and alveolar macrophages are reduced, but not lung DC subsets. Collectively, these observations suggest that ArpC4 is essential for the maintenance of myeloid cell subsets that rely on cell division to colonize or to self-maintain within their tissue of residency (including LCs).

      MAJOR COMMENTS

      1. ArpC4 and Arp2/3 expression

      The authors argue that LCs from Arpc4KO mice should delete the Arpc4 gene in precursors that colonize the skin around birth. It would be important to show it to rule out the possibility that the lack of phenotype (initial seeding, initial proliferative burst) in young animals (first week) could be related to an incomplete deletion of ArpC4 expression. Also important would be to show what is happening to the Arp2/3 complex in LCs from Arpc4KO mice.

      __Response: __We thank this reviewer for the careful assessment of our manuscript. Regarding this specific comment, we would like to clarify that we do not expect ArpC4 to be deleted in LC precursors, as CD11c is only expressed once the cells have entered the epidermis. Instead, we expect the deletion to take place after birth around day 2-4 (Chorro et al., 2009). For this reason, we performed a deletion PCR of epidermal cells at postnatal day 7 (P7), a time at which the proliferative burst occurs. This analysis revealed CD11c-Cre-driven recombination in the ArpC4 locus (Fig. S2C). This experiment indicates that ArpC4 deletion does not alter LC proliferation and postnatal network formation.

      We apologize if this was not clear enough and have (1) revised the manuscript text to explain clearly when ArpC4 is deleted during early development with the CD11c-Cre transgene, and (2) better emphasized the rationale for the deletion PCR (page 4).

      In the in vitro studies with DCs, the level of ArpC4 and Arp2/3 deletion at the protein level is also not documented.

      __Response: __We have previously analyzed the expression of ArpC4 in BMDCs in a recent study, confirming its loss in CD11c-Cre;ArpC4fl/fl cells at the protein level: Rivera et al. Immunity 2022; doi: 10.1016/j.immuni.2021.11.008. PMID: 34910930 (Fig. S2D). Therefore, in the current manuscript we only refer to that paper (Results, first paragraph).

      The authors explain that surface expression of the CD11c marker, which drives Arpc4 deletion, gradually increased during differentiation of DCs: from 50% to 90% of the cells. Does that mean that loss of ArpC4 expression is only effective in a fraction of the cells examined before day 10 of differentiation (e.g. in the RNA-seq analysis)?

      __Response: __The reviewer is correct: there is heterogeneity in CD11c expression, which is inherent to this DC culture model, implying that Arpc4 gene deletion will be partial. Despite this, we were able to detect significant differences between the transcriptomes of control and CD11c-Cre;ArpC4fl/fl DCs in early phases of differentiation, emphasizing that the phenotype of ArpC4 loss is robust.

      We have included a note on this heterogeneity in the revised manuscript text (page 5).

      Intra-nuclear versus extra-nuclear activities of Arp2/3

      The authors favor a model whereby intra-nuclear ArpC4 helps maintain nuclear integrity during proliferation of DCs (and possibly LCs). However, multiple pools of Arp2/3 have been described and accordingly, multiple mechanisms may account for the observed phenotype: i) a cytoplasmic pool to drive the protrusions sustaining the assembly of the LC network and its connectivity with keratinocytes; ii) a peri-nuclear pool to protect the nucleus; iii) an intra-nuclear pool to facilitate DNA repair mechanisms, e.g. by stabilizing replicative forks (the scenario favored by the authors).

      __Response: __The referee is correct, and this is discussed in our manuscript (page 11, upper paragraph): we cannot exclude that several pools of branched actin are influencing the phenotype we here describe.

      Unfortunately, we have previously tested several antibodies against ArpC4, but in our hands, and despite comprehensive optimization, they did not yield specific signals that would enable us to assess changes in subcellular localization in murine cells. Upon this reviewer's comment, we have now reassessed the available tools. We have tested an antibody against ArpC2 (Millipore, Anti-p34-Arc/ARPC2, 07-227-I-100UG), which however did not produce any specific signals either. Instead, we found an ArpC5 antibody that yielded a filamentous staining in the cytoplasm plus nuclear staining in distinct foci of control bone marrow-derived DCs, indicating that Arp2/3 components may in principle act in the nucleus in these cells (see revised Figure S3F,G).

      It is recommended that the authors try to gather more supportive data to sustain the intra-nuclear role. Documenting ArpC4 presence in the nucleus would help support the claim. It could be combined with treatments aiming at blocking proliferation in order to reinforce the possibility that a main function of ArpC4 is to protect proliferating cells by favoring DNA repair inside the nucleus.

      __Response: __We thank this reviewer for this very helpful comment. As outlined in the previous response, we have aimed to obtain subcellular localization data for Arp2/3 complex components and, with that, to examine a potential intranuclear localization. Beyond that, however, in comparison to commonly cultured cell types, we face two hurdles in fully addressing the nuclear role of Arp2/3: 1) Due to poor transduction rates and epigenetic silencing, we cannot sufficiently express exogenous constructs such as ArpC4-NLS in DCs to assess the subcellular localization of Arp2/3 complex components. 2) We have performed preliminary tests to block proliferation in DCs, using the cyclin-dependent kinase 1 inhibitor RO3306 at different concentrations and incubation times during DC differentiation. Unfortunately, most cells were found dead after treatment. Further lowering the inhibitor concentration (below 3.5 µM) will likely not block the cell cycle, rendering this approach unsuitable.

      As mentioned above, we have tested the suitability of additional antibodies directed against Arp2/3 complex components to assess their subcellular localization, with the aim to discriminate peripheral cytoplasmic vs. perinuclear vs. intranuclear localization. These new data that report nuclear and cytoplasmic ArpC5 in control DCs are now presented in revised figure S3F,G. In addition, we toned down our current phrasing in the discussion, also emphasizing the possibility that cytoplasmic or perinuclear pools of the complex may indirectly help maintain the integrity of the genome in LCs (page 12).

      Nuclear envelope ruptures

      The nuclear envelope ruptures are not sufficiently documented (how many cells were imaged? quantification?). The authors employ STED microscopy to examine Lamin B1 distribution. The image shown in Figure 4A does not really highlight the nuclear envelope, but rather the entire content. Whether it is representative is questionable. We would expect Lamin B1 staining intensity to be drastically reduced given the quantification shown in Figure 3D. In addition, although the authors have stressed in the previous figure that Arpc4-KO is associated with nucleus shape aberrations, the example shown in Figure 4A is that of a nucleus with a normal ovoid shape.

      It is recommended to quantify the ruptures with Lap2b antibodies (or another staining that would better delineate the envelope) in order to avoid the possible bias due to the reduced staining intensity of Lamin B1.

      __Response: __NE ruptures are quantified by imaging NLS-GFP-expressing DCs in microchannels to visualize leakage of their nuclear content (Fig. 4B,C). The STED image mentioned by the referee (Fig. 4A,D) was only shown to further illustrate examples of NE ruptures, here using Lamin B1 as an immunofluorescence marker for the NE. We do agree with the reviewer that it was not chosen optimally to represent the ArpC4KO phenotype regarding nuclear shape and Lamin B1.

      We have now provided representative examples of nuclear illustrations of the ArpC4KO phenotype vs. control cells. In addition, we performed STED microscopy of Lap2b immunostained DCs as suggested by the referee (revised Fig. 4A,B).

      A missing analysis is that of nuclear envelope ruptures as a function of nucleus deformations.

      __Response: __As stated in the manuscript (page 5, third paragraph), the morphology of DCs is quite heterogeneous. As mentioned above, nuclear rupture events were quantified by live-imaging of NLS-GFP expressing DCs, enabling the tracing of rupture events. Live imaging is the only robust manner to measure nuclear membrane rupture events as they are transient due to rapid membrane repair (Raab et al. Science 2016). The NLS-GFP label itself, however, is not accurate enough to also quantify nuclear deformations. The latter therefore was quantified after cell fixation, using DAPI and/or immunostaining for NE markers (Figures 3 and S3).

      As suggested by the referee, we have now quantified nuclear deformations using Lap2b staining of the nuclear envelope (revised Fig. 4A,B), demonstrating reduced circularity and increased elongation of ArpC4KO nuclei.

      Fig 4B-C: same frequency of Arpc4-KO and WT cells displaying nuclear envelope ruptures in the 4-µm channels; however, the images show a rupture for the Arpc4-KO and no rupture for the WT cells (this is somewhat misleading). Are ruptures similar in Arpc4-KO and WT cells in this condition?

      Response: We apologize for choosing an image that does not represent our quantification well. The revised manuscript now contains an image that better reflects our quantification (revised Fig. 4C).

      Fig 4D-E: is there a direct link between nuclear envelope ruptures and γH2A.X?

      Response: At present, we can only correlate the findings of increased γH2Ax and elevated events of nuclear envelope ruptures in ArpC4KO DCs. Rescue experiments are very difficult to impossible in DCs (e.g. restoring Lamin A/C and B1 levels in the KOs and subsequently assessing the amount of DNA damage). While we are afraid that we cannot address a potential link between NE ruptures and DNA damage by experiments in a manner feasible within this manuscript's revision, we have discussed this interesting aspect based on observations in immortalized cell culture systems (page 10). However, we would like to note that such a link was indeed shown for different cell types in Nader et al. Cell 2021. This effect results from access of the cytosolic nuclease Trex1 to nuclear DNA. We have added this point in our revised manuscript (page 11).

      Interesting (but optional) would be to understand what is happening to DNA, histones? Is there evidence for leakage in the cytoplasm?

      Response: This is an interesting question. To assess this, we have now performed immunostainings for double-stranded DNA in the cytoplasm, following published protocols (Spada et al., 2019; PMID 31727239). This analysis revealed significantly increased cytoplasmic dsDNA in ArpC4KO DCs (revised Fig. 4G,H), indeed suggesting leakage into the cytoplasm following ArpC4 loss.

      RNA seq analysis

      The RNA-seq analysis suffers from a lack of direct connection with the rest of the study. The extracted molecular information is not validated nor further explored. It remains very descriptive. The PCA analysis suggests a « more pronounced transcriptomic heterogeneity in differentiating Arpc4KO DCs ». However it seems difficult to make such a claim from the comparison of 3 mice per group. In addition, such heterogeneity is not seen in the more detailed analysis (Fig 5F). The authors claim that « day 10 control and Arpc4KO DCs showed no to very little differences in gene expression, in contrast to cells at days 7-9 of differentiation ». This is not obvious from the data displayed in the corresponding figure. In addition, it is not expected that cells that may take a divergent differentiation path at days 7-9 would return to a similar transcriptional activity at day 10.

      A point that is not discussed is that before day 10 of DC differentiation, Arpc4 KO is expected to only occur in about 50% of the cell population. This is expected to impact the RNA-seq analysis.

      Not all clusters have been exploited (e.g. cluster 3 elevated, cluster 6 partly reduced). I suggest the authors reconsider their analysis of the RNA-seq data (or eventually invest in complementary analysis).

      Response: Despite a comprehensive analysis of the different transcriptomes of control and ArpC4 mutant cells during DC differentiation, we decided to focus the presentation and discussion of our RNAseq results on the most notable findings. Of these, the elevated innate immune responses in ArpC4KO DCs (Fig. 5E,H) caught our particular attention, as this seemed highly meaningful in light of DC and LC functions.

      As suggested by the referee, in the revised manuscript, we better connected the RNAseq data to the other cellular and molecular analyses shown, complementing these results by investigating the potential involvement of innate immune responses in the ArpC4KO phenotype (page 7).

      What causes the profound numerical drop of LC in the epidermis?

      A major open question is what causes the massive drop of LCs. Although differentiating Arpc4KO DCs start accumulating DNA damage upon proliferation, they succeed in progressing through the cell cycle. There is even a slightly elevated expression of cell cycle genes at day 7 of differentiation in the DC model.

      Only a trend for increased apoptosis is observed in ear and tail skin. It would be important to provide complementary data documenting increased death (or aberrant emigration?) of LCs in the 4-8 week time window.

      Response: We agree with the reviewer that this is an important question. We exclude that elevated emigration causes the decline of LCs in ArpC4KO epidermis, as ArpC4-mutant LCs are significantly reduced (and not increased) in number in skin-draining lymph nodes (Fig. 7E). To assess whether increased cell death contributed to LC loss, we have tried to identify LCs that are just about to die. As the reviewer noted, we could only observe a trend of apoptosis-positive LCs in mutant epidermis. We assume that this is because of a quick elimination of compromised LCs following DNA damage, with only a short time passing until LCs with impaired genome integrity will be cleared from the system, making it very difficult to detect γH2Ax-positive cells that are positive for markers of cell death.

      Despite these limitations to detect DNA-damage-positive but viable LCs in vivo, we have now collected 6-week-old mice to analyze LC numbers and apoptosis (cleaved Caspase-3), complementing our data derived from 7-day and 4-week-old mice (Figures S2A,B,E,F). While we did observe the expected trends for reduced LC numbers and increased DNA damage of ArpC4KO LCs as seen in adolescent mice, we were unable to detect a significant increase of apoptotic LCs in ArpC4KO animals at 6 weeks of age (revised Suppl. Fig. 4A-D). We assume that this is due to the outlined short-lived stages of apoptotic cells. Alternatively, it seems possible that ArpC4KO LCs were lost via cell death pathways other than apoptosis, a matter which we feel is beyond the scope of this manuscript. Accordingly, we revised our discussion to include this possibility (pages 11-12).

      Functional consequences

      Although the study reports novel aspects of LC biology, the consequences of ArpC4 deletion for skin barrier function and immunosurveillance are not investigated. It would seem very relevant to test how this model copes with radiation, chemical and/or microorganism challenges.

      Response: We fully agree with this reviewer that this is a very interesting point. Therefore, next to assessing the steady-state circulation of LCs and DCs, we also addressed the consequence of ArpC4 loss for LC function in chemically challenged skin: we performed skin painting experiments using the contact sensitizer fluorescein isothiocyanate (FITC), diluted in the sensitizing agent dibutyl phthalate (DBP), to detect cutaneous-derived phagocytes within draining lymph nodes. These experiments revealed that migration of ArpC4KO LCs (as well as of ArpC4KO DCs) to skin-draining lymph nodes was impaired (Fig. 7C-E), confirming an in vivo role of ArpC4 for immune cell migration to lymphatic organs following a chemical challenge. The revised manuscript contains a more detailed note to properly explain the FITC painting experiment and highlight its importance (page 9).

      MINOR COMMENTS:

      1- Figure 1D

      Gating strategy: twice the same empty plots. The content seems to be missing... Does this need to be shown in the main figure?

      Response: We apologize for this problem, which might be due to file conversion or PDF reader software. In our PDF versions (including the published bioRxiv preprint) we do see the data points; however, we have earlier experienced incomplete FACS plots during manuscript preparation.

      For the revised manuscript, we double-checked the results after converting the figures into PDFs. Here is a screenshot:

      2- Figure 2

      Best would be to keep same scale to compare P1 and P7 (tail skin, figure 2A)

      Response: We have replaced the examples with micrographs of comparable scale (revised Fig. 2A).

      Overlay of Ki67 and MHC-II does not allow easy visualization of the double-positive cells (Fig 2C)

      Response: We now provide single-channel images next to the merged view and have improved the visualization of double-positive cells (revised Fig. 2C).

      Quality of Ki67 staining different for Arpc4-KO (less intense, less focused to the nuclei): a technical issue or could that reflect something?

      Response: We thank the reviewer for spotting this. We have re-assessed all Ki67 micrographs and noted that the originally chosen examples indeed were not fully representative. We have selected more representative examples of Ki67-positive cells in control and mutant tissues, reflecting no difference in the principal nature of Ki67 staining (revised Fig. 2C).

      Fig 2C: Panels mounted differently for ear and tail skin (different order to present the individual stainings, DAPI for tail skin only).

      Response: We agree and have harmonized the sequence of panels in figure 2 accordingly (revised Fig. 2C).

      3- LC branch analysis (Fig 1 and 2)

      While Fig 1 indicates that ear skin LCs form on average twice as few branches as tail skin LCs (3-4 versus 8-9 branches per cell), Fig 2 shows the opposite (10-12 versus 6-7 branches per cell).

      Is this due to a very distinct pattern between the 2 considered ages (4 weeks versus 8-10 weeks)? Could the authors double-check that there is no methodological bias in their analysis?

      Response: We thank the reviewer for pointing out this apparent inconsistency. Indeed, our initial analysis suffered from a bias in detecting LC dendrites, as the tissue cellularity and overall morphology differ significantly between 4-week-old and adult animals: in adult animals, the immunostainings showed a higher baseline background signal for the skin epithelium compared to P28. We had noted this beforehand and had adjusted the imaging pipeline accordingly, with a more stringent thresholding to eliminate background signals in the case of adult tissues. While we were able to detect the described ArpC4 phenotype, this strategy resulted in a reduced ability to detect dendrites (both in control and mutant tissues), explaining the seemingly reduced number of dendrites in adult vs. 4-week-old tissues.

      We have double-checked both the micrographs and the corresponding quantifications and did not identify errors. Instead, our assumption (that too high a stringency for background reduction in adults caused the discrepancy) turned out to be correct. We have now performed detailed analyses of LC morphology at 4-week and adult stages by confocal microscopy, using a 63x objective rather than a 40x objective as done previously. The new results confirm that with this approach the number of LC dendrites across these ages is largely comparable, while the phenotypes of ArpC4 loss are retained. The revised manuscript now contains a completely new analysis based on image acquisition with a 63x objective (revised Fig. 1E-G).

      4- Fig 3 E-G

      How many animals were examined (n=5)? Reproducible across animals? Why was it done with 4-week animals (phenotype not complete? Event occurring before loss in numbers...)

      Response: As mentioned in the figure legend for Fig. 3F, we have analysed N = 4 control and N = 5 KO mice. We chose the 4-week time-point as this was the stage when the loss of LCs first became apparent (even though non-significant at this age). We aimed to learn whether changes in nuclear morphology and nuclear envelope markers represented early molecular and cellular events following ArpC4 loss. Compared to later stages, this strategy poses a reduced risk of detecting indirect effects of ArpC4 loss. We added a note in the revised manuscript text to clarify this (page 5).

      Staining Lamin A/C globally more intense in the Arpc4-KO epidermis (also seems to apply to the masks corresponding to the LCs). Surprising to see that the quantification indicates a major drop of Lamin A/C intensity in the LCs.

      Response: We again thank the reviewer for this careful assessment. As with many tissue stainings, there is inter-sample variability. We have now revisited the micrographs and did not find a significant global reduction of Lamin A/C in the entire epidermis (including keratinocytes/KCs). The drop of Lamin A/C intensity is restricted to ArpC4KO LCs (and not KCs) and is in line with the reduced Lamin A/C expression data in DCs (Fig. 3C,D). The revised manuscript now shows more representative examples (revised Fig. 3E).

      Legend Fig 4D replace confocal microscopy by STED microscopy

      Response: We replaced "confocal microscopy" by "STED microscopy".

      6- Figure 4F

      Intensity/background of γH2Ax staining very distinct between the 2 micrographs shown for WT and Arpc4-KO epidermis.

      Response: We revisited the micrographs and now selected more representative examples (revised Fig. 4I).

      7- Figure 7C, F, H

      Gating strategies: would be better to harmonize the style of the plots (dot plots and 2 types of contour plots have been used...)

      Response: We agree and provided a harmonized plot illustration in the revised manuscript (revised Fig. 7).

      8- Figure 7H

      Legend of lower gating strategy seems to be wrong (KO and not WT).

      Response: We thank the reviewer for pointing out this mistake. The revised Figure 7H shows a corrected figure display.

      Reviewer #1 (Significance (Required)):

      Strengths: the general quality of the manuscript is high. It is very clearly written and it contains a very detailed method section that would allow reproducing the reported experiments. This work entails a clear novelty in that it represents the first investigation of the role of ArpC4 in LCs. It opens an interesting perspective about specific mechanisms sustaining the maintenance of myeloid cell subsets in peripheral tissues. This work is therefore expected to be of interest for a large audience of cellular immunologists and beyond. Challenging skin function with an external trigger would lift the relevance for an even wider audience (see main point 6).

      Response: See main point 6.

      Limitations: in its current version the manuscript suffers from a lack of solidity around a few analyses (see main points on ArpC4 and Arp2/3 protein expression, nuclear envelope rupture analysis,...). It also tends to formulate a narrative centered on the ArpC4 intra-nuclear function that is not definitely proven.

      The field of expertise of this reviewer is: cellular immunology and actin remodeling.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      SUMMARY This is a study in experimental mice employing both in vitro and, importantly, in vivo approaches. EPIDERMAL LANGERHANS CELLS serve as a paradigm for the maintenance of homeostasis of myeloid cells in a tissue, epidermis in this case. In addition to well known functions of the ACTIN NETWORK in cell migration, chemotaxis, cell adherence and phagocytosis the authors reveal a critical function of actin networks in the survival of cells in their home tissue.

      Actin-related proteins (Arp), specifically here the Arp2/3 complex, are necessary to form the filamentous actin networks. The authors use conditional knock-out mice where Arpc4 (an essential component of the Arp2/3 complex) is deleted under the control of CD11c, the most prominent dendritic cell marker which is also expressed on Langerhans cells. In normal mice, epidermal Langerhans cells reside in the epidermis virtually life-long. They initially settle the epidermis around and a few days after birth and establish a dense network by a burst of proliferation and then they "linger on" by low level maintenance proliferation. In the epidermis of Arpc4 knock-out mice Langerhans cells also start off with this proliferative burst but, strikingly, they do not stay but are massively reduced by the age of 8-12 weeks.

      The analyses of this decline revealed that

      -- the shape (number of nuclear lobes) and integrity of cell nuclei was compromised; they were fragile and ruptured to some degree when Arpc4 was knocked out, i.e., the Arp2/3 complex was missing;

      -- DNA damage, as detected by staining for gamma-H2Ax or 53BP1 accumulated when Arpc4 was knocked out, i.e., the Arp2/3 complex was missing;

      -- recruitment of DNA repair molecules was inhibited when Arpc4 was knocked out, i.e., the Arp2/3 complex was missing;

      -- gene signatures of interferon signaling and response were increased when Arpc4 was knocked out, i.e., the Arp2/3 complex was missing;

      -- in vivo migration of dendritic cells and Langerhans cells from the skin to the draining lymph nodes in an inflammatory setting (FITC painting of the skin) was impaired when Arpc4 was knocked out, i.e., the Arp2/3 complex was missing;

      -- the persistence of the typical dense network of Langerhans cells in the epidermis, created by proliferation shortly after birth, is abrogated when Arpc4 was knocked out, i.e., the Arp2/3 complex was missing. Importantly, this was not the case for myeloid cell populations that settle a tissue without needing that initial burst of proliferation. For instance, numbers of colonic macrophages were not affected when Arpc4 was knocked out, i.e., the Arp2/3 complex was missing.

      Thus, the authors conclude that the Arp2/3 complex is essential by its formation of actin networks to maintain the integrity of nuclei and ensure DNA repair thereby ascertaining the maintenance proliferation of Langerhans cells and, as the consequence, the persistence of the dense epidermal network of Langerhans cells.

      Up-to-date methodology from the fields of cell biology and cellular immunology (cell isolation from tissues, immunofluorescence, multiparameter flow cytometry, FISH, "good old" - but important - transmission electron microscopy, etc.) was used at high quality (e.g., immunofluorescence pictures!). Quantitative and qualitative analytical methods were timely and appropriate (e.g., Voronoi diagrams, cell shape profiling tools, Cre-lox gene-deletion technology, etc.). Importantly, the authors used a clever method, that they had developed several years ago, namely the analysis of dendritic cell migration in microchannels of defined widths. Molecular biology methods such as RNAseq were also employed and analysed by appropriate bioinformatic tools.

      MAJOR COMMENTS:

      • ARE THE KEY CONCLUSIONS CONVINCING? Yes, they are.

      • SHOULD THE AUTHORS QUALIFY SOME OF THEIR CLAIMS AS PRELIMINARY OR SPECULATIVE, OR REMOVE THEM ALTOGETHER? No, I think it is ok as it stands. The authors are wording their claims and conclusions not apodictically but cautiously, as it should be. They point out explicitly which lines of investigations they did not follow up here.

      • WOULD ADDITIONAL EXPERIMENTS BE ESSENTIAL TO SUPPORT THE CLAIMS OF THE PAPER? REQUEST ADDITIONAL EXPERIMENTS ONLY WHERE NECESSARY FOR THE PAPER AS IT IS, AND DO NOT ASK AUTHORS TO OPEN NEW LINES OF EXPERIMENTATION. I think that the here presented experimental evidence suffices to support the conclusions drawn. No additional experiments are necessary.

      • ARE THE SUGGESTED EXPERIMENTS REALISTIC IN TERMS OF TIME AND RESOURCES? IT WOULD HELP IF YOU COULD ADD AN ESTIMATED COST AND TIME INVESTMENT FOR SUBSTANTIAL EXPERIMENTS. Not applicable.

      • ARE THE DATA AND THE METHODS PRESENTED IN SUCH A WAY THAT THEY CAN BE REPRODUCED? Yes, they are.

      • ARE THE EXPERIMENTS ADEQUATELY REPLICATED AND STATISTICAL ANALYSIS ADEQUATE? Yes.

      Response: We thank the reviewer very much for assessing our work, for providing constructive suggestions, and for acknowledging the strength of the study.

      MINOR COMMENTS:

      • SPECIFIC EXPERIMENTAL ISSUES THAT ARE EASILY ADDRESSABLE. None

      • ARE PRIOR STUDIES REFERENCED APPROPRIATELY? Essentially yes. Regarding the reduction / loss of the adult epidermal Langerhans cell network, it may be of some interest to also refer to / discuss another one of the few examples of this phenomenon. There, the initial burst of proliferation is followed by reduced proliferation and increased apoptosis when a critical member of the mTOR signaling cascade is conditionally knocked out (Blood 123:217, 2014).

      Response: We thank the reviewer for pointing out this important work. We now included the paper into the revised manuscript (page 12).

      • ARE THE TEXT AND FIGURES CLEAR AND ACCURATE? Yes they are. Figures are well arranged for easy comprehension.

      • DO YOU HAVE SUGGESTIONS THAT WOULD HELP THE AUTHORS IMPROVE THE PRESENTATION OF THEIR DATA AND CONCLUSIONS?

      1. Materials & Methods. The authors write, regarding flow cytometry of epidermal cells: "Briefly, 1cm2 of back skin from 8-14 weeks old female wild-type and knockout littermates was dissociated in 0.25 mg/mL Liberase (Sigma, cat. #5401020001) and 0.5 mg/mL DNase (Sigma, cat.#10104159001) in 1 mL of RPMI (Sigma) and mechanically disaggregated in Eppendorf tubes, FOLLOWED BY INCUBATED for 2 h at 37 °C." Followed by what?

      Response: We apologize for this mistake. The text should read: "... followed by incubation for 2 h at 37 °C and filtration using a 100µm cell strainer. Thereafter, blocking was performed in PBS supplemented with 0.5% bovine serum albumin and 2 mM EDTA at 4 °C, followed by antibody labeling of cells in single cell suspension". The text has been corrected in the revised manuscript (page 17).

      Materials & Methods. BMDC electronmicroscopy. What is "IF". Please specify.

      Response: We also regret this mistake in the method text. It should read: "... For electron microscopy analysis, after PDMS removal, cells were fixed using 2.5% glutaraldehyde ...". The text has been corrected in the revised manuscript (page 21).

      RESULTS in gene expression analyses. The authors observe some increase in apoptosis (as detected by cleaved-Caspase-3 staining). Is this observation in immunofluorescence also evident in the RNAseq data (where the IFN changes were seen), i.e., in Figure 5.

      Response: We have checked our RNAseq data regarding any changes in apoptosis-related genes; however, apart from a few transcripts that are upregulated in ArpC4KO cells, we do not find a pronounced enrichment of apoptosis-related genes. We included volcano plot data in revised Suppl. Fig. 5H to share these DEGs.

      Figure 7 F and G. Perhaps the authors may want to swap upper and lower panels in F or G, so that macrophage FACS plots and bar graphs are in the same row and, obviously, DC plots and bars likewise.

      Response: We agree and have harmonized the panel sequence in the revised manuscript (revised Fig. 7F, G; panels swapped in G, display harmonized).

      Figure 7H. "Gating strategy in ArpC4WT Lung (previously gated in Live CD45+ cells)" - The lower row is knock-out, not WT. This is indicated correctly in the legend, but in the figure both rows are labeled as WT.

      Response: Indeed, the legend information is correct, but the corresponding figure panel is incorrect. We now provide a corrected version (revised Fig. 7H).

      The reference by Park et al. 2021 is missing in the list.

      Response: We have added the reference to the revised bibliography.

      Figure 1D. Sure, the bar graphs are meant to say "CD11c"? The FACS plots show "CD11b".

      Response: We have checked the panels and corrected where necessary (revised Fig. 1D).

      As to cDC1. In Figure 1D the FACS plot shows an absence of CD103+ cDC1 cells. In contrast, in Figure 7A (left panel), there is no difference in cDC1 cells between WT and KO mice. Is therefore the flow cytometry plot in Figure 1D not representative regarding cDC1 cells? Correct?

      Response: The reviewer is correct about this apparent discrepancy. We have not observed differences in the control vs. ArpC4KO cDC1 population, hence Figure 7 represents our findings. For figure 1, we have by mistake chosen a non-representative plot, with the aim of illustrating the gating strategy. We apologize for this mistake and now provide a corrected and representative FACS plot figure in the revised manuscript (revised Fig. 1D).

      Reviewer #2 (Significance (Required)):

      • DESCRIBE THE NATURE AND SIGNIFICANCE OF THE ADVANCE (E.G. CONCEPTUAL, TECHNICAL, CLINICAL) FOR THE FIELD. This is a conceptual advance. It adds a big step to our understanding of how immune cells in tissues (which all come from the bone marrow or are seeded before birth from embryonal hematopoietic organs such as yolk sac and fetal liver) can remain resident in these tissues. For cell types such as Langerhans cells, which establish their final population density within their tissues of residence, the presented findings convincingly buttress the role of proliferation and thereby the role for the actin-related protein complex 2/3 (Arp2/3).

      • PLACE THE WORK IN THE CONTEXT OF THE EXISTING LITERATURE (PROVIDE REFERENCES, WHERE APPROPRIATE). While we know much about actin-related proteins (Arp), as correctly cited by the authors, this knowledge is derived mostly from in vitro studies. The submitted study translates the findings to an in vivo setting for the first time.

      • STATE WHAT AUDIENCE MIGHT BE INTERESTED IN AND INFLUENCED BY THE REPORTED FINDINGS. Skin immunologists foremost, but these findings are of interest to the entire community of immunologists, but also cell biologists.

      • DEFINE YOUR FIELD OF EXPERTISE. My expertise is in skin immunology, in particular skin dendritic cells including Langerhans cells.

      We acknowledge the referee for their positive assessment of our manuscript.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary:

      The manuscript identifies a role of the Arp2/3 complex, the major regulator of actin branching in cells, for controlling the homeostasis of murine Langerhans cells (LCs), a specialized subset of dendritic cells in the skin epidermis. The findings of the study are based on the analysis of CD11c-Cre Arpc4-flox mice, a conditional knockout mouse model, which interferes with Arp2/3 function in Langerhans cells and other CD11c-expressing myeloid cells, e.g. dendritic cell or macrophage subsets. By using immunofluorescence and flow cytometry analysis of epidermis and skin tissues, the authors provide a detailed analysis of LC numbers at different developmental stages (postnatal day 1, 7, 28, and adult mice) and demonstrate that Arpc4-deficiency does not interfere with the establishment of LC networks until postnatal day 28. However, LCs in ear and tail skin are substantially reduced in Arpc4-deficient mice at 8-12 weeks of age. In parallel to their in vivo model, the authors analyze cultures of bone marrow-derived dendritic cells (BMDCs) from control and CD11c-Cre Arpc4-flox mice. Arpc4-deficiency in BMDCs, which develop over 8-10 days in culture, results in nuclear shape and lamina abnormalities, as well as signs of increased DNA damage. Aspects of this phenotype are also detected in Langerhans cells in epidermal preparations. Transcriptomic analysis of BMDCs highlights a gene signature of increased expression of the interferon response pathway and alterations in cell cycle regulation. Arpc4-deficient BMDCs show increased expression of DNA damage markers and reduced expression of certain DNA repair factors. Based on these correlative findings from the BMDC model, the authors conclude that the decline in LC numbers might develop from the accumulation of DNA damage over time, which the authors phrase "pre-mature aging of Langerhans cells".
      Lastly, the authors show a heterogeneous picture of how Arp2/3 depletion affects distinct DC populations in CD11c-Cre Arpc4-flox mice. While some tissue-resident DC subsets appear normal in numbers, others are reduced in numbers in the tissue. This may be related to their proliferation potential in tissues.

      Major comments:

      • Are the claims and the conclusions supported by the data or do they require additional experiments or analyses to support them?

      1) The authors claim that Arpc4 deficiency selectively compromises myeloid cell populations that rely on proliferation for tissue colonization (Figure 7). The presented data might give hints for such a general hypothesis, but solid experimental proof is lacking. When comparing myeloid cell subsets from four different organs, the authors refer to published data that some dendritic cell subsets are more proliferative in tissues than others and that CD11cCre Arpc4-flox mice appear to have reduced cell numbers in these populations. However, the presented data are purely correlative and no functional connection to cell proliferation has been made to the phenotypes. While some dendritic cell subsets (Langerhans cells, alveolar DCs) show reduced cell numbers in CD11cCre Arpc4-flox mice, other myeloid cell subsets are unaffected (e.g. dermal cDC1 and 2, colon macrophages). There could be plenty of other reasons that might underlie the observed discrepancies between these cell subsets, e.g. Arp2/3 knockout efficiency and myeloid cell turnover in the tissue are just two examples, which have not been taken into consideration. Direct measurement of cell proliferation (e.g. BrdU labeling) linked to the observed phenotype would be needed to make such claims. The data could either be removed. Experimentally addressing these points could take 3-6 months.

      Response and revisions: We thank the referee for raising this point. We agree that these results give hints that support our conclusion but do not address this question directly. However, we would like to emphasize that our conclusion is based on studies from others showing that alveolar macrophages self-maintain through proliferation (Bain et al. Mucosal Immunology 2022). In contrast, it has been reported that a large fraction of colonic macrophages are derived from monocytes that are being recruited to the gut throughout life (Bain et al. Mucosal Immunity 2023). We now added these points in our revised manuscript. Moreover, during revision we confirmed deletion of the ArpC4 allele by genotyping PCR of FACS-sorted colon macrophages (revised Suppl. Fig. 7C and revised methods). In addition, we stress that we do not exclude that different intracellular Arpc4-dependent processes might contribute to the phenotypes observed (beyond maintenance of DNA integrity) (page 11). This will help temper our conclusions and leave open the potential implication of alternative mechanisms.

      2) The authors claim that DC subsets (e.g. dermal cDCs), which develop from pre-DCs, are not affected by Arp2/3 depletion (Figure 7, although the FACS plot in Fig. 1D would suggest a different picture for cDC1). This is surprising in light of the data with bone marrow-derived DCs (BMDCs), the major in vitro model of this study, which develop from CDPs that again develop from pre-DCs. BMDCs did show aberrant nuclei and signs of DNA damage. How would the authors then explain the discrepancies of the BMDC model with DC subsets, where the authors feel that the pre-DC origin explains the phenotypic difference? This is a general concern of the data interpretation and conclusions.

      Response: We thank the referee for raising this point, which indeed requires clarification. Two non-exclusive hypotheses could explain this apparent discrepancy:

      • The ontogeny of bone-marrow-derived DCs: Depending on the protocol used, there might be variations in the precursors that DCs develop from. We use one of the first protocols, which was pioneered by the Paola Ricciardi-Castagnoli lab (Winzler et al. J. Exp. Med. 1997). It relies on a supernatant from J558 cells transfected with GM-CSF, which contains additional cytokines and mainly generates DC2-like DCs. Langerhans cells are closer to DC2s, which resemble macrophages more than DC1s do. We thus chose this protocol rather than the protocols that use Flt3-L, which produce both DC1s and DC2s developed from common dendritic-cell precursors (CDPs). It is thus possible that our BM-derived DCs develop from other precursor cells closer to monocyte precursors.
      • As shown in Figure 5C, the kinetics of acquisition of CD11c expression, and thus of deletion of the Arpc4 gene, might be distinct in vivo and in vitro. In vivo, as stated in our manuscript, DCs acquire CD11c as preDCs and undergo only a few rounds of division afterwards. In vitro, as shown by our cycling experiments, BM-derived DCs cycle continuously, so they keep dividing after having acquired CD11c (around day 7) and deleted the Arpc4 gene. We have now mentioned these hypotheses in the discussion of our revised manuscript to explain the apparent contradiction raised by the referee (pages 10 and 12).

      3) In line with point 2, the authors never show that BMDCs show reduced proliferation, reduced cell numbers or increased cell death in Arpc4-deficient cell cultures, as a consequence of the detected DNA damage and impaired DNA repair. In fact, Figure 5C even shows that cell growth rates between control and KO are equal. This is a major mismatch in the current study. Since the authors use the BMDC model to explain the declining cell numbers in Langerhans cells (which derive from fetal liver cells), this phenotype is not mirrored by the BMDC culture and it remains open whether the observed changes in nuclear DNA damage and repair are indeed directly linked to the observed phenotype of declining cell numbers in the tissue. These aspects require argumentation why cell growth is unchanged in KO cells. Additional experiments addressing these points with sufficient biological replicates (cultures from different mice) could take 2-3 months, including preparation time.

      Response: We thank the referee for raising this point, which was probably not properly discussed in the first version of our manuscript. Indeed, Arpc4KO BM-derived DCs do not show the premature cell death phenotype observed in LCs in vivo, as stated by the referee. There are at least two putative non-exclusive explanations for this. First, unlike LCs, which are long-lived cells, BM-derived DCs can be kept in culture for only 10-12 days. As DNA damage-induced cell death takes time (LCs only start to die about 3-4 weeks after network establishment), the lifespan of BM-DCs could simply not be long enough to observe this phenotype. Second, in the epidermis, LCs are physically constrained and continuously exposed to diverse signals that might increase their sensitivity to DNA damage and thereby to the induction of subsequent cell death.

      We have attempted to clarify this point in our revised manuscript by providing putative explanations for the death phenotype of Arpc4-deficient LCs not being observed in BM-derived DCs. We further explained that this does not invalidate this cellular model as it was used to raise hypotheses on the putative role played by ArpC4 in myeloid cells, i.e. maintenance of DNA integrity, which was then confirmed in vivo (ArpC4KO LCs do indeed display DNA damage in the epidermis) (page 12). Without this "imperfect cellular model", we would have probably not been able to uncover this novel function of Arp2/3 in immune cells.

      4) The authors refer to a "pre-mature aging" phenotype of Arpc4-deficient BMDCs and LCs, based on reductions in Lamin B, Lamin A and increases in gH2AX and 53BP1. I find this term an overstatement of the current data and suggest that other markers for cell senescence, such as p53, Rb, p21 and b-Galactosidase, are then also used to make such a strong claim on "aging" and cell senescence. Experimentally addressing this point with sufficient biological replicates could take 2-3 months, including preparation time.

      Response: We have now assessed senescence signatures in our RNAseq analysis of Arpc4WT and Arpc4KO-derived DCs, as suggested by the referee. These results revealed several senescence-related DEGs upregulated in ArpC4KO DCs, such as serpinB2 (revised Suppl. Fig. 5G, volcano plots), as well as a general enrichment of a senescence-related signature when using the senescence gene set (Aging Atlas Consortium, 2021; revised Fig. 5I). These data support our notion of a premature aging phenotype following ArpC4 loss in BMDCs.

      5) The study does not provide a mechanism: how the Arp2/3 complex would mediate the observed effects on DNA damage and repair has not been addressed in the cell model, and only potential scenarios from other non-myeloid cell lines are discussed. It remains unclear whether the observed phenotypes in Arpc4-depleted myeloid cells relate to the direct nuclear function of Arp2/3 or the cytosolic function of Arp2/3, including its roles in cytoskeletal regulation that may have secondary effects on the nuclear alterations. This is a general concern of the presented data; data on mechanism might require more than 6 months.

      Response: The referee is correct: our manuscript shows that Arp2/3 deficiency in specific myeloid cells impacts their survival in vivo and proposes that this could result at least in part from impaired maintenance of DNA integrity in these cells. We do not know whether this also applies to non-myeloid cells, which, although very interesting, is beyond the scope of the present study. In addition, we do not have any experimental tool to distinguish whether the DNA damage phenotype of Arpc4KO cells involves the nuclear or cortical pool of F-actin, which is why we have left this question open in the discussion of our manuscript.

      6) OPTIONAL: The authors make a strong case arguing that the increased interferon expression signature (based on the transcriptomics data) reflects the nuclear ruptures in Arpc4-deficient cells and adds to the observed phenotype. If this is so, what happens then in STING knockout cells in the presence of CK666 inhibitor?

      Response: During revision, we tested the putative role of STING in the ArpC4-KO phenotype. We found that abrogation of STING function in ArpC4KO mice did not rescue LC survival, excluding the possibility that aberrant STING activation triggers LC loss in these animals (revised Fig. S5E,F). Therefore, we tempered our conclusion (page 7).

      • Are the data and the methods presented in such a way that they can be reproduced?

      1) The analyses include quite a number of intensity calculations of immunofluorescence signals (Fig. 3D, E; Fig. 4E; Fig. 5B and 6B). The background stainings are often variable or very high. In some cases it is even unclear whether the stainings really detect protein and go beyond background staining (Fig. 6A, Fig. 5F). How were the immunofluorescence data acquired, and how were the different background staining intensities dealt with?

      Response: We extended our description of the microscopes used for image acquisition as well as the downstream analyses for each experiment, which indeed vary depending on the signals observed with distinct antibodies or constructs.

      2) It remained unclear to me on which basis the nuclear deformations in Fig. 3G, H were calculated?

      Response: We also extended the description of the methods used to quantify nuclear deformations.

      3) The detailed phenotype of control mice is a bit unclear. It appears as if these were Cre-negative animals. Did the authors have some proof-of-principle experiments showing that CD11cCre Arpc4 +/+ animals have comparable phenotypes to Cre-negative animals?

      Response: We have never observed any decline in LC numbers in other mouse lines/genotypes (for example in cPLA2flox/flox;CD11c-Cre mice shown in the manuscript, Fig. S6B), excluding a putative role for the Cre in LC death. To nevertheless thoroughly check this aspect, we have now quantified gH2AX immunostaining of LCs of both Cre-positive and Cre-negative animals. These analyses revealed no Cre-mediated effect on DNA damage in LCs (revised Suppl. Fig. 4E,F).

      • Are the experiments adequately replicated and statistical analysis adequate?

      For most experiments, the number of biological replicates (mice, or BMDC cultures from different mice) and individual values (n, cells) are indicated. Statistical analysis appears adequate.

      Minor comments:

      • Prior published studies on Arp2/3 function in immune cells are referenced accordingly. A number of additional pre-print manuscripts on this topic have not been cited and could be considered for referencing.

      Response: We have now cited additional relevant preprints and peer-reviewed work (page 12).

      • The text is very clearly and very well written. Figures are clear and accurate for most cases. There are some open questions:

      • Fig. 1B: The number of dots in the graph does not match the legend. The dots are not n=12 for both genotypes. Additionally: What do the symbols in the circles in the graph stand for? This is also unclear in another, later figure.

      • Fig. 2C: The current IF presentation (overlay MHCII with Ki67) is not very helpful. An additional image that shows only the Ki67 signal in the MHCII mask would be very helpful.

      • Fig. 4B: BMDCs of which culture day were used for these experiments?

      • Fig. 4A and D show the same representative cells for two biological messages, which is only moderately convincing regarding a "general" phenotype.

      • Fig. 5, B: Scale bars are missing.

      Response: We have fixed all these points (revised Fig. 1B, 2C, 4B, 4A&D, 5B).

      Reviewer #3 (Significance (Required)):

      Strengths and Advance:

      The study provides strong data and a very detailed analysis of how the Arp2/3 complex regulates stages of Langerhans cell development and homeostasis. The role of the Arp2/3 complex as a regulator of actin branching, which is involved in many cellular functions, has previously not been reported for this cell type. Previous research in immune cells has already studied the Arp2/3 complex, but studies were focussed on its role in migration, and the majority of published phenotypes related to cell migration. While there are already a number of in vitro studies showing that the Arp2/3 complex can regulate aspects of cell cycle control or cell death in non-immune cells, most of these studies were performed with immortalized, non-immune cell lines, which can be more easily manipulated to dissect mechanistic aspects of the cellular phenotype, but are limited in their physiological interpretation. Hence, it is a major strength of this study to investigate the effects of Arp2/3 in a primary immune cell type, directly in the native and physiological environment. This is important because in vitro data from other cell types cannot be easily extrapolated to any other cell type and it is critical for our understanding to collect physiological data from tissues, where the biology really happens. The finding that the Arp2/3 complex regulates the tissue residency of Langerhans cells through processes that are unrelated to migration is partially unexpected, shifting the view of this protein complex's physiological role to other cell biological processes, e.g. regulation of cell proliferation.

      Limitations: The limitations of the study are detailed in the five major points listed above. The study accumulates many experiments that characterize the phenotype of Arpc4-depleted cells, showing signs of DNA damage in Langerhans cells and cultures of BMDCs. How the Arp2/3 complex would mechanistically mediate the observed effects on DNA damage and repair has not been addressed. It also remains open whether this is due to the effects of the Arp2/3 complex in the nucleus or the cytosol, which would be biologically extremely important to understand. Beyond that, there are some discrepancies regarding the phenotype of the BMDC model, which matches neither the Langerhans cell phenotype in the tissue entirely (reduced proliferation, LCs derive from different progenitors) nor other endogenous DC populations, which should also derive from similar progenitors.

      Audience and reviewer background:

      In its current form, the manuscript will already be of interest for several research fields: Langerhans cell and dendritic cell homeostasis, immune cell trafficking, actin and cytoskeleton regulation in immune cells, physiological role of actin-regulating proteins. My own field of expertise is immune cell trafficking in mouse models, leukocyte migration and cytoskeletal regulation. I cannot judge the analysis and clustering of the bulk RNA sequencing data.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      • This is a study in experimental mice employing both in vitro and, importantly, in vivo approaches. EPIDERMAL LANGERHANS CELLS serve as a paradigm for the maintenance of homeostasis of myeloid cells in a tissue, epidermis in this case. In addition to well known functions of the ACTIN NETWORK in cell migration, chemotaxis, cell adherence and phagocytosis the authors reveal a critical function of actin networks in the survival of cells in their home tissue.

      • Actin-related proteins (Arp), specifically here the Arp2/3 complex, are necessary to form the filamentous actin networks. The authors use conditional knock-out mice where Arpc4 (an essential component of the Arp2/3 complex) is deleted under the control of CD11c, the most prominent dendritic cell marker, which is also expressed on Langerhans cells. In normal mice, epidermal Langerhans cells reside in the epidermis virtually life-long. They initially settle the epidermis around, and a few days after, birth and establish a dense network by a burst of proliferation, and then they "linger on" by low level maintenance proliferation. In the epidermis of Arpc4 knock-out mice Langerhans cells also start off with this proliferative burst but, strikingly, they do not stay but are massively reduced by the age of 8-12 weeks.

      • The analyses of this decline revealed that

      a) the shape (number of nuclear lobes) and integrity of cell nuclei was compromised; they were fragile and ruptured to some degree when Arpc4 was knocked out, i.e., the Arp2/3 complex was missing;

      b) DNA damage, as detected by staining for gamma-H2Ax or 53BP1 accumulated when Arpc4 was knocked out, i.e., the Arp2/3 complex was missing;

      c) recruitment of DNA repair molecules was inhibited when Arpc4 was knocked out, i.e., the Arp2/3 complex was missing;

      d) gene signatures of interferon signaling and response were increased when Arpc4 was knocked out, i.e., the Arp2/3 complex was missing;

      e) in vivo migration of dendritic cells and Langerhans cells from the skin to the draining lymph nodes in an inflammatory setting (FITC painting of the skin) was impaired when Arpc4 was knocked out, i.e., the Arp2/3 complex was missing;

      f) the persistence of the typical dense network of Langerhans cells in the epidermis, created by proliferation shortly after birth, is abrogated when Arpc4 was knocked out, i.e., the Arp2/3 complex was missing. Importantly, this was not the case for myeloid cell populations that settle a tissue without needing that initial burst of proliferation. For instance, numbers of colonic macrophages were not affected when Arpc4 was knocked out, i.e., the Arp2/3 complex was missing.

      • Thus, the authors conclude that the Arp2/3 complex is essential, through its formation of actin networks, to maintain the integrity of nuclei and ensure DNA repair, thereby ascertaining the maintenance proliferation of Langerhans cells and, as a consequence, the persistence of the dense epidermal network of Langerhans cells.

      • Up-to-date methodology from the fields of cell biology and cellular immunology (cell isolation from tissues, immunofluorescence, multiparameter flow cytometry, FISH, "good old" - but important - transmission electronmicroscopy, etc.) was used at high quality (e.g., immunofluorescence pictures!). Quantitative and qualitative analytical methods were timely and appropriate (e.g., Voronoi diagrams, cell shape profiling tools, Cre-lox gene-deletion technology, etc.). Importantly, the authors used a clever method, that they had developed several years ago, namely the analysis of dendritic cell migration in microchannels of defined widths. Molecular biology methods such as RNAseq were also employed and analysed by appropriate bioinformatic tools.

      Major comments:

      • ARE THE KEY CONCLUSIONS CONVINCING? Yes, they are.

      • SHOULD THE AUTHORS QUALIFY SOME OF THEIR CLAIMS AS PRELIMINARY OR SPECULATIVE, OR REMOVE THEM ALTOGETHER? No, I think it is ok as it stands. The authors are wording their claims and conclusions not apodictically but cautiously, as it should be. They point out explicitly which lines of investigation they did not follow up here.

      • WOULD ADDITIONAL EXPERIMENTS BE ESSENTIAL TO SUPPORT THE CLAIMS OF THE PAPER? REQUEST ADDITIONAL EXPERIMENTS ONLY WHERE NECESSARY FOR THE PAPER AS IT IS, AND DO NOT ASK AUTHORS TO OPEN NEW LINES OF EXPERIMENTATION. I think that the here presented experimental evidence suffices to support the conclusions drawn. No additional experiments are necessary.

      • ARE THE SUGGESTED EXPERIMENTS REALISTIC IN TERMS OF TIME AND RESOURCES? IT WOULD HELP IF YOU COULD ADD AN ESTIMATED COST AND TIME INVESTMENT FOR SUBSTANTIAL EXPERIMENTS. Not applicable.

      • ARE THE DATA AND THE METHODS PRESENTED IN SUCH A WAY THAT THEY CAN BE REPRODUCED? Yes, they are.

      • ARE THE EXPERIMENTS ADEQUATELY REPLICATED AND STATISTICAL ANALYSIS ADEQUATE? Yes.

      Minor comments:

      • SPECIFIC EXPERIMENTAL ISSUES THAT ARE EASILY ADDRESSABLE. None

      • ARE PRIOR STUDIES REFERENCED APPROPRIATELY? Essentially yes. Regarding the reduction / loss of the adult epidermal Langerhans cell network, it may be of some interest to also refer to / discuss another one of the few examples of this phenomenon. There, the initial burst of proliferation is followed by reduced proliferation and increased apoptosis when a critical member of the mTOR signaling cascade is conditionally knocked out (Blood 123:217, 2014).

      • ARE THE TEXT AND FIGURES CLEAR AND ACCURATE? Yes they are. Figures are well arranged for easy comprehension.

      • DO YOU HAVE SUGGESTIONS THAT WOULD HELP THE AUTHORS IMPROVE THE PRESENTATION OF THEIR DATA AND CONCLUSIONS?

      • Materials & Methods. The authors write, regarding flow cytometry of epidermal cells: "Briefly, 1cm2 of back skin from 8-14 weeks old female wild-type and knockout littermates was dissociated in 0.25 mg/mL Liberase (Sigma, cat. #5401020001) and 0.5 mg/mL DNase (Sigma, cat.#10104159001) in 1 mL of RPMI (Sigma) and mechanically disaggregated in Eppendorf tubes, FOLLOWED BY INCUBATED for 2 h at 37 °C." Followed by what?

      • Materials & Methods. BMDC electronmicroscopy. What is "IF"? Please specify.

      • RESULTS in gene expression analyses. The authors observe some increase in apoptosis (as detected by cleaved-Caspase-3 staining). Is this observation in immunofluorescence also evident in the RNAseq data (where the IFN changes were seen), i.e., in Figure 5?

      • Figure 7 F and G. Perhaps the authors may want to swap upper and lower panels in F or G, so that macrophage FACS plots and bar graphs are in the same row and, obviously, DC plots and bars likewise.

      • Figure 7H. "Gating strategy in ArpC4WT Lung (previously gated in Live CD45+ cells)" - The lower row is knock-out, not WT. This is indicated correctly in the legend, but in the figure both rows are labeled as WT.

      • The reference by Park et al. 2021 is missing in the list.

      • Figure 1D. Sure, the bar graphs are meant to say "CD11c"? The FACS plots show "CD11b".

      • As to cDC1. In Figure 1D the FACS plot shows an absence of CD103+ cDC1 cells. In contrast, in Figure 7A (left side panel), there is no difference in cDC1 cells between WT and KO mice. Is the flow cytometry plot in Figure 1D therefore not representative regarding cDC1 cells?

      Significance

      • DESCRIBE THE NATURE AND SIGNIFICANCE OF THE ADVANCE (E.G. CONCEPTUAL, TECHNICAL, CLINICAL) FOR THE FIELD. This is a conceptual advance. It adds a big step to our understanding of how immune cells in tissues (which all come from the bone marrow or are seeded before birth from embryonal hematopoietic organs such as yolk sac and fetal liver) can remain resident in these tissues. For cell types such as Langerhans cells, which establish their final population density within their tissues of residence, the presented findings convincingly buttress the role of proliferation and thereby the role for the actin-related protein complex 2/3 (Arp2/3).

      • PLACE THE WORK IN THE CONTEXT OF THE EXISTING LITERATURE (PROVIDE REFERENCES, WHERE APPROPRIATE). While we know much about actin-related proteins (Arp), as correctly cited by the authors, this knowledge is derived mostly from in vitro studies. The submitted study translates the findings to an in vivo setting for the first time.

      • STATE WHAT AUDIENCE MIGHT BE INTERESTED IN AND INFLUENCED BY THE REPORTED FINDINGS. Skin immunologists foremost, but these findings are of interest to the entire community of immunologists, but also cell biologists.

      • DEFINE YOUR FIELD OF EXPERTISE. My expertise is in skin immunology, in particular skin dendritic cells including Langerhans cells.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Shahbazi et al used a recurrent neural network model trained to control a musculoskeletal model of the arm to investigate how neural populations accommodate activity patterns underpinning savings. The paper draws upon the recent finding of a "uniform shift" in preparatory activity in monkey motor cortex associated with savings, and leverages full access to a computational model to establish causality.

      Strengths:

      The paper is well written, and the figures are clearly presented. The key finding that the uniform shift first reported based on neural recordings by Sun et al. emerges in artificial neural networks performing a similar task is interesting and well-backed by their analyses. Manipulating this uniform shift to show that it drives behavioural savings is an important causal confirmation of the proposal by Sun et al.

      Weaknesses / Comments:

      As mentioned earlier, the core results are well backed by the analyses. Most of my comments relate to adding more controls and additional questions that could be explored with the model to strengthen the paper.

      (1) Savings are quantified as more rapid relearning of the FF upon re-exposure (e.g., Figure 3). This finding is based on backpropagation through time, but would this hold when using a different optimiser, e.g., FORCE?

      This is an interesting question, and indeed, there are an increasing number of studies addressing how different neural network learning rules may affect the kinds of representations that arise after learning (Codol et al., 2024). However, the focus of the present paper is not on which neural network approaches or specific optimisers produce savings; rather, it is on the basis and neural geometry of savings when it emerges.

      We have added a short paragraph to the Discussion section [lines 349-355] to address this:

      “The present results are based on RNNs trained in an error-based approach using backpropagation through time (Werbos, 1990) using the Adam optimizer (Kingma and Ba, 2014). Other techniques for training RNNs have been proposed including the FORCE algorithm (Sussillo and Abbott, 2009). In addition, several recent reports have demonstrated success using reinforcement learning approaches to train neural networks in the context of sensorimotor control tasks (Lillicrap et al., 2015; Codol et al., 2024a). An interesting avenue for future work is to determine how the present results may or may not generalize to different neural network architectures and learning rules.”
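      For readers less familiar with the optimizer named in this paragraph, the following is a minimal sketch of a single Adam update (Kingma and Ba, 2014) applied to a toy quadratic loss. It is not the authors' training code: the function name `adam_step`, the toy loss, and all hyperparameter values are assumptions chosen purely for illustration.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; returns the new parameters and moment estimates.

    t is the 1-based step counter used for bias correction.
    """
    m = beta1 * m + (1.0 - beta1) * grad        # first-moment (mean) estimate
    v = beta2 * v + (1.0 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)              # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)              # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy problem (hypothetical): minimise (theta - 3)^2 starting from theta = 0.
theta = np.array([0.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 5001):
    grad = 2.0 * (theta - 3.0)                  # analytic gradient of the toy loss
    theta, m, v = adam_step(theta, grad, m, v, t)
```

      In an RNN trained with backpropagation through time, `grad` would instead be the gradient of the task loss with respect to the network weights, accumulated over the unrolled time steps.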

      (2) The authors should include a "null model" showing that training on a different reaching task following NF, as opposed to FF2, won't show something akin to a uniform shift during preparation due to the adoption of TDR and having similar targets.

      This is a critical point. Training on a different reaching task other than FF2 (e.g. a different force field) will indeed result in a uniform shift, but critically, a shift in a different direction in neural state space than the uniform shift associated with FF2. The central focus of the present paper is to show that when there remains a non-zero projection of preparatory neural activity along the direction of the uniform shift associated with a given learning task, this residual projection underlies savings when networks are subsequently re-exposed to the same task.
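      The idea of a residual projection along the uniform-shift direction can be sketched as follows. This is a simplified illustration, not the analysis used in the paper: it approximates the uniform shift as the across-target mean displacement of preparatory states, and the function names, array shapes, and synthetic data are all assumptions.

```python
import numpy as np

def uniform_shift_axis(prep_before, prep_after):
    """Unit vector along the mean across-target displacement of preparatory states.

    Both inputs have shape (n_targets, n_units).
    """
    shift = (prep_after - prep_before).mean(axis=0)
    norm = np.linalg.norm(shift)
    if norm == 0.0:
        raise ValueError("preparatory states did not shift")
    return shift / norm

def residual_projection(prep_baseline, prep_test, axis):
    """Mean displacement from baseline, projected onto the shift axis."""
    return float((prep_test - prep_baseline).mean(axis=0) @ axis)

# Synthetic example (hypothetical numbers): 8 targets, 64 hidden units.
rng = np.random.default_rng(0)
n_targets, n_units = 8, 64
baseline = rng.normal(size=(n_targets, n_units))   # preparatory states in NF
true_dir = np.zeros(n_units)
true_dir[0] = 1.0
learned = baseline + 2.0 * true_dir                # after learning the force field
washout = baseline + 0.5 * true_dir                # after washout: partial return

axis = uniform_shift_axis(baseline, learned)
residual = residual_projection(baseline, washout, axis)
```

      In this toy setup a non-zero `residual` after washout corresponds to the retained component that the response describes as underlying savings; a shift learned for a different task would define a different `axis`.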

      In the Results section we had included a short paragraph to describe control simulations that we performed that address this concept. We have expanded this text and added a Figure and the results of statistical tests to better describe this control [lines 179-187]:

      “As an additional control we trained networks after the growing up phase on an opposing force field (CCW) and then as above, exposed the networks to a NF washout phase, and then to a CW force field. In this case no savings was observed in the CW force field, either for initial lateral deviation, or for learning rate (Figure 3). In fact, we observed that initial lateral deviation is larger for the novel force field (t(39)=-4.918, p=1.6e-5). This observation is in line with the finding that learning opposing force fields sequentially results in interference (Sun et al., 2022). The results of these control simulations underscore that the savings effect observed in our main study was learning-specific—it was due to prior learning of the CCW force field, and not a general effect of learning any novel dynamics.”
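      The reported statistic, t(39), has 39 degrees of freedom, which would be consistent with a paired comparison across 40 networks, although the test design is not stated here. Under that assumption, a paired t statistic can be sketched as below; the function name and the small example arrays are hypothetical and are not the study's data.

```python
import numpy as np

def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom for x versus y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.ndim != 1:
        raise ValueError("x and y must be 1-D arrays of equal length")
    d = x - y                                     # per-network difference
    n = d.size
    se = d.std(ddof=1) / np.sqrt(n)               # standard error of the mean difference
    return float(d.mean() / se), n - 1

# Hypothetical lateral deviations for four networks in two conditions.
t_stat, dof = paired_t([2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0, 4.0])
```

      With 40 networks per condition the same function would return 39 degrees of freedom; the p-value would then come from the t distribution with that many degrees of freedom.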

      (3) The analyses of network activity during movement preparation (Figure 4) nicely replicate the key finding in Sun et al, but I think the authors could leverage the full access to their network and go further, e.g., by examining changes (or the lack of) during execution in FF2 with respect to FF (and perhaps in a future NF2 with respect to NF), including whether execution activity also lives in parallel hyperplanes, etc.

      We agree that a visualization of the neural activity during movement would be beneficial to the reader. To address this we have added a new Figure (Fig. 6) and associated text [lines 210-219]. The Figure shows the neural trajectories when the RNNs are first exposed to the FF1 and when they are first exposed to FF2 (after NF2 washout). Trajectories are plotted in 3D corresponding to the first 3 principal components, starting at the go cue and ending 200 ms into the movement, for each of the 8 movement targets.

      “The neural trajectories for preparation and for movement can be visualized in principal component space. Figure 6 shows trajectories during planning and early execution for initial FF1 and FF2 exposure. Hidden unit activity was subjected to a principal components analysis, and neural trajectories within the first three PCs are shown for movements to each of the eight movement targets. Filled circles indicate neural state 200 ms prior to the go cue. During the preparatory period trajectories travel along PC1 and then disperse across PC2 and PC3 into the circular pattern indicated by the filled stars, which indicate time of the go cue (also see Figure 5A). After the go cue neural trajectories shift back along PC1 and rotate along oscillatory patterns characteristic of populations of motor cortical neurons in non-human primates during movement (Churchland and Shenoy, 2024).”
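      A minimal sketch of this kind of projection is given below, assuming hidden-unit activity is stored as an array of shape (targets, time steps, units). The function name `pca_trajectories`, the array sizes, and the random stand-in data are assumptions for illustration; the authors' own preprocessing may differ.

```python
import numpy as np

def pca_trajectories(activity, n_components=3):
    """Project hidden-unit activity onto its leading principal components.

    activity: array of shape (n_targets, n_time, n_units).
    Returns the trajectories, shape (n_targets, n_time, n_components),
    and the fraction of variance explained by each retained component.
    """
    n_targets, n_time, n_units = activity.shape
    flat = activity.reshape(-1, n_units)          # pool targets and time points
    centered = flat - flat.mean(axis=0)           # centre each unit
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]                # rows are principal axes
    explained = (s ** 2) / np.sum(s ** 2)
    projected = centered @ components.T
    return projected.reshape(n_targets, n_time, n_components), explained[:n_components]

# Random stand-in data (hypothetical): 8 targets, 40 time steps, 32 hidden units.
rng = np.random.default_rng(1)
hidden = rng.normal(size=(8, 40, 32))
traj, explained = pca_trajectories(hidden)
```

      Each `traj[i]` is then one target's trajectory in the space of the first three PCs, which is the kind of curve a 3D plot such as the one described would draw.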

      (4) Related to the above, while the results are interesting and the paper is well done, I kept wishing that the authors had done "more" with their model. This could be one or two final sections on "predictions" that would nicely complement their "validation" of the uniform shift, and that, in my opinion, would greatly increase the impact of the paper. In particular:

      (a) What would be the effect of learning more "tasks"? For example, is there a limit on how many fields can be learned? (You show something related by manipulating network size, but this is slightly different.)

      These are interesting questions and to some extent they are already addressed in the paper. Of course, the number of tasks that a network is able to learn will be related to how much those tasks overlap in a control space. Indeed, this idea goes back to early theoretical accounts of connectionist models such as Hopfield nets and their capacity for representing information (Hopfield, 1982; Hopfield et al., 1983). The control simulations that we described in the paper [lines 179-187 and Figure 4] are a test of one extreme version of this, in which two tasks are in direct opposition to each other (opposite force fields), and in this situation no savings emerges. We believe a comprehensive exploration of the nature of task overlap in upper limb reaching learning tasks is an interesting question, but one that is beyond the scope of the present paper.

      (b) Figure 5 is a nice causal demonstration that the uniform shift is related to savings. However, and related to comment #3, it'd be interesting to see more details about how the behaviour and the network activity change as preparatory activity shifts along this axis, in particular regarding how moving the preparatory states affects the organisation and dynamics of upcoming execution activity. These are the kind of intuitions that modelling studies like this one can provide.

      This has been addressed above by the changes we made to address the reviewer’s comment #3.

      (c) The authors focus on a task design that spans baseline, FF, NF, FF2 to replicate the original study by Sun et al. However, it would be interesting if they generated predictions for neural changes to other types of tasks that have been studied behaviourally. These could include, for example: (i) modelling a visuomotor rotation or a mirror reversal task; (ii) having to adapt to a FF in the opposite direction; (iii) investigating the role of adding an explicit context and having the networks learn multiple FF; and (iv) trying to learn FF fields in opposite directions, perhaps restricted to specific targets. As the authors know, all these questions and more have been studied with similar behavioural paradigms, and it would be nice to see what neural predictions are generated by this model.

      See responses above, e.g. to comment 4. We have clarified the text and provided a new Figure to illustrate our opposite FF control simulations. The other suggestions about visuomotor rotations and contextual cues are interesting and potentially important questions that we are working on, but we believe they are beyond the scope of the current paper, which is focused specifically on the question of savings in FF learning.

      (5) On the Discussion: When extrapolating from neural network results to animals, the fact that your networks can learn implicitly doesn't mean that animals do learn implicitly. Indeed, I think the consensus view is that different perturbations may lead to the expression of different types of savings (e.g., FF vs VR, which seems to be more explicit). Besides, these different mechanisms may be primarily implemented by brain regions less directly tied to motor control (e.g., cerebellum, parietal cortex?), which are not directly implemented in the authors' model.

      Of course the reviewer is correct that our simulations are not evidence that savings in motor tasks learned by animals is only implicit, and we do not make any such claims in the paper. The model we describe in the present paper is not meant to be a comprehensive model of motor learning in humans/animals. Indeed, the pure “context free” type of learning that we implement in our simulations basically cannot occur in animals, because there is always some information that provides contextual information. Indeed there are computational models of motor learning that include these effects, e.g. the COIN model (Heald et al., 2021). Our model however provides a useful window into what the context-free component of savings may look like. The approach we describe in the present paper is a powerful way to probe the context-free component of savings in isolation in a way that is not possible (at least not readily) in animals/humans. We have modified the text in the Discussion [lines 372-379] to better articulate this point.

      “The simulations described here do not constitute evidence that savings in motor learning tasks is exclusively implicit in animals and humans. The purely context-free learning implemented in our simulations is highly unrealistic, as some form of contextual information is invariably available. Indeed, computational models of motor learning that incorporate contextual effects already exist, e.g. (Heald et al. 2021). Nevertheless, our simulations provide a useful window into what the context-free component of savings may look like. This approach offers a powerful means of probing the context-free component of savings in isolation—something that is not readily achievable in animal or human experiments.”

      Reviewer #2 (Public review):

      Summary:

      Shahbazi et al. trained recurrent neural networks (RNNs) to simulate human upper limb movement during adaptation to a force field perturbation. They demonstrated that throughout adaptation, the pattern of motor commands to the muscles of the simulated arm changed, allowing the perturbed movements to regain their typical, perturbation-free straight-line paths. After this initial learning block (FF1), the network encountered null-fields to wash out the adaptation, before re-experiencing the force in a second learning block (FF2). Upon re-exposure, the network learned faster than during initial learning, consistent with the savings observed in behavioral studies of adaptation. They also found that as the number of hidden units in the RNN increased, so did the probability of exhibiting savings. The authors concluded that these results propose a neural basis for savings that is independent of context and strategic processes.

      Strengths:

      The paper addresses an important and controversial topic in motor adaptation: the mechanism underlying motor memory. The RNN simulation reproduces behavioral hallmarks of adaptation, and it provides a useful illustration of the pattern of muscle activity underlying human-like movements under both normal and perturbing conditions. While the savings effect produced by the network, though significant, appears somewhat small, the simulation demonstrating an increase in savings with a greater number of hidden units is particularly intriguing.

      Weaknesses:

      (1) To be transparent, savings in motor adaptation have been a primary focus of my own research. Some core findings presented in this paper are at odds with the ideas I and others have previously put forward. While I don't want to impose my agenda on the authors of this paper, I do think the authors should address these issues.

      (a) The authors acknowledge the ongoing debate in the literature regarding the mechanisms underlying savings, particularly whether it stems from explicit or implicit learning processes. However, it remains unclear how the current work addresses this debate. There is already a considerable body of research, particularly in visuomotor adaptation, demonstrating that savings is predominantly driven by explicit strategies. For example, when people are asked to report their strategy, they recall a strategy that was useful during the first learning block (Morehead et al. 2015). Furthermore, savings are abolished under experimental manipulations designed to eliminate strategic contributions (e.g., Haith et al., 2015; Huberdeau et al., 2019; Avraham et al., 2021). The authors briefly state that their findings support the hypothesis that a neural basis of memory retention underlying savings can be independent of cognitive or strategic learning components, and that savings can be characterized as implicit. While these statements may be true, it is not clear how this work substantiates these claims.

      We have addressed a similar point raised by Reviewer 1, see point #5 above. Our work represents an example of how savings can occur from implicit mechanisms in the absence of explicit contextual cues. Our goal is not to resolve the debate about how this occurs in humans/animals. Rather, our model provides a useful window into what the context-free component of savings may look like. Our approach is a powerful way to probe the context-free component of savings in isolation in a way that is not possible (at least not readily) in animals/humans. We have modified the text in the Discussion [lines 372-379] to better articulate this point.

      “The simulations described here do not constitute evidence that savings in motor learning tasks is exclusively implicit in animals and humans. The purely context-free learning implemented in our simulations is not meant to be a full model of biological learning, as in biological systems some form of contextual information is invariably available. Indeed, computational models of motor learning that incorporate contextual effects already exist, e.g. (Heald et al. 2021). Nevertheless, our simulations provide a useful window into what the context-free component of savings may look like. This approach offers a powerful means of probing the context-free component of savings in isolation—something that is not readily achievable in animal or human experiments.”

      (b) Our research has also demonstrated that if implicit adaptation is completely washed out after the initial learning block, it not only fails to exhibit savings but is actually attenuated relative to the first learning block (Avraham et al., 2021). This phenomenon of attenuation upon relearning can also be seen in other studies of visuomotor adaptation (e.g., Leow et al., 2020; Yin and Wei, 2020; Hamel et al., 2021; Hamel et al., 2022; Wang and Ivry, 2023; Hadjiosif et al., 2023). More recently, we have shown that this attenuation is due to anterograde interference arising from experience with the washout block (Avraham and Ivry, 2025). We illustrated that the implicit system is highly susceptible to interference; it doesn't require exposure to salient opposite errors and can occur even following prolonged exposure to veridical feedback. The central thesis of this paper, namely that implicit savings can emerge through RNNs, is at odds with these empirical results. The authors should address this discrepancy.

      These empirical results are interesting and intriguing, and we agree that they are relevant in the context of the debate about the relative contributions and interactions between explicit and implicit learning systems and savings. Importantly, contextual interference is impossible in our model, since there are no contextual cues about which force field is present or absent. Interactions between an explicit system and an implicit learning system are also impossible in our model, since there is no possibility of context-driven explicit learning or memory. The approach we have taken in the present paper is not to model a full explicit plus implicit learning system but rather to probe how savings may emerge from a purely implicit learning mechanism alone and to compare the neural geometry underlying this implicitly driven savings to the neural recording results from monkey electrophysiology studies. Nevertheless, we have added some text to the Discussion [lines 380-391] to situate our findings in the context of the studies mentioned above by the reviewer.

      “Recent empirical work suggests that relearning after washout of implicit adaptation can be attenuated rather than facilitated, a phenomenon attributed to anterograde interference from the washout phase (Avraham et al., 2021; Hadjiosif et al., 2023; Hamel et al., 2022, 2021; Leow et al., 2020; Wang and Ivry, 2025; Yin and Wei, 2020). The savings observed in our simulations differs from these behavioral findings. Crucially, our model excludes both contextual interference (since no cues signal which force field is present) and explicit-implicit interactions (since context-driven explicit learning is absent). Our goal was not to model a complete explicit-implicit system, but rather to probe how savings may emerge from a purely implicit mechanism and to compare the underlying neural geometry to monkey electrophysiology data. Our results suggest that high-dimensional neural circuits possess an intrinsic capacity for savings via persistent preparatory traces. How and when this capacity may be masked by interference or explicit-implicit interactions in biological systems remains an open question for future work.”

      (2) This brings me to the question about neural correlates: The results are linked to activity in the primary motor cortex. How does that align with the well-established role of the cerebellum in implicit motor adaptation? And with the studies showing that savings are due to explicit strategies, which are generally associated with prefrontal regions?

      The modeling approach we use in the present paper is area agnostic, and we do not include different neural modules to represent specific brain areas such as cerebellum or prefrontal regions. In the current approach we specifically exclude explicit strategies, as a way to specifically probe implicit mechanisms alone. Also see response to reviewer 1 comment 5 above.

      (3) The analysis on the complexity of the neural network (i.e., the number of hidden units) and its relationship to savings is very interesting. It makes sense to me that more complex networks would show more savings. I'm not sure I follow the authors' explanation, but my understanding is that increased network complexity makes it more difficult to override the formed memory through interference (e.g., from the experience with NF2). Also, the results indicate that a network with 32 units led to a less-than-chance level of networks exhibiting savings (Figure 3b). What behavioral output does this configuration produce? Could this behavior manifest as attenuation upon relearning? Furthermore, if one were to examine an even smaller, simpler network (perhaps one more closely reflecting cerebellar circuits), would such a model predict attenuation rather than savings?

      These are interesting and potentially important questions for future work to explore. Our interpretation of the results from smaller networks is that these small RNNs fail to show savings presumably because the learned FF behavior is 'erased' during washout, owing to their limited capacity to retain the FF learning in a distinct neighborhood of neural state space. Our paper is focused specifically on the relationship between savings, implicit learning, and neural capacity via network size, in the context of the monkey electrophysiology results in motor cortex. It would be interesting in future work to explore a cerebellar-like modeling approach.

      (4) The authors emphasize that their network did not receive any explicit contextual signals related to the presence or absence of the force field (FF), thus operating in a 'context-free' manner. From my understanding, some existing models of context's role in motor memories (e.g., Oh and Schweighofer, 2019; Heald et al., 2021) propose that memory-related changes can be observed even without explicit contextual information, as contextual changes can be inferred from sudden or significant environmental shifts (e.g., the introduction or removal of perturbations). Given this, could the observed savings in the current simulation be explained by some form of contextual retrieval, inferred by the network from the re-presentation of the perturbation in FF2?

      It is important to note that this is not possible in the context of the modeling approach described in the present paper. For example, in trial 1 of FF2, because the network has no contextual cue signaling the FF’s presence, the network has no information before movement begins that a FF will be present during movement (recall that the FF is velocity-dependent, and so is zero before movement begins). Once the network encounters the FF during movement, some component of its response could arguably be described as contextual inference derived from effector state (similar to the account described in the COIN model), but strictly speaking the model is only responding to what it encounters in the moment. Any change in behaviour due to prior learning (e.g. savings) is due to the interaction between the residual learning-related neural state (e.g. the uniform shift), the effector state in the moment, and the errors encountered during movement. We don’t interpret this as “inference” in the traditional sense of an explicit learning system.
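
      The point that a velocity-dependent field exerts no force before movement onset can be illustrated with a minimal sketch. It assumes the commonly used curl-field form, in which force is proportional to hand velocity rotated by 90 degrees; the response does not state the exact field equation, and the gain value and function name below are hypothetical.

```python
import numpy as np

# Hypothetical field gain in N*s/m; the response does not report the value used.
FIELD_GAIN = 10.0

def curl_field_force(velocity, gain=FIELD_GAIN):
    """Force from an assumed velocity-dependent curl field, F = gain * R @ v,
    where R rotates the hand velocity by 90 degrees."""
    rotation = np.array([[0.0, 1.0], [-1.0, 0.0]])
    return gain * rotation @ np.asarray(velocity, dtype=float)

# Before movement onset the hand is at rest, so the field exerts no force.
force_at_rest = curl_field_force([0.0, 0.0])

# During a forward reach (0.3 m/s along y) the force is purely lateral (along x).
force_mid_reach = curl_field_force([0.0, 0.3])
```

      Under this assumed form, nothing about the field is observable during the preparatory period; the perturbation only appears once the hand has nonzero velocity.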

      (5) If there is residual hidden unit activity related to the FF at the end of the NF2 phase, how does the simulated movement revert back to baseline? Are there any differences in the movement trajectory, beyond just lateral deviation, between NF1 and NF2? The authors state that "changes in the preparatory hidden unit activity did not result in substantive changes in the motor commands (Figure 5b), which emphasizes that the uniform shift resides in the null space of motor output." However, Figure 5b appears to show visible changes in hidden unit activity. Don't these changes reflect a pattern of muscle activity that is the basis for behavior? These changes are indeed small, but it seems that so is the effect size for savings (Figure 3a). Could this suggest that there is not, in fact, a complete washout of initial learning during NF2 within the network?

      This is precisely the point of the paper, i.e. to show that neural activity during the preparatory period before movement onset is different, even though the behaviour during the preparatory period is the same (i.e. no muscle activity and no movement). This recapitulates the empirical findings from the neural data reported in the Sun et al. (2022) paper.

      The reviewer asks “Don't these changes reflect a pattern of muscle activity that is the basis for behavior?” Yes indeed they do, but not during the NF and not during the preparatory activity prior to movement onset.

      The reviewer asks “Could this suggest that there is not, in fact, a complete washout of initial learning during NF2 within the network?” We addressed this in the paper (Results/Washout) by comparing kinematics after washout to those prior to FF learning; e.g. any differences in lateral deviation of the hand path over the entire reach trajectory were in the range of 0.1 mm, which is less than 0.25% of the lateral deviation encountered in the FF and only 0.1% of the reach distance (10 cm).
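
      The percentages quoted above can be checked with a few lines of arithmetic. The 0.1 mm residual and the 10 cm reach distance come from the response; the lower bound on the FF lateral deviation is derived here from the "less than 0.25%" statement and is not a value reported by the authors.

```python
import math

residual_deviation_mm = 0.1   # post-washout difference in lateral deviation (from the response)
reach_distance_mm = 100.0     # 10 cm reach distance (from the response)

# Residual deviation expressed as a percentage of the reach distance.
residual_pct_of_reach = 100.0 * residual_deviation_mm / reach_distance_mm

# If 0.1 mm is less than 0.25% of the FF lateral deviation, that deviation
# must exceed this bound (a derived value, not one reported in the source).
implied_min_ff_deviation_mm = residual_deviation_mm / 0.0025
```

      The residual is 0.1% of the reach distance, as stated, and the 0.25% figure implies a lateral deviation in the FF of more than about 40 mm.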

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 1c, lower panel: Is this from the early or late stage of FF1?

      This is an example movement after learning in a null field (NF). We have clarified this in the Figure caption.

      (2) Please clarify what the two panels in Figure 1e represent.

      We have clarified in the Figure caption that these are activity from two example hidden units.

      (3) If Figure 2c is intended to illustrate the changes in motor commands for individual muscles, consider reorganizing the plots by muscle to more clearly show the change for each muscle from NF1 to FF1.

      The point here is not to make fine-grained comparisons between specific muscles, but rather to show a general example of how muscle activity differs. For the sake of visual simplicity in a Figure that already has many components, we have decided to keep Figure 2c the same.

      (4) The text mentions that no savings were observed when the network was trained on CCW followed by CW perturbations. However, no data or statistical analysis is presented to support this claim. I wonder if the authors would expect attenuated learning when exposed to the CW perturbation, given a memory of the opposite perturbation.

      We have added a Figure to provide data for the FF opposite control.

      (5) The relevance of the discussion on choking under pressure to the paper wasn't clear.

      We have modified the relevant text in the Discussion section [lines 356-363] to clarify the relevance of the present work to other recent work on how complex features of motor behaviour can arise due to the dynamics of preparatory neural activity in motor cortex.

      References

      Avraham G, Morehead JR, Kim HE, Ivry RB. 2021. Reexposure to a sensorimotor perturbation produces opposite effects on explicit and implicit learning processes. PLoS Biol 19:e3001147. doi:10.1371/journal.pbio.3001147

      Codol O, Krishna NH, Lajoie G, Perich MG. 2024. Brain-like neural dynamics for behavioral control develop through reinforcement learning. bioRxiv. doi:10.1101/2024.10.04.616712

      Hadjiosif AM, Morehead JR, Smith MA. 2023. A double dissociation between savings and long-term memory in motor learning. PLoS Biol 21:e3001799. doi:10.1371/journal.pbio.3001799

      Hamel R, Dallaire-Jean L, De La Fontaine É, Lepage JF, Bernier PM. 2021. Learning the same motor task twice impairs its retention in a time- and dose-dependent manner. Proc Biol Sci 288:20202556. doi:10.1098/rspb.2020.2556

      Hamel R, Lepage J-F, Bernier P-M. 2022. Anterograde interference emerges along a gradient as a function of task similarity: A behavioural study. Eur J Neurosci 55:49–66. doi:10.1111/ejn.15561

      Heald JB, Lengyel M, Wolpert DM. 2021. Contextual inference underlies the learning of sensorimotor repertoires. Nature 600:489–493. doi:10.1038/s41586-021-04129-3

      Hopfield JJ. 1982. Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci U S A 79:2554–2558. doi:10.1073/pnas.79.8.2554

      Hopfield JJ, Feinstein DI, Palmer RG. 1983. “Unlearning” has a stabilizing effect in collective memories. Nature 304:158–159. doi:10.1038/304158a0

      Leow L-A, Marinovic W, de Rugy A, Carroll TJ. 2020. Task errors drive memories that improve sensorimotor adaptation. J Neurosci 40:3075–3088. doi:10.1523/JNEUROSCI.1506-19.2020

      Wang T, Ivry RB. 2025. Contextual effects during sensorimotor adaptation are an emergent property of population coding in a cerebellar-inspired model. Sci Adv 11:eadr4540. doi:10.1126/sciadv.adr4540

      Yin C, Wei K. 2020. Savings in sensorimotor adaptation without an explicit strategy. J Neurophysiol 123:1180–1192. doi:10.1152/jn.00524.2019

    1. Problem Ownership Problem ownership is an important tool to utilize when caregivers are communicating with children because it can help avoid blaming and arguing. This is when caregivers take time to reflect on an issue and think, “Whose problem is this? Who is actually upset about this?” Sometimes we may think the child is the one with the problem when actually we are the ones getting upset. In reality, the child is just fine – we are the ones that have a problem. This is when a caregiver should own the problem. If a caregiver owns the problem, it is a perfect opportunity to utilize effective communication strategies such as I-Messages to express one’s thoughts and feelings regarding the problem. If, however, the child owns the problem, caregivers can use this as a chance to practice adult-child interaction techniques such as active listening and the CALM method to connect with the child concerning the problem. Problem ownership helps caregivers determine which problems they need to figure out themselves, and which problems they should allow their children to figure out. This provides a learning experience to gain responsibility for one’s actions that can be utilized in other relationships as well.

      Problem ownership is an important communication tool: parents take a moment to think about who is truly upset by an issue, themselves or the child. It is used to prevent unnecessary conflict, and it allows the child to learn responsibility for their actions because the parent offers advice but doesn't take full control of the issue.

      Key takeaway: As a parent, it's easy to get upset on behalf of the child and try to take responsibility for the problem, but the parent should instead help the child reflect on it and solve it themselves.

      Example: A child tells their parent about a problem and the parent immediately wants to solve it themselves, but they realize they're getting too worked up about it and instead ask the child to tell them what happened and give advice on how to go about it.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Using an effort-exertion task and an effort-based decision-making task while recording brain dynamics with EEG, the authors test the hypothesis that the brain processes reward outcomes for effort differently when they are earned for oneself versus others.

      Strengths:

      The strengths of this experiment include what appears to be a novel finding of opposite signed effects of effort on the processing of reward outcomes when the recipient is self versus others. Also, the experiment is well-designed, the study seems sufficiently powered, and the data and code are publicly available.

      Weaknesses:

      There is some concern about the fact that participants report feeling less subjective effort, but also more disliking of tasks, when they were earning rewards for others versus self. The concern is that participants worked with less vigor during other- versus self-benefiting trials and this may partly account for a key two-way Recipient x Effort interaction on the size of the Reward Positivity EEG component. Of note, participants took longer to complete tasks when working for others. While it is true that, in all cases, participants met the requisite task demands (they pressed the required number of buttons), they did so more sluggishly when earning rewards for others. The Authors argue that this reflects less motivation when working for others, which is a plausible explanation. The Authors also try to rule out this diminished vigor as a confounding explanation by showing that the two-way interaction remains even when including reaction times (and also self-reported task liking) as covariates. Nevertheless, it is possible that covariates do not fully account for the effects of differential motivation levels which would otherwise explain the two-way interaction. As such, I think a caveat is warranted regarding this particular result.

      We thank Reviewer #1 for the continued positive assessment and for continuing to highlight the caveat regarding the potential influence of differential vigor on the observed RewP interaction effects.

      We agree that a caveat is warranted. As detailed in our previous response (R5), we had already conducted control analyses addressing this concern; however, we acknowledge that these results were not incorporated into the manuscript itself. We have now addressed this by adding the covariate analyses to the Results section, along with an explicit caveat in the Discussion.

      Before describing the specific revisions, we would like to offer a minor clarification: the covariates in our control analyses were trial-by-trial response speed and self-reported effort ratings, rather than task liking ratings as noted in the summary above. Neither response speed nor effort rating predicted RewP amplitudes, and the critical Recipient × Effort and Recipient × Effort × Magnitude interactions remained significant and essentially unchanged. However, as the reviewer rightly pointed out, covariates may not fully capture the effects of differential motivation. Specifically, we have made the following revisions:

      First, we added the covariate control analyses to the Results section: “To rule out the possibility that the differential vigor between self- and other-benefiting trials drove the Recipient × Effort and Recipient × Effort × Magnitude interactions on the RewP, we conducted two control analyses by including trial-by-trial response speed and subjective effort ratings as separate covariates in the RewP model. Neither response speed (b = -0.07, p = .641) nor effort rating (b = 0.10, p = .186) predicted RewP amplitudes, and the critical Recipient × Effort and Recipient × Effort × Magnitude interactions remained significant and essentially unchanged (see Supplementary Table S3 for full regression estimates)” (page 12, para. 1).

      Second, we added a caveat to the Discussion section acknowledging this alternative explanation, which reads, “Another concern is that participants exhibited less vigor when working for others, as indicated by slower response speed and lower subjective effort ratings for other- versus self-benefiting trials. Although our control analyses confirmed that neither covariate predicted RewP amplitudes and the critical interactions remained significant, covariates may not fully capture the effects of differential motivation, and this alternative explanation cannot be entirely ruled out” (page 22, para. 2, lines 9–12; page 23, para. 1).

      Reviewer #2 (Public review):

      Summary:

      Measurements of the reward positivity, an electrophysiological component elicited during reward evaluation, have previously been used to understand how self-benefitting effort expenditure influences processing of rewards. The present study is the first to complement those measurements with electrophysiological reward after-effects of effort expenditure during prosocial acts. The results provide solid evidence that effort adds reward value when the recipient of the reward is the self but discounts reward value when the beneficiary is another individual.

      Strengths:

      An important strength of the study is that amount of effort, the prospective reward, the recipient of the reward, and whether the reward was actually gained or not were parametrically and orthogonally varied. In addition, the researchers examined whether the pattern of results generalized to decisions about future efforts. The sample size (N=40) and mixed-effects regression models are also appropriate for addressing the key research questions. Those conclusions are plausible and adequately supported by statistical analyses.

      We sincerely appreciate Reviewer #2’s positive evaluation of our manuscript and thank the reviewer for recognizing the strength of our experimental design and analysis approach.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This important study concerns the propagation of waves in bacterial biofilms, bridging active matter physics and bacterial biophysics. While the experimental observations are solid, the theoretical interpretation and model validation are currently incomplete and require further refinement. This work will be of interest to microbiologists, biophysicists, and researchers studying collective behavior in biological systems.

      In the revised manuscript, we have added new experimental results that strengthen the connection between our observations and the modeling framework used to interpret the collective oscillations. We have not introduced a new theoretical model; rather, we employed established active matter models and sought to link the observed phenomena to these frameworks. In particular, our new data demonstrate that the transition between the motile and biofilm-forming states specifically modulates the elasticity and elasto-active coupling of the bacterial structure. This behavior is in excellent agreement with the predictions of the active solid model. All the experimental details are given below. We believe that the revised version of the manuscript now establishes this connection more clearly and convincingly.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Overall, this is an interesting paper. The authors have found multiple experimental knobs to perturb a mechanical wave behavior driven by pili feedback. The authors framed this as nonreciprocal interactions - while I can see how nonreciprocity could play a role - what about mechanical feedback? Phenomenological models are fine, but a lack of mechanistic understanding is a weakness. I think it will be more interesting to frame the model based on potential mechanochemical feedback to understand microscopic mechanisms. Regardless, more can be done to better constrain the model through finding knobs to explain experimental observations (in Figures 3, 4, 5, and 7).

      We thank the reviewer for the positive assessment and for highlighting this important point. The reviewer is correct that the phenomenological Kuramoto-based model does not explicitly show the detailed cell–cell interactions. However, the active solid model is formulated on detailed elastic couplings and active forces, which inherently represent mechanical feedback within the biofilm structure. In this framework, nonreciprocity emerges naturally from the tensorial nature of active forces between bacteria—a concept already well established in the active matter literature. Importantly, this mechanism is purely mechanical and closely parallels nonreciprocal hydrodynamic interactions among active particles, which also arise from tensorial couplings.

      In our system, elastic interactions within the biofilm matrix, combined with pilus-generated active forces, provide a natural origin for nonreciprocal interactions. To further validate this, we improved our imaging to record single-cell dynamics both at the colony edge and on the biofilm surface (new supplementary Video). These experiments show that motile bacteria at the leading edge of the biofilm structure do not generate waves, whereas stationary bacteria within the biofilm display local oscillations within the elastic network. This observation supports the view that collective oscillations are a property of the elastic biofilm state rather than of freely motile cells.

      Moreover, the main control parameter for these oscillations is the ratio between elastic strength and the active force generated by pili. In the active solid model, this ratio is captured by the parameter π and alpha terms. Experimentally, we can tune this ratio simply by adding or removing water from the biofilm, thereby modulating its elasto-active coupling. We further tested the controllability of this feature experimentally: we let the plate dry nonuniformly and observed that transitions between spiral, target, and plane waves could emerge spontaneously across the plate (see Figure 3a). This observation also underscores the importance of moisture in the biofilm. Starting from this point, we established the connection between experimental observation and modelling. In our new simulations we also noticed that the transition from spiral to target waves is driven in particular by the merging of spiral pairs with opposite topological charges (±1). This critical point was also confirmed by modelling, which links the process to elasto-active coupling. We further supported our claim by imaging the edge and the biofilm structure. These new results clarify that the elastic structure of the biofilm is critically important (Supplementary Figure 3). We have clarified this mechanistic link in the revised manuscript and rewritten the relevant sections to make this connection explicit.

      Modification in the manuscript:

      “To gain deeper insight into the mechanisms underlying wave formation, we imaged the dynamics of individual bacteria from the fingering regions toward the center of the biofilm. This distinction is critical because, unlike the biofilm center, the edges do not generate waves. We observed that bacteria near the fingering regions remain motile and exhibit collective flow. In contrast, bacteria at the biofilm center are surface-attached and undergo periodic lifting motions. This behavior strongly resembles Mexican-wave dynamics.”

      “We further found that the central region of the biofilm is mechanically more elastic, whereas the edge regions—where wave formation is absent—are motile. These observations suggest that gradual biofilm maturation is a key factor that transforms motile bacteria into a periodically moving but spatially constrained state. Consistent with this picture, the PAO1 strain, which has a strong biofilm-forming capability, completely suppresses surface oscillations. In contrast, the PA14 strain exhibits intermediate behavior, sustaining a partial transition between motile and locally constrained dynamics. Remarkably, signatures of this transition and wave generation are already detectable at the earliest stages of finger formation.”

      Strengths:

      The report of mechanical waves in bacterial collectives. The mechanism has potential application in a multicellular context, such as morphogenesis.

      We thank the reviewer for the positive assessment and for highlighting this potential broad impact of our findings.

      Weaknesses:

      My most serious concern is about left-right symmetry breaking. I fail to see how the data in Figure 6 shows LR symmetry breaking. All they show is in-out directionality, which is a boundary condition. LR SM means breaking of mirror symmetry - the pattern cannot be superimposed on its mirror image using only rigid body transformations (translation and rotation) - as far as I am aware, this condition is not satisfied in this pattern-forming system.

      We thank the reviewer for pointing out this critical issue. We acknowledge that we overlooked the distinction between biological and physical definitions of left–right symmetry in our initial submission, and we agree that our terminology was confusing.

      In developmental biology, the term “left–right symmetry breaking” is often used to describe asymmetric flows generated by nodal cilia, which subsequently establish developmental asymmetry. This usage differs fundamentally from the physical definition of mirror symmetry breaking, which refers to chirality switching upon mirror reflection. As the reviewer correctly noted, our system does not exhibit mirror symmetry breaking in this strict physical sense.

      To avoid confusion, we have revised the manuscript and replaced the term left–right symmetry breaking with left–right asymmetry between the edge and the center of the biofilm. This asymmetry arises from frequency gradients across the biofilm and is not a trivial boundary effect. For circular colonies, this phenomenon is more accurately described as radial asymmetry. We have rewritten the relevant sections of the manuscript to clarify this distinction and prevent misinterpretation.

      Reviewer #2 (Public review):

      Summary:

      This manuscript by Altin et al. examines the dynamics of bacterial assemblies, building on previously published work documenting mechanical spiral waves. The authors show that the emergent dynamics can be influenced by various factors, including the strain of bacteria and water content in the sample. While the topic of this paper would be of broad interest, and the preliminary results are certainly interesting, various aspects of this paper are underdeveloped and require further exploration.

      Strengths:

      One of the nice features of this system is the ability to transition between the different states based on the addition or withdrawal of water. The authors use a similar experimental model system and mathematical model to previously published work (Reference 49), but extend by showing that the behaviour can be modified through simple interventions. Specifically, the authors show that adding water droplets or drying the sample through heating can result in changes in the observed wave structure. This represents a possible way of controlling active matter.

      The mathematical model proposed in this paper involves a phase-oscillator model of Kuramoto-style coupling (similar to previously reported models). A non-reciprocal phase lag is introduced in order to facilitate the patterns seen in experiments. The qualitative agreement in the behaviour is quite striking, showing both spiral waves and travelling waves.

      We thank the reviewer for the positive assessment and for pointing out areas that required further development. The reviewer is correct that our work builds on previously reported bacterial spiral wave systems; however, there are several significant differences that we now emphasize more clearly in the revised manuscript.

      First, our study involves a different bacterial species and reveals a distinct dynamical process: the waves we report are strictly localized on the surface of the biofilm, in contrast to the bulk oscillations detected through density fluctuations in the earlier work (Ref. 49). The surface waves in our system resemble “Mexican wave”-like motions, in which surface bacteria periodically lift upward. To highlight this key distinction, we performed new imaging experiments that directly visualize this process (new Videos 5 and 6, Author response image 1).

      Second, we systematically compared different bacterial strains, including pathogenic species such as P. aeruginosa PA14 and PAO1, alongside our BSL-1 strain. This comparative approach demonstrates that the observed phenomenon spans strains with different pathogenicity levels and genetic variations, while also showing that our strain provides a safer and more broadly usable model system for laboratory investigations.

      Third, the modeling frameworks differ. Whereas the referred study relied primarily on phase models similar to those used in cilia systems, we combine a delayed Kuramoto-style oscillator model with an active solid model. This combination provides both a phenomenological description and a physical interpretation of the collective dynamics. We acknowledge that, in the original submission, the physical interpretation of the model in relation to our experimental system was underdeveloped. In the revision, we have now established this link explicitly through the elasticity and elasto-active coupling of the biofilm. Specifically, we show that the transition from motile to biofilm states is accompanied by changes in elasticity, which directly influence the observed transitions between different types of wave defects. This connection is consistent with prior theoretical works and has so far been studied only in robotic active matter systems.

      Together, these clarifications and new results reinforce the novelty of our findings and establish a stronger connection between the experiments and the modeling framework.

      Author response image 1.

      Comparison between the elastic biofilm core and the motile colony edge. High-resolution video recordings revealing individual bacterial motion highlight the key physical differences driving wave generation. Time-lapse snapshots show that bacteria at the colony edge move freely and form fingering structures, whereas bacteria in the elastic central biofilm periodically lift vertically, producing a Mexican-wave–like collective motion across the surface. See new Video

      Weaknesses:

      The principal observation of the paper - that spiral waves emerge in these systems and can be controlled in various ways - is not linked to microscale dynamics at the cell level. It is recognised that hydrodynamics can introduce non-reciprocity, an essential ingredient of this model. However, in this work the authors have not identified a physical mechanism for the lag, e.g., either through steric interactions or hydrodynamic disturbances. This is also relevant in the phase oscillator modelling section. In low Reynolds number flows, dynamics are instantaneously determined. In this light, what does the phase lag term represent?

      The reviewer is correct that, at low Reynolds numbers, fluid dynamics are instantaneous and do not generate real temporal delays. However, nonreciprocity in hydrodynamic interactions can still emerge from the tensorial structure of the Blake–Oseen Green’s function. In this formalism, the effective asymmetry can be represented mathematically as a phase-lag–like term. This has been theoretically demonstrated in Ref. 40. While this is not a literal time delay, it functions analogously by breaking odd symmetry in the coupling.

      In our system, strong long-range hydrodynamic interactions are absent, as the bacteria are embedded in an elastic biofilm matrix. Instead, the dominant interactions are active elastic couplings mediated by pili and biofilm structure. The elastic solid model behaves in a way that is conceptually similar to the hydrodynamic case: pili-induced deformations of the elastic medium produce anisotropic stresses that play a role analogous to the tensorial hydrodynamic Green’s function. Thus, the phase-lag term in our Kuramoto-based model can be interpreted as an effective representation of these nonreciprocal elastic interactions.

      We have clarified this point in the revised manuscript by explicitly connecting the phenomenological phase-lag term to the underlying elastic coupling in biofilms.
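
      For readers less familiar with oscillator models, the role of the phase lag can be illustrated with the standard Sakaguchi-Kuramoto form. This is a generic textbook sketch rather than the exact equation of the manuscript; the symbols ω_i (natural frequency), K (coupling strength), α (phase lag), and the neighbour set N(i) are assumptions introduced here for illustration.

```latex
\dot{\phi}_i \;=\; \omega_i \;+\; K \sum_{j \in \mathcal{N}(i)} \sin\!\left(\phi_j - \phi_i - \alpha\right)
```

      For α = 0 the coupling function g(x) = sin(x) is odd, g(−x) = −g(x), so the pull of oscillator j on i is equal and opposite to the pull of i on j. For α ≠ 0, g(x) = sin(x − α) is no longer odd, which is the sense in which a lag term can stand in for an effective nonreciprocity without implying a literal time delay.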

      What is the origin of the coupling term, b? Can this be varied systematically or derived from experimental measurements or parameters?

      The term b represents the enhanced elasto-active coupling of the pili process. The length of the pili varies, and elongated pili have more potential to modulate the coupling between bacteria, which is known to depend on a critical threshold. This process resembles pinning dynamics and is driven by the activity of molecular motors within the pili machinery. However, the detailed mechanisms that set the effective coupling strength remain highly complex and are not yet fully understood.

      At present, we do not have a direct way to systematically manipulate b in experiments. A major technical limitation is the nanoscale nature of type IV pili: these protein assemblies are extremely small and difficult to monitor or manipulate directly. Even basic tools such as GFP-based labeling have proven challenging to implement, which restricts our ability to track the detailed dynamics of these structures in live biofilms.

      While we cannot currently derive b directly from experimental parameters, we emphasize in the revised manuscript that b should be understood as an effective parameter capturing the excitability of pili retractions. We also highlight this limitation and note that future advances in molecular imaging and manipulation of pili will be essential for quantitatively linking b to microscopic processes.

      Classification of wave properties is an important aspect of this paper, but is not accomplished in a quantitative sense. What is the method for distinguishing between travelling and spiral waves? There is a range of quantitative tools that could be used to investigate these dynamics (and also compare quantitatively with the models). For example, examining the correlation functions and order parameters could assist with the extraction of wave features (see extensive literature on oscillator models).

      We thank the reviewer for emphasizing this important point. In the revised manuscript, we have incorporated the classic Kuramoto order parameter (S) to characterize the dynamics in our model simulations. However, this metric is not directly applicable to our experimental system, because we cannot resolve the phase of individual bacteria at large scales.

      Instead, we have focused on a flux-based parameter, as previously used in Ref. 40, which can be measured experimentally from collective surface dynamics. Interestingly, we find that the directional flux extracted from our experimental movies closely matches the trends predicted by the model order parameter. We suspect that this similarity arises from the combination of our optical illumination method and the characteristic surface modulations of the biofilm. We currently lack a rigorous theoretical justification for this correspondence, so we prefer to keep this discussion in the review document.

      In summary, we now use the classic Kuramoto order parameter in simulations and rely on the experimentally accessible flux measure for our experimental data. This dual approach allows us to compare model and experiment in a consistent manner.
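
      As an illustration of the simulation-side metric, the classic Kuramoto order parameter is the magnitude of the mean of the unit phasors exp(iφ). The following is a minimal sketch, assuming the simulated phases are available as a NumPy array; the function name and the two test configurations are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def kuramoto_order_parameter(phases):
    """Return (S, psi): magnitude and mean phase of the complex order parameter."""
    z = np.mean(np.exp(1j * np.asarray(phases, dtype=float)))
    return np.abs(z), np.angle(z)

# Globally synchronized state: every oscillator has the same phase.
sync_phases = np.full(100, 0.7)

# One full wavelength of a plane wave: phases evenly spread over the circle.
wave_phases = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)

s_sync, _ = kuramoto_order_parameter(sync_phases)
s_wave, _ = kuramoto_order_parameter(wave_phases)
print(f"S (synchronized) = {s_sync:.3f}, S (plane wave) = {s_wave:.3f}")
```

      The synchronized state gives S close to 1, whereas the plane wave gives S close to 0 even though it is highly ordered. This illustrates why a global order parameter alone may not distinguish travelling-wave states well, and why a directional measure such as flux can be a useful complement.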

      Author response image 2.

      Critical order parameters of the coupled biofilm system. (a) The Kuramoto global order parameter increases continuously as the system becomes globally synchronized. In contrast, in the nonreciprocally coupled system the order parameter saturates at a critical level. (b) In the experimentally observed biofilm, however, the flux generated by the coupled oscillations provides a more appropriate measure of synchronization. Blue curves indicate directionally propagating planar waves, red curves correspond to spiral wave formation, and green curves represent the globally synchronized reciprocal system.

      Author response image 3.

      Comparison of flux profiles of the simulations with experimental measurements. Directional optical illumination enhances the flux term on the surface of the biofilm.

      The methodology of changing the dynamics through moisture content appears to be slightly underdeveloped, e.g., adding water involves a droplet, and removing water is accomplished by heating (which presumably could cause other effects). Could the dynamics not be controlled more directly by varying the humidity?

      We thank the reviewer for this valuable suggestion. Our results indicate that water content in the biofilm plays a key role in driving the transition to the biofilm state by modulating its elasticity. During the initial submission, we did not know how to systematically vary humidity without simultaneously altering temperature. Standard approaches typically involve water evaporation in controlled chambers, which inherently changes both parameters.

      Following the reviewer’s recommendation, we first measured the ambient moisture levels inside closed culture plates. To our surprise, the relative humidity was already ~98%, leaving virtually no room to increase it further. We then attempted to decrease humidity by flowing dry synthetic air, but even under these conditions we could not reduce it below ~85%, and achieving this required unrealistically high flow rates. Moreover, we noticed that in closed-lid NGM plates, evaporation is already substantial, and when the lid is left open the evaporation rate reaches ~1 µm/s. This rapid surface thinning severely limits the quality of long-term time-lapse imaging.

      Taken together, these technical constraints explain why we have to rely on localized perturbations such as water droplets and heating rather than global humidity control. We have clarified this point in the revised manuscript and now explicitly discuss both the challenges and limitations of humidity-based approaches.

      At the same time, the authors also mention that temperature itself plays a role in shaping the behaviour. What is the mechanism for this? Is it just through evaporation? Since the frequency increases with temperature, could it just be that activity increases with temperature?

      We thank the reviewer for raising this critical point. We believe that temperature has two distinct impacts operating on different timescales.

      Short timescale (~minutes): We observed that biofilm oscillations respond to temperature changes very rapidly and in a reversible manner. This timescale is too short to be explained by modulation of water content or bulk elasticity of the biofilm. Instead, we attribute the immediate frequency increase to enhanced biological activity of the bacteria at elevated temperatures.

      Long timescale (~tens of minutes to hours): During processes such as the transition from planar to spiral waves, prolonged heating can significantly alter the biofilm structure. These changes are not reversible and likely involve modifications of elasticity and other structural properties.

      In the modeling framework, the short-timescale effect is represented as an increase in the active force term, while the long-timescale effect is captured by concurrent changes in both the active force and the elastic properties of the biofilm. We have clarified this mechanism and its representation in the revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      This manuscript presents a novel investigation into unidirectionally propagating waves observed on the surface of Pseudomonas nitroreducens bacterial biofilms. The authors explore how these waves, initially spiral in form, transition into combinations of spiral, target, and planar patterns. The study identifies the periodic extension-retraction cycles of type IV pili as the driving mechanism for wave propagation, which preferentially moves from the colony's edge to its center. Furthermore, the manuscript proposes two theoretical models (a phase-oscillator model and a continuum active solid model) to reproduce these phenomena, and demonstrates how external manipulations (e.g., water droplets, temperature, PEG) can control wave patterns and direction, often correlating with oscillation frequency gradients. The work aims to bridge the fields of active-matter physics and bacterial biophysics by providing both experimental observations and theoretical frameworks for understanding these complex biological wave phenomena.

      We thank the reviewer for the positive assessment of our work and for highlighting both the novelty and the key contributions of our study.

      Strengths:

      The experimental discovery of unidirectionally propagating waves on bacterial biofilms is highly intriguing and represents a significant contribution to both microbiology and active-matter physics.

      The detailed observations of wave pattern transitions (spiral to target to planar) and their response to various environmental perturbations (water, temperature, PEG) provide valuable empirical data. The identification of type IV pili as the driving force offers a concrete biological mechanism. The observed correlation between frequency gradients and wave direction is a compelling finding with potential for broader implications in understanding biological pattern formation. This work has the potential to stimulate further research in the collective behavior of living systems and the physical principles underlying biological organization.

      We thank the reviewer once again for emphasizing the importance of wave directionality. We also believe that this phenomenon may provide insight into early symmetry-breaking processes observed in developmental biology, where oxygen or nutrient gradients in dense environments could play a similar role.

      Weaknesses:

      The manuscript attempts to link unidirectional wave propagation to non-reciprocal couplings but ultimately shows that the wave direction is determined by the gradient of the oscillation frequency. The couplings in the two theoretical models are both isotropic and thus cannot dictate the wave direction. A clear distinction should be made between non-reciprocity as a source of wave generation and non-uniformity as a controlling factor of wave direction.

      We greatly appreciate the reviewer’s careful evaluation, particularly for highlighting this important and often confusing distinction. The relationship between nonreciprocity, spontaneous symmetry breaking, and frequency gradients has also been a challenging concept for us and required significant effort to clarify.

      Recent theoretical studies have established that traveling wave formation requires nonreciprocity, which provides a framework for understanding phenomena ranging from spiral to target and planar waves. In our system, nonreciprocity arises between the displacement field (U) and the pili force vector (P): as a result, in the broken phase, U effectively “chases” P, breaking PT symmetry locally and thereby enabling the generation of local directional flux and traveling waves. In this sense, nonreciprocity is essential for travelling wave generation and spontaneous symmetry breaking in either direction.

      However, we now agree that global directionality (always from right to left, or edge to center) is set by an independent factor, namely the oscillation frequency gradient across the biofilm. Thus, while nonreciprocity determines whether waves can travel, frequency gradients determine the large-scale direction in which they propagate. Put differently, PT symmetry is already broken in spiral waves due to nonreciprocity, but a global asymmetry (the frequency gradient) is required to align the overall propagation in one direction.

      We have clarified this distinction in the revised manuscript, emphasizing that nonreciprocity is a necessary ingredient for travelling wave generation, whereas global asymmetry controls global wave direction.

      Modification in the manuscript:

      “We should note that traveling waves indicate broken PT symmetry between these fields triggered by nonreciprocity, with spiral waves serving as a classic signature of this phenomenon. A further transition from spiral to planar waves reflects an overall asymmetry in the frequency profile, which is not directly related to PT-symmetry breaking.”

      The relationship between the phase oscillator model and the active solid model is unclear. Given that U and P are both dynamical variables evolving in three-dimensional space, defining the phase Φ precisely in the phase space spanned by U and P could be challenging. A graphical illustration of the definition of Φ would be beneficial. To ensure reproducibility of the numerical results, the parameter values used in the numerical simulations and an explicit definition of the elastic force in the active solid model should be provided.

      We agree with the reviewer that the relationship between the phase oscillator model and the active solid model can be confusing, but establishing this link is essential to connect different modeling approaches in the literature. As the reviewer notes, in a fully three-dimensional setting with freely moving bacteria, defining the oscillation phase (Φ) in the phase space spanned by U and P is indeed complicated.

      However, our recent imaging results show that bacteria within the biofilm do not undergo large translational motions but instead exhibit periodic “Mexican wave”-like oscillations. These oscillations are confined to a restricted phase space, which allows us to define Φ in a straightforward way. In this context, the phase oscillator model becomes a natural reduction of the dynamics.

      Similarly, in the active solid (or active gel) model, we can plot not only the displacement and force vectors but also the local phase, which shows strong agreement with the phenomenological Kuramoto-style model. To make this connection clearer, we have now included a schematic illustration in the revised manuscript that explicitly shows how Φ is defined in the reduced phase space, and we provide the parameter values used in the simulations as well as the explicit definition of the elastic force in the active solid model to ensure reproducibility.

      The link between the theoretical models and experimental results is weak. For example, the propagation of the kink from the lower to the higher part of the surface (Figure 1e) could be addressed within the framework of the active solid model. The mechanism of transition from spiral to target waves (Figure 3a, b) requires clarification, identifying which model parameter is crucial for inducing this transition. The wave propagation toward the lower frequency side is numerically demonstrated using the phase oscillator model, but a physical or intuitive explanation for this phenomenon is missing. Also, the wave transitions induced by the addition of water droplets and temperature rise are not linked to specific parameters in the theoretical models.

      We thank the reviewer for highlighting this important weakness, which was also consistently noted by the other reviewers. We fully agree that the link between our theoretical models and experimental results required significant strengthening.

      With improved imaging in the revised study, we were able to uncover additional connections that help establish this link more clearly. We acknowledge that our ability to measure detailed biofilm parameters is limited, which restricts us from providing fully quantitative mappings. Nonetheless, based on the reviewers’ suggestions, we carried out additional imaging and simulations to compare bacterial dynamics at the colony edge and within the biofilm surface. These data confirm that cells within the biofilm undergo restricted, “Mexican wave”-like oscillations, emphasizing the critical role of elasticity in governing the collective dynamics.

      Experimentally, we found that adding water or PEG, or alternatively inducing drying, strongly modulates the effective elasticity of the biofilm. Within the active solid framework, elasticity and the elasto-active coupling are the key parameters controlling the system. By tuning these parameters in simulations, we could reproduce the qualitative transitions observed experimentally. Specifically, we observed that:

      At low elasticity, topological defects are mobile and can move, merge, or annihilate, leading to the emergence of planar waves.

      At high elasticity, defects remain pinned across the biofilm surface and dominate the dynamics.

      These observations suggest that the motility of defects is the crucial parameter governing the transition between spiral, target, and planar waves. Although we cannot independently manipulate each parameter in experiments, varying the moisture content provides an effective and experimentally accessible control.

      Finally, our simulations and new analyses reveal that spiral defect cores can move and merge to form target waves or annihilate entirely—processes that we also observe experimentally. This rich dynamical behavior underscores the importance of elasticity in shaping pattern transitions, and we believe it warrants further theoretical exploration. We have clarified this connection and its implications in the revised manuscript.
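
      The defect charges discussed above can be extracted from a phase field by summing wrapped phase differences around each grid plaquette: a closed loop that encloses a spiral core accumulates ±2π. The following is a minimal sketch, assuming the phase is sampled on a regular 2D grid as `phi[y, x]`; the function names and the synthetic test fields are hypothetical and do not reproduce the analysis code of the study.

```python
import numpy as np

def wrap(d):
    """Wrap phase differences into the interval [-pi, pi)."""
    return (d + np.pi) % (2.0 * np.pi) - np.pi

def topological_charge(phi):
    """Integer winding number on every 2x2 plaquette of a phase field phi[y, x]."""
    a = phi[:-1, :-1]   # corner (y,   x)
    b = phi[:-1, 1:]    # corner (y,   x+1)
    c = phi[1:, 1:]     # corner (y+1, x+1)
    d = phi[1:, :-1]    # corner (y+1, x)
    # Loop a -> b -> c -> d -> a is counterclockwise when y is plotted upward.
    total = wrap(b - a) + wrap(c - b) + wrap(d - c) + wrap(a - d)
    return np.rint(total / (2.0 * np.pi)).astype(int)

y, x = np.mgrid[0:20, 0:20]

# A single +1 defect whose core sits between grid points.
single = np.arctan2(y - 9.5, x - 9.5)
q_single = topological_charge(single)

# An unlike (+1 / -1) defect pair; the net charge of the field is zero.
pair = np.arctan2(y - 9.5, x - 5.5) - np.arctan2(y - 9.5, x - 13.5)
q_pair = topological_charge(pair)

print("single defect, net charge:", q_single.sum())
print("unlike pair, net charge:", q_pair.sum())
```

      Tracking the positions of the nonzero entries over time gives defect trajectories, from which the separation between unlike defect pairs can be measured.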

      First, we compare defect dynamics in both Kuramoto-based simulations and the active solid model. Both systems exhibit similar defect-survival behavior. As shown in the review, pairs of unlike (+/−) defects can stably persist only at high nonreciprocity. We further quantify this behavior by plotting the separation distances between unlike defect pairs and find that short-range defect separations are possible exclusively in the high-nonreciprocity regime (Supplementary Figure 11).

      This high-nonreciprocity regime corresponds to the dry biofilm state. Increasing moisture reduces elasticity, leading to the loss of stable defect dynamics and promoting the annihilation of unlike defect pairs, which in turn drives the system toward target-wave formation and ultimately planar waves. Conversely, heating the biofilm removes water, enhances elasticity, and increases the system’s ability to sustain closely separated defect pairs.

      Experimentally, we further observe that removing water by heating enhances surface nonuniformities, which readily trigger defect-pair formation. To investigate this mechanism, we performed additional simulations in which local nonuniformities were introduced (Supplementary Figure 12). Consistent with experiments, defect-pair generation occurs only at high nonreciprocity, where pairs of unlike defects can be stably maintained. Experimental observations (Author response image 4) also show that nonuniformities on the biofilm surface similarly trigger the formation of closely separated defect pairs. We have updated the details of the defect dynamics in the revised manuscript to clarify the transition between these waves.

      Author response image 4.

      Experimental observation showing that small nonuniformities on the biofilm surface trigger the formation of closely separated defect pairs. Arrows indicate the positions of the nonuniformities.

      Modification in the manuscript:

      Defect dynamics controlling the transition between spiral to target waves

      “To better understand the dynamics of the transitions between different wave forms, we focused on numerical simulations. We noticed that the motility of defects is the crucial parameter governing the transition between spiral, target, and planar waves, and that varying the moisture content provides an effective and experimentally accessible way to control this motility. Our analyses revealed that spiral defect cores can move and merge to form target waves or annihilate entirely, processes that we also observe experimentally. This rich dynamical behavior underscores the importance of elasticity in shaping pattern transitions. First, we compare defect dynamics in both Kuramoto-based simulations and the active solid model. Both systems exhibit similar defect-survival behavior. As shown in Supplementary Figure 10, pairs of unlike (+/−) defects can stably persist only at high nonreciprocity. We further quantify this behavior by plotting the separation distances between unlike defect pairs and find that short-range defect separations are possible exclusively in the high-nonreciprocity regime (Supplementary Figure 11). This high-nonreciprocity regime corresponds to the dry biofilm state. Increasing moisture reduces elasticity, leading to the loss of stable defect dynamics and promoting the annihilation of unlike defect pairs, which in turn drives the system toward target-wave formation and ultimately planar waves. Conversely, heating the biofilm removes water, enhances elasticity, and increases the system’s ability to sustain closely separated defect pairs. Experimentally, we further observe that removing water by heating enhances surface nonuniformities, which readily trigger defect-pair formation (Supplementary Video 9). To investigate this mechanism, we performed additional simulations in which local nonuniformities were introduced (Supplementary Videos 12-13). Consistent with experiments, defect-pair generation occurs only at high nonreciprocity, where pairs of unlike defects can be stably maintained. Experimental observations (Supplementary Video 9) also show that nonuniformities on the biofilm surface similarly trigger the formation of closely separated defect pairs.”

      All the recommended points have been addressed in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study investigates how collective navigation improvements arise in homing pigeons. Building on the Sasaki & Biro (2017) experiment on homing pigeons, the authors use simulations to test seven candidate social learning strategies of varying cognitive complexity, ranging from simple route averaging to potentially cognitively demanding selective propagation of superior routes. They show that only the simplest strategy, equal route averaging, quantitatively matches the experimental data in both route efficiency and social weighting. More complex strategies, while potentially more effective, fail to align with the observed data. The authors also introduce the concept of "effective group size," showing that the chaining design leads to a strong dilution of earlier individuals' contributions. Overall, they conclude that cognitive simplicity rather than cumulative cultural evolution explains collective route improvements in pigeons.
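
      The dilution effect mentioned in this summary can be illustrated with a toy calculation. The following is a minimal sketch, assuming that each generation's route is the equal-weight mean of the incumbent route and one newcomer's route; the function name is hypothetical, and the inverse participation ratio used here is one common way to summarize concentrated weights, not necessarily the authors' definition of "effective group size". The authors' actual code is available through the figshare link given in their response.

```python
import numpy as np

def chain_weights(n_generations):
    """Weight of each bird's original route in the final averaged route.

    Assumes equal-weight averaging of the incumbent route and one
    newcomer's route at every generation (a simplifying assumption).
    """
    weights = np.array([1.0])                      # generation 0: a single bird
    for _ in range(n_generations):
        # Halve every earlier contribution, then add the newcomer at weight 0.5.
        weights = np.append(0.5 * weights, 0.5)
    return weights

w = chain_weights(4)                 # five birds have contributed in total
n_eff = 1.0 / np.sum(w ** 2)         # inverse participation ratio of the weights

print("weights, oldest to newest:", w)
print("effective number of contributors:", round(n_eff, 2))
```

      Under these assumptions the weights fall off geometrically with the age of the contribution, so the effective number of contributors stays well below the number of birds that have passed through the chain.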

      Strengths:

      The manuscript addresses an important question and provides a compelling argument that a simpler hypothesis is necessary and sufficient to explain findings of a recent influential study on pigeon route improvements, via a rigorous systematic comparison of seven alternative hypotheses. The authors should be commended for their willingness to critically re-examine established interpretations. The introduction and discussion are broad and link pigeon navigation to general debates on social learning, wisdom of crowds, and CCE.

      We thank the reviewer for their positive comments.

      Weaknesses:

      The main weakness is the lack of available code and data for this manuscript, especially given that it critically examines and proposes alternative hypotheses for an important published work.

      We thank the reviewer for their comment. The code and data for our manuscript are an important aspect of the study, and we had intended to make them publicly available upon publication. Our code and data are available on Figshare (https://doi.org/10.6084/m9.figshare.28950032.v1). We have now revised the manuscript to include a link to our dataset.

      Reviewer #2 (Public review):

      Summary:

      The manuscript investigates which social navigation mechanisms, with different cognitive demands, can explain experimental data collected from homing pigeons. Interestingly, the results indicate that the simplest strategy - route averaging - aligns best with the experimental data, while the most demanding strategy - selectively propagating the best route - offers no advantage. Further, the results suggest that a mixed strategy of weighted averaging may provide significant improvements.

      The manuscript addresses the important problem of identifying possible mechanisms that could explain observed animal behavior by systematically comparing different candidate models. A core aspect of the study is the calculation of collective routes from individual bird routes using different models that were hypothesized to be employed by the animals, but which differ in their cognitive demands.

      The manuscript is well-written, with high-quality figures supporting both the description of the approach taken and the presentation of results. The results should be of interest to a broad community of researchers investigating (collective) animal behavior, ranging from experiment to theory. The general approach and mathematical methods appear reasonable and show no obvious flaws. The statistical methods also appear.

      Strengths:

      The main strength of the manuscript is the systematic comparison of different meta-mechanisms for social navigation by modeling social trajectories from solitary trajectories and directly comparing them with experimental results on social navigation. The results show that the experimentally observed behavior could, in principle, arise from simple route averaging without the need to identify "knowledgeable" individuals. Another strength of the work is the establishment of a connection between social navigation behavior and the broader literature on the wisdom of crowds through the concept of effective group size.

      We thank the reviewer for their positive comments.

      Weaknesses:

      However, there are two main weaknesses that should be addressed:

      (1) The first concerns the definition of "mechanism" as used by the authors, for example, when writing "navigation mechanism." Intuitively, one might assume that what is meant is a behavioral mechanism in the sense of how behavior is generated as a dynamic process. However, here it is used at a more abstract (meta) level, referring to high-level categories such as "averaging" versus "leader-follower" dynamics. It is not used in the sense of how an individual makes decisions while moving, where the actual route followed in a social context emerges from individuals navigating while simultaneously interacting with conspecifics in space and time. In the presented work, the approach is to directly combine (global) route data of solitary birds according to the considered "meta-mechanisms" to generate social trajectories. Of course, this is not how pigeon social navigation actually works: they do not sit together before the flight and say, "This is my route, this is your route, let's combine them in this way." A mechanistic modeling approach would instead be some form of agent-based model that describes how agents move and interact in space and time. Such a "bottom-up" approach, however, has its drawbacks, including many unknown parameters and often strongly simplifying (implicit) assumptions. I do not expect the authors to conduct agent-based modeling, but at the very least, they should clearly discuss what they mean by "mechanism" and clarify that while their approach has advantages (such as naturally accounting for the statistical features of solitary routes and allowing a direct comparison of different meta-mechanisms), it is also limited, as it does not address how behavior is actually generated. For example, the approach lacks any explicit modeling of errors, uncertainty, or stochasticity more broadly (e.g., due to environmental influences).
Thus, while the presented study yields some interesting results, it can only be considered an intermediate step toward understanding actual behavioral mechanisms.

      We thank the reviewer for their comment and thoughtful suggestions. We agree that the inherent behavioral mechanisms and the biological basis of these mechanisms cannot be determined from the navigational data alone. For instance, it remains unexplored whether pigeons adapt their behavior based only on social cues from their partners or also use other navigational features such as landmarks or roads, the location of the sun, geomagnetic cues, or previously learnt routes. However, we do agree (as also pointed out by the reviewer) that these behavioral rules generate an emergent ‘meta-mechanism’ where the bird pairs behave as if their preferred routes are averaged during a flight. It will be important in future work to explore the biological basis of these mechanisms, but our current approach allows us to describe the mechanisms with confidence only in a meta sense. Considering this, we believe that our analysis is a more top-down approach towards describing the outcomes of these underlying mechanisms in an abstract sense. We would also like to point the reviewer to Dalmaijer, 2024 [1], who used a bottom-up approach with naive agents and showed that cumulative route improvements emerged in the absence of any sophisticated communication in the same dataset, in agreement with our approach. We have now added a paragraph: “It is also important to clarify that we use the terms…… that lead to these meta-mechanisms arising remain an open question.” found in lines 120-129 in our Introduction to make this clarification.

      (2) While the presented study raises important questions about the applicability and viability of cumulative cultural evolution (CCE) in explaining certain animal behaviors such as social navigation, I find that it falls short in discussing them. What are the implications regarding the applicability of CCE to animal data and to previously claimed experimental evidence for CCE? Should these experiments be re-analyzed or critically reassessed? If not, why? What are good examples from animal behavior where CCE should not be doubted? Furthermore, what about the cited definitions and criteria of CCE? Are they potentially too restrictive? Should they be revised, and if so, how? Conversely, if the definitions become too general, is CCE still a useful concept for studying certain classes of animal behavior? I think these are some of the very important questions that could be addressed or at least raised in the discussion to initiate a broader debate within the community.

      We thank the reviewer for their comments and interesting questions regarding our study. We agree with the reviewer that our study opens up new avenues for critically analysing the criteria previous studies have used for providing evidence of CCE in non-human animals. In our literature review, we found that the field has usually thought about CCE in a ‘process’-focused manner (Reindl et al. [2]), with regard to individuals being able to compare strategies and select those resulting in higher individual fitness. This preferential selection of strategies, termed innovations, allows for the stereotypical ratcheting effect seen in CCE. In our study, we propose that in the case of homing pigeons, the ratcheting effect is more of a statistical outcome than a deliberate individual judgement. We believe that this strategy is also amenable to certain task types (which in our study was homing route choice) and may change for others (for example, solving a puzzle box), and that the task also needs to be sufficiently complex for animals to benefit from the use of social information (Caldwell et al. 2008 [3]). Thus, we recommend that future work address which classes of problems fit well within the definition of “emergent” CCE and which do not. Keeping this framework in mind, studies should clearly state which definition of CCE they are using and should be critically evaluated for their underlying task type and cognitive mechanisms before being deemed CCE. Considering these points, we have now expanded our Discussion to include a paragraph: “Our results highlight the need for more…..range of task types and cognitive abilities.” found in lines 420-433 to highlight these key questions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I do not have any major objections, but I am clarifying my points as major or minor depending on the effort required to address (mostly via rewriting and clarifications).

      Major comments:

      (1) A schematic summary of the original study: Since the current manuscript builds directly on Sasaki & Biro (2017), it would greatly help readers if you included a concise schematic figure summarizing the original experiment. For instance, a simple panel could depict the chain design (experienced + naïve replacements), the control treatments, and the key empirical findings (improvements in route efficiency across generations, and route similarity within vs. between chains). Presenting this visually would save readers the effort of reconstructing the design and main results from text alone, especially for those unfamiliar with the original paper. It would also clarify exactly what empirical patterns your simulations are intended to reproduce.

      We thank the reviewer for this comment. We have now revised the manuscript with a schematic illustration adapted from the original study by Sasaki and Biro (2017). We hope this clarifies the experimental design and results we aimed to highlight in our work.

      (2) Reproducibility: Code and data are only "available on request." I believe eLife has strong policies on open science; a lack of immediate open access to analysis would be a barrier. I find it jarring that a paper intending to reproduce and improvise a previously published paper does not make the codes and data available for peer review or to readers without an explicit request.

      We have taken the feedback into consideration and updated the Data Availability section with a link to our Figshare dataset.

      (3) One huge drawback of the current format of the manuscript, where Methods come after Results, is that one has to really struggle to understand and appreciate Figures 2 and 3. I would strongly urge authors to have a shorter methods section embedded either as a subsection before the Results, or within the results section, as described in each figure. Perhaps a lot of my confusion also comes from not having known the previous paper, but it may be true for other readers, too. More specifically, for Figure 3, how is social weight for the experiments inferred? Figure 3 caption talks of mean difference, but one has to check the manuscript at multiple places throughout to really understand what this difference is (the definition) and how it is computed.

      While we agree that our manuscript includes the Methods section at the end, we tried to structure our text to tell a story (as stated in our manuscript title). To this end, we organized the text into short titled subsections that briefly convey the relevant background, identify the knowledge gap and outline our approach. We chose this structure to reserve the in-depth details about model implementation and statistical analysis for the Methods.

      Additionally, we made sure to include references to methodological details in relevant segments of the Introduction and Results section so as to not bog down the reader by model complexities and keep a coherent narrative that delivers the message of our study. To further address the background of our work, we have now added a schematic of the original study in response to a previous comment by the reviewer, which we hope helps the reader better understand our work. We hope this explanation clarifies the intention behind our writing choice and decision to retain the current structure.

      (4) The introduction of the 'effective group size' concept is a potentially valuable and intuitive way to interpret chain dynamics, but the explanation is somewhat buried in the Results/Methods; I suggest highlighting it more prominently (e.g., in the Discussion or with a schematic in the Results) so readers can readily grasp this useful idea.

      We thank the reviewer for finding our concept of ‘effective group size’ useful. However, we do believe that we introduced the idea and rationale behind using this method in the Results: “We asked to what extent……to an equivalent group size” found in lines 305-314. We reserved the detailed description of this method for the Methods section. However, to further emphasize the importance of the concept, we have now added text: “This is further supported….. slightly better than two individuals.” found in lines 389-394 in the Discussion.
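
      As an illustration of the dilution that the reviewer summary describes, the chain design under equal route averaging can be sketched as below. This is a minimal sketch and not the authors' method: it assumes that each pairing produces the plain mean of the experienced bird's current route and the naive bird's solo route, that solo routes are independent with equal variance, and that an "equivalent group size" can be summarized as 1 / sum(w^2) over the resulting weights. The function names `chain_weights` and `equivalent_group_size` are hypothetical, and the manuscript's own definition (given in its Methods) may differ and give a different value.

```python
from fractions import Fraction


def chain_weights(n_birds):
    """Weight of each bird's solo route in the final route of a chain.

    Assumption: every pairing yields the equal average of the experienced
    bird's current route and the newcomer's solo route.
    """
    weights = [Fraction(1)]  # generation 1: a single bird flying alone
    for _ in range(n_birds - 1):
        # Each pairing halves all earlier contributions and gives the
        # newcomer a weight of one half.
        weights = [w / 2 for w in weights] + [Fraction(1, 2)]
    return weights


def equivalent_group_size(weights):
    """Size of an equally weighted group with the same averaging variance.

    Assumption: independent routes with equal variance, so the variance of
    the weighted mean scales with sum(w^2).
    """
    return 1 / sum(w * w for w in weights)


weights = chain_weights(5)
print([str(w) for w in weights])              # oldest bird first
print(float(equivalent_group_size(weights)))  # stays small as the chain grows
```

      Under these idealized assumptions the earliest birds carry geometrically shrinking weights, so adding generations to the chain adds very little to the equivalent group size.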

      Minor comments:

      (1) Line 12: "what is the navigation mechanism(s)" - the (s) is a bit awkward. Either remove (s) or ask what the mechanisms are.

      We have fixed the typo to clarify the statement.

      (2) Line 78: "Such 'ratchet'-like improvements is referred to..." → "are referred to."

      We have fixed the typo to clarify the statement.

      (3) Figure 3 caption: "color scheme in the plots are same" → should be "is the same."

      We have fixed the typo to clarify the statement.

      (4) Clarification on reporting confidence intervals: The manuscript reports confidence intervals (CIs) for the model-based comparisons (e.g., Figures 2-3). This might seem unnecessary for simulation studies, since running more iterations can arbitrarily shrink uncertainty. However, in your case, the CIs are justified because the simulations are anchored to a finite empirical dataset (only 9 solo trajectories), sampled with replacement, and analyzed with mixed-effects models that incorporate bird identity as a random effect. Thus, the intervals reflect biological sample variability rather than simulation noise. This must be clarified.

      We have added a clarifying statement: “...and reflect the biological uncertainty in the empirical dataset, not simulation noise” found in lines 241 and 293 in the captions of Figures 2 and 3 in accordance with the reviewer’s comment. 
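
      The reviewer's point about finite empirical samples can be illustrated with a simple percentile bootstrap. This is a minimal sketch under stated assumptions, not the mixed-effects analysis used in the manuscript: the nine efficiency values are made up for illustration, and `bootstrap_ci` is a hypothetical helper name.

```python
import random
import statistics


def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample the empirical data with replacement and record each mean.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical route-efficiency scores for nine solo birds (not study data).
solo = [0.71, 0.78, 0.80, 0.83, 0.85, 0.86, 0.88, 0.90, 0.93]

lo_small, hi_small = bootstrap_ci(solo, n_boot=500)
lo_large, hi_large = bootstrap_ci(solo, n_boot=20000)
print(hi_small - lo_small, hi_large - lo_large)
```

      Running more resampling iterations stabilizes the estimate of the interval but does not make it narrower, because its width is set by the nine underlying observations rather than by the number of simulation runs.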

      (5) One part of the issue is that details of methods come much later in the manuscript, perhaps following journal style. Therefore, I recommend explicitly highlighting this rationale in the Results, so readers do not misinterpret the CIs as simply reflecting simulation error.

      We believe that the clarifying statements we have now added in the captions of Figures 2 and 3 should convey this interpretation of CIs and further changes in the Results may not be required.

      With these proposed changes, we hope that we have improved the clarity of our manuscript.

      References:

      (1) Dalmaijer ES (2024) Cumulative route improvements spontaneously emerge in artificial navigators even in the absence of sophisticated communication or thought. PLoS Biol. 22:e3002644.

      (2) Reindl, E., Gwilliams, A.L., Dean, L.G. et al. (2020) Skills and motivations underlying children’s cumulative cultural learning: case not closed. Palgrave Commun 6, 106.

      (3) Caldwell CA, Millen AE (2008) Studying cumulative cultural evolution in the laboratory. Phil. Trans. R. Soc. B 363:3529-3539.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) While the manuscript is written for a scientific audience, the authors are likely aware that findings like this will be of broad appeal to the field of neurology, where treatments for memory loss are desperately needed. For this reason, the authors could consider including a statement regarding an interpretation of this meta-analysis from a clinical standpoint. Statements such as 'safe and effective' imply a clinical indication, and yet the manuscript does not engage with clinical trials terminology such as blinding, parallel arm versus crossover design, and trial phase. While the authors might prefer not to engage with this terminology, it can be confusing when studies delivering intervention-like five days of consecutive TMS (e.g., Wang et al., 2014) are clustered with studies that delivered online rhythmic TMS, which tests target engagement (e.g., Hermiller et al., 2020). While the 'sessions' variable somewhat addresses the basic-science versus intervention-like approach, adding an explicit statement regarding this in the discussion might help the reader navigate the broad scope of approaches that are utilized in the meta-analysis.

      We appreciate the suggestion to enhance interpretability of our report by broader audiences. First, to avoid confusion, we have eliminated “safe” and “effective” descriptors from the main summary of findings in the Abstract (pg. 1) and Discussion (pg. 6). Second, we now describe that reviewed studies included those categorized as traditional clinical trials, as well as non-clinical studies that generally follow clinical trial designs (i.e., multi-day intervention-like studies), in addition to more basic-oriented studies that are geared towards target engagement (Introduction, pg. 2). Third, we now clarify that the Design and Control factors (Figure 3) correspond to fairly standard distinctions in the clinical trials literature and were intended to capture major study design choices that are used in both clinical-trial and non-trial studies (Methods, pg. 9; Table S1). Finally, we now clarify that future clinical trials would be needed to evaluate HITS for any specific indication, and that our findings motivate such investigations but do not conclusively indicate efficacy for any given indication (Abstract, pg. 1; Discussion, pg. 7).

      Reviewer #1 (Recommendations for the authors):

      (1) The color scheme of Figure 1 was a bit confusing. All of the colors used for the flagged regions were incredibly similar. At first glance, it looks like the hippocampus was targeted directly due to the subtle color difference. Could the authors use colors that are more different? Similarly, zooming into the specific locations shows blue dots encompassed by teal. I am not sure what I am looking at here.

      We have updated the figure for clarity.

      (2) Given the broad appeal of the current study, I would encourage the authors to include a brief visual depiction of "HITS." This could help the more casual reader to understand the general approach.

      We have included this in Figure 1A.

      Reviewer #2 (Public review):

      (1) While the introduction centers on the role of the hippocampus in episodic memory and posits hippocampal neuromodulation by TMS as causative, the true mechanism may be more complex. Clean hippocampal lesions in primates cause focal loss of spatial and place memory, and I am aware of no specific evidence that the hippocampus does more than this in humans. Moreover, there is evidence that lateral parietal TMS also reaches neighboring temporal lobe regions, which contribute to episodic memory. The hippocampus may, therefore, be a reliable deep seed for connectivity-based targeting of the episodic memory network, but might not be the true or only functional target.

      We regret to have implied that we think the hippocampus is the true or only functional target. We agree with the reviewer that the hippocampus is “a reliable deep seed for connectivity-based targeting of the episodic memory network” and that the specific locus/loci of the HITS effects and mechanisms are not yet clear. We now emphasize that although hippocampus is used to define the targeted network, effects of TMS are likely distributed throughout the network, citing relevant studies that have shown that brain activity changes due to HITS are certainly not restricted to the hippocampus (Introduction, pg. 2).

      (2) The meta-analysis combines studies with confirmation of targeting and target-network engagement from fMRI and studies without independent evidence of having stimulated the putative target (e.g., Koch et al). That seems like a more important methodological distinction than merely the use of any individual targeting method. In my experience, atlas-based estimates are at least as accurate as eyeballing cortical areas in individuals. Hence, entering individual functional targeting as a factor might reveal an effect on efficacy.

      Our current definition of the “Targeting” factor appears to satisfy this concern. That is, we distinguish studies that used “individual functional targeting” (i.e., resting-state fMRI or DTI connectivity in each individual to select the target) from those that did not (i.e., atlas or other group-average approach). Notably, the Targeting factor modulation effect failed to survive correction for multiple comparisons. We think this satisfies the reviewer's criticism, unless the reviewer is suggesting that we categorize studies based on whether they included evaluation of target engagement (e.g., tested for change in fMRI activity or connectivity of the network due to HITS) versus those that measured only behavioral outcomes. We did not include this distinction as a factor, as our analysis focuses on behavioral effects of HITS, and it is not clear what the neural effects would have been in studies in which they were not measured. Notably, we are providing the full raw dataset of effect sizes in a public repository with our final version of record, such that any other categorization schemes could be assessed by others.

      (3) The funnel plot and Egger's regression for episodic memory outcomes suggested possible bias, and the average sample size of 23 is small, contributing to the likelihood of false positive results. It would be informative, therefore, to know how many or which studies had formal power estimates and what the predicted effect sizes were.

      Regarding the average sample size of 23, we note that we used Hedges’ g for the effect size measure because it corrects for bias associated with small samples (pg. 10). Further, small sample sizes contribute to noisy estimates of true effects, allowing outliers to contribute to false positives and low power to contribute to false negatives, but without any reason to systematically yield bias towards false positives. Regarding potential publication bias, although we cannot rule this out based only on the statistics, we think that bias against publication of negative results is unlikely. First, HITS experiments are time consuming and expensive, and most in the field seem to be motivated to publish, whatever the outcome. Second, the notion of memory enhancement via brain stimulation is controversial, and groups have certainly been motivated, if not overly eager, to publish “failure to replicate” studies for HITS (e.g., the failure-to-replicate publication by Hendrikse et al. 2020, which was then re-analyzed by many of the original authors to arrive at different conclusions in Cash et al. 2022). Given these considerations, we think that it is very unlikely that publication bias had any major impact on our conclusions, but of course it cannot be conclusively excluded. Finally, we note that our finding of HITS selectivity for recollection enhancement is likely not affected by publication bias, as this selectivity versus other memory and non-memory outcomes was found only within published studies (i.e., it is very unlikely that publication bias would have led researchers to withhold publication of studies that found effects of HITS on recognition but not on recollection).
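
      For readers unfamiliar with the correction mentioned above, a minimal sketch of Hedges' g follows. It assumes two independent groups and the common approximate correction factor 1 - 3 / (4 * df - 1); the meta-analysis may have used a different variant (for example, for within-subject designs), and the input numbers below are hypothetical rather than taken from any reviewed study.

```python
import math


def hedges_g(mean1, mean2, sd1, sd2, n1, n2):
    """Standardized mean difference with a small-sample bias correction.

    Assumption: two independent groups and a pooled standard deviation.
    """
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    cohens_d = (mean1 - mean2) / pooled_sd
    # Approximate correction factor; it approaches 1 as df grows.
    correction = 1 - 3 / (4 * df - 1)
    return correction * cohens_d


# Hypothetical memory scores: stimulation vs. control, 23 participants total.
g = hedges_g(mean1=0.70, mean2=0.64, sd1=0.12, sd2=0.12, n1=12, n2=11)
print(round(g, 3))
```

      With a total sample of 23 the correction shrinks the uncorrected standardized difference by a few percent, and the shrinkage becomes negligible for large samples.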

      (4) In the Discussion, the authors might provide a comparison between the effect size for memory improvement found here with those reported for other brain-targeted interventions and behavioral strategies. It may also be worthwhile pointing out that HITS/memory is one of the very few, or perhaps the only, neuromodulatory effects on cognition that has been extensively reproduced and survived rigorous meta-analysis.

      We now emphasize that this is, to our knowledge, the only neuromodulatory effect on cognition that is selective, has been extensively reproduced, and survived rigorous meta-analysis (Discussion, pg. 6). However, we wish to avoid the clinical overinterpretation of our findings that might result if we were to compare directly to effect size estimates for other current therapies, which have been evaluated for specific clinical indications. For example, antibody and pharmacological interventions for Alzheimer’s dementia typically have been associated with similar effect sizes to our estimate for HITS. However, those estimates derive from systematic review of randomized controlled trials measuring clinically relevant outcomes at relatively long delays, whereas the HITS studies we review include a mix of controlled and uncontrolled trials, vary in whether clinical outcomes were assessed, and mostly assessed outcomes at shorter delays. Thus, it could be misleading to directly compare the effect sizes. We instead continue to highlight that the HITS effects are promising and warrant rigorous testing for any given clinical indication.

      (5) The section of the Discussion on specificity compares HITS to transcranial electrical stimulation without specifying an anatomical target or intended outcome. A better contrast might be the enormous variety of cognitive and emotional effects claimed for TMS of the dorsolateral prefrontal cortex.

      We now also note that TMS of lateral frontal cortex has not been associated with similarly high specificity (Discussion, pg. 6). Note however that we cannot exclude anti-depressant or other psychological effects of HITS, as such outcomes were not consistently assessed in HITS studies and so were not included in our analyses.

      (6) With reference to why other nodes in the episodic memory network have not been tested, current flow modeling shows TMS of the medial prefrontal cortex is unlikely to be achievable without stronger stimulation of the convexity under the coil, in addition to being uncomfortable. The lateral temporal lobe has been stimulated without undue discomfort.

      We now additionally indicate that medial prefrontal stimulation may be ineffective given conventional TMS (Discussion, pg. 7). However, we are aware of no studies that have stimulated the portion of middle temporal gyrus that shows strong connectivity with hippocampus. We have tried this location, which positions the coil on or slightly above the ear and bordering on the temple area, which is very sensitive for most people. We were not able to minimize pain/discomfort for most subjects in pilot experiments, and so had to abandon it. Perhaps others have succeeded? If the reviewer has any specific references that could be included, we would be happy to add them and update this section accordingly.

      (7) Finally, a critical question hanging over the clinical applicability of HITS and other neuromodulation techniques is how well they will work on a damaged substrate. Functional and/or anatomical imaging might answer this question and help screen for likely responders. The authors' opinion on this would be informative.

      We appreciate this point but don’t think there are enough data to assess the level of substrate damage needed to frustrate any stimulation benefits. The only thing we can say is that HITS was equally effective for mild to moderate Alzheimer’s dementia as it was for other non-neurodegenerative groups (nonsignificant effect of the Population factor, Figure 3B), suggesting that whatever degree of damage is present in that group is insufficient to prevent the stimulation effects. We now highlight this point and raise the issue that, presumably, some level of damage would render HITS ineffective (Discussion, pg. 8).

      Reviewer #3 (Public review):

      (1) My only significant concern is how studies are categorized in the 'Timing' factor (when stimulation is applied). Currently, protocols in which TMS is administered across days are categorized as 'pre-encoding' in the Timing factor. This has the potential to be misleading and may lead to inaccurate conclusions. When TMS is administered across multiple days, followed by memory encoding and retrieval (often on a subsequent day), it is not possible to attribute the influence of TMS to a specific memory phase (i.e., encoding or retrieval) per se. Thus, labeling multi-day TMS studies as 'pre-encoding' may be misleading to readers, as it may imply that the influence of TMS is due to modulation of encoding mechanisms per se, which cannot be concluded. For example, multi-day TMS protocols could be labeled as 'pre-retrieval' and be similarly accurate. This approach also pools results from TMS protocols with temporal specificity (i.e., those applied immediately during encoding and not on board during memory testing) and without temporal specificity (i.e., the case of multi-day TMS) regarding TMS timing. Given the variety of paradigms employed in the literature, and to maximize the utility/accuracy of this analysis, one suggestion is to modify the categories within the Timing factor, e.g., using labels like 'Temporally-Specific' and 'Temporally Non-specific'. The 'Temporally-Specific' category could be subdivided based on the specific memory process affected: 'encoding', 'retrieval', or 'consolidation' (if possible). I think this would improve the accuracy of the approach and help to reach more meaningful conclusions, given the variety of protocols employed in the literature.

      We agree in principle with this criticism and think that the most straightforward way to address it is to relabel the “Pre-Encoding” category as “Pre-Task”. The issue with labeling/considering single-session stimulation delivered immediately before encoding as “Pre-encoding” is that this makes the assumption that this stimulation doesn’t also affect retrieval (i.e., is temporally specific). We do not have certainty about the timecourse of how a single session of stimulation affects brain activity. We think the “Pre-Task” label and interpretation is the best way to address this, to avoid suggesting that we are confident about the timecourse/selectivity of stimulation effects. Notably, the “Sessions” factor directly compares among designs that delivered stimulation in a single session versus in multiple consecutive sessions, and was a nonsignificant modulator. Thus, our analyses already compare studies that are relatively temporally specific versus those that, likely, are less so. In addition to relabeling, we have also added clear caveats to address the interpretive constraint imposed by the unknown timecourse of stimulation effects (Discussion, pg. 6-7) and revised the Abstract to reflect this change.

      (2) As the scope of the meta-analysis is limited to TMS applied to parietal or superior occipital cortex, it is important to highlight this in the Introduction/Abstract. The 'HITS' terminology suggests a general approach that would not necessarily be restricted to parietal/nearby cortical sites.

      This was previously highlighted only in the Methods and Discussion (with a Discussion paragraph dedicated to the issue of target selection; see also Comment 6 from Reviewer 2). We now also note this in the Introduction (pg. 2) and Abstract.

      Minor:

      (1) To reduce the number of study factors tested, data reduction was performed via Lasso regression to remove factors that were not unique predictors of the influence of TMS on memory. This approach is reasonable; however, one limitation is that factors strongly correlated with others (and predict less unique variance) will be dropped. This may result in a misrepresentation, i.e., if readers interpret factors left out of this analysis as not being strongly related to the influence of TMS on memory. I do see and appreciate the paragraph in the Discussion which appropriately addresses this issue. However, it may be worth also considering an alternative analysis approach, if the authors have not already done so, which explicitly captures the correlation structure in the data (i.e., shown in Figure S2) using a tool like PCA or an appropriate factor analysis. Then, this shared covariance amongst factors can be tested as predictors of the influence of TMS - e.g., by testing whether component scores for dominant PCs are indeed predictive of the influence of TMS. This complementary approach would capture rather than obfuscate the extent to which different factors are correlated and assess their joint (rather than independent) influence on memory, potentially resulting in more descriptive conclusions. For example, TMS intensity and protocol may jointly influence memory.

      We argue that feature selection via Lasso regression is a better approach for our research question than PCA, factor analysis, or other latent variable methods. The main reason is that PCA would sacrifice the interpretability of our findings with respect to the design of future experiments using or testing HITS. That is, because PCA creates composite components that are linear combinations of multiple variables, we would lose the ability to provide clear, actionable guidance to researchers about which specific study design choices (e.g., stimulation intensity, protocol type, timing) influence memory outcomes. Given that a major goal of our meta-analysis is to inform future experimental design, we believe that it is essential to maintain interpretability of the individual factors that must be decided when designing a study. Regarding factor analysis, this approach would require making a priori theoretical decisions about how to group individual moderators, which could introduce subjective bias into the analysis and would introduce other complications such as a need for validation of the resulting factor scores. We believe that the exploratory nature of our investigation, examining which among many possible study design factors substantially determine TMS efficacy, is better suited to a data-driven selection approach like Lasso. While the reviewer correctly notes that Lasso may drop factors that are correlated with stronger predictors, this feature can be considered advantageous in terms of identifying factors for inclusion in future study designs. That is, this can help identify the most parsimonious set of independent predictors, such that researchers can focus on the study design elements that matter most when controlling for other factors. Notably, we provide the table of factor relationships (Figure S2) so that interested readers can inspect how dropped factors were related to those that were retained.

      It is also important to note that we have provided the full dataset with our resubmission, which has been deposited in Dryad with a link in the Data Availability section (pg. 15). Thus, others are free to explore alternative analytical approaches should they wish to examine the data from different perspectives or to answer different questions.
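
The trade-off discussed above (Lasso retains one of two correlated factors and drops the other) can be illustrated with a toy sketch. This is not the authors' analysis: the factor names, the ±1 coding, the simulated outcome, and the penalty value are all assumptions chosen so that the result can be verified by hand. The dropped factor ("protocol") is correlated 0.75 with the retained one ("intensity") and has a marginal association of 0.75 with the outcome, yet its Lasso coefficient is exactly zero.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the dead zone."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(columns, outcome, penalty, n_iter=100):
    """Minimize (1/2n)*||y - Xb||^2 + penalty*||b||_1 by coordinate descent.

    Assumes every column has mean 0 and variance 1 and the outcome has mean 0.
    """
    n = len(outcome)
    coefs = [0.0] * len(columns)
    residual = list(outcome)
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            # Covariance of column j with the residual that excludes column j.
            rho = sum(c * (r + coefs[j] * c) for c, r in zip(col, residual)) / n
            new_coef = soft_threshold(rho, penalty)
            delta = new_coef - coefs[j]
            residual = [r - delta * c for r, c in zip(residual, col)]
            coefs[j] = new_coef
    return coefs


# Hypothetical, balanced +/-1 design factors for 16 toy "studies".
intensity = [1.0] * 8 + [-1.0] * 8
protocol = list(intensity)          # agrees with intensity in 14 of 16 studies
protocol[0], protocol[8] = -1.0, 1.0
timing = [1.0, -1.0] * 8            # uncorrelated with both other factors

# Toy outcome driven only by intensity and timing.
outcome = [1.0 * i + 0.5 * t for i, t in zip(intensity, timing)]

# Marginal association of the (to-be-dropped) protocol factor with the outcome.
protocol_marginal = sum(p * y for p, y in zip(protocol, outcome)) / len(outcome)

coefs = lasso_coordinate_descent([intensity, protocol, timing], outcome, penalty=0.1)
print(protocol_marginal, coefs)     # coefs are approximately [0.9, 0.0, 0.4]
```

In this toy design a zero Lasso coefficient does not mean the factor is unrelated to the outcome, which is why inspecting a factor-relationship table alongside the retained predictors is useful.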

      (2) Given the specific focus on TMS applied to parietal cortex to modulate hippocampal and related network function, it would be fruitful if the authors could consider adding discussion/speculation regarding whether this approach may be effectively broadened using other stimulation methods (e.g., tACS, tDCS), how it may compare to other non-invasive brain stimulation methods with depth penetration to target hippocampal function directly (transcranial temporal interference, or transcranial focused ultrasound), and/or how or whether other stimulation sites may or may not be effective.

      We briefly discuss a meta-analysis of tACS studies which reported nonspecific effects, including for parietal targets overlapping those used for HITS (Discussion, pg. 6). We briefly speculate about how tES effects remain mechanistically uncertain. We are afraid that further speculation about other stimulation modalities and targets would be beyond the scope of this focused meta-analysis, given especially the few datapoints for newer approaches such as TI or tFUS.

      (3) Studies were only included in the meta-analysis if they contained objective episodic memory tests. How were studies handled that included both objective and subjective memory, or other non-episodic memory measures? For example, Yazar et al. 2014 showed no influence of TMS on objective recall, but an impairment in subjective confidence. I assume confidence was not included in the meta-analysis. Similarly, Webler et al. 2024 report results from both the mnemonic similarity task (presumably included) and a fear conditioning paradigm (presumably excluded). Please clarify in the methods how these distinctions were handled.

      Studies were included in our meta-analysis if they included at least one objectively scorable test of episodic memory. We only included objectively scorable test performance in our analysis, excluding scores from any other subjective measures if they were also reported. This is now clarified in Methods (pg. 9).

      (4) The analysis comparing memory to non-memory measures is important, showing the specificity of stimulation. Did the authors consider further categorizing the non-memory tasks into distinct domains (i.e., language, working memory, etc.)? If possible, this could provide a finer detail regarding the selectivity of influences on memory vs. other aspects of cognition. It is likely that other aspects of cognition dependent on hippocampal function may be modulated as well, i.e., tasks with high relational/associative processing demands.

      This is an interesting idea, but it is beyond our expertise to categorize these other tasks based on the nature of processing demands that they capture. Note that the task names are provided in the data table that we are making available online with our submission of record (via Dryad), such that other groups could address this question if interested.

      (5) In the analysis of the Intensity factor, how were studies using Active (rather than resting) MT categorized? Only resting MT is mentioned in Table S1. This is important as the original theta-burst TMS protocol from Huang et al. 2005 determines intensity based on Active Motor Threshold.

      MT was resting/passive in all reviewed studies except for one (Tambini et al. 2018), which used 80% of active MT. We categorized this as <100% MT for the Intensity factor, as it was <100% of MT as defined in that study. Although one could make the argument that 80% AMT might instead correspond to 100+% RMT, this change would have very little influence on our results or conclusions. We now clarify this in Table S1.

      (6) Is there a reason why the study by Koen et al. 2018 (Cognitive Neuroscience) was not included? TMS was performed during encoding to the left AG, and objective memory was assessed, so it would seemingly meet the inclusion criterion.

      The failure to include Koen et al. 2018 was our error. Koen et al. 2018 is the only study that used “online” stimulation, delivered during the trials when memoranda were displayed for encoding in the task. In contrast, all other reviewed studies delivered “offline” stimulation either before the memoranda were presented (“Pre-Task”) or after the encoding period but before retrieval (“Post-Encoding”). Therefore, categorization for the “Timing” factor would be problematic for its inclusion in the main analysis. We therefore now include Koen et al. 2018 in the “Supplementary Results” section as well as the corresponding main Results section on “Similar outcomes in studies that were excluded from meta-analysis”. We also note in the relevant discussion that “online” stimulation, as done in Koen et al. 2018, is typically considered disruptive (e.g., Beynel et al. 2019 Neuroscience & Biobehavioral Reviews; Yeh & Rose 2019 Frontiers in Psychology), which should be taken into account when considering the findings of Koen et al. 2018 relative to other reviewed studies that used “offline” designs.

      (7) It would be helpful to briefly differentiate the current meta-analysis from that performed by Yeh & Rose (How can transcranial magnetic stimulation be used to modulate episodic memory?: A systematic review and meta-analysis, 2019, Frontiers in Psychology) (other than being more current).

      Beyond being more current and therefore including many more studies in which stimulation targets were based on hippocampal connectivity (which tend to have been published more recently), the differences with Yeh & Rose 2019 are subtle. Our review focuses on assessment of network targeting and whether effects were specific to episodic memory versus other tasks, which differs somewhat from the focus of Yeh & Rose 2019. The main difference in conclusions likely derives from there being more network-focused memory TMS experiments now than were available for Yeh & Rose’s review. We also differentiate episodic memory into recollection versus other components to test specificity and analyze modulation by many study design factors relevant to HITS studies that were not emphasized in Yeh & Rose’s review. Note that we now cite Yeh & Rose for those interested in potential differences.

      (8) For transparency and to facilitate further understanding of the literature and potential data re-use, it would be great if the authors consider sharing a supplementary table or file that describes how individual studies/memory measures were categorized under the factors listed in Table S1.

      As promised in our original submission, we are providing the full data table, including how individual studies and memory measures were categorized, as an open dataset in Dryad. The Dryad dataset is cited in “Data availability” (pg. 15).

      Reviewer #3 (Recommendations for the authors):

      Please explicitly state in the Methods (Meta-analysis of effect modifiers section) that the criteria used for categorizing each measure into a factor (e.g., probing Recollection, Recognition, etc.) are fully described in Table S1; this will help readers to find these details (it took me a while!).

      This is now emphasized (pg. 10).

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer 1 (Public review):

      (1) "The timescales of the peptide recognition and unbinding process are much longer than what can be sampled from unbiased simulations. Therefore, the proposed mechanism of recognition should only be considered a hypothesis based on the results presented here. For example, peptides that do not dissociate within one one-microsecond MD simulation are considered to be stable binders. However, they may not have a viable way to bind to the narrow protein cleft in the first place."

      We thank the Reviewer for this valuable feedback and we agree with the Reviewer. Our work on the IRE1 cLD activation mechanism is focused on generating a hypothesis of the binding mechanism driven by MD simulations. We recognize the limitations in defining a stable binder due to the time scales sampled. However, our primary focus was to sample and characterize a possible binding pose in the center of the cLD dimer. We contextualized our statements about stable binders and limited our claims to stating that the protein-peptide complex is stable within 1 µs-long simulations. However, we believe that our finding that the cLD dimer groove is not able to accommodate peptides is solid, as the steric impediment described is present in all our replicas, both with and without peptides, in a cumulative sampling time of 24 µs without peptides and 66 µs with peptides. Additionally, we included a plot showing the distribution of groove width across all replicas.

      Addition to the text. (Results section: Unfolded polypeptides bind to hIRE1α cLD dimer surface) The title was changed from “Unfolded polypeptides can stably bind to hIRE1α cLD dimer” to “Unfolded polypeptides bind to hIRE1α cLD dimer surface”

      Addition to the text. (Figure 15 A legend) “(A) Distributions of the groove width of peptide-bound cLD dimers throughout all simulations performed. The left column shows the values for the three replicas in TIP3P water, while the right column displays those for the three replicas in TIP4P-D water.”

      (2) Oftentimes, representative structures sampled from MD simulation are used to draw conclusions (e.g., Figure 4 about the role of R161 mutation in binding affinity). This is not appropriate as one unbinding event being observed or not observed in a microsecond-long trajectory does not provide sufficient information about the binding strength or the free energy difference.

      We thank the Reviewer for the insightful comment. As explained in the previous point, we believe that our simulations provide useful hypotheses. We are aware of the limitations due to the timescale and agree that these limitations cannot be overcome with standard equilibrium simulations. To address these limitations, we used orthogonal methods, specifically MM/PB(GB)SA calculations, to calculate binding free energies from existing trajectories. We added predictions of all the peptides using AlphaFold 3, to confirm the binding region. Importantly, we now provide experimental results to assess the binding affinity of cLD dimer mutants E102R and Y161R.

      Addition to the text. (Results section: Unfolded polypeptides bind to hIRE1α cLD dimer surface) “AlphaFold3 predictions of the complexes indicate that the peptides adopt the same preferred orientation, despite being predominantly helical (Supplementary Fig. 16A). We further assessed the MPZ-derived peptide complexes using MM/PBSA free energy calculations over the final 250 ns of each simulation replica (see Methods), finding binding enthalpies consistent with our observations (Supplementary Fig. 16B). In particular, MPZ1N-2X exhibited the lowest binding energy, whereas MPZ1N-2X-RD showed the highest.”

      Addition to the text. (Figure 16 legend) “(A) Prediction of AlphaFold 3 for hIRE1α cLD dimer in complex with peptides. Colors represent the confidence of the prediction (plDDT). (B) Difference in enthalpy (enthalpy of binding, ∆H) as an estimate of the binding free energies of unfolded polypeptides to hIRE1α cLD dimer derived from MM/PBSA calculations of our peptide simulations.”

      Addition to the text. (Figure 4 G legend) “(G) Fluorescence anisotropy measurements of labeled MPZ1N-2X binding to hIRE1α LD wild type and mutants E102R and Y161R.”

      Addition to the text. (Results section: Point mutations destabilize unfolded peptide binding to cLD) “To experimentally test whether these residues are involved in hIRE1α LD’s interaction with peptides, we expressed and purified these mutants and conducted fluorescence anisotropy experiments using fluorescently labeled MPZ1N-2X peptide. We could purify both E102R and Y161R mutants to high purity (Supplementary Fig. 18C). They both behaved similarly to the wild type during purification. Notably, both E102R and Y161R mutants demonstrated around two-fold lower binding affinity (Fig. 4G, E102R K<sub>1/2</sub>= 6.35 µM and Y161R K<sub>1/2</sub>= 5.4 µM, Supplementary Table 3) compared to the wild type (K<sub>1/2</sub>= 2.14 µM, Supplementary Table 3), revealing that the protein’s central area is crucial for binding unfolded proteins and that binding activity occurs within the pocket defined by E102 and Y161.”

      Addition to the text. (Supplementary Table 3)

      Reviewer 2 (Public review):

      (1) Improving presentation to include more computational details.

      We thank the Reviewer for raising this critical point. We agree that the manuscript is tailored for a biology audience, as the data are particularly relevant for that community. Nevertheless, we also understand the importance of providing sufficient methodological detail for computational readers. We added more references to the methods for computational information in the main text.

      (2) More quantitative analysis in addition to visual structures.

      We added an uncertainty estimate for the HDX calculations using bootstrapping and included additional information on bond distances for E102 and Y161. We also incorporated time-series data showing the distance of the peptide from the groove across all replicas.

      Addition to the text. (Figure 1C legend) “(C) The deuterated fraction obtained from experimental results (dashed line, shaded area indicates the error we calculated from bootstrapping) published by Amin-Wetzel et al. and the fraction computed from MD simulations (solid lines, blue for TIP3P water and orange for TIP4PD water) for the PDB and AF model at incubation time point 0.5 min. This time point corresponds to experimental incubation times, not MD simulation time. Each point represents the mean value derived from three replicas and two monomers per replica. The error bars were obtained from bootstrapping. Below each absolute value plot, we report the discrepancy, which is defined as the difference between the simulated and experimental deuterated fractions, with the shaded area indicating the corresponding error.”
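
The bootstrapped error bars described in this legend can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' script: the six deuterated-fraction values (three replicas times two monomers) are invented, and the number of resamples and the seed are arbitrary choices.

```python
import random
import statistics


def bootstrap_standard_error(values, n_resamples=2000, seed=0):
    """Estimate the standard error of the mean by resampling with replacement."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    n = len(values)
    resampled_means = []
    for _ in range(n_resamples):
        resample = [values[rng.randrange(n)] for _ in range(n)]
        resampled_means.append(sum(resample) / n)
    return statistics.stdev(resampled_means)


# Hypothetical deuterated fractions for one peptide segment:
# three replicas x two monomers per replica.
fractions = [0.42, 0.45, 0.40, 0.47, 0.43, 0.44]

mean_fraction = sum(fractions) / len(fractions)
error = bootstrap_standard_error(fractions)
print(f"deuterated fraction = {mean_fraction:.3f} +/- {error:.3f}")
```

With only six values per point, the bootstrap error mainly reflects replica-to-replica spread and should be read as a rough uncertainty rather than a precise confidence interval.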

      Addition to the text. (Figure 15B legend) “(B) Minimum groove-peptide distance over time for all simulations of cLD dimer in complex with a peptide. The left column shows the values for the three replicas in TIP3P water, while the right column displays those for the three replicas in TIP4P-D water.”

      Reviewer 3 (Public review):

      A potential weakness of the study is the usage of equilibrium (unbiased) molecular dynamics simulations, so that processes and conformational changes on the microsecond time scale can be probed. Furthermore, there can be inaccuracies and biases in the description of unfolded peptides and protein segments due to the protein force fields. Here, it should be noted that the authors do acknowledge these possible limitations of their study in the conclusions.

      We appreciate the Reviewer’s thoughtful comment. As noted in our response to Reviewer 1, we addressed the concern about sampling by applying orthogonal methods and experimental techniques. We agree with the Reviewer that some form of enhanced sampling is necessary if we want to assess binding in a more quantitative way, e.g., via free energy calculations. However, we also realize that applying any enhanced sampling scheme to our system is very challenging, given its large size and the complex peptide-protein interactions, which are not easily captured in a few collective variables. After a careful assessment and some preliminary tests, we decided that estimating free energies using enhanced sampling would necessitate a separate paper due to both the conceptual complexity of the project and the size of the necessary sampling campaign.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Some enhanced sampling or path sampling simulations may be carried out to identify the peptides’ binding and unbinding mechanisms to the protein. This can show whether the disordered peptides studied in this work do indeed bind to the protein.

      We thank the Reviewer for this constructive criticism. We acknowledge the limitations associated with investigating binding and unbinding mechanisms of disordered peptides within the time scales accessible to our equilibrium simulations. However, the primary objective of our study was to sample and characterize a plausible binding pose at the center of the cLD dimer. We wanted to understand whether unfolded model peptides require an open groove able to contain them in order to bind to IRE1’s core luminal domain, or whether binding also occurs in the absence of an open groove.

      Enhanced sampling is, of course, an important strategy to overcome the limits of equilibrium simulations. However, we note that implementing enhanced sampling approaches in this system poses significant challenges due to its large size and the complexity of peptide–protein interactions, which cannot be easily captured using a limited set of collective variables. We decided that a thorough application of enhanced sampling would therefore constitute a separate study. Instead, we decided to validate our simulations in two ways: 1) we ran a new set of free energy calculations, and 2) we tested key predictions in experiments, adding significant new data to strengthen the conclusions of our manuscript.

      To evaluate whether the binding free energies of MPZ-derived peptides to human IRE1α cLD dimers are consistent with experimentally reported binding constants, we employed the MM/PBSA (Molecular Mechanics/Poisson–Boltzmann Surface Area) method. Calculations were performed over the final 250 ns of each simulation replica using the Single Trajectory Protocol (STP), which avoids the need for additional simulations. This approach provides an estimate of the effective binding free energy (i.e., enthalpy of binding) by accounting for bonded and non-bonded interactions, as well as solvation contributions. The entropic contribution, being computationally more demanding and subject to additional approximations, was not included. Binding enthalpies were obtained for MPZ1-N (in different initial orientations), MPZ1-C, MPZ1-N-2X, and MPZ1-N-2X-RD. The results indicated small differences in effective binding energies between the shorter peptides (MPZ1-N and MPZ1-C), whereas MPZ1-N-2X exhibited the lowest binding energy and MPZ1-N-2X-RD the highest, consistent with experimental trends. These findings support the reliability of our model and sampling strategy as a framework for analyzing peptide binding conformations to cLD.

      We identified residues E102 and Y161 as key contributors to the binding of unfolded peptides in our simulations. Contact analysis revealed these residues as binding hotspots, centrally located within the observed interaction regions. To probe their relevance, we conducted simulations of cLD dimers with single arginine mutations in these residues, aimed at disrupting these hotspots through charge repulsion. These simulations revealed increased instability of MPZ1N-2X on the cLD dimer surface. We further validated these findings experimentally using fluorescence anisotropy assays. Fluorescently labeled MPZ1N-2X was titrated with purified cLD mutants (E102R and Y161R), and anisotropy measurements were fitted to derive K<sub>1/2</sub> values. Both mutations resulted in approximately a two-fold reduction in binding affinity relative to the wild-type cLD, confirming the importance of these residues in stabilizing peptide binding.

      Addition to the text. (Results section title: Unfolded polypeptides bind to hIRE1α cLD dimer surface) “We further assessed the MPZ-derived peptide complexes using MM/PBSA free energy calculations over the final 250 ns of each simulation replica (see Methods), finding binding enthalpies consistent with our observations (Supplementary Fig. 16B). In particular, MPZ1N-2X exhibited the lowest binding energy, whereas MPZ1N-2X-RD showed the highest.”

      Addition to the text. (Results section title: Unfolded polypeptides bind to hIRE1α cLD dimer surface) “Thus, we investigated how the point mutations of two key residues, E102R and Y161R, would affect peptide binding by simulating the cLD mutant in complex with MPZ1N-2X (Fig. 4C-E). We initialized the systems in the pose described for the other peptide-cLD systems described earlier (Fig. 3B, t = 0 µs). In simulations of the wild-type (WT) cLD dimer, the peptide generally remained near the center (Fig. 4C,F). By contrast, MPZ1N-2X displayed reduced binding to E102R, fully dissociating in one TIP4P-D replica (Fig. 4E,F). A similar trend was observed for Y161R, where one partial dissociation event occurred (Fig. 4D,F). Comparative analysis of MPZ1N-2X contact sites on the WT and mutant cLD dimers (Supplementary Fig. 17B-D) revealed that, in the presence of mutations, the peptide engages a broader surface region rather than remaining centrally localized, while forming fewer contacts with the specific residues (Supplementary Fig. 18A-B).”

      Addition to the text. (Results section title: Unfolded polypeptides bind to hIRE1α cLD dimer surface) “To experimentally test whether these residues are involved in hIRE1α LD’s interaction with peptides, we expressed and purified these mutants and conducted fluorescence anisotropy experiments using fluorescently labeled MPZ1N-2X peptide. We could purify both E102R and Y161R mutants to high purity (Supplementary Fig. 18C). They both behaved similarly to the wild type during purification. Notably, both E102R and Y161R mutants demonstrated around two-fold lower binding affinity (Fig. 4G, E102R K<sub>1/2</sub>= 6.35 µM and Y161R K<sub>1/2</sub>= 5.4 µM, Supplementary Table 1) compared to the wild type (K<sub>1/2</sub>= 2.14 µM, Supplementary Table 1), revealing that the protein’s central area is crucial for binding unfolded proteins and that binding activity occurs within the pocket defined by E102 and Y161.”

      Addition to the text. (Figure 4 legend) “(E) Side view snapshot after 1 µs of simulation of E102R hIRE1α cLD dimer (gray) in complex with MPZ1N-2X (orange). The amino acid R102 on both monomers is represented in magenta sticks. (F) Time series of the minimum groove-peptide distance for MPZ1N-2X simulated in complex with wild-type, E102R, and Y161R hIRE1α cLD dimer in TIP3P (3 replicas) and TIP4P-D (3 replicas) water. The darker lines show the rolling average over 25 frames, while the shaded lines represent the raw data. (G) Fluorescence anisotropy measurements of labeled MPZ1N-2X binding to hIRE1α LD wild type and mutants E102R and Y161R.”
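
The two quantities plotted in panel F, a minimum groove-peptide distance per frame and its rolling average over 25 frames, can be sketched in a few lines. The sketch rests on assumptions the legend does not specify: atoms are plain (x, y, z) tuples, the groove and peptide atom selections are supplied by the caller, and the rolling window is trailing rather than centered.

```python
import math


def minimum_distance(groove_atoms, peptide_atoms):
    """Smallest pairwise distance between two sets of (x, y, z) coordinates."""
    return min(math.dist(g, p) for g in groove_atoms for p in peptide_atoms)


def rolling_average(series, window=25):
    """Mean over each run of `window` consecutive frames (len - window + 1 values)."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    running_sum = sum(series[:window])
    averages = [running_sum / window]
    for i in range(window, len(series)):
        # Slide the window: add the new frame, drop the oldest one.
        running_sum += series[i] - series[i - window]
        averages.append(running_sum / window)
    return averages


# Hypothetical single frame: one groove atom, two peptide atoms (arbitrary units).
frame_distance = minimum_distance([(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0), (10.0, 0.0, 0.0)])

# Hypothetical per-frame distances smoothed with a short window for illustration.
smoothed = rolling_average([1, 2, 3, 4, 5], window=3)
print(frame_distance, smoothed)
```

A sustained rise of the smoothed distance, rather than a single-frame spike in the raw series, is what would indicate a partial or full dissociation event in such a plot.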

      Addition to the text. (Methods section: Binding free energy calculations (MM/PBSA)) “The binding free energy of noncovalently bound complexes of human IRE1 cLD and peptides was calculated with the MM/PBSA (Molecular Mechanics/Poisson–Boltzmann Surface Area) method via gmx_MMPBSA (version 1.6.4)[1, 2]. The Poisson–Boltzmann method was used to estimate the electrostatic contribution to solvation free energy as recommended for data obtained with the CHARMM force field. The contribution of the entropic term was omitted, obtaining effective binding free energy values, or enthalpy of binding (∆H). We used the Single Trajectory Protocol (STP), using the cLD-peptide simulations as input. The calculations were performed on the last 250 ns of each replica. Single-term total non-polar solvation free energy (inp = 1) was used. The charmm_radii (PBRadii = 7) were used to build Amber topology files [3]. The default parameters were applied for other terms.”
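
The arithmetic behind the effective binding energy in a single-trajectory protocol can be sketched as below. This is not the gmx_MMPBSA interface; it only illustrates the bookkeeping, assuming per-frame total energies (molecular mechanics plus solvation) for the complex, receptor, and ligand have already been computed from the same complex trajectory. The frame times and energy values are invented, and no entropy term is included, matching the description above.

```python
def effective_binding_energy(frames, start_ns):
    """Average E_complex - E_receptor - E_ligand over frames at or after start_ns.

    In a single-trajectory protocol, receptor and ligand energies are evaluated
    on conformations extracted from the complex trajectory itself.
    """
    selected = [f for f in frames if f["time_ns"] >= start_ns]
    if not selected:
        raise ValueError("no frames in the selected time window")
    deltas = [f["complex"] - f["receptor"] - f["ligand"] for f in selected]
    return sum(deltas) / len(deltas)


# Hypothetical per-frame energies (arbitrary energy units) from a 1000 ns replica.
frames = [
    {"time_ns": 700, "complex": -1000.0, "receptor": -900.0, "ligand": -90.0},
    {"time_ns": 800, "complex": -1040.0, "receptor": -900.0, "ligand": -110.0},
    {"time_ns": 900, "complex": -1030.0, "receptor": -905.0, "ligand": -100.0},
    {"time_ns": 1000, "complex": -1045.0, "receptor": -900.0, "ligand": -110.0},
]

# Keep only the last 250 ns of the replica, i.e. frames from 750 ns onward.
delta_h = effective_binding_energy(frames, start_ns=750)
print(delta_h)
```

A more negative value indicates more favorable binding within this approximation; because the entropic term is omitted, such values are better suited to ranking related peptides than to estimating absolute affinities.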

      Addition to the text. (Methods section: Protein purification) “To express hIRE1α LD (24-443), human cDNA sequences were cloned into pET47b(+) to create a coding sequence with an N-terminal His6-tag. Mutations of hIRE1α LD were introduced by overlap extension PCR and restriction cloning into pET47b(+). For expression of the proteins, the plasmid of interest was transformed into Escherichia coli strain BL21DE3* RIPL (Agilent Technologies). Cells were grown in Luria Broth until OD600=0.6-0.8. Protein expression was induced with 0.6 mM IPTG, and cells were grown at 20°C overnight. For purification, cells after harvesting were resuspended in Lysis Buffer (50 mM HEPES pH 7.2, 400 mM NaCl, 20 mM imidazole, 5% glycerol, 5 mM β-mercaptoethanol) and were lysed in a Constant Systems cell disruptor at 25 000 psi. The supernatant was collected after centrifugation for 45 minutes at 48000×g at 4°C. The supernatant was loaded onto a Ni-NTA column (Cytiva) and the protein was eluted with a linear gradient of imidazole from 20 to 500 mM. Fractions containing the protein were diluted 1:8 with anion exchange wash buffer (50 mM HEPES pH 7.2, 5 mM β-mercaptoethanol), loaded onto a HiTRAP-Q ion exchange column (Cytiva) and eluted with a linear gradient from 50 mM to 1 M NaCl. Afterwards, the His6-tag was removed by cleavage with PreScission protease (GE Healthcare, 1 µg of enzyme per 100 µg of protein). The cleavage was performed overnight at 4°C. The protein sample after cleavage was loaded onto a Ni-NTA column, and the flow-through containing protein without the tag was collected. The protein was further purified on a Superdex 200 10/300 gel filtration column equilibrated with Buffer A (25 mM HEPES pH 7.2, 150 mM NaCl, 2 mM DTT). Protein concentrations were determined using the extinction coefficient at 280 nm predicted by the Expasy ProtParam tool (http://web.expasy.org/protparam/).”

      Addition to the text. (Methods section: Fluorescence anisotropy) “For fluorescence anisotropy measurements, the MPZ1-N-2X peptide attached to 5-carboxyfluorescein (5-FAM) at its N-terminus was obtained from GenScript at >95% purity. Binding affinities of hIRE1α LD mutants to FAM-labeled peptides were determined by measuring the change in fluorescence anisotropy on a Tecan CM Spark Micro Plate Reader with excitation at 485 nm and emission at 525 nm with increasing concentrations of hIRE1α LD variants. Measurements were performed in Buffer A supplemented with Tween 20 (25 mM HEPES pH 7.2, 150 mM NaCl, 2 mM DTT, 0.025% Tween 20). Fluorescently labeled peptides were used at a concentration of 90 nM. The reaction volume of each data point was 25 µL and the measurements were performed in 384-well, black flat-bottomed plates (Corning) after incubation of peptide with hIRE1α LD variants for 30 min at 25°C. Binding curves were fitted using Prism Software (GraphPad) using the following equation: F<sub>bound</sub> = r<sub>free</sub> + (r<sub>max</sub> − r<sub>free</sub>)/(1 + 10<sup>(Log K<sub>1/2</sub> − x)·n<sub>H</sub></sup>), where F<sub>bound</sub> is the fraction of peptide bound, and r<sub>max</sub> and r<sub>free</sub> are the anisotropy values at the maximum and minimum plateaus, respectively. n<sub>H</sub> is the Hill coefficient and x is the concentration of the protein in log scale. Curve-fitting was performed with minimal constraints to obtain K<sub>1/2</sub> values with high R<sup>2</sup> values. However, as this equation does not consider the equilibria between hIRE1α LD dimers/oligomers, these apparent K<sub>1/2</sub> values do not reflect the dissociation constant.”
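
The binding-curve equation quoted above is a four-parameter Hill (variable-slope) model and can be written as a short function. This sketch only evaluates the model; it does not reproduce the Prism fit. The plateau anisotropies and the Hill coefficient below are invented for illustration, while the K<sub>1/2</sub> of 2.14 µM is the wild-type value reported in the response.

```python
import math


def hill_anisotropy(log_conc, r_free, r_max, log_k_half, hill):
    """Predicted signal at a protein concentration given on a log10 scale.

    Implements r_free + (r_max - r_free) / (1 + 10**((log_k_half - log_conc) * hill)).
    """
    return r_free + (r_max - r_free) / (1.0 + 10.0 ** ((log_k_half - log_conc) * hill))


# Hypothetical plateaus and Hill coefficient; K1/2 = 2.14 uM as reported for wild type.
r_free, r_max, hill = 0.05, 0.25, 1.0
log_k_half = math.log10(2.14)

# At a concentration equal to K1/2 the curve sits halfway between the plateaus.
midpoint = hill_anisotropy(log_k_half, r_free, r_max, log_k_half, hill)

# Far below and far above K1/2 the curve approaches r_free and r_max.
low = hill_anisotropy(math.log10(0.001), r_free, r_max, log_k_half, hill)
high = hill_anisotropy(math.log10(10000.0), r_free, r_max, log_k_half, hill)
print(midpoint, low, high)
```

Because the model is defined on a log-concentration axis, K<sub>1/2</sub> is the protein concentration giving a half-maximal change in signal, which is why the response describes it as an apparent value rather than a dissociation constant.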

      (2) Wherever possible, conclusions related to binding affinity should not be drawn from single unbinding events. For example, the title of Figure 4, "Single point mutation of cLD alters the binding affinity of unfolded peptide," should be softened. Similar changes should be made throughout the manuscript where such claims have been presented.

      We thank the Reviewer for highlighting this important point. In the revised manuscript, we have adjusted the text to remove or soften conclusions related to binding affinity that were based on single unbinding events in the MD simulations.

      Addition to the text. (Figure 4 title) “Single point mutations of cLD alter the binding of unfolded peptide MPZ1N-2X.”

      Addition to the text. (Results section title: Unfolded polypeptides can stably bind to hIRE1α cLD dimer) “Unfolded polypeptides bind to hIRE1α cLD dimer surface.”

      Addition to the text. (Results section: Unfolded polypeptides bind to hIRE1α cLD dimer surface) “Our goal was to elucidate a potential binding pose and identify the relevant features of unfolded proteins and the cLD that affect the binding.”

      Reviewer #2 (Recommendations for the authors):

      (1) A table of all simulated trajectories, including simulation conditions, number of replicas, box size, number of atoms, equilibration length, recording time step, number of frames for further analysis.

      We thank the Reviewer for this helpful suggestion. We have added a summary table of all simulations, including the requested details, to the Supplementary Information (Table 1).

      Addition to the text. (Supplementary figures and tables: Table 2)

      (2) The current NVT equilibration time was 0.125ns, and then no productive NPT simulations were mentioned as equilibration. Even though this is a simulation of mostly folded structures, it still takes some time for these amino acids to relax within the force field.

      We thank the Reviewer for this constructive comment and acknowledge the validity of the concern. However, our simulations were extensively sampled, and equilibration was achieved within the first 50 ns of the production runs. Therefore, the segments of the trajectories from which we draw conclusions correspond to equilibrated states (see RMSD analysis, Figure 1). Additionally, binding free energy calculations (MM/PBSA) were carried out on the last 250 ns of the simulation replicas.

      (3) At least three histograms were presented in Figure 2C, which I guess is from multiple simulations, and does not seem to be discussed.

      We thank the Reviewer for pointing out the lack of reference to Figure 2C. We added the correct reference to the text where the groove width of luminal domains of human and yeast is discussed.

      Author response image 1.

      RMSD analysis of human IRE1α cLD dimer simulated in complex with unfolded peptides.

      Addition to the text. (Results section: The putative groove of human IRE1α cLD is dynamic but unable to contain peptides) In simulations of the dimeric structures, the average groove width was 7.3 ± 0.1 Å for the human cLD and 8.9 ± 0.1 Å for the yeast cLD, averaged over three TIP3P and three TIP4P-D replicas per system (Fig. 2C).

      (4) The comment regarding the CHARMM force field on Page 6 is not justified. Actually the force field the authors used (CHARMM36m, Jing et al Nat Methods 2016) did include scaling of TIP3P LJ parameters to correctly capture the dimensions of the intrinsically disordered proteins (IDPs). However, the authors cited a couple of examples of literature of previous versions of CHARMM force fields and commented that it cannot capture IDP dimensions with TIP3P.

      We thank the Reviewer for pointing out this source of confusion. We cited the main papers of CHARMM as [4, 5], which were misleading, and following the Reviewer’s advice, we removed these citations.

      Addition to the text. (Results section: The hIRE1α cLD forms a stable dimer) “Current all-atom force fields used in MD simulations are mainly designed to reproduce the dynamics of folded and globular proteins [6].”

      (5) I am fine that the authors used TIP4PD with CHARMM36m, but caution should be taken for such a combination of protein and water force fields. Note that when optimizing force fields for IDPs, one often has to balance protein-water interactions by either enhancing protein-water interactions, enhancing water dispersions, or reducing protein-protein interactions. So, all such optimization is dependent on both protein and water force fields. TIP4PD was designed to pair with Amber99sb-ildn or, most recently, Amber99sb-disp instead of CHARMM36m. This could result in rescaling of LJ parameters.

      We thank the Reviewer for raising this issue. We argue that the TIP4P-D water model has been used in combination with the CHARMM36m force field [7] and has been shown to yield satisfactory results for disordered regions.

      Addition to the text. (Results section: The hIRE1α cLD forms a stable dimer) “The TIP4P-D water model was developed to address limitations of existing force fields in reproducing the structural ensembles of intrinsically disordered proteins and regions. It incorporates enhanced dispersion and moderately stronger electrostatic interactions to improve the balance between water dispersion and electrostatics [8]. Zapletal et al. [7] showed that for proteins containing both folded and disordered regions, the CHARMM36m force field [9] in combination with the TIP4P-D water model provides a robust framework, preventing collapse of disordered regions while preserving folded regions. Acknowledging that the behavior of disordered regions can be case-specific, we conducted molecular dynamics simulations of the two cLD dimer models using the CHARMM36m force field with both TIP3P and TIP4P-D water models.”

      (6) I suggest referring to the methodology part for simulation details as much as possible when presenting the story.

      We thank the Reviewer for this suggestion. In the revised manuscript, we now refer the reader to the Methodology section for detailed descriptions of the HDX-MS data analysis and the MM/PBSA free energy calculations.

      Addition to the text. (Results section: Hydrogen-deuterium exchange experimental data validate the cLD dimer structure) “From our simulations, we calculated the theoretical deuterated fraction using the method by Bradshaw et al. [10] and compared it to the experimental data (Fig. 1C-D and Supplementary Fig. 10) (see Methods).”

      Addition to the text. (Results section: Unfolded polypeptides bind to hIRE1α cLD dimer surface) “We further assessed the MPZ-derived peptide complexes using MM/PBSA free energy calculations over the final 250 ns of each simulation replica (see Methods), finding binding enthalpies consistent with our observations (Supplementary Fig. 16B). In particular, MPZ1N-2X exhibited the lowest binding energy, whereas MPZ1N-2X-RD showed the highest.”

      (7) Error bars and methodology of error analysis should be provided for all cases of all-atom simulations if possible, since convergence is always an issue when considering these conformational changes within microseconds of all-atom simulations.

      We thank the Reviewer for the important observation. We agree and added error methodology for the estimation of theoretical deuterated fractions (Fig. 1C).

      Addition to the text. (Figure 1C legend) “Each point represents the mean value derived from three replicas and two monomers per replica. The error bars were obtained from bootstrapping.”

      Addition to the text. (Methods section: Hydrogen-deuterium exchange fractions calculation from MD simulations) “To reproduce the time points after incubation in deuterium (D<sub>2</sub>O), we computed deuterated fractions separately for each of the two monomers constituting a dimer for the time points 0.5 min (30 s) and 5 min (300 s). Then, we computed the mean and standard deviation over the data coming from replicas of the same cLD dimer model (AF or PDB model) and the same water model (TIP3P or TIP4P-D). To estimate the uncertainty of the mean values obtained from our datasets and the dataset from Amin-Wetzel et al. ([11] Figure 3—source data 1), we applied a non-parametric bootstrap resampling procedure. For each sequence range from HDX-MS analysis, we treated the measurements from the N=6 independent datasets as independent samples, accounting for 3 replicas each with two monomers (6 monomers total). We then generated 10,000 bootstrap replicates by sampling the datasets with replacement, maintaining the same number of samples N in each resample. For each replicate, we calculated the mean at each sequence position. The resulting distribution of bootstrap means was used to compute the standard deviation as an estimate of the standard error. We computed the difference between simulation and experimental data (deuterated fraction discrepancy), and for each residue, we selected as the ‘best structure’ the model with the discrepancy closest to zero among PDB-TIP3P, PDB-TIP4P-D, AF-TIP3P, and AF-TIP4P-D systems.”
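
The bootstrap procedure described above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' script: the function name, the random seed, and the six example values are hypothetical; only N = 6 samples and 10,000 replicates come from the text.

```python
import random
import statistics


def bootstrap_standard_error(values, n_boot=10_000, seed=0):
    """Estimate the standard error of the mean by non-parametric bootstrap.

    Each replicate resamples len(values) items with replacement and records
    the mean; the standard deviation of those means is returned.
    """
    rng = random.Random(seed)
    n = len(values)
    boot_means = []
    for _ in range(n_boot):
        resample = [values[rng.randrange(n)] for _ in range(n)]
        boot_means.append(sum(resample) / n)
    return statistics.stdev(boot_means)


# Hypothetical deuterated fractions for one peptide segment:
# 3 replicas x 2 monomers = 6 samples.
fractions = [0.30, 0.35, 0.32, 0.40, 0.38, 0.33]
print("mean:", round(statistics.mean(fractions), 4))
print("bootstrap SE:", round(bootstrap_standard_error(fractions), 4))
```

With only six samples the bootstrap distribution is coarse, so the resulting error bar is best read as an indicative spread rather than a tight confidence bound.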

      (8) Technically I would call DR1 and DR2 linker regions within a folded structure. Their motions are quite restrained by the folded part. I am therefore not sure how much TIP4PD really helps in contrast to a scaled TIP3P. A plot of structures colored with pLDDT score or B-factor within the PDB should be provided. Quantitative metrics of these regions (e.g. chi-squared) might help justify the choice of the AF model against the PDB model. Currently, the two models look very similar in Figures 1c and 1d. Similarly, quantitative metrics as a function of different simulation time windows will help justify the convergence of the simulation and indicate the flexibility of these regions.

      We thank the Reviewer for this thoughtful comment. In response, we analyzed the AlphaFold2 and AlphaFold3 predictions, which consistently assign very low pLDDT values (<50) to the DR2 region, while DR1 is predicted with higher but still low confidence (50 < pLDDT < 70). These scores indicate intrinsic uncertainty in the structural definition of both regions, supporting their flexibility despite being located within a folded context.

      Addition to the text. (Results section: The hIRE1α cLD forms a stable dimer) “All five AlphaFold 2 predictions closely resembled the top-ranked model used for our simulations (Supplementary Fig. 7C). In contrast, the five AlphaFold 3 predictions yielded greater variability in DR2 organization and longer helices in DR2, but still consistently maintained low pLDDT scores in this region, indicating disorder (Supplementary Fig. 7D).”

      Addition to the text. (Figure 7 C-D legend) “(C) Superposition of the 5 structures predicted by AlphaFold 2 Multimer for the cLD dimer and colored by confidence prediction score (pLDDT). (D) Superposition of the 5 structures predicted by AlphaFold 3 for the cLD dimer and colored by confidence prediction score (pLDDT).”

      (9) Fluorescence anisotropy seems to be an important set of experimental data to justify the binding of multiple unfolded peptides to IRE. I suggest the authors include a bar plot of binding affinity of different variants in Figure 3. The raw titration curves should also be included in SI.

      We thank the Reviewer for this valuable suggestion. The binding affinities reported in previous studies are summarized in Table 2; the reader is referred to those works for the corresponding raw titration curves. The binding affinities for the cLD mutants analyzed in the present study are provided in Table 3, and the associated titration curves are shown in Figure 4G.

      Addition to the text. (Figure 4G legend) “Fluorescence anisotropy measurements of labeled MPZ1N-2X binding to hIRE1α LD wild type and mutants E102R and Y161R.”

      Addition to the text. (Supplementary figures and tables: Table 3) See Tab. 1

      (10) The authors should discuss the dependence of initial orientations of unfolded peptides on the final results. The authors claimed that after 1 microsecond simulations, the orientation of these peptides to IRE changed. Quantitative metrics showing both the binding (e.g., number of contacts) and binding orientation (contact region or angles) should be provided to tell whether the simulation is converged. The comparison to the experimental data lacks quantitative metrics. The authors mentioned the dissociation of MPZ1N-2X-RD in half of the simulations; they might want to provide such a metric for all peptides. Technically, 1 microsecond brute-force simulation is quite short for observing such a binding event, and enhanced sampling methods (e.g. metadynamics) might be necessary for investigating binding. However, at least the presentation and interpretation of the current results should be improved for comparing simulations and experiments.

      We thank the Reviewer for the insight. We expanded the discussion of the peptide orientation and added an analysis of the peptide angle with respect to the cLD central groove and contacts. Additionally, we inserted AlphaFold 3 predictions of all the simulated complexes.

      Addition to the text. (Results section: Unfolded polypeptides bind to hIRE1α cLD dimer surface) “In initial simulations with peptides valine8 and MPZ1-N, we positioned the polypeptides over the cLD, aligning them parallel to the principal axis of the central groove in accordance with the proposed binding mode. We refer to this pose as the "0° orientation", as the peptide forms a 0° angle with the principal axis of the groove. We observed that the peptides could rearrange into an orientation perpendicular to the central groove axis, while maintaining contact with the dimer (Fig. 3A, Supplementary Fig. 13A, valine8 TIP4P-D, and Supplementary Fig. 14). Conversely, when MPZ1-N was initially oriented perpendicularly to the groove, it did not transition to a parallel (0°) orientation (Supplementary Fig. 14). We refer to these poses as the "90° orientation" and "270° orientation".”

      Addition to the text. (Supplementary Figures and Tables Fig. 14) “(A) Peptide orientation with respect to the central groove principal axis. The angle was computed as the dihedral angle described by the Cα atoms of Y161 residues (groove principal axis) and the Cα atoms of residues L1 and A12 of the MPZ1N peptide. The dark lines indicate the rolling average of the fraction of native contacts over 10 frames, while the shaded lines indicate the value per frame. (B) Number of contacts between hIRE1α cLD dimer and MPZ1N peptide. The dark lines indicate the rolling average of the fraction of native contacts over 50 frames, while the shaded lines indicate the value per frame. The analyses were performed on three sets of simulations: "90 degrees" orientation, the peptide is initially placed perpendicular to the central groove principal axis; "270 degrees" orientation, the peptide is initially placed perpendicular to the central groove principal axis but flipped 180 degrees with respect to the 90 degrees orientation; "0 degrees" orientation, the peptide is placed parallel to the groove principal axis.”
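
The orientation angle in the legend above is a four-point dihedral. The sketch below shows one common way to compute such an angle from Cartesian coordinates in plain Python. The point order (Y161 Cα of one monomer, Y161 Cα of the other, then peptide L1 Cα and A12 Cα), the function names, and the toy coordinates are assumptions made for illustration; the authors' analysis code may define the angle differently.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Dihedral angle p0-p1-p2-p3 in degrees, mapped to [0, 360)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond b1.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x)) % 360.0


# Toy coordinates (arbitrary units): the first two points stand in for the
# groove axis, the last two for the peptide end-to-end vector.
groove_a, groove_b = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0)
pep_start = (0.0, 0.0, 1.0)
for pep_end in ((1.0, 0.0, 1.0), (0.0, 1.0, 1.0), (0.0, -1.0, 1.0)):
    print(pep_end, dihedral_degrees(groove_a, groove_b, pep_start, pep_end))
```

Mapping the angle to [0, 360) rather than (−180, 180] keeps the two perpendicular poses distinct, in line with the "90°" and "270°" labels used in the text.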

      Addition to the text. (Results section: Unfolded polypeptides bind to hIRE1α cLD dimer surface) “AlphaFold3 predictions of the complexes indicate that the peptides adopt the same preferred orientation, despite being predominantly helical (Supplementary Fig. 16A).”

      Addition to the text. (Supplementary Figures and Tables Fig. 16A) “(A) Prediction of AlphaFold 3 for hIRE1α cLD dimer in complex with peptides. Colors represent the confidence of the prediction (pLDDT).”

      (11) I also have a couple of questions regarding the point mutant Y161R. a) The motivation of mutating Y161 to R is more speculative (Figures 4a,b) than quantitative. The authors might want to show an intermolecular contact map between IRE and unfolded peptides or IRE contact probability along residue indexes to show the interaction hotspots. Figure S11 only showed the structure instead of any metrics for such a purpose. b) It might be better to also show a histogram of the distances of Figure 4e and 4f. Figure 4f actually suggested 1 microsecond simulation is quite short to observe the dissociation event. c) Testing the mutation within the experiment, if possible, would clearly strengthen this part of the manuscript.

      We thank the Reviewer for these constructive suggestions. We have added an analysis of intermolecular contacts for the Y161R and E102R mutants (Fig. 18A–B), which highlights the interaction hotspots between IRE1 residues and the unfolded peptides. To further characterize peptide–groove interactions, we now provide minimum peptide–groove distance time series for all peptides (Fig. 15B). Moreover, to experimentally support our simulations, we performed fluorescence anisotropy measurements on the MPZ1N-2X peptide with cLD WT and mutant constructs. These experiments confirm our computational observations (Fig. 4F–G and Fig. 18C).

      Addition to the text. (Figure 18 legend) “(A) Number of contacts between residues 102 on both monomers and the MPZ1-N-2X peptide during simulations of WT hIRE1α LD and mutants E102R and Y161R. The dark lines indicate the rolling average of the fraction of native contacts over 25 frames, while the shaded lines indicate the value per frame. (B) Number of contacts between residues 161 on both monomers and the MPZ1-N-2X peptide during simulations of WT hIRE1α LD and mutants E102R and Y161R. The dark lines indicate the rolling average of the fraction of native contacts over 25 frames, while the shaded lines indicate the value per frame. (C) Protein purification of WT hIRE1α LD and mutants E102R and Y161R.”

      Addition to the text. (Figure 4F-G legend) “(F) Time series of the minimum groove-peptide distance for MPZ1N-2X simulated in complex with wild-type, E102R, and Y161R hIRE1α cLD dimer in TIP3P (3 replicas) and TIP4P-D (3 replicas) water. The darker lines show the rolling average over 25 frames, while the shaded lines represent the raw data. (G) Fluorescence anisotropy measurements of labeled MPZ1N-2X binding to hIRE1α LD wild type and mutants E102R and Y161R.”

      Addition to the text. (Figure 15B legend) “(B) Minimum groove-peptide distance over time for all simulations of cLD dimer in complex with a peptide. The left column shows the values for the three replicas in TIP3P water, while the right column displays those for the three replicas in TIP4P-D water.”
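
The minimum groove-peptide distance and its rolling average, as described in the legends above, can be illustrated with a short standard-library sketch. The function names, the toy coordinates, and the choice of a trailing window that only reports fully filled windows are assumptions; the legends specify only the window lengths (for example 25 frames), not how the window is aligned.

```python
import math


def min_distance(group_a, group_b):
    """Smallest Euclidean distance between any atom of group_a and group_b."""
    return min(math.dist(a, b) for a in group_a for b in group_b)


def rolling_mean(series, window):
    """Mean over each run of `window` consecutive values (full windows only)."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


# Toy trajectory: one groove atom at the origin and a two-atom peptide
# that drifts away along x by 0.1 nm per frame (hypothetical numbers).
groove = [(0.0, 0.0, 0.0)]
frames = [[(0.5 + 0.1 * t, 0.0, 0.0), (1.0 + 0.1 * t, 0.0, 0.0)]
          for t in range(6)]
distances = [min_distance(groove, peptide) for peptide in frames]
print([round(d, 2) for d in distances])
print([round(d, 2) for d in rolling_mean(distances, 3)])
```

The raw series corresponds to the shaded lines in such plots and the smoothed series to the darker lines.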

      (12) Similar comments of quantitative analysis (e.g. contact map as a function of simulation time) apply to the last part of results when discussing the intermolecular interactions. Observations such as "the interface predicted by AlphaFold showed stability across MD simulation replicas lasting 200 ns" were provided, but there is no quantitative analysis. How consistent was this observation across multiple replicas of simulations, and how many replicas were used?

      We thank the Reviewer for this valuable suggestion. To provide a quantitative assessment, we performed new triplicate simulations of the BiP–cLD monomer complex and plotted the fraction of native contacts over time. These results, which demonstrate the consistency of the interface across replicas, are now included in the Supplementary Material.

      Addition to the text. (Figure 19 legend) “(A) Prediction of AlphaFold 3 for hIRE1α cLD monomer in complex with ATP-bound BiP. The colors are as in Fig. 5B. (B) Prediction of AlphaFold 3 for hIRE1α cLD monomer in complex with ADP-bound BiP. (C) Prediction of AlphaFold 3 for hIRE1α cLD monomer in complex with BiP not bound to any nucleotide. (D) Structure of hIRE1α cLD-BiP-ATP after 2 µs of simulation. (E) Structure of hIRE1α cLD-BiP-ADP after 2 µs of simulation. (F) Structure of hIRE1α cLD-BiP after 2 µs of simulation.”

      Addition to the text. (Figure 20 legend) “Fraction of native contacts between BiP and cLD monomer in simulations of the structures predicted by AlphaFold 3 without ligands or in complex with ADP or ATP. The dark lines indicate the rolling average of the fraction of native contacts over 100 frames, while the shaded lines indicate the value per frame. The fraction of native contacts (Q) was calculated according to the definition of Best et al. [12]: Q(X) = (1/N) Σ<sub>(i,j)</sub> 1/(1 + exp[β(r<sub>(i,j)</sub>(X) − λr<sup>0</sup><sub>(i,j)</sub>)]), where the sum runs over the N pairs of native contacts (i, j), r<sup>0</sup><sub>(i,j)</sub> is the distance of the pair in the initial configuration (here the AlphaFold 3 prediction), r<sub>(i,j)</sub>(X) is the distance at frame X, β is a smoothing parameter (β = 50 nm<sup>−1</sup>), λ is the tolerance of the reference distance (λ = 1.8) and the cutoff used to define a contact between heavy atoms was 0.45 nm.”
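
For readers unfamiliar with the soft-cutoff definition of native contacts, the sketch below evaluates Q for a single frame. It assumes the widely used form Q = (1/N) Σ 1/(1 + exp[β(r − λr⁰)]) with the parameters quoted in the legend (β = 50 nm⁻¹, λ = 1.8); the function name and the distances are hypothetical, and selecting the native pairs with the 0.45 nm cutoff is left out.

```python
import math


def fraction_native_contacts(r, r0, beta=50.0, lam=1.8):
    """Soft-cutoff fraction of native contacts for one frame.

    r  : current distances (nm) of the native contact pairs
    r0 : reference distances (nm) of the same pairs, in the same order
    """
    if len(r) != len(r0) or not r:
        raise ValueError("r and r0 must be non-empty and of equal length")
    total = 0.0
    for r_ij, r0_ij in zip(r, r0):
        # Clamp the exponent so that very large separations cannot overflow.
        exponent = min(beta * (r_ij - lam * r0_ij), 700.0)
        total += 1.0 / (1.0 + math.exp(exponent))
    return total / len(r)


# Hypothetical reference distances for three native pairs (nm).
reference = [0.40, 0.35, 0.42]
print(fraction_native_contacts(reference, reference))           # all intact
print(fraction_native_contacts([0.40, 5.00, 0.42], reference))  # one broken
```

Because of the tolerance λ, a pair still counts as formed until it stretches to roughly 1.8 times its reference distance, which makes Q robust to thermal fluctuations at the interface.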

      (13) The figure legends are noted using lowercase letters but are described using uppercase.

      We thank the Reviewer for pointing that out, and we changed everything to capital letters.

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 1: I am confused about the HDX-MS results shown in Figure 1. Here, I must also mention that I am not familiar with comparing HDX-MS experiments with MD simulations. The authors mention that they show the deuterated fraction computed from MD simulations for the PDB and AF model at time points 0.5 min and 5 min. However, this time certainly does not correspond to the MD simulation time, thus, it is unclear to me where the difference between the results comes from. Are the two time points some input parameters to the script used to calculate the deuterated fraction? Thus, I would ask the authors to better explain what is the difference in the results between the two time points. Especially, since the general reader might not be familiar with comparing HDX-MS experimental results to MD simulations. Furthermore, I would ask the authors to clarify in the Figure 1 caption that these time points do not correspond to the MD simulation time.

      We thank the Reviewer for pointing us to this possible source of confusion. The time points are effectively input parameters to the calculations of theoretical deuterated fractions from MD simulations. We expanded the explanation of the method in the method section and clarified in the Figure 1 caption that these time points do not correspond to the MD simulation time.

      Addition to the text. (Methods section: Hydrogen-deuterium exchange fractions calculation from MD simulations) “To determine the deuterated fraction of a peptide segment from simulations, the protection factor for each residue i, P<sub>i</sub>, must be computed from the simulation snapshots, following the approach of Best and Vendruscolo [13]: ln P<sub>i</sub> = β<sub>C</sub>N<sub>C,i</sub> + β<sub>H</sub>N<sub>H,i</sub>. Here, N<sub>C,i</sub> and N<sub>H,i</sub> are the numbers of heavy-atom contacts and H-bonds of the backbone amide of residue i, respectively, and the scaling factors β<sub>C</sub> and β<sub>H</sub> are set to 0.35 and 2.0, respectively. The simulated deuterated fraction of a peptide segment, D<sub>j</sub><sup>sim</sup>(t), defined by residues m<sub>j</sub> +1 to n<sub>j</sub>, was then calculated at any exchange time point t as:

      D<sub>j</sub><sup>sim</sup>(t) = [1/(n<sub>j</sub> − m<sub>j</sub>)] Σ<sub>i = m<sub>j</sub>+1</sub><sup>n<sub>j</sub></sup> [1 − exp(−k<sub>int,i</sub> t / P<sub>i</sub>)]

      where m<sub>j</sub> and n<sub>j</sub> are the first and last residue numbers of the j-th protein fragment, respectively. The intrinsic exchange rate constants for each residue type (k<sub>int,i</sub>) were obtained from Bai et al. with updated acidic residues and glycine [14, 15].”
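
To make concrete how the experimental time points enter the calculation as input parameters, the sketch below computes a deuterated fraction for one peptide segment. It assumes the commonly used relations ln P_i = β_C·N_C,i + β_H·N_H,i and D(t) = mean over residues of 1 − exp(−k_int,i·t/P_i), with N_C taken as heavy-atom contacts and N_H as hydrogen bonds. The function names, contact counts, and intrinsic rates are hypothetical placeholders, not values from the manuscript.

```python
import math


def protection_factor(n_contacts, n_hbonds, beta_c=0.35, beta_h=2.0):
    """Protection factor P_i from ln P_i = beta_c * N_C + beta_h * N_H."""
    return math.exp(beta_c * n_contacts + beta_h * n_hbonds)


def deuterated_fraction(k_int, protection, t):
    """Average deuterated fraction of a segment at exchange time t.

    k_int      : intrinsic exchange rates per residue (1/min)
    protection : protection factors per residue, same order as k_int
    t          : exchange (incubation) time in minutes, not simulation time
    """
    if len(k_int) != len(protection) or not k_int:
        raise ValueError("k_int and protection must match and be non-empty")
    terms = [1.0 - math.exp(-k * t / p) for k, p in zip(k_int, protection)]
    return sum(terms) / len(terms)


# Hypothetical three-residue segment: (heavy-atom contacts, H-bonds) per
# residue, averaged over simulation frames, and assumed intrinsic rates.
counts = [(10.0, 0.0), (18.0, 1.0), (25.0, 1.0)]
k_int = [600.0, 450.0, 900.0]
pf = [protection_factor(nc, nh) for nc, nh in counts]
for t_min in (0.5, 5.0):
    print(t_min, "min ->", round(deuterated_fraction(k_int, pf, t_min), 4))
```

The same set of protection factors yields different deuterated fractions at 0.5 min and 5 min, which is why the two panels differ even though they derive from the same trajectories.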

      Addition to the text. (Figure 1 legend: ) “This time point corresponds to experimental incubation times, not MD simulation time.”

      Addition to the text. (Figure 10 legend: ) “Time points correspond to experimental incubation times, not MD simulation time.”

      (2) For AlphaFold 2 Multimer prediction, the authors only considered the top predicted structure. However, with AF2-M, one generally obtains 5 structures, and it is also possible to obtain more structures by using an additional random seed. Thus, it would be interesting if the authors would consider the difference between the 5 structures they obtained from the AF2-M prediction. Are they all very similar? (Especially considering the DR1 and DR2 segments, which are the main difference between the PDB and AF2 structures.) Analyzing the different predicted AF2 structures would give more insight into the accuracy of the AF2-M predicted model.

      We thank the Reviewer for this insightful suggestion. All AF2-M predicted structures were found to be highly similar, and we now include them in Figure 7E for comparison.

      Addition to the text. (Figure 7E legend) “(E) Superposition of the 5 structures predicted by AlphaFold 2 Multimer for the cLD dimer and colored by confidence prediction score (pLDDT).”

      (3) On Page 6, the authors talk about a "an early PDB model". First, I find the nomenclature "early" confusing here; perhaps it would be better to talk about "an initial PDB model", but I leave it up to the authors to think about if they want to change that. More importantly, reading the Comp. detail on Page 23, it is not so clear what the difference is between the "early" and "final" PDB models, and how the difference in their setups leads to different results. The information is somewhat there on Page 6 and Page 23, but it can be made much clearer. Thus, I would ask the authors to better explain the difference between the early and final PDB models.

      We thank the Reviewer for this helpful comment. In the revised manuscript, we have clarified the terminology and provided a more explicit explanation of the differences between the two IRE1 models, both in the Results section and in the Methods.

      Addition to the text. (Results section: The hIRE1α cLD forms a stable dimer) “An initial PDB model with modified side chain orientations in residues L116 and Y166 due to the modelling of neighbouring missing DR1, caused the dimer to dissociate in one-third of the replicas. [...] The final PDB model, with correctly oriented L116 and Y166 (Supplementary Fig. 9B), was stable in simulations in both TIP3P and TIP4P-D water (Supplementary Fig. 7B).”

      Addition to the text. (Methods section: IRE1α core Luminal Domain (cLD) structural models - Human PDB dimer) “An initial PDB model was briefly equilibrated in NPT, and a conformation with a groove width of approximately 0.6 nm was selected. This snapshot was used as the initial structure for the initial “PDB model” simulations, in which the dimer dissociates.”

      (4) Page 12: "In early simulations", again, I find the nomenclature "early" confusing here. Perhaps it would be better to talk about "In initial simulations" or "In preliminary simulations", but I leave it up to authors to think about this.

      We thank the Reviewer for pointing out this possible source of confusion. We improved the text by referring to these simulations based on the different orientations of the peptide on the cLD dimer in the modeled complex.

      Addition to the text. (Results section: Unfolded polypeptides bind to hIRE1α cLD dimer surface) “In initial simulations with peptides valine8 and MPZ1-N, we positioned the polypeptides over the cLD, aligning them parallel to the principal axis of the central groove in accordance with the proposed binding mode. We refer to this pose as the "0° orientation", as the peptide forms a 0° angle with the principal axis of the groove. We observed that the peptides could rearrange into an orientation perpendicular to the central groove axis, while maintaining contact with the dimer (Fig. 3A, Supplementary Fig. 13A, valine8 TIP4P-D, and Supplementary Fig. 14). Conversely, when MPZ1-N was initially oriented perpendicularly to the groove, it did not transition to a parallel (0°) orientation (Supplementary Fig. 14). We refer to these poses as the "90° orientation" and "270° orientation".”

      Here, we provide a detailed description of the additional changes made to the manuscript.

      Additional edits to the manuscript

      Following discussions with Prof. Dr. David Ron, we refined our BiP model by removing the signal peptide (residues 1–18). Using AlphaFold 3, we predicted BiP–cLD heterodimeric complexes in the presence of ADP, ATP, or without nucleotide. Each of the three complexes was simulated in TIP3P water, in three independent replicas of 1 µs each.

      Addition to the text. (Results section: hIRE1α cLD intermolecular interactions guide the activation process) “We used AlphaFold 3 to model the interaction between a cLD monomer and BiP (residues E19–L654) in the presence of ATP and ADP (Fig. 5B, Supplementary Fig. 19A). Prediction quality was limited in the apo and ADP-bound states (pTM = 0.48, ipTM = 0.59; pTM = 0.49, ipTM = 0.61, respectively), whereas ATP binding improved accuracy (pTM = 0.66, ipTM = 0.72). The predicted interfaces involved DR2, particularly residues 314PLLEG-318, forming a short parallel β-sheet with the substrate-binding domain (SBD) of BiP through two hydrogen bonds. All AlphaFold 3 models were stable across three 1-µs simulations (Supplementary Fig. 19B), with cLD–BiP interfaces retaining 60–80% of initial contacts (Supplementary Fig. 20). In the apo and ADP-bound states, the nucleotide-binding domain (NBD) showed high Predicted Aligned Error (PAE) relative to the cLD, indicating uncertain positioning of the two domains relative to each other. Notably, in the ADP-bound state, which is thought to interact with hIRE1α cLD, the NBD remained mobile but proximal to the αB-helices, thereby restricting access to this region. Together, the AlphaFold 3 predictions suggest that BiP engages hIRE1α cLD by sterically hindering the oligomerization interface defined by DR2 and the αB-helices [16].”

      Addition to the text. (Figure 5 legend) “(B) BiP-cLD monomer complex as predicted by AlphaFold (BiP in shades of purple, cLD in orange) before the simulation (t = 0 µs) and at the end of the simulation (t = 1 µs). The SBD (residues E19-D408) is colored in light purple, the NBD (residues C420-E650) in dark purple, and the interdomain linker (residues D409-V419) and KDEL motif (residues K651-L654) in light purple.”

      Addition to the text. (Figure 19 legend) “(A) Prediction of AlphaFold 3 for hIRE1α cLD monomer in complex with ATP-bound BiP. The colors are as in Fig. 5B. (B) Prediction of AlphaFold 3 for hIRE1α cLD monomer in complex with ADP-bound BiP. (C) Prediction of AlphaFold 3 for hIRE1α cLD monomer in complex with BiP not bound to any nucleotide. (D) Structure of hIRE1α cLD-BiP-ATP after 2 µs of simulation. (E) Structure of hIRE1α cLD-BiP-ADP after 2 µs of simulation. (F) Structure of hIRE1α cLD-BiP after 2 µs of simulation.”

      Addition to the text. (Methods section: cLD monomer in complex with BiP) “The BiP-cLD heterodimer systems were predicted with AlphaFold 3 using the AlphaFold server [17] at https://alphafoldserver.com/. The hIRE1α cLD sequence is the same as that used for predicting the dimer: the PDB 2HZ6 sequence, UniProt identifier O75460 with mutations C127S and C311S, and residues P29-P368. The BiP sequence is taken from UniProt identifier P11021, residues E19-L654. We predicted three complexes: one without any nucleotide, one containing ADP, and another containing ATP. Simulations of the BiP-cLD complex were run in TIP3P water.”

      We have updated the Zenodo repository with additional data and calculations, and the corresponding link is provided in the manuscript.

      References

      (1) Mario S. Valdés-Tresanco, Mario E. Valdés-Tresanco, Pedro A. Valiente, and Ernesto Moreno. gmx_mmpbsa: A New Tool to Perform End-State Free Energy Calculations with GROMACS. Journal of Chemical Theory and Computation, 17(10):6281–6291, October 2021. Publisher: American Chemical Society.

      (2) Bill R. III Miller, T. Dwight Jr. McGee, Jason M. Swails, Nadine Homeyer, Holger Gohlke, and Adrian E. Roitberg. MMPBSA.py: An Efficient Program for End-State Free Energy Calculations. Journal of Chemical Theory and Computation, 8(9):3314–3321, September 2012. Publisher: American Chemical Society.

      (3) Fanhao Wang, Yuzhe Wang, Laiyi Feng, Changsheng Zhang, and Luhua Lai. Target-Specific De Novo Peptide Binder Design with DiffPepBuilder. Journal of Chemical Information and Modeling, 64(24):9135–9149, December 2024. Publisher: American Chemical Society.

      (4) Alexander D. MacKerell Jr., Bernard Brooks, Charles L. Brooks III, Lennart Nilsson, Benoit Roux, Youngdo Won, and Martin Karplus. CHARMM: The Energy Function and Its Parameterization. In Encyclopedia of Computational Chemistry. 2002.

      (5) Bernard R. Brooks, Robert E. Bruccoleri, Barry D. Olafson, David J. States, S. Swaminathan, and Martin Karplus. CHARMM: A program for macromolecular energy, minimization, and dynamics calculations. Journal of Computational Chemistry, 4(2):187–217, 1983.

      (6) Junxi Mu, Hao Liu, Jian Zhang, Ray Luo, and Hai-Feng Chen. Recent Force Field Strategies for Intrinsically Disordered Proteins. Journal of Chemical Information and Modeling, 61(3):1037–1047, March 2021.

      (7) Vojtech Zapletal, Arnošt Mládek, Kateřina Melková, Petr Louša, Erik Nomilner, Zuzana Jasenáková, Vojtěch Kubáň, Markéta Makovická, Alice Laníková, Lukáš Žídek, and Jozef Hritz. Choice of Force Field for Proteins Containing Structured and Intrinsically Disordered Regions. Biophysical Journal, 118(7):1621–1633, April 2020.

      (8) Stefano Piana, Alexander G. Donchev, Paul Robustelli, and David E. Shaw. Water dispersion interactions strongly influence simulated structural properties of disordered protein states. Journal of Physical Chemistry B, 119(16):5113–5123, April 2015.

      (9) Jing Huang, Sarah Rauscher, Grzegorz Nawrocki, Ting Ran, Michael Feig, Bert L. de Groot, Helmut Grubmüller, and Alexander D. MacKerell. CHARMM36m: an improved force field for folded and intrinsically disordered proteins. Nature Methods, 14(1):71–73, January 2017.

      (10) Richard T. Bradshaw, Fabrizio Marinelli, José D. Faraldo-Gómez, and Lucy R. Forrest. Interpretation of HDX Data by Maximum-Entropy Reweighting of Simulated Structural Ensembles. Biophysical Journal, 118(7):1649–1664, April 2020.

      (11) Niko Amin-Wetzel, Lisa Neidhardt, Yahui Yan, Matthias P. Mayer, and David Ron. Unstructured regions in IRE1 specify BiP-mediated destabilisation of the luminal domain dimer and repression of the UPR. eLife, 8, December 2019.

      (12) Robert B. Best, Gerhard Hummer, and William A. Eaton. Native contacts determine protein folding mechanisms in atomistic simulations. Proceedings of the National Academy of Sciences, 110(44):17874–17879, October 2013. Publisher: Proceedings of the National Academy of Sciences.

      (13) Robert B. Best and Michele Vendruscolo. Structural Interpretation of Hydrogen Exchange Protection Factors in Proteins: Characterization of the Native State Fluctuations of CI2. Structure, 14(1):97–106, January 2006.

      (14) Yawen Bai, John S. Milne, Leland Mayne, and S. Walter Englander. Primary structure effects on peptide group hydrogen exchange. Proteins: Structure, Function, and Bioinformatics, 17(1):75–86, 1993.

      (15) David Nguyen, Leland Mayne, Michael C. Phillips, and S. Walter Englander. Reference Parameters for Protein Hydrogen Exchange Rates. Journal of the American Society for Mass Spectrometry, 29(9):1936–1939, September 2018. Publisher: American Society for Mass Spectrometry.

      (16) G Elif Karagöz, Diego Acosta-Alvear, Hieu T Nguyen, Crystal P Lee, Feixia Chu, and Peter Walter. An unfolded protein-induced conformational switch activates mammalian IRE1. eLife, 6:e30700, 2017.

      (17) Josh Abramson, Jonas Adler, Jack Dunger, Richard Evans, Tim Green, Alexander Pritzel, Olaf Ronneberger, Lindsay Willmore, Andrew J. Ballard, Joshua Bambrick, Sebastian W. Bodenstein, David A. Evans, Chia-Chun Hung, Michael O’Neill, David Reiman, Kathryn Tunyasuvunakool, Zachary Wu, Akvilė Žemgulytė, Eirini Arvaniti, Charles Beattie, Ottavia Bertolli, Alex Bridgland, Alexey Cherepanov, Miles Congreve, Alexander I. Cowen-Rivers, Andrew Cowie, Michael Figurnov, Fabian B. Fuchs, Hannah Gladman, Rishub Jain, Yousuf A. Khan, Caroline M. R. Low, Kuba Perlin, Anna Potapenko, Pascal Savy, Sukhdeep Singh, Adrian Stecula, Ashok Thillaisundaram, Catherine Tong, Sergei Yakneen, Ellen D. Zhong, Michal Zielinski, Augustin Žídek, Victor Bapst, Pushmeet Kohli, Max Jaderberg, Demis Hassabis, and John M. Jumper. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature, pages 1–3, May 2024.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We would like to thank the reviewers for their positive and constructive feedback.

      We apologise for the delay in coming back. The first author has moved to the LMB, and the Trost lab has been relocating to the University of Manchester, which delayed our ability to respond quickly.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Reviewer Comments

      The manuscript by Chatterjee et al. describes a novel ultra-sensitive isolation and deep proteomics workflow to investigate phagosome dynamics of bacterium-containing phagosomes. The method enables dual proteome coverage of both host and pathogen, and the authors report quantitative changes in the host and bacterial proteomes using Salmonella isogenic mutants defective in intracellular survival. They further leverage these datasets to assess the relevance of selected Salmonella genes in intracellular fitness.

      Overall, the manuscript presents a powerful and technically impressive approach that will be of significant interest to the infection biology community. The study is well conceived and addresses an important gap in the field. However, several clarifications and additions would strengthen the work and improve interpretability of the results.

      Specific Comments

      Line 76: The authors should consider including the following relevant citations: PMID: 30079117 and PMID: 31009521.

      We thank the reviewer for pointing this out. We have now included the suggested references.


      Line 104: Please define the abbreviation BFP clearly upon first use.

      We thank the reviewer; we have defined the abbreviation upon first instance.

      Figure 1A, Step 2: From the schematic, it is unclear whether the pellet or the supernatant is used for the subsequent step in which the CellVue dye is added. Please clarify.

      We thank the reviewer for bringing this to our attention. We have now modified Figure 1A.

      Figure 1B: It would be informative to report the percentage of S. Typhimurium that are double positive, especially in the BFP + Claret condition. A small bar plot for each condition would help visualize and compare the proportion of Claret-labelled bacteria.

      We have now included a figure for the percentage of BFP + Claret for STM in S1H.

      Figure 1C: The distinction between the upper and lower images is unclear. Do they represent different particles or different fields of view of the same sample? Please clarify.

      They are both from different fields of view.

      Line 122: The statement is not entirely accurate. Cells that lyse via pyroptosis will leave behind cellular remnants, including nuclei, that may still co-sediment with intact cells in such preparations.

      We have modified the sentence accordingly.

      Line 128: CellVue and Claret appear to be used interchangeably; are they the same reagent? Please clarify and use consistent terminology throughout.

      We have rectified this inconsistency in our revised manuscript.

      Line 136: Please explain the basis for the stated estimates. If this is common knowledge within the field, additional explanation would still be helpful for non-experts.

      We have clarified this further in the manuscript. These numbers are of course estimates, but they give the reader an idea of how little material we are working with.

      Lines 143 & 145: Please define "protein IDs" and indicate how many correspond to host proteins versus Salmonella proteins.

      We have defined this in our revised manuscript. Also, to avoid any confusion, these proteomics methods were optimised using a commercially available HeLa protein digest, and hence no Salmonella proteins are detected here.

      Figure 2D: Please specify the number and type of replicates used. Also indicate the plot type (e.g., violin plot) and the statistical test used to determine significance.

      We have updated the figure legends for 2D and 2E to state the number of biological replicates, i.e. n=4 and n=3.

      Line 244: Please consider citing PMID: 32514074 and PMID: 23162002.

      We have included these references.

      Line 253: Have the authors considered how their observations regarding MHC relate to prior findings (PMID: 27832589)?

      Thank you for suggesting this paper; we enjoyed reading it. However, since the paper suggested by the reviewer focusses on cell surface MHC molecules and we are looking at the phagolysosomal compartment, we feel it may be difficult to make connections.

      Line 265: Clarify which "cell" is being referred to: the host cell or the bacterial cell.

      We have modified the sentence to reduce confusion.

      Line 278: Have the authors considered how their observations on glycolytic proteins relate to earlier work (PMID: 19380470 and PMID: 37594988)?

      Thank you for pointing out these papers. We have cited both of them and added a sentence stating that intracellular STM utilises host metabolites.

      Line 285: The claim that "PhoP-dependent effectors actively remodel..." requires clarification. If the authors are referring to all PhoP-regulated genes as "effectors," this terminology may cause confusion, as "effectors" in the Salmonella field typically denotes T3SS-secreted proteins. While some T3SS effectors are PhoP-regulated, PhoP controls many additional genes, and the observed phenotypes may reflect broader defects in intracellular survival rather than absence of secreted effectors specifically. Rewording is recommended.

      Thank you for your suggestion; we have modified the text accordingly.

      Line 313: Have the authors examined later time points (e.g., 8 hpi), when the SCV is more established and SPI-2 effector expression is higher?

      We did not test the 8 hpi timepoint because our primary aim was to identify the induction of SPI-2 effectors at earlier stages. Testing later timepoints would be problematic, as PhoP mutants show poor survival at these times, which would confound comparisons between STM WT and PhoP mutants.

      Line 317: Were secreted SPI-2 effectors detectable using PhagoCyt, and if so, how did they behave?

      We detected some of the secreted effectors as well, and they are in accordance with the literature. As expected, most of them were detected only in WT at 4 hpi.

      For example, PipB2, SseL and SctB1 are significantly decreased in the PhoP mutant compared to the STM WT at 4 hpi.

      Line 319: Have the candidate Salmonella mutants been evaluated at later time points (6-8 hpi)? Stronger phenotypic differences may emerge when intracellular replication relies more heavily on SPI-2 function.

      We acknowledge that there may be larger differences at later time points. However, we wanted these experiments to be comparable with the other data in the manuscript, i.e. within the 4-hour time point that we have kept throughout. Moreover, at later time points we see increased macrophage cell death, and we therefore refrain from using time points much beyond the 4-hour mark.

      Figure 5B: For all mutant strains, please also report in vitro growth to determine whether the phenotypes reflect general growth defects or are specific to the intracellular environment.

      We have performed a growth curve for the PhoP mutant, which is shown in Supplementary Figure 1.

      Line 336: As above, please reconsider the use of the term "effectors." Unless evidence is provided that these are bona fide secreted SPI-2 effectors, an alternative term would avoid confusion.

      We have modified the sentence to reduce confusion.

      Supplementary Figure 5: The volcano plots appear pixelated. Please provide higher-resolution versions.

      Thank you for pointing this out. We have rectified this.

      Reviewer #1 (Significance (Required)):

      General assessment:

      This study introduces a highly sensitive dual host-pathogen proteomics workflow for profiling bacterium-containing phagosomes. Its key strengths are the technical innovation and the mechanistic insight gained using Salmonella mutants. The main areas needing improvement are clarification of methodological details and tighter interpretation of some biological claims.

      Advance:

      To my knowledge, this is the first study to achieve such deep, simultaneous proteomic coverage of both host and intracellular bacteria within purified phagosomes. This represents a notable technical advance and provides new mechanistic insight into intracellular adaptation and immune regulation.

      Audience:

      The work will interest a specialized audience in infection biology, host-pathogen interactions, and proteomics, with broader relevance for researchers studying organelle isolation or intracellular pathogens. The workflow and datasets will be useful as a resource for future studies.

      Reviewer expertise:

      Expertise in host-pathogen interactions, bacterial intracellular survival, macrophage biology, and functional proteomics. Limited expertise in MS instrumentation.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this work, Chatterjee, Rubio and colleagues use a novel flow cytometry-based method to isolate phagosomes from Salmonella infected macrophages. This method is applied both to wild-type and to a mutant (deletion of phoP) that does not express virulence genes, prior to the proteome characterization of these phagosomes and the bacteria that they contain. The experiments were done at an early point of infection (30 min) and a later time point (4 h). The authors first identified mitochondrial proteins in their analysis, which had previously been considered contaminants from the preparation of phagosomes. However, some Salmonella effector proteins are known to affect mitochondria, and the authors demonstrate that inhibition of Complex I showed decreased Salmonella intracellular viability. Comparing WT and the phoP mutant also highlighted two Salmonella proteins that enhance intracellular survival. In addition, the authors show that their method recapitulates previously known proteins involved in Salmonella infection. The study is well designed and clearly written.

      I have only some minor comments that I hope will strengthen the work:

      It would be interesting to compare the results with a whole cell proteome analysis, and to other approaches that involve subcellular fractionation (both in the context of Salmonella infection) to: a) highlight proteins that are specifically changing in abundance in the phagosomes (but not necessarily in the cell), and b) to show that this approach is able to capture previously unknown phenomena. To avoid performing additional experiments, the authors can compare their dataset to previous proteomic datasets of Salmonella infection.

      We have compared this with the ultracentrifugation method for STM WT 4h vs STM WT uptake (Figure 6A).

      A color scale for the heatmap in Fig 2C is needed. I assume that this heatmap shows intensity and not fold-changes, and thus suggest that the authors use a single-color gradient for easier visualization.

      This has now been included.

      Best regards,

      André Mateus

      Reviewer #2 (Significance (Required)):

      General assessment: This study provides a novel approach to study intracellular pathogenic bacteria. The method is applied to Salmonella, but can potentially be used for any bacteria, including non-genetically tractable organisms. A strength of the approach is that it captures the bacterial proteome, which is mostly undetectable when studying infected cells. Further, by enriching phagosomes, it allows measuring the spatial distribution of proteins to these organelles. The study could be improved by distinguishing proteome changes that are caused by trafficking of proteins to phagosomes vs general changes in protein abundance.

      Advance: Apart from a new methodology, the authors use the approach to identify novel aspects of Salmonella infection biology, e.g., the importance of mitochondrial proteins in host defense or novel Salmonella proteins that are involved in intracellular survival.

      Audience: The audience for this study is mostly those in the field of infection biology, particularly Salmonella. The dataset generated can be used to identify novel aspects of Salmonella infection, and the described method could be applied to other pathogens.

      My field of expertise: Proteomics, microbiology.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In the manuscript "Flow cytometry-based isolation of Salmonella-containing phagosomes combined with ultra-sensitive proteomics reveals novel insights into host-pathogen interactions", the authors describe a new method for analysis of the composition of pathogen-containing phagosomes and the pathogens within. The combination of FACS-based single phagosome analysis and sorting with optimised, highly sensitive proteomic analysis of sorted vesicles has potential for identification of so far overlooked host-pathogen interactions. Although this is well described in the manuscript, some controls are missing.

      Major comments:

      1) The sorting of labelled bacteria is a crucial bottleneck in the whole procedure. The gating strategy presented in the Fig. 1B suggest that the initial "bacterial phagosome size" is limited from the bottom based on the noise signal but not from top. Therefore any not broken THP-1 cell remaining in the sample would be also included in the analysis. In respect to very high sensitivity of the mass spectrometry procedure and high abundance of housekeeping genes in host cells, this contamination could well explain the appearance of mitochondria, ribosome, and nuclear envelope proteins identified in Fig 2B and undermine the following results. Therefore, the gating strategy should be more stringent and data from this more stringent gating shall be compared with the current data sets. Since the authors use BFP+ Salmonella and do not analyse the claret+BFP- events, a BFP vs FSC gating step could help to distinguish free bacteria, bacteria in vesicles, and not or only partially broken host cells.

      We use a series of centrifugations to ensure that we do not have intact cells in the prepared samples. We have also visualised the final samples under the microscope and did not observe any intact cells. Because of the side/forward scatter gating, intact cells are not within the field of sorting. In Figure 1B we show that free bacteria are not within the gating strategy that we used. Finally, we visually inspected >100 pictures of sorted phagosomes by imaging flow cytometry and did not see any intact cells or free bacteria.

      2) Since the authors present data previously well accepted as contaminations from other fractions, these shall be carefully validated by other methods. For example the contact of mitochondria with SCV could be validated using a FRET- or split FP- based assays. Change of abundance of surface proteins on SCV in individual timepoints shall be validated using antibody-based flow cytometry on isolated SCVs. Most relevant antibodies are already described in the manuscript or available commercially (IL4R, IFNgR, integrins, TLRs). Microscopy-based quantification could help with the soluble proteins present within SCVs.

      We agree with the reviewer that this would be very interesting. However, we feel that this is outside of the scope of this paper and will be very laborious and time consuming, practically a whole project in itself.

      3) Since the authors describe an alternative method to methods used previously, they shall discuss the differences in results obtained by the formerly used methods.

      We have now provided a dataset from SCVs isolated using ultracentrifugation as a comparative analysis to our method (Figure S6A and Table S8). The data show that the ultracentrifugation-isolated phagosomes have many more proteins from any organelle (Figure S6B), suggesting that they are less pure than the phagosomes isolated by the PhagoCyt approach.

      4) Only 15 Salmonella proteins downregulated between 0.5 and 4 h timepoints were identified. However, at least genes from SPI-1 and flagella would be expected to be downregulated at 4 h p.i. How do the authors explain this discrepancy? In contrast, are the SPI-2 genes among those identified as upregulated?

      In our Supplementary Table 6 (comparison between WT 4h vs WT uptake), we see that there are 458 Salmonella proteins that are only present in uptake samples. These were not included in the limma analysis since they are completely absent in the WT 4h samples. We decided to report these as “unique” proteins rather than perform imputation. In Figure 5B, we specifically highlight down-regulated STM proteins, which include flagellar proteins and SPI-1 proteins.

      To answer your second question, yes, several SPI-2 genes (effectors and other regulatory proteins) are upregulated at 4 hpi. 131 Salmonella proteins are significantly upregulated, and 55 proteins are exclusively present in the WT 4hpi samples. Some selected examples are in Figure 5A.

      Minor comments:

      1) Fig 1, the figure caption seems to retain parts of an older version, mentioning blue bars not present in the current version?

      The figure caption appears to be correct to us; the “blue” is the unstained BFP Salmonella, which is hidden behind the purple, which is the BFP Salmonella + CellVue Claret.

      2) Fig 1A point 1, how were the dead cells removed? Normal centrifugation is not able to discriminate dead and living cells well enough as percoll gradient centrifugation for example would be. Such gradient centrifugation is not mentioned in the Methods section though.

      We have not used Percoll-based centrifugation to remove dead cells; instead, we have washed the adherent macrophages in dishes 3-4 times with ice-cold PBS to remove dead, floating cells, and then washed the pellet several times with PBS to ensure we are not taking any dead cells into the sample preparation.

      3) Fig 1A point 2, did the authors check for the composition of the pellet fraction in each centrifugation step? What are the losses and cross contaminations of the other fraction?

      No, we have not checked the composition of each fraction using mass spec; however, we did run some western blots to correctly identify the major organelle contribution in each fraction.

      4) Suppl. Fig 1, caption for panels F and G are missing. The axis in the panel G is misleading - the bacteria obtained in "output" contain proliferating intracellular bacteria that originate only from a fraction of the "input" bacteria. Since the figure clearly show increase in the number of intracellular bacteria and all the extracellular bacteria should be killed by gentamicin, all bacteria in the "output" probably proliferate intracellularly and, therefore, originate from the same fraction of the "input" throughout the whole assay. Showing these results as CFU per well/plate/surface area or cell count would be more exact, in this case the "input" data shall be shown as a separate data point.

      We thank the reviewer for this observation. We have now modified the figure legends. These are normalised per cell, and we think they provide accurate results.

      5) Fig 1B, could the authors show the percentages in individual quadrants for the green "Sample with BFP Salmonella + claret"?

      Yes, the plot that depicts the percentage is in Supplementary Figure 1H. This varies between WT and PhoP mutant, and hence we decided not to show this in one figure.


      6) All proteins identified as significantly up or down represented shall be listed in a supplementary file.

      They are listed in the supplemental tables.

      7) Fig 2C suggests that some mitochondrial proteins are similarly present at the SCV containing WT Salmonella at 4h as ∆phoP mutant at 0.5 h p.i. Could the authors speculate how is that? The scale of blue/orange transition shall be shown in Fig 2C.

      We speculate that the maturation of SCVs containing Salmonella WT is heavily arrested by the pathogen, and that these SCVs hence resemble the early SCVs of a mutant that is unable to arrest the SCV degradation stages.

      8) In the Fig 2D, the authors show decrease of CFU obtained from THP-1 cells treated with Rotenone. However, rotenone is known to induce host cell apoptosis. Were the presented data normalized to amount of living host cells in the sample? For example measurement of protein concentration in the sample lysate after washing away the dying host cells should enable this.

      Yes, we have normalised the data to account for the percentage of live cells using live/dead staining. However, at the time points used, we did not observe significant cell death.

      9) Microscopy-based observation of mitochondria relocation to SCVs in time shall strengthen the claim that mitochondria-derived ROS are involved in anti-Salmonella host defense.

      Multiple published studies (PMID: 38356294, PMID: 41444067, PMID: 15866946, PMID: 41198672) support our data in this regard.

      10) The Salmonella proteins identified in the Fig 5 shall be validated using qPCR.

      We think that qPCR data would not be an accurate way to validate Salmonella proteins, as it has been shown that Salmonella mRNAs can have sub-minute half-lives (PMID: 38527194). We used rather conservative proteomics analysis settings, which have been shown in a recent preprint from our lab to give 0% false discoveries and a 0.4% false quantitative rate (https://doi.org/10.1101/2025.09.22.677725). We acknowledge that another reviewer did not find this experiment to be essential.

      Reviewer #3 (Significance (Required)):

      The manuscript was reviewed mainly from the Salmonella and flow cytometry/FACS expertise point of view. The main interest in the study lies within its methodological advances - combination of single vesicle analysis using flow cytometry/FACS with highly sensitive mass spectrometry analysis. In comparison to other similar studies in the field, this combination significantly expands the possibilities of sorting of distinct subpopulations of vesicles from the same cells. This will make the article of interest to scientists in the broad field of host-pathogen interactions and immunology.

      **Referee cross-commenting**

      Reviewer 3 - @Reviewer #1: I see your point and leave it to the editors to judge how important this comment is. My reasoning was this: Fig 5 serves as a proof of concept that PhagoCyt has the power to make new discoveries in Salmonella biology. While the behavior of some of the proteins shown in Fig 5 is well described (e.g. flagella or SPI-1 T3SS components and effectors), some are novel, and to prove the functionality of the method, these results should be confirmed by some other well accepted means. Given the great sensitivity of PhagoCyt, other proteomic approaches are unlikely to help in this case (e.g. flagella or SPI-1 T3SS components and effectors are not detectable by western blot at 4 h p.i.). Therefore, I suggest qPCR (but would accept any other method as well) as a very sensitive and well accepted approach, but leave it to the authors to choose which proteins they want to use for the validation.

      Reviewer 1- I agree with comments raised by the other two reviewers, except the following point from Reviewer 3 '10) The Salmonella proteins identified in the Fig 5 shall be validated using qPCR.' It is not clear which proteins are being referred to and it is unclear to this reviewer how this experiment(s) would improve the manuscript in its current form.

      Reviewer 3- I agree with all comments raised.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank all three reviewers for their careful and constructive engagement with our manuscript. We are encouraged by their overall positive assessment of the work. Reviewer 1 described this as "an important study" that addresses a significant gap in understanding systemic, inter-organ responses to hypoxia, and noted the potential relevance of our findings to mammalian IL-6 biology. Reviewer 2 highlighted the study as being of "high significance" and described it as "a foundation study that will be the motivation for numerous high-impact papers in the future", noting its broad relevance to understanding hypoxia in both health and disease. In the revised manuscript, we have addressed all of the reviewers' comments and critiques. This includes performing several new experiments, expanding our Discussion, and making a number of clarifications to the text, figures, and methods as detailed below.

      Reviewer #1


      (Evidence, reproducibility and clarity (Required)): The authors describe a role of Unpaired 3 (Upd3) in tissue communication in responses to hypoxia in Drosophila adult flies. Upd3 mRNA is strongly upregulated in hypoxia, along with well-characterized JAK/STAT downstream target genes, in both adult fly males and females, as well as in larvae. Interestingly, adult females but not males require Upd3 for 15 to 24 h survival in hypoxia, as Upd3 mutant females but not males die in a much larger proportion in these conditions. Adult females display strong hypoxic upregulation of Upd3 in the gut, assessed by RT-PCR or through a Gal4 transcriptional reporter, mainly in epithelial enterocytes. Enterocyte-specific RNAi-mediated KD indicated that this enterocyte expression of Upd3 represents about 40% of Upd3 expression in the whole body. Enterocyte-specific KD of Upd3 in adult females significantly reduced survival in hypoxia, suggesting that this expression is critical for hypoxic adaptation. Tissue-specific analysis of the expression of the STAT target genes, SOCS36E, TotA and TotM revealed that stimulation of the JAK/STAT pathway in hypoxia is widespread, although more pronounced in abdominal tissues. Indeed, overexpression of Upd3 in enterocytes provokes upregulation of both target genes TotA and TotM. Consistent with this, RNAi-dependent inhibition of the JAK/STAT pathway in the fat body and oenocytes significantly reduced survival of female flies in hypoxia. Nitric oxide synthase (NOS) is strongly upregulated in adult female abdomens upon hypoxic exposure, and KD of NOS in fat body and oenocytes reduced hypoxic survival. Surprisingly, they found that ubiquitous KD of HIFa/Sima led to mitigation of Upd3 hypoxic induction and, more clearly, to JAK/STAT target gene induction. HIF KD flies displayed increased lethality in hypoxia, and this lethality was slightly mitigated in Upd3 heterozygous flies. The authors conclude that increased lethality of HIF-minus flies in hypoxia stems at least in part from excessive levels of Upd3. The authors then find that HIF/Sima-dependent inhibition of Upd3 expression is non-cell autonomous, since KD of Sima specifically in the gut does not affect expression of Upd3 in this organ. Instead, Sima KD in the fat body led to a significant increase of Upd expression in the gut, suggesting that a Sima-born signal communicates these two organs, leading to restriction of Upd3 intestinal expression. ROS does not seem to be the signal that communicates the fat body with the gut, as expression of catalase in the fat body did not affect expression of Upd3 in the gut.

      (Significance (Required)): This is an important study, because most previous studies have focused on cell-autonomous responses to hypoxia, but much less is known about systemic responses to low oxygen conditions, particularly in relation to inter-organ communication during these responses. This work defines the cytokine unpaired 3, homolog of human interleukin 6, as a major regulator of systemic responses to hypoxia. Future studies will determine if interleukin 6 plays similar roles in mammals. This work might be of interest for a broad audience interested in responses to hypoxia, as well as general physiology.

      We thank Reviewer 1 for their careful reading and comments on the manuscript. We are pleased that they found this to be "an important study" that addresses a gap in understanding systemic, inter-organ responses to hypoxia. We have addressed each of their concerns in the revised manuscript as outlined below.

      __MAJOR CONCERNS __ 1) Figure 1 lacks statistical analysis. It is important to determine if the apparent differences in gene expression are statistically significant.

      We have now added the statistical analyses to the revised version of the figures.

      2) Is NOS expression in fat body/oenocytes JAK/STAT-dependent? Block the pathway in hypoxia specifically in these cells and check.

      To address this, we blocked JAK/STAT signaling specifically in fat body/oenocytes under hypoxia and examined the expression of Nos, as well as bnl and Hipk - two additional genes we find are regulated by gut-derived Upd3 and required for hypoxia tolerance.

      Interestingly, fat body/oenocyte-specific knockdown of STAT92E suppressed hypoxia-induced Hipk expression but did not affect Nos or bnl expression in these tissues. These results suggest that gut-derived Upd3 can control fat body/oenocyte expression of hypoxia regulators through both direct and indirect (relay) mechanisms. There is precedent for indirect relay signaling in the context of other Upd3/Upd2-mediated inter-organ responses. For example, in response to CO2, neuronal Upd3 controls blood cell differentiation in the lymph gland; however, this effect is not direct - Upd3 first signals to the fat body to induce Dilp6 expression, and Dilp6 then signals to the lymph gland to regulate hematopoiesis. A second example involves gut-derived Upd2: upon infection, Upd2 controls olfactory behavior, but does so via a relay in which Upd2 signals to glial cells, which in turn alter apolipoprotein expression, and these then modify olfactory neuron function.

      We have incorporated the new tissue-specific data into the manuscript and expanded the Discussion to address both direct and indirect modes of Upd3 action. (Fig 5 and lines 427-441)

      3) The authors relate the HIF-dependent limitation of Upd3 induction in hypoxia to regulation of cytokine-dependent immune responses in mammals; specifically they propose a parallel with a cytokine storm. This relationship is unclear to this reviewer, as in the Drosophila response Upd3 fulfils a signalling function (rather than immunological). I suggest they consider modifying this assumption.

      We appreciate this comment. Our intent in drawing a comparison to the mammalian cytokine storm response was to illustrate the concept of fine-tuning cytokine responses, where too little or too much signaling can be deleterious, as we observe when comparing upd3 mutants to upd3-overexpressing animals. We have revised the Discussion to retain this concept while tempering the suggestion that our findings directly mirror cytokine storm pathologies in humans (lines 511-536).

      4) Mitigation of lethality of HIF KD flies in Upd3 heterozygotes is very modest. Thus, the conclusion that one of the mechanisms by which HIF mediates adaptation to hypoxia is through inhibition of Upd3 expression is not sufficiently supported by the data. It seems like an over-interpretation of the results.

      We agree that the rescue is modest, and we would argue this may be expected given HIF-1's role as a master regulator that coordinates many gene expression changes required for hypoxia tolerance. Loss of HIF-1 therefore likely disrupts multiple essential processes simultaneously - including metabolic reprogramming and tracheal remodeling - that may not be restored by reducing upd3 dosage. We take the reviewer's point that this should not be framed as a primary mechanism. The partial reversal of lethality in upd3 heterozygotes nonetheless implicates excessive Upd3 signaling as one small component of what HIF-1 does to promote hypoxia adaptation, and we have revised the manuscript language to reflect this more measured interpretation (lines 529-536).

      5) HIF expression is well-known to reduce ROS levels in hypoxia by controlling mitochondrial activity through a wide array of mechanisms. Thus, this reviewer feels that the experiments utilized to rule out a role of ROS in fat body-to-gut communication are insufficient. Catalase reduces hydrogen peroxide levels, but not necessarily other reactive oxygen species. The authors might try to express other ROS scavengers such as superoxide dismutase. In addition, expression of scavengers should be carried out both at the fat body and gut.

      We thank the reviewer for this important point. We have now addressed it by overexpressing CatA, SOD1, or SOD2 individually in either fat body or enterocytes and measuring hypoxia-induced upd3 expression in each case. In all six conditions, hypoxia-induced upd3 expression was unaffected (Figs. S6B–G). Together, these experiments scavenge both hydrogen peroxide and superoxide in both tissues and collectively argue against a role for ROS in mediating upd3 induction.

      __MINOR CONCERNS __ 6) The authors state that hypoxic upregulation of Upd3 in the gut occurs mostly in "large epithelial enterocytes". In Figure 3B, it is evident that GFP does not express in all cells; please utilize cell-type specific markers to identify which cells do express the cytokine.

      We appreciate this suggestion. Despite multiple requests to different laboratories, we were unable to obtain antibodies suitable for marking enterocyte subtypes in this context. To address the question of cell identity genetically, we used drivers specific for enterocytes (mex-GAL4) or progenitor cells (stem cells and enteroblasts; esg-GAL4) to drive RNAi-mediated knockdown of upd3 and then measured the effect on hypoxia-induced upd3 expression in whole guts. These experiments indicate that hypoxia-induced upd3 expression occurs mostly in enterocytes, with a smaller contribution from progenitor cells. This mirrors previous findings showing that infection-induced upd3 induction occurs in both enterocytes and enteroblasts, and supports our conclusion that enterocytes are the predominant source of hypoxia-induced Upd3. We have incorporated these results into the revised manuscript (Fig 3C and Fig S2C).

      7) The title of Fig 4 caption reads "Gut-derived upd3 controls adipose expression of hypoxia regulators." Only one hypoxia regulator has been analysed: Nitric Oxide Synthase. Please change the title to "Gut-derived upd3 controls adipose expression of Nitric Oxide Synthase."

      In the revised manuscript we now show that gut-derived Upd3 controls the expression of Nos, bnl, and Hipk in fat body and oenocytes, and that all three genes are required for hypoxia tolerance. We have therefore revised the figure title to better reflect the findings presented in this version.

      8) Supplementary Figures 1 A and B lack statistical analysis.

      We have now included the statistical analyses in the revised manuscript figures.

      Reviewer 2


      __(Evidence, reproducibility and clarity (Required)): __This study by Ding and colleagues identifies a novel role for the cytokine Unpaired-3 (upd3) and the JAK/STAT signaling pathway in coordinating a whole-body response to systemic hypoxia in Drosophila. The authors describe how low-oxygen conditions rapidly induce upd3 expression in both larvae and adults. Interestingly, this pathway's importance is sex-specific, as female flies require upd3 for survival in hypoxia, while males do not.

      Intriguingly, the authors identify the intestine as a crucial source of the hypoxia-induced upd3. This gut-derived upd3 then signals to the fat body and oenocytes, promoting the expression of nitric oxide synthase, which is essential for hypoxia tolerance. Furthermore, the study reveals an unexpected role for the transcription factor HIF-1α/sima as a molecular brake. Instead of simply promoting the hypoxia response, sima prevents the overproduction of upd3, demonstrating that a precise dosage of this cytokine is necessary for survival. The findings define a novel gut-to-fat/oenocyte signaling axis that coordinates systemic hypoxia adaptation and highlights the fly as an ideal system for studying interorgan communication during bouts of hypoxia. Overall, I find this manuscript an important step forward in understanding the link between hypoxia signaling and inflammation.

      __ (Significance (Required)): __This study is of high significance, as it not only demonstrates a clear role for cytokine signaling in the Drosophila hypoxia response, but also demonstrates that this response requires interorgan communication between adipose tissue and the intestine. Moreover, the study reveals a clear role for Hif1alpha in modulating upd3 expression, suggesting that this highly conserved transcription factor plays a key role in fine tuning the inflammatory response.

      I think these findings are of broad interest and are potentially relevant to two aspects of public health. First, I believe the findings should be of particular interest to anyone studying hypoxic injuries, such as stroke and ischemia-reperfusion. Secondly, the observations could be relevant to a previous study that revealed an important role for hypoxia signaling in the mosquito larval intestine. Thus, this study could be important for revealing new mechanisms for inhibiting mosquito development, which would be of broad public health interest.

      Finally, I would highlight how this study raises a number of important questions. Why are there sex-specific differences for upd3 in the hypoxia response? What is the signal from the fat body to the intestine? How does sima modulate upd3 signaling? Thus, I think this manuscript represents a foundation study that will be the motivation for numerous high-impact papers in the future.

      We thank Reviewer 2 for their careful reading and comments on the manuscript. We are pleased that they found this to be "a study of high significance" that will be important for our understanding of hypoxia and health. We have addressed each of their concerns in the revised manuscript as outlined below.

      __Major Concerns and Suggestions: __ I have no real concerns for the manuscript as written - the experiments are well designed and controlled, and the results, as presented, support the major conclusions. While there are clearly open questions, including what is the basis of the sex-specific effects, how does sima modulate upd3 expression, and what is the signal communicating fat body sima activity with intestinal upd3 expression, these open questions do NOT diminish the importance of the study.

      My only major concern is that the current draft lacks a discussion of previous studies in the mosquito Aedes aegypti, where hypoxia signaling plays a key role in larval development (https://doi.org/10.1073/pnas.1719063115). This body of literature should be incorporated into the discussion, as it hints at a conserved molecular mechanism.

      We thank the reviewer for pointing us to this important study. Valzania et al. demonstrate that gut hypoxia acts as a systemic signal in Aedes aegypti larvae, activating HIF to coordinate fat body metabolism and whole-body growth. We agree this is relevant context for our findings, as both studies support the idea that the gut can function as a hypoxia sensor that controls whole-body physiology through effects on the fat body. We have incorporated this into our Discussion (lines 488-492).

      Minor comments:

      Please include a list of fly stocks used in the methods with complete genotypes. Whenever possible, include the RRID number for the stock - these can be found on the BDSC page for the stock.

      We have now added the list of fly stocks as well as a supplemental table with full genotypes.

      Line 477-479 - provide citations that sima regulates glycolysis in the fly.

      We have now added these citations.

      Lines 501-505 - please state if gasses were premixed or mixed in lab. Also, were flies contained in standard food vials during the exposure?

      We have now provided more detail on these points – the gases were premixed and flies were on standard food vials during the exposure.

      Lines 507-513 - how long after the hypoxia exposure were the flies assayed?

      We have now provided more detail on this point in the methods (lines 592-596) – the flies were assessed 24 h after hypoxia exposure.

      In figures that display qRT-PCR data, please note that data were normalized to reference genes listed in Table S2.

      We have now added this methodological point.

      Please reference Flybase in either the acknowledgements or methods and include citations to the latest Flybase papers published in Genetics.

      We have now acknowledged Flybase and referenced the relevant papers.

      Genetics nomenclature is inconsistent throughout the study, a few examples included: Figure legend 1 - italicize gene names Figure 2 legend - italicize upd3-null Line 259 - Capitalize gal4 Figure 4 legend - NOS is written in all capital, but in line 270, written as Nos. Please be consistent. Line 297 - gal4 is lower case, in contrast with elsewhere.

      We have now made these corrections.

      Additional suggestions:

      While not required for publication, it would be interesting to examine intestinal upd3 expression when sima is inappropriately stabilized in the fat body of animals under normoxic conditions. This could be achieved by driving a fatiga-RNAi construct within the fat body.

      We did carry out this experiment but didn’t see any effect of fat body fatiga RNAi on gut upd3 levels.

      Reviewer 3


      __(Evidence, reproducibility and clarity (Required)): __Summary: While local cellular and organ adaptations to hypoxia are well-documented, organism-wide responses to systemic hypoxia are still not well understood. In this paper, the writers were interested in investigating how organisms adapt to systemic hypoxia. From their investigations, they were able to show that gut-derived upd3 is crucial to animals' tolerance to hypoxia. They also show that the master hypoxia regulator Sima is required to keep the upd3 level in check to avoid the deleterious effect of excess upd3. They also showed that fat body Sima is important in the regulation of the gut upd3 level, showing an inter-organ communication network in the adaptation to systemic hypoxia. One of their findings shows sex dimorphism in hypoxia tolerance; however, they did not show the mechanism behind this. I think the major weakness is not knowing how the animals actually fail to survive. What causes reduced survival should be explored. Generally, the studies show how animals adapt to systemic hypoxia; this knowledge is important in systemic hypoxia pathology.

      __(Significance (Required)): __This paper explores how the organism copes with hypoxia, and how Upd from the gut plays a role in mediating this response in the fat body and the oenocytes.

      We thank Reviewer 3 for their careful reading and comments on the manuscript. We have addressed each of their concerns in the revised manuscript as outlined below.

      __Major comment: __

      Figure 1: The authors clearly showed that Upd3 level was up in the hypoxia condition and is important for animal tolerance to hypoxia. Apart from Upd3, are there other members of the unpaired family increasing and involved in hypoxia tolerance?

      We thank the reviewer for this question. We examined expression of all three unpaired family members and found that both upd2 and upd3 are induced by hypoxia, while upd1 is not. We also have preliminary evidence that upd2 mutants show reduced hypoxia survival, and that this effect is not additive with loss of upd3. While these early results are intriguing, this paper is focused on defining the role of upd3 in hypoxia tolerance, and we see exploration of upd2, both alone and in combination with upd3, across different aspects of hypoxia biology as the basis of future investigations.

      Notably, co-induction of upd2 and upd3 by the same stress is a recurring theme in Drosophila biology, yet their respective contributions to organismal physiology are complex - sometimes overlapping, sometimes distinct - and in many studies only one family member has been characterized in detail. Indeed, our current understanding of how upd2 and upd3 each contribute to responses to infection, high-fat diet, and other stresses has emerged from the collective findings of multiple independent studies rather than from any single paper addressing both cytokines simultaneously. For example, during infection both Upd2 and Upd3 are induced in the gut to promote stem cell-mediated repair, yet only Upd2 has been shown to additionally signal to the brain to control olfactory behavior. Similarly, on a high-fat diet both cytokines are upregulated, but with distinct effects on different aspects of organismal biology: enterocyte-derived Upd3 promotes intestinal stem cell divisions, hemocyte-derived Upd3 controls fat body lipid levels, and fat body-derived Upd2 alters nephrocyte function. We see the current study as a foundation for broader investigations into unpaired cytokine biology in hypoxia. Indeed, Reviewer 2 noted that this manuscript "represents a foundation study that will be the motivation for numerous high-impact papers in the future", and we anticipate that the effects of Upd2 and Upd3 in hypoxia will prove similarly pleiotropic, and that resolving their respective contributions to different aspects of organismal biology in low oxygen will require dedicated future investigation.

      Figure 2: From the method, female and male flies were subjected to different durations of hypoxia, 24-28 hours for females and 16-18 hours for males. What happens when subjecting different sexes to similar periods of hypoxia?

      We thank the reviewer for this question. Males and females show inherently different sensitivities to hypoxia, as they do for other environmental stresses such as starvation. To reliably detect genetic effects on hypoxia tolerance, it is important to use exposure conditions that produce partial lethality in controls (50-80% survival), ensuring experiments are conducted within the appropriate range of hypoxic sensitivity for each sex. Because males and females differ in their sensitivity, no single timepoint satisfies this criterion for both sexes. When males are exposed for the same duration used in female experiments (24-28h), all animals - controls and experimental genotypes alike - die, precluding any meaningful comparison. Conversely, exposing females to the shorter timepoint used for males (16-18h) produces no detectable lethality, making it equally uninformative. The sex-specific exposure durations we use are therefore an experimental design choice that allows us to assess hypoxia tolerance appropriately in each sex.

      Upon concluding that gut derived upd affects fat and oenocytes, it is a bit strange that the qPCR is done in the abdomen, which is presumably where the gut is. Should the gut be excluded in these assays?

      We thank the reviewer for raising this point. For abdominal qRT-PCR experiments examining fat body and oenocyte gene expression, we dissected and removed the gut and ovaries prior to RNA extraction, leaving an abdominal sample enriched in fat body and oenocytes. We have clarified this in the Methods and Results section of the revised manuscript (Lines 245-246 and 626-627).

      It is important to establish how the animals die under hypoxia.

      We thank the reviewer for raising this important question. Our results show that gut-derived Upd3 is required for hypoxia tolerance in part through its control of Nos, bnl, and Hipk expression in fat body and oenocytes, and that knockdown of each of these genes individually reduces hypoxia survival. However, precisely why animals die when upd3 or these downstream effectors are lost remains an open question, and we discuss much of what we outline below in the revised manuscript Discussion (lines 443-466).

      All three effectors are signaling molecules, and we speculate that they likely coordinate further downstream processes required for hypoxia tolerance, either within fat body and oenocytes or by acting on other tissues. In particular, both bnl, an FGF ligand, and nitric oxide, produced downstream of Nos, have established roles in tracheal development and remodeling, raising the possibility that Upd3-dependent regulation of tracheal responses to hypoxia contributes to survival. Nitric oxide can also regulate nitrosylation and has been shown to affect the unfolded protein response, a conserved pathway induced by hypoxia. bnl, in addition to its role in tracheal remodeling, has been shown to regulate metabolic changes in target tissues. Hipk is a kinase with likely many downstream targets and has been shown in flies to control metabolism and mitochondrial function. Together, these observations suggest that Upd3 engages a broad downstream signaling network, the full scope of which remains to be defined.

      We think this situation is analogous to other environmental stresses such as starvation, where survival requires the coordinated regulation of a spectrum of physiological processes across multiple tissues, and where even well-characterized regulators are known to engage many downstream targets and pathways. We see the current paper as establishing the gut-to-fat body Upd3 requirement for hypoxia tolerance, and we suggest this lays a foundation for future exploration of the full spectrum of Upd3 targets and investigation of how they coordinate adaptive responses to low oxygen.

      Figure 3-6: Controls for RNAi experiments - is there any reason for not using RNAi-specific control, such as mcherry-RNAi, lacZ-RNAi, etc, rather than a wildtype control in all the RNAi-mediated knockdowns? Please address this. Don't necessarily have to repeat all the experiments using RNAi-specific control, but repeating just a few to show that both wild-type and UAS-RNAi-specific controls show similar results would be important.

      We thank the reviewer for raising this point. To address potential non-specific effects of RNAi expression on hypoxia tolerance, we expressed control GFP RNAi or mCherry RNAi transgenes using the main Gal4 drivers employed in this study: mex-Gal4 (gut) and desat;r4-Gal4 (fat body and oenocytes), and found no effect on hypoxia survival compared to wild-type controls (Fig S2E and S4B). These results indicate that RNAi expression per se does not adversely affect hypoxia tolerance, and that the survival effects we observe reflect specific knockdown of the genes of interest.

      Although gut-derived upd3 contributes largely (40%) to hypoxia tolerance, what other tissues' upd3 is important for hypoxia tolerance?

      We thank the reviewer for this important question. We find that upd3 is induced in multiple tissues during hypoxia, including the head, thorax, and abdomen. However, when we knocked down upd3 using drivers targeting the major cell types in these tissues, including muscle, neurons, and fat body/oenocytes, we observed no significant effect on hypoxia survival, in contrast to the robust effect seen with gut-specific knockdown. These new data, included in the revised manuscript, suggest that gut-derived Upd3 is a primary contributor to hypoxia tolerance (Fig S3).

      That said, we do not conclude that the gut is the only relevant source. Other tissues we have not yet examined, including hemocytes, glia, and tracheal cells, may also contribute, and it is possible that Upd3 produced from multiple tissues acts redundantly, such that knockdown in any single tissue other than the gut is insufficient to cause a survival defect. By analogy with other stress contexts such as nutrient deprivation and infection, where upd cytokines are produced from multiple tissues and exert distinct effects on different aspects of physiology, we anticipate that Upd3 from tissues other than the gut may well contribute to hypoxia tolerance. However, fully defining these contributions will require detailed tissue-specific experiments that are beyond the scope of the current paper and will be the focus of future investigations. We have expanded on this point in the Discussion of the revised manuscript (lines 420-425).

      Can you use a hypoxia readout to experimentally show that the gut is the main sensor of hypoxia compared to other tissues? Looking at the data, the fatbody could also be major sensors of hypoxia. Therefore, investigating hypoxia readout in these and other tissues would further strengthen the direction of communication.

      We thank the reviewer for this suggestion. However, we wish to clarify that we are not claiming the gut is the main or primary sensor of hypoxia. All tissues are likely capable of sensing low oxygen and mounting cell-autonomous responses, and in some cases perhaps also non-autonomous signals to other tissues. Our findings specifically show that one consequence of gut hypoxia sensing is upregulation of Upd3, which then acts as an inter-organ signal to coordinate responses in target tissues such as the fat body and oenocytes. The fat body itself also senses hypoxia and mounts its own responses, as we and others have shown, including HIF-dependent regulation of gut Upd3 expression described in this paper. An analogous situation exists during nutrient starvation, where all cells autonomously sense and respond to nutrient deprivation, but on top of these cell-autonomous responses, specific tissues also mediate inter-organ signaling to coordinate whole-body physiological adaptations. We propose that hypoxia responses are organized similarly, and that the gut-to-fat body Upd3 signaling axis we describe here represents one such inter-organ communication pathway. We have clarified this point in the revised manuscript (lines 468-492).

      __Minor comment:__

      Should check the alignment of the confocal image in Figure 3b, especially the top panel.

      We have now fixed the images to better align them.

      Figure 6: "gut-specific sima knockdown (mex>sima-RNAi) did not significantly alter intestinal upd3 mRNA levels compared to controls (mex>+) under hypoxic conditions (Figure 6C)." This statement refers to Figure 6B, not Figure 6C

      We have now corrected this.

      Since fat body Sima non-autonomously controls the gut upd3 level, can you also show that this is functionally important by investigating the animal's survival or through other functional studies?

      We thank the reviewer for this suggestion. Ideally, we would manipulate sima and upd3 independently and in parallel, knocking down sima specifically in the fat body while simultaneously reducing upd3 in the gut, to directly test the functional importance of this inter-organ axis for survival. In principle this could be achieved using orthogonal binary expression systems such as the GAL4/UAS and QF/QUAS systems in combination, but this would require the development of new genetic tools. An additional challenge is that based on our results, such experiments would require fine-tuned reduction of gut upd3, sufficient to suppress the elevated levels caused by fat body sima knockdown, but not so low as to itself compromise survival, as we have shown that loss of upd3 is detrimental. For these reasons, while we agree these would be, in principle, interesting experiments, they would technically be challenging to carry out.

      Strangely, all the statistically significant data/results from both supplementary and main figures had a one-star significance even in graphs with very obvious differences and less sample variation.

      We thank the reviewer for this observation. In all figures, a single asterisk is used to denote statistical significance at p < 0.05, regardless of whether the actual p value is substantially lower. This is a presentation convention we adopted consistently across all figures rather than a reflection of the strength of the underlying differences.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Reviews:

      In this manuscript, the authors proposed an approach to systematically characterise how heterogeneity in a protein signalling network affects its emergent dynamics, with particular emphasis on drug-response signalling dynamics in cancer treatments. They named this approach Meta Dynamic Network (MDN) modelling, as it aims to consider the potential dynamic responses globally, varying both initial conditions (i.e., expression levels) and biophysical parameters (i.e., protein interaction parameters). By characterising the "meta" response of the network, the authors propose that the method can provide insights not only into the possible dynamic behaviours of the system of interest but also into the likelihood and frequency of observing these dynamic behaviours in the natural system.

      The authors studied the Early Cell Cycle (ECC) network as a proof of concept, specifically focusing on PI3K, EGFR, and CDK4/6, with particular interest in identifying the mechanisms that cancer could potentially exploit to display drug resistance. The biochemical reaction model consists of 50 equations (state variables) with 94 kinetic parameters, described using SBML and computed in Matlab. Based on the simulations, the authors concluded the following main points: a large number of network states can facilitate resistance, the individual biophysical parameters alone are insufficient to predict resistance, and adaptive resistance is an emergent property of the network. Finally, the authors attempt to validate the model's prediction that differential core sub-networks can drive drug resistance by comparing their observations with the knock-out information available in the literature. The authors identified subnetworks potentially responsible for drug resistance through the inhibition of individual pathways. Importantly, some concerns regarding the methodology are discussed below, putting in doubt the validity of the main claims of this work.

      While the authors proposed a potentially useful computational approach to better understand the effect of heterogeneity in a system's dynamic response to a drug treatment (i.e., a perturbation), there are important weaknesses in the manuscript in its current form:

      (1) It is unclear how the random parameter sets (i.e., model instances) and initial conditions are generated, and how this choice biases or limits the general conclusions for the case studied. Particularly, it is not evident how the kinetic rates are related to any biological data, nor if the parameter distributions used in this study have any biological relevance.<br /> (2) Related to this problem, it is not clear whether the considered 100,000 random parameter samples sufficiently explore parameter space due to the combinatorial explosion that arises from having 94 free parameters, nor 100,000 random initial conditions for a system with 50 species (variables).<br /> (3) Moreover, the authors filter out all the cases with stiff behaviour. This filtering step appears to select model parameters based on computational convenience, rather than biological plausibility.<br /> (4) Also, it is not clear how exactly the drug effect is incorporated into the model (e.g., molecular inhibition?), nor how it is evaluated in the dynamic simulations (e.g., at the beginning of the simulation?). Moreover, in a complex network, the results may differ depending on whether the inhibition is applied from the start or after the network has reached a stable state.<br /> (5) On the same line, the conclusions need to be discussed in the context of stability, particularly when evaluating the role of initial conditions. As stable steady states are determined by the model parameters, once again, the details of how the perturbation effect is evaluated on the simulation dynamics are critical to interpret the results.<br /> (6) The presented validation of the model results (Fig. 7) is only qualitative, and the interpretation is not carefully discussed in the manuscript, particularly considering the comparison between fold-change responses without specifying the baseline states.
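
To give a sense of scale for point (2), the following is a back-of-the-envelope sketch rather than anything taken from the manuscript. It assumes a hypothetical full factorial grid with only two levels (low and high) per kinetic parameter, and the function names are illustrative. Even on such a coarse grid, 94 parameters give 2^94 combinations, so 100,000 samples can visit only a vanishingly small fraction of them.

```python
def grid_points(n_params: int, levels: int) -> int:
    """Number of points on a full factorial grid with `levels` values per parameter."""
    return levels ** n_params


def coverage_fraction(n_samples: int, n_params: int, levels: int) -> float:
    """Upper bound on the fraction of grid points that n_samples draws can visit."""
    return n_samples / grid_points(n_params, levels)


if __name__ == "__main__":
    # 94 kinetic parameters and 100,000 samples are the figures quoted in the review;
    # the two-level grid is an assumption made purely for illustration.
    total = grid_points(94, 2)
    fraction = coverage_fraction(100_000, 94, 2)
    print(f"grid points: {total:.3e}")
    print(f"fraction visited by 100,000 samples: {fraction:.3e}")
```

This bounds coverage of parameter space only; it says nothing about how many qualitatively distinct dynamic behaviours the network supports, which is a separate question.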

      We thank the reviewers for their thoughtful and constructive comments. In response, we have substantially revised the manuscript to address every comment and to improve clarity, transparency, and robustness while preserving the paper’s core contribution: a principled, scalable framework (MDN) for mapping how molecular heterogeneity and network architecture shape adaptive drug-response dynamics. At a high level, we clarified the study design and analysis goals, tightened definitions, and added methodological detail where it most advances interpretability. Importantly, these updates leave the analytical pipelines and major conclusions unchanged.

      Conceptually, we now make explicit that our objective is coverage of the output space of qualitative dynamics supported by the network topology, not exhaustive enumeration of parameter space. To support this, we added a convergence analysis and clarified that “triplicates” refers to independent ensembles used to demonstrate reproducibility. We also refined how we describe and implement initial conditions (as conserved total abundances that encode expression heterogeneity) and reframed filtering as minimal numerical/feasibility checks, using rejection sampling to obtain the prespecified ensemble size. Solver choices and input modelling (constant step mitogen/drug) are now spelled out succinctly.

      We expanded the model specification and rationale (complete reaction list with rate laws and brief biological justifications in the Supplement) and unified terminology throughout. Figures and legends have been overhauled for readability and accuracy, with missing labels added and ordering corrected. For validation, we clarified the nature of the single-cell reporter readout, improved Figure 7’s presentation, and emphasised - consistent with our aims - that comparisons are qualitative.

      Finally, we have rewritten the Discussion to centre on interpretation and implications and to connect our findings to the literature. It now: (i) frames MDN as a systems-level framework that links molecular heterogeneity to qualitative signalling “meta-dynamics” and adaptive escape under constant drug pressure; (ii) highlights two key findings: an asymmetry in control (interaction kinetics exert stronger, more consistent influence than protein abundance) and a topology-driven convergence whereby a vast parameter space funnels into a finite set of recurrent behaviours; (iii) shows that resistance is a network-level property, with many possible routes but a small set of recurrent hubs/modules dominating; and (iv) provides a qualitative alignment with single-cell reporter data while clarifying the intent and limits of that comparison. Moreover, we now explicitly discuss limitations (rate-law simplifications, broad priors, determinism, and modular abstractions) and outline next steps for future research, including data-constrained priors and stochastic extensions.

      We believe these revisions materially strengthen the manuscript and fully address all the reviewers’ comments. A detailed, point-by-point response follows.

      Joint Recommendations for the Authors:

      (1) It is confusing exactly what the different sets evaluated in each case are, e.g. "generated 100,000 model instances, each with the same set of ICs but a unique set of randomly generated parameter values" (lines 299-300), "generated 100,000 model instances (in triplicate), each with the same set of 'nominal' parameter values (see supplementary Table S1), and a unique set of ICs, and repeated the analysis as performed previously" (lines 366-368), "combined the 1000 IC sets with each parameter set to create 1000 model instances" (lines 382-383), "repeated for 1000 parameter sets, allowing us to observe how frequently IC variation induced adaptive resistance independent of the chosen parameter set" (lines 386-387). A small table or just a clearer explanation is needed.

      In response to these comments, we have revised the main text to clarify the process of model instance generation. Specifically, we have made changes at page 7: line 297 - page 8: line 302, page 8: lines 305 - 310, page 9: lines 372-378, and page 9: line 384 – page 10: line 399 in the revised main text.

      We have also added a new Figure (Figure S1) to the supplementary file to allow readers to visualise the model generation process for each relevant set of experiments. Supplementary figures are referenced in the main text where appropriate.

      (2) The authors mentioned performing each simulation in triplicate, which is puzzling as the model is based on deterministic ODEs with fixed parameters for each simulation. Under such conditions, one would anticipate identical results from multiple simulations with the same initial conditions and fixed parameters. Perhaps the authors expect the model to exhibit chaos or aim to assess the precision of the parameter estimates through triplicate simulations. Further clarification from the authors would be valuable to comprehend the rationale behind conducting triplicate simulations in a deterministic setting.

      We agree that repeating deterministic ODE simulations with identical inputs would be redundant. In our study, “triplicate” referred instead to generating three independent ensembles of 100,000 unique model instances each, where model parameters (or initial conditions) were randomly resampled. These ensembles were analysed separately to assess whether the inferred meta-dynamic distributions converged robustly. Indeed, the distributions from the three replicates were nearly indistinguishable, confirming that the results are reproducible and not artefacts of a particular random draw.

      We have revised the main text to clarify this distinction (page 8: lines 305 - 310) and added an extended explanation for meta-dynamic behaviour convergence in the new section Error Convergence in the supplementary text (page 6: lines 184 - 210).

      (3) While the lack of a connection between model parameters and biological data (mentioned in the public review) may not be a fatal flaw in the manuscript, the concern about the 100,000 random samples being insufficient to explore the parameter space is valid. In a thought experiment, considering the high and low rate for each parameter and the combinatorial explosion of possibilities (2^94), the number of simulations performed (100,000) represents only an extremely small fraction of the entire parameter space (~1/10^(23)). This limitation might not accurately capture the true heterogeneity present inside a solid tumour. One potential solution is to determine biological bounds on model parameters through data fitting, which can provide more meaningful constraints for the simulations. Alternatively, increasing the number of simulations and adopting more efficient sampling techniques can enhance the coverage of possible parameter sets.
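      The arithmetic behind this thought experiment can be checked directly. The sketch below is only an illustration of the reviewer's high/low simplification, not part of the authors' analysis; the function name `sampled_fraction` is hypothetical, and the result is an upper bound that assumes every sample lands on a distinct corner of the binary grid.

```python
import math
from fractions import Fraction


def sampled_fraction(n_samples: int, n_binary_dims: int) -> Fraction:
    """Upper bound on the share of the 2**n_binary_dims high/low corners hit by n_samples draws."""
    return Fraction(n_samples, 2 ** n_binary_dims)


# 94 free parameters and 50 species, as stated in the review.
for dims in (94, 50):
    frac = sampled_fraction(100_000, dims)
    print(f"{dims} binary dimensions: log10(fraction) = {math.log10(frac):.1f}")
```

      For 94 parameters the fraction is roughly 5 × 10^-24, consistent with the order of magnitude quoted above; for 50 species it is roughly 9 × 10^-11.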

      We thank the reviewer for this insightful comment. We agree that the 94-dimensional parameter space is vast, and that 100,000 simulations represent only a fraction of the total combinatorial possibilities. However, the objective of our study is not to exhaustively sample the entire parameter space, but rather to sufficiently sample the ‘output space’ - that is, the complete spectrum of qualitative dynamic behaviours the network topology can generate. The key question is whether 100,000 model instances are sufficient for the distribution of these output dynamics to converge.

      To formally address this, we have performed a convergence analysis, which is now detailed in the new supplementary section "Error Convergence" (Supplementary text page 6: lines 184 - 210) and illustrated in Supplementary Figure S12. This analysis demonstrates that the mean squared error (MSE) between dynamic distributions from N and 2N simulations exponentially decreases as N increases, and the distribution of protein dynamics changes negligibly well before reaching 100,000 instances. Furthermore, performing the entire analysis in triplicate with independent random seeds yielded nearly identical meta-dynamic maps (average standard deviation < 0.04%), giving us high confidence that we have robustly captured the network's behavioural repertoire.
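      The idea of a sample-size-doubling check can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' pipeline: the category labels other than "rebound" and "decreasing", the probabilities in `TRUE_P`, and all function names are made up, and drawing labels from a fixed categorical distribution stands in for simulating and classifying ODE model instances.

```python
import random
from collections import Counter

# Hypothetical dynamic categories and made-up probabilities (stand-in for the ODE ensemble).
LABELS = ("decreasing", "rebound", "increasing", "no_response")
TRUE_P = (0.55, 0.25, 0.05, 0.15)


def draw_labels(n, rng):
    """Stand-in for simulating n model instances and classifying their dynamics."""
    return rng.choices(LABELS, weights=TRUE_P, k=n)


def frequencies(labels):
    """Empirical frequency of each category, in the order given by LABELS."""
    counts = Counter(labels)
    return [counts[lab] / len(labels) for lab in LABELS]


def mse(p, q):
    """Mean squared error between two frequency vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


def doubling_errors(sizes, seed=0):
    """For each N, compare the distribution from N draws with one from 2N independent draws."""
    rng = random.Random(seed)
    results = []
    for n in sizes:
        small = frequencies(draw_labels(n, rng))
        large = frequencies(draw_labels(2 * n, rng))
        results.append((n, mse(small, large)))
    return results


for n, err in doubling_errors([100, 1_000, 10_000, 100_000]):
    print(f"N={n:>7}  MSE(N, 2N)={err:.2e}")
```

      In this toy setting the error shrinks as N grows simply because sampling noise in the frequencies falls; the rate of decrease reported for the real ensemble is the one given in the supplementary analysis, not something this sketch reproduces.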

      We believe this convergence occurs because the system is degenerate: many distinct parameter sets within the high-dimensional space map to the same qualitative outcome (e.g., 'rebound' or 'decreasing'). Our goal was to capture the set of possible outcomes, not every unique parameter combination that leads to them.

      Regarding the parameter range, we intentionally chose a broad, unbiased range (10<sup>-5</sup> to 10<sup>4</sup>) as a proof-of-concept to delineate the theoretical upper limit of heterogeneity the network can support, thereby capturing even rare but potentially critical resistance dynamics. We agree with the reviewer that a future direction is to constrain these ranges using biological data. Such an approach would transition from defining what is possible (the focus of this manuscript) to predicting what is probable in a specific biological context. We have added this important point to the Discussion (page 16: lines 663-679) to highlight this avenue for future work.

      (4) One of the manuscript's main results indicates that protein interactions play a more significant role in driving adaptive resistance than protein expression. To explore the impact of protein expression, the authors fixed a nominal parameter set and generated 100,000 initial concentrations of the 50 proteins in the ODE model. However, the simulations' equilibrium concentrations in the "starvation" and "fed" phases, which form the initial condition for the treated phase, are uniquely determined by the nominal model's kinetic parameters and not the initial conditions, which remain identical for each simulation. From a dynamical systems perspective, stable steady states are determined by the model parameters and attract all initial conditions within their basin of attraction. As a result, a random sampling of the initial conditions has a limited impact on the model dynamics. The authors' conclusion that "the ability of expression to induce resistance also seems to be dependent on the master parameter set" can be explained by this dynamical systems perspective, where the resistance state corresponds to a stable steady state determined by the master parameter set. Considering this, the evidence presented in the manuscript may not fully support the authors' conclusion regarding the importance of protein expressions relative to protein dynamics. The discrepancy might be attributed to a possible misunderstanding of this point, and further clarification from the authors could be helpful.

      We thank the reviewer for the thoughtful perspective. We agree that, in a monostable system with fixed kinetic parameters and fixed conserved totals, varying only the initial split among moieties (e.g., X vs pX) will not change the final steady state; trajectories converge to the same attractor. In our analysis, however, “initial conditions” predominantly refer to total protein abundances (e.g., X_tot = X + pX + complexes), used as a proxy for expression heterogeneity. These totals are invariants on the simulated timescale (no synthesis/degradation in the pre-equilibration phases), and therefore alter the value of the steady state under a given parameter set. In other words, our IC sampling mostly varies conserved totals rather than merely redistributing a fixed total; hence the equilibrium reached after the starvation/fed pre-equilibrations depends on the sampled totals and the kinetics. This can be seen in the new Supplementary Figure S4, showing that changing the ICs does shift the eventual steady state even when kinetic parameters are fixed.
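      The distinction between the initial split and the conserved total can be seen in a minimal two-state example. The sketch below assumes a single reversible phosphorylation cycle X ⇌ pX with first-order rates `kf` and `kr` (hypothetical values, not taken from the model), for which the steady state is pX* = X_tot · kf / (kf + kr).

```python
def simulate_pX(x_tot, frac_phospho0, kf=1.0, kr=0.5, dt=0.01, t_end=50.0):
    """Integrate d(pX)/dt = kf * (X_tot - pX) - kr * pX with forward Euler.

    x_tot is the conserved total (X + pX); frac_phospho0 is the initial split.
    """
    px = frac_phospho0 * x_tot
    for _ in range(int(t_end / dt)):
        px += dt * (kf * (x_tot - px) - kr * px)
    return px


# Same total, different initial splits: the final state is identical.
print(simulate_pX(1.0, 0.0), simulate_pX(1.0, 1.0))
# Different totals, same kinetics: the final state shifts with the total.
print(simulate_pX(1.0, 0.5), simulate_pX(3.0, 0.5))
```

      With fixed kinetics, changing only the split leaves the steady state untouched, whereas changing the total rescales it, which is the behaviour the response attributes to sampling total abundances.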

      We have revised the text to: (1) define ICs explicitly as total abundances for multi-state species, (2) distinguish “initial split” from “conserved totals,” and (3) clarify that expression effects are context-dependent rather than universally dominant (page 4: lines 139-141 and page 10: lines 413-416).

      (5) Additionally, it is important to note that the random sampling of 100,000 initial concentrations might not sufficiently explore the vast space of possible initial conditions. In the thought experiment mentioned earlier, where each protein can have high or low expression concentrations, there are approximately 2^(50) = ~10^(15) possible combinations of initial concentrations. Thus, the 100,000 random simulations only represent around ~1/10^(10) of the possible initial conditions in this simplistic scenario. Consequently, this limited sampling of initial conditions may not provide enough information to draw meaningful conclusions, even if the initial conditions were more directly linked to kinetic rates.

      Please see our response to Comment (3). Briefly, our ICs are continuous total abundances (conserved moieties), not binary high/low states; many IC configurations converge to the same qualitative attractors, so we estimate distributional properties rather than enumerate all combinations. Our convergence diagnostics (independent replicates and sample-size doubling) show that the meta-dynamic distributions stabilise well before N=100,000 (see Supplementary Figure S12). We have clarified this in the Supplementary Information (Error Convergence section) with the new convergence results.

      (6) The authors implement a parameter selection step in the manuscript, where they filter out parameter sets that lead to what they term non-biological simulations. However, the rationale for determining if a given parameter set results in a stiff system of ODEs remains unclear. The authors cite references [38,39] to support the claim that stiff equations are not biologically plausible. Still, upon review, it is evident that [38] does not include the term "stiff," and [39] discusses using implicit methods to simulate stiff ODE models without specifically commenting on the biological plausibility of stiff systems. The manuscript lacks direct evidence to justify the conclusion that filtering out parameter sets that result in stiff ODE systems is reasonable. Since the filtering step accounts for the majority of discarded parameter sets, a stronger foundation is required to support the statement that stiff equations are non-biological.

      We thank the reviewer for pointing out the issue in our original justification. The reviewer is correct: stiff systems are a common feature of biological models, and our claim that they are likely ‘biologically implausible’ was not well substantiated. The filtering of these model instances was, in fact, due to a computational limitation rather than a biological principle. The issue was that these parameter sets produced systems of ODEs that were so numerically stiff they were unsolvable within a reasonable timeframe by the SUNDIALS ODE solver suite, which is specifically designed for such systems.

      Following the reviewer's comment, we investigated the source of this prohibitive stiffness. We discovered it was not an intrinsic property of the parameter sets themselves, but rather an artifact of our simulation setup. The extreme stiffness occurred almost exclusively during the initial integration timesteps, caused by the large initial discrepancy between the concentrations of active and inactive protein forms. This large discrepancy created the conditions for excessively stiff solutions, i.e., ones that were unsolvable with the implemented ODE solver settings. To overcome this problem, we set a large maximum number of steps in the ODE solver for the first couple of time points, enabling the solver to overcome the excessively stiff portion of the solve. We found that the vast majority of the previously 'unsolvable' model instances could now be successfully simulated. Consequently, the number of parameter sets discarded due to solver failure is now negligible (< 1%), and this filtering step no longer accounts for the majority of discarded parameter sets. Most importantly, the distributions of dynamics were not significantly altered by this adaptation.

      We have revised the "Sampling and filtering of model instances (page 5: lines 174 – 189)" part in the Methods section to reflect this more accurate understanding. We have corrected our original claim regarding the biological plausibility of stiff systems and corrected our use of the references. Ref [38] was included to demonstrate that models of biological systems are stiff, which was a major conclusion of that paper, and [39] was originally included to demonstrate that solving ODEs is reliant on solvers that can integrate stiff systems. Upon review, ref [39] has been removed.

      Overall, this investigation has made our analysis more robust by allowing us to include a wider, more representative range of parameter sets, and has tangibly improved the quality of our study.

      (7) Additionally, it is important to consider the standard method for accounting for stiff systems, as presented in [39], which involves using implicit numerical methods for ODE simulation. The authors mention using numerical methods from the SUNDIALS suite, which includes implicit methods, but the specific numerical method used remains unclear. Furthermore, it would be valuable for the authors to disclose the number of parameter sets that were filtered to obtain the final set of 100,000 accepted parameter sets. This information would provide insights into the extent of filtering and the proportion of parameter sets that were excluded during the selection process.

      We apologise for the lack of specific detail and have now updated the text. To clarify, all ODE simulations were performed using the CVODE solver from the SUNDIALS suite. This solver employs an implicit, variable-order, variable-step Backward Differentiation Formula (BDF) method, which is robust and specifically designed for handling the stiff systems common in biological network modelling. We have now explicitly stated this in the "ODE model construction, modelling, and simulations (page 4: lines 162 – 164)" section of the Methods.
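      Why an implicit method matters for stiff systems can be illustrated without any solver library. The sketch below does not call CVODE; it hand-codes backward Euler (the first-order member of the BDF family, with a fixed step rather than CVODE's variable order and step) on a toy stiff equation dy/dt = -k (y - 1), and compares it with forward Euler. The rate constant, step size, and function names are illustrative assumptions.

```python
def explicit_euler(k, y0, dt, n_steps):
    """Forward Euler for dy/dt = -k * (y - 1); unstable when k * dt > 2."""
    y = y0
    for _ in range(n_steps):
        y = y + dt * (-k * (y - 1.0))
    return y


def backward_euler(k, y0, dt, n_steps):
    """First-order BDF: solve y_new = y + dt * (-k * (y_new - 1)) for y_new.

    The equation is linear, so the implicit step has a closed form;
    a nonlinear model would need a Newton iteration here instead.
    """
    y = y0
    for _ in range(n_steps):
        y = (y + dt * k) / (1.0 + dt * k)
    return y


K, DT, STEPS = 1.0e4, 0.01, 20  # illustrative values only
print("explicit:", explicit_euler(K, 0.0, DT, STEPS))
print("implicit:", backward_euler(K, 0.0, DT, STEPS))
```

      At this step size the explicit update diverges, while the implicit update settles at the true steady state y = 1, which is the qualitative reason stiff network models are integrated with implicit schemes.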

      Regarding the filtered parameters, we have included a revised and detailed discussion of this in the "Sampling and filtering of model instances (page 5: lines 174 – 189)" part in the Methods section (see our response to comment (6) above). Briefly, after applying the filters, ~40–45% of instances did not reach steady state within the simulation timeframe, and ~50–55% did not meet the minimum drug-response criterion. Approximately 10% satisfied all criteria and were retained for analysis. Importantly, we employed ‘rejection sampling’ and continued drawing until we had N = 100,000 accepted instances that satisfied all the criteria.
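      The rejection-sampling loop described here can be sketched as below. Several things are assumptions made for illustration: log-uniform sampling over the stated 10^-5 to 10^4 range (the response does not specify the sampling distribution), the placeholder `toy_filter` standing in for the steady-state and drug-response criteria, and all function names.

```python
import math
import random


def sample_log_uniform(rng, n_params, lo=1e-5, hi=1e4):
    """Draw n_params values log-uniformly between lo and hi (assumed distribution)."""
    return [10 ** rng.uniform(math.log10(lo), math.log10(hi)) for _ in range(n_params)]


def rejection_sample(n_accept, n_params, accept, seed=0):
    """Keep drawing candidate parameter sets until n_accept of them pass `accept`."""
    rng = random.Random(seed)
    accepted, n_drawn = [], 0
    while len(accepted) < n_accept:
        candidate = sample_log_uniform(rng, n_params)
        n_drawn += 1
        if accept(candidate):
            accepted.append(candidate)
    return accepted, n_drawn


def toy_filter(params):
    """Hypothetical stand-in for the acceptance criteria; not the authors' filters."""
    return params[0] > params[1]


accepted, n_drawn = rejection_sample(n_accept=1_000, n_params=94, accept=toy_filter)
print(f"accepted {len(accepted)} of {n_drawn} draws ({len(accepted) / n_drawn:.0%})")
```

      Reporting `n_drawn` alongside the accepted count gives the overall acceptance rate, which is the quantity the reviewers asked to see.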

      (8) An important step in the simulation process described by the authors is the simulation of the "fasted" and "fed" states until an equilibrium is reached. However, it is not clear how the authors determine if the system has reached an equilibrium. It would be helpful if the authors could provide more information regarding the criteria used to assess equilibrium in the simulations. Regarding the "fed" state, it is not explicitly stated whether the mitogen stimulus is assumed to be constant throughout the "fed" experiment. Considering the dynamic nature of mitogen stimulation in biological systems, it would be beneficial if the authors could clarify this assumption and discuss its biological relevance.

      We apologise for not specifying this in the original text. A simulation was considered to have reached equilibrium when the concentration of every protein species changed by < 1% over the final 100 time steps of the simulation phase. We have now added this criterion to the "Sampling and filtering of model instances (page 5: lines 177 – 179)" part of the Methods section.
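      One possible reading of this criterion is sketched below: compare each species at the final time point with its value 100 steps earlier and require a relative change below 1%. The function name, the `eps` guard for near-zero concentrations, and the choice of endpoint comparison (rather than, say, the range within the window) are assumptions, not details confirmed by the text.

```python
def reached_equilibrium(trajectory, window=100, rel_tol=0.01, eps=1e-12):
    """Return True if every species changed by < rel_tol over the last `window` steps.

    trajectory: list of time points, each a list of species concentrations.
    """
    if len(trajectory) < window + 1:
        return False  # not enough time points to span the window
    start, end = trajectory[-window - 1], trajectory[-1]
    return all(abs(b - a) < rel_tol * max(abs(a), eps) for a, b in zip(start, end))


flat = [[1.0, 2.0] for _ in range(150)]
drifting = [[1.0 + 0.001 * i] for i in range(150)]
print(reached_equilibrium(flat), reached_equilibrium(drifting))
```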

      Regarding the second part of the comment, in our simulations, both the mitogenic and the drug inputs were modelled as constant, stepwise functions that, once turned on, remained at a fixed concentration for the remainder of the simulation. The biological rationale for this choice was to rigorously test for bona fide adaptive resistance. By maintaining a constant mitogenic and drug pressure, we can ensure that any observed recovery in the activity of downstream proteins is due to the internal rewiring and adaptation of the signalling network itself, rather than an artefact of the removal or decay of the external stimulus/drugs. We have now clarified this rationale in the "ODE model construction, modelling, and simulations (page 4: lines 168 – 171)" part of the Methods section.

      (9) The "Description of Model Scope and Construction" section in the Supplementary Information should include explicitly the model reactions and some discussion about their specific form (e.g., why is '(((kc2f1*pIR*PI3K) / (1 + (pS6K/Ki2))) + (kc2f2*pFGFR*PI3K))' representing the phosphorylation rate of PI3K, with pS6K in the denominator?).

      The reviewer is right to ask for model justification. We have expanded the Supplementary “Description of Model Scope and Construction” section (page 2: line 63 – page 5: line 185) to include a complete reaction list with rate laws and a brief rationale for each. We also explain the specific PI3K phosphorylation term: activation by pIR and pFGFR is attenuated by pS6K via a denominator, which captures the well-described S6K-mediated negative feedback that reduces activation (e.g., via IRS1 phosphorylation).
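      The rate expression quoted by the reviewers can be written out as a small function to make its structure visible. The variable names follow the quoted expression; the function name and the numerical values in the usage lines are hypothetical. Note that, as the expression is written, the (1 + pS6K/Ki2) denominator divides only the pIR-driven term.

```python
def pi3k_phosphorylation_rate(pIR, pFGFR, PI3K, pS6K, kc2f1, kc2f2, Ki2):
    """Rate from the quoted expression:
    ((kc2f1*pIR*PI3K) / (1 + pS6K/Ki2)) + (kc2f2*pFGFR*PI3K)
    """
    ir_term = (kc2f1 * pIR * PI3K) / (1 + pS6K / Ki2)  # attenuated by pS6K feedback
    fgfr_term = kc2f2 * pFGFR * PI3K                    # no pS6K term in the quoted form
    return ir_term + fgfr_term


# Illustrative values only: raising pS6K from 0 to Ki2 halves the pIR-driven term.
args = dict(pIR=2.0, pFGFR=1.0, PI3K=3.0, kc2f1=0.5, kc2f2=0.25, Ki2=4.0)
print(pi3k_phosphorylation_rate(pS6K=0.0, **args))
print(pi3k_phosphorylation_rate(pS6K=4.0, **args))
```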

      (10) In line 349, the statement "Given that CDK46cycD is only strongly suppressed in just under 60% of the model instances (Figure 3C)" lacks clarity regarding where to look to interpret the 60% value. If this means that 4 out of the 7 model instances are resistant, and the other 2 proteins also have the same percentage of resistance, then there is no apparent reason to focus solely on CDK46cycD.

      The reviewer is correct; the figure reference was an error, which has been rectified in the main text (page 9: line 355). The intended reference was Supplementary Figure 2A, which shows the heatmap of the frequency of each dynamic for all the active protein forms. CDK4/6cycD shows a sustained decreasing dynamic in 59.93% of model instances, from which this number was derived. We have also now explicitly referenced this number in the legend of Supplementary Figure 2A.

      We focus on CDK4/6cycD because it is the direct pharmacological target of CDK4/6 inhibitors. Our point was to suggest that even when the target is suppressed in the majority of instances (~60%), this does not reliably propagate to uniform downstream inhibition across the network, thus highlighting emergent, network-driven adaptive responses.

      (11) We observed that in Fig. 5A, the authors show that multiple pathways are blocked. However, it is unclear whether they reduced the value of one parameter in the experiment or simulated multiple combinations of parameter inhibition. Considering the large number of parameters (94) in the model, if the authors simulated all possible combinations of parameter inhibition, the number of combinations would be significantly more than 94. An actual inhibitor typically has an inhibitory effect on multiple molecules. Therefore, it would be necessary to identify the parameters that lead to drug resistance when multiple molecules are inhibited. However, examining the inhibition patterns for all 94 parameters would be practically impossible. As a potential approach, we suggest using ensemble learning techniques, such as random forests, to handle this problem efficiently. With a dataset of binary outputs indicating the presence or absence of resistance for a sufficient number of inhibition patterns, ensemble learning can be applied to find the parameters that contribute to drug resistance. Popular feature selection algorithms like Boruta could be utilised to identify the most relevant parameters. The results obtained by ensemble learning are similar to the ranking in Fig. 5C, potentially providing a more robust validation of the authors' findings. By incorporating these additional analyses, the authors could strengthen the reliability and significance of their results related to parameter inhibition and drug resistance.

      We appreciate the suggestion and the opportunity to clarify. Figure 5A depicts the multiple pathways that were interrogated, but in the analysis, parameters were inhibited one at a time (OAT), not in combination. We have revised the figure legend and added a section named “Protein knockdown perturbation analyses (page 6: lines 228 – 233)” in the Methods section to make this explicit. Moreover, the main text has been slightly modified in places to make this clearer (page 11: lines 462-463, page 24: lines 856-857).

      We chose the OAT design intentionally to obtain causal, first-order attribution of control points across a broad parameter ensemble without confounding from simultaneous co-inhibition. This provides an interpretable ranking of primary drivers (Figure 5C) that is consistent with the paper’s mechanistic focus. We agree that a multi-target inhibition approach could be a useful next step; however, an exhaustive combinatorial screen is beyond the scope of this proof-of-concept. In such future studies, the ensemble learning, as suggested by the reviewer, could be layered onto our MDN framework to assess robustness of the ranking under co-inhibition.
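      A one-at-a-time screen of this kind can be sketched as follows. Everything specific here is an assumption for illustration: the three-parameter `toy_output` readout is not the MDN model, the 90% inhibition level is arbitrary, and the names are hypothetical.

```python
def toy_output(p):
    """Hypothetical steady-state readout of a tiny network; not the MDN model."""
    return p["k_act"] / (p["k_act"] + p["k_deact"]) + 0.1 * p["k_bypass"]


def one_at_a_time(params, model, inhibition=0.9):
    """Inhibit each parameter alone and rank parameters by the size of the output change."""
    baseline = model(params)
    effects = {}
    for name in params:
        perturbed = dict(params)  # copy so the other parameters stay at baseline
        perturbed[name] = params[name] * (1.0 - inhibition)
        effects[name] = model(perturbed) - baseline
    return sorted(effects.items(), key=lambda kv: abs(kv[1]), reverse=True)


params = {"k_act": 1.0, "k_deact": 1.0, "k_bypass": 0.5}
for name, delta in one_at_a_time(params, toy_output):
    print(f"{name:>9}: change in output = {delta:+.3f}")
```

      Because only one parameter differs from baseline in each run, every change in output is attributable to that parameter alone; repeating the loop over an ensemble of parameter sets would give a frequency-based ranking rather than a single-instance one.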

      (12) In explaining the parameterization of the model, we find an implication of a quantitative model. However, upon examining the results in Fig. 7D, we observe that they are only qualitatively correct. When comparing Figs. 7A and 7C, we note that many model instances are immediately suppressed, and the time scale remains unknown. We believe it would be essential for the authors to explain how the model of this study maintains its quantitative nature despite the results in Fig. 7. If such an explanation cannot be provided, it raises concerns regarding the biological reliability of several findings within this study.

      While our framework is built on quantitative ODEs, the validation we present in Figure 7 is indeed qualitative. This is an intentional and key feature of our study's design. Our goal was not to build a calibrated, quantitative model of a specific cell line (e.g., MCF10A), but rather to establish a proof-of-concept theoretical framework that systematically explores the full spectrum of dynamic behaviours a given network topology can possibly generate. To achieve this, we intentionally sampled parameters from a very broad, unbiased range to delineate the theoretical upper limit of heterogeneity. This in silico population is therefore designed to be far more heterogeneous than any single isogenic cell line.

      The striking qualitative agreement seen between our meta-dynamic distributions and the single-cell data in Figure 7D is thus not a failure of quantitative prediction, but rather a strong validation of our core premise: that a significant degree of signalling heterogeneity exists in cell populations and that our framework can effectively capture its emergent properties.

      Regarding the specific comment on Figure 7C, we apologise for the lack of clarity. Nominally, we chose to simulate for 24 hours; however, the x-axis in our simulations represents arbitrary time units, as the timescale depends on the meaning/units of the parameter values. The goal is to compare the qualitative shape of the response (e.g., rebound, sustained decrease), not the absolute time in hours. Moreover, the rapid initial suppression seen in many of our model instances (Fig 7C) is a direct parallel to the rapid suppression seen in the experimental data (Fig 7A). This initial phase is followed by a wide variety of adaptive behaviours (or lack thereof) in both our simulations and the real cells, which is the key phenomenon we are studying.

      We have revised the text (page 14: lines 598-601) and Figure 7’s legend to state more explicitly that our validation is qualitative and to clarify the purpose of our broad, uncalibrated approach. We have also added a note in the Discussion (page 18: lines 744-747) that calibrating this framework with cell-line-specific data is a natural next step for generating quantitative, context-specific predictions.

      (13) Related to the previous point, the experimental data is presented as fold-change during CDK4/6 inhibition, and we notice that the initial fold-change at time 0 varies between 1 and 1.8. The difference in initial fold-change is unclear to us, as our understanding of fold-change typically corresponds to the change from baseline, typically represented by the protein concentration at time 0.

      Furthermore, while the experimental data exhibits uniformly decreasing CDK4/6 activity, a substantial number of simulations indicate constant CDK4/6cycD, showing a significant qualitative discrepancy between the simulations and experimental findings. This disparity makes it difficult for us to interpret the comparison between the two datasets effectively, given the complexities in comprehending the experimental fold-change figure.

      As Figure 7 serves as the primary validation of model simulations in the manuscript, we believe that the current presentation may not provide a compelling reason to believe that the model accurately captures experimental data. To enhance clarity and validation, we suggest overlaying the experimental data over the simulations or considering the median and 10/90% percentile of the experimental data, which may potentially offer improved readability and facilitate a more robust interpretation of the comparison.

      The experimental data from Yang et al. (ref 55, main text) measures kinase activity using a nucleus-to-cytoplasm translocation reporter system, wherein a bait protein is phosphorylated by the target kinase causing it to translocate from the nucleus to the cytoplasm. Hence, the y-axis represents the ratio of nuclear vs. cytoplasmic fluorescence, not a fold-change from a t=0 baseline. The variation in the starting value (between 1 and 1.8) reflects the inherent heterogeneity in the reporter's localization across individual cells even before the drug is added. We have updated the y-axis label and revised Fig. 7’s legend to state this explicitly.

      The most likely explanation for the discrepancy between experimental dynamics and our simulation dynamics is that the experimental data comes from an isogenic cell line that is largely sensitive to CDK4/6 inhibition. Our simulations are derived from a very wide parameter sweep, where the intent is to represent all possible cell states. It is quite striking that there is such a high correlation between the experimental data and simulations, indicating that perhaps the heterogeneity of even isogenic cell lines is significantly greater than might be intuited; a point we now mention in the revised Discussion (page 17: lines 716-727).

      It is worth noting again that our analysis is intentionally constructed to be as heterogeneous as possible, and is not trained on any biological data that might otherwise constrain the output-behaviour space. The isogenic cell line almost certainly represents a much more constrained output-behaviour space than our analysis.

      The y-axis label has also been updated accordingly. As mentioned in (12), this result is intended as a qualitative validation, showing that cell lines indeed have highly variable signalling dynamics. Given the range of parameters tested, we think it is surprising that the degree of agreement between the experiment and our analysis is as high as it is. Again, we believe this suggests that heterogeneity may be more prevalent than is intuited. We do not believe we have made any strong quantitative claims in the main text, and we certainly aim to work towards biological, quantitative validation in the future. Finally, we altered the wording of the results heading (page 14: line 562) to make it clear that we are only making qualitative claims and removed the claim that the evidence was strong.

      With these clarifications and corrections, we believe the validation is now much more compelling. The key point is not a perfect quantitative match, but the strong similarity in the distribution of heterogeneous behaviours.

      (14) The authors mention simulating treatment with 10nM of CDK4/6i or Ei, but specific details on how this treatment is included in the model simulations are not provided. This lack of information makes it challenging to fully evaluate the comparison between model simulations and experimental evidence in Figure 7. It would be highly appreciated if the authors could clarify how the treatment with CDK4/6i or Ei is incorporated into the simulations to facilitate a better understanding and interpretation of the results.

      To clarify, the effects of the inhibitors were incorporated directly into the kinetic rate laws of their respective target reactions.

      CDK4/6 inhibitor (CDK4/6i): This was modelled as an inhibitor of the formation of the active CDK4/6-cyclin D complex. We have now explicitly detailed this in the description for reaction R27 in the "Description of Model Scope and Construction" section of the Supplementary Information.

      Estrogen Receptor inhibitor (Ei): This was modelled as an inhibitor of the estrogen-dependent activation of the Estrogen Receptor. This is now explicitly detailed in the description for reaction R15 in the same supplementary section.

      It is, however, important to reiterate that our goal in Figure 7 is a qualitative, shape-based comparison; therefore, we used a fixed fractional inhibition (reported in Methods) rather than a calibrated IC50/Hill model.
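
      As an illustration only, a fixed fractional inhibition applied to a mass-action rate law can be sketched as below. The function name, argument names, and example values are hypothetical; the inhibition fraction actually used is the one reported in Methods and is not reproduced here.

```python
def inhibited_mass_action_rate(k, concentrations, fraction_inhibited):
    """Mass-action rate k * prod(concentrations), scaled by (1 - fraction_inhibited).

    Illustrative sketch: `fraction_inhibited` is a fixed value in [0, 1],
    not a dose-dependent IC50/Hill term.
    """
    if not 0.0 <= fraction_inhibited <= 1.0:
        raise ValueError("fraction_inhibited must lie in [0, 1]")
    rate = k
    for concentration in concentrations:
        rate *= concentration
    return (1.0 - fraction_inhibited) * rate


# Hypothetical example: an uninhibited rate of 3.0 is halved by 50% inhibition.
uninhibited = inhibited_mass_action_rate(2.0, [3.0, 0.5], 0.0)
half_inhibited = inhibited_mass_action_rate(2.0, [3.0, 0.5], 0.5)
```

      Because the scaling factor is constant, the inhibitor changes the effective rate constant of the target reaction without introducing any additional dose-response parameters.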

      (15) The authors state strong support for their modelling conclusions based on the literature. However, we still have concerns regarding the validation of the model against CDK2 or CDK4/6 data in Figure 7, as it appears less convincing to us. Furthermore, the authors list known resistance mechanisms that are replicated in their modelling. Nevertheless, we find the conclusion somewhat weakened by Figure S10, where approximately 80% of the nodes are implicated in some form of resistance pathway. This raises questions about the model's selectivity, as many proteins included in the model seem to drive resistance in some manner. In the Supplementary Information, the authors mention excluding or abstracting some protein species from the mitogenic and cell cycle pathways to manage computational resources effectively. This abstraction makes it difficult to determine if the proteins identified as potential drivers of resistance genuinely drive resistance or might represent abstractions of other potential drivers. To enhance the manuscript's clarity and address potential concerns about the model's selectivity and abstraction, we suggest providing more details and discussion in the main text.

      The reviewer's observation that a large number of nodes are implicated in resistance pathways in Figure S10 is correct. However, we argue this is not a weakness of the model's selectivity, but rather a key finding that reflects the biological reality of adaptive resistance. The literature is replete with a wide and growing number of distinct mechanisms of resistance even to a single class of drugs (1,2), which supports the idea that cancer can co-opt a wide variety of network nodes to survive.

      Figure S10 is not a binary map in which every implicated node is equal. Instead, it is a likelihood map, where the colour and weight of the connections represent how often a particular interaction participates in driving resistance across the theoretical full range of possible network dynamics. The figure shows that while many nodes can contribute to resistance, they do so in a hub-like manner, i.e. small subsets of nodes coordinate to drive resistance. This provides a rationalised, data-driven prioritisation of the most dominant and recurrent resistance strategies. We draw two important conclusions from this work: (1) resistance likely occurs due to resistance hubs, not individual proteins; and (2) the frequency of a resistance hub in an MDN analysis is likely proportional to the frequency of that hub emerging as a resistance mechanism in a population of cells and patients.

      Regarding the issue of abstraction, the reviewer is correct that this is an inherent feature of any tractable systems model. In our case, several species in the mitogenic/cell-cycle pathways are module-level proxies to control model size. The highly implicated "hub" nodes in our model likely represent critical cellular processes that are themselves composed of several individual protein interactions.

      To address these concerns, we have significantly revised the Discussion (page 16: lines 681-694) to: (1) frame resistance as a network-level phenomenon; (2) show that our frequency-based ranking is selective, prioritising the most probable, recurrent mechanisms; and (3) clarify that, given model abstraction, our findings implicate critical processes (modules), not just single proteins, as the drivers.

      Overall, these changes do not alter our main conclusions: adaptive resistance is an emergent, network-level property; many routes exist, but a smaller set of nodes/modules consistently carry the largest influence across heterogeneous contexts.

      (16) We consider that the figures and legends, including the supplementary information, are inadequately explained. The information provided is insufficient for us to comprehend the figures fully, leading to the need for interpretation on our part as readers. This could potentially introduce biases when trying to understand the claims made by the authors. To improve our understanding, it would be essential for the authors to assign appropriate labels to the figures and provide comprehensive explanations in the legends. For example, in Fig 3, we suggest labelling the tree diagrams in panels A and B, as well as the colour bars. We also recommend applying the same approach to other figures, adding accurate axis labels and descriptions of colour gradients to enhance clarity.

      We thank the reviewer for this critical feedback. To address this comment, the figure legends have been revised where appropriate and greatly expanded to improve their comprehension. Moreover, we have added explicit labels to all previously unlabelled components, such as the cluster dendrograms and colour code bars in Figure 3A, B.

      (17) To enhance readability, we recommend interchanging the order of Figures 1 and 2 in the sequence they appear in the main text. Alternatively, the text can be adjusted to refer to the figures in the correct order. Additionally, attention should be given to the bottom of Fig 1, which appears to be cropped or cut off. Furthermore, the incorrect word spacing in some figure elements, such as Fig. 3A title, Fig. 5B title, and Fig. 6B y-label, should be corrected for improved visual presentation.

      Following the reviewer’s comment, the order of Figures 1 and 2 has been switched to reflect the order in which they are referred to in the main text. These Figures have been re-exported to fix unintentional word spacing errors.

      (18) We recommend that the language used to refer to the initial conditions in the manuscript is clarified and homogenised. Currently, the authors use different terms such as "basal expression," "protein expression," "state variable values," or "initial conditions" to refer to them. This variation in terminology can be confusing for readers. In particular, the use of "basal expression" is problematic, as it typically refers to the leaky value of a reaction in the absence of an inducer, making it another biophysical parameter of the system rather than an initial condition. To enhance clarity and consistency, we suggest the authors decide on a single term to refer to the initial conditions throughout the manuscript and provide a clear explanation of its meaning to avoid any confusion. This will help readers better understand the concept being discussed and prevent any potential misinterpretations.

      We thank the reviewer for this very helpful suggestion. To resolve this and improve clarity, we have homogenized the language throughout the manuscript. We now clarify the use of the following three terms in their specific contexts:

      We use “protein abundances” exclusively for the conserved total abundances of multi-state species (e.g., Xtot = X + pX + complexes) that are sampled across instances to represent expression heterogeneity.

      We use “initial conditions” to refer to the initial values of the state variables in a model simulation. This term is related to protein abundance because setting the initial conditions for conserved species sets the protein abundance. This is explicitly stated in the text (page 3: lines 87-91).

      We use “state variables” to refer to the time-dependent model species.

      We avoid the term “basal expression” in technical descriptions. Where a biology-facing phrase is helpful, we use “protein expression level”. This is used when referring to the biological concept that the initial conditions are intended to represent, i.e. the heterogeneity in protein amounts across a cell population.

      We have performed a thorough search-and-replace to ensure this new convention is applied consistently and have removed the potentially confusing term "basal expression" from the revised manuscript.

      (19) Why are saturable functions (e.g., Michaelis-Menten functions) ignored in the model? What are the potential consequences?

      The main objective of this work was to perform a large-scale, systematic exploration of a high-dimensional parameter space (94 parameters) to map the full repertoire of qualitative dynamic behaviours a network topology can support. Using saturable functions like Michaelis-Menten kinetics would have roughly doubled the number of parameters to be explored (from k to Vmax and Km for each enzymatic reaction), making a parameter sweep of this scale computationally intractable. We therefore prioritised the breadth of the parameter search over the depth of kinetic detail, which we believe is the appropriate choice for a proof-of-concept study focused on heterogeneity.

      This simplification has potential consequences. A major one is that our model cannot capture phenomena that arise specifically from enzyme saturation, such as zero-order kinetics or certain forms of ultrasensitivity (switch-like responses). However, we argue that this is an acceptable trade-off for two main reasons: (1) our analysis is based on classifying broad, qualitative response shapes (increasing, decreasing, rebound, etc.), and mass-action kinetics are fully capable of generating this rich spectrum of behaviours; and (2) by varying the mass-action rate constants over nine orders of magnitude (from 10<sup>-5</sup> to 10<sup>4</sup>), our parameter sweep effectively samples a vast range of reaction efficiencies. A very low rate constant can approximate the behaviour of a saturated, low-efficiency enzyme, while a high rate constant can approximate a highly efficient, non-saturated one. In this way, the broad sweep of the rate parameter partially reflects the effects that would be captured by varying Vmax and Km.
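
      For reference, the standard limiting forms of the Michaelis-Menten rate law make the trade-off concrete. The symbols below follow the usual textbook convention and are not taken from the model files: a first-order mass-action term with effective constant Vmax/Km matches the unsaturated regime, whereas the saturated regime is zero-order in substrate and has no mass-action counterpart, consistent with the limitation noted above.

```latex
v_{\mathrm{MM}} = \frac{V_{\max}\,[S]}{K_m + [S]}, \qquad
v_{\mathrm{MM}} \approx \frac{V_{\max}}{K_m}\,[S] \quad \text{for } [S] \ll K_m, \qquad
v_{\mathrm{MM}} \approx V_{\max} \quad \text{for } [S] \gg K_m .
```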

      For transparency, we have added a brief rationale to the “ODE model construction, modelling, and simulations” part of the Methods (revised main text, page 4: lines 153-155) and the "Description of Model Scope and Construction" section in the Supplementary file (Supplementary text page 2: lines 63-73).

      (20) Given the relevance of the concept of "heterogeneity" in this work, a short discussion about biochemical noise and its implications on the analysis (e.g., why it is not included, and if it will be a next step) would be appreciated.

      Our MDN modelling framework represents heterogeneity by creating an ensemble of deterministic models, where each model instance has a unique set of kinetic parameters and/or initial protein abundances. We propose that this is a powerful way to mechanistically represent the functional consequences of all sources of cellular variation. Over time, the effects of genetic mutations, epigenetic states, and even the time-averaged impact of intrinsic biochemical noise will manifest as changes in the effective interaction strengths and protein concentrations within a cell. Our large-scale parameter/IC sweep is designed to systematically explore the full range of dynamic behaviours that can emerge from this underlying biological variation. Therefore, our approach does not compete with stochastic modelling but is complementary to it. While stochastic simulations can capture the dynamic trajectories of single cells, our framework provides a panoramic view of the entire spectrum of possible stable phenotypes that can emerge at the population level. We agree that modelling intrinsic biochemical noise (stochasticity arising from finite copy numbers), e.g. using the chemical Langevin equation or the stochastic simulation algorithm (SSA), is a possible extension in future work, but it is expected to be very computationally expensive. We have added a brief discussion of this as a future direction in the revised Discussion.
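
      The ensemble idea can be sketched with a deliberately small toy example. Everything below is illustrative: the one-species ODE, the function names, and the choice of log-uniform sampling are assumptions for this sketch, and only the sampled range (10^-5 to 10^4) is taken from the response to comment (19).

```python
import random


def sample_log_uniform(rng, low_exp=-5.0, high_exp=4.0):
    """Draw one rate constant log-uniformly between 10**low_exp and 10**high_exp."""
    return 10.0 ** rng.uniform(low_exp, high_exp)


def steady_state_active_fraction(k_on, k_off):
    """Steady state of the toy ODE dx/dt = k_on * (1 - x) - k_off * x."""
    return k_on / (k_on + k_off)


def build_ensemble(n_instances, seed=0):
    """Return (k_on, k_off, steady_state) for each deterministic model instance."""
    rng = random.Random(seed)  # seeded so the ensemble is reproducible
    ensemble = []
    for _ in range(n_instances):
        k_on = sample_log_uniform(rng)
        k_off = sample_log_uniform(rng)
        ensemble.append((k_on, k_off, steady_state_active_fraction(k_on, k_off)))
    return ensemble


ensemble = build_ensemble(200)
```

      Each instance is fully deterministic; the heterogeneity lives entirely in the spread of sampled parameters across instances, which is the sense in which the approach is complementary to stochastic simulation.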

      (21) We have noticed that the first four paragraphs of the Discussion section overlap with the Introduction, as they mainly reiterate the significance of the study itself rather than focusing on the specific results obtained. To avoid redundancy and provide a more cohesive and informative discussion, we recommend that the authors shift the focus of the Discussion section towards presenting potential interpretations, even if they are not definitive, of the results obtained. By doing so, the Discussion will serve as a valuable platform for deeper analysis and insightful observations, allowing readers to better comprehend the implications and significance of the research findings.

      We thank the reviewer for this structural feedback. Following the reviewer's feedback, we have significantly rewritten and restructured the Discussion section. The redundant introductory material has been removed.

      The rewritten Discussion centres on interpretation and implications, and connects our findings to the literature. It now: (i) frames MDN as a systems-level framework that links molecular heterogeneity to qualitative signalling “meta-dynamics” and adaptive escape under constant drug pressure; (ii) highlights two key findings: an asymmetry in control (interaction kinetics exert stronger, more consistent influence than protein abundance) and a topology-driven convergence whereby a vast parameter space funnels into a finite set of recurrent behaviours; (iii) shows that resistance is a network-level property, with many possible routes but a small set of recurrent hubs/modules dominating; and (iv) provides a qualitative alignment with single-cell reporter data while clarifying the intent and limits of that comparison. Moreover, we now explicitly discuss limitations (rate-law simplifications, broad priors, determinism, and modular abstractions) and outline next steps for future research, including data-constrained priors and stochastic extensions.

      We believe this substantial revision has transformed the Discussion into a much more insightful and valuable part of the manuscript that directly addresses the reviewer's concerns.

      (22) The supplemental text file containing the model equations can be a bit challenging to read and understand. It would be greatly beneficial if the authors could consider generating a file using a typesetting program.

      We have now included a typeset list of state variable equations and ODEs, along with the original model files.

      (23) The authors mentioned that some model parameterizations result in negative solutions, which is surprising. Access to the model equations would help understand why this happens and is crucial for researchers who may want to use this approach. Clarifying the model equations' presentation would enhance transparency and aid other researchers in applying this method for similar research questions.

      The reviewer is correct to be surprised by the mention of negative solutions, as negative concentrations are physically impossible. We clarify that these are not a result of any structural flaw in our model's equations but are a well-known, although rare, numerical artifact of floating-point arithmetic in computational solvers.

      Our model is constructed using standard mass-action and first-order kinetics, which structurally guarantee non-negativity. However, when a species' concentration approaches the limits of machine precision (i.e., becomes a very small number extremely close to zero), the ODE solver can, in rare instances, numerically undershoot zero, resulting in a small negative value. If this occurs, it can lead to instability in subsequent integration steps.

      This is not a biological phenomenon but a computational one. Therefore, the standard and appropriate procedure, which we follow, is to implement a filter that discards any simulation trajectory where such a numerical instability occurs.
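
      A minimal sketch of such a filter is given below. The data layout (a trajectory as a list of state vectors) and the optional tolerance argument are assumptions made for illustration; the response states only that trajectories showing the instability are discarded.

```python
def is_non_negative(trajectory, tolerance=0.0):
    """True if every state value at every time point is >= -tolerance."""
    return all(value >= -tolerance for state in trajectory for value in state)


def discard_negative_trajectories(trajectories, tolerance=0.0):
    """Keep only trajectories that never undershoot zero beyond the tolerance."""
    return [t for t in trajectories if is_non_negative(t, tolerance)]


# Hypothetical example: the second trajectory undershoots zero and is dropped.
simulated = [
    [[1.0, 0.0], [0.8, 0.2]],
    [[1.0, 0.0], [-1e-9, 1.0]],
]
kept = discard_negative_trajectories(simulated)
```

      Setting a small positive tolerance would retain trajectories with round-off-level undershoot, whereas the default of zero discards any trajectory containing a negative value.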

      (24) The reference listed for the CDK4/6 and CDK2 measurements is Yang et al. [55] in the figure caption, but as Xe et al. in lines 559-561 of the manuscript.

      The text has been updated to match the citation.

      (25) We suggest that the authors revise and cite a previous study conducted by Yamada et al. (Scientific Reports, 2018), which presents an approach to expressing cell heterogeneity as a probability distribution of model parameters.

      Following this suggestion, we have revised the Discussion (see response to comment (21)) to include and discuss Yamada et al. (Scientific Reports, 2018), which models cell heterogeneity as a probability distribution over parameter values.

      (26) In the manuscript, on line 677, the authors state, "This indicates that there is an upper limit to the degree to which parameter sets can influence the qualitative shape of a protein's dynamic within a given network topology." We wish to highlight that this finding may not be particularly surprising. Given that the parameters were randomly determined within a specific range, it is understandable that altering the number of parameter samples would not substantially impact the distribution of model instances.

      We thank the reviewer for this insightful comment, which allows us to clarify the significance of this finding. While it is true that any sampling from a fixed distribution will eventually converge statistically, our conclusion is not about statistics but about the intrinsic, constraining properties of the network's topology. The novelty is not that the distribution converges, but that it converges to a surprisingly limited and finite repertoire of qualitative dynamic behaviours. A complex, non-linear network with nearly 100 free parameters could theoretically generate an almost endless variety of complex dynamics. Our finding is that this specific biological topology acts as a powerful filter, robustly channelling the vast majority of the near-infinite parameter combinations into a small, recurring set of functional outputs (increasing, decreasing, rebound, etc.).

      The reason for this finite limit is mechanistic, as the reviewer's comment prompted us to investigate further. Our parameter sweep already covers an extremely wide, 9-order-of-magnitude range. As we pushed parameter values to even greater extremes in exploratory simulations, we found they do not generate novel, complex dynamic shapes. Instead, they tend to drive network nodes into saturated states, either permanently "on" (maximally activated) or permanently "off" (minimally activated). In both cases, the node becomes unresponsive to upstream perturbations.

      Therefore, further expanding the parameter range would be unlikely to uncover new behavioural categories; it would simply increase the proportion of model instances classified as "no-response." This demonstrates a fundamental principle: the network topology itself enforces an upper limit on its dynamic complexity. We think this inherent robustness is what allows for reliable cellular signalling in the face of constant biological variation. We believe this is a non-trivial finding, and we have revised the Discussion (page 16: lines 664 - 680) to state this conclusion and its implications more clearly.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the editor and reviewers for their thoughtful and constructive feedback. We appreciate that all reviewers recognized the value of our study in linking adult neurogenesis and synaptic plasticity to representational drift in the olfactory system. They described the model as elegant and well-motivated, and agreed that it provides new theoretical insight into how stability and adaptability can coexist in sensory representations. The reviewers also identified areas where our manuscript could be strengthened, and as outlined in our revision plan we have:

      (1) Refined our description of mitral/tufted cell stability and expanded on within-session and across-day variability.

      (2) Substantially expanded the Discussion to compare our modeling assumptions with experimental findings and recent anatomical evidence. Additionally, we have included the limitations of the study and areas for future investigation.

      (3) Included a clearer description of the STDP implementation, plastic synapses, and their functional effects.

      (4) Added a short section outlining model-based predictions that can guide future experiments. We also made minor textual edits to improve precision and flow, including citing prior conceptual work and clarifying model procedures.

      These changes have strengthened both the conceptual framing and technical clarity of the paper. We are grateful for the reviewers’ careful reading and valuable suggestions.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors build a network model of the olfactory bulb and the piriform cortex and use it to run simulations and test their hypotheses. Given the model's settings, the authors observe drift across days in the responses to the same odors of both the mitral/tufted cells, as well as of piriform cortex neurons. When representing the M/T and PCx responses within a lower-dimensional space, the apparent drift is more prominent in the PCx, while the M/T responses appear in comparison more stable. The authors further note that introducing spike-time dependent plasticity (STDP) at bulb synapses involving abGCs slows down the drift in the PCx representations, and further link this to the observation that repeated exposure to the same odorant slows down drift in the piriform cortex.

      The model is clearly explained and relies on several assumptions and observations:

      (1) Random projections of MTC from the olfactory bulb to the piriform cortex, random intra-piriform connectivity, and random piriform to bulb connectivity.

      (2) Higher dimensionality of piriform cortex representations compared to M/T responses, which enables superior decoding of odor identity in the piriform cortex.

      (3) Spike time-dependent plasticity (STDP) at synapses involving the abGCs.

      The authors address an open topical problem, and the model is elegant in its simplicity. I have, however, several major concerns with the hypotheses underlying the model and with its biological plausibility.

      Concerns:

      (1) In their model, the authors propose that MTC remain stable at the population level, despite changes in individual MTC responses.

      The authors cite several experimental studies to support their claims that individual MTC responses to the same odors change (some increase, some decrease) across days. Interpreting the results of these studies must, however, take into account the variability of M/T responses across odor presentation repeats within the same session vs. across sessions. In the Shani-Narkiss et al., Frontiers in Neural Circuits, 2023 study referenced, a large fraction of the variability across days in M/T responses is also observed across repeats to the same odorant in the same session (Shani-Narkiss et al., Figure 4), while the authors have M/T responses in the same session that are highly reproducible. This is an important point to consider and address, since it constrains how much of the variability in M/T responses can be attributed to adult neurogenesis in the olfactory bulb versus to other networks' inhibitory mechanisms, which do not rely on neurogenesis. In the authors' model, the variability in M/T responses observed across days emerges as a result of adult-born neurogenesis, which does not need to be the main source of variability observed in imaging experiments (Shani-Narkiss et al., Figure 4).

      We agree with the reviewer and believe this is a critical discussion point. Indeed, in Shani-Narkiss et al., in Kay and Laurent (1999), and in our own lab, trial-to-trial variability is observed within the same recording session; as the reviewer correctly points out, this cannot be due to neurogenesis. These fluctuations may be trial-to-trial noise, or may reflect dynamics associated with other behaviors such as running (Chockanathan et al., 2021) and decision making (Kay and Laurent, 1999). There is a growing body of literature showing that neural variability in early sensory coding appears to depend on behavioral fluctuations and internal states (Niell and Stryker, for example). The within-session variability in the Shani-Narkiss et al. work may reflect some of these behaviorally relevant features of early olfactory coding, something that our model cannot account for. This is an excellent discussion point, and we have included text (lines 153-157 and lines 321-330) in the manuscript to note this aspect of the data and how one can think of it in the context of our results.

      Another study (Kato et al., Neuron, 2012, Figure 4) reported that mitral cell responses to odors experienced repeatedly across 7 days tend to sparsen and decrease in amplitude systematically, while mitral cell responses to the same odor on day 1 vs. day 7 when the odor is not presented repeatedly in between seem less affected (although the authors also reported a decrease in the CI for this condition). As such, Kato et al. mostly report decreases in mitral cell odor responses with repeated odor exposure at both the individual and population level, and not so much increases and decreases in the individual mitral cell responses, and stability at the population level.

      Thank you for raising this important point regarding the findings of Kato et al. (2012). We agree that their results suggest increased sparsening and stability in M/T cell odor responses with repeated exposure. However, as noted in Yamada et al. (2017), the experimental literature on this question remains mixed. Yamada and colleagues reported a “drastic reorganization of ensemble odor representation” across days and emphasized that “sensory experience does not necessarily cause a major sparsening of the odor response,” explicitly contrasting their findings with those of Kato et al. (2012).

      Our model captures the dynamics observed in Yamada et al. (2017), providing a mechanistic explanation for how significant reorganization can emerge in M/T ensembles despite stable low-dimensional population structure. The Yamada et al. (2017) and Kato et al. (2012) studies differ in nuanced aspects of experimental design (method of head fixation, behavioral paradigm, training, etc.), all of which are known to affect olfactory responses and therefore the degree of sparsity and overlap in population codes. Our model does not include any of these behavioral features that may differentially engage the olfactory circuit and thus affect population responses. Notably, in previous work we highlighted how even simple changes to top-down feedback, a single phenomenological manipulation of functional connectivity in the olfactory circuit that would be broadly engaged by some behavior, could have disparate effects on the degree of sparsity in neural representations over time. In our current model, there is no behavior that would allow us to study the critical features of the neural activity code in the M/T cells. Instead, we focus on one specific aspect, adult neurogenesis, which we can explicitly manipulate in a biologically meaningful way. The reviewer's point, however, is well taken and important, and we have added text to the Discussion (lines 336-344) to highlight the differing experimental outcomes and to clarify how our model aligns with the Yamada et al. results.

      (2) In Figure 1, a set of GCs is killed off, and new GCs are integrated in the network as abGC. Following the elimination of 10% of GCs in the network, new cells are added and randomly assigned synaptic weights between these abGCs and MTC, GCs, SACs, and top-down projections from PCx. This is done for 11 days, during which time all GCs have gone through adult neurogenesis.

      Is the authors' assumption here that across the 11 days, all GCs are being replaced? This seems to depart from the known biology of the olfactory bulb granule cells, i.e., GCs survive for a large fraction of the animal's life.

      Thank you for raising this important point regarding the lifespan of granule cells (GCs). We agree that developmentally born GCs are not fully replaced. Indeed, multiple studies indicate that some developmentally born GCs can survive for very long periods, up to 18-24 months, essentially the lifetime of the animal (Kaplan, 1985; Petreanu & Alvarez-Buylla, 2002). However, the fraction of total GCs that such long-lived GCs constitute remains an open question, in part because of the challenges of measuring the lifetime survival of newborn neurons. What there is consensus on is the significant size of the granule-cell population undergoing continuous turnover through adult neurogenesis (reviewed in Lepousez et al., 2013).

      We should clarify that we do not assume that 100% of the granule cell population turns over in an 11-day period. We use “day” to represent a static epoch over which we can implement plasticity rules across two time scales. Critically, we also randomize the turnover, treating every cell in the GC population as equally likely to be replaced. Prior experimental evidence suggests that some GCs are more likely to persist (possibly as a result of experience; Magavi et al., 2005), which may in some regards make our result on stabilization following repeated sensory exposure more dramatic (as the GCs that show the largest change following STDP may also be the ones that are the most stable, and therefore least likely to turn over). We do not include this in our model as we could not identify a framework for “selecting” which GCs would persist that would not be tautological. The point the reviewer raises is critical, and a discussion of these points is warranted, which we now include in the manuscript (lines 352-361).

      Additionally, there is some evidence that behaviors, such as novelty, can increase the rate of adult neurogenesis (Kamimura et al., 2022; van Praag et al., 1999; Gheusi and Lledo, 2014), suggesting a complex reciprocal relationship between the mechanisms that generate the cells shaping how olfactory stimuli are encoded and the encoding process itself. Our model does not include any of these dynamic features, which represent an additional layer of complexity and may provide an intermediate time scale, one of behavioral selection and action, that is slower than the milliseconds on which spike-time-dependent plasticity happens but faster than the time scale of neurogenesis. We also include this point in the Discussion (lines 352-361).

      Our 11-day simulation, however, is designed to uncover how plasticity across multiple timescales (STDP and adult neurogenesis) at the network level shapes odor representations as multiple rounds of GC turnover occur. Changing the timescale or magnitude of replacement in the simulations (in terms of either days or percent of cells replaced) would affect the degree to which drift happens, but not the phenomenon itself. Additionally, the representational structure in our model at intermediate time points (e.g., days 8-10) would correspond well to scenarios in which some fraction of developmentally born GCs persists in the circuit. Thus, our simulations span a range of possible empirical regimes, from high turnover to partial preservation. We have added discussion to the revised manuscript (line 352-361) clarifying this point and acknowledging the biological heterogeneity in GC lifespans.

      (3) The authors' model relies on several key assumptions: random projections of MTC from the olfactory bulb to the piriform cortex, random intra-piriform connectivity, and random piriform to bulb connectivity. These assumptions are not necessarily accurate, as recent work revealed structure in the projections from the olfactory bulb to the piriform cortex and structure within the piriform cortex connectivity itself (Fink et al., bioRxiv, 2025; Chae et al., Cell, 2022; Zeppilli et al., eLife, 2021).

      How do the results of the model relating adult neurogenesis in the bulb to drift in the piriform cortex representations change when considering an alternative scenario in which the olfactory bulb to piriform and intra-piriform connectivity is not fully distributed and indistinguishable from random, but rather is structured?

      Thank you for pointing us to these important studies. We fully agree with the reviewer that the structure of the olfactory system might not be purely random, but we do not believe these papers contradict the level of abstraction used in our model.

      Zeppilli et al. (2021) map molecularly defined projection neuron subtypes and their preferential targeting of different cortical and subcortical regions, but they do not report any fine-scale topographic organization of bulb → piriform connectivity that would contradict a view of randomly distributed input to piriform cortex. Studies from our lab using retrograde tracers in the bulb show some spatial clustering of piriform cortical neurons whose axons project to the bulb (Padmanabhan et al., 2016, 2019), but these studies do not identify any “functional organization” or structure. Chae et al. (2022) focus on distinct long-range functional loops (mitral ↔ piriform vs tufted ↔ AON) and the differential role of cortical feedback, but again, at the level of cortical regions rather than individual cells and connectivity. Notably, our model does not consider AON.

      Finally, Fink et al. (2025) report a “like-to-like” excitatory connectivity motif within the piriform cortex and an experience-dependent reorganization of inhibitory synapses. As the authors note, “... this like-to-like motif is unlikely to reflect common input from the olfactory bulb”, so it does not conflict with our assumption of broadly random bulb → piriform input. This “like-to-like” motif is reflected in our model by wiring a certain subpopulation of piriform cells. On the other hand, we agree that the experience-dependent changes in inhibitory connectivity within PCx are highly relevant for learning-related plasticity, but they fall outside the scope of our study. We intentionally omitted piriform plasticity to isolate the contributions of adult neurogenesis in the bulb and plasticity acting on adult-born granule cells. Incorporating such cortical plasticity is nonetheless an important direction for future work. We added a discussion (line 395-405) on this important point raised by the reviewer in the revised manuscript.

      (4) I didn't understand the logic of the low-dimensional space analysis for M/T cells and piriform cortex neurons (Figures 2 & 3). In the authors' model, the full-ensemble M/T responses are reorganized over time, presumably due to the adult-born neurogenesis. Analyzing a lower-dimensional projection of the ensemble trajectories reveals a lower degree of re-organization. This is the same for the piriform cortex, but relatively, the piriform ensembles displayed in a low-dimensional embedding appear to drift more compared to the M/T ensembles.

      This analysis triggers a few questions: which representation is relevant for the brain function - the high or the low-dimensional projection? What fraction of response variance is included in the low-dimensional space analysis? How did the authors decide the low-dimensional cut-off? Why does STDP cause more drift in piriform cortex ensembles vs. M/T ensembles? Is this because of the assumed higher dimensionality of the piriform cortex representations compared to the mitral cells?

      Thank you for these thoughtful questions. We clarify the logic and purpose of the low-dimensional analyses and address each point below.

      (1) Which representation is relevant for brain function, the high-dimensional or low-dimensional one?

      We believe both representations are meaningful, with each capturing different aspects of the neural code. The high-dimensional activity reflects the full variability of individual cell responses, while the low-dimensional projection captures the dominant population-level components that downstream areas are most likely to use for readout. We found that the low-dimensional representations are more stable in the bulb than in PCx, suggesting that information is used differentially between the two areas. The bulb provides a stable, sensory-anchored population code that reliably represents odor identity over time, consistent with both electrophysiological and behavioral studies (Nagayama et al., 2004, Chen et al., 2009, Davison and Katz, 2007, Cavaretta et al., 2018). This is consistent with its role as the first stage of information processing in the olfactory system which provides faithful representations that downstream circuits receive. The piriform cortex, by contrast, transforms this stable input into a more flexible representation. Drift in its low-dimensional space may reflect ongoing plasticity (Schoonover et al., Nature, 2021), integration of contextual signals, or higher-dimensional computations characteristic of PCx (Fink et al., bioRxiv, 2025), suggesting its role more as an associative cortex instead of a pure sensory cortex.

      (2) What fraction of variance is included in the low-dimensional space, and how was the cutoff chosen?

      In our simulations, these PCs captured the majority of variance relevant for odor identity (~60–70% for M/T cells and ~55–65% for piriform cortex). We now report these fractions explicitly in Methods (line 937-939).
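As an illustration of how such a variance fraction can be computed, here is a minimal sketch using a singular value decomposition of a mean-centered response matrix. The synthetic data, the matrix shape, the choice of three components, and the function name `variance_explained` are assumptions made for the example; they are not taken from the manuscript.

```python
import numpy as np

def variance_explained(responses, k):
    """Fraction of total variance captured by the top-k principal components.

    responses: array of shape (n_samples, n_cells), e.g. time bins x neurons.
    """
    centered = responses - responses.mean(axis=0, keepdims=True)
    # Squared singular values are proportional to the variance along each PC.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return float(variances[:k].sum() / variances.sum())

# Synthetic population activity: 3 latent signals mixed into 50 cells, plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 50))
responses = latent @ mixing + 0.5 * rng.normal(size=(200, 50))

frac = variance_explained(responses, k=3)
print(f"top-3 PCs explain {frac:.1%} of the variance")
```

Reporting this fraction alongside the chosen number of components lets a reader judge how much of the response variability the low-dimensional analysis leaves out.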

      (3) Why does STDP cause more drift in piriform-cortex ensembles than in M/T ensembles? Does this reflect higher dimensionality in piriform cortex?

      In our model, STDP does not cause more drift in PCx. It actually reduces drift and stabilizes PCx representations relative to the condition without STDP (as shown in Fig. 4C2). STDP has a much smaller effect in the bulb because: (1) M/T cells continue to receive stable odor input from the glomeruli and (2) the low-dimensional M/T representation is already stable even without plasticity. We have edited the manuscript to reiterate this point in both the results and discussion.

      The reviewer is correct that the piriform cortex naturally exhibits more drift than the bulb, and their comment that this is due to its substantially higher representational dimensionality is spot on. The PCx contains many more neurons, receives highly divergent OB → PCx inputs, and has dense recurrent connectivity, all of which create many more degrees of freedom through which representations can drift. Additionally, because individual PCx neurons are sampling from a substantially more diverse combinatorial space of inputs (including feedback to piriform from an array of regions, Illig, 2005, Majak et al., 2004, Chapuis et al., 2013), the “dimensionality” of the population code is likely higher. While STDP stabilizes the dimensions of the PCx representation that are reinforced during plasticity, some residual drift remains because of the large number of orthogonal dimensions available. Additionally, as the reviewer notes, there are some forms of plasticity, such as inhibitory plasticity in PCx, that are not included in the model and that may also have an impact on both the representations and the underlying dimensionality of those representations. We include these points in the discussion (line 381-394).

      (5) Could the authors comment whether STDP at abGC synapses and its impact on decreasing drift represent a new insight, and also put it into context? Several studies (e.g., Lledo, Murthy, Komiyama groups) reported that abGC integrates in the network in an activity-dependent manner, and not randomly, and as such stabilizes the active neuronal responses, which is consistent with the authors' report.

      Related, I couldn't find through the manuscript which synapses involving abGCs they focus on, or what is the relative contribution of the various plastic synapses shown in the cartoon from Figure 4 A1 (circles and triangles).

      We thank the reviewer for raising this question. As the reviewer pointed out, several studies have shown that abGCs integrate into the bulb circuit in an activity-dependent manner. They preferentially form synapses onto mitral/tufted cells that respond to behaviorally important odors. This “selection of surviving cells” is not included in our model. Instead, we use STDP at the synaptic level. This is of course not analogous, but it provides a computational framework wherein the selection of surviving abGCs could be incorporated in future studies. It is perhaps notable that in our large-scale simulations, synaptic changes at the population level may reflect some of this activity-dependent selection.

      To that end, our model provides a new insight and suggests a broader function for adult neurogenesis. For example, when certain odors are reinforced in an activity dependent manner, abGCs born during that period may stabilize the circuits that respond to those odors. The resulting reduction of drift would help keep the representation of those odors stable over time, even while other parts of the circuit continue to change. We now highlight this idea in the Discussion (line 366-373).

      For the second part of the question: in our model, STDP acts on two sets of connections. It applies to the synapses onto abGCs from M/T cells, GC/SAC cells, and PCx neurons. It also applies to the synapses that abGCs project to, including those onto M/T cells and GC/SAC cells. We have clarified this in the revised Methods (line 1001-1004).

      (6) The study would be strengthened, in my opinion, by including specific testable predictions that the authors' models make, which can be further food for thought for experimentalists.

      How does suppression of adult-born neurogenesis in the OB impact the stability of mitral cell odor responses? How about piriform cortex ensembles?

      We appreciate the reviewer’s suggestion and formalize the following two predictions from our model:

      Prediction 1: Suppressing adult neurogenesis will reduce spontaneous representational drift in the PCx. Increasing spike-timing-dependent plasticity during periods of experience with a specific odor will selectively stabilize representations of that odor.

      Prediction 2: Adult neurogenesis will not affect AON representations of odor identity or concentration in the same way that PCx representations are altered and drift.

      We include these two ideas in the discussion as experimentally testable predictions.

      Reviewer #2 (Public review):

      Summary:

      The authors address a critical problem in olfactory coding. It has long been known that adult neurogenesis, specifically in the form of adult-born granule cells that embed into the existing inhibitory networks on the olfactory bulb, can potentially alter the responses of Mitral/Tufted neurons that project activity to the Piriform Cortex and to other areas of the brain. Fundamentally, it would seem that these granule cells could alter the stability of neural codes in the OB over time. The authors develop a spiking network model to explore how stability can be achieved both in the OB over time and in the PC, which receives inputs. The model recapitulates published activity recordings of M/T cells and shows how activity in different M/T cells from the same glomerulus shifts over time in ways that, in spite of the shift, preserve population/glomerular level codes. However, these different M/T cells fan out onto different pyramidal cells of the PC, which gives rise to instability at that level. STDP then, is necessary to maintain stability at the PC level as long as odor environments remain constant. These results may also apply to a similar neurogenesis-based change in the Dentate Gyrus, which generates instability in CA1/3 regions of the hippocampus.

      Strengths:

      A robust network model that untangles important, seemingly contradictory mechanisms that underlie olfactory coding.

      Weaknesses:

      The work is a significant contribution to understanding olfactory coding. But the manuscript would benefit from a brief discussion of why neurogenesis occurs in the first place - e.g., injury, ongoing needs for plasticity, and adapting to turnover of ORNs. There is literature on this topic. It seems counterintuitive to have a process in the MOB (and for that matter in the DG) that potentially disrupts the ability to generate stable codes both in the MOB and PC, and in particular a disruption that requires two different mechanisms - multiple M/T cells per glomerulus in the MOB and STDP in the PC - to counteract.

      We appreciate the reviewer’s suggestion and added discussion on this point in the revised manuscript (line 431-435).

      Given that neurogenesis has an important function, and a mechanism is in place to compensate for it in the MOB, why would it then be disrupted in fan-out projections to the PC? The answer may lie in the need for fan-out projections so that pyramidal neurons in the PC can combinatorially represent many different inputs from the MOB. So something like STDP would be needed to maintain stability in the face of the need for this coding strategy.

      This kind of discussion, or something like it, would help readers understand why these mechanisms occur in the first place. It is interesting that PC stability requires that odor environments be stable, and that this stability drives PC representational stability. This result suggests experimental work to test this hypothesis. As such, it is a novel outcome of the research.

      We agree with the reviewer. The fan-out from the bulb to the piriform cortex is essential for the combinatorial coding that allows PCx neurons to represent many odor features and mixtures. This architecture gives the piriform cortex great coding capacity, but it also makes the system sensitive to small changes in its inputs. As a result, drift that originates in the bulb can spread more easily in PCx. A stabilizing mechanism is therefore needed downstream. In our model, STDP provides this stabilization by reinforcing the dimensions that carry meaningful odor structure. This allows the piriform cortex to keep a stable population code even when its inputs change over time. Neurogenesis supplies the flexibility, the fan-out supplies the expressive power, and STDP supplies the stability. All three elements work together to support a system that must recognize odors reliably while still adapting to new sensory experiences. We have added discussion on this point in the revised manuscript (line 395-405).

      Reviewer #3 (Public review):

      Summary

      The authors set out to explore the potential relationship between adult neurogenesis of inhibitory granule cells in the olfactory bulb and cumulative changes over days in odor-evoked spiking activity (representational drift) in the olfactory stream. They developed a richly detailed spiking neuronal network model based on Izhikevich (2003), allowing them to capture the diversity of spiking behaviors of multiple neuron types within the olfactory system. This model recapitulates the circuit organization of both the main olfactory bulb (MOB) and the piriform cortex (PCx), including connections between the two (both feedforward and corticofugal). Adult neurogenesis was captured by shuffling the weights of the model's granule cells, preserving the distribution of synaptic weights. Shuffling of granule cell connectivity resulted in cumulative changes in stimulus-evoked spiking of the model's M/T cells. Individual M/T cell tuning changed with time, and ensemble correlations dropped sharply over the temporal interval examined (long enough that almost all granule cells in the model had shuffled their weights).

      Interestingly, these changes in responsiveness did not disrupt low-dimensional stability of olfactory representations: when projected into a low-dimensional subspace, population vector correlations in this subspace remained elevated across the temporal interval examined. Importantly, in the model's downstream piriform layer, this was not the case. There, shuffled GC connectivity in the bulb resulted in a complete shift in piriform odor coding, including for low-dimensional projections. This is in contrast to what the model exhibited in the M/T input layer. Interestingly, these changes in PCx extended to the geometrical structure of the odor representations themselves. Finally, the authors examined the effect of experience on representational drift. Using an STDP rule, they allowed the inputs to and outputs from adult-born granule cells to change during repeated presentations of the same odor. This stabilized stimulus-evoked activity in the model's piriform layer.

      Strengths

      This paper suggests a link between adult neurogenesis in the olfactory bulb and representational drift in the piriform cortex. Using an elegant spiking network that faithfully recapitulates the basic physiological properties of the olfactory stream, the authors tackle a question of longstanding interest in a creative and interesting manner. As a purely theoretical study of drift, this paper presents important insights: synaptic turnover of recurrent inhibitory input can destabilize stimulus-evoked activity, but only to a degree, as representations in the bulb (the model's recurrent input layer) retain their basic geometrical form. However, this destabilized input results in profound drift in the model's second (piriform) layer, where both the tuning of individual neurons and the layer's overall functional geometry are restructured. This is a useful and important idea in the drift field, and to my knowledge, it is novel. The bulb is not the only setting where inhibitory synapses exhibit turnover (whether through neurogenesis or synaptic dynamics), and so this exploration of the consequences of such plasticity on drift is valuable. The authors also elegantly explore a potential mechanism to stabilize representations through experience, using an STDP rule specific to the inhibitory neurons in the input layer. This has an interesting parallel with other recent theoretical work on drift in the piriform (Morales et al., 2025 PNAS), in which STDP in the piriform layer was also shown to stabilize stimulus representations there. It is fascinating to see that this same rule also stabilizes piriform representations when implemented in the bulb's granule cells.

      The authors also provide a thoughtful discussion regarding the differential roles of mitral and tufted cells in drift in piriform and AON and the potential roles of neurogenesis in archicortex.

      In general, this paper puts an important and much-needed spotlight on the role of neurogenesis and inhibitory plasticity in drift. In this light, it is a valuable and exciting contribution to the drift conversation.

      We appreciate the reviewer’s comment and thank them for their thoughtful feedback.

      Weaknesses

      I have one major, general concern that I think must be addressed to permit proper interpretation of the results.

      I worry that the authors' model may confuse thinking on drift in the olfactory system, because of differences in the behavior of their model from known features of the olfactory bulb. In their model, the tuning of individual bulbar neurons drifts over time.

      This is inconsistent with the experimental literature on the stability of odor-evoked activity in the olfactory bulb.

      In a foundational paper, Bhalla & Bower (1997) recorded from mitral and tufted cells in the olfactory bulb of freely moving rats and measured the odor tuning of well-isolated single units across a five-day interval. They found that the tuning of a single cell was quite variable within a day, across trials, but that this variability did not increase with time. Indeed, their measure of response similarity was equivalent within and across days. In what now reads as a prescient anticipation of the drift phenomenon, Bhalla and Bower concluded: "it is clear, at least over five days, that the cell is bounded in how it can respond. If this were not the case, we would expect a continual increase in relative response variability over multiple days (the equivalent of response drift). Instead, the degree of variability in the responses of single cells is stable over the length of time we have recorded." Thus, even at the level of single cells, this early paper argues that the bulb is stable.

      This basic result has since been replicated by several groups. Kato et al. (2012) used chronic two-photon calcium imaging of mitral cells in awake, head-fixed mice and likewise found that, while odor responses could be modulated by recent experience (odor exposure leading to transient adaptation), the underlying tuning of individual cells remained stable. While experience altered mitral cell odor responses, those responses recovered to their original form at the level of the single neuron, maintaining tuning over extended periods (two months). More recently, the Mizrahi lab (Shani-Narkiss et al., 2023) extended chronic imaging to six months, reporting that single-cell odor tuning curves remained highly similar over this period. These studies reinforce Bhalla and Bower's original conclusion: despite trial-to-trial variability, olfactory bulb neurons maintain stable odor tuning across extended timescales, with plasticity emerging primarily in response to experience. (The Yamada et al., 2017 paper, which the authors here cite, is not an appropriate comparison. In Yamada, mice were exposed daily to odor. Therefore, the changes observed in Yamada are a function of odor experience, not of time alone. Yamada does not include data in which the tuning of bulb neurons is measured in the absence of intervening experience.)

      Therefore, a model that relies on instability in the tuning of bulbar neurons risks giving the incorrect impression that the bulb drifts over time. This difference should be explicitly addressed by the authors to avoid any potential confusion. Perhaps the best course of action would be to fit their model to Mizrahi's data, should this data be available, and see if, when constrained by empirical observation, the model still produces drift in piriform. If so, this would dramatically strengthen the paper. If this is not feasible, then I suggest being very explicit about this difference between the behavior of the model and what has been shown empirically. I appreciate that in the data there is modest drift (e.g., Shani-Narkiss' Figure 8C), but the changes reported there really are modest compared to what is exhibited by the model. A compromise would be to simply apply these metrics to the model and match the model's similarity to the Shani-Narkiss data. Then the authors could ask what effect this has on drift in piriform.

      The risk here is that people will conclude from this paper that drift in piriform may simply be inherited from instability in the bulb. This view is inconsistent with what has been documented empirically, and so great care is warranted to avoid conveying that impression to the community.

      We thank the reviewer for highlighting this important issue. We agree that the interpretation of our model requires care to avoid implying that the olfactory bulb exhibits spontaneous drift. As the reviewer points out, the empirical literature shows that M/T-cell tuning is highly stable for infrequently experienced odors, but can change with daily, persistent odor exposure (e.g., Kato et al., 2012; Yamada et al., 2017).

      We thank the reviewer for highlighting the Bhalla and Bower paper, as it is foundational and actually raises a number of interesting and important points. As the authors noted, there was significant variability in trial-to-trial responses over sessions and days in single neurons. This is likely due to ongoing dynamics (Laurent, 1999), the impact of behaviorally relevant top-down feedback (Chen and Padmanabhan, 2022), decision making (Kay and Laurent, 1999), and an array of factors that our model does not include. In that manuscript, the authors note “the variability of the same neuron recorded over different days…was not statistically different from the within day comparisons.” While these results appear prima facie to be different from our results, there are several reasons why this may not be the case.

      First, different metrics are used for measuring neuronal stability, which may contribute to some of the differences. Second, and perhaps more importantly and interestingly, the authors in that study noted the significant trial-to-trial variability within day, which is not present in our study because our model has none of the richness of behavior that Bhalla and Bower found in the freely behaving rat. This variability within day (which is much higher than what we report) would reduce the impact of drift across days - a result that would complicate how plasticity across multiple timescales occurs. We thank the reviewer for the insights on this critical study and include these points in our discussion (line 321-330).

      Neural responses to odors are incredibly variable across different time scales (Padmanabhan and Urban 2010, Angelo et al 2011, Kapoor and Urban 2006, Friedrich and Laurent, 2001, Smear et al 2011, Wesson et al 2008). In our model, none of this selection of survival related to behavior is included, nor are there specific rules about which synapses may be preferentially strengthened (due to neuromodulation corresponding to behavioral choice and reinforcement learning). Instead, we aimed to recapitulate the experimental design of a few studies (Kato et al 2012, Yamada et al, 2017) to understand how neurogenesis and drift are related. Over the simulated 10 days, the odor is presented every day, and the network is otherwise frozen between sessions, meaning the model lacks mechanisms that would normally support recovery during intervals without odor exposure. Under these conditions, adult neurogenesis effectively interacts with repeated experience, producing gradual changes in individual M/T-cell tuning. Thus, our results should be interpreted as modeling experience-dependent changes over the timescale of neurogenesis, not as evidence for spontaneous drift in the bulb. We now state this explicitly in the Discussion to prevent confusion and expand the discussion to incorporate some of these critical ideas (line 321-330).

      Major comments (all related to the above point)

      (1) Lines 146-168: The authors find in their model that "individual M/T cells changed their responses to the same odor across days due to adult-neurogenesis, with some cells decreasing the firing rate responses (Fig.2A1 top) while other cells increased the magnitude of their responses (Fig. 2A2 bottom, Fig. S2)" they also report a significant decrease in the "full ensemble correlation" in their model over time. They claim that these changes in individual cell tuning are "similar to what has been observed by others using calcium imaging of M/T cell activity (Kato et al., 2012 and Yamada et al., 2017)" and that the decrease in full ensemble correlation is "consistent with experimental observations (Yamada et al., 2017)." However, the conditions of the Kato and Yamada experiments that demonstrate response change are not comparable here, as odors were presented daily to the animals in these experiments. Therefore, the changes in odor tuning found in the Kato and Yamada papers (Kato Figure 4D; Yamada Figure 3E) are a function of accumulated experience with odor. This distinction is crucial because experience-induced changes reflect an underlying learning process, whereas changes that simply accumulate over time are more consistent with drift. The conditions of their model are more similar to those employed in other experiments described in Kato et al. 2012 (Figure 6C) as well as Shani-Narkiss et al. (2023), in which bulb tuning is measured not as a function of intervening experience, but rather as a function of time (Kato's "recovery" experiment). What is found in Kato is that even across two months, the tuning of individual mitral cells is stable. What alters tuning is experience with odor, the core finding of both the Kato et al., 2012 paper and also Yamada et al., 2017. It is crucial that this is clarified in the text.

      We thank the reviewer. As the issue raised here is related to the previous comment, we have clarified this in the revised text to avoid any misleading comparison. We now specify which aspects of our computational model map onto experimental studies and which aspects we cannot recapitulate, and therefore where our comparisons are limited.

      (2) The authors show that in a reduced-space correlation metric, the correlation of low-dimensional trajectories "remained high across all days"..."consistent with a recent experimental study" (Shani-Narkiss et al., 2023). It is true that in the Shani-Narkiss paper, a consistent low-dimensional response is found across days (t-SNE analysis in Shani-Narkiss Figure 7B). However, the key difference between the Shani-Narkiss data and the results reported here is that Shani-Narkiss also observed relative stability in the native space (Shani-Narkiss Figure 8). They conclude that they "find a relatively stable response of single neurons to odors in either awake or anesthetized states and a relatively stable representation of odors by the MC population as a whole (Figures 6-8; Bhalla and Bower, 1997)." This should be better clarified in the text.

      We agree with the reviewer that some of the cells in Shani-Narkiss Figure 8B showed relatively stable responses (while others did not). However, there is a clear monotonic increase in the “Average differences” over time, from “Same day” to “1 month” to “6 month”, as quantified in their Figure 8B. Although the authors concluded that they "find a relatively stable response of single neurons”, we would argue that their data also provide evidence for what we would term “relatively unstable responses”, as found in our model. Nevertheless, per the reviewer’s suggestion, we now clarify this in the text (line 194-197).

      (3) In the discussion, the authors state that "In the MOB, individual M/T cells exhibited variable odor responses akin to gain control, altering their firing rate magnitudes over time. This is consistent with earlier experimental studies using calcium-imaging." (L3146). Again, I disagree that these data are consistent with what has been published thus far. Changes in gain would have resulted in increased variability across days in the Bhalla data. Moreover, changes in gain would be captured by Kato's change index ("To quantify the changes in mitral cell responses, we calculated the change index (CI) for each responsive mitral cell-odor pair on each trial (trial X) of a given day as (response on trial X - the initial response on day 1)/(response on trial X + the initial response on day 1). Thus, CI ranges from −1 to 1, where a value of −1 represents a complete loss of response, 1 represents the emergence of a new response, and 0 represents no change." Kato et al.). This index will capture changes in gain. However, as shown in Figure 4D (red traces), Figure 6C (Recovery and Odor set B during odor set A experience and vice versa), the change index is either zero or near zero. If the authors wish to claim that their model is consistent with these data, they should also compute Kato's change index for M/T odor-cell pairs in their model and show that it also remains at 0 over time, absent experience.
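The change index quoted above can be computed directly from its definition. The following is a minimal sketch, assuming non-negative response magnitudes; the function name `change_index`, the example values, and the choice to return NaN when both responses are zero are illustrative assumptions and do not come from Kato et al. or from the manuscript.

```python
import numpy as np

def change_index(response, initial):
    """Change index: (response - initial) / (response + initial).

    response: response on the trial of interest (scalar or array).
    initial:  initial response on day 1 for the same cell-odor pair.
    Returns NaN where both responses are zero (index undefined).
    """
    response = np.atleast_1d(np.asarray(response, dtype=float))
    initial = np.atleast_1d(np.asarray(initial, dtype=float))
    numerator = response - initial
    denominator = response + initial
    ci = np.full(denominator.shape, np.nan)
    np.divide(numerator, denominator, out=ci, where=denominator != 0)
    return ci

# Hypothetical cell-odor pairs: unchanged, newly emerged, and lost responses.
day1 = np.array([2.0, 0.0, 4.0])
later = np.array([2.0, 3.0, 0.0])
ci = change_index(later, day1)
print(ci)
```

Applied to model M/T cell-odor pairs across simulated days, an index that stays near zero would indicate stable tuning, while values drifting toward -1 or 1 would indicate lost or newly emerging responses.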

      We appreciate the reviewer’s suggestion and edited the text to make it more accurate (line 319-320).
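
      The change index quoted by the reviewer can be written out directly, which makes it easy to see how a pure change in gain would appear in it. The sketch below is a minimal illustration that assumes non-negative response magnitudes (the condition under which the index stays between −1 and 1); the function names are hypothetical and do not come from Kato et al. or from the model under review.

```python
def change_index(response_trial_x: float, initial_response_day1: float) -> float:
    """CI = (R_x - R_1) / (R_x + R_1), following the definition quoted above.

    Assumes both responses are non-negative, so that CI lies in [-1, 1].
    """
    denominator = response_trial_x + initial_response_day1
    if denominator == 0:
        raise ValueError("change index is undefined when both responses are zero")
    return (response_trial_x - initial_response_day1) / denominator


def change_index_for_gain(gain: float) -> float:
    """CI when the response is the day-1 response scaled by `gain`.

    The day-1 response cancels out, leaving (gain - 1) / (gain + 1).
    """
    return change_index(gain * 1.0, 1.0)


print(change_index(0.0, 2.0))  # complete loss of response: -1.0
print(change_index(2.0, 0.0))  # emergence of a new response: 1.0
print(change_index(2.0, 2.0))  # no change: 0.0
print(change_index_for_gain(2.0))  # a doubling of gain gives 1/3
```

      Under this definition, any gain factor other than 1 moves the index away from zero, which is the basis of the reviewer's argument that gain changes would be visible in it.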

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      (1) Line 28 "a graduate alteration in sensory perception". We do not know if drift results in changes in perception. If anything, behavioral evidence suggests that perception remains stable in spite of drift. For example, in Driscoll et al. (2017) mice are able to successfully navigate a virtual T maze despite drift, and in Schoonover et al. (2021), mice maintain aversive responses following fear conditioning, despite drift in the piriform. Finally, spatial navigation appears unimpaired despite pronounced drift in the hippocampus (e.g., Climer et al., 2025). It would be more appropriate to say "stimulus-evoked activity patterns" than "sensory perception" or other words that refer to neuronal activity rather than cognition or behavior.

      We edited the text to make it more accurate per the reviewer’s suggestion (line 27).

      (2) In the introduction, the authors state: "This representational drift has led to the hypothesis that PCx, rather than being a primary sensory area, may be more like an association cortical region." (L76-78). However, the hypothesis that PCx operates as an association cortex comes originally from Haberly's work and thinking (e.g., Haberly and Bower, 1984, elaborated in extensive detail in Haberly, 2001). I think it would be appropriate to acknowledge that here.

      We added the references to acknowledge this, per the reviewer’s suggestion (line 77).

      (3) In the methods, the authors elegantly describe how they induce neurogenesis in their model using weight reshuffling (L805-814). I think it could really help the reader understand the model if this idea were also included in the results section. As the results section currently reads, it seems as if their model implemented neurogenesis in a different fashion: "To do this, following elimination of 10% of the GCs in the network, we added new cells and randomly assigned synaptic weights between these abGCs and M/Ts". I appreciate that in their model, shuffling all the weights of a given GC randomly is akin to "elimination", but I feel like at first blush the results section risks giving an impression a bit different than that actually used in the model.

      We edited the text to make it more accurate per the reviewer’s suggestion (line 110-112).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This work develops a simple, rapid, low-cost methodology for assembling combinatorially complete microbial consortia using basic laboratory equipment. The motivation behind this work is to make the study of microbial community interactions more accessible to laboratories that lack specialized equipment such as robotic liquid handlers or microfluidic devices. The method was tested on a library of Pseudomonas aeruginosa strains to demonstrate its practicality and effectiveness. It provided a means to explore the complex functional interactions within microbial communities and identify optimal consortia for specific functions, such as biomass production.

      The primary strength of this manuscript lies in its accessibility and practicality. The method proposed by the authors allows any laboratory with standard equipment, such as multichannel pipettes and 96-well plates, to readily construct all possible combinations of microbial consortia from a given set of species. This greatly enhances access to full factorial designs, which were previously limited to labs with advanced technology.

      Another strength of the manuscript is the measurement and analysis of the biomass of all possible combinations of 8 strains of P. aeruginosa. This analysis provides a concrete example of how the authors' new methodology can be used to identify the best-performing communities and map pairwise and higher-order functional interactions.

      Notably, the authors do exceptionally well in providing a thorough description of the methodology, including detailed protocols and an R script for customizing the method to different experimental needs. This enhances the reproducibility and adaptability of the methodology, making it a valuable resource for researchers wishing to adopt this methodology.

      We thank the reviewer for their thoughtful comments and positive assessment of our work. Below we detail the changes we have introduced in the manuscript to clarify issues raised by the reviewer.

      While the methodology is robust and well-presented, there are some limitations that should be acknowledged more thoroughly. First, the method's scalability is an important factor. The authors indicate that it should be effective for up to 10-12 species, but there is no discussion of what sets this scale: time, amount of labor, consumables, the likelihood of error, sample volume, etc.

      The 10-12 species estimation is based on our own experience implementing the protocol, and set primarily by time, labor, and consumables (as rightly pointed out by the reviewer) rather than conceptual limitations of the approach. We have added clarifications in the Discussion (lines 401-405) regarding these scalability-limiting factors.

      Second, this methodology is tailored to construct communities where the abundance of each strain is identical in each combination. Therefore, combinations with a different number of strains also differ in the total initial amount of microbial cells. In addition, variations in the initial proportions of the same set of strains cannot be readily explored.

      Note that the “density homogenization” step is optional and could be skipped entirely, which would result in the same species being present at variable densities across consortia: specifically, skipping this step would make the density of a species in a consortium inversely proportional to the number of species in that consortium. Further variations in initial abundance could be explored by treating the same strain at two (or more) starting abundances as distinct inputs of the protocol, though this would naturally increase the number of combinations to test.

      We have included a paragraph in the Discussion (lines 416-423) describing how we can, in principle, extend our protocol to explore abundance effects.
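
      The two points in this response can be illustrated with a little arithmetic. The sketch below assumes that consortia are formed by pooling equal volumes of monocultures at a common stock density, and that each starting abundance of a strain is counted as a separate input; both are simplifying assumptions made for illustration, and the function names are hypothetical rather than part of the published R script.

```python
def density_without_homogenization(stock_density: float, n_species: int) -> float:
    """Per-species density in a consortium of n_species when the optional
    density homogenization step is skipped (assumes equal-volume pooling)."""
    if n_species < 1:
        raise ValueError("a consortium needs at least one species")
    return stock_density / n_species


def n_combinations_with_abundance_levels(n_strains: int, levels_per_strain: int) -> int:
    """Number of combinations when every (strain, starting abundance) pair
    is treated as a distinct input of the protocol."""
    n_inputs = n_strains * levels_per_strain
    return 2 ** n_inputs


for k in (1, 2, 4, 8):
    # Density falls as 1/k: 1.0, 0.5, 0.25, 0.125 of the stock density.
    print(k, density_without_homogenization(1.0, k))

print(n_combinations_with_abundance_levels(8, 1))  # 256
print(n_combinations_with_abundance_levels(8, 2))  # 65536
```

      Doubling the number of abundance levels squares the number of combinations in this counting, which is why the extension is described as possible in principle but costly.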

      Third, the manuscript only discusses how to construct the combinations, and not how to assay them afterward (e.g. for community function, interspecific interactions, etc.). While details on how to achieve these goals are clearly outside the scope of this work, the use of biomass as an example function may obfuscate this caveat, which should be stated more explicitly.

      We agree that the manuscript focuses exclusively on the construction of microbial communities and does not address how these communities should be assayed afterward. This is an intentional scope decision. The proposed protocol is fully compatible with a wide range of functional, interaction-based, or omics-based assays. Absorbance is mentioned as an illustrative example of a possible readout, rather than as a recommended or exclusive parameter. We have revised the text to explicitly state that the assessment of community function or interspecific interactions lies outside the scope of this work and must be tailored to the specific biological question being addressed.

      Reviewer #1 (Recommendations for the authors):

      A few specific technical notes and notes about clarity:

      (1) It may be worth being more explicit about how to produce replicates. For example, producing technical replicates by inoculating multiple times from the same set of combinations, while biological replicates require making the combinations multiple times.

      We have updated the main text to clarify this point (line 780-781).

      (2) Figure 2C: May be worth adding some context to these performance numbers. What are typical accuracies? What would they be in a liquid handler?

      Assessing typical accuracies is nuanced since the error depends not only on the assembly steps, but also on potential intrinsic variation of the specific community function being tested and the method used to quantify it. One of the main reasons for including the experiment using colorant combinations was precisely to minimize these other sources of variation. In this experiment, we find that the error we quantify is consistent with cumulative pipetting variation (as a reference, a typical lab micropipette has an error of 0.5-1%). This is now explicitly mentioned in the manuscript.

      (3) Figure 5A: I realize it is unlikely that strains go extinct in these experiments. But it is still worth clarifying that the number of strains is the number inoculated, rather than the one present at the time of measurement.

      We updated the caption of Figure 5A as recommended by the reviewer.

      (4) Figure 5B: I realize this is just for illustration purposes, but you should provide more information about the magnitude of the difference in performance of these combinations and the confidence in their ranking (or variability in performance across replicates).

      Following this suggestion, we have added a paragraph where we report the variation across replicates for the highest-performing consortia (lines 318-323). Indeed, while variation across replicates is small, it is enough to produce an overlap between the confidence intervals of the function of some of the highest-performing consortia. This is now explicitly acknowledged in the manuscript.

      (5) Figure 5C: I believe the bold black lines indicate the combinations shown in panel D, but that is not explicitly stated.

      We have updated the caption of Figure 5C.

      Reviewer #2 (Public review):

      A simple and effective method for combinatorial assembly of microbes in synthetic communities of <12 species.

      Overall, this manuscript is a useful contribution. The efficiency of the method and clarity of the presentation is a strength. It is well-written and easy to follow. The figures are great, the pedagogical narrative is crisp. I can imagine the method being used in lots of other contexts too.

      The authors could better clarify what HOIs mean. They could address challenges with assaying community function. However, neither of these “weaknesses” affects the primary goal of the paper which is methodological.

      We thank the reviewer for the positive assessment. With respect to HOIs, we recognize that defining and quantifying them is a non-trivial subject within the broader field of microbial ecology (see e.g. ref. 24 within the manuscript). Since our aim with this manuscript is methodological, as the reviewer notes, here we have done our best to avoid introducing new or ambiguous definitions. For this reason, we simply adopt a definition given in previous works (including refs. 10, 19, 24, 29, 37, and 38 in the manuscript), where the context-dependence of pairwise interaction terms is taken as a signature of HOIs. With respect to the challenges in assaying community function, please see our responses below.

      Reviewer #2 (Recommendations for the authors):

      Overall, this manuscript is a useful contribution, I appreciate the authors taking the time to write it up! I have a few relatively minor comments.

      (1) It would be nice in the introduction to address why we might want the full factorial construction of communities in the first place. This is an especially relevant question in light of the authors' 2023 Nat E&E paper where they showed that the function of communities can often be learned even when only a fraction of all possible communities is measured. This is addressed in part in the paragraph on line 34, but I think it might be worth expanding a bit given the focus on the paper.

      We sincerely appreciate the reviewer’s feedback. In fact, one of the reasons that make full factorial construction desirable is precisely to test theoretical and computational models of community function, including (but not only) the statistical models developed in our 2023 Nature E&E paper. In that work, we showed that low-order models can explain a substantial fraction of the variation in community function in previously-published datasets, but we also predict that the same models could fail under complex structures of microbial interactions (e.g., strong high-order interactions). The protocol we present here enables the empirical quantification of such interactions, making this prediction (and others) directly testable. We have included that clarification in the revised manuscript (lines 56-58).

      (2) Around line 74, I think it is worth mentioning that even this elegant design will face insurmountable practical challenges (time, liquid handling operations, number of plates will explode) for full factorial design with 20, 30, 40 species or more. This is relevant for some very complex synthetic consortia that some microbiome groups are constructing (e.g. hCom2 from Huang/Fischbach groups) https://www.sciencedirect.com/science/article/pii/S0092867422009904.

      We agree with the reviewer that full factorial designs become impractical for very large species pools. These limits are now more clearly mentioned in the revised manuscript. We refer the reviewer to our response to comment #1 by Reviewer 1 for further details.
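
      The scaling behind this exchange is simply that a full factorial design over n species contains 2^n combinations. The sketch below counts combinations and, as a rough lower bound, the number of 96-well plates needed if every well held exactly one consortium; that plate-filling assumption is ours, not a description of the actual plate layout in the protocol. It also shows one common way to read a consortium as a binary number, with one bit per species; the helper names are hypothetical.

```python
def n_consortia(n_species: int) -> int:
    """Number of combinations in a full factorial design, including the
    empty (no-species) combination."""
    return 2 ** n_species


def plates_needed(n_species: int, wells_per_plate: int = 96) -> int:
    """Lower bound on plate count, assuming one consortium per well."""
    total = n_consortia(n_species)
    return -(-total // wells_per_plate)  # integer ceiling division


def consortium_members(index: int, species: list[str]) -> list[str]:
    """Decode a consortium index: bit k set means species[k] is present."""
    return [name for bit, name in enumerate(species) if (index >> bit) & 1]


for n in (8, 12, 20, 30):
    print(n, n_consortia(n), plates_needed(n))

# Index 5 is 0b101, so the first and third species are present.
print(consortium_members(5, ["A", "B", "C"]))  # ['A', 'C']
```

      With 8 species this gives 256 combinations on at least 3 plates, and with 12 species 4,096 combinations on at least 43 plates, whereas 20 species already require more than a million wells.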

      (3) The binary construction is a really nice clean way to explain the protocol. Appreciate the pedagogy!

      We thank the reviewer for the appreciation.

      (4) In the experiment with Pseudomonas strains the consortia are grown in LB. This medium will support growth to relatively high OD (>1). At these densities, the change in OD with density is almost certainly not linear with cell density, and this nonlinearity likely depends on strain identity. In this case, the assumption of additivity may not hold. As a result, some of the observed "interactions" may simply be non-linearity in the assay and not the abundance of bacteria in the communities. Of course, this does not affect the assembly protocol in any way, but it does complicate the interpretation of interactions via this assay. I think this is worth pointing out since other researchers may have to think carefully about the assay they use when constructing these synthetic consortia. I think in this methods paper it is important to emphasize this so other researchers do not mistakenly identify interactions due to issues with the assay.

      We thank the reviewer for pointing out this important aspect. In our experiment, we use Abs<sub>600</sub> simply as an example of a measurable community-level function. The reviewer is absolutely correct in that mapping absorbance to biomass is nuanced at large OD values, where this relationship becomes non-linear. While this is not an issue from the perspective of the protocol itself, it is indeed an important consideration for users who may want to obtain reliable quantifications of biomass. We have updated the manuscript to explicitly mention this potential issue (lines 307-313). We have also emphasized the fact that our focus on Abs<sub>600</sub> is strictly for illustrative purposes, and we have removed all instances where a direct mapping from Abs<sub>600</sub> to biomass was implied in the text.

      (5) Subtle point regarding HOIs. HOI (or pairwise) statistical interactions need not quantitatively be the same as interactions in a Lotka-Volterra sense. I realize the authors do not explicitly use the term "interaction" in a gLV model formalism but this is how the majority of readers will interpret this term. I believe it is a research question as to how pairwise gLV interactions manifest themselves in terms of functional interactions. For example, a purely pairwise LV model could easily have HOI "functional interactions" if the function is total abundance since abundances depend nonlinearly on LV interactions. I think this part of the manuscript could be confusing to readers for this reason. I think the term "functional interaction" really helps with this issue, but just asking the authors to make sure this is clear.

      I say this because ref: 37 is focused on HOIs in an LV sense. Here, as the authors are aware, they are computing statistical "interactions" in the sense of epistasis. Given that they are computing this epistasis averaged across all community compositions a more appropriate citation might be [https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004771] where the same quantity is computed in a protein context.

      We thank the reviewer for pointing out this important issue. Indeed, we use the term “interaction” in a statistical sense (as the deviation of the observed community function from a null, additive expectation) rather than in a Lotka-Volterra sense. We agree that the reference suggested by the reviewer is more appropriate in this context. We have updated the reference list accordingly.
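
      To make the statistical sense of "interaction" concrete, the sketch below computes a pairwise term as the deviation from an additive expectation, evaluated in a given background community. This is the standard epistasis-style formula and is offered as an assumption about the general approach, not as the exact quantity computed in the manuscript; the function values are invented for illustration and the names are hypothetical.

```python
def pairwise_interaction(function, i, j, background=frozenset()):
    """Deviation of the pair (i, j) from additivity in a given background B:

        eps_ij(B) = F(B + i + j) - F(B + i) - F(B + j) + F(B)

    `function` maps a frozenset of species names to a measured value.
    """
    b = frozenset(background)
    if i == j or i in b or j in b:
        raise ValueError("i and j must be distinct and absent from the background")
    return function[b | {i, j}] - function[b | {i}] - function[b | {j}] + function[b]


# Hypothetical community function values for three species (not real data).
F = {
    frozenset(): 0.0,
    frozenset("A"): 0.5,
    frozenset("B"): 0.25,
    frozenset("C"): 0.25,
    frozenset("AB"): 0.5,
    frozenset("AC"): 0.75,
    frozenset("BC"): 0.5,
    frozenset("ABC"): 1.0,
}

print(pairwise_interaction(F, "A", "B"))                   # -0.25
print(pairwise_interaction(F, "A", "B", background="C"))   # 0.0
```

      In this toy example the A-B term changes from negative to zero once C is present. That kind of dependence of a pairwise term on its background is what the authors describe elsewhere in their response as a signature of higher-order interactions.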

      (6) Figure 5G - a little hard to see. Any way to show this data more clearly? It looks like all interactions have a mean of 0 because of the way the data are presented.

      The reviewer is indeed correct in that, as defined, the interactions that we quantify are background-dependent, and their average across backgrounds lies near zero for all species. More than an issue with the representation, we think that this is an important empirical observation: it indicates that the same species pair may interact positively or negatively depending on its ecological context. We believe that the current representation is most appropriate for making this clear, but we would be open to discussing alternatives if the reviewer had a specific suggestion in mind.

      Reviewer #3 (Public review):

      The authors developed a useful methodology for generating all combinations of multiple reagents using standard lab equipment. This methodology has clear uses for studying microbial ecology as they demonstrated. The methodology will likely be useful for other types of experiments that require exhaustive testing of all possible combinations of a given set of reagents (e.g., drug-drug antagonism and synergy).

      The authors provided a useful R script that generates a detailed experimental protocol for building the desired combination from any number of reagents. The produced document is useful and has clear instructions. The output of the computer script will be strengthened if graphical output is also provided (similar to the one provided in Figure 1C).

      The authors show that the error rate of the method doesn't go up with the number of combinations using dyes (Figure 2).

      The authors demonstrate the value of their methodology for studying interactions within microbial consortia by assembling all possible combinations of eight strains of Pseudomonas aeruginosa. The value of their methodology for this application is well-founded. However, it is also unclear why specific experimental choices were made for this application. It is unclear why authors continue to show the absorbance measurements of strain assemblies over the entire wavelength spectrum and not just for ABS 600 nm (Figures 3 and 4). It is also unclear why the authors provided information on the "sum of the three spectra" as this reference line is meaningless and not a reasonable null model for estimating how well specific strain combinations will grow together.

      Figure 5 illustrates the various analysis types that can be performed on the data collected from growing combinations of eight Pseudomonas aeruginosa strains. It is a very informative figure since it provides a "roadmap" on the various ways in which the dataset produced can be explored. The information in Figures 5 and S6 will likely be very useful for a wide audience.

      Reviewer #3 (Recommendations for the authors):

      (1) Congratulations. I think the manuscript lays out a simple and very elegant methodology that will be useful for many. While I think the method is overall well explained and rationalized, the paper can greatly benefit from further expansion of Figure 5 at the expense of Figures 3 and 4.

      We thank the reviewer for their thoughtful assessment of our work. We have considered the recommendations and discuss the following points in response.

      (2) Unless I am missing something, there is no reason to present data collected across the entire wavelength spectrum for microbial assemblies (Figures 3 and 4). Moreover, using the same color palette for bacterial strains (Figure 3A) and colorants (Figure 2) is highly confusing. I suggest considering using only the 600 nm wavelength for any data collected from microbial assemblies and using a very different color palette for bacteria and colorants to avoid misinterpretation of the data.

      We thank the reviewer for this suggestion. Our goal with Figures 3-4 was to illustrate the convenience of the protocol and the ease with which many measurements can be performed in parallel once the combinatorial assembly has been completed. While we focus on Abs<sub>600</sub> for all subsequent analyses, we chose to display the full spectra in Figs. 3-4 in hopes that future studies can make use of our rich dataset to interrogate questions on microbial interactions, with the option to focus on other wavelengths (which can effectively be treated as different community-level functions in their own right; for instance, we have previously used Abs<sub>405</sub> as a proxy for siderophore concentration). We think there is value in Figs. 3-4 in their current form to make this clear to readers.

      (3) Unlike dye absorbance, bacterial carrying capacity has an upper limit, so summing individual population absorbance as a reference line seems unjustified. If the summation of absorbance is meant to provide a "null model" for expected growth, a more suitable model should be considered (e.g., max spectra or a weighted sum of the spectra from individual members).

      We agree with the reviewer that our null model is not biologically constrained, and we did not intend to imply that the additive expectation was derived from biological principles. Instead, this additive expectation should be interpreted as a simple statistical baseline with minimal assumptions. The use of an additive baseline for quantifying microbial interactions has been addressed in the literature (see, e.g., references 10, 19, 24, 29, 37, and 38), and so here we chose to conform to this convention to avoid introducing new, non-standard quantifications of pairwise and higher-order interactions. We have revised the text to make this more explicit.

      (4) The R script is a valuable tool. I think that a valuable improvement will be to also generate visual representations as part of the script’s output such as the colored plates in Figure 1C that are specific to the generated protocol.

      We have updated the script so that it now also outputs a table specifying the location of each consortium within the plates. We chose to make this a text, rather than a graphics output, to ensure cross-device compatibility.

      (5) The discussion rightly acknowledges the potential to extend the protocol to larger libraries using liquid handlers. To facilitate this implementation, it might be beneficial to modify the script output so that the ‘volume’, ‘plate’, and ‘column’ values are tab- or comma-delimited.

      We thank the reviewer for the suggestion. We have modified the output so that it is now tab-delimited.

      (6) Figures 3 and 4 do not provide a lot of insight. I would suggest combining them into a single figure and using only absorbance values at 600 nm. It would also be interesting to add a histogram of these absorbance values and possibly show histograms for subgroups (e.g. all assemblies with more than 3 strains vs all assemblies with 3 or fewer strains).

      With respect to Figs. 3 and 4, we refer the reviewer to our response to comment #2. With respect to the histogram/subgroups plot, we understand that this would be a slightly modified version of the current Fig. 5A, where we show means and standard deviations across all subgroups of 1 to 8 species, and so we find it unclear what this figure would add.

      (7) With the recommendations of removing or reworking Figures 3 and 4, and the fact that Figure 5 is data-rich (and extremely useful), it would be beneficial to split Figure 5 and include the data shown in Figure S6 in the main figure. The analysis in Figure S6 is valuable and it might be beneficial to elevate this analysis to a primary figure and provide a detailed explanation of its rationale and methods in the main text.

      We appreciate this suggestion. In our view, we find that both the text and the figures benefit from a heavy focus on the assembly protocol, as this is the main contribution of this work. While we do think it is valuable to highlight the type and amount of data that can be collected with a full factorial assembly, as well as the types of analyses that can be performed with this data, we are afraid that allocating more space to these analyses may distract readers from the methodology itself. We have therefore chosen to keep the original structure for Figs. 5 and S6.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Here, Pinto and colleagues set out to investigate whether the cow udder is a potential mixing site for the influenza virus. The authors have demonstrated that bovine mammary epithelial cells can be infected with both avian and human influenza A viruses, supporting the idea that the cow udder may be a potential site for reassortment. Furthermore, they demonstrate that the bovine-adapted IAV replicates to similar titers in avian epithelial cells when compared to an AIV precursor virus, suggesting there is no fitness trade-off and confirming the potential for spill-back of the cattle B3.13 into poultry, which has already been observed. Overall, I believe the authors achieved their aims. However, there are instances in which the results do not entirely support the conclusions (noted in weaknesses). Given the ongoing questions surrounding highly pathogenic avian influenza A virus in dairy cows, this work provides valuable evidence for the potential of the cow udder as a site of reassortment. These findings highlight the need for surveillance of influenza A virus incursions into livestock species, particularly cows. Some specific strengths and questions regarding weaknesses have been outlined below.

      Strengths:

      (1) The authors use a diverse range of cell types and influenza A virus strains, as well as a wide range of techniques to address the questions at hand.

      (2) The use of cells from multiple bovine breeds for the MAC-T, bMEC and explants suggests the phenomenon is not unique to a single breed.

      (3) The results suggesting there is no fitness trade-off for Cattle Texas in an avian host are interesting, and confirm the potential for spill-back of the cattle B3.13 into poultry, which has been observed.

      Weaknesses:

      I have listed my complete questions/concerns below. However, there are two main weaknesses of the article in its current state. Firstly, there is no apples-to-apples comparison in terms of determining a preference for IAV to infect the cow udder over other organs (Q4). The mammary gland and respiratory tract are represented by epithelial cells, but for other organs, fibroblasts were chosen. I think the fairer comparison would be to compare epithelial cells from different organs to demonstrate a preference for the mammary gland. Secondly, the main premise of the article relies on bMEC and MAC-T (primary and immortalised mammary epithelial cells), facilitating higher viral growth than the cells from other organs. Yet throughout the article, a 10x higher dose of IAV is used in the bMEC cells compared to everything else (Q6). This raises the question of how much of the results are due to a preference for the mammary epithelial cells, and how much is simply due to the increased dose.

      When we set out to test if cow mammary gland cells were particularly susceptible to IAV infection compared to other bovine cell types, we used what was available in the Roslin Institute in the first instance: a mix of primary and continuous cells from various anatomical sites, comprising three epithelial cell types (two mammary, one respiratory tract), two immune cell types and four sets of fibroblasts from various organs. Given the representation of different anatomical sites, cell types and differentiation statuses, we considered this a suitably diverse panel with which to characterise infection dynamics of a broad range of IAVs, before more focussed investigations using the mammary bMEC and explant tissues. Both mammary epithelial cell types grew our library of influenza challenge strains significantly better than the BAT-II respiratory epithelial cells, as well as the two immune cell types and all four fibroblast populations. Of the fibroblast cells, those derived from the brain grew IAV significantly better than the skin and turbinate fibroblasts, while blood-derived macrophages grew virus significantly better than the lymphocytes and non-brain fibroblasts. Therefore, there are “apple to apple” comparisons as well as apple to pear comparisons that give significant differences. We therefore think that our conclusions (in the abstract) that mammary cells are particularly replication competent for IAV, (at the end of the introduction) that “a wide range of cow-derived cells are susceptible” and (in the results section) that “mammary cells showed the highest susceptibility” are entirely justifiable. We do not claim that mammary cells are the only permissive bovine cells, but our evidence suggests they are highly susceptible.

      We used a higher MOI for bMECs because test experiments with WT PR8 and the Cattle Texas 6:2 reassortant showed that MOI 0.01 infections gave more variable results than ones run at MOI 0.1, perhaps because of the intrinsic variability of mixed primary cell populations. We therefore chose to go with the higher MOI. However, the end-point titres between the two conditions were not significantly different, so we do not think this choice is a confounding issue. We will add the comparison of the two MOIs as a supplementary figure in the formal revision.

      Reviewer #2 (Public review):

      The authors use a library of influenza A viruses from different strains, classified as lab-adapted, human, avian, and swine according to the animal from which they were isolated. They propose that the cow mammary gland serves as a mixing vessel for influenza A viruses. As a first approach, the authors assess susceptibility to infection across different cell types, including continuous and primary cell lines, bovine mammary cells, and mammary explants. All these cells support polymerase activity. Then, they analyzed changes in the bovine virus's viral fitness relative to an avian precursor. The authors use single-gene replacement to study whether and which RNP segments improve viral transcription. As part of this section, they also test IFN-specific antagonism by NS1 to assess the input of segment 8. Quantitative glycomic analysis was performed on the continuous bovine mammary cell line to demonstrate the presence of both α2,3 and α2,6, which is consistent with their observation that these cells can be co-infected with human and avian IAVs simultaneously. The main question, however, is: what is the glycome in the explants, or directly from tissues?

      We report quantitative glycomics for the primary bovine mammary epithelial cells as well as the continuous line the referee highlights. However, we agree with R2 that a detailed glycomic analysis of primary bovine mammary tissue would allow a better understanding of the actual glycosylation status in vivo. This has now been undertaken by the authors and is available as a bioRxiv preprint:

      Bovine H5N1 influenza viruses have adapted to more efficiently use receptors abundant in cattle

      Jack A. Hassard, Jiayun Yang, Bernadeta Dadonaite, Jonathan E. Pekar, Jin Yu, Samuel A. S. Richardson, Rute M. Pinto, Kristel Ramirez Valdez, Philippe Lemey, Jessica L. Quantrill, Jinghan Xue, Tereza Masonou, Katie-Marie Case, Jila Ajeian, Maximillian N. J. Woodall, Rebecca A. Ross, Nicolas Hudson, Kan Zhong, Hongzhi Cao, Samuel Jones, Hannah J. Klim, Brian R. Wasik, Desi N. Dermawan, Jean-Remy Sadeyen, Dirk Werling, Dylan Yaffy, Joe James, Alessandro Nunez, Paul Digard, Ian H. Brown, Daniel H. Goldhill, Pablo R. Murcia, Claire M. Smith, Yan Liu, Jesse D. Bloom, Munir Iqbal, Wendy S. Barclay, Stuart M. Haslam, Thomas P. Peacock: bioRxiv 2026.04.02.715584; doi:https://doi.org/10.64898/2026.04.02.715584

      Overall, the manuscript is clearly written and provides new insights into the behaviour of the cattle isolate, now compared with a representative group of model or precursor HAs of different origins.

      It would be great if a consistent nomenclature for the IAV strains could be used in the study. There is a mix of origin (Texas), animal from which the virus was isolated (mallard), or abbreviations that do not follow guidelines (IAV07). Are the USSR and Udorn not lab-adapted?

      We chose the abbreviated names for a variety of reasons. Partly from common usage (e.g. PR8, Udorn), partly for consistency with other already published papers from the FluTrailMap consortia (e.g. Cattle Texas; Dholakia et al 2026), partly to make diversity obvious in certain figures (e.g. H3N1, H5N2 etc) and partly to avoid confusion between viruses that originate from the same geographic area (e.g. AIV07, AIV09, H5N8-20 etc which are all Ck/England/isolate numbers). Overall, we found it more confusing to use the expanded nomenclature. Re AIV07 which the referee criticises for not following naming guidelines – if this is a reference to the EURL nomenclature, AIV07 is the abbreviation for the specific virus A/Chicken/England/053052/2021, our representative virus for EURL genotype EA-2020-C, as we say in the text. We should however have included this nomenclature in Table 1, which otherwise provides a cross-reference for all the names. This will be added in the formal revision to help with clarity.

      As to whether USSR and Udorn are lab-adapted – that depends on definitions. There is a continuum of adaptive changes and/or sequence drift starting from the very first growth of an isolate in the laboratory. The viruses we define here as lab-adapted are ones that have been deliberately adapted to other hosts or that have very long passage histories in multiple host species, resulting in known functionally significant changes. For example, PR8, with 100s of passages in mice, ferrets and embryonated hens' eggs (doi: 10.3390/v12060590), is unarguably lab-adapted. We admit that A/USSR/77 and A/Udorn/307/1972 are probably further along this adaptive pathway than more recent isolates such as A/Norway/3433/2018, but we are unaware of any specific reason that would put them into our lab-adapted category.

      The experimental setup includes bovine mammary primary and continuous cells, as well as mammary explants. Some of the most significant differences, for example, in viral fitness studies and co-infection experiments, are observed in these explants. Perhaps there could be some additional focus on this observation. The implications in comparison to the results obtained in cultured cells could be described. How will the human and other HA subtype viruses fare in the explants?

      We agree that this is an important and interesting question, and have tested the strains we used for co-infections, human seasonal H1N1 “Norway” and low pathogenic avian influenza “H3N1”, in the mammary explants. Both replicate, the avian virus to 20-fold higher titres. We will add this new information to the revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      This excellent manuscript by Pinto, Sharp, and colleagues examines bovine tissue tropism for influenza viruses. They find that bovine flu, as well as other strains, has strong replication in mammary tissue. They also map the genetic changes to influenza that improve replication in bovine cells. Overall, the study is well designed and executed, and the results are very timely.

      Strengths:

      (1) The experiments are well-controlled.

      (2) The figures are well-constructed and easy to follow.

      (3) The Methods and legends are detailed, with sufficient information.

      Weaknesses:

      (1) A comparison to human cells would strengthen the overall impact of the results. Are human mammary cells also uniquely susceptible to influenza? Are bovine mammary cells special in some way?

      This is an interesting question. We have not tested mammary gland cells from humans (or any other species of mammal), but we have reported elsewhere (Dholakia et al., Nat Commun. 2026 Jan 16;17(1):1603. doi: 10.1038/s41467-026-68306-6.) that Cattle Texas grows well in a variety of human respiratory cells. Here we are considering the bovine mammary organ as a potential reassortment site for IAVs; human mammary organs are unlikely to create this opportunity.

      (2) For the virus infection studies with segment 8 swaps, it should at least be noted that some of the phenotypes could be driven by NEP.

      We agree, and will change the text to acknowledge this in a revised version.

      (3) The data demonstrating that bMEC can support co-infection are compelling and important, but would be strengthened with a comparison from a different cell type or species. Do mammary cells uniquely support higher co-infection?

      We have data showing that co-infection also occurs in the continuous MAC-T udder cell line and will include these data in a revision. We have not tested bovine cells from other organs for co-infection potential as they do not seem to be significant sites of infection in vivo.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In their manuscript, Andriani et al. show intracellular zinc is exported from sperm during capacitation and suppresses the alkalinization-induced hyperpolarization in sperm. Intracellular zinc inhibits Slo3 current, which is enhanced by the co-expression of gamma subunit Lrrc52. Computational studies reveal that the Zn binding site on mSlo3 is located near E169 and E205, which are involved in the sustained zinc inhibition of mSlo3 current. The authors propose that intracellular zinc plays a key role in sperm capacitation by inhibiting the Slo3 channel.

      Strengths:

      Overall, the work appears well-designed (e.g., oocyte patch-clamp experiments), and clearly presented. Three-dimensional structural modeling and flooding simulations are executed.

      Weaknesses:

      The simple mutagenesis analysis of E169 and E205 showed partial abolishment, but the molecular mechanism by which zinc inhibits Slo3 current is not yet fully shown. The authors should consider performing more extensive experiments, such as creating double mutants or combination mutants involving other residues. Additionally, could other mechanisms explain the role of zinc in regulating the Slo3 current?

      We thank the reviewer for the thoughtful comments regarding the mutagenesis analysis and the possible mechanisms underlying zinc regulation of Slo3. Regarding the suggestion to perform double or combination mutants, we agree that such experiments would provide valuable mechanistic insight. However, due to limited resources, we were not able to perform these additional experiments within the scope of this study. Our current results show that mutations at E169 and E205 partially abolish zinc inhibition, which suggests that the inhibitory mechanism is not mediated through a single residue and is likely more complex.

      Alternative mechanisms that may contribute to zinc modulation of Slo3 include indirect effects through modulation of nearby charged residues, structural rearrangements influenced by zinc binding, or the presence of multiple zinc binding sites within the Slo3 channel other than the sites identified in this study. At present, these mechanisms remain speculative and further studies will be required to clarify their contributions. This study provides a foundation for understanding how zinc inhibits the Slo3 channel and serves as a starting point for defining the molecular mechanism in more detail.

      We already acknowledged in the Discussion section that the precise molecular basis of zinc inhibition remains unknown and that future work involving more extensive mutational and structural analyses will be essential to fully resolve this issue.

      We also added the following to the Discussion section:

      “It is worth noting that the incomplete loss of zinc sensitivity in these mutants suggests that additional mechanisms may participate in zinc modulation of Slo3. These may include modulation of nearby charged residues, structural rearrangements influenced by zinc binding, or the presence of multiple zinc binding sites. Comparisons with Slo2.2 (J. Zhang et al., 2023), KCNQ4 (Gao et al., 2017), and voltage-gated calcium channels (Sun et al., 2007) further support the possibility of diverse molecular determinants for zinc inhibition. Our VCF, mutagenesis, and simulation data together indicate that zinc influences voltage sensor movement in mSlo3, which may suggest a distinct inhibitory mechanism that warrants further investigation.”

      While elucidating the mechanism of Slo3 is interesting, there is substantial literature indicating how zinc regulates channel functions at a molecular level. Given this, the manuscript should provide a deeper understanding by clearly elucidating the molecular mechanism of the regulation of Slo3 current by zinc.

      Thank you for highlighting a very important point that requires deeper discussion and explanation regarding how zinc regulates Slo3 current at the molecular level. As reported, Slo3 is gated by membrane depolarization and, at the same time, by intracellular pH, particularly alkalinization (Leonetti et al., 2012; Schreiber et al., 1998; X. Zhang et al., 2006). This makes the gating mechanism of this channel complex. The molecular mechanism underlying pH regulation of the Slo3 channel remains unknown (M. D. Lyon et al., 2023). We tested different pH conditions and membrane voltages to elucidate the effect of zinc on the Slo3 channel. Our data suggest that zinc inhibition of mSlo3 channels depends on pH (Fig. 2A-E) and voltage (Fig. 2G-H; Fig. 2—figure supplement 1A, B) and exhibits a long-lasting inhibitory effect (Fig. 2I, K).

      However, we are aware that these data alone cannot explain the molecular mechanism of zinc’s effect on Slo3 current, and our mutagenesis experiments also did not provide a straightforward answer. The single amino acid mutations examined in this study, made at sites with clustered negative residues, did not significantly alter zinc-mediated current reduction compared to the wild type. As the reviewer pointed out, mutating a single amino acid may not be sufficient to identify other contributing residues within the predicted mSlo3 zinc-binding site. Therefore, more extensive mutagenesis studies will be required to fully elucidate the molecular mechanism of zinc inhibition in mSlo3, which could not be fully resolved in this study.

      On the other hand, when we analyzed the percentage of current recovery for all the mutants, E169A and E205A showed significant current recovery upon washout with pH 8.0 solution alone. Consistent with the MD simulations, our electrophysiological recordings demonstrated that the long-lasting inhibitory effect of zinc was partly abolished by these mutations. Thus, our findings highlight the contribution of E169, located at the lower end of the S3 domain, and E205, located in the lower region of the S4 domain, to zinc-mediated inhibition of mSlo3 current.

      Additionally, since the molecular mechanism of pH regulation of the Slo3 channel remains unknown, the molecular basis of its dual gating has yet to be elucidated, making it difficult to draw a single definitive conclusion from our current data on how zinc inhibits mSlo3 current. Nevertheless, this study provides a foundation for understanding possible mechanisms of zinc inhibition. Our VCF data suggest that zinc influences the movement of the VSD of mSlo3, and together with our mutagenesis and MD simulation results, these findings represent an important first step toward elucidating the molecular mechanism of zinc inhibition of the mSlo3 current.

      Intracellular zinc exerts an inhibitory effect on mSlo3, similar to what has been reported for Slo2.2 channels (J. Zhang et al., 2023), high- and low-voltage activated calcium channel families (Sun et al., 2007) and KCNQ4 channels (Gao et al., 2017). These studies identified different regions, amino acids, and possible mechanisms of zinc inhibition among these ion channels. For instance, in Slo2.2 channels, which belong to the same Slo family as Slo3, the zinc-binding site was identified in the RCK2 domain, where cysteine and histidine residues form a canonical zinc binding motif (J. Zhang et al., 2023). In KCNQ4 channels, zinc inhibits channel activity in a non-canonical manner that depends on its physiological activator, the membrane lipid PI(4,5)P<sub>2</sub> (Gao et al., 2017). Although zinc exerts inhibitory effects on these various voltage-gated potassium and calcium channels, the mechanisms differ. Our data suggest another distinct mechanism of zinc inhibition in the mSlo3 channel, with the identified sites located in the VSD, where zinc influences voltage-sensor motion and consequently affects the complex gating of Slo3.

      We revised the discussion section as follows, which is also related to the previous comment:

      “It is worth noting that the incomplete loss of zinc sensitivity in these mutants suggests that additional mechanisms may participate in zinc modulation of Slo3. These may include modulation of nearby charged residues, structural rearrangements influenced by zinc binding, or the presence of multiple zinc binding sites. Comparisons with Slo2.2 (J. Zhang et al., 2023), KCNQ4 (Gao et al., 2017), and voltage-gated calcium channels (Sun et al., 2007) further support the possibility of diverse molecular determinants for zinc inhibition. Our VCF, mutagenesis, and simulation data together indicate that zinc influences voltage sensor movement in mSlo3, which may suggest a distinct inhibitory mechanism that warrants further investigation.”

      The manuscript includes no experimental data on the mechanism of intracellular zinc export during sperm capacitation, despite being crucial for the regulation of sperm function.

      We thank the reviewer for this valuable comment. We agree that the mechanism of intracellular zinc export during capacitation is crucial for the regulation of sperm function, and providing experimental data on it would be an important finding. However, there are significant technical difficulties in performing such experiments. Two protein families facilitate the transport of zinc across cellular and intracellular membranes in opposite directions: ZnT and ZIP. ZIP12 has been reported to be highly expressed in mouse testis (Zhu et al., 2022), as has ZnT-1 (Elgazar et al., 2005). To date, there are no known inhibitors of zinc transporters, and there are also no suitable antibodies available for these transporters, which makes it difficult to design experiments to examine intracellular zinc transport during sperm capacitation. Apart from the two reported zinc transporters, the functional significance of other ZnTs and ZIPs, particularly those related to capacitation, remains largely unclear, leaving the mechanisms of zinc transport in sperm during capacitation poorly understood. Moreover, homozygous Znt-1 knockout mice exhibit a lethal phenotype (Andrews et al., 2004).

      Reviewer #2 (Public review):

      Summary:

      In this paper, Andriani and colleagues are examining the potential role of Zn flux in sperm and its effect on Slo3 channels. This is an interesting question that is likely critical to how sperm function properly and Slo3 channels are a possible candidate for a downstream molecule that is impacted by Zn. In this paper, the authors use Zn imaging, sperm motility assays, and electrophysiology to show that Zn flux impacts sperm function. They then go on to look at the impact Zn has on Slo3 current and propose a binding site based on MD simulations. While the ideas are interesting, the experiments are not well described in many places making understanding the results very difficult. In addition, critical controls are missing throughout the paper.

      Strengths:

      The question of how Zn flux impacts membrane potential and sperm motility is an important one. Moreover, Slo3 presents an interesting candidate or the target of Zn regulation. The combination of methods used here also has the potential to uncover mechanisms of Zn regulation of Slo3.

      Weaknesses:

      Much of the paper lacks experimental description which makes interpretation quite difficult, or a detailed discussion is missing. Examples include:

      (1) Figure 1, particularly the Zn imaging, is not sufficiently described. How is the fluorescence intensity measured? A representative ROI? The whole tail and head? Are the sperm immobile? If not, there is evidence that motion artifacts can significantly distort these sorts of measures from Calcium measurements in Cilia. Were there controls done? Is the small amount of Zn seen in the tail above the background?

      We sincerely thank the reviewer for pointing out important details that we should provide to make this study easier to understand. We respond to the points raised by the reviewer as follows:

      Fluorescence intensity is measured by the signal taken from the whole head and the proximal part of tail in sperm. We have included this in the materials and methods.

      Materials and Methods

      “Fluorescence intensity is measured by the signal taken from the whole head and the proximal part of tail in sperm.”

      Yes, the sperm were immobile during zinc imaging.

      We added the control data of zinc imaging without capacitation medium and incorporated the data into the graph in Figure 1B. For the control in non-capacitation medium, we use HS medium as newly explained in the methods, results, related figure (Figure 1B), and figure legends.

      Yes, the small amount of Zn seen in the tail is above the background. As shown in Fig. 1A, we confirmed that the signal intensity at the proximal region of the tail was higher than the background. Therefore, the data for this region were calculated after background subtraction.

      (2) The second half of Figure 1 is also not well described. What is the extracellular solution in the recordings? When you apply the Zn ionophore, do you expect influx or efflux? I assume efflux is based on the conclusions but this should be discussed explicitly.

      The extracellular solution in the recordings for Figure 1 is HS solution (HEPES-buffered saline solution), a standard non-capacitation medium. We will include this information in the Materials and Methods.

      Materials and methods

      “HS-based solution was used as the extracellular solution.”

      We assume that intracellular zinc levels increase upon application of zinc ionophore. Previous work has reported that sperm contain approximately 35.7 ng/10<sup>6</sup> cells in the head and flagellum (Henkel et al., 1999). When zinc pyrithione is applied, it facilitates the influx of Zn<sup>2+</sup> from the surrounding medium into the cell, thereby increasing intracellular zinc concentration. Zinc pyrithione functions both as a zinc source and as a transport facilitator, allowing Zn<sup>2+</sup> to cross the otherwise impermeable lipid membrane without compromising membrane integrity.

      (3) Figure 2H labels the Y axis, "normalized current". Normalized to what? Why do neither of the curves end at 1? A better description of what this figure represents is needed.

      Normalization for Figure 2H was performed by dividing the absolute current of mSlo3 at pH 8.0 at each voltage by the absolute current at the pre-determined highest voltage that still produced a stable mSlo3 current (i.e., good patch, good clamp). In this analysis, +140 mV was chosen as the highest voltage for normalization, since in a few cells the patch was lost at +160 mV and +180 mV. As for the control condition, the absolute current of mSlo3 in the presence of 100 µM zinc was normalized to the absolute current of the control at +140 mV. This information has been included in the figure legends and the Materials and Methods section of the revised manuscript.

      Materials and Methods section:

      The figure legend for Figure 2H has been updated.
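      The normalization described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the voltage steps, and the current values are hypothetical and are not taken from the manuscript. It shows why the zinc curve need not reach 1, since both curves are divided by the control current at +140 mV.

```python
import numpy as np

def normalize_to_control(currents, control_currents, voltages_mv, ref_mv=140.0):
    """Divide absolute currents by the absolute control current at ref_mv.

    All names and the default reference voltage follow the description in
    the response; the implementation itself is an illustrative assumption.
    """
    voltages_mv = np.asarray(voltages_mv, dtype=float)
    matches = np.flatnonzero(np.isclose(voltages_mv, ref_mv))
    if matches.size != 1:
        raise ValueError("reference voltage must appear exactly once")
    reference = abs(float(np.asarray(control_currents, dtype=float)[matches[0]]))
    return np.abs(np.asarray(currents, dtype=float)) / reference

# Hypothetical example values (arbitrary units), not experimental data.
voltages = [100, 120, 140, 160]
control = [1.0, 2.0, 4.0, 5.0]   # pH 8.0, no zinc
zinc = [0.5, 1.0, 2.0, 2.5]      # pH 8.0 + 100 µM zinc

norm_control = normalize_to_control(control, control, voltages)
norm_zinc = normalize_to_control(zinc, control, voltages)
print(norm_control)  # control equals 1 at +140 mV only
print(norm_zinc)     # zinc trace stays below the control at every voltage
```

      With this scheme the control curve passes through 1 at the reference voltage rather than at the last plotted point, and the zinc curve is read as a fraction of that same control value.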

      (4) The alpha fold simulations are not well described. How many Zn binding sites were found? Are all of the histidine mutations in Figure 4 Supplement 1 the ones that were found?

      We thank the reviewer for the question. In our AlphaFold3 input, we only input the transmembrane region of the protein. From there, we found four sites located as follows:

      Because we are only interested in the intracellular (IC) side of the membrane, we focused on the IC site with the highest pLDDT (confidence) value. Only two of the sites are on the IC side; the other sites are located near the pore domain. The selected site is near E310 and K319.

      Author response image 1.

      AlphaFold3 prediction of the Zn binding site on IC side of Slo3

      The histidines in Fig. 4—figure supplement 1 are all histidines that are not in the transmembrane region. These residues were not included in the initial inputs for AlphaFold3. However, we conducted MD simulations including these residues and we were able to show that a few of these residues are in contact with Zn. We have now plotted the minimum distance between each of these residues and Zn in the flooding simulations.

      Author response image 2.

      MD simulations of histidine residues located in the IC region of Slo3

      Minimum distances between histidines in Fig. 4—figure supplement 1 and Zn<sup>2+</sup> from the flooding simulations. Different colors indicate different repeats.
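      For readers unfamiliar with this metric, a minimal sketch of a per-frame minimum-distance calculation is given below. It is an assumption-laden illustration, not the analysis pipeline used for the figure: the function names and toy coordinates are hypothetical, periodic boundary conditions are ignored, and the 4 Å cutoff is taken from the threshold quoted elsewhere in this response.

```python
import numpy as np

def min_distance_per_frame(residue_xyz, ion_xyz):
    """Smallest atom-to-ion distance in each frame.

    residue_xyz: array (frames, n_atoms, 3), coordinates in Å
    ion_xyz:     array (frames, n_ions, 3), coordinates in Å
    Periodic images are NOT considered (simplifying assumption).
    """
    residue_xyz = np.asarray(residue_xyz, dtype=float)
    ion_xyz = np.asarray(ion_xyz, dtype=float)
    # Broadcast to every atom/ion pair: (frames, n_atoms, n_ions, 3)
    diff = residue_xyz[:, :, None, :] - ion_xyz[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return dist.min(axis=(1, 2))

def contact_fraction(min_dist, cutoff=4.0):
    """Fraction of frames in which the minimum distance is within cutoff (Å)."""
    return float(np.mean(np.asarray(min_dist) <= cutoff))

# Toy coordinates for two frames, two residue atoms and two ions (hypothetical).
residue = np.array([[[0, 0, 0], [1, 0, 0]],
                    [[0, 0, 0], [1, 0, 0]]], dtype=float)
ions = np.array([[[4, 0, 0], [0, 5, 0]],
                 [[9, 0, 0], [0, 7, 0]]], dtype=float)

d_min = min_distance_per_frame(residue, ions)
print(d_min)                    # one value per frame
print(contact_fraction(d_min))  # share of frames counted as "in contact"
```

      A residue whose minimum distance stays above the cutoff in every repeat would, under this definition, never be counted as contacting Zn<sup>2+</sup>.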

      (5) There is no discussion of physiological intracellular Zn concentration. How much Zn is inside the sperm? How much if likely Free vs buffered? Is 100uM a reasonable physiological concentration?

      We estimated the intracellular zinc concentration in sperm based on human sperm data, which report a zinc concentration of approximately 35.7 ng/10<sup>6</sup> cells in the head and flagellum (Henkel et al., 1999). Considering the volume of a typical human sperm is about 15 µm<sup>3</sup> (Laufer et al., 1977), this translates to an estimated intracellular zinc concentration of approximately 400 mM, although the concentration of free zinc must be much lower than this level. Although exact intracellular zinc concentrations in mouse sperm are not well-documented, this estimate supports the observation of elevated zinc in non-capacitated sperm.

      There are a number of areas where the interpretation is not well supported by the data including:

      (6) You say in the Figure 4 supplement, that "we did not observe any significant decrease in the percentage of current inhibition." But that is a pretty misleading statement. There are large changes (increases) in the amount of zinc inhibition. These might be allosteric changes but I don't think you can safely eliminate these as relevant Zn binding sites. Also, some of these mutations appear to allow at least some unbinding of Zn.

      In our MD simulations, H720 is not at the zinc binding site and therefore, mutation to arginine would indeed eliminate its binding. We show this in the minimum-distance analysis between Zn and H720, in which they remain more than 4 Å from each other (n=3), as shown in Author response image 2.

      The Slo3/Slo1 RCK2 chimera also showed large increases in the amount of zinc inhibition, and this region might serve as a potential binding site. We agree that the statement “we did not observe any significant decrease in the percentage of current inhibition.” is misleading, and we have therefore revised our interpretation.

      We revised the result section as follows:

      “However, the percentage of current inhibition varied across the mutated constructs, showing either increases or no appreciable change (Fig. 4—figure supplement 1B, C).”

      (7) Following up on the above point, it seems unfair to conclude that the D162S, E169A, and E205 mutants are part of the inhibitory binding site for Zn when the mutation has no effect on inhibition and only an effect on the washout. The mutations on the intracellular side also had an impact on the washout so it seems equally likely that they are the critical residues based on your data.

      We thank the reviewer for this important point. We agree that the absence of a strong reduction in the initial zinc inhibition makes it challenging to assign any single residue as a definitive zinc binding site. However, our interpretation is based not only on the electrophysiological data but also on the MD simulations, which consistently identified E169 and E205 as residues that frequently interact with zinc and stabilize zinc occupancy within the VSD region. Although the mutations did not markedly reduce the peak level of zinc inhibition, both E169A and E205A significantly altered the long-lasting inhibitory component during washout, which is consistent with the MD-predicted interactions. In contrast, the intracellular mutations affected washout but were not supported by MD simulations as potential zinc interaction sites. Taken together, these combined datasets support the idea that E169 and E205 contribute to zinc modulation of Slo3 in the VSD, even though additional residues or mechanisms are likely involved.

      (8) Nowhere in the paper do you make the specific link between Zn flux and membrane hyperpolarization via Slo3. You show that Zn flux changes the ability of the sperm to hyperpolarize and you show that Slo3 is inhibited by Zn but the connection between the two is not demonstrated. There appears to be a specific Slo3 blocker. If you use this in sperm, do you no longer see the Zn effect?

      Thank you for pointing out the need to clarify this point. It is already known that sperm capacitation is associated with an increase in intracellular pH (Vredenburgh‐Wilberg & Parrish, 1995; Y. Zeng et al., 1996), hyperpolarization of the membrane (Arnoult et al., 1999; Y. Zeng et al., 1995) and elevation of the intracellular Ca<sup>2+</sup> concentration (Breitbart, 2002; Publicover et al., 2007) through diverse ion channel activities. To explore whether these pathways are influenced by intracellular zinc, we used patch-clamp techniques to measure the membrane potential (Vm) as shown in Fig. 1D-K. It has been reported that under whole-cell current clamp of mouse epididymal spermatozoa, the resting membrane potential is hyperpolarized after intracellular alkalinization (Navarro et al., 2007). We mentioned this in lines 100-108 of the manuscript.

      Next, our findings from the experiments using mouse spermatozoa suggest that intracellular zinc inhibits a key process in sperm capacitation, specifically the alkalinization-induced hyperpolarization. Previous studies have identified the pH- and voltage-dependent potassium channel Slo3 as responsible for the principal K<sup>+</sup> current (I<sub>KSper</sub>) in mouse spermatozoa (Navarro et al., 2007; Santi et al., 2010; Schreiber et al., 1998; X. H. Zeng et al., 2011). During capacitation, the rise in pHi leads to the activation of Slo3 channels, resulting in membrane hyperpolarization (Santi et al., 2010). Given this context, we next investigated whether intracellular zinc acts directly on the Slo3 channel and found that zinc inhibits mSlo3 current. We explained the rationale for this experiment in lines 143-150.

      We added the following sentence to improve the clarity of the text:

      “During capacitation, the rise in pHi leads to the activation of Slo3 channels, resulting in membrane hyperpolarization (Santi et al., 2010).”

      The text was therefore modified to:

      “Our findings suggest that intracellular zinc inhibits a key process in sperm capacitation, specifically the alkalinization-induced hyperpolarization. Previous studies have identified the pH-and voltage-dependent potassium channel Slo3 is responsible for the principal K<sup>+</sup> current (I<sub>KSper</sub>) in mouse spermatozoa (Navarro et al., 2007; Santi et al., 2010; Schreiber et al., 1998; X. H. Zeng et al., 2011). During capacitation, the rise in pHi leads to the activation of Slo3 channels, resulting in membrane hyperpolarization (Santi et al., 2010). Given this context, we next investigated whether intracellular zinc acts directly on the Slo3 channel.”

      Regarding the specific inhibitor, as the reviewer points out, a new Slo3 inhibitor, VU0546110, exhibits more than 40-fold selectivity for human Slo3 over Slo1 (M. Lyon et al., 2023). However, the effect of VU0546110 on mSlo3 has not yet been tested. Mouse and human Slo3 respond similarly to certain inhibitors but differ in their responses to several others (M. D. Lyon et al., 2023), making it uncertain whether VU0546110 will work on mSlo3.

      (9) In the second half of Figure 1, the authors suggest that there is "no hyperpolization in 100uM Zn. That is not really true. It is reduced but not absent.

      We modified the wording of “no hyperpolarization in 100 µM Zn” to “alkalinization-induced hyperpolarization was reduced in the 100 µM ZnCl<sub>2</sub> group.”

      “In contrast, alkalinization-induced hyperpolarization was reduced in the 100 µM ZnCl<sub>2</sub> group”

      (10) The claim that Lrcc52 with Slo3 shows a higher current inhibition at pH 7.5 than pH 8 is not well supported because there are only 3 replicates in the 7.5 case. In addition, the claim is made in the test that 100uM ZnCl2 "already inhibited mSlo3+Lrcc52 at pH7.5", contrasted with mSlo3 alone, is not tested statistically.

      Thank you for the valuable comment. Although Fig. 3F shows a statistical difference, we agree that having only three replicates at pH 7.5 may somewhat weaken the conclusion. Following this suggestion, we have revised the sentence as follows:

      “Alkalinization appeared to increase the percentage of current inhibition by 100 µM ZnCl<sub>2</sub>.”

      We provided a statistical analysis comparing mSlo3 alone and mSlo3+Lrrc52 at pH 7.5 in Figure 3—figure supplement 1D:

      The statistical analysis showed that 100 µM zinc significantly inhibited the mSlo3 + Lrrc52 current at pH 7.5 compared to the mSlo3 current alone. We have incorporated the necessary changes into the revised manuscript and updated the figure legends accordingly.

      In a number of places, better controls are needed.

      (11) How specific is this effect for Zn? Mg2+, for instance, is also a divalent cation that is in the hundreds of uM range inside the cell. Does it exert the same effect? Each ion certainly has unique preferred coordination geometries, does your predicted binding with MD show what you might expect for tetrahedral coordination with Zn? Did you test other divalent cations functionally or in silicon?

      To address this question, we built another AlphaFold3 model with Mg<sup>2+</sup> in place of Zn<sup>2+</sup>. We did not run all-atom MD simulations because of their computational cost. In this model, the Mg<sup>2+</sup> ions all cluster at the pore domain, and none resides near the Zn<sup>2+</sup> site identified in both the MD simulations and the AF3 model.

      Author response image 3.

      AlphaFold3 model of Slo3 channel with Mg<sup>2+</sup>

      The Slo3 AlphaFold model from residue M1 to L330. The colour gradient reflects the pLDDT score range from 1.73 to 95.69. Purple sticks highlight E169, N171 and E205. In this study, we did not examine other divalent cations in our electrophysiological recordings. Exploring their effects will be an important direction for future research.

      (12) For the VCF experiments, a significantly higher concentration of Zn was used (10 mM). What is the reason for this? There is no discussion of how much a "puff" is. Assuming you are using the RNA injector, it is probably on the order of 50 nL or less. Assuming the volume of an oocyte is 1 µL, that would argue that the final concentration is 500 µM or higher. But this is also complicated by potential local effects of high Zn at the injection site, artifacts of injecting that much metal, and the fact that a great deal of the Zn will likely be bound to other things inside the cell. Better controls are needed for this experiment.

      As pointed out by the reviewer, the volume of an oocyte is estimated to be approximately 1 µL. We performed manual injections using a glass needle of the type typically used for RNA injection. However, because the injections were done manually during real-time VCF recording (as illustrated in the experimental scheme), the exact volume of solution injected into each oocyte could not be precisely controlled. We estimated each drop to be approximately 50 nL, resulting in a final concentration of around 500 µM, as described by the reviewer.

      The rationale for using a relatively high concentration was to ensure that the zinc concentration inside the oocyte reached an effective level, since manual injection may sometimes deliver less than 50 nL of solution. In some cases, injections failed entirely due to the technical difficulty of the method. Because VCF recordings are already technically difficult, we aimed to ensure that zinc injection was successful in oocytes that exhibited a robust fluorescence signal by injecting an excess amount of zinc that would not disrupt normal oocyte conditions. For example, 10 mM zinc was prepared in an acidic solution (pH 2.5). We verified that this acidic condition did not affect the mSlo3 current by performing control injections with the acidic solution alone, since the mSlo3 current is not activated under acidic pH conditions.
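
      The dilution estimate discussed above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function name and the smaller injection volumes are assumptions added here, and the result is the total (not free) zinc concentration, since it ignores intracellular buffering and any local gradient at the injection site.

```python
def final_concentration_uM(stock_mM, injected_nL, oocyte_uL=1.0,
                           include_injected_volume=True):
    """Total concentration (µM) after injecting a stock into an oocyte.

    Assumes ideal mixing and no binding to intracellular components.
    """
    injected_uL = injected_nL / 1000.0
    total_uL = oocyte_uL + injected_uL if include_injected_volume else oocyte_uL
    return stock_mM * 1000.0 * injected_uL / total_uL


# 10 mM stock and a ~1 µL oocyte are taken from the text;
# the 10 and 25 nL cases are hypothetical under-deliveries.
for volume_nL in (10, 25, 50):
    conc = final_concentration_uM(10.0, volume_nL)
    print(f"{volume_nL:>3} nL -> {conc:.0f} µM total zinc")
```

      Under these assumptions, a full 50 nL drop gives roughly 480 to 500 µM depending on whether the injected volume is counted, and smaller drops scale down almost linearly.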

      Author response image 4.

      VCF control experiments: vehicle injection.

      Reviewer #3 (Public review):

      Summary:

      The study titled "Zinc is a Key Regulator of the Sperm-Specific K+ Channel (Slo3) Function" aims to investigate the role of intracellular zinc in sperm capacitation and its regulation of the sperm-specific Slo3 potassium channel. Capacitation is a crucial physiological process that enables sperm to fertilize an egg, and membrane hyperpolarization through Slo3 activation is a well-established event in this process. The authors propose that intracellular zinc dynamically decreases during capacitation and inhibits Slo3-mediated K⁺ currents, thereby playing a regulatory role in sperm function.

      Strengths:

      (1) Novel Contribution to Sperm Physiology.

      The study provides new insights into how zinc dynamics contribute to sperm capacitation, specifically through its direct inhibition of Slo3 activity. Previous research has focused primarily on extracellular zinc's effect on sperm function; this work expands the discussion to intracellular zinc regulation, an area with limited prior investigation.

      (2) Strong Electrophysiological Evidence.

      The study employs inside-out patch-clamp recordings in Xenopus oocytes to demonstrate zinc's direct inhibition of Slo3 currents. The observed slow dissociation of zinc from Slo3 suggests a long-lasting regulatory effect, adding to the understanding of ion channel modulation in sperm cells.

      (3) Molecular Mechanistic Insights

      Using Molecular Dynamics (MD) simulations and mutagenesis, the authors identify potential zinc-binding sites within Slo3's voltage-sensing domain (VSD), particularly E169 and E205. These computational predictions are supported by electrophysiological recordings, strengthening the argument that zinc directly binds and inhibits Slo3.

      (4) Physiological Relevance and Functional Implications

      The study suggests that zinc inhibition of Slo3 could contribute to sperm motility regulation during capacitation.

      The authors provide sperm motility assays as supporting evidence, showing that zinc chelation affects motility only after capacitation has begun, suggesting a dynamic role of intracellular zinc in the capacitation process.

      Weaknesses:

      While the study presents compelling electrophysiological data and molecular insights, there are several critical gaps that must be addressed before fully supporting the physiological relevance of the findings.

      (1) The authors should measure the effects in sperm cells using the patch-clamp technique to directly record Slo3 currents. By normalizing Slo3 currents to cell capacitance at different intracellular zinc concentrations, the authors can quantitatively assess the extent of Slo3 inhibition by zinc and strengthen the physiological relevance of their findings.

      We thank the reviewer for the valuable comments, which helped strengthen the physiological relevance of our findings. We have added Slo3 currents measured by perforated patch-clamp recording in sperm cells treated with zinc pyrithione (ZnPy), before and after the addition of 10 mM NH<sub>4</sub>Cl. Control experiments were conducted in the absence of ZnPy, in which Slo3 currents were recorded before and after the application of 10 mM NH<sub>4</sub>Cl. These data have been integrated into Figure 1L-N and Figure 1—figure supplement 1A, B.

      It is worth noting that the Slo3 current in these recordings might include other endogenous currents, as no specific blocker was used. Nonetheless, the data showed that the Slo3 current in sperm tends to be inhibited by zinc, as shown by the plot of absolute Slo3 current after the addition of 10 mM NH<sub>4</sub>Cl in the absence of ZnPy (control) and in the presence of 100 µM ZnPy. The fold change, calculated from the absolute current before and after the addition of 10 mM NH<sub>4</sub>Cl, was lower in the ZnPy-treated group than in the control group.

      We also provided data normalized to cell capacitance as suggested; however, the cell capacitance obtained from the sperm recordings reflects the membrane of the entire head and midpiece of the spermatozoon. Slo3 channels, on the other hand, are not expressed throughout the entire spermatozoon, so the cell capacitance acquired from these recordings does not accurately reflect the area where the Slo3 channels are localized. Although we included normalization of Slo3 currents to cell capacitance before and after ZnPy application, this normalization should be interpreted with caution for the reasons mentioned above. The corresponding figure has been included in the supplementary data as Figure 1—figure supplement 1A, B.
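
      The two quantities discussed above, current density and fold change, can be sketched as follows. The function names and all numbers are placeholders chosen for illustration, not values from the recordings. The sketch also shows why the fold change is less affected by the capacitance caveat: when both currents come from the same cell, the capacitance cancels.

```python
import math


def current_density(current_pA, capacitance_pF):
    """Current normalized to whole-cell capacitance (pA/pF)."""
    if capacitance_pF <= 0:
        raise ValueError("capacitance must be positive")
    return current_pA / capacitance_pF


def fold_change(before, after):
    """Ratio of the response after a treatment to the response before it."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return after / before


# Placeholder values for one hypothetical cell.
capacitance_pF = 2.5
before_pA, after_pA = 100.0, 250.0  # before / after NH4Cl

raw_fold = fold_change(before_pA, after_pA)
density_fold = fold_change(current_density(before_pA, capacitance_pF),
                           current_density(after_pA, capacitance_pF))

# Same cell, same capacitance: the normalization does not change the ratio.
print(raw_fold, density_fold, math.isclose(raw_fold, density_fold))
```

      Comparisons of absolute current density between cells, in contrast, do depend on how well the measured capacitance represents the membrane area where the channels sit.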

      We added sentences to the result section as follows:

      “We also measured Slo3 current using perforated patch-clamp recordings in spermatozoa treated with ZnPy, before and after the addition of NH<sub>4</sub>Cl. Control experiments were conducted in the absence of ZnPy, in which Slo3 current were recorded before and after the application of 10 mM NH<sub>4</sub>Cl (Fig. 1L-N; Fig. 1—figure supplement 2A, B). Slo3 current in sperm tended to be inhibited by zinc, as shown by the plot of absolute Slo3 current after the addition of 10 mM NH<sub>4</sub>Cl in the absence of ZnPy (control) and in the presence of 100 µM ZnPy (Fig. 1L, M). There was a decrease in the fold change calculated from the absolute current before and after the addition of 10 mM NH<sub>4</sub>Cl of ZnPy treated group compared to the control group (Fig. 1N). Taken together, these results confirmed that intracellular zinc indeed inhibits alkalinization-induced hyperpolarization in mouse sperm.”

      (2) Lack of Controls in Non-Capacitated Sperm

      The claim that zinc is exported from sperm during capacitation needs stronger experimental validation.

      The authors did not include a control group of non-capacitated sperm in key fluorescence imaging experiments, making it difficult to confirm that the observed zinc decrease is capacitation-specific rather than a general zinc redistribution process.

      To strengthen this conclusion, experiments should be performed in non-capacitating conditions to determine whether intracellular zinc levels remain unchanged.

      We added the control group of non-capacitated sperm in key fluorescence imaging experiments, as integrated in Figure 1B.

      We revised the Results and Figure Legend sections as follows:

      “We observed that there was a gradual and significant decrease in fluorescence intensity in both regions (Fig. 1B), particularly prominent in the flagellum (Fig. 1C). This decline suggests the active release of intracellular zinc from sperm flagellum occurs during capacitation. In contrast, the fluorescence intensity of the control group of non-capacitated sperm remained unchanged (Fig. 1B).”

      Figure Legend 1B was modified accordingly.

      (3) Unclear Role of Zinc in Physiological Capacitation

      The study clearly demonstrates zinc inhibition of Slo3 but does not sufficiently establish how this affects capacitation at a functional level.

      Additional motility and capacitation markers should be analyzed to confirm that zinc influences sperm behavior beyond Slo3 inhibition.

      We thank the reviewer for this valuable comment. We fully agree that zinc can influence sperm physiology through multiple mechanisms and that its overall effects on capacitation are complex. However, the main goal of our study is to investigate the mechanism and to determine whether intracellular Zn<sup>2+</sup> directly inhibits Slo3. Our results from both the heterologous expression system and the sperm membrane potential recordings consistently support this conclusion.

      For these reasons, we believe that adding such assays would not clarify the role of Slo3 in capacitation but rather risk confounding interpretation. Instead, we have expanded the Discussion to explicitly acknowledge these limitations and to emphasize that future studies combining genetic or pharmacological modulation of Slo3 with comprehensive capacitation analyses will be required to fully define its physiological impact.

      We added sentences to the discussion section in the revised manuscript as follows:

      “Although these results support a mechanistic link between zinc and Slo3 activity, future studies that combine genetic or pharmacological modulation of Slo3 with comprehensive capacitation analyses will be required to define its physiological impact in more detail. Within this context, this study highlights the potential importance of intracellular zinc in the regulation of sperm capacitation.”

      (4) Insufficient Data on Zinc-Slo3 Specificity

      The authors should consider using quinidine, a known washable Slo3 inhibitor, to confirm that zinc acts specifically on Slo3 channels rather than other endogenous ion channels.

      The study would benefit from including washout controls in the inside-out patch-clamp recordings, as seen in Figure 3-Supplement 1, to confirm that zinc inhibition is reversible or long-lasting.

      We thank the reviewer for raising the point regarding the need to confirm that the current observed in our recordings indeed represents Slo3 current by using a specific blocker such as quinidine, as there is a possibility that endogenous currents might also be present and that zinc could act on those endogenous currents. Performing experiments with quinidine would indeed be crucial to demonstrate the specificity of Slo3 current in our patch-clamp recordings.

      However, in our current experimental protocol, we apply ramp pulses multiple times and require a long series of recordings within a single session in one patch as described in the materials and methods as well as Figure 2I, Figure 4—figure supplement 1C, Figure 5B (pH 8.0 → 100 µM zinc → pH 8.0, to observe the washout effect). Incorporating quinidine into this sequence would make the protocol even longer (pH 8.0 → quinidine → washout → pH 8.0 → 100 µM zinc), which increases the likelihood of patch loss before completing the full set.

      Furthermore, we have ensured that the recorded current corresponds to Slo3 by using appropriate experimental conditions, specifically the suitable voltage range for activation, a high intracellular pH (pH 8.0), and high-potassium solutions in our recordings.

      (5) Missing Discussion of Zinc's Role in CatSper Regulation

      The study focuses solely on Slo3 but does not mention CatSper, the principal Ca<sup>2+</sup> channel essential for sperm capacitation.

      Zinc has been reported to inhibit CatSper activity, which could significantly impact sperm function.

      The discussion should address whether zinc's effect on Slo3 represents a broader regulatory mechanism influencing multiple ion channels during capacitation.

      Thank you for the comment. To the best of our knowledge, there have been no reports showing that CatSper activity is directly regulated by zinc ions.

      Furthermore, in our patch-clamp recordings with NH<sub>4</sub>Cl and ZnPy, we observed that the normal CatSper current increased even in the presence of ZnPy, which makes it challenging to conclude whether zinc directly affects CatSper channel activity.

      We added sentences to the discussion section in the revised manuscript as follows:

      “In addition to that, to date, there are only few reports on the effect of zinc on other sperm ion channels, and none have been reported in mouse sperm. One important study was reported by (Jeschke et al., 2021), in which seminal zinc was found to inhibit prostaglandin-induced activation of CatSper, a sperm-specific Ca<sup>2+</sup> channel, in human sperm. The complex opposing action of seminal zinc and prostaglandins on CatSper may help preventing premature activation of CatSper in the ejaculate and act as a dilution sensor, although this study does not provide direct evidence for zinc acting directly on CatSper (Jeschke et al., 2021).”

      Final Assessment

      This work presents important findings on zinc regulation of Slo3 channels, supported by strong electrophysiological and molecular analyses. However, the physiological relevance of these findings remains unclear due to missing controls, and needs additional functional assays. Addressing these issues would significantly enhance the manuscript's scientific rigor and impact.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Most of the specific comments and suggestions are in the public review. Minor additional comments primarily focused on presentation and textual errors are here.

      (1) There is something strange happening in Figure 6D in the -100ish range. I think it's likely related to the reversal potential of K+.

      Thank you for pointing this out. Yes, in Figure 6D the plot behaves strangely around -100 mV. As the reviewer suggested, we also think that this is related to the reversal potential of potassium ions.

      (2) There are a number of errors in the text that make following it difficult. For instance, multiple times the authors say "In consistent" (line 120 as an example) when I think they mean consistent with.

      We replaced “in consistent” with “consistent with” throughout the revised manuscript.

      Reviewer #3 (Recommendations for the authors):

      The authors provide well-described experiments, particularly those examining the effects of intracellular zinc on Slo3 channels using inside-out patch-clamp recordings. However, some experimental designs intended to assess the physiological relevance of these findings during capacitation require additional controls and data before the authors' claims can be fully supported.

      Comments

      Major Concerns & Suggested Improvements

      Line 65: "In the present study, we find that intracellular zinc is exported during capacitation, indicating that zinc dynamics in spermatozoa play an important role in fertilization."

      This claim requires additional experimental data to be fully supported.

      Thank you for pointing it out. We have provided data for control experiments of zinc imaging in non-capacitated conditions in Figure 1B.

      Line 79: "Intracellular zinc is exported from sperm during capacitation."

      The authors should include controls in non-capacitated conditions to determine whether zinc export is specific to capacitation or a general process in sperm cells.

      Again, we have provided data for control experiments of zinc imaging in non-capacitated conditions in Figure 1B.

      Figures - General Comment:

      In all figures, please replace SEM (Standard Error of the Mean) with Standard Deviation (SD) for consistency and a more accurate representation of variability.

      SEM (Standard Error of the Mean) has been replaced with SD (Standard Deviation) in all figures (main figures and supplements) as well as in the numerical descriptions.
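
      For readers comparing the original and revised figures, the two error measures are related by SEM = SD / √n. The sketch below illustrates the conversion; the helper names and the sample values are placeholders, not data from the study.

```python
import math
import statistics


def sd_to_sem(sd, n):
    """Standard error of the mean from the sample SD and sample size."""
    if n < 2:
        raise ValueError("need at least two observations")
    return sd / math.sqrt(n)


def sem_to_sd(sem, n):
    """Recover the sample SD from a reported SEM and sample size."""
    if n < 2:
        raise ValueError("need at least two observations")
    return sem * math.sqrt(n)


# Placeholder measurements for one hypothetical group.
values = [12.0, 15.0, 9.0, 14.0, 10.0]
sd = statistics.stdev(values)          # sample SD (n - 1 denominator)
sem = sd_to_sem(sd, len(values))

print(f"mean = {statistics.mean(values):.2f}, SD = {sd:.2f}, SEM = {sem:.2f}")
```

      Because SEM shrinks with sample size while SD does not, SD error bars give a more direct picture of the cell-to-cell variability, which is the point of the reviewer's request.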

      Figure 1

      Panel B:

      Include a non-capacitating media control to confirm that the observed decrease in zinc-sensitive dye fluorescence is not due to artifact/photobleaching.

      We have provided data for control experiments of zinc imaging in non-capacitated conditions in Figure 1B.

      Perform an experiment with capacitating media supplemented with a higher concentration of zinc. If intracellular zinc export is a real effect, added extracellular zinc should prevent or reduce this phenomenon.

      We appreciate the reviewer’s suggestion; however, we believe that supplementing the medium with high concentrations of zinc is unsuitable for validating the export phenomenon due to confounding physiological factors. Our preliminary tests demonstrated that increasing extracellular zinc triggers a drastic increase in intracellular zinc as well (Author response image 5). Furthermore, the high concentration of BSA in the capacitation medium acts as a potent zinc buffer, precluding precise control over free Zn<sup>2+</sup> levels. Therefore, the inherent difficulty in maintaining defined extracellular and intracellular Zn<sup>2+</sup> gradients makes the interpretation of such data highly problematic. Future studies will focus on identifying the specific zinc transporters involved and characterizing their molecular mechanisms.

      Author response image 5.

      Zinc addition

      Clarify whether the "n" value represents different cells or multiple recordings from the same cell.

      The n value represents the number of different cells.

      Supplemental Figure 1:

      Incorporate Δ (delta) comparison between 10 min and 2 hours under control conditions and in the presence of TPEN.

      Here we provide data:

      Author response image 6.

      Δ comparison between control and TPEN

      Provide statistical analysis for these comparisons to make the effects of capacitation clearer.

      We performed the calculation and statistical analysis; however, there was no statistically significant difference, as shown in Author response image 6, owing to the high variability of the individual data.

      Figure 2

      Panel C:

      Incorporate inhibition at pH 7.4 and 6.0 for direct comparison.

      Recording the inhibitory effect of zinc at pH 6.0 is not possible because there would be no current to begin with, as mSlo3 is gated by both voltage and alkaline pH.

      Panel D:

      Include a washout control, similar to what is shown in Panel A.

      We added a washout control trace to Figure 2D.

      Panel E:

      Provide a longer reference trace in the absence of zinc to clearly visualize the control condition. The current reference segment is too short to properly assess baseline activity.

      Although we do not have a longer reference trace in the absence of zinc for Figure 2E, we instead show the trace recorded under the application of 0.1 µM zinc in Figure 2—figure supplement 1A to illustrate the current behavior.

      Panels G-H:

      Include inside-out patch-clamp traces and quantification of zinc washout effects.

      Inside-out patch traces are shown in Figure 2G, where we applied a step-pulse protocol. The zinc washout effect could not be quantified because the patch was usually lost after the second step-pulse application.

      Panels I-K:

      Provide additional traces. In Panel I, the inhibition by zinc is clear, but in Panel J, the reduction appears less distinct and could be due to rundown or an artifact. Additional controls should clarify this.

      Figure 2K presents the most representative trace among five recorded cells. The apparent reduction is less distinct, likely due to an artifact caused by a bubble in the rapid perfusion system during solution exchange. However, at the end of zinc application (t = 50 s), the current amplitude was clearly reduced compared with that at t = 0–10 s.

      Figure 3

      Panel D:

      Include additional data showing the transition to pH 6 and washout with pH 7.5, similar to the experimental design in Panels A and B.

      We included additional data showing a raw trace of the application of pH 6.0 in Figure 3D, and we also included the transition to pH 6 and washout with pH 7.5 in Figure 3E.

      Figure 3-Supplement 1:

      Include zinc washout experiments. This approach is one of the best ways to evaluate the reversibility of zinc inhibition on the channel.

      As mentioned above, in this recording we recorded step pulses up to +180 mV. The zinc washout effect could not be quantified because the patch was usually lost after the second step-pulse application.

      Figure 6

      Zinc Inhibition Specificity:

      The authors should use quinidine, a known washable Slo3 inhibitor, to assess Slo3 activity before and after zinc injection.

      This experiment would confirm that zinc specifically inhibits Slo3, rather than affecting other endogenous channels.

      We sincerely thank the reviewer for this valuable suggestion. However, given the technical difficulty of these experiments, which involve lengthy VCF recordings and manual zinc injections that significantly compromise oocyte health, it is not feasible to apply quinidine at this stage.

      Moreover, we observed voltage-dependent fluorescence changes around the VSD, and this change was influenced by the application of zinc, confirming that zinc specifically inhibits Slo3 rather than affecting other endogenous channels.

      Discussion - Key Revisions Needed

      Line 308: "Our results demonstrated that intracellular zinc is exported from spermatozoa during capacitation."

      This claim needs to be supported by experiments using non-capacitated conditions.

      Additionally, measuring maximum and minimum zinc concentrations under different conditions would improve the interpretation of fluorescence intensity changes.

      We now include a negative control in non-capacitated sperm. The data are incorporated into Figure 1B.

      Line 309: "We further discovered that intracellular zinc regulates alkalinization-induced hyperpolarization in mice spermatozoa, mediated by Slo3 channel."

      Additional controls are needed to substantiate this claim.

      At this stage of the study, we do not have access to Slo3 knockout (KO) mice; therefore, performing additional experiments is not feasible.

      Line 316: "Using FluoZin3-AM for zinc imaging, we confirmed the presence of intracellular zinc in sperm (Fig. 1A), which is consistent with previous findings (Henkel et al., 1999). Our observations revealed that treatment with capacitation medium induced a decrease in zinc fluorescence intensity (Fig. 1B, C), suggesting that zinc levels are dynamic during capacitation."

      This statement must be supported by negative controls, including non-capacitated sperm conditions.

      We now include a negative control in non-capacitated sperm. The data are incorporated into Figure 1B.

      Line 327: "We also observed that zinc chelator significantly affected the sperm motility only after, but not before, capacitation (Fig. 1-figure supplement 1)."

      Data presentation should be revised to highlight the effects of capacitation itself.

      The discussion should specify which motility parameters were affected and why others were not.

      In the text we mentioned that:

      “We incubated the isolated spermatozoa with cell permeable Zn<sup>2+</sup> chelator N,N,N',N'-Tetrakis(2-pyridylmethyl)ethylenediamine (TPEN) and measured the motility parameters before and after capacitation. We found that VAP (average path velocity), VCL (curvilinear velocity), and VSL (straight-line velocity) were influenced by the TPEN treatment only after the capacitation, as shown in Fig. 1—figure supplement 1. These results demonstrate that the dynamics of zinc levels during capacitation potentially contributes to sperm motility, highlighting the importance of zinc action in sperm physiology.”

      Indeed, we observed that zinc chelator significantly affected the sperm motility specifically in VAP (average path velocity), VCL (curvilinear velocity), and VSL (straight-line velocity) only after, but not before, capacitation (Fig. 1—figure supplement 1). Of note, it has been recently reported that all these motility parameters (VAP, VCL, and VSL) are reduced by Slo3-specific inhibitors in human sperm (M. Lyon et al., 2023). These findings are consistent with the idea that endogenous zinc dynamics control sperm motility through Slo3 during the capacitation process.

      Figure legend is revised accordingly.

      Line 369: "Structural determinants of zinc inhibition in the mSlo3 channel."

      The authors should include an analysis of the evolutionary conservation of the mutated sites across Slo1, Slo2, and Slo3.

      If Slo3 has a unique regulatory mechanism, these sites should show high sequence variability compared to other Slo channels.

      If these sites are highly conserved, the authors should explain how Slo3 differs functionally from Slo1 and Slo2 despite this conservation.

      We thank the reviewer for the valuable suggestions regarding the inclusion of additional discussion points on the structural determinants of zinc inhibition in the mSlo3 channel. We performed sequence alignment by using ClustalO between mSlo3, mSlo1, and mSlo2.2. It is worth noting that only human and frog variants of Slo2.1 sequence are available in the database, so we included only Slo2.2 subtype, as our focus was on Slo3 in mouse sperm.

      Based on the alignment, E169 (mSlo3 numbering) is conserved among the Slo family channels in mice, whereas E205 (mSlo3 numbering) is not. To date, there have been no reports examining the residues corresponding to E169 (E191 in mSlo1 or E176 in mSlo2.2) for their zinc sensitivity. This might be because the zinc-binding sites in both channels are already well defined: they are located in the RCK1 domain for Slo1 (Hou et al., 2010) and the RCK2 domain for Slo2.2 (J. Zhang et al., 2023). The identified binding site in Slo2.2 is conserved in Slo2.1 but not present in Slo1 and Slo3 (J. Zhang et al., 2023), further suggesting that zinc regulation differs among Slo family members. However, this does not rule out the possibility that regions surrounding E191 or E176 could provide additional insights into zinc regulation in these channels, which could be of interest for future studies.

      Interestingly, in contrast to E169, E205 is not conserved across the Slo family, making this residue unique to the mouse Slo3 channel and potentially a determinant of zinc sensitivity in mSlo3. Given that E205 is located in the S4 domain, and that our VCF results show that zinc inhibition influences the motion of the voltage-sensing domain of mSlo3, E205 represents an important residue to explore in future studies. Furthermore, because this residue is unique to Slo3, it highlights the distinct functional properties of Slo3, such as its gating by both membrane voltage and alkalinization, with a voltage range of activation that differs from that of mSlo1 (Li et al., 2024) and ligands and gating mechanisms that differ from those of Slo2 (J. Zhang et al., 2023).

      We added the sequence alignment results to Figure 5—figure supplement 1F.
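
      The conservation check described above, whether a residue at a given position of a reference sequence is shared by the other aligned sequences, can be sketched as follows. This is not the authors' pipeline: the function names are hypothetical, and the toy alignment uses made-up sequences rather than real Slo channel sequences. In practice the input would be the aligned output of a tool such as Clustal Omega.

```python
def column_for_residue(aligned_ref, residue_number):
    """Map a 1-based ungapped residue number to a 0-based alignment column."""
    count = 0
    for col, char in enumerate(aligned_ref):
        if char != "-":
            count += 1
            if count == residue_number:
                return col
    raise ValueError("residue number exceeds reference length")


def is_conserved(alignment, ref_name, residue_number):
    """Return (conserved?, residue per sequence) at a reference position."""
    col = column_for_residue(alignment[ref_name], residue_number)
    residues = {name: seq[col] for name, seq in alignment.items()}
    return len(set(residues.values())) == 1, residues


# Toy alignment with invented sequences (gaps shown as "-").
toy = {
    "ref":  "MK-ELAE",
    "par1": "MKQELSD",
    "par2": "MR-ELAK",
}

for position in (3, 6):  # two glutamates in the toy reference
    conserved, residues = is_conserved(toy, "ref", position)
    print(position, conserved, residues)
```

      Numbering by the ungapped reference sequence is what allows a statement such as "E169 (mSlo3 numbering)" to be matched to differently numbered residues in the paralogs.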

      We revised the results section as follows:

      “Additionally, we performed sequence alignment by using ClustalO between mSlo3, mSlo1, and mSlo2.2. It is worth noting that only human and frog variants of Slo2.1 sequence are available in the database, so we included only Slo2.2 subtype, as our focus was on Slo3 in mouse sperm. Based on the alignment, E169 (mSlo3 numbering) is conserved among the Slo family channels in mice, while in contrast E205 (mSlo3 numbering) is not. (Figure 5—figure supplement 1F).”

      We revised the discussion section as follows:

      “Based on sequence alignment, E169 (mSlo3 numbering) is conserved among Slo family channels in mice, whereas E205 (mSlo3 numbering) is not (Fig. 5—figure supplement 1F). To date, no studies have examined the corresponding residues to E169 (E191 in mSlo1 or E176 in mSlo2.2) for their potential zinc sensitivity, likely because the established zinc binding sites in these channels are located in the RCK1 domain for Slo1 (Hou et al., 2010) and the RCK2 domain for Slo2.2 (J. Zhang et al., 2023). The identified zinc binding site in Slo2.2 is conserved in Slo2.1 but is absent in both Slo1 and Slo3 (J. Zhang et al., 2023), further suggesting that zinc regulation differs among Slo family members. Although regions surrounding E191 or E176 may still provide additional insights into zinc regulation and could be of interest for future investigation, E205 stands out because, unlike E169, it is not conserved across the Slo family, making it unique to mSlo3 and potentially a specific determinant of zinc sensitivity in this channel.”

      Figure legend is revised accordingly.

      Line 392: "Physiological relevance of zinc inhibition of the mSlo3 channel in mouse sperm."

      The authors should mention the effects of zinc on CatSper channels, as CatSper is also crucial for capacitation.

      Slo3 inhibition may represent only one component of zinc's broader regulatory role during capacitation.

      We thank the reviewer for raising this important point regarding the physiological relevance of zinc inhibition of the mSlo3 channel in mouse sperm. We agree that we should also have discussed the effect of zinc on CatSper channels, as this channel is crucial for capacitation. To date, there are only a few reports on the effect of zinc on CatSper channels, and none in mouse sperm. One important study was reported by Jeschke et al. (2021), in which seminal zinc was found to inhibit prostaglandin-induced activation of CatSper in human sperm. The complex opposing action of seminal zinc and prostaglandins on CatSper may help prevent premature activation of CatSper in the ejaculate and act as a dilution sensor, which facilitates the escape of sperm into the female genital tract (Jeschke et al., 2021). Taking this into consideration, as the reviewer pointed out, zinc inhibition of Slo3 may represent only one component of zinc’s broader regulatory role during capacitation.

      We added a sentence to the discussion section in the revised manuscript as follows:

      “In addition to that, to date, there are only few reports on the effect of zinc on other sperm ion channels, and none have been reported in mouse sperm. One important study was reported by (Jeschke et al., 2021), in which seminal zinc was found to inhibit prostaglandin-induced activation of CatSper, a sperm-specific Ca<sup>2+</sup> channel, in human sperm. The complex opposing action of seminal zinc and prostaglandins on CatSper may help preventing premature activation of CatSper in the ejaculate and act as a dilution sensor, although this study does not provide direct evidence for zinc acting directly on CatSper (Jeschke et al., 2021).”

      The study presents valuable insights into the role of intracellular zinc in sperm capacitation and Slo3 channel function. However, the physiological impact of these findings remains unclear due to insufficient controls and missing key experimental data. The suggested revisions would strengthen the validity of the claims made by the authors and improve the overall scientific rigor of the manuscript.

      Key Areas for Improvement:

      Control experiments in non-capacitated conditions.

      Increased statistical rigor in figure analyses.

      More detailed experiments to confirm specificity of zinc action on Slo3.

      Expanded discussion of zinc's role beyond Slo3, including CatSper regulation.

      The authors should measure these effects in sperm cells using the patch-clamp technique to directly record Slo3 currents. By normalizing Slo3 currents to cell capacitance at different intracellular zinc concentrations, the authors can quantitatively assess the extent of Slo3 inhibition by zinc and strengthen the physiological relevance of their findings.
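      The normalization suggested above can be illustrated with a minimal sketch. It assumes whole-cell current amplitudes in pA and cell capacitances in pF; all numbers, names, and zinc concentrations below are hypothetical placeholders and are not data from the manuscript.

```python
# Minimal sketch: normalize currents to cell capacitance (current density)
# and express inhibition relative to the zinc-free control.
# All values are hypothetical illustrations, not measured data.

def current_density(current_pa, capacitance_pf):
    """Return current density in pA/pF."""
    if capacitance_pf <= 0:
        raise ValueError("capacitance must be positive")
    return current_pa / capacitance_pf


def fraction_inhibited(control_density, treated_density):
    """Return the fraction of the control current density that is lost."""
    if control_density == 0:
        raise ValueError("control density must be non-zero")
    return 1.0 - treated_density / control_density


# Hypothetical recordings: zinc concentration (uM) -> (current in pA, capacitance in pF)
recordings = {
    0.0: (600.0, 3.0),
    1.0: (420.0, 2.8),
    10.0: (150.0, 3.0),
}

control = current_density(*recordings[0.0])
inhibition = {
    zinc_um: fraction_inhibited(control, current_density(i_pa, c_pf))
    for zinc_um, (i_pa, c_pf) in recordings.items()
}

for zinc_um in sorted(inhibition):
    print(f"{zinc_um:5.1f} uM zinc: {inhibition[zinc_um]:.2f} of control current lost")
```

      Dividing by capacitance removes cell-size differences between recordings, so the inhibition values can be compared across cells before any dose-response fitting.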

      By addressing these concerns, the manuscript will provide a more robust foundation for understanding zinc's regulatory role in sperm physiology and capacitation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      The study presents valuable findings of an optimized E. coli cell-free protein synthesis (eCFPS) system that has been simplified by reducing the number of core components from 35 to 7; furthermore, the findings communicate a simplified 'fast lysate' preparation that eliminates the need for traditional runoff and dialysis steps. This study is an advance towards simplifying protein expression workflows, and the evidence provided is solid, starting with nanoluc, a protein that expresses readily in many systems, to applications to more challenging proteins like the functional self-assembling vimentin and the active restriction endonuclease BsaI. Data on the underlying mechanisms and efficiency of the presented system in terms of protein yield relative to other known cell-free systems would greatly enhance the findings' significance and the strength of the evidence. The paper remains of interest to scientists in microbiology, biotechnology and protein synthesis.

      We thank the editors for the positive assessment of our optimized E. coli cell-free protein synthesis (eCFPS) system and the "fast lysate" preparation.

      As suggested, we have significantly strengthened the evidence by adding:

      (1) Mechanism data: We have integrated a detailed analysis of the endogenous metabolic pathways (amino acids and nucleotides) into the Discussion section, supported by literature (Prinz et al. 1997; Yokoyama et al. 2010; Kigawa et al. 1999).

      (2) Efficiency comparisons: We have added quantitative comparisons of absolute protein yields between our simplified 7-component system and the conventional 35-component system (now in Figure S3 E-F), demonstrating that our system matches or exceeds traditional titers.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors only provided the data for optimization, leaving the underlying mechanism that explains the phenomena unexplained.

      We appreciate this feedback. To address the mechanism of how protein synthesis persists without exogenous additives, we have expanded the Discussion to explain how the "fast lysate" retains active endogenous enzymes. By omitting runoff and dialysis, our system preserves the metabolic capacity to synthesize amino acids (e.g., Cys and Trp from Ser) and nucleotides from residual precursors, as supported by the literature (Prinz et al. 1997; Yokoyama et al. 2010; Kigawa et al. 1999).

      Reviewer #2 (Public review):

      The production of the lysate requires special instrumentation, limiting accessibility. While the strengths of the study are well-emphasized, the limitations are not mentioned.

      We thank the reviewer for this point. While a high-pressure homogenizer is common in many molecular biology labs, we acknowledge it may be a barrier for some. We have now included a dedicated Limitations paragraph in the Discussion addressing accessibility and the inherent challenges of prokaryotic systems in producing complex human proteins requiring post-translational modifications.

      Reviewer #3 (Public review):

      (1) Clarification on "highly efficient" and the lack of comparison with typical high-yield systems.

      We have clarified "highly efficient" as a holistic balance of high yield, robustness, and simplified preparation. Crucially, we added absolute yield data (sfGFP standard curve) to Figure S3E-F demonstrating that our 7-component system performs comparably to or better than traditional high-yield protocols.

      (2) How did the authors ensure chemical composition only affected translation and not transcription?

      This is a key distinction. We performed new experiments using pre-transcribed mRNA templates (Figure S3G) to isolate translational effects. While translation efficiency slightly decreased in the simplified buffer, the overall protein yield increased significantly due to a dramatic boost in transcription efficiency, confirming the system's net performance gain.
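      For readers unfamiliar with how a fluorescence standard curve yields absolute protein amounts, as in the sfGFP-based quantification mentioned in response (1) above, the following is a minimal sketch. It assumes a linear relationship between fluorescence and protein concentration; the calibration points, units, and function names are hypothetical and are not taken from the manuscript.

```python
# Minimal sketch: fit a linear standard curve (fluorescence vs. known protein
# concentration) by ordinary least squares, then convert a sample reading
# into an absolute concentration. All numbers are hypothetical.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired calibration points")
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("calibration concentrations must not all be equal")
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def concentration_from_signal(signal, slope, intercept):
    """Invert the standard curve to estimate concentration from a reading."""
    return (signal - intercept) / slope


# Hypothetical calibration: purified standard in ug/mL vs. fluorescence (a.u.)
standard_conc = [0.0, 5.0, 10.0, 20.0]
standard_fluor = [100.0, 600.0, 1100.0, 2100.0]

slope, intercept = fit_line(standard_conc, standard_fluor)
sample_conc = concentration_from_signal(1600.0, slope, intercept)
print(f"estimated sample concentration: {sample_conc:.1f} ug/mL")
```

      Readings outside the calibrated range would need dilution or additional standards, since the linear assumption is only checked between the lowest and highest calibration points.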

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      There are specific concerns that need to be addressed:

      (1) On page 4, lines 103-109, the authors speculate that protein synthesis persists even in the absence of amino acids like arginine, cysteine, and tryptophan. They suggest that this is likely due to residual amounts of these amino acids present in the cell lysate. Yokoyama et al. demonstrated that these amino acids are generated from other amino acids by endogenous amino acid metabolic enzymes in the cell lysate (J. Biomol. NMR 48, 193, (2010), doi: 10.1007/s10858-010-9455-3.). Cysteine and tryptophan can be derived from serine. In this context, asparagine and glutamine can be disregarded because they are synthesized from aspartate and glutamate, respectively. A more in-depth analysis is required to interpret the results accurately.

      We thank the reviewer for this insightful comment and for pointing us toward the relevant literature. We agree that the persistence of protein synthesis in the absence of exogenous amino acids like Arg, Cys, and Trp is driven by the robust metabolic capacity of our "fast lysate."

      Unlike conventional protocols, our "fast lysate" procedure deliberately omits runoff and dialysis steps, ensuring the maximal retention of active endogenous metabolic enzymes and residual small-molecule pools. As demonstrated by Yokoyama et al. (2010), E. coli cell extracts retain functional enzymes capable of synthesizing acid-sensitive amino acids from precursors or more stable amino acids. We have integrated a detailed mechanistic analysis of these endogenous metabolic pathways into the Discussion section and have cited Yokoyama et al. (2010) to support this interpretation.

      (2) On page 4, lines 111-115, the authors demonstrated that protein synthesis could occur even in the absence of CTP or UTP, provided ATP and GTP are present. This phenomenon can also be attributed to the analogous complementary actions of metabolic pathways.

      We agree with the reviewer's assessment. The ability of the optimized eCFPS to function without exogenous CTP/UTP relies on the same principle of endogenous metabolic conversion mentioned above. The omission of dialysis ensures that the lysate retains not only residual nucleotide pools but also the full suite of nucleotide metabolic enzymes. Powered by our optimized energy regeneration system, these enzymes maintain sufficient levels of CTP and UTP to support transcription and translation. This explanation has been added to the Discussion section to clarify the robustness of our system.

      (3) On Figure 3A, protein synthesis kinetics are presented in a stair plot instead of the commonly used scatterplot. Is there a specific reason for choosing the stair plot?

      We chose the stair plot representation to more clearly visualize the cumulative process of protein synthesis and its stabilization over discrete time intervals. Given that sampling occurred every 10 minutes, a stair plot effectively highlights the "plateau" phases and the incremental nature of accumulation, which can sometimes be obscured by dense scatter plots.

      (4) On Figure 3C. It is unclear which system is referred to as the "initial" system in Figure 3C. Which data point on Figures 3A and 3B corresponds to this "initial" system?

      We apologize for the lack of clarity. In Figure 3C, "initial" refers to the traditional 35-component system prior to our streamlining process. Figures 3A and 3B characterize the performance of the final optimized system alone. To resolve this ambiguity, we have updated the legend for Figure 3 to explicitly define the "initial" system as the pre-optimization control.

      (5) In Figure 5D, previously reported eCFPS and the system using "fast lysate" were compared. The only difference between the two systems seems to be the type of lysate used, according to the Supplementary table. Optimal concentrations for the components are the same for both lysates, or is there still room for optimization for "fast lysate"?

      The "fast lysate" primarily differs from conventional lysates in its preparation speed and the retention of endogenous cofactors/enzymes. While the optimal salt and energy concentrations remained consistent across both lysates in our tests, the "fast lysate" provides a higher baseline signal due to the endogenous T7 RNA polymerase and metabolic factors. We believe this demonstrates the robustness of the optimized reaction buffer across varying lysate preparation qualities.

      (6) The study suggests that the removal of DTT didn't negatively affect protein expression. However, based on my experience, certain proteins, especially those with cysteine residues on their surface, tend to aggregate without DTT. Did the authors attempt to express such proteins, or did they draw this conclusion based on the limited number of proteins tested?

      This is a valid concern. We based our conclusion on the functional expression of BsaI and vimentin, two proteins that are inherently prone to aggregation and misfolding. Their successful synthesis suggests that the intrinsic reducing capacity of the lysate (e.g., glutathione and thioredoxin systems) is sufficient for many targets (Prinz et al. 1997). However, we acknowledge that specialized cysteine-rich proteins may still require exogenous DTT. We have addressed this in the Discussion.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 77-78 "we iteratively evaluated the contribution of individual constituents through luciferase reporter assays" - where is all the data? Please use an appropriate figure citation. Figure 1 cherry picks some components, but I think all data should be included.

      We have structured the data presentation to show dispensable components in Figure 1 (where removal does not inhibit reaction) and essential components in Figure 2 (where 0-concentration results in zero activity). This ensures a logical flow of the "streamlining" narrative. All raw data for these screenings have been included in the Source Data files.

      (2) Line 127 typo "concentrations".

      We thank the reviewer for pointing out this error. The typo "concentrations" has been corrected.

      (3) Figure 2: "protein expression levels" measured how?/what is the unit of the vertical bar on the right? I'm assuming that this experiment was conducted for discrete concentrations and thus generated discrete data points. However, the graph makes it seem as if this is continuous data. Kindly change the type of graphing to indicate that this is discrete data, showing each data point.

      We appreciate the reviewer's suggestion. Protein expression levels were measured using the Nanoluciferase (NLuc) reporter gene assay. We utilized heatmaps/contour plots because our data are bivariate, representing the simultaneous optimization of two concentrations (e.g., Mg<sup>2+</sup> and K<sup>+</sup> in Figure 2A). For such matrix-based screenings, heatmaps are significantly more effective than scatter plots at conveying synergistic trends and identifying optimal reaction landscapes. Notably, this visualization approach for discrete biochemical optimization data was successfully employed by the Ban lab in their recent study on translation system optimization (Bothe and Ban 2024). The vertical color bar on the right represents the relative expression ratio, normalized to the maximum yield. Although we have provided a scatter plot of this discrete data for reference (see Author response image 1), we believe it appears visually cluttered due to the high density of data points, making it difficult to discern overarching trends. Heatmaps, by contrast, offer a much clearer representation of the optimal reaction landscape. To maintain transparency, the discrete concentration points tested are clearly reflected by the axis ticks, and all raw discrete data are available in the Source Data files.

      Author response image 1.

      (4) Also, for all figures: the way the units are presented (DTT/mM) is confusing to me; it could just be something like [DTT] (mM).

      We have revised all figures and tables to follow the standard format (e.g., [Component] (unit)) as suggested.

      (5) Do the sucrose gradient sedimentation data have replicates? If so, please indicate statistics.

      The sucrose gradient data provided (Figure 5C) is intended as qualitative evidence that the "fast lysate" method preserves intact 70S ribosomes across different preparation batches. This experiment has been performed independently multiple times with consistent results, demonstrating the high reproducibility of our preparation method. While we did not perform a quantitative comparative analysis of ribosome concentration, the consistency of the peaks confirms the integrity of the translational machinery.

      (6) Line 457: fix the red line.

      We thank the reviewer for pointing this out. The formatting issue has been resolved in the revised manuscript.

      (7) Please mention the limitations of this study in the discussion.

      We thank the reviewer for this suggestion. We have added a paragraph to the Discussion addressing the limitations of prokaryotic systems regarding complex eukaryotic post-translational modifications and chaperone requirements.

      (8) Please include all uncropped gels in the source data, alongside the raw data, as you have already done.

      As requested, we have provided all original, uncropped gel images in the Source Data files, alongside the raw data, to ensure full transparency and compliance with the journal's data sharing policies.
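      Returning to the two-factor screens discussed in response (3) above (Figure 2), the normalization to the maximum yield and the identification of the best-performing condition can be sketched as follows. This is an illustration only: the concentration levels, signal values, and function names are hypothetical and do not reproduce the Source Data.

```python
# Minimal sketch: a discrete two-factor screen stored as a grid, normalized to
# its maximum (the "relative expression ratio"), with the optimum located by
# a simple search. All values are hypothetical.

def normalize_to_max(grid):
    """Scale every cell so that the highest signal in the grid becomes 1.0."""
    peak = max(max(row) for row in grid)
    if peak <= 0:
        raise ValueError("grid must contain a positive signal")
    return [[value / peak for value in row] for row in grid]


def best_condition(grid, row_levels, col_levels):
    """Return (row_level, col_level) of the cell with the highest signal."""
    best_row, best_col = max(
        ((r, c) for r in range(len(grid)) for c in range(len(grid[r]))),
        key=lambda rc: grid[rc[0]][rc[1]],
    )
    return row_levels[best_row], col_levels[best_col]


# Hypothetical screen: rows are Mg2+ levels (mM), columns are K+ levels (mM)
mg_levels = [4, 8, 12]
k_levels = [50, 100, 150]
signal = [
    [100.0, 400.0, 200.0],
    [300.0, 800.0, 500.0],
    [150.0, 600.0, 250.0],
]

relative = normalize_to_max(signal)
optimum = best_condition(signal, mg_levels, k_levels)
print("relative expression grid:", relative)
print("best (Mg2+, K+) in mM:", optimum)
```

      Each cell in the grid corresponds to one tested concentration pair, so the same table can be rendered either as a heatmap or as individual points without changing the underlying discrete data.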

      Reviewer #3 (Recommendations for the authors):

      (1) The study lacks a comparison of protein levels with a typical cell-free protein synthesis system.

      We have performed new quantitative experiments (now included in Figure S3 E-F) to measure absolute protein yields. Our optimized system achieves yields comparable to, or exceeding, several widely recognized high-yield protocols while utilizing significantly fewer components. We have also clarified in the text that "highly efficient" refers to the synergistic balance of high yield, low cost, and simplified preparation time.

      (2) What do the authors mean by "highly efficient", often used in the manuscript?

      We thank the reviewer for the opportunity to clarify our terminology. We have performed new quantitative experiments (now included in Figure S3) to measure absolute protein yields, demonstrating that our optimized system achieves yields comparable to, or exceeding, several widely recognized high-yield protocols while utilizing significantly fewer components.

      In the context of this manuscript, we use the term "highly efficient" as a holistic descriptor that encapsulates three key dimensions of the system:

      (1) Performance Superiority: Achieving higher expression levels and faster kinetics compared to conventional 35-component systems.

      (2) Functional Robustness: The ability to efficiently synthesize challenging targets, such as cytotoxic proteins (BsaI) and aggregation-prone proteins (vimentin), which often fail in simplified systems.

      (3) Practical Utility: A drastic reduction in preparation time and cost through the "fast lysate" protocol and the removal of 28 auxiliary components, thereby lowering the barrier to adoption.

      This definition aligns with the study's core objective: developing a system where efficiency is measured not only by final yield but by the synergy between high performance and extreme ease of use.

      (3) In this article, the term 'optimisation' is used as a synonym for 'simplification'. In biochemistry, optimisation commonly refers to an increase in yield, or the same yield achieved more easily or at a lower cost. In this case, however, we have no idea how this new system compares to a conventional expression system in terms of yield.

      We thank the reviewer for this conceptual clarification. We agree that in biochemistry, "optimization" typically implies an improvement in yield or cost-effectiveness. In our study, we use the term to describe the process of achieving a superior balance between system simplicity and protein production. To address the reviewer's concern regarding the lack of a direct yield comparison, we have added new data in Figure S3. This figure provides a side-by-side comparison of protein yields between our simplified 7-component system and the conventional 35-component system. The results demonstrate that our system not only matches the performance of the traditional setup but frequently exceeds it in terms of final protein titer, while significantly reducing the reagent cost and preparation complexity. Thus, the simplification achieved in this work represents a true biochemical optimization of the cell-free synthesis process.

      (4) The levels of transcripts of the proteins studied were not determined in any of the experiments performed. Therefore, it is unknown whether the effects of different experimental conditions on NLuc, GFP or other protein expression are due to an effect on transcription, translation, or both.

      This is an excellent point. We performed a new set of experiments using mRNA templates instead of DNA to isolate the effects on translation (Figure S3G). Our results indicate that while the system's overall boost in NLuc expression is partially attributable to enhanced transcription efficiency, the translation machinery remains highly robust. We have updated the Results and Discussion to reflect this distinction.

      References

      Bothe, Adrian, and Nenad Ban. 2024. “A Highly Optimized Human in Vitro Translation System.” Cell Reports Methods 4 (4): 100755.

      Kigawa, T., T. Yabuki, Y. Yoshida, M. Tsutsui, Y. Ito, T. Shibata, and S. Yokoyama. 1999. “Cell-Free Production and Stable-Isotope Labeling of Milligram Quantities of Proteins.” FEBS Letters 442 (1): 15–19.

      Prinz, W. A., F. Aslund, A. Holmgren, and J. Beckwith. 1997. “The Role of the Thioredoxin and Glutaredoxin Pathways in Reducing Protein Disulfide Bonds in the Escherichia Coli Cytoplasm.” The Journal of Biological Chemistry 272 (25): 15661–67.

      Yokoyama, Jun, Takayoshi Matsuda, Seizo Koshiba, and Takanori Kigawa. 2010. “An Economical Method for Producing Stable-Isotope Labeled Proteins by the E. Coli Cell-Free System.” Journal of Biomolecular NMR 48 (4): 193–201.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors' goal was to advance the understanding of metabolic flux in the bradyzoite cyst form of the parasite T. gondii, since this is a major form of transmission of this ubiquitous parasite, but very little is understood about cyst metabolism and growth.

      Nonetheless, this is an important advance in understanding and targeting bradyzoite growth.

      Strengths:

      The study used a newly developed technique for growing T. gondii cystic parasites in a human muscle-cell myotube format, which enables culturing and analysis of cysts. This enabled screening of a set of anti-parasitic compounds to identify those that inhibit growth in both vegetative (tachyzoite) forms and bradyzoites (cysts). Three of these compounds were used for comparative Metabolomic profiling to demonstrate differences in metabolism between the two cellular forms.

      One of the compounds yielded a pattern consistent with targeting the mitochondrial bc1 complex, and suggests a role for this complex in metabolism in the bradyzoite form, an important advance in understanding this life stage.

      Weaknesses:

      Studies such as these provide important insights into the overall metabolic differences between different life stages, and they also underscore the challenge of interpreting individual patterns caused by metabolic inhibitors due to the systemic level of some of the targets, so that some observed effects are indirect consequences of the inhibitor action. While the authors make a compelling argument for focusing on the role of the bc1 complex, there are inconsistencies in some patterns that underscore the complexity of metabolic systems.

      Thank you for reviewing the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      A particular challenge in treating infections caused by the parasite Toxoplasma gondii is to target (and ultimately clear) the tissue cysts that persist for the lifetime of an infected individual. The study by Maus and colleagues leverages the development of a powerful in vitro culture system for the cyst-forming bradyzoite stage of Toxoplasma parasites to screen a compound library for candidate inhibitors of parasite proliferation and survival. They identify numerous inhibitors capable of inhibiting both the disease-causing tachyzoite and the cyst-forming bradyzoite stages of the parasite. To characterize the potential targets of some of these inhibitors, they undertake metabolomic analyses. The metabolic signatures from these analyses lead them to identify one compound (MMV1028806) that interferes with aspects of parasite mitochondrial metabolism. In the revised version of the manuscript, the authors present convincing evidence that MMV1028806 targets the mitochondrial electron transport chain (ETC) of the parasite (although they don't identify the actual target in the ETC). The revised manuscript also nicely addresses my other criticisms of the original version. Overall, the study presents an exciting approach for identifying and characterizing much-needed inhibitors for targeting tissue cysts in these parasites.

      Strengths:

      The study presents convincing proof-of-principle evidence that the myotube-based in vitro culture system for T. gondii bradyzoites can be used to screen compound libraries, enabling the identification of compounds that target the proliferation and/or survival of this stage of the parasite. The study also utilizes metabolomic approaches to characterize metabolic 'signatures' that provide clues to the potential targets of candidate inhibitors. In addition to insights into candidate bradyzoite inhibitors, the study also provides new insights into the physiological role of the mitochondrial electron transport chain of bradyzoites, and raises a host of interesting questions around the functional roles of mitochondria in this stage of the parasite.

      Weaknesses:

      In the revised manuscript, the authors have included additional oxygen consumption rate data that indicate that MMV1028806 targets the mitochondrial electron transport chain (ETC). These data are convincing. On line 481, the authors state that "treatments with ATQ, BPQ, MMV1028806, and antimycin A resulted in substantially reduced oxygen consumption levels relative to the DMSO control and suggest indeed a blockage of the mETC consistent with the inhibition of the bc1-complex." The OCR assay the authors use is still only an indirect measure of bc1 activity. Given that most OCR-inhibiting compounds in T. gondii are bc1 inhibitors, it is possible (and perhaps likely) that MMV1028806 is targeting this complex. However, the data cannot rule out that it is targeting another component of the ETC (or potentially even a TCA cycle enzyme). Without a direct test that MMV1028806 inhibits bc1 complex activity, the authors should be more cautious in their interpretation (e.g. by acknowledging the limitations of their conclusion, or acknowledging other possible targets). Similarly, the conclusion on line 622 that "... we confirmed the bc1-complex as a target" is overstating the findings. The phrasing on lines 683-695 is more appropriate: "... suggesting that it also targets complex III or a functionally linked site within the mitochondrial electron transport chain."

      We are grateful for the thorough review of the updated manuscript and the identification of the minor issues. We addressed all of them as detailed below. We also tempered our conclusions regarding the identification of the bc1-complex as a target in line 616:

      “In addition to abundance data, monitoring the incorporation of <sup>13</sup>C and <sup>15</sup>N stable isotopes from glucose and glutamine, respectively, into TCA cycle and pyrimidine biosynthesis intermediates suggest the bc1-complex as a target”

      Reviewer #3 (Public review):

      Summary:

      The authors describe an exciting screen of 400 compounds from the MMV Pathogen Box to select compounds that are effective against the bradyzoite stage of the medically important parasite Toxoplasma. This work utilises a bradyzoite culture technique that was published recently by the same group. They focused on compounds that directly affected the mitochondrial electron transport chain (mETC) bc1-complex and compared them with other bc1 inhibitors described in the literature, such as atovaquone and HDQs. They further provide metabolomics analysis of inhibited parasites, which serves to support the target and to characterise the outcome of the different inhibitors.

      Strengths:

      This work is important as, until now, there are no effective drugs that clear cysts during T. gondii infection. So, the discovery of new inhibitors that are effective against this parasite stage in culture and thus have the potential to battle chronic infection is needed. The further metabolic characterization provides indirect target validation and highlights different metabolic outcomes for different inhibitors. The latter forms the basis for new studies in the field to understand the mode of inhibition and mechanism of bc1-complex function in detail.

      The authors focused on the function of one compound, MMV1028806, that is demonstrated to have a similar metabolic outcome to buparvaquone. Furthermore, the authors evaluated the importance of ATP production in the tachyzoite and bradyzoite stages and under treatment with atovaquone/HDQs.

      Thank you for reviewing the revised manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Thanks for making appropriate updates. I believe it makes the report stronger. Just please double-check proof-reading in newly added text: for example "integration" is misspelled in Figure 4 legend (C, E).

      Typos have been corrected throughout the manuscript.

      Reviewer #2 (Recommendations for the authors):

      I congratulate the authors on an excellent study. I have several minor comments for the authors to consider before publication.

      Line 99. Schistosoma –

      Corrected

      Line 123. What was the pH of the bicarb-free RPMI medium?

      Added “at pH 7.2”

      Line 218 (and again on line 687). "RHku80" - are these just standard RH strain parasites? Or do the authors mean to imply that the ku80 gene has been knocked out in this line? If the latter, RH∆ku80 may be a better way to describe this line.

      We harmonized all mentions of this strain to RH∆ku80.

      Line 225. "Parasites were incubated in medium with one of the following treatments ..." How long were the parasites incubated in the different treatments before the plate was read? Was there any preincubation? I think not, but it would help to state this so the reader can appreciate that the effects of the compounds on OCR is likely an immediate (rather than a secondary) effect.

      This is indeed a good suggestion. There was no pre-incubation and we changed the text to: “Parasites were incubated in medium with one of the following treatments immediately before measurement: … “

      Figure S2A. Check the spelling of Toxoplasmosis.

      Done, we corrected this sentence.

      Figure S2B. do you mean 'tachyzoidal' or 'tachyzocidal'? 'bradyzoidal' or 'bradyzocidal'?

      We clarified the formulation of the legends for Fig S2.

      Figure S2D. The "Tachyzoite lowest cytotoxicity" and "Bradyzoite lowest cytotoxicity" columns are, I think, depicting compound toxicity in host cells. Would it be clearer to rename these columns relative to the host cells being tested? e.g. "HFF/KD3 myotube lowest cytotoxicity"

      Good suggestion and we changed the designation accordingly.

      Line 369. "We found that tachyzocidal, bradyzocidal and dually active compounds possess a statistically significantly higher lipophilicity and this trend appeared more accentuated for bradyzocidal and dually active compounds." Significantly higher than what? Need to be clearer about the comparison being made: i.e. to non-active compounds.

      You are correct and we corrected this sentence accordingly.

      Line 500. "we attribute these changes to inhibition of host mitochondria (Fig. 5A)." The reason for referencing Figure 5A here isn't clear. Do the authors mean to point out that host mitochondrial membrane potential is affected by compound treatment? This could be stated more clearly.

      We deleted the reference to Fig 5A. We did not systematically measure the effect of the inhibitors on the membrane potential of the host mitochondria. We also changed the sentence to emphasize the speculative nature of this assertion: “we attribute these changes to potential inhibitory effects on host mitochondria”.

      Line 840. 'hurdling mechanisms'. The authors don't explain what they mean by this expression.

      We truncated the figure title to: “Untargeted metabolomic analysis of bradyzoites treated with bc1-complex inhibitors shows an energy imbalance.”

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      By mapping H3K4me2 in mouse oocytes and pre-implantation embryos, the authors aim to elucidate how this histone modification is erased and re-established during the parental-to-zygotic transition, as well as how the reprogramming of H3K4me2 regulates gene expression and facilitates zygotic genome activation.

      Employing an improved CUT&RUN approach, the authors successfully generated H3K4me2 profiling data from a limited number of embryos. While the profiling experiments are very well executed, several weaknesses, particularly in data analysis, are apparent:

      (1) The study emphasizes H3K4me2, which often serves as a precursor to H3K4me3, a well-studied modification during early development. Analyzing the new H3K4me2 dataset alongside published H3K4me3 data is crucial for comprehensively understanding epigenetic reprogramming post-fertilization and the interplay between histone modifications. However, the current analysis is preliminary and lacks depth.

      Thank you very much for your valuable suggestions. H3K4me3 data from humans and mice have been published, and our previous data revealed the unique pattern of H3K4me3 in human oocytes and early embryos (Science. 2019 Jul 26;365(6451):353-360). This study therefore focuses mainly on the localization of H3K4me2 in mouse oocytes and preimplantation embryos, how it is erased and re-established during the mammalian parental-to-zygotic transition, and its function. The combined analysis of H3K4me2 and H3K4me3 is not our main work, but we do not rule out that comparing these two histone modifications may yield new discoveries. Our previous data suggested that H3K4me2 not only acts as a precursor of H3K4me3 but also plays an independent role.

      (2) Tranylcypromine (TCP) is known as an irreversible inhibitor of monoamine oxidase and LSD1. While the authors suggest TCP inhibits the expression of LSD2, this assertion is questionable. Given TCP's potential non-specific effects in cells, conclusions related to the experiments using TCP should be made with caution.

      Thank you for pointing this out, and we thank the reviewer again for the important suggestion. A previous study (Binda C, Valente S, Romanenghi M, Pilotto S, Cirilli R, Karytinos A, Ciossani G, Botrugno OA, Forneris F, Tardugno M, Edmondson DE, Minucci S, Mattevi A, Mai A. Biochemical, structural, and biological evaluation of tranylcypromine derivatives as inhibitors of histone demethylases LSD1 and LSD2. J Am Chem Soc. 2010 May 19;132(19):6827-33.) indicated that TCP is an irreversible inhibitor of LSD1 and LSD2 (Human LSD2/KDM1b/AOF1 Regulates Gene Transcription by Modulating Intragenic H3K4me2 Methylation, Mol Cell. 2010 Jul 30; 39(2): 222–233.). According to our data, however, the level of LSD1 was very low in early mouse embryos, so TCP mainly inhibited the function of LSD2.

      (3) Some batches of H3K4me2 antibody are known to cross-react with H3K4me3. Has the H3K4me2 antibody used in CUT&RUN been tested for such cross-reactivity? Heatmaps in the figures indeed show similar distribution for H3K4me2 and H3K4me3, further raising concerns about antibody specificity.

      We thank the reviewer for the insightful comments. The H3K4me2 antibody was purchased from Millipore (cat. 07030). Figure 2A shows the specific enrichment of H3K4me2 in promoter and distal regions. Some batches of H3K4me2 antibody are known to cross-react with H3K4me3, but the H3K4me2 antibody we used in our CUT&RUN seems to have low cross-reactivity.

      (4) Certain statements lack supporting references or figures (examples on page 9 can be found on line 245, line 254, and line 258).

      Thank you for pointing this out, and we will add references to support the statement in the paper as suggested.

      (5) Extensive language editing is recommended to clarify ambiguous sentences. Additionally, caution should be taken to avoid overstatement - most analyses in this study only suggest correlation rather than causality.

      Thank you for your kind comments. We will revise the expression in the manuscript later.

      Reviewer #2 (Public Review):

      Chong Wang et al. investigated the role of H3K4me2 during the reprogramming processes in mouse preimplantation embryos. The authors show that H3K4me2 is erased from GV to MII oocytes and re-established in the late 2-cell stage by performing Cut & Run H3K4me2 and immunofluorescence staining. Erasure and re-establishment of H3K4me2 have not been studied well, and profiling of H3K4me2 in germ cells and preimplantation embryos is valuable to understanding the reprogramming process and epigenetic inheritance.

      (1) The authors claim that the Cut & Run worked for MII oocytes, zygotes, and the 2-cell embryos. However, it is unclear if H3K4me2 is erased during the stage or if the Cut & Run did not work for these samples. To support the hypothesis of the erasure of H3K4me2, the authors conducted immunofluorescence staining, and H3K4me2 was undetected in the MII oocyte, PN5, and 2-cell stage. However, the published papers showed strong staining of H3K4me2 at the zygote stage and 2-cell stage (Ancelin et al., 2016; Shao et al., 2014). The authors need to cite these papers and discuss the contradictory findings.

      The authors used 165 MII oocytes and 190 GV oocytes for the Cut & Run. The amount of DNA in MII oocytes is halved because of the emission of the first polar body. Could this explain why there are fewer H3K4me2 peaks in MII oocytes than in GV oocytes?

      First of all, thank you for your valuable advice. The published papers showed strong staining of H3K4me2 at the zygote stage and 2-cell stage (Ancelin et al., 2016), which is interesting. We think the difference may stem from the parameters used during confocal imaging. We imaged every stage from GV to blastocyst with the same parameters. Had we imaged only the fertilized egg and the 2-cell stage, with different parameters, we might also have seen weak fluorescence at the 2-cell stage. We will cite this reference and discuss it in the resubmitted version.

      You also asked whether MII oocytes have fewer H3K4me2 peaks than GV oocytes because the MII oocyte has expelled the polar body. The logic is sound. However, the first polar body expelled at the MII stage remains within the zona pellucida, and we also collected the polar body in the CUT&RUN experiment; therefore, compared to GV, the DNA content of MII samples is not halved. After further discussion, we believe that the reduction of H3K4me2 peaks in the MII stage compared with the GV stage may be closely related to oocyte maturation. Stage-specific histone modifications allow chromatin structure to change appropriately across the different stages of meiosis. It has been confirmed that H3K4me3 gradually decreases from the GV to the MII stage during the maturation of human oocytes, whereas H3K27me3 does not change from the GV to the MII stage.

      In Figure 3C, 98% (13,183/13,428) of H3K4me2 marked genes in GV oocytes overlap with those in the 4-cell stage. Furthermore, 92% (14,049/15,112) of H3K4me2 marked genes in sperm overlap with those in the 4-cell stage. Therefore, most regions maintain germ line-derived H3K4me2 in the 4-cell stage. The authors need to clarify which regions of germ line-derived H3K4me2 are maintained or erased in preimplantation embryos. Additionally, it would be interesting to investigate which regions show the parental allele-specific H3K4me2 in preimplantation embryos since the authors used hybrid preimplantation embryos (B6 x DBA).

      Thank you very much for your suggestion. Further analysis of which regions show parental allele-specific H3K4me2 in preimplantation embryos will make the study more interesting. We will discuss this in depth in the resubmitted version.

      (2) The authors claim that Kdm1a is rarely expressed during mouse embryonic development (Figure 4A). However, the published paper showed that KDM1a is present in the zygote and 2-cell stage using immunostaining and western blotting (Ancelin et al., 2016). Additionally, this paper showed that depletion of maternal KDM1A protein results in developmental arrest at the two-cell stage, and therefore, KDM1a is functionally important in early development.

      The authors should have cited the paper and described the role of KDM1a in early embryos.

      In the analysis of this experiment, we believe that in the early embryonic development of mice, the expression of KDM1A is lower than that of KDM1B in relative terms. Similarly, the transcriptome data we cite also show that KDM1A is expressed at elevated levels during oocyte maturation and fertilization compared to immature oocytes. In addition, the effects of loss of maternal KDM1a on embryonic development were not discussed. We believe that the absence of maternal KDM1b blocks embryonic development, and we will cite and discuss the references later.

      (3) The authors used the published RNA data set and interpreted that KDM1B (LSD2) was highly expressed at the MII stage (Figure S3A). However, the heat map shows that KDM1B expression is high in growing oocytes but not at 8w_oocytes and MII oocytes. The authors need to interpret the data accurately.

      After re-checking the data, we found that there was a problem with the normalization method of our heat map, and we will re-make the heatmap and submit it in the modified version. With reference to Figure 4A, the content of Kdm1b is indeed higher than that of Kdm1a.

      (4) All embryos in the TCP group were arrested at the four-cell stage. Embryos generated from KDM1b KO females can survive until E10.5 (Ciccone et al., 2009); therefore, TCP-treated embryos show a more severe phenotype than oocyte-derived KDM1b deleted embryos. Depletion of maternal KDM1A protein results in developmental arrest at the two-cell stage (Ancelin et al., 2016). The authors need to examine whether TCP treatment affects KDM1a expression. Western blotting would be recommended to quantify the expression of KDM1A and KDM1B in the TCP-treated embryos.

      We will further mine the transcriptome data to confirm the specificity of TCP for KDM1b. In addition, treating whole fertilized eggs with TCP in this study increased the H3K4me2 content, and the resulting delay in embryo development was more pronounced than that obtained by crossing with normal paternal lines after knocking down KDM1B in the mother.

      (5) H3K4me2 is increased dramatically in the TCP-treated embryos in Figure 4 (the intensity is 1,000 times more than the control). However, the Cut & Run H3K4me2 shows that the H3K4me2 signal is increased in 251 genes and decreased in 194 genes in the TCP-treated embryos (Fold changes > 2, P < 0.01). The authors need to explain why the gain of H3K4me2 is less evident in the Cut & Run data set than in the immunofluorescence result.

      Thanks a lot for your question. In the experimental group, the fluorescence value of H3K4me2 in IF increased 1,000-fold (Figure 4E), while H3K4me2-related genes in CR showed a total of 445 up- or down-regulated changes (Figure 6A). In our opinion, immunofluorescence is a semi-quantitative analysis and cannot be compared directly with the quantitative CR analysis, because the two use different analysis models and threshold settings.

      References

      Ancelin, K., Syx, L., Borensztein, M., Ranisavljevic, N., Vassilev, I., Briseño-Roa, L., Liu, T., Metzger, E., Servant, N., Barillot, E., Chen, C.-J., Schüle, R., & Heard, E. (2016). Maternal LSD1/KDM1A is an essential regulator of chromatin and transcription landscapes during zygotic genome activation. https://doi.org/10.7554/eLife.08851.001

      Ciccone, D. N., Su, H., Hevi, S., Gay, F., Lei, H., Bajko, J., Xu, G., Li, E., & Chen, T. (2009). KDM1B is a histone H3K4 demethylase required to establish maternal genomic imprints. Nature, 461(7262), 415-418. https://doi.org/10.1038/nature08315

      Shao, G. B., Chen, J. C., Zhang, L. P., Huang, P., Lu, H. Y., Jin, J., Gong, A. H., & Sang, J. R. (2014). Dynamic patterns of histone H3 lysine 4 methyltransferases and demethylases during mouse preimplantation development. In Vitro Cellular and Developmental Biology - Animal, 50(7), 603-613. https://doi.org/10.1007/s11626-014-9741-6

      Reviewer #3 (Public Review):

      Summary:

      This study explores the dynamic reprogramming of histone modification H3K4me2 during the early stages of mammalian embryogenesis. Utilizing the advanced CUT&RUN technique coupled with high-throughput sequencing, the authors investigate the erasure and re-establishment of H3K4me2 in mouse germinal vesicle (GV) oocytes, metaphase II (MII) oocytes, and early embryos.

      Strengths:

      The findings provide valuable insights into the temporal and spatial dynamics of H3K4me2 and its potential role in zygotic genome activation (ZGA).

      Weaknesses:

      The study primarily remains descriptive at this point. It would be advantageous to conduct further comprehensive functional validation and mechanistic exploration.

      Key areas for improvement include enhancing the innovation and novelty of the study, providing robust functional validation, establishing a clear model for H3K4me2's role, and addressing technical and presentation issues. The text would benefit from the introduction of a novel conceptual framework or model that provides a clear explanation of the functional consequences and molecular mechanisms underlying H3K4me2 reprogramming in the transition from parental to early embryonic development.

      While the findings are significant, the current manuscript falls short in several critical areas. Addressing major and minor issues will significantly strengthen the study's contribution to the field of epigenetic reprogramming and embryonic development.

    1. Author response:

      eLife Assessment

      This useful study presents an improved protocol for long-term in vitro culture of Schistosoma mansoni that enables progression toward sexually dimorphic stages, representing a meaningful advance for studying parasite development and reducing reliance on animal models. The findings show that host-specific culture conditions support essential developmental and metabolic functions required for parasite maturation, although development remains delayed compared to in vivo conditions. The evidence is solid overall, but limited pairing efficiency and the absence of egg production indicate that the system does not yet fully recapitulate complete reproductive development.

      On behalf of the co-authors, we thank the three reviewers and the editors for their complimentary remarks as well as the major and minor comments and concerns. Addressing these concerns has led to revisions that improved the manuscript. In particular, further analyses have generated updated Figures 3 and 4 and Supplementary Tables S1 and S4-S6.

      Public Reviews:

      Reviewer #1 (Public review):

      Pichon, Rémi et al. describe an in vitro method for transforming Schistosoma cercariae into mature adult worms. The authors show that human serum (HS) supports parasite growth and differentiation more effectively than fetal bovine serum (FBS). They also observed differences in parasite growth and activity, with worms cultured in HS efficiently digesting human red blood cells (hRBC). Cultured worms were able to pair with ex vivo adult worms and produce eggs, indicating functional maturation suitable for downstream applications such as drug screening. While the experimental approach is comprehensive and supports the advantage of HS culture conditions, the pairing efficiency was low (≈7%) and required long culture periods (70-80 days), highlighting limitations that may affect reproducibility.

      We thank the reviewer for the positive highlights. Regarding the low in vitro pairing efficiency, we have now edited the manuscript to clarify a misleading statement related to the 7% figure. We decided to remove this value, which corresponds to the percentage of experiments in which couples were observed, because it does not accurately represent the actual number of observed worm pairs and is probably misleading. We have updated the text as follows:

      Results, lines 230 ff.:

      “While the establishment of sexual dimorphism was robust and reproducible across more than 15 independent experiments, pairing between male and female parasites was rare. Pairing was observed only in experiments lasting more than 80 days in which we were only able to observe a few couples. In addition, these pairings were temporary (Figures 6A, B; Supplementary Video S4).”

      We also agree with the reviewer that the extended culture periods required to obtain fully sexually dimorphic parasites remain a limitation. As elaborated in Discussion (see below), key factors, probably derived from the host, are missing in the in vitro system explaining both the slow in vitro development and low rate of spontaneous pairing between in vitro developed, sexually dimorphic male and female worms. This was discussed as follows (lines 340-343): “That said, while our system was highly efficient in producing sexually dimorphic worms, spontaneous pairing between male and female parasites was extremely rare, mainly in aged in vitro cultures (from 80 to 100 days in culture) indicating that other factors, e.g., cholesterol, may be missing[35].”

      A major strength of the study, in particular, is that the authors clearly differentiate the effects of FBS versus HS on developmental progression. The conversion rate observed in HS cultures is significant and consistent with previously published data.

      While the study has several strengths, some aspects of the work are not fully explored. In particular, the role of hRBC supplementation requires further clarification. Although HS-cultured worms were shown to digest hRBC more readily, the implications of this observation remain unclear. Specifically, it would be useful to understand whether hRBC supplementation influences (1) long-term culture stability, (2) molecular pathways associated with development and differentiation, or (3) the pairing capacity of the worms. While addressing these questions may not be the main objective of the study, further discussion of these points would strengthen the manuscript.

      We agree that deciphering the role of human Red Blood Cell (hRBC) supplementation is critical. Regarding the influence of hRBCs on long-term culture stability and parasite development, it has been well established for more than four decades that schistosomes need red blood cells to grow in culture [Basch, P. F. Cultivation of Schistosoma mansoni in vitro. II. production of infertile eggs by worm pairs cultured from cercariae. J Parasitol 67, 186-190 (1981); Basch, P. F. Cultivation of Schistosoma mansoni in vitro. I. Establishment of cultures from cercariae and development until pairing. J. Parasitol. 67, 179-185 (1981)]. The molecular pathways that underlie development, sexual differentiation and pairing, and that are modulated by hRBCs in culture, are currently being investigated by our team. We decided not to include these data and analyses in the current manuscript, as they fall outside its scope.

      The manuscript is clearly written and represents a valuable contribution to the field. Overall, the experimental approach is sound, and the results support a useful methodological framework for the in vitro culture of Schistosoma worms and the attainment of sexual maturity, particularly for adult male worms.

      We thank the reviewer for highlighting the manuscript’s strengths.

      Reviewer #2 (Public review):

      Summary:

      The authors perform confirmation studies of Paul Basch's seminal schistosome work from 1981, demonstrating the development of transformed schistosomules into sexually dimorphic adult parasites, albeit without successful egg production. In addition to the findings from Basch's earlier work, the authors add some new molecular data in the form of an analysis of proliferative cells in in-vitro-derived animals.

      Strengths:

      The authors successfully confirm experimental results from earlier schistosome researchers, providing a potential new tool for studying schistosome biology without the need for vertebrate hosts.

      We thank the reviewer for highlighting the manuscript’s strengths.

      Weaknesses:

      The display of data from the authors is sometimes difficult to follow/understand where it comes from. For example:

      (1) Line 136: The authors claim that parasites in HS and FBS conditions have substantially different mortality rates (11.3 +/- 2.7 vs 5 +/- 2.3) but a quite high p-value (0.8). Analyzing the raw data myself, I obtained a mean of 8.2 +/- 1.7% vs 4.8% +/- 4.3% with a p-value of 0.15. Either the data are not clearly presented, and I did not follow them, or the data presented in the text do not match the raw data in the supplemental files.

      We thank the reviewer for pointing this out; we have now edited Supplementary Tables S1 and S6 by turning them into a long format for the sake of clarity. Accordingly, Results, Methods sections, and indicated supplementary tables were edited as follows:

      Results, lines 142 ff.:

      “No morphological differences were observed between parasites cultured either in FBS or HS within the first week in culture; in both conditions most parasites were classified as early schistosomula [category 1: 76% ± 30 (average ± SD) in FBS and 73% ± 29 (average ± SD) in HS] with few lung (category 2) and early liver schistosomula (category 3) (Figure 1B, week 1; Supplementary Figure S1). The mean mortality (category 0) at week 1 was slightly higher, but not statistically significant (P= 0.42), in worms cultured in HS [9.75% ± 2.76 (average ± SD)] compared to the mortality registered in FBS-cultured parasites [5.52% ± 5.18 (average ± SD), Supplementary Table S6], consistent with previous findings[39].”

      Methods, lines 463-465:

      “To evaluate differences in mortality between HS- and FBS-cultured parasites, data from 5 experiments were combined and analysed using a Shapiro-Wilk normality test to test normality of the data and a non-parametric Wilcoxon rank sum exact test (Supplementary Tables S1 and S6).”

      Supplementary Tables:

      Supplementary Table S1. “Raw counts of parasites within each developmental stage category. Each row corresponds to a picture of parasites in culture medium containing FBS or HS. Each column corresponds to the raw parasite counts at indicated stage development (categories 0 to 5), time in culture (Time in days - D), and experimental condition.”

      Supplementary Table S6. “Summary of all statistical tests employed in this study. 1. Statistical tests of parasite mortality and the raw data table used for this test. 2. Statistical tests for worm size comparisons (correspond to Figure 2). 3. Statistical tests for worm black gut comparisons (correspond to Figure 3). BG: Black gut. 4. Statistical tests for EdU positive cells comparisons (correspond to Figure 4). Replicate code: E, M and L correspond to day 2, 8 and 15 respectively; R and W correspond to the presence (R) or absence (W) of RBCs added 13 days after transformation.”

      For clarity, in Author response image 1 we provide the R script used to perform the statistical tests on the data shown in Supplementary Table S6 (column "Raw count of parasite developmental category per image and experiment").

      Author response image 1.
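
      The R script is provided only as an image. As a rough illustration of the second step described in the Methods above (the non-parametric Wilcoxon rank sum exact test), the following Python sketch computes a two-sided exact p-value by enumerating every possible assignment of ranks to the first group. It is not the authors' script: the function names are hypothetical, the Shapiro-Wilk step is omitted, the two-sided p-value is defined here as the share of assignments at least as far from the null expectation as the observed one (which may differ from R's output when ties are present), and the mortality values are placeholders, not the study's data.

```python
from itertools import combinations


def midranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_exact(x, y):
    """Return (rank sum of x, two-sided exact p-value) by full enumeration."""
    ranks = midranks(list(x) + list(y))
    n = len(x)
    observed = sum(ranks[:n])
    expected = n * (len(ranks) + 1) / 2  # mean rank sum of x under the null
    deviation = abs(observed - expected)
    sums = [sum(c) for c in combinations(ranks, n)]
    extreme = sum(1 for s in sums if abs(s - expected) >= deviation - 1e-9)
    return observed, extreme / len(sums)


# Placeholder per-experiment mortality percentages (hypothetical values).
hs_mortality = [12.0, 9.5, 7.0, 11.0, 8.5]
fbs_mortality = [3.0, 10.0, 1.5, 6.0, 4.0]
w, p = rank_sum_exact(hs_mortality, fbs_mortality)
print(f"rank sum (HS) = {w}, exact two-sided p = {p:.3f}")
```

      With five experiments per condition there are only 252 possible rank assignments, so full enumeration is cheap, and it also shows why small samples cannot produce very small p-values with this test.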

      (2) Line 187/Figure 4: Though it is not clearly stated, it appears that the authors treat their EdU counts as an ordinal data set of 61 steps (from 0 to >60) rather than a continuous measure of EdU+ cells per animal. In this author's opinion, the graph strongly suggests a continuous data set, and the fact that this reviewer had to dig through poorly-labeled raw data to discover the nature of the data is problematic. The authors should either switch to a continuous data set or make it explicit that the data shown are ordinal. If counting EdU+ cells is too arduous, the authors could consider comparing the amount of EdU+ area to the amount of DAPI+ area in maximum intensity projections of their confocal images, as this would roughly approximate the amount of proliferative cells in the animals.

      As the reviewer correctly pointed out, the data were treated as ordinal because counting worms with more than 60 EdU+ cells became extremely difficult and highly inaccurate. Therefore, we decided to group in a single category, “60 EdU+ cells”, all worms showing more than 60 EdU+ cells. We have now updated Figure 4, where medians are shown instead of mean values; Supplementary Table S5, to provide more comprehensive access to the raw counts; and Supplementary Table S6, to indicate that the data for EdU+ cells per worm were considered ordinal. Accordingly, we have revised the corresponding sections as follows:

      Results, lines 211 ff.:

      “HS-cultured schistosomula showed higher numbers of proliferating stem cells, with a median of >48 and >60 EdU+ cells per worm at days 8 and 15, respectively (Figure 4). On the other hand, most FBS-cultured parasites displayed no more than an average of 20 EdU+ cells per worm (Figure 4).”

      Methods, lines 520 ff.:

      “EdU+ cells per parasite were counted for an average of 100 parasites across three independent experiments (Supplementary Table S5). Worms were grouped based on the number of cells per individual, but all those showing ≥ 60 EdU+ cells were counted in the same group named ‘60 EdU+ cells’. Therefore, the data were considered ordinal data. Statistical analysis was performed by Kruskal-Wallis test with Dunn multiple comparison post-hoc test, with P≤0.05 considered significant (Supplementary Table S6).”

      Figure 4 legend, lines 830 ff.:

      “A. Violin plots showing the number of EdU+ cells per worm at indicated time points (2, 8, and 15 days post cercarial transformation) in parasites cultured either in Foetal Bovine Serum (FBS, blue) or Human Serum (HS, light brown). Human Red Blood Cells (hRBCs) were added in the culture at day 13 post cercarial transformation. The small black dots indicate individual worms, and the big black point indicates the median of EdU+ cells per worm. All worms showing ≥ 60 EdU+ cells were counted and clustered together in the group named ‘60 EdU+ cells’. Hence, the data were treated as ordinal and statistical analysis performed by Kruskal-Wallis test with Dunn multiple comparison post-hoc test, with P≤0.05 (*) considered significant (Supplementary Tables S5 and S6).”

      We thank the reviewer for the very interesting suggestion to quantify cell proliferation by calculating the ratio of EdU+ area to DAPI+ area in maximum intensity projection images. Measuring the fluorescence area for each worm in maximum projection is an excellent idea; however, given the number of EdU+ cells present in some samples, we think this technique would not provide additional information or more detailed data than our analysis when the number of EdU+ cells exceeds 60 per worm. We will certainly consider this approach for future studies.
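
      The pooling of high counts into a single top category, and the switch from means to medians, can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the function names and the per-worm counts are hypothetical and are not taken from Supplementary Table S5.

```python
from statistics import mean, median

CAP = 60  # counts at or above this value are pooled into one top category


def cap_counts(counts, cap=CAP):
    """Replace every count at or above the cap with the cap itself."""
    return [min(count, cap) for count in counts]


def capped_median(counts, cap=CAP):
    """Return (median of capped counts, whether the median sits at the cap)."""
    value = median(cap_counts(counts, cap))
    return value, value >= cap


# Hypothetical per-worm EdU+ counts; two worms exceed the cap.
true_counts = [10, 25, 48, 75, 120]
capped = cap_counts(true_counts)

# The median is unchanged by capping as long as it falls below the cap,
# whereas the mean of the capped data underestimates the true mean.
print("median, true vs capped:", median(true_counts), median(capped))
print("mean, true vs capped:", mean(true_counts), mean(capped))
```

      When the median itself lands at the cap, it can only be reported as a lower bound, which is consistent with the “>60” median quoted for HS-cultured worms at day 15.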

      There are some minor issues as well:

      (1) Line 122: It is perhaps incorrect to refer to humans as "the" definitive host of schistosomes, as S. japonicum is primarily considered a zoonotic infection with water buffalo/cows being the primary definitive host.

      We thank the reviewer for pointing this out; we have now replaced “schistosomes” with “Schistosoma mansoni” (current line 131).

      (2) Line 185/298: The authors refer to EdU pulse-chase experiments, but the experiments described here are EdU pulse experiments.

      This is a very good point. We thank the reviewer for bringing this up and have accordingly replaced “EdU pulse-chase” with “EdU pulse” experiments in lines 37, 204, and 321.

      Reviewer #3 (Public review):

      Summary:

      This study is significant as it established a protocol for the long-term culture of Schistosoma mansoni newly transformed cercariae, which developed in vitro into sexually dimorphic forms. The impact of two different sera, Fetal Bovine Serum (FBS) and Human Serum (HS), added to the culture medium supplemented with human red blood cells was evaluated. The authors demonstrated that HS-cultured parasites were able to digest red blood cells, a critical step for long-term parasite development. Furthermore, while most FBS-cultured parasites did not progress beyond an early liver stage, sexual dimorphism was clearly evident in the HS-cultured worms, albeit delayed compared to in vivo development.

      Strengths:

      This study could contribute to further in vitro studies for a better understanding of the unique sexual biology of Schistosoma mansoni and for screening novel schistosomicidal compounds. By increasing parasite development in in vitro studies, this protocol could have a positive impact on the principles of the 3Rs (Replacement, Reduction and Refinement) for animal research.

      We thank the reviewer for highlighting the manuscript’s strengths.

      Weaknesses:

      As the authors mentioned, "pairing between male and female parasites was rare. Pairing was observed in approximately ~7% of the experiments, usually after day ~ 80 in culture." Egg production was also not achieved with this protocol.

      Following the reviewer’s point, and to clarify a misleading statement, we have decided to remove the value of 7%, which corresponds to the percentage of experiments in which couples were observed. This value does not accurately reflect the actual number of observed worm pairs and is probably misleading. We have updated the text as follows:

      Results, lines 230 ff.:

      “While the establishment of sexual dimorphism was robust and reproducible across more than 15 independent experiments, pairing between male and female parasites was rare. Pairing was observed only in experiments lasting more than 80 days in which we were only able to observe a few couples. In addition, these pairings were temporary (Figures 6A, B; Supplementary Video S4).”

    1. In 2011, a group on 4chan started spreading a plan for making a “Forever Alone Involuntary Flashmob.” You can see their instructions below:

      image in the form of a sort of flier with meme faces of foreveralone and trolls. The text reads: How to make your very own Forever Alone Involuntary Flashmob. 1. create fake online dating profile as mildly cute woman from NYC - just use somechicks facebook to get several believable pics etc. etc. 2. find forever alone guys from NYC on dating site, get them to believe you're interested. 3. Once forever alone guy takes the bait, suggest you meet for a date at this time and location: Pay phones outside 47th Digital store 46th st * Broadway NEW YORK 7:30PM Friday 13th May 2011. 4. watch angry alone guys flashmob rage at earthcam.com/usa/newyork/timessquare/ (select Camera 2). Also Remember: This will only work if we keep spreading these instructions and actually get involved. There is no limit to how many fake profiles and people you can trick or method used. Take time and prepare - think smart, if they suspect you're pushing the time and date too hard it aint gonna work.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript by Lin et al. presents a timely, technically strong study that builds patient-specific midbrain-like organoids (MLOs) from hiPSCs carrying clinically relevant GBA1 mutations (L444P/P415R and L444P/RecNcil). The authors comprehensively characterize nGD phenotypes (GCase deficiency, GluCer/GluSph accumulation, altered transcriptome, impaired dopaminergic differentiation), perform CRISPR correction to produce an isogenic line, and test three therapeutic modalities (SapC-DOPS-fGCase nanoparticles, AAV9-GBA1, and SRT with GZ452). The model and multi-arm therapeutic evaluation are important advances with clear translational value.

      My overall recommendation is that the work undergo a major revision to address the experimental and interpretive gaps listed below.

      Strengths:

      (1) Human, patient-specific midbrain model: Use of clinically relevant compound heterozygous GBA1 alleles (L444P/P415R and L444P/RecNciI) makes the model highly relevant to human nGD and captures patient genetic context that mouse models often miss.

      (2) Robust multi-level phenotyping: Biochemical (GCase activity), lipidomic (GluCer/GluSph by UHPLC-MS/MS), molecular (bulk RNA-seq), and histological (TH/FOXA2, LAMP1, LC3) characterization are thorough and complementary.

      (3) Use of isogenic CRISPR correction: Generating an isogenic line (WT/P415R) and demonstrating partial rescue strengthens causal inference that the GBA1 mutation drives many observed phenotypes.

      (4) Parallel therapeutic testing in the same human platform: Comparing enzyme delivery (SapC-DOPS-fGCase), gene therapy (AAV9-GBA1), and substrate reduction (GZ452) within the same MLO system is an elegant demonstration of the platform's utility for preclinical evaluation.

      (5) Good methodological transparency: Detailed protocols for MLO generation, editing, lipidomics, and assays allow reproducibility.

      Weaknesses:

      (1) Limited genetic and biological replication

      (a) Single primary disease line for core mechanistic claims. Most mechanistic data derive from GD2-1260 (L444P/P415R); GD2-10-257 (L444P/RecNciI) appears mainly in therapeutic experiments. Relying primarily on one patient line risks conflating patient-specific variation with general nGD mechanisms.

      We thank the reviewer for highlighting the importance of genetic and biological replication. An additional patient-derived iPSC line was included in the manuscript; our study therefore includes two independent nGD patient-derived iPSC lines, GD2-1260 (GBA1<sup>L444P/P415R</sup>) and GD2-10-257 (GBA1<sup>L444P/RecNciI</sup>), both of which carry severe mutations associated with nGD. These two lines represent distinct genetic backgrounds and were used to demonstrate the consistency of key disease phenotypes (reduced GCase activity, elevated substrate, impaired dopaminergic neuron differentiation, etc.) across MLOs from different patients. Major experiments (e.g., GCase activity assays, substrate quantification, immunoblotting for the DA marker TH, and therapeutic testing with SapC-DOPS-fGCase and AAV9-GBA1) were performed using both patient lines, with results showing consistent phenotypes and therapeutic responses (see Figs. 2-6 and Supplementary Figs. 4-5). To ensure clarity and transparency, a new Supplementary Table 2 summarizes the characterization of both the GD2-1260 and GD2-10-257 lines.

      (b) Unclear biological replicate strategy. It is not always explicit how many independent differentiations and organoid batches were used (biological replicates vs. technical fields of view).

      Biological replication was ensured in our study by conducting experiments in at least 3 independent differentiations per line, and technical replicates (multiple organoids/fields per batch) were averaged accordingly. We have clarified biological replicates and differentiation in the figure legends.

      (c) A significant disadvantage of employing brain organoids is the heterogeneity during induction and potential low reproducibility. In this study, it is unclear how many independent differentiation batches were evaluated and, for each test (for example, immunofluorescent stain and bulk RNA-seq), how many organoids from each group were used. Please add a statement accordingly and show replicates to verify consistency in the supplementary data.

      In the revision, we have clarified biological replicates and differentiations in the figure legends for Fig. 1E; Fig. 2B, 2G; Fig. 3F, 3G; Fig. 4B-C, E, H-J, M-N; Fig. 6D; and Fig. 7A-C, I.

      (d) Isogenic correction is partial. The corrected line is WT/P415R (single-allele correction); residual P415R complicates the interpretation of "full" rescue and leaves open whether the remaining pathology is due to incomplete correction or clonal/epigenetic effects.

      We attempted to generate an isogenic iPSC line by correcting both GBA1 mutations (L444P and P415R). However, this was not feasible because GBA1 overlaps with a highly homologous pseudogene (PGBA), which makes precise editing technically challenging. Consequently, only the L444P mutation was successfully corrected, and the resulting isogenic line retains the P415R mutation in a heterozygous state. Because Gaucher disease is an autosomal recessive disorder, individuals carrying a single GBA1 mutation (heterozygous carriers) do not develop clinical symptoms. Therefore, the partially corrected isogenic line, which retains only the P415R allele, represents a clinically relevant carrier model. Consistent with this, our results show that GCase activity was restored to approximately 50% of wild-type levels (Fig.4B-C), supporting the expected heterozygous state. These findings also make it unlikely that the remaining differences observed are due to clonal variation or epigenetic effects.

      (e) The authors tested week 3, 4, 8, 15, and 28 old organoids in different settings. However, systematic markers of maturation should be analyzed, and different maturation stages should be compared, for example, comparing week 8 organoids to week 28 organoids, with immunofluorescent marker staining and bulk RNAseq.

      We agree that a systematic analysis of maturation stages is essential for validating the MLO model. Our data integrate a longitudinal comparison across multiple developmental windows (Weeks 3 to 28) to characterize the transition from progenitors to mature/functional states for nGD phenotyping and evaluation of therapeutic modalities: 1) DA differentiation (Wks 3 and 8 in Fig. 3): qPCR analysis demonstrated the progression of DA-specific programs. We observed a steady increase in the mature DA neuron marker TH and in ASCL1. This was accompanied by a gradual decrease in the early floor plate/progenitor markers FOXA2 and PLZF, indicating a successful differentiation path from progenitors to differentiated/mature DA neurons. 2) Glycosphingolipid substrate accumulation (Wks 15 and 28 in Fig. 2): To assess late-stage nGD phenotypes, we compared GluCer and GluSph at Week 15 and Week 28. This comparison highlights the progressive accumulation of substrates in nGD MLOs, reflecting the metabolic consequences of the disease at different stages of maturation. 3) Organoid growth dynamics (Wks 4, 8, and 15 in new Fig. 4): The new Fig. 4 tracks physical maturation through organoid size and growth rates across three key time points, providing a macro-scale verification of consistent development between WT and nGD groups. By comparing these early (Wk 3-8) and late (Wk 15-28) stages, we confirmed that our MLOs transition from a proliferative state to a post-mitotic, specialized neuronal state, which satisfies the requirement for comparing distinct maturation stages.

      (f) The manuscript frequently refers to Wnt signaling dysregulation as a major finding. However, experimental validation is limited to transcriptomic data. Functional tests, such as the use of Wnt agonist/inhibitor, are needed to support this claim (see below).

      We agree that the suggested experiments could provide additional mechanistic insights into this study and will consider them in future work.

      (g) Suggested fixes / experiments

      Add at least one more independent disease hiPSC line (or show expanded analysis from GD2-10-257) for key mechanistic endpoints (lipid accumulation, transcriptomics, DA markers).

      MLOs derived from an additional iPSC line, GD2-10-257, were included in the manuscript. This was addressed above [see response to Weaknesses (1)-a].

      Generate and analyze a fully corrected isogenic WT/WT clone (or a P415R-only line) if feasible; at minimum, acknowledge this limitation more explicitly and soften claims.

      We attempted to generate an isogenic iPSC line by correcting both GBA1 mutations (L444P and P415R). However, this was unsuccessful because the GBA1 gene overlaps with a pseudogene (PGBA), located 16 kb downstream of GBA1, which shares 96–98% sequence similarity with GBA1 (Refs. #1, #2); this complicates precise editing. GBA1 is shorter (~5.7 kb) than PGBA (~7.6 kb). The primary exonic difference between GBA1 and PGBA is a 55-bp deletion in exon 9 of the pseudogene. As a result, the isogenic line we obtained carries only the P415R mutation, and L444P was corrected to the normal sequence. We have included this limitation in the Methods as “This gene editing strategy is expected to also target the GBA1 pseudogene due to the identical target sequence, which limits the gene correction on certain mutations (e.g., P415R)”.

      References:

      (1) Horowitz M., Wilder S., Horowitz Z., Reiner O., Gelbart T., Beutler E. The human glucocerebrosidase gene and pseudogene: structure and evolution. Genomics (1989). 4, 87–96. doi:10.1016/0888-7543(89)90319-4

      (2) Woo EG, Tayebi N, Sidransky E. Next-Generation Sequencing Analysis of GBA1: The Challenge of Detecting Complex Recombinant Alleles. Front Genet. (2021). 12:684067. doi: 10.3389/fgene.2021.684067. PMCID: PMC8255797.

      Report and increase independent differentiations (N = biological replicates) and present per-differentiation summary statistics.

      This was addressed above [see response to Weaknesses (1)-b, (1)-c].

      (2) Mechanistic validation is insufficient

      (a) RNA-seq pathways (Wnt, mTOR, lysosome) are not functionally probed. The manuscript shows pathway enrichment and some protein markers (p-4E-BP1) but lacks perturbation/rescue experiments to link these pathways causally to the DA phenotype.

      (b) Autophagy analysis lacks flux assays. LC3-II and LAMP1 are informative, but without flux assays (e.g., bafilomycin A1 or chloroquine), one cannot distinguish increased autophagosome formation from decreased clearance.

      (c) Dopaminergic dysfunction is superficially assessed. Dopamine in the medium and TH protein are shown, but no neuronal electrophysiology, synaptic marker co-localization, or viability measures are provided to demonstrate functional recovery after therapy.

      (d) Suggested fixes / experiments - Perform targeted functional assays:

      (i) Wnt reporter assays (TOP/FOP flash) and/or treat organoids with Wnt agonists/antagonists to test whether Wnt modulation rescues DA differentiation.

      (ii) Test mTOR pathway causality using mTOR inhibitors (e.g., rapamycin) or 4E-BP1 perturbation and assay effects on DA markers and autophagy.

      Include autophagy flux assessment (LC3 turnover with bafilomycin), and measure cathepsin activity where relevant.

      Add at least one functional neuronal readout: calcium imaging, MEA recordings, or synaptic marker quantification (e.g., SYN1, PSD95) together with TH colocalization.

      We thank the reviewer for these valuable suggestions. We agree that the suggested experiments could provide additional mechanistic insights into this study and will consider them in future work. Importantly, the primary conclusions of our manuscript, that GBA1 mutations in nGD MLOs resulted in nGD pathologies such as diminished enzymatic function, accumulation of lipid substrates, widespread transcriptomic changes, and impaired dopaminergic neuron differentiation, which can be corrected by several therapeutic strategies in this study, are supported by the evidence presented. The suggested experiments represent an important direction for future research using brain organoids.

      (3) Therapeutic evaluation needs greater depth and standardization

      (a) Short windows and limited durability data. SapC-DOPS and AAV9 experiments range from 48 hours to 3 weeks; longer follow-up is needed to assess durability and whether biochemical rescue translates into restored neuronal function.

      We agree with the reviewer. Because this is a proof-of-principle study, the treatment was designed within a short time window. Long-term studies with more comprehensive outcome assessments will be conducted in future work.

      (b) Dose-response and biodistribution are under-characterized. AAV injection sites/volumes are described, but transduction efficiency, vg copies per organoid, cell-type tropism quantification, and SapC-DOPS penetration/distribution are not rigorously quantified.

      We appreciate the reviewer’s concerns. This study was intended to demonstrate the feasibility and initial response of MLOs to AAV therapy. A comprehensive evaluation of AAV biodistribution will be considered in future studies.

      The penetration and distribution of SapC-DOPS have been extensively characterized in prior studies. In vivo biodistribution of SapC-DOPS coupled to CellVue Maroon, a fluorescent cargo, was examined in mice bearing human tumor xenografts using real-time fluorescence imaging, where CellVue Maroon fluorescence in the tumor remained for 48 hours (Ref. #3: Fig. 4B, mouse 1), 100 hours (Ref. #4: Fig. 5), and up to 216 hours (Ref. #5: Fig. 3). Uptake kinetics were also demonstrated in cells, with flow cytometry quantification showing that SapC-DOPS nanovesicles coupled to fluorescent cargo were incorporated into human brain tumor cell membranes within minutes and remained stably incorporated into the cells for up to one hour (Ref. #6: Fig. 1a and Fig. 1b). Building on these findings, the present study focuses on evaluating the restoration of GCase function rather than reexamining biodistribution and uptake kinetics.

      References:

      (3) X. Qi, Z. Chu, Y.Y. Mahller, K.F. Stringer, D.P. Witte, T.P. Cripe. Cancer-selective targeting and cytotoxicity by liposomal-coupled lysosomal saposin C protein. Clin. Cancer Res. (2009) 15, 5840-5851. PMID: 19737950.

      (4) Z. Chu, S. Abu-Baker, M.B. Palascak, S.A. Ahmad, R.S. Franco, and X. Qi. Targeting and cytotoxicity of SapC-DOPS nanovesicles in pancreatic cancer. PLOS ONE (2013) 8, e75507. PMID: 24124494.

      (5) Z. Chu, K. LaSance, V.M. Blanco, C-H. Kwon, B. Kaur, M. Frederick, S. Thornton, L. Lemen, and X. Qi. Multi-angle rotational optical imaging of brain tumors and arthritis using fluorescent SapC-DOPS nanovesicles. J. Vis. Exp. (2014) 87, e51187, 1-7. PMID: 24837630.

      (6) J. Wojton, Z. Chu, C-H. Kwon, L.M.L. Chow, M. Palascak, R. Franco, T. Bourdeau, S. Thornton, B. Kaur, and X. Qi. Systemic delivery of SapC-DOPS has antiangiogenic and antitumor effects against glioblastoma. Mol. Ther. (2013) 21, 1517-1525. PMID: 23732993.

      (c) Specificity controls are missing. For SapC-DOPS, inclusion of a non-functional enzyme control (or heat-inactivated fGCase) would rule out non-specific nanoparticle effects. For AAV, assessment of off-target expression and potential cytotoxicity is needed.

      Including inactive fGCase would confound the assessment of fGCase in MLOs by immunoblot and immunofluorescence; therefore, saposin C–DOPS was used as the control instead.

      We agree that assessment of off-target expression and potential cytotoxicity for AAV is important; this will be included in future studies.

      (d) Comparative efficacy lacking. It remains unclear which modality is most effective in the long term and in which cellular compartments.

      To address this comment, we have added a new table (Supplementary Table 2) comparing the four therapeutic modalities and summarizing their respective outcomes. While this study focused on short-term responses as a proof-of-principle, future work will explore long-term therapeutic effects.

      (e) Suggested fixes/experiments

      Extend follow-up (e.g., 6+ weeks) after AAV/SapC dosing and evaluate DA markers, electrophysiology, and lipid levels over time.

      We appreciate the reviewer’s suggestions. The therapeutic testing in patient-derived MLOs was designed as a proof-of-principle study to demonstrate feasibility and the primary response (rescue of GCase function) to the treatment. A comprehensive, long-term therapeutic evaluation of AAV and SapC-DOPS-fGCase is indeed important for a complete assessment; however, this represents a separate therapeutic study and is beyond the scope of the current work.

      Quantify AAV transduction by qPCR for vector genomes and by cell-type quantification of GFP+ cells (neurons vs astrocytes vs progenitors).

      For the AAV-treated experiments, we agree that measuring AAV copy number and GFP expression would provide additional information. However, the primary goal of this study was to demonstrate the key therapeutic outcome, rescue of GCase function by AAV-delivered normal GCase, which is directly relevant to the treatment objective.

      Include SapC-DOPS control nanoparticles loaded with an inert protein and/or fluorescent cargo quantitation to show distribution and uptake kinetics.

      As noted above [see response to Weakness (3)-c], using inert GCase would confound the assessment of fGCase uptake in MLOs; therefore, it was not suitable for this study. See response above for the distribution and uptake kinetics of SapC-DOPS [see response to Weaknesses (3)-b].

      Provide head-to-head comparative graphs (activity, lipid clearance, DA restoration, and durability) with statistical tests.

      We have added a new table (Supplementary Table 2) providing a head-to-head comparison of the treatment effects.

      (4) Model limitations not fully accounted for in interpretation

      (a) Absence of microglia and vasculature limits recapitulation of neuroinflammatory responses and drug penetration, both of which are important in nGD. These absences could explain incomplete phenotypic rescues and must be emphasized when drawing conclusions about therapeutic translation.

      We agree that the absence of microglia and vasculature in midbrain-like organoids represents a limitation, as we have discussed in the manuscript. In this revision, we highlighted this limitation in the Discussion section and clarified that it may contribute to incomplete phenotyping and phenotypic rescue observed in our therapeutic experiments. Additionally, we have outlined future directions to incorporate microglia and vascularization into the organoid system to better recapitulate the in vivo environment and improve translational relevance (see 7th paragraph in the Discussion).

      (b) Developmental vs degenerative phenotype conflation. Many phenotypes appear during differentiation (patterning defects). The manuscript sometimes interprets these as degenerative mechanisms; the distinction must be clarified.

      We appreciate the reviewer’s comments. In the revised manuscript, we have clarified that certain abnormalities, such as patterning defects observed during early differentiation, likely reflect developmental consequences of GBA1 mutations rather than degenerative processes. Conversely, phenotypes such as substrate accumulation, lysosomal dysfunction, and impaired dopaminergic maturation at later stages are interpreted as degenerative features. We have updated the Results and Discussion sections to avoid conflating developmental defects with neurodegenerative mechanisms.

      (c) Suggested fixes

      Tone down the language throughout (Abstract/Results/Discussion) to avoid overstatement that MLOs fully recapitulate nGD neuropathology.

      The manuscript has been revised to avoid overstatements.

      Add plans or pilot data (if available) for microglia incorporation or vascularization to indicate how future work will address these gaps.

      The manuscript now includes further plans to address the incorporation of microglia and vascularization, described in the last two paragraphs of the Discussion. A pilot study of microglia incorporation will be reported when it is completed.

      (5) Statistical and presentation issues

      (a) Missing or unclear sample sizes (n). For organoid-level assays, report the number of organoids and the number of independent differentiations.

      We have clarified biological replicates and differentiation in the figure legend [see response to Weaknesses (1)-b, (1)-c].

      (b) Statistical assumptions not justified. Tests assume normality; where sample sizes are small, consider non-parametric tests and report exact p-values.

      We have updated the Statistical analysis section in the Methods as described below:

      For comparisons between two groups, data were analyzed using unpaired two-tailed Student’s t-tests when the sample size was ≥6 per group and normality was confirmed by the Shapiro-Wilk test. When the normality assumption was not met or when sample sizes were small (n < 6), the non-parametric Mann-Whitney U test was used instead. For comparisons involving three or more groups, one-way ANOVA followed by Tukey’s multiple comparison test was applied when data were normally distributed; otherwise, the nonparametric Dunn’s multiple comparison test was used. Exclusion of outliers was made based on cutoffs of the mean ±2 standard deviations. All statistical analyses were performed using GraphPad Prism 10 software. Exact p-values are reported throughout the manuscript and figures where feasible. A p-value < 0.05 was considered statistically significant.
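
      The selection rule in the paragraph above can be sketched in a few lines of Python. This is only an illustration of the stated logic, not the authors' analysis (which was run in GraphPad Prism 10). The function names `choose_test` and `exclude_outliers` are hypothetical, the normality result (e.g. from a Shapiro-Wilk test) is passed in as a boolean rather than computed, and the use of the sample standard deviation for the ±2 SD cutoff is an assumption.

      ```python
      from statistics import mean, stdev

      MIN_N_PARAMETRIC = 6  # per-group threshold stated in the Methods text


      def choose_test(groups, all_normal):
          """Return the name of the test the stated rule would select.

          groups: list of lists of measurements, one list per group.
          all_normal: True if every group passed a normality test
                      (assumed to be computed elsewhere).
          """
          if len(groups) < 2:
              raise ValueError("need at least two groups to compare")
          if len(groups) == 2:
              large_enough = all(len(g) >= MIN_N_PARAMETRIC for g in groups)
              if large_enough and all_normal:
                  return "unpaired two-tailed Student's t-test"
              return "Mann-Whitney U test"
          # Three or more groups.
          if all_normal:
              return "one-way ANOVA with Tukey's multiple comparison test"
          return "Dunn's multiple comparison test"


      def exclude_outliers(values, n_sd=2.0):
          """Keep values within mean +/- n_sd sample standard deviations."""
          if len(values) < 2:
              return list(values)
          m, s = mean(values), stdev(values)
          return [v for v in values if abs(v - m) <= n_sd * s]
      ```

      Note that under this rule a two-group comparison with n < 6 falls back to the Mann-Whitney U test even when the data look normal, which matches the small-sample caution raised by the reviewer.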

      (c) Quantification scope. Many image quantifications appear to be from selected fields of view, which are then averaged across organoids and differentiations.

      In this work, quantitative immunofluorescence analyses (e.g., cell counts for FOXP1+, FOXG1+, SOX2+ and Ki67+ cells, as well as marker colocalization) were performed on at least 3–5 randomly selected non-overlapping fields of view (FOVs) per organoid section, with a minimum of 3 organoids per differentiation batch. Each FOV was imaged at consistent magnification (60x) and z-stack depth to ensure comparable sampling across conditions. Data from individual FOVs were first averaged within each organoid to obtain an organoid-level mean, and then biological replicates (independent differentiations, n ≥ 3) were averaged to generate the final group mean ± SEM. This multilevel averaging approach minimizes bias from regional heterogeneity within organoids and accounts for variability across differentiations. Representative confocal images shown in the figures were selected to accurately reflect the quantified data. We believe this standardized quantification strategy ensures robust and reproducible results while appropriately representing the 3D architecture of the organoids.

      In the revision, we have clarified the method used for image analysis of sectioned MLOs as below:

      Quantitative immunofluorescence analyses (e.g., cell counts for FOXP1+, FOXG1+, SOX2+ and Ki67+ cells, as well as marker colocalization) were performed using ImageJ (NIH) on at least 3–5 randomly selected non-overlapping fields of view (FOVs) per organoid section, with a minimum of 3 organoids per differentiation batch. Each FOV was imaged at consistent magnification (60x) and z-stack depth to ensure comparable sampling across conditions. Data from individual FOVs were first averaged within each organoid to obtain an organoid-level mean, and then biological replicates (independent differentiations, n ≥ 3) were averaged to generate the final group mean ± SEM.
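
      The multilevel averaging described above can be illustrated with a short Python sketch. It is not the authors' ImageJ or Prism workflow; the function name `group_mean_sem` and the nested-dictionary layout are hypothetical, and the intermediate step of averaging organoid means within each differentiation is an assumption about how the organoid-level means feed into the per-differentiation replicates.

      ```python
      from math import sqrt
      from statistics import mean, stdev


      def group_mean_sem(fov_counts):
          """Collapse FOV-level values to a group mean and SEM.

          fov_counts: {differentiation_id: {organoid_id: [value per FOV, ...]}}
          Returns (group_mean, sem, n_differentiations).
          """
          diff_means = []
          for organoids in fov_counts.values():
              # Step 1: average the FOVs within each organoid.
              organoid_means = [mean(fovs) for fovs in organoids.values()]
              # Step 2 (assumed): average the organoids within a differentiation.
              diff_means.append(mean(organoid_means))
          n = len(diff_means)
          if n < 2:
              raise ValueError("SEM needs at least two independent differentiations")
          # Step 3: mean and SEM across differentiations (biological replicates).
          return mean(diff_means), stdev(diff_means) / sqrt(n), n
      ```

      With this design the SEM reflects variability between independent differentiations only, so adding more fields of view per organoid does not inflate the effective sample size.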

      (d) RNA-seq QC and deposition. Provide mapping rates, batch correction details, and ensure the GEO accession is active. Include these in Methods/Supplement.

      The RNA-seq data are from the same batch. The mapping rate is >90%. The GEO accession will be active upon publication. These details were included in the Methods.

      (e) Suggested fixes

      Add a table summarizing biological replicates, technical replicates, and statistical tests used for each figure panel.

      We have revised the figure legends to include replicates for each figure and statistical tests [see response in weaknesses (1)-b, (1)-c].

      Recompute statistics where appropriate (non-parametric if N is small) and report effect sizes and confidence intervals.

      Statistical analysis method is provided in the revision [see response in Weaknesses (5)-b].

      (6) Minor comments and clarifications

      (a) The authors should validate midbrain identity further with additional regional markers (EN1, OTX2) and show absence/low expression of forebrain markers (FOXG1) across replicates.

      We validated the MLO identity using 1) FOXG1 and 2) EN1. FOXG1 was barely detectable in Wk8 75.1_MLO but highly expressed in the ‘age-matched’ cerebral organoid (CO), suggesting that our culturing method is midbrain region-oriented. In nGD MLO, FOXG1 expression was significantly higher than in 75.1_MLO, indicating aberrant anterior-posterior brain specification, consistent with the transcriptomic dysregulation observed in our RNA-seq data.

      To further confirm midbrain identity, we examined the expression of EN1, an established midbrain-specific marker. Quantitative RT-PCR analysis demonstrated that EN1 expression increased progressively during differentiation in both WT-75.1 and nGD2-1260 MLOs at weeks 3 and 8 (Author response image 1). EN1 reached 34-fold and 373-fold higher levels than in WT-75.1 iPSCs at weeks 3 and 8, respectively, in WT-75.1 MLOs. In nGD MLOs, although EN1 expression showed a modest reduction at week 8, the levels were not significantly different from those observed in age-matched WT-75.1 MLOs (p > 0.05, ns).

      Author response image 1.

      qRT-PCR quantification of midbrain progenitor marker EN1 expression in WT-75.1 and GD2-1260 MLOs at Wk3 and Wk8. Data were normalized to WT-75.1 hiPSCs and are presented as mean ± SEM (n = 3-4 MLOs per group). ns, not significant.

      (b) Extracellular dopamine ELISA should be complemented with intracellular dopamine or TH+ neuron counts normalized per organoid or per total neurons.

      We quantified TH expression at both the mRNA level (Fig. 3F) and the protein level (Fig. 3G/H) from whole-organoid lysates, which provides a more consistent and integrative measure across samples. These TH expression levels correlated well with the corresponding extracellular (medium) dopamine concentrations for each genotype. In contrast, TH<sup>+</sup> neuron counts may not reliably reflect total cellular dopamine levels because the number of cells captured on each organoid section varies substantially, making normalization difficult. Measuring intracellular dopamine is an alternative approach that will be considered in future studies.

      (c) For CRISPR editing: the authors should report off-target analysis (GUIDE-seq or targeted sequencing of predicted off-targets) or at least in-silico off-target score and sequencing coverage of the edited locus.

      Off-target effects were analyzed during gene editing, and the likelihood of off-target editing is low, based on the low off-target scores ranked by the MIT Specificity Score analysis. The related method was also updated as stated below:

      “The chance to target other off-targets is low due to low off-target scores ranked based on the MIT Specificity Score analysis (Hsu, P., Scott, D., Weinstein, J. et al. DNA targeting specificity of RNA-guided Cas9 nucleases. Nat Biotechnol 31, 827–832 (2013). https://doi.org/10.1038/nbt.2647).”

      (d) It should be clarified as to whether lipidomics normalization is to total protein per organoid or per cell, and include representative LC-MS chromatograms or method QC.

      The normalization was to the protein of organoid lysate. This was clarified in the Methods section in the revision as stated below:

      “The GluCer and GluSph levels in MLO were normalized to total MLO protein (mg) that were used for glycosphingolipids analyses. Protein mass was determined by BCA assay and glycosphingolipid was expressed as pmol/mg protein. Additionally, GluSph levels in the culture medium were quantified and normalized to the medium volume (pmol/mL).”
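
      As a minimal sketch of the normalization described in the quoted Methods text, the arithmetic amounts to dividing the measured lipid amount by the total protein mass (for MLO lysates) or by the medium volume (for secreted GluSph). The function names and the numbers in the usage lines are hypothetical and are not taken from the manuscript.

      ```python
      def normalize_to_protein(lipid_pmol, protein_mg):
          """Return glycosphingolipid content in pmol per mg of total protein."""
          if protein_mg <= 0:
              raise ValueError("protein mass must be positive")
          return lipid_pmol / protein_mg


      def normalize_to_volume(lipid_pmol, medium_ml):
          """Return medium GluSph concentration in pmol per mL of medium."""
          if medium_ml <= 0:
              raise ValueError("medium volume must be positive")
          return lipid_pmol / medium_ml


      # Hypothetical example values, for illustration only.
      glucer_per_mg = normalize_to_protein(lipid_pmol=15.0, protein_mg=0.5)
      glusph_per_ml = normalize_to_volume(lipid_pmol=4.0, medium_ml=2.0)
      ```

      Normalizing to BCA-measured protein rather than to organoid count makes values comparable across organoids of different sizes, which is relevant given the size differences tracked elsewhere in the response.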

      Representative LC-MS chromatograms for both normal and GD MLOs have been included in a new figure, Supplementary Figure 2.

      (e) Figure legends should be improved in order to state the number of organoids, the number of differentiations, and the exact statistical tests used (including multiple-comparison corrections).

      This was addressed above [see response to Weaknesses (1)-b and (5)-b].

      (f) In the title, the authors state "reveal disease mechanisms", but the studies mainly exhibit functional changes. They should consider toning down the statement.

      The title was revised to: Patient-Specific Midbrain Organoids with CRISPR Correction Recapitulate Neuronopathic Gaucher Disease Phenotypes and Enable Evaluation of Novel Therapies

      (7) Recommendations

      This reviewer recommends a major revision. The manuscript presents substantial novelty and strong potential impact but requires additional experimental validation and clearer, more conservative interpretation. Key items to address are:

      (a) Strengthening genetic and biological replication (additional lines or replicate differentiations).

      This was addressed above [see response to Weaknesses (1)-a, (1)-b, (1)-c].

      (b) Adding functional mechanistic validation for major pathways (Wnt/mTOR/autophagy) and providing autophagy flux data.

      (c) Including at least one neuronal functional readout (calcium imaging/MEA/patch) to demonstrate functional rescue.

      As addressed above [see response to Weaknesses (2)], the suggested experiments in b) and c) would provide additional insights into this study and we will consider them in future work.

      (d) Deepening therapeutic characterization (dose, biodistribution, durability) and including specificity controls.

      This was addressed above [see response to Weaknesses (3)-a to e].

      (e) Improving statistical reporting and explicitly stating biological replicate structure.

      This was addressed above [see response to Weaknesses (1)-b, (5)-b].

      Reviewer #2 (Public review):

      Sun et al. have developed a midbrain-like organoid (MLO) model for neuronopathic Gaucher disease (nGD). The MLOs recapitulate several features of nGD molecular pathology, including reduced GCase activity, sphingolipid accumulation, and impaired dopaminergic neuron development. They also characterize the transcriptome in the MLO nGD model. CRISPR correction of one of the GBA1 mutant alleles rescues most of the nGD molecular phenotypes. The MLO model was further deployed in proof-of-principle studies of investigational nGD therapies, including SapC-DOPS nanovesicles, AAV9-mediated GBA1 gene delivery, and substrate-reduction therapy (GZ452). This patient-specific 3D model provides a new platform for studying nGD mechanisms and accelerating therapy development. Overall, only modest weaknesses are noted.

      We thank the reviewer for the supportive remarks.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors describe modeling of neuronopathic Gaucher disease (nGD) using midbrain-like organoids (MLOs) derived from hiPSCs carrying GBA1 L444P/P415R or L444P/RecNciI variants. These MLOs recapitulate several disease features, including GCase deficiency, reduced enzymatic activity, lipid substrate accumulation, and impaired dopaminergic neuron differentiation. Correction of the GBA1 L444P variant restored GCase activity, normalized lipid metabolism, and rescued dopaminergic neuronal defects, confirming its pathogenic role in the MLO model. The authors further leveraged this system to evaluate therapeutic strategies, including: (i) SapC-DOPS nanovesicles for GCase delivery, (ii) AAV9-mediated GBA1 gene therapy, and (iii) GZ452, a glucosylceramide synthase inhibitor. These treatments reduced lipid accumulation and ameliorated autophagic, lysosomal, and neurodevelopmental abnormalities.

      Strengths:

      This manuscript demonstrates that nGD patient-derived MLOs can serve as an additional platform for investigating nGD mechanisms and advancing therapeutic development.

      Comments:

      (1) It is interesting that GBA1 L444P/P415R MLOs show defects in midbrain patterning and dopaminergic neuron differentiation (Figure 3). One might wonder whether these abnormalities are specific to the combination of L444P and P415R variants or represent a general consequence of GBA1 loss. Do GBA1 L444P/RecNciI (GD2-10-257) MLOs also exhibit similar defects?

      We observed reduced dopaminergic neuron marker TH expression in GBA1 L444P/RecNciI (GD2-10-257) MLOs, suggesting that this line also exhibits defects in dopaminergic neuron differentiation. These data are provided in a new Supplementary Fig. 4E, and are summarized in new Supplementary Table 2 in the revision.

      (2) In Supplementary Figure 3, the authors examined GCase localization in SapC-DOPSfGCase-treated nGD MLOs. These data indicate that GCase is delivered to TH<sup>+</sup> neurons, GFAP<sup>+</sup> glia, and various other unidentified cell types. In fruit flies, the GBA1 ortholog, Gba1b, is only expressed in glia (PMID: 35857503; 35961319). Neuronally produced GluCer is transferred to glia for GBA1-mediated degradation. These findings raise an important question: in wild-type MLOs, which cell type(s) normally express GBA1? Are they dopaminergic neurons, astrocytes, or other cell types?

      All cell types in wild-type MLOs are expected to express GBA1, as it is a housekeeping gene broadly expressed across neurons, astrocytes, and other brain cell types. Its lysosomal function is essential for cellular homeostasis and is therefore not restricted to any specific lineage (https://www.proteinatlas.org/ENSG00000177628GBA1/brain/midbrain).

      (3) The authors may consider switching Figures 2 and 3 so that the differentiation defects observed in nGD MLOs (Figure 3) are presented before the analysis of other phenotypic abnormalities, including the various transcriptional changes (Figure 2).

      We appreciate the reviewer’s suggestion; however, we respectfully prefer to retain the current order of Figures 2 and 3, as we believe this structure provides the clearest narrative flow. Figure 2 establishes the core biochemical hallmarks: reduced GCase activity, substrate accumulation, and global transcriptomic dysregulation (1,429 DEGs enriched in neural development, WNT signaling, and lysosomal pathways), which together provide essential molecular context for studying the specific cellular differentiation defects presented in Figure 3. Presenting the broader disease landscape first creates a coherent mechanistic link to the subsequent analyses of midbrain patterning and dopaminergic neuron impairment.

      To enhance readability, we have added a brief transitional sentence at the start of the Figure 3 paragraph: “Building on the molecular and transcriptomic hallmarks of GCase deficiency observed in nGD MLOs (Figure 2), we next investigated the impact on midbrain patterning and dopaminergic neuron differentiation (Figure 3).”

      Recommendations for the authors:

      Reviewing Editor Comments:

      Your paper has been reviewed by three expert reviewers in the GBA field. Although they appreciate the work and its novelty, they raise several concerns. We suggest that you address these concerns in the next version.

      Reviewer #1 (Recommendations for the authors):

      Statistical and presentation issues

      (1) Missing or unclear sample sizes (n). For organoid-level assays, report the number of organoids and the number of independent differentiations.

      This was addressed above [see response to Reviewer 1 Weaknesses (1)-b].

      (2) Statistical assumptions not justified. Tests assume normality; where sample sizes are small, consider non-parametric tests and report exact p-values.

      We have updated the Methods to describe the statistical analysis in detail [see response to Reviewer 1 Weaknesses (5)-b].

      (3) Quantification scope. Many image quantifications appear to be from selected fields of view, which are then averaged across organoids and differentiations.

      This was addressed above [see response to Reviewer 1 Weaknesses (5)-c].

      (4) RNA-seq QC and deposition. Provide mapping rates, batch correction details, and ensure the GEO accession is active. Include these in Methods/Supplement.

      Our RNA-seq data were generated from a single batch of MLOs, with mapping rates exceeding 90%. The GEO accession will be made publicly available upon publication.

      Reviewer #2 (Recommendations for the authors):

      Please consider the following suggestions for revisions:

      (1) Line 86: A bit more explanation/justification for the focus on midbrain-like organoids would be helpful, including introducing the nature of the midbrain pathology to better put some of the MLO findings in context. Is the nGD pathology for the midbrain significantly different / out of proportion to other affected brain regions?

      nGD patients often display impaired vertical gaze and movement disorders. These symptoms correlate with midbrain involvement due to the sensitivity of this region to neuroinflammatory and degenerative processes (Ref #7, #8). Both human and mouse studies indicate that the midbrain exhibits prominent substrate accumulation compared to other brain regions, suggesting a predisposition for greater pathological involvement in the GD midbrain (Ref #8, #9, #10, #11). This rationale was added to Line 86 in the revision.

      References:

      (7) Goker-Alpan O, Ivanova MM. Neuronopathic Gaucher disease: Rare in the West, common in the East. J Inherit Metab Dis. (2024) 47(5):917-934. PMID: 38768609.

      (8) Burrow TA, Sun Y, Prada CE, Bailey L, Zhang W, Brewer A, Wu SW, Setchell KDR, Witte D, Cohen MB, Grabowski GA. CNS, lung, and lymph node involvement in Gaucher disease type 3 after 11 years of therapy: clinical, histopathologic, and biochemical findings. Mol Genet Metab. (2015) 114(2):233-241. PMID: 25219293.

      (9) Tamar Farfel-Becker, Einat B. Vitner, Samuel L. Kelly, Jessica R. Bame, Jingjing Duan, Vera Shinder, Alfred H. Merrill, Kostantin Dobrenis, Anthony H. Futerman. Neuronal accumulation of glucosylceramide in a mouse model of neuronopathic Gaucher disease leads to neurodegeneration, Human Molecular Genetics, (2014). Volume 23, Issue 4, Pages 843–854.

      (10) E. Ellen Jones, Wujuan Zhang, Xueheng Zhao, Cristine Quiason, Stephanie Dale, Sheerin Shahidi-Latham, Gregory A. Grabowski, Kenneth D. R. Setchell, Richard R. Drake, and Ying Sun. High-Resolution MALDI Imaging Mass Spectrometry. SLAS Discovery (2017). Vol. 22(10) 1218–1228.

      (11) Xu YH, Xu K, Sun Y, Liou B, Quinn B, Li RH, Xue L, Zhang W, Setchell KD, Witte D, Grabowski GA. Multiple pathogenic proteins implicated in neuronopathic Gaucher disease mice. Hum Mol Genet. (2014) 23(15):3943-57. PMID: 24599400.

      (2) Lines 359-360: Please specify the carbon-chain length of the sphingoid base of the GluCer species analyzed. Also, is there a citation for the statement that 18:0 and 16:0 are "brain-enriched species"?

      The carbon-chain length analyzed ranges from 14:0 to 24:0. The sphingoid base for all GluCer species analyzed is d18:1. For example, the species referred to as GluCer 18:0 corresponds to GluCer(d18:1/18:0). Although both 16:0 and 18:0 are enriched in the brain, 18:0 is the most abundant species in the brain (Ref #12, #13). We revised “brain-enriched species” to “brain-predominant species (18:0)”.

      References:

      (12) Nilsson, O., and Svennerholm, L. Accumulation of Glucosylceramide and Glucosylsphingosine (Psychosine) in Cerebrum and Cerebellum in Infantile and Juvenile Gaucher Disease. Journal of Neurochemistry (1982) 39, 709–718.

      (13) Sun, Y., Zhang, W., Xu, Y.H., Quinn, B., Dasgupta, N., Liou, B., Setchell, K.D., and Grabowski, G.A. Substrate compositional variation with tissue/region and Gba1 mutations in mouse models--implications for Gaucher disease. PLoS One (2013). 8, e57560. doi:10.1371/journal.pone.0057560.

      (3) Figure 2: It would be interesting to compare the MLO findings to prior gene expression data. Are there previously published transcriptome analyses from nGD brain tissue (or other tissues) that the transcriptome data obtained from MLOs may be compared with? What about transcriptome analyses of mouse GD models?

      We thank the reviewer for this valuable suggestion. To strengthen the biological context of our transcriptomic findings, we have added a new comparative table (new Supplementary Table 3) in the revised manuscript that summarizes key dysregulated pathways in our human nGD MLOs alongside previously published data from nGD mouse midbrain (Ref#14). The table highlights substantial overlap, including axon guidance, neuron differentiation, dopaminergic/glutamatergic/GABAergic synaptic signaling, lipid metabolism, apoptosis/cell death, and nervous system development, emphasizing the translational relevance of our model. We also note that our dataset uniquely reveals pronounced dysregulation of WNT signaling and anterior-posterior patterning (Fig. 2L and 2M), potentially reflecting human-specific early midbrain defects.

      We added the following sentence to Discussion: “Comparative analysis with prior transcriptomic data from nGD mouse midbrain showed consistent dysregulation in axon guidance, synaptic signaling, lipid metabolism, and nervous system development (new Supplementary Table 3), supporting the fidelity of our human MLO model.”

      Reference:

      (14) Dasgupta N, Xu YH, Li R, Peng Y, Pandey MK, Tinch SL, Liou B, Inskeep V, Zhang W, Setchell KD, Keddache M, Grabowski GA, Sun Y. Neuronopathic Gaucher disease: dysregulated mRNAs and miRNAs in brain pathogenesis and effects of pharmacologic chaperone treatment in a mouse model. Hum Mol Genet. (2015) 24(24):7031-48. PMID: 26420838.

      (4) Lines 402-405 & Figure 3D: Is it possible to include a merged image to better visualize the TH and FOXA2 co-staining / potential colocalization?

      The merged images of TH (red) and FOXA2 (green) are shown in Fig. 3E. Yellow arrows indicate TH and FOXA2 co-stained cells, which appear yellow in the merged images. The results demonstrate that the number of co-stained cells is reduced in GD2-1260 MLOs compared with WT-75.1 MLOs at both week 6 and week 8.

      (5) Lines 447-448 & Figure 4F, G, J: It would be helpful to provide a direct analysis/visualization of MLO size between the WT-75.1, GD2-1260, and iso-GD2-1260 genotypes (allowing direct comparison of WT and iso). Similarly, the same 3-way analysis would be valuable for assessing dopamine levels.

      We have included WT-75.1 in Fig. 4F, G, and J in the revision, so that all three genotypes (WT-75.1, GD2-1260, and iso-GD2-1260) are presented and compared with WT-75.1. In new Fig. 4F, MLO growth is shown by representative images taken under wide-field microscopy at day 2, Wk4, and Wk8 of differentiation. In new Fig. 4G, MLO size was analyzed using NIS-Elements and is presented as the area (µm<sup>2</sup>) of the MLO in the image (mean ± SEM); N≥10 MLOs were analyzed for each genotype. In new Fig. 4J, dopamine levels were analyzed in culture medium from WT-75.1, GD2-1260, and iso-GD2-1260 MLOs at Wk12 that had been cultured in 3 mL BGM medium for 72 hours. Data are presented as mean ± SEM (n = 5 per group). The statistical analysis applied is described in the legend.
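
For readers unfamiliar with the "mean ± SEM" convention used in these panels, the following is a minimal sketch of the calculation, assuming the standard definition SEM = sample standard deviation / √n. The function name `mean_sem` and the example areas are hypothetical and are not taken from the study.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, SEM), where SEM = sample standard deviation / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("SEM needs at least two measurements")
    mean = statistics.mean(values)
    # statistics.stdev uses the n - 1 (sample) denominator
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem


# Hypothetical organoid areas in µm^2 (illustrative only, not data from the study)
areas = [1.0e6, 1.2e6, 0.9e6, 1.1e6, 1.3e6]
mean, sem = mean_sem(areas)
print(f"{mean:.3g} ± {sem:.2g} µm^2 (n = {len(areas)})")
```

Because SEM shrinks with √n, reporting n alongside it (as done for each genotype above) is what lets a reader judge the spread of the underlying measurements.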

      (6) Figure 4: What is the explanation/interpretation of the residual autophagy pathway dysfunction in CRISPR-corrected MLOs? nGD requires near-complete loss of GCase activity, so it is a bit curious that autophagic dysfunction would be observed with only ~50% GCase reduction? There is some discussion, but it doesn't fully capture the unexpected nature and implications of this result.

      This phenomenon may be explained by a threshold effect in lysosomal function. Gaucher disease is an autosomal recessive disorder. Carriers of a heterozygous GBA1 mutation, who retain approximately 50% of normal GCase activity, do not develop disease. This suggests that even partial restoration of GCase activity can reduce glucosylceramide accumulation below a pathological threshold, thereby restoring lysosomal integrity and autophagic flux. In addition, improved GCase activity may help normalize the lipid composition of lysosomal membranes, facilitating the fusion events required for effective autophagy.

      (7) Lines 512-516 & Figure 5J: The data shown are inconclusive. Can these Western blot data be quantified, noting the number of replicates for each measurement? Without quantification and statistics, it is difficult to assess the claim that levels of LAMP1, LC3-I, LC3-II, 4E-BP1, and p-4E-BP1 in GD2-1260 treated with SapC-DOPS-fGCase are more similar to GD2-1260 treated following SapC-DOPS than to WT-75.1.

      We performed quantitative analysis relative to WT-75.1 and included the data in new Fig. 5J. The result was revised as follows:

      Analysis of protein levels showed that decreased LAMP1 expression in GD2-1260 MLOs was not altered following SapC-DOPS-fGCase treatment (Figure 5J). The elevated LC3-II levels, an indicator of impaired autophagic flux, were reduced upon treatment, suggesting enhanced autophagic activity (Figure 5J). Moreover, phosphorylated 4E-BP1 (Thr37/46), but not total 4E-BP1, was improved in SapC-DOPS-fGCase–treated MLOs, reflecting a decrease in mTOR hyperactivation (Figure 5J). We anticipate that a longer duration of SapC-DOPS-fGCase exposure in nGD MLOs may produce a more robust therapeutic effect in rescuing nGD-associated phenotypes, which will be evaluated in future studies.

      (8) Lines 518-520: The presented data support "effective restoration of GCase activity," but clarification is needed regarding "correction of GD-related disease phenotypes." Perhaps "selected molecular and biochemical phenotypes" would be more accurate. Data are not shown for several other phenotypes, including TH, FOXA2, and dopamine levels.

      This was revised to “selected molecular and biochemical phenotypes”.

      (9) Figure 5D-J: Please clarify whether all experiments were conducted 48 hours after treatment, as indicated for Figure 5C. If so, does this suggest that SapC-DOPS treatment exhibits only short-term effects? Were any data collected to evaluate the persistence of the treatment effect?

      The treatment duration is specified in the Fig. 5 legend. Fig. 5D–J represent experiments conducted after two weeks of treatment, whereas Fig. 5C reflects a 48-hour treatment. In both Gaucher disease lines, two-week treatment restored GCase activity to wild-type levels and reduced GluSph substrate accumulation. These findings were intended as proof-of-principle to demonstrate therapeutic feasibility; evaluation of treatment persistence beyond two weeks was beyond the scope of this study.

      Minor suggestions

      (1) Line 80: "A brain organoid derived from hiPSCs of a healthy individual with GBA1 knockout and α-synuclein overexpression exhibited some PD features23." I would suggest enumerating what "PD features" are to distinguish from "clinical features", which I don't think is the intended meaning.

      This was revised as “exhibited characteristic PD markers”.

      (2) Figure 2I: The reported number of downregulated DEGs is incorrect. It should be 765, not 1429.

      This was corrected in Figure 2I.

      (3) Line 359: change "enrich" to "enriched".

      This word was corrected.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Programmed cell death is prominent in developing nervous systems across evolution, but its function remains obscure. Recent work suggests that it might impact behavior, but an examination of its effects on behavior and underlying neuronal circuits in intact organisms has not been determined. In this manuscript we report that programmed cell death sculpts the developing nervous system and shapes innate behavior. Using synaptic labeling, in vivo calcium imaging, targeted rescue of programmed cell death, and automated high-resolution analysis of cell death mutants, we find that loss of programmed cell death alters animal behavior. These findings reveal that neuronal cell death during development provides a reservoir of fates and circuit connections that could be accessed on evolutionary time scales to modify innate behavioral programs. Our manuscript thus answers one of the major outstanding questions in developmental neuroscience—why programmed cell death is so prevalent—by identifying consequences for brain function at the subcellular, cellular, circuit, and behavioral level. This study will be of interest to those interested in evolution of the nervous system and behavior, developmental biology, and neural circuit development.

      We thank the reviewers for their careful attention to the manuscript. Both reviewers were enthusiastic about the work. Here we address their suggestions. As noted below, we have already addressed most of their points, and we discuss in detail the remaining point—whether it is possible to perform experiments for a more specific targeting of the undead RIM cell death event to provide additional evidence for its role in altering reversal behavior.

      2. Description of the planned revisions

      Reviewer 1: “1. The argument that differences in reversal behavior are likely attributable to the difference in RIM neuron numbers in the ced-3 rescue studies is very plausible. Nonetheless, there remains the possibility that for some reason in animals with 4 RIMs there may be a more global effect on the fate of cells slated to die, unrelated to the number of RIMs. I think there are two ways to test this. (1) quantify the behavior in 2- vs. 4- RIM neurons in animals also containing a marker for other undead neurons, and see if there is any correlation between 4 RIMs and survival of unrelated neurons (but preferably reasonably closely related by lineage- in case that's the issue). (2) Since the authors are able to distinguish the undead cells, can they perform laser ablations on these cells and assess whether behavior is restored to normal values?”

      We agree that this point is already very plausible. We also appreciate the reviewer’s suggestions on how to extend this conclusion.

      Regarding suggestion (1): Unfortunately, there is not a reliable marker for undead neurons (although a current project in the lab is indeed to develop one). However, we note that the undead RIM sister cells adopt a RIM neuron fate in 96% of ced-3 mutants, whereas for the other undead cells investigated, neuron fate adoption ranged from 59% (ASEL) to 77% (ASER). This suggests that the undead RIM fate adoption is not strongly correlated with the fates of other undead cells.

      Regarding suggestion (2): We attempted to perform laser ablation of undead RIM neurons in ced-3 mutants, but we could not overcome the technical hurdles (despite our lab’s expertise in laser axotomy). We found that we could not reliably remove both undead RIMs without damaging the wild-type RIM that is in close proximity, especially in the quantities of animals necessary for behavioral experiments.

      As an alternative, we plan to perform more targeted experiments to manipulate cell death in the undead RIM to address the points raised by both reviewers. Our goal is to generate two strains. In one, programmed cell death is prevented specifically in the RIM neurons in wild type animals. We hope to achieve this by either transgenic expression of a gain-of-function mutation of ced-9, or else by RIM-specific RNAi against egl-1, ced-3 or ced-4. To do this we will use the RIM promoter tdc-1, which is confined to RIM and RIC. The second strain will allow cell death to occur only in RIM (and RIC) in animals that otherwise have no cell death. Here, we will drive wild-type ced-3 or ced-4 under the tdc-1 promoter in the corresponding mutant background.

      We note 2 caveats for both of these approaches: 1) RIC also has an undead sister; 2) Most probably, the tdc-1 promoter will not be active in time to block cell death. Caveat #2 is actually the reason why we did not do these experiments initially (instead we used the most specific promoter we could find that is expressed early in the RIM lineage, before RIM is born).

      However, we agree that if successful these experiments would complement the existing experiments, and we will build all these strains.

      Reviewer 2: “Mosaic rescue of RIM via stochastic loss of a rescue array helped demonstrate the contribution RIMu have to the locomotor phenotype. As the authors emphasise these animals have many other undead cells (outside of the reverse network). A conditional rescue of only the RIMu would greatly improve the strength of the claims made. Would a conditional RIM egl-1 knockdown (via RNAi) be possible to selectively inhibit apoptosis in those neurons. This experiment should be considered OPTIONAL. It may be that such specific promoters do not allow for egl-1 RNAi to function at the right time to rescue death.”

      We appreciate the reviewer’s suggestion. As stated above, we are working to perform an expanded version of these exact experiments, as well as their converse. However, as the reviewer notes, it is very possible that the timing of expression will prevent these approaches from working (Caveat #2 above).

      Reviewer 2: There is a slight issue with interpretation of the data with the mosaic GLR-1::tagRFP Fig 2M which reveals the postsynaptic compartment of one RIM even though there are two present. There seems to be no obvious apposition between pre/post and they somewhat seem to be floating in space. Why is this the case? One would have imagined that the structures in Fig 2L would be tiled composites of both AIB & RIM pre and postsynaptic elements coalescing. Can the authors provide an alternative explanation for this phenotype? Nevertheless, the data on Fig 2L seem solid. That is, animals with extra undead RIM cells have additional cell-type-specific synaptic terminals.

      We have selected a different micrograph that is more representative of the RIM post-synapses in ced-3 mutants. In this animal, the array labeling the post-synapses in RIM has been lost from one of the two RIM neurons, making it easier to discern that the post-synapses are apposed to the AIM pre-synaptic marker (Fig 1M).

      Reviewer 2: Clarity should be improved around the use of 'expected number' in figure 1. The description of the metric 'The 'expected number' is defined as the number of neurons of the type present in wild-type animals, plus the number of lineage-proximate undead cells.' suggests that expected (blue) regions of pie charts represents lineages with expected sum total of wt and extra undead cells. However, in reference to panel H 'The wild-type animal has two RIM neurons, and the ced-3(n717) animal has two additional RIMlike cells and is counted as contributing to the orange "more than expected" sector in panel (A)' it is said that the animals with 2 WT accompanied by each undead sister contributes to more than expected (orange) region. These appear inconsistent. Can you qualify?

      We thank the reviewer for this point and have added a schematic to clarify the quantification of undead cell fates (Fig. 1).

      Reviewer 2: Specific observations shown in supplemental data S1I-L, despite being cited in the text, are not explained or formally referenced. The details of these panels should either be briefly explained (or their inclusion qualified) in the text, or simply removed from the figure.

      We have added a reference to these figures in the main text: “Undead cells are even capable of producing complex morphology, such as the highly branched dendrites of the PVD neurons (Figure S1I-L).” (p. 3)

      Reviewer 2: The dual image photomicrographs could be in green/ magenta or red/cyan to make colourblind friendly.

      We have updated micrograph colors to be colorblind friendly (Fig 1K-M, S1L).

      Reviewer 2: Do the authors have data with the pRIMtagRFP egl-nucGFP. If they do it would be useful to show it.

      We have added a micrograph of egl-1::GFP and RIM labeled using NeuroPAL (Fig. S2A).

      Reviewer 1: 2. The authors speculate, if I understand correctly, that the mechanism by which reversal frequencies are decreased in 4 RIM animals may be that the reversal state is stabilized, resulting in longer reversals and consequently fewer reversal events. This is a nice model that is testable. The authors could, for example, examine the connections of RIM neurons to the AVA neuron, a main command interneuron for reversal initiation, and assess whether there are indeed more such synapses. Furthermore, the authors can assess whether the frequency of AVA firing is decreased. Of course, there are other plausible mechanisms involving connectivity of other neurons onto AVA which could explain the phenomenon. The authors may wish to add a comment regarding this in the discussion.

      We thank the reviewer for this suggestion. There are multiple postsynaptic receptors expressed in AVA for RIM neurotransmitters, and the contribution of each to reversal behavior is still being debated, making it challenging to dissect how each contributes to the effects on reversal behavior mediated by the undead RIM. Given this, we believe that addressing this point experimentally is beyond the scope of this paper. We have added a sentence in the discussion commenting on this as a future direction for this work: “The mechanism of the downstream circuit mediating the effects of the undead RIM could be determined through quantification of AVA postsynaptic receptors and examining reversal behavior of cell death mutants with knockouts of AVA receptors.”

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      1. General Statement

      We thank all three reviewers for their careful and constructive evaluation of our manuscript. We are pleased that the reviewers recognised the importance of the work we describe and found the experimental approach sound.

      This manuscript reports that undesired insertion of the plasmid backbone, including vector sequences not intended to be part of the genome edit, occurs at high frequency during CRISPR/Cas9-mediated HDR in Drosophila. We document this phenomenon across multiple independent genome editing projects, using three different plasmid backbones and targeting distinct genomic loci, demonstrating that it is not an isolated or project-specific artefact. We further introduce pVID, a new donor vector incorporating a ZsGreen negative selection marker that allows straightforward identification and exclusion of lines carrying undesired insertions, providing a practical solution to avoid this genome editing issue.

      In response to the reviewers' comments, we have revised the manuscript to: (i) correct and contextualise prior descriptions of this problem, incorporating the references suggested by Reviewer 2; (ii) add a table summarising gRNA characteristics for all editing projects; (iii) expand the discussion of the underlying DNA repair mechanisms, the potential influence of Cas9 source choice, and the relevance of the findings beyond Drosophila; (iv) confirm the stability of problematic template vector insertions across multiple generations; and (v) improve figure clarity, correct typographical errors, and clarify several passages flagged by the reviewers. All responses are described in detail below.

      2. Point-by-Point Description of the Revisions

        Reviewer 1

        Major Comment 1 — DNA repair pathways underlying backbone capture • I think the authors should discuss potential DNA repair pathways (e.g., NHEJ, MMEJ) underlying plasmid backbone capture in more detail. Did you check for knockouts within your screened transformants? That could provide insight into the underlying mechanisms.

      Response: We screened the humanized TDP-43 line for tbph knockouts, since our aim was to fully knock out the Drosophila gene and insert the human ortholog. However, we did not screen any of the other lines described in the manuscript for indels caused by NHEJ, since the dsRed selection we employed would not enable us to recover lines without insertion events. We hypothesise that when one of the two gRNAs cuts less efficiently than the other, the result is a single homologous recombination event and insertion of the vector template. However, the underlying mechanism is still unclear, and could involve NHEJ, HDR, or a combination of these mechanisms, as has previously been observed (44). We have expanded on potential mechanisms inducing HDR template vector insertion events in the discussion of the revised manuscript.

      Major Comment 2 — gRNA characteristics and design parameters • It would be important to describe gRNA characteristics and general design parameters (GC content, distance from cut to intended edit, homology arm length) and analyze whether these correlate with correct HDR vs. plasmid insertion. A table summarizing these details could help reveal potential trends.

      Response: At the reviewer's suggestion, we have added a table (Table 1) describing all the characteristics of the gRNAs to the Materials and Methods section. Unfortunately, no commonality was immediately apparent to us.
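
As an illustration of one of the design parameters the reviewer lists, GC content can be computed directly from a protospacer sequence. This is a minimal sketch only: the function name `gc_content` and the 20-nt sequence below are hypothetical and are not among the gRNAs reported in Table 1.

```python
def gc_content(seq):
    """Return the fraction of G and C bases in a DNA sequence (0.0 to 1.0)."""
    seq = seq.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    return (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical 20-nt protospacer (illustrative only, not a gRNA from the manuscript)
guide = "GACGTTCAGGATCCAGTTCA"
print(f"GC content: {gc_content(guide):.0%}")
```

Tabulating such values next to the observed outcome for each project (correct HDR versus vector insertion) is one simple way to look for the trends the reviewer asks about.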

      Major Comment 3 — Single versus dual gRNA strategies • Did the authors consider exploring whether using a single gRNA reduces backbone insertion frequency compared to dual-gRNA strategies? I understand that two gRNAs are needed for your strategy, but it would be interesting to know whether these outcomes are linked to the dual-gRNA design.

      Response: As stated in the discussion, we theorize that one of the two gRNAs used in our strategies may cut more efficiently, thereby causing a single homologous recombination event and insertion of the vector template. It is possible that a strategy using only one gRNA would result in fewer vector template insertions; however, this may come at the cost of gene editing efficiency. Indeed, when Ge et al. (17) compared using one versus two gRNAs to induce HDR, they observed more reliable repair events when two gRNAs were used.

      Major Comment 4 — Stability of backbone insertions across generations • Did you evaluate whether backbone insertions are stable across generations or prone to rearrangement?

      Response: We have maintained several of the lines reported in this paper stably across multiple generations, and we have added this observation to the manuscript.

      Major Comment 5 — Broader applicability in non-model organisms and therapeutic settings • A broader discussion of the potential applications of this approach in non-model insects, mammalian cells, or therapeutic settings where HDR is inefficient would be valuable.

      Response: While we only investigated this effect in the creation of CRISPR/Cas9 Drosophila melanogaster models, it is very possible that this could also affect other model organisms or cells. We encourage the use of HDR template negative selection markers in all uses of HDR-mediated CRISPR/Cas9 genome editing.

      Major Comment 6 — Cas9 promoter and expression level • The authors also mentioned using a validated Cas9 line (ref #23). What promoter drives Cas9 expression in this line? Did you consider testing different promoters? Since timing of Cas9 expression can be critical, promoter choice may have influenced the results and should be discussed.

      Response: We used the nos promoter for the expression of Cas9, as this promoter is expressed in germ cells and is known to have better efficiency than other germline promoters such as vasa (Port et al. 2014, Ref #23). However, it is conceivable that the high Cas9 concentration in this line could induce a higher rate of double-strand breaks and thus template vector insertion. We agree it would be interesting to test other Cas9 sources, though this would likely come at the cost of overall editing efficiency. As we describe, the use of pVID now allows negative selection against HDR template vector insertion even with this Cas9 source. We have expanded upon the potential use of other Cas9 sources in the revised discussion.

      Reviewer 2

      Major comments

      None

      Minor Comment 1 — Line 38: prior descriptions of backbone insertion in Drosophila Line 38: "this type of unwanted template vector insertion in the case of Drosophila genome editing has to our knowledge not been previously described." Insertion of vector sequences after CRISPR editing in Drosophila and strategies to mitigate such events have been previously described in multiple studies. The authors need to incorporate these into their manuscript. https://doi.org/10.1242/bio.20147682, https://doi.org/10.1080/19336934.2020.1832416, https://doi.org/10.1534/g3.116.032557.

      Response: We are very grateful to the reviewer for pointing out these prior observations of vector insertion events of which we were not aware. This prior work has now been fully incorporated and referenced in the revised manuscript, and we have removed this erroneous statement. We feel this manuscript validates and quantifies the extent of HDR template insertion across multiple genome editing strategies and templates plus, with pVID, provides a solution to this vexing problem.

      Minor Comment 2 — Line 79: PAM sequence sentence I have difficulties understanding the following sentence: Line 79: "At this location, on both sides of the insertion, the PAM sequence of the target region was edited to match the PAM sequence of the template donor plasmid." I assume what is meant here is that in the donor vector the PAM sequence was mutated to prevent recutting, but that means this sequence is no longer a PAM. Please rephrase for added clarity.

      Response: The PAM sequence was indeed edited in the template donor plasmid to prevent re-cutting, and we are referring to this edited version of the PAM sequence in this sentence. We edited this sentence to clarify that the PAM sequences have been edited.

      Minor Comment 3 — Figure 2: panel D arrangement In Figure 2 panel D is arranged between panels E and F.

      Response: Thank you for pointing this out. We have corrected this error.

      Minor Comment 4 — Primer positions in figures In Figure 2 it would be useful to also indicate the position of the primers used in 2d in the schematic in 2e. The same applies to Fig. 3a and 4a.

      Response: We have added the position of the primers in figure 2. Since the primers are targeting the backbone of the plasmid commonly in all projects included in this manuscript, we have chosen to only include one figure of this (figure 2).

      Minor Comment 5 — Lines 89–90: duplicated sentence Lines 89, 90: Duplication of the same sentence.

      Response: Thank you, we have corrected this mistake.

      Minor Comment 6 — VGAT editing: consecutive editing and sgRNA placement Editing of the VGAT gene: In this case correct editing and plasmid insertions could be found on the same chromosomes. This might be caused by concatemer formation of repair intermediates (as has been described in multiple systems) or by consecutive editing events. Can you please specify whether the donor vector was designed to prevent consecutive editing? I'm also a bit confused about the locations of the sgRNA target sites according to Fig. 3a. It appears that part of the insertion (i.e. the ALFA tag) was encoded on the homology arm and not between the target sites. While such strategies have been described, they are often avoided as the efficiency of insertion decreases with increasing distance to the cut site. Was it not possible to use an sgRNA better matching the insertion cassette?

      Response: For Vgat genome editing, we followed an existing strategy that has been proven effective, reusing the same gRNAs and overall approach to replace the 9×V5 tag with a 1×ALFA tag (Certel et al. 2022, Ref #28).

      Minor Comment 7 — Line 133: mini-white marker unreliability Line 133: Please describe why the mini-white marker was unreliable.

      Response: In our first design of the pVID vector, we used mini-white as the negative selection marker. However, in a number of white-eyed lines, we could still confirm the undesired insertion of the HDR template vector. We speculate that expression of mini-white (which we confirmed was not mutated) was repressed in these lines by an unknown mechanism. Since Nyberg et al. (2020, Ref #35) also proposed using mini-white as a negative vector selection marker, we wanted to mention this problem with mini-white negative selection, though we remain unsure of the exact cause. In any case, the use of exogenous ZsGreen in pVID as described in the manuscript fully resolved the issue, allowing reliable detection of template vector insertion events.

      Minor Comment 8 — Line 161: "varying frequency" Not sure I understand the sentence in line 161: If 54% of lines had vector insertion, what does the "varying frequency" refer to?

      Response: We have edited this sentence to clarify that 54% of lines had vector insertion.

      Minor Comment 9 — pVID availability in methods Consider highlighting the availability of pVID also in the methods section that described this plasmid.

      Response: This has been added to the methods section.

      Reviewer 3 No edits suggested.

      We thank Reviewer 3 for their positive assessment of the manuscript and for confirming that no revisions are required.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This foundational study builds on prior work from this group to reveal the complexities underlying ligand-dependent RXRγ-Nur77 heterodimer formation, offering a compelling re-evaluation of their earlier conclusions. The authors examine how a library of RXR ligands influences the biophysical, structural, and functional properties of Nur77. They find that although the Nur77-RXRγ heterodimer shares notable functional similarities with the Nurr1-RXRα complex, it also exhibits unique features, notably, both dimer dissociation and classical agonist-driven activities. This work advances our understanding of the nuanced behaviors of nuclear receptor heterodimers, which have important implications for health and disease.

      Strengths:

      (1) Builds on previous work by providing a comprehensive analysis that examines whether Nur77-RXRγ heterodimer formation parallels that of the Nurr1-RXRα complex.

      (2) Systematic evaluation of a library of RXR ligands provides a broad survey of functional outputs.

      (3) Careful reanalysis of previous work sheds new light on how NR4A heterodimers function.

      We thank the reviewer for recognizing our work as foundational. In the nuclear receptor field, current understanding of ligand-regulated nuclear receptor activity is based largely on ligand-dependent coregulator recruitment preferences; for example, agonists enhance coactivator recruitment to activate transcription. Building on our recent study of Nurr1-RXRα, the present work suggests that activation of the evolutionarily related NR4A-RXR heterodimer Nur77-RXRγ by RXR ligands is also consistent with a non-classical activation mechanism involving heterodimer dissociation.

      Weaknesses:

      (1) Some conclusions appear overstated or are not well substantiated by the work presented. It's unclear how the data support a non-classical mode of agonism, for example, based on the data shown.

      We thank the reviewer for this important point. We did not intend to claim that Nur77-RXRγ activation is explained exclusively by a non-classical mode of agonism. Rather, our interpretation was that the data are consistent with two possible, non-mutually exclusive mechanisms: (1) a classical pharmacological mechanism involving ligand-dependent coregulator recruitment; and (2) a non-classical mechanism involving ligand-binding domain (LBD) heterodimer dissociation, as we previously described for Nurr1-RXRα. This differs from our prior eLife study of Nurr1-RXRα, in which the data supported the LBD heterodimer dissociation model but not the classical pharmacological model.

      In our revised manuscript, we clarify two points that are important for interpreting the Nur77-RXRγ data. First, several experimental limitations of the Nur77-RXRγ studies reduced the extent to which the mechanism could be resolved as rigorously as in our earlier Nurr1-RXRα study. Second, and more importantly, the currently available ligand set lacks Nur77-RXRγ-selective agonists. This limits our ability to determine whether LBD heterodimer dissociation is the sole or principal mechanism of activation, or instead one of several contributing mechanisms.

      Taken together, these results support LBD heterodimer dissociation as a plausible and experimentally observable component of Nur77-RXRγ activation and, therefore, as a candidate shared activation mechanism for NR4A-RXR heterodimers. At the same time, because the quantitative evidence is less definitive than in the Nurr1-RXRα system, we agree that conclusions regarding Nur77-RXRγ should be stated more cautiously. This caution is reflected in both the title of our manuscript (“Towards a unified mechanism…”) and the language used throughout the text.

      (2) Some assays have relatively few replicates, with only two in some cases.

      We thank the reviewer for their attention to experimental rigor. For some assays, the findings were reproduced in two independent experiments, which we considered sufficient to confirm the presence and reproducibility of the effects observed in those particular assay formats. In the original manuscript, we used a general statement in the figure legends (“representative of two or more independent experiments”) across all assay data. In the revised manuscript, we now specify the number of independent experimental replicates for each assay in the corresponding figure legends to improve transparency.

      Reviewer #2 (Public review):

      Summary:

      This study explores the mechanisms by which binding of the nuclear receptor RXRg regulates its heterodimeric partner Nur77. Previously, this group made the interesting discovery that ligand-dependent activation of RXRg bound to a related partner, Nurr1, does not occur through a classical pharmacological mechanism but through agonist-dependent dissociation of the complex through disruption of their ligand binding domain (LBD) interactions. Here, they revisit this paradigm with Nur77. In contrast to Nurr1, the authors do not have the reagents to clearly support a role for LBD dissociation. Following the model of partial ligand-dependent dissociation of the LBD heterodimer, the experimental data (NMR, ITC, SEC) are interesting and quite complex.

      Strengths:

      The authors do a rigorous job of describing the data and providing possible interpretations and caveats. Revisiting the analysis of Nurr1, they identify the crucial role that selective Nurr1-RXRg agonists played in supporting the LBD dissociation model; without analogous compounds for the Nur77-RXRg complex, it is difficult to invoke this mechanism. Interestingly, treatment with the Nurr1-RXRg selective agonist HX600 suggests it can induce some LBD dissociation. Therefore, there may be some similarities between the regulation of Nurr1 and Nur77 by RXRg.

      We thank the reviewer for this thoughtful and balanced summary of our work. We appreciate the reviewer’s recognition of both our prior findings in the Nurr1-RXRα system and the interesting, but more complex, experimental behavior observed here for Nur77-RXRγ. We agree that the absence of Nur77-RXRγ-selective agonists currently limits how definitively the contribution of LBD dissociation can be resolved, and we have revised the manuscript to make this point more explicit and to further temper our conclusions accordingly.

      Weaknesses:

      Despite evidence supporting a partial role for RXRg LBD dissociation as a mechanism to activate Nur77, other data demonstrate that a fundamentally different regulatory mechanism likely exists in the Nur77-RXRg complex that involves the RXRg disordered NTD. The decision to describe further study of this as outside the scope of this work is unfortunate, as it closed off an avenue that could have provided fruitful data informing the apparently distinct regulatory mechanisms of the Nur77-RXRg complex. Given the uncertainty in the importance of the partial roles of the pharmacological mechanism, LBD dissociation, and the RXRg NTD, this study may have limited impact on the field.

      We thank the reviewer for this thoughtful point. We agree that the RXRγ NTD likely contributes to regulation of Nur77-RXRγ transcription, and that our truncation data suggest that regions outside the LBD can influence transcriptional output. At present, however, the effect of RXRγ NTD truncation is not sufficiently mechanistically resolved to distinguish among several plausible explanations.

      For example, the RXRγ NTD has been implicated in phase separation and biomolecular condensate formation in cells (PubMed ID 40392852, 40420113, 33971237, 31881311), and perturbing these properties (via RXRγ NTD truncation) could indirectly affect Nur77-RXRγ transcriptional activity. In addition, NTDs of nuclear receptors can participate in coactivator or corepressor interactions (PubMed ID 24284822), raising the possibility that removal of the RXRγ NTD alters transcription by changing recruitment of regulatory factors rather than by directly informing the LBD-centered mechanism examined here. We will clarify in the revised manuscript that these possibilities remain unresolved and represent important directions for future study.

      We also agree that defining how multiple RXRγ domains contribute to Nur77-RXRγ regulation would be valuable for the field. However, the focus of the present study is narrower: to test whether, as in our previous eLife study of Nurr1-RXRα, RXR ligands can influence heterodimer function through effects on LBD-LBD interactions. Because the available data do not yet allow a mechanistic dissection of the RXRγ NTD contribution, we believe that a definitive analysis of this question would require a separate set of experiments beyond the scope of the present work. We have revised the manuscript to better acknowledge this limitation and to frame the conclusions accordingly.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Overall, this is a compelling body of work. Additional summary statements and clearer transitions would be helpful throughout.

      Here are some points that should be addressed or at least discussed by the authors:

      (1) It is unclear in the luciferase assays whether the truncated proteins are functional or not. Were there Western blots or other assays run to confirm protein concentrations?

      We thank the reviewer for this point. We did not perform Western blotting or other assays to confirm equivalent expression levels of the truncated RXRγ constructs, and we agree that this is a limitation of the luciferase assay data. As a result, the transcriptional effects observed with the truncation constructs should be interpreted cautiously.

      With that said, the increased transcriptional activity observed upon deletion of the RXRγ NTD/AF-1 region suggests that this region may exert a repressive effect on Nur77-RXRγ transcription. This effect could reflect multiple, non-mutually exclusive mechanisms, including altered phase separation or condensate-related properties of RXRγ, or altered recruitment of transcriptional coregulators through the NTD. Because our truncation strategy does not distinguish among these possibilities, we do not believe these data allow a definitive mechanistic interpretation of the NTD contribution.

      We have revised the manuscript to clarify this limitation. We also note that the primary focus of the present study is the role of ligands in modulating Nur77-RXRγ function through LBD-mediated interactions, in direct comparison with our previous Nurr1-RXRα study. A more complete mechanistic dissection of how RXRγ domain architecture influences Nur77-RXRγ transcription will require future work.

      (2) Why does the Nur77 construct lacking the NTD show increased luciferase activity?

      Please see our response above to Reviewer 2’s Public Review, which also addresses this point.

      (3) A case is made for the Nur77 LBD driving the activity, but it also could be inferred that the DBD is driving based on the data shown in Figure 1.

      We thank the reviewer for this point. We agree that the Nur77 DBD is required for binding to NBRE response elements, and we did not intend to suggest otherwise. The experimental approach in Figure 1 was not designed to dissect the relative contributions of Nur77 domains, since Nur77 was tested only in its full-length form. Instead, the purpose of this experiment was to examine how truncation of RXRγ domains affects Nur77-RXRγ transcriptional activity, in direct comparison with our prior eLife study of Nurr1-RXRα, where RXRα domain truncations helped define the importance of RXR-LBD-mediated regulation. We will revise the text to clarify that Figure 1 does not distinguish whether Nur77 DBD-dependent DNA binding is necessary, but instead addresses whether the pattern of RXRγ domain dependence is consistent with an LBD-centered mechanism of ligand-regulated heterodimer function.

      (4) It is stated that the HX600 coactivator recruitment requires further study. Why wasn't it studied here?

      We thank the reviewer for this point. The primary focus of this study was to determine how RXR ligands influence Nur77-RXRγ heterodimer activity, particularly in relation to ligand-dependent effects on heterodimer function. A more detailed analysis of HX600-dependent coactivator recruitment would require a broader mechanistic investigation of RXRα and RXRγ homodimer pharmacology and RXR-specific coregulator interactions, which extends beyond the central scope of the present manuscript. We agree that this is an important question and view it as a valuable direction for future work.

      (5) Figure 3B, the shifts in monomer populations, error bars aren't shown, the biggest shift is from 0.2 to 0.6, is that statistically meaningful?

      We thank the reviewer for this point. The reviewer is correct that error bars were not shown for Figure 3B. These NMR measurements were performed once (n=1), and therefore the shifts in monomer populations shown in Figure 3B cannot be assessed statistically. Because these studies required substantial NMR instrument time and isotopically labeled protein at high concentration, we were not able to perform experimental replicates for this dataset. We have revised the figure legend to explicitly state that these data were collected from a single experiment and have tempered the corresponding language in the manuscript accordingly.

      (6) Some ligands are shown in the figures but don't appear to be discussed in the text (at least that I can find), such as SR11237.

      We thank the reviewer for pointing this out. We used a panel of 14 commercially available RXR ligands with different pharmacological properties to probe Nur77-RXRγ function, as in our previous Nurr1-RXRα study. In the text, we emphasized ligands that were most informative for the mechanistic conclusions, rather than discussing every compound individually. SR11237, for example, behaved similarly to the broader group of RXR agonists and was therefore shown as part of the full ligand panel but not specifically highlighted in the text. We will clarify this in the revised manuscript.

      (7) There is a sentence in the discussion that says "these observations implicate that although RXRg LBD provides the protein-protein interaction interface to bind Nur77...." the authors did not show enough data to support this claim. It should be bolstered.

      We thank the reviewer for this point. We agree that this statement was stronger than was warranted by the data presented. Our intent was not to claim that the present study definitively establishes the RXRγ LBD as the sole or fully defined protein-protein interaction interface for Nur77 binding. Rather, based on the domain truncation data together with our prior Nurr1-RXRα study, we intended this statement as a working interpretation consistent with an LBD-centered mechanism. In our revised manuscript, we have softened this language to avoid overstating the conclusion and clarified that the current data support, but do not definitively prove, a role for the RXRγ LBD in mediating functionally relevant interaction with Nur77.

      Reviewer #2 (Recommendations for the authors):

      Even though this study is not able to make definitive claims about the mechanism(s) of activation of Nur77 in the Nur77-RXRg complex, the work presented here is rigorous and solidly interpreted. Identifying differences between Nurr1 and Nur77 regulation is important, and the work here shows that selective agonists are essential for supporting the non-canonical mechanism they identified before. Although they address potential implications of NTD regulation in the discussion, it feels like a lot of insight into Nur77 regulation is being missed. However, it is clear that addressing this experimentally would require substantially more work. I don't have any specific recommendations. Given current limitations on funding, I think it's fine to focus on the work completed with the acceptance that it likely limits the impact of the work on the field.

      We thank the reviewer for this thoughtful and balanced assessment of our work. The goal of this manuscript was to test whether the LBD heterodimer dissociation mechanism that we previously reported for Nurr1-RXRα may represent a conserved feature of NR4A-RXR heterodimers by extending these studies to Nur77-RXRγ. We agree that understanding the role of the RXRγ NTD in Nur77-RXRγ regulation is important and potentially highly informative. At the same time, resolving that question experimentally would require a distinct and more extensive set of studies beyond the scope of the present work. We have therefore chosen to focus this manuscript on the completed LBD-centered studies, while acknowledging that this narrower scope may limit the broader impact of the work.

      Minor points:

      (1) Without page and line numbers, it is not easy to point out specific text. On the bottom of page 6 of the document, there are two references to Figure 3a, and the arrows that help illustrate RXRg LBD-dependent CSPs; the second figure callout should describe the blue arrow, I believe.

      Thank you, we made this change.

      (2) Bottom of page 8, "...revealed two compounds [that] standout..."

      Thank you, we made this change.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      We truly appreciate all the effort that the reviewer put into reading and understanding our work. With a total of 37 excellent questions, this is one of the most thorough reviews that we have received in a long time.

      R1.0: Summary:

      In this study, the authors propose a "unifying method to evaluate inter-areal interactions in different types of neuronal recordings, timescales, and species". The method consists of computing the variance explained by a linear decoder that attempts to predict individual neural responses (firing rates) in one area based on neural responses in another area.

      The authors apply the method to previously published calcium imaging data from layer 4 and layers 2/3 of 4 mice over 7 days, and simultaneously recorded Utah array spiking data from areas V1 and V4 of 1 monkey over 5 days of recording. They report distributions over "variance explained" numbers for several combinations: from mouse V1 L4 to mouse V1 L2/3, from L2/3 to L4, from monkey V1 to monkey V4, and from V4 to V1. For their monkey data, they also report the corresponding results for different temporal shifts. Overall, they find the expected results: responses in each of the two neural populations are predictive of responses in the other, more so when the stimulus is not controlled than when it is, and with sometimes different results for different stimulus classes (e.g., gratings vs. natural images).

      Strengths:

      (1) Use of existing data.

      (2) Addresses an interesting question.

      R1.1: Unfortunately, the method falls short of the state of the art: both generalized linear models (GLMs), which have been used in similar contexts for at least 20 years (see the many papers, both theoretical and applied to neural population data, by e.g. Simoncelli, Paninski, Pillow, Schwartz, and many colleagues dating back to 2004), and the extension of Granger causality to point processes (e.g. Kim et al. PLoS CB 2011). Both approaches are substantially superior to what is proposed in the manuscript, since they enforce non-negativity for spike rates (the importance of which can be seen in Figure 2AB), and do not require unnecessary coarse-graining of the data by binning spikes (the 200 ms time bins are very long compared to the time scale on which communication between closely connected neuronal populations within an area, or between related areas, takes place).

      First, a few points of clarification.

      (i) We worked with two-photon calcium imaging data (mice), and with the envelope of multi-unit activity (monkeys). While both of these types of signals are strongly correlated with spikes, neither of them can be truly considered to be a point process.

      (ii) The reviewer points to Figure 2AB. The signals that we worked with can be negative. The black traces are the actual signals and show clear negative bouts, especially noticeable in the middle panel in Figure 2B. Of course, this does not mean that there are negative spike rates. This has to do with the way the data are normalized and not with the specific prediction method. However, the reviewer is correct in stating that the method that we used could also yield negative values even for non-negative spike rates.

      (iii) We did not bin the macaque data into 200-ms time bins, but rather 25-ms time bins (line 548, Figure 1B legend). Additionally, we have now performed additional analyses with different window sizes, showing that the conclusions still hold (see Supplemental Figure 4 and lines 139-143).

      To further address the reviewer’s question, we implemented a Poisson GLM enforcing non-negativity on macaque MUAe data (without spontaneous activity subtraction, ensuring strictly positive values; lines 135-139, Supplemental Figure 1M). The model did not improve predictions over ridge regression, confirming our methodological choice. This method is not directly applicable to mouse calcium data, since the activity after baseline subtraction can be negative.
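
      For readers who want a concrete picture of the ridge-regression baseline mentioned above, the following is a minimal sketch on synthetic data. It is not the authors' pipeline: the array sizes, the noise level, the regularization strength `alpha`, and the helper names `fit_ridge` and `explained_variance` are all hypothetical choices made for illustration. It fits a closed-form ridge model from a predictor population to a target population and reports held-out explained variance (EV) for each target unit.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept.

    X: (n_samples, n_predictors), Y: (n_samples, n_targets).
    """
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    n_features = X.shape[1]
    # Solve (Xc^T Xc + alpha * I) W = Xc^T Yc for the weights W.
    W = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ Yc)
    b = y_mean - x_mean @ W
    return W, b


def explained_variance(Y_true, Y_pred):
    """Fraction of variance explained, one value per target column."""
    resid = ((Y_true - Y_pred) ** 2).sum(axis=0)
    total = ((Y_true - Y_true.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - resid / total


# Synthetic stand-in data (assumption): targets are noisy linear
# combinations of the predictor population.
rng = np.random.default_rng(0)
n_samples, n_pred, n_targ = 400, 20, 5
X = rng.normal(size=(n_samples, n_pred))
W_true = rng.normal(size=(n_pred, n_targ))
Y = X @ W_true + 0.5 * rng.normal(size=(n_samples, n_targ))

# Fit on the first 300 samples, evaluate on the remaining 100.
train, test = slice(0, 300), slice(300, None)
W, b = fit_ridge(X[train], Y[train], alpha=1.0)
ev = explained_variance(Y[test], X[test] @ W + b)
print(ev)
```

      Note that nothing in this linear model constrains predictions to be non-negative, which is the property the reviewer raised, and that held-out EV computed this way can fall below zero when the model predicts worse than the test-set mean.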

      We did not use Granger or any other causality methods. The question of causality is certainly important, and there are multiple methods developed to assess causality in neural signals. We do not make any claims about causality in our study. A rigorous evaluation of causality is an interesting line of research for future work.

      R1.2: In terms of analysis results, the work in the manuscript presents some expected and some less expected results. However, because the monkey data are based on only one monkey (misleadingly, the manuscript consistently uses the plural ‘monkeys’), none of the results specific to that monkey, nor the comparison of that one monkey to mice, are supported by robust data.

      We have now added data from 2 additional monkeys, including:

      (i) A second monkey (monkey “A”) from the same dataset (Chen et al., 2020), which includes all activity types except the lights off condition (lines 90-96, 120-132, 159, 161, 171, 183-185, 188-194, 200-203, 228-237, 254-258, 292-296, 334-342, 351-353, 358-364, 374-378, 387-393, 400-408, 414, 417-421, 539-540, 544-545, 680-681, 696-698; Supplemental figures 1-6, 8, 11, 12, and 13; Table 2).

      (ii) We collected new neural activity from one additional monkey (monkey “D”) in collaboration with the Ponce lab (lines 90-96, 120-130, 132-134, 163-164, 228-235, 237-243, 292-296, 351-353, 374-378, 387-389, 539-540, 553-560, 696-698; Supplemental figures 1-2, 4, 6, 9, 11, and 12; Table 2). The new data include responses to the same checkerboard and gray screen images as the original dataset, along with responses during lights-off conditions.

      R1.3: One of the main results for mice (bimodality of explained variance values, mentioned in the abstract) does not appear to be quantified or supported by a statistical test.

      We have now formally quantified the bimodality of the relationship between one-vs-rest correlation and inter-laminar explained variance (EV) in mice using Hartigan’s dip test, applied to neurons with EV>0.4. The test confirmed significant bimodality in two of the three mice (MP031 and MP032: p<0.001; MP033: p=0.687). These results are now included in the Results section (lines 307-311) and shown in Supplemental Figure 7A,D. In datasets that did not show bimodality by visual inspection (macaque recordings), the same test yielded non-significant results (e.g., p=0.994), confirming that the statistical analysis distinguishes between bimodal and unimodal cases.

      R1.4: Moreover, the two data sets differ in too many aspects to allow for any conclusions about whether the comparisons reflect differences in species (mouse vs. monkey), anatomy (L2/3-L4 vs. V1-V4), or recording technique (calcium imaging vs. extracellular spiking).

      We agree with this comment. Our goal is not to provide any direct quantitative comparison between the two species. We emphasize (lines 494-497) that the experiments in the two species differ along multiple dimensions, including: (i) recording modalities (calcium imaging vs. electrophysiology), (ii) associated differences in temporal resolution, neuronal types, and SNR, (iii) cortical targets (layers vs. areas), (iv) sample size, (v) stimuli, and (vi) task conditions. In the revised manuscript, we also emphasize that the aim of this work is to investigate inter-areal interactions within each species rather than to draw quantitative comparisons between species (lines 497-499).

      Reviewer #1 (Recommendations for the authors):

      R1.5 In the analysis of directionality, you stated that subsampling was done randomly. Presumably, there could be multiple subsamples that fulfill the control of split-trial r. Are you only showing results from one subsample or multiple subsamples?

      We show the median from 10 subsample permutations. This is now clarified in line 621.

      R1.6 About the measurement 1-vs-rest r2. Understanding the definition is important for interpreting the results, but the definition was not clearly written. In lines 195-196, could you be more clear about whether the correlation is between the predicted neuron and other neurons in the predicted population or between the predicted neuron and the mean activity of the predictor population? Also, in line 212, why do you call this self-consistency? Isn't this a correlation between a neuron and the others?

      The 1-vs-rest r<sup>2</sup> value, or self-consistency, is the correlation calculated for each neuron i and does not involve other neurons. Let 𝑟 indicate the response of neuron i during trial t (t = 1, ..., T, where T is the total number of trials). For a given trial t, we compute the average activity of the neuron excluding this trial:

      Throughout, the superscript (rest) means “all repetitions excluding repeat 𝑡”. The one-vs-rest correlation for the held-out repetition 𝑡 is:

      We then average these correlations across all held-out repetitions:

      We now clarify this in the text (lines 304-306 and lines 642-647).
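
      The three steps described above (leave one repeat out, correlate it with the mean of the remaining repeats, then average across held-out repeats) can be illustrated with a minimal sketch. It assumes that each repeat of a neuron's response is a vector over stimuli and that the correlation is a Pearson correlation; the function name `one_vs_rest_correlation` and the array layout are hypothetical and not taken from the manuscript. The function returns the mean correlation and leaves any subsequent squaring to the caller.

```python
import numpy as np

def one_vs_rest_correlation(responses):
    """Mean one-vs-rest correlation for a single neuron.

    responses: array of shape (T repeats, S stimuli), an assumed layout.
    """
    responses = np.asarray(responses, dtype=float)
    n_repeats = responses.shape[0]
    total = responses.sum(axis=0)
    corrs = np.empty(n_repeats)
    for t in range(n_repeats):
        # Average over all repeats except the held-out repeat t.
        rest_mean = (total - responses[t]) / (n_repeats - 1)
        # Pearson correlation between the held-out repeat and the rest.
        corrs[t] = np.corrcoef(responses[t], rest_mean)[0, 1]
    # Average across all held-out repeats.
    return corrs.mean()
```

      Because only the repeats of neuron i enter the computation, the measure reflects the self-consistency of that neuron and is independent of the rest of the population.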

      R1.7 In Figure 6 G and I. The "all" condition contains more neurons than either of the other two. In this case, is this comparison fair or meaningful?

      The reviewer is also correct here. The comparisons between the <10% and >80% groups contain the same number of predictor neurons, and those are fair comparisons. The “all” condition contains more predictor neurons, and, therefore, those comparisons are not fair. We clarified this point in lines 360-364.

      We included the “all” condition here because we think that it is an instructive sanity check in terms of reporting how EV changes with more neurons, and also in terms of understanding why the EV values in the other two conditions are lower. Expanding on this point with a little bit of philosophy, ultimately, when considering a neuron in area B (e.g., V4) and the contributions from neurons in another area A (e.g., V1), one would like to have access to all the inputs (e.g., all the neurons in V1 that are monosynaptically connected to the target neuron in area V4). We do not have access to this type of information, and we do not make any claims about monosynaptic connectivity, let alone exhaustive sampling of inputs to a given neuron. The “all” condition merely provides a quantitative illustration of the fact that EV increases with the number of predictor neurons. This observation may be considered to be somewhat trivial, but it should be pointed out that the conclusion relies on the input neurons sharing information with the target neurons (e.g., perhaps one may not be able to predict V4 activity very well from the responses of millions of neurons in the cerebellum).

      R1.8 I believe the results section can be improved by adding some interpretation after each finding.

      We thank the reviewer for the suggestion. We generally like to separate results from interpretation. However, to honor the suggestion, we added brief interpretations throughout the results section (lines 142-143, 171-173, 272-273, 279-281, 331-333, and 361-364) and expanded on the interpretations in the Discussion section.

      R1.9 Line 52 - 74: It would be better to be more specific about what kind of neuronal interactions, e.g., noise correlation, synchrony, etc.

      We added a clarification on the types of interactions we study in lines 68-73.

      R1.10 Line 81. Something seems to be missing after "5500". 5500 trials? Neurons?

      We thank the reviewer for pointing this out. The number refers to neurons (fixed in line 87).

      R1.11 Line 94. The readers would appreciate more explanation of the method.

      We have expanded on the explanation, as suggested (lines 106-107).

      R1.12 Line 104. The fraction of visually responsive neurons seems to be small. Is this typically for mouse V1? Would this fraction be higher if you also used the peak, as you did for macaque data in your SNR calculation (line 412)? And what is this number for the recorded L4?

      The reviewer correctly points out the small number of visually responsive neurons.

      We note that we now refer to the subset of neurons used for prediction analyses as visually reliable (VR) neurons (lines 115-116, 125-126, 178-179, 183-184, 211-212, 214-216, 217-226, 283-286), defined conservatively as neurons with SNR > 2 computed from the mean across all stimuli (not the peak to any one stimulus) and split-half reliability >0.8 (Methods, lines 569–590). This choice emphasizes neurons that are consistently informative over the full stimulus set.

      Regarding how typical this number of responsive neurons is in mice, the fraction of “responsive” neurons in mouse V1 varies widely depending on the definition and stimulus set, but the fractions are substantially lower than those reported in monkeys (with different methods). For those of us more used to the macaque neurophysiology literature, this has been one of the biggest surprises coming from work in rodents. Many studies report a sizable group of non-responsive neurons in mouse V1 (e.g., as few as 37% of V1 neurons being responsive in at least 25% of the trials according to de Vries et al., Nat Neurosci, 2020). Our fraction of visually responsive neurons is small because it couples a conservative SNR metric with a high trial-reliability threshold.

      As the reviewer notes, a peak-based metric based on any stimulus would be a less conservative criterion that would increase the fraction of neurons labeled responsive.
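
      As an illustration of how the two criteria above combine, the following sketch selects neurons with SNR > 2 and split-half reliability > 0.8. The SNR is passed in precomputed because its exact definition is given in the Methods and is not reproduced here. The split-half procedure shown (one random split of repeats, Pearson correlation of the two half-averages) is an assumption, and both function names are hypothetical.

```python
import numpy as np

def split_half_reliability(responses, rng):
    """Correlation between the mean responses of two random halves of repeats.

    responses: array of shape (n_repeats, n_stimuli) for one neuron (assumed layout).
    """
    n_repeats = responses.shape[0]
    order = rng.permutation(n_repeats)
    half_a = responses[order[: n_repeats // 2]].mean(axis=0)
    half_b = responses[order[n_repeats // 2:]].mean(axis=0)
    return np.corrcoef(half_a, half_b)[0, 1]

def visually_reliable_mask(snr, reliability, snr_min=2.0, reliability_min=0.8):
    """Boolean mask of neurons passing both thresholds (strict inequalities)."""
    return (np.asarray(snr) > snr_min) & (np.asarray(reliability) > reliability_min)
```

      Requiring both conditions is what makes the criterion conservative: a neuron with a strong response to a few stimuli but inconsistent responses across repeats is excluded.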

      R1.13 Line 113. Why not also give an exact percentage number?

      We have given the exact percentage number (lines 125-126).

      R1.14 Line 128. Is this just because L2/3 has more neurons? If so, then isn't this trivial?

      Our intention was to illustrate the best prediction performance we could get in either direction, which means including all L2/3 neurons. We have reworded our text to clarify (lines 149-151).

      R1.15 Line 134. Isn't this expected? Since V1 have more units than V4?

      The reviewer is correct. As discussed in R1.7 in mice, we sought to report the best prediction performances in either direction. We have edited our text for clarity (lines 149-151).

      R1.16 Line 165-168. What's the logical connection between these two sentences? If the former is true, we should expect to see differences. Also, why the same population? Shouldn't you include non-visual neurons?

      The two sentences in question are: “The difference in predictability in the absence of a stimulus could in principle change according to the directionality in inter-laminar interactions.” and, “There was no statistically significant difference in the EV fraction between laminar directions (L4→L2/3 vs. L2/3→L4) using the same control population as in Figure 3B (Figure 5A-C and Figure Supplement 2H).”. The key point here was to control for similar reliability values in order to make fair comparisons. We have added an additional comparison between directionalities focusing on nonvisual neurons (SNR<2 & r<0.8), and have also found no statistically significant difference between direction of predictability (Supplemental Figure 3A, right, lines 221-224).

      R1.17 Table 2. The information of which session corresponds to which experiment can be put in the table, which would be easier to read.

      We have added which sessions correspond to which experiments in Table 2.

      R1.18 Figure 1, Captions for panel c and d. I don't see any colored arrows in the figure.

      We removed the color descriptions (Figure 1C-D).

      R1.19 Figures 3, 4, and others. The annotations of "n.s." are very hard to see.

      We changed the color so that it is easier to see now (Figures 3, 4, 6, and Supplementary Figures 1-4, 6, and 8-10).

      R1.20 Figure 5, panel A. The legend is too small.

      We increased the legend size (Figure 5A).

      R1.21 Figure S5, panel D. Why are some of the data points connected?

      The paired connections are illustrated specifically in the highly predictable neurons to highlight the two separate distributions of neurons. One group, the highly predictable and highly reliable group, maintains its inter-laminar predictability after projecting out the “non-visual” activity (lines 327-330), whereas the highly predictable yet unreliable group shows a sharp decrease in inter-areal predictability, which corroborates the idea of non-visual components influencing neurons in mouse V1, as shown by Stringer et al. 2019b and consistent with our results.

      R1.22 l.91 "Ope" -> open?

      We fixed the typo (line 100).

      R1.23 Fig. 3C+D: Why is only one session used for this?

      One session was used to illustrate the distribution of split-half reliability values per area. Figure 3D contains information about all 5 stimulus sessions (see legend to Figure 3D).

      R1.24 "Even without controlling for the number of predictors or their respective split-half correlation values (627-688 sites in V1, 86-115 sites in V4), we found better predictability in the V1 to V4 direction than the reverse ( 𝑝 < 0.001, Figure Supplement 2I)." -> What does "even" mean here? Isn't this simply the null result if there is no true difference and the real reason the authors controlled for size?

      The reviewer’s understanding is correct. We have edited our text for clarity (lines 157-160).

      R1.25 "We could predict V1 and V4 activity across all stimulus types ( 𝑝 < 0.001, paired permutation test of prediction vs. shuffled frames prediction)." -> better than chance? For all neurons on average? What does this mean? Isn't it trivial and 100% expected that neural activity in the visual cortex is above chance related to the visual input?

      We stated that sites in V1 and V4 could predict each other across all stimulus types before describing the differences between them. We agree that this observation is to be expected and indicated so now in the text (lines 185-186).

      R1.26 "The predictability was the highest in both directions for neuronal activity in response to a full field checkerboard images (Figure 4D). In the V1 → V4 direction, the EV fraction was higher when predicting a slow-moving small thin bar compared to a fast-moving large thick bar (Figure 4D, left), whereas the opposite was true for the V4 → V1 direction (Figure 4D, right)." -> What does this mean? Is this expected or not? Under what theories of cortical processing?

      The differences between EV prediction directions (V1→V4: slow thin bars > fast thick bars; V4→V1: fast thick bars > slow thin bars) could arise because V4 responses are more reliable for the slow thin bars whereas V1 responses are more reliable for the fast thick bars (Supplemental Figure 5H–I). To account for this possibility, we controlled for differences in target-related properties by regressing out covariates such as SNR, split-half correlation, and variance. In monkey L, after regressing out reliability/drive within each direction using these covariates, the V4→V1 difference between slow thin bars and fast thick bars was not significant, and the difference in the V1→V4 direction was reduced (Supplemental Figure 5K, lines 198-203). This suggests that the asymmetry primarily reflects stimulus‑dependent reliability of the target population rather than a strong directional selectivity.

      To the best of our knowledge, there are no clear predictions that match these observations from existing theories of visual cortical processing, especially given the paucity of computational models that include stimulus velocity when describing the responses in area V4. There has been extensive work on theories of surround suppression, but it seems unlikely that the thick bars would elicit surround suppression given the size of the V4 receptive fields. Many current computational models that aim to fit the responses of neurons in the visual cortex use neural networks that take an image as visual input and yield activations. Most of these models do not incorporate stimulus movement, and even those that do incorporate stimulus dynamics map only indirectly onto interlaminar or between-area stimulus transformations. We hope that the results in this manuscript will help inspire and constrain better models of visual cortical processing.

      R1.27 Shouldn't all the predictability analysis be done conditioned on the stimulus in order to tell us more than the trivial "both V1 and V3, or L2/3 and L4, are driven by visual inputs"? (The spontaneous activity analyses are essentially that, for a small subset of the stimuli.)

      The key goal of this study is to quantify inter-areal interactions both under visual input and without visual input. This type of analysis is important because inter-areal interactions may depend both on visual inputs and on neuronal inputs that are not triggered by visual signals. For example, extensive work in mice has now shown that neuronal responses in V1 depend on an animal’s running speed, independently of any visual input. Even within the visual input conditions, we present analyses where we shuffle trial order (e.g., Figure 7, Supplementary Figure 11) to estimate the contribution of trial-by-trial variations that are independent of visual inputs and other analyses where we project out non-visual activity (e.g., Supplementary Figure 7).

      R1.28 "In visually responsive neurons, there was a significant reduction in EV during gray screen compared to visual stimulus presentation" -> perfectly expected. But the report-worthy result here is how much is left, not whether EV is decreased!

      We have changed the wording on the results to highlight the sustained predictability (lines 211-212). It is important to note that, although the reduction in EV during gray screen may be expected, this observation does not hold for all neurons. In fact, there are some neurons for which the EV during visual presentation is comparable to that during gray screen (Figure 5B,C,E: neurons that lie on the diagonal line).

      R1.29 "Similar to the conclusions drawn from the mouse data, the predictability of neuronal activity was higher in response to stimulus presentation than to gray screen presentations" -> Really? Conditioned on stimulus, or explainable by the well-known fact that both V1 and V4 are visually driven?

      As discussed in R1.28, in mice, there are many neurons where the EV during gray screen is comparable to that during stimulus presentation. In monkeys, most sites were visually driven. As the reviewer points out, we expected that EV during stimulus presentation would be higher than during gray screen; this observation is a reasonable sanity check. The difference between unshuffled trials and shuffled trials (Figure 7, Supplementary Figure 11) provides an estimate of the interactions that are not purely explained by visual inputs alone in monkeys.

      R1.30 "Unlike the mouse, macaque correlation of visual predictability between stimulus presentation and spontaneous activity was high across all types of spontaneous conditions" -> Why? Is this simply explainable by a lower mean response in the spontaneous condition in the mouse? Are these mouse and monkey experiments truly comparable? Isn't it surprising that spontaneous activity in the monkey visual cortex compared to evoked activity is higher than in the mouse?

      With respect to the question of whether spontaneous activity (or stimulus-evoked activity) in monkeys is higher than in the mouse, it is difficult to make these comparisons. We emphasize in the text the multiple differences between the experiments in both species. Our goal is not to perform any quantitative comparison across species (see R1.4). We changed the wording to remove any inference of comparison between species (lines 248-250).

      R1.31 Occasionally imprecise presentation. Ex "To further examine the non-stimulus driven component, we reasoned that if the shared information between areas were strictly driven by the visual stimulus, then using the activity of a stimulus presentation repeat to one specific image could be used to predict the responses to any other stimulus repeat of the same image. On the other hand, if the shared activity does not have any stimulus-response information, then the prediction model would not work when considering responses across repeated presentations of identical stimuli in different trials. To test these two opposing ideas, we compared the inter-areal prediction EV fractions using unshuffled versus shuffled trials." -> Sets up two extreme strawmen (100% driven by stimulus vs 0% driven by stimulus). What does "model would not work" mean? EV=0? Hypotheses not ideas.

      Our intent was to set up two extreme hypotheses, not to claim that neurons must fall exclusively into one or the other. The two extremes help better interpret the results.

      The reviewer indicates that these are straw-man hypotheses. This may well be the case. But note the responses to R1.12, R1.27, R1.28, and R1.29. The reviewer seems to assume that all or most neurons in the visual cortex should be mostly or exclusively driven by visual stimuli.

      We also replaced “ideas” with “hypotheses”, as suggested. We have expanded the discussion of these points in the manuscript (lines 480-493). Many neurons occupy intermediate positions between these two extreme hypotheses. We clarified that “model would not work” refers to prediction accuracy approaching chance (EV ≈ 0).
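
      The logic of the shuffled versus unshuffled comparison can be illustrated with a minimal sketch. It assumes a ridge regression fitted on half of the repeats and evaluated on the other half, and it implements the shuffle as a cyclic shift of the predictor repeats so that every target trial is paired with a predictor trial from a different repeat of the same stimulus. The function names, array layout, and shuffling scheme are illustrative assumptions and do not reproduce the exact procedure in the manuscript.

```python
import numpy as np

def ridge_ev(X_train, y_train, X_test, y_test, alpha=1.0):
    """Held-out explained variance of a ridge regression (closed form)."""
    x_mu, y_mu = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - x_mu
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(Xc.shape[1]),
                        Xc.T @ (y_train - y_mu))
    pred = (X_test - x_mu) @ w + y_mu
    return 1.0 - np.sum((y_test - pred) ** 2) / np.sum((y_test - y_test.mean()) ** 2)

def shuffled_vs_unshuffled_ev(X, y, alpha=1.0):
    """X: (repeats, stimuli, predictors); y: (repeats, stimuli) for one target.

    Returns (EV with matched trials, EV with mismatched repeats).
    """
    n_repeats, _, n_pred = X.shape
    # Cyclic shift: same stimulus, different repeat for every trial.
    X_shuffled = np.roll(X, 1, axis=0)
    split = n_repeats // 2

    def ev(Xa):
        X_tr = Xa[:split].reshape(-1, n_pred)
        X_te = Xa[split:].reshape(-1, n_pred)
        return ridge_ev(X_tr, y[:split].ravel(), X_te, y[split:].ravel(), alpha)

    return ev(X), ev(X_shuffled)
```

      Under the purely stimulus-driven extreme, the two EV values would be similar. Under the other extreme, the shuffled EV would approach zero. Intermediate cases yield a shuffled EV that is above zero but below the unshuffled EV.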

      R1.32 "In both species and in both directions, inter-areal prediction EV fraction persisted (𝑝 < 0.001," Doesn't persist mean EV is unchanged? But the test is EV>0 or not in both cases.

      We meant that EV values remained significantly above chance, not that they were unchanged. The statistical test was indeed whether EV > 0 as the reviewer indicated. We have revised the text accordingly (lines 375-380).

      R1.33 "In mice, neurons showed a bimodal distribution in terms of their response predictability in shuffled and unshuffled trials" -> I don't see any bimodality in the figure, nor is there a statistical test provided for bimodality.

      In Figure 7C, one group of neurons lies essentially along the horizontal axis, whereas the other group is dispersed closer to the diagonal line. Specifically, the neurons that lie on the horizontal axis are also the ones whose responses are best predicted during gray screen activity. We have changed the text to clarify this point (lines 380-382).

      R1.34 "In the macaque V4 → V1 direction, there was a large proportion of neurons with peak EV when considering 25 ms to 50 ms offsets in the positive direction (i.e., V4 after V1, Figure 7I, right)." -> So what does this mean? Is this compatible with anything we know? This is the anti-causal direction so some kind of explanation would be warranted.

      In the V4→V1 panel, a positive offset means we use V4 at t+Δt to predict V1 at t (and conversely in the V1→V4 panel). Therefore, the fact that the peak EV occurs at +10–20 ms indicates that V1 leads V4 by ~10–20 ms: in other words, V1’s earlier response best predicts V4’s slightly later response. This observation is not anti-causal, but rather it is consistent with the canonical largely feed-forward V1→V4 latency (e.g., Schmolesky et al., 1998 among many others). We clarified this in the text (lines 400-404).
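
      The sign convention for the offsets can be made concrete with a small sketch. It uses a single predictor signal and a single target signal and takes the squared Pearson correlation as a stand-in for EV, which is a simplification of the cross-validated population regression used in the manuscript. The function name `lag_sweep_ev` is hypothetical, and lags are expressed in time bins.

```python
import numpy as np

def lag_sweep_ev(predictor, target, max_lag):
    """Squared correlation between predictor[t + lag] and target[t] for each lag."""
    predictor = np.asarray(predictor, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(target)
    lags = np.arange(-max_lag, max_lag + 1)
    ev = np.empty(len(lags))
    for k, lag in enumerate(lags):
        # Keep only time points where both t and t + lag fall inside the recording.
        t = np.arange(max(0, -lag), min(n, n - lag))
        r = np.corrcoef(predictor[t + lag], target[t])[0, 1]
        ev[k] = r ** 2
    return lags, ev
```

      If the target signal (here standing in for V1) leads the predictor signal (V4) by two bins, the peak of this curve falls at a lag of +2 bins: the later V4 sample is the one that best matches the earlier V1 sample, which is the feed-forward interpretation given above.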

      R1.35 L. 307: "In monkeys," plural!?

      While this was not correct in the original version, we have now added data from two more monkeys.

      R1.36 L. 313: "we observed an approximately bimodal distribution of neuronal responses, with a large subset of neurons that do not show reliable responses to visual stimuli both in L4 and L2/3" -> where?

      The bimodal distribution can be appreciated in Figure 6B (1-vs-rest r<sup>2</sup>, third panel; note the neurons along the y-axis, see also R1.33) and Supplementary Figure 7B (lines 307-312). Additionally, as stated in R1.3, we have now formally quantified the bimodality of the relationship between one-vs-rest correlation and inter-laminar explained variance (EV) in mice using Hartigan’s dip test (lines 310-313); see also Supplementary Figure 7A,D. In datasets that did not show bimodality by visual inspection (macaque recordings), the same test yielded non-significant results, confirming that the statistical analysis distinguishes between bimodal and unimodal cases.

      R1.37 Random subsampling to control for population size done with how many subsamples? How are they combined? Variability across subsamples interpreted how?

      We performed 10 permutations and used the median distributions across permutations (line 621).
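
      A minimal sketch of this procedure is shown below. It draws random subsamples of predictor indices without replacement, evaluates a user-supplied scoring function on each subsample, and reports the median. The function name, the `score_fn` callback, and the fixed seed are illustrative assumptions, and the sketch does not include the additional matching of split-trial r described in the manuscript.

```python
import numpy as np

def median_over_subsamples(score_fn, n_predictors, subsample_size,
                           n_subsamples=10, seed=0):
    """Median of score_fn over random predictor subsamples of equal size.

    score_fn: callable taking an index array and returning a scalar (e.g., an EV).
    """
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_subsamples):
        # Sample predictor indices without replacement to match population size.
        idx = rng.choice(n_predictors, size=subsample_size, replace=False)
        scores.append(score_fn(idx))
    return float(np.median(scores))
```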

      Reviewer #2 (Public Review):

      R2.0: “Summary:

      In this work, the authors investigated the extent of shared variability in cortical population activity in the visual cortex in mice and macaques under conditions of spontaneous activity and visual stimulation. They argue that by studying the average response to repeated presentations of sensory stimuli, investigators are discounting the contribution of variable population responses that can have a significant impact at the single trial level. They hypothesized that, because these fluctuations are to some degree shared across cortical populations depending on the sources of these fluctuations and the relative connectivity between cortical populations within a network, one should be able to predict the response in one cortical population given the response of another cortical population on a single trial, and the degree of predictability should vary with factors such as retinotopic overlap, visual stimulation, and the directionality of canonical cortical circuits.”

      R2.1: To test this, the authors analyzed previously collected and publicly available datasets. These include calcium imaging of the primary visual cortex in mice and electrophysiology recordings in V1 and V4 of macaques under different conditions of visual stimulation. The strength of this data is that it includes simultaneous recordings of hundreds of neurons across cortical layers or areas. However, the weaknesses of calcium dynamics (which has lower temporal resolution and misses some non-linear dynamics in cortical activity) and multi-unit envelope activity (which reflects fluctuations in population activity rather than the variance in individual unit spike trains), underestimate the variability of individual neurons. The authors deploy a regression model that is appropriate for addressing their hypothesis, and their analytic approach appears rigorous and well-controlled.

      We agree with these points, and we discuss these specific limitations in capturing the variability of individual neurons in the Discussion section (lines 500-504). We have now also added analyses based on local field potentials (LFP). LFPs do not directly reflect the activity of individual neurons either.

      R2.2: From their analysis, they found that there was significant predictability of activity between layer II/III and layer IV responses in mice and V1 and V4 activity in macaques, although the specific degree of predictability varied somewhat with the condition of the comparison with some minor differences between the datasets. The authors deployed a variety of analytic controls and explored a variety of comparisons that are both appropriate and convincing that there is a significant degree of predictability in population responses at the single trial level consistent with their hypothesis. This demonstrates that a significant fraction of cortical responses to stimuli is not due solely to the feedforward response to sensory input, and if we are to understand the computations that take place in the cortex, we must also understand how sensory responses interact with other sources of activity in cortical networks. However, the source of these predictive signals and their impact on function is only explored in a limited fashion, largely due to limitations in the datasets. Overall, this work highlights that, beyond the traditionally studied average evoked responses considered in systems neuroscience, there is a significant contribution of shared variability in cortical populations that may contextualize sensory representations depending on a host of factors that may be independent of the sensory signals being studied.

      We agree that these datasets do not lend themselves well to directly separating and quantifying all the different sources of the predictive signals. We expand on this point in the Discussion section (lines 509-511).

      R2.3: The different recording modalities and comparisons (within vs. across cortical areas) limit the interpretability of the inter-species comparisons.

      We also agree with this comment. We emphasize that our goal is not to attempt a direct quantitative comparison across species (lines 497-499).

      R2.4: Strengths:

      This work considers a variety of conditions that may influence the relative predictability between cortical populations, including receptive field overlap, latency that may reflect feed-forward or feedback delays, and stimulus type and sensory condition. Their analytic approach is well-designed and statistically rigorous. They acknowledge the limitations of the data and do not over-interpret their findings.

      Weaknesses:

      The different recording modalities and comparisons (within vs. across cortical areas) limit the interpretability of the inter-species comparisons. The mechanistic contribution of known sources or correlates of shared variability (eye movements, pupil fluctuations, locomotion, whisking behaviors) were not considered, and these could be driving or a reflection of much of the predictability observed and explain differences in spontaneous and visual activity predictions.

      We have expanded on the Discussion section to explicitly state the points raised by the reviewer (lines 494-509).

      In mice, we have now also analyzed a separate dataset in which behavioral measurements were available, including running speed and facial motion (FaceMap SVDs). We used these to build behavioral-only and combined models to predict neural activity. We found that behavioral variables explained a modest but consistent portion of the variance across both spontaneous and stimulus conditions (Supplementary Figure 10A,C, lines 268-273).

      For the macaque data, we analyzed pupil size as the only available behavioral measure in the macaque dataset. We focused specifically on the “resting state, eyes open” condition, where both neural activity and pupil measurements were available. Using ridge regression, we assessed the extent to which pupil size predicted neural activity in V1 and V4. Pupil size alone explained only a small fraction of the variance (Supplementary Figure 10E, lines 274-276).

      R2.5: Previous work has explored correlations in activity between areas on various timescales, but this work only considered a narrow scope of timescales.

      Without going into specifics about the numbers, it is hard to fully address this question. As the reviewer noted in R2.1, the mouse data analyzed here do not lend themselves to evaluating predictability on scales of tens of milliseconds. In the macaque data, we have now conducted additional analyses where we binned the activity across a range of bin sizes (10 ms to 200 ms). The new analyses are shown in Supplementary Figure 4, and described in lines 140-143, 160-163.

      R2.6: The observation that there is some degree of predictability is not surprising, and it is unclear whether changes in observed predictability with analysis conditions are informative of a particular mechanism or just due to differences in the variance of activity under those conditions. Some of these issues could be addressed with further analysis, but some may be due to limitations in the experimental scope of the datasets and would require new experiments to resolve.

      First, we note that several of the analyses and comparisons are within conditions and not across conditions, where by “condition” we mean the presence or absence of a stimulus or different stimuli (e.g., Figures 3, 5, 6, 7, Supplementary Figures 3-4, 7–13).

      Second, we note that our mouse preprocessing standardized responses by spontaneous mean and SD per neuron, controlling baseline scale across conditions (lines 535-538). Because of this standardization, spontaneous traces have unit scale (mean = 0, SD = 1).
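
      The standardization described above can be sketched as follows, assuming activity arrays of shape (time points, neurons). The function name is hypothetical, and the sketch omits any other preprocessing steps.

```python
import numpy as np

def standardize_by_spontaneous(evoked, spontaneous):
    """Z-score each neuron using the mean and SD of its spontaneous activity.

    evoked, spontaneous: arrays of shape (time points, neurons), an assumed layout.
    """
    mu = spontaneous.mean(axis=0)
    sd = spontaneous.std(axis=0)
    # The same per-neuron statistics are applied to both conditions.
    return (evoked - mu) / sd, (spontaneous - mu) / sd
```

      By construction, the standardized spontaneous traces have zero mean and unit SD per neuron, whereas the evoked traces are expressed in units of spontaneous SD.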

      To test whether differences in variance underlie our findings, we calculated the variance for both species. For mice, we computed variance across repeats (visual) and across timepoints (lines 286-291). For the macaque moving-bar sessions, we computed variance across the concatenated held-out samples pooling timepoints, repeats, and bar identities (lines 291-292).

      The V4 population showed a higher overall variance distribution compared to the V1 population (Supplementary Figure 2I-J), and L2/3 variance was also overall higher than L4 (Supplementary Figure 2D-E). We also see a modest monotonic relationship between EV fraction and this variance (mouse visual: Spearman ρ = 0.43–0.52, p < 0.001; macaque stimulus responses: ρ = 0.50–0.56, p < 0.001; macaque gray-screen responses: ρ = 0.38, p < 0.001, Figure 6A,D), indicating variance contributes to (but is not the primary driver of) EV prediction fraction. We then adjusted for variance by fitting, within each stimulus condition, a linear regression of EV on variance (excluding shuffled-control rows) and conducted all comparisons on the resulting residual EV values, thereby isolating effects not attributable to variance (see Supplementary Figure 3E-G, lines 165-171).
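
      The variance adjustment can be illustrated with a short sketch: within each stimulus condition, EV is regressed linearly on variance and the residuals are retained for subsequent comparisons. The function names and the use of an ordinary least-squares line via `np.polyfit` are assumptions made for illustration.

```python
import numpy as np

def residual_ev(ev, variance):
    """Residuals of a least-squares line fitted to EV as a function of variance."""
    ev = np.asarray(ev, dtype=float)
    variance = np.asarray(variance, dtype=float)
    slope, intercept = np.polyfit(variance, ev, 1)
    return ev - (slope * variance + intercept)

def residual_ev_by_condition(ev, variance, condition):
    """Apply the adjustment separately within each stimulus condition."""
    ev = np.asarray(ev, dtype=float)
    variance = np.asarray(variance, dtype=float)
    condition = np.asarray(condition)
    out = np.empty(len(ev))
    for c in np.unique(condition):
        mask = condition == c
        out[mask] = residual_ev(ev[mask], variance[mask])
    return out
```

      Any difference that survives in the residual EV values cannot be attributed to a linear dependence of EV on variance within that condition.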

      Reviewer #2 (Recommendations for the authors):

      R2.7 Overall I found this manuscript to be very clearly written and the results compelling, although I found myself wanting a little more. I believe these datasets also include information about eye movements, pupil diameter, and maybe locomotion and whisking in the rodent work. I think it could be informative to ask the degree to which the predictability, particularly during the spontaneous activity, is attributable to these other known sources of variance in trial-by-trial measures. My concern is that during visual stimulation, the space of cortical responses is limited to a very narrow scope (observing a visual stimulus during fixation) whereas spontaneous activity includes a broader range of possibilities (different states of arousal, eye movement).

      We analyzed the role of behavioral variables that could explain the neural activity in mouse V1 (including variables suggested by the reviewer: running speed and FaceMap SVDs). The authors of the open dataset cautioned against using pupil size because the measurements were not accurate in the dark. In terms of the contribution to the predictability of mouse V1 activity, these behavioral variables showed a weak yet significant contribution (Supplementary Figure 10A,C, lines 260-270).

      R2.8 By controlling for eye movements or pupil diameter during spontaneous measurements, would you improve your measure of predictability?

      When predicting neural activity in the lights-off, eyes-open condition, combining neural data from the predictor population with pupil-size information did not result in a statistically significant increase in EV fraction when predicting the target population (Supplementary Figure 10E, lines 276-278).

      R2.9 Also, there is work that shows feed-forward correlations between V1 and higher visual areas are observed in higher frequency activity, whereas feedback is associated with lower frequency activity. If you compared your predictability measure over bandpasses with different timescales, would you find the direction of V1-V4 interactions changes consistent with this previous work?

      To address this question, we extended our analyses to the local field potential signals (LFPs) in monkeys, using band-limited LFP power (2–12, 12–30, 30–45, 55–95 Hz). We reran the lag sweep analyses (10-ms steps; 200-ms windows slid every 10 ms) in both directions. The gamma band showed a feed-forward signature in the early evoked period: the V1→V4 predictability peaked at negative offsets (∼10–30 ms; V1 leads), and the V4→V1 predictability peaked at positive offsets, consistent with previous findings. The results for low and beta frequency bands are also presented in the text (Supplemental Figure 13, lines 412-423).

      Reviewer #3 (Public review):

      R3.0: Neural activity in the visual cortex has primarily been studied in terms of responses to external visual stimuli. While the noisiness of inputs to a visual area is known to also influence visual responses, the contribution of this noisy component to overall visual responses has not been well characterized.

      In this study, the authors reanalyze two previously published datasets - a Ca++ imaging study from mouse V1 and a large-scale electrophysiological study from monkey V1-V4. Using regression models, they examine how neural activity in one layer (in mice) or one cortical area (in monkeys) predicts activity in another layer or area. Their main finding is that significant predictions are possible even in the absence of visual input, highlighting the influence of non-stimulus-related downstream activity on neural responses. These findings can inform future modeling work of neural responses in the visual cortex to account for such non-visual influences.

      R3.1: "A major weakness of the study is that the analysis includes data from only a single monkey. This makes it hard to interpret the data as the results could be due to experimental conditions specific to this monkey, such as the relative placement of electrode arrays in V1 and V4."

      We have now added the second monkey (monkey “A”) from the same dataset (Chen et al., 2020), which includes all activity types except the lights-off condition. In addition, we collected new neural activity from one additional monkey (monkey “D”) in collaboration with the Carlos Ponce lab (monkey A: see lines 90-96, 120-132, 159, 161, 171, 183-185, 188-194, 200-203, 228-237, 254-258, 292-296, 334-342, 351-353, 358-364, 374-378, 387-393, 400-408, 414, 417-421, 539-540, 544-545, 680-681, 696-698; Supplemental Figures 1-6, 8, 11, 12, and 13; monkey D: see lines 90-96, 120-130, 132-134, 163-164, 228-235, 237-243, 292-296, 351-353, 374-378, 387-389, 539-540, 553-560, 696-698; Supplemental Figures 1-2, 4, 6, 9, 11, and 12). The conclusions for the new monkeys are qualitatively similar to the ones reported previously. The main quantitative differences are due to the very large difference in the number of predictor sites (Table 2, lines 127-134).

      R3.2: The authors perform a thorough analysis comparing regression-based predictions for a wide variety of combinations of stimulus conditions and directions of influence. However, the comparison of stimulus types (Figure 4) raises a potential concern. It is not clear if the differences reported reflect an actual change in predictive influence across the two conditions or if they stem from fundamental differences in the responses of the predictor population, which could in turn affect the ability to measure predictive relationships. The authors do control for some potential confounds such as the number of neurons and self-consistency of the predictor population. However, the predictability seems to closely track the responsiveness of neurons to a particular stimulus. For instance, in the monkey data, the V1 neuronal population will likely be more responsive to checkerboards than to single bars. Moreover, neurons that don't have the bars in their RFs may remain largely silent. Could the difference in predictability be just due to this? Controlling for overall neuronal responsiveness across the two conditions would make this comparison more interpretable.

      First, we note that several of the analyses and comparisons are within conditions and not across conditions, where by “condition” we mean the presence or absence of a stimulus or different stimuli (e.g., Figures 3, 5, 6, 7, Supplementary Figures 3-4, 7-13).

      In Figure 4, differences in target-population responsiveness could influence predictability across stimulus types, as the reviewer points out. We therefore controlled for this by modeling EV as a function of the following neuron properties: split-half r, SNR, one-vs-rest r^2, and response variance. Regression was performed within each direction, and the residuals were then used for inference. When comparing residuals, the predictability of checkerboard responses remained significantly higher than the predictability of the responses to moving bars (p < 0.001, permutation test, Supplementary Figure 5K, lines 196-203), suggesting that the differences in predictability cannot be exclusively attributed to differences in the target population neuronal properties.
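
      The residual-based control described above can be illustrated with a minimal sketch. It is not the authors' code: it uses synthetic data, regresses EV on a single covariate (a stand-in for split-half r) rather than the four properties listed, and assumes a two-sided permutation test on the difference in mean residuals. The function names `residualize` and `permutation_pvalue` are hypothetical.

```python
import random
from statistics import mean


def residualize(y, x):
    """Residuals of y after a least-squares fit on one covariate x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def permutation_pvalue(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Synthetic example (assumed values, not data from the study):
# EV depends on reliability plus a stimulus-type offset.
rng = random.Random(1)
reliability = [rng.uniform(0.2, 0.9) for _ in range(80)]
is_checkerboard = [i < 40 for i in range(80)]
ev = [0.5 * r + (0.1 if c else 0.0) + rng.gauss(0, 0.02)
      for r, c in zip(reliability, is_checkerboard)]

# Regress out the covariate across both stimulus types, then compare residuals.
resid = residualize(ev, reliability)
checker = [e for e, c in zip(resid, is_checkerboard) if c]
bars = [e for e, c in zip(resid, is_checkerboard) if not c]
p = permutation_pvalue(checker, bars)
```

      In this toy setting the stimulus-type offset survives residualization, so the permutation test still separates the two groups; with several covariates the same logic would apply to the residuals of a multiple regression.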

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This important study provides the first direct neuroimaging evidence for the integration segregation theory of exogenous attention underlying inhibition of return, using an optimized IOR-Stroop fMRI paradigm to dissociate integration and segregation processes and to demonstrate that attentional orienting modulates semantic- and response-level conflict processing. Although the empirical evidence is compelling, clearer justification of the experimental logic, more cautious framing of behavioral and regional interpretations, and greater transparency in reporting and presentation are needed to strengthen the conclusions. The work will be of broad interest to researchers investigating visual attention, perception, cognitive control, and conflict processing.

      We appreciate the positive reception to our manuscript. In the revised manuscript, we have further clarified the logic underlying the task design, adopted a more cautious tone in interpreting the behavioral and neuroimaging results, and enhanced the transparency of reporting and presentation.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study makes a significant and timely contribution to the field of attention research. By providing the first direct neuroimaging evidence for the integration-segregation theory of exogenous attention, it fills a critical gap in our understanding of the neural mechanisms underlying inhibition of return (IOR). The authors employ a carefully optimized cue-target paradigm combined with fMRI to elegantly dissociate the neural substrates of cue-target integration from those of segregation, thereby offering compelling support for the integration-segregation account. Beyond validating a key theoretical hypothesis, the study also uncovers an interaction between spatial orienting and cognitive conflict processing, suggesting that exogenous attention modulates conflict processing at both semantic and response levels. This finding sheds new light on the neural mechanisms that connect exogenous attentional orienting with cognitive control.

      Strengths:

      The experimental design is rigorous, the analyses are thorough, and the interpretation is well grounded in the literature. The manuscript is clearly written, logically structured, and addresses a theoretically important question. Overall, this is an excellent, high-impact study that advances both theoretical and neural models of attention.

      Weaknesses:

      While this study addresses an important theoretical question and presents compelling neuroimaging findings, a few additional details would help improve clarity and interpretation. Specifically, more information could be provided regarding the experimental conditions (SI and RI), the justification for the criteria used for excluding behavioral trials, and how the null condition was incorporated into the analyses. In addition, given the non-significant interaction effect in the behavioral results, the claim that the behavioral data "clearly isolated" distinct semantic and response conflict effects should be phrased more cautiously.

      We thank the reviewer for these helpful comments. In the revised manuscript, we have provided additional clarification regarding the SI and RI conditions (page 29), expanded the justification for the behavioral trial exclusion criteria (page 32), and clarified how the null condition was modeled and incorporated into the analyses (page 29). In addition, we have revised the description of the behavioral results to adopt more cautious wording, particularly given the absence of a significant interaction effect. For detailed responses to these specific points, please refer to the "Recommendations for the Authors" section below.

      Reviewer #2 (Public review):

      Summary:

      This study provides evidence for the integration-segregation theory of an attentional effect, widely cited as inhibition of return (IOR), from a neuroimaging perspective, and explores neural interactions between IOR and cognitive conflict, showing that conflict processing is potentially modulated by attentional orienting.

      Strengths:

      The integration-segregation theory was examined in a sophisticated experimental task that also accounted for cognitive conflict processing, which is phenomenologically related to IOR but "non-spatial" by nature. This study was carefully designed and executed. The behavioral and neuroimaging data were carefully analyzed and largely well presented.

      Weaknesses:

      The rationale for the experimental design was not clearly explained in the manuscript; more specifically, why the current ER-fMRI study would disentangle integration and segregation processes was not explained. The introduction of "cognitive conflict" into the present study was not well reasoned for a non-expert reader to follow.

      We thank the reviewer for raising these important points. In the revised manuscript, we have further clarified the rationale of the experimental design and the motivation for introducing cognitive conflict.

      First, we clarified that previous neuroimaging studies relied primarily on SOA-based contrasts, which capture the temporal dynamics of attentional orienting but do not directly distinguish the functional processes of integration and segregation. We therefore established the direct comparison between cued and uncued targets in the long SOA as the critical test required by the theory, as these conditions are hypothesized to engage integration and segregation processes, respectively (pages 6-7, “The Challenge of Neural Verification”). Crucially, to successfully implement this comparison, we highlighted the specific methodological advantage of our study: the use of a Genetic Algorithm (GA) to optimize the stimulus sequence. We explained how this design maximizes statistical power specifically for contrast detection (i.e., cued vs. uncued) while maintaining high estimation efficiency, thereby directly overcoming the power constraints that had likely obscured these subtle neural signatures in prior ER-fMRI work (pages 7-8).

      Second, we clarified that the manipulation of cognitive conflict was introduced with the additional aim of examining IOR expression mechanisms, specifically investigating how spatial attention modulates ongoing cognitive processing after target onset, rather than the generation of IOR itself. We have now provided a clearer rationale for embedding a modified Stroop task within the cue-target paradigm, and explained how this design allows us to dissociate semantic and response conflicts while avoiding methodological confounds present in previous studies (page 8).

      The presentation of the results can be further improved, especially the neuroimaging results. For instance, Figure 4 is challenging to interpret. If "deactivation" (or a reduction in activation) is regarded as a neural signature of IOR, this should be clearly stated in the manuscript.

      We thank the reviewer for pointing out the interpretational challenges in Figure 4. To address this, we have revised Figure 4 and provided a clearer and more precise interpretation of these interaction effects in the manuscript.

      First, we have added explicit panel titles to Figure 4 (page 17). Panel A is now clearly labeled as the “Effect of IOR on Semantic Conflict”, while Panel B is labeled as the “Effect of IOR on Response Conflict”. We hope this visual labeling helps readers clearly identify the IOR modulation effects specific to each conflict type.

      Second, we have revised the figure caption to explicitly define the interaction contrasts used to quantify these modulations, providing specific formulas (e.g., [Uncued-RI – Uncued-SI] > [Cued-RI – Cued-SI] for response conflict) to ensure transparency.
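
      As an illustration of how such an interaction contrast translates into weights over condition regressors, consider the sketch below. The condition ordering and the helper `interaction_weights` are hypothetical, and the semantic-conflict contrast is assumed to follow the same pattern as the response-conflict formula with SI and NE substituted; only the response-conflict formula is given explicitly in the caption.

```python
# Hypothetical regressor order: 2 cueing conditions x 3 word types.
CONDITIONS = ["Cued-NE", "Cued-SI", "Cued-RI",
              "Uncued-NE", "Uncued-SI", "Uncued-RI"]


def interaction_weights(high, low):
    """Weights for [Uncued-high - Uncued-low] > [Cued-high - Cued-low]."""
    w = dict.fromkeys(CONDITIONS, 0)
    w[f"Uncued-{high}"] += 1
    w[f"Uncued-{low}"] -= 1
    w[f"Cued-{high}"] -= 1
    w[f"Cued-{low}"] += 1
    return [w[c] for c in CONDITIONS]


# Response conflict: [Uncued-RI - Uncued-SI] > [Cued-RI - Cued-SI]
response_conflict = interaction_weights("RI", "SI")
# Semantic conflict (assumed analog): [Uncued-SI - Uncued-NE] > [Cued-SI - Cued-NE]
semantic_conflict = interaction_weights("SI", "NE")

print(response_conflict)  # [0, 1, -1, 0, -1, 1]
print(semantic_conflict)  # [1, -1, 0, -1, 1, 0]
```

      Both weight vectors sum to zero, as expected for a contrast of differences.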

      Finally, regarding the reviewer’s comment on “deactivation”, we realized that our original figure terminology (e.g., “IOR effect under...”) might have caused confusion by mixing the interaction effect with the IOR effect itself. We have clarified that Figure 4 specifically illustrates the “Effect of IOR on the Semantic Conflict and the Response Conflict” (i.e., interaction effect between IOR and cognitive conflict). To interpret this interaction, we further examined the simple effects of conflict under each cueing condition. Specifically, we analyzed the neural signatures of semantic conflict (SI minus NE) and response conflict (RI minus SI) separately for the cued and uncued targets. Importantly, regarding the nature of the IOR effect itself (as displayed in Figure 3, page 14), it is not simply a uniform deactivation. Instead, by directly comparing the cued and uncued conditions for the neutral words, we observed neural changes in two directions: some specific regions exhibited an increased activation (Cued > Uncued), while others showed a reduced activation (Uncued > Cued). These differential patterns involved distinct brain networks and corresponded to the distinct integration and segregation mechanisms, respectively, rather than a global loss of activation (pages 20-21).

      Reviewer #3 (Public review):

      Summary:

      This study aims to provide the first direct neuroimaging evidence relevant to the integration-segregation theory of exogenous attention - a framework that has shaped behavioral research for more than two decades but has lacked clear neural validation. By combining an inhibition-of-return (IOR) paradigm with a modified Stroop task in an optimized event-related fMRI design, the authors examine how attentional integration and segregation processes are implemented at the neural level and how these processes interact with semantic and response conflicts. The central goal is to map the distinct neural substrates associated with integration and segregation and to clarify how IOR influences conflict processing in the brain.

      Strengths:

      The study is well-motivated, addressing a theoretically important gap in the attention literature by directly testing a long-standing behavioral framework with neuroimaging methods. The experimental approach is creative: integrating IOR with a Stroop manipulation expands the theoretical relevance of the paradigm, and the use of a genetic algorithm-optimized fMRI design ensures high efficiency. Methodologically, the study is sound, with rigorous preprocessing, appropriate modeling, and analyses that converge across multiple contrasts. The results are theoretically coherent, demonstrating plausible dissociations between integration-related activity in the fronto-parietal attention network (FEF, IPS, TPJ, dACC) and segregation-related activity in medial temporal regions (PHG, STG). The findings advance the field by supplying much-needed neural evidence for the integration-segregation framework and by clarifying how IOR modulates conflict processing.

      Weaknesses:

      Some interpretive aspects would benefit from clarification, particularly regarding the dual roles ascribed to dACC activation and the circumstances under which PHG and STG are treated as a single versus separate functional clusters. Reporting conventions are occasionally inconsistent (e.g., statistical formatting, abbreviation definitions), which may hinder readability. More detailed reporting of sample characteristics, exclusion criteria, and data-quality metrics-especially regarding the global-variance threshold-would improve transparency and reproducibility. Finally, some limitations of the study, including potential constraints on generalization, are not explicitly acknowledged and should be articulated to provide a more balanced interpretation.

      We thank the reviewer for the positive and constructive assessment of our study. In response to the concerns raised, we have carefully revised the manuscript and addressed all points in detail below. In brief, we have clarified key interpretation issues in the Discussion section, including the complementary roles of dACC activation and the distinction between statistical clustering and functional interpretation of PHG and STG activations (pages 20-21). We have also improved transparency and reporting throughout the manuscript by providing more detailed sample characteristics, clarifying exclusion criteria and global variance computation, adding illustrative supplementary figures, and standardizing statistical reporting and abbreviations (pages 28, 33). Finally, we have added a concise paragraph on limitations of the study to provide a more balanced interpretation of the findings (pages 26-27). Detailed, point-by-point responses to all specific comments are provided below (see the “Recommendations for the authors” Section).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Specific comments:

      (1) The figure caption contains an unclear sentence (lines 195-196): "The target was a 450-ms colored Chinese character presented 600 ms after the fixation cue onset at the two target locations with equal probabilities." This description is ambiguous and should be revised for clarity.

      Thanks for pointing this out. In the revised manuscript, we have rephrased the figure caption to improve clarity as follows (pages 9-10):

      “Each trial started with a 150-ms non-informative cue presented at one of the two peripheral boxes. After a 150-ms interstimulus interval (ISI), a 150-ms fixation cue was presented at the central fixation box. Following a further 450-ms ISI, the target, a colored Chinese character, appeared at one of the two target locations with equal probabilities and remained on the screen for 450 ms. The trial ended with a variable intertrial interval (ITI) of 850, 1050, 1250, or 1450 ms (with equal probabilities).”
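
      The timing in this caption can be checked with a short sketch, assuming the listed intervals follow one another without gaps. The event names and the helper `onsets` are hypothetical; the durations are those stated in the caption.

```python
# Event durations in ms, in presentation order (taken from the caption).
EVENTS = [
    ("peripheral_cue", 150),
    ("isi_1", 150),
    ("fixation_cue", 150),
    ("isi_2", 450),
    ("target", 450),
]
ITI_OPTIONS_MS = (850, 1050, 1250, 1450)


def onsets(events):
    """Return each event's onset (ms from trial start) and the total duration."""
    t = 0
    out = {}
    for name, duration in events:
        out[name] = t
        t += duration
    return out, t


onset_ms, pre_iti_ms = onsets(EVENTS)

# Cue-to-target stimulus onset asynchrony implied by the sequence.
cue_target_soa = onset_ms["target"] - onset_ms["peripheral_cue"]
# Interval from fixation-cue onset to target onset.
fixation_to_target = onset_ms["target"] - onset_ms["fixation_cue"]
# Possible total trial lengths, one per ITI option.
trial_lengths = [pre_iti_ms + iti for iti in ITI_OPTIONS_MS]

print(cue_target_soa)      # 900
print(fixation_to_target)  # 600
print(trial_lengths)       # [2200, 2400, 2600, 2800]
```

      Under this reading, the target appears 600 ms after fixation-cue onset, which matches the original caption, and 900 ms after the peripheral cue.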

      (2) Please provide a more detailed and clearer description of the SI and RI experimental conditions in the Methods section.

      Thanks for this helpful suggestion. We have revised the Methods section to provide a more detailed description of the SI and RI conditions. Specifically, we have further described the stimulus-response mapping and clarified how the SI and RI conditions are defined based on whether the ink color and the character meaning fell into the same or different response categories under this mapping. In addition, we have added a clarification in the Methods section to make it clearer that the SI trials involved semantic conflict without response conflict, whereas the RI trials involved both semantic and response conflicts (page 29).

      (3) As the data were collected across two research centers, please clarify the number of participants enrolled at each site.

      Thanks for this suggestion. We have now explicitly stated in the Apparatus and Data Acquisition section that 16 participants were enrolled at each site. The revised text reads (page 31):

      “The imaging data were acquired at two research sites following comparable protocols, with equal numbers of participants scanned at each site (n = 16 per site).”

      (4) In the behavioral data analysis, please provide the rationale or justification for the criteria used to exclude trials.

      Thanks for this comment. In the revised manuscript (page 32), we have clarified that reaction times (RTs) shorter than 150 ms were excluded as anticipatory responses, and RTs longer than 1,300 ms were excluded to limit the influence of unusually slow responses. These exclusion criteria are commonly adopted in RT research and were applied consistently across all conditions (Ratcliff, 1993; Whelan, 2008).
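
      A minimal sketch of this trimming rule is given below. The function name `trim_rts` and the example values are hypothetical; the sketch assumes that RTs exactly at 150 ms or 1,300 ms are retained, since only shorter and longer values are described as excluded.

```python
RT_MIN_MS = 150   # faster responses are treated as anticipatory
RT_MAX_MS = 1300  # slower responses are treated as outliers


def trim_rts(rts_ms):
    """Return the retained RTs and the proportion of trials excluded."""
    kept = [rt for rt in rts_ms if RT_MIN_MS <= rt <= RT_MAX_MS]
    excluded = 1 - len(kept) / len(rts_ms) if rts_ms else 0.0
    return kept, excluded


# Made-up RTs for illustration only.
kept, excluded = trim_rts([120, 480, 515, 1300, 1420, 150])
print(kept)  # [480, 515, 1300, 150]
```

      Applying the same bounds to every condition, as stated above, keeps the trimming from biasing any one cell of the design.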

      (5) Given that the behavioral interaction effect was not statistically significant, the conclusion on lines 236-237, "These data clearly isolated the two distinct conflict effects in the Stroop effect, namely the semantic conflict (SI-NE difference) and the response conflict (RI-SI difference)" appears overstated and should be softened accordingly.

      We thank the reviewer for this important comment. We have clarified that our original statement was intended to highlight the successful isolation of conflict types based on the significant main effects of congruency (validating the task design), rather than implying a significant interaction effect. However, we agree that the original phrasing appeared unclear in this context. We have therefore revised the sentence to adopt a more cautious tone in the revised manuscript (page 12):

      “These data demonstrated typical Stroop interference effects (Veen & Carter, 2005) in both the semantic (SI-NE difference) and response conflicts (RI-SI difference).”

      (6) The statement on lines 281-282, "Although the IOR effect showed no effect on either the semantic conflict difference (SI-NE) or the response conflict difference (RI-SI) in the behavioral performance" lacks supporting statistical evidence. Please report the relevant test statistics.

      We appreciate the reviewer’s careful reading and note that the relevant statistical evidence was missing from the original manuscript. This has now been added in the revised version. Specifically, we examined the interactions between cue validity and semantic conflict (SI vs. NE) as well as between cue validity and response conflict (RI vs. SI). Neither interaction was significant (see revised Results for full statistics on page 12), supporting our original statement that cue validity did not modulate either conflict component in behavioral performance.

      (7) The manuscript mentions that a null condition (with no Chinese character presented) was included to increase statistical power for detecting differences across conditions. However, it is unclear how this null condition was actually used in the data analyses. Please clarify the role of the null condition in both the behavioral and neuroimaging analyses.

      Thanks for this comment. We regret that this was not sufficiently clear in the original manuscript. The null condition was included for neuroimaging purposes and was not used in the behavioral analyses, as no response was required in these trials. In the fMRI analyses, null trials served as the implicit baseline and were not modeled as regressors of interest. Task-related activities for all experimental conditions were therefore estimated relative to this null baseline, facilitating estimations of task-related responses in randomized event-related designs (Burock et al., 1998; Friston et al., 1999; Liu, 2004). We have clarified this point in the revised manuscript (page 29).

      References

      Burock, M. A., Buckner, R. L., Woldorff, M. G., Rosen, B. R., & Dale, A. M. (1998). Randomized event-related experimental designs allow for extremely rapid presentation rates using functional MRI. NeuroReport, 9(16), 3735-3739. https://doi.org/10.1097/00001756-199811160-00030

      Friston, K. J., Zarahn, E., Josephs, O., Henson, R. N. A., & Dale, A. M. (1999). Stochastic designs in event-related fMRI. NeuroImage, 10(5), 607-619. https://doi.org/10.1006/nimg.1999.0498

      Liu, T. T. (2004). Efficiency, power, and entropy in event-related fMRI with multiple trial types: Part II: design of experiments. NeuroImage, 21(1), 401-413. https://doi.org/10.1016/j.neuroimage.2003.09.031

      Ratcliff, R. (1993). Methods for dealing with reaction time outliers. Psychological Bulletin, 114(3), 510-532. https://doi.org/10.1037/0033-2909.114.3.510

      Whelan, R. (2008). Effective analysis of reaction time data. The Psychological Record, 58(3), 475-482. https://doi.org/10.1007/BF03395630

      Reviewer #2 (Recommendations for the authors):

      (1) The paper is a bit too lengthy, with a lot of information that is hard for non-experts to grasp.

      We thank the reviewer for this comment. We realized that the Introduction was the most challenging section for general readers. In the revision, we refined the text in the Introduction for a better structure and more reader-friendly wording to improve readability. In addition, following the reviewer’s suggestion (Recommendation 4 below), we have added short subsection titles to the Introduction, Results, and Discussion sections to better organize the content and highlight the main ideas. We hope these revisions make the manuscript more accessible and easier for a broader audience to follow.

      (2) Please double-check the stats, as some of the results presented in the main text do not align well with the figures. Take Figure 2 as an example.

      We appreciate the reviewer’s concern and have double-checked all statistics. All the results are consistent between the figures and the main text. Taking Figure 2 as an example (page 12), the perceived discrepancy probably arose because the descriptive values reported in the main text are marginal means for the main effects (i.e., the overall average of one factor, collapsed over the other factor), whereas Figure 2 shows the mean for each Congruency × Cue Validity condition (i.e., the cell means used for simple effects).
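
      The distinction between marginal and cell means can be made concrete with a small sketch. The RT values below are invented for illustration and are not the study's data; the helper `marginal_means` is hypothetical and assumes equal trial counts per cell, so that a marginal mean is the simple average of the relevant cell means.

```python
from statistics import mean

# Made-up cell means (ms) for a 3 (Congruency) x 2 (Cue Validity) design.
cell_means = {
    ("NE", "cued"): 600.0, ("NE", "uncued"): 580.0,
    ("SI", "cued"): 640.0, ("SI", "uncued"): 620.0,
    ("RI", "cued"): 680.0, ("RI", "uncued"): 650.0,
}


def marginal_means(cells, factor_index):
    """Average the cell means over the other factor (equal cell sizes assumed)."""
    levels = {}
    for key, value in cells.items():
        levels.setdefault(key[factor_index], []).append(value)
    return {level: mean(values) for level, values in levels.items()}


congruency = marginal_means(cell_means, 0)
cue_validity = marginal_means(cell_means, 1)

print(congruency)            # {'NE': 590.0, 'SI': 630.0, 'RI': 665.0}
print(cue_validity["cued"])  # 640.0
```

      In this toy example, no marginal value (e.g., 590 ms for NE) coincides with any plotted cell mean (600 or 580 ms), which is the kind of apparent mismatch described above.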

      (3) The reasoning that the neuroimaging findings support the dissociation between integration and segregation needs to be improved.

      We thank the reviewer for this important comment. In the revised Discussion (pages 19-21), we have strengthened the reasoning linking our neuroimaging findings to the dissociation between the integration and segregation processes. Specifically, we make it clear how the distinct activation patterns observed for the cued and uncued targets map onto the different functional demands proposed by the integration-segregation theory. The cued targets were theorized to recruit the frontoparietal attentional control networks, consistent with the re-engagement of an existing object file (integration). On the other hand, the uncued targets should engage the medial temporal and temporal association regions responsible for novelty detection and episodic encoding, consistent with the creation of a new object file (segregation). We hope the reviewer finds that the revision offers a clearer explanation of how the observed neural patterns are consistent with a dissociation between the integration and segregation processes.

      (4) Please use short section titles to organize the introduction, results, and discussion sections. For instance, the discussion section is a long chunk of text (almost 9 pages) and is pretty dense, making it hard to quickly grasp the ideas the authors want to convey.

      Thanks for this helpful suggestion. Following the reviewer’s recommendation, we have now added short subsection titles to the Introduction and Discussion sections to improve structure and readability. For the Results section, we have maintained and further refined the existing subheadings to ensure consistent organization.

      Reviewer #3 (Recommendations for the authors):

      I found this manuscript to be a timely and substantive contribution to the study of attention and cognitive neuroscience. To my knowledge, it provides the first direct neuroimaging evidence relevant to the integration-segregation theory of exogenous attention, a framework that has been influential in behavioral work for more than two decades but has lacked clear neural support. The study is conceptually well motivated, methodologically solid, and generally clearly reported. The findings differentiate neural substrates associated with integration and segregation processes and further show how inhibition of return (IOR) interacts with semantic and response conflicts at the neural level.

      The manuscript is well organized, the writing is mostly clear, and the progression from theory to hypotheses and methods is easy to follow. The combination of IOR with a modified Stroop paradigm is a clever choice that extends the theoretical scope of exogenous attention research. The use of an optimized event-related fMRI design based on a genetic algorithm is also a strength and reflects careful attention to design efficiency.

      The main results are internally consistent and theoretically meaningful. Integration-related activity in the fronto-parietal attention network (including FEF, IPS, TPJ, and dACC) and segregation-related activity in medial temporal areas (PHG and STG) fit well with the proposed framework, and the pattern of activations is coherent across analyses.

      Overall, I think this is a carefully executed study that offers much-needed neural evidence bearing on the integration-segregation theory of exogenous attention. I would recommend the following revisions.

      Suggestions:

      (1) In the Discussion (pp. ~17-18), dACC activation is described both in terms of general cognitive control demands and as reflecting a possible inhibitory bias toward the cued direction. It would help the reader if you could briefly indicate whether you see these as complementary (e.g., dual roles within the same region) or as more competing interpretations.

      We thank the reviewer for this helpful comment. We have clarified in the revised manuscript that the dACC’s role in general cognitive control and its role in biasing against the cued direction are complementary rather than competing interpretations. Specifically, we described how the dACC is involved in both the cognitive control required for target integration and the inhibitory bias toward the cued location, thereby highlighting its dual roles within the same region. The revised section reads as follows (page 20):

      “Furthermore, the observed increase in the left dACC activity under the cued relative to the uncued condition likely reflected the engagement of cognitive control mechanisms (Botvinick et al., 2004; Chung et al., 2024; Mayer et al., 2012; Veen & Carter, 2005), particularly in resolving the conflict between the task-driven requirement of target integration and the reduced accessibility of the cue-initiated representation. In this context, the heightened activation of dACC may also reflect its role in fulfilling the inhibitory bias toward the cued location (Mayer et al., 2004) and discouraging inefficient integration attempts at a location marked as less relevant.”

      (2) In the Discussion, you could consider adding a short paragraph explicitly acknowledging a few limitations and how they might constrain generalization of the findings. A concise reflection of this kind would give a more balanced picture without undermining the main conclusions.

      We appreciate this helpful suggestion. In the revised manuscript, we have added a concise paragraph explicitly addressing a key limitation of the present study (pages 26-27). Specifically, we acknowledge that the absence of behavioral interactions alongside clear neural effects requires cautious interpretation. We discussed how this dissociation may reflect differences in measurement sensitivity between behavioral and neural indices, consistent with prior findings (Chen et al., 2006; Wilkinson & Halligan, 2004). We also note that the use of a GA-optimized sequence, while improving statistical efficiency, may have introduced unintended regularities in event order that could influence behavioral strategies.

      (3) Since the dataset is hosted on GitHub, adding a short note in the Data Availability section about whether the repository will also include analysis scripts or future replication data would further enhance transparency and long-term usefulness.

      Thanks for this helpful suggestion. We have revised the Data Availability section (page 35) to clarify that the GitHub repository contains the processed data used in the final analyses. Analysis scripts and additional materials for replication are available from the authors upon reasonable request.

      (4) In the Results section, the formatting of statistics is not fully consistent. For example, some reports use spaces around symbols (e.g., "η<sup>2</sup> = 0.301") whereas others do not (e.g., "p< .001"). It would be good to standardize this (e.g., "p < .001", "η<sup>2</sup> = .30") across the manuscript.

      Done as suggested.

      (5) A few abbreviations appear before they are defined; for instance, SPC (superior parietal cortex) shows up in the Results (response conflict section) before the full name is given. Ensuring that each abbreviation is defined at first mention would help readers who may be less familiar with all of the regional acronyms.

      Thanks for this comment. We have conducted a thorough check of the manuscript and ensured that all abbreviations are defined upon their first occurrence.

      (6) The text sometimes refers to "PHG/STG" as a combined cluster, while at other points, PHG and STG are described separately. It would be useful to clarify under what circumstances they are treated as a single functional cluster versus distinct regions of interest, and to keep the nomenclature as consistent as possible between the main text and the tables.

      Thanks for raising this point. In the revised manuscript, we have clarified this issue by distinguishing between statistical clustering and functional interpretation. In the whole brain analysis, activations in the left hemisphere formed a single continuous cluster spanning the PHG and STG; therefore, this cluster is labeled as “PHG/STG” in Table 1. We have explicitly noted the continuous nature of this cluster in the Results section (page 15) to ensure clarity:

      “Notably, in the left hemisphere, these activations formed a continuous cluster spanning both regions (labeled as PHG/STG in Table 1).”

      (7) It would be helpful to provide a bit more detail about the sample characteristics (e.g., age range, handedness, and inclusion/exclusion criteria) and to state explicitly how many participants, if any, were excluded from the analyses and for what reasons. This would help readers better evaluate data quality and generalizability.

      Thanks for this helpful suggestion. We have revised the Participants section (page 28) to provide the full details regarding our sample:

      “32 healthy participants with normal or corrected-to-normal vision and normal color vision were recruited. All participants were right-handed and reported no history of neurological or psychiatric disorders. Data from three participants were excluded due to excessive head movements and high global variances (see fMRI Data Analysis), leaving 29 participants for analysis (18 female, 11 male; aged 18-30 years, M = 22.69, SD = 2.58).”

      Furthermore, we have provided a clearer description of the exclusion criteria in the Data Analysis section (pages 33-34) as follows:

      “Runs with motions exceeding one voxel length in any direction were excluded (resulting in the exclusion of two runs) …Runs with global variance equal to or over 0.1% were excluded, resulting in the exclusion of eight runs (see Supplementary Information for details). Ultimately, three participants were excluded because neither run met the quality criteria. All remaining participants retained both runs, except for three individuals who each contributed only one valid run.”

      (8) Given that participants were excluded based on global variance exceeding 0.1%, it would be very informative to include, in the Supplementary Materials, an illustrative figure showing the signal time series (or global signal variance over time) for excluded participants.

      We appreciate this valuable suggestion. In the revised Supplementary Materials, we have included a new figure (Figure S2) that plots the global signal time series for the excluded runs to illustrate the signal patterns that led to their exclusion based on global variance.

      (9) Relatedly, it may help to more explicitly describe how global variance was computed (e.g., over which time window, after which preprocessing steps, and whether it was calculated on whole-brain signal or within specific masks). A concise clarification would make the exclusion criterion easier to interpret.

      Thanks for this helpful suggestion. We have now clarified in the manuscript how global variance was computed (page 33) and have also provided a more detailed description of the computation procedure in the Supplementary Materials (page 4). Specifically, after the standard preprocessing (slice timing correction, 3D motion correction, spatial smoothing, linear trend removal, and high-pass temporal filtering), the global signal was computed for each run as the mean signal across voxels with intensity values greater than 100 in each volume. Global variance was then quantified as the temporal variance of this run-wise global-signal time course across all volumes, providing a quality-control index of signal stability.
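
      The computation described above can be sketched as follows. This is a minimal illustration, not the analysis code used for the study: the function names, the `(x, y, z, time)` array layout, and the use of the population variance (`ddof=0`) are assumptions. How the resulting value is scaled against the 0.1% exclusion criterion is not stated here, so that comparison is deliberately left out.

```python
import numpy as np


def global_signal(run, intensity_threshold=100.0):
    """Mean signal per volume over voxels brighter than the threshold.

    run: 4D array shaped (x, y, z, time), assumed to be already preprocessed.
    The array layout and function name are illustrative assumptions.
    """
    n_volumes = run.shape[-1]
    voxels = run.reshape(-1, n_volumes).astype(float)
    mask = voxels > intensity_threshold  # evaluated separately in every volume
    counts = mask.sum(axis=0)
    sums = np.where(mask, voxels, 0.0).sum(axis=0)
    # Volumes without any supra-threshold voxel yield NaN instead of an error.
    return np.divide(sums, counts, out=np.full(n_volumes, np.nan), where=counts > 0)


def global_variance(run, intensity_threshold=100.0):
    """Temporal variance of the run-wise global-signal time course.

    Uses the population variance (ddof=0); this choice is an assumption.
    """
    return float(np.var(global_signal(run, intensity_threshold)))
```

      Applying the threshold volume by volume follows the wording "in each volume"; a fixed mask derived from a single reference volume would be an equally plausible reading and would require only a small change to `mask`.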

      (10) Rather than only reporting a single overall exclusion rate (e.g., 5.52% of total trials), it would be informative to break this down by source, reporting separately the proportion of trials excluded as RT outliers and the proportion excluded due to response errors. This would further improve transparency regarding the behavioral preprocessing pipeline.

      Thanks for this helpful suggestion. We have now broken down the overall exclusion rate by source in the revised manuscript. Specifically, we reported that 4.29% of trials were excluded due to incorrect responses, and 1.24% of trials were excluded as RT outliers (page 32).
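
      A source-wise breakdown of this kind can be illustrated with a minimal sketch. It assumes that incorrect responses are removed first and that RT outliers are then identified only among the remaining correct trials, so that no trial is counted twice; the order is not stated here. The fixed bounds `rt_min` and `rt_max` are hypothetical stand-ins, since the actual outlier criterion is not given in this response.

```python
def exclusion_breakdown(correct, rts, rt_min, rt_max):
    """Percentage of trials excluded per source.

    correct: sequence of booleans, one per trial.
    rts: sequence of response times, one per trial.
    rt_min, rt_max: hypothetical fixed bounds standing in for the
        (unspecified) RT outlier criterion.
    """
    n_trials = len(correct)
    n_errors = sum(1 for ok in correct if not ok)
    # Outliers are counted among correct trials only, so the sources do not overlap.
    n_outliers = sum(
        1 for ok, rt in zip(correct, rts) if ok and not (rt_min <= rt <= rt_max)
    )

    def to_pct(n):
        return 100.0 * n / n_trials

    return {
        "error_pct": to_pct(n_errors),
        "rt_outlier_pct": to_pct(n_outliers),
        "total_pct": to_pct(n_errors + n_outliers),
    }
```

      Because each percentage is rounded separately when reported, the two components need not sum exactly to the rounded overall rate.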

      References

      Botvinick, M. M., Cohen, J. D., & Carter, C. S. (2004). Conflict monitoring and anterior cingulate cortex: an update. Trends in Cognitive Sciences, 8(12), 539-546. https://doi.org/10.1016/j.tics.2004.10.003

      Chen, Q., Wei, P., & Zhou, X. (2006). Distinct neural correlates for resolving Stroop conflict at inhibited and noninhibited locations in inhibition of return. Journal of Cognitive Neuroscience, 18(11), 1937-1946. https://doi.org/10.1162/jocn.2006.18.11.1937

      Chung, R. S., Cavaleri, J., Sundaram, S., Gilbert, Z. D., Del Campo-Vera, R. M., Leonor, A., Tang, A. M., Chen, K.-H., Sebastian, R., Shao, A., Kammen, A., Tabarsi, E., Gogia, A. S., Mason, X., Heck, C., Liu, C. Y., Kellis, S. S., & Lee, B. (2024). Understanding the human conflict processing network: A review of the literature on direct neural recordings during performance of a modified stroop task. Neuroscience Research, 206, 1-19. https://doi.org/10.1016/j.neures.2024.03.006

      Mayer, A. R., Seidenberg, M., Dorflinger, J. M., & Rao, S. M. (2004). An event-related fMRI study of exogenous orienting: supporting evidence for the cortical basis of inhibition of return? Journal of Cognitive Neuroscience, 16(7), 1262-1271. https://doi.org/10.1162/0898929041920531

      Mayer, A. R., Teshiba, T. M., Franco, A. R., Ling, J., Shane, M. S., Stephen, J. M., & Jung, R. E. (2012). Modeling conflict and error in the medial frontal cortex. Human Brain Mapping, 33(12), 2843-2855. https://doi.org/10.1002/hbm.21405

      Veen, V. V., & Carter, C. S. (2005). Separating semantic conflict and response conflict in the Stroop task: A functional MRI study. NeuroImage, 27(3), 497-504. https://doi.org/10.1016/j.neuroimage.2005.04.042

      Wilkinson, D., & Halligan, P. (2004). The relevance of behavioural measures for functional imaging studies of cognition. Nature Reviews Neuroscience, 5(1), 67-73. https://doi.org/10.1038/nrn1302

    1. Author response:

      The following is the authors’ response to the original reviews

      Thank you very much for the positive and constructive feedback on our manuscript. We have revised the manuscript accordingly, added a substantial number of experiments, and extended the data.

      The reviewers' questions focused mostly on mechanistic insight into organoid formation, touching on the following aspects of lens organoid formation presented in the manuscript:

      - Cellular arrangements/re-arrangements during the process of lens formation including potential contribution of differential adhesion-mediated cell sorting to the cellular arrangement in the organoid and characterization of individual contributions of lens- and retina- committed progenitors to this process.

      - Activity of BMP and FGF signaling pathways during organoid formation, namely identification of the tissue responding to the signaling within forming organoids.

      - Contribution of externally supplemented Matrigel to the differentiation process and cellular arrangements in ocular organoids. 

      To address these points in detail, we included additional experiments that are now presented in the revised version of the manuscript, namely in revised Figure 2-figure supplement 1 (addressing contribution of Matrigel); new Figure 4-figure supplement 1/Video S5 (addressing contribution of differential adhesion-mediated cell sorting); revised Figure 4/Video S6/Video S7 (addressing contribution of lens-committed progenitors); revised Figure 6 (addressing BMP and FGF signaling pathway activities).

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      The authors focused on medaka retinal organoids to investigate the mechanism underlying eye cup morphogenesis. The authors succeeded in inducing lens formation in fish retinal organoids using 3D suspension culture with minimal growth factor-containing media containing HEPES. At day 1, Rx3:H2B-GFP+ cells appear in the surface region of organoids. At day 1.5, Prox1+ cells appear in the interface area between the organoid surface and the core of the central cell mass, which later develops into a spherical lens. Thus, Prox1+ cells cover the surface of the internal lens cell core. At day 2, foxe3:GFP+ cells appear in the Prox1+ area, where the early lens fiber marker, LFC, starts to be expressed. In addition, foxe3:GFP+ cells show EdU incorporation, indicating that foxe3:GFP+ cells have lens epithelial cell character. At day 4, cry:EGFP+ cells differentiate inside the spherical lens core, whose surface area consists of LFC+ and Prox1+ cells. Furthermore, at day 4, the lens core moves towards the surface of retinal organoids to form an eye-cup-like structure, although this "inside-out" mechanism of morphogenesis differs from the in vivo cellular "outside-in" mechanism of eye cup formation. From these data, the authors conclude that optic cup formation, especially the positioning of the lens, is established in retinal organoids through a mechanism different from that of in vivo morphogenesis.

      Overall, the manuscript presentation is nice. However, some points concerning the underlying mechanism remain obscure. My comments are shown below.

      Major comments

      (1) At the initial stage of retinal organoid morphogenesis, a spherical lens is centrally positioned inside the retinal organoids, with the central lens core covered by an outer cell sheet of retinal precursor cells. I wonder if the formation of this structure may be understood by differential cell adhesive activity or mechanical tension between lens core cells and the retinal cell sheet, just like the previous study done by the Heisenberg lab on the spatial patterning of endoderm, mesoderm and ectoderm (Nat. Cell Biol. 10, 429 - 436 (2008)). Lens core cells may be integrated inside the retinal cell mass by cell sorting through the direct interaction between retinal cells and lens cells, or between lens cells and the culture media. After day 1, it is also possible that the lens core moves towards the surface of retinal organoids if the adhesive/tensile force states of lens core cells are changed by secretion of extracellular matrix. I wonder if the authors could measure physical properties, such as adhesive activity and solidness, of retinal precursor cells and lens core cells. If retinal organoids at day 1 are dissociated and cultured again, do they show the same patterning of an internal lens core covered by the outer retinal cell sheet?

      The question of whether differential adhesive activity is involved in cell sorting and lens formation is indeed very intriguing.

      To address this point, we included additional experiments in the revised manuscript. As proposed by the reviewer, we performed dissociation and re-aggregation experiments on day 1 organoids, at the time point when retinal cell fate is already established and the first cells with early lens fate (Foxe3::GFP positive) start appearing (see new Figure 4-figure supplement 1).

      After dissociation, we followed Foxe3::GFP cells over time and observed that initially evenly dispersed GFP<sup>+</sup> lens-committed cells gradually sort and establish contact with other GFP<sup>+</sup> cells, ultimately resulting in the formation of a central GFP<sup>+</sup> sphere within a retinal neuroepithelium (AcTub<sup>+</sup>) localized on the surface of the organoid (see new Figure 4-figure supplement 1e and new Video S5). These data show that differential adhesive properties of lens/retinal precursor cells can enable the formation of a spherical lens in the center of the organoid. This is now clearly stated in the revised version of the manuscript.

      (2) The optic cup is evaginated from the lateral wall of the neuroepithelium of the diencephalon. In zebrafish, cell movement occurs from the pigment epithelium to the neural retina during eye morphogenesis in an FGF-dependent manner. How is medaka optic cup morphogenesis coordinated? I also wonder if the authors could track cell migration during optic cup morphogenesis to reveal how cell migration and cell division are regulated in the lens of the medaka retinal organoids. It would also be interesting to examine how retinal cell movement is coordinated in medaka retinal organoids.

      Looking into the detail of how the optic cup-like tissue arrangement of ocular organoids is achieved at the cellular level is of course interesting. Our previous study showed that optic vesicles of medaka retinal organoids do not form optic cups (for details please see Zilova et al., 2021, eLife). We provide evidence that the formation of the cup-like structure of the ocular organoids presented here is mediated by the following processes: establishment of retina and lens domains at specific regions of the organoid, with retina on the surface and lens in the center (see Figure 3-figure supplement 1d and Figure 3e, and Figure 4). Further, the dislocation of the centrally formed lens towards the organoid periphery results in the opening of the retina layer, moving the lens to the periphery while retinal cells stay static. We propose that the “cup-like” shape is acquired by an extrusion-like process of the lens from the center of the organoid.

      To address the cellular mechanisms involved in this process, we included additional experiments and followed the movements of retinal and lens cells (see new Figure 4c and 4d, new Videos S6, S7 and S8). Retinal cells (tracked as nuclei of the Rx3::H2B-GFP transgenic line) established in the periphery display repeated short distance movements restricted to the retinal epithelium. These movements are characteristic for interkinetic nuclear migration as found in the developing retina. In contrast, Foxe3::GFP lens progenitor cells performed long distance movements from the center to the periphery of the organoid. This movement was accompanied by profound cell shape changes of lens progenitor cells, suggesting an active movement of lens cells to the organoid periphery. These movements are shown in new/extended figures and in new supplementary videos (new Figure 4c and 4d, new Videos S6, S7 and S8) in the revised version of the manuscript.

      (3) The authors showed that blockade of FGF signaling affects lens fiber differentiation in day 1-2, whereas lens formation seems to be intact in the presence of FGF receptor inhibitor in day 0-1. I suggest that the authors examine which tissue is a target of FGF signaling in retinal organoids, using markers such as pea3, which is a downstream target of the ERK branch of FGF signaling. Since FGF signaling promotes cell proliferation, is the lens core size normal in SU5402-treated organoids from day 0 to day 1?

      Assessing the activity of FGF signaling (cross-reference to Reviewer #3) in the organoids is an important point that we have taken care of and included in the revised manuscript.

      To address this point, we assessed which tissue/part of the organoid is responding to FGF signaling. To do so, we analyzed the presence of phosphorylated ERK (pERK1/2) as a target of FGF signaling in ocular organoids from day 1 to day 2. At day 1, only low levels of FGF signaling activity were detectable in presumptive retinal and/or lens tissue (see revised Figure 6b). Only half a day later (at day 1.5), a significant increase in FGF activity was observed specifically in the central region of the organoids (lens progenitor domain), prior to the onset of differentiation of lens fiber cells. This, together with the inability of lens progenitor cells to differentiate into lens fiber cells in the presence of the FGF inhibitor SU5402 provided during this critical period (day 1 to day 2), demonstrates that FGF signaling activity localized in the lens progenitor cells is required for lens fiber differentiation.

      By day 2, FGF activity was detected in both lens and retinal tissue of the organoid. Similar patterns of FGF activity were observed in embryos at 2 days post fertilization (see revised Figure 6b).

      Treatment with the FGF signaling inhibitor SU5402 from day 0 to day 1 had no impact on the core size of the organoids, the dimensions of which were fully comparable to the control (please see Figure 6d).

      (4) Fig. 3f and 3g indicate that there is some cell population located between foxe3:GFP+ cells and rx2:H2B-RFP+ cells. What cell type occupies the interface area between foxe3:GFP+ cells and rx2:H2B-RFP+ cells?

      That is certainly an interesting question. We are aware of this population of cells. We currently do not have data that clarify the fate of those cells with the required certainty. Rather than speculating, we are currently following up on that question by scRNA sequencing; however, we see that as beyond the scope of the current manuscript.

      (5) Fig. 5e indicates the depth of Rx3 expression at day 1. Is the depth the thickness of Rx3 expressing cell sheet, which covers the central lens core in the organoids? If so, I wonder if total cell number of Rx3 expressing cell sheet may be different in each seeded-cell number, because thickness is the same across each seeded-cell number, but the surface area size may be different depending on underneath the lens core size. Please clarify this point.

      The referee is right, figure 5e indicates the thickness of the cell sheet expressing Rx3 positioned at the surface of the organoid. Indeed, the number of Rx3-expressing cells (and lens cells) scales with the size of the organoid as stated in the submitted manuscript. We have taken care to remove ambiguities related to that point in the revised version of the manuscript.

      (6) Noggin application inhibits lens formation at day 0-1. BMP signaling regulates formation of lens placode and olfactory placode at the early stage of development. It is interesting to examine whether Noggin-treated organoid expands olfactory placode area. Please check forebrain territory markers.

      What tissue differentiates at the expense of the lens in BMP inhibitor-treated organoids is of course an intriguing question.

      To address this point, we labeled Noggin-treated organoids at day 2 and day 3 with forebrain and olfactory placode markers. We could identify an increase in the domains expressing Lhx2, HuC/D and Otx2 in Noggin-treated organoids, showing a shift toward preferential differentiation of neurons of anterior forebrain identity (see attached figure for reviewer). However, the available markers Lhx2, HuC/D and Otx2 found in the olfactory placode are also expressed in other neuronal cell types of the anterior forebrain. While the speculation is tempting, the shift in expression does not allow us to conclude that the olfactory placode is expanded.

      Author response image 1.

      Expression of forebrain and olfactory placode markers.

      I have no minor comments

      Referees cross-commenting

      I agree that all reviewers have similar suggestions, which are reasonable and provided the same estimated time for revision.

      Reviewer #1 (Significance):

      Strength:

      This study is unique. The authors examined eye cup morphogenesis using fish retinal organoids. The eye cup normally consists of the lens, the neural retina, pigment epithelium and optic stalk. However, retinal organoids seem to be simple and consist of two cell types, lens and retina. Interestingly, a similar optic cup-like structure is achieved in both cases; however, the underlying mechanism is different. It is interesting to investigate how eye morphogenesis is regulated in retinal organoids, under the unconstrained embryo-free environment.

      Limitation:

      Description is OK, but analysis is not much profound. It is necessary to apply a bit more molecular and cellular level analysis, such as tracking of cell movement and visualization of FGF signaling in organoid tissues.

      Advancement:

      The current study is descriptive. Need some conceptual advance, which impact cell biology field or medical science.

      Audience:

      The target audience of the current study is still within the ophthalmology and neuroscience communities, maybe translational/clinical rather than basic biology. To reach beyond specific fields, the authors need to formulate a general principle for cell and developmental biology.

      Reviewer #2 (Evidence, reproducibility and clarity):

      In this study from Stahl et al., the authors demonstrate that medaka pluripotent embryonic cells can self-organise into eye organoids containing both retina and lens tissues. While these organoids can self-organize into an eye structure that resembles the vertebrate eye, they are built from a fundamentally different morphogenetic process - an "inside-out" mechanism where the lens forms centrally and moves outward, rather than the normal "outside-in" embryonic process. This is a very interesting discovery, both for our understanding of developmental biology and the potential for tissue engineering applications. The study would benefit from some additional experiments and a few clarifications.

      The authors suggest that the lens cells are the ones that move from the central to a more superficial position. Is this an active movement of lens cells or just the passive consequence of the retina cells acquiring a cup shape? Are the retina cells migrating behind the lens or the lens cells pushing outwards? High-resolution imaging of organoid cup formation, tracking retina cells in combination with membrane labeling of all cells would help elucidate the morphogenetic processes occurring in the organoids. Membrane labeling would also be useful as Prox1 positive lens cells appear elongated in embryos while in the organoids, cell shapes seem less organised, less compact and not elongated (for example as shown in Fig 3f,g).

      Looking into the detail of how the optic cup-like arrangement of ocular organoids is achieved on the cellular level is indeed highly interesting. In the revised manuscript we now provide evidence that the formation of cup-like structure of the ocular organoids presented here is mediated by the following processes: establishment of retina and lens domains at distinct regions of the organoid – retina on the surface and lens in the center (see Figure 3-figure supplement 1d and Figure 3e, and Figure 4). Further, the dislocation of the centrally formed lens towards the organoid periphery results in the opening of the retina layer, moving the lens to the periphery while retinal cells stay static. We propose that the cup-like shape is acquired by an extrusion process of the lens from the center of the organoid.

      To address cellular mechanisms involved in this process, we included additional experiments and followed the movements of retinal and lens cells (see new Figure 4c and 4e, new Videos S6, S7 and S8).

      Retinal cells (tracked as nuclei of the Rx3::H2B-GFP transgenic line) display repeated short distance movements within the retinal epithelium. These movements are characteristic for interkinetic nuclear migration as found in the developing retina.

      In contrast, Foxe3::GFP lens progenitor cells performed long distance movements from the center to the periphery of the organoid. This movement was accompanied by profound cell shape changes of lens progenitor cells, suggesting an active movement of lens cells to the organoid periphery.

      These movements are shown in new/extended figures and in new supplementary videos (new Figure 4c and 4e, new Videos S6, S7 and S8) in the revised version of the manuscript.

      The organoids could be a useful tool to address how cell fate is linked to cell shape acquisition. In the forming organoids, retinal tissue initially forms on the outside, while non-retinal tissue is located in the centre; this central tissue later expresses lens markers. Do the authors have any insights into why fate acquisition occurs in this pattern? Is there a difference in proliferation rates between the centrally located cells and the external ones? Could it be that highly proliferative cells give rise to neural retina (NR), while lower proliferating cells become lens?

      We agree with the reviewer that this is a highly interesting question, and in the revised manuscript we followed the advice and dedicated a part of the discussion to this topic. We believe that the arrangement is due to the induction of central lens fates by signals emanating from the retinal epithelium and discuss the role of the diffusion limit and the potential contribution of BMP and FGF signaling to this arrangement. Additional experiments addressing the target tissues of FGF and BMP signaling in the organoid have been provided in response to Reviewer #1. Interestingly, interfering with FGF signaling, which is essential for lens fiber cell differentiation, did not affect lens size, arguing against an immediate proliferative effect. Although an analysis of the respective proliferation rates at the surface or in the central region of the organoid might show some differences, we do not have any indication that the proliferation rate itself would be instructive for, or superior to, the cell fate decisions.

      What happens in organoids that do not form lenses? Do these organoids still generate foxe3 positive cells that fail to develop into a proper lens structure? And in the absence of lens formation, does the retina still acquire a cup shape?

      Lens formation is primarily dependent on the acquisition/specification of Foxe3-expressing lens placode progenitors. In the absence of Foxe3-expression, a lens does not develop. Once Foxe3-expressing progenitors are established, a lens is formed in unperturbed conditions (measured by the presence of expression of crystallin proteins). Organoids that do not have a lens, do not contain Foxe3-expressing cells.

      In the absence of a lens, the organoid is composed of retinal neuroepithelium that does not form an optic cup-like shape (for details of such phenotypes please see Zilova et al., 2021, eLife). We took care to state that clearly in the revised manuscript.

      The authors suggest that lens formation occurs even in the absence of Matrigel. Is the process slower in these conditions? Are the resulting organoids smaller? While there are indeed some LFC-expressing cells by day 2, these cells are not very well organised and the pattern of expression seems dotty. Moreover, LFC staining seems to localise posterior to the LFC-negative, lens-like structure (e.g. Fig. S1, 3 o'clock). How do these organoids develop beyond day 4? Do they maintain their structural integrity at later stages?

      The role of HEPES in promoting organoid formation is intriguing. Do the authors have any insights into why it is important in this context? Have the authors tried other culture conditions and does culture condition influence the morphogenetic pathways occurring within the organoids?

      We thank the reviewer for pointing this out. In the revised manuscript we made sure to be sufficiently clear in the wording and description of our observation. Indeed, Matrigel is not required for the acquisition of lens fate, which can be demonstrated by the expression of lens-specific markers. However, the presence of Matrigel has a profound impact on structural aspects of organoid formation. Matrigel is essential for the organization of retinal-committed cells into a retinal epithelium (Zilova et al., 2021, eLife). The absence of a structured retinal epithelium indeed negatively impacts the cellular organization and the overall lens structure.

      To clarify the contribution of the Matrigel to the organoid organization, we performed additional experiments (see revised Figure 2-figure supplement 1c-f). As mentioned above, the absence of Matrigel impacts on the organization and thickness of retinal neuroepithelium (Rx2<sup>+</sup>, Figure 2-figure supplement 1c). However, measurement of the lens in organoids at day 2 and day 5 showed that size of the lens is not impacted upon in the absence of Matrigel (Figure 3-figure supplement 1d-e). Additionally, taking advantage of the Foxe3::GFP lens reporter line, we measured the onset of lens-specific gene expression in organoids with and without Matrigel. In both conditions, with and without Matrigel supplementation, Foxe3::GFP expression was initiated at 25 hours post aggregation (see revised Figure 4b).

      The role of the HEPES in lens formation is indeed very intriguing and currently under investigation. HEPES is mainly used to regulate the pH of the culture media which on its own might have an impact on multiple cellular processes. It will require a significant time investment to address the potential HEPES triggered molecular mechanisms impacting on lens formation (cross reference with Reviewer #3), which goes beyond the scope of the current manuscript.

      Referees cross-commenting

      Pleased to see that all the other reviewers are positive about the study and raise similar concerns and comments

      Reviewer #2 (Significance):

      This is a very interesting paper, and it will be important to determine whether this alternative morphogenetic process is specific to medaka or if similar developmental routes can be recapitulated in organoid cultures from other vertebrate species.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary:

      The manuscript by Stahl and colleagues reports an approach to generate ocular organoids composed of retinal and lens structures, derived from Medaka blastula cells. The authors present a comprehensive characterisation of the timeline followed by lens and retinal progenitors, showing these have distinct origins, and that they recapitulate the expression of differentiation markers found in vivo. Despite this molecular recapitulation, morphogenesis is strikingly different, with lens progenitors arising at the centre of the organoid, and subsequently translocating to the outside.

      Comments:

      The manuscript presents a beautiful set of high quality images showing expression of lens differentiation markers over time in the organoids. The set of experiments is very robust, with high numbers of organoids analysed and reproducible data. The mechanism by which lens specification is promoted in these organoids is, however, poorly analysed, and the reader does not get a clear understanding of what is different in these experiments, as compared to previous attempts, to support lens differentiation. There is a mention of HEPES supplementation, but no further analysis is provided, and the fact that the process is independent of ECM contradicts, as the authors point out, previous reports. The manuscript would benefit from a more detailed analysis of the mechanisms that lead to lens differentiation in this setting.

      We followed the reviewer’s advice and have included a systematic analysis of the contribution of ECM (Matrigel) to the process of lens formation. In the revised manuscript we made sure to be sufficiently clear in the wording and description of our observation. Indeed, Matrigel is not required for the acquisition of lens fate, which can be demonstrated by the expression of lens-specific markers. However, the presence of Matrigel has a profound impact on structural aspects of organoid formation. Matrigel is essential for the organization of retinal-committed cells into a retinal epithelium (Zilova et al., 2021, eLife). The absence of a structured retinal epithelium in turn negatively impacts the cellular organization and the overall lens structure.

      To clarify the contribution of Matrigel to the organoid organization, we performed additional experiments (see revised Figure 2-figure supplement 1c-f). As mentioned above, the absence of Matrigel impacts the organization and thickness of the retinal neuroepithelium (Rx2<sup>+</sup>, Figure 2-figure supplement 1c). However, measurement of the lens in organoids at day 2 and day 5 showed that the size of the lens is not affected by the absence of Matrigel (Figure 3-figure supplement 1d-e).

      Additionally, taking advantage of the Foxe3::GFP lens reporter line, we measured the onset of lens-specific gene expression in organoids with and without Matrigel. In both conditions (with and without Matrigel supplementation), Foxe3::GFP expression was initiated at 25 hours post aggregation (see revised Figure 4b).

      The role of HEPES in lens formation is indeed intriguing and currently under investigation. HEPES is mainly used to adjust the pH of the culture media, which, on its own, might have an impact on multiple cellular processes. It will require a significant time investment to address the potential HEPES-triggered molecular mechanisms impacting on lens formation (cross reference with Reviewer #3), which clearly goes beyond the scope of the current manuscript.

      The markers analysed to show onset of lens differentiation in the organoids seem to start being expressed, in vivo, when the lens placode starts invaginating. An analysis of earlier stages is not presented. This would be very informative, allowing one to determine whether progenitors differentiate as placode and neuroepithelium first, to subsequently continue differentiating into lens and retina, respectively. Could early placodal and anterior neural plate markers be analysed in the organoids? This would provide a more complete sequence of lens vs retina differentiation in this model.

      We have taken care to show corresponding stages in embryo and organoid side by side. We provide additional data to highlight the expression of Rx3::H2B-GFP (retina) and Foxe3::GFP (lens and lens placode) markers in earlier developmental stages. For the presumptive eye field within the region of the anterior neural plate (S16, late gastrula), Rx3 represents one of the earliest markers (see revised Figure 3-figure supplement 1). Already before an apparent lens placode is formed (see revised Figure 3d), Foxe3::GFP expression is detected within the presumptive lens ectoderm, demonstrating that Foxe3 is ideally suited as an early marker for placodal progenitors in medaka. The onset of Rx3 and Foxe3-driven reporters is clearly early enough to support the claim about the separate origin of the lens (placodal) and retinal (anterior neuroectoderm) tissues within the ocular organoids now represented in the revised figures.

      The analysis of BMP and Fgf requirement for lens formation and differentiation is suggestive, but the source of these signals is not resolved or mentioned in the manuscript. Are BMP4 and Fgf8 expressed by the organoids? Where are they coming from?

      Assessing the activity of BMP and FGF signaling (cross-reference to Reviewer #1) in the organoids is an important point that we have taken care of and included in the revised manuscript.

      To address this point, we assessed which tissue/part of the organoid is responding to BMP and FGF signaling. To do so, we analyzed the presence of phosphorylated SMAD1/5/8 (pSMAD1/5/8) and phosphorylated ERK (pERK1/2) as BMP and FGF signaling targets, respectively, in ocular organoids from day 1 to day 2. BMP signaling activity was detected in the center (region of establishment of lens-committed progenitors (Figure 3e)) of the organoid at day 1 (see revised Figure 6a). At day 1, only low levels of FGF signaling activity were detectable in presumptive retinal and/or lens tissue (see revised Figure S6b). Only half a day later, a significant increase in FGF activity was observed specifically in the central region of the organoids (lens progenitor domain, at day 1.5), prior to the onset of differentiation of lens fiber cells. This, together with the inability of lens progenitor cells to differentiate into lens fiber cells in the presence of the FGF inhibitor SU5402 provided during this critical period (day 1 to day 2), demonstrates that FGF signaling activity localized in the lens progenitor cells is required for lens fiber differentiation.

      By day 2, FGF activity was detected in both lens and retinal tissue of the organoid. Similar patterns of FGF activity were observed in embryos at 2 days post fertilization (see revised Figure S6b).

      Treatment with the FGF signaling inhibitor SU5402 from day 0 to day 1 had no impact on the core size of the organoid, the dimensions of which were fully comparable to the control (please see Figure 6b).

      Regarding the presence of the corresponding ligands, we can state that they are indeed expressed in the organoids at the matching stages based on RNA-seq and RT-PCR analyses; however, we could not find them specifically localized. This may be due to a widespread, ubiquitous expression or may simply relate to technical problems.

      While we can state with confidence that the ligands are present at the relevant time points and trigger the downstream pathways in a localized manner, the question whether the response is due to a localized signal or localized competence remains to be addressed.

      The fact that the lens becomes specified in the centre of the organoid is striking, but it is for me difficult to visualise how it ends up being extruded from the organoid. Did the authors try to follow this process in movies? I understand that this may be technically challenging, but it would certainly help to understand the process that leads to the final organisation of retinal and lens tissues in the organoid. There is no discussion of why the morphogenetic mechanism is so different from the in vivo situation. The manuscript would benefit from explicitly discussing this.

      Following the shift of the lens in vivo is indeed a very relevant suggestion and we have taken care to address this in the revised manuscript.

      To clarify this process, we included additional experiments and followed the movements of lens cells (see new Figures 4c, 4d and 4e, new Videos S6 and S7). Foxe3::GFP lens progenitor cells were found to actively move over long distances from the center to the organoid periphery. This movement was accompanied by profound cell shape changes of lens progenitor cells, with the active extension of lamellipodia and filopodia strongly arguing for an active movement of lens cells to the organoid periphery (cross-reference with Reviewer #1 and Reviewer #2).

      Referees cross-commenting

      We all seem to have similar comments and concerns. I think overall the suggestions are feasible and realistic for the timeframe provided.

      Reviewer #3 (Significance):

      This study describes a reproducible approach to differentiate ocular organoids composed of lens and retinal tissues. The characterisation of lens differentiation in this model is very detailed, and despite the morphogenetic differences, the molecular mechanisms show many similarities to the in vivo situation. The manuscript however does not highlight, in my opinion, why this model may be relevant. Clearly articulating this relevance, particularly in the discussion, will enhance the study and provide more clarity to the readers regarding the significance of the study for the field of organoid research, ocular research and regenerative studies.

    1. ingering in clouds and servers I will never see.

      This passage really challenged my understanding of the Anthropocene and stuck with me as I continued navigating your Scalar project. I had previously not concretely thought about the relationship with an everyday object, like the phone, and the environmental impacts in places so far from me, that I may never go to. More than that I had not been able to visualize the impacts of data on 'the cloud' until I read this page and thought about the amount of electricity that is being consumed for one photo. I think this is a really interesting way of reaching the local dimensions of the Anthropocene from something as global as phones. It makes you think about your personal impact and how far removed you are from it as we spoke about in class.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Joint Public Review:

      Summary:

      The authors investigate how stochastic and deterministic factors are integrated in cell fate decisions, using Dictyostelium discoideum as a model system. They show that cells in different cell cycle phases (a deterministic factor) are predisposed to different fates, albeit with deviations, when exposed to the same environmental stimulus. However, gene expression variability (a stochastic factor) enhances the robustness of cellular responses to environmental cues that disrupt the cell cycle.

      Using a simple, tractable mathematical model, the authors demonstrate that cell fate decisions in D. discoideum depend on a combination of deterministic and stochastic factors, i.e., cell cycle phase and gene expression variability, respectively. They then identify Set1 - a key regulator of gene expression variability - indicate the mechanism through which it modulates this variability, and link it to a phenotype in D. discoideum development. Finally, they confirm that gene expression variability contributes to the robustness of the cell's response to environmental disruptions that interfere with the cell cycle.

      Strengths:

      The authors are careful in the choice of their experiments and in measuring gene expression variability, using methods that account for expected trends with average gene expression.

      Weaknesses:

      However, in terms of mathematical modelling, it would be important to rule out sources of stochasticity (other than gene expression variability), and also to consider cases where stochastic factors are not necessarily completely independent of the deterministic ones.

      We thank you and the reviewers for the insightful comments that have helped clarify the findings presented. We have addressed all comments and feel that the revised manuscript is much improved.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Minor typographical mistakes:

      (a) in the title: Linage -> lineage

      Corrected as suggested

      (b) on page 19: use a full stop in "...are biased towards the stalk fate, Use of the cell cycle position..."

      Corrected as suggested

      (c) on page 20: become -> becoming in "...(and end up biased towards become stalk)..."

      Corrected as suggested

      (d) on page 16: "mu = G p k". Perhaps it should be x instead of k?

      Corrected as suggested

      (2) Regarding the abstract:

      (a) This work tries to outline general principles (coordination/integration of deterministic and stochastic factors) in cell fate choice, especially when cells are faced with (near) identical environmental conditions. Perhaps the abstract, especially the first line, could be rephrased to reflect the generality of symmetry breaking and differentiation that is studied in this article/work. e.g., as was done in the first paragraph of the discussion.

      Corrected as suggested

      (b) It might be worthwhile clarifying what "this" is in the sentence "We suggest this represents an adaptive mechanism that increases developmental robustness against perturbations that affect deterministic signals." in the abstract.

      Corrected as suggested

      (3) Regarding the model:

      (a) The model tries to combine the stochastic and deterministic parts to explain the propensity for stalk fates. It is assumed that the cell cycle-associated factors (CCAF) provide the deterministic part while the cell cycle-independent factors (CCIF) provide the stochastic part. The net result is an addition of the two, which is then compared against a threshold to decide the propensity for stalk fates. However, another simple way to introduce stochasticity would be to make the CCAF decay stochastic. Reasons to consider this scenario would be: (i) the decay process (especially in the biological context) is generally stochastic, (ii) it would not be inconsistent with the fact that cell cycle dependent genes are also variable, and (iii) this way of introducing stochasticity would also provide expression level characteristics/plots similar to the ones outlined in Figure 1C, i.e. with a probability distribution of CCAF values for a given amount of time after mitosis. Would there be arguments or experimental evidence to rule this possibility out? For instance, would the results shown in Figure 7 contradict this model?

      We agree that there could be stochasticity in the CCAF decay process. In this scenario, the expected value of CCAF (which would reflect the mean of a noisy distribution) would show a deterministic pattern of decay through time, representing the average value of CCAF across cells that are in the same phase of the cell-cycle. The noisiness around such a pattern of deterministic decay in the mean value of CCAF (i.e., the residual variation) would then represent CCIF since it would be, by definition, cell-cycle independent. Hence, the present model is fully consistent with this possibility since it would still lead to some variation being cell-cycle associated and some variation being cell-cycle independent. Therefore, this scenario could be viewed as a different functional/biological process leading to the same ultimate distribution we model. To clarify this, we have added text justifying the hypothesis that the noisy distribution is due to gene expression differences, rather than decay itself:

      “Protein levels can vary widely between cells because they are regulated at multiple levels, including transcription, translation and stability. The position of the noisiest step in a pathway affects the overall noise dramatically, because each step usually amplifies noise in the previous steps (Alon 2007). Consistent with this idea, theory and single-cell experiments have shown that a major contributor to cell-cell variation is the bursty expression of low-copy mRNAs. We therefore hypothesized that this noisiness across cells arises from stochastic expression of a set of genes contributing to CCIF levels.”
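
      As a rough illustration of the additive threshold picture discussed in this exchange, the model can be sketched in a few lines of Python. The exponential form of the CCAF decay, the decay rate, the threshold value, the zero-mean Gaussian CCIF, and the convention that cells above the threshold are stalk-biased are all assumptions made for this sketch; none of these specifics are taken from the manuscript.

```python
import math

def ccaf_mean(t, decay_rate=3.0):
    """Assumed deterministic CCAF level at cell-cycle position t in [0, 1]."""
    return math.exp(-decay_rate * t)

def stalk_propensity(t, threshold=0.4, ccif_sd=0.15, decay_rate=3.0):
    """P(CCAF(t) + CCIF > threshold), with CCIF ~ Normal(0, ccif_sd).

    All parameter values are hypothetical and chosen only for illustration.
    """
    z = (threshold - ccaf_mean(t, decay_rate)) / ccif_sd
    # Upper tail of the standard normal distribution, via the error function.
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))

for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"t = {t:.2f}  P(stalk-biased) = {stalk_propensity(t):.3f}")
```

      In this sketch, any noise in the decay itself would, to a first approximation, simply widen `ccif_sd`: the mean curve remains a deterministic function of cell-cycle position, and the residual scatter around it plays the role of CCIF.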

      (b) On page 7, the formula for total CCIF variance assumes independence of the genes g_i. Is this a reasonable assumption?

      This concerns the argument that a set of stochastically expressed genes will yield an approximately Gaussian distribution of CCIF. Our results do not depend on the solution for the mean and the variance, only on the fact that noisy genes will generally yield such a Gaussian distribution. This is because independence is not strictly required for the central limit theorem to yield a Gaussian distribution. The distribution will still be Gaussian under a broad range of conditions (especially since gene expression is bounded, so there is no chance of the total ending up generating an infinite variance). The primary requirement is that the expression of any given gene is independent from that of most other genes. As a result, most of the variation in expression across genes is independent (even if any given gene is not independent from all other genes).

      The most likely pattern of non-independence will be the case in which gene expression is ‘modular’, where there are co-expressed blocks, meaning that non-independence is limited in scale so that genes within a co-regulated block show correlated expression, but their expression is uncorrelated to genes in other blocks. This pattern is functionally analogous to what is known as m-dependence in sequences of random variables (e.g., time series), where variables close together in sequence are correlated (but otherwise uncorrelated). Derivations of the central limit theorem have shown that the means (and hence the sum) of these sorts of variables still follow an approximately Gaussian distribution over a broad range of scenarios. In the case of non-independent gene expression, this means that we can view the independent random variable as being the expression value of a group of co-expressed genes (instead of individual genes). Hence, the means (or sums) of these values will still conform to the central limit theorem.

      This problem is addressed in:

      Diananda, P. H. 1955. The central limit theorem for m-dependent variables. Proc. Cambridge Philos. Soc. 51:92-95

      Hoeffding, W. & H. Robbins. 1948. The central limit theorem for dependent random variables. Duke Math. J. 15:773-780

      Orey, S. A. 1958. Central limit theorems for m-dependent random variables. Duke Math. J. 25:543-546

      Rosén, B. 1967. On the central limit theorem for sums of dependent random variables, Z. Wahrscheinlichkeitstheorie und Verw. Gebiete, 7:48-82

      To clarify this, we have added the following text and references:

      Although this derivation implicitly assumes that stochastically expressed genes are independent, this assumption is not strictly required for the distribution of CCIF to be approximately normal. If stochastically expressed genes show clustered co-expression owing to shared regulation, then the sum across these co-expressed blocks is still expected to be approximately normally distributed (as long as there are a reasonably large number of co-expressed clusters) (Diananda 1955; Hoeffding and Robbins 1948; Rosén 1967).
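
      The modular case described above can be checked with a small simulation. This is a sketch only: the number of blocks, the block size, and the switching probability are hypothetical, and genes within a block are assumed to switch on together (perfect within-block correlation) while blocks are independent of one another.

```python
import random
import statistics

def simulate_ccif(n_blocks=200, genes_per_block=5, p=0.2, n_cells=5000, seed=0):
    """Total expression per cell when genes are organised in co-expressed blocks.

    Hypothetical setup: each block is on with probability p, and every gene
    in an active block contributes one unit of expression.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_cells):
        total = 0
        for _ in range(n_blocks):
            if rng.random() < p:
                total += genes_per_block
        totals.append(total)
    return totals

def standardized_moment(xs, order):
    """Mean of ((x - mean) / sd) ** order over the sample."""
    mean = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum(((x - mean) / sd) ** order for x in xs) / len(xs)

totals = simulate_ccif()
skew = standardized_moment(totals, 3)                 # 0 for an exact Gaussian
excess_kurtosis = standardized_moment(totals, 4) - 3  # 0 for an exact Gaussian
print(f"mean = {statistics.fmean(totals):.1f}, skewness = {skew:.3f}, "
      f"excess kurtosis = {excess_kurtosis:.3f}")
```

      Skewness and excess kurtosis close to zero indicate that the per-cell total is close to Gaussian even though individual genes are strongly correlated within their blocks. With only a handful of blocks, the same code would be expected to show clearer departures from normality.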

      (4) In section "Cell cycle independent stochastic gene expression variation is extensive in growing cells":

      Regarding the statement: "We first determined the coefficient of variation (CV2) of expression for all genes. As expected, this tends to decrease as average expression level increases (Supplementary Figure 2).":

      It would be good to specify how the "expected variation" was calculated exactly. For instance, it was hard to discern from Supplementary Figure 2 how CV^2 decreasing with average expression levels was used in the calculation of expected variation.

      This is described in the methods on page 38

      “A trend line was fitted to the data using non-linear least squares regression (Scran v1.15.9). Genes were defined as variable (2073 genes) based on a one-sided test assuming a normal distribution around the trend but one where deviation changed depending on the mean expression of a given gene (Scran v1.15.9 - modelGeneCV2) with a FDR of < 0.05.”
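
      To make the idea of a mean-dependent CV² trend concrete, here is a minimal Python sketch. It is not the scran `modelGeneCV2` procedure quoted above: the trend form CV² = a1/mean + a0, the ordinary least-squares fit, and the simulated Poisson counts are all assumptions chosen for illustration.

```python
import math
import random

def poisson(rng, lam):
    """Knuth's Poisson sampler (adequate for the small means used here)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def mean_and_cv2(counts):
    """Return the sample mean and squared coefficient of variation."""
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
    return mean, var / mean ** 2

def fit_trend(means, cv2s):
    """Least-squares fit of cv2 = a1 / mean + a0 (linear in x = 1 / mean)."""
    xs = [1.0 / m for m in means]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(cv2s) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, cv2s))
    a1 = sxy / sxx
    return a1, y_bar - a1 * x_bar

rng = random.Random(1)
true_means = [1, 2, 5, 10, 20, 40]   # hypothetical genes with purely Poisson noise
stats = [mean_and_cv2([poisson(rng, lam) for _ in range(2000)]) for lam in true_means]
means = [m for m, _ in stats]
cv2s = [v for _, v in stats]
a1, a0 = fit_trend(means, cv2s)
print(f"fitted trend: CV2 = {a1:.2f} / mean + {a0:.3f}")
```

      For purely Poisson counts CV² equals 1/mean, so the fitted `a1` should be close to 1 and `a0` close to 0. Genes whose CV² sits well above such a trend at their own mean expression are the candidates for being called variable.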

      (5) In section "Stochastically expressed genes are associated with cell fate determination"

      (a) For readers unfamiliar with the organism ‘Dictyostelium discoideum’, a short description of its life cycle with growth and development/differentiation phases would be useful to provide the right context.

      Corrected as suggested

      (b) In section "Cell cycle independent stochastic gene expression variation is extensive in growing cells", it was shown that cell cycle dependent genes are also highly variable (in other words, ‘stochastic’). It would, therefore, be useful to elaborate on the definitions of "stochastically expressed genes, cell cycle-associated genes, and non-variable genes", as used in this section. Admittedly, the distinction does get clearer towards the last section of Results, but some elaboration here would make the reading smoother.

      Corrected as suggested

      (c) If the "cell cycle associated genes" are the same as "cell cycle dependent genes", it would be good to use one term consistently.

      Corrected as suggested

      (d) The developmental index is divided into 10 bins from 0 to 1. Is there a rationale for the choice of a number of bins? Would this choice affect significance tests for "stochastic" vs others? (The same question may apply to the "Cell type index")

      Significance is robust to the number of bins chosen (e.g. 5-25). Of course, if there are too many bins (low number of genes) or too few bins (addition of noisy data) significance falls. In the case of developmental index, our choice of bins is also based on previous analyses (de Oliveira, et al 2019), which developed the index we used, and showed that a threshold of >0.9 can be used to identify ‘developmentally expressed genes’.

      (6) In Figure 5:

      (a) Does the statement "*** binomial test, p<0.01." (as seen in caption for part C) actually refer to part D?

      Corrected as suggested

      (b) Could the authors please specify what "mis-expressed" means in Figure 5D? Are these genes that are upregulated, downregulated, or both? From what set of genes was the random sampling done?

      Corrected as suggested

      (c) In Figure 5F, is the decrease in CV^2 explained entirely by the increase in mean (as shown in Figure 5E)?

      We appreciate the point made by the reviewer and recognise that disentangling changes in gene expression variation from changes in expression levels is extremely difficult (any changes in burst frequency will necessarily affect expression level). However, we do not think this affects our conclusions, which are supported by results with representative Set1-dependent reporter genes (Figure 5G and H). These suggest that it is the number of expressing cells, rather than the expression level in each cell, that is affected, in these cases at least.

      (7) In Figure 6A: Could the authors please elaborate on the difference between the rows labelled "WT" and "set1-"? Are they two different types of chimera?

      Corrected as suggested

      (8) In Section "Cell cycle position and gene expression variation interact to control cell type proportioning":

      Is there a graph corresponding to the statement "However, the level of GFP expression in each responding cell did not significantly change."?

      Corrected as suggested

      (9) In section "Influence of stochastic variation on sensitivity to cell cycle perturbations" of the Supplementary text:

      (a) The model for cell cycle bias is not entirely clear. For instance, is the quantity N(t) = U(t) + Q_t U(t) also a probability distribution, like U(t) is? If so, there must be a normalization factor. It was difficult to understand the procedure behind this calculation. Perhaps some more elaboration (with words or a small schematic) on this model/method would help.

      The value of U(t) was originally being used to denote the uniform probability density function (for the uniform distribution), but for clarity this has been changed to follow the convention that U[a,b] denotes the uniform distribution over the interval from a to b (which, in this case would be U[0, 1]), while f(t) is now being used to make it clear that this is the probability density, where f(t) = 1 across the interval. Because the uniform distribution necessarily integrates to 1 over the defined range, it does not need to be normalised. The confusion here is perhaps due to the expression f(t) = 1 being interpreted as defining the probability of sampling a value of t (but in a continuous distribution we can only define the probabilities of sampling over an interval), instead of defining the probability density over the interval from a to b, where f(x) would be 1/(b – a), and hence over the interval of 0 to 1, f(x) would equal 1.

      To help clarify this issue, this section has been rewritten and a new figure (which appears as Supplementary Figure 12) has been added that illustrates the resulting probability density functions for biased sampling from the cell cycle.
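
      The normalisation point can be verified numerically. The sketch below is illustrative only: the helper names `uniform_pdf` and `integrate` are hypothetical, and a simple midpoint rule stands in for exact integration.

```python
def uniform_pdf(x, a=0.0, b=1.0):
    """Density of the uniform distribution U[a, b]: 1 / (b - a) inside the interval."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def integrate(f, lo, hi, steps=10_000):
    """Midpoint-rule approximation of the integral of f from lo to hi."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

# Over U[0, 1] the density is 1 everywhere and already integrates to 1.
total_mass = integrate(uniform_pdf, 0.0, 1.0)

# Probabilities are only defined over intervals: here, the chance that a
# sampled cell-cycle position falls between 0.2 and 0.5.
interval_mass = integrate(uniform_pdf, 0.2, 0.5)

print(total_mass, interval_mass)
```

      For a wider interval such as U[0, 2], the density drops to 0.5 so that the total probability remains 1; the density value itself is not a probability.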

      (b) References to Figure 8A, B seem to be indicating Supplementary Figure 12 instead. 

      Corrected as suggested

      Reviewer #2 (Recommendations for the authors):

      This manuscript seems quite interesting, but many sections are so unclear that I cannot follow what has been done. I would suggest slowly going through the manuscript and carefully explaining things. This will probably considerably increase the size of the manuscript, but many sections are too terse to follow even after many, many readings of the Results and figure legend.

      Corrected as suggested

      Some specific comments (this is not at all comprehensive, but rather illustrative)

      Page 2 - 'genes strongly associated with fate choice' - can you explain this a bit more - genes associated with one cell type or another, or genes that somehow regulate the choice?

      Corrected as suggested

      Page 2 - this abstract is quite vague, I would suggest being more specific to reflect what is in the manuscript.

      Corrected as suggested

      Page 3 - 'exhibit bivalent H3K4me3..' please explain 'bivalent' a bit more.

      Corrected as suggested

      Page 7 - 'Bernoulli process with probability that (meaning that is scaled to the size of the temporal interval)' (non-copying symbols deleted) could be simplified.

      Corrected as suggested

      Page 7 - please define all variables/ equation components. What is N? What is x bar? What is s2? The middle paragraph is very difficult to follow.

      This paragraph has been rewritten and a definition of the distribution added for clarity.

      Page 7 - 'genes might logically vary in the value of pi, such variability does not impact our results'. Trying to decipher this paragraph, it seems that pi is a function of time, so this could affect the results.

      pi is the probability that a stochastically expressed gene is actually expressed in whatever interval is being considered for all genes. pi will necessarily increase if the time interval considered is increased. The key point is we are considering the probability that any given gene is expressed in the same time interval. In this case, genes could vary in pi, and thus some burst more often and others less often.
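
      One way to picture how pi scales with the interval is to treat bursts as a Poisson process. This is an assumption made only for this sketch; the response above does not specify the burst process, and `burst_rate` is a hypothetical per-gene parameter.

```python
import math

def p_expressed(burst_rate, interval):
    """P(at least one burst within the interval) under an assumed Poisson burst process."""
    return 1.0 - math.exp(-burst_rate * interval)

# The same gene over a longer interval: pi increases.
print(p_expressed(0.5, 1.0), p_expressed(0.5, 2.0))

# Two genes with different burst rates over the same interval: pi differs by gene.
print(p_expressed(0.5, 1.0), p_expressed(2.0, 1.0))
```

      Under this assumption, comparing genes over a common interval keeps the interval length fixed, so differences in pi reflect only differences in how often each gene bursts.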

      Page 9 - '(it is 98.35 times more likely' there may be too many significant figures here.

      Corrected as suggested

      Page 10 - for the Area Under the Receiver Operating Characteristic Curve (AUROC), what are you classifying? AUROC is typically used for diagnostic tests to determine how well the test can discriminate between two completely different outcomes. What is the input, and what are the outcomes?

      Corrected as suggested

      Figures:

      What are the dashed lines in Figure S2A?

      Corrected as suggested

      What are the X-axes in Figure S3?

      Corrected as suggested

      I do not understand what you are showing in Figure S3.

      Corrected as suggested in results

      In Figure 2B, I cannot find in the text or figure legend any description or explanation of 'Group 1', 'Group 2', or 'Group 3'.

      Corrected as suggested

      Figure 3D needs a lot more explanation; I cannot understand this based on the text and the figure legend.

      Corrected as suggested

      The Set1 work should discuss the work in PMID: 39242621

      Corrected as suggested

      Figure 8 D needs a size bar

      Corrected as suggested

    1. Author response:

      Many thanks to the three reviewers and the editors for their comments and review. These are fair, consistent (across positives and negatives), and largely expected comments. On behalf of my coauthors, I use this letter as a provisional response to indicate what we can and intend to change in a revised manuscript.

      (1) A major comment from all three referees is that our single-nucleus RNA-seq data should be validated. The reviewers differ in the detail of exactly what they think should be validated, but they refer, individually, to (1) the discovery of ‘cell types’ themselves, (2) pathways inferred from trajectory analysis, (3) differentially expressed genes in plucked vs control condition at four time points and/or (4) inferred ligand-receptor pairs from cell-cell communication analysis, across the same time course. 

      I think we’re actually on pretty good footing for 1-3, because of work we’ve published in the cichlid fish model.

      I tally that in references cited in the manuscript, and highlighted below (References 1, 10, 11, 29, 30, 31), we present 29 figures with 273 individual figure panels of histology, in situ hybridization and immunohistochemistry featuring genes expressed across stages of tooth development and replacement. These genes are markers of dental competency and regenerative potential.

      In addition, in multiple of these papers, we use pharmacology to manipulate the role of key pathways (Hh, BMP, Wnt, Notch) in cichlid tooth development and replacement. Identification and validation of cell types make use of these published data in cichlids (for markers matched to mouse), as well as an unbiased computational approach (SAMap) that draws homology between cichlid and mouse dental cell types, based on shared global patterns of gene expression.

      In short, experiments to validate cell types, gene expression and pathways active in cichlid teeth are published and referenced herein. I noticed that these references (some of which include Gareth Fraser as an author, when he was a postdoc in my group; for Reviewer 2) were cited in the Introduction and not the Rationale/Methods or Results section (such that reviewers may have missed them). We will be clearer about this in the revision. 

      We have neither validated nor functionally analyzed the ligand-receptor pairs inferred from cell-cell communication analysis across the four time points of accelerated replacement. This work is beyond the scope of the current paper, and we will include a statement that these computational inferences represent hypotheses to be tested (although many of these ligand-receptor pairs have been noted in other ‘tooth’ publications that we cite).

      (2) The biggest weakness of our manuscript, noted by referees, is that we do not provide serial histology to accompany our snRNA-seq time course after plucking. We describe this as a limitation in the “Study limitations and future direction” section of the Discussion, but we can and will be stronger about why this is a weakness (e.g., we do not know explicitly the degree of damage done to tissue in the plucking paradigm). We do know that the jaw recovers quickly, but we do not know how different the plucked side is from the control side (which is also undergoing active replacement and remodeling). Uniting reviewer comments 1 and 2 here, the best future approach is a spatial transcriptomics reference at distinct stages of the plucking<>recovery paradigm, as we framed in the Discussion section, because this addresses simultaneously the state of dental/jaw tissue and the in situ expression of thousands of genes.

      (3) Reviewers asked about the presence of stromal cells in our snRNA-seq data. Because of this and another comment on the posted preprint version of our manuscript, we will take another look at the mesenchymal compartment of the snRNA-seq data and trajectories built from it.

      (4) Multiple (minor) suggestions for clarification in text and figures will be adopted. 

      Generally, I don’t think we’ll require reviewer re-engagement on the revision; editor review should be sufficient.

      References cited in the manuscript, highlighted here:

      (1) Fraser, G. J. et al. An Ancient Gene Network Is Co-opted for Teeth on Old and New Jaws. PLoS Biol. 7, e1000031 (2009).

      (10) Fraser, G. J., Bloomquist, R. F. & Streelman, J. T. Common developmental pathways link tooth shape to regeneration. Dev. Biol. 377, 399–414 (2013).

      (11) Bloomquist, R. F. et al. Developmental plasticity of epithelial stem cells in tooth and taste bud renewal. Proc. Natl. Acad. Sci. 116, 17858–17866 (2019).

      (29) Streelman, J. T., Webb, J. F., Albertson, R. C. & Kocher, T. D. The cusp of evolution and development: a model of cichlid tooth shape diversity. Evol. Dev. 5, 600–608 (2003).

      (30) Fraser, G. J., Bloomquist, R. F. & Streelman, J. T. A periodic pattern generator for dental diversity. BMC Biol. 6, 32 (2008).

      (31) Bloomquist, R. F. et al. Coevolutionary patterning of teeth and taste buds. Proc. Natl. Acad. Sci. 112, (2015).

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study examines how different parts of the brain's reward system regulate eating behavior. The authors focus on the medial shell of the nucleus accumbens, a region known to influence pleasure and motivation. They find that nerve cells in the front (rostral) portion of this region are inhibited during eating, and when artificially activated, they reduce food intake. In contrast, similar cells at the back (caudal) are excited during eating but do not suppress feeding. The team also identifies a molecular marker, Stard5, that selectively labels the rostral hotspot and enables new genetic tools to study it. These findings clarify how specific circuits in the brain control hedonic feeding, providing new entry points to understand and potentially treat conditions such as overeating and obesity.

      We thank Reviewer 1 for the positive feedback, summary of our findings and for the thorough reading and constructive comments on the manuscript, which allowed us to improve the quality of the revised version.

      Strengths:

      (1) Conceptual advance: The work convincingly establishes a rostro-caudal gradient within the medNAcSh, clarifying earlier pharmacological studies with modern circuit-level and genetic approaches.

      (2) Methodological rigor: The combination of fiber photometry, optogenetics, CRISPR-Cas9 genetic engineering, histology, FISH, scRNA-seq, and novel mouse genetics adds robustness, with complementary approaches converging on the central claim.

      (3) Innovation: The generation of a Stard5-Flp line is a valuable resource that will enable precise interrogation of the rostral hotspot in future studies.

      (4) Specificity of findings: The dissociation between appetitive and aversive conditions strengthens the interpretation that the observed gradient is restricted to feeding.

      We thank Reviewer #1 for their supportive feedback.

      Weaknesses and points for clarification

      (1) Role of D2-SPNs: Since D1 and D2 pathways often show opposing roles in feeding, testing, or discussing D2-SPN contributions would provide an important control and context. Since the claim is that Stard5 is expressed in both D1- and D2MSNs, it seems to contradict the exclusive role of D1R MSNs in authorizing food intake.

      We agree that D2-SPNs represent an important and relevant cell population in the context of our study. The Stard5-Flp line labels a mixed population of D1- and D2-SPNs, and we agree that dissecting the distinct contributions of Stard5<sup>+</sup> D1-SPNs and Stard5<sup>+</sup> D2-SPNs to feeding behavior would be both interesting and informative.

      Although we understand the point raised by the Reviewer, we do not entirely agree that the expression of Stard5 in both D1- and D2-SPNs contradicts the established role of D1-SPNs in authorizing food intake. In the medNAcSh, D1- and D2-SPNs do not exert opposing functions. D2-SPNs project densely to the ventral pallidum and more sparsely to the lateral hypothalamus and, like D1-SPNs, are predominantly reward-inhibited at the population level (Domingues et al. 2025; Pedersen et al. 2022).

      We added the following in the discussion: “Additionally, a new study showed that manipulation of D2-SPN cell bodies in the medNAcSh modulates reward preference, self-stimulation, and palatable food intake in a frequency- and context-dependent manner (Requejo-Mendoza et al., 2025). Together, these findings suggest that D1- and D2-SPNs within the medNAcSh play complementary rather than opposing roles in reward processing. Hence, the potential role of rostral and caudal medNAcSh D1- and D2-SPNs in food-related behaviors beyond the act of consumption could be addressed in future work.” We also acknowledge that not investigating rostro-caudal gradients of D2-SPN in reward and aversion processing “represents a limitation of this work”.

      We fully agree that disentangling the specific contributions of Stard5<sup>+</sup> D1- and Stard5<sup>+</sup> D2-SPNs is an important next step. We have now crossed the Stard5-Flp line with Drd1-Cre and A2a-Cre lines. In a pilot experiment (not shown), we injected Flp+,Cre+, Flp+,Cre- and Flp-,Cre+ mice with 4 different FlpOn-CreOn AAVs to determine if any of these AAVs demonstrate specific expression. However, all AAVs exhibited moderate to strong leaky expression of the Cre, preventing reliable cell-type-specific targeting. This was not seen with Flp-only or Cre-only AAVs. The leakiness mentioned is a known challenge of FlpOn-CreOn AAVs and requires additional troubleshooting (e.g. reduce the titer). As this proved to be more challenging than anticipated, this work is ongoing and will be addressed in a future study rather than in the present revisions.

      (2) Behavioral analyses:

      (a) In Figure 2, group differences in consumption appear uneven; additional analyses (e.g., lick counts across blocks and session totals) would strengthen interpretation.

      The group differences in consumption that appear uneven likely reflect overall lower total lick counts per session in the Control group. We have now added analyses of average lick counts per block and session totals in the newly included Supplementary Figure S7, which support the results shown in Figure 2.

      Although we observe a difference in total lick count across the entire session between Control and Rostral ChrimsonR mice (Supplementary Figure S7d), we consider the comparison of total session lick counts less informative here. Instead, we would argue that the laser-on epoch is the most meaningful comparison. During this period, optogenetic activation had no effect on licking behavior in control mice, showed a nonsignificant trend toward reduced consumption in caudal ChrimsonR mice, and produced a significant reduction in lick counts when rostral medNAcSh D1-SPNs were activated (Figure 2g-i and Supplementary Figure S7c).

      We added in the discussion the following explanation:

      “In addition, comparison of licking behavior during the laser-off blocks revealed an interesting effect: following cessation of opto-stimulation, Rostral ChrimsonR mice licked more than Caudal ChrimsonR and Control mice, suggesting a possible compensatory overconsumption. One possible interpretation is that the optogenetic parameters used suppressed consummatory behavior without reducing the motivation to obtain the reward. Furthermore, consistent with the RTPPA results, activation of rostral D1-SPNs may be experienced as aversive and termination of the optogenetic stimulation could produce relief, which in turn reinforces the licking behavior. Further investigations are required to test these possibilities.”

      (b) The design and contribution of aversive assays to the main conclusions remain somewhat unclear and could be better justified.

      We appreciate the Reviewer’s comment regarding the design and contribution of the aversive assays. The rationale for including these experiments was to determine whether the rostro–caudal functional segregation observed for reward-related feeding also applies to aversive processing.

      First, using foot shock, we tested whether D1-SPNs in the rostral versus caudal medNAcSh respond differently to an aversive stimulus. In contrast to reward-related responses, both populations responded similarly, exhibiting excitation. Second, to ensure that this effect was not specific to a single stressor, we tested a second aversive stimulus (tail lift) and again observed comparable excitatory responses in rostral and caudal D1-SPNs. Third, we assessed whether optogenetic activation of these neurons is perceived as rewarding or aversive. Using a real-time place preference/aversion assay, we found that optogenetic stimulation of D1-SPNs in both subregions induced place aversion.

      Together, these experiments show that while D1-SPNs display region-specific effects on reward-related feeding behavior, their activity responses to aversive stimuli and the avoidance response to optogenetic activation are similar across rostral and caudal medNAcSh. This contrast strengthens our conclusion that the D1-SPN rostro-caudal gradient is specific to appetitive contexts.

      We added the following in the discussion:

      “Here, we further tested the existence of rostro-caudal gradients for aversion, asking whether D1-SPNs in the rostral vs. caudal medNAcSh respond differently to aversive stimuli. To ensure that any observed effects were not specific to a single stressor, we tested two distinct aversive stimuli (foot shock and tail lift). In both cases, we found no rostro-caudal differences, as D1-SPNs in both subregions responded with excitation. We also asked whether optogenetic activation of these neurons is perceived as aversive. Stimulation of D1-SPNs in both rostral and caudal medNAcSh promoted aversive behavioral responses in the RTPPA experiment. Hence, in contrast to the pharmacological inhibitions mentioned above, we did not detect differences in aversive behaviors according to the rostro-caudal medNAcSh site.”

      (c) The scope of behavior is mainly limited to consumption; testing related domains (motivation, reward valuation, and extinction) could broaden the significance.

      We thank the Reviewer for the suggestion to examine additional behavioral domains such as motivation, reward valuation, and extinction. We focused our efforts on consumption given the large body of literature demonstrating a very important role of the medNAcSh in reward consumption. However, we fully agree that feeding encompasses multiple phases, from appetitive and goal-directed behaviors to consummatory behavior, and that the NAc in general, and to some extent the NAcSh is involved in behaviors across this spectrum. For instance, prior work has shown that the medNAcSh is involved in reward preference and that this follows a rostro-caudal gradient (e.g. Pedersen et al. 2022).

      While it would be informative to directly test motivational processes using operant paradigms (e.g., nosepoke or lever-press tasks), our current experimental setup did not allow for these assays. Instead, we performed exploratory experiments manipulating the animals’ internal state with food deprivation. As expected, under food deprivation, total licking increased robustly in control mCherry and Rostral ChrimsonR medNAcSh mice as compared to ad libitum feeding (25 min session with 5 alternating on-off blocks: ad libitum Control = 692 and Rostral ChrimsonR = 1280 average total licks per session, see Figure 2g-h and Supplementary Figure S7d; food-deprived Control = 2428 and Rostral ChrimsonR = 2390 total licks averaged for N = 9 Control, N = 12 Rostral). Moreover, similar to ad libitum feeding, optogenetic activation of rostral D1-SPNs suppressed licking in food-deprived mice, albeit to a lesser extent than under ad libitum feeding conditions (Figure 2).

      These preliminary observations suggest that internal state modulates the role of rostral D1-SPNs in reward consumption, potentially reflecting an interaction between homeostatic and hedonic feeding circuits. However, as this line of investigation was exploratory and not pursued further in the present study, these data are not included in the main manuscript.

      Author response image 1.

      In vivo optogenetic stimulation of rostral medNAcSh inhibits reward consumption to a lesser extent after overnight food deprivation. a. Quantification of the average lick count per 5 min block in mCherry control mice vs. ChrimsonR (rostral) mice, showing a lower lick count in rostral medNAcSh ChrimsonR mice during the opto-stimulation epoch. Blocks of 5 min with or without opto-stimulation were alternated (on/off/on/off/on) for a total of 5 blocks. b. Quantification of mean lick counts in the opto-stimulation vs. non-opto-stimulation epochs shows a significant decrease in lick counts following stimulation of rostral medNAcSh D1-SPNs and no significant difference in the control mice. 2-way RM-ANOVA (group x epoch). Main effects: epoch F (1, 28) = 6.027, p=0.0206; group F (2, 28) = 1.448, p=0.2520; group x epoch F (2, 28) = 8.123, p=0.0017. Sidak post-hoc opto-stimulation vs. non opto-stimulation: Control on vs. off t(28) = 1.856, p=0.2061; Rostral medNAcSh on vs. off t(28) = 3.054, p= 0.0147. N=9 for Control mCherry; N=12 for Rostral medNAcSh ChrimsonR. c. Pie charts showing % of mice showing food intake inhibition (mean Δlick counts non-opto/opto>0) in each group: 42% of ChrimsonR rostral medNAcSh mice, 20% of controls. Data is mean ± SEM. *p<0.05; **p<0.01; ***p<0.001.
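      The per-mouse classification behind panel c can be sketched as follows. This is a minimal illustration, assuming that "Δlick counts non-opto/opto > 0" means the mean lick count in the non-stimulated blocks minus the mean in the stimulated blocks; the function names and the example numbers are hypothetical and are not data from the study.

```python
from statistics import mean

# Block layout taken from the legend: five 5-min blocks, on/off/on/off/on.
BLOCK_IS_OPTO = [True, False, True, False, True]


def epoch_means(block_licks):
    """Return (mean licks in opto-on blocks, mean licks in opto-off blocks)."""
    if len(block_licks) != len(BLOCK_IS_OPTO):
        raise ValueError("expected one lick count per block")
    on = [n for n, is_on in zip(block_licks, BLOCK_IS_OPTO) if is_on]
    off = [n for n, is_on in zip(block_licks, BLOCK_IS_OPTO) if not is_on]
    return mean(on), mean(off)


def shows_inhibition(block_licks):
    """Assumed criterion: mean off-epoch licks minus mean on-epoch licks > 0."""
    on, off = epoch_means(block_licks)
    return (off - on) > 0


def percent_inhibited(mice):
    """Percentage of mice in a group that meet the inhibition criterion."""
    return 100.0 * sum(shows_inhibition(m) for m in mice) / len(mice)


# Made-up lick counts per block, for illustration only.
example_group = [
    [300, 600, 350, 650, 320],  # fewer licks while the laser is on
    [500, 480, 510, 470, 505],  # no reduction while the laser is on
]
print(percent_inhibited(example_group))
```

      Averaging within epochs before taking the difference keeps the unequal number of on (3) and off (2) blocks from biasing the comparison.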

      (3) Molecular profiling:

      (a) Stard5 expression is present in both D1- and D2-SPNs; comparisons to bulk calcium signals and quantification of percentages across rostral and caudal cells would be helpful. The authors should establish whether these cells also express SerpinB2, an established marker of LH projecting neurons.

      We thank the Reviewer for this relevant point. In the photometry experiments (Figure 7) using Stard5-Flp mice, we acknowledge that the recorded signals reflect a mixed population of D1- and D2-SPNs. Based on quantification in a separate set of brains, we estimate that Stard5 is expressed in a variety of cell types, of which 35% are D1-SPNs and 30% are D2-SPNs (Supplementary Figure S3). While Liu et al. 2024 reported no overlap between Stard5 and Drd2, a canonical marker for D2-SPNs, available transcriptomic data (Chen et al. 2021) and our own histological and RNA-based analyses (Figure 6 and Supplementary Figure S3) found Stard5 to be expressed in both D1-SPNs and D2-SPNs. Hence, Stard5 indeed labels a mixed population.

      We provide here the quantification of percentages of Stard5 expression across rostral and caudal cells: for instance, in the dorsal rostral medNAcSh, 79% of D1-SPNs and 76% of D2-SPNs express Stard5; in the ventral rostral medNAcSh the percentages are 47% and 55%, whereas the same percentages drop to 39% and 31% in the dorsal caudal medNAcSh and 15% and 20% in the ventral caudal medNAcSh.

      As suggested by the Reviewer, we also performed further analysis of the publicly available scRNA-seq dataset from Chen et al. 2021, which shows that 4.4% of all Stard5-expressing cells are also Serpinb2+, while 1.8% of all sequenced NAc cells are Stard5+/Drd1+/Serpinb2+ and 0.21% are Stard5+/Drd2+/Serpinb2+.
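      Co-expression percentages of this kind can be computed as sketched below. This is an illustration only: it assumes that a cell "expresses" a gene when its count is above zero, which may differ from the detection criterion actually applied to the Chen et al. 2021 dataset, and the helper name and toy cells are hypothetical.

```python
def percent_coexpressing(cells, required, among=()):
    """Percent of cells expressing every gene in `required`.

    The denominator is restricted to cells expressing every gene in `among`
    (all cells when `among` is empty). A gene counts as expressed when its
    count is > 0, which is an assumed threshold.
    """
    def expresses(cell, genes):
        return all(cell.get(gene, 0) > 0 for gene in genes)

    denominator = [cell for cell in cells if expresses(cell, among)]
    if not denominator:
        return float("nan")
    hits = sum(expresses(cell, required) for cell in denominator)
    return 100.0 * hits / len(denominator)


# Toy per-cell counts, for illustration only (not data from the study).
toy_cells = [
    {"Stard5": 2, "Drd1": 1, "Serpinb2": 1},
    {"Stard5": 1, "Drd2": 3},
    {"Drd1": 4},
    {"Stard5": 5, "Drd1": 2},
]

# Share of Stard5-expressing cells that are also Serpinb2+.
print(percent_coexpressing(toy_cells, ["Serpinb2"], among=["Stard5"]))
# Share of all cells that are Stard5+/Drd1+/Serpinb2+.
print(percent_coexpressing(toy_cells, ["Stard5", "Drd1", "Serpinb2"]))
```

      Making the denominator explicit matters here because the response reports one figure relative to Stard5-expressing cells and the others relative to all sequenced NAc cells.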

      (b) Verification of the Stard5-2A-Flp line (specificity, overlap with immunomarkers) should be documented more thoroughly.

      We agree with the Reviewer that a more detailed characterization of the Stard5-2A-Flp mouse line would be relevant for the validation of the line.

      In our study, we identified Stard5 as a marker gene that enables selective targeting of the rostral medNAcSh, as it is strongly enriched in the rostral medNAcSh (Figure 5-7). Stard5-Flp mice injected with Flp-dependent AAV in rostral medNAcSh, NAc core and dorsal striatum show specific AAV expression only in the rostral medNAcSh (Figure 7).

      Moreover, we show that the line is specific as injection of a Flp-dependent AAV in a Stard5-Flp negative line does not lead to expression (Figure 7c).

      However, re-analysis of the published scRNA-seq dataset (Chen et al. 2021) indicates that Stard5<sup>+</sup> cells comprise a heterogeneous population, including D1-SPNs (~35%), D2-SPNs (~30%), local interneurons (~18%), glial cells (~12%), and other cell types (Suppl. Fig. S3).

      Together, these data validate the Stard5-2A-Flp line as a spatially specific genetic entry point for the rostral medNAcSh, while highlighting the cellular heterogeneity of Stard5-expressing cells. Given the limited brain material left, we were not able to add additional colocalization analyses with immunomarkers, but agree this would be important to include in future studies.

      (c) The molecular analysis is restricted to a small set of genes; broader spatial transcriptomics could uncover additional candidate markers. See also above.

      We thank the Reviewer for this suggestion. Broader spatial transcriptomic analyses would indeed be highly valuable for identifying additional candidate markers. Our aim for the present study was to identify molecular landmarks to selectively target the rostral medNAcSh, but in a future study, we would be highly interested in building on our initial findings and providing an exhaustive molecular characterization of the region using spatial transcriptomics. We would be particularly motivated to do so, given the important functional specificity of the rostral NAcSh identified in the present publication.

      Reviewer #2 (Public review):

      Summary:

      Marinescu et al. combine in vivo imaging with circuit-specific optogenetic manipulation to characterize the anatomic heterogeneity of the medial nucleus accumbens shell in the control of food intake. They demonstrate that the inhibitory influence of dopamine D1 receptor-expressing neurons of the medial shell on food intake decreases along a rostro-caudal gradient, while both rostral and caudal subpopulations similarly control aversion. They then identify Stard5 and Peg10 as molecular markers of the rostral and caudal subregions, respectively. Through the development of a new mouse line expressing the flippase under the promoter of Stard5, they demonstrate that Stard5-positive neurons recapitulate the activity of D1-positive neurons of the rostral shell in response to food consumption and aversive stimuli.

      We thank Reviewer 2 for the positive feedback, summary of our findings and for the thorough reading and constructive comments on the manuscript, which allowed us to improve the quality of the revised version.

      Strengths:

      This study brings important findings for the anatomical and functional characterization of the brain reward system and its implications in physiological and pathological feeding behavior. It is a well-designed study, technically sound, with clear and reliable effects. The generation of the new Stard5-Flp line will be a valuable tool for further investigations. The paper is very well written, the discussion is very interesting, addresses limitations of the findings, and proposes relevant future directions

      We thank Reviewer #2 for their supportive feedback.

      Weaknesses:

      At this stage, identification and characterization of the activity of Stard5-positive neurons is a bit disconnected from the rest of the paper, as this population encompasses both D1- and D2-positive neurons as well as interneurons. While they display a similar response pattern as D1-neurons, it remains to be determined whether their manipulation would result in comparable behavioral outcomes.

      We agree that this represents an important limitation of the current study. In our search for molecular markers of the rostral feeding hotspot, we identified Stard5 as a marker enriched in the rostral medNAcSh; however, Stard5 labels a heterogeneous population that includes D1- and D2-SPNs as well as other cell types. While Stard5<sup>+</sup> neurons display activity patterns similar to D1-SPNs, we acknowledge that whether their direct manipulation would produce comparable behavioral effects to D1-SPNs remains to be determined. Moreover, it remains to be determined how the activity and function of Stard5<sup>+</sup> neurons compares to D2-SPNs.

      To specifically isolate Stard5<sup>+</sup> D1-SPNs, we generated a Stard5-Flp;Drd1-Cre mouse line via breeding. However, the 4 CreON/FlpON AAVs which we tested exhibited leaky expression, including ectopic expression in Cre-positive but Flp-negative cells. This prevented reliable, cell-type-specific manipulation. We are actively working to overcome this common technical limitation of Flp/Cre AAVs, and these experiments will be addressed in a future study.

      Recommendations for the authors:

      Editor's note:

      Readers would also benefit from coding individual data points by sex and noting N/sex in the figure legends.

      We thank the editor for the note; we now report the N and sex of the mice in each figure legend.

      Reviewer #1 (Recommendations for the authors):

      (1) Integration of results: The manuscript reads as two partly disconnected halves (functional gradient vs. molecular profiling). A more precise articulation of how the molecular findings (Stard5, Peg10) directly relate to the functional data would improve coherence.

      We thank the Reviewer for raising this important point. We agree that clearer integration between the functional gradient and the molecular findings would strengthen the manuscript. In the present study, Stard5 and Peg10 are not introduced as mechanistic drivers of behavior, but as molecular landmarks that map onto the functional rostro-caudal organization of the medNAcSh.

      Stard5 expression is enriched in the rostral medNAcSh, where we identify a functional hotspot for reward-related feeding, whereas Peg10 marks more caudal territories. Thus, the molecular profiling provides an independent axis that aligns with and supports the functional gradient revealed by photometry and optogenetic experiments. Whether these genes themselves contribute causally to feeding or aversive behaviors remains an open and interesting question for future studies.

      To improve clarity, we have explicitly articulated this link in the Discussion:

      “Importantly, our results indicate that spatial organization also defines functional specialization in the medNAcSh, and that molecular markers such as Stard5 provide access to these spatially defined subterritories rather than labeling a single, homogenous neuronal subtype.”

      “Having established a robust functional dichotomy of D1-SPNs along the rostro-caudal axis in reward consumption, we next asked whether this functional organization is mirrored by differences in molecular composition across the medNAcSh. Using multiple anatomical techniques, we find strong differences in the molecular composition of the rostral vs. caudal medNAcSh, which in turn could explain behavioral differences between these brain subregions.”

      “This makes Stard5 a spatial molecular landmark that captures the cellular ensemble of the rostral feeding hotspot, rather than a marker defining a single functional cell class. It is interesting that Stard5, a START-domain protein implicated in cholesterol metabolism and cellular stress responses (Alpy and Tomasetto, 2005; Rodriguez-Agudo et al., 2012; Calderon-Dominguez et al., 2014), and Peg10, an imprinted gene with roles in embryonic development and cancer (Mou et al. 2025), mark distinct rostro-caudal domains of the medNAcSh. Whether these genes themselves causally contribute to appetitive and consummatory behaviors, or aversive processing in this region remains an important question for future studies.”

      (2) Injection site specificity: Given prior work on NAc manipulations, it is essential to ensure precise targeting. Representative images from both rostral and caudal placements, including verification of fiber/injection confinement, would increase confidence.

      We thank the Reviewer for this important point regarding injection site specificity. Optic fiber placement was validated by identifying the coronal section in which the fiber tip was centered and aligning it to the mouse brain atlas (Franklin and Paxinos, The Mouse Brain in Stereotaxic Coordinates). We have so far validated a total of 14 brains, shown in the newly added Supplementary Figure S10.

      The primary source of variability across animals could be the extent of the viral spread and the size of the optic implants, which were 400 μm for photometry experiments and 200 μm for the optogenetic studies. We acknowledge that this limits the spatial precision with which the individual subregions can be isolated. This limitation is explicitly discussed in the manuscript.

      Importantly, despite this limitation, we detected robust and reproducible differences between rostral and caudal medNAcSh in reward-consumption photometry and optogenetic assays. This argues against injection site proximity or fiber misplacement being a major confounding factor for the main conclusions. Nonetheless this comment is a valid point, and in future studies we plan to establish targeting methods with reduced viral volumes and/or tapered optic fibers (Pisanello et al. 2017). This will allow finer spatial restriction and more precise dissection of medNAcSh subregions.

      (3) Minor clarifications:

      (a) Provide explicit definitions of "rostral" and "caudal" coordinates.

      We adjusted Figure 1 and added the coordinates.

      (b) Consider alternative wording to "gradient" since only two rostro-caudal positions are tested.

      RNA-seq and MERFISH data indicate that molecular markers in the NAcSh are organized along a continuous rostro–caudal gradient rather than discrete boundaries (Chen et al. 2021; Stanley et al. 2020). Our use of the term ‘gradient’ therefore reflects this established molecular organization, even though our functional experiments sampled two representative positions along this continuum.

      We added the following sentence in the discussion for clarification:

      “Of note, in this paper we decided to use the term “rostro-caudal gradient”, motivated by converging evidence from prior pharmacological studies (see below) and scRNA sequencing data (Chen et al., 2021; Stanley et al., 2020), which show continuous molecular and functional changes along the rostro-caudal axis of the medNAcSh rather than sharply defined boundaries. Our use of the term ‘gradient’ therefore reflects this established molecular organization, even though our functional experiments sampled only two representative positions along this continuum.”

      (c) Enhance representative images (e.g., stronger DAPI, zoom-ins, bregma coordinates).

      To improve clarity, we have adjusted Figure 1 by adding schematic representations including stereotaxic surgery coordinates, which facilitate interpretation of rostro–caudal targeting.

      (d) Report trial numbers in figure legends, injection site details (e.g., S1 mouse), learning curves, and rationale for low-pass filtering in photometry.

      We thank the Reviewer for these suggestions. The average number of successful trials is now reported in the figure legends (Figure 1 and Figure 7). Injection site details are described in the Methods and are now also illustrated in Figure 1a and validated in Supplementary Figure S10. In addition, we have added Supplementary Figure S8 showing the learning curves of the Drd1-Cre and Stard5-Flp mice included in this study.

      Regarding the low-pass filtering in photometry analysis: low-pass filtering (1 Hz) was applied to the signal to remove high-frequency noise and isolate slow calcium-dependent fluorescence fluctuations that reflect population-level neural activity as we have done before (Labouesse et al. 2023, 2024). Low-pass filtering is a commonly-used analysis in fiber photometry and often shows a better artifact-corrected signal (Zhang et al. 2023; Keevers and Jean-Richard-dit-Bressel 2025).
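      To make the 1 Hz low-pass step concrete, here is a minimal sketch. The response does not state the filter type, order, or sampling rate, so everything below is an assumption: a first-order filter applied forward and then backward (to avoid a phase shift) stands in for whatever design was actually used, and the 100 Hz sampling rate is hypothetical. In practice a Butterworth design applied in both directions, for example with SciPy's `butter` and `filtfilt`, is a common choice.

```python
import math


def lowpass(signal, sample_rate_hz, cutoff_hz=1.0):
    """Zero-phase first-order low-pass: one forward and one backward pass.

    This is a stand-in for the unspecified filter in the response, not the
    authors' implementation.
    """
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)  # smoothing factor of the discrete RC filter

    def one_pass(x):
        y = [x[0]]
        for sample in x[1:]:
            y.append(y[-1] + alpha * (sample - y[-1]))
        return y

    forward = one_pass(list(signal))
    backward = one_pass(forward[::-1])
    return backward[::-1]


# Synthetic trace: a slow 0.2 Hz component plus 20 Hz noise-like ripple.
FS = 100.0  # hypothetical sampling rate in Hz
trace = [
    math.sin(2 * math.pi * 0.2 * i / FS) + 0.5 * math.sin(2 * math.pi * 20 * i / FS)
    for i in range(1000)
]
smoothed = lowpass(trace, FS)
```

      The slow component passes largely intact while the 20 Hz ripple is strongly attenuated, which is the behavior the response describes for isolating slow calcium-dependent fluctuations.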

      Reviewer #2 (Recommendations for the authors):

      Major Comments:

      (1) As mentioned, I find the part on Stard5-positive neurons a bit disconnected. Ideally, as mentioned in the discussion, the author could cross Stard5-Flp mice with D1-cre to selectively monitor and/or manipulate these neurons. Alternatively, do they have any data regarding D2-positive neurons of the rostral part to show whether they behave differently from D1-positive neurons?

      We thank the Reviewer for this suggestion and agree that selectively monitoring or manipulating Stard5<sup>+</sup> D1-SPNs using an intersectional approach would strengthen the link between the molecular and functional findings. We are pursuing this strategy by crossing Stard5-Flp mice with Drd1-Cre mice; however, as noted above, currently available CreON/FlpON viral tools exhibited leaky expression (a commonly known problem for such AAVs), preventing reliable cell-type–specific targeting. As a result, these experiments are ongoing (including reducing the titers) and will be addressed in a future study.

      At present, we do not have equivalent functional data for D2-SPNs in the rostral medNAcSh. Investigating whether rostral D2-SPNs behave differently from caudal D2-SPNs is an important and interesting question, which we hope to address in a future study. This limitation is acknowledged in the discussion.

      (2) Do the authors have any data on locomotor activity when they manipulate D1-expressing neurons? Lower food consumption as well as lower activity in the stimulated compartment - interpreted as aversion - could be related to diminished locomotor activity.

      We thank the reviewer for the relevant point about locomotion. We ran new analyses of locomotor activity during the feeding task (operant boxes) using a machine-learning model. A small subset of frames (136 frames from 10 video recordings) was manually annotated to define the animal’s body center and nose, as well as the four corners of the operant box. These annotations were used to train a YOLO (Redmon et al. 2015)-based pose estimation model. Locomotion metrics, such as total distance moved, were subsequently derived from the temporal integration of positional data and aligned to opto-on and opto-off epochs of the feeding task. During licking periods, the animal’s body center remains largely stationary, which could lead to an overestimation of immobility. Nevertheless, we quantified the total distance traveled in the entire operant box across epochs, shown in Supplementary Figure S9 a-b. In our proof-of-concept experiment (Figure 2c-e), locomotion was increased in rostral ChrimsonR mice compared to controls (Supplementary Figure S9a), an effect similar to that seen with chemogenetic activation of D1-SPNs (Zhu, Ottenheimer, and DiLeone 2016). In our full experimental cohort, locomotion did not differ between control, rostral and caudal ChrimsonR mice across laser-on and laser-off epochs. These results indicate that reduced reward consumption during stimulation of rostral D1-SPNs is not due to decreased locomotor activity. Notably, whereas the inhibitory effect on consumption is specific to rostral D1-SPN activation, locomotor effects are similar for both rostral and caudal D1-SPN stimulation, indicating they are at least partly dissociated from one another.
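      The step from tracked positions to distance traveled per epoch can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: positions are taken as already-extracted body-center coordinates (one per frame, in arbitrary units), the pose model itself is not modeled, and steps that straddle an epoch boundary are simply skipped. The function name and the toy trajectory are hypothetical.

```python
import math


def distance_by_epoch(positions, epoch_labels):
    """Sum frame-to-frame displacement of the body center within each epoch.

    positions: list of (x, y) body-center coordinates, one per video frame.
    epoch_labels: list of labels (e.g. "on"/"off"), one per frame.
    Steps between two frames with different labels are skipped (assumed rule).
    """
    if len(positions) != len(epoch_labels):
        raise ValueError("need exactly one epoch label per frame")
    totals = {}
    steps = zip(positions, positions[1:], epoch_labels, epoch_labels[1:])
    for (x0, y0), (x1, y1), label_a, label_b in steps:
        if label_a != label_b:
            continue
        totals[label_a] = totals.get(label_a, 0.0) + math.hypot(x1 - x0, y1 - y0)
    return totals


# Toy trajectory, for illustration only.
toy_positions = [(0, 0), (3, 4), (3, 4), (6, 8), (6, 8)]
toy_labels = ["on", "on", "off", "off", "off"]
print(distance_by_epoch(toy_positions, toy_labels))
```

      Converting these totals to physical units would additionally require a pixel-to-distance calibration, for instance from the annotated box corners.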

      Moreover, in the RTPPA task, it is accepted that the percentage of time spent in the light-paired chamber reflects the preference or aversiveness to optogenetic stimulation. We additionally quantified total distance traveled (Supplementary Figure S9c). While optogenetic stimulation of both rostral and caudal D1-SPNs reduced time spent in the light-paired chamber (Figure 4), total distance traveled was unchanged, indicating that the observed aversion is not due to reduced locomotion.

      We added the following to the Results section: “To determine whether the reduced reward consumption observed in Rostral ChrimsonR mice could be explained by changes in locomotion, we quantified the total distance traveled during this task. Optogenetic stimulation led to an increase in locomotion in the small cohort of Rostral ChrimsonR mice in the reward consumption experiment shown in Figure 2d-e (Supplementary Figure S9a), while no change in locomotion was observed across epochs in mCherry controls, ChrimsonR Rostral and Caudal mice (Supplementary Figure S9b, related to Figure 2g-i)”

      And

      “Quantification of locomotion showed no reduction in distance traveled in the light-paired chamber (Supplementary Figure S9c), indicating that the avoidance was not driven by impaired locomotion. These data indicate that medNAcSh D1-SPNs generally promote aversion without affecting locomotion and without major differences along the rostro-caudal axis”

      Additionally, we added the following sentence to the Discussion: “Importantly, our behavioral effects of rostral D1-SPNs in the reward consumption and RTPPA assays could not be explained by reduced locomotor activity. Indeed, optogenetic stimulation of D1-SPNs during the reward consumption task did not reduce locomotion; instead, locomotion was either unchanged or increased in a small cohort of Rostral ChrimsonR mice. The increased locomotion likely reflected appetitive behavior and is consistent with past chemogenetic studies (Zhu et al., 2016). In the RTPPA, no locomotion differences were detected.”

      (3) It would be useful to provide a schematic (or pictures) for the location of fiber implantation in all animals for both photometry and optogenetics.

      We validated optic fiber placement in 14 animals by identifying the coronal section in which the fiber tip was centered and aligning this section to the mouse brain atlas (Franklin and Paxinos, The Mouse Brain in Stereotaxic Coordinates). Representative optic fiber placement and viral spread are shown in the newly added Supplementary Figure S10.

      Minor Comments:

      (1) Figure 6e and g seem mislabeled: "Drd1+ (D2-SPNs)".

      Yes, thank you. We corrected it.

      (2) Lines 395-397: the authors mention minimal Flp leakage, but could it be that low activity of the Stard5 promoter in the core and dorsal striatum allows a small amount of flippase expression that is sufficient for recombination?

      We thank the Reviewer for this insightful point. We cannot fully distinguish between these possibilities in the current study; however, the overall recombination outside the target region remains minimal, supporting the utility of the Stard5-Flp line for selective targeting of the rostral medNAcSh. Injection of a Flp-dependent AAV into the lateral shell, core and dorsal striatum showed no expression, therefore we think this is unlikely. Moreover, this aligns with Stard5 expression patterns derived from the scRNAseq data (Chen et al. 2021), Allen Brain Atlas quantifications (Figure 5) and our RNAscope analysis (Figure 6). Nevertheless, we acknowledge that histology alone cannot definitively exclude this possibility, and quantitative approaches such as qPCR would be required.

      References

      Alpy, Fabien, and Catherine Tomasetto. 2005. “Give Lipids a START: The StAR-Related Lipid Transfer (START) Domain in Mammals.” Journal of Cell Science 118(13):2791–2801. doi:10.1242/jcs.02485.

      Calderon-Dominguez, Maria, Gregorio Gil, Miguel Angel Medina, William M. Pandak, and Daniel Rodríguez-Agudo. 2014. “The StarD4 Subfamily of Steroidogenic Acute Regulatory-Related Lipid Transfer (START) Domain Proteins: New Players in Cholesterol Metabolism.” The International Journal of Biochemistry & Cell Biology 49:64–68. doi:10.1016/j.biocel.2014.01.002.

      Chen, Renchao, Timothy R. Blosser, Mohamed N. Djekidel, Junjie Hao, Aritra Bhattacherjee, Wenqiang Chen, Luis M. Tuesta, Xiaowei Zhuang, and Yi Zhang. 2021. “Decoding Molecular and Cellular Heterogeneity of Mouse Nucleus Accumbens.” Nature Neuroscience 24(12):1757–71. doi:10.1038/s41593-021-00938-x.

      Domingues, Ana Verónica, Tawan T. A. Carvalho, Gabriela J. Martins, Raquel Correia, Bárbara Coimbra, Ricardo Bastos-Gonçalves, Marcelina Wezik, Rita Gaspar, Luísa Pinto, Nuno Sousa, Rui M. Costa, Carina Soares-Cunha, and Ana João Rodrigues. 2025. “Dynamic Representation of Appetitive and Aversive Stimuli in Nucleus Accumbens Shell D1- and D2-Medium Spiny Neurons.” Nature Communications 16(1):59. doi:10.1038/s41467-024-55269-9.

      Keevers, Luke J., and Philip Jean-Richard-dit-Bressel. 2025. “Obtaining Artifact-Corrected Signals in Fiber Photometry via Isosbestic Signals, Robust Regression, and DF/F Calculations.” Neurophotonics 12(02). doi:10.1117/1.NPh.12.2.025003.

      Labouesse, Marie A., Arturo Torres-Herraez, Muhammad O. Chohan, Joseph M. Villarin, Julia Greenwald, Xiaoxiao Sun, Mysarah Zahran, Alice Tang, Sherry Lam, Jeremy Veenstra-VanderWeele, Clay O. Lacefield, Jordi Bonaventura, Michael Michaelides, C. Savio Chan, Ofer Yizhar, and Christoph Kellendonk. 2023. “A Non-Canonical Striatopallidal Go Pathway That Supports Motor Control.” Nature Communications 14(1):6712. doi:10.1038/s41467-023-42288-1.

      Labouesse, Marie A., Maria Wilhelm, Zacharoula Kagiampaki, Andrew G. Yee, Raphaelle Denis, Masaya Harada, Andrea Gresch, Alina-Măriuca Marinescu, Kanako Otomo, Sebastiano Curreli, Laia Serratosa Capdevila, Xuehan Zhou, Reto B. Cola, Luca Ravotto, Chaim Glück, Stanislav Cherepanov, Bruno Weber, Xin Zhou, Jason Katner, Kjell A. Svensson, Tommaso Fellin, Louis-Eric Trudeau, Christopher P. Ford, Yaroslav Sych, and Tommaso Patriarchi. 2024. “A Chemogenetic Approach for Dopamine Imaging with Tunable Sensitivity.” Nature Communications 15(1):5551. doi:10.1038/s41467-024-49442-3.

      Liu, Yiqiong, Ying Wang, Zheng-dong Zhao, Guoguang Xie, Chao Zhang, Renchao Chen, and Yi Zhang. 2024. “A Subset of Dopamine Receptor-Expressing Neurons in the Nucleus Accumbens Controls Feeding and Energy Homeostasis.” Nature Metabolism 6(8):1616–31. doi:10.1038/s42255-024-01100-0.

      Mou, Dachao, Shasha Wu, Yanqiong Chen, Yun Wang, Yufang Dai, Min Tang, Xiu Teng, Shijun Bai, and Xiufeng Bai. 2025. “Roles of PEG10 in Cancer and Neurodegenerative Disorder (Review).” Oncology Reports 53(5):1–9. doi:10.3892/or.2025.8893.

      O’Connor, Eoin C., Yves Kremer, Sandrine Lefort, Masaya Harada, Vincent Pascoli, Clément Rohner, and Christian Lüscher. 2015. “Accumbal D1R Neurons Projecting to Lateral Hypothalamus Authorize Feeding.” Neuron 88(3):553–64. doi:10.1016/j.neuron.2015.09.038.

      Pedersen, Christian E., Raajaram Gowrishankar, Sean C. Piantadosi, Daniel C. Castro, Madelyn M. Gray, Zhe C. Zhou, Shane A. Kan, Patrick J. Murphy, Patrick R. O’Neill, and Michael R. Bruchas. 2022. “Medial Accumbens Shell Spiny Projection Neurons Encode Relative Reward Preference.”

      Pisanello, Ferruccio, Gil Mandelbaum, Marco Pisanello, Ian A. Oldenburg, Leonardo Sileo, Jeffrey E. Markowitz, Ralph E. Peterson, Andrea Della Patria, Trevor M. Haynes, Mohamed S. Emara, Barbara Spagnolo, Sandeep Robert Datta, Massimo De Vittorio, and Bernardo L. Sabatini. 2017. “Dynamic Illumination of Spatially Restricted or Large Brain Volumes via a Single Tapered Optical Fiber.” Nature Neuroscience 20(8):1180–88. doi:10.1038/nn.4591.

      Redmon, Joseph, Santosh Divvala, Ross Girshick, and Ali Farhadi. 2015. “You Only Look Once: Unified, Real-Time Object Detection.”

      Requejo-Mendoza, Nikte, José-Antonio Arias-Montaño, and Ranier Gutierrez. 2025. “Nucleus Accumbens D2-Expressing Neurons: Balancing Reward and Licking Disruption through Rhythmic Optogenetic Stimulation” edited by J. M. Dominguez. PLOS ONE 20(2):e0317605. doi:10.1371/journal.pone.0317605.

      Rodriguez-Agudo, Daniel, Maria Calderon-Dominguez, Miguel Angel Medina, Shunlin Ren, Gregorio Gil, and William M. Pandak. 2012. “ER Stress Increases StarD5 Expression by Stabilizing Its MRNA and Leads to Relocalization of Its Protein from the Nucleus to the Membranes.” Journal of Lipid Research 53(12):2708–15. doi:10.1194/jlr.M031997.

      Stanley, Geoffrey, Ozgun Gokce, Robert C. Malenka, Thomas C. Südhof, and Stephen R. Quake. 2020. “Continuous and Discrete Neuron Types of the Adult Murine Striatum.” Neuron 105(4):688-699.e8. doi:10.1016/j.neuron.2019.11.004.

      Zhang, Yan, Márton Rózsa, Yajie Liang, Daniel Bushey, Ziqiang Wei, Jihong Zheng, Daniel Reep, Gerard Joey Broussard, Arthur Tsang, Getahun Tsegaye, Sujatha Narayan, Christopher J. Obara, JingXuan Lim, Ronak Patel, Rongwei Zhang, Misha B. Ahrens, Glenn C. Turner, Samuel S. H. Wang, Wyatt L. Korff, Eric R. Schreiter, Karel Svoboda, Jeremy P. Hasseman, Ilya Kolb, and Loren L. Looger. 2023. “Fast and Sensitive GCaMP Calcium Indicators for Imaging Neural Populations.” Nature 615(7954):884–91. doi:10.1038/s41586-023-05828-9.

      Zhu, Xianglong, David Ottenheimer, and Ralph J. DiLeone. 2016. “Activity of D1/2 Receptor Expressing Neurons in the Nucleus Accumbens Regulates Running, Locomotion, and Food Intake.” Frontiers in Behavioral Neuroscience 10. doi:10.3389/fnbeh.2016.00066.

    1. Reviewer #2 (Public review):

      This study examines how curl in the retinal flow field can be used as a control variable for estimating and controlling the heading of a moving observer. The basic idea (which is not entirely new, see Matthis et al. 2022) is that translation along a path with eccentric gaze (meaning that the subject is not heading toward the point they are looking at) produces a pattern of optic flow on the retina with a rotational component around the point of fixation (which can be captured by the mathematical "curl" operator). The sign and magnitude of retinal curl vary with heading relative to the point of fixation, such that curl can be used as a control variable to steer rightward or leftward to move toward the fixated target. The authors perform behavioral experiments and show that there are biases in perceived heading that seem to be largely governed by retinal curl. They also show that a simple controller model can use curl to steer toward a target, and they provide a neural network model that provides a biologically plausible implementation of the controller (although there are some questions about that).

      There is a core of interesting work here that I think can be important to the field. However, there is a lack of clarity on several important fronts, including design of the behavioral experiments, presentation of the behavioral data, conceptual framing of what curl can and cannot do, etc. Equally importantly, the manuscript is not written in a manner that will make it accessible to most vision scientists. I consider myself to be pretty knowledgeable about optic flow, and I had to read most of the manuscript 3 or 4 times to be able to understand the bulk of it. And my experience is that most vision scientists do not understand optic flow well, so I fear that most of the readers that the authors should want to reach would struggle to understand the work. As written, this is mainly going to make an impact on a handful of optic flow gurus. Thus, I consider that this manuscript will need a major overhaul to clarify important issues and make it more accessible.

      Major issues:

      (1) The manuscript contains inconsistent, if not misleading, messaging about what information retinal curl does, and does not, provide regarding heading estimation. In the Abstract, the authors state: "We propose an alternative: the visual system utilizes retinal curl directly to estimate heading, rendering the explicit recovery of the FOE unnecessary." Based on my understanding of the rest of the manuscript, I find this statement to be a misrepresentation for two main reasons:

      a) To "directly estimate heading" relative to what? When not qualified, most people interpret "heading" to mean an observer's heading relative to the world (or some allocentric reference frame). But retinal curl only gives information about an observer's heading relative to the point on which their eyes are fixated. Moreover, that point of fixation will change every few hundred milliseconds in natural viewing, so the retinal curl will change with each new fixation even as heading relative to the world remains unchanged. So I think most readers would grossly misinterpret the claim that retinal curl can be used "directly to estimate heading". Indeed, in the authors' controller model, the initial heading needs to be given, and then the controller can work. But from where does the visual system get the initial heading, since it does not come from curl? These issues are left hanging. Thus, while curl can provide a very useful input for steering toward a fixated target, other signals are needed to estimate heading relative to the world. This has to be made much clearer early on, and a conceptual schematic diagram might help. Also, the authors generally do not specify the reference frame of the variables they are talking about, leaving lots of room for misinterpretations. It should be clear each time they are talking about a variable, such as heading, whether it is relative to the fixation target, body, world, etc.

      b) It seems to me that retinal curl will depend on other variables, in addition to heading relative to the fixation target. For example, it seems to me that the magnitude of retinal curl will depend on self-motion speed, the depth structure of the scene, the angle of elevation of the fixated target, and perhaps others. This is not discussed at all, and many readers would get the misguided impression that there is a 1:1 mapping from curl to heading (relative to fixation). If I am right that this is not correct, it means that retinal curl can tell the observer whether to steer right or left to move toward the fixated target, but it cannot tell them how much to steer. Indeed, in the authors' controller model, there is a free parameter that calibrates curl to angle. It makes sense that this works to fit trajectory data that are given from a fixed environment, but it is unclear how the brain would use retinal curl to control steering when these other variables are uncertain or changing unpredictably. Moreover, how does the system change the mapping from curl to steering command as the location of fixation changes relative to the current heading? These are issues that need to be brought up in framing the problem and discussed at some length. If the authors can show mathematically that retinal curl is only dependent on heading (relative to fixation) and not any of these other variables, it would be very valuable to show the equations for this relationship.
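
      One way to make this dependence concrete is a minimal numerical sketch. It rests on assumptions that are not taken from the manuscript: a pinhole eye at height `eye_height` above a flat ground plane, gaze pitched down by `pitch_deg` to fixate a ground point, translation parallel to the ground at azimuth `heading_deg` relative to gaze, an eye rotation that exactly stabilizes the fixated point, and no torsion. All function and parameter names are hypothetical. Under these assumptions the curl at fixation works out to -(speed / eye_height) * sin(heading) * cos(pitch), so it changes with speed, viewing geometry, and fixation elevation as well as with heading.

```python
import math


def retinal_flow(x, y, speed, heading_deg, eye_height, pitch_deg):
    """Image velocity (u, v) at image point (x, y), focal length 1.

    Eye coordinates: x right, y up, z along the line of gaze.
    """
    alpha = math.radians(heading_deg)
    theta = math.radians(pitch_deg)
    # Translation of the eye, expressed in eye coordinates.
    tx = speed * math.sin(alpha)
    ty = speed * math.cos(alpha) * math.sin(theta)
    tz = speed * math.cos(alpha) * math.cos(theta)
    # Inverse depth of the ground plane along the ray through (x, y).
    inv_z = (math.sin(theta) - y * math.cos(theta)) / eye_height
    inv_z0 = math.sin(theta) / eye_height  # inverse depth at fixation
    # Eye rotation that nulls the flow at the fixation point (no torsion).
    wx = ty * inv_z0
    wy = -tx * inv_z0
    u = (-tx + x * tz) * inv_z - wy * (1 + x * x) + wx * x * y
    v = (-ty + y * tz) * inv_z + wx * (1 + y * y) - wy * x * y
    return u, v


def curl_at_fixation(speed, heading_deg, eye_height, pitch_deg, eps=1e-4):
    """Central-difference estimate of dv/dx - du/dy at the image center."""
    args = (speed, heading_deg, eye_height, pitch_deg)
    _, v_right = retinal_flow(eps, 0.0, *args)
    _, v_left = retinal_flow(-eps, 0.0, *args)
    u_up, _ = retinal_flow(0.0, eps, *args)
    u_down, _ = retinal_flow(0.0, -eps, *args)
    return (v_right - v_left) / (2 * eps) - (u_up - u_down) / (2 * eps)


# Same heading (10 deg), different speed and eye height (illustrative values).
base = curl_at_fixation(speed=1.0, heading_deg=10, eye_height=1.6, pitch_deg=20)
faster = curl_at_fixation(speed=2.0, heading_deg=10, eye_height=1.6, pitch_deg=20)
taller = curl_at_fixation(speed=1.0, heading_deg=10, eye_height=3.2, pitch_deg=20)
mirrored = curl_at_fixation(speed=1.0, heading_deg=-10, eye_height=1.6, pitch_deg=20)
```

      In this toy geometry the sign of the curl follows the side of the heading relative to fixation, while its magnitude doubles with speed and halves with eye height, which is the kind of ambiguity raised above. Whether the same holds for the stimuli used in the study would need to be checked against the authors' own equations.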

      (2) The description of the behavioral experiment and presentation of behavioral data leaves a lot to be desired.

      a) First, it is stated (line 158) that "Participants continuously reported their perceived direction of self-motion while maintaining fixation on the yellow dot." Again, the reference frame is completely unspecified. Participants were reporting their perceived heading relative to what? The fixation target? The world? What exactly were the instructions given to the subjects to perform the task? Based on the description of how perceived paths are computed (line 166-), it seems to be presumed that subjects are reporting their heading relative to the world because those angles are then converted into x and z coordinates in what I presume is a world-centered reference frame. But how do we know that subjects are accurately reporting their heading relative to the world? What if they are biased in their reports by the location of the fixation target relative to the scene, or by some other reference signal? Is it possible for the authors to rule out the possibility that perceptual biases seen in the unaltered curl condition result from observers not fully adopting the assumed reference frame of the task? If this cannot be firmly excluded, it seems to create problems for the rest of the study.

      b) I also feel that there is a mismatch between what the behavioral task requires and what the controller model does. Subjects are apparently asked to report their heading relative to the world, but the controller model only controls their heading relative to the point that they are fixating. I understand how this is resolved in the model, but I think this type of distinction is buried and will not be apparent to most readers. Again, the reference frames of what is being measured and controlled need to be specified explicitly in all parts of the paper, and the authors need to explain how the system would combine curl-based control with some other measures of (at least initial) heading for world-centered heading to be computed. All of the assumptions need to be clearly specified.

      c) In addition, I found it frustrating that the authors never present raw perceptual data from the observers. Rather, in Figure 2, we see reconstructed trajectories that are perfectly smooth with no indications of noise whatsoever. Since these paths are computed from the perceptual reports, there must be some noise inherent in them. The figures should represent this uncertainty somehow, and it should be explained how these perfectly smooth trajectories are obtained.

      (3) "...the magnitude of retinal curl in the fovea can specify the body trajectory relative to gaze (Matthis et al., 2022)." The main idea put forward by the authors here seems to overlap heavily with this statement that they attribute to Matthis et al. 2022. While I think this paper still adds importantly to the topic, the authors do not discuss how their findings are different from those of Matthis et al. 2022, why they are an important extension, etc. Readers should not have to go read this other paper to have any idea how the present findings are placed in importance relative to the literature.

      (4) The analysis and treatment of eye movements is extremely weak. The authors discarded trials for which gaze deviated from the fixation point by more than 3 degrees (which is a LOT given that the eye speeds are generally in the neighborhood of 0.5 deg/sec), and they provide basic stats on the distribution of positions. But this largely misses the point: it is not small position errors that are likely to matter, but rather velocity errors. Even a small amount of retinal slip of the target while it is being pursued will cause image motion that is going to alter the optic flow field around the fixation target. So, for example, the retinal curl field may no longer be centered on the fixation target. How do we know that some of the perceptual biases are not influenced by image motion resulting from imperfect tracking of the fixation target? This needs to be analyzed and discussed.

      (5) I found the sections of text comparing the separate and joined fits (starting line 287) to be a bit too rosy. The authors show the separate fits in the main text, and it is not very surprising that these fits are good, given that the model has 30 parameters, and these data are pretty low-dimensional. The authors only show the joined fits in the supplement, and they say that they are almost as good as the separate fits (indeed, they are better in a model comparison sense, but this is 30 parameters vs. 2 parameters). However, when I look at the fits of the joined model in the supplement, I don't find them to be very impressive. In particular, the model grossly misses the data for the straight paths for several subjects (e.g., id5, id6, id8, id10). And fitting the straight paths would presumably be easiest. This implies that the joined model is really missing something and that fitting the curved paths interacts strongly with fitting the data for different fixation target locations on the straight path. I think that the authors should discuss the results a bit more soberly and tone down their conclusions here.

      (6) The section of the paper on neural simulations (starting line 387) has a few weaknesses. First, why are only straight paths simulated here? This does not seem to provide a very rigorous test of the model. Second, it is awkward that the simulation results are presented in units of pixels, rather than degrees. Third, the authors seem to downplay the fact that the neural estimates of heading seem to oscillate rather wildly (over a range of hundreds of pixels, whatever that means, see especially Figure S16). It was far from clear to me how an estimate of heading with these large oscillations is useful. It would seem to require that heading estimates are integrated over substantial lengths of time to be reliable. It was therefore unclear how the model produces such smooth paths from these oscillating estimates.

  4. social-media-ethics-automation.github.io
    1. One idea from this chapter that stood out to me is the tension between authenticity and anonymity online. I found it interesting that the chapter suggests anonymity can sometimes support authenticity rather than undermine it. At first, that feels counterintuitive because we usually think of anonymous accounts as less trustworthy or even deceptive. But thinking about it more, I agree that anonymity can allow people to express parts of their identity they might hide in real life due to fear of judgment or consequences. For example, people may share honest opinions, personal struggles, or marginalized identities more openly when they are anonymous. At the same time, this creates a difficult balance for platforms, because anonymity can also enable harmful behavior. It makes me wonder: is it even possible for a platform to encourage “authenticity” without limiting anonymity, or are those two goals always in tension?

    1. Applied AI Literacies in Information Practice. (Exceptions can be made in the next question.) I would like my content included in the book titled AI in Information Work, which may be made available and known to broad audiences in the future.

      I like that we are given this option while being required to include only some, not all, of the work we have done throughout the course. I think it will be interesting to post different perspectives and to share them while receiving others' as well! Excited to see the final deliverable.

    1. Author response:

      Reviewer 1:

      Porte et al. investigate how observers form confidence judgments about the presence vs absence of near-threshold audiovisual stimuli. In two psychophysical detection experiments, human participants judged whether a stimulus (visual, auditory, or audiovisual) was present or absent, reported amodal confidence, and then gave modality-specific detection and confidence ratings using a bidimensional scale. The authors report that audiovisual (AV) stimuli are detected more accurately than unimodal stimuli, but that multisensory stimulation does not improve metacognitive efficiency. Participants are more confident in absence than in presence judgments. They extend a previously proposed model to an audiovisual setting, assuming evidence is available only for presence and that absence is inferred via counterfactual detectability. Detection is modeled with a disjunctive integration rule across modalities, while confidence is explained by a combination of conjunctive (for presence) and disjunctive/negation-of-disjunction (for absence) rules.

      We thank the reviewer for thoroughly evaluating our work.

      There are several points I wish to have clarified, outlined below:

      (1) Framing of bimodal vs unimodal detection

      On p.3, the introduction states that "Adults typically show higher detection rates and faster reaction times for bimodal than for unimodal stimuli." This is broadly consistent with the literature, but as written, it obscures the fact that these effects depend critically on experimenter-defined stimulus strengths. It is trivial to construct cases where a strong unimodal stimulus is more detectable than a bimodal stimulus made of two very weak unimodal stimuli. If "bimodal" is understood as the co-presentation of two unimodal components matched in detectability, then Bayes-rule-based arguments indeed predict better detection for the bimodal case; how much better is theoretically interesting, but not quantified in this paper. There is an entire literature on the combination of two unimodal stimuli, which is not touched on. For a pertinent reference, see Ernst & Banks 2002. I recommend clarifying that the statement assumes comparable unimodal intensities.
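
      To illustrate why matching unimodal intensities matters, here is a minimal sketch of one standard benchmark, assuming independent, equal-variance Gaussian noise in each modality and optimal linear combination of the two cues. Under those assumptions the bimodal sensitivity is the quadratic sum of the unimodal sensitivities. The function name and the d′ values are hypothetical and are not taken from the paper.

```python
import math


def combined_d_prime(d_auditory, d_visual):
    """Bimodal d' under optimal combination of two independent Gaussian cues."""
    return math.hypot(d_auditory, d_visual)


# Two matched unimodal components: the bimodal stimulus is more detectable.
matched = combined_d_prime(1.0, 1.0)

# Two very weak components combined, compared with one strong unimodal stimulus.
weak_bimodal = combined_d_prime(0.5, 0.5)
strong_unimodal = 2.0
bimodal_wins = weak_bimodal > strong_unimodal
```

      With matched components the predicted gain is a factor of the square root of two, whereas the weak bimodal stimulus remains less detectable than the strong unimodal one, so the bimodal advantage is a statement about comparable intensities.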

      We will clarify that when discussing bimodal stimuli, we mean the co-presentation of two unimodal stimuli of similar intensity. We will add references to the literature on discrimination tasks showing that multisensory cue combination follows Bayes-rule integration (e.g., Ernst & Banks, 2002; Battaglia et al., 2003; Alais & Burr, 2004) and clarify how our work differs from this rich body of work and what novel contributions it provides.

      (2) Relationship to signal detection theory and counterfactual perceptibility

      In the introduction, the authors write, "If sensory evidence is only available for presence," motivating counterfactual perceptibility as a necessary ingredient to infer absence. However, standard signal detection theory (SDT) already provides a widely accepted framework in which a continuous internal response is present on both signal and noise (absent) trials, with absence corresponding to the noise distribution and decisions implemented by a criterion. Thus, there is no logical need to invoke counterfactual perceptibility simply to define absence; rather, the Mazor-style framework adds an explicit belief model about detectability and an optimal stopping policy. It would strengthen the paper to more clearly state how the proposed model goes beyond SDT conceptually, acknowledge that SDT can account for presence/absence decisions without counterfactuals, and position the counterfactual account as a hypothesis about how observers actually compute absence/confidence, not as a necessity.

      One of the central claims of the paper is that detection in the case of absence requires counterfactual reasoning. The authors should demonstrate whether or not an SDT-based generative model can describe these amodal and uni- and bi-modal stimulus decisions. In such a model, the noise distribution would be shared across conditions, and unimodal vs bimodal differences would be captured by changes in the mean or variance of the signal+noise distribution.
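
      A minimal sketch of such a model is given below, assuming equal-variance Gaussian evidence with unit standard deviation, a noise distribution centered at zero that is shared by all conditions, and a single criterion. The function names and the signal means are hypothetical placeholders, not fitted values; the 10% false-alarm rate is only a rough figure of the kind mentioned in this review.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # unit-variance internal noise, shared across conditions


def criterion_from_false_alarms(false_alarm_rate):
    """Criterion location (relative to the noise mean) implied by a false-alarm rate."""
    return -STD_NORMAL.inv_cdf(false_alarm_rate)


def present_rate(signal_mean, criterion):
    """P(respond 'present') when evidence ~ N(signal_mean, 1) and
    the observer responds 'present' whenever evidence exceeds the criterion."""
    return 1.0 - STD_NORMAL.cdf(criterion - signal_mean)


def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity recovered from hit and false-alarm rates."""
    return STD_NORMAL.inv_cdf(hit_rate) - STD_NORMAL.inv_cdf(false_alarm_rate)


# One criterion for all trial types, set by an illustrative false-alarm rate.
criterion = criterion_from_false_alarms(0.10)

# Hypothetical signal means; only the mean differs between conditions.
signal_means = {"absent": 0.0, "auditory": 1.0, "visual": 1.5, "audiovisual": 2.1}
predicted = {name: present_rate(mu, criterion) for name, mu in signal_means.items()}
```

      Fitting the condition means (and, if needed, variances) to the observed response rates and confidence reports would turn this sketch into the comparison model requested here.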

      We will clarify that our framework explains how absence judgments (and related confidence) are formed, and what it adds to SDT models, including the reproduction of reaction times and a normative explanation of criterion placement (results about RTs are available in the supplementary materials). We will also run additional model comparisons assessing how an SDT-based generative model performs compared to our Bayesian model based on counterfactual perceivability.

      (3) Confidence vs performance: is AV confidence special?

      The paper's central claims about multisensory confidence and metacognition would be stronger if the authors showed that AV confidence deviates from what is expected given performance alone. From the reported results, AV accuracy is around 80%, with visual and auditory at about 60% and 40%, respectively. Given that confidence typically scales monotonically with accuracy, the first question is whether AV confidence is entirely explained by improved performance, or whether there is an additional multisensory contribution. A simple, informative analysis would be to plot, for each subject, mean confidence vs percent correct for the AV, V, A, and absent conditions, and to test whether AV confidence lies above the trend predicted by accuracy alone.

      This is an excellent suggestion, and we will conduct the proposed analysis.

      (4) Metacognitive measures: logistic regression slopes vs meta-d′/d′

      In the "Multisensory effects on metacognitive performance" section, the authors define "metacognitive sensitivity" as the slope of a Bayesian logistic regression predicting accuracy from confidence. There is substantial literature showing that logistic-slope measures of metacognitive sensitivity are criterion-dependent and can be affected by both task and confidence criteria (for one example, see Rausch & Zehetleitner, 2017). In contrast, meta-d′/d′ was specifically developed to provide a bias-invariant measure of metacognitive efficiency, though this, too, is dated (see Boundy-Singer et al., 2023). Given that the authors already estimate HMeta-d-based M-ratios, it is unclear why they rely on logistic regression slopes as their primary "metacognitive sensitivity" metric in Figure 4A. I suggest either replacing the logistic-slope metric with SDT-based measures (meta-d′, meta-d′/d′) or providing a clear justification for using logistic slopes, along with a discussion of their known limitations.

      Additionally, Figure 3 reports M-ratios without showing the corresponding d′ or meta-d′ for judge-present vs judge-absent conditions. Presenting these would help contextualize the metacognitive efficiency results and clarify whether differences are driven mainly by changes in metacognitive sensitivity, changes in task performance, or both. The d' values per condition could be added to Figure 2A.

      All typical measures of metacognitive sensitivity are influenced by metacognitive bias and task performance to some extent, and none of them is a pure measure of type-2 sensitivity (e.g., see Rahnev, 2025). Here, we chose logistic regression because it enables modeling interactions with other predictors in a factorial design with a limited number of trials.

      We will clarify the limitations of metacognitive sensitivity measures and better explain why we then used the M-ratio to estimate metacognitive performance while controlling for underlying task performance.

      Thank you for this suggestion. We will add the d’ values per condition to Figure 2A.

      (5) Interpretation of confidence in absence vs presence

      The authors emphasise that it is surprising subjects are more confident in absence than in presence judgments, both at amodal and modality-specific levels. However, Figure 2B suggests that absent responses are very accurate: absent is reported as present only in about 10% of absent trials, implying a high correct rejection rate. If confidence tracks outcome probability, higher confidence for absence may be at least partly expected. Before attributing this asymmetry primarily to counterfactual reasoning, it would be important to explicitly relate confidence to accuracy for hits, misses, false alarms, and correct rejections and show whether absence confidence remains elevated relative to presence after controlling for accuracy differences across judgment types and conditions. Without this, the interpretation that higher absence confidence is inherently "unexpected" seems overstated.

      This higher confidence for absence judgments than for presence judgments was observed while controlling for response accuracy. We will clarify this in the main text.

      (6) Model: integration rules, confidence, and evidence strength

      The modeling section extends the Mazor et al. ideal observer to two modality-specific sensors, with disjunctive integration for detection and then disjunctive vs conjunctive integration rules for confidence. I have a few comments.

      First, the detection rule is disjunctive and is reported as a finding. However, the conclusion that detection relies on a disjunctive rule ("present if A or V") closely mirrors the task instructions: participants are explicitly told to respond "present" if they detect the stimulus in any modality. As such, this seems more like a sanity check than a novel empirical finding. Relatedly, the conjunctive detection rule is a weak null. The conjunctive rule ("present only if both A and V") is behaviorally implausible given the task instructions. A more informative baseline would be an SDT-style scalar-evidence model (see comment 2), rather than a conjunctive rule that participants would have to actively violate the instructions to follow.
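
      The gap between the two rules can be seen in a minimal sketch, assuming the two modality-specific detections on an audiovisual trial are statistically independent. The function names are hypothetical, and the per-modality probabilities simply reuse the approximate unimodal accuracies quoted in comment 3 as illustrative inputs; this is not the authors' model.

```python
def disjunctive_rate(p_auditory, p_visual):
    """P('present') if the observer responds 'present' when A or V is detected."""
    return 1.0 - (1.0 - p_auditory) * (1.0 - p_visual)


def conjunctive_rate(p_auditory, p_visual):
    """P('present') if the observer responds 'present' only when both are detected."""
    return p_auditory * p_visual


# Illustrative per-modality detection probabilities on an audiovisual trial.
p_a, p_v = 0.4, 0.6
either = disjunctive_rate(p_a, p_v)
both = conjunctive_rate(p_a, p_v)
```

      Under independence the conjunctive rule predicts bimodal detection below either unimodal rate, which is why it makes a weak null against data showing a bimodal advantage.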

      Second, confidence in the model is defined as the probability of being correct at the time of the detection decision. However, this implies a fixed amount of evidence at decision time unless additional mechanisms are invoked. This issue is well known in diffusion modeling (see Kiani et al. 2014) and deserves explicit discussion; otherwise, it is unclear how the model produces graded confidence from a bound-crossing rule alone.

      Third, the authors do not consider a straightforward evidence-strength account of confidence. When both modalities indicate presence, there is, on average, more total sensory evidence than in unimodal trials, making correct decisions more likely and, under most frameworks, confidence higher. Likewise, weak evidence in both modalities can be stronger evidence for absence than moderate in one and weak in the other. Many of the patterns that motivate the presence-conjunctive/absence-disjunctive mix could arise from a model where confidence simply reflects the amount of evidence for the chosen option, without positing distinct logical integration rules for presence vs absence. As the authors note, purely disjunctive or purely conjunctive confidence rules fail to capture the trends in confidence reports in Figure 7, leading them to adopt a combined presence-conjunctive/absence-disjunctive rule. A more parsimonious alternative, that confidence scales with evidence magnitude and cross-modal agreement, should be explicitly considered and, ideally, implemented as a competing model. Finally, if the model is intended as a good account of the data, it would be useful to report whether it also reproduces the metacognitive efficiency patterns (M-ratios) beyond the mean confidence patterns shown in Figures 7-8. At present, the model appears systematically over-confident, which should be acknowledged and quantified.

      Indeed, the disjunctive rule was expected, given our design; we will clarify this. As mentioned above, we will directly compare the results of our current model with those of a more traditional SDT-based generative model, as suggested by the reviewer.

      Unlike a classical drift diffusion model, the model does not assume a fixed decision boundary, but derives an optimal stopping policy per time point and belief state. As a result, and depending on beliefs about perceptual evidence and the temporal discounting factor, optimal decision boundaries can be asymmetric and may collapse asymmetrically toward 0. Furthermore, given the asymmetry in the information value between sensor activations and inactivations, and differences in the information value of sensor activations of the two modalities, boundary crossing can lead to belief states that are far from or close to the decision boundary, depending on the nature of the evidence. Together, even without explicit modeling of post-decisional evidence, the model can account for variability in the total accumulated evidence at decision time.

      From our understanding, the proposed alternative is equivalent to our current model, in which confidence scales with evidence magnitude.

      The model was not fitted to confidence data, which could explain its overall overconfidence. To further test our model, we will assess its ability to reproduce patterns of metacognitive efficiency (M-ratios).
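
      The two logical integration rules discussed in comment (6) can be contrasted with a toy calculation. This is a minimal sketch that assumes two independent sources, each summarised by a single probability of signalling presence; it is not the authors' ideal-observer model, and the function names and input values are hypothetical.

```python
def disjunctive(p_a, p_v):
    """Probability that at least one of two independent sources signals presence."""
    return 1.0 - (1.0 - p_a) * (1.0 - p_v)

def conjunctive(p_a, p_v):
    """Probability that both independent sources signal presence."""
    return p_a * p_v

# Hypothetical per-modality probabilities, chosen only for illustration.
p_auditory, p_visual = 0.6, 0.7
p_or = disjunctive(p_auditory, p_visual)   # approximately 0.88
p_and = conjunctive(p_auditory, p_visual)  # approximately 0.42
```

      Under the independence assumption, the disjunctive value is never below the larger input and the conjunctive value is never above the smaller one, which is why the two rules predict different confidence on bimodal trials.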

      (7) Confidence asymmetry index (CAI) and modality weighting

      The confidence asymmetry index (CAI) is defined as the difference between auditory and visual confidence on AV vs absent trials, and the authors report strong correlations between observed and simulated CAI across participants. They interpret this as evidence that subjects place different weights on auditory vs visual signals. Several questions arise. First, does CAI capture asymmetries beyond what is expected from accuracy differences between modalities and conditions? Second, because the simulated data are generated from model fits to the observed data, a correlation between observed and simulated CAI is expected: the model is built to reproduce the individual patterns it is then compared to. A stronger test would compare CAI from data simulated with modality-specific belief parameters, versus CAI from data simulated with constrained equal belief parameters (same θs). Relatedly, the paper would benefit from a plot showing the distribution of θs for A- and V-present stimuli across subjects. These values could also be related to unimodal sensitivity measured in the calibration/training phases. A natural prediction is that higher unimodal sensitivity should correspond to higher belief parameters for presence.

      The model was not fitted to either the modality-specific responses or the confidence ratings, so the correlation between observed and simulated CAI was not expected and provides a good test of our model's ability to reproduce the observed patterns. We will test whether the same correlations hold when using the difference in accuracy instead of the confidence.

      We found that the best model is the one with the same belief across the visual and auditory sensors. Given this, we cannot investigate how modality-specific belief parameters are linked to unimodal sensitivity for each participant.

      Reviewer 2:

      Summary:

      In this study, across two experiments, the authors wrestle with the question: What is the profile of confidence judgments in presence/absence decisions for audiovisual stimuli? After thresholding observers to 50% target detection rates in each modality, the authors conducted one experiment that included 75% target presence (spread equally across bimodal, auditory, and visual targets) and one experiment with 50% overall target presence. Results showed that, overall, detection performance was higher for audiovisual stimuli compared to unimodal ones, and that a recent model for stimulus detection could be extended to this multisensory scenario. By incorporating a disjunctive rule for absence judgments and a conjunctive rule for presence judgments, the model was able to qualitatively reproduce some of the trends observed in the human data regarding confidence.

      Strengths:

      (1) The paper makes novel contributions to the study of multisensory confidence judgments for yes/no target detection.

      (2) The paper further extends the use of a leading model of stimulus detection (from Mazor et al., 2025).

      (3) Pre-registration of the study was implemented, and the code is publicly available (although the GitLab link requires registration to access the materials).

      (4) One of the empirical results (higher confidence for absence compared to presence judgments) is especially interesting, contributing another empirical finding to a very mixed literature on this topic (as the authors note).

      We thank the reviewer for the positive evaluation of our work.

      Weaknesses:

      (1) Page 5 - I have concerns about the use of the equal-variance model from Signal Detection Theory to analyze the data. For example, the authors should read the recent paper by Miyoshi, Rahnev, and Lau in iScience, found at this link: https://www.cell.com/iscience/fulltext/S2589-0042(26)00373-1 . In this paper, the authors note how the equal variance model should be used with caution in yes/no detection tasks, since the variances of the "stimulus present" and "stimulus absent" distributions are often different from one another. In a revision, I highly recommend that the authors explicitly discuss this paper and review whether the assumptions for the equal-variance model have been met (e.g., since they have confidence data, one way to do this would be to evaluate if the slope of the line in zROC space differs from 1). The authors may also want to incorporate methods from this iScience paper into the current manuscript, or potentially move to using an unequal variance SDT model and compute d'a and c'a.

      This is an excellent suggestion. We will run this analysis and refit the d’ and criterion response using unequal-variance models to see whether we observe the same results.
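
      The zROC check suggested by the reviewer can be sketched as follows. In Gaussian signal detection theory, the slope of the zROC equals the ratio of the noise standard deviation to the signal standard deviation, so a slope near 1 is consistent with equal variance. The code is an illustrative sketch, not the authors' analysis: the rating counts are hypothetical, the fit is a plain least-squares line rather than a maximum-likelihood SDT fit, and every cumulative rate must lie strictly between 0 and 1.

```python
from statistics import NormalDist

def zroc_slope(present_counts, absent_counts):
    """Least-squares slope of z(hit rate) on z(false-alarm rate).

    Counts are ordered from the most confident "absent" rating to the most
    confident "present" rating. Each boundary between adjacent ratings
    yields one (false-alarm rate, hit rate) point.
    """
    inv = NormalDist().inv_cdf
    n_present, n_absent = sum(present_counts), sum(absent_counts)
    z_hit, z_fa = [], []
    for k in range(1, len(present_counts)):
        z_hit.append(inv(sum(present_counts[k:]) / n_present))
        z_fa.append(inv(sum(absent_counts[k:]) / n_absent))
    mean_x = sum(z_fa) / len(z_fa)
    mean_y = sum(z_hit) / len(z_hit)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(z_fa, z_hit))
    sxx = sum((x - mean_x) ** 2 for x in z_fa)
    return sxy / sxx

# Hypothetical, mirror-symmetric rating counts: the slope comes out close to 1.
toy_slope = zroc_slope([10, 20, 30, 40], [40, 30, 20, 10])
```

      A slope clearly below 1 would indicate a wider "stimulus present" distribution and would motivate the unequal-variance measures mentioned in the comment.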

      (2) Related to the computation/measurement of the response criterion, the authors note on page 18 in the Methods that for Experiment 1, signals are actually present on 75% of trials, since a bimodal stimulus is present on 25% of trials, the visual circle only occurs on 25% of trials, the sinusoidal tone occurs on 25% of trials, and then only noise is present on 25% of trials. Did the authors have any a priori hypotheses about the response criteria that participants would exhibit in Experiment 1, considering the unbalanced target presentation rate in this task? Also, in Experiment 2, what did it mean to equate target present and target absent trials? Is it that they broke 50% target present trials down into 16.67% bimodal targets, 16.67% visual targets, and 16.67% auditory targets? A few more details would be good to explicitly note for those trying to replicate the task.

      We will clarify this point in the manuscript. In Experiment 2, the stimulus was absent on 50% of the trials. As a result, the 50% of stimulus present trials were split into the three possible conditions, resulting in a sixth of the trials being auditory, a sixth visual, and a sixth audiovisual; we will make these proportions clearer in the text.

      We did not have any a priori hypotheses about the response criteria for Experiment 1. The reviewer is right that the proportion of absent versus present trials can affect response bias. In fact, one of the goals of Experiment 2 was to test whether the low frequency of absent trials relative to present ones could explain both the response bias and the higher confidence in absence observed in Experiment 1. We found that this was not the case, as we did not observe a difference between the two experiments. We will clarify this in our revision.
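
      The trial proportions of the two experiments, as described in this exchange, can be written out explicitly. The helper below is a hypothetical illustration that assumes the stimulus-present trials are split equally across the three present conditions.

```python
from fractions import Fraction

def condition_proportions(p_absent, present_conditions=("auditory", "visual", "audiovisual")):
    """Split the stimulus-present share equally across the present conditions."""
    share = (1 - p_absent) / len(present_conditions)
    proportions = {"absent": p_absent}
    proportions.update({name: share for name in present_conditions})
    return proportions

exp1 = condition_proportions(Fraction(1, 4))  # 25% absent, 25% per present condition
exp2 = condition_proportions(Fraction(1, 2))  # 50% absent, one sixth per present condition
```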

      (3) It is important to plot the individual data for Figure 2. If the authors didn't match detection performance for the visual and auditory modalities, it would be good to see the individual data to know why. Is it that the thresholding procedure didn't work for some of the participants in the visual modality, and that's why the "yes" response rate is (on average) ~60% or higher across the two experiments? Similarly, in the auditory domain, do the authors have participants that are at floor? Or is it simply that the staircases failed to successfully target 50% detection on average?

      We will add individual data to Figure 2.

      Indeed, staircases failed to achieve 50% detection on average; participants for whom psychometric curves did not converge were excluded, as were those at floor level in one of the two modalities.

      (4) The authors mentioned that data were collected on the Prolific platform. What checks did they conduct to ensure that this data wasn't produced by bots? There are recent high-profile publications in PNAS and Behavioral Research Methods that indicate how online data collection is problematic (e.g., https://www.pnas.org/doi/10.1073/pnas.2535585123 and https://link.springer.com/article/10.3758/s13428-025-02852-7 ). What analyses or quality checks are there to ensure that humans were the ones completing the task?

      Data were collected on the Prolific platform, which has been shown to yield high-quality data (Kay, 2025). However, we agree that this is a potential concern and will add a note of caution in the revised manuscript, although the risk that the data come from bots rather than humans is low (Huskey et al., 2026; Chetverikov, 2026).

      (5) Page 7 - Since confidence was collected on a continuous scale, the authors should say a bit more about how they were able to compute measures of metacognitive efficiency. My understanding is that to compute meta-d', the data has to be binned. How was the binning implemented? With whatever bin size the authors chose, would it make any difference to the results if they changed the number of the bins in the analysis?

      We will clarify this aspect of the analysis. Data were binned into four quartile bins computed from the overall distribution of confidence values across participants, following the binning used in the example in Fleming (2017). We will examine whether changing the number of bins changes the results (Dayan, 2023).
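
      The binning step described above can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function name, the toy values, and the choice of the "inclusive" quantile method are assumptions.

```python
from bisect import bisect_right
from statistics import quantiles

def bin_confidence(pooled_confidence, n_bins=4):
    """Map continuous confidence values to integer bins 1..n_bins.

    Cut points are quantiles of the pooled distribution; a value equal to a
    cut point is assigned to the upper bin.
    """
    cuts = quantiles(pooled_confidence, n=n_bins, method="inclusive")
    return [bisect_right(cuts, value) + 1 for value in pooled_confidence]

# Hypothetical confidence ratings pooled across participants.
toy_confidence = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
quartile_bins = bin_confidence(toy_confidence)        # four bins
median_split = bin_confidence(toy_confidence, n_bins=2)  # robustness check with two bins
```

      Re-running the meta-d' estimation with different values of `n_bins` is one way to carry out the robustness check the reviewer asks for.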

      (6) Page 8 - Is there a prior precedent for using slope of the Bayesian logistic regression predicting accuracy from confidence as a measure of metacognitive sensitivity? If so, can the authors cite those papers as a reference? If not, can they place this analysis within the context of other measures of metacognitive sensitivity that exist? (meta-d', AUROC (Type 2), etc.)

      Yes, logistic regression has been used to quantify metacognitive sensitivity before. We will add the relevant papers as references (e.g., Sandberg et al., 2010; Norman et al., 2011; Siedlecka et al., 2016; Wierzchoń et al., 2012; Faivre et al., 2018; Pereira et al., 2023).
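
      The idea behind the regression-based measure is that a steeper slope of accuracy on confidence indicates that confidence discriminates better between correct and incorrect responses. The sketch below uses a plain maximum-likelihood fit by gradient ascent as a stand-in; the authors use a Bayesian logistic regression, and the data and function name here are hypothetical.

```python
import math

def logistic_slope(confidence, correct, n_iter=5000, lr=0.1):
    """Fit P(correct) = sigmoid(b0 + b1 * confidence) by gradient ascent
    on the mean log-likelihood and return (b0, b1)."""
    b0, b1 = 0.0, 0.0
    n = len(confidence)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for x, y in zip(confidence, correct):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p          # gradient with respect to the intercept
            g1 += (y - p) * x    # gradient with respect to the slope
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Hypothetical trials: accuracy tends to increase with confidence.
toy_conf = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
toy_correct = [0, 0, 1, 0, 1, 0, 1, 1, 1, 1]
intercept, slope = logistic_slope(toy_conf, toy_correct)
```

      A positive `slope` means higher confidence goes with a higher probability of being correct; a slope near zero means confidence carries little information about accuracy.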

      (7) Page 8 - Another one of the results on page 8 is worth reflecting further upon: the authors note how in Experiment 1, no credible difference was found between unimodal and bimodal trials (DeltaM = -0.25 [-0.59, 0.10]), but in Experiment 2, "we observed higher metacognitive efficiency in unimodal compared to bimodal trials (DeltaM = -0.28 [-0.54, -0.02])". Those DeltaM values are nearly identical, so without a power analysis motivating the number of participants the authors collected, how certain are they that the results from these two experiments are really that distinct? It reminds me a bit of the Andrew Gelman blog post, "The difference between significance and non-significance is not significant".

      The number of participants was determined using a Bayesian optional stopping rule, as preregistered. The reviewer is right that the delta values are very similar in the two experiments. Given that a difference was found in only one experiment, we decided not to draw conclusions from it.
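
      The reviewer's point can be made concrete with a rough calculation on the two reported estimates. The sketch assumes, without confirmation from the source, that the bracketed intervals are 95% intervals, that the posteriors are approximately normal, and that the two experiments are independent; it is a back-of-the-envelope approximation, not a reanalysis.

```python
import math

def approx_sd(lower, upper, z=1.96):
    """Standard deviation implied by a symmetric normal interval."""
    return (upper - lower) / (2 * z)

def difference_interval(est1, ci1, est2, ci2, z=1.96):
    """Approximate interval for est1 - est2 under independence and normality."""
    sd_diff = math.hypot(approx_sd(*ci1, z), approx_sd(*ci2, z))
    diff = est1 - est2
    return diff, diff - z * sd_diff, diff + z * sd_diff

# DeltaM values quoted in comment (7): Experiment 1 and Experiment 2.
diff, lower, upper = difference_interval(-0.25, (-0.59, 0.10), -0.28, (-0.54, -0.02))
```

      Under these assumptions the difference between experiments is about 0.03, with an approximate interval of roughly -0.40 to 0.46, so the two experiments do not provide evidence of different effects.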

      (8) Is there any way to look at whether the presence of multisensory hallucinations (or perhaps that word is too strong, and we should simply consider them miscategorizations) increased as the task progressed? That is, the authors have repeated presentations of audiovisual stimuli for at least some percentage of the trials. Since the percentages for auditory stimuli being correctly categorized as auditory are at 85% in Experiment 1 and 79% in Experiment 2, were the trials where they miscategorized these stimuli equally spread throughout the task? Or did they come later in the experiment, after being repeatedly exposed to multisensory trials?

      We will examine how the proportion of miscategorisation changed throughout the task.

      (9) Would the authors obtain the same results if they got rid of the amodal confidence judgment in their task, and simply had participants report the bimodal confidence following the presence/absence judgment? Part of the reason for asking this is that, according to page 11, the model is only fitted to amodal detection accuracy and response time data. This surprised me. I would have expected that the bimodal confidence would provide more useful information for the model fit. The authors should further explain this rationale in the paper. It seems odd to me to have the multisensory confidence ratings and not have them play a central role in the modeling work.

      Our main goal was to investigate how participants form integrated, supramodal confidence judgments on the basis of multisensory sources of information. Therefore, the amodal confidence judgments are required here.

      Moreover, the model was fitted to response times that corresponded to the amodal judgment. Because we had no meaningful response times for the modality-specific judgment, we could not use them to fit the model.

      (10) In Figure 6, it appears the model is a bit off in its estimate of auditory responses (panel B, E) in the AV condition. Do the authors have any intuitions about why this might be happening?

      Indeed, the model does not capture the full behavioral effects reflecting multisensory interference in the modality-specific responses. We suppose that the model does not reproduce this interference because it is fitted only to amodal detection accuracy and because the two sensors are completely independent of one another. We will clarify this aspect in the text.

      (11) The authors talk about how the model is reproducing effects in the human data, but there's no systematic comparison, quantitatively, of how the two things relate. The authors should include some quantitative measure that reflects this.

      In addition to the d’ and criterion comparison between the observed and simulated data, we will compare modality-specific d’ and the correlations between observed and simulated confidence.

      (12) Related to this, I am not sure I agree with the characterization in Figure 7 that "when confidence followed a disjunctive rule, the model failed to capture important aspects of the data. On the other hand, when confidence followed a conjunctive rule, it reproduced confidence in presence judgments but failed to capture variability in confidence ratings for absence judgments." What, quantitatively, is the basis of this claim? This applies to Figure 8, too. I am not clear how, specifically, and quantitatively, the authors are justifying their claims about model fits. I don't think the confidence asymmetry index in Figure 8 is enough to quantify the quality of the model fitting procedure.

      To further support this claim, we will add a quantitative comparison of the different confidence fits.

      (13) Is there any chance the higher metacognitive efficiency for auditory trials is simply driven by differences in the d' values across the modalities? It might be good to probe this effect further.

      Thank you for this remark. Indeed, the difference in metacognitive efficiency may be driven by differences in the d’ values, and so a lower d’ for auditory stimuli can lead to higher metacognitive efficiency for a similar metacognitive sensitivity.
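
      The dependence described above follows from the usual definition of metacognitive efficiency as the ratio meta-d'/d'. The sketch uses hypothetical values (not estimates from the study) to show that the same meta-d' gives a higher M-ratio when d' is lower.

```python
def m_ratio(meta_d_prime, d_prime):
    """Metacognitive efficiency as the ratio meta-d' / d'."""
    if d_prime <= 0:
        raise ValueError("d' must be positive for the ratio to be meaningful")
    return meta_d_prime / d_prime

# Hypothetical values: identical meta-d', different first-order sensitivity.
auditory_efficiency = m_ratio(meta_d_prime=1.0, d_prime=1.0)
visual_efficiency = m_ratio(meta_d_prime=1.0, d_prime=1.5)
```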

      Reviewer 3:

      This study used a pre-registered novel behavioural paradigm and computational modelling to investigate multi-sensory influences on detection and confidence. Participants performed amodal detection of auditory and visual stimuli (indicating that a stimulus was there when either an auditory stimulus or a visual stimulus or both were present), followed by amodal and unimodal confidence ratings. Detection was higher when both stimuli were present, and the presence of one modality increased the confidence in the presence of the other modality. In contrast to previous detection studies, confidence was higher for absent than for present judgements, but metacognitive efficiency was higher for present judgements. Metacognitive sensitivity was higher for bimodal stimuli, but this was not the case for metacognitive efficiency, suggesting that the sensitivity might be driven by first-order performance. The computational model showed that both detection and confidence in absence followed a disjunctive evidence integration rule, while confidence in presence followed a conjunctive integration rule.

      We thank the reviewer for engaging with our work.

      Strengths:

      The paper has several major strengths. Firstly, it addresses a novel research question using an innovative and well-controlled paradigm. Furthermore, the paradigm and analyses were pre-registered, and all effects that were interpreted were replicated in two independent samples. Finally, the paper uses an advanced computational model to capture counterintuitive patterns in the data.

      Weaknesses:

      The major weakness of the paper is the narrative structure. It is not always clear how the different analyses relate to the main research question. Many different effects are reported in terms of detection accuracy, bias, confidence and metacognition, as well as cross-modal and unimodal versus bimodal effects. It would help readability if the paper were streamlined in terms of the research question that is being answered, which I believe is specifically about multimodal absence judgements. Relatedly, for a reader not intimately familiar with the metacognition literature, the difference between MRatio, metacognitive sensitivity and metacognitive efficiency is not obvious. It would be good to clarify this more in the manuscript.

      We will improve the narrative structure so that each result clearly relates to the research question.

      We will also add a clearer definition of the various metacognition metrics to improve readability.

      In general, the conclusions drawn by the authors seem to be supported by the results. However, I was missing quantitative model comparisons between the conjunctive and the disjunctive models and an explanation of why the models systematically overestimated the confidence ratings. Furthermore, the 'perceptual multisensory interference' section reports on very interesting effects, but these are not supported by statistical tests in the main text. It would help to assess the strength of the claims if the statistical evidence in favour of these claims were presented together in the main text.

      The model was not fitted to confidence data, which could explain its overall overconfidence. As stated in previous responses, we will perform additional analyses to evaluate the model’s ability to reproduce confidence ratings. As some of the results were not replicated across experiments, we decided to put all statistical results related to multisensory interference in the supplementary materials and to focus only on consistent results across experiments.

      One other concern is that in real-world multi-sensory perception, such as the mosquito example in the introduction, the auditory and visual signals have a strong natural association, which means that if you hear the auditory signal, you expect that you will see the visual signal soon and vice versa. As far as I understood, this association was not present in the current paradigm, which might influence the type of effects that one would expect to see.

      The relation here is indeed artificial. We tried to reinforce it as much as possible in the task instructions by telling participants that they had to “detect a mosquito” that could be present auditorily, visually, or both. We nevertheless acknowledge that the association between the visual and auditory stimuli is artificial, which may indeed influence our results.

      References

      Alais, D., & Burr, D. (2004). The Ventriloquist Effect Results from Near-Optimal Bimodal Integration. Current Biology, 14(3), 257-262. https://doi.org/10.1016/j.cub.2004.01.029

      Battaglia, P. W., Jacobs, R. A., & Aslin, R. N. (2003). Bayesian integration of visual and auditory signals for spatial localization. JOSA A, 20(7), 1391-1397. https://doi.org/10.1364/JOSAA.20.001391

      Chetverikov, A. (2026). Online behavioral studies are safe for now: Unusual RTs do not imply bots (A reply to Van der Stigchel et al., 2026) (Gjw5u_v1). PsyArXiv. https://osf.io/preprints/psyarxiv/gjw5u_v1/

      Dayan, P. (2023). Metacognitive Information Theory. Open Mind, 7, 392-411. https://doi.org/10.1162/opmi_a_00091

      Ernst, M. O., & Banks, M. S. (2002). Humans integrate visual and haptic information in a statistically optimal fashion. Nature, 415(6870), 429-433. https://doi.org/10.1038/415429a

      Faivre, N., Filevich, E., Solovey, G., Kühn, S., & Blanke, O. (2018). Behavioral, Modeling, and Electrophysiological Evidence for Supramodality in Human Metacognition. Journal of Neuroscience, 38(2), 263-277. https://doi.org/10.1523/JNEUROSCI.0322-17.2017

      Fleming, S. M. (2017). HMeta-d: Hierarchical Bayesian estimation of metacognitive efficiency from confidence ratings. Neuroscience of Consciousness, 2017(1), nix007. https://doi.org/10.1093/nc/nix007

      Huskey, R., Zhao, Z., Parry, D. A., & Fisher, J. T. (2026). An AI agent can complete the Attention Network Test with human-like behavioral signatures: Implications for the bot-or-not debate (T2jru_v1). PsyArXiv. https://osf.io/preprints/psyarxiv/t2jru_v1/

      Kay, C. S. (2025). Why you shouldn’t trust data collected on MTurk. Behavior Research Methods, 57, 340. https://doi.org/10.3758/s13428-025-02852-7

      Norman, E., Price, M. C., & Jones, E. (2011). Measuring strategic control in artificial grammar learning. Consciousness and Cognition, 20(4), 1920-1929. https://doi.org/10.1016/j.concog.2011.07.008

      Pereira, M., Skiba, R., Cojan, Y., Vuilleumier, P., & Bègue, I. (2023). Preserved Metacognition for Undetected Visuomotor Deviations. Journal of Neuroscience, 43(35), 6176-6184. https://doi.org/10.1523/JNEUROSCI.0133-23.2023

      Rahnev, D. (2025). A comprehensive assessment of current methods for measuring metacognition. Nature Communications, 16(1), 701. https://doi.org/10.1038/s41467-025-56117-0

      Sandberg, K., Timmermans, B., Overgaard, M., & Cleeremans, A. (2010). Measuring consciousness: Is one measure better than the other? Consciousness and Cognition, 19(4), 1069-1078. https://doi.org/10.1016/j.concog.2009.12.013

      Siedlecka, M., Paulewicz, B., & Wierzchoń, M. (2016). But I Was So Sure! Metacognitive Judgments Are Less Accurate Given Prospectively than Retrospectively. Frontiers in Psychology, 7, 218. https://doi.org/10.3389/fpsyg.2016.00218

      Wierzchoń, M., Asanowicz, D., Paulewicz, B., & Cleeremans, A. (2012). Subjective measures of consciousness in artificial grammar learning task. Consciousness and Cognition, 21(3), 1141-1153. https://doi.org/10.1016/j.concog.2012.05.012

    2. Reviewer #2 (Public review):

      Summary:

      In this study, across two experiments, the authors wrestle with the question: What is the profile of confidence judgments in presence/absence decisions for audiovisual stimuli? After thresholding observers to 50% target detection rates in each modality, the authors conducted one experiment that included 75% target presence (spread equally across bimodal, auditory, and visual targets) and one experiment with 50% overall target presence. Results showed that, overall, detection performance was higher for audiovisual stimuli compared to unimodal ones, and that a recent model for stimulus detection could be extended to this multisensory scenario. By incorporating a disjunctive rule for absence judgments and a conjunctive rule for presence judgments, the model was able to qualitatively reproduce some of the trends observed in the human data regarding confidence.

      Strengths:

      (1) The paper makes novel contributions to the study of multisensory confidence judgments for yes/no target detection.

      (2) The paper further extends the use of a leading model of stimulus detection (from Mazor et al., 2025).

      (3) Pre-registration of the study was implemented, and the code is publicly available (although the GitLab link requires registration to access the materials).

      (4) One of the empirical results (higher confidence for absence compared to presence judgments) is especially interesting, contributing another empirical finding to a very mixed literature on this topic (as the authors note).

      Weaknesses:

      (1) Page 5 - I have concerns about the use of the equal-variance model from Signal Detection Theory to analyze the data. For example, the authors should read the recent paper by Miyoshi, Rahnev, and Lau in iScience, found at this link: https://www.cell.com/iscience/fulltext/S2589-0042(26)00373-1. In this paper, the authors note how the equal variance model should be used with caution in yes/no detection tasks, since the variances of the "stimulus present" and "stimulus absent" distributions are often different from one another. In a revision, I highly recommend that the authors explicitly discuss this paper and review whether the assumptions for the equal-variance model have been met (e.g., since they have confidence data, one way to do this would be to evaluate if the slope of the line in zROC space differs from 1). The authors may also want to incorporate methods from this iScience paper into the current manuscript, or potentially move to using an unequal variance SDT model and compute d'a and c'a.

      (2) Related to the computation/measurement of the response criterion, the authors note on page 18 in the Methods that for Experiment 1, signals are actually present on 75% of trials, since a bimodal stimulus is present on 25% of trials, the visual circle only occurs on 25% of trials, the sinusoidal tone occurs on 25% of trials, and then only noise is present on 25% of trials. Did the authors have any a priori hypotheses about the response criteria that participants would exhibit in Experiment 1, considering the unbalanced target presentation rate in this task? Also, in Experiment 2, what did it mean to equate target present and target absent trials? Is it that they broke 50% target present trials down into 16.67% bimodal targets, 16.67% visual targets, and 16.67% auditory targets? A few more details would be good to explicitly note for those trying to replicate the task.

      (3) It is important to plot the individual data for Figure 2. If the authors didn't match detection performance for the visual and auditory modalities, it would be good to see the individual data to know why. Is it that the thresholding procedure didn't work for some of the participants in the visual modality, and that's why the "yes" response rate is (on average) ~60% or higher across the two experiments? Similarly, in the auditory domain, do the authors have participants that are at floor? Or is it simply that the staircases failed to successfully target 50% detection on average?

      (4) The authors mentioned that data were collected on the Prolific platform. What checks did they conduct to ensure that this data wasn't produced by bots? There are recent high-profile publications in PNAS and Behavioral Research Methods that indicate how online data collection is problematic (e.g., https://www.pnas.org/doi/10.1073/pnas.2535585123 and https://link.springer.com/article/10.3758/s13428-025-02852-7). What analyses or quality checks are there to ensure that humans were the ones completing the task?

      (5) Page 7 - Since confidence was collected on a continuous scale, the authors should say a bit more about how they were able to compute measures of metacognitive efficiency. My understanding is that to compute meta-d', the data has to be binned. How was the binning implemented? With whatever bin size the authors chose, would it make any difference to the results if they changed the number of the bins in the analysis?

      (6) Page 8 - Is there a prior precedent for using slope of the Bayesian logistic regression predicting accuracy from confidence as a measure of metacognitive sensitivity? If so, can the authors cite those papers as a reference? If not, can they place this analysis within the context of other measures of metacognitive sensitivity that exist? (meta-d', AUROC (Type 2), etc.)

      (7) Page 8 - Another one of the results on page 8 is worth reflecting further upon: the authors note how in Experiment 1, no credible difference was found between unimodal and bimodal trials (DeltaM = -0.25 [-0.59, 0.10]), but in Experiment 2, "we observed higher metacognitive efficiency in unimodal compared to bimodal trials (DeltaM = -0.28 [-0.54, -0.02])". Those DeltaM values are nearly identical, so without a power analysis motivating the number of participants the authors collected, how certain are they that the results from these two experiments are really that distinct? It reminds me a bit of the Andrew Gelman blog post, "The difference between significance and non-significance is not significant".

      (8) Is there any way to look at whether the presence of multisensory hallucinations (or perhaps that word is too strong, and we should simply consider them miscategorizations) increased as the task progressed? That is, the authors have repeated presentations of audiovisual stimuli for at least some percentage of the trials. Since the percentages for auditory stimuli being correctly categorized as auditory are at 85% in Experiment 1 and 79% in Experiment 2, were the trials where they miscategorized these stimuli equally spread throughout the task? Or did they come later in the experiment, after being repeatedly exposed to multisensory trials?

      (9) Would the authors obtain the same results if they got rid of the amodal confidence judgment in their task, and simply had participants report the bimodal confidence following the presence/absence judgment? Part of the reason for asking this is that, according to page 11, the model is only fitted to amodal detection accuracy and response time data. This surprised me. I would have expected that the bimodal confidence would provide more useful information for the model fit. The authors should further explain this rationale in the paper. It seems odd to me to have the multisensory confidence ratings and not have them play a central role in the modeling work.

      (10) In Figure 6, it appears the model is a bit off in its estimate of auditory responses (panel B, E) in the AV condition. Do the authors have any intuitions about why this might be happening?

      (11) The authors talk about how the model is reproducing effects in the human data, but there's no systematic comparison, quantitatively, of how the two things relate. The authors should include some quantitative measure that reflects this.

      (12) Related to this, I am not sure I agree with the characterization in Figure 7 that "when confidence followed a disjunctive rule, the model failed to capture important aspects of the data. On the other hand, when confidence followed a conjunctive rule, it reproduced confidence in presence judgments but failed to capture variability in confidence ratings for absence judgments." What, quantitatively, is the basis of this claim? This applies to Figure 8, too. I am not clear how, specifically, and quantitatively, the authors are justifying their claims about model fits. I don't think the confidence asymmetry index in Figure 8 is enough to quantify the quality of the model fitting procedure.

      (13) Is there any chance the higher metacognitive efficiency for auditory trials is simply driven by differences in the d' values across the modalities? It might be good to probe this effect further.

      (14) Lastly, I think it would be interesting to look at how instructions about modality-specific attention could modulate these findings, in terms of how unimodal (unimodal visual, unimodal auditory) or bimodal attention might modulate these effects. This is an idea for future work.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Mutations in CDHR1, the human gene encoding an atypical cadherin-related protein expressed in photoreceptors, are thought to cause cone-rod dystrophy (CRD). However, the pathogenesis leading to this disease is unknown. Previous work has led to the hypothesis that CDHR1 is part of a cadherin-based junction that facilitates the development of new membranous discs at the base of the photoreceptor outer segments, without which photoreceptors malfunction and ultimately degenerate. CDHR1 is hypothesized to bind to a transmembrane partner to accomplish this function, but the putative partner protein has yet to be identified.

      The manuscript by Patel et al. makes an important contribution toward improving our understanding of the cellular and molecular basis of CDHR1-associated CRD. Using gene editing, they generate a loss of function mutation in the zebrafish cdhr1a gene, an ortholog of human CDHR1, and show that this novel mutant model has a retinal dystrophy phenotype, specifically related to defective growth and organization of photoreceptor outer segments (OS) and calyceal processes (CP). This phenotype seems to be progressive with age. Importantly, Patel et al. present intriguing evidence that pcdh15b, also known for causing retinal dystrophy in previous Xenopus and zebrafish loss of function studies, is the putative cdhr1a partner protein mediating the function of the junctional complex that regulates photoreceptor OS growth and stability.

      This research is significant in that it:

      (1) Provides evidence for a progressive, dystrophic photoreceptor phenotype in the cdhr1a mutant and, therefore, effectively models human CRD; and

      (2) Identifies pcdh15b as the putative, and long sought after, binding partner for cdhr1a, further supporting the theory of a cadherin-based junction complex that facilitates OS disc biogenesis.

      Nonetheless, the study has several shortcomings in methodology, analysis, and conceptual insight, which limits its overall impact.

      Below I outline several issues that the authors should address to strengthen their findings.

      Major comments:

      (1) Co-localization of cdhr1a and pcdh15b proteins

      The model proposed by the authors is that the interaction of cdhr1a and pcdh15b occurs in trans as a heterodimer. In cochlear hair cells, PCDH15 and CDH23 are proposed to interact first as dimers in cis and then as heteromeric complexes in trans. This was not shown here for cdhr1a and pcdh15b, but it is a plausible configuration, as are single heteromeric dimers or homodimers. Regardless, this model depends on the differential compartmental expression of the cdhr1a and pcdh15b proteins. Data in Figure 1 show convincing evidence that these two proteins can, at least in some cases, be distributed along the length of photoreceptor membranes that are juxtaposed, as would be the case for OS and CP. If pcdh15b is predominantly expressed in CPs, whereas cdhr1a is predominantly expressed in OS, then this should be confirmed with actin double labeling with cdhr1a and pcdh15b since the apicobasal oriented (vertical) CPs would express actin in this same orientation but not in the OS. This would help to clarify whether cdhr1a and pcdh15b can be trafficked to both OS and CP compartments or whether they are mutually exclusive.

      First let me thank the reviewer for taking the time to comprehensively evaluate our work and provide constructive criticism which will improve the quality of our final version.

      To address this issue, we have completed imaging of actin/cdhr1a and actin/pcdh15b using SIM in both transverse and axial sections (Fig 1C-H). Additionally, we have recently established an immuno-gold-TEM protocol and showcase co-labeling of cdhr1a and pcdh15b at TEM resolution along the CP (Fig 1I).

      Photoreceptor heterogeneity goes beyond the cone versus rod subtypes discussed here and it is known that in zebrafish, CP morphology is distinct in different cone subtypes as well as cone versus rod. It would be important to know which specific photoreceptor subtypes are shown in zebrafish (Figures 1A-C) and the non-fish species depicted in Figures 1E-L. Also, a larger field of view of the staining patterns for Figures 1E-L would be a helpful comparison (could be added as a supplementary figure).

      The revised manuscript includes labels for the location of different cone subtypes in Figure 1. All of the images showcasing CDHR1 localization across species concentrate on the PNA-positive R/G cones. Larger fields of view were not collected, as we prioritized the highest resolution possible and therefore collected small fields of view.

      (2) Cdhr1a function in cell culture

      The authors should explain the multiple bands in the anti-FLAG blots. Also, it would be interesting to confirm that the cdhr1a D173 mutant prevents the IP interaction with pcdh15b as well as the additive effects in aggregate assays of Figure 2.

      The multiple bands on the WB are similar to our previous results (Piedade 2020), and we believe they arise from ubiquitination and proteolytic cleavage of cdhr1a. We expect the D173 mutation to result in a complete absence of cdhr1a polypeptide, based on the lack of in situ signal in our WISH studies.

      Is it possible that the cultured cells undergo proliferation in the aggregation assays shown in Figure 2? Cells might differentially proliferate as clusters form in rotating cultures. A simple assay for cell proliferation under the different transfection conditions showing no differences would address this issue and lend further support to the proposed specific changes to cell adhesion as a readout of this assay.

      This is a possibility; however, we did not use rotating cultures. This was a monolayer culture. We did not observe any differences in total cell number between the differing transfections. As such, we do not feel proliferation explains the aggregation of K562 cells.

      Also, the authors report that the number of clusters was normalized to the field of view, but this was not defined. Were the n values different fields of view from one transfection experiment, or were they different fields of view from separate transfection experiments? More details and clarification are needed.

      This will be clarified in the revised manuscript. In short, we replicated this experiment 3 times, quantifying 5 different fields of view in each replicate.

      (3) Methodological issues in quantification and statistical analyses

      Were all the OS and CP lengths counted in the observation region or just a sample within the region? If the latter, what were the sampling criteria? For CPs, it seems that the length was an average estimate based on all CPs observed surrounding one cone or one-rod cell. Is this correct? Again, if sampled, how was this implemented? In Fig 4M', the cdhr1a-/- ROS mostly looks curvilinear. Did the measurements account for this, or were they straight linear dimension measurements from base to tip of the OS as depicted in Fig 5A-E? A clearer explanation of the OS and CP length quantification methodology is required.

      The revised manuscript will clearly outline measurement methods. In short, we measured every CP/OS in the imaged regions. We did not average CPs per cell; we simply included all CP measurements in our analysis. All our CP measurements (actin, cdhr1a or pcdh15) were made in the presence of a counter stain (WGA, prph2, gnb1 or PNA) to provide a landmark for proper measurement and to ensure association with the proper cell type. Our new Figure 7 now includes cone OS counter staining to better highlight the OS.

      All measurements were taken as best as possible to reflect a straight linear dimension for consistency.

      How were cone and rod photoreceptor cell counts performed? The legend in Figure 4 states that they again counted cells in the observation region, but no details were provided. For example, were cones and rods counted as an absolute number of cells in the observation region (e.g., number of cones per defined area) or relative to total (DAPI+) cell nuclei in the region? Changes in cell density in the mutant (smaller eye or thinner ONL) might affect this quantification so it would be important to know how cell quantification was normalized.

      The revised manuscript will clearly outline measurement methods. In short, rod and cone cell counts were based on the number of outer segments that were observed in the imaging region and previously measured for length. We did not observe any eye size differences in our mutant fish.

      In Figure 6I, K, measuring the length of the signal seems problematic. The dimension of staining is not always in the apicobasal (vertical) orientation. It might be more accurate to measure the cdhr1a expression domain relative to the OS (since the length of the OS is already reduced in the mutants). Another possible approach could be to measure the intensity of cdhr1 staining relative to the intensity within a Prph2 expression domain in each group. The authors should provide complementary evidence to support their conclusion.

      The revised manuscript will clearly outline measurement methods. In short, all of our CP measurements (actin, cdhr1a or pcdh15) were done in the presence of a counter stain (WGA, prph2, gnb1 or PNA) to ensure proper measurements and association with the proper cell type.

      A better description of the statistical methodology is required. For example, the authors state that "each of the data points has an n of 5+ individuals." This is confusing and could indicate that in Figure 4F alone there were ~5000 individuals assayed (~100 data points per treatment group x n=5 individuals per data point x 10 treatment groups). I don't think that is what the authors intended. It would be clearer if the authors stated how many OS, CP, or cells were counted in their observation region averaged per individual and then provided the n value of individuals used per treatment group (controls and mutants), on which the statistical analyses should be based.

      This has been addressed in the revised manuscript. In short, we had an n=5 (individual fish) analyzed for each genotype/time point.

      There are hundreds of data points in the separate treatment groups shown in several of the graphs. It would not be correct to perform the ANOVA on the separate OS or CP length measurements alone as this will bias the estimates since they are not all independent samples. For example, in Figure 6H, 5dpf pcdh15b+/- have shorter CPs compared to WT but pcdh15b-/- have longer compared to WT. This could be an artifact of the analysis. Moreover, the authors should clarify in the Methods section which ANOVA post hoc tests were used to control for multiple pairwise comparisons.

      We have re-analyzed the data using multiple pairwise comparison ANOVA with post hoc tests (Tukey test). This new analysis did not significantly alter the statistical significance outcome of the study.
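
      The independence concern raised in this comment can be illustrated with a short sketch: individual OS or CP measurements are first averaged within each fish, and the test statistic is then computed on the per-fish means, so that n reflects animals rather than cells. The data, group labels, and function names below are hypothetical, and this is not the analysis the authors report; it only shows the aggregation step the reviewer describes.

```python
from collections import defaultdict
from statistics import mean


def per_fish_means(rows):
    """rows: iterable of (group, fish_id, length). Returns {group: [mean per fish]}."""
    by_fish = defaultdict(list)
    for group, fish_id, length in rows:
        by_fish[(group, fish_id)].append(length)
    by_group = defaultdict(list)
    for (group, _fish), lengths in sorted(by_fish.items()):
        by_group[group].append(mean(lengths))
    return dict(by_group)


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA over a list of samples (one list per group)."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Made-up lengths: two measurements from each of two fish per genotype.
rows = [
    ("WT", "f1", 10.0), ("WT", "f1", 12.0),
    ("WT", "f2", 11.0), ("WT", "f2", 13.0),
    ("mut", "f3", 8.0), ("mut", "f3", 8.0),
    ("mut", "f4", 9.0), ("mut", "f4", 11.0),
]

fish_means = per_fish_means(rows)
f_stat = one_way_anova_f(list(fish_means.values()))
print(fish_means, f_stat)
```

      With this aggregation the degrees of freedom are set by the number of fish per genotype, not by the hundreds of individual measurements. A mixed-effects model with fish as a random effect would be an alternative that keeps the cell-level data.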

      (4) Cdhr1a function in photoreceptors

      The Cdhr1a IHC staining in 5dpf WT larvae in Figure 3E appears different from the cdhr1a IHC staining in 5dpf WT larvae in Figure 1A or Figure 6I. Perhaps this is just the choice of image. Can the authors comment or provide a more representative image?

      The image in figure 3E was captured using a previous non antigen retrieval protocol which limits the resolution of the cdhr1a signal along the CP. In the revised manuscript we have included an image that better represents cdhr1a staining in the WT and mutant.

      The authors show that pcdh15b localization after 5dpf mirrored the disorganization of the CP observed with actin staining. They also show in Figure 5O that at 180dpf, very little pcdh15b signal remains. They suggest based on this data that total degradation of CPs has occurred in the cdhr1a-/- photoreceptors by this time. However, although reduced in length, COS and cone CPs are still present at 180dpf (Figure 5E, E'). Thus, contrary to the authors' general conclusion, it is possible that the localization, trafficking, and/or turnover of pcdh15b is maintained through a cdhr1a-dependent mechanism, irrespective of the degree to which CPs are maintained. The experiments presented here do not clearly distinguish between a requirement for maintenance of localization versus a secondary loss of localization due to defective CPs.

      We agree, this point has been addressed in our revised manuscript. Additionally, we have also included data from 1 and 2 year old samples.

      (5) Conceptual insights

      The authors claim that cdhr1a and pcdh15b double mutants have synergistic OS and CP phenotypes. I think this interpretation should be revisited.

      First, assuming the model of cdhr1a-pcdh15b interaction in trans is correct, the authors have not adequately explained the logic of why disrupting one side of this interaction in a single mutant would not give the same severity of phenotype as disrupting both sides of this interaction in a double mutant.

      Second, and perhaps more critically, at 10dpf the OS and CP lengths in cdhr1a-/- mutants (Figure 7J, T) are significantly increased compared to WT. In contrast, there are no significant differences in these measurements in the pcdh15b-/- mutants. Yet in double homozygous mutants, there is a significant reduction of ~50% in these measurements compared to WT. A synergistic phenotype would imply that each mutant causes a change in the same direction and that the magnitude of this change is beyond additive in the double mutants (but still in the same direction). Instead, I would argue that the data presented in Figure 7 suggest that there might be a functionally antagonistic interaction between cdhr1a and pcdh15b with respect to OS and CP growth at 10dpf.

      If these proteins physically interacted in vivo, it would appear that the interaction is complex and that this interaction underlies both OS growth-promoting and growth-restraining (stabilizing) mechanisms working in concert. Perhaps separate homodimers or heterodimers subserve distinct CP-OS functional interactions. This might explain the age-dependent differences in mutant CP and OS length phenotypes if these mechanisms are temporally dynamic or exhibit distinct OS growth versus maintenance phases. Regardless of my speculations, the model presented by the authors appears to be too simplistic to explain the data.

      We agree with the reviewer, as such we have revised the discussion in our revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The goal of this study was to develop a model for CDHR1-based cone-rod dystrophy and study the role of this cadherin in cone photoreceptors. Using genetic manipulation, a cell binding assay, and high-resolution microscopy, the authors find that, like rods, cones localize CDHR1 to the lateral edge of outer segment (OS) discs and closely appose PCDH15b, which is known to localize to calyceal processes (CPs). Ectopic expression of CDHR1 and PCDH15b in K562 cells indicates these cadherins promote cell aggregation as heterophilic interactants, but not through homophilic binding. These data suggest a model where CDHR1 and PCDH15b link OS and CPs and potentially stabilize cone photoreceptor structure. Mutation analysis of each cadherin results in cone structural defects at late larval stages. While pcdh15b homozygous mutants are lethal, cdhr1 mutants are viable and subsequently show photoreceptor degeneration by 3-6 months.

      Strengths:

      A major strength of this research is the development of an animal model to study the cone-specific phenotypes associated with CDHR1-based CRD. The data supporting CDHR1 (OS) and PCDH15 (CP) binding is also a strength, although this interaction could be better characterized in future studies. The quality of the high-resolution imaging (at the light and EM levels) is outstanding. In general, the results support the conclusions of the authors.

      Weaknesses:

      While the cellular phenotyping is strong, the functional consequences of CDHR1 disruption are not addressed. While this is not the focus of the investigation, such analysis would raise the impact of the study overall. This is particularly important given some of the small changes observed in OS and CP structure. While statistically significant, are the subtle changes biologically significant? Examples include cone OS length (Figures 4F, 6E) as well as other morphometric data (Figure 7I in particular). Related, for quantitative data and analysis throughout the manuscript, more information regarding the number of fish/eyes analyzed as well as cells per sample would provide confidence in the rigor. The authors should also note whether the analysis was done in an automated and/or masked manner.

      First let me thank the reviewer for taking the time to comprehensively evaluate our work and provide constructive criticism which will improve the quality of our final version.

      The revised manuscript outlines both the methods and the statistics used for quantitation of our data (please see our responses to the comments from reviewer 1). While we do not include direct evidence of the mechanism of CDHR1 function, we do propose that its role is important in anchoring the CP and the OS, particularly in the cones, while in rods it may serve to regulate the release of newly formed disks (as previously proposed in mice). We do plan to test both of these hypotheses directly; however, that will be the basis of our future studies.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Patel et al investigates the hypothesis that CDHR1a on photoreceptor outer segments is the binding partner for PCDH15 on the calyceal processes, and the absence of either adhesion molecule results in separation between the two structures, eventually leading to degeneration. PCDH15 mutations cause Usher syndrome, a disease of combined hearing and vision loss. In the ear, PCDH15 binds CDH23 to form tip links between stereocilia. The vision loss is less understood. Previous work suggested PCDH15 is localized to the calyceal processes, but the expression of CDH23 is inconsistent between species. Patel et al suggest that CDHR1a (formerly PCDH21) fulfills the role of CDH23 in the retina.

      The experiments are mainly performed using the zebrafish model system. Expression of Pcdh15b and Cdhr1a protein is shown in the photoreceptor layer through standard confocal and structured illumination microscopy. The two proteins co-IP and can induce aggregation in vitro. Loss of either Cdhr1a or Pcdh15, or both, results in degeneration of photoreceptor outer segments over time, with cones affected primarily.

      The idea of the study is logical given the photoreceptor diseases caused by mutations in either gene, the comparisons to stereocilia tip links, and the protein localization near the outer segments. The work here demonstrates that the two proteins interact in vitro and are both required for ongoing outer segment maintenance. The major novelty of this paper would be the demonstration that Pcdh15 localized to calyceal processes interacts with Cdhr1a on the outer segment, thereby connecting the two structures. Unfortunately, the data presented are inadequate proof of this model.

      Strengths:

      The in vitro data to support the ability of Pcdh15b and Cdhr1a to bind is well done. The use of pcdh15b and cdhr1a single and double mutants is also a strength of the study, especially being that this would be the first characterization of a zebrafish cdhr1a mutant.

      Weaknesses:

      (1) The imaging data in Figure 1 is insufficient to show the specific localization of Pcdh15 to calyceal processes or Cdhr1a to the outer segment membrane. The addition of actin co-labelling with Pcdh15/Cdhr1a would be a good start, as would axial sections. The division into rod and cone-specific imaging panels is confusing because the two cell types are in close physical proximity at 5 dpf, but the cone Cdhr1a expression is somehow missing in the rod images. The SIM data appear to be disrupted by chromatic aberration but also have no context. In the zebrafish image, the lines of Pcdh15/Cdhr1a expression would be 40-50 um in length if the scale bar is correct, which is much longer than the outer segments at this stage and therefore hard to explain.

      First let me thank the reviewer for taking the time to comprehensively evaluate our work and provide constructive criticism which will improve the quality of our final version.

      To address this issue, we have added images of actin/cdhr1a and actin/pcdh15b using SIM in both transverse and axial sections. Additionally, we have established an immuno-gold-TEM protocol and provide data showcasing co-labeling of cdhr1a and pcdh15b at TEM resolution.

      (2) Figure 3E staining of Cdhr1a looks very different from the staining in Figure 1. It is unclear what the authors are proposing as to the localization of Cdhr1a. In the lab's previous paper, they describe Cdhr1a as being associated with the connecting cilium and nascent OS discs, and fail to address how that reconciles with the new model of mediating CP-OS interaction. It is also unclear whether Cdhr1a localizes to discrete domains on the disc edges, where it interacts with Pcdh15 on individual calyceal processes.

      The image in figure 3E was captured using a previous non antigen retrieval protocol which limits the resolution of the cdhr1a signal along the CP. In the revised manuscript we include an image that better represents cdhr1a staining in the WT and mutant.

      (3) The authors state "In PRCs, Pcdh15 has been unequivocally shown to be localized in the CPs". However, the immunostaining here does not match the pattern seen in the Miles et al 2021 paper, which used a different antibody. Both showed loss of staining in pcdh15b mutants so unclear how to reconcile the two patterns.

      We agree that our staining appears different, but we attribute this to our antigen retrieval protocol which differed from the Miles et al paper. We also point to the fact that pcdh15b localization has been shown to be similar to our images in other species (monkey and frog). As such, we believe our protocol reveals the proper localization pattern which might be lost/hampered in the procedure used in Miles et al 2021.

      (4) The explanation for the CRISPR targets for cdhr1a and the diagram in Figure 3 does not fit with crRNA sequences or the mutation as shown. The mutation spans from the latter part of exon 5 to the initial portion of exon 6, removing intron 5-6. It should nevertheless be a frameshift mutation but requires proper documentation.

      This was an overlooked error in figure making; we have corrected it in the revised manuscript.

      (5) There are complications with the quantification of data. First, the number of fish analyzed for each experiment is not provided, nor is the justification for performing statistics on individual cell measurements rather than using averages for individual fish. Second, all cone subtypes are lumped together for analysis despite their variable sizes. Third, t-tests are inappropriately used for post-hoc analysis of ANOVA calculations.

      As we discussed for reviewers 1 and 2, all methods and quantification/statistics will be clearly described in the revised manuscript.

      (6) Unclear how calyceal process length is being measured. The cone measurements are shown as starting at the external limiting membrane, which is not equivalent to the origin of calyceal processes, and it is uncertain what defines the apical limit given the multiple subtypes of cones. In Figure 5, the lines demonstrating the measurements seem inconsistently placed.

      As we discussed for reviewers 1 and 2, all methods and quantification/statistics will be clearly described in the revised manuscript. We have also clarified that CP measurements were made based on a counterstain for the cone/rod OS so that the actin signal was only CP associated. We have included the counter stain in our revised Figure 7.

      (7) The number of fish analyzed by TEM and the prevalence of the phenotype across cells are not provided. A lower magnification view would provide context. Also, the authors should explain whether or not overgrowth of basal discs was observed, as seen previously in cdhr1-null frogs (Carr et al., 2021).

      The revised manuscript now includes the n number for our TEM samples. We have also added text comparing our results directly to Carr 2021.

      (8) The statement describing the separation between calyceal processes and the outer segment in the mutants is not backed up by the data. TEM or co-labelling of the structures in SIM could be done to provide evidence.

      We have completed both more SIM as well as immuno-gold TEM to support our conclusions, see new Figure 1.

      (9) "Based on work in the murine model and our own observations of rod CPs, we hypothesize that zebrafish rod CPs only extend along the newly forming OS discs and do not provide structural support to the ROS." Unclear how murine work would support that conclusion given the lack of CPs in mice, or what data in the manuscript supports this conclusion.

      In the revised manuscript we have adjusted our discussion to hypothesize that the small length of rod CPs most likely reflects their interaction with newly forming discs rather than a connection to mature discs, which are enclosed in the OS.

      (10) The authors state "from the fact that rod CPs are inherently much smaller than cone CPs" without providing a reference. In the manuscript, the measurements do show rod CPs to be shorter, but there are errors in the cone measurements, and it is possible that the RPE pigment is interfering with the rod measurements.

      We have included references where rod CPs have been found to be shorter. We have no doubt that in zebrafish the rod CPs are significantly shorter. All our CP measurements are done with a counter stain for rods and cones to be sure that we are measuring the correct cell type.

      (11) The discussion should include a better comparison of the results with ocular phenotypes in previously generated pcdh15 and cdhr1 mutant animals.

      The revised manuscript now includes these points.

      (12) The images in panels B-F of the Supplemental Figure are uncannily similar, possibly even of the same fish at different focal planes.

      We assure the reviewer that each of the images in Supplemental Figure 1 is distinct and represents a different in situ experiment.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In the second sentence of the Introduction section, the acronym 'PRC' should be defined.

      This has been corrected.

      (2) In the Discussion section, it would be useful to comment on differences between the published Xenopus cdhr1-/- OS phenotypes and the published zebrafish pcdh15b-/- OS phenotypes compared to the present zebrafish cdhr1a-/- phenotypes. In the published studies, OS in these mutants demonstrated dysmorphic and overgrown disc membranes compared to the relatively minor disc layering defects shown for cdhr1a-/- in the present study.

      This discussion has been added.

      (3) CDHR1 mutations in patients cause cone-rod dystrophy, but mutations in PCDH15 (Usher 1F) cause rod-cone dystrophy. In the Discussion section, the authors should comment on what might lead to these different phenotypic trajectories in humans in the context of their proposed model.

      We have added to our discussion, highlighting that it is not possible to assess rod-cone dystrophy in the pcdh15b model as the mutation is lethal by 15dpf, which is still before most rods mature.

      Reviewer #2 (Recommendations for the authors):

      In addition to defining the 'n' for animal and cell numbers (as well as methods of analysis - automated/masked), there are a few additional recommendations for the authors.

      (1) Expression of USH1 genes in larval zebrafish (Figure S1) is not very convincing. SC RNAseq data exists and argues against this cell type restriction.

      Based on extensive experience with WISH, we are confident that our interpretation of the data is valid. Furthermore, analysis of the Daniocell database confirms that cdh23, ush1ga, ush1c (harmonin) and myo7aa all have either no expression in photoreceptors or very low levels, especially compared to pcdh15b and cdhr1a.

      (2) The model in Figure 1 is great. The coloring was a bit confusing. Cdhr1 and axoneme are both in green, while Pcdh15 and actin are both in red. Can each have its own color?

      Changed pcdh15b color to blue

      (3) Figure 2A: Please explain the multiple bands in some lanes. What do the full blots look like?

      Full blots were uploaded to eLife and do not exhibit any additional bands. The multiple bands are likely due to ubiquitination or proteolytic cleavage of cdhr1a and have been documented in our previous publication (Piedade 2020).

      (4) Is "data not shown" permissible? (lack of compensation of cdh1b in cdh1a mutants) (nonsense-mediated decay of the mutant transcript).

      We have added a supplementary figure showcasing this data.

      (5) Figure 4: Is there a TEM phenotype in discs before 15dpf? One would think there would be...?

      Due to technical limitations, we have not been able to examine disc phenotypes prior to 15dpf.

      (6) Figure 5: How are calyceal processes discriminated from cortical/PM-associated actin? A bonafide calyceal marker seems to be needed. Espin or Myo3, for example.

      We identify CPs as actin signal that originates at the base of the OS and extends along the OS. Pcdh15b is a bona fide CP marker, which we show overlaps with the actin signal along CPs.

      (7) Figures 5A-J: How is actin staining for CPs discriminating between rod and cones??? Apical - basal level imaging? This could be better clarified.

      CP identification is based on co-staining for either rod or cone OSs.

      (8) Figure 6: Het phenotype for pcdh15b+/- (cone OS length and CP length at 5 and 10 dpf) is surprising ... worth discussing. (Figures 6E, H).

      The discussion section has been updated to discuss this finding.

      (9) Last, the authors state "Data not shown" throughout the manuscript. I do not believe this is allowed for the journal.

      This data (cdhr1b expression in cdhr1a mutants as well as cdhr1a WISH in cdhr1a mutants) has been added as supplementary figures.

      Reviewer #3 (Recommendations for the authors):

      Major comments are addressed above and the most important is the need for a convincing demonstration of Cdhr1a localization on the outer segment and proximity to Pcdh15b. The SIM could be a powerful tool, but the images provided are impossible to assess without any basis for context. Could a membrane, Prph2, and/or actin label be added? And lower magnification views?

      Minor comments.

      (1) The mention of "short CPs" in rodents is not an accurate description. Particular rodents (e.g. mouse, rat) lack CPs altogether or have a single vestigial structure.

      We have adjusted the text to reflect this point.

      (2) Inconsistent spacing between numbers and units.

      We have corrected these inconsistencies.

      (3) Missing references.

      We have added the missing references.

      (4) Indicate the mean or median for bar graphs.

      The Materials and Methods section now specifies that all of our graphs depict a mean value.

      (5) Unclear how rods are distinguished from cones in the cone analysis if both are labeled with prph2 antibody.

      Rods are physically separate from cones in the zebrafish retina and are therefore easily identified by location as well as by their distinct pattern of actin staining.

      (6) Red and green should not be used together for microscopy images.

      (7) The diagram in Figure 1D is confusing because of the repeated use of red and green for disparate structures. Also, the location and structure of actin are misrepresented, as is the transition of disc structure during maturation in rods.

      We have adjusted the color of pcdh15b to blue.

    1. Author response:

      General Statements

      We thank the reviewers for their insightful and constructive comments, which have substantially strengthened the manuscript. We have addressed all concerns and replaced the previous nonquantitative RNA-seq analysis with a new analysis that allowed for quantitative assessment. We were encouraged to find that the revised analysis not only confirmed our original observations but also reinforced and extended our conclusions.

      Point-by-point description of the revisions

      Reviewer #1:

      Significance

      At its current stage, this work represents a robust resource for molecular parasitology research programs, paving the way for mechanistic studies on multilayered gene expression control and it would benefit from experimental evidence for some of the claims concerning the in silico regulatory networks. Terms like "regulons", "recursive feedback loop" are employed without solid confirmation or extensive literature support. In my view, the most relevant contribution of this study is centered in the direct association between proteasome-dependent degradation and Leishmania differentiation.

      We thank the reviewer for acknowledging the impact of our work as a robust resource for further mechanistic studies. We agree that the new concepts emerging from our multilayered analysis should be experimentally assessed. However, given the scope of our analysis (i.e. a complete systems-level analysis of bona fide, hamster-isolated L. donovani amastigotes and derived promastigotes) and the amount of data presented in the current manuscript, such functional genetic analysis will merit an independent, in-depth investigation. The current version has been considerably toned down and modified to emphasize the impact of our work as a powerful new resource for downstream functional analyses.

      Evidence, reproducibility and clarity

      The narrative becomes somewhat diffuse with the shift to putative multilevel regulatory networks, which would benefit from further experimental validation.

      We agree with the reviewer and toned down the general discussion while suggesting putative multilevel regulatory networks for follow-up, mechanistic analyses. We now emphasize those networks for which evidence in trypanosomatids and other organisms has been published. Experimental validation of some of these regulatory networks is outside the scope of our manuscript and will be pursued as part of independent investigations.

      Major issues

      Fig.1D suggests a significant portion of the SNPs are exclusive, with a frequency of zero in one of the two stages. Were only the heterozygous and minor alleles plotted in Fig.1D, since frequencies close to 1 are barely observed? Is the same true in Sup Fig. S2B? Why do chrs 4 and 33 show unusual patterns in S2B?

      We thank the reviewer for this observation. The SNPs exclusive to either one or the other stage are likely the result of the 10% cutoff we use for this kind of analysis (eliminating SNPs that lack sufficient support, i.e. fewer than 10 reads). Due to bottleneck events (such as in vitro culture or stage differentiation), many low-frequency SNPs are either ‘lost’ (filtered out) or ‘gained’ (passing the 10% cutoff) between the ama and pro samples. All SNPs above 10% were plotted. The absence of SNPs at 100% is one of the hallmarks of the Ld1S L. donovani strain we are using. Instead, these parasites show a majority of SNPs at a frequency of around 50%, which is likely a sign of a previous hybridization event. Chr 4 and chr 33 show a very low SNP density, most likely because they went through a transient monosomy at some point in their evolutionary history, causing loss of heterozygosity. We now explain these facts in the figure legend.
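
To illustrate how a fixed cutoff can make a low-frequency SNP look stage-exclusive, here is a minimal Python sketch. It is not the authors' pipeline: the positions and frequencies are invented, the function name `passing_snps` is hypothetical, and the cutoff is assumed to apply to variant frequency (0.10, mirroring the 10% mentioned above).

```python
CUTOFF = 0.10  # assumed to be a frequency threshold (10%)

def passing_snps(freqs, cutoff=CUTOFF):
    """Return the positions whose variant frequency is above the cutoff."""
    return {pos for pos, freq in freqs.items() if freq > cutoff}

# Invented example data: position -> variant allele frequency
ama = {"chr1:100": 0.52, "chr1:250": 0.12, "chr2:40": 0.08}
pro = {"chr1:100": 0.49, "chr1:250": 0.07, "chr2:40": 0.11}

ama_pass = passing_snps(ama)
pro_pass = passing_snps(pro)

shared = ama_pass & pro_pass     # plotted in both stages
ama_only = ama_pass - pro_pass   # appears 'lost' in promastigotes
pro_only = pro_pass - ama_pass   # appears 'gained' in promastigotes
```

In this toy case the two "exclusive" SNPs are present in both stages; small frequency shifts around the threshold are enough to move them in or out of the plotted set.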

      Chr26 revealed a striking contrasting gene coverage between H-1 and the other two samples. While a peak is observed for H-1 in the middle of this chr, the other two show a decrease in coverage. Is there any correlation with the transcriptomic/proteomic findings?

      This analysis is based on normalized median read depth, taking somy variations into account. This is now more clearly specified in the figure legend. We do not see any significant expression changes that would correlate with the observed (minor) read depth changes. As indicated in the legend, we do not consider such small fluctuations (less than +/- 1.5-fold) as significant. The reversal of the signal for chr 26 in sample H-1 eludes us (but again, these fluctuations are minor and not observed at the mRNA level).

      The term "regulon" is used somewhat loosely in many parts of the text. Evidence of co-transcriptomic patterns alone does not necessarily demonstrate control by a common regulator (e.g., RNA-binding protein), and therefore does not fulfill the strict definition of a regulon. It should be clear whether the authors are highlighting potential multiple inferred regulons within a list of genes or not. Maybe functional/ gene module/cluster would be more appropriate terms.

      We thank the reviewer for this important comment. We replaced ‘regulon’ throughout the manuscript by ‘co-regulated, functional gene clusters’ (or similar).

      It is unclear whether the findings in Fig.3E are based on previous analysis of stage-specific rRNA modifications or inferred from the pre-snoRNA transcriptomic data in the current work or something else. I struggle to find the significance of presenting this here.

      We thank the reviewer for this comment. Yes, these data show stage-specific rRNA modifications based on previous analyses that mapped stage-specific differences of pseudouridine (Ψ) (Rajan et al., Cell Reports 2023, DOI: 10.1016/j.celrep.2024.114203) and 2'O-modifications (Rajan et al., Nature Com, in revision) by various RNA-seq analyses and cryoEM. This figure has been modified in the revised version to take into account the stage-regulated snoRNAs identified in our new and statistically robust RNA-seq analysis. These data are shown to further support the existence of stage-regulated ribosomes that may control mRNA translatability, as suggested by the enriched GO terms ‘ribosome biogenesis’, ‘rRNA processing’ and ‘RNA methylation’ shown in Figure 2. We better integrated these analyses by moving the panels from Figure 3 to Figure 2.

      The protein turnover analysis is missing the critical confirmation of the expected lactacystin activity on the proteasome in both ama and pro. A straightforward experiment would be an anti-polyUb western blotting using a low concentration SDS-PAGE or a proteasome activity assay on total extracts.

      We thank the reviewer for this comment and have now included an anti-polyUb Western blot analysis (see Fig S7).

      The viability tests upon lactacystin treatment need a positive control for the PI and the YoPro staining (i.e., permeabilized or heat-killed promastigotes).

      This control is now included in Fig S7 and we have added the corresponding description to the text.

      I found that the section on regulatory networks was somewhat speculative and less focused. Several of the associated conclusions are, in some parts, overstated, such as in "uncovered a similar recursive feedback loop" (line 566) or "unprecedented insight into the regulatory landscape" (line 643). It would be important to provide some form of direct evidence supporting a functional connection between phosphorylation/ubiquitination, ribosome biogenesis/proteins and gene expression regulation.

      We agree with the reviewer and have considerably toned down our statements. Functional analyses to investigate and validate some of the shown network interactions are planned for the near future and will be published separately.

      Minor issues

      (1) The ordinal transition words "First,"/"Second," are used too frequently in explanatory sections. I noted six instances. I suggest replacing or rephrasing some to improve flow.

      Rectified, thanks for pointing this out.

      (2) Ln 168: Unformatted citations were given for the Python packages used in the study.

      Rectified, thanks for pointing this out.

      (3) Fig.1D: "SNP frequency" is the preferred term in English.

      Corrected.

      (4) Fig.2A: not sure what "counts}1" mean.

      This figure has been replaced.

      (5) Ln 685: "Transcripts with FC < 2 and adjusted p-value > 0.01 are represented by black dots" > This sentence is inaccurate. The intended wording might be: "Transcripts with FC < 2 OR adjusted p-value > 0.01 are represented by black dots"

      We thank the reviewer and corrected accordingly.  

      (6) Ln 698: Same as ln 685 mentioned above.

      We thank the reviewer and corrected accordingly.

      (7) Fig.2B and elsewhere: The legend key for the GO term enrichment is a bit confusing. It seems like the color scales represent the adj. p-values, but the legend keys read "Cluster efficiency" and "Enrichment score", while those values are actually represented by each bar length. Does light blue correspond to a max value of 0.05 in one scale, and dark blue to a max value of 10⁻⁷ in the other scale?

      This was corrected in the figure and the legends were updated accordingly.

      (8) Sup Figure S3A and S4A: The hierarchical clustering dendrograms are barely visible in the heatmaps.

      Thanks for the comment. Figure S3 was removed and replaced by a hierarchical clustering and a PCA plot.

      (9) S3A Legend: The following sentence sounds a bit awkward: "Rows and columns have been re-ordered thanks to a hierarchical clustering". I suggest switching "thanks to a hierarchical clustering" to "based on hierarchical clustering".

      This figure was removed and the legend modified.

      (10) Fig.5D: The font size everywhere except the legend key is too small. In addition, on the left panel, gene product names are given as a column, while on the right, the names are shown below the GeneIDs. Consistency would make it clearer.

      Thank you, this is now rectified. To ensure readability, we reduced the number of protein kinase examples shown.
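
The legend correction in minor issues (5) and (6) follows from how the thresholds combine: if a transcript is highlighted only when it passes both thresholds, then the remaining (black) dots are those failing at least one of them. A small Python sketch of that logic is given below; the function names are hypothetical, and `fold_change` is assumed to be an absolute fold change (1 or greater), which the source does not state explicitly.

```python
FC_MIN = 2.0   # fold-change threshold from the legend
P_MAX = 0.01   # adjusted p-value threshold from the legend

def is_highlighted(fold_change, adj_p):
    """Highlighted transcripts pass BOTH thresholds."""
    return fold_change >= FC_MIN and adj_p <= P_MAX

def is_black_dot(fold_change, adj_p):
    """Black dots fail AT LEAST ONE threshold (note the 'or')."""
    return fold_change < FC_MIN or adj_p > P_MAX

# A transcript with a large fold change but a poor adjusted p-value
# is a black dot under 'or'; the original 'and' wording would miss it.
example = (3.0, 0.5)
black_with_or = is_black_dot(*example)
black_with_and = example[0] < FC_MIN and example[1] > P_MAX
```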

      Reviewer #2 Evidence, reproducibility and clarity:

      In the absence of riboprofiling the authors return to the RNA-seq to assess the levels of pre-Sno RNA (the role of the could be more explicitly stated).

      We thank the reviewer for this comment. We moved the snoRNA analysis from Fig 3 to Fig 2 (see also the similar comment of reviewer 1), which better integrates and justifies this analysis. Based on the new and statistically robust RNA-seq analysis, the volcano plot showing differential snoRNA expression and possible ribosome modification has been adjusted (Figures 2C and D).

      The authors provide a clear and comprehensive description of the data at each stage of the results and this is woven together in the discussion, allowing hypotheses to be formed on the potential regulatory and signalling pathways that control the differentiation of amastigotes to promastigotes. Given the amount and breadth of data presented the authors are able to present a high-level assessment of the processes that form feedback loops and/or intersectional signalling, but specific examples are not picked out for deeper validation or exploration.

      We thank the reviewer for acknowledging the amount and breadth of data presented. As indicated above (see responses to reviewer 1), mechanistic studies will be conducted in the near future to validate some of the regulatory interactions. These will be the subject of separate publications. As noted above (response to reviewer 1), we toned down the general discussion, suggest follow-up mechanistic analyses and emphasize those networks for which evidence in trypanosomatids and other organisms has been published.

      Major comments:

      (1) As I have understood it from the description in the text, and in Data Table 4, the RNA-seq element of the work has only been conducted using two replicates. If this is the case, it would substantially undermine the RNA-seq and the inferences drawn from it. Minimum replicates required for inferential analysis is 3 bio-replicates and potentially up to 6 or 12. It may be necessary for the authors to repeat this for the RNA-seq to carry enough weight to support their arguments. (PMID: 27022035)

      We agree with the reviewer and conducted a new RNA-seq analysis with 4 independent biological replicates of spleen-purified amastigotes and derived promastigotes. Given the robustness of the stage-specific transcriptome, and the legal constraints associated with the use of animals, we chose to limit the number of replicates to the minimum necessary. We thank the reviewer for this important comment. The new data not only confirm the previous data (providing a high level of robustness to our results) but also allowed us to increase the number of identified stage-regulated snoRNAs, thus further supporting a possible role of ribosome modification in Leishmania stage development.

      (2) There are several examples that are given as reciprocal or recursive signalling pathways, but these are not followed up with independent, orthogonal techniques. I think the paper currently forms a great resource to pursue these interesting signalling interactions and is certainly more than just a catalogue of modifications, but to take it to the next level ideally a novel signalling interaction would be demonstrated using an orthogonal approach. Perhaps the regulation of the ribosomes could have been explored further (same teams recently published related work on this). Or perhaps more interestingly, a novel target(s) from the ubiquitinated protein kinases could have been explored further; for example making precision mutants that lack the ubiquitination or phosphorylation sites - does this abrogate differentiation?

      We agree with the reviewer that the paper currently forms a great resource. In-depth molecular analyses investigating key signaling pathways and regulatory interactions are outside the scope of the current multilevel systems analysis but will be pursued in independent investigations.

      (3) I found the use of lactacystin a bit curious as there are more potent and specific inhibitors of Leishmania proteasomes e.g. LXE-408. This could be clarified in the write-up (See below).

      We thank the reviewer for this comment. We opted for the highly specific and irreversible proteasome inhibitor lactacystin, which has previously been applied to study the Leishmania proteasome (PMID: 15234661), rather than the trypanosomatid-specific drug candidate LXE408, as the strong cytotoxic effect of the latter makes it difficult to distinguish between direct effects on protein turnover and secondary effects resulting from cell death, limiting its utility for dissecting proteasome function in living parasites. We have added this information to the Results section.

      (4) If it is the case that only 2 replicates of the RNA-Seq have been performed it really is not the accepted level of replication for the field. Most studies use a minimum of 3 bioreplicates and even a minimum of 6 is recommended by independent assessment of DESeq2.

      See response to comment 1 above.

      (5) As far as I could see, the cell viability assay does not include a positive control that shows it is capable of detecting cytotoxic effects of inhibitors. Add treatment showing that it can differentiate cytostatic vs cytotoxic compound.

      This control has now been added to Fig S7.

      (6) It is realistic for the authors to validate the cell viability assay. If the RNA-seq needs to be repeated then this would be a substantial involvement.

      Redoing the RNA-seq analysis was entirely feasible and very much improved the robustness of our results.

      (7) All the methods are written to a good level of detail. The sample prep, acquisition and data analysis of the protein mass spectrometry contained a high level of detail in a supplemental section. The authors should be more explicit about the amount of replication at each stage, as in parts of the manuscript this was quite unclear.

      We thank the reviewer for this comment and explicitly state the number of replicates in Methods, Results and Figure legends for all analyses. The number of replicates for each analysis is further shown in the overview Figure S1.

      (8) Unless I have misunderstood the manuscript, I believe the RNA-seq dataset is underpowered according to the number of replicates the authors report in the text.

      See response to comment 1 above.

      (9) Looking at Figure 1 and S1 and Data Table 4 to show the sample workflow I was surprised to see that the RNA-seq only used 2 replicates. The authors do show concordance between the individual biological replicates, but I would consider that only having 2 is problematic here, especially given the importance placed on the mRNA levels and linkage in this study. This would constitute a major weakness of the study, given that it is the basis for a crucial comparison between the RNA and protein levels.

      We agree and have repeated the RNAseq analysis using four independent biological replicates - see response to comment 1.

      (10) It also wasn't clear to me how many replicates were performed at each condition for the lactacystin treatment experiment - can the authors please state this clearly in the text, it looks like 4 replicates from Figure S1 and Data Table 8.

      Indeed, we did 4 replicates. This is now clarified in Methods, Results and Figure legends and shown in Figure S1.

      (11) Four replicates are used for the phosphoproteomics data set, which is probably ok, but other researchers have used a minimum of 5 in phosphoproteomics experiments to deal with the high level of variability that can often be observed with low abundance proteins & modifications. The method for the phosphoproteomics analysis suggests that a detection of a phosphosite in 1 sample (also with a localisation probability of >0.75) was required for then using missing value imputation of other samples. This seems like a low threshold for inclusion of that phosphosite for further relative quantitative analysis. For example, Geoghegan et al (2022) (PMID: 36437406) used a much more stringent threshold of greater than or equal to 2 missing values from 5 replicates as an exclusion criterion for detected phosphopeptides. Please correct me if I misunderstood the data processing, but as it stands the imputation of so many missing values (potentially 3 of 4 per sample category) could be reducing the quality of this analysis.

      We thank the reviewer for this remark and for highlighting best practices in phosphoproteomics data analysis. Unlike other studies that use cultured parasites and thus have access to unlimited amounts, our study employs bona fide amastigotes isolated from infected hamster spleens. In France, the use of animals is tightly controlled and only the minimal number of animals to obtain statistically significant results is tolerated (and necessary to obtain permission to conduct animal experiments).

      Regarding the number of biological replicates, we would like to emphasize that the use of four biological replicates is fully acceptable and used in quantitative proteomics and phosphoproteomics, particularly when combined with high-quality LC–MS/MS data and stringent peptide-level filtering. While some studies indeed employ five or more replicates, this is not a strict requirement, and many high-impact phosphoproteomics studies have successfully relied on four replicates when experimental quality and depth are high. In the present study, we adopted a discovery-oriented approach, aimed at detecting as many confidently identified phosphopeptides as possible. The consistency between replicates, combined with the depth of coverage and signal quality, indicates that four replicates are adequate for both the global proteome and the phosphoproteome in this context. Importantly, the quality of the MS data in this study is supported by (i) a high number of confidently identified peptides and phosphopeptides (identification FDR<1%), (ii) robust phosphosite localisation probabilities (localisation probability >0.75), and (iii) reproducible quantitative profiles across replicates. Notably, most of the identified phosphopeptides are quantified in at least two replicates within a given condition (between 73.2% and 83.4% of all the identified phosphopeptides among replicates of the same condition).

      Regarding missing value imputation, we appreciate that our initial description may have been unclear and we have revised the Methods to avoid misunderstanding. Phosphosites were only considered if detected with high confidence (identification FDR<1%) and high localisation confidence (localisation probability >0.75) in at least one replicate. This criterion was chosen to retain biologically relevant, low-abundance phosphosites, which are more difficult to identify and are often stochastically sampled in phosphoproteomics datasets. For statistical analyses, missing values within a given condition were imputed with a well-established algorithm (MLE) only when at least one observed value was present in that condition. Notably, they were replaced by values in the neighborhood of the observed intensities, rather than by globally low, noise-like values.

      We agree that more stringent exclusion rules, such as those used by Geoghegan et al. (2022), are appropriate in some contexts. However, there is no universally accepted standard for missingness thresholds in phosphoproteomics, and different strategies reflect trade-offs between sensitivity and stringency. In our discovery-oriented approach, we deliberately prioritized biological coverage while maintaining data quality. Our main conclusions are supported by coherent biological patterns, rather than by isolated phosphosite measurements.

      (12) For the metabolomics analysis it looks like 2 amastigote samples were compared against 4 promastigote samples. Why not triplicates of each?

      We thank the reviewer for noticing this point. It is an error in the figure file (Sup figure S1). Four biological replicates of splenic amastigotes were prepared (H130-1, H130-2, H133-1 and H133-2). Amastigotes from 2 biological replicates (H131-1 and H131-2) were seeded for differentiation into promastigotes in 4 flasks (2 per biological replicate) that were collected at passage 2. We have updated the figure file accordingly.
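
The inclusion and imputation rules discussed under comment (11) can be sketched as follows. This is only an illustration in Python, not the authors' processing code: the function names are hypothetical, missing values are represented as `None`, the intensities are invented, and the imputation step itself (the MLE algorithm) is deliberately left out. The stricter rule reflects the reviewer's description of Geoghegan et al. (2022), expressed here through an assumed `max_missing` parameter.

```python
def can_impute(condition_values):
    """Imputation is allowed only if the condition has at least one observed value."""
    return any(v is not None for v in condition_values)

def passes_strict_rule(condition_values, max_missing=1):
    """Stricter alternative: exclude a site with 2 or more missing values."""
    missing = sum(v is None for v in condition_values)
    return missing <= max_missing

# Invented log2 intensities for one phosphosite, four replicates per condition
ama = [None, 21.4, None, None]   # observed once: eligible for imputation
pro = [None, None, None, None]   # never observed: nothing to impute from

ama_ok = can_impute(ama)
pro_ok = can_impute(pro)
ama_strict = passes_strict_rule(ama)
```

The example makes the trade-off described in the response concrete: a site seen once in a condition is kept under the discovery-oriented rule but dropped under the stricter one.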

      Minor comments:

      Are prior studies referenced appropriately?

      Yes

      Are the text and figures clear and accurate?

      The write up is clear, with the data presented coherently for each method. The analyses that link everything together are well discussed. The figures are mostly clear (see below) and are well described in the legends. There is good use of graphics to explain the experimental designs and sample names - although it is unclear if technical replicates are defined in these figures.

      We thank the reviewer for these positive comments. We have now included the information on replicates in the overview figure (Figure S1).

      As I have understood it, the authors have calculated the "phosphostoichiometry" using the ratio of change in the phosphopeptide to the ratio of the change in total protein level changes. This is detailed in the supplemental method (see below). Whilst this has normalised the data, it has not resulted in an occupancy or stoichiometry measurement, which are measured between 0-1 (0% to 100%). The normalisation has probably been sufficient and useful for this analysis, but this section needs to be re-worded to be more precise about what the authors are doing and presenting. These concepts are nicely reviewed by Muneer, Chen & Chen 2025 (PMID: 39696887) who reference seminal papers on determination of phosphopeptide occupancy - and may be a good place to start. An alternative phrase should be used to describe the ratio of ratios calculated here, not phosphostoichiometry.

      We thank the reviewer for this insightful comment and fully agree with the conceptual distinction raised. The reviewer is correct that the approach used in this study does not measure absolute phosphosite occupancy or stoichiometry, which would indeed require dedicated experimental strategies and would yield values bounded between 0 and 1 (0–100%). Instead, we calculated a normalized phosphorylation change, defined as the ratio of the change in phosphopeptide abundance relative to the change in the corresponding total protein abundance (a ratio-of-ratios approach – see DOI: 10.1007/978-1-0716-1967-4_12), and we tested whether this normalized phosphorylation change differed significantly from zero. This normalization approach is comparable to those previously published in the "Experimental Design and Statistical Analysis of the Proteome and the Phosphoproteome" section of the following paper (DOI: 10.1016/j.mcpro.2022.100428).

      Our intention was to account for protein-level regulation and thereby better isolate changes in phosphorylation dynamics. While this normalization is informative and appropriate for the biological questions addressed here, we agree that the term “phosphostoichiometry” is imprecise and not correct in this context.

      In response, we (i) replaced the term “phosphostoichiometry” throughout the manuscript with a more accurate description, such as “normalized phosphorylation level”, or “relative phosphorylation change normalized to protein abundance”, and (ii) revised the corresponding Methods and Results text to clearly state that absolute occupancy was not measured.

      This rewording will improve conceptual accuracy without altering the validity or interpretation of the results.
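
As a worked illustration of the ratio-of-ratios idea described in this response, the sketch below computes a normalized phosphorylation change on the log2 scale, where "no change beyond the protein level" corresponds to zero. The log2 scale, the function names and the abundance values are assumptions made for the example; the actual analysis relies on a statistical test rather than on single values.

```python
import math

def log2_fold_change(abundance_a, abundance_b):
    """log2 of the abundance ratio between condition A and condition B."""
    return math.log2(abundance_a / abundance_b)

def normalized_phospho_change(pep_a, pep_b, prot_a, prot_b):
    """Phosphopeptide log2 fold change minus parent protein log2 fold change."""
    return log2_fold_change(pep_a, pep_b) - log2_fold_change(prot_a, prot_b)

# Invented abundances: phosphopeptide 4-fold up, parent protein 2-fold up
net_change = normalized_phospho_change(400.0, 100.0, 200.0, 100.0)

# Phosphopeptide and protein both 2-fold up: no net phosphorylation change
no_change = normalized_phospho_change(200.0, 100.0, 200.0, 100.0)
```

Unlike an occupancy, this quantity is unbounded and can be negative, which is why a term such as "normalized phosphorylation level" describes it better than "stoichiometry".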

      From the authors methods describing the ratio comparison approach: "Another statistical test was performed in a second step: a contrasted t-test was performed to compare the variation in abundance of each modified peptide to the one of its parent unmodified protein using the limma R package {Ritchie, 2015; Smyth, 2005}. This second test allows determining whether the fold-change of a phosphorylated peptide between two conditions is significantly different from the one of its parent and unmodified protein (paragraph 3.9 in Giai Gianetto et al 2023). An adaptive Benjamini-Hochberg procedure was applied on the resulting pvalues thanks to the adjust.p function of R package cp4p {Giai Gianetto, 2016} using the Pounds et al {Pounds, 2006} method to control the False Discovery Rate level."

      The references have been formatted.
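
For readers unfamiliar with the multiple-testing step quoted above, here is a minimal Python sketch of Benjamini-Hochberg adjusted p-values. It is not the `adjust.p` function of the cp4p package: the function name `bh_adjust` is hypothetical, and the adaptive variant is only approximated by an optional `pi0` factor (the estimated proportion of true null hypotheses), which is supplied by the caller here rather than estimated with the Pounds method.

```python
def bh_adjust(pvalues, pi0=1.0):
    """Benjamini-Hochberg adjusted p-values; pi0 < 1 gives an adaptive variant."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        value = min(1.0, pi0 * pvalues[i] * m / rank)
        running_min = min(running_min, value)
        adjusted[i] = running_min
    return adjusted

# Invented p-values for four phosphopeptides
raw = [0.01, 0.04, 0.03, 0.005]
adj = bh_adjust(raw)
```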

      Several aspects of the figures that contain STRING networks are quite useful, particularly the way colour is used around the circle of each node to denote different molecular functions/biological processes. However, some have descended into "hairball" plots that convey little useful information that would be equally conveyed in a table, for example. Added to this, the points on the figure are identified by gene IDs which, while clear and incontrovertible, are lacking human readability. I suggest that protein name could be included here too.

      We thank the reviewer for this comment but for readability we opted to keep the figure as is. We now refer to Tables 8, 9, and 12 that allow the reader to link gene IDs to protein name and annotation (if available).

      It is also not clear what STRING data is being plotted here, what are the edges indicating - physical interactions proven in Leishmania, or inferred interactions mapped on from other organisms? Perhaps as supplemental data provide the Cytoscape network files so readers can explore the networks themselves?

      We thank the reviewer for this comment. While the STRING plugin in Cytoscape enables integrated network-based analyses, it represents protein–protein associations as a single edge per protein pair derived from the combined confidence score. Consequently, the specific contribution of individual evidence channels (e.g. experimental evidence, curated databases, coexpression, or text mining) cannot be disentangled within this framework. However, this representation was considered appropriate for the present study, which focused on global network topology and functional enrichment rather than on the interpretation of individual interaction types. The information on stringency has been added to the Methods section and the Figure legends (adding the information on confidence score cutoff).

      We decided not to submit the Cytoscape files as they were generated with previous versions of Cytoscape and the STRING plugin. Based on the differential abundance data shown in the tables it will be very easy to recreate these networks with the new versions for any follow up study.

      The title of columns in table S10 panel A are written in French, which will be ok for many people particularly those familiar with proteomics software outputs, but everything else is in English so perhaps those titles could be made consistent.

      We apologize and have translated the text in English.

      I would suggest that the authors provide a table that has all the gene IDs of the Ld1S2D strain and the orthologs for at least one other species that is in TriTrypDB. This would make it easy to interrogate the data and make it a more useful resource for the community who work on different strains and species of Leishmania. Although this data is available it is a supplemental material file in a previous paper (Bussotti et al PNAS 2021) and not easy to find.

      We thank the reviewer for this very useful suggestion and have added this table (Table S13).

      Figure 5b - from the legend it is not clear where the confidence values were derived in this analysis, although this is explained in the supplemental method. Perhaps the legend can be a bit clearer.

      We have added the following statement to the legend: ‘Confidence values were derived as described in Supplementary Methods’.

      Can the authors discuss why lactacystin was used? While this is a commonly used proteasome inhibitor in mammalian cells there is concern that it can inhibit other proteases. At the concentrations (10 µM) the authors used there are off-target effects in Leishmania, certainly the inhibition of a carboxypeptidase (PMID: 35910377) and potentially cathepsins as is observed in other systems (PMID: 9175783). There is a specific inhibitor of the Leishmania proteasome LXE-408 (PMID: 32667203), which comes closer to fulfilling the SGC criteria (PMID: 26196764) for a chemical probe - why not use this. Does lactacystin inhibit a different aspect of proteasome activity compared to LXE-408?

      We have added the following justification to the results section (see also response above to comment 3 for reviewer 2): We chose the highly specific and irreversible proteasome inhibitor lactacystin over the trypanosomatid-specific, reversible drug candidate LXE408 as the latter’s potent cytotoxicity can confound direct effects on protein turnover with secondary consequences of cell death, limiting its utility for dissecting proteasome function in living parasites.

      The application of lactacystin is changing the abundance of a multitude of proteins but no precision follow up is done to identify if those proteins are necessary and/or sufficient for driving/blocking differentiation. This could be tested using precision edited lines that are unable to be ubiquitinated? There is a lack of direct evidence that the proteins protected from degradation by lactacystin are ubiquitinated? Perhaps some of these could be tagged and IP'd then probed for ubiquitin signal. Di-Gly proteomics to reveal ubiquitinated proteins? These suggestions should be considered as OPTIONAL experiments in the relevant section above.

      We very much appreciate these very interesting suggestions, which will be considered for ongoing follow-up studies.

      In the data availability RNA-seq section the text for the GEO link is : (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc= GSE227637) but the embedded link takes me to (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE165615) which is data for another, different study. Also, the link to the GEO site for the DNA seq isn't working and manual searches with the archive number (BioProject PRJNA1231373 ) does not appear to find anything. The IDs for the mass spec data PRIDE/ProteomeXchange don't seem to bring up available datasets: PXD035697 and PXD035698

      The links have now been rectified and validated. For those data that are still under quarantine, here is the login information: To access the data:

      DNAseq data: https://dataview.ncbi.nlm.nih.gov/object/PRJNA1231373?reviewer=6qt24dd7f475838rbqfn228d 0

      RNAseq data: https://www.ebi.ac.uk/biostudies/ArrayExpress/studies/E-MTAB-16528?key=65367b55-d77f4c06-b4bd-bc10f2dc0b14

      Proteomic data:  http://www.ebi.ac.uk/pride

      Phosphoproteomic data: http://www.ebi.ac.uk/pride

      Significance

      Strengths:

      (1) The molecular pathways that regulate Leishmania life-stage transitions are still poorly understood, with many approaches exploring single proteins/RNAs etc in a reductionist manner. This paper takes a systems-scale approach and does a good job of integrating the disparate -omics datasets to generate hypotheses of the intersections of regulatory proteins that are associated with life-cycle progression.

      We thank the reviewer for this positive assessment of our work.

      (2) The differentiation step studied is from amastigote to promastigote. I am not aware that this has been studied before using phosphoproteomics. The use of the hamster derived amastigotes is a major strength. While a difficult/less common model, the use of hamsters permits the extraction of parasites that are host adapted and represent "normal", host-adapted Leishmania ploidy, the promastigote experiments are performed at a low passage number. This is a strength of the work as it reduces the interference of the biological plasticity of Leishmania when it is cultured outside the host.

      We thank the reviewer for the acknowledgment of our relevant hamster system, for which we face many challenges (financial, ethical, administrative as protocols need to be approved by the French government).

      Limitations:

      Potential lack of appropriate replication (see above).

      See response to comment 1.

      Lack of follow up/validation of a novel signalling interaction identified from the systems-wide approach. There is a lack of assessment of whether a single signalling cascade is driving the differentiation or these are all parallel, requisite pathways. The authors state the differentiation is not driven by a single master regulator, but I am not sure there is adequate evidence to rule this in or out.

      See response to comment 2 above.

      The study applies well-established techniques without any particular technical step change. The application of large-scale multi-omics techniques and integrated comparisons of the different experimental workflows allow a synthesis of data that is a step forward from that existing in the previous Leishmania literature. It allows the generation of new hypotheses about specific regulatory pathways and crosstalk that potentially drive, or are at least active, during amastigote>promastigote differentiation.

      We thank the reviewer for these positive comments.

      This manuscript will have primary interest to those researchers studying the molecular and cell biology of Leishmania and other kinetoplastid parasites. The approaches used are quite standard (so not so interesting in terms of methods development etc.) and given the specific quirks of Leishmania biology it may not be that relevant to those working more broadly in parasites from different clades/phyla, or those working on opisthokont systems- yeast, humans etc. Other Leishmania focused groups will surely cherry-pick interesting hits from this dataset to advance their studies, so this dataset will form a valuable reference point for hypothesis generation.

      We thank the reviewer for this assessment and agree that our data sets will be very valuable for us and other teams to generate hypotheses for follow-up studies.

      Relevant expertise: Trypanosoma & Leishmania molecular & cell biology, RNA-seq, proteomics, transcriptional/epigenetic regulation, protein kinases - some experience of UPS system.

      I have not provided comment on the metabolomics as it is outside my core expertise. However, I can see it was performed at one of the leading parasitology metabolomics labs.

      We thank the reviewer for sharing expertise, investing time and intelligence in the assessment of our manuscript, and the highly constructive criticisms provided.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary:

      The study presents a comprehensive multi-omics investigation of Leishmania differentiation, combining genomic, transcriptomic, proteomic, phospho-proteomic and metabolomic data. The authors aim to uncover mechanisms of post-transcriptional and post-translational regulation that drive the stage-specific biology of L. donovani. The authors provide a detailed characterization of transcriptomic, proteomic, and phospho-proteomic changes between life stages, and dissect the relative contributions of mRNA abundance and protein degradation to stage-specific protein expression. Notably, the study is accompanied by comprehensive supplementary materials for each molecular layer and provides public access to both raw and processed data, enhancing transparency and reproducibility. While the data are rich and compelling, several mechanistic interpretations (e.g., "feedback loops," "recursive networks," "signaling cascades") are overstated. Similarly, the classification of gene sets as "regulons" is not adequately supported, as no common regulatory factor has been identified and only a single condition change (amastigote to promastigote) was assessed.

      We thank the reviewer for these comments and have corrected the manuscript to eliminate all unjustified mechanistic interpretations.

      Major Comments:

      (1) Across several sections (incl abstract, L559-565, L589-599, L600-L603, L610-612, L613-614, L625, L643-645, L650-652), the manuscript describes "recursive or self-controlling networks", "signaling cascades", "self-regulating", and "recursive feedback loops" - involving protein kinases, phosphatases, and translational regulators. While the data convincingly demonstrate stage-specific changes in phosphorylation and abundance changes in key molecules, the language used implies causal, direct and directional regulatory relationships that have not been experimentally validated.

      We agree with the reviewer and have corrected the text, replacing all expressions that may allude to causal or directional relationships by more neutral expressions such as ‘coexpression’.  

      (2) Co-expression and shared function alone do not define a regulon (L363, and several other places in the manuscript). A regulon also requires the gene set to be regulated by the same factor, for which there is no evidence here. Regulons can be derived from transcriptomic experiments, but then they need to show the same transcriptional behavior across many biological conditions, while here just 1 condition change is evaluated. Therefore, this analysis is conventional GO enrichment analysis and should not be overinterpreted into regulons.

      We agree with the reviewer and have replaced ‘regulon’ with ‘co-regulated gene clusters’ (or similar).

      (3) LFQ intensity of 0 (e.g., L389): An LFQ intensity of 0 does not necessarily indicate that a protein is absent, but rather that it was not detected. This can occur for several reasons: (1) true biological absence in one condition, (2) low abundance below the detection threshold, or (3) stochastic missingness due to random dropout in mass spectrometry. While the authors state that adjusted p-values for the 1534 proteins exclusively detected in either amastigotes or promastigotes are below 0.01, I could not find corresponding p-values for these proteins in Table 8 ('Global_Proteomic'). An appropriate statistical method designed to handle this type of missingness should be used. In this context, I also find the following statement unclear: "identified over 4000 proteins at each stage in at least 3 out of 4 biological replicates, representing 3521 differentially expressed proteins (adjusted p-value < 0.01), 1534 of which were exclusively detected in either ama or pro." If a protein is exclusively detected in one stage, then by definition it should not be detected in that number of replicates at both stages. This apparent contradiction should be clarified.

      We fully agree with the reviewer: an LFQ intensity of 0 may result from various reasons. We realize that our wording may have been ambiguous. For clarity, we have modified the original text to: ‘Label-free quantitative proteomic analysis of 4 replicates of amastigotes and derived promastigotes identified over 4000 proteins, including 1987 differentially expressed proteins (adjusted p-value < 0.01), and 1534 that were exclusively detected in either ama or pro (Figure 3A left panel, Table 6).’ We also modified the legend of Figure 3B. Concerning missing values that could be either missing not at random (MNAR) or missing completely at random (MCAR), rather than introducing potentially misleading imputed values, we chose to treat these missing values as genuine stage-specific differences (presence/absence): quantitative statistics are restricted to proteins with measurable LFQ in both stages, while proteins with consistent presence in one stage and non-detection in the other are reported as stage-restricted detections. We believe this strategy is transparent and minimizes modeling assumptions, while still highlighting robust stage-specific signals. Our approach is supported by independent validation through RNA-seq data, which corroborates the differential presence/absence patterns observed at the protein level. Furthermore, our enrichment analyses reveal significant over-representation of specific biological terms among these stage-specific proteins, providing biological coherence to these findings. Therefore, we believe our conservative approach of treating these as genuine presence/absence differences, validated by orthogonal data, is more appropriate than introducing imputed values based on arbitrary statistical assumptions.
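
      The presence/absence rule described in this response can be illustrated with a short Python sketch. It is not the authors' pipeline: the function name, the category labels, and the min_detected threshold (taken here from the "at least 3 out of 4 biological replicates" wording quoted by the reviewer) are assumptions for illustration.

```python
def classify_protein(ama_lfq, pro_lfq, min_detected=3):
    """Classify one protein from per-replicate LFQ intensities.

    A value of 0 or None is treated as 'not detected' in that replicate.
    min_detected is an assumed threshold for 'consistent presence'.
    """
    n_ama = sum(1 for v in ama_lfq if v)
    n_pro = sum(1 for v in pro_lfq if v)
    if n_ama >= min_detected and n_pro >= min_detected:
        # Measurable in both stages: eligible for quantitative statistics.
        return "quantifiable"
    if n_ama >= min_detected and n_pro == 0:
        # Consistently present in amastigotes, never detected in promastigotes.
        return "ama-restricted"
    if n_pro >= min_detected and n_ama == 0:
        # Consistently present in promastigotes, never detected in amastigotes.
        return "pro-restricted"
    # Sporadic detection: neither testable nor a clear stage-restricted call.
    return "ambiguous"
```

      Under this sketch, a stage-restricted call records non-detection, not proven absence, which matches the caveat raised by the reviewer.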

      (4) L412 - Figure 3B: The figure shows proteins with infinite fold changes, which result from division by zero due to LFQ intensity values of zero in one of the compared conditions. As previously noted, interpreting LFQ zero values as true absence of expression is problematic, since these zeros can arise from several technical reasons - such as proteins being just below the detection threshold or due to stochastic dropout during MS analysis. Therefore, the calculated fold changes for these proteins are likely highly overestimated. This concern is visually supported by the large gap on the y-axis (even in log scale) between these "infinite" fold changes and the rest of the data. Moreover, given Leishmania's model of constitutive gene expression, it seems biologically implausible that all these proteins would be completely absent in one stage. This issue applies not only to Figure 3B, but also to the analyses presented in Figures 4D and 4E.

      We thank the reviewer for this comment. To clarify this section, we modified the text as follows: ‘Only expression changes were considered that either showed statistically significant differential abundance at both RNA and protein levels (p < 0.01), or showed significant RNA changes (p < 0.01) with the corresponding protein being detected in only one of the two stages. These latter proteins are identified by signals that were arbitrarily placed at the upper (detected in ama) or the lower (detected in pro) parts of the graph. Whether these proteins just escape detection due to low expression or are truly not expressed remains to be established.’ We also deleted the ‘infinity’ symbol from the Figure.

      Minor Comments:

      Methods

      L132: Typo: "A according" should be "according."

      The ‘A’ refers to RNase A. We added a comma for clarification (…RNase A, according to…)

      L158: How exactly were somy levels calculated? Please specify the method used, as I could not find a clear description in the referenced manuscript.

      We thank the reviewer for this comment. Aside from the already quite detailed description in Methods and the reference there to the paper describing the pipeline, we now added a link to the description of the karyotype module of the giptools package (https://gip.readthedocs.io/en/latest/giptools/karyotype.html). There the following explanation can be found: “The karyotype module aims at comparing the chromosome sequencing coverage distributions of multiple samples. This module is useful when trying to detect chromosome ploidy differences in different isolates. For each sample the module loads the GIP files with the bin sequencing coverage (.covPerBin.gz files) and normalizes the mean coverage values by the median coverage of all bins. The bin scores are then converted to somy scores which are then used for producing plots and statistics.” The description then goes into further detail.
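
      A rough Python sketch of the normalization quoted above may help readers follow the calculation. It is not the giptools code: the function name, the input layout (a dictionary of per-bin mean coverage values per chromosome), and the conversion of normalized scores to somy by multiplying by an assumed baseline ploidy of 2 are all assumptions made for this example.

```python
from statistics import median


def somy_scores(bin_coverage_by_chr, baseline_ploidy=2):
    """Estimate a somy score per chromosome from per-bin mean coverage.

    bin_coverage_by_chr: dict mapping chromosome name -> list of bin coverages.
    baseline_ploidy: assumed ploidy of a chromosome at the genome-wide median.
    """
    # Median coverage across all bins of all chromosomes.
    all_bins = [c for bins in bin_coverage_by_chr.values() for c in bins]
    genome_median = median(all_bins)
    scores = {}
    for chrom, bins in bin_coverage_by_chr.items():
        # Normalize each bin, then summarize the chromosome by its median bin score.
        normalized = [c / genome_median for c in bins]
        scores[chrom] = baseline_ploidy * median(normalized)
    return scores
```

      In this sketch a chromosome whose bins sit at the genome-wide median scores 2, and one with 1.5-fold coverage scores 3; normalizing to a reference chromosome such as chr36 would be a different, equally simple choice.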

      L158: Chromosome 36 is not consistently disomic, as stated. It has been observed in other somy states (e.g., Negreira et al. 2023, EMBO Reports, Figure 1), even if such occurrences are rare in the studied context. Normalizing by chr36 remains a reasonable choice, but it would be helpful to confirm that the majority of chromosomes appear disomic post-normalization to support the assumption that chr36 is disomic in this dataset as well.

      We thank the reviewer for this comment. Unlike the paper cited above (using long-term cultured promastigotes), our analysis uses promastigote parasites from early culture adaptation (p2) that were freshly derived from splenic amastigotes known to be disomic (and confirmed here), which represents an internal control validating our analysis.

      L163: Suggestion: Cite the GIP pipeline here rather than delaying the reference until L173.

      Corrected

      L188: "Controlled" may be a miswording. Consider replacing with "confirmed" or "validated."

      Corrected to ‘validated’

      L214: Please specify which statistical test was used to assess differential expression at the protein level. L227: Similarly, clarify which statistical test was applied for determining differential expression in the phospho-proteomics data.

      As noted in the Methods section, a limma t-test was applied to determine proteins/phosphoproteins with a significant difference in abundance while imposing a minimal fold change of 2 between the conditions to conclude that they are differentially abundant {Ritchie, 2015; Smyth, 2005}.

      Results

      L337-339: The interpretation here is too speculative. Phrases like "suggesting" and "likely" are too strong given the evidence presented. Alternative explanations, such as mosaic variation combined with early-stage selective pressure in the culture environment, should be considered.

      We thank the reviewers for these suggestions and have reformulated into: ‘In the absence of convergent selection, it is impossible to distinguish if these gene CNVs provide some strain-specific advantage or are merely the result of random genetic drift.’

      L340: The "undulating pattern" mentioned is somewhat subjective. To support this interpretation, consider adding a moving average (or similar) line to Figure 3A, which would more clearly highlight this trend across the data points.

      These lines have been added to Figure 1C (not 3A).
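
      For readers wishing to reproduce such a trend line, a centered moving average over position-ordered values can be sketched in Python as follows. The function name and the window size are assumptions; the response does not state which smoothing method or window was used for Figure 1C.

```python
def moving_average(values, window=5):
    """Centered moving average; the window shrinks at both ends of the series."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        # Clip the window to the available data near the edges.
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed
```

      Applied to per-gene coverage ratios ordered along a chromosome, the smoothed series can be overlaid on the raw points so that any undulating pattern is judged against the trend rather than by eye.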

      L356: It may be more accurate to say "control of individual gene expression," since Leishmania does have promoters - the key distinction is that initiation does not occur on a gene-by-gene basis.

      Corrected

      L403-405: The statement "this is because these metabolites comprise a glycosomal succinate shunt..." should be rephrased as a hypothesis rather than a definitive explanation, as this causal link has not been experimentally validated.

      Thank you for the comment – we followed your advice.

      L407: Replace "confirming" with "matching" to avoid overstating the agreement with previous observations.

      Corrected

      L408: Replace "correlated" with "matched" for more accurate interpretation of results.

      Corrected

      L433: It is unclear how differential RNA modifications were detected. Please specify which biological material was used, the number of replicates per life stage, and how statistical evaluation of differential modifications was performed.

      This figure has now been updated using our statistically robust RNA-seq analysis conducted for the revision. See comments above.

      L436: This conclusion appears incomplete. While the manuscript mentions transcript-regulated proteins, it should also note that other proteins showed discordant mRNA/protein patterns. A more balanced conclusion would mention both the matching and non-matching subsets.

      We thank the reviewer for this comment and have made the necessary adjustments to better balance this conclusion.

      L441: The phrase "poor correlation" overgeneralizes and lacks nuance. Earlier sections of the manuscript describe hundreds of genes where mRNA and protein levels correlate well, suggesting that mRNA turnover plays a key regulatory role. Please rephrase this sentence to clarify that poor correlation applies only to a subset of the data.

      This has been corrected to ‘The discrepancies we observed in a sub-set of genes between….’.

      L454: The claim that "epitranscriptomic regulation and stage-adapted ribosomes are key processes" should be supported with references. If this builds on previously published work, please cite it accordingly.

      Corrected

      L457: Proteasomal degradation is a well-established mechanism in Leishmania. These findings are interesting but should be presented in the context of existing literature (e.g. Silva-Jardim et al. 2014, [PMID: 15234661]) rather than as entirely novel.

      Corrected

      L459: The authors should add a microscopy image of promastigotes treated with lactacystin. This would provide insight into whether treatment affects morphology, as is known in T. cruzi (see Dias et al., 2008). It would be particularly informative if Leishmania behaves differently.

      We added this information to Figure S7.

      L472 + L481: Table 9 shows several significant GO terms not discussed in the manuscript. Please clarify how the subset presented in the text was selected.

      We added this information to the text (‘some of the most significantly enriched terms included …’).

      L482: The argument that a single master regulator can be excluded is unclear. Could the authors please elaborate on the reasoning or data supporting this conclusion?

      This statement was too speculative and has been removed. Instead, we added ‘Thus, Leishmania differentiation correlates with the expression of complex signaling networks that are established in a stage-specific manner’.

      L494: The term "unexpected" may not be appropriate here, as protein degradation is a well-established regulatory mechanism in trypanosomatids. Consider omitting this term to better reflect the field's current understanding.

      We deleted the term as suggested and reformulated to ‘….our results confirm the important role of protein degradation….’.

      L543: The term "feedback loop" should be used more cautiously. The current data are correlative, and no interventional experiments are provided to support a causal regulatory loop between proteasomal activity and protein kinases. As such, this remains a hypothesis rather than a confirmed mechanism.

      We fully agree and have toned down the entire manuscript, referring to feedback loops only as a hypothesis and not as a fact emerging from our datasets, which set the stage for future functional analyses.

      Discussion

      L555: As noted in L494, reconsider using the word "unexpected."

      Removed

      L589: The data do not fully support the presence of stage-specific ribosomes. Rather, they suggest differential ribosomal function through changes in abundance and regulation. Please consider rephrasing.

      We thank the reviewer for this comment and have followed the advice, reformulating the sentence according to the suggestion.

      L657-658: The discussion of post-transcriptional and post-translational regulation of gene dosage effects would benefit from citing additional literature beyond the authors' own work. E.g. the study by Cuypers et al. (PMID: 36149920) offers a relevant and comprehensive analysis covering 4 'omic layers.

      We apologize for this omission and now describe and cite this publication in the Results section when concluding the results shown in Figure 1.

      L659-664: The reference to deep learning for biomarker discovery appears speculative and loosely connected to the current findings. As no such methods were applied in the study, and the manuscript does not clarify what types of biomarkers are intended, this statement could be seen as aspirational rather than evidence-based. Consider either omitting or elaborating with clear justification.

      We agree and have deleted this section.

      L690 + L705 (Figure 2): The phrase "main GO terms" is vague. Please clarify the criteria for selecting the GO terms shown - were they chosen based on adjusted p-value, enrichment score, or another metric? Additionally, define "cluster efficiency," explaining how it was calculated and what it represents.

      Corrected to ‘some of the most significantly enriched GO terms’.

      Referee cross-commenting

      Overall, I think the other reviewers' comments are fair. They seem to align particularly on the following points:

      (1) Reviewers agree that this is a comprehensive body of work with original contributions to the field of Leishmania/trypanosomatid molecular biology, and that it will serve as a valuable reference for hypothesis generation.

      (2) Several reviewers raise concerns about overinterpretation of the data, particularly regarding regulatory networks, regulons, and master regulators. The interpretation and large parts of the discussion are considered too speculative without additional functional validation.

      (3) There are comments about the incorrect statistical treatment of missing values in the proteomics experiments, which affects confidence in some of the conclusions.

      (4) While the correlation between the two RNA-Seq replicates is high, the decision to include only two biological replicates is seen as unfortunate and not ideal for statistical robustness.

      (5) The use of lactacystin should be more clearly motivated, and its limitations discussed in the context of the experiments.

      Even though I did not remark on the last two points (4 and 5) in my own review, I agree with them.

      We thank the reviewer for this cross-comparison, which served as a guide for revising our manuscript. We believe that we have responded to all these concerns.

      Reviewer #3 (Significance):

      This study provides a rich, integrative multi-omics dataset that advances our understanding of stage-specific adaptation in the transcriptionally unique parasite Leishmania. By dissecting the relative contributions of mRNA abundance and protein turnover to final protein levels across life stages, the authors offer valuable insights into post-transcriptional and post-translational regulation. The work represents a resource-driven yet conceptually informative contribution to the field, with comprehensive supplementary materials and transparent data sharing standing out as additional strengths.  

      However, the mechanistic insights proposed are speculative in several places and require more cautious language. The study is most impactful as a resource and descriptive atlas, initiating hypotheses for future validation. The broad scientific community working on Leishmania, trypanosomatids, and post-transcriptional regulation in eukaryotes would benefit from this work.

      We thank the reviewer for this positive assessment and have modified the manuscript to further emphasize its strength as an important resource to incite mechanistic follow-up studies.

      Field of reviewer expertise: multi-omics integration, bioinformatics, molecular parasitology, transcriptomics, proteomics, metabolomics, Leishmania, Trypanosoma.

      Reviewer #4 (Evidence, reproducibility and clarity):

      Summary:

      This study investigates the regulatory mechanisms underlying stage differentiation in Leishmania donovani, a parasitic protist. Pesher et al., aim to address the central question of how these parasites establish and maintain distinct life cycle stages in mostly the absence of transcriptional control. The authors employed a five-layered systems-level analysis comparing hamster-derived amastigotes and their in vitro-derived promastigotes. From those parasites, they performed a genomic, transcriptomic, proteomic, metabolomic and phosphoproteomic analysis to reveal the changes the parasites undertook between the two life stages.

      The main conclusions stated by the authors are:

      - The stage differentiation in vitro is largely independent of major changes in gene dosage or karyotype.

      - RNA-seq analysis identified substantial stage-specific differences in transcript abundance, forming distinct regulons with shared functional annotations. Amastigotes showed enrichment in transcripts related to amastins and ribosome biogenesis, while promastigotes exhibited enrichment in transcripts associated with ciliary cell motility, oxidative phosphorylation, and posttranscriptional regulation itself.

      - Quantitative phosphoproteome analysis revealed a significant increase in global protein phosphorylation in promastigotes. Normalizing phosphorylation changes against protein abundance identified numerous stage-specific phosphoproteins and phosphosites, indicating that differential phosphorylation also plays a crucial role in establishing stage-specific biological networks. The study identified recursive feedback loops (where components of a pathway regulate themselves) in post-transcriptional regulation, protein translation (potentially involving stage-specific ribosomes), and protein kinase activity. Reciprocal feedback loops (where components of different pathways cross-regulate each other) were observed between kinases and phosphatases, kinases and the translation machinery, and crucially, between kinases and the proteasomal system, with proteasomal inhibition disrupting promastigote differentiation.

      We thank the reviewer for the time and effort dedicated to our manuscript.

      Further details are organised by order of appearance in the text:

      Material and Methods: while the authors are indicating some key parameters, providing the codes and scripts they used throughout the manuscript would improve reproducibility.

      We thank the reviewer for this comment and added the URL for the codes to the data availability section.

      Why only 2 biological replicates for RNA while the other layers have 3 or 4?

      We agree with the other reviewers and have repeated this analysis to have statistically more robust results.

      Is the slight but reproducible increase in median coverage observed for chr 1, 2, 3, 4, 6 and 20 stable on longer culture derived promastigotes and sandfly derived promastigotes ?

      No, as published in Barja et al Nature Ecol Evol 2017 (PMID: 29109466) and Bussotti et al PNAS 2023 (PMID: 36848551), these minor fluctuations do not predict subsequent aneuploidies in long-term culture or in sand fly-derived promastigotes. This information has been added to the text.

      Is this change of ploidy a culture adaptation representation rather than a life cycle event as the authors discuss later on? (This is probably an optional request that would be nice to include, if the authors have performed the sequencing of such parasites. Otherwise, it should be mentioned in the discussion).

      Yes, this is a well-known culture adaptation phenomenon, on which we have published extensively. We added this conclusion and the references to the text.

      L333 "Likewise, stage differentiation was not associated with any major gene copy number variation (Figure 1C, Table 2)". The authors are looking here at steady differentiated stages rather than differentiation itself. "Likewise, stage differentiation was.." would be more appropriate.

      We corrected this sentence to ‘Likewise, differentiation of promastigotes was not associated with any major gene copy number variation at early passage 2’.

      L349-355: have the mRNA presenting change in abundance between stages been normalised by their relative DNA abundance ? Said otherwise, can the wave patterns observed at the genome level explain the respective mRNA level ? Can the authors plot in a similar way the enrichment scores in regards to the position on the genome and can the authors indicate if there is a positional enrichment in addition to the functional one they observe ? This may affect the conclusion in L356-358.

      As noted above, we did not see any significant read depth changes at the DNA level when comparing amastigotes and promastigotes. Thus there is no need to normalize the RNAseq results to DNA read depth. Furthermore, in our comparative transcriptomics analysis, we only consider 2-fold or higher changes in mRNA abundance (which is far beyond the non-significant read depth change we have observed at the DNA level). Manual inspection of the enrichment scores with respect to position did not reveal any significant signal (other than revealing some overrepresented tandem gene arrays where all gene copies share the same location and GO term).
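
The threshold argument above can be illustrated with a small arithmetic sketch. It assumes, purely for illustration, that a gene-dosage correction would be applied by subtracting the log2 DNA read-depth ratio from the log2 mRNA ratio; the ratios used (4-fold RNA, 1.1-fold DNA) and the function names are hypothetical and are not taken from the manuscript.

```python
import math

# The response considers only 2-fold or higher mRNA abundance changes.
FOLD_THRESHOLD = 2.0

def dosage_corrected_log2fc(rna_ratio, dna_ratio):
    """Hypothetical correction: log2 RNA ratio minus log2 DNA read-depth ratio."""
    return math.log2(rna_ratio) - math.log2(dna_ratio)

def passes_threshold(log2fc, fold_threshold=FOLD_THRESHOLD):
    """True when the absolute change is at least the fold threshold."""
    return abs(log2fc) >= math.log2(fold_threshold)

# Illustrative values only: a 4-fold mRNA change and a 1.1-fold read-depth change.
raw = math.log2(4.0)
corrected = dosage_corrected_log2fc(rna_ratio=4.0, dna_ratio=1.1)

print(f"raw log2FC = {raw:.3f}, corrected log2FC = {corrected:.3f}")
print("still above threshold:", passes_threshold(corrected))
```

Under these assumed numbers the correction shifts the log2 fold change by only log2(1.1), so the call against the 2-fold threshold does not change. A gene sitting exactly at the threshold could of course flip, which is why the size of the DNA-level change matters.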

      L415 "stage-specific expression changes correlate between protein and RNA levels, suggesting that the abundance of these proteins is mainly regulated by mRNA turn-over". Overstatement. Correlation does not suggest causation. "suggesting that the abundance of these proteins could be regulated by mRNA turn-over" would be more appropriate.

      We thank the reviewer for this comment and have corrected the statement accordingly.

      Figure 3B, could the authors clarify what are the "unique genes" that are on the infinite quadrants? It seems these proteins are identified in one stage and not the other. This implies that the corresponding missing values are missing not at random (MNAR). Rather than removing those proteins containing MNAR values from the differential expression analysis, the authors should probably impute those missing values. Methods of imputation of MNAR and MAR values can be found in the literature. Indeed, the level of expression in one stage of those proteins is now missing, while it could strongly affect the conclusions the authors are drawing in figure 4E regarding the proteins targeted for degradation and rescued in presence of the proteasome inhibitor.

      We thank the reviewer for this important comment. However, we would like to clarify several key points regarding the treatment of proteins identified in only one condition.

      First, the reviewer assumes that proteins identified in one stage but not the other are necessarily missing not at random (MNAR). However, this cannot be definitively established, as these missing values could equally be missing completely at random (MCAR). Without additional information, categorizing them specifically as MNAR may be an oversimplification. More importantly, we have concerns about the reliability of imputation methods in this specific context. Algorithms designed to impute MNAR values (such as QRILC) replace absent data using random sampling from arbitrary probability distributions, typically assuming low intensity values. However, when no intensity value has been detected or quantified for a protein in a given condition, imputing an arbitrary low value raises significant concerns about data interpretation. Such imputed values would not reflect actual measurements but rather statistical assumptions that could introduce bias into downstream analyses. For instance, imputed values could lead to the conclusion that a protein is not differentially abundant, when in reality it is detected in one condition but completely absent in the other. In our view, there are two biologically plausible scenarios: either these proteins are expressed at levels below our detection threshold, or they are genuinely absent (or present at negligible levels) in the corresponding stage. Rather than introducing potentially misleading imputed values, we chose to treat these as genuine stage-specific differences (presence/absence), which results in infinite fold-changes in Figure 3B. Critically, our approach is strongly supported by independent validation through RNA-seq data, which corroborates the differential presence/absence patterns observed at the protein level. Furthermore, our enrichment analyses reveal significant over-representation of specific biological terms among these stage-specific proteins, providing biological coherence to these findings.
      These converging lines of evidence (proteomics, transcriptomics, and functional enrichment) strengthen our confidence that these represent biologically meaningful differences rather than technical artifacts. Therefore, we believe our conservative approach of treating these as genuine presence/absence differences, validated by orthogonal data, is more appropriate than introducing imputed values based on arbitrary statistical assumptions. To clarify this section, we modified the text as follows: ‘Only expression changes were considered that either showed statistically significant differential abundance at both RNA and protein levels (p < 0.01), or showed significant RNA changes (p < 0.01) with the corresponding protein being detected in only one of the two stages. These latter proteins are identified by signals that were arbitrarily placed at the upper (detected in ama) or the lower (detected in pro) parts of the graph. Whether these proteins just escape detection due to low expression or are truly not expressed remains to be established.’
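
The contrast between the two treatments discussed above can be sketched in a few lines. This is a simplified illustration and not the authors' pipeline: `impute_low` draws from a low-intensity Gaussian as a stand-in for left-censored imputation (it is not QRILC itself), and all intensities, parameter values and function names are hypothetical.

```python
import math
import random

def log2_fold_change(ama, pro):
    """log2(mean ama / mean pro); +/-inf when a protein is detected in one stage only."""
    ama_vals = [v for v in ama if v is not None]
    pro_vals = [v for v in pro if v is not None]
    if not ama_vals and not pro_vals:
        return float("nan")
    if not pro_vals:
        return float("inf")   # detected in amastigotes only
    if not ama_vals:
        return float("-inf")  # detected in promastigotes only
    mean_ama = sum(ama_vals) / len(ama_vals)
    mean_pro = sum(pro_vals) / len(pro_vals)
    return math.log2(mean_ama / mean_pro)

def impute_low(values, low_mean, low_sd, rng):
    """Replace missing values with draws from an assumed low-intensity Gaussian."""
    return [v if v is not None else max(rng.gauss(low_mean, low_sd), 1e-9)
            for v in values]

rng = random.Random(0)
ama = [1200.0, 1100.0, 1300.0]  # hypothetical intensities, detected
pro = [None, None, None]        # not detected in any replicate

fc_presence_absence = log2_fold_change(ama, pro)
fc_imputed_50 = log2_fold_change(ama, impute_low(pro, 50.0, 5.0, rng))
fc_imputed_400 = log2_fold_change(ama, impute_low(pro, 400.0, 40.0, rng))

print(fc_presence_absence, round(fc_imputed_50, 2), round(fc_imputed_400, 2))
```

In this toy case the presence/absence treatment yields an infinite fold change, while the imputed fold change depends directly on the assumed low-intensity distribution, which is the concern raised in the response.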

      L430-435 "These data fit with the GO [...] the ribosome translational activity (34)." This discussion feels out of place and context. It is too speculative and with little support by the data presented at this stage of the manuscript. It should be removed as Figure 3E or could be placed in the discussion and supplementary information.

      We agree with the reviewer. In response to a comment from reviewer 1, we have moved both panels to Figure 2, which much better integrates these data.  

      The authors present an elegant way to show stage specific degradation through the comparison of stage specific proteasome blockages that show rescue in ama of proteins present in pro and vice versa. L494 "reveal an unexpected but substantial" the term unexpected is inappropriate, as several studies have shown in kinetoplastids the essential role of protein turnover through degradation / autophagy during differentiation. Furthermore the conclusions may be strongly affected by the level of expression of the proteins in the infinite quadrants as we discussed above, and should be revised accordingly.

      We rephrased the conclusion to ‘In conclusion, our results confirm the important role of protein degradation in regulating the L. donovani amastigote and promastigote proteomes and identify protein kinases as key targets of stage-specific proteasomal activities.’ Please see the response to comment 9 regarding the unique proteins.

      L518 "These data reveal a surprising level of stage-specific phosphorylation in promastigotes, which may reflect their increased biosynthetic and proliferative activities compared to amastigotes." Overstatement. Could also be due to culture adaptation - What is the overlap of stage-specific phosphorylations with previous published datasets in other species of Leishmania? Looking at such comparisons could help to decipher the role of culture adaptation response, species specificity and true differentiation conserved mechanisms.

      We agree with the reviewer and have toned this statement down by adding the statement ‘….or simply be a consequence of culture adaptation’.

      The discussion is extremely speculative. While some speculation at this stage is acceptable, claiming direct link and feedback without further validation is probably far too stretched. For example, the changes of phosphorylation observed on particular sets of proteins, such as phosphatase and DUBs, need to be validated for their respective change of protein activity in the direction that fits the model of the authors. Those discussions should be toned down.

      We agree with the reviewer and have strongly toned down the entire discussion, emphasizing the hypothesis-building character of our results, which provide a novel framework for future experimental analyses.

      A couple of typos:

      In the phosphoproteome analysis section, "...0,2 % DCA..." should be "...0.2 % DCA..." (use a decimal point).

      L225 "...peptide match was disable." should be "...peptide match was disabled."

      Both corrected

      Reviewer #4 (Significance):

      While there is not too much novelty around the emphasis of gene expression at post-translational level in kinetoplastid organisms, the scale of the work presented here, looking at 5 layers of potential regulations, is. Therefore, this study represents a substantial amount of work and provides interesting and comprehensive datasets useful for the parasitology community.

      We thank the reviewer for this positive statement.

      Several potential concerns regarding the biological meaning of the findings were identified. These include the limitations of in vitro promastigote differentiation systems, which may constrain the conclusions, the challenge of inferring causality from correlative "omics" data, and the complexities of functional interpretation of changes in phosphorylation and metabolite levels. The proposed feedback loops and functional roles of specific molecules would require further experimental validation to confirm their biological relevance in the natural life cycle of Leishmania, but that would probably fall out of the scope of this manuscript.

      We agree with the reviewer and have modified our manuscript throughout to remove any causal relationships. Indeed, this work is setting the stage for future investigations on dissecting some of the suggested regulatory mechanisms.

      Area of expertise of the reviewers: Kinetoplastid, Differentiation, Signalling, Omics

    1. How do you think about the authenticity of the Tweets that come from Trump himself?

      I actually think that the tweets coming from Trump himself in this context are MORE authentic than the ones coming from his campaign team. As users of the platform and people living in this country, we expect a certain flavor of content out of Trump's tweets. Seeing posts from his campaign next to posts from Trump in some ways acts to muddy the waters. If the public face of a presidential candidate was always angry and negative, many voters may be turned away from that candidate. But by intermixing calm, structured posts, it makes the candidate appear able to switch their anger and negativity on and off. Which may be more appealing to voters.

    1. Author response:

      Public Reviews:

      Reviewer #1:

      Summary:

      The authors aim to study mutational paths connecting WW domains with different binding specificities. Their approach combines an unsupervised sequence generative model based on RBMs with a path-sampling algorithm. The key result is that most intermediate sequences along the designed transition paths retain measurable binding activity in wet-lab assays, whereas paths containing the same mutations introduced in a randomized order are largely nonfunctional. This difference is attributed to epistatic interactions captured by the RBM model.

      Strengths:

      Exploring mutational paths in high-dimensional protein sequence space is a challenging problem. The computational framework used here is state-of-the-art and is strengthened by systematic experimental characterization of binding activity. The study is comprehensive in scope, including multiple transition paths both within and across WW specificity classes, and the integration of modeling with high-throughput experimental validation is a clear strength.

      Weaknesses:

      A major concern is whether the stated goal of specificity switching is fully achieved. Along the sampled transition paths, most intermediate variants appear to retain specificity close to either the initial or the final class, rather than exhibiting gradually shifting specificity. For example, in Figure 4G (Class I to Class II/III), binding appears largely binary, with intermediates behaving similarly to one of the endpoints. A similar pattern is observed in Figure 3H for the Class I to Class IV transition, where binding responses are close to 0 or 1. In this sense, the specificity-switching objective is only partially realized by assigning two endpoints with different specificity. This raises a broader conceptual question: is it possible that different WW specificities evolved from a common ancestor without passing through intermediates that exhibit mixed or intermediate specificity? If so, then inferring specificity-switching pathways purely from extant natural sequences may be fundamentally challenging.

      This is a key question, which was one of the original motivations of our work. Both hypotheses, ‘abrupt switches’ (punctuated equilibria, corresponding to distinct specificities) and more gradual changes (smooth transitions, through intermediates that exhibit mixed or intermediate specificity), are possible.

      Many natural specificity-switching events have probably resulted from the need to adapt to environmental change and selection for a different specificity, which can be compatible with an abrupt change in specificity. Others may reflect the gradual evolution of promiscuous ancestral sequences to more specialized ones, losing cross-reactivity. A molecular mechanism that could allow abrupt switching is gene duplication, a frequent mechanism for WW domain diversification, beyond standard mutation-driven evolutionary processes.

      As for the specificity-switching paths for WW domains found in this work, the presence of weakly responsive cross-reactive intermediates along the designed paths for I<->IV, and their absence in the I<->II path, suggests that promiscuous domains are hard to design (see also related response to point 3 of Reviewer 2) and are generally not selected for by natural evolution (as seen from the clear clustering of extant proteins in different specificity classes).

      For a small domain such as WW, mutations that favor some specificity classes are known to have detrimental effects on fundamental properties, such as folding kinetics and stability, see Ref [72]. It is possible that larger, less constrained protein domains could allow for more cross-reactive variants and smoother specificity switching. However, experiments on fluorescent proteins looking for interpolation between two wavelengths have shown that the switch was abrupt [Poelwijk et al. Nature Communications (2019)].

      Our aim was to achieve a functional switch (imposed by the two extant end-points) through a path of designed, functional intermediates and to correctly predict, with our RBM model, the location of the specificity transition and of the cross-reactivity region (which we expected only along the I-IV path). This aim was achieved, as demonstrated by experiments.

      Reviewer #2:

      This is an extremely important work that shows how one can use generative models to construct specificity-switching mutational paths in complex fitness landscapes. The experimental evidence is very clear, and the theoretical tools are innovative.

      The work will likely have a deep impact on future research aimed at understanding how evolution navigates fitness landscapes as well as reconstructing ancestral sequences.

      The manuscript is extremely clear and well written, the experimental evidence is strong, and the methods are clearly described, so I do not have major issues to raise. A few minor issues are listed below.

      (1) I consider the WW domain as an 'easy' case from the point of view of generative modelling. The domain is rather short, epistatic effects are not very strong (e.g. Boltzmann learning usually converges very quickly to a very paramagnetic state), and the resulting models are well interpretable (e.g. the hidden units of the RBM correlate well with subclasses).

      This is not always (not often?) the case, however. In more complex proteins, the learning procedures can be slower and the resulting models less interpretable. Just for completeness, perhaps the authors could comment on the generality of the results and what they would expect for other systems based on their experience.

      We agree with Reviewer 2 that WW sequences are short and simple to handle from a computational point of view, and were chosen for this reason to test the design of full mutational paths (after having benchmarked it on lattice-protein models, see Refs. [30] and [44]). Our work gives additional support to the effectiveness of generative models learned from sequence data. This said, from a biological point of view, WW is a highly constrained domain, see comment by Reviewer 1 above and our answer.

      In longer and more complex proteins, we expect it will be more difficult to disentangle specificity-switching latent units, see Fernandez-de-Cossio-Diaz et al., Physical Review X 2023 for a discussion and a possible computational approach to this issue. Notice that, while relating the latent units to specificity classes was convenient, it was not used to generate the paths themselves. Therefore, we believe that our method is quite robust and easily generalizable to applications to more complex and longer proteins. As an illustration, we have recently used it to sample viral trajectories (more precisely, variants of the Receptor Binding Domain of the SARS-CoV-2 spike protein) capable of escaping antibody recognition, see Huot et al., PNAS 2026. In this recent work, we projected the paths onto the principal antigenic space, defined by the top two Principal Components of the viral variant binding affinities to 32 antibodies. In this representation, sampled paths displayed trends similar to natural paths, drawn from the sequences sampled during the pandemic. This finding supports the applicability and interpretation of our method for more complex proteins.

      (2) In Section 3.3, the authors say that direct paths connecting Class I and Class IV behave similarly to indirect paths, despite having lower scores according to the RBM. How generic is this? Does it also happen for other classes? This might be an important point to address, as direct paths are easier to sample.

      We think that this finding, true for paths connecting classes I and IV, is not general. In a previous paper we have benchmarked our path-designing approach on simple models of in silico lattice proteins and shown that indirect paths led to gains in the overall fitness (computed according to the ground-truth model) [Mauri, Cocco, Monasson, Physical Review E 2023, fig. 9-12].

      In general, we would expect that indirect paths could explore alternative mutations, important to compensate for transitory destabilizing mutations that could occur along the path. We speculate that, for indirect paths, these stabilizing mutations occur at the extremity near the class-I wild type. A slight decrease in binding response to peptide C1 for the direct path is nevertheless observed (see Suppl Table 4), but our experimental detection, focused on binding response, is not tailored to directly detect a difference in stability. When approaching the class-IV anchoring point, we observe that paths interpolating between classes I and IV are very constrained and show limited diversity, going through a funnel in sequence space corresponding to the direct path. We agree with Reviewer 2 that a more exhaustive comparison with direct paths would be interesting, and will add a sentence in the conclusion.

      (3) The path shown in Figure 4 goes through a region of non-functionality around sequences 18-19. It seems that the sample path is basically exploring the functional regions for Class I and Class II/III separately, trying to approach the other class, but then it can't really make the switch.

      By contrast, the path going from Class I to Class IV seems able to perform the functional switch in a single step (20-21) without losing too much of the function.

      Perhaps the authors could better comment on this? Is this a limitation of the sampling method, or a fundamental biological fact?

      Class I to Class IV paths and Class I to Class II paths fundamentally differ because the binding pocket in Class I WW domains is different from the one of Class IV WWs, while Classes I and II/III share the same binding region. This important difference may explain why class I specificity can switch to class IV specificity (steps 20-21), without completely losing affinity to the peptide of class I. To investigate whether the two binding regions are really independent, we have tested some additional specific mutations along the I-IV mutational paths. In our attempts to engineer cross-reactivity, we have observed that it is important to substantially lower affinity to class I peptide to acquire class IV specificity, in agreement with previous studies [72]. Moreover, the I to IV path seems to go through a funnel-like part in the region with no natural sequences, with the same transition intermediates obtained in several designed paths. This indicates that the Class I to Class IV functional switch is more constrained than the Class I to II switch. Let us also emphasize that our assessment of class specificity is based on one peptide for each class. It would be interesting to test multiple WW-binding peptides with similar biochemical properties to acquire a more complete view of the specificities.

      (4) On page 12, it is stated that the temperature was chosen to 1/3 to maximize the score. This is important and should be mentioned earlier (I didn't notice it until that point).

      Section 3.5 explains that RBM samples can be biased by lowering the sampling temperature to 1/3 to obtain high-scoring sequences, which are more likely to be functional as proven in [Russ et al., Science 2020]. We acknowledge (as also noted by Reviewer 1) that this section comes at the end of the manuscript, while differences in scores along the path are shown before, so the discussion of this important point is somewhat delayed. We will add a sentence earlier in Results to explain this point.
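
The effect of lowering the sampling temperature can be sketched on a toy set of sequences. This is an illustration under stated assumptions, not the RBM sampler itself: it assumes the score acts as a log-probability up to a constant, so that sampling at temperature T weights each sequence by exp(score / T); the four scores and the function names are hypothetical.

```python
import math

def boltzmann_probs(scores, temperature):
    """Probabilities p_i proportional to exp(score_i / T), stabilized by the max score."""
    top = max(scores)
    weights = [math.exp((s - top) / temperature) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def mean_score(scores, temperature):
    """Expected score of a sample drawn at the given temperature."""
    probs = boltzmann_probs(scores, temperature)
    return sum(p * s for p, s in zip(probs, scores))

# Hypothetical scores for four candidate sequences.
toy_scores = [-60.0, -58.0, -55.0, -50.0]

mean_t1 = mean_score(toy_scores, 1.0)
mean_t_third = mean_score(toy_scores, 1.0 / 3.0)

print(f"mean score at T=1:   {mean_t1:.3f}")
print(f"mean score at T=1/3: {mean_t_third:.3f}")
```

Lowering T concentrates probability mass on the highest-scoring sequences, so the expected score of a sample rises, at the cost of reduced diversity.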

      (5) On page 13, it is stated that: "However, the scores of the ancestral sequences along the phylogenetic pathways assigned by the RBM are significantly lower than the ones of the RBM-designed sequences. This result is expected as ASR reconstruction does not take into account epistasis, differently from RBM, and we expect ASR sequences to generally be of lesser quality."

      I was very surprised by this result. My own experience with ASR shows that, on the contrary, sequences found by ASR (via maximum likelihood) tend to have high scores in the (R)BM, and tend to be more stable than extant sequences. I attribute this to the fact that ASR typically finds a "consensus" sequence that maximizes the contribution to the score coming from the fields (the profile), which is typically dominant over the epistatic signal, resulting in a bigger score. Maybe the authors did not use maximum likelihood in the ASR? Some clarification might be useful here.

      We agree with Reviewer 2 that the consensus sequence is an atypical sequence for an independent model with a large RBM score. We will update Figure 5 of the manuscript to show that this is also happening in our case. 

      We use Maximum Likelihood in ASR but our ASR path corresponds to all internal nodes of the reconstructed tree joining the two extant sequences, not only to the most ancestral node. Overall, the ancestral sequences along the ASR paths are different from the consensus sequence (mean identity of 76% and 60% respectively). The most ancestral nodes in the paths are also different from the consensus, having 81% (paths between type I and IV domains) or 54% (paths between type I and II/III domains) similarity, and an RBM score of -21 or -58, respectively. We agree that some ASR internal-node sequences have a higher score than the natural wild-types (extant sequences). This is shown in Fig. 6: several points have larger RBM score than the two anchoring points at the extremities of the path, possibly due to the fact that natural sequences are not always the most stable ones. As discussed in the conclusion, ASR nodes moreover generally have better scores than the sequences obtained by sampling an independent model. Phylogenetic reconstruction implicitly takes into account some degree of co-variation between sites in natural sequences, as shown by the success of the use of the phylogenetic distance of a mutated sequence to the wild-type for predicting the fitness effect of these mutations [Laine, Mol. Biol. Evol. 2019].
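
A mean identity to the consensus, as quoted above, can be computed along the following lines. This is a minimal sketch assuming pre-aligned sequences of equal length and counting every aligned position, gaps included; the response does not state which convention was used, and the toy sequences and function names are hypothetical.

```python
def identity(seq_a, seq_b):
    """Fraction of aligned positions at which two equal-length sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)

def mean_identity(sequences, reference):
    """Average identity of a set of sequences to one reference sequence."""
    return sum(identity(s, reference) for s in sequences) / len(sequences)

# Hypothetical aligned fragments, for illustration only.
consensus = "GWEERTDP"
path_nodes = ["GWEERTDP", "GWEEKTDP", "GWTEKTSP", "AWTEKQSP"]

print(f"mean identity to consensus: {mean_identity(path_nodes, consensus):.2%}")
```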

      To better show this effect we will update Figure 6, reporting also the scores of the "scrambled" sequences, which do not respect potential epistasis extracted by the RBM. It appears that ASR sequences generally have better scores than the scrambled sequences, and lower than RBM sequences (sampled at T=1/3). RBM models take into account multi-residue correlations, which could contribute to reaching better scores than ASR and BM models. Ongoing studies on larger proteins show that the score of sequences sampled from ASR reconstruction, including the Maximum Likelihood one, can still be improved according to the RBM score by a few mutations consistent with the ASR posterior probabilities (unpublished).

      Mistakes in the reference list will be amended in the updated version.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      1. General Statements [optional]

      We thank all reviewers for their careful consideration of our manuscript. We highly appreciate their thoughtful comments and suggestions, which helped us to improve the quality of our work. We address each comment point-by-point below.

      2. Description of the planned revisions

      Reviewer #1

      Minor comments:

      Figure 5 would be more informative if it included more higher-magnification images that would reveal the staining at the cellular level.

      To fulfil the suggestion, we will perform a new round of immunostaining followed by high-resolution confocal imaging. This requires additional time for laboratory work.

      Reviewer #2:

      Major comments

      1d. The authors tried to attribute the minor phenotype to the incomplete depletion of S100A4+ cells. However, it is possible that if the S100A4+ cells only represented a minor population, their function may be compensated by other populations. This might be confirmed by quantification of S100A4+ cells or S100A4-Cre; GFP+ cells in fibroblast or CD45 populations from images shown in Figure 5.

      We will address this comment by performing the required quantifications.

      Moreover, we have now included data on the presence of S100A4+ cells in S100a4-Cre;DTA mice (Figure for Reviewers 5a,b; Supplementary Figure 7a,b in the revised manuscript), which demonstrate incomplete depletion of the S100A4+ cells in the nipple and the mammary gland. This is likely due to ongoing tissue remodeling and continuous replenishment/supply of S100A4+ cells. Another study using the same S100a4-Cre;DTA mouse model reported an efficient S100A4+ cell depletion in mandibular condyle (Tuwatnawanit et al., 2025), which suggests that the presence of S100A4+ cells in the S100a4-Cre;DTA mammary gland and nipple is due to tissue-specific dynamics rather than lack of depletion efficiency.

      We have included in Discussion: “Notably, we observed incomplete depletion of S100A4+ cells in the mammary gland and nipple. Interestingly, a study using the same S100a4-Cre;DTA mouse model reported complete S100A4+ cell depletion in the superficial layer of mandibular condyle (ref. 46). This suggests that incomplete depletion of S100A4+ cells in nipple and mammary gland is due to tissue-specific dynamics, rather than lack of depletion efficiency, indicating a compensatory mechanism that can balance the cell loss.”

      The images in Figure 5 and Figure S4 are difficult to confirm colocalization. A higher magnification image would be required for each panel. Furthermore, a precise quantification based on the current images would be more supportive of the conclusion regarding the discrepancy of the composition of S100A4 lineage between epidermis and mammary gland (lines 163-165).

      To address this comment, we will perform a new round of immunostaining and high-resolution confocal imaging and quantifications and include the results in the fully revised manuscript.

      Line 163, the authors hypothesize that these are Langerhans cells based on morphology. This should be confirmed by co-staining with F4/80 in addition to the current form of Fig 5h.

      To address this comment, we will perform co-staining of GFP and F4/80 (or, eventually, AIF1, depending on antibody availability) and include the results in the fully revised manuscript.


      Reviewer #3

      Minor comments

      Figure 2c: The H&E images are not fully convincing. Immunofluorescence analysis of epithelial architecture would support the authors' interpretation and should be feasible if tissues are already available.

      We will perform immunostaining for epithelial markers, such as keratins, and include the results in the fully revised manuscript.

      Figure 4f: The proliferation data are compelling, but the authors could extend this by examining how cell differentiation and epithelial organisation are affected.

      We will perform immunostaining for epithelial markers (keratins, αSMA) and include the results in the fully revised manuscript.

      Figure 5b: To more convincingly show that GFP+ cells contact endothelial cells, co-labelling with an endothelial marker such as CD31 would be helpful.

      We will perform the requested co-labeling of GFP and CD31 and include the results in the fully revised manuscript.

      Figure 5f-h: The structures referenced in the text (lines 159-163) should be clearly indicated on the immunofluorescence images.

      We will incorporate these explanations into the new, high-resolution/detailed Figure 5 in the fully revised manuscript.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Reviewer #1:

      Major comments

      1. It is rather difficult to conclude whether the observed nipple phenotype reflects an early embryonic/prepubertal defect in establishing the nipple stroma, is caused by a constitutive response to ongoing cell death, or a response to continuous DTA expression (or a combination of some of these).

      The data raise a couple of additional questions: Is there a nipple phenotype at 3 wk of age? It would not be totally unsurprising if ablation of a major fraction of dermal fibroblasts in the nipple area would lead to an early embryonic/prepubertal phenotype but there is no data on this. Hence, is there a "congenital" nipple deformity, as concluded by the authors (line 191)?

      We appreciate the reviewer’s insightful comments. We have now included data on embryonic nipple development. These data demonstrate abundant S100A4-lineage cells in E15.5 and E18.5 skin of S100a4-Cre;mT/mG embryos (Figure for Reviewers 1a, corresponding to Figure S3a in the revised manuscript) and normal appearance of nipple sheath in S100a4-Cre;DTA embryos at E18.5 (Figure for Reviewers 1b, corresponding to Figure S3b in the revised manuscript), suggesting no embryonic defect.

      Unfortunately, we cannot provide data on 3-week-old mice (we did not collect this timepoint previously, and this mouse line is currently not alive in our facility). Instead, we provide in situ pictures of DTA and S100a4-Cre;DTA nipples at 7 weeks of age (Figure for Reviewers 1c; Figure S3c in the revised manuscript), which demonstrate that the defective nipple phenotype is fully established at this timepoint. Because the late embryonic data did not support a “congenital” origin of the nipple deformity and we could not provide any further data from early postnatal development, we have corrected the statement “we describe a congenital nipple deformity” in the Discussion to “we describe a nipple deformity”.

      Are there S100a4+ cells in the nipple area of pubertal S100a4-Cre/DTA mice? I.e. is there a continuous supply of new S100a4+ cells and thereby continuous cell death and DTA expression as one might expect based on the RNA-seq data?

      The S100A4+ cells are present in the nipple area of S100a4-Cre;DTA mice, suggesting a continuous supply of new S100A4+ cells (Figure for Reviewers 1b, corresponding to Figure S3b in the revised manuscript; and Figure for Reviewers 5a,b, corresponding to Figure S7a,b in the revised manuscript). In the revised manuscript, we comment on this in Discussion: “Notably, we observed incomplete depletion of S100A4+ cells in the mammary gland and nipple. Interestingly, a study using the same S100a4-Cre;DTA mouse model reported complete S100A4+ cell depletion in the superficial layer of mandibular condyle46. This suggests that incomplete depletion of S100A4+ cells in nipple and mammary gland is due to tissue-specific dynamics, rather than lack of depletion efficiency, indicating a compensatory mechanism that can balance the cell loss.”

      Figure for Reviewers 1 (Figure S3 in the revised manuscript): Embryonic and pubertal nipple phenotype. (a) Representative images of cleared whole-mount S100a4-Cre;mT/mG nipple tissue at embryonic developmental time-points: E15.5 and E18.5. Scale bar = 100 µm. (b) Immunofluorescent labeling for S100A4 on embryonic DTA and S100a4-Cre;DTA whole-mount skin (E18.5). Scale bar = 100 µm. (c) Representative in situ photographs of nipples from DTA and S100a4-Cre;DTA pubertal (7-weeks old) mice. Scale bar = 1 mm.

      The subtitle on line 54 implies that S100a4-Cre/DTA mice display a branching phenotype. However, it looks to me as if there is a pubertal outgrowth defect (as is also written in the body text, line 64) rather than a branching phenotype, potentially reflecting the much smaller size of S100a4-Cre/DTA mice (Fig. 2a). Unless there is a change in branch point frequency, I suggest rephrasing the title and discussion. Instead, I suggest the authors discuss the observed outgrowth delay considering the gross overall growth defect (Fig. 2a). If ductal outgrowth was normalized to the overall growth defect, would one still observe 'a delay in branching morphogenesis'?

      We apologize for the confusing section title. We have analyzed branching frequency in 7-week-old females and observed a reduced total number of branching points in S100a4-Cre;DTA mice (Figure for Reviewers 2a, corresponding to Figure 2f in the revised manuscript). The difference in the number of branching points remained significant after normalization to body weight (Figure for Reviewers 2c, corresponding to Figure 2h in the revised manuscript). We have now added the new quantifications to the revised manuscript with accompanying descriptions in the main text: “Analysis of mammary epithelial development using whole-mount carmine staining revealed no significant differences in the prenatal establishment of the mammary epithelial tree but did reveal significantly delayed epithelial outgrowth and reduced branching in pubertal (7 weeks old) S100a4-Cre;DTA mice (Figure 2e,f). Normalization of epithelial outgrowth and branching to body weight indicates that the observed defect represents a mammary-specific impairment rather than a consequence of reduced body growth (Figure 2g,h).”

      Figure for Reviewers 2 (Figure 2 in the revised manuscript): Pubertal branching morphogenesis is delayed in S100a4-Cre;DTA. (a-c) The plots show total number of branching points (a), epithelial outgrowth [mm] normalized to body weight [g] (b), and total number of the branching points normalized to body weight [g] (c) in 7-week-old DTA and S100a4-Cre;DTA mice. All plots show the mean ± SD, *p
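
      As an illustration of what normalization to body weight involves, the following minimal sketch divides each per-animal measurement by that animal's body weight and summarizes the result as mean ± SD. The function names and all numbers are hypothetical placeholders chosen for the example; they are not data or code from the study.

```python
from statistics import mean, stdev


def normalize_to_body_weight(values, body_weights_g):
    """Divide each per-animal measurement by that animal's body weight in grams."""
    if len(values) != len(body_weights_g):
        raise ValueError("one body weight is required per animal")
    return [v / w for v, w in zip(values, body_weights_g)]


def mean_sd(values):
    """Return (mean, sample standard deviation), as shown in a mean ± SD plot."""
    return mean(values), stdev(values)


# Hypothetical placeholder values for three animals; NOT data from the study.
branch_points = [120, 150, 90]
body_weights_g = [20.0, 25.0, 15.0]

normalized = normalize_to_body_weight(branch_points, body_weights_g)
summary = mean_sd(normalized)
```

      In this placeholder example the animals differ only in size, so the normalized values coincide. Under the simplifying assumption that a measure scales proportionally with body weight, a group difference that persists after normalization is therefore not explained by body size alone.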

      Fig. 4e shows Masson's Trichrome and Picrosirius Red staining and the authors report the findings as follows (lines 120-124): "collagen fibers were loosened in the DTA nipples and more densely packed in the S100a4-Cre;DTA nipples". Perhaps the authors could help non-specialists to observe the loosened fibers and if they wish to make quantitative statements ("more densely packed"), such statements should be backed-up by quantifications.

      Picrosirius Red staining viewed under polarized light is a classic way to assess collagen organization, thickness, and packing. Red / orange / yellow color typically marks thicker, more mature, and more tightly packed collagen fibers (often associated with type I collagen), while green color usually marks thinner, less organized, or less densely packed fibers (often associated with type III collagen or immature collagen). We had included this explanation in the Figure legend of the submitted manuscript already: “Typically, thicker collagen fibers exhibit stronger birefringence and appear red or orange, while thinner fibers exhibit weaker birefringence and appear green or yellow.” To help with the quantification, we have extracted the red channel and quantified color intensity. The results are shown in Figure for Reviewers 3, corresponding to Figure S4 in the revised manuscript. Moreover, we will also quantify the differences in pattern of the collagen fibers. The fibers in DTA nipples look shorter and more curved, while the fibers in S100a4-Cre;DTA nipples look longer and straighter, more aligned. The results will be included in the fully revised manuscript.

      Figure for Reviewers 3 (Figure S4 in the revised manuscript): Collagen fibers are densely packed in S100a4-Cre;DTA nipples. (a) Representative pictures of histological sections of DTA and S100a4-Cre;DTA nipples stained for collagen by Picrosirius red. Polarized light images and the red channel (mature/densely packed collagen) are shown alongside detail pictures of selected regions A and B. Scale bar = 200 µm and 100 µm (in detail pictures). (b) Quantification of Intensity Mean Value for the red channel (densely packed collagen), showing a statistically non-significant difference. The plot shows the mean ± SD, ns p > 0.05 (Mann-Whitney test), n = 3 DTA / 4 S100a4-Cre;DTA.
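
      The quantification described above (mean red-channel intensity per image, compared between groups with a Mann-Whitney test) can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis pipeline used in the study: images are assumed to be available as rows of (R, G, B) tuples, the function names are hypothetical, the exact test assumes small samples without ties, and the group values are placeholders rather than measured intensities.

```python
from itertools import combinations


def red_channel_mean(pixels):
    """Mean red-channel value of an RGB image given as rows of (r, g, b) tuples."""
    values = [r for row in pixels for (r, _g, _b) in row]
    return sum(values) / len(values)


def mann_whitney_exact(x, y):
    """Exact two-sided Mann-Whitney U test by enumerating all group assignments."""

    def u_stat(a, b):
        # Count pairs in which the a-value exceeds the b-value (ties count half).
        return sum((ai > bj) + 0.5 * (ai == bj) for ai in a for bj in b)

    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    expected_u = n1 * n2 / 2
    observed = abs(u_stat(x, y) - expected_u)

    indices = range(n1 + n2)
    extreme = total = 0
    for chosen in combinations(indices, n1):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Count assignments at least as far from the expected U as the observed one.
        if abs(u_stat(a, b) - expected_u) >= observed - 1e-12:
            extreme += 1
    return u_stat(x, y), extreme / total


# Tiny hypothetical 2x2 image; NOT data from the study.
image = [[(200, 10, 10), (100, 50, 0)], [(0, 0, 0), (100, 5, 5)]]
intensity = red_channel_mean(image)

# Hypothetical per-animal intensities, fully separated between groups (n = 3 vs 4).
group_dta = [1.0, 2.0, 3.0]
group_cre_dta = [4.0, 5.0, 6.0, 7.0]
u_value, p_value = mann_whitney_exact(group_dta, group_cre_dta)
```

      With group sizes of 3 and 4 there are only 35 possible group assignments, so the smallest two-sided p-value the exact test can return is 2/35 ≈ 0.057, which is what the fully separated placeholder groups above produce. In practice a library routine such as `scipy.stats.mannwhitneyu` would normally be used instead of a hand-written enumeration.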

      I found the Discussion on the various mouse models somewhat problematic. Overall, the paper is written in a way that often leaves it unclear whether it refers to studies addressing the role of S100a4 itself, studies addressing the function of S100a4+ cells via ablation approaches (S100a4-Cre or S100a4-CreERT2 crossed with floxed DTA), or those where S100a4-Cre has been used to delete gene X/Y/Z. These are all very different experimental approaches where one approach is not necessarily informative when trying to understand the results from another one. The authors should make these points clear and consider whether all their discussion points are relevant.

      We apologize for the confusion. We have carefully reviewed the references and their interpretations, and corrected them as necessary.

      The abstract states S100a4 (fibroblast-specific protein 1) is "expressed by mesenchymal cells and has been implicated in the development of eccrine glands, hair follicles, and mammary branching morphogenesis". However, the study on eccrine glands (ref. 19) shows that S100A4+ cells play a role in eccrine gland development but it does not address the role of S100a4 itself, while the study on hair follicles (ref.20) in turn reports the expression pattern of S100a4 in hair follicles but does not address its function, nor the role of S100a4+ cells. Finally, I failed to find references in the paper to studies addressing the role of S100a4, or S100a4+ cells in the mammary gland.

      Instead, the paper had references to studies where S100A4-Cre had been used to delete different genes and these mice had various mammary phenotypes - which, as indicated above, is a very different approach compared to deleting S100a4 or ablating S100a4+ cells.

      Thank you for your comment. We have addressed this concern in the Abstract and further in the Discussion. We revisited the cited studies and now present them more carefully, clearly distinguishing the different approaches and their particular findings.

      In our literature review, we also considered studies that used the S100a4-Cre mouse model to manipulate gene expression within S100A4+ cells. We believe that these studies provide indirect evidence of S100A4+ cell involvement in the development and/or homeostasis of a tissue, such as the mammary gland. Please find the rephrased part of the Abstract in the text and below:

      “S100A4 (S100 calcium binding protein A4, also known as fibroblast-specific protein 1) is expressed by mesenchymal cells and has been associated with hair follicle regeneration. S100A4-expressing cells have been implicated in the development of eccrine glands, and studies using S100a4-Cre to manipulate gene function have suggested that S100A4-expressing cells may contribute to mammary branching morphogenesis.”

      In Discussion (lines 197-200), the authors write: "We described significant delay in mammary branching morphogenesis in puberty, confirming an important role for S100A4+ cells in mammary development, as it was previously described (refs 37-39)."

      It should be noted that none of these studies addressed the role of S100A4+ cells:

      • Ref 37 used S100a4-Cre to delete sharpin

      • Ref 38 used the same Cre line to delete Ptch1, did not address the role of S100a4 or S100a4 expressing cells

      • Likewise ref 39 deleted another gene using S100a4-Cre

      Later on in Discussion, the authors compare the reported phenotype to previous studies (lines 248-255): "...targeting S100A4+ cells through knockout experiments can result in severe phenotypes, such as a reduction in adipose tissue (ref 26), skin phenotypes, a disrupted estrous cycle, reduced fertility (ref. 38), and complete infertility, hypogonadism and defects in pituitary endocrine function (ref. 28)."

      Of these, Ref. 26 used the same approach as the current study (S100a4-Cre; DTA) (Fig. 7A in the paper)

      • these mice were significantly lean, with markedly reduced fat compared with the control mice - also the mice in the current study are very small, so perhaps they could also be described as 'lean'. Yet ref. 26 reports that female mice had comparable food uptake, respiratory exchange ratio and physical activity, and slightly increased energy expenditure

      Ref. 38 (as mentioned above) reports deletion of Ptch1 using S100a4-Cre lines and these mice "displayed a disrupted estrous cycle and dramatically reduced fertility over 6.5 weeks". However, this has nothing to do with the approaches where Fsp1/S100a4+ cells are depleted with DTA. Likewise, reference 28 analyzed the phenotype of S100a4-Cre;Ptch1fl/fl mice. Obviously, deleting Ptch1 using S100a4-Cre mice is quite a different approach than "targeting S100A4+ cells through knockout experiments". Ptch1 deletion leads to a combination of gain-of-function (of Hedgehog activation) and loss-of-function (loss of Hh-independent functions of Ptch1) and hence comparisons with these phenotypes are rather challenging. I suggest the authors focus their phenotype comparisons on ref. 26, where S100a4/Fsp1+ cells were ablated with DTA, i.e. the same approach as in the current study.

      Please, find the rephrased part of Discussion in the text (lines 236-256), and below:

      “A key consideration when interpreting studies involving S100A4 is that fundamentally different experimental approaches have been used to investigate its role. These include descriptive analyses of S100A4 expression, functional studies targeting the S100A4 protein itself, genetic models using S100a4-Cre to manipulate unrelated genes in S100A4-expressing cells, and ablation models such as S100a4-Cre;DTA, which deplete S100A4⁺ cells. These approaches are not equivalent and provide distinct types of information. In the present study, we specifically assess the consequences of ablating S100A4-expressing cells, and comparisons to other studies should therefore be interpreted within this context.

      Studies using S100a4-Cre to manipulate specific signaling pathways (e.g. Wnt or Hedgehog signaling via gene deletion) in S100A4-expressing cells have reported diverse phenotypes, including effects on fertility and endocrine function28,34. However, these phenotypes primarily reflect the consequences of pathway perturbations within S100A4-expressing cells rather than the role of S100A4⁺ cells themselves. This is fundamentally different from the ablation approach used here, which removes the S100A4⁺ cell population.

      In contrast, studies employing S100a4-Cre–driven DTA–mediated ablation represent a directly comparable approach. Such studies have reported systemic phenotypes, including reduced adipose tissue and altered metabolic parameters26, indicating that S100A4-expressing cells contribute to multiple aspects of tissue homeostasis. Consistent with these previous reports, S100a4-Cre;DTA mice used in our study were significantly smaller than their littermates. Our findings extend these observations by identifying a specific and previously unrecognized role for this cell population in nipple morphogenesis.”

      I find the Discussion is somewhat off the topic by starting with WHO recommendations on breastfeeding and linking this to observed mouse phenotype. Overall, the discussion is rather long and from time-to-time more like a literature review. I would recommend keeping the Discussion more succinct and focused.

      To improve the conciseness and focus of Discussion, we have deleted this part of text.

      **Referee cross-commenting**

      I agree with the comments of other reviewers. However, to me it seems that the analysis of S100a4 knockout mice would not be feasible within a reasonable timeframe and would represent a study of its own. My understanding was that the authors were not interested in S100a4 itself. Rather, S100a4-Cre was used as a tool to understand the importance of a certain (fibroblast) cell population for mammary gland morphogenesis.

      Indeed, our goal was to study the role of a specific cell population (S100A4+ cells) in mammary gland morphogenesis, not to study the role of S100A4 protein per se.

      Reviewer #1 (Significance (Required)): General assessment:

      This study reveals the importance of the S100a4+ cell lineage for nipple formation while showing the same cells are dispensable for mammary gland morphogenesis. The main limitation is that it remains unclear whether the observed nipple phenotype is derived from an early embryonic/prepubertal defect in establishing the nipple stroma, is caused by a constitutive response to ongoing cell death, or a response to continuous DTA expression (or a combination of some of these). Hence its relevance as a model of human inverted nipple condition remains rather speculative.

      Thank you for your consideration of our work and your valuable feedback. We did not intend to claim that the S100a4-Cre;DTA mouse represents a model of the human inverted nipple condition; however, it might resemble that condition in its morphological features. We have now rephrased the Discussion so that it is clearer and more concise.

      Reviewer #2

      Major comments:

      1. My key concern is the discussion part. I think the authors need to re-organize/re-phrase the discussion part, it confused me a bit in terms of logic, phrases and interpretation of literatures.

      We have significantly re-organized and re-phrased the Discussion.

      Here are few examples:

      1. The lines 195-199 contain lot of repeated information

      We have rephrased the paragraph and removed repeated information. The new text can be found in lines 201-206 in the revised manuscript.

      1. The authors mentioned the studies in ref 26,28 and 38 using "targeting S100A4+ cells through knockout experiment can result in severe phenotypes". This is very misleading. Those studies used the same (or similar if the origin is different) S100A4-Cre line as the current study but induced the activation of Wnt and sHH signalling pathways, respectively. The observed phenotypes are largely due to the pathway function, rather than the S100A4 gene or normal S100A4+ cell itself. This differs significantly from the current study.

      We apologize for the confusion; we have now rephrased our claims (lines 236-256):

      “A key consideration when interpreting studies involving S100A4 is that fundamentally different experimental approaches have been used to investigate its role. These include descriptive analyses of S100A4 expression, functional studies targeting the S100A4 protein itself, genetic models using S100a4-Cre to manipulate unrelated genes in S100A4-expressing cells, and ablation models such as S100a4-Cre;DTA, which deplete S100A4⁺ cells. These approaches are not equivalent and provide distinct types of information. In the present study, we specifically assess the consequences of ablating S100A4-expressing cells, and comparisons to other studies should therefore be interpreted within this context.

      Studies using S100a4-Cre to manipulate specific signaling pathways (e.g. Wnt or Hedgehog signaling via gene deletion) in S100A4-expressing cells have reported diverse phenotypes, including effects on fertility and endocrine function28,34. However, these phenotypes primarily reflect the consequences of pathway perturbations within S100A4-expressing cells rather than the role of S100A4⁺ cells themselves. This is fundamentally different from the ablation approach used here, which removes the S100A4⁺ cell population.

      In contrast, studies employing S100a4-Cre–driven DTA–mediated ablation represent a directly comparable approach. Such studies have reported systemic phenotypes, including reduced adipose tissue and altered metabolic parameters26, indicating that S100A4-expressing cells contribute to multiple aspects of tissue homeostasis. Consistent with these previous reports, S100a4-Cre;DTA mice used in our study were significantly smaller than their littermates. Our findings extend these observations by identifying a specific and previously unrecognized role for this cell population in nipple morphogenesis.”

      1. In lines 253-255, why do the authors believe that complete S100A4+ cell depletion would be fatal to the mouse? Is there a study suggesting that? Or have the authors checked the expression of S100A4 in the S100A4-Cre;DTA model to confirm the efficiency?

      We have now included, also in response to other Reviewers’ comments, data on S100A4 expression in the S100A4-Cre;DTA model (Figure for Reviewers 5, corresponding to Figure S7 in the revised manuscript), and commented on these results in lines 257-262: “Notably, we observed incomplete depletion of S100A4+ cells in the mammary gland and nipple. Interestingly, a study using the same S100a4-Cre;DTA mouse model reported complete S100A4+ cell depletion in the superficial layer of mandibular condyle48. This suggests that incomplete depletion of S100A4+ cells in nipple and mammary gland is due to tissue-specific dynamics, rather than lack of depletion efficiency, indicating a compensatory mechanism that can balance the cell loss.”

      In Fig. 1, the authors described the impaired nursing capacity of S100A4-Cre;DTA dam. However, it seems the litter size is also smaller (Fig 1a). Do the authors have any explanation or hypothesis?

      Thank you for this insightful observation. It is well established that metabolic and nutritional condition directly affects female reproductive functions. Adult S100A4-Cre;DTA mice are generally smaller than their littermates, potentially because of lower body fat content or another anatomic/metabolic condition that might negatively influence fecundity, for instance by lowering ovulation rate and/or embryonic survival. In support of this, earlier studies have reported a positive correlation between growth rate/body condition and litter size (Eisen & Durrant, 1980). Unfortunately, in the case of S100A4-Cre;DTA mice, we can only speculate about the possible explanations, as we do not have data that could confirm them.

      In lines 181-184, the authors state "the results showed that the tissue reacted to a foreign chemical or an endogenous compound....." Which results are being referred to here? I could not find any inflammation related GO terms in figure 6b. It would be more accurate to specify them in lines 179-181, which appears to be a technical statement rather than a result in current form.

      Thank you for this comment. Indeed, there are no GO terms explicitly labeled as “inflammation” and “repair”; however, several GO terms are functionally related to these processes. Our interpretation was based on the broader biological context rather than the explicit annotation. To clarify this, we revisited the text and included GO terms that reflect the tissue response (lines 187-193).

      “The GO terms indicated that the tissue reacted to a foreign chemical or an endogenous compound (xenobiotic metabolic process, cellular response to xenobiotic stimulus, response to xenobiotic stimulus, epoxygenase P450 pathway), and responded to inflammation and repair (actin filament-based process, actin cytoskeleton organization; eicosanoid and lipid metabolic processes) (Figure 6b).”

      Lines 182-184 were not clear. Do the authors refer to the "nipple tissue response" in general as a malfunction of development, or as inflammation and tissue repair as mentioned in the previous sentence? If the latter, the authors should consider that the failure of lactation might mimic involution, which may cause apoptosis and inflammation as well. This might be independent of the DTA expression.

      Thank you for raising this point. Indeed, in these lines we refer to ongoing tissue inflammation and repair. We also considered the hypothesis that the inability to eject milk (and the consequent milk stasis) triggers involution. However, tissues were collected within a few hours after parturition, when only very early signs of involution, if any, would be detectable; therefore, we expect minimal influence of involution. To reflect this comment, we added new text to the Discussion (lines 272–277): “The observed tissue response can be also associated with hallmarks of mammary involution, the process which is triggered by the milk stasis. However, the tissues were collected within few hours after parturition, when the effect of involution should be minimal53. Rather, we hypothesize that immune cell recruitment, and the upregulation of the lipid skin barrier might be caused in response to the continuous apoptosis of S100A4+ cells and their replacement.”

      Minor comments:

      1. The authors demonstrated in Figure S1 and lines 92-96 that no significant differences were observed in pituitary glands and ovaries in S100a4-Cre;DTA and DTA mice. Have the authors checked the S100A4 expression or lineage cells in these organs, or has this been reported by others?

      Yes, we checked the S100A4-lineage cells in the pituitary gland and ovary and have now included the results here (Figure for Reviewers 4a,b, corresponding to Figure S1a,b in the revised manuscript), along with a relevant text description (lines 94-95 in the revised manuscript): “We observed S100A4-lineage traced cells in pituitary gland and ovaries using S100a4-Cre;mT/mG model (Figure S1a,b).” The presence of S100A4+ cells in these organs was also reported previously (Ren et al., 2019).

      Figure for Reviewers 4 (Figure S1 in the revised manuscript): S100A4-lineage cells are abundant in the pituitary gland and ovary. (a) Representative images of a cleared whole-mount pituitary gland from a S100a4-Cre;mT/mG mouse. (b) Representative images of a cleared whole-mount ovary from a S100a4-Cre;mT/mG mouse. Scale bar = 100 µm.

      The authors have performed live imaging to evaluate the contraction of alveoli. It would be better to include a video together with the snapshots showed in Figure S2.

      We have included the videos as supplementary movies, Movie S1 (DTA) and Movie S2 (S100a4-Cre;DTA).

      Since the study is mainly using S100a4, it would be better to avoid using FSP1 in the results, for example Fig 5h.

      We apologize for this oversight; it has now been corrected.

      What does L1 stand for? Lactation Day 1? It should be spelt out in the first instance.

      Yes, indeed, L1 is lactation day 1. Please note that it was already spelled out in the first version of the manuscript, now in line 48.

      Line 150. Figure S4 should be Figure S4a.

      (Please note that, because new supplementary figures have been added, this comment now refers to Figure S6 in the revised manuscript.) Thank you for this comment. In the text, we state “GFP+ cells were spread throughout the fat pad but were also localized in the periepithelial stroma and infiltrated the epithelium”. We show this in both Figure S6a and Figure S6b; therefore, we have now changed the reference accordingly, which should be more accurate.

      **Referee cross-commenting**

      I agree with the other reviewers, as well as the Consultation Comments. The manuscript would benefit greatly from a thoroughly optimised Discussion section to address issues raised by all reviewers.

      Reviewer #2 (Significance (Required)):

      • Overall, this study is well designed and the key findings are valid, especially the role of S100A4 during nipple development is novel and interesting.

      • One limitation of the study is that RNA-seq was performed using a mixture of all cell types present in the nipple. This approach is reasonable, given that depletion of the S100A4+ lineage may exert both direct and indirect effects contributing to nipple dysfunction, but it should be more clearly acknowledged and discussed in the manuscript. Additionally, this experimental design may limit the utility of the dataset for other researchers interested in nipple development and the specific functions of S100A4.

      Reviewer #3

      Major comments:

      2) The differential systemic versus mammary-specific effects of DTA-mediated S100A4 cell ablation are intriguing. The authors should address why the mammary fat pad appears unaffected.

      Thank you for this comment. The role of S100A4+ cells in adipose tissue was previously reported (Zhang et al., 2018). The authors reported significantly smaller adipose tissue in S100a4-Cre;DTA mice (males and females), measured as the weight of the dissected fat pad. In our work, we measured the in situ area of the fat pad, which appeared to be unaffected. It is possible that the volume (weight) of the fat pad differs; however, we do not have data to confirm or reject this hypothesis.

      Are S100A4 expressing cells present during embryonic mammary development, or are they mainly postnatal? Would an inducible S100A4CreERT model lead to similar phenotypes, or might the timing of depletion influence the outcome? Discussing these points would reinforce the conclusions regarding the contribution of S100A4-expressing cells to mammary and nipple development and could also clarify the transient nature of the ductal branching phenotype.

      S100A4-expressing cells are also present during embryonic mammary development. Please refer to the embryonic lineage-tracing time-points incorporated in the first version of the manuscript (Figure 5a and Figure S6a). We have now added Figure for Reviewers 1 (corresponding to Figure S3 in the revised manuscript), which focuses on the embryonic nipple phenotype but also provides information on the presence of S100A4+ cells.

      We agree that the use of inducible S100a4-CreERT model could potentially bring new insights toward developmental stage-specific roles of S100A4+ cells, and thus would be interesting to use in a follow-up study. Currently, such experiments are beyond our capacity.

      Therefore, we have included a new subsection on Limitations of the study, where we comment:

      “A major limitation of this study is that the timing of DTA-mediated cell depletion cannot be precisely defined in the constitutive mouse model employing S100a4-Cre because recombination may occur continuously following the initial expression of S100a4 (E8.5; ref. 18). This limitation could be overcome by usage of inducible S100a4-CreERT instead. With this approach, it could be more feasible to determine if the nipple deformity arises as a defect of embryonic development or postnatal morphogenesis.”

      3) Although the authors attribute lactation failure primarily to defects in nipple architecture, the RNA seq data reveal downregulation of key milk production genes and luminal differentiation keratins, strongly suggesting impaired secretory activation. The authors should more explicitly discuss the relative contributions of epithelial functional maturation defects versus nipple structural abnormalities to the lactation failure observed upon S100A4+ cell depletion.

      Thank you for this comment. We believe that immunofluorescence labeling of epithelial architecture (requested in Minor comment 2) could shed more light on this. However, we infer that secretory activation is not impaired, because the milk observed in in situ wholemounts and in H&E-stained alveoli (Figure 3d) implies luminal secretion of milk components. The observed phenotype of the lactating mammary gland strongly suggests that a structural abnormality inhibits milk ejection.

      The downregulation of key milk production genes and luminal keratins in the bulk RNA-seq data may be influenced by differences in tissue composition between samples. In control mice, more fully developed nipples and an extended ductal network likely contribute to a greater representation of differentiated luminal epithelial cells, thereby increasing the expression of these markers.

      Minor comments:

      1. Figure 1: Including an immunohistochemistry or immunofluorescence control confirming depletion of S100A4 expressing cells would strengthen the conclusions.

      We have now included Figure for Reviewers 5 that corresponds to Figure S7 in the revised manuscript and comment on the results in sections Results (lines 169-171) and Discussion (lines 257-262).

      In Results: “Interestingly, S100A4 antibody labeling revealed presence of S100A4+ cells in S100a4-Cre;DTA tissues (Figure S3b, Figure S7a,b).”

      In Discussion: “Notably, we observed incomplete depletion of S100A4+ cells in the mammary gland and nipple. Interestingly, a study using the same S100a4-Cre;DTA mouse model reported complete S100A4+ cell depletion in the superficial layer of mandibular condyle48. This suggests that incomplete depletion of S100A4+ cells in nipple and mammary gland is due to tissue-specific dynamics, rather than lack of depletion efficiency, indicating a compensatory mechanism that can balance the cell loss.”

      Figure for Reviewers 5 (Figure S7 in the revised manuscript): S100A4+ cells are found in S100a4-Cre;DTA nipple and mammary tissues. (a) Immunofluorescent labeling for S100A4 and vimentin on FFPE sections of DTA and S100a4-Cre;DTA L1 nipples. (b) Immunofluorescent labeling for S100A4 and smooth muscle actin on FFPE sections of DTA and S100a4-Cre;DTA L1 mammary gland. Scale bar = 100 µm.

      Figure 3c: The histological defects more accurately reflect failure of secretory activation rather than "lactation failure" per se. The terminology should be refined to reflect this more precisely.

      Thank you for this comment. As explained in the response to your major comment 3, we believe our results show that secretory activation is conserved in S100a4-Cre;DTA lactating mice. We understand that “lactation failure” might be misleading terminology, as milk production is conserved as well. We have therefore changed the phrasing to “nursing defect” (lines 51, 73, 83), as this reflects the phenotype most precisely.

      **Referee cross-commenting**

      I agree with the Reviewer, the authors do not need to do knockout experiments in the revised manuscript. However, it would be great if they could address my comment in the discussion.

      Reviewer #3 (Significance (Required)):

      This is an important study for mammary developmental biology, addressing the relatively understudied mechanisms that govern nipple development at the stromal-epithelial interface, and the determinants of lactational performance. A major strength is the elegant integration of DTA-mediated cell ablation, advanced imaging, lineage tracing, and transcriptomics to uncover previously uncharacterised roles for S100A4-expressing stromal populations in shaping nipple morphology and function. The work lays a foundation for future studies into nipple biology and pathologies and mechanisms underlying successful lactation.

      Although the study is already mature, it could be further strengthened by incorporating more specific genetic models, such as inducible S100A4CreERT or S100A4 gene knockout/knockdown approaches.

      Thank you for your appreciation of our work.

      4. Description of analyses that authors prefer not to carry out

      Reviewer #1

      Major Comment 1.

      It is rather difficult to conclude whether the observed nipple phenotype reflects an early embryonic/prepubertal defect in establishing the nipple stroma, is caused by a constitutive response to ongoing cell death, or a response to continuous DTA expression (or a combination of some of these). The data raise a couple of additional questions: Is there a nipple phenotype at 3 wk of age?...

      Unfortunately, we cannot provide data on 3-week-old mice because we did not collect such samples and we had to terminate our mouse colony due to an infection in the animal house. Reanimating the mouse line is possible because we stored sperm from it, but this would take considerable time and resources. Nevertheless, we have tried to address this comment by providing other relevant available data (see Figure for Reviewers 1).

      Reviewer #2

      Major Comment 3.

      In Fig S1c, d and lines 93-96, the authors investigated the estrus cycles to determine the potential cause of lactation failure. The data was presented as the number of mice in each stage. A more intuitive approach would be to follow the same mice for two to three cycles and observe the duration of each stage.

      We agree that the suggested approach would be more accurate in determining truly cycling females. Unfortunately, we cannot currently perform this experiment because these mice are no longer alive. Nevertheless, because the S100a4-Cre;DTA females bore pups, they must have cycled and were fertile.

      Reviewer #3

      Major comment 1.

      While the S100A4Cre::DTA model is powerful for evaluating the roles of S100A4 expressing cells, the authors should discuss the potential outcomes of using S100A4 knockout or knockdown approaches. If the authors have such data available, this could help distinguish phenotypes caused by loss of S100A4 function itself from those arising due to ablation of S100A4 expressing cell populations and would add mechanistic depth to the study.

      We thank the Reviewer for this insightful suggestion. We agree that genetic approaches targeting S100A4 function (e.g., knockout or knockdown) could, in principle, help disentangle cell-autonomous effects of S100A4 from those resulting from the loss of S100A4-expressing cell populations. However, we would like to clarify that the primary objective of our study is to investigate the functional contribution of S100A4⁺ stromal cells at the population level, rather than to dissect the molecular function of S100A4 protein per se. In this context, the S100A4-Cre;DTA model provides a well-established and appropriate strategy to ablate this cell population and assess its role in tissue development. Importantly, S100A4 is not only a functional protein but also a widely used marker of a heterogeneous stromal cell population. Genetic ablation of S100A4 itself would not eliminate these cells, and may result in relatively subtle or compensable phenotypes due to functional redundancy within the S100 protein family or context-dependent roles of S100A4. Therefore, such approaches would address a distinct biological question and may not directly recapitulate the phenotypes observed upon cell ablation.

      References

      Eisen, E. J., & Durrant, B. S. (1980). Genetic and Maternal Environmental Factors Influencing Litter Size and Reproductive Efficiency in Mice. Journal of Animal Science, 50(3), 428–441. https://doi.org/10.2527/jas1980.503428x

      Ren, Y. A., Monkkonen, T., Lewis, M. T., Bernard, D. J., Christian, H. C., Jorgez, C. J., Moore, J. A., Landua, J. D., Chin, H. M., Chen, W., Singh, S., Kim, I. S., Zhang, X. H. F., Xia, Y., Phillips, K. J., MacKay, H., Waterland, R. A., Cecilia Ljungberg, M., Saha, P. K., … Richards, J. A. S. (2019). S100a4-Cre–mediated deletion of Ptch1 causes hypogonadotropic hypogonadism: Role of pituitary hematopoietic cells in endocrine regulation. JCI Insight, 4(14). https://doi.org/10.1172/jci.insight.126325

      Tuwatnawanit, T., Wessman, W., Belisova, D., Sumbalova Koledova, Z., Tucker, A. S., & Anthwal, N. (2025). FSP1/S100A4-Expressing Stem/Progenitor Cells Are Essential for Temporomandibular Joint Growth and Homeostasis. Journal of Dental Research, 104(5), 551–560. https://doi.org/10.1177/00220345251313795

      Zhang, R., Gao, Y., Zhao, X., Gao, M., Wu, Y., Han, Y., Qiao, Y., Luo, Z., Yang, L., Chen, J., & Ge, G. (2018). FSP1-positive fibroblasts are adipogenic niche and regulate adipose homeostasis. PLoS Biology, 16(8). https://doi.org/10.1371/journal.pbio.2001493

    1. Author Response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Mancl et al. present a comprehensive integrative study combining cryo-EM, SAXS, enzymatic assays, and molecular dynamics (MD) simulations to characterize conformational dynamics of human insulin-degrading enzyme (IDE). In the revised manuscript, the study now also includes time-resolved cryo-EM and coarse-grained MD simulations, which strengthen the mechanistic model by revealing insulin-induced allostery and β-sheet interactions between IDE and insulin. Together, these results expand the original mechanistic insight and further validate R668 as a key residue governing the open-close transition and substrate-dependent activity modulation of IDE.

      Strengths:

      The authors have substantially expanded the experimental scope by adding time-resolved cryo-EM data and coarse-grained MD simulations, directly addressing requests for mechanistic depth and temporal insight. The integration of multiple resolution scales (cryo-EM heterogeneity analysis, all-atom and coarse-grained MD simulations, and biochemical validation) now provides a coherent description of the conformational transitions and allosteric regulation of IDE. The addition of Aβ degradation assays strengthens the claim that R668 modulates IDE function in a substrate-specific manner. Finally, the manuscript reads more clearly: figure organization, section headers, and inclusion of a new introductory figure make it accessible to a broader audience. Overall, the revision reinforces the conceptual advance that the dynamic interdomain motions of IDE underlie both its unfoldase and protease activities and identifies structural motifs that could be targeted pharmacologically.

      Weaknesses:

      While the authors acknowledge that future studies on additional IDE substrates (e.g., amylin and glucagon) are warranted, such experiments remain outside the present scope. Their absence modestly limits the generalization of the R668 mechanism across all IDE substrates. Despite improved discussion of kinetic timescales and enzyme-substrate interactions, experimental correlation between MD timescales and catalysis remains primarily inferential. The moderate local resolution of some cryo-EM states (notably O/pO) continues to limit atomic interpretation of the most flexible regions, though the authors address this carefully.

      Reviewer #2 (Public review):

      Summary:

      The manuscript describes various conformational states and structural dynamics of the insulin-degrading enzyme (IDE), a zinc metalloprotease. Both open- and closed-state structures of IDE have previously been solved using crystallography and cryo-EM, which reveal a dimeric organization of IDE where each monomer is organized into N and C domains. The C-domains form the interacting interface in the dimeric protein, while the two N-domains are positioned on the outer sides of the core formed by the C-domains. It remains elusive how the open state is converted into the closed state, but it is generally accepted that it involves large-scale movement of the N-domains relative to the C-domains. The authors have used various complementary experimental techniques, such as cryo-EM, SAXS, size-exclusion chromatography, and enzymatic assays, to characterize the structure and dynamics of IDE in the presence of its substrate insulin, whose density is captured in all the structures solved. The experimental structural data from cryo-EM suffered from a high degree of intrinsic motion among the different domains, and consequently the resultant structures were moderately resolved at 3-4.1 Å resolution. A total of five structures were generated using cryo-EM in the originally submitted manuscript. Another (sixth) cryo-EM reconstruction at 5.1 Å resolution, obtained using time-resolved cryo-EM experiments, was added after the first revision. The authors have made extensive use of molecular dynamics simulations to identify important inter-subunit contacts, which involve residues such as R668, E381, and D309. In summary, the authors have explored the conformational dynamics of IDE using experimental approaches that are complemented and analyzed in atomic detail by MD simulation studies. The studies are meticulously conducted and lay the groundwork for future exploration of protease structure-function relationships.

      Comments after first peer-review:

      The authors have addressed all my concerns, and have added new data and explanations in terms of time-resolved cryo-EM (Fig. 7) and upside simulations (Fig. 8) which in my opinion have strengthened the merit of the manuscript.

      We are grateful for the dedication and constructive feedback provided by the editors and reviewers. We have revised our manuscript according to the suggestions by both reviewers.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The new version of the manuscript reads exceedingly well and the corrections the authors have made during their revision made the manuscript much easier to read and digest than the first version. Below are minor details that may be corrected:

      Abstract:

      Line 45-47: "IDE is known to transition between a closed state, poised for catalysis, and an open state, able to release cleavage products and bind a new substrate." (consider adding a)

      Fixed

      Line 48-50: "Combining cryo-EM heterogeneity analysis with all-atom molecular dynamics (MD) simulations, we identified the structural basis and key residues for IDE conformational dynamics that were not previously revealed by IDE static structures." (consider adding previously)

      Changed

      Line 52-54: "Our small-angle X-ray scattering analysis and enzymatic assays of an R668A mutant indicate a profound alteration of conformational dynamics and catalytic activity." (consider adding analysis)

      Changed

      Line 54: Consider leaving out "Upside" in the abstract (to avoid confusion when reading the abstract) and leave it to be introduced in the introduction when Upside MD simulations are first mentioned.

      Changed

      Results:

      Figure 5D: There seems to be an error in the legend for Figure 5D. It says "... presence of varying amounts of insulin", but this must be Aβ1-40. Please add info on whether the replicates are technical or biological.

      The legend has been revised as suggested.

      Line 125: Consider switching the order of "here" and "we"

      “here” has been removed.

      Line 128: Replace "5" with "five"

      Changed

      Line 137: Replace "when insulin is present" with "in the presence of insulin"

      Changed

      Line 228: Replace "5" and "6" with "five " and "six"

      Changed

      Line 229: Consider adding the word "form": "First, the open subunits did not close to form a singular structure."

      We have adjusted the sentence to read “close to a singular consensus structure”

      Line 327: Replace "2" with "two"

      Changed

      Line 276: Consider replacing "Conversely" with a more suitable connecting term as it implies that the observation presented in the two sentences are reverse or rephrase what is being compared. Is it the fact there is a dose dependency or not between the substrates or is it the actual kinetic parameters that are described. I just don't think conversely is fair with the current formulation as "the R668A mutant did not exhibit a dose-dependent response to the presence of Aβ" not that the Ki is reduced for WT compared to the R668A construct when looking at Aβ.

      The connecting term has been removed completely, beginning the sentence with “When Abeta…”

      Line 359: Replace "6" with "six"

      Changed

      Consider getting rid of possessive apostrophes to keep a formal tone, e.g. lines 211 (cryoSPARC's), 259 (IDE's) and 382 (IDE's). Exception to this is Alzheimer's disease.

      All instances of possessive apostrophes, aside from Alzheimer’s, have been replaced with more formal wording.

      Figure 7 supplement 1: The color scheme for the local resolution is missing the unit (Å).

      This has been corrected.

      Finally, the supplementary videos illustrating IDE conformational dynamics are difficult to interpret and somewhat redundant in their current form. The transitions occur very rapidly, making it hard to appreciate the described motions, and the uniform coloring of IDE further limits visual clarity. I apologize for not including this point in my initial review. I recommend either removing the videos or re-rendering them to improve interpretability, for example by slowing down the motion and applying the same domain color scheme introduced in the new Figure 1 (and used in the MD trajectory video). This would greatly aid readers in connecting the descriptions in the text to the visual representations in the movies.

      Figure 3 videos 1-4 were slowed down, simplified, and recolored to improve clarity.

      Reviewer #2 (Recommendations for the authors):

      Comments after first revision for authors:

      Thanks a ton to the authors for the detailed explanation on my comments. I believe the discussions will help a large group of audience, especially the non-experts. Please address the minor comment below:

      Minor comment:

      Please update Supplementary file 1 (Cryo-EM data collection, refinement, and validation statistics) regarding the new volume obtained by time-resolved cryo-EM. Kindly also check line 47 in the abstract: "Here, we present five cryo-EM structures" , which may need an update (six structures and resolution 3.0-5.1 Å) or rephrase the sentences accordingly. If similar instances are found in the manuscript, where list of all the structures are mentioned together, please update accordingly if necessary.

      The cryo-EM statistics for the time-resolved cryo-EM are shown in Supplementary file 2 to differentiate the two datasets. The abstract has been changed, as has line 149.

    1. Author Response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Age-related synaptic dysfunction can have detrimental effects on cognitive and locomotor function. Additionally, aging makes the nervous system vulnerable to late-onset neurodegenerative diseases. This manuscript by Marques et al. seeks to profile the cell surface proteomes of glia to uncover signaling pathways that are implicated in age-related neurodegeneration. They compared the glial cell-surface proteomes in the central brain of young (day 5) and old (day 50) flies and identified the most up- and down-regulated proteins during the aging process. 48 genes were selected for analysis in a lifespan screen, and interestingly, most showed sex-specific phenotypes. Among these, adult-specific pan-glial DIP-β overexpression (OE) significantly increased the lifespan of both males and females and improved their motor control ability. To investigate the effect of DIP-β in the aging brain, Marques et al. performed snRNA-seq on 50-day-old Drosophila brains with or without DIP-β OE in glia. Cortex and ensheathing glia showed the most differentially expressed genes. Computational analysis revealed that glial DIP-β OE increased cell-cell communication, particularly with neurons and fat cells.

      Strengths:

      (1) State-of-the-art methodology to reveal the cell surface proteomes of glia in young and old flies.

      (2) Rigorous analyses to identify differentially expressed proteins.

      (3) Examination of up- and down-regulated candidates and identification of glial-expressed mediators that impact fly lifespan.

      (4) Intriguing sex-specific glial genes that regulate life span.

      (5) Follow-up RNA-seq analysis to examine cellular transcriptomes upon overexpression of an identified candidate (DIP-β).

      (6) A compelling dataset for the community that should generate extensive interest and spawn many projects.

      Weaknesses:

      (1) DIP-β OE using flySAM:

      (a) These flies showed a larger increase in lifespan compared to using UAS-DIP-β (Figure 2 C, D). Do the authors think that flySAM is a more efficient way of OE than UAS? Also, the UAS construct would be specific to one DIP-β isoform, while flySAM would likely express all isoforms. Could this also contribute to the phenotypes observed?

      We agree with the reviewer that both factors could contribute to the difference in lifespan effects. In the original paper presenting flySAM1.0 and flySAM2.0 (Jia et al., 2018), the authors first tested how flySAM1.0 overexpression (OE) phenotypes compare to several VPR (CRISPRa) and UAS:cDNA OE lines. They found that flySAM1.0 reliably outperforms VPR (i.e., produces stronger OE phenotypes) in most cases, and produces OE phenotypes that are comparable (i.e., generally equivalent) to UAS:cDNA (Jia et al., 2018). After determining how flySAM1.0 performance compares to VPR and UAS:cDNA, the authors next tested whether flySAM2.0 also outperforms VPR; they found that, like flySAM1.0, flySAM2.0 outperforms VPR in most cases (Jia et al., 2018). In general, the data suggest that we should expect comparable overexpression phenotypes for our flySAM2.0 and UAS:cDNA lines.

      We chose to proceed with the DIP-β flySAM line for the climbing assays and snRNA-seq, as it gave a stronger lifespan effect and we thought it was likely to be the more robust OE line. While our glial cell-surface proteomics initially identified DIP-β isoform C as the candidate, it is possible that other DIP-β isoforms were also present (such as isoform F, which is identical in polypeptide sequence to isoform C) (FlyBase). Ultimately, we believe that the larger increases in lifespan observed for DIP-β flySAM are likely because flySAM targets all isoforms, whereas UAS:cDNA lines target only one isoform. Importantly, our UAS-DIP-β line was specific to DIP-β isoform C, which is the same isoform that was identified by our proteomics.

      We have made clarifications in the manuscript to address these comments.

      (b) The Glial-GS>DIP-β flySAM flies without RU-486 have significantly shorter lifespans (Figure 2C) than their UAS-DIP-β counterparts. flySAM is lethal when expressed under the control of tubulin-GAL4 (Jia et al. 2018), likely due to the toxicity of such high levels of overexpression. Is it possible that a larger increase in lifespan is due to the already reduced viability of these flies?

      This is a good point. The flySAM lines do exhibit a shorter baseline lifespan compared to the traditional UAS lines. This is likely due to the specific genetic background of the flySAM transgenic insertions, or a low level of "leaky" expression, as previously noted in the literature (Jia et al., 2018).

      However, we believe that the lifespan extension we observed for DIP-β flySAM is a robust biological effect rather than an artifact of reduced viability, for the following reasons. First, by utilizing the GeneSwitch (GS) system, we can compare the lifespan of flies with the exact same genetic background (+/- RU-486). This ensures that the extension we report is specifically due to the induction of the transgene, rather than a comparison between disparate lines with different basal fitness levels. Second, if the lifespan extensions merely represented a recovery from lower baseline viability, we would expect to see similar improvements across other flySAM lines in our screen. However, DIP-β was the only candidate across our screen that significantly increased lifespan in both sexes (Extended Data Figs. 7 & 8). Third, the lifespan-extending effect of DIP-β was independently confirmed using a traditional UAS-cDNA line, which importantly does not share the same baseline viability issues as the flySAM lines.
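      To make the within-background comparison (+/- RU-486) concrete, the sketch below compares two hypothetical cohorts of the same genotype with a two-group log-rank test. The response does not state which statistical test was used, so the choice of the log-rank test is an assumption, as are the function name `logrank_test` and all death days, which are invented for illustration. The sketch also assumes every fly was followed until death (no censoring).

```python
import math

def logrank_test(deaths_a, deaths_b):
    """Two-group log-rank test for fully observed (uncensored) death days.

    Returns the chi-square statistic (1 degree of freedom) and its p-value.
    """
    observed_a = expected_a = variance = 0.0
    for t in sorted(set(deaths_a) | set(deaths_b)):
        # Flies still alive just before day t in each group.
        n_a = sum(1 for d in deaths_a if d >= t)
        n_b = sum(1 for d in deaths_b if d >= t)
        # Deaths on day t in each group.
        d_a = deaths_a.count(t)
        d_b = deaths_b.count(t)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    stat = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical death days (not data from the study), same genotype in both groups.
uninduced = [40, 42, 45, 47, 50, 52, 55, 57, 60, 62]  # without RU-486
induced = [55, 58, 60, 63, 65, 68, 70, 72, 75, 78]    # with RU-486

stat, p_value = logrank_test(uninduced, induced)
print(f"log-rank chi-square = {stat:.2f}, p = {p_value:.4g}")
```

      Because both cohorts share one genotype, any difference detected this way reflects induction of the transgene (or the inducer itself) rather than differences in baseline fitness between lines.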

      (c) Statistics: It is stated in the Methods that "statistical methods used are described in the figure legend of each relevant panel." However, there is no description of the statistics or sample sizes used in Figure 2.

      We have updated the figure legends for Figure 2 to include the missing statistical details and sample sizes.

      Specifically, for Fig. 2A: The reviewer is correct that with only two replicates of each time point (5d vs. 50d) in the initial proteomic screen, traditional p-value calculations lack the necessary power for meaningful interpretation. We have revised the legend to clarify that this panel represents a discovery-based screen. Candidates were selected based on biological relevance and specific enrichment thresholds to narrow the 872 proteins down to the 48 top candidates for screening (we were initially aiming to identify approximately 50 candidate genes for screening). For Fig. 2B: We have updated the legend to detail the parameters used for the Gene Ontology (GO) enrichment analysis.

      (2) Figure 3: The authors use a glial GeneSwitch (GS) to knock down and overexpress candidate genes. In Figure 3A, they look at glial-GS>UAS-GFP with and without RU. Without RU, there is no GFP expression, as expected. With RU, there is GFP expression. It is expected that all cell body GFP signal should colocalize with a glial nuclear marker (Repo). However, there is some signal that does not appear to be glia. Also, many glia do not express GFP, suggesting the glial GS driver does not label all glia. This could impact which glia are being targeted in several experiments.

      We thank the reviewer for this careful observation regarding the expression pattern of the GSG3285-1 line and acknowledge that the overlap between this driver and the Repo-positive cells is not absolute.

      Our selection of this specific GeneSwitch line was based on several critical experimental considerations: 1) Minimizing background toxicity. We initially tested multiple Repo-GeneSwitch lines; however, we found they exhibited significant, genotype-dependent lifespan reductions upon RU486 administration, even in control crosses. This baseline toxicity confounded the interpretation of any potential lifespan effects. GSG3285-1 was chosen for this study because it provided a robust control baseline and did not show lifespan effects with RU486 treatment in multiple control lines. This is essential for lifespan studies. 2) Driver breadth and specificity. As noted in its original characterization (Nicholson et al., 2008) and a later study (Catterson et al. 2023), GSG3285-1 is characterized as a pan-glial driver, though it may include a small population of sensory neurons. Furthermore, while Repo is a standard glial marker, its antibody does not label all glial subtypes with equal intensity. The "non-overlapping" signal observed in Figure 3A may reflect this staining bias. 3) Expression mosaicism. The fact that some glial cells do not show GFP expression suggests a degree of mosaicism, which is common to many GeneSwitch lines (Osterwalder et al., 2001). While we acknowledge this means our manipulations may target a broad subset of glia rather than every single glial cell, the fact that we still observed significant lifespan effects across two independent platforms (UAS and CRISPRa) suggests that the targeted population is sufficient to mediate these systemic effects.

      We have added a clarifying statement to contextualize the choice of the GSG3285-1 driver and its relationship to the Repo population.

      (3) It is interesting that sex-specific lifespan effects were observed in the candidate screen.

      (a) The authors should provide a discussion about these sex-specific differences and their thoughts about why these were observed.

      We agree that the sex-specific effects observed in our lifespan screen are one interesting aspect of this study. We have added a dedicated section to the Discussion exploring these differences from both a technical and biological perspective.

      On the technical side, the GeneSwitch inducer, RU486, can have sex-specific effects on metabolism and lifespan, depending on the nutritional environment (Dos Santos & Cocheme, 2024). Specifically, RU486 has been shown to counteract the lifespan-shortening effects of mating in females, an effect that is less pronounced in males (Landis et al., 2015; Tower et al., 2017). While we optimized our media and used the GSG3285-1 line to minimize these baseline effects, it remains possible that certain genotypes exhibited a sex-specific sensitivity to the inducer itself. Beyond the technical considerations, sex differences in aging are well-documented in Drosophila and other organisms (Regan et al., 2016; Austad & Fischer, 2016). Male and female flies exhibit distinct transcriptional trajectories and metabolic shifts as they age. Furthermore, recent studies have highlighted that glial function and the neuroinflammatory landscape can differ significantly between sexes, which may dictate how a specific genetic manipulation impacts the aging process in a sex-dependent manner (PMID: 40951920). While our screen identifies DIP-β as a rare candidate that extends lifespan in both sexes, the prevalence of female-specific hits in our data suggests that the female "aging program" may be more plastic or responsive to the specific glial pathways we targeted. These observations provide a valuable foundation for future studies into the mechanisms of sex-specific neuroprotection.

      (b) The authors should also provide information regarding the sex of the flies used in the glial cell surface proteome study.

      The samples were a mixture of half male and half female flies. This information has been added to the main text, Fig. 1, and the Methods section.

      (c) Also, beyond the scope of this study, examining sex-specific glial proteomes could reveal additional insights into age-related pathways affecting males and females differentially.

      Agreed, this would be a great idea for future studies.

      (4) The behavioral assay used in this study (climbing) tests locomotion driven by motor neurons. The proteomic analysis was performed with the adult brain, which does not include the nerve cord, where motor neurons reside. While likely beyond the scope of this study, it would be informative to test other behaviors, including learning, circadian rhythms, etc.

      We thank the reviewer for this insightful point. While our initial proteomic screen focused on the adult central brain, our behavioral validation used a pan-glial driver, which targets glia throughout the entire nervous system, including the ventral nerve cord (VNC). We have addressed the reviewer's comment as follows:

      Additional behavioral data: As suggested, we performed Drosophila Activity Monitoring (DAM) assays to evaluate circadian locomotor rhythms in 50-day-old DIP-β overexpression flies compared to negative controls. Interestingly, we did not detect significant changes in circadian activity at this time point.

      The difference between our climbing and circadian results highlights the complexity of age-related decline. In Drosophila, locomotor performance (i.e., climbing) and circadian coordination often decouple. For example, specific isoforms of human Tau (hTau) can induce severe cognitive and neurodegenerative deficits without affecting lifespan or motor coordination in the same manner (Sealey et al., 2017). Furthermore, motor-specific defects can emerge independently of systemic lifespan changes, as seen in certain SOD1 models of ALS (Hirth, 2010). It is possible that the 50-day timepoint represents a specific window where motor coordination is improved by DIP-β, while circadian circuits — governed by distinct glial-neuronal interactions — remain largely unaffected, or require a different temporal window for observation.

      We agree that identifying the specific glial populations (central brain vs VNC) responsible for the improved climbing would be highly informative. While the current study establishes the pro-longevity effect of DIP-β, future work utilizing in-situ proteomics on the fully intact CNS (including the VNC) or specific VNC will be essential to map the stereotyped progression of these effects across the peripheral and central nervous systems.

      (5) It is surprising that overexpressing a CAM in glia has such a broad impact on the transcriptomes of so many different cell types. Could this be due to DIP-β OE maintaining the brain in a "younger" state and indirectly influencing the transcriptomes? Instead of DIP-β OE in glia directly influencing cell-cell interactions? Can the authors comment on this?

      We agree that the observed changes likely represent a combination of direct cell-cell interactions and a broader, more indirect maintenance of a "younger" physiological state.

      Direct: Among the DIP family, DIP-β exhibits some of the strongest and most promiscuous binding affinities, interacting with a wide array of partners including Dpr6, 8, 9, 15, and 21 (Cosmanescu et al., 2018; Sergeeva et al., 2020). This biochemical flexibility allows DIP-β to potentially interface with a much broader range of neuronal subtypes than other DIP family members, such as DIP-δ, which exclusively binds Dpr12 and did not extend lifespan in our screen. It is possible that by overexpressing DIP-β, we may be partially compensating for the global downregulation of CAMs that typically occurs during aging, thereby preserving essential glial-neuronal communication integrity.

      Indirect: By maintaining these primary glial functions and communication activities, DIP-β overexpression likely delays the overall "aging" of the brain. This preservation of neural health can have downstream effects on systemic physiology, such as the improved glia-fat body communication we observed in 50-day-old flies. In this model, the broad transcriptomic shifts are not necessarily all direct targets of DIP-β, but rather a signature of a brain that has successfully avoided the catastrophic breakdown of homeostasis typically seen in aged wild-type flies.

      We have expanded the Discussion to clarify this distinction, adding that DIP-β likely acts as a "scaffold" or "bridge" for maintaining a younger brain state, which in turn preserves multi-organ communication.

      Reviewer #2 (Public review):

      This manuscript presents an ambitious and technically innovative study that combines in situ cell-surface proteomics, functional genetic screening, and single-nucleus RNA sequencing to uncover glial factors that influence aging in Drosophila. The authors identify DIP-β as a glial protein whose overexpression extends lifespan and report intriguing sex-specific differences in lifespan outcomes. Overall, the study is conceptually compelling and offers a valuable dataset that will be of considerable interest to researchers studying glia-neuron communication, aging biology, and proteomic profiling in vivo.

      The in-situ proteomic labeling approach represents a notable methodological advance. If validated more extensively, it has the potential to become a widely used resource for probing glial aging mechanisms. The use of an inducible glial GeneSwitch driver is another strength, enabling the authors to carefully separate aging-relevant effects from developmental confounds. These technical choices meaningfully elevate the rigor of the study and support its central conclusions. The discovery of new candidate genes from the proteomics pipeline, including DIP-β, is intriguing and opens new avenues for understanding glial contributions to organismal lifespan. The observation of sex-specific lifespan effects is particularly interesting and warrants further exploration; the study sets the stage for future work in this direction.

      At the same time, several areas would benefit from clarification or additional analysis to fully support the manuscript's claims:

      (1) The manuscript frequently refers to "improved" or "increased" cell-cell communication following DIP-β overexpression, but the meaning of this term remains somewhat vague. Because the current analysis relies largely on transcriptomic predictions, it would be helpful to define precisely what metric is being used, e.g., increased numbers of predicted ligand-receptor interactions, enrichment of specific signaling pathways, or altered expression of communication-related components. Strengthening the mechanistic link between DIP-β, cell-cell communication, and lifespan extension, potentially through targeted validation of specific glial interactions, would substantially reinforce the interpretation.

      We agree that a more precise description of “improved” or “increased” cell-cell communication is necessary.

      Our conclusion that DIP-β overexpression is associated with “increased” cell-cell communication is based on the quantification of our CCC scores, which was performed using FlyPhoneDB2, a computational tool used to estimate cell-cell signaling from single-cell RNA-sequencing data (Liu et al., 2021; Qadiri et al., 2025). To infer cell-cell signaling, FlyPhoneDB2 and its predecessor, FlyPhoneDB, calculate “interaction scores,” comparing the expression levels of a curated list of ligand-receptor pairs between cell types (Liu et al., 2021; Qadiri et al., 2025). For example, if we detect a ligand in cell type A and its receptor in cell type B in DIP-β overexpression flies but didn’t detect both ligand and receptor in control flies, the CCC score is increased by 1. FlyPhoneDB2 additionally enables users to estimate signaling activity by also taking into consideration the expression of downstream reporter genes (Qadiri et al., 2025).
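
      The counting logic in the example above can be sketched as follows. This is a minimal illustration only, not FlyPhoneDB2's actual implementation (which computes interaction scores from expression levels): the function names, the detection threshold, and the toy expression values are all assumptions introduced here.

```python
from itertools import product

def detected_pairs(expr, lr_pairs, threshold=0.0):
    """Return the set of (sender, receiver, ligand, receptor) tuples for which
    the ligand is detected in the sender and the receptor in the receiver.

    expr: dict mapping cell type -> {gene: mean expression} (hypothetical layout).
    lr_pairs: list of (ligand, receptor) gene-name pairs.
    threshold: expression must be strictly above this to count as detected.
    """
    hits = set()
    # Every ordered pair of cell types, including a cell type signaling to itself.
    for sender, receiver in product(expr, repeat=2):
        for ligand, receptor in lr_pairs:
            ligand_on = expr[sender].get(ligand, 0.0) > threshold
            receptor_on = expr[receiver].get(receptor, 0.0) > threshold
            if ligand_on and receptor_on:
                hits.add((sender, receiver, ligand, receptor))
    return hits

def ccc_change(control, treated, lr_pairs, threshold=0.0):
    """Count predicted interactions gained and lost in `treated` vs `control`."""
    ctrl = detected_pairs(control, lr_pairs, threshold)
    trt = detected_pairs(treated, lr_pairs, threshold)
    return len(trt - ctrl), len(ctrl - trt)

# Toy data (made up for illustration; not values from the study).
lr_pairs = [("Dl", "N"), ("spi", "Egfr")]
control = {"glia": {"Dl": 1.2}, "neuron": {"N": 0.0, "Egfr": 0.8}}
treated = {"glia": {"Dl": 1.5, "spi": 0.6}, "neuron": {"N": 0.9, "Egfr": 0.7}}

gained, lost = ccc_change(control, treated, lr_pairs)
print(f"gained={gained}, lost={lost}")
```

      Each gained interaction corresponds to a CCC score increase of 1 in the sense described above; a real analysis would also need a principled detection threshold.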

      “Improved cell-cell communication” is our interpretation based on the CCC analysis. It is important to note that the metric being used here (increased CCCs) is the number of predicted ligand-receptor interactions, and that our CCC analysis was based entirely on inferences from snRNA-seq data. We have added further clarification to our manuscript, which now further expands on the results of our CCC analysis (i.e., the increased expression for 61% and decreased expression for 39% of ligand-receptor pairs we observed in our DIP-β overexpression group, compared to our negative control), which ultimately led us to conclude that DIP-β overexpression is associated with improved cell-cell communication.

      (2) The lifespan screen is central to the paper, and clearer visualization and contextualization of these results would significantly improve the manuscript's impact. For example, Figure 3D is challenging to interpret in its current form. More explicit presentation of which manipulations extend lifespan in each sex, along with effect sizes and significance values, would provide clarity. Including positive controls for lifespan extension would also help contextualize the magnitude of the observed effects. The reported effects of DIP-β, while promising, are modest relative to baseline effects of RU feeding, and a discussion of this would help appropriately calibrate the conclusions.

      We appreciate the reviewer’s suggestion to improve the clarity of the lifespan screen results. We have significantly revised Figures 3D, 3E, and 3F to provide a more intuitive summary of the candidate gene manipulations. Figures 3D and 3E now explicitly include the effect sizes and p-values for each candidate gene, broken down by sex. We also added a new Figure 3G with a visual layout that has been streamlined to allow for quick identification of manipulations that successfully extended lifespan.

      The reviewer raises an important point regarding the use of positive controls to calibrate the magnitude of lifespan extension. We carefully considered adding a standard control (such as Rapamycin treatment); however, we opted against it for several methodological reasons:

      As noted in the literature, the magnitude of lifespan extension from standard controls can vary drastically depending on genetic background and lab environment. For instance, Rapamycin-induced extension ranges from ~10% (Schinaman et al., 2019) to over 80% (Landis et al., 2024). We felt that adding a single positive control might provide a false sense of "calibration" rather than a true universal benchmark.

      To ensure the robustness of our findings, we instead employed a dual-validation strategy. We confirmed the lifespan-extending effects of our candidates using both traditional UAS:cDNA and CRISPR-based overexpression. The fact that two independent genetic systems yielded consistent results provides strong internal evidence for the reported effects.

      We acknowledge that the effects of DIP-β are modest when compared to the baseline impact of RU486 feeding. We have added a section to the Discussion addressing this. While the effects are subtle, their reproducibility across different overexpression platforms suggests they are biologically relevant, even if they do not reach the dramatic shifts seen in some caloric restriction or drug-based models.

      We have further addressed this in the results section.

      (3) Several figures would benefit from improved labeling or more detailed legends. For instance, the meaning of "N" and "C" in Figure 1D is unclear; Figure 3A should clarify that Repo is a glial marker; and Figure 5C appears to have truncated labels. Reordering certain panels (e.g., moving control data in Figure 4A-B) may also improve narrative flow. These refinements would greatly aid reader comprehension.

      We have modified and improved the labeling of these figures to increase the clarity. For Fig. 1D, we added the explanation to the Figure legends. In brief, in the Tandem Mass Tag (TMT) isobaric labeling system, 128N is one of many channels (126, 127N, 127C, 128N, 128C, etc.) used to index and compare up to 18 samples simultaneously, improving throughput and reducing missing values.

      Fig. 3A has been updated to clarify that Repo is the glial marker. Fig. 4A-D have been reordered so that the DIP-β lifespan results are presented before the control lifespan, which hopefully improves the narrative flow of this figure. The Fig. 4 references in the manuscript have also been updated to match these changes. Additionally, Fig. 5C has been updated to include the truncated x-axis and y-axis labels.

      (4) A few claims would be strengthened by more specific references or acknowledgment of alternative interpretations. Examples include the phenoxy-radical labeling radius, the impact of H₂O₂ exposure, and the specificity of neutravidin. Additionally, downregulation of synapse-related GO terms may reflect age-related transcriptional changes rather than impaired glia-neuron communication per se, and this possibility should be recognized. The term "unbiased" to describe the screen may also be reconsidered, given the preselection of candidate genes.

      These are good suggestions. We have added references for the phenoxy-radical labeling radius (Durojaye, 2021), the impact of H₂O₂ exposure (J. Li et al., 2021), and the binding specificity of neutravidin (J. Li et al., 2021). We have also removed the term “unbiased” from our manuscript.

      Regarding the request to further address the downregulation of synapse-related GO terms, we believe this indicates a lack of clarity on our part. We did not intend to suggest that our GO analyses, which were based on our proteomics data, were necessarily indicative of impaired neuron-glia communication. Our conclusions regarding altered neuron-glia communication come from our later snRNA-seq data and analyses. Prompted by this comment, we agree that our differential gene analysis may reflect age-related transcriptional changes rather than impaired glia-neuron communication. We have added this alternative interpretation to the manuscript.

      (5) Clarifying the rationale for focusing on central brain glia over optic-lobe glia would be useful. 

      Agreed! As the intended focus of this study was the more general changes occurring during normal brain aging, we chose to focus our glial cell-surface proteomics on the central brain, which is responsible for most of the brain’s higher-order functions, including learning and memory, signal integration, and behavior. As the optic lobes account for approximately half of all neurons in the adult Drosophila brain and are specialized to process visual stimuli (Robinson et al., 2025), we were concerned that including the optic lobes in our glial cell-surface proteomics could strongly bias our findings towards age-related changes in visual function, rather than the more general changes we intended to focus on. This clarification has been added to the results section (Quantitative comparison of young and old proteomes).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 62: Can the authors expand on "several changes"?

      We have added a sentence expanding upon this in the manuscript draft.

      (2) Line 137: Can the authors provide a reference for the phenoxyl radical half-life?

      Thanks for catching this. We’ve added our reference for the phenoxyl radical half-life.

      (3) Figure 1B: The authors state that neutravidin stained glia; however, there is no glial marker (e.g., anti-Repo) in this panel.

      We acknowledge the reviewer’s point. The lack of anti-Repo staining in Figure 1B is due to the requirements of the Neutravidin-Alexa 647 detection method. Because this procedure bypasses traditional primary and secondary antibody incubation to preserve the biotin signal, co-staining with Repo was not technically feasible. Nevertheless, we utilized the Repo-GAL4 driver to express UAS-CD2-HRP; since this driver is well-documented and specific to glial cells, the Neutravidin signal serves as a functional readout of the targeted glial population.

      (4) Line 254: There is no Figure 2D.

      We’ve corrected this to Fig. 2C.

      (5) Lines 390-396: No reference to the respective figures.

      We’ve made a couple of corrections so that all the respective figures are referenced.

      (6) Figure 5C: The X-axis is cut off.

      This has been corrected.

      Reviewer #2 (Recommendations for the authors):

      Minor inconsistencies (e.g., figure references-line 254 references "Figure 2D" where none exists) should be corrected.

      We’ve corrected this to Fig. 2C.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Genetically encoded fluorescent proteins expressed in specific cell types allow recognising them in vivo and, if the protein is a functional indicator, as in the case of genetically encoded calcium indicators (GECIs), to record activity from the same cellular ensemble. Ideally, if proteins (fluorophores) have perfectly distinct spectral properties, signals can be distinguished from as many cell types as the number of employed fluorophores. In practice, fluorescent proteins have non-negligible crosstalk both in absorption and emission bands. In addition, fluorescence contribution of each fluorophore normally varies from cell to cell and therefore spectral properties of cells expressing two or more proteins are different. The work of Phillips et al. addresses this challenge. The authors present an approach defined as "Neuroplex", allowing identification of up to nine cell types from the same number of fluorophores. The fingerprint of each cell is then associated with functional fluorescence from the GECI GCaMP, allowing recording calcium activity from that specific cell. The method is implemented in vivo using head-mounted miniscopes.

      The authors used a mouse line expressing GCaMP in cortical pyramidal neurons and developed an experimental pipeline. First, they injected the nine AAV viruses, causing expression of fluorophores in a different brain area. The idea was not to image that area, but a non-infected medial prefrontal cortex (mPFC) section where neurons could be infected by their axons projecting in an injected area, in this way being identified by their targeting region(s). A GRIN lens, allowing spectral analysis, was mounted in the mPFC section, and GCaMP fluorescence was then recorded during behavioural tasks and analysed to identify regions of interest (ROIs) corresponding to neuron somata. After functional imaging, the head of the mouse was fixed, spectral analysis was performed, and after necessary correction for chromatic distortions, the fluorophore contribution was determined for each ROI (neuron) from where GCaMP signals were detected. Notably, the procedures for estimation and correction of chromatic aberration and light transmission (described in Figure 2) were a major challenge in their technical achievements. The selection of the nine fluorophores was another big effort. This was done by combining computer simulations and direct measurement of spectra from individual proteins expressed in HEK293 cells. It is important to say that the authors could simulate arbitrary combinations of two or more different fluorophores and evaluate the ability of their algorithm to detect the correct proteins against wrong estimations of false-negative (absence of an expressed protein) or false-positive (presence of a non-expressed protein). Not surprisingly, this ability decreases with the level of GCaMP expression. The authors underline that most errors were false-negatives, which have a milder impact in terms of result interpretation, but the rate of false positives was, nevertheless, relevant in detecting a second fluorophore from a cell expressing only one protein. 
      The experimental profiles of fluorophores were dependent both on the specific fluorescent protein and on the projecting area, and the distribution of double-labelled neurons did not match anatomical evidence. This result should be taken as the limitation of the present pioneering experiments, presented as proof-of-principle of the approach, but Neuroplex may provide far improved precision under different experimental conditions.

      In my view, the work of Phillips et al. represents a significant advance in the state-of-the-art of the field. The rigorous analysis of limitations in the use of Neuroplex must be considered an important guideline for future uses of this approach.

      We appreciate the reviewer’s positive evaluation and thoughtful comments.

      Reviewer #2 (Public review):

      Summary:

      The manuscript introduces Neuroplex, a pipeline that integrates miniscope Ca²⁺ imaging in freely moving mice with multiplexed confocal and spectral imaging to infer projection identities of recorded neurons. This technical approach is promising and could broaden access to projection-resolved population imaging. However, the core quantitative analyses apply a winner-take-all single-label assignment per neuron even when multiple fluorophores exceed threshold, with additional labels treated descriptively as "secondary hits." While the authors acknowledge and simulate dual labeling, the extent to which this single-label decision rule affects subtype fractions and behavioural comparisons remains uncertain without a multi-label (or probabilistic) sensitivity analysis and propagation of classification uncertainty.

      We thank Reviewer #2 for the careful statistical perspective and focus on assignment strategy and uncertainty. Importantly, we emphasize that Neuroplex is presented as a methodological proof-of-principle, not as a definitive quantification of projection convergence.

      Strengths:

      (1) Conceptual advance and practicality: Decoupling acquisition from identity readout constitutes an innovative approach that is, in principle, applicable in laboratories currently using single-color miniscopes.

      (2) Engineering thoroughness: The manuscript offers detailed consideration of GRIN optics, spectral libraries, registration procedures, and simulations that address signal-to-noise ratio, background, and class imbalances.

      (3) Immediate community value: If demonstrated to be robust, the pipeline could enable projection-resolved analyses without reliance on specialized multicolor miniscopes.

      Weaknesses:

      (1) Single-label assignment in the main analyses: When multiple fluorophores exceed threshold for a neuron/ROI, the workflow applies a winner-take-all rule and assigns a single label (the fluorophore with the largest standardized beta), while additional above-threshold fluorophores are retained only as "secondary hits." This is a reasonable specificity-first choice, but because cortical excitatory neurons can collateralize, collapsing dual-threshold ROIs to one identity may under-represent dual-projecting cells and could bias estimated subtype fractions and behavioural comparisons.

      We thank the reviewer for raising this important conceptual point.

      We agree that cortical excitatory neurons frequently collateralize and therefore may legitimately express more than one retrograde fluorophore. Our use of a winner-take-all (WTA) rule in the primary analyses was an intentionally conservative methodological choice designed to prioritize specificity over sensitivity in this proof-of-principle study.

      As demonstrated in our simulations (Supp. Fig. 5–6), under realistic background and noise conditions, secondary assignments are more susceptible to false-positive errors than primary assignments. For this reason, we chose to assign a single primary identity for quantitative behavioral stratification while retaining additional above-threshold fluorophores as “secondary hits” and reporting their distribution separately (Supp. Fig. 7).

      We did not intend to imply that projections are exclusive. Rather, the WTA strategy provides a conservative lower-bound estimate of subtype proportions and avoids inflation of dual-label rates under conditions where spectral separability is imperfect.

      We agree that this rationale should be stated more explicitly in the manuscript, and that the potential impact of assignment strategy on subtype fractions and behavioral comparisons should be acknowledged clearly as a methodological trade-off rather than a biological claim.

      Importantly, the biological analyses presented in this manuscript are illustrative demonstrations of functional stratification capability and do not depend on exclusivity of projection identity. We have revised the manuscript to clarify this framing as follows:

      “If multiple fluorophores exceeded the threshold for an ROI, the fluorophore with the largest z-scored beta value was assigned as the primary identity (winner-take-all rule). This conservative approach was chosen to prioritize specificity under realistic noise and background conditions. Additional above-threshold fluorophores were retained as ‘secondary hits’ but were not incorporated into primary subtype stratification analyses.” (Methods, Single Pass Algorithm)

      “For quantitative behavioral comparisons, each ROI was assigned a single primary fluorophore identity using a winner-take-all rule. We emphasize that this assignment strategy does not imply projection exclusivity. Rather, it provides a conservative lower-bound estimate of subtype proportions, as ROIs exceeding threshold for multiple fluorophores were classified according to their strongest spectral contribution.” (Results, Fluorophore distribution in behaviorally relevant ROIs)

      “These analyses were performed using conservative single-label assignments; dual-threshold ROIs were not treated as co-identities in order to avoid overinterpretation of potentially ambiguous multi-label cells. Because identity assignment prioritizes specificity and classification uncertainty was not formally propagated into downstream comparisons, subtype fractions and behavior-by-subtype differences should be interpreted as qualitative demonstrations of projection-resolved functional stratification rather than precise anatomical quantifications.” (Results, Neuronal Cell Type and Behavior)

      “Cortical pyramidal neurons frequently collateralize to multiple downstream targets, and accordingly some ROIs exceeded threshold for more than one fluorophore. In this proof-of-principle implementation, we adopted a specificity-first winner-take-all assignment rule for primary analyses to minimize false-positive multi-label calls under realistic noise conditions. This strategy likely underestimates the true prevalence of dual-projecting neurons and should therefore be interpreted as a conservative stratification approach rather than a statement of projection exclusivity.” (Discussion)
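
      The winner-take-all rule quoted above can be sketched in a few lines. This is an illustrative reading of the described rule rather than the authors' code: the function name, the threshold value, and the placeholder fluorophore labels (`FP_A`, `FP_B`, `FP_C`) are assumptions made for the example.

```python
def assign_identity(z_betas, threshold):
    """Assign a primary identity and secondary hits to one ROI.

    z_betas: dict mapping fluorophore label -> z-scored unmixing beta.
    threshold: a fluorophore counts only if its value strictly exceeds this.
    Returns (primary, secondary_hits); primary is None if nothing passes.
    """
    above = {f: z for f, z in z_betas.items() if z > threshold}
    if not above:
        return None, []
    # Winner-take-all: the largest z-scored beta becomes the primary identity.
    # On an exact tie, max() keeps whichever label appears first in the dict.
    primary = max(above, key=above.get)
    # Remaining above-threshold fluorophores are kept as secondary hits,
    # ordered from strongest to weakest.
    secondary = sorted(
        (f for f in above if f != primary), key=above.get, reverse=True
    )
    return primary, secondary

# Hypothetical ROI with two fluorophores above a threshold of 2.0.
roi = {"FP_A": 4.1, "FP_B": 2.7, "FP_C": 0.3}
primary, secondary = assign_identity(roi, threshold=2.0)
print(primary, secondary)
```

      Returning the secondary hits alongside the primary label keeps the dual-threshold information available for descriptive reporting, even though only the primary label would enter the stratified analyses.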

      (2) Dual-label detection is acknowledged but remains descriptive in vivo: the manuscript explicitly discusses the possibility of dual projection, evaluates dual-fluorophore detection in simulations (including performance under realistic noise/background), and reports in vivo rates of secondary hits. However, these dual-threshold events are not incorporated as co-identities in the main statistical analyses, making it difficult to judge how robust the principal biological conclusions are to the single-label decision rule.

      We thank the reviewer for this important clarification request.

      We agree that dual-projection neurons are biologically plausible and that dual-threshold ROIs were detected in vivo. In this manuscript, however, our primary goal was to establish the feasibility of high-dimensional spectral assignment and projection-resolved stratification, rather than to provide a definitive quantification of projection convergence.

      For this proof-of-principle study, we chose a conservative winner-take-all (WTA) framework for primary behavioral analyses in order to minimize false-positive multi-label assignments under realistic noise and background conditions, as demonstrated in our simulations (Supp. Fig. 5–6). Secondary hits were retained and reported descriptively (Supp. Fig. 7), but not incorporated into the primary statistical comparisons to avoid overinterpretation of potentially ambiguous dual-label calls.

      Importantly, the principal biological conclusions presented in the manuscript are qualitative demonstrations that projection-defined stratification is feasible within a single animal. These conclusions do not rely on projection exclusivity or on precise quantification of dual-projecting fractions.

      We agree that this distinction should be made clearer in the manuscript, and we have revised the text as follows:

      “Although dual-threshold ROIs were detected in vivo, these secondary assignments were not incorporated as co-identities in the primary behavioral analyses. This decision reflects a conservative specificity-first framework designed to minimize false-positive multi-label calls under realistic noise conditions. Accordingly, dual-label rates reported here should be interpreted descriptively. The present study focuses on demonstrating the feasibility of projection-resolved stratification, rather than providing definitive quantification of projection convergence.” (Results, Fluorophore distribution in behaviorally relevant ROIs)

      “We then stratified these neurons by projection target and examined behaviorally selective activity across cell types. These analyses were performed using conservative single-label assignments; dual-threshold ROIs were not treated as co-identities in order to avoid overinterpretation of potentially ambiguous multi-label cells. Because identity assignment prioritizes specificity and classification uncertainty was not formally propagated into downstream comparisons, subtype fractions and behavior-by-subtype differences should be interpreted as qualitative demonstrations of projection-resolved functional stratification rather than precise anatomical quantifications.” (Results, Behavioral Analysis)

      (3) Uncertainty is not propagated: False-positive/false-negative rates from simulations and uncertainty from registration/segmentation are not carried forward into quantitative confidence bounds on subtype proportions or behaviour-by-subtype effects.

      We agree that formal propagation of classification and registration uncertainty into subtype proportions and behavioral comparisons would be appropriate in a study primarily focused on precise anatomical quantification. However, the central goal of the present manuscript is methodological: to demonstrate that high-dimensional spectral identity can be reliably linked to miniscope-recorded functional activity within a single animal.

      Our simulations under realistic noise, background, and class imbalance conditions (Supp. Fig. 5–6) show that errors are predominantly false negatives rather than false positives. Nevertheless, behavioral analyses are presented as qualitative demonstrations of the feasibility of projection-resolved stratification rather than as definitive quantitative anatomical measurements.

      In the revised manuscript, we clarified that 1) subtype proportions and behavioral effects are assignment-dependent estimates, 2) simulation-derived error rates provide guidance for experimental design rather than formal confidence intervals, and 3) future studies centered on precise quantification of projection fractions would benefit from formal uncertainty modeling, as follows:

      “These simulation-derived accuracy estimates characterize expected performance under defined noise and background conditions but were not formally propagated into confidence bounds on subtype proportions or behavioral comparisons. In this proof-of-principle study, subtype fractions are presented as assignment-dependent estimates rather than definitive anatomical measurements.” (Results, Assessment of spectral unmixing approach)

      “Because classification uncertainty was not formally propagated into these analyses, behavior-by-subtype comparisons should be interpreted as qualitative demonstrations of functional stratification rather than precise quantitative estimates.” (Results, Neuronal cell types and behavior)

      “The modeling framework was designed to characterize expected classification behavior across a range of experimental regimes, including background fluorescence, class imbalance, and reduced signal-to-noise ratio. These simulations provide practical performance guidance but were not used to compute formal error bars or propagate uncertainty into downstream biological analyses.” (Methods, Modeling of experimental variables to assess accuracy of algorithms)

      “Because the present study is designed to establish methodological feasibility rather than precise anatomical quantification, simulation-derived false-positive and false-negative regimes were not formally propagated into confidence bounds on subtype proportions or behavioral effect sizes. Accordingly, subtype fractions should be interpreted as assignment-dependent estimates rather than definitive anatomical measurements. Future implementations could incorporate Bayesian or likelihood-based classifiers to generate posterior identity probabilities and enable formal uncertainty propagation when quantitative estimation of projection convergence is central to the biological question.” (Discussion)
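
      For readers wondering what a simple form of such propagation might look like, the sketch below applies the standard misclassification correction (often called the Rogan–Gladen estimator) to a single subtype fraction. It is not part of the manuscript and is not the Bayesian approach mentioned above: it treats one subtype as a binary label, and the function names, error rates, and ranges are assumed values for illustration.

```python
def corrected_fraction(observed, false_pos_rate, false_neg_rate):
    """Correct an observed subtype fraction for known classification errors.

    Uses observed = p * sensitivity + (1 - p) * (1 - specificity),
    solved for the underlying fraction p and clipped to [0, 1].
    """
    sensitivity = 1.0 - false_neg_rate
    specificity = 1.0 - false_pos_rate
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("error rates too high for the correction to be defined")
    estimate = (observed + specificity - 1.0) / denom
    return min(1.0, max(0.0, estimate))

def fraction_bounds(observed, fpr_range, fnr_range):
    """Crude sensitivity range: evaluate the correction at every corner of
    the assumed (false-positive, false-negative) rate ranges."""
    values = [
        corrected_fraction(observed, fpr, fnr)
        for fpr in fpr_range
        for fnr in fnr_range
    ]
    return min(values), max(values)

# Hypothetical numbers: 20% of ROIs assigned to one subtype, with
# simulation-derived error rates assumed to lie in the ranges below.
low, high = fraction_bounds(0.20, fpr_range=(0.01, 0.05), fnr_range=(0.10, 0.30))
print(f"corrected fraction between {low:.3f} and {high:.3f}")
```

      The resulting range is a sensitivity analysis over assumed error rates, not a confidence interval; it ignores sampling variability and registration or segmentation uncertainty.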

      Reviewer #3 (Public review):

      This manuscript presents Neuroplex, a technically rigorous and carefully validated pipeline that links miniscope calcium imaging in freely behaving animals with high-dimensional fluorophore-based cell-type identification using in vivo multiplexed spectral confocal imaging through the same implanted GRIN lens. The work overcomes a major practical limitation of head-mounted microscopy by enabling the identification of up to nine projection-defined neuronal populations within the same animal, without post-fixation histology. The approach is well motivated and supported by extensive calibration and simulation. While the biological results are primarily illustrative, the methodological contribution is clear and likely to be broadly useful.

      Major comments

      (1) The approach relies on the assumption that fluorophore identity assigned during anesthetized confocal imaging accurately reflects the identity of neurons recorded during prior behavioural sessions. While the use of the same GRIN lens and in vivo co-registration mitigates many concerns, the manuscript would benefit from a more explicit discussion, or empirical demonstration, if available, of the stability of fluorophore assignments across time. Even limited repeat spectral imaging in a subset of animals would strengthen confidence in longitudinal applicability.

      We thank the reviewer for highlighting this important conceptual assumption.

      Fluorophore identity in Neuroplex is genetically encoded via AAVretro delivery and therefore does not depend on transient physiological state. Spectral imaging is performed in vivo through the same GRIN lens and field of view used during behavioral imaging, and co-registration relies on anatomical landmarks. While repeat spectral imaging was not formally performed as a longitudinal experiment, the underlying fluorescent protein expression is stable over weeks, and there is no biological mechanism in this paradigm that would alter fluorophore identity across sessions.

      We revised the manuscript to explicitly state this assumption and clarify why identity stability is expected as follows:

      “…fluorophore signals and reduce unmixing fidelity, leading to an increased false positive rate. Fluorophore identity in this framework is genetically encoded via retrograde AAV delivery and is therefore expected to remain stable across behavioral and spectral imaging sessions. Because both functional and spectral data are acquired in vivo through the same GRIN lens and co-registered using anatomical landmarks, assignment stability is not expected to vary across time unless expression levels change substantially. While repeat spectral imaging was not performed as a formal longitudinal experiment in this study, the stability of fluorescent protein expression supports the assumption that fluorophore identity reflects a persistent cellular attribute.” (Discussion)

      (2) Fluorophore identity is determined using thresholding of linear unmixing coefficients relative to an empirically defined baseline, followed by a second adaptive pass for over-represented fluorophores. While this heuristic is extensively validated via simulations, it remains ad hoc from a statistical perspective. The authors should more explicitly justify this choice and discuss its limitations relative to probabilistic or likelihood-based classifiers, particularly with respect to uncertainty estimation at the single-ROI level.

      We agree that the dual-pass thresholding approach is heuristic rather than fully probabilistic. More formal probabilistic classifiers are possible but would introduce additional modeling assumptions and training requirements beyond the scope of this proof-of-principle study.

      We revised our manuscript to clarify this as follows:

      “The current classification framework relies on linear unmixing followed by empirically defined thresholding rather than full probabilistic inference. This approach provides transparency and practical robustness under realistic noise and background conditions but does not generate single-ROI posterior uncertainty estimates.” (Discussion)
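For readers who want to see the shape of "linear unmixing followed by empirically defined thresholding", the following is a minimal sketch and not the authors' implementation. It assumes that unmixing is an ordinary least-squares fit of each ROI spectrum onto the basis spectra, and that the baseline is a set of beta values from reference ROIs used for z-scoring. The function names, the baseline definition, and the threshold of 3 are hypothetical, and the second adaptive pass is not shown.

```python
import numpy as np

def unmix_rois(roi_spectra, basis_spectra):
    """Least-squares unmixing: roi_spectra ~ betas @ basis_spectra.

    roi_spectra:   (n_rois, n_channels) measured ROI fingerprints
    basis_spectra: (n_fluorophores, n_channels) reference fingerprints
    returns betas: (n_rois, n_fluorophores)
    """
    betas_t, *_ = np.linalg.lstsq(basis_spectra.T, roi_spectra.T, rcond=None)
    return betas_t.T

def zscore_against_baseline(betas, baseline_betas):
    """Z-score each fluorophore's beta against an empirical baseline.

    baseline_betas is an assumed set of betas from reference ROIs,
    shape (n_baseline_rois, n_fluorophores).
    """
    mu = baseline_betas.mean(axis=0)
    sd = baseline_betas.std(axis=0, ddof=1)
    return (betas - mu) / sd

def call_hits(z, threshold=3.0):
    """Boolean hit matrix: True where a z-scored beta exceeds the threshold."""
    return z > threshold
```

Because every step is a closed-form linear operation, the output is easy to inspect, but as the quoted passage notes it yields a yes/no call rather than a posterior probability per ROI.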

      (3) Identifiability of fluorophores is demonstrated empirically, but the manuscript does not explicitly quantify spectral separability (e.g., similarity metrics between basis spectra or conditioning of the unmixing matrix). A brief analysis of spectral independence or sensitivity of beta estimates to noise would provide mathematical reassurance, especially given the reliance on linear regression in a high-dimensional feature space.

      We agree that spectral separability is conceptually important. In this manuscript, separability is demonstrated empirically through 1) in vitro fingerprint acquisition under identical optical conditions, 2) simulation under background and noise, and 3) successful in vivo classification across regimes. We did not compute formal matrix conditioning metrics, but we agree that the separability rationale should be described more explicitly. We revised our manuscript as follows:

      “While formal conditioning metrics were not explicitly computed, empirical fingerprint acquisition and simulation-based perturbation analyses demonstrate sufficient spectral independence for reliable linear unmixing under the tested regimes.” (Discussion)
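The separability metrics the reviewer mentions are inexpensive to compute from the basis spectra alone. The sketch below is illustrative only and assumes the basis is available as a fluorophores-by-channels array; the function names are hypothetical. It reports pairwise cosine similarity between basis spectra and the 2-norm condition number of the design matrix used in least-squares unmixing.

```python
import numpy as np

def pairwise_cosine_similarity(basis_spectra):
    """Cosine similarity between every pair of basis spectra.

    basis_spectra: (n_fluorophores, n_channels)
    returns: (n_fluorophores, n_fluorophores), 1.0 on the diagonal
    """
    norms = np.linalg.norm(basis_spectra, axis=1, keepdims=True)
    unit = basis_spectra / norms
    return unit @ unit.T

def unmixing_condition_number(basis_spectra):
    """2-norm condition number of the (channels x fluorophores) design matrix.

    Values near 1 indicate well-separated spectra; large values mean that
    small measurement noise can produce large changes in the fitted betas.
    """
    return np.linalg.cond(basis_spectra.T)
```

Off-diagonal similarities close to 1 flag fluorophore pairs that are likely to be confused, which is the same information needed when choosing which reporters to pair.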

      (4) The spectral unmixing treats CNMF-derived ROIs as fixed supports. I wonder whether ROI boundaries, neuropil contamination, and partial overlap can introduce structured uncertainty that could bias spectral estimates. If so, the authors should acknowledge this dependency more explicitly and discuss how ROI quality or overlap might influence false negatives or false positives, particularly in densely labelled regions.

      We agree that ROI definition influences spectral extraction. Spectral fingerprints are derived by averaging all pixels within the ROI mask, and therefore neuropil contamination, partial ROI overlap, and dense labeling could influence beta estimates. In the revised manuscript, we have acknowledged these dependencies more explicitly.

      “Spectral unmixing operates on CNMF-derived ROI masks treated as fixed supports. Accordingly, segmentation quality, neuropil contamination, and partial overlap between neighboring cells can influence extracted spectral fingerprints and may contribute to false negatives or secondary assignments, particularly in densely labeled regions. These structured sources of uncertainty are expected to have the greatest impact under regimes of extreme class imbalance, low fluorophore brightness, strong neuropil signal, or pairing of spectrally overlapping reporters. Use of refined segmentation strategies or nuclear-localized reporters could reduce such structured uncertainty in future implementations.” (Discussion)
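To make the dependency on ROI masks concrete, here is a minimal sketch of fingerprint extraction by averaging all pixels within each mask, together with a simple overlap measure that could be used to flag ROIs at risk of cross-contamination. The array layout (channels, height, width) and both function names are assumptions for illustration, not details taken from the manuscript.

```python
import numpy as np

def roi_fingerprints(spectral_stack, roi_masks):
    """Mean spectrum over the pixels of each ROI mask.

    spectral_stack: (n_channels, height, width) spectral image
    roi_masks:      (n_rois, height, width) boolean masks, treated as fixed
    returns:        (n_rois, n_channels)
    """
    masks = roi_masks.astype(float)
    pixel_counts = masks.sum(axis=(1, 2))
    sums = np.einsum("rhw,chw->rc", masks, spectral_stack)
    return sums / pixel_counts[:, None]

def overlap_fraction(roi_masks):
    """Entry [i, j] is the fraction of ROI i's pixels that also lie in ROI j."""
    flat = roi_masks.reshape(len(roi_masks), -1).astype(float)
    shared = flat @ flat.T
    return shared / flat.sum(axis=1)[:, None]
```

Any pixel shared between two masks contributes to both averages, so a high overlap fraction is one simple indicator that a secondary hit may be methodological rather than biological.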

      (5) The manuscript reports meaningful rates of secondary fluorophore detection, but also nontrivial false-positive rates for secondary labels under realistic conditions. The authors appropriately caution against over-interpretation, but the Discussion should more clearly delineate when dual-label assignments are likely to be biologically interpretable versus methodologically ambiguous, and how experimental design (e.g., fluorophore pairing) should be optimized accordingly.

      We agree and will delineate interpretability boundaries explicitly.

      “Dual-label assignments are most reliable when fluorophores are spectrally well separated and when signal-to-noise ratios are high. In contrast, spectrally adjacent fluorophore pairs or densely labeled regimes increase ambiguity and false-positive risk. Experimental design should therefore prioritize pairing spectrally distant fluorophores when projection convergence is of primary interest.” (Discussion)

      (6) I suspect that Neuroplex will be most effective in certain regimes (moderate convergence, bright and spectrally distinct fluorophores) and less reliable in others. A more explicit discussion of best practices, anticipated failure modes, and experimental scenarios where the method may be inappropriate would increase the practical value of the paper for adopters.

      “More broadly, Neuroplex is expected to perform most robustly in regimes characterized by moderate projection convergence, balanced fluorophore representation, bright and spectrally distinct reporters, and adequate signal-to-noise ratio. Imaging directly within a projection target that has received dense retrograde labeling may introduce substantial class imbalance, which simulations predict will reduce detection sensitivity for the dominant fluorophore. In such cases, conservative assignment strategies, reduced spectral complexity, or refinement of ROI definition may improve interpretability. Careful fluorophore selection and pilot validation under intended imaging conditions are therefore recommended prior to large-scale application. Future implementations incorporating nuclear-localized reporters may further reduce segmentation-dependent ambiguity by constraining spectral signals to somatic compartments.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors should address a few points that are not clear.

      (1) At the end of the Results, the authors assess their approach using only four fluorophores and conclude that Neuroplex works "even" under reduced complexity. There is something I am missing. In my mind, lower complexity should be easier and should work better. As a researcher, I would first assess a four-fluorophores scenario and then step up with complexity, but the authors did the opposite. Also, I think that the present Supplementary Figure 9 should be in the main text; I don't understand why the authors decided to relegate a clear result to the bottom of everything. The authors should give some explanations.

      We agree that reduced spectral complexity should, in principle, improve separability and classification performance. Our original presentation order was intended to first demonstrate feasibility under the most challenging condition (nine fluorophores plus GCaMP), thereby establishing maximal multiplexing capacity. The reduced-complexity experiment was included to demonstrate scalability and generalizability under more typical experimental regimes. However, we agree that this rationale was not sufficiently clear and that the reduced-complexity results merit presentation in the main text.

      Accordingly:

      We have moved former Supplementary Figure 9 into the main Results (Fig. 6).

      We have clarified explicitly why the nine-fluorophore condition was presented first as follows:

      “To evaluate the performance of Neuroplex under more typical experimental regimes with reduced complexity, we applied the pipeline to two GCaMP transgenic animals injected with a subset of four fluorophores.”

      (2) The question of relative expression is crucial. Among the infected regions, there is the contralateral mPFC and I imagine that if they image there, the contribution of the expressed protein might dominate all other components, preventing detection of other fluorophores, including GCaMP. But is it the case, or would it be possible to detect projecting neurons in that region? I would be surprised if the authors never tried it; this test would simply imply mounting the GRIN lens on the other hemisphere.

      This is an important conceptual point.

      Our simulations (Supp. Fig. 5) explicitly model over-representation of a single fluorophore. These results show that heavy class imbalance primarily increases false negatives (due to baseline normalization) rather than false positives.

      In the revised manuscript, we discussed this limitation more explicitly.

      “Relative fluorophore representation within the imaged field of view influences classification robustness. As demonstrated in our simulations of class imbalance (Supp. Fig. 5g–h), extreme over-representation of a single fluorophore primarily increases false-negative rates due to baseline normalization effects. In the present study, we intentionally avoided imaging directly within heavily infected projection targets (e.g., contralateral mPFC) in order to maintain moderate fluorophore representation across ROIs. Imaging in a densely labeled region would represent a more challenging regime, and we would expect reduced sensitivity for the dominant fluorophore under such conditions.” (Discussion)

      (3) The possibility to utilise Neuroplex goes beyond the type of experiment presented as proof-of-concept in this technical paper. In the Discussion, the authors mention genetically defined subtypes and activity-tagged neurons. But, if one changes the pipeline, can it be used by expressing GECIs with different spectra, or GECIs and genetically-encoded voltage indicators (GEVIs)? I would be very interested in knowing what the authors think about this putative "shortcut".

      We thank the reviewer for this forward-looking and insightful question.

      In principle, the Neuroplex framework could be extended to incorporate spectrally distinct genetically encoded functional indicators, including multi-color GECIs or combinations of GECIs and GEVIs. However, it is important to distinguish this from the identity-assignment strategy implemented in the present study.

      Simultaneous multi-color functional imaging under a head-mounted miniscope is optically more demanding than assigning cell identity from single-color functional recordings followed by high-dimensional spectral readout. Multi-color GECI or GEVI imaging requires real-time excitation and emission separation during dynamic recording, increases optical complexity, and is particularly sensitive to chromatic aberration, photon efficiency, and signal-to-noise constraints imposed by GRIN lenses.

      In contrast, Neuroplex decouples functional acquisition from spectral identity determination. Functional activity is recorded using a single optimized channel, while spectral separation is performed separately under controlled confocal conditions with multiplexed excitation and emission sampling. This design substantially reduces optical burden during behavioral imaging.

      While integration of multiple functional reporters is conceptually feasible within this framework, successful implementation would require careful validation of brightness, spectral separability, and temporal stability for each reporter combination.

      Reviewer #2 (Recommendations for the authors):

      (1) Implement a principled multi-label calling mode for cells with >1 above-threshold fluorophore (e.g., per-fluorophore FDR control or Bayesian posteriors). Report cell-wise weights and re-run key results three ways: single-label, hard multi-label, and soft (probabilistic) assignments; state explicitly how conclusions change.

      We appreciate this suggestion and agree that multi-label or probabilistic calling frameworks are well motivated, particularly for studies in which projection convergence is the central biological question. In the current manuscript, however, our goal is to establish a practically deployable proof-of-principle pipeline for linking miniscope functional recordings to a high-dimensional spectral-identity readout. Consistent with this scope, we used a conservative winner-take-all (WTA) strategy for primary analyses to prioritize specificity under realistic noise and background conditions, and we treated multi-hit events descriptively. Importantly, the qualitative conclusions regarding projection-resolved functional stratification are unchanged when secondary-hit distributions are examined.

      In the revised manuscript, we explicitly stated that: (i) single-label assignment is a conservative analysis choice rather than a biological claim of exclusivity, and (ii) multi-label or probabilistic calling is a natural extension for future work, as follows:

      “If multiple fluorophores exceeded the threshold for an ROI, the fluorophore with the largest z-scored beta value was assigned as the primary identity (winner-take-all rule). This conservative approach was chosen to prioritize specificity under realistic noise and background conditions. Additional above-threshold fluorophores were retained as ‘secondary hits’ but were not incorporated into primary subtype stratification analyses.” (Methods, Single Pass Algorithm)
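The winner-take-all rule quoted above can be sketched as follows. This is an illustrative reading of the text and not the authors' code: it assumes a matrix of z-scored betas with one row per ROI and one column per fluorophore, and the function name, the use of -1 for unassigned ROIs, and the threshold value are hypothetical.

```python
import numpy as np

def winner_take_all(z, threshold):
    """Assign each ROI a primary identity and record secondary hits.

    z: (n_rois, n_fluorophores) z-scored beta values
    returns:
      primary:   (n_rois,) index of the largest above-threshold fluorophore,
                 or -1 when no fluorophore exceeds the threshold
      secondary: (n_rois, n_fluorophores) boolean matrix of the remaining
                 above-threshold fluorophores, excluding the primary one
    """
    above = z > threshold
    # Mask sub-threshold entries so argmax only considers actual hits.
    masked = np.where(above, z, -np.inf)
    primary = masked.argmax(axis=1)
    primary[~above.any(axis=1)] = -1

    secondary = above.copy()
    rows = np.arange(z.shape[0])
    has_hit = primary >= 0
    secondary[rows[has_hit], primary[has_hit]] = False
    return primary, secondary
```

Keeping the secondary-hit matrix alongside the primary labels means a later multi-label or probabilistic analysis can reuse the same unmixing output without rerunning it.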

      “Because the present study is designed to establish methodological feasibility rather than precise anatomical quantification, simulation-derived false-positive and false-negative regimes were not formally propagated into confidence bounds on subtype proportions or behavioral effect sizes. Accordingly, subtype fractions should be interpreted as assignment-dependent estimates rather than definitive anatomical measurements. Future implementations could incorporate Bayesian or likelihood-based classifiers to generate posterior identity probabilities and enable formal uncertainty propagation when quantitative estimation of projection convergence is central to the biological question.” (Discussion)

      (2) Add ground truth for dual projectors in a subset (paired orthogonal tracers or staged injections) and provide a confusion matrix including dual-positives; use this to calibrate thresholds/priors.

      We agree that ground truth validation of dual projectors using orthogonal tracers or staged injections would be valuable, particularly for calibrating priors and enabling confusion-matrix-based evaluation. However, these experiments require additional cohorts and experimental design beyond the scope of the current proof-of-principle technical manuscript. Our goal here is to demonstrate the feasibility of multiplexed identification and projection-resolved stratification within a single animal, not to provide definitive anatomical quantification of collateralization.

      We have revised the manuscript to clearly state that dual-label in vivo observations are descriptive and that studies aimed at quantitative convergence mapping should incorporate orthogonal ground truth validation.

      “Accurate quantification of projection convergence would benefit from orthogonal ground-truth validation (e.g., paired tracers or staged injections) to establish confusion matrices for dual positives and to calibrate thresholds or priors.”

      (3) Propagate uncertainty from simulations and registration/segmentation to subtype fractions and behavior effects (error bars or sensitivity analyses).

      We agree that formal uncertainty propagation is appropriate for studies focused on precisely quantifying subtype proportions or effect sizes. In this manuscript, subtype fractions and behavioral comparisons are presented primarily as demonstrations of the feasibility of projection-resolved functional stratification, rather than definitive anatomical measurements. Simulation analyses are included to characterize expected performance under defined noise and background regimes, but we did not propagate these uncertainties into downstream confidence bounds in this proof-of-principle work.

      We have revised the manuscript to clarify this explicitly as follows:

      “These simulation-derived accuracy estimates characterize expected performance under defined noise and background conditions but were not formally propagated into confidence bounds on subtype proportions or behavioral comparisons. In this proof-of-principle study, subtype fractions are presented as assignment-dependent estimates rather than definitive anatomical measurements.” (Results, Assessment of spectral unmixing approach)

      “These analyses were performed using conservative single-label assignments; dual-threshold ROIs were not treated as co-identities in order to avoid overinterpretation of potentially ambiguous multi-label cells. Because identity assignment prioritizes specificity and classification uncertainty was not formally propagated into downstream comparisons, subtype fractions and behavior-by-subtype differences should be interpreted as qualitative demonstrations of projection-resolved functional stratification rather than precise anatomical quantifications.” (Results, Neuronal cell types and behavior)

      “The modeling framework was designed to characterize expected classification behavior across a range of experimental regimes, including background fluorescence, class imbalance, and reduced signal-to-noise ratio. These simulations provide practical performance guidance but were not used to compute formal error bars or propagate uncertainty into downstream biological analyses.” (Methods, Modeling of experimental variables to assess accuracy of algorithms)

      “Because the present study is designed to establish methodological feasibility rather than precise anatomical quantification, simulation-derived false-positive and false-negative regimes were not formally propagated into confidence bounds on subtype proportions or behavioral effect sizes. Accordingly, subtype fractions should be interpreted as assignment-dependent estimates rather than definitive anatomical measurements. Future implementations could incorporate Bayesian or likelihood-based classifiers to generate posterior identity probabilities and enable formal uncertainty propagation when quantitative estimation of projection convergence is central to the biological question.” (Discussion)

      (4) Mitigate sources of spurious multi-hits (neuropil handling, ROI mask erosion, nuclear-localized reporters, spectral basis choices) and quantify their impact on dual-label recovery.

      We agree that neuropil contamination, ROI boundary choices, and spectral basis selection can influence multi-hit rates. In the current manuscript, we already implement background subtraction and evaluate multi-hit behavior through simulations under realistic background and noise regimes. Quantitative evaluation of additional mitigation strategies (e.g., ROI erosion comparisons) would require new analyses beyond the current scope.

      We have revised the Discussion to include concrete best-practice recommendations (e.g., fluorophore pairing, conservative interpretation of multi-hits, and potential use of nuclear-localized reporters).

      “Multi-hit events can reflect true biological collateralization but may also arise from structured sources of ambiguity such as neuropil contamination, partial ROI overlap, or imperfect ROI boundaries. These factors may bias spectral estimates and contribute to secondary assignments, particularly in densely labeled regions. Practical mitigation strategies include conservative assignment rules, improved segmentation, and use of nuclear-localized reporters to reduce neuropil contribution.”

      (5) Clarify claims in the main text/figures wherever exclusivity is implied; label which panels use single-label vs multi-label/soft assignments.

      We agree and thank the reviewer for emphasizing clarity. We did not intend to imply projection exclusivity. We have revised the manuscript text and figure legends to explicitly state where single-label (winner-take-all) assignment is used, and to avoid language that could be read as claiming exclusive projection identity as follows:

      “For quantitative behavioral comparisons, each ROI was assigned a single primary fluorophore identity using a conservative winner-take-all rule. This assignment reflects the strongest spectral contribution and does not imply projection exclusivity. Rather, it provides a conservative lower-bound estimate of subtype proportions, as ROIs exceeding threshold for multiple fluorophores were classified according to their strongest spectral contribution.”

    1. Reviewer #3 (Public review):

      Summary:

      This important work provides a web-based tool to contextualize effect sizes in psychiatry with respect to reliability and base rates (collectively referred to as predictive utility analysis). The methods for the tool incorporate established psychometric principles that I think are of use for multiple fields in this seemingly easy-to-use tool. I agree with the critical importance of this tool and the methodological points made in this manuscript. Enthusiasm for the manuscript is weakened by a lack of clarity on the formulation of the paper and stated goals of the examples used, with the inferences and impact on clinical decision making from various parameterizations via this tool left open-ended.

      Strengths:

      This paper presents a well-considered and, what I think will be highly useful, web-based tool to contextualize effect sizes with respect to reliability and base rates. As the authors rightly point out, such a tool could be used in conjunction with widespread analytic power analysis tools in study planning. The paper also well contextualizes the need for such a tool in the relatively recent history of concerns of power, reliability, and inference in psychiatry specifically, and more general meta-scientific debates in psychology and neuroscience.

      Weaknesses:

      My primary feedback on this manuscript is the lack of clarity in what the paper itself, specifically, separate from the tool, is hoping to achieve. There is a central, but unresolved, tension in whether the reader is supposed to:

      (1) focus on the specifics of the examples used and whether to reevaluate the substantive claims from the studies, (2) buy in to how various reliability and base rate parameters impact modeling outcomes, (3) receive an introduction to the tool itself.

      In my estimation, the largest contribution to the field here is in (2) and (3), but currently much of the real estate of the paper is dedicated to several examples of (1). While these specific examples may be illustrative to some degree, I think given the number and brevity of such, they are unlikely to incidentally achieve points (2) and (3) above. Specific examples include the assertion of kappas for DSM diagnoses, without much nuance (e.g., see https://psycnet.apa.org/buy/2015-27500-001). Given the relatively limited space given to this example, however, it's hard to be entirely certain what the reviewer should take away.

      A second point of concern is where this tool would be situated in the research pipeline. I agree with the authors that this tool could be used in ways that parallel power analysis. With that in mind, it seems the most common use of this tool for an individual investigator is likely to be in a priori study planning. In contrast, and with my point above in mind, the use of the tool for existing results is likely best done with multiple estimates of effect sizes, reliability, and base rates, as is common in meta-analysis or consensus reviews. Nevertheless, there is no real example or guidance around how this influences new study planning.

      A third point is that more nuance would be useful in the introduction about the current state of psychiatry research. For example, I share many of the authors' concerns about reliability, power, reproducibility, and barriers to translation. That said, while effect sizes should be considered considerably more, they are already widely considered in psychiatry research through the commonplace use of meta-analysis and other data pooling approaches. Another such example that the authors state in the context of reliability: "However, this [reliability] attenuation is rarely accounted for in routine analyses in psychiatry". This is true in practice, but somewhat misleading insofar as the method by which to do this remains unclear. For example, should we all report disattenuated associations, assuming there is no error and everything is perfectly reliable? Of course, it would be unrealistic to expect zero error. That we can achieve this with the new tool is clear, but the nuance of how and under what circumstances it should be done is not clear, and such nuance should be better reflected in the framing of the problem. That is, there is also a lack of clarity on what ought to be best practices and field-wide goals, rather than simply the lack of an ability to model these factors.
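For context on what "disattenuated associations" means here, the classical correction for attenuation relates an observed correlation to the correlation expected under perfect reliability. The sketch below uses that textbook formula; whether the tool under review computes it in exactly this way is an assumption, and the function names are hypothetical.

```python
import math

def _check_reliability(value):
    # Reliability coefficients must lie in (0, 1] for the formula to apply.
    if not 0.0 < value <= 1.0:
        raise ValueError("reliability must be in the interval (0, 1]")

def attenuated_r(true_r, rel_x, rel_y):
    """Correlation expected in observed data given measurement reliabilities."""
    _check_reliability(rel_x)
    _check_reliability(rel_y)
    return true_r * math.sqrt(rel_x * rel_y)

def disattenuated_r(observed_r, rel_x, rel_y):
    """Classical correction for attenuation (the inverse of attenuated_r)."""
    _check_reliability(rel_x)
    _check_reliability(rel_y)
    return observed_r / math.sqrt(rel_x * rel_y)
```

The two directions answer different questions: the first suits a priori planning (what effect can realistically be observed with these measures), while the second extrapolates to a hypothetical error-free measurement, which is the scenario this comment flags as unrealistic.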

      Minor point

      For conceptual clarity, it would benefit the manuscript to at least briefly mention the role of validity in translational importance. Of course, the current psychometric issues of reliability, base rate, power, etc. are critical, but it should at least be mentioned, given the potentially wide audience of this manuscript, that validity is important as well. For example, highly reliable measures may not be valid indicators of underlying disease etiology (e.g., fMRI head motion is a highly reliable trait-level feature, but typically not considered an important predictor or consequence of mental health worth investing translational resources in). Relatedly, confounding as a general topic would be useful to mention just briefly, to help with the spirit of considering underlying issues in translation.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility, and clarity (Required)):

      Summary: In this manuscript, the authors examine how peripherin-2 (PRPH2) contributes to the localization of CNGβ1 within rod outer segment structures. PRPH2 and its homolog ROM1 are structural components of rod discs and are required for disc morphogenesis. In the absence of PRPH2, rod outer segments do not form, and various outer segment materials accumulate and are released as cilia-derived ectosomes. PRPH2 is thought to be transported through an unconventional secretory pathway, whereas cGMP-gated channels follow a conventional trafficking route. Although these components reach the outer segment through distinct pathways, PRPH2 is necessary for the proper delivery of CNGB1, a subunit of the cGMP-gated channel, to its correct destination. It was previously reported that a small fraction of PRPH2 reaches the outer segments through the conventional pathway when it forms a complex with Rom1 in mouse photoreceptors. Using Rom1 KO mice, the authors show that this conventionally trafficked PRPH2 fraction is not required for CNGB1 transport to the outer segment. Using various chimeric constructs, the authors verified that the tetraspanin core of PRPH2, delivered to the OS, is sufficient to promote OS localization of CNGB1. The Ct and Nt cytoplasmic regions of PRPH2 are dispensable for this role. Overall, the majority of the experiments are well-executed with statistical rigor, written in a way that others can reproduce, and support the major conclusion indicated in the title, "PRPH2 is essential for OS localization of CNGB1".

      Major comments: I believe that the majority of the conclusions are well-supported in this manuscript. Below, I am listing the major points that may need additional experiments or clarifications: 1) CNGA1 subunit is transported to and enriched within ciliary exosomes or the outer segment in PRPH2 deficient mice (Figure 1). The reduced levels of CNGA1 and CNGB1 in rds-/- mice suggest limited stability of these proteins. Their diminished abundance is also influenced by decreased mRNA expression of the corresponding genes. These findings imply that CNGB1 may not be essential for outer segment delivery of cGMP-gated channels if CNGA1 alone contains adequate targeting information. Related to these points, it is unclear whether CNGB1 exhibits a trafficking defect or encounters other problems before leaving the endoplasmic reticulum. Such problems may involve deficiencies in folding, holo-channel assembly, or related quality control processes.

      RESPONSE: We agree with this reviewer and have added additional data and interpretation to address this point. Our new data show that in fact a low level of CNGB1 can reach ectosomes in rds-/- rods, which makes sense since we and others had observed CNGA1 was present and we know that channel assembly occurs in the ER. This suggests that the CNG channel can properly fold and assemble. Furthermore, overexpressing CNGB1 did not restore ciliary localization in Rds-/-, leading to our interpretation that in the absence of an outer segment membrane compartment, there is no place to deliver the CNG channel and it is subsequently degraded. Apart from peripherin-2’s binding partner, ROM1, this is unique to the CNG channel. CNG channel subunits are still significantly lower at P21 than other outer segment membrane proteins, such as ABCA4 (shown here), rhodopsin, and PCDH21 (shown elsewhere).

      2) CNGB1 overexpression in rds-/- mice does not result in outer segment localization of CNGB1 channels (Figure 2A). These findings do not clarify whether CNGB1 successfully transits through the Golgi apparatus or associates properly with CNGA1 subunits. Elevating expression levels alone would not compensate for problems in folding or assembly.

      RESPONSE: We recognize that our previous submission lacked clarity on this point. Therefore, we have restructured the order of figures and provided additional controls to improve our manuscript. First, the fact that CNG channel is present at P21 and even increases over time suggests that in rds-/- rods channel processing (folding and assembly) is unaffected. Second, we recognize that channel stoichiometry is important for proper channel assembly, so we added a new supplementary figure that shows endogenous CNGA1 expression increases in rds-/- rods that are overexpressing myc-CNGB1 and FLAG-peripherin-2. This adds credence to our CNGB1 overexpression experiments and shows that CNGB1 being trapped is not due to inefficient channel assembly.

      3) Claims related to Figure 6 (P45 rds-/-) need further evidence. It remains uncertain whether CNGA1 and CNGB1 are delivered to lamellar ciliary membranes or to a distinct plasma membrane compartment comparable to that observed in wild type rod outer segments, or whether they accumulate in ciliary ectosomes. Those lamellar structures could be a part of cone outer segments. The observed GARP signal may originate solely from soluble GARP proteins. It is also unclear if CNGA1 and ROM1 colocalize in P45 rds-/- mice. Clarifying these points would strengthen the conclusion that lamellar formation, rather than specific function of PRPH2, is sufficient for CNGB1 delivery to the cilium or outer segment plasma membrane.

      RESPONSE: CNGA1/B1 are not expressed in cones, so the elevated outer segment localization observed at P45 must be coming from rods. In mouse retina, cones make up only 3% of the photoreceptor population. The SEM data clearly show that the lamellar ciliary protrusions are present on the majority of the photoreceptors. We now include CNGB1 staining from Rds-/- P45 sections that corroborate these data and show that CNGB1 is present at P45 and not P21 (Supplemental Figure 2).

      Below are minor comments: 1) The study does not establish whether a direct interaction between PRPH2 and CNGB1 is required for CNGB1 delivery to rod outer segments. Prior work by the senior author (ref 13) suggests that this interaction is not essential, since the PRPH2 binding site within the GARP domain is distinct from the outer segment transport signal of CNGB1. Including a discussion of the PRPH2-GARP (or CNGB1) interaction and its relevance to CNGB1 trafficking would help readers interpret the findings more fully.

      RESPONSE: We have included this in our discussion.

      2) The authors propose that the ROM1 core is sufficient for outer segment delivery of CNGB1 based on experiments with chimeric constructs. However, in Figure 1, ROM1 is present in the outer segments (or ciliary ectosomes) of rds-/- mice even though CNGB1 is not delivered to these structures.

      RESPONSE: Our new data, including MS analysis and Western analysis from an enriched ectosome preparation, reveal that, along with ROM1, low levels of the CNG channel are delivered to ciliary ectosomes in Rds-/- mice. However, at this early timepoint photoreceptor cilia do not produce a membrane protrusion, which we observe is required to augment CNG delivery. We expressed a FLAG-ROM1 construct to try to drive earlier creation of these membrane protrusions, but this was unsuccessful, as we observed ROM1 was primarily localized to the inner segment. This suggests that overexpression of ROM1 did not increase ROM1 delivery to the cilia. Luckily, we were able to overcome this bottleneck with several of our chimeric ROM1/Prph2 constructs that did localize to the cilia and restore CNG localization. All of these new results have been included in the revised manuscript.

      3) Line 80: "Theouter" A space shall be inserted between "The" and "outer".

      RESPONSE: Done

      **Referee cross-commenting**

      Both reviewer #2 and reviewer #3 express views that align with mine. They clearly described the study's limitations, and their comments are highly valuable.

      Reviewer #1 (Significance (Required)):

      Prior studies showed that CNGB1 is not present in cilia-derived ectosomes of rds-/- mice, indicating that PRPH2 is necessary for ciliary or outer segment localization of CNGB1 in rods. Building on these earlier findings, I consider this study significant for the following reasons: 1) Using detailed analysis of different PRPH2 domains and chimeric constructs, it clarifies that the PRPH2 core region, delivered to OSs, is essential and sufficient for OS localization of CNGB1. 2) PRPH2 and CNGB1 are thought to travel through different post-ER transport routes, with one pathway bypassing Golgi regions and the other passing through them. This study shows that CNGB1 depends on PRPH2, which suggests that these two routes may converge or interact at later stages and opens new directions for future investigation. 3) The study is relevant to basic scientists and biologists investigating how membrane structures acquire specialized functions in neurons, and its implications extend beyond photoreceptor biology.

      Limitation of the study: I believe that clarifying these points will make the manuscript more significant. 1) It is not clear, as mentioned above, how PRPH2 contributes to the delivery of CNGB1 to the OSs in the different secretory pathways.

      RESPONSE: In the absence of ROM1, Prph2 only travels through the unconventional secretory pathway directly from the ER. By looking at CNG trafficking and localization in ROM1-/- mice, we rule out the possibility that the small portion of PRPH2/ROM1 complexes that traffic conventionally through the Golgi are required for channel localization (Figure 3). Further, our Rho-Prph2 chimera that includes the trafficking signal from Prph2 did not rescue CNGB1 localization (Figure 4). These findings suggest that it is unlikely that these proteins engage during secretory transport to the outer segment.

      2) The prior study using a fluorescence complementation approach (Ritter et al, 2011) suggests that PRPH2 and CNGB1 can associate within rod ISs, likely before their delivery to OSs. However, it remains unclear whether this interaction supports the potential cotransport of CNGB1 and PRPH2 or whether the authors view these proteins as being transported independently.

      RESPONSE: As described above, our experiments rule out the notion that co-transport through the Golgi is driving CNG channel ciliary localization. We now note in our discussion that this data does not rule out the possibility of an earlier association between these proteins. However, the bulk of our data supports that any early interaction is not required for ciliary delivery.

      3) At the end of the result section (Figure 6, rds-/- P45), the authors suggest that lamellar formation (evaginations?) is required for CNGB1 transport. However, CNGB1 is normally not seen in evaginations or lamellar structures, and thus the assumption is not consistent with prior findings.

      RESPONSE: Absolutely, we agree that the CNG channel does not enter newly forming disc membranes, which has been shown by multiple groups. We included this in our discussion and have now added a clearer statement of our hypothesis: “Together, these data suggest that the partitioning of disc membranes from the plasma membrane by tetraspanin proteins is a key step for localizing the CNG channel and could play a role in segregating other proteins into the plasma membrane.”

      Overall, the manuscript is insightful and has the potential to advance our field and related disciplines.

      RESPONSE: Thanks!

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Cyclic nucleotide gated channels (CNG) localize to the plasma membrane of the rod photoreceptor outer segments, and are a key component of the phototransduction cascade. Understanding how outer segment proteins are trafficked and sequestered to the outer segments is an important field of investigation as it addresses both a fundamental aspect of cell biology and mechanism of disease, many of which have trafficking defects at the core of the pathogenic process. Using primarily IHC analysis of rodent models in combination with introduction of various expression constructs to the retina (through electroporation), this study finds that two rod outer segment structural proteins, peripherin-2 and ROM1, facilitate CNG channel localization to the outer segment.

      While this conclusion is interesting, a major concern that tempers enthusiasm is that in peripherin-2 null photoreceptors, there are no bona fide outer segments. In lieu of outer segments, there are rudimentary membranous protrusions and vesicles distal to the connecting cilia where outer segments should be. So the basis for concluding that peripherin-2 is required for CNG localization to the outer segment seems a bit wobbly. It is understood that the authors assumed the membranous materials distal to cilia as proxy for outer segments in their analysis and narrative. This assumption may have some merits. However, it is well known that when outer segment morphogenesis is severely compromised, all normally outer segment-bound proteins are ectopically localized or largely absent due to increased degradation. This could be simply due to the loss of their destination compartment, among other things. It is not clear how the authors could distinguish between a direct causal relationship where loss of one protein leads to the mislocalization of another, from secondary outcomes due to loss of the outer segments. The last sentence of the Abstract is telling. "Interestingly, this notion is supported by endogenous staining of CNGB1, which reappears in aged Rds-/- rods that have produced ciliary membrane protrusions." So in aged mice CNGB1 did localize to the OS, but what changed? There was more OS-like material to house the CNGB1 protein in the aged mice.

      RESPONSE: We agree that the loss of the OS compartment is likely driving downregulation of all OS proteins and have included a statement as such in our manuscript. We also performed additional qRT-PCR analysis on ROM1 and ABCA4 to show global downregulation at the mRNA level – consistent with the notion that there are reduced outer segment proteins when morphogenesis is compromised. However, our Westerns and IHC (as well as published data) clearly find a specific decrease in the CNG channel at the protein level, suggesting that not all proteins behave similarly when the outer segment is not formed. We included additional discussion on this point as well. While not directly examined in our manuscript, previous reports have shown the reverse effect: some outer segment proteins (e.g. PCDH21, Prom1) are upregulated in rds-/- retinas (Rattner et al JBC 2004). Therefore, it is an oversimplification to state that all outer segment proteins behave the same when outer segments are not formed properly. Other models of outer segment dysmorphia (e.g. RhoKO, PCDH21KO, Prom1KO, or WASF3) localize the CNG channel properly. We have added this to the discussion and hope that by restructuring our manuscript, we clearly outline that we do think that membrane retention at the tip of the cilia is driving CNG channel localization and that molecularly the tetraspanin proteins play a role in organizing these membranes.

      Reviewer #2 (Significance (Required)):

      Trafficking of nascent proteins to the outer segment in support of its renewal is an important subject, which has significant impact in understanding the mechanisms of retinal degeneration. The conclusion from this study, that peripherin-2 and ROM1 have a direct role in supporting CNG subunit trafficking may well be meritorious. However the data presented are less than fully convincing, and specifically the question of a direct vs secondary effect needs to be better addressed.

      RESPONSE: We appreciate this reviewer’s enthusiasm for investigating this process. The initial premise of our study was to investigate whether a direct effect of peripherin-2 on CNG delivery was possible, which was meritorious based on previously published data. However, we now find no direct trafficking link between CNG and peripherin-2; instead, our data largely find that CNG delivery is dependent on the presence of retained membranes at the ciliary tip – either through natural mechanisms or by driving “rudimentary” outer segment membrane lamination by overexpression of tetraspanin domains. We have restructured the manuscript to help guide the discussion.

      The following quote underpins some of the reasoning in the study. Lines 139-144, "(Figure 2A). This localization pattern suggests that the CNGB1 subunit is trapped in the biosynthetic pathway. In contrast, when FLAG-tagged rhodopsin is overexpressed in Rds-/- rods it traffics properly to outer segment ectosomes (Figure 2B, (19)). We posit that without proper exit from the biosynthetic pathway, the endogenous CNGB1 protein is rapidly degraded to undetectable levels, which we circumvent through overexpression. These data suggest the localization defect of CNGB1 in Rds-/- rods is in the trafficking of CNGB1." This in my view is an over-interpretation of limited data. The statement implies that rhodopsin and CNGB1 qualitatively differ in their fate but I would argue that both proteins are heavily degraded intracellularly except more of rhodopsin escaped to the "OS" and shows up in IHC. In many rhodopsin mutant transgenic mice, mutant rhodopsin appeared in OS even though intracellular degradation (gumming up the system) is a major factor in the disease process. The claim "rhodopsin trafficked properly to outer segment ectosomes" is not grounded in solid data.

      RESPONSE: We do fundamentally agree that the endogenous CNG channel is heavily degraded, which we confirm by overexpressing an exogenous CNGB1-myc and finding it trapped in the biosynthetic pathway. As stated by the reviewer, this localization pattern is in contrast to what we and others have observed for endogenous rhodopsin, and now show for overexpressed FLAG-rhodopsin – that rhodopsin does traffic to the OS ectosomes. By comparing the localization of both endogenous and overexpressed constructs (using the same promoter), we feel that our conclusion is well supported. We appreciate that our wording of “rhodopsin trafficked properly to the outer segment” is misleading, as traffic of membrane proteins in Rds-/- rods is generally affected and not “proper”. Importantly, we follow up this “limited data” with additional experiments showing that at high expression levels, we are unable to drive CNGB1 localization to OS ectosomes unless we co-express with a tetraspanin domain.

      A further minor comment is that the scope of the study appears limited, with no attempted experiments on how these proteins might interact to effect facilitation of trafficking.

      RESPONSE: Our approach was to be agnostic to the outcome of our hypothesis that peripherin-2 was directly involved in CNG channel trafficking. The experiments we performed to test this (ROM1-/- analysis and Prph2 C-terminal chimeras) did not support a role for peripherin-2 in CNG trafficking. Instead, our data support a model in which membrane retention and organization at the ciliary tip drives CNG channel delivery. We feel that our approach was not limited.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      in the gene encoding tetraspanin protein peripherin 2 (Prph2), i.e., Rds-/-, examining the requirements for various portions of the Prph2 protein in the context of an assortment of chimeric constructs expressed via transfection into photoreceptor cells, to restore localization of the beta subunit of the cyclic nucleotide-gated channel (CNGbeta1) to photoreceptor outer segments (OS) (in a small number of experiments) or, in the majority of experiments, to do so for a recombinant tagged version of this protein also overexpressed by transfection.

      The concluding sentences of the Discussion, which summarize the major conclusions, are as follows: "Our data clearly show that localization of the CNG channel is dependent upon peripherin-2 after biosynthetic exit, further suggesting that the necessary action is at the ciliary base. Supporting evidence for this comes from analysis of Rhodopsin knockout outer segments which have internal disc-like structures and localize CNG channel properly. Therefore, in the absence of a fully elaborated outer segment, peripherin-2's ability to delineate a disc is sufficient to drive CNG channel delivery. Together, these data suggest that the partitioning of disc membranes from the plasma membrane by tetraspanin proteins is a key step for trafficking the CNG channel and could play a role in segregating other proteins into the plasma membrane."

      The first sentence contains both reasonable conclusions and phrases whose meaning is unclear or not supported by the results presented. The statement, "localization of the CNG channel is dependent upon peripherin-2," is supported by the data but, of course, has long been known from previous studies of Rds-/- mice. What is meant by "...after biosynthetic exit..." is unclear. If, by this term, apparently newly invented, the authors mean "after synthesis of the protein is complete," the statement is accurate, but also a truism.

      RESPONSE: The absence of CNGB1 was reported in previous studies, but the mechanism driving its absence has not been investigated. In our resubmission, we have added additional data that now shows CNGB1 is present at very low levels in Rds-/- ectosomes but remains undetectable by IHC, which is consistent with previous studies mentioned by the reviewer, but is also a novel finding. Importantly, we find specific downregulation of CNG channel subunits in Rds-/- retinas compared to ABCA4, supported by Western blot analysis (Figure 1), and we investigate the mechanism driving this result.

      We appreciate the reviewer pointing out that “biosynthetic exit” is a niche term not broadly understood. We have removed this statement.

      The statement, "the necessary action is at the ciliary base," is NOT supported by the data presented, as the effect of the "successful" Prph2 constructs on CNGbeta1 localization is primarily to increase its levels at the distal end of cilia and at the base of OS-related structures formed in response to the presence of the Prph2 constructs. The restoration of these membranes, which, as the authors note, has been previously reported, is overwhelmingly the biggest effect of these constructs, and it could be argued that the restored localization, rather than degradation, of CNGbeta1 is merely a downstream consequence of the formation of these structures, with perhaps, an element of stabilization of CNGbeta1 toward degradation from direct binding to Prph2, which has also been previously reported.

      RESPONSE: We agree with the reviewer. Our interpretation of our data is that the presence of Prph2 (or its variants) at the distal end of the cilia localizes CNGB1, likely due to the formation of outer segment membrane structures. Previous to this work, there was a possibility that targeting information of Prph2 was required for CNGB1. That had never been explored. We definitively rule this possibility out when we express the C-terminal tail of Prph2, which is unable to rescue CNGB1 localization. Because the tetraspanin domain of Prph2 (or ROM1) can localize CNGB1, we do agree that the definition of an outer segment structure is the driving force for CNGB1 delivery – these are new findings. We’ve restructured and added additional discussion to the manuscript to clarify this point.

      The next suggested conclusion is, "Therefore, in the absence of a fully elaborated outer segment, peripherin-2's ability to delineate a disc is sufficient to drive CNG channel delivery," is partly accurate and partly misleading. If the word "localization" were to replace the term, "delivery," concerning which there are no data (aside from those confirming that Prph2 and CNGbeta1 pass through distinct secretory pathways), this statement would be an accurate summary.

      RESPONSE: We have updated to “localization”, but the fact that we confirm these two proteins do not traffic together through the Golgi would suggest that delivery is independent of trafficking.

      The final sentence, "Together, these data suggest that the partitioning of disc membranes from the plasma membrane by tetraspanin proteins is a key step for trafficking the CNG channel and could play a role in segregating other proteins into the plasma membrane," would also be accurate if the word "localization" were to replace the term "trafficking." The key point for these qualifications is that the experiments presented measure steady state levels of CNGbeta1 constructs at certain locations, which are determined not only by rates of trafficking, but also rates of synthesis and degradation, and the data presented confirm that total levels of CNGbeta1 are greatly diminished in the absence of functional Prph2, rendering any conclusions about the relative roles of trafficking kinetics and degradation kinetics speculative in nature.

      RESPONSE: We agree and have revised.

      Aside from these major conceptual issues, there is one overriding technical question: why are almost all the experiments presented carried out with a highly over-expressed engineered version of CNGb1 with a tag, which is clearly a context far from the physiological one, as opposed to examining redistribution of the endogenous CNGbeta1, which is of much greater interest? In some results relegated to a Supplemental figure (Supp. Fig. 2), the authors clearly demonstrate that sufficient signal can be obtained from immunofluorescence staining the endogenous proteins for such experiments to be readily interpretable. If the concern was cross-reactivity with non-covalently attached GARP proteins, a few experiments showing that similar results are obtained for immunostaining of the endogenous protein or of the tagged construct would have been sufficient, and the paper could have had more physiological relevance and impact.

      RESPONSE: We agree that endogenous CNG staining is important and valuable, which is why we included it in our manuscript. We were able to confirm that overexpressed CNG recapitulated the endogenous staining. We proceeded with analyzing overexpressed, tagged CNG for the reasons stated by the reviewer. Yes, cross-reactivity with soluble GARP proteins was one consideration, as was the fact that the GARP antibody is a mouse monoclonal antibody. Increased IgG due to inflammation in the RDS-/- model can obscure the outer segment region in these retinas, confounding our quantification. The tagged versions of CNGB1 and corresponding quantification offered the most clarity and continuity for the reader; therefore, we relegate the endogenous staining to the supplement.

      The remaining concerns are generally of less significance and mostly conceptual or quite minor technical concerns. Technically, the imaging data and their quantification are of good quality and analyzed with reasonable rigor.

      RESPONSE: Thanks!

      Abstract: "In this study, we investigate how peripherin-2 is engaged in CNG channel delivery to the outer segment." Might this not be more a question of how the absence of properly formed discs impacts the formation of outer segments with plasma membranes surrounding the disks? Is this really a question of "delivery" or "lack of address to make the delivery"?

      RESPONSE: Our interpretation of this comment is that it boils down to semantics. Delivery is inclusive of both trafficking and localization, which we investigate in our manuscript.

      Page 3, "fluorescence complementation between peripherin-2 and CNGb1 in the inner segment of transgenic Xenopus rods (23) ". The wording is unclear. It should be stated clearly that they are describing results of "bimolecular fluorescence complementation assays" of highly overexpressed recombinant proteins expressed from transgenes.

      RESPONSE: We have revised.

      Page 4, "...trapped in the biosynthetic pathway," It is unclear what the authors mean by this phrase. Obviously, "biosynthesis," i.e., translation is indeed complete, but biochemical pathways are not places. Is the intention to suggest that post-translational processing, such as addition and editing of carbohydrate chains or assembly with the alpha subunit has not been completed? If so, it would be better just to say so clearly. Or, is it meant to imply that it is physically "trapped" in the ER and/or Golgi apparatus? In any case the meaning should be made clear. Co-staining with ER and Golgi markers would have been very informative with respect to the compartments in which the highly overexpressed recombinant protein is trapped.

      RESPONSE: We acknowledge that our phrasing here was indirect. We have revised. Co-staining with Calnexin (an ER-marker) was attempted, but proved to be uninformative.

      It should also be noted that accumulation of highly overexpressed membrane proteins within internal membranes and membrane aggregates is a very commonly observed experimental phenomenon, and not restricted to the highly specialized trafficking routes in photoreceptors.

      RESPONSE: We agree that exogenous expression of membrane proteins can lead to increased presence within internal membranes of the inner segment, which we routinely see in our experiments. Importantly, our analysis is restricted to the ability of these exogenously expressed proteins to reach the ciliary compartment in Rds mice. We also conduct these experiments in wild-type retinas to ensure that our constructs are expressed, and the proteins reach the ciliary outer segment under normal conditions.

      Page 4, " peripherin-2 facilitates trafficking of the CNGb1 subunit to the outer segment " The data presented to this point do not demonstrate an enhancement of transport, but only of steady-state levels. There is nothing to rule out the possibility that some beta subunit is trafficked in Rds-/-, but is unstable to degradation in the region near the cilium when peripherin-2 and outer segments are not available. An increase in transport is certainly a possible explanation for the results, but should not be taken as an unambiguous conclusion.

      RESPONSE: We have altered the description of these results to allow for more interpretation of our data, which show that CNGB1 delivery to the outer segment is reduced in Rds-/- mice and enhanced when peripherin-2 is re-expressed.

      Page 4, " We confirmed that the fraction of peripherin-2 that traffics conventionally through the Golgi is indeed absent in Rom1-/- retinas and found that trafficking of the CNG channel via the conventional pathway is unaffected (Figure 3A). " This is one of the stronger and more interesting results in this manuscript, and tilts the argument against trafficking as being the mechanism for enhancement by overexpressed peripherin-2 of beta subunit levels in the distal region of the photoreceptor layer.

      RESPONSE: We agree.

      Page 5, " Our finding that secretory trafficking of peripherin-2 and CNGb1 is distinct " Clumsy syntax; needs to be rewritten for clarity.

      RESPONSE: Revised

      Page 5, "two previously characterized fusion proteins... have been shown to localize to the outer segment and build a rudimentary membrane structure (19) " This previous result, which is critical to interpretation of the results in this manuscript, should be introduced early, before any experimental results using related constructs are presented, in order to avoid confusion.

      RESPONSE: Prior to these experiments, we used only full-length peripherin-2, rhodopsin, or CNGB1. This paragraph is the first introduction of any chimeric protein, and we explain these two constructs thoroughly. We believe this satisfies this reviewer’s request.

      Page 5, " We confirmed these data by staining for endogenous CNGb1 in Rds-/- rods electroporated with each construct (Supplemental Figure 2B,C) " This is the most informative result in this manuscript with regard to the ability of these constructs to restore proper localization of CNGB1- it is not clear that the overexpression constructs for CNGB1 present any advantage beyond stronger signal and they may not be assumed, a priori, to be faithfully reporting on interactions of Prph2 with endogenous CNGB1, which is the biologically significant question. A big problem with Supp. Fig. 2 is that there is no real control, i.e., one without any Prph2 construct electroporated. Even the Rho-Prph2CT construct has some ROS-related structures and some CNGB1 localized to the one shown at higher magnification. The Prph2-RhoCT construct seems to lead to a substantial increase in endogenous CNGB1 in inner segment membranes. This looks like a phenomenon that is potentially very interesting, although it doesn't fit with any of the models put forth in the manuscript.

      RESPONSE: We agree that endogenous staining (shown in Supplemental Figure 3 of our revised manuscript) is informative, but it was technically challenging. Once we verified that our overexpression system recapitulated results for endogenous CNGB1, we went forward with the epitope-tagged CNGB1, which was clearer when quantifying CNGB1 localization to rudimentary outer segments.

      Our electroporation method provides an excellent internal control, as all of the non-electroporated cells show no endogenous CNGB1 localization without peripherin expression (Sup Fig 3A).

      Page 5, " cytosolic N- and C-termini of peripherin-2 are dispensable for CNGb1 outer segment localization " No- if you could simply remove them and get proper localization, that would show they are "dispensable." In these experiments they are always replaced with the corresponding region of some other protein that is localized to OS, or in one case, with 3 copies of the FLAG tag at the N-terminus. There are also clear differences in the efficacy of the different "successful" constructs, but these results and their implications are not really discussed.

      RESPONSE: We make this statement in the context of these termini being dispensable to CNGB1 localization, not to peripherin-2’s stability, function, or localization. A complete truncation of either domain results in a non-functioning protein. Our supplemental data shows reduced expression with a truncated N-terminus, preventing analysis (Sup Fig 5C). The 3X-FLAG has no known function in the cell, and we believe it serves as a proxy for removing the N-terminus altogether. Removing the C-terminus would prevent proper outer segment targeting, which is key to determining how peripherin-2 impacts CNGB1 ciliary delivery. Replacing this C-terminus with an outer segment targeting domain from another protein is an established method of investigation.

      Page 6, " We then wanted to determine whether the ROM1 tetraspanin region was sufficient to facilitate CNGb1 delivery by further replacing ROM1's cytoplasmic N-terminus with that of peripherin-2 (Prph2NT/CT-ROM1) . " This experiment obviously does NOT test "sufficiency" of the TM segments, as the construct has the termini replaced with the corresponding regions of Prph2, which might functionally substitute for the missing ROM1 regions.

      RESPONSE: Our previous results had already ruled out a role for these termini in CNGB1 localization.

      Page 6, " We show a dramatic increase in GARP staining in the aged Rds-/- retinal sections " The age dependence of this phenomenon is quite interesting and puzzling. Any thoughts on the mechanism?

      RESPONSE: We agree that this natural process is very interesting. We have restructured the order of our figures and provided additional controls to support this finding. We have added this to the discussion and hope that by restructuring our manuscript, we clearly outline that we do think that membrane retention at the tip of the cilia is driving CNG channel localization and that molecularly the tetraspanin proteins play a role in organizing these membranes.

      Page 6, " Although CNGα1, known to form homotetramers, can localize to the extracellular vesicles released into the outer segment area. " Not a sentence.

      RESPONSE: Revised

      Page 6, " Our data now shows that the population of peripherin-2 in complex with ROM1 that travels through the conventional trafficking pathway does not play a role in CNGb1 localization to the outer segment. " This is an oddly accurate, albeit somewhat contradictory sentence. Yes, you have failed to answer the question you claim this work was designed to address. Apart from this negative result, nothing is learned about trafficking, per se, from the experiments in this manuscript.

      RESPONSE: Please see our response to the reviewer’s comment above that clarifies our thinking regarding our results on trafficking.

      Page 7, " anticipated " Hopefully, the authors mean to say, "hypothesized," here.

      RESPONSE: Revised

      **Referee cross-commenting**

      My impression from reading the reviewers' comments is that there is general agreement on both the strengths and the limitations of this work. In my opinion, the issues raised by the reviewers could be addressed by editing the manuscript to be more circumspect in drawing definite conclusions from data that are not fully conclusive, without necessarily adding new experiments.

      Reviewer #3 (Significance (Required)):

      This study addresses a problem of great interest in the photoreceptor field and in cell biology more generally of trafficking and localization of specialized membrane proteins to specialized ciliary membranes. The strengths are technical quality of data with good controls, in most cases. The limitations are largely conceptual in nature and derive from the rather simplistic approach to the experimental design, as described above. The rather dated, "mix and match" approach based on chimeric constructs with pieces of sequences removed and replaced at will does not properly account for the conclusion reached many times from many experiments, including some in this manuscript, that the "roles" of stretches of amino acid sequence depend exquisitely on the multidimensional context in which they are tested, not simply on their position in the linear sequence. The paper presents interesting and convincing results with respect to functional requirements for formation of disc-like membranes, but very little with respect to "trafficking."

    1. On 2023-02-23 20:27:02, user Olavo Amaral wrote:

      I recently reviewed this manuscript for a journal. For the sake of transparency, I thought it was worth it to post my comments here on bioRxiv as well, as it brings the review effort within the public domain. Let me know if you have any feedback and congratulations on the work: it's a nice paper on a very important topic.

      Summary:

      The manuscript addresses the question of “shortcut citations” in methods description. Although this problem is frequently mentioned in debates about methodological reproducibility, it is understudied and it’s nice to see actual research about it.<br /> The results contain three main sections, which study (a) the prevalence of various types of citations in the methods sections of articles in highly cited journals, including shortcut ones, (b) examples of what happens when shortcut citations are followed and (c) a review of journal policies. This is followed by a reasonably extensive discussion focused on (d) guidelines on how to use shortcut citations.<br /> I generally agree that this is an interesting structure, as it (a) documents the phenomenon, (b) evaluates to what degree it represents a problem, (c) inquires what is being done to address it and (d) suggests additional measures. The weakest link in the chain, however, seems to be point (b) (i.e. measuring the impact of the problem), as I am not sure the case studies provided are enough to quantify this. I will try to make this clear in my main point below.

      Main point:

      • While the numbers of articles and citations in the first section of the study are probably sufficient to provide an overview of the use of citations, the 15 articles included as case studies in the second section are not. The authors seem to acknowledge this limitation, as they refrain from making a quantitative synthesis of these articles. That said, this leads this section of the manuscript to fall short in accurately presenting the importance of the problem. <br /> Although I found the visualization for each case study provided in Fig. S2 interesting, I would doubt that most readers will really make the effort to go through each one of them, much less be able to synthesize the data in their own heads to reach meaningful conclusions. Thus, I would strongly recommend that the authors provide some kind of quantitative synthesis of the problem in this section (i.e. What percentage of shortcut citations can ultimately be traced to the original reference? What’s the average number of steps? What percentage is behind a paywall? What percentage reaches a dead end or an insufficient description?).<br /> I note that 15 articles are probably too few for this purpose, and that the sample of articles in which citations are followed would have to be expanded. Thus, I would recommend that the authors perform a sample size calculation to reach the number of citations/articles that can provide reliable estimates within a given confidence interval. For this purpose, it’s worth noting that it would be desirable to perform synthesis both at the level of citations (i.e. what percentage of citations in the sample can be traced?) and at the level of articles (i.e. what percentage of articles in the sample have at least one untraceable citation?), as citations within a single article should not be considered as fully independent units when it comes to representing the whole population of citations. 
      Thus, using articles as units for the purpose of sample size calculation might be the better option.
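
      As a rough illustration of the sample size calculation suggested above, here is a minimal sketch using the standard normal-approximation formula for estimating a proportion, n = z² · p(1 − p) / E². The function name, the default confidence level and the worst-case guess p = 0.5 are assumptions made for illustration, not values from the manuscript. The formula treats sampled units as independent, which is one more reason to count articles rather than individual citations.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(margin, confidence=0.95, expected_p=0.5):
    """Number of independent units (e.g. articles) needed so that the
    confidence interval around an estimated proportion has half-width
    `margin`. Uses the normal approximation n = z^2 * p * (1 - p) / E^2.

    expected_p = 0.5 is the most conservative (largest n) assumption.
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z ** 2 * expected_p * (1 - expected_p) / margin ** 2
    return math.ceil(n)


if __name__ == "__main__":
    for margin in (0.10, 0.05):
        n = sample_size_for_proportion(margin)
        print(f"margin of error {margin:.2f}: about {n} articles")
```

      Under these assumptions, a margin of error of ±10 percentage points already requires far more than 15 articles.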

      Other general points:

      • The categorization of scientific fields is somewhat strange: most people would probably consider neuroscience a subfield of biology, so presenting both as separate categories may puzzle some readers. I understand that this is a consequence of the JCR categories used, but making this clearer from the start (e.g. “examine the use of shortcut citations in neuroscience, biology and psychiatry journals” in the abstract) and perhaps referring to the biology journals as “general biology” would help to avoid confusion.<br /> Still on this point, the selection of fields is narrow and ad hoc. I understand that this is a limitation posed by the authors’ own expertise, but it is nevertheless one of the main weaknesses of the manuscript. Thus, the narrow range of scientific fields examined should probably be mentioned in the limitations section.

      • Even within this relatively narrow sample of fields, the kinds of methods that deserve a protocol probably vary a lot: I’d guess that psychiatry journals include a lot of surveys and instruments, while biology and neuroscience might have predominantly wet lab protocols. It would be interesting if somewhere in the paper (possibly in the example cases provided) we could get a feeling of what kind of “protocols” we are talking about, even if only in a general sense. If quantifying/classifying them is not feasible, at least some illustrative examples could be provided. Are we talking about methods to quantify proteins? Scales to measure depression? Electrophysiology setups for rodents? The citation culture probably depends a lot on the particular method, so the whole discussion sounds a bit disembodied without touching on this point somewhere.

      • Why are only minimum/maximum numbers of citations within shortcuts and the youngest/oldest citation coded? This looks like an approach to simplify data extraction, but it ends up providing very limited information (i.e. especially if there are many citations per paper, the oldest and youngest ones give very little information on the actual range).<br /> Moreover, this ends up making data visualization in Fig. 3 much less intuitive than it could be (i.e. it would be clearer and more informative to provide the full range of citation ages). If the authors could provide the full ranges (although I’m not sure that this is feasible), this would likely strengthen the paper. If not, I’d reconsider whether Fig. 3 should be included in the main results, as I don’t think the results as displayed say much about the sample of citations as a whole.

      • Some points in the case series description and discussion mention that some references “provided a description that was no longer state-of-the-art” and that this may be a problem. I don’t really get the idea here: methods citations are supposed to provide an accurate description of what was done in a study, not of what’s the current state of the art of the method. In this sense, descriptions shouldn’t age badly or become “not-state-of-the art”.<br /> I understand the concern that a very old shortcut citation raises suspicions that it might not really describe what was done in the paper (as it may be likely that no one uses certain methods in exactly the same way after 50 years). But if this is what the authors meant, this should be stated more clearly, as it is not really the impression that comes out of reading these passages.<br /> In the same vein, mentioning in the discussion that “supplemental methods cannot be updated” is technically correct, but is not a limitation in terms of making methods sections reproducible (which seems to be the point of the paper). For the purpose of methods description, whatever was used in a paper should remain static, even though the method may evolve in subsequent studies.

      • In terms of data sharing, one thing I could not find in the manuscript or in the OSF was the DOI and title of the articles used as case studies in Fig. S2. I may have missed it, but as there was no folder for the case series section I didn’t know where to look for it. As this seems important for reproducing the findings, this list should be provided somewhere (possibly as a document within the OSF) and cited within the text and legend to figure S2.

      Minor points:

      Introduction:<br /> - The correct name of the project mentioned in the first paragraph is Reproducibility Project: Cancer Biology (not “for Cancer Biology”).

      • “This risk of bias for randomization sequence generation and allocation concealment was unclear…” – this sentence seems odd (in particular the “This” at the start), please revise the wording.

      Figure 1:<br /> - Isn’t the methods section a viable alternative for sharing details needed to reproduce experiments as well? While I agree that in many cases a separate protocol may be a better option, that depends on the length of detail that is needed, which will vary greatly depending on the method. Therefore, I would argue that the methods section should be included as an option in the figure – saying that the information “should” be shared in a separate document sounds overprescriptive.

      • The second “readers” can be omitted from the third sentence of the figure legend.

      Methods:

      • Instead of citing the full OSF page for “protocols, data and code for the prevalence study and journal policy studies” using a single link, wouldn’t it make sense to cite a specific DOI for each of these resources? The same thing holds for points in the text in which specific resources are cited (e.g. “The full search strategy is available on the OSF repository” could point to a direct link to the search strategy rather than to the full OSF page).<br /> I think this is optional, as the Readme files in the OSF are clear. But providing specific links to each resource would be more consistent with the authors’ recommendation of providing pages for book citations, for example (in the sense of sparing the reader the trouble to search for a resource within a larger space).

      • What is meant by “top journals” exactly? Are those the ones with the highest impact factor in the JCR in their specific fields? Although this would be my guess, it is not clear from the description.

      • The data on whether papers were related to SARS-CoV-2 sounded rather gratuitous, as Covid-19 was not mentioned anywhere in the introduction. If these data are to be kept in the paper (I personally don’t think they add much), the rationale for extracting them should be mentioned somewhere.

      • Though this eventually became clear, I initially had a hard time understanding what was meant by “number of citations per shortcut”. This could be made clearer when this variable is first introduced.

      • The description of a probable shortcut states that “additional details are not provided in the following sentences or elsewhere in the methods sections”. But what happens if the method is fully explained outside of the methods section (i.e. in the supplementary material or in a repository)? I was unsure how these cases were classified, so it’s probably worth commenting explicitly on it.

      • Electronic searches were performed using the terms “[journal name]”, “journal citation reports ranking”, “author guidelines”, “journal policy”, and “impact factor”. I don’t quite get what this search means to achieve. Why would one need to search for “impact factor” to look for policies?

      Results:

      Figure 2:

      • The different areas have different mean numbers of methods citations per paper (being somewhat higher in Biology). Thus, showing the results for different categories in percentages as in Fig. 2A may cause misleading impressions – although there are still fewer “How” citations in Biology than in Neuroscience or Psychiatry when measuring absolute numbers, the actual difference is smaller (while that in “Who or what” citations is even larger). Having the bars represent absolute numbers (possibly still displaying the percentage within the bars) – with overall longer bars for Biology – would likely provide a more accurate impression of what’s going on.
      • It took me a while to understand the right panel in Fig. 2B. While the fact that the two sides of the violin plots represent different data eventually becomes clear, wouldn’t it make it easier on the reader to break the information for probable and possible citations into separate plots (especially as the left panel uses symmetric violin plots)?

      Tables S5 and S6:<br /> - Can’t the information in these tables be included in the legend for Fig.2 and Fig.3 (as it is relatively short and essentially synthesizes the data in the figures)? This is optional, but would leave the information in one place instead of creating a lot of supplements.

      Figure 5:<br /> - Are the categories in Fig.5A and 5D mutually exclusive? It would seem to me that a journal could encourage providing sufficient methodological details both in the author guidelines and as policy, and that they may encourage sharing methods in more than one place (i.e. repository or supplemental files). This is likely worth commenting on in the legend.

      Discussion:

      • I don’t think the Germany and California examples mentioned in Box 1 are needed: there are plenty of places in the world with much worse access, and these particular examples are not particularly representative of difficulties faced by the world at large.

      • While I agree with the recommendation to “make all methods publications open access”, I don’t think that there’s any particular reason why methods papers are different from the rest of science (in the sense that they should be open access), so I’m not sure the recommendation really belongs here.

      • The discussion about copyright issues described in the list of recommendations is long for an item in a list. Thus, it probably would fit better in the main text or in a box.

      Table S7:<br /> - I get the feeling that Table S7 would read better if lines and columns were reversed (i.e. methods as lines, features as columns), but it may be a matter of taste.<br /> - Why are supplemental files and protocol journals deemed static while shortcut citations are not? This does not make much sense to me.<br /> - I’d say supplemental files would generally be expected to have been peer reviewed. I agree that this is likely not always the case, but that probably depends more on the reviewer than on the journal (e.g. I don’t know of journals that explicitly exempt supplementary material from the peer review process), so I’d remove “depending on the journal”. <br /> - The comment “protocols remain available over time” made for repositories stands for all categories – it makes sense when comparing a protocol repository to a lab notebook, not to the other forms of describing protocols. Thus, I’d probably not include it as an advantage here.<br /> - I’d argue that both shortcut citations and supplemental files are “findable” for whoever’s reading the paper (which is likely what matters here), so I’d be inclined to remove this category. <br /> - Clinical journals are not the only ones to publish protocols as articles (the systematic review community has a tradition of publishing protocols, for example).

      Figure 6:<br /> - In the last no/no option, describing the method in the main text (if it is simple enough to fit) should also be included as an alternative.

    1. On 2022-09-22 21:33:39, user Jason Shepherd wrote:

      The question of how information is stored in neuronal ensembles during learning and memory has recently become accessible with IEG tagging approaches. How precisely tagged ensembles relate to the engram, or memory trace, is still not clear. Another important question is how do tagged ensembles mature or change over time and what is the precise engram that is required for remote memory recall. This preprint shows strong data supporting the idea of overlapping, but distinct ensembles involved in recent and remote memories. The authors show that tagged ensembles change their network connectivity over time, using innovative viral tracing techniques. For example, dCA1 neurons that project to the ACC are more likely to be engram cells at remote recall than recent recall timepoints, and fewer ACC to dCA1 cells are active at remote recall compared to recent recall time points.

      We think there are some additions that could be made to improve the conclusions and data presentation:

      1. Showing individual data points for all bar graphs would improve the interpretability of the data throughout the paper. We also noticed that in some experiments, the N values for controls vs manipulated/activated animals are vastly different (e.g. Fig. 4).

      2. Include individual statistical tests in each figure/panel.

      3. The authors quantify the overlap of engrams tagged at different time points by calculating the overlap compared to expected overlap. While this is useful to show that Fos-tagged ensembles are not random, we believe it is important to also include the absolute percentage of overlapping cells to determine the similarity of engrams. It appears from the IHC images that the absolute overlap is a low percentage of the total number of neurons tagged as engram cells at any particular timepoint. This should also include the total % of Fos-tagged cells in each experiment, since the total % would greatly alter the overlap expected by chance. Indeed, in many cases it seems that there is less than the expected chance value, indicating that ensembles are not activated randomly but may be distinct.

      4. We appreciate the design of the first set of functional experiments where 4-OHT is administered during recall (Fig 1K). This approach shows that the same cells active in the recent recall engram are those inhibited by CNO a month later at the remote test. To take this experiment one step further, one could add a group where 4-OHT treatment is administered 2 days post-acquisition without a recall test, or with a recall test in a different context, and evaluation of fear conditioning at the remote time-point. This would be a convincing way to show that CNO is not simply inhibiting enough neurons to block the remote memory, but rather that it is the activity of those specific neurons in the original recall engram which are necessary for remote recall.
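
      The overlap measures discussed in the third point can be sketched as follows. This is a minimal illustration that assumes chance overlap is the product of the two tagged fractions (independent tagging); the function name and the cell counts are hypothetical and are not taken from the preprint.

```python
def overlap_summary(n_total, n_tagged_a, n_tagged_b, n_both):
    """Summarise overlap between two tagged ensembles counted in the same
    population of n_total cells.

    Assumes that, under chance, tagging at the two time points is
    independent, so expected overlap = fraction_a * fraction_b.
    """
    frac_a = n_tagged_a / n_total
    frac_b = n_tagged_b / n_total
    observed = n_both / n_total        # absolute overlap, fraction of all cells
    expected = frac_a * frac_b         # chance-level overlap
    return {
        "pct_tagged_a": 100 * frac_a,
        "pct_tagged_b": 100 * frac_b,
        "pct_overlap_of_all_cells": 100 * observed,
        "pct_of_a_also_in_b": 100 * n_both / n_tagged_a,
        "pct_overlap_expected_by_chance": 100 * expected,
        "observed_over_chance": observed / expected,
    }


if __name__ == "__main__":
    # Hypothetical counts: 1000 cells, 100 tagged at time A, 80 at time B,
    # 16 tagged at both time points.
    for key, value in overlap_summary(1000, 100, 80, 16).items():
        print(f"{key}: {value:.2f}")
```

      In this made-up example the overlap is twice the chance level, yet only 16% of the first ensemble is reactivated, which is why reporting both the ratio and the absolute percentages is informative.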

      Review made by Shepherd lab members

    1. On 2022-04-11 16:37:49, user Leslie Kay wrote:

      This is a review I posted on Qeios, thinking it was bioRxiv asking me for the review. (Aside: Can someone explain to me what Qeios is, and how it's related to open access?)

      This paper tests the hypothesis that the olfactory bulbectomy (OBx) model of major depressive disorder (MDD) is caused by a lack of OB gamma band oscillatory input to the limbic system. OBx is a catastrophic surgery accompanied by significant blood loss and requires weeks of recovery. This leads to a confound with neurodegeneration. The current paper used DREADDs to silence the OBs bilaterally and chronically for several weeks. Additionally, they used short term silencing and cancellation / enhancement of gamma oscillations in an LPS model of MDD.

      Several findings support the hypothesis that it’s loss of OB input to the limbic system that causes the depressive phenotype. There are some differences dependent on the type of silencing. The open field test (OFT) is the gold standard for OBx depression, with hyperactivity and avoidance of the center the classic behaviors indicative of MDD. With chemogenetic silencing, animals avoid the center but are not hyperactive, and they do not exhibit anhedonia. Short term silencing does the opposite - anhedonia but not OFT hyperactivity/center avoidance. These opposite results are interesting and may help get at different mechanisms for anhedonia and anxiety in the OBx model.

      The authors use closed-loop stimulation locked to the gamma bursts in the OB to determine whether gamma burst activity in the PC reduces depressive symptoms. In the LPS model of MDD, they stimulated to either enhance or cancel out gamma transmission to PC from OB. Enhancing gamma reduced depressive symptoms in LPS, and blocking gamma by stimulating in antiphase with the OB gamma did not reduce symptoms. The authors conclude that loss of gamma is the cause of OBx depression.

      I am not sure I agree 100% with their conclusions, even though I have no substantive criticisms of the methods and results. Amplifying gamma is sufficient to reduce symptoms, but does canceling it out tell us that it is gamma per se that causes the antidepressant effect? Canceling out gamma does stimulate the fibers going into the PC but what does the antiphase stimulation do exactly to the PC? Are the same number of action potentials produced, or is the antiphase stimulation doing something fundamentally different to the PC inputs?

      For the rest of my comments, I need to tell a story, one which I shared with Gyuri the other day. I reminded him of our conversation years ago, when I discussed the idea that OBx depression is due to loss of OB input to the PC and the rest of the limbic system. I envisioned a similar experiment to this one. A few years later we met again at Walter Freeman’s Festschrift in Tucson, the day after Walter had passed away. We discussed the idea again and I told him we were working on it. We never got anywhere with what we tried and Gyuri rightly went ahead. No hard feelings at all, and I am really glad that you all did such a great job on this.

      I think there is a crucial piece missing though, on the provenance of this idea, and it comes from Walter. I shared with Gyuri way back when we first spoke about this idea one of Walter’s little-known papers, a 1968 J Neurophys article “Effects of surgical isolation and tetanization on prepyriform cortex in cats.” This paper was published the same year as the Becker and Freeman paper cited in this report. While the Becker and Freeman paper shows that PC activity changes when the olfactory bulb is removed, the single authored 1968 paper gets at its cause.

      The origin of the idea comes from Walter Freeman, as most good ideas in olfaction do. In the 1968 paper, he bulbectomized cats and showed that a normal shock stimulus to the remaining LOT no longer induced the normal oscillatory evoked potential in the PC – there was a single peak in voltage dying off after one cycle. Two hypotheses were considered, 1) the OB drives the oscillation in the PC, when the LOT is stimulated it produces an oscillatory evoked response in the OB, which drives the same response in the PC, and 2) the OB input is necessary for the PC to produce an oscillation.

      The second hypothesis was the one favored by his results. He replaced the missing OB with tetanic 200Hz low level stimulation of the stump of the LOT and then stimulated with the normal larger shock stimulus during a pause in the tetanic stimulation. Et voila, the oscillatory evoked potential was reinstated in the PC. This relatively obscure paper showed an important role for the OB – it provides abundant excitatory drive to the rest of the system, keeping everything in the right dynamic range. These results were replicated for the entorhinal cortex by Kurt Ahrens (Ahrens and Freeman, Brain Research 2001).

      The rescue of depressive behavior with gamma enhancement in the LPS model in the current study is intriguing, and the cancellation effect of the antiphase stimulation is compelling. Would the same type of stimulation rescue a silenced olfactory bulb? If it does not, does this mean that different mechanisms are at play for different models of depression? The methods used here may be able to make sense of the mechanisms and usefulness of different models of depression for different types of treatment studies. Already the difference in behavioral effects among the several methods poses some very interesting questions.

      I appreciate the space to tell Walter’s story and the format of bioRxiv that allows public discourse about research reports.

    1. On 2021-10-14 16:31:13, user Colin Hawco wrote:

      Overall important work but I'd like to raise some issues.

      First the Destrieux atlas is not a functional atlas. People keep using these sorts of atlases in fMRI work and I have no idea why. The Superior temporal lobe is not a functional unit. That giant big mid-frontal region is not the DLPFC and not well overlapped with what may be reasonable activity patterns for tasks such as the NBack.

      Also, the analysis appears to use beta values from various contrasts. IME the average t-value is more reliable as a metric because it is (de)weighted by the noise in the voxel/vertex/region. In any analysis of general patterns of activity, I have found individual t stats to be more robust than betas.

      Also you included many contrasts, including several that have obviously lower ICCs, and in most of the paper appear to collapse across all regions and contrasts. For the Nback, I'd mainly focus on the 0 Bk, 2Bk, and most importantly, the commonly used 0vs2Bk contrast. Those look like they have relatively decent ICCs to me.

      Relatedly, in the figures you average across all contrasts, but some of them are not very good contrasts and as a result, your reported regional ICCs are dragged down. Rather than a take-all approach, I think it would be better to focus on the primary contrasts as the ones being used.

      I object to the use of ROI as regions which you found interesting when the entire analysis is based off an atlas; the more conventional use of ROI is parcels, etc, in the broader sense, rather than 'parcels I think are interesting versus those I think may be less interesting'. I'm being pedantic but it confused me.

      (everything after this is me pontificating on things I think are interesting in general). <br /> Interesting and important point that contrasts vs baseline have higher stability than two task contrasts, but I also think we forget this is a truism. If you have an imperfect measurement, and subtract another imperfect measurement from it, the reliability of the difference must, by definition, be lower than the reliability of the two separate things (oh, I see this is mentioned later in the discussion).
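
      The difference-score point can be made concrete with the classical test theory formula for the reliability of a difference D = X − Y. A minimal sketch, assuming the two measures have equal variances; the function name and the example values are illustrative only and do not come from the preprint:

```python
def difference_score_reliability(r_xx, r_yy, r_xy):
    """Reliability of the difference X - Y under classical test theory,
    assuming X and Y have equal variances:

        r_dd = ((r_xx + r_yy) / 2 - r_xy) / (1 - r_xy)

    r_xx, r_yy: reliabilities of the two measures (e.g. two conditions)
    r_xy:       correlation between the two measures
    """
    if r_xy >= 1:
        raise ValueError("r_xy must be below 1")
    return ((r_xx + r_yy) / 2 - r_xy) / (1 - r_xy)


if __name__ == "__main__":
    # Two conditions with reliability 0.8 each.
    for r_xy in (0.0, 0.3, 0.5, 0.7):
        r_dd = difference_score_reliability(0.8, 0.8, r_xy)
        print(f"correlation {r_xy:.1f} -> contrast reliability {r_dd:.2f}")
```

      Under this formula, the more strongly the two conditions correlate (as task conditions sharing most of their processing typically do), the lower the reliability of their contrast; only when the conditions are uncorrelated does the contrast keep the average reliability of its parts.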

      Important that the SST and MID tasks had much poorer reliability. My opinion has been that reward tasks generally have very poor reliability, potentially because the signal is not strong enough, but also because people may vary quite a bit within themselves in how they respond to trials, and oscillate.

      One point of potential import is that a lot of these analyses being done across the field are assuming task activation should be stable, but the brain, and fMRI, is inherently dynamic. Averaging activity by model fit across these relatively short tasks may not provide a very stable metric. Considerations of dynamic processes may yield greater information, but a big challenge there is motion (it's always motion...) which makes dynamic measures really hard.

    1. On 2021-09-20 11:28:54, user Aalok Varma wrote:

      This preprint was presented at our lab journal club and we thought we’d start an open discussion about these results.

      We would first like to note that it was a pleasure to discuss the results in this paper, which we found rather interesting, and we had a very fruitful discussion. Nevertheless, we had several questions and clarifications that we were hoping you would be able to help resolve:

      1. Could you please describe in a bit more detail how exactly bouts are defined and how bursts and bouts are distinguished from one another in the processed VNR signal? Is there an interbout interval threshold set to separate bouts, for instance? If yes, what was the value used?

      2. Proof of the idea of measuring conduction velocities using voltage imaging is neat. However, is there some validation of the conduction velocities, as measured by the sub-Nyquist interpolated spike timing (SNAPT) method? For instance, can you compare measurements of conduction velocity by this method with, say, measurements from dual recordings with downstream partners to compare the delay between activation of a cell and the arrival of a PSC? It is not perfect, of course, but given that actually recording from dendrites etc is so challenging in small preps like larval zebrafish, it would be a useful reference value for comparison of how accurate the SNAPT measurements are.

      3. The analysis of Figure 4 involves sorting bouts as those having >50% of active V3. This is a rather arbitrary classification, especially since there aren’t too many neurons per field of view so the difference between 40% and 60% might be just one neuron or so. Why not go the other way around, and first classify bouts as strong/weak and then ask what fraction of V3s was active, across all trials, by plotting bout strength against % of V3 active as a scatter plot? Moreover, we have some concerns with the definition of bout strength. Taking the average cumulative value as the bout strength doesn’t really capture the true bout strength, in our view, since it only captures amplitude, and not so much duration. In Fig 4A (bottom), for instance, bout #2 looks much weaker than bout #7 (which is longer). Yet, their computed “bout strength” is very similar. Why not use Area Under the Curve as a proxy for bout strength? It would capture both amplitude and duration in the definition of “strength”. This analysis may not change the results or the overall story, but is a more objective way of analysing the data, we think.

      4. From the representative plots shown in Fig 5C and D, it seems that when V3 neurons’ activation is turned off, the bout ends. Yet, from the earlier figures, it seems that V3 activity is sustained even after a bout ends. Is it possible with the resources available to perform acute inhibition of these neurons during a bout, to test if shutting their activity suppresses swims? It would lend support to the hypothesis that V3 activity sustains bouts?

      5. The switch to free swimming with optomotor response for the experiment in Fig 6 wasn’t very clear. Moreover, we don’t agree with the interpretation of the result about bout speed modulation in Fig 6C. From the raw data points, the distributions seem largely overlapping, and the difference being detected may simply be because of the large difference in the sample sizes between the control and the ablated groups. Also, how about doing the ablation experiment using the same paradigm as in Fig 5? That way, results may be easier to compare. Furthermore, it is interesting that there is no difference in bout durations in vivo with V3 ablation, although all previous experiments suggest that one should expect a reduction in bout duration on V3 ablation. This may be because of functional compensation/adaptation because of a genetic ablation of V3 neurons from birth. Hence, it may be better to perform acute inhibition in the V3 population during free-swimming OMR, provided you have lines to do the same.
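
      The bout strength suggestion in point 3 can be illustrated with a short sketch comparing mean amplitude with area under the curve (trapezoidal rule). The traces, the sampling interval and the function names below are hypothetical and are not taken from the preprint's data or code.

```python
def mean_amplitude(signal):
    """Average value of the processed signal over a bout."""
    return sum(signal) / len(signal)


def area_under_curve(signal, dt):
    """Trapezoidal-rule integral of a bout sampled every dt seconds."""
    return sum((a + b) / 2 * dt for a, b in zip(signal, signal[1:]))


if __name__ == "__main__":
    dt = 0.05                    # hypothetical sampling interval in seconds
    short_bout = [2.0] * 5       # same amplitude, 0.2 s long
    long_bout = [2.0] * 15       # same amplitude, 0.7 s long

    for name, bout in (("short", short_bout), ("long", long_bout)):
        print(
            f"{name}: mean amplitude = {mean_amplitude(bout):.2f}, "
            f"AUC = {area_under_curve(bout, dt):.2f}"
        )
```

      Both bouts have the same mean amplitude, but the longer one has a larger area under the curve, so only the second measure separates them.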

      Other general comments:<br /> 1. Could the legends please include all n’s, as appropriate? Some legends have it, others don’t. It would make the reading much easier.<br /> 2. Fig 1G and Supplementary Fig 3 - clarify the dorsoventral axis schematic. What does 0-1 mean - as in, which is ventral and which dorsal? I think 1 would be ventral, given that active motor interneurons seem to be positioned that way, but a clarification is needed and would make the figure easier to interpret.<br /> 3. Could you please describe the filters being used in a bit more detail, instead of simply stating “high-pass filtered”? What filter type was used (Butterworth, etc.)? What were the cutoff frequencies (sometimes time constants are mentioned, but it would be better to be consistent in the reporting of these details)?<br /> 4. In the introduction, it is stated that “adapting motor output can also happen via changes in tail amplitude or force, without substantial changes in frequency.” Recent work from our lab - Jha and Thirumalai (Current Biology, 2020) - has supported this claim, and we have also shown using whole-cell recordings that this can be explained by changes in the intrinsic properties and recruitment of primary motor neurons at lower speeds. We hope you go through our paper and find it useful, in which case a citation of our work would be much appreciated.

      We hope you find some of our comments useful, and we eagerly look forward to hearing back from you.

      Thanks in advance.<br /> Best,<br /> Aalok Varma<br /> Neural Circuits and Development Lab<br /> National Centre for Biological Sciences (NCBS),<br /> India

    1. On 2021-07-15 04:53:29, user Derek Beaton wrote:

      Overview of “On stability of Canonical Correlation Analysis and Partial Least Squares with application to brain-behavior associations”

      Derek Beaton, PhD<br /> Director, Advanced Analytics <br /> Data Science & Advanced Analytics (DSAA)<br /> St. Michael’s Hospital, Unity Health Toronto

      This manuscript provides an in-depth look at reliability and stability of CCA and PLS through the use of a generative modelling approach with synthetic data (and their software gemmr), and subsequently shows CCA and PLS applied to large and modern brain-behavior data sets (HCP, UKBB). The manuscript also provides multiple perspectives: (1) assessment of brain-behavior CCA & PLS when sample sizes change for the number of features, (2) a meta-analysis/review of brain-behavior CCA studies, and (3) tools, suggestions, and advice on how to approach interpretation of CCA & PLS-based studies for brain-behavior neuroimaging studies. There is a substantial amount of work and the contributions of the manuscript are quite valuable. Overall I think this is a strong manuscript and there are many good things about this paper and the software.

      However I focus my review on my concerns. I think if some of these are clarified or responded to, then the paper would possibly be stronger and clearer. Below I first bullet point my primary concerns with the manuscript, and how those concerns relate to the overall conclusions and generalizability of the work. Following that, I provide my other concerns generally in order of appearance in the manuscript.

      My first major concern is that the manuscript generally reads as a list of potential limitations of CCA and PLS. However, only these two methods are discussed, and I believe that the core issues of stability (and generalizability, replicability, etc…) in neuroimaging stem from (1) small samples, and (2) noisy measurements. So are the issues presented exclusive to CCA/PLS? Or should we expect to see the same effects in other techniques (e.g., standard GLMs, multivariate regressions, statistical/machine learning approaches such as SVM or random forests)?

      While comparing CCA and PLS is (very, very) useful for many fields, especially neuroimaging, I believe that some of the comparisons here are in effect unfair. In particular, CCA doesn’t really work without extra preprocessing to data when those data have more variables than samples. CCA effectively requires us to reduce the dimensionality of data so that we have more samples than variables or to allow us to invert X’X and/or Y’Y. However, PLS does not require additional preprocessing in order to work (correctly). The pipelines for the data were designed around the limitations of CCA but applied to both PLS and CCA. How does PLS perform when these extra steps are not taken? Effectively, how does PLS vs. PLS with CCA-friendly data vs. CCA compare? Though I comment on it more later, I believe that the observed “bias towards the first principal components” in the PLS results may be due to this.
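
      The invertibility point can be illustrated with a minimal sketch. The sample and feature counts below are hypothetical and the data are random noise, so this only shows the linear-algebra constraint, not the manuscript's pipeline: with more variables than samples, X’X is rank deficient (so CCA cannot invert it), while the cross-product X’Y used by the SVD flavor of PLS can always be decomposed.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_x, n_y = 20, 50, 30   # hypothetical sizes: more variables than samples

X = rng.standard_normal((n_samples, n_x))
Y = rng.standard_normal((n_samples, n_y))
X -= X.mean(axis=0)   # column-center both sets
Y -= Y.mean(axis=0)

# CCA needs the inverses of X'X and Y'Y; with p > n these matrices are rank deficient.
rank_xtx = np.linalg.matrix_rank(X.T @ X)
print(f"X'X is {n_x} x {n_x} but has rank {rank_xtx}")

# PLS (SVD flavor) only needs the cross-product X'Y, which always has an SVD.
U, s, Vt = np.linalg.svd(X.T @ Y, full_matrices=False)
print("PLS weight matrices:", U.shape, Vt.T.shape)
```

      Centering removes one further degree of freedom, so the rank of X’X is at most one less than the number of samples.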

      Taken together, I think the general conclusion to take away from the manuscript is that these are the behaviors and limitations of CCA/PLS under these specific conditions, but not necessarily any condition. I expand on this in additional comments and provide some references throughout.

      Abstract:

      You’ve noted that the “Application of CCA/PLS to high-dimensional datasets raises critical questions about reliability and interpretability”. Perhaps a small but important distinction here is that these techniques provide a lot of things to interpret, but comparatively are relatively easy to interpret (they are interpreted like PCA). I think there should be a de-emphasis of interpretability and most of the emphasis on reliability and stability. To note: these techniques are still easy to interpret even when results are not reliable (which is, perhaps, a drawback of their use).

      I apologize for the following comment as it will be repeated a few more times, but I believe that “For PLS [there is a] bias toward leading principal component axes.” is more likely an artifact of how the data were prepared for use in PLS and not strictly a drawback of PLS. If both X and Y data sets are principal components (which include their subsequently decreasing variance), then PLS will (correctly) pick up on those “variables” (components). This is particularly true if/when data submitted to PLS are not normed or scaled in some way (which principal components are likely not, as that destroys the inherent variance in the principal components).
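
      A population-level sketch of this argument, with made-up standard deviations for the principal component scores: every matched pair of components is given the same between-set correlation, yet the unscaled cross-covariance (which is what PLS decomposes) is dominated by the leading, highest-variance components. This is an illustration of the scaling effect only and makes no claim about the manuscript's actual data.

```python
import numpy as np

# Hypothetical standard deviations of PC scores (decreasing, as PCs always are).
sd_x = np.array([10.0, 3.0, 1.0])
sd_y = np.array([8.0, 2.0, 1.0])

# Assumption: each matched pair of components has the same correlation of 0.5.
corr_xy = 0.5 * np.eye(3)

# Cross-covariance of the unscaled scores: correlations multiplied by the scales.
cov_xy = np.diag(sd_x) @ corr_xy @ np.diag(sd_y)

# PLS on unscaled PC scores decomposes cov_xy; on z-scored scores it decomposes corr_xy.
U_raw, s_raw, _ = np.linalg.svd(cov_xy)
U_std, s_std, _ = np.linalg.svd(corr_xy)

first_weight_raw = np.abs(U_raw[:, 0])
print("singular values, unscaled:", s_raw)
print("first X weight, unscaled: ", first_weight_raw)
print("singular values, z-scored:", s_std)
```

      With unscaled scores the first weight vector loads entirely on the first component, whereas after z-scoring all three pairs are tied.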

      Introduction:

      I think “the dominant latent patterns of association linking individual variation in behavioral features to variation in neural features” would be better rephrased as “the dominant common latent patterns shared between behavioral and neural features”, or something along these lines, as it’s a bit clearer and doesn’t emphasize linking one thing to another thing (this sounds a bit directional, whereas CCA and this flavor of PLS are symmetric).

      When you say “[...] a number of open challenges exist regarding [CCA/PLS] stability in characteristic regimes of dataset properties”, I wonder if it’s more appropriate to also discuss the open challenges of the data themselves. Noisy instruments and measurements are difficult to analyze with most approaches, and this isn’t a problem for just CCA and PLS. In effect, do we have data that are stable and reliable?

      I find the mixtures of terminology difficult to follow. Could you provide a clearer set of definitions for terminology, and then stick specifically to certain terms? You’ve mentioned both the SVD and eigendecompositions. It might make things clearer to connect CCA & PLS terminology directly to SVD/eigen results, and just use those terms instead. For one particularly confusing example: “weights”. I’m not sure what “weights” are to mean here, especially because “weights” has so many meanings in stats/machine learning.

      I think the discussions of stability rely too heavily on relatively older literature (e.g., references 10-12) which are also generally from other domains. The same points from those are likely still true (or even more so in larger and noisier data) but I think more modern works that directly discuss high dimensional problems would be helpful. Furthermore, these generally discuss CCA and not PLS. So additional literature on PLS here would be good.

      For reference 13, the manuscript says “cross-validated association strengths that are markedly lower than in-sample estimates”. Isn’t that expected based on this (and other) work? Should we not expect the smaller sample sizes (e.g., folds) to produce lower (or less stable) estimates?

      To echo a previous point: most of the literature discussing (in)stability is for CCA and not PLS. This should be clarified or further supported.

      Though this work is important and well done, I don’t think it’s fair to say “to our knowledge, no framework exists [...]”. There has been a lot of work on the systematic assessment of these techniques, and the SVD/eigen in general. Could you clarify this a bit more? Or instead show that this is an additional element in our understanding of CCA/PLS behaviors? The field of chemometrics in particular has an extensive literature on the stability of PLS (although typically the regression flavor, not the PLSC flavor here).

      I think this is misleading and possibly incorrect: “CCA and PLS differed in their dependences and robustness, in part due to PLS exhibiting a detrimental bias of weights toward principal axes”. PLS may exhibit this behavior under these data processing conditions (which are required for CCA, but not for PLS).

      Another repeated point: the manuscript says that “typical CCA/PLS studies in neuroimaging are prone to instability”. Is this because of CCA/PLS? Are other techniques also unstable? Is this because of the data?

      Results:

      “Number of features” as the additive number between X and Y is strange, because each set has a different number of features. And the sizes of X and Y (as well as their internal covariance structures) can have substantial influence on the results. For example, if X were only 1 or 2 (strongly correlated) measures and Y had many 100s or 1000s of measures, then the (joint) solution is fairly limited and (to a degree) constrained by X.

      The finding of the “average of the cross-validated and in-sample” results struck me, especially given that the bootstrapped results didn’t converge to the expected estimate (but the previous average did). I didn’t expect this, but I think it’s a positive finding. Could you provide more details on these procedures, and could you possibly explain these behaviors/findings in more detail?

      Why are you quantifying error as the greater of the two errors (X and Y) from their true weights? Why not present them separately? That would tell us if/how CCA/PLS can estimate one set but perhaps not the other.

      I don’t follow what the authors did to get around the sign-flips in the results. The manuscript says “it is chosen to obtain a positive between-set correlation”, but I’m not sure what this means here.

      To repeat a previous point about terminology: the term “loadings” has many meanings, too. Here it seems the authors used the correlation between datasets and scores, correct? These correlation loadings are one type of loading, where, say, the singular/eigen vectors are another type of loading.

      Why switch between Spearman and Pearson correlations for the distance estimate for the various scores? Why not both in both cases or choosing one?

      I find Figure 3---in particular panels A and B---unclear. First, it’s not entirely clear to me what “weights” and “feature id” convey here. Figure 3B seems to show that PLS weights are spherical. This is not what I would expect from PLS. Could you explain these results in more detail?

      A reiterated point: The description of what it means for PLS to converge to “the first principal component” is unclear. The first principal component of what? There are two data sets (X, Y) that are sets of PCs (if I am understanding correctly).

      I think the permutation tests may be too conservative and/or incorrect (as described in CCA/PLS analysis of empirical data). While it is typical to permute just the rows of one matrix vs. the other, this is potentially problematic for CCA/PLS. That’s because each X & Y has an internal covariance structure. If at least one of those structures is strong, then the results will resemble the strong internal structure. This is particularly true when, for example, behavioral data are already very correlated. So a more appropriate permutation may be within each column of the data matrices. However, this is only appropriate in the original data matrices. Permutation should not be done on the PC scores (I am presuming that was the case, but please correct me if I am wrong).
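
      The two permutation schemes discussed above can be sketched as follows. The data are simulated and the function names are illustrative, not taken from gemmr or the manuscript. Shuffling whole rows breaks the pairing between X and Y but keeps each set's internal correlations, whereas shuffling within each column also removes the internal correlations.

```python
import numpy as np

def permute_rows(data, rng):
    """Shuffle whole rows: breaks the X-Y pairing but keeps within-set correlations."""
    return data[rng.permutation(data.shape[0])]

def permute_within_columns(data, rng):
    """Shuffle every column independently: also breaks within-set correlations."""
    out = np.empty_like(data)
    for j in range(data.shape[1]):
        out[:, j] = rng.permutation(data[:, j])
    return out

rng = np.random.default_rng(1)
latent = rng.standard_normal(500)
# Hypothetical behavioral data: three strongly correlated measures.
behavior = np.column_stack(
    [latent + 0.1 * rng.standard_normal(500) for _ in range(3)]
)

corr_rows = np.corrcoef(permute_rows(behavior, rng), rowvar=False)
corr_cols = np.corrcoef(permute_within_columns(behavior, rng), rowvar=False)
print("within-set correlation after row permutation:   ", round(corr_rows[0, 1], 3))
print("within-set correlation after column permutation:", round(corr_cols[0, 1], 3))
```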

      For the line that starts with “After modality-specific preprocessing (see Methods)”, I will reiterate and expand on one of my sticking points. CCA requires invertible or rank reduced matrices when there are too many variables but PLS does not. So to reduce specifically to 100 PCs is a limitation of CCA. PLS does not require this. How would the results change if PLS were run directly on the data? Furthermore, 100 principal components is neither an informative nor a meaningful choice. How many total components were there? How much variance did 100 components explain? Could just 10 or 20 components explain almost as much variance as 100? For analyses based on PCs, it is important to select based on something meaningful: that could be explained variance or by performing tests on the PCs themselves for selection. Though almost any approach is somewhat arbitrary, to select 100 is seemingly unmotivated or unguided.

      In Figure 4, how are you computing 95% CIs from permutations? Permutations are for null distributions, not distributions around the effects (CIs). I would expect other resampling approaches (e.g., bootstrap) to provide CIs.
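
      The distinction can be sketched on a simple correlation. The data are simulated, the effect size and the number of resamples are arbitrary, and a plain percentile bootstrap is assumed rather than whatever the authors may have used. Permutation yields the distribution of the statistic under no association (and hence a p-value), while the bootstrap yields an interval around the observed estimate.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.standard_normal(n)
y = 0.5 * x + rng.standard_normal(n)   # hypothetical association

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

observed = corr(x, y)

# Permutation: break the pairing to obtain the null distribution of the statistic.
null = np.array([corr(x, rng.permutation(y)) for _ in range(2000)])
p_value = (np.sum(np.abs(null) >= abs(observed)) + 1) / (null.size + 1)

# Bootstrap: resample pairs with replacement to obtain an interval around the estimate.
boot = np.empty(2000)
for b in range(boot.size):
    idx = rng.integers(0, n, size=n)
    boot[b] = corr(x[idx], y[idx])
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])

print(f"observed r = {observed:.3f}, permutation p = {p_value:.4f}")
print(f"bootstrap 95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

      The null distribution is centered near zero, so percentiles taken from it describe what to expect without an effect rather than the uncertainty of the estimate.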

      By the time I get to Figure 4, I’m wondering why the CCA and PLS results are not directly compared. As in, why not present, for example, correlations or other similarities between the CCA & PLS results? I think it would be important to directly quantify the similarity between CCA & PLS results.

      Later in the manuscript, you indicate that you “considered reducing the data to different numbers of principal components than 100.” While this is certainly a benefit, the description of the results is unclear. You indicate that “Retaining more than 10 behavioral PCs lead to marginal increases [...]”. But 10 here is not informative. How much variance was explained by those 10? By the 100? How much is explained by 1 PC? The total number of PCs is not particularly informative, rather, the amount of (cumulative) explained variance, the number of retained components, and the total number of possible components makes for something more informative.
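
      A minimal sketch of how cumulative explained variance could be reported alongside the number of retained components. The helper names and the toy matrix are made up for illustration; real data would replace `toy`.

```python
import numpy as np

def cumulative_explained_variance(data):
    """Cumulative fraction of variance explained by successive principal components."""
    centered = data - data.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variance = singular_values ** 2
    return np.cumsum(variance) / variance.sum()

def n_components_for(data, threshold):
    """Smallest number of components whose cumulative explained variance reaches threshold."""
    cumulative = cumulative_explained_variance(data)
    index = int(np.searchsorted(cumulative, threshold))
    return min(index + 1, cumulative.size)

# Toy data: three orthogonal, mean-zero columns of decreasing scale (made-up values).
toy = np.column_stack([
    3.0 * np.array([1.0, -1.0, 1.0, -1.0]),
    2.0 * np.array([1.0, 1.0, -1.0, -1.0]),
    1.0 * np.array([1.0, -1.0, -1.0, 1.0]),
])
curve = cumulative_explained_variance(toy)
print("cumulative explained variance:", np.round(curve, 3))
print("components needed for 90%:", n_components_for(toy, 0.9))
```

      Reporting the full curve together with the total number of possible components makes a choice such as 10 or 100 retained components interpretable.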

      Discussion:

      The authors mention that CCA is (more) attractive (than PLS) because it’s scale invariant, which is nice when measures are not commensurate. However, when data are normalized or scaled (e.g., z-scored), then data are commensurate. Did you use normed or scaled data for PLS? How would that change the conclusions about commensurate scales and CCA’s scale invariance?
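
      The scale question can be checked numerically. The sketch below uses simulated data and illustrative helper names. Canonical correlations are computed from the QR factors of the centered data sets, and PLS singular values from the cross-covariance. Rescaling columns of X (for example, a change of units) leaves the canonical correlations unchanged but alters the PLS singular values, unless both sets are z-scored first.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations via QR factors of the centered data (needs n > p, full rank)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)

def pls_singular_values(X, Y):
    """Singular values of the sample cross-covariance matrix (PLS, SVD flavor)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    return np.linalg.svd(Xc.T @ Yc, compute_uv=False) / (X.shape[0] - 1)

def zscore(data):
    return (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

rng = np.random.default_rng(3)
X = rng.standard_normal((300, 4))
Y = X[:, :3] @ rng.standard_normal((3, 3)) + rng.standard_normal((300, 3))
X_rescaled = X * np.array([100.0, 1.0, 1.0, 0.01])   # hypothetical change of units

print("CCA, original:", np.round(canonical_correlations(X, Y), 4))
print("CCA, rescaled:", np.round(canonical_correlations(X_rescaled, Y), 4))
print("PLS, original:", np.round(pls_singular_values(X, Y), 4))
print("PLS, rescaled:", np.round(pls_singular_values(X_rescaled, Y), 4))
print("PLS, z-scored:", np.round(pls_singular_values(zscore(X_rescaled), zscore(Y)), 4))
```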

      You mention in limitations that you “assume data are described in a PC basis” and then you “expect that a dataset whose features have been rotated into a new coordinate system by an orthogonal transformation matrix to have the same sample size requirements as the untransformed dataset.” In this particular case for PLS: you don’t need to assume that. You can run the same pipelines you have with the untransformed data to see how CCA vs. PLS vs. (untransformed) PLS compare. This would provide a very interesting case regardless of the results (whether the sample size requirements are the same or different).

      You say that the generative model points out the pitfalls of CCA and PLS. Could you also apply this generative approach to other techniques, even simple linear models? Do the pitfalls also exist there? Are these pitfalls of the methods, or are these pitfalls reflective of the kinds of data we analyze?

      You note that there are regularized versions of CCA and PLS to “mitigate the problem of small sample sizes”. I have two issues (one small, one a bit bigger) with this statement. First, regularized (and penalized, and sparsified, etc…) methods are not necessarily designed to allow for small sample sizes. Rather, they help mitigate overfitting (which can sometimes be due to too small a sample). Second, the line between CCA and PLS becomes especially blurred, and even disappears, when it comes to regularized techniques. In particular, we should look to Witten et al.’s penalized approach for CCA. Witten et al. note that “[in] high dimensional problems, treating the covariance matrix as diagonal can yield good results”, and they reframe their CCA equation (4.2) accordingly: in their “penalized CCA criterion, [they] substitute in the identity matrix” for X’X and Y’Y in their equation 4.3. Witten et al. then further note that their CCA “is simply [eq. 2.7] with X replaced with X'Y”. That means that when it comes to penalized CCAs, most drift towards or even start out as PLS. This can make any suggestion as to which is better (CCA or PLS) moot, because in the penalized approaches they are effectively much closer to one another than in the standard approaches. (Furthermore, using a subset of PCs for each data set is, effectively, a soft form of regularization.)
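
      The identity substitution can be sketched without any penalty term. This is not Witten et al.'s algorithm; the function names and data are illustrative. Both methods maximize u’X’Yv and differ only in the constraint matrices: with X’X and Y’Y the solution is CCA, and with identity matrices in their place it is the PLS (SVD) solution.

```python
import numpy as np

def inv_sqrt(sym):
    """Inverse square root of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(sym)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def first_pair(X, Y, constraint_x, constraint_y):
    """Maximize u'X'Yv subject to u'Cx u = 1 and v'Cy v = 1."""
    Wx, Wy = inv_sqrt(constraint_x), inv_sqrt(constraint_y)
    A, s, Bt = np.linalg.svd(Wx @ X.T @ Y @ Wy)
    return Wx @ A[:, 0], Wy @ Bt[0], s[0]

rng = np.random.default_rng(4)
X = rng.standard_normal((200, 5))
Y = X[:, :2] @ rng.standard_normal((2, 4)) + rng.standard_normal((200, 4))
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

# CCA: constraints are the within-set cross-products X'X and Y'Y.
u_cca, v_cca, rho = first_pair(X, Y, X.T @ X, Y.T @ Y)

# Identity matrices substituted for X'X and Y'Y: the problem becomes PLS (SVD of X'Y).
u_pls, v_pls, sigma = first_pair(X, Y, np.eye(5), np.eye(4))

U, s, Vt = np.linalg.svd(X.T @ Y, full_matrices=False)
print("first canonical correlation:", round(rho, 4))
print("identity-constrained solution matches PLS:", np.isclose(sigma, s[0]))
```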

      Though brief, I think you’ve placed too much emphasis on PLS regression as being “conceptually different from PLSC/PLS-SVD” because in virtually all implementations of PLS regression, the first component/latent variable is identical to PLSC’s first component/latent variable. This is because both approaches model X’Y and (in most cases) use the SVD to do so. It’s just that PLSC is one pass of the SVD (so effectively a PCA of X’Y) whereas PLSR is iterative, deflates X and Y in each iteration, and (asymmetrically) emphasizes certain properties for X (e.g., orthogonal latent variables for X, but not necessarily Y).
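
      The equivalence of the first component can be checked with a short sketch. It assumes a NIPALS-style iteration for multi-response PLS regression and uses simulated data; the function name is illustrative. Before any deflation, the iteration converges to the first left singular vector of X’Y, which is PLSC's first X weight vector.

```python
import numpy as np

def nipals_first_weight(X, Y, n_iter=500):
    """First X weight vector of PLS regression via NIPALS-style iterations (no deflation yet)."""
    u = Y[:, 0].copy()
    for _ in range(n_iter):
        w = X.T @ u
        w /= np.linalg.norm(w)   # X weights
        t = X @ w                # X scores
        c = Y.T @ t
        c /= np.linalg.norm(c)   # Y weights
        u = Y @ c                # Y scores
    return w

rng = np.random.default_rng(5)
latent = rng.standard_normal(200)
X = np.outer(latent, [2.0, 1.0, -1.0, 0.5, 0.0]) + rng.standard_normal((200, 5))
Y = np.outer(latent, [1.0, -2.0, 1.5]) + rng.standard_normal((200, 3))
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

w_plsr = nipals_first_weight(X, Y)

# PLSC: a single SVD of the cross-product matrix.
U, s, Vt = np.linalg.svd(X.T @ Y, full_matrices=False)
w_plsc = U[:, 0]

alignment = abs(w_plsr @ w_plsc)   # 1.0 means identical up to sign
print("alignment of first weight vectors:", round(alignment, 6))
```

      The two approaches diverge only from the second component onward, once PLS regression deflates X and Y.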

      Methods:

      The approach to the behavioral data is not particularly realistic when it comes to studies, is it? In most cases some form of imputation would be used and the behavioral data in particular would be directly used, not a projection (PCs) of the data. Would the behavioral PCs change substantially in your pipeline if you were to impute instead of using the method you did?

      References and literature:

      Below I provide some references and literature to supplement some of my points and to help strengthen some of the points you’ve made in the paper. Please note that some are mine. I’m not providing my (or the other) citations because I want or expect them to be cited; rather, these are for reference. Furthermore, these articles also provide quite a few citations that are worth looking into.

      These two articles provide more unified perspectives on PLS, CCA, and many related techniques. The Borga et al. article is quite a good one. I provide my article more so for the supplemental materials (https://www.biorxiv.org/content/10.1101/598888v3.supplementary-material). In my supplemental materials, I further unify and generalize more approaches like the Borga article. Both of these show (at least algebraically) that these techniques can be thought of as variations of one another, and in some cases not very different.

      Borga, M., Landelius, T., & Knutsson, H. (1997). A unified approach to pca, pls, mlr and cca. Linköping University, Department of Electrical Engineering.

      Beaton, D., Saporta, G., & Abdi, H. (2019). A generalization of partial least squares regression and correspondence analysis for categorical and mixed data: An application with the ADNI data. bioRxiv, 598888.

      To further emphasize why CCA/PLS can be very different or very similar, please see another one of my articles (see below). Like above, most of this article can just be skipped. Starting in section 4 on Page 22, I show CCA, PLS, and reduced rank regression (RRR) because they are all variants of one another. In Figure 5 the data are centered and scaled, and each technique produces comparable results. In Figure 7, however, the data are only centered and produce different results. This highlights that when norming/scaling, CCA and PLS can in fact be more similar than different:

      Beaton, D. (2020). Generalized eigen, singular value, and partial least squares decompositions: The GSVD package. arXiv preprint arXiv:2010.14734.

      Some recent work has been published to show what happens to results when sample sizes are small and as sample sizes change:

      Grady, C. L., Rieck, J. R., Nichol, D., Rodrigue, K. M., & Kennedy, K. M. (2021). Influence of sample size and analytic approach on stability and interpretation of brain-behavior correlations in task-related fMRI data. Human brain mapping, 42(1), 204-219.

      The above article is an interesting companion to yours because it shows that there is an advantage to multivariate over univariate techniques because multivariate approaches provide consistent (stable) results. However, Grady et al., concluded that small samples wouldn’t be sufficient to get reliable results, regardless of approach.

      These would be more suitable PLS articles to reference, especially for neuroimaging:

      Krishnan, A., Williams, L.J., McIntosh, A.R., & Abdi, H. (2011). Partial Least Squares (PLS) methods for neuroimaging: A tutorial and review. NeuroImage, 56, 455-475.

      Abdi, H. (2010). Partial least square regression, projection on latent structure regression, PLS-Regression. Wiley Interdisciplinary Reviews: Computational Statistics, 2, 97-106.

      McIntosh, A. R., & Mišic, B. (2013). Multivariate statistical analyses for neuroimaging data. Annual review of psychology, 64, 499-525.

      McIntosh, A. R., & Lobaugh, N. J. (2004). Partial least squares analysis of neuroimaging data: applications and advances. Neuroimage, 23, S250-S263.

      McIntosh, A. R., Bookstein, F. L., Haxby, J. V., & Grady, C. L. (1996). Spatial pattern analysis of functional brain images using partial least squares. Neuroimage, 3(3), 143-157.

      Additional PLS & CCA articles:

      Gatius, F., Miralbés, C., David, C., & Puy, J. (2017). Comparison of CCA and PLS to explore and model NIR data. Chemometrics and Intelligent Laboratory Systems, 164, 76-82.

      Goodhue, D. L., Lewis, W., & Thompson, R. (2012). Does PLS have advantages for small sample size or non-normal data?. MIS quarterly, 981-1001.

      To determine the number of PCs especially when detecting the space to interpret (which applies to PLS and CCA):

      Peres-Neto, P. R., Jackson, D. A., & Somers, K. M. (2005). How many principal components? Stopping rules for determining the number of non-trivial axes revisited. Computational Statistics & Data Analysis, 49(4), 974-997.

    1. On 2021-06-10 08:38:06, user Sebastian Dresbach wrote:

      Dear Xingfeng Shao, Fanhua Guo, Qinyang Shou, Kai Wang, Kay Jann, Lirong Yan, Arthur W. Toga, Peng Zhang and Danny JJ Wang,

      We have discussed the manuscript entitled “Laminar perfusion imaging with zoomed arterial spin labeling at 7T” in the Maastricht layer-fMRI seminar on Monday June 7th. In this letter, we would like to share a summary of our discussion points.

      The manuscript describes a sophisticated study about the implementation and application of functional layer-dependent CBF mapping in sensory and motor cortex. The authors use a pseudo-continuous ASL sequence with an optimized (relatively superiorly aligned) labeling plane and locally-focused 3D-GRASE readout at UHF. The method is validated with previously described “test-tasks” that evoke laminar-specific modulations in vascular responses.<br /> The study addresses one of the most pressing questions of the emerging field of human layer-fMRI. Namely, how to efficiently capture layer-specific signal changes that resemble laminar-specific neuronal activation changes. Thus, we believe that this manuscript will be of wide interest to the field.

      The study increases an already long list of non-BOLD layer-fMRI method studies that are currently being published in the field. This study stands out in the sense that it provides more than “just” a usable MRI sequence with extremely clear interpretable layer-profiles. It also shows expected modulations of activation changes for subtle task modulations of sensory feedback into the primary motor cortex, as well as attention modulations in V1.

      Some of the specific findings of the study are:<br /> -> The relative CBF change at these laminar resolutions can be as large as 150-200%. This is an extremely valuable piece of information to know in the field of laminar signal modeling. This will help with the interpretation of CBV results and it will help with the understanding of the vascular physiology in general. Until now, the field had to assume underestimated CBF-values from low resolution experiments (partial voluming), and from non-human animal experiments (anesthesia).<br /> -> pCASL is a usable sequence to study subtle cognitive modulations across depth within a conventional human neuroscience acquisition setting.

      While there are a few specific points that the manuscript could be revised on, we are extremely enthusiastic about the manuscript.

      Some aspects that lower our enthusiasm a little bit relate to <br /> (i) the unclear influence of short-TI background suppression, <br /> (ii) overstated claims on novelty and superiority over other modalities, <br /> (iii) limited information about some methodological aspects, <br /> and (iv) the restricted data availability.

      An itemised list of potential improvements is given below.

      1.) Influence of background suppression is unclear.<br /> The authors use a single-inversion background suppression. Due to the long T1 at 7T, this background suppression leaves the CSF magnetisation aligned along the opposite direction of the external magnetic field (negative phase). This results in signal cancellation at the superficial layers. While the control images (underlays of many figures) have a beautiful structural contrast, they exhibit a clear dark line at the transition between CSF and the superficial layers of GM. This might have substantial effects on the interpretation of the CBF profiles. With a net-negative phase of the z-magnetisation, an increase of CBF would result in a decrease of the MRI magnitude signal.<br /> Thus, for any voxel with partial voluming of CSF and GM, this might make the CBF quantification a bit tricky. For partial voluming of 50% and more, it might make the CBF quantification impossible? Given the nominal resolution of 1mm, this artifact might concern up to half of the cortical thickness. <br /> We would advise the authors to discuss the potential influence of the background suppression as used here. <br /> Was the TI of the background suppression kept constant for all post-label-delays?
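
      The sign argument can be made concrete with toy arithmetic. All magnetisation values below are made up and in arbitrary units, and this is not a Bloch simulation of the authors' sequence. It only shows that when the net longitudinal magnetisation of a partial-volume voxel is negative, adding a positive perfusion contribution lowers the magnitude signal instead of raising it.

```python
def magnitude_signal(csf_fraction, mz_csf, mz_gm, perfusion_mz):
    """Magnitude of the net longitudinal magnetisation in a partial-volume voxel."""
    net = csf_fraction * mz_csf + (1.0 - csf_fraction) * (mz_gm + perfusion_mz)
    return abs(net)

# Hypothetical values (arbitrary units): CSF still inverted, GM slightly positive.
MZ_CSF = -0.6
MZ_GM = 0.1
PERFUSION = 0.02   # extra positive magnetisation delivered by inflowing blood

# Voxel with 50% CSF: net magnetisation is negative.
baseline_mixed = magnitude_signal(0.5, MZ_CSF, MZ_GM, 0.0)
active_mixed = magnitude_signal(0.5, MZ_CSF, MZ_GM, PERFUSION)

# Pure GM voxel: net magnetisation is positive.
baseline_gm = magnitude_signal(0.0, MZ_CSF, MZ_GM, 0.0)
active_gm = magnitude_signal(0.0, MZ_CSF, MZ_GM, PERFUSION)

print("mixed voxel:", baseline_mixed, "->", active_mixed)
print("pure GM voxel:", baseline_gm, "->", active_gm)
```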

      2.) Details about the “deblurring” in the partition direction can be extended.<br /> We applaud the authors’ efforts to acknowledge and account for the blurring in the partition direction of 3D-GRASE. Unfortunately, we are afraid that the method's description is not really sufficient to help us fully appreciate how appropriate and effective the method is.

      On page 7, it is mentioned that partial Fourier sampling is applied in the partition direction and furthermore it is mentioned that variable refocusing flip angles are used. Furthermore, the spatial variance of non-180deg pulses will result in stimulated echoes and a sensitivity to T1 as well as local B1+. All of those features have substantial effects on the k-space signal evolution in the partition direction. However, based on the descriptions of the deblurring and based on the depiction of the simulated k-space signal (Fig. S5A), those effects do not seem to be incorporated in the deblurring model.<br /> We would advise the authors to comment on the limits of the used deblurring method and the potential of introducing artificial edge-enhancement features into the data. E.g. the ringing effect of the PSF in Fig. 4D might have the same spatial frequency as the layer-fMRI double peak in Fig. S5E. Is it possible that the edge enhancement-filter introduced layer-signatures across cortical depth based on the sharp border at the GM-CSF transition? Could the overcorrection of T2-blurring be responsible for the vertical stripes in the axial view of Fig. S5G?

      3.) <br /> a) We don’t follow the claim about VASO’s lack of capturing absolute CBV changes.<br /> The authors claim that the proposed method is superior to CBV-based (VASO) methods because “VASO only measures relative CBV changes that may be confounded by different baseline CBV values across cortical layers” (page 3). This claim is repeated on page 7. <br /> We believe that this statement is untrue. VASO measures "absolute" CBV changes. VASO is sensitive to volume redistributions within the voxel. Thus, the percent VASO signal change refers to “absolute” physical units of ml of CBV change per 100 ml of tissue. The “relative” part about CBV changes does not refer to a normalization to baseline CBV; rather, it refers to the relativity of 100ml of tissue. This is identical to the “absolute” CBF quantification in ASL. The authors quantify their “absolute” CBF values in “relative” units (per 100ml of tissue). VASO’s lack of a baseline CBV quantification (without the use of multiple inversion times) should not be misunderstood as an inherent normalization of CBVrest.<br /> For more background about the “absolute” units of VASO in layer-fMRI, see Fig. 4 and section 4.4 in Huber et al., 2021, as well as Fig. 8 in Huber et al., 2015.

      Huber L, Poser BA, Kaas AL, et al. Validating layer-specific VASO across species. Neuroimage. 2021. doi:10.1016/j.neuroimage.2021.118195

      Huber L, Goense J, Kennerley AJ, et al. Cortical lamina-dependent blood volume changes in human brain at 7T. Neuroimage. 2015;107:23-33. doi:10.1016/j.neuroimage.2014.11.046

      There are plenty of VASO approaches from the Johns-Hopkins group, from NIH, and the Yale group quantifying absolute CBV changes by means of multiple TI’s.

      Hua J, Qin Q, Pekar JJ, van Zijl PCM. Measurement of absolute arterial cerebral blood volume in human brain without using a contrast agent. NMR Biomed. 2011;24(10):1313-1325. doi:10.1002/nbm.1693

      Ciris PA, Qiu M, Constable RT. Noninvasive MRI measurement of the absolute cerebral blood volume-cerebral blood flow relationship during visual stimulation in healthy humans. Magn Reson Med. 2014;72(3):864-875. doi:10.1002/mrm.24984

      Gu H, Lu H, Ye FQ, Stein EA, Yang Y. Noninvasive quantification of cerebral blood volume in humans during functional activation. Neuroimage. 2006;30(2):377-387. doi:10.1016/j.neuroimage.2005.09.057

      b) Furthermore, the authors claim that due to the (wrongly presumed) CBVrest sensitivity of VASO, it fails to capture the fact that layers II/III have a stronger activation than layer Vb during finger tapping (page 5).<br /> This statement is not supported by the literature. A finger tapping task has been conducted in about a dozen layer-fMRI VASO studies and every single one shows a stronger activation in layers II/III compared to layer Vb. See the studies below, just to name a few:

      Guidi M, Huber L, Lampe L, Gauthier CJ, Möller HE. Lamina-dependent calibrated BOLD response in human primary motor cortex. Neuroimage. 2016;141:250-261. doi:10.1016/j.neuroimage.2016.06.030

      Beckett AJS, Dadakova T, Townsend J, Huber L, Park S, Feinberg DA. Comparison of BOLD and CBV using 3D EPI and 3D GRASE for cortical layer fMRI at 7T. Magn Reson Med. 2020:1-18. doi:10.1101/778142

      Persichetti AS, Avery JA, Huber L, Merriam EP, Martin A. Layer-Specific Contributions to Imagined and Executed Hand Movements in Human Primary Motor Cortex. Curr Biol. 2020;30:1-5. doi:10.2139/ssrn.3482808

      Chai Y, Li L, Huber L, Poser BA, Bandettini PA. Integrated VASO and perfusion contrast: A new tool for laminar functional MRI. Neuroimage. 2020;207. doi:10.1016/j.neuroimage.2019.116358

      Huber L, Goense J, Kennerley AJ, et al. Cortical lamina-dependent blood volume changes in human brain at 7T. Neuroimage. 2015;107:23-33. doi:10.1016/j.neuroimage.2014.11.046

      Guidi M, Huber L, Lampe L, Merola A, Ihle K, Möller HE. Cortical laminar resting-state fluctuations scale with the hypercapnic bold response. HBM. 2020;41:2014-2027. doi:10.1002/hbm.24926

      c) In the paragraph on comparisons with VASO (first paragraph on page 7) the authors claim that “the proposed ASL fMRI is robust to potential BOLD contamination”. We believe that this claim needs further explanations and/or rephrasing. The authors use the CBF model in (Alsop et al., 2015), which does not account for T2/T2’ contaminations (and lacks any discussion of BOLD or functional imaging). The model in (Alsop et al., 2015) is in fact the one that was developed in (Buxton et al. 1998) and has been developed for much lower field strengths and different voxel sizes with assumed average vascular distribution. While we agree that the intra-vascular BOLD contamination is indeed negligible at 7T (maybe even with GRASE), we do not believe that the extra-vascular BOLD contamination around the microvasculature can be neglected at 7T. <br /> While layer-fMRI VASO studies account for such T2/T2’ contaminations, by means of a dynamic division, this extra-vascular BOLD contamination is not taken care of in this study. In the setup used here, the labeled water magnetization that permeated through the capillary walls (during the label condition and not during the control condition) will experience a T2/T2’ contamination during the readout and linearly scale the CBF signal.<br /> Maybe the authors can elaborate a bit about the reasoning behind the claim that their BOLD contamination is 1-2%. How was this number obtained?

      Alsop DC, Detre JA, Golay X, et al. Recommended implementation of arterial spin-labeled Perfusion mri for clinical applications: A consensus of the ISMRM Perfusion Study group and the European consortium for ASL in dementia. Magn Reson Med. 2015;73(1):102-116. doi:10.1002/mrm.25197

      4.) Claim about first in-vivo depth-dependent CBF dynamics. <br /> On page 5, the authors claim “this is the first time that the dynamics of labeled blood flowing from pial arteries, arterioles to downstream microvasculature is shown in vivo on the cerebral cortex”. This claim is repeated on page 3.<br /> Aside from our general hesitance to appreciate novelty as scientific value, we do not believe this statement is true. Please compare the reference below:

      Zappe AC, Pfeuffer J, Merkle H, Logothetis NK, Goense JBM. The effect of labeling parameters on perfusion-based fMRI in nonhuman primates. J Cereb Blood Flow Metab. 2008;28(3):640-652. doi:10.1038/sj.jcbfm.9600564

      Along the lines of these claims on CBF dynamics, the authors might be interested in the fact that a number of other studies investigated and depicted time courses of the temporal evolution of CBF. The manuscript at hand, however, does not show a single time course of the CBF dynamics.

      Kim T, Kim SG. Cortical layer-dependent arterial blood volume changes: Improved spatial specificity relative to BOLD fMRI. Neuroimage. 2010;49(2):1340-1349. doi:10.1016/j.neuroimage.2009.09.061

      Kashyap S, Ivanov D, Havlicek M, Huber L, Poser BA, Uludag K. Sub-millimetre resolution laminar fMRI using Arterial Spin Labelling in humans at 7 T. PLoS One. 2021;16(4 April):1-23. doi:10.1371/journal.pone.0250504

      5.) The data are available only under an MTA contract. Compared to other papers in the field, this is quite restrictive and highly unconventional. Without access to the sequence and the data, the impact of the manuscript on the field is substantially reduced. We would advise the authors to provide more details about the terms and conditions of the MTA.

      6.) The nominal resolution of 1mm iso is unconventionally low for laminar fMRI, even for non-BOLD laminar fMRI. We believe that the “proof is in the pudding” and that the shown specificity of the layer-profiles justifies the usability of the resolution used. However, we still think that the manuscript would benefit from a brief discussion on the laminar specificity across the two investigated brain areas (M1 and V1) with respect to the cortical thickness.

      7.) The y-axis values of %-perfusion change in Fig. 2E and Fig. 3E are two orders of magnitude smaller than those in Fig. 4E. We assume that a conversion to percent (a factor of 100) is missing? Maybe they refer to a different baseline? Maybe %-perfusion in Fig. 2E and 3E refers to % of units of M0, and %-perfusion in Fig. 4E refers to % of units of CBFrest?

      8.) The figure key and the axes descriptions in Fig. S3 are hard to read in printouts of the figure. We had to zoom in quite a bit on the electronic version to be able to read them. We would advise the authors to align the panels vertically to cover more space on the page. The y-axis range of panel G has huge implications for the field and it would be a pity, if it stays unreadable.

      9.) The qualification of M0 is unclear to us. The fact that the FAs are not exactly 180deg in the GRASE readout leads to stimulated echoes. While these stimulated echoes are helpful to obtain a better PSF and increase the signal efficiency, they introduce a T1 weighting into the final contrast. This means that (unlike single shot methods with very long TRs) the readout module itself makes it impossible to obtain a reference image without any T1-weighting (M0). We would like to encourage the authors to add a few details on how they estimated the equilibrium z-magnetization with the 3D-GRASE readout used.

      Scheffler K, Engelmann J, Heule R. BOLD sensitivity and vessel size specificity along CPMG and GRASE echo trains. Magn Reson Med. 2021:1-8. doi:10.1002/mrm.28871

      10.) It is mentioned that the study would use a “segmented” readout. We find this terminology confusing. Are the authors referring to multiple excitation pulses per volume? If we understand the sequence correctly, we believe “partitioned” would be an alternative term.

      11.) We are puzzled about the message of the depicted run-averaged motion traces in Fig. S7. What should the reader take away from this? Is it concerning that there are common motion patterns that are repeatable across runs? How many participants are averaged here? A more informative depiction of the subject-motion might be the average frame-wise displacement for all runs and participants? The manuscript might become clearer by revising this figure and/or removing it.
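
      As a sketch of the summary metric we have in mind, frame-wise displacement can be computed from the six realignment parameters using the widely used definition of Power et al. (2012): the sum of absolute frame-to-frame differences, with rotations converted to millimetres of arc on a sphere. The 50 mm radius, the parameter ordering (three translations in mm, then three rotations in radians), and the function names below are our assumptions and are not taken from the manuscript.

```python
def framewise_displacement(motion, radius_mm=50.0):
    """Frame-wise displacement (mm) for rows of (tx, ty, tz, rx, ry, rz).

    Translations are assumed to be in mm and rotations in radians.
    Rotations are converted to arc length on a sphere of `radius_mm`,
    which is an assumed head radius, not a measured one.
    """
    fd = [0.0]  # the first frame has no predecessor
    for prev, curr in zip(motion, motion[1:]):
        trans = sum(abs(c - p) for p, c in zip(prev[:3], curr[:3]))
        rot = sum(abs(c - p) * radius_mm for p, c in zip(prev[3:], curr[3:]))
        fd.append(trans + rot)
    return fd


def mean_framewise_displacement(motion, radius_mm=50.0):
    """Average FD over all frame-to-frame transitions of one run."""
    fd = framewise_displacement(motion, radius_mm)
    return sum(fd[1:]) / (len(fd) - 1)


# Hypothetical three-frame run, for illustration only.
example_run = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.1, 0.2, 0.0, 0.001, 0.0, 0.0],
]
print(mean_framewise_displacement(example_run))
```

      Reporting one such number per run and participant would show the overall amount of motion directly, which run-averaged traces cannot.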

      12.) We found Fig. S6 quite puzzling too. While we are excited that the described methodology can be used for functional connectivity analyses, there are way too many open questions about the underlying processes and assumptions to just dump it in the supplementary material. This figure feels like a completely new study in itself that is pushed into a single figure caption. We would advise the authors to provide more information about how this figure is generated and/or consider removing it.<br /> How should functional connectivity be interpreted if there is no “resting-state”? Since the task data are used from M1 (dominated by the main effect), does it mean that the seed-timecourse resembles the block-design activation? Then, it refers more to a correlation-analysis of a block-design task than “functional connectivity”. How come three slices are shown, sometimes with target regions, sometimes not? Where is the seed ROI? Does it span across cortical depth?

      13.) We would advise the authors to be a bit more specific about the terminology of “BOLD”. Most readers might interpret this as the conventional GE-BOLD. Maybe the authors can rephrase it to “GRASE-BOLD” or “SE-BOLD”, or something similar? E.g. in the abstract introduction and some figure captions, if they agree?

      14.) Performing pCASL is not trivial at 7T. While the authors circumvent many challenges with the superiorly aligned labeling plane, we feel that a successful replication of the experiment would require a more detailed description of the sequence parameters. What was the pulse shape, what was the pulse duration and inter-pulse interval? Gradient strengths? Does the yellow line width in Fig. 1B represent the bandwidth for a given flow velocity?

      15.) The choice of ROIs. Both finger tapping and flickering checkerboard tasks usually evoke widespread signal changes along the hand knob of M1 and in the visual cortex, respectively. We appreciate that you show the corresponding activation maps in figures 4B and 5B. You also state on page 9 that you manually drew the CSF/GM and GM/WM outlines in the hand knob area of M1. In addition to that, it might be interesting how you chose your slices of interest or how the lateral extent of the ROI was defined. Is the pattern you show in the layer profiles in figure 4E to be expected along the entire length of the hand knob or only in certain segments?<br /> The same holds for the ROIs in the visual cortex for which the CSF/GM and GM/WM outlines were defined automatically.

      Minor comments:<br /> PLD acronym is not introduced<br /> Page 6 reference to fig.6E - should be 5E?

      With kind regards,<br /> Sebastian Dresbach, Omer Faruk Gulban, Renzo Huber

    1. On 2021-06-09 20:58:11, user Jenny Zhang wrote:

      Hi Dr. Tian, my name is Jenny Zhang and I am an undergraduate student studying biomedical research at UCLA. My classmates and I greatly appreciated reading your paper “Enhancing Mask Activity in Dopaminergic Neurons Extends Lifespan in Flies”. Your writing was very clear and the diagrams you provided in Figure 1A and 7A were very helpful in visualizing the experiments. For figure 1A, we were slightly confused by the different colors used for the dots. Providing a key for that aspect of the figure might be helpful in informing readers. Inspired by your diagrams, we think it would be useful to insert a diagram at the start of each set of survivorship curves to show how the experimental question differs in each set. The labelling in Figure 2 was very clear, particularly in the line drawn at 50% and the notation of p-values; we think it would be useful to standardize this format for the following figures to improve cohesiveness. In addition, standardizing the PAM/0273 notation may be useful as well. It may also be beneficial to use bar graphs (with number of days to reach 50% survival on the y-axis) for the survivorship data in figures 3 and 7. This could be an easy way to get the main findings across to readers. Lastly, we had the following lingering questions. What are the flies dying from that is avoided by Mask overexpression? What are the downstream effects of GPCR signalling in DANs? How is Mask interacting molecularly with microtubules? A CoIP and western blot may be useful in determining the proteins that Mask interacts with.

    1. On 2021-06-03 01:02:09, user marci_rosenberg wrote:

      Precise and pervasive phasic bursting in locus coeruleus during maternal behavior<br /> Roman Dvorkin, Stephen D. Shea<br /> Biorxiv, April 1st, 2021<br /> Doi: https://doi.org/10.1101/202...<br /> Reviewed by Eszter Kish and Marci Rosenberg as part of the 2021 UCSF Peer Review minicourse with James Fraser

      Summary<br /> In the central nervous system, noradrenergic signaling has been implicated in a wide variety of functions, including arousal, learning and memory, and, as this paper highlights, maternal behavior. While acute bursting of noradrenergic neurons has been shown to play an important role in goal directed behaviors, the timescale of the relationship between noradrenergic signaling and social (maternal) behavior is unknown since previous studies have relied on a mix of loss-of-function type approaches (e.g. knocking out the enzyme required to synthesize norepinephrine) and temporally imprecise recordings of noradrenergic activity (e.g. measuring release of noradrenaline while an animal engages in behavior). In this paper, the authors overcome these limitations of previous studies on maternal behavior by employing temporally precise recordings of activity of noradrenergic neurons.

      In this article, the authors use a combination of electrophysiology and fiber photometry to evaluate the temporal relationship between firing of noradrenergic neurons in the locus coeruleus (LC-NA) and the stereotyped mouse female social behavior of gathering dispersed pups and bringing them back to the nest. Their major goals are to demonstrate that: 1) there is a phasic LC-NA response closely time-linked to pup retrieval; 2) this LC response is robust over time; 3) this response is not experience-dependent (i.e. present at full-strength upon first retrieval); 4) this response is linked to this specific behavior, and cannot be replicated by other similar types of behaviors (e.g. digging, retrieving a toy mouse, or receiving a food reward); 5) this neuronal response immediately precedes the behavioral output; and 6) LC-NA activity is correlated with locomotion speed only during pup retrieval.

      The authors clearly succeed in providing sufficient data to support most of these conclusions, and the major success of this paper is using multiple orthogonal approaches to demonstrate the same, robust response.

      The major weakness of this paper is a lack of sufficient context and framing, especially in the introduction and discussion. There are also a few technical concerns related to data presentation and statistics. We think these are easily addressable concerns, and ones that will demonstrably strengthen the significance of the paper, especially to a wider audience.

      MAJOR CONCERNS

      Technical:<br /> - Figure 6 uses log firing rates to quantify responses in some panels of the figure, while using z-scores in the other. This is concerning as the authors attempt to note differences in the results acquired by these two distinct techniques (ephys vs. fiber-photometry), with e.g. response of female’s LC to licking/grooming pup. Why not compare z-scores across both? If the authors wish to present data using both outputs, they should provide reasoning for the use of each metric and the benefits and drawbacks of each.<br /> - The authors should report the actual p-values of their statistical tests even if they are ‘significant’.<br /> - T-tests work under the assumption that your data is normally distributed. The authors should either confirm that their data is normally distributed or use a non-parametric statistical test that does not rely on such assumptions.
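
      For the last point, a permutation test is one non-parametric option that avoids the normality assumption. The following is a minimal sketch, assuming two independent groups of measurements; a paired design would instead flip the sign of each within-pair difference. The function name, the number of permutations, and the example values are ours and purely illustrative.

```python
import random


def permutation_test(a, b, n_perm=10000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Group labels are shuffled `n_perm` times; the p-value is the share of
    shuffles whose absolute mean difference is at least the observed one.
    """
    rng = random.Random(seed)
    n_a = len(a)
    observed = abs(sum(a) / n_a - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        # small tolerance so rounding does not exclude exact ties
        if diff >= observed - 1e-12:
            hits += 1
    # add-one correction keeps the estimate strictly above zero
    return (hits + 1) / (n_perm + 1)


# Hypothetical z-scored responses in two conditions.
retrieval = [2.1, 1.8, 2.6, 3.0, 2.2, 1.9]
control = [0.3, -0.2, 0.8, 0.1, 0.5, -0.4]
print(permutation_test(retrieval, control))
```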

      Context/framing:<br /> - The authors comment on a few possible points of significance in this study, including: 1) the link between phasic LC-NA activity and *social* behavior (highlighted in introduction); 2) the timing of phasic LC-NA activity related to behavior (highlighted in discussion); and 3) the uniform response of the LC associated with pup retrieval, which is a possible rebuke to the concept of a sub-specialized LC (highlighted in discussion). To enhance readability, we would encourage the authors to: 1) highlight the same point(s) of significance between introduction and discussion, and 2) spend a few more sentences in the introduction and, especially the discussion, really deliberately laying out ‘why this study matters’ to a generic neuroscientist.

      MINOR CONCERNS

      • The locus coeruleus is a pontine, not midbrain, nucleus
      • What does half-max width of the PSTH mean, conceptually? Why is this a meaningful output measurement? Providing a brief textual description in manuscript or Figure 1 legend would enhance readability.
      • Similarly, we would also welcome a brief textual description/explanation of the reasoning behind and methodological detail relating to the Z-score firing rate and the circular permutation analysis in Figure 2
      • Why are all the means not centered around zero in the z-score scatter plots?
      • Figure 3 looks at change in activity across days but not across individual trials. However, the heatmaps in figure 3 for P0 indicate that there may be some attenuation and temporal shift in the peak of the signal. It would be interesting to note whether this is consistent across animals, as it would indicate that there is indeed change in the LC-NA responses across retrievals, which would contradict the authors’ current conclusions.
      • For the electrophysiology experiments, the authors use each individually recorded unit as an independent sample. While their results are robust, they should potentially consider the use of nested statistics as this would be the proper statistical technique.
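
      One simple and conservative way to respect the nesting of units within animals is to summarise the units of each animal before testing, so that the animal is the unit of analysis. This per-animal aggregation is a simplification and not a full hierarchical model; a mixed-effects model with animal as a random effect would be the more complete alternative. The identifiers and values in this sketch are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def per_animal_means(records):
    """Collapse (animal_id, value) unit-level records to one mean per animal."""
    by_animal = defaultdict(list)
    for animal_id, value in records:
        by_animal[animal_id].append(value)
    return {animal: mean(values) for animal, values in by_animal.items()}


# Hypothetical unit-level responses: several units recorded per animal.
units = [
    ("mouse_1", 2.4), ("mouse_1", 2.9), ("mouse_1", 2.2),
    ("mouse_2", 1.1), ("mouse_2", 1.5),
    ("mouse_3", 3.0),
]
print(per_animal_means(units))
```

      Any subsequent test would then be run on one value per animal, so that animals contributing many units do not dominate the result.
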
    1. On 2021-03-04 13:55:35, user Johannes Franz wrote:

      Dear Tim van Mourik, Peter J. Koopmans, Lauren J. Bains, David G. Norris, Janneke F.M. Jehee,

      Thank you for posting your manuscript as a preprint. We enjoyed reading and discussing it in our layer fMRI journal club (Maastricht University). We would like to provide a few comments compiled from our discussion that we hope will be of use to you.

      The manuscript describes a layer-fMRI study with a spatial attention task. The behavioral protocol follows a long tradition in the psychophysics of spatial attention, and the layer fMRI predictions stem from a well-established literature on the neurophysiology of attentional modulation in visual cortex studied with single units. Thus, we think that the experiment is perfectly suited for applications with layer-fMRI. The acquisition and analysis procedures include cutting edge methodologies and both data and analysis code are claimed to be openly available.

      We believe a large readership will appreciate your investigation of the effect of spatial attention on laminar BOLD activation profiles in an orientation discrimination task, as well as your intention to drive the young field of laminar fMRI towards more thorough reporting of analysis choices and consequences. Furthermore, we are excited about the pipeline being publicly available.

      In this study you show, similar to previous findings, an increase in BOLD response for attended regions, with and without visual stimulation. Yet, unlike previous studies, you did not find an effect of spatial attention across layers.

      We believe the manuscript could be improved along the following points:

      1.) Data are hard to access:<br /> We fully agree with the lead author in his agenda that open sharing of data is mandatory for modern research. We think this is even more essential for replication studies that do not see the same layer-dependent effects compared to previous studies. Only when the data are available can the community employ their own set of tools and expertise to help tease out potential layer-specific attention effects and/or potential reasons for a disagreement between studies.<br /> Given the authors' stated support for open science, and the fact that the manuscript mentioned more than 5 times (in the most prominent places) that all data are openly available, we were surprised how difficult it was to get access to the data. Many of us did not succeed in getting access to the MRI data straightforwardly. After reading IT manuals on how to use webdav.data, setting up our ORCID settings from scratch, and after requesting a temporary Donders account, we succeeded in downloading the data of the single participant that is provided.<br /> The time course data are much easier to access. However, we were disappointed that those data do not refer to MRI data per se, but rather refer to model fits, which are highly processed, and upsampled to a temporal resolution that is three times that of the actual fMRI time series. The manuscript might benefit from adding a few details about the shared time course data.

      2.) Details on data acquisition:<br /> The acquisition of the functional data is described in one single sentence (line 354f). In the interest of reproducibility, we believe this section would benefit from further explanations. <br /> 2a) For example, the application of GRAPPA 8 is rather liberal and unconventional in the field. In fact, some of us first thought it was a typo. Maybe the authors can convince the reader that this is an appropriate choice of acquisition by explaining how this could be achieved (CAIPI = 1/4) and/or reporting basic quality metrics (e.g. tSNR) that allow judgement of the g-factor penalty.<br /> 2b) We were a bit surprised by the application of partial Fourier in both phase encoding directions. We believe that this might be an important piece of information to be reported in the manuscript and might help explain why no high-resolution attention effect was observed. As the MR-physicists in the author list know much better than us, the application of partial Fourier is based on the point-symmetry of the Hermitian k-space. This means that for applications of partial Fourier in both directions, it is not possible to synthesize (recover) the missing outer k-space data that represent the high spatial frequencies. With PF 6/8 for resolutions of 0.827x0.827x0.80mm^3, this results in an effective resolution of 1.15mm in the diagonal direction. Given that V1 has a cortical thickness of at most 2.5 mm, it is perhaps not surprising that the authors failed to observe differences between deep, middle, and superficial cortical layers with this effective spatial resolution.

      3.) Interaction of attention and orientation:<br /> Maybe the manuscript could benefit from including a (supplementary) figure of the behavioral data. What was the effect of the attentional manipulation on orientation discrimination? Were the behavioral effects similar in magnitude to previous studies of spatial attention?

      4.) Units of signal change:<br /> It was not clear to us why the values on the y-axis in Figure 1 and 2 are so small compared to the percent signal change reported in Figure 3? Do the arbitrary units in Figure 1 refer to the same scaling across task conditions and time steps?

      5.) Surprisingly short inter-trial intervals:<br /> We were surprised by the unconventionally short duration of the inter-trial intervals. We wondered whether this timing introduced an HRF-bias that might have confounded the characterization of layer-specific effects. Specifically, it is likely that the shape, and possibly, the linearity, of the HRF varies with cortical depth (Figure 2). Each trial has an average length of 4.7s, followed by a variable inter-trial interval of length 1 to 2.5s. Due to the variable hemodynamic response function across cortical depth (Yacoub 2006, Petridou 2017; full citation attached below), it is expected that the depth-dependent response interacts non-linearly for trials that follow in such quick succession. As such, the accumulating signal in the superficial layers might not return back to baseline as fast as the signal in the deeper layers. In addition to the draining effect, signals might be carried over to the next trial in a depth-dependent way. Specifically, the superficial signal might not only reflect processes across cortical depths from the current trials, but also processes from previous trials while the signal at lower depth could be expected to have less ‘memory’. This layer-dependent bias of non-linear HRF might diminish the attention effect in superficial layers more than in other layers. We feel that this concern could be addressed by additional control experiments with very long inter-trial intervals.

      Yacoub E, Ugurbil K, Harel N. The spatial dependence of the poststimulus undershoot as revealed by high-resolution BOLD- and CBV-weighted fMRI. 2006:634-644. doi:10.1038/sj.jcbfm.9600239

      Petridou N, Siero JCW. Laminar fMRI: What can the time domain tell us? NeuroImage. http://dx.doi.org/10.1016/j.... Published 2019.
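
      To make the carry-over concern in point 5 more concrete, the sketch below asks how much of a response is still present when the next trial begins, for a faster and a slower response shape. It is a deliberately crude illustration: it uses an unnormalised gamma-shaped impulse response, treats the overlap as purely linear (whereas our concern is that the interaction is non-linear), and the shape and scale values are hypothetical stand-ins for deep and superficial depths rather than fitted parameters. Only the trial timing (4.7 s trials, 1 to 2.5 s intervals) is taken from the manuscript.

```python
import math


def gamma_response(t, shape, scale):
    """Unnormalised gamma-shaped response at time t (seconds after onset)."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t / scale)


def residual_fraction(delay_s, shape, scale, dt=0.01, horizon_s=40.0):
    """Response at `delay_s` after onset, relative to its own peak."""
    n_steps = int(horizon_s / dt)
    peak = max(gamma_response(i * dt, shape, scale) for i in range(1, n_steps + 1))
    return gamma_response(delay_s, shape, scale) / peak


# Onset-to-onset delay for the longest interval: 4.7 s trial + 2.5 s ITI.
delay = 4.7 + 2.5

# Hypothetical parameters: the slower response stands in for superficial depths.
faster = residual_fraction(delay, shape=6, scale=1.0)
slower = residual_fraction(delay, shape=6, scale=1.2)
print(faster, slower)
```

      Under these assumptions the slower response retains a larger share of its peak at the onset of the following trial, which is the depth-dependent 'memory' we describe above.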

      6.) The performance of the spatial GLM is unclear:<br /> Figure 3 has a very appealing layout that nicely conveys the relevant information. When comparing Figure 3 (main analysis with spatial GLM) to Figure 3-Figure supplement 4 (analysis with interpolated laminar signal) we noticed that the effect of ascending/draining veins (the slope of the lines) is comparable in both, if not flatter in the latter case, which is counter-intuitive (the spatial GLM should mitigate the impact of the vascular bias from pial vessels). We would be very interested in a discussion of how the spatial GLM is expected to handle potential carry-over effects between trials such as described in Point 5.

      7.) Voxel selections:<br /> We appreciate the additional analyses summarized in Table 1, repeating the analysis including different numbers of vertices. Specifically we wondered whether not using a selection threshold on the vertices of the main experiment but instead purely relying on the ROI definition of the retinotopic localizer would lead to similar conclusions as when imposing an activation threshold. Is there a danger that a statistical activation threshold in the voxel selection could have resulted in the final layer profiles coming from patches of the cortex that are more dominated by ascending and pial veins (blooming)? Could the lack of localization specificity from those veins be responsible for the lack of layer-specific attention effects? In fact, if we could access the data, we would be interested in repeating the analysis and specifically excluding the voxels with the largest responses (which the authors have focused on), as these are the very voxels that are most likely to be contaminated by a vascular bias.

      8.) Failed to replicate or a new research question?<br /> We were a bit surprised about the article type this manuscript is listed as. In previous public communication (e.g. workshops and thesis) with the lead author, the study was phrased in the context of a replication attempt. However, the article type chosen here is “New results”, as opposed to BioRxiv’s other available categories: “Confirmatory Results”, or “Contradictory Results”. <br /> While we believe that either category would be of interest to a large readership, we feel that the manuscript would benefit from an in-depth discussion of previous layer-fMRI studies that could indeed replicate a spatial attention effect in superficial layers. Maybe the authors can use these studies to estimate the expected effect size of the layer-specific attention effect in a power analysis explaining why the study at hand might not have been able to detect such modulations. Example studies are listed below:

      Liu C, Guo F, Qian C, et al. Layer-dependent multiplicative effects of spatial attention on contrast responses in human early visual cortex. Prog Neurobiol. 2020;(July):101897. doi:10.1016/j.pneurobio.2020.101897

      Gau R, Bazin P-L, Trampel R, Turner R, Noppeney U. Resolving multisensory and attentional influences across cortical depth in sensory cortices. Elife. 2020;9:1-26. doi:10.7554/elife.46856

      Hollander G De, Zwaag W Van Der, Qian C, Zhang P. Ultra-high resolution fMRI reveals origins of feedforward and feedback activity within laminae of human ocular dominance columns. Neuroimage. 2020. doi:10.1101/2020.05.19.102186

      Klein BP, Fracasso A, van Dijk JA, Paffen CLE, te Pas SF, Dumoulin SO. Cortical depth dependent population receptive field attraction by spatial attention in human V1. Neuroimage. 2018;176(October 2017):301-312. doi:10.1016/j.neuroimage.2018.04.055

      Lawrence SJD, Norris DG, de Lange FP. Dissociable laminar profiles of concurrent bottom-up and top-down modulation in the human visual cortex. Elife. 2019:1-28. https://doi.org/10.7554/eLi....

      Marquardt, I., De Weerd, P., Schneider, M., Gulban, O. F., Ivanov, D., Wang, Y., & Uludag, K. (2020). Feedback contribution to surface motion perception in the human early visual cortex. ELife, 9, 1–28. https://doi.org/10.7554/eLi...
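
      The power analysis suggested in point 8 could start from a sketch like the following. It approximates the power of a two-sided within-subject comparison with the normal distribution, which somewhat overestimates power for small samples compared with an exact t-based calculation. The standardised effect size used here is a placeholder that would have to be estimated from the studies listed above; only the sample size of 17 comes from the manuscript.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size_d, n, alpha=0.05):
    """Normal-approximation power of a two-sided one-sample or paired test."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size_d * sqrt(n)
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


def min_n_for_power(effect_size_d, target=0.8, alpha=0.05, max_n=10000):
    """Smallest n reaching the target power, or None if max_n is too small."""
    for n in range(2, max_n + 1):
        if approx_power(effect_size_d, n, alpha) >= target:
            return n
    return None


# Placeholder effect size (Cohen's d); not taken from any of the cited studies.
hypothetical_d = 0.5
print(approx_power(hypothetical_d, n=17))
print(min_n_for_power(hypothetical_d))
```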

      9.) How can a large number of participants account for head motion?<br /> Lastly, while we agree that it can be useful to include larger sample sizes for population statistics we fail to follow the reasoning: “For example, at a resolution this high, even the smallest movement of the participant may cause additional blurring of the data, with potentially detrimental effects on the signal-to-noise ratio. For this reason, we collected data from 17 participants”. It could be argued that to reduce the influence of measurement error, high-resolution fMRI experiments should repeatedly sample a small number of subjects. Given the large number of participants, we would be especially interested in a discussion of individual results, in relation to individual motion estimates.

      Stylistic suggestions:

      Line 10: “Directing spatial attention towards a particular stimulus location enhances cortical responses at corresponding regions in the cortex.” -> We would suggest specifying that BOLD responses increase with attention, not necessarily neural responses.

      Line 80: ‘histiological’ -> histological

      Line 356: ‘T2*-weigthed’ -> T2*-weighted

      Line 367: ‘3200 m’ -> 3200 ms

      Figure 3 and supplementary figures -> Could you elaborate on the gray diamonds?

      We would advise the authors to consider changing the color code in all time series figures. E.g. The two types of red and the two types of blue in Figure 1 are indistinguishable. Should the reader infer which line refers to which condition based on the magnitude of the response? If so, it could be mentioned in the caption.

      The two types of red in Figure 2 are hardly distinguishable.

      In Figure 1–Figure supplement 1, the two panels have no description that distinguishes them. We assume one refers to right and one refers to left hemispheres? It is puzzling why the unattended (blue) line in the right panel has a larger response than the attended (red) line. Is it possible that trials are not labeled correctly for one of the hemispheres? Specifically, does the attention label reflect ‘attention to the left’ instead of ‘attention to the contra-lateral side w.r.t. hemisphere'?

      Overall, we find this work presents an important contribution to the field by attempting to replicate a previously observed effect and promoting a replicable pipeline. We hope that our thoughts and comments will be helpful. We are looking forward to seeing this manuscript published.

      With kind regards,<br /> Sebastian Dresbach, Lonike Faes, Johannes Franz, Omer Faruk Gulban, Renzo Huber, Miriam Heynckes, Eli Merriam, Alessandra Pizzuti, Yawen Wang

    1. On 2021-02-04 03:36:46, user Sara Sims wrote:

      Reviewer #3 (Minor Comments):

      P2, ¶3. The authors cite To et al. (2011) for the claim that foveal magnification is greater than peripheral magnification. However, to make this claim, To et al. rely on a number of other citations which would be more appropriate here (¶2 of their introduction). A clear example of this is Horton and Hoyt (1991). Additionally, it might be more appropriate to describe cortical magnification as having units of square-mm/square-degree rather than only mm/degree. <br /> We thank Reviewer 3 for this suggestion. We now cite Horton and Hoyt (1991) and Azzopardi and Cowey (1993) in the third paragraph of the Introduction on Page 3.

      P2, ¶3. Additionally, the final line of this paragraph addresses receptive field size. It might be of interest to review the finding of Harvey and Dumoulin (2011) [10.1523/JNEUROSCI.2572-11.2011], that the product of the pRF size and the cortical magnification factor is approximately constant across human V1 and nearby visual cortex. <br /> We added information on how receptive field size and cortical magnification factor change with increasing eccentricity, and on their approximately constant product across human V1 and nearby visual areas, in the third paragraph of the Introduction on Page 3.

      P4, continued ¶1. In order to understand what the FEF's inclusion in the Dorsal Attention Network means, it might be useful to introduce the Dorsal Attention Network briefly when discussing the DMN and the FPN. <br /> This sentence was reworded for clarity, including removing reference to the Dorsal Attention Network since it was not relevant to the sentence’s main point.

      P4, full ¶1. Given the amount of work that has been done on the fronto-occipital and inferior longitudinal fasciculi, the following sentence should probably include a citation or three. "Major white matter tracts that connect to the occipital lobe such as the inferior fronto-occipital fasciculus (connects occipital lobe to lateral prefrontal cortex) and the inferior longitudinal fasciculus (connects occipital lobe to anterior temporal lobe) have been well documented using tractography methods in humans." <br /> We have added this citation to the text in the introduction: “Major white matter tracts that connect to the occipital lobe such as the inferior fronto-occipital fasciculus (connects occipital lobe to lateral prefrontal cortex) and the inferior longitudinal fasciculus (connects occipital lobe to anterior temporal lobe) have been well documented using tractography methods in humans (Wu et al., 2016).” <br /> Here is the full citation: Wu, Y., Sun, D., Wang, Y., & Wang, Y. (2016). Subcomponents and Connectivity of the Inferior Fronto-Occipital Fasciculus Revealed by Diffusion Spectrum Imaging Fiber Tracking. Frontiers in Neuroanatomy, 10, 88.

      P4, full ¶2. This paragraph is a bit hard to follow and might be improved by breaking it up into shorter sentences. In particular, I'm not 100% sure what the authors mean by "direct and indirect structural connections". Additionally, I'm not sure why the end of this sentence follows from its beginning: "Since functional connectivity between two brain regions could come from both direct and indirect structural connections, we used DWI to examine direct connections between regions (Adachi et al., 2012; Honey et al., 2009) that were previously found to show functional connections." <br /> We have changed the wording of this paragraph to the following:<br /> “The goals of the current study are 1) to assess the reproducibility and generalizability of retinotopic effects on functional connections between V1 and functional networks that were found in prior work (Griffis et al., 2017). We aim to extend these findings in a new dataset collected under different task conditions (previous work used blocks of rest during a task with central fixation and the current data was collected as part of a resting-state only scan). 2) Extend prior work on the retinotopic connectivity difference to structural connections between V1 and functional networks. 3) Examine the relationship between functional and structural connections. Since functional connectivity between two brain regions could be derived from measurable structural connections, we used DWI to examine connections between regions (Adachi et al., 2012; Honey et al., 2009).”

      P4, full ¶3. Again, the concept of a "direct connection" versus an "indirect connection" appears prior to being introduced. Given that this paragraph marks the concept as critical to the point of the paper, the introduction needs to explain what these are. Additionally, it seems that the paper separates the idea of a direct/indirect "structural connection" from that of a direct/indirect "functional connection". This should all be clearer. <br /> In addition to the text added in response to the above comment, the following text has been added to the paragraph referenced in this comment for additional clarification: “the pattern of structural and functional connections is similar, suggesting that this lateral frontal functional connection pattern arises from a direct (uni-synaptic) structural connection.”

      P6, ¶3. "Previous work has shown that cortical anatomy is a reliable predictor of the retinotopic organization of V1 (O. Hinds et al., 2009; O. P. Hinds et al., 2008) so that the more posterior parts of the visual cortex represent more central portions of the visual field." At the risk of splitting hairs, the publications by Oliver Hinds show mainly that the V1 *boundaries* are reliably predicted by anatomy. A better citation for the V1 *retinotopic organization* is Benson et al. (2012) [10.1016/j.cub.2012.09.014], wherein we actually assessed the retinotopic maps and not just the boundaries. <br /> This citation has been added.

      P6, ¶3. "The average eccentricity of each segment was estimated from Benson and colleagues' probabilistic retinotopy template (Benson et al., 2012)..." The correct citation for the retinotopic template is Benson et al. (2014) [10.1371/journal.pcbi.1003538], along with Benson and Winawer (2018) [10.7554/eLife.40224] assuming you are using a recent version of the template, which appears to be the case based on Figure 2 (though given that you are using the FreeSurfer V1 boundary also, I can't really tell). Additionally, it isn't technically correct to call this a probabilistic template (such as might be said correctly of the visual area atlas by Wang et al., 2015). The retinotopic template is more accurately a model of retinotopic organization fit to the average retinotopic organization across many subjects; it does not explicitly express or depend on probabilities. <br /> The wording has been changed to “retinotopic template”.

      P6, ¶3. "These ROIs were defined in the gray matter on the cortical sheet for the freesurfer template, then moved into the individual anatomical space for each participant." I believe that the authors' intent here is to state that ROIs were defined on FreeSurfer's fsaverage brain using the eccentricity of the retinotopic template (which is also defined on the fsaverage brain) then were interpolated over to individual subject cortical surfaces using FreeSurfer's anatomical registration. However, I don't have a good prior for what the "freesurfer template" is here or what the "gray matter on the cortical sheet" of it might be, so this may all be wrong. Perhaps the implication is that the ROIs were hand-drawn in the voxels of the fsaverage subject's "ribbon," but if so, is the interpolation back to the individual subject done on the surface or using FreeSurfer's newish diffeomorphic volumetric alignment? <br /> The following text has been revised to further clarify for the reviewer: “These V1 eccentricity segment ROIs were defined on FreeSurfer's fsaverage brain using the eccentricity of the retinotopic template then were interpolated to individual subject cortical surfaces using FreeSurfer's anatomical registration. To avoid the potential for artifacts due to differences in ROI size when comparing probabilistic tractography results, the number of vertices were kept similar (on the Freesurfer fsaverage brain) between eccentricity segments.”

      P6, ¶3. "To avoid the potential for artifacts due to differences in ROI size, the number of segments per eccentricity region were assigned to more evenly distribute ROI size." Again, this is not at all clear. Earlier text in this paragraph implies that the segments *are* eccentricity regions. Does this sentence indicate that the segments were adjusted in each individual subject to be of a similar size? Or that the ROIs were split into several segments each before interpolation? Is there a material difference between what was done and simply starting with a larger number of segments? It's not clear to me why the process is described in terms of three segments whose eccentricities are reported then redescribed in terms of more segments whose eccentricities are not reported. <br /> We acknowledge that the reporting of the V1 ROI eccentricity segments was unclear. We have simplified the text so that it now reads: “Based on this template, 3 retinotopic regions were identified: central vision (mean eccentricity estimates of 0-2.2 degrees visual angle), mid-peripheral vision (mean eccentricity estimates of 4.1-7.3 degrees visual angle) and far-peripheral vision (mean eccentricity estimates of 14.1-25.5 degrees visual angle) (Figure 2).”

      P7, ¶1. "... voxels within the white matter corresponding to the network ROIs were used as track seeds." I found this initially confusing as immediately prior to this section, "ROI" refers to the ROIs of V1, which should have no truck with the white-matter (i.e., a white-matter voxel predicted to be in an ROI derived from the FreeSurfer's V1 label or the retinotopic template must by definition be erroneous). However, I suspect that this is intended to be about a separate set of network ROIs? This should be clearer. <br /> Yes, there are two sets of ROIs, the V1 ROIs and the Network ROIs. The term “network ROIs” has been changed to “network-ROIs” to emphasize this point further. Also, whenever the term “ROI” is used, the name of the set of ROIs being referred to is now stated.

      P7, Data Analysis. Again, citing the analysis methods is well and good, but this section should make very clear up front which data were collected/analyzed by the authors and which data were collected/analyzed by the HCP. I should be able to easily tell both what analysis steps were performed *and* which set of authors performed each step. <br /> See response to Reviewer #3 Major Comment #2.

      P7, ¶2. "Next, right-to-left and left-to-right acquisitions were concatenated into a single 4D volume for the functional connectivity analysis." While I understand from this sentence that the preprocessed images were transformed into single 4D volume files, I do not follow the significance of "right-to-left" and "left-to-right" in this context. <br /> The text of the article has been changed to clarify this: “Next, both the acquisitions (those collected right-to-left and those collected left-to-right) were concatenated into a single 4D volume for the functional connectivity analysis.”

      P8, ¶4. The text references a "2mm2 Gaussian kernel". Is this supposed to be 2 mm (not squared)? If so, does it refer to the FWHM or to the HWHM or to the parameter σ? It says the "surface maps" were smoothed, but was this done on the FreeSurfer cortical sphere (in which case, mm is a curious unit)? Volumetrically? Something else? <br /> This was a typo; it has been changed to “2mm”, and the text now reads: “Surface maps of the track termination probabilities were smoothed using a 2mm FWHM Gaussian filter and averaged across all subjects.” This was done with the mri_glmfit “fwhm” flag.
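      As a side note on the kernel parameters discussed here: a Gaussian's full width at half maximum (FWHM), half width at half maximum (HWHM), and standard deviation σ are related by FWHM = 2·HWHM = 2·sqrt(2 ln 2)·σ. The following is a minimal Python sketch of that conversion. The function names are illustrative assumptions, and the sketch says nothing about how mri_glmfit implements surface smoothing internally.

```python
import math


def fwhm_to_sigma(fwhm):
    """Convert a Gaussian FWHM to its standard deviation (same units)."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def fwhm_to_hwhm(fwhm):
    """The half width at half maximum is simply half of the FWHM."""
    return fwhm / 2.0


def gaussian_relative_height(distance, sigma):
    """Height of an unnormalised Gaussian at `distance`, relative to its peak."""
    return math.exp(-(distance ** 2) / (2.0 * sigma ** 2))


if __name__ == "__main__":
    fwhm_mm = 2.0  # the kernel width quoted in the response above
    sigma_mm = fwhm_to_sigma(fwhm_mm)
    print(f"FWHM {fwhm_mm} mm -> sigma {sigma_mm:.4f} mm, "
          f"HWHM {fwhm_to_hwhm(fwhm_mm)} mm")
```

      By construction, evaluating the kernel at a distance of one HWHM from its centre gives half of the peak height, which is a quick way to check which parameter a given tool expects.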

      P9, ¶1. More information is needed about the t-tests that were used. Were these tests one-tailed or two-tailed? Corrected for multiple comparisons or not? How was mri_glmfit used to perform these tests? The help-file for mri_glmfit mentions t-tests only in the context that a certain use-case reduces to a t-test in some circumstances. <br /> We have added “two-tailed” to the text. The mri_glmfit function performs a t-test when it is run as a one-sample group mean test with the --osgm flag. We did not correct for multiple comparisons because the analysis was designed around specific, planned comparisons.
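      To illustrate why a one-sample group mean test can serve as a paired t-test: testing whether the mean of the per-subject differences is zero is the same computation as a paired t-test on the two conditions. The Python sketch below shows this for a single vertex. The function name and the connectivity values are hypothetical, made up for illustration, and are not taken from the study. The p-value is omitted because it needs the t distribution (available, for example, through scipy.stats.ttest_rel).

```python
import math
import statistics


def paired_t_statistic(cond_a, cond_b):
    """Return (t, degrees of freedom) for a paired comparison.

    Computed as a one-sample test of the per-subject differences
    against zero: t = mean(d) / (sd(d) / sqrt(n)).
    """
    if len(cond_a) != len(cond_b):
        raise ValueError("both conditions need one value per subject")
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two subjects are required")
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = statistics.fmean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    # Hypothetical per-subject connectivity values at one vertex.
    central = [0.12, 0.15, 0.09, 0.14, 0.11]
    far_peripheral = [0.05, 0.07, 0.06, 0.08, 0.04]
    t, dof = paired_t_statistic(central, far_peripheral)
    print(f"t({dof}) = {t:.3f}")
```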

      P9, Comparison of Functional and Structural Connectivity. Was only one correlation coefficient calculated? Were the authors not interested in these correlations for the non-central V1 regions? It seems irregular that only one of these would be examined given the experimental setup and the hypotheses of the manuscript. <br /> Per the reviewer’s suggestion, we have now included Dice coefficients and added non-central V1 regions to this new analysis.

      Methods, generally. In a couple of places, the authors refer to commands like "mri_vol2surf" (P8, ?1). It would be ideal if the command lines or scripts were also provided with the manuscript. <br /> The code has now been added to the code repository.

      P9, ¶4. "The t-test comparing functional connectivity to different eccentricity segments in V1 revealed significant effects (p<.001) and brain regions belonging to FP, CO, and DMN functional networks (Figure 3)" is the "and" here supposed to be "in"? <br /> This edit has been made.

      P9, ¶4. It's not clear to me how "preference" was evaluated here. For example, "central representing V1 was preferentially connected (over mid-peripheral and far-peripheral V1) to regions associated with the FP network". Was this assessed by visual inspection? A good quantitative metric would be nice to have here, such as the Dice coefficient for each ROI-network pair. <br /> We have added Dice coefficients to the analysis. See Tables 1 & 2.

      P9, ¶4. "Those previous results had also shown differences in connectivity between mid-peripheral-representing regions and far-peripheral representing regions, which were not observed here, (Figure 3)" <br /> This text has been reworded for clarity: “However our results differ in that mid-peripheral-representing regions and far-peripheral representing regions differences were not observed here (Figure 3).”

      P10, Figure 3. "There, vertices in yellow showed stronger (z>3) connectivity to central V1 than to both Far peripheral and mid-peripheral regions." I do not understand the significance of "(z>3)" in this caption. Additionally, what is the significance of the gray color shown on all brains in the bottom row? <br /> Clarification has been added to the Figure legend, including “The grey regions indicate the location of the other networks.”

      P11, ¶1. "... we performed pairwise comparisons of functional connections... Results indicate that ... there are preferential connections between central V1 ..." Again, I'm not clear how preference is being assessed here, or what is being compared pairwise. Pairwise comparisons between segments and networks? What values exactly were compared? If these are referring to visual inspection, that is fine, but the language seems to suggest something more programmatic, and what that might be is not clear. <br /> The text has been clarified to now state “We performed statistical comparisons (t-test) of functional connections between central vs far-peripheral eccentricity segments of V1 and the FPN (Figure 4).”

      P11, Figure 4. Please tell us what exactly is being plotted. What value minus what value? <br /> The values being subtracted have now been added to all figures.

      P14, ¶2. "A comparison between structure and function showed overall agreement, indicating that the functional connections are likely mediated by direct structural connections (Figure 6, right column)." Depending on what the authors mean by "mediate" I'm not sure that this follows. Please elaborate. <br /> We acknowledge that this wording is unclear. We have therefore changed the wording of this statement to the following: “These relationships indicate that the overall pattern of connectivity of central V1 greater than far peripheral V1 is consistent across modalities with an especially high overlap within the FPN.”

      P14, Figure 6. "Far-peripheral and central V1 are statistically different within the FPN..." How was statistical difference within the FPN assessed? <br /> Please refer to the following section:<br /> “Tractography Analysis <br /> To test the hypothesis that patterns of functional connections previously found in V1 (Griffis et al., 2017) are similar to patterns of structural connections, comparisons were made between the central and far-peripheral eccentricity segments of V1 connectivity patterns to the FPN. Differences in track probabilities corresponding to V1 eccentricity segments connections were compared by paired, two-tailed t-test (using Freesurfer’s mri_glmfit with a one sample group mean test).”

      Style/Aesthetic Comments <br /> Throughout the manuscript, starting on P3, full ¶1, there are several mismatched parentheses that are distracting. These typically look like this: "some claim is made here (e.g., (Someone et al., 2010) then continues here". Almost all of these could be fixed by removing the "(e.g. ". That said, the use of "e.g.," makes me think that there are other citations that *should* appear here, but haven't been filled in yet, especially given that many of these are broad statements somewhat outside my particular expertise, such as "The fronto-parietal network (FPN) directs attentional control (e.g., (Zanto & Gazzaley, 2013)".

      P4, L8. "Markov et. al," should be "Markov et al.," <br /> This edit has been made.

      P4, full ¶2-3. The authors mix the style "Something listable: (1) first thing, (2) second thing..." and the style "Something listable: 1) first thing, 2) second thing." <br /> This formatting has been changed.

      P7, ¶1. The acronyms "FP" and "CO" were previously reported as "FPN" and "CON". This needs to be fixed throughout. I get that at times the intention is to represent the deduplication of the word "network," i.e., "the fronto-parietal and default mode network" becomes "the FP and DMN". I think this usage is less clear to readers than "the FPN and DMN" and, besides, the text sometimes says "the FP and DMN networks" (P8¶3L3, P9¶4L4). Alternately, introduce FP et al. as separate acronyms on P7: "Fronto-parietal (FP), cingulo-opercular (CO), and default mode (DM) networks...". <br /> Abbreviations have been edited for consistency.

      Reviewer #3 (Additional data files and statistical comments):

      As mentioned in the Major and Minor comments, most if not all of the statistical tests need to be more explicitly described. I could not currently reproduce the exact tests from the manuscript, even if I had the data.

      Additionally, because the project is a reanalysis of a large dataset, it would be particularly valuable to have the source code used for analysis. It is nearly impossible to reproduce or assess a project like this without such code.

      The code for the analysis has now been added to a repository and it is referenced in the paper.

    2. On 2021-02-04 03:34:40, user Sara Sims wrote:

      Reviewer #2 (General assessment and major comments (Required)):

      In this work, Sims and colleagues use resting-state functional connectivity and diffusion tractography in human connectome project data to examine the connectivity of the central and peripheral aspects of primary visual cortex. They find that central V1 connects more strongly to regions of prefrontal cortex interpreted as the Fronto-parietal network than does peripheral V1.

      The idea that central V1 may be directly connected to control-related networks is an interesting one, and has fascinating implications for the study of top-down modulation of visual cortex function. However, I must say I am somewhat skeptical of these findings, for several reasons. <br /> First, I find the a priori anatomical basis for these proposed connections to be dubious. The authors themselves describe how Markov et al. explicitly conducted tract tracing with central V1 and found connections with posterior frontal and parietal cortex, but nothing with areas classically associated with the fronto-parietal cortex. The authors propose that the inferior fronto-occipital fasciculus may connect V1 with lateral prefrontal regions only in humans. However, they provide no evidence for this suggestion. Indeed, my understanding of the iFOF is that it connects to inferior and lateral occipital cortex (see e.g. figures from the Takemura study cited in this work). Can the authors better support the idea that the iFOF might be the route of connection between V1 and frontal cortex?

      Thank you for your comments. We agree that while the data and methods we present here don’t address whether the iFOF is the route of connection between the inferior and lateral occipital cortex, more evidence from relevant literature would be helpful. The figures from the Takemura et al. (2016) paper show only inferior and lateral occipital cortex and are ambiguous for our regions of interest. However, other papers suggest that the iFOF may be the route of connection between V1 and frontal cortex:

      A paper by Wu and colleagues shows figures indicating that the iFOF does provide a connection between the medial occipital cortex and IFG. We now cite this in the paper: “Major white matter tracts that connect to the occipital lobe such as the inferior fronto-occipital fasciculus (connects occipital lobe to the lateral prefrontal cortex) and the inferior longitudinal fasciculus (connects occipital lobe to anterior temporal lobe) have been well documented using tractography methods in humans (Wu, Sun, Wang, & Wang, 2016).”

      Second, I am concerned that both 1) the Central V1 ROI employed in this work and 2) the inferior frontal cortex region showing strong FC with that Central V1 ROI overlap very closely with regions where we have seen poor BOLD signal in our own fMRI data (I would like to attach a figure if possible). <br /> We are not confident what the source of the poor signal might be in posterior occipital or inferior frontal cortex; we suspect the presence of large veins (possibly the transverse sinus in V1; see Winawer et al., 2010, Journal of Vision). In any case, the data quality is low enough that we believe our data should not be considered to represent actual neural function in those regions. Can the authors demonstrate convincingly that this is not the case in their HCP data?

      The reviewer suggests that, based on their data, posterior occipital and inferior frontal cortex have relatively poor signal. They suggest that this poor signal would result in spurious correlations between the regions because of large veins. As described in our methods section for preprocessing of resting state scan data, white matter and CSF timecourses were regressed out, which aids in removing average venous artifact. Replication across two datasets (HCP and Griffis et al., 2017) and two modalities (DWI and resting state) further indicates the reliability of this effect.

      The Winawer et al., 2010 article cites (Schira, Tyler, Breakspear, & Spehar, 2009) when discussing this issue; that paper suggests that poor signal in these regions may come largely from partial voluming (conflating signal from gray matter with signal from veins), and that these can be managed through increasing resolution with smaller voxel sizes. Our data are collected at resolutions finer than their recommendations, suggesting that such an effect should be minimal in this dataset. We have added the following text to the limitations section to address this comment: “We also acknowledge that large veins near posterior occipital cortex could impact our functional connectivity measurements in this area. However, we performed extensive pre-processing to reduce the impact of vessels on activity. In addition, the voxel size of our resting state scan is small (2mm isotropic), mitigating contributions from nearby veins due to partial voluming effects (Schira et al., 2009).”

      Third, I have an issue with the localization of effects in this paper. The paper describes effects in the fronto-parietal network throughout the manuscript, including the title. How surprising, then, that the strongest effects are not in FP network at all! Figure 4A makes it very clear that the largest effects are in the IFG, which is outside the green outlines describing the extent of the fronto-parietal network, but inside the Default network. <br /> Figure 3A also supports this Default-centric localization, with Central V1 effects in posterior lateral parietal, medial parietal, and superior frontal cortex, all outside FP but inside Default. Since the FC effects are not actually primarily in FP, I see no reason why FP should be used as a mask in Figure 5. Indeed, the authors should show the localization of SC effects throughout the cortex, not just in FP. I also see no reason why these V1-Default connections should be characterized in any way as "attention" or "control".

      We appreciate the reviewer’s comment and have made extensive modifications to the paper in response. The reviewer notes that some vertices of the effect we observed in left frontal cortex are in a portion of the IFG that Yeo et al. (2011) classify not as part of the frontoparietal network but as part of the default mode network. We would like to note that most other papers that define the DMN would not have included the IFG as part of that network, and in fact, Yeo’s 17-network parcellation from the same paper does not classify that portion of cortex as part of the default mode network. The inclusion of that parcel as part of the DMN is likely an artifact of the requirement of the algorithm in that paper to subdivide the brain into 7 discrete networks. However, the set of vertices can be described as being in the inferior frontal cortex, and we have reworked our discussion to de-emphasize the fronto-parietal network.

      This said, we also quantified the similarities between the frontoparietal cortex and the functional connectivity patterns selective for V1, using Dice coefficients. This is now shown in Table 1. <br /> We have described this table within the text as follows: “Table 1 indicates high similarity between central V1 dominant regions and the FPN and partial similarity to portions of the CON and DMN, while the other V1 segments, mid- peripheral and far-peripheral are not strikingly similar to any networks.”

      We have also added the following text to the article in reporting of Figure 4: “This inferior frontal gyrus region aligns well with the anterior portion of the FPN as defined by Yeo, but interestingly, it does expand somewhat beyond that border into the IFG (Inferior frontal gyrus) which is related to attention and control (Baldauf & Desimone, 2014; Chong, Williams, Cunnington, & Mattingley, 2008; Fassbender et al., 2004; Hampshire, Chamberlain, Monti, Duncan, & Owen, 2010; Swick, Ashley, & Turken, 2008, 2011).”

      The reviewer also suggests that localization of structural connectivity effects should be shown throughout the cortex. We have added a figure 5 that shows the effects in our three networks of interest on the same cortical sheet. This figure shows more clearly the delineations of the strong effects. For technical reasons, we cannot perform these analyses on the cortex’s entirety at once: as described in the methods section, probability tracking for each network was calculated separately. Interestingly, however, despite this, the patterns look continuous across the boundary.

      Fourth, I feel that these FC and SC differences are wildly over-interpreted. From the scale, the actual strength of FC and SC between central V1 and lateral parietal cortex is extremely weak (around Z(r) = .1 for FC and p-track = .1 for SC). Under no circumstances would I believe that either of those values represents any sort of real connection. Cortical regions with direct structural connections have much stronger FC values than regions that indirectly influence each other via multi-step connections.

      Functional connectivity magnitudes are always influenced by the preprocessing done to obtain them. In this case we regressed out the mean signal, and regressed out white matter and CSF. While this practice decreases the mean correlation strength (Shirer, Jiang, Price, Ng, & Greicius, 2015; Weissenbacher et al., 2009) it also improves across-subject reliability (Burgess et al., 2016). The debate about this practice, now a decade long, has focused on the interpretability of negative correlations, which we do not do here. All sides of the debate agree that the practice of mean signal regression should not influence relative correlations across brain areas.

      We are looking at variability in connection strength between different portions of a single brain area, and we would expect roughly similar long-range connectivity between different parts of V1. We have incorporated this point into the discussion on page XX where we say “ While central and peripheral representation portions are still part of the same V1 area, and therefore we would expect similarity in their connectivity patterns, our results indicate that eccentricity differences do exist and are consistent with previously reported differences in information processing on central and peripheral visual information.”

      In addition, we added to the limitations section a discussion of this:<br /> “Here, we show functional connectivity strengths on the order of r=0.1. While very reliable, these magnitudes are not as large as connections to other areas, for example, portions of the occipital lobe. Functional connectivity magnitudes are always influenced by the preprocessing done to obtain them. In this case, we regressed out the mean signal and regressed out white matter and CSF. While this practice decreases the mean correlation strength (Shirer et al., 2015; Weissenbacher et al., 2009) it also improves across-subject reliability (Burgess et al., 2016). The debate about this practice, now a decade long, has focused on the interpretability of negative correlations, which we do not do here to examine relative correlations across brain areas.”

      Further, very large portions of the brain probably have both stronger FC and SC to central V1 than these FP regions (the authors show this for FC but exclude this info for SC). <br /> We have included a new figure, now Figure 5, to show the SC patterns across more than just the FPN (it now includes regions within the FPN, DMN, and CON), along with the following text: “Next, we investigated similar comparisons between central and far-peripheral V1 in a different modality- structural connections. A t-test comparing the structural connection of central and far-peripheral V1 revealed significant effects (p<.001) in brain regions belonging to FPN, CON, and DMN functional networks (Figure 5). We chose these three networks to compare to functional connectivity findings from Figure 3. <br /> Notably, central representing V1 was preferentially connected (over far-peripheral V1) to regions associated with the FPN, including the mid orbitofrontal and inferior parietal regions of the FPN, as well as lateral portions of the DMN, and the insular portion of the CON. In contrast, far-peripheral representing V1 was preferentially connected (over central V1) to medial portions of the DMN (Figure 5).”

      Most glaringly, I certainly don't believe there is a "direct structural connection" as is claimed in the discussion--a claim based, strangely, on the spatial correspondence between the structural and functional maps, which really has nothing to do with any evidence for a direct connection. <br /> As stated in the discussion limitations section “structural tractography analysis only identifies direct connections”. <br /> The probabilistic tractography method can only show connections between Region A and Region B. It cannot indicate if there were connections between Region A and Region B that traveled via Region C. Therefore if a connection is indicated by the method, it must be direct. <br /> The statement of a “direct structural connection” is not an interpretation of the correspondence between structural and functional maps, but an interpretation of the structural maps.

      Finally, the authors must note that p values may not be used for spatial correlations between brain maps. This is because these maps are always highly autocorrelated, which violates the independence assumption of the correlation procedure. <br /> We have replaced spatial correlations between brain maps with Dice coefficients, a more field-standard method for comparing spatial maps. We thank the reviewer for the comments and agree that this analysis is a better fit.

      Reviewer #2 (Additional data files and statistical comments):

      The authors should show the data (maps or scatterplots) going into their spatial correlation on page 13. <br /> Based on comments from reviewers, we changed this part of the analysis to Dice coefficients, with the following text: “A Dice Coefficient was calculated for comparison of the functional and structural connectivity differences of central vs far-peripheral V1 to the FPN, CON, and DMN. Across all 3 networks the Dice Coefficient (averaged across left and right hemisphere) between structural and functional connectivity patterns was .707.<br /> Within the FPN the Dice Coefficient (averaged across left and right hemisphere) between structural and functional connectivity patterns was .915. Within the CON the Dice Coefficient (averaged across left and right hemisphere) between structural and functional connectivity patterns was .842. Within the DMN the Dice Coefficient (averaged across left and right hemisphere) between structural and functional connectivity patterns was .85. These relationships indicate that the overall pattern of connectivity of central V1 greater than far peripheral V1 is consistent across modalities with an especially high overlap within the FPN.”
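      For reference, the Dice coefficient between two binary maps A and B is 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical maps). The Python sketch below computes it for two per-vertex masks. It is a generic illustration only: the function name and the toy masks are assumptions, and the thresholding and hemisphere averaging used to obtain the values quoted above are not reproduced here.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap of two equal-length binary masks (e.g. one value per vertex)."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must cover the same vertices")
    a = [bool(v) for v in mask_a]
    b = [bool(v) for v in mask_b]
    size_a, size_b = sum(a), sum(b)
    if size_a + size_b == 0:
        # Assumed convention: two empty maps are treated as identical.
        return 1.0
    overlap = sum(1 for x, y in zip(a, b) if x and y)
    return 2.0 * overlap / (size_a + size_b)


if __name__ == "__main__":
    # Toy masks: 1 marks a vertex where central V1 connectivity exceeds
    # far-peripheral V1 connectivity in each modality (made-up values).
    functional = [1, 1, 0, 0, 1, 0]
    structural = [1, 0, 0, 0, 1, 1]
    print(f"Dice = {dice_coefficient(functional, structural):.3f}")
```

      Because the measure depends only on set overlap, it avoids the independence assumption that made p values for spatial correlations problematic, although it does depend on how each map is thresholded.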

    3. On 2021-02-04 03:33:36, user Sara Sims wrote:

      We would like to thank the Editor and Reviewers for their helpful comments and suggestions. We have responded to them below. We believe the changes made at the behest of the editor and reviewers have greatly improved this paper.

      Reviewer #1 (General assessment and major comments (Required)):

      This manuscript extends on prior work by the authors (Griffis et al, 2017), which originally reported eccentricity-dependent differences in resting state connectivity between V1 and regions brain wide. This study builds on that work by expanding the pool of participants, using the HCP dataset, as well as also investigating any eccentricity-dependent effects that may emerge with tractography. Interestingly, both measures find that foveal areas in V1 are more strongly connected to frontoparietal networks. The study is interesting, and I believe warrants publication. I have a few remaining points.

      1) While during the resting state scans, there was, in theory, no 'task', participants were asked to maintain fixation on the cross in the center of the screen throughout the scan. I think it would be important for the authors to note that there is a possibility that the resting state correlations observed wherein foveal areas were more correlated with frontoparietal regions (and far periphery with DMN areas) could be due to attention directed towards the fixation cross, and away from the periphery. While I acknowledge the authors have no way to test this with this data set, it is possible that if participants had been asked to covertly attend to a ring in their far periphery the entire time instead, the correlations might have been flipped, with frontoparietal connectivity highest in the periphery towards the attended eccentricity. The authors should either explain why this is not a concern, or acknowledge it in the manuscript.

      Explained with point #2. See below.

      2) Related to the last point, what was the size of the screen used during the connectivity data acquisition? I ask because the far eccentricity bands determined using Benson et al's technique are *very* eccentric. And if participants had eyes opened and were fixating, was that eccentricity outside the outer edge of the screen? Because then it would be encouraged to be 'unattended', thereby potentially influencing connectivity results.

      We now acknowledge these concerns (1 & 2) in the limitations section and have added the following text: “It should also be acknowledged that functional connectivity can be influenced by attention (Gratton et al., 2018; Griffis, Elkhetali, Burge, Chen, & Visscher, 2015; Salehi et al., 2020). In both the work by Griffis and colleagues (2017) and the current study’s resting-state scan, a fixation cross was presented on a screen at the end of the bore, and participants were scanned while inside the MRI bore. Participants may, therefore, have been allocating more attention toward the visual space in the center (the screen) than the periphery (the bore). However, the fact that we observed complementary effects in the structural data indicates that these data are likely not due to transient states of attention and are likely to represent biological organization.”

      3) Was there any attempt at replicating these results in extra striate cortex? Are these patterns still there, both in structural and functional connectivity, for V2 or V3?

      Investigation of the extra striate cortex was outside the scope of the present study. However, we acknowledge that this is an important avenue for future research. We have therefore added the following text to the future directions section: “The investigation of connectivity between retinotopic visual areas and functional networks could be expanded to other retinotopically mapped extra-striate cortex in future studies.”

    1. On 2020-12-18 13:49:48, user Karen wrote:

      Beautiful paper! I think there may be some confusion on the VM6 glomerulus. This glomerulus was renamed VC5 in the Bates paper and continued here. The Bates paper noted that there has been confusion on VM6 in the past, presumably due to its poorly defined morphology with nc82 staining. However, the VC5 that Richard Benton has named (aka Ir41a ORNs) is relatively small and corresponds better to VC3m (Li Volkan 2016 refer to it with both names).

      My lab has recently identified a driver that labels a previously unstudied 4th ac1 ORN, and this ORN targets the "classic" VM6, which by morphology, position and size matches the glomerulus you are calling VC5. GCaMP imaging shows that these neurons have a different response pattern than the Ir41a VC5/VC3m ORNs. Several papers studying ORN lineages using MARCM and other clonal analysis have found that the VM6 ORN develops from the same lineage as the three previously known ac1 ORNs, which makes sense since all four are in the same sensillum and presumably come from the same SOP (Endo Hama Nat Neuro 2005, Li Volkan Plos Genetics 2016, Chai Benton Nat Comm 2019).

      For consistency in the literature, I would think the following makes sense based on morphology, clonal analysis, and historical references:

      Or35a ORNs- target either VC3 (Couto 2005, Grabe 2015/2016, Silbering 2011) or VC3l (Fishilevich 2005, Li 2016)

      Ir41a ORNs- target either VC5 (Grabe 2016, Silbering 2011, Li 2016) or VC3m (Li 2016)

      4th ac1 ORNs (our driver- that we can share)- target VM6

      Happy to discuss further if you'd like!

    1. On 2020-12-01 20:43:45, user Guangmei Liu wrote:

      Introduction<br /> We are university students taking an upper-level neurobiology course that centers on understanding neural circuits and modern research techniques through in-depth discussions of recent literature. To fully immerse ourselves in current scientific discourse, we have written this review of the manuscript from Park et al. posted on biorxiv.org (version: November 12, 2020).<br /> Jamie Dela Cruz 1, Angélica Gaona 1, John Axiotakis 1, Guangmei Liu 2<br /> 1 Senior undergraduate in Neurobiology, Boston University. 2 First-year PhD student in Neurobiology, Boston University.

      Summary<br /> There is a growing body of literature examining the effects of social deprivation during the critical developmental period and how it affects later social function. In particular, Park et al. are interested in studying social recognition, or the ability of an animal to distinguish a novel conspecific from a familiar one. To uncover what neural circuits may underlie this, the authors used juvenile social isolation (jSI) and pharmacogenetic manipulation to study the effects of early isolation in mice. They first raised singly housed (SH) mice and group housed (GH) mice. SH mice lived alone for 8 weeks immediately after weaning, whereas GH mice lived together in those 8 weeks. Afterwards, SH and GH mice were re-socialized for 4 weeks. The authors then used a variety of behavioral tests to examine the social behaviors of SH and GH mice. Next, they inhibited nucleus accumbens shell (NAcSh)-projecting IL neurons in GH mice to see if the pathway is required for social recognition. Lastly, to see if the social recognition deficits in SH mice could be reversed, the researchers selectively activated NAcSh-projecting IL neurons in SH mice. They found that jSI impairs social recognition through decreased excitability of the mPFC IL-NAcSh pathway and that pharmacogenetic manipulation of this population also selectively affects social recognition. Therefore, this paper presents a novel brain circuit required for social recognition and adds to the literature implicating the mPFC and NAcSh in early social development. Overall, we recommend that the authors consider different statistical tests for certain figures, as the distribution of their data appears to be bimodal at times. We also suggest that the authors run another cohort of SH and GH mice through the experiments, this time performing tests both before and after resocialization to distinguish between the effects of jSI and resocialization. 
We see an opportunity to provide more evidence for the effects of resocialization by adding a parallel cohort of SH and GH mice who were never resocialized. Additionally, our review discusses portions of the paper where the authors could provide more explanation for certain methods and tweak the figures for improved clarity. <br /> In Figure 1, they investigate what social phenotypes are affected by early social isolation. They used the 3-chamber test to see if either mouse type showed social preference (spending more time exploring a conspecific rather than an object) and social recognition (spending more time interacting with a novel conspecific than a familiar one) (Figure 1C, 1D). There was no significant difference between SH and GH mice in the social preference test, but SH mice did show a significant social recognition deficit. To see if this was caused by a general recognition memory deficit or hippocampus-dependent memory deficit in SH mice, both mouse types underwent the novel object recognition test and the object place recognition test (Figure 1E, 1F). However, there were no significant differences. In Extended Data Figure 1, the researchers looked at whether the SH mice were physiologically or emotionally different from GH mice. Researchers compared the body mass, basal locomotor activity, and anxiety levels between the two, also finding no significant differences.<br /> Extended Data Figure 2 looks at whether different durations of social isolation and resocialization will result in different behavioral phenotypes. First, they decreased the isolation time by singly housing mice for 2 weeks after weaning and resocializing for 4 weeks. In this case, SH mice showed no significant differences in social behaviors compared to GH mice. They then singly housed mice for 8 weeks after weaning and regrouped for 8 weeks to increase the resocialization time. 
Despite this increase, SH mice in this treatment showed the same social recognition deficit as mice in the original SH treatment.<br /> In Figure 2, the authors injected a retrograde virus into the NAcSh for GFP labelling to see what regions of the mPFC were sending the most inputs. Neurons in the ventral mPFC regions were heavily labelled, with the most labelling at the infralimbic cortex (IL), though there were some at the prelimbic cortex (PL) as well (Figure 2A-B). They then used ex vivo brain slice whole-cell patch clamp recordings to measure the excitability of both the IL and the PL, finding that neuronal excitability was reduced in NAcSh-projecting IL neurons but not PL neurons (Figure 2C-D). Extended Data Figure 4 digs into the electrophysiological properties of these IL neurons in both SH and GH mice, finding no significant differences.<br /> Figure 3 answers two main questions, the first being: does this social recognition deficit still appear in SH mice in a different behavioral paradigm? To investigate this, they habituated both SH mice and GH mice to a target mouse on day 1. On day 2, they allowed the SH or GH mouse to explore either an empty cup, a novel conspecific, or the familiar conspecific target. Once again, SH mice explored the novel and familiar conspecifics equally, showing an impairment in social recognition (Figure 3A-C). The second question answered by this figure is: Are the NAcSh-projecting mPFC IL neurons differentially activated by distinct social stimuli (familiar versus novel conspecific)? The researchers used c-Fos immunohistochemistry and eGFP to examine co-labelled neurons in the IL after exposing mice to either a familiar or novel conspecific (Figure 3D-E). 
They found that GH mice had more c-Fos and eGFP co-labelled neurons after interacting with a familiar conspecific than GH mice that interacted with a novel conspecific, suggesting that NAcSh-projecting IL neurons are activated as a result of interacting with familiar conspecifics (Figure 3F).<br /> In Figure 4, Park et al. look at whether the NAcSh-projecting IL neurons are required for social recognition. In GH mice, they injected hM4Di receptors into NAcSh-projecting IL neurons and intraperitoneally injected them with CNO, reducing the excitability of these IL neurons (Figure 4A-B, D). These GH mice then underwent the social preference and social recognition tests (Figure 4C). With their NAcSh-projecting IL neurons inhibited, GH mice showed social recognition deficits similar to those of the SH mice in Figure 1. Extended Data Figure 5 checks whether inhibiting the NAcSh-projecting IL neurons affected the GH mice in other physiological or psychological ways. However, the GH mice showed normal performance in the novel object recognition test, object place recognition test, open field test, elevated plus maze, and forced swim test. Extended Data Figure 6 looks at whether inhibiting these IL neurons affects sociability itself. The researchers found that inhibited GH mice did not distinguish a novel mouse from its cagemate, but this did not affect the reciprocal social interaction with a novel conspecific.<br /> Lastly, Figure 5 answers: Does increasing NAcSh-projecting IL neuronal activity rescue the social recognition deficit in SH mice? To test this, they expressed the hM3Dq receptor in NAcSh-projecting IL neurons within SH mice and injected CNO 40 minutes before the mice underwent the social behavior tests (Figure 5A-D). 
The authors found that social recognition was successfully rescued in these SH mice (Figure 5E-F).<br /> In the conclusion, they tie in their findings with similar ones regarding the hippocampus’s connections to the mPFC and NAcSh and their impact on social memory. They also discuss research about the impact of social isolation on impaired motivation and drug-seeking behavior. They wrap up with a discussion of when they believe the critical period of social recognition is and how their results can contribute to the understanding of disorders like ASD.

      Major Criticisms<br /> In Figure 1, we thought that there were a few places that could use improvement or clarification. First, we would like clarification on why isolation occurred only after weaning and not pre-weaning. Previous studies have isolated mice pre-weaning, and we would like more justification for why post-weaning isolation was done instead. In addition, we feel that 8 weeks of social isolation is too long a period and would like to see additional evidence on why the period could not have been shorter. We also wonder why there was no behavioral testing done before and after resocialization. If there was, we would like to see the data included in the paper. Otherwise, we would suggest that you run the same behavioral experiment on a separate cohort and carry out tests before and after resocialization. The results of the behavioral tests run on unsocialized mice could then be depicted in panels C and D for comparison. Another criticism we would like to note is that the distribution found in panels D, E, and F resembles a bimodal distribution rather than a Gaussian distribution. If possible, we would like to see a different statistical analysis run that better fits the data. The same can be said for Figure 5 panels E and F. Another major issue is the assumption that resocialization is rewarding for the mice. In some instances, one could argue that resocialization is not rewarding, as the mouse could be faced with aggressive counterparts. A measurement of anxiety levels during resocialization would help support your argument, depending on the results. We think that one way you can approach this is by measuring cortisol levels in mice before and after they have been reintroduced. You could also quantify aggression levels before the mouse was reintroduced and once the mouse has been added back into the group. 
Lastly, for panels B, E, and F, we would like a bit more clarification on what characteristics defined a familiar versus a novel object, position, and mouse. For example, we were wondering if the target mouse was age- and sex-matched.<br /> Figure 2 looks at NAcSh-projecting IL neurons in the deep layer of the mPFC. However, we suggest that the authors clarify which layer it is. Additionally, to avoid criticisms about possible discrepancies between the number of cells counted and the slice image, we suggest that the researchers provide a high-magnification image of DAPI staining and eGFP to show that each green dot shows a nucleus.<br /> The social habituation/recognition tasks in Figure 3 were performed after 4-week regrouping. It is a good control to keep all behavioral tests after the 8-week group housing or single housing and 4-week regrouping paradigm. However, to more directly confirm the social deficit in the SH mice, we suggest that the social habituation/recognition tasks also be performed in parallel without regrouping.
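
      To illustrate the kind of analysis requested above for data that do not look Gaussian, here is a minimal Python sketch of an exact two-sample permutation test on the difference of group means. It is only one possible distribution-free option, not the analysis used in the manuscript; the function name and the toy values are hypothetical, and a paired design would need a paired variant instead.

```python
from itertools import combinations


def exact_permutation_p(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means.

    Enumerates every way of splitting the pooled values into groups of the
    original sizes, so it is only practical for small samples.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance guards against floating-point ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical preference-index values for two small groups.
gh_values = [1, 2, 3]
sh_values = [10, 11, 12]
print(exact_permutation_p(gh_values, sh_values))
```

      Because the test makes no assumption about the shape of the distribution, it remains valid when the data are bimodal.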

      Minor Criticisms<br /> In Figure 1 panels C and D, we would like a bit more clarification on where the objects were located in relation to the mouse. The heat maps suggest their locations; however, it is not directly stated in the writing or the figure. In addition, a legend distinguishing between GH and SH in the Social Preference, Social Recognition, Novel Object Recognition, and Object Place Recognition bar graphs would be helpful, as it is only indicated in panel A. We also wondered if the chamber placement of the familiar or novel object/mouse was counterbalanced so that they were not always placed on the same side of the mouse. We speculate that, if no counterbalancing was done, the mouse may have preferred a certain chamber rather than a particular mouse or object. One last thing we would like to see is the exact age range at which the mice participated in the behavioral tests, since it is not made completely clear in the Methods section. <br /> The Results section that discusses Figure 2 begins with an explanation of why the authors focused on mPFC-NAcSh connections in their study. However, we suggest that this be explained in the Introduction instead, since we were left wondering why the authors chose this focus; it was unclear whether there was literature supporting this decision. <br /> In Figure 3 panel B, we think it is an interesting finding that SH mice, like GH mice, showed significantly decreased interaction time with the conspecifics in the habituation session, given that these SH mice would later do poorly in social recognition tests. Thus, it would be better to tell readers that this is an unexpected or interesting finding, and also to propose a hypothesis for this phenomenon. 
For panels E and F, you did a good quantification of neuronal activity in the NAcSh-projecting IL neurons by calculating the percentage of c-Fos and eGFP co-labeled neurons among total eGFP labeled neurons, because the number of NAcSh-projecting IL neurons may change in SH mice compared to GH mice. We would like to know whether the neural circuitry from IL to NAcSh is altered by social isolation. For this purpose, we suggest you inject an equal amount of AAVrg-eGFP-Cre in both GH and SH mice, and quantify the number of eGFP-labeled neurons in them. <br /> For Figure 4, the data is generally well organized and the experimental protocol was summarized well in the schematic. However, for Figure 4D (the comparison between spiking in baseline and in the presence of CNO), it would be valuable to show the effect of the control saline vehicle on spiking for both the eGFP mice and the hM4Di mice, to illustrate that the IP injection or saline itself has no effect, if that is indeed the case. The preference index also proved difficult to read; the way the lines overlapped made the connections between the data points on each bar hard to discern. Perhaps color coding them or using different shapes instead of uniform dots would be beneficial. <br /> Overall, Figure 5 seemed clear and straightforward. However, we would like a bit more clarification as to why 1 mg/kg CNO was injected here even though 3 mg/kg was used previously in Figure 4. We suspect this may be due to greater sensitivity to excitation, but would like to see that confirmed within the literature if it is the case. Another point is that Figures 4B and 5B appear to be inconsistent, and we would like to see a bit more clarification or a comment on why that may be.

      Merits<br /> For Figure 1 panel A, the addition of the schematic was helpful in understanding the timeline of the study for both GH and SH mice. The same can be said for panels B, E, and F. The tests performed on the mice seemed appropriate for furthering the understanding of social recognition and preference. All in all, the authors used every panel in the figure to justify how they came to their conclusions regarding the social preference and recognition tests in GH and SH mice. <br /> Figure 4 is paramount in illustrating the nuanced effect of this circuit on social behavior, namely the impairment of social recognition while normal social preference is retained, which was derived from the clearly reported results of the inhibitory manipulation. This connection is no more salient in the paper than here.<br /> Figure 5 was imperative for our understanding of the rescue of social recognition. Hand in hand with Figure 4, it clearly defines how social recognition is affected by the loss and regain of activity in the NAcSh-projecting IL neurons. In particular, the middle graphs in E and F do an excellent job of highlighting the effect of IL-NAc shell neuron activation in social preference and recognition tests for both GH and SH mice.

      Future Directions <br /> The findings of this paper have important implications for understanding the acquisition of social familiarity. Although the paper notes that future directions include investigation of animal models of ASD, it would also be beneficial to look at animal models of schizophrenia. Specifically, the findings of Piskorowski et al. should be re-evaluated against those of this paper with respect to the critical period. The discrepancy, 2 weeks compared to 11 weeks, seems very large and should be further investigated to better understand why the critical period may be one or the other. Alternatively, it may turn out to be neither, and another time period may better represent the critical period for normal social recognition.<br /> It would also be interesting to look into why the excitability of NAcSh-projecting IL neurons had decreased. Because the neurons in SH mice showed no significant electrophysiological differences from those in GH mice, the decreased excitability is likely a result of morphological changes. Indeed, Silva-Gómez et al. (2003) found that a similar social isolation protocol in rats results in decreased dendritic spine density within the mPFC.<br /> It is interesting and surprising that both GH and SH mice showed significantly decreased interaction time with the conspecifics in the habituation session in Figure 3B, but social recognition was impaired in SH mice only. It might be because the SH mice could not remember the familiar mice and treated them all as novel, which would indicate that the processes of memory consolidation and memory retrieval were impaired in SH mice. Thus, we think it would be interesting to investigate social recognition from the perspective of memory in the future.

      Works Cited:<br /> Silva-Gómez, A.B., Rojas, D., Juárez, I., & Flores, G. (2003). Decreased dendritic spine density on prefrontal cortical and hippocampal pyramidal neurons in postweaning social isolation rats. Brain Research, 983(1), 128-136.

      Piskorowski RA, et al. (2016). Age-Dependent Specific Changes in Area CA2 of the Hippocampus and Social Memory Deficit in a Mouse Model of the 22q11.2 Deletion Syndrome. Neuron 89, 163–176.

    2. On 2020-12-01 15:19:49, user Guangmei Liu wrote:

      Introduction<br /> We are university students taking an upper-level neurobiology course that centers on understanding neural circuits and modern research techniques through in-depth discussions of recent literature. To fully immerse ourselves in current scientific discourse, we have written this review of the manuscript from Park et al. posted on biorxiv.org (version: November 12, 2020).<br /> Jamie Dela Cruz1, Angélica Gaona1, Guangmei Liu2, John Axiotakis1<br /> 1 Undergraduate in Neurobiology, Boston University. 2 First-year PhD student in Neurobiology, Boston University.

      Summary<br /> There is a growing body of literature examining the effects of social deprivation during the critical developmental period and how it affects later social function. In particular, Park et al. are interested in studying social recognition, or the ability of an animal to distinguish a novel conspecific from a familiar one. To uncover what neural circuits may underlie this, the authors used juvenile social isolation (jSI) and pharmacogenetic manipulation to study the effects of early isolation in mice. They first raised singly housed (SH) mice and group housed (GH) mice. SH mice lived alone for 8 weeks immediately after weaning, whereas GH mice lived together in those 8 weeks. Afterwards, SH and GH mice were re-socialized for 4 weeks. The authors then used a variety of behavioral tests to examine the social behaviors of SH and GH mice. Next, they inhibited nucleus accumbens shell (NAcSh)-projecting IL neurons in GH mice to see if the pathway is required for social recognition. Lastly, to see if the social recognition deficits in SH mice could be reversed, the researchers selectively activated NAcSh-projecting IL neurons in SH mice. They found that jSI impairs social recognition through decreased excitability of the mPFC IL-NAcSh pathway and that pharmacogenetic manipulation of this population also selectively affects social recognition. Therefore, this paper presents a novel brain circuit required for social recognition and adds to the literature implicating the mPFC and NAcSh in early social development. Overall, we recommend that the authors consider different statistical tests for certain figures, as the distribution of their data appears to be bimodal at times. We also suggest that the authors run another cohort of SH and GH mice through the experiments, this time performing tests both before and after resocialization to distinguish between the effects of jSI and resocialization. 
We see an opportunity to provide more evidence for the effects of resocialization by adding a parallel cohort of SH and GH mice who were never resocialized. Additionally, our review discusses portions of the paper where the authors could provide more explanation for certain methods and tweak the figures for improved clarity. <br /> In Figure 1, they investigate what social phenotypes are affected by early social isolation. They used the 3-chamber test to see if either mouse type showed social preference (spending more time exploring a conspecific rather than an object) and social recognition (spending more time interacting with a novel conspecific than a familiar one) (Figure 1C, 1D). There was no significant difference between SH and GH mice in the social preference test, but SH mice did show a significant social recognition deficit. To see if this was caused by a general recognition memory deficit or hippocampus-dependent memory deficit in SH mice, both mouse types underwent the novel object recognition test and the object place recognition test (Figure 1E, 1F). However, there were no significant differences. In Extended Data Figure 1, the researchers looked at whether the SH mice were physiologically or emotionally different from GH mice. Researchers compared the body mass, basal locomotor activity, and anxiety levels between the two, also finding no significant differences.<br /> Extended Data Figure 2 looks at whether different durations of social isolation and resocialization will result in different behavioral phenotypes. First, they decreased the isolation time by singly housing mice for 2 weeks after weaning and resocializing for 4 weeks. In this case, SH mice showed no significant differences in social behaviors compared to GH mice. They then singly housed mice for 8 weeks after weaning and regrouped for 8 weeks to increase the resocialization time. 
Despite this increase, SH mice in this treatment showed the same social recognition deficit as mice in the original SH treatment.<br /> In Figure 2, the authors injected a retrograde virus into the NAcSh for GFP labelling to see what regions of the mPFC were sending the most inputs. Neurons in the ventral mPFC regions were heavily labelled, with the most labelling at the infralimbic cortex (IL), though there were some at the prelimbic cortex (PL) as well (Figure 2A-B). They then used ex vivo brain slice whole-cell patch clamp recordings to measure the excitability of both the IL and the PL, finding that neuronal excitability was reduced in NAcSh-projecting IL neurons but not PL neurons (Figure 2C-D). Extended Data Figure 4 digs into the electrophysiological properties of these IL neurons in both SH and GH mice, finding no significant differences.<br /> Figure 3 answers two main questions, the first being: does this social recognition deficit still appear in SH mice in a different behavioral paradigm? To investigate this, they habituated both SH mice and GH mice to a target mouse on day 1. On day 2, they allowed the SH or GH mouse to explore either an empty cup, a novel conspecific, or the familiar conspecific target. Once again, SH mice explored the novel and familiar conspecifics equally, showing an impairment in social recognition (Figure 3A-C). The second question answered by this figure is: Are the NAcSh-projecting mPFC IL neurons differentially activated by distinct social stimuli (familiar versus novel conspecific)? The researchers used c-Fos immunohistochemistry and eGFP to examine co-labelled neurons in the IL after exposing mice to either a familiar or novel conspecific (Figure 3D-E). 
They found that GH mice had more c-Fos and eGFP co-labelled neurons after interacting with a familiar conspecific than GH mice that interacted with a novel conspecific, suggesting that NAcSh-projecting IL neurons are activated as a result of interacting with familiar conspecifics (Figure 3F).<br /> In Figure 4, Park et al. look at whether the NAcSh-projecting IL neurons are required for social recognition. In GH mice, they injected hM4Di receptors into NAcSh-projecting IL neurons and intraperitoneally injected them with CNO, reducing the excitability of these IL neurons (Figure 4A-B, D). These GH mice then underwent the social preference and social recognition tests (Figure 4C). With their NAcSh-projecting IL neurons inhibited, GH mice showed social recognition deficits similar to those of the SH mice in Figure 1. Extended Data Figure 5 checks whether inhibiting the NAcSh-projecting IL neurons affected the GH mice in other physiological or psychological ways. However, the GH mice showed normal performance in the novel object recognition test, object place recognition test, open field test, elevated plus maze, and forced swim test. Extended Data Figure 6 looks at whether inhibiting these IL neurons affects sociability itself. The researchers found that inhibited GH mice did not distinguish a novel mouse from its cagemate, but this did not affect the reciprocal social interaction with a novel conspecific.<br /> Lastly, Figure 5 answers: Does increasing NAcSh-projecting IL neuronal activity rescue the social recognition deficit in SH mice? To test this, they expressed the hM3Dq receptor in NAcSh-projecting IL neurons within SH mice and injected CNO 40 minutes before the mice underwent the social behavior tests (Figure 5A-D). 
The authors found that social recognition was successfully rescued in these SH mice (Figure 5E-F).<br /> In the conclusion, they tie in their findings with similar ones regarding the hippocampus’s connections to the mPFC and NAcSh and their impact on social memory. They also discuss research about the impact of social isolation on impaired motivation and drug-seeking behavior. They wrap up with a discussion of when they believe the critical period of social recognition is and how their results can contribute to the understanding of disorders like ASD.

      Major Criticisms<br /> In Figure 1, we thought that there were a few places that could use improvement or clarification. To go into detail, we would like clarification on why isolation occurred only after weaning and not pre-weaning. Previous literature has been known to isolate mice pre-weaning, and we wanted more justification on why post-weaning isolation was done instead. In addition, we also feel as 8 weeks of social isolation is too long of a period and would like to see additional evidence on why the period could not have been shorter. We also wonder why there was no behavioral testing done before and after resocialization. If there was, we would like to see the data included in the paper. Otherwise, we would suggest that you run the same behavioral experiment on a separate cohort and carry out tests before and after resocialization. Perhaps then results of the behavioral tests run on unsocialized mice can then be depicted in panels C and D for comparison. Another criticism we would like to note is that the distribution found in panels D, E, and F emulates a bimodal distribution instead of a Gaussian distribution. If possible we would like to see a different statistical analysis run that is better fitting of the data. The same can be said for Figure 5 panel E and F. Another major issue noted is the assumption that resocialization is rewarding for the mice. In some instances, one could argue that resocialization is not rewarding as the mouse could be faced with aggressive counterparts. A measurement of anxiety levels during resocialization would help aid in your argument depending on the results. We think that one way you can approach this is by measuring cortisol levels in mice before and after they have been reintroduced. You could also quantify aggression levels before the mouse was reintroduced and once the mouse has been added back into the group. 
Lastly, for panels B, E, and F we would like more clarification on what characteristics distinguished a familiar from a novel object, position, and mouse. For example, we were wondering if the target mouse was matched for age and sex.<br /> Figure 2 looks at NAcSh-projecting IL neurons in the deep layer of the mPFC. However, we suggest that the authors clarify which layer it is. Additionally, to avoid criticisms about possible discrepancies between the number of cells counted and the slice image, we suggest that the researchers provide a high-magnification image of DAPI staining and eGFP to show that each green dot corresponds to a nucleus.<br /> The social habituation/recognition tasks in Figure 3 were performed after a 4-week regrouping. Running all behavioral tests after the 8-week group housing or single housing and 4-week regrouping paradigm is a good control. However, to more directly confirm the social deficit in the SH mice, we suggest that the social habituation/recognition tasks also be performed in parallel without regrouping.
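
As an illustration of the kind of distribution-free analysis requested above for the bimodal panels, the following is a minimal sketch of a two-sided permutation test on the difference in medians. It is not the authors' method; the function name, the number of permutations, and the example values are all hypothetical and chosen only for illustration.

```python
import random
from statistics import median


def permutation_test_median(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation test on the absolute difference in medians.

    Makes no Gaussian assumption, so it remains usable when the data
    look bimodal. Returns an estimated p-value.
    """
    rng = random.Random(seed)
    observed = abs(median(group_a) - median(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        # Randomly reassign group labels and recompute the statistic.
        rng.shuffle(pooled)
        diff = abs(median(pooled[:n_a]) - median(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical, made-up interaction scores for two housing groups.
gh_scores = [1, 2, 3, 4, 5, 6]
sh_scores = [101, 102, 103, 104, 105, 106]
p_value = permutation_test_median(gh_scores, sh_scores)
print(f"permutation p-value: {p_value:.4f}")
```

A rank-based test such as Mann-Whitney U would be another reasonable choice; the permutation form is shown here only because it needs nothing beyond the standard library.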

      Minor Criticisms<br /> In Figure 1 panels C and D, we would like more clarification on where the objects were located in relation to the mouse. The heat maps suggest their locations; however, this is not directly stated in the text or the figure. In addition, a legend distinguishing between GH and SH in the Social Preference, Social Recognition, Novel Object Recognition, and Object Place Recognition bar graphs would be helpful, as it is only indicated in panel A. We also wondered whether the chamber placement of the familiar or novel object/mouse was counterbalanced so that they were not always placed on the same side of the mouse. Without counterbalancing, the mouse may have preferred a certain chamber rather than a particular mouse or object. Finally, we would like to see the exact age range at which the mice underwent the behavioral tests, since it is not made completely clear in the Methods section. <br /> The Results section that discusses Figure 2 begins with an explanation of why the authors focused on mPFC-NAcSh connections in their study. We suggest that this be explained in the Introduction instead, since until that point it was unclear to us whether there was literature supporting this decision. <br /> In Figure 3 panel B, we think it is an interesting finding that SH mice, like GH mice, showed significantly decreased interaction time with the conspecifics in the habituation session, given that these SH mice would later do poorly in social recognition tests. It would be better to point out to readers that this is an unexpected or interesting finding, and to propose a hypothesis for this phenomenon. 
For panels E and F, you did a good quantification of neuronal activity in the NAcSh-projecting IL neurons by calculating the percentage of c-Fos and eGFP co-labeled neurons among total eGFP-labeled neurons, because the number of NAcSh-projecting IL neurons may change in SH mice compared to GH mice. We would like to know whether the neural circuitry from IL to NAcSh is altered by social isolation. For this purpose, we suggest you inject an equal amount of AAVrg-eGFP-Cre in both GH and SH mice, and quantify the number of eGFP-labeled neurons in them. <br /> For Figure 4, the data are generally well organized and the experimental protocol is summarized well in the schematic. However, for Figure 4D (the comparison between spiking at baseline and in the presence of CNO), it would be valuable to show the effect of the control saline vehicle on spiking for both the eGFP mice and the hM4Di mice, to illustrate that the IP injection or saline itself has no effect, if that is indeed the case. The preference index also proved difficult to read; the way the lines overlapped made the connections between the data points on each bar hard to discern. Perhaps color coding them or using different shapes instead of uniform dots would be beneficial. <br /> Overall, Figure 5 seemed clear and straightforward. However, we would like more clarification as to why 1mg/kg CNO was used for vehicle injection even though 3mg/kg was used previously in Figure 4. We suspect this may be due to an increase in sensitivity pertaining to excitation but would like to see that confirmed within the literature if it is the case. Another point is that 4B and 5B appear to be inconsistent, and we would like to see more clarification or a comment on why that may be.

      Merits<br /> For Figure 1 panel A, the addition of the schematic was helpful in understanding the timeline of the study for both GH and SH mice. The same can be said for panels B, E, and F. The tests performed on the mice seemed appropriate for furthering the understanding of social recognition and preference. All in all, the authors used every panel in the figure to justify how they came to their conclusion regarding the social preference and recognition tests in GH and SH mice. <br /> Figure 4 is paramount in illustrating the nuanced effect of this circuit on social behavior: namely, the impairment of social recognition while retaining nominal social preference in mice, which was derived from the clearly reported results of the inhibitory manipulation. This connection is nowhere more salient in the paper than here.<br /> Figure 5 was imperative for our understanding of the rescue of social recognition. Hand in hand with Figure 4, it clearly shows how social recognition is affected by the loss and regain of the NAcSh-projecting IL neurons. In particular, the middle graphs in E and F do an excellent job of highlighting the effect of IL-NAc shell neuron activation in social preference and recognition tests for both GH and SH mice.

      Future Directions <br /> The findings of this paper have important implications for the acquisition of social familiarity. Although the paper notes that future directions include investigating animal models of ASD, it would also be beneficial to look at animal models of schizophrenia. Specifically, we suggest re-evaluating the findings of Piskorowski et al. in comparison to this paper with respect to the critical period. The discrepancy, 2 weeks compared to 11 weeks, seems too large and should be further investigated to better understand why the critical period may be one or the other. Alternatively, it may be neither, and another time period may better represent the critical period for normal social recognition.<br /> It would also be interesting to look into why the excitability of NAcSh-projecting IL neurons had decreased. Because the neurons in SH mice showed no significant electrophysiological differences from those in GH mice, the decreased excitability is likely a result of morphological changes. Indeed, Silva-Gómez et al. (2003) found that a similar social isolation protocol in rats results in decreased dendritic spine density within the mPFC.<br /> It is interesting and surprising that both GH and SH mice showed significantly decreased interaction time with the conspecifics in the habituation session in Figure 3B, but social recognition was impaired in SH mice only. This might be because the SH mice could not remember the familiar mice and treated them all as novel, which would indicate that the processes of memory consolidation and memory retrieval were impaired in SH mice. Thus, we think it would be interesting to investigate social recognition from the perspective of memory in the future.

      Works Cited:<br /> Silva-Gómez, A.B., Rojas, D., Juárez, I., & Flores, G. (2003). Decreased dendritic spine density on prefrontal cortical and hippocampal pyramidal neurons in postweaning social isolation rats. Brain Research, 983(1), 128-136.

      Piskorowski, R.A., et al. (2016). Age-dependent specific changes in area CA2 of the hippocampus and social memory deficit in a mouse model of the 22q11.2 deletion syndrome. Neuron, 89, 163-176.

    1. On 2020-03-31 15:20:48, user Søren Grubb wrote:

      Dear David A. Hartmann and the Shih lab,

      I want to congratulate you on a beautiful work. I know how thorough and careful you are, and this subject of whether pericytes are contractile or not is very important – and controversial. So, I read your paper with great interest, and I am very surprised to see that you observe capillary lumen decrease when optogenetically stimulating true pericytes. I have a lot of questions and comments, so I hope you don’t mind!

      I agree with your finding (or confirmation) that smooth muscle actin expression drops after (up to) the 4th order capillary, as we have also found in our recent Nature Communications paper: https://www.nature.com/arti.... You cite the Alarcon-Martinez et al. paper to claim that there may be SMA that is not detected. I would like to make you aware that in that paper they use a different numbering for the blood vessels, which means our 1st order capillary would correspond to their 3rd order blood vessel. This means that when they improve their SMA stainings from 4th to 6th order vessels using their fixation procedures, it would correspond to 2nd to 4th order capillaries with our numbering. So, my conclusion is that retina might be more difficult to fixate than brain, and we do not necessarily miss out on any SMA. But I may be wrong, I often am.

      I see that you have mentioned a “sphincter” in supplementary movie 5. I am not sure what your purpose of mentioning that sphincter is; I just want to let you know that that is not a typical precapillary sphincter. A typical precapillary sphincter and bulb are visible just next to the penetrating arteriole around 21 seconds into your Supplementary movie 7, where the tissue moves a little bit in the z-direction. For more info on precapillary sphincters see our paper: https://www.nature.com/arti...

      For our paper we adopted your nomenclature for ensheathing pericytes on the first order capillary (which you call precapillary) because I feel that is a good description. However, your drawing of ensheathing pericytes indicated to me that they were a continuous sheet surrounding the capillary, which I found confusing until I did the confocal microscopy myself and saw that it was just because the pericyte processes are so tightly positioned around the capillary that they look like a continuous structure. I have tried to draw detailed ensheathing pericytes on Figure 1e in our NatComms paper (see link above). Maybe we all should make an effort to make more precise drawings of the mural cell morphology, like Zimmermann did beautifully in his paper: https://link.springer.com/a....

      I would also like to make you aware that the smooth muscle “hybrids”, as we have called the smooth muscle cells on by far most of the penetrating arterioles (with average lumen diameter around 12µm), often have a slightly bulbous nucleus and 2-3 processes in each direction around the arteriole. So, they look significantly different from the “true” smooth muscle cells that exist on larger arterioles and arteries, which have a spindle shape and an elongated nucleus.

      I have never understood the reason that only the true pericyte soma should be contractile, so I’m glad to see that you address that on page 3 line 25-29. My concern has been, that because the pericyte soma protrudes it can somehow push close the capillary lumen at that exact spot if the tissue around it swells or if the capillary is somehow pulled in direction of the arteriole (by strong arteriole contraction) – and thereby false positively be interpreted as contraction.

      You write that the localized two-photon optical manipulations “disentangles their local influence on capillary diameter from the influence of flow in upstream vessels”, but pericytes have gap junctions that connect them to endothelial cells, so how can you be sure if you only stimulate and observe locally? Have you tried to uncouple them with gap junction blockers?

      If one capillary branch has increased RBC flux by optogenetic stimulation, does that mean the other branch has decreased flux (indicating a local effect) or does the other branch also have increased flux (indicating an upstream effect)?

      Have you ever seen any indication that the blebbing you see also happens on the luminal side of the pericyte, which could push the endothelial cell towards the lumen? Are these “contractions” and blebbing by depolarization pericyte specific or would you find similar blebbing in other cells you depolarize optogenetically, for example astrocytes? Do you think the blebbing is caused by an increase in intracellular Ca2+?

      You write that the blebbing might be caused by “excessive mechanical tension” when pericytes are stimulated to “supraphysiologic levels”. If the stimulation you use is not physiologically relevant but necessary for effect (in contrast to the Hill paper, that saw no effect), under what conditions do you think true pericytes contract?

      If the actin cytoskeleton should be able to create a force, I guess most of the actin cytoskeleton should be organized to pull in the same direction. Have you seen any indication of that?

      The idea to ablate bridging pericytes is very elegant. When you ablate a pericyte, I assume it will go into apoptosis and the first thing it does is retract its processes and round up. Could the increased capillary lumen diameter be explained by the extra volume around the endothelial cell that a retraction of the pericyte processes leaves behind?

      Thanks for a really interesting paper, it was a pleasure reading it. I really hope you find time to answer my questions.

      Best regards

      Søren Grubb

      Lauritzen lab, Department of Neuroscience, University of Copenhagen

    1. On 2019-12-17 16:36:21, user Johan S. Martinez-Fuentes wrote:

      NE 598 Group 3<br /> Introduction: We are university students enrolled in a course focused on understanding neural circuits, including factors important for their development and control of animal physiology. In an effort to promote constructive discourse of current research in this field, and to gain experience in the process of peer-review, we provide the following critique of the currently unpublished manuscript from Wallace et al. posted on biorxiv.org (version: July 25, 2019).

      Summary: There has been a growing appreciation for the role microglia play in regulating synaptic connectivity during brain development; however, how microglia regulate the circuit integration of neurons during neurogenesis in the healthy adult brain remains unclear. Wallace et al. focus on the effects of microglia on adult-born granule cells (abGCs) as part of the mechanisms underlying a previously reported increase in activity of principal neurons of the olfactory bulb (OB) after microglia ablation. Their general approach consists of combining genetic labeling methods with in vivo live-cell imaging of microglia and abGCs (both constitutive labeling and a GCaMP indicator of activity-related calcium influx) under conditions of odor presentation and microglial ablation using the CSF-1R antagonist PLX-5622. Overall, the authors found evidence for specific microglial interaction with abGC spines, that the population-level dendritic GCaMP response of OB abGCs was significantly decreased, and that the excitatory input into the abGCs was selectively decreased with no change to their inhibitory input. These results further support the notion that microglia play an important role in sculpting the circuit connections of nascent/developing neurons in the context of adult neurogenesis, with new descriptions of potential molecular mechanisms that may be at play. Overall, we recommend more consistency with respect to experimental time courses to strengthen the overall conclusions, more consistent definitions of threshold values for the classification of evoked responses, and clearly articulated cohort numbers and ages. We recommend improving the labeling of figures by defining the control and experimental groups using keys, and the sizes of the two groups should be more balanced. Further, we recommend consistency between written text, legends and the figures themselves, particularly in cases where the number of odorants stated and displayed do not match. 
The authors may elaborate on these points in-text for improved understanding of their findings.<br /> In Figure 1, to explore the nature of microglia interactions with abGCs, the authors employ viral-genetic labeling to target both cell populations and examine them under in vivo two-photon imaging. The authors confirmed the highly motile nature of microglial processes, and microglial interactions with abGC "mushroom" and "filopodial" spines were quantified by spatial overlap of the cell markers. While overlap of both types of spines with microglial processes was not significantly greater than expected by chance ("offset" image analysis), there was an increase in the number of microglial interactions with mushroom spines, with an approximately two-fold increase in interaction time over that expected by chance. This was not seen with microglial interactions with filopodia, thus showing a preference for mushroom/potentially active spines.<br /> In an effort to investigate how microglia ablation affects odor-evoked responses of abGCs, Wallace et al. used two-photon imaging to observe abGC calcium activity over the entire time course of abGC development in anesthetized mice on PLX5622 (PLX) chow. Compared to controls, abGCs in PLX-treated mice were less responsive to odors, as quantified in 2F by a cumulative distribution plot. Additionally, Figure 2H features a raincloud plot that quantifies a decrease in the median lifetime sparseness of the abGC dendrites in the OB of PLX mice. Moreover, Figure 2I quantifies median response amplitude across all dendrites, showing a significant decrease in median amplitude across dendrites of treated anesthetized mice. These results suggest a decrease in the dendrites’ temporal selectivity and likely reflect the developing abGCs’ decreased odor responsiveness. Another set of experiments testing these effects in awake mice was performed (Figure 3). 
The cumulative distribution of dendrite responses in Figure 3B affirms suppressed calcium transients under odor exposure in PLX-treated awake mice. There were no significant decreases in the median number of responsive odors (Fig. 3C) or in the median lifetime sparseness of dendrites.<br /> In Figure 4 the investigators explore whether effects of microglial ablation were specific to developing versus mature abGCs. After following the experimental protocol shown in Figure 1a, the cumulative distribution of the responses is unchanged after PLX administration, and noise is not significant (Figure 4c). The cumulative distribution of the number of effective odors (exceeding an ROC threshold of 0.53) is also shown not to differ significantly (Figure s4d). Finally, this figure also includes a raincloud plot of lifetime sparseness, with control and PLX groups largely overlapping, and the kernel density estimates with box plots underneath showed no significant differences (Figure 4d). Thus, the ablation of microglia did not significantly change the evoked responses of developed abGCs, highlighting the importance of microglia during abGC development.<br /> In contrast, in Figure 4.1 there is no administration of PLX chow. The abGCs are imaged twice, at 3 months post-injection and three weeks later, alongside control group imaging (Figure 4.1a). Dendrite-odor pair response comparisons between the Before 1 and Before 2 images, as seen in the timeline, garnered similar results, with an R2 value of 0.73 (Figure 4.1b). The distribution plots also show no significant difference between groups (Figure 4.1c-e). Overall, this suggests abGCs do not display significant differences in their responses three weeks after the injection. Similar results are shown in Figure 4.2, with imaging 9 weeks after the first imaging set (Figure 4.2a).<br /> In Figure 5, possible PLX-mediated structural changes to abGC spines in the EPL were assessed by quantification of spine number and volume in two-photon acquired images. 
The authors measured spine density per abGC after four weeks with or without drug treatment during abGC development and found no significant difference resulting from microglial ablation. When considering the total population of spine volumes, spines in the PLX-treated condition were significantly smaller than those in control. However, this effect was not observed when cell-averaged spine volumes were compared between conditions.<br /> In Figure 6, the authors looked at electrophysiological correlates of the previously observed spine head sizes during abGC development. To do this they simultaneously recorded in vitro spontaneous excitatory postsynaptic currents (sEPSCs) using patch clamping and in vivo imaging (Figure 6a). They report no differences in the frequency of sEPSCs between control and PLX-treated mice, but observed reduced amplitude (Figure 6c-d). They then report that membrane properties were the same across all mice, control and treated (Supplemental 1). To test potential changes to spontaneous inhibitory postsynaptic currents (IPSCs), the authors repeated the same experiment recording IPSCs and found no difference between the control mice and the PLX-treated mice (Figure 6e-g). These results show that the changes in abGC functional responses are due to weaker excitatory inputs. Using the timeline in Fig. 4, the authors also tested electrophysiological effects of microglial ablation on matured abGCs (Fig. 7), and found that ablation after development has no effect on synaptic input, either excitatory or inhibitory.
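
For readers unfamiliar with the lifetime sparseness measure referred to above, here is a minimal sketch of one commonly used definition, S = (1 - (mean r)^2 / mean(r^2)) / (1 - 1/N), computed over one dendrite's responses to N odors. Whether the manuscript uses exactly this form is an assumption on our part, as are the rectification of negative responses and the function name.

```python
def lifetime_sparseness(responses):
    """Lifetime sparseness of one dendrite across N odors.

    Returns 0.0 when all odors evoke the same response and 1.0 when
    only a single odor evokes a response.
    """
    n = len(responses)
    if n < 2:
        raise ValueError("need responses to at least two odors")
    # Assumption: negative (suppressed) responses are rectified to zero.
    rectified = [max(r, 0.0) for r in responses]
    mean_of_squares = sum(r * r for r in rectified) / n
    if mean_of_squares == 0.0:
        # No response to any odor; treated here as non-selective.
        return 0.0
    square_of_mean = (sum(rectified) / n) ** 2
    return (1.0 - square_of_mean / mean_of_squares) / (1.0 - 1.0 / n)


# Hypothetical dendrites: one broadly tuned, one responding to a single odor.
print(lifetime_sparseness([1.0, 1.0, 1.0, 1.0]))
print(lifetime_sparseness([1.0, 0.0, 0.0, 0.0]))
```

Under this definition, a lower median sparseness in PLX-treated mice would mean that dendrites respond more uniformly across the odor panel.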

      Major Issues: We believe the ages of the mice used in this study are not tracked explicitly enough, which may potentially affect the significance of the findings. The authors list the age of mice used as 8-12 weeks from the beginning of experiments. This may be too large a range given the lack of knowledge in the field regarding how factors regulating neurogenesis change with age (Kase et al., 2019). One suggestion is to explicitly list the n associated with each age of mouse used, and perhaps in supplementary figures color code quantified data points by age to show how measures may or may not differ. Figure 1 uses a low number of mice (n=3), so it is unclear whether the significantly increased time of microglia-abGC interactions may be more related to earlier or later ages at this adult brain stage. The issue here may be summarized by the question: do developmentally perturbed abGCs recover activity after 12+ weeks? We invite the authors to consider addressing this timing discrepancy.<br /> In Figure 2, the timeline for development of abGCs could be improved. Because there doesn't seem to be a fixed time point to anchor the data set, we are unsure to what extent the authors can be confident in their comparisons. Moreover, there is no mention of the timeline that the authors used in Figure 3. Additionally, the cumulative distribution plots used in Figures 2 and 3 do not adequately show the discrepancy between the PLX and control groups. We suggest using another form of statistical analysis to depict the disparities between these two groups more effectively (e.g., a histogram depicting counts per bin of response level). There are some more substantial criticisms to be made of Figure 4, even though it is useful and well done. In the figure legend, 16 compounds are discussed; however, the figure itself only shows the combination of the compound, heatmap, and trace for 15 substances. 
Furthermore, while each set of experiments looks at a different aspect of the effects of microglial ablation, the different timelines used over the course of the study, and the changes to them seen in Figure 4 specifically, can be problematic when trying to draw conclusions about the findings of the paper as a whole. Additionally, the age of the animals themselves is not mentioned. Furthermore, the ROC threshold for treating evoked responses as effective is inconsistent between the primary figure, where it is listed as 0.53, supplementary Figure 4.1, where it is listed as 0.39, and supplementary Figure 4.2, where it is listed as 0.78. The supplementary figures and experiments were useful in their own right; however, changing the threshold values between analogous sets of experiments is problematic when trying to consider the outcomes together, since all of them use the ROC threshold value in the same way. <br /> There are two main issues to address in Figure 5. One is the abrupt change in the timing of lentiviral labeling of abGCs and PLX feeding. Here, the two were simultaneous, such that experimental migrating abGCs are expected to interact with microglia not present in other developmental ablation experiments. This particular timing would make the experimental condition more similar to control, where microglia are intact. Thus, the synaptic findings in Figure 5 are not strictly transferable to the functional deficits seen in Figure 2. This also means that the authors may expect a more robust synaptic phenotype if they revert to the experimental timeline used in Figure 2. The second issue is the oversampling in the experimental condition: an average of ~61 spines were sampled from each control abGC, and ~101 spines from each PLX-treated abGC. 
The authors may consider quantifying more control spine volumes to make a more balanced/fair comparison.<br /> A significant issue with Figure 7 is that the authors use an experimental timeline different from that of Figure 4, in which the time from lentiviral labeling is shortened by one month, but the reasoning behind this change is not explained. Besides the change in timeline, the recordings are completed a month after injection, so the difference in abGC age resulting from the shorter experimental timeline makes it unclear what broader conclusions can be drawn.
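
To make the concern about the inconsistent ROC thresholds concrete, the sketch below computes an area under the ROC curve (auROC) for each odor against baseline and then counts "effective" odors under the three thresholds quoted above (0.39, 0.53, 0.78). We assume the thresholds are applied to an auROC of this kind; the helper names and the trial values are hypothetical and are not taken from the manuscript.

```python
def auroc(baseline, evoked):
    """Probability that an evoked trial exceeds a baseline trial.

    Equivalent to the area under the ROC curve; ties count as one half.
    """
    wins = 0.0
    for e in evoked:
        for b in baseline:
            if e > b:
                wins += 1.0
            elif e == b:
                wins += 0.5
    return wins / (len(evoked) * len(baseline))


def effective_odors(trials_by_odor, baseline, threshold):
    """Names of odors whose auROC against baseline exceeds the threshold."""
    return [odor for odor, evoked in trials_by_odor.items()
            if auroc(baseline, evoked) > threshold]


# Made-up dF/F values for one dendrite (illustration only).
baseline = [0.0, 0.1, 0.2, 0.3]
trials = {
    "odor_A": [0.05, 0.15, 0.25, 0.35],  # weak response
    "odor_B": [0.5, 0.6, 0.7, 0.8],      # strong response
    "odor_C": [0.0, 0.1, 0.2, 0.2],      # near baseline
}
for threshold in (0.39, 0.53, 0.78):
    print(threshold, effective_odors(trials, baseline, threshold))
```

With identical data, the number of odors classified as effective changes with the threshold, which is why comparing counts across figures that use different thresholds is difficult.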

      Minor Issues: In Figure 1B, the insets showing percent coverage are insightful for understanding microglial-abGC interaction dynamics; however, they lack x-axis labeling, which affects ease of reading. We suggest either moving all insets to their own panel with explicit time labeling, or making the x-axis reference clearer in the sub-panels. Regarding spine selection, it may be important to address how/whether other spines not well described by the two classifications were considered (i.e., were stubby and cup-shaped spines considered/observed?). It would also be interesting to see whether there were any differences across the quantified measures as a function of time (1-4 weeks post-injection).<br /> It would be interesting if the authors addressed their rationale for picking the odors used in Figures 2 and 3. Panels 2D and 3A would benefit from providing the common names of the scents corresponding to each odor. Figure 3 in general could also be improved by distinguishing the PLX and control groups more effectively. This could be accomplished by adding clearer labels on each of the figure insets. We would also suggest increasing the overall number of experimental mice for this particular experiment to see whether the data currently trending towards significance can be bolstered above threshold. <br /> While Figure 4 is overall quite well done, there are some minor errors and possible areas of improvement. In the timeline in panel 4a, it would have been useful to label the Before (control) as a control imaging session, because at first glance it is not entirely clear that the control is the same mice imaged twice rather than another group of mice. Given that a timeline is used (which was a good idea), marking the first and second imaging sessions directly on it could be useful. It may be helpful to the reader if the names of the odorant compounds were included. 
Furthermore, while one can eventually piece together that the control group is in purple and the PLX group is in orange, one cannot do so from the figure alone. In Figure 4.2d there is a key indicating that these groups are specified by these colors, but this cannot be readily determined in the primary figure; while a small thing to fix, this is integral to comprehending the results accurately. Furthermore, only three mice are used in the primary experiment. It may have been useful to include more mice in these experiments.<br /> In reporting the results for Figure 5, it is not intuitive why cell-averaged spine volume does not differ significantly between control and experimental conditions, while the opposite holds when analyzing individual spine populations. A short description to reconcile this conflicting finding is needed. We suspect this suggests that a relatively small population of PLX-treated abGCs harbors most of the spine volume changes. Furthermore, it is unclear in the discussion how well the authors may speak to a trend toward increased spine density when there seem to be two data points that may be driving much of the PLX population average.<br /> In Figure 6C-G, the panels all compare control to PLX-treated mice. These graphs all have two differently colored sets of data, and 6A shows what these two colors correspond to. However, the rest of the figure does not clarify which set of data corresponds to which color. We suggest the authors indicate which data set is for which condition on each of the graphs, or add a legend near these graphs, to make this clearer to readers.
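
The Figure 5 discrepancy between pooled-spine and cell-averaged analyses can arise whenever cells contribute unequal numbers of spines. The sketch below uses made-up spine volumes, with hypothetical helper names, to show how one heavily sampled cell with small spines can pull the pooled mean down while the cell-averaged mean moves much less. It illustrates the general effect only and says nothing about the authors' actual data.

```python
from statistics import mean


def pooled_mean(cells):
    """Mean over every spine, ignoring which cell each spine came from."""
    return mean(volume for cell in cells for volume in cell)


def cell_averaged_mean(cells):
    """Mean of per-cell means, so each cell counts once."""
    return mean(mean(cell) for cell in cells)


# Made-up spine volumes (arbitrary units); each inner list is one abGC.
control_cells = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
# One treated cell is sampled more heavily and has smaller spines.
plx_cells = [[1.0, 1.0], [1.0, 1.0], [0.4] * 6]

print("pooled:", pooled_mean(control_cells), pooled_mean(plx_cells))
print("cell-averaged:", cell_averaged_mean(control_cells),
      cell_averaged_mean(plx_cells))
```

Because spines from the same cell are not independent observations, reporting the per-cell analysis alongside the pooled one, or using a model with cell as a grouping factor, makes it easier to judge how many cells drive the effect.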

      Merits: In Figure 1, the authors highlight evidence of cellular interactions that provides proper motivation for examining the effects that microglia may have on abGC functional development. The data acquisition and method of analysis are generally well described in their respective sections, and the conservative approach to quantifying microglia-spine interactions lends confidence to the data. The comparison of real data to its offset counterpart across many quantified measures is also a clever way to argue for preferential microglial interaction with mushroom spines.<br /> Figure 2 provides excellent histological confirmation of microglial ablation. In Figures 2 and 3, the authors showed the processed data for the GCaMP6s traces in panels 3A, 2D, and 2E in an easily interpretable manner. Moreover, the decision to use a raincloud plot for panel 2H and a bar graph in 2F showed significance more effectively than the cumulative distribution plots. <br /> There are several parts of the experiments associated with Figure 4 that are highly useful. The timeline makes the setup of the experiment much easier to understand and creates a visual image that is easier to comprehend than the worded explanation. Furthermore, it is useful that the experimenters have chosen to include the actual chemical structures of the compounds used in the experiments. The raincloud plot shows very directly how the data compare between the groups. The kernel plot gives a sense of the individual data points, and the box plots give important information on statistical measures of the data. Additionally, including an experiment on whether developed abGCs are affected by the ablation of microglia strengthens the overall argument and credibility of the paper as a whole. 
Finally, including a supplemental section with experiments that examined a longer time post-injection in comparison to the three-month mark (Figure 4.1) and an extended period of PLX administration (Figure 4.2) was also very useful in bolstering the results.<br /> Figure 5 is a valuable addition to the article as it brings a cell biological mechanism into discussion for the observed functional phenotypes in microglial ablation. We commend the authors for reporting different single-spine and single-cell perspectives of analysis on the same data set in Fig. 5D even though the two analyses lay out a complex and seemingly conflicting picture. Combined with the rigor in the authors’ approach, this motivates the reader to ponder future experiments to explain the data.

      Future Directions: With respect to Figure 1, future experiments may further partition the subtypes of mushroom spines with which microglia interact based on different post-synaptic markers. For instance, microglia may preferentially interact with spines expressing certain receptors. It is also unclear how activity in the olfactory bulb may direct microglial interactions with abGC spines. Increasing olfactory activity in mice by housing them in an environment with prolonged exposure to stimulatory odors, and subsequently tracking microglial interactions, may result in a more robust phenotype and better reveal microglial attraction to certain spines.<br /> Concerning Figures 2 and 3, we suggest that future experiments should attempt to refine the timeline by providing a fixed time point for the development period of abGCs. We also suggest that more experimental mice be added to the cohort in Figure 3 to probe the validity of the non-significance of their statistical analyses. <br /> As in many other sets of experiments seen in this paper, in Figure 4 a number of different odorants were used to evoke responses. We think it would be interesting to take a closer look at the chemical composition of the compounds and at their differential effects on the responses of the abGCs. Additionally, regarding the fourth set of experiments in particular, it may have been interesting to set the before period even further along in the life of the microglia. Microglia live for a couple of years in mice, so it may be interesting to look at the effects of PLX administration on microglia that are not only fully mature but also reaching the end of their life.<br /> An intriguing idea stemming from the data in Figure 5 is that there is a subpopulation of abGCs that is particularly susceptible to microglia-dependent spine volume enlargement. 
Given the relatively low number of abGCs sampled per group, this may be a rather large subpopulation, perhaps representing dedicated GCs or periglomerular cells, both of which should be labeled indiscriminately here. Thus, using a cell-type resource of the olfactory bulb, such as that created by Tepe et al. (2018), to find a lead for molecular markers to target susceptible adult-born neuron subpopulations may advance our understanding of the phenotypes reported here.<br /> With regard to future experiments following from the EPSC experiments (Figures 6, 7), it may be interesting to investigate potential changes in mini-EPSCs or -IPSCs to flesh out a fuller picture of the state of synaptic activity. The approach would effectively be the same, only with acute introduction of tetrodotoxin at the site of recording. These minis may behave differently depending on the ablation and can have an effect on the EPSC frequency and amplitudes. This difference could be a notable change that leads to what appears to be either no change or a change in amplitude.

      Works Cited: Kase, Y., Otsu, K., Shimazaki, T., & Okano, H. (2019). Involvement of p38 in Age-Related Decline in Adult Neurogenesis via Modulation of Wnt Signaling. Stem Cell Reports, 12(6), 1313-1328.<br /> Tepe, B., Hill, M. C., Pekarek, B. T., Hunt, P. J., Martin, T. J., Martin, J. F., & Arenkiel, B. R. (2018). Single-Cell RNA-Seq of Mouse Olfactory Bulb Reveals Cellular Heterogeneity and Activity-Dependent Molecular Census of Adult-Born Neurons. Cell Reports, 25(10), 2689–2703.e3. doi:10.1016/j.celrep.2018.11.034

    1. On 2019-11-05 15:11:52, user Johan S. Martinez-Fuentes wrote:

      NE598 GROUP 3<br /> We are students at Boston University focused on learning about neural circuits and how their structure and function relate to animal behavior. In an effort to promote constructive discourse of current research in this field, and to gain experience in the process of peer-review, we provide the following critique of the currently unpublished manuscript from Hammond et al. posted on biorxiv.org (version: September 05, 2019).

      Summary: Multiple sclerosis (MS) is a neurodegenerative disease characterized by loss of white and grey matter leading to motor and cognitive disability. It remains unknown exactly what role the components of the immune system, including microglia and molecular complement factors (e.g., C3, C1q), play in disease progression of grey matter in MS. Hammond et al. use a mouse model of MS called experimental autoimmune encephalomyelitis (EAE) in combination with molecular, genetic, and immunohistochemical approaches to find that C3/C1q and microglial activation are implicated in different aspects of grey matter pathology in EAE. These results argue for complement signaling, and associated microglial activation, as important players in MS-related grey matter degeneration and disease severity. This research has promise of being impactful as it addresses our general lack of knowledge surrounding lesions of MS independent of demyelination (Mandolesi et al., 2015), and potentially highlights new avenues for therapeutic treatments. Overall, we recommend improving the usage and presentation of some of the data as well as addressing complexities of cellular phenotypes, which appear to be understated.<br /> Figure 1 explored the potential functional relationship between complement production, specifically that of C1q and C3 protein, and the EAE model. The authors used western blot to analyze C1q and C3 expression in hippocampal lysates from sham and EAE mice and found an increase in the levels of both C1q and C3, 2.6-fold and 1.9-fold respectively, relative to the sham controls (Figure 1A). They normalized the band densities to the sham controls and quantified the C1q and C3 results (Figure 1B). 
Further, they explored mRNA expression in hippocampal tissue by isolating RNA from the sham and EAE mice (n=10 each), analyzing it using qPCR, and quantifying the fold change of C1qa and C3, which were 2.1-fold and 8.4-fold above sham controls respectively; this implicated a potential connection between local gene expression and increased protein production in the model (Figure 1C). Additionally, the group used qPCR to analyze C1qa and C3 gene expression in hippocampal CD11b+ microglia/myeloid cells from sham and EAE mice (n=5 each), finding no significant difference in the expression of C1qa in the EAE mice as compared to the control, but a 54.5-fold increase for C3 (Figure 1D).<br /> Figure 2 provides visual affirmation of the upregulation of C3/C1q in the hippocampi of EAE mice compared to sham controls. Immunohistochemistry was used to shed light on the differential spatial patterns of C3/C1q expression across regionalized sections of the hippocampal formation. Specifically, EAE mice showed an increase in C1q across the entire hippocampus and in some cases showed co-localization with PSD95, suggesting it may affect synaptic functionality. This phenomenon extended to C3/C3d expression in the CA1 stratum-radiatum region of the hippocampus. <br /> In the third figure, the investigators display the results of an experiment developed to determine the effects of C1q or C3 loss on the motor impairment in EAE mice by comparing pathology in EAE-immunized WT, C1qa KO, and C3 KO mice (n=24, 17, and 7 respectively) on a clinical scale over the course of approximately one month post immunization. They found nearly identical results between the C1qa KO and WT groups, but lower clinical scores, indicating less severe EAE-related deficits, in the C3 KO group. Notably, the timeline of symptom onset was consistent across the groups. 
To display the results, they used a graph of Days Post Immunization versus Clinical Score displaying the mean scores of all three groups (Figure 3).<br /> Because the authors had previously found a significant amount of synapse elimination in the CA1-stratum, in Figure 4 they looked further into the role of complement proteins in grey matter loss, specifically in Homer1 and PSD95+ puncta. Using immunohistology, the puncta were quantified with the “find spots” algorithm, setting a brightness threshold for the PSD95+ puncta. Compared to the WT EAE, which had a 13% decrease in Homer1 puncta, the C1qa KO EAE showed only a 7% decrease in puncta compared to the sham control. However, both C3 and C1qa showed no significant difference to the sham control. All data were normalized to the sham control and each measure was taken from an average of 6 image stacks per mouse. This could suggest that the alternative pathway of C3 is more important for grey matter pathogenesis due to increased protection from synapse elimination in C3-KO compared to wild type and C1q-KO.<br /> In Figure 5, to assess the role of C1q and C3 for activated microglia in EAE, the authors conducted morphometric quantitative image analysis of IBA1 immunostain signal in the hippocampus across control and KO animals. Activated microglia show shorter, thicker skeletal processes. Thus, an increase in activated microglia was measured through a segmentation algorithm in Volocity by (i) increased IBA1 expression, (ii) increased IBA1 volume, and (iii) a decrease in the ratio of either IBA1+ skeletal length or surface area to volume compared to sham control. In both EAE WT and EAE C1q-KO conditions the authors observed a significant increase in the level of activated microglia across all measures, but no significant difference was seen in EAE C3-KO. 
Thus, C3-dependent activity appears to be important for EAE-related microglia activation, and taken with the previous results, this may suggest why synaptic protection in C1q-KO is insufficient for improvement in clinical score. This set of results is highly intriguing as it suggests microglia as a target for therapeutic intervention in order to potentially improve grey matter health and patient outcomes.

      Major Issues:<br /> While Figure 1 supports the implication of complement protein C1q and C3 expression in the deficits that characterize the EAE model fairly well, there are a few critical issues. Firstly, it includes both male and female mice; it is well known that MS has a higher prevalence among females, and this could be a potential issue with the EAE model. The investigators claim that there is no sex difference, but their n of 7 and 11 is too small to confidently make this claim. They should include more mice and run the proper statistical tests or comment on this potential confound. Further, they perform an experiment looking at hippocampal CD11b+ microglia/myeloid C1qa and C3 gene expression, but only use one marker. Figure 1 introduces the issue of isolating brain-resident macrophages (microglia) from those that cross the blood-brain barrier; CD11b alone is insufficient to distinguish them because it is expressed across a variety of immune cells in adhesion-related associations. In Figure 5, IBA1 is not strictly restricted to microglia but also marks monocyte-derived macrophages that may be crossing the blood-brain barrier, which poses issues in isolating a microglial phenotype (Satoh et al., 2016). For example, if the C3-KO condition results in increased numbers of IBA1+ macrophages, then relying solely on IBA1 may mask a microglial phenotype. The authors may consider using a co-marker exclusive to microglia (e.g., TMEM119). The authors may also consider analyzing protein expression in microglia.<br /> Regarding the issue of having insufficient n for comparison, the authors must seriously consider the risk of oversampling certain conditions so as to bias or skew results. Instances of this can be seen in Figures 3, 4, and 5. Generally in these figures, the WT conditions have n ~ 20, the C1q conditions have n ~ 15, and the C3 conditions have n < 10. 
The authors may consider increasing sampling in undersampled conditions, or re-running statistical analyses on subsets of oversampled groups to see if results are still significant.<br /> In Figure 2, although the sparse colocalization of C1q and PSD95 in Figure 2D-E somewhat implies that C1q is upregulated at synapses and thereby dendrites, the images do not provide the resolution necessary to resolve this colocalization or the synapse itself. This criticism extends to 2I-J for the same reasons, and the issue of rigorously defining synapses is also apparent in Figure 4. The puncta that are being marked are post-synaptic, but there is no confirmation of association with dendrites or any other part of the neurons creating these synapses. The authors may consider sparsely labeling neurons with virally introduced, promoter-driven expression of fluorescent protein to visualize spine morphologies. Returning to Figure 2, there is no bar graph quantifying the findings for these last panels. We acknowledge that 2F adequately resolves C1q expression and thereby confirms their antibodies’ efficacy, but this panel would benefit from a DAPI stain that confirms the structural integrity of their mouse model’s cytoarchitecture. In 2G, we feel that the images are not easily interpretable and could be improved by using a unique immunohistological marker to tag blood vessels and by normalizing the signal so that we can more clearly resolve the upregulation of C3/C3d puncta. The reader would also benefit from low-magnification insets to images 2D-J to confirm proper sub-region comparison.<br /> Conceptually, the major criticism of the experiment outlined in Figure 3 would be its inconsistency of focus compared to the rest of the study. While the vast majority of the experiments work to implicate the complement pathway in hippocampal degeneration, the clinical test that is chosen is a motor test. 
It may have been more useful to this study in particular to use a cognitive behavioral test for memory. Furthermore, they include no comparison with sham mice, which leaves no control point of reference for the clinical score.<br /> For Figure 4, the analysis could be done in more depth, with a much clearer explanation of which sections are being studied and compared. The data are being normalized, but it is unclear from which sections exactly. Because of the way that the data are presented, there is no way to check whether there is just a concentrated population of these puncta in a certain section/hippocampal subregion, or whether the spread of puncta is truly as uniform as the normalized data suggest it is.<br /> An essential piece of evidence missing from Figure 5 is a positive control for microglial activation in C3-KO mice. Are the microglia, under EAE conditions, capable of exhibiting activation characteristics? It is possible that there is a large-scale defect in inflammatory processes related to the germline loss of C3, and not directly related to the functions of C3 itself. Considering that the onset of motor symptoms is similar across all mice, one simple way to address this is to check whether they all also share an activated microglial phenotype around day 6 and/or day 14 post-immunization. Another way may be direct intracerebroventricular (ICV) injection of LPS (here, the authors may also see if EAE is correspondingly accelerated).

      Minor Issues:<br /> In Figure 1, it would be more helpful to show all the data points on the bar graphs so that the spread of the data can be better visualized. It would also be useful for the group to include what percentage of mice had an increase in C1q and C3. Furthermore, it would be useful for the group to include more on the condition of the animals and whether they used all the data they collected in the analysis or whether some was excluded.<br /> The age of the mice should be presented in figure legends (see Fig. 2, 4, 5) to build upon the narrative established in Figure 1. Moreover, although the authors attempt to show the aforementioned co-localization of C1q and PSD95, we think Figure 2 could be vastly improved by including an inset in 2D-E to contextualize where we are looking with respect to the hippocampal formation. <br /> Overall, the display of Figure 3 is well crafted and the legend does well at explaining the important facts; however, there could be some potential corrections. On the graph, two of the groups are in the same color; readability may increase by choosing different colors for each of the mouse groups, especially if a sham control is added as earlier recommended. Further, it may be useful to include more background, possibly in the results portion for this figure, on the clinical test utilized and what different scores indicate in terms of relative severity of symptoms.<br /> In Figure 4, the authors should state the magnification and the length of the scale bar for the IHC images, and they should explain why the different images use different or the same colors. While green is generally thought to be a more visible color, the authors must keep the presentation consistent across conditions, otherwise they risk biasing the perception and interpretation of their data. 
<br /> Please correct the following typos: ‘value’ to ‘area’ in “Similar results were obtained for the surface value/volume ratio…” (Page 16); ‘Qioptiq’ instead of ‘Quioptic’ in “IHC sections were imaged… with Quioptic Optigrid optical sectioning hardware” (Page 10).

      Merits:<br /> In Figure 1, the group effectively uses the data presented to begin the argument for the rest of their study. They are able to implicate the C1q and C3 proteins as having a relationship with the EAE pathway. Furthermore, as it is well known that the relationship between protein production and mRNA is not 1:1, it was a good decision to include data on both. This figure is also highly readable, well labelled, and easy to comprehend.<br /> Figure 2 successfully verifies the antibodies’ fidelity in visualizing C1q, PSD95, and C3/C3d in the mouse hippocampus. Importantly, this serves as a proof-of-principle figure because it validates the efficacy of their experimental mouse model and confirms that their antibodies function properly. Moreover, their approach is clever because it affords them an opportunity to resolve region-specific expression of the aforementioned molecules of interest. <br /> The concept of integrating a behavioral experiment into this largely molecularly based study, as seen in Figure 3, is commendable and certainly enhances this study’s findings by linking the functionality of the complement pathway to the actual symptomology of the disease model course. It also allows for a look at the effects of the disease in a very readable and visual manner over the course of the progression.<br /> In Figure 4, the explanation of the way the data were collected and analysed was quite clear. Using the same region that had previously been found to be affected by the manipulations in this study is commendable. Notably, in Figure 5, the measures for microglial activation shown here abide by the standards established in the field.

      Future Directions<br /> From Figure 1, to determine more definitively whether the C1qa and C3 KOs show other inflammatory changes beyond the deletion of the complement proteins, the group could run separate inflammation tests for the complement proteins. Perhaps the group could build off of their experiments in the first figure by utilizing an assay to isolate the microglia of the mice and characterize their response with pro-inflammatory markers such as TNFa, IL-2, or IL-6 by testing in WT, C1q KO and C3 KO with and without inflammation. They could also consider running qPCR. Additionally, the group could consider running this experiment with a behavioral component, such as a cognitive deficit test concerning memory. <br /> From Figure 2, concerning future directions, we think these figure panels would benefit from higher resolution images; however, if the authors do not have access to super-resolution microscopy or EM, we suggest performing synaptosome enrichment to quantify differential protein expression between sham and EAE populations. We also think that the colocalization results would be bolstered by recapitulating these experiments using synaptic markers other than just PSD95.<br /> It could be a very interesting future study to look at the role of the complement system with regard to motor function on a molecular level, given the clinically oriented results they obtained in Figure 3. Furthermore, it would be interesting if the group carried out a cognitive deficit behavioral test with the respective groups that would align more with the rest of the given study. Further, it may be interesting to look at a knockdown of the complement pathway elements analyzed and to compare the progression of symptomology in that case.<br /> Based on the findings in Figure 4, it would be interesting to see the spatial distribution of the populations of puncta that are, as well as aren't, being reduced. 
It is unknown whether the elimination is uniform or specific to a single layer or to a certain projection pathway of the hippocampus. This could be addressed by further analyzing the data that have already been collected as intact stacks of images rather than simply averaging many layers together. In addition to this, it would be useful to visualize the synapses with synaptic markers such as CaMKII, using an AAV and a retrograde tracer to trace them.<br /> Branching from the work in Figure 5, to further explore the importance of activated microglia in EAE, future experiments perturbing the population of microglia across different stages of EAE may be conducted to see whether this is sufficient to improve clinical scores. The CSF1 receptor inhibitor, PLX3397, has been previously used to efficiently eliminate microglia, with ~50% reduction by three days (Elmore et al., 2014); this drug may be incorporated into the EAE timing to examine the effects of microglia loss. As an alternative, antisense oligonucleotides (ASOs) against C3 or CSF1 for pan-microglia may also be considered, especially since some ASO drugs are already FDA approved.

      Works Cited<br /> Elmore, M. R., Najafi, A. R., Koike, M. A., Dagher, N. N., Spangenberg, E. E., Rice, R. A., … Green, K. N. (2014). Colony-stimulating factor 1 receptor signaling is necessary for microglia viability, unmasking a microglia progenitor cell in the adult brain. Neuron, 82(2), 380–397. doi:10.1016/j.neuron.2014.02.040

      Mandolesi G, Gentile A, Musella A, Fresegna D, De Vito F, Bullitta S, Sepman H, Marfia GA, Centonze D. Synaptopathy connects inflammation and neurodegeneration in multiple sclerosis. Nat Rev Neurol. 2015 Dec;11(12):711-24. doi: 10.1038/nrneurol.2015.222.

      Satoh J, Kino Y, Asahina N, Takitani M, Miyoshi J, Ishida T, Saito Y. TMEM119 marks a subset of microglia in the human brain. Neuropathology. 2016 Feb;36(1):39-49. doi: 10.1111/neup.12235.

    1. On 2019-04-23 12:47:59, user Brian Levine wrote:

      In this study, the researchers assessed concurrent validity of questionnaires against established measures in a sample of 217 participants. There is a strong motivation for this kind of study, which provides useful information for researchers assessing memory, imagery/scene construction, navigation, and future thinking. The researchers are commended for a comprehensive study reflecting many hours of effort in order to execute these measures. My comments will be largely focused on the measures of autobiographical memory (AM), some of which were developed by my group. This comment grew out of a discussion with my trainees who also read the article, including Nick Diamond, Carina Fan, Raluca Petrican, Stephanie Simpson, and Lynn Zhu. I thank the authors for posting this preprint, open to community commentary.

      A major contribution of this paper is an emphasis on subjective experience, which, although impossible to assess directly, is important to the consideration of episodic memory. This paper supports the view that subjective and objective instruments do not assess the same thing. As stated by the authors, the use of these instruments depends on the goals of the study. Where we disagree is the premise that seems to be implied in the title, which is that questionnaires (and to some extent, the objective tests) are measuring something different than what they purport to measure.

      My main critique of the approach is that it lacks nuance in terms of levels of analysis within AM, which is itself a multifaceted construct. The authors took a strictly univariate approach in which each criterion measure is treated as a unitary measure of a latent construct. Normally, multiple measures would be deployed in a latent construct approach because no single measure is process-pure.

      A main finding of the present study is that overall, subjective ratings (either on questionnaires or self-/other ratings of laboratory test performance) correlate with each other to a greater degree than the subjective/objective comparison. This is interesting though not surprising given that subjective measures do not measure the same thing as objective measures, and that they share measurement error bias. This is also the case for the scene construction measure which is held as objective, but in fact takes subjective ratings into consideration in the scoring.

      In the Autobiographical Interview (AI), internal details are treated as a measure of a person’s capacity to recover contextual information from past events; external details reflect content not specifically related to the defined event and are therefore considered to be inversely related to cognitive control over memory retrieval. A recovered detail is neutral with respect to subjective/conscious experience. Patient M.L., who had a specific impairment in conscious re-experiencing of the past due to frontotemporal disconnection, showed only marginal reductions in internal detail production, even though his “remember” ratings for the same events suggested a profoundly reduced conscious experience (Levine, Svoboda, Turner, Mandic, & Mackey, 2009). He also showed reduced activation of the AM network when presented with rich retrieval cues for these events. Even more to the point, patients with severe medial temporal lobe amnesia, including H.M. (Steinvorth, Levine, & Corkin, 2005) have produced events with substantial internal details (see also Cermak & O'Connor, 1983).

      The SAM episodic subscale, on the other hand, was developed specifically to probe the subjective experience of recollection at the trait level. As noted by the authors, we found that these were unrelated in our original SAM paper in healthy young adults (Palombo, Williams, Abdi, & Levine, 2013; see also Hebscher, Levine, & Gilboa, 2018 for a similar finding), nor were people with Severely Deficient Autobiographical Memory (SDAM) impaired on AM for recent events using the AI. Considering these findings, the above-described patient findings, and the more general findings of dissociation between subjective recollection and recognition performance, as illustrated in the Remember/Know technique, a strong relationship between these two measures should not be expected.

      Nonetheless, some relationship between recovered details and self-reported episodic autobiographical re-experiencing at the trait level could be expected. I believe the lack of relationship is owing to the fact that the AI was designed to elicit the richest possible event descriptions from participants. As the authors note, internal details are scored liberally for the sake of reliability (i.e., the “benefit of the doubt” rule, where any detail that could reasonably be considered internal was classified as such). However, there was another purpose in eliciting rich episodic autobiographical memories, which was to avoid a false positive classification of memory impairment based on incidental factors, such as misunderstanding instructions, which is of particular importance in studies of aging and clinical samples. Accordingly, under the most commonly used administration method, the subject selects an event for each time period that is highly accessible and likely well-rehearsed. The resulting score therefore reflects the participant’s best possible narrative production. This is why M.L. and H.M. could produce seemingly normal autobiographical narratives.

      The SAM, on the other hand, is explicitly designed as a measure of trait mnemonics, not cognitive function as assessed by performance on a given test. The instructions for the episodic questions are “When answering, don’t think about just one event; rather, think about your general ability to remember specific events.” Even assuming that the SAM and the AI are designed to assess the same construct (which as I argue above is not the case) there is a difference between asking how one performs in general versus assessing how they perform when asked to give their best possible narrative by the examiner. By analogy, an introverted person may appear extroverted if required in certain social situations. There is no requirement to cue 5 lifetime period events as originally specified in our 2002 aging study. The AI scoring system has been applied to memories cued in different ways. Harvesting unrehearsed events from significant others may be a more effective way to estimate one’s typical retrieval abilities as opposed to their best possible performance.

      The present paper used a sample of young adults. The AI as implemented in our 2002 study was developed for use in older adults and in patients. The internal detail measure is very sensitive to medial temporal lobe integrity. While this has been demonstrated in neuroimaging studies of healthy young adult samples (Hebscher et al., 2018; Palombo et al., 2018), its sensitivity to individual differences in a homogeneous sample of young adults is limited relative to individuals with compromised medial temporal lobe function, especially at the behavioral level. Nonetheless, the proportion of internal/total details or internal details/word count should be examined rather than the raw count of internal details, as the latter is confounded with verbosity. A comprehensive test of this relationship should also examine detail subcategories and time period effects. Given the foregoing I do not expect that this would change the results substantially, but it should be done for completeness.
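      The verbosity point above can be illustrated with a minimal sketch. The function name and the detail and word counts below are hypothetical and are not taken from the study or from the AI scoring manual; the sketch only shows how two participants with the same raw internal count can differ once the count is expressed relative to total details or to word count.

```python
def verbosity_adjusted_scores(internal, external, word_count):
    """Return raw and verbosity-adjusted internal-detail measures.

    Hypothetical helper for illustration only; not part of any
    published Autobiographical Interview scoring procedure.
    """
    total = internal + external
    if total == 0 or word_count == 0:
        raise ValueError("need at least one scored detail and one word")
    return {
        "internal_raw": internal,
        "internal_to_total": internal / total,
        "internal_per_100_words": 100 * internal / word_count,
    }


# Two made-up participants with identical raw internal counts.
terse = verbosity_adjusted_scores(internal=30, external=10, word_count=300)
verbose = verbosity_adjusted_scores(internal=30, external=50, word_count=900)

for label, scores in (("terse", terse), ("verbose", verbose)):
    print(label, scores)
```

      In this made-up case the raw counts are equal, but the ratio measures separate the terse narrator from the verbose one, which is the confound the ratio scores are meant to remove.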

      It is intriguing that the parallel analysis on subjective vs. objective measures of spatial memory yielded significant relationships. This speaks to the complexity of AM relative to spatial memory. In navigation, the criteria for success are clearer than for AM. If someone arrives at the correct location (or gets lost), their subjective and objective experience are consonant. But if someone recalls an episode, it is unclear if the correct criterion is subjective experience or imagery or quantity of detail. As noted above, I agree with the authors that there is a distinction between subjective and objective measures, and that one’s selection of measures should be governed by the goals of the study. I would not agree that the present findings call into question whether or not internal details “is actually a good measure of recall ability” given that this measure (or its variants) has been used in over 170 studies (for table of studies, see AutobiographicalInterview.com), with good evidence for the validity of the internal/external distinction, including associations to brain structure and function. I also disagree that the findings of this study justify the use of vividness ratings alone as proxies for memory recall ability, especially in patients, who may show greater variability and less reliability in their introspective ratings than healthy adults. In any case, generalization to aging or clinical samples from a homogenous sample of younger adults is not justified.

      There is great richness to these data that could be exploited in a multivariate data-driven approach. I recognize that this was not the goal of this study, but a multivariate approach such as Canonical Correlation Analysis (CCA) would allow the researchers to detect latent variables and patterns of association across these measures that are opaque to a series of bivariate correlations and linear regressions. This feels like a lost opportunity in favor of an assumption-laden approach that results in a flat, protracted series of individual analyses that is difficult to follow. In fact, many of the analyses here are already exploratory in that they assess the ability of questionnaires to predict performance on constructs other than the one they were hypothesized to measure. Data-driven multivariate approaches are well-suited for such goals.

      Finally, I had difficulty understanding the justification for proposing a single sentence test of any psychological construct. Classical test theory dictates that the reliability of a composite is better than the reliability of a single item. While single items may be useful as a screening technique, for pathognomonic signs, or when doing mass testing, they should not be used for assessment of complex traits, where interpretations of individual items may vary across individuals. A brief questionnaire for each construct would be more stable and does not pose an undue burden on participants. There are no psychometric data presented here to support the use of a single item measure aside from the fact that they showed sensitivity in this sample of healthy adults. These overfitted coefficients will shrink if tested in a separate sample. The composite test of all 15 single items could be subjected to psychometric analysis, but it is unclear if this is of interest.
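
      The classical test theory point can be illustrated with the Spearman-Brown prophecy formula, which gives the reliability of a composite of k parallel items from the reliability of a single item. This is only a sketch: the assumption of parallel items and the single-item reliability of 0.4 are illustrative choices, not estimates from these data.

```python
# Spearman-Brown prophecy formula: reliability of a k-item composite,
# assuming parallel items (equal true-score variance and error variance).
def spearman_brown(single_item_reliability, n_items):
    r = single_item_reliability
    if not 0.0 <= r <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    if n_items < 1:
        raise ValueError("need at least one item")
    return n_items * r / (1 + (n_items - 1) * r)


# Illustrative single-item reliability (an assumption, not a reported value).
r_single = 0.4
for k in (1, 5, 15):
    print(k, round(spearman_brown(r_single, k), 3))
```

      Under these assumptions the composite reliability rises monotonically with the number of items, which is why a brief multi-item questionnaire is expected to be more stable than a single sentence.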

      Cermak, L. S., & O'Connor, M. (1983). The anterograde and retrograde retrieval ability of a patient with amnesia due to encephalitis. Neuropsychologia, 21(3), 213-234.

      Hebscher, M., Levine, B., & Gilboa, A. (2018). The precuneus and hippocampus contribute to individual differences in the unfolding of spatial representations during episodic autobiographical memory. Neuropsychologia, 110, 123-133. doi:10.1016/j.neuropsychologia.2017.03.029

      Levine, B., Svoboda, E., Turner, G. R., Mandic, M., & Mackey, A. (2009). Behavioral and functional neuroanatomical correlates of anterograde autobiographical memory in isolated retrograde amnesic patient M.L. Neuropsychologia, 47(11), 2188-2196.

      Palombo, D. J., Bacopulos, A., Amaral, R. S. C., Olsen, R. K., Todd, R. M., Anderson, A. K., & Levine, B. (2018). Episodic autobiographical memory is associated with variation in the size of hippocampal subregions. Hippocampus, 28(2), 69-75. doi:10.1002/hipo.22818

      Palombo, D. J., Williams, L. J., Abdi, H., & Levine, B. (2013). The survey of autobiographical memory (SAM): a novel measure of trait mnemonics in everyday life. Cortex, 49(6), 1526-1540. doi:10.1016/j.cortex.2012.08.023

      Steinvorth, S., Levine, B., & Corkin, S. (2005). Medial temporal lobe structures are needed to re-experience remote autobiographical memories: evidence from H.M. and W.R. Neuropsychologia, 43(4), 479-496.

    1. On 2019-02-27 11:54:52, user Laurentius Huber wrote:

      This version of the manuscript is based on the following reviewer comments and responses:

      Please find a formatted version of this response letter with figures here: https://goo.gl/3czXWG

      Response Letter:<br /> We thank the referees for reviewing our manuscript entitled “Sub-millimeter fMRI reveals multiple topographical digit representations that form action maps in human motor cortex”. The critical reading of this manuscript is highly appreciated, and we believe that the comments have helped to improve the manuscript and clarify the interpretation of the presented results. The manuscript has been modified according to the reviewers’ suggestions.<br /> All points raised by the reviewers have been addressed in detail below.

      Reviewer #1:<br /> R1.1 <br /> This is a very interesting study investigating the spatial organization of hand movement representations in M1. Certainly the hand representation in M1 is likely complex and therefore requires advanced methods to probe. Both imaging and neurophysiological evidence clearly suggests that M1 is not so much concerned with the representation of fingers, but rather of complex hand movements. The use of a winner-take-all map for fingers is therefore likely a less effective way of gaining a deeper understanding of the organization of M1.

      We thank the reviewer for his/her expert assessment and for appreciating the necessity of advanced methodology to investigate the complex representations in M1.

      We would like to comment on the reviewer’s statement that “imaging and neurophysiological evidence clearly suggests that M1 is not so much concerned with the representation of fingers, but rather of complex hand movements”. We agree that there is imaging and electrophysiological evidence that parts of M1 can represent complex hand movements. However, we do not agree that it is established that the entire M1 behaves like this. We believe this is only part of the picture. <br /> In fact, physiological support for the control of the mentioned “complex hand movement” and of muscle and movement synergies comes from investigations of cortico-motoneuronal (CM) cells (CM cells are the ones with motor neurons innervating shoulder, elbow, and finger muscles). Note, however, that these representations and these cells are confined to the caudal part of M1 (also known as the “new” M1 or Brodmann area BA4p). This is the evolutionarily younger part of M1 that is located deep in the central sulcus. In this part of M1, individual body parts are largely overlapping (probably to facilitate complex hand movement) and finger dominance maps might be misleading (as the reviewer suggested).

      However, we would like to note that there are no such CM cells in the rostral M1 (Rathelot and Strick, 2006, 2009). As pointed out in Fig. S9 of our manuscript, the new finding of mirrored finger representations is visible solely in the rostral M1 (a.k.a. “old” M1 or BA4a). In this evolutionarily old part of M1, body part movements (e.g. hand, elbow, shoulder) have locally distinct domains with less overlap compared to BA4p.<br /> Thus, we respectfully disagree with the reviewer about the effectiveness of finger dominance maps. These maps are extensively used in imaging and electrophysiology and have led to important findings throughout the last century (Woolsey 1979; Hlustik 2001; Idovina 2001; Sanes 1995; Penfield 1937; Schieber 1993; Schellekens 2018; Olman 2012; Siero 2014). We don’t want to discredit this large body of literature on body part maps, and we would also like to use the tool of finger dominance maps when appropriate.

      We would also like to point out that at no point in this analysis are we estimating “winner-takes-all maps”. We are aware of the shortcomings of winner-takes-all maps, and thus the finger-dominance maps that we depict in many figures are not binary. Instead, our finger-dominance maps are shown with a continuous color scale. Every voxel has a relative value (from 0 to 1) indicating how strongly it is dominated by a given finger. This analysis retains the fact that multiple fingers can be represented in the same voxel.<br /> For even more quantitative interpretations (e.g. to avoid the color of one finger covering the color of another finger that is more weakly represented), we included Fig. 3B, which shows the mirrored representation in column profiles.
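
      The difference between a binary winner-takes-all label and a graded dominance value can be sketched as follows. This is an illustration only: the response letter does not specify how the dominance value is computed, so the normalization used here (each finger's non-negative response divided by the sum over fingers) is an assumption, and the voxel responses are invented.

```python
# Hypothetical per-voxel responses to five finger-movement conditions
# (thumb, index, middle, ring, pinky); values are invented for illustration.
voxel = [2.0, 1.5, 0.5, 0.0, 0.0]


def winner_take_all(responses):
    """Index of the strongest finger; all other information is discarded."""
    return max(range(len(responses)), key=lambda i: responses[i])


def graded_dominance(responses):
    """Assumed normalization: each finger's share of the summed response,
    with negative responses clipped to zero. Values lie in [0, 1]."""
    clipped = [max(r, 0.0) for r in responses]
    total = sum(clipped)
    if total == 0:
        return [0.0] * len(responses)
    return [c / total for c in clipped]


print(winner_take_all(voxel))
print(graded_dominance(voxel))
```

      With the binary label, this voxel would simply be assigned to the thumb; the graded values keep the information that the index finger is represented almost as strongly.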

      The methods presented in this paper are carefully applied and well documented. In fact the authors have made the tools and data available in an open repository, for which they are to be commended. I really have no quibbles with the processing or VASO approach, both of which have extensive prior publication history.

      We thank the reviewer for recognizing the importance of investigating the organization of M1 and we are delighted that the reviewer considers our methods adequate.

      R1.2 <br /> The paper is clearly written and illustrated. However the crux of the problem lies in the extent of the novelty of the imaging sequence versus the lack of novelty in the neuroscience findings. Certainly practitioners of VASO have made a convincing argument for its superiority over GE-EPI BOLD for the localization of function at the mesoscopic scale and I certainly am convinced of that. Nonetheless researchers around the globe have used GE-EPI to look at various columnar structures in animal and human brain with some degree of success. While the results in this paper are among the clearest, the spatial resolution doesn't really go beyond what Cheng et al. used in their Neuron paper in 2001. So while VASO is certainly a viable and perhaps better alternative to BOLD, this manuscript doesn't really advance the MRI side of the equation much beyond what these authors and others have already shown.

      We thank the reviewer for appreciating the clarity of the manuscript and for appreciating the value of VASO in high-resolution fMRI.<br /> Given the reviewer’s doubts about the novelty, we would like to explicitly point out the methodological advancements we achieved and the novel neuroscience findings we report.

      Methodological Novelty:<br /> We agree with the reviewer that previous studies could already achieve sub-millimeter in-plane resolutions. Note, however, that previous papers (including the Cheng paper) relied on flat portions of cortex and collapsed the third dimension along 3-4mm thick MR-slices. This means that previous MRI methods to investigate “columnar” alignment were not applicable across people, and certainly not across the entire precentral M1 gray matter bank with its characteristic Omega-like folding pattern. VASO has never before proven its applicability for sub-millimeter “columnar” imaging, and certainly not along the curved cortex. This is a novel achievement. <br /> We agree with the reviewer that we could previously already show indications of layer results (with submillimeter in-plane resolution). Please note, however, that our previous methodology was limited to a very small FOV of less than 3cm in read direction and less than 2cm in slice direction, resulting in a coverage that could only capture 0.8% of the cortex. In previous studies, this was sufficient to address research questions about individual chunks of the cortex. However, it is not sufficient for topographical mapping of “columnar” organization. One fundamental achievement of this study is that we developed a fundamentally new acquisition approach that allows us to achieve 22% of brain coverage. This was achieved with the novel development of advanced readout strategies. As such, we invested two years of development for the inclusion of advanced GRAPPA reconstruction, asymmetric echoes, and corresponding reconstruction to image space. Compared to our previous methods, the resulting coverage is more than an order of magnitude bigger. This is fundamentally novel and enabled the present study in the first place. <br /> In this study we developed a fundamentally new analysis methodology. The corresponding LAYNII software package used here allows columnar and laminar signal pooling in the voxel space of the native EPI space. There is no other analysis method that can achieve this. While there are previous automatic software packages (e.g. FreeSurfer, CBS-Tools etc.) that allow similar analysis steps, they are not suitable to detect ‘columnar’ structures that are smaller than 1mm (5 digits in 3mm) within the curved cortex. These methods require closed surfaces (not possible with partial brain coverage) and alignment with ‘anatomical’ data (which requires spatial resampling, i.e. blurring). Previous methods work in vertex space (not voxel space) and thus are associated with resolution loss during spatial resampling, which makes the neighboring finger representations merge and disappear. The mirrored finger results are only this clearly visible with all the above analysis advancements. Thus, we consider these advancements a fundamental methodological novelty. <br /> Other methodological analysis novelties developed here are columnar smoothing without signal leakage across sulci, laminar point-spread function estimation (Fig. S3, S8), layering in 3D with isotropic voxels (not only 2D as previously), and cortical unfolding in voxel space.

      Biological novelty<br /> With respect to the referenced study from Cheng et al., we would like to point out that they showed patterns that resembled the expected shape and size of columns but never established such structure and organization. There is no expected ground truth of ocular dominance column alignment (e.g. where to find which columns). This is different for our study. We can differentiate between a random columnar pattern and a meaningful somatotopic organization, with neighbouring fingers being represented in neighbouring columns. This form of meaningful columnar mapping at submillimeter scale is novel compared to Cheng et al.<br /> As opposed to previous columnar fMRI studies, we do not simply try to depict known structures with known shape and size as a proof-of-principle for a method. Instead, we are finding previously unknown organization principles of sub-millimeter representations in M1. This is a fundamentally new approach and a paradigm shift for the field of “columnar” and “laminar” fMRI. <br /> We report fundamentally new neuroscientific insights about how the previously described action representations in the microscopic regime are integrated into previously described body-part representations in the macroscopic regime. This was not described until now and is a fundamental novelty of this study.<br /> We agree with the reviewer that previous studies (including Ejaz et al.) found deviations from the homunculus model. It is not clear until now, however, how these deviations (multiple representations and fractionalizations) come about. Are these deviations from the linear body-part alignment just randomly aligned? Or do the deviations follow a specific geometric order? If yes, which one? According to which order are the movement actions aligned? In this study we find, for the first time, mirrored representations of individual digits in the primary motor cortex that are differently engaged for different actions. This is novel and has not been described previously.

      In the revised version of the manuscript, we tried to stress the novelty of the study.

      R1.2 <br /> So we are left with the importance of the neuroscientific findings, and here I have some more serious issues. The organization of M1 and S1 along an action-axis is well known and certainly not as mysterious as the authors would represent.

      We agree with the reviewer that there are previous accounts of action representations in the motor cortex. We are describing them as part of our introduction and discussion section. We did not intend to describe them as ‘mysterious’ by any means. The point that we are trying to make is that these action representations are partly in conflict with somatotopic organization principles that are found in most of the high-resolution imaging literature (e.g. Schellekens 2018; Olman 2012; Siero 2014).

      In the revised version of the manuscript, we emphasize [Ejaz et al., 2015] even more, in a dedicated paragraph.

      R1.3 <br /> Furthermore, they have dismissed a paper that shows a similar result using MRI by misrepresenting the findings of that paper as I understand them (Ejaz et al., 2015, Nature Neurosci). <br /> Specifically, in reference to that paper, Huber et al. state that 1) the work argues for a simple topographic arrangement of single finger representations in S1, and 2) that the overlap between finger activation patterns is "due to noise". In that work (Ejaz et al., 2015), they used BOLD fMRI to measure the activity patterns evoked by single- and multi-finger movements in M1 and S1. The spatial arrangements of these patterns in both regions were stable within each participant (compared across different scanning sessions), but highly variable across participants. These finger patterns are shown in Fig. 1 of that paper. Close visual inspection of the patterns reveals they do not follow a clear linear arrangement in either S1 or M1, and perhaps some evidence of digit "mirroring" can be observed - definitely there are parts of the cortex activated for the thumb at the dorsal end of the hand region.

      They then calculate the dissimilarity between all pairs of finger patterns for M1 and S1, separately. Importantly, the relative dissimilarity between any pair of activity patterns (within a participant) was highly stable across participants. This is notable given the spatial arrangements of these patterns was highly variable across individuals. One stable characteristic was that the thumb pattern was more similar to the little finger than to the ring finger. This finding clearly shows - contrary to what Huber et al. claim it shows - that a simple linear somatotopic arrangement cannot account for the digit representations in M1 or S1.

      1.) Our justification for the statements in the previous version of the paper:<br /> We assume the reviewer refers to the citation on page 5 of the original manuscript:

      “In the primary somatosensory cortex, we find no clear deviations from the homunculus model as shown previously in humans (Ejaz 2015; Schluppeck 2017; Olman 2012; Kolasinski 2016; Shellekens 2018).”

      This statement in our manuscript was based on the following paragraph in [Ejaz et al., 2015] from page 1034:

      “There was some consistency: when averaging activity patterns across participants (Fig. 1), a blurry somatotopic arrangement became visible with the thumb activating more ventral and the other fingers more dorsal areas of the motor strip.”

      Figure caption: adapted screenshot from Fig. 1 of Ejaz et al. Subject average activation maps show rough features of linear somatotopic arrangement (with secondary deviations). The thumb representation peaks at the bottom (pink arrow) and the remaining fingers are linearly aligned, with the little finger representation peaking at the top (red arrow).<br /> We also noticed indications of a secondary thumb representation in Fig. 1 of [Ejaz et al., 2015] next to the index finger. We discussed these double-thumb indications in the Ejaz et al. figures extensively among ourselves and eventually decided not to elaborate on them in our manuscript for the following reasons:<br /> In our own pilot studies, we noticed that for some kinds of thumb movement tasks, the thumb movement can come along with unwanted secondary wrist movement. This was not the case for index/middle/ring/pinky-finger movements. Since the wrist movement representations are expected to be located next to the pinky finger, we suspected that the secondary thumb representation from Ejaz et al. might actually reflect unwanted wrist movement.<br /> In our own BOLD data, we find some cases of signal leakage from S1 to M1 (across the central sulcus), which might introduce artifactual double representations in M1. Since Ejaz et al. also used BOLD sequences, we speculate that this might have been the case in those data too. <br /> The text of the paper [Ejaz et al., 2015] does not discuss the secondary blob at all. Neither does it mention it in the context of a potential double representation or mirrored representation. Thus we are hesitant to include it as a reference for this feature. It would be more appropriate for us to give the authors of [Ejaz et al., 2015] full credit for the discovery of mirrored representations if they had described it and discussed it consistently across people.

      It is further to note that the above statement in our preprint referred to the sensory cortex, not the motor cortex.

      Revision to avoid future misunderstandings:<br /> We think this misunderstanding can be resolved by removing the [Ejaz et al. 2015] citation on page 5. Instead we discuss the paper in more depth on page 7.

      R1.4 <br /> Furthermore, they (Ejaz et al.) go on to show that the stable structure of overlap of finger representations in M1 and S1 can be accounted for by the statistics of everyday hand movement. They did not interpret the spatial variability of these patterns as "noise due to inter-individual variability in every day hand movements". On the contrary, the statistics of hand use they showed is stable across individuals (also see Ingram et al., 2008, Exp. Brain Res.), as is the organizing principle underlying the spatial organization of activity patterns in M1 and S1.

      1.) Justification for our statements in the previous version of the paper:<br /> We assume the comment from the reviewer refers to the following section of our manuscript on page 6:

      “Previous studies by Sane et al. (1995) and by Ejaz et al. (2015) already identified deviations from linear organizations for finger representations in the human motor cortex with GE-BOLD at 2.5 mm and 1.4 mm resolutions, respectively. However, without the localization specificity, a consistent spatial layout principle, such as the mirrored finger representation alignment, was not found. Instead, the exact pattern of overlapping and segregated representations was interpreted as noise due to inter-individual variability in every day hand movements (Ejaz 2015).”

      We included this interpretation of the results of Ejaz et al. based on the first few sentences of the discussion section in [Ejaz et al., 2015] on page 1039:

      “The relative similarities between activity patterns were preserved across individuals, despite the substantial spatial inter-subject variability of the activity patterns themselves. The representational structure remained invariant even when the shared somatotopic arrangement of the digits was removed from the data. This suggests an organizing mechanism that shapes the overlap between patterns without enforcing a regular spatial layout. The representational structure could be predicted by the natural statistics of hand use.“

      If we understand the highlighted section correctly, Ejaz et al. found that there are deviations from a simple somatotopic organization. And the patterns of these deviations have a considerable variability across people. It is not clear, however, according to which consistent organization principle this variability comes about.

      In our view, we thus (mis-)described the phrase “inter-individual variability without given structure” with the term “noise due to inter-individual variability”.

      Revision to avoid future misunderstandings:<br /> We agree that the term “noise due to inter-individual variability” might be misleading to describe “inter-individual variability”. In the revised version of the manuscript, the corresponding section is revised as follows:<br /> A previous study by Ejaz et al. (2015) already identified deviations from linear organizations for finger representations in the human motor cortex with GE-BOLD at 2.5 mm and 1.4 mm resolutions, respectively. These data already showed some indications of multiple finger representations (e.g. Fig. 1 in (Ejaz et al. 2015)). However, these data were not discussed with respect to an alternative geometric somatotopic organization principle such as a mirrored representation.

      R1.5 <br /> I definitely agree with the authors that M1 organization is more complex arrangement than simple linear finger organization. Whether the organization really is best described by two discrete finger maps with phase reversal, however, really has to await a more rigorous experimental and statistical evaluation than even what is presented in Huber et al. Whatever the answer may be, however, I do think that the improved specificity of VASO sequence may play an important role in uncovering such representations in the future, but I don't feel that what has been shown goes much beyond what is known from the literature already.

      We are glad that the reviewer agrees with our work showing that the M1 representations can be complex. We agree that the literature needs to be augmented with more rigorous studies.<br /> In fact, with the manuscript at hand we intend to do just that: provide a more rigorous experimental evaluation. We aim to move beyond the position of Ejaz et al. Namely, we aim to go beyond the conclusion “that the motor cortex is more complicated than individual finger representations”, and describe how it is different, how these differences are geometrically organized, and whether they are stable across people.<br /> Also taking into account the large body of electrophysiological and micro-stimulation evidence about the body-part sub-divisions in M1, we set out to see how these representations agree with the results from Ejaz et al.<br /> In previous imaging studies (including Ejaz et al.) it was common to view M1 as one large chunk of cortex that would follow the same architectonic principle. There is a large body of invasive literature, however, that suggests that this is not correct, neither functionally (Rathelot and Strick, 2006, 2009) nor anatomically (Geyer 1996). Thus, we intend to describe the body-part representations with a more rigorous fine-scale evaluation. To get there, we developed the advanced methodology as described here. And we start by describing the simplest movement principle of the literature (finger tapping) in the simplest part of M1, namely the evolutionarily “old” M1 that has been described as body part representations. <br /> Thus, we feel that our findings go beyond what is known from the literature already.

      Reviewer #3: <br /> General Comments: <br /> This paper uses the vascular space occupancy (VASO) method of measuring cerebral blood volume (fMRI) to explore the somatotopy of the finger representation at a sub-millimeter resolution in M1 and S1 of humans. This is an important problem as prior fMRI papers exploring this issue did not have sufficient resolution to adequately address a fine grained topography for fingers. This paper appears to have adequate resolution (~0.8mm) to make a major contribution to understanding the topography of the hand in M1 as well as S1. As such, this paper is primarily one of anatomical location and fMRI reconstruction. In addition, it addresses the issue of whether a given body part representation is always active when that body part is moved. The answer is that there is functional specialization within each M1 finger representation. The figures are complex and it is paramount that their display is straightforward, consistent and simple to understand.

      R3.1. The stated goal of this paper is to "non-invasively investigate the functional organization topography across columnar and laminar structures in humans", particularly M1 and S1. To understand the topography of the fingers in M1, the entire extent of the finger representations in M1 must be accurately mapped. Such maps are shown in Figs. 6S and 10S. These maps, for each participant, could form the core of an important paper, but they belong in the main body of the paper. They also need to be shown systematically for each participant. The data showing the columnar organization of M1 and S1 seem like important validating information for the reconstruction of the central sulcus. Some of this could be moved to the Supplementary information. What is currently displayed in Figs. 1-5 is just a small sample from the entire extent of slices through M1. Although the concept of mirror hand representations derived from single slices is appealing, it only represents a small fraction of the entire map of the central sulcus. Furthermore, the single fMRI slices totally ignore the finger representations present in the depth of the central sulcus.

      We would like to clarify the goal of this study. We feel the quoted section was taken out of context. As mentioned in the abstract, it was not our goal to ‘investigate the complete topographical organization of the motor cortex in its entirety’. Instead, the quoted section comes from an introductory sentence that states that our goal actually was to ‘develop imaging and analysis methodology, which -in principle- allows us to investigate topographical features’. In a next step we then use the M1/S1 system as a test bed to investigate the neuroscientific usefulness of that methodology. Given that we obtain previously undescribed neuroscience findings on the mirrored digit representation, we think that the neuroscientific usefulness is confirmed. In this sense, we see our manuscript as lying along a fine line between a methods paper and a neuroscience paper.

      We agree with the reviewer that every figure in the Manuscript and the Supplementary information is “tuned” to a specific message that we want to bring across. We further agree that Figs. 1-5 in the main manuscript are just a small sample of the main story and there is much more information to be seen. We don’t see this as a weakness of the manuscript, but as a means to follow the comment R3.14, namely selectively showing figures that have a specific message, which comes across as intuitively as possible.

      In order to discuss the mirrored pattern of digit representations, we find it most natural to zoom into the hand area (Fig. 1). Correspondingly, when it comes to showing the inter-participant consistency of this feature (Fig. 2), we find it advantageous to use the same imaging procedure across all people as in Fig. 1. However, when it comes to explaining where these features are located across the dimensions of the central sulcus, we show additional unzoomed images. <br /> We agree with the reviewer that entire maps of the unflattened sensory-motor system would give a more comprehensive view. However, they would distract the reader from the feature of interest. Those entire maps would mostly contain nothing (e.g. all the non-stimulated body parts, trunk, face, feet, etc.) and the 3-8mm of interest would be tiny (e.g. see Fig. S6). <br /> To address the reviewer’s comment, we included the full maps of the central sulcus in the main body of the manuscript (new figure 3), in addition to the zoomed images.<br /> Furthermore, we included additional IMAGIRO maps (as requested) for more participants with zoomed and unzoomed sections to guide the reader as to which part of the superior part of M1 they refer to (see new Fig. S6E).

      The field of laminar and columnar fMRI is still emerging. Thus, not all potential sources of analysis artifacts are fully described and understood. To minimize potential misinterpretation it has been suggested to depict the final results as close to the raw data as possible (Polimeni 2017; Kay 2019). Thus we try to show the activation maps in the raw EPI space (Fig. 1,2,4) when possible. This way, it can be directly appreciated that the mirrored finger pattern is not an artifact of a flawed unfolding procedure. Furthermore, the activity maps in EPI space best depict the spatial scale of columnar size with respect to the cortical thickness and location at the hand knob. Flattened maps are produced by several additional steps and are presented in a very abstract space where these reference dimensions are lost. Thus, we are hesitant to remove the activation maps on the folded cortex from the manuscript. However, we included additional unfolded flattened maps in the supplementary material.

      Please note that we are also required to follow the Journal’s Guidelines to only include material that is central to the narrative. In doing so, we follow the rule of not having more than twice as many supplementary figures as figures in the main text. Thus, we included some of the additional maps as figure panels, not as additional stand-alone figures.

      We revised the manuscript to account for the reviewer’s comment. Specifically, we rephrased the abstract and introduction section to make our goals clearer. We also tried to make it clearer what the message is for each figure, in the figure captions respectively.

      Kay, K., Jamison, K., Vizioli, L., Zhang, R., Margalit, E., & Ugurbil, K. (2019). A critical assessment of data quality and venous effects in sub-millimeter fMRI. NeuroImage, 189, 847–869. http://doi.org/10.1016/j.ne... <br /> Polimeni, J. R., Renvall, V., Zaretskaya, N., & Fischl, B. (2017). Analysis strategies for high-resolution UHF-fMRI data. NeuroImage, (April), 1–25. http://doi.org/10.1016/j.ne...

      R3.2. The orientation of brain images and reconstructions should be the same in every figure. For example, Fig. 1A and 1E seem to have the right side of the brain image toward the right whereas Fig. 1B-D has it to the left. In Fig. 6S, the orientation of the CS appears to be opposite to that shown in Fig. 10S. Continually forcing the reader to flip the images creates unnecessary confusion. Since this paper shows the right hemisphere, left/medial should be on page left and right/lateral should be on page right. The terms medial and lateral are preferable to left and right. In Figs. 6S, 10S, the actual location of the medial wall/sagittal fissure should be indicated. Without this marker, the CS just floats in space with no anchor to the actual brain image. A calibration should be included on each image.

      We agree that the orientation is confusing. This comes from the fact that the convention for MRI images is to view them as they would look from the experimenter’s perspective, e.g. looking at an axial cut from the perspective of the participant’s feet. The right motor cortex of the person is then depicted on the left. This contradicts the 3D head models, which are viewed from above. Thus, the 3D views and the 2D views were confusing.<br /> Based on the reviewer’s comments, we tried to make it more consistent in Fig. 1, S6 and S10. This means, however, that the 3D head models are mirrored representations compared to their real-life counterparts. <br /> We included additional calibration markers and the landmarks of the medial wall in multiple figures, e.g. Fig. S6, S9, S3.

      R3.3. The term 'multiple' is used incorrectly throughout the manuscript. Multiple means 'more than 2'.

      We respectfully disagree with the reviewer on this point. In our understanding, the term ‘multiple’ refers to ‘more than one’ (source: https://en.oxforddictionari...). We chose this deliberately vague term. We find only two mirrored representations consistently across all participants. However, we cannot exclude the possibility that there are more representations hidden below the detection threshold. Since absence of evidence is not the same as evidence of absence, we would like to refrain from calling it a “double” representation, because that term would exclude the possibility of a third or fourth representation. <br /> In one participant, with a large tilting angle and with a very low threshold, we see indications of a third representation. However, since it is not reproducible across participants, its discussion is left to future experiments with more sensitive imaging methodology.

      R3.4. It is unclear how the images in Fig. 1E were developed. What do the colors mean? Why is this representation shown here when it is not used until Figs. 3S, 6S.

      Fig. 1 was intended as a figure describing the methods applied in this study. Thus, we included the coordinate system of layers and columns in 3D grids as they are used for the directional smoothing. We agree with the reviewer that it can be confusing. We thus removed panel E from the figure in the revised version of the manuscript.

      R3.5. Discussion- <br /> The requested revisions in the data presentation will require revision of comparisons to other fMRI papers. <br /> The Discussion would be improved by a more extensive comparison to studies in monkeys where most of the mapping of M1 has occurred. An excellent brief summary of the monkey literature may be found in the section written by Paul Cheney in Omrani et al, 2017. The discussion should address two issues. <br /> First, a comparison of the organization of human M1 to the anatomical and physiological explorations of this region in the monkey. Second, the issue of specialization (separate regions of grasping and retraction) has its basis in monkey data that indicates specialization of M1 neurons for specific tasks.

      We agree with the reviewer that the summary from Cheney provides a nice summary about representations in the motor cortex learned from monkey experiments. Based on this summary, we included an additional paragraph into the discussion section that should address the two issues.

      Most of the knowledge on the functional representation of movements in the primary motor cortex has been obtained from countless experiments in monkeys over the last century. The current state of consensus in the field is nicely summarized by Paul Cheney in (Omrani 2017; see also references therein). Overall, corticomotoneuronal cells in the primary motor cortex encode muscle-related parameters of movement such as muscle activity and muscle force. Although some corticomotoneuronal cells in the primary motor cortex (particularly those involved with finger movements) have their terminations confined to motoneurons of single muscles, a large number of corticomotoneuronal cells are not rigidly coupled to the activity of their target muscles but show specialization for particular movements or categories of muscle activity. Namely, almost half of the corticomotoneuronal cells facilitate muscles involving at least one distal and one proximal joint and are specialized for specific muscle synergies, e.g. for reach-to-grasp movements. With respect to the action representations shown in Fig. 2B, it is important to note that Cheney and Fetz (1985) had previously identified the muscle fields of neighboring corticomotoneuronal cells. They showed that neighboring corticomotoneuronal cells had muscle fields that were very similar. Hence, the notion of cortical patches that are preferentially activated for grasping and retraction actions (Fig. 2B) has its basis in previous monkey data and could refer to these previously described muscle fields.

      Specific Comments:

      R3.6. The first sentence of the Significance statement is incomprehensible. In general, the significance of this study is not well explained.

      The significance statement has been removed from the revised version of the manuscript.

      R3.7. Introduction- Sanes et al., 1995 did not study monkeys.

      We agree with the reviewer. The Sanes reference is moved to a different section now.

      R3.8. "However, the organizational principle of smaller body parts such as individual digits could not be resolved due to the lack of localization specificity of conventional GE-BOLD fMRI and the sparse sampling of invasive electrophysiological recordings." This may be true for fMRI but the electrophysiological stimulation in monkeys (Kwan et al., 1978; Strick and Preston, 1982 [up to 16 penetrations per 1mm2]; and Park et al., 2001) can hardly be described as sparse.

      We agree with the reviewer that the term “sparse” might be misleading and does not do those experiments justice. The point we were trying to make is that fMRI is inherently a continuous mapping technique that samples the entire cortical sheet without any holes between electrodes, which is true even at low resolutions. To address the reviewer’s comment, we revised the paragraph in the introduction section.

      R3.9. Lin et al 2011 is often used as evidence that VASO accurately measures CBV. However, close examination of Fig. 1 in Lin et al reveals that the VASO and Gd-DTPA blood volume measurements often do not occupy the same voxels. That is, many VASO voxels with significant activation have no significant Gd-DTPA activation and many Gd-DTPA voxels with significant activation have no VASO activation. This observation suggests that VASO does not accurately represent CBV when voxel to voxel comparisons are made by the two different methods of measuring CBV. What other evidence, other than theoretical, indicates that VASO accurately measures CBV? (Lin AL, Lu H, Fox PT, Duong TQ. Cerebral blood volume measurements- Gd-DTPA vs. VASO - and their relationship with cerebral blood flow in activated human visual cortex. Open Neuroimag. J. 2011; 5: 90-95.)

      We share the reviewer’s concerns about whether VASO is a good measure of CBV. For this reason, we validated our SS-SI-VASO variant against gold-standard methods in multiple setups over the last 5 years, ranging from concomitant VASO imaging with optical imaging spectroscopy in rats up to validations of the layer-dependent VASO signal with MION/Feraheme imaging in rats and monkeys.

      While we agree that Fig. 1 in Lin et al. shows deviations between VASO and Gd-DTPA, we would like to refrain from speculating about what might be the reason for this. Reasons could range from acquisition challenges to analysis inconsistencies. See the following reference:

      Huber, L., et al. (2015). Micro- and macrovascular contributions to layer-dependent blood volume fMRI: A multi-modal, multi-species comparison. ISMRM. doi: http://dx.doi.org/10.7490/f...

      Note that our validation studies are quantitative in physical units of ml. This is in contrast to the significance maps in Lin et al., which might be prone to biases from different noise characteristics after injection of Gd. <br /> Also note that our validations are carried out across columnar structures (B) and laminar structures (C).

      See figures from:<br /> Huber, L., Goense, J.B.M., Kennerley, A.J., Guidi, M., Trampel, R., Turner, R., and Möller, H.E. (2015). Micro- and macrovascular contributions to layer-dependent blood volume fMRI: A multi-modal, multi-species comparison. In Proceedings of the International Society of Magnetic Resonance in Medicine, p. 2114. doi: http://dx.doi.org/10.7490/f...<br /> Huber, L., Goense, J.B.M., Kennerley, A.J., Trampel, R., Guidi, M., Ivanov, D., Gauthier, C.J., Turner, R., Möller, H.E., Reimer, E., et al. (2015). Cortical lamina-dependent blood volume changes in human brain at 7T. Neuroimage 107, 23–33.<br /> Huber, L. (2015). Mapping human brain activity by functional magnetic resonance imaging of blood volume. University of Leipzig. https://fim.nimh.nih.gov/fi... <br /> Kennerley, A.J., Huber, L., Mildner, T., Mayhew, J.E., Turner, R., Möller, H.E., and Berwick, J. (2013). Does VASO contrast really allow measurement of CBV at high field (7 T)? An in-vivo quantification using concurrent optical imaging spectroscopy. In Proceedings of the International Society of Magnetic Resonance in Medicine, p. 0757.

      In the revised version of the manuscript, we included the following additional paragraph into the discussion section:

      Note that the CBV weighting in VASO has been extensively validated by comparisons with gold-standard methods in rats and monkeys across layers and columns (Huber et al., 2015a-c; Kennerley et al., 2013).

      R3.10. The voxel size is listed as 0.89mm x 0.99mm on page 2 versus 0.79mmx0.79mmx 0.99mm on page 1. Which is correct?

      The correct resolution is 0.79 mm. This typo is corrected in the revised version of the manuscript.

      R3.11. Was the smoothing across layers a directional smoothing?

      The reviewer is correct. The layer-smoothing was applied in specific directions only. It was only applied in the direction that is parallel to the column. There was no smoothing perpendicular to this direction. <br /> Note that this way of “directional” smoothing refers to cortical directions. The smoothing was independent of the direction in the laboratory frame of reference. As such, the smoothing is applied independent of the orientation of read-direction, slice-direction and phase direction. The LAYNII program LN_DIRECT_SMOOTH was not applied in this study. <br /> An additional sentence about this is included in the revised version of the manuscript.
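
As a rough illustration of what such direction-specific smoothing means, the sketch below smooths a small grid only along one axis and leaves the other untouched. It is not the code used in the study. The grid layout (`grid[c][d]`, with `c` indexing columns and `d` indexing the position along a column), the function names, and the kernel width in grid steps are all assumptions made for this example.

```python
import math


def gaussian_kernel(sigma, radius):
    """Return a normalized 1D Gaussian kernel of length 2 * radius + 1."""
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_1d(values, kernel):
    """Smooth one profile with a symmetric kernel, repeating the edge values."""
    radius = len(kernel) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp index at the borders
            acc += w * values[j]
        out.append(acc)
    return out


def smooth_within_columns(grid, sigma):
    """Smooth along each column (second index) only; never across columns."""
    radius = max(1, round(3 * sigma))
    kernel = gaussian_kernel(sigma, radius)
    return [smooth_1d(column, kernel) for column in grid]


# Hypothetical 5 x 9 grid with a single active sample in the middle column.
grid = [[0.0] * 9 for _ in range(5)]
grid[2][4] = 1.0
smoothed = smooth_within_columns(grid, sigma=1.0)
```

In this toy case the signal spreads along column 2 only; the neighboring columns stay at zero, which is the property the response describes.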

      R3.12. Page 13- "...primary motor cortex is 4 mm (Fischl and Dale 2000), the resolution of 0.79 mm used here allows us to obtain 5-7 independent data points across the 20 layers. The number of 20 layers is chosen based on previous experience in finding a compromise". This description is hard to understand. Suggest something like- The cortical thickness of the primary motor cortex is 4 mm (Fischl and Dale 2000). With our resolution of 0.79 mm, we obtained 5-7 independent data points across the thickness of the cortex. These data points were upsampled to create 20 layers across the thickness of the cortex. Twenty layers was chosen based on previous experience in finding a compromise... These 20 layers were smoothed and extracted (tell me what you did here) in sheets to produce a reconstruction of the face of the anterior bank of the central sulcus (Figs. 3S, 6S, 10S).

      Based on the reviewer’s suggestion, we tried to provide a more detailed description of the underlying assumptions and the necessity of using so many layers in a recent blog post: https://layerfmri.com/2019/... <br /> In the revised version of the manuscript, we included the following summarizing statement:

      The cortical thickness of the primary motor cortex is 4 mm (Fischl and Dale 2000). With our resolution of 0.79 mm, we obtained 5-7 independent data points across the thickness of the cortex. Across these data points, we created 20 layers across the thickness of the cortex on a 4-fold finer grid than the effective resolution. The number of twenty layers was chosen based on previous experience in finding a compromise between data size and smoothness (see Fig. S6 in (Huber 2018)). Columnar profiles in Fig. 3 and Fig. S4 are generated from unsmoothed data. For Figs. S3 and S6, the functional signal was smoothed with 0.5 mm within columns and extracted in sheets to produce a reconstruction of the face of the anterior bank of the central sulcus. No smoothing was applied across columns.
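
The numbers in this statement can be checked with a few lines of arithmetic. The sketch below only restates the values quoted above (4 mm thickness, 0.79 mm resolution, 20 layers); the variable names are chosen for this example. The simple division gives the lower end of the quoted 5-7 range and does not model local variations in cortical thickness.

```python
cortical_thickness_mm = 4.0   # thickness of primary motor cortex quoted above
voxel_size_mm = 0.79          # nominal resolution quoted above
n_layers = 20                 # number of layers quoted above

# Number of independent samples that fit across the cortical thickness.
independent_points = cortical_thickness_mm / voxel_size_mm

# Spacing of the layer grid and how much finer it is than the voxel size.
layer_spacing_mm = cortical_thickness_mm / n_layers
upsampling_factor = voxel_size_mm / layer_spacing_mm

print(f"independent points: {independent_points:.2f}")
print(f"layer spacing (mm): {layer_spacing_mm:.2f}")
print(f"upsampling factor:  {upsampling_factor:.2f}")
```

The upsampling factor comes out close to 4, consistent with the "4-fold finer grid" in the statement.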

      R3.13. Fig. 2B- For participant 5, the copper and turquoise outlines are reversed. Hue of copper and turquoise colors are not consistent in each panel. <br /> In last panel of 2B, first line- there is a hand in this panel. What is its purpose? If the purpose is to be a key for finger color, the thumb should be magenta.

      The reviewer is right, the copper and turquoise patches seem reversed in participant 5. Note, however, that this is not a presentation error in the preparation of the images. We find that the grasping-extension patches do not follow the same organization principle along the medial-lateral direction across participants. It is highly dependent on the position of the axial projection chosen. E.g. it can be seen in Fig. S6 (and the previous version of Fig. S9) that, depending on the depth in the central sulcus, the copper and turquoise patches are either on the medial or the lateral side. Please also note that participant 5 is not an outlier here; in fact, participant 1 (in the same figure) has the same copper-turquoise alignment as participant 5. Please also note that the sensory cortex consistently shows a grasping preference across all participants.

      The additional hand pictogram had been included as a figure key to remind the reader which color refers to which finger. Based on the reviewer’s comments, it is excluded in the revised version of the manuscript. It is already shown in panel A) anyway.

      R3.14. Fig. S3C- Several features of this figure make it hard to decipher and undermine the explanation of the reconstruction method. I am assuming that the little squares in panel B are equivalent to columns. This should be stated explicitly. If the colors correspond to the fingers, then the mirror representation of the hand shown in Figs. 1-3 is nowhere to be found. This is confounding. It may be useful to show the location of the slice in panel D. Panel D is reversed from panel A, creating needless confusion. In panel C, the laminar thickness of the cortex is greater than the depth of the central sulcus. Calibrations would help but why not make the laminar thickness accurate? State explicitly that the IMAGIRO reconstruction consists of 20 layers, each like the one in B. Spelling- Columnar 'distance' <br /> It took me a long time to understand what you were doing. The descriptions of the reconstruction needs to be simple, clear and intuitive or very few will comprehend them. It all makes sense but the reader should not have to go to the blog (which I did) to understand them.

      We thank the reviewer for the suggestions to make this figure clearer. We also applaud the reviewer’s level of commitment to check the description on our blog.<br /> -> The little squares indeed refer to the columnar dimension. Additional comments are included in the caption.<br /> -> The colors do not refer to finger dominance, but to the medial-lateral position. This is included in the caption now.<br /> -> The location colors are now included in panel C, as suggested.<br /> -> Panels C and D are now switched, as suggested.<br /> -> If the laminar thickness could be accurately depicted, all 20 layers would be 2-3 mm apart in the figure. If we depicted it in the right geometry, the layers could not be separated with the naked eye. Scale bars are included as suggested, which point out how they are distorted.<br /> -> An explicit reference to the 20 layers is included.<br /> -> The typo is corrected in “distance”.

      Updated Fig. 3:

      We agree that an intuitive image is helpful. Here, we tried to find a compromise: simple, intuitive figures that represent the complexity of the analysis without making the supplementary material too long. The reviewer’s comments helped us achieve this.

      R3.15. Fig. 4S part B- Should note that this is upsampled to produce 20 layers.

      The revised version of the manuscript has an additional statement included:

      Note that the sizes of layer and column structures are smaller than the effective resolution of 0.79 mm. They are estimated in an upscaled space.

      R3.16. Fig. 9S- Why is the background of the VASO view of the anterior bank of the CS entirely red? This implies that the entire CS is related to the 5th finger. How is that possible? Why are there yellow and green patches distributed all along the CS? This arrangement is different from any of the other figures. There does not seem to be a double mirror representation in this participant. <br /> In the bottom panels, why is the view limited to just part of M1 instead of the whole of M1? In general, this figure is quite confusing and really difficult to interpret. The organization of the grasping and retraction patches is an important issue. A better explanation (illustration?) of what you are trying to convey in this figure is necessary.

      We agree with the reviewer that the previous Figure S9 could be confusing. We tried to show too many features in one figure. Our goal with this figure was to show the consistency of the finger representations across the different tasks and also to show the position of the mirrored representation along the depth of the central sulcus. Based on the reviewer’s comments, we decided to remove Fig. S9 from the manuscript. We believe that these two messages already come across from Fig. S5, S6, S9 (new).

      To answer the reviewer’s questions (for the sake of his/her curiosity): <br /> -> The top-right figure was included for the sake of orientation. It was not included to suggest the significance of the mirrored pattern. Thus, we did not threshold the finger dominances at all. In areas outside the hand knob, therefore, the finger-preference measure for all fingers is close to 0. The red color outside the hand knob does not mean that this finger is represented there. It only means that all the other fingers are even noisier, e.g. that the finger preference for the index finger is 0.0014 compared to other fingers with a finger preference of 0.0005. For reference, in the hand knob, the finger preferences are in the range 0.3-1 (please see Fig. 3B for the absolute selectivity strengths in and outside the hand knob). The previous figure S9 corresponds to the line graph in Fig. 3B from above. <br /> -> We believe that there is, in fact, a mirrored pattern visible in this figure. Within the Brodmann area subsection BA4a, the color pattern is reversed.
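
The effect described in the first point can be reproduced with a toy calculation. The sketch below is only an illustration of the argument, not the analysis used for the figure. The function name, the position of the index finger in the list, the example values inside the hand knob, and the threshold of 0.3 (taken from the lower end of the range quoted above) are assumptions.

```python
def dominant_finger(preferences, threshold=0.0):
    """Return the index of the strongest finger, or None if it is below threshold."""
    best = max(range(len(preferences)), key=lambda i: preferences[i])
    return best if preferences[best] >= threshold else None


# Finger-preference values per voxel, ordered thumb to little finger (assumed order).
outside_knob = [0.0005, 0.0014, 0.0005, 0.0005, 0.0005]  # magnitudes quoted above
inside_knob = [0.1, 0.7, 0.2, 0.1, 0.1]                   # hypothetical values

# Without a threshold, the noise voxel still receives a finger color.
unthresholded_outside = dominant_finger(outside_knob)

# With a threshold on absolute strength, only the hand-knob voxel is labeled.
thresholded_outside = dominant_finger(outside_knob, threshold=0.3)
thresholded_inside = dominant_finger(inside_knob, threshold=0.3)
```

Both voxels get the same label when no threshold is applied, even though their absolute preferences differ by more than two orders of magnitude.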

      R3.17. Fig. 10S- in the right panel, the orientation seems to be incorrect. That is, left is lateral and right is medial which means the left ear arrow should be pointing to the right.

      We agree, the arrow description now says “right” ear.

      R3.18. I suggest alphabetizing the reference list.

      In the updated reference list “S” is after “O”.

      R3.19. The correct citation is- Meier JD, Aflalo TN, Kastner S, Graziano MS. Complex organization of human primary motor cortex: a high-resolution fMRI study. J Neurophysiol. 2008 Oct;100(4):1800-12. doi: 10.1152/jn.90531.2008. Epub 2008 Aug 6

      The reference is updated

    2. On 2019-02-26 13:10:37, user Laurentius Huber wrote:

      Please find a formatted version of this response letter with figures here: https://goo.gl/3czXWG

      Response Letter:<br /> We thank the referees for reviewing our manuscript entitled “Sub-millimeter fMRI reveals multiple topographical digit representations that form action maps in human motor cortex”. The critical reading of this manuscript is highly appreciated, and we believe that the comments have helped to improve the manuscript and clarify the interpretation of the presented results. The manuscript has been modified according to the reviewers’ suggestions.<br /> All points raised by the reviewers have been addressed in detail below.

      Reviewer #1:<br /> R1.1 <br /> This is a very interesting study investigating the spatial organization of hand movement representations in M1. Certainly the hand representation in M1 is likely complex and therefore requires advanced methods to probe. Both imaging and neurophysiological evidence clearly suggests that M1 is not so much concerned with the representation of fingers, but rather of complex hand movements. The use of a winner-take-all map for fingers is therefore likely a less effective way of gaining a deeper understanding of the organization of M1.

      We thank the reviewer for his/her expert assessment and for appreciating the necessity of advanced methodology to investigate the complex representations in M1.

      We would like to comment on the reviewer’s statement that “imaging and neurophysiological evidence clearly suggests that M1 is not so much concerned with the representation of fingers, but rather of complex hand movements”. We agree that there is imaging and electrophysiological evidence that parts of M1 can represent complex hand movements. However, we take issue with the notion that it is established that the entire M1 must behave like this. We believe this is only part of the entire picture. <br /> In fact, physiological support for the control of the mentioned “complex hand movement” and muscle and movement synergies comes from investigations of cortico-motoneuronal (CM) cells (CM cells are the ones with motor neurons innervating shoulder, elbow, and finger muscles). Note, however, that these representations and these cells are confined to the caudal part of M1 (also known as the “new” M1 or Brodmann area BA4p). This is the evolutionarily younger part of M1 that is located deep in the central sulcus. In this part of M1, individual body parts are largely overlapping (probably to facilitate complex hand movement) and finger dominance maps might be misleading (as the reviewer suggested).

      However, we would like to note that there are no such CM cells in the rostral M1 (Rathelot and Strick, 2006, 2009). As pointed out in Fig. S9 of our manuscript, the new finding of mirrored finger representations is solely visible in the rostral M1 (a.k.a. “old” M1 or BA4a). In this evolutionarily old part of M1, body part movements (e.g. hand, elbow, shoulder) have locally distinct domains with less overlap compared to BA4p.<br /> Thus, we respectfully disagree with the reviewer about the effectiveness of finger dominance maps. These maps are extensively used in imaging and electrophysiology and have led to important findings throughout the last century (Woolsey 1979; Hlustik 2001; Indovina 2001; Sanes 1995; Penfield 1937; Schieber 1993; Schellekens 2018; Olman 2012; Siero 2014). We don’t want to discredit this large body of literature on body part maps, and we would also like to use the tool of finger dominance maps when appropriate.

      We would also like to point out that at no point in this analysis are we estimating “winner-takes-all maps”. We are aware of the shortcomings of winner-takes-all maps, and thus the finger-dominance maps that we depict in many figures are not binary. Instead, our finger-dominance maps are shown with a continuous color scale. Every voxel has a relative value (from 0 to 1) of how much it is dominated by that finger. This analysis retains the fact that multiple fingers can be represented in the same voxel.<br /> For even more quantitative interpretations (e.g. to avoid that the color of one finger covers the color of another finger that is more weakly represented), we included Fig. 3B, which shows the mirrored representation in column profiles.
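
The difference between the two kinds of map can be sketched in a few lines. The exact dominance measure is not spelled out in this letter, so the normalization below (each finger's response divided by the summed responses of all fingers) is purely an assumption for illustration, as are the function names and the example responses.

```python
def winner_take_all(responses):
    """Return only the index of the strongest finger (binary labeling)."""
    return max(range(len(responses)), key=lambda i: responses[i])


def relative_dominance(responses):
    """Return each finger's share of the total non-negative response, in [0, 1]."""
    clipped = [max(r, 0.0) for r in responses]
    total = sum(clipped)
    if total == 0.0:
        return [0.0] * len(responses)
    return [r / total for r in clipped]


# Hypothetical responses of one voxel to movements of five fingers.
voxel = [0.9, 0.8, 0.1, 0.0, 0.0]

winner = winner_take_all(voxel)      # keeps a single label and hides the second finger
shares = relative_dominance(voxel)   # keeps the contribution of every finger
```

In this toy voxel the first two fingers respond almost equally strongly. A winner-take-all label reports only the first, whereas the continuous measure shows that both contribute.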

      The methods presented in this paper are carefully applied and well documented. In fact the authors have made the tools and data available in an open repository, for which they are to be commended. I really have no quibbles with the processing or VASO approach, both of which have extensive prior publication history.

      We thank the reviewer for recognizing the importance of investigating the organization of M1 and we are delighted that the reviewer considers our methods adequate.

      R1.2 <br /> The paper is clearly written and illustrated. However the crux of the problem lies in the extent of the novelty of the imaging sequence versus the lack of novelty in the neuroscience findings. Certainly practitioners of VASO have made a convincing argument for its superiority over GE-EPI BOLD for the localization of function at the mesoscopic scale and I certainly am convinced of that. Nonetheless researchers around the globe have used GE-EPI to look at various columnar structures in animal and human brain with some degree of success. While the results in this paper are amongst the clearest, the spatial resolution doesn't really go beyond what Cheng et al. used in their Neuron paper in 2001. So while VASO is certainly a viable and perhaps better alternative to BOLD, this manuscript doesn't really advance the MRI side of the equation much beyond what these authors and others have already shown.

      We thank the reviewer for appreciating the clarity of the manuscript and for appreciating the value of VASO in high-resolution fMRI.<br /> Given the reviewer’s doubts about the novelty, we would like to explicitly point out the methodological advancements we achieved and the novel neuroscience findings that we report.

      Methodological Novelty:<br /> We agree with the reviewer that previous studies could already achieve sub-millimeter in-plane resolutions. Note, however, that previous papers (including the Cheng paper) relied on flat portions of cortex and collapsed the third dimension along 3-4mm thick MR-slices. This means that previous MRI methods to investigate “columnar” alignment were not applicable across people and certainly not across the entire precentral M1 gray matter bank with its characteristic Omega-like folding pattern. VASO has never before proven its applicability for sub-millimeter “columnar” imaging, and certainly not along the curved cortex. This is a novel achievement. <br /> We agree with the reviewer that we could previously already show indications of layer results (with submillimeter in-plane resolution). Please note, however, that our previous methodology was limited to a very small FOV of less than 3cm in read direction and less than 2cm in slice direction, resulting in a coverage that could only capture 0.8% of the cortex. In previous studies, this was sufficient to address research questions about individual chunks of the cortex. However, it is not sufficient for topographical mapping of “columnar” organization. One fundamental achievement of this study is that we developed a fundamentally new acquisition approach that allows us to achieve 22% of brain coverage. This was achieved with the novel development of advanced readout strategies. As such, we invested two years of development for the inclusion of advanced GRAPPA reconstruction, asymmetric echoes, and corresponding reconstruction to image space. Compared to our previous methods, the resulting coverage is more than an order of magnitude bigger. This is fundamentally novel and enabled the present study in the first place. <br /> In this study we developed a fundamentally new analysis methodology. 
The corresponding LAYNII software package used here allows columnar and laminar signal pooling in the voxel space of the native EPI space. There is no other analysis method that can achieve this. While there are previous automatic software packages (e.g. FreeSurfer, CBS-Tools etc.) that allow similar analysis steps, they are not suitable to detect ‘columnar’ structures that are smaller than 1mm (5 digits in 3mm) within the curved cortex. These methods require closed surfaces (not possible with partial brain coverage) and alignment with ‘anatomical’ data (which requires spatial resampling, i.e. blurring). Previous methods work in vertex space (not voxel space) and thus are associated with resolution loss during spatial resampling, which makes the neighboring finger representations merge and disappear. The mirrored finger results are only this clearly visible with all the above analysis advancements. Thus, we consider these advancements a fundamental methodological novelty. <br /> Other methodological analysis novelties developed here are the columnar smoothing without signal leakage across sulci, laminar point-spread function estimation (Fig. S3, S8), layering in 3D with isotropic voxels (not only 2D as previously), and cortical unfolding in voxel space.

      Biological novelty<br /> With respect to the referenced study from Cheng et al., we would like to point out that they showed patterns that resembled the expected shape and size of columns but never established such structure and organization. There is no expected ground truth of ocular dominance column alignment (e.g. where to find which columns). This is different for our study. We can differentiate between a random columnar pattern and a meaningful somatotopic organization, with neighbouring fingers being represented in neighbouring columns. This form of meaningful columnar mapping at submillimeter scale is novel compared to Cheng et al.<br /> As opposed to previous columnar fMRI studies, we do not simply try to depict known structures with known shape and size as a proof-of-principle for a method. Instead, we are finding previously unknown organization principles of sub-millimeter representations in M1. This is a fundamentally new approach and a paradigm shift for the field of “columnar” and “laminar” fMRI. <br /> We report fundamentally new neuroscientific insights about how the previously described action representations in the microscopic regime are integrated into previously described body-part representations in the macroscopic regime. This was not described until now and is a fundamental novelty of this study.<br /> We agree with the reviewer that previous studies (including Ejaz et al.) found deviations from the homunculus model. It is not clear until now, however, how these deviations (multiple representations and fractionalizations) come about. Are these deviations from the linear body-part alignments just randomly aligned? Or do the deviations follow a specific geometric order? If yes, which one? According to which order are the movement actions aligned? 
In this study we find -for the first time- mirrored representations of individual digits in the primary motor cortex that are differently engages for different actions. This is novel and has not been described previously.

      In the revised version of the manuscript, we tried to stress the novelty of the study.

      R1.2 <br /> So we are left with the importance of the neuroscientific findings, and here I have some more serious issues. The organization of M1 and S1 along an action-axis is well known and certainly not as mysterious as the authors would represent.

      We agree with the reviewer that there are previous accounts of action representations in the motor cortex. We are describing them as part of our introduction and discussion section. We did not intend to describe them as ‘mysterious’ by any means. The point that we are trying to make is that these action representations are partly in conflict with somatotopic organization principles that are found in most of the high-resolution imaging literature (e.g. Schellekens 2018; Olman 2012; Siero 2014).

      In the revised version of the manuscript, we emphasize the study by [Ejaz et al., 2015] even more, in a paragraph dedicated to it.

      R1.3 <br /> Furthermore, they have dismissed a paper that shows a similar result using MRI by misrepresenting the findings of that paper as I understand them (Ejaz et al., 2015, Nature Neurosci). <br /> Specifically, in reference to that paper, Huber et al. state that 1) the work argues for a simple topographic arrangement of single finger representations in S1, and 2) that the overlap between finger activation patterns is "due to noise". In that work (Ejaz et al., 2015), they used BOLD fMRI to measure the activity patterns evoked by single- and multi-finger movements in M1 and S1. The spatial arrangements of these patterns in both regions were stable within each participant (compared across different scanning sessions), but highly variable across participants. These finger patterns are shown in Fig. 1 of that paper. Close visual inspection of the patterns reveals they do not follow a clear linear arrangement in either S1 or M1, and perhaps some evidence of digit "mirroring" can be observed - definitely there are parts of the cortex activated for the thumb at the dorsal end of the hand region.

      They then calculate the dissimilarity between all pairs of finger patterns for M1 and S1, separately. Importantly, the relative dissimilarity between any pair of activity patterns (within a participant) was highly stable across participants. This is notable given that the spatial arrangements of these patterns were highly variable across individuals. One stable characteristic was that the thumb pattern was more similar to the little finger than to the ring finger. This finding clearly shows - contrary to what Huber et al. claim it shows - that a simple linear somatotopic arrangement cannot account for the digit representations in M1 or S1.
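For readers unfamiliar with this kind of analysis, the idea of a pairwise dissimilarity between finger activity patterns can be sketched as follows. This is only an illustration: the function name `pairwise_dissimilarity` and the use of correlation distance (1 minus Pearson correlation) are assumptions made here, and are not necessarily the distance measure used in Ejaz et al. (2015).

```python
import numpy as np

def pairwise_dissimilarity(patterns):
    """Return a (n_fingers x n_fingers) matrix of correlation distances.

    patterns: array of shape (n_fingers, n_voxels), one activity pattern
    per finger. Correlation distance is an illustrative choice only.
    """
    patterns = np.asarray(patterns, dtype=float)
    # np.corrcoef treats each row as one variable (here: one finger pattern).
    return 1.0 - np.corrcoef(patterns)
```

A matrix like this can be stable across participants even when the spatial layout of the underlying patterns differs, because it only depends on how the patterns relate to one another, not on where they lie on the cortex.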

      1.) Our justification for the statements in the previous version of the paper:<br /> We assume the reviewer refers to the citation on page 5 of the original manuscript:

      “In the primary somatosensory cortex, we find no clear deviations from the homunculus model as shown previously in humans (Ejaz 2015; Schluppeck 2017; Olman 2012; Kolasinski 2016; Shellekens 2018).”

      This statement in our manuscript was based on the following paragraph in [Ejaz et al., 2015] from page 1034:

      “There was some consistency: when averaging activity patterns across participants (Fig. 1), a blurry somatotopic arrangement became visible with the thumb activating more ventral and the other fingers more dorsal areas of the motor strip.”

      Figure caption: adapted screenshot from Fig. 1 of Ejaz et al. Subject-average activation maps show rough features of a linear somatotopic arrangement (with secondary deviations). The thumb representation peaks at the bottom (pink arrow) and the remaining fingers are linearly aligned, with the little finger representation peaking at the top (red arrow).<br /> We also noticed indications of a secondary thumb representation in Fig. 1 of [Ejaz et al., 2015] next to the index finger. We discussed these double-thumb indications in the Ejaz et al. figures extensively among ourselves and eventually decided not to elaborate on them in our manuscript for the following reasons:<br /> In our own pilot studies, we noticed that for some kinds of thumb movement tasks, the thumb movement can come along with unwanted secondary wrist movement. This was not the case for index/middle/ring/pinky-finger movements. Since the wrist movement representations are expected to be located next to the pinky finger, we suspected that the secondary thumb representation from Ejaz et al. might actually reflect unwanted wrist movement.<br /> In our own BOLD data, we find some cases of signal leakage from S1 to M1 (across the central sulcus), which might introduce artifactual double representations in M1. Since Ejaz et al. also used BOLD sequences, we speculate that this might have been the case in those data too. <br /> The text of the paper [Ejaz et al., 2015] does not discuss the secondary blob at all. Neither does it mention it in the context of a potential double representation or mirrored representation. Thus we are hesitant to include it as a reference for this feature. It would be more appropriate for us to give the authors of [Ejaz et al., 2015] full credit for the discovery of mirrored representations if they had described it and discussed it consistently across people.

      Note further that the above statement in our preprint referred to the sensory cortex, not the motor cortex.

      Revision to avoid future misunderstandings:<br /> We think this misunderstanding can be resolved by removing the [Ejaz et al. 2015] citation on page 5. Instead we discuss the paper in more depth on page 7.

      R1.4 <br /> Furthermore, they (Ejaz et al.) go on to show that the stable structure of overlap of finger representations in M1 and S1 can be accounted for by the statistics of everyday hand movement. They did not interpret the spatial variability of these patterns as "noise due to inter-individual variability in every day hand movements". On the contrary, the statistics of hand use they showed is stable across individuals (also see Ingram et al., 2008, Exp. Brain Res.), as is the organizing principle underlying the spatial organization of activity patterns in M1 and S1.

      1.) Justification for our statements in the previous version of the paper:<br /> We assume the comment from the reviewer refers to the following section of our manuscript on page 6:

      “Previous studies by Sane et al. (1995) and by Ejaz et al. (2015) already identified deviations from linear organizations for finger representations in the human motor cortex with GE-BOLD at 2.5 mm and 1.4 mm resolutions, respectively. However, without the localization specificity, a consistent spatial layout principle, such as the mirrored finger representation alignment, was not found. Instead, the exact pattern of overlapping and segregated representations was interpreted as noise due to inter-individual variability in every day hand movements (Ejaz 2015).”

      We included this interpretation of the results of Ejaz et al. based on the first few sentences of the discussion section in [Ejaz et al., 2015] on page 1039:

      “The relative similarities between activity patterns were preserved across individuals, despite the substantial spatial inter-subject variability of the activity patterns themselves. The representational structure remained invariant even when the shared somatotopic arrangement of the digits was removed from the data. This suggests an organizing mechanism that shapes the overlap between patterns without enforcing a regular spatial layout. The representational structure could be predicted by the natural statistics of hand use.“

      If we understand the highlighted section correctly, Ejaz et al. found that there are deviations from a simple somatotopic organization. And the patterns of these deviations have a considerable variability across people. It is not clear, however, according to which consistent organization principle this variability comes about.

      In our view, we thus (mis-)described the phrase “inter-individual variability without given structure” with the term “noise due to inter-individual variability”.

      Revision to avoid future misunderstandings:<br /> We agree that the term “noise due to inter-individual variability” might be misleading to describe “inter-individual variability”. In the revised version of the manuscript, the corresponding section is revised as follows:<br /> A previous study by Ejaz et al. (2015) already identified deviations from linear organizations for finger representations in the human motor cortex with GE-BOLD at 2.5 mm and 1.4 mm resolutions, respectively. These data already showed some indications of multiple finger representations (e.g. Fig. 1 in (Ejaz et al. 2015)). However, these data were not discussed with respect to an alternative geometric somatotopic organization principle such as a mirrored representation.

      R1.5 <br /> I definitely agree with the authors that M1 organization is more complex arrangement than simple linear finger organization. Whether the organization really is best described by two discrete finger maps with phase reversal, however, really has to await a more rigorous experimental and statistical evaluation than even what is presented in Huber et al. Whatever the answer may be, however, I do think that the improved specificity of VASO sequence may play an important role in uncovering such representations in the future, but I don't feel that what has been shown goes much beyond what is known from the literature already.

      We are glad that the reviewer agrees with our work showing that the M1 representations can be complex. We agree that the literature needs to be augmented with more rigorous studies.<br /> In fact, with the manuscript at hand we intend to do just that: provide a more rigorous experimental evaluation. We aim to move beyond the position of Ejaz et al. Namely, we aim to go beyond the conclusion “that the motor cortex is more complicated than individual finger representations”, and describe how it is different, how these differences are geometrically organized, and whether they are stable across people.<br /> Taking into account the large body of electrophysiological and micro-stimulation evidence about the body-part sub-divisions in M1, we set out to see how these representations agree with the results from Ejaz et al.<br /> In previous imaging studies (including Ejaz et al.) it was common to view M1 as one large chunk of cortex that would follow the same architectonic principle. There is a large body of invasive literature, however, that suggests that this is not correct, neither functionally (Rathelot and Strick, 2006, 2009) nor anatomically (Geyer 1996). Thus, we intend to describe the body-part representations with a more rigorous fine-scale evaluation. To get there, we developed the advanced methodology as described here. And we start by describing the simplest movement principle of the literature (finger tapping) in the simplest part of M1, namely the evolutionarily “old” M1 that has been described as body part representations. <br /> Thus, we feel that our findings go beyond what is known from the literature already.

      Reviewer #3: <br /> General Comments: <br /> This paper uses the vascular space occupancy (VASO) method of measuring cerebral blood volume (fMRI) to explore the somatotopy of the finger representation at a sub-millimeter resolution in M1 and S1 of humans. This is an important problem as prior fMRI papers exploring this issue did not have sufficient resolution to adequately address a fine grained topography for fingers. This paper appears to have adequate resolution (~0.8mm) to make a major contribution to understanding the topography of the hand in M1 as well as S1. As such, this paper is primarily one of anatomical location and fMRI reconstruction. In addition, it addresses the issue of whether a given body part representation is always active when that body part is moved. The answer is that there is functional specialization within each M1 finger representation. The figures are complex and it is paramount that their display is straightforward, consistent and simple to understand.

      R3.1. The stated goal of this paper is to "non-invasively investigate the functional organization topography across columnar and laminar structures in humans", particularly M1 and S1. To understand the topography of the fingers in M1, the entire extent of the finger representations in M1 must be accurately mapped. Such maps are shown in Figs. 6S and 10S. These maps, for each participant, could form the core of an important paper, but they belong in the main body of the paper. They also need to be shown systematically for each participant. The data showing the columnar organization of M1 and S1 seem like important validating information for the reconstruction of the central sulcus. Some of this could be moved to the Supplementary information. What is currently displayed in Figs. 1-5 is just a small sample from the entire extent of slices through M1. Although the concept of mirror hand representations derived from single slices is appealing, it only represents a small fraction of the entire map of the central sulcus. Furthermore, the single fMRI slices totally ignore the finger representations present in the depth of the central sulcus.

      We would like to clarify the goal of this study. We feel the quoted section was taken out of context. As mentioned in the abstract, it was not our goal to ‘investigate the complete topographical organization of the motor cortex in its entirety’. Instead, the quoted section comes from an introductory sentence that states that our goal actually was to ‘develop imaging and analysis methodology, which -in principle- allows us to investigate topographical features’. In a next step we then use the M1/S1 system as a test bed to investigate the neuroscientific usefulness of that methodology. Given that we find previously undescribed neuroscientific features, namely the mirrored digit representations, we think that the neuroscientific usefulness is confirmed. In this sense, we see our manuscript as lying on a fine line between a methods paper and a neuroscience paper.

      We agree with the reviewer that every figure in the Manuscript and the Supplementary information is “tuned” to a specific message that we want to bring across. We further agree that Figs. 1-5 in the main manuscript are just a small sample of the main story and there is much more information to be seen. We don’t see this as a weakness of the manuscript, but as a means to follow comment R3.14, namely selectively showing figures that have a specific message, which comes across as intuitively as possible.

      In order to discuss the mirrored pattern of digit representations, we find it most natural to zoom into the hand area (Fig. 1). Correspondingly, when it comes to showing the inter-participant consistency of this feature (Fig. 2), we find it advantageous to use the same imaging procedure across all people as in Fig. 1. However, when it comes to explaining where these features are located across the dimensions of the central sulcus, we show additional unzoomed images. <br /> We agree with the reviewer that entire maps of the unflattened sensory-motor system would give a more comprehensive view. However, they would distract the reader from the feature of interest. Those entire maps would mostly contain nothing (e.g. all the non-stimulated body parts, trunk, face, feet, etc.) and the 3-8 mm of interest would be tiny (e.g. see Fig. S6). <br /> To address the reviewer’s comment, we included the full maps of the central sulcus in the main body of the manuscript (new figure 3), in addition to the zoomed images.<br /> Furthermore, we included additional IMAGIRO maps (as requested) for more participants, with zoomed and unzoomed sections to show the reader which portion of the superior part of M1 they refer to (see new Fig. S6E).

      The field of laminar and columnar fMRI is still emerging. Thus, not all potential sources of analysis artifacts are fully described and understood. To minimize potential misinterpretation, it has been suggested to depict the final results as close to the raw data as possible (Polimeni 2017; Kay 2019). Thus we try to show the activation maps in the raw EPI space (Fig. 1,2,4) when possible. This way, it can be directly appreciated that the mirrored finger pattern is not an artifact of a flawed unfolding step. Furthermore, the activity maps in EPI space best depict the spatial scale of columnar size with respect to the cortical thickness and location at the hand knob. Flattened maps are produced by several additional steps and are presented in a very abstract space where these reference dimensions are lost. Thus, we are hesitant to remove the activation maps on the folded cortex from the manuscript. However, we included additional unfolded flattened maps in the supplementary material.

      Please note that we are also required to follow the Journal’s Guidelines to only include material that is central to the narrative. In doing so, we follow the rule of not having more than twice as many supplementary figures as figures in the main text. Thus, we included some of the additional maps as figure panels, not as additional stand-alone figures.

      We revised the manuscript to account for the reviewer’s comment. Specifically, we rephrased the abstract and introduction section to make our goals clearer. We also tried to make it clearer what the message is for each figure, in the figure captions respectively.

      Kay, K., Jamison, K., Vizioli, L., Zhang, R., Margalit, E., & Ugurbil, K. (2019). A critical assessment of data quality and venous effects in sub-millimeter fMRI. NeuroImage, 189, 847–869. http://doi.org/10.1016/j.ne... <br /> Polimeni, J. R., Renvall, V., Zaretskaya, N., & Fischl, B. (2017). Analysis strategies for high-resolution UHF-fMRI data. NeuroImage, (April), 1–25. http://doi.org/10.1016/j.ne...

      R3.2. The orientation of brain images and reconstructions should be the same in every figure. For example, Fig. 1A and 1E seem to have the right side of the brain image toward the right whereas Fig. 1B-D has it to the left. In Fig. 6S, the orientation of the CS appears to be opposite to that shown in Fig. 10S. Continually forcing the reader to flip the images creates unnecessary confusion. Since this paper shows the right hemisphere, left/medial should be on page left and right/lateral should be on page right. The terms medial and lateral are preferable to left and right. In Figs. 6S, 10S, the actual location of the medial wall/sagittal fissure should be indicated. Without this marker, the CS just floats in space with no anchor to the actual brain image. A calibration should be included on each image.

      We agree that the orientation is confusing. This comes from the fact that the convention for MRI images is to view them as they would look from the experimenter’s perspective, e.g. looking at an axial cut from the perspective of the participant’s feet. The right motor cortex of the person is then depicted on the left. This contradicts the 3D head models, which are viewed from above. Thus, the combination of 3D views and 2D views was confusing.<br /> Based on the reviewer’s comments, we tried to make it more consistent in Fig. 1, S6 and S10. This means, however, that the 3D head models are mirrored compared to their real-life counterparts. <br /> We included additional calibration markers and the landmarks of the medial wall in multiple figures, e.g. Fig. S6, S9, S3.

      R3.3. The term 'multiple' is used incorrectly throughout the manuscript. Multiple means 'more than 2'.

      We respectfully disagree with the reviewer on this point. In our understanding, the term ‘multiple’ refers to ‘more than one’ (source: https://en.oxforddictionaries.com/definition/us/multi-). We chose this deliberately vague term. We find only two mirrored representations consistently across all participants. However, we cannot exclude the possibility that there are more representations hidden below the detection threshold. Since absence of evidence is not the same as evidence of absence, we would like to refrain from calling it a “double” representation, as this would exclude the possibility of a third or fourth representation. <br /> In one participant, with a large tilting angle and a very low threshold, we see indications of a third representation. However, since it is not reproducible across participants, its discussion is left to future experiments with more sensitive imaging methodology.

      R3.4. It is unclear how the images in Fig. 1E were developed. What do the colors mean? Why is this representation shown here when it is not used until Figs. 3S, 6S.

      Fig. 1 was intended as a figure describing the methods applied in this study. Thus, we included the coordinate system of layers and columns in 3D grids as they are used for the directional smoothing. We agree with the reviewer that it can be confusing. We thus removed panel E from the figure in the revised version of the manuscript.

      R3.5. Discussion- <br /> The requested revisions in the data presentation will require revision of comparisons to other fMRI papers. <br /> The Discussion would be improved by a more extensive comparison to studies in monkeys where most of the mapping of M1 has occurred. An excellent brief summary of the monkey literature may be found in the section written by Paul Cheney in Omrani et al, 2017. The discussion should address two issues. <br /> First, a comparison of the organization of human M1 to the anatomical and physiological explorations of this region in the monkey. Second, the issue of specialization (separate regions of grasping and retraction) has its basis in monkey data that indicates specialization of M1 neurons for specific tasks.

      We agree with the reviewer that the summary from Cheney provides a nice summary about representations in the motor cortex learned from monkey experiments. Based on this summary, we included an additional paragraph into the discussion section that should address the two issues.

      Most of the knowledge on the functional representation of movements in the primary motor cortex has been obtained from countless experiments in monkeys over the last century. The current state of consensus in the field is nicely summarized by Paul Cheney in (Omrani 2017; see also references therein). Overall, corticomotoneuronal cells in the primary motor cortex encode muscle-related parameters of movement such as muscle activity and muscle force. Although some corticomotoneuronal cells in the primary motor cortex (particularly those involved with finger movements) have their terminations confined to motoneurons of single muscles, a large number of corticomotoneuronal cells are not rigidly coupled to the activity of their target muscles but show specialization for particular movements or categories of muscle activity. Namely, almost half of the corticomotoneuronal cells facilitate muscles involving at least one distal and one proximal joint and are specialized for specific muscle synergies, e.g. for reach-to-grasp movements. With respect to the action representations shown in Fig. 2B, it is important to note that Cheney and Fetz (1985) had previously identified the muscle fields of neighboring corticomotoneuronal cells. They showed that neighboring corticomotoneuronal cells had muscle fields that were very similar. Hence, the notion of cortical patches that are preferentially activated for grasping and retraction actions (Fig. 2B) has its basis in previous monkey data and could refer to these previously described muscle fields.

      Specific Comments:

      R3.6. The first sentence of the Significance statement is incomprehensible. In general, the significance of this study is not well explained.

      The significance statement has been removed from the revised version of the manuscript.

      R3.7. Introduction- Sanes et al., 1995 did not study monkeys.

      We agree with the reviewer. The Sanes reference is moved to a different section now.

      R3.8. "However, the organizational principle of smaller body parts such as individual digits could not be resolved due to the lack of localization specificity of conventional GE-BOLD fMRI and the sparse sampling of invasive electrophysiological recordings." This may be true for fMRI but the electrophysiological stimulation in monkeys (Kwan et al. 1978; Strick and Preston, 1982 [up to 16 penetrations per 1 mm2]; and Park et al. 2001) can hardly be described as sparse.

      We agree with the reviewer that the term “sparse” might be misleading and does not do those experiments justice. The point we were trying to make is that fMRI is inherently a continuous mapping technique that samples the entire cortical sheet without any gaps between electrodes. This is true even at low resolutions. To address the reviewer’s comment, we revised the paragraph in the introduction section.

      R3.9. Lin et al 2011 is often used as evidence that VASO accurately measures CBV. However, close examination of Fig. 1 in Lin et al reveals that the VASO and Gd-DTPA blood volume measurements often do not occupy the same voxels. That is, many VASO voxels with significant activation have no significant Gd-DTPA activation and many Gd-DTPA voxels with significant activation have no VASO activation. This observation suggests that VASO does not accurately represent CBV when voxel to voxel comparisons are made by the two different methods of measuring CBV. What other evidence, other than theoretical, indicates that VASO accurately measures CBV? (Lin AL, Lu H, Fox PT, Duong TQ. Cerebral blood volume measurements- Gd-DTPA vs. VASO - and their relationship with cerebral blood flow in activated human visual cortex. Open Neuroimag. J. 2011; 5: 90-95.)

      We share the reviewer’s concerns about whether VASO is a good measure of CBV. For this reason, we have validated our SS-SI-VASO variant against gold-standard methods in multiple setups over the last 5 years, ranging from concomitant VASO imaging with optical imaging spectroscopy in rats up to validations of layer-dependent VASO signal with MION/Feraheme imaging in rats and monkeys.

      While we agree that Fig. 1 in Lin et al., shows deviations of VASO and Gd-DTPA, we would like to refrain from speculating what might be the reason for this. Reasons could range from acquisition challenges up to analysis inconsistencies. See the following reference:

      Huber, L., et al. (2015). Micro- and macrovascular contributions to layer-dependent blood volume fMRI: A multi-modal, multi-species comparison. ISMRM. doi: http://dx.doi.org/10.7490/f...

      Note that our validation studies are quantitative in physical units of ml. This is in contrast to the significance maps in Lin et al., which might be prone to biases from different noise characteristics after the injection of Gd. <br /> Also note that our validations are carried out across columnar structures (B) and laminar structures (C).

      See figures from:<br /> Huber, L., Goense, J.B.M., Kennerley, A.J., Guidi, M., Trampel, R., Turner, R., and Möller, H.E. (2015). Micro- and macrovascular contributions to layer-dependent blood volume fMRI: A multi-modal, multi-species comparison. In Proceedings of the International Society of Magnetic Resonance in Medicine, p. 2114. doi: http://dx.doi.org/10.7490/f...<br /> Huber, L., Goense, J.B.M., Kennerley, A.J., Trampel, R., Guidi, M., Ivanov, D., Gauthier, C.J., Turner, R., Möller, H.E., Reimer, E., et al. (2015). Cortical lamina-dependent blood volume changes in human brain at 7T. Neuroimage 107, 23–33.<br /> Huber, L. (2015). Mapping human brain activity by functional magnetic resonance imaging of blood volume. University of Leipzig. https://fim.nimh.nih.gov/fi... <br /> Kennerley, A.J., Huber, L., Mildner, T., Mayhew, J.E., Turner, R., Möller, H.E., and Berwick, J. (2013). Does VASO contrast really allow measurement of CBV at high field (7 T)? An in-vivo quantification using concurrent optical imaging spectroscopy. In Proceedings of the International Society of Magnetic Resonance in Medicine, p. 0757.

      In the revised version of the manuscript, we included the following additional paragraph into the discussion section:

      Note that the CBV weighting in VASO has been extensively validated by comparisons with gold-standard methods in rats and monkeys across layers and columns (Huber et al., 2015a-c; Kennerley et al., 2013).

      R3.10. The voxel size is listed as 0.89mm x 0.99mm on page 2 versus 0.79mmx0.79mmx 0.99mm on page 1. Which is correct?

      The correct resolution is 0.79 mm. This typo is corrected in the revised version of the manuscript.

      R3.11. Was the smoothing across layers a directional smoothing?

      The reviewer is correct. The layer-smoothing was applied in specific directions only. It was only applied in the direction that is parallel to the column. There was no smoothing perpendicular to this direction. <br /> Note that this way of “directional” smoothing refers to cortical directions. The smoothing was independent of the direction in the laboratory frame of reference. As such, the smoothing is applied independent of the orientation of read-direction, slice-direction and phase direction. The LAYNII program LN_DIRECT_SMOOTH was not applied in this study. <br /> An additional sentence about this is included in the revised version of the manuscript.
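The principle of smoothing only within a column can be sketched as follows. This is a minimal illustration and not the code used in the study: the function name `smooth_within_columns`, the flat per-voxel inputs (`column_id`, `depth_mm`), and the interpretation of the kernel width as a Gaussian FWHM are all assumptions made for this example.

```python
import numpy as np

def smooth_within_columns(values, column_id, depth_mm, fwhm_mm=0.5):
    """Gaussian-smooth voxel values along cortical depth, one column at a time.

    values, column_id and depth_mm are 1D arrays with one entry per voxel.
    Voxels only exchange signal with voxels that carry the same column label,
    so nothing leaks across columns (hypothetical helper, for illustration).
    """
    values = np.asarray(values, dtype=float)
    column_id = np.asarray(column_id)
    depth_mm = np.asarray(depth_mm, dtype=float)
    # Convert FWHM to the standard deviation of the Gaussian kernel.
    sigma = fwhm_mm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    smoothed = np.empty_like(values)
    for col in np.unique(column_id):
        idx = np.flatnonzero(column_id == col)
        # Pairwise depth differences between voxels of this column only.
        d = depth_mm[idx][:, None] - depth_mm[idx][None, :]
        w = np.exp(-0.5 * (d / sigma) ** 2)
        w /= w.sum(axis=1, keepdims=True)  # each output is a weighted mean
        smoothed[idx] = w @ values[idx]
    return smoothed
```

Because the weights are defined on cortical coordinates (column label and depth) rather than on scanner axes, the result does not depend on the read, phase or slice direction.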

      R3.12. Page 13- "...primary motor cortex is 4 mm (Fischl and Dale 2000), the resolution of 0.79 mm used here allows us to obtain 5-7 independent data points across the 20 layers. The number of 20 layers is chosen based on previous experience in finding a compromise". This description is hard to understand. Suggest something like- The cortical thickness of the primary motor cortex is 4 mm (Fischl and Dale 2000). With our resolution of 0.79 mm, we obtained 5-7 independent data points across the thickness of the cortex. These data points were upsampled to create 20 layers across the thickness of the cortex. Twenty layers was chosen based on previous experience in finding a compromise... These 20 layers were smoothed and extracted (tell me what you did here) in sheets to produce a reconstruction of the face of the anterior bank of the central sulcus (Figs. 3S, 6S, 10S).

      Based on the reviewer’s suggestion, we tried to provide a more detailed description of the underlying assumptions and the necessity of using so many layers in a recent blog post: https://layerfmri.com/2019/... <br /> In the revised version of the manuscript, we included the following summarizing statement:

      The cortical thickness of the primary motor cortex is 4 mm (Fischl and Dale 2000). With our resolution of 0.79 mm, we obtained 5-7 independent data points across the thickness of the cortex. Across these data points, we created 20 layers across the thickness of the cortex on a 4-fold finer grid than the effective resolution. The number of twenty layers was chosen based on previous experience in finding a compromise between data size and smoothness (see Fig. S6 in (Huber 2018)). Columnar profiles in Fig. 3 and Fig. S4 are generated from unsmoothed data. For Figs. S3 and S6, the functional signal was smoothed with 0.5 mm within columns and extracted in sheets to produce a reconstruction of the face of the anterior bank of the central sulcus. No smoothing was applied across columns.
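The numbers in this statement can be checked with a few lines of arithmetic. The variable names below are chosen for this illustration only; the values are the ones quoted in the text (4 mm thickness, 0.79 mm resolution, 20 layers).

```python
# Values quoted in the text above.
cortical_thickness_mm = 4.0
voxel_size_mm = 0.79
n_layers = 20

# Number of independent samples across a 4 mm thick cortex (lower end of "5-7").
independent_points = cortical_thickness_mm / voxel_size_mm

# Spacing of the layer grid and its refinement relative to the voxel size.
layer_spacing_mm = cortical_thickness_mm / n_layers
grid_refinement = voxel_size_mm / layer_spacing_mm

print(f"independent points: {independent_points:.2f}")
print(f"layer spacing: {layer_spacing_mm:.2f} mm")
print(f"grid refinement: {grid_refinement:.2f}-fold")
```

The refinement factor comes out close to 4, which matches the "4-fold finer grid" in the statement. The layers are therefore an upsampled coordinate system and do not represent 20 independent measurements.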

      R3.13. Fig. 2B- For participant 5, the copper and turquoise outlines are reversed. Hue of copper and turquoise colors are not consistent in each panel. <br /> In last panel of 2B, first line- there is a hand in this panel. What is its purpose? If the purpose is to be a key for finger color, the thumb should be magenta.

      The reviewer is right, the copper and turquoise patch seems reversed in participant 5. Note, however, that this is not a presentation error in the preparation of the images. We find that the grasping-extension patches do not follow the same organizational principle along the medial-lateral direction across participants. It is highly dependent on the position of the axial projection chosen. E.g. it can be seen in Fig. S6 (and previous version of Fig. S9) that, dependent on the depth of the central sulcus, the copper and turquoise patches are either on the medial or lateral side. Please also note that participant 5 is not an outlier here; in fact, participant 1 (in the same figure) has the same copper-turquoise alignment as participant 5. Please also note that the sensory cortex consistently shows a grasping preference across all participants.

      The additional hand pictogram had been included as a figure key to remind the reader which color refers to which finger. Based on the reviewer’s comments, it is excluded in the revised version of the manuscript. It is already shown in panel A) anyway.

      R3.14. Fig. S3C- Several features of this figure make it hard to decipher and undermine the explanation of the reconstruction method. I am assuming that the little squares in panel B are equivalent to columns. This should be stated explicitly. If the colors correspond to the fingers, then the mirror representation of the hand shown in Figs. 1-3 is nowhere to be found. This is confounding. It may be useful to show the location of the slice in panel D. Panel D is reversed from panel A, creating needless confusion. In panel C, the laminar thickness of the cortex is greater than the depth of the central sulcus. Calibrations would help but why not make the laminar thickness accurate? State explicitly that the IMAGIRO reconstruction consists of 20 layers, each like the one in B. Spelling- Columnar 'distance' <br /> It took me a long time to understand what you were doing. The descriptions of the reconstruction needs to be simple, clear and intuitive or very few will comprehend them. It all makes sense but the reader should not have to go to the blog (which I did) to understand them.

      We thank the reviewer for the suggestions to make this figure clearer. We also applaud the reviewer’s level of commitment to check the description on our blog.<br /> -> The little squares indeed refer to the columnar dimension. Additional comments are included in the caption.<br /> -> The colors do not refer to finger dominance, but to the medial-lateral position. This is included in the caption now.<br /> -> The location colors are now included in panel C, as suggested.<br /> -> Panels C and D are now switched, as suggested.<br /> -> If the laminar thickness were depicted accurately, all 20 layers would be 2-3 mm apart in the figure. If we depicted it in the right geometry, the layers could not be separated with the naked eye. Scale bars are included as suggested, which point out how they are distorted.<br /> -> An explicit reference about 20 layers is included.<br /> -> The typo is corrected in “distance”

      Updated Fig. 3:

      We agree that an intuitive image is helpful. Here, we tried to find a compromise: simple, intuitive figures that represent the complexity of the analysis without making the supplementary material too long. The reviewer’s comments are appreciated in achieving this.

      R3.15. Fig. 4S part B- Should note that this is upsampled to produce 20 layers.

      The revised version of the manuscript has an additional statement included:

      Note that the size of layer and column structures are smaller than the effective resolution of 0.79 mm. They are estimated in an upscaled space.

      R3.16. Fig. 9S- Why is the background of the VASO view of the anterior bank of the CS entirely red? This implies that the entire CS is related to the 5th finger. How is that possible? Why are there yellow and green patches distributed all along the CS? This arrangement is different from any of the other figures. There does not seem to be a double mirror representation in this participant. <br /> In the bottom panels, why is the view limited to just part of M1 instead of the whole of M1? In general, this figure is quite confusing and really difficult to interpret. The organization of the grasping and retraction patches is an important issue. A better explanation (illustration?) of what you are trying convey in this figure is necessary.

      We agree with the reviewer that previous Figure S9 could be confusing. We tried to show too many features in one figure. Our goal with this figure was to show the consistency of the finger representations across the different tasks and also to show the position of the mirrored representation along the depth of the central sulcus. Based on the reviewer’s comments, we decided to remove Fig. S9 from the manuscript. We believe that these two messages already come across from Fig. S5, S6, S9 (new).

      To answer the reviewer’s questions (for the sake of his/her curiosity): <br /> -> The top-right figure was included for the sake of orientation. It was not included to suggest the significance of the mirrored pattern. Thus, we did not threshold the finger dominances at all. In areas outside the hand-knob, therefore, the finger-preference measure for all fingers is close to 0. The red color outside the hand knob does not mean that this finger is represented there. It only means that all the other fingers are even noisier. E.g. that the finger preference for the index finger is 0.0014 compared to other fingers with a finger preference of 0.0005. For reference, in the hand knob, the finger preferences are in the regime 0.3-1 (please, see Fig. 3B about the absolute selectivity strengths in and outside the hand knob). The previous figure S9 corresponds to the line graph in Fig. 3B from above. <br /> -> We believe that there is, in fact, a mirrored pattern visible in this figure. Within the Brodmann area subsection BA4A, the color pattern is reversed.
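
      The effect of leaving the dominance map unthresholded can be illustrated with a minimal sketch. It is not the analysis code used for the figure: the function name, the finger order, the threshold of 0.3 and the in-knob values are hypothetical, and only the 0.0014 versus 0.0005 example is taken from the answer above.

```python
FINGERS = ["thumb", "index", "middle", "ring", "little"]  # illustrative order


def dominant_finger(preferences, threshold=0.0):
    """Winner-take-all label: the finger with the largest preference value.

    Returns None when even the winner stays below `threshold`, i.e. when the
    location would be left uncoloured in a thresholded map.
    """
    best = max(range(len(preferences)), key=lambda i: preferences[i])
    if preferences[best] < threshold:
        return None
    return FINGERS[best]


# Outside the hand knob: all values are near zero (numbers from the text above).
outside_knob = [0.0005, 0.0014, 0.0005, 0.0005, 0.0005]
# Inside the hand knob: hypothetical values with the winner in the 0.3-1 regime.
inside_knob = [0.10, 0.80, 0.20, 0.10, 0.05]

unthresholded_outside = dominant_finger(outside_knob)
thresholded_outside = dominant_finger(outside_knob, threshold=0.3)
thresholded_inside = dominant_finger(inside_knob, threshold=0.3)

print(unthresholded_outside, thresholded_outside, thresholded_inside)
```

      Without a threshold, a location whose preferences are all close to zero still receives a finger colour, which is why a uniform colour outside the hand knob carries no information about finger representation.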

      R3.17. Fig. 10S- in the right panel, the orientation seems to be incorrect. That is, left is lateral and right is medial which means the left ear arrow should be pointing to the right.

      We agree, the arrow description now says “right” ear.

      R3.18. I suggest alphabetizing the reference list.

      In the updated reference list “S” is after “O”.

      R3.19. The correct citation is- Meier JD, Aflalo TN, Kastner S, Graziano MS. Complex organization of human primary motor cortex: a high-resolution fMRI study. J Neurophysiol. 2008 Oct;100(4):1800-12. doi: 10.1152/jn.90531.2008. Epub 2008 Aug 6

      The reference is updated.

    3. On 2018-11-19 20:53:04, user Diedrichsen_lab wrote:

      This is a very interesting study investigating the spatial organization of hand movement representations in M1. We agree with the authors that the hand representation in M1 is likely complex and therefore requires advanced methods to probe. We would like to point out, however, that the authors’ reference to a previous paper from our lab (Ejaz et al., 2015, NatNeuro) contains a number of misunderstandings. Specifically, we take issue with the authors stating that 1) our work argues for a simple topographic arrangement of single finger representations in S1, and 2) that the overlap between finger activation patterns is “due to noise”.

      In our work (Ejaz et al., 2015), we used BOLD fMRI to measure the activity patterns evoked by single- and multi-finger movements in M1 and S1. The spatial arrangements of these patterns in both regions were stable within each participant (compared across different scanning sessions), but highly variable across participants. These finger patterns are shown in figure 1 of our paper. Close visual inspection of the patterns reveals they do not follow a clear linear arrangement in either S1 or M1, and perhaps some evidence of digit “mirroring” can be observed – definitely there are parts of the cortex activated for the thumb at the dorsal end of the hand region.

      We then calculated the dissimilarity between all pairs of finger patterns for M1 and S1, separately. Importantly, the relative dissimilarity between any pair of activity patterns (within a participant) was highly stable across participants. This is notable given that the spatial arrangements of these patterns were highly variable across individuals. One stable characteristic was that the thumb pattern was more similar to the little finger than to the ring finger. This finding clearly shows – contrary to what our paper is cited for - that a simple linear somatotopic arrangement cannot account for the digit representations in M1 or S1.
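
      Why a dissimilarity structure can be stable while the spatial layout varies can be seen in a small sketch. It uses random toy patterns and a plain correlation distance, which is an assumption made for brevity and not the distance measure used in Ejaz et al. (2015); all names are illustrative.

```python
import math
import random
from itertools import combinations


def correlation_distance(a, b):
    """1 - Pearson correlation between two activity patterns."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(var_a * var_b)


def dissimilarity_structure(patterns):
    """Distance for every pair of fingers, keyed by the finger pair."""
    return {
        (f, g): correlation_distance(patterns[f], patterns[g])
        for f, g in combinations(sorted(patterns), 2)
    }


rng = random.Random(0)
n_voxels = 50
fingers = ("thumb", "index", "middle", "ring", "little")
# Toy "participant A": one random activity pattern per finger.
patterns = {f: [rng.gauss(0.0, 1.0) for _ in range(n_voxels)] for f in fingers}

# Toy "participant B": the same patterns with the voxels spatially rearranged.
order = list(range(n_voxels))
rng.shuffle(order)
rearranged = {f: [p[i] for i in order] for f, p in patterns.items()}

original = dissimilarity_structure(patterns)
shuffled = dissimilarity_structure(rearranged)
max_difference = max(abs(original[k] - shuffled[k]) for k in original)
print(f"largest change in any pairwise distance: {max_difference:.2e}")
```

      The pairwise distances do not depend on where each voxel sits, so two participants with different spatial maps can share the same dissimilarity structure.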

      We then show that the stable structure of overlap of finger representations in M1 and S1 can be accounted for by the statistics of everyday hand movement. Thus, we did not interpret the spatial variability of these patterns as “noise due to inter-individual variability in every day hand movements”. On the contrary, the statistics of hand use are stable across individuals (Ingram et al., 2008, Exp. Brain Res.), as is the organizing principle underlying the spatial organization of activity patterns in M1 and S1.

      Overall, both imaging and neurophysiological evidence clearly suggests that M1 is not so much concerned with the representation of fingers, but rather of complex hand movements. The use of a winner-take-all map for fingers is therefore a less effective way of gaining a deeper understanding of the organization of M1. We do agree with the authors that M1 organization is more complicated than a simple linear finger organization. Whether the organization really is best described by two discrete finger maps with phase reversal, however, really has to await a more rigorous experimental and statistical evaluation. Whatever the answer may be, however, we do think that the improved specificity of the VASO sequence may play an important role in uncovering such representations in the future, and we are excited to see these new developments.

    1. On 2018-11-08 13:53:42, user Ting-Yat Wong wrote:

      Review of "Association of a lincRNA postmortem with suicide by violent means and in vivo with aggressive phenotypes?"

      Dear Dr. Punzi,

      We recently came across your manuscript, which you submitted to bioRxiv. As part of the “International Research Training Group (IRTG) 2150 – The Neuroscience of Modulating Aggression and Impulsivity in Psychopathology” (www.irtg2150.rwth-aachen.de/), we offer students a comprehensive and unique qualification program. One of our students came across your manuscript while searching for appropriate material to discuss in our monthly journal club. In this part of the qualification program, students are asked to put themselves in the shoes of future reviewers and formulate constructive criticism. We really enjoyed reading your excellent manuscript and agreed that it contains important new findings. It is impressive that the authors revealed the impact of non-coding RNA on aggressive phenotypes with a relatively large postmortem brain sample and further provided evidence that the non-coding RNA may link to emotion regulation and impulsiveness in an independent in vivo sample via fMRI. Below, you will find a list of our comments and suggestions, which we hope will help you to further improve your manuscript and publish it successfully.

      1. We can see the importance of testing the expression of LINC01268 in the region of DLPFC. However, we are also interested in whether you have tested other brain regions, such as anterior cingulate cortex.

      2. Your prior study was mentioned a few times in the preprint manuscript but its details are lacking. Given the importance of this study, it might be better to give some more details (e.g. a brief description).

      3. Adding the samples from your prior study can add power to your analysis. However, we do not see the meaning of it. We deem that an independent sample as a replication is good enough.

      4. We are concerned about the priority of the main text in the method section. It might be less important to put so much emphasis on describing the suicide cases and how you categorized them. We found that Table 1 is already good enough to capture the overall picture of your samples. The audience may also need more information about the methods you used. Therefore, you might consider rearranging the priorities of the text in your method section.

      5. We are also concerned about the consent from your participants. It might be important to disclose this piece of information.

      6. Some figures have an external border containing the boxplots (e.g. figure 2A/B). It looks like screen captured figures. We recommend that you should use better quality figures.

      7. Abbreviations should be used carefully; otherwise, readers are confused when an abbreviation appears without introduction. For example, in the last sentence of the results section of the abstract, WGCNA is used without explanation, and readers unfamiliar with the method cannot tell what it refers to.

      8. Although WGCNA provided important information of the biological meaning of LINC01268, its results actually make the whole discussion more confusing. A bit more effort should be paid to link its biological meaning to aggressive phenotypes.

      Overall, we think that your manuscript is well-written, your experiment is well-designed and your study provides important and novel results. We hope our comments help you to improve the current version of your manuscript. In case you have any questions, please feel free to contact us (www.irtg2150.rwth-aachen.de/).

      Best regards from Aachen, Germany<br /> The IRTG students

    1. On 2018-10-26 19:39:06, user Dorian Pustina wrote:

      Hi Kaori,

      I think you have done a great work here to achieve a comparison that is of high interest to the community. Congratulations. The most curious aspect was that you could run such a complex study on an i5 processor.

      About the results, I am not surprised that LINDA missed many subcortical/brainstem lesions. On the contrary, I was surprised that it got some of those lesions. The reason is simple: LINDA was designed and tested on large lesions, while ATLAS is composed of many cases with small lesions. In fact, I checked the ATLAS dataset a few months ago and the median lesion size was very low, around ~5 ml.

      Brainstem lesions in particular would be very hard to detect with LINDA simply because LINDA is trained to expect some signal at its low resolution step, which probably is not there for small lesions. On top of this, I don't even think LINDA is considering the brainstem in the registration steps, it might mask it out completely.

      This said, I still think your work is very valuable. I have three minor suggestions:<br /> 1. Please describe the lesion properties in better detail, particularly lesion size, and put this in perspective with the lesion sizes used in each of the studies that developed the respective methods.<br /> 2. In the conclusion paragraph, you state: "We observed that testing on multi-site data resulted in decreased segmentation accuracy." This sounds like the problem is the multi-site nature of the test dataset, which may discourage people from running multi-site studies. The drop in accuracy has more simply to do with the nature of the lesions accepted in a dataset, their size and location. I don't see multi-site studies as a problem per se.<br /> 3. Looks like ASSD values in Table 4 do not match the values described in the manuscript.

      DISCLAIMER: I did not perform a thorough review of the paper. Any opinion expressed here is based on a quick superficial reading and should not be taken as proof of approval or disapproval.

    1. On 2018-01-11 23:16:17, user Leslie Vosshall wrote:

      We received some great questions and feedback from Christopher Potter at Johns Hopkins. Emily Dennis's replies are interleaved below:

      We just read your C. elegans pre-print paper for our lab’s journal club. It was very interesting! I really liked the mutant screen. Very cool. We had a couple of questions/comments to send on (which I hope is OK?).

      1. The work’s impact might be greater if you could test more thoroughly if str-217 was indeed a GPCR that responds to DEET. The HEK heterologous expression didn’t work, but can you instead express str-217 in another worm chemosensory neuron that doesn’t respond to DEET and see if that now confers a response? I’m not a C. elegans person, but it seems like this should be fairly easy to do (especially with the Bargmann lab nearby).

      RESPONSE: I'm very excited about this experiment! We're doing our best to get clean signal/expression in a completely DEET-insensitive neuron (the first few neurons we looked at are affected by DEET in some way even without the str-217 receptor) -- we don't have this yet but those data will definitely make it into the revision(s) when we have them.

      1. For the experiments testing if DEET could act as an odorant (Figure 1C), DEET appeared not to do much. But given your later results that DEET responses, and ADL neuron activity, lead to changes in search (?) behaviors, I’m wondering if maybe it’s worth taking a closer look? Maybe I read it wrong, but it sounds like a paralytic is added close to the odor source to make it easier to count worms that made an odor choice. But this might hide a DEET response? Can you instead track the behavior of worms as they get closer to the DEET source? It could be that DEET-in-agarose worked because they just needed a higher local concentration of DEET, meaning that you might only see an olfactory effect when they get quite close to a DEET source.

      RESPONSE: I totally agree it's hard to say DEET does nothing. We have had a really hard time coming up with a perfect experiment that separates the effects of method of delivery, time/duration of exposure, proximity, and concentration in these population chemotaxis assays. I did try a few things that didn't make it into the final paper that may be of interest. First, I added DEET to the lid of the dish and didn't see any effect of DEET (though we didn't include those data in the paper as the assay itself is non standard and a little messy since DEET can chemically interact with/'melt' plastic). I also did do some population chemotaxis experiments without the paralytic, and they look very similar to the results in our pre-print. Another related anecdote: in experiments with isoamyl alcohol and DEET as point stimuli, I often saw animals on the DEET spot (!) and the odor spots, but it would be fun to see if changing the distances between DEET/odor spots would change this, or if adding a DEET spot to a 'random' place on the plate would reveal any avoidance of that spot. In an experiment somewhat indirectly related to this idea of delivery/distance being important, I am also currently exploring how duration and strength of stimulation of ADL neurons specifically alters behavior (using optogenetics).

      My intuition from observing these experiments is that DEET alone has very little effect as an olfactory/point stimulus in this assay. However, I definitely do not think we've fully explored all contexts that volatile DEET could interact with, so it would be interesting to go through, say, a larger panel of odor stimuli and co-present with DEET to see if there's any change or to add volatile/point sources of DEET to other assays and see what happens.

      1. It wasn’t mentioned, but did the other mutants you identified also work in the same neuron, or perhaps implicate a shared signaling pathway?

      RESPONSE: We only were able to map one other strain, which mapped to the gene nstp-3. Our early attempts to do cell ID and figure out where this is expressed weren't informative so we don't know if it's the same or different cells. My guess is there are lots of genes and lots of neurons required for complete DEET-sensitivity, so there's lots more to do & explore! I would love to see someone do a sensitized screen in the str-217 mutant strain to see if we can get even higher chemotaxis...

      Nice work! Fingers crossed for a painless journal review.

      RESPONSE: Thanks again, this was a lovely email to receive.

    1. On 2017-12-13 16:36:30, user Md Nurul Islam wrote:

      I think there is a difference in how spatially tuned cells need to be approached. In our lab we prefer detecting them by eye first and then moving on to computation for further verification and characterization. This way we reduce the false positive rate that may arise from doing shuffling on spatial units. Again, the whole point of shuffling the spike times is that we want to see whether the results that we see (place cell, grid cell, hd cell) are random or whether the generation of spiking activity depends on a particular variable. Now, when it comes to grid cells, given that they have multiple firing fields, there is always a big chance that the animal will be close to or at one of the firing fields and that the shuffled spikes will be associated with that location, mimicking the grid-like pattern and giving a 95% gridness score close to that of the original spiking activity, so there may be a chance of false negatives as well (I have not tested it). But when it comes to false positives, I do not see a reason not to look into the geometric locations of the autocorrelation peaks and to look only at gridness as a 'verification' tool for grid cells. And the entire idea of 'detecting' or 'determining' or 'verifying' grid cells based on shuffling analysis, and considering it as a standard, does not need proof of a high false positive rate when it fundamentally may not be acceptable to do so.
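
      For readers unfamiliar with the shuffling procedure discussed here, a generic sketch follows. It is an assumption-laden toy: it uses circular shifts of the spike train (one common shuffling variant), a nearest-rank 95th percentile, and a placeholder score instead of a real gridness score; none of the names come from the paper under discussion.

```python
import math
import random


def circular_shift(spike_times, duration, offset):
    """Shift every spike by `offset` seconds, wrapping around the session end."""
    return sorted((t + offset) % duration for t in spike_times)


def shuffle_threshold(spike_times, duration, score_fn, n_shuffles=200,
                      percentile=95.0, min_shift=20.0, seed=0):
    """Percentile of `score_fn` over circularly shifted copies of the spike train."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_shuffles):
        offset = rng.uniform(min_shift, duration - min_shift)
        scores.append(score_fn(circular_shift(spike_times, duration, offset)))
    scores.sort()
    rank = math.ceil(percentile / 100.0 * len(scores))  # nearest-rank percentile
    return scores[rank - 1]


def toy_score(spike_times):
    """Placeholder statistic: fraction of spikes in the first 2 s of each 10 s cycle.

    A real analysis would compute a gridness score from the rate-map
    autocorrelogram here.
    """
    return sum(1 for t in spike_times if t % 10.0 < 2.0) / len(spike_times)


duration = 600.0
data_rng = random.Random(1)
# Toy cell that fires once per cycle, always inside the 2 s window.
spikes = [10.0 * k + data_rng.uniform(0.1, 1.9) for k in range(60)]

observed = toy_score(spikes)
threshold = shuffle_threshold(spikes, duration, toy_score)
print(f"observed score {observed:.2f}, 95th percentile of shuffles {threshold:.2f}")
```

      The toy statistic is periodic, so some shifts land the spikes back inside the window and produce high shuffled scores. This is the same concern raised above for cells with multiple firing fields: the shuffled distribution can sit close to the observed value.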

    1. On 2017-03-16 13:03:39, user Róbert Bódizs wrote:

      Dear Colleagues!<br /> The issue of slow and fast sleep spindle frequencies, as well as the problem of the individual- and derivation-specific amplitude criteria, has been the main focus of our research group since 2004 (http://dx.doi.org/10.1111/j... ). After recognizing the empirical and theoretical importance of the issue we created a new conceptual framework and a new methodology in order to formalize the phenomenon of individual-specificity of sleep spindles (J Neurosci Methods. 2009 Mar 30;178(1):205-13. doi: 10.1016/j.jneumeth.2008.11.006). Our conceptual proposals and empirical findings were published in several scientific papers. Journal of Sleep Research, Journal of Neuroscience, Scientific Reports, Frontiers in Human Neuroscience, Developmental Psychology are among the journals publishing these findings and considerations. It is unfortunate that the authors of the present report completely dismiss these parts of the scientific literature. Several problems and issues reported in the manuscript of Cox et al. were already addressed, carefully reviewed and considered elsewhere. This neglect is, however, even more embarrassing if one considers that 3 out of 4 authors of the present paper were listening to a keynote presentation on the issue of individual specificity in sleep spindling last year (The 1st International Conference on Sleep Spindling, May 12-14, Budapest: http://static.akcongress.co... ). In this presentation I reviewed and critically considered the problem of universal, ad-hoc frequency and amplitude criteria in sleep spindle detection. The introductory part of the Cox et al. paper reflects on the same issue. We are also the first proponents of the idea that human sleep spindles have to be individualized in the broader frequency range of 9-16 Hz. The same values appear in the Cox et al. paper. Last but not least, we empirically tested the best match between the frequency criteria of fixed-frequency methods and individualized frequency methods. Results were particularly interesting as the 12 Hz (or 12.5 Hz) demarcation was found to be optimal for dissociating slow and fast sleep spindles in healthy human adults, in contrast to the widely acknowledged 13 Hz value (Front. Hum. Neurosci., 17 February 2015 | https://doi.org/10.3389/fnhum.2015.00052). It happens that the 12 Hz value is considered optimal in the present paper, again without mentioning the outcomes of our analyses resulting in the same value, an analysis which was performed on 161 healthy volunteers.<br /> I hope that the authors are willing to acknowledge the above mentioned reports as parts of the common knowledge of our scientific community.<br /> It is strange that - for some reason - my previous comment was deleted from this site. I am just wondering why this happened, as I do not think that my sentences have to be moderated. <br /> Sincerely yours,<br /> Róbert Bódizs

    1. On 2016-12-22 20:18:01, user mauromanassi wrote:

      Dear Will and Peter,

      Congratulations on your new paper! We have read it with great interest. We listed below our concerns and comments on it. We hope you will find these comments useful, we wrote them with a very constructive spirit hoping to improve the manuscript.

      General comments:

      1. You mentioned that three general classes of mechanism have been advanced to account for crowding (positional uncertainty, feature averaging and source confusion). How do you consider grouping? Another mechanism? When do you think it occurs? Any assumption would have strong constraints on the way the model is built.

      2. Lines 172-176. It is not clear why mixture modeling based on maximum likelihood would fail to predict the underlying distribution of a data set. This technique has been widely used in the visual short term memory literature as the author properly cited. Some of us have also been using it for explaining visual masking and its interaction with spatial attention (Agaoglu, Agaoglu, Breitmeyer, & Ogmen, 2015; Agaoglu, Breitmeyer, & Ogmen, 2016).

      3. Categorizing errors based on their distance to the nearest model prediction is technically equivalent to mixture modeling with three circular Gaussians, each sitting at the error predicted by each model (averaging, substitution etc.). So the method used here is qualitatively similar but quantitatively seems rather arbitrary. The current way of analysis implicitly assumes that the best way to account for crowded responses is a mixture model with (at least) three components, and then goes on to quantify the weight of each component as a function of target-flanker spacing.
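
      To make the third comment concrete, a nearest-prediction categorization could look like the sketch below. The predicted error values, the 360 degree period and the function names are hypothetical and chosen for illustration only; they are not taken from the manuscript.

```python
from collections import Counter


def circular_distance(a_deg, b_deg, period=360.0):
    """Smallest angular difference between two values on a circle."""
    diff = abs(a_deg - b_deg) % period
    return min(diff, period - diff)


def categorize(response_error, predictions, period=360.0):
    """Label a response error with the model whose prediction is nearest."""
    return min(
        predictions,
        key=lambda name: circular_distance(response_error, predictions[name], period),
    )


# Hypothetical predicted errors (degrees) for three accounts of crowding.
predictions = {"target": 0.0, "averaging": 30.0, "substitution": 60.0}
errors = [-5.0, 12.0, 28.0, 44.0, 61.0, 350.0]

labels = [categorize(e, predictions) for e in errors]
counts = Counter(labels)
print(labels)
print(dict(counts))  # {'target': 3, 'averaging': 2, 'substitution': 1}
```

      With equal widths and equal mixing weights, this hard assignment picks the same component that a three-component circular mixture would rate as most likely for each response. What a fitted mixture model adds is an estimate of those widths and weights, instead of fixing them implicitly.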

      Minor comments:

      The novel contribution of this study is a bit unclear to us. If it is to show that a population code of orientation selectivity can generate all types of errors, what is exactly the difference between your previous paper (CB 2015) and this manuscript?

      The Põder & Wagemans 2007 study is highly relevant to this work. Also, Ester and colleagues' studies used a similar approach, and the differences in model parameters between similar and dissimilar flankers in Ester et al. (2015) and the differences between one-gap flanker and two-gap flanker conditions in this study would be very interesting to compare.

      In a recent study using the stimulus paradigm that you used previously (Agaoglu & Chung, 2016), we have shown that this particular stimulus paradigm is prone to eccentricity confounds. Perceptual errors are highly affected by the absolute orientation of the target and flankers, not just relative to each other. It is unclear how this affects the results reported here.

      Line 34. It is fair to ask to cite our relevant work (Agaoglu, Chung, & Ogmen, 2016) where you described previous work on crowding and eye movements, since we presented a different point of view. The same holds for Pachai, Doering & Herzog 2016 (you cited only the reply to the reply). As scientists, we can agree to disagree, we hope.

      Line 143. Except for N1, perceptual error does not seem to follow a linear trend. For A2 there is an increase in perceptual error only for the smallest flanker size. You may want to revise that sentence.

      Line 270. We have supporting evidence for this sentence. The role of masking is indeed to increase random guessing and slightly decrease stimulus encoding precision (Agaoglu, Agaoglu, Breitmeyer, & Ogmen, 2015). However, ruling out metacontrast masking only because of this seems weak. Since the stimulus duration was 500 ms, we don't think there is any masking at all. You might also want to mention that to support the claim made in this sentence.

      Mauro Manassi<br /> Mehmet Agaoglu<br /> Michael Herzog<br /> Susana Chung

    1. On 2016-08-24 20:55:25, user Tal Yarkoni wrote:

      This is an innovative and very thought-provoking paper that will hopefully be widely read by researchers working with fMRI. I have two general comments with respect to the authors' main thesis:

      1. As far as I can tell, the authors don't motivate the decision to focus exclusively on sub-voxel representations. They point out that non-smooth sub-voxel representations would be impossible to detect with fMRI, which is an important observation. But surely non-smooth *supra-voxel* representations would still be easily detectable with fMRI. A priori, there doesn't seem to be a good reason to rule out this kind of representation in the brain. As far as I can tell, representational similarity analyses would still work successfully if the brain were composed of hundreds of functionally discrete tiles that were non-smooth at both the sub-voxel and supra-voxel levels. This doesn't seem like a far-fetched possibility; for example, suppose that when people think about penguins, they're somewhat more likely to think about the unusual climate in which penguins live. Representations of climate may be non-smooth, yet reside in fundamentally different brain circuits from representations of physical shape, size, etc. One consequence would be that neural representations of robins would almost certainly more closely resemble those of sparrows than those of penguins even if there were no spatially graded sub-voxel representations at all in the human brain--simply in virtue of sharing a larger number of salient properties with the former than the latter. Of course, I'm not suggesting that there _aren't_ smooth sub-voxel representations in the brain, but simply that the authors' conclusion that "the neural code must be smooth, both at the subvoxel and functional levels" doesn't necessarily follow.

      2. Even if one assumes that the signal detected by fMRI is in fact driven entirely by smooth sub-voxel representations, it still wouldn't follow that the neural code must be smooth at the sub-voxel level. All we would be able to conclude is that there is at least *some* component of the signal that is smooth. This would not preclude other neural codes from existing, and in fact, we already have abundant evidence of non-smooth sub-voxel representations. For example, ocular dominance columns clearly exist, and if fMRI is unable to detect them, that reflects a limitation of fMRI, not a generalizable claim about the way the brain represents information. While I'm not a systems neuroscientist, I would imagine that there are any number of examples in the systems neuroscience literature of non-smooth, but highly structured sub-voxel representations that would probably be completely undetectable with fMRI. So I think the authors may want to be more circumspect about the conclusions they draw. Their results don't really show that only a subset of neural coding schemes are plausible; rather they suggest that whatever neural representations fMRI is capable of detecting are likely to stem from either (a) smooth representations (either sub- or supra-voxel) or (b) non-smooth supra-voxel representations. This leaves open the possibility (and it seems like a very real one) that the vast majority of information represented in the brain is not represented in a way that is amenable to detection with fMRI.

      Setting these concerns aside, I think this is still a paper that should be of great interest to most cognitive neuroscientists. One point that is made very elegantly here is that the nature of neural representations does not have to be (and probably isn't) uniform across the brain. In particular, the authors put forward a compelling argument for the possibility that brain regions higher in the processing stream--and that are more likely to represent very abstract, multidimensional information--may not be amenable to imaging at all. This point should give many fMRI researchers pause when considering studying, e.g., the representational structure of prefrontal cortex. At the very least, the manuscript raises a number of important questions that should spur further discussion within the neuroimaging community.

    1. On 2016-06-23 17:12:34, user Justyna Hobot wrote:

      "@anilkseth: TMS to prefrontal (or parietal) cortex does NOT impair visual metacognition, new @sacklercentre led by @DanielBor https://t.co/LnHE3DRtL5."

      Dear Authors, how would you rate your awareness that the quoted sentence is just a catchy overstatement? I am taking the liberty of posting some comments on the paper; I hope they are helpful.

      1. "An advantage of TMS, besides its non-invasive nature, is that TMS-induced changes are limited to short time periods so that plasticity is unlikely to affect performance."

      Didn’t you apply TMS in order to induce the plasticity-like changes that affect cognitive performance?

      1. "First, continuous theta burst TMS (cTBS) was used instead of repetitive TMS."

      Continuous Theta Burst Stimulation (cTBS) is an example of repetitive TMS. Repetitive TMS simply means that pulses are delivered in a precise temporal pattern, and cTBS has such a pattern (see e.g. Bergmann 2016 or Oberman 2011).

      Bergmann, T. O., Karabanov, A., Hartwigsen, G., Thielscher, A., & Siebner, H. R. (n.d.). Combining non-invasive transcranial brain stimulation with neuroimaging and electrophysiology: Current approaches and future perspectives. NeuroImage. http://doi.org/10.1016/j.ne...<br /> Oberman, L., Edwards, D., Eldaief, M., & Pascual-Leone, A. (2011). Safety of Theta Burst Transcranial Magnetic Stimulation: A systematic review of the literature. Journal of Clinical Neurophysiology, 28(1), 67–74. http://doi.org/10.1097/WNP....

      1. "This technique involves a very rapid sequence of TMS pulses, typically for 40 s, and is thought to suppress cortical excitability for up to 20 minutes (ref. 19)"

      "thought to suppress cortical excitability" – the 40 s cTBS may suppress M1 excitability, as long as it is applied correctly and the basal state of the brain allows such changes to occur, but e.g. a change of current direction can reverse inhibition to facilitation (see e.g. Jacobs 2012), and the short version of cTBS (like the one you used) may actually increase M1 excitability if there is no prior voluntary motor activation (see e.g. Gentner 2008).

      "for up to 20 minutes" – you referred to Huang 2005, where the motor cortical excitability after the 40 s of cTBS was suppressed for 60 min. The after-effects lasting up to 20 minutes were also reported, but after 20 s (not 40 s) of the cTBS. Therefore, there is no need to confuse the reader by writing: "TMS pulses, typically for 40 s, and is thought to suppress cortical excitability for up to 20 minutes".

      Jacobs, M. F., Zapallow, C. M., Tsang, P., Lee, K. G. H., Asmussen, M. J., & Nelson, A. J. (2012). Current direction specificity of continuous θ-burst stimulation in modulating human motor cortex excitability when applied to somatosensory cortex. Neuroreport, 23(16), 927–931. http://doi.org/10.1097/WNR....<br /> Gentner, R., Wankerl, K., Reinsberger, C., Zeller, D., & Classen, J. (2008). Depression of human corticospinal excitability induced by magnetic theta-burst stimulation: evidence of rapid polarity-reversing metaplasticity. Cerebral Cortex (New York, N.Y.: 1991), 18(9), 2046–2053. http://doi.org/10.1093/cerc...<br /> Huang, Y.-Z., Edwards, M. J., Rounis, E., Bhatia, K. P., & Rothwell, J. C. (2005). Theta burst stimulation of the human motor cortex. Neuron, 45(2), 201–206. http://doi.org/10.1016/j.ne...

      1. "In this way, TMS administration can be entirely separated from the behavioural task, and therefore will not distract the participants from it."

      It may be worth noting that what happens just after applying cTBS may reverse its after-effects (see e.g. Huang 2008), which means the first minutes of performing the post-TBS block may influence the effects observed in the following part. Did you check how consistent the task performance was, e.g. by comparing the first 150 trials with the second half of the block?

      Huang, Y.-Z., Rothwell, J. C., Edwards, M. J., & Chen, R.-S. (2008). Effect of physiological activity on an NMDA-dependent form of cortical plasticity in human. Cerebral Cortex (New York, N.Y.: 1991), 18(3), 563–570. http://doi.org/10.1093/cerc...

      1. "In addition, a small (n=7) patient lesion study showed that the anterior prefrontal cortex (i.e. a region neighbouring the DLPFC) selectively impaired perceptual metacognition, though not memory-based metacognition, compared with patients who had temporal lobe lesions (27)."

      You may want to check the Del Cul 2009 paper, which also indicated the involvement of aPFC in perceptual metacognition, and which was conducted on a bigger group of patients (n=15) than the one you refer to. Moreover, McCurdy 2013 showed that variation in visual metacognitive efficiency was correlated with the volume of frontal polar regions, while variation in memory metacognitive efficiency was correlated with the volume of the precuneus. However, I wonder how this supports the use of DLPFC instead of aPFC. Only because it is a neighbouring region?

      Cul, A. D., Dehaene, S., Reyes, P., Bravo, E., & Slachevsky, A. (2009). Causal role of prefrontal cortex in the threshold for access to consciousness. Brain, 132(9), 2531–2540. http://doi.org/10.1093/brai...<br /> McCurdy, L. Y., Maniscalco, B., Metcalfe, J., Liu, K. Y., Lange, F. P. de, & Lau, H. (2013). Anatomical Coupling between Distinct Metacognitive Systems for Memory and Visual Perception. The Journal of Neuroscience, 33(5), 1897–1906. http://doi.org/10.1523/JNEU...

      1. "In experiment 1 we therefore sought to replicate the Rounis study, as well as extend it to the posterior parietal cortex, since this region in neuroimaging studies is very commonly co-activated with DLPFC".

      What do you mean by "this region"? PPC is an area large enough to consist of subregions that differ in cytoarchitectonics and in their pattern of structural connectivity, and the activity of these subregions may correlate in different ways with the activity in different subregions of DLPFC (e.g. Leech 2011). The same of course applies to DLPFC (see e.g. Opitz 2016 for a comparison of distinct DLPFC stimulation zones with respect to functional networks).

      Leech, R., Kamourieh, S., Beckmann, C. F., & Sharp, D. J. (2011). Fractionating the Default Mode Network: Distinct Contributions of the Ventral and Dorsal Posterior Cingulate Cortex to Cognitive Control. The Journal of Neuroscience, 31(9), 3217–3224. http://doi.org/10.1523/JNEU...<br /> Opitz, A., Fox, M. D., Craddock, R. C., Colcombe, S., & Milham, M. P. (2016). An integrated framework for targeting functional networks via transcranial magnetic stimulation. NeuroImage, 127, 86–96. http://doi.org/10.1016/j.ne...

      1. "Furthermore, we attempted to enhance the original Rounis design, by including an active TMS control (vertex), rather than sham stimulation."

      Is there any reason to assume that applying the same protocol twice to the same site (600 pulses to the vertex) controls for the effects of applying that protocol once to each of two different sites (300 pulses to each site)?

      1. "We were concerned that managing the relative frequency of subjective ratings of "clear" and "unclear" labels across an experiment may have placed additional working memory demands on participants, since they would need to keep a rough recent tally of each rating in order to balance them out. In addition, these labels were difficult to interpret psychologically on account of their relative nature. We therefore opted instead for the labels "[completely] random [guess]" and "[at least some] confidence." Using confidence instead of clarity labels is a common practice, consistent with other recent metacognition studies (24, 25)."

      What do you think about the possibility that, by replacing the introspective report with a different kind of metacognitive report, you investigated a different phenomenon (or different underlying processes) than Rounis 2010 did (see e.g. Overgaard and Sandberg 2012)? In the papers by Fleming that you refer to, the metacognitive assessment always follows the behavioural response, which means it relies on processes such as error monitoring (see e.g. Yeung and Summerfield 2012). In your paradigm the behavioural response is combined with the metacognitive rating, so it may be difficult to regard it as a metacognitive measure of confidence in choice ("Most notably, confidence in choice was used instead of visibility to determine metacognitive judgement.").

      Overgaard, M., & Sandberg, K. (2012). Kinds of access: different methods for report reveal different kinds of metacognitive access. Philosophical Transactions of the Royal Society B: Biological Sciences, 367(1594), 1287–1296. http://doi.org/10.1098/rstb...<br /> Yeung, N., & Summerfield, C. (2012). Metacognition in human decision-making: confidence and error monitoring. Phil. Trans. R. Soc. B, 367(1594), 1310–1321. http://doi.org/10.1098/rstb...

      1. "The AMT was defined as the lowest intensity that elicited at least 3 consecutive twitches, stimulated over the motor hot spot, while the participant was maintaining a voluntary contralateral finger-thumb contraction."

      There is no consistency in the literature about what is understood as AMT; the main differences concern the number of pulses required, the MEP amplitude required, and the level of muscular contraction. From this paper the reader cannot know which method was used. Even if it was the same as in Rounis 2010, that still says nothing, as she does not provide this information either.

      1. "cTBS was delivered with the handle pointing posteriorly and the coil placed tangentially to the scalp"

      What was the current direction used? If you did not reverse the current direction (to AP-PA in the brain), then the current flow (PA-AP) was the opposite of the optimal one (AP-PA), which presumably resulted in higher motor thresholds than those obtained with the optimal method.

      1. "The standard cTBS pattern used, as with the Rounis 2010 study, was a burst of three pulses at 50 Hz given in 200 ms intervals, repeated for 300 pulses (or 100 bursts) for 20 s."

      It may be good to mention whether the pulses were biphasic. Also, "given in 200 ms intervals" may confuse the reader, because she may not be sure whether the inter-burst interval (the time between the last pulse of one burst and the first pulse of the next burst) was 160 ms (as it should be) or 200 ms.
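
To make the timing question concrete, here is a minimal sketch that lists pulse onset times under the reading in which "200 ms intervals" means burst onset to burst onset. The function name and parameters are illustrative assumptions, not taken from the paper. Under this reading the gap between bursts is 160 ms and 100 bursts fit in 20 s; if 200 ms were instead the gap between bursts, the burst period would be 240 ms and the same 300 pulses would take about 24 s.

```python
def ctbs_pulse_times_ms(n_bursts=100, pulses_per_burst=3,
                        intra_burst_hz=50.0, burst_period_ms=200.0):
    """Pulse onset times in ms (hypothetical helper, for illustration only).

    burst_period_ms is the time from the first pulse of one burst to the
    first pulse of the next burst (an assumption about the protocol text).
    """
    pulse_gap_ms = 1000.0 / intra_burst_hz  # 20 ms between pulses at 50 Hz
    return [b * burst_period_ms + p * pulse_gap_ms
            for b in range(n_bursts)
            for p in range(pulses_per_burst)]


# Reading 1: 200 ms from burst onset to burst onset.
times = ctbs_pulse_times_ms()
n_pulses = len(times)                 # total number of pulses
inter_burst_gap = times[3] - times[2] # last pulse of burst 1 to first pulse of burst 2
last_pulse = times[-1]                # onset of the final pulse

# Reading 2: 200 ms of silence between bursts, i.e. a 240 ms burst period.
times_alt = ctbs_pulse_times_ms(burst_period_ms=240.0)
last_pulse_alt = times_alt[-1]

print(n_pulses, inter_burst_gap, last_pulse, last_pulse_alt)
```

Only the first reading is consistent with the stated total of 300 pulses in 20 s.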

      1. You have performed a lot of stimulations, have you forgotten that PPC was stimulated as well? There is no information in the paper on how PPC was determined; neither about the region of interest (within PPC) nor about the method used to target this region. Also, you may want to change PPN to PPC on the charts.

      2. Surprisingly, there are quite big differences in metacognitive sensitivity in the pre-TBS blocks of experiment 1, which makes it impossible to compare the effects resulting from stimulation of the different sites. Even more surprisingly, you do not address this issue in the discussion.

      3. "In this way, we could rigorously explore the within subject likelihood of both a metacognitive impairment (or enhancement) following DLPFC cTBS and no metacognitive change following vertex cTBS, with a potential single subject replication of this pattern."

      Doesn’t the lack of counterbalancing across the stimulation sites indicate this was not a "rigorous exploration" (e.g. because of an influence of behavioural learning)?

      1. "The remaining 17 participants are summarised in table 5. Ten of these participants had no meta d’ changes on the first DLPFC session, and thus were not asked to return for subsequent sessions."

      Does it mean that if you got the intended effect (by rejection of >50% of the participants), you would conclude that cTBS influences metacognitive sensitivity? I assume that you would not, therefore it may be difficult to follow the idea behind the rejection of participants who do not confirm the expectations of the researchers.

      1. "Of the remaining 7 participants, 3 showed the expected impairment, while 4 showed a clear metacognitive enhancement following DLPFC cTBS. 6 of these 7 participants also showed a clear metacognitive change for the vertex control session, and thus were not asked to return for the 3rd session (2nd DLPFC)."

      Still quite difficult to follow. The possibility of obtaining some significant effects caused by stimulation to the control site, in my opinion, represents the goal of the active control stimulation (performed in order to evaluate whether the potential significant effect of stimulation is site-specific). Also it probably shouldn’t be surprising to observe some effects in your control condition, as the vertex stimulation may influence the activity in DMN (e.g. Jung 2016).

      Jung, J., Bungert, A., Bowtell, R., & Jackson, S. R. (2016). Vertex Stimulation as a Control Site for Transcranial Magnetic Stimulation: A Concurrent TMS/fMRI Study. Brain Stimulation, 9(1), 58–64. http://doi.org/10.1016/j.br...

      1. "We have therefore not only failed to replicate the Rounis result, but provided evidence from our own experiments that on this paradigm there is no modulatory effect of theta-burst TMS to DLPFC on metacognition."

      This is not scientific evidence: this explanation is as likely as the one that you didn't apply the stimulation protocol properly (e.g. because it may work only when the current flow is perpendicular to the stimulated structure). Generalisations such as "no modulatory effect of theta-burst TMS" may not be accurate, especially when only the short version of one type of TBS protocol is used (300 pulses of cTBS). The same holds for "DLPFC": this is a general term covering multiple subregions, and the stimulation in your study was (probably) applied to just one of them.

      1. "First, it may well be that cTBS of cortex, at the medically safe stimulation thresholds commonly employed (80% of active motor threshold) is just not intense enough to induce a subtle cognitive effect, such as a reduction in metacognitive sensitivity."

      Is there any way to verify this explanation? For example, by providing the reader with the information about the average MSO, the current direction used, the method used to determine AMT?

      1. "To our knowledge, only one published paper to date, besides that of Rounis and colleagues, has demonstrated the general efficacy of DLPFC cTBS in modulating cognitive performance (38)."

      What about, e.g.: cTBS applied to the left DLPFC impairs MCST performance (Ko 2008); DLPFC stimulation changes subjective evaluation of percepts, i.e. metacognition (Chiang 2014); cTBS over the left DLPFC decreases medium load working memory performance (Schicktanz 2015). Moreover, Rahnev 2016 reported that both cTBS applied to right aPFC and cTBS applied to right DLPFC affected metacognition. Is there any reason to ignore the results that are not consistent with the view presented in the discussion?

      Ko, J. H., Monchi, O., Ptito, A., Bloomfield, P., Houle, S., & Strafella, A. P. (2008). Theta burst stimulation-induced inhibition of dorsolateral prefrontal cortex reveals hemispheric asymmetry in striatal dopamine release during a set-shifting task – a TMS–[11C]raclopride PET study. European Journal of Neuroscience, 28(10), 2147–2155. http://doi.org/10.1111/j.146<br /> Schicktanz, N., Fastenrath, M., Milnik, A., Spalek, K., Auschra, B., Nyffeler, T., … Schwegler, K. (2015). Continuous Theta Burst Stimulation over the Left Dorsolateral Prefrontal Cortex Decreases Medium Load Working Memory Performance in Healthy Humans. PLoS ONE, 10(3). http://doi.org/10.1371/jour...<br /> Chiang, T.-C., Lu, R.-B., Hsieh, S., Chang, Y.-H., & Yang, Y.-K. (2014). Stimulation in the Dorsolateral Prefrontal Cortex Changes Subjective Evaluation of Percepts. PLOS ONE, 9(9), e106943. http://doi.org/10.1371/jour...<br /> Rahnev, D., Nee, D. E., Riddle, J., Larson, A. S., & D’Esposito, M. (n.d.). Causal evidence for frontal cortex organization for perceptual decision making.

      1. "Following a 1 minute interval, this was repeated at a different site for a further 20s (or again on the vertex in the control condition), determined by which group the participant was assigned to. The five groups were: i) bilateral DLPFC, ii) bilateral PPC, iii) left DLPFC and PPC, iv) right DLPFC and PPC, and v) VERTEX (control)."

      Did you counterbalance the starting sites of the stimulation?

      1. "However, the fact that we did not observe metacognitive impairment reliably in any subject in experiment two speaks against interpreting our null results simply in terms of missing the DLPFC during cTBS."

      Does it? Following this way of reasoning one may conclude you missed the DLPFC in the first experiment, as you observed the effect just for some of the participants.

      1. "... our results nevertheless indicate that the cTBS approach is not sensitive enough to establish a causal link between DLPFC and metacognitive processes."

      Could this stem from the fact that you used a short version of the protocol (300 pulses)? It is possible that conventional cTBS (600 pulses) is excitatory in its first half and switches to inhibition only after the full-length protocol (see e.g. Gamboa 2010), so applying 300 cTBS pulses may result either in no change or in small inhibitory/excitatory effects. Or could it rather result from the possibility that the site within DLPFC you were targeting has nothing to do with metacognitive processes?

      Gamboa, O. L., Antal, A., Moliadze, V., & Paulus, W. (2010). Simply longer is not better: reversal of theta burst after-effect with prolonged stimulation. Experimental Brain Research, 204(2), 181–187. http://doi.org/10.1007/s002...

    1. On 2014-06-10 13:49:57, user Authors of the manuscript wrote:

      Dear Mike X Cohen,

      this kind of personal commenting is much more helpful and constructive for the authors than the anonymous peer-review process and we thank you for taking your time to write this comment. We respond to some of your points in the following:

      MXC: “It is not always clear whether the authors are criticizing the biophysical interpretation of CFC analyses, or the mathematical foundations of CFC methods. Perhaps it would be useful for the authors to define the situations under which CFC could be validly interpreted, and what exactly the neurobiologically meaningful interpretation would be.<br /> Concerning the former, the authors accurately state that relatively little is understood about the neural mechanisms that could produce CFC, and this may impede interpretations of empirical findings (the same criticism applies to most macroscopic measures of brain activity, including ERPs, time-frequency power, most measures of functional connectivity, the FMRI BOLD response, etc.).”

      Authors:

      We agree with this comment in the sense that indeed many measures in Neuroscience depend on an interpretational step. However, in contrast to the current handling of CFC, these aspects are well acknowledged for measures like BOLD and ERP. In addition there have been intense efforts to disentangle various generating mechanisms of BOLD signals and ERPs. (For the origin of the BOLD signal, the role of astrocytes, lactate, and calcium see for example: Niessing et al, Science, 2005; Logothetis et al., Nature, 2001; Barros, TINS, 2013; Petzold&Murthy, Neuron, 2011; Iadecola&Nedergaard, Nat Neurosci, 2007 . For generating principles of the ERP see for example: Mazaheri & Jensen, J Neurosci, 2008; Turi et al. NeuroImage, 2012; Telenczuk et al, J Neurophysiol, 2010, and references therein).

      In these fields, the variety of generating mechanisms is typically discussed and wording is carefully chosen. With respect to the interpretation of CFC measures, this care is often lacking. Moreover, the mathematical methods of CFC are more involved compared to standard BOLD-fMRI or ERP analyses. Therefore, plain technical errors in published work occur more frequently than in either ERP or BOLD fMRI studies.<br /> _____

      MXC: “Their suggestion for researchers to label their CFC analyses as relatively exploratory vs. confirmatory and as a marker vs. biophysical understanding (figure 5) is also sensible (this suggestion also could be applied to most or perhaps all measures of brain activity). The reliance on DCM should be cautioned against the over-parameterization and opaqueness of DCM models used in practice.”

      Authors:

      We agree with this comment insofar as the mathematics involved in DCMs is necessarily much more involved than that in the current standard CFC analyses. In our opinion, however, this is outweighed by the advantage of being able to state the relative odds for and against the presence of a CFC mechanism in the data. Moreover, we also agree that the mathematical complexity of model specification indeed results in a certain opaqueness, especially for lay readers.

      We disagree with the criticism of over-parametrization, as models selected by Bayesian model comparison need two properties: (1) the ability to explain the data well, and (2) generalizability. The latter is ensured by automatically favoring models that explain the data well without using an excessive number of parameters, thus implementing Occam's razor. However, it is indeed necessary to carefully specify models for comparison, that are plausible a priori, based on existing knowledge (Lohmann et al, NeuroImage, 2013; comments by Friston et al, NeuroImage, 2013; Breakspear, NeuroImage, 2013; reply by Lohmann, NeuroImage, 2013). This requirement may mean that DCMs of CFC will have to wait until the mechanisms underlying CFC are spelled out more explicitly using interventions.<br /> ____

      MXC: “the general point is that methods for assessing CFC are not necessarily confounded just because their results can be difficult to interpret from a neurophysiological perspective. Let me explain this by analogy: Imagine comparing ten randomly selected negative numbers with ten randomly selected positive numbers. A t-test would indicate statistical significance, but this significance is uninterpretable. However, the reason that the result is uninterpretable is not due to a confound of the t-test, but rather, due to the assumptions underlying the data collection. Imagine you received the same numbers but were told that they reflected measurements of relative alpha-band power in conditions A and B. Now the same result would be interpretable.”

      Authors:

      Indeed, in some sense the whole first part of our paper illustrates the variety of different but equally plausible reasons behind a CFC signature, or different possible interpretations if you wish. So, why do we call them "methodological confounds"?

      Taking an analogy with the t-test might help us here, though we think that the analogy provided by MXC is slightly misleading and prefer a different version of the analogy. Namely, when you make a t-test, the un-interpretability is not only about the "origin of the data" (as in the example of MXC), but also (and actually even more) about the "nature of the data".

      The t-test makes specific assumptions about the underlying probability distribution (e.g. normality), and when these assumptions do not hold, the p-value obtained might very well just reflect the fact that the underlying distribution did not match them.

      This is similar to CFC - we do not claim that the CFC measures are wrong, but in some sense show that the underlying assumption that there is real coupling in the data might well be doubted (for several reasons explained in the text). We show how alternative assumptions (i.e. non-linearity, common drive etc) could as well account for high CFC values. I.e. the CFC measure describes the amount of coupling only if we already assume the existence of this coupling, and the absence of the other mechanisms, or their constancy over experimental conditions.

      Maybe "methodological confounds" sounds more appropriate if one also keeps this analogy in mind: if the methodology is applied when its assumptions are in doubt, the results are not interpretable. It is the same with the t-test: if it is applied to an arbitrary distribution, one is not able to draw conclusions. This is not a fault of the t-test. However, we would end up with a possible confound if we did NOT know what the underlying distribution is, but still applied the t-test. In the case of CFC analysis we do not have a good understanding of the underlying biophysics, but still apply the CFC measure and try to interpret it.

      It might be useful to compare two different possibilities of expanding the acronym CFC - either Cross-Frequency Correlation or Cross-Frequency Coupling. The latter indicates biophysical interaction and even causality and is the one used now in the literature. Our article discusses at length why in fact we should rather hold to Cross-Frequency Correlation. Moreover, we explain that even in this case it is important to try to partial out the effects that could diminish the specificity of CFC as a marker.<br /> ______

      MXC: “Their first example is the van der Pol oscillator. The authors claim that CFC here reflects a confound, because (page 3) “there is no simple physical interpretation for the different frequency components of the oscillator.” The interpretation depends entirely on the assumptions of the signal. If this were a neural signal, one might interpret that certain phases of the lower frequency oscillation regulate the variability of faster activity (as an aside, the lack of band-limited activity in Figure S1 is a classic situation of when *not* to interpret results as reflecting an oscillation; this has been discussed since the 1990’s by, among other researchers, Singer, Tallon-Baudry, Pfurtscheller, Miller). This is readily apparent by plotting the van der Pol signal along with its rectified derivative, which can be obtained with the Matlab code below:

      ode = @(t,y) vanderpoldemo(t,y,1);

      [t,y] = ode45(ode,[0 20],[2 0]);

      plot(t,y(:,1)), hold on

      plot(t(1:end-1),abs(diff(y(:,1)))*8,'r')

      The problem here is not with the measure of CFC. In fact, I do not see a problem at all; the authors simply tested a method on simulated data and got a result, much like a t-test on signed random numbers would produce a result. Here is another, even more striking, example:

      t=0:1/1000:1;

      plot(t,sin(2*pi*40*t) .*sin(2*pi*t))

      As with the van der Pol illustration, one can say that CFC here is uninterpretable because there is no interaction amongst subsystems; there is simply a 40-Hz sine wave multiplied by a 1-Hz sine wave (this could occur from two independent systems with wave cancelation at the recording electrode). Again, the problem is not with the CFC measure, but that the simulated data do not lend themselves to a neurobiological interpretation of CFC.”
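
The second example in the quoted comment can be checked numerically. The sketch below is written in Python rather than the Matlab of the quoted snippet and uses only the standard library. It relies on the identity sin(a)·sin(b) = ½[cos(a−b) − cos(a+b)], so the product of a 40 Hz and a 1 Hz sine should contain components at 39 Hz and 41 Hz and none at 40 Hz. The names `FS`, `N` and `dft_amplitude` are illustrative, and the use of exactly 1000 samples (instead of the 1001 in the quoted code) is an assumption made so that each DFT bin corresponds to a whole number of Hz.

```python
import cmath
import math

FS = 1000  # sampling rate in Hz, as in the quoted snippet
N = 1000   # one second of data, so DFT bin k corresponds to k Hz (assumption)

# 40 Hz sine multiplied by a 1 Hz sine.
signal = [math.sin(2 * math.pi * 40 * n / FS) * math.sin(2 * math.pi * 1 * n / FS)
          for n in range(N)]


def dft_amplitude(x, k):
    """Amplitude at DFT bin k, scaled so a unit-amplitude sinusoid gives 1."""
    acc = sum(v * cmath.exp(-2j * math.pi * k * n / len(x))
              for n, v in enumerate(x))
    return 2 * abs(acc) / len(x)


amps = {k: dft_amplitude(signal, k) for k in (39, 40, 41)}
for k, a in amps.items():
    print(f"{k} Hz: amplitude {a:.3f}")
```

One practical consequence is that a band-pass filter around 40 Hz narrower than about ±1 Hz would remove both side bands, and with them the amplitude modulation that a phase-amplitude measure relies on.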

      Authors:

      Indeed, “the simulated data do not lend themselves to a neurobiological interpretation of CFC”, and neither do the neurobiological data at the moment. This is one of the main points of the manuscript.

      The problem is that, for now, the neurobiological measurements might not lend themselves to the “coupling” interpretation of CFC. The CFC analysis has been adopted and is used with a certain aim and interpretation. Thus it seems fair to say that if the methodology does not provide the answers and interpretations it should, we are dealing with "methodological confounds".

      The examples brought up show that without further assumptions and knowledge of the underlying neurobiology, current methodology is unable to discriminate between various basic but very different interpretations. In analogy with the T-test example above, similar other toy examples treated with a T-test would illustrate what could happen if the underlying distribution did not match the assumptions (i.e. normality) - and why a T-test is not applicable without checking its assumptions first.

      As we mention in several places, this is not a problem when one tries to use the CFC measure only as a MARKER, however the problem comes when one goes one step further in the interpretation, trying to give a particular (physiological) meaning to CFC (“high frequency oscillations modulated by low frequency phase” or something along these lines).

      Also, notice that your second example (modulated sinusoids) does tell you something about which parameters (in terms of bandwidth) should be used so that the CFC measure would be closer to its desired interpretation.<br /> ____

      MXC: “Their other examples are also not compelling as identifying any confounds with CFC measures. Prime numbers are nonrandom sequences with a periodic structure (http://xxx.lanl.gov/pdf/cond-m... and anyway, true random sequences can appear non-random at small N. A more serious concern is that the authors are interpreting CFC in random data or in ECoG data with non-linearity introduced (Figure S6) without performing any statistics to justify the interpretation of CFC. Analogously, a t-statistic on random numbers is unlikely to be exactly 0; it is only through evaluation of that t-statistic with respect to a null hypothesis distribution that a t-value of, say, 1.5 can be interpreted.”
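
The point about judging a statistic against a null distribution can be sketched with a generic label-shuffling permutation test. This is only an illustration in Python using the standard library. It is not the surrogate procedure of Tort or of Canolty et al., and the function name and example numbers are assumptions.

```python
import random


def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in means.

    Hypothetical helper: group labels are shuffled n_perm times to build a
    null distribution of the absolute mean difference.
    """
    rng = random.Random(seed)
    n_a = len(a)
    observed = abs(sum(a) / n_a - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        left, right = pooled[:n_a], pooled[n_a:]
        if abs(sum(left) / len(left) - sum(right) / len(right)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Ten negative versus ten positive numbers, as in the earlier analogy.
negatives = [-float(i) for i in range(1, 11)]
positives = [float(i) for i in range(1, 11)]
p_separated = permutation_p_value(negatives, positives)

# Two samples drawn from the same distribution.
rng = random.Random(1)
same_a = [rng.gauss(0.0, 1.0) for _ in range(10)]
same_b = [rng.gauss(0.0, 1.0) for _ in range(10)]
p_same = permutation_p_value(same_a, same_b)

print(p_separated, p_same)
```

The observed statistic is only interpretable relative to the shuffled values, and a small p-value still says nothing about what the numbers mean.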

      Authors:

      Interestingly enough, prime-numbers, when one partials out the fact that there is only one even prime number, one prime number that is divisible by three etc, seem to be best described as what are called pseudo-random numbers. (See for example any of Terence Tao’s blog posts or presentations on “primes and pseudorandomness”.) So at least for now, to our knowledge, there seems to be no reason to believe that there is cross-frequency coupling behind any process we might expect to generate prime numbers. ;) But of course this is just an illustration of how hard it is to conclude anything about mechanistic processes by just using a CFC measure. As a side note, one should also not forget that still some care is needed when interpreting such statistics, i.e. recall the numerical information on the change of sign between \pi(x) and li(x) and Skewes’ numbers. ;) But probably none of us is an expert on primes and knows exactly why they give rise to a high CFC index. We reason in the article that even in the case of the CFC measured from the brain, this “why” still continues to have a multitude of possible answers.

      Now, more seriously, for the ECoG and random data we use exactly the same procedure as is usual in CFC analysis. Indeed, we used the code provided by Tort for the modulation index, and the code provided by Canolty et al. from their Science paper, and hence their respective surrogate analyses (and in our text it was indicated that the results were significant). In addition, for the non-linearity case we even provided a simple example (supplementary material) where we derived analytically that quadratic non-linearities lead to CFC. <br /> ____
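
For readers without the supplementary material at hand, the following is a rough sketch of how such an argument can go. It is not the derivation from the supplement. The signal model, the symbols $A$, $a$, $c$, $\omega_s$, $\omega_f$ and the static quadratic form are assumptions chosen for illustration: two independent sinusoids are summed and then passed through a quadratic non-linearity.

```latex
\begin{align*}
x(t) &= A\cos(\omega_s t) + a\cos(\omega_f t), \qquad \omega_s \ll \omega_f,\\
y(t) &= x(t) + c\,x(t)^2\\
     &= A\cos(\omega_s t) + a\cos(\omega_f t)
        + cA^2\cos^2(\omega_s t)
        + 2cAa\cos(\omega_s t)\cos(\omega_f t)
        + ca^2\cos^2(\omega_f t).
\end{align*}
% The squared terms contribute only a constant offset and components at
% 2\omega_s and 2\omega_f. Collecting the terms in the band around \omega_f:
\begin{align*}
y_{\mathrm{fast}}(t) &= a\,\bigl[\,1 + 2cA\cos(\omega_s t)\,\bigr]\cos(\omega_f t).
\end{align*}
```

In this sketch the envelope of the fast component, $a\,[1 + 2cA\cos(\omega_s t)]$, follows the phase of the slow component even though the two sinusoids in $x(t)$ do not interact, so a phase-amplitude measure applied to $y(t)$ would report coupling that comes from the non-linearity alone.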

      MXC: “Another issue identified by the authors is the potential confound of co-occurring but independent low-frequency phase and high-frequency power dynamics. This is a potential confound (discussed in Cohen, 2014, Analyzing Neural Time Series Data; figure 30.7) but is fairly easy to identify and address (including: avoiding interpreting CFC from immediate post-stimulus periods, removing the phase-locked time-domain signal before computing CFC, and inspecting whether the time-course of CFC differs from the time-course of phase clustering). Perhaps the authors have additional suggestions?”

      Authors:

      As we note in our manuscript “if a brain area under a recording electrode receives time-varying input from any other brain area, this input might generate similar dependencies across frequency components (Figure 4A). The problem is that usually one has no control over the timing of the internal input to the examined brain area (Figure 4B). Thus, phase-amplitude coupling measured anywhere in the brain can be potentially explained by common influence on the phase and amplitude, without the phase of a low frequency oscillation modulating the power of high frequency activity.” The improvements mentioned in your commentary do not help to identify and address the problems with INTERNAL input, where we have no idea about the onset time (see Figure 4).

      ____

      MXC: “Later, they write (pages 9-10 and figure 4) "If a brain area under a recording electrode receives time-varying input from any other brain area, this input might generate similar dependencies across frequency components." This does not seem to be a confound, but rather, a description of CFC: low-frequency oscillations from a distal brain region modulate local activity, as manifest in higher frequency oscillations. Perhaps if the authors would identify a mechanism/consequence of CFC for neural activity it would be easier to understand whether/how this is a confound.”

      Authors:

      There is a misunderstanding here. We would NOT agree with the interpretation that “low-frequency oscillations from a distal brain region modulate local activity, as manifest in higher frequency oscillations”. Instead we clearly write in our manuscript that “non-stationary input to a given area simultaneously affects the phase of a low frequency component and increases high-frequency activity (common drive to frequency components of the same signal).” This means that the low frequency phase is modulated and the high frequency component is influenced by the same common drive to the area. As we conclude: “In this case, high-frequency amplitude increases occur preferentially for certain phases of slow oscillations even without any need of interaction between the two rhythms.” (See also Figure 3). Again, we would agree on this point if CFC stood for Cross-Frequency Correlation rather than Cross-Frequency Coupling, as the latter indicates interaction or causality.

      ____

      MXC: “On page 6, the authors write “The main conclusion is – not that surprisingly - that a clear peak in the power spectrum of the low frequency component is a prerequisite for a meaningful interpretation of any CFC pattern.” The justification does not follow. If one is interested in *phase* dynamics, why does there need to be a peak in *power*? Assuming that phase reflects the timing of neural populations while power reflects their spatial coherence at the LFP level, why is spatial coherence considered a prerequisite for investigating timing? In real EEG data, power and phase dynamics are often independent of each other.”

      Authors:

      It is not at all necessary here to think about which neural processes the phase or power variable could reflect. The reason why a peak in the power spectrum is a prerequisite for a meaningful interpretation of phase (as an index that is a parameter of an oscillation) is well known in the physics/electrical engineering community and simply comes from the signal processing perspective: phase can be meaningfully defined only for narrow-band (and slowly frequency-varying) oscillatory signals for which the phase grows monotonically (please see page 35 of the manuscript: Supplementary discussion - conditions for a meaningful phase). Note that although narrow-band filtering a signal enhances smooth dynamics of its phase, it does not improve its physical interpretability.

      ____

      MXC: “A related discussion is potential differences in power across conditions. CFC methods generally measure the relationship between power and phase, not the magnitude of power. Appropriate permutation-based statistical corrections will account for differences in the magnitude of power (Cohen, 2014, chapter 30).”

      Authors:

      Yes, we agree that this is something one can indeed control for; we just point out that this is not always done in the literature (see literature review).

      ____

      MXC: “The potential confound of low power for estimating phase (Muthukumaraswamy & Singh, 2011) applies only for very low SNR; in real EEG data, power and phase dynamics are often easily disambiguated and unrelated to each other.”

      Authors:

      The level of SNR for EEG depends on the frequency band considered and on the stimulation elicited by the experimental protocol. Here the main point is that many studies compare CFC between conditions that elicit very different power in a given band (e.g. peak vs no peak). Thus there is straight away a bias in the reliability of the phase estimation and therefore of the phase-amplitude coupling. How big this effect is should be assessed for each dataset. In addition, the amplitude and phase defined by the analytic signal approach (using Hilbert transforms) are not fully independent and even a nominal change in one of them induces a perturbation in the other (Supplementary Figure 7B).

      ____

      MXC: “Table 1 should include citations of the papers surveyed; otherwise independent verification is not possible.”

      Authors:

      We feel that the description preceding the literature review enables anyone to find the respective papers (as the years, journals and search criteria have been mentioned, a simple PUBMED search can provide the explicit list of papers considered). The magic paper is the one we added manually, which we indeed can identify here - Saalmann et al., 2012 in Science. The literature review covers papers up to and including January 2014.

    1. Abstract: It has been empirically established that genome mixing between divergent species can trigger meiotic aberrations, ultimately leading to the emergence of asexual reproduction through the production of unreduced gametes in various metazoan lineages. Yet, it remains poorly understood how such asexual hybrids cope with co-inherited differences in sex determination systems, diverged regulatory networks, and chromosomal incompatibilities, especially in the context of increased ploidy. Addressing these questions requires high-quality, chromosome-level reference genomes of the parental species involved in hybrid formation. Here, we present the first chromosome-level genome assemblies for three hybridizing Cobitis species (C. elongatoides, C. taenia, and C. tanaitica), providing a comprehensive framework to investigate the genetic and cytogenetic basis of hybrid sterility and the transition to asexuality. By integrating genome scaffolding, male/female pooled sequencing, and molecular cytogenetics, we uncover extensive structural variation among homologous chromosomes of the three species, despite their overall syntenic conservation. Population-level Pool-Seq analyses further revealed that each species possesses a distinct, non-homologous sex chromosome, highlighting sex chromosome turnover even among recently diverged lineages. These assemblies enabled the design of chromosome-specific painting probes, which we applied to meiotic metaphase I spreads of diploid hybrids. This approach revealed striking differences in the pairing success of orthologous chromosomes, with some (e.g., Ch01B) frequently forming bivalents, while others (e.g., Ch01A, Ch05, Ch20) failed to do so and remained unpaired. Our results demonstrate that chromosome-specific features, shaped by structural evolution and sex-linked divergence, contribute unequally to hybrid meiotic failure. Together, this work provides a high-resolution genomic and cytogenetic framework to understand how interspecific hybridization gives rise to clonality, and how the architecture of inherited parental genomes shapes the success or breakdown of meiosis in hybrid vertebrates.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giag031), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 1:

      The authors assembled the genomes of three Cobitis species native to Eurasia in an attempt to investigate the effects of structural variants on hybrid meiotic failure. This is certainly an interesting topic given the advances in our abilities to study hybridization that have been enabled by modern genomic sequencing methods, and the evolutionary consequences of asexually-reproducing species that result from rare instances of these hybrid events.

      Major comments: The introduction of the manuscript is well-written and focused on the topic at hand. Language was mostly clear throughout the manuscript. However, the paper overall is very lengthy and would benefit from extensive revision. Personally, I think the assembly and annotation of the three genomes is worthy of being a paper (genome report) on its own. Extraction of this material into a separate manuscript would allow the authors to hone the remainder of the paper into a much more concise and focused manuscript. Some aspects of the methods section related to genome assembly and annotation could be clarified and/or bolstered. Presentation of methods is mostly clear, but the description of genome annotation methods is a bit tough to follow. This procedure included many complicated steps and may benefit from a flow chart, even if included only as a supplemental figure.

      Several important quality control steps pertaining to genome assembly and DNA/RNA sequence processing were not mentioned. Authors do not report methods used for quality filtering or trimming. They do not report any process for removal of sequencing adapters. Additionally, they do not report screening of the genome assemblies for contamination from other species. These are critical steps in producing high-quality genome assemblies that need to be addressed.

      Presentation of statistics describing genome assembly quality, contiguity, and completeness could be improved. Authors might want to take some inspiration from statistics required for reporting in genome reports published by other journals, such as G3 or Genome Biology and Evolution. Sequencing depth is not reported in any context for the initial assemblies. Only log-transformed values are available in a single figure. Throughout the manuscript, authors conflate sequencing coverage (the proportion of a genome or genomic region that has been sequenced) with sequencing depth (the number of times a base or genomic region has been sequenced).
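      The coverage/depth distinction drawn above can be made concrete with a small sketch. It assumes a hypothetical list of per-base read depths (for example, values parsed from a per-position depth table); the function name and the `min_depth` parameter are illustrative and do not come from the manuscript.

```python
def breadth_and_depth(per_base_depth, min_depth=1):
    """Return (breadth of coverage, mean depth) for a list of per-base depths.

    Breadth is the proportion of positions sequenced at least `min_depth` times;
    mean depth is the average number of times a position was sequenced.
    """
    n = len(per_base_depth)
    if n == 0:
        raise ValueError("per_base_depth must not be empty")
    covered = sum(1 for d in per_base_depth if d >= min_depth)
    return covered / n, sum(per_base_depth) / n


# Hypothetical 10-bp region: half the positions are never sequenced.
depths = [0, 0, 10, 12, 8, 0, 30, 20, 0, 0]
breadth, mean_depth = breadth_and_depth(depths)
print(breadth, mean_depth)  # 0.5 8.0
```

      In this toy region the mean depth looks respectable even though half of the positions are uncovered, which is why reporting both values separately is informative.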

      For the sex-linked primers designed by the authors - I would recommend development of an internal positive control that would be expected to amplify in both sexes and be easily distinguishable from the sex-linked locus by size or fluorescent label. This allows the users to distinguish between failed PCRs and identification of the homogametic sex. This is especially important because the fish selected for marker development were collected from a relatively small portion of the species' distributions (Figure 1) so there could be population-specific differences that affect reliability of these markers for identifying sex. This is a problem I regularly encounter in my own work for wide-ranging species.

      I was also surprised that the authors did not conduct a GWAS analysis. That seems to be a fairly typical analysis included in studies of this type to elucidate sex-linked SNPs. It would add to an already extensive manuscript; however, this could add an additional argument for splitting this manuscript in two. It would provide more space to include it in a more focused manuscript.

      The results section contains many statements that would be more appropriate in the Methods section, or could be deleted entirely because they are redundant with statements already present in the Methods section. Additionally, there are some sentences that are more appropriate for inclusion in the Discussion section because they are interpretive. I have included examples under the 'Minor comments' section of this review. Some of the material presented as results in the Supplementary tables is presented in a confusing manner, and appears to contain errors (see examples in 'Minor comments' section below).

      The first several paragraphs of the Discussion section either repeat material already covered in the Results section, or go on tangents that are not directly related to the main purpose of the paper. However, some of it could be more appropriate to include in a genome report if the authors split the manuscript in two.

      Given the above issues, I find that the paper needs extensive editing and possibly more analytical work (if some of the methodological deficiencies were overlooked in the analysis phase as well as the writing phase of this project). It is unlikely this work could be accomplished in the normal window for a revision. Therefore, I regrettably suggest rejection of the manuscript.

      Finally, I have no meaningful experience with FISH probes or chromosomal painting so unfortunately, I can't provide much comment on those portions of the paper.

      Minor comments:

      - Line 291: please provide specific version number for Hisat2
      - Line 319: version numbers for D-Genies and SyRI missing
      - Line 331: version number for NGenomeSyn missing
      - Line 439-440: Authors provide N50 values, but the paper would benefit from providing some additional metrics, such as N90 and L90, to help readers gauge the contiguity of these genomes.
      - Line 442 - 443: I'm having a hard time understanding how the authors are calling these 'chromosome-level' assemblies when nearly a third (>30%) of the genome of two species (C. tanaitica and C. elongatoides) could not be assembled into chromosomal scaffolds.
      - Line 457 - 458: Either the term 'topologically associated domains' is missing, or the authors need to remove the parentheses from around TADs if it was defined earlier in the manuscript.
      - Line 470: change 'less' to 'fewer'
      - Line 483 - 486: The statements that observed patterns of repeat families 'suggest' something are interpretive and should be moved to the discussion.
      - Line 499 - 500: This sentence repeats content of the methods section. I suggest deleting it.
      - Line 540 - 564: If I am understanding correctly, the discussion of 'coverage' here would be more accurately described as 'depth' since the authors seem to be talking about average sequencing depth in different areas of the genome. Furthermore, authors never provide untransformed measures of sequencing depth in any context (the initial genome assemblies, pool-seq data, re-sequenced individuals, etc.). Therefore, it is difficult to determine if the differences being discussed here are derived from data with enough statistical power to measure differences in sequencing depth between male and female fish.
      - Lines 614 - 619: This could be explored with GWAS
      - Lines 635 - 641: Much of this paragraph is a description of methods and belongs in the Methods section.
      - Lines 664 - 667: Much of this is interpretive - more appropriate for the discussion.
      - Lines 700 - 711: This paragraph has little or no relevance to the main topic of this paper (hybrid meiotic failure).
      - Line 745: remove "loci's"
      - Line 813 - 815: PMER was already defined earlier in the paper.
      - Line 854: I suggest removal of "the first of their kind in an asexually reproducing vertebrate," because such statements rarely age well, and the concept behind the paper is interesting enough to stand on its own without pointing out the novelty of it being the 'first' time it was detected.
      - References section: Capitalization of article titles varies from one reference to the next. Scientific names are sometimes italicized; other times they are not.
      - Table 2: 'L50' and 'Number of Chromosomes' are always going to be integers. Why are there two significant digits to the right of the decimal point?
      - Supplementary Figure S2: 'Cobitis' should be italicized.
      - Supplementary Table S7: This table presents pre- and post-HiC values in a confusing manner that is nonsensical and probably erroneous. For example, the N50 values seem problematic. How do you have a 154 Kbp pre-HiC N50 contig value for C. elongatoides, but a 154 Mbp post-HiC N50 contig value for the same species? This is longer than the longest reported chromosome for any species (C. taenia) in Supplementary Table S8 (99 Mbp).
      - Supplementary Table S10: I don't know what the percentages in line 33 refer to?
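      For readers less familiar with the contiguity metrics requested in the minor comments (N50, N90, L50, L90), a minimal sketch of how they are conventionally computed from a list of contig or scaffold lengths follows. The function name `nx_lx` and the example lengths are hypothetical and are not taken from the manuscript's tables.

```python
def nx_lx(lengths, percent):
    """Return (Nx, Lx) for sequence lengths and an integer percentage such as 50 or 90.

    Nx is the length of the shortest sequence in the smallest set of longest
    sequences that together span at least `percent` of the assembly;
    Lx is the number of sequences in that set (always an integer).
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length, count
    raise ValueError("lengths must be a non-empty list of positive integers")


# Hypothetical assembly of seven contigs (total length 300).
contigs = [80, 70, 50, 40, 30, 20, 10]
print(nx_lx(contigs, 50))  # (70, 2)
print(nx_lx(contigs, 90))  # (30, 5)
```

      Because Lx is a count of sequences, it can only take integer values, which is the point raised about Table 2.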

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      In this paper, Stanojcic and colleagues attempt to map sites of DNA replication initiation in the genome of the African trypanosome, Trypanosoma brucei. Their approach to this mapping is to isolate 'short-nascent strands' (SNSs), a strategy adopted previously in other eukaryotes (including in the related parasite Leishmania major), which involves isolation of DNA molecules whose termini contain replication-priming RNA. By mapping the isolated and sequenced SNSs to the genome (SNS-seq), the authors suggest that they have identified origins, which they localise to intergenic (strictly, inter-CDS) regions within polycistronic transcription units and suggest display very extensive overlap with previously mapped R-loops in the same loci. Finally, having defined locations of SNS-seq mapping, they suggest they have identified G4 and nucleosome features of origins, again using previously generated data. Though there is merit in applying a new approach to understand DNA replication initiation in T. brucei, where previous work has used MFA-seq and ChIP of a subunit of the Origin Recognition Complex (ORC), there are two significant deficiencies in the study that must be addressed to ensure rigour and accuracy.

      (i) The suggestion that the SNS-seq data is mapping DNA replication origins that are present in inter-CDS regions of the polycistronic transcription units of T. brucei is novel and does not agree with existing data on the localisation of ORC1/CDC6, and it is very unclear if it agrees with previous mapping of DNA replication by MFA-seq due to the way the authors have presented this correlation. For these reasons, the findings essentially rely on a single experimental approach, which must be further tested to ensure SNS-seq is truly detecting origins. Indeed, in this regard, the very extensive overlap of SNS-seq signal with RNA-DNA hybrids should be tested further to rule out the possibility that the approach is mapping these structures and not origins.

      (ii) The authors' presentation of their SNS-seq data is too limited and therefore potentially provides a misleading view of DNA replication in the genome of T. brucei. The work is presented through a narrow focus on SNS-seq signal in the inter-CDS regions within polycistronic transcription units, which constitute only part of the genome, ignoring both the transcription start and stop sites at the ends of the units and the large subtelomeres, which are mainly transcriptionally silent. The authors must present a fuller and more balanced view of SNS-seq mapping, across the whole genome, to ensure full understanding and clarity.

      In the revised manuscript, the authors have improved the presentation and analysis of their data, expanding the description of SNS-seq mapping across the genome, and more clearly assessing to what extent there is correlation between SNS-seq signal and previous mapping approaches to predict origins (by MFA-seq and ChIP-chip of ORC1/CDC6). With regard to the correlation between SNS-seq and ORC1/CDC6 ChIP-chip, it should be noted that the two datasets were generated in distinct strains of T. brucei (Lister 427 and TREU927, respectively), and it is unclear if the latter dataset can be accurately mapped to the strain used here. Notwithstanding this concern, these improvements clarify a number of aspects of the SNS-seq mapping: (1) the signal is more prevalent in the transcribed core of the genome than in the largely transcriptionally silent subtelomeres; and (2) whereas previous work revealed strong correlation between ORC1/CDC6 localisation and MFA-seq peaks at the ends of multigene transcription units, neither of these data show significant overlap with SNS-seq signal, which is not seen at transcription start or stop sites ('SSRs'; supplementary Fig.8D) and shows marked depletion at predicted ORC1/CDC6 sites (supplementary Fig.8C). To the authors' credit, they acknowledge this lack of correlation in the discussion.

      The authors have not provided any new data to substantiate their assertion that SNS-seq accurately detects origins in T. brucei, and therefore the work rests on a single experimental approach, without validation. As a result, the suggestion of abundant, previously undetected origins in the intergenic regions of multigene transcription units remains a prediction. One key untested limitation of the work lies in the observation that the very large majority of SNS-seq signal overlaps with previously mapped RNA-DNA hybrids; without an experimental test, the suggestion that the authors have 'disclosed for the first time a strong link between RNA:DNA hybrid formation and DNA replication initiation' remains conjecture.

      Reviewer #2 (Public review):

      Summary:

      Stanojcic et al. investigate the origins of DNA replication in the unicellular parasite Trypanosoma brucei. They perform two experiments, stranded SNS-seq and DNA molecular combing. Further, they integrate various publicly available datasets, such as G4-seq and DRIP-seq, into their extensive analysis. Using this data, they elucidate the structure of origins of replication. In particular, they find various properties located at or around origins, such as polynucleotide stretches, G-quadruplex structures, regions of low and high nucleosome occupancy, R-loops, and that origins are mostly present in intergenic regions. Combining their population-level SNS-seq and their single-molecule DNA molecular combing data, they elucidate the total number of origins as well as the number of origins active in a single cell.

      Between the initial submission and this revision, the raised major concerns have not been resolved, and no additional validation has been provided.

      Strengths:

      (1) A very strong part of this manuscript is that the authors integrate several other datasets and investigate a large number of properties around origins of replication. Data analysis clearly shows the enrichment of various properties at the origins, and the manuscript is concluded with a very well-presented model that clearly explains the authors' understanding and interpretation of the data.

      (2) The DNA combing experiment is an excellent orthogonal approach to the SNS-seq data. The authors used the different properties of the two experiments (one giving location information, one giving single-molecule information) well to extract information and contrast the experiments.

      (3) The discussion is exemplary, as the authors openly discuss the strengths and weaknesses of the approaches used. Further, the discussion serves its purpose of putting the results in both an evolutionary and a trypanosome-focused context.

      Weaknesses:

      I have major concerns about the origin of replication sites determined from the SNS-seq data. As a caveat, I want to state that, before reading this manuscript, SNS-seq was unknown to me; hence, some of my concerns might be misplaced.

      (1) There are substantial discrepancies between the origins identified here and those reported in previous studies. Given that the other studies precede this manuscript, it is the authors' duty to investigate these differences. A conclusion should be reached on why the results are different, e.g., by orthogonally validating origins absent in the previous studies.

      We agree that orthogonal validation of origins detected by stranded SNS-seq is necessary and we are working on it.

      (2) I am concerned that up to 96% of all SNS-seq peaks are filtered away. If there is so much noise in the data, how can one be sure that the peaks that remain are real? Upon request, the authors have performed a control, where randomly placed peaks were run through the same filtering process. Only approximately twice as many experimental peaks passed filtering compared to random peaks. While the authors emphasize reproducibility between replicates, technical artifacts from the protocol would also be reproducible. Moreover, in other SNS-seq studies, for example, Pratto et al. Cell 2021, Fig. 1B, + and − strand peaks always appear closely paired. This pattern contrasts strongly with Fig. 2A in this manuscript.

      The size and overlap of peaks depend on the length of the SNS. In our study, the width of the peaks corresponds to the size of the short nascent strands (0.5–2.5 kb) chosen as the starting material, whereas the peaks in Pratto et al., Cell, 2021 are much wider (a few kb). This could be due to the longer SNS used in the Pratto et al. study. Consequently, the overlap of the longer SNS is more pronounced since the SNS fibres elongate in both directions: at the 3′ end by DNA polymerase and at the 5′ end by ligation of Okazaki fragments. Additionally, the genomic regions displayed in our Figure 2A and in Pratto et al., Figure 1B are presented at substantially different resolutions, with a roughly ten‑fold difference in scale.

      Further, I have some minor concerns that do not affect the main conclusions of the manuscript:

      - Fig 2C: The regions shown in the heatmap have different sizes, and I presume that the regions are ordered by size on the y-axis? If so, does the cone-shaped pattern, which is origin-less for genic regions and origin-enriched for intergenic regions, arise from the size of the regions? (I.e., for each genic region, the region itself is origin-less and the flanking intergenic regions contain origins.) If this is the case, then the peaks/valleys, centered exactly on the center of the regions on the mean frequency plots, arise from the different sizes of the analyzed regions, not from the fact that origins are mostly found at the center of intergenic regions. This data would be better presented with all regions stretched to the same size. This has not been addressed in the revision.

      As the reviewer suggested, we have produced scaled plots of the stranded SNS-seq origins over genic and intergenic regions (see Figure 3, which is attached along with the Reviewer #2 (Recommendations for the authors)). However, we would prefer to keep the unscaled versions in the manuscript and add a note in the text as part of the Version of Record, explaining that the origins are evenly distributed throughout intergenic regions rather than being centred within them.

      - Line 123, "and the average length of origins was found to be approximately 150 bp.": To determine origins, the authors filter away overlapping peaks and peaks that are too far from each other. Both restrict the minimal and maximal length of origins that can be observed, and this, in turn, affects the average length. This has not been addressed in the revision.

      This observation is correct. By applying filtering and setting the maximum distance between the positive and negative peaks, we are most likely affecting the average length by excluding potentially wider origins.

      We'll modify the text as part of the Version of Record.

      Are claims well substantiated?:

      The identification of origins via SNS-seq appears to be incompletely supported to me. All downstream analyses depend on the reliability of origin identification.

      Impact:

      This study has the potential to be valuable for two fields. The first is research focused on T. brucei as a disease agent, where essential processes that function differently than in mammals are excellent drug targets. The second is basic research analyzing DNA replication across the evolutionary tree, where T. brucei can be used as an early-diverging eukaryotic model organism.


      The following is the authors’ response to the original reviews.

      eLife Assessment

      The authors use sequencing of nascent DNA (DNA linked to an RNA primer, "SNS-Seq") to localise DNA replication origins in Trypanosoma brucei, so this work will be of interest to those studying either Kinetoplastids or DNA replication. The paper presents the SNS-seq results for only part of the genome, and there are significant discrepancies between the SNS-Seq results and those from other, previously-published results obtained using other origin mapping methods. The reasons for the differences are unknown and from the data available, it is not possible to assess which origin-mapping method is most suitable for origin mapping in T. brucei. Thus at present, the evidence that origins are distributed as the authors claim - and not where previously mapped - is inadequate.

      We would like to clarify a few points regarding our study. Our primary objective was to characterise the topology and genome-wide distribution of short nascent-strand (SNS) enrichments. The stranded SNS-seq approach provides the high strand-specific resolution required to analyse origins. The observation that SNS-seq peaks (potential origins) are most frequently found in intergenic regions is not an artefact of analysing only part of the genome; rather, it is a result of analysing the entire genome.

      We agree that orthogonal validation is necessary. However, neither MFA-seq nor TbORC1/CDC6 ChIP-on-chip has yet been experimentally validated as definitive markers of origin activity in T. brucei, nor do they validate each other.

      Public Reviews:

      Reviewer #1 (Public review):

      In this paper, Stanojcic and colleagues attempt to map sites of DNA replication initiation in the genome of the African trypanosome, Trypanosoma brucei. Their approach to this mapping is to isolate 'short-nascent strands' (SNSs), a strategy adopted previously in other eukaryotes (including in the related parasite Leishmania major), which involves isolation of DNA molecules whose termini contain replication-priming RNA. By mapping the isolated and sequenced SNSs to the genome (SNS-seq), the authors suggest that they have identified origins, which they localise to intergenic (strictly, inter-CDS) regions within polycistronic transcription units and suggest display very extensive overlap with previously mapped R-loops in the same loci. Finally, having defined locations of SNS-seq mapping, they suggest they have identified G4 and nucleosome features of origins, again using previously generated data.

      Though there is merit in applying a new approach to understand DNA replication initiation in T. brucei, where previous work has used MFA-seq and ChIP of a subunit of the Origin Recognition Complex (ORC), there are two significant deficiencies in the study that must be addressed to ensure rigour and accuracy.

      (1) The suggestion that the SNS-seq data is mapping DNA replication origins that are present in inter-CDS regions of the polycistronic transcription units of T. brucei is novel and does not agree with existing data on the localisation of ORC1/CDC6, and it is very unclear if it agrees with previous mapping of DNA replication by MFA-seq due to the way the authors have presented this correlation. For these reasons, the findings essentially rely on a single experimental approach, which must be further tested to ensure SNS-seq is truly detecting origins. Indeed, in this regard, the very extensive overlap of SNS-seq signal with RNA-DNA hybrids should be tested further to rule out the possibility that the approach is mapping these structures and not origins.

      (2) The authors' presentation of their SNS-seq data is too limited and therefore potentially provides a misleading view of DNA replication in the genome of T. brucei. The work is presented through a narrow focus on SNS-seq signal in the inter-CDS regions within polycistronic transcription units, which constitute only part of the genome, ignoring both the transcription start and stop sites at the ends of the units and the large subtelomeres, which are mainly transcriptionally silent. The authors must present a fuller and more balanced view of SNS-seq mapping across the whole genome to ensure full understanding and clarity.

      Regarding comparisons with previous work:

      - Two other attempts to identify origins in T. brucei - ORC1/CDC6 binding sites (ChIP-on-chip, PMID: 22840408) and MFA-seq (PMID: 22840408, 27228154) - were both produced by the McCulloch group. These methods do not validate each other; in fact, MFA-seq origins overlap with only 4.4% of the 953 ORC1/CDC6 sites (PMID: 29491738). Therefore, low overlap between SNS-seq peaks and ORC1/CDC6 sites cannot disqualify our findings. Similar low overlaps are observed in other parasites (PMID: 38441981, PMID: 38038269, PMID: 36808528) and in human cells (PMID: 38567819).

      - We would also like to emphasize that the originally published ORC1/CDC6 dataset (PMID: 22840408) is no longer available; only a re-analysis by TriTrypDB exists, which differs significantly from the published version (personal communication from Richard McCulloch). While the McCulloch group reported a predominant localization of ORC1/CDC6 sites within SSRs at transcription start and termination regions, our re-analysis indicates that only 10.3% of TbORC1/CDC6-12Myc sites overlapped with 41.8% of SSRs.

      - MFA-seq does not map individual origins; rather, it detects replicated genomic regions by comparing DNA copy number between the S and G1 phases of the cell cycle (PMID: 36640769; PMID: 37469113; PMID: 36455525). The broad replicated regions (0.1–0.5 Mbp) identified by MFA-seq in T. brucei are likely to contain multiple origins, rather than just one. In that sense we disagree with McCulloch's group, who claimed that there is a single origin per broad peak. Our analysis shows that up to 50% of the origins detected by stranded SNS-seq lie within broad MFA-seq regions. Neither the methodology used by McCulloch’s group to infer single origins from MFA-seq regions nor the precise positions of these regions have been published or made available, which makes direct comparison difficult.

      Finally, the genomic features we describe—poly(dA/dT) stretches, G4 structures and nucleosome occupancy patterns—are consistent with origin topology described in other organisms.

      On the concern that SNS-seq may map RNA-DNA hybrids rather than replication origins: Isolation and sequencing of short nascent strands (SNS) is a well-established and widely used technique for high-resolution origin mapping. This technique has been employed for decades in various laboratories, with numerous publications documenting its use. We followed the published protocol for SNS isolation (Cayrou et al., Methods, 2012, PMID: 22796403). RNA-DNA hybrids cannot persist through the multiple denaturation steps in our workflow, as they melt at 95°C (Roberts and Crothers, Science, 1992; PMID: 1279808). Even in the unlikely event that some hybrids remained, they would not be incorporated into libraries prepared using a single-stranded DNA protocol and therefore would not be sequenced (see Figure 1B and Methods).

      Furthermore, our analysis shows that only a small proportion (1.7%) of previously reported RNA-DNA hybrids overlap with SNS-seq origins. It is important to note that RNA-primed nascent strands naturally form RNA-DNA hybrids during replication initiation, meaning the enrichment of RNA-DNA hybrids near origins is both expected and biologically relevant.

      On the claim that our analysis focuses narrowly on inter-CDS regions and ignores other genomic compartments: this is incorrect. We mapped and analyzed stranded SNS-seq data across the entire genome of T. brucei 427 wild-type strain (Müller et al., Nature, 2018; PMID: 30333624), including both core and subtelomeric regions. Our findings indicate that most origins are located in intergenic regions, but all analyses were performed using the full set of detected origins, regardless of location.

      We did not ignore transcription start and stop sites (TSS/TTS). The manuscript already includes origin distribution across genomic compartments as defined by TriTrypDB (Fig. 2C) and addresses overlap with TSS, TTS and HT in the section “Spatial coordination between the activity of the origin and transcription”. While this overlap is minimal, we have included metaplots in the revised manuscript for clarity.

      Reviewer #2 (Public review):

      Summary:

      Stanojcic et al. investigate the origins of DNA replication in the unicellular parasite Trypanosoma brucei. They perform two experiments, stranded SNS-seq and DNA molecular combing. Further, they integrate various publicly available datasets, such as G4-seq and DRIP-seq, into their extensive analysis. Using this data, they elucidate the structure of the origins of replication. In particular, they find various properties located at or around origins, such as polynucleotide stretches, G-quadruplex structures, regions of low and high nucleosome occupancy, R-loops, and that origins are mostly present in intergenic regions. Combining their population-level SNS-seq and their single-molecule DNA molecular combing data, they elucidate the total number of origins as well as the number of origins active in a single cell.

      Strengths:

      (1) A very strong part of this manuscript is that the authors integrate several other datasets and investigate a large number of properties around origins of replication. Data analysis clearly shows the enrichment of various properties at the origins, and the manuscript concludes with a very well-presented model that clearly explains the authors' understanding and interpretation of the data.

      We sincerely thank you for this positive feedback.

      (2) The DNA combing experiment is an excellent orthogonal approach to the SNS-seq data. The authors used the different properties of the two experiments (one giving location information, one giving single-molecule information) well to extract information and contrast the experiments.

      Thank you very much for this remark.

      (3) The discussion is exemplary, as the authors openly discuss the strengths and weaknesses of the approaches used. Further, the discussion serves its purpose of putting the results in both an evolutionary and a trypanosome-focused context.

      Thank you for appreciating our discussion.

      Weaknesses:

      I have major concerns about the origin of replication sites determined from the SNS-seq data. As a caveat, I want to state that, before reading this manuscript, SNS-seq was unknown to me; hence, some of my concerns might be misplaced.

      (1) I do not understand why SNS-seq would create peaks. Replication should originate in one locus, then move outward in both directions until the replication fork moving outward from another origin is encountered. Hence, in an asynchronous population average measurement, I would expect SNS data to be broad regions of + and -, which, taken together, cover the whole genome. Why are there so many regions not covered at all by reads, and why are there such narrow peaks?

      Thank you for asking these questions. As you correctly point out, replication forks progress in both directions from their origins and ultimately converge at termination sites. However, the SNS-seq method specifically isolates short nascent strands (SNSs) of 0.5–2.5 kb using a sucrose gradient. These short fragments are generated immediately after origin firing and mark the sites of replication initiation, rather than the entire replicated regions. Consequently: (i) SNS-seq does not capture long replication forks or termination regions, only the immediate vicinity of origins. (ii) The narrow peaks reflect the size of the selected SNSs (0.5–2.5 kb) and the fact that many cells initiate replication at the same genomic sites, leading to localized enrichment. (iii) Regions without coverage correspond to genomic areas that do not serve as efficient origins in the analyzed cell population. Thus, SNS-seq is designed to map origin positions, but not the entire replicated regions.

      (2) I am concerned that up to 96% of all peaks are filtered away. If there is so much noise in the data, how can one be sure that the peaks that remain are real? Specifically, if the authors placed the same number of peaks as was measured randomly in intergenic regions, would 4% of these peaks pass the filtering process by chance?

      Maintaining the strandedness of the sequenced DNA fibres enabled us to filter the peaks, thereby increasing the probability that the filtered peak pairs corresponded to origins. Two SNS peaks must be oriented in a way that reflects the topology of the SNS strands within an active origin: the upstream peak must be on the minus strand and followed by the downstream peak on the plus strand.
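
      The orientation rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the pipeline used in the manuscript: the `Peak` record, the `pair_divergent_peaks` function and the `max_gap` threshold are hypothetical names and values chosen for the example.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def pair_divergent_peaks(peaks, max_gap=2500):
    """Keep adjacent peak pairs with a minus-strand peak upstream of a plus-strand peak.

    max_gap (bp) is a hypothetical cut-off on the distance between the two peaks.
    """
    ordered = sorted(peaks, key=lambda p: (p.chrom, p.start))
    pairs = []
    for upstream, downstream in zip(ordered, ordered[1:]):
        if upstream.chrom != downstream.chrom:
            continue  # never pair peaks across chromosomes
        divergent = upstream.strand == "-" and downstream.strand == "+"
        if divergent and downstream.start - upstream.end <= max_gap:
            pairs.append((upstream, downstream))
    return pairs


# Toy peaks (invented coordinates).
peaks = [
    Peak("chr1", 1000, 1800, "-"),
    Peak("chr1", 2000, 2900, "+"),    # minus then plus: kept as a candidate origin
    Peak("chr1", 10000, 10700, "+"),
    Peak("chr1", 11000, 11800, "-"),  # plus then minus: wrong orientation, rejected
    Peak("chr2", 500, 1200, "-"),     # no downstream partner, rejected
]
pairs = pair_divergent_peaks(peaks)
print(len(pairs), "candidate origin(s)")
```

      In this toy set only the first two peaks form a valid pair; the region between them would be the candidate initiation site.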

      As suggested by the reviewer, we tested whether randomly placed plus and minus peaks could reproduce the number of filter-passing peaks using the same bioinformatics workflow. Only 1–6% of random peaks passed the filters, compared with 4–12% in our experimental data, resulting in about 50% fewer selected regions (origins). Moreover, the “origins” from random peaks showed 0% reproducibility across replicates, whereas the experimental data showed 7–64% reproducibility. These results indicate that the retained peaks are highly unlikely to arise by chance and support the specificity of our approach. Thank you for this suggestion.

      (3) There are 3 previous studies that map origins of replication in T. brucei. Devlin et al. 2016, Tiengwe et al. 2012, and Krasiļņikova et al. 2025 (https://doi.org/10.1038/s41467-025-56087-3), all with a different technique: MFA-seq. All three previous studies mostly agree on the locations and number of origins. The authors compared their results to the first two, but not the last study; they found that their results are vastly different from the previous studies (see Supplementary Figure 8A). In their discussion, the authors defend this discrepancy mostly by stating that the discrepancy between these methods has been observed in other organisms. I believe that, given the situation that the other studies precede this manuscript, it is the authors' duty to investigate the differences more than by merely pointing to other organisms. A conclusion should be reached on why the results are different, e.g., by orthogonally validating origins absent in the previous studies.

      The MFA-seq data for T. brucei were published in two studies by McCulloch’s group: Tiengwe et al. (2012) using TREU927 PCF cells, and Devlin et al. (2016) using PCF and BSF Lister427 cells. In Krasilnikova et al. (2025), previously published MFA-seq data from Devlin et al. were remapped to a new genome assembly without generating new MFA-seq data, which explains why we did not include that comparison.

      Clarifying the differences between MFA-seq and our stranded SNS-seq data is essential. MFA-seq and SNS-seq interrogate different aspects of replication. SNS-seq is a widely used, high-resolution method for mapping individual replication origins, whereas MFA-seq detects replicated regions by comparing DNA copy number between S and G1 phases. MFA-seq identified broad replicated regions (0.1–0.5 Mb) that were interpreted by McCulloch’s group as containing a single origin. We disagree with this interpretation and consider that there are multiple origins in each broad peak; theoretical considerations of replication timing indicate that far more origins are required for complete genome duplication during the short S-phase. Once this assumption is reconsidered, MFA-seq and SNS-seq results become complementary: MFA-seq identifies replicated regions, while SNS-seq pinpoints individual origins within those regions. Our analysis revealed that up to 50% of the origins detected by stranded SNS-seq were located within the broad MFA peaks. This pattern (broad MFA-seq regions containing multiple initiation sites) has also recently been found in Leishmania by McCulloch’s team using nanopore sequencing (PMID: 26481451). Nanopore sequencing showed numerous initiation sites within MFA-seq regions and numerous additional sites outside these regions in asynchronous cells, consistent with what we observed using stranded SNS-seq in T. brucei. We will expand our discussion and conclude that the discrepancy arises from methodological differences and interpretation. The two approaches provide complementary insights into replication dynamics, rather than ‘vastly different’ results.
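
      To illustrate why a copy-number comparison yields broad regions rather than discrete initiation sites, the following minimal Python sketch computes a per-bin S/G1 read-depth ratio. It is a simplified assumption of how such a ratio can be formed, not the published MFA-seq workflow; the function name `copy_number_ratio`, the pseudocount and the toy read counts are hypothetical.

```python
def copy_number_ratio(s_counts, g1_counts, pseudocount=1.0):
    """Per-bin ratio of library-size-normalised S-phase to G1-phase read counts."""
    s_total = sum(s_counts)
    g1_total = sum(g1_counts)
    ratios = []
    for s, g in zip(s_counts, g1_counts):
        s_norm = (s + pseudocount) / s_total    # fraction of S-phase reads in this bin
        g_norm = (g + pseudocount) / g1_total   # fraction of G1 reads in this bin
        ratios.append(s_norm / g_norm)
    return ratios


# Toy read counts for six consecutive genomic bins (invented values).
g1_counts = [100, 100, 100, 100, 100, 100]
s_counts = [100, 150, 200, 150, 100, 100]

ratios = copy_number_ratio(s_counts, g1_counts)
for i, r in enumerate(ratios):
    print(f"bin {i}: S/G1 = {r:.2f}")
```

      The ratio rises gradually over several neighbouring bins, so the signal marks a replicated domain; on its own it does not say how many initiation sites lie inside that domain.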

      We recognize the importance of validating our results in future using an alternative mapping method and functional assays. However, it is important to emphasize that stranded SNS-seq is an origin mapping technique with a very high level of resolution. This technique can detect regions between two divergent SNS peaks, which should represent regions of DNA replication initiation. At present, no alternative technique has been developed that can match this level of resolution.

      (4) Some patterns that were identified to be associated with origins of replication, such as G-quadruplexes and nucleosomes phasing, are known to be biases of SNS-seq (see Foulk et al. Characterizing and controlling intrinsic biases of lambda exonuclease in nascent strand sequencing reveals phasing between nucleosomes and G-quadruplex motifs around a subset of human replication origins. Genome Res. 2015;25(5):725-735. doi:10.1101/gr.183848.114).

      It is important to note that the conditions used in our study differ significantly from those applied in Foulk et al. (Genome Res., 2015). We isolated SNS and performed the enzymatic treatments as described in previous reports (Cayrou, C. et al. Genome Res, 2015 and Cayrou, C et al. Methods, 2012). We enriched the SNS by size on a sucrose gradient and then subjected this SNS-enriched fraction to repeated treatments with high amounts of λ-exonuclease (100 U for 16 h at 37 °C; see Methods). In contrast, Foulk et al. used sonicated total genomic DNA for origin mapping, without enriching SNS on a sucrose gradient, and then performed a λ-exonuclease treatment. A previous study (Cayrou, C. et al. Genome Res, 2015, Figure S2, which can be found at https://genome.cshlp.org/content/25/12/1873/suppl/DC1) has shown that complete digestion of G4-rich DNA sequences is achieved under the conditions we used.

      Furthermore, an SNS-depleted control (without RNA) was included in our experimental approach. This control represents all molecules that are difficult to digest with lambda exonuclease, including G4 structures. Peak calling was performed against this background control, with the aim of removing false-positive peaks resulting from undigested DNA structures. We have explained this step more clearly in the revised manuscript.

      The key benefit of our study is that the orientation of the enrichments (peaks) remains consistent throughout the sequencing process. We identified an enrichment of two divergent strands synthesised on complementary strands containing G4s. These two divergent strands themselves do not, however, contain G4s (see Fig. 8 for the model). Therefore, the enriched molecules detected in our study do not contain G4s. They are complementary to the strands enriched with G4s. This means that the observed enrichment of G4s cannot be an artefact of the enzymatic treatments used in this study. We added this part in the discussion of the revised manuscript.

      We also performed an additional control which is not mentioned in the manuscript. In parallel with replicating cells, we isolated the DNA from the stationary phase of growth, which primarily contains non-replicating cells. Following the three λ-exonuclease treatments, there was insufficient DNA remaining from the stationary phase cells to prepare the libraries for sequencing. This control strongly indicated that there was little to no contaminating DNA present with the SNS molecules after λ-exonuclease enrichment.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Four broad issues need to be addressed.

      (1) The authors have attempted to test the overlap between ORC1/CDC6 (an ORC subunit) binding in the genome and SNS-seq. If there were an overlap, this would provide evidence that the SNS-seq signals represent origins. However, the analysis provided is inadequate: merely a statement that "we obtained an overlap of 4.2% between origins and ORC1/CDC6 binding sites within a window of ±2 kb and 6.2% in the window of ±3 kb". Nowhere are these data shown or properly discussed:

      a) The authors need to provide a diagram showing where in the genome the very small amount of overlapping SNS-seq and ORC1/CDC6 binding occurs, and to clearly show and state how many of the intergenic SNS-seq peaks are sites of ORC1/CDC6 binding. In the absence of such analysis, a key question is unanswered: is there any evidence of ORC1/CDC6 (or ORC more broadly) binding at the SNS-seq signals within the polycistronic transcription units?

      In the original version of the manuscript, these data were already presented as percentages in the text and as a metaplot (Supplementary Fig. 8C).

      We based our analysis on the set of 350 TbORC1/CDC6 binding sites available on TriTrypDB at the time of analysis. This dataset was a filtered subset of the originally reported TbORC1/CDC6 ChIP‑on‑chip peaks (personal communication, TriTrypDB). Since then, the unfiltered dataset has been made available. We therefore re‑analyzed the overlap using this dataset, to which we applied a filter that yielded 990 binding sites, closely matching the 953 sites reported by the McCulloch group. We must stress that the original set of 953 sites reported by the McCulloch group (Tiengwe et al., 2012; PMID: 22840408) is no longer available and that the authors:

      - do not provide genomic coordinates for the 953 binding sites and

      - do not release any scripts or methodology that would allow independent reproduction of the 953 sites.

      A similar remark also applies to the MFA-seq data (see below).

      To address the reviewer’s request, we have now:

      (1) Recalculated the overlap using the updated TbORC1/CDC6 dataset (990 binding sites) from TriTrypDB.

      (2) Added the absolute number of overlapping SNS‑seq origins and TbORC1/CDC6 binding sites in the Results section for clarity.

      (3) Included the TbORC1/CDC6 binding sites in the chromosomal overview (newly added to Supplementary Fig. 8A), so that their genomic localization relative to SNS‑seq peaks is visually accessible.

      (4) Revised the metaplots of TbORC1/CDC6 distribution around SNS‑seq origins using the updated dataset (Supplementary Fig. 8C).

      With these improvements, we now find that:

      - Within ±2 kb, 12.9% (253) of SNS‑seq origins overlap with 25.6% of TbORC1/CDC6 binding sites.

      - Within ±3 kb, 18.8% (370) of SNS‑seq origins overlap with 37.4% of TbORC1/CDC6 binding sites.
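
      The two percentages in each statement above are computed in opposite directions (the share of origins with a nearby binding site, and the share of binding sites with a nearby origin), which is why they differ. A minimal Python sketch of such a window-based overlap is given below. It assumes each feature is reduced to a single position on one chromosome, which may not match the overlap definition used in the manuscript; `window_overlap` and the toy coordinates are hypothetical.

```python
from bisect import bisect_left, bisect_right


def window_overlap(origins, sites, window):
    """Return (fraction of origins with a site within ±window,
               fraction of sites with an origin within ±window)."""

    def fraction_near(queries, targets):
        if not queries:
            return 0.0
        targets = sorted(targets)
        hits = 0
        for q in queries:
            lo = bisect_left(targets, q - window)
            hi = bisect_right(targets, q + window)
            if hi > lo:  # at least one target lies in [q - window, q + window]
                hits += 1
        return hits / len(queries)

    return fraction_near(origins, sites), fraction_near(sites, origins)


# Toy positions in bp (invented values).
origins = [1_000, 5_000, 20_000, 40_000, 60_000, 61_000, 90_000, 120_000]
sites = [2_500, 60_500, 200_000, 300_000]

frac_origins, frac_sites = window_overlap(origins, sites, window=2_000)
print(f"{frac_origins:.1%} of origins overlap {frac_sites:.1%} of binding sites")
```

      Because several origins can fall near the same binding site, and the two sets differ in size, the two fractions are not expected to match.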

      The updated metaplot shows a clear depletion of TbORC1/CDC6 signal at the origin center, with modest enrichment ~5 kb upstream and downstream. The underlying reason for this pattern remains unknown, and we agree that additional studies will be needed to understand it.

      b) Equally, the authors need to explain what they conclude from this analysis. They make a comparison with T. cruzi ORC1/CDC6 and SNS-seq overlap, which does not illuminate what the data tell us. For instance, if there is no or minimal overlap between ORC1/CDC6 binding and SNS-seq peaks within the polycistronic transcription units, do they conclude that the major SNS-seq signal they detail is evidence for ORC-independent DNA replication? If there is no overlap, what further evidence can they provide that these signals truly are origins?

      First, we would like to clarify that, to date, there is no evidence supporting ORC‑independent DNA replication in T. brucei, and—importantly—no published data demonstrating that TbORC1/CDC6 is universally required for DNA replication initiation. Because of this, we consider that it would be inappropriate to conclude that regions lacking detectable TbORC1/CDC6 signal undergo ORC‑independent initiation. We would prefer not to speculate in the absence of supporting evidence and would gratefully consider any reference the reviewer wishes to provide on this subject.

      Second, the low overlap between TbORC1/CDC6 binding sites and SNS‑seq origins does not, in our view, invalidate our mapping of replication initiation sites. Multiple factors contribute to this:

      (1) Low overlap between ORC1/CDC6 and origin‑mapping techniques has been repeatedly reported across kinetoplastids. For instance, in T. cruzi, 88.2% of origins detected by DNAscent nanopore sequencing showed no overlap with TcORC1/CDC6–Ty1 ChIP signal within ±3 kb, and only 11.7% co‑localized. This is strikingly similar to our observations in T. brucei. Thus, our data are consistent with the broader pattern in trypanosomatids rather than an exception.

      (2) The origin topology detected by stranded SNS‑seq is supported by several genomic characteristics frequently found in other eukaryotes, including:

      - A highly specific and polarized poly(dA)/poly(dT) sequence environment.

      - Strand‑specific G4 structures positioned around origin centers.

      - A conserved nucleosome‑depleted region flanked by well‑positioned nucleosomes.

      These features are absent from shuffled controls, appear at high significance, and recapitulate hallmark signatures of replication origins in other eukaryotes.

      Together, these findings give us confidence that the SNS‑seq peaks represent genuine origins - despite the incomplete overlap with TbORC1/CDC6 binding.

      Third, we fully agree with the reviewer that a definitive conclusion would require an additional, independent validation method.

      Given the lack of complete ORC subunit datasets and the unusual biology of trypanosomatid replication complexes, we believe that the cautious interpretation above is the most appropriate.

      c) The authors state (Discussion): "Validation of origins is generally a difficult task, particularly in trypanosomatids, where proteins involved in the initiation of DNA replication are difficult to determine. Few proteins have been described as potential ORC subunits (reviewed in 61), and none of them have been shown to be a specific marker that indicates the origins." There are two problems with the statement. First, most of the subunits of ORC have now been described in T. brucei; the authors should make this clear. Second, mapping of ORC1/CDC6 localisation, contrary to what the authors state here, shows precise correlation with the peaks of every MFA-seq signal described (see Tiengwe et al, Cell Reports, 2012); thus, ORC1/CDC6 binding provides evidence that MFA-seq is detecting origins, something that cannot be said for SNS-seq. The authors need to correct this misleading paragraph.

      As suggested, we have removed the paragraph from the Discussion to avoid confusion. However, we disagree with the reviewer's assessment and clarify below our position regarding the issues raised.

      First, we agree that five candidate ORC subunits have now been identified in T. brucei. Our intention was not to suggest the contrary, but rather to emphasize that, although candidate ORC components have been described, direct functional evidence for their roles in replication initiation is still limited. For this reason, we were cautious in referring to any ORC component as a definitive marker of replication origins.

      Second, regarding the reviewer’s statement that TbORC1/CDC6 binding “shows precise correlation with the peaks of every MFA‑seq signal”, we respectfully disagree based on several observations:

      (1) MFA‑seq does not identify individual origin centers, but rather broad replicated regions that often span hundreds of kilobases. By design, this method cannot define the number or position of discrete origins within each peak. For that reason, MFA-seq regions do not have the resolution required to validate TbORC1/CDC6 binding sites as individual origins.

      (2) In the published datasets (Tiengwe et al., Devlin et al.), no metaplots or locus‑wide quantification of the overlap between MFA‑seq peaks and TbORC1/CDC6 binding were provided. Neither the coordinates of the discrete regions defined as origins within the broad MFA‑seq peaks nor the approach used to define them have ever been described or made available, making it difficult to evaluate the claimed correspondence.

      (3) Notably, McCulloch’s group later reported that only 4.4% of the 953 TbORC1/CDC6 sites overlapped with their 42 MFA‑seq “origins”, underscoring that the degree of correspondence is in fact limited (PMID: 29491738).

      (4) Finally, as noted in our response to point (1b), low overlap between ORC1/CDC6 binding sites and origin‑mapping techniques is a consistent observation across kinetoplastids, including T. cruzi, where DNAscent‑mapped origins show only ~12% overlap with TcORC1/CDC6 ChIP signals. This suggests that the limited overlap we observe is not unique to our dataset.

      For these reasons, we are not convinced that the TbORC1/CDC6 binding sites have been shown to align precisely with MFA seq peaks, nor that these datasets definitively validate origin mapping in T. brucei. Nevertheless, to avoid over‑interpretation and potential confusion, we have removed the paragraph from the Discussion as requested. We hope this clarifies our position and improves the accuracy and neutrality of the manuscript.

      (2) As for ORC1/CDC6 localisation, the authors' evaluation of the relationship between MFA-seq and SNS-seq mapping is inadequate, and the depth of the analysis and discussion needs to be improved:

      a) The authors state: "We found 28-42% stranded SNS-seq origins overlapped with early and 43-55% overlapped with late S-phase MFA-seq replicated regions (Supplementary Figure 8B)." This seems important and provides (limited) validation of both datasets, but cannot be discerned from the supplied figure. Please provide a metaplot of the two datasets centred on the MFA-seq loci, including the SNS-seq peak amplitude.

      We would like to emphasize that MFA‑seq is not a method designed to map individual origins, and this fundamentally limits the interpretability of metaplots centered on MFA-seq regions. MFA‑seq identifies broad replication‑enriched domains, typically spanning 100–500 kb, within which multiple origins may fire asynchronously across the cell population.

      This concern is reinforced by the original MFA‑seq publications (Tiengwe et al., 2012; Devlin et al., 2016), which:

      - do not provide positional data for the 42-47 MFA‑inferred origins,

      - do not describe the computational method used to derive individual origin coordinates from the broad peaks, and

      - do not release any scripts or methodology that would allow independent reproduction of the claimed origin positions.

      Because of this, it is not possible to reconstruct or validate how the 42 MFA‑seq “origin” sites were defined, nor to use those coordinates as anchors for metaplot analyses.

      Most importantly, we disagree with the underlying assumption that each MFA‑seq peak corresponds to exactly one origin. This assumption runs counter to the principle of the technique, which identifies regions of higher DNA content in replicating cells than in non-replicating cells; it is also contradicted by our stranded SNS‑seq data and by DNA combing measurements:

      - SNS‑seq detects multiple discrete origins within the same genomic regions that produce a single broad MFA‑seq peak.

      - DNA combing reveals inter‑origin distances of ~36–422 kb (median ~150 kb) (PMID: 26976742), which is far shorter than the ~400–600 kb replication domains identified by MFA‑seq.

      - Furthermore, with only 42 origins detected by MFA-seq, it is not possible to achieve complete genome replication in T. brucei during S-phase. DNA combing has found that the average speed of replication forks in procyclic forms is 1.9 kb/min (PMID: 26976742). Dividing the size of the Trypanosoma brucei brucei TREU927 genome (26.1 Mb) by 42 origins (PMID: 22840408) shows that each origin would have to replicate about 621 kb during S phase. At the average fork speed of 1.9 kb/min, replicating 621 kb would take 327 min, or 5.45 hours (621 kb / 1.9 kb/min = 327 min). This exceeds the estimated length of S-phase in Trypanosoma brucei procyclic forms, which is 2.31 hours (138.6 minutes) (PMID: 32397111, 31811174, 28258618) or less, at 1.36 hours (PMID: 2190996, 10574712). Therefore, more than 42 origins are necessary to complete replication during the short S phase.

      This makes it unlikely that MFA-seq regions represent single functional origins. For these reasons, a metaplot centered on MFA‑seq “loci” may lead to misinterpretations and would not provide biologically meaningful information.
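
      The S-phase estimate given above can be checked with a short Python calculation. The genome size, origin count, fork speed and S-phase length are the values quoted in the response; the `forks_per_origin` parameter is an added assumption for illustration. With its default of 1 the sketch reproduces the figures quoted in the response, and setting it to 2 models two forks leaving each origin in opposite directions.

```python
import math

GENOME_KB = 26_100            # 26.1 Mb TREU927 genome, as quoted in the response
N_ORIGINS = 42                # MFA-seq origin count, as quoted in the response
FORK_SPEED_KB_PER_MIN = 1.9   # average fork speed from DNA combing
S_PHASE_MIN = 138.6           # longer of the two S-phase estimates quoted


def replication_time_min(genome_kb, n_origins, fork_kb_per_min, forks_per_origin=1):
    """Minutes needed if every origin replicates an equal share of the genome."""
    kb_per_origin = genome_kb / n_origins
    return kb_per_origin / (fork_kb_per_min * forks_per_origin)


def min_origins_needed(genome_kb, fork_kb_per_min, s_phase_min, forks_per_origin=1):
    """Smallest number of evenly spaced origins that finishes within S phase."""
    return math.ceil(genome_kb / (fork_kb_per_min * forks_per_origin * s_phase_min))


kb_per_origin = GENOME_KB / N_ORIGINS
minutes_needed = replication_time_min(GENOME_KB, N_ORIGINS, FORK_SPEED_KB_PER_MIN)
minutes_bidirectional = replication_time_min(
    GENOME_KB, N_ORIGINS, FORK_SPEED_KB_PER_MIN, forks_per_origin=2
)

print(f"{kb_per_origin:.0f} kb per origin")
print(f"{minutes_needed:.0f} min with one fork per origin")
print(f"{minutes_bidirectional:.0f} min with two forks per origin")
print(min_origins_needed(GENOME_KB, FORK_SPEED_KB_PER_MIN, S_PHASE_MIN), "origins needed at minimum")
```

      Under either assumption the estimated time is longer than the 138.6-minute S phase, which is the point of the argument.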

      We hope that the expanded explanation clarifies our interpretation of the relationship between these two complementary, but fundamentally different, methods.

      b) The authors state that "Our results showed that the origins are predominantly located in the intergenic regions within the PTUs (Figure 2C)". This finding cannot be discerned from this figure, which does not show 'strand switch regions' (SSRs; transcription start/stop sites), where MFA-seq predicts all origins to localise. The authors need to acknowledge this difference and must show a comparison of SNS-seq data, including peak amplitude, around all SSRs (whether predicted by MFA-seq to act as origins or not, since all appear to bind ORC1/CDC6).

      We have now provided the metaplots showing the overlap between stranded SNS-seq origins and SSRs (see Supplementary Figure 8D). This difference has been acknowledged and discussed in the revised manuscript.

      c) Finally, the authors' interpretation that around 30-55% of SNS-seq peaks overlap with MFA-seq 'origins' is highly questionable. MFA-seq peaks are regions of increased DNA content in replicating cells relative to non-replicating cells, and so the entire region under the MFA-seq peak is not necessarily an origin, but is likely to be a more discrete locus (eg, the SSR, where ORC1/CDC6 mainly localises). They should correct the wording and discuss what significance they see in this overlap; for instance, do they think SNS-seq 'clusters' are more pronounced within the MFA-seq peaks and, if so, what might this mean, and why does it not correlate with ORC1/CDC6 localisation?

      As the reviewer notes, ‘MFA‑seq peaks are regions of increased DNA content, and so the entire region under the MFA-seq peak is not necessarily an origin but is likely to be a more discrete locus’. This is exactly why MFA‑seq is inappropriate for identifying discrete, individual origins: within these replicated domains, multiple origins can fire, as revealed by stranded SNS‑seq mapping.

      Regarding the overlap between SNS‑seq origins and MFA‑seq peaks, we agree with the reviewer that this overlap should not be interpreted as validating MFA‑seq “origin positions.” Instead, we now describe it more accurately as the proportion of discrete SNS‑seq origins that fall within broader MFA‑seq replication domains. This is expected, because SNS‑seq identifies individual initiation events, whereas MFA‑seq identifies S‑phase replication domains averaged across a population. Our stranded SNS‑seq data do not show enhanced origin accumulation within MFA-seq regions, and we find no correlation with TbORC1/CDC6 positions. This is now discussed.

      Regarding SSRs, we do not share the view that they should be considered privileged initiation sites. After remapping the TbORC1/CDC6 ChIP‑on‑chip dataset (see above) to the T. brucei Lister 427–2018 genome (Supplementary Fig. 8A), we observed that TbORC1/CDC6 binding is distributed throughout the chromosomes, not restricted to SSRs. To quantify this, we analyzed the overlap between TbORC1/CDC6 sites and all annotated SSR classes (dSSRs, cSSRs, and head‑to‑tail regions, as defined in Kim et al. 2009). The results show that:

      Only 10% of TbORC1/CDC6 binding sites fall within 40% of all SSRs.

      At the level of individual SSR types:

      - TTS: 3.3% of TTS overlap with 0.3% of TbORC1/CDC6 sites.

      - TSS: 67% of TSS overlap with 6.1% of TbORC1/CDC6 sites.

      - Head‑to‑tail regions: 54.2% overlap with 3.6% of TbORC1/CDC6 sites.

      These analyses demonstrate that most TbORC1/CDC6 sites are not located at SSRs, contradicting the idea that SSRs represent primary or exclusive origin sites.

      Author response image 1.

      Overlap between TbORC1/CDC6-12Myc binding sites (Tiengwe 2012, Cell Reports) and strand‑switch regions (SSRs). Venn diagram showing the overlap of 990 TbORC1/CDC6-12Myc binding sites (retrieved from TriTrypDB and filtered at score 22 to obtain a number of binding sites similar to the 953 published in Tiengwe 2012, Cell Reports) and SSR sites in the genome (Kim 2018, NAR). The intersection shows that 10.3% of TbORC1/CDC6 binding sites overlap with 41.8% of SSRs. The intersection is subdivided into TSS (orange), TTS (blue) and HT (green).
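
      The two percentages in such a Venn diagram differ because each is computed against a different denominator (all binding sites versus all SSRs). The following is a minimal sketch of that calculation, not the pipeline used for the analysis; the function names and the toy coordinates are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# (chromosome, start, end), half-open coordinates; an assumed convention.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals lie on the same chromosome and share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def reciprocal_overlap(sites: List[Interval], regions: List[Interval]) -> Tuple[float, float]:
    """Return (% of sites overlapping any region, % of regions overlapped by any site)."""
    sites_hit = sum(any(overlaps(s, r) for r in regions) for s in sites)
    regions_hit = sum(any(overlaps(r, s) for s in sites) for r in regions)
    return 100.0 * sites_hit / len(sites), 100.0 * regions_hit / len(regions)


# Toy, made-up coordinates (not real T. brucei data).
sites = [("chr1", 100, 200), ("chr1", 500, 600), ("chr1", 900, 1000), ("chr2", 50, 150)]
ssrs = [("chr1", 150, 300), ("chr2", 400, 500)]

pct_sites, pct_ssrs = reciprocal_overlap(sites, ssrs)
print(f"{pct_sites:.1f}% of sites overlap {pct_ssrs:.1f}% of SSRs")
```

      In this toy case one of four sites overlaps one of two SSRs, so a small share of sites can still cover a large share of SSRs when sites outnumber SSRs.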

      (3) A key objection to the data presentation is the decision to limit SNS-seq mapping to the intergenic regions. In addition to overlooking the SSRs (see above, 2), so-called subtelomeres, which account for nearly 50% of the T. brucei genome and are largely untranscribed, are not shown or discussed at all. Providing this data will improve clarity and also provide a key test of one of the predictions that the authors make: "most origins are localized in actively transcribed regions, which could lead to collisions between DNA replication and the transcription machinery. This spatial coincidence implies that transcription and replication must occur in a highly ordered and cooperative manner in T. brucei."

      We do not understand why this reviewer concluded that we took 'the decision to limit the mapping of SNS-seq to intergenic regions'. This is a factual error.

      To be clearer,

      (1) We now explicitly present the distribution of SNS‑seq origins across core and subtelomeric regions in the revised Figure 2D, making clear that origin mapping was performed genome‑wide.

      (2) We also show that SNS‑seq origins are present in subtelomeric regions. We have revised the manuscript to avoid any implication that origin firing is restricted only to actively transcribed regions. Our data show that most SNS‑seq origins lie within intergenic regions of PTUs, but a minority are found outside these regions, including subtelomeres and SSRs. The revised text reflects this nuance and highlights that the spatial relationship between transcription and replication is strong but not exclusive.

      These additions undoubtedly ensure that the genome-wide nature of the SNS-seq analysis is transparent to the reader and should therefore remove this reviewer's “key objection”.

      a) The authors must show SNS-seq mapping to the subtelomeres (in addition to around the SSRs; see comment (2)). If no SNS-seq peaks are detected in the subtelomeres, what do the authors conclude about how the genome is duplicated? If SNS-seq peaks are detected in the subtelomeres, do they correspond with the ordered nucleosomes in this part of the genome described by Maree et al (PMID: 28344657); if so, might SNS-seq signal localisation not be directed by transcription but chromatin?

      We have now presented the proportion of origins in subtelomeric regions (see Figure 2B).

      As illustrated in the metaplots in Author response image 2, the distribution of nucleosomes around the subtelomeric origins is similar to the distribution shown for all origins in the manuscript. We do not see the pattern of nucleosomes as described by Maree et al (PMID: 28344657) over ORC1/CDC6 binding sites in this part of the genome.

      Author response image 2.

      Metaplots showing the mean nucleosome signal over centred SNS-seq origins in subtelomeric regions. Two replicates from Maree et al 2019 (PMID: 28344657).

      We never claimed that transcription directs the localisation of the SNS-seq signal. We did not conduct experiments to address this issue. In contrast, we consider that the organisation of chromatin exerts a significant influence on the selection of active origins.

      (4) The major conclusion of the manuscript is that the SNS-seq signal corresponds very precisely to the locations of RNA-DNA hybrids (R-loops). Given all the limitations discussed above, can the authors rule out the possibility that SNS-seq is merely mapping DNA-DNA hybrids and is not, in fact, detecting origins?

      a) It is legitimate to speculate about the possibility that the very extensive overlap between SNS-seq and DRIP-seq signals within polycistronic transcription units (between ORFs) might suggest that DRIP-seq data detects nascent strands at replication origins, rather than R-loops at sites of pre-mRNA processing, as previously suggested by Briggs et al (PMID: 30304482). (eg, 'we disclosed for the first time a strong link between R-loop formation and DNA replication initiation'; 'The RNA:DNA hybrids are formed at initiation sites by RNA priming of SNS and Okazaki fragments'). However, the authors should acknowledge that alternative explanations for the localisation and potential functions of inter-CDS R-loops have been suggested,

      We do not find extensive overlap between stranded SNS-seq and DRIP-seq signal. We have observed only a minor proportion (1.7%) of the previously reported DRIP-seq signal to overlap with the origins detected by stranded SNS-seq. The RNA-primed SNS must form RNA:DNA hybrids during the initiation of DNA replication, so an enrichment of these hybrids around the origins is expected. Therefore, we legitimately speculated that this minor proportion of RNA:DNA hybrids enriched around origin centres could be due to origin activation.

      We agree that some of the DRIP-seq signals detected around the origins may be sites of pre-mRNA processing, as previously suggested by Briggs et al. (PMID: 30304482). Since there are no data implicating pre-mRNA processing in DNA replication initiation, we prefer not to speculate about it.

      b) More importantly, the authors should provide experimental evidence that tests such a mechanistic prediction of R-loops and origins: for instance, have they attempted to remove R-loops, eg, by treatment with RNase H, and checked that the SNS-seq signal is unaltered? In the absence of such data, they cannot exclude the possibility that their work has revealed an overlooked problem with SNS-seq (which may not be limited to T. brucei; are matched DRIP-seq and SNS-seq datasets available to correlate these signals in a range of organisms?).

      We have not attempted RNase H treatment for a fundamental methodological reason: it seems highly improbable that RNA:DNA hybrids would persist through the multiple denaturation steps inherent to the SNS‑seq enrichment protocol. Published biophysical measurements show that RNA:DNA hybrids melt at ~95 °C (Roberts & Crothers, Science, 1992; PMID: 1279808), which is the temperature repeatedly applied during SNS isolation. Under these conditions, persistent RNA:DNA hybrids cannot remain intact and therefore cannot be responsible for the SNS‑seq peaks detected.

      We do not interpret our findings as revealing an “overlooked problem with SNS‑seq.” Instead, we consider that the enrichment of RNA:DNA hybrids around origins observed in DRIP‑seq is biologically meaningful and expected, given that replication initiation involves RNA‑primed nascent strands and that DRIP‑seq detects such structures.

      Reviewer #2 (Recommendations for the authors):

      I have some minor concerns that do not affect the main conclusions of the manuscript:

      (1) Figure 2B: The regions shown in the heatmap have different sizes, and I presume that the regions are ordered by size on the y-axis? If so, does the cone-shaped pattern, which is origin-less for genic regions and origin-enriched for intergenic regions, arise from the size of the regions? (I.e., for each genic region, the region itself is origin-less and the flanking intergenic regions contain origins.) If this is the case, then the peaks/valleys, centered exactly on the center of the regions on the mean frequency plots, arise from the different sizes of the analyzed regions, not from the fact that origins are mostly found at the center of intergenic regions.

      That is correct. The regions displayed in the heatmaps are genic and intergenic regions sorted by size. With this metaplot we did not want to convey that the origins accumulate at the centres of the intergenic regions, but mainly that genic regions are mostly devoid of origins and that intergenic regions are enriched in origins.

      (2) Line 123, "and the average length of origins was found to be approximately 150 bp.": To determine origins, the authors filter away overlapping peaks and peaks that are too far from each other. Both restrict the minimal and maximal length of origins that can be observed, and this, in turn, affects the average length.

      This observation is correct. By applying filtering and setting the maximum distance between the positive and negative peaks, we are most likely affecting the average length by excluding origins that are potentially wider. Nevertheless, the violin plot shows that the majority of origins are shorter than 500 nt. In the end, the size of regions detected as the origin is not important. What gives the resolution of stranded-SNS-seq is the ability to identify the centre of the origin between the minus and plus peaks.

      (3) Data in the manuscript were sometimes not presented in an easy-to-read manner. In some cases, this was due to benign things, such as missing labels for the mean frequency plots (e.g., Figure 2B, blue and green) or very small fonts for axes (Figure 2B). Sometimes, due to the plot types that were chosen, such as pie-charts (Figure 2C, see https://medium.com/analytics-vidhya/dont-use-pie-charts-in-data-analysis-6c005723e657), stacked bar plots (Figure 6B), or showing cumulative distributions (Figure 5C, and Figure 2D) it makes it difficult to judge the actual distribution.

      Wherever possible, the size of the small fonts was increased to the maximum. Missing labels were added to the mean frequency plots. We increased the font size for the axes in the frequency plots.

      However, we found cumulative distributions useful. If you have a more specific proposal for replacing cumulative distributions, we would be very grateful to hear it. We also hope that magnifying the figures in TIFF format with a higher resolution will improve visibility.

      (4) Figure 2B: This data would be better presented with all regions stretched to the same size (the reason is explained in the public review).

      We performed the scaled plots for the stranded SNS-seq origins over the genic and intergenic regions as the reviewer suggested (see Author response image 3), but we prefer to keep the unscaled versions in the manuscript.

      Author response image 3.

      Distribution of mapped origins in scaled genic and intergenic regions. Scaled heatmaps present the distribution of the mapped origins and shuffled controls within scaled genic and intergenic regions (± 2 kb).
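
      For readers unfamiliar with scaled metaplots, the idea is to convert each origin position into a relative position within its region, so that regions of different lengths contribute to the same bins. The sketch below illustrates only that rescaling step under simplifying assumptions (single chromosome, no ± 2 kb flanks, hypothetical function names and toy coordinates); it is not the code used to produce the figure.

```python
from typing import List, Tuple


def scaled_bin(pos: int, start: int, end: int, n_bins: int) -> int:
    """Map a position inside [start, end) to one of n_bins equal-width relative bins."""
    if not start <= pos < end:
        raise ValueError("position lies outside the region")
    return (pos - start) * n_bins // (end - start)


def scaled_profile(regions: List[Tuple[int, int]], origins: List[int], n_bins: int) -> List[int]:
    """Count origins per relative bin, pooling regions regardless of their length."""
    counts = [0] * n_bins
    for start, end in regions:
        for pos in origins:
            if start <= pos < end:
                counts[scaled_bin(pos, start, end, n_bins)] += 1
    return counts


# Toy, made-up data: a 100 bp region and a 2000 bp region.
regions = [(0, 100), (1000, 3000)]
origins = [50, 2000, 1100]
profile = scaled_profile(regions, origins, n_bins=4)
print(profile)
```

      Here the origins at the midpoints of the short and the long region fall into the same relative bin, which is what removes the cone-shaped, size-dependent pattern of the unscaled plot.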

      (5) Line 149: "The number of origins in both cells was compared using normalised mapped reads": Supplementary Figure 2D mentions that conditions were subsampled to the same amount. I would mention that explicitly in the main text ("compared using normalized, subsampled mapped reads"), as 'normalizing' would not include 'subsampling' for me. Also, I could not find the methods section that the authors refer to here.

      Thanks for the suggestion. We changed the text to make this point clearer. In the methods section, the subsampling process was referred to as 'PCF down-sampling', but we have now changed the name to 'Read sub-sampling' to be more consistent in the edited version of the manuscript.
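
      To make the distinction between normalising and sub-sampling concrete, the sketch below draws the same number of reads at random from each condition, so that origin calling starts from equal sequencing depth. It is an illustration under assumptions only: the function name, the seed and the toy read identifiers are hypothetical, and the tool actually used for the manuscript is not described here.

```python
import random
from typing import Dict, List


def subsample_to_common_depth(samples: Dict[str, List[str]], seed: int = 0) -> Dict[str, List[str]]:
    """Randomly keep the same number of reads (the smallest library size) per sample."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    depth = min(len(reads) for reads in samples.values())
    return {name: rng.sample(reads, depth) for name, reads in sorted(samples.items())}


# Toy, made-up read identifiers for two conditions of unequal depth.
samples = {
    "condition_A": [f"A_read{i}" for i in range(10)],
    "condition_B": [f"B_read{i}" for i in range(3)],
}
subsampled = subsample_to_common_depth(samples)
print({name: len(reads) for name, reads in subsampled.items()})
```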

      (6) Figure 2C: I struggled to understand what gDNA stands for. Maybe it could be replaced with something like distribution in genome?

      Thanks for this suggestion. It is changed to ‘distribution in genomic sequence’.

      (7) Figure 5C: I cannot see how a G4 30 kb from an origin could be relevant. This also does not fit the scale of the author's own model at all (Figure 8).

      The main goal of Figure 5C was to demonstrate the differences between origins and the nearest G4s compared to the shuffled controls. The graph shows that 50% of the origins have a G4 within 2010 bp, whereas the median for the shuffled control is 4154 bp in the case of non-stabilised G4s. Our model is based on Figure 5D, which illustrates the enrichment of G4s and poly(dA) around the centre of origins.
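
      The medians quoted above summarise, for each origin, the distance to its nearest G4. A minimal sketch of such a calculation is given below, assuming a single chromosome and point coordinates; the function names and the toy positions are hypothetical and do not reproduce the values in Figure 5C.

```python
from bisect import bisect_left
from statistics import median
from typing import List


def nearest_distance(pos: int, sorted_features: List[int]) -> int:
    """Distance from pos to the closest feature; sorted_features must be sorted and non-empty."""
    i = bisect_left(sorted_features, pos)
    candidates = []
    if i < len(sorted_features):
        candidates.append(sorted_features[i] - pos)  # nearest feature at or after pos
    if i > 0:
        candidates.append(pos - sorted_features[i - 1])  # nearest feature before pos
    return min(candidates)


def median_nearest_distance(points: List[int], features: List[int]) -> float:
    """Median over all points of the distance to the nearest feature."""
    feats = sorted(features)
    return median(nearest_distance(p, feats) for p in points)


# Toy, made-up positions on one chromosome.
g4_positions = [1000, 5000, 12000]
origin_centres = [1200, 4900, 11000]
shuffled_centres = [3000, 8000, 20000]

median_origins = median_nearest_distance(origin_centres, g4_positions)
median_shuffled = median_nearest_distance(shuffled_centres, g4_positions)
print(median_origins, median_shuffled)
```

      A smaller median for origins than for the shuffled control indicates that G4s lie closer to origins than expected by chance, independently of how long the tail of the cumulative distribution is.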

      (8) Figure 6B: could be made supplementary in my opinion. All relevant data is repeated in panel D.

      It is true that Figures 6B and 6C contain some repetition. However, we would prefer to keep Figure 6B because it provides a quantification of the six indicated categories, along with the statistical tests. Figure 6B only presents the three categories that changed significantly. Figure 6D shows distribution but does not contain quantified data.

      (9) Figure 6D: This plot is repeating a lot, within single figures (Figure 6A, top) but also between figures (e.g., Figure 5D, Figure 4B). I'd prefer it if the initial plots of each figure were expanded a bit (here Figure 6A, top) to include some information from the previous figures. Then all these summary plots could be combined into a single figure at the very end (maybe still as different panels to reduce the number of lines in a single plot). Otherwise, each summary plot repeats the tracks of the previous, which becomes very repetitive.

      Our model is based on these summary plots, and we calculated the relative distances between the different elements using them. Two elements were repeated in each plot: the positions of poly(dA) and G4s. These two elements served as reference points to determine the relative positions of the other elements. Following your suggestion would result again in repetitive summary plots at the end, as one combined summary plot would be overloaded with lines and difficult to understand.

      (10) Figure 6D & Figure 7C: Both show predicted G4s; however, on the plus strand, one prediction has a two-peaked shape, the other only a single peak. Is this a mistake?

      The graphs for the predicted G4s do not have the same shape in the two plots because they were generated in different reference genomes for T. brucei. Figure 6C is in the 427 reference genome, as the MNase-seq data set was analysed in this reference genome, and we re-did the SNS-seq analysis and the G4 prediction in this reference genome to be able to compare them directly. In Figure 7C we compare origins, DRIP-seq and predicted G4s; in this case all datasets could be compared in the 427-2018 reference genome.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study investigates the role of vascular mural cells, specifically pericytes and vascular smooth muscle cells (vSMCs), in maintaining blood-brain barrier (BBB) integrity and regulating vascular patterning. Analyzing zebrafish pdgfrb mutants that lack brain pericytes and vSMCs, they show that mural cell deficiency does not impair BBB establishment or maintenance during larval and early juvenile stages. However, mural cells seem to be crucial for preventing vascular aneurysms and hemorrhage in adulthood as focal leakage, basement membrane disruption, and increased caveolae formation are observed in adult zebrafish at aneurysm hotspots. The authors challenge the paradigm that mural cells are essential for BBB regulation in early development while highlighting their importance for long-term vascular stability.

      Strengths:

      Previous studies have established that the zebrafish BBB shares molecular and morphological homology with e.g. the mammalian BBB and therefore represents a suitable model. By examining mural cell roles across different life stages - from larval to adult zebrafish - the study provides an unprecedented comprehensive developmental analysis of brain vascular development and of how mural cells influence BBB integrity and vascular stability over time. The use of live imaging, whole-brain clearing, and electron microscopy offers high-resolution insights into cerebrovascular patterning, aneurysm development, and structural changes in endothelial cells and basement membranes. By analyzing "leakage hotspots" and their association with structural endothelial defects in adults the presented findings add novel insights into how mural cell loss may lead to vascular instability.

      Weaknesses:

      The study uses quantitative tracer assays with multiple molecular weight dyes to evaluate blood-brain barrier (BBB) permeability. The study normalizes the intensity of tracer signals (e.g., 10 kDa, 70 kDa dextrans) in the brain parenchyma to the vascular signal of a 2000 kDa dextran tracer (assumed to remain within vessels). Intensity normalization is used to control for variations in tracer injection efficiency or vascular density. This method doesn't directly assess the absolute amount of tracer present in the parenchyma, potentially underestimating leakage severity. As the lack of BBB impairment is a "negative" finding, more rigorous controls or other methods might be needed to corroborate it.

      In response to these and comments from other reviewers, we have now performed further carefully controlled analysis to test leakage of tracers using molecular weights ranging from 1 to 2000 kDa. We have performed additional normalisation approaches (new data in Fig. 2a–d) imaging tracer extravasation together with vascular reporters (Tg(kdrl:EGFP)<sup>s843</sup> or Tg(kdrl:Hsa.HRAS-mCherry)<sup>s916</sup>) and used this transgenic reporter for normalisation (as suggested by Reviewer #2). The results of these experiments all supported our initial conclusions (revised Extended Data Fig. 3a–d), further validating the reliability of our method. Furthermore, as suggested by the reviewer, analysis of the raw tracer intensity in the parenchyma was also performed with no normalization at all (see Author response image 1). This also supports our conclusion that the BBB is intact in young animals. Finally, we now use our methods to demonstrate that we can detect an immature leaky BBB at 3 dpf and a mature functional BBB at 7 dpf (Fig. 2e-f), a suitable positive control to show that our methods and analyses are reliable.

      Author response image 1.

      Raw intensity values from the parenchyma confirm findings in Figure 2 and Extended Data Figure 3. (a–d) Raw mean fluorescence intensity values of extravasated tracers in the midbrain. (a–b) show unnormalized values corresponding to Extended Data Fig. 3a–d, and (c–d) show unnormalized values corresponding to Fig. 1a–d. Unpaired t-tests for 70 and 10 kDa at 14 dpf in (a–b), for 10 kDa at 7 dpf, and for 70 kDa at 14 dpf in (c–d). Mann-Whitney tests for 70 and 10 kDa at 7 dpf in (a–b), for 70 kDa at 7 dpf, and for 10 kDa at 14 dpf (c–d), due to non-normal distribution. These data were all generated in genotype-blind assays, display the variance in signal between embryos that arises from injection differences, and show no difference in BBB integrity between the genotypes analysed. Comparison of this to normalised data using 2000 kDa tracer or kdrl expression in endothelial cells (Fig. 2 and Extended Data Fig. 3) confirms that normalisation improves the analysis, effectively controlling for embryo-to-embryo differences in delivery of tracer and imaging.

      Reviewer #2 (Public review):

      Summary:

      The authors generated a zebrafish mutant of the pdgfrb gene. The presented analyses and data confirm previous studies demonstrating that Pdgfrb signaling is necessary for mural cell development in zebrafish. In addition, the data support previously published studies in zebrafish showing that mural cell deficiency leads to hemorrhages later in life. The authors presented quantified data on vessel density and branching, assessed tracer extravasation, and investigated the vasculature of adult mice using electron microscopy.

      Strengths:

      The strength of this article is that it provides independent confirmation of the important role of Pdgfrb signaling for the development of mural cells in the zebrafish brain. In addition, it confirms previous literature on zebrafish that provides evidence that, in the absence of pericytes/VSMC, hemorrhages appear (Wang et al, 2014, PMID: 24306108 and Ando et al 2021, PMID: 3431092). The study by Ando et al, 2021 did not report experiments assessing BBB leakage in pdgfrb mutants but in the review article by Ando et al (PMID: 34685412) it is stated that "indicating that endothelial cells can produce basic barrier integrity without pericytes in zebrafish."

      We thank the reviewer for their comments and pointing out literature that we had not cited (this has been corrected in our revised manuscript).

      As noted by other reviewers, our study goes beyond simply confirming previous literature. The quoted section by the reviewer from Ando et al 2021 regarding intact barrier integrity in pdgfrb mutants is a conclusion based on apparent lack of haemorrhages in pdgfrb mutants[1]. Our work shows haemorrhages in older animals and as such is in line with these previously published results, but it also extends previous work, for the first time reporting detailed functional analysis to assess BBB integrity. Our study uses definitive tracer assays (now including extensive revisions) to identify an intact BBB in pdgfrb mutants in live animals. This has not been previously described and is important because it offers a new perspective on the evolutionary conservation (or otherwise) of pericyte control of BBB function. Furthermore, our study investigates the nature of hotspot leakage and haemorrhages in more detail than in previous work.

      Weaknesses:

      (1) The authors should avoid using violin plots, which show distribution. Instead, they should replace all violin plots in the figures with graphs showing individual data points and standard deviation. For Figure 2f specifically, the standard deviation in the analyzed cohort should be shown.

      This is a good point and we have replaced the violin plots with individual data points and shown all data as mean±SEM.

      (2) The authors have not shown the reduced PDGFRB protein or the effect of mutation on mRNA level in their zebrafish mutant.

      Our pdgfrb<sup>uq30bh</sup> mutant allele introduces a mutation predicted to generate a truncated protein very similar to previously validated alleles (see detail in revised Extended Data Fig. 1a and methods). Our pdgfrb<sup>uq30bh</sup> mutant also phenocopies previous pdgfrb mutants (sa16389 and um148 alleles)[2,3], displaying mural cell loss with multiple markers (Fig. 1a, new data in Extended Data Fig. 1b–c, Fig. 3b–c; Extended Data Fig. 4c–d) and the same typical morphological defects and survival rates (new data in Extended Data Fig. 1d–f). Thus our mutant phenocopy gives confidence it is most likely a null allele, in line with previous papers studying presumed null alleles[1].

      We believe this provides sufficient confidence in this allele of pdgfrb. Moreover, considering that our manuscript focusses on loss of mural cells and we show definitively that this mutant has robust loss of mural cells in the brain, our mutant is suitable for this study.

      (3) Statistical data analysis: Did the authors perform analyses to investigate whether the data has a normal distribution (e.g., Figures 1d, e)?

      We thank the reviewer for raising this and apologise for this oversight. All data have now been assessed for normality using Shapiro-Wilk test and further statistical analyses have been performed accordingly. The specific quantifications referred to by the reviewer in Extended Data Fig. 3a–d (previously Fig. 1d-e), have normal distribution except for quantification measuring 70 kDa extravasation at 7 dpf, therefore Mann-Whitney test has been used for this comparison. Further information can be found in figure legends and methods.

      (4) Analysis of tracer extravasation. The use of 2000 kDa dextran intensity as an internal reference is problematic because the authors have not provided data demonstrating that the 2000 kDa dextran signal remains consistent across the entire vasculature. The authors have not provided data demonstrating that the 2000 kDa dextran signal in vessels exhibits acceptable variance across the vasculature to serve as a reliable internal reference. The variability of this signal within a single animal remains unknown. The presented data do not address this aspect.

      We thank the reviewer for their comment and agree that analysis was needed for showing 2000 kDa dextran as a reliable normalization signal.

      We now show the data in the following Figures that demonstrate the consistency of signal throughout the vasculature using this 2000-kDa tracer: Extended Data Fig. 2b, Extended Data Fig. 3a and c, Extended Data Fig. 5a, Extended Data Fig. 6. In fact, we observe that this 2000 kDa tracer provides a very reliable marker of large and small calibre vessels in larval, juvenile and adult animals, even in fixed and cleared whole tissues and animals (e.g. Extended Data Fig. 2d-e, Extended Data Fig. 5 and 6).

      Our further experiments and analysis support the use of this tracer as an ideal way to normalise for variation between animals. Coupled with improved masking of vessels using transgenic labels (e.g. Extended Data Fig. 2b), it lets us quantify across whole vascular networks, which reduces the concern about variation within individual animals. We also find that the 2000 kDa tracer shows negligible leakage through the brain vessels at 2 hours post-injection (hpi) (new data, Extended Data Fig. 2b–c) and provide images in Extended Data Fig. 6b–b′′ showing detectable signals even at 6 hpi. Finally, results generated with this approach, with normalisation to transgenic markers, or even with raw parenchymal values of tracer intensity all lead to the same conclusions. In addition, we point the reviewer to a recent pre-print from our team that further validates this method[4].

      Overall, we find the use of this tracer an ideal way to normalise for differences in injection volumes between animals and we recommend the use of this method to other groups assessing BBB leakage in zebrafish.

      Additionally, it's intriguing that the signal intensity in the parenchyma of the tested tracers presents a substantial range, varying by 20-30% in the analysed cohort (Figure 1g, Extended Figure 1e). Such large variability raises the question of its origin. Could it be a consequence of the normalization to 2000 kDa dextran intensity which differs between different fish? Or is it due to the differences in the parenchymal signal intensity while the baseline 2000 kDa intensity is stable? Or is the situation mixed?

      This is a good point raised by the reviewer.

      To address this, we have used the following approaches:

      (1) We provide additional experiments and normalisation methods that support the utility of our tracer studies (new data in Fig 2a–f and Extended Data Fig. 2b–c), discussed in detail below.

      (2) We provide graphs of the raw parenchymal distribution of tracer not normalised at all (also requested by reviewer 1). This is provided in Author response image 1 and further supports all our conclusions, showing that our normalisation methods generate meaningful data.

      Overall, the range of parenchymal intensity that we see after tracer injection and live imaging shows variations introduced during microinjection. However, these ranges are in-line with previous publications using similar methods (see studies by O’Brown et al 2019 and 2023)[5,6], allow reliable statistical comparisons to be drawn between control and mutants and allow us to detect both immature and functional BBB states during zebrafish development (new data in Fig. 2e-f).

      Of note, the variability we see is likely introduced during the injection process into tiny larval blood vessels and is the reason why we perform normalization of parenchymal tracers to a vascular dextran signal that doesn’t leak from brain vessels. In our studies, 2000-kDa dextran has been co-injected with the smaller size tracers, therefore any potential differences in injection volumes as well as imaging conditions (however consistent) should be reduced by this method.

      An alternative and potentially more effective approach would be to cross the pdgfrb mutant line with a line where endothelial cells are genetically labeled to define vessels (e.g. the line kdrl used in acquiring data presented in Figure 2a). Non-injected controls could then be used as a baseline to assess tracer extravasation into the parenchyma.

      We thank the reviewer for this suggestion.

      In response, we have performed new tracer leakage experiments at 7 and 14 dpf in siblings and pdgfrb mutants and quantified parenchymal tracer extravasation by normalizing to vascular reporters (Tg(kdrl:EGFP)<sup>s843</sup> or Tg(kdrl:Hsa.HRAS-mCherry)<sup>s916</sup>). The results were in-line with the previously presented and independent experiments and showed indistinguishable phenotypes between siblings and pdgfrb mutants (new data, Fig. 2a–d). We also used uninjected controls to assess baseline and saw consistent values approaching zero in these images and did not include this in the revised paper.

      Furthermore, we have also used this approach in wild-type larvae at 3 dpf (immature BBB) and 7 dpf (functional BBB)[5]. We detected significantly higher parenchymal extravasation of 10 and 70 kDa tracers at 3 dpf compared to 7dpf, demonstrating that our method can detect leakage (new data, Fig. 2e–f).

      We believe that both normalization approaches have advantages (as discussed above), therefore showing the same results with these two different approaches has further strengthened our findings.

      How is the data presented in Figure 3e generated? How was the dextran intensity calculated? It looks like the authors have used the kdrl line to define vessels. Was the 2000 kDa still used as in previous figures? If not, please describe this in the Materials and Methods section.

      We have moved this data to Fig. 4e (previously Fig. 3e).

      Previously, we had plotted raw data because the experiment was conducted on vibratome-sectioned tissue. The 2000 kDa tracer was not used. In response to this query and to be consistent with the new approach suggested by the reviewer, we have revised the quantification by normalizing the 10 kDa tracer extravasation to Tg(kdrl:Hsa.HRAS-mCherry)<sup>s916</sup> for this and the new experiments on juveniles (Fig. 5h–i). Please see the corresponding figure legends or revised methods (lines 464–472).

      (5) The authors state that both controls and mutants show extravasation of 1 kDa NHS-ester into the parenchyma. However, the presented images do not illustrate this; it is not obvious from these images (Extended Data Figure 1c). Additionally, the presented quantification data (Extended Data Figure 1e) do not show that, at 7 dpf, the vasculature is permeable to this tracer. Note that the range of signal intensity of the 1 kDa NHS-ester is similar to the 70 kDa dextran (Figure 1g and Extended Figure 1e). Would one expect an increase in the ratio in case of extravasation, considering that the 2000 kDa dextran has the same intensity in all experiments? Please explain.

      We thank the reviewer for raising this important point.

      To clarify, we have never claimed that “2000-kDa dextran has the same intensity in all experiments”. On the contrary, vascular 2000 kDa normalization has been used to account for potential differences caused by injection, as stated in the submitted supplementary materials and now made clearer in the revision.

      In response to this query, we conducted a more detailed analysis of tracer extravasation patterns based on molecular weight (new data, Extended Data Fig. 2b–c). This analysis showed that 1- and 10-kDa tracers have a much higher extravasation rate than 70- and 2000-kDa tracers. Interestingly, we did not find a significant difference between 1 and 10 kDa extravasation. Therefore, in the revised manuscript we used only 10 kDa in further experiments and have removed 1 kDa from the figures.

      To assess the tracers individually (new data in Extended Data Fig. 2c), parenchymal extravasation of individual tracers was normalised to their own vascular signal (e.g., mean intensity of 10 kDa in midbrain / mean intensity of 10 kDa in vasculature), to account for potential differences in injection volume. This provides a suitable method to assess leakage in wild-type animals and is now in line with how previous studies have analysed such tracer injections[5,6]. Please see revised figure legends and supplementary materials for details.
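
      The ratio described above can be illustrated with a minimal sketch. This is not the analysis pipeline used in the study: the function name, the flat lists of pixel intensities and the toy values are all assumptions made for illustration. It only shows why dividing the mean parenchymal intensity by the mean vascular intensity of the same tracer makes the readout insensitive to injection volume.

```python
from statistics import mean


def normalised_extravasation(parenchyma_pixels, vessel_pixels):
    """Mean parenchymal tracer intensity divided by mean vascular intensity.

    Both arguments are hypothetical flat lists of pixel intensities for the
    same tracer, taken from the parenchymal and vascular regions of interest.
    """
    vascular_mean = mean(vessel_pixels)
    if vascular_mean == 0:
        raise ValueError("vascular signal is zero; the ratio is undefined")
    return mean(parenchyma_pixels) / vascular_mean


# Toy example (made-up numbers): the second animal received twice as much
# tracer, so every intensity doubles, but the normalised ratio is unchanged.
low_dose = normalised_extravasation([10, 30], [100, 100, 100, 100])
high_dose = normalised_extravasation([20, 60], [200, 200, 200, 200])
```

      In this toy case both animals give the same ratio, whereas the raw parenchymal means differ two-fold, which is the injection-related variation the normalisation is meant to remove.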

      (6) The study would be strengthened by a more detailed temporal analysis of the phenotype. When do the aneurysms appear? Is there an additional loss of VSMC?

      We thank the reviewer for this suggestion, and we have now performed staged imaging of the pdgfrb mutants and siblings between 7 and 21 dpf using the TgBAC(acta2:EGFP)<sup>uq17bh</sup> transgene (new data, Fig. 3b–c; Extended Data Fig. 4a–d). Consistent with previous results, acta2:EGFP-positive cells surrounding the middle mesencephalic central arteries (MMCtA) were missing in pdgfrb mutants. At 21 dpf, we also observed a mild dilation of these vessels, likely the earliest change leading to aneurysms (new data, Fig. 3c).

      To extend the number of stages analysed in this study, we have also performed new tracer leakage experiments in juveniles (30 dpf) and found that aneurysms can be detected at this age when the 10 kDa tracer is used (new data in Fig. 5b–b′). Consistent with the adult stage phenotype, aneurysms were limited to the larger calibre vessels (arteries) in the brain. We have also observed hotspots, and upon quantification, we found fewer of them in juveniles than in adults, suggesting that the severity of aneurysms and hotspots increases with age.

      Taken together, our results show that the aneurysms in pdgfrb mutants start appearing at late larval/early juvenile stages (~21 dpf) with observable dilations. By 30 dpf, aneurysms accompanied by small numbers of hotspots are observed, and the hotspots increase significantly in number by adulthood. This also correlates with reduced development and survival rate of pdgfrb mutants after 30 dpf (new data, Extended Data Fig. 1d–e).

      (7) The authors intended to analyze the BBB at later stages (line 128), but there is not a significant time difference between 2 months (Figure 2) and 3 months (Figure 3) considering that zebrafish live on average 3 years. Therefore, the selection of only two time-points, 2 and 3 months, to analyze BBB changes does not provide a comprehensive overview of temporal changes throughout the zebrafish's lifespan. How long do the pdgfb mutants live?

      Respectfully, zebrafish transition from juvenile stages to adulthood between 2 and 3 months and there are many significant differences in the physiology of this organism at these two ages. At 2 months, zebrafish are still juveniles undergoing metamorphosis with rapid growth and ongoing skeletal and vascular development. By 3 months, they are sexually mature adults and have much more developed cranioskeletal and vascular systems. Having said that, we take the reviewer's important point that further temporal resolution would improve the study.

      We have performed new experiments in 1-month-old animals and provided comprehensive analysis of the vascular phenotypes occurring in pdgfrb mutants. These were very informative experiments analysing leakage using 10-kDa tracer injections and have significantly improved the study. We had previously provided experiments in 5-month-old adults as well (previously Fig. 4a–b and Extended Data Fig. 4a), so the study now includes larval stages (7, 14 dpf), juveniles at 1 and 2 months and adults at 3 and 5 months. While the additional timepoints did not lead to any new conclusions, they significantly enhanced the body of work overall.

      Of further note, we provided survival data up to 90 dpf where survival of the pdgfrb mutants is significantly reduced compared to siblings (Extended Data Fig. 1e). We believe this is associated with the severity of the aneurysms and haemorrhages which probably lead to lethality in these mutants.

      (8) Why is there a difference in tracer permeability between 2 and 3 months (Figures 2 and 3)? Are hemorrhages not detected in 2-month-old zebrafish?

      In response to this and other queries, we have added new experiments that provide a more detailed temporal analysis of tracer accumulation (new data in Fig. 5b–c, Fig. 5f–g).

      In short, we do not see obvious haemorrhages in 1- or 2-month fish at a gross level during dissections (not shown). We find that using the 10-kDa tracer, we can detect small hotspots at aneurysms as early as 1 month, likely representing the earliest loss of integrity. We do not see obvious hotspots in 2-month-old animals when we use the 70-kDa tracer; this suggests to us that it is less sensitive for hotspot detection (in line with new Extended Data Fig. 2c). Finally, we find that the number of hotspots increases dramatically from juvenile to adult stages in our datasets, which we take as indicative of a progressive phenotype.

      Overall, tracer size matters for detecting hotspots, and they become more apparent in older animals. We have added a note in the main text to cover these points (lines 200–205).

      (9) Figure 3: The capillary bed should be presented in magnified images as it is not clearly visible. Figure 3e shows that in the pdgfb mutant the dextran intensity is higher also in regions 6-10. How do the authors explain this?

      We thank the reviewer for raising this important point.

      Firstly, we now include enlarged views of the capillary beds for this experiment (Fig. 4d′) and new experiments mentioned below.

      Secondly, in relation to why there is higher tracer in lateral locations and not just medial sites of haemorrhage, we believe that this is most likely due to the progressive spread of tracer from the medial hotspots. To test if this is likely, we performed additional experiments and tested tracer accumulation at 2 different timepoints in brains collected at 0.5 or 6 hpi (new data in Fig. 5f–g, Extended Data Fig. 6a–b′′). Tracer accumulation at 0.5 hpi was very minimal and was primarily limited to hotspots and nearby regions (new data in Fig. 5h), whereas a higher tracer accumulation in brains was observed across medial to lateral regions at 6 hpi (new data in Fig. 5i) in pdgfrb mutants. Comparing the data in Figure 4 (2 hpi) and new data in Figure 5i (6 hpi), the 10 kDa tracer appears to have spread to more lateral locations given the increased time allowed post injection.

      We cannot formally exclude the possibility that tracer leakage occurs more slowly through capillaries than at major hotspots, which might fit with the proposed model of slow leakage via increased EC transcytosis[7-9]. However, considering that we cannot detect increased tracer accumulation in pdgfrb mutants that lack aneurysms and haemorrhages at 7 and 14 dpf, such a scenario would require capillary transcytosis to be active at later juvenile and adult stages but not in larval and late larval animals. Thus, we believe the most plausible explanation is that aneurysm/haemorrhage-associated leakage is the primary cause of the vascular integrity defects in zebrafish pdgfrb mutants.

      We have added discussions addressing this in the revised manuscript (lines 220–230, 300–302).

      (10) In general, the manuscript would benefit from a more detailed description of the performed experiments. How long did the tracer circulate in the experiments presented in Figures 2, 3, and 4?

      We thank the reviewer for this suggestion and have now ensured that this is clearly described in the figure legends and methods (lines 391–395).

      (11) How do the authors explain the poor signal of the 70 kDa dextran from the vasculature of 5-month-old zebrafish presented in Extended Data Figure 3?

      We agree that the dextran signal was reduced compared to the other experiments in that Figure. This is likely due to sample preparation and clearing causing reduced fluorescence. Upon consideration of the presented data and the additional experiments using 10 kDa tracers providing further validations for our claims, we decided to remove this data from the paper.

      (12) The study would benefit from a clear separation of the phenotypes caused by the loss of VSMC. The title implies that capillaries also present hemorrhages, which is not the case. How do vascular mural cells differ from mural cells? Are there any other mural cells?

      We take the reviewer's point and have now updated the title to "Mural cells protect the adult brain from haemorrhage but do not control the blood-brain barrier in developing zebrafish."

      (13) I have a few comments about how the authors have interpreted the literature and why, in my opinion, they should revise their strong statements (e.g., the last sentence in the abstract).

      Scientists have their own insights and interpretations of data. However, when citing published data, it should be clearly indicated whether the statement is a direct quote from the original publication or an interpretation. In the current manuscript, the authors have not correctly cited the data presented in the two published papers (references 5 and 6). These papers do not propose a model where pericytes suppress "adsorptive transcytosis" (lines 73-76). While increased transcytosis is observed in pericyte-deficient mice, the specific type of vesicular transport that is increased or induced remains unknown.

      Similarly, lines 151-152 refer to references 5 and 6 and use the term "adsorptive transcytosis," but the authors of both papers did not use this term. Attributing this term to the original authors is inaccurate. Additionally, lines 152-153 do not accurately represent the findings of references 5 and 6. These papers do not state that there is an induction of "caveolae" in endothelial cells in pericyte-deficient mice. In the absence of pericytes, many vesicles can be observed in endothelial cells, but these vesicles are relatively large. It is more likely that there is some form of uncontrolled transcytosis, perhaps micropinocytosis. Please refer to the original papers accurately.

      We thank the reviewer for these comments. We take the point and have rewritten the manuscript carefully to improve accuracy and avoid misrepresenting any previous claims made in specific papers.

      Also, the authors have missed the fact that in mice, the extent of pericyte loss correlates with the extent of BBB leakage. To a certain extent, the remaining pericytes can compensate for the loss by making longer processes and so ensure the full longitudinal coverage of the endothelium. This was shown in the initial work of Armulik et al (reference 5) and later in other studies.

      We certainly did not miss this important point (as we are also working with these mouse models) and we now include reference to this in our expanded discussion. Of note, we do think it would be worthwhile assessing if the extent of BBB leakage and pericyte coverage also correlates with the presence of microhaemorrhages in these hypomorphic mouse models, although this is more challenging to do in mice than in zebrafish.

      The bold assertion on lines 183–187 that a lack of specific BBB phenotype in pdgfrb zebrafish mutant invalidates mouse model findings is unfounded. Despite the notion that zebrafish endothelium possesses a BBB, I present a few examples highlighting the differences in brain vascular development and why the authors' expectation of a straightforward extrapolation of mouse BBB phenotypes to zebrafish is untenable.

      In mice Pdgfrb knockout is lethal, but in zebrafish, this is not the case. In marked contrast to mice, however, zebrafish pdgfrb null mutants reach adulthood despite extensive cerebral vascular anomalies and hemorrhage. Following the authors' argumentation about the unlikely divergence of zebrafish and mice evolution, does it mean that the described mouse phenotype warrants a revisit and that the Pdgfrb knockout in mice perhaps is not lethal? Another example where the role of a gene product is not one-to-one, which relates to pericyte development, is Notch3. Notch3-null mice do not show significant changes in pericyte numbers or distribution, suggesting a less prominent role in pericyte development compared to zebrafish.

      Although many aspects of development are conserved between species, there are significant differences during brain vascular development between zebrafish and mice. These differences could reveal why the BBB is not impaired in zebrafish pdgfrb mutants. There is a difference in the temporal aspect when various cellular players emerge. The timing of microglia colonization in the brain differs. In mice, microglia colonization starts before the first vessel sprouts enter the brain, while in zebrafish, microglia enter after. Additionally, microglia in zebrafish and mice have a different ontogeny. In mice, astrocytes specialize postnatally and form astrocyte endfeet postnatally. In zebrafish, radial glia/astrocytes form at 48 hpf, and as early as 3 dpf, gfap+ cells have a close relationship with blood vessels. Thus, these radial glia/astrocyte-like cells could play an important role in BBB induction in zebrafish. It's worth noting that in Drosophila, the blood-brain barrier is located in glial cells. While speculative, these cells might still play a role in zebrafish, while the role of pericytes does not seem to be crucial. Pericytes enter the brain and contact with developing vasculature (endothelium) relatively late in zebrafish (60 hpf). In mice, the situation is different, as there is no such lag between endothelium and pericyte entry into the brain. I suggest that the authors approach the observed data with curiosity and ask: Why are these differences present? Are all aspects of the BBB induced by neural tissue in zebrafish? What is the contribution of microglia and astrocytes?

      Another interesting aspect to consider is the endothelial-pericyte ratio and longitudinal coverage of pericytes in the zebrafish brain, and how this relates to what is observed in mice. How similar is the zebrafish vasculature to the mouse vasculature when it comes to the average length of pericytes in the zebrafish brain? Does the longitudinal coverage of pericytes in the zebrafish brain reach nearly 100%, as it does in mice?

      Based on the preceding arguments, it is recommended that the authors present a balanced discussion that provides insightful discussion and situates their work within a broader framework.

      Overall, we agree with most of the points made by the reviewer above. As we have now extended the format of this paper to be a full article, we have space to provide an extended discussion and introduction. We now try to capture many of the points made by the reviewer and we think that this has significantly improved the paper. We thank the reviewer for this contribution.

      We do want to point out that we did not state that our findings using zebrafish pdgfrb mutants invalidate mouse model findings. We suggest that a deeper analysis to understand the nature of the hotspots in mural cell deficient mammalian models could be very interesting in light of the zebrafish observations. We hope that the revised discussion better reflects this.

      Reviewer #3 (Public review):

      This manuscript examines the role of pdgfrb-positive pericytes in the establishment and maintenance of the blood-brain barrier (BBB) in the zebrafish. Previous studies in PDGFB- or PDGFRB-deficient mice have suggested that loss of pericytes results in disruption of the BBB. The authors show that zebrafish pdgfrb mutant larvae have an intact BBB and that pdgfrb mutant adult fish show large vessel defects and hemorrhage but do not exhibit substantial leakage from brain capillaries, suggesting loss of pericytes is not sufficient to "open" the BBB. The authors use beautiful and compelling images and rigorous quantification to back up most of their conclusions. The imaging of the adult brain is particularly nice. The authors rigorously document the lack of BBB leakage in pdgfrbuq30bh mutant larvae and large vessel phenotypes (eg, enlargement and rupture) in pdgfrbuq30bh mutant adults. A few points would help the authors to further strengthen their findings contradicting the current dogma from rodent models.

      We appreciate the reviewer's comments on the manuscript overall and agree that addressing the raised points was needed to strengthen our findings. We have addressed the main points below and believe that this revision greatly improves this study.

      Major point:

      The authors document pericyte loss using a single TgBAC(pdgfrb:egfp)ncv22 transgenic line driven by the promoter of the same gene mutated in their pdgfrbuq30bh mutants. Given their findings on the consequences of pericyte loss directly contradict current dogma from rodent studies, it would be useful to further validate the absence of brain pericytes in these mutants using one of several other transgenic lines marking pericytes currently available in the zebrafish. This could be done using pdgfrb crispants, which the authors show nicely phenocopy the germline mutants, at least in larvae. This would help nail down the absence of any currently identifiable pericyte population or sub-population in the loss of pdgfrb animals and substantially strengthen the authors' conclusions.

      We thank the reviewer and agree that examination of pdgfrb<sup>uq30bh</sup> mutants using another transgenic line labelling pericytes would further validate the absence of brain pericytes. We generated a transgenic line, TgBAC(abcc9:abcc9-T2A-mCherry)<sup>uom139</sup>, to visualise pericytes and validated the absence of brain pericytes in the pdgfrb mutants (revised Extended Data Fig. 1b). The loss of brain pericytes matched our findings using the TgBAC(pdgfrb:egfp)<sup>uq15bh</sup> line as well as previously published data by Ando et al 2016-2021, where brain pericytes were missing except around the metencephalic artery[2,3].

      Other issues:

      The authors should provide more information about the pdgfrbuq30bh mutant and how it was generated (including a diagram in a supplemental figure would be useful).

      We thank the reviewer for this suggestion. In addition to the explanations provided in supplementary materials, we have added a schematic and provided Sanger sequencing results showing the mutation, as well as the predicted effect of the mutation on the protein domains (Extended Data Fig. 1a).

      It would be helpful to show some data on whether mutants show morphological phenotypes or developmental delay at 7 and 14 dpf, to provide some context to better assess the reduced branching and vessel length vascular phenotypes (see Figures 1c-e).

      We thank the reviewer for this suggestion. We have provided further details on body length and survival of the pdgfrb mutants until 90 dpf. As reported by Ando et al 2021, we did not observe any distinguishing feature until about 30 dpf[1,3]. The adult anatomy of our mutant allele matches that of previously described null mutants and is now shown (Extended Data Fig. 1f).

      If available, it would be helpful to have a positive control for the tracer leakage experiments - a genetic manipulation that does cause disruption of the BBB and leakage at 2 hours post-tracer injection (see Figures 1f and g).

      We thank the reviewer for this suggestion and agree that a positive control would validate the reliability of our method. We have performed new experiments at 3 dpf, when BBB integrity is not yet established, and at 7 dpf, when the BBB is functional in zebrafish[5], testing both 10 and 70 kDa tracers (new data in Fig. 2e–f). We detected significantly higher tracer accumulation at 3 dpf, showing that our methods can detect tracer leakage in the brain.

      Quantification of the findings in Figure 4c, d would be useful, as would the use of germline fish for these experiments if these are now available. If this is not possible, it would be helpful to document that the crispants used in these experiments lack pdgfrb:egfp pericytes at adult stages (this is only shown for 5 dpf larvae, in Extended Data Figure 4b).

      We thank the reviewer for this comment. Using the TgBAC(pdgfrb:egfp)<sup>uq15bh</sup> line, we have imaged coronal brain sections collected from 10-week-old pdgfrb crispants and uninjected siblings (age-matched animals used in Fig. 5d–e, previously Fig. 4c–d). We have now included data showing that adult pdgfrb crispants lack brain mural cells, phenocopying pdgfrb<sup>uq30bh</sup> mutants (new data, Extended Data Fig. 6f). These particular crispants are very reliable in our hands and nicely reproduce stable mutant phenotypes, giving us confidence to use the faster F0 approach in this experiment.

      Adult mutants clearly show less dye leakage in the more superficial capillary regions than WT siblings, but dextran intensity is a bit higher, although this could well be diffusion from more central brain regions where overt hemorrhage is occurring. Along similar lines though, the authors' TEM data in Extended Data Figure 4d hints that there may be more caveolae in mutant brain capillaries, although the N number was lower here than for the measurements from TEM of larger central vessels (Figure 4g). It would be useful to carry out additional measurements to increase the N number in Figure 4d to see whether the difference between wild-type sibling and mutant capillary caveolae numbers remains as not significant.

      We thank the reviewer for raising these important points and suggestions.

      Firstly, in relation to signal in capillary regions and likely diffusion from hotspots, please see the response to Reviewer 2, point 9 above.

      Secondly, we have imaged and analysed more capillaries in both pdgfrb mutants and siblings (Extended Data Fig. 7a–b, previously Extended Data Fig. 4d). The results showed no significant difference between these groups, suggesting that capillary EC transcytosis is unchanged in our pdgfrb mutants.

      It might be helpful to include some orienting labels and/or additional descriptions in the figure legends to help readers who are not used to looking at zebrafish brain vessels have an easier time figuring out what they are looking at and where it is in the brain.

      We thank the reviewer for this suggestion and agree that adding further information in the figure legends and illustrations about orientation would make it easier for readers. In addition to the information provided in the figure legends in the submitted version, we have added an illustration, added more labels to the revised figures, and extended the descriptions in the figure legends, main text and methods.

      We have added a schematic depicting the tracer leakage assay workflow, orientation of live imaging and analysed region of interest (Extended Data Fig. 1a–b).

      All figure legends have been updated with the anatomical position and microscopy view.

      Additional labels on figures have been added to understand the referenced vessel names (new data in Fig. 3c and Extended Data Fig. 4a–b′).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The study uses the intensity of tracer signals within the vessels to analyze BBB permeability, potentially underestimating leakage severity. The dye intensity is measured 2 hours after injection, however, other studies have already observed leakage after 30 Minutes, by imaging directly in the brain parenchyma. The overall intensity should also decrease through leakage from the other vessels of the body, e.g. in the trunk and tail. Probably the loss of intra-vascular dye intensity from leakage in barrier-free vessels is already so high (after 2 hours) that the smaller amount of leakage across the BBB cannot be observed.

      We thank the reviewer for this comment and suggestion. We agree that small tracers leak from the vasculature, particularly through fenestrated vessels in the trunk and tail. We have based our timing on previous studies and our own experience. In zebrafish, the study by O’Brown et al 2019 also used 2 hpi[5] for detection of leakage in mutants of mfsd2aa, which has also been proposed to regulate BBB integrity by controlling EC transcytosis. Therefore, we believe that performing experiments at 2 hpi is appropriate to investigate roles of pericytes in BBB integrity. Our data suggest that this timing works.

      In response to this and other comments, we performed further experiments and analyses to test leakage of individual tracers with molecular weights ranging from 1 to 2000 kDa. We showed that these tracers can reliably be detected in brain parenchyma and vasculature when imaged at 2 hpi. In another study, we showed that medium-sized tracers such as 40 kDa Dextran can be reliably detected in the vasculature at similar timepoints[10]. Considering that our experiments using 10 and 70 kDa tracers detect both parenchymal tracer accumulation and tracer still within the vessels, we believe this timepoint is appropriate for assessing BBB integrity in zebrafish.

      In addition to these experiments, see our tracer leakage experiments in 1-month-old animals, at 0.5 and 6 hpi to test leakage pattern described above (Fig. 5 and Extended Data Fig. 6).

      Therefore, the authors will need to validate their method of choice, showing an impairment of the BBB, caused by other agents (known to affect the BBB), and at 48 hpf, when the BBB is not tightened yet. One example for BBB impairment can be found in O'Brown et al (2019), eLife 8:e47326. doi: 10.7554/eLife.47326

      We thank the reviewer for this suggestion. As shown by O’Brown et al 2019, we have performed experiments at 3 dpf when BBB integrity is not mature and at 7 dpf when BBB is functional[5], testing both 10 and 70 kDa tracers. We detected significantly higher tracer accumulation at 3 dpf, showing our new additional method (see below) can detect tracer leakage in the brain (new data in Fig. 2e–f).

      Ideally, the authors would also supplement the method with additional approaches in the younger developmental stages to validate their findings.

      The validation of the method and the findings is particularly important for the claims of lack of BBB impairment in the absence of mural cells, as this is a "negative" finding.

      In response to this and comments from other reviewers, we performed additional tracer leakage experiments (new data in Fig. 2a–d) where we imaged 10 and 70 kDa tracers with a vascular reporter (Tg(kdrl:EGFP)<sup>s843</sup> or Tg(kdrl:Hsa.HRAS-mCherry)<sup>s916</sup>) and used this reporter for normalisation. Both this approach as well as the experiments provided in the first submission (updated as Extended Data Fig. 3a–d) showed that pdgfrb mutants at 7 and 14 dpf have indistinguishable BBB integrity compared to siblings. See also Author response image 1 that further addresses this.

      I also strongly suggest rephrasing and toning down the claim that vascular mural cells do not control the blood-brain barrier in developing zebrafish.

      As a negative finding cannot be proven completely and lots of the previously shown effects on murine BBB impairment are rather weak (when caused by single agents such as Claudin5 deficiency or Sphingosine-phosphate receptor1 knockout), it might be important to only claim that in zebrafish no strong impairment (as observed in the mural cell-deficient mouse) could be observed. Or rephrase it to "no impairment as severe as/comparable to ... could be observed" and then provide an impairment control for the developmental stages.

      We thank the reviewer for this comment and agree that negative findings are very challenging to prove. However, we find no evidence of leakage of the BBB in animals lacking mural cells at 7 and 14 dpf and believe that our data is robust on this point. As such, we believe we show that a vertebrate with a largely conserved EC BBB can have intact barrier function in the absence of mural cells.

      As suggested, we have revised our claims throughout the manuscript to provide a more nuanced discussion of this, but we do not want to water down our claims too much as we believe they are important. We hope that the reviewer will appreciate our carefully worded and expanded discussion section.

      Additional items of interest to the readers and therefore suggestions to improve the manuscript could be

      (1) To include more molecular analysis: while the study identifies caveolae induction and basement membrane thickening as potential contributors to focal leakage, the exact molecular mechanisms linking mural cell loss to these structural changes are not deeply investigated.

      (2) Also, the study primarily associates BBB disruption in the adult with aneurysms. Therefore other subtle or diffuse changes to BBB permeability that might occur even without overt vascular lesions are potentially underrepresented.

      However, following up experimentally on these might exceed the scope of the manuscript.

      We thank the reviewer for these suggestions and agree with both points. However, as stated by the reviewer, these experiments are beyond the scope of the manuscript and represent future directions for our lab and others.

      Reviewer #2 (Recommendations for the authors):

      (1) Mouse genes should be written as follows: Pdgfb, Pdgfrb and be in italics. See line 70: it should be written "Pdgfb and Pdgfrb (italics)" and not "PdgfB and Pdgfrβ".

      We have updated the text according to the reviewer’s suggestion.

      (2) Please state the age of the fish analyzed in Figure 1f and 1g.

      We have moved this data to Extended Data Fig. 3a–d (previously Fig. 1f–g) and have placed age information on the images and in the figure legends.

      (3) Is the reduced vascular complexity in pdgfb mutant due to reduced angiogenesis or due to excessive pruning?

      This is a good question, and we do not know at this stage. We have unpublished data that suggest pericytes secrete angiogenic growth factors, but this question warrants a thorough investigation that we believe is beyond the scope of this current study.

      (4) Please check that the figure legends state the correct number of fish analysed. For example, Figure 1 d, e N=8 but there seem to be 9 data points per group - 14dpf.

      We apologise for this mistake and thank the reviewer for raising this. We have updated the graphs and figure legends accordingly.

      (5) Please indicate in the figures the genotypes (wt, het) of a sibling presented alongside a pdgfb mutant.

      Wild-type and heterozygous mutants are commonly used together in zebrafish research as a collective control group termed siblings. Since we didn’t see any difference between wild-type and pdgfrbuq30bh/- groups in any experiments, we reported these groups together. This is now stated in the supplementary materials.

      One exception to this was examination of the growth and survival rates where we show the genotypes separately (new data in Extended Data Fig. 1b-f).

      (6) Please explain clearly what region is shown in Figure 2B. I do not understand the explanation "approximate location of dotted line". Is the image in the panel "a" top view of a brain?

      We have moved this data to Fig. 3a′ (previously Fig. 2b) and replaced the dotted line in Figure 3a (previously Fig. 2a) with a white box indicating the location of the restricted region in the whole brain image.

      We have revised the text as below:

      “Subset of z-slices from the whole brain imaging in (a) and (b) (white boxes) indicating mural cell loss and abnormal capillary network patterning. 100-μm-thick maximum intensity projections (MIP) were generated using the continuation of the left middle mesencephalic central artery (MMCtA, arrow) as an anatomical landmark.”

      In addition, we have updated all our figure legends clearly stating the view and anatomical position of the imaged sample.

      (7) Figure 2e: Note that the dotted areas do not correspond to the areas magnified. Please adjust.

      We have moved this data to Extended Data Fig. 5a (previously Fig. 2e–e′) and updated the location of the white box in 5a shown in enlarged view in 5a′.

      (8) Lines 112 and 114 - Should the indicated figure be Figure 2b-d and Figure 2c-d, respectively, and not Figure 1?

      We thank the reviewer for pointing out this mistake. All the figure legends are now referred to appropriately in the revised manuscript.

      (9) Data presented in Figure 2 and Figure 3 can be consolidated and presented as one Figure.

      We thank the reviewer for this suggestion. After addition of new data and revising the manuscript we have decided to keep these data presented separately.

      (10) Note that Figure 2a,b shows 5-month-old fish, not 2-month-old fish. Additionally, Extended Data Figure 3 shows 5-month-old fish, not 3-month-old fish.

      The stages noted by the reviewer were correctly indicated.

      (11) Figure 2d: Please clarify the definition of a "large vessel".

      We have observed normal morphology in capillaries and noted aneurysms and hotspots in large calibre vessels such as arteries, which become more severe over time. We have revised this across the manuscript accordingly.

      (12) Figure 4a, b: Please explain how the hotspots of leakage were defined based on the extravasated tracer.

      Hotspots of leakage are scored when fluorescent tracer aggregates are clearly observed outside the vessels. Vessel borders were defined using the transgenic lines (Tg(kdrl:EGFP)<sup>s843</sup> or Tg(kdrl:Hsa.HRAS-mCherry)<sup>s916</sup>). We have added a clear description in the methods section (lines 473–475).

      Figure 4c: Why were Pdgfrb crispants used and not the mutant line?

      We used pdgfrb crispants because they reliably phenocopy the mutant, including the lack of brain mural cells (Extended Data Fig. 5e, previously Extended Data Fig. 4b), and for practical reasons: they allow faster experiments and reduce fish usage.

      Figure 4e: The magnification of the electron microscopy images does not make it possible to clearly identify caveolae. What was the magnification of the collected images for caveolae analysis? How did the authors ensure that they quantified only caveolae and not other types of vesicles?

      Respectfully, we disagree that the magnification is insufficient, as our images were captured and analysed consistent with previous ultrastructural descriptions[11,12]. We based our quantification of caveolae on the size of the vesicles observed: we defined caveolae as circular profiles of less than 100 nm in diameter and scored them as luminal or abluminal based on proximity to each surface membrane (within 500 nm of each surface or, in a thin-walled vessel, the caveolae closest to each surface) (lines 398–409). Importantly, comparable analyses at similar magnifications have been independently validated in multiple caveola-deficient zebrafish genetic models[4,13]. Interestingly, given the reviewer’s comments above, we do see increased vesicular structures that are larger than caveolae, but we only provide quantification of the caveolae here.

      Reviewer #3 (Recommendations for the authors):

      Congratulations to the authors on their really beautiful imaging and rigorous quantitative documentation of phenotypes - this is a really nicely done study, and could be very important to the field with just a few additional experiments to buttress the key conclusions.

      We thank the reviewer for their kind comments.

      In addition to the comments noted in the public review, I would only point out that there are two mislabeled call-outs in the text (Lines 112 and 114; says Figure 1, should say Figure 2).

      We thank the reviewer for this point and have now revised the text accordingly.

      (1) Ando, K., Ishii, T. & Fukuhara, S. Zebrafish Vascular Mural Cell Biology: Recent Advances, Development, and Functions. Life (Basel) 11 (2021). https://doi.org/10.3390/life11101041

      (2) Ando, K. et al. Clarification of mural cell coverage of vascular endothelial cells by live imaging of zebrafish. Development 143, 1328-1339 (2016). https://doi.org/10.1242/dev.132654

      (3) Ando, K. et al. Conserved and context-dependent roles for pdgfrb signaling during zebrafish vascular mural cell development. Dev Biol 479, 11-22 (2021). https://doi.org/10.1016/j.ydbio.2021.06.010

      (4) Lim, Y. W. et al. Trans-Endothelial Trafficking in Zebrafish: Nanobio Interactions of Polyethylene Glycol-Based Nanoparticles in Live Vasculature. ACS Nano (2026). https://doi.org/10.1021/acsnano.5c21042

      (5) O'Brown, N. M., Megason, S. G. & Gu, C. Suppression of transcytosis regulates zebrafish blood-brain barrier function. Elife 8 (2019). https://doi.org/10.7554/eLife.47326

      (6) O'Brown, N. M. et al. The secreted neuronal signal Spock1 promotes blood-brain barrier development. Dev Cell 58, 1534-1547 e1536 (2023). https://doi.org/10.1016/j.devcel.2023.06.005

      (7) Armulik, A. et al. Pericytes regulate the blood-brain barrier. Nature 468, 557-561 (2010). https://doi.org/10.1038/nature09522

      (8) Daneman, R., Zhou, L., Kebede, A. A. & Barres, B. A. Pericytes are required for blood-brain barrier integrity during embryogenesis. Nature 468, 562-566 (2010). https://doi.org/10.1038/nature09513

      (9) Mae, M. A. et al. Single-Cell Analysis of Blood-Brain Barrier Response to Pericyte Loss. Circ Res 128, e46-e62 (2021). https://doi.org/10.1161/CIRCRESAHA.120.317473

      (10) Lim, Y.-W. et al. A Standardized Protocol to Investigate Trans-Endothelial Trafficking in Zebrafish: Nano-bio Interactions of PEG-based Nanoparticles in Live Vasculature. bioRxiv, 2025.07.23.666282 (2025). https://doi.org/10.1101/2025.07.23.666282

      (11) Parton, R. G. & Simons, K. The multiple faces of caveolae. Nat Rev Mol Cell Biol 8, 185-194 (2007). https://doi.org/10.1038/nrm2122

      (12) Parton, R. G. & del Pozo, M. A. Caveolae as plasma membrane sensors, protectors and organizers. Nat Rev Mol Cell Biol 14, 98-112 (2013). https://doi.org/10.1038/nrm3512

      (13) Lim, Y. W. et al. Caveolae Protect Notochord Cells against Catastrophic Mechanical Failure during Development. Curr Biol 27, 1968-1981 e1967 (2017). https://doi.org/10.1016/j.cub.2017.05.06

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors aim to investigate the mechanisms underlying Kupffer cell death in metabolic-associated steatotic liver disease (MASLD). The authors propose that KCs undergo massive cell death in MASLD and that glycolysis drives this process. However, there appears to be a discrepancy between the reported high rates of KC death and the apparent maintenance of KC homeostasis and replacement capacity.

      Strengths:

      This is an in vivo study.

      Weaknesses:

      There are discrepancies between the authors' observations and previous reports, as well as inconsistencies among their own findings.

      Before presenting the percentage of CLEC4F<sup>+</sup>TUNEL<sup>+</sup> cells, the authors should have first shown the number of CLEC4F<sup>+</sup> cells per unit area in Figure 1. At 16 weeks of age, the proportion of TUNEL<sup>+</sup> KCs is extremely high (~60%), yet the flow cytometry data indicate that nearly all F4/80<sup>+</sup> KCs are TIMD4<sup>+</sup>, suggesting an embryonic origin. If such extensive KC death occurred, the proportion of embryonically derived TIMD4<sup>+</sup> KCs would be expected to decrease substantially. Surprisingly, the proportion of TIMD4<sup>+</sup> KCs is comparable between chow-fed and 16-week HFHC-fed animals. Thus, the immunostaining and flow cytometry data are inconsistent, making it difficult to explain how massive KC death does not lead to their replacement by monocyte-derived cells.

      We thank the reviewer for the insightful comment and the opportunity to clarify this important point. To ensure consistency between our methodologies, we replaced Clec4f staining with TIM4 staining results as requested by the reviewer. We first showed the number of TIM4<sup>+</sup> cells per unit area in Figure 1B. The results showed a significant and progressive loss of TIM4<sup>+</sup> cells per unit area in the liver parenchyma, decreasing from approximately 60 cells/FOV at baseline (0w) to nearly 50 at 4w and further to about 30 at 16w post-HFHC diet. This finding is fully consistent with our flow cytometry data. The percentage of the embryonically derived KC population (CD11b<sup>low</sup> F4/80<sup>hi</sup> TIM4<sup>hi</sup>) among CD45<sup>+</sup> cells dropped from 30.2% (0w) to 24.3% (4w) and 17.6% (16w) (Revised Figure 1C). The absolute number per gram of liver decreased from roughly 12 x 10<sup>5</sup> (1w) to 9 x 10<sup>5</sup> (4w) and 5 x 10<sup>5</sup> (16w) (Revised Figure 1D).

      These data suggest that despite the reported high rate of cell death among CLEC4F<sup>+</sup>TIMD4<sup>+</sup> KCs, the population appears to self-maintain, with no evidence of monocyte-derived KC generation in this model, which contradicts several recent studies in the field.

      We appreciate the reviewer’s insightful comment. We agree that our data show no substantial generation of monocyte-derived Kupffer cells (MoKCs) within the 16-week HFHC model. However, we do not believe the remaining embryonic KCs (EmKCs) are maintained through self-renewal, as the proportion of Ki67<sup>+</sup>TIM4<sup>+</sup> cells remains low at all time points (Revised Figure S2D). Instead, our observations align with a phased replacement model: recruited monocytes first differentiate into monocyte-derived macrophages (MoMFs), which we see accumulate (Revised Figure S2B, S2C), and only later adopt a KC phenotype. Consistent with this, our 16-week model shows significant EmKC loss and MoMF expansion, but not yet the emergence of TIM4-MoKCs. This timing is supported by prior studies, where TIM4-KCs were observed at 24 weeks, but not at 16 weeks, on similar diets (Ref. 1,2). Therefore, we interpret our findings as capturing an earlier phase of MASLD progression, characterized by EmKC death and MoMF accumulation, prior to their full differentiation into MoKCs.

      Moreover, there is no evidence that TIM4<sup>+</sup>CLEC4F<sup>+</sup> KCs increase their proliferation rate to compensate for such extensive cell death. If approximately 60% of KCs are dying and no monocyte-derived KCs are recruited, one would expect a much greater decrease in total KC numbers than what is reported.

      Thank you for raising this point, which allows for an important clarification. The interpretation that approximately 60% of KCs are dying is correct, but this refers to the proportion of the remaining KC population at 16 weeks that is TUNEL<sup>+</sup>, not to 60% of the original KC pool. Since our data show that over half of the EmKCs are lost by 16 weeks (Revised Figure 1B), the 60% of dying cells at this late time point corresponds roughly to only 25-30% of the total original KC population at baseline. This distinction reconciles the high rate of apoptosis observed late in disease with the overall progressive depletion of the EmKC pool.
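
      The arithmetic behind this estimate can be sketched as follows, using only figures quoted in this response: TIM4<sup>+</sup> density falling from about 60 to about 30 cells/FOV, absolute numbers falling from roughly 12 x 10<sup>5</sup> to 5 x 10<sup>5</sup> per gram of liver, and about 60% of the remaining KCs being TUNEL<sup>+</sup> at 16 weeks. The function name is illustrative only, and the sketch assumes that the TUNEL<sup>+</sup> fraction is a snapshot of the surviving pool rather than a cumulative death count.

```python
def dying_share_of_original(baseline, remaining, tunel_fraction):
    """Express TUNEL+ cells at a late time point as a share of the baseline pool.

    baseline and remaining must use the same unit (cells/FOV or cells per gram).
    """
    surviving_fraction = remaining / baseline
    return tunel_fraction * surviving_fraction


# Approximate values quoted in the response (not exact measurements).
by_density = dying_share_of_original(baseline=60, remaining=30, tunel_fraction=0.60)
by_count = dying_share_of_original(baseline=12e5, remaining=5e5, tunel_fraction=0.60)

print(f"share of original pool, from cell density: {by_density:.0%}")
print(f"share of original pool, from absolute counts: {by_count:.0%}")
```

      The two readouts bracket the 25-30% range given above, depending on whether the density or the absolute-count measurement is used for the surviving fraction.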

      It is also unexpected that the maximal rate of KC death occurs at early time points (8 weeks), when the mice have not yet gained substantial weight (Figure 1B). Previous studies have shown that longer feeding periods are typically required to observe the loss of embryo-derived KCs.

      We appreciate the reviewer’s insightful observation. We think KC death is a continuous event during MASLD. Previous studies typically assessed the loss of EmKCs after the longer feeding periods needed to induce MASH, which may give the impression that longer feeding periods are required to observe substantive loss of embryonically derived KCs. In our HFHC model, the proportion of dying KCs was already elevated by 8 weeks, and this high rate was sustained through the 16-week endpoint. In a separate MCD dietary model characterized by rapid MASLD progression, a high rate of KC death was detectable as early as 6 weeks (Revised Figure 1F). Collectively, these data suggest that the onset of significant KC death depends on the pace of MASLD pathogenesis and is more likely an early-initiated event that continues throughout MASLD progression.

      Furthermore, it is surprising that the HFD induces as much KC death as the HFHC and MCD diets. Earlier studies suggested that HFD alone is far less effective than MASH-inducing diets at promoting the replacement of embryonic KCs by monocyte-derived macrophages.

      We appreciate the reviewer’s insightful comment. In our study, we observed significant KC death after HFD feeding for 20 weeks and HFHC feeding for 16 weeks. Moreover, both HFHC and HFD induced similar stages of MASLD (characterized by significant lipid accumulation without fibrosis development) by these time points (Author response image 1). Therefore, these data support that the onset of substantial KC death may be an early MASLD event, before the progression to MASH. Additionally, this finding aligns with existing literature showing that 16 weeks of HFD feeding alone is sufficient to cause a marked reduction in the TIM4<sup>+</sup> KC population (Ref. 1).

      Author response image 1.

      Detection of liver fibrosis in MASLD mouse models. Male wild-type C57BL/6J mice were fed a high-fat, high-cholesterol (HFHC) diet for 16 weeks or a high-fat diet (HFD) for 20 weeks to induce MASLD. Mice fed a normal chow diet (NCD) served as controls. (A) Sirius Red staining of liver sections was performed to assess collagen deposition and fibrosis during MASLD progression. Scale bar, 20 μm. (B) Western blot analysis of liver tissue lysates showing α-smooth muscle actin (α-SMA) expression as a marker of hepatic stellate cell activation and liver fibrosis.

      In Figure 2D, TIMD4 staining appears extremely faint, making the results difficult to interpret. In contrast, the TUNEL signal is strikingly intense and encompasses a large proportion of liver cells (approximately 60% of KCs, 15% of hepatocytes, 20% of hepatic stellate cells, 30% of non-KC macrophages, and a proportion of endothelial cells is also likely affected). This pattern closely resembles that typically observed in mouse models of acute liver failure. Given this apparent extent of cell death, it is unexpected that ALT and AST levels remain low in MASH mice, which is highly unusual.

      Thank you for this important feedback. To address concerns about the clarity of our imaging, we have provided high-resolution split-channel raw images for Figure 2D (Revised Figure 2D), which distinctly show the localization of TIM4, TUNEL, and GS. These confirm the progressive reduction of TIM4<sup>+</sup> KCs and the increase in TUNEL<sup>+</sup>TIM4<sup>+</sup> cells over time. We agree that the high proportion of TUNEL<sup>+</sup> cells seems at odds with the modest ALT/AST elevation. This discrepancy might be explained by the distinct nature of cell death in MASLD. Unlike the acute necrosis with membrane rupture seen in acute liver failure, which causes massive, rapid enzyme release, obesity-related liver injury is a chronic process dominated by apoptosis (Ref. 4,5). Apoptosis preserves membrane integrity until late stages (Ref. 6), with dying cells packaged into apoptotic bodies for efficient phagocytic clearance by neighboring macrophages (Ref. 7,8). This controlled disposal system minimizes the leakage of intracellular enzymes. Therefore, the coexistence of widespread apoptosis (high TUNEL signal) with limited enzyme release (low ALT/AST) is a recognized feature of chronic MASLD pathogenesis.

      No statistical analysis is provided for Figure 5D, and it is unclear which metabolites show statistically significant changes in Figure 5C.

      We thank the reviewer for raising this statistical problem. We have now included statistical analysis in Revised Figure 5D.

      In addition, there is no evaluation of liver pathology in Clec4f-Cre × Chil1flox/flox mice. It remains possible that the observed effects on KC death result from aggravated liver injury in these animals. There is also no evidence that Chil1 deficiency affects glucose metabolism in KCs in vivo.

      We thank the reviewer for these important points. We previously characterized the liver pathology of Clec4f<sup>ΔChil1</sup> mice in detail (preprint: eLife 2025, DOI: 10.7554/eLife.107023.1, Fig. 2). On a normal chow diet, these mice showed no differences in body weight, hepatic lipid deposition, metabolic parameters, or glucose tolerance compared to controls. However, on an HFHC diet, Clec4f<sup>ΔChil1</sup> mice developed significantly worse metabolic and histological phenotypes. Crucially, our in vitro data demonstrate that recombinant Chi3l1 directly reduces KC death (preprint, Fig. 6E-F), indicating that the aggravated MASLD in knockout mice is a consequence of increased KC loss, not its cause.

      Regarding glucose metabolism, we have previously shown that Chi3l1 deficiency leads to increased glucose uptake by KCs in vivo using the fluorescent glucose analog 2-NBDG. This effect was reversed by supplementing knockout mice with recombinant Chi3l1 (preprint Fig. 6G-H). This provides direct evidence that Chi3l1 modulates glucose uptake in KCs in vivo.

      Finally, the authors should include a more direct experimental approach to modulate glycolysis in KCs and assess its causal role in KC death in MASH.

      We thank the reviewer for this constructive suggestion. To more directly evaluate the role of glycolysis in KC death in vivo, we performed pharmacological inhibition of glycolysis using 2-deoxy-D-glucose (2-DG) in the HFHC-induced MASLD model (Revised Figure 4E–G). Wild-type mice were fed an HFHC diet for four weeks, and 2-DG (50 mg/kg) or vehicle was administered intraperitoneally every other day beginning at week 3. This short intervention period and modest dosing were chosen to limit potential systemic metabolic effects while modulating glycolytic activity during active disease development. KC apoptosis was assessed by TIM4/TUNEL co-staining. 2-DG treatment significantly reduced the proportion of TUNEL<sup>+</sup> KCs compared with vehicle controls, indicating protection against KC death. These data, together with our complementary in vitro gain-of-function experiments, support a contributory role for excessive glycolytic activity in promoting KC apoptosis in MASLD. We have incorporated these findings into the revised manuscript to strengthen the causal link between glycolytic reprogramming and KC loss in vivo (Revised manuscript, page 7, lines 267-282).

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, He et al. set out to investigate the mechanisms behind Kupffer Cell death in MASLD. As has been previously shown, they demonstrate a loss of resident KCs in MASLD in different mouse models. They then go on to show that this correlates with alterations in genes/metabolites associated with glucose metabolism in KCs. To investigate the role of glucose metabolism further, they subject isolated KCs in vitro to different metabolic treatments and assess cleaved caspase 3 staining, demonstrating that KCs show increased Cl. Casp 3 staining upon stimulation of glycolysis. Finally, they use a genetic mouse model (Chil1KO) where they have previously reported that loss of this gene leads to increased glycolysis and validate this finding in BMDMs (KO). They then remove this gene specifically from KCs (Clec4fCre) and show that this leads to increased macrophage death compared with controls.

      Strengths:

      As we do not yet understand why KCs die in MASLD, this manuscript provides some explanation for this finding. The metabolomics is novel and provides insight into KC biology. It could also lead to further investigation; here, it will be important that the full dataset is made available.

      Weaknesses:

      Different diets are known to induce different amounts of KC loss, yet here, all models examined appear to result in 60% KC death. One small field of view of liver tissue is shown as representative to make these claims, but this is not sufficient, as anything can be claimed based on one field of view. Rather, a full tissue slice should be included to allow readers to really assess the level of death.

      Thank you for raising this point regarding data presentation. We analyzed full tissue slices and found that including a view of the entire slice at a standard magnification makes individual KCs difficult to resolve (Author response image 2). To clearly represent the extent and distribution of KC death across the liver tissue slice, we now include lower-magnification images that provide a wider field of view, allowing readers to assess the pattern across a larger tissue area (Revised Figures 1, 2, 6F).

      Author response image 2.

      Assessment of KC death on a full liver tissue slice. (A) Immunofluorescence staining was performed to detect Kupffer cell (KC) death in liver sections from mice fed an MCD diet for 6 weeks. Cell death was assessed by TUNEL staining (green), and KCs were identified by TIM4 staining (red). Nuclei were counterstained with DAPI (blue). A representative whole-tissue view is shown. Scale bars, 1 mm.

      Additionally, there is no consistency between the markers used to define KCs and moMFs, with CLEC4F being used in microscopy, TIM4 in flow, while the authors themselves acknowledge that moKCs are CLEC4F+TIM4-. As moKCs are induced in MASLD, this limits interpretation. Additionally, Iba1 is referred to as a moMF marker but is also expressed by KCs, which again prevents an accurate interpretation of the data. Indeed, the authors show 60% of KCs are dying but only 30% of IBA1+ moMFs, as KCs are also IBA1+, this would mean that KCs die much more than moMFs, which would then limit the relevance of the BMDM studies performed if the phenotype is KC specific. Therefore, this needs to be clarified.

      We thank the reviewer for the constructive comments. For consistency, we have standardized our KC marker to TIM4 for all immunostaining data, aligning it with our flow cytometry analysis (Revised Figures 1, 2D, 6F). We have also clarified that IBA1 is expressed by hepatic macrophages (both KCs and MoMFs) (Revised Figure 2C; Revised manuscript, page 5, lines 182-183). Moreover, we have added the clarification that the finding that 60% of TIM4<sup>+</sup> KCs are TUNEL<sup>+</sup>, versus 30% of total IBA1<sup>+</sup> cells, further supports that KCs undergo death more readily than MoMFs (Revised manuscript, page 5, lines 186-189). We have also acknowledged the limitation of the BMDM studies (Revised manuscript, page 8, lines 332-340).

      The claim that periportal KCs die preferentially is not supported, given that the majority of KCs are peri-portal. Rather, these results would need to be normalised to KC numbers in PP vs PC regions to make meaningful conclusions.

      We thank the reviewer for this important point. We included the normalized data. At 8 weeks, the normalized death rate was significantly higher in periportal versus pericentral regions (p = 0.041), supporting increased periportal KC susceptibility during early MASLD. By 16 weeks, proportional death rates became comparable between zones (Revised Figure 2D, Revised manuscript, page 6, lines 194-201).
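
      As a minimal sketch of the normalization described above, the per-zone death rate can be expressed as the number of TUNEL<sup>+</sup> KCs divided by the total number of KCs in the same zone. The counts below are hypothetical and chosen only for illustration; they are not data from the study, and the function name is our own.

```python
def zonal_death_rate(tunel_positive, total_kcs):
    """Fraction of KCs in one zone that are TUNEL+ (normalized to zone KC number)."""
    if total_kcs <= 0:
        raise ValueError("total_kcs must be positive")
    return tunel_positive / total_kcs


# Hypothetical counts per field of view, for illustration only.
periportal = {"tunel_positive": 30, "total_kcs": 50}
pericentral = {"tunel_positive": 10, "total_kcs": 20}

pp_rate = zonal_death_rate(**periportal)
pc_rate = zonal_death_rate(**pericentral)

# Raw counts differ three-fold, but the normalized rates are much closer.
raw_ratio = periportal["tunel_positive"] / pericentral["tunel_positive"]
print(f"raw periportal/pericentral count ratio: {raw_ratio:.1f}")
print(f"normalized rates: periportal {pp_rate:.2f}, pericentral {pc_rate:.2f}")
```

      With these illustrative numbers, the raw count suggests a large periportal excess simply because more KCs reside there, whereas the normalized rates differ only modestly, which is why the zonal comparison is made on normalized values.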

      Additionally, KCs are known to be notoriously difficult to keep alive in vitro, and for these studies, the authors only examine cl. Casp 3 staining. To fully understand that data, a full analysis of the viability of the cells and whether they retain the KC phenotype in all conditions is required.

      We appreciate the reviewer’s suggestions. To confirm the identity and health of isolated KCs in our in vitro studies, we showed that ~95% of primary isolated KCs are TIM4<sup>+</sup> (Revised Figure S3A). Furthermore, Calcein-AM staining confirmed that the remaining KCs under our experimental conditions are viable and healthy (Revised Figure S4A).

      Finally, in the Cre-driven KO model, there does not seem to be any death of KCs in the controls (rather numbers trend towards an increase with time on diet, Figure 6E), contrary to what had been claimed in the rest of the paper, again making it difficult to interpret the overall results.

      We thank the reviewer for this comment. During our analysis, we indeed observed no reduction in KCs in the Clec4f cre control mice. This prompted us to consider that Cre insertion itself might influence KC maintenance. To investigate this, we performed TIM4/Ki67 co-staining, which revealed significantly higher numbers of proliferating KCs in Clec4f cre mice compared with C57BL/6J mice under NCD. Following HFHC feeding, KC proliferation in Clec4f cre mice increased even further. These results indicate that Cre insertion enhanced KC self-renewal in Clec4f cre mice, which contributes to maintenance of the KC pool during MASLD (Revised Figures S8A and S8B; Revised manuscript, page 9, lines 363-370).

      Additionally, there is no validation that the increased death observed in vivo in KCs is due to further promotion of glycolysis.

      We thank the reviewer for this constructive suggestion. To more directly evaluate the role of glycolysis in KC death in vivo, we performed pharmacological inhibition of glycolysis using 2-deoxy-D-glucose (2-DG) (Revised Figure 4E–G). Wild-type mice were fed an HFHC diet for five weeks, and 2-DG (50 mg/kg) or vehicle was administered intraperitoneally every other day beginning at week 3. This short intervention period and modest dosing were chosen to limit potential systemic metabolic effects while modulating glycolytic activity in KCs. KC apoptosis was assessed by TIM4/TUNEL co-staining. 2-DG treatment significantly reduced the proportion of TUNEL<sup>+</sup> KCs compared with vehicle controls, indicating protection against KC death. These data, together with our complementary in vitro gain-of-function experiments, support a contributory role for excessive glycolytic activity in promoting KC death in MASLD. We have incorporated these findings into the revised manuscript to strengthen the causal link between glycolytic reprogramming and KC loss in vivo (Revised manuscript, page 7, lines 267-282).

      Reviewer #3 (Public review):

      This manuscript provides novel insights into altered glucose metabolism and KC status during early MASLD. The authors propose that hyperactivated glycolysis drives a spatially patterned KC depletion that is more pronounced than the loss of hepatocytes or hepatic stellate cells. This concept significantly enhances our understanding of early MASLD progression and KC metabolic phenotype.

      Through a combination of TUNEL staining and MS-based metabolomic analyses of KCs from HFHC-fed mice, the authors show increased KC apoptosis alongside dysregulation of glycolysis and the pentose phosphate pathway. Using in vitro culture systems and KC-specific ablation of Chil1, a regulator of glycolytic flux, they further show that elevated glycolysis can promote KC apoptosis.

      However, it remains unclear whether the observed metabolic dysregulation directly causes KC death or whether secondary factors, such as low-grade inflammation or macrophage activation, also contribute significantly. Nonetheless, the results, particularly those derived from the Chil1-ablated model, point to a new potential target for the early prevention of KC death during MASLD progression.

      The manuscript is clearly written and thoughtfully addresses key limitations in the field, especially the focus on glycolytic intermediates rather than fatty acid oxidation. The authors acknowledge the missing mechanistic link between increased glycolysis and KC death. Still, several interpretations require moderation to avoid overstatement, and certain experimental details, particularly those concerning flow cytometry and population gating, need further clarification.

      Strengths:

      (1) The study presents the novel observation of profound metabolic dysregulation in KCs during early MASLD and identifies these cells as undergoing apoptosis. The finding that Chil1 ablation aggravates this phenotype opens new avenues for exploring therapeutic strategies to mitigate or reverse MASLD progression.

      (2) The authors provide a comprehensive metabolic profile of KCs following HFHC diet exposure, including quantification of individual metabolites. They further delineate alterations in glycolysis and the pentose phosphate pathway in Chil1-deficient cells, substantiating enhanced glycolytic flux through 13C-glucose tracing experiments.

      (3) The data underscore the critical importance of maintaining balanced glucose metabolism in both in vitro and in vivo contexts to prevent KC apoptosis, emphasizing the high metabolic specialization of these cells.

      (4) The observed increase in KC death in Chil1-deficient KCs demonstrates their dependence on tightly regulated glycolysis, particularly under pathological conditions such as early MASLD.

      Weaknesses:

      (1) The novelty is questionable. The presented work has considerable overlap with a study by the same lab, which is currently under review (citation 17), and it should be considered whether the data should not be presented in one paper.

      We thank the reviewer for the opportunity to clarify the relationship between the two studies. In our previous work (citation 17), we focused on the transcriptional metabolic differences between Kupffer cells (KCs) and monocyte-derived macrophages (MoMFs) and identified Chi3l1 as a selective protective factor that limits glucose uptake and shields KCs from metabolic stress–induced cell death, with minimal effects on MoMFs. That study directly motivated the current work. The observation that KCs are uniquely protected from metabolic stress led us to hypothesize that excessive glycolytic activation itself may be a primary driver of KC death, which forms the central question of the present study. Accordingly, the current manuscript shifts the focus from Chi3l1-mediated protection to the mechanistic role of hyperglycolysis in driving KC mortality, using distinct experimental approaches and addressing a different biological question. Because the two studies address conceptually distinct aims (one defining a protective regulator of KC survival and the other dissecting glycolysis-driven KC death mechanisms), we believe they are best presented as separate manuscripts. Combining them into a single study would dilute the mechanistic depth and clarity of each story.

      (2) The authors report that 60% of KCs are TUNEL-positive after 16 weeks of HFHC diet and confirm this by cleaved caspase-3 staining. Given that such marker positivity typically indicates imminent cell death within hours, it is unexpected that more extensive KC depletion or monocyte infiltration is not observed. Since Timd4 expression on monocyte-derived macrophages takes roughly one month to establish, the authors should consider whether these TUNEL-positive KCs persist in a pre-apoptotic state longer than anticipated. Alternatively, fate-mapping experiments could clarify the dynamics of KC death and replacement.

      We thank the reviewer for this astute observation. As shown in revised Figure 2D, the proportion of TIM4<sup>+</sup>TUNEL<sup>+</sup> KCs peaks at 8 weeks after HFHC feeding and remains elevated at 16 weeks. However, examination of the corresponding single-channel TIM4 staining during this period reveals that the overall density of TIM4<sup>+</sup> KCs does not undergo abrupt or synchronous depletion. This temporal dissociation between sustained TUNEL positivity and relatively gradual KC loss suggests that TUNEL-positive KCs do not undergo immediate clearance. Based on these observations, we agree with the reviewer that a substantial fraction of TUNEL-positive KCs likely persists in a prolonged pre-apoptotic or stressed state rather than undergoing rapid cell death. This interpretation is consistent with the absence of extensive KC depletion or compensatory monocyte infiltration at these time points. Importantly, previous studies (Ref. 1,2) indicate that KCs are eventually lost as MASLD progresses, supporting the notion that KC death is a gradual process that unfolds over an extended time frame rather than acutely.

      (3) The mechanistic link between elevated glycolytic flux and KC death remains unclear.

      We thank the reviewer for this constructive suggestion. To more directly evaluate the role of glycolysis in KC death in vivo, we performed pharmacological inhibition of glycolysis using 2-deoxy-D-glucose (2-DG) (Revised Figure 4E–G). Wild-type mice were fed an HFHC diet for five weeks, and 2-DG (50 mg/kg) or vehicle was administered intraperitoneally every other day beginning at week 3. This short intervention period and modest dosing were chosen to limit potential systemic metabolic effects while modulating the glycolytic activity of KCs. KC apoptosis was assessed by TIM4/TUNEL co-staining. 2-DG treatment significantly reduced the proportion of TUNEL<sup>+</sup> KCs compared with vehicle controls, indicating protection against KC death. These data, together with our complementary in vitro gain-of-function experiments, support a contributory role for excessive glycolytic activity in promoting KC apoptosis in MASLD. We have incorporated these findings into the revised manuscript to strengthen the causal link between glycolytic reprogramming and KC loss in vivo (Revised manuscript, page 7, lines 267-282).

      (4) The study does not address the polarization or ontogeny of KCs during early MASLD. Given that pro-inflammatory macrophages preferentially utilize glycolysis, such data could provide valuable insight into the reason for increased KC death beyond the presented hyperreliance on glycolysis.

      We thank the reviewer for this insightful comment. Regarding KC ontogeny, flow cytometry analysis (Revised Figure 1C) shows that KCs remain uniformly TIM4<sup>hi</sup> during early MASLD, indicating that monocyte-derived KCs (TIM4<sup>low</sup>) have not yet emerged at these stages. To address KC polarization, we assessed the expression of M1-type (pro-inflammatory) markers (Nos2, Cxcl9, CIITA, Cd86, Ccl3, and Ccl5) and M2-type (anti-inflammatory) markers (Chil3, Retnla, Arg1, and Mrc1) in KCs isolated from WT mice fed an HFHC diet for 0, 8, and 16 weeks. As shown in revised Figure S5A, M1 markers progressively increase over time, whereas M2 markers remain unchanged or slightly decrease. This polarization shift is consistent with the increased glycolytic activity observed in KCs during early MASLD. Together, these data indicate that embryonically derived KCs undergo a pro-inflammatory polarization accompanied by enhanced glycolytic metabolism during early MASLD, providing mechanistic context for their increased susceptibility to metabolic stress–induced cell death beyond hyperreliance on glycolysis alone (Revised manuscript, pages 7-8, lines 307-321).

      (5) The gating strategy for monocyte-derived macrophages (moMFs) appears suboptimal and may include monocytes. A more rigorous characterization of myeloid populations by including additional markers would strengthen the study's conclusions.

      We thank the reviewer for raising this important point. To improve the rigor of our analysis, we adopted gating strategies established in previous studies (PMID: 41131393; PMID: 32562600). Specifically, Kupffer cells were defined as CD45<sup>+</sup>CD11b<sup>+</sup>F4/80<sup>hi</sup> TIM4<sup>hi</sup> cells, while monocyte-derived macrophages (MoMFs) were defined as CD45<sup>+</sup>Ly6G<sup>-</sup>CD11b<sup>+</sup>F4/80<sup>low</sup> TIM4<sup>low/−</sup> cells, thereby excluding contaminating neutrophils and minimizing inclusion of circulating monocytes. Using this refined gating strategy, we observed a progressive reduction of KCs accompanied by a corresponding increase in MoMFs in WT mice during HFHC feeding (Revised Figures 1C and S2B–C; Revised manuscript, page 4, lines 154-163).

      (6) While BMDMs from Chil1 knockout mice are used to demonstrate enhanced glycolytic flux, it remains unclear whether Chil1 deficiency affects macrophage differentiation itself.

      We thank the reviewer for this important question. To determine whether Chi3l1 deficiency affects macrophage differentiation, we analyzed the expression of M1-type (pro-inflammatory) markers (Nos2, Cxcl9, CIITA, Cd86, Ccl3, and Ccl5) and M2-type (anti-inflammatory) markers (Chil3, Retnla, Arg1, and Mrc1) in Kupffer cells isolated from WT and Chil1<sup>-/-</sup> mice fed an HFHC diet for 0, 8, and 16 weeks. At baseline (0 weeks), Chi3l1 deficiency was associated with elevated expression of multiple M1 markers, whereas M2 marker expression was comparable between WT and Chil1<sup>-/-</sup> KCs. During MASLD progression, the pro-inflammatory signature in Chil1<sup>-/-</sup> KCs was further enhanced, while anti-inflammatory marker expression became dysregulated (revised Figure S5C). Together, these data indicate that Chi3l1 deficiency does not impair macrophage differentiation per se but biases KCs toward a partially pro-inflammatory, M1-like phenotype, providing additional context for the enhanced glycolytic flux observed in Chi3l1-deficient macrophages (Revised manuscript, pages 7-8, lines 307-321).

      (7) The authors use the PDK activator PS48 and the ATP synthase inhibitor oligomycin to argue that increased glycolytic flux at the expense of OXPHOS promotes KC death. However, given the high energy demands of KCs and the fact that OXPHOS yields 15-16 times more ATP per glucose molecule than glycolysis, the increased apoptosis observed in Figure 4C-F could primarily reflect energy deprivation rather than a glycolysis-specific mechanism.

      We thank the reviewer for highlighting this important point. We agree that KCs are highly metabolically active and that perturbations of OXPHOS can influence overall cellular energy balance. As noted in our response to comment #3, we additionally inhibited glycolysis in vivo with 2-DG. The protection of KCs observed following 2-DG treatment (Revised Figure 4E-G) provides further evidence that increased glycolytic flux is not merely correlated with, but functionally contributes to, KC loss in MASLD.

      (8) In Figure 1C, KC numbers are significantly reduced after 4 and 16 weeks of HFHC diet in WT male mice, yet no comparable reduction is seen in Clec4Cre control mice, which should theoretically exhibit similar behavior under identical conditions.

      We thank the reviewer for this comment. During our analysis, we indeed observed no reduction in KCs in the Clec4f cre control mice. This prompted us to consider that Cre insertion itself might influence KC maintenance. To investigate this, we performed TIM4/Ki67 co-staining, which revealed significantly higher numbers of proliferating KCs in Clec4f cre mice compared with C57BL/6J mice under NCD. Following HFHC feeding, KC proliferation in Clec4f cre mice increased even further. These results indicate that Cre insertion enhanced KC self-renewal in Clec4f cre mice, which contributes to maintenance of the KC pool during MASLD (Revised Figures S8A and S8B; Revised manuscript, page 9, lines 363-370).

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      To address the concerns raised in the public review, the authors should:

      (1) Reassess their conclusions using the same panels in flow and microscopy, e.g., the combination of CLEC4F, TIM4, and IBA1. This will allow resKCs (CLEC4F+TIM4+IBA1+), moKCs (CLEC4F+TIM4-IBA1+), and moMFs (CLEC4F-TIM4-IBA1+) to be accurately defined and hence their viability and numbers correctly assessed.

      We thank the reviewer for this insightful suggestion. In our flow cytometry analysis, we did not detect a CD45<sup>+</sup>CD11b<sup>low</sup>F4/80<sup>hi</sup>TIM4<sup>low</sup> population, indicating that monocyte-derived KCs (moKCs) have not emerged in our model at this stage. To more accurately quantify resident KCs (resKCs) in the current study, we replaced CLEC4F with TIM4 staining and enumerated TIM4<sup>+</sup> as well as TIM4<sup>+</sup>TUNEL<sup>+</sup> cells. These data were highly consistent with CLEC4F<sup>+</sup>TUNEL<sup>+</sup> cell counts, confirming that moKCs are not involved in KC death during early MASLD (Revised Figure 1A,B,E,F).

      (2) Investigate why the number of KCs in controls and MASLD are so distinct between Figures 1 and 6.

      We appreciate the reviewer’s suggestions. As explained above, Cre insertion promotes KC self-renewal (Revised manuscript, Figure S8). This enhanced proliferative capacity likely accounts for the relative preservation of KC numbers in Clec4f-Cre mice during HFHC feeding, explaining the apparent discrepancy with WT mice (Revised manuscript, Figure 6D-E).

      (3) Normalise the tunel+ cells based on the number of KCs in PP vs PC regions.

      After normalizing KC death to KC numbers in periportal (PP) versus pericentral (PC) regions, we found that the proportion of dying KCs was significantly higher in PP regions than in PC regions at 8 weeks of HFHC feeding. We have therefore revised the text (Revised manuscript, page 5, lines 194-201).
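
      The normalization described above can be sketched as follows. This is a minimal illustration only: the function name, the zone labels, and the counts are hypothetical placeholders, not data or code from the study.

```python
from typing import Dict


def tunel_fraction_by_zone(
    tunel_pos: Dict[str, int], total_kcs: Dict[str, int]
) -> Dict[str, float]:
    """Return TUNEL+ KCs as a fraction of all KCs counted in each zone."""
    fractions = {}
    for zone, total in total_kcs.items():
        if total <= 0:
            raise ValueError(f"zone {zone!r} has no KCs to normalize against")
        # Zones without any TUNEL+ count are treated as having zero dying KCs.
        fractions[zone] = tunel_pos.get(zone, 0) / total
    return fractions


# Hypothetical counts per field of view (invented for illustration).
example_tunel = {"periportal": 30, "pericentral": 12}
example_total = {"periportal": 60, "pericentral": 40}
fractions = tunel_fraction_by_zone(example_tunel, example_total)
```

      Dividing by the zone-specific KC count, rather than comparing raw TUNEL<sup>+</sup> counts, keeps a zone that simply contains more KCs from appearing to have more cell death.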

      (4) Demonstrate the viability of KCs in vitro across conditions.

      To confirm the identity and health of isolated KCs in our in vitro studies, we show that ~95% of primary isolated KCs are TIM4<sup>+</sup> (Revised Figure S3A). Furthermore, Calcein-AM staining confirmed that the remaining KCs under our experimental conditions are viable and healthy (Revised Figure S4A).

      (5) Confirm previous studies demonstrating different degrees of KC loss depending on the model of MASLD.

      We thank the reviewer for highlighting this point. Consistent with previous studies, KC loss has been reported to varying degrees depending on the MASLD model used, reflecting the heterogeneity of hepatic macrophages, marker choice, mouse husbandry, and diet regimen. For example, in a 6-week MCD feeding model, ~10% of CLEC4F<sup>+</sup> KCs were TUNEL<sup>+</sup> (Figure 4A, Ref. 9). Another 6-week MCD study reported a drop from 66% to 26% TIM4<sup>+</sup> KCs (Figure 2A, Ref. 12). In an HFD model, TIM4<sup>+</sup> KCs decreased by ~20% after 16 weeks (Figure 1G, Ref. 1). In a Western diet model, TIM4<sup>+</sup> KCs decreased by >50% at 36 weeks (Figures 1J and 2C, Ref. 2). Together, these studies underscore the model-dependent nature of KC loss and highlight the importance of experimental context and marker selection when assessing KC dynamics in MASLD. We have included these studies in our discussion section (Revised manuscript, pages 9-10, lines 393-402).

      (6) Demonstrate in vivo that loss of CHIL1 drives further glycolysis in KCs.

      In Figure 6G-H of our previous study, we used the fluorescent glucose analog 2-NBDG to show that Chi3l1 deficiency increases glucose uptake by KCs in vivo, whereas supplementing KO mice with recombinant Chi3l1 significantly reduces it. We have included the related figure here as Author response image 3.

      Author response image 3.

      Chi3l1 limits glucose uptake by Kupffer cells in vivo. (A) Measurement of 2-NBDG (a fluorescent glucose analog) uptake by KCs in vivo. WT and Chil1<sup>-/-</sup> mice, either untreated or supplemented with rChi3l1, were injected intraperitoneally with 12 mg/kg 2-NBDG. After 45 min, KCs were isolated and glucose uptake was assessed by spectrophotometry. (B) Representative immunofluorescence images of liver sections stained for TIM4 (red) and 2-NBDG uptake (green) to visualize glucose uptake by KCs in situ. Scale bar = 10 µm (zoom). Quantification is shown as the percentage of TIM4<sup>+</sup> cells that are also 2-NBDG<sup>+</sup>. One-way ANOVA was performed in A and B. P values are as indicated.

      (7) There is no mention of the publication of the metabolomics dataset; this should be released with the manuscript.

      We have now included the raw metabolomics dataset as Tables S1 and S2.

      Reviewer #3 (Recommendations for the authors):

      (1) Methods: Reconsider which methods are described in the main text versus the Supplementary Information to improve readability and consistency.

      Thank you for your valuable suggestion. We have reevaluated and adjusted the placement of the methods section between the main text and the supplementary materials.

      (2) Line 34: Check for grammar issues.

      L34 has been revised as follows: Additionally, using Chi3l1-deficient mice, we further demonstrated that increased glucose utilization accelerates KCs death in vivo.

      (3) Lines 101, 110: Explicitly reference the corresponding Supplementary Methods sections.

      We have included the references for these two methods sections (Revised supplementary materials and methods, Line 30, 65, respectively).

      (4) Figure 2: Iba1 marks all macrophages, not only monocyte-derived macrophages; both figure and text (line 205) require correction.

      We have corrected the figure and text to state that Iba1 marks hepatic macrophages, including both KCs and MoMFs (Revised Figure 2C, manuscript page 5, line 182).

      (5) Line 218-219: Avoid overinterpretation, as only KCs, hepatocytes, and hepatic stellate cells were assessed - not all hepatic populations.

      We appreciate the reviewer’s valuable suggestion and rephrased our description accordingly (Revised manuscript, page 5, line 186-189).

      (6) Line 262: Use abbreviations consistently throughout the manuscript.

      We have gone through the whole manuscript and double checked the abbreviations.

      (7) Line 264: Include the palmitic acid (PA) concentration used.

      We have now stated the PA concentration used (800 µM) in the revised manuscript (Revised manuscript, page 6, line 250).

      (8) Lines 316-317: Check for grammar errors.

      The grammar errors have been corrected (Revised manuscript, page 8, lines 340-341).

      (9) Line 337-338: See comment above on gating strategy.

      We have updated the gating strategy accordingly (Revised manuscript, page 9, lines 361-362).

      (10) Line 343-344: Note that Chi3l1 is not exclusively expressed by KCs.

      We have rephrased the text accordingly (Revised manuscript, page 9, lines 374-378).

      (11) Lines 355-358: The statement that "sustained glycolytic hyperactivation culminates not in sustained activation, but in apoptotic cell death" is unsupported by data or literature, as macrophage polarization was not analyzed in this study.

      We removed the statement from the revised manuscript.

      (12) Lines 375-379: Rephrase to clarify that while KCs are metabolically active and glucose-demanding, excessive glycolytic flux accelerates apoptosis.

      We have rephrased to clarify (Revised Manuscript, page 10, lines 405-407).

      (13) Lines 375-385 & 387-397: Consolidate overlapping statements for conciseness and coherence.

      We have consolidated the overlapping statements (Revised manuscript, page 10, lines 405-425).

      References

      1. Daemen, S. et al. Dynamic Shifts in the Composition of Resident and Recruited Macrophages Influence Tissue Remodeling in NASH. Cell Rep 34, 108626, doi:10.1016/j.celrep.2020.108626 (2021).

      2. Remmerie, A. et al. Osteopontin Expression Identifies a Subset of Recruited Macrophages Distinct from Kupffer Cells in the Fatty Liver. Immunity 53, 641-657.e614, doi:10.1016/j.immuni.2020.08.004 (2020).

      3. Ozer, J., Ratner, M., Shaw, M., Bailey, W. & Schomaker, S. The current state of serum biomarkers of hepatotoxicity. Toxicology 245, 194-205, doi:10.1016/j.tox.2007.11.021 (2008).

      4. Malhi, H. & Gores, G. J. Molecular mechanisms of lipotoxicity in nonalcoholic fatty liver disease. Semin Liver Dis 28, 360-369, doi:10.1055/s-0028-1091980 (2008).

      5. Ibrahim, S. H., Hirsova, P. & Gores, G. J. Non-alcoholic steatohepatitis pathogenesis: sublethal hepatocyte injury as a driver of liver inflammation. Gut 67, 963-972, doi:10.1136/gutjnl-2017-315691 (2018).

      6. Kerr, J. F., Wyllie, A. H. & Currie, A. R. Apoptosis: a basic biological phenomenon with wide-ranging implications in tissue kinetics. British journal of cancer 26, 239-257, doi:10.1038/bjc.1972.33 (1972).

      7. Poon, I. K., Lucas, C. D., Rossi, A. G. & Ravichandran, K. S. Apoptotic cell clearance: basic biology and therapeutic potential. Nat Rev Immunol 14, 166-180, doi:10.1038/nri3607 (2014).

      8. Krenkel, O. & Tacke, F. Liver macrophages in tissue homeostasis and disease. Nat Rev Immunol 17, 306-321, doi:10.1038/nri.2017.11 (2017).

      9. Tran, S. et al. Impaired Kupffer Cell Self-Renewal Alters the Liver Response to Lipid Overload during Non-alcoholic Steatohepatitis. Immunity 53, 627-640.e625, doi:10.1016/j.immuni.2020.06.003 (2020).

      10. O'Neill, L. A. & Pearce, E. J. Immunometabolism governs dendritic cell and macrophage function. J Exp Med 213, 15-23, doi:10.1084/jem.20151570 (2016).

      11. Vander Heiden, M. G. & DeBerardinis, R. J. Understanding the Intersections between Metabolism and Cancer Biology. Cell 168, 657-669, doi:10.1016/j.cell.2016.12.039 (2017).

      12. Zhang, J. et al. Reactive oxygen species regulation by NCF1 governs ferroptosis susceptibility of Kupffer cells to MASH. Cell Metab 36, 1745-1763.e6, doi:10.1016/j.cmet.2024.05.008 (2024).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      In this manuscript, the authors aimed to identify the molecular target and mechanism by which α-Mangostin, a xanthone from Garcinia mangostana, produces vasorelaxation that could explain the antihypertensive effects. Building on prior reports of vascular relaxation and ion channel modulation, the authors convincingly show that large-conductance potassium BK channels are the primary site of action. Using electrophysiological, pharmacological, and computational evidence, the authors achieved their aims and showed that BK channels are the critical molecular determinant of mangostin's vasodilatory effects, even though the vascular studies are quite preliminary in nature.

      Strengths:

      (1) The broad pharmacological profiling of mangostin across potassium channel families, revealing BK channels - and the vascular BK-alpha/beta1 complex - as the potently activated target in a concentration-dependent manner.

      (2) Detailed gating analyses showing large negative shifts in voltage-dependence of activation and altered activation and deactivation kinetics.

      (3) High-quality single-channel recordings for open probability and dwell times.

      (4) Convincing activation in reconstituted BKα/β1-Ca<sub>v</sub> nanodomains mimicking physiological conditions and functional proof-of-concept validation in mouse aortic rings.

      We thank the reviewer for acknowledging the strength of the different aspects investigated in our study.

      Weaknesses are minor:

      (1) Some mutagenesis data (e.g., partial loss at L312A) could benefit from complementary structural validation.

      In an attempt to improve structural insight into the presented mutagenesis data, we used Alphafold3 (AF3; Abramson et al., 2024) to generate models of the I308A, L312M and A316P substitutions and repeated the docking for each (Author response image 1). These predictive models suggest the following.

      The I308A substitution considerably straightens the S6 helix starting at this residue. Hence, all residues are displaced relative to the WT: C<sub>α</sub> of L312, F315, and A316 are displaced by 2.8 Å, 4.2 Å, and 4.6 Å, respectively, widening the bottom of the binding pocket. However, the prediction confidence is rated lower than in the other AF3 models for all helices (70 > pLDDT > 50). In the docking, poses in the binding pocket comparable to those observed in the WT (i.e. involving I308A, L312 and A316) and with the same molecule orientation have higher binding energies (-7.13 to -6.66 kcal mol<sup>-1</sup>). Additionally, poses without contact to I308A arise that have a more vertical position, indicating that the structural change affects the binding region.

      The changes induced by L312M are localized to residues 313-323, where S6 bends towards S5. Binding energies are lower especially in the best 2 poses that are also most comparable to the WT docking (-9.88 kcal mol<sup>-1</sup>), but clustering overall is poor and poses are more heterogeneous. Interactions with L312M are completely abolished, while interactions with I308 (in 11/20 poses), F315 (in all poses), and A316 (in 5/20 poses) persist. Because of the rather small structural alteration induced by the substitution and the variable poses one could speculate that the reduced V<sub>½</sub> shift is due to the observed loss in binding to L312M; however, retained interactions to the other residues would still allow α-Mangostin to activate.

      A316P induces a displacement of the S6 helix compared to the WT while the other pore helices are not affected. S6 shows an enhanced outward bending around A316, which results in displacements of residues where α-Mangostin would bind, i.e., the C<sub>α</sub> of F315 and L312 are displaced by 2.4 Å and 2.8 Å (I308 is not affected). Residues below are moved in a more rotational way, resulting in a C<sub>α</sub> displacement of 3.1 Å for Y318 and even 5.7 Å for V319, before displacements decrease again towards the intracellular helix end. While interactions with A316P are present in 10/20 analyzed poses, the helix displacement seems to hinder I308 and L312 interactions, as the best docked α-Mangostin pose (-8.41 kcal mol<sup>-1</sup>) is predicted to only contact F315 and Y318, and overall, any I308 or L312 contacts only occurred in 3/20 and 7/20 poses (wildtype: 17/20 and 20/20 poses). This may hint at a mechanism where A316P has a substantial allosteric share in reducing the V<sub>½</sub> shift induced by α-Mangostin and underlines the exceptional effect of this mutation (i.e., complete loss of a V<sub>½</sub> shift).

      Author response image 1.

      Alphafold3 models of BK I308A, L312M, and A316P with α-Mangostin docked to the mutant structures. The upper row shows an overview of the mutant pore helices (AF3 models) used for molecular docking. The lower row shows the binding region with the wildtype structure overlaid in gray. Only 3 helices are shown for clarity.

      Although these results provide interesting tentative explanations for the effect of the mutations and conclusions from AF3 models become increasingly robust, we think that definitive statements of their mechanistic contributions would require experimental studies of mutant channels, i.e., cryo-EM or crystallography, that are beyond our means. Therefore, we have decided not to include this data in the manuscript; however, it is accessible for the interested reader within the public review. Hopefully, as cryo-EM structures have been obtained for the wildtype channel, there will be studies on mutations of this gating-relevant S6 segment in the future.

      (2) While Cav-BK nanodomains were reconstituted, direct measurement of calcium signals after mangostin application onto native smooth muscle could be valuable.

      We are not sure if a global elevation of cellular calcium concentration would be informative. We rather expect that the relevant local Ca<sup>2+</sup> elevation would occur as sparks in the BK-Ca<sub>v</sub> nanodomains, close to the membrane. We would anticipate a change in spark duration, as the Ca<sup>2+</sup> inward current would be stopped faster by the enhanced repolarization via α-Mangostin-activated BKα/β1 channels. Capturing spark activity would require fast Ca<sup>2+</sup> imaging. We concur that this would be an informative experiment to investigate a more native situation. However, we would have to accomplish such methodologically challenging measurements in a separate project, which could fruitfully be combined with a more extensive characterization of aortic contraction as also suggested in the following remark (3).

      (3) The work has an impact on ion channel physiology and pharmacology, providing a mechanistic link between a natural product and vasodilation. Datasets include electrophysiology traces, mutagenesis scans, docking analyses, and aortic tension recordings. The latter, however, are preliminary in nature.

      We completely agree with the reviewer that there is ample room for further studies that could characterize different tissues important in blood pressure regulation (such as resistance arteries), elucidate even more physiological detail (such as modulatory effects of the endothelium), or look deeper into the pharmacology using chemically altered Mangostin derivatives. While we would very much like this to happen in future projects, in this study we focused on the functional aspects of α-Mangostin in BK channel gating. We present our tension recordings as a proof-of-concept to underline the activity of α-Mangostin in native tissues, and we clearly show the importance of the BK channel by using iberiotoxin as a specific inhibitor, which abolished relaxation.

      References:

      Abramson, J. et al. (2024) “Accurate structure prediction of biomolecular interactions with AlphaFold 3,” Nature, 630(8016), pp. 493–500. Available at: https://doi.org/10.1038/s41586-024-07487-w.

      Reviewer #2 (Public review):

      Summary:

      In the present manuscript, Cordeiro et al. show that α-mangostin, a xanthone obtained from the fruit of the Garcinia mangostana tree, behaves as an agonist of the BK channels. The authors arrive at this conclusion through the effect of mangostin on macroscopic and single-channel currents elicited by BK channels formed by the α subunit and α + β1 subunits, as well as αβ1 channels coexpressed with voltage-dependent Ca<sup>2+</sup> (Ca<sub>V</sub>1.2) channels. The single-channel experiments show that α-mangostin produces a robust increase in the probability of opening without affecting the single-channel conductance. The authors contend that α-mangostin activation of the BK channel is state-independent and molecular docking and mutagenesis suggest that α-mangostin binds to a site in the internal cavity. Importantly, α-mangostin (10 μM) alleviates the contracture promoted by noradrenaline. Mangostin is ineffective if the contracted muscles are pretreated with the BK toxin iberiotoxin.

      Strengths:

      The set of results combining electrophysiological measurements, mutagenesis, and molecular docking reveals α-mangostin as a potent activator of BK channels and the putative location of the α-mangostin binding site. Moreover, experiments conducted on aortic preparations from mice suggest that α-mangostin can aid in developing drugs to treat a myriad of diverse diseases involving the BK channel.

      We thank the reviewer for pointing out the significance of our study.

      Weaknesses:

      Major:

      (1) Although the results indicate that α-mangostin is modifying the closed-open equilibrium, the conclusion that this can be due to a stabilization of the voltage sensor in its active configuration may prove to be wrong. It is more probable that, as has been demonstrated for other activators, the α-mangostin is increasing the equilibrium constant that defines the closed-open reaction (L in the Horrigan, Aldrich allosteric gating model for BK). The paper will gain much if the authors determine the probability of opening in a wide range of voltages, to determine how the drug is affecting (or not), the channel voltage dependence, the coupling between the voltage sensor and the pore, and the closed-open equilibrium (L).

      We would like to take the opportunity to clarify this potential misunderstanding. In our manuscript, we have discussed three mechanistic explanations for the Mangostin activation: (1) an electrostatic effect at the selectivity filter, (2) structural and electrostatic changes of S6 that facilitate the opening of a putative lower gate, and (3) hydrophobic gating, i.e., counteracting dewetting of the pore. All possibilities would impact S6 and lower the free energy for pore opening, and we concur that therefore Mangostin most likely affects the closed-open equilibrium (L) of the BKα channel.

      The sentence at the original lines 470-471, “(…) caused by an enhanced shift of the closed-open equilibrium toward the open state, such as the stabilization of the voltage sensor in an active conformation” refers to the observation that the presence of the β1 subunit enhances this closed-open shift. The stabilization of the voltage sensor domain was mentioned as one example of how it achieves this. We recognize that this example was an unfortunate choice, as β1 rather facilitates Ca<sup>2+</sup>-dependent allosteric pore opening unrelated to the discussed mechanisms of Mangostin. We have therefore removed this statement.

      As to the suggestion to dissect the effect of Mangostin on C, D, and L, we agree with the reviewer that this would surely add to a full biophysical characterization. However, in our project, we strove towards including more experiments showing the physiological implications of Mangostin activation to emphasize the implication for vasodilation. We hope the reviewer understands that, with limited resources, this came at the expense of a full investigation of the different gating components, which could pose a separate project by itself.
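
      For readers less familiar with the Horrigan-Aldrich (HA) allosteric model that both the reviewer and our response refer to, the effect of a change in L can be illustrated with a minimal sketch. The symbols L, J, K, C, D, and E follow the usual HA notation. All parameter values below are illustrative assumptions in the range commonly quoted for BK channels; they were not fitted to any data in the manuscript, and the function names are hypothetical.

```python
import math

KT_OVER_E_MV = 25.4  # thermal voltage in mV near room temperature (assumed)


def ha_open_probability(v_mv, ca_um=0.0, l0=1e-6, z_l=0.3, j0=0.03,
                        z_j=0.58, kd_um=11.0, c=8.0, d=25.0, e=2.4):
    """Open probability of the Horrigan-Aldrich model at voltage v_mv (mV)."""
    l = l0 * math.exp(z_l * v_mv / KT_OVER_E_MV)  # closed-open equilibrium
    j = j0 * math.exp(z_j * v_mv / KT_OVER_E_MV)  # voltage-sensor equilibrium
    k = ca_um / kd_um                             # Ca2+ site occupancy term
    open_weight = l * (1 + k * c + j * d + j * k * c * d * e) ** 4
    closed_weight = (1 + j + k + j * k * e) ** 4
    return open_weight / (open_weight + closed_weight)


def half_activation_voltage(l0, lo=-300.0, hi=500.0, tol=1e-6, **kwargs):
    """Find V1/2 by bisection; Po rises monotonically with V for these values."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ha_open_probability(mid, l0=l0, **kwargs) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# A tenfold increase in L0 (hypothetical drug effect) versus control.
v_half_control = half_activation_voltage(1e-6)
v_half_activated = half_activation_voltage(1e-5)
```

      In this sketch, raising L0 alone moves V<sub>½</sub> to more negative voltages without touching the voltage-sensor parameters. Distinguishing such an effect from a change in C, D, or the voltage-sensor equilibrium would require open-probability measurements over a wide voltage range, as the reviewer suggests.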

      (2) Apparently, the molecular docking was performed using the truncated structure of the human BK channel. However, it is unclear which one, since the PDB ID given in the Methods (6vg3), according to what I could find, corresponds to the unliganded, inactive PTK7 kinase domain. Be that as it may, the apo and Ca<sup>2+</sup>-bound structures show that there is a rotation and a displacement of the S6 transmembrane domain. Therefore, the positions of the residues I308, L312, and A316 in the closed and open configurations of the BK channel are not the same. Hence, it is expected that the strength of binding will be different whether the channel is closed or open. This point needs to be discussed.

      We apologize for the typing error and thank the reviewer for pointing out the erroneous PDB ID (“6vg3”). It should have read PDB ID 6v3g, as in the legend to Fig. 4B. The reviewer appropriately points out that there are differences in the S6 segment addressed in our study between the two available cryo-EM structures obtained in the presence (PDB ID 6v38) and absence of Ca<sup>2+</sup> (PDB ID 6v3g) (Tao and MacKinnon, 2019).

      We had actually performed the docking with both structures, but chose to show the Ca<sup>2+</sup>-free structure to better visualize the I308 position. α-Mangostin is found in the same S6 region in both, not obstructing the K<sup>+</sup> conduction pathway. The binding energies of the favored poses are very similar; the binding energy in the best-ranking conformational cluster in the Ca<sup>2+</sup>-bound structure was even slightly lower (-8.64 kcal mol<sup>-1</sup>) than in the docking with the Ca<sup>2+</sup>-free channel (-8.58 kcal mol<sup>-1</sup>; Fig. 4B), which may not be a relevant difference.

      We compared the residue interactions in both dockings (Author response table 1). S317 and Y318, which did not reduce the shift in V<sub>½</sub> upon substitution, were not predicted to contact α-Mangostin in either structure. In both structures, L312 and F315 were predicted to interact in virtually all poses analyzed. In the docking to the Ca<sup>2+</sup>-free state, I308 was also predicted to interact in 17/20 poses, while contacts to A316 occurred in 5/20 poses. In the Ca<sup>2+</sup>-bound state, predicted interactions shifted from I308 (which is expected, as it is buried in the protein) to A316, and the isoprenyl moiety close to I308 rotated downwards. This could indicate that α-Mangostin adopts a more horizontal position following the upward reorientation of S6 in the Ca<sup>2+</sup>-bound state when the channel moves from one conformation to the other (Fig. S4).

      Author response table 1.

      Number of interactions of S6 residues in 20 analyzed α-Mangostin poses in the molecular dockings to the Ca2+-free and Ca2

      These docking results are consistent with our functional measurements. Recent structures of the BK/γ1 complex showed that the VSD and Ca<sup>2+</sup>-bowl are stabilized in an active-like conformation that corresponds to the conformation seen in the Ca<sup>2+</sup>-bound state (Kallure et al., 2023; Yamanouchi et al., 2023; Redhardt, Raunser and Raisch, 2024), indicating that very likely the Ca<sup>2+</sup>-bound and Ca<sup>2+</sup>-free structures indeed represent open and closed conformations of the channel. We observed that α-Mangostin can bind to both of these states to activate the channel (Fig. 3C, D), showing the presence of a binding site in both conformations. Further, α-Mangostin induced a left-shift in V<sub>½</sub> also in higher Ca<sup>2+</sup> concentration (Fig. 2D), indicating that it still binds to and activates the channel after the conformational change in S6. As we could not determine affinity for the mutants due to limited solubility, we have no information on the nature of the contribution of the substitutions, i.e., reduced binding or allosteric effect. As I308 is buried in the Ca<sup>2+</sup>-bound state, its contribution is likely mostly allosteric. We have also proposed dewetting as possible activation mechanism, which we expect to be less sensitive to the exact pose of a molecule (as shown for NS11021, Nordquist et al., 2024). Therefore, α-Mangostin could, e.g., change solvent accessibility of the I308 sidechain, energetically favoring the buried (open) state.

      We have now included both dockings and Author response table 1 in Fig. S4, and we have added passages to the results section (starting at line 373) and discussion section (starting at lines 544, 588).

      Minor:

      (1) From Figure 3A, it is apparent that the increase in Po is at the expense of the long periods (seconds) that the channel remains closed. One might suggest that α-mangostin increases the burst periods. It would be beneficial if the authors measured both closed and open dwell times to test whether α-mangostin primarily affects the burst periods.

      We thank the reviewer for this valuable suggestion, which we have implemented. In the single-channel measurements shown in our original Fig. 3, we did not observe burst behavior of the BKɑ channels. This can be explained by the fact that we measured under resting conditions (100 nM free Ca<sub>i</sub><sup>2+</sup>) and with rather mild depolarisation (+40 mV), where Po was very low. We have therefore analyzed measurements in 5 µM free Ca<sub>i</sub><sup>2+</sup>, where we recorded sufficient burst activity also in the basal state.

      The burst analysis showed that ɑ-Mangostin indeed prolongs bursts and shortens the interburst closures. Within bursts, both closed times and open times were increased, and we recorded a higher number of opening events per burst. We conclude that ɑ-Mangostin acts in both the closed and the open state, where it slows open-closed transitions resulting in less flicker, and stabilizes the open state via longer open times and a higher probability for closed-open transitions.

      We now show this data in Fig. 3D-F and Table S8, and have accordingly added passages to the results section (starting at line 285), the discussion (line 510), and the methods section (starting at line 746).

      (2) In several places, the authors make similarities in the mode of action of other BK activators and α-mangostin; however, the work of Gessner et al. PNAS 2012 indicates that NS1619 and Cym04 interact with the S6/RCK linker, and Webb et al. demonstrated that GoSlo-SR-5-6 agonist activity is abolished when residues in the S4/S5 linker and in the S6C region are mutated. These findings indicate that binding of the agonist is not near the selectivity filter, as the authors' results suggest that α-mangostin binds.

      We will gladly clarify our ideas concerning the binding sites of other activators and ɑ-Mangostin. We first hypothesized that ɑ-Mangostin may share characteristics and mode of action with the class of negatively charged activators (NCA) that we have described before (Schewe et al., 2019). NCA were found to occupy a common fenestration site that is located close to the selectivity filter in TREK K2P channels, and in this manuscript we have shown by THexA competition and mutagenesis experiments that ɑ-Mangostin also binds in this fenestration region in TREK-1 channels (Fig. S3).

      The existence of this common NCA binding site was also proposed for BK channels, as a docking placed the NCA NS11021 in an equivalent binding region, and, among others, NS11021 and GoSlo-SR-5-6 competed with THexA for binding in the pore (Schewe et al., 2019). These results were indeed not fully in agreement with the proposed binding site of GoSlo-SR-5-6 in Webb et al. (2015), although the most effective (double) mutants were located at S317 and I323, at the intracellular end of the cleft between neighboring S6 segments. In this manuscript, we have shown that α-Mangostin is present in the pore of BK channels by molecular docking, a THexA competition assay, and two mutations that reduced the shift in V<sub>½</sub> induced not only by ɑ-Mangostin but also by GoSlo-SR-5-6 (Fig. 4). While the docking was rather a starting point, both functional tests argue against a binding site in the S4/5 linker/S6C region; however, allosteric mechanisms could still reduce activation also in mutants in the S4/5 linker/S6C region far from the pore binding region proposed by us in the 2019 study and the present manuscript.

      To summarize, we did not mean to imply that all BK activators should bind to this site, especially if they are not part of the NCA class (such as NS1619, Cym04, and BC5, whose different binding site enabled us to use it as a control in our THexA competition assay). However, the cleft close to gating-relevant S6 residues may well be a region especially susceptible to modulator binding (as for BL-1249, GoSlo-SR-5-6, and ɑ-Mangostin). We have moved or separated the initial GoSlo references from the reference to the pore binding site in the paragraph (lines 329, 358) to improve clarity.

      (3) The sentence starting in line 452 states that there is a pronounced allosteric coupling between the voltage sensors and Ca2+ binding. If the authors are referring to the coupling factor E in the Horrigan-Aldrich gating model, the references cited, in particular, Sun and Horrigan, concluded that the coupling between those sensors is weak.

      We are grateful for the opportunity to improve this passage. We intended to express that the observed effects (in this case the shift in V<sub>½</sub>) are pronounced around 1 µM Ca<sup>2+</sup>. As the reviewer states, the coupling factor between the voltage and calcium sensors (E; 2.4) is weak compared to the coupling of Ca<sup>2+</sup> (C; 8) and voltage (D; 25) to the pore in the Horrigan-Aldrich model. However, the shape of the Ca<sup>2+</sup>-dependence of V<sub>½</sub> cannot be completely described when E is neglected, with the largest difference around 1-2 µM Ca<sup>2+</sup> (Horrigan and Aldrich, 2002). Deletion of the gating ring underlines the allosteric sensor coupling (Clay, 2017). This, together with the steep Ca<sup>2+</sup>-dependence in this concentration range (meaning large Po changes upon occupancy increase; Cui, Cox and Aldrich, 1997), explains the higher apparent activation, visible as the larger shift in V<sub>½</sub> observed at 1 µM Ca<sup>2+</sup>. In terms of the model of Sun and Horrigan (2022), the suppressing “molecular logic gate” is already relieved by the presence of intermediate Ca<sup>2+</sup>, and the direct “gating lever” pathway via voltage acts synergistically and achieves the observed higher V<sub>½</sub> shift upon depolarization. We have adapted the sentence and separated the citations for better understanding (lines 503-507).
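
      The role of the weak sensor-sensor coupling E can be illustrated with a minimal sketch of the Horrigan-Aldrich open probability, evaluated with E = 2.4 and with E = 1 (coupling neglected). Only C = 8, D = 25, and E = 2.4 are taken from the passage above; the remaining parameters (`L0`, `zL`, `vh_j`, `zJ`, `kd_um`), the thermal voltage, and all function names are illustrative assumptions, not values fitted to the data of this study.

```python
import math

KT_MV = 25.4  # assumed thermal voltage kT/e in mV (near room temperature)


def open_probability(v_mv, ca_um, C=8.0, D=25.0, E=2.4,
                     L0=9.8e-7, zL=0.3, vh_j=150.0, zJ=0.58, kd_um=11.0):
    """Horrigan-Aldrich Po for a tetrameric channel.

    C, D, E are the coupling factors quoted in the text; every other
    default is an illustrative assumption.
    """
    L = L0 * math.exp(zL * v_mv / KT_MV)        # closed-open equilibrium
    J = math.exp(zJ * (v_mv - vh_j) / KT_MV)    # voltage-sensor equilibrium
    K = ca_um / kd_um                           # Ca2+ occupancy term
    open_weight = L * (1 + K * C + J * D + J * K * C * D * E) ** 4
    closed_weight = (1 + J + K + J * K * E) ** 4
    return open_weight / (open_weight + closed_weight)


def half_activation_voltage(ca_um, lo=-400.0, hi=600.0, **params):
    """Voltage (mV) at which Po = 0.5, found by bisection.

    Relies on Po rising monotonically with voltage inside [lo, hi].
    """
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if open_probability(mid, ca_um, **params) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for ca in (0.1, 1.0, 2.0, 10.0, 100.0):
        with_e = half_activation_voltage(ca)
        without_e = half_activation_voltage(ca, E=1.0)
        print(f"{ca:6.1f} uM Ca: V1/2 = {with_e:7.1f} mV (E=2.4), "
              f"{without_e:7.1f} mV (E=1)")
```

      Comparing `half_activation_voltage(ca)` with and without `E=1.0` across a range of Ca<sup>2+</sup> concentrations shows where neglecting E changes the predicted V<sub>½</sub>; the size of that difference depends on the assumed parameters.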

      References:

      Clay, J.R. (2017) “Novel description of the large conductance Ca2+-modulated K+ channel current, BK, during an action potential from suprachiasmatic nucleus neurons,” Physiological Reports, 5(20), p. e13473. Available at: https://doi.org/10.14814/phy2.13473.

      Cui, J., Cox, D.H. and Aldrich, R.W. (1997) “Intrinsic Voltage Dependence and Ca2+ Regulation of mslo Large Conductance Ca-activated K+ Channels,” Journal of General Physiology, 109(5), pp. 647–673. Available at: https://doi.org/10.1085/jgp.109.5.647.

      Horrigan, F.T. and Aldrich, R.W. (2002) “Coupling between voltage sensor activation, Ca2+ binding and channel opening in large conductance (BK) potassium channels,” The Journal of General Physiology, 120(3), pp. 267–305. Available at: https://doi.org/10.1085/jgp.20028605.

      Kallure, G.S. et al. (2023) “High-resolution structures illuminate key principles underlying voltage and LRRC26 regulation of Slo1 channels.” bioRxiv, p. 2023.12.20.572542. Available at: https://doi.org/10.1101/2023.12.20.572542.

      Nordquist, E.B., Jia, Z. and Chen, J. (2024) “Small Molecule NS11021 Promotes BK Channel Activation by Increasing Inner Pore Hydration,” Journal of Chemical Information and Modeling, 64, pp. 7616–7625. Available at: https://doi.org/10.1021/acs.jcim.4c01012.

      Redhardt, M., Raunser, S. and Raisch, T. (2024) “Cryo-EM structure of the Slo1 potassium channel with the auxiliary γ1 subunit suggests a mechanism for depolarization-independent activation,” FEBS Letters, 598(8), pp. 875–888. Available at: https://doi.org/10.1002/1873-3468.14863.

      Schewe, M. et al. (2019) “A pharmacological master key mechanism that unlocks the selectivity filter gate in K+ channels,” Science, 363(6429), pp. 875–880. Available at: https://doi.org/10.1126/science.aav0569.

      Sun, L. and Horrigan, F.T. (2022) “A gating lever and molecular logic gate that couple voltage and calcium sensor activation to opening in BK potassium channels,” Science Advances, 8(50), p. eabq5772. Available at: https://doi.org/10.1126/sciadv.abq5772.

      Tao, X. and MacKinnon, R. (2019) “Molecular structures of the human Slo1 K+ channel in complex with β4,” eLife, 8, p. e51409. Available at: https://doi.org/10.7554/eLife.51409.

      Webb, T.I. et al. (2015) “Molecular mechanisms underlying the effect of the novel BK channel opener GoSlo: Involvement of the S4/S5 linker and the S6 segment,” Proceedings of the National Academy of Sciences, 112(7), pp. 2064–2069. Available at: https://doi.org/10.1073/pnas.1400555112.

      Yamanouchi, D. et al. (2023) “Dual allosteric modulation of voltage and calcium sensitivities of the Slo1-LRRC channel complex,” Molecular Cell, 83(24), pp. 4555-4569.e4. Available at: https://doi.org/10.1016/j.molcel.2023.11.005.

      Reviewer #3 (Public review):

      Summary:

      This research shows that α-mangostin, a proposed nutraceutical with cardiovascular protective properties, could act through the activation of large conductance potassium permeable channels (BK). The authors provide convincing electrophysiological evidence that the compound binds to BK channels and induces a potent activation, increasing the magnitude of potassium currents. Since these channels are important modulators of the membrane potential of smooth muscle in vascular tissue, this activation leads to muscle relaxation, possibly explaining cardiovascular protective effects.

      Strengths:

      The authors present evidence based on several lines of experiments that α-mangostin is a potent activator of BK channels. The quality of the experiments and the analysis is high and represents an appropriate level of analysis. This research is timely and provides a basis to understand the physiological effects of natural compounds with proposed cardio-protective effects.

      We sincerely thank the reviewer for appraising the achievements of our study.

      Weaknesses:

      The identification of the binding site is not the strongest point of the manuscript. The authors show that the binding site is probably located in the hydrophobic cavity of the pore and show that point mutations reduce the magnitude of the negative voltage shift of activation produced by α-mangostin. However, these experiments do not demonstrate binding to these sites, and could be explained by allosteric effects on gating induced by the mutations themselves.

      We are aware that our functional data are not sufficient to clearly distinguish between effects due to affinity loss and those due to allosteric mechanisms. Our attempts to generate complete dose–response curves for the mutants to determine accurate apparent IC<sub>50</sub> values were unfortunately limited by the solubility of the compound. Consequently, we have avoided making claims about affinity loss in the mutant analysis, and have instead only reported the reduction in potency, expressed as the shift in V<sub>½</sub>. To reduce confounding effects from the mutations themselves, we selected substitutions that preserved the most wildtype-like GV-relationships, based on the extensive mutagenesis work of Chen, Yan and Aldrich (2014). We address this matter also in our answer to Recommendation (6) below, and we have replaced the word “binding” in the title of the manuscript. Nevertheless, we consider the proposed binding region to be well supported by the THexA competition experiments in combination with molecular docking, even though the specific mechanistic contributions of individual residues cannot yet be resolved.

      Reviewer #3 (Recommendations for the authors):

      (1) Natural xanthones as α-Mangostin induce vasorelaxation via binding to key gating residues in the S6 domain of BK channels.

      (2) If α-Mangostin occupies a similar binding site to quaternary ammoniums, what is the explanation for not observing a reduction in the single-channel current (fast blocking effect)? The α-Mangostin site proposed here is in a region of the channel that should occlude ion permeation. The authors should discuss possible explanations for this apparently contradictory observation.

      As the reviewer states, we indeed have not observed a reduced single channel amplitude in any measurement. The THexA competition assay showed that ɑ-Mangostin is present in the pore cavity and interferes with THexA access to its binding site. However, we do not think that their binding sites are similar, as QA ions bind directly below the filter entrance to block permeation, while our studies suggest that ɑ-Mangostin binds in the upper portion of the cleft between S6 helices. In this position, it would clearly overlap with the QA binding site and hinder access, but not block permeation. We would therefore not expect to see an amplitude reduction by intermittent α-Mangostin block. Consistently, all binding poses in our dockings were close to the cavity wall, without interfering with the central ion conduction pathway. To better illustrate this, we have added updated intracellular views of the dockings in the Ca<sup>2+</sup>-free and Ca<sup>2+</sup>-bound state (which we have also now included as suggested by another reviewer) to the supplementary information (Fig. S4A).

      (3) In Figure 2D, it is difficult to appreciate the differences between the symbols representing the G-V relationships of BKα channels at different intracellular Ca concentrations, before and after activation with 10 μM α-Mangostin. A clearer distinction between the curves would help to interpret the data more easily.

      We thank the reviewer for the suggestion to improve figure accessibility. We have changed the line appearance for better discrimination of the overlying portions.

      (4) Both THexA and TPA block BK channels through voltage and state-dependent mechanisms. Therefore, their apparent affinity could change if α-Mangostin simply increases open probability or alters dwell times rather than physically blocking access to the binding site.

      The reviewer addresses valid limitations that can affect the meaningfulness of competition experiments under certain conditions. However, we think that this does not apply to our results:

      Previous studies have shown that the voltage dependence of quaternary ammonium blockers up to C<sub>10</sub> is rather weak in BK channels, and only a slight increase in block is present in the voltage range +30 mV to +100 mV (Li and Aldrich, 2004; Thompson and Begenisich, 2012). Hence, THexA voltage dependence has already reached a plateau in the competition assay (at +40 mV), and its voltage dependence would have little effect on our results.

      Controversy exists about the nature of the state dependence of different quaternary ammonium blockers, but TBA is often recognized as an open channel blocker of BK channels, which probably also applies to THexA (Wilkens and Aldrich, 2006; Tang, Zeng and Lingle, 2009; Thompson and Begenisich, 2012; Posson, McCoy and Nimigean, 2013). Assuming such an open-channel block, apparent IC<sub>50</sub> values would be inversely proportional to Po. The THexA IC<sub>50</sub> was about 80 nM in the basal state, when Po is very low (0.024 at +40 mV as derived from the GV-relationship); an increase of open dwell times, and hence Po, in the presence of α-Mangostin to, e.g., 0.3 would therefore lead to a ≈10-fold decrease in apparent IC<sub>50</sub>. However, the apparent THexA IC<sub>50</sub> strongly increased rather than decreased (more than 20-fold to around 1.6 µM). This cannot arise from a change in Po and must reflect the altered access of THexA to its binding site caused by α-Mangostin. Assuming a pure closed channel block where apparent IC<sub>50</sub> would correlate with the closed times, an increase of about 1.4-fold is expected. However, we recorded a much stronger 20-fold increase. Therefore, we are convinced that we have conclusively shown that α-Mangostin is present in the BK pore irrespective of the state dependence of THexA block.
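
      The arithmetic behind this argument can be checked with a short sketch. It assumes the two idealized limits named above: for pure open-channel block the apparent IC<sub>50</sub> scales with 1/Po, and for pure closed-channel block with 1/(1 − Po). The Po values (0.024 and 0.3) and IC<sub>50</sub> values (80 nM and 1.6 µM) are those quoted in the text; the function and variable names are hypothetical.

```python
def ic50_ratio_open_block(po_before, po_after):
    """Apparent IC50 ratio (after/before) if the blocker binds only open channels.

    Assumes apparent IC50 is proportional to 1/Po.
    """
    return po_before / po_after


def ic50_ratio_closed_block(po_before, po_after):
    """Apparent IC50 ratio (after/before) if the blocker binds only closed channels.

    Assumes apparent IC50 is proportional to 1/(1 - Po).
    """
    return (1.0 - po_before) / (1.0 - po_after)


# Values quoted in the response text
po_basal = 0.024           # Po at +40 mV before activation
po_activated = 0.3         # example Po in the presence of the activator
ic50_basal_nm = 80.0       # THexA IC50 in the basal state
ic50_activated_nm = 1600.0 # THexA IC50 with the activator (about 1.6 uM)

open_pred = ic50_ratio_open_block(po_basal, po_activated)
closed_pred = ic50_ratio_closed_block(po_basal, po_activated)
observed = ic50_activated_nm / ic50_basal_nm

print(f"open-channel block predicts IC50 ratio   {open_pred:.3f}")
print(f"closed-channel block predicts IC50 ratio {closed_pred:.3f}")
print(f"observed IC50 ratio                      {observed:.1f}")
```

      With these inputs, the open-channel limit predicts a ratio of 0.08 (a decrease of about one order of magnitude), the closed-channel limit a ratio of about 1.39, and the observed ratio is 20, so neither limit accounts for the measured increase.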

      (5) The pH dependence of the V1/2 shift supports the idea that α-Mangostin becomes more negatively charged at higher pH (enhancing its effect.) However, although the data are consistent with this interpretation, additional controls such as using a non-ionizable analog or assessing solubility changes with pH would be needed to confirm that the shift is caused specifically by ionization of α-Mangostin and not by indirect pH effects on channel gating.

      We agree with the reviewer that the pH experiment by itself is not sufficient to clearly tie the existence of a charge to a possible activation mechanism. We still think that this is an interesting observation and should be made known, as we have investigated the mechanism of negatively charged activators in different K<sup>+</sup> channel families before (Schewe et al., 2019). Unfortunately, we do not have access to uncharged derivatives mimicking the 3D conformation. Of the commercially available substances, the bare xanthone backbone is completely insoluble in water. We have therefore tested the derivative 3-hydroxyxanthone as an example with a minimal number of hydroxyl substituents (Author response image 2, Author response table 2). The 3-hydroxyxanthone indeed shows reduced activation compared to α-Mangostin. The shift in V<sub>½</sub> induced by 10 µM 3-hydroxyxanthone was only 14.99 ± 5.67 mV (≈50 mV for α-Mangostin). This supports that the presence of several (potentially) charged substituents is important for the activation mechanism. However, we have no knowledge about the efficacy of the compound or the local pK<sub>a</sub> of the different hydroxyl groups. As the reviewer stated, systematic chemical modifications would be necessary to elucidate the importance of the number and positions of charged substituents, which is not within our capabilities.

      Author response image 2.

      Activation of BKα by 3-hydroxyxanthone. (A) GV-relationship before and after application of 10 µM 3-hydroxyxanthone. (B) V<sub>½</sub> before and after application of 10 µM 3-hydroxyxanthone compared to α-Mangostin and the resulting difference in V<sub>½</sub> (ΔV<sub>½</sub>). Measurements were conducted as described in the main manuscript with 100 nM free Ca<sub>i</sub><sup>2+</sup>.

      Author response table 2.

      Comparison of the V<sub>½</sub> ± SEM and ΔV<sub>½</sub> ± SEM before and after activation by 10 µM α-Mangostin or 10 µM 3-hydroxyxanthone in BKα channels. Unpaired t-test, two-tailed P values (α=0.05)

      (6) The reduced V1/2 shifts observed in the I308A, L312M, and A316P mutants may result from intrinsic gating alterations rather than a true loss of α-Mangostin binding. The GoSlo-SR-5-6 control is informative, but the persistence of activation in A316P does not fully resolve this. A more convincing test would be employing double or triple mutants.

      As stated above, we acknowledge that our functional data do not allow us to definitively separate effects arising from a true loss of binding affinity from those due to potential allosteric effects. We tried to minimize intrinsic gating alterations introduced by the substitutions by not conducting a pure alanine or cysteine scanning mutagenesis. Instead, substitutions were chosen to be closest to the wildtype GV-relationship in Chen, Yan and Aldrich (2014) where possible. While L312M was virtually identical to the wildtype, A316P showed a change in slope in high Ca<sup>2+</sup> concentrations, which could indicate a changed voltage sensitivity. Additionally, A316P completely abolished α-mangostin activation. We therefore also used A316G to ensure that the channel is functional and retains voltage sensitivity, even if its V<sub>½</sub> was shifted more strongly. As we have conducted paired measurements and assessed the V<sub>½</sub> before and after activation, we are confident that we can attribute a reduced shift to the reduced action of α-mangostin.

      Following the reviewer’s suggestion, we have generated and measured the double mutants I308A/L312M, I308A/A316G, and L312M/A316G (the triple mutant I308A/L312M/A316G did not produce measurable currents). The mutants I308A/L312M and I308A/A316G showed a moderate energy-additive effect and reduced the shift in V<sub>½</sub> by further ≈7 mV compared to the single mutation with the stronger shift. The combination L312M/A316G, however, did not further reduce the shift seen in the single mutations and did not even produce the shift induced by A316G alone.

      Author response image 3.

      Double Mutants I308A/L312M, I308A/A316G and L312M/A316G compared to the single mutations in the main manuscript. The V½ before and after activation with 10 µM α-Mangostin, the resulting shift in V½, and the GV-relationships are shown (n=6-7), measurements were made as in Fig. 4.

      Author response table 3.

      Summary of the V<sub>½</sub> before and after Mangostin activation and the resulting shifts in V<sub>½</sub> for the double mutants compared to the single mutants shown in the main manuscript.

      Following a suggestion by another reviewer, we have generated AlphaFold3 (AF3) models for I308A, L312M and A316P and repeated the Mangostin docking. We learned that the mutations are all predicted to substantially impact the structure of the S6 helix, therefore altering the binding region, and A316P especially impacted the nature of residue interactions. This could explain why the double mutants do not show a clear and consistent additive effect.

      Unfortunately, this outcome is not conclusive and the double mutants do not reveal further information compared to the single mutants. We have therefore decided not to include these measurements in the manuscript.

      As we do not know if our answers will be sent to all reviewers, we repeat the relevant part about the AF3 models here:

      (…) According to these predictive models,

      The I308A substitution considerably straightens the S6 helix starting at this residue. Hence, all residues are displaced relative to the WT: C<sub>α</sub> of L312, F315, and A316 are displaced by 2.8 Å, 4.2 Å, and 4.6 Å, respectively, widening the bottom of the binding pocket. However, the prediction confidence is rated lower than in the other AF3 models for all helices (70 > pLDDT > 50). In the docking, poses in the binding pocket comparable to those observed in the WT (i.e. involving I308A, L312 and A316) and with the same molecule orientation have higher binding energies (-7.13 to -6.66 kcal mol<sup>-1</sup>). Additionally, poses without contact to I308A arise that have a more vertical position, indicating that the structural change affects the binding region.

      The changes induced by L312M are localized to residues 313-323, where S6 bends towards S5. Binding energies are lower especially in the best 2 poses that are also most comparable to the WT docking (-9.88 kcal mol<sup>-1</sup>), but clustering overall is poor and poses are more heterogeneous. Interactions with L312M are completely abolished, while interactions with I308 (in 11/20 poses), F315 (in all poses), and A316 (in 5/20 poses) persist. Because of the rather small structural alteration induced by the substitution and the variable poses one could speculate that the reduced V<sub>½</sub> shift is due to the observed loss in binding to L312M; however, retained interactions to the other residues would still allow α-Mangostin to activate.

      A316P induces a displacement of the S6 helix compared to the WT while the other pore helices are not affected. S6 shows an enhanced outward bending around A316, which results in displacements of residues where α-Mangostin would bind, i.e., the C<sub>α</sub> of F315 and L312M are displaced by 2.4 Å and 2.8 Å (I308 is not affected). Residues below are moved in a more rotational way, resulting in a C<sub>α</sub> displacement of 3.1 Å for Y318 and even 5.7 Å for V319, before displacements decrease again towards the intracellular helix end. While interactions with A316P are present in 10/20 analyzed poses, the helix displacement seems to hinder I308 and L312 interactions, as the best docked α-Mangostin pose (-8.41 kcal mol<sup>-1</sup>) is predicted to only contact F315 and Y318, and overall, any I308 or L312 contacts only occurred in 3/20 and 7/20 poses (wildtype: 17/20 and 20/20 poses). This may hint at a mechanism where A316P probably has a substantial allosteric share in reducing the V<sub>½</sub> shift induced by α-Mangostin and underlines the exceptional effect of this mutation (i.e., complete loss of a V<sub>½</sub> shift). (…)

      (7) The subtraction approach used to isolate BK currents (difference before and after α-Mangostin) assumes that the compound affects only BK channels. However, α-Mangostin could also modulate Cav currents directly, as reported for other polyphenolic compounds. No vehicle (DMSO) control is shown.

      We agree with the reviewer that α-Mangostin could also modulate Ca<sub>v</sub> currents; however, this would not interfere with the conclusions drawn from this nanodomain experiment. We intended to show the overall current modulation by ɑ-Mangostin in the voltage range relevant for Ca<sub>v</sub>-BK coupling, as this would be the determinant for the membrane potential mediating the vasoactive effect. In native tissue, BK and Ca<sub>v</sub> channels (among others) would likewise contribute to the net membrane conductance, with BK channels being a major contributor when activated. In fact, a concomitant inhibition of Ca<sub>v</sub> channels could act synergistically in favor of vasodilation. This could therefore be a subject for the further investigation of potential ɑ-Mangostin targets. However, the fact that iberiotoxin prevented relaxation in aortic preparations conclusively showed that BK channels are the major player in native tissue.

      We have reformulated some sentences to prevent misunderstandings that we refer to isolated BK currents instead of α-Mangostin activated currents.

      DMSO controls were conducted and did not impact BK or Ca<sub>v</sub>1.2 currents or the aortic tissue contraction. We have added representative measurements as Fig. S6 and stated the DMSO concentration in the Methods section (line 655).

      (8) Most kinetic fits were obtained at strong depolarizations (around +100 mV), which limits how well these results can be extrapolated to physiological voltages. Although the BK-Cav experiments show facilitation between -50 and +50 mV, providing plots for activation and deactivation in that range would strengthen the physiological relevance.

      We thank the reviewer for this valuable suggestion. We now additionally show that the impact of ɑ-Mangostin on activation is high at lower depolarisation, indeed underlining its physiological relevance. To address the activation time course in a more physiological voltage range, we have used our measurements of BKɑ channels in 10 µM Ca<sub>i</sub><sup>2+</sup> (where the V<sub>½</sub> shift induced by ɑ-Mangostin is equal to that at 100 nM Ca<sub>i</sub><sup>2+</sup>; Fig. 2D). The outward currents already present in the lower voltage range under these conditions allowed us to fit a monoexponential function to the traces of 0 mV to 100 mV prepulses. The τ of activation decreased from 29.6 ± 3.1 ms at 0 mV to 2.4 ± 2 ms at +100 mV. After ɑ-Mangostin activation, the time course was accelerated, with a τ of activation of 9.5 ± 4.7 ms at 0 mV to 2 ± 0.6 ms at +100 mV. This faster activation was particularly effective in the lower voltage range far from high Po, e.g., ɑ-Mangostin caused a decrease of more than half of the τ of activation at +20 mV (from 12.2 ± 0.6 ms to 4.98 ± 1.6 ms).
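
      A monoexponential activation time course of the form I(t) = I<sub>max</sub>·(1 − exp(−t/τ)) can be illustrated with a small sketch. The fitting routine used for the measurements is not specified here, so the code substitutes a simple log-linear least-squares estimate that assumes the steady-state amplitude is known and the trace is noise-free. The traces are synthetic, generated from the mean τ values quoted above for +20 mV, and all function and variable names are hypothetical.

```python
import math


def simulate_activation(tau_ms, i_max, times_ms):
    """Synthetic monoexponential activation: I(t) = i_max * (1 - exp(-t/tau))."""
    return [i_max * (1.0 - math.exp(-t / tau_ms)) for t in times_ms]


def estimate_tau(times_ms, currents, i_max):
    """Estimate tau from ln(1 - I/i_max) = -t/tau.

    Uses a least-squares line through the origin; points where the
    logarithm is undefined (current at or above i_max) are skipped.
    """
    sum_ty = 0.0
    sum_tt = 0.0
    for t, current in zip(times_ms, currents):
        remaining = 1.0 - current / i_max
        if t <= 0.0 or remaining <= 0.0:
            continue
        sum_ty += t * math.log(remaining)
        sum_tt += t * t
    slope = sum_ty / sum_tt   # equals -1/tau for an ideal trace
    return -1.0 / slope


times = [0.5 * k for k in range(1, 101)]   # 0.5 ms steps up to 50 ms
i_max = 1.0                                # normalized steady-state current

tau_control = estimate_tau(times, simulate_activation(12.2, i_max, times), i_max)
tau_activated = estimate_tau(times, simulate_activation(4.98, i_max, times), i_max)

print(f"tau control   : {tau_control:.2f} ms")
print(f"tau activated : {tau_activated:.2f} ms")
print(f"ratio         : {tau_activated / tau_control:.2f}")
```

      For noisy recordings, a nonlinear least-squares fit of amplitude, offset, and τ together would be the more robust choice; the linearized estimate is used here only because it needs no external libraries.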

      Our data consist of families of different prepulse voltages and a fixed repolarisation step (to -50 mV for 100 nM free Ca<sub>i</sub><sup>2+</sup>, and to -100 mV for 10 µM free Ca<sub>i</sub><sup>2+</sup>). Thus, we are not able to add plots for the voltage-dependence of deactivation in the same way as for activation. However, we can present the deactivation time constants of lower prepulse voltage steps that produce outward currents in symmetrical ion conditions with 10 µM free Ca<sub>i</sub><sup>2+</sup>. For -20 mV and +20 mV prepulse voltages, which better reflect physiological depolarisation, the deactivation time constant shows a 3- to 5-fold increase after ɑ-Mangostin activation.

      We now show the plot for the voltage dependence of activation in Fig. S2A and a bar graph for activation/deactivation time constants at +20 mV as Fig. S2B; data are summarized in Table S5. We hope this adds to illustrating the effect of ɑ-Mangostin under physiological conditions.

      (9) Minor: In several parts of the paper, induced shifts to negative voltages are referred to as "leftward shifts". It would be useful to be consistent and employ a more specific reference to negative or positive directions.

      We thank the reviewer for the careful reading and have harmonized the terminology.

      References

      Chen, X., Yan, J. and Aldrich, R.W. (2014) “BK channel opening involves side-chain reorientation of multiple deep-pore residues,” Proceedings of the National Academy of Sciences, 111(1), pp. E79–E88. Available at: https://doi.org/10.1073/pnas.1321697111.

      Li, W. and Aldrich, R.W. (2004) “Unique Inner Pore Properties of BK Channels Revealed by Quaternary Ammonium Block,” Journal of General Physiology, 124(1), pp. 43–57. Available at: https://doi.org/10.1085/jgp.200409067.

      Posson, D.J., McCoy, J.G. and Nimigean, C.M. (2013) “The voltage-dependent gate in MthK potassium channels is located at the selectivity filter,” Nature Structural & Molecular Biology, 20(2), pp. 159–166. Available at: https://doi.org/10.1038/nsmb.2473.

      Schewe, M. et al. (2019) “A pharmacological master key mechanism that unlocks the selectivity filter gate in K+ channels,” Science, 363(6429), pp. 875–880. Available at: https://doi.org/10.1126/science.aav0569.

      Tang, Q.-Y., Zeng, X.-H. and Lingle, C.J. (2009) “Closed-channel block of BK potassium channels by bbTBA requires partial activation,” The Journal of General Physiology, 134(5), pp. 409–436. Available at: https://doi.org/10.1085/jgp.200910251.

      Thompson, J. and Begenisich, T. (2012) “Selectivity filter gating in large-conductance Ca2+-activated K+ channels,” Journal of General Physiology, 139(3), pp. 235–244. Available at: https://doi.org/10.1085/jgp.201110748.

      Wilkens, C.M. and Aldrich, R.W. (2006) “State-independent block of BK channels by an intracellular quaternary ammonium,” The Journal of General Physiology, 128(3), pp. 347–364. Available at: https://doi.org/10.1085/jgp.200609579.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers and editors for their careful reading of our manuscript and thoughtful comments on it. We appreciate the overall positive opinion of our manuscript and the helpful comments and suggestions from the reviewers. Overall, the main points identified by reviewers were 1) further broadening of the system to a range of inputs as well as the construct types that can be generated with the system and 2) further consideration of any off-target joining or off-target effects on genes/proteins and the limits to the expandability of the kit. To address these concerns, we have added new data in Figure 6, illustrating the generation of a new construct using PCR and dsDNA fragments, added new constructs for mpeg1.1 and for CRISPR gRNA expression, and revised the text to further address concerns and limitations of the toolkit. We thank the reviewers and editors for these suggestions and feel that they have substantially improved the manuscript.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors introduce ImPaqT, a modular toolkit for zebrafish transgenesis, utilizing the Golden Gate cloning approach with the rare-cutting enzyme PaqCI. The toolkit is designed to streamline the construction of transgenes with broad applications, particularly for immunological studies. By providing a versatile platform, the study aims to address limitations in generating plasmids for zebrafish transgenesis.

      Strengths:

      The ImPaqT toolkit offers a modular method for constructing transgenes tailored to specific research needs. By employing Golden Gate cloning, the system simplifies the assembly process, allowing seamless integration of multiple genetic elements while maintaining scalability for complex designs. The toolkit's utility is evident from its inclusion of a diverse range of promoters, genetic tools, and fluorescent markers, which cater to both immunological and general zebrafish research needs. Furthermore, the modular design ensures expandability, enabling researchers to customize constructs for diverse experimental designs. The validation provided in the manuscript is solid, demonstrating the successful generation of several functional transgenic lines. These examples highlight the toolkit's efficacy, particularly for immune-focused applications.

      We appreciate the overall positive evaluation of our toolkit and the time and effort in evaluating it.

      Weaknesses:

      While the toolkit's technical capabilities are well-demonstrated, there are several areas where additional validation and examples could enhance its impact. One limitation is the lack of data showing whether the toolkit can be directly used for rapid cloning and testing of enhancers or promoters, particularly cloning them directly from PCR using PaqCI overhangs without needing an entry vector. Similarly, the feasibility of cloning genes directly from PCR products into the system is not demonstrated, which would significantly increase the utility for researchers working with genomic elements.

      This is an excellent point. Given the increased use of gene synthesis and dsDNA fragments, we also thought it was good to demonstrate incorporation of these. We have added a new figure, Figure 6, which demonstrates the generation of two new transgene constructs by direct cloning of three PCR products along with a synthetic dsDNA fragment into a Tol2-flanked backbone plasmid as an alternative, rapid approach to transgene generation. The resulting plasmids, encoding the mpeg1.1 promoter, a separate p2a, and a tdTomato fluorescent protein along with either wildtype or dominant negative rac2, were properly assembled. In transient transgenic zebrafish injected with these constructs, dominant negative rac2 prevented macrophage recruitment to tail wounds, indicating that this approach worked for the generation of functional transgenes. These results are discussed in new text (lines 304-391) describing this new experiment and the finding that both PCR products and synthesized dsDNA could be efficiently incorporated into constructs generated with our approach, as well as in the discussion (lines 494-499).

      The authors discuss potential applications such as using the toolkit for tissue-specific knockout applications by assembling CRISPR/Cas9 gRNA constructs. However, they do not demonstrate the cloning of short fragments, such as gRNA sequences downstream of a U6 promoter, which would be an important proof-of-concept to validate these applications. Furthermore, while the manuscript focuses on macrophage-specific promoters, the widely used mpeg1.1 promoter is not included or tested, which limits the toolkit's appeal for researchers studying macrophages and microglia.

      Yes, in the new figure described above, we have now shown that this method works with shorter PCR fragments such as the p2a fragment cloned within the tdTomato-p2a-rac2 constructs described above. This fragment is ~70 bp and while this is somewhat longer than a simple gRNA targeting sequence (though smaller than a complete sgRNA), we believe that this indicates that smaller fragments can still be incorporated within these constructs. We also agree with the general idea of increasing functionality to incorporate CRISPR/Cas9 and now include a 3E encoding the zebrafish U6 promoter. As CRISPR expression constructs frequently involve complex designs, for instance, expression of tagged Cas9 along with the U6-driven gRNA as in Zhou et al., 2018 or along with rescue constructs as in Wang et al., 2021, we have given these constructs the non-standard 5’ end O3c to enable multiplexing in these complex constructs.

      We agree that it is important to include mpeg1.1, given the broad use of this promoter within the field. We’ve now included a 5E mpeg1.1 construct within the toolkit.

      Another potential limitation is the handling of sequences containing PaqCI recognition sites. Although the authors discuss domestication to remove these sites, a demonstration of cloning strategies for such cases or alternative methods to address these challenges would provide practical guidance for users.

      Absolutely, we have now included a new figure (Supplementary Figure 6) that illustrates one domestication approach using PCR and homology-based cloning as an easy approach to domestication. In addition, we have also mentioned alternative approaches for domestication in the discussion (lines 439-444).

      Reviewer #2 (Public review):

      Summary:

      Hurst et al. developed a new Tol2-based transgenesis system ImPaqT, an Immunological toolkit for PaqCI-based Golden Gate Assembly of Tol2 Transgenes, to facilitate the production of transgenic zebrafish lines. This Golden Gate assembly-based approach relies on only a short 4-base pair overhang sequence in their final construct, and the insertion construct and backbone vector can be assembled in a single-tube reaction using PaqCI and ligase. This approach is also expandable by introducing new overhang sequences while maintaining compatibility with existing ImPaqT constructs, allowing users to add fragments as needed.

      Strengths:

      The generation of several lines of transgenic zebrafish for the immunologic study demonstrates the feasibility of the ImPaqT in vivo. The lineage tracing of macrophages by LPS injection shows this approach's functionality, validating its usage in vivo.

      We appreciate the positive sentiments for our toolkit and the effort put into reviewing our manuscript.

      Weaknesses:

      (1) There is no quantitative data analysis showing the percentage of off-target based on these 4bp overhang sequences.

      While we agree that this is an important variable for the method, we feel that previous studies that have broadly tested off-target effects of all potential 4 bp overhang sequences have already given an effective overview of interactions between each of these overhangs (Potapov et al., 2018; Pryor et al., 2020). The results from these studies were incorporated into the NEB ligase fidelity viewer that we used to predict the overhangs that would have minimal off-target ligation with each other; the tool also reports the expected off-target ligation of individual 4 bp overhangs. In all cases, we selected overhangs that would have minimal off-target efficiency, with each of the overhangs showing 1% or less off-target ligation with any of the other overhangs chosen. We have added new text, lines 119-124, that further clarifies our selection of these ends.

      (2) There is no statement for the upper limitation of the expandability.

      Yes, we’ve been curious as well. While our cloning of 6 distinct fragments in Figure 5 and the new 5-fragment cloning added in revision (Figure 6) suggest that 5-6 fragments can be readily assembled, in the course of revisions we also attempted to generate a larger product of 11 fragments, which ultimately failed. It is unclear whether this failure was due to the constructs chosen, the potential size of the plasmid, or a failure of the technique/enzymes themselves. Given that published descriptions of PaqCI Golden Gate cloning approaches have found that PaqCI can assemble at least 32 fragments and can produce large sequences (e.g. in Sikkema et al., 2023, where they assemble the ~40 kbp T7 genome from 12, 24 and 32 distinct fragments using a PaqCI Golden Gate reaction), we suspect that our issues with the 11-fragment assembly are likely due to complications with the specific group of constructs that were combined. However, we have not been able to exhaustively test a range of constructs and assemblies of varying complexity. To recognize this, we have added additional text (lines 490-493) to the discussion describing that we have only combined 6 constructs, but that we think that this likely encompasses many of the applications that may be needed for this system, while recognizing that expansion beyond this number may be possible.

      (3) There is no data about any potential side effect on their endogenous function of promoter/protein of interest with the ImPaqT method.

      Absolutely, we have added new text (lines 457-470) to our discussion describing the potential side effects on protein function. For instance, the need to be aware of whether N- or C-termini of proteins can be modified and recognition of the potential for affecting/creating ectopic transcription factor binding sites as potential pitfalls to keep in mind.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The data presented in the manuscript is robust and well-supported. However, to fully demonstrate the broad applicability of the toolkit and strengthen its impact, a few additional experiments could be beneficial. Specific suggestions for these experiments and areas of improvement are outlined in the 'Weaknesses' section of the Public Review. Additionally, Figures 2-4 illustrate the same concept - cloning three fragments from entry vectors - which comes across as repetitive. Incorporating a more diverse range of use cases would better highlight the versatility of the toolkit.

      As we described in our replies to your public points above, we have now added a new Figure 6 and a new Supplementary Figure 6 addressing the cloning of PCR fragments and short fragments as well as a mechanism of domestication. We have also included the mpeg1.1 promoter within the toolkit. In addition, your point on the repetition of assays is fair; in our new Figure 6, we instead used expression of wild-type and dominant-negative Rac2, with failure of macrophage recruitment to the tail wound as the readout.

      Reviewer #2 (Recommendations for the authors):

      Hurst et al. developed a new Tol2-based transgenesis system ImPaqT, it is interesting and potentially efficient, but I have a few concerns:

      (1) The author claimed that the ImPaqT system is more efficient than other existing systems. The authors should provide such data to support their claim.

      Our argument wouldn’t be that the ImPaqT system is strictly speaking more efficient, but rather that the combination of minimal added sequence, the ability to expand or contract the fragments used, and, in our new Figure 6, the ability to directly utilize PCR products and dsDNA fragments, while retaining the ability to combinatorially build constructs from a suite of existing sequences is the main point of the method. We now explicitly state that Golden Gate cloning isn’t more efficient than existing techniques in the text (lines 534-537), but rather the particular strength of the method is the flexibility and minimal added sequence.

      (2) The ImPaqT is theoretically less prone to have off-target effects than existing systems, the authors should provide such data to validate their claim.

      Good point, we have now searched the zebrafish genome for PaqCI sites as well as for BsaI and BsmBI which are the 6-base cutters most commonly used for Golden Gate cloning. We found that PaqCI cuts every ~17 kb in the zebrafish genome while BsaI and BsmBI cut every ~9 kb or ~13 kb respectively, further supporting that PaqCI sites are rarer in the genome and should generally require domestication less often. We have now added new text describing this in lines 129-132.
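
      For readers who want to repeat this kind of comparison on their own sequences, a minimal sketch of a site-density estimate follows. It is illustrative only and is not necessarily the procedure used to obtain the numbers above. The recognition sequences (PaqCI: CACCTGC; BsaI: GGTCTC; BsmBI: CGTCTC) are the standard ones for these enzymes rather than values stated in this response, and the function names are hypothetical. Both strands are scanned because these sites are non-palindromic.

```python
# Illustrative sketch only; function names are hypothetical.
# Recognition sequences are the standard ones for these Type IIS enzymes.
RECOGNITION_SITES = {
    "PaqCI": "CACCTGC",  # 7-bp recognition sequence
    "BsaI": "GGTCTC",    # 6-bp recognition sequence
    "BsmBI": "CGTCTC",   # 6-bp recognition sequence
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_occurrences(seq, motif):
    """Count occurrences of motif in seq, allowing overlaps."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def count_sites(seq, motif):
    """Count recognition sites on both strands of seq."""
    seq = seq.upper()
    rc = reverse_complement(motif)
    total = count_occurrences(seq, motif)
    if rc != motif:  # avoid double counting palindromic motifs
        total += count_occurrences(seq, rc)
    return total


def mean_spacing(seq, motif):
    """Average distance between sites: sequence length / number of sites."""
    n = count_sites(seq, motif)
    return len(seq) / n if n else float("inf")


if __name__ == "__main__":
    # Toy sequence with one PaqCI site on each strand and one BsaI site.
    toy = "AAACACCTGCAAAGCAGGTGAAAGGTCTCAAA"
    for enzyme, motif in RECOGNITION_SITES.items():
        print(enzyme, count_sites(toy, motif), mean_spacing(toy, motif))
```

      On a real genome the same functions would be applied per chromosome and the counts summed before dividing the total length by the total number of sites.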

      (3) The authors should mention any potential side effects of this system on the endogenous function of the promoter/protein of interest, at least in their discussion part.

      Yes, this should absolutely be expanded. As we said in response to your public comments above, we have now added new text describing potential pitfalls that this method may have on promoter or gene expression.

      (4) The authors are suggested to provide a balanced discussion about the expandable usage of this system beyond the immune system.

      We agree, this is also a good point that we should have emphasized more. We’ve added new text (lines 537-541) recognizing that in principle, many of the components we’ve derived should be useful in non-immune systems, but we also recognize that adapting this to new tissues will require the development of new promoters within the Golden Gate system which can be combined with these already developed tools.

      References

      Potapov, V., Ong, J.L., Kucera, R.B., Langhorst, B.W., Bilotti, K., Pryor, J.M., Cantor, E.J., Canton, B., Knight, T.F., Evans, T.C., Jr., et al. (2018). Comprehensive Profiling of Four Base Overhang Ligation Fidelity by T4 DNA Ligase and Application to DNA Assembly. ACS Synth Biol 7, 2665-2674.

      Pryor, J.M., Potapov, V., Kucera, R.B., Bilotti, K., Cantor, E.J., and Lohman, G.J.S. (2020). Enabling one-pot Golden Gate assemblies of unprecedented complexity using data-optimized assembly design. PLoS One 15, e0238592.

      Sikkema, A.P., Tabatabaei, S.K., Lee, Y.J., Lund, S., and Lohman, G.J.S. (2023). High-Complexity One-Pot Golden Gate Assembly. Curr Protoc 3, e882.

      Wang, Y., Hsu, A.Y., Walton, E.M., Park, S.J., Syahirah, R., Wang, T., Zhou, W., Ding, C., Lemke, A.P., Zhang, G., et al. (2021). A robust and flexible CRISPR/Cas9-based system for neutrophil-specific gene inactivation in zebrafish. J Cell Sci 134.

      Zhou, W., Cao, L., Jeffries, J., Zhu, X., Staiger, C.J., and Deng, Q. (2018). Neutrophil-specific knockout demonstrates a role for mitochondria in regulating neutrophil motility in zebrafish. Dis Model Mech 11.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      (1) The main weakness of this paper, in my view, is that it felt disconnected from the larger body of work on fitness and genotype-phenotype landscapes, including previous data on TFBSs in E. coli, genotype-phenotype maps of TFBSs in other systems, protein sequence landscapes (e.g., from mutational scans or combinatorially-complete libraries), and fitness landscapes of genomic mutations (e.g., combinatorially-complete landscapes of antibiotic resistance alleles). I have no doubt the authors are experts in this literature, and they probably cite most of it already given the enormous number of references. But they don't systematically introduce and summarize what was already known from all that work, and how their present study builds on it, in the Abstract and Introduction, which left me wondering for most of the paper why this project was necessary. Eventually, the authors do address most of these points, but not until the end, in the Discussion. Readers who have no familiarity with this literature might read this paper thinking that it's the first paper ever to study topography and evolutionary paths on genotype-phenotype landscapes, which is not true.

      There were two points that made this especially confusing for me. First, in order to choose which nucleotides in the binding sites to vary, the authors invoke existing data on the diversity of these sequences (position-weight matrices from RegulonDB). But since those PWMs can imply a genotype-phenotype map themselves, an obvious question I think the authors needed to have answered right away in the Introduction is why it is insufficient for their question. They only make a brief remark much later in the Results that the PWM data is just observed sequence diversity and doesn't directly reflect the regulation strength of every possible TFBS sequence. But that is too subtle in my opinion, and such a critical motivation for their study that it should be a major point in the Introduction.

      The second point where the lack of motivation in the Introduction created confusion for me was that they report enormous levels of sign epistasis in their data, to the point where these landscapes look like random uncorrelated landscapes. That was really surprising to me since it contrasts with other empirical landscape data I'm familiar with. It was only in the Discussion that I found some significant explanation of this - namely that this could be a difference between prokaryotic TFBSs, as this paper studies, and the eukaryotic TFBSs that have been the focus of much (almost all?) previous work. If that is in fact the case - that almost all previous studies have focused on eukaryotic TFBSs or other kinds of landscapes, and this is the first to do a systematic test of prokaryotic TFBSs - then that should be a clear point made in the Abstract and Introduction. (I find a comparable statement only in the very last paragraph of the Discussion.) If that's the case, then I would also find that point to be a much stronger, more specific conclusion of this paper to emphasize than the more general result of observing epistasis and contingency (as is currently emphasized in the Abstract), which has been discussed in tons of other papers. This raises all sorts of exciting questions for future studies - why do the landscapes of prokaryotic TFBSs differ so dramatically from almost all the other landscapes we've observed in biology? What does that mean for the evolutionary dynamics of these different systems?

      We thank the reviewer for this thoughtful and detailed critique. We agree that the original version of the manuscript did not sufficiently motivate the study early on, nor did it clearly position our work within the broader literature on genotype–phenotype (GP) and fitness landscapes. We also agree that two specific issues, the role of PWMs and the unexpectedly high levels of sign epistasis, were insufficiently explained early on, which could lead to confusion for readers not already familiar with this field.

      Positioning within the broader landscape literature

      In response, we have substantially revised the Abstract and Introduction to explicitly situate our work within existing empirical studies of GP and fitness landscapes, including TFBS landscapes in bacteria, eukaryotic TFBS genotype–phenotype maps, in vitro TF–DNA binding studies, deep mutational scans of proteins, and combinatorially complete fitness landscapes such as antibiotic resistance alleles (Abstract; Introduction, lines 64–85). We now make clear that our study builds directly on this extensive body of work, rather than introducing the landscape framework itself. For example, we write in the introduction:

      “Over the last decade, genotype–phenotype (GP) maps and fitness landscapes have become central tools for understanding how molecular systems evolve under mutation and selection[22–25]. Such maps and landscapes have been experimentally studied for DNA[6,8,18,19,26,27], protein[28–32] and RNA[33–35] molecules, revealing key topographical properties that shape evolutionary outcomes, including epistasis[24,36]—the non-additive effects of multiple mutations on phenotype—landscape ruggedness, reflected in the number and distribution of fitness peaks, and constraints on adaptive evolution.”

      At the same time, we clarify what remains rare in the literature: large-scale, in vivo genotype–phenotype landscapes for bacterial transcription factor binding sites that are sufficiently dense to support explicit evolutionary analyses. While numerous high-throughput studies have characterized bacterial regulatory elements, these datasets typically do not provide quantitative regulatory phenotypes across large genotype spaces, nor do they analyze evolutionary accessibility. To our knowledge, only one such in vivo TFBS landscape had previously been characterized at comparable resolution for a bacterial local regulator (TetR). Our work extends this approach to three global regulators, enabling systematic comparisons across prokaryotic systems (Abstract, Introduction, lines 64–85). For example, we write in the introduction:

      “For transcription factor binding sites, most pertinent large-scale studies are based on in vitro binding assays, such as protein-binding microarrays (PBMs), and they focus predominantly on eukaryotic transcription factors[6]. While these studies have been instrumental in characterizing transcription factor binding preferences, they typically do not measure regulatory output in a native cellular context. In contrast, comprehensive in vivo data for bacterial TFBSs remain extremely rare. To our knowledge, only two high-resolution in vivo landscapes have been previously mapped for bacterial regulators, those of the local regulators TetR[18] and LacI[27]. As a result, it remains unclear whether principles inferred from protein landscapes, eukaryotic TFBSs, or in vitro binding assays generalize to transcriptional regulation in bacteria, particularly for global regulators[11] that integrate multiple physiological signals.”

      Why PWMs are insufficient for our question.

      We agree with the reviewer that our original explanation of the role of PWMs was too cursory and should have been addressed explicitly in the Introduction. We have now revised the Introduction to clearly explain why PWMs derived from RegulonDB cannot substitute for empirical GP landscapes in our study (Introduction, lines 102–113).

      In this passage we now explain that, first, PWMs are inferred from a limited number of naturally occurring binding sites—typically on the order of hundreds of sequences—whose diversity reflects evolutionary history and genomic context rather than systematic exploration of sequence space. As a result, PWMs sample only a small and biased subset of the possible TFBS variants, whereas our libraries probe tens of thousands of sequences in a controlled manner, providing substantially broader and more uniform coverage of genotype space (Introduction, lines 102–113).

      Second, PWM scores are not direct measurements of regulatory strength. Instead, they represent probabilistic or heuristic scores that are primarily used for identifying candidate binding sites in genomes. Numerous studies have shown that PWM scores often correlate weakly with in vivo binding affinity or regulatory output, where DNA shape, cooperative interactions, and chromosomal context play important roles. As such, PWMs do not provide quantitative genotype–phenotype relationships for regulation strength (Introduction, lines 102–113).

      Third, PWMs assume independent and additive contributions of individual nucleotide positions. This assumption excludes epistatic interactions by construction. Because epistasis is central to landscape ruggedness, peak structure, and evolutionary accessibility, PWM-based models are fundamentally unsuited to address the evolutionary questions we study here (Introduction, lines 102–113). We now explicitly state this limitation early in the manuscript, rather than only alluding to it later in the Results.

      Sign epistasis and contrast with prior TFBS landscapes.

      We also agree with the reviewer that the extensive sign epistasis we observe—approaching levels expected for uncorrelated random landscapes—is surprising in light of much of the existing empirical landscape literature. Importantly, as the reviewer notes, most previous TFBS landscape studies have focused on in vitro binding systems or on eukaryotic transcription factors, which tend to exhibit smoother and more additive landscapes.

      To address this concern, we have revised the Abstract and Introduction to explicitly frame this contrast as a central result of the study (Abstract; Introduction, lines 151-153, Discussion, lines 652–668). For example, we write in the discussion:

      “We showed that the regulatory landscapes of all three TFs are highly rugged and have multiple peaks. The ruggedness of all three landscapes is also supported by the prevalence of epistasis between pairs of TFBS mutations (Supplementary Table S5). A particularly important form of epistasis is sign epistasis[24,93,94], because it can lead to multiple adaptive peaks [24,93,94] (see Supplementary Methods 7.5). Our landscapes contain up to 65% of mutation pairs with sign epistasis, a value that is especially high compared to the almost exclusively additive interactions of mutations in eukaryotic TFs[6,125].”

      We now emphasize that prokaryotic TFBS landscapes, particularly for global regulators, appear to be substantially more rugged and epistatic than most previously characterized TFBS landscapes, and that this difference likely reflects fundamental biological distinctions between regulatory systems.

      Revised emphasis and conclusions.

      Following the reviewer’s suggestion, we have adjusted the emphasis of the manuscript accordingly. Rather than highlighting epistasis and contingency as generic evolutionary phenomena, we now present the extreme ruggedness of prokaryotic TFBS landscapes as a system-specific finding with important implications for the evolution of gene regulation. We explicitly note that this raises new questions for future work—such as why prokaryotic regulatory landscapes differ so markedly from eukaryotic ones, and how these differences shape evolutionary dynamics—which we now highlight in the Introduction and Discussion (Abstract; Introduction, lines 151-153, Discussion, lines 652–668). For example, we write in the discussion:

      “… A possible reason for this greater incidence of epistasis lies in the nature of prokaryotic TFBSs. Specifically, prokaryotic TFBSs are, at approximately 20 bp, twice as long as eukaryotic TFBSs[80,128] and exhibit symmetries that reflect the dimeric state of their cognate TFs[129–131]. These factors may increase the likelihood of intramolecular epistasis. Our observations raise important questions for future work, such as why the landscapes of prokaryotic TFBSs differ so dramatically from those of eukaryotic ones. And what do these differences imply for the evolutionary dynamics of gene regulation?”

      We believe that these revisions substantially improve the clarity, motivation, and positioning of the manuscript, and directly address the reviewer’s concerns by making both the necessity and the novelty of the study clear from the outset.

      (2) I am a bit concerned about the lack of uncertainties incorporated into the results. The authors acknowledge several key limitations of their approach, including the discreteness of the sort-seq bins in determining possible values of regulation strength, the existence of a large number of unsampled sequences in their genotype space, as well as measurement noise in the fluorescence readouts and sequencing. While the authors acknowledge the existence of these factors, I do not see much attempt to actually incorporate the effect of these uncertainties into their conclusions, which I suspect may be important. For example, given the bin size for the fluorescence in sort-seq, how confident are they that every sequence that appears to be a peak is actually a peak? Is it possible that many of the peak sequences have regulation strengths above all their neighbors but within the uncertainty of the fluorescence, making it possible that it's not really a peak? Perhaps such issues would average out and not change the statistical nature of their results, which are not about claiming that specific sequences are peaks, just how many peaks there are. Nevertheless, I think the lack of this robustness analysis makes the results less convincing than they otherwise would be.

      We thank the reviewer for raising this important concern. We fully agree that uncertainties arising from experimental resolution, measurement noise in fluorescence and sequencing, and incomplete sampling of genotype space should be incorporated explicitly into the analysis. While these limitations were acknowledged qualitatively in the original manuscript, we recognize that a direct, quantitative assessment of their impact on our conclusions is essential to strengthen the robustness of the study.

      We first clarify that regulation strength is not discretized in our analysis. For each TFBS, regulation strength is calculated as a continuous weighted average of fluorescence across all sorting bins, based on the sequencing read-count distribution of each sequence across bins. We clarified this information in the main text (Results, lines 201-203). Nevertheless, finite binning resolution and experimental noise introduce uncertainty in these estimates, which could in principle affect the identification of local peaks.
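
      The weighted average described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact computation in the manuscript: the function name is hypothetical, each bin is assumed to be summarised by a single representative fluorescence value, and read counts are used as weights without any further normalisation.

```python
def regulation_strength(read_counts, bin_fluorescence):
    """Read-count-weighted mean fluorescence of one sequence across sorting bins.

    read_counts      -- reads of this sequence observed in each bin
    bin_fluorescence -- one representative fluorescence value per bin
                        (an assumption of this sketch)
    """
    if len(read_counts) != len(bin_fluorescence):
        raise ValueError("need one fluorescence value per bin")
    total_reads = sum(read_counts)
    if total_reads == 0:
        raise ValueError("sequence has no reads in any bin")
    weighted_sum = sum(c * f for c, f in zip(read_counts, bin_fluorescence))
    return weighted_sum / total_reads


if __name__ == "__main__":
    # Hypothetical example: four bins; most reads fall in the third bin.
    print(regulation_strength([10, 30, 60, 0], [1.0, 2.0, 3.0, 4.0]))
```

      Because the weights vary continuously with the read distribution, the resulting value is not restricted to the discrete bin values, even though the number of bins is finite.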

      Importantly, our study does not aim to assert that specific TFBS sequences are definitively peaks. Rather, our focus is on landscape-level statistical and topological properties—such as ruggedness, the abundance and distribution of peaks, and the evolutionary accessibility of strong regulation. We therefore centered our new analyses on testing whether these conclusions are robust to experimentally plausible sources of uncertainty, rather than on the identity of individual peaks.

      To address the reviewer’s concern, we performed two complementary analyses. The first evaluates whether the observed ruggedness of the landscapes could arise as an artifact of incomplete sampling. It addresses the effects of missing genotypes and the possibility of spurious peak identification due to unsampled neighbors. Sparse sampling can introduce opposing biases: true peaks may be missed, while other genotypes may be falsely classified as peaks because fitter neighbors are absent. As shown for uncorrelated random (House-of-Cards) landscapes (Kauffman & Levin, 1987), these effects can partially cancel.

      In this analysis, we constructed a null model by randomly permuting regulation strengths across the mapped genotype network while preserving its topology. The number of peaks in these randomized landscapes is only modestly higher than in the empirical data, indicating that the measured landscapes are close to the maximal ruggedness compatible with the sampled network (Results, lines 308–320).
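
      A permutation null model of this kind can be sketched as below. The sketch assumes that neighbors are single-nucleotide substitutions and that a peak is a measured genotype whose value exceeds that of every measured neighbor; the function names and the toy landscape are hypothetical and do not come from the manuscript.

```python
import random
from typing import Dict, Iterator, List

ALPHABET = "ACGT"


def neighbors(seq: str) -> Iterator[str]:
    """Yield all single-nucleotide substitution neighbors of seq."""
    for i, base in enumerate(seq):
        for b in ALPHABET:
            if b != base:
                yield seq[:i] + b + seq[i + 1:]


def count_peaks(landscape: Dict[str, float]) -> int:
    """Count genotypes that are higher than all of their *measured* neighbors.

    A genotype with no measured neighbors is counted as a peak, which is one
    way sparse sampling can inflate peak counts.
    """
    return sum(
        all(landscape[n] < value for n in neighbors(seq) if n in landscape)
        for seq, value in landscape.items()
    )


def shuffled_peak_counts(landscape: Dict[str, float],
                         n_shuffles: int = 100,
                         seed: int = 0) -> List[int]:
    """Permute values across genotypes; the sampled network stays fixed."""
    rng = random.Random(seed)
    genotypes = list(landscape)
    values = list(landscape.values())
    counts = []
    for _ in range(n_shuffles):
        rng.shuffle(values)
        counts.append(count_peaks(dict(zip(genotypes, values))))
    return counts


# Toy landscape (hypothetical values) with a single peak at "CC".
toy = {"AA": 1.0, "AC": 2.0, "CC": 3.0, "GA": 0.5}
print(count_peaks(toy), shuffled_peak_counts(toy, n_shuffles=5))
```

      Comparing the empirical peak count with the distribution returned by `shuffled_peak_counts` gives the kind of contrast described above, because both are computed on the same set of sampled genotypes.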

      In addition, we quantified potential sampling bias by analyzing genotype connectivity. Here we defined the relative connectivity of a genotype as the fraction of possible single-mutant neighbors for which we had measured regulation strength. We observed only a very weak correlation between connectivity and regulation strength (R=-0.1, -0.1, 0.01 for the CRP, Fis, and IHF landscapes, Figures S13-S15). Similarly, the relative connectivity of peak genotypes is only weakly correlated with their regulation strength (R=-0.05, -0.04, 0.06 for the CRP, Fis, and IHF landscapes). These weak correlations indicate that strongly regulating genotypes are not preferentially oversampled or undersampled (Results, lines 321–330).
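
      Relative connectivity as defined here can be computed in a few lines. The sketch below assumes a four-letter alphabet, so a site of length L has 3L possible single-mutant neighbors; the helper names and the toy set of measured genotypes are illustrative assumptions, and the Pearson coefficient is written out by hand only to keep the example self-contained.

```python
import math
from typing import Iterator, Sequence, Set

ALPHABET = "ACGT"


def neighbors(seq: str) -> Iterator[str]:
    """Yield all single-nucleotide substitution neighbors of seq."""
    for i, base in enumerate(seq):
        for b in ALPHABET:
            if b != base:
                yield seq[:i] + b + seq[i + 1:]


def relative_connectivity(seq: str, measured: Set[str]) -> float:
    """Fraction of the 3 * len(seq) possible neighbors that were measured."""
    present = sum(1 for n in neighbors(seq) if n in measured)
    return present / (3 * len(seq))


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation coefficient of two equal-length samples."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy data (hypothetical): regulation strengths of four measured genotypes.
strengths = {"AA": 1.0, "AC": 2.0, "CC": 3.0, "GA": 0.5}
measured = set(strengths)
conn = [relative_connectivity(g, measured) for g in strengths]
print(conn, pearson_r(conn, list(strengths.values())))
```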

      The second, and most important, analysis directly addresses the reviewer’s concern that experimental uncertainty could affect peak classification and, consequently, landscape navigability. We explicitly incorporated experimentally measured, genotype-specific noise estimates from biological replicates when comparing fitness values between neighboring genotypes. Using these uncertainty-aware comparisons, we then recomputed adaptive-walk dynamics and genotype visitation frequencies on the resulting noisy landscapes.
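
      One simple way to make neighbor comparisons uncertainty-aware is sketched below. The response does not specify the exact procedure, so everything here is an assumption made for illustration: each genotype is given a mean and a standard error from replicates, and a difference counts as real only if it exceeds `z` combined standard errors. Genotypes whose status depends on differences inside the noise are labeled ambiguous.

```python
import math
from typing import Dict, Iterator

ALPHABET = "ACGT"


def neighbors(seq: str) -> Iterator[str]:
    """Yield all single-nucleotide substitution neighbors of seq."""
    for i, base in enumerate(seq):
        for b in ALPHABET:
            if b != base:
                yield seq[:i] + b + seq[i + 1:]


def clearly_higher(m_a: float, se_a: float, m_b: float, se_b: float,
                   z: float = 1.96) -> bool:
    """True if a exceeds b by more than z combined standard errors."""
    return (m_a - m_b) > z * math.sqrt(se_a ** 2 + se_b ** 2)


def classify(seq: str, mean: Dict[str, float], se: Dict[str, float],
             z: float = 1.96) -> str:
    """Return 'peak', 'not_peak', or 'ambiguous' for a measured genotype."""
    nbrs = [n for n in neighbors(seq) if n in mean]
    if any(clearly_higher(mean[n], se[n], mean[seq], se[seq], z) for n in nbrs):
        return "not_peak"
    if all(clearly_higher(mean[seq], se[seq], mean[n], se[n], z) for n in nbrs):
        return "peak"
    return "ambiguous"


# Hypothetical one-position landscape: A and C differ by less than the noise.
mean = {"A": 1.0, "C": 1.05, "G": 0.2}
se = {"A": 0.1, "C": 0.1, "G": 0.05}
print({g: classify(g, mean, se) for g in mean})
```

      The fraction of ambiguous genotypes gives a rough sense of how many peak calls depend on differences that fall within measurement noise.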

      We observe strong correlations between visitation frequencies in the noise-free and noisy landscapes across all three transcription factors (new Supplementary Figure S35), indicating that evolutionary accessibility patterns are robust to realistic levels of experimental uncertainty. These analyses are described in the revised Results (lines 622–636) and in a new Supplementary Methods section (“Incorporation of experimental uncertainty into adaptive walks”).

      Reviewer #2 (Public review):

      The authors aim to investigate the ability of evolution to create strong transcription factor binding sites (TFBSs) de novo in E. coli. They focus on three global transcriptional regulators: CRP, Fis, and IHF, using a massively parallel reporter assay to evaluate the regulatory effects of over 30,000 TFBS variants. By analyzing the resulting genotype-phenotype landscapes, they explore the ruggedness, accessibility, and evolutionary dynamics of regulatory landscapes, providing insights into the evolutionary feasibility of strong gene regulation. Their experiments show that de novo adaptive evolution of new gene regulation is feasible. It is also subject to a blend of chance, historical contingency, and evolutionary biases that favor some peaks and evolutionary paths.

      (1) Strengths of the methods and results:

      The authors successfully employed a well-designed sort-seq assay combined with high-throughput sequencing to map regulatory landscapes. The experimental design ensures reliable measurement of regulation strengths. Their system accounts for gene expression noise and normalizes measurements using appropriate controls.

      Comprehensive Landscape Mapping:

      The study examines ~30,000 TFBS variants per transcription factor, providing statistically robust and thorough maps of the regulatory landscapes for CRP, Fis, and IHF. The landscapes are rigorously analyzed for ruggedness (e.g., number of peaks) and epistasis, revealing parallels with theoretical uncorrelated random landscapes.

      Evolutionary Dynamics Simulations:

      Through simulations of adaptive walks under varying population dynamics, the authors demonstrate that high peaks in regulatory landscapes are accessible despite ruggedness. They identify key evolutionary phenomena, such as contingency (multiple paths to peaks) and biases toward specific evolutionary outcomes.

      Biological Relevance and Novelty:

      The authors' work is novel in focusing on global regulators, which differ from previously studied local regulators (e.g., TetR). They provide compelling evidence that rugged landscapes are navigable, facilitating de novo evolution of regulatory interactions. The comparison of landscapes for CRP, Fis, and IHF underscores shared topographical features, suggesting general principles of global transcriptional regulation in bacteria.

      (2) Weaknesses of the methods and results:

      Undersampling of Genotype Space:

      While the quality filtering of the data ensures robustness, ~40% of the TFBS space remains uncharacterized. The authors acknowledge this limitation but could improve the analysis by employing subsampling or predictive modeling.

      We thank the reviewer for raising this point. We agree that undersampling of genotype space is an important limitation of our dataset and that, in principle, subsampling or predictive modeling approaches could be used to address missing genotypes. We have now clarified in the manuscript why these approaches are not straightforward in the context of our analyses and why we did not pursue them here.

      Although approximately 40% of TFBS genotypes were removed during the filtering step due to lack of reliable measurements, this filtering step was necessary to ensure robust estimation of regulation strength from sort-seq data. Importantly, random subsampling of the genotypes in our data set would not alleviate this limitation, because many of our key analyses—such as peak identification, quantification of epistasis, and assessment of evolutionary accessibility—require combinatorially complete local neighborhoods in genotype space. Subsampling would remove mutational neighbors from many neighborhoods, and thus further limit our ability to characterize landscape topology.

      Predictive modeling approaches could, in principle, be used to infer missing genotypes and reconstruct more complete landscapes. However, developing, experimentally validating, and benchmarking such models would not only substantially expand the scope of an already long paper, it would also require additional assumptions about genotype–phenotype relationships that entail their own limitations. Our primary goal in this work was to provide the first large-scale empirical in vivo regulatory landscapes for global bacterial transcription factors, comprising tens of thousands of experimentally measured variants. We view these empirical landscapes as a necessary foundation upon which predictive modeling and landscape completion can be built in future, complementary studies.

      We have now revised the Discussion (lines 760-770) to explicitly articulate these points and to clarify that, while undersampling remains a limitation, it does not invalidate the landscape-level conclusions we draw from the combinatorially complete neighborhoods present in our data. There we also outline predictive modeling as an important direction for future work.

      For a more detailed answer regarding subsampling and peak classification, please also see our response to comment (2) of Reviewer #1.

      Simplified Regulatory Architecture:

      The study considers a minimal system of a single TFBS upstream of a reporter gene. While this may have been necessary for clarity, this simplification may not reflect the combinatorial complexity of transcriptional regulation in vivo.

      Point well taken. We have added a paragraph to state explicitly that the system we use to study gene regulation is much simpler than most in vivo regulatory circuits (Discussion, lines 797-802).

      Lack of Experimental Validation of Simulations:

      The adaptive walks are based on simulated dynamics rather than experimental evolution. Incorporating in vivo experimental evolution studies would strengthen the conclusions. Although this is a large request for the paper, that would not prevent publication.

      We thank the reviewer for this important point. We fully agree that in vivo experimental evolution would provide a valuable and complementary way to validate the evolutionary dynamics inferred from our simulations. However, we ask for the reviewer's understanding that adding experimental evolution to an (already long) paper would go far beyond the scope of our study.

      Also, the goal of our study was not to reproduce evolutionary trajectories experimentally, but to characterize the structure of large empirical regulatory landscapes, and to use these landscapes as a data-driven basis for exploring evolutionary accessibility under well-defined population-genetic assumptions. The adaptive walks we employ are parameterized directly from experimentally measured genotype–phenotype maps, and incorporate established fixation probabilities. Such walks have been widely used to study evolutionary dynamics on empirical landscapes when experimental evolution is not tractable, because it would involve tens of thousands of genotypes that represent small mutational targets and would thus take a long time to evolve.
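
      An adaptive walk of this general kind can be sketched as follows. The sketch is an assumption-laden illustration rather than the authors' simulation: it uses Kimura's diffusion approximation for the fixation probability in a haploid population of size N, defines the selection coefficient as the relative difference in regulation strength (which requires positive values), and moves one single-nucleotide step at a time until no measured neighbor is fitter.

```python
import math
import random
from typing import Dict, Iterator

ALPHABET = "ACGT"


def neighbors(seq: str) -> Iterator[str]:
    """Yield all single-nucleotide substitution neighbors of seq."""
    for i, base in enumerate(seq):
        for b in ALPHABET:
            if b != base:
                yield seq[:i] + b + seq[i + 1:]


def fixation_probability(s: float, N: int) -> float:
    """Kimura's approximation (1 - e^{-2s}) / (1 - e^{-2Ns}), haploid case."""
    if abs(s) < 1e-12:
        return 1.0 / N  # neutral limit
    try:
        return math.expm1(-2.0 * s) / math.expm1(-2.0 * N * s)
    except OverflowError:
        return 0.0  # strongly deleterious mutation in a large population


def adaptive_walk(start: str, landscape: Dict[str, float], N: int,
                  rng: random.Random, max_steps: int = 1000) -> str:
    """Walk until no measured neighbor is fitter; return the final genotype."""
    current = start
    for _ in range(max_steps):
        nbrs = [n for n in neighbors(current) if n in landscape]
        if not any(landscape[n] > landscape[current] for n in nbrs):
            return current  # local peak among measured genotypes
        # Assumed selection coefficient: relative change in regulation strength.
        weights = [
            fixation_probability(
                (landscape[n] - landscape[current]) / landscape[current], N)
            for n in nbrs
        ]
        current = rng.choices(nbrs, weights=weights, k=1)[0]
    return current


# Toy landscape (hypothetical values) with a single peak at "CC".
toy = {"AA": 1.0, "AC": 2.0, "CC": 3.0, "GA": 0.5}
print(adaptive_walk("GA", toy, N=10**6, rng=random.Random(0)))  # CC
```

      Repeating such walks from many starting genotypes and tallying where they end is one way to estimate how often high peaks are reached.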

      An additional issue related to the feasibility of experimental evolution is that performing in vivo experimental evolution for the regulatory landscapes analyzed here would require tracking large populations across a combinatorially vast TFBS space, while simultaneously measuring regulatory phenotypes for thousands of evolving lineages, which is currently not experimentally feasible. This is another reason why simulation-based approaches have been the standard method for linking large-scale empirical landscapes to evolutionary dynamics in both theoretical and experimental studies.

      Furthermore, our conclusions are intentionally framed at the level of statistical and landscape-wide properties (e.g., accessibility of high peaks, contingency, and evolutionary bias), rather than at the level of specific mutational trajectories. As such, they do not rely on the precise reproduction of any single evolutionary path, but on aggregate patterns that are robust to reasonable variation in population-genetic parameters.

      In sum, we do not view experimental evolution as essential for the conclusions we draw, but as an important and exciting direction for future work that may be enabled by the landscapes we have experimentally mapped.

      Impact on the Field:

      This study advances our understanding of adaptive landscapes in gene regulation and offers a critical step toward deciphering how global regulators evolve de novo binding sites. The findings provide foundational insights for synthetic biology, evolutionary genetics, and systems biology by highlighting the evolutionary accessibility of strong regulation in bacteria.

      Utility of Methods and Data:

      The sort-seq approach, combined with landscape analysis, provides a robust framework that can be extended to other transcription factors and systems. If made publicly available, the study's data and code would be valuable for researchers modeling transcriptional regulation or studying evolutionary dynamics.

      Additional Context:

      The study builds on a growing body of work exploring regulatory evolution. For instance, recent studies on local regulators like TetR and AraC have revealed high ruggedness and epistasis in TFBS landscapes. This study distinguishes itself by focusing on global regulators, which are more biologically complex and influential in bacterial gene networks. The observed evolutionary contingency aligns with findings in other biological systems, such as protein evolution and RNA folding landscapes, underscoring the generality of these evolutionary principles.

      Conclusion:

      The authors successfully mapped the genotype-phenotype landscapes for three global regulators and simulated evolutionary dynamics to assess the feasibility of strong TFBS evolution. They convincingly demonstrate that ruggedness and epistasis, while prominent, do not preclude the evolution of strong regulation. Their results support the notion that gene regulation evolves through a blend of chance, contingency, and evolutionary biases.

      This paper makes a significant contribution to the understanding of regulatory evolution in bacteria. While minor limitations exist, the authors' methods are robust, and their findings are well-supported. The work will likely be of broad interest to researchers in molecular evolution, synthetic biology, and gene regulation.

      We thank the reviewer for their thorough evaluation and for their supportive opinion of this paper.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 28 (Abstract): "Landscape ruggedness does not prevent the evolution of strong regulation, because more than 10% of evolving populations can attain one of the highest peaks." I did not find this interpretation very convincing; only 10% of populations being able to achieve strong regulation sounds to me like ruggedness DOES impede adaptation in the vast majority of cases.

      We thank the reviewer for this thoughtful comment and agree that our original phrasing in the Abstract overstated this conclusion. We did not intend to imply that landscape ruggedness has only a minor effect on adaptation. On the contrary, our results clearly show that ruggedness strongly constrains evolutionary outcomes and prevents the majority of evolving populations from reaching the globally highest regulatory peaks. We have therefore toned down the wording in both the Abstract and the Discussion (lines 670-679) to reflect this more accurately. For example, in the abstract we now state

      “Nonetheless, evolutionary simulations show that ~10% of evolving populations can reach a peak of strong regulation, a proportion that is significantly greater than in comparable random landscapes.”

      In the discussion we state:

      “… Specifically, our evolutionary simulations show that 10% of populations with a size typical of E. coli reach one of the highest peaks. This percentage is significantly higher than in randomized landscapes (Supplementary Methods 9; Supplementary Figure S30)”

      Our intended interpretation was more limited: namely, that ruggedness does not fully preclude the evolution of strong regulation. In highly rugged landscapes with extensive sign epistasis—whose topological properties approach those of uncorrelated random landscapes—the a priori expectation is that access to the strongest peaks could be vanishingly rare or effectively impossible under Darwinian evolution. In this context, observing that a non-negligible fraction of populations (on the order of 10%) can reach one of the highest peaks suggests that strong regulation remains evolutionarily attainable, even though it is far from guaranteed.

      Motivated by the reviewer’s suggestion, we also added a null-model analysis that makes this point more explicitly and quantitatively. Specifically, we constructed randomized landscapes by permuting regulation-strength values across genotypes while preserving the experimentally sampled genotype network topology and all parameters of the evolutionary simulations (Supplementary Methods 9, “Randomized landscape null model for peak accessibility”). We then repeated the adaptive-walk simulations on these shuffled landscapes. This null model provides an expectation for peak accessibility in landscapes with identical sampling, neighborhood structure, and evolutionary dynamics, but without genotype–phenotype correlations.

      Using this null model, we find that the fraction of populations that reach high peaks in the empirical landscapes is substantially higher than expected by chance alone (new Supplementary Figure S30; Results, lines 504–516). Specifically, across the three transcription factors, empirical landscapes exhibit on average a ~3-fold higher accessibility of high regulatory peaks than shuffled landscapes. This comparison does not weaken the conclusion that ruggedness strongly impedes adaptation; rather, it shows that the structure of the measured genotype–phenotype landscapes enables greater accessibility of strong regulation than would be expected in equally rugged but unstructured landscapes.

      In response to the reviewer’s concern, we have revised the abstract and main text to avoid the phrase “does not prevent” and to more accurately convey this balance between constraint and accessibility. We now emphasize that ruggedness strongly constrains adaptation, while still allowing access to strong regulatory peaks at rates that exceed null expectations (Discussion, lines 512-516). For example, in the discussion we state:

      “… In sum, rugged regulatory landscapes strongly constrain evolutionary trajectories, yet do not render the evolution of strong regulation vanishingly rare. Instead, strong regulatory phenotypes remain evolutionarily attainable at levels that exceed null expectations, even though they are reached by only a minority of evolving populations.”

      We believe that the revised wording, together with the added null-model analysis, more faithfully represents our results and strengthens the quantitative interpretation of accessibility in these landscapes.

      (2) Line 123: I found the explanation of the plasmid system and the accompanying SI figures (Figures S1 and S2) confusing in terms of how many plasmids there were. In particular, the Figure S1 graphics show the plasmid specifically with CRP but the text in the graphic and in the caption refers to the plasmid pCAW-Sort-Seq-V2 (which, according to Table S1, isn't that just the base plasmid without any TF?). Figure S2 also shows the plasmid with CRP and does specify pCAW-Sort-Seq-V2-CRP-CRP0 in the graphic, but then the caption refers again only to the base plasmid pCAW-Sort-Seq-V2. I recommend the authors clarify these items for readers who might want to reproduce or build upon their system. In particular, I recommend the main text explain more explicitly that they generate three versions of this plasmid (one for each TF), and then on the backgrounds of each of those three plasmids, a whole library with all the binding site variants.

      We thank the reviewer for pointing out this lack of clarity. We agree that the original description of the plasmid system and the accompanying Supplementary Figures S1 and S2 could be confusing with respect to how many plasmids were used and how they differ.

      To clarify the experimental design, we start from a common backbone plasmid, pCAW-Sort-Seq-V2, which contains all shared regulatory and reporter elements but does not encode any transcription factor. From this backbone, we generated three distinct TF-specific plasmids, each carrying one of the transcription factors studied here—CRP, Fis, or IHF—resulting in pCAW-Sort-Seq-V2-CRP, pCAW-Sort-Seq-V2-Fis, and pCAW-Sort-Seq-V2-IHF. On the background of each TF-specific plasmid, we then constructed a complete library of plasmids containing all variants of the corresponding TF binding site cloned upstream of the reporter gene.

      We have revised the main text to explicitly describe this plasmid hierarchy and library construction strategy and to clarify that three TF-specific plasmids were generated prior to TFBS library construction (Results, Landscape mapping section; lines 159–193). In addition, we have redesigned Supplementary Figures S1 and S2 to facilitate understanding of the plasmid system. Specifically, these figures now clearly distinguish between the base plasmid backbone and the TF-specific plasmid derivatives. Also, the plasmid names shown in the graphics and captions are now consistent with those listed in Supplementary Table S1. Upon final publication, we will also deposit the sequences of all plasmids in Addgene to further facilitate reproducibility.

      (3) Line 135: Can the authors clarify whether these TFs are essential in these media conditions and, if not, why? I was expecting them to be so given the core functions of these TFs as described in the Introduction, but then Figure S3 appears to show that all knockouts are viable.

      We thank the reviewer for raising this important point and apologize for the lack of clarity in the original version of the manuscript. The transcription factors CRP, Fis, and IHF are not essential for viability under the growth conditions used in this study, but they are important for optimal growth and cellular fitness, consistent with their roles as global regulators.

      Under our experimental conditions, single-gene knockout strains (Δcrp, Δfis, and Δihf) are viable but exhibit slower growth dynamics compared to the wild-type strain, reflecting impaired regulation of core cellular processes (Supplementary Figure S3). This behavior is consistent with previous work showing that many global transcriptional regulators in E. coli are conditionally essential or strongly fitness-affecting, rather than absolutely essential under standard laboratory conditions.

      Importantly, while single knockouts remain viable, double mutants involving these global regulators are not viable, indicating substantial functional redundancy and network-level essentiality among global transcription factors. This explains why each TF can be studied individually in isolation, while combinations of deletions cannot be maintained.

      We have now clarified this point in the Results section by explicitly stating that the knockout strains show reduced growth rates but reach comparable cell densities during late exponential or early stationary phase, the growth phase at which all measurements were performed (Results, Landscape mapping section; lines 185–193). This clarification reconciles the apparent discrepancy between the biological importance of these transcription factors discussed in the Introduction and the viability of the single-knockout strains shown in Supplementary Figure S3.

      (4) Lines 141 and 227: The authors appear to refer to two different citations for different versions of RegulonDB (refs. 47 and 66). Did they actually use both versions for different purposes (if so, why?), or is this a typo?

      We thank the reviewer for noticing this inconsistency. We did not use two different versions of RegulonDB. The two separate references were an error. We have now corrected this by using a single, consistent RegulonDB citation in both locations.

      (5) Line 166 (Figure 1 caption): I think 2^8 here should be 4^8.

      Thank you. We have corrected “2^8” to “4^8” in the Figure 1 caption.

      (6) Figure 2: Are the distributions in Figure 2a (regulation strengths across all TFBSs in the libraries) equivalent to the distributions in Figures S4-S6 (direct fluorescence readout from cell sorting), just transformed from fluorescence to regulation strength? If so, I think that would be helpful to clarify, perhaps in the captions to Figures S4-S6 so that it's clear these contain the same information.

      No. Figures S4–S6 and Figure 2a do not show the same distributions. Figures S4–S6 display the raw fluorescence distributions obtained from cell sorting, whereas Figure 2a shows regulation strengths (S), which are derived quantities computed from these fluorescence data. Specifically, regulation strength is calculated as a weighted average over fluorescence bins using the sequencing read distribution for each TFBS (see Methods, “Regulation strengths”).

      To clarify this relationship, we have revised the main text (lines 201-203 and Figure 1b-c) to explicitly state how regulation strengths (S) were calculated.

      (7) Figure 2b: Can the authors label each logo/frequency matrix with its corresponding TF name in the graphic itself? I think this is only implied in the caption.

      We have updated Figure 2b to label each sequence logo / frequency matrix directly in the graphic with its corresponding transcription factor name (CRP, Fis, or IHF), in addition to mentioning these names in the caption. This change clarifies the figure and makes the TF identity immediately apparent to the reader.

      (8) Lines 290 and 298 (Figure 2 caption): The labels for panels b and c appear to be swapped in the caption.

      We thank the reviewer for pointing this out. The labels for panels b and c in the Figure 2 caption were indeed swapped. This has now been corrected.

      (9) Line 379: There is a missing period at the end of this line.

      We have added the missing period at the end of this line.

      (10) Line 400 (Figure 3 caption): There is a missing subtitle for panel c in the caption for this figure (all other panels seem to have bolded subtitles in their captions).

      We have added the missing subtitle for panel c in the Figure 3 caption to match the formatting of the other panels.

      (11) Line 583: There is a missing period after "Methods 7.5)".

      We have added the missing period after “Methods 7.5)”.

      (12) Line 641: "All three landscapes highly rugged" should probably be "All three landscapes are highly rugged".

      We have corrected the sentence to read “All three landscapes are highly rugged.”

    1. Since scav-5 and scav-6 are paralogs of scav-4, we analysed their functions in lipid accumulation using scav-5(ok1606) deletion mutants and scav-6 knockout alleles generated in this study through CRISPR/Cas9-mediated gene editing (Figure 4B). We found that when fed with JUb74, both scav-5(-) and scav-6(-) mutants had moderately reduced LD sizes, but not to the extent of scav-4(-) mutants (Figure 4E). Previous promoter reporter studies showed that scav-5 and scav-6 were expressed in the intestine.34 We constructed translational reporters for both genes and found weak or no signals for SCAV-5::TagRFP possibly due to low protein levels. The SCAV-6::TagRFP fusion protein was expressed in the intestine and was localized to the apical membrane (Figure 4C). From the fluorescent intensity, the scav-6 expression appeared to be weaker than the scav-4 expression. Moreover, scav-4(-) scav-6(-) double mutants had the same LD diameter as scav-4(-) single mutants (Figure 4F). The above results suggested that SCAV-4 may play a more significant role than the other two paralogs in intestinal lipid uptake.

      I'm surprised that the scav-5 and scav-6 paralogs were both able to reduce the large LD phenotype to the same extent as scav-4 (there doesn't appear to be a significant difference between the mutants). To me this suggests either they each contribute a third of the BCFA uptake, or that they operate together to internalize BCFAs. The scav-4;scav-6 double mutant suggests the first idea isn't correct, as you don't see a stronger effect there. Do you think it's possible these transporters are working as a complex? I would be interested to see if you can rescue each of these mutants with scav-4 expression, or if rescue requires all receptors to be present.

    1. Author response:

      The following is the authors’ response to the previous reviews

      eLife Assessment

      This study offers valuable insights into how humans detect and adapt to regime shifts, highlighting dissociable contributions of the frontoparietal network and ventromedial prefrontal cortex to sensitivity to signal diagnosticity and transition probabilities. The combination of an innovative instructed-probability task, Bayesian behavioural modeling, and model-based fMRI analyses provides a solid foundation for the main claims; however, major interpretational limitations remain, particularly a potential confound between posterior switch probability and time in the neuroimaging results. At the behavioural level, reliance on explicitly instructed conditional probabilities leaves open alternative explanations that complicate attribution to a single computational mechanism, such that clearer disambiguation between competing accounts and stronger control of temporal and representational confounds would further strengthen the evidence.

      Thank you. In this revision, we addressed Reviewer 3’s remaining concern on the potential confound between posterior probability and time in neuroimaging results. First, as suggested by the reviewer, we provided images of activations for the effect of Pt and delta Pt after controlling for intertemporal prior in GLM-2. Second, we compared the effect of Pt and delta Pt between GLM-1 (without intertemporal prior) and GLM-2 (with intertemporal prior) and showed the results in a new figure (Figure 4).

      Regarding the issue of reliance on explicitly instructed probabilities, we wish to point out that most of the concerns, such as response mode and regression to the mean, were addressed in the original behavioral paper by Massey and Wu (2005). Please see our detailed response to this point under Weakness (2) raised by Reviewer 3.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study examines human biases in a regime-change task, in which participants have to report the probability of a regime change in the face of noisy data. The behavioral results indicate that humans display systematic biases, in particular, overreaction in stable but noisy environments and underreaction in volatile settings with more certain signals. fMRI results suggest that a frontoparietal brain network is selectively involved in representing subjective sensitivity to noise, while the vmPFC selectively represents sensitivity to the rate of change.

      Strengths:

      - The study relies on a task that measures regime-change detection primarily based on descriptive information about the noisiness and rate of change. This distinguishes the study from prior work using reversal-learning or change-point tasks in which participants are required to learn these parameters from experiences. The authors discuss these differences comprehensively.

      - The study uses a simple Bayes-optimal model combined with model fitting, which seems to describe the data well. The model is comprehensively validated.

      - The authors apply model-based fMRI analyses that provide a close link to behavioral results, offering an elegant way to examine individual biases.

      Weaknesses:

      The authors have adequately addressed my prior concerns.

      Thank you for reviewing our paper and providing constructive comments that helped us improve our paper.

      Reviewer #3 (Public review):

      Thank you again for reviewing the manuscript. In this revision, we focused on addressing your concern on the potential confound between posterior probability and time in neuroimaging results. First, we presented whole-brain results of subjects’ probability estimates (Pt, their subjective posterior probability of switch) after controlling for the effect of time on probability of switch (the intertemporal prior). Second, we compared the effect of probability estimates (Pt) on vmPFC and ventral striatum activity—which we found to correlate with Pt—with and without including intertemporal prior in the GLM. These results will be summarized in a new figure (Figure 4) in the revised manuscript.

      As suggested by the reviewer, we also added slice-by-slice images of the whole-brain results on Pt and delta Pt in the supplement in addition to the Tables of Activation so that the activated brain regions can be clearly seen through these images.

      This study concerns how observers (human participants) detect changes in the statistics of their environment, termed regime shifts. To make this concrete, a series of 10 balls are drawn from an urn that contains mainly red or mainly blue balls. If there is a regime shift, the urn is changed over (from mainly red to mainly blue) at some point in the 10 trials. Participants report their belief that there has been a regime shift as a % probability. Their judgement should (mathematically) depend on the prior probability of a regime shift (which is set at one of three levels) and the strength of evidence (also one of three levels, operationalized as the proportion of red balls in the mostly-blue urn and vice versa). Participants are directly instructed of the prior probability of regime shift and proportion of red balls, which are presented on-screen as numerical probabilities. The task therefore differs from most previous work on this question in that probabilities are instructed rather than learned by observation, and beliefs are reported as numerical probabilities rather than being inferred from participants' choice behaviour (as in many bandit tasks, such as Behrens 2007 Nature Neurosci).
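      The normative benchmark implied by this task description can be sketched as a simple forward filter. This is a minimal illustration, not the authors' model code: the function name `regime_shift_posterior` is hypothetical, and it assumes that the block starts in the red regime, that a shift can occur with probability `q` before each draw and is irreversible, and that both urns share the same diagnosticity `theta`.

```python
def regime_shift_posterior(signals, q, theta):
    """Return P(regime has shifted | balls seen so far) after each draw.

    signals: sequence of "red"/"blue" balls in the order drawn.
    q:       per-period probability of a shift (assumed irreversible).
    theta:   probability of a red ball from the mostly-red urn and of a
             blue ball from the mostly-blue urn (assumed symmetric).
    """
    p_shift = 0.0  # assumption: every block starts in the red regime
    posteriors = []
    for ball in signals:
        # Transition step: the urn may have switched before this draw.
        prior = p_shift + (1.0 - p_shift) * q
        # Likelihood of the observed ball under each urn.
        like_blue_urn = theta if ball == "blue" else 1.0 - theta
        like_red_urn = 1.0 - theta if ball == "blue" else theta
        # Bayes' rule for the shifted (blue) regime.
        numerator = prior * like_blue_urn
        p_shift = numerator / (numerator + (1.0 - prior) * like_red_urn)
        posteriors.append(p_shift)
    return posteriors


# Hypothetical 10-ball sequence with a run of blue balls at the end.
example = regime_shift_posterior(
    ["red", "red", "red", "red", "blue", "red", "blue", "blue", "blue", "blue"],
    q=0.05,
    theta=0.9,
)
```

      Under these assumptions, a participant showing the bias described below would report estimates that vary less across values of `q` and `theta` than this filter does.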

      The key behavioural finding is that participants over-estimate the prior probability of regime change when it is low, and under estimate it when it is high; and participants over-estimate the strength of evidence when it is low and under-estimate it when it is high. In other words participants make much less distinction between the different generative environments than an optimal observer would. This is termed 'system neglect'. A neuroeconomic-style mathematical model is presented and fit to data.

      Functional MRI results show that strength of evidence for a regime shift (roughly, the surprise associated with a blue ball from an apparently red urn) is associated with activity in the frontal-parietal orienting network. Meanwhile, at time-points where the probability of a regime shift is high, there is activity in another network including vmPFC. Both networks show individual-difference effects, such that people who were more sensitive to strength of evidence and prior probability show more activity in the frontal-parietal and vmPFC-linked networks respectively.

      Strengths

      (1) The study provides a different task for looking at change-detection and how this depends on estimates of environmental volatility and sensory evidence strength, in which participants are directly and precisely informed of the environmental volatility and sensory evidence strength rather than inferring them through observation as in most previous studies

      (2) Participants directly provide belief estimates as probabilities rather than experimenters inferring them from choice behaviour as in most previous studies

      (3) The results are consistent with well-established findings that surprising sensory events activate the frontal-parietal orienting network whilst updating of beliefs about the world ('regime shift') activates vmPFC.

      Weaknesses

      (1) The use of numerical probabilities (both to describe the environments to participants, and for participants to report their beliefs) may be problematic because people are notoriously bad at interpreting probabilities presented in this way, and show poor ability to reason with this information (see Kahneman's classic work on probabilistic reasoning, and how it can be improved by using natural frequencies). Therefore the fact that, in the present study, people do not fully use this information, or use it inaccurately, may reflect the mode of information delivery.

      In the response to this comment the authors have pointed out their own previous work showing that system neglect can occur even when numerical probabilities are not used. This is reassuring but there remains a large body of classic work showing that observers do struggle with conditional probabilities of the type presented in the task.

      Thank you. Yes, people do struggle with conditional probabilities in many studies. However, as our previous work suggested (Massey and Wu, 2005), system neglect was likely not due to response mode (having to enter probability estimates, making binary predictions, etc.).

      (2) Although a very precise model of 'system neglect' is presented, many other models could fit the data.

      For example, you would get similar effects due to attraction of parameter estimates towards a global mean - essentially application of a hyper-prior in which the parameters applied by each participant in each block are attracted towards the experiment-wise mean values of these parameters. For example, the prior probability of regime shift ground-truth values [0.01, 0.05, 0.10] are mapped to subjective values of [0.037, 0.052, 0.069]; this would occur if observers apply a hyper-prior that the probability of regime shift is about 0.05 (the average value over all blocks). This 'attraction to the mean' is a well-established phenomenon and cannot be ruled out with the current data (I suppose you could rule it out by comparing to another dataset in which the mean ground-truth value was different).
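      The 'attraction to the mean' reading can be checked against the three pairs of values quoted above. The sketch below is only an illustration of the reviewer's argument, not an analysis from the paper: it assumes a linear shrinkage rule, subjective = w × instructed + (1 − w) × m, and the names `fit_line`, `w` and `attractor` are hypothetical.

```python
true_q = [0.01, 0.05, 0.10]      # instructed probabilities of regime shift
subj_q = [0.037, 0.052, 0.069]   # subjective values quoted in the comment


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for paired lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance = sum((a - mean_x) ** 2 for a in x)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


# Under linear shrinkage the slope is the weight w on the instructed value
# and the intercept equals (1 - w) * m, where m is the attraction point.
w, intercept = fit_line(true_q, subj_q)
attractor = intercept / (1.0 - w)
print(f"weight on instructed value: {w:.2f}, attraction point: {attractor:.3f}")
```

      With these three points the fitted weight is roughly 0.35 and the implied attraction point is roughly 0.05, which is why the quoted values alone cannot separate a hyper-prior account from other sources of reduced sensitivity.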

      More generally, any model in which participants don't fully use the numerical information they were given would produce apparent 'system neglect'. Four qualitatively different example reasons are: 1. Some individual participants completely ignored the probability values given. 2. Participants did not ignore the probability values given, but combined them with a hyperprior as above. 3. Participants had a reporting bias where their reported beliefs that a regime change had occurred tended to be shifted towards 50% (rather than reporting 'confident' values such as 5% or 95%). 4. Participants underweighted probability outliers, resulting in underweighting of evidence in the 'high signal diagnosticity' environment (10.1016/j.neuron.2014.01.020).

      In summary I agree that any model that fits the data would have to capture the idea that participants don't differentiate between the different environments as much as they should, but I think there are a number of qualitatively different reasons why they might do this - of which the above are only examples - hence I find it problematic that the authors present the behaviour as evidence for one extremely specific model.

      We thank the reviewer for this comment and for pointing out that there are alternative models that can describe the over- and underreaction seen in the dataset. Massey and Wu (2005) dealt with this possibility in their original paper. Their concern was not so much about alternative ways of modeling their results as about alternative psychological processes. For example, asymmetric noise accounts have been posited in the judgment and decision-making literature as possible accounts of phenomena like overconfidence. They addressed what might be crudely called “regression/attraction to the mean” in two ways. First, they looked at median responses as well as mean responses (because medians are less affected by the regressive effect) and found the same patterns of over- and underreaction. Second, they also generated sequences that matched particular posterior probabilities (so that over- and underreaction cannot be explained by regression to the mean) and still found under- and overreactions.

      We also wish to point out that in the judgment and decision-making literature, starting from Edwards (1968), there is a long history of using a normative Bayesian model as the starting point and subsequently developing quasi-Bayesian models (like the system-neglect model) to describe systematic deviations from the normative Bayesian model.

      Finally, we want to clarify that our primary goal is not to engage in a model-fitting exercise that examines different possible models. To us, what is more important is that system neglect is a psychologically motivated hypothesis. It is built on the idea that the lack of sensitivity to the system parameters arises because people focus primarily on the signals and only secondarily on the system parameters that generate the signals. Massey and Wu (2005) dealt with a host of other potential explanations through experimental manipulations and data analysis. In this paper, we built on Massey and Wu to examine the neurocomputational basis that gives rise to over- and underreactions.

      (3) Despite efforts to control confounds in the fMRI study, including two control experiments, I think some confounds remain.

      For example, a network of regions is presented as correlating with the cumulative probability that there has been a regime shift in this block of 10 samples (Pt). However, regardless of the exact samples shown, Pt always increases with sample number (as by the time of later samples, there have been more opportunities for a regime shift). To control for this the authors include, in a supplementary analysis, an 'intertemporal prior.' I would have preferred to see the results of this better-controlled analysis presented in the main figure. From the tables in the SI it is very difficult to tell how the results change with the inclusion of the control regressors.

      Thank you. In response, we added a new figure, now Figure 4, showing the results of Pt and delta Pt from GLM-2 where we added the intertemporal prior as a regressor to control for temporal confounds. We compared Pt and delta Pt results in vmPFC and ventral striatum between GLM-1 and GLM-2. We also showed the results on intertemporal prior on vmPFC and ventral striatum from GLM-2.

      On the other hand, two additional fMRI experiments are done as control experiments and the effect of Pt in the main study is compared to Pt in these control experiments. Whilst I admire the effort in carrying out control studies, I can't understand how these particular experiments are useful controls. For example, in experiment 3 participants simply type in numbers presented on the screen - how can we even have an estimate of Pt from this task?

      We thank the reviewer for this comment. On the one hand, the effect of Pt we see in brain activity could simply be due to motor confounds, and the purpose of Experiment 3 was to control for them. Our question was, if subjects saw a similar visual layout and were just instructed to press buttons to indicate two-digit numbers, would we observe activity in the vmPFC, ventral striatum, and the frontoparietal network as we did in the main experiment (Experiment 1)?

      On the other hand, the effect of Pt could simply reflect estimates of the probability that the current regime is the blue regime, and therefore not be particularly about change detection. In Experiment 2, we tested that idea, namely whether what we found about Pt was unique to change detection. In Experiment 2, subjects estimated the probability that the current regime is the blue regime (just as they did in Experiment 1) except that there were no regime shifts involved. In other words, it is possible that the regions we identified were generally associated with probability estimation and not particularly with probability estimates of change. We used Experiment 2 to examine whether this was true.

      To make the purpose of the two control experiments clearer, we updated the paragraph describing the control experiments on page 9:

      “To establish the neural representations for regime-shift estimation, we performed three fMRI experiments (n = 30 subjects for each experiment, 90 subjects in total). Experiment 1 was the main experiment, while Experiments 2 to 3 were control experiments that ruled out two important confounds (Fig. 1E). The control experiments were designed to clarify whether any effect of subjects’ probability estimates of a regime shift, P<sub>t</sub>, in brain activity can be uniquely attributed to change detection. Here we considered two major confounds that can contribute to the effect of P<sub>t</sub>. First, since subjects in Experiment 1 made judgments about the probability that the current regime is the blue regime (which corresponded to probability of regime change), the effect of P<sub>t</sub> did not particularly have to do with change detection. To address this issue, in Experiment 2 subjects made exactly the same judgments as in Experiment 1 except that the environments were stationary (no transition from one regime to another was possible), as in Edwards (1968) classic “bookbag-and-poker chip” studies. Subjects in both experiments had to estimate the probability that the current regime is the blue regime, but this estimation corresponded to the estimates of regime change only in Experiment 1. Therefore, activity that correlated with probability estimates in Experiment 1 but not in Experiment 2 can be uniquely attributed to representing regime-shift judgments. Second, the effect of P<sub>t</sub> can be due to motor preparation and/or execution, as subjects in Experiment 1 entered two-digit numbers with button presses to indicate their probability estimates. To address this issue, in Experiment 3 subjects performed a task where they were presented with two-digit numbers and were instructed to enter the numbers with button presses. 
By comparing the fMRI results of these experiments, we were therefore able to establish the neural representations that can be uniquely attributed to the probability estimates of regime-shift.”

      To further make sure that the probability-estimate signals in Experiment 1 were not due to motor confounds, we implemented an action-handedness regressor in the GLM, as we described below on page 19:

      “Finally, we note that in GLM-1, we implemented an “action-handedness” regressor to directly address the motor-confound issue, that higher probability estimates preferentially involved right-handed responses for entering higher digits. The action-handedness regressor was parametric, coding -1 if both finger presses involved the left hand (e.g., a subject pressed “23” as her probability estimate when seeing a signal), 0 if using one left finger and one right finger (e.g., “75”), and 1 if both finger presses involved the right hand (e.g., “90”). Taken together, these results ruled out motor confounds and suggested that vmPFC and ventral striatum represent subjects’ probability estimates of change (regime shifts) and belief revision.”
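      The coding scheme for this regressor can be sketched in a few lines. This is an illustration only: the digit-to-hand mapping below (digits 1 to 5 on the left hand, 6 to 0 on the right) is an assumption inferred from the three examples in the quoted passage, and the names `LEFT_HAND_DIGITS` and `action_handedness` are hypothetical.

```python
# Assumption: digits 1-5 are entered with the left hand and 6-9, 0 with the
# right hand. This matches the examples "23", "75" and "90" but is not
# stated explicitly in the quoted text.
LEFT_HAND_DIGITS = set("12345")


def action_handedness(two_digit_entry: str) -> int:
    """Code a two-digit response as -1 (both left), 0 (mixed), +1 (both right)."""
    if len(two_digit_entry) != 2 or not two_digit_entry.isdigit():
        raise ValueError("expected a two-digit string such as '07' or '90'")
    right_presses = sum(d not in LEFT_HAND_DIGITS for d in two_digit_entry)
    return right_presses - 1  # 0, 1 or 2 right-hand presses -> -1, 0, +1


codes = [action_handedness(entry) for entry in ("23", "75", "90")]
```

      Entered as a parametric regressor alongside the probability estimate, a code of this kind absorbs variance that tracks which hand was used rather than the size of the estimate itself.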

      (4) The Discussion is very long, and whilst a lot of related literature is cited, I found it hard to pin down within the discussion, what the key contributions of this study are. In my opinion it would be better to have a short but incisive discussion highlighting the advances in understanding that arise from the current study, rather than reviewing the field so broadly.

      Thank you. We thank the reviewer for pushing us to highlight the key contributions. In response, we added a paragraph at the beginning of Discussion to better highlight our contributions:

      “In this study, we investigated how humans detect changes in the environments and the neural mechanisms that contribute to how we might under- and overreact in our judgments. Combining a novel behavioral paradigm with computational modeling and fMRI, we discovered that sensitivity to environmental parameters that directly impact change detection is a key mechanism for under- and overreactions. This mechanism is implemented by distinct brain networks in the frontal and parietal cortices and in accordance with the computational roles they played in change detection. By introducing the framework in system neglect and providing evidence for its neural implementations, this study offered both theoretical and empirical insights into how systematic judgment biases arise in dynamic environments.”

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      Thank you for pointing out the inclusion of the intertemporal prior in glm2, this seems like an important control that would address my criticism. Why not present this better-controlled analysis in the main figure, rather than the results for glm1 which has no effective control of the increasing posterior probability of a reversal with time?

      Thank you for this suggestion. We added a new figure (Figure 4) that shows the results of Pt and delta Pt from GLM-2. We also compared the effect of Pt and delta Pt between GLM-1 and GLM-2. We found that the effect of Pt and delta Pt did not differ between GLM-1 and GLM-2. GLM-1 and GLM-2 differed on whether various task-related regressors contributing to Pt, including the intertemporal prior, were included in the model. In GLM-1, those task-related regressors were not included. In GLM-2, the task-related regressors were included in addition to Pt and delta Pt.

      The reason we kept results from GLM-1 (Figure 3) was primarily because we wanted to compare the effect of Pt between experiments under an identical GLM. In other words, the regressors in GLM-1 were identical across all 3 experiments. In Experiments 1 and 2, Pt and delta Pt were respectively probability estimates and belief updates that the current regime was the blue regime. In Experiment 3, Pt and delta Pt were simply the number subjects were instructed to press (Pt) and the change in number between successive periods (delta Pt).

      Here is the section in the main text where we discussed the new Figure 4 on pages 19-22:

      We further examined the robustness of P<sub>t</sub> and ∆P<sub>t</sub> representations in vmPFC and ventral striatum in three follow-up analyses. In the first analysis, we implemented a GLM (GLM-2 in Methods) that, in addition to P<sub>t</sub> and ∆P<sub>t</sub>, included various task-related variables contributing to P<sub>t</sub> as regressors. Specifically, to account for the fact that the probability of regime change increased over time, we included the intertemporal prior as a regressor in GLM-2. The intertemporal prior is the natural logarithm of the odds in favor of regime shift in the t-th period, ln[(1 − (1 − q)<sup>t</sup>)/(1 − q)<sup>t</sup>], where q is the transition probability and t = 1, …, 10 is the period (Eq. 1 in Methods). It describes normatively how the prior probability of change increased over time regardless of the signals (blue and red balls) the subjects saw during a trial. Including it along with P<sub>t</sub> would clarify whether any effect of P<sub>t</sub> can otherwise be attributed to the intertemporal prior. We found that the results of P<sub>t</sub> and ∆P<sub>t</sub> in the vmPFC and ventral striatum in GLM-2 were identical to those in GLM-1 (Fig. 4): Fig. 4A was meant to depict the results in slices identical to those shown in Fig. 3B for results based on GLM-1. For slice-by-slice results, see Fig. S7 in SI for results based on GLM-1 and Fig. S9 for GLM-2. For Tables of activations, see Tables S1-S3 in SI for GLM-1 and Tables S7-S9 for GLM-2. In a separate, independent region-of-interest (ROI) analysis on vmPFC and ventral striatum (Fig. 4BC; see Independent regions-of-interest (ROIs) analysis in Methods for details), we further compared the effect of both P<sub>t</sub> and ∆P<sub>t</sub> between GLM-1 and GLM-2.
For P<sub>t</sub>, the difference between GLM-1 and GLM-2 was not significant (paired t-test, t(58) = −0.72, p = 0.47 in vmPFC; t(58) = −0.21, p = 0.83 in ventral striatum), while the effect of P<sub>t</sub> from GLM-1 (one-sample t-test, t(29) = −3.82, p < .01 in vmPFC; t(29) = −3.06, p < .01 in ventral striatum) and GLM-2 was significant (one-sample t-test, t(29) = −2.69, p = .01 in vmPFC; t(29) = −2.50, p = .02 in ventral striatum). For ∆P<sub>t</sub>, the difference between GLM-1 and GLM-2 was not significant (paired t-test, t(58) = −0.07, p = 0.94 in vmPFC; t(58) = −0.14, p = 0.88 in ventral striatum), while the effect of ∆P<sub>t</sub> from GLM-1 (one-sample t-test, t(29) = −3.12, p < .01 in vmPFC; t(29) = −4.14, p < .01 in ventral striatum) and GLM-2 was significant (one-sample t-test, t(29) = −2.92, p < .01 in vmPFC; t(29) = −3.59, p < .01 in ventral striatum). For the intertemporal prior, activity in both vmPFC and ventral striatum did not correlate significantly with the intertemporal prior (one-sample t-test, t(29) = −0.07, p = 0.95 in vmPFC; t(29) = −0.53, p = 0.60 in ventral striatum). All the t-tests described above were two-tailed. Taken together, these results suggest that vmPFC and ventral striatum represented P<sub>t</sub> and ∆P<sub>t</sub> regardless of whether the intertemporal prior and other task-related regressors contributing to P<sub>t</sub> were included in the GLM. We also did not find that vmPFC and ventral striatum represented the intertemporal prior. In the second analysis, we implemented a GLM that replaced P<sub>t</sub> with the log odds of P<sub>t</sub>, ln(P<sub>t</sub>/(1 - P<sub>t</sub>)) (Fig. S10 in SI). In the third analysis, we implemented a GLM that examined P<sub>t</sub> separately on periods when change-consistent (blue balls) and change-inconsistent (red balls) signals appeared (Fig. S11 in SI).
Each of these analyses showed significant correlation with P<sub>t</sub> in vmPFC and ventral striatum, further establishing the robustness of the P<sub>t</sub> findings.
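      To see why the intertemporal prior is a natural control for time, it can help to compute it directly. The sketch below is an illustration under a stated assumption, not the authors' code: it takes a regime shift to be irreversible within a trial, so that the probability of a shift by period t is 1 − (1 − q)^t and the log odds are ln[(1 − (1 − q)^t)/(1 − q)^t]. The function name `intertemporal_prior` is hypothetical, and the three q values are those quoted earlier in this review.

```python
import math


def intertemporal_prior(q: float, t: int) -> float:
    """Log odds that a shift has occurred by period t, ignoring all signals.

    Assumes a per-period transition probability q and an irreversible shift.
    """
    p_no_shift = (1.0 - q) ** t
    return math.log((1.0 - p_no_shift) / p_no_shift)


# One row per transition probability, one value per period t = 1..10.
table = {
    q: [intertemporal_prior(q, t) for t in range(1, 11)]
    for q in (0.01, 0.05, 0.10)
}
for q, values in table.items():
    print(q, [round(v, 2) for v in values])
```

      Under this assumption the quantity rises monotonically with t for every q, so any regressor that grows over the trial is partly collinear with it, which is the confound GLM-2 is meant to address.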

      As a further point I could not navigate the tables of fMRI activations in SI and recommend replacing or supplementing these with images. For example I cannot actually find a vmPFC or ventral striatum cluster listed for the effect of Pt in GLM1 (version in table S1), which I thought were the main results? Beyond that, comparing how much weaker (or not) those results are when additional confound regressors are included in GLM2 seems impossible.

      As suggested by the reviewer, we added slice-by-slice images showing the effect of Pt and delta Pt (Figure S9 in SI for GLM-2 and Figure S7 for GLM-1). The clusters in blue represent Pt effect, the clusters in orange represent delta Pt effect. As can be seen, both Pt and delta Pt are represented in the vmPFC and ventral striatum.

    1. Open a social media interface (not the one you’ve been working with) and choose a view (e.g., a list of posts, an individual post, an author page, etc.). First identify as many pieces of information as you can see on the screen (without doing anything). For each piece of information: What data types might be used to represent that data on a computer? How is this data a simplification of reality? That is, what does it not capture? Who does it work best for, and who does it not work well for? Did the user(s) directly provide that data, or was it collected automatically by the social media site?

      TikTok only shows the number of likes as an integer data type, meaning it tells me how many people liked a video, but it does not show different emotions like Facebook, where users can react with various feelings. So we cannot really tell whether people truly enjoyed the video or just saved or liked it to share with others. It does not clearly reflect viewers’ real feelings, including mine. Another example is text data such as usernames and profile pictures which are based on users’ personal preferences and do not necessarily reflect who they are in real life. This is why there are many fake accounts on social media, created for different purposes. Sometimes when scrolling on TikTok, I wonder why I see unfamiliar videos that I have never searched for or talked about. I think this happens because the platform collects data from my followers, and if they like certain types of videos, similar content may also appear on my feed.
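      The data types named in this answer can be written down as a small record. This is a hypothetical sketch for illustration only: the class `VideoPost`, its field names and the example values are invented here and do not describe how TikTok actually stores its data.

```python
from dataclasses import dataclass


@dataclass
class VideoPost:
    username: str             # text chosen by the user; need not match a real name
    profile_picture_url: str  # text pointing to an image the user uploaded
    like_count: int           # a single integer; it says nothing about why people liked it


# Hypothetical example values.
post = VideoPost(
    username="example_user",
    profile_picture_url="https://example.com/avatar.png",
    like_count=1204,
)

# The integer can be compared or added up, but the feelings behind each like
# are not stored anywhere in this record.
is_popular = post.like_count > 1000
```

      Seen this way, the simplification is visible in the types themselves: one integer stands in for many different reactions, and a free-text username stands in for a person.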

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      1. Point-by-point description of the revisions


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this study, the authors investigated the effect of nutritional stress (HSD and HFD) on cardiac function by assessing multiple parameters on adult flies. They next identified the adaptive transcriptomic changes in the heart in response to these nutritional stresses and screened for their roles under ND, HSD and HFD. They identified fit gene, encoding a satiety gene, expressed by cardiomyocytes and pericardial cells.

      I think the characterisation is thorough; however, the conclusion is not well supported by the evidence. My main concern is that in many graphs, the difference between control and experiment is subtle, and, secondly, the authors showed some conflicting results (e.g. one RNAi showed a reduction of one parameter, however, the other independent RNAi did not). In this case, I believe the authors shouldn't conclude that the gene is functionally required, since the RNAis are meant to confirm each other.

      First, we thank the reviewer for her/his constructive comments and suggestions. We obtained new results presented in the last version of the manuscript, which consistently support our conclusions and improve the study.

      High-Sugar and High-Fat Diets modified cardiac performance

      They assessed how HSD and HFD affect Adult fly heart performance. Instead of performing 3 weeks of dietary manipulation as has been done before by other groups, they put adult flies on HSD for 7 days and HFD for only 3 days.

      We would like to clarify the nutritional challenge used. Cardiac function of flies was assessed at 10 days after emergence. Flies were kept either on ND or HSD during these 10 days (ND and HSD conditions), or on ND for 7 days and then transferred to HFD for 3 days (HFD condition). Thus, all the females spent 10 days on a diet before being imaged or before heart/brain dissection.

      They found: HSD increases HP and SI, and reduces AI. The difference is too small and not consistent between different control lines. Also, when the difference is this small, p value does not tell much!

      They probably intentionally induced a milder effect so that they could assess adaptive transcriptomic changes to this nutritional stress. In Fig. 1D, SI is increased under HSD with control-KK. In Fig. S1C, SI is not changed under HSD with control-GD and control-GFP. Instead, DI is increased, which is also opposite to what they showed in Fig. 1C. HFD increased ESD, EDD, SV, FS and CO (hypertrophy). This is not true with control-GD and control-GFP lines though! Comments: They have assessed many parameters in live animals with many different control lines, which is thorough. However, it is hard to draw any conclusions based on these conflicting results. Are these effects KK-line specific?

      Globally, we agree with the reviewer that the results presented in the first version of the manuscript for the control lines were difficult to understand due to the inconsistency of the phenotypes. In this revised version, we present new results in Figure 1 and S1 regarding the effect of 10 days HSD and 3 days HFD exposure vs ND.

      105 to 187 flies were imaged for the 3 control conditions, in the 3 diets concomitantly, to increase the power of our analysis. As mentioned in the main text (page 3, line 30-35; page 4, line 1-5), both diets deteriorate cardiac function, with HFD leading to consistent phenotypes on heart diameters and rhythm and HSD to milder effects. Indeed, the 3 control lines were uniformly affected by HFD after 3 days exposure, whereas 10 days in HSD was not sufficient to quantify a significant effect despite consistent trends in several phenotypes (EDD, ESD, DI, AI and CO). These results revealed a different sensitivity of cardiac performance to sugar and fat exposure.

      As described in the text, we were nevertheless confident that our approach would be good to investigate the early molecular dysregulations induced by sugar. This was the purpose of our analysis, presented in the follow-up of the manuscript.

      Regarding the small differences measured in the phenotypes in HSD and HFD compared to ND, we would like to point out that the values presented are normalized to control. Normalization is done for every independent experiment, performed at different dates, and permits the graphical representation of pooled values. Statistical analysis is performed using the non-parametric Kruskal-Wallis test accordingly. Values are presented with the X axis cutting the Y axis at 0; this graphical representation also contributes to flattening the differences, and p-values indicate their significance.
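      The normalization step described here can be sketched as follows. This is an illustration under assumptions, not the authors' pipeline: the response does not say which control statistic is used, so the sketch divides every value by the mean of the same-day control group, and the function name `normalize_to_control` and the numbers are hypothetical.

```python
from statistics import mean


def normalize_to_control(experiments):
    """Pool values across experiments after scaling each by its own control.

    experiments: list of dicts with raw measurements under the keys
    "control" and "treated", one dict per independent experiment (date).
    Assumption: values are divided by the control mean of that experiment.
    """
    pooled_control, pooled_treated = [], []
    for exp in experiments:
        reference = mean(exp["control"])
        pooled_control += [value / reference for value in exp["control"]]
        pooled_treated += [value / reference for value in exp["treated"]]
    return pooled_control, pooled_treated


# Hypothetical measurements from two experiment dates with different baselines.
raw = [
    {"control": [0.40, 0.44, 0.36], "treated": [0.46, 0.50, 0.42]},
    {"control": [0.55, 0.60, 0.50], "treated": [0.61, 0.66, 0.59]},
]
control_norm, treated_norm = normalize_to_control(raw)
```

      After this scaling the control group is centred on 1 in every experiment, so pooled groups can be compared with a rank-based test such as Kruskal-Wallis, and a plot whose Y axis starts at 0 will show even a real shift as a small change in height.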

      Analysis of the fly cardiac transcriptome upon nutritional stress

      RNA seq to detect differentially expressed genes under HSD and HFD vs ND. Most DE genes are downregulated, which prompts them to assess how the downregulation of these genes adapts the animals to this nutritional stress.

      High Sugar Diet downregulated 1c-metabolism and Leloir galactose pathways.

      In this revised manuscript, we first present RT-qPCR validating the downregulation of Gnmt, Sardh and Galk expression in the hearts of 10-day-old HSD-fed females compared to ND-fed ones (Figure S3A).

      We apologize for the confusing explanations in the first version of the manuscript. We show new results in Figure 3 and S3 on the cardiac function of both Gnmt and Sardh, where, following the reviewer’s suggestion, both genes were knocked down in the heart in ND and Gnmt was overexpressed in HSD. No available tools allowed us to test Sardh overexpression in HSD, and we could not obtain any for Galk.

      GNMT is downregulated under HSD and HFD.

      In ND, GNMT knockdown increased ESD, EDD and CO. Sardh knockdown did the same? However, Sardh knockdown did not affect ESD significantly.

      We reanalyzed our initial data and added new data, comparing only knockdown or overexpression to the corresponding controls performed in concomitant experiments. Results are now shown in Figure 3C-E; S3C-H. Knocking down Gnmt in the heart increased HP, EDD, ESD and CO. Sardh knockdown in ND resulted in milder phenotypes but induced significant hypertrophy, as Gnmt knockdown does. In both cases, FS was not impacted.

      Both genes have been previously shown to be beneficial to muscular function in a time-restricted feeding context (Livelo et al., 2023, Nat.Comm.), illustrating that, even if both enzymes are involved in opposite reactions, their function has the same effect on organ/tissue function, as they did for heart diameters. The text corresponding to results and discussion was updated accordingly (pages 5, 11).

      The conclusion here is: GNMT knockdown induces hypertrophy, similar to the effect of HFD.

      In HSD, further knockdown of GNMT reduced (rescued) HP, suggesting downregulation of GNMT under HSD is adaptive. Should overexpress GNMT under HSD to see if this manipulation further increases HP, to claim GNMT downregulation is an adaptive change to high sugar stress.

      We thank the reviewer for her/his suggestion. We now used UAS-GnmtWT (from FlyORF) to assess the role of Gnmt on cardiac function in HSD.

      As shown in Figure 3C-E and S3C,F, overexpressing Gnmt in the heart in HSD was sufficient to rescue some sugar-induced phenotypes or to induce other dysfunctions, when compared to corresponding controls evaluated in the same experiments in ND and HSD. Notably, HP increase and CO decrease are rescued by Gnmt cardiac overexpression in HSD. Interestingly, the cardiac diastolic constriction induced by HSD is associated with increased FS and CO in this genotype in sugar diet. These new results strengthen the positive effect of Gnmt on cardiac function, improving it in HSD and preventing its deterioration in this diet.

      Of note, Gnmt overexpression in ND did not trigger cardiac dysfunctions (data not shown).

      The results and conclusions have been corrected.

      Interestingly, HSD itself tends to decrease AI, a further knockdown of GNMT further decreases AI. This indicates GNMT downregulation under HSD contributes to AI reduction. Together, GNMT downregulation under HSD prevents HP from going higher, while its downregulation causes AI going down.

      In the manuscript, the authors claim that "Gnmt KD led reduced HP and AI, suggesting that it is able to counteract the effect of HSD observed in control flies on these phenotypes". This is not true according to the logic in Results section 1. As in section 1, the effect of HSD on AI is not significant, so the authors shouldn't say "HS tended to reduce AI".

      Our reanalysis and new results showed no effect of Gnmt on AI, so these figure panels were removed.

      Why did GNMT knockdown reduce FS under ND (Fig. S3C) while increasing FS under HSD (Fig. 3F)? If GNMT knockdown induces hypertrophy, I would expect it to increase FS.

      Gnmt overexpression did not affect cardiac diameters in HSD, but it nevertheless led to an increased contractile efficacy compared to HSD controls (Figure S3F).

      These new results strengthen the positive effect of Gnmt on cardiac function, preventing its deterioration in sugar diet. The text was modified accordingly.
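
      The relationship between heart diameters and FS discussed in this exchange can be made concrete with a minimal sketch. It assumes the conventional definition FS = (EDD - ESD) / EDD, which is not spelled out in the response itself; the function name and all diameter values below are hypothetical and are not data from the manuscript.

```python
def fractional_shortening(edd_um: float, esd_um: float) -> float:
    """Fractional shortening from end-diastolic (EDD) and end-systolic (ESD) diameters.

    Assumes the conventional definition FS = (EDD - ESD) / EDD.
    """
    if edd_um <= 0:
        raise ValueError("EDD must be positive")
    if not 0 <= esd_um <= edd_um:
        raise ValueError("ESD must lie between 0 and EDD")
    return (edd_um - esd_um) / edd_um


# Hypothetical diameters in micrometres, chosen only to illustrate the arithmetic.
control = fractional_shortening(80.0, 48.0)
# Both diameters enlarged by the same factor: FS does not change.
enlarged = fractional_shortening(100.0, 60.0)
# Smaller diameters with a proportionally larger systolic reduction: FS rises.
constricted = fractional_shortening(70.0, 35.0)
```

      Under this definition, a proportional increase of both EDD and ESD leaves FS unchanged, so enlarged diameters do not by themselves imply a change in FS.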

      High Fat Diet modulated CD36-scavenger receptor and Glut8 orthologues

      In this revised manuscript, we present RT-qPCR data validating the downregulation of Snmp1 expression and the slight upregulation of nebu expression in the hearts of 10-day-old HFD-fed females compared to ND-fed ones (Figure S3B).

      HFD: Snmp1 gene is downregulated, however, both overexpression and knockdown of Snmp1 in ND induced some phenotypes.

      Indeed, as mentioned in the revised manuscript (page 6, lines 21-24), in the hearts of females fed ND, both Snmp1 knockdown (Snmp1KK) and overexpression (Snmp1WT) reduced EDD and ESD (Figure 3J; S3J), but FS increased accordingly only in Snmp1KK.

      As noted in the text, both downregulation and overexpression of Snmp1 led to side phenotypes (page 6, lines 24-28): Snmp1KK flies exhibited increased abdominal fat (Figure S3K), and ostial cells appeared clearly malformed in Snmp1WT (Figure 3M). This may explain why the heart shows the same type of functional impairment in both genotypes.

      We now discuss the hypothesis that these similar cardiac dysfunctions may result from Snmp1 being a regulator of organismal or cardiac lipid homeostasis. Indeed, increasing body fat content is deleterious, as is increasing the import of fat into the cardiomyocytes. Ultimately, both affect the health and function of cardiac cells.

      HFD: nebu has a role in regulating cardiac function under ND.

      HSD and HFD revealed the secretory function of the heart

      They identified diet-regulated secreted proteins that are required for cardiac dysfunction.

      Cardiac Fit expression impacted Cardiac performance.

      The author used Hand-G4 to knock down Fit using KK and GD lines, KK line showed a reduction in HP (Fig. 5A), but not GD line (Fig. S5D). How did the author conclude that Fit is required for cardiac function? Also, with the positive data, the difference is too subtle.

      We apologize and agree that the contradictory or inconsistent results obtained with the two RNAi lines were confusing.

      For this revised version, we first assessed the effect of the two RNAi lines (KK and GD) on fit expression in dissected hearts. RT-qPCR for the KK line is presented in Figure S5A. The GD line did not significantly reduce fit expression when expressed in the heart with Hand>, which can explain the results presented previously (not shown, but data are available). We therefore removed all results obtained with the GD line from this revised version.

      To confirm the KK effects, we used a fit KO allele (fit81) and a truncated version of fit lacking its signal peptide (fitDeltaSP), which has a dominant negative effect; both were previously published and validated (Sun et al. 2017, Nat. Comm.). These two mutants were used to investigate the cardiac function of fit in our analysis. The results presented in Figures 5 and S5 confirm the phenotypes already observed with the KK line when expressed with Hand> in the heart and with Lsp2> in the fat body.

      Our results validate the effect of decreased fit on rhythmicity and contractility, with the reverse effects consistently observed upon fit overexpression. In conclusion, we are confident that Fit is required for the regulation of cardiac performance.

      These new data are now included in the results section “Cardiac Fit expression impacted Cardiac performance” (pages 8-9)

      **Referee cross-commenting**

      I agree with the experiments proposed by reviewer 2.

      Reviewer #1 (Significance (Required)):

      The study aims to examine the effect of diet on cardiac function.

      The strength is that a lot of characterisations were done.

      The weakness is that the functional data regarding fit could not be validated with two different RNAi lines, so the evidence is not strong enough to support the conclusions.

      We again would like to thank the reviewer for her/his remarks and suggestions. She/He highlighted the weaknesses of the first analysis, which was important and constructive feedback for us. We strengthened our results by increasing the number of samples, reanalyzing the data and performing the necessary new experiments, which are now included in this revised version.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Khamvongsa-Charbonnier et al. reported a RNA-seq analysis and RNA interference screening on high-fat and high-sugar-induced cardiomyopathy in Drosophila. The authors uncovered novel genes in 1C-metabolism, galactose metabolism, CD36-scavenger receptor and glucose transporter, as adaptative factors of cardiac function under high-fat and high-sugar treatment. The authors also identified a satiety hormone, Fit, as a cardiokine to control food intake and , expressed by dilp5 secretion. In summary, this study leverages the powerful genetic model Drosophila to uncover a number of new factors in regulating cardiac function under nutritional stresses and potentially offers new insights into molecular mechanisms underlying diet-related cardiac diseases. I have a few concerns, as listed below.

      First, we would like to thank the reviewer for her/his comments and suggestions, which greatly helped us improve the take-home messages of our manuscript. Following her/his recommendations, we can now present a revised and stronger version of our manuscript.

      1. Quantitative RT-PCR is required to validate the expression patterns of candidate genes identified from the RNAseq analysis.

      RT-qPCR was performed on hearts dissected from 10-day-old females fed ND, HSD or HFD. The validated downregulation of Gnmt, Sardh and Galk is presented in Figure S3A, Snmp1 downregulation and nebu upregulation (a non-significant trend) in Figure S3B, and fit downregulation in Figure S5A.

      The authors state that the dysregulated gene expression patterns reflect acute adaptation to HSD and HFD stresses. Most of the candidate genes in this study were downregulated upon HSD and HFD. However, it is recommended that overexpression of these genes, rather than knockdown, is needed to confirm whether the downregulation of these candidate genes upon stresses is an adaptative response.

      We agree with the reviewer and followed her/his recommendation when tools were accessible for our analysis.

      For example, HSD feeding induces the heart period. Knocking down Gnmt, specifically in the heart, under the HSD feeding changes can reduce the heart period. This evidence is insufficient to suggest the protective role of Gnmt under the HSD diet. Gnmt has already been downregulated under the HSD. Further knockdown of Gnmt, instead of returning the Gnmt expression to normal levels, to protect cardiac contractile performance complicates the model.

      We thank the reviewer for her/his suggestion. We used UAS-GnmtWT (from FlyORF) to perform these experiments.

      As shown in Figures 3C-E and S3C,F, knocking down Gnmt in the heart increased HP, EDD, ESD and CO. In the same figure panels and in Figure S3F, we show that overexpressing Gnmt with Hand> in HSD was sufficient to rescue some sugar-induced phenotypes or to induce others, compared to the corresponding controls evaluated in the same experiments in ND and HSD. Gnmt overexpression in ND did not trigger cardiac dysfunctions (data not shown).

      The HP increase and the CO decrease are rescued by cardiac Gnmt overexpression in HSD. Interestingly, the cardiac constriction induced by HSD is not rescued by Gnmt overexpression, but overexpression is enough to increase FS and CO on the sugar diet. These new results strengthen the case for a positive effect of Gnmt on cardiac function, improving it under HSD and preventing its deterioration on this diet.

      Sardh knockdown in ND resulted in milder phenotypes but, like Gnmt knockdown, induced significant hypertrophy. No available tools allowed us to test its overexpression in HSD.

      Nevertheless, as mentioned and discussed in the manuscript (page 5, lines 27-30; page 11, lines 11-14), such a protective role in muscle function and integrity has already been characterized for Gnmt and Sardh in fly IFM in time-restricted feeding experiments (Livelo et al., 2023, Nat. Comm.). Our experiments show that both genes play the same role in cardiac function under nutritional stress. The text was modified accordingly.

      The authors suggest that the effect of nebu on heart contractility is not dependent on diet. However, based on the result from Figure 3O-P, the HFD treatment blocks the effect of nebu knockdown on heart contractility. The authors need to further explain these results and modify their conclusions accordingly.

      We completely agree with the reviewer. We did not analyze these results correctly. We reanalyzed our data, taking into account only the nebu knockdown experiments that were performed in ND and in HFD concomitantly. Results are shown in Figures 3O-P and S3L-N.

      As mentioned in the manuscript (page 7, lines 3-8), nebu knockdown led to identical HP decrease in both diets but its constrictive effect (reduction of heart diameters) in ND is abrogated by fat diet.

      We modified the text accordingly in the results and discussion (page 7, lines 8-11; page 12, lines 7-12).

      It is a bit confusing that knockdown of fit using Hand-Gal4 induced food intake, but knockdown of fit using tin-Gal4 or Dot-Gal4 did not significantly induce food intake (Fig 6A). The author did not provide any explanation of these results. What is even more confusing is that overexpressing fit using Dot-Gal4 decreased food intake, but overexpressing fit using Hand-Gal4 or tin-Gal4 did not significantly decrease food intake (Fig 6B). Why was the strong food intake phenotype not observed using Hand-Gal4 in both experiments? These confusing results lead to a question, which cell type is responsible for the production of cardiokine, Fit?

      We apologize for the misleading results presented in the initial manuscript. We hope that our revised version will clarify Fit function regarding its remote impact.

      Concerning the requirement for Fit function and the cell types that produce Fit, the results we obtained when evaluating cardiac performance strongly suggest that both cardiomyocytes and pericardial cells are important and recapitulate the effect of Hand> (Figure 5A-C; S5G-H).

      For food intake measurements, we now present results from newly performed food intake experiments for Hand>fitWT (Figure 6D). They show a significant reduction of food intake in this condition, corroborating the results obtained with Dot>. We added a clarification of this point to the manuscript (page 10, lines 11-16).

      When testing the role of cardiac Fit in Dilp5 secretion, the authors subjected flies to starvation stress. However, the main focus of the present study is on HSD and HFD. The RNAseq analysis showed that Fit expression was downregulated by both HSD and HFD. Can the authors show that Dilp5 secretion is reduced by both HSD and HFD? Most importantly, can the authors test whether overexpression of cardiac Fit blocks HSD- or HFD-reduced Dilp5 secretion?

      We understand the point raised by the reviewer. First of all, we wanted to correlate the measured impact on food intake, when manipulating fit expression in the heart, with the level of Dilp release, an approach previously used and validated by Sun et al. (2017, Nat. Comm.). For this purpose, we used the same approach and protocol; results are shown in Figure 6E-F.

      As mentioned by the reviewer, fit expression is downregulated in both HSD and HFD (which we confirmed by RT-qPCR in Figure S5A). As suggested by the reviewer, we performed Dilp5 immunostaining on the CNS of females that were fed HSD or HFD for 10 days. Our results, in Figure 6B (left panels) with corresponding quantifications in Figure 6C, show that both diets strongly decrease the amount of Dilp5 in the IPCs and that this was not due to altered Dilp2 or Dilp5 expression in the CNS (Figure S6A). In this condition, overexpressing fit, which has a promoting effect on Dilp secretion (Figure 6B, right panels ND), may only have an additive effect. This is shown in Figure 6B-C.

      Reviewer #2 (Significance (Required)):

      In summary, this study leverages the powerful genetic model Drosophila to uncover a number of new factors in regulating cardiac function under nutritional stresses and potentially offers new insights into molecular mechanisms underlying diet-related cardiac diseases.

      We again would like to thank the reviewer for her/his remarks and suggestions. Her/His important and constructive feedback helped us improve and strengthen our study. Despite the weak points of the first version, she/he gave supportive feedback, for which we thank her/him deeply. This revised version has improved results and analysis, thanks to the use of new genetic tools that strengthen the study.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      Despite this compelling data regarding the protective role of HSF1 in the febrile response, what remains unexplained and complicates the authors' model is the observation that losing LvHSF1 at 'normal' temperatures of 25 ℃ is not detrimental to survival, even though viral loads increase and nSWD is likely still subject to LvHSF1 regulation. These observations suggest that WSSV infection may have other detrimental effects on the cell not reflected by viral load and that LvHSF1 may play additional roles in protecting the organism from these effects of WSSV infection, such as perhaps, perturbations to protein homeostasis. This is worth discussing, especially in light of the rather complicated roles of hormesis in protection from infection, the role of HSF1 in hormesis responses, and the findings from other groups that the authors discuss.

      We are grateful for the reviewer's unbiased advice. We have added a description of the role of HSF1 in hormesis responses to the Discussion (Lines 422-425 of the revised manuscript). Thank you.

      Reviewer #2 (Public review):

      Temperature is a critical factor affecting the progression of viral diseases in vertebrates and invertebrates. In the current study, the authors investigate mechanisms by which high temperatures promote anti-viral resistance in shrimp. They show that high temperatures induce HSF1 expression, which in turn upregulates AMPs. The AMPs target viral envelope proteins and inhibit viral infection/replication. The authors confirm this process in drosophila and suggest that there may be a conserved mechanism of high-temperature mediated anti-viral response in arthropods. These findings will enhance our understanding of how high temperature improves resistance to viral infection in animals.

      The conclusions of this paper are mostly well supported by data, but some aspects of data analysis need to be clarified and extended. Further investigation on how WSSV infection is affected by AMP would have strengthened the study.

      We are grateful for the reviewer's unbiased advice. We have provided additional experimental evidence and supplementary explanations in the revised manuscript. Thank you.

      Reviewer #3 (Public review):

      In the manuscript titled "Heat Shock Factor Regulation of Antimicrobial Peptides Expression Suggests a Conserved Defense Mechanism Induced by Febrile Temperature in Arthropods", the authors investigate the role of heat shock factor 1 (HSF1) in regulating antimicrobial peptides (AMPs) in response to viral infections, particularly focusing on febrile temperatures. Using shrimp (Litopenaeus vannamei) and Drosophila S2 cells as models, this study shows that HSF1 induces the expression of AMPs, which in turn inhibit viral replication, offering insights into how febrile temperatures enhance immune responses. The study demonstrates that HSF1 binds to heat shock elements (HSE) in AMPs, suggesting a conserved antiviral defense mechanism in arthropods. The findings are informative for understanding innate immunity against viral infections, particularly in aquaculture. However, the logical flow of the paper can be improved.

      We are grateful for the reviewer's positive comments and unbiased advice. We have improved the logical flow of the paper and added corresponding explanations in the revised manuscript. Thank you.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1: The analysis compares Group TW to Group W (not the other way around).

      Thank you very much. To uncover the molecular mechanisms by which high temperature restricts WSSV infection, two shrimp groups, Group TW and Group W, were cultured at 25 °C. Group W comprised shrimp injected with WSSV and maintained at 25 °C continuously. In contrast, Group TW was subjected to a temperature increase to 32 °C at 24 hours post-injection (hpi). Gill samples were collected for analysis 12 hours post-temperature rise (hptr) and subjected to Illumina sequencing (Figure 1A). RNA-seq was used to identify genes responsive to high temperature, particularly those encoding potential transcriptional regulators. Thank you.

      (2) The RNA-seq data in Figure 1 focus only on the TFs. The manuscript would benefit from showing all the RNA-seq data and the differentially expressed genes. In particular, are the AMPs upregulated at the same time point? This should not be the case if LvHSF1 were responsible for the transcription of the AMPs, given the time lag between transcription and translation.

      Thank you for your suggestion. As shown in Author response image 1, our previous study revealed by RNA-seq that classical heat shock proteins (such as HSP21, HSP70, HSP60, HSP83, HSP90, HSP27, HSP10, and Bip) were induced in the comparison between Group TW and Group W, suggesting that heat shock proteins play a crucial role in enhancing the resistance of shrimp to WSSV at elevated temperatures (32 ℃) and underscoring the reliability of our transcriptomic findings (Xiao et al., 2024).

      Additionally, we analyzed AMP expression between Group TW and Group W, and the results show that some antimicrobial peptides, such as Lysozyme and C-type lectin, are upregulated between the two groups. Notably, we did not detect upregulated expression of SWD between Group TW and Group W. We agree with the reviewer that there is a time lag between transcription and translation. Additional experimental evidence shows that the expression of LvHSF1 is strongly induced by WSSV stimulation, and that the expression of SWD then begins to increase. We have added a description in Lines 136-138 of the revised manuscript.

      Author response image 1.

      The Figure of the heat shock proteins in Group TW and Group W

      Author response image 2.

      Transcriptional expression levels of HSF1 and SWD after WSSV stimulation

      Reference:

      Xiao, B., Wang, Y., He, J., Li, C., 2024. Febrile Temperature Acts through HSP70-Toll4 Signaling to Improve Shrimp Resistance to White Spot Syndrome Virus. J Immunol 213, 1187-1201.

      (3) The data showing the tissue distribution of LvHSF1 and nSWD is a rigorous approach and adds to the manuscript. A similar approach to understanding the time course of expression of AMPs in relationship to LvHSF1 expression levels would strengthen the authors' conclusions that LvHSF1 induction in response to high temperatures and viral infection, in turn, upregulates SWD and other antibacterial genes.

      Thank you for your suggestion. As suggested, we measured the transcriptional expression levels of HSF1 and SWD at 0, 2, 4, 6, 8, 12, 16, 20, and 24 hours after WSSV stimulation. The transcriptional expression level of SWD was set to 1.00 at 0 h. In the early stage of WSSV infection (0-12 h, except 6 h), the expression of LvHSF1 is strongly induced, and the expression of SWD then begins to increase. These results show that LvHSF1 induction in response to viral infection, in turn, upregulates SWD and other antibacterial genes. Thank you.
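
      The normalization described here (expression set to 1.00 at 0 h) and the comparison of induction timing can be sketched as follows. This is only an illustration under stated assumptions: the sampling times come from the response, but every expression value, the function names, and the 2-fold induction threshold are hypothetical and are not taken from the authors' data or methods.

```python
from typing import Dict, Optional


def fold_change_vs_baseline(levels: Dict[int, float], baseline_h: int = 0) -> Dict[int, float]:
    """Rescale a time course so that the baseline time point equals 1.00."""
    base = levels[baseline_h]
    if base <= 0:
        raise ValueError("baseline expression must be positive")
    return {t: v / base for t, v in levels.items()}


def first_induction_time(fold: Dict[int, float], threshold: float = 2.0) -> Optional[int]:
    """Earliest time point whose fold change reaches the (assumed) threshold."""
    for t in sorted(fold):
        if fold[t] >= threshold:
            return t
    return None


# Hypothetical relative expression values at the sampling times (hours after stimulation).
hsf1_raw = {0: 0.8, 2: 2.0, 4: 2.4, 6: 1.0, 8: 2.8, 12: 3.2, 16: 1.6, 20: 1.2, 24: 1.0}
swd_raw = {0: 1.5, 2: 1.5, 4: 1.8, 6: 1.6, 8: 2.0, 12: 2.4, 16: 3.6, 20: 4.5, 24: 4.2}

hsf1_fold = fold_change_vs_baseline(hsf1_raw)
swd_fold = fold_change_vs_baseline(swd_raw)

hsf1_onset = first_induction_time(hsf1_fold)
swd_onset = first_induction_time(swd_fold)
```

      With such a comparison, an earlier onset for HSF1 than for SWD is compatible with HSF1 acting upstream, although timing alone cannot establish the regulatory link.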

      (4) The data (Figures 3 and 4) show that LvHSF1 is necessary to survive WSSV infection at high temperatures but does not affect survival at lower temperatures, even though LvHSF1 limits VP28 levels, and viral load at both temperatures is confusing. Does this suggest that LvHSF1 is not primarily important for protection against the virus but instead, for protection from the heat-induced damage caused by high temperatures, which would not be surprising? The manuscript would benefit if the authors could address this point. How do the authors envision the protection conferred by LvHSF1 only at high temperatures?

      Thank you for your comment. Although no significant difference in shrimp survival rates was observed between LvHSF1-silenced shrimp and GFP-silenced shrimp at low temperature (25 °C), shrimp with silenced LvHSF1 exhibited increased viral loads in hemocytes and gills, suggesting that upregulation of HSF1 expression can protect shrimp from WSSV infection.

      Notably, the tolerance temperature for L. vannamei growth ranges from 7.5 to 42 °C. When infected with WSSV, shrimp use behavioral fever to elevate their body temperature (~32 °C), thereby inhibiting WSSV infection (Rakhshaninejad et al., 2023; Xiao et al., 2024). This temperature (~32 °C) does not cause heat-induced damage to the shrimp. Our results demonstrate that febrile temperatures induce HSF1, which in turn upregulates antimicrobial peptides (AMPs) that target viral envelope proteins and inhibit viral replication.

      Only at high temperatures did we observe that knockdown of HSF1 affected shrimp survival rate (Figure 4A). Thank you again for your valuable feedback.

      Reference:

      Rakhshaninejad, M., Zheng, L., Nauwynck, H., 2023. Shrimp (Penaeus vannamei) survive white spot syndrome virus infection by behavioral fever. Sci Rep 13, 18034.

      Xiao, B., Wang, Y., He, J., Li, C., 2024. Febrile Temperature Acts through HSP70-Toll4 Signaling to Improve Shrimp Resistance to White Spot Syndrome Virus. J Immunol 213, 1187-1201.

      (5) Related to the previous comment, the authors do not clearly distinguish between basal effects of LvHSF1 or nSWD induction and heat-induced effects and the differences related to the requirement of LvHSF1 for protection. Simply increasing LvHSF1 levels can result in increased nSWD. SWD levels increase upon WSSV infection even at 25 ℃, and the knockdown experiments suggest that this could also occur through LvHSF1. It would be useful to explicitly differentiate between basal functions of HSF1 and induced functions.

      Thank you for your suggestion. In previous responses, we have distinguished between basal effects of LvHSF1 or nSWD induction and heat-induced effects.

      As suggested, we injected GST or rHSF1 protein into shrimp. The results showed that recombinant HSF1 protein significantly induced the expression of SWD (Supplementary Fig. 5C). Further, after knockdown of SWD, shrimp were injected with rLvHSF1 mixed with WSSV. The results showed that the viral load was significantly lower than in the control group 48 hours post WSSV infection (Supplementary Fig. 5D). We have added these results to Supplementary Figure 5C&5D and added a description in Lines 253-255 and Lines 290-293 of the revised manuscript. Thank you for your constructive comments.

      Reviewer #2 (Recommendations for the authors):

      (1) Two temperatures are used in the experiments of shrimp. It seems that HSF1 is also upregulated by WSSV infection at 25 ℃. However, this upregulation seems not to be able to protect the animals. The authors compare the infection at 25 and 32 ℃ but did not discuss the findings.

      Thank you for your comment. Although no significant difference in shrimp survival rates was observed between LvHSF1-silenced shrimp and GFP-silenced shrimp at low temperature (25 °C), shrimp with silenced LvHSF1 exhibited increased viral loads in hemocytes and gills, suggesting that upregulation of HSF1 expression can protect shrimp from WSSV infection. We have added a discussion of this finding in Lines 461-464 in the revised manuscript. Thank you.

      (2) In the abstract the authors say that "These insights provide new avenues for managing viral infections in aquaculture and other settings by leveraging environmental temperature control." However, this point has not been discussed in the main text.

      We appreciate your comments. We have added a discussion of environmental temperature control in Lines 512-514 of the revised manuscript. Thank you.

      (3) Line 142: "These results suggest that LvHSF1 may play a key role in enhancing shrimp resistance to WSSV at elevated temperatures." Although this type of conclusion has been made in many studies, I think it is impossible to see a "KEY role" based mainly on change in expression.

      Thank you for your suggestion. We have revised this conclusion in the revised manuscript. Thank you.

      (4) Section 2.1 Induction of Heat Shock Factor 1 in Response to WSSV at High Temperature

      Figure 1. Identification of HSF1 as a key factor induced by high temperature.

      The two titles are confusing. Is the upregulation of HSF1 a response to high temperature or to WSSV infection? I think it is more likely a response to high temperature. Did the authors see a difference in HSF1 expression in shrimp with and without WSSV infection at high temperatures?

      Thank you for your comment. We have modified the title of Section 2.1 in the revised manuscript. As suggested, we measured the expression of LvHSF1 after WSSV challenge at high temperature (32 ℃); the results are shown in revised Figure 2F-2H (Line 122 of the revised manuscript) and demonstrate that LvHSF1 expression is strongly induced by WSSV stimulation at high temperature (32 ℃). Thank you.

      (5) Figure 2. Upregulation of LvHSF1 in shrimp challenged by WSSV at both low and high temperatures. Results for WSSV challenge at high temperatures are not included in this figure.

      Thank you for your suggestion. As suggested, we measured the expression of LvHSF1 after Poly (I: C) and WSSV challenge at high temperature (32 ℃); see revised Figure 2C-2H. The results demonstrate that LvHSF1 expression is strongly induced by Poly (I: C) and WSSV stimulation at high temperature (32 ℃). We have added a description in Lines 168-179 of the revised manuscript. Thank you.

      (6) Section 2.2 Expression Profiles of LvHSF1 in Shrimp Under Varied Temperature Conditions and WSSV Challenge. Did the authors try poly IC and WSSV challenge at 32℃, and compare with the unchallenged group? Why was only low temperature analyzed?

      Thank you for your suggestion. As suggested, we measured the expression of LvHSF1 after Poly (I: C) and WSSV challenge at high temperature (32 ℃); see revised Figure 2C-2H. We have added a description of these results in Lines 168-179 of the revised manuscript. Thank you.

      (7) Figure 2: Please indicate the temperature used in C-E and F-H in the figure legend. Statistical significance: compared with which group? Please provide information in the legend or show it in the bar chart.

      Thank you for your suggestion. We have added the description of temperature used in revised Figures 2C-2E. The expression changes of HSF1 were compared with those of PBS control group at the corresponding time and we modified the comparison method of significance in revised Figures 2C-2E. Thank you.

      (8) Figure 3H: There are two groups (dsGFP+PBS; dsHSF1+PBS) showing with the same symbol (dot line).

      Thank you for your comment. The revised Figure 3H has used different symbols to distinguish the two groups. Thank you.

      (9) Line 205: qPCR

      Thank you for your careful checks. We have corrected this error in the revised manuscript. Thank you.

      (10) Figure 5d and f: Please indicate the sample in each row.

      Thank you for your suggestion. We have marked the samples in each row in the revised Figures 5d&5f.

      (11) Figure 3 and Figure 4: Why different tissues were analyzed in the two experiments? Low temperature: gill and hemocytes. High temperature: gill and muscle? It is better to use the same tissues so that they can be compared. Please indicate the tissue analyzed in D and d.

      Thank you for your suggestion. We have repeated the experiment to measure the WSSV copy number in hemocytes at high temperature (32 °C) after LvHSF1 knockdown. The results showed that LvHSF1 knockdown increased viral loads in shrimp hemocytes (Figure 4C). We have added the tissue information to Figure 4D&4d. Thank you.

      (12) Figure 2A The time for temperature treatment? hours or days?

      Thank you for your comment. The temperature treatment lasted 12 hours: transcriptional expression of LvHSF1 was measured in different tissues of healthy shrimp subjected to low (25 °C) and high (32 °C) temperatures for 12 hours. We have added this information to the legend of Figure 2A in Lines 840-841 of the revised manuscript. Thank you.

      (13) Line 249: purified by SDS-PAGE gel?

      Thank you for your comment. We have modified this description in Lines 272-274 in current manuscript. Thank you.

      (14) Line 258 "Next, to verify whether the anti-WSSV function of nSWD was mediated by LvHSF1 at high temperature". I think it is confusing to use "mediated" here. It seems that HSF1 is downstream of nSWD. Actually, HSF1 controls the expression of nSWD and thus regulates the anti-WSSV effect of shrimp at high temperatures.

      We appreciated your comments. We have modified this description in Lines 282-283 in current manuscript. Thank you.

      (15) Line 458 "The most probable anti-WSSV mechanism of nSWD is its direct interaction with WSSV envelope proteins VP24 and VP26, potentially inhibiting viral entry into target cells." I suggest the authors analyze the entry of WSSV to see whether nSWD blocks this process.

      Thank you for your comment. In general, the antimicrobial mechanism of action of AMPs is thought to involve direct membrane disruption, especially for enveloped viruses (such as WSSV) (Wilson et al., 2013).

      We thank the reviewers for their valuable comments. Our manuscript mainly focuses on the febrile temperature-inducible HSF in host antiviral immunity, and the role of HSF1 in regulating antimicrobial effectors (such as SWD). Due to limits on the manuscript's length, we will further investigate the specific anti-WSSV mechanisms of SWD in future studies. Thank you.

      Reference:

      Wilson, S.S., Wiens, M.E., Smith, J.G., 2013. Antiviral Mechanisms of Human Defensins. Journal of Molecular Biology 425, 4965-4980.

      (16) Line 435-456 The author discusses the difference between two shrimp species. Did the two studies measure the same immune parameters? I wonder whether the different observation is due to true differences or different methods they used to evaluate the response. If no immune response was promoted in the previous study, what's the possible anti-viral mechanism?

      We appreciate your comments. Firstly, the shrimp species used in the two studies differ in their adaptability to temperature. The optimal water temperature for M. japonicus growth ranges from 25 to 32 °C, whereas the tolerance temperature for L. vannamei growth ranges from 7.5 to 42 °C. Secondly, the environmental factors differ between the two studies. Ammonia is a key stress factor in aquatic environments that usually increases the risk of pathogenic diseases in aquatic animals, whereas high temperatures (32 °C) have been shown to inhibit the replication of WSSV and reduce mortality in WSSV-infected shrimp. Thirdly, the two studies tested different immune indicators. Ammonia-induced Hsf1 suppressed the production and function of MjVago-L, an arthropod interferon analog. In this study, our findings revealed the molecular mechanism through which the HSF-AMPs axis mediates febrile temperature-induced host resistance to viruses. Taken together, HSF1 can benefit either the host or the pathogen, depending on the nature and context of the host-virus-environment interaction.

      (17) Line 472 "directly bind to WSSV envelope proteins and inhibit WSSV proliferation"

      I think it is confusing to use "proliferation" here. It seems that the binding of HSF affects the replication process. However, based on the authors' discussion, HSF may likely block viral entry.

      Thank you for your suggestion. We have modified this description in Lines 505-507 in the current manuscript. Thank you.

      Reviewer #3 (Recommendations for the authors):

      In the manuscript titled "Heat Shock Factor Regulation of Antimicrobial Peptides Expression Suggests a Conserved Defense Mechanism Induced by Febrile Temperature in Arthropods", the authors investigate the role of heat shock factor 1 (HSF1) in regulating antimicrobial peptides (AMPs) in response to viral infections, particularly focusing on febrile temperatures. Using shrimp (Litopenaeus vannamei) and Drosophila S2 cells as models, this study shows that HSF1 induces the expression of AMPs, which in turn inhibit viral replication, offering insights into how febrile temperatures enhance immune responses. The study demonstrates that HSF1 binds to heat shock elements (HSE) in AMPs, suggesting a conserved antiviral defense mechanism in arthropods. The findings are informative for understanding innate immunity against viral infections, particularly in aquaculture. However, the logical flow of the paper can be improved. Following are my specific concerns.

      Major comments

      (1) The study design is pretty good, but the logical flow is not. The following should be improved.

      (a) In Figure 1, the reason for selecting HSF1 as the focus of the study is not clearly explained.

      Thank you for your comment. In a previous study, we revealed that heat shock proteins played a significant role in enhancing the resistance of shrimp to WSSV at elevated temperature (32 ℃) (Xiao et al., 2024). GO functional enrichment analysis of DEGs between group TW and group W indicated that most DEGs were involved in biological processes such as protein refolding, chaperone-mediated protein folding, and heat response. Therefore, special attention has been paid to heat shock factor 1 (HSF1), the master regulator of the heat shock response. We have added the description in Lines 136-138 in the revised manuscript. Thank you.

      Reference:

      Xiao, B., Wang, Y., He, J., Li, C., 2024. Febrile Temperature Acts through HSP70-Toll4 Signaling to Improve Shrimp Resistance to White Spot Syndrome Virus. J Immunol 213, 1187-1201.

      (b) As the authors draw models in Figure 9, the established activation mechanism of HSF1 is via trimerization by the release of HSP90, which binds to misfolded proteins under stress conditions, such as heat shock. Therefore, the increase in the HSF1 mRNA level in Figure 1 is strange. The authors need to clarify this issue by explaining this established activation mechanism of HSF1 and also must provide the basis of upregulation of HSF1 by mRNA increase via citing papers in the Introduction.

      We appreciate your comments. Under non-stress conditions, HSF monomers are retained in the cytoplasm in a complex with HSP90. During a stress response, such as to high temperature, HSF dissociates from the complex, trimerizes, and converts into a DNA-binding conformation that recognizes regulatory upstream promoter elements known as heat shock elements (HSEs) (Andrasi et al., 2021). Previous studies have demonstrated that the expression of HSF1 was remarkably induced by stresses such as high temperature (Ren et al., 2025), virus infection (Merkling et al., 2015), and ammonia stress (Wang et al., 2024). Our results also showed that the expression of LvHSF1 was significantly induced by WSSV infection and high temperature (Figure 2). Therefore, the increase in the HSF1 mRNA level in Figure 1 is not surprising.

      In response, we have revised the proposed model to better reflect our experimental findings and the accompanying description. This revision ensures that the schematic is consistent with our data and accurately represents the proposed mechanism. We appreciate your careful review and constructive feedback.

      Reference:

      Andrasi, N., Pettko-Szandtner, A., Szabados, L., 2021. Diversity of plant heat shock factors: regulation, interactions, and functions. J Exp Bot 72, 1558-1575.

      Ren, Q., Li, L., Liu, L., Li, J., Shi, C., Sun, Y., Yao, X., Hou, Z., Xiang, S., 2025. The molecular mechanism of temperature-dependent phase separation of heat shock factor 1. Nature Chemical Biology.

      Merkling, S.H., Overheul, G.J., van Mierlo, J.T., Arends, D., Gilissen, C., van Rij, R.P., 2015. The heat shock response restricts virus infection in Drosophila. Sci Rep 5, 12758.

      Wang, X.X., Zhang, H., Gao, J., Wang, X.W., 2024. Ammonia stress-induced heat shock factor 1 enhances white spot syndrome virus infection by targeting the interferon-like system in shrimp. mBio 15, e0313623.

      (c) For RNA seq analysis in both in Figures 1 and 5, they need to provide changes in conventional HSF1 target chaperones (many HSPs) to validate their RNA seq data.

      Thank you for your suggestion. As shown in Author response image 1, our previous study revealed by RNA-seq that classical heat shock proteins (such as HSP21, HSP70, HSP60, HSP83, HSP90, HSP27, HSP10, and Bip) were induced in Group TW compared with Group W, suggesting that heat shock proteins play a crucial role in enhancing the resistance of shrimp to WSSV at elevated temperatures (32 ℃) and underscoring the reliability of our transcriptomic findings (Xiao et al., 2024). We have added the description in Lines 136-138 in the revised manuscript.

      For Figure 5, we have added the heat shock protein genes among the downregulated DEGs identified by transcriptome sequencing of dsGFP +WSSV (32 ℃) vs. dsLvHSF1 +WSSV (32 ℃) to Supplementary table 2. The results showed that the classical heat shock proteins were downregulated in the RNA-seq data, underscoring the reliability of our transcriptomic findings. We have added the description in Lines 213-216 in the revised manuscript. Thank you.

      Reference:

      Xiao, B., Wang, Y., He, J., Li, C., 2024. Febrile Temperature Acts through HSP70-Toll4 Signaling to Improve Shrimp Resistance to White Spot Syndrome Virus. J Immunol 213, 1187-1201.

      (d) In Figure 5, they did experiments by focusing on the changes by HSF1 knockdown at 32 ℃. However, the logical flow should be focusing on genes whose expression was increased by 32 ℃ compared with 25 ℃ (in figure 1), among them they need to characterize HSF1 target genes. Here as mentioned above, classical HSP genes must be included in addition to those AMP genes.

      Thank you for your suggestion. Following your suggestion, we have added the heat shock protein genes among the downregulated DEGs identified by transcriptome sequencing of dsGFP +WSSV (32 ℃) vs. dsLvHSF1 +WSSV (32 ℃) to Supplementary table 2. The results showed that the classical heat shock proteins were downregulated in the RNA-seq data, underscoring the reliability of our transcriptomic findings. We have added the description in Lines 213-216 in the revised manuscript. Thank you.

      (e) What is the logical basis of just picking nSWD? It is another example of cherry-picking similar to picking HSF1 in Figure 1.

      We appreciate your comments. To determine how temperature-induced LvHSF1 restricts WSSV infection, RNA-seq was performed to identify target genes regulated by HSF1. By analyzing the differentially expressed genes (DEGs), we screened eight candidate proteins for immunity-effector molecules, including SWD, CrustinⅠ, C-type lectin, Anti-lipopolysaccharide factor (ALF), and Vago. CrustinⅠ has been shown to play an important role in antiviral immunity (Li et al., 2020); C-type lectin (CTL1) can bind to VP28, VP26, VP24, VP19, and VP14, thereby inhibiting WSSV infection (Zhao et al., 2009); Anti-lipopolysaccharide factor (ALF3) exerts its anti-WSSV activity by binding to the envelope protein WSSV189 (Methatham et al., 2017); and Vago can inhibit WSSV infection by activating the Jak/Stat pathway in shrimp (Gao et al., 2021). By contrast, the detailed mechanism by which SWD acts against WSSV was unclear, so we paid particular attention to SWD. We have added the description in Lines 215-220 in the revised manuscript. Thank you for your valuable comments; the logic of the manuscript has been improved.

      Reference:

      Li, S., Lv, X., Yu, Y., Zhang, X., Li, F., 2020. Molecular and Functional Diversity of Crustin-Like Genes in the Shrimp Litopenaeus vannamei. Marine Drugs 18, 361.

      Zhao, Z.Y., Yin, Z.X., Xu, X.P., Weng, S.P., Rao, X.Y., Dai, Z.X., Luo, Y.W., Yang, G., Li, Z.S., Guan, H.J., Li, S.D., Chan, S.M., Yu, X.Q., He, J.G., 2009. A novel C-type lectin from the shrimp Litopenaeus vannamei possesses anti-white spot syndrome virus activity. Journal of Virology 83, 347-356.

      Methatham, T., Boonchuen, P., Jaree, P., Tassanakajon, A., Somboonwiwat, K., 2017. Antiviral action of the antimicrobial peptide ALFPm3 from Penaeus monodon against white spot syndrome virus. Dev Comp Immunol 69, 23-32.

      Gao, J., Zhao, B.R., Zhang, H., You, Y.L., Li, F., Wang, X.W., 2021. Interferon functional analog activates antiviral Jak/Stat signaling through integrin in an arthropod. Cell Rep 36, 109761.

      (f) Likewise, choosing Atta in S2 cells needs logic.

      We appreciate your comments. Our manuscript revealed that febrile temperature-inducible HSF1 confers virus resistance by regulating the expression of antimicrobial peptides (AMPs) in L. vannamei. We then wanted to know whether HSF1 regulation of antimicrobial peptides is a conserved defense mechanism induced by elevated temperature in arthropods, so experiments were performed in an invertebrate model system (Drosophila S2 cells). A previous study showed that DmAMPs (such as Attacin A, Cecropins A, Defensin, Metchnikowin, and Drosomycin) played a significant role in antiviral immunity in Drosophila (Zhu et al., 2013). Our results showed that the expression of Attacin A, Cecropins A and Defensin was remarkably induced by DmHSF, and that Attacin A was the most strongly induced. Therefore, DmAtta was chosen as a representative to further demonstrate that DmHSF1 exerts its anti-DCV function by regulating DmAMPs. We have added the description in Lines 328-330 and Lines 361-364 in the revised manuscript. Thank you for your valuable comments; the logic of the manuscript has been improved.

      Reference:

      Zhu, F., Ding, H., Zhu, B., 2013. Transcriptional profiling of Drosophila S2 cells in early response to Drosophila C virus. Virol J 10, 210.

      (2) From Figure 6I to 6K, the authors aimed to verify whether the anti-WSSV function of nSWD was mediated by LvHSF1 at high temperatures. However, what they showed was only that nSWD plays an anti-WSSV function downstream of HSF1. The authors should show additional data for dsControl+rnSWD.

      Thank you for your suggestion. Following your suggestion, after knockdown of SWD, shrimp were injected with rLvHSF1 mixed with WSSV. The results showed that the viral load was significantly lower than in the control group at 48 hours post WSSV infection (Supplementary Fig. 5D). We have added these results to Supplementary Figure 5C&5D and added a description in Lines 290-293 in the revised manuscript. Thank you for your constructive comments.

      (3) For the physical interaction between nSWD and WSSV, it will be great if the authors perform Alphafold3 prediction analysis (Abramson et al PMID: 38718835).

      Thank you for your suggestion. Following your suggestion, we performed Alphafold3 prediction analysis on SWD and WSSV (VP24 and VP26). The predicted template modeling (pTM) score measures the accuracy of the entire structure. A pTM score above 0.5 means the overall predicted fold for the complex might be similar to the true structure. The Alphafold3 prediction results show that there is a possible interaction between SWD and WSSV. Notably, our manuscript demonstrated that rSWD could interact with VP24 and VP26 by pulldown assays and confocal analysis.

      Author response image 3.

      Alphafold3 prediction analysis of SWD&VP24 as follows (pTM = 0.64)

      Author response image 4.

      Alphafold3 prediction analysis of SWD&VP26 as follows (pTM = 0.53)

      Minor comments

      (1) In the Abstract and many other places, the authors need to specifically write "Drosophila S2 cells" instead of "Drosophila" because conventionally Drosophila implies fruit fly as an organism. We don't say cultured human cells as "human" or "Homo sapiens" in papers.

      Thank you for your suggestion. We have modified the description of Drosophila in the revised manuscript. Thank you.

      (2) Figure numbers can be reduced for better readability. I would combine Figures 1 and 2, and Figures 3 and 4. If the combined figures are too crowded, some can go into supplementary figures.

      Thank you for your suggestion. We have moved the Poly (I: C) data to Supplementary Figure 2 in the revised manuscript. However, we have added some experimental data to Figures 1, 2, 3, and 4. Therefore, we did not combine Figure 1 and Figure 2, and Figures 3 and 4. Thank you.

      (3) One of the best-understood roles of HSF1 in physiology other than heat shock response is longevity, in particular with C. elegans. The authors need to mention this in the Discussion by citing the following recent review paper (Lee PMID: 36380728).

      Thank you for your suggestion. We have supplemented the description of HSF1 regulating longevity and aging of organisms and cited the above reference in the revised manuscript (Lee and Lee, 2022). Thank you.

      Reference:

      Lee, H., Lee, S.V., 2022. Recent Progress in Regulation of Aging by Insulin/IGF-1 Signaling in Caenorhabditis elegans. Mol Cells 45, 763-770.

      (4) Please make your own label for small letter panels or transfer small letter panels to supplementary figures.

      Thank you for your suggestion. We have adjusted the relevant letter labels. The uppercase letters represent the main image of the Figure, and the small letter panels are the corresponding supplementary instructions in the revised manuscript. Thank you.

      (5) In the introduction part, I recommend changing the references for HSFs and HSR with recent ones.

      Thank you for your suggestion. We have added the latest references for HSFs and HSR in the Introduction part of the revised manuscript. Thank you.

      (6) In Figure 1, it is not intuitive to understand the name groups W and TW.

      We appreciated your comments. We have added the description of Group W and Group TW in revised Figure 1. Group W comprised shrimp injected with WSSV and maintained at 25 °C continuously. In contrast, Group TW was subjected to a temperature increase to 32 °C at 24 hours post-injection (hpi). Gill samples were collected for analysis 12 hours post-temperature rise (hptr) and subjected to Illumina sequencing. Thank you.

      (7) Please add some kinds of sequence comparisons of SWD and nSWD for readers to understand the homology.

      We appreciate your comments. We have added the multiple sequence alignment of SWD proteins in shrimp species in revised Supplementary Figure 3. Highly conserved amino acid residues and cysteine residues are highlighted in red, indicating that LvSWD is a conserved antimicrobial peptide of the Crustin family. Thank you.

      (8) Naming nSWD with "newly identified" is strange as it will not be new anymore as time goes by. Please change the name.

      Thank you for your suggestion. We have modified the name of nSWD to SWD in the revised manuscript. Thank you.

      (9) Please write the full name for Lv (Litopenaeus vannamei), Dm (Drosophila melanogaster), ds (double-stranded) before using LvHSF1, DmHSF1, and dsLvHSF1.

      Thank you for your comments. We have added the full name of LvHSF1, DmHSF1, and dsLvHSF1 in the revised manuscript. Thank you.

      (10) In Figure 2, it will be better to transfer poly I:C data to supplementary figures.

      Thank you for your comments. We have moved the Poly (I: C) data to Supplementary Figure 2 in the revised manuscript. Thank you.

      (11) The label for pGL3-nSWD-M12 is confusing. M1 and M2 are OK. Please change M12 with M1/2 or another one.

      Thank you for your suggestion. We have changed pGL3-nSWD-M12 to pGL3-nSWD-M1/2 in the revised manuscript. Thank you.

    1. Author Response:

      The following is the authors’ response to the previous reviews

      eLife Assessment

      This article presents useful findings on how the timing of cooling affects the timing of autumn bud set in European beech saplings. The study leverages extensive experimental data and provides an interesting conceptual framework for the various ways in which warming can affect bud set timing. The statistical analysis is compelling, but indicates some factors that may temper the authors' claims, while the designs of experiments offer incomplete support for the current claims as they rely on one population under extreme conditions for only one year each while a confounding effect (time in a chamber) sometimes lacks a control.

      We thank the editor and reviewers for their consideration of our revised manuscript and for their constructive suggestions. In response to the editor’s guidance, we have ensured that: 1) the experimental design is clearly presented as physiological forcing, 2) the Solstice-as-Phenology-Switch concept is explicitly defined, limited, and framed as inferred, 3) conclusions are strictly aligned with the scope of the evidence, and limitations are acknowledged transparently.

      We hope these revisions fully address the remaining concerns and clarify both the conceptual framework and the appropriate scope of inference.

      Public Review:

      Reviewer #1 (Public review):

      The authors identified the summer solstice (June 21) as a phenological "switch point", but the flexibility of this switch point remains poorly understood. A more precise explanation of what "flexibility" means in this context is needed, along with a description of the specific experimental results that would demonstrate this flexibility.

      We agree that the concept of “flexibility” required clearer definition and a more explicit link to the experimental results. In the Introduction, we now explicitly define flexibility as the capacity for the effective timing of the phenological switch to shift earlier or later depending on developmental progression, rather than occurring at a fixed calendar date. This switch occurs at the compensatory point between the antagonistic influences of early-season development [ESD effect] and late-season temperature [LST effect] (L92-98). We have extended and clarified our explanation of the summer solstice’s role in this framework (L69-90). We propose that the solstice acts as an environmental switch that initiates the LST effect, as declining daylengths signal trees to become responsive to late-season cooling (L92-94). The compensatory point then occurs where the advancing ESD effect is balanced by the delaying LST effect. This point should therefore not be fixed to a calendar date but instead vary with developmental progression each year (L75-95).

      In the Discussion, we clarify that flexibility is demonstrated experimentally by the observation that the magnitude of July cooling effects (LST effect) on autumn phenology depends on prior developmental rate (ESD effect) [3.4 times greater delay in late-leafing trees], indicating that the position of the compensatory point is development-dependent rather than fixed to June 21 (L398-410). We have made consistent edits throughout the Discussion, in particular in the ‘Support for the Solstice-as-Phenology-Switch Hypothesis’ subsection (L514-530).

      The experiment did not directly measure the specific date of the phenological switch point. Instead, it was inferred by comparing temperature effects before and after the solstice. The manuscript should clearly state that this switch point remains an inferred conceptual node rather than a directly measured variable.

      We fully agree and have clarified this in the revised manuscript. In the Discussion, we now clearly state that the compensatory point is a conceptual node inferred from responses to cooling before the solstice (June), directly after it (July), or later in the growing season (August) rather than a directly observed phenological event (L352-358 & L405-406).

      In Experiment 1, the effect of bud type (terminal vs. lateral) was inconsistent across the overall model and the different leafing groups. The authors should provide a more thorough discussion of potential reasons for this inconsistency.

      This inconsistency reflects biological complexity. In the Discussion, we now expand our interpretation to note that terminal and lateral buds may differ in developmental status, resource allocation and hormonal context. We emphasize that bud-type effects are therefore expected to be context-dependent and to interact with whole-plant developmental state, which plausibly explains why effects differ across leafing groups and models (L390-396).

      In addition, the statistical model for Experiment 1 indicates that the measured variables (summer cooling and leaf emergence date) explain only 23.4% of the variation in bud formation timing. This leaves over 76% of the variation unexplained, suggesting that other important factors are involved. The discussion should address this limitation in greater depth, moving beyond a focus on the measured variables.

      We now discuss the explained and unexplained variance in more detail. We also make it clear that our experiment was designed to test specific mechanistic pathways rather than to fully explain all phenological variability or maximise predictive power (L417-419).

      In the Discussion, we acknowledge that a substantial fraction of variation remains unexplained (L419-421). We discuss the possibility of other physiological mechanisms, such as photosynthetic assimilation, contributing to the unexplained variation (L421-427). However, large inter-individual variability is commonplace in autumn phenology. A low intra-class correlation coefficient (ICC = 0.26; see L276-280 for methods) suggests much of the remaining variation is attributable to individual-level differences rather than missing explanatory variables (L429-431). In line with the literature, we suggest that genetic and epigenetic differences likely contributed significantly to inter-individual variation, even within a single provenance population (L431-434). In this context of high individual variability, leaf-out timing (ESD effect) and summer cooling treatment (LST effect) together explaining 23.4% of variation in bud set timing is biologically meaningful and demonstrates the mechanistic importance of these processes (L438-441). For completeness, we also briefly discuss alternate sources of within-treatment variability (L434-437).

      Reviewer #2 (Public review):

      I think the experiments are interesting, but I found the exact methods of them somewhat extreme compared to how the authors present them.

      We appreciate this concern and have substantially revised the manuscript to clarify the experimental logic. In the Introduction, we now state explicitly that the study uses temperature regimes that were designed as strong physiological forcing treatments, intended to deeply constrain development and isolate mechanisms rather than to simulate natural or future climatic conditions (L113-115).

      In the Methods, we have enhanced our description of the non-linear effects of temperatures below 10°C on physiological processes (L154-158).

      At the start of the Discussion, we have added a dedicated paragraph clarifying the scope of inference: the experiment tests causality and constraint (i.e. whether specific physiological processes can drive phenological shifts), not quantitative responses under realistic climate scenarios (L346-363). Throughout the Discussion, we have revised language that could be read as scenario-based interpretation, replacing it with mechanistic phrasing.

      Further, given that much of the experiment happened outside, I am not sure how much we can generalize from one year for each experiment, especially when conducted on one population of one species.

      Given the large individual variation expected in phenological experiments, we used single experimental populations of single-provenance beech saplings to minimise uncontrolled variation arising from genetic differences (L358-360). This allowed us to elucidate mechanisms despite the noisy biological heterogeneity associated with phenology.

      In the last round of revision, we toned down statements of generalisation. In the Discussion, we now go further to clarify what mechanistic understanding can be gleaned directly from our findings and then cautiously suggest how these mechanisms may play out in natural systems. We repeatedly state the intention of the study as mechanistic inference rather than predictive power, e.g. “However, extrapolations to more complex natural ecosystems should be made with caution as our experimental design prioritised mechanistic inference over generalisability and predictive power.” (L417-419). Alongside our previous calls for tests on other species, we now additionally call for tests on other provenances of beech (L511-512).

      I was also very concerned by the revisions.

      If this concern stems from the confusion regarding line numbers and the two submitted versions of the manuscript (with tracked changes and without tracked changes; as required by eLife), then we hope that situation is now clarified. Otherwise, we do not understand why our previous revisions would be perceived as concerning. Regardless, we have made every attempt to address the remaining comments comprehensively.

      Further, I am at a loss about their hypothesis, when they write in their letter: "Importantly, the Solstice-as-Phenology-Switch hypothesis does not assume that the reversal is fixed to June 21." Why on earth reference the solstice if the authors do not mean to exactly reference the solstice?

      We appreciate this important conceptual point. The Solstice-as-Phenology-Switch hypothesis is central to our conceptual model and therefore requires clear explanation. In concert with our changes in response to Reviewer 1’s comment regarding flexibility, we have substantially revised and improved our description of this hypothesis (L69-108).

      Whilst the summer solstice is fixed to a calendar date (June 21), the timing of when trees change their autumn phenological responses to temperature is not (L88-90 & L515-517). This change occurs when the compensatory point of two antagonistic effects is crossed. Higher early-season development rates (which are driven by temperature) have an advancing (negative) effect on autumn phenology, which we now refer to as the ESD effect (L71-78). Warmer late-season temperatures have a delaying (positive) effect because trees become phenologically susceptible to cooling, i.e. overwintering responses are induced in response to cooling, which we now refer to as the LST effect (L78-82). The point in time when these two effects balance each other out, i.e. the net effect = 0, is the compensatory point (L95-97 & L523-525). This point occurs after the solstice because the LST effect only becomes active when days begin to shorten (L92-94 & L522-523). The solstice acts as an environmental switch, initiating trees’ susceptibility to cooling. Therefore, the solstice is referenced in the hypothesis because it forms a daylength barrier. In this framework, the compensatory point cannot occur earlier than the solstice because day lengths are still increasing (L517-519).
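
      The logic of the compensatory point can be illustrated with a minimal numerical sketch. This is a toy illustration only and is not part of the manuscript's analysis: the function names, the constant ESD effect, the linear growth of the LST effect after the solstice, and all numeric values are hypothetical assumptions chosen to make the arithmetic exact. It shows only that the point where the net effect reaches 0 shifts with the assumed strength of the ESD effect and cannot fall before the solstice.

```python
SOLSTICE_DOY = 172  # June 21 as day of year in a non-leap year


def net_effect(day, esd=-4.0, lst_slope=0.25, solstice=SOLSTICE_DOY):
    """Hypothetical net effect on autumn phenology at a given day of year.

    esd: assumed constant advancing (negative) early-season development effect.
    lst_slope: assumed daily increase of the delaying (positive) late-season
        temperature effect, which is zero until the solstice.
    """
    lst = lst_slope * max(0, day - solstice)
    return esd + lst


def compensatory_point(esd=-4.0, lst_slope=0.25, solstice=SOLSTICE_DOY, last_day=365):
    """Return the first day on or after the solstice where the net effect is >= 0."""
    for day in range(solstice, last_day + 1):
        if net_effect(day, esd, lst_slope, solstice) >= 0:
            return day
    return None  # the two effects never balance within the year


if __name__ == "__main__":
    for esd in (-4.0, -2.0, 0.0):
        print(esd, compensatory_point(esd=esd))
```

      Under these assumed values, a weaker ESD effect moves the balance point closer to the solstice, while the search never starts before day 172, which mirrors the daylength barrier described above.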

      In the Introduction and Discussion, we clarify that the solstice is referenced as a biologically meaningful photoperiodic cue, not as a fixed threshold date. We now emphasise that the hypothesis concerns a seasonal reversal in responses to temperature structured around photoperiod, whose effective timing depends on developmental state, rather than a reversal occurring precisely on June 21. To avoid confusion, we have reworded phrases such as “summer solstice effect reversal” to “reversal of phenological responses to temperature after the summer solstice” (L371). In accordance, we have also changed the title to “Developmental constraints mediate the reversal of temperature effects on the autumn phenology of European beech after the summer solstice”.

      The following comments stem from the first round of review. We have previously revised the manuscript in accordance with these comments. For most of these points we do not see further cause for changes except for any overlap with comments above. We therefore predominantly copy our previous responses in quotes for clarity, the exception being the comment regarding the framing of our results in relation to natural systems.

      The comments below relate to my original review with many of them still applying.

      Methods: As I read the Results I was surprised the authors did not give more info on the methods here. For example, they refer to the 'effect of July cooling' but never say what the cooling was. Once I read the methods I feared they were burying this as the methods feel quite extreme given the framing of the paper.

      “We understand the concern regarding the structure of the manuscript and note that the methods section was moved to the end of the paper in accordance with eLife’s recommended formatting. We have now moved the methods section before the results to ensure that readers are familiar with the treatments before encountering the outcomes.

      Regarding presentation, treatment details are now described in both the Methods and the relevant figure legends. Given this structure, we have chosen not to restate the full treatment conditions in the main Results text to avoid repetition.”

      The paper is framed as explaining observational results of natural systems, but the treatments are not natural for any system in Europe in which I have worked. For example a low of 2 deg C at night and 7 deg C during the day through end of May and then 7/13 deg C in July is extreme. I think these methods need to be clearly laid out for the reader so they can judge what to make of the experiment before they see the results.

      We appreciate the reviewer’s concern regarding the use of relatively extreme temperature treatments and the need to ensure that our conclusions are consistent with the motivation for using them. The manuscript was also revised in this regard in the previous round, and we copy the relevant responses at the bottom of this response. Despite this, we agree that further explanation of how our experimental treatments suited the aims of our study was still required.

      The aim of these treatments was not to reproduce typical ambient conditions, but to act as a mechanistic probe. Such mechanisms are not readily identifiable from observations or mild manipulations, because the expected effects are small relative to natural variability; stronger perturbations are therefore required to generate a diagnostic contrast. By strongly constraining development in the early-season, and by providing a robust cooling signal in the late-season, we sought to reveal the causal structure underlying the observed solstice-related reversal in temperature effects on autumn phenology.

      Temperatures below 10°C strongly slow cell division and mitotic rates, which then rapidly and non-linearly approach zero as temperatures drop towards 0°C (Körner, 2021). As reflected in L152-158 of the revised manuscript, we selected a spring cooling regime of 2–7 °C to strongly slow developmental processes while maintaining a clear thermal safety margin that eliminates the risk of frost damage. Although a milder cooling regime (e.g. 5–10 °C) would be less extreme, it would also be expected to produce only a comparatively small reduction in developmental rates, thereby substantially reducing our ability to generate distinct early- and late-developing individuals and to detect carry-over effects on autumn phenology. Applying strong cooling therefore increases signal-to-noise and allows us to detect the underlying mechanism, which would not be possible with temperature treatments that represent average contemporary climatic variation.

      The use of conditions outside the norm is standard practice for elucidating mechanisms in ecology, where organisms are often pushed to their physiological limits or transplanted into environments fundamentally different from those to which they are adapted (Somero, 2010; Berend et al., 2019). Experiments targeting autumn phenology have utilised a broad range of environmental conditions, from moderate to extreme manipulations (Tanino et al., 2010). For example, to test the controls of growth cessation and dormancy induction in Prunus species, one study applied a range of treatments including a constant 9°C temperature and a 24-hour photoperiod between April and July (Heide, 2008).

      Our experimental design aimed to reduce rates of development, cell division and maturation. In the Methods, we describe this aim and clearly state that the experimental design was not intended to mimic natural climatic variation (L154-156 & L181-186). Importantly, our conclusions are framed at the level of direction, timing, and interaction of effects, rather than the magnitude expected under contemporary or future field conditions (L360-363).

      This framing intends to reflect the primary inference of this study, which concerns when and why temperature effects reverse around the solstice, and how this timing depends on developmental state and diel temperature exposure, rather than making quantitative predictions for present-day or future climates. This aligns our conclusions with the experimental design. We have further revised the Discussion to explain these aims and conclusions more clearly, including the addition of a subsection at the beginning titled “Experimental forcing and scope of inference” (L346-363). We have also set up this expectation in the Introduction (L113-115).

      Additionally, we have improved the Discussion in a number of related aspects.

      We explicitly separate mechanistic conclusions and any relation to natural systems, remaining cautious to not overgeneralise or overstate our findings (L417-419).

      We now include a dedicated paragraph explaining that, although these specific conditions are not likely to be found in beech’s range, analogous developmental constraints can arise during cold springs, late cold spells following budburst, or at high-elevation and continental sites where temperatures remain low despite increasing photoperiod (L540-545, L583-588). We further explain that because developmental progression integrates temperature cumulatively over time, even short episodes of strong cooling can exert lasting carry-over effects on seasonal timing, thereby linking the forced experimental responses to processes relevant under natural, fluctuating conditions (L545-550).

      We explicitly state that the decoupling of day and night temperatures was not intended to represent realistic meteorological states (L458-460). We explain that this design was used diagnostically to isolate inherently diel physiological processes (e.g. nocturnal growth, cell division and expansion versus daytime carbon assimilation), and that the observed responses demonstrate the importance of diel timing of temperature exposure rather than the realism of the imposed cycles (L460-468).

      Previous response:

      We recognise that our temperature treatments were severe and do not mimic real world scenarios. They were deliberately designed to create large contrasts in developmental rates, thereby maximising our ability to detect the mechanisms underpinning the solstice switch. For example, the severe cooling between 4 April and 24 May was specifically designed to slow spring development as much as possible without damaging the plants. We have added text in the Methods to clarify this aim.

      I also think the control is confounded with growth chamber experience in Experiment 1. That is, the control plants never experience any time in a chamber, but all the treatments include significant time in a chamber. The authors mention how detrimental chamber time can be to saplings (indeed, they mention an aphid problem in experiment 2) so I think they need to be more upfront about this. The study is still very valuable, but -- again -- we may need to be more cautious in how much we infer from the results.

      We appreciate the reviewer’s concern about the potential confounding effect of chamber exposure in experiment 1. We have now discussed this limitation more explicitly, adding further explanation to the Methods and Discussion.

      Note that chamber-related problems (e.g. aphid infestations) primarily occurred under warm chamber conditions, whereas our experiment 1 cooling treatments maintained low temperatures that suppressed such issues. This means that an equivalent “warm chamber control” could have been associated with its own artefacts, as trees kept under warm chamber conditions would have been exposed to additional stressors that were not present under natural growing conditions. To address this point, we included a chamber control in experiment 2. While aphid abundance was indeed higher in the warm chamber controls, chamber exposure itself had no detectable effect on autumn phenology. This suggests that the main findings of experiment 1 are unlikely to be artefacts of chamber conditions.

      Nevertheless, we agree that chamber exposure remains a potential limitation of experiment 1, which requires clear acknowledgement. We now state this more explicitly in the manuscript while also emphasising that our results are supported by experiment 2 and by converging lines of external evidence.

      Also, I suggest the authors add a figure to explain their experiments as they are very hard to follow. Perhaps this could be added to Figure 1?

      We have now added figures to the methods section to depict the experimental timelines and settings more clearly (Figs. 2 and 3).

      Finally, given how much the authors extrapolate to carbon and forests, I would have liked to see some metrics related to carbon assimilation, versus just information on timing.

      We agree that carbon assimilation is an important component of forest carbon dynamics. However, the primary aim of this study was to identify how developmental state and diel cycles mediate temperature effects on autumn phenology, rather than to quantify carbon assimilation per se. Assessing photosynthetic controls on autumn phenology would require a substantially different experimental design and is therefore beyond the scope of the present study.

      That said, we were able to include measurements of photosynthetic assimilation during pre-solstice cooling (now presented as Fig. S12 for all treatments). These data show that cooling strongly reduced assimilation across all treatments, despite their markedly different phenological outcomes. This supports our interpretation that variation in assimilation alone cannot explain the observed phenological responses, consistent with previous manipulative and observational studies reporting a weak role of late-season assimilation in controlling autumn phenology.

      Fagus sylvatica: Fagus sylvatica is an extremely important tree to European forests, but it also has outlier responses to photoperiod and other cues (and leafs out very late) so using just this species to then state 'our results likely are generalisable across temperate tree species' seems questionable at best.

      We agree that Fagus sylvatica has a stronger photoperiod dependence than many other European tree species. As we note in our response to Reviewer 1, our findings align with previous research across temperate northern forests. Within our framework, interspecific variation in leaf-out timing would not alter the overall response pattern, though it could shift the specific timing of effect reversals. For example, earlier-leafing species may approach completion of development sooner and thus show sensitivity to late-season cooling earlier than F. sylvatica. Nevertheless, we acknowledge the importance of not overstating generality. We have therefore revised the manuscript to phrase conclusions more cautiously and highlight the need for further research across species.

      And the referenced response to Reviewer 1:

      We agree that extrapolation from our experiments on Fagus sylvatica to other species and natural forests requires caution. However, it is precisely the controlled nature of our design that allowed us to isolate the precise mechanisms that appear to underpin the solstice switch, highlighting the role of diel and seasonal temperature variation. In natural systems, additional variables such as competition, precipitation, and soil heterogeneity can strongly influence phenology, but they also make it difficult to disentangle causal mechanisms. By minimising these confounding factors, our experiment provided a clear test of how temperature before and after the solstice regulates growth cessation.

      To acknowledge the limitation, we have toned down statements about generalisation (e.g. “likely generalisable” to “other temperate tree species may display similarities”) and explicitly call for follow-up studies across species and forest contexts. At the same time, we highlight that our findings align with independent evidence from manipulative experiments, satellite observations, flux measurements, and ground-based phenology, which suggests the mechanisms we report may extend beyond the specific populations studied here.”

      As described in responses above, we have further clarified what can be directly concluded from our study, avoiding overgeneralisation.

      Measuring end of season (EOS): It's well known that different parts of plants shut down at different times and each metric of end of season -- budset, end of radial expansion, leaf coloring etc. -- relates to different things. Thus I was surprised that the authors ignore all this complexity and seem to equate leaf coloring with budset (which can often happen MONTHS before leaf coloring) and with other metrics. The paper needs a much better connection to the physiology of end of season and a better explanation for the focus on budset. Relatedly, I was surprised the authors cite almost none of the literature on budset, which generally suggests it is heavily controlled by photoperiod and population-level differences in photoperiod cues, meaning results may differ with a different population of plants.

      We thank the reviewer for pointing out that our discussion of the responses of different EOS metrics needs more clarity. We agree with much of this perspective, and we have added an additional analysis of leaf chlorophyll content data to use leaf discolouration as an alternative EOS marker. On this we would like to make two important points:

      Firstly, we agree that bud set often occurs before leaf discolouration, although this can depend on which definition of leaf discolouration is used. In experiment 1, budset occurred on average on day-of-year (DOY) 262 and leaf senescence (50% loss of leaf chlorophyll) occurred on DOY 320. However, we do not necessarily agree that this excludes the combined discussion of bud set and leaf senescence timing. Whilst environmental drivers can affect parts of plants differently, often responses from different end-of-season indicators (e.g. bud set and loss of leaf chlorophyll) are similar, even if only directionally. Figure S11 shows how, across both experiments, treatment effects were tightly conserved (R<sup>2</sup> = 0.49) amongst the two phenometrics. In accordance with these revisions, we have updated the manuscript title to “Developmental constraints mediate the summer solstice reversal of climate effects on the autumn phenology of European beech”.

      Secondly, shifts in bud set timing remain the primary focus of the manuscript as these shifts are of direct physiological relevance to plant development and dormancy induction, whereas leaf discolouration may simply follow bud set as a symptom of developmental completion. This is supported by our results, which show stronger responses of bud set than leaf senescence (Figs. 4 & 5 vs. Figs. S9 & S10).

      Following the reviewer’s suggestion, we have included more references on the topic of bud set and its environmental controls. The reviewer rightly stresses that photoperiod is considered the most important factor. Photoperiod is therefore key in our conceptual model. However, the responses we observed in F. sylvatica cannot be explained by photoperiod alone. For example, in experiment 1, July cooling delayed the autumn phenology of late-leafing trees but had negligible impact on early-leafing trees, even though both experienced the exact same photoperiod. Moreover, in experiment 2, day, night and full-day cooling showed substantial variations in their effects despite equal photoperiod across the climate regimes. This is why we suggest that the annual progression of photoperiod modulates the responses to temperature variations instead of eliciting complete control.

      Following the addition of an analysis of leaf senescence data, we also revised the terminology in places (including the title) from “primary growth cessation/bud set” to the broader term “autumn phenology.” This term is intended to encompass two distinct but related physiological processes—bud set and leaf senescence—both of which are commonly used as markers of autumn phenology and the end of the growing season.

      Somewhat minor comments:

      (1) How can a bud type -- which is apical or lateral -- be a random effect? The model needs to try to estimate a variance for each random effect so doing this for n=2 is quite odd to me. I think the authors should also report the results with bud type as fixed, or report the bud types separately.

      We have revised the analysis to include bud type as a fixed effect. There are only very minor numerical adjustments (e.g. rounding to 4.8 days instead of 4.9) and inferences are not altered. We also report the bud type effects for experiment 1 and experiment 2.

      (2) I didn't fully see how the authors' results support the Solstice as Switch hypothesis, since what timing mattered seemed to depend on the timing of treatment and was not clearly related to solstice. Could it be that these results suggest the Solstice as Switch hypothesis is actually not well supported (e.g., line 135) and instead suggest that the pattern of climate in the summer months affects end of season timing?

      Our responses to the main comments in this new round of revision have comprehensively covered this topic.

      References

      Berend K, Haynes K, MacKenzie CM. 2019. Common garden experiments as a dynamic tool for ecological studies of alpine plants and communities in northeastern North America. Rhodora 121: 174.

      Heide OM. 2008. Interaction of photoperiod and temperature in the control of growth and dormancy of Prunus species. Scientia Horticulturae 115: 309–314.

      Körner C. 2021. Alpine Plant Life: Functional Plant Ecology of High Mountain Ecosystems. Cham: Springer International Publishing.

      Somero GN. 2010. The physiology of climate change: how potentials for acclimatization and genetic adaptation will determine ‘winners’ and ‘losers’. Journal of Experimental Biology 213: 912–920.

      Tanino KK, Kalcsits L, Silim S, Kendall E, Gray GR. 2010. Temperature-driven plasticity in growth cessation and dormancy development in deciduous woody plants: a working hypothesis suggesting how molecular and cellular function is affected by temperature during dormancy induction. Plant Molecular Biology 73: 49–65.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This valuable study combined careful computational modeling, a large patient sample, and replication in an independent general population sample to provide a computational account of a difference in risk-taking between people who have attempted suicide and those who have not. It is proposed that this difference reflects a general change in the approach to risky (high-reward) options and a lower emotional response to certain rewards. Evidence for the specificity of the effect to suicide, however, is incomplete, which would require additional analyses.

      We thank the editors and reviewers for this important assessment. Based on clinical interviews, we included patients with and without suicidality (S<sup>+</sup> and S<sup>-</sup> groups). However, in line with the suicide-related literature (e.g., Tsypes et al., 2024), the two groups also differed substantially in the severity of symptoms (see Table 1). To address the request for evidence on specificity to suicidality beyond general symptom severity, we performed separate linear regressions to explain gambling behaviour, the value-insensitive approach parameter (β<sub>gain</sub>), and mood sensitivity to certain rewards (β<sub>CR</sub>), with group as a predictor (1 for the S<sup>+</sup> group and 0 for the S<sup>-</sup> group) and scores for anxiety and depression as covariates. Results remained significant after controlling for anxiety and depression (ps < 0.027; Table S8). Given the high correlations among the anxiety and depression questionnaires (rs > 0.753, ps < 0.001), we performed Principal Components Analysis (PCA) on the clinical questionnaires to extract orthogonal components, which explained 86.95%, 7.09%, 3.27%, and 2.68% of the variance, respectively. We then performed linear regressions using these components as covariates to control for anxiety and depression. Our main results remained significant (ps < 0.027; Table S9). We believe that these analyses provide evidence that the main effects on gambling and on mood were specific to suicide.
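
      As an illustration of the covariate-adjusted regression described above, the following is a minimal sketch in plain Python, not the authors' analysis code. It fits an outcome on a binary group indicator plus anxiety and depression scores by solving the normal equations. The function name `ols` and all data values are hypothetical and chosen only so that the expected coefficients are known in advance.

```python
def ols(rows, y):
    """Least-squares coefficients for y ~ 1 + columns of rows (intercept first)."""
    X = [[1.0] + [float(v) for v in r] for r in rows]
    k = len(X[0])
    # Normal equations (X'X) beta = X'y, stored as an augmented matrix.
    A = [
        [sum(xr[i] * xr[j] for xr in X) for j in range(k)]
        + [sum(xr[i] * yi for xr, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical rows: (group, anxiety score, depression score); group is 1 for S+, 0 for S-.
predictors = [(0, 10, 12), (0, 14, 9), (0, 8, 15), (1, 11, 11), (1, 16, 20), (1, 9, 7)]
# Hypothetical outcome built with a known group effect of 2.0 for checking purposes.
outcome = [1.0 + 2.0 * g + 0.5 * a - 0.25 * d for g, a, d in predictors]

intercept, group_effect, anxiety_effect, depression_effect = ols(predictors, outcome)
```

      Here `group_effect` is the group difference that remains after the two covariates are accounted for, which is the quantity the specificity argument rests on. Standard errors and p-values are omitted from this sketch.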

      Moreover, as Reviewer 3 pointed out, “absence of evidence” is not “evidence of absence”. Although we median-split patients by their scores on general symptoms (e.g., depression- and anxiety-related questionnaires) and verified no significant differences in these severities (Figure S11), we additionally conducted Bayesian analyses of gambling behavior, the value-insensitive approach parameter, and mood sensitivity to certain rewards. BF<sub>01</sub> is a Bayes factor comparing the null model (M<sub>0</sub>) to the alternative model (M<sub>1</sub>), where M<sub>0</sub> assumes no group difference. BF<sub>01</sub> > 1 indicates that the evidence favors M<sub>0</sub>. As can be seen in Table S7, most results supported the null hypothesis, suggesting that general symptoms of anxiety and depression overall did not influence our main results. Overall, we believe that these analyses provide compelling evidence for the specificity of the effect to suicide, above and beyond depression and anxiety.
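
      To make the meaning of BF<sub>01</sub> concrete, here is a small sketch that approximates it for a two-group comparison from the BIC values of the null and alternative models. The response does not state how the Bayes factors were computed, so the BIC approximation, the function name `bic_bayes_factor_01`, and the data are all assumptions for illustration. The sketch also assumes non-zero residual variance in both models.

```python
import math


def bic_bayes_factor_01(group_a, group_b):
    """Approximate BF01 as exp((BIC_alt - BIC_null) / 2) for a two-group mean comparison."""
    values = list(group_a) + list(group_b)
    n = len(values)
    # Null model M0: one common mean for everyone.
    grand_mean = sum(values) / n
    sse_null = sum((v - grand_mean) ** 2 for v in values)
    # Alternative model M1: a separate mean per group.
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    sse_alt = sum((v - mean_a) ** 2 for v in group_a) + sum(
        (v - mean_b) ** 2 for v in group_b
    )
    # BIC for Gaussian errors, counting mean parameters only (1 for M0, 2 for M1).
    bic_null = n * math.log(sse_null / n) + 1 * math.log(n)
    bic_alt = n * math.log(sse_alt / n) + 2 * math.log(n)
    return math.exp((bic_alt - bic_null) / 2)


# Hypothetical data: identical groups should favor the null (BF01 > 1).
bf_same = bic_bayes_factor_01([1, 2, 3, 4], [1, 2, 3, 4])
# Hypothetical data: clearly separated groups should favor the alternative (BF01 < 1).
bf_diff = bic_bayes_factor_01([1, 2, 3, 4], [11, 12, 13, 14])
```

      With identical groups the fit terms cancel and only the complexity penalty remains, so BF<sub>01</sub> equals the square root of the sample size, which is why a null result can count as positive evidence for M<sub>0</sub> rather than merely a non-significant test.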

      Beyond these specific findings, this work highlights the broader utility of computational modelling and mood to better understand behavioral effects, showing how to use both mood and choice data to better comprehend a psychiatric issue.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors use a gambling task with momentary mood ratings from Rutledge et al. and compare computational models of choice and mood to identify markers of decisional and affective impairments underlying risk-prone behavior in adolescents with suicidal thoughts and behaviors (STB). The results show that adolescents with STB show enhanced gambling behavior (choosing the gamble rather than the sure amount), and this is driven by a bias towards the largest possible win rather than insensitivity to possible losses. Moreover, this group shows a diminished effect of receiving a certain reward (in the non-gambling trials) on mood. The results were replicated in an undifferentiated online sample where participants were divided into groups with or without STB based on their self-report of suicidal ideation on one question in the Beck Depression Inventory self-report instrument. The authors suggest, therefore, that adolescents with decreased sensitivity to certain rewards may need to be monitored more closely for STB due to their increased propensity to take risky decisions aimed at (expected) gains (such as relief from an unbearable situation through suicide), regardless of the potential losses.

      Strengths:

      (1) The study uses a previously validated task design and replicates previously found results through well-explained model-free and model-based analyses.

      (2) Sampling choice is optimal, with adolescents at high risk; an ideal cohort to target early preventative diagnoses and treatments for suicide.

      (3) Replication of the results in an online cohort increases confidence in the findings.

      (4) The models considered for comparison are thorough and well-motivated. The chosen models allow for teasing apart which decision and mood sensitivity parameters relate to risky decision-making across groups based on their hypotheses.

      (5) Novel finding of mood (in)sensitivity to non-risky rewards and its relationship with risk behavior in STB.

      Weaknesses:

      (1) The sample size of 25 for the S- group was justified based on previous studies (lines 181-183); however, all three papers cited mention that their sample was low powered as a study limitation.

      We thank the Reviewer for raising this concern. We agree that the sample size for the S<sup>-</sup> group (n=25) is modest, and the prior studies we cited also acknowledged limited power. Our intention was to point out that our sample size is comparable to that of a prior study. In the revision, we have therefore updated the section justifying this sample size, and we now acknowledge the limited power of our study in the limitations section. Please see our clarification below:

      Page 32:

      “Third, despite replicating our main results in an independent dataset (n=747), the modest S<sup>-</sup> subgroup size (n=25) has a limited statistical power.”

      (2) Modeling in the mediation analysis focused on predicting risk behavior in this task from the model-derived bias for gains and suicidal symptom scores. However, the prediction of clinical interest is of suicidal behaviors from task parameters/behavior - as a psychiatrist or psychologist, I would want to use this task to potentially determine who is at higher risk of attempting suicide and therefore needs to be more closely watched rather than the other way around (predicting behavior in the task from their symptom profile). Unfortunately, the analyses presented do not show that this prediction can be made using the current task. I was left wondering: is there a correlation between beta_gain and STB? It is also important to test for the same relationships between task parameters and behavior in the healthy control group, or to clarify that the recommendations for potential clinical relevance of these findings apply exclusively to people with a diagnosis of depression or anxiety disorder. Indeed, in line 672, the authors claim their results provide "computational markers for general suicidal tendency among adolescents", but this was not shown here, as there were no models predicting STB within patient groups or across patients and healthy controls.

      Thank you for these thoughtful comments. Our study focuses on why adolescent patients with suicidality show increased risk behavior, aiming to provide a mechanism-based target for suicide prevention. Therefore, our dependent variable in the mediation model was gambling behavior. We also agree that the clinically relevant question is whether suicidality can be predicted from task-derived behavior/parameters. We thus used risky behavior and the model-derived parameters to predict STB. Linear regressions showed that gambling behavior, as well as the value-insensitive approach parameter, can predict suicidal symptom scores among patients (former: β = 9.189, t = 2.004, p = 0.048; latter: β = 5.587, t = 2.890, p = 0.005). In healthy controls, these predictions failed (gambling behavior: β = 1.471, t = 0.825, p = 0.411; approach: β = 0.874, t = 1.178, p = 0.241). These results suggest that the clinical relevance of these findings applies exclusively to people with a diagnosis of depression or anxiety disorder. We found the same pattern for the mood parameter (mood sensitivity to certain rewards: patients: β = -28.706, t = -2.801, p = 0.006; healthy controls: β = -2.204, t = -0.528, p = 0.599). In sum, we believe that our statement of “computational markers for general suicidal tendency among adolescents” is now reasonable. Please see our revisions below:

      Page 17:

      “Furthermore, linear regression showed that gambling rate can predict the current suicidal ideation score (BSI-C, β = 9.189, t = 2.004, p = 0.048) among patients, but not among HC (β = 1.471, t = 0.825, p = 0.411), suggesting that gambling behavior has patient-specific predictive utility for suicidal symptoms.”

      Page 19:

      “Furthermore, linear regression showed that approach parameter can predict the current suicidal ideation score (β = 5.587, t = 2.890, p = 0.005) among patients, but not among HC (β = 0.874, t = 1.178, p = 0.241), suggesting that value-insensitive approach parameter has patient-specific predictive utility for suicidal symptoms.”

      Page 21:

      “Furthermore, linear regression showed that mood sensitivity to CR can predict the current suicidal ideation score (β = -28.706, t = -2.801, p = 0.006) among patients, but not among HC (β = -2.204, t = -0.528, p = 0.599), suggesting that mood sensitivity to CR has patient-specific predictive utility for suicidal symptoms.”

      (3) The FDR correction for multiple comparisons mentioned briefly in lines 536-538 was not clear. Which analyses were included in the FDR correction? In particular, did the correlations between gambling rate and BSI-C/BSI-W survive such correction? Were there other correlations tested here (e.g., with the TAI score or ERQ-R and ERQ-S) that should be corrected for? Did the mediation model survive FDR correction? Was there a correction for other mediation models (e.g., with BSI-W as a predictor), or was this specific model hypothesized and pre-registered, and therefore no other models were considered? Did the differences in beta_gain across groups survive FDR when including comparisons of all other parameters across groups? Because the results were replicated in the online dataset, it is ok if they did not survive FDR in the patient dataset, but it is important to be clear about this in presenting the findings in the patient dataset.

      Thank you for raising the important issue of multiple testing and for asking us to clarify exactly which tests were covered by the FDR procedure. In the clinical dataset we conducted a large number of inferential tests (χ<sup>2</sup>, t-tests, ANOVAs, regressions) spanning: (i) group differences in demographic/clinical characteristics; (ii) sanity checks (e.g., anxiety/depression questionnaires); (iii) primary hypotheses (e.g., group differences in risky behavior); (iv) model-based analyses (parameter checks and between-group contrasts); and (v) control/sensitivity analyses. Post-hoc t-tests were performed only when the three-group ANOVA was significant. This yielded >150 p-values. FDR was applied using all these p-values. Please see our clarification below:

      Supplementary Page 4:

      “Supplementary Note 8: Clarification for FDR correction.

      “In the clinical dataset we conducted a large number of inferential tests (χ<sup>2</sup>, t-tests, ANOVAs, regressions) spanning: (i) group differences in demographic/clinical characteristics; (ii) sanity checks (e.g., anxiety/depression questionnaires); (iii) primary hypotheses (e.g., group differences in risky behavior); (iv) model-based analyses (parameter checks and between-group contrasts); and (v) control/sensitivity analyses. Post-hoc t-tests were performed only when the three-group ANOVA was significant. This yielded >150 p-values. FDR was applied using all these p-values.”
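
      For readers unfamiliar with how an FDR correction operates on a pooled set of p-values, the sketch below implements the Benjamini-Hochberg step-up procedure. The note above does not name the specific FDR procedure used, so Benjamini-Hochberg is an assumption here, and the p-values are hypothetical rather than taken from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return (adjusted_p_values, rejected_flags) in the original input order."""
    m = len(p_values)
    if m == 0:
        return [], []
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    rejected = [q <= alpha for q in adjusted]
    return adjusted, rejected


# Hypothetical p-values for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.20]
adjusted_p, rejected = benjamini_hochberg(example_p)
```

      In this toy set, the third and fourth values are below 0.05 before correction but not after it. Because each adjusted value scales with the total number of tests, pooling more than 150 p-values makes the threshold for any single test considerably stricter than correcting within one family of analyses.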

      (4) There is a lack of explicit mention when replication analyses differ from the analyses in the patient sample. For instance, the mediation model is different in the two samples: in the patient sample, it is only tested in S+ and S- groups, but not in healthy controls, and the model relates a dimensional measure of suicidal symptoms to gambling in the task, whereas in the online sample, the model includes all participants (including those who are presumably equivalent to healthy controls) and the predictor is a binary measure of S+ versus S- rather than the response to item 9 in the BDI. Indeed, some results did not replicate at all and this needs to be emphasized more as the lack of replication can be interpreted not only as "the link between mood sensitivity to CR and gambling behavior may be specifically observable in suicidal patients" (lines 582-585) - it may also be that this link is not truly there, and without a replication it needs to be interpreted with caution.

      Thank you for these important comments. This study focused on the cognitive and affective computational mechanisms underlying increased risky behavior in STB. Accordingly, we compared patients with STB (S<sup>+</sup>) with patients without STB (S<sup>-</sup>) and healthy controls (HC) to examine the effects of STB on risky behavior. A group comparison, rather than a dimensional measure of suicidal symptoms from the Beck Scale for Suicidal Ideation, therefore answers our research questions directly.

      To enhance consistency between the clinical and replication datasets, we included all participants in each dataset when performing the mediation analysis. Given that S<sup>-</sup> and HC did not differ in gambling behavior or the approach parameter in the clinical dataset, we merged these two groups. In the replication dataset, to mirror the S<sup>+</sup> vs. S<sup>-</sup> contrast used clinically, we categorized the general sample into S<sup>+</sup> and S<sup>-</sup> based on BDI item 9. The mediation results remained significant in both datasets (the clinical dataset: a×b = 0.321, 95% CI = [0.070, 0.549], p = 0.016; the replication dataset: a×b = 0.143, 95% CI = [0.016, 0.288], p = 0.031), suggesting that STB is associated with increased risk behavior via stronger approach motivation.
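
      The indirect effect a×b reported above can be illustrated with a short sketch: path a is the slope of the mediator on group, path b is the mediator's coefficient when the outcome is regressed on both group and mediator, and a percentile bootstrap gives an interval for their product. The response does not describe the exact estimation procedure, so the percentile bootstrap, the function names, and the data are assumptions made for illustration.

```python
import random


def indirect_effect(x, m, y):
    """Return a*b for the mediation x -> m -> y, or None for a degenerate sample."""
    n = len(x)
    mx, mm, my = sum(x) / n, sum(m) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    smm = sum((b - mm) ** 2 for b in m)
    sxm = sum((a - mx) * (b - mm) for a, b in zip(x, m))
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    smy = sum((b - mm) * (c - my) for b, c in zip(m, y))
    det = sxx * smm - sxm ** 2
    if sxx <= 1e-12 or det <= 1e-12:
        return None  # no variation in x, or x and m are collinear
    a = sxm / sxx                      # path a: slope of m ~ x
    b = (sxx * smy - sxm * sxy) / det  # path b: coefficient of m in y ~ x + m
    return a * b


def bootstrap_ci(x, m, y, n_boot=2000, seed=0):
    """Percentile 95% interval for a*b, skipping degenerate resamples."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        est = indirect_effect(
            [x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx]
        )
        if est is not None:
            estimates.append(est)
    estimates.sort()
    low = estimates[int(0.025 * len(estimates))]
    high = estimates[int(0.975 * len(estimates)) - 1]
    return low, high


# Hypothetical data: group (1 = S+), approach parameter, gambling measure.
group = [0, 0, 0, 0, 1, 1, 1, 1]
approach = [1.0, 2.0, 3.0, 4.0, 3.0, 4.0, 5.0, 6.0]
gamble = [3.0 * m_i + g_i for g_i, m_i in zip(group, approach)]

point_estimate = indirect_effect(group, approach, gamble)
ci_low, ci_high = bootstrap_ci(group, approach, gamble)
```

      An interval that excludes zero, as in both reported datasets, is what supports the claim that the group difference in gambling runs at least partly through the approach parameter.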

      We also acknowledge the non-replication of the correlation between gambling behavior and mood sensitivity to certain rewards in the online sample. While this pattern might indicate that the link is specific to suicidal patients, it may also reflect sample-specific or unstable effects; thus, we now state this explicitly and interpret the finding with caution. Please see our revisions below:

      Page 15:

      “We next verified our results in an independent dataset, including the same task and BDI questionnaire in 747 general participants (500 females; age: 20.90±2.41) (46). One item in BDI involves the measurement of STB. In item 9 of BDI, participants chose one option that describes them best: Option 1, “I don't have any thoughts of killing myself.”; Option 2, “I have thoughts of killing myself, but I would not carry them out.”; Option 3, “I would like to kill myself.”; Option 4, “I would kill myself if I had the chance.”. In line with the current definition of S<sup>+</sup>/S<sup>-</sup> in the clinical dataset, we identified S<sup>+</sup> group as choosing Option 2, 3, or 4, while participants selecting Option 1 were categorized as S<sup>-</sup> group.”

      Page 19:

      “Given significant correlations between group, approach parameter, and gambling rate for gain trials (ps < 0.017), we further conducted a mediation analysis, assuming that approach motivation mediates the effect of suicidality on risk behavior. Given that we aimed to test the effect of STB, with S<sup>-</sup> and HC as controls, and given that S<sup>-</sup> and HC did not differ in gambling behavior or in the approach parameter, we merged these two groups for the mediation analysis. Results supported our hypothesis (a×b = 0.321, 95% CI = [0.070, 0.549], p = 0.016; Figure 2C), confirming that suicidal thoughts and behavior increase risk behavior through stronger approach motivation.”

      Page 26:

      “However, we did not observe any significant correlation between mood sensitivity to CR and gambling behavior (ps > 0.389), which suggests that the link between mood sensitivity to CR and gambling behavior may be specifically observable in suicidal patients. Alternatively, this non-replicated result may reflect sample-specific or unstable effects and should therefore be interpreted with caution.”

      (5) In interpreting their results, the authors use terms such as "motivation" (line 594) or "risk attitude" (line 606) that are not clear. In particular, how was risk attitude operationalized in this task? Is a bias for risky rewards not indicative of risk attitude? I ask because the claim is that "we did not observe a difference in risk attitude per se between STB and controls". However, it seems that participants with STB chose the risky option more often, so why is there no difference in risk attitude between the groups?

      Thank you for pointing out the ambiguity. In our manuscript, “motivation” and “risk attitude” are defined at the computational level. Following prior work with this task (Rutledge et al., 2015, 2016), we decompose observed gambling into (i) value-dependent valuation parameters that capture risk attitude (e.g., risk aversion and loss aversion, which scale the subjective value of outcomes), and (ii) value-insensitive, valence-dependent biases that capture approach/avoidance motivation. Accordingly, a higher gambling rate does not imply a change in risk attitude per se: it can arise from an increased value-insensitive approach bias even when risk-attitude parameters are comparable between groups, which is what we observe for S<sup>+</sup> vs. controls. We have clarified this point in the computational modeling section.

      Pages 12-13:

      “Please note that a higher gambling rate does not imply a change in risk attitude per se: it can arise from an increased value-insensitive approach bias even when risk-attitude parameters are comparable between groups. Risk attitude is indeed conceptualized in economics as the curvature of the utility function (i.e., the subjective value) of the objective outcomes, with concave curves associated with risk aversion, and convex curves associated with risk seeking (54,56). By contrast, the approach and avoidance biases apply regardless of value. A possible interpretation of the approach bias is that participants approach the option with the highest possible gain (the lottery) in the gain frame; the avoidance bias would then reflect a tendency to systematically avoid the highest potential losses (the lottery) in the loss frame.”
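
      To make the distinction between risk attitude and approach bias concrete, here is a minimal sketch of a gain-trial choice rule. The functional form (a power utility, a logistic choice rule, and a bias in [-1, 1] that sets a floor or ceiling on the gambling probability) is an assumption based on how Reviewer #1 describes the model in recommendation (9); the exact equations in the manuscript may differ, and all parameter values are hypothetical.

```python
import math

def utility(value, alpha):
    """Power utility for a non-negative outcome; alpha < 1 gives a concave (risk-averse) curve."""
    return value ** alpha

def p_gamble_gain(certain, gain, alpha, mu, beta_gain):
    """Probability of gambling on a gain trial (certain amount vs. 50/50 chance of gain or 0).

    beta_gain in [-1, 1] is a value-insensitive bias (assumed form):
    positive values raise the minimum probability of gambling (approach),
    negative values lower the maximum probability.
    """
    u_gamble = 0.5 * utility(gain, alpha)
    u_certain = utility(certain, alpha)
    softmax = 1.0 / (1.0 + math.exp(-mu * (u_gamble - u_certain)))
    if beta_gain >= 0:
        return beta_gain + (1.0 - beta_gain) * softmax
    return (1.0 + beta_gain) * softmax

# Same risk attitude (alpha) and choice sensitivity (mu); only the approach bias differs.
trial = dict(certain=40.0, gain=75.0, alpha=0.8, mu=0.5)
p_no_bias = p_gamble_gain(**trial, beta_gain=0.0)
p_approach = p_gamble_gain(**trial, beta_gain=0.3)
print(f"no bias: {p_no_bias:.3f}, approach bias 0.3: {p_approach:.3f}")
```

      In this sketch the two simulated agents share the same utility curvature, so any difference in gambling probability comes from the bias term alone.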

      Reviewer #2 (Public review):

      Summary:

      This article addresses a very pertinent question: what are the computational mechanisms underlying risky behaviour in patients who have attempted suicide? In particular, it is impressive how the authors find a broad behavioural effect whose mechanisms they can then explain and refine through computational modeling. This work is important because, currently, beyond previous suicide attempts, there has been a lack of predictive measures. This study is the first step towards that: understanding the cognition on a group level. This is before being able to include it in future predictive studies (based on the cross-sectional data, this study by itself cannot assess the predictive validity of the measure).

      Strengths:

      (1) Large sample size.

      (2) Replication of their own findings.

      (3) Well-controlled task with measures of behaviour and mood + precise and well-validated computational modeling.

      Weaknesses:

      I can't really see any major weakness, but I have a few questions:

      (1) I can see from the parameter recovery that the parameters are very well identified. Is it surprising that this is the case, given how many parameters there are for 90 trials? Could the authors show cross-correlations? I.e., make a correlation matrix with all real parameters and all fitted parameters to show that not only the diagonal (i.e., same data as the scatter plots in S3) are high, but that the off-diagonals are low.

      Thank you for raising these thoughtful concerns. The current task consisted of 90 choices and 36 mood ratings. There were 5 choice parameters and 4 mood parameters. The apparently strong identifiability is not unexpected, as 90 choice trials and 36 mood ratings are comparable to those in prior computational modeling literature (Blain & Rutledge, 2022).

      As suggested, we computed cross-correlations between all generating (“true”) and recovered (“fitted”) parameters. The resulting matrix showed high diagonal (choice winning model: rs > 0.91; mood winning model: rs > 0.90) and low off-diagonal (choice winning model: abs(rs) < 0.63; mood winning model: abs(rs) < 0.40) correlations, further supporting parameter recovery. Please see our clarifications below:

      Supplementary Pages 2-3:

      “Parameter recovery: Figure S3 shows good parameter recovery for both the choice and mood winning models (choice: rs > 0.91, ps < 0.001; intraclass coefficients > 0.78; mood: rs > 0.90, ps < 0.001; intraclass coefficients > 0.86). Moreover, we computed cross-correlations between all generating (“true”) and recovered (“fitted”) parameters. The resulting matrix showed high diagonal (choice winning model: rs > 0.91; mood winning model: rs > 0.90) and low off-diagonal (choice winning model: abs(rs) < 0.63; mood winning model: abs(rs) < 0.40) correlations, further supporting parameter recovery.”
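
      The cross-correlation check described above can be sketched as follows. The data are synthetic and the noise level is an arbitrary assumption; in practice the two arrays would hold the generating and recovered parameters from the simulations.

```python
import numpy as np

def recovery_matrix(true_params, fitted_params):
    """Correlate every generating parameter (rows) with every recovered parameter (columns)."""
    n_params = true_params.shape[1]
    # np.corrcoef stacks the variables; the top-right block holds true-vs-fitted correlations.
    full = np.corrcoef(true_params, fitted_params, rowvar=False)
    return full[:n_params, n_params:]

# Synthetic example (hypothetical): 200 simulated agents, 5 choice parameters.
rng = np.random.default_rng(0)
true_params = rng.normal(size=(200, 5))
fitted_params = true_params + 0.3 * rng.normal(size=(200, 5))  # noisy recovery

r = recovery_matrix(true_params, fitted_params)
diagonal = np.diag(r)
off_diagonal = np.abs(r[~np.eye(5, dtype=bool)])
print("min diagonal r:", diagonal.min().round(2))
print("max |off-diagonal r|:", off_diagonal.max().round(2))
```

      High diagonal values indicate that each parameter is recovered; low off-diagonal values indicate that parameters do not trade off against one another during fitting.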

      Page 10:

      “The numbers of choice trials and mood ratings were comparable to those in prior computational modeling studies (34,35).”

      (2) Could the authors clarify the result in Figure 2B of a correlation between gambling rate and suicidal ideation score, is that a different result than they had before with the group main effect? I.e., is your analysis like this: gambling rate ~ suicide ideation + group assignment? (or a partial correlation)? I'm asking because BSI-C is also different between the groups. [same comment for later analyses, e.g. on approach parameter].

      Thank you for pointing out the lack of clarity. We performed the group-difference analysis and the correlation analysis with suicidal ideation separately. We first performed the group-difference analysis to test our hypothesis about the effects of STB. We then conducted the correlational analysis to further specify our findings.

      (3) The authors correlate the impact of certain rewards on mood with the % gambling variable. Could there not be a more direct analysis by including mood directly in the choice model?

      Thank you for this insightful suggestion. As suggested, we tried to integrate mood into choice models by adding mood bias component(s) in line with previous literature (Vinckier et al., 2018). The first model (mcM1) assumes that mood biases choice, building on cM3 (the winning choice model). cmM2 further separated the mood bias parameter into two components according to participants’ choices.

      However, model comparison using BIC supported cM3 (Table S6), that is, the model without mood in the choice process. This may be due to the lack of a blocked design in our experiment, unlike in, e.g., Vinckier et al. (2018) and Eldar & Niv (2015). Please see our clarifications below:

      Supplementary Pages 3-4:

      “Supplementary Note 6: integration of mood into choice models

      Although we modeled choice and mood separately to examine cognitive and affective mechanisms underlying increased risk behavior in adolescent suicidal patients, one interesting question was whether mood responses influence subsequent gambling choices and how to model them. First, we median-split mood responses (except the final rating) to compare gambling rates. Results showed a trend toward a lower gambling rate at higher mood (t = -1.971, p = 0.050). However, there was no significant group difference (F = 0.680, p = 0.507). Second, with the assumption that mood biases choice, we constructed mcM1 based on cM3 (the winning choice model).

      Based on our finding of the negative correlation between mood sensitivity to certain rewards and gambling rate in S<sup>+</sup>, we separated β<sub>Mood</sub> parameter into β<sub>Mood-CR</sub> and β<sub>Mood-GR</sub> (cmM2).

      Model comparison using BIC supported cM3 (Table S6), that is, without consideration of mood in choice modeling. The mood bias parameters in neither cM2 nor cM3 reached significance (ps > 0.091), which may be due to the absence of a blocked design in our experiment, unlike in Vinckier et al. (2018) and Eldar and Niv (2015).”
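
      For readers unfamiliar with BIC-based model comparison, the sketch below shows the standard formula and a group-level ranking. Summing BIC over participants is an assumption about the aggregation step, and the log-likelihoods and parameter counts are made-up numbers, not values from Table S6.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: lower values indicate a better fit/complexity trade-off."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

# Hypothetical per-participant fits: (log-likelihood, number of free parameters).
# The model labels follow the text; the numbers are invented for illustration.
n_trials = 90
fits = {
    "cM3":  [(-48.0, 5), (-51.5, 5), (-45.2, 5)],
    "mcM1": [(-47.6, 6), (-51.0, 6), (-45.0, 6)],
}

# Sum BIC over participants to obtain a group-level score for each model.
group_bic = {
    name: sum(bic(ll, k, n_trials) for ll, k in participants)
    for name, participants in fits.items()
}
winner = min(group_bic, key=group_bic.get)
print(group_bic, "winner:", winner)
```

      In this toy example the extra mood parameter improves the likelihood slightly, but not enough to offset the complexity penalty, which is the pattern the authors describe.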

      (4) In the large online sample, you split all participants into S+ and S-. I would have imagined that instead, you would do analyses that control for other clinical traits. Or, for example, you have in the S- group only participants who also have high depression scores, but low suicide items.

      Thank you for this insightful suggestion. Following prior suicide-related literature (Tsypes et al., 2024), we controlled for depression by including depression scores as covariates. Note that depression scores were derived from our established bifactor model (Wang et al., 2025), which separates depression from anxiety. These results remained largely significant (ps ≤ 0.050), except for a marginally significant effect of group on gambling behavior (p = 0.059). Although this effect was only a trend in the online sample, the same effect, with depression-related questionnaires as covariates, was significant in our clinical cohort (p = 0.024; Table S8). This suggests that the link between suicidality and risky behavior persists above and beyond general depressive symptoms.

      Please see our clarifications below:

      Page 26:

      “After controlling for depression severity using our established bifactor model (see ref 60 for details), these results remained significant (ps ≤ 0.050), except for a marginally significant effect of group on gambling behavior (p = 0.059). Although this effect was only a trend in the online sample, the same effect, with depression-related questionnaires as covariates, was significant in our clinical cohort (p = 0.024; Table S8). This suggests that the link between suicidality and risky behavior persists above and beyond general depressive symptoms.”

      Reviewer #3 (Public review):

      This manuscript investigates computational mechanisms underlying increased risk-taking behavior in adolescent patients with suicidal thoughts and behaviors. Using a well-established gambling task that incorporates momentary mood ratings and previously established computational modeling approaches, the authors identify particular aspects of choice behavior (which they term approach bias) and mood responsivity (to certain rewards) that differ as a function of suicidality. The authors replicate their findings on both clinical and large-scale non-clinical samples.

      (1) The main problem, however, is that the results do not seem to support a specific conclusion with regard to suicidality. The S+ and S- groups differ substantially in the severity of symptoms, as can be seen by all symptom questionnaires and the baseline and mean mood, where S- is closer to HC than it is to S+. The main analyses control for illness duration and medication but not for symptom severity. The supplementary analysis in Figure S11 is insufficient as it mistakes the absence of evidence (i.e., p > 0.05) for evidence of absence. Therefore, the results do not adequately deconfound suicidality from general symptom severity.

      Thank you for this important comment. Based on clinical interviews, we included patients with and without suicidality (S<sup>+</sup> and S<sup>-</sup> groups). However, in line with the suicide-related literature (e.g., Tsypes et al., 2024), the two groups also differed substantially in symptom severity (see Table 1). To address the request for evidence of specificity to suicidality beyond general symptom severity, we performed separate linear regressions to explain gambling behaviour, the value-insensitive approach parameter (β<sub>gain</sub>), and mood sensitivity to certain rewards (β<sub>CR</sub>), with group as a predictor (1 for the S<sup>+</sup> group and 0 for the S<sup>-</sup> group) and anxiety and depression scores as covariates. Results remained significant after controlling for anxiety and depression (ps < 0.027; Table S8). Given the high correlations among the anxiety and depression questionnaires (rs > 0.753, ps < 0.001), we performed Principal Components Analysis (PCA) on the clinical questionnaires to extract orthogonal components, which explained 86.95%, 7.09%, 3.27%, and 2.68% of the variance, respectively. We then performed linear regressions using these components as covariates to control for anxiety and depression. Our main results remained significant (ps < 0.027; Table S9). We believe that these analyses provide evidence that the main effects on gambling and on mood were specific to suicidality.
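
      The PCA-then-regression procedure described above can be sketched as follows. This is a minimal illustration on synthetic data, assuming standardized questionnaires and ordinary least squares; the sample size, effect sizes, and variable names are hypothetical and do not reproduce the values in Table S9.

```python
import numpy as np

def pca_scores(questionnaires):
    """Return orthogonal component scores and the fraction of variance each explains."""
    z = (questionnaires - questionnaires.mean(axis=0)) / questionnaires.std(axis=0, ddof=1)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return u * s, explained

def group_effect(outcome, group, covariates):
    """OLS coefficient of the group indicator, adjusting for the covariates."""
    design = np.column_stack([np.ones(len(group)), group, covariates])
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coefs[1]

# Synthetic data (hypothetical): four highly correlated symptom questionnaires.
rng = np.random.default_rng(0)
n = 83
severity = rng.normal(size=n)
questionnaires = severity[:, None] + 0.4 * rng.normal(size=(n, 4))
group = (severity + rng.normal(size=n) > 0).astype(float)   # 1 = S+, 0 = S-
gambling = 0.5 * group + 0.1 * severity + rng.normal(scale=0.5, size=n)

scores, explained = pca_scores(questionnaires)
beta_group = group_effect(gambling, group, scores)
print("variance explained:", explained.round(3))
print("adjusted group effect:", round(float(beta_group), 3))
```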

      As pointed out, an “absence of evidence” does not amount to “evidence of absence”. Although we median-split patients by their general symptom scores (e.g., depression and anxiety-related questionnaires) and found no significant differences between the resulting subgroups (Figure S11), we additionally conducted Bayesian analyses of gambling behavior, the value-insensitive approach parameter, and mood sensitivity to certain rewards. BF<sub>01</sub> is a Bayes factor comparing the null model (M<sub>0</sub>) to the alternative model (M<sub>1</sub>), where M<sub>0</sub> assumes no group difference. BF<sub>01</sub> > 1 indicates that the evidence favors M<sub>0</sub>. As can be seen in Table S7, most results supported the null hypothesis, suggesting that general symptoms of anxiety and depression overall did not influence our main results. Overall, we believe that these analyses provide compelling evidence for the specificity of the effect to suicidality, above and beyond depression and anxiety.
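
      One simple way to obtain a Bayes factor of this kind is the BIC approximation, BF<sub>01</sub> ≈ exp((BIC<sub>1</sub> - BIC<sub>0</sub>) / 2). The sketch below applies it to a two-group comparison. This is an illustrative approximation on synthetic data, not necessarily the method behind Table S7, which the response does not specify.

```python
import numpy as np

def gaussian_bic(y, design):
    """BIC of an OLS model with Gaussian errors (up to a constant shared by all models)."""
    n, k = design.shape
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = np.sum((y - design @ coefs) ** 2)
    return n * np.log(rss / n) + k * np.log(n)

def bf01_from_bic(y, group):
    """Approximate BF01 = exp((BIC_1 - BIC_0) / 2); values above 1 favor the null model."""
    ones = np.ones(len(y))
    bic_null = gaussian_bic(y, ones[:, None])                   # M0: no group difference
    bic_alt = gaussian_bic(y, np.column_stack([ones, group]))   # M1: group difference
    return float(np.exp((bic_alt - bic_null) / 2.0))

# Synthetic example (hypothetical): an outcome unrelated to a median-split grouping.
rng = np.random.default_rng(0)
group = np.repeat([0.0, 1.0], 40)
outcome = rng.normal(size=80)
print("approximate BF01:", round(bf01_from_bic(outcome, group), 2))
```

      Unlike a non-significant p value, a BF<sub>01</sub> well above 1 quantifies how much more likely the data are under the null model than under the alternative.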

      Please see our revisions below:

      Page 17:

      “Within patients, this group effect on gambling rate remained significant after controlling for sex, illness duration, family history, diagnosis, and various medications use (ps < 0.05), as well as general symptoms (e.g., depression and anxiety; p = 0.024; also see Figure S11, Table S7 and Table S8). Given high correlations among anxiety and depression questionnaires (rs > 0.753, ps < 0.001), we performed Principal Components Analysis (PCA) to extract main components, where each component explained 86.95%, 7.09%, 3.27%, and 2.68% variance, respectively. To further control for anxiety and depression, linear regression using these components as covariates revealed that the group effect on gambling rate remained significant (p = 0.024; Table S9).”

      Pages 18-19:

      “Within patients, this group effect on the approach parameter remained significant after controlling for sex, illness duration, family history, diagnosis, and various medications use (ps < 0.05), as well as general symptoms (e.g., depression and anxiety; p = 0.027; also see Figure S11, Table S7 and Table S8). Linear regression using PCA components as covariates revealed that the group effect on approach parameter remained significant (p = 0.027; Table S9).”

      Page 21:

      “Within patients, this group effect on βCR remained significant after controlling for gambling rate, earnings, mood-related outcome effect, mood drift effect, sex, illness duration, family history, diagnosis, and various medications use (ps < 0.032), as well as general symptoms (e.g., depression and anxiety; p = 0.001; also see Figure S11, Table S7 and Table S8). Linear regression using PCA components as covariates revealed that the group effect on this mood parameter remained significant (p = 0.001; Table S9).”

      (2) The second main issue is that the relationship between an increased approach bias and decreased mood response to CR is conceptually unclear. In this respect, it would be natural to test whether mood responses influence subsequent gambling choices. This could be done either within the model by having mood moderate the approach bias or outside the model using model-agnostic analyses.

      Thank you for this important suggestion. As suggested, one interesting question was whether mood responses influence subsequent gambling choices and how to model them. First, we median-split mood responses (except the final rating) to compare gambling rates. Results showed a trend toward a lower gambling rate at higher mood (t = -1.971, p = 0.050). However, there was no significant group difference (F = 0.680, p = 0.507). Second, with the assumption that mood biases choice, we constructed mcM1 based on cM3 (the winning choice model). Based on our finding of the negative correlation between mood sensitivity to certain rewards and gambling rate in S<sup>+</sup>, we separated the β<sub>Mood</sub> parameter into β<sub>Mood-CR</sub> and β<sub>Mood-GR</sub> (cmM2). Model comparison using BIC supported cM3 (Table S6), that is, the model without mood in the choice process. This may be due to the lack of a blocked design in our experiment, unlike in, e.g., Vinckier et al. (2018) and Eldar & Niv (2015). Please see Supplementary Pages 3-4.

      (3) Additionally, there is a conceptual inconsistency between the choice and mood findings that partly results from the analytic strategy. The approach bias is implemented in choice as a categorical value-independent effect, whereas the mood responses always scale linearly with the magnitude of outcomes. One way to make the models more conceptually related would be to include a categorical value-independent mood response to choosing to gamble/not to gamble.

      We apologise for the unclear statement. The approach bias is implemented in choice as a continuous value-independent effect, ranging from -1 to 1.

      It is true that the mood responses always scale with the magnitude of outcomes, since mood ratings were requested after the outcomes. Therefore, the mood parameters and the approach bias were both continuous.

      We also attempted to integrate mood into choice modelling. See Response 2 for Reviewer 3 for details.

      (4) The manuscript requires editing to improve clarity and precision. The use of terms such as "mood" and "approach motivation" is often inaccurate or not sufficiently specific. There are also many grammatical errors throughout the text.

      Thank you for this important suggestion. We have now explained motivation and mood in the Introduction section and the computational modeling section. Please see our clarifications below:

      Pages 3-4:

      “A growing literature indeed shows that risky behavior can be far better explained after adding value-insensitive approach and avoidance components to prospect theory(18,19), that is by including a decision bias in favor of the highest gain (approach) and another decision bias against the lowest loss (avoidance), above and beyond options value difference. This class of models highlights the important role of value-insensitive motivational components in decision making in addition to risk attitude-driven valuation (e.g., loss/risk aversion)(20).”

      Page 5:

      “Although mood is thought to persist for hours, days, or even weeks(30-33), momentary mood, measured over the timescale of a laboratory session, represents the accumulated impact of multiple events at the scale of minutes(30,32,34-38). The external validity of momentary mood is demonstrated, e.g., through its association with depression symptoms(37). Mood differs from emotions, which reflect immediate affective reactivity and are more transient (e.g., from surprise to fear)(31-33,39).”

      We have corrected grammatical errors throughout the manuscript.

      (5) Claims of clinical relevance should be toned down, given that the findings are based on noisy parameter estimates whose clinical utility for the treatment of an individual patient is doubtful at best.

      Thank you for this comment. We agree that we did not evaluate the noise in our estimates, e.g., by assessing the test-retest reliability of the task parameters, which is outside the scope of this study, and it is indeed possible that the parameter estimates are somewhat noisy. Therefore, we have toned down the clinical relevance of our results. Please see our revision below:

      Page 32:

      “Next, we did not evaluate the noise in our estimates, e.g., by assessing the test-retest reliability of the task parameters, and it is indeed possible that the parameter estimates are somewhat noisy.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Title: I believe "aberrant mood dynamics" is both too general and overstating the results of this study, which did not measure mood dynamics longitudinally. "Aberrant" is also overly pathologizing. I would suggest sticking more directly to the results, for instance, "Insensitivity of momentary mood to non-risky rewards in adolescent suicidal patients".

      Thank you for this suggestion. We have now corrected it.

      (2) Abstract: in line 61, "Our study uncovers the cognitive and affective mechanisms" suggests that these are the only ones, and you uncovered them. Of course, there could be more mechanisms contributing to risk behavior in STB, so I would suggest removing the word "the" or adding "one of the".

      Thank you for this suggestion. We have now corrected it.

      (3) One major weakness of this study is that suicidal thoughts and behaviors were not assessed via a clinical instrument such as the Columbia Suicide Severity Rating Scale - this should be mentioned upfront.

      Thank you for this comment. Based on medical records and on information gathered from family and friends by the researcher and psychiatrists, patients with suicidal thoughts and behaviors were categorized as the suicidal group (S<sup>+</sup>), while patients without suicidal thoughts and behaviors were identified as the control group (S<sup>-</sup>). Note that the medical records were based on clinical interviews in which the psychiatrists were vigilant for signs of suicidal ideation and inquired about suicide-related thoughts and behaviors from both the patients and their families. Therefore, the current grouping procedure is arguably comparable to one based on the Columbia Suicide Severity Rating Scale.

      (4) Table 1: female/male are sex, not gender (gender is man/woman/transgender/non-binary).

      Thank you for this suggestion. We have now corrected it.

      (5) Equation 1: It would be good to clarify what happens in gain-only or loss-only trials (the other value is then 0, but this can be clarified as it is not technically a loss or a gain).

      Thank you for this suggestion. We have now corrected it. Please see below for our revision:

      Page 12:

      “Please note that V<sub>loss</sub> is 0 in gain trials and V<sub>gain</sub> is 0 in loss trials.”

      (6) Figure 1E: The model prediction is not informative here. Given the linear regression model, there is no other option except that the mean prediction would overlap with the mean empirical measurement (unless the model was specified incorrectly). The same is true in Figure 2A.

      Thank you for this suggestion. We have now removed plots for model prediction.

      (7) Figure 1G: There was no analysis of the differences between groups in terms of earnings, given that the ANOVA was not significant. Still, if the claim is that risky behavior is sometimes suboptimal in this task, it would be good to show that there is a correlation between, say, symptoms of STB across groups and 1) risky behavior and 2) earnings.

      Thank you for this insightful comment. In the patient cohort, risky behavior (gambling rate)—but not earnings—predicted the current suicidal ideation score (BSI-C, β = 9.189, t = 2.004, p = 0.048; earnings, β = 0.001, t = 0.582, p = 0.562). The lack of association for earnings is consistent with the task design, in which there is no stable optimal policy and payouts are only a coarse proxy for decision quality. Future work in learning paradigms, where optimality is well defined, may be better suited to test earnings-based links to STB. We have clarified this point below:

      Page 32:

      “Second, although we assumed that increased risky behavior in STB was suboptimal, the current task was not suited to test this, given the task design of random feedback for gambling option. Future work in learning paradigms, where optimality is well defined, may be better suited to test earnings-based links to STB.”

      (8) Line 290: "beta_gain: -1-1" is unclear. I believe you meant beta_gain \in [-1,1].

      Thank you for this suggestion. We have now corrected it to make it clear.

      (9) The gain and loss biases are modeled as minimum and maximum probabilities for choosing the gamble. This is a legitimate choice for value-agnostic biases, but it is not the traditional choice (as far as I know). I wonder if the same results would hold with the more traditional formulation of the bias as an added constant to the utility of the gamble, i.e., p(gamble) = 1/(1+ exp(-mu(U_gamble + beta_gain - U_certain)). I believe in this case, you would also not have to specify different equations for positive or negative biases, or to limit the bias to the range of [-1,1] (indeed, the bias would be in reward-equivalent units).

      Thank you for this suggestion. The winning choice model we used here was consistent with previous literature (Rutledge et al., 2015 & 2016), which decomposed the decision process into risk-attitude-driven valuation (e.g., loss and risk aversion) and value-insensitive motivational components. These approach/avoidance parameters are a decision bias in favor of the highest gain (approach) and another decision bias against the lowest loss (avoidance), above and beyond options value difference.

      As suggested, we also compared the traditional bias choice model. Model comparison did not support this. Please see our revision below:

      Supplementary Page 4:

      “We also considered the traditional bias parameter (cM4), rather than approach/avoidance parameters. We limited the bias to the range of [-100, 100], which was in reward-equivalent units.

      However, model comparison did not support cM4 (Table S6).”
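
      For reference, the additive-bias formulation proposed by the reviewer, p(gamble) = 1/(1 + exp(-mu(U_gamble + beta - U_certain))), can be written as a short function. The utilities and parameter values below are hypothetical and serve only to show how the bias shifts choice probability.

```python
import math

def p_gamble_additive(u_gamble, u_certain, mu, bias):
    """Logistic choice rule with a constant bias added to the gamble's utility.

    The bias is in the same (reward-equivalent) units as the utilities, so a
    single equation covers both positive (approach) and negative (avoidance) values.
    """
    return 1.0 / (1.0 + math.exp(-mu * (u_gamble + bias - u_certain)))

# With equal utilities, the bias alone determines the preference.
for bias in (-10.0, 0.0, 10.0):
    print(bias, round(p_gamble_additive(20.0, 20.0, mu=0.2, bias=bias), 3))
```

      Note that in this form the effect of the bias is scaled by the sensitivity parameter mu, whereas a bias expressed as a probability floor or ceiling acts independently of mu.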

      (10) Also, for equations 5-8, it seems that 5-6 are identical to 7-8 except for the use of beta_gain versus beta_loss. You might want to consider simplifying by putting beta in the equations and specifying in the text that, depending on the trial type (loss or gain), the relevant beta is used.

      Thank you for this suggestion. We have now simplified it. Please see response to Reviewer 2, point 3.

      (11) It is not clear what equations are applied to mixed trials in cM3.

      Sorry for the confusion. We have now clarified this point.

      Page 12:

      “Approach/avoidance parameters are not applied in mixed trials.”

      (12) Model comparison: the mood models are nested within each other (e.g., mM3 can be derived from mM1 by setting beta_EV = beta_RPE). In this case, model comparison can use the likelihood ratio test instead of BIC, which can be too conservative (and therefore does not support the extra beta parameter for RPE, different from previous results in the literature). I wonder if a likelihood ratio test would lead to results more in line with previous findings with this task?

      Thank you for this suggestion. We agree that mM1 (CR+EV+RPE) and mM3 (CR+GR) are nested. However, our model space also included non-nested models, such as mM5 (CR+GR<sub>better</sub>+GR<sub>worse</sub>). Therefore, likelihood ratio tests could not be applied consistently across our model space.

      (13) Line 346: The replication sample is described as "healthy participants," however, their health (or mental health) status was not assessed, and they may as well have mental health concerns. I would suggest calling this a general sample or an undifferentiated sample - but not a healthy sample.

      Sorry for the confusion. We have now corrected this phrase.

      (14) Line 363: "in addition to the replication of previous findings in the validation dataset" is unclear. Are those tests not two-tailed?

      Sorry for the unclear statement. In the replication analyses, we used one-tailed t-tests because the direction of the effect had already been established in the clinical dataset. Please see our clarification below:

      Page 15:

      “For the replication of previous findings in the validation dataset, we used one-tailed tests in line with our clinically motivated directional hypothesis.”

      (15) Line 372: "validating our group manipulation" - the presented work does not have a manipulation. Maybe you meant "validating our grouping of participants"?

      Thank you for this suggestion. We have now corrected it to make it clear.

      (16) Figure 2B: It is not clear how the data were binned for illustration purposes only, and why this binning is necessary (I have not seen it in other papers) - presenting the data from each subject and the correlation line with error margins (as is done here) should be sufficient.

      Thank you for flagging this. For illustration only, we binned the data proportional to group sizes: in the patient sample (S<sup>-</sup> n = 25; S<sup>+</sup> n = 58; ≈1:2), we displayed 3 bins for S<sup>-</sup> and 6 bins for S<sup>+</sup>. We agree that binning is not necessary; all statistics were computed on raw, unbinned data. The binned panel was included solely for visualization, consistent with our prior work (Blain et al., 2023).

      (17) Table 2: delta BIC should be presented per subject (that is, divided by the number of subjects in each group), as the groups are of different sizes, so as presented now, the columns are not comparable across groups.

      Thank you for the helpful suggestion. Our goal in Table 2 is not to compare ΔBIC magnitudes across groups, but to identify the winning model within each group. The ΔBICs are aggregated at the group level solely to rank models for that group. Dividing by the number of participants would rescale each group’s column by a constant and would therefore not affect the within-group ranking or the conclusion that cM3 is the best model in all groups. For this reason, we retain the current presentation and interpret each column within group rather than across groups.

      (18) Line 640 - the effect of expectations and prediction errors on mood was not only shown in healthy people, but also in people with depression (Rutledge et al., 2007, https://pubmed.ncbi.nlm.nih.gov/28678984/)

      Thank you for this comment. Indeed, Rutledge et al. (2017) showed evidence for the CR+EV+RPE mood model in adults with depression. However, our study recruited adolescents with depression or anxiety, given that adolescence might provide a developmental window for early intervention in suicidality. Therefore, it is also possible that the current winning model was specific to adolescents. Please see our clarifications below:

      Page 28:

      “It is also possible that the current winning model was specific to adolescents. Given that Rutledge et al., (2017) supported the “CR-EV-RPE model” in adults with depression, our study with adolescent populations may suggest a developmental change for mood sensitivities.”

      (19) Supplemental material: Is the R2 section about R-squared? Perhaps you can use superscript on the 2 to make that clearer? For Figure S2, how was model recovery determined? Should I interpret the confusion matrix as suggesting that the winning model for each and every simulated subject was the generating model, or was the winning model determined for the whole simulated population in each of the 100 simulations? Traditionally, confusion matrices use the former measure, but the results of 100% recoverability make me suspect the latter was used here. In Figure S3, should we not be looking at simulated parameters and recovered parameters? What are "real parameters" here?

      Thank you for these important comments. We now consistently denote the coefficient of determination as R<sup>2</sup> (with a superscript 2) throughout the manuscript and Supplementary Materials.

      For the model recovery analysis in Figure S2, we have clarified that the confusion matrix is computed at the population level. Specifically, for each of the 100 simulations we generated a full dataset under each candidate model, fit all models to that dataset, and selected the winning model based on group-level model evidence (BIC). Each cell in the confusion matrix therefore reflects the proportion of simulations in which model j was selected as the best-fitting model when the data were generated by model i. This operation was reasonable because the decision of the winning model is made on the population-level dataset rather than on individual subjects.
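      As a rough illustration of this population-level procedure, the sketch below builds a confusion matrix from group-level BIC. It is not the authors' code: the function names are hypothetical, and `toy_subject_bics` is a stand-in for the real "simulate data, then fit every model" step, which simply gives the generating model an assumed BIC advantage plus noise.

```python
import random


def group_bic(subject_bics):
    """Sum per-subject BICs (one list of model BICs per subject) into one total per model."""
    n_models = len(subject_bics[0])
    return [sum(subject[j] for subject in subject_bics) for j in range(n_models)]


def recovery_confusion(simulate_subject_bics, n_models, n_sims):
    """Cell [i][j] is the share of simulations generated by model i
    in which model j had the lowest group-level BIC."""
    counts = [[0] * n_models for _ in range(n_models)]
    for i in range(n_models):
        for _ in range(n_sims):
            totals = group_bic(simulate_subject_bics(i))
            winner = min(range(n_models), key=totals.__getitem__)
            counts[i][winner] += 1
    return [[c / n_sims for c in row] for row in counts]


rng = random.Random(0)


def toy_subject_bics(generating_model, n_subjects=30, n_models=3):
    """Hypothetical stand-in for fitting: the generating model gets a
    5-point BIC advantage per subject, plus Gaussian noise."""
    return [
        [
            100.0 + (0.0 if j == generating_model else 5.0) + rng.gauss(0.0, 2.0)
            for j in range(n_models)
        ]
        for _ in range(n_subjects)
    ]


confusion = recovery_confusion(toy_subject_bics, n_models=3, n_sims=100)
```

      Summing evidence over subjects before picking a winner is what distinguishes this from a subject-level confusion matrix: small per-subject advantages accumulate, so population-level recovery can approach 100% even when individual subjects would often be misclassified.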

      In Figure S3, the term “real parameters” referred to the parameters used to generate the simulated data. To avoid confusion, we now relabel these as “simulated (generating) parameters” and explicitly describe the figure as showing the relationship between simulated (generating) parameters and recovered parameters. Please see our revisions below:

      Supplementary Pages 2-3:

      “Model recovery: We generated 100 simulated datasets for each model (3 choice models and 8 mood models) using the fitted parameters of each model as the ground truth. Each dataset contained 201 trials and included 3 (or 8) sets of simulated data corresponding to the respective models. For each simulated dataset, we then fit all models and determined the winning model at the population level based on group-level BIC, yielding a confusion matrix in which each entry represents the proportion of simulations in which model j was selected as the best-fitting model when the data were generated by model i. As shown in Figure S2, all models are highly identifiable, indicating excellent recovery performance for both the choice and mood models.”

      “Parameter recovery: Figure S3 shows good parameter recovery for both the choice and mood winning models (choice: rs > 0.91, ps < 0.001; intraclass coefficients > 0.78; mood: rs > 0.90, ps < 0.001; intraclass coefficients > 0.86). Moreover, we computed cross-correlations between all generating (“generating”) and recovered (“fitted”) parameters. The resulting matrix showed high diagonal (choice winning model: rs > 0.91; mood winning model: rs > 0.90) and low off-diagonal (choice winning model: abs(rs) < 0.63; mood winning model: abs(rs) < 0.40) correlations, further supporting parameter recovery.”
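      The cross-correlation check described in this passage can be sketched as follows. This is an illustrative example under stated assumptions: the parameter names `alpha` and `beta`, the sample size, and the noise level are hypothetical, and the "recovered" values are simulated as generating values plus noise rather than obtained from model fitting.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def cross_correlation(generating, recovered):
    """Rows: generating parameters; columns: recovered parameters."""
    names = list(generating)
    return {
        g: {r: pearson(generating[g], recovered[r]) for r in names} for g in names
    }


rng = random.Random(1)
n_subjects = 200  # hypothetical number of simulated subjects

# Hypothetical, independently drawn generating parameters.
generating = {
    "alpha": [rng.random() for _ in range(n_subjects)],
    "beta": [rng.random() for _ in range(n_subjects)],
}
# Stand-in for fitted values: generating value plus estimation noise.
recovered = {
    name: [v + rng.gauss(0.0, 0.1) for v in values]
    for name, values in generating.items()
}

matrix = cross_correlation(generating, recovered)
```

      High diagonal entries indicate that each parameter is recovered; low off-diagonal entries indicate that the fitting procedure does not trade one parameter off against another.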

      Typos:

      (1) Line 90: original → originate

      (2) Line 596-598 - the same phrase is repeated twice.

      (3) Line 616: on the other word → hand.

      Sorry for the mistakes. We have now corrected them throughout the manuscript.

      Reviewer #2 (Recommendations for the authors):

      For people unfamiliar with interpersonal theory or motivational-volitional model, or three-step theory (lines 105-106), could you briefly explain the key idea of mood and suicide before going to the decision-making tasks? And from this, maybe motivate the predictions in your task? In particular, in the abstract and introduction, the phrasing could be a bit more concise and simpler. In the abstract, sentences were sometimes quite long. In the introduction, some paragraphs are somewhat repetitive. In the discussion, there were some typos.

      Thank you for these suggestions. We have now explained the key idea of mood and suicide before going to the decision-making tasks in the introduction, which can be seen below:

      Pages 4-5:

      “Contemporary theories of suicide converge on the idea that STB is initially caused by low mood experience. The interpersonal theory of suicide proposes that suicidal desire arises when people simultaneously feel socially disconnected (“thwarted belongingness”) and like a burden on others (“perceived burdensomeness”), experiences that are tightly linked to chronically low mood(25). The motivational–volitional model(26) and the three-step theory(27,28) similarly emphasize that when negative mood and feelings of defeat or entrapment are experienced as inescapable, they can give rise to suicidal ideation, and that the progression from ideation to suicide attempts depends on additional factors such as reduced fear of death, increased pain tolerance, and a tendency to act impulsively under intense affect. Some official organizations, e.g., National Institute of Mental Health, have also listed mood problems as warning signals(8). Interestingly, within the framework of decision making under uncertainty, gambling on lotteries with a revealed outcome has been found to induce high mood variance(29), providing an opportunity to assess the relationship between deficient mood and increased gambling decisions in STB.”

      We have also refined the wording and corrected typos throughout the manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) Since many readers might only read the abstract, it is important that it is both informative and accurate. I have two suggestions in this respect. First, for the abstract to be more informative, it may be helpful to indicate already there that these are value-insensitive approach-avoidance parameters, in the sense that they favor/disfavor the gamble regardless of the potential outcomes' magnitude or probability. This issue is also present throughout the text, where the phrases "approach and avoidance motivation" are referred to as if they have established and precise computational definitions. In my view, these terms could just as easily be interpreted as parameters that multiply the value of potential gains or losses, which is not what the authors mean. It would be helpful to clarify this terminology.

      Thank you for these suggestions. In line with previous literature (Rutledge et al., 2015 & 2016), approach and avoidance motivation are indeed defined at the computational level, referring to a decision bias in favor of the highest gain (approach) and another decision bias against the lowest loss (avoidance), above and beyond the difference in option values. We have cited these papers in the manuscript. We have also clarified the approach and avoidance parameters in the abstract and introduction. Please see our revisions below:

      Page 2 (Abstract):

      “Using a prospect theory model enhanced with value-insensitive approach-avoidance parameters revealed that this rise in risky behavior resulted only from a heightened approach parameter in S<sup>+</sup>. Altogether, model-based choice data analysis indicated dysfunction in the approach system in S<sup>+</sup>, leading to greater propensity for gambling in the gain domain regardless of the lottery expected value.”

      Page 3 (Introduction):

      “A growing literature indeed shows that risky behavior can be far better explained after adding value-insensitive approach and avoidance components to prospect theory(18,19), that is by including a decision bias in favor of the highest gain (approach) and another decision bias against the lowest loss (avoidance), above and beyond options value difference. This class of models highlights the important role of value-insensitive motivational components in decision making in addition to risk attitude-driven valuation (e.g., loss/risk aversion)(20).”

      (2) The statement "our study uncovers the cognitive and affective mechanisms contributing to increased risk behavior in STB" is overstating the findings, as the study may have uncovered some contributing mechanisms, but likely not all of them. Removing the word "the" would fix this issue.

      Thank you for this suggestion. We have now corrected it.

      (3) Since mood is typically defined as lasting hours, it's inappropriate to refer to ratings that only reflect the last few trials as self-reports of mood. To be sure, I view the distinction between emotions and moods as quantitative, not qualitative, so I do not think there is a problem studying the former to understand the latter, but to avoid confusion, the terminology should follow common usage.

      Thank you for this suggestion. We follow previous work and operational definitions regarding mood (Rutledge et al., 2014, Eldar & Niv, 2015, Vinckier et al., 2018). Emotion is usually a very brief response to a specific stimulus (Emanuel & Eldar, 2023), e.g., leading to rapid changes like surprise then fear. In contrast, mood is defined as a diffuse state that is not specific to one stimulus. Here, we operationally and computationally define mood as an affective state reflecting the recent history of safe and gamble outcomes. We now clarify that point in the main text. Please see our revision below:

      Page 5:

      “Although mood is thought to persist for hours, days, or even weeks(30-33), momentary mood, measured over the timescale in the laboratory setting, represents the accumulation of the impact of multiple events at the scale of minutes(30,32,34-38). Momentary mood external validity is demonstrated e.g., through its association with depression symptoms(37). Mood is different from emotions, which reflect immediate affective reactivity and are more transient (e.g. from surprise to fear)(31-33,39).”

      (4) Line 78: The phrases "increase in risk attitude", "decrease in loss attitude", and "decrease in value-independent choice biases" are unclear to me in terms of their directionality. An attitude might be avoidant or embracing. If it is the former then increasing it would decrease risk-taking.

      Thank you for pointing out the ambiguity. We have now corrected them throughout the manuscript. Please see our revision below:

      Page 4:

      “We therefore hypothesized that heightened approach motivation, or weakened avoidance motivation, would account for increased risk behavior in STB.”

      (5) Line 125: I was not sure why one would expect the mood response to gamble-related quantities (EV and RPE) to be lower in STB and not higher.

      Sorry for the typo. We hypothesized that mood would respond more strongly to gambling-related quantities—expected value (EV) and reward prediction error (RPE)—in adolescents with STB than in controls, given prior evidence that STB is associated with greater risk-taking.

      (6) The text could use proofreading, as there are many typos. These are from the first 100 lines alone:

      a) Abstract: regardless the lotteries -> regardless of the lotteries'.

      b) Line 78: it remains whether.

      c) Line 80: can each -> each can.

      d) Line 90: may original from.

      Sorry for the mistakes. We have now corrected them throughout the manuscript.

      (7) The rationale for focusing on the S+ group for mood model comparison is incorrect. The purpose is to identify parameters that vary as a function of suicidality, and for that, the S- group is just as important.

      Thank you for this comment. We agree that the S<sup>-</sup> group is as important as the S<sup>+</sup> group. A direct comparison was complicated because the winning mood models differed (S<sup>+</sup>: mM3; S<sup>-</sup>: mM5; Table 3). To ensure comparability, we checked results from both model specifications (mM3 and mM5). The conclusions were convergent: mood sensitivity to certain rewards (CR) was lower in S<sup>+</sup> than in S<sup>-</sup> (see Fig. 3 for mM3 and Fig. S8 for mM5).

      (8) There appears to be a contradiction between the inclusion criteria, which include having experienced suicidal thoughts and behaviors, and the definition of the S- group as not having suicidality.

      Thank you for pointing out this mistake. The corrected version of inclusion criteria can be seen on Page 7:

      “Patients were included if they met the following criteria: 1) both the researcher and psychiatrists agreed on their group classification; 2) they had a current diagnosis of major depressive disorder (MDD; unipolar depression), generalized anxiety disorder (GAD), or bipolar disorder with depressive episodes (BD), confirmed by two experienced psychiatrists using the Structured Clinical Interview for DSM-IV-TR-Patient Edition (SCID-P, 2/2001 revision; see Supplementary Note 1 for details); 3) they were between 10 and 19 years of age; 4) they had no organic brain disorders, intellectual disability, or head trauma; 5) they had no history of substance abuse; 6) they had no experience of electroconvulsive therapy.”

      (9) It would be helpful to specify whether mood modeling was based on objective or subjective values, and why.

      Thank you for this helpful suggestion. We have now clarified whether mood modeling was based on objective or subjective values, and why. Specifically, we constructed two model families: one in which mood was driven by objective monetary outcomes (objective values) and one in which mood was driven by subjective values derived from each participant’s fitted choice model (subjective values). We then used the VBA_groupBMC function in the VBA toolbox to perform family-wise model comparison, with 8 candidate mood models within each family. Consistent with previous literature, the objective-value family provided a clearly superior fit to the data (exceedance probability, EP = 1.000). Based on this result and for parsimony, we report and interpret the mood modeling results from the objective-value family in the main text. We have clarified this point below:

      Supplement Pages 4-5:

      “Supplementary Note 9: Mood model comparison using subjective values.

      To identify whether mood modeling was based on objective or subjective values, we constructed two model families: one in which mood was driven by objective monetary outcomes (objective values) and one in which mood was driven by subjective values derived from each participant’s fitted choice model (subjective values). We then used the VBA_groupBMC function in the VBA toolbox (Daunizeau et al., 2014) to perform family-wise model comparison, with 8 candidate mood models within each family. Consistent with previous literature, the objective-value family provided a clearly superior fit to the data (exceedance probability, EP = 1.000).”

    1. Author Response:

      The following is the authors’ response to the previous reviews

      Public Review:

      We thank the editor and reviewers for their thoughtful and constructive feedback, which has enabled us to greatly strengthen the manuscript. We apologize for the delay in resubmitting; we were dealing with a large turnover in the lab due to trainee graduations. We have carefully revised the text, figures, and supplementary materials in response to these comments. Below, we summarize the key revisions made, followed by a point-by-point response to the reviewers’ critiques.

      (1) Performed CUTS analyses in human neuronal system: In the revised manuscript, we included new data demonstrating that the CUTS system can be applied to additional cellular models, specifically neuronal cells (Figure 5, Figure S4). To address whether CUTS functions effectively in neuronal contexts, we generated stable CUTS-expressing lines in differentiated BE(2)-C and ReN VM–derived differentiated neurons (Figure 5A-D, Figure S4 A-C). To ensure this was neuronal expression, we developed a new Tet-On3G system construct where the Tet-On3G transactivating protein is driven by the SYN1 promoter to ensure neuron-specific inducible expression for these experiments.

      (2) Define the relationship between CUTS and endogenous/physiological cryptic exon inclusion: To evaluate how well the CUTS system reflects physiological cryptic exon regulation, we performed RT-PCR analysis of several cryptic exons previously reported by us and evaluated CUTS activation at the RNA level in parallel (Figure S2E). CUTS is sensitive to low-to-mild reductions in TDP-43 levels, whereas the tested endogenous cryptic exons exhibit variable responses to TDP-43 knockdown.

      (3) Defining stress-induced TDP-43 loss of function: We included new data demonstrating that the CUTS system can detect TDP-43 loss of function induced by acute sodium arsenite (NaAsO₂) treatment in HEK cells (Figure 3D–I). We have also tested additional stressors as part of a separate ongoing study in which this work will be expanded (Xie et al., 2025). We selected this paradigm because TDP-43 loss of function in response to acute NaAsO₂ treatment is also supported by work from other labs (Huang et al., 2024).

      (4) Implications of using a TDP-43 Loss-of-Function sensor for therapeutic applications: In the revised manuscript, we clarify that CUTS-TDP43 is auto-regulated and we highlight two potential therapeutic applications: i) TDP-43 Knockdown-and-replacement: CUTS-TDP43 provides a strategy for simultaneous depletion of pathological TDP-43 species while enabling autoregulated re-expression of wild-type TDP-43. This design mitigates the risk of supraphysiologic overexpression, a known liability in conventional replacement approaches, by restoring TDP-43 within a self-limiting regulatory network that maintains homeostatic control. ii) Aggregation-independent correction: Because CUTS is autoregulatory, it can be repurposed to regulate alternative downstream effectors, including splicing modifiers or TDP-43 functional interactors, without expressing TDP-43 itself. This approach provides a potential aggregation-independent strategy to compensate for TDP-43 loss-of-function (LOF) by restoring downstream splicing. We are evaluating this work in a follow up study (Xie et al., 2025). In these ongoing studies, we show that CUTS-regulated expression of splicing proteins in response to TDP-43 loss restored subsets of cryptic exon events (24/28 events evaluated). These findings suggest CUTS as a versatile tool for both autoregulated TDP-43 replacement and trans-regulatory therapeutic correction. We expanded on this concept in the discussion section of this revised manuscript. We also note that autoregulatory TDP-43 biosensor strategies have been proposed in related systems, including TDP-Reg, underscoring broader interest in self-regulated TDP-43 systems (Wilkins et al., 2024).

      (5) Clarified mechanism of TDP-43 5FL causing strong loss of function: TDP-43 5FL exhibits reduced RNA-binding capacity, and we previously showed that the lack of RNA binding promotes aberrant homotypic phase separation of TDP-43 (Mann et al., 2019). Expression of RNA-binding-deficient TDP-43 variants forms nuclear “anisomes” (Yu et al., 2021), which evidence suggests sequester endogenous TDP-43 protein into insoluble structures. We expanded on this in the results section of the revised manuscript.

      (6) Improved figure clarity and data presentation: To enhance clarity and organization, we maintained the main structure of the manuscript while reorganizing figures and improving data visualization. Some examples include:

      Figure 1: We revised the schematic layout for greater clarity and simplicity. The figure now focuses more specifically on the CUTS data, with additional data on the UNC13A-TS and CFTR-TS moved to Figure S1. To improve readability, titles were added to all schematic panels. Visual consistency was also improved by refining the color labelling for each sensor in Figures 1C and 1D and adjusting the corresponding bar graphs accordingly.

      Figure 2: We reorganized the figure to clearly distinguish between protein and mRNA analyses for greater clarity. In the revised layout, western blot quantifications of TDP-43 and CUTS (GFP) signals are shown in Figures 2D and 2E, respectively, while the corresponding qPCR analyses are presented in Figures 2H and 2I. Minor edits include removing the percentage knockdown and fold-change annotations from the graphs and incorporating these values into a mini-table in Figure S2E.

      The original Figures 2D and 2G were reincorporated as reference panels in Figure S2A–B, while new graphs showing CUTS protein-level changes as a function of TDP-43 knockdown were added (Figure S2C–D). We also incorporated new data showing the behavior of endogenous cryptic exons under low siTDP-43 treatment (Figure S2E).

      Figure 3: We added new data demonstrating the application of the CUTS system in detecting TDP-43 loss of function induced by stress conditions. Specifically, we show that sodium arsenite (NaAsO₂) treatment leads to TDP-43 functional impairment that is detectable by CUTS and supported by RT-PCR of endogenous cryptic exons (Figure 3D-I).

      Figure 5 and Figure S4: We introduced a new figure that demonstrates the effective application of the CUTS system in differentiated neuronal systems, thereby extending its usability to disease-relevant cell types.

      Figures S2A and 4B were edited to include the corresponding labels on the sides of each image for clarity. Supplementary Figure 2A was moved to Supplementary Figure 3A, while Figure 4B remains in its original configuration.

      We thank the reviewers again for their insightful critiques and helpful suggestions, which have enabled us to substantially improve the manuscript. Please find our detailed response to each review below:

      Reviewer #1 (Public review):

      Summary:

      The authors create an elegant sensor for TDP-43 loss of function based on cryptic splicing of CFTR and UNC13A. The usefulness of this sensor primarily lies in its use in eventual high throughput screening and eventual in vivo models. The TDP-43 loss of function sensor was also used to express TDP-43 upon reduction of its levels.

      Strengths:

      The validation is convincing: the sensor was tested in models of TDP-43 loss of function, knockdown, and models of TDP-43 mislocalization and aggregation. The sensor is susceptible to a minimal decrease of TDP-43 and can be used at the protein level, unlike most of the tests currently employed.

      Weaknesses:

      Although the LOF sensor described in this study may be a primary readout for high-throughput screens, ALS/TDP-43 models typically employ primary readouts such as protein aggregation or mislocalization. The information in the two following points would assist users in making informed choices.

      (1) Testing the sensor in other cell lines

      We thank the reviewer for raising this important point. In agreement with this suggestion, we generated ReN VM cell lines and used a neuroblastoma cell line model (BE(2)-C) expressing the TetOn3G CUTS system under a human synapsin I (hSYN1) promoter. In this construct the transactivator protein is under the control of a neuron-specific hSYN1 promoter, whereas the classical TetOn3G system uses a CMV-like promoter. Several studies have reported reduced activity or silencing of CMV- and PGK-driven transgenes in neurons. Therefore, for our neuronal experiments, we removed this promoter to generate a new version of the doxycycline-inducible CUTS system in which the Tet-On 3G transactivator is driven by the hSYN1 promoter and CUTS is expressed in response to doxycycline treatment. In this improved construct, we also replaced mCherry with mScarlet to enhance the fluorescent signal.

      To test this neuronal-adapted system, we established stable CUTS expression in undifferentiated BE(2)-C cells, a subclone of the SK-N-BE(2) neuroblastoma line that has been used to study TDP-43–dependent splicing function (Brown et al., 2022). This model can be differentiated into neuron-like cells within 10 days, as shown in Supplementary Figure 4A. Using this model, we confirmed that TDP-43 knockdown leads to robust activation of the CUTS system (Figure 5B-E). We additionally tested this in stable polyclonal ReN VM cells following differentiation into cortical-like neurons (Figure 5D, Figure S4B-C).

      (2) Establishing a correlation between the sensor's readout and the loss of function (LOF) in the physiological genes would be useful given that the LOF sensor is a hybrid structure and doesn't represent any physiological gene. It would be beneficial to determine if a minor decrease (e.g., 2%) in TDP-43 levels is physiologically significant for a subset of exons whose splicing is controlled by TDP43.

      We agree with the reviewer that correlating the sensor’s readout with physiological TDP-43 splicing targets is essential to validate its biological relevance. To this end, we complemented our sensor expression profile with endogenous cryptic exons (CEs) sensitive to TDP-43 depletion. We tested a panel of five physiological cryptic exons regulated by TDP-43 (LRP8, EPB41L4A, ARHGAP32, HDGFL2, and ACBD3). To address the reviewer’s concern, we performed RT-PCR on samples from the low-dose siTDP-43 experiment shown in Figure S2E.

      The endogenous CEs used in the panel were selected based on our own and others’ preliminary observations. Among these, HDGFL2 showed a particularly robust increase in cryptic exon inclusion at very low siTDP-43 concentrations (38 pM), while untreated samples showed almost no CE inclusion. This finding strongly supports a direct mechanism linking mild TDP-43 reduction to loss of physiological splicing control.

      (3) Considering that most TDP-LOF pathologically occurs due to aggregation and/or mislocalization, and in most cases the endogenous TDP-43 gene is functional but the protein becomes non-functional, the use of the loss of function sensor as a switch to produce TDP-43 and its eventual use as gene therapy would have to contend with the fact that the protein produced may also become nonfunctional. This would eventually be easy to test in one of the aggregation modes that were used to test the sensor. However, as the authors suggest, this is a very interesting system to deliver other genetic modifiers of TDP-43 proteinopathy in a regulated and timely fashion.

      We thank the reviewer for this thoughtful point and agree that in the disease-relevant context where endogenous TDP-43 is intact but TDP-43 function is lost due to mislocalization and/or aggregation, a re-supply of TDP-43 risks sequestration and loss of activity. In our manuscript, the CUTS-TDP43 module was presented as a control circuit proof-of-concept rather than a stand-alone approach: it demonstrates that CUTS can (i) sense LOF with high dynamic range and proportionality, and (ii) drive a payload under negative feedback such that total TDP-43 remains near baseline while partially rescuing a splicing readout (CFTR minigene) under knockdown conditions.

      Importantly, we evaluated CUTS in aggregation/mislocalization-prone contexts: ΔNLS, 5FL, and ΔNLS+5FL variants trigger CUTS activation (ref), allowing us to quantify LOF arising from these aggregation modes. This confirms that CUTS can operate precisely in the very settings where sequestration is likely to occur.

      To directly address the reviewer’s suggestion, in the revision we (i) clarify in the Discussion that CUTS-TDP43 is a circuit demonstration and not our proposed monotherapy in aggregation-dominant disease; and (ii) expand our therapeutic framing into two approaches:

      Knockdown-and-replacement: concurrently deplete aggregation-prone/endogenous pathologic TDP-43 species (i.e., mutant TDP-43) while using CUTS to re-deliver wild-type TDP-43 under autoregulation.

      Aggregation-independent correction: use of CUTS to deliver modifiers that bypass TDP-43 sequestration (e.g., downstream effectors or splicing correctors that restore LOF consequences without expressing TDP-43 itself).

      (4) I don't think the quantity of siRNA is directly proportional to the degree of TDP-43 knockdown/extent of TDP-43 loss. Therefore, to enhance the utility of the dose-response curves, I'd suggest using TDP-43 levels as the variable on the x-axis, rather than the amount of siRNA administered or even just adding a plot alongside the current plots would enable readers to quickly evaluate LOF response levels concerning the protein. While I understand that the sensitivity of Western blots for quantification might be why the authors have not created the graphs in this manner, having this information would be useful.

      We appreciate the reviewer’s insightful comment. As noted, in the original version of the graph, we incorporated the percentage of TDP-43 knockdown corresponding to each siTDP-43 concentration (indicated in red text). However, we agree that this format was not easy to interpret, given the amount of information presented. To address this, we generated two new plots in which the x-axis represents TDP-43 levels (percentage of remaining protein or mRNA), and the y-axis shows the fold change in CUTS signal measured by (i) TDP-43 protein pixel intensity and (ii) TDP-43 mRNA levels, respectively. These new plots are now included as Supplementary Figures 2C–D, which allow a clearer visualization of CUTS readout in relation to actual TDP-43 levels rather than siRNA dose. As the reviewer anticipated, the reason we did not originally present the data in this format was that at low siTDP-43 concentrations, the fold change is minimal and more difficult to quantify by Western blot. Nevertheless, we have now incorporated the revised plots to strengthen the interpretation of the dose–response relationship. Additionally, we experience batch effects across siRNA lots. We believe this revised format should enhance the clarity of the result.

      (5) p3 line 74: one of the reasons cited as a pitfall of using the endogenous cryptic exons exhibit variable responses to TDP-43 loss and may be cell type-specific. has the sensor been used in different cell lines?

      We tested the CUTS system in two differentiated neuronal cell models, BE(2)-C and ReN VM cells. The results are presented in Figure 5 and Figure S4 of the revised manuscript.

      (6) The order of the text describing 1A and 1B is confusing. The text starts describing the TS cassettes referring to 1A using the CUTS cassettes which haven't been introduced yet as an example. I'd suggest reorganising this section. The graph, always in 1A showing readout proportional to GFP should be taken out or highlighted in the figure legend that it is theoretical.

      We agree with the reviewer’s point. In the original schematic (Figure 1A), we included the CUTS system as an example to introduce the TS cassette design, since it contains the three possible sensor configurations. However, we recognize that this could be confusing. Therefore, we have removed the CUTS cassette from Figure 1A, along with the theoretical graph showing GFP readout proportional to the degree of TDP-43 LOF. In agreement with this change, we also restructured Figure 1. As the focus is the CUTS system, we have moved the Western blot and quantification of UNC13A-TS and CFTR-TS to Supplementary Figure 1.

      Reviewer #2 (Public review):

      Summary:

      The authors' goal is to develop a more accurate system that reports TDP-43 activity as a splicing regulator. Prior to this, most methods employed western blotting or qPCR-based assays to determine whether targets of TDP-43 were up- or down-regulated. The problem with those methods is sensitivity. This approach uses an ectopically delivered construct containing splicing elements from CFTR and UNC13A (two known splicing targets) fused to a GFP reporter. Not only does it report TDP-43 function well, but it operates at extremely sensitive TDP-43 levels, requiring only picomolar TDP-43 knockdown for detection. This reporter should supersede current TDP-43 activity assays; it is cost-effective, rapid, and reliable.

      Strengths:

      In general, the experiments are convincing and well designed. The rigor, number of samples and statistics, and gradient of TDP-43 knockdown were all viewed as strengths. In addition, the use of multiple assays to confirm the splicing changes was viewed as complementary (i.e., PCR and GFP fluorescence), adding additional rigor. The final major strength I'll add is the very clever approach to tether TDP-43 to the loss of function cassette such that when TDP-43 is inactive it would autoregulate and induce wild-type TDP-43. This has many implications for the use of other genes, not just TDP-43, but also other protective factors that may need to be re-established upon TDP-43 loss of function.

      Weaknesses:

      (1) Admittedly, one needs to initially characterize the sensor and the use of cell lines is an obvious advantage, but it begs the question of whether this will work in neurons. Additional future experiments in primary neurons will be needed.

      We thank the reviewer for highlighting the importance of validating the sensor in neuronal models, given the central role of TDP-43 dysfunction in ALS/FTD and related neurodegenerative disorders. While initial characterization in established cell lines provides experimental control and scalability, we agree that demonstrating functionality in neuronal systems is essential. To address this, we adapted the CUTS platform for neuronal application by incorporating the human synapsin-1 (hSYN1) promoter into the Tet-On 3G system to enable inducible, neuron-specific expression. We validated this configuration in differentiated BE(2)-C cells (Figures 5A-C, S4A-C), where CUTS retained robust responsiveness to TDP-43 perturbation. In parallel, we generated stable CUTS-expressing ReN VM neural progenitor cells and differentiated them for three weeks prior to functional assessment (Figures 5A-C, S4A-C). In both neuronal models, CUTS was functional and responsive to TDP-43 siRNA. We are currently optimizing promoter selection and expression paradigms for fully differentiated iPSC-derived neuronal models, which will be the subject of future studies.

      (2) The bulk analysis of GFP-positive cells is a bit crude. As mentioned in the manuscript, flow sorting would be an easy and obvious approach to get more accurate homogenous data. This is especially relevant since the GFP signal is quite heterogeneous in the image panels, for example, Figure 1C, meaning the siRNA is not fully penetrant. Therefore, stating that 1% TDP-43 knockdown achieves the desired sensor regulation might be misleading. Flow sorting would provide a much more accurate quantification of how subtle changes in TDP-43 protein levels track with GFP fluorescence.

      We thank the reviewer for this thoughtful suggestion. We agree that flow cytometry and sorting of GFP-positive populations would provide a higher-resolution, single-cell–level relationship between TDP-43 abundance and sensor output. Such an approach would reduce heterogeneity arising from incomplete siRNA penetrance and allow more precise quantification of how incremental changes in TDP-43 protein levels track with GFP fluorescence. In the present study, our goal was to establish proof-of-principle functionality of the CUTS circuit and to demonstrate that graded TDP-43 depletion produces a proportional sensor response at the population level. While GFP signal heterogeneity is visible in imaging panels, we hypothesize that this variability likely reflects known differences in siRNA uptake and transfection efficiency rather than instability of the circuit itself. Importantly, bulk measurements consistently demonstrated dose-dependent sensor regulation across independent experiments, supporting the robustness of the system despite cellular heterogeneity. Furthermore, we were able to quantify CUTS activation in HeLa TARDBP<sup>-/-</sup> cells. We also note that CUTS was developed as a practical tool for rapid assessment of TDP-43 LOF in standard laboratory settings. Although flow cytometry increases resolution, the ability to detect functional perturbation using bulk fluorescence measurements supports the utility of the system for routine and high-throughput applications.

      We agree that flow cytometry would provide a more refined analysis of the dynamic range and sensitivity of CUTS, particularly for defining thresholds such as the minimal TDP-43 knockdown required for measurable activation. Specifically, we have implemented FACS of CUTS-expressing cells in a parallel study in which we are conducting a CRISPR knockout screen to identify modifiers of TDP-43 splicing function. For this, we combine TDP-43 knockdown with FACS to stratify cells based on CUTS activation. This strategy enables direct evaluation of the relationship between the extent of TDP-43 LOF and CUTS sensor activation. These analyses are ongoing and will provide a more quantitative analysis linking TDP-43 depletion to CUTS activation, addressing the reviewer’s concern regarding heterogeneity in bulk measurements. We plan to include this work in a future study.

      (3) Some panels in the manuscript would benefit from additional clarity to make the data easier to visualize. For example, Figure 2D and 2G could be presented in a more clear manner, possibly split into additional graphs since there are too many outputs.

      We thank the reviewer for this suggestion. In response, we have split the graphs previously shown in Figures 2D and 2G to improve clarity, as we agree that these panels contained an extensive amount of data. Specifically, we split Figure 2D into two separate graphs showing TDP-43 and GFP pixel intensity from Western blots on the Y-axis, plotted against low siTDP-43 treatment on the X-axis. These data are now shown as Figure 2D and Figure 2E in the new manuscript.

      Furthermore, we split Figure 2G into two graphs showing the fold change of mRNA for TDP-43 and for the CUTS cryptic exon, each plotted against low siTDP-43 treatment on the X-axis. These data are now shown as Figure 2H and Figure 2I in the new manuscript. We have maintained the previous graphs in Supplementary Figure 2 to preserve the full dataset for reference.

      (4) Sup Figure 2A image panels would benefit from being labeled; it's difficult to tell what antibodies or fluorophores were used. Same with Figure 4B.

      We appreciate the reviewer’s careful observation. In both figures, we are showing mCherry and GFP signals. In the revised version, we have added the corresponding labels to the side of each image for clarity. Note that Sup Figure 2A has been moved and is now Sup Figure 3A, while Figure 4B remains in its original position.

      (5) Figure 3 is an important addition to this manuscript and in general is convincing, showing that TDP-43 loss of function mutants can alter the sensor. However, there is still wild-type endogenous TDP-43 in these cells, and it's unclear whether the 5FL mutant is acting as a dominant negative to deplete the total TDP-43 pool, which is what the data would suggest. This could have been clarified.

      The TDP-43 5FL variant exhibits reduced RNA-binding capacity, and we previously demonstrated that impaired RNA binding promotes aberrant homotypic phase separation of TDP-43. Consistent with this mechanism, expression of RNA-binding–deficient TDP-43 variants induces the formation of nuclear “anisomes” which have been shown to sequester endogenous TDP-43 into insoluble fractions via dominant-negative mechanisms (Cohen et al., 2015; Keating et al., 2023; Mann et al., 2019; Yu et al., 2021). These findings support a model in which disruption of RNA engagement alters TDP-43 biophysical behavior and promotes functional depletion through self-association. We have expanded this mechanistic explanation in the Results section of the revised manuscript to better contextualize the behavior of the 5FL construct and its impact on endogenous TDP-43.

      (6) Additional treatment with stressors that inactivate TDP-43 could be tested in future studies.

      We appreciate this suggestion and agree with this important point. Due to the lack of methods to directly induce endogenous TDP-43 aggregation and loss of function, the use of stressors has become a partial solution to address this issue. In line with this, our group has tested several stressors in follow-up research, including sodium arsenite (NaAsO₂), puromycin, KCl, MG132, sorbitol, and tunicamycin, using HEK cells expressing the CUTS system (Xie et al., 2025). We were able to show a dose-response relationship in relative GFP intensity under these conditions, with sodium arsenite showing the strongest effect, consistent with previous reports (Huang et al., 2024). To provide additional relevant findings in the current manuscript, we expanded this analysis by testing sodium arsenite in the CUTS system while also including endogenous cryptic exons. We therefore added a new figure showing the effect of sodium arsenite on the CUTS system, including GFP intensity measurements, qPCR using CUTS cryptic exon primers, and three endogenous cryptic exon reporters (ATG4B, GPSM2, and KCNQ2).

      Overall, the authors definitely achieved their goals by developing a very sensitive readout for TDP-43 function. The results are convincing, rigorous, and support their main conclusions. There are some minor weaknesses listed above, chief of which is the use of flow sorting to improve the data analysis. But regardless, this study will have an immediate impact for those who need a rapid, reliable, and sensitive assessment of TDP-43 activity, and it will be particularly impactful once this reporter can be used in isolated primary cells (ie neurons) and in vivo in animal models. Since TDP-43 loss of function is thought to be a dominant pathological mechanism in ALS/FTD and likely many other disorders, having these types of sensors is a major boost to the field and will change our ability to see sub-threshold changes in TDP-43 function that might otherwise not be possible with current approaches.

      (7) Regarding the methods, they seem a bit sparse and would benefit from additional detail. For example, I do not see a section in the methods where microscopy images were quantified (%GFP positive cells for example). This information is important and is lacking in the current form.

      We thank the reviewer and have added the following information to the Methods section: For live imaging quantification, we measured the mean GFP signal intensity for each group. The values were averaged, and the fold change was calculated and plotted. For immunofluorescent imaging, we first created maximum intensity projection images. We then applied masks to the GFP, mCherry, and Hoechst signals. By overlapping the GFP and mCherry signals, we identified the number of GFP-positive cells. Similarly, by overlapping the mCherry signal with the Hoechst mask, we identified the CUTS-expressing cells. We then calculated the ratio of GFP-positive cells to CUTS-expressing cells and plotted it as a percentage of GFP-positive cells. All analyses were performed using the Nikon NIS software. This information is included in the methods of the revised manuscript.
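      The quantification steps above can be sketched as follows. This is an illustration under stated assumptions, not the Nikon NIS workflow itself: segmented objects are assumed to have already been reduced to per-object boolean mask overlaps, and the function names and the `cells` layout are hypothetical.

```python
def fold_change(group_means, control_mean):
    """Fold change of each group's mean GFP intensity over the control mean."""
    if control_mean <= 0:
        raise ValueError("control mean must be positive")
    return [m / control_mean for m in group_means]


def percent_gfp_positive(cells):
    """Percentage of GFP-positive cells relative to CUTS-expressing cells.

    cells: list of dicts with boolean mask-overlap flags per object:
      'hoechst' (nuclear mask), 'mcherry' (CUTS expression), 'gfp' (sensor).
    Following the description in the text:
      GFP-positive cell    = overlaps the GFP and mCherry masks
      CUTS-expressing cell = overlaps the mCherry and Hoechst masks
    """
    gfp_positive = sum(1 for c in cells if c["gfp"] and c["mcherry"])
    cuts_expressing = sum(1 for c in cells if c["mcherry"] and c["hoechst"])
    if cuts_expressing == 0:
        raise ValueError("no CUTS-expressing cells found")
    return 100.0 * gfp_positive / cuts_expressing


# Hypothetical example field of view: five segmented objects.
cells = [
    {"hoechst": True, "mcherry": True, "gfp": True},
    {"hoechst": True, "mcherry": True, "gfp": False},
    {"hoechst": True, "mcherry": True, "gfp": False},
    {"hoechst": True, "mcherry": True, "gfp": False},
    {"hoechst": True, "mcherry": False, "gfp": False},
]
percent = percent_gfp_positive(cells)
folds = fold_change([10.0, 20.0, 40.0], control_mean=10.0)
```

      Because the two counts come from different mask overlaps, the ratio is only bounded by 100% if every GFP/mCherry-positive object also carries a Hoechst signal; that is an assumption of this sketch.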

      Reviewer #3 (Public review):

      The DNA and RNA binding protein TDP-43 has been pathologically implicated in a number of neurodegenerative diseases including ALS, FTD, and AD. Normally residing in the nucleus, in TDP-43 proteinopathies, TDP-43 mislocalizes to the cytoplasm where it is found in cytoplasmic aggregates. It is thought that both loss of nuclear function and cytoplasmic gain of toxic function are contributors to disease pathogenesis in TDP-43 proteinopathies. Recent studies have demonstrated that depletion of nuclear TDP-43 leads to loss of its nuclear function characterized by changes in gene expression and splicing of target mRNAs. However, to date, most readouts of TDP-43 loss of function events are dependent upon PCR-based assays for single mRNA targets. Thus, reliable and robust assays for detection of global changes in TDP-43 splicing events are lacking. In this manuscript, Xie, Merjane, Bergmann and colleagues describe a biosensor that reports on TDP-43 splicing function in real time. Overall, this is a well described unique resource that would be of high interest and utility to a number of researchers. Nonetheless, a couple of points should be addressed by the authors to enhance the overall utility and applicability of this biosensor.

      (1) While the rationale for selecting UNC13A CE as the reporting CE species is understood given the relevance to disease, could the authors please comment on whether other CE sequences would behave similarly or as robustly? This is particularly critical given the multitude of different splicing changes that can occur as a result of TDP-43 loss of function (ie cryptic exons of differing sensitivity, skiptic exons, premature polyadenylation).

      We thank the reviewer for this question regarding generalizability beyond the UNC13A CE. While UNC13A was selected due to its strong disease relevance and well-characterized sensitivity to TDP-43 loss-of-function (LOF), our platform is not intrinsically restricted to this sequence. In the manuscript, we directly compared three architectures: UNC13A-TS, CFTR-TS, and the combined CUTS sensor incorporating additional UG motif optimization. Under matched conditions in stable HEK293 lines, CUTS demonstrated superior specificity and sensitivity, exhibiting near-zero baseline activity and a proportional, log-linear response across low-dose siTDP-43 (38–1200 pM) (Figures 1–2). Importantly, this head-to-head comparison demonstrates that sensor performance can be engineered and optimized beyond a single CE species.
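      One way to check a log-linear dose response is an ordinary least-squares fit of the readout against log10 of the siRNA dose. The sketch below assumes that "log-linear" means the readout is linear in log10(dose); the function name is hypothetical and the doses and responses are synthetic values chosen for illustration, not measurements from the study.

```python
import math


def fit_log_linear(doses_pm, responses):
    """Least-squares fit of response = slope * log10(dose) + intercept."""
    if len(doses_pm) != len(responses) or len(doses_pm) < 2:
        raise ValueError("need at least two matched dose/response pairs")
    xs = [math.log10(d) for d in doses_pm]  # doses must be positive
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(responses) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("doses must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, responses))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic example: a response that rises by 2 units per ten-fold dose.
slope, intercept = fit_log_linear([10.0, 100.0, 1000.0], [3.0, 5.0, 7.0])
```

      A roughly constant slope across the tested range, with small residuals, is what a proportional log-linear response would look like under this assumption.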

      TDP-43 LOF is known to induce a spectrum of RNA processing defects, including cryptic exons with differing sensitivities and cell-type dependence, premature polyadenylation events (e.g., STMN2), and, under conditions of excess nuclear TDP-43, exon skipping (“skiptic exons”). This diversity supports the concept that alternative CE elements, or other TDP-43-regulated RNAs, can be incorporated into the same sensor backbone and tuned for specific biological scenarios (cell type, specific stress responses, etc.). Consistent with this, the recently described TDP-REG system (Wilkins et al., 2024) used designed and AI-generated de novo CE sequences to express reporters or gene payloads, and screened multiple candidates to identify the appropriate RNA elements required for this response. These findings demonstrate that CE sequences beyond UNC13A can serve as robust TDP-43 sensing elements when optimized. Our results complement this work by demonstrating that CUTS achieves tight baseline control and a steep dynamic range (>110,000-fold induction over baseline in HEK293 cells), while maintaining compatibility across both non-neuronal and neuronal model systems, as shown in the revised manuscript.

      In the revised manuscript, we show direct comparisons indicating that CUTS outperforms single-CE sensors such as UNC13A-TS and CFTR-TS under identical conditions. This is consistent with independent work from other groups showing that alternative CE sequences can be engineered into effective sensors, depending on the paradigm and model system. We have clarified this in the revised Discussion and now note that CUTS is adaptable to alternative CE inserts.

      (3) Could the authors provide evidence of the utility of their biosensor in disease relevant systems that do not rely on TDP-43 KD? For example, does this biosensor report on TDP-43 loss of function in C9orf72 iPSNs in a time-dependent manner? Alternatively, groups have modeled TDP-43 proteinopathy in wildtype iPSNs via MG132 treatment.

      We thank the reviewer for this important suggestion. We agree that demonstrating CUTS responsiveness in disease-relevant models independent of artificial TDP-43 knockdown would further strengthen its translational relevance. In the current study, our primary objective was to establish the sensitivity, dynamic range, and autoregulatory properties of the CUTS circuit under controlled perturbation of TDP-43 levels. siRNA-mediated depletion provides a reliable approach to establish the relationship between graded TDP-43 LOF and the CUTS sensor sensitivity/specificity. That said, CUTS is designed to detect functional TDP-43 loss irrespective of the upstream cause. As the reviewer notes, disease-relevant systems, such as C9orf72 iPSC-derived neurons and proteotoxic stress paradigms (e.g., MG132-induced impairment of TDP-43 nuclear function), are important for future studies. We are currently evaluating CUTS in iPSC-derived neuronal models of TDP-43 proteinopathy, and are still optimizing the induction system, promoters, and timing. It should be noted that C9orf72 iPSC neurons do not exhibit TDP-43 LOF using standard differentiation protocols. Regarding pharmacological stress, we have shown that acute sodium arsenite treatment can activate CUTS (Figure 3). In a concurrent study under revision, we show that MG132 similarly causes TDP-43 LOF and CUTS activation (Xie et al., 2025). Notably, none of these stressors induces complete nuclear loss of TDP-43; instead, they show nuclear TDP-43 retention or modest mislocalization. This suggests that TDP-43 LOF may also result from nuclear redistribution and dysfunction under these stress conditions, rather than from complete nuclear loss. We look forward to presenting these ongoing studies in the future.

      References

      Brown A-L, Wilkins OG, Keuss MJ, Kargbo-Hill SE, Zanovello M, Lee WC, Bampton A, Lee FCY, Masino L, Qi YA, Bryce-Smith S, Gatt A, Hallegger M, Fagegaltier D, Phatnani H, NYGC ALS Consortium, Newcombe J, Gustavsson EK, Seddighi S, Reyes JF, Coon SL, Ramos D, Schiavo G, Fisher EMC, Raj T, Secrier M, Lashley T, Ule J, Buratti E, Humphrey J, Ward ME, Fratta P. 2022. TDP-43 loss and ALS-risk SNPs drive mis-splicing and depletion of UNC13A. Nature 603:131–137. doi:10.1038/s41586-022-04436-3

      Cohen TJ, Hwang AW, Restrepo CR, Yuan C-X, Trojanowski JQ, Lee VMY. 2015. An acetylation switch controls TDP-43 function and aggregation propensity. Nat Commun 6:5845. doi:10.1038/ncomms6845

      Huang W-P, Ellis BCS, Hodgson RE, Sanchez Avila A, Kumar V, Rayment J, Moll T, Shelkovnikova TA. 2024. Stress-induced TDP-43 nuclear condensation causes splicing loss of function and STMN2 depletion. Cell Rep 43:114421. doi:10.1016/j.celrep.2024.114421

      Keating SS, Bademosi AT, San Gil R, Walker AK. 2023. Aggregation-prone TDP-43 sequesters and drives pathological transitions of free nuclear TDP-43. Cell Mol Life Sci 80:95. doi:10.1007/s00018-023-04739-2

      Mann JR, Gleixner AM, Mauna JC, Gomes E, DeChellis-Marks MR, Needham PG, Copley KE, Hurtle B, Portz B, Pyles NJ, Guo L, Calder CB, Wills ZP, Pandey UB, Kofler JK, Brodsky JL, Thathiah A, Shorter J, Donnelly CJ. 2019. RNA Binding Antagonizes Neurotoxic Phase Transitions of TDP-43. Neuron 102:321-338.e8. doi:10.1016/j.neuron.2019.01.048

      Wilkins OG, Chien MZYJ, Wlaschin JJ, Barattucci S, Harley P, Mattedi F, Mehta PR, Pisliakova M, Ryadnov E, Keuss MJ, Thompson D, Digby H, Knez L, Simkin RL, Diaz JA, Zanovello M, Brown A-L, Darbey A, Karda R, Fisher EMC, Cunningham TJ, Le Pichon CE, Ule J, Fratta P. 2024. Creation of de novo cryptic splicing for ALS and FTD precision medicine. Science 386:61–69. doi:10.1126/science.adk2539

      Xie L, Zhu Y, Hurtle BT, Wright M, Robinson JL, Mauna JC, Brown EE, Ngo M, Bergmann CA, Xu J, Merjane J, Gleixner AM, Grigorean G, Liu F, Rossoll W, Lee EB, Kiskinis E, Chikina M, Donnelly CJ. 2025. Context-dependent Interactors Regulate TDP-43 Dysfunction in ALS/FTLD. BioRxiv. doi:10.1101/2025.04.07.646890

      Yu H, Lu S, Gasior K, Singh D, Vazquez-Sanchez S, Tapia O, Toprani D, Beccari MS, Yates JR, Da Cruz S, Newby JM, Lafarga M, Gladfelter AS, Villa E, Cleveland DW. 2021. HSP70 chaperones RNA-free TDP-43 into anisotropic intranuclear liquid spherical shells. Science 371. doi:10.1126/science.abb4309.

    1. Author Response:

      The following is the authors’ response to the previous reviews

      Public Review:

      Reviewer #1 (Public review):

      The weaknesses are in the clarity and resolution of the data that form the basis of the model. In addition to general whole-embryo morphology that is used as evidence for CE defects, two forms of data are presented, co-expression and IP, as well as a strong reliance on IF of exogenously expressed proteins. Thus, it is critical that both forms of evidence be very strong and clear, and this is where there are deficiencies: 1) for the vast majority of experiments, general morphology and LWR were used as evidence of effects on convergent extension movements rather than Keller explants or actual cell movements in the embryo; 2) the microscopy would benefit from super-resolution imaging, since in many cases the differences in protein localization are not very pronounced; 3) the IP and Western analysis data often show very subtle differences, and in some cases differences are not apparent.

      Major points.

      (1) Assessment of CE movement

      The authors conducted an analysis of the subcellular localization of PCP core proteins, including Vangl2, Pk, Fz, and Dvl, within animal cap explants (ectodermal explants). The authors primarily used the length-to-width ratio (LWR) to evaluate CE movement as a basis for their model. However, LWR can be influenced by multiple factors and is not sufficient to directly and clearly represent CE defects. While the authors showed that Prickle knockdown suppresses animal cap elongation mediated by Activin treatment, they did not test their model using standard assays such as animal cap elongation or dorsal marginal zone (DMZ) Keller explants. Furthermore, although various imaging analyses were performed in Wnt11-overexpressing animal caps and DMZ explants, the Wnt11-overexpressing animal caps did not undergo CE movement. Given that this study focuses on the molecular mechanisms of Vangl2 and Ror2 regulation of Dvl2 during CE, the model should be validated in more appropriate tissues, such as DMZ explants.

      (2) Overexpression conditions

      Another concern is that most analyses were performed with overexpression conditions. PCP core proteins (Vangl2, Pk, Dvl, and Fz receptors) are known to display polarized subcellular localization in both the neural epithelium and DMZ explants (Ref: PCP and Septins govern the polarized organization of the actin cytoskeleton during convergent extension, Current Biology, 2024). However, in this study, overexpressed PCP core proteins failed to show polarized localization. Previous studies, such as those from the Wallingford lab, typically used 10-30 pg of RNA for PCP core proteins, whereas this study injected 100-500 pg, which is likely excessive and may have created artificial conditions that confound the imaging results.

      (3) Subtle and insufficient effects

      Several of the reported results show quite modest changes in imaging and immunoprecipitation analyses, which are not sufficient to strongly support the proposed molecular model. For example, most Dvl2 remained localized with Fz7 even under Vangl2 and Pk overexpression (Fig. 4). Similarly, Wnt11 overexpression only slightly reduced the association between Vangl2 and Dvl2 (Sup. Fig. 8), and the Ror2-related experiments also produced only subtle effects (Fig. 8, Sup. Fig. 15).

      We thank reviewer 1 for the careful reading of our revised manuscript and the additional constructive criticisms. Since the two reviewers had divergent opinions on our revised manuscript, we think that it might be more productive to request a Version of Record at this point, and have our proposed model debated/tested by others in the field. We will keep the reviewer’s suggestions in mind while designing ongoing studies. We would like to address the criticisms collectively below:

      (1) The primary goal of our current manuscript is to build a mechanistic model for non-canonical Wnt signaling by elucidating the functional relationships between Dvl, Vangl, PK and Ror during CE. Each has been studied extensively in prior literature using DMZ-injected embryos, and DMZ, Keller and animal cap explants, so there is little doubt that the reduced LWR following their over-expression or knockdown in DMZ is due to disruption of CE. In the context of the current manuscript, we primarily performed their co-injections in different combinations to differentiate synergistic vs. antagonistic relationships, and in the majority of cases we relied on epistasis to draw conclusions (e.g. Fig. 1; Fig. 2h, i; Suppl. Fig. 6; Suppl. Fig. 14). Nevertheless, we did follow the reviewer’s suggestion and used animal cap elongation as an additional assay to confirm that Pk and Vangl2 did synergize to disrupt CE, and that their synergy could be blocked by Dvl2 co-overexpression; the new data are added to Fig. 1 (Fig. 1h, h’). Therefore, given the prior literature, our new animal cap explant data, and the specific scope of our current study, we feel that the LWR measurement is a reasonable assay to determine CE phenotype in this manuscript. We fully agree with the reviewer that our model will need to be tested at the cellular level through live imaging of DMZ explants; it is indeed the direction of our future study, but is beyond the scope of the current manuscript.

      (2) A salient feature of non-canonical Wnt signaling is that loss or over-expression of any component can often cause identical CE defects at the tissue/embryo level. We used many co-injection experiments to demonstrate that this is due, at least in part, to a counterbalance between Dvl/Ror and Vangl/PK (e.g. Fig. 1; Fig. 2h, i; Suppl. Fig. 6; Suppl. Fig. 14). It is in this context that we planned the imaging and biochemical experiments to determine the possible molecular mechanisms underlying their functional interaction, and we feel that the moderate over-expression used is reasonable in this case for us to build the first integrated model. We do plan to test our model using lower expression in the future. To acknowledge the limitation of our study, we also added the following sentences in the Discussion:

      “We acknowledge, however, that our model explains primarily the potential molecular actions underlying the regulation of CE at the tissue level. Whether and how our model may explain the cellular behavior during CE, such as polarized remodeling of cell junction or extension of cell protrusions, will require further study.”

      (3) The Wnt11-induced reduction of Dvl2-Vangl2 co-IP (Suppl. Fig. 8, 15) may be moderate, but is statistically significant and reproducible, and we have reported similar findings in two other publications (DOI: 10.1093/hmg/ddx095; DOI: 10.1038/s41467-025-57658-0). Given the limitations of co-IP, we had to rely on high-level over-expression to make the experiments feasible. We are building proximity-based assays such as NanoBRET, and plan to verify the result with lower-level expression in the future.

      Reviewer #2 (Public review):

      We thank the reviewer for the encouraging comments, and the suggestion to clarify the description related to Suppl. Fig. 15. We revised the text according to the reviewer’s suggestion, and added Suppl. Fig. 16 to further examine the effect of Ror2 knockdown on the steady-state interaction between Dvl2 and Vangl2 using an imaging approach.

    1. I have been a vegetarian for more than twenty years, which I once thought exempted me from the violence that accompanies the securing of

      Unfortunately, we are animals. We don't live off the sun's rays and water and simply kill out of competition for non-living resources, we eat other living things. Jains put great effort into not killing living things (don't eat root vegetables for example), but that severely impacts their lives.

      Being vegan I have a couple of ways I think about the violence of my life. Mainly, I honestly don't think it has changed MY life much at all to be vegan, yet it has changed the lives of the many animals impacted by eating animal products regularly.

      * From an energy perspective, eating plants takes fewer lives simply because the animal I may eat had to eat something as well, and energy is lost as it goes through that cycle of eating. This is unchangeable right now.
      * The difficulties with being vegan aren't really because of the lifestyle itself; they are because of greater society. Society allows me to live a vegan lifestyle, in that I can easily get the nutrients I need from the grocery store's options (there is an abundance of food). Society also makes it difficult to be vegan because most available dishes and processed foods use animal products unnecessarily; it is simply the dominant way of living that perpetuates itself. I don't view that inconvenience as important to me, because it is simply a structural problem.
      * The Jain lifestyle at its most extreme kind of consumes one's life. Not being able to take a step without brushing potential bugs out of the way on the ground makes it difficult to merely exist. Perhaps it is the way of living that reduces suffering the most, but at what cost to you? Veganism doesn't require so much change in ways of living, just choices.


    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1

      Figure 1D: It would be useful to indicate the number of embryos analyzed for these experiments (n = ?).

      Number of embryos now included in figure legend

      Figure 3B: The control condition for gcl⁻/⁻; ras-RNAi is labeled as "EV". This terminology (presumably "empty vector") is not defined in either the text or the figure legend. In addition, the magenta channel for the Ras-G37 condition appears to be flipped horizontally.

      We replaced "EV" with “-” in the figure and figure legend

      Page 7: The text states that "Ras-C40 activates the PI3K pathway," whereas the figure depicts Ras-C40 as activating the RalA pathway. This discrepancy could be confusing for the reader and should be corrected.

      The diagram has been corrected

      Figures 4 and 5: To facilitate interpretation, it may be helpful to include a schematic of the PI3K complex indicating the different subunits used in the study, along with information (potentially color-coded) about whether each construct primarily acts as an activator or inhibitor of PI3K function.

      Figure 4E and Figure 5E were added

      Figure 4A and 4B: For clarity and consistency with the text, the panels (and corresponding plots) for dp110-WT and dp110-CAAX could be placed before those for dp110-D954A and dp110-ΔRBD.

      Order of constructs was rearranged

      Figure 5C: The term "p60-TCEp3," which appears to correspond to the germ plasm-targeted p60-WT construct, is not defined in either the figure legend or the main text.

      Clarification was added to the text (p.11, line 225)

      Page 12: The reference "(Fig. S1A, Movie 1)" should be corrected to "(Fig. S2A, Movie 1)."

      Corrected

      Page 13: There is a missing word in the sentence "the biosensor appeared to be enrich to...", which should be corrected to "enriched."

      Corrected

      Figure 7A: Although the data presented are interesting and ultimately support the authors' conclusion that Torso regulates PIP3 levels, the results are somewhat counter-intuitive and may be confusing for readers. The authors might consider moving this panel to the Supplementary Figures. In addition, it could be informative to include PIP3 measurements for gcl⁻/⁻ (and possibly gcl⁺/⁻) pole buds in Figure 7B, as PIP3 appears particularly enriched in these conditions compared to wild type.

      We agree that the findings in the early embryos were confusing at first, but we prefer to keep them in the main figure to demonstrate the changes in PIP3 distribution in torso mutants. We now provide a possible explanation for these findings (p13 line 270-). The differences are quite clear in the older embryos and in the measurements shown in 7B-D. Pole bud measurements for gcl-/- and gcl+/- are shown in figure 6 E-G.

      Reviewer #2

      Fig. legends to 1C and 1D are swapped.

      Corrected

      Why is csw not necessary for PGC formation? It acts upstream of Ras. This is not discussed.

      We now highlight this point in the text and refer to studies on the sevenless kinase, which suggested a similar position of Csw parallel to or downstream of Ras (page 6 line 107-).

      Fig 3C. Consider changing the order of the ras-variants used: S35, G37, C40 instead of S35, C40, G37.

      We changed the schematic in Figure 3C, which should make the order of Ras variants more intuitive.

      Fig 4A, B: Consider changing the order of the panels. Control, dp110-wt, dp110-CAAX, dp110-D954A, dp110-deltaRBD.

      Order of constructs was rearranged

      Fig S4 is mentioned in the text before S2 and S3. Consider changing the suppl. figure order.

      Order of supplementary figures was rearranged

      Page 12: Fig S1 A does not show PIP2 dynamics. Movie 1 is not available to this reviewer. The authors most likely refer to fig. S2.

      Movie 1 was uploaded and figure calls were corrected

      Page 13, 1st para: Why do the authors use gcl heterozygous embryos to look at PIP3 and PIP2? Particularly so when they report later in the MS that gcl+/- behave differently to wt controls in terms of PIP3 levels (Fig. 7C). By looking at gcl+/+, they might find that now PIP2 levels are different in gcl mutant embryos or that the differences between PIP3 levels in +/+ and -/- are larger than compared with +/-.

      Since gcl+/- embryos form the same number of PGCs as WT but show a statistically significant increase in PI3K activity when comparing membrane to cytoplasm staining intensity, we favor using gcl+/- embryos, as these embryos may represent a more sensitive test for PIP2 and PIP3 levels.

      Pages 15 and 16: revise figure calls in the text.

      Figure calls were revised

      M+M: How were gcl+/- and gcl-/- embryos identified?

      Since all genetic manipulations in this study alter the maternal contribution to the embryo, we use the term ‘mutant’ embryos to refer to the maternal genotype (indicated on page 3 line 33 and more clearly stated in material and methods and reagent table). Embryos derived from mothers of a specific maternal genotype are all identical, thus we can easily distinguish between embryos derived from homozygous mutant mothers (gcl-/-) or heterozygous mutant mothers (gcl-/+). In the reagents table we include the precise genotype description. “CyO” refers to the balancer chromosome commonly used to identify heterozygotes on the second chromosome. Flies with the CyO balancer have curly wings.

      Reviewer #3

      Figure 1B: The authors describe that embryos with OptoSos still form buds which protruded from the cortex, but PGCs largely fail to cellularize (described in pg. 5). I'm not sure what they meant by "fail to cellularize" as this is not obvious to me when looking at the figure. The authors should describe how they know it's cellularized in the controls and not in the OptoSos or change the wording to "suggesting a failure to cellularize".

      We used the word ‘protruded’ to describe our live observations. PGCs were quantified in fixed embryos, immunostained with anti-Vasa antibody to count Vasa-positive cells (Fig 1C and D). We observe a lack of Vasa-positive PGCs only in the light-activated OptoSos condition.

      Fig. 1B, lines 4-5: at what stage are these embryos? Cycle 9? Cycle 14? Both?

      Nuclear cycles of embryos for each panel are noted on the left side of each panel

      Fig. 4A: add dp110-CAAX results to Results section

      dp110-CAAX results are included in the Results section (p.9. line 177)

      Figure 5C: The hyper-clustered phenotype they describe is hard to visualize in this figure (described in pg. 11). The authors should describe what is meant by "hyper-clustered".

      We agree and re-worded the description of this observation to be clearer, page 11, line 226-.

      Figure 7: When comparing Fig. 7A and 7B torsoHH/WK images, we can see that in Fig. 7A that PIP3 pattern changes such that PIP3 is now at the most posterior end where PGC will eventually form (compared to control that has low PIP3 in this region), but then in Fig. 7B they are looking at the buds and they say PIP3 levels decrease, which does not correspond to Fig. 7A. Are these simply different stages and PIP3 levels change over time (looking at Fig. 7C, PIP3 does not seem to change a lot over time)?

      The figure legend now states more clearly that embryos were of different ages. We also explain in the text the apparent discrepancy in the patterns before and during budding (page 13, line 266). The time points in figure 7C span nuclear cycle 10, not earlier (page 14, line 274). By measuring membrane to cytoplasmic distribution, a more accurate comparison is possible at this stage.

      p. 5, line 5: "Optosos" is written "OptoSos" elsewhere (suggest using OptoSos throughout)

      Corrected

      Is it possible that inhibition of myosin II recruitment is due to conversion of PIP2 -> PIP3, thus loss of PIP2, or is it that myosin is specifically recruited to regions where PIP2 is high? This seems like a point that should be added to the discussion.

      This point is now discussed on page 20, line 403

      p. 5, line 6: suggest adding a comma after "Ras" for clarity

      Corrected

      p. 5, last line: the genotype is "w^1118" (with ^ indicating a superscript), not "w^-1118", and is italicized (this should be corrected throughout)

      Corrected

      p. 6, line 2: replace "cellularizing" with "cellularization"

      Corrected

      p. 6, lines 11-13: Where is it shown that knockdown of csw, dsor1 and rolled did not restore PGC formation? The data are not present in Fig. 2C (could include in supp fig?)

      We added these data as Supplementary figure 1

      p. 7, line 1: replace "interfere" with "interferes"

      Corrected

      p. 7, last three lines: what is stated here, "Ras-G37 [activates] both the RalA and the PI3K pathways, and Ras-C40 activates the PI3K pathway" is not consistent with what is diagrammed in Fig. 3C, where Ras-C40 is indicated as activating RalA (please correct either the text or the diagram)

      We apologize and corrected the figure

      p. 11, lines 1-2: the Pi3K21B gene and transcript should be italicized (note that Pi3K21B is the official gene name on FlyBase)

      Gene name was italicized

      p. 11, lines 6-10: it might be helpful to explain how the p60 construct was overexpressed (current lines 9-10) before describing the results (current lines 7-8)

      Clarification on p60 construct was added to p.11, line 215-

      p. 12, paragraph 2, line 2: the PIP2 biosensor should be written as "PLCgamma[PH]:mCherry" throughout, not "PLCy[PH]:mCherry"; this should be changed in the figures as well as the text (Symbol font can be used to turn "g" into lower-case "gamma", both in Word and in Illustrator)

      Gamma symbol was added

      It would also be helpful to show the overlap of the PIP2 and PIP3 signals in control vs. gcl mutants at different stages so the relative distribution and intensity of the signals can be better appreciated (consider adding this as a supplementary figure).

      Our data show that PIP2 is not affected by lack of GCL (Fig 6 B-D). We thus do not think that simultaneous imaging of PIP2 and PIP3 in gcl-/- would add to our conclusions. Furthermore, these experiments would require a significant time investment to generate the respective genotypes. Thus, we agree with the reviewer that this experiment is beyond the scope of the paper.

      p. 12, paragraph 2, line 3: it does not appear that the two PIP markers were used "simultaneously" in Fig. 6A; however, this is evident from Fig. S2 and Movie 1 (consider placing callouts to these earlier in the paragraph or moving the description of simultaneous expression and observation of the two markers later in the paragraph to avoid confusion)

      We did simultaneously image PIP2 and PIP3 sensors and have added this as Movie 1 and also in supplementary Figure S4, which are now clearly referred to in the text.

      p. 12, paragraph 2, line 7: replace "Fig. S1A" with "Fig. S2" (this was confusing)

      Figure call was updated

      p. 16: change "Fig. 7G-I" to "Fig. 8G-I"

      Figure call was updated

      p. 20, Deming reference: there appears to be a stray asterisk in the title

      Asterisk was removed from reference

      Fig. 1D: need to explain that the colors in the graph indicate the numbers of PGCs formed (this could also be added as a label across the top of the graph); in addition, the number of embryos examined for each genotype should be included in the legend

      We added a label at the top of the graph and added ‘n’ values to the figure legend

      Fig. 2B: spell out where csw, dsor1 and rolled data are shown; also, "n" is not defined; was this the number of embryos per genotype?

      We added these data as Supplemental Figure 1

      Fig. 3B: "EV" should be defined in the legend; is this "empty vector"?

      We are using a “-” to mark controls without transgene

      Fig. 3C: see previous comment re: mistake in the diagram; I believe Ras-C40 was described as activating PI3K, not RalA

      We apologize and corrected the figure

      Fig. 4B, line 2: was the graph plotted from the data in panel (C) or panel (A)? panel (A) seems more likely, because the data in C is plotted in D; please correct the panel callout

      Figure legend was updated to refer to the correct panel

      Fig. 5C: describe "p60-TCEp3" in the legend

      We added germplasm-targeting 3’UTR (TCEp3) to legend and the construct and reference are provided in Material and Methods section

      Figure 6: In Fig. 6E-G, the "brightness" of PIP3 at the membrane corresponds to the images even with different views (posterior and orthogonal) and agrees with the graph.

      However, when looking at Fig. 6B, it looks to me that PIP2 is brighter in gcl+/-, but the opposite is true when looking at Fig. 6D (i.e., PIP2 looks brighter in gcl-/-). The authors might want to comment on this.

      We have updated the figure to better reflect our observations.

      Fig. 6A: define "(fire)" here or in the first figure legend where this is used

      We added an inset for the fire lookup table to clearly define the pseudocolor scheme used in the image

      Figure 8 title: "Actin fluorescence is increased in gcl-/- pole buds", but their graph in Fig. 8B comparing actin in gcl+/- to -/- is not significant

      Thank you for catching our mistake; myosin, not actin, is changed

      Fig. 8I: replace "Scarlett" with "Scarlet"

      Corrected

      Fig. 8D-F: Although the plots in panel E agree with the images in panel D, it is unclear why those in panel F are not more concordant. In F, myosin appears enriched at the cortex relative to the cytoplasm in gcl-/- mutants, which is hard to reconcile with the data in D-E.

      We have updated the figure to better reflect our observations.

      Fig. S2A: define the three time points shown here, and clarify that these are shown left to right (if this is indeed the case)

      We removed S2A and updated the movie to replace it

      Fig. S4: change "P60" to "p60" in the figure title

      Corrected

      Movie: The movies showing PIP2 and PIP3 in whole embryos are nice, but it would also be helpful to include merged images of the two channels, so the reader can examine the relative accumulation of the two PIPs over time.

      Merged images panel was added to the movie.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      Although Torso is known to antagonize primordial germ cell (PGC) formation, the underlying mechanisms remain unclear. Canonical Torso signalling typically results in activation of Ras. However, the authors show that Ras-mediated suppression of PGC formation is independent of the Raf/MEK/ERK pathway. Instead, they uncover an unexpected role for Torso in activating phosphoinositide 3-kinase (PI3K), which promotes formation of PIP3-enriched posterior membrane domains. The resulting increase in PI3K activity disrupts PGC formation. Furthermore, they show that by promoting Torso degradation, the ubiquitin ligase adaptor Germ Cell-Less (GCL) primes the posterior membrane with reduced PIP3 to facilitate PGC formation. Lastly, the authors suggest a model in which an antagonistic relationship between GCL and Torso influences actomyosin contractility, which may allow the bud to constrict for proper PGC formation.

      Major comments:

      Figure 1B: The authors describe that embryos with OptoSos still form buds which protruded from the cortex, but PGCs largely fail to cellularize (described in pg. 5). I'm not sure what they meant by "fail to cellularize" as this is not obvious to me when looking at the figure. The authors should describe how they know it's cellularized in the controls and not in the OptoSos or change the wording to "suggesting a failure to cellularize".

      Figure 5C: The hyper-clustered phenotype they describe is hard to visualize in this figure (described in pg. 11). The authors should describe what is meant by "hyper-clustered".

      Figure 6: In Fig. 6E-G, the "brightness" of PIP3 at the membrane corresponds to the images even with different views (posterior and orthogonal) and agrees with the graph. However, when looking at Fig. 6B, it looks to me that PIP2 is brighter in gcl+/-, but the opposite is true when looking at Fig. 6D (i.e., PIP2 looks brighter in gcl-/-). The authors might want to comment on this.

      It would also be helpful to show the overlap of the PIP2 and PIP3 signals in control vs. gcl mutants at different stages so the relative distribution and intensity of the signals can be better appreciated (consider adding this as a supplementary figure).

      Figure 7: When comparing Fig. 7A and 7B torsoHH/WK images, we can see that in Fig. 7A that PIP3 pattern changes such that PIP3 is now at the most posterior end where PGC will eventually form (compared to control that has low PIP3 in this region), but then in Fig. 7B they are looking at the buds and they say PIP3 levels decrease, which does not correspond to Fig. 7A. Are these simply different stages and PIP3 levels change over time (looking at Fig. 7C, PIP3 does not seem to change a lot over time)?

      Page 15, last paragraph: "If myosin II recruitment is inhibited when PIP3 levels are high" Is it possible that inhibition of myosin II recruitment is due to conversion of PIP2 -> PIP3, thus loss of PIP2, or is it that myosin is specifically recruited to regions where PIP2 is high? This seems like a point that should be added to the discussion.

      Overall, I think their claim that antagonistic activities of GCL and Torso is crucial for PGC formation is well justified. The combination of optogenetic tools with activation and lof mutants is nicely done. Some clarification regarding the PIP3 and PIP2 levels will be helpful to the reader (see my comments above). The myosin claim is less convincing (see my comment on Fig. 8D-F below).

      Minor comments on the text:

      p. 5, line 5: "Optosos" is written "OptoSos" elsewhere (suggest using OptoSos throughout)

      p. 5, line 6: suggest adding a comma after "Ras" for clarity

      p. 5, last line: the genotype is "w^1118" (with ^ indicating a superscript), not "w^-1118", and is italicized (this should be corrected throughout)

      p. 6, line 2: replace "cellularizing" with "cellularization"

      p. 6, lines 11-13: Where is it shown that knockdown of csw, dsor1 and rolled did not restore PGC formation? The data are not present in Fig. 2C (could include in supp fig?)

      p. 7, line 1: replace "interfere" with "interferes"

      p. 7, last three lines: what is stated here, "Ras-G37 [activates] both the RalA and the PI3K pathways, and Ras-C40 activates the PI3K pathway" is not consistent with what is diagrammed in Fig. 3C, where Ras-C40 is indicated as activating RalA (please correct either the text or the diagram)

      p. 11, lines 1-2: the Pi3K21B gene and transcript should be italicized (note that Pi3K21B is the official gene name on FlyBase)

      p. 11, lines 6-10: it might be helpful to explain how the p60 construct was overexpressed (current lines 9-10) before describing the results (current lines 7-8)

      p. 12, paragraph 2, line 2: the PIP2 biosensor should be written as "PLCgamma[PH]:mCherry" throughout, not "PLCy[PH]:mCherry"; this should be changed in the figures as well as the text (Symbol font can be used to turn "g" into lower-case "gamma", both in Word and in Illustrator)

      p. 12, paragraph 2, line 3: it does not appear that the two PIP markers were used "simultaneously" in Fig. 6A; however, this is evident from Fig. S2 and Movie 1 (consider placing callouts to these earlier in the paragraph or moving the description of simultaneous expression and observation of the two markers later in the paragraph to avoid confusion)

      p. 12, paragraph 2, line 7: replace "Fig. S1A" with "Fig. S2" (this was confusing)

      p. 16: change "Fig. 7G-I" to "Fig. 8G-I"

      p. 20, Deming reference: there appears to be a stray asterisk in the title

      Minor comments on the figures and figure legends:

      Fig. 1B, lines 4-5: at what stage are these embryos? Cycle 9? Cycle 14? Both?

      Fig. 1C: see previous comment about "w^1118" genotype nomenclature

      Fig. 1D: need to explain that the colors in the graph indicate the numbers of PGCs formed (this could also be added as a label across the top of the graph); in addition, the number of embryos examined for each genotype should be included in the legend

      Fig. 2B: spell out where csw, dsor1 and rolled data are shown; also, "n" is not defined; was this the number of embryos per genotype?

      Fig. 3B: "EV" should be defined in the legend; is this "empty vector"?

      Fig. 3C: see previous comment re: mistake in the diagram; I believe Ras-C40 was described as activating PI3K, not RalA

      Fig. 3E: fix "w^1118" as described above

      Fig. 4A: add dp110-CAAX results to Results section

      Fig. 4B, line 2: was the graph plotted from the data in panel (C) or panel (A)? panel (A) seems more likely, because the data in C is plotted in D; please correct the panel callout

      Fig. 5C: describe "p60-TCEp3" in the legend

      Fig. 6A: define "(fire)" here or in the first figure legend where this is used

      Figure 8 title: "Actin fluorescence is increased in gcl-/- pole buds", but their graph in Fig. 8B comparing actin in gcl+/- to -/- is not significant

      Fig. 8D-F: Although the plots in panel E agree with the images in panel D, it is unclear why those in panel F are not more concordant. In F, myosin appears enriched at the cortex relative to the cytoplasm in gcl-/- mutants, which is hard to reconcile with the data in D-E.

      Fig. 8I: replace "Scarlett" with "Scarlet"

      Fig. S2A: define the three time points shown here, and clarify that these are shown left to right (if this is indeed the case)

      Fig. S4: change "P60" to "p60" in the figure title

      Movie: The movies showing PIP2 and PIP3 in whole embryos are nice, but it would also be helpful to include merged images of the two channels, so the reader can examine the relative accumulation of the two PIPs over time.

      Referees cross-commenting

      I agree enthusiastically with the comments of the other reviewers, who often came to the same conclusion I did about the manuscript and the data, including some of the detailed points about the figures, etc.

      Significance

      General assessment:

      The many strengths of this manuscript include elegant genetic and optogenetic approaches using well-designed transgenes.

      The main weakness is the lack of experiments showing simultaneous live imaging of the PIP2 and PIP3 sensors in gcl-/- and other genetic backgrounds, which would help the reader better envision how regulators of this pathway affect phospholipid distribution at the level of whole embryos and prospective pole cells. Note that because of the time required, I do not insist that they do this.

      Advance:

      Study demonstrates for the first time an unexpected role of Torso in PI3K regulation

      Audience:

      germ cell aficionados, developmental biologists, cell biologists, PI3K researchers

      My field of expertise:

      Drosophila, germ cell development, genetics, cell biology, live imaging, phosphoinositides

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This manuscript investigates how dentate gyrus (DG) granule cell subregions, specifically suprapyramidal (SB) and infrapyramidal (IB) blades, are differentially recruited during a high cognitive demand pattern separation task. The authors combine TRAP2 activity labeling, touchscreen-based TUNL behavior, and chemogenetic inhibition of adult-born dentate granule cells (abDGCs) or mature granule cells (mGCs) to dissect circuit contributions.

      This manuscript presents an interesting and well-designed investigation into DG activity patterns under varying cognitive demands and the role of abDGCs in shaping mGC activity. The integration of TRAP2-based activity labeling, chemogenetic manipulation, and behavioral assays provides valuable insight into DG subregional organization and functional recruitment. However, several methodological and quantitative issues limit the interpretability of the findings. Addressing the concerns below will greatly strengthen the rigor and clarity of the study.

      Major points:

      (1) Quantification methods for TRAP+ cells are not applied consistently across panels in Figure 1, making interpretation difficult. Specifically, Figure 1F reports TRAP+ mGCs as density, whereas Figure 1G reports TRAP+ abDGCs as a percentage, hindering direct comparison. Additionally, Figure 1H presents reactivation analysis only for mGCs; a parallel analysis for abDGCs is needed for comparison across cell types.

      In Figure 1G and 1H we report TRAP+ abDGCs as a percentage rather than density because we are analyzing colocalization of the two markers, which are very sparse in this population. Given the very low number of double-labeled abDGCs, calculating density would not be practical. In the revised manuscript we have clarified the rationale for using these measures. As noted in the current text, we did not observe abDGCs co-expressing TRAP and c-Fos; we have made this point more explicit to guide interpretation of these data.
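
      The difference between the two measures can be made concrete with a minimal sketch. It assumes density is a cell count divided by the sampled area and percentage is the double-labeled count divided by the total birth-dated count; the function names and the example numbers are hypothetical and are not taken from the manuscript.

```python
import math


def density_per_mm2(cell_count: int, area_mm2: float) -> float:
    """Cells per square millimetre of sampled granule cell layer (assumed definition)."""
    if area_mm2 <= 0:
        raise ValueError("area_mm2 must be positive")
    return cell_count / area_mm2


def percent_colocalized(double_labeled: int, total_labeled: int) -> float:
    """Percentage of birth-dated cells that also carry the activity marker."""
    if total_labeled == 0:
        # No birth-dated cells were counted, so the percentage is undefined.
        return math.nan
    return 100.0 * double_labeled / total_labeled


# Hypothetical illustration only: with very few double-labeled cells,
# a percentage of the labeled population is easier to interpret than a density.
example_density = density_per_mm2(cell_count=120, area_mm2=0.5)
example_percent = percent_colocalized(double_labeled=3, total_labeled=60)
```

      With sparse populations a single additional double-labeled cell changes the percentage noticeably, which is one reason to report the underlying counts alongside it.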

      (2) The anatomical distribution of TRAP+ cells is different between low- and high-cognitive demand conditions (Figure 2). Are these sections from dorsal or ventral DG? Is this specific to dorsal DG, as it is preferentially involved in cognitive function? What happens in ventral DG?

      The sections shown in Figure 2 were obtained from the dorsal dentate gyrus (see Methods, “Histology and imaging”: stereotaxic coordinates −1.20 to −2.30 mm relative to bregma, Paxinos atlas). From a feasibility standpoint, it is not possible to analyze the entire longitudinal extent of the hippocampus with these low-throughput histological approaches. We therefore focused on the dorsal DG, for which there is a strong functional rationale. A large body of work indicates that the dorsal hippocampus, and specifically the dorsal DG, is preferentially involved in spatial memory and in the fine contextual discrimination that underlies pattern separation. The dorsal hippocampus is critical for encoding and distinguishing similar spatial representations, a core component of the high-cognitive demand task used here. In contrast, the ventral DG is more strongly associated with emotional regulation and affective memory processing and is less implicated in high-resolution spatial encoding. For these reasons, the present study was designed to assess TRAP+ cell distributions specifically in the dorsal DG.

      (3) The activity manipulation using chemogenetic inhibition of abDGCs in AsclCreER; hM4 mice was performed; however, because tamoxifen chow was administered for 4 or 7 weeks, the labeled abDGC population was not properly birth-dated. Instead, it consisted of a heterogeneous cohort of cells ranging from 0 to 5-7 weeks old. Thus, caution should be taken when interpreting these results, and the limitations of this approach should be acknowledged.

      We agree that prolonged tamoxifen administration labels a heterogeneous population of abDGCs spanning approximately 0 to 5–7 weeks of age, rather than a precisely birth-dated cohort. This is a limitation of the approach, and we now discuss it in more detail in the revised manuscript.

      (4) There is a major issue related to the quantification of the DREADD experiments in Figure 4, Figure 5, Figure 6, and Figure 7. The hM4 mouse line used in this study should be quantified using HA, rather than mCitrine, to reliably identify cells derived from the Ascl lineage. mCitrine expression in this mouse line is not specific to adult-born neurons (off-targets), and its expression does not accurately reflect hM4 expression.

      We agree that mCitrine is not a reliable marker for localizing hM4Di, as it is well known that mCitrine can be expressed in a Cre-independent manner in this mouse line. As suggested, we have removed the figure that showed mCitrine and have performed immunohistochemical localization of the DREADD with an antibody against the HA tag. This is now shown in Figure 5.

      (5) Key markers needed to assess the maturation state of abDGCs are missing from the quantification. Incorporating DCX and NeuN into the analysis would provide essential information about the developmental stage of these cells.

      The goal of this study was to examine activity patterns of adult-born versus mature granule cells, rather than to assess maturation state. The adult-born neurons analyzed were 25–39 days old, an age at which most cells have progressed beyond the DCX⁺ stage and are expected to express NeuN based on prior work. We therefore do not think that including DCX or NeuN quantification would provide additional information relevant to the aims or interpretation of this study.

      Minor points:

      (1) The labeling (Distance from the hilus) in Figure 2B is misleading. Is that the same location as the subgranular zone (SGZ)? If so, it's better to use the term SGZ to avoid confusion.

      We have updated Figure 2B, the Methods, and the main text to more explicitly localize this reference point, which is the boundary between the subgranular zone (SGZ) and the hilus.

      (2) Cell number information is missing from Figures 2B and 2C; please include this data.

      We have now added the cell number information to the figure legends. In Figures 2B and 2C, each point corresponds to a single cell, with an equal number of mice per group. The total number of TRAP⁺ cells per mouse is shown in Figure 1F, which reports TRAP⁺ cell densities by group.

      (3) Sample DG images should clearly delineate the borders between the dentate gyrus and the hilus. In several images, this boundary is difficult to discern.

      We made the DG-hilus boundaries clearer in the sample images to improve visualization and interpretation.

      (4) In Figure 6, it is not clear how tamoxifen was administered to selectively inhibit the more mature 6-7-week-old abDGC population, nor how this paradigm differs from the chow-based approach. Please clarify the tamoxifen administration protocol and the rationale for its specificity.

      We apologize for the confusion here. The protocol used in Figure 6 is the same tamoxifen chow–based approach as in Figure 5, differing only in the duration of tamoxifen exposure. Mice in Figure 5 received tamoxifen chow for 7 weeks, whereas mice in Figure 6 received it for 4 weeks, restricting labeling to a younger and narrower cohort of adult-born DGCs. Thus, the population targeted in Figure 6 is younger than that in Figure 5 and does not correspond to mature 6–7-week-old neurons. By contrast, the experiment in Figure 4 targets a more mature population, consisting predominantly of ~5-week-old adult-born neurons as well as mature granule cells, which are Dock10-positive and express Cre endogenously, allowing selective manipulation of this later-stage population.

      We have corrected the paragraph accordingly and clarified the age range of the labeled populations in the revised manuscript.

      Reviewer #2 (Public review):

      Summary

      In this manuscript, the authors combine an automated touchscreen-based trial-unique nonmatching-to-location (TUNL) task with activity-dependent labeling (TRAP/c-Fos) and birth-dating of adult-born dentate granule cells (abDGCs) to examine how cognitive demand modulates dentate gyrus (DG) activity patterns. By varying spatial separation between sample and choice locations, the authors operationally increase task difficulty and show that higher demand is associated with increased mature granule cell (mGC) activity and an amplified suprapyramidal (SB) versus infrapyramidal (IB) blade bias. Using chemogenetic inhibition, they further demonstrate dissociable contributions of abDGCs and mGCs to task performance and DG activation patterns.

      The combination of behavioral manipulation, spatially resolved activity tagging, and temporally defined abDGC perturbations is a strength of the study and provides a novel circuit-level perspective on how adult neurogenesis modulates DG function. In particular, the comparison across different abDGC maturation windows is well designed and narrows the functionally relevant population to neurons within the critical period (~4-7 weeks). The finding that overall mGC activity levels, in addition to spatially biased activation patterns, are required for successful performance under high cognitive demand is intriguing.

      Major Comments

      (1) Individual variability and the relationship between performance and DG activation.

      The manuscript reports substantial inter-animal variability in the number of days required to reach the criterion, particularly during large-separation training. Given this variability, it would be informative to examine whether individual differences in performance correlate with TRAP+ or c-Fos+ density and/or spatial bias metrics. While the authors report no correlation between success and TRAP+ density in some analyses, a more systematic correlation across learning rate, final performance, and DG activation patterns (mGC vs abDGC, SB vs IB) could strengthen the interpretation that DG activity reflects task engagement rather than performance only.

      As mentioned, we previously reported no correlation between task success and TRAP+ density. We have now performed additional analyses examining correlations with learning rate, final performance, and DG activation patterns (mGC vs abDGC, SB vs IB), and found no significant relationships. Because we found no significant correlations, the original interpretation that DG activity primarily reflects task engagement rather than performance level remains the most parsimonious.

      (2) Operational definition of "cognitive demand".

      The distinction between low (large separation) and high (small separation) cognitive demand is central to the manuscript, yet the definition remains somewhat broad. Reduced spatial separation likely alters multiple behavioral variables beyond cognitive load, including reward expectation, attentional demands, confidence, engagement, and potentially motivation. The authors should more explicitly acknowledge these alternative interpretations and clarify whether "cognitive demand" is intended as a composite construct rather than a strictly defined cognitive operation.

      We agree that reducing spatial separation between stimuli likely engages multiple behavioral and cognitive processes beyond a single, strictly defined operation. We have now clarified this point in the manuscript and explicitly state that our use of the term “cognitive demand” reflects a multidimensional behavioral challenge rather than a singular cognitive process (see Discussion).

      (3) Potential effects of task engagement on neurogenesis.

      Given the extensive behavioral training and known effects of experience on adult neurogenesis, it remains unclear whether the task itself alters the size or maturation state of the abDGC population. Although the focus is on activity and function rather than cell number, it would be useful to clarify whether neurogenesis rates were assessed or controlled for, or to explicitly state this as a limitation.

      While the primary goal of this study was to examine activity and functional recruitment of adult-born granule cells, we also quantified the survival of birth-dated neurons at the end of behavioral training. Density measurements of BrdU⁺ and EdU⁺ cells revealed no differences across experimental groups, indicating that engagement in the pattern separation task, across low to high cognitive demand conditions, did not significantly alter survival of adult-born neurons. In addition, we examined the spatial distribution of BrdU⁺ and EdU⁺ neurons between the suprapyramidal and infrapyramidal blades of the dentate gyrus. The proportion of newborn neurons was consistent across all groups, with approximately 60% located in the suprapyramidal blade and 40% in the infrapyramidal blade. These findings indicate that behavioral training did not alter the baseline distribution of adult-born neurons. We have now clarified these points in the manuscript (see Results).

      (4) Temporal resolution of activity tagging.

      TRAP and c-Fos labeling provide a snapshot of neural activity integrated over a temporal window, making it difficult to determine which task epochs or trial types drive the observed activation patterns. This limitation is partially acknowledged, but the conclusions occasionally imply trial-specific or demand-specific encoding. The authors should more clearly distinguish between sustained task engagement and moment-to-moment trial processing, and temper interpretations accordingly. While beyond the scope of the current study, this also motivates future experiments using in vivo recording approaches.

      We agree and have made changes to the manuscript to discuss these points (see Discussion and Limitations).

      (5) Interpretation of altered spatial patterns following abDGC inhibition.

      In the abDGC inhibition experiments, Cre+ DCZ animals show delayed learning relative to controls. As a result, when animals are sacrificed, they may be at an intermediate learning stage rather than at an equivalent behavioral endpoint. This raises the possibility that altered DG activation patterns reflect the learning stage rather than a direct circuit effect of abDGC inhibition. Additional clarification or analysis controlling for the learning stage would strengthen the causal interpretation.

      We agree that differences in learning stage could in principle confound the interpretation of DG activation patterns. However, although Cre+ DCZ-treated mice exhibited delayed learning, they ultimately reached the same performance criterion as control animals. Thus, adult-born DGC inhibition did not prevent learning but increased the time required to reach criterion, indicating that these neurons are beneficial for learning efficiency rather than strictly necessary for task acquisition. Importantly, all animals were sacrificed only after reaching the predefined success criterion. Therefore, the immunohistochemical analyses were performed at the same behavioral endpoint for Cre+ DCZ and control groups, even though the number of training days differed. Consequently, the observed differences in DG activation reflect circuit recruitment at equivalent task mastery rather than differences in learning stage.

      (6) Relationship between c-Fos density and behavioral performance.

      The study reports that abDGC inhibition increases c-Fos density while impairing performance, whereas mGC inhibition decreases c-Fos density and also impairs performance. This raises an important conceptual question regarding the relationship between overall activity levels and task success. The authors suggest that both sufficient activity and appropriate spatial patterning are required, but the manuscript would benefit from a more explicit discussion of how different perturbations may shift the identity, composition, or coordination of the active neuronal ensemble rather than simply altering total activity levels.

      We agree that our findings highlight that successful performance is not determined solely by the overall level of dentate gyrus activity, but rather by the composition and spatial organization of the active neuronal ensemble. In our study, inhibition of abDGCs increased overall mGC activity while disrupting the spatially organized, blade-biased activation pattern and impaired performance. In contrast, direct inhibition of mGCs reduced global excitability but preserved the relative spatial organization of active neurons in animals that continued to perform the task. These findings suggest that different perturbations alter task performance by shifting the identity and coordination of the active neuronal ensemble, rather than simply increasing or decreasing total activity levels. We have now expanded the Discussion to more explicitly address how dentate gyrus computations may depend on the structured recruitment of granule cell ensembles and how distinct manipulations differentially disrupt this organization.

      Reviewer #3 (Public review):

      Summary:

      The authors used genetic models and immunohistochemistry to identify how training in a spatial discrimination working memory task influences activity in the dentate gyrus subregion of the hippocampus. Finding that more cognitively challenging variants of the task evoked more and distinct patterns of activity, they then investigated whether newborn neurons in particular were important for learning this task and regulating the spatial activity patterns.

      Strengths:

      The focus on precise anatomical locations of activity is relatively novel and potentially important, given that little is known about how DG subregions contribute to behavior. The authors also use a task that is known to depend on this memory-related part of the brain.

      Weaknesses:

      Statistical rigor is insufficient. Many statistical results are not stated, inappropriate tests are used, and sample sizes differ across experiments (which appear to potentially underlie null results). The chemogenetic approach to inhibit adult-born neurons also does not appear to be targeting these neurons, as judged by their location in the DG.

      Please refer to the updated statistical analyses in response to the recommendations below.

      Recommendations for the authors:

      Reviewing Editor Comments

      Please note that reviewers agreed that appropriate revisions are needed to increase the strength of evidence for the paper's claims. Concerns were raised about a lack of statistical rigor in the statistical analyses used. Results of statistical tests were not consistently provided (i.e., statistic applied, value of statistic, degrees of freedom, p-value), and seemingly inappropriate statistical tests were used in some instances. Also, some comparisons had lower statistical power than others. When clarifying the statistical approaches used in the manuscript, we also encourage you to consider reading this article that outlines common statistical mistakes (Makin TR, Orban de Xivry JJ. Ten common statistical mistakes to watch out for when writing or reviewing a manuscript. Elife. 2019 Oct 9;8:e48175. doi: 10.7554/eLife.48175.), such as the importance of not basing conclusions on a significant p-value for one pair-wise comparison vs a non-significant p-value for another pairwise comparison (i.e., groups that are being compared should be included in the same statistical analysis, and interaction effects should be reported when appropriate). We hope that you find this information to be helpful should you decide to submit a revised manuscript to eLife.

      Reviewer #1 (Recommendations for the authors):

      (1) Standardize TRAP+ quantification across Figure 1.

      Please report TRAP+ cell numbers using consistent metrics (e.g., density or percentage) to enable comparison across cell types. In addition, extend the TRAP+ reactivation analysis in Figure 1H to include abDGCs so that reactivation dynamics can be compared directly between mGCs and abDGCs.

      Reply in Public Review

      (2) Clarify whether dorsal or ventral DG was analyzed in Figure 2.

      The differing anatomical distributions of TRAP+ cells under low- and high-demand conditions raise important questions about DG axis specificity. Please indicate whether analyses were performed in dorsal DG, ventral DG, or both, and provide data or justification accordingly.

      Reply in Public Review

      (3) Acknowledge limitations of the tamoxifen-chow labeling strategy in AsclCreER; hM4 experiments.

      Since tamoxifen chow administered over 4-7 weeks labels a heterogeneous abDGC population spanning a broad age range, this approach does not generate birth-dated cohorts. This limitation should be clearly addressed in the text, and interpretations, particularly those related to cell age-dependent effects, should be tempered.

      Reply in Public Review

      (4) Revise DREADD quantification using HA rather than mCitrine.

      The hM4 mouse line requires HA immunostaining to accurately identify Ascl-lineage cells expressing the DREADD receptor. Because mCitrine is not specific to adult-born neurons and does not reliably reflect hM4 expression, quantification based on mCitrine should be revised.

      Reply in Public Review

      (5) Include markers to assess abDGC maturation state.

      Adding quantification of DCX and NeuN would help define the developmental stage of abDGCs in key experiments and improve the interpretation of cell-age-dependent effects.

      Reply in Public Review

      (6) Clarify DG layer boundaries and terminology in Figure 2.

      If the metric labeled "Distance from the hilus" corresponds to the subgranular zone (SGZ), using SGZ terminology would prevent confusion. Additionally, please provide clearer delineation of DG and hilus borders in sample images.

      Reply in Public Review

      (7) Provide missing cell number data for Figures 2B and 2C.

      Reply in Public Review

      (8) Clarify the tamoxifen administration protocol in Figure 6.

      Please describe how the protocol selectively targets 6-7-week-old abDGCs and how it differs from the chow-based approach. This will help readers understand the intended specificity of the manipulation.

      Reply in Public Review

      Reviewer #2 (Recommendations for the authors):

      (1) EdU birth-dating timeline

      The manuscript would benefit from a clearer description of the EdU birth-dating timeline, ideally with a schematic similar to that provided for BrdU in Supplementary Figure 1.

      We appreciate the suggestion. However, we did not include a separate schematic for EdU because its use and birth-dating logic are identical to BrdU (both are thymidine analogs administered systemically and incorporated during S-phase). Therefore, the timeline shown in Supplementary Figure 1 applies equally to both markers. We have clarified this point in the Methods section to avoid confusion.

      (2) Clarity of TUNL task description.

      The description of the TUNL task, particularly for readers unfamiliar with touchscreen-based paradigms, is difficult to follow without consulting prior literature. A simplified schematic or a clearer step-by-step explanation in the main text or supplementary material would improve accessibility.

      We note that the main steps of the TUNL protocol are illustrated in Figure 1A and Supplementary Figures 2A and 2B. Nevertheless, we agree that the description in the text can be made clearer for readers less familiar with touchscreen-based tasks. Thus, we have now revised the Methods section to provide a clearer step-by-step description of the TUNL task.

      (3) Influence of outliers in Figure 1G.

      In Figure 1G, the reported trend that ~1% of 25-39-day-old abDGCs are TRAP+ during LS trials appears to be driven by a small number of outliers. This should be acknowledged, and the wording of the conclusion moderated to reflect the variability in the data.

      We agree with the reviewer that the apparent outliers reflect the inherent sparsity of TRAP labeling in this population. In absolute terms, this corresponds to between 0 and 2 TRAP⁺ 25–39-day-old abDGCs per mouse, such that the presence or absence of a small number of labeled cells can appear as outliers when expressed as a percentage. We have revised the text to acknowledge this (see Results).

      (4) Presentation of learning curves.

      Rather than focusing primarily on "days before criterion" (DBC), it would be helpful to show full learning curves across the entire training period. This would provide a clearer picture of acquisition dynamics and inter-animal variability.

      We agree that learning curves can be informative in many behavioral paradigms. However, in our protocol, mice do not undergo the same number of training days because training stops individually once each animal reaches criterion. As a result, plotting full learning curves would produce trajectories of different lengths, making group comparisons difficult and visually cluttered. For this reason, we aligned animals based on days before criterion (DBC), which allows direct comparison of learning dynamics relative to task acquisition. We also consider the cumulative probability representation, which is included in the figures, to be the most appropriate way to summarize learning progression across animals in this context.
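
      As an illustration of the DBC alignment described above, the following is a minimal Python sketch, not the authors' analysis code. It assumes each animal's series ends on the day it reached criterion and counts DBC backwards from that day (0 = criterion day); the function names and the accuracy values are hypothetical.

```python
from statistics import mean


def align_by_dbc(curves):
    """Group daily accuracy values by days before criterion (DBC).

    curves: dict mapping animal id -> list of daily % correct, where the
    last entry is assumed to be the day the animal reached criterion.
    Returns a dict mapping DBC (0 = criterion day, 1 = one day earlier, ...)
    to the list of values available at that offset.
    """
    aligned = {}
    for series in curves.values():
        # Walk each series backwards so that index 0 is the criterion day.
        for dbc, value in enumerate(reversed(series)):
            aligned.setdefault(dbc, []).append(value)
    return aligned


def mean_by_dbc(curves):
    """Average across animals at each DBC offset."""
    aligned = align_by_dbc(curves)
    return {dbc: mean(values) for dbc, values in sorted(aligned.items())}


# Hypothetical animals that needed 4 and 3 training days, respectively.
example = {"m1": [55, 62, 71, 74], "m2": [60, 72, 75]}
summary = mean_by_dbc(example)
```

      Series of different lengths contribute to the offsets they cover, so the earliest offsets are averaged over fewer animals; this is one reason the number of animals per offset is worth reporting alongside the means.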

      (5) Clarification of Figure 3B labeling

      In Figure 3B, the identity of the orange-labeled group above the LS condition is unclear. Clarification in the figure legend would improve interpretability.

      Figure 3B includes two experimental groups. One group performed both the large- and small-separation conditions; this group is shown in orange and labeled LS. Within this group, the upper orange trace corresponds to performance in the large-separation condition, while the lower orange trace corresponds to performance in the small-separation condition. The second group is a control group that performed only the large-separation configuration, and therefore only a single green trace is shown. We agree that this distinction was not sufficiently clear and have revised the figure legend and text to clarify the identity of each trace.

      Reviewer #3 (Recommendations for the authors):

      (1) Please label figures and, even better, put the legends on the same page.

      (2) Just to confirm, in establishing the task, mice performed above 70% for the small separation trials in one of the sessions on 2 consecutive days, for each criterion? Performance seems to be below 70%.

      Yes. To meet the criterion, each mouse had to reach ≥70% correct performance in at least one of the two daily sessions on two consecutive days. We then averaged the performance across both sessions for each of those days. As a result, if one session was ≥70% but the other was lower, the daily average could fall below 70%. The values shown in the figure correspond to these daily averages, further averaged across mice.
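
      The rule above can be made concrete with a minimal Python sketch. It is illustrative only, not the authors' code: the function names and session scores are hypothetical, and the only values taken from the text are the 70% threshold, two sessions per day, and the two-consecutive-day requirement.

```python
def reached_criterion(days, threshold=70.0):
    """Return the index of the day on which criterion is met, or None.

    days: list of (session1, session2) % correct tuples, one per day.
    A day qualifies if at least one of its sessions is >= threshold;
    criterion is met on the second of two consecutive qualifying days.
    """
    qualifies = [max(sessions) >= threshold for sessions in days]
    for i in range(1, len(days)):
        if qualifies[i - 1] and qualifies[i]:
            return i
    return None


def daily_average(sessions):
    """Mean % correct across the sessions of a single day."""
    return sum(sessions) / len(sessions)


# Hypothetical mouse: days 1 and 2 each contain one session at or above 70%.
days = [(58.0, 64.0), (72.0, 60.0), (70.0, 62.0)]
criterion_day = reached_criterion(days)
averages = [daily_average(d) for d in days]
```

      In this example criterion is met on the third day (index 2), yet the daily averages on both qualifying days are 66%, which is how plotted values can sit below the 70% line.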

      (3) mGC needs to be explicitly defined. Am I assuming any non-birthdated GC is an mGC according to the authors? (which means it is unknown whether they are in fact mature, though likely most of them are).

      In this study, “mature granule cells” (mGCs) refer operationally to granule cells that are not birth-dated with BrdU or EdU and therefore are not classified as adult-born neurons within the defined labeling window. We agree that this population is not directly age-defined, and that while the majority are expected to be mature based on their birth timing relative to the labeling period, we cannot exclude the possibility that a small fraction may include younger, unlabeled neurons. We have now explicitly defined this usage of mGCs in the Methods and clarified this point in the text to avoid ambiguity.

      (4) Methods state that Kruskal-Wallis tests were used when more than 3 groups were compared, but I don't see these stats presented (e.g., for trap data in Figure 1, blade x task TRAP expt in Figure 3 (should be 2-way RM anova here and elsewhere), etc) or any corrections for multiple comparisons. I appreciate that the mean rates of TRAPed abGCs are higher in the S and LS groups than in the shaping group, but most mice do not have any BrdU+ cells that are also TRAPed, and there are no statistics here to support the claim. I don't think there is enough sampling to accurately quantify activation of abGCs. Also, no stats to support the claim that TRAPing increases at the "tip of the SB after the more demanding LS task".

      We agree with this comment. We have now systematically tested all datasets for normality (by group) and applied parametric tests when the data met normality assumptions, and non-parametric tests otherwise. The statistical analyses have been revised accordingly. We added the appropriate tests (including two-way ANOVA where relevant, such as for blade × group comparisons) and now report full statistics in the figure legends and results sections. For the TRAP analyses in adult-born DGCs, we explicitly acknowledge the very low number of BrdU⁺/TRAP⁺ cells, which limits statistical power and, in some cases, precludes robust statistical testing. These limitations are now clearly stated in the Results and Discussion, and the corresponding interpretations have been tempered. For all Kruskal–Wallis tests, post hoc pairwise comparisons were performed using Dunn’s test, with Bonferroni correction for multiple comparisons, as now specified in the Methods section. We also expanded the Methods to describe the statistical workflow in detail. In addition, we have added the previously missing statistical analysis for Figure 2C. Comparisons were performed between the 0–50% and 50–100% portions of the blade, where 0% corresponds to the apex and 100% corresponds to the distal tip of the blade.
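
      To illustrate the multiple-comparison step only, here is a minimal Python sketch of a Bonferroni adjustment applied to pairwise post hoc p-values. It assumes the raw p-values have already been obtained (for example from Dunn's test in a statistics package); the p-values below are hypothetical and the helper name is not taken from the manuscript.

```python
from itertools import combinations


def bonferroni(p_values):
    """Multiply each raw p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return {pair: min(1.0, p * m) for pair, p in p_values.items()}


# Three task groups give three pairwise comparisons.
groups = ["shaping", "S", "LS"]
pairs = list(combinations(groups, 2))

# Hypothetical raw p-values, in the order produced by combinations().
raw = dict(zip(pairs, [0.04, 0.004, 0.5]))

adjusted = bonferroni(raw)
significant = [pair for pair, p in adjusted.items() if p < 0.05]
```

      With these made-up numbers, a comparison that looks significant in isolation (raw p = 0.04) no longer passes after adjustment, which is the situation the correction is meant to guard against.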

      (5) Figure 3I: I can't figure out which effect is statistically significant here (what does the asterisk signify?). Why no individual data points in this graph?

      We agree that the absence of individual data points reduced interpretability, and we have now updated the figure to include individual data points to better illustrate data distribution and variability.

      (6) The gradient of activity (shap < S < LS) could be due to how long they've been trained on a given stage (e.g. less activity during shaping because they have habituated, and neurons encoding that task phase have already been selected)

      We agree that task duration and habituation could, in principle, influence activity levels. Under this interpretation, higher activity would primarily reflect task novelty rather than cognitive demand. However, our data do not support this explanation. Specifically, we found no correlation between the number of training days required to reach criterion and c-Fos–positive or TRAP-positive cell density within a given stage. Thus, animals that reached criterion rapidly did not show higher activity levels than animals that required more days of training and were presumably more habituated to the task demands. This suggests that the observed activity gradient (shaping < S < LS) is not driven by exposure duration or habituation, but rather reflects differences in cognitive demand across task stages.

      (7) The TRAP+ EDU+ cell in Figure 3 looks odd because the BrdU signal is (a lot) larger than the TRAP signal, but BrdU is in the nucleus and should be smaller.

      We agree that the example in Figure 3 is not optimal. In dividing cells, BrdU/EdU signals can sometimes appear broader or closely apposed, which may affect their apparent size.

      (8) For the Ascl-HM4Di experiment, HM4Di appears to be expressed in all of the areas of the granule cell layer where abGCs are NOT located (i.e. no expression in the deep cell layer, near the sgz). This is problematic because it suggests perhaps abGCs are not inhibited as expected.

      As noted in our response to Reviewer #1, we did not use mCitrine to localize the DREADD receptor, as it has been demonstrated that mCitrine is expressed in a Cre-independent manner and is not correlated with hM4Di expression. In the revised manuscript, we include a representative image in which we performed immunostaining with an HA antibody to directly visualize hM4Di and confirm its expression in adult-born granule cells (Figure 5).

      (9) Line 267: "6-7 week old neurons by themselves do not influence either the performance of mice in the task". I don't think this is fair because this experiment wasn't designed with as much power to detect an effect. The group trends are in the same direction, but there are many fewer mice in this experiment (n=6/group) than in the =<7w experiment (n=11/group), where the effect just reached statistical significance.

      We are sorry for this confusion, which came from an incorrect version. The experiment shown in Figure 6 does not target 6–7-week-old neurons specifically. It uses the same tamoxifen chow–based protocol as Figure 5, but with a shorter exposure (4 weeks vs. 7 weeks), thereby labeling a younger and more restricted cohort of adult-born DGCs. By contrast, Figure 4 targets a more mature population, consisting predominantly of ~5-week-old adult-born neurons as well as mature granule cells (Dock10+).

      We have corrected the paragraph accordingly and clarified the age range of the labeled populations in the revised manuscript.

    1. Author Response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Here, Bansal et al. present a study on the fundamental blood and nectar feeding behaviors of the critical disease vector, Anopheles stephensi. The study not only characterizes the fundamental changes in blood feeding behaviors of this crucially understudied vector, but also uses a transcriptomic approach to identify candidate neuromodulation pathways which influence blood feeding behavior in this mosquito species. The authors then provide evidence through RNAi knockdown of candidate pathways that the neuromodulators sNPF and RYa modulate feeding either via their physiological activity in the brain alone or through joint physiological activity along the brain-gut axis (but critically not the gut alone). Overall, I found this study to be built on tractable, well-designed behavioral experiments.

      Their study begins with a well-structured experiment to assess how the feeding behaviors of A. stephensi change over the course of its life history and in response to its age, mating and oviposition status. The authors are careful and validate their experimental paradigm in the more well-studied Ae. aegypti, and are able to recapitulate the results of prior studies which show that mating is a prerequisite for blood feeding behaviors in Ae. aegypti. Here they find that A. stephensi, like other Anopheline mosquitoes, has a more nuanced regulation of its blood and nectar feeding behaviors.

      The authors then go on to show in a Y-maze olfactometer that to some degree, changes in blood feeding status depend on behavioral modulation to host-cues, and this is not likely to be a simple change to the biting behaviors alone. I was especially struck by the swap in valence of the host-cues for the blood-fed and mated individuals which had not yet oviposited. This indicates that there is a change in behavior that is not simply desensitization to host-cues while navigating in flight, but something much more exciting happening.

      The authors then use a transcriptomic approach to identify candidate genes in the blood feeding stages of the mosquito's life cycle, arriving at a list of 9 candidates which have a role in regulating the host-seeking status of A. stephensi. Then, through gene knockdown of these candidates, they identify the dual action of RYa and sNPF as candidate neuromodulators of host-seeking in this species. Overall, I found the experiments to be well-designed. I found the molecular approach to be sound. While I do not think the molecular approach is necessarily an all-encompassing mechanism identification (owing mostly to the fact that genetic resources are not yet available in A. stephensi as they are in other dipteran models), I think it sets up rich lines of research questions for the neurobiology of mosquito behavioral plasticity and comparative evolution of neuromodulator action.

      Strengths:

      I am especially impressed by the authors' attention to small details in the course of this article. As I read and evaluated this article I continued to think how many crucial details I may have missed if I were the scientist conducting these experiments. That attention to detail paid off in spades and allowed the authors to carefully tease apart molecular candidates of blood-seeking stages. The authors' top-down approach to identifying RYamide and sNPF, starting from first-principles behavioral experiments, is especially comprehensive. The results from both the behavioral and molecular target studies will have broad implications for the vectorial capacity of this species and comparative evolution of neural circuit modulation.

      I believe the authors have adequately addressed all of my concerns; however, I think an accompanying figure to match the explained methods of the tissue-specific knockdown would help readers. The methods are now explicitly written for the timing and concentrations required to achieve tissue-specific knockdown, but seeing the data as a supplement would be especially reassuring given the critical nature of tissue-specific knockdown to the final interpretations of this paper.

      We thank the reviewer for the suggestion and have now incorporated a schematic in the supplementary figure S9B, explaining our methodology for achieving tissue-specific knockdowns.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Bansal et al examine and characterize feeding behaviour in Anopheles stephensi mosquitoes. While sharing some similarities to the well-studied Aedes aegypti mosquito, the authors demonstrate that mated-females, but not unmated (virgin) females, exhibit suppression in their blood-feeding behaviour. Using brain transcriptomic analysis comparing sugar fed, blood fed and starved mosquitoes, several candidate genes potentially responsible for influencing blood-feeding behaviour were identified, including two neuropeptides (short NPF and RYamide) that are known to modulate feeding behaviour in other mosquito species. Using molecular tools including in situ hybridization, the authors map the distribution of cells producing these neuropeptides in the nervous system and in the gut. Further, by implementing systemic RNA interference (RNAi), the study suggests that both neuropeptides appear to promote blood-feeding (but do not impact sugar feeding) although the impact was observed only after both neuropeptide genes underwent knockdown.

      While the authors have addressed most of the concerns of the original manuscript, a few issues remain. Particularly, the following two points:

      (5) Figure 4

      The authors state that there is more efficient knockdown in the head of unfed females; however, this is not accurate since they only get knockdown in unfed animals, and no evidence of any knockdown in fed animals (panel D). This point should be revised in the results text as well.

      Perhaps we do not understand the reviewer's point or there has been a misunderstanding. In Figure 4D, we show that while there is more robust gene knockdown in unfed females, blood-fed females also showed modest but measurable knockdowns ranging from 5-40% for RYamide and 2-21% for sNPF.

      NEW-

      In both the dsRNA treatments where animals were fed, neither was significantly different from control. Therefore, there is no change, and indeed this is confirmed by the authors' labelling of the figure stats in panel 4D.

      We agree with the reviewer and thank them for pointing it out. We have now revised the figure legend and the text to reflect these results (see lines 351-354).

      In addition, do the uninjected and dsGFP-injected relative mRNA expression data reflect combined RYa and sNPF levels? Why is there no variation in these data,...

      In these qPCRs, we calculated relative mRNA expression using the delta-delta Ct method (see line 975). For each neuropeptide its respective control was used. For simplicity, we combined the RYa and sNPF control data into a single representation. The value of this control is invariant because this method sets the control baseline to a value of 1.

      NEW-

      The authors are claiming that there is no variation between individual qPCR experiments (particularly in their controls)? Normally, one uses a known standard value (or calibrator) across multiple experiments/plates so that variation across biological replicates can be assessed. This has an impact on statistical analyses since there is no variation in the control data. Indeed, this impacts all figures/datasets in the manuscript where qPCR data is presented. All the controls have zero variation!

      We are truly thankful to this reviewer for insisting on this point. It has made us revisit what we thought we understood and now realise we were doing wrong (though many in the literature do it this way!). We were – incorrectly – setting each control to 1 and calculating relative fold changes for each replicate independently. While this is often seen in the literature, we now realise that it is incorrect. We have revisited all our analyses and normalized all samples to the mean ΔCt of the control group, which captures biological variation in both control and experimental groups. All data are now re-plotted to show individual data points for both control and experimental groups, and the error bars on controls represent the biological variation across replicates (Figure 4D, 4F, 4G, S8, S9). Statistical analyses were also revised accordingly, and, importantly, they do not change any conclusions. Please note that the abdominal expression of sNPF and RYa is so low that the controls show very variable baseline expression values.
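
      The normalization described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name and the ΔCt values are hypothetical, and each ΔCt is assumed to be Ct(target) minus Ct(reference gene) for one biological replicate.

```python
from statistics import mean


def relative_expression(control_dct, treated_dct):
    """Compute 2^(-ΔΔCt) with ΔΔCt referenced to the mean control ΔCt.

    Every replicate, control or treated, is compared with the same
    baseline (the mean ΔCt of the control group), so control replicates
    keep their own spread instead of each being fixed at 1.
    """
    baseline = mean(control_dct)

    def fold(dct):
        return 2 ** -(dct - baseline)

    return [fold(d) for d in control_dct], [fold(d) for d in treated_dct]


# Hypothetical ΔCt values for three control and two knockdown replicates.
control = [5.0, 6.0, 7.0]
treated = [7.0, 8.0]
control_fold, treated_fold = relative_expression(control, treated)
```

      With these values the control replicates come out as 2.0, 1.0 and 0.5 rather than all equal to 1, so the control group carries a variance that downstream statistical tests can use.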

      Reviewer #3 (Public review):

      Summary:

      This manuscript investigates the regulation of host-seeking behavior in Anopheles stephensi females across different life stages and mating states. Through transcriptomic profiling, the authors identify differential gene expression between "blood-hungry" and "blood-sated" states. Two neuropeptides, sNPF and RYamide, are highlighted as potential mediators of host-seeking behavior. RNAi knockdown of these peptides alters host-seeking activity, and their expression is anatomically mapped in the mosquito brain (sNPF and RYamide) and midgut (sNPF only).

      Strengths:

      (1) The study addresses an important question in mosquito biology, with relevance to vector control and disease transmission.

      Transcriptomic profiling is used to uncover gene expression changes linked to behavioral states.

      (2) The identification of sNPF and RYamide as candidate regulators provides a clear focus for downstream mechanistic work.

      (3) RNAi experiments demonstrate that these neuropeptides are necessary for normal host-seeking behavior.

      (4) Anatomical localization of neuropeptide expression adds depth to the functional findings.

      Weaknesses:

      (1) The title implies that the neuropeptides promote host-seeking, but sufficiency is not demonstrated and some conclusions appear premature based on the current data. The support for this conclusion would be strengthened with functional validation using peptide injection or genetic manipulation.

      (2) The identification of candidate receptors is promising, but the manuscript would be significantly strengthened by testing whether receptor knockdowns phenocopy peptide knockdowns. Without this, it is difficult to conclude that the identified receptors mediate the behavioral effects.

      (3) Some important caveats, such as variation in knockdown efficiency and the possibility of off-target effects, are not adequately discussed.

      These comments were addressed in the previous round.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Awesome paper everyone. A delight to read and review.

      Thank you very much! We appreciated your comments too!

    1. Reviewer #1 (Public review):

      Summary:

      This paper examines plasticity in early cortical (V1-V3) areas in an impressively large number of rod monochromats (individuals with achromatopsia). The paper examines three things:

      (1) Cortical thickness. It is now well established that early complete blindness leads to increases in cortical thickness. This paper shows increased thickness confined to the foveal projection zone within achromats. This paper replicates work by Molz (2022) and Lowndes (2021), but the detailed mapping of cortical thickness as a function of eccentricity and the inclusion of higher retinotopic areas is particularly elegant.

      (2) Failure to show large-scale reorganization of early visual areas using retinotopic mapping. This is a replication of a very recent study of Molz et al. but I believe, given anatomical variability, the larger n in this study, and how susceptible pRF findings are to small changes in procedure, this replication is also of interest.

      (3) Connective field modelling, examining the connections between V3-V1. The paper finds changes in the pattern of connections, and smaller connective fields in individuals with achromatopsia than normally sighted controls, and suggests that these reflect compensatory plasticity, with V3 compensating for the lower resolution V1 signal in individuals with achromatopsia.

      This is a carefully done study (both in terms of data collection and analysis) that is an impressive amount of work.

      *Effects of eye-movements

      The authors have carried out the eye-movement analyses I asked of them. Unfortunately, in 4 individuals they couldn't calibrate the eyetracker (it's impressive they managed in 10). I think this means that 4 of 13 (since a different participant was excluded from head motion) individuals weren't included in correlation analyses. Limiting the correlation analysis to individuals with better fixation has obvious issues. I'd recommend redoing (or additionally including) stats using non-parametric measures while classifying these 4 as having fixation instability of 3 (i.e. greater instability than the participant with the worst fixation who was successfully calibrated).
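      The suggested rank-based analysis could look roughly like the sketch below. It is an illustration only: the data are hypothetical, the function names are invented, and the placeholder assigned to uncalibrated participants is arbitrary, since a Spearman correlation uses only the ordering. Participants without a usable calibration are entered as tied for the worst fixation.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

def impute_worst(instability):
    """Replace missing values (None) with a value above the worst measured one."""
    worst = max(v for v in instability if v is not None)
    return [worst + 1.0 if v is None else v for v in instability]

# Hypothetical data: None marks participants who could not be calibrated.
instability = [0.2, 0.5, None, 0.9, None]
prf_size = [1.0, 1.2, 2.0, 1.5, 2.2]
rho = spearman_rho(impute_worst(instability), prf_size)
```

      Because the imputed participants share one rank, the result does not depend on the exact placeholder value, only on the assumption that their fixation was worse than any calibrated participant's.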

      *Interpreting pRFs

      The paper would be strengthened by a little more explicit clarity about what pRFs represent and how that affects their interpretation of their findings as plasticity vs. non-plasticity (I know the authors are aware of this, but I think it would be helpful for readers who are less experienced in pRFs). In the introduction it would be helpful to point out that pRFs represent the collective response of a large population of neurons, and as a result pRF estimates can vary depending on which population of neurons that stimulus drives.

      For example, imagine for the sake of argument that rods only project to V1 neurons with larger receptive fields. If one measured pRFs in a control observer under photopic vs. scotopic conditions one would see smaller pRFs in the photopic conditions. This wouldn't represent 'plasticity' - it would represent the fact that the firing neurons contributing to the pRF signal are a slightly different population because of a change in the stimulus content. This is of course exactly what you see in 2C. And indeed, the authors make this identical point: "In the non-selective condition, the smaller pRFs in controls are in line with the higher spatial resolution of the cone system, which is not active in the achromat group." But this point would be clearer if more of the conceptual underpinnings were made explicit in the introduction (or at this point in the paper).

      Shifts in which population of neurons drive your pRFs can explain many of the more puzzling results in the paper without detracting from your main conclusions. For example, in 2D, I don't think it's differences in S/N driving your results (pRFs are at least theoretically meant to be robust to S/N). If smaller RFs 'drop out' under low luminance and these smaller RFs also tend to be more central, then one would expect the control results of 1D. And I think a similar argument might even be made for the smaller difference in the rod monochromats.

      It would be possible to make the point of Figure 4B more simply if Figure 4B was replaced by additional Panels in Figure 2 simply showing V3 pRF sizes/eccentricity distributions. That would make the point that you don't see the same expansion in pRF sizes in V3 in a way that is just as clear, and is closer to the data.

      *Interpreting cRFs

      Similarly, I think the paper would be improved with more clarity about the underlying signal in CF modeling. Once again, I appreciate that the authors are familiar with this, but it will help the reader in interpretation. (And I do believe thinking carefully about this may alter your interpretations). CF receptive fields 'find' the region in V1 that best predict the V3 signal in a given voxel. In resting state this likely represents a combination of:

      (1) visually driven signal - correlations that may or may not reflect connectivity but represent the fact that regions that represent the same region of visual space will be active at the same time.

      (2) global bilaterally symmetrical signal consisting of enhanced correlations between iso-eccentric regions (Raemaekers et al., 2014), which may arise from vasculature that symmetrically stems from the posterior cerebral artery (Tong et al., 2013; Tong and Frederick, 2014).

      (3) intrinsic neural fluctuations that are more strongly correlated between connected neurons. These are likely quite weak compared to the other contributions.

      I think if you ignore 2, (which is not likely to differ between rod mono and controls) and model 1 and 3, you might well see shifts in CFs towards the boundary of the scotoma - essentially the CF's location will be biased towards the region of V1 that has stronger correlations - which = the region which has a visual signal.

      I do find convincing the argument that you don't see the same shift in controls in the rod-selective condition. So I think the results of 4A are fine. But a little more clarity about 'what's under the hood' in CF modeling would be nice.

      *Interpreting the relationship between pRFs and cRFs

      So there's something here that confuses me. We are all agreed that V3 pRF sizes are similar across RM and control. V1 pRFs are larger in RM. It feels intuitive that smaller CFs would compensate but I can't make it make sense to myself when I think it through. Each pRF represents a combination of receptive field location scatter and bandwidth. You want to argue that eccentricity mapping looks pretty normal, so there's no reason to think increased rf scatter, and I can believe that (though I do think this assumption should be discussed explicitly).

      So far I think we agree.

      But let's think about what drives a CF during visual stimulation ... Specifically let's think about 'the pRF of the CF' (the region of visual space represented by the cluster of voxels in the CF). If pRFs for individual voxels in V1 are big, then the pRF for the CF is also going to be large. But we know that pRFs for V3 are normal size. So, the V3 CF will 'find' a smaller number of voxels in V1, in order to try to find the 'correct sized' CF pRF. Note that this explanation is very similar to yours. But doesn't require ANY 'intrinsic' connectivity. It's really just assuming the whole thing is driven by the visual signal and the CF size is determined by the ratio of the pRF sizes in V3 vs. V1.

      One possible solution would be to regress out the visual stimulus and redo this analysis based on the residuals.
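      One way such a control might be set up is sketched below. It assumes that a stimulus-driven prediction is available for each voxel (for example, the time course predicted by its fitted pRF model); that assumption, the function name, and the numbers are illustrative and do not come from the manuscript. The prediction is removed by ordinary least squares, and the connective field model would then be refitted on the residuals.

```python
def residualize(signal, predictor):
    """Remove the least-squares fit of `predictor` (plus intercept) from `signal`."""
    n = len(signal)
    mean_x = sum(predictor) / n
    mean_y = sum(signal) / n
    sxx = sum((x - mean_x) ** 2 for x in predictor)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, signal))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(predictor, signal)]

# Hypothetical voxel time course and its stimulus-driven prediction.
predicted = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
measured = [1.0, 3.2, 0.8, 2.9, 1.1, 3.1]
residuals = residualize(measured, predicted)
```

      By construction the residuals are uncorrelated with the stimulus prediction, so any V3-V1 coupling that survives in them cannot be attributed to that modelled visual drive alone.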

    2. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper examines plasticity in early cortical (V1-V3) areas in an impressively large number of rod monochromats (individuals with achromatopsia). The paper examines three things:

      (1) Cortical thickness. It is now well established that early complete blindness leads to increases in cortical thickness. This paper shows increased thickness confined to the foveal projection zone within achromats. This paper replicates the work by Molz (2022) and Lowndes (2021), but the detailed mapping of cortical thickness as a function of eccentricity and the inclusion of higher visual areas is particularly elegant.

      (2) Failure to show large-scale reorganization of early visual areas using retinotopic mapping. This is a replication of a very recent study by Molz et al. but I believe, given anatomical variability (and the very large n in this study) and how susceptible pRF findings are to small changes in procedure, this replication is also of interest.

      (3) Connective field modelling, examining the connections between V3-V1. The paper finds changes in the pattern of connections, and smaller connective fields in individuals with achromatopsia than normally sighted controls, and suggests that these reflect compensatory plasticity, with V3 compensating for the lower resolution V1 signal in individuals with achromatopsia.

      Strengths:

      This is a carefully done study (both in terms of data collection and analysis) that is an impressive amount of work. I have a number of methodological comments but I hope they will be considered as constructive engagement - this work is highly technical with a large number of factors to consider.

      Weaknesses:

      (1) Effects of eye-movements

      I have some concerns with how the effects of eye-movements are being examined. There are two main reasons the authors give for excluding eye-movements as a factor in their results. Both explanations have limitations.

      (a) The first is that R2 values are similar across groups in the foveal confluence. This is fine as far as it goes, but R2 values are going to be low in that region. So this shows that eye-movements don't affect coverage (the number of voxels that generate a reliable pRF), but doesn't show that eye-movements aren't impacting their other measures.

      We agree with the reviewer that eye movements could affect pRF measures. We have now also included data for all participants where we were able to obtain eye tracking measures and directly tested this relationship. Relevant results are copied below.

      Recap of results: 1) as expected, gaze was less stable in achromats than controls; 2) achromats with more stable gaze did not show more activation in the scotoma projection zone, which we might have observed if fixation instability masked signals in this region; 3) gaze instability was not correlated with pRF size or eccentricity across V1 in achromats. We note that the relationship between nystagmus and visual sampling is complex - patients experience a stable image and may sample only during a specific phase of the eye movement. It is therefore not inherently clear if and how nystagmus affects pRF size.

      Relevant Manuscript text incorporating these analyses is copied below.

      To quantify eye movement, we used the following methods added to the manuscript:

      “Fixation stability

      Participants’ gaze was tracked throughout all pRF mapping runs. Collecting reliable gaze data from individuals with nystagmus is a challenge because out of the box calibration procedures mostly fail without stable fixation. To account for this, we implemented a post-hoc custom calibration procedure (Tailor et al., 2021). The eye-tracker was first precalibrated on a typically sighted individual. Then, before every other run, we collected gaze data from a 5-point fixation task (at fixation and above, below, left, and right of fixation at 5 eccentricity). This data allowed us to subsequently map the patient's recorded gaze coordinates to their precise locations on the screen. In 10 out of the 14 achromats we acquired reliable enough data to assess fixation stability.

      Calibration data processing: We first removed the first 0.5 seconds for each fixation location to allow for fixation to arrive on the target. We then performed (a) blink removal, (b) filtered out time points with eye movement velocity outliers (±2SD), and (c) filtered out any positions >3SDs to the left or right of the mean fixation location, and >1SD above or below. We took the median of the remaining gaze measurements as an approximate fixation estimate. The resulting 5 median fixation locations were used to fit an affine transformation that remapped the recorded gaze positions into screen space. 

      Quantifying fixation stability: after applying the transformation of the post-hoc calibration, data was filtered for blinks and extreme velocities (<2SD). For each functional run, fixation instability was measured as the standard deviation of gaze x-positions across 1-second windows. Measures were then averaged across the two run repeats.”
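      The two computational steps in the quoted methods can be sketched as follows. This is not the authors' pipeline: the function names and numbers are hypothetical, blink and velocity filtering are omitted, and the instability measure assumes that the standard deviation is taken within each 1-second window and then averaged, which is only one reading of the description. The affine map is fitted by least squares from the five median fixation locations to their known screen positions.

```python
from statistics import mean, pstdev

def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]

def fit_affine(recorded, targets):
    """Least-squares affine map from recorded gaze (x, y) to screen targets."""
    rows = [(x, y, 1.0) for x, y in recorded]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    coeffs = []
    for dim in range(2):  # fit screen x, then screen y
        atb = [sum(r[i] * t[dim] for r, t in zip(rows, targets)) for i in range(3)]
        coeffs.append(solve3(ata, atb))
    return coeffs  # [[a, b, c], [d, e, f]]: x' = a*x + b*y + c, y' = d*x + e*y + f

def apply_affine(coeffs, point):
    """Map one recorded gaze sample into screen coordinates."""
    x, y = point
    return tuple(c[0] * x + c[1] * y + c[2] for c in coeffs)

def fixation_instability(x_positions, sampling_rate_hz, window_s=1.0):
    """Mean of the per-window SDs of horizontal gaze position (assumed reading)."""
    size = int(sampling_rate_hz * window_s)
    starts = range(0, len(x_positions) - size + 1, size)
    return mean([pstdev(x_positions[s:s + size]) for s in starts])

# Hypothetical 5-point task: centre plus four targets at 5 units eccentricity.
targets = [(0.0, 0.0), (0.0, 5.0), (0.0, -5.0), (-5.0, 0.0), (5.0, 0.0)]
# Made-up miscalibrated recordings: scaled and shifted versions of the targets.
recorded = [(2.0 * tx + 1.0, 0.5 * ty - 3.0) for tx, ty in targets]
calibration = fit_affine(recorded, targets)
remapped = apply_affine(calibration, (11.0, -0.5))

instability = fixation_instability([0, 1, 0, 1, 2, 2, 2, 2], sampling_rate_hz=4)
```

      An affine fit needs at least three non-collinear points, so the five-point task over-determines it and small errors in any one median fixation estimate are averaged out.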

      We report the resulting new fixation data results as follows:

      Results (coverage section):

      “Another potential confound in our findings is fixation instability. In pRF mapping, which is usually conducted under photopic (cone-dominant) conditions, unstable fixation can cause a signal drop in the foveal projection zone. As expected due to nystagmus, the achromatopsia group showed higher fixation instability compared to controls (rod-selective: t<sub>(9.08)</sub>=-3.19, p=0.01; non-selective: t<sub>(9.41)</sub>=-4.88, p<0.001; degrees of freedom corrected for unequal variance; see Supplement Figure S2a). However, several lines of evidence suggest this instability cannot fully account for the lack of "filling in" in achromats. First, within the achromat group, we found no correlation between fixation stability and coverage (rod-selective: spearman-r<sub>(8)</sub> = -0.36, p=0.31; non-selective spearman-r<sub>(8)</sub>=0.07,p=0.85). Individuals with more stable, control-like fixation did not show more signal inside the scotoma (see Supplement 2). Second, in adults with achromatopsia, typically with less severe nystagmus (Kohl et al., 1993), two recent studies also found absence of filling in (Anderson et al., 2024; Molz et al., 2023).

      So, while we cannot fully exclude nystagmus masking foveal signals in the cortex of some patients, this converging evidence from structural and functional MRI measures across different studies and groups, strongly suggests that the deprived cortex does not substantially ‘fill in’ with peripheral rod inputs in achromatopsia.”

      Results (pRF size + eccentricity):

      “Larger pRFs indicate that neuronal populations in achromats’ V1 cortex combine information across larger areas in visual space than in typically sighted controls. This could reflect true neural tuning differences as well as be driven by larger eye movement. However, fixation instability in achromats does not significantly correlate with pRF size in our sample (rod-selective: spearman-r<sub>(8)</sub> = -0.41, p=0.24; non-selective spearman-r<sub>(8)</sub>=0.37,p=0.29)

      It has been shown that fitting artefacts around scotoma edges, can give rise to similar outward eccentricity shifts (Binda et al., 2013). However, when accounting for fitting artefacts around the foveal scotoma edge by modelling the rod-free zone during pRF fitting, pRF size and eccentricity differences remain unchanged (see Supplement 3). Finally, we found no significant correlations between gaze stability and the eccentricity shift (rod-selective: spearman-r<sub>(8)</sub> = 0.58, p=0.08; non-selective spearman-r<sub>(8)</sub>=0.09,p=0.8, Supplement 4D)

      Together, these analyses reveal subtle differences in how V1 of achromats responds to rod signals outside the foveal zone, which are consistent with results from other studies (Molz et al. 2023, Anderson et al. 2024). While we found no direct evidence that these are being driven by confounding factors such as eye-movements or fitting artefacts, more work is needed to understand the underlying processes that give rise to these shifts.”

      The following text has been added to Supplement 2

      “As expected, achromats showed significantly higher fixation instability compared to controls (as reported in the main text). We found no significant correlation between fixation instability and coverage, pRF size, or eccentricity in achromats. Results of Spearman R correlations in both rod- and non-selective conditions are reported in the figure. We note that the relationship between nystagmus and visual sampling is complex: patients experience a stable image and may sample only during specific eye-movement phases. It is therefore not fully clear if and how nystagmus should give rise to altered pRFs.”

      (b) The authors don't see a clear relationship between coverage and fixation stability. This seems to rest on a few ad hoc examples. (What happens if one plots mean fixation deviation vs. coverage, setting the individuals who could not be calibrated to the highest value of calibrated fixation deviation? Does a relationship then emerge?)

      In any case, I wouldn't expect coverage to be particularly susceptible to eye-movements. If a voxel in the cortex entirely projects to the scotoma then it should be robustly silent. The effects of eye-movements will be to distort the size and eccentricity estimates of voxels that are not entirely silent.

      There are many places in the paper where eye-movements might be playing an important role. 

      Examples include the larger pRF sizes observed in achromats. Are those related to fixation instability?

      We thank the reviewer for their comment. As detailed in our previous response, we have now extracted fixation instability data from additional patients and have expanded our discussion of its potential effects throughout the manuscript.

      Given that fixation instability is expected to increase pRF size by a fixed amount, that would explain why ratios are close to 1 in V3 (Figure 4).

      We agree with the reviewer’s point that the ratio change on its own is not strong evidence of compensation; this analysis was meant to complement the CF result. The plot in Figure 4 is intended to reconcile the connective field (CF) and pRF results. Its purpose is to illustrate that even though larger pRFs in achromats might seem counterintuitive alongside their smaller V3 CF sizes, the pRF data do not contradict the CF findings but are in fact consistent with them. We also agree that there are alternative explanations for the differences in pRF size, such as fixation stability, and we have now added this point to the text.

      Results (CF size):

      “To understand how this finer cortical sampling in V3 (smaller connective fields) impacts visual processing, we consider its effect on population receptive fields (pRFs). In V1, pRF sizes in achromats were significantly larger than in controls for both stimulus conditions, indicating coarser spatial tuning at the cortical input stage (Figure 4C, left). By selectively sampling from a smaller area of the V1 surface (smaller CFs), V3 can effectively compensate for this coarser input. If so, this process should result in a relative normalisation of pRF size in V3 compared to V1 (Figure 4C, right).

      To test this prediction, we plotted the ratio of pRF sizes between achromats and controls, where a value of 1 indicates parity between the groups (Figure 4B). As our compensatory connective field hypothesis predicts, the ratio was closer to 1 in V3 than in V1 across both stimulus conditions, confirming the pRF size difference was significantly reduced at the higher cortical stage. Together this shows converging evidence across the two models (pRF and CF) of hierarchical refinement as a possible compensatory mechanism, where V3's altered connectivity helps to normalize the processing of degraded sensory input from V1.”

      Discussion:

      “The hierarchical reorganisation observed in V3 is unlikely to be driven by fixation instability. Connective field (CF) estimates are robust to eye movements (Tangtartharakul et al., 2023), because they are anchored to V1 inputs rather than absolute screen position. Considered alone, the pRF results could alternatively be explained by eye movements introducing a fixed size offset that affects smaller V1 pRFs more strongly than those in V3. While we found no evidence for this relationship between pRF size and gaze measures in our patients, we cannot fully rule out the possibility. Nevertheless, the internal consistency between the CF and pRF measures provides a more parsimonious account; that sampling across the hierarchy accounts for coarser tuning at the input stage.”
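      The alternative account raised in this exchange, a fixed size offset from eye movements, can be made concrete with a small worked example. The pRF sizes, the offset, and the purely additive model are all hypothetical and chosen only to show the direction of the effect.

```python
def size_ratio(control_size, neural_scale=1.0, additive_offset=0.0):
    """Patient/control pRF size ratio under a simple hypothetical size model."""
    patient_size = control_size * neural_scale + additive_offset
    return patient_size / control_size

# Hypothetical control pRF sizes in degrees of visual angle.
v1_control, v3_control = 1.0, 3.0

# Same additive offset in both areas, no change in neural tuning.
offset_only = (
    size_ratio(v1_control, additive_offset=0.5),
    size_ratio(v3_control, additive_offset=0.5),
)
```

      With these numbers the ratio is 1.5 in V1 and about 1.17 in V3, so a constant offset by itself moves the V3 ratio closer to 1. This is why the ratio plot cannot separate compensation from fixation instability without the connective field result.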

      (2) Topography

      The claim of no change in topography is a little confusing given that you do see a change in eccentricity mapping in achromats. 

      Either this result is real, in which case there *is* a change in topography, albeit subtle, or it's an artifact. 

      Perhaps these results need a little bit of additional scrutiny. 

      One reason for concern is that you see different functions relating eccentricity to V1 segments depending on the stimulus. That almost certainly reflects biases in the modelling, not reorganization - the curves of Figure 2D are exactly what Binda et al. predict. 

      Another reason for concern is that I'm very surprised that you see so little effect of including/not including the scotoma - the differences seem more like what I'd expect from simply repeating the same code twice. (The quickest sanity check is just to increase the size of the estimated scotoma to be even bigger?).

      We thank the reviewer for their comment. We have double-checked our scotoma modelling, confirming its correct implementation. The results of the scotoma modelling are not identical to the full one, just similar (see below).

      Previous studies on “artificial scotomas” (such as the one reported by Binda et al.) have shown mixed results. While Binda and colleagues found that modelling artificial scotomas normalised pRF shifts, others found no effect (Haak et al. 2012, Prabhakaran et al. 2020). Notably, the rodfree zone in achromatopsia is considerably smaller (~0.5° radius) than most tested artificial scotomas. Moreover, it is unclear whether scotoma modelling is beneficial in clinical populations as artificial scotomas (screen-based masking) are not equivalent to retinal scotomas from inactive photoreceptors. A recent achromatopsia study (Anderson et al. 2024) also found no change in pRF estimates with scotoma modelling.

      In our scotoma analyses, we found meaningful differences only in the non-selective condition in controls where cones in the rod-free zone are stimulated - which would be the main expected effect of this modelling exercise (see below). In all other conditions (rod-selective in controls, both conditions in achromats), where only rods are stimulated, we found no difference in coverage, eccentricity or pRF size when modelling the scotoma, likely because the foveal signal is weak/absent and did not contribute much to pRF estimates in the unmasked analyses.

      This means we cannot account for the eccentricity shift as an edge effect with this scotoma model – but we remain cautious about interpreting it as real. This is because first, as we mention in the paper, in the non-selective condition, which has a higher signal-to-noise ratio, the eccentricity estimates in achromats match those of the control group's rod system. Second, it is still possible that the observed shift is an artefact of modelling that was not accounted for by the approach of scotoma modelling.

      Our claim of "no change in topography" specifically referred to the absence of "filling-in" as measured by cortical coverage - the percentage of activated tissue regardless of fitted parameters. However, to avoid confusion given the eccentricity and pRF size results, we have now rephrased our claim.

      Abstract:

      “Cortical input stages (V1) exhibited high stability, with input-deprived cortex showing no retinotopic remapping and exhibiting structural hallmarks of deprivation.”

      Results (pRF eccentricity):

      “It has been shown that fitting artefacts around scotoma edges, can give rise to similar outward eccentricity shifts (Binda et al., 2013). However, when accounting for fitting artefacts around the foveal scotoma edge by modelling the rod-free zone during pRF fitting, pRF size and eccentricity differences remain unchanged (see Supplement 3). Finally, we found no significant correlations between gaze stability and the eccentricity shift (rod-selective: spearman-r<sub>(8)</sub> = 0.58, p=0.08; non-selective spearman-r<sub>(8)</sub>=0.09,p=0.8, Supplement 4D)

      Together, these analyses reveal subtle differences in how V1 of achromats responds to rod signals outside the foveal zone, which are consistent with results from other studies (Molz et al. 2023, Anderson et al. 2024). While we found no direct evidence that these are being driven by confounding factors such as eye movements or fitting artefacts, more work is needed to understand the underlying processes that give rise to these shifts.”

      To better illustrate the effect of scotoma modelling text has been added to Supplement 3:

      “Studies on artificial scotomas, where part of the visual field is masked, suggest that pRF estimates of eccentricity and size can be biased by fitting scotoma-edge artefacts, and that these can be mitigated by modelling the scotoma in the pRF fitting procedure (e.g., Binda et al. 2013).

      We therefore repeated the pRF modelling procedure with the rod-scotoma being modelled as a black oval mask (1.25°x0.9°) over the stimulus aperture model. As expected, a visible difference between the two models is only apparent in the non-selective condition in controls where the cones in the rod-free zone are being stimulated. In all the other conditions (rod-selective in controls, and both stimulation conditions in achromats) only the rods are stimulated, therefore the masked stimulus still matches the retinal activation, and no major differences can be observed. Performing the same statistical tests applied to the full model in the main text yields equivalent results: coverage in the rod-selective condition is equivalent across groups (t(47) = 0.78, p=0.43, BF10=0.31), and controls show higher coverage than achromats in the non-selective stimulation condition (Mann U(52)=141, p<0.01; unequal variance, reverted to non-parametric).

      This consistency in pRF properties when modelling the rod scotoma, is in line with previous results from scotoma modelling; While Binda and colleagues found that this normalised pRF shifts, others found no effect (Haak et al. 2012, Prabhakaran et al. 2020). Notably, the rod-free zone in achromatopsia is considerably smaller (~0.5° radius) than most tested artificial scotomas, and as artificial scotomas (screen-based masking) are not equivalent to retinal scotomas from inactive photoreceptors, it is unclear how artificial scotoma findings generalise to clinical populations. Our results are in line with a recent achromatopsia study (Anderson et al. 2024) which also found no change in pRF estimates with scotoma modelling.”

      I'd also look at voxels that pass an R2>0.2 threshold for both the non-selective and selective stimulus. Are the pRF sizes the same for both stimuli? Are the eccentricity estimates? If not, that's another clear warning sign.

      Comparable results were obtained when using higher R2 thresholds. These results are now included in Supplement 6.

      (3) Connective field modelling

      Let's imagine a voxel on the edge of the scotoma. It will tend to have a connective field that borders the scotoma, and will be reduced in size (since it will likely exclude the cortical region of V1 that is solely driven by resting state activity). This predicts your rod monochromat data. The interesting question is why this doesn't happen for controls. One possibility is that there is top-down 'predictive' activity that smooths out the border of the scotoma (there's some hint of that in the data), e.g., Masuda and Wandell.

      One thing that concerns me is that the smaller connective fields don't make sense intuitively. When there is a visual stimulus, connective fields are predominantly driven by the visual signal. In achromats, there is a large swath of cortex (between 1-2.5 degrees) which shows relatively flat tuning as regards eccentricity. The curves for controls are much steeper, See Figure 2b. This predicts that visually driven connective fields should be larger for achromats. So, what's going on?

      The reviewer raises interesting points about the interpretation of our connective field results. The possibility of differential top-down modulation between controls and achromats is intriguing; however, it is not supported by the data. If top-down modulation were activating foveal V1 in controls, we should not see a drop in the number of significant vertices sampling from the fovea in the rod-selective condition compared to the non-selective condition, but in fact we do see quite a large drop in that area. Therefore, at the moment we do not think there is a strong basis to assume our data could be explained by achromats lacking top-down predictive activity in the scotoma area that is present in controls.

      Regarding the concern about smaller CFs seeming counterintuitive given the flat eccentricity tuning in achromats' V1: we believe there is not a straightforward prediction from pRF properties to CF sizes. The relationship between V1 pRF characteristics and V3 CF sampling is complex and not well-established in the literature, and the two can be decoupled to some degree. For instance, in our data, controls show flat V1 pRF sizes in the rod-selective condition (similar to achromats), yet their V3 CF sizes maintain the typical eccentricity-dependent increase seen in the non-selective condition. This suggests that CF size patterns don't simply mirror V1 pRF properties or visual stimuli responses.

      Importantly, CF modelling fundamentally differs from pRF analysis in how it might be affected by scotomas. Unlike pRF analysis where a scotoma creates a "silent" region in visual space, in CF modelling the deprived cortex remains physically present and continues generating neural signals (albeit not visually-driven ones). If V3-V1 connectivity were anatomically fixed, V3 would continue sampling from deprived V1 regions even if they do not produce visual-driven signals. A change in this sampling pattern, as we see in our data, is therefore evidence for plasticity.

      Our data support this interpretation. First, in achromats, the CF size pattern observed cannot be easily explained by scotoma-edge artefacts. V3 vertices sampling from the immediate vicinity of the scotoma (1°-3°) show CF sizes comparable to controls. The effect is only significant further away from the scotoma (4°-6°).

      Second, to assess how the presence of a scotoma affects CF measures, we can compare the two conditions in the controls, since the rod-selective condition has a scotoma present and the non-selective condition does not. For this purpose, we performed an additional analysis, quantifying on a vertex-by-vertex level the differences in CF fitted parameters between the two stimulation conditions across V1. See results below. In achromats there are no systematic shifts between the stimulation conditions, as expected, since both are rod-driven. In controls, this analysis reveals only subtle shifts (~0.45° in the rod-selective condition). CF size also changed slightly, although the change was not significantly different from that observed in achromats. These shifts are much smaller than the CF size and eccentricity differences between controls and achromats, so we consider it unlikely that our findings are driven by scotoma artefacts.

      Author response image 1.

      Results (CF size):

      “The significant CF size differences are unlikely to be a model-fitting bias around a scotoma edge, as V3 vertices sampling from the immediate vicinity of the scotoma (1°-3°) show CF sizes comparable to controls. The significant reduction in CF size occurs only further in the periphery (4°-6°), in regions that are primarily stimulus-driven.

      To understand how this finer cortical sampling in V3 (smaller connective fields) impacts visual processing, we consider its effect on population receptive fields (pRFs). In V1, pRF sizes in achromats were significantly larger than in controls for both stimulus conditions, indicating coarser spatial tuning at the cortical input stage (Figure 4C, left). By selectively sampling from a smaller area of the V1 surface (smaller CFs), V3 can effectively compensate for this coarser input. If so, this process should result in a relative normalisation of pRF size in V3 compared to V1 (Figure 4C, right).

      To test this prediction, we plotted the ratio of pRF sizes between achromats and controls, where a value of 1 indicates parity between the groups (Figure 4B). As our compensatory connective field hypothesis predicts, the ratio was closer to 1 in V3 than in V1 across both stimulus conditions, confirming the pRF size difference was significantly reduced at the higher cortical stage. Together this shows converging evidence across the two models (pRF and CF) of hierarchical refinement as a possible compensatory mechanism, where V3's altered connectivity helps to normalize the processing of degraded sensory input from V1.”

      Discussion (added paragraph):

      “The hierarchical reorganisation observed in V3 is unlikely to be driven by fixation instability. Connective field (CF) estimates are robust to eye movements (Tangtartharakul et al., 2023), because they are anchored to V1 inputs rather than absolute screen position. Considered alone, the pRF results could alternatively be explained by eye movements introducing a fixed size offset that affects smaller V1 pRFs more strongly than those in V3. While we found no evidence for this relationship between pRF size and gaze measures in our patients, we cannot fully rule out the possibility. Nevertheless, the internal consistency between the CF and pRF measures provides a more parsimonious account; that sampling across the hierarchy accounts for coarser tuning at the input stage.”

      The beta parameter is not described (and I believe it can alter connective field sizes).

      In Author response image 2, we plot the beta parameter of the pRF modelling in V1 with no R<sup>2</sup> filtering, error bars are 95% CIs:

      Author response image 2.

      The reviewer did not specify how beta might alter connective field sizes. We assume they meant that, as in pRF mapping, the slope of activity from deprived to non-deprived cortex will artefactually create a CF model fit with smaller CF sizes. To test this, we calculated the slope of beta values between 0° and 3° in each participant in the rod-selective condition, as this range includes the scotoma and the area at the edge of the scotoma. We then used the slope as a covariate in an ANCOVA when comparing the CF sizes across groups in each sampled V1 segment. Accounting for the beta slope of V1 did not change the reported results. This analysis still shows smaller CF sizes in V3 in the rod-selective condition between 4°-6° eccentricity; these differences remain significant (p<0.001 for 4°-5° and p<0.05 for 5°-6° when comparing achromats vs controls).

      Similarly, it's possible to get very small connective fields, but there wasn't a minimum size described in the thresholding.

      CF sizes were fit with a grid fit. Possible values were [0.5,1,2,3,4,5,7,10]. Therefore, the minimum size is 0.5. Filtering out the smallest connective field sizes does not change the results:

      Author response image 3.
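
      To make the grid fit over CF sizes more concrete, here is a minimal sketch, not the authors' implementation. It assumes the usual connective field form (a Gaussian weighting of V1 time series by cortical distance from the CF centre), a fixed centre, and correlation as the goodness-of-fit measure. Only the candidate size grid comes from the response above; all function names and the simulated data are hypothetical.

```python
import math
import random

# Candidate CF sizes quoted in the response; everything else is illustrative.
CANDIDATE_SIZES = [0.5, 1, 2, 3, 4, 5, 7, 10]


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def cf_prediction(v1_timeseries, distances, size):
    """Gaussian-weighted sum of V1 time series (assumed CF model form)."""
    weights = [math.exp(-d * d / (2 * size * size)) for d in distances]
    n_timepoints = len(v1_timeseries[0])
    return [
        sum(w * ts[t] for w, ts in zip(weights, v1_timeseries))
        for t in range(n_timepoints)
    ]


def grid_fit_cf_size(target, v1_timeseries, distances, sizes=CANDIDATE_SIZES):
    """Return the candidate size whose prediction best correlates with target."""
    scores = {
        s: pearson(target, cf_prediction(v1_timeseries, distances, s))
        for s in sizes
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Simulated example: 40 V1 vertices along a line, 120 time points each.
rng = random.Random(0)
distances = [0.5 * i for i in range(40)]  # distance of each vertex from the CF centre
v1 = [[rng.gauss(0, 1) for _ in range(120)] for _ in distances]
v3_target = cf_prediction(v1, distances, 2)  # "true" CF size of 2
best_size, best_r = grid_fit_cf_size(v3_target, v1, distances)
```

      Because the fit can only return values on the grid, the smallest recoverable CF size is the first grid entry, which is the point made in the response about the minimum size of 0.5.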

      I might be missing something obvious, but I'm just deeply confused as to how the visual maps and the connectome maps can provide contradictory results given that the connectome maps are predominantly determined by the visual signal. Some intuition would be helpful.

      We agree that this appears counterintuitive, and have now added further clarification. The two models (pRF and CF) fundamentally differ in what they measure and how they relate to visual processing. V1 pRF sizes reflect the relationship between neural activity and visual stimuli (essentially how much of a visual stimulus drives a voxel's response), while V3 CF sizes reflect how V3 samples from the V1 cortical surface, indicating how many V1 voxels contribute to a V3 voxel's activity.

      The measures constrain each other, as a V3 voxel's pRF size is expected to match the pooling of its connected V1 inputs. But they can be decoupled: A V3 voxel could sample from a small area of V1 cortex (a small CF in mm) that happens to represent a large area of visual space if those V1 voxels have large pRFs. The aim of Figure 4B is to clarify that the measures are consistent with one another even though they diverge in direction. In achromats, where V1 voxels have larger pRFs (coarser spatial resolution), V3 appears to compensate by sampling more selectively from V1 via smaller CF sizes. Theoretically, this should reduce the pRF size difference between controls and patients in V3, a prediction that our data supports.

      Results (CF size):

      “To understand how this finer cortical sampling in V3 (smaller connective fields) impacts visual processing, we consider its effect on population receptive fields (pRFs). In V1, pRF sizes in achromats were significantly larger than in controls for both stimulus conditions, indicating coarser spatial tuning at the cortical input stage (Figure 4C, left). By selectively sampling from a smaller area of the V1 surface (smaller CFs), V3 can effectively compensate for this coarser input. If so, this process should result in a relative normalisation of pRF size in V3 compared to V1 (Figure 4C, right).

      To test this prediction, we plotted the ratio of pRF sizes between achromats and controls, where a value of 1 indicates parity between the groups (Figure 4B). As our compensatory connective field hypothesis predicts, the ratio was closer to 1 in V3 than in V1 across both stimulus conditions, confirming the pRF size difference was significantly reduced at the higher cortical stage. Together this shows converging evidence across the two models (pRF and CF) of hierarchical refinement as a possible compensatory mechanism, where V3's altered connectivity helps to normalize the processing of degraded sensory input from V1.”

      Discussion (added paragraph):

      “The hierarchical reorganisation observed in V3 is unlikely to be driven by fixation instability. Connective field (CF) estimates are robust to eye movements (Tangtartharakul et al., 2023), because they are anchored to V1 inputs rather than absolute screen position. Considered alone, the pRF results could alternatively be explained by eye movements introducing a fixed size offset that affects smaller V1 pRFs more strongly than those in V3. While we found no evidence for this relationship between pRF size and gaze measures in our patients, we cannot fully rule out the possibility. Nevertheless, the internal consistency between the CF and pRF measures provides a more parsimonious account; that sampling across the hierarchy accounts for coarser tuning at the input stage.”

      Some analyses might also help provide the reader with insight. For example, doing analyses separately on V3 voxels that project entirely to scotoma regions, project entirely to stimulus-driven regions, and V3 voxels that project to 'mixed' regions.

      We agree that it is important to plot the connective field dynamics across the scotoma region.

      In Figure 4A we split the V3 vertices based on the V1 area they sample from. Therefore, the 0°-1° segment would be considered as mainly sampling from the “scotoma” region, and the higher the eccentricity, the less “scotoma” it includes. The V3 vertices that have a significantly smaller CF size compared to controls are those sampling from mostly, if not entirely, stimulus-driven regions (4°-5° and 5°-6°). We are not sure how further binning the data by within, across and outside scotoma would be more informative.

      However, in Author response image 4, we plot in more detail the distribution of CF sizes sampling from a V1 segment clearly inside and clearly outside the scotoma. The top figure shows the CF size distribution of V3 vertices that sample from a V1 0°-1° segment, where V1 is deprived of input due to the rod scotoma. In achromats, there is a clear drop in vertices with a very small (0.5) CF size. The bottom figure shows the distribution of V3 vertices that sample from the V1 4°-5° segment, which falls outside the scotoma and shows a significant difference in CF size across the groups. Here in achromats you can see a drop in larger V3 CF sizes sampling from the V1 region, and an increase in smaller ones (note that this further addresses a previous concern that connective field differences across groups are solely driven by very small CFs).

      Author response image 4.

      Following the reviewer’s comment we have added the following statement in the results section discussing CF size:

      “The significant CF size differences are unlikely to be a model-fitting bias around a scotoma edge, as V3 vertices sampling from the immediate vicinity of the scotoma (1°-3°) show CF sizes comparable to controls. The significant reduction in CF size occurs only further in the periphery (4°-6°), in regions that are primarily stimulus-driven.”

      The finding that pRF sizes are larger in achromats by a constant factor as a function of eccentricity is what differences in eye-movements would predict. It would be worth examining the relationship between pRF sizes and fixation stability.

      We found no relationship between fixation stability and pRF size in V1, although, as we explain in response to an earlier point, this does not fully exclude the reviewer's alternative explanation, which we now add to the discussion.

      Discussion:

      “The hierarchical reorganisation observed in V3 is unlikely to be driven by fixation instability. Connective field (CF) estimates are robust to eye movements (Tangtartharakul et al., 2023), because they are anchored to V1 inputs rather than absolute screen position. Considered alone, the pRF results could alternatively be explained by eye movements introducing a fixed size offset that affects smaller V1 pRFs more strongly than those in V3. While we found no evidence for this relationship between pRF size and gaze measures in our patients, we cannot fully rule out the possibility. Nevertheless, the internal consistency between the CF and pRF measures provides a more parsimonious account; that sampling across the hierarchy accounts for coarser tuning at the input stage.”

      Reviewer #2 (Public review):

      Summary:

      The authors inspect the stability and compensatory plasticity in the retinotopic mapping in patients with congenital achromatopsia. They report increased cortical thickness in central V1 (eccentricities 0-2 deg) and the expansion of this effect to V2 (trend) and V3 in a cohort with an average age in adolescence.

      In analyzing the receptive fields, they show that V1 had increased receptive field sizes in achromats, but there were no clear signs of reorganization filling in the rod-free area. In contrast, V3 showed an altered readout of V1 receptive fields. V3 of achromats oversampled the receptive fields bordering the rod-free zone, presumably to compensate and arrive at similar receptive fields as in the controls.

      These findings support a retention of peripheral-V1 connectivity, but a reorganization of later hierarchical stages of the visual system to compensate for the loss, highlighting a balance between stability and compensation in different stages of the visual hierarchy.

      Strengths:

      The experiment is carefully analyzed, and the data convey a clear and interesting message about the capacities of plasticity. 

      Weaknesses:

      The existence of unstable fixation and nystagmus in the patient group is alluded to, but not quantified or modeled out in the analyses. The authors may want to address this possible confound with a quantitative approach.

      We have responded to this in the “Recommendations for the authors” section of this reviewer, as they included a more detailed description of these points there.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I think the term rod monochromats should be included early in the paper since it's a more intuitive term to describe this population.

      We agree with the reviewer that the term “rod monochromats” is more intuitive as it clarifies the retinal source of the disease but have chosen the term achromats for consistency with a wide literature of published work in this group, including our own and our close collaborators’. To clarify, in the first mention of the group as achromats in the introduction we have now added this term:

      “Achromatopsia (also known as rod monochromacy) causes cone photoreceptors in the retina to be inactive from birth (Aboshiha et al., 2014).”

      (2) The paper essentially contains two definitions of 'eccentricity'. One (atlas/segments) comes from the Benson atlas and the other (functional) comes from pRF mapping. It would be good to make this terminological distinction clearer earlier in the paper. It would also be good to use more consistent terminology. I assume 'sampled atlas V1 eccentricity' in 3A is the same as 'V1 segment' in 1A?

      For consistency we have now referred to these as V1 segment and sampled V1 segment in the figures when describing the atlas-based definition, and eccentricity for the measured pRF-based eccentricity.

      (3) The 'stability vs. plasticity' framing in the introduction could be tightened slightly.

      We have made the following changes following the reviewer’s comment:

      “In the visual domain, the focal point of the debate on plasticity and stability has hinged on the extent to which retinal input deprivation can drive local reorganisation in early visual cortex, for example, for deprived tissue to take on inputs from spared retinal locations (Adams et al., 2007; Baker et al., 2005, 2008; Baseler et al., 2002, 2011; Calford et al., 2005; Dilks et al., 2009; Dumoulin & Knapen, 2018; Ferreira et al., 2016; Goesaert et al., 2014; Haak et al., 2015; Molz et al., 2023; Ritter et al., 2019; Schumacher et al., 2008). In reality visual impairment is a more global phenomenon, affecting all levels of visual processing, with complex dynamics beyond constricted local retinocortical projection zones (Carvalho et al., 2019).”

      (4) Figure 1A, define the x axis as degrees.

      We have now added the ° sign to all the tick labels indicating Benson map eccentricity.

      (5) Figure 2B, is there room for pictures of the silent substitution/standard stimulus?

      We have now added images in Supplement 5 to avoid cluttering the main Figure 2B.

      (6) Figure 2

      Panel A has a slightly weird organization. The reader is supposed to compare the square symbols to each other, and the circles to each other, why not organize the figure so they are adjacent in the graph (i.e. non selective control, non-selective achromat, selective control, selective achromat)? That also helps the reader orient that in the non-selective conditions you have almost complete pRF coverage. 

      We have taken on the reviewer’s suggestion and changed the order.

      In the inset, maybe use empty symbols? That's the traditional way to say that the square/circle applies to both red and black.

      We prefer the current format.

      Figure 2C - the symbols change to circles? Why not keep the symbols of A?

      We have now changed the symbols of 2C&D.

      I'd put the non-selective maps above the selective maps?

      We appreciate the feedback but prefer to keep it as it is, as we feel the critical point is conveyed by the rod maps.

      (7) 'We propose a new hierarchical model of neural adaptation'. These ideas are hardly new. There are also other models, that would explain your data (cumulative plasticity) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5953572/

      We thank the reviewer for the reference. We have now cited it in our discussion and removed the word “new” from the mentioned sentence.

      “Therefore, there is theoretically broader scope for experience-dependent reweighting of inputs (Beyeler et al., 2017; Makin & Krakauer, 2023) and to optimise use of inputs that are still available, more reliable, or more relevant in the impaired system. Conversely, higher-order visual areas may appear more plastic simply because they integrate the cumulative effects of learning from multiple lower stages (Beyeler et al., 2017).”

      “We propose a hierarchical model of neural adaptation…” [deleted the word new]

      (8) Line 508. No image of the stimulus is contained in the paper

      Corrected

      (9) Line 620. I believe the Figure is 1B, not 1C.

      Corrected

      (10) Figure 4A. CF Size - add mm2 to the axes.

      Corrected

      Reviewer #2 (Recommendations for the authors):

      I am not an expert on pRF mapping, and as such, I am unsure how to relate to pRF mapping performed in patients with unstable fixation (not quantified, but referred to) and nystagmus, such as the achromatic population here. Since the majority of the results hinge on this analysis, I would appreciate more data about the differences between the groups. Supplement 2, which is meant to speak to this, shows only the data from 3 typical participants, and in itself is not evidence for "no correlation between stable fixation and enhanced foveal". Additionally, I'd appreciate a clear methods explanation of how the authors address these confounds; this is too important a concern to be left for the discussion section.

      We agree with the reviewer that eye movements could affect pRF measures. We have now also included data for all participants where we were able to obtain eye tracking measures and directly tested this relationship. Relevant results are copied below.

      Recap of results: 1) as expected, gaze was less stable in achromats than controls; 2) achromats with more stable gaze did not show more activation in the scotoma projection zone, which we might have observed if fixation instability masked signals in this region; 3) gaze instability was not correlated with pRF size and eccentricity across V1 in achromats. We note that the relationship between nystagmus and visual sampling is complex: patients experience a stable image and may sample only during a specific phase of the eye movement. It is therefore not inherently clear if and how nystagmus affects pRF size.

      Relevant Manuscript text incorporating these analyses is copied below.

      To quantify eye movement, we used the following methods added to the manuscript:

      “Fixation stability

      Participants’ gaze was tracked throughout all pRF mapping runs. Collecting reliable gaze data from individuals with nystagmus is a challenge because out-of-the-box calibration procedures mostly fail without stable fixation. To account for this, we implemented a post-hoc custom calibration procedure (Tailor et al., 2021). The eye-tracker was first precalibrated on a typically sighted individual. Then, before every other run, we collected gaze data from a 5-point fixation task (at fixation and above, below, left, and right of fixation at 5° eccentricity). This data allowed us to subsequently map the patient's recorded gaze coordinates to their precise locations on the screen. In 10 out of the 14 achromats we acquired reliable enough data to assess fixation stability.

      Calibration data processing: We first removed the first 0.5 seconds for each fixation location to allow for fixation to arrive on the target. We then performed (a) blink removal, (b) filtered out time points with eye movement velocity outliers (±2SD), and (c) filtered out any positions >3SDs to the left or right of the mean fixation location, and >1SD above or below. We took the median of the remaining gaze measurements as an approximate fixation estimate. The resulting 5 median fixation locations were used to fit an affine transformation that remapped the recorded gaze positions into screen space.

      Quantifying fixation stability: after applying the transformation of the post-hoc calibration, data was filtered for blinks and extreme velocities (<2SD). For each functional run, fixation instability was measured as the standard deviation of gaze x-positions across 1-second windows. Measures were then averaged across the two run repeats.”
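
      The calibration and stability steps quoted above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: the affine map is fitted by ordinary least squares from the five median fixation estimates to the five known target positions, and "standard deviation across 1-second windows" is read as the SD within each non-overlapping window, averaged over windows. All function names, the sampling rate, and the example numbers are hypothetical.

```python
from statistics import mean, stdev


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_affine(recorded, targets):
    """Least-squares affine map from recorded gaze (x, y) to screen (x, y)."""
    rows = [(x, y, 1.0) for x, y in recorded]
    # Normal equations (A^T A) p = A^T t, solved once per output axis.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    params = []
    for axis in (0, 1):
        atb = [sum(r[i] * t[axis] for r, t in zip(rows, targets)) for i in range(3)]
        params.append(solve3(ata, atb))
    return params  # [[a_x, b_x, c_x], [a_y, b_y, c_y]]


def apply_affine(params, point):
    """Map one recorded gaze sample into screen space."""
    x, y = point
    return tuple(a * x + b * y + c for a, b, c in params)


def fixation_instability(x_positions, sampling_rate_hz, window_s=1.0):
    """Mean over non-overlapping windows of the SD of horizontal gaze position.

    Assumption: SD is computed within each window and then averaged.
    """
    n = int(sampling_rate_hz * window_s)
    windows = [x_positions[i:i + n] for i in range(0, len(x_positions) - n + 1, n)]
    return mean(stdev(w) for w in windows)


# Hypothetical 5-point layout: fixation plus four targets at 5 units eccentricity.
targets = [(0.0, 0.0), (0.0, 5.0), (0.0, -5.0), (-5.0, 0.0), (5.0, 0.0)]
# Simulated median gaze estimates from an uncalibrated tracker.
recorded = [(2 * x + 1, 0.5 * y - 3) for x, y in targets]
params = fit_affine(recorded, targets)
```

      With five targets and six affine parameters the fit is overdetermined, so noise in any single median fixation estimate is partly averaged out rather than reproduced exactly.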

      Results (coverage section):

      “Another potential confound in our findings is fixation instability. In pRF mapping, which is usually conducted under photopic (cone-dominant) conditions, unstable fixation can cause a signal drop in the foveal projection zone. As expected due to nystagmus, the achromatopsia group showed higher fixation instability compared to controls (rod-selective: t<sub>(9.08)</sub>=-3.19, p=0.01; non-selective: t<sub>(9.41)</sub>=-4.88, p<0.001; degrees-of-freedom corrected for unequal variance; see Supplement Figure S2a). However, several lines of evidence suggest this instability cannot fully account for the lack of "filling in" in achromats. First, within the achromat group, we found no correlation between fixation stability and coverage (rod-selective: spearman-r<sub>(8)</sub> = -0.36, p=0.31; non-selective: spearman-r<sub>(8)</sub>=0.07, p=0.85); individuals with more stable, control-like fixation did not show more signal inside the scotoma (see Supplement 2). Second, in adults with achromatopsia, typically with less severe nystagmus (Kohl et al., 1993), two recent studies also found absence of filling in (Anderson et al., 2024; Molz et al., 2023).

      So, while we cannot fully exclude nystagmus masking foveal signals in the cortex of some patients, this converging evidence from structural and functional MRI measures across different studies and groups, strongly suggests that the deprived cortex does not substantially ‘fill in’ with peripheral rod inputs in achromatopsia.”
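
      For readers less familiar with the statistic reported above, a Spearman correlation is a Pearson correlation computed on ranks. The sketch below is illustrative only: the function names and example values are hypothetical and are not the study's data. The subscript (8) in the quoted results is presumably the degrees of freedom, n - 2, which would match the 10 achromats with usable eye tracking.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


# Hypothetical per-participant values (instability vs. coverage), one tie in y.
instability = [1, 2, 3, 4, 5]
coverage = [5, 6, 7, 8, 7]
r = spearman_r(instability, coverage)
degrees_of_freedom = len(instability) - 2
```

      Working on ranks makes the measure insensitive to outliers in gaze variability, which is a sensible choice for a small clinical sample.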

      Results (pRF size + eccentricity):

      “Larger pRFs indicate that neuronal populations in achromats’ V1 cortex combine information across larger areas in visual space than in typically sighted controls. This could reflect true neural tuning differences as well as be driven by larger eye movements. However, fixation instability in achromats does not significantly correlate with pRF size in our sample (rod-selective: spearman-r<sub>(8)</sub> = -0.41, p=0.24; non-selective: spearman-r<sub>(8)</sub>=0.37, p=0.29).

      It has been shown that fitting artefacts around scotoma edges can give rise to similar outward eccentricity shifts (Binda et al., 2013). However, when accounting for fitting artefacts around the foveal scotoma edge by modelling the rod-free zone during pRF fitting, pRF size and eccentricity differences remain unchanged (see Supplement 3). Finally, we found no significant correlations between gaze stability and the eccentricity shift (rod-selective: spearman-r<sub>(8)</sub> = 0.58, p=0.08; non-selective: spearman-r<sub>(8)</sub>=0.09, p=0.8; Supplement 4D).

      Together, these analyses reveal subtle differences in how V1 of achromats responds to rod signals outside the foveal zone, which are consistent with results from other studies (Molz et al. 2023, Anderson et al. 2024). While we found no direct evidence that these are being driven by confounding factors such as eye-movements or fitting artefacts, more work is needed to understand the underlying processes that give rise to these shifts.”

      The following text has been added to Supplement 2

      “As expected, achromats showed significantly higher fixation instability compared to controls (as reported in the main text). We found no significant correlation between fixation instability and coverage, pRF size, or eccentricity in achromats. Results of Spearman R correlations in both rod- and non-selective conditions are reported in the figure. We note that the relationship between nystagmus and visual sampling is complex: patients experience a stable image and may sample only during specific eye-movement phases. It is therefore not fully clear if and how nystagmus should give rise to altered pRFs.”

      The field connectivity analysis similarly seems to be used only on task data from the same design; if it was replicated from resting-state data, that would be a good way to show consistency which is independent of measures requiring fixation. 

      We agree that resting-state data would be valuable; however, we did not collect such data in these individuals due to time limitations. Instead, we demonstrate the consistency and reliability of our results by replicating our findings across two different stimulation conditions (rod-selective and non-selective), which differ in luminance, contrast and signal amplitude in both groups and for controls also in the photoreceptors involved. The convergence of results across these distinct visual conditions strengthens our confidence in the reliability of the observed effects. Also, notably, CF estimates have been shown to be robust to large eye movements, and therefore also to differences in fixation stability across groups (Tangtartharakul et al., 2023).

      The authors may want to contextualize their findings in relation to what reorganization exists in cases of late-onset loss of part of the visual field on one hand (stroke recovery), and in the case of complete blindness from early life on the other, as both speak to different levels of plasticity the visual system is capable of.

      We thank the reviewer for their comment and have added a new paragraph discussing this topic.

      Discussion:

      “Our findings on hierarchical adaptation have broader implications for other visual disorders, depending on their timing and nature. For instance, a central scotoma acquired in adulthood, as in macular degeneration, may not trigger the same V3 sampling shifts (Haak et al., 2016), suggesting a sensitive window for this form of plasticity, after which connective fields remain more stable. This also raises questions about congenital blindness, where the absence of any driving input could lead to weakening or repurposing of hierarchical connections (Saccone et al., 2024). Moreover, principles may differ between a deprived but structurally intact cortex, as in retinal dystrophies, and a physically damaged cortex, as in stroke. In the latter, more extensive reorganisation may be required to sample effectively from surviving, and potentially disparate, regions of V1. Perceptual training effects in stroke rehabilitation may reflect such dynamics (Cavanaugh et al., 2025; Elshout et al., 2021).”

      A more minor point: Can the authors clarify what the dark adaptation is used for, and provide the supplementary analysis showing that the duration difference for some of the participants didn't impact the results (stated but not shown).

      The dark adaptation period before the rod-selective condition allowed rod photoreceptors to recover from bleaching caused by prior mesopic light exposure, ensuring optimal rod sensitivity under scotopic conditions. To verify that our 15-minute adaptation period was sufficient, we tested 10 control participants with an extended 45-minute adaptation period. As we found no differences in the resulting rod maps between standard and extended adaptation protocols, these participants were combined with the main control group for all analyses. Author response image 5 shows the plots for the two dark adaptation periods.

      Author response image 5.

    1. On 2020-05-01 09:48:59, user Kasper Kepp wrote:

      This paper on the state-of-the-art Danish blood-donor data finds an IFR = 0.08% for people between 0-69 years of age. The study is very important because the sampling bias from case fatality ratios (the iceberg effect of knowing almost all deaths but only the most symptomatic cases, i.e. missing the dark number) is largely removed.

      By interpolation, the Danish population now has approximately 1.6% infection, corresponding to 100,000 people out of 6 million. The dark number stands at 12-fold the known cases (7-18).

      Some minor sampling biases remain (people who are blood donors need to be healthy and may be socioeconomically skewed) but considering the wide blood donor representativeness in Denmark, I think all Danish researchers will agree that sampling bias must be small.

      The IFR is also fully in line with the most representative data we have from Iceland (14% of population tested, 48000 tests), where the sampling bias is essentially eliminated, which stands at approximately 0.56% (10 deaths / 1799 cases as of May 1) and includes all the high-risk individuals >70 years. https://www.worldometers.in...
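
      As a quick check of the arithmetic quoted above, here is a small sketch using only the figures given in this comment (10 deaths, 1799 cases, 1.6% of roughly 6 million, a 12-fold dark number). The variable names are illustrative, and the implied number of known cases is derived from those figures rather than taken from any source.

```python
def ratio_percent(numerator, denominator):
    """Express a ratio as a percentage."""
    return 100.0 * numerator / denominator


# Iceland: deaths divided by confirmed cases, as quoted in the comment.
iceland_fatality_pct = ratio_percent(10, 1799)

# Denmark: 1.6% of a population of about 6 million.
denmark_infected = 0.016 * 6_000_000

# A 12-fold dark number implies roughly this many known cases (derived, not sourced).
implied_known_cases = denmark_infected / 12

print(f"Iceland deaths/cases: {iceland_fatality_pct:.2f}%")
print(f"Denmark estimated infections: {denmark_infected:,.0f}")
print(f"Implied known Danish cases: {implied_known_cases:,.0f}")
```

      The Danish estimate comes out at about 96,000, which the comment rounds to 100,000.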

      Compared to the Santa Clara study, which carried potential major sampling bias, this issue seems to be now largely removed. Consensus in Denmark is now emerging that the overall whole-population crude mortality of covid-19 is of the order of 0.25-0.6%, in excellent agreement with the Iceland data.

      These two countries have not had their health care systems strained, making them the relevant data also for this reason for pinpointing the "real" mortality of covid-19 absent overmortality by capacity exhaustion as seen in some other countries.

      Obviously, the fact that the IFR is 0.08% for the 0-69 year old has enormous implications for political decision making in Scandinavia, as it evidences that most of the population can build immunity at much reduced mortality than previously assumed.

    1. On 2020-04-22 16:02:39, user Texas Longhorns wrote:

      The research paper does not indicate how many of those that participated had already been tested for Covid and what those test results were.

      If they over sampled people that had already tested positive and recovered of course you will get a higher rate of positive antibodies. That would not be indicative of the general population.

      There is also the problem of false positives because the test can trigger for the common cold that is also a coronavirus.

      I don't think this research passes muster as any reliable indication of antibodies in the general population and should absolutely not be used as a basis to reopen businesses and large public gatherings.

      Having antibodies to one strain of the virus may not give you any immunity to the more than 8 strains of Covid we know are out there.

      Even if the test results are accurate at 2% that is nothing and you need at least 60% solid immunity to consider any large population to have herd immunity protection.

    1. On 2020-04-10 13:51:26, user steve rubin wrote:

      Does anyone know peak hospitalization during the 2017-2018 flu season? From the cdc summary there were 808,000 total hospitalizations with 61,000 deaths and I remember the flu season was pretty long. I wonder how the curves for new cases, deaths, hospitalizations and icu use looked. I remember stories that the hospitals were crowded but I don't remember stories about people dying because there weren't beds or ventilators.

      I know that it's unpopular to compare coronavirus to the flu. Underestimating a threat is dangerous and could (and maybe did) lead to delay in ramping up testing and beds and ventilators and other necessary medical resources.

      When people were predicting 2,000,000 deaths in the US and then 200,000 deaths I could understand the fears. But now they're predicting 60,000 deaths and it may end up half of that, so I think it's reasonable to make the comparisons.

      Comparing situations to past situations is usually our best way to understand how to react. In terms of how contagious and how lethal this epidemic/pandemic is, it now seems that it and the flu are similarly contagious and that covid is much less lethal. The big difference is that we have a pretty big number of people with significant immunity to the flu while it's likely there was little immunity in our population to covid-19. If we end up with 200,000,000 people becoming infected but with only 60,000 deaths, then covid was a fifth as lethal as the flu for 2017-2018 with 3 in 10,000 infections dying vs 14 in 10,000 infections dying from the flu.
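      The per-infection comparison above can be checked with a short sketch. It uses only the figures quoted in the comment (60,000 deaths, 200,000,000 infections, and 14 flu deaths per 10,000 infections); the variable names are illustrative, and the inputs are the commenter's scenario, not measured values.

```python
# Sketch of the lethality comparison, using only numbers quoted in the comment.
covid_deaths = 60_000            # commenter's hypothetical final death count
covid_infections = 200_000_000   # commenter's hypothetical final infection count
flu_deaths_per_10k = 14          # commenter's figure for the 2017-2018 flu season

# Deaths per 10,000 infections under the commenter's scenario.
covid_deaths_per_10k = covid_deaths / covid_infections * 10_000

# Ratio of the two rates ("a fifth as lethal" corresponds to roughly 0.2).
lethality_ratio = covid_deaths_per_10k / flu_deaths_per_10k

print(f"covid: {covid_deaths_per_10k:.1f} deaths per 10,000 infections")
print(f"ratio to flu: {lethality_ratio:.2f}")
```

      The ratio comes out slightly above one fifth, which matches the rounded wording in the comment.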

      However the comparison to the flu can lead to some counter-arguments. For example, the cdc uses a multiplier of ~80 for estimated current flu infections vs confirmed flu infections. Applying that to covid-19 means that we have ~500,000 confirmed cases so we would have had 40,000,000 total infections leaving another 160,000,000 to go assuming 60% of the population for herd immunity. Projecting deaths would mean 17,000 + 68,000 for a total of 85,000. We'll soon know what that multiplier is for covid-19 because there are a number of antibody surveys going on in the US and internationally. You can bet that the same thing will happen for the flu next year and we'll have a more accurate estimate of infections and lethality for the flu rather than the current guesstimates.
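      The multiplier argument above can be reproduced step by step. The sketch below assumes, as the comment implies, that the 17,000 figure is deaths to date and that 200,000,000 infections corresponds to the 60% herd-immunity target; the variable names are illustrative.

```python
# Sketch of the multiplier-based projection, using the comment's own inputs.
confirmed_cases = 500_000
infection_multiplier = 80                # commenter's flu-style multiplier
deaths_so_far = 17_000                   # read from "17,000 + 68,000" in the comment
herd_immunity_infections = 200_000_000   # commenter's 60% target

# Estimated total infections to date.
estimated_infections = confirmed_cases * infection_multiplier

# Deaths per infection implied by the estimate.
deaths_per_infection = deaths_so_far / estimated_infections

# Infections still to come before the herd-immunity target is reached.
remaining_infections = herd_immunity_infections - estimated_infections

# Apply the same per-infection death rate to the remaining infections.
additional_deaths = remaining_infections * deaths_per_infection
total_deaths = deaths_so_far + additional_deaths

print(f"estimated infections: {estimated_infections:,}")
print(f"remaining infections: {remaining_infections:,}")
print(f"projected total deaths: {total_deaths:,.0f}")
```

      The projection scales linearly with the assumed multiplier, so a smaller multiplier for covid-19 would raise the projected total.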

      A big question is how social distancing will have affected the final number of infections and deaths. It seems so logical that social distancing will curb infections and deaths, but many suggest that it may end up only prolonging the length of the pandemic while not making a significant difference in total final infections and total final deaths. The antibody tests may give us the answer to that as well.

    1. On 2020-05-20 17:49:28, user Christopher Leffler wrote:

      Bottom line, how many people does Dr. Ioannidis think will die in the US from this epidemic? If one reads the paper, he proposes that " even under congested circumstances, like cruise ships, aircraft carriers or homeless shelter, the proportion of people infected does not get to exceed 20-45%."<br /> Also, he believes that the infection fatality ratio is: " Infection fatality rates ranged from 0.03% to 0.50% and corrected values ranged from 0.02% to 0.40%."<br /> So, these numbers would give estimates for the United States of:<br /> Low end: 331,000,000 people * 0.2 * 0.0002 = 13,240.<br /> High end: 331,000,000 people * 0.45 * 0.004 = 595,800.<br /> The range is so wide as to provide no useful information. And of course, the pandemic is already at 92,387 deaths in the US, as of May 20, 2020. So we know Ioannidis's low end is simply wrong.<br /> We have looked at the mortality in different age groups in New York, among residents and transit workers, and on the Diamond Princess:<br /> https://www.medrxiv.org/con...<br /> Quite early in the pandemic (early April), we showed that if the US followed the course that Italy and Spain had already experienced, we would see 100,000 dead in the US:<br /> https://www.researchgate.ne...<br /> More recently, we showed that if the mortality rates seen in New York MTA / New York State / Diamond Princess were observed nationally, the mortality could be over 600,000, which is the high end for Ioannidis's work also:<br /> https://www.researchgate.ne...<br /> So, the bottom line is that the high end projections from all groups could be quite high indeed. So we will need to be vigilant: wearing masks, protecting the vulnerable, etc. The pandemic is real. To say that it is similar to a typical flu is just plain false. Even Ioannidis's own projections do not rule out that this is far worse than the flu. When is the last year the flu killed 92,000 Americans and was on track to kill potentially hundreds of thousands more?
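      The low-end and high-end estimates above follow from one multiplication, sketched here. The population, infection fractions, and fatality rates are the ones quoted in the comment; the function name `projected_deaths` is illustrative.

```python
# Sketch: projected deaths = population * fraction infected * infection fatality rate.

def projected_deaths(population: int, fraction_infected: float, ifr: float) -> float:
    """Expected deaths for a given attack rate and infection fatality rate."""
    return population * fraction_infected * ifr

US_POPULATION = 331_000_000  # figure used in the comment

low_end = projected_deaths(US_POPULATION, 0.20, 0.0002)   # 20% infected, IFR 0.02%
high_end = projected_deaths(US_POPULATION, 0.45, 0.004)   # 45% infected, IFR 0.40%

print(f"low end:  {low_end:,.0f}")
print(f"high end: {high_end:,.0f}")
print(f"high/low: {high_end / low_end:.0f}x")
```

      The last line shows how wide the range is: the two ends differ by a factor of 45.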

    1. On 2020-10-07 12:44:20, user Iratxe Puebla wrote:

      Review completed as part of ASAPbio’s #PreprintReviewChallenge

      The study examines the incidence of heart disease deaths in the early pandemic period in the US (30 March to 26 April) in areas without large COVID-19 outbreaks. The authors sought to study whether a decline in acute myocardial infarction (AMI) admissions was linked to either a higher mortality rate (which would suggest avoidance of care seeking), or lower mortality (which may suggest fewer triggers for AMI). The authors use data from the CDC’s National Center for Health Statistics and apply inclusion criteria requiring >97% completeness for the data.

      The study includes data from a reliable source and includes controls involving a comparison to incidence of heart disease deaths in the same period in 2019 and 4 weeks earlier in 2020. While the study is observational and can only point to trends and not explain the reported decrease in incidence of heart disease death in several states during the study period, it helps surface this trend and opens lines for further research to evaluate whether the trend will sustain over a longer period and if so, look into the potential factors behind the trend. If the trend were to sustain over time and was found not to be associated with misclassification of death cause, it may provide avenues to identify factors that can reduce triggers for AMI.

      Minor comments<br /> - The authors indicate ‘The primary analysis captured 747,375,188 person-weeks for the early pandemic period and 101,620,248 person-weeks for the 2019 control period’ the number of person weeks for the control period is considerably lower, can the authors provide some context for this, and whether this may have any influence on the analysis?<br /> - The abstract indicates ‘The mean incidence rate (per 100,000 person-weeks) for heart disease in states without excess deaths during the early pandemic period was 3.95 (95% CI 3.83 to 4.06) versus 4.19 (95% CI 4.14 to 4.23) during the corresponding period in 2019’, the Results section reads ‘The mean incidence rate (per 100,000 person-weeks) for heart disease in states without excess deaths during the early pandemic period was 3.95 (95% CI 3.83 to 4.06) versus 4.35 (95% CI 4.23 to 4.48)’ it appears they need to be updated to match?

      Questions for the authors<br /> - Now that we have data from four additional months into the pandemic, are the authors planning an extension to the analysis?<br /> - For the states where an increase in the incidence of heart disease deaths was observed, the authors mention the possibility of harm due to avoidance of care, misclassification during a period of excess deaths and COVID-19 itself increasing cardiovascular deaths. Do the authors think that capacity at hospitals may have been a factor behind any increase in heart disease deaths? E.g. related to prioritization of COVID-19 admissions vs others.

    1. On 2020-10-26 17:59:08, user Meng-Ju Wu wrote:

      Hi! It is interesting to read the paper in discussion for EVs to differentiate ALS from healthy and diseased groups. And I want to share my thought on the study.

      I think the main contribution of the study is the purification of EVs with nickel-based isolation, which, compared to the conventional methods, makes the analysis of specific EV parameters highly sensitive and reliable. If the EVs reliably differentiate ALS patients from healthy and diseased groups, clinical assessment with the blood test will significantly shorten the diagnosis time for ALS, so that treatment may be started as early as possible. In addition, if biomarkers are available to detect ALS patients, it means that we can develop treatment specific to ALS using their unique properties. Patients can avoid the costly and lengthy process of ALS diagnosis.

      I have two questions considering the methods. First, why was the supernatant from human plasma diluted in filtered PBS once but the serum from mice required 10 times for dilution? Second, what was the temperature and humidity condition for the incubation of activated charged agarose beads in NBI? I think the time to use the obtained serum would be the limitation of this approach. The content of the EVs might be changed if the centrifuged plasma samples are not immediately used. Such compositional change may be subject to the storage condition and the degradation rate of each specific proteins. It may also vary among species. Therefore, a specific time period to analyze the plasma should be strictly regulated.

      In general, I think there are no major grammatical or spelling errors. However, the content may be modified to make it more logical and convincing to read. In the introduction, I think it is important to summarize how ALS is diagnosed clinically. If readers are informed that electrophysiologic diagnosis takes more time and effort, they would appreciate the value of a blood test to detect suspected ALS patients in the prodromal state. In the last paragraph of the introduction, it is not reasonable to mention that the study results suggest EVs are good biomarkers. That should be mentioned in the discussion or conclusion section. In the materials section, the time of patient inclusion is missing. In the animal model, the paper should mention why only female mice with SOD1G93A and male mice with TDP-43Q331K were studied. Also, the timing of the study of the two different genes, as well as the number of mice, raises concerns for interpreting the results. I want to suggest making a visual diagram of the machine learning technique. You did a great job in comparing ultracentrifugation and NBI using EV-like liposomes. As such, I want to suggest applying the same comparison to the animal model to test the reliability of using the NBI method alone in the paper. The results and the discussion are well-written and consistent with the tables and figures provided.

    1. On 2021-06-23 21:55:50, user David Wiseman PhD wrote:

      Summary:<br /> Regarding the continued and unnecessary confusion related to the Argoaic and Artuli comments.<br /> 1. These are in reality distractions from the central issue that the original NEJM paper remains uncorrected in NEJM as to shipping times. Although a secondary issue, also uncorrected is the "days" nomenclature that is the reason for confusion in the Argoaic and Artuli comments on this forum. Also uncorrected in the original paper are the exposure risk definitions, which we were informed were also incorrect. Together, these issues controvert the conclusions of the original study.<br /> 2. The incorrect nomenclature for "days" in the NEJM paper as well as in a follow up work (Clin Infect Dis, Nicol et al.) inflates the number of "elapsed time" days. This has not been corrected by the original authors. We on the other hand have corrected this by providing the correct information in our preprint.<br /> 3. Dr. Argoaic seems to have been given a wrong and earlier version (10/26) of the data which, although it contains a variable that is supposed to correct the above problem, does not. In fact one cannot come to any conclusion that there is a discrepancy based on this incorrect 10/26 version, unless one has some preconceived notion.<br /> 4. Other post hoc analyses reported in follow up works (including social media) by the original authors looking at time from last exposure, or using a pooled placebo group, although flawed for several reasons, when examined closely, nonetheless support our conclusions that early PEP prophylaxis with HCQ is associated with a reduction of C19.

      Detail:<br /> Any confusion about "days" would disappear once the original authors correct the NEJM June 2020 paper as well as a follow up letter in Dec 2020 Clin Infect Dis (see upper red graph in Nicol et al. pubmed.ncbi.nlm.nih.gov/33274360/). These errors inflate the "DAYS" by 1 day because the nomenclature for describing "days" was incorrect. As far as we know those corrections have not been made in the journals where these errors appear and in a way that can be retrieved in pubmed etc.

      As far as we can tell, anyone who has cited the NEJM paper (NIH guidelines, NEJM editorial, many meta-analyses etc., our protocol in preprint version) also misunderstood the "days" to mean the inflated figure. So the authors need to correct this. As far as we know we are the only ones to do this. After we were informed of this error by the PI (who was unaware of the problem himself) we described this problem very clearly in our preprint, distinguishing between elapsed time and the day on which a study event occurred. For the benefit of those who remain confused, we will endeavor to make it even clearer in a future version. You can read our correspondence log referenced in the preprint to verify that the incorrect "days" nomenclature was unknown to the PI, at least until 10/27 when he informed us about it.

      You are confusing "DAY ON which an event occurred" with "DAYS FROM when an event occurred." For example the original NEJM Table 1 says "1 day, 2 days etc." for "Time from exposure to enrollment". This falsely inflates the number of elapsed time days by 1, and as the authors informed us (documented in our preprint), this really means DAY ON which enrollment occurred, with Day 1 = day of exposure, so you need to subtract 1 from the days to get elapsed time FROM exposure. The same error is repeated in Nicol et al. (note: we discuss other unrelated issues relating to time estimates in our preprint).
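      The distinction between "DAY ON" and "DAYS FROM" amounts to an off-by-one conversion, sketched below. The function name `elapsed_days_from_exposure` is illustrative and is not a variable from the study dataset; the only rule encoded is the one stated above, that Day 1 is the day of exposure.

```python
# Sketch: convert a numbered study day ("DAY ON", with Day 1 = day of exposure)
# into elapsed time ("DAYS FROM" exposure) by subtracting 1.

def elapsed_days_from_exposure(day_number: int) -> int:
    """Return elapsed days since exposure for a 1-based numbered day."""
    if day_number < 1:
        raise ValueError("numbered days start at 1 (the day of exposure)")
    return day_number - 1

# Numbered days 1 to 4 and the elapsed time each one represents.
for day_number in range(1, 5):
    elapsed = elapsed_days_from_exposure(day_number)
    print(f"numbered day {day_number} -> {elapsed} day(s) elapsed")
```

      Reading a numbered day as elapsed time therefore overstates the delay by exactly one day for every participant.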

      To confuse matters further, the problem is not even corrected in the dataset linked (datestamp 10/26/20) in the Argoaic comment. In column FS there is a variable "exposure_days_to_drugstart." This appears to indicate elapsed time (ie DAYS FROM) when it actually means the "DAY ON" nomenclature. We were only informed of the nomenclature error on 10/27/20 and later provided with a new version of the dataset on 10/30 where an additional variable "Exposure_to_DrugStart" (column GR) was provided that corrects this error by subtracting 1 from all the values.

      Why the Argoaic comment does not link to the correct 10/30 version is unclear, but in this incorrect 10/26 version, the values for the new variable "Exposure_to_DrugStart" (column GR) are IDENTICAL to those in the "exposure_days_to_drugstart" (column FS) variable (they should be smaller by 1). Accordingly, unless Drs. Argoaic and Artuli had a preconceived notion (without checking the data) that some alteration had occurred, it is impossible to draw such a conclusion (albeit one that is incorrect for other reasons) from this incorrect 10/26 dataset. A number of colleagues have downloaded the 10/26 dataset from the link provided in the Argoaic comment, and have verified this problem.

      So in addition to the original data set released in August 2020, as well as the three revisions (9/9, 10/6 and 10/30) we describe in our preprint there is this incorrect 10/26 version. I don't know how many people this affects but it would be appropriate for them to be notified that the version they have may be an incorrect one. An announcement on the dataset signup page covidpep.umn.edu/data would also be in order (nothing there today).

      Regarding the possibly higher placebo rate of C19 on numbered day 4 (18.9%). This is matched by a commensurate change in its respective treatment arm, yielding RR=0.624 similar to that for numbered days 2 (0.578) and 3 (0.624), justifying pooling. We don't know if the 18.9% represents normal variation or has biological meaning.

      Although they used enrollment time data (completely irrelevant to considering whether or not early prophylaxis is beneficial), the original authors (Nicol et al.) in a post hoc analysis, used a pooled placebo cohort to compare daily event rates (red bar graph). This would mitigate possible effects of an outlying value in the placebo cohort. We applied this same pooled placebo method to the data that correctly takes into account shipping times. This method is still limited because it may obscure a poorly understood relationship between time and development of Covid-19. Although at best this would be considered a sensitivity analysis, we did it to answer the Artuli question. This approach yields the same trends as our primary analysis. Using 1-3 days elapsed time of intervention lag (numbered days 2-4) for Early prophylaxis, there is a 33% reduction trend in Covid-19 associated with HCQ (RR 0.67 p=0.12). Taking only 1-2 days elapsed time intervention lag, we obtain a 43% reduction trend (RR 0.57 p=0.09). This analysis appears to reveal a strong regression line (p=0.033) of Covid-19 reduction and intervention lag.

      We also looked at the post hoc analysis provided by the original authors (Nicol et al.) that used “Days from Last Exposure to Study Drug Start,” a variable not previously described in the publication, protocol or dataset, so we have no way of verifying it from the raw data. As in a similar PEP study (Barnabas et al. Ann Int Med) this variable has limited (or no) value, as we are trying to treat as quickly as possible from highest risk exposure, not an event (ie Last Exposure) that occurs at an undefined time later. (Even the use of highest risk exposure has some limitation, which the authors pointed out to us and which we discuss in our preprint.) Further, the Nicol analysis used a modified ITT cohort, rather than the originally reported ITT cohort. With these limitations, pooling data for days 1-3 and comparing with the pooled placebo cohort yields a trend reduction in C19 associated with HCQ (it is unclear which "days" nomenclature is used) after last exposure from 15.2% to 11.2% (RR 0.74, p=0.179).
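      The relative risk and percentage reductions quoted above are related by simple arithmetic, sketched here using only the event rates and RR values reported in this comment. The function names are illustrative, and the sketch does not attempt to reproduce the p-values, which depend on cohort sizes not given here.

```python
# Sketch: relative risk (RR) from two event rates, and the reduction it implies.

def relative_risk(rate_treated: float, rate_control: float) -> float:
    """RR = event rate in the treated arm / event rate in the control arm."""
    return rate_treated / rate_control

def relative_reduction(rr: float) -> float:
    """Relative reduction in events implied by a given RR."""
    return 1.0 - rr

# Event rates quoted for the "last exposure" analysis (percent of participants).
rr_last_exposure = relative_risk(11.2, 15.2)
print(f"RR from 15.2% -> 11.2%: {rr_last_exposure:.2f}")

# RR values quoted for the pooled-placebo analysis.
for rr in (0.67, 0.57):
    print(f"RR {rr:.2f} -> {relative_reduction(rr):.0%} reduction")
```

      The computed RR for the quoted rates rounds to 0.74, and the two pooled-placebo RR values correspond to the 33% and 43% reductions stated in the comment.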

      Taken together, these "sensitivity" analyses inspired by the original authors' methodology suggest that this is not an artifact of subgroup analysis. It could be said that any conclusions made by the sort of analyses conducted by Nicol are equally prone to the "subgroup artifact" problem. (Also note that in our paper, the demographics for placebo and treatment arms in the early cohort match well.)

      Mention has been made elsewhere of two other PEP studies (Mitja, Barnabas) which concluded no effect of HCQ. It is important to note that the doses used in these studies were much lower than those used in the Boulware et al. NEJM study. Further, according to the PK modelling of the Boulware group (Al-Kofahi et al.) these doses would not have been expected to be efficacious (the Barnabas study used no substantial loading dose). So citing the Mitja and Barnabas studies to support claims of HCQ inefficacy in the Boulware et al paper is unjustified. On the contrary, taken together three studies suggest a dose-response effect. We discuss this in detail in our preprint.

      Lastly it is important to note that since the original NEJM study was terminated early, the entire original analysis can be thought of as a subgroup analysis, with all of the attendant problems referenced by the original authors (and us). There is certainly a great deal of under powering and propensity to Type 2 errors, among the issues inherent in a pragmatic study design. The study was not powered as an equivalence study and so no definitive statement can be made that the HCQ is not efficacious. Along with the still uncorrected (in the original journal) issues of shipping times, "days" nomenclature and exposure risk definitions, there are certainly many efficacy signals that oppugn the original study conclusions, and controvert the statement made in a UMN press release (covidpep.umn.edu/updates) that the study provided a "conclusive" answer as to the efficacy of HCQ.

      _________________<br /> Please note that despite our offer to Dr. Argoaic to contact us directly to walk through the data to try to identify any issues, we have not been contacted. That offer is still extended to anyone who remains confused. We have also attempted to locate both Drs. Argoaic and Artuli to try to clear up their confusion, but these names do not exist in the mainstream literature (i.e. pubmed, medrxiv), nor do they appear to have any kind of internet footprint.

      With regard to Table 1 of our preprint, the reason why there are no patients for “Day 1” is that there were no patients who received drug the same day as their high-risk exposure. This is consistent with the PI's comment on 8/25/20 (p10 of email log) (at a time when he thought that there was a “Day zero”) “Exposure time was a calculated variable based date of screening survey vs. data of high risk exposure. Same day would be zero. (Based on test turnaround time, I don’t think anyone was zero days).”

      We notice an obvious typo in the heading for the second column of our Table 1, which says “To”. But it should say “nPos”, to match the 5th column (and other tables). It is patently absurd that there should be a category of “1 to 0” days or “7 to 5” days etc. “From” makes no sense either and these typos have absolutely no effect on the analysis, interpretation or conclusions. This will be corrected in a later version.

    1. On 2025-11-30 17:00:32, user Cyril Burke wrote:

      RESPONSE TO REVIEWER #2<br /> June 27, 2022<br /> Reviewer #2: Thank-you for the opportunity to review this work which highlights the importance of monitoring serum creatinine over time and how this can be a useful tool in detecting possible CKD. This is an important topic as the use of sCr on its own is certainly under-utilized and changes are often missed because they don’t fall into a predefined category.<br /> Thank you for considering our manuscript and for your detailed comments.

      MAJOR CONCERNS

      A. “Choi- rates of ESRD in Black and White Veterans” doesn’t fit with the rest of the paper including the title; the introduction and conclusion also don’t adequately address this portion of the paper. It feels disjointed from the main point of discussion which is the use of sCr in screening “pre-CKD”. This section and discussion should be removed and possibly considered for another type of publication.<br /> We have attempted to clarify this inclusion. This manuscript could be divided into three or four short papers, increasing the likelihood that any one of them would be read. However, different groups tend to read papers about screening for kidney impairment, racial disparities, cofactors in modeling physiologic parameters, or policy proposals to encourage best practices. Despite the appeal of perhaps three or four publications, we decided to tell a complete story in a single paper, but we are open to suggestions.

      Black Americans suffer three times the kidney failure of White Americans. Other minority groups also have excessive rates of kidney disease. However, analysis indicates that Veterans Administration interventions can bring that ratio close to one; similar interventions might also reduce to parity the risk for Hispanic, Asian, Native Americans, and others. Within-individual referencing should allow better monitoring of all patients and help to reveal the circumstances and novel kidney toxins that lead to progressive kidney decline. The ability to identify a healthy elderly cohort with essentially normal kidneys would help to calibrate expectations for all. Better modeling of GFR should help everyone, too.

      Over eight decades, anthropologists have had little scholarly success in diminishing the inappropriate use of ‘race’. Keeping these parts together may be no more successful, but we feel compelled to try.

      B. Cases 1 - 3, (lines 93 – 122): where are these cases from? There is no mention of ethics to publish these patient results, which appears to be a clear ethics violation. If so, these cases should be removed and patient consent and ethical approval obtained to publish them.<br /> The authors describe the reasons for not obtaining an ethics waiver for this secondary data analysis. Despite this, the relative ease of obtaining an ethics waiver for secondary data analysis usually means that this is done regardless.<br /> We take patient privacy seriously and have completely de-identified the Case data, as required by Privacy Act regulations. We understand that no authorization or waiver was necessary. We discussed the issues with an IRB representative, reviewed the relevant regulations, and confirmed no need for formal review of a secondary analysis of already publicly available IRB-approved data or of completely de-identified clinical data collected in the course of a treating relationship.

      IRBs have a critical role to play, but many (including ours) are overworked. We understand the impulse authors feel to gain IRB approval even when the regulations clearly do not require it. As we discuss in the revision, there is a more significant matter that IRBs could help to resolve if they have the resources to do so. For all of these reasons, and even though we, too, felt the urge to obtain IRB approval, we resisted adding “just a little more” to their work.

      C. The message of the article and data representation is unclear: do the authors wish to show that sCr is superior to eGFR in this “pre-CKD” stage, should both be used together? Do the authors wish to convey that a “creatinine blind range” does not exist? Or is the aim to demonstrate that continuous variables should not be interpreted in a categorical manner?<br /> Our interest is detection and prevention of progression of early kidney injury at GFRs above 60 mL/min – a range in which eGFR is especially unreliable. We have advanced the best argument we can to detect changes in sCr while kidney injury is still limited and perhaps reversible. If experience reveals that some avoidable exposure(s) begins the decline, then clinicians might alert patients and thereby reduce kidney disease. How best to use longitudinal sCr remains to be determined from experience. However, our message is that early changes in sCr can provide early warning of a decline in glomerular filtration. We are confident that clinicians can learn to separate other factors that may alter sCr, as we do for many other tests.

      MINOR CONCERNS<br /> ABSTRACT<br /> A. Vague. Doesn’t give a clear picture of the study<br /> We have tried to clarify the title and abstract and are open to further suggestions.

      INTRODUCTION<br /> B. 51 – 57: needs to state that these stats are from e.g. the US. The authors should consider adding international statistics to complement those from the US.<br /> We have updated the statistics on death rates from kidney disease to include US and global data.

      C. 68: reference KDIGO guidelines, state year<br /> We now reference the KDIGO 2012 guidelines.

      D. 75 – 77: is this reference of the New York Times the most appropriate?<br /> We have expanded this section with peer-reviewed, scholarly references. However, we found Hodge’s summary of the issue succinct and hence potentially more persuasive for some than decades of scholarly references that have had limited or no effect in the clinic.

      E. 82: within-individual variation not changes (this is repetition of the point made in lines 425 – 427, but should match the language)<br /> We have matched the language.

      F. 82 – 84: reference? If this is a question it should be presented as such<br /> We have attempted to clarify this statement.

      G. 84: “normal GFR above 60” = guidelines (including KDIGO) do not refer to 60 as normal GFR, 60 – 89 is mildly decreased. (see line 126)<br /> We agree and have corrected the language.

      H. 93: avoid the use of emotive words such as apparently (also in line 428)<br /> We wanted to emphasize appearance without proof and have made these changes.

      I. 94: “Not meeting KDIGO guidelines”: KDIGO 2.1.3 includes a drop in category (including those with GFR >90). This would appear to include some of the cases listed. Additionally, albuminuria should have been measured for case 2 and 3.<br /> We have clarified that cases may or may not fit KDIGO categories, though that question will frequently arise in evaluating sCr changes. Where available, we have added urine protein and/or albumin results to the Cases.

      J. 97: “progressive loss of nephrons equivalent to one kidney”: this is based on a single creatinine measurement.<br /> Since the original submission, we discovered for this Case (now Patient 3) early serum creatinine results and notes indicating a six-month period off thiazide diuretic. This data clarified the baseline and showed a remarkable effect of thiazide diuretic on sCr. We have added follow-up sCr results and details of thiazide use to the ASC chart.

      K. 93 – 122: Could any of these shifts be explained by changes in creatinine methodology or standardization of assays, especially over 15 – 20 years (major differences between assays existed before standardization and arguably still exist with certain methods).<br /> It would be useful to see a comparison between serial sCr and eGFR measurements on the same figure. There appears to be significant (possibly more pronounced) changes when eGFR is used. As line 87 mentions changes in eGFR may be as useful (and in some situations more useful) than changes in sCr alone.

      It would be helpful to have a chronology from each local laboratory with the date of every change in creatinine assay or standardization. However, any single shift draws attention but does not necessarily indicate significant change in glomerular filtration. After one or several incremental increases, over at least three months, the sCr pattern may meet the reference change value (RCV) that signals significant change. In the future, from age 20 or so, a patient’s medical record should retain the full range of the longitudinal sCr for true baseline comparison.

      As noted in the revised manuscript, Rule et al showed that there is measurable nephrosclerosis even in the youngest kidney donors, suggesting that some injuries (perhaps exposure to dietary toxins) may begin in childhood and that early preventive counseling may be worthwhile. Experience will show whether this can slow progression to CKD. As we note, quoting Delanaye, sCr accounts for virtually 100% of the variability in eGFR equations based on sCr (eGFRcr), and these equations add their own uncertainties, so no, we do not believe that eGFR is more useful than sCr when GFR is above 60 mL/min and possibly much lower as well.

      We have added eGFR results to the ASC charts (in blue), though availability was somewhat limited.
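      The reference change value (RCV) mentioned in the response to comment K can be illustrated with a minimal sketch. It assumes the commonly used formula RCV = √2 × Z × √(CVa² + CVi²). The function names and the analytical (CVa) and within-individual (CVi) coefficients of variation are hypothetical illustration values, not figures from the manuscript.

```python
import math


def reference_change_value(cv_analytical, cv_within, z=1.96):
    """Percent change between two results needed to exceed combined variation.

    cv_analytical and cv_within are coefficients of variation in percent.
    z=1.96 corresponds to a two-sided 95% probability.
    """
    return math.sqrt(2) * z * math.sqrt(cv_analytical**2 + cv_within**2)


def exceeds_rcv(previous, current, rcv_percent):
    """True if the percent change from previous to current exceeds the RCV."""
    change = abs(current - previous) / previous * 100
    return change > rcv_percent


# Hypothetical CVs (percent), chosen for illustration only.
rcv = reference_change_value(cv_analytical=3.0, cv_within=4.5)
print(round(rcv, 1))
print(exceeds_rcv(0.90, 1.00, rcv))  # smaller rise in sCr (mg/dL)
print(exceeds_rcv(0.90, 1.10, rcv))  # larger rise in sCr (mg/dL)
```

      With these assumed CVs, a rise from 0.90 to 1.00 mg/dL stays within expected variation, whereas a rise from 0.90 to 1.10 mg/dL does not. Real CVs depend on the assay and the population studied.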

      L. 127 – 142: should there be separate charts for males and females, the differences in creatinine between males and females needs to be discussed somewhere in the paper.

      We do not think there should be separate charts for men and women based on size. The role of sex in eGFR equations is mainly based on the presumption that the average woman has less muscle mass than the average man. Clinicians care for individuals, not averages, and this sweeping generalization that increases agreement of the average of a population introduces unacceptable inaccuracy to individual care. Within-individual comparison eliminates the need for assumptions on relative size or muscle mass. Major changes in an individual’s muscle mass will usually be evident to the clinician who can adjust for them.

      However, reports suggest significant influence of sex hormones on renal function, including effects of estrogen and estrogen receptors, such as reducing kidney fibrosis, increasing lupus nephritis, and increasing CKD after bilateral oophorectomy. The mechanisms of these effects, and how they might be incorporated into eGFR equations, are unclear, but the effort may benefit from a more individualized approach with focus on a measurand rather than matching population-based averages of a quantity value (calculated from measurands).

      M. Similarly, is this suitable for all ages?<br /> We think so. Another sweeping generalization based on age merely introduces another inaccuracy, which complicates the task of clinicians caring for individuals. Older persons have varying health, athleticism, muscle mass, dietary preferences, etc. Rule et al reported that biopsies of about 10% of older kidney donors had no nephrosclerosis. Within-individual comparison eliminates the need for assumptions on relative muscle mass or inevitable senescent decline in nephron number. We substitute the assumption that any change in an individual’s muscle mass will be evident and can be accounted for. A seemingly ubiquitous risk factor, or factors, which we may yet identify, starts injuring kidneys at a young age.

      N. 162 – 163: rephrase<br /> Done.

      METHODS<br /> O. 185 – 193: aim belongs in the introduction, can be adjusted to complement paragraph 178 – 182.<br /> Reorganized and rewritten.

      P. 196 – 205: reference sources

      References provided.

      Q. 224 – 247: not in keeping with the rest of the article or title and conclusion

      We have revised and restructured this section.

      RESULTS<br /> R. If eGFR is treated as a continuous variable does inverted sCr still have higher accuracy?<br /> We believe so. Serum creatinine is a measurand and reflects the total sum of physiologic processes, known and unknown. In contrast, eGFR equations yield a quantity value, calculated from a measurand and dependent on the assumptions and approximations incorporated by their authors. The eGFR equations are thus necessarily less accurate than the measurand they are derived from, in this case, sCr. In a hyperbolic relationship, as the independent variable drops below one and approaches zero, any inaccuracy in the independent variable is amplified in the dependent variable. Because direct use of sCr avoids this mathematical inversion, the data suggest that it is far more practical for pre-CKD.
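      The amplification described in the response to comment R can be illustrated with a toy reciprocal transform. This is a sketch only: `reciprocal_index` and the constant `k` are hypothetical and are not an actual eGFR equation. It shows how one fixed error in sCr produces different absolute shifts in the inverted value.

```python
def reciprocal_index(scr, k=100.0):
    """Toy hyperbolic transform y = k / sCr; NOT a real eGFR equation."""
    return k / scr


def shift_from_error(scr, error, k=100.0):
    """Absolute change in the reciprocal index caused by a fixed sCr error."""
    return abs(reciprocal_index(scr + error, k) - reciprocal_index(scr, k))


# The same hypothetical +0.1 mg/dL error applied at different sCr levels.
for scr in (0.6, 1.0, 2.0):
    print(scr, round(shift_from_error(scr, 0.1), 1))
```

      In this sketch the absolute shift is largest at the lowest sCr. To first order the relative error is preserved by a pure reciprocal, so the amplification shown here is in absolute units of the inverted value.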

      S. As mentioned, the section on ESRD in black and white veterans doesn’t fit in with the rest of the article.<br /> We have revised, reorganized, and rewritten. We also outlined our rationale above.

      DISCUSSION<br /> T. As mentioned, section 4.1 doesn’t fit in with the rest of the article. As the authors note the correlation between illiteracy and CKD is likely not causal.<br /> See above.

      U. 387: erroneous creatinine blind range. The data presented does not show this is erroneous there is still a relative blind range. A distinction must be made between a population level “blind range” and an individual patient’s serial results. The data and figure 4 in particular demonstrate the lack of predictive ability of sCr above 40ml/min compared to below 40ml/min at a population level. For an individual patient this “blind range” is more relative, and a change in sCr even within the normal range may be predictive. (Note: the terminology “blind range” is problematic).<br /> We agree. On reading closer, Shemesh et al call attention to “subtle changes” in serum creatinine even though they had access only to the uncompensated Jaffe assay, so their recommendation to monitor sCr is even more forceful, today, due to more accurate and standardized creatinine assays. We have attempted to clarify this in the manuscript.

      V. 399 – 400: “rose slowly at first and then more rapidly as mGFR decreased below 60” this refers to a relative blind range. Whether these slow initial changes can be distinguished from analytical and intra-individual variation is the question that needs to be answered before we can say a “blind-range” doesn’t exist for an individual patient.

      We appreciate this observation. We believe longitudinal sCr is worth adopting to gain insights into individual sCr patterns, which may reveal early changes in GFR, among other influences on sCr. This is a low-cost, potentially high-impact population health measure, and there seems little risk in trying it because many clinicians already use components of the process.

      W. 425 - 432: sCr is indeed very useful when baseline measurements are available. eGFR remains useful when baseline sCr is not available or when large intervals between measurements are found.<br /> As Delanaye et al noted, virtually 100% of the variability in longitudinal eGFR is due to sCr, so we understand that the errors in eGFR can be (and usually are) greater than but cannot be less than those in sCr.

      X. 425: low analytical variation- if enzymatic methods are used<br /> Lee et al suggest that even the compensated Jaffe method provides some accuracy and reproducibility, which may allow longitudinal tracking of sCr even where more modern assays are as yet unavailable.

      Y. 428: avoid the use of “apparently”<br /> Done.

      Z. 430: reference 56 compares sCr and sCysC with creatinine clearance NOT with mGFR, this does not prove that mGFR has greater physiologic variability. Creatinine clearance is known to be highly variable (partially due to two sources of variability in the measurements of creatinine: serum and urine).<br /> The creatinine clearance is another form of mGFR, and our understanding of it begins with the units: if the clearance or removal of creatinine were being measured, the units should be µmol/min, but they are mL/min. “Clearance” is an old concept coined by physiologists to describe many substances, such as urea, glucose, amino acids, and other metabolites. Since creatinine is mostly not reabsorbed and is only slightly secreted in the tubules, the “creatinine clearance” became a measure of GFR. The ratio of urine creatinine to serum creatinine is simply a factor for how much the original glomerular filtrate then gets concentrated (typically about 100-fold) by the kidney. Since the assumption is that the timed urine volume, multiplied by this concentration factor, equals the rate of glomerular filtrate production, the creatinine clearance is a measure of the GFR.

      Creatinine clearance has some inaccuracies based on tubular secretion, but also has some advantages: blood concentrations are essentially constant during urine collection, no need for exogenous administration, and reliable measurements in serum and urine. The methods that we often call mGFR also have problems, including unverifiable assumptions about distributions, dilutional effects, and others we cite in the text. None of these are direct measures of GFR. Due to changes in remaining nephrons, even true GFR itself is not strictly proportional to the lost number of functional nephrons, which seems the ultimate measure of CKD that Rule et al estimated from biopsy material.
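      The unit argument in the response to comment Z follows from the classic clearance formula, C = (U × V) / (P × t). A minimal sketch, with a hypothetical function name and illustrative values that are not patient data:

```python
def creatinine_clearance(urine_cr, serum_cr, urine_volume_ml, minutes):
    """Classic clearance C = (U x V) / (P x t), returned in mL/min.

    urine_cr and serum_cr must share the same concentration units,
    so their ratio is a dimensionless concentration factor.
    """
    concentration_factor = urine_cr / serum_cr
    urine_flow = urine_volume_ml / minutes  # mL/min
    return concentration_factor * urine_flow


# Hypothetical 24-hour collection: 1440 mL over 1440 min,
# urine creatinine 100 mg/dL, serum creatinine 1.0 mg/dL.
print(creatinine_clearance(100.0, 1.0, 1440.0, 1440.0))
```

      Because the creatinine concentrations cancel, only the urine flow contributes units, which is why the result is in mL/min rather than in an amount of creatinine per minute.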

      AA. The limitations of sCr for screening should also be discussed: differences in performance and acceptability between enzymatic and Jaffe methods (still widely used in certain parts of the world), the effect of standardizing creatinine assays (an important initiative but one that could also produce shifts in results around the time of standardization- see cases), low InIx means that once-off values are exceedingly difficult to interpret, is a single raised creatinine value predictive (or should there be evidence of chronicity): similarly are there effects from protein rich meals, etc (The influence of a cooked-meat meal on estimated glomerular filtration rate. Annals of Clinical Biochemistry. 2007;44(1):35-42. doi:10.1258/000456307779595995)<br /> We have added discussion of additional references on reproducibility of sCr assays and discuss dietary meat and, in Part Three, possible dietary kidney toxins.

      CONCLUSION<br /> BB. The discussion recommends using SCr above eGFR while the conclusion recommends the NKF-ASN eGFR for use in pre-CKD and ASC charts. While the use of both together in a complementary fashion is understandable- this needs to be congruent with the discussion, aims and results.<br /> We have rewritten this section. We would welcome any further recommendations.

      Cyril O. Burke III, MD, FACP

    2. On 2025-11-30 23:44:45, user Cyril Burke wrote:

      [Note: This is the second of several rounds of review of an earlier version of our combined manuscript, aiming to reduce ‘racial’ disparity in kidney disease. The comments were kindly offered by nephrologists, through a medical journal, and we remain grateful to them for the time and care they gave to improve our manuscript.

      We removed identifying features and included our responses at the end of this comment. The changing title and line numbers refer to earlier versions.]

      August 3, 2022<br /> Dear Dr. Burke III,

      REDACTED.

      Reviewer #1: Cyril O Burke III et al submit a revised version of their intriguing, unusual paper.

      Overall, the paper remains extremely lengthy (the total, including clean and track versions and reply to reviewers is close to 200 pages !!), whereas it contains relatively little original data.

      The authors speculate and comment a lot (and most of these speculations/comments will hardly be understandable by the expected audience, primary care physicians), and this will in addition distract the reader from the main key message (which is right in the opinion of this reviewer (see first round of review) and warrants more attention and studies.

      The race part is irrelevant for the key point (race does not change over time, and thus is not relevant when looking at longitudinal serum creatinine or eGFR) and should be deleted in the opinion of this reviewer. In this respect, I completely agree with the comment of reviewer 2 in the first round.

      I can not resist quoting here the reply of the authors to reviewer 2. “This manuscript could be divided into three or four short papers, increasing the likelihood that any one of them would be read. However, different groups tend to read papers about screening for kidney impairment, racial disparities, cofactors in modeling physiologic parameters, or policy proposals to encourage best practices. Despite the appeal of perhaps three or four publications, we decided to tell a complete story in a single paper, but we are open to suggestions.”

      My reply to their reply: nobody would read the current paper, even partially. Shorten, shorten, shorten please, and focus on the key message.

      Reviewer #2: Thank-you, once again, for the opportunity to review this lengthy “thesis-style” manuscript which discusses some important often over-looked topics. The under-use of serial creatinine measurements and over-reliance on often erroneous eGFR measurements is an important point which is easily missed by healthcare workers with potentially serious consequences. Likewise, the misuse of racial constructs in medicine (and elsewhere) is an important point.

      I am satisfied with this re-submission and the changes which have been made to the original manuscript.

      Minor points:<br /> 431: “creatinine inhibits several membrane transporters”. = Cimetidine

      502: “Because mGFRs have population variation as wide as sCr, with much greater physiologic variability compared to the relatively stable sCr and serum cystatin C”<br /> As mentioned previously the cited article compares the variability of sCr and cystatin C with CrCl, I agree with the authors that CrCl is a form of mGFR, however, probably one of the poorer forms and not what a reader will think of when mGFR is mentioned. In our current age of medicine when we talk about mGFR CrCl is seldom included, studies reviewing methods of mGFR will seldom include CrCl, however CrCl may be compared to one of the mGFR methods. Likewise, if a patient is sent for a mGFR, a CrCl will not be performed. In our current age of medicine mGFR refers to methods such as the clearance of iohexol, iothalamate, Cr-EDTA, inulin, DTPA, etc; the authors themselves mention this (line 539 – 540). I fully agree with the authors that mGFR is FAR from perfect and has many inaccuracies and imprecisions (which are often overlooked)- these are well published, some of which are cited in this manuscript. If the authors wish to use the current study as a source they should state the findings in a way that cannot be misinterpreted. For example: “CrCl has much greater physiologic variability than sCr and cystatin C …” – in this case the reader can determine for themselves whether they would use CrCl as a surrogate for mGFR. Alternatively, adjust the statement and use another source which has shown the variability that exists with what we currently refer to as mGFR method.

      670 – 719: As the authors specifically discuss age it would be prudent to briefly mention the short-comings, or considerations for interpretation, of serial creatinine measurements at a very young age which generally rise until late adolescence when steady muscle mass is achieved. Also note changes in creatinine and GFR from birth till 2 – 3 years.

      783 – 784: Consider re-wording, the grammar makes this sentence difficult to read

      959 – 968: Note, editing has not been accepted (tracked changes still shown)

      1116 - 1121: “Using the opioid crisis as an example…. in, for example, the opioid crisis” – same sentence

      RESPONSE TO REVIEWERS:<br /> September 17, 2022<br /> Longitudinal creatinine, not ‘race’, signals pre-chronic kidney disease and decline in glomerular filtration rate

      We again greatly appreciate the reviewers for offering detailed comments and guidance, which we have endeavored to incorporate as best we could.

      Comments to the Author<br /> Reviewer #1: Cyril O Burke III et al submit a revised version of their intriguing, unusual paper.<br /> 1. Overall, the paper remains extremely lengthy (the total, including clean and track versions and reply to reviewers is close to 200 pages !!), whereas it contains relatively little original data.<br /> The authors speculate and comment a lot (and most of these speculations/comments will hardly be understandable by the expected audience, primary care physicians), and this will in addition distract the reader from the main key message (which is right in the opinion of this reviewer (see first round of review) and warrants more attention and studies.<br /> The race part is irrelevant for the key point (race does not change over time, and thus is not relevant when looking at longitudinal serum creatinine or eGFR) and should be deleted in the opinion of this reviewer. In this respect, I completely agree with the comment of reviewer 2 in the first round.<br /> I can not resist quoting here the reply of the authors to reviewer 2.<br /> "This manuscript could be divided into three or four short papers, increasing the likelihood that any one of them would be read. However, different groups tend to read papers about screening for kidney impairment, racial disparities, cofactors in modeling physiologic parameters, or policy proposals to encourage best practices. Despite the appeal of perhaps three or four publications, we decided to tell a complete story in a single paper, but we are open to suggestions."<br /> My reply to their reply: nobody would read the current paper, even partially. Shorten, shorten, shorten please, and focus on the key message.<br /> We fundamentally agree and have worked to shorten the text; to clarify our understanding that ‘race’ may change with time, location, and self-identification; and to add a Table of Contents to make the Parts more accessible to interested readers. 
We comment a lot because, in highly racialized societies, like the US [1,2], it can be difficult to see beyond ‘race’ without explicit speculation about other possible explanations for difference, which, we understand, may or may not pan out under investigation. One hope is that all clinicians will pursue explanations other than ‘race’, but this seems unlikely. Busy medical researchers have little time to develop expertise outside their area of interest, which may explain why ‘Commentary’ and ‘Perspective’ articles have failed to inspire an ethical ban on the misuse of ‘race’ in medical research, journals, clinics, and elsewhere [3]. We do not know whether a suite of articles can meaningfully contribute to ending misuse of ‘race’, where so many scholarly articles have failed, but after perceiving little change over four decades, trying something completely different seemed (almost) rational.

      1. Nunez-Smith M, Curry LA, Bigby J, Berg D, Krumholz HM, Bradley EH. Impact of race on the professional lives of physicians of African descent. Ann Intern Med. 2007 Jan 2;146(1):45-51. doi: 10.7326/0003-4819-146-1-200701020-00008. PMID: 17200221.

      2. Betancourt JR, Reid AE. Black physicians' experience with race: should we be surprised? Ann Intern Med. 2007 Jan 2;146(1):68-9. doi: 10.7326/0003-4819-146-1-200701020-00013. PMID: 17200226.

      3. McFarling UL. Troubling podcast puts JAMA, the ‘voice of medicine,’ under fire for its mishandling of race. Stat News. 2021 April 6 [Cited 2022 August 31]. Available from: https://www.statnews.com/2021/04/06/podcast-puts-jama-under-fire-for-mishandling-of-race/ <br /> Reviewer #2: Thank-you, once again, for the opportunity to review this lengthy “thesis-style” manuscript which discusses some important often over-looked topics. The under-use of serial creatinine measurements and over-reliance on often erroneous eGFR measurements is an important point which is easily missed by healthcare workers with potentially serious consequences. Likewise, the misuse of racial constructs in medicine (and elsewhere) is an important point.<br /> Thank you for again giving time for helpful criticism and comments on our manuscript.

      A. I am satisfied with this re-submission and the changes which have been made to the original manuscript.<br /> Minor points:<br /> B. 431: “creatinine inhibits several membrane transporters”. = Cimetidine<br /> Corrected.

      C. 502: “Because mGFRs have population variation as wide as sCr, with much greater physiologic variability compared to the relatively stable sCr and serum cystatin C”<br /> As mentioned previously the cited article compares the variability of sCr and cystatin C with CrCl, I agree with the authors that CrCl is a form of mGFR, however, probably one of the poorer forms and not what a reader will think of when mGFR is mentioned. In our current age of medicine when we talk about mGFR CrCl is seldom included, studies reviewing methods of mGFR will seldom include CrCl, however CrCl may be compared to one of the mGFR methods. Likewise, if a patient is sent for a mGFR, a CrCl will not be performed. In our current age of medicine mGFR refers to methods such as the clearance of iohexol, iothalamate, Cr-EDTA, inulin, DTPA, etc; the authors themselves mention this (line 539 – 540). I fully agree with the authors that mGFR is FAR from perfect and has many inaccuracies and imprecisions (which are often overlooked)- these are well published, some of which are cited in this manuscript. If the authors wish to use the current study as a source they should state the findings in a way that cannot be misinterpreted. For example: “CrCl has much greater physiologic variability than sCr and cystatin C …” – in this case the reader can determine for themselves whether they would use CrCl as a surrogate for mGFR. Alternatively, adjust the statement and use another source which has shown the variability that exists with what we currently refer to as mGFR method.<br /> We appreciate this comment and have both added another reference and added to the text an argument for reconsidering creatinine clearance. Many hospitals and some countries lack the resources for advanced mGFR filtration markers, which are only used for research or for screening related to kidney transplants. 
However, most laboratories have the tools for ‘quick-creatinine clearance’ (quick-CrCl), which may be an acceptable alternative to the classic mGFRs. If confirmed, a simple and affordable quick-CrCl might give hospitals and laboratories worldwide an alternative measurement, requiring fewer assumptions, of another aspect of glomerular filtration.

      D. 670 – 719: As the authors specifically discuss age it would be prudent to briefly mention the short-comings, or considerations for interpretation, of serial creatinine measurements at a very young age which generally rise until late adolescence when steady muscle mass is achieved. Also note changes in creatinine and GFR from birth till 2 – 3 years.<br /> We have added a brief discussion of the diagnosis of CKD in infants, children, and adolescents.

      E. 783 – 784: Consider re-wording, the grammar makes this sentence difficult to read<br /> Done.

      F. 959 – 968: Note, editing has not been accepted (tracked changes still shown).<br /> Done.

      G. 1116 - 1121: “Using the opioid crisis as an example…. in, for example, the opioid crisis” – same sentence.<br /> Rewritten.

      We thank you.

    3. On 2025-11-30 16:56:07, user Cyril Burke wrote:

      RESPONSE TO REVIEWER #1

      June 27, 2022<br /> Re: Longitudinal changes in creatinine signal early decline in glomerular filtration rate without consideration of age, sex, ‘race’, and nationality

      We greatly appreciate that the reviewers were thorough, fair, and helpful in their comments.

      Comments to the Author

      Reviewer #1: Burke et al submit a somewhat unusual paper, devoted to a topic of potential major clinical relevance, and as yet understudied.

      General comments

      1. The thesis of the authors, that using the baseline serum creatinine of a given patient would potentially improve the earlier diagnosis of kidney disease, even in the normal range, is in line with the experience of this reviewer, who always retrieves, whatever the difficulty of reaching that goal, past results of blood tests, and uses them as a way to date the onset of kidney disease, sometimes with important prognostic implications.

      Your experience adds support to the literature suggesting that historical sCr levels provide a context for sCr changes. These benefits might encourage investments in digital data exchanges so that electronic health records (EHRs) can ease collection and presentation of sCr results from multiple commercial and hospital laboratories.

      2. Yet, the authors do not provide data strongly supporting their thesis. For instance, when looking at case 2 [now Patient 3], should the last point (the most recent one) be omitted, there would be very little evidence supporting progressive early kidney disease.

      We advocate prospective monitoring of longitudinal sCr as a proxy for glomerular filtration rate (GFR). The Cases were meant to show that charting the data and simple follow-up over several visits and months can allow general clinicians to differentiate CKD from other explanations for increased sCr. The four case histories represent patients in a non-nephrology medical practice with borderline eGFR that raised the possibility of CKD. In each of these cases, retrospective collection of sCr values suggested varied explanations for the elevated sCr, and we expect many cases will represent sCr influences other than CKD, not necessarily warranting nephrology referral. Armed with this tool and using it prospectively, physicians, nurse practitioners, and physician assistants (PCPs) might identify and manage the 90% of patients with currently unrecognized CKD.

      3. The claim that the statistics fit the data better when all points are used (page 9,11) should not come as a surprise. Using thresholds instead of the full range of values has long been known to be more powerful for statistical analysis. But fitting the data does not equal to a high positive predictive value!

      We agree that this is counterintuitive, so we thought this was an important point to discuss. Research methods that get translated into clinical settings rely on assumptions that are not always familiar to healthcare workers. Whatever the merits of thresholding conventions, understanding their mathematical underpinnings can inform a more nuanced interpretation of lab results. The revision includes our initial, intuitive assessment of the data and the interpretation of the residuals – from a mathematics perspective. Lack of awareness about residuals can easily lead to improper interpretation of thresholded lab data. The use of statistics is not intended to document superiority of fit but rather to demonstrate how simplifications with practical clinical value may gloss over clinically relevant information in some cases. The additional charts are meant to move the discussion away from abstract statistics and toward more intuitive clinical concerns. We favor early diagnosis of kidney injury through investigation of nonspecific changes in longitudinal sCr. This method seems usable and may be manageable by PCPs using a time frame of several visits over several months to separate out false positives, which may be influenced by chance attributable to the mathematical properties of lab data.

      4. A key question is whether in a real-world context, the earlier diagnosis of kidney disease would be possible, without too much background noise from intercurrent illness (functional), drugs (NSAIDS, etc.). In other words, would the specificity (or PPV) of the suspicion of early kidney disease be reasonable enough to catch the attention of clinicians?

      We think so. We believe longitudinal serum creatinine (sCr) will encourage dialogue between patients and clinicians, raising awareness of the importance of avoiding kidney injuries that often happen out of sight and out of mind until, for far too many, they culminate in urgent dialysis. Just as patients now ask for their blood pressure, we anticipate that they will track their own sCr and kidney risks. Decades after introduction of the mercury sphygmomanometer, PCPs learned how to manage blood pressure to improve health. We believe longitudinal sCr can soon be a widely used tool because the concepts are old, there is a broad literature supporting this approach, and the value can be enhanced by more frequent testing of sCr. This is what PCPs do – sort the random cough, costochondritis, or stress response from nascent pneumonia, angina, and hypertension. PCPs already worry about the kidneys. They may welcome a tool to accompany the chest radiograph, electrocardiogram, and sphygmomanometer.

      Of interest, the decision analysis by den Hartog et al found markedly more false-positive diagnoses of CKD with eGFR than with serum creatinine alone.

      5. Even though there has been improvement in the standardization of measurement of serum creatinine (IDMS), the comparability of results measured by different labs remains suboptimal, at least in the experience of this reviewer, and medical shopping is not uncommon, making the availability of all previous results in the same graph a logistical challenge.

      We share this concern, which laboratorians have wrestled with for many years and which will not be solved soon. However, we propose utilizing the maximum serum creatinine (sCr-max) to smooth the variability of these inputs (as well as the variability from patient diet and hydration). One laboratory will be the highest, and when patients use multiple laboratories, one laboratory may more often define the sCr-max. As patients learn the rationale for using the same lab, we believe most (not all) will voluntarily use one or perhaps two labs (as they mostly do when we repeat longitudinal MRI studies, for example). The sCr-max reduces the effect of variability between laboratories, allowing clinical insights even without future improvements in sCr assays.

      Australia, Canada, and the United Kingdom have stricter sCr analytical performance goals than the United States, which could improve its sCr comparability by matching their standards.
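      The sCr-max idea in the response above can be sketched as a running maximum. This is one possible reading only: the response does not define sCr-max precisely, so the sketch assumes it means the highest result recorded so far, and the function name and values are hypothetical.

```python
from itertools import accumulate


def running_scr_max(results):
    """Running maximum of chronologically ordered sCr results."""
    return list(accumulate(results, max))


# Hypothetical results (mg/dL) in date order, possibly from different labs.
scr = [0.82, 0.90, 0.85, 0.88, 0.97, 0.93, 1.05]
print(running_scr_max(scr))
```

      Under this assumption, downward fluctuations from diet, hydration, or a lower-reading laboratory leave the tracked value unchanged, and only a new high moves it.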

      Specific comments

      1. The authors should mention that the USPSTF decided a month ago to revisit the question of screening for kidney disease in high-risk groups (page …)

      One reference stated that this initiative has not been announced publicly but is “under active consideration” by USPSTF because “…for a screening to help people live longer, healthier lives, clinicians must be able to treat the condition once it is found. The existence of effective treatments is one of many important factors that the Task Force considers.” This perspective is surprising because it ignores the potential of effective prevention by avoiding NSAIDs, hypotension, dehydration, and nephrotoxic medical treatments (e.g., aminoglycosides). We, too, look forward to updated findings from USPSTF.

      2. Even though ESRD has a legal meaning in the USA, not very relevant to the topic of this paper about early kidney disease, the authors should stick to the nomenclature proposed by a recent KDIGO consensus conference (see Levey et al. Nature Reviews in Nephrology). In particular, use kidney failure instead of ESRD/ESKD. When the topic is glomerular filtration, use that wording instead of kidney function (page…)

      We have adopted this terminology and would welcome any further recommendations.

      3. The authors allude to the concepts of prediabetes and prehypertension. But this reviewer points to the fact that the levels used to define those entities are currently “generic”, rather than based on previous values in an individual subject. Please discuss.

      We understand that the normal population ranges for serum glucose and blood pressure are narrower, with less interindividual variation, so population reference ranges work well for monitoring diabetes mellitus and hypertension. Unfortunately, this is not true for serum creatinine, though within-individual reference of longitudinal sCr appears to facilitate diagnosis of pre-CKD.

      4. The authors repeatedly mention in the discussion section evidence that even small increases in serum creatinine have prognostic significance. This has indeed been known for decades but is a different topic: AKI. Admittedly, there is growing evidence that AKI and CKD are linked. But that the stability of a biological parameter is prognostically best is all except surprising: the same is true for body weight, mood, blood pressure etc.

      We agree that AKI and CKD appear to be merging and this may become clearer from more frequent sampling and charting of longitudinal sCr. What has been missing is graphical representation of the data to allow quick assessment for CKD in long-term trends, and this may soon be obtainable from EHRs and IT departments, which should end the practice of deleting historical data of value to longitudinal analysis.

      [See next comment for Response to Reviewer #2.]

    1. On 2021-12-25 08:38:40, user Eslam Maher wrote:

      The authors investigate whether Machine Learning (ML) algorithms fare better compared to traditional Cox models in big data. They selected Glioblastoma and gliosarcoma from SEER as the basis of their data set. There are two main points that are worth considering here, (1) statistical, and (2) clinical.

      (1) a- Glioblastomas are relatively rare diseases; readers therefore need to bear in mind that the hypothesis studied here may not be relevant to their own work, which is usually mono-institutional or multi-institutional. Unlike the huge SEER database, we never actually have such numbers at hand to analyze in survival models.

      There is no doubt that Cox would outperform ML models in smaller samples. ML is gaining popularity in the medical community that is hugely inflated and unnecessary.

      b- Unlike ML approaches, the performance of Cox models is heavily dependent on its assumptions. This includes the proportionality of hazards between levels of a given variable, an assumption the authors do not seem to have checked before running the model.

      Another issue is how the model was selected in the first place. The authors say they ran univariable Cox models to decide which variables would be used in the final model. It is unclear whether a "significant" variable is considered as such at 5% alpha. Regardless of the alpha level, automated stepwise methods are notorious; they are very popular among physicians, but not among professional statisticians and epidemiologists. Stepwise methods do not allow modelers to think about the model at hand. Plus, some causal variables may not be statistically significant, while some nuisance variables may be coincidentally significant due to high N. Automated regression using p-values is a bad idea because it also ignores multiplicity problems.
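
      The multiplicity point above can be illustrated with a minimal sketch. It assumes independent tests of variables that truly have no effect, which is a simplification and not a description of the authors' data; the function name is hypothetical.

      ```python
      def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
          """Chance of at least one false positive among n independent true-null tests."""
          return 1.0 - (1.0 - alpha) ** n_tests


      # Screening more candidate variables univariably raises the chance that
      # at least one nuisance variable looks "significant" by coincidence.
      for k in (1, 5, 10, 20):
          print(k, round(familywise_error_rate(k), 3))
      ```

      Under these assumptions, the chance of a spurious "significant" variable grows quickly with the number of variables screened, even before sample size is considered.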

      (2) a- 22.6% of the cases included had no surgery, how then were they diagnosed as glioblastomas if no tissue samples were available? It is unclear if surgeries comprised craniotomies and biopsies or the former alone.

      b- All glioblastomas and gliosarcomas are grade IV tumors, however, for some reason, grade is a variable included in the models with levels of grade I, II, III, and IV!

      c- Reference categories in the authors' models were selected alphabetically rather than clinically. For Site, there are 14 levels using ICD-O classifications. Such classifications are not meant for clinical correlations. For example, all lobar sites (frontal, parietal, occipital, etc.) are part of the cerebrum. There are only 2 cases available for cauda equina glioblastomas, which it is nonsensical to include as a separate level in the model (doing so further constrains the model's degrees of freedom while also resulting in unstable ratios).

      d- Finally, the median survival for glioblastoma patients as noted by the authors was eight months. Looking for model accuracy at 120 months is just insane.

      This would have been a neat paper had the authors run a proper Cox model rather than a straw man, and designed their study with a neuro-oncologist. Even then, please note that this preprint is concerned with the performance of these models IN BIG DATA only, so do not extrapolate to the data you are routinely working with.

    1. On 2020-03-20 20:57:29, user Sylvie Vullioud wrote:

      Could authors provide information to dissipate high risks of bias:

      1. Manuscript was first published on mediterranee-infection.com website, not on medRxiv. On the manuscript on the website on mediterranee-infection.com, I can read 'In Press 17 March 2020 – DOI : 10.1016/j.ijantimicag.2020.105949'. It means that the manuscript had already been accepted by International Journal of Antimicrobial Agents at the time when it was deposited on medRxiv on 20.03.2020.

      -> Pre-print on medRxiv is not a real pre-print to collect feed-back for manuscript improvement, as originally designed for. Moreover, medRxiv states: 'All preprints posted to medRxiv are accompanied by a prominent statement that the content has not been certified by peer review'.

      -> There is an obvious potential conflict of interest, because last author Raoult is editor of the article collection COVID-19 Therapeutic and Prevention in International Journal of Antimicrobial Agents.

      -> International Journal of Antimicrobial Agents is run by Elsevier, suggesting 'If accepted for publication, we encourage authors to link from the preprint to their formal publication via its Digital Object Identifier (DOI)'.

      1. Discussion on the controversy of main cited Chinese paper, ref 8 ?

      2. According to paper, allocation of patients group was random but treated group is 51.2 years average and control group 37.3 years?

      3. Article describes 3 conditions of patients: asymptomatic, low and high symptoms. Why?

      4. Care to patients, biological and physiological sampling and analyses, and statistical analyses were not blinded. Why?

      5. I think that no placebo was used. Why?

      6. Six patients out of a total of 42 were excluded from the study: three were transferred to the intensive care unit, one stopped because of nausea, one died, and one left the hospital. <br /> It is written: 'study results presented here are therefore those of 36 patients (20 hydroxychloroquine-treated patients and 16 control patients)'. Why were the dead, intensive care, and nausea patients not included in the statistical treatment? <br /> -> Might this be a selection bias? <br /> -> What about unwanted, very worrying effects of the treatment?

      7. 'The protocol, appendices and any other relevant documentation were submitted to the French National Agency for Drug Safety (ANSM) (2020-000890-25) and to the French Ethic Committee (CPP Ile de France) (20.02.28.99113) for reviewing and approved on 5th and 6th March, 2020, respectively'. Pre-print was posted on 20.03.2020. Time points on day 14 on patients.<br /> -> So recruitment and study started before approval of ANSM and French Ethic Committee? How is it possible?

      8. How is it plausible that numerous authors (18!) participated equally to the work? Is it possible to add their respective contributions?

      Thank you in advance for considering my questions. <br /> Regards, <br /> Sylvie Vullioud

    1. On 2021-09-04 19:09:42, user Ben Veal wrote:

      As a qualified statistician who's been doing this stuff for over 20 years, and has worked on several medical studies I think I ought to add my voice to the crowd.<br /> There may be a few things that aren't fully accounted for such as the false positive rate for PCR tests, or unbalanced populations due to deaths of highly vulnerable members of the pre-infected group, but they should not alter the conclusions much. As mentioned by others the false positive rate for PCR tests would have the effect of biasing the risk ratio downwards, not upwards, so we should expect the effect to be even stronger than reported.

      As for the potential drop-out issue due to deaths of highly vulnerable people among the pre-infected group; this would only be a problem if there are some unaccounted for cofactors causing that high vulnerability. If this is the case then we can approximately correct for the imbalance by estimating the number of deaths in the pre-infected group based on the known infected mortality rate. <br /> I have done that calculation (see link below), and get a lower bound estimate for the 95% confidence interval of [4.3,11.23] which is still significant.<br /> However, it could make a big difference to the risk of hospitalization (again assuming there are some important cofactors unaccounted for).<br /> https://www.facebook.com/ec...

      Another criticism I have read in these comments is that they should have used a conditional model (https://en.wikipedia.org/wiki/Conditional_logistic_regression) to account for the matching. Actually, a conditional model is used when there is unequal distribution of the treatment groups (pre-infected & vaccinated) within each stratum (age, gender, socio-economic status & geographic region), and you are unable to use covariates to control for this. But the matching that they did ensures that this isn't the case. Furthermore, they control for all but one of the strata (geographic region) with covariates.

      So, overall, I trust the main conclusion: natural immunity from pre-infection is better than vaccination, but not as good as natural immunity + vaccination.

      This does not mean governments should put a halt to their vaccination programs since that's obviously going to result in more deaths among the vulnerable, but perhaps it might be wise to reduce the vaccination rate among the less vulnerable people (i.e. young healthy people) so that they can build up natural immunity and be better prepared to fend off new variants from spreading through the population. In fact it ought to now be possible to estimate the optimal proportions of vaccinated & unvaccinated that would result in the lowest risk of contagion spread, given that we can expect to see this virus reappearing every year.

    2. On 2021-09-14 13:39:06, user Henri van Werkhoven wrote:

      Dear colleagues,

      With interest did we read this manuscript which fueled a lively discussion during our journal club of the department of infectious diseases epidemiology at the University Medical Center Utrecht. The authors address a relevant research question. If there is a substantial difference in the risk of SARS-CoV-2 infections between previously infected and vaccinated individuals – as suggested - this may have consequences for social distancing, testing recommendations, and for projections of the impact of vaccination on future COVID-19 trends. However, we have several concerns regarding generalizability, selection bias, information bias, and confounding that we would like to address. We focus our discussion on model 1: the comparison of the fully vaccinated non-infected group (group 1) to the infected non-vaccinated group (group 2).

      In regard to generalizability:<br /> - Due to the matching process, only 4% of the available data is used (i.e. for model 1 only 32430/736559) and as a consequence the study population is fairly younger (with expectedly less comorbidity) than the source population (i.e. vaccinated individuals, infected individuals). Therefore, the study population may not be representative of this source population which severely limits the external validity of results for all vaccinated/infected people.<br /> - Naturally, subjects who died due to previous SARS-CoV-2 infection were not included in the study. Yet, without information on morbidity and mortality and contribution to the spread of SARS-CoV-2 from the primary infection, the results of the study are not informative for the question whether people without previous SARS-CoV-2 infection should be vaccinated or await natural infection. <br /> - All three study groups – vaccinated or infected at baseline (28th of February) – were established upon future information (no infection, no additional vaccination after June 1, 2021), which severely limits the use of the results for today’s decision making.

      In regard to selection bias:<br /> - People with a SARS-CoV-2 infection between February 28, 2021 and June 1, 2021, or those who received a first (infected group) or third vaccine (vaccinated group) between February 28, 2021 and August 14, 2021 were excluded from this study. Thus the study population of group 2 consists of previously infected people that do not take the opportunity to receive a booster vaccine, which may well be the less vulnerable people with a lower baseline risk of getting infected/hospitalized. This would bias the estimate in favor of the infected group.<br /> - Similarly, though at a smaller scale, people who died from COVID were not included in the analysis. This decreases the vulnerability of the infected group for secondary infections and/or hospitalization. This too would bias the estimate in favor of the infected group.

      In regard to information bias:<br /> - A difference in willingness to test between the vaccinated and previously infected group can result in biased estimates. Vaccinated people may be more on guard in regard to COVID-19 symptoms (especially if they adhere less to regulations because they are vaccinated) and will be tested more frequently. This can bias the estimate, again in favor of the infected group. However, this form of bias should not have affected the outcome hospitalization due to COVID-19, for which differences had the same direction. Yet, the number of those endpoints was low, limiting statistical power.

      In regard to confounding:<br /> - The authors acknowledge absence of information about health behavior, such as social distancing and masking. If the vaccinated group would adhere less to these preventive measures due to a sense of safety, this would also bias the estimates in favor of the infected group.<br /> - A potential important aspect is the young average age (36 years) of the study population. As they were all fully vaccinated before February 28th, we thought that a large proportion may have been health care workers, who have a higher chance of exposure to SARS-CoV-2, and thus infection after vaccination. This would also bias the estimate in favor of the infected group.

      We have scrutinized the paper in search of the fatal flaw: the one major methodological limitation that could explain the extreme effect in favor of the infected group, as reported. We conclude that it is not there, as we don’t think that any one of the above biases can explain all of the effect. However, we did find several weaknesses that each have the potential to yield a modest bias, all in the same direction. Five modest biases may yield a large effect estimate. We therefore consider that the question of whether natural immunity provides better protection than full vaccination with Pfizer/BioNTech’s COVID vaccine remains unanswered.
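
      The arithmetic behind "five modest biases may yield a large effect estimate" can be sketched as follows. The bias factor of 1.3 is purely hypothetical, and the sketch assumes the biases act independently and multiplicatively on the ratio scale; none of these numbers come from the paper.

      ```python
      import math


      def combined_bias(factors):
          """Multiply per-source bias factors acting on a risk or odds ratio."""
          return math.prod(factors)


      # Hypothetical: five sources of bias, each inflating the ratio by 30%.
      modest_biases = [1.3] * 5
      print(round(combined_bias(modest_biases), 2))
      ```

      Under these assumptions, individually modest distortions compound into a severalfold inflation of the reported ratio.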

      The authors (Annemarijn de Boer, Valentijn Schweitzer, Marc Bonten and Henri van Werkhoven, all at University Medical Center Utrecht) acknowledge all other journal club participants for their time dedicated to discussing the paper.

    1. On 2021-12-13 22:59:33, user Just Because I can wrote:

      Greetings RI team from Utah! I must begin with niceties; "Go BRUNO"! My son graduated this past May 2021 from Brown. I am a speech and language pathologist with over 30 years of hospital, private and public school setting experiences. Over the past nine years, I have professionally focused on children ages 3-5 within the public preschool and private therapeutic settings. I service students and their parents with the most intensive and restrictive learning environments within our District due to cognitive, behavioral and communicative delays. I can't help but weigh in now, as I previously shared this article with my peers in August as I braced for the impact of the 2021 school year.

      Given your single assessment tool (I professionally do not profess strong decisions based on a single evaluative instrument, even one as widely accepted as the Mullen), I've found your results to be intriguing and, frankly, just as we anticipated.

      To compare to RI, our school district, closed schools for Remote Learning for only 3 mos. in the Spring of 2019 and returned to in person instruction with hybrid options in 2020. Of a caseload of 65 students, I had 3 that were online/virtual. In 2021, our District returned to essentially all in student learning.

      My informal observations this school year in Utah have been as follows:

      1. Increase in new referrals and eligible "older" 4+ year old children scoring remarkably delayed communication (Standard scores <50 given a typical range of 85-115) and no previous history of EI or preschool interventions. Our TIER 3, most restrictive preschool program has a marked influx of new referrals (e.g., total students in May was 24 and currently rises at 36 with 8 new referrals in Jan.)
      2. Many declined or rarely attended virtual Early Intervention supports, skipped medical wellness visits including dentistry during the pandemic.
      3. Increase in parent report of primary concerns with behavioral components.
      4. Given the current timeframe, we are NOT seeing marked progress with an influx in discharges (no longer eligible due to more typical standard scores). We are seeing progress and we have continued to see progress through the pandemic (which at times surprised me) but the levels of improvement are not as remarkable or typical as years past.
      5. Typical communication, fine/gross motor and even cognitive delays are still present but the comorbidity of exceptional delays in social/pragmatic and ultimately, behavioral skills combined make measured learning and ultimately IEP progress at a slower rate. Social/pragmatic delays are interfering with overall progress.
      6. Parent involvement, participation, enthusiasm and grit appear markedly depressed. Educational teams walk a fine line between empathy, compassion and expecting parents and care givers to step in and "do hard things" in difficult times. The teams are using external motivators such as pizza cards to motivate parents to attempt, complete and turn in 2x monthly parent based home practice pages.
      7. Increased rate of meeting attendance with Virtual options.

      Where do we go from here? I agree, measuring student outcomes is critical but supporting the parents (in any evidence based manner) is to me, a critical and crucial element. I thought the kids, once exposed to typical learning/situations and with repetition, our inflated numbers would flatten in a year and they would bounce back into typical ranges but it's the apathetic, tired, depressed parents that are lacking resilience and grit currently. I do think another component that would be most valuable and continues to need funding is Preschool for All (or most).

      Thank you to any cohort, parent, professional person interested in this dialogue, for reading my insights.

    1. On 2020-04-16 12:20:10, user Marlowe Fox wrote:

      The tests on the efficacy of HCQ are confounded by multiple variables, including comorbidities, symptom onset, prescription drugs (RAAS inhibitors appear to play a key role in viral intensity), and testosterone/estrogen level, to name only a few.

      Geneticists, epidemiologists, and other scientists have long used causal diagrams to clearly show variables that may potentially confound their results (1). The Wuhan study at the very least would need to account for the following:

      HCQ <— comorbidities —> recovery<br /> HCQ <— symptom onset —> recovery<br /> HCQ <— drug prescriptions —> recovery

      Adjusting for the confounding variable would essentially smooth out the flow of information between the treatment (HCQ) and the outcome (recovery), allowing for the inference of causal effects.
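
      A minimal sketch of what adjusting for one such confounder can look like, using stratification. All counts are hypothetical and chosen so that comorbidity influences both who gets treated and who recovers while the treatment itself does nothing; they are not taken from any HCQ study.

      ```python
      # Hypothetical counts per stratum:
      # (treated_n, treated_recovered, control_n, control_recovered)
      strata = {
          "no comorbidity": (80, 72, 20, 18),
          "comorbidity": (20, 10, 80, 40),
      }


      def crude_risk_difference(strata):
          """Recovery-rate difference ignoring the confounder (pooled counts)."""
          t_n = sum(s[0] for s in strata.values())
          t_r = sum(s[1] for s in strata.values())
          c_n = sum(s[2] for s in strata.values())
          c_r = sum(s[3] for s in strata.values())
          return t_r / t_n - c_r / c_n


      def standardized_risk_difference(strata):
          """Stratum-specific differences, weighted by each stratum's share of patients."""
          total = sum(s[0] + s[2] for s in strata.values())
          diff = 0.0
          for t_n, t_r, c_n, c_r in strata.values():
              weight = (t_n + c_n) / total
              diff += weight * (t_r / t_n - c_r / c_n)
          return diff


      print(crude_risk_difference(strata))
      print(standardized_risk_difference(strata))
      ```

      In this constructed example the pooled comparison suggests a benefit, while the within-stratum comparison shows none, which is the pattern the diagrams above warn about.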

      Assuming observable data is not available to adjust for confounding variables, a causal mechanism (mediator) could smooth out the flow of information from the treatment to the outcome (so long as the mediator is not influenced by a confounder).

      Luckily, multiple in vitro studies have been performed. One study posits that HCQ lowers endosomal pH which ultimately inhibits COVID from binding to ACE 2 and decreasing viral intensity (3).

      HCQ —> endosomal pH —>glycosylation of COVID cellular receptor —> ACE 2 binding —> viral intensity —> acute lung injury

      Another in-silico study posits that HCQ blocks specific protein sites on the host ACE2 cell, thereby thwarting its attempt to infect it and preventing the cytokine storm (over-reaction of the lymphatic system) that some posit is responsible for Acute Lung Injury (3). So here we have an entirely different causal mechanism:

      HCQ —> BRD-2 receptor sites —> cytokine storm —> acute lung injury

      Despite these problems, some believe that the p-values obviate the need to control for potentially lurking variables. However, they are subject to myriad influences, known as p-hacking. Whether it is the number of tests performed or the number of comparisons made, it increases the chance of finding a statistically significant p-value (4). Three professional statisticians co-authored a paper reviewing the validity of the Wuhan study (5). There were several issues with the data upon which the two significant p-values were based.

      I suppose there is also a pragmatic argument: The p-values, along with existing studies and reports, are sufficient enough evidence to offset any concern for lurking variables in these urgent times. In other words, how much evidence is sufficient to warrant large scale roll-out of a low-cost treatment that may have a beneficial effect, from saving individuals who would have otherwise died to curbing its spread?

      The consequences of large roll-out: manufacturing, scaling, distribution chains, and so forth could result in a tremendous diversion of resources. How many pharmaceutical manufacturers even have the capacity to roll out production of this magnitude? What if they all start scaling their labor to produce this particular drug. You can’t just put this genie back into the bottle. Not to mention the scientific energy/intellectual capital that would go to proving or disproving this proposed treatment. And why? Because scientific evidence demanded it? No because a tortured p-value and unpublished/unsubstantiated anecdotal evidence caught the attention of some in the media, and it has been over-popularized as a panacea. What about the risk that HCQ is not an effective treatment despite large investments in cash and resources that have been invested? Do you think the wheels of capitalism turn so easily? Investors will want a return and if that means continually touting an ineffective drug through spurious science, they will continue to do so. What about individuals taking HCQ as a prophylactic, believing themselves to be protected against COVID? Or COVID+ individuals taking HCQ and believing themselves to be cured? Or individuals who think: Well, if I get it—I’ll just take HCQ and be fine. This would increase the spread of COVID. From my perspective, the ignorance to viral transmission and the required precautions is widespread. This is just one more reason not to acquiesce to the new social norms of wearing face masks, social distancing, and abiding by shelter-in-place rules. Here, I think an understanding of cognitive psychology is important to anticipate the future behavior of a society in which a cheap and easy-to-manufacture cure is published in the media.

      To sum up, HCQ's efficacy is not sufficiently proven to warrant a widespread roll-out, because it could result in several downstream consequences, from the diversion of resources (both manufacturing capabilities and intellectual capital) to increasing the risk threshold of individuals--who spuriously believe in an easy and cheap treatment--thereby increasing the infection rate. One of two things needs to happen: clinical trials that properly adjust for all potential comorbidities, or the discovery of a causal mechanism (in vivo), which would obviate the need to control/adjust for confounders. For me, this would tip the utilitarian scales in regard to the potential benefits versus the risks.

      References

      1. Judea Pearl and Dana Mackenzie. 2018. The Book of Why: The New Science of Cause and Effect (1st. ed.). Basic Books, Inc., USA.
      2. https://www.ncbi.nlm.nih.go...
      3. https://papers.ssrn.com/sol...
      4. https://www.scientificameri....
      5. https://zenodo.org/record/3....
    1. On 2020-05-15 01:01:11, user Timeisrelative wrote:

      This is not my field of study but I hope my comments are helpful to you. Thank you for publishing this important work.

      The name "SD" for your metric is confusing for three reasons. 1) Standard deviation, which is also used in the paper, is commonly abbreviated as SD. 2) Recently, less travel has *increased* what people commonly refer to as "social distancing", whereas your metric "SD" tends to *decrease*. 3) Mobility is only one aspect of the common definition of social distancing. Other aspects are not attending mass gatherings, standing at least 6 ft apart, not shaking hands, etc. (https://hub.jhu.edu/2020/03/13/what-is-social-distancing/) These other aspects are not captured by your metric, so again I think it's confusing to call it a "social distancing ratio" and use the abbreviation SD. Better names might be "Mobility Reduction" or "Relative Mobility".

      Further, according to Wikipedia: "During the COVID-19 pandemic, the World Health Organization (WHO) suggested favoring the term "physical distancing" as opposed to "social distancing", in keeping with the fact that it is a physical distance which prevents transmission; people can remain socially connected via technology." (https://en.wikipedia.org/wiki/Social_distancing)

      Your metric SD is based on "the assumption that when individuals make fewer trips, they physically interact less." But you are not looking at the number of trips directly; instead you look at the deviation from normal levels of trips. Why not look directly at the number of trips? Different areas may have widely varying baseline numbers of trips, and one would expect infection rates to vary correspondingly. By measuring the correlation between the actual number of trips and infection rates, we could see if that is in fact true.

      I'm having trouble understanding the calculation of GR. You state "A GR equal to zero indicates no new confirmed cases were reported in the last three days". However, plugging 0 into all three Cj in the numerator of the GR calculation leads to log(0/3+0/3+0/3). The result is undefined (negative infinity), not zero. You also state "a value below one means that the growth rate during the last three days is lower than that of the last week", and testing some sample data does not produce this result. Perhaps I'm misinterpreting your formula?
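
      The zero-case problem can be reproduced with a short sketch. The preprint's exact formula is not reproduced in this comment, so the form used here (log of the 3-day mean divided by log of the 7-day mean) is an assumption, and `growth_ratio` is a hypothetical name.

      ```python
      import math


      def growth_ratio(daily_cases):
          """Assumed GR form: log(mean of last 3 days) / log(mean of last 7 days).

          Returns None where the expression is undefined instead of raising.
          """
          if len(daily_cases) < 7:
              raise ValueError("need at least 7 days of data")
          mean_3 = sum(daily_cases[-3:]) / 3
          mean_7 = sum(daily_cases[-7:]) / 7
          if mean_3 <= 0 or mean_7 <= 0:
              return None  # log(0) is undefined, so GR cannot equal zero here
          denominator = math.log(mean_7)
          if denominator == 0:
              return None  # 7-day mean of exactly 1 case/day makes the ratio undefined
          return math.log(mean_3) / denominator


      # Three days with no new cases: the numerator would be log(0).
      print(growth_ratio([10, 10, 10, 10, 0, 0, 0]))
      # Constant case counts: numerator and denominator coincide.
      print(growth_ratio([100] * 7))
      ```

      Under this assumed form, three case-free days give an undefined value, not zero, which matches the concern raised above.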

      FIG 3 What is the "Raw Date" line? In your description of GR you say "We use 3-day moving averages to smooth volatile case reporting data." Does that statement refer to the 3-day summation in the numerator of "GR" or is there an additional 3-day moving average taken after GR is computed?

      The GR calculation itself introduces a lag due to averaging the previous 3 days of data in the numerator and previous 7 days of data in the denominator. This distinction is important as you state that the value of the 9-12 day lag "reflects the time it takes for symptoms to manifest after infection, worsen, and be reported." In fact the lag from the calculation itself is also a factor.
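
      As a rough illustration of the lag that averaging alone introduces, the sketch below computes the mean age of the observations inside a trailing window. It assumes equally weighted trailing windows, which may not match the paper's implementation, and the function name is hypothetical.

      ```python
      def trailing_window_lag(n_days: int) -> float:
          """Mean age, in days, of the observations in an equally weighted trailing window."""
          if n_days < 1:
              raise ValueError("window must contain at least one day")
          # Ages are 0 (today), 1 (yesterday), ..., n_days - 1.
          return sum(range(n_days)) / n_days


      print(trailing_window_lag(3))
      print(trailing_window_lag(7))
      ```

      How these two lags combine inside a ratio depends on the exact formula, but either way part of the reported 9-12 day lag would come from the calculation and not from the disease course.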

      It's also unclear if your source data is the date a positive test was taken or the date the lab results came back. When we are talking about a lag on the order of 10 days, a 1-3 day delay for results could be significant. Further, source data including the date of symptom onset is available in some states and would be more useful as it would eliminate part of the lag which could be affected by test availability and speed.

      Why are only the top 25 counties analyzed? I would be interested in seeing the metrics calculated in other, less affected areas. In other words, could mobility reductions result in the prevention of outbreaks or just in the reduction of major outbreaks?

      The metrics you've chosen (SD and GR) follow very similar paths among all 25 counties analyzed. All 25 counties saw sharp drops in SD between March 10th and March 20th. All 25 counties saw sharp drops in GR a few weeks later. However, adding counties that didn't have a sharp reduction in SD during that time period would be revealing. Also adding counties that had GR paths that either dropped over different time periods or that grew much slower and steadier would also help reveal if GR and SD are correlated in wider situations.

      Caption to Fig 2 has redundant text "(vertical dashed red lines)"

      "King County, Washington is excluded because it precedes widespread social distancing and was driven by an infection source that differs from other outbreaks in the US." Previously you demonstrated that the SD metric is not well correlated with dates of implementation for local and state social distancing directives. King County shouldn't be excluded just because it precedes widespread social distancing. Also how is it known that the "infection source" is different from the outbreaks at the top 25 counties chosen?

      "Last, the data used in this analysis does not differentiate amongst sociodemographic groups, and therefore may not representatively capture all groups such as the elderly, low income families and underrepresentative minorities, for whom social distancing may not be an option, or may not have cell phones." Everyone in those groups with a mobile phone and that has the apps and permissions required for teralytics to track them is expected to be included in the dataset. The dataset may not be representative of the population at large but that is not *because* the dataset doesn't differentiate between sociodemographic groups.

      Conclusions: "In conclusion, our results strongly support the conclusion that social distancing pays dividends in the vital reduction of load on hospital systems in the United States." I think this conclusion is too broad. You show no data on load of hospital systems. Your data is on the reduction in reported cases correlating to reduced number of trips in severely affected areas not social distancing as a whole.

    1. On 2020-11-26 12:13:59, user Dr Gareth Davies (Gruff) wrote:

      Thank you for this fascinating analysis! It brings together a great deal of very useful information, and the data were presented in useful and transparent ways, and the tables and graphs especially helpful in understanding the data.

      I would like to offer some constructive feedback concerning the statistics and their interpretation, as some results appear to have been misinterpreted and this undermines this excellent work.

      The use of the term "statistically significant" (18 occurrences, including negatives) is especially concerning and goes against best practice. P values and confidence intervals are frequently misinterpreted by both review authors and readers. A lack of evidence is not evidence of lack of effect. This is especially concerning where results by dose, frequency and trial length are interpreted, as they give the impression that some were demonstrably effective whereas others were demonstrably not effective; the latter is not something this study could ascertain and should definitely not conclude or discuss.

      (Best practice recommendations from Cochrane Handbook for Systematic Reviews of Interventions version 6.1 C.15.3.2: "Review authors should not describe results as ‘statistically significant’, ‘not statistically significant’ or ‘non-significant’ or unduly rely on thresholds for P values, but report the confidence interval together with the exact P value.")

      There is a great deal of heterogeneity in the studies that cannot be measured by an I-squared metric but is important and will affect the results. Differences in study populations, sizes, country, latitude, age ranges, comorbidities, length of trial, method of assessing outcome, dosing frequency, % participants <25nmol/L, year of study etc. can all introduce very large unmeasurable confounding bias that may strongly influence results in ways that cannot be accounted for by software calculating CIs, P values, or I^2 measures. I would strongly urge great caution in interpreting these as meaningful.

      For example, in the group of studies where dose equivalent > 2000 IU/d, the studies vary enormously in almost every attribute and yet the I-squared metric suggests only moderate heterogeneity, which is very misleading. It is especially telling that the reported incidence of > 1 ARI in the intervention and control arms is wildly different across studies: ~17% (Rake 2020), ~74% (Camargo 2020), ~96% (Murdoch 2012). This casts strong doubt on the reliability of the measure to capture the outcome of interest to the study.

      Berman 2012 showed that a small population (N=124) of patients in Sweden (latitude 60°N), susceptible to ARIs (assessed with symptoms, range 40%-60%) and with a measured high prevalence of D deficiency (11.45%), responded positively to >2,000 IU with an odds ratio of 0.43 (CI 0.21 - 0.88). Among others, these results are combined with Camargo 2020 in New Zealand (40°S), a very large population (N=5,056) of healthy adults with a low prevalence of D deficiency (1.8%) and self-reported ARIs (cold/flu incidence ~75%), with an odds ratio of 0.90 to 1.16; and with Lehouck 2012 (adults with chronic obstructive lung disease).

      It's hard to see how the data from these trials can be meaningfully combined. It's no surprise the combined CI was large, 0.84 to 1.31 (in truth it will be far larger since bias and measurement errors have not been accounted for), but the only interpretation possible here is that we cannot interpret anything from these combined data and more research is needed.
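      To make concrete what an I-squared value does and does not capture, here is a minimal sketch (Python, standard library only) that back-calculates log odds ratios and standard errors from two of the intervals quoted above and computes Cochran's Q and I-squared. It assumes that both quoted ranges are 95% Wald intervals that are symmetric on the log scale, including the "0.90 to 1.16" range for Camargo 2020; the function names are illustrative and this is not the review's own calculation. The statistic only compares the spread of the effect estimates with their sampling error: population, latitude, outcome definition and baseline deficiency never enter the formula.

```python
from math import log
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def log_or_and_se(lower, upper):
    """Back-calculate a log odds ratio and its standard error from a 95% CI.

    Assumes a Wald interval that is symmetric on the log scale.
    """
    estimate = (log(lower) + log(upper)) / 2
    se = (log(upper) - log(lower)) / (2 * Z95)
    return estimate, se


def i_squared(studies):
    """Return Cochran's Q and I-squared for a list of (estimate, se) pairs."""
    weights = [1 / se**2 for _, se in studies]
    pooled = sum(w * y for w, (y, _) in zip(weights, studies)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, (y, _) in zip(weights, studies))
    df = len(studies) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2


# Intervals quoted in the comment: Berman 2012 and Camargo 2020.
studies = [log_or_and_se(0.21, 0.88), log_or_and_se(0.90, 1.16)]
q, i2 = i_squared(studies)
print(f"Q = {q:.2f}, I^2 = {100 * i2:.0f}%")
```

      Because only two of the pooled trials are used, the printed value is not comparable to the I-squared reported for the full subgroup; the point is only that nothing about how the trials differ clinically is an input.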

      The same problem occurs when combining individuals with deficiency (<25nmol/L) giving a combined CI of 0.53 - 1.16. This is reported as "a statistically significant protective effect of vitamin D was not seen in those with the lowest 25(OH)D concentrations" which is then wrongly interpreted to mean evidence of no effect which is simply not the case. All this means is the statistical power was too low to detect an effect with high confidence. Arguably, there IS a detectable effect if we use a lower confidence threshold. (I'm not suggesting this, I'm merely pointing out how careful we need to be interpreting statistics).
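      The remark about confidence thresholds can be checked with a short calculation. The sketch below (Python, standard library only) recovers an approximate point estimate, standard error and two-sided P value from the quoted interval of 0.53 - 1.16, then recomputes the interval at other confidence levels. It assumes the quoted range is a 95% Wald interval symmetric on the log scale; the function names and the extra confidence levels are illustrative choices, not anything taken from the review.

```python
from math import exp, log
from statistics import NormalDist

NORMAL = NormalDist()


def from_ci(lower, upper, level=0.95):
    """Recover the log odds ratio and its standard error from a Wald CI."""
    z = NORMAL.inv_cdf(0.5 + level / 2)
    estimate = (log(lower) + log(upper)) / 2
    se = (log(upper) - log(lower)) / (2 * z)
    return estimate, se


def ci(estimate, se, level):
    """Wald confidence interval for the odds ratio at the given level."""
    z = NORMAL.inv_cdf(0.5 + level / 2)
    return exp(estimate - z * se), exp(estimate + z * se)


estimate, se = from_ci(0.53, 1.16)
odds_ratio = exp(estimate)
# Two-sided P value for the null hypothesis of an odds ratio of 1.
p_value = 2 * (1 - NORMAL.cdf(abs(estimate) / se))

print(f"odds ratio ~ {odds_ratio:.2f}, P ~ {p_value:.2f}")
for level in (0.95, 0.80, 0.70):
    lo, hi = ci(estimate, se, level)
    print(f"{int(level * 100)}% CI: {lo:.2f} to {hi:.2f}")
```

      Under these assumptions the point estimate favours a protective effect, and whether the interval excludes 1 depends entirely on the confidence level chosen, which is why a non-significant result cannot be read as evidence of no effect.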

      Results with CIs crossing null can say nothing about the existence or non-existence of an effect and should not be reported or interpreted as such, especially if the ranges are large. The inability to reject the null hypothesis is not proof of the null hypothesis. It's just lack of study power.

      Statements such as "Greater protective efficacy of lower vs higher doses" have no evidential basis and should be removed. This analysis did not show a greater protective effect at lower doses! It showed an effect at lower doses and had insufficient data at higher doses to investigate the question. The subsequent musing over potential mechanisms to explain this imagined difference should also be removed.

      I would also strongly caution against multivariable meta-regressions on trial characteristics. There are simply too many potential unmeasured confounders and sources of measurement error to trust that this method will produce meaningful adjustments. There's no telling if this would properly adjust, or conversely introduce bias and loss of precision.

      I think if these issues were addressed the study contributes some very important and useful results confirming the positive beneficial effects of vitamin D, and suggests more research could help to answer the questions where the data were insufficient to cast light.

      Congratulations on the paper and I hope this feedback is helpful!

      Best wishes,

      Gareth

    1. On 2020-10-07 06:13:12, user Markku Peltonen wrote:

      There were a number of comments on this manuscript on Twitter in early August, with concerns about errors in the calculations, among others. They might be useful for others, so here is what I tweeted on August 5th 2020 (https://twitter.com/MarkkuPeltonen/status/1290754970292281349):

      Recently there was a meta-analysis on the effects of masks conducted in Finland. A number of comments has been made about the quality of the piece, so I had a quick look at it. As the analysis was also mentioned at least in Sweden, few quick comments in English. 1/10

      Background: the Finnish Ministry of Social Affairs and Health did a systematic review in May 2020 on the use of community face coverings to prevent the spread of Covid-19. There was no meta-analysis in the review, which focused on effectiveness. 2/10

      The conclusion on that report was “very little research data available on the effectiveness of community face coverings in preventing the spread of COVID-19 in society.” and evidence “minor” or “non-existent”. 3/10

      So, now then a formal meta-analysis, identifying the same 5 randomised controlled trials, showing an effect with relative risk estimate 0.61 (95% CI 0.39-0.96).<br /> Few points: 4/10

      The meta-analysis focuses on efficacy; what is potentially achievable under perfect conditions. They do something which they call “account of bias caused by non-compliance”; i.e. if persons in the mask-group did not wear masks they “adjust” for this. 5/10

      To me, this sounds quite controversial: In my world we look at intention-to-treat first, and then perhaps maybe on the “per-protocol”/“as treated”. <br /> Efficacy important, but this is now something different than what the original systematic review aimed at. 6/10

      The problems of this accentuate in the Discussion, where the authors do not seem to understand the difference in efficacy and effectiveness, nor the fact that they are actually analysing something else than the original review, and making way too far-fetched conclusions. 7/10

      There are other peculiarities, for example “Four of the analyzed studies evaluated the use of masks on respiratory infections directly, and in one the primary outcome was compliance with mask use.”. Hopefully an error, I don’t believe they actually mix the outcomes like this. 8/

      . @jejkarppinen added the following comments after my initial post, which I agree with:<br /> - The potential biases in the original papers were not covered.<br /> - Quality of evidence was not evaluated at all.<br /> - Dissemination of the results did not consider the potential problems. 9/10

      Finally:<br /> - I've not read the original 5 studies. <br /> - I’m not an expert on systematic reviews/meta-analyses. <br /> - I do think recommendation for masks is motivated, and the evidence is there (but not here..).<br /> - I do think we should be objective when evaluating evidence. 10/10

      The original systematic review by the Finnish Ministry of Social Affairs and Health is here (in Finnish, English abstract only):<br /> http://julkaisut.valtioneuv...

      Ps. Somebody noted the lack of a preregistered protocol, which reminded me that the PRISMA guidelines are helpful when reporting systematic reviews and meta-analyses. <br /> Their checklist should be followed in reporting:<br /> http://prisma-statement.org

      In addition, it was noted by Jesper Kivelä that there are errors in the calculations, these should be corrected (in Finnish):<br /> https://twitter.com/JesperK...

    1. On 2020-10-22 18:25:28, user helen colhoun wrote:

      From Helen M Colhoun, AXA Chair in Medical Informatics & Epidemiology, University of Edinburgh. Honorary Consultant in Public Health Medicine.<br /> David McAllister, Senior Clinical Lecturer in Epidemiology and Honorary Consultant in Public Health Medicine, University of Glasgow.<br /> The authors should be commended on attempting to characterise long-COVID-19. Post-viral syndromes are a well-recognised phenomenon and it is important to accurately quantify the full range of effects of COVID-19 on health. The authors are careful to state that their reported risks pertain only to those with symptomatic COVID. However, there are several reasons to think that, even among those who are symptomatic, these results may be subject to serious bias. First of all, there is a fundamental weakness in estimating risk based on a non-representative sampling frame, i.e. those who have chosen to use the app in the first place. Then, after dropping around half of the 45839 persons who tested positive as being asymptomatic (the numbers in the first part of the flow diagram do not quite add up), a further 14443 are dropped because they started to use the app whilst already unhealthy; it is not clear whether some of this represents people reporting symptoms well before diagnosis. Then 25% of those remaining are dropped for not persistently logging their symptoms (which could easily be much more common in people with no persisting symptoms than in those with them).<br /> Another major problem is the lack of specificity of the diagnosis. The disease state of long-COVID19 would appear to be defined as having “at least one symptom lasting more than one day”, which has then been further categorised as LC28 or LC56 if symptoms persisted for these numbers of days. These symptoms include clearly non-specific symptoms such as “fatigue”, “unusual muscle aches and pains” and “skipping a meal”. No comment is made as to the prevalence of such symptoms in the other millions of users of the ZOE app.
In the paper we find a hint of the lack of specificity in a matched set of test negatives: “Individuals with long-COVID were more likely to report relapses (16.0%)….In comparison, in the matched group of 139 SARS-CoV2 negative tested individuals, a new bout of illness was reported in 11.5% of cases.” This difference could easily be attributable to recall bias, since at least a large proportion of those with positive tests will have known their result.<br /> Unfortunately this paper is being widely reported in the press as showing that “long COVID affects around 10% of 18 to 49-year-olds who catch the virus.” However, those studied comprise just 15% of all those with evidence of infection, and it is plausible that many of those not studied have no evidence of long COVID. That is even before we consider the problem that most people who have “caught the virus” don’t even get tested. It would be more correct to say this: “having excluded 85% of people with detected COVID-19 who were asymptomatic or did not continue to record their symptom status, we find that 10% of young people with a positive test report at least one symptom for 28 days and 2% report at least one symptom for 56 days. These symptoms are not specific for COVID-19 and are commonly found in the general population.” We suggest that the authors make this important distinction clear in the title of the final version of their manuscript or it will continue to be misquoted. We also suggest that they discuss the impact of the potential biases raised above more fully.
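      The gap between the press figure and what the data can support can be sketched as a simple bounding exercise. The Python below takes the two figures quoted above (15% of detected infections studied, 10% of those reporting a symptom at 28 days) and asks what the proportion among all detected infections would be under an assumed, unobserved proportion in the excluded 85%. Applying the 10% figure, which refers to 18 to 49-year-olds, to the whole studied fraction is a simplifying assumption made only for illustration, and the function name is hypothetical.

```python
def overall_proportion(studied_fraction, rate_in_studied, rate_in_excluded):
    """Weighted proportion across studied and excluded detected infections."""
    excluded_fraction = 1 - studied_fraction
    return studied_fraction * rate_in_studied + excluded_fraction * rate_in_excluded


STUDIED_FRACTION = 0.15  # share of detected infections included in the analysis
RATE_IN_STUDIED = 0.10   # share of those reporting a symptom for 28 days

# The rate in the excluded group is not observed. Two bounding assumptions:
# none of them have persistent symptoms, or they match the studied group.
lower = overall_proportion(STUDIED_FRACTION, RATE_IN_STUDIED, 0.0)
upper = overall_proportion(STUDIED_FRACTION, RATE_IN_STUDIED, RATE_IN_STUDIED)

print(f"between {lower:.1%} and {upper:.1%} of detected infections")
```

      The headline 10% corresponds to the upper bound only, that is, to assuming that everyone excluded would have reported symptoms at the same rate as those who kept logging.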

    1. On 2019-07-16 13:28:54, user Guyguy wrote:

      EVOLUTION OF THE EBOLA EPIDEMIC IN THE PROVINCES OF NORTH KIVU AND ITURI.

      NEWS:

      High-Level Meeting on Ebola in Geneva

      On Monday, July 15, 2019, the Minister of Health, Dr. Oly Ilunga Kalenga, participated in the high-level meeting in Geneva to mobilize the international community to end the Ebola epidemic in the Democratic Republic of the Congo. His statement is available: Ladies and gentlemen.

      Since August 1, 2018, the Democratic Republic of the Congo has been facing the most complex Ebola epidemic in its history and in the history of public health.<br /> As you heard yesterday, July 14, a positive case from Butembo was declared in the city of Goma. This morning, the positive case, quickly identified and isolated, was repatriated to Butembo. Vaccination has been launched for all contacts. Since the beginning of this epidemic, we have prepared with the WHO for the possibility of positive cases in Goma.<br /> The situation is therefore under control and is being managed, as we did a few weeks ago with the positive case reported in Uganda. As a reminder, Goma is not the first provincial capital to report a positive case. This was the case in Bunia a few weeks ago and in Mbandaka during the ninth epidemic of Ebola Virus Disease, which occurred in Équateur province from May 7 to July 24, 2018.<br /> The risk factors of the current epidemic remain:<br /> - The density of the population;<br /> - The high mobility of the population;<br /> - The geographical area concerned, covering 23 health zones spread over 2 provinces;<br /> - The deployment of part of the response in areas of military operations where armed groups and community militias are active;<br /> - The instrumentalisation of the epidemic by certain political actors during the election period.<br /> The tenth Ebola outbreak is not a humanitarian crisis. It is a public health crisis occurring in an environment characterized by development challenges and shortcomings of the health system. This crisis requires a technical public health response to break the chain of transmission of the virus by relying on the actors of the health system and its traditional partners.<br /> Several pillars are thus implemented to break the chain of transmission, including vaccination.
On 28 and 29 June in Kinshasa, the Ministry of Health hosted the producers of the four most advanced vaccines against Ebola, as well as national and international experts, for a scientific exchange meeting on vaccination in the context of the ongoing epidemic. It emerged from these exchanges that the vaccine produced by Merck, currently used in this outbreak, is the only one that has demonstrated its efficacy for reactive vaccination in the current response. The good news is that there are enough doses of this vaccine available. To avoid confusion in the difficult context of this epidemic, the Ministry of Health decided that no other vaccine trial would be implemented in the DRC as long as the tenth epidemic is in progress.<br /> To date, thanks to the commitment of all, sufficient funds have been mobilized for previous response plans. On behalf of the Congolese Government, I express my gratitude to all donors.<br /> In developing the third strategic response plan (SRP3), covering the period from February to July 2019, a special effort was made to put in place monitoring information on activities and expenditures, in order to increase the operational as well as the financial accountability of all actors.<br /> The process of developing the fourth strategic response plan (SRP4), which will cover the period from July to December 2019, ended this Friday, July 12, 2019 in Goma. The process was participatory and inclusive, and took into account lessons learned on an ongoing basis.<br /> The bottom-up budgeting methodology starts from the unit costs and the volume of the different activities to be implemented in each health zone; these were then aggregated by sub-coordination.<br /> The Government is grateful for the contribution of our various partners as well as donors.
However, this support must respect the Government and be provided in partnership with institutions, not in parallel. Only anchoring the response in the health system and strengthening the actors of the Ministry of Health will ensure the sustainability of all achievements of the response. All sectoral support plans for the response must be developed in the same spirit, in consultation with the sectoral ministries. Public health actors want to make SRP4 a "final push". To get there, we demand discipline and accountability from all actors.<br /> In each pillar, in each sub-coordination, the Ministry of Health and the co-leaders accredit implementation agencies on the basis of five criteria to ensure accountability:<br /> - Have a demonstrated operational capacity with regard to the number and expertise of human resources (not agencies on a "learning curve", recruiting on LinkedIn for North Kivu);<br /> - Rationalize geographical deployment and ensure an effective presence in the field (not just attending meetings);<br /> - Commit to implementing the activities according to the validated protocols for the response;<br /> - Commit to transmitting the data to the General Coordination of the response, using the reporting tools that allow the monitoring of performance indicators and the production of dashboards;<br /> - Commit to adopting the scales and the Manual of Procedures for the Management of Human Resources developed by the Ministry of Health and the World Bank, which I wish to thank in particular for its unfailing support for the Government since the beginning of this epidemic.<br /> Only discipline and accountability will allow us to put an end to this epidemic, which has lasted too long.<br /> Now is the time to think about the post-Ebola era and to start developing, with other sectors, ambitious development plans that alone will be able to resolve the fundamental problems of the population.<br /> Thank you.<br /> Source: Ministry of Health press team on the state of the response to the Ebola epidemic in the Democratic Republic of Congo

    1. On 2022-02-08 21:40:31, user Pierre Siffredi wrote:

      One factor influencing the validity of cross-ancestry PRS is ancestral differences in the meaning of the phenotype, as well as the validity/reliability characteristics of its measure.

      For example, it's been proposed that there be race specific charts for BMI. Given a white person and black person with the same BMI, the black person may have e.g. higher bone density, muscle mass, etc. But the genetics of these things, if observed in a white person, would give them a low BMI. Thus for this black person, using a european-based-PRS prediction of BMI provides a very different estimate from their observed BMI.

      When you get into softer phenotypes such as psychiatric measures, do we necessarily think that people of different ancestral backgrounds with the same BDI score have the same amount of depression? Does the concept of depression even hold consistently across ancestral background? If it does, does the variance hold constant too (thus affecting the r-squared predicted by PRS)?

      I think this notion is something under-explored in the context of PRS due to lack of availability of data, limited clinical/practical understanding of the phenotype (especially appraisals of measure validity in different groups), and the lazy desire to pretend as if we have perfectly measured everything and that there is no difference between the observed and latent variable.

    1. On 2021-12-19 11:46:40, user Kjell Krüger wrote:

      Tables and figures in the study indicate that some 50% of the sample have the status "unv." and "not born in Norway". Statistics from the study also show that some 80% of the total sample comes from the South-East region of Norway. Finally, some 35% of unv. are marked with virus variant "unknown", which we may suppose is other than omicron, as the study was done in the period up to October? It would be of interest to see some more deviation analyses made on these parameters. The number of beds in Norwegian hospitals is stated by SSB to be some 11500, of which some 400 are now occupied by COVID patients. I suppose all these parameters should also be interesting input data for planning how to manage future epidemic crises in Norway. Maybe new studies will also highlight the possibility that some regions should be set up with more capacity and competence than others, with the possibility of transporting both personnel and patients between regions? I think questions and answers on these matters will be of great interest to politicians at the local, regional and national levels one day when this crisis fades out and preparation for the next one begins.

    1. On 2020-06-06 01:33:13, user David Hood wrote:

      I think the "39.5% of cases seeking medical consultation in primary care settings" may be overly conservative in the model as a parameter representing getting medical advice, as it is based on influenza in the 2018 flu season (a fairly typical year). We know from the ESR influenza surveillance site that Healthline historically (I don't know what period they treat as historical) gets around 40000 influenza-like illness (ILI) calls a year, and for the period from the week of 14/2 to 29/5 there are historically around 10000 ILI calls. In 2020, for the same period, there were around 26000 ILI calls. Even allowing for false positive worries from anxious people boosting call numbers, this suggests that the number of people seeking official advice about ILI is dramatically higher in 2020. I acknowledge that this is not the same as visiting a primary care location about an ILI, which is what the 39.5% figure measures, but the official advice was to ring Healthline, who were presumably advising testing, isolation or primary health care as appropriate.

    1. On 2023-07-21 14:12:39, user Gaël Nicolas wrote:

      I think that this variant is definitely a strong contributor to AD. However, the pedigrees also show that the patients with DNA available who carry the variant also carry one APOE4 allele. Actually, APOE4 segregates as well as SORL1 in these pedigrees! All affected individuals with DNA available are SORL1+/APOE4+. One unaffected individual is SORL1+/APOE4- (family 1) and one unaffected individual is SORL1-/APOE4+ (family 2). To be clear, I have absolutely no doubt of a major role of the SORL1 variant here, but I feel that this is very much consistent with a more complex inheritance and not purely autosomal dominant, as shown in our penetrance paper (Schramm et al., Genome Medicine 2022, PMID 35761418).

      Interestingly, we have the same variant in three independent families from France (one of them is mentioned in this preprint). Although there is an obvious aggregation of AD cases in the families, there is a huge diversity of ages of onset, and younger cases have a positive family history in both branches, suggesting the contribution of additional factors. Some of them are APOE4+ but not the 2 youngest probands. This may suggest the contribution of undetected contributing variants along with SORL1.

      Overall, our penetrance paper (Schramm et al., 2022) and many pedigrees suggest a contribution of additional factors with SORL1 variants, and that SORL1 alone may not be sufficient / fully penetrant. We have clear evidence for APOE4, as this is a common allele, but we know that there are many other AD-associated variants, especially rare variants, among known variants (such as families with SORL1+ABCA7, as we previously reported in Campion et al., Acta Neuropath 2019, PMID 30911827, and in other papers) and, obviously, among not yet known variants.

      I thus recommend using such results with great caution for genetic counseling, as we still don't exactly know how variants in other genes may drastically change an age of onset from 50 to 75-80, for example, or to absence of AD (as also shown for some truncating variants, as in Campion et al., 2019, where a mother transmitted a truncating variant and was unaffected with AD at age 95 years).

    1. On 2020-04-24 09:57:00, user Philip Davies wrote:

      Well, well well,

      This pre-print would make a good script for an episode of Columbo.

      The retrospective analysis, as presented, leads the reader to just one conclusion in a bazaar of many possible conclusions.

      I am even starting to have sympathy with D. Raoult and his team. I note his hot tempered response to this paper, where he lists two enormous factors that should be considered when wrestling with the data: the fact that the HCQ and HCQ & AZ cohorts were a sicker crowd (he lists lymphopenia) and that the sickest of the non-HCQ ventilated patients were then given HCQ (plus AZ in most cases) in a desperate last bid only for most to die.

      Raoult's point is certainly valid.

      We must remember that for most of the study period the use of HCQ was "ex-license" on a compassionate basis only. This means only the sickest patients got it. Remember also that this is a retrospective analysis, therefore observational. It was not run as a therapeutic trial. On the other hand, the use of AZ was already accepted (hence 30% of the non-HCQ cohort got it anyway).... although do be aware that by this time there had been quite a lot of focus on potentially dangerous QT lengthening when HCQ and AZ were used together in very sick patients.

      The HCQ cohort was, across all key determinants, the weakest and sickest group (it had the poorest prospects looking at age, ethnicity, smoking status, congestive heart failure, peripheral vascular disease, cerebrovascular disease (strokes), dementia, COPD, diabetes (with and without complications))! ... and indeed, the HCQ and HCQ & AZ cohorts did have 100% more lymphopenia than the non-HCQ group.

      BUT, the big asymmetric issues become obvious when we look at the pre- and post- ventilator numbers.

      In terms of patients discharged without needing ventilation, the "victorious" non-HCQ group performs poorer than the 2 treated groups. This despite having a better prognostic baseline. But the results for this group change dramatically (for the better) when we look at the outcomes of ventilation. 25 ventilated patients came from this group.... but 19 of these 25 patients were then started on HCQ or HCQ & AZ after ventilation was started. It is screamingly obvious that these would be the sickest patients in that group: they were given such compassionate drugs in extremis. So having ejected 19 of 25 ventilated patients into the other cohorts, the non-HCQ group only had 3 deaths from its remaining 6 ventilated patients.

      The numbers of ventilated patients in the other cohorts (HCQ and HCQ & AZ) were thus substantially inflated with these new super-sick patients, who mostly died.
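      The effect of moving ventilated patients between cohorts can be illustrated with the counts given above (25 ventilated patients starting in the non-HCQ group, 19 of them reassigned, 3 deaths among the remaining 6). The Python sketch below compares the mortality as reported with what it would be if patients were analysed in the group they started in. The number of deaths among the 19 reassigned patients is not stated here, so it is a hypothetical input that is varied across a range; the function names are illustrative.

```python
INITIAL_VENTILATED = 25      # ventilated patients who started in the non-HCQ group
REASSIGNED = 19              # of those, started on HCQ or HCQ & AZ after ventilation
DEATHS_AMONG_REMAINING = 3   # deaths among the 6 who stayed in the non-HCQ group


def mortality_as_reported():
    """Ventilated mortality in the non-HCQ group after reassignment."""
    remaining = INITIAL_VENTILATED - REASSIGNED
    return DEATHS_AMONG_REMAINING / remaining


def mortality_by_initial_group(deaths_among_reassigned):
    """Ventilated mortality if patients are counted in their starting group.

    deaths_among_reassigned is a hypothetical value: the comment says only
    that these patients "mostly died".
    """
    if not 0 <= deaths_among_reassigned <= REASSIGNED:
        raise ValueError("deaths must be between 0 and the number reassigned")
    total_deaths = DEATHS_AMONG_REMAINING + deaths_among_reassigned
    return total_deaths / INITIAL_VENTILATED


print(f"as reported: {mortality_as_reported():.0%}")
for deaths in (10, 13, 16, 19):
    rate = mortality_by_initial_group(deaths)
    print(f"{deaths} deaths among reassigned: {rate:.0%}")
```

      Any assumed value above 9 deaths among the reassigned patients pushes the starting-group mortality above the reported figure, which is the direction of bias described above.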

      There really can be no conclusion at all from a study of this nature without knowing much more about individual clinical conditions and the guiding principles behind clinicians' decision making. It's still possible to make some reasonable assumptions:

      If I were Columbo?... I would say the non-HCQ cohort contained patients of extremes, with the best and worst potential. The worst would have been the very frail (malignancy and or congestive heart failure maybe ... see the stats), who probably were earmarked for 'supplemental oxygen' only from the very start. Such patients would not have been suitable for compassionate use of non proven drugs (remember, most of this came before the "emergency use" edict by FDA). This would explain the number of non-ventilated patients who died in this group (they may have been given AZ only, not being a controversial drug, but otherwise they did not get any significant interventional therapy). These patients would have had significant chronic disease and very poor obs/indices (including lymphopenia). But given that this cohort had, overall, a better starting prognosis than the other two groups, it means that the remaining patients in the group were promising candidates for survival (with better obs/indices). Such patients, not being part of a clinical trial, would not have been offered HCQ on a compassionate basis unless they got dramatically worse .... and of course, the ones who did get worse on the ventilator were started on HCQ (& often AZ as well) and thus swapped into the HCQ / HCQ & AZ cohorts.

      If we can understand that, then we might start to think that in fact HCQ & AZ is the best performing cohort with the other 2 vaguely distant. But this is being unfair to the HCQ cohort:

      The reason that a sick patient would be given one experimental drug on a compassionate basis (HCQ) but not have a rather less experimental drug further added (AZ) can really only be explained by considering risk versus benefit. A clinician would choose to use HCQ because the patient was particularly sick. The clinician would only add AZ if it was felt that this was worth the risk.... but a particularly sick patient with significant cardiovascular disease (the HCQ cohort contained the most CVD risk) might then die of a more abrupt arrhythmia through adding yet another QT lengthening drug. I dare say the clinicians were tempted to make some "Hail Mary" plays, but we must remember that these patients were not part of an ongoing trial, these drugs were "ex-license" for compassionate use only, and clinicians were still accountable for responsible actions. So for those particularly sick frail patients, it wasn't worth the risk.

      I am pretty sure that the HCQ cohort (which had pretty good pre-ventilator stats) crashed badly because it was loaded with the sickest patients .... patients that were too sick to risk adding AZ.

      So, the findings of this retrospective analysis are, in my opinion, likely to be incorrect.

      I believe I can confidently state that:

      1. The HCQ cohort started with the sickest patients and had even more of the sickest added during ventilation. Some were too sick to risk the addition of AZ to existing HCQ.
      2. The HCQ/AZ cohort also had some very sick patients (again with more additions during ventilation).
      3. The Non-HCQ cohort had the best prognosis overall from the very start (although likely a polarized mixture of the most frail and the most promising)... and then its stats got even better when it jettisoned its sickest ventilated patients into the other 2 cohorts.

      It is almost impossible to reach a conclusion from all this. BUT, the most likely finding is NOT that adding HCQ delivers a worse outcome than standard treatment. In fact, if we look at the pre-ventilator stats, the addition of HCQ might actually have provided considerable benefit to a particularly sick group of patients. Whether or not the addition of AZ to HCQ adds benefit is also unclear ... although my 'swingometer' is pointing slightly more to benefit than harm.

      Once again. I suggest that a robust study into prophylaxis and early treatment (using sensible safer doses adjusted for pulmonary sequestration) will deliver the most interesting results for CQ/HCQ.

      Dr Phil Davies<br /> Aldershot Centre For Health<br /> http://thevirus.uk

      Philip Davies, 4 days ago:<br /> The low dose arm of this study is worth following.

      The big problem for this study is comparison. It really has not defined the control population at all. The Italian and Chinese references are entirely different. Even the 2 Chinese populations referenced had massively different outcomes because the populations examined were different.

      The Italian mortality rate was actually similar to the overall study average here (but much higher than the low dose arm). The Chinese study involved all patients admitted to the two hospitals ... that included a majority of patients with moderate ("ordinary" as the Chinese class it) disease severity. The patients in this Brazilian study were regarded as severe or critical ... such patients (looking at worldwide stats) would attract a mortality of 30-40% plus.

      This is the most important factor. Do not compare apples with pears. So far this study points the "swingometer" in favor of benefit versus harm for the use of HQN in patients with advanced disease.

      Once again however, we are looking at the potential impact of an orally administered drug to patients with advanced disease. That's a big ask.

      For CQ and HCQ the most interesting results will likely come from studies looking at prophylaxis and early treatment (using safe doses, not silly high doses with added drugs that also lengthen QT). We can't yet guess how they will pan out.

      Dr Philip Davies<br /> GP<br /> Aldershot Centre For Health, UK<br /> http://thevirus.uk

    2. On 2020-04-24 00:57:17, user Philip Davies wrote:

      Well, well well ...

      This pre-print would make a good script for an episode of Columbo.

      The retrospective analysis, as presented, leads the reader to just one conclusion in a bazaar of many possible conclusions.

      I am even starting to have sympathy with D. Raoult and his team. I note his hot tempered response to this paper, where he lists two enormous factors that should be considered when wrestling with the data: the fact that the HCQ and HCQ & AZ cohorts were a sicker crowd (he lists lymphopenia) and that the sickest of the non-HCQ ventilated patients were then given HCQ (plus AZ in most cases) in a desperate last bid only for most to die.

      Raoult's point is certainly valid.

      We must remember that for most of the study period the use of HCQ was "ex-license" on a compassionate basis only. This means only the sickest patients got it. Remember also that this is a retrospective analysis, therefore observational. It was not run as a therapeutic trial. On the other hand, the use of AZ was already accepted (hence 30% of the non-HCQ cohort got it anyway).... although do be aware that by this time there had been quite a lot of focus on potentially dangerous QT lengthening when HCQ and AZ were used together in very sick patients.

      The HCQ cohort was, across all key determinants, the weakest and sickest group (it had the poorest prospects looking at age, ethnicity, smoking status, congestive heart failure, peripheral vascular disease, cerebrovascular disease (strokes), dementia, COPD and diabetes (with and without complications))! ... and indeed, the HCQ and HCQ & AZ cohorts did have 100% more lymphopenia than the non-HCQ group.

      BUT, the big asymmetric issues become obvious when we look at the pre- and post- ventilator numbers.

      In terms of patients discharged without needing ventilation, the "victorious" non-HCQ group performs poorer than the 2 treated groups. This despite having a better prognostic baseline. But the results for this group change dramatically (for the better) when we look at the outcomes of ventilation. 25 ventilated patients came from this group.... but 19 of these 25 patients were then started on HCQ or HCQ & AZ after ventilation was started. It is screamingly obvious that these would be the sickest patients in that group: they were given such compassionate drugs in extremis. So having ejected 19 of 25 ventilated patients into the other cohorts, the non-HCQ group only had 3 deaths from its remaining 6 ventilated patients.

      The numbers of ventilated patients in the other cohorts (HCQ and HCQ & AZ) were thus substantially inflated with these new super-sick patients, who mostly died.

      There really can be no conclusion at all when looking at a study of this nature without knowing much more about individual clinical conditions and the guiding principles behind clinicians' decision making. It's still possible to make some reasonable assumptions:

      If I were Columbo?... I would say the non-HCQ cohort contained patients of extremes, with the best and worst potential. The worst would have been the very frail (malignancy and/or congestive heart failure maybe ... see the stats), who probably were earmarked for 'supplemental oxygen' only from the very start. Such patients would not have been suitable for compassionate use of unproven drugs (remember, most of this came before the "emergency use" edict by FDA). This would explain the number of non-ventilated patients who died in this group (they may have been given AZ only, not being a controversial drug, but otherwise they did not get any significant interventional therapy). These patients would have had significant chronic disease and very poor obs/indices (including lymphopenia). But given that this cohort had, overall, a better starting prognosis than the other two groups, it means that the remaining patients in the group were promising candidates for survival (with better obs/indices). Such patients, not being part of a clinical trial, would not have been offered HCQ on a compassionate basis unless they got dramatically worse .... and of course, the ones who did get worse on the ventilator were started on HCQ (& often AZ as well) and thus swapped into the HCQ / HCQ & AZ cohorts.

      If we can understand that, then we might start to think that in fact HCQ & AZ is the best performing cohort with the other 2 vaguely distant. But this is being unfair to the HCQ cohort:

      The reason that a sick patient would be given one experimental drug on a compassionate basis (HCQ) but not have a rather less experimental drug further added (AZ), can really only be explained by considering risk versus benefit. A clinician would choose to use HCQ because the patient was particularly sick. The clinician would only add AZ if it was felt that this was worth the risk.... but a particularly sick patient with significant cardiovascular disease (the HCQ cohort contained the most CVD risk) might then die of a more abrupt arrhythmia through adding yet another QT lengthening drug. I dare say the clinicians were tempted to make some "Hail Mary" plays, but we must remember, these patients were not part of an ongoing trial, these drugs were "ex-license" for compassionate use only and clinicians were still accountable for responsible actions. So for those particularly sick frail patients, it wasn't worth the risk.

      I am pretty sure that the HCQ cohort (which had pretty good pre-ventilator stats) crashed badly because it was loaded with the sickest patients .... patients that were too sick to risk adding AZ.

      So, the findings of this retrospective analysis are, in my opinion, likely to be incorrect.

      I believe I can confidently state that:

      1. The HCQ cohort started with the sickest patients and had even more of the sickest added during ventilation. Some were too sick to risk the addition of AZ to existing HCQ.
      2. The HCQ/AZ cohort also had some very sick patients (again with more additions during ventilation).
      3. The Non-HCQ cohort had the best prognosis overall from the very start (although likely a polarized mixture of the most frail and the most promising)... and then its stats got even better when it jettisoned its sickest ventilated patients into the other 2 cohorts.

      It is almost impossible to reach a conclusion from all this. BUT, the most likely finding is NOT that adding HCQ delivers a worse outcome than standard treatment. In fact, if we look at the pre-ventilator stats, the addition of HCQ might actually have provided considerable benefit to a particularly sick group of patients. Whether or not the addition of AZ to HCQ adds benefit is also unclear ... although my 'swingometer' is pointing slightly more to benefit than harm.

      Once again. I suggest that a robust study into prophylaxis and early treatment (using sensible safer doses adjusted for pulmonary sequestration) will deliver the most interesting results for CQ/HCQ.

      Dr Phil Davies<br /> Aldershot Centre For Health<br /> http://thevirus.uk

    1. On 2020-06-24 18:56:17, user André GILLIBERT wrote:

      Title : Proposal for improved reporting of the Recovery trial<br /> André GILLIBERT (M.D.)1, Florian NAUDET (M.D., P.H.D.)2<br /> 1 Department of Biostatistics, CHU Rouen, F 76000, Rouen, France<br /> 2 Univ Rennes, CHU Rennes, Inserm, CIC 1414 (Centre d’Investigation Clinique de Rennes), F- 35000 Rennes, France

      **Introduction**

      Dear authors,<br /> We read with interest the pre-print of the article entitled “Effect of Dexamethasone in Hospitalized Patients with COVID-19: Preliminary Report”. It reports the preliminary results of a large-scale randomized clinical trial (RCT) conducted in 176 hospitals in the United Kingdom. To our knowledge, it is the largest pragmatic RCT comparing treatments of COVID-19 with curative intent. The 28-day survival endpoint is objective and clinically relevant, and should not be influenced by the measurement bias that an open-label design may cause. While 2,315 study protocols about COVID-19 had been registered on ClinicalTrials.gov as of June 24th 2020, Recovery is, to our knowledge, the only randomized clinical trial on COVID-19 that has succeeded in including more than ten thousand patients. The open-label design and simple electronic case report form (e-CRF) may have helped to include a non-negligible proportion of all COVID-19 patients hospitalized in the United Kingdom (UK). Indeed, as of June 24th 2020, approximately 43,000 patients had died of COVID-19 in hospital in the UK, of whom approximately 0.24 × 11,500 = 2,760, that is more than 6% of all hospital deaths from COVID-19, were included in the Recovery study.<br /> Having read with interest version 6.0 of the publicly available study protocol (https://www.recoverytrial.net/files/recovery-protocol-v6-0-2020-05-14.pdf), we had hoped for more details in the reporting of the methods and results of this trial, and we take advantage of the open peer-review process offered by pre-print servers to suggest improvements to some aspects of the reporting before the final peer-reviewed publication. Please find below some easy-to-answer comments that may help to improve the article overall.

      **Interim analyses and multiple treatment arms**

      The first information we would like concerns interim analyses. The protocol (version 6.0) specifies that it is adaptive and that randomization arms may be added, removed or paused according to decisions of the Trial Steering Committee (TSC), which bases its decisions on interim analyses performed by the Data Monitoring Committee (DMC) and communicated when “the randomised comparisons in the study have provided evidence on mortality that is strong enough […] to affect national and global treatment strategies” (protocol, page 16, section 4.4, 2nd paragraph). The Supplementary Materials of the manuscript specify that “the independent Data Monitoring Committee reviews unblinded analyses of the study data and any other information considered relevant at intervals of around 2 weeks”. This suggests that many interim analyses may have been performed from the start (March 9th) to the end (June 8th) of the study.<br /> Statistically, interim analyses that are not properly taken into account inflate the type I error rate, which may be increased further by the multiple treatment arms. Methods such as triangular tests make it possible to control the type I error rate. Most methods for controlling the type I error rate in interim analyses require that the maximal sample size be defined a priori and that the timing and number of interim analyses be pre-planned. Because this protocol is adaptive, new arms were added, implying new statistical tests in interim analyses, and there was no pre-defined sample size, as seen on page 2 of the protocol: “[...] it may be possible to randomise several thousand with mild disease [...], but realistic, appropriate sample sizes could not be estimated at the start of the trial.” This makes control of the type I error rate difficult. The fact that the study was stopped at the final analysis rather than at an interim analysis, as we understand from the current draft, does not remove the type I error rate inflation. 
The multiple treatment arms lead to a further inflation of the type I error rate.<br /> The current manuscript does not specify any procedure to address these problems. The Statistical Analysis Plans (SAP) V1.0 (in section 5.5) and V1.1 (in section 5.6) specify that “Evaluation of the primary trial (main randomisation) and secondary randomisation will be conducted independently and no adjustment be made for these. Formal adjustment will not be made for multiple treatment comparisons, the testing of secondary and subsidiary outcomes, or subgroup analyses.” and nothing is specified about interim analyses. Therefore, we conclude that no P-value adjustment for multiple testing has been performed, neither for multiple treatment arms nor for interim analyses. If an interim analysis assessing 4 to 6 treatment arms at the 5% significance level was performed every 2 weeks from March to June, up to 50 tests may have been performed, leading to a major inflation of the type I error rate. In our opinion, the best way to assess and perhaps correct the type I error rate inflation is to report with maximal transparency every interim analysis that has been performed, with the following information:<br /> 1. Date of the interim analysis and number of patients included at that stage<br /> 2. Was the interim analysis planned (e.g. every 2 weeks as planned according to the supplementary material) or unplanned (e.g. due to an external event, for instance the article of Mehra et al about hydroxychloroquine published in The Lancet, doi:10.1016/S0140-6736(20)31180-6), and if unplanned, why?<br /> 3. Which statistical analyses, on which randomization arms, were performed at each stage<br /> 4. If predefined, what criteria (statistical or not) would have led to early stopping of a randomization arm for inefficacy, and what criteria would have led to stopping for proven efficacy?<br /> 5. 
If statistical criteria were not predefined, did the DMC provide a rationale for its choice to communicate or not communicate the results to the TSC? If yes, could the rationale be provided?<br /> 6. The results of the statistical analyses performed at each step<br /> 7. The decision of the DMC to communicate or not communicate the results to the TSC, and which results were reported, as the case may be<br /> The information about interim analyses and multiple randomization arms will help to assess whether the inflation of the type I error rate is severe or not. A post hoc multiple testing adjustment, taking into account the many randomized treatments and interim analyses, should be attempted and discussed, even though there may be technical issues due to the adaptive nature of the protocol.
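
To give a sense of scale for the inflation discussed above, here is a minimal sketch in Python. It assumes that the tests are independent and each run at the 5% level, which is a simplification: interim looks at accumulating data are positively correlated, so the real inflation would be smaller than this bound. The function names are illustrative only and do not come from the Recovery protocol or SAP.

```python
# Illustration only: family-wise type I error rate for k tests, under the
# simplifying ASSUMPTION that the tests are independent. Repeated looks at
# accumulating trial data are correlated, so this overstates the inflation.

def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive among n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance level that keeps the family-wise rate below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    for k in (1, 5, 10, 25, 50):
        fwer = familywise_error_rate(k)
        per_test = bonferroni_threshold(k)
        print(f"{k:>2} tests: family-wise error {fwer:.3f}, "
              f"Bonferroni per-test level {per_test:.4f}")
```

The Bonferroni threshold is shown only as the simplest conservative correction; group-sequential boundaries would be the more usual choice for interim analyses.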

      **Adjustment for age**

      An adjustment for age (in three categories: <70 years, 70-79, >=80 years, see legend of table S2) in a Cox model was performed for the comparison of dexamethasone to standard of care in the article. This adjustment was not specified in version 6.0 of the protocol but was, according to the manuscript, “added once the imbalance in age (a key prognostic factor) became apparent”. This is confirmed by the addition of the words “However, in the event that there are any important imbalances between the randomised groups in key baseline subgroups (see section 5.4), emphasis will be placed on analyses that are adjusted for the relevant baseline characteristic(s).” in section 5.5, page 16 of the SAP V1.1 of June 20th, compared to the SAP V1.0 of June 9th, which specified a log-rank test. The SAP V1.0 of June 9th may have been written before the database was analyzed (data cut June 10th), but the SAP of June 20th was probably written after preliminary analyses had been performed. This is consistent with the words “became apparent” in the manuscript. Therefore, in our opinion, this adjustment must be considered a post hoc analysis rather than the main analysis. Moreover, even though the SAP V1.1 specifies that an “important imbalance” will lead to an “emphasis” on adjusted analyses, it does not change the primary analysis (see section 5.1.1, page 14). It is not clear what “important imbalance” means. To interpret it, we performed statistical tests to assess the balance of the key baseline subgroups specified in SAP V1.1 (see section 5.4):<br /> 1. Risk group (three risk groups with approximately equal number of deaths based on factors recorded at randomisation). Its distribution is shown in figure S2. A chi-square test on the distribution of risk groups in the dexamethasone (1255/500/349) and usual care (2680/926/715) groups yields a P-value of 0.092. A chi-square test for trend yields a P-value of 0.23.<br /> 2. 
Requirement for respiratory support at randomisation (None; Oxygen only; Ventilation or ECMO). P-value=0.89 for chi-square test and P-value=0.86 for chi-square test for trend.<br /> 3. Time since illness onset (<=7 days; >7 days). P-value=0.17<br /> 4. Age (<70; 70-79; 80+ years). P-value=0.016 for chi-square test, P-value=0.019 for chi-square test for trend<br /> 5. Sex (Male; Female). P-value=0.97 for chi-square test<br /> 6. Ethnicity (White; Black, Asian or Minority Ethnic). No data found.<br /> The criterion defining an “important imbalance” seems to be statistical significance at the 0.05 threshold; however, that should have been stated, and tests for all other variables should have been provided too.<br /> First, from a theoretical point of view, this adjustment was not necessary since the study was randomized; if the exact condition of imbalance triggering the adjustment had been pre-specified in the protocol or SAP before the imbalance was known, it could have induced a very slight reduction of the type I error rate and power. However, as it was performed when the imbalance was known, there is a risk that the sign of the imbalance (i.e. higher age in the dexamethasone group) influenced the choice of adjustment. Indeed, an adjustment conditional on a higher age in the dexamethasone group will increase the estimated effect of dexamethasone in these conditions, and so inflate the type I error rate. If the same conditional adjustment were further considered for other prognostic variables, the inflation could be even higher.<br /> Unless there is strong evidence that the amendment to the SAP was made without knowledge of the sign of the imbalance (higher age in the dexamethasone group), we suggest that the primary analysis be kept as originally planned, without adjustment, and that the age adjustment be performed in a sensitivity analysis only. 
Whether the sign of the imbalance was known is unclear from the last version of the SAP (V1.1, June 20th) and from the manuscript. In addition, in an open-label trial, it is always better to stick to the protocol.
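
The balance test on risk groups quoted above can be reproduced from the counts given in the text. The sketch below is a plain-Python Pearson chi-square test for a 2 x 3 table; it relies on the fact that, with exactly 2 degrees of freedom, the upper tail of the chi-square distribution is exp(-x/2). The helper names are illustrative, and the test for trend is not reproduced here.

```python
import math


def chi_square_2xk(row_a, row_b):
    """Pearson chi-square statistic and degrees of freedom for a 2 x k table."""
    total_a, total_b = sum(row_a), sum(row_b)
    grand_total = total_a + total_b
    statistic = 0.0
    for a, b in zip(row_a, row_b):
        column_total = a + b
        for observed, row_total in ((a, total_a), (b, total_b)):
            expected = row_total * column_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic, len(row_a) - 1


def p_value_df2(statistic):
    """Upper-tail probability of a chi-square variable with exactly 2 df."""
    return math.exp(-statistic / 2.0)


# Counts per risk group as quoted in the comment (from figure S2).
dexamethasone = [1255, 500, 349]
usual_care = [2680, 926, 715]

statistic, df = chi_square_2xk(dexamethasone, usual_care)
assert df == 2  # the closed-form p-value above is only valid for 2 df
print(f"chi-square = {statistic:.2f}, df = {df}, "
      f"P-value = {p_value_df2(statistic):.3f}")
```

The same function applies to the other three-category subgroups (respiratory support, age) once their counts are read from the manuscript tables.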

      **Results in other treatment arms**

      The manuscript specifies that “the Steering Committee closed recruitment to the dexamethasone arm since enrolment exceeded 2000 patients.” It is not stated whether any other treatment arm has exceeded 2000 patients or not and whether the study is still ongoing. Results of treatment arms that have been stopped should be provided (all arms having enrolled more than 2000 patients?). If not, the number of patients randomized in other treatment arms should, at least, be reported. If the study is completely stopped, all treatments should be analyzed and reported, unless there is a specific reason not to do so; that reason should be stated as the case may be. This data would be useful to provide evidence on other molecules. It would also clarify the number of statistical tests that have been performed or not, providing more information about the overall inflation of alpha risk.

      **Sample size**

      The paragraph about the sample size suggests that inclusions were planned, at some time, to stop when 2000 patients had been included in the dexamethasone arm. The amended protocol (May 14th), the SAP V1.0 (June 9th) and the SAP V1.1 (June 20th, 4 days after the results were officially announced) all have a paragraph about the sample size, but all specify that the sample size is not fixed and none specifies any stopping criterion based on sample size. There are 2104 patients included in this arm, which is substantially more than the target of 2000 patients. The exact chronology and methodology should be clarified: when was the sample size computed, and what was the exact criterion for stopping the research? Could the document (internal report?) related to this sample size calculation and to the statistical or non-statistical decision to stop the research be published in the supplementary material?<br /> Indeed, assessment of the type I error rate requires knowing exactly when and why the research was stopped: stopping because of a low inclusion rate of new patients or because the target sample size was reached cannot be interpreted in the same way as stopping because of high efficacy observed at an interim analysis.

      **Future of the protocol**

      With the new evidence about dexamethasone, the protocol will probably be stopped or evolve. Future recruitment may slow as the peak of the epidemic curve in the United Kingdom has passed. The past, present and future of the protocol also need to be known to assess the actual type I error rate. Indeed, future analyses that have not yet been performed influence the overall type I error rate. That is why we suggest that the authors provide the daily or weekly inclusion rate from March to June and discuss the future of the study.

      **Loss to follow-up**

      Table S1 shows that follow-up forms have been received for 1940/2104 (92.2%) patients of the dexamethasone group and 3973/4321 (91.9%) patients of the usual care group. The patients without follow-up forms (512/6425, 8.0% overall) may either be lost to follow-up or have been included in the last 28 days before June 10th 2020 (data cut). The manuscript mentions that 4.8% of patients “had not been followed for 28 days by the time of the data cut”, suggesting that 8.0%-4.8% = 3.2% of patients are lost to follow-up, but that is our own interpretation. We suggest that the authors report the actual number of patients lost to follow-up and how their data have been imputed or analyzed. The number of patients lost to follow-up may differ between outcomes. For instance, if the Office of National Statistics (ONS) data have been used for vital status assessment, there should be no loss to follow-up on that outcome.

      **Vital status**

      The current manuscript only specifies the data of the web-based case report form (e-CRF), filled in by hospital staff, as a source of information, suggesting that it is the only source of information about vital status. The document entitled “Definition and Derivation of Baseline Characteristics and Outcomes” provided at https://www.recoverytrial.n... specifies many other sources. For instance, vital status had to be assessed from the Office of National Statistics (ONS). Other sources, including Secondary Use Service Admitted Patient Care (SUSAPC) and the e-CRF, could be used for interim analyses. The ONS was considered the defining (most reliable) source. Whether the ONS data have been used or not should be clarified. If the ONS data have been used, statistics on the agreement between the two data sources (e-CRF and ONS) may be provided to help assess the quality of the data. If the ONS data have not been used, this deviation from the planned protocol should be documented.<br /> The manuscript as well as the recovery-outcomes-definitions-v1-0.pdf file specify that the follow-up form of the e-CRF is completed at “the earliest of (i) discharge from acute care (ii) death, or (iii) 28 days after the main randomisation”. If the follow-up form is not updated further, patients discharged alive before day 28 (e.g. day 14) may have incomplete vital status information at day 28. The following information should be specified:<br /> 1. Whether the follow-up form of the e-CRF had to be updated by hospital staff at day 28 for these patients<br /> 2. If the response to (1) is yes, whether there was a means to distinguish between a patient lost to follow-up at day 28 (form not updated) and a patient discharged and alive at day 28 (form updated to “alive at day 28”)<br /> 3. If the response to (2) is yes, how many patients discharged before day 28 were lost to follow-up at day 28<br /> 4. 
If the response to (2) is yes, how their vital status at day 28 has been imputed or handled in models with censoring (log-rank, Kaplan-Meier, Cox)<br /> Of course, this information is needed mainly if the ONS and SUSAPC data have not been used.<br /> The quality of the vital status information is critical in such a large-scale open-label multicentric trial, because there is a risk that one or more centers selectively report deaths, biasing the primary analysis.

      **Inclusion distribution by center**

      A multicentric study provides stronger evidence than a single-center study but sometimes, few centers include most patients, with a risk of low-quality data or selection bias. The very high number of included patients in the Recovery trial suggests that many centers included many patients but the distribution of inclusions per center could be reported.

      **Randomization**

      The protocol specifies that “in some hospitals, not all treatment arms will be available (e.g. due to manufacturing and supply shortages); and at some times, not all treatment arms will be active (e.g. due to lack of relevant approvals and contractual agreements).” This is further clarified in the SAP V1 (section 2.4.2 Exclusion criteria, page 8) by the sentence “If one or more of the active drug treatments is not available at the hospital or is believed, by the attending clinician, to be contraindicated (or definitely indicated) for the specific patient, then this fact will be recorded via the web-based form prior to randomisation; random allocation will then be between the remaining (or indicated) arms.” This shows that randomization arms may be closed on an individual basis, when the patient is included, on the grounds of contraindication or definite indication. It seems that the “standard of care” group could not be removed and that at least one other randomization arm had to be kept, as suggested by the words “random allocation will then be between the remaining arms (in a 2:1:1:1, 2:1:1 or 2:1 ratio)” in section 2.9.1, page 11 of the SAP V1.0. Even the exclusion of a single randomization arm can lead to imbalance between groups. For instance, if physicians believed that a treatment was contraindicated for the most severe patients, only non-severe patients could be randomized to that treatment’s arm, while the most severe patients would be randomized to other arms. Several things can be done to assess and correct this bias. First, report how many times this feature has been used and which randomization arms have been excluded most often. If it has been used many times, provide the pattern of use, which helps to assess whether this is a collective measure (e.g. a 2-week period of shortage of a treatment in a center, hence no major selection bias) or an individual measure. If its use has been rare, a sensitivity analysis could simply exclude these patients. 
If it has been frequent, we suggest a statistical method to analyze these data without bias, based on the following principles: patients randomized between 3 randomization arms A, B and C (population X) are comparable for the comparison of A to B. Patients randomized between A, B and D (population Y) are comparable for the comparison of A to B. Population X and population Y may differ but, inside each population, A can be compared to B. Therefore, the within-X comparison of A to B and the within-Y comparison of A to B are both valid and can be meta-analyzed to assess a global difference between A and B. This can be done simply with an adjustment on the population (X or Y) in a fixed effects multivariate model. Pooling of the X and Y populations should not be performed without adjustment.<br /> A second problem with randomization exists, although the dexamethasone arm is the least affected. Randomization arms have been added in this adaptive trial. When a new randomization arm is added, new patients may be randomized to this arm and fewer patients are randomized to the other arms. Consequently, the distribution of dates of inclusion may differ between groups. This may have some impact on mortality at two levels: (1) the medical prescription of hospitalization may have evolved as the epidemic evolved, with hospitalization reserved for the most severe patients at the peak of the epidemic and maybe wider hospitalization criteria at the start of the epidemic, and (2) the patients included in the Recovery trial may have evolved. Indeed, even if centers should have included as many patients as possible as soon as the inclusion criteria were met, it is possible that they included only part of the eligible patients and that this part evolved with time. 
This bias can be easily assessed and fixed: the curves of inclusions in the different arms and mortality rate in the Recovery trial can be drawn as a function of date (from March to June) and an adjustment on date of inclusion may be performed in a sensitivity analysis.
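
The suggestion above (compare A to B within each population, then combine) can be illustrated with a small Python sketch. It uses inverse-variance pooling of per-population log odds ratios as a simple stand-in for the adjusted fixed effects model mentioned in the text. All counts are hypothetical and were chosen only so that arm B has no effect within either population while the allocation ratio and baseline mortality differ between populations; none of them come from the Recovery trial.

```python
import math


def log_odds_ratio(deaths_b, n_b, deaths_a, n_a):
    """Log odds ratio of death (arm B versus arm A) and its Woolf variance."""
    a, b = deaths_b, n_b - deaths_b   # arm B: deaths, survivors
    c, d = deaths_a, n_a - deaths_a   # arm A: deaths, survivors
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def pooled_odds_ratio(strata):
    """Fixed-effect (inverse-variance) pooling of within-stratum log odds ratios."""
    weighted_sum = total_weight = 0.0
    for stratum in strata:
        estimate, variance = log_odds_ratio(*stratum)
        weighted_sum += estimate / variance
        total_weight += 1 / variance
    return math.exp(weighted_sum / total_weight)


def crude_odds_ratio(strata):
    """Odds ratio obtained by naively adding the strata together."""
    totals = [sum(column) for column in zip(*strata)]
    estimate, _ = log_odds_ratio(*totals)
    return math.exp(estimate)


# HYPOTHETICAL data: (deaths_B, n_B, deaths_A, n_A) per population.
population_x = (20, 200, 40, 400)   # less severe, 10% mortality in both arms
population_y = (60, 200, 60, 200)   # more severe, 30% mortality in both arms
strata = [population_x, population_y]

print(f"stratified odds ratio: {pooled_odds_ratio(strata):.2f}")
print(f"crude odds ratio:      {crude_odds_ratio(strata):.2f}")
```

With these made-up numbers, the stratified estimate recovers the absence of effect, whereas adding the two populations together produces an apparent harm of arm B that is purely an artifact of the differing case mix.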

      **Conclusion**

      Recovery is the study with the best methodology that we have seen on COVID-19 treatments with curative intent, and we salute the initiative of transparently publishing the protocol, its amendments, the statistical analysis plan and the first draft of the report. We hope that our reporting suggestions will be taken into account in the final version of the paper. We think that discussing these points will qualify the interpretation of the results, further improve the transparent approach adopted by the designers of the study and improve the reliability of the conclusions. We expect high-quality reporting of the final results, with full transparency on interim analyses, statistical analysis plans and statistical analysis reports. We hope that these comments are helpful, and again we acknowledge that this study is not only outstanding in terms of the importance of its results but is also a stellar example for the whole field of therapeutic research. We invite other researchers to comment on this article and to engage in Open Science.

    1. On 2022-01-14 00:43:08, user disqus_mV149tuM7g wrote:

      I am not a medical professional, but a common-sense confounding variable immediately came to mind, one that this study (like most others) did not control for. I understand it may not have been possible to control for it here given the data collection method, but I am baffled that, as far as I can see, nobody seems to have considered this confounder and no study that I know of has attempted to control for it:

      A) Do we not know that omicron is more similar to the common cold than delta is? B) Do we not know that there is at least some common T cell protection across different coronaviruses, such that even T cells produced from a common cold give at least some protection against covid?

      So then, without any further medical knowledge, the common-sense confounding variable that comes to mind through basic inferential logic is this: if A and B are true, could it be that, given the timing of omicron (which arrived in early winter) compared to delta (which arrived in summer), many more people had recently had a common cold before omicron than before delta? Also, fewer people abided by restrictions in Fall 2021 compared to Spring 2021. So couldn't this partially be the reason why "omicron" is milder than delta? Of course, that would mean that "omicron in those who had a common cold recently" is milder than delta, NOT that "omicron" is milder than delta. Do you see how dangerous it is (for people who have not had a common cold in a long time, especially if unvaccinated) to claim that "omicron" is milder than delta? Again, I don't know whether all of this is true, but I certainly think it warrants a closer look.

      Another confounding variable I can think of (though I am less certain of this one, I don't think it hurts to put it out there): I remember that early studies in 2020 showed viral load was associated with illness severity, and that those who wore masks tended to have less severe illness. Assuming those studies were correct, could it be that because omicron is more transmissible, more people are getting infected with omicron at a low viral load compared to delta? For example, maybe more people caught delta through droplet spread, resulting in a higher viral load, while more people who wear surgical masks catch omicron from, say, being in a small store where enough aerosols pass through the mask, resulting in lower viral loads overall for omicron infections. Has this been controlled for? I have yet to see any study that controlled for it.

    1. On 2021-04-10 18:48:39, user Daniel Haake wrote:

      Regarding version 6 of your study, I pointed out in my comment which statistical problems arise from your study design and lead to an overestimation of the calculated IFR (cf. https://www.medrxiv.org/content/10.1101/2020.07.23.20160895v6?versioned=true#disqus_thread). Thank you very much for your reply to my statement. I think that an exchange is important, because this is the only way to get reasonable results. Therefore, please do not regard my comments as criticism, but as suggestions for improvement on how to achieve correct values. Since my statement is still valid for version 7, I am responding to your reply here, under version 7.


      Re: Re: The time of the determination of the death figures

      Here you seem to have misunderstood me. I meant that, with your example wave of infections and a study starting shortly after the peak of the wave, many people will not yet have formed antibodies by the time the study starts. By choosing the time of death as you did, you capture 95% of the deaths, but only a much smaller proportion of those infected. This leads to an underestimated denominator and thus an overestimated IFR.

      Just because it was also done that way in the Geneva seroprevalence study does not automatically mean it is correct. There are also many studies in which the study date was used for the number of deaths. For example:

      https://www.who.int/bulleti...<br /> https://www.medrxiv.org/con... <br /> https://www.medrxiv.org/con...

      However, I agree with you that the Santa Clara County study should be taken with a grain of salt, as the subjects were recruited via a Facebook ad and bias may therefore have occurred. As I said, I understand the idea of taking a later date for the number of deaths. However, the associated problems regarding the underestimation of the infected, which I wrote about in the previous answer, still remain.

      It is still incomprehensible that you calculate a difference of 22-24 days, but then take a value 28 days after the study midpoint. This puts you 4-6 days beyond your own calculation and thus automatically increases the IFR. Why do you elaborately calculate the difference of 22-24 days to determine the correct time, but then not use that value? Let me give another example. Let's say we are testing at the peak of an infection wave. We then count all the deaths that occurred up to a certain later time, but we do not take into account that a large number of people still became infected after the study. Some of the counted dead will also have become infected after the study. Then we have recorded all the dead, but not all the infected. Or do you want to say that all the dead are from the first half of the infection wave and none from the second half (which would imply an IFR of 0% for the second half of the wave)? As you can see, it is problematic to take the number of deaths from a much later point in time, because you then choose the denominator of the quotient too small and arrive at an IFR that is too high.

      In general, only deceased persons who were clearly infected before the latest time at which study participants could have become infected should be included. This is not the time of the study, since antibody tests only become positive some time after an infection.
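
The timing argument above can be made concrete with a small arithmetic sketch. All numbers are hypothetical and chosen only to illustrate the direction of the bias: if the death count captures nearly all deaths while the serosurvey captures only part of the infections, the quotient is too large.

```python
def ifr(deaths, infections):
    """Infection fatality rate = deaths / infections."""
    return deaths / infections

# Hypothetical numbers, for illustration only.
true_infections = 10_000
true_deaths = 50
share_infections_detected = 0.60   # infected people who already have antibodies
share_deaths_counted = 0.95        # deaths captured by the chosen cut-off date

true_ifr = ifr(true_deaths, true_infections)
estimated_ifr = ifr(true_deaths * share_deaths_counted,
                    true_infections * share_infections_detected)

print(f"true IFR      = {true_ifr:.2%}")
print(f"estimated IFR = {estimated_ifr:.2%}")
```

The estimate is inflated by the factor `share_deaths_counted / share_infections_detected`, so it only matches the true IFR when deaths and infections are captured to the same degree.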


      Re: Re: PCR tests from countries with tracing programs

      Is it really "PCR testing per confirmed case", not "PCR testing per capita", that is the important parameter? Let us consider two example scenarios. First, assume that we test every resident and that 1% of the population is currently in a state where the PCR test is positive. Then we know everyone's current status, but we would get only 1 positive result per 100 tests performed. This study would then be excluded because its ratio of tests per positive case is too low, even though we tested everyone. Now consider the opposite case. We test in a country where we do not know exactly how many people are infected in which places. We test in one region and assume that the result is transferable to the whole country. But this region is actually less affected than other regions; we just don't know it. We do 10,000 tests and find 20 infected people. That gives a ratio of 1 positive test per 500 tests performed. This study would then be included in your selection, even though the share of infected people in the country is actually higher. Therefore, "per confirmed case" is not the important parameter. If there is a high number of cases in a country, you could test everyone two or three times and know the situation very well, and the study would still be excluded. At the same time, studies with few tests, and thus high statistical uncertainty, can be included for the reasons mentioned earlier.

      The comparison with South Korea is also problematic. 0 or 1 seropositive results are far too few to have any statistical significance. The statistical uncertainty here is simply too high. And, as already mentioned, the results of these investigations cannot be transferred across the board to the other investigations.

      Including reported case numbers from countries that you consider to have a well-functioning tracing system leads to an overestimation of the IFR.


      Re: Re: Study selection

      I can understand that you screen out studies based on recruitment. I think that is statistically correct. I also see the danger that, depending on recruitment, you cannot get representative results. Therefore, it is also understandable that you want to see which studies are useful and which are not.<br /> Nevertheless, you sort out only the studies that calculate a low IFR and leave the studies with high values in your analysis. This leads to a shift toward the high values. Furthermore, studies that deviate upward are more problematic because a larger shift is possible in that direction. Let's say there is a hypothetical virus with an actual IFR of 0.5%. Then we have a study with a value of 0.3% and a study with 1.5%. The high value in particular is further away from the actual value and thus shifts the calculated value upward. If you have an actual IFR of 0.5%, you can in theory misestimate by a maximum of 0.5 percentage points on the downside and by 99.5 percentage points on the upside. This is also not surprising, because such distributions are right-skewed. If I remove both the study with the too-low value and the study with the too-high value, the calculated value is not biased in either direction. If I remove only the study with the too-low value, the calculated value shifts upward, because a stronger shift is possible in this direction. This leads to an overestimation of the IFR.
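
The selection effect described above can be illustrated with a minimal sketch. The five study estimates are hypothetical, built around the 0.3% and 1.5% values from the example, and a simple unweighted mean stands in for whatever pooling method is actually used.

```python
from statistics import fmean

# Hypothetical study estimates in percent; the true IFR is assumed to be 0.5%.
estimates = [0.3, 0.4, 0.5, 0.6, 1.5]
ordered = sorted(estimates)

all_studies = fmean(ordered)            # every study kept
without_low = fmean(ordered[1:])        # only the lowest study removed
without_both = fmean(ordered[1:-1])     # lowest and highest study removed

print(f"all studies:         {all_studies:.2f}%")
print(f"lowest removed:      {without_low:.2f}%")
print(f"both tails removed:  {without_both:.2f}%")
```

Removing only the low outlier moves the pooled value further from the assumed true IFR, while trimming both tails brings it back, because the high outlier sits much further from the center than the low one can.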


      Re: Re: Adjustment of death rates for Europe due to excess mortality

      You write in your reply that this is not relevant because reported deaths were used and not excess mortality. In Appendix Q you write: <br /> "For example, the Belgian study used in our metaregression computed age-specific IFRs using seroprevalence findings in conjunction with data on excess mortality in Belgium". You may not have applied this to other studies. However, you are using a study that did. Accordingly, this is crucial and has an impact on your result.


      Re: Re: Calculation of the IFR of influenza

      You nevertheless calculate an age-specific IFR for COVID-19 and then calculate what the IFR would be if infections were equally distributed across age groups, which in fact they are not. At the same time, you state an IFR for influenza which, as shown, you understate. Living conditions do not change within a short period of time, so the numbers remain comparable from year to year; it is therefore no problem to use the influenza IFR of several years. In doing so, you suggest that the numbers are comparable. But it is not possible to compare an IFR that assumes an equal distribution across age groups with an IFR that does not. Yet this is exactly what is being suggested. By the way, it was not only the media; it was also taken up by Dr. Drosten. The comparison is difficult for another reason as well. The COVID-19 IFR is compared with that of influenza, a disease for which we could already protect the vulnerable groups to some extent by vaccination and for which people may have gone through a past infection, which helps to fight the disease and can therefore lead to fewer problems. However, to be honest, one can of course argue that this is just the way the situation is. It is therefore understandable to me if one nevertheless makes such a comparison. But then it should assume either an equal distribution across the age structure for both viruses, or the actual distribution for both. By the way, there is another problem: an estimated IFR is being compared with a measured one.

      ---------------------------------------------------


      Additional comment

      With the studies to date, it is very difficult to estimate how high the IFR actually is, because all methods have problems. If you take antibody studies, the problem is that antibodies are not detectable in all infected people. If you take the reported case numbers, the problem is the number of unreported cases. How could one calculate a clean IFR? By regularly testing a certain proportion of the population as a representative group. For example, you could test 1 per thousand of the population every week and see whether they are positive for COVID-19. Then you look at how many people from the group of positives have died over time. Those deceased could then be autopsied by default to determine whether they died from or with COVID-19. In doing so, one must determine what period of time after infection still counts for classifying someone as a COVID-19 death. After all, is a person who died 10 months after infection still a COVID-19 death? It is mainly the elderly who are dying, and it is not atypical that they would have died over time even without infection. Now imagine that a 94-year-old dies 10 months after an infection. Can one then still say whether it was due to COVID-19? In this case, one would probably have to look at the medical history before and after COVID-19 and also see what symptoms the deceased had after the infection. Only with such a procedure is it possible to calculate a clean IFR. For a proper comparison with influenza, this procedure would also have to be used to calculate the IFR of influenza. If you are really interested in a scientifically sound comparison of the IFR, you should proceed in this way.

    1. 1 Introduction

Current AI ethics initiatives, especially when adopted in scientific institutes or companies, mostly embrace a principle-based approach (Mittelstadt, 2019). However, establishing principles alone does not suffice; they also must be convincingly put into practice. Most AI ethics guidelines do shy away from coming up with methods to accomplish this (Hagendorff, 2020). Nevertheless, recently more and more research papers appeared that describe steps on how to come “from what to how” (Eitel-Porter, 2020; Morley et al., 2020; Theodorou & Dignum, 2020; Vakkuri et al., 2019a). However, AI ethics still fails in certain regards. The reasons for that are manifold. This is why both in academia and public debates, many authors state that AI ethics has not permeated the AI industry yet, quite the contrary (Vakkuri et al., 2019b). Despite the mentioned reasons, this is due to current AI ethics discourses hardly taking considerations on moral psychology into account. They do not consider the limitations of the human mind, the many hidden psychological forces like powerful cognitive biases, blind spots and the like that can affect the likelihood of ethical or unethical behavior. In order to effectively improve moral decision making in the AI field and to live up to common ideals and expectations, AI ethics initiatives can seek inspiration from another ethical framework that is yet largely underrepresented in AI ethics, namely virtue ethics. Instead of focusing only on principles, AI ethics can put a stronger focus on virtues or, in other words, on character dispositions in AI practitioners in order to effectively put itself into practice. 
When using the term “AI practitioners” or “professionals”, this includes AI or machine learning researchers, research project supervisors, data scientists, industry engineers and developers, as well as managers and other domain experts.

Moreover, to bridge the gap between existing AI ethics initiatives and the requirements for their successful implementation, one should consider insights from moral psychology because, up to now, most parts of the AI ethics discourse disregard the psychological processes that limit the goals and effectiveness of ethics programs. This paper aims to respond to this gap in research. AI ethics, in order to be truly successful, should not only repeat bullet points from the numerous ethics codes (Jobin et al., 2019). It should also discuss the right dispositions and character strengths in AI practitioners that can help not only to identify ethical issues and to engender the motivation to take action, but also, and this is even more important, to discover and circumvent one’s own vulnerability to psychological forces affecting moral behavior. The purpose of this paper is to state how this can be executed and how AI ethics can choose a virtue-based approach in order to effectively put itself into practice.

2 AI Ethics: the Current Principled Approach

Current AI ethics programs often come with specific weaknesses and shortcomings. First and foremost, without being accompanied by binding legal norms, their normative principles lack reinforcement mechanisms (Rességuier & Rodrigues, 2020). Basically, deviations from codes of ethics have no or very minor consequences. Moreover, even when AI applications fulfill all ethical requirements stipulated, it does not necessarily mean that the application itself is “ethically approved” when used in the wrong contexts or when developed by organizations that follow unethical intentions (Hagendorff, 2021a; Lauer, 2020). In addition to that, ethics can be used for marketing purposes (Floridi, 2019; Wagner, 2018). 
Recent AI ethics initiatives of the private sector have faced a lot of criticism in this regard. In fact, industry efforts for ethical and fair AI are compared to past efforts of “Big Tobacco” to whitewash the image of smoking (Abdalla & Abdalla, 2020). “Big Tech”, so the argument goes, uses ethics initiatives and targeted research funds to avoid legislation or the creation of binding legal norms (Ochigame, 2019). Hence, avoiding or addressing criticism like that is paramount for trustworthy ethics initiatives.

The latest progress in AI ethics research was configured by a “practical turn”, which was among other things inspired by the conclusion that principles alone cannot guarantee ethical AI (Mittelstadt, 2019). To accomplish that, so the argument goes, principles must be put into practice. Recently, several frameworks were developed, describing the process “from what to how” (Hallensleben et al., 2020; Morley et al., 2020; Zicari, 2020). Basically, this implies considering the context dependency in the process of realizing codes of ethics, the different requirements for different stakeholders, as well as the demonstration of ways of dealing with conflicting principles or values, for instance in the case of fairness and accuracy (Whittlestone et al., 2019). Ultimately, however, the practical turn frameworks are often just more detailed codes of ethics that use more fine-grained concepts than the initial high-level guidelines. For instance, instead of just stressing the importance of privacy, like the first generation of comprehensive AI ethics guidelines did, they hint to the Privacy by Design or Privacy Impact Assessment toolkits (Cavoukian, 2011; Cavoukian et al., 2010; Oetzel & Spiekermann, 2014). 
Or instead of just stipulating principles for AI, they differentiate between stages of algorithmic development, namely business and use-case development; design phase, where the business or use case is translated into tangible requirements for AI practitioners; training and test data procurement; building of the AI application; testing the application; deployment of the application and monitoring of the application’s performance (Morley et al., 2020). Other frameworks (Dignum, 2018) are rougher and differentiate between ethics by design (integrating ethical decision routines in AI systems (Hagendorff, 2021c)), ethics in design (finding development methods that support the evaluation of ethical implications of AI systems (Floridi et al., 2018)) and ethics for design (ensuring integrity on the side of developers (Johnson, 2017)). But, as stated above, all frameworks still stick to the principled approach. The main transformation lies in the principles being far more nuanced and less abstract compared to the beginnings of AI ethics code initiatives (Future of Life Institute, 2017). Typologies for every stage of the AI development pipeline are available. Differentiating principles solves one problem, namely the problem of too much abstraction. At the same time, however, it leaves some other problems open. Speaking more broadly, current AI ethics disregards certain dimensions it should actually be having. In organizations of all kinds, the likelihood of unethical decisions or behavior can be controlled to a certain extent. Antecedents for unethical behavior are individual characteristics (gender, cognitive moral development, idealism, job satisfaction, etc.), moral issue characteristics (the concentration and probability of negative effects, the magnitude of consequences, the proximity of the issue, etc.) and organizational environment characteristics (a benevolent ethical climate, ethical culture, code existence, rule enforcement, etc.) (Kish-Gephart et al., 2010). 
With regard to AI ethics, these factors are only partially considered. Most parts of the discourse are focused on discussing organizational environment characteristics (codes of ethics) or moral issues characteristics (AI safety) (Brundage et al., 2018; Hagendorff, 2020, 2021b), but not individual characteristics (character dispositions) increasing the likelihood of ethical decision making in AI research and development.

Therefore, a successful ethics strategy should focus on individual dispositions and organizational structures alike, whereas the overarching goal of every measure should be the prevention of harm. Or, in this case: prevent AI-based applications from inflicting direct or indirect harm. This rationale can be fulfilled by ensuring explainability of algorithmic decision making, by mitigating biases and promoting fairness in machine learning, by fostering AI robustness and the like. However, beyond listing these issues, one must ask how AI practitioners can be taught to intuitively keep them in mind. This would mean to transition from a situation of an external “ethics assessment” of existing AI products with a “checkbox guideline” to an internal process of establishing “ethics for design”.

Empirical research shows that having plain knowledge on ethical topics or moral dilemmas is likely to have no measurable influence on decision making. Even ethics professionals, meaning ethics professors and other scholars of ethics, typically do not act more ethically than non-ethicists (Schwitzgebel, 2009; Schwitzgebel & Rust, 2014). Correspondingly, in the AI field, empirical research shows that ethical principles have no significant influence on technology developer’s decision making routines (McNamara et al., 2018). Ultimately, ethical principles do not suffice to secure prosocial ways to develop and use new technologies (Mittelstadt, 2019). Normative principles are not worth much if they are not acknowledged and adhered to. 
In order to actually acknowledge the importance of ethical considerations, certain character dispositions or virtues are required, among others, virtues that encourage us to stick to moral ideals and values.

3 Basic AI Virtues: the Foundation for Ethical Decision Making

Western virtue ethics has its roots in moral theories of Greek philosophers. However, after deontology and utilitarianism became more mainstream in modern philosophy, virtue ethics recently experienced a “comeback”. Roughly speaking, this comeback of scholarly interest in virtue ethics was initiated by Anscombe’s essay “Modern Moral Philosophy” (1958) but found prominent supporters and continued to grow by MacIntyre (1981), Nussbaum (1993), Hursthouse (2001) and many more. Virtue ethics also has a rich tradition in East and Southeast philosophy, especially in Confucian and Buddhist ethical theories (Keown, 1992; Tiwald, 2010). Virtue-based ethical theories treat character as fundamental to ethics, whereas deontology, arguably the most prevalent ethical theory, focusses on principles. But what are the differences between principles and virtues? The former is based on normative rules that are universally valid, the latter addresses the question of what constitutes a good person or character. While ethical principles equal obligations, virtues are ideals that AI practitioners can aspire to. Deontology-inspired normative principles focus on the action rather than the actor. Thus, principlism defines action-guiding principles, whereas virtue ethics demands the development of specific positive character dispositions or character strengths.

Why are these dispositions of importance for AI practitioners? One reason is that individuals, who display traits such as justice, honesty, empathy and the like, acquire (public) trust. Trust, in turn, makes it easier for people to cooperate and work together, it creates a sense of community and it makes social interactions more predictable (Schneier, 2012). 
Acquiring and maintaining the trust of other players in the AI field, but also the trust of the general public, can be a prerequisite for providing AI products and services. After all, intrinsically motivated actions are more trustworthy in comparison to those which are simply the product of extrinsically motivated rule following behavior (Meara et al., 1996).

One has to admit that a lot of ongoing AI basic research or very specific, small AI applications have such weak ethical implications that virtues or ethical values have no relevance at all. But AI applications that involve personal data, that are part of human–computer interaction or that are used on a grand scale clearly have ethical implications that can be addressed by virtue ethics. In the theoretical process of transitioning from an “uncultivated” to a morally habituated state, “technomoral virtues” like civility, courage, humility, magnanimity and others can be fostered and acquired (Vallor, 2016; Harris 2008a; Kohen et al., 2019; Gambelin, 2020; Sison et al., 2017; Neubert, 2017; Harris 2008b; Ratti & Stapleford, 2021). In philosophy, virtue ethics traditionally comprises cardinal virtues, namely fortitude, justice, prudence and moderation. Further, a list of six broad virtues that can be distilled from religious texts, oaths and other virtue inventories was put together by Peterson and Seligman (2004), whereas the virtues are wisdom, courage, humanity, justice, temperance and transcendence. Furthermore, in her famous book “Technology and the Virtues”, Vallor (2016, 2021) identified twelve technomoral virtues, namely honesty, self-control, humility, justice, courage, empathy, care, civility, flexibility, perspective, magnanimity and wisdom. The selection was criticized in secondary literature (Howard, 2018; Vallor, 2018) but remains arguably the most important virtue-based approach in ethics of technology. 
In the more specific context of AI applications, however, one has to sort out those virtues that are particularly important in the field of AI ethics. Here, existing literature and preliminary works are sparse (Constantinescu et al., 2021; Neubert & Montañez, 2020).

Based on patterns and regularities of the ongoing discussion on AI ethics, an ethics strategy that is based on virtues would constitute four basic AI virtues, where each virtue corresponds to a set of principles (see Table 1). The basic AI virtues are justice, honesty, responsibility and care. But how exactly can these virtues be derived from AI ethics principles? Why do exactly these four virtues suffice? When consulting meta-studies on AI ethics guidelines that stem from the sciences, industry, as well as governments (Fjeld et al., 2020; Hagendorff, 2020; Jobin et al., 2019), it becomes clear that AI ethics norms comprise a certain set of reoccurring principles. The mentioned meta-studies on AI ethics guidelines list these principles hierarchically, starting with the most frequently mentioned principles (fairness, transparency, accountability, etc.) and ending at principles that are mentioned rather seldom, but nevertheless repeatedly (sustainability, diversity, social cohesion etc.). When sifting through all these principles, one can, by using a reductionist approach and clustering them into groups, distill four basic virtues that cover all of them (see Fig. 1). The decisive question for the selection of the four basic AI virtues was: Does virtue A describe character dispositions that, when internalized by AI practitioners, will intrinsically motivate them to act in a way that “automatically” ensures or makes it more likely that the outcomes of their actions, among others, result in technological artefacts that meet the requirements that principle X specifies? Or, in short, does virtue A translate into behavior that is likely to result in an outcome that corresponds to the requirements of principle X? 
This question had to be applied for every principle that was derived from the meta-studies, testing by how many different virtues they can be covered. Ultimately, this process resulted in only four distinct virtues.

Table 1 List of basic AI virtues

Fig. 1 Using meta-studies on AI ethics guidelines as sources to distill four basic AI virtues

To name some examples: The principle of algorithmic fairness corresponds to the virtue of justice. A just person will “automatically” be motivated to contribute to machine outputs that do not discriminate against groups of people, independently of external factors and guideline rules. The principle of transparency, as a second example, corresponds to the virtue of honesty, because an honest person will “automatically” be inclined to be open about mistakes, to not hide technical shortcomings, to make research outcomes accessible and explainable. The principle of safe AI would be a third example. Here, the virtue of care will move professionals to act in a manner that they do not only acknowledge the importance of safety and harm avoidance, but also act accordingly. Ultimately, the transition happens between deontological rules, principles or universal norms on the one hand and virtues, intrinsic motives or character dispositions on the other hand. Nevertheless, both fields are connected by the same objective, namely to come up with trustworthy, human-centered, beneficial AI applications. Just the means to reach this objective are different.

As said before, the four basic AI virtues cover all common principles of AI ethics as described in prior discourses (Fjeld et al., 2020; Floridi et al., 2018; Hagendorff, 2020; Jobin et al., 2019; Morley et al., 2020). They are the precondition for putting principles into practice by representing different motivational settings for steering decision making processes in AI research and development in the right direction. 
But stipulating those four basic AI virtues is not enough. Tackling ethics problems in practice also needs second-order virtues that enable professionals to deal with “bounded ethicality”.

4 Second-Order AI Virtues: a Response to Bounded Ethicality

When using a simple ethical theory, one can assume that individuals go through three phases. First, individuals perceive that they are confronted with a moral decision they have to make. Secondly, they reflect on ethical principles and come up with a moral judgment. And finally, they act according to these judgments and therefore act morally. But individuals do not actually behave this way. In fact, moral judgments are in most cases not influenced by moral reasoning (Haidt, 2001). Moral judgments are done intuitively, and moral reasoning is used in hindsight to justify one’s initial reaction. In short, typically, moral action precedes moral judgment. This leads to consequences for AI ethics. It shows that parts of current ethics initiatives can be reduced to plain “justifications” for the status quo of technology development, or at least they are adapted to it. For instance, the most commonly stressed AI ethics principles are fairness, accountability, explainability, transparency, privacy and safety (Hagendorff, 2020). However, these are issues for which a lot of technical solutions already exist and where a lot of research is done anyhow. Hence, AI ethics initiatives are simply reaffirming existing practices. On a macro level, this stands in correspondence with the aforementioned fact that moral judgments do not determine, but rather follow or explain prior decision making processes.

Although explicit ethics training may improve AI practitioners’ intellectual understanding of ethics itself, there are many limitations restricting ethical decision making in practice, no matter how comprehensive one’s knowledge on ethical theories is. 
Many reasons for unethical behavior result from environmental influences on human behavior and limitations through bounded rationality or, to be more precise, “bounded ethicality” (Bazerman & Tenbrunsel, 2011; Tenbrunsel & Messick, 2004). Bounded ethicality is an umbrella term that is used in moral psychology to name environmental as well as intrapersonal factors that can thwart ethical decision making in practice. Hence, in order to address bounded ethicality, AI ethics programs are in need of specific virtues, namely virtues that help to “debias” ethical decision making in order to overcome bounded ethicality.

The first step to successively dissolve bounded ethicality is to inform AI practitioners not about the importance of machine biases, but about psychological biases as well as situational forces. Here, two second-order virtues come into play, namely prudence and fortitude (see Table 2). In Aristotelian virtue ethics, prudence (or phrónēsis) guides the enactment of individual virtues in unique moral situations, meaning that a person can intelligently express virtuous behavior (Aristotle et al., 2012). As a unifying intellectual virtue, prudence also gains center stage in modern virtue-based approaches to engineering ethics (Frigo et al., 2021). In this paper, prudence plays a similar role and is used in combination with another virtue, namely fortitude. While both virtues may help to overcome bounded ethicality, they are at the same time enablers for living up to the basic virtues. Individual psychological biases as well as situational forces can get in the way of acting justly, honestly, responsibly or caringly.
Prudence and fortitude are the answers to the many forces that may restrict basic AI virtues, where prudence aims primarily at individual factors, while fortitude addresses supra-individual issues that can impair ethical decision making in AI research and development.

Table 2 List of second-order AI virtues

In the following, a selection of some of the major factors of bounded ethicality that can be tackled by prudence shall be described. This selection is neither exhaustive nor does it go into much detail. However, it is meant to be a practical overview that can set the scene for more in-depth subsequent analyses.

Clearly, the most obvious factors of bounded ethicality are psychological biases (Cain & Detsky, 2008). It is common that people’s first and often only reaction to moral problems is emotional. Or, in other words, taking up dual-process theory, their reaction follows system 1 thinking (Kahneman, 2012; Tversky & Kahneman, 1974), meaning an intuitive, implicit, effortless, automatic mode of mental information processing. System 1 thinking predominates in everyday decisions. System 2, on the other hand, is a conscious, logical, less error-prone, but slow and effortful mode of thinking. Although many decision making routines would require system 2 thinking, individuals often lack the energy to switch from system 1 to system 2. Ethical decision making needs cognitive energy (Mead et al., 2009). This is why prudence is such an important virtue, since it helps AI practitioners to transition from system 1 to system 2 thinking in ethical problems. This is not to say that the dual-process theory is without criticism. Recently, cognitive scientists have challenged its validity (Grayot, 2020), even though they did not abandon it in toto. It still remains a scientifically sound heuristic in moral psychology.
Thus, system 2 thinking remains strikingly close to critical ethical thinking, although it obviously does not necessarily result in it (Bonnefon, 2018).

The transition from system 1 to system 2 thinking in ethical problems can also be useful for mitigating another powerful psychological force, namely implicit biases (Banaji & Greenwald, 2013), that can impair at least two basic AI virtues, namely justice and care. Individuals have implicit associations, also called “ordinary prejudices”, that lead them to classify, categorize and perceive their social surroundings in accordance with prejudices and stereotypes. This effect is so strong that even individuals who are absolutely sure that they are not hostile towards minority groups actually are exactly that. The reason for that lies in the fact that people succumb to subconscious biases that reflect culturally established stereotypes or discrimination patterns. Hence, unintentional discrimination cannot be unlearned without changing culture, the media, the extent of exposure to people from minorities and the like. Evidently, this task cannot be fulfilled by the AI sector. Nevertheless, implicit biases can be tackled by increasing workforce diversity in AI firms and by using prudence as a virtue to accept the irrefutable existence and problematic nature of implicit biases as well as their influence on justice in the first place.

Another important bias that can compromise basic AI virtues and that can at the same time be overcome by prudence is in-group favoritism (Efferson et al., 2008). This bias causes people to sympathize with others who share their culture, organization, gender, skin color, etc. For AI practitioners, this means that AI applications which have negative side-effects on outgroups, for instance the livelihoods of clickworkers in Southeast Asia (Graham et al., 2017), are rated as less ethically problematic than AI applications that would have similar consequences for in-groups.
Moreover, the current gender imbalance in the AI field might be prolonged by in-group favoritism in human resource management. In-group favoritism mainly stifles character dispositions like justice and care. Prudence, on the other hand, is apt to work against in-group favoritism by recognizing artificial group constructions as well as definitions of who counts as “we” and who as “others”, bolstering not only fair decision making, but also abilities to empathize with “distant” individuals.

One further and important effect of bounded ethicality that can impair the realization of the basic AI virtues is self-serving biases. These biases cause revisionist impulses in humans, helping to downplay or deny past unethical actions while remembering ethical ones, resulting in a self-concept that depicts oneself as ethical. When one asks individuals to rate how ethical they think they are on a scale of 0 to 100 relative to other individuals, the majority of them will give themselves a score of more than 50 (Epley & Dunning, 2000). The same holds true when people are asked to assess the organization they are a part of in relation to other organizations. Average scores are higher than 50, although actually the average score would have to be 50. What one can learn from this is that generally speaking, people overestimate their ethicality. Moreover, self-serving biases cause people to blame other people when things go wrong, but to view successes as being one’s own achievement. Others are to blame for ethical problems, depicting the problems as being outside of one’s own control. In the AI sector, self-serving biases can come into play when attributing errors or inaccuracies in applications to others, when reacting dismissively to critical feedback or feelings of concern, etc.
Moreover, not overcoming self-serving biases by prudence can mean acting unjustly and dishonestly, further compromising basic AI virtues.

Value-action gaps are another effect of bounded ethicality revealed by empirical studies in moral psychology (Godin et al., 2005; Jansen & Glinow, 1985). Value-action gaps are discrepancies between people’s self-concepts or moral values and their actual behavior. In short, the gaps mark the distance between what people say and what people do. Prudence, on the other hand, can help to identify that distance. In the AI field, value-action gaps can occur on an organizational level, for instance by using lots of ethics-related terms in corporate reports and press releases while actually being involved in unethical business practices, lawsuits, fraud, etc. (Loughran et al., 2009). The AI sector in particular is often accused of ethics-washing, hence of talking much about ethics, but not acting accordingly (Hao, 2019). Likewise, value-action gaps can occur on an individual level, for instance by holding AI safety or data security issues in high esteem while actually accepting improper quality assurance or rushed development and therefore provoking technical vulnerabilities in machine learning models. Akin to value-action gaps are behavioral forecasting errors (Diekmann et al., 2003). Here, people tend to believe that they will act ethically in a given situation X, while when situation X actually occurs, they do not behave accordingly (Woodzicka & LaFrance, 2001). They overestimate the extent to which they will indeed stick to their ideals and intentions. All these effects can interfere negatively with basic AI virtues, mostly with care, honesty and justice. This is why prudence with regard to value-action gaps is of great importance.

The concept of moral disengagement is another important factor in bounded ethical decision making (Bandura, 1999).
Techniques of moral disengagement allow individuals to selectively turn their moral concerns on and off. In many day-to-day decisions, people act contrary to their own ethical standards, but without feeling bad about it or having a guilty conscience. The main techniques in moral disengagement processes comprise justifications, where wrongdoing is justified as means to a higher end; changes in one’s definition about what is ethical; euphemistic labels, where individuals detach themselves from problematic action contexts by using linguistic distancing mechanisms; denial of being personally responsible for particular outcomes, where responsibility is attributed to a larger group of people; the use of comparisons, where one’s own wrongdoings are relativized by pointing at other contexts of wrongdoing; and the avoidance of certain information that refers to negative consequences of one’s own behavior. Again, prudence can help to identify cases of moral disengagement in the AI field and act as a response to it. Addressing moral disengagement with prudence can be a requirement to live up to all basic AI virtues.

In the following, a selection of some of the major factors of bounded ethicality that can be tackled by fortitude shall be described. Here, supra-individual issues that can impair ethical decision making in AI research and development are addressed. Certainly, one of the most relevant factors one has to discuss in this context is situational forces. Numerous empirical studies in moral psychology have shown that situational forces can have a massive impact on moral behavior (Isen & Levin, 1972; Latané & Darley, 1968; Williams & Bargh, 2008).
Situational forces can range from specific influences like the noise of a lawnmower that significantly affects helping behavior (Mathews & Canon, 1975) to more relevant factors like competitive orientations, time constraints, tiredness, stress, etc., which are likely to alter or override ethical concerns (Cave & ÓhÉigeartaigh, 2018; Darley & Batson, 1973; Kouchaki & Smith, 2014). Financial incentives in particular have a significant influence on ethical behavior. In environments that are structured by economic imperatives, decisions that clearly have an ethical dimension can be reframed as pure business decisions. All in all, money has manifold detrimental consequences for decision making since it leads to decisions that are proven to be less social, less ethical or less cooperative (Gino & Mogilner, 2014; Gino & Pierce, 2009; Kouchaki et al., 2013; Palazzo et al., 2012; Vohs et al., 2006). Ultimately, various finance law obligations or monetary factual constraints that a company’s management has to comply with can conflict with or override AI virtues. Especially in contexts like this, virtue ethics can significantly be pushed into the background, although the perceived constraints lead to immoral outcomes. In short, situational forces can have negative impacts on unfolding all four basic AI virtues, namely justice, honesty, responsibility and care. In general, critics of virtue ethics have pointed out that moral behavior is not determined by character traits, but by social contexts and concrete situations (Kupperman, 2001). However, situationist accounts are in fact entirely compatible with virtue ethics since it provides particular virtues like fortitude that are intended to counteract situational forces (and that can explain why some individuals deviate from expected behavior in classical psychological experiments like the Milgram experiment (Milgram, 1963)).
Fortitude is supposed to help to counteract situational pressure, allowing the mentioned basic virtues to flourish.

Similar to and often not clearly distinguishable from situational forces are peer influences (Asch, 1951, 1956). Individuals want to follow the crowd, adapt their behavior to that of their peers and act similarly to them. This is also called conformity bias. Conformity biases can become a problem for two reasons: First, group norms can possess unethical traits, leading for instance to a collective acceptance of harm. Second, the reliance on group norms and the associated effects of conformity bias induce a suppression of one’s own ethical judgments. In other words, if one individual starts to misbehave, for instance by cheating, others follow suit (Gino et al., 2009). A similar problem occurs with authorities (Milgram, 1963). Humans have an internal tendency for being obedient to authorities. This willingness to please authorities can have positive consequences when executives act ethically themselves. If this is not the case, the opposite becomes true. For AI ethics, this means that social norms that tacitly emerge from AI practitioners’ behavioral routines as well as managerial decisions can bolster ethical as well as unethical working cultures. In the case of the latter, the decisive factor is the way individuals respond to inner normative conflicts with their surroundings. Do they act in conformity and obedience even if it means violating basic AI virtues? Or do they stick to their dispositions and deviate from detrimental social norms or orders?
Fortitude, one of the two second-order virtues, can ensure the appropriate mental strength to stick to the right intentions and behavior, be it in cases where everyone disobeys a certain law but oneself does not want to join in, where managerial orders instruct to bring a risky product to the market as fast as possible but oneself insists on piloting it before release or where under extreme time pressure one insists on devoting time to understand and analyze training data sets.

5 Ethics Training: AI Virtues Come into Being

In traditional virtue ethics concepts, virtues emerge from habitual, repeated and gradually refined practice of right and prudent actions (Aristotle et al., 2012). At first, specific virtues are encouraged and practiced by performing acts that are inspired by “noble” human role-models and that resemble other patterns, narratives or social models of the virtue in question. Later, virtues are refined by taking the particularity of given situations into account. Regarding AI virtues, the procedure is not much different (Bezuidenhout & Ratti, 2021). However, cultivating basic and second-order AI virtues means achieving virtuous practice embedded in a specific organizational and cultural context. A virtuous practice requires some sort of moral self-cultivation that encompasses the acquisition of motivations or the will to take action, knowledge of ethical issues, skills to identify them and moral reasoning to make the right moral decisions (Johnson, 2017). One could reckon that especially the aforementioned skills or motivations are either innate or the result of childhood education. But ethical dispositions can be changed by education in all stages of life, for instance by powerful experiences, virtuous leaders or a certain work atmosphere in organizations. To put it in a nutshell, virtues can be trained and taught in order to foster ethical decision making and to overcome bounded ethicality.
Most importantly, if ethics training imparts only explicit knowledge (or ethical principles), this will very likely have no effect on behavior. Ethics training must also impart tacit knowledge, meaning skills of social perception and emotion that cause individuals to automatically feel and want the right thing in a given situation (Haidt, 2006, p. 160).

The simplest form of ethics programs comprises ethics training sessions combined with incentive schemes for members of a given organization that reward adherence to ethical principles and punish their violation. These ethics programs have numerous disadvantages. First, individuals that are part of them are likely to only seek to perform well on behavior covered by exactly these programs. Areas that are not covered are neglected. That way, ethics programs can even increase unethical behavior by actually well-intended sanctioning systems (Gneezy & Rustichini, 2000). For instance, in case a fine is put on a specific unethical behavior, individuals who benefit from this behavior might simply weigh the advantage of the unethical behavior against the disadvantage of the fine. If the former outweighs the latter, the unethical behavior might even increase if a sanctioning system is in place. Ethical decisions would simply be reframed as monetary decisions. In addition to that, individuals can become inclined to trick incentive schemes and reward systems. Moreover, those programs solely focus on extrinsic motivators and do not change intrinsic dispositions and moral attitudes. All in all, ethics programs that comprise simple reward and sanctioning systems, as well as corresponding surveillance and monitoring mechanisms, are very likely to fail.

A further risk of ethics programs or ethics training is reactance. Reactance occurs when individuals protest against constraints of their personal freedoms.
As soon as ethical principles restrict the freedom of AI practitioners doing their work, they might react to this restriction by trying to reclaim that very freedom by all means (Dillard & Shen, 2005; Dowd et al., 1991; Hong, 1992). People want to escape restrictions, thus the moment when such restrictions are put in place, no matter whether they are justified from an ethical perspective or not, people might start striving to break free from them. Ultimately, “forcing” ethics programs on members of an organization is not a good idea. Ethics programs should not be decoupled from the inner mechanisms and routines of an organization. Hence, in order to avoid reactance and to fit ethics programs into actual structures and routines of an organization, it makes sense to carefully craft specific, unique compliance measures that take particular decision processes of AI practitioners and managers into account. In addition to that, ethics programs can be implemented in organizations with delay. This has the effect of a “future lock-in” (Rogers & Bazerman, 2008), meaning that policies achieve more support, since the time delay allows for an elimination of the immediate costs of implementation, for individuals to prepare for the respective measures and for a recognition of their advantages.

Considering all of that, what measures can actually support AI practitioners and AI companies’ managers in strengthening AI virtues? Here, again, insights from moral psychology as well as behavioral ethics research can be used (Hines et al., 1987; Kollmuss & Agyeman, 2002; Treviño et al., 2006, 2014) to catalogue measures that bolster ethical decision making as well as virtue acquisition (see Tables 3 and 4). The measures can be roughly divided into those that tend to affect single individuals and those that bring about or relate to structural changes in organizations. Table 3 lists measures that relate to AI professionals on an individual level.
    1. Virtue Ethics

First published Fri Jul 18, 2003; substantive revision Tue Oct 11, 2022

Virtue ethics is currently one of three major approaches in normative ethics. It may, initially, be identified as the one that emphasizes the virtues, or moral character, in contrast to the approach that emphasizes duties or rules (deontology) or that emphasizes the consequences of actions (consequentialism). Suppose it is obvious that someone in need should be helped. A utilitarian will point to the fact that the consequences of doing so will maximize well-being, a deontologist to the fact that, in doing so the agent will be acting in accordance with a moral rule such as “Do unto others as you would be done by” and a virtue ethicist to the fact that helping the person would be charitable or benevolent.

This is not to say that only virtue ethicists attend to virtues, any more than it is to say that only consequentialists attend to consequences or only deontologists to rules. Each of the above-mentioned approaches can make room for virtues, consequences, and rules. Indeed, any plausible normative ethical theory will have something to say about all three. What distinguishes virtue ethics from consequentialism or deontology is the centrality of virtue within the theory (Watson 1990; Kawall 2009). Whereas consequentialists will define virtues as traits that yield good consequences and deontologists will define them as traits possessed by those who reliably fulfil their duties, virtue ethicists will resist the attempt to define virtues in terms of some other concept that is taken to be more fundamental. Rather, virtues and vices will be foundational for virtue ethical theories and other normative notions will be grounded in them.

We begin by discussing two concepts that are central to all forms of virtue ethics, namely, virtue and practical wisdom.
Then we note some of the features that distinguish different virtue ethical theories from one another before turning to objections that have been raised against virtue ethics and responses offered on its behalf. We conclude with a look at some of the directions in which future research might develop.

1. Preliminaries
1.1 Virtue
1.2 Practical Wisdom
2. Forms of Virtue Ethics
2.1 Eudaimonist Virtue Ethics
2.2 Agent-Based and Exemplarist Virtue Ethics
2.3 Target-Centered Virtue Ethics
2.4 Platonistic Virtue Ethics
3. Objections to virtue ethics
4. Future Directions
Bibliography

1. Preliminaries

In the West, virtue ethics’ founding fathers are Plato and Aristotle, and in the East it can be traced back to Mencius and Confucius. It persisted as the dominant approach in Western moral philosophy until at least the Enlightenment, suffered a momentary eclipse during the nineteenth century, but re-emerged in Anglo-American philosophy in the late 1950s. It was heralded by Anscombe’s famous article “Modern Moral Philosophy” (Anscombe 1958) which crystallized an increasing dissatisfaction with the forms of deontology and utilitarianism then prevailing. Neither of them, at that time, paid attention to a number of topics that had always figured in the virtue ethics tradition: virtues and vices, motives and moral character, moral education, moral wisdom or discernment, friendship and family relationships, a deep concept of happiness, the role of the emotions in our moral life and the fundamentally important questions of what sorts of persons we should be and how we should live.

Its re-emergence had an invigorating effect on the other two approaches, many of whose proponents then began to address these topics in the terms of their favoured theory.
(One consequence of this has been that it is now necessary to distinguish “virtue ethics” (the third approach) from “virtue theory”, a term which includes accounts of virtue within the other approaches.) Interest in Kant’s virtue theory has redirected philosophers’ attention to Kant’s long neglected Doctrine of Virtue, and utilitarians have developed consequentialist virtue theories (Driver 2001; Hurka 2001). It has also generated virtue ethical readings of philosophers other than Plato and Aristotle, such as Martineau, Hume and Nietzsche, and thereby different forms of virtue ethics have developed (Slote 2001; Swanton 2003, 2011a).

Although modern virtue ethics does not have to take a “neo-Aristotelian” or eudaimonist form (see section 2), almost any modern version still shows that its roots are in ancient Greek philosophy by the employment of three concepts derived from it. These are arête (excellence or virtue), phronesis (practical or moral wisdom) and eudaimonia (usually translated as happiness or flourishing). (See Annas 2011 for a short, clear, and authoritative account of all three.) We discuss the first two in the remainder of this section. Eudaimonia is discussed in connection with eudaimonist versions of virtue ethics in the next.

1.1 Virtue

A virtue is an excellent trait of character. It is a disposition, well entrenched in its possessor (something that, as we say, goes all the way down, unlike a habit such as being a tea-drinker), to notice, expect, value, feel, desire, choose, act, and react in certain characteristic ways. To possess a virtue is to be a certain sort of person with a certain complex mindset. A significant aspect of this mindset is the wholehearted acceptance of a distinctive range of considerations as reasons for action. An honest person cannot be identified simply as one who, for example, practices honest dealing and does not cheat.
If such actions are done merely because the agent thinks that honesty is the best policy, or because they fear being caught out, rather than through recognising “To do otherwise would be dishonest” as the relevant reason, they are not the actions of an honest person. An honest person cannot be identified simply as one who, for example, tells the truth because it is the truth, for one can have the virtue of honesty without being tactless or indiscreet. The honest person recognises “That would be a lie” as a strong (though perhaps not overriding) reason for not making certain statements in certain circumstances, and gives due, but not overriding, weight to “That would be the truth” as a reason for making them. An honest person’s reasons and choices with respect to honest and dishonest actions reflect her views about honesty, truth, and deception—but of course such views manifest themselves with respect to other actions, and to emotional reactions as well. Valuing honesty as she does, she chooses, where possible to work with honest people, to have honest friends, to bring up her children to be honest. She disapproves of, dislikes, deplores dishonesty, is not amused by certain tales of chicanery, despises or pities those who succeed through deception rather than thinking they have been clever, is unsurprised, or pleased (as appropriate) when honesty triumphs, is shocked or distressed when those near and dear to her do what is dishonest and so on. Given that a virtue is such a multi-track disposition, it would obviously be reckless to attribute one to an agent on the basis of a single observed action or even a series of similar actions, especially if you don’t know the agent’s reasons for doing as she did (Sreenivasan 2002). Possessing a virtue is a matter of degree. To possess such a disposition fully is to possess full or perfect virtue, which is rare, and there are a number of ways of falling short of this ideal (Athanassoulis 2000). 
Most people who can truly be described as fairly virtuous, and certainly markedly better than those who can truly be described as dishonest, self-centred and greedy, still have their blind spots—little areas where they do not act for the reasons one would expect. So someone honest or kind in most situations, and notably so in demanding ones, may nevertheless be trivially tainted by snobbery, inclined to be disingenuous about their forebears and less than kind to strangers with the wrong accent. Further, it is not easy to get one’s emotions in harmony with one’s rational recognition of certain reasons for action. I may be honest enough to recognise that I must own up to a mistake because it would be dishonest not to do so without my acceptance being so wholehearted that I can own up easily, with no inner conflict. Following (and adapting) Aristotle, virtue ethicists draw a distinction between full or perfect virtue and “continence”, or strength of will. The fully virtuous do what they should without a struggle against contrary desires; the continent have to control a desire or temptation to do otherwise. Describing the continent as “falling short” of perfect virtue appears to go against the intuition that there is something particularly admirable about people who manage to act well when it is especially hard for them to do so, but the plausibility of this depends on exactly what “makes it hard” (Foot 1978: 11–14). If it is the circumstances in which the agent acts—say that she is very poor when she sees someone drop a full purse or that she is in deep grief when someone visits seeking help—then indeed it is particularly admirable of her to restore the purse or give the help when it is hard for her to do so. But if what makes it hard is an imperfection in her character—the temptation to keep what is not hers, or a callous indifference to the suffering of others—then it is not. 
1.2 Practical Wisdom

Another way in which one can easily fall short of full virtue is through lacking phronesis (moral or practical wisdom).

The concept of a virtue is the concept of something that makes its possessor good: a virtuous person is a morally good, excellent or admirable person who acts and feels as she should. These are commonly accepted truisms. But it is equally common, in relation to particular (putative) examples of virtues to give these truisms up. We may say of someone that he is generous or honest “to a fault”. It is commonly asserted that someone’s compassion might lead them to act wrongly, to tell a lie they should not have told, for example, in their desire to prevent someone else’s hurt feelings. It is also said that courage, in a desperado, enables him to do far more wicked things than he would have been able to do if he were timid. So it would appear that generosity, honesty, compassion and courage despite being virtues, are sometimes faults. Someone who is generous, honest, compassionate, and courageous might not be a morally good person; or, if it is still held to be a truism that they are, then morally good people may be led by what makes them morally good to act wrongly! How have we arrived at such an odd conclusion?

The answer lies in too ready an acceptance of ordinary usage, which permits a fairly wide-ranging application of many of the virtue terms, combined, perhaps, with a modern readiness to suppose that the virtuous agent is motivated by emotion or inclination, not by rational choice.
If one thinks of generosity or honesty as the disposition to be moved to action by generous or honest impulses such as the desire to give or to speak the truth, if one thinks of compassion as the disposition to be moved by the sufferings of others and to act on that emotion, if one thinks of courage as mere fearlessness or the willingness to face danger, then it will indeed seem obvious that these are all dispositions that can lead to their possessor’s acting wrongly. But it is also obvious, as soon as it is stated, that these are dispositions that can be possessed by children, and although children thus endowed (bar the “courageous” disposition) would undoubtedly be very nice children, we would not say that they were morally virtuous or admirable people. The ordinary usage, or the reliance on motivation by inclination, gives us what Aristotle calls “natural virtue”—a proto version of full virtue awaiting perfection by phronesis or practical wisdom. Aristotle makes a number of specific remarks about phronesis that are the subject of much scholarly debate, but the (related) modern concept is best understood by thinking of what the virtuous morally mature adult has that nice children, including nice adolescents, lack. Both the virtuous adult and the nice child have good intentions, but the child is much more prone to mess things up because he is ignorant of what he needs to know in order to do what he intends. A virtuous adult is not, of course, infallible and may also, on occasion, fail to do what she intended to do through lack of knowledge, but only on those occasions on which the lack of knowledge is not culpable. So, for example, children and adolescents often harm those they intend to benefit either because they do not know how to set about securing the benefit or because their understanding of what is beneficial and harmful is limited and often mistaken. Such ignorance in small children is rarely, if ever culpable. 
Adults, on the other hand, are culpable if they mess things up by being thoughtless, insensitive, reckless, impulsive, shortsighted, and by assuming that what suits them will suit everyone instead of taking a more objective viewpoint. They are also culpable if their understanding of what is beneficial and harmful is mistaken. It is part of practical wisdom to know how to secure real benefits effectively; those who have practical wisdom will not make the mistake of concealing the hurtful truth from the person who really needs to know it in the belief that they are benefiting him. Quite generally, given that good intentions are intentions to act well or “do the right thing”, we may say that practical wisdom is the knowledge or understanding that enables its possessor, unlike the nice adolescents, to do just that, in any given situation. The detailed specification of what is involved in such knowledge or understanding has not yet appeared in the literature, but some aspects of it are becoming well known. Even many deontologists now stress the point that their action-guiding rules cannot, reliably, be applied without practical wisdom, because correct application requires situational appreciation—the capacity to recognise, in any particular situation, those features of it that are morally salient. This brings out two aspects of practical wisdom. One is that it characteristically comes only with experience of life. Amongst the morally relevant features of a situation may be the likely consequences, for the people involved, of a certain action, and this is something that adolescents are notoriously clueless about precisely because they are inexperienced. It is part of practical wisdom to be wise about human beings and human life. (It should go without saying that the virtuous are mindful of the consequences of possible actions. How could they fail to be reckless, thoughtless and short-sighted if they were not?) 
The second is the practically wise agent’s capacity to recognise some features of a situation as more important than others, or indeed, in that situation, as the only relevant ones. The wise do not see things in the same way as the nice adolescents who, with their under-developed virtues, still tend to see the personally disadvantageous nature of a certain action as competing in importance with its honesty or benevolence or justice. These aspects coalesce in the description of the practically wise as those who understand what is truly worthwhile, truly important, and thereby truly advantageous in life, who know, in short, how to live well.

2. Forms of Virtue Ethics

While all forms of virtue ethics agree that virtue is central and practical wisdom required, they differ in how they combine these and other concepts to illuminate what we should do in particular contexts and how we should live our lives as a whole. In what follows we sketch four distinct forms taken by contemporary virtue ethics, namely, a) eudaimonist virtue ethics, b) agent-based and exemplarist virtue ethics, c) target-centered virtue ethics, and d) Platonistic virtue ethics.

2.1 Eudaimonist Virtue Ethics

The distinctive feature of eudaimonist versions of virtue ethics is that they define virtues in terms of their relationship to eudaimonia. A virtue is a trait that contributes to or is a constituent of eudaimonia and we ought to develop virtues, the eudaimonist claims, precisely because they contribute to eudaimonia. The concept of eudaimonia, a key term in ancient Greek moral philosophy, is standardly translated as “happiness” or “flourishing” and occasionally as “well-being.” Each translation has its disadvantages. The trouble with “flourishing” is that animals and even plants can flourish but eudaimonia is possible only for rational beings. The trouble with “happiness” is that in ordinary conversation it connotes something subjectively determined.
It is for me, not for you, to pronounce on whether I am happy. If I think I am happy then I am—it is not something I can be wrong about (barring advanced cases of self-deception). Contrast my being healthy or flourishing. Here we have no difficulty in recognizing that I might think I was healthy, either physically or psychologically, or think that I was flourishing but be wrong. In this respect, “flourishing” is a better translation than “happiness”. It is all too easy to be mistaken about whether one’s life is eudaimon (the adjective from eudaimonia) not simply because it is easy to deceive oneself, but because it is easy to have a mistaken conception of eudaimonia, or of what it is to live well as a human being, believing it to consist largely in physical pleasure or luxury for example. Eudaimonia is, avowedly, a moralized or value-laden concept of happiness, something like “true” or “real” happiness or “the sort of happiness worth seeking or having.” It is thereby the sort of concept about which there can be substantial disagreement between people with different views about human life that cannot be resolved by appeal to some external standard on which, despite their different views, the parties to the disagreement concur (Hursthouse 1999: 188–189). Most versions of virtue ethics agree that living a life in accordance with virtue is necessary for eudaimonia. This supreme good is not conceived of as an independently defined state (made up of, say, a list of non-moral goods that does not include virtuous activity) which exercise of the virtues might be thought to promote. It is, within virtue ethics, already conceived of as something of which virtuous activity is at least partially constitutive (Kraut 1989). Thereby virtue ethicists claim that a human life devoted to physical pleasure or the acquisition of wealth is not eudaimon, but a wasted life. 
But although all standard versions of virtue ethics insist on that conceptual link between virtue and eudaimonia, further links are matters of dispute and generate different versions. For Aristotle, virtue is necessary but not sufficient—what is also needed are external goods which are a matter of luck. For Plato and the Stoics, virtue is both necessary and sufficient for eudaimonia (Annas 1993). According to eudaimonist virtue ethics, the good life is the eudaimon life, and the virtues are what enable a human being to be eudaimon because the virtues just are those character traits that benefit their possessor in that way, barring bad luck. So there is a link between eudaimonia and what confers virtue status on a character trait. (For a discussion of the differences between eudaimonists see Baril 2014. For recent defenses of eudaimonism see Annas 2011; LeBar 2013b; Badhwar 2014; and Bloomfield 2014.)

2.2 Agent-Based and Exemplarist Virtue Ethics

Rather than deriving the normativity of virtue from the value of eudaimonia, agent-based virtue ethicists argue that other forms of normativity—including the value of eudaimonia—are traced back to and ultimately explained in terms of the motivational and dispositional qualities of agents. It is unclear how many other forms of normativity must be explained in terms of the qualities of agents in order for a theory to count as agent-based. The two best-known agent-based theorists, Michael Slote and Linda Zagzebski, trace a wide range of normative qualities back to the qualities of agents. For example, Slote defines rightness and wrongness in terms of agents’ motivations: “[A]gent-based virtue ethics … understands rightness in terms of good motivations and wrongness in terms of the having of bad (or insufficiently good) motives” (2001: 14).
Similarly, he explains the goodness of an action, the value of eudaimonia, the justice of a law or social institution, and the normativity of practical rationality in terms of the motivational and dispositional qualities of agents (2001: 99–100, 154, 2000). Zagzebski likewise defines right and wrong actions by reference to the emotions, motives, and dispositions of virtuous and vicious agents. For example, “A wrong act = an act that the phronimos characteristically would not do, and he would feel guilty if he did = an act such that it is not the case that he might do it = an act that expresses a vice = an act that is against a requirement of virtue (the virtuous self)” (Zagzebski 2004: 160). Her definitions of duties, good and bad ends, and good and bad states of affairs are similarly grounded in the motivational and dispositional states of exemplary agents (1998, 2004, 2010). However, there could also be less ambitious agent-based approaches to virtue ethics (see Slote 1997). At the very least, an agent-based approach must be committed to explaining what one should do by reference to the motivational and dispositional states of agents. But this is not yet a sufficient condition for counting as an agent-based approach, since the same condition will be met by every virtue ethical account. For a theory to count as an agent-based form of virtue ethics it must also be the case that the normative properties of motivations and dispositions cannot be explained in terms of the normative properties of something else (such as eudaimonia or states of affairs) which is taken to be more fundamental. Beyond this basic commitment, there is room for agent-based theories to be developed in a number of different directions. The most important distinguishing factor has to do with how motivations and dispositions are taken to matter for the purposes of explaining other normative qualities. For Slote what matters are this particular agent’s actual motives and dispositions. 
The goodness of action A, for example, is derived from the agent’s motives when she performs A. If those motives are good then the action is good, if not then not. On Zagzebski’s account, by contrast, a good or bad, right or wrong action is defined not by this agent’s actual motives but rather by whether this is the sort of action a virtuously motivated agent would perform (Zagzebski 2004: 160). Appeal to the virtuous agent’s hypothetical motives and dispositions enables Zagzebski to distinguish between performing the right action and doing so for the right reasons (a distinction that, as Brady (2004) observes, Slote has trouble drawing). Another point on which agent-based forms of virtue ethics might differ concerns how one identifies virtuous motivations and dispositions. According to Zagzebski’s exemplarist account, “We do not have criteria for goodness in advance of identifying the exemplars of goodness” (Zagzebski 2004: 41). As we observe the people around us, we find ourselves wanting to be like some of them (in at least some respects) and not wanting to be like others. The former provide us with positive exemplars and the latter with negative ones. Our understanding of better and worse motivations and virtuous and vicious dispositions is grounded in these primitive responses to exemplars (2004: 53). This is not to say that every time we act we stop and ask ourselves what one of our exemplars would do in this situation. Our moral concepts become more refined over time as we encounter a wider variety of exemplars and begin to draw systematic connections between them, noting what they have in common, how they differ, and which of these commonalities and differences matter, morally speaking. Recognizable motivational profiles emerge and come to be labeled as virtues or vices, and these, in turn, shape our understanding of the obligations we have and the ends we should pursue.
However, even though the systematising of moral thought can travel a long way from our starting point, according to the exemplarist it never reaches a stage where reference to exemplars is replaced by the recognition of something more fundamental. At the end of the day, according to the exemplarist, our moral system still rests on our basic propensity to take a liking (or disliking) to exemplars. Nevertheless, one could be an agent-based theorist without advancing the exemplarist’s account of the origins or reference conditions for judgments of good and bad, virtuous and vicious.

2.3 Target-Centered Virtue Ethics

The touchstone for eudaimonist virtue ethicists is a flourishing human life. For agent-based virtue ethicists it is an exemplary agent’s motivations. The target-centered view developed by Christine Swanton (2003), by contrast, begins with our existing conceptions of the virtues. We already have a passable idea of which traits are virtues and what they involve. Of course, this untutored understanding can be clarified and improved, and it is one of the tasks of the virtue ethicist to help us do precisely that. But rather than stripping things back to something as basic as the motivations we want to imitate or building it up to something as elaborate as an entire flourishing life, the target-centered view begins where most ethics students find themselves, namely, with the idea that generosity, courage, self-discipline, compassion, and the like get a tick of approval. It then examines what these traits involve. A complete account of virtue will map out 1) its field, 2) its mode of responsiveness, 3) its basis of moral acknowledgment, and 4) its target. Different virtues are concerned with different fields. Courage, for example, is concerned with what might harm us, whereas generosity is concerned with the sharing of time, talent, and property. The basis of acknowledgment of a virtue is the feature within the virtue’s field to which it responds.
To continue with our previous examples, generosity is attentive to the benefits that others might enjoy through one’s agency, and courage responds to threats to value, status, or the bonds that exist between oneself and particular others, and the fear such threats might generate. A virtue’s mode has to do with how it responds to the bases of acknowledgment within its field. Generosity promotes a good, namely, another’s benefit, whereas courage defends a value, bond, or status. Finally, a virtue’s target is that at which it is aimed. Courage aims to control fear and handle danger, while generosity aims to share time, talents, or possessions with others in ways that benefit them. A virtue, on a target-centered account, “is a disposition to respond to, or acknowledge, items within its field or fields in an excellent or good enough way” (Swanton 2003: 19). A virtuous act is an act that hits the target of a virtue, which is to say that it succeeds in responding to items in its field in the specified way (233). Providing a target-centered definition of a right action requires us to move beyond the analysis of a single virtue and the actions that follow from it. This is because a single action context may involve a number of different, overlapping fields. Determination might lead me to persist in trying to complete a difficult task even if doing so requires a singleness of purpose. But love for my family might make a different use of my time and attention. In order to define right action a target-centered view must explain how we handle different virtues’ conflicting claims on our resources. There are at least three different ways to address this challenge. A perfectionist target-centered account would stipulate, “An act is right if and only if it is overall virtuous, and that entails that it is the, or a, best action possible in the circumstances” (239–240). 
A more permissive target-centered account would not identify ‘right’ with ‘best’, but would allow an action to count as right provided “it is good enough even if not the (or a) best action” (240). A minimalist target-centered account would not even require an action to be good in order to be right. On such a view, “An act is right if and only if it is not overall vicious” (240). (For further discussion of target-centered virtue ethics see Van Zyl 2014 and Smith 2016.)

2.4 Platonistic Virtue Ethics

The fourth form a virtue ethic might adopt takes its inspiration from Plato. The Socrates of Plato’s dialogues devotes a great deal of time to asking his fellow Athenians to explain the nature of virtues like justice, courage, piety, and wisdom. So it is clear that Plato counts as a virtue theorist. But it is a matter of some debate whether he should be read as a virtue ethicist (White 2015). What is not open to debate is whether Plato has had an important influence on the contemporary revival of interest in virtue ethics. A number of those who have contributed to the revival have done so as Plato scholars (e.g., Prior 1991; Kamtekar 1998; Annas 1999; and Reshotko 2006). However, often they have ended up championing a eudaimonist version of virtue ethics (see Prior 2001 and Annas 2011), rather than a version that would warrant a separate classification. Nevertheless, there are two variants that call for distinct treatment. Timothy Chappell takes the defining feature of Platonistic virtue ethics to be that “Good agency in the truest and fullest sense presupposes the contemplation of the Form of the Good” (2014). Chappell follows Iris Murdoch in arguing that “In the moral life the enemy is the fat relentless ego” (Murdoch 1971: 51). Constantly attending to our needs, our desires, our passions, and our thoughts skews our perspective on what the world is actually like and blinds us to the goods around us.
Contemplating the goodness of something we encounter—which is to say, carefully attending to it “for its own sake, in order to understand it” (Chappell 2014: 300)—breaks this natural tendency by drawing our attention away from ourselves. Contemplating such goodness with regularity makes room for new habits of thought that focus more readily and more honestly on things other than the self. It alters the quality of our consciousness. And “anything which alters consciousness in the direction of unselfishness, objectivity, and realism is to be connected with virtue” (Murdoch 1971: 82). The virtues get defined, then, in terms of qualities that help one “pierce the veil of selfish consciousness and join the world as it really is” (91). And good agency is defined by the possession and exercise of such virtues. Within Chappell’s and Murdoch’s framework, then, not all normative properties get defined in terms of virtue. Goodness, in particular, is not so defined. But the kind of goodness which is possible for creatures like us is defined by virtue, and any answer to the question of what one should do or how one should live will appeal to the virtues. Another Platonistic variant of virtue ethics is exemplified by Robert Merrihew Adams. Unlike Murdoch and Chappell, his starting point is not a set of claims about our consciousness of goodness. Rather, he begins with an account of the metaphysics of goodness. Like Murdoch and others influenced by Platonism, Adams’s account of goodness is built around a conception of a supremely perfect good. And like Augustine, Adams takes that perfect good to be God. God is both the exemplification and the source of all goodness. Other things are good, he suggests, to the extent that they resemble God (Adams 1999). The resemblance requirement identifies a necessary condition for being good, but it does not yet give us a sufficient condition. 
This is because there are ways in which finite creatures might resemble God that would not be suitable to the type of creature they are. For example, if God were all-knowing, then the belief, “I am all-knowing,” would be a suitable belief for God to have. In God, such a belief—because true—would be part of God’s perfection. However, as neither you nor I are all-knowing, the belief, “I am all-knowing,” in one of us would not be good. To rule out such cases we need to introduce another factor. That factor is the fitting response to goodness, which Adams suggests is love. Adams uses love to weed out problematic resemblances: “being excellent in the way that a finite thing can be consists in resembling God in a way that could serve God as a reason for loving the thing” (Adams 1999: 36). Virtues come into the account as one of the ways in which some things (namely, persons) could resemble God. “[M]ost of the excellences that are most important to us, and of whose value we are most confident, are excellences of persons or of qualities or actions or works or lives or stories of persons” (1999: 42). This is one of the reasons Adams offers for conceiving of the ideal of perfection as a personal God, rather than an impersonal form of the Good. Many of the excellences of persons of which we are most confident are virtues such as love, wisdom, justice, patience, and generosity. And within many theistic traditions, including Adams’s own Christian tradition, such virtues are commonly attributed to divine agents. A Platonistic account like the one Adams puts forward in Finite and Infinite Goods clearly does not derive all other normative properties from the virtues (for a discussion of the relationship between this view and the one he puts forward in A Theory of Virtue (2006) see Pettigrove 2014). Goodness provides the normative foundation. 
Virtues are not built on that foundation; rather, as one of the varieties of goodness of whose value we are most confident, virtues form part of the foundation. Obligations, by contrast, come into the account at a different level. Moral obligations, Adams argues, are determined by the expectations and demands that “arise in a relationship or system of relationships that is good or valuable” (1999: 244). Other things being equal, the more virtuous the parties to the relationship, the more binding the obligation. Thus, within Adams’s account, the good (which includes virtue) is prior to the right. However, once good relationships have given rise to obligations, those obligations take on a life of their own. Their bindingness is not traced directly to considerations of goodness. Rather, they are determined by the expectations of the parties and the demands of the relationship.

3. Objections to virtue ethics

A number of objections have been raised against virtue ethics, some of which bear more directly on one form of virtue ethics than on others. In this section we consider eight objections, namely, the a) application, b) adequacy, c) relativism, d) conflict, e) self-effacement, f) justification, g) egoism, and h) situationist problems. (a) In the early days of virtue ethics’ revival, the approach was associated with an “anti-codifiability” thesis about ethics, directed against the prevailing pretensions of normative theory. At the time, utilitarians and deontologists commonly (though not universally) held that the task of ethical theory was to come up with a code consisting of universal rules or principles (possibly only one, as in the case of act-utilitarianism) which would have two significant features: i) the rule(s) would amount to a decision procedure for determining what the right action was in any particular case; ii) the rule(s) would be stated in such terms that any non-virtuous person could understand and apply it (them) correctly.
Virtue ethicists maintained, contrary to these two claims, that it was quite unrealistic to imagine that there could be such a code (see, in particular, McDowell 1979). The results of attempts to produce and employ such a code, in the heady days of the 1960s and 1970s, when medical and then bioethics boomed and bloomed, tended to support the virtue ethicists’ claim. More and more utilitarians and deontologists found themselves agreed on their general rules but on opposite sides of the controversial moral issues in contemporary discussion. It came to be recognised that moral sensitivity, perception, imagination, and judgement informed by experience—phronesis in short—is needed to apply rules or principles correctly. Hence many (though by no means all) utilitarians and deontologists have explicitly abandoned (ii) and much less emphasis is placed on (i). Nevertheless, the complaint that virtue ethics does not produce codifiable principles is still a commonly voiced criticism of the approach, expressed as the objection that it is, in principle, unable to provide action-guidance. Initially, the objection was based on a misunderstanding. Blinkered by slogans that described virtue ethics as “concerned with Being rather than Doing,” as addressing “What sort of person should I be?” but not “What should I do?” as being “agent-centered rather than act-centered,” its critics maintained that it was unable to provide action-guidance. Hence, rather than being a normative rival to utilitarian and deontological ethics, it could claim to be no more than a valuable supplement to them. 
The rather odd idea was that all virtue ethics could offer was, “Identify a moral exemplar and do what he would do,” as though the university student trying to decide whether to study music (her preference) or engineering (her parents’ preference) was supposed to ask herself, “What would Socrates study if he were in my circumstances?” But the objection failed to take note of Anscombe’s hint that a great deal of specific action guidance could be found in rules employing the virtue and vice terms (“v-rules”) such as “Do what is honest/charitable; do not do what is dishonest/uncharitable” (Hursthouse 1999). (It is a noteworthy feature of our virtue and vice vocabulary that, although our list of generally recognised virtue terms is comparatively short, our list of vice terms is remarkably, and usefully, long, far exceeding anything that anyone who thinks in terms of standard deontological rules has ever come up with. Much invaluable action guidance comes from avoiding courses of action that would be irresponsible, feckless, lazy, inconsiderate, uncooperative, harsh, intolerant, selfish, mercenary, indiscreet, tactless, arrogant, unsympathetic, cold, incautious, unenterprising, pusillanimous, feeble, presumptuous, rude, hypocritical, self-indulgent, materialistic, grasping, short-sighted, vindictive, calculating, ungrateful, grudging, brutal, profligate, disloyal, and on and on.) (b) A closely related objection has to do with whether virtue ethics can provide an adequate account of right action. This worry can take two forms. (i) One might think a virtue ethical account of right action is extensionally inadequate. It is possible to perform a right action without being virtuous and a virtuous person can occasionally perform the wrong action without that calling her virtue into question. 
If virtue is neither necessary nor sufficient for right action, one might wonder whether the relationship between rightness/wrongness and virtue/vice is close enough for the former to be identified in terms of the latter. (ii) Alternatively, even if one thought it possible to produce a virtue ethical account that picked out all (and only) right actions, one might still think that at least in some cases virtue is not what explains rightness (Adams 2006: 6–8). Some virtue ethicists respond to the adequacy objection by rejecting the assumption that virtue ethics ought to be in the business of providing an account of right action in the first place. Following in the footsteps of Anscombe (1958) and MacIntyre (1985), Talbot Brewer (2009) argues that to work with the categories of rightness and wrongness is already to get off on the wrong foot. Contemporary conceptions of right and wrong action, built as they are around a notion of moral duty that presupposes a framework of divine (or moral) law or around a conception of obligation that is defined in contrast to self-interest, carry baggage the virtue ethicist is better off without. Virtue ethics can address the questions of how one should live, what kind of person one should become, and even what one should do without that committing it to providing an account of ‘right action’. One might choose, instead, to work with aretaic concepts (defined in terms of virtues and vices) and axiological concepts (defined in terms of good and bad, better and worse) and leave out deontic notions (like right/wrong action, duty, and obligation) altogether. Other virtue ethicists wish to retain the concept of right action but note that in the current philosophical discussion a number of distinct qualities march under that banner. In some contexts, ‘right action’ identifies the best action an agent might perform in the circumstances. In others, it designates an action that is commendable (even if not the best possible).
In still others, it picks out actions that are not blameworthy (even if not commendable). A virtue ethicist might choose to define one of these—for example, the best action—in terms of virtues and vices, but appeal to other normative concepts—such as legitimate expectations—when defining other conceptions of right action. As we observed in section 2, a virtue ethical account need not attempt to reduce all other normative concepts to virtues and vices. What is required is simply (i) that virtue is not reduced to some other normative concept that is taken to be more fundamental and (ii) that some other normative concepts are explained in terms of virtue and vice. This takes the sting out of the adequacy objection, which is most compelling against versions of virtue ethics that attempt to define all of the senses of ‘right action’ in terms of virtues. Appealing to virtues and vices makes it much easier to achieve extensional adequacy. Making room for normative concepts that are not taken to be reducible to virtue and vice concepts makes it even easier to generate a theory that is both extensionally and explanatorily adequate. Whether one needs other concepts and, if so, how many, is still a matter of debate among virtue ethicists, as is the question of whether virtue ethics even ought to be offering an account of right action. Either way virtue ethicists have resources available to them to address the adequacy objection. Insofar as the different versions of virtue ethics all retain an emphasis on the virtues, they are open to the familiar problem of (c) the charge of cultural relativity. Is it not the case that different cultures embody different virtues (MacIntyre 1985), and hence that the v-rules will pick out actions as right or wrong only relative to a particular culture? Different replies have been made to this charge. One—the tu quoque, or “partners in crime” response—exhibits a quite familiar pattern in virtue ethicists’ defensive strategy (Solomon 1988).
They admit that, for them, cultural relativism is a challenge, but point out that it is just as much a problem for the other two approaches. The (putative) cultural variation in character traits regarded as virtues is no greater—indeed markedly less—than the cultural variation in rules of conduct, and different cultures have different ideas about what constitutes happiness or welfare. That cultural relativity should be a problem common to all three approaches is hardly surprising. It is related, after all, to the “justification problem” (see below), the quite general metaethical problem of justifying one’s moral beliefs to those who disagree, whether they be moral sceptics, pluralists or from another culture. A bolder strategy involves claiming that virtue ethics has less difficulty with cultural relativity than the other two approaches. Much cultural disagreement arises, it may be claimed, from local understandings of the virtues, but the virtues themselves are not relative to culture (Nussbaum 1993). Another objection to which the tu quoque response is partially appropriate is (d) “the conflict problem.” What does virtue ethics have to say about dilemmas—cases in which, apparently, the requirements of different virtues conflict because they point in opposed directions? Charity prompts me to kill the person who would be better off dead, but justice forbids it. Honesty points to telling the hurtful truth, kindness and compassion to remaining silent or even lying. What shall I do? Of course, the same sorts of dilemmas are generated by conflicts between deontological rules. Deontology and virtue ethics share the conflict problem (and are happy to take it on board rather than follow some of the utilitarians in their consequentialist resolutions of such dilemmas) and in fact their strategies for responding to it are parallel.
Both aim to resolve a number of dilemmas by arguing that the conflict is merely apparent; a discriminating understanding of the virtues or rules in question, possessed only by those with practical wisdom, will perceive that, in this particular case, the virtues do not make opposing demands or that one rule outranks another, or has a certain exception clause built into it. Whether this is all there is to it depends on whether there are any irresolvable dilemmas. If there are, proponents of either normative approach may point out reasonably that it could only be a mistake to offer a resolution of what is, ex hypothesi, irresolvable. Another problem arguably shared by all three approaches is (e), that of being self-effacing. An ethical theory is self-effacing if, roughly, whatever it claims justifies a particular action, or makes it right, had better not be the agent’s motive for doing it. Michael Stocker (1976) originally introduced it as a problem for deontology and consequentialism. He pointed out that the agent who, rightly, visits a friend in hospital will rather lessen the impact of his visit on her if he tells her either that he is doing it because it is his duty or because he thought it would maximize the general happiness. But as Simon Keller observes, she won’t be any better pleased if he tells her that he is visiting her because it is what a virtuous agent would do, so virtue ethics would appear to have the problem too (Keller 2007). However, virtue ethics’ defenders have argued that not all forms of virtue ethics are subject to this objection (Pettigrove 2011) and those that are are not seriously undermined by the problem (Martinez 2011). Another problem for virtue ethics, which is shared by both utilitarianism and deontology, is (f) “the justification problem.” Abstractly conceived, this is the problem of how we justify or ground our ethical beliefs, an issue that is hotly debated at the level of metaethics. 
In its particular versions, for deontology there is the question of how to justify its claims that certain moral rules are the correct ones, and for utilitarianism of how to justify its claim that all that really matters morally are consequences for happiness or well-being. For virtue ethics, the problem concerns the question of which character traits are the virtues. In the metaethical debate, there is widespread disagreement about the possibility of providing an external foundation for ethics—“external” in the sense of being external to ethical beliefs—and the same disagreement is found amongst deontologists and utilitarians. Some believe that their normative ethics can be placed on a secure basis, resistant to any form of scepticism, such as what anyone rationally desires, or would accept or agree on, regardless of their ethical outlook; others that it cannot. Virtue ethicists have eschewed any attempt to ground virtue ethics in an external foundation while continuing to maintain that their claims can be validated. Some follow a form of Rawls’s coherentist approach (Slote 2001; Swanton 2003); neo-Aristotelians a form of ethical naturalism. A misunderstanding of eudaimonia as an unmoralized concept leads some critics to suppose that the neo-Aristotelians are attempting to ground their claims in a scientific account of human nature and what counts, for a human being, as flourishing. Others assume that, if this is not what they are doing, they cannot be validating their claims that, for example, justice, charity, courage, and generosity are virtues. Either they are illegitimately helping themselves to Aristotle’s discredited natural teleology (Williams 1985) or producing mere rationalizations of their own personal or culturally inculcated values. But McDowell, Foot, MacIntyre and Hursthouse have all outlined versions of a third way between these two extremes. Eudaimonia in virtue ethics is indeed a moralized concept, but it is not only that. 
Claims about what constitutes flourishing for human beings no more float free of scientific facts about what human beings are like than ethological claims about what constitutes flourishing for elephants. In both cases, the truth of the claims depends in part on what kind of animal they are and what capacities, desires and interests the humans or elephants have. The best available science today (including evolutionary theory and psychology) supports rather than undermines the ancient Greek assumption that we are social animals, like elephants and wolves and unlike polar bears. No rationalizing explanation in terms of anything like a social contract is needed to explain why we choose to live together, subjugating our egoistic desires in order to secure the advantages of co-operation. Like other social animals, our natural impulses are not solely directed towards our own pleasures and preservation, but include altruistic and cooperative ones. This basic fact about us should make more comprehensible the claim that the virtues are at least partially constitutive of human flourishing and also undercut the objection that virtue ethics is, in some sense, egoistic. (g) The egoism objection has a number of sources. One is a simple confusion. Once it is understood that the fully virtuous agent characteristically does what she should without inner conflict, it is triumphantly asserted that “she is only doing what she wants to do and hence is being selfish.” So when the generous person gives gladly, as the generous are wont to do, it turns out she is not generous and unselfish after all, or at least not as generous as the one who greedily wants to hang on to everything she has but forces herself to give because she thinks she should! A related version ascribes bizarre reasons to the virtuous agent, unjustifiably assuming that she acts as she does because she believes that acting thus on this occasion will help her to achieve eudaimonia. 
But “the virtuous agent” is just “the agent with the virtues” and it is part of our ordinary understanding of the virtue terms that each carries with it its own typical range of reasons for acting. The virtuous agent acts as she does because she believes that someone’s suffering will be averted, or someone benefited, or the truth established, or a debt repaid, or … thereby. It is the exercise of the virtues during one’s life that is held to be at least partially constitutive of eudaimonia, and this is consistent with recognising that bad luck may land the virtuous agent in circumstances that require her to give up her life. Given the sorts of considerations that courageous, honest, loyal, charitable people wholeheartedly recognise as reasons for action, they may find themselves compelled to face danger for a worthwhile end, to speak out in someone’s defence, or refuse to reveal the names of their comrades, even when they know that this will inevitably lead to their execution, to share their last crust and face starvation. On the view that the exercise of the virtues is necessary but not sufficient for eudaimonia, such cases are described as those in which the virtuous agent sees that, as things have unfortunately turned out, eudaimonia is not possible for them (Foot 2001, 95). On the Stoical view that it is both necessary and sufficient, a eudaimon life is a life that has been successfully lived (where “success” of course is not to be understood in a materialistic way) and such people die knowing not only that they have made a success of their lives but that they have also brought their lives to a markedly successful completion. Either way, such heroic acts can hardly be regarded as egoistic. A lingering suggestion of egoism may be found in the misconceived distinction between so-called “self-regarding” and “other-regarding” virtues. 
Those who have been insulated from the ancient tradition tend to regard justice and benevolence as real virtues, which benefit others but not their possessor, and prudence, fortitude and providence (the virtue whose opposite is “improvidence” or being a spendthrift) as not real virtues at all because they benefit only their possessor. This is a mistake on two counts. Firstly, justice and benevolence do, in general, benefit their possessors, since without them eudaimonia is not possible. Secondly, given that we live together, as social animals, the “self-regarding” virtues do benefit others—those who lack them are a great drain on, and sometimes grief to, those who are close to them (as parents with improvident or imprudent adult offspring know only too well). The most recent objection (h) to virtue ethics claims that work in “situationist” social psychology shows that there are no such things as character traits and thereby no such things as virtues for virtue ethics to be about (Doris 1998; Harman 1999). In reply, some virtue ethicists have argued that the social psychologists’ studies are irrelevant to the multi-track disposition (see above) that a virtue is supposed to be (Sreenivasan 2002; Kamtekar 2004). Mindful of just how multi-track it is, they agree that it would be reckless in the extreme to ascribe a demanding virtue such as charity to people of whom they know no more than that they have exhibited conventional decency; this would indeed be “a fundamental attribution error.” Others have worked to develop alternative, empirically grounded conceptions of character traits (Snow 2010; Miller 2013 and 2014; however see Upton 2016 for objections to Miller). There have been other responses as well (summarized helpfully in Prinz 2009 and Miller 2014). 
Notable among these is a response by Adams (2006, echoing Merritt 2000) who steers a middle road between “no character traits at all” and the exacting standard of the Aristotelian conception of virtue which, because of its emphasis on phronesis, requires a high level of character integration. On his conception, character traits may be “frail and fragmentary” but still virtues, and not uncommon. But giving up the idea that practical wisdom is the heart of all the virtues, as Adams has to do, is a substantial sacrifice, as Russell (2009) and Kamtekar (2010) argue. Even though the “situationist challenge” has left traditional virtue ethicists unmoved, it has generated a healthy engagement with empirical psychological literature, which has also been fuelled by the growing literature on Foot’s Natural Goodness and, quite independently, an upsurge of interest in character education (see below).

4. Future Directions

Over the past thirty-five years most of those contributing to the revival of virtue ethics have worked within a neo-Aristotelian, eudaimonist framework. However, as noted in section 2, other forms of virtue ethics have begun to emerge. Theorists have begun to turn to philosophers like Hutcheson, Hume, Nietzsche, Martineau, and Heidegger for resources they might use to develop alternatives (see Russell 2006; Swanton 2013 and 2015; Taylor 2015; and Harcourt 2015). Others have turned their attention eastward, exploring Confucian, Buddhist, and Hindu traditions (Yu 2007; Slingerland 2011; Finnigan and Tanaka 2011; McRae 2012; Angle and Slote 2013; Davis 2014; Flanagan 2015; Perrett and Pettigrove 2015; and Sim 2015). These explorations promise to open up new avenues for the development of virtue ethics. Although virtue ethics has grown remarkably in the last thirty-five years, it is still very much in the minority, particularly in the area of applied ethics. 
Many editors of big textbook collections on “moral problems” or “applied ethics” now try to include articles representative of each of the three normative approaches but are often unable to find a virtue ethics article addressing a particular issue. This is sometimes, no doubt, because “the” issue has been set up as a deontological/utilitarian debate, but it is often simply because no virtue ethicist has yet written on the topic. However, the last decade has seen an increase in the amount of attention applied virtue ethics has received (Walker and Ivanhoe 2007; Hartman 2013; Austin 2014; Van Hooft 2014; and Annas 2015). This area can certainly be expected to grow in the future, and it looks as though applying virtue ethics in the field of environmental ethics may prove particularly fruitful (Sandler 2007; Hursthouse 2007, 2011; Zwolinski and Schmidtz 2013; Cafaro 2015). Whether virtue ethics can be expected to grow into “virtue politics”—i.e. to extend from moral philosophy into political philosophy—is not so clear. Gisela Striker (2006) has argued that Aristotle’s ethics cannot be understood adequately without attending to its place in his politics. That suggests that at least those virtue ethicists who take their inspiration from Aristotle should have resources to offer for the development of virtue politics. But, while Plato and Aristotle can be great inspirations as far as virtue ethics is concerned, neither, on the face of it, is an attractive source of insight where politics is concerned. However, recent work suggests that Aristotelian ideas can, after all, generate a satisfyingly liberal political philosophy (Nussbaum 2006; LeBar 2013a). Moreover, as noted above, virtue ethics does not have to be neo-Aristotelian. It may be that the virtue ethics of Hutcheson and Hume can be naturally extended into a modern political philosophy (Hursthouse 1990–91; Slote 1993). 
Following Plato and Aristotle, modern virtue ethics has always emphasised the importance of moral education, not as the inculcation of rules but as the training of character. There is now a growing movement towards virtues education, amongst both academics (Carr 1999; Athanassoulis 2014; Curren 2015) and teachers in the classroom. One exciting thing about research in this area is its engagement with other academic disciplines, including psychology, educational theory, and theology (see Cline 2015; and Snow 2015). Finally, one of the more productive developments of virtue ethics has come through the study of particular virtues and vices. There are now a number of careful studies of the cardinal virtues and capital vices (Pieper 1966; Taylor 2006; Curzer 2012; Timpe and Boyd 2014). Others have explored less widely discussed virtues or vices, such as civility, decency, truthfulness, ambition, and meekness (Calhoun 2000; Kekes 2002; Williams 2002; and Pettigrove 2007 and 2012). One of the questions these studies raise is “How many virtues are there?” A second is, “How are these virtues related to one another?” Some virtue ethicists have been happy to work on the assumption that there is no principled reason for limiting the number of virtues and plenty of reason for positing a plurality of them (Swanton 2003; Battaly 2015). Others have been concerned that such an open-handed approach to the virtues will make it difficult for virtue ethicists to come up with an adequate account of right action or deal with the conflict problem discussed above. Dan Russell has proposed cardinality and a version of the unity thesis as a solution to what he calls “the enumeration problem” (the problem of too many virtues). The apparent proliferation of virtues can be significantly reduced if we group virtues together with some being cardinal and others subordinate extensions of those cardinal virtues. 
Possible conflicts between the remaining virtues can then be managed if they are tied together in some way as part of a unified whole (Russell 2009). This highlights two important avenues for future research, one of which explores individual virtues and the other of which analyses how they might be related to one another.
    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      The authors describe the results of a single study designed to investigate the extent to which horizontal orientation energy plays a key role in supporting view-invariant face recognition. The authors collected behavioral data from adult observers who were asked to complete an old/new face matching task by learning broad-spectrum faces (not orientation filtered) during a familiarization phase and subsequently trying to label filtered faces as previously seen or novel at test. This data revealed a clear bias favoring the use of horizontal orientation energy across viewpoint changes in the target images. The authors then compared different ideal observer models (cross-correlations between target and probe stimuli) to examine how this profile might be reflected in the image-level appearance of their filtered images. This revealed that a model looking for the best matching face within a viewpoint differed substantially from human data, exhibiting a vertical orientation bias for extreme profiles. However, a model forced to match targets to probes at different viewing angles exhibited a consistent horizontal bias in much the same manner as human observers.

      Strengths:

      I think the question is an important one: The horizontal orientation bias is a great example of a low-level image property being linked to high-level recognition outcomes, and understanding the nature of that connection is important. I found the old/new task to be a straightforward task that was implemented ably and that has the benefit of being simple for participants to carry out and simple to analyze. I particularly appreciated that the authors chose to describe human data via a lower-dimensional model (their Gaussian fits to individual data) for further analysis. This was a nice way to express the nature of the tuning function, favoring horizontal orientation bias in a way that makes key parameters explicit. Broadly speaking, I also thought that the model comparison they include between the view-selective and view-tolerant models was a great next step. This analysis has the potential to reveal some good insights into how this bias emerges and ask fine-grained questions about the parameters in their model fits to the behavioral data.

      Weaknesses:

      I will start with what I think is the biggest difficulty I had with the paper. Much as I liked the model comparison analysis, I also don't quite know what to make of the view-tolerant model. As I understand the authors' description, the key feature of this model is that it does not get to compare the target and probe at the same yaw angle, but must instead pick a best match from candidates that are at different yaws. While it is interesting to see that this leads to a very different orientation profile, it also isn't obvious to me why such a comparison would be reflective of what the visual system is probably doing. I can see that the view-specific model is more or less assuming something like an exemplar representation of each face: You have the opportunity to compare a new image to a whole library of viewpoints, and presumably it isn't hard to start with some kind of first pass that identifies the best matching view first before trying to identify/match the individual in question. What I don't get about the view-tolerant model is that it seems almost like an anti-exemplar model: You specifically lack the best viewpoint in the library but have to make do with the other options. Again, this is sort of interesting and the very different behavior of the model is neat to discuss, but it doesn't seem easy to align with any theoretical perspective on face recognition. My thinking here is that it might be useful to consider an additional alternate model that doesn't specifically exclude the best-matching viewpoint, but perhaps condenses appearance across views into something like a prototype. I could even see an argument for something like the yaw-averages presented earlier in the manuscript as the basis for such a model, but this might be too much of a stretch. Overall, what I'd like to see is some kind of alternate model that incorporates the existence of the best-match viewpoint somehow, but without the explicit exemplar structure of the view-specific model.

      The design of the view-tolerant model aligned with the requirements of tolerant recognition and revealed the stimulus information that allows identity to be abstracted away from variations in face appearance. However, it did not incorporate the notion that this ability may depend on a prototype or summary representation of face identity built up through varied encounters (Burton, Jenkins and Schweinberger 2011, Jenkins, White et al. 2011, Burton 2013, Burton, Kramer et al. 2016, Menon, Kemp and White 2018).

      We agree with the Reviewer that the average of the different views of a face is a good proxy of its central tendency (i.e., stable identity properties; Figure 1). We thus followed their suggestion and included an additional model observer that compared specific views to full-spectrum view-averaged identities. The examination of the orientation tuning profile of this so-called view-average model observer confirmed the crucial contribution of horizontal identity cues to view-invariant recognition as the horizontal range best predicted the average summary of full-spectrum face appearances across views. This additional model observer is now presented in the Discussion and Supplementary files 2 and 3.
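The idea of comparing a single view to a view-averaged identity can be illustrated with a minimal sketch. It assumes that images are flattened lists of pixel values, that the average is a pixelwise mean across views, and that the match is scored with a Pearson correlation. The function names and the toy data are hypothetical, and the orientation filtering of the probe is not reproduced here, so this is not the implementation used in the study.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equally long pixel lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def view_average(views):
    """Pixelwise mean across all views of one identity."""
    return [sum(px) / len(views) for px in zip(*views)]


def best_match(probe, averages):
    """Return the identity whose view average correlates best with the probe."""
    scores = {name: pearson(probe, avg) for name, avg in averages.items()}
    return max(scores, key=scores.get), scores


# Toy data (assumption): 3 identities x 7 views, each view = identity core + view noise.
rng = random.Random(2)
identities = {}
for name in ("A", "B", "C"):
    core = [rng.random() for _ in range(256)]
    identities[name] = [[c + rng.gauss(0.0, 0.3) for c in core] for _ in range(7)]

averages = {name: view_average(views) for name, views in identities.items()}
probe = identities["B"][0]  # one specific view of identity B
winner, scores = best_match(probe, averages)
```

In this toy setting the probe is one of the views that entered the average, which mirrors the reviewer's request for a model that does not exclude the best-matching viewpoint.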

      Besides this larger issue, I would also like to see some more details about the nature of the cross-correlation that is the basis for this model comparison. I mostly think I get what is happening, but I think the authors could expand more on the nature of their noise model to make more explicit what is happening before these cross-correlations are taken. I infer that there is a noise-addition step to get them off the ceiling, but I felt that I had to read between the lines a bit to determine this.

      In the Methods section, we now provide detailed information about the addition of noise to model observer cross-correlations: ‘In a pilot phase, we measured the overall identification performance of each model. Initially, the view-selective model performed at ceiling, yielding a correlation of 1 since there was an exact target-probe match across all trials. To avoid ceiling effects and to keep model performance close to human levels (Supplementary File 2), we thus decreased the signal-to-noise ratio (SNR) of the target and probe images to .125 by combining each with distinct noise patterns (face RMS contrast: .01; noise RMS contrast: .08). Each trial (i.e. target-probe pairing) was iterated ten times with different random noise patterns.’
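The arithmetic behind the quoted SNR can be checked directly: a face RMS contrast of .01 divided by a noise RMS contrast of .08 gives .125. The sketch below illustrates the noise step under stated assumptions: RMS contrast is taken as the standard deviation of pixel values, the noise is Gaussian white noise, and the image is a random stand-in rather than a face. The function names are hypothetical and the quoted Methods text does not specify these details.

```python
import math
import random


def rms_contrast(pixels):
    """RMS contrast, taken here as the standard deviation of pixel values."""
    mean = sum(pixels) / len(pixels)
    return math.sqrt(sum((p - mean) ** 2 for p in pixels) / len(pixels))


def set_rms_contrast(pixels, target):
    """Rescale pixels around their mean so that their RMS contrast equals target."""
    mean = sum(pixels) / len(pixels)
    scale = target / rms_contrast(pixels)
    return [mean + (p - mean) * scale for p in pixels]


def add_noise(pixels, rng, face_rms=0.01, noise_rms=0.08):
    """Scale the image to face_rms and add an independent Gaussian noise pattern."""
    face = set_rms_contrast(pixels, face_rms)
    return [p + rng.gauss(0.0, noise_rms) for p in face]


def pearson(a, b):
    """Pearson correlation between two equally long pixel lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


rng = random.Random(0)
face = [rng.random() for _ in range(4096)]  # stand-in for a flattened face image

snr = 0.01 / 0.08                 # face RMS / noise RMS
clean_r = pearson(face, face)     # identical target and probe: correlation at ceiling

# Ten iterations per target-probe pairing, each with distinct noise patterns.
noisy_r = [pearson(add_noise(face, rng), add_noise(face, rng)) for _ in range(10)]
```

With identical target and probe images the noise-free correlation sits at ceiling, whereas the noisy correlations fall far below it, which is the effect the noise step is meant to achieve.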

      We also added a supplementary file with a graphic illustration of the d’ distributions of each model and of human observers: ‘Sensitivity d’ of the view-tolerant model was much lower than view-selective model and human sensitivity (Supplementary File 2), even without noise. The view-tolerant model therefore processed fully visible stimuli (SNR of 1). This decreased sensitivity in the view-tolerant compared to the view-selective model is expected, as none of the probes exactly matched the target at the pixel level due to viewpoint differences. In contrast to humans who rely on internally stored representations to match identity across views, the model observer lacks such internal representations and entirely relies on (less efficient) pixelwise comparisons.’

      Another thing that I think is worth considering and commenting on is the stimuli themselves and the extent to which this may limit the outcomes of their behavioral task. The use of the 3D laser-scanned faces has some obvious advantages, but also (I think) removes the possibility for pigmentation to contribute to recognition, removes the contribution of varying illumination and expression to appearance variability, and perhaps presents observers with more homogeneous faces than one typically has to worry about. I don't think these negate the current results, but I'd like the authors to expand on their discussion of these factors, particularly pigmentation. Naively, surface color and texture seem like they could offer diagnostic cues to identity that don't rely so critically on horizontal orientations, so removing these may mean that horizontal bias is particularly evident when face shape is the critical cue for recognition.

      Our stimuli were originally designed by Troje and Bulthoff (1996). These are 3D laser scans of white individuals aged between 20 and 40 years, posing with a neutral expression. Different views of the faces were shot under a fixed illumination. Ears and a small portion of the neck were visible while the hair region was removed. All face images had a normalized skin color and we further converted them to grayscale.

      While we agree that this stimulus set offers a restricted range of within- and between-identity variations compared to what is experienced in natural settings, we believe that the present findings generalize to more ecological viewing conditions. Indeed, past evidence showed that the recognition of face pictures shot under largely variable pose, age, expression, illumination, and hair style is tuned to the horizontal range of the face stimulus (Dakin and Watt 2009, Dumont, Roux-Sibilon and Goffaux 2024). In other words, our finding that view-tolerant identity recognition is mainly driven by horizontal face information would likely replicate with a more ecological stimulus set.

      Moreover, the skin color normalization and grayscale conversion, while limiting the range of face variability, did not eliminate the contribution of surface pigmentation in our study. It is thus unlikely that our findings exclusively reflect the orientation dependence of face shape processing. Pigmentation refers to all surface reflectance properties (Russell, Sinha et al. 2006), and hue (color) is only one of them. The grayscaled 3D laser-scanned faces used here contained natural variations in crucial surface cues such as skin albedo (i.e., how light or dark the surface appears) and texture (i.e., spatial variation in how light is reflected); they have actually been used to disentangle the roles of shape and surface cues in identity recognition (e.g., Troje and Bulthoff 1996, Vuong, Peissig et al. 2005, Russell, Sinha et al. 2006, Russell, Biederman et al. 2007, Jiang, Dricot et al. 2009). Moreover, a past study of ours demonstrated that the diagnosticity of the horizontal range of face information is not restricted to face shape cues; the specialized processing of face shape and surface both selectively rely on horizontal information (Dumont, Roux-Sibilon and Goffaux 2024).

      For these reasons, the present findings are unlikely to be fully determined by shape processing, and we expect them to generalize to more ecological stimulus sets. We discuss these aspects in the revised manuscript.

      Reviewer #2 (Public review):

      This study investigates the visual information that is used for the recognition of faces. This is an important question in vision research and is critical for social interactions more generally. The authors ask whether our ability to recognise faces, across different viewpoints, varies as a function of the orientation information available in the image. Consistent with previous findings from this group and others, they find that horizontally filtered faces were recognised better than vertically filtered faces. Next, they probe the mechanism underlying this pattern of data by designing two model observers. The first was optimised for faces at a specific viewpoint (view-selective). The second was generalised across viewpoints (view-tolerant). In contrast to the human data, the view-specific model shows that the information that is useful for identity judgements varies according to viewpoint. For example, frontal face identities are again optimally discriminated with horizontal orientation information, but profiles are optimally discriminated with more vertical orientation information. These findings show human face recognition is biased toward horizontal orientation information, even though this may be suboptimal for the recognition of profile views of the face.

      One issue in the design of this study was the lowering of the signal-to-noise ratio in the view-selective observer. This decision was taken to avoid ceiling effects. However, it is not clear how this affects the similarity with the human observers.

      In the Methods section, we now provide detailed information about the addition of noise to model observer cross-correlations: ‘In a pilot phase, we measured the overall identification performance of each model. Initially, the view-selective model performed at ceiling, yielding a correlation of 1 since there was an exact target-probe match across all trials. To avoid ceiling effects and to keep model performance close to human levels (Supplementary File 2), we thus decreased the signal-to-noise ratio (SNR) of the target and probe images to .125 by combining each with distinct noise patterns (face RMS contrast: .01; noise RMS contrast: .08). Each trial (i.e. target-probe pairing) was iterated ten times with different random noise patterns.’

      We also added a supplementary file with a graphic illustration of the d’ distributions of each model and of human observers.

      Another issue is the decision to normalise image energy across orientations and viewpoints. I can see the logic in wanting to control for these effects, but this does reflect natural variation in image properties. So, again, I wonder what the results would look like without this step.

      All stimuli were matched for luminance and contrast. It is crucial to normalize image energy across orientations as natural image energy is disproportionately distributed across orientations (e.g., Hansen, Essock et al. 2003). Images of faces cropped from their background as used here contain most of their energy in the horizontal range (Keil 2008, Keil 2009, Goffaux and Greenwood 2016). If not normalized after orientation filtering, such uneven distribution of energy would boost recognition performance in the horizontal range across views. Normalization was performed across our experimental conditions solely to prevent energy differences from explaining the influence of viewpoint on the orientation tuning profile.

      We were not aware of any systematic natural variations of energy across face views. To address this, we measured face average energy (i.e., RMS contrast) in the original stimulus set, i.e., before the application of any image processing or manipulation. Background pixels were excluded from these image analyses. Across yaws, we found energy to range between .11 and .14 on a 0 to 1 grayscale. This is moderate compared to the range of energy variations we measured across identities (from .08 to .18). This suggests that variations in energy across viewpoints are moderate compared to variations related to identity. It is unclear whether these observations are specific to our stimulus set or whether they are generalizable to faces we encounter in everyday life. They, however, indicate that RMS contrast did not substantially vary across views in the present study and suggest that RMS normalization is unlikely to have affected the influence of viewpoint on recognition performance.

      In the revised Methods section, we explicitly motivate energy normalization: ‘Images of faces cropped from their background as used here contain most of their energy in the horizontal range (Goffaux, 2019; Goffaux & Greenwood, 2016; Keil, 2009). Across yaws, we found face energy to range between .11 and .14 on a 0 to 1 grayscale, which is moderate compared to the range of face energy variations we measured across identities (from .08 to .18). To prevent energy from explaining our results, in all images, the luminance and RMS contrast of the face pixels were fixed to 0.55 and 0.15, respectively, and background pixels were uniformly set to 0.55. The percentage of clipped pixel values (below 0 or above 1) per image did not exceed 3%.’
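The normalization described in the quoted passage can be sketched as follows. The sketch assumes that RMS contrast is the standard deviation of face-pixel values on the 0 to 1 scale, that a boolean mask separates face from background pixels, and that the clipped percentage is computed over all pixels of the image. The function name, the mask, and the random stand-in image are hypothetical, and the quoted text does not confirm these choices.

```python
import math
import random


def normalize_face(pixels, face_mask, mean_lum=0.55, rms=0.15, background=0.55):
    """Fix face-pixel luminance and RMS contrast, set the background, and clip to [0, 1].

    Returns the normalized pixel list and the percentage of clipped pixels.
    """
    face_vals = [p for p, is_face in zip(pixels, face_mask) if is_face]
    mu = sum(face_vals) / len(face_vals)
    sd = math.sqrt(sum((v - mu) ** 2 for v in face_vals) / len(face_vals))

    out, clipped = [], 0
    for p, is_face in zip(pixels, face_mask):
        if not is_face:
            out.append(background)       # uniform background
            continue
        v = mean_lum + (p - mu) / sd * rms  # z-score, then rescale to the target values
        if v < 0.0 or v > 1.0:
            clipped += 1
            v = min(1.0, max(0.0, v))
        out.append(v)
    return out, 100.0 * clipped / len(pixels)


rng = random.Random(1)
image = [rng.random() for _ in range(1000)]   # stand-in for a flattened image
mask = [i % 4 != 0 for i in range(1000)]      # hypothetical face/background mask
normalized, pct_clipped = normalize_face(image, mask)

face_out = [v for v, is_face in zip(normalized, mask) if is_face]
face_mean = sum(face_out) / len(face_out)
face_rms = math.sqrt(sum((v - face_mean) ** 2 for v in face_out) / len(face_out))
```

With a mean of 0.55 and an RMS contrast of 0.15, a face pixel is clipped only if it lies more than 3 standard deviations above the mean or about 3.7 below it, which is consistent with the small clipped percentage reported.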

      Despite the bias toward horizontal orientations in human observers, there were some differences in the orientation preference at each viewpoint. For example, frontal faces were biased to horizontal (90 degrees), but other viewpoints had biases that were slightly off horizontal (e.g., right profile: 80 degrees, left profile: 100 degrees). This does seem to show that differences in statistical information at different viewpoints (more horizontal information for frontal and more vertical information for profile) do influence human perception. It would be good to reflect on this nuance in the data.

      Indeed, the human performance data indicate that while identity recognition remains tuned to horizontal information, the peak of the horizontal tuning shows some variation across viewpoints. We primarily focused on the first aspect because of its direct relevance to our research objective, but also discussed the second aspect: with yaw rotation, certain non-horizontal morphological features such as the jaw line or nose bridge may increasingly contribute to identity recognition, whereas at frontal or near-frontal views, features are mostly horizontally oriented (e.g., Keil 2008, Keil 2009). In the revised Discussion, we directly relate the modest fluctuations of peak location to yaw differences in face feature appearance.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Based on a discussion with the reviewers, we integrated the recommendations and reached a consensus on the eLife assessment. To move from a "solid" to a "compelling/convincing" strength-of-evidence rating, please address the reviewers' comments. Key points are to clarify and test the plausibility of the models (e.g., effects of different noise-addition steps, inclusion/exclusion of specific orientation channels in the view-dependent comparison, and alternative decision criteria), and to address or discuss the limitations of the stimulus set in capturing recognition under more naturalistic scenarios, for example, including texture cues.

      Reviewer #1 (Recommendations for the authors):

      I generally found the paper to be very well-written, so I have only a few minor comments here.

      (1) I didn't really follow why the estimation of the Gaussian functions described in the text was preferred over a simpler ML framework. Do these approaches differ that much? I see references to prior studies in which these were applied, so I can certainly go check these out, but I could see value in adding just a bit of text to briefly make the case that this is important.

      Employing a simpler linear framework, i.e. a linear model predicting d’ from the interaction between orientation and viewpoint, would result in an 8 (orientation) * 7 (viewpoint) design that is difficult to analyze. The interaction term would almost certainly reach significance but its interpretation would be limited. We would either have to rely on numerous local comparisons, which are not particularly informative for our research objectives (e.g., knowing whether d’ differs significantly between two adjacent orientations at a given viewpoint is of little relevance), or to use a polynomial contrast approach (testing the linear, quadratic, … up to the 7th order trends), which would also be difficult to interpret. For such complex, approximately Gaussian-shaped data, the highest-order polynomial trend would likely provide the best fit, but without offering meaningful insight.

      In contrast, a nonlinear approach appears more appropriate. The Gaussian model we used allows us to characterize the parameters of the tuning profile, namely, peak location, peak amplitude, standard deviation (or bandwidth) and base amplitude. These parameters are not merely statistical parameters. Rather, they are directly interpretable in cognitive/functional terms. The peak location corresponds to the orientation at which the Gaussian curve is centred, i.e. the preferred orientation band for identity recognition. The standard deviation represents the width of the curve, reflecting the strength or selectivity of the tuning. The base amplitude is the height of the Gaussian curve base, indicating the minimum level of sensitivity, typically found near vertical orientation. Finally, the peak amplitude refers to the height of the Gaussian curve relative to its baseline, that is, it captures the advantage of horizontal over vertical orientations.

      Moreover, the use of a nonlinear, Gaussian model is motivated by past work that showed that the Gaussian function fits the evolution of recognition performance as a function of orientation (Dakin and Watt 2009, Goffaux and Greenwood 2016). Orientation selectivity at primary stages of visual processing has also been modelled using Gaussian (or Difference of Gaussians; Ringach, Hawken and Shapley 2003).
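      The four parameters described above map onto a Gaussian tuning function as in the sketch below. It is an illustration under stated assumptions, not the authors' fitting procedure: orientation is in degrees with 90 as horizontal, the circular wrap-around of orientation is ignored, and the names `gaussian_tuning` and `sum_squared_error` are hypothetical. The parameter values in the example are made up.

```python
import math


def gaussian_tuning(orientation, peak_location, peak_amplitude, sd, base_amplitude):
    """Predicted sensitivity (d') at a given orientation.

    peak_location:  orientation at which the curve is centred (preferred band)
    peak_amplitude: height of the peak above the baseline
    sd:             width of the curve (tuning selectivity)
    base_amplitude: minimum sensitivity, far from the peak
    """
    z = (orientation - peak_location) / sd
    return base_amplitude + peak_amplitude * math.exp(-0.5 * z * z)


def sum_squared_error(orientations, sensitivities, params):
    """Misfit between observed sensitivities and the Gaussian prediction."""
    return sum(
        (d - gaussian_tuning(o, *params)) ** 2
        for o, d in zip(orientations, sensitivities)
    )


if __name__ == "__main__":
    # Eight orientation bands in 22.5 degree steps (illustrative values).
    orientations = [i * 22.5 for i in range(8)]
    params = (90.0, 1.5, 20.0, 0.5)  # peak location, peak amplitude, sd, base
    predicted = [gaussian_tuning(o, *params) for o in orientations]
    print(predicted)
    print(sum_squared_error(orientations, predicted, params))
```

      A nonlinear least-squares routine would minimize a misfit of this kind over the four parameters, which is what makes each fitted value directly interpretable.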

      We revised the data analysis section to include a justification for our use of a Gaussian model: ‘Therefore, fitting the human sensitivity data using a simple Gaussian model seemed most appropriate as it allows characterizing the parameters of the tuning profile, namely, peak location, peak amplitude, standard deviation and base amplitude, which are directly interpretable in cognitive/functional terms. Moreover, the use of a nonlinear, Gaussian model is motivated by past work that showed that the Gaussian function fits the evolution of recognition performance as a function of orientation (Dakin & Watt, 2009; Goffaux & Greenwood, 2016). Simpler frameworks, i.e. a linear model predicting d’ from the interaction between orientation and viewpoint, would result in an 8 (orientation) * 7 (viewpoint) design that is difficult to analyze and interpret.’

      (2) When reporting the luminance and contrast of your stimuli, please make clear what these units and measures are. This was a case where I had to take a second to assure myself that I knew what the values meant.

      We clarified that the luminance and contrast values reported in the manuscript are on a grey scale ranging from 0 to 1.

      (3) In your Procedure section, I think describing the familiarization task right away would help the text flow more clearly. At present, you began talking about the old/new task, and I was immediately wondering how familiarization worked!

      The procedure section now starts with the description of the familiarization task.

      (4) p. 3 - "Culminates" doesn't seem like the right word here.

      We agree and rephrased this way: ‘The tolerance of face identity recognition is stronger for familiar than unfamiliar faces’.

      (5) p. 5 - I think "with the multiple" shouldn't have "the".

      Indeed, we removed the “the”.

      Reviewer #2 (Recommendations for the authors):

      I enjoyed reading the manuscript, but thought the Introduction was a bit long. I wasn't sure about the relevance of the section on temporal contiguity. I think this might have been more relevant if this had been a manipulation in the design. So, I wonder if this might be shortened or removed to focus on the key questions. On the other hand, I found the overview of the view-selective and view-tolerant to be a bit brief. There is plenty of detail here, but I found it difficult to break down what was done when I first read it. It might be good to provide an overview in the Discussion too.

      While past research on the contribution of temporal contiguity to face identity recognition brings interesting insights into the nature of the visual experience leading to view-tolerant performance, we agree with the Reviewer that this aspect is not directly at stake here. We reduced the review of this literature in the Introduction. We clarified the description of the model observers as suggested by the reviewer and made sure to provide an overview of the model observers in the Discussion as well.

      References.

      Burton, A. M., R. Jenkins and S. R. Schweinberger (2011). "Mental representations of familiar faces." Br J Psychol 102(4): 943-958.

      Burton, A. M., R. S. Kramer, K. L. Ritchie and R. Jenkins (2016). "Identity From Variation: Representations of Faces Derived From Multiple Instances." Cogn Sci 40(1): 202-223.

      Dakin, S. C. and R. J. Watt (2009). "Biological "bar codes" in human faces." J Vis 9(4): 2 1-10.

      Dumont, H., A. Roux-Sibilon and V. Goffaux (2024). "Horizontal face information is the main gateway to the shape and surface cues to familiar face identity." PLoS One 19(10): e0311225.

      Goffaux, V. and J. A. Greenwood (2016). "The orientation selectivity of face identification." Scientific Reports 6(34204): 34204.

      Hansen, B. C., E. A. Essock, Y. Zheng and J. K. DeFord (2003). "Perceptual anisotropies in visual processing and their relation to natural image statistics." Network 14(3): 501-526.

      Jenkins, R., D. White, X. Van Montfort and A. Mike Burton (2011). "Variability in photos of the same face." Cognition 121(3): 313-323.

      Jiang, F., L. Dricot, V. Blanz, R. Goebel and B. Rossion (2009). "Neural correlates of shape and surface reflectance information in individual faces." Neuroscience 163(4): 1078-1091.

      Keil, M. S. (2008). "Does face image statistics predict a preferred spatial frequency for human face processing?" Proc Biol Sci 275(1647): 2095-2100.

      Keil, M. S. (2009). ""I look in your eyes, honey": internal face features induce spatial frequency preference for human face processing." PLoS Comput Biol 5(3): e1000329.

      Menon, N., R. I. Kemp and D. White (2018). "More than a sum of parts: robust face recognition by integrating variation." R Soc Open Sci 5(5): 172381.

      Mike Burton, A. (2013). "Why has research in face recognition progressed so slowly? The importance of variability." Q J Exp Psychol (Hove) 66(8): 1467-1485.

      Ringach, D. L., M. J. Hawken and R. Shapley (2003). "Dynamics of orientation tuning in macaque V1: the role of global and tuned suppression." Journal of neurophysiology 90(1): 342-352.

      Russell, R., I. Biederman, M. Nederhouser and P. Sinha (2007). "The utility of surface reflectance for the recognition of upright and inverted faces." Vision Res 47(2): 157-165.

      Russell, R., P. Sinha, I. Biederman and M. Nederhouser (2006). "Is pigmentation important for face recognition? Evidence from contrast negation." Perception 35(6): 749-759.

      Troje, N. F. and H. H. Bulthoff (1996). "Face recognition under varying poses: the role of texture and shape." Vision Res 36(12): 1761-1771.

      Vuong, Q. C., J. J. Peissig, M. C. Harrison and M. J. Tarr (2005). "The role of surface pigmentation for recognition revealed by contrast reversal in faces and Greebles." Vision Res 45(10): 1213-1223.

    1. Reviewer #2 (Public review):

      I think this paper is an excellent and timely contribution. It clearly shows that learning overlapping relationships in a disjoint training schedule (where the overlaps are not encountered close together in time) appears to aid the formation of an integrated associative memory structure (a cognitive map) and supports generalisation. I believe the methods are sound and the results are clear. I only have a couple of methodological questions that may not warrant any changes to the paper (or only very minor changes/additions):

      (1) The mixed effects models did not include random slopes for the within-subject factors ("spatial manipulation" and "block"), and so the corresponding fixed effect inferences may be unsafe. Having said that, it is likely that including these slopes may not be warranted given their contribution to the model's fit. I recommend that the authors check this.

      (2) The mixed effects models for accuracy appear to model average performance across trials rather than using a generalised linear model with a (e.g.) logit link function and the binomial distribution to characterise performance. I think this is a little sub-optimal, as the latter is often more sensitive. Nonetheless, it is not in any way wrong; the results are clear enough as is, and there may be a good reason to avoid a non-linear link function, which can alter the interpretation of effects close to the ceiling and floor.
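      The point about ceiling and floor effects under a non-linear link can be illustrated numerically. The sketch below is an assumption-laden toy, not an analysis of the authors' data: it applies the same effect on the log-odds scale at two baseline accuracies and reports the resulting change on the probability scale. The function names and the chosen values are hypothetical.

```python
import math


def logistic(x):
    """Inverse of the logit link: log-odds -> probability."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Logit link: probability -> log-odds."""
    return math.log(p / (1.0 - p))


def accuracy_gain(baseline_accuracy, effect_log_odds):
    """Change in accuracy produced by a fixed effect on the log-odds scale."""
    return logistic(logit(baseline_accuracy) + effect_log_odds) - baseline_accuracy


if __name__ == "__main__":
    for baseline in (0.50, 0.95):
        gain = accuracy_gain(baseline, effect_log_odds=1.0)
        print(f"baseline {baseline:.2f}: gain {gain:.3f}")
```

      The same log-odds effect yields a much smaller accuracy change near the ceiling than at mid-range, which is why effects estimated on the two scales can be read differently.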

      I think the introduction and/or discussion would benefit from contrasting their results with Berens & Bird (2022, PLOS Comp Bio). In this paper, it is shown that blocking the training of discriminations in a linear hierarchy (what we call progressive training) substantially benefited transitive inference performance. This seems at odds with the authors' finding that "participants struggle to integrate information across rows and columns, i.e. across groups of transitions that were trained separately in time".

      I would really like to know what the authors think about this discrepancy (or, indeed, whether they think there is one at all). Is it possibly because "progressive" learning is some combination of "grouping", "blocking" and "chaining" (where there is a structured overlap between adjacently trained relationships)? Or is it something else, e.g., that there is a fundamental difference between learning associations and discriminations (personally, I lean on this explanation)?

      Relevant to this, the authors note that their "findings do contradict recent reports from the category learning literature, where blocking seems to help learning and generalisation (Dekker et al., 2022; Flesch et al., 2018; Noh et al., 2016). It may be that where the goal is not to learn a complex knowledge structure - like a map - but simply to compress exemplars by mapping them onto a smaller number of labels - the benefits of blocking emerge." However, the benefit of progressive (blocked) training in my own work was observed in a task that required learning a complex/relational structure in the form of a transitive hierarchy, which theoretical accounts suggest depends on learning map-like representations (Whittington et al., 2020).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #2 (Public review): 

      Weaknesses:

      (1) Can the authors comment on the possibility of inflammatory response pathways being activated by hypoxia? Has this been shown before? While not the focus of the manuscript, it could be discussed in the Discussion as an interesting finding and potential involvement of other cells in the Hypoxic response.

      We thank the reviewer for reviewing our manuscript and for the important comment about inflammation. Indeed, hypoxia has been shown to activate the inflammatory response pathways. In various studies, it was found that HIF-1α can interact with NF-κB signaling, leading to the upregulation of pro-inflammatory cytokines such as IL-1β, IL-6, and TNF-α (Rius et al., Cell, 2008; Hagberg et al., Nat Rev Neurol, 2015).

      In our transcriptomics data (Fig. 2D), and to the reviewer’s point, we identified enrichment of the inflammatory signaling response following the hypoxic exposure. Since hSOs at the time of analysis do contain some astrocytes, we think these contribute to the observed pro-inflammatory changes and emphasize the feasibility of capturing this response in organoids in vitro. This is also important because ADM is known to have anti-inflammatory properties and should be investigated as such in future studies focused on hypoxia-induced inflammation.

      In the manuscript, we included a few sentences in the discussion to address the lack of in-depth analyses of inflammation as a limitation of our study.

      (2) Could the authors comment on the mechanism at play here with respect to ADM and binding to RAMP2 receptors - is this a potential autocrine loop, or is the source of ADM from other cell types besides inhibitory neurons? Given the scRNA-seq data, what cell-to-cell mechanisms can be at play? Since different cells express ADM, there could be different mechanisms in place in ventral vs dorsal areas.

      Based on our scRNA-seq data in hSOs showing significant upregulation of ADM expression in astrocytes and progenitors, and increased expression of RAMP2 receptors on neurons, we speculate that the primary mechanism is likely to involve paracrine interactions. However, we cannot exclude autocrine mechanisms with the current experiments. Dissecting these interactions in a cell-type specific manner could be an important focus for future ADM-related studies.

      To address the question about possible different mechanisms in ventral versus dorsal areas, in the revision we plotted the cell-type expression of ADM and its receptors in hCOs and included these data in the figures (Fig. S3).

      (3) For data from Figure 6 - while the ELISA assays are informative to determine which pathways (PKA, AKT, ERK) are active, there is no positive control to indicate these assays are "working" - therefore, if possible, western blot analysis from assembloid tissue could be used (perhaps using the same lysates from Figure 3) as an alternative to validate changes at the protein level (however, this might prove difficult); further to this, is P-CREB activated at the protein level using WB?

      We thank the reviewer for this comment and the observation. Although we did not include a traditional positive control in these ELISA assays, several lines of evidence indicate that the measurements are reliable. First, the standard curves behaved as expected, and all sample values fell within the assay’s dynamic range. Second, technical replicates showed low variability, and the observed changes across experimental conditions (e.g., hypoxia vs. control) were consistent with the expected biological responses based on previous literature. We agree that including western blot validation would strengthen the findings, and we will note this for our future studies focused on CREB and ADM.

      (4) Could the authors comment further on the mechanism and what biological pathways and potential events are downstream of ADM binding to RAMP2 in inhibitory neurons? What functional impact would this have linked to the CREB pathway proposed? While the link to GABA receptors is proposed, CREB has many targets beyond this.

      We appreciate the reviewer’s insightful question. Currently, not much is known about the molecular pathways and downstream cellular events triggered by ADM binding to RAMP2 in inhibitory neurons, or in brain cells in general. The data from our study provide the first information about the cell-type specific expression of ADM in baseline and hypoxic conditions, which is one of the key novelties of our study.

      While the signaling landscape of ADM in interneurons is largely unexplored, several studies in other (non-brain) cell types have demonstrated that ADM binding to RAMP2 can activate downstream cascades such as the cAMP/PKA/CREB pathway, PI3K/AKT, and ERK/MAPK, all of which are also known to be critical regulators of neuronal development and survival. These previously published data along with our CREB-targeted findings in hypoxic interneurons, suggest ADM–RAMP2 signaling could influence multiple aspects of interneuron biology, but these remain to be evaluated in future studies.

      We agree with the reviewer that CREB has a wide range of transcriptional targets. We decided to focus on GABA as a target of CREB for two main reasons: (i) GABA signaling has been previously shown to play an important role in the migration of cortical interneurons, and (ii) a previous study by Birey et al. (Cell Stem Cell, 2022) demonstrated that CREB pathway activity is essential for regulating interneuron migration in assembloid models of Timothy Syndrome, thus providing further evidence that dysregulation of CREB activity disrupts migration dynamics.

      While our study provides a first step toward uncovering the mechanisms of interneuron migration protection by ADM, we fully acknowledge that future work will be needed to delineate the full spectrum of ADM–RAMP2 downstream signaling events in inhibitory neurons and other brain cells.

      (5) Does hypoxia cause any changes to inhibitory neurogenesis (earlier stages than migration?) - this might already be known, but was not discussed.

      We appreciate this question from the reviewer; however, this was not something that we focused on in this manuscript due to the already large amount of data included. A separate study focusing on neurogenesis defects and the molecular mechanisms of injury for that specific developmental process would be an important next step.

      (6) In the Discussion section, it might be worth detailing to the readers what the functional impact of delayed/reduced migration of inhibitory neurons into the cortex might result in, in terms of functional consequences for neural circuit development.

      We thank the Reviewer for the suggestion of detailing the functional impact of reduced inhibitory neuron migration. We revised the manuscript to discuss previous studies showing that failure of interneurons to migrate and reach their designated targets within the appropriate developmental window leads to their elimination through apoptosis. Decreased numbers (or abnormal development) of interneurons are associated with neurodevelopmental impairments and abnormal functional connectivity in the brain.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The authors should examine if all cortical interneurons are affected by ADM or only subtypes (Parvalbumin/Somatostatin).

      We thank the reviewer for raising this important question. In our study, we utilized the Dlx1/2b::eGFP reporter to broadly label cortical interneurons; however, this system does not distinguish specific interneuron subtypes. To address this, in the manuscript we used the single-cell RNA sequencing data and immunostainings to provide this information. As expected based on our previous reports, most cortical interneurons present in organoids are represented by calretinin (CALB2), somatostatin (SST) and calbindin (CALB1). These data are now presented in Fig. S3.

      Separately, we used available scRNA-seq data from developing human brain and showed that at ~20 PCW, the developing human brain has similar types of cortical interneurons. These data are now included in Fig. S5.

      (2) The authors should test more candidates from their bulk RNA-seq data with different fold changes for regulation after hypoxia, to allow the reader to judge at which cut-off the DEGs may be reproducible. This would make this database much more valuable for the field of hypoxia research.

      We appreciate the reviewer’s thoughtful suggestion. In addition to the bulk RNA-seq analysis, we did validate several upregulated hypoxia-responsive genes with varying fold changes by qPCR; these include PDK1, PFKP and VEGFA (Fig. S1).

      We do agree that an in-depth investigation of specific cut-offs would be interesting; however, this could be the focus of a different manuscript.

      Reviewer #3 (Recommendations for the authors):

      Most of the evidence presented is convincing in supporting the conclusions, and I have only minor suggestions for improvement:

      (1) The bulk RNA-seq was performed in hSOs only, which may not fully capture the phenotypes of migrating or migrated interneurons. It would be valuable, if feasible, to sort migrated cells from hSO-hCO assembloids and specifically examine their molecular mediators.

      We thank the reviewer for this suggestion. While it is likely that the cellular environment will have some influence on a subset of the molecular changes, based on all the data from the manuscript and our specific target, the RNA-sequencing of hSOs was sufficient to capture essential changes like ADM upregulation. An in-depth exploration of the differential responses of migrated versus non-migrated interneurons to hypoxia could be the focus of a different project.

      (2) In Figure 3, it is striking that cell-type heterogeneity dominates over hypoxia vs. control conditions. A joint embedding of hSO and hCO cells could provide further insight into molecular differences between migrated and non-migrated interneurons.

      We thank the reviewer for this observation and opportunity to clarify. Since we manually separated the assembloids before the analyses, we processed these samples separately. That is why they separate like this. In the revision, we added data about ADM expression and its receptors’ expression in the hCOs.

      (3) It would be helpful to expand the discussion on how closely the migration observed in hSO-hCO assembloids reflects in vivo conditions, and what environmental aspects are absent from this model. This would better frame the interpretation and translational relevance of the findings.

      We thank the Reviewer for bringing up this important point. Although the assembloid model offers the unique advantage of allowing the direct investigation of migration patterns of hypoxic interneurons, we fully agree it does not fully recapitulate the in vivo environment. While there are multiple aspects that cannot be recapitulated in vitro at this time (e.g. cellular complexity, vasculature, immune response, etc), we are encouraged by the validation of our main findings in ex vivo developing human brain tissue, which strongly supports the validity of our findings for in vivo conditions.

      We expanded our discussion to include more details and the need to validate these findings using in vivo models.

      (4) The authors suggest that hypoxia is also associated with delayed interneuron maturation, yet the bulk RNA-seq data primarily reveal stress and hypoxia-related genes. A more detailed discussion of why genes linked to interneuron maturation and function were not strongly affected would clarify this point.

      We thank the Reviewer for the opportunity to clarify.

      The RNA-seq was performed during the acute stages of hypoxia/reoxygenation, and we think a maturation phenotype might be difficult to capture at this point and would require analysis at later in vitro assembloid maturation stages.

      Our speculation about a possible maturation defect is based on data from previous developmental biology studies showing that failure of interneurons to reach their final cortical location within a specified developmental window will impair their integration within the neuronal network, and thus lead to maturation defects and possible elimination by apoptosis.

      Since preterm infants suffer from countless hypoxic events over multiple months, we speculate these repetitive events are likely to induce cumulative delays in migration, inability of interneurons to reach their target in time, followed by abnormal integration within the excitatory network, and eventual elimination of some of these interneurons through apoptosis. However, the direct demonstration of this effect following a hypoxic insult would require prolonged in vivo experiments in rodents to follow the migration, network integration and apoptosis of interneurons; to our knowledge this experimental design is not technically feasible at this time, and thus this hypothesis remains speculative and only included in the discussion.

      (5) Relatedly, while the focus on interneuron migration is well justified, acknowledging how hypoxia might also impact other aspects of cortical development (e.g., progenitor proliferation, neuronal maturation, or circuit integration) would place the findings in a broader developmental framework and strengthen their relevance.

      We appreciate the Reviewer’s suggestion to discuss the role of hypoxia on other interneuron developmental processes during cortical development. In the manuscript, we included text in the discussion about the likely effects of hypoxia on interneuron proliferation, maturation and circuit integration.

      (6) Very minor: in Figure S3C and D, it was not stated what the colors mean (grey: control, yellow: hypoxia)

      Thank you for pointing out this error; we corrected it in our revision.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #2 (Public review):

      In the manuscript, Ruhling et al. propose a rapid uptake pathway that is dependent on lysosomal exocytosis, lysosomal Ca2+ and acid sphingomyelinase, and further suggest that the intracellular trafficking and fate of the pathogen is dictated by the mode of entry. Overall, this manuscript argues for an important mechanism of a 'rapid' cellular entry pathway of S. aureus that is dependent on lysosomal exocytosis and acid sphingomyelinase and links the intracellular fate of the bacterium, including phagosomal dynamics, cytosolic replication and host cell death, to different modes of uptake.

      Key strength is the nature of the idea proposed, while continued reliance on inhibitor treatment combined with lack of phenotype / conditional phenotype for genetic knock out is a major weakness. 

      In the revised version, the authors perform experiments with ASM KO cells to provide genetic evidence of the role for ASM in S. aureus entry through lysosomal modulation. The key additional experiment is the phenotype of reduced bacterial uptake in low serum, but not in high serum conditions. The authors suggest this could be due to the SM from serum itself affecting the entry. While this explanation is plausible, prolonged exposure of cells to low serum is well documented to alter several cellular functions, particularly in the context of this manuscript, lysosomal positioning, exocytosis and Ca2+ signaling. A better control here could be WT cells grown in low serum.

      As the reviewer suggested, we did culture both WT control cells and ASM knock-outs under low serum conditions before conducting the invasion assays. Hence, the detected effects on S. aureus invasion must be caused by the lack of functional ASM in the mutant.

      We apologize that this did not become evident from the manuscript’s text. We thus included a change in line 259 which now reads:

      “To test whether FBS confounded our invasion experiments, we cultivated WT as well as ASM K.O. cells in medium with reduced FBS concentration (1%) and determined the S. aureus invasion efficiency (Figure 2I).”

      If SM in serum can interfere, why do they see such pronounced phenotype on bacterial entry in WT cells upon chemical inhibition?

      We explain the differences between inhibitor-treated WT cells and ASM K.O.s by the severe accumulation of SM upon genetic ablation of ASM. We demonstrated this by HPLC-MS/MS measurements in Figure 2L. If cells were cultured in 10% FBS, an ASM K.O. resulted in approx. 4-times higher levels of cellular SM C18:0 when compared to WT cells, while amitriptyline treatment of WT cells had no effect, and ARC39 treatment increased SM C18:0 levels only by 2-fold. This likely results from different durations of SM accumulation in the cell pools which is caused either by complete absence of ASM (in case of the ASM K.O.) or only in the hour-range upon treatment with the inhibitors.

      Under low serum conditions, the severe SM C18:0 accumulation in the ASM K.O. was found decreased (from 4-fold to 2-fold when compared to WT cells; Figure 2M). Here, the WT cells used as reference also were cultured in the same manner as the ASM K.O. A similar pattern was observed for other SM species (Supp. Figure 3). This correlates with the S. aureus invasion phenotype in ASM K.O.: under high serum conditions (and resulting in severe SM accumulation) we did not detect an invasion defect, while under low serum conditions (resulting in only moderate SM accumulation) S. aureus invasion was reduced in the knock-outs when compared to WT cells cultured in the same conditions, respectively.

      While the authors argue a role for undetectable nano-scale Cer platforms on the cell surface caused by ASM activity, results do not rule out a SM independent role in the cellular uptake phenotype of ASM inhibitors.

      Since the comments starting with the line above are identical to the reviewer's previous comments, we assume that these points of criticism still resonate with the reviewer. We had previously agreed with the reviewer that we do not show formation of ceramide-enriched platforms, and we had already changed the manuscript accordingly in the previous revision round (see also our comment below).

      The authors have attempted to address many of the points raised in the previous revision. While the new data presented provide partial evidence, the reliance on chemical inhibitors and lack of clear results directly documenting release of lysosomal Ca2+, or single bacterial tracking, or clear distinction between ASM dependent and independent processes dampen the enthusiasm.

      We continue to share the reviewer’s desire to discriminate between ASM-dependent and ASM-independent processes, but the simultaneous occurrence of multiple pathways of bacterial uptake is currently the limiting factor and technological challenge in our laboratory, since these events happen rapidly. We do hope that we or others will be able to address these limitations in the future, for instance with the technologies suggested by the reviewer.

      I acknowledge the authors' argument of different ASM inhibitors showing similar phenotypes across different assays as pointing to a role for ASM, but the lack of phenotype in ASM KO cells is concerning. The authors' argument that altered lipid composition in ASM KO cells could be overcoming the ASM-mediated infection effects by other ASM-independent mechanisms is speculative, as they acknowledge, and moderates the importance of ASM-dependent pathway. The SM accumulation in ASM KO cells does not distinguish between localized alterations within the cells. If this pathway can be compensated, how central is it likely to be?

      We here want to elaborate again, since our revision experiments demonstrate the ASM-dependency of the rapid uptake under low serum conditions – see also above. We were convinced that the genetic evidence of an S. aureus invasion phenotype in ASM K.O.s under these conditions would eliminate the reviewer’s concern about the role of ASM during the bacterial invasion (see also above). Our lipidomics data of ASM K.O.s cultured in 1% and 10% FBS (Figure 2, M, Supp. Figure 3) and inhibitor-treated WT cells (Figure 2L, Supp. Figure 3) show a correlation between SM accumulation and the invasion phenotype observed by us.

      We agree with the reviewer, however, that it remains elusive why changes in the sphingolipidome increase ASM-independent S. aureus internalization by host cells. One explanation is a dysfunction of the lipid raft-associated protein caveolin-1 upon strong SM accumulation, which was previously shown to appear in ASM-deficient cells (1, 2). A lack of caveolin-1 results in strongly increased host cell entry of S. aureus in certain cell types (3, 4). In other cell types, such as A549 cells, S. aureus invades in an α-toxin- and caveolin-1-dependent fashion (5). It will be interesting to study to what extent such processes as described by Goldmann and colleagues depend on ASM. However, a characterization of the mechanism behind these observations requires further experimentation and is beyond the scope of the current manuscript.

      As to the centrality of the pathway: we cannot and do not make any assumptions on the centrality of the pathway and its importance in vivo. As scientists we were intrigued by our finding of an ASM-dependent uptake pathway for S. aureus, especially its speed. In other, as yet unidentified, host cell types or cell lines, such a pathway may be a major entry point for pathogens. Alternatively, we may have identified an ASM-dependent mode of receptor uptake, with which the bacteria “piggyback” into the cells.

      The authors allude to lower phagosomal escape rate in ASM KO cells compared to inhibitor treatment, which appears to contradict the notion of uptake and intracellular trafficking phenotype being tightly linked. As they point out, these results might be hard to interpret.

      We again want to add that we measured phagosomal escape of S. aureus in WT and ASM K.O. cells cultured in 1% FBS (low serum conditions) and compared it to escape rates obtained with host cells cultured in 10% FBS. Again, we infected cells for 10 or 30 min and determined the escape rates 3h p.i. However, the results are similar to escape rates determined with 10% FBS (see Author response image 1). This was addressed already during the manuscript’s first revision. We found that escape rates of S. aureus were significantly decreased in absence of ASM regardless of the FBS concentration in the medium.

      Author response image 1.

      We therefore think that prolonged absence of ASM has additional side effects. For instance, certain endocytic pathways could be up- or down-regulated to adapt for the absence of ASM or could be affected by other changes in the lipidome (that can be minimized but not completely prevented by culturing cells in 1% FBS). This could, for instance, affect maturation of S. aureus-containing phagosomes and hence phagosomal escape.

      As it is currently unclear to what extent the prolonged absence of ASM activity affects cellular processes, we think other experiments investigating the role of ASM-dependent invasion for phagosomal escape are more reliable. Most importantly, bacteria that enter host cells early during infection (and thus, predominantly via the “rapid” ASM-dependent pathway) possess lower phagosomal escape rates than bacteria that entered host cells later during infection (Figure 5, D and E). This is confirmed by higher escape rates upon blocking ASM-dependent invasion with Vacuolin-1 (Figure 4E) and three different ASM inhibitors (Figure 4C and D). We further demonstrate that sphingomyelin on the plasma membrane during invasion influences phagosomal escape, while sphingomyelin levels in the phagosomal membrane did not change phagosomal escape (Figure 5, A and B). This is summarized in Figure 5F.

      Could an inducible KD system recapitulate (some of) the phenotype of inhibitor treatment? If S. aureus does not escape phagosome in macrophages, could it provide a system to potentially decouple the uptake and intracellular trafficking effects by ASM (or its inhibitor treatment) ?

      Knock-downs in our laboratory are based on the vector pLVTHM (6). Inducible knock-downs in the cells would require the introduction of an inducible Tet<sup>on</sup> system, which the cells currently do not harbor.

      However, it needs to be stated that for optimal gene knock-downs, this system has to be induced by doxycycline supplementation of the medium for 7 days. The cells thus grow for several days, which allows them to adapt their lipid metabolism and therefore reflects the situation that we encounter with the K.O.s.

      ASM-dependent uptake of S. aureus in macrophages has been demonstrated before (7). However, the course of infection in macrophages differs from that in non-professional phagocytes (8). For example, in macrophages S. aureus replicates within phagosomes, whereas in non-professional phagocytes it replicates in the host cytosol. Absence of ASM therefore may influence the intracellular infection of macrophages with S. aureus in a distinct manner.

      The role of ASM on cell surface remains unclear. The hypothesis proposed by the authors that the localized generation of Cer on the surface by released ASM leads to generation of Cer-enriched platforms could be plausible, but is not backed by data, technical challenges to visualize these platforms notwithstanding. These results do not rule out possible SM independent effects of ASM on the cell surface, if indeed the role of ASM is confirmed by controlled genetic depletion studies.

      We agree with the reviewer that we do not show generation of ceramide-enriched platforms (see also above). We thus already had changed Figure 6F in the revised manuscript to make clear that it remains elusive whether ceramide-enriched platforms are formed. We also had added a sentence to the discussion (line 615) to emphasize that the existence of these microdomains is still debated in lipid research.

      We think that the following observations support SM-dependent effects of ASM during S. aureus invasion:

      (i) Reduced invasion upon removing SM from the plasma membrane (Figure 2N, Supp. Figure 2M)

      (ii) Increased invasion in TPC1 and Syt7 K.O. (Figure 2, P) in presence of exogenously added SMase.

      However, we agree with the reviewer that we do not directly demonstrate ASM-mediated SM cleavage during S. aureus invasion. Hence, we had added a sentence to the discussion that mentions a possible SM-independent role of ASM for invasion (line 556) that reads:

      “Since it remains elusive to which extent ASM processes SM on the plasma membrane during S. aureus invasion, one may speculate that ASM could also have functions other than SM metabolization during host cell entry of the pathogen. However, we did not detect a direct interaction between S. aureus and ASM in an S. aureus-host interactome screen (9).”

      The reviewer acknowledges technical challenges in directly visualizing lysosomal Ca2+ using the methods outlined. Genetically encoded lysosomal Ca2+ sensors such as GCaMP3-ML1 might provide better ways to directly visualize this during inhibitor treatment or S. aureus infection.

      We again thank the reviewer for this suggestion. We already had included the following section in our discussion (then: line 593): “Since fluorescent calcium reporters allow to monitor this process microscopically, future experiments may visualize this process in more detail and contribute to our understanding of the underlying signaling mechanisms.”

      References for the purpose of this response letter:

      (1) Rappaport, J., C. Garnacho, and S. Muro, Clathrin-mediated endocytosis is impaired in type AB Niemann-Pick disease model cells and can be restored by ICAM-1-mediated enzyme replacement. Mol Pharm, 2014. 11(8): p. 2887-95.

      (2) Rappaport, J., et al., Altered Clathrin-Independent Endocytosis in Type A Niemann-Pick Disease Cells and Rescue by ICAM-1-Targeted Enzyme Delivery. Mol Pharm, 2015. 12(5): p. 1366-76.

      (3) Hoffmann, C., et al., Caveolin limits membrane microdomain mobility and integrin-mediated uptake of fibronectin-binding pathogens. J Cell Sci, 2010. 123(Pt 24): p. 4280-91.

      (4) Tricou, L.-P., et al., Staphylococcus aureus can use an alternative pathway to be internalized by osteoblasts in absence of β1 integrins. Scientific Reports, 2024. 14(1): p. 28643.

      (5) Goldmann, O., et al., Alpha-hemolysin promotes internalization of Staphylococcus aureus into human lung epithelial cells via caveolin-1- and cholesterol-rich lipid rafts. Cell Mol Life Sci, 2024. 81(1): p. 435.

      (6) Wiznerowicz, M. and D. Trono, Conditional suppression of cellular genes: lentivirus vector-mediated drug-inducible RNA interference. J Virol, 2003. 77(16): p. 8957-61.

      (7) Li, C., et al., Regulation of Staphylococcus aureus Infection of Macrophages by CD44, Reactive Oxygen Species, and Acid Sphingomyelinase. Antioxid Redox Signal, 2018. 28(10): p. 916-934.

      (8) Moldovan, A. and M.J. Fraunholz, In or out: Phagosomal escape of Staphylococcus aureus. Cell Microbiol, 2019. 21(3): p. e12997.

      (9) Rühling, M., et al., Identification of the Staphylococcus aureus endothelial cell surface interactome by proximity labeling. mBio, 2025. 0(0): p. e03654-24.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) The study does not explore or discuss how oral ingestion of Nora virus leads to the colonization of stem cells, which are located basally in the gut. This mechanism should be discussed.

      We have added an additional paragraph (4th) in the Discussion dealing with this issue and are further discussing the consequences of RNAi potentially not being functional in progenitor cells in the paragraph on antiviral responses.

      (2) The authors fail to detect Dicer-GFP fusion protein expression in stem cells, a finding that could explain why the virus persists in these cells. Further investigation is needed to determine whether RNAi functions are effective in stem cells compared to enterocytes. For clarification, the authors could cross esg-Gal4 UAS-GFP and Myo-Gal4 UAS-GFP with UAS GFP-RNAi and/or express a Dicer-GFP construct under a stem cell-specific driver.

      Actually, it is well-known in the Drosophila literature on the intestinal epithelium that RNAi functions well in progenitor cells as the technique has been widely used to understand the control of stem cell division and differentiation in tens of articles. We provide here just a few examples: Jiang et al., Nat Commun (2025) https://doi.org/10.1038/s41467-024-55255-1; Zhai et al., PLoS Genetics (2017) https://doi.org/10.1371/journal.pgen.1006854; Wu et al., https://doi.org/10.1371/journal.pgen.1009649.

      (3) The presentation of experimental parameters (e.g., pathogen type, temperature, time points) should be improved in the results section and at the top of the figures to enhance clarity. Additionally, details regarding the mode of oral infection (continuous exposure vs. single feeding on a filter) should be specified. Given that fly stock flipping frequency influences microbiota load (as noted in Broderick et al.), this should be reported, especially for lifespan studies.

      P. aeruginosa oral infection was always by continuous exposure, as detailed in the Mat.& Meth. section. Nora infection was done by exposure to the viral solution for 24h, as detailed in Mat. & Meth. The flipping frequency had also been reported in that section.

      (4) To confirm that enterocyte colonization requires stem cell proliferation and differentiation, the authors should analyze Nora virus localization in JAK-STAT-deficient flies infected with bacteria or toxicants. This would help determine whether the virus can infect enterocytes in the absence of enterocyte differentiation, but stimulation of stem cells.

      We now provide these data (pictures and quantification) in Fig.7 G-H and discuss them in the main text.

      (5) The study does not discuss the spatial distribution of Nora virus infection along the gut. Specifically, it remains unclear whether viral colonization is higher in gut regions R2 and R3, which contain proliferative stem cells. Addressing this could provide valuable insights into the virus's infection dynamics.

      We have now specified that Nora virus was detected only in the posterior midgut; we are now also providing a schematic illustration in Fig. S5J.

      Recommendations for the authors:

      Major Suggestion

      See weaknesses section for key areas requiring improvement.

      Minor Suggestions

      (1) Line 79: Mention Nox in the text. Key references on Nox include Jones (2013), Iatsenko (2018), and Patel (2016).

      Done.

      (2) Line 92: The long list of publications is unnecessary and can be shortened.

      We are not sure that many investigators are aware of the scope of our studies on host-pathogen relationships and this is the adequate place for a reminder.

      (3) Line 196: Cite Choi et al. (Aging Cell, 2008; 7:318-334. doi: 10.1111/j.1474-9726.2008.00380.x) for the initial work on gut dysplasia during aging. However, note that dysbiosis in aging is demonstrated in Buchon et al. (2009, Genes and Development) and other studies.

      Done.

      (4) Line 265: It would be interesting to clarify whether the shortened lifespan of Nora-infected flies after a clean injury is dependent on the microbiota.

      The shortened life span of Nora-infected flies is not due to the injury as demonstrated in Fig. S4F. Hence, the shortened lifespan is differentially affected by the microbiota according to nutrition conditions as documented in Fig. 3D-E.

      (5) Line 285: Clarify what is meant by "polyubiquitin promoter"-do the authors mean a ubiquitous Gal4 driver? Specify the Gal4 lines used in the result section.

      Done. The construct is a direct fusion of the ubiquitin p63E promoter to the Dicer-fluorescent protein sequences as described in Girardi et al., Sci Rep, 2015.

      (6) Line 347: Indicate the references aligning with the most recent studies on this topic.

      Done.

      (7) Line 373 and elsewhere: Mention studies that have shown the microbiota influence on lifespan, in relation to dietary richness.

      Done.

      (8) Line 588: Provide details on the method used for hemolymph collection.

      Done.

      (9) Line 964: Clarify the phrase "as previously shown"-where in this paper was it demonstrated?

      The legends have been rewritten and the phrase has been deleted.

      (10) Line 987: In "survival of non-infested with PA14," explicitly mention Nora to distinguish between different infections.

      Done.

      Figures & Experimental Details

      (11) Figures: Improve figure legends or add information at the top of figures, specifying:

      Number of flies used to monitor Nora virus titer.

      Temperature conditions.

      Age of flies used in experiments.

      Done.

      (12) Figure 2E: The lifespan of Nora-negative flies appears very short. Was this lifespan assay conducted at 29°C? What was the fly stock flipping rate?

      Correct, it was 29°C. As described in the Material and Methods section, the flies were flipped every two (29°C) to four days (25°C).

      (13) Figure 4C: Improve labeling on the plate for better clarity.

      Done.

      (14) Figure 6C: The figure legend on the right is difficult to interpret. Clarify what "+" indicates and explicitly write out the genotype. Is NP identical to NPG4G80?

      Done. NP is the NP1 driver. We usually use it in a version that also includes a Gal80<sup>ts</sup> transgene to express the gene of interest only at the adult stage.

      (15) Dissection Details: Clearly state which part of the gut was dissected: midgut, entire gut, ± Malpighian tubules. This should be specified in the results section.

      Done (no Malpighian tubules nor crop) for RTqPCR analyses.

      (16) Clean Injury: Provide more details in the results section regarding the injury site and needle size.

      Done.

      (17) Use "Abx" instead of "AntiB," as the former is more commonly recognized.

      Done.

      Reviewer #2 (Public review):

      The title does not seem to be fully supported by the data. While the authors convincingly show the increased sensitivity to Pseudomonas infection, effects on another tested bacterium, Serratia marcescens, were not significantly different between Nora-virus-infected and noninfected flies. Thus, effects of 'intestinal infection' seem to be too broad a claim.

      We agree with the reviewer and have accordingly modified the title, which now explicitly refers to P. aeruginosa.

      Also, whether the Nora virus increases sensitivity to oxidative stress is not so clear to me: the figure that supports this claim is the survival assay of Figure 5F. However, the difference in survival between control and paraquat-treated Nora (-) flies seems to be in the same order as between control and paraquat-treated Nora (+) flies. Rather, cause and effect seem to be the reverse: paraquat increases ISC proliferation, higher viral loads, and consequently shorter survival. I suggest rephrasing the title and conclusions accordingly.

      While we usually just directly compare Nora (+) vs. Nora (-) flies with the same conditions, we note that the difference of survival between control and paraquat-treated Nora (-) flies is of about 9 days, based on LT50 values whereas it is of 8 days for Nora(+) flies. This difference is of about two days when comparing Nora (+) to Nora (-) flies exposed to paraquat. Thus, Nora does contribute to an increased sensitivity to oxidative stress likely by the process highlighted by the reviewer and also by its own detrimental action on the homeostasis of the intestinal epithelium and associated disruption of its barrier function.
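      The LT50 comparison above rests on reading, from each survival curve, the time at which 50% of the flies have died. A minimal sketch of that readout, assuming linear interpolation between two consecutive scoring days, is given below; the function name `lt50` and the survival values are hypothetical and serve only as illustration, not as the data or the method of the study.

```python
def lt50(days, survival):
    """Time at which survival first reaches 50%, by linear interpolation.

    days: scoring days in increasing order.
    survival: percent of flies alive on each day (non-increasing).
    Returns None if survival never drops to 50%.
    """
    for i in range(1, len(days)):
        s0, s1 = survival[i - 1], survival[i]
        # The 50% level lies between two consecutive scoring days.
        if s0 >= 50 >= s1 and s0 != s1:
            span = days[i] - days[i - 1]
            return days[i - 1] + (s0 - 50) * span / (s0 - s1)
    return None


# Hypothetical survival curves (percent alive), for illustration only.
days = [0, 5, 10, 15, 20]
curve_a = [100, 90, 60, 40, 0]
curve_b = [100, 70, 30, 10, 0]

# Difference in LT50 between the two hypothetical conditions, in days.
delta = lt50(days, curve_a) - lt50(days, curve_b)
```

      Comparing such LT50 values pairwise (treated vs. untreated within one infection status, or Nora (+) vs. Nora (-) within one treatment) yields differences in days of the kind quoted in the response.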

      Quantification of immunofluorescence microscopy is missing, rendering the images somewhat anecdotal. Quantification should be provided. It will then also be of interest to quantify the number of Nora (+) cells, and the Nora virus levels per infected cell (e.g. Figure 5H). Also, the claim that the Nora virus initially infects ISC and later (upon stress) infects enterocytes requires quantification.

      Missing quantifications of pictures have been added: Figs. S5E and 7H. We are not sure we understand the reviewer comment on “Nora virus levels per infected cell”: the signal we are seeing may correspond to aggregates of the virus and would be impossible to quantify reliably, e.g., in the right-most panel of Fig. 5H. Fig. 5I clearly shows that no Nora is detected in enterocytes of young 5-day-old flies in the absence of infectious or xenobiotic challenge.

      Genetic support for the role of the JAK-STAT pathway in driving ISC proliferation and supporting Nora virus replication is convincing. It would also be of interest to analyze other pathways implicated in ISC proliferation (e.g. JNK, EGFR), especially given the observations of Nigg et al, showing an involvement of STING/NF-kB and EGFR pathway in driving intestinal phenotypes of Drosophila A virus-infected flies (doi: 10.1016/j.cub.2024.05.009).

      We agree with the reviewer that these would be interesting experiments to perform, especially in the light of one hypothesis that antiviral defenses may prevent the initial infection of enterocytes as discussed at length in our updated discussion on host antiviral defenses. However, we are currently unable to perform additional experiments and leave it to other interested investigators studying antiviral innate immunity to address these questions. In this work, we used the interference with the JAK-STAT pathway as a second tool to block the division of ISCs.

      Figure 5E: An intriguing observation is that GFP:Dicer2 seems to be unstable in Nora virus-infected cells. Here, GFP control driven by the same driver line would be required to confidently conclude that this is due to an effect on Dicer-2 specifically.

      Actually, this experiment was not performed using the Gal4-UAS system but a direct fusion. We do know that GFP is stable when expressed in enterocytes, e.g., Lee et al., Cell Host&Microbe (2016) DOI: 10.1016/j.chom.2016.10.010.

      Legends are mostly conclusive, and essential information about the experimental setup is missing in the captions of multiple figures, making the interpretation of the data difficult. See my private recommendations for suggestions to improve the data presentation.

      Done.

      Recommendations for the authors:

      Suggestions for the presentation of the data:

      (1) I found the names Ore-R(SC) and Ore-R(SM) for noninfected vs infected Ore-R flies not very intuitive. I suggest renaming them into something that makes the infection status clear.

      These notations refer to two distinct sub-strains that may reflect different origins, with some likely genetic drift accounting for the distinct properties of the two sub-strains. As the Ore-R(SM) flies can have different infection statuses (infected, cleaned, re-infected), we fear that renaming would not clarify the matter. Of note, Ore-R(SC) flies are refractory to Nora virus infection (Fig. S1I).

      (2) Please define the number of flies analyzed for survival assays in the legends.

      Done.

      (3) The authors provide conclusions in most of the figure legends, without providing an explanation of the experiment that was done. Conclusions should be used sparingly, if at all, in legends. Also, relevant information is often missing in the legends (time points after infection, Figure 2E food source, etc.). I suggest the authors carefully double-check their legends and rephrase the conclusive legends with descriptive ones.

      Done. The figure legends have been rewritten.

      (4) Several of the legends indicate that 'data represent the mean of biological triplicates' however some panels do not represent triplicates (e.g. Figure 1C-E). Please correct.

      Done.

      (5) Legends: which multiple comparison test was used for ANOVA?

      Done. Tukey’s post-hoc test was used for direct comparisons.

      (6) Line 888: black arrows are not shown in the figure.

      Corrected.

      (7) Figure 1F: legend on the figure seems incorrect (all are labeled Nora (+)); likewise for Figure 2C (all labeled Nora (-)).

      Corrected.

      (8) Materials and methods: please describe how the Nora virus antibody was raised (and specify on line 271 what viral protein is recognized).

      Done. As the whole virus was used for immunization, we cannot state which specific viral proteins are detected by the antibody.

      (9) Please define what is presented in the box plots (mean, range, whiskers, individual data points).

      Done.

      (10) Figure 4 and associated text (line 221): a brief explanation of the Smurf assay would be useful.

      Done.

      (11) Figure 4C: I did not find the picture of the agar plate informative, as similar information is conveyed in Figure 4D. Also, the labelling cannot be clearly read.

      Figure 4D provides a quantification of panel C. The readability has been improved.

      (12) Figure 4C: It is suggested that Nora-positive, smurf-negative flies were analyzed, but from Figure 4B it seems that these do not exist. Please explain.

      The data in Fig. 4B do not represent absolute numbers but percentages. Thus, at most 50% of the flies were Smurf-positive at the time of the assay, the rest being Smurf-negative yet Nora-positive.

      (13) The abbreviations PA14 and Db11 are used in several figures. I would suggest defining the abbreviation in the legend to facilitate interpretation.

      Done.

      (14) Figure 5A/5G: the Nora virus RNA levels in this figure are dramatically lower than the levels in other figure panels. Please check/correct.

      Done. The reviewer is indeed correct: we have forgotten to write that for these two panels, the loads are relative and not absolute as is the case in other panels. 5A: the load in whole flies was taken to be 1; 5G: untreated Nora-positive flies were taken to be 1.
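      The relative scaling described above amounts to dividing every load by that of a chosen reference sample, so that the reference equals 1. A minimal sketch is given below; the function name `relative_loads`, the sample names, and the numerical values are hypothetical and are not taken from the study.

```python
def relative_loads(loads, reference):
    """Express each load as a fold value relative to the reference sample."""
    ref = loads[reference]
    if ref <= 0:
        raise ValueError("reference load must be positive")
    return {name: value / ref for name, value in loads.items()}


# Hypothetical absolute viral RNA loads (arbitrary units), for illustration only.
absolute = {"whole_fly": 2.0e6, "gut": 5.0e5, "carcass": 1.0e5}

# The whole-fly sample is taken to be 1, as described for panel 5A.
relative = relative_loads(absolute, "whole_fly")
```

      Values rescaled in this way are unitless and cannot be compared directly with the absolute loads shown in other panels, which is why the two kinds of panels differ so strongly in magnitude.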

      (15) Figure 6A: total number of ApopTag positive cells are reported. Were the same number of total cells analyzed? Please define.

      We have not counted all of the cells in each midgut but provide the number of ApopTag positive cells per midgut. We thus make the assumption that the overall number of midgut cells is not varying much from one midgut to the other. Visual inspection of DAPI-stained nuclei did not reveal any obvious change in the density of enterocyte nuclei as illustrated in Fig. S6 (we guess that everyone in the field is making the same assumption when counting mitotic ISCs with PHH3 staining).

      (16) Figure 6C: I find the shades of blue difficult to distinguish and suggest to use other colors.

      Done.

      (17) There seems to be a large mismatch between the percentage of Nora virus-positive cells in Figures 5C, 6H and the images of Figures 5G and 5H. Why?

      We think there might be a mistake with the Figure numbers cited by the referee. We guess the point the referee was trying to raise is the difference of perceived Nora virus burden between Fig. 5H and Fig. 6G, a quite valid point. For Fig. 5H, we had measured the Nora-virus load by RTqPCR (Fig. 5G, relative burden) but had not quantified the images. This is now done and shown in Fig. 5I. In Fig. 5H, young flies were used and hence there was no Nora virus detected in ECs, as now quantified in Fig. 5I. For Fig. 6G, we had to use 30-day old intestines to be able to observe Nora virus in the enterocytes of the controls. We have now included this important point in the main text and in the Figure legends.

      (18) The Title of the legend in Figure 7 is not supported by the data as 'spread through the intestine' has not been analyzed. Please adjust.

      Done.

      (19) All figures in which ANOVA is used: I assume that anything not labeled with an asterisk was found to be non-significant? If so, this should be indicated in the manuscript.

      Actually, we have not highlighted obvious differences to maintain clarity (e.g., Fig. 1E between uncured Ore-R(SM) and cured Ore-R(SC)). We thus have underlined the biologically relevant differences in the panels. The interested reader can refer to the primary data that are accessible on a data repository.

      (20) Figure 7C: the authors may want to contrast their finding that Upd3 was not upregulated in Nora virus-infected flies (in the absence of PA14) with the findings of Kuyateh et al, who did report upregulation of Upd3 (https://doi.org/10.3390/v15091849).

      We thank the reviewer for pointing out this study we were unaware of. We would like to point out that this article is difficult to follow as it is not 100% clear in which of the analyzed studies the induction of upd3 was observed and which exact experimental conditions were followed, e.g., young or old flies, whole flies or gut… We have looked in more detail at ref. 133 of this article, which refers to an unpublished study from the Hultmark laboratory that is however available online: (https://www.diva-portal.org/smash/record.jsf?aq2=%5B%5B%5D%5D&c=15&af=%5B%5D&searchType=SIMPLE&sortOrder2=title_sort_asc&query=Nora+virus&language=en&pid=diva2%3A1045375&aq=%5B%5B%5D%5D&sf=all&aqe=%5B%5D&sortOrder=author_sort_asc&onlyFullText=false&noOfRows=50&dswid=4587).

      In that study, flies were “infected” with Nora virus by expressing a cDNA clone injected into embryos. The problem is that, for unknown reasons, the authors used Relish mutant flies. It is thus difficult to draw conclusions, as these flies are defective for the IMD and Sting pathways whereas our flies are wild-type. We were also interested to read that genes involved in midgut stem cell differentiation were expressed in flies harboring Nora virus, which is in keeping with the data of the present study. However, it is difficult to discuss this when we know little about the background of the studies analyzed by Kuyateh et al., inasmuch as our Discussion is already rather long.

      (21) Figure 7E: are the differences between control and Dome/Stat knockdown flies significantly different for Nora (+) flies (in the absence of Pseudomonas)? This is not clear from the data presentation.

      The answer to the question is positive: the JAK-STAT pathway also contributes to the maintenance of intestinal epithelium homeostasis in the absence of bacterial infection, that is, presumably under basal conditions. We have modified Fig. 7E to include more comparisons.

      Textual suggestions:

      (22) Line 25 strives > thrives

      Done.

      (23) Lines 150- 152, etc are not very informative. Also, some of the viruses analyzed are not "known contaminating viruses", but viruses used experimentally (VSV, IIV6, CrPV). I suggest adjusting the phrasing.

      Done.

      (24) Line 862: weaker fitness > lower fitness.

      Done.

      (25) Virology terms:

      (a) I suggest not using the term titer for qPCR readouts (which do not involve titration). Viral RNA level or viral RNA load would be more appropriate.

      Done.

      (b) I would propose rephrasing the Y-axis label of Figure 1C, E to Nora RNA load (same for other figures showing viral RNA).

      Done.

      (c) Infested: rather use the more accurate term infected.

      Done.

      (d) Contamination: rather use the term infection.

      We have modified some but not all occurrences of this word. We believe that it is important to use the word contamination when referring to enterocytes: the enterocytes are not infected by Nora; rather, differentiated infected ISCs become contaminated enterocytes. Infection refers to an active process whereas contamination refers to a state.

      (e) Proliferation: rather use the term replication.

      According to our US-English dictionary, proliferation refers to the “rapid reproduction of a cell, part, or organism”, which is the meaning we intend. Replication does not have this notion of speed of reproduction.

      (f) Drosophila should not be italicized in Drosophila A virus, following the ICTV convention that a "virus name should never be italicized, even when it includes the name of a host species or genus" https://ictv.global/faq/names.

      Done.

      (26) Line 873-975: please rephrase the legend of Figure 1F as the current one is not informative.

      Done.

      (27) Line 934: I suggest moving the justification of the time point chosen "= LT50 on the survival test in Fig. 2E" to the main text.

      Done.

      (28) Line 936: with drop > with a drop.

      No longer relevant.

      (29) Line 940-941: the grammar of the sentence does not seem to be correct as it suggests that SDS induces Diptericin expression.

      No longer relevant.

      (30) Line 952-953; line 980: please correct mismatch singular/plural (antibody have, inhibition do).

      Done.

      (31) Line 422: "It will be interesting to determine whether the absence of a Dcr2 fluorescent proteins fusions in progenitor cells that we report in this study rules out a role for the RNAi pathway in intestinal host defense against the Nora virus". It would be of interest to discuss this finding in the context that virus-derived Nora virus siRNAs can be easily detected and that the viruses encode an RNAi antagonist (doi: 10.1371/journal.ppat.1002872).

      Done. We have updated the Discussion and propose a model whereby RNAi would prevent primary infection of enterocytes and then virus replication in proliferating progenitor cells would allow the virus to effectively inhibit the RNAi machinery when the infected progenitor cells become enterocytes.

      (32) Line 159: Nora virus phenotypes differ between laboratories. I would be interested to read the authors' speculations on why this would be the case.

      Our work shows that the effects of Nora virus depend significantly on several parameters we have identified: nutrition quality, age, exposure to abiotic or biotic stresses, and fly genotypes with the existence of Nora-refractory strains. These parameters as well as potential differences between laboratories are actually discussed in the second paragraph of the Discussion.

      (32) Line 175: capitalization of ORE-R vs Ore-R at other places in the manuscript.

      Done.

      (33) Line 185-194: PA14 and Pseudomonas are used interchangeably. Perhaps it is clearer to stick to a single term for consistency.

      PA14 is one clinical strain used to study P. aeruginosa. There are many others such as PAO1, which is also widely used. We have decided to write P. aeruginosa PA14 the first time we use it in each figure legend, and only PA14 afterwards.

      Reviewer #3 (Public review):

      The claim that Dcr2 is not abundant in ISCs because the protein is not stable is logically consistent and reasonable. Perhaps I missed this, but the authors could additionally knock down or use somatic CRISPR to delete Dcr2 in ISCs to test whether a lack of Dcr2 underlies sensitivity. In this experiment, the expectation would be that depleting Dcr2 in ISCs genetically would make little difference to susceptibility overall compared to controls. This is not an essential experiment request.

      We agree with the reviewer that these would be interesting experiments to perform. However, we are currently unable to perform additional experiments and leave it to other interested investigators studying antiviral innate immunity to address these questions dealing with the specific steps of RNA interference that may be missing in progenitor cells.

      Recommendations for the authors:

      (1) Line 206-207 and 214-216: the order of ideas presented here is unintuitive. In Lines 206-207, it is said that ABX treatment had no effect, which is counterintuitive to the nature of infection susceptibility. But this is resolved in Lines 214-216 when the reader realizes that S3G is fed on a sucrose solution, and so likely microbiota-depleted. Perhaps more could be said to clarify this in the main text, and/or swap the order of these observations so a casual reader is not confused about the nature and extent of the microbiota contributing to the sensitivity of Nora-infected flies.

      As suggested by the reviewer, we have clarified the text with respect to the food source and microbiota load; we emphasize that the microbiota plays a protective role in Nora-negative flies fed on sucrose solution even though the microbiota load is very low under these conditions. Of note, the microbiota is not depleted in sucrose-fed Nora-positive flies: we suspect that delaminating enterocytes may actually provide nutrients for the microbiota, either directly or, more likely, indirectly (peritrophic matrix).

      (2) Line 262-265: the text may be a bit exaggerated given only 3 pathogens tested, one of which was a fungal natural infection breaching the cuticle and largely bypassing the gut. This could be re-phrased.

      The important point is that Nora-positive flies die with an LT50 of about 10 days even when they are not infected with a pathogen; this has nothing to do with the number of pathogens tested. Thus, any infection that causes death with kinetics in this range may be misinterpreted in the absence of a relevant uninjured or clean-injury control.

      (3) Line 379-382: I don't know if citing Schissel et al. is needed here. This paper's methods and data are highly problematic, as mentioned by the authors. This is not a highly cited paper, nor does it add value to the present discussion to cite it only to discredit it. Perhaps this can be left out and the field can move on quietly - naturally, this choice is the present authors', and this is just my view.

      We have actually cited this article at two other places and thus had not cited it “only to discredit it”. We have nevertheless removed the lines as suggested by the reviewer.

      (4) Line 404: perhaps clarify "Interestingly, mammalian stem cells..."

      Done.

      (5) Line 455: my understanding of digital PCR is that it is highly useful for detecting rare variants but not particularly better than qPCR for estimating loads/titres? This is not to say dPCR is worse, just that dPCR and primer-specific RT + qPCR are comparable if load/titre is desired. For instance, Qiagen actually recommends qPCR over dPCR specifically (and pretty much exclusively) for gene expression: https://www.qiagen.com/us/applications/digitalpcr/beginners/dpcr-vs-qpcr.

      (6) Perhaps Line 455 could drop the advocacy for digital PCR? I agree using dissected guts, or seemingly aged individuals per Figure 3B(?), is a valuable thing to point out. Maybe the aged individuals point could be added here? I guess the idea behind dissected guts is to have samples enriched in Nora virus.

      Cleaning Nora-positive strains is really difficult and we suspect that as long as there is one viral particle left, it may be sufficient to re-ignite the contamination of the strain. Our own experience with digital PCR on the expression of AMP-like molecules in the head of flies is that we found the approach to be more sensitive than classical RT-qPCR (Xu et al., EMBO Rep, 2023).

    1. Reviewer #1 (Public review):

      Summary:

      This paper leverages 7T fMRI data from the Natural Scenes Dataset to investigate whether retinotopic coding, the position-selective organization of visual responses, structures spontaneous resting-state interactions between the Default Network (DN) and the Dorsal Attention Network (dATN). Using individualized network parcellations and population receptive field (pRF) modeling, the authors show that DN voxels can be split into two subpopulations based on their response to visual stimulation: those with position-specific positive BOLD responses (+pRFs) and those with position-specific negative BOLD responses (-pRFs). Critically, these subpopulations relate differently to the dATN during rest: -pRFs are anticorrelated with the dATN, +pRFs are positively correlated, and non-retinotopic DN voxels show no coupling. The anticorrelation (and positive correlation) is enhanced when DN and dATN voxels share visual field preferences. An event-triggered analysis suggests that retinotopic coding shapes both "top-down" (DN-initiated) and "bottom-up" (dATN-initiated) spontaneous activity transients, supporting the claim that the retinotopic scaffold is intrinsic to the DN. These findings challenge the prevailing view of global DN-dATN antagonism and suggest retinotopic coding as an organizing principle for cross-network communication.

      Strengths:

      The central finding that what looks like network-level independence between DN and dATN decomposes into structured, bivalent interactions organized by voxel-level visual field preferences is a compelling demonstration that macro-scale network descriptions can hide meaningful substructure. The logic of the analysis is clean: pRF properties are estimated from retinotopic mapping data and then used to predict resting-state coupling in completely independent scanning sessions. This cross-session, cross-modality design rules out many circularity concerns.

      The use of individualized multi-session hierarchical Bayesian parcellation (Kong et al.) to define DN and dATN boundaries within each subject is the right methodological choice for this question. Network boundaries in posterior cortex, where DN and dATN interdigitate most closely, vary considerably across individuals, and group-average approaches would introduce exactly the kind of misassignment that would most confound the result.

      The matched-vs-random pRF analysis is well-controlled. The authors demonstrate that cortical distance between matched and randomly-matched dATN pRFs does not differ, effectively ruling out spatial proximity on the cortical surface as a confound. tSNR controls further show that signal quality differences do not drive the effect.

      The event-triggered analysis (Figure 3) is creative and adds genuine value. Showing that retinotopically-specific coupling persists during DN-initiated activity transients, not only dATN-initiated ones, is the key piece of evidence for the claim that the code is intrinsic to the DN rather than passively inherited through bottom-up visual drive.

      The result is observed consistently across all individual participants, which provides strong evidence for the robustness of the qualitative pattern despite the small sample size inherent to densely-sampled designs.

      Weaknesses

      (1) The nature of negative pRFs requires more scrutiny

      The entire interpretive framework depends on treating negative pRFs in the DN as genuine position-selective neural responses (suppression). However, negative BOLD signals are well known to arise from non-neural sources, specifically, vascular stealing (where activation in nearby tissue diverts blood from adjacent voxels) and macrovascular draining vein effects that produce spatially displaced signal inversions. These concerns are amplified at 7T, where T2*-weighted GE-EPI carries substantial macrovascular weighting. The DN and dATN interdigitate extensively in the posterior cortex, often within millimeters. A negative pRF in a DN voxel adjacent to a positive dATN voxel could, in principle, reflect the hemodynamic shadow of its neighbor rather than an independent neural response.

      The spatial dispersion control (matched vs. random pRFs have similar cortical distribution) is valuable but addresses long-range confounds, not *local* hemodynamic crosstalk. The reliability of sign and center position across runs is reassuring but does not exclude a vascular origin, as vascular architecture is itself stable across sessions. I would encourage the authors to test whether the matched-vs-random effect survives exclusion of voxels near large pial vessels (identifiable from T2* contrast or the venograms available in the NSD). These analyses would not be dispositive, but they would meaningfully strengthen the neural interpretation.

      (2) Amount of retinotopic mapping data and choice of pRF pipeline

      The NSD includes 6 runs of retinotopic mapping (~5 minutes each; 3 bar-aperture, 3 wedge/ring). The authors use only the 3 bar-aperture runs (~15 minutes total per subject) and fit their own pRFs using AFNI's 3dNLfim procedure, rather than using the pRF estimates provided as part of the NSD release (which were fitted using the analyzePRF toolbox with all 6 runs).

      Fifteen minutes of bar data is quite limited for reliable voxel-wise pRF estimation, especially in regions far from the early visual cortex, where signal-to-noise is inherently lower. Standard recommendations for robust pRF mapping in higher-order regions generally suggest substantially more data. The variance-explained threshold is close to the noise floor by design, meaning that a non-trivial number of the "retinotopic" DN voxels may be poorly estimated. Given that the core analyses depend on both the sign and the center position of these pRFs, the limited data is a significant concern.

      The authors do not explain why they chose to re-fit pRFs rather than use the NSD-provided estimates. If the motivation was methodological (e.g., the NSD pRF pipeline does not readily yield signed amplitude, or the bar-only fits were judged more appropriate for detecting negative responses), this should be made explicit. If the NSD-provided pRFs can reproduce the key findings, this would substantially increase confidence in the results. If they cannot, that divergence itself would be important to understand. I would ask the authors to address this choice and, if feasible, to report whether the core results replicate using the NSD-provided pRF estimates and/or whether using all 6 runs of retinotopy data changes the findings.

      (3) pRF model adequacy for the Default Network

      The isotropic Gaussian pRF model was developed for and validated in early and mid-level visual cortex, where it captures the dominant spatial selectivity of neuronal populations. In DN voxels where the model explains comparatively little variance, it is less clear that the model is capturing the right quantity. Specifically, the negative pRFs could conceivably be described by a model with a dominant suppressive surround (e.g., a difference-of-Gaussians model), in which what appears as a "negative pRF" in the standard model is actually the surround component of a center-surround mechanism whose center is poorly resolved. This distinction matters: a genuine inverted code (negative center response) implies a qualitatively different computation than inherited surround suppression from nearby visual cortex.

      The authors should consider discussing why the standard model is sufficient for the questions asked, or ideally, testing whether the sign distinction survives under alternative pRF model specifications.
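
      To make the distinction in point (3) concrete, the following is a minimal sketch, not the authors' pipeline, contrasting a single Gaussian pRF with negative amplitude against a difference-of-Gaussians (center-surround) pRF. All parameter values, the grid, and the function names (`negative_gaussian_prf`, `dog_prf`, `vertical_bar`) are hypothetical and chosen only for illustration; hemodynamics and model fitting are omitted.

```python
import math


def gaussian(x, y, x0, y0, sigma):
    """Isotropic 2D Gaussian weight at (x, y) for a pRF centered on (x0, y0)."""
    return math.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))


def negative_gaussian_prf(x0, y0, sigma, amplitude=-1.0):
    """Standard single-Gaussian pRF whose amplitude is allowed to be negative."""
    return lambda x, y: amplitude * gaussian(x, y, x0, y0, sigma)


def dog_prf(x0, y0, sigma_center, sigma_surround, surround_gain):
    """Difference-of-Gaussians pRF: excitatory center minus suppressive surround."""
    return lambda x, y: (
        gaussian(x, y, x0, y0, sigma_center)
        - surround_gain * gaussian(x, y, x0, y0, sigma_surround)
    )


def response(prf, stimulus):
    """Predicted response before any hemodynamic model: weights summed over stimulated points."""
    return sum(prf(x, y) for x, y in stimulus)


def vertical_bar(x_center, width=1.0, extent=8.0, step=0.5):
    """Grid points (degrees of visual angle) covered by a full-height vertical bar."""
    n = int(round(2 * extent / step)) + 1
    coords = [-extent + i * step for i in range(n)]
    return [(x, y) for x in coords for y in coords if abs(x - x_center) <= width / 2]


# Hypothetical parameters, assumed for illustration only.
inverted = negative_gaussian_prf(0.0, 0.0, sigma=2.0)
center_surround = dog_prf(0.0, 0.0, sigma_center=0.5, sigma_surround=3.0, surround_gain=0.9)

bar = vertical_bar(0.0)  # large stimulus sweeping through the pRF center
spot = [(0.0, 0.0)]      # small stimulus confined to the pRF center

bar_inverted = response(inverted, bar)
bar_dog = response(center_surround, bar)
spot_inverted = response(inverted, spot)
spot_dog = response(center_surround, spot)
```

      Under these assumed parameters, both models predict a net negative response to the bar, whereas only the center-surround model predicts a positive response to the small central spot. This is one way of seeing why bar-based mapping alone may not separate an inverted center from a dominant suppressive surround.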

      (4) Interpreting resting-state transients as top-down vs. bottom-up

      The event-triggered analysis labels high-amplitude DN pRF activations as "top-down events" and dATN activations as "bottom-up events." This is a reasonable inference given experience-sampling studies showing that rest involves alternation between internal and external attention, but it remains an inference. Without concurrent experience sampling, eye-tracking, or physiological monitoring, we cannot establish that a spontaneous DN transient reflects memory retrieval or internally-directed thought rather than a global arousal fluctuation. Similarly, dATN transients during rest could reflect covert shifts of spatial attention to remembered or imagined locations rather than bottom-up processing per se. I would ask the authors to soften this framing or to discuss what additional data would be needed to validate the top-down/bottom-up attribution.
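
      For readers unfamiliar with the general technique discussed in point (4), an event-triggered average can be sketched as follows. This is an illustrative sketch on synthetic data, not the manuscript's implementation: the 2 SD threshold, the upward-crossing event definition, the window length, and the function names are all assumptions.

```python
from statistics import mean, pstdev


def detect_events(series, threshold_sd=2.0):
    """Indices where the z-scored series crosses upward through the threshold."""
    mu, sd = mean(series), pstdev(series)
    z = [(v - mu) / sd for v in series]
    return [i for i in range(1, len(z))
            if z[i] >= threshold_sd and z[i - 1] < threshold_sd]


def event_triggered_average(target, events, pre=2, post=2):
    """Average the target series in a window around each event (edge events skipped)."""
    windows = [target[e - pre:e + post + 1] for e in events
               if e - pre >= 0 and e + post < len(target)]
    if not windows:
        return []
    return [mean(column) for column in zip(*windows)]


# Synthetic example: two transients in the "source" series, mirrored with
# opposite sign in the "target" series (a stand-in for anticorrelated coupling).
source = [0.0] * 50
source[10] = source[30] = 5.0
target = [-v for v in source]

events = detect_events(source)
eta = event_triggered_average(target, events)
```

      Note that nothing in such an analysis identifies what caused a transient; the "top-down" or "bottom-up" label comes entirely from which network's time series is used as the source.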

      (5) The "retinotopic code" vs. "visual field bias" distinction

      The paper uses the language of a "retinotopic code" throughout and correctly distinguishes this from a "retinotopic map," noting that DN voxels do not form a continuous topographic representation on the cortical surface. This distinction deserves greater emphasis. In vision science, retinotopic maps carry computational significance through their topographic continuity and relationship to cortical wiring. A distributed collection of voxels with coarse visual field preferences but no cortical topography is a fundamentally different organizational feature. Recent reviews have drawn an explicit distinction between *retinotopic maps* and *visual field biases* (Groen, Dekker, Knapen & Silson, TiCS 2022), and the present findings may be more accurately characterized as the latter. Perhaps the authors think that the distinction is merely a signal-to-noise distinction, in which case I would invite them to clearly speak to this interpretation. In any case, this is not a criticism of the findings themselves, but clarity on this point would prevent conflation of two different organizational principles and would help position the work for both the vision and network neuroscience communities.

    1. Reviewer #3 (Public review):

      Summary:

      Environments change over time; therefore, optimal decision-making ought to discount older observations of the environment in favor of newer ones in a manner consistent with the amount of temporal instability. Computational models of perceptual decision-making model this temporal discounting with a 'leak' parameter that determines the rate at which older information is discarded. In this study, McGaughey and Gold examine the neurophysiological mechanisms that could underlie adaptation to different degrees of temporal instability. They developed a novel variant of the well-established perceptual decision-making random-dot-motion paradigm, in which the stimulus being evaluated was preceded by an 'adapting' stimulus with either high or low temporal stability. When the test stimulus was preceded by the adapting stimulus with lower temporal stability, NHPs showed reduced psychometric slopes, indicative of increased temporal discounting ('leak'). While the NHPs performed this task, single-unit neural activity was recorded in area MT, along with pupillometric data. The authors use these neural and pupil datasets to investigate two potential sources of adaptive discounting under varying amounts of temporal instability: sensory adaptation (changes in instantaneous evidence encoding), and arousal-related changes in evidence accumulation. MT neurons respond differently to the test stimulus under conditions of high vs low temporal stability of the adapting stimulus - when the adapting stimulus is more stable, MT neurons have larger and more selective responses to the test stimulus. In addition, evoked pupil responses to the test stimulus were modulated by the adapting stimulus. Both the strength of the difference in MT responses across contexts and the difference in pupil diameter across contexts were correlated with context-dependent modulation of the monkeys' behavior over sessions. 
      The paper concludes that both sources appear to independently contribute to adaptive evidence accumulation, likely operating at different processing stages in the brain.
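
      As a point of reference for the 'leak' parameter mentioned above, a leaky accumulator can be sketched as a simple Euler integration of dx/dt = -leak * x + evidence(t). This is a generic textbook form, assumed here for illustration; it is not necessarily the model fitted in the manuscript, and the step size, leak values, and evidence sequence are hypothetical.

```python
def accumulate(evidence, leak, dt=0.1):
    """Euler integration of dx/dt = -leak * x + evidence(t); returns the final x."""
    x = 0.0
    for e in evidence:
        x += dt * (-leak * x + e)
    return x


# Evidence favors one direction for 60 steps, then reverses for 40 steps.
evidence = [1.0] * 60 + [-1.0] * 40

perfect = accumulate(evidence, leak=0.0)  # no discounting of old evidence
leaky = accumulate(evidence, leak=2.0)    # strong discounting of old evidence
```

      With no leak, the final value reflects the majority of all evidence and stays positive. With a large leak, the early evidence is largely forgotten and the final value follows the most recent (negative) evidence, which is the behavior expected to be advantageous in less stable environments.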

      Strengths:

      (1) While computational models of perceptual decision-making have been very useful for explaining behavior and neural responses in decision-making areas, we are still in search of some of the neural mechanisms that could implement such models. Studies such as this one, which aim to identify neural correlates of simplified model parameters, are quite crucial.

      (2) Analysis is generally careful and well-executed.

      (3) Prompts some interesting follow-up questions that could be answered with simultaneous recordings and causal manipulations, as the authors state in the Discussion - e.g., which areas are affected by arousal-related neuromodulation correlated with evoked pupil size and how.

      Weaknesses:

      (1) The task design may not be optimal. While the amount of time the monkey is exposed to each motion direction during the adapting stimulus is matched, it's hard to know if the reduced MT responses to the test stimulus are truly due to the greater frequency of switches during the HSF adapting stimulus or because the monkeys have been exposed to more repetitions of the stimulus. It's increased sensory adaptation in either case, but it makes it problematic to interpret this as temporal context-dependent adaptation specifically. I think this could potentially be partially addressed by an analysis that is in the paper, but could potentially be emphasized/fleshed out more, specifically the results shown in Figure 4D that seem to show that most of the reduction in neural response for adapting units occurs between the first and second stimuli.

      (2) The pupillometric analysis seems to be an indirect way of assessing whether the accumulator itself might be modulated by temporal context, but the link could be made clearer. The authors show that context-dependent behavior is related to pupil size, which is related to arousal/neuromodulation, but it would be helpful to have some idea of what neural mechanisms underlying adaptive decision-making are actually impacted by this neuromodulation. Lacking neural data to address this question (e.g., from a brain region proposed to be involved in the accumulation process), at least more discussion of this would be helpful. Essentially, I'm unsure of how to interpret the pupil results: the argument that temporal context affects instantaneous evidence encoding in MT that then drives the accumulator is very clear, but I am a bit confused about what, mechanistically, I should think the neuromodulation is doing.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      We would like to thank the reviewers for taking the time to review our manuscript and for their insightful comments, which will help us improve it. Please find below a point-by-point answer to each reviewer.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The authors have set up a mouse embryonic sensory neuron system to study the impact of complete loss of frataxin (using a nice cre-based AAV approach). There is careful delineation of the phenotype of these cells upon complete frataxin loss using a significant range of relevant endpoints (e.g. OCR, oxidative stress, mitochondrial imaging at EM level). A major finding is the failure of neurons lacking frataxin to undergo full soma maturation, resulting in smaller cells. In addition, AMPK is activated (maybe not surprisingly given the severe loss of mitochondrial function and drop in ATP). Solid mechanistic experiments reveal that blocking AMPK activation prevents the suppression of soma size (we do not get the same data with regard to alanine supplementation). There are interesting studies with alanine that, in part, reverse indices of oxidative stress (mitochondrial stress, specifically). The experiments are well designed with mechanistic insight and the data clearly presented with appropriate statistical analysis. A major problem is the culture system. The labelling studies and soma size analysis reveal that this is not a truly representative population of DRG neurons. It seems all the small neurons are missing; I assume all trkA-positive and GDNF-dependent neurons have been lost somewhere (this comprises 80% of the neurons at the lumbar level). The methods section covering the mouse DRG culture is sparse in terms of details and refers to a textbook which I cannot access. Another issue is the background glucose concentration. Growing such cells at 25mM is standard, I know, but it's still sub-optimal. Glucose at this concentration represents a hyperglycemic state (normal glucose is 5-10mM), and it's not really correct to term it glycolysis inhibitory since hexokinase, the rate limiting enzyme, has a Kd around 0.3-1mM glucose. When studying AMPK this system will exhibit suppressed AMPK activity/expression due to the high background glucose concentration of 25mM.

      Reviewer #1 (Significance (Required)):

      The use of this unrepresentative culture system does lower the significance. While large caliber sensory neuron, e.g. proprioceptive, dysfunction is important during development and into the adult it seems rather unfortunate that the authors ignore all other sensory neurons! Persons with Friedreich ataxia (FA) also suffer from small fiber abnormalities, e.g. pain, and these neurons actually express a higher density of mitochondria (since they are unmyelinated). So, when the authors state this model "faithfully recapitulates key hallmarks of FA...." I have to say I disagree. In terms of general significance the work is well performed with some good mechanistically strong studies, however, it does still contain a major purely descriptive component. The focus on AMPK is understandable but we learn nothing really novel about its function and role in sensory neurons.

      We sincerely thank Reviewer #1 for the careful evaluation of our work and for the positive appreciation of the experimental design, mechanistic approach, and data presentation. We are grateful for the reviewer’s comments, which helped us clarify several aspects of the manuscript and improve the description of our culture system and metabolic conditions.

      Comment on alanine/ALA

      We would first like to clarify a terminology issue. In our study, we did not use alanine supplementation, but alpha-lipoic acid (ALA). We have checked and revised the text to avoid any possible ambiguity on this point.

      Comment on the DRG culture system and representation of sensory subtypes:

      We appreciate the reviewer’s concern regarding the representativeness of the embryonic dorsal root ganglia (DRG) culture system. We agree that this in vitro model does not fully reproduce the cellular diversity and maturation state of the in vivo DRG environment, and we have revised the manuscript to make this limitation more explicit. That said, we respectfully do not think our cultures are devoid of small sensory neurons. In the original submission, Supplementary Fig. 1D-E already showed a substantial population of CGRP-positive neurons, supporting the presence of peptidergic small-diameter sensory neurons. In addition, we performed TrkA immunostaining, which showed that a large proportion of neurons in our cultures are also TrkA-positive. We can add these TrkA data to the revised manuscript if the reviewer and editor consider that this would strengthen the characterization of the culture system.

      More broadly, the reviewer raises an important point: dissociated embryonic DRG cultures maintained under simplified trophic conditions cannot be expected to preserve the full in vivo balance of mature sensory neuron subtypes. Embryonic and neonatal DRG neurons are known to depend strongly on trophic support in vitro, and sensory subtype maturation normally requires both neurotrophic cues and interactions with the native microenvironment. We therefore agree that our system should be viewed as a reductionist model of frataxin loss in developing sensory neurons rather than a complete reconstruction of the mature DRG. We have now expanded the methods section to better describe the culture conditions and revised the discussion to acknowledge more explicitly that future work using more complex conditions, such as combined trophic factor regimens, neuron–glia co-cultures, or organotypic approaches, may help preserve a more physiological sensory subtype composition.

      Comment on glucose concentration and “glycolysis-inhibitory” conditions:

      We thank the reviewer for prompting us to clarify this point. We agree that chronic exposure to 25 mM glucose can influence neuronal metabolism and AMPK signaling, and this issue has been discussed in the literature for neuronal culture systems. However, we believe there was a misunderstanding regarding the specific experiment referred to in our manuscript. In the condition that we termed “glycolysis-inhibitory,” the neurons were not maintained in high glucose. Rather, these experiments were performed in glucose-free medium supplemented with galactose, i.e. in the absence of glucose. Galactose substitution is commonly used to reduce ATP production from glycolysis and increase dependence on mitochondrial oxidative phosphorylation. We have revised the methods and results sections to make this point much clearer and now explicitly distinguish between low-glucose conditions and glucose-free/galactose conditions.

      Comment on significance and disease relevance:

      We appreciate the reviewer’s concern regarding the extent to which this model recapitulates the full spectrum of sensory pathology in FA. We agree that our culture system is rather artificial and might therefore not model the entire peripheral phenotype of FA.

      That being said, we believe the model remains highly relevant to a major and well-established component of FA neuropathology. Multiple neuropathological and clinical studies indicate that FA is characterized predominantly by a dorsal root ganglionopathy / sensory neuronopathy with marked involvement of large myelinated sensory neurons and their projections, which is central to the loss of proprioception and sensory ataxia that define the disease. Reviews of FA neuropathology consistently emphasize DRG hypoplasia/atrophy and loss of large myelinated fibers as hallmark features.

      We agree that small-fiber abnormalities have also been reported, including reduced intraepidermal nerve fiber density in some studies, and we do not wish to dismiss that aspect of the disease. However, the current literature still supports that the dominant and most characteristic peripheral lesion in FA affects large sensory neurons and large myelinated fibers more prominently than small fibers. We have therefore revised our wording and no longer state that the model “faithfully recapitulates” the full disease.

      Comment on novelty of AMPK findings:

      We agree that AMPK is a canonical metabolic stress sensor and that its activation in the context of severe mitochondrial dysfunction is not, by itself, unexpected. We have therefore revised the discussion to better frame the novelty of our study. In our view, the main contribution is not the mere observation of AMPK activation, but the demonstration, in frataxin-deficient primary sensory neurons, that AMPK activation is functionally linked to the defect in soma growth/maturation and that pharmacological AMPK inhibition can rescue this phenotype. We hope this distinction is now clearer in the revised manuscript.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: In the present study, the authors develop a new model of FA in cultured DRG neurons and show its relation to Fe-S deficiency. It is also associated with defects in mTOR signaling, ALA synthesis and AMPK.

      The conclusions are convincing and the work is thorough. The results are well presented, easily understood and repeatable.

      Reviewer #2 (Significance (Required)):

      While there have been hints at some of the findings (references to AMPK), they have not been so well documented before. Thus they are important. Is there any evidence of the present finding on cell size in the clinical literature (patient size, cell size) in non-DRG tissue? Might the present findings reflect a developmental event that drives the spinal cord hypoplasia?

      We sincerely thank Reviewer #2 for the very positive evaluation of our work. We are grateful for the recognition of the rigor, clarity, and reproducibility of the study, as well as for highlighting the relevance of our findings linking frataxin deficiency to Fe-S cluster impairment, mitochondrial dysfunction, and alterations in AMPK and mTOR signaling, as well as lipoic acid metabolism.

      We also thank the reviewer for the insightful comment regarding the potential relevance of our observations on reduced neuronal soma size.

      To our knowledge, there is no direct clinical evidence describing reduced neuronal cell size per se in patient tissues outside of the DRG. However, neuropathological studies of FA consistently report hypoplasia and atrophy of the DRG, characterized by a marked reduction in the size and number of sensory neurons, particularly affecting large neurons. These features are widely interpreted as reflecting a developmental defect rather than purely degenerative loss.

      More broadly, several studies have described spinal cord hypoplasia, including reduced cross-sectional area of the cord and thinning of posterior columns, which are thought to arise early in disease progression. These observations support the idea that impaired neuronal growth and maturation may be a key component of the pathology.

      In this context, we agree with the reviewer that our findings may reflect a developmental mechanism contributing to the hypoplasia observed in FA, rather than solely a degenerative process. Our in vitro data showing reduced soma size in frataxin-deficient sensory neurons, together with the involvement of AMPK/mTOR signaling pathways known to regulate cellular growth, are consistent with this hypothesis.

      We have now revised the discussion to incorporate this point and to more explicitly propose that bioenergetic stress and AMPK activation in frataxin-deficient neurons may limit neuronal growth and maturation during development, thereby contributing to the structural deficits observed in patients.

      At the same time, we have moderated our conclusions to emphasize that our model primarily captures cell-autonomous mechanisms in developing sensory neurons, and that further in vivo studies will be required to directly establish the contribution of these mechanisms to human pathology.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: In the present study, the authors develop a new model of Friedreich ataxia (FA), a disease caused by frataxin deficiency, using primary cultures of embryonic mouse Dorsal Root Ganglia neurons with complete frataxin depletion. This model reproduces key biochemical hallmarks of FA, including Fe-S enzyme deficiency, mitochondrial iron dysregulation, and oxidative stress. They also observe that these frataxin-deficient neurons exhibit a reduction in soma size. They claim that this defect is mediated by AMP-activated protein kinase (AMPK) hyperactivation and suppression of mTOR signaling, which occurs in response to mitochondrial dysfunction and redox imbalance. They are able to restore soma growth by genetic inhibition of AMPK or treatment with lipoic acid (ALA). The study is carried out meticulously, and the results are generally well presented, with the exception of a few specific experiments that will be noted below.

      Major points:

      • Mitochondrial iron was measured using the fluorescent iron sensor RPA. However, when using this probe, loss of signal can be caused by either increased iron or by loss of membrane potential. Thus, as mitochondrial membrane potential is decreased in the model used, it cannot be concluded from the results obtained that mitochondrial iron is increased. To confirm that mitochondrial iron is increased, the authors should either use a dequenching approach (as indicated in Petrat F, et al., Biochem J. 2002 362:137-47), or use another mitochondrial iron specific probe.

      • Authors describe that ALA treatment improves mitochondrial function and reduces oxidative stress, and they hypothesize that restored mitochondrial activity may contribute to AMPK downregulation. However, to provide a more mechanistic insight into this observation, it would be advisable to assess whether the indicated treatment is able to restore mitochondrial functionality by performing a Seahorse assay.

      • Authors state that their data supports a model in which full frataxin depletion first induces a deficit of Fe-S synthesis, subsequently triggering downstream consequences such as iron dysregulation and oxidative stress. This may be plausible for oxidative stress, as it has been measured at 15 div. However, as alterations in iron homeostasis have not been measured at 15 div, it cannot be concluded that they appear later than the deficiency in Fe-S proteins. The authors should measure TfR and FT-L expression at 15 div, or alternatively indicate in the discussion that it cannot be concluded whether the alteration in iron metabolism occurs after the deficiency in Fe-S proteins.

      Minor points:

      • Previous studies have reported dysregulation of the AMPK and mTOR signaling pathways in various models of Friedreich's ataxia. It would therefore be appropriate to highlight these findings in the discussion.

      • According to the authors, immunofluorescence confirmed efficient mitochondrial localization of mtLplA delivered via AAV9-mediated transduction (Fig. S5A). However, the image provided suggests partial co-localization. This should be acknowledged in the description of the results, or further data or measurements confirming such efficient mitochondrial localization should be provided.

      Reviewer #3 (Significance (Required)):

      General assessment: Authors present a new model of Friedreich ataxia (FA) in Dorsal Root Ganglia neurons. This new model offers the advantage of being conditional, allowing frataxin deficiency to be induced and enabling the analysis of the emergence of various alterations across different generations. However, it also presents the limitation of inducing a complete loss of frataxin, a condition that does not occur in patients, who typically exhibit only a partial deficiency of this protein. Although the experimental work presented is of generally good quality (aside from some minor issues previously noted), it remains unclear whether the study provides substantial advances to the field of Friedreich's ataxia. The conditional nature of the model would, in principle, allow for a deeper exploration of mechanistic aspects underlying how frataxin deficiency leads to the observed phenotypes; however, this potential is not fully exploited in the current manuscript. In this context, the proposed relationship among energy deficiency, AMPK hyperactivation, and treatment with lipoic acid would be considerably strengthened by analyzing the effects of this compound on mitochondrial respiration.

      Advance: The effects of frataxin deficiency on DRGs had been previously addressed by other authors. In this new model, the authors describe a series of phenotypes, most of which have already been reported in other models of the disease (including models using DRGs). On the one hand, this reinforces the validity of the model, but on the other, it reduces the novelty of the observations presented.

      We thank Reviewer #3 for the careful evaluation of our manuscript and for the constructive and insightful comments. We are grateful for the positive appreciation of the overall quality of the study and for the suggestions that helped us improve the rigor and clarity of our work.

      Major points:

      Iron probe

      We thank the reviewer for this important remark. We agree that RPA fluorescence depends both on mitochondrial membrane potential and iron-dependent quenching. To address this point, we performed iron modulation experiments. Treatment with a membrane-permeant iron chelator strongly increased RPA fluorescence in both CT and KO neurons, whereas iron loading with ferric ammonium citrate (FAC) decreased the signal in both conditions. These bidirectional changes demonstrate that RPA is efficiently targeted and remains fully responsive to mitochondrial iron in KO neurons, arguing against impaired probe loading as the primary cause of the reduced basal signal.

      Nevertheless, to exclude any potential contribution of mitochondrial membrane potential differences, we propose to complement these experiments with an independent mitochondrial iron probe, Mito-FerroGreen, which detects mitochondrial Fe²⁺ via a distinct mechanism, independent of mitochondrial membrane potential. We would need about 8 weeks to perform these experiments.

      Effect of ALA on mitochondrial function

      We thank the reviewer for this suggestion. We agree that assessing mitochondrial respiration would provide additional mechanistic insight into the effect of alpha-lipoic acid (ALA). In the original version, we had data showing that ALA treatment restores intracellular ATP levels, suggesting an improvement of mitochondrial function. However, we agree that this is not formal proof. For the revised version, we propose to measure mitochondrial membrane potential as a proxy for mitochondrial function. While we agree that Seahorse-based analysis of oxygen consumption would be highly informative, these experiments require substantial time in primary DRG cultures and would significantly delay the revision. If the reviewer or editor considers this essential, however, we could perform these experiments.

      Temporal relationship between Fe-S deficiency and iron dysregulation

      We thank the reviewer for this important comment.

      In response, we have now analyzed markers of iron homeostasis (TFR1 and FRTL) at 15 DIV, the same time point at which Fe-S protein deficiency is already evident. These new data show that iron homeostasis is not significantly altered at this stage, supporting our interpretation that Fe-S deficiency precedes detectable changes in iron metabolism.

      We have included these new results in the revised manuscript (Fig. S2E) and clarified the temporal sequence in the results and discussion sections.

      Minor points:

      1. We thank the reviewer for this suggestion. We have expanded the discussion to better acknowledge previous studies reporting dysregulation of AMPK and mTOR signaling pathways in various models of Friedreich ataxia, and we now position our findings within this existing body of work.
      2. We thank the reviewer for this important observation. We agree that the immunofluorescence data indicate partial, rather than complete, co-localization of mtLplA with mitochondrial markers. We believe this is most likely due to high levels of mtLplA overexpression, leading to partial saturation of the mitochondrial import machinery and consequently incomplete mitochondrial targeting. This interpretation is supported by our western blot analysis (Fig. S5B), which shows the presence of two bands corresponding to processed (mitochondrial) and unprocessed (non-imported) forms of the protein. We have revised the text accordingly to more accurately reflect these observations.

      We thank the reviewer for the thoughtful evaluation of the significance of our work and for highlighting both the strengths and limitations of our model. We agree that our model, based on complete frataxin depletion, does not fully recapitulate the partial deficiency observed in patients with FA. However, we believe that this approach provides a valuable experimental advantage, allowing us to precisely control the timing of frataxin loss, investigate early cellular events, and dissect cell-autonomous mechanisms in sensory neurons. We have revised the manuscript to more clearly acknowledge this limitation.

      Regarding novelty, we agree that several individual phenotypes observed in our study (e.g., Fe-S deficiency, oxidative stress, mitochondrial dysfunction) have been reported in previous models. However, we would like to emphasize that our model enables the integration of these features within a single conditional system in primary sensory neurons, and importantly allows us to uncover a functional link between bioenergetic stress, AMPK activation, and impaired neuronal growth.

      In particular, our data identify AMPK as a key mediator of soma size reduction, and demonstrate that its inhibition can rescue this phenotype. We believe this provides a novel mechanistic connection between mitochondrial dysfunction and neuronal growth regulation in frataxin-deficient sensory neurons.

      Finally, we have revised the discussion to better highlight both the strengths and limitations of the model, and to more clearly position our findings as contributing to the understanding of early pathogenic mechanisms and developmental aspects of sensory neuron dysfunction in FA.

    1. “Tsze-kung asked, saying, ‘Is there one word which may serve as a rule of practice for all one’s life?’ The Master said, ‘Is not reciprocity such a word? What you do not want done to yourself, do not do to others.’” Confucius, Analects 15.23 [b9] (~500 BCE China)

      The quote from Confucius reminds me that when we interact with others, we should learn to put ourselves in their position. It made me think that a lot of conflicts or misunderstandings could be avoided if people just asked themselves, “Would I want to be treated this way?” I think that this quote is practical and it can be something you can apply in small, daily interactions. To me, it emphasizes empathy and mutual respect. It also makes me see the importance of understanding others and acting with consideration.

  5. Mar 2026
    1. Reviewer #2 (Public review):

      In 'Developmental constraints mediate the summer solstice reversal of climate effects on European beech bud set [their original title]' Rebindaine and co-authors report on two experiments on Fagus sylvatica where they manipulated temperatures of saplings between day and night and at different times of year. I think the experiments are interesting, but I found the exact methods somewhat extreme compared to how the authors present them. Further, given that much of the experiment happened outside, I am not sure how much we can generalize from one year for each experiment, especially when conducted on one population of one species. I was also very concerned by the revisions.

      I expand briefly on these concerns and a few others for readers of the paper (see 'The comments below relate to my original review'). Subsequent edits to the paper addressed some of these by providing a new figure and moving around the methods. Further, I am at a loss about their hypothesis, when they write in their letter: "Importantly, the Solstice-as-Phenology-Switch hypothesis does not assume that the reversal is fixed to June 21." Why on earth reference the solstice if the authors do not mean to exactly reference the solstice?

      The comments below relate to my original review with many of them still applying.

      Methods: As I read the Results I was surprised the authors did not give more info on the methods here. For example, they refer to the 'effect of July cooling' but never say what the cooling was. Once I read the methods I feared they were burying this as the methods feel quite extreme given the framing of the paper. The paper is framed as explaining observational results of natural systems, but the treatments are not natural for any system in Europe that I have worked in. For example, a low of 2 °C at night and 7 °C during the day through end of May and then 7/13 °C in July is extreme. I think these methods need to be clearly laid out for the reader so they can judge what to make of the experiment before they see the results.

      I also think the control is confounded with growth chamber experience in Experiment 1. That is, the control plants never experience any time in a chamber, but all the treatments include significant time in a chamber. The authors mention how detrimental chamber time can be to saplings (indeed, they mention an aphid problem in experiment 2) so I think they need to be more upfront about this. The study is still very valuable, but -- again -- we may need to be more cautious in how much we infer from the results.

      Also, I suggest the authors add a figure to explain their experiments as they are very hard to follow. Perhaps this could be added to Figure 1?

      Finally, given how much the authors extrapolate to carbon and forests, I would have liked to see some metrics related to carbon assimilation, versus just information on timing.

      Fagus sylvatica: Fagus sylvatica is an extremely important tree to European forests, but it also has outlier responses to photoperiod and other cues (and leafs out very late) so using just this species to then state 'our results likely are generalisable across temperate tree species' seems questionable at best.

      Measuring end of season (EOS): It's well known that different parts of plants shut down at different times and each metric of end of season -- budset, end of radial expansion, leaf coloring etc. -- relates to different things. Thus I was surprised that the authors ignore all this complexity and seem to equate leaf coloring with budset (which can happen MONTHS before leaf coloring often) and with other metrics. The paper needs a much better connection to the physiology of end of season and a better explanation for the focus on budset. Relatedly, I was surprised the authors cite almost none of the literature on budset, which generally suggests it is heavily controlled by photoperiod and population-level differences in photoperiod cues, meaning results may differ with a different population of plants.

      Somewhat minor comments:

      (1) How can a bud type -- which is apical or lateral -- be a random effect? The model needs to try to estimate a variance for each random effect so doing this for n=2 is quite odd to me. I think the authors should also report the results with bud type as fixed, or report the bud types separately.

      (2) I didn't fully see how the authors' results support the Solstice as Switch hypothesis, since what timing mattered seemed to depend on the timing of treatment and was not clearly related to solstice. Could it be that these results suggest the Solstice as Switch hypothesis is actually not well supported (e.g., line 135) and instead suggest that the pattern of climate in the summer months affects end of season timing?
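      The concern in minor comment (1) can be illustrated with a small simulation. The sketch below is not taken from the paper or the review: the function name, the number of simulations, and the true standard deviation are all hypothetical, and it makes the simplifying assumption that the level effects are observed directly, without residual noise. It only shows how loosely a variance is pinned down when it must be estimated from two levels rather than many.

```python
import random
import statistics


def variance_estimate_interval(n_levels, true_sd=1.0, n_sims=5000, seed=1):
    """Central 95% range of sample-variance estimates from n_levels draws.

    Hypothetical helper: each simulation draws n_levels group effects from
    Normal(0, true_sd) and estimates their variance, mimicking (in a very
    simplified way) what a random-effect variance estimate has to work with.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_sims):
        effects = [rng.gauss(0.0, true_sd) for _ in range(n_levels)]
        estimates.append(statistics.variance(effects))
    estimates.sort()
    lower = estimates[int(0.025 * n_sims)]
    upper = estimates[int(0.975 * n_sims)]
    return lower, upper


# Two levels (e.g. apical vs lateral) compared with a factor that has 20 levels.
print("2 levels :", variance_estimate_interval(2))
print("20 levels:", variance_estimate_interval(20))
```

      With only two levels the estimates range from almost zero to several times the true variance, which is the reason a fixed-effect specification is often suggested for two-level factors.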

    2. Author Response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This article presents valuable findings on how the timing of cooling affects the timing of autumn bud set in European beech saplings. The study leverages extensive experimental data and provides an interesting conceptual framework of the various ways in which warming can affect bud set timing. The support for the findings is incomplete, though extra justifications of the experimental settings, clarifications of the interpretation of the results, and alternative statistical analyses can make the conclusions more robust.

      We thank the editors and reviewers for their expert assessment of our findings and their interest in our conceptual framework. Below we respond to the specific reviewer and editor comments.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study provided key experimental evidence for the "Solstice-as-Phenology-Switch Hypothesis" through two temperature manipulation experiments.

      Strengths:

      The research is data-rich, particularly in exploring the effects of pre- and post-solstice cooling, as well as daytime versus nighttime cooling, on bud set timing, showcasing significant innovation. The article is well-written, logically clear, and is likely to attract a wide readership.

      Thank you for your generous description of our study and the manuscript.

      Weaknesses:

      However, there are several issues that need to be addressed.

      (1) In Experiment 1, significant differences were observed in the impact of cooling in July versus August. July cooling induced a delay in bud set dates that was 3.5 times greater in late-leafing trees compared to early-leafing ones, while August cooling induced comparable advances in bud set timing in both early- and late-leafing trees.

      The study did not explain why the timing (July vs. August) resulted in different mechanisms. Can a link be established between phenology and photosynthetic product accumulation? Additionally, can the study differentiate between the direct warming effect and the developmental effect, and quantify their relative contributions?

      We thank the reviewer for pointing out that we could improve our explanation of the different responses to July and August cooling in experiment 1. Whilst we incorporated this in the conceptual model and the figure caption (Fig. 1b), we now also address this topic in more depth in the discussion section, focussing on daylength and photosynthetic assimilation as the possible mediators of this change in responses (L350-371).

      For the early-season development effect vs the late-season temperature effect we can use the leaf-out day-of-year (as a proxy for development), and the summer cooling treatments (direct temperature effect) to assess the relative importance of these two components of our model. We have now included a variance partitioning analysis following this logic, see L246-252 for methods, L278-281 for results.
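      A minimal sketch of one way such a partitioning can be set up is given below. It is an assumption, not the authors' analysis: the manuscript's actual model is not described here, and the function names and the example numbers are hypothetical. The sketch uses two predictors (leaf-out day-of-year as the developmental proxy and a 0/1 cooling indicator as the direct temperature effect) and splits the explained variance in bud set date into unique and shared fractions from the R² values of the single-predictor and two-predictor linear models.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partition_variance(y, x1, x2):
    """Split R^2 of y ~ x1 + x2 into unique, shared and unexplained parts."""
    r_y1, r_y2, r_12 = pearson_r(y, x1), pearson_r(y, x2), pearson_r(x1, x2)
    # R^2 of the two-predictor linear model, written in terms of correlations.
    r2_full = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
    return {
        "unique_x1": r2_full - r_y2**2,   # gained by adding x1 to a model with x2
        "unique_x2": r2_full - r_y1**2,   # gained by adding x2 to a model with x1
        "shared": r_y1**2 + r_y2**2 - r2_full,
        "unexplained": 1 - r2_full,
    }


# Made-up illustrative values, not data from the study.
leaf_out_doy = [105, 108, 110, 128, 131, 134, 106, 109, 129, 133]
cooled = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
bud_set_doy = [228, 231, 230, 236, 239, 241, 233, 235, 252, 255]

print(partition_variance(bud_set_doy, leaf_out_doy, cooled))
```

      The shared fraction can be negative when the predictors act as suppressors, so the four parts should be read as a bookkeeping of R² rather than as strict proportions.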

      (2) The two experimental setups differed in photoperiod: one used a 13-hour photoperiod at approximately 4,300 lux, while the other used an ambient day length of 16 hours with a light intensity of around 6,900 lux. What criteria were used to select these conditions, and do they accurately represent real-world scenarios? Furthermore, as shown in Figure S1, significant differences in soil moisture content existed between treatments - could this have influenced the conclusions?

      This question may reflect a misunderstanding regarding the light availability that we hope to address with improved clarification. The duration and intensity of the lighting in these experiments were always set to reflect the average conditions experienced in Zurich for those respective times of the year. Day length in spring is shorter than it is in summer, so the durations were simply adjusted to reflect this reality. The 13-hour, 4,300 lux conditions in experiment 1 were only for the April-May period, when we reduced developmental rates for the late-leafing trees (L125-129). In July, the photoperiod was set to 16 hours and light intensity was approximately 7,300 lux (L150-154). This is comparable to experiment 2 (when treatments were applied in June and July), where photoperiod was 16 hours and light intensity approximately 6,900 lux (L206-207). These conditions reflect the average daylengths in Zurich, and the maximum light intensity output by the chambers.

      As mentioned in our initial author response, we do not think small differences in soil moisture levels should influence our conclusions. All pots were watered sufficiently to avoid water deficit, and all efforts were made to minimise differences in water availability. A Tukey honest significant difference test showed that only one treatment pair (6 - Late_July_Extreme vs. 7 - Early_August_Moderate, difference = 6%, p < 0.05) had significantly different soil water content, a pair whose responses are not compared. We have added words to this effect in the figure legend of Fig. S1.

      (3) The authors investigated how changes in air temperature around the summer solstice affected primary growth cessation, but the summer solstice also marks an important transition in photoperiod. How can the influence of photoperiod be distinguished from the temperature effect in this context?

      We agree that photoperiod likely plays a central role. Our conceptual model (Fig. 1) explicitly incorporates photoperiod as the framework within which temperature responses are regulated (L72-75, L627-629 & L638-641). The Solstice-as-Phenology-Switch hypothesis assumes that the annual progression of daylength sets the physiological “window” for trees’ responsiveness to temperature. Our experiments therefore focused on how temperature responses differ before versus after the solstice, while recognising that this reversal is likely enabled by the photoperiod signal. In other words, photoperiod provides the regulatory backdrop, and our results identify how diel and seasonal temperature cues are interpreted within that photoperiodic framework.

      (4) The study utilized potted trees in a controlled environment, which limits the generalization of the results to natural forests. Wild trees are subject to additional variables, such as competition and precipitation. Moreover, climate differences between years (2022 vs. 2023) were not controlled. As such, the conclusions may be overgeneralized to "all temperate tree species", as the experiment only involved potted European beech seedlings. The discussion would benefit from addressing species-specific differences.

      We agree that extrapolation from our experiments on Fagus sylvatica to other species and natural forests requires caution. However, it is precisely the controlled nature of our design that allowed us to isolate the precise mechanisms that appear to underpin the solstice switch, highlighting the role of diel and seasonal temperature variation. In natural systems, additional variables such as competition, precipitation, and soil heterogeneity can strongly influence phenology, but they also make it difficult to disentangle causal mechanisms. By minimising these confounding factors, our experiment provided a clear test of how temperature before and after the solstice regulates growth cessation.

      To acknowledge the limitation, we have toned down statements about generalisation (e.g. “likely generalisable” to “other temperate tree species may display similarities”; L409-411) and explicitly call for follow-up studies across species and forest contexts (L413–414). At the same time, we highlight that our findings align with independent evidence from manipulative experiments, satellite observations, flux measurements, and ground-based phenology, which suggests the mechanisms we report may extend beyond the specific populations studied here.

      Reviewer #2 (Public review):

      In 'Developmental constraints mediate the summer solstice reversal of climate effects on European beech bud set', Rebindaine and co-authors report on two experiments on Fagus sylvatica where they manipulated temperatures of saplings between day and night and at different times of year. I enjoyed reading this paper and found it well written. I think the experiments are interesting, but I found the exact methods somewhat extreme compared to how the authors present them. Further, given that much of the experiment happened outside, I am not sure how much we can generalize from one year for each experiment, especially when conducted on one population of one species. I next expand briefly on these concerns and a few others.

      Thank you for the kind comments. We appreciate your concerns regarding the severity of our treatments and the generalisability of our results, and you can find our detailed responses below.

      Concerns:

      (1) As I read the Results, I was surprised the authors did not give more information on the methods here. For example, they refer to the 'effect of July cooling' but never say what the cooling was. Once I read the methods, I feared they were burying this as the methods feel quite extreme given the framing of the paper. The paper is framed as explaining observational results of natural systems, but the treatments are not natural for any system in Europe that I have worked in. For example, a low of 2 °C at night and 7 °C during the day through the end of May and then 7/13 °C in July is extreme. I think these methods need to be clearly laid out for the reader so they can judge what to make of the experiment before they see the results.

      We understand the concern regarding the structure of the manuscript and note that the methods section was moved to the end of the paper in accordance with eLife’s recommended formatting. We have now moved the methods section before the results to ensure that readers are familiar with the treatments before encountering the outcomes.

      We recognise that our temperature treatments were severe and do not mimic real world scenarios. They were deliberately designed to create large contrasts in developmental rates, thereby maximising our ability to detect the mechanisms underpinning the solstice switch. For example, the severe cooling between 4 April and 24 May was specifically designed to slow spring development as much as possible without damaging the plants (L129-L133). We have added text in the Methods to clarify this aim (L129-131 & L156-161).

      Regarding presentation, treatment details are now described in both the Methods and the relevant figure legends. Given this structure, we have chosen not to restate the full treatment conditions in the main Results text to avoid repetition.

      (2) I also think the control is confounded with the growth chamber experience in Experiment 1. That is, the control plants never experience any time in a chamber, but all the treatments include significant time in a chamber. The authors mention how detrimental chamber time can be to saplings (indeed, they mention an aphid problem in experiment 2), so I think they need to be more upfront about this. The study is still very valuable, but again, we may need to be more cautious in how much we infer from the results.

      We appreciate the reviewer’s concern about the potential confounding effect of chamber exposure in experiment 1. We have now discussed this limitation more explicitly, adding further explanation to the Methods (L146-148) and Discussion (L345-346).

      Note that chamber-related problems (e.g. aphid infestations) primarily occurred under warm chamber conditions, whereas our experiment 1 cooling treatments maintained low temperatures that suppressed such issues. This means that an equivalent “warm chamber control” could have been associated with its own artefacts, as trees kept under warm chamber conditions would have been exposed to additional stressors that were not present under natural growing conditions. To address this point, we included a chamber control in experiment 2. While aphid abundance was indeed higher in the warm chamber controls, chamber exposure itself had no detectable effect on autumn phenology. This suggests that the main findings of experiment 1 are unlikely to be artefacts of chamber conditions (L141-145).

      Nevertheless, we agree that chamber exposure remains a potential limitation of experiment 1, which requires clear acknowledgement. We now state this more explicitly in the manuscript while also emphasising that our results are supported by experiment 2 and by converging lines of external evidence.

      (3) I suggest the authors add a figure to explain their experiments, as they are very hard to follow. Perhaps this could be added to Figure 1?

      We have now added figures to the methods section to depict the experimental timelines and settings more clearly (Figs. 2 and 3).

      (4) Given how much the authors extrapolate to carbon and forests, I would have liked to see some metrics related to carbon assimilation, versus just information on timing.

      We agree that including more data on photosynthetic assimilation would be valuable for interpreting phenological responses. Indeed, it was our intention to collect this information. Unfortunately, we experienced technical challenges with the equipment available to us during the experimental period, which prevented us from collecting a full dataset. Nevertheless, we were able to obtain measurements during pre-solstice cooling (now presented as Fig. S12, including data for all treatments), which show that cooling treatments strongly reduced assimilation rates compared to controls. Importantly, these strong reductions occurred across all cooling treatments, yet their phenological outcomes differed markedly, demonstrating that assimilation alone cannot explain the observed responses. As we discuss, our findings are consistent with previous manipulative and observational studies reporting a weak role of late-season assimilation in controlling autumn phenology.

      (5) Fagus sylvatica is an extremely important tree to European forests, but it also has outlier responses to photoperiod and other cues (and leafs out very late), so using just this species to then state 'our results likely are generalisable across temperate tree species' seems questionable at best.

      We agree that Fagus sylvatica has a stronger photoperiod dependence than many other European tree species. As we note in our response to Reviewer 1 (comment 4), our findings align with previous research across temperate northern forests. Within our framework, interspecific variation in leaf-out timing would not alter the overall response pattern, though it could shift the specific timing of effect reversals. For example, earlier-leafing species may approach completion of development sooner and thus show sensitivity to late-season cooling earlier than F. sylvatica. Nevertheless, we acknowledge the importance of not overstating generality. We have therefore revised the manuscript to phrase conclusions more cautiously (L409–411) and highlight the need for further research across species (L413–414).

      (6) Another concern relates to measuring the end of season (EOS). It is well known that different parts of plants shut down at different times, and each metric of end of season - budset, end of radial expansion, leaf coloring, etc - relates to different things. Thus, I was surprised that the authors ignore all this complexity and seem to equate leaf coloring with budset (which can happen MONTHS before leaf coloring often) and with other metrics. The paper needs a much better connection to the physiology of end of season and a better explanation for the focus on budset. Relatedly, I was surprised that the authors cite almost none of the literature on budset, which generally suggests it is heavily controlled by photoperiod and population-level differences in photoperiod cues, meaning results may be different with a different population of plants.

      We thank the reviewer for pointing out that our discussion of the responses of different EOS metrics needs more clarity. We agree with much of this perspective, and we have added an additional analysis of leaf chlorophyll content data to use leaf discolouration as an alternative EOS marker (L179-195 for methods, L296-311 for results). On this we would like to make two important points:

      Firstly, we agree that bud set often occurs before leaf discolouration, although this can depend on which definition of leaf discolouration is used. In experiment 1, bud set occurred on average on day-of-year (DOY) 262 and leaf senescence (50% loss of leaf chlorophyll) occurred on DOY 320. However, we do not necessarily agree that this excludes the combined discussion of bud set and leaf senescence timing. Whilst environmental drivers can affect parts of plants differently, often responses from different end-of-season indicators (e.g. bud set and loss of leaf chlorophyll) are similar, even if only directionally. Figure S11 shows how, across both experiments, treatment effects were tightly conserved (R² = 0.49) amongst the two phenometrics. In accordance with these revisions, we have updated the manuscript title to “Developmental constraints mediate the summer solstice reversal of climate effects on the autumn phenology of European beech” (L1-2).

      Secondly, shifts in bud set timing remain the primary focus of the manuscript as these shifts are of direct physiological relevance to plant development and dormancy induction, whereas leaf discolouration may simply follow bud set as a symptom of developmental completion. This is supported by our results, which show stronger responses of bud set than leaf senescence (Figs. 4 & 5 vs. Figs. S9 & S10).

      Following the reviewer’s suggestion, we have included more references on the topic of bud set and its environmental controls. The reviewer rightly stresses that photoperiod is considered the most important factor. As mentioned above (see Reviewer 1 comment 3), photoperiod is therefore key in our conceptual model. However, the responses we observed in F. sylvatica cannot be explained by photoperiod alone. For example, in experiment 1, July cooling delayed the autumn phenology of late-leafing trees but had negligible impact on early-leafing trees, even though both experienced the exact same photoperiod. Moreover, in experiment 2, day, night and full-day cooling showed substantial variations in their effects despite equal photoperiod across the climate regimes. This is why we suggest that the annual progression of photoperiod modulates the responses to temperature variations instead of eliciting complete control.

      (7) I didn't fully see how the authors' results support the Solstice as Switch hypothesis, since what timing mattered seemed to depend on the timing of treatment and was not clearly related to the solstice. Could it be that these results suggest the Solstice as Switch hypothesis is actually not well supported (e.g., line 135) and instead suggest that the pattern of climate in the summer months affects end-of season timing?

      We interpret this concern as relating to the flexibility in reversal timing that we observed. Importantly, the Solstice-as-Phenology-Switch hypothesis does not assume that the reversal is fixed to June 21. Rather, the hypothesis implies that reversal occurs around the solstice, when photoperiod cues cause tree individuals to shift from accelerating to decelerating their seasonal development. Our conceptual model (Fig. 1) explicitly incorporates this flexibility by showing how the timing of the reversal depends on developmental speed: individuals that develop more slowly (or leaf out later) cross the compensatory point later in the summer, whereas fast-developing individuals reach it earlier.

      Our experiments support this framework: pre-solstice full-day cooling delayed bud set, whereas post-solstice full-day cooling advanced it, with differences between early- and late-developing individuals consistent with the model. Moreover, the contrasting impacts of daytime vs. night-time cooling demonstrate how diel conditions can further shape when the reversal is expressed. Thus, rather than contradicting the Solstice-as-Phenology-Switch hypothesis, our findings reinforce it and extend it by showing how flexibility arises from interactions between developmental progression, diel temperature responses, and photoperiod.

      We have added an additional section in the Discussion that elaborates on how our results support the Solstice-as-Phenology-Switch hypothesis (L416-432).

      Recommendations for the authors:

      Reviewing Editor (Recommendations for the authors):

      (1) The current strength of evidence is incomplete. Extra justifications of the experimental settings, clarifications of the interpretation of the results, and alternative statistical analyses could make the conclusions more solid.

      We agree with the vast majority of the reviewer comments and have made the relevant edits. We believe that these have dramatically improved the clarity of the manuscript. The revised analyses have not changed our conclusions, though we have toned down generalisations.

      (2) The Solstice as Switch hypothesis is about the effect of temperature warming. However, the two experiments did not simulate warming but rather cooling. Although a temperature difference can be obtained compared to the control in both cases, the impacts on plant physiology and phenology should still be different between the two scenarios.

      Thank you for raising this point, which requires clearer communication in our manuscript. The Solstice-as-Phenology-Switch hypothesis posits that changes in temperature before and after the summer solstice have opposite effects on the autumn phenology of northern forest trees. While the hypothesis has most often been framed in terms of warming, the underlying mechanism concerns whether development is accelerated or slowed relative to ambient conditions. In essence, we are exploring the effect of changes in temperature – not warming per se. In warmer springs, development begins earlier and/or proceeds faster, while in colder springs the opposite occurs; the same logic applies to post-solstice conditions. We have extended our explanation in the Introduction (L69-71).

      In our experiments, we applied cooling to create strong contrasts in developmental rates without damaging the trees. These treatments allow us to test the direction of phenological responses relative to ambient conditions. Thus, although we used cooling rather than warming, the results are directly informative for the Solstice-as-Switch framework, which concerns the relative effect of temperature changes rather than the absolute direction of manipulation.

      (3) The number of groups for bud type and summer temperature treatment is too small to be used as a random effect; it would be more appropriate to treat them as fixed-effect terms.

      We have revised the analysis to include bud type as a fixed effect. There are only very minor numerical adjustments (e.g. rounding to 4.8 days instead of 4.9, see L271) and inferences are not altered. We also report the bud type effects for experiment 1 (L262-266) and experiment 2 (L292-293).

      (4) Please add more clarifications for Figure 4 about what this figure is for and how you derived this figure, whether the data were from your experiments or others.

      We have rewritten the caption for Figure 6 (Fig. 4 in the previous manuscript) to clarify where the data came from and how the figure was generated (L687-693). This figure serves as a visual guide to aid the understanding of the processes that may govern the patterns we have observed. Figure 6a uses data from previous studies on diel patterns in F. sylvatica, specifically growth (Zweifel et al., 2021) and photosynthetic assimilation rates (Urban et al., 2014). To aid visualisation, we linearly interpolated between measurement points, converted the values to a relative percentage (compared to the observed maximum), and then smoothed the resulting curves. Based on the evidence from experiment 2, we suggest there may be a temperature threshold below which overwintering responses (e.g. bud set) are induced in F. sylvatica. Figure 6b depicts a theoretical diel pattern of this potential threshold. In simple terms, the threshold must be lower at night because nights are typically colder than days.
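
      The three processing steps described for Figure 6a (linear interpolation, conversion to a percentage of the observed maximum, smoothing) can be sketched as follows. This is only an illustrative sketch: the function names, the example values and the choice of a centred moving average as the smoother are assumptions, since the response does not state which smoothing method was used.

```python
def interpolate_linear(times, values, grid):
    """Linearly interpolate the points (times, values) onto each time in grid."""
    out = []
    for t in grid:
        for i in range(len(times) - 1):
            t0, t1 = times[i], times[i + 1]
            if t0 <= t <= t1:
                v0, v1 = values[i], values[i + 1]
                out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
                break
        else:
            raise ValueError(f"{t} lies outside the measured range")
    return out


def to_relative_percent(values):
    """Express each value as a percentage of the observed maximum."""
    peak = max(values)
    return [100.0 * v / peak for v in values]


def moving_average(values, window=3):
    """Centred moving average; the window shrinks at the two ends.

    Assumed smoother, chosen only for illustration.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# Hypothetical sparse diel measurements (hour of day, arbitrary rate units).
hours = [0, 6, 12, 18, 24]
rate = [1.0, 2.0, 8.0, 4.0, 1.0]

hourly_grid = list(range(0, 25))
curve = moving_average(
    to_relative_percent(interpolate_linear(hours, rate, hourly_grid)),
    window=3,
)
```

      Note that smoothing after normalisation lowers the peak slightly below 100%, so the order of the last two steps affects the plotted maximum.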

      Reviewer #2 (Recommendations for the authors):

      (1) How can a bud type -- which is apical or lateral -- be a random effect? The model needs to try to estimate a variance for each random effect, so doing this for n=2 is quite odd to me. I think the authors should also report the results with bud type as fixed, or report the bud types separately.

      See point (3) in reviewing editor’s recommendations for the authors.

      (2) Could the authors move the methods earlier and remind readers of them in the results?

      We have addressed this issue; please see the detailed response under Reviewer 2’s concerns.

      Urban O, Klem K, Holišová P, Šigut L, Šprtová M, Teslová-Navrátilová P, Zitová M, Špunda V, Marek MV, Grace J. 2014. Impact of elevated CO2 concentration on dynamics of leaf photosynthesis in Fagus sylvatica is modulated by sky conditions. Environmental Pollution 185: 271–280.

      Zweifel R, Sterck F, Braun S, Buchmann N, Eugster W, Gessler A, Häni M, Peters RL, Walthert L, Wilhelm M, et al. 2021. Why trees grow at night. New Phytologist 231: 2174–2185.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Throughout the paper, the authors do a fantastic job of highlighting caveats in their approach, from image acquisition to analysis. Despite this, some conclusions and viewpoints portrayed in this study do not appear well-supported by the provided data. Furthermore, there are a few technical points regarding the analysis that should be addressed.

      We thank the reviewer for the comments. Due to the age of the work and logistical constraints, we are unable to perform further experiments and analyses to address some of the concerns. We have revised the conclusions and viewpoints accordingly to reflect the reviewer’s concerns.

      (1) Analysis of signaling traces

      Relevance of "modeled signaling level": It is not clear whether this added complexity and potential for error (below) provides benefits over a more simple analysis such as taking the derivative (shown in Figure 3C). Could the authors provide evidence for the benefits? For example, does the "maximal response" given a simpler metric correlate less well with cell fate than that calculated from the fitted response?

      We think the benefit of the modeled signaling level is its conceptual accuracy, to the extent possible with the data. It’s true that the assumptions it brings in may cause certain biases. We therefore present both this analysis and the simplest one (raw data averaging, Fig. 2). Intermediate results in between (such as the first derivative in Fig. 3C) may correlate more or less well, but cannot be interpreted biologically.

      Assumptions for "modeled signaling level": According to equation (1) Kaede levels are monotonically increasing. This is assumed given the stability of the fluorescent protein. However, this only holds for the "totally produced Kaede/fluorescence." Other metrics such as mean fluorescence can very well decrease over time due to growth and division. Does "intensity" mean total fluorescence? Visual inspection of the traces shown in Figure 2 suggests that "fluorescence intensity" can decrease. What does this mean for the inferred traces?

      Yes, the segmentations measure intensity in a fixed volume inside a cell; it is therefore a spatial average (concentration) and is susceptible to cell volume changes. This has been noted in the revision. The raw measurement does fluctuate and can decrease; we think the short-time-scale fluctuations are likely measurement variations/errors rather than large underlying changes in concentration.
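
      The distinction between total fluorescence and a spatial average can be illustrated with a small numerical sketch. All numbers below are hypothetical and are not taken from the study; they only show that a concentration-type measurement can fall while the total amount of a stable reporter keeps rising.

```python
# Hypothetical values, for illustration only.
total_fluorescence = [100.0, 120.0, 130.0, 135.0]  # stable reporter: total never decreases
cell_volume = [1.0, 1.1, 1.5, 1.6]                 # the cell grows over the same time points

# A measurement in a fixed sub-volume behaves like a concentration: total / volume.
mean_intensity = [f / v for f, v in zip(total_fluorescence, cell_volume)]

total_is_monotonic = all(
    later >= earlier
    for earlier, later in zip(total_fluorescence, total_fluorescence[1:])
)
mean_is_monotonic = all(
    later >= earlier
    for earlier, later in zip(mean_intensity, mean_intensity[1:])
)
```

      In this toy case the total rises at every step, yet the mean intensity drops once volume growth outpaces reporter production.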

      Estimation of Kaede reporter half-life: It is not clear how the mRNA stability of Kaede is estimated. It sounds like it was just assessed visually, which seems not entirely appropriate given the quantitative aspects of the rest of the study. Also, given that Shh signaling was inhibited on the level of Smoothened, it is not obvious how the dynamics of signaling shutdown affect the estimate. Most results in Figure 7 seem to be quite robust to the estimate of the half-life. That they are might suggest that the whole analysis is unnecessary in the first place. However, not all are. Thus, it would be important to make this estimate more quantitative.

      Yes, we agree. Unfortunately, we don’t have the quantitative data required to better estimate Kaede mRNA stability. The time from Cyc inhibition to the cessation of ptch mRNA production is roughly estimated and not necessarily precise in this context.

      (2) Assignment of fates and correlations

      Error estimate for cell-type assignment: Trying to correlate signaling traces to cell fate decisions requires accurate cell fate assignment post-tracking. The provided protocol suggests a rather manual, expert-directed process of making those decisions. Can the authors provide any error-bound on those decisions, for example comparing the results obtained by two experts or something comparable? I am particularly concerned about the results regarding the higher degree of variability in the correlation between signaling dynamics and cell fate in the posterior neural tube. Here, the expression of Olig2 does not seem to segregate between different assigned fates, while it does so nicely in the anterior neural tube. This would suggest to me that cells in the posterior neural tube might not yet be fully committed to a fate or that there could be a relatively high error rate in assigning fates. Thus, the results could emerge from technical errors or differences in pure timing. Could the authors please comment on these possibilities?

      This is a very insightful point. We re-examined the posterior data (cross-checked by two co-authors) to make sure that cell fates in the mixed situation are correctly assigned. As established by others’ and our previous studies (see also Fig. 1A), the identification of MFPs and LFPs in the zebrafish spinal cord is very robust. The MFPs are the apically constricted single column of cells along the midline on top of the notochord, and the LFPs are the two columns of cells next to the MFPs on both sides. LFPs’ expression of olig2:gfp did vary more in the posterior (timing of response/commitment could be a factor, as the reviewer pointed out), but eventually the cells at those positions will be V3 interneurons or floor plates and have not been observed to make motoneurons. There are 3 low Olig2:GFP pMNs in the anterior dataset (Fig. 2B’) and 3 high Olig2:GFP LFPs in the posterior dataset (Fig. 2D’) that we checked carefully. The heterogeneity argument is based on the verified tracking and final positioning of these cells.

      Clustering and fates: One approach the authors use to analyze the correlation between signaling and fate is clustering of cell traces and comparison of the fate distributions in those clusters. There is a large number of clusters with only single traces, suggesting that the data (number of traces) might not be sufficient for this analysis. Furthermore, I am skeptical about clustering cells of different anterior-posterior identities together, given potential differences in the timing of signal reception and signaling. I am not convinced that this analysis reveals enough about how signaling maps to fate given the heterogeneity in traces in large clusters and the prevalence of extremely small clusters.

      We agree. Due to the age of the work and logistic constraints, we are unable to perform further experiments and analysis to enrich the tracks for this revision. We are aware of upcoming, independent studies with many more systematic tracks and analysis which will address these concerns. We have added the caveats the reviewer raised.

      Signaling vector and hand-picked metrics: As an alternative approach, that might be better suited for their data, the authors then pick three metrics (based on their model-predicted signaling dynamics) and show that the maximal response is a very good predictor of fate for different anterior-posterior identities. Previous information-theoretic analysis of signaling dynamics has found that a whole time-vector of signaling can carry much more information than individual metrics (Selimkhanov et al, 2014, PMID: 25504722). Have the authors tried approaches that make use of the whole trace (such as simple classifiers; Granados et al, 2018, PMID: 29784812), or can they comment on why this is not feasible for their data? The authors should at least make clear that their results present a lower bound to how accurately cells can make cell-fate decisions based on signaling dynamics.

      Thanks for these suggestions. Measurement noise, the coverage window of the traces and the number of tracks limit our ability to use the full dynamics in a more informative manner.

      (3) Consequences of signaling heterogeneity

      The authors focus heavily on portraying that signaling dynamics are highly variable, which seems visually true at first glance. However, there is no metric used or a description given of what this actually means. Mainly, the variability seems to relate to the correlation between signaling and fate. However, given the data and analysis, I would argue that the decoding of signaling dynamics into fate is surprisingly accurate. So signaling dynamics that seem quite noisy and variable by visual inspection can actually be very well discriminated by cells, which to me appears very exciting.

      Yes – we agree that most cells are actually accurate in such a highly dynamic tissue. In the literature, the view has been more focused on how the GRN enables this accuracy. We therefore highlighted the heterogeneity and limit of accuracy of the GRN here. We added this point to make our presentation more balanced.

      Indeed, simple features of signaling traces can predict cell fate as well as position (for anterior progenitors). Given that signaling should be a function of position, it naively seems as if signaling read-out could be almost perfect. It might be interesting to plot dorsal-ventral position vs the signaling metrics, to also investigate how Shh concentration/position maps to signaling dynamics, this would give an even more comprehensive view of signal transmission.

      We’d refer readers to our earlier study (Xiong et al., 2013), where ptch2:kaede, nkx2:gfp and olig2:gfp were plotted against position over time in single cell tracks. It was found that position was not a good predictor of signaling levels or cell fates at early stages when the cell fates were specified.

      There remains the discrepancy between signaling traces and fate in the posterior neural tube. The authors point towards differences in tissue architecture and difficulties in interpreting a "small" Shh gradient. However, the data seems consistent with differences in timing of cell-fate decisions between anterior and posterior cells. The authors show that fate does initially not correlate well with position in the posterior neural tube. So, signaling dynamics should likely also not, as they should rather be a function of position, given they are downstream of the Shh gradient. As mentioned above, not even Olig2 expression does segregate the assigned fates well. All this points towards a difference in the time of fate assignment between the anterior and posterior. Given likely delays in reporter protein production and maturation, it can thus not be expected that signaling dynamics correlate better with cell fate than the reporter "83%". Can the authors please discuss this possibility in the paper?

      Yes, this is an important point/caveat of live signaling and fate tracking. As discussed in the manuscript, due to the sensitivity limit of fluorescent imaging, it’s difficult to determine the time when cells start to respond to the signal, and how variable that is from cell to cell. The posterior cells may be more variable in either spatial or temporal responses compared to the anterior, and we are not able to distinguish between these. However, signaling dynamics are not necessarily a good function of position or time either; there is no evidence for that in our results here. The 83% correlation is thus striking for the posterior progenitors, indicating a certain robust logic in the GRN to capture a strong (even short-lived) response to Shh, regardless of position or time. This is an interesting possibility (we do not claim it as a mechanism, as we have not tested it with perturbations) that challenges the prevailing view in the field that these progenitors integrate Shh exposure over time, or that they acquire positional information by reading a gradient.

      The discussion has been modified to be more nuanced about these points.

      Thus, while this paper represents an example of what the community needs to do to gain a better understanding of robust patterning under variability, the provided data is not always sufficient to make clear conclusions regarding the functional consequences of signaling dynamics.

      We quite agree. Together with the reviewer, we look forward to the publication of recent, independent progress by other colleagues that overcomes the challenges in our work.

      Reviewer #2 (Public Review):

      Summary:

      In this work, Xiong and colleagues examine the relationship between the profile of the morphogen Shh and the resulting cell fate decisions in the zebrafish neural tube. For this, the authors combine high-resolution live imaging of an established Shh reporter with reporter lines for the different progenitor types arising in the forming neural tube. One of the key observations in this manuscript is that, while, on average, cells respond to differences in Shh activity to adopt distinct progenitor fates, at the single cell level there is strong heterogeneity between Shh response and fate choices. Further, the authors showed that this heterogeneity was particularly prominent for the pMN fate, with similar Shh response dynamics to those observed in neighboring LFP progenitors.

      Strengths:

      It is important to directly correlate Shh activity with the downstream TFs marking distinct progenitor types in vivo and with single cell resolution. This additional analysis is in line with previous observations from these authors, namely in Xiong, 2013. Further, the authors show that cells in different anterior-posterior positions within the neural tube show distinct levels of heterogeneity in their response to Shh, which is a very interesting observation and merits further investigation.

      Weaknesses:

      This is a convincing work, however, adding a few more analyses and clarifications would, in my view, strengthen the key finding of heterogeneity between Shh response and the resulting cell fate choices.

      We thank the reviewer for the comments. Due to the age of the work and logistical constraints, we are unable to perform further experiments and analyses to address some of the concerns. We have revised the conclusions and viewpoints accordingly to reflect the reviewer’s concerns.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for The Authors):

      Minor comments:

      y-axis label suddenly changes to Ptch2-reporter level in Figure 5. Is what is plotted different from what is seen as examples in Figure 3?

      Thanks! The Figure 5 tracks are as in Figure 3B; this has been annotated in the figure legends.

      There are random bounding boxes in some of the figures.

      Sometimes the m in "More dorsal" is stylized with a capital M and sometimes not. It is somewhat confusing as a name for cell types but it is fine if no alternative can be found.

      This study unfortunately does not include markers that distinguish the interneurons dorsal to pMNs. We categorized them collectively as “more dorsal”.

      Response-time is defined as "the amount of time with an above-basal Shh response". This seems to me as the definition of response duration. I would assume that response-time, means the time it takes until a response is first observed. Please consider changing this.

      We did not use “duration” because a response time course recorded in these tracks may include multiple durations (on and off). In the field, the duration of exposure/response has been used specifically to mean a single period of response. Here, response time is therefore the sum of all actively responding periods. This has been clarified in the text.
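
      The difference between a single response duration and the summed response time can be sketched as follows. The function names, the sampling convention (each sample stands for the interval up to the next sample) and the example trace are assumptions made for illustration; they are not the authors’ implementation.

```python
def response_periods(times, signal, basal):
    """Return (start, end) for each contiguous period with signal above basal."""
    periods = []
    start = None
    for i in range(len(times) - 1):
        on = signal[i] > basal
        if on and start is None:
            start = times[i]
        if not on and start is not None:
            periods.append((start, times[i]))
            start = None
    if start is not None:
        # The trace ended while the response was still above basal.
        periods.append((start, times[-1]))
    return periods


def summed_response_time(times, signal, basal):
    """Total time above basal, summed over all 'on' periods in the track."""
    return sum(end - start for start, end in response_periods(times, signal, basal))


# Hypothetical track with two separate above-basal episodes.
times = [0, 1, 2, 3, 4, 5, 6]
signal = [0, 2, 2, 0, 0, 3, 0]
basal = 1

periods = response_periods(times, signal, basal)
total = summed_response_time(times, signal, basal)
```

      For this trace there are two separate durations, and the response time in the sense used above is their sum rather than the length of either one.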

      Reviewer #2 (Recommendations for The Authors):

      (1) The authors address several possible setbacks of transforming the measured fluorescence intensity of the patched reporter into a readout of the Shh signaling activity over time, however, one aspect that isn't directly addressed is the potential effect of differences in the z position of analyzed cells. These could, at least in principle, be sufficient to introduce significant noise in the fluorescence measurements. Can the authors subset their datasets by initial, as well as average, z position and then re-examine the measured trends for both Shh activity and the intensity of the cell fate reporters used in the study?

      The zebrafish early neural plate/tube has a small thickness in z in dorsal-ventral imaging and the tissue is transparent. Depth-associated scattering contributes very little, if at all, to the fluorescent signals in the imaged time window. This can be seen in the nuclear/membrane signal of the movies, which is largely uniform in z across the neural tissue. It can also be seen that the notochord cells, further ventral, appear to be dimmer.

      (2) It is critical for the validity of this study that the intensity of the patched reporter introduced by the authors in 2012, and used again in this study, faithfully represents the signaling activity of Shh. In this study, the authors provide measurements of the transcriptional rate of Kaede and additional modeling for this purpose. However, an important point is to determine how sensitive is the reporter to changes in Shh signaling of different magnitudes?

      We consider this BAC reporter line a good one (probably still the best live reporter), as it resolves the endogenous gradient up to the dorsal interneuron domains (Huang et al., 2012; Xiong et al., 2013) and responds well to perturbations (Notch, Cyclopamine, etc.). But it’s true that we don’t have information on how sensitively it responds to changes of different magnitudes. As far as we know, there is no in vivo, single-cell information on how Shh targets respond to signaling of different magnitudes.

      (3) To strengthen the previous point, it would be nice to extend the analysis in Figure 2, at least partially, using other readouts for Shh activity (e.g. GBS-GFP)?

      We have used a GBS-RFP line previously and found it to be lower resolution in terms of showing the DV gradient, compared to ptch2:kaede.

      (4) It is unclear to me what is the relevant time window during which cells respond to Shh in the anterior versus posterior domains to determine progenitor specification. This is a concern to me, since: i) the average heterogeneity of Shh activity seems to increase strongly in time (Figure 2A/C); and ii) it is important to exclude that the finding of heterogeneous relationship between Shh activity and fate choices is largely driven by later timepoints, where potentially its activity is no longer relevant for cell fate specification. Can this point be clarified when this data is introduced in the manuscript and further discussed?

      Yes, this is an important point/caveat of live signaling and fate tracking. As discussed in the manuscript, due to the sensitivity limit of fluorescent imaging, it’s difficult to determine the time when cells start to respond to the signal, and how variable that is from cell to cell. The posterior cells may be more variable in either spatial or temporal responses compared to the anterior, and we are not able to distinguish between these.

      (i) The ptch2:kaede reporter variability is higher in terms of magnitude (the signal gets brighter) at later times, but the heterogeneity (overlap between different cell fate groups) is lower at later times.

      (ii) Similarly, the heterogeneous relationship is more pronounced at early time points. Since we do not know exactly when the activity becomes no longer relevant (from our earlier studies we do think that the cells become specified early, when Shh signaling is noisy), we modelled the response profile and searched for a good predictor. The maximum response stands out, particularly as a good indicator for the posterior cells, suggesting an early window/time of specification.

      The Discussion has been modified to clarify these points.

      (5) Is the response of the patched reporter, as well as cell fate reporters, to defined concentrations of exogenously provided Shh heterogeneous, for instance, in in vitro experiments?

      Well-controlled in vitro experiments (e.g., using microfluidics and labeled Shh molecules) would be fantastic future directions. Existing tissue explant + Shh dose approaches do not resolve the heterogeneity of exposure at the single-cell level but may be helpful in testing the limits and variabilities at different magnitudes.

      (6) The source of noise in this system is not entirely clear to me. The authors seem to attribute the heterogeneity they observe to the way cells respond to Shh, but can it be excluded that the morphogen profile is itself noisy to start with? It is currently difficult to distinguish between these two possibilities, given that the Shh activity reporter used in this study is itself a transcriptional output of the pathway. Can the distribution of Shh itself be analyzed (even if in immunostainings) during neural tube formation?

      Yes, we fully agree. More quantitative analysis may help dissect the sources of noise. Measuring the morphogen profile (particularly through time) would be ideal, but currently no reagent is available to achieve that. Studies using an engineered or tagged morphogen suggest that the pattern through the tissue reasonably captures simple diffusion dynamics. However, at the single-cell level considerable randomness may still remain and be difficult to compare quantitatively with still staining.

      (7) It is unclear to me how the authors define the ultimate cell fate of cells in their analysis in Figure 6. The brief description in the methods and in the manuscript seems to suggest that, in combination with marker expression, the cell position is used as a criterion to assign the fate to the progenitors - if this is the case, I guess the observed relationship in Figure 6 with LMDV distance is almost a control? This could be clarified for the readers.

      Yes indeed Figure 6 is a control as LMDV distances lead to final positions which form part of our determination of cell fates.

      As established by others’ and our previous studies (See also Fig.1A), the identification of MFPs and LFPs in zebrafish spinal cord is very robust. The MFPs are the apical constricted single column of cells along the midline on top of the notochord, and the LFPs are the 2 columns of cells next to MFP on both sides. LFPs’ expression of olig2:gfp did vary more in the posterior (timing of response/commitment could be a factor as the reviewer pointed out), but eventually the cells at those positions will be V3 interneurons or floor plates and have not been observed to make motoneurons. There are 3 low Olig2:GFP pMNs in the anterior dataset (Fig.2B’) and 3 high Olig2:GFP LFPs in the posterior dataset (Fig.2D’) that we checked carefully.

      The methods of fate determination are described in detail in methods.

      (8) The graphs in Figures 6 and 7 are difficult to interpret. What proportion, and absolute number, of cells are "mis specified" when the authors show the distinct colored lines in the pMN, LFP or more dorsal domains? How do the authors determine where each cell fate domain begins and ends to access for "mis-specified" cells? Can the authors also provide the corresponding experimental images in the figure?

      We apologize for the difficulty in interpreting these figures. The graphs are a ranked list of all cells using the specified metric. The visual is meant to help build an intuition of how mixed vs. clear-cut the pattern is for the tested metric. The graphs are not to be interpreted as the actual pattern in the tissue, and there are no data images that show these patterns.
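
      The idea of a ranked list can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the cell records, the fate labels assigned to them, and the `max_response` field name are hypothetical. Cells are sorted by the chosen metric and the sequence of fate labels is read off; a clear-cut pattern gives contiguous blocks of one fate, whereas a mixed pattern interleaves them.

```python
# Hypothetical per-cell records: a final fate label and one reporter-derived metric.
cells = [
    {"id": 1, "fate": "LFP", "max_response": 0.91},
    {"id": 2, "fate": "pMN", "max_response": 0.42},
    {"id": 3, "fate": "LFP", "max_response": 0.78},
    {"id": 4, "fate": "pMN", "max_response": 0.80},
    {"id": 5, "fate": "MFP", "max_response": 0.99},
]


def ranked_fates(cells, metric):
    """Return fate labels ordered from the highest to the lowest value of `metric`."""
    ordered = sorted(cells, key=lambda c: c[metric], reverse=True)
    return [c["fate"] for c in ordered]


def count_fate_switches(fates):
    """Count positions where the fate label changes along the ranked list."""
    return sum(1 for a, b in zip(fates, fates[1:]) if a != b)


ranking = ranked_fates(cells, "max_response")
switches = count_fate_switches(ranking)
print(ranking, switches)
```

      With three fates, a perfectly separated ranking would show two switches; the invented values above give four, which is what a partly mixed ranking looks like.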

      (9) Given the experimental limitations/technical challenges discussed by the authors during the paper, the score of around 90% of predictability of cell fate choices is rather high in the anterior domain, suggesting a minor functional role for heterogeneity in this region. Even for the posterior domain, the score of 83% predictability based on the maximum response to Shh is still relatively high. In my view, the authors' conclusions should be adjusted to make this difference clearer in the abstract and discussion, highlighting that the heterogeneity between Shh response and cell fate choices, particularly in the pMN fate, is stronger in the posterior domain, affecting the precision of cell fate decisions particularly in this region. Can the authors further comment on potential mechanisms driving this difference?

      Yes – we agree that most cells are actually accurate in such a highly dynamic tissue. In the literature, the view has been more focused on how the GRN enables this accuracy. We therefore highlighted the heterogeneity and limit of accuracy of the GRN here.

      We have added to the Discussion the fact that the Shh response is still the main determinant of the pattern despite the heterogeneity. We also further discuss possible explanations for the anterior-posterior differences.

      (10) Following up from the previous point, the data in Figure 7 suggests that there might be different underlying mechanisms in how anterior and posterior cells interpret the Shh profile, with anterior cells potentially responding to the integrated concentration of Shh (since response time, average response, or maximum response to Shh all provide similar predictability scores for cell fate choices). In contrast, only the maximum response to Shh can provide a good prediction of posterior cell fate, consistent with a more instantaneous response to morphogen concentration (and thus potentially more error-prone measurement of the Shh profile?). This is a very interesting observation in my view. Could this be further tested?

      Thank you. Yes, we found this very interesting too. We discussed the possibilities, including the reviewer’s suggestion that these cells may have different contexts or strategies for interpreting the signal. It is also possible that the anterior cells use the same strategy (maximum response at an early time) and that the subsequent response/duration does not matter to their fate commitment. A precise approach to shut down Shh response dynamics in single cells (e.g., optogenetics) will enable tests of these ideas. We hope follow-up studies will take such approaches.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Conceptual framing and interpretation:

      The central conclusion may require more precise framing to avoid potential overreach. The authors' interpretation equating "physical distance between TAD boundaries" with overall "TAD boundary architecture," and "transcriptional bursting events" with broader "gene activity," could benefit from clarification. This framing may not fully capture the temporal dynamics of transcription or the regulatory complexity within TADs. Furthermore, the broad conclusion of an uncoupled relationship appears to challenge extensive prior evidence from perturbation studies showing that disrupting TAD boundaries can alter gene expression. The authors' own observation of reduced gene activity upon RAD21 degradation suggests that global TAD disruption can affect transcription. A more precise and limited conclusion, acknowledging that their data demonstrate a lack of detectable correlation between boundary distance and bursting activity in their system, would be more accurate and help reconcile these findings with the existing literature.

      We have modified statements throughout the manuscript, including in the title, to enhance the precision of our conclusions to avoid overreach. We have also added on p. 16 of our Discussion, a separate section on the limitations of the study, noting that our conclusions are limited to TAD boundary distances and do not reflect the structure of TAD boundaries or of TADs themselves. We have also expanded our Discussion of possible TAD functions on p. 14/15.

      (2) Technical methods and data presentation:

      (2.1) Accuracy and dimensionality of distance measurements: The manuscript does not clearly state whether distances are measured in 2D or 3D, nor does it sufficiently address precision limits. The stated Z-step size (1 µm) may be inadequate for accurately measuring sub-micron chromatin distances in 3D.

      We state in both the Results and Methods that our data represent 2D distances derived from maximal-intensity projections of 3D image stacks. We previously published a detailed analysis of the precision of this measurement approach applied to chromatin interactions and documented the effect of 2D vs 3D analysis on these types of measurements. This study by Finn et al., 2022 is cited in the text. We also show in Figure S3 and mention on p. 6 and 10 that we observe similar results using either 2D or 3D analysis.
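
      The relationship between the two kinds of measurement can be sketched as follows. This is a minimal illustration with hypothetical spot coordinates, not the authors' image-analysis pipeline: a maximal-intensity projection along z discards the z coordinate, so the projected 2D distance can never exceed the 3D distance between the same two spots.

```python
import math


def distance_3d(p, q):
    """Euclidean distance between two (x, y, z) spot centers."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def distance_2d_projected(p, q):
    """Distance after projecting along z: only the x and y offsets remain."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


# Hypothetical spot centers in micrometers (x, y, z).
spot_a = (0.00, 0.00, 0.00)
spot_b = (0.30, 0.00, 0.40)

d3 = distance_3d(spot_a, spot_b)
d2 = distance_2d_projected(spot_a, spot_b)
print(f"3D: {d3:.3f} um, projected 2D: {d2:.3f} um")
```

      In this invented case the z offset is large, so the projection shortens the distance noticeably; when the two spots lie in nearly the same focal plane, the two values converge.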

      (2.2) Probe design and systematic error: The genomic coverage size of the BAC probes used for DNA FISH is not explicitly stated. Large probe coverage could inherently blur the precise spatial location of adjacent DNA loci. The reported average distance (~300 nm) may be influenced by the physical size of the probes, as well as systematic expansion or distortion introduced by sample fixation and FISH processing. Although such technical limitations are currently unavoidable, the authors should clarify how these factors might affect their ability to detect subtle distance changes.

      The genomic location and size of all probes are provided in Supplementary Table 1. We deliberately use relatively large BAC probes both to generate robust, highly reproducible signals and to eliminate effects arising from local chromatin behavior. In line with earlier characterization of BAC probes (Finn et al., Cell, 2019; Finn et al., Methods, 2022), we find a strong correlation between micro-C/Hi-C interaction frequency and distance measurements. Systematic errors such as sample fixation and FISH processing have previously been evaluated by comparison to live cell data (see Finn et al., 2019) and found to be negligible, especially as all our analyses involve pairwise comparisons, which would both be similarly affected by systematic errors. We discuss resolution limits due to probe size in our new section on study limitations on p. 16.

      (2.3) Data Visualization: The manuscript would benefit from including representative, zoomed-in regions of interest from the raw imaging data. This would allow readers to visually assess measured distance differences against background noise.

      Raw images for inspection at any magnification are available at https://figshare.com/projects/_b_TAD_boundaries_and_gene_activity_are_uncoupled_b_/271078.

      (2.4) Potential impact of resolution limits: In Figure 5, the micro-C data reveal a clear difference in interaction patterns inside versus outside the VARS2 locus TAD, yet the imaging data show no corresponding distance difference. This strongly suggests that the current imaging system, limited by optical resolution, probe size, and localisation accuracy, may be unable to resolve finer-scale spatial reorganizations associated with specific chromatin conformations (e.g., enhancer-promoter loops). The authors should explicitly discuss that their conclusion of "no coupling observed" may be constrained by the resolution and sensitivity of their method and does not preclude the possibility of detecting such associations with higher-precision measurements or in live-cell dynamics.

      We generally see good agreement between micro-C/Hi-C data and distance measurements. Specifically, we consistently find closer proximity of boundaries than non-boundaries and larger boundary distances for larger TADs than for smaller ones, as presented throughout the study. Contrary to the reviewer’s statement, this is also true for the VARS2 TAD, where we find statistically significant shorter boundary distances for boundary probes (350 nm) vs the outside control region (390 nm), which correlates with the difference in micro-C interaction score of 5847 vs 2308. These data are shown in Figure 3. Regardless, we mention the issue of resolution due to probe size in the study limitation section on p. 16.

      Reviewer #2 (Public review):

      In untreated cells, the distribution of distance measurements between boundary probes is exceptionally narrow. While depletion of RAD21 clearly demonstrates an ability to detect changes in this distribution, this tight baseline distribution may limit sensitivity to more subtle changes (like those one might expect from transcriptional influences). In addition, the correlation analysis is asymmetric, primarily stratifying by transcriptional status and then comparing boundary distances. Given the central claim that boundary architecture does not influence gene activity, the analysis should be done from the opposite perspective (stratifying by boundary distance).

      We mention the limitations on resolution of our approach in our discussion of study limitations on p. 16. An example of an analysis of stratifying by boundary distance is presented in Figure S3C. The conclusion is the same as stratifying by activity status.
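
      The two directions of stratification can be sketched as follows. This is a minimal illustration on invented per-allele records, not the analysis behind Figure S3C; the field names and the median cutoff are assumptions, and the toy values were chosen only to show how both views are computed.

```python
from statistics import median

# Hypothetical per-allele records: boundary distance and transcriptional status.
alleles = [
    {"distance_nm": 280, "active": True},
    {"distance_nm": 290, "active": False},
    {"distance_nm": 310, "active": False},
    {"distance_nm": 320, "active": True},
    {"distance_nm": 340, "active": True},
    {"distance_nm": 350, "active": False},
    {"distance_nm": 370, "active": False},
    {"distance_nm": 380, "active": True},
]


def median_distance_by_activity(alleles):
    """Stratify by activity status, then summarize the boundary distances."""
    groups = {True: [], False: []}
    for a in alleles:
        groups[a["active"]].append(a["distance_nm"])
    return {state: median(vals) for state, vals in groups.items() if vals}


def active_fraction_by_distance(alleles):
    """Stratify by distance (split at the overall median), then summarize activity."""
    cutoff = median(a["distance_nm"] for a in alleles)
    near = [a["active"] for a in alleles if a["distance_nm"] <= cutoff]
    far = [a["active"] for a in alleles if a["distance_nm"] > cutoff]

    def frac(flags):
        # Fraction of active alleles; None when the group is empty.
        return sum(flags) / len(flags) if flags else None

    return {"near": frac(near), "far": frac(far)}


by_activity = median_distance_by_activity(alleles)
by_distance = active_fraction_by_distance(alleles)
print(by_activity, by_distance)
```

      If the two quantities are unrelated, both views should agree: similar distances in the active and inactive groups, and similar active fractions in the near and far groups.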

      Strong disruption of boundary distances is only observed upon depletion of cohesin. Notably, this corresponds with the largest changes in gene activity. In contrast, depletion of CTCF actually had minimal impact on boundary distances and also had minimal impact on gene activity. This makes sense in light of previous work, where live cell imaging demonstrated that cohesin is more important for domain-structure, whereas CTCF is only important for blocking cohesin from continuing on, such that the fully formed loop occurs in a very small percentage of cells. Therefore, the fact that disruption of cohesin (more important for internal domain structure) affects gene activity while disruption of CTCF does not is exceptionally interesting but is lacking from the discussion.

      We mention the stronger effect of cohesin depletion compared to CTCF loss on gene expression in multiple locations in the Results and Discussion.

      On a related note, this approach primarily tests the role of boundary interactions rather than domain organization as a whole, and it should be acknowledged that internal domain structures are not directly assessed.

      We have modified statements throughout the manuscript to clearly indicate that our conclusions relate to boundary interactions rather than domain organization as a whole. We also discuss this in our section on study limitations.

      The comparison to work in other organisms (particularly the comparisons made to Drosophila) should be handled with care. The mechanisms underlying domain formation differ substantially across these systems, particularly regarding the differences in CTCF's role.

      We have modified our discussion of the data on Drosophila TADs, particularly as it relates to CTCF.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I couldn't locate the image data from figshare with the information provided (DOI: 10.6084/m9.figshare.30728354)

      The link has been updated

      https://figshare.com/projects/_b_TAD_boundaries_and_gene_activity_are_uncoupled_b_/271078.

      Reviewer #2 (Recommendations for the authors):

      Some of the conclusions overreach. I recommend revising the claims and discussion to focus solely on the proximity of boundaries, instead of TADs themselves. This would match better with your experiments.

      We have modified statements throughout the manuscript, including in the title, to enhance the precision of our conclusions to avoid overreach. We have also added on p. 16, a separate section on limitations of our study, noting that our conclusions are limited to TAD boundary distances and do not reflect on the structure of the TADs themselves. We have also expanded our Discussion of possible TAD functions on p. 14/15.

      I do disagree with the interpretation of the data in some parts, particularly at the end, where you state that disruption of TADs does not impact gene activity. For example, "Altogether, these results demonstrate that disruption of TAD boundary architecture is insufficient to alter gene expression" doesn't seem to match the results. Sure, depletion of CTCF minimally impacted gene expression, but it also minimally impacted the boundary distances. I think it is interesting that depletion of RAD21 had a bigger impact on both gene expression and boundary distances, and this should be discussed.

      We have deleted this statement and now mention on p. 13 that RAD21 depletion affected gene expression, whereas loss of CTCF did not, and on p. 15 that loss of RAD21 had a greater impact on boundary distances than loss of CTCF. We have also expanded our Discussion of possible TAD functions on p. 14/15.

      Related to this, I also recommend expanding the discussion of prior live-cell imaging work (ref 32) that showed that the fully formed CTCF loop is a rare event.

      We have expanded the discussion of prior live-cell imaging work in several locations.

      All the analysis is done from the perspective of the gene expression (e.g. group by expression and then measure distances). It would help to show that the inverse analysis is consistent (e.g. group by distances and measure gene expression).

      Analysis of data stratified by distance measurements is shown in Figure S3C.

      The discussion of the Drosophila work is strange, given that CTCF in Drosophila has a very different N-terminus, explaining why it doesn't really form loops. Sure, maybe it contributes to domains in some way, but probably no more than the dozens of other architectural proteins that have been found in that system. This work clearly focuses on CTCF-loop domains, so I would be specific about that. In the introduction, you do a good job of saying "in human cells, TADs are.... marked by binding sites for the CTCF protein". However, then you overgeneralize and state that TADs form via a process of loop extrusion. I think a simple statement before this to say that TADs in human cells have become somewhat synonymous with CTCF loop domains, and that is how you will use the term here. However, other organisms have TADs despite the lack of conservation of the CTCF protein.

      We have modified the text accordingly.

      On a related note, in the discussion, you cite two papers in Drosophila to state that "TADs form prior to the establishment of cell-type-specific gene expression programs", but that's not entirely accurate for those papers. They actually show that TADs occur coincident with ZGA, but loops form before that (ref 23: Espinola et al), or that there are indeed a few boundaries that show up before ZGA, but these correspond to RNA Polymerase (ref 24: Ing-Simmons et al.).

      We have corrected this statement.

    1. Author response:

      The following is the authors’ response to the original reviews.

      It is important to make a few key points about our work. First, our paper is largely a computational biophysics paper, augmented by experimental results. Generally speaking, computational biophysics work intends to achieve one of two things (or both). One is to provide more molecular-level insight into various behaviors of biomolecular systems that have not been (or cannot be) provided by qualitative experimental results alone. The second general goal of computational biophysics is to formulate new hypotheses to be tested subsequently by experiment. In our paper, we have achieved both of these goals and then confirmed the key computational results by experiment.

      eLife Assessment

      This study investigates how the HIV inhibitor lenacapavir influences capsid mechanics and interactions with the nuclear pore complex. It provides important insights into how drug-induced hyperstabilization of the viral shell can compromise its structural integrity during nuclear entry. While the modeling is technically sophisticated and the results are promising, some mechanistic interpretations rely on assumptions embedded in the simulations, leaving parts of the evidence incomplete.

      Given our response below, regarding the rigor and “completeness” of our work, we do not feel that an editorial judgement of “leaving parts of the evidence incomplete” is justified.

      We also note that another recent experimental paper has validated essentially every prediction made in our eLife paper: https://www.biorxiv.org/content/10.64898/2026.01.05.697065v1

      We thus disagree that the evidence we have presented in our paper is incomplete.

      Public Reviews:

      Reviewer #1 (Public review):

      The paper from Hudait and Voth details a number of coarse-grained simulations as well as some experiments focused on the stability of HIV capsids in the presence of the drug lenacapavir. The authors find that LEN hyperstabilizes the capsid, making it fragile and prone to breaking inside the nuclear pore complex.

      I found the paper interesting. I have a few suggestions for clarification and/or improvement. 

      (1) How directly comparable are the NPC-capsid and capsid-only simulations? A major result rests on the conclusion that the kinetics of rupture are faster inside the NPC, but are the numbers of LENs bound identical? Is the time really comparable, given that the simulations have different starting points? I'm not really doubting the result, but I think it could be made more rigorous/quantitative.

      We note (also in the manuscript) that it is difficult to compare the timescales obtained from coarse-grained MD simulations and experiments (“real time”) given that, by design, the CG simulations are accelerated to greatly enhance sampling. However, we can qualitatively compare the timescales of different CG simulations (without directly comparing the corresponding experimental timescales).

      We agree with the reviewer that the starting point of NPC-capsid and capsid-only simulations is different, as is the biological environment in which the rupture occurs. When analyzing the NPC-capsid and capsid-only simulations, what was striking to us was that at the NPC the capsid-LEN complex ruptures in a multicomponent environment, where several FG-NUPs compete to displace the LENs. It is well established in experiments that LEN has a detrimental effect on capsid integrity.

      In Figure 2, we plot the number of LEN molecules as a function of CG simulation time. The initial capsid-LEN complex was equilibrated without NPC and then placed at the cytoplasmic end of the NPC for docking. The number of LEN molecules for the capsid-only simulations and the NPC-docked simulations is nearly identical, and an insignificant number of LEN molecules unbind at the NPC. Hence, we added the following clarification:

      Page 10, paragraph 11

      “Note that the number of LEN molecules bound to the capsid for the free capsid and NPC-docked capsids are nearly identical. Hence, the disparity in timescale of lattice rupture is not only because of the effect of LEN on capsid lattice properties.”

      Is the time really comparable, given that the simulations have different starting points?

      Yes, the CG timescales of both the NPC and freely diffusing capsid unbiased simulations are comparable, since they were done using identical simulation settings.

      (2) Related to the above, it is stated on page 12 that, based on the estimated free-energy barrier, pentamer dissociation should occur in ~10 µs of CG time. But certainly, the simulations cover at least this length of time?

      Our implicit solvent CG MD simulations are designed to access timescales far beyond the capabilities of the fully atomistic simulations. We reiterate here that it is difficult to directly compare the timescales obtained from CG MD simulations and experiments.

      As described in the text, there are 12 pentamers in the capsid (7 in the wide end and 5 in the narrow end). For the narrow end to rupture, all 5 pentamers should progressively dissociate. In our unbiased simulations (Fig. S5), in 25 µs of CG time, we observe (partial) dissociation of one or two pentamers. Hence, our unbiased CG simulation timescales were not long enough to observe rupturing of the narrow end.

      (3) At first, I was surprised that even in a CG simulation, LEN would spontaneously bind to the correct site. But if I read the SI correctly, LEN was parameterized specifically to bind to hexamers and not pentamers. This is fine, but I think it's worth describing in the main text.

      We modified (see below) the main text to include the details.

      Page 4, paragraph 1

      “We model LEN and CA interactions such that LEN molecules can only bind to CA hexamers, and all interactions to CA pentamers are turned off, as in experiments, CA selectively associates with hexamers (25, 36).”

      Reviewer #2 (Public review):

      Here, Hudait et al. use CG modeling to investigate the mechanism by which Lenacapavir (LEN) treats HIV capsids that dock to the nuclear pore complex (NPC). However, the manuscript fails to present meaningful findings that were previously unreported in the literature and is thus of low impact. Many claims made in the manuscript are not substantiated by the presented data. Key mechanistic details that the work purports to reveal are artifacts of the parameterization choices or simulation/analysis design, with the simulations said to reveal details that they were specifically biased to reproduce. This makes the manuscript highly problematic, as its contributions to the literature would represent misconceptions based on oversights in modeling and thus mislead future readers. 

      We strongly disagree with these statements, and they do not reflect the facts. We provide a rebuttal to these statements in the “Author Response” statements below.

      (1) Considering the literature, it is unclear that the manuscript presents new scientific discoveries. The following are results from this paper that have been previously reported:

      (a) LEN-bound capsid can dock to the nuclear pore (Figure 2; see e.g. 10.1016/j.cell.2024.12.008 or 10.1128/mbio.03613-24). 

      (b) NUP98 interacts with the docked capsid (Figure 2; see e.g. 10.1016/j.virol.2013.02.008 or 10.1038/s41586-023-06969-7 or 10.1016/j.cell.2024.12.008). 

      (c) LEN and NUP98 compete for a binding interface (Figure 2; see e.g. 10.1126/science.abb4808 or 10.1371/journal.ppat.1004459). 

      (d) LEN creates capsid defects (Figure 3 and 5, see e.g. 10.1073/pnas.2420497122). 

      (e) RNP can emerge from a damaged capsid (Figure 3 and 5; see e.g. 10.1073/pnas.2117781119 or 10.7554/eLife.64776). 

      (f) LEN hyperstabilizes/reduces the elasticity of the capsid lattice (Figure 6; see e.g. 10.1371/journal.ppat.1012537). 

      The goal of our simulations (in combination with experiments from the Pathak group) is to provide molecular-level insight into the sequence of events of NPC docking of capsid and the effect of LEN binding leading to sequential dissociation of pentamers and leading to rupturing of the narrow end of the cone-shaped capsid. We also compare the events leading to capsid rupture at the NPC with the same for a freely diffusing capsid, akin to that in cytoplasm. The reviewer should carefully read the abstract of our paper. In fact, the above are all papers that present qualitative experimental results that help validate our model, but they do not provide details on the molecule-scale events. For example, the paper (10.1073/pnas.2420497122 written by our coauthors in the Pathak group) is extensively used to compare the behavior of LEN-bound capsid in the cytoplasm.

      (2) The mechanistic findings related to how these processes occur are problematic, either based on circular reasoning or unsubstantiated, based on the presented data. In some cases, features of parameterization and simulation/analysis design are erroneously interpreted as predictions by the CG models. 

      We strongly disagree with this assessment. Our CG NPC model is largely a “bottom-up” model derived from molecular-scale interactions sampled in atomistic simulations (see our previous paper in PNAS, https://doi.org/10.1073/pnas.2313737121). The reviewer appears to be ignorant of the “bottom-up” approach based on rigorous statistical mechanics to derive molecule-scale models (please refer to a detailed review on bottom-up coarse-graining: J. Chem. Theory Comput., 2022, 18, 5759-5791).

      Using the “bottom-up” CG model of the NPC, we predicted several molecular-level details of capsid import and docking to the NPC. Our key predictions were that there is an intrinsic capsid lattice elasticity and that the pleomorphic nature of the NPC channel is key for successful capsid docking (https://doi.org/10.1073/pnas.2313737121). Our computational predictions have been, for example, validated in a recently published paper by an experimental group: Hou, Z., Shen, Y., Fronik, S. et al. HIV-1 nuclear import is selective and depends on both capsid elasticity and nuclear pore adaptability. Nat Microbiol 10, 1868–1885 (2025). https://doi.org/10.1038/s41564-025-02054-z. Our work is an excellent example of how systematically derived “bottom-up” CG models can accurately predict molecular details of complex biological processes.

      We have now added the following statement:

      Page 3, Paragraph 1

      “Importantly, the computational predictions of capsid docking to the NPC central channel have been recently validated in a HIV-1 core import at the NPC using cryo-ET (33), demonstrating how systematically derived “bottom-up” CG models can accurately predict molecular details of complex biomolecular processes.”

      (a) Claim: LEN-bound capsids remain associated with the NPC after rupture. CG simulations did not reach the timescale needed to demonstrate continued association or failure to translocate, leaving the claim unsubstantiated.

      The reviewer fails to recognize that the statement is based on the experimental results showing that the LEN-bound capsid remains bound to the NPC after rupture and fails to translocate to the nuclear side (from the Pathak group, in the section “Ruptured LEN-viral complexes remain bound to the NPC”). The reviewer’s comment is incorrect.

      (b) Claim: LEN contributes to loss of capsid elasticity. The authors do not measure elasticity here, only force constants of fluctuations between capsomers in freely diffusing capsids. Elasticity is defined as the ability of a material to undergo reversible deformation when subjected to stress. Other computational works that actually measure elasticity (e.g., 10.1371/journal.ppat.1012537) could represent a point of comparison but are not cited. The changes in force constants in the presence of LEN are shown in Figure 6C, but the text of the scale bar legend and units of k are not legible, so one cannot discern the magnitude or significance of the change.

      The concept of elasticity can extend down to the mesoscopic scale. Many examples can be found in the large number of elastic network models (ENMs) of proteins published by many authors. The reviewer also fails to comprehend the meaning of the effective spring constants in the HeteroENM model and how they relate to the response of the capsid to stress (e.g., in the NPC). Note that in the NPC central channel, the capsid encounters several nucleoporins (including disordered FG nucleoporins that do not have specific interactions with the rest of the proteins), as well as a confined environment. This environment can exert inward stress on the capsid, which is also reflected in stress on the capsid lattice. Furthermore, the cited computational AFM studies are very far from a realistic in vivo or even in vitro set of conditions. In contrast, our study presents a realistic environment that the capsid will encounter in the NPC, and these predictions are then validated by experimental results.
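
      The link between fluctuations and effective spring constants can be sketched under a simple harmonic assumption. This is a simplified stand-in, not the HeteroENM procedure itself, which is more involved: for a single harmonic spring at temperature T, equipartition gives k = k_B T / ⟨(r − ⟨r⟩)²⟩, so smaller distance fluctuations imply a stiffer effective spring. The distance samples below are invented.

```python
import statistics

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def effective_spring_constant(distances, temperature=300.0):
    """One-step harmonic estimate k = k_B*T / var(r), in kcal/(mol*A^2).

    `distances` are sampled separations (in angstroms) between two CG sites.
    """
    variance = statistics.pvariance(distances)
    if variance == 0:
        raise ValueError("zero variance: spring constant is undefined")
    return K_B * temperature / variance


# Hypothetical samples: the same mean separation, different fluctuation widths.
loose_pair = [9.0, 11.0, 10.0, 9.5, 10.5]
stiff_pair = [9.8, 10.2, 10.0, 9.9, 10.1]

k_loose = effective_spring_constant(loose_pair)
k_stiff = effective_spring_constant(stiff_pair)
print(f"loose: {k_loose:.2f}, stiff: {k_stiff:.2f} kcal/(mol*A^2)")
```

      In this toy comparison, the pair with the narrower distance distribution receives the larger spring constant, which is the sense in which a stiffened lattice shows up as increased force constants.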

      (c) Claim: Capsid defects are formed along striated patterns of capsid disorder. Data is not presented that correlates defects/cracks with striations. 

      We presented data on the formation of striated patterns of lattice stress in the capsid, running from the narrow end to the wide end, in a coarse-grained model (https://doi.org/10.1073/pnas.2313737121) and in an atomistic model (https://doi.org/10.1073/pnas.2117781119). Both of these papers are extensively cited in the current manuscript. Also, once the capsid is ruptured, one cannot visualize the striated patterns.

      (d) Claim: Typically 1-2 LEN, but rarely 3 bind per capsid hexamer. The authors state: "The magnitude of the attractive interactions was adjusted to capture the substoichiometric binding of LEN to CA hexamers (Faysal et al., 2024). ... We simulated LEN binding to the capsid cone (in the absence of NPC), which resulted in a substoichiometric binding (~1.5 LEN per CA hexamer), consistent with experimental data (Singh et al., 2024)." This means LEN was specifically parameterized to reproduce the 1-2 binding ratio per hexamer apparent from experiments, so this was a parameterization choice, not a prediction by CG simulations as the authors erroneously claim: "This indicates that the probability of binding a third LEN molecule to a CA hexamer is impeded, likely due to steric effects that prevent the approach of an incoming molecule to a CA hexamer where 2 LEN molecules are already associated. ... Approximately 20% of CA hexamers remain unoccupied despite the availability of a large excess of unbound LEN molecules. This suggests a heterogeneity in the molecular environment of the capsid lattice for LEN binding." These statements represent gross over-interpretation of a bias deliberately introduced during parameterization, and the "finding" represents circular reasoning. Also, if "steric effects" play any role, the authors could analyze the model to characterize and report them rather than simply speculate.

      Reviewer comment: “This means LEN was specifically parameterized to reproduce the 1-2 binding ratio per hexamer apparent from experiments, so this was a parameterization choice, not a prediction by CG simulations as the authors erroneously claim.” – This comment by the reviewer is deeply flawed and we strongly disagree. In our CG model there is no restriction on the number of LEN molecules that can bind to a CA hexamer. We restate that the experimental results on LEN binding to CA hexamers, and on the inability of LEN to bind to pentamers, were used because no all-atom (AA) force field yet exists.

      The steric effect of the lack of third LEN binding to a hexamer is a likely hypothesis (which one is allowed to make). More importantly, an investigation of the steric effect of LEN binding to the CA hexamer is not the main goal of the manuscript.
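
      The occupancy figures discussed in point (d), the mean number of LEN per CA hexamer and the fraction of unoccupied hexamers, can be tallied from a list of per-hexamer bound counts. The following is a minimal sketch, not the authors' analysis code: the function name `occupancy_summary` and the input values are hypothetical, and the toy counts were chosen by hand only to mirror the quoted ~1.5 LEN per hexamer and ~20% unoccupied.

```python
from collections import Counter


def occupancy_summary(len_per_hexamer):
    """Summarize how many LEN molecules are bound to each CA hexamer.

    len_per_hexamer: one integer per hexamer (number of bound LEN).
    Returns (mean bound per hexamer, fraction with zero bound, histogram).
    """
    n = len(len_per_hexamer)
    if n == 0:
        raise ValueError("need at least one hexamer")
    histogram = Counter(len_per_hexamer)
    mean_bound = sum(len_per_hexamer) / n
    fraction_empty = histogram.get(0, 0) / n
    return mean_bound, fraction_empty, dict(histogram)


# Hypothetical toy input (NOT simulation data): 10 hexamers, 0-3 LEN each.
toy_counts = [0, 0, 1, 1, 2, 2, 2, 2, 2, 3]
mean_bound, fraction_empty, histogram = occupancy_summary(toy_counts)
print(mean_bound, fraction_empty, histogram)
```

      Reporting the full histogram alongside the mean makes it visible whether triply occupied hexamers are rare or absent, which is the quantity at issue in this exchange.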

      (e) Claim: Competition between NUP98 and LEN regulates capsid docking. The authors state: "A fraction of LEN molecules bound at the narrow end dissociate to allow NUP98 binding to the capsid ... Therefore, LEN can inhibit the efficient binding of the viral cores to the NPC, resulting in an increased number of cores in the cytoplasm." Capsid docking occurs regardless of the presence of LEN, and appears to occur at the same rate as the LEN-free capsid presented in the authors' previous work (Hudait & Voth, 2024). The presented data simply show that there is a fluctuation of bound LEN, with about 10 fewer (<5%) bound at the end of the simulation than at the beginning, and the curve (Figure 2A) does not clearly correlate with increased NUP98 contact. In that case, no data is shown that connects LEN binding with the regulation of the docking process. Further, the two quoted statements contradict each other. The presented data appear to show that NUP outcompetes LEN binding, rather than LEN inhibiting NUP binding. The "Therefore" statement is an attempt to reconcile with experimental studies, but is not substantiated by the presented data.

      We disagree with this spurious statement, and we see no real contradiction. We have now added a minor clarification that LEN can inhibit efficient capsid binding at significantly high concentration.

      Page 6, Paragraph 1

      “Therefore, at significantly high concentration LEN can inhibit the efficient binding of the viral cores to the NPC, resulting in an increased number of cores in the cytoplasm.”

      (f) Claim: LEN binding leads to spontaneous dissociation of pentamers. The CG simulation trajectories show pentamer dissociation. However, it is quite difficult to believe that a pentamer in the wide end of the capsid would dissociate and diffuse 100 nm away before a hexamer in the narrow end (previously between two pentamers and now only partially coordinated, also in a highly curved environment, and further under the force of the extruding RNA) would dissociate, as in Figure 2B. A more plausible explanation could be force balance between pent-hex versus hex-hex contacts, an aspect of CG parameterization. No further modeling is presented to explain the release of pentamers, and changes in pent-hex stiffness are not apparent in the force constant fluctuation analysis in Figure 6C.

      This is both a misrepresentation of the simulations and a failure to understand them (as well as the supporting experiments) on the part of the reviewer. In the presence of LEN, the hexameric lattice is hyperstabilized. In contrast, the pentamers are not. As a consequence, the pentamers are dissociated. The pentamers at the narrow end are dissociated first, due to high curvature. The reviewer, from a point of being uninformed, simply speculates on what they think should happen. Moreover, as emphasized earlier, and as the reviewer fails to comprehend, ours is a “bottom-up CG model”, so it predicts, not builds in, these effects.

      (g) Claim: WTMetaD simulations predict capsid rupture. The authors state: "In WTMetaD simulations, we used the mean coordination number (Figure S6) between CA proteins in pentamers and in hexamers as the reaction coordinate." This means that the coordination number, the number of pent-hex contacts, is the bias used to accelerate simulation sampling. Yet the authors then interpret a change in coordination number leading to capsid rupture as a discovery, representing a fundamental misuse of the WTMetaD method. Changes in coordination number cannot be claimed as an emergent property when they are in fact the applied bias, when the simulation forced them to sample such states. The bias must be orthogonal to the feature of interest for that feature to be discoverable. While the reported free energies are orthogonal to the reaction coordinate, the structural and stepwise-mechanism "findings" here represent circular reasoning.

      Unfortunately, the reviewer appears to be quite uninformed on the WTMetaD method and what it does. The chosen collective variable (CV) in our case is the coordination number, and MetaD samples along that variable (the conditional free energy), as it is designed to do. The reviewer may wish to educate themselves by reading Dama et al. (https://doi.org/10.1103/PhysRevLett.112.240602). We also note that “emergent properties” are not along some other, uncoupled coordinate.
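
      For readers following the exchange in point (g), a mean coordination number used as a collective variable is typically a smooth count of near-neighbor contacts. The sketch below assumes a rational switching function, a common choice in metadynamics software; the cutoff `r0`, the exponents `n` and `m`, and the function names are illustrative assumptions and are not taken from the manuscript.

```python
import math


def switching(r, r0, n=6, m=12):
    """Rational switching function: close to 1 for r << r0, close to 0 for r >> r0."""
    x = r / r0
    if math.isclose(x, 1.0):
        return n / m  # limit of (1 - x**n) / (1 - x**m) as x -> 1
    return (1.0 - x**n) / (1.0 - x**m)


def mean_coordination(group_a, group_b, r0):
    """Average, over sites in group_a, of the smooth count of group_b neighbors."""
    total = 0.0
    for a in group_a:
        for b in group_b:
            total += switching(math.dist(a, b), r0)
    return total / len(group_a)


# Hypothetical coordinates: one site with one close and one distant neighbor.
site = [(0.0, 0.0, 0.0)]
neighbors = [(0.5, 0.0, 0.0), (10.0, 0.0, 0.0)]
cv = mean_coordination(site, neighbors, r0=1.0)
print(cv)  # slightly below 1: only the close neighbor contributes
```

      Because the function is differentiable, a bias potential deposited along this variable produces well-defined forces on the underlying particles, which is why a smooth count rather than a hard cutoff is used.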

      (3) Another major concern with this work is the excessive self-citation, and the conspicuous lack of engagement with similar computational modeling studies that investigate the HIV capsid and its interactions with LEN, capsid mechanical properties relevant to nuclear entry, and other capsid-NPC simulations (e.g., 10.1016/j.cell.2024.12.008 and 10.1371/journal.ppat.1012537). Other such studies available in the literature include examination of varying aspects of the system at both CG and all-atom levels of resolution, which could be highly complementary to the present work and, in many cases, lend support to the authors' claims rather than detract from them. The choice to omit relevant literature implies either a lack of perspective or a lack of collegiality, which the presentation of the work suffers from. Overall, it is essential to discuss findings in the context of competing studies to give readers an accurate view of the state of the field and how the present work fits into it. It is appropriate in a CG modeling study to discuss the potential weaknesses of the methodology, points of disagreement with alternative modeling studies, and any lack of correlation with a broader range of experimental work. Qualitative agreement with select experiments does not constitute model validation. 

      We disagree with this statement and point out where we have cited other work, including the ones mentioned above. However, our CG model is a largely bottom-up CG model which differs from other more ad hoc CG approaches (and some well-known CG models). We do not wish to emphasize the obvious flaws in those other CG approaches and models, since that is not the focus of our manuscript.

      (4) Other critiques, questions, concerns:

      (a) The first Results sub-heading presents "results", complete with several supplementary figures and a movie that are from a previous publication about the development of the HIV capsid-NPC model in the absence of LEN (Hudait & Voth, 2024). This information should be included as part of the introduction or an abbreviated main-text methods section rather than being included within Results as if it represents a newly reported advancement, as this could be misleading. 

      The movie in question (capsid docking to NPC without LEN) is essential for comparison of LEN-binding dynamics. Unlike in our previous paper, we simulated significantly longer timescales of capsid docking and performed several additional analyses that are relevant to this paper. Moreover, the first section of the Results is titled “Coarse-grained modeling and simulation”; hence, we only present a summary of the CG models and key validation steps in this section.

      (b) The authors say the unbiased simulations of capsid-NPC docking were run as two independent replicates, but results from only one trajectory are ever shown plotted over time. It is not mentioned if the time series data are averaged or smoothed, so what is the shadow in these plots (e.g., Figures 1,2, and Supplementary Figure 5)?

      These simulations are the average from two replicas. “For all the plots, the solid lines are the mean values calculated from the time series of two independent replicas, and the shaded region is the standard deviation at each timestep.” This was mentioned in the original figure caption.
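
      The plotting convention quoted above (solid line as the mean of two replicas, shaded band as the standard deviation at each timestep) can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: the function name `replica_band` and the toy values are hypothetical, and the population standard deviation is assumed because the caption does not say which estimator was used.

```python
from statistics import mean, pstdev


def replica_band(replicas):
    """Per-timestep mean and standard deviation across replica time series."""
    lengths = {len(series) for series in replicas}
    if len(lengths) != 1:
        raise ValueError("all replicas must have the same number of timesteps")
    means, stds = [], []
    for values in zip(*replicas):  # one tuple of replica values per timestep
        means.append(mean(values))
        stds.append(pstdev(values))  # assumption: population SD (divide by N)
    return means, stds


# Hypothetical toy time series for two independent replicas.
replica_1 = [10.0, 12.0, 15.0]
replica_2 = [14.0, 12.0, 11.0]
means, stds = replica_band([replica_1, replica_2])
print(means)  # [12.0, 12.0, 13.0]
print(stds)   # [2.0, 0.0, 2.0]
```

      With only two replicas the band equals half the absolute difference between them under this estimator, so it shows the spread between the two runs rather than a statistically robust uncertainty.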

      (c) Why do the insets showing LEN binding in Figure 2A look so different from the models they are apparently zoomed in on? Both instances really look like they are taken from different simulation frames, rather than being a zoomed-in view.

      It is difficult to discern a high curvature region of the capsid due to object overlap of different regions of the capsid. This is likely a case of “perspective distortion” in image processing.

      (d) What are the sudden jerks apparent in the SI movies? Perhaps this is related to the rate at which trajectory frames are saved, but occasionally, during the relatively smooth motion of the capsid-NPC complex, something dramatic happens all of a sudden in a frame. For example, significant and apparently instantaneous reorientation of the cone far beyond what preceding motions suggest is possible (SI movie 2, at timestamp 0.22), RNP extrusion suddenly in a single frame (SI movie 2, at timestamp 0.27), and simultaneous opening of all pentamers all at once starting in a single frame (SI movie 2, at timestamp 0.33). This almost makes the movie look generated from separate trajectories or discontinuous portions of the same trajectory. If movies have been edited for visual clarity (e.g., to skip over time when "nothing" is happening and focus on the exciting aspects), then the authors should state so in the captions. 

      This is due to the rate at which trajectory frames are saved for movie generation, which was chosen to speed up processing of the movies. We added the following to the movie caption: 

      “The movie frames correspond to snapshots every 250000 𝜏<sub>CG</sub>.” 

      (e) Figure 3c presents a time series of the degree of defects at pent-hex and hex-hex interfaces, but I do not understand the normalization. The authors state, "we represented the defects as the number of under-coordinated CA monomers of the hexamers at the pentamer-hexamer-pentamer and hexamer-hexamer interface as N_Pen-Hex and N_Hex-Hex ... Note that in N_Pen-Hex and N_Hex-Hex are calculated by normalizing by the total number of CA pentamer (12) and hexamer rings (209) respectively." Shouldn't the number of uncoordinated monomers be normalized by the number of that type of monomer, rather than the number of capsomers/rings? E.g., 12*5 and 209*6, rather than 12 and 209?

      We prefer to continue with the current normalization, since typically in the HIV-1 literature capsids are represented as a collection of hexamers and pentamers (rather than total number of CA monomers).
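
      The two normalizations debated in point (e) differ only by a constant factor per interface type, which a short calculation makes explicit. The ring counts (12 pentamers, 209 hexamers) come from the quoted text; the function name and the raw defect count below are hypothetical and used only for illustration.

```python
N_PENTAMERS = 12   # pentamer rings per capsid, as quoted in the text
N_HEXAMERS = 209   # hexamer rings per capsid, as quoted in the text


def normalized_defects(n_undercoordinated, n_rings, monomers_per_ring=None):
    """Normalize a defect count per ring, or per monomer if a ring size is given."""
    denominator = n_rings
    if monomers_per_ring is not None:
        denominator = n_rings * monomers_per_ring
    return n_undercoordinated / denominator


# Hypothetical raw count of under-coordinated CA monomers at hex-hex interfaces.
raw_count = 30
per_ring = normalized_defects(raw_count, N_HEXAMERS)          # authors' convention
per_monomer = normalized_defects(raw_count, N_HEXAMERS, 6)    # reviewer's suggestion
print(per_ring / per_monomer)  # the two conventions differ by the ring size
```

      Either choice leaves the shape of each time series unchanged. What changes is the relative scale of the pentamer-hexamer and hexamer-hexamer curves, by a factor of 6/5, so the caption should state which convention is plotted.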

      (f) The authors state that "Although high computational cost precluded us from continuing these CG MD simulations, we expect these defects at the hexamer-hexamer interface to propagate the high curvature ends of the capsid." The defects being reported are apparently propagating from (not towards) the high curvature ends of the capsid. 

      We corrected the statement as follows:

      “Although high computational cost precluded us from continuing these CG MD simulations, we expect these defects at the hexamer-hexamer interface to propagate from the high curvature to low curvature end of the capsid.”

      (g) The first half of the paper uses the color orange in figures to indicate LEN, but the second half uses orange to indicate defects, and this could be confusing for some readers. Both LEN and "defects" are simply a cluster of spheres, so highlighted defects appear to represent LEN without careful reading of captions.

      We only show LEN in Figure 1; in the rest of the figures, the bound LEN molecules are not shown, for clarity. The defects are shown in a darker shade of orange (amber). 

      (h) SI Figure S3 captions says "The CA monomers to which at least one LEN molecule is bound are shown in orange spheres. The CA monomers to which no LEN molecule is bound are shown in white spheres. " While in contradiction, the main-text Fig 2 says "The CA monomers to which at least one LEN molecule is bound are shown in white spheres. The CA monomers to which no LEN molecule is bound are shown in orange spheres. " One of these must be a typo.

      We have corrected the erroneous caption in Fig. S3. The color schemes in Fig. 2 and Fig. S3 are now consistent.

      (i) The authors state that: "CG MD simulations and live-cell imaging demonstrate that LEN-treated capsids dock at the NPC and rupture at the narrow end when bound to the central channel and then remain associated to the NPC after rupture." However, the live cell imaging data do not show where rupture occurs, such that this statement is at least partially false. It is also unclear that CG simulations show that cores remain bound following rupture, given that simulations were not extended to the timescale needed to observe this, again rendering the statement partially false.

      We modified the statement as follows:

      “CG MD simulations complemented by the outcome of live-cell imaging demonstrate that LEN-treated capsids dock at the NPC and rupture at the narrow end when bound to the central channel and then remain associated with the NPC after rupture.”

      (j) The authors state: "We previously demonstrated that the RNP complex inside the capsid contributes to internal mechanical strain on the lattice driven by CACTD-RNP interactions and condensation state of RNP complex (Hudait & Voth, 2024). " In that case, why do the present CG models detect no difference in results for condensed versus uncondensed RNP?

      In our previous paper, the differences arising from the condensation state of the RNP complex appeared only in the pill-shaped capsid, not in the cone-shaped capsid. In this manuscript, we investigated only the cone-shaped capsid.

      (k) The authors state: "The distribution demonstrates that the binding of LEN to the distorted lattice sites is energetically favorable. Since LEN localizes at the hydrophobic pocket between two adjoining CA monomers, it is sterically favorable to accommodate the incoming molecule at a distorted lattice site. This can be attributed to the higher available void volume at the distorted lattice relative to an ordered lattice, the latter being tightly packed. This also allows the drug molecule to avoid the multitude of unfavorable CA-LEN interactions and establish the energetically favorable interactions leading to a successful binding event. " What multitude of unfavorable interactions are the authors referring to? Data is not presented to substantiate the claim of increased void volume between hexamers in the distorted lattice. Capsomer distortion is shown as a schematic in Figure 6A rather than in the context of the actual model.

      “What multitude of unfavorable interactions are the authors referring to?” We have now added the following sentence to clarify:

      “Here we denote unfavorable CA-LEN interactions as all interactions other than the electrostatic and van der Waals interactions that lead to CA-LEN binding (17).”

      The increase of void volume in the distorted lattice is based on standard solid-state physics understanding. We added the word “likely” to the statement: “This can likely be attributed to the higher available void volume at the distorted lattice relative to an ordered lattice, the latter being tightly packed (41).”

      Moreover, in one of our previous manuscripts, we established that compressive or expansive strain induces more closely packed or expanded lattice (A. Yu et al., Strain and rupture of HIV-1 capsids during uncoating. Proceedings of the National Academy of Sciences 119, e2117781119 (2022)).

      (l) The authors state that "These striated patterns also demonstrate deviations from ideal lattice packing. " What does ideal lattice packing mean in this context, where hexamers are in numerous unique environments in terms of curvature? What is the structural reference point?

      The ideal lattice packing definition is provided in our previous manuscripts: 1. A. Yu et al., Strain and rupture of HIV-1 capsids during uncoating. Proceedings of the National Academy of Sciences 119, e2117781119 (2022), 2. A. Hudait, G. A. Voth, HIV-1 capsid shape, orientation, and entropic elasticity regulate translocation into the nuclear pore complex. Proceedings of the National Academy of Sciences 121, e2313737121 (2024).

      These manuscripts are cited in the previous statement. The ideal lattice packing is defined based on lattice separations in each core (in cryo-ET and atomistic simulations) using a local order parameter, which measures the near-neighbor contacts of a particle. Moreover, the ideal packing reference is calculated from all available capsid shapes (cone, ellipsoid, and tubular), and takes into account different curvatures.

      (m) If pentamer-hexamer interactions are weakened in the presence of LEN, why are differences at these interfaces not apparent in the Figure 6C data that shows stiffening of the interactions between capsomer subunits?

      We have added a statement as follows:

      “Based on our analysis, we hypothesize that LEN binding hyperstabilizes the CA hexamer-hexamer interactions relative to CA hexamer-pentamer interaction.”

      (n) The authors state: "Lattice defects arising from the loss of pentamers and cracks along the weak points of the hexameric lattice drive the uncoating of the capsid." The word rupture or failure should be used here rather than uncoating; it is unclear that the authors are studying the true process of uncoating and whether the defects induced by LEN binding relate in any way to uncoating. 

      We have now changed “uncoating” to “rupture” throughout the manuscript.

      (o) The authors state: " LEN-treated broken cores are stabilized by the interaction with the disordered FG-NUP98 mesh at the NPC." But no data is presented to demonstrate that capsid stability is increased by NUP98 interaction. In fact, the presented data could suggest the opposite since capsids in contact with NUP98 in the NPC appeared to rupture faster than freely diffusing capsids.

      We have modified the statement as follows

      “We hypothesize that LEN-treated broken cores are stabilized by the interaction with the disordered FG-NUP98 mesh at the NPC.”

      (p) The authors state: "LEN binding stimulates similar changes in free capsids, but they occur with lower frequency on similar time scales, suggesting that the cores docked at the NPC are under increased stress, resulting in more frequent weakening of the hexamer-pentamer and hexamer-hexamer interactions, as well as more nucleation of defects at the hexamer-hexamer interface. ... Our results suggest that in the presence of the LEN, capsid docking into the NPC central channel will increase stress, resulting in more frequent breaks in the capsid lattice compared to free capsids." The first is a run-on sentence. The results shown support that LEN stimulates changes in free capsids to happen faster, but not more frequently. The frequency with which an event occurs is separate from the speed with which the event occurs.

      We have fixed the run-on sentence.

      Reviewer comment: “The results shown support that LEN stimulates changes in free capsids to happen faster, but not more frequently. The frequency with which an event occurs is separate from the speed with which the event occurs.”

      We disagree with the reviewer. The statement was intended to provide a comparison between free capsid and NPC-bound capsid.

      (q) The authors state: "A possible mechanistic pathway of capsid disassembly can be that multiple pentamers are dissociated from the capsid sequentially, and the remaining hexameric lattice remains stabilized by bound LEN molecules for a time, before the structural integrity of the remaining lattice is compromised." This statement is inconsistent with experimental studies that say LEN does not lead to capsid disassembly, and may even prevent disassembly as part of its disruption of proper uncoating (e.g., 10.1073/pnas.2420497122 previously published by the authors).

      We disagree with the interpretation of the reviewer. Our interpretation, based on our results, is that LEN binding accelerates capsid rupture (from the pentamer-rich, high-curvature ends), while the rest of the broken hexameric lattice is hyperstabilized. Ultimately, lattice rupture will lead to release of the RNP, and hence the intended goal of the drug is achieved.

      (r) Finally, it remains a concern with the authors' work that the bottom-up solvent-free CG modeling software used in this and supporting works is not open source or even available to other researchers like other commonly used molecular dynamics software packages, raising significant questions about transparency and reproducibility.

      The simulations were performed in LAMMPS, which is open source. This software is already stated in the Methods. Input data is provided upon request.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1: In part B, it appears the middle panel was screenshotted from a ppt, given the red line underneath Lenacapavir. You can export it to an image instead.

      The figure is fixed.

      (2) Figure 6: In part A, the LEN_d in the graph is illegible. Also, in the panel next to it, it also appears to have been screenshotted from a ppt.

      The figure is fixed.

      (3) Page 6: There's an errant quotation mark at the end of a paragraph.

      We removed the errant quotation mark.

      Reviewer #2 (Recommendations for the authors):

      The code used to perform bottom-up solvent-free CG modeling simulations is not made available.

      This is not true. LAMMPS was used as stated in Methods.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This study established a C921Y OGT-ID mouse model, systematically demonstrating in mammals the pathological link between O-GlcNAc metabolic imbalance and neurodevelopmental disorders (cortical malformation, microcephaly) as well as behavioral abnormalities (hyperactivity, impulsivity, learning/memory deficits). However, critical flaws in the current findings require resolution to ensure scientific rigor.

      The most concerning finding appears in Figure S12. While Supplementary Figure S12 demonstrates decreased OGA expression without significant OGT level changes in C921Y mutants via Western blot/qPCR, previous reports (Florence Authier, et al., Dis Model Mech. 2023) described OGT downregulation in Western blot and an increase in qPCR in the same models. The opposite OGT expression outcomes in supposedly identical mouse models directly challenge the model's reliability. This discrepancy raises serious concerns about either the experimental execution or the interpretation of results. The authors must revalidate the data with rigorous controls or provide a molecular biology-based explanation.

      We thank the reviewer for their time and effort in improving the quality of our manuscript.

      We would like to point out that the results presented in the previous Fig. S12 (now Fig. S13) come from mice of a different age and are restricted to the prefrontal cortex, whereas in the previous report (Florence Authier, et al., Dis Model Mech. 2023) we showed OGT and OGA mRNA/protein expression in total brain homogenates. In that study, we observed a significant reduction in OGT protein levels, while OGT mRNA levels were significantly increased, in the brains of 3-month-old C921Y mutants compared to WT controls. In our current study (Figure S12, now S13), however, OGA and OGT mRNA/protein expression were measured a) only in the prefrontal cortex and b) in 4-month-old male mice. Therefore, a direct comparison of findings from total brain vs. prefrontal cortex would be speculative. In our present work, OGT protein levels are not changed in the prefrontal cortex, while OGT mRNA levels are increased (similarly to the total brain data), albeit not significantly.

      It is plausible that the different levels of OGT protein expression in total brain (previous study) and prefrontal cortex (current study) potentially reflect regional differences in the regulation of OGT protein levels/stability, since OGT mRNA levels are increased in both cases. This notion is also supported by additional analyses in three other brain regions (hippocampus, striatum and cerebellum) and these data are now included in Figures S13 and S14.

      A few additional comments to the author may be helpful to improve the study.

      Major

      (1) While this study systematically validated multi-dimensional phenotypes (including neuroanatomical abnormalities and behavioral deficits) in OGT C921Y mutant mice, there is a lack of relevant mechanisms and intervention experiments. For example, the absence of targeted intervention studies on key signaling pathways prevents verification of whether proteomics-identified molecular changes directly drive phenotypic manifestations.

      We agree with the reviewer that the suggested experiments would further strengthen our work. However, the extensive nature of the suggested studies would result in considerable delay in sharing this work with the scientific and patient communities. Nevertheless, we appreciate the reviewer’s comment and will continue to work along these lines and report the results in a follow-up manuscript.

      (2) Although MRI detected nodular dysplasia and heterotopia in the cingulate cortex, the cellular basis remains undefined. Spatiotemporal immunofluorescence analysis using neuronal (NeuN), astrocytic (GFAP), and synaptic (Synaptophysin) markers is recommended to identify affected cell populations (e.g., radial glial migration defects or intermediate progenitor differentiation abnormalities).

      Following the reviewer’s suggestion, we have performed additional analyses to identify the cellular composition of the observed nodular dysplasia using neuronal and glial markers. These new analyses indicate that the nodular collections in layers II/III were predominantly neurons; for example, see cresyl violet (Fig. 6E). Moreover, we have also performed immunofluorescence imaging using NeuN and GFAP (Fig. 6G-H), which indicates that the dystrophic collections are predominantly neurons. To further corroborate these findings, we have also performed multiplex IHC analyses, presented in Fig. S12, which indicate that i) the nodular cortical malformations were populated by neurons and oligodendrocytes and ii) they predominantly affected layers II-V, as reflected by the distribution of the neuronal markers Reelin and POU class 3 homeobox 2 (POU3F2). Collectively, these data (Fig. 6 and Fig. S12) reflect neuronal disorganisation due to migration defects rather than differentiation defects. We appreciate the reviewer’s suggestion to perform spatiotemporal analyses of these cellular features; however, tissue from defined stages of development is not available. 

      (3) While proteomics revealed dysregulation in pathways including Wnt/β-catenin and mTOR signaling, two critical issues remain unresolved: a) O-GlcNAc glycoproteomic alterations remain unexamined; b) The causal relationship between pathway changes and O-GlcNAc imbalance lacks validation. It is recommended to use co-immunoprecipitation or glycosylation sequencing to confirm whether the relevant proteins undergo O-GlcNAc modification changes, identify specific modification sites, and verify their interactions with OGT.

      We agree with the referee that these experiments would further strengthen the work. However, we respectfully point out that the inference that altered proteins must themselves be O-GlcNAc modified is not necessarily correct. For instance, O-GlcNAcylation of unknown protein kinase X, E3 ligase/DUB Y, or transcription factor Z could indirectly affect these pathways/proteins. Nevertheless, we have performed further experiments to explore whether Wnt/β-catenin and mTOR signalling are functionally affected, as pointed out by the referee. In the qPCR analyses, we did not observe significant changes in the expression of Wnt target genes (Cdkn1a, Ccnd1, Myc, Ramp3, Tfrc), nor, by western blot, in the levels of key proteins involved in Wnt/β-catenin (non-phosphorylated β-catenin) and mTOR (phosphorylated rpS6) signalling (data not shown). These results suggest that neither pathway is functionally deregulated to a significant extent in the prefrontal cortex of adult OGT<sup>C921Y</sup> mice.

      (4) Given that OGT-ID neuropathology likely originates embryonically, we recommend serial analyses from E14.5 to P7 to examine cellular dynamics during critical corticogenesis phases.

      We appreciate the reviewer’s suggestion to perform spatiotemporal analyses of these cellular dynamics; however, tissue from defined stages of development is not available. As stated above, we want to share our current findings with the scientific and patient communities in a timely manner, and the suggested experiments could form the foundation of a follow-up study.

      (5) The interpretation of Figure 8A constitutes overinterpretation. Current data fail to conclusively demonstrate impairment of OGT's protein interaction network and lack direct evidence supporting the proposed mechanisms of HCF1 misprocessing or OGA loss.

      Thank you for the comment. To avoid misleading the readers, we have removed panel A from the previous version of Figure 8 and updated the version of record.

      Reviewer #2 (Public review):

      Summary:

      The authors are trying to understand why certain mutants of O-GlcNAc transferase (OGT) appear to cause developmental disorders in humans. As an important step towards that goal, the authors generated a mouse model with one of these mutations that disrupts OGT activity. They then go on to test these mice for behavioral differences, finding that the mutant mice exhibit some signs of hyperactivity and differences in learning and memory. They then examine alterations to the structure of the brain and skull and again find changes in the mutant mice that have been associated with developmental disorders. Finally, they identify proteins that are up- or down-regulated between the two mice as potential mechanisms to explain the observations.

      Strengths:

      The major strength of this manuscript is the creation of this mouse model, as a key step in beginning to understand how OGT mutants cause developmental disorders. This line will prove important for not only the authors but other investigators as well, enabling the testing of various hypotheses and potentially treatments. The experiments are also rigorously performed, and the conclusions are well supported by the data.

      Weaknesses:

      The only weakness identified is a lack of mechanistic insight. However, this certainly may come in the future through more targeted experimentation using this mouse model.

      We agree with the reviewer that the suggested experiments would further strengthen our work. However, the extensive nature of the suggested studies would result in considerable delay in sharing this work with the scientific and patient communities. Nevertheless, we appreciate the reviewer’s comment and will continue to work along these lines and report the results in a follow-up manuscript.

      Recommendations for the authors:

      Editor's note:

      Should you choose to revise your manuscript, if you have not already done so, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and, where appropriate, 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.

      Statistics including exact p-values have been included in the main text for all key questions where appropriate.

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure 1F, the y-axis labels and scale values are partially obscured by graphical elements, compromising accurate interpretation of the data range.

      Panel 1F has been adjusted to make the y-axis label visible.

      (2) Regarding the histological analyses in Figure 6, the current H&E staining and Luxol Fast Blue myelin staining results lack age-matched wild-type control samples processed in parallel, which undermines experimental comparability. To enhance methodological rigor, control group staining results should be displayed adjacent to each experimental group image.

      The original Figure 6 already contained comparison between WT and OGT<sup>C921Y</sup> tissues. The Figure has been updated with additional data from the WT and C921Y mutant groups shown side by side.

      Reviewer #2 (Recommendations for the authors):

      (1) I believe that Figures S1 and S2 were switched during the submission. The legends are correct, so the authors should just be careful with the order when they upload the final versions.

      Figures S1 and S2 have been re-ordered.

      (2) On page 18, the authors state, "Although no significant changes in the expression of OGT were observed in OGTC921Y cortex (Figure S12A, C), there was a significant increase in OGT/OGA protein ratio in OGTC921Y mice (Fig. S12D). As a functional consequence, global O-GlcNAcylation of proteins in the brain was drastically impaired in the OGTC921Y brain compared to WT (Figure S12E, F)."

      To me, this statement suggests that the incorrect ratio of OGT to OGA is responsible for the altered O-GlcNAc levels. I think this is missing important information. The authors are, I'm sure, aware that OGT and OGA expression is linked to O-GlcNAc levels. I think it would be better to describe the situation here as the tissue attempting to respond to lower OGT activity by lowering OGA levels. However, the tissue is not fully successful, resulting in lower overall O-GlcNAc levels as seen by RL2. If the difference were only driven by the OGT/OGA ratio, one would expect increased O-GlcNAc levels due to decreased OGA. I think it is important to point out more details here for non-expert readers.

      Thank you for the insightful comment. We have included these aspects in the revised text; please see page 20.

      (3) I am a little surprised that the authors did not explore differences in O-GlcNAc-modified proteins through a more targeted enrichment of these proteins for analysis of potential modification differences, in addition to just changes in protein abundance.

      We agree that these experiments would further strengthen the work. However, it is not yet known whether OGT-CDG is caused by loss of O-GlcNAc modification on specific proteins or by mechanisms that remain to be deciphered (e.g. OGT interactome, HCF1 processing, feedback on OGA levels), which we are not able to confirm in the current manuscript. Therefore, as a starting point, we have performed whole proteome analysis to establish candidate hypotheses that could lead to the discovery of cellular and molecular mechanisms underlying OGT-CDG. Lastly, we appreciate the reviewer's comment and will continue to work along these lines and report the results in a follow-up manuscript.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript presents high-resolution cryoEM structures of VPS34-complex II bound to Rab5A at 3.2 Å resolution. The Williams group previously reported the structure of VPS34 complex II bound to Rab5A on liposomes using tomography, and therefore, the previous structure, although very informative, was at lower resolution.

      The first new structure they present is of the 'REIE>AAAA' mutant complex bound to RAB5A. The structure resembles the previously determined one, except that an additional molecule of RAB5A was observed bound to the complex in a new position, interacting with the solenoid of VPS15.

      Although this second binding site exhibited reduced occupancy of RAB5A in the structure, the authors determined an additional structure in which the primary binding site was mutated to prevent RAB5A binding ('REIE>ERIR'). In this structure, there is no RAB5A bound to the primary binding site on VPS34, but the RAB5A bound to VPS15 now has strong density. The authors note that the way in which RAB5A interacts with each site is distinct, though both interfaces involve the switch regions. The authors confirm the location of this additional binding site using HDX-MS.

      The authors then determine multiple structures of the wild-type complex bound to RAB5A from a single sample, as they use 3D classifications to separate out versions of the complex bound to 0, 1, or 2 copies of RAB5A. Overall, the structure of VPS34-Complex II does not change between the different states, and the data indicate that both RAB5A binding sites can be occupied at the same time.

      The authors then design a new mutant form of the complex (SHMIT>DDMIE) that is expected to disrupt the interaction at the secondary site between VPS15 and RAB5A. This mutation had a minor impact on the Kd for RAB5A binding, but when combined with the REIE>ERIR mutation of the primary binding site, RAB5A binding to the complex was abolished.

      Comparison of sequences across species indicated that the RAB5A binding site on VPS15 was conserved in yeast, while the RAB5A binding site on VPS34 is not.

      The authors tested the impact of a corresponding yeast Vps15 mutation (SHLITY>DDLIEY) predicted to disrupt interaction with yeast Rab5/Vps21, and found that this mutant Vps15 protein was mislocalized and caused defective CPY processing.

      The authors then compare these structures of the RAB5A-class II complex to recently published structures from the Hurley group of the RAB1A-class I complex, and find that in both complexes the Rab protein is bound to the VPS34 binding site in a somewhat similar manner. However, a key difference is that the position of VPS34 is slightly different in the two complexes because of the unique ATG14L and UVRAG subunits in the class I and class II complexes, respectively. This difference creates a different RAB binding pocket that explains the difference in RAB specificity between the two complexes.

      Finally, the higher resolution structures enable the authors to now model portions of BECLIN1 and UVRAG that were not previously modeled in the cryoET structure.

      Strengths:

      Overall, I found this to be an interesting and comprehensive study of the structural basis for the interaction of RAB5A with VPS34-complex II. The authors have performed experiments to validate their structural interpretations, and they present a clear and thorough comparative analysis of the Rab binding sites in the two different VPS34 complexes. The result is a much better understanding of how two different Rab GTPases specifically recruit two different, but highly similar complexes to the membrane surface.

      Weaknesses:

      No significant weaknesses were noted.

      Reviewer #2 (Public review):

      Summary:

      The work by Spokaite et al describes the discovery of a novel Rab5 binding site present in complex II of class III PI3K using a combination of HDX and Cryo EM. Extensive mutational and sequence analysis define this as the primordial Rab5 interface. The data presented are convincing that this is indeed a biologically relevant interface, and is important in defining mechanistically how VPS34 complexes are regulated.

      This paper is a very nice expansion of their previous cryo-ET work from 2021, and is an excellent companion piece on high-resolution cryo-EM of the complex I class III complex bound to Rab1 from the Hurley lab in 2025. Overall, this work is of excellent technical quality and answers important unexplained observations on some unexpected mutational analysis from the previous work.

      They used their increased affinity VPS34 mutant to determine the 3.2 Å structure of Rab5 bound to VPS34-CII. Clear density was seen for the original Rab5 interface, but an additional site was observed. Based on this structure, they mutated out the VPS34 interface, allowing for a high-resolution structure of the Rab5 bound at the VPS15 interface.

      They extensively validated the VPS15 interface in the yeast variant of VPS34, showing that the Vps15-Rab5 (VPS21) interface identified is critical in controlling complex II VPS34 recruitment.

      The major strengths of this paper are that the experiments appear to be done carefully and rigorously, and I have very few experimental suggestions.

      Here is what I recommend based on some very minor weaknesses I observed:

      (1) My main concern has to do a little bit with presentation, specifically how the authors describe mutants. They clearly indicate the mutant sequence in the human isoform (for example, see Figure 2A, VPS15 described as 579-SHMIT-583>DDMIE); however, when they shift to the yeast version, they refer only to the VPS15 mutant without defining it (Figure 2G). I would recommend they just include the same sequence numbering and WT to mutant replacement every time a new mutant (or species) is described. It is always easier to interpret what is being shown when the authors are jumping between species, when the exact mutant is included. This is particularly important in this paper, where we are jumping between different subunits and different species, so a clear description in the figure/figure legends makes it much easier to read for non-specialists.

      The reviewer has made an excellent point here. To clarify the yeast mutation, we have revised the manuscript main text to refer to the yeast mutant as SHLITY>DDLIEY, and we have added this to the legend for Figs. 2F,G.

      (2) The HDX data very clearly show that Rab5 is likely able to bind at both sites, which backs up the cryo-EM data nicely. I am slightly confused by some of the HDX statements described in the methods.

      (3) The authors state, "Only statistically significant peptides showing a difference greater than 0.25 Da and greater than 5% for at least two timepoints were kept." This seems to be confusing as to why they required multiple timepoints, and before they also describe that they required a p-value of less than 0.05. It might be clearer to state that significant differences required a 0.25 Da, 5%, and p-value of <0.05 (n=3). Also, what do they mean by kept? Does this mean that they only fully processed the peptides with differences?

      (4) They show peptide traces for a selection in the supplement, but it would be ideal to include the full set of HDX data as an Excel file, including peptides with no differences, as there is a lot of additional information (deuteration levels for everything) that would be useful to share, as recommended from the Masson et al 2019 recommendations paper. This may be attached, but this reviewer could not see an example of it in the shared data dropbox folder.

      We have revised the HDX method description to clarify. All peptides were kept and fully processed. However, for the results displayed, we have illustrated only peptides meeting the criteria described.

      The Excel file for all peptides (as recommended by Masson et al) was deposited with PRIDE, with the dataset identifier PXD061277. In addition, we have included this Excel file in our supplementary material.
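
      The display criteria discussed in points (3) and (4) can be sketched as a simple filter. This is only an illustrative sketch and not the authors' processing pipeline: the data layout and the names `TimepointDiff`, `is_significant` and `peptide_is_displayed` are hypothetical, and applying the p-value cut-off at each timepoint is an assumption, since the text does not specify how the criteria are combined. All peptides are retained; the filter only decides which ones are shown.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TimepointDiff:
    """Deuteration difference for one peptide at one timepoint (hypothetical layout)."""
    delta_da: float       # difference in deuterium uptake, in Da
    delta_percent: float  # the same difference as a percentage
    p_value: float        # from a test across replicates (n=3 in the review text)


def is_significant(tp: TimepointDiff,
                   min_da: float = 0.25,
                   min_percent: float = 5.0,
                   alpha: float = 0.05) -> bool:
    """True if a single timepoint passes all three thresholds quoted in the review."""
    return (abs(tp.delta_da) > min_da
            and abs(tp.delta_percent) > min_percent
            and tp.p_value < alpha)  # assumption: p-value is evaluated per timepoint


def peptide_is_displayed(timepoints: Sequence[TimepointDiff],
                         min_timepoints: int = 2) -> bool:
    """True if at least `min_timepoints` timepoints pass the thresholds."""
    return sum(is_significant(tp) for tp in timepoints) >= min_timepoints


# Example with made-up numbers: two of three timepoints pass, so the peptide is shown.
example = [
    TimepointDiff(0.40, 6.0, 0.01),
    TimepointDiff(0.30, 5.5, 0.03),
    TimepointDiff(0.10, 1.0, 0.50),
]
print(peptide_is_displayed(example))
```

      Requiring two timepoints rather than one is a common guard against a single noisy measurement, which may be why the criterion is stated that way.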

      Reviewer #3 (Public review):

      Summary:

      The manuscript of Spokaite et al. focuses on the Vps34 complex involved in PI3P production. This complex exists in two variants, one (class I) specific for autophagy, and a second one (class II) specific for the endocytic system. Both differ only in one subunit. The authors previously showed that the Vps34 complexes interact with Rab GTPases, Rab1 or Rab5 (for class II), and the identified site was found at Vps34. Now, the authors identify a conserved and overlooked Rab5 binding site in Vps15, which is required for the function of the Class II complex. In support of this, they show cryo-EM data with a second Rab5 bound to Vps15, identify the corresponding residues, and show by mutant analysis that impaired Rab5 binding also results in defects using yeast as a model system.

      Overall, this is a most complete study with little to criticize. The paper shows convincingly that the two Rab5 binding sites are required for Vps34 complex II function, with the Vps15 binding site being critical for endosomal localization. The structural data is very much complete.

      Weaknesses:

      What I am missing are a few controls that show that the mutations in Vps15 do not affect autophagy. I am wondering if this mutant is still functional in autophagy. This can be simply tested by sorting of Atg8 to the vacuole lumen using established assays or by following PhoΔ60 sorting. This analysis would reveal that the corresponding mutant is specific for the Class II complex.

      One of the first noted features of the VPS34 complexes was that the ATG14-containing complex (VPS34-CI) is important for autophagy, while the VPS38 (yeast orthologue of UVRAG) subunit characteristic of VPS34-CII is important for endocytic sorting (PMID 11157979). However, the VPS34, VPS15 and BECLIN1 subunits are present in both complexes; as such, mutations of them may affect both processes.

      We agree with the reviewer that it is an important undertaking to examine the effect of the SHLITY>DDLIEY mutation in yeast Vps15 on autophagy. However, the focus of the current manuscript is VPS34-complex II and RAB5 interaction/activation. An autophagy effect would be more relevant for VPS34 complex I and RAB1. We have not presented any results for human VPS34-complex I - RAB1 nor yeast Vps34-complex I – Ypt1 (yeast RAB1 orthologue). We are preparing another manuscript focusing entirely on this, and it is not a simple story. While we think this is an important question, we believe that this is beyond the scope of the current manuscript.

      It would be helpful if the authors could clarify whether they believe that Vps34 kinase activity is stimulated by Rab binding or whether this stimulation is a consequence of better membrane localization of Vps34. In other words, is the complex active with soluble PI3P in solution, and does the activity change if Rab5 is added to the complex? This might have been addressed in the past, but I did not see evidence for this, as the authors only addressed the activity of the Vps34 complexes on membranes.

      The reviewer has raised an excellent question, which was addressed briefly in the introduction to the manuscript. We have now somewhat expanded on these issues near the end of the discussion in the revised manuscript. In our previously published study, we found that soluble RAB5-GTP did not stimulate the complex II activity (supplementary figure 2b of PMID: 33692360). This is consistent with our finding in this manuscript showing that RAB5 did not cause large conformational changes in solution. However, our previous single-molecule study showed that once complex II is recruited to the membrane by RAB5, RAB5 increases the turnover rate on membranes, indicating an additional allosteric activation (Figure 7 of PMID: 33137306). This study indicated that the primary role of RAB5 is to anchor complex II on the membrane. Once the complex is anchored on the membrane by RAB5, the kinase domain is in the vicinity of its substrate, PI, leading to higher turnover.

      The Echelon Class III PI3K ELISA Kit (Echelon, K-3000) comes with a soluble PI (diC8) for measuring VPS34 activity, and the complex is certainly active with this soluble substrate. However, if the substrate is in membranes, the VPS34 activity is greatly dependent on the character of the membrane.

      I also found the last paragraph of the results section a bit out of place, even though this is a nice observation that the N-terminal part of BECLIN has these domains. However, what does it add to the story?

      The reviewer is correct that the high-resolution features of BECLIN1 at the base of the V-shaped complex that we observed are not related to RAB5 binding, but they are characteristic of VPS34-CII and likely to be important for the specific role of VPS34-CII. This is the first high-resolution structure of the VPS34-CII that has been reported, and we believe it would be irresponsible not to briefly describe them, since they are unique to VPS34-CII. For this reason, we have placed this section at the end of the results, and we now clarify that we do not see a relevance to RAB5 function, but we describe the arrangement of a region (the BH3) that has been functionally noted in many previous studies, in the absence of a structure.

      Reviewing Editor Comments:

      Please address the following suggestions for minor changes to the manuscript. Use your best scientific judgment in addressing the comments and describe the modifications together with your reasoning in a cover letter. We look forward to seeing the revised version of this very nice study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I found a portion of the description of the cryoEM complexes on the top of page 9 to be redundant with similar descriptions near the top of page 7, and it was not clear to me at first that these were describing the same structures. Part of my confusion was due to the redundancy, including the statement near the bottom of page 7: 'Models were built and refined for all RAB5-associated VPS34-CII assemblies', and then the similar statement on page 9: 'We fit and refined atomic models into both densities'. I believe these are describing the same models? To clarify for the reader, perhaps on page 9, the authors could begin this part with a statement such as "as described above", and eliminate the redundant descriptions.

      The reviewer is correct. Both sections describe the same set of cryo-EM classes from the same sample. The only difference is what we analysed in the two sections: number of RAB5s bound in the first section and the effect of RAB5 binding in the second section. We have revised the text to make this clear, and to make the second section more succinct.

      Reviewer #3 (Recommendations for the authors):

      (1) The authors show nicely that a mutation in Vps15 disrupts binding to Vps21 in vivo, with defects in the endocytic pathway as analyzed by CPY sorting. I am wondering if this mutant is still functional in autophagy. This can be simply tested by sorting of Atg8 to the vacuole lumen using established assays or by following Pho∆60 sorting. This analysis would reveal that the corresponding mutant is specific for the Class II complex. If the authors were to find evidence that this Vps15 mutant also affects autophagy, it would indicate that there is possibly also another Rab1 binding site in Vps15.

      As we stated above, an autophagy effect would be more relevant for VPS34 complex I and RAB1. We have not presented any results for human VPS34-complex I - RAB1 nor yeast Vps34-complex I – Ypt1 (yeast RAB1 orthologue). We are preparing another manuscript focusing entirely on this, and it is not a simple story. While we think this is an important question, we believe that this is beyond the scope of the current manuscript.

      (2) It would be helpful if the authors could clarify whether they believe that Vps34 kinase activity is stimulated by Rab binding or whether this stimulation is a consequence of better membrane localization of Vps34. In other words, is the complex active with soluble PI3P in solution, and does the activity change if Rab5 is added to the complex? This might have been addressed in the past, but I did not see evidence for this, as the authors only addressed the activity of the Vps34 complexes on membranes.

      As in our response to reviewer #3 above, this point was addressed in previous publications and was described in the introduction to our manuscript.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This important study provides compelling evidence that fever-like temperatures enhance the export of Plasmodium falciparum transmembrane proteins, including the cytoadherence protein PfEMP1 and the nutrient channel PSAC, to the red blood cell surface, thereby increasing cytoadhesion. Using rigorous and well-controlled experiments, the authors convincingly demonstrate that this effect results from accelerated protein trafficking rather than changes in protein production or parasite development. These findings significantly advance our understanding of parasite virulence mechanisms and offer insights into how febrile episodes may exacerbate malaria severity.

      We thank all reviewers for their constructive feedback on our manuscript.

      We believe we have addressed all the questions in the rebuttal below in writing, including planned experiments we will perform to strengthen the conclusions of the manuscript.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript from Jones and colleagues investigates a previously described phenomenon in which P. falciparum malaria parasites display increased trafficking of proteins displayed on the surface of infected RBCs, as well as increased cytoadherence in response to febrile temperatures. While this parasite response was previously described, it was not uniformly accepted, and conflicting reports can be found in the literature. This variability likely arises due to differences in the methods employed and the degree of temperature increase to which the parasites were exposed. Here, the authors are very careful to employ a temperature shift that likely reflects what is happening in infected humans and that they demonstrate is not detrimental to parasite viability or replication. In addition, they go on to investigate what steps in protein trafficking are affected by exposure to increased temperature and show that the effect is not specific to PfEMP1 but rather likely affects all transmembrane domain-containing proteins that are trafficked to the RBC. They also detect increased rates of phosphorylation of trafficked proteins, consistent with overall increased protein export.

      Strengths:

      The authors used a relatively mild increase in temperature (39 degrees), which they demonstrate is not detrimental to parasite viability or replication. This enabled them to avoid potential complications of a more severe heat shock that might have affected previously published studies. They employed a clever method of fractionation of RBCs infected with a var2csa-nanoluc fusion protein expressing parasite line to determine which step in the export pathway was likely accelerating in response to increased temperature. This enabled them to determine that export across the PVM is being affected. They also explored changes in phosphorylation of exported proteins and demonstrated that the effect is not limited to PfEMP1 but appears to affect numerous (or potentially all) exported transmembrane domain-containing proteins.

      Weaknesses:

      All the experiments investigating changes resulting from increased temperature were conducted after an increase in temperature from 16 to 24 hours, with sampling or assays conducted at the 24 hr mark. While this provided consistency throughout the study, this is a time point relatively early in the export of proteins to the RBC surface, as shown in Figure 1E. At 24 hrs, only approximately 50% of wildtype parasites are positive for PfEMP1, while at 32 hrs this approaches 80%. Since the authors only checked the effect of heat stress at 24 hrs, it is not possible to determine if the changes they observe reflect an overall increase in protein trafficking or instead a shift to earlier (or an accelerated) trafficking. In other words, if a second time point had been considered (for example, 32 hrs or later), would the parasites grown in the absence of heat stress catch up?

      We did not assess cytoadhesion at later stages, but in the supplementary figures we show that at 40 hours post infection both heat stress and control conditions have comparable proportions of VAR2CSA-positive iRBCs, whilst they differ at 24 h. This is true for the DMSO-treated (wild-type-like control) HA-tagged lines of HSP70x and PF3D7_0702500 (Supplementary Figures 9 and 12, respectively). Given that protein levels appear unchanged, we conclude that trafficking is accelerated at these earlier timepoints but is comparable at later stages. This would still increase the overall bound parasite mass, as parasites start to adhere earlier during or after a heat stress.

      Reviewer #2 (Public review):

      This manuscript describes experiments characterising how malaria parasites respond to physiologically relevant heat-shock conditions. The authors show, quite convincingly, that moderate heat-shock appears to increase cytoadherence, likely by increasing trafficking of surface proteins involved in this process.

      While generally of a high quality and including a lot of data, I have a few small questions and comments, mainly regarding data interpretation.

      (1) The authors use sorbitol lysis as a proxy for trafficking of PSAC components. This is a very roundabout way of doing things and does not, I think, really show what they claim. There could be a myriad of other reasons for this increased activity (indeed, the authors note potential PSAC activation under these conditions). One further reason could be a difference in the membrane stability following heat shock, which may affect sorbitol uptake, or the fragility of the erythrocytes to hypotonic shock. I really suggest that the authors stick to what they show (increased PSAC) without trying to use this as evidence for increased trafficking of a number of non-specified proteins that they cannot follow directly.

      This is a valid point, however, uninfected RBCs do not lyse following heat stress, nor do much younger iRBCs, indicating that the observed effect is specific to infected RBCs at a defined stage. The sorbitol sensitivity assay is performed at 37°C under normal conditions after cells are returned to non–heat stress temperatures, so the effect is not due to transient changes in membrane permeability at elevated temperature.

      Planned experiment: However, to increase the strength of our conclusions and further test our hypothesis, we will perform sorbitol sensitivity assays on >20 hours post infection iRBCs following heat stress in the presence and absence of furosemide, a PSAC inhibitor. If iRBC lysis is abolished with furosemide present, this would confirm that the effect is PSAC-dependent. However, the effect could also possibly be due to altered PSAC activity during heat stress which is maintained at lower temperatures, as outlined in the discussion.

      New Results:

      We performed sorbitol sensitivity assays on >20 hours post-infection iRBCs following heat stress in the presence and absence of the PSAC inhibitor furosemide. These additional experiments were added to the supplementary figures (Supplementary Figure 3). Importantly, sorbitol-mediated lysis of iRBCs, with or without prior heat stress, was reduced when furosemide was present, demonstrating that the observed effect is likely PSAC-dependent. We also observed that uninfected RBCs did not lyse with sorbitol, regardless of heat stress, confirming that the effect is specific to infected cells.

      (2) Supplementary Figure 6C/D: The KAHRP signal does not look like it should. In fact, it doesn't look like anything specific. The HSP70-X signal is also blurry and overexposed. These pictures cannot be used to justify the authors' statements about a lack of colocalisation in any way.

      Planned experiment: We agree that the IFAs are not the best as presented and will include better quality supplementary images in a revised version.

      New Results:

      Immunofluorescence microscopy, including the localisation of the two HA-tagged proteins (PF3D7_1039000 and PF3D7_0702500), has been repeated and higher-quality images are now included in the updated manuscript (Supplementary Figures 9 and 11). These images include co-staining with the P. falciparum proteins KAHRP and SPB1 to assess possible co-localisations. Furthermore, following the reviewer’s suggestion, we have softened the statement regarding PF3D7_1039000-HA to better reflect the data, changing “...does not colocalise” to “...does not strongly colocalise”.

      (3) Figure 6: This experiment confuses me. The authors purport to fractionate proteins using differential lysis, but the proteins they detect are supposed to be transmembrane proteins and thus should always be found associated with the pellet, whether lysis is done using equinatoxin or saponin. Have they discovered a currently unknown trafficking pathway to tell us about? Whilst there is a lot of discussion about the trafficking pathways for TM proteins through the host cell, a number of studies have shown that these proteins are generally found in a membrane-bound state. The authors should elaborate, or choose an experiment that is capable of showing compartment-specific localisation of membrane-bound proteins (protease protection, for example).

      We do not believe we have identified a novel trafficking pathway; rather, we believe we capture trafficking intermediates of PfEMP1 between the PVM and the RBC periphery, in small vesicles and possibly in Maurer’s clefts. These would still be membrane embedded but, because of their small size, would not be pelleted at the centrifugation speeds used in our study (we did not use ultracentrifugation). This explanation, we believe, is in line with the current hypothesis of PfEMP1 and other exported TMD protein trafficking to the periphery or the Maurer’s clefts.

      (4) The red blood cell contains, in addition to HSP70-X, a number of human HSPs (HSP70 and HSP90 are significant in this current case). As the name suggests, these proteins non-specifically shield exposed hydrophobic domains revealed upon partial protein unfolding following thermal insult. I would thus have expected to find significantly more enrichment following heat shock, but this is not the case. Is it possible that the physiological heat shock conditions used in this current study are not high enough to cause a real heat shock?

      As noted by the reviewer, we do not see enrichment of red blood cell heat shock proteins following heat stress, either with FIKK10.2-TurboID or in the phosphoproteome. We used a physiologically relevant heat stress that significantly modifies the iRBC, as shown by our functional assays. While a higher temperature might induce an association of red blood cell heat shock proteins, such conditions may not accurately reflect those most commonly found in the context of malaria infection.

      Reviewer #3 (Public review):

      Summary:

      In this paper, it is established that high fever-like 39 C temperatures cause parasite-infected red blood cells to become stickier. It is thought that high temperatures might help the spleen to destroy parasite-infected cells, and they become stickier in order to remain trapped in blood vessels, so they stop passing through the spleen.

      Strengths:

      The strength of this research is that it shows that fever-like temperatures can cause parasite-infected red blood cells to stick to surfaces designed to mimic the walls of small blood vessels. In a natural infection, this would cause parasite-infected red blood cells to stop circulating through the spleen, where the parasites would be destroyed by the immune system. It is thought that fevers could lead to infected red blood cells becoming stiffer and therefore more easily destroyed in the spleen. Parasites respond to fevers by making their red blood cells stickier, so they stop flowing around the body and into the spleen. The experiments here prove that fever temperatures increase the export of Velcro-like sticky proteins onto the surface of the infected red blood cells and are very thorough and convincing.

      Weaknesses:

      A minor weakness of the paper is that the effects of fever on the stiffness of infected red blood cells were not measured. This can be easily done in the laboratory by measuring how the passage of infected red blood cells through a bed of tiny metal balls is delayed under fever-like temperatures.

      Previous work by Marinkovic et al. (cited in this manuscript) reported that all RBCs, both infected and uninfected, increase in stiffness at 41 °C compared with 37 °C, with trophozoites and schizonts exhibiting a particularly pronounced increase. We agree that it would be interesting to determine whether similar changes occur at physiological fever-like temperatures, and whether this increase in stiffness coincides with the period of elevated protein trafficking. However, here we focused on enhanced protein export using multiple complementary approaches, and have chosen to address rigidity questions in a different study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      As mentioned above, a second time point in many of the assays (for example, 36 hrs or later) would be useful to determine if heat stress simply accelerates trafficking of proteins to the RBC or if instead it results in an overall increase in trafficking.

      As mentioned earlier, we did not assess cytoadhesion at later stages, but in the supplementary figures we show that at 40 hours post infection both heat stress and control conditions have comparable proportions of VAR2CSA-positive iRBCs. This is true for the DMSO-treated (control, resembling wild type) HA-tagged lines of HSP70x and PF3D7_0702500 (Supplementary Figures 9 and 12 respectively). The end level of VAR2CSA is the same in both conditions, but at 24 hours post infection it is higher following heat stress, indicating that trafficking is accelerated.

      In the text, the authors frequently mention changes in the parasites' phenotype in response to heat stress; however, the way it is described is a bit ambiguous and can be confusing. For example, on page 3, they state that "Following heat stress, significantly more iRBCs (57.6% +/-19.4%) cytoadhered.....". From this sentence, it is not initially clear if the end result is cytoadherence of 57.6% of iRBCs or if this refers to an increase of 57.6%. This could be stated explicitly (e.g., "an increase of 57.6% +/- 19.4%") to avoid confusion. Similar descriptions of the results are found throughout the paper.

      We agree this is confusing and altered the text accordingly.

      The authors might consider citing and discussing the paper from Andrade et al (Nat Med, 2020, 26:1929-1940), which describes longer circulation times (less cytoadherence) by parasites in the dry season (asymptomatic patients) than in febrile patients in the wet season (stronger cytoadhesion of younger stages). This would seem to be consistent with the data presented here.

      We are aware of the Andrade study, but chose not to cite it in this context since the reported differences in cytoadhesion appear more consistent with PfEMP1 expression levels, as hypothesized by the authors, than with altered trafficking.

      Reviewer #2 (Recommendations for the authors):

      General comments on the text:

      (1) "Approximately 10% of the proteins encoded by P. falciparum are predicted to be exported beyond the parasite plasma membrane (PPM) into the parasitophorous vacuole lumen (PVL) and subsequently across the parasitophorous vacuole membrane (PVM) into the RBC cytosol."

      To my knowledge, it has not been really demonstrated that all exported proteins take this route (transfer step in the PVL), and how transmembrane proteins transfer from the parasite to the erythrocyte is still poorly understood. I recommend that the authors rephrase this for precision.

      We agree with this reviewer and will change the statement.

      Changes:

      We have clarified these statements to accurately reflect the current understanding of protein export: “Approximately 10% of P. falciparum encoded proteins are predicted to be exported beyond the parasite plasma membrane, with many thought to pass through the parasitophorous vacuole lumen (PVL) and parasitophorous vacuole membrane (PVM) into the RBC cytosol, although the exact routes for transmembrane proteins are not fully understood.”

      (2) "Charnaud et al. 25, but not Cobb et al. 26, found HSP70x to be essential for normal PfEMP1 trafficking, although both studies concluded that HSP70x is dispensable for intraerythrocytic parasite growth at 37 °C."

      The trafficking block in Charnaud is likely due to a delay in parasite development and cannot thus really be directly related to PfEMP1 trafficking.

      Charnaud et al. report: “Microscopy of Giemsa stained IE indicated that ΔHsp70-x appeared similar to CS2 with no obvious abnormalities (Fig 2c). To more accurately quantify changes in maturation through the cell cycle, the DNA content of parasites stained with ethidium bromide was measured by flow cytometry (Fig 2d). This indicated that most parasites had the same DNA content at each timepoint and were maturing at the same rate.”

      Thus, we cannot conclude that the trafficking phenotype reported in the Charnaud study can be attributed to a growth delay. This is also supported by only minor changes in the transcriptome, which would likely be more widely perturbed if there were a significant growth delay. However, we will change the statement “Charnaud et al. found HSP70x to be essential for normal PfEMP1 trafficking” to “…important for PfEMP1 trafficking” to more precisely reflect the data.

      (3) "NanoLuciferase (NanoLuc) fusion proteins and compartment-specific isolation confirmed a greater abundance of PfEMP1 in the RBC cytosol following heat stress."

      Please see my comments about the differentiation between soluble and TM-containing proteins. One would expect that PfEMP1 is membrane-integrated, and thus should not be found in the cytosol (implying a soluble form).

      See our response above.

      (4) "Importantly, heat stress did not accelerate parasite development through the asexual life cycle (Supplementary Figure 1)."

      The authors should constrain this statement to the time frame in which the heat-shock was given. Previous publications have shown a speeded-up development only in younger-stage parasites, which the authors did not study.

      We will re-phrase.

      Changes:

      We have rephrased the sentence to clarify the time window of heat stress: “Importantly, heat stress between 16-24 hours post-invasion did not accelerate parasite development through the asexual life cycle (Supplementary Figure 1).” The supplementary figure title has also been updated to match.

      (5) I recommend that the authors include line numbers. This makes the reviewers' lives much easier.

      We agree and apologize for this oversight.

      We have now added line numbers.

      Reviewer #3 (Recommendations for the authors):

      (1) All the experiments have been performed to a very high standard, and I have no major questions about the results. However, the paper would go up to the next level if the effect of fever temperatures on the stiffness of the iRBCs had been investigated by measuring the passage of iRBCs through an artificial spleen where a bed of metal spheres mimics interendothelial splenic slits.

      See our comment from above.

      (2) With respect to Figures 5E, 6C, and 6E, why was there not a decrease in bioluminescence levels at 39 °C for Sap and NP40 to match the increase in EqtII?

      The assay is not performed as a sequence of permeabilisation steps. Instead, samples are split into three parallel treatments: one with EqtII, one with Saponin, and one with NP40. The protein measured in each case reflects the total released under that specific condition rather than being cumulative. Therefore, the NP40 fraction includes proteins from the Saponin-accessible compartment, the EqtII-accessible compartment, and the parasite cytosol.
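
      The difference between parallel and sequential treatment can be illustrated with a minimal sketch. All numbers are hypothetical, and the mapping of each detergent to the compartments it opens (EqtII: RBC cytosol; Saponin: RBC cytosol plus parasitophorous vacuole; NP40: everything) is an assumption taken from the description above, not code or data from the study.

```python
# Hypothetical illustration of parallel vs. sequential permeabilisation.
# Compartment signals are invented arbitrary units, not data from the study.

def parallel_release(rbc_cytosol, pv, parasite):
    """Signal released when three separate aliquots are each treated once."""
    return {
        "EqtII": rbc_cytosol,                  # assumed: RBC cytosol only
        "Saponin": rbc_cytosol + pv,           # assumed: RBC cytosol + vacuole
        "NP40": rbc_cytosol + pv + parasite,   # assumed: all compartments
    }

def sequential_release(rbc_cytosol, pv, parasite):
    """Signal per step if one sample were treated with each agent in turn."""
    return {"EqtII": rbc_cytosol, "Saponin": pv, "NP40": parasite}

# Same total protein (100 units); heat stress shifts 15 units from the
# parasite to the RBC cytosol in this toy scenario.
control = dict(rbc_cytosol=10, pv=20, parasite=70)
heat = dict(rbc_cytosol=25, pv=20, parasite=55)

par_control, par_heat = parallel_release(**control), parallel_release(**heat)
seq_control, seq_heat = sequential_release(**control), sequential_release(**heat)

for name in ("EqtII", "Saponin", "NP40"):
    print(name, par_control[name], "->", par_heat[name],
          "| sequential:", seq_control[name], "->", seq_heat[name])
```

      In this toy case no parallel fraction decreases, whereas the last sequential fraction would, which is the distinction the response draws.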

      (3) In the Supplementary gene maps, I could not read the white text on the black gene boxes.

      We apologize: these have not converted well and will be altered in the revised version.

      Changes

      We have significantly increased the size of all fonts within the gene maps and improved the resolution of the figures to improve readability.

      (4) In Figure S6, why does HSP70-x look different between parts C and D IFAs, with the latter showing much more export?

      We agree these IFAs are not optimal and we will provide better images.

      New Results:

      Immunofluorescence microscopy, including the localisation of the two HA-tagged proteins (PF3D7_1039000 and PF3D7_0702500), has been repeated and higher-quality images are now included in the updated manuscript (Supplementary Figures 9 and 11). These figures now include multiple images of HA-tagged staining to more accurately represent the observed localisation and export patterns.

      (5) Would the authors care to comment on what kinase might be additionally phosphorylating at 39 °C?

      We presume these are Maurer’s clefts FIKK kinases as most of the hyperphosphorylated proteins are MC residents. However, without directly testing for this using conditional KO parasite lines, we cannot exclude that host kinases are also playing a role.

      (6) Could the additional assembly of PSAC at the iRBC membrane be important for survival at 39 °C?

      We have tested to see if nutrient uptake helps parasite survival during heat stress in the presence of furosemide and lower nutrient concentrations, but did not see a difference in growth following heat stress compared to control temperature conditions.

      New Results:

      We have added a new supplementary figure (Supplementary Figure 4) detailing experiments testing parasite growth under altered nutrient availability using two approaches (sub-lethal furosemide concentrations or reduced-nutrient RPMI) and with or without a 40°C heat stress applied between 16-24 hpi.

      The main text now references this data: “Culturing parasites in sub-lethal furosemide concentrations or in reduced nutrient media lead to reduced parasitaemia (Supplementary Figure 4). However, the parasitaemia is not further reduced following heat stress. This shows that increased PSAC levels/activity do not enhance parasite survival under conditions of limited nutrient availability either from furosemide-induced nutrient deprivation or a reduced nutrient media composition.”

      These experiments show that nutrient uptake does not improve parasite survival during heat stress compared to control temperature conditions.

      (7) Would the authors like to speculate on how higher temperatures increase the transport of exported proteins with TMDs?

      There are many possible explanations, one of which is that unfolding of the hydrophobic TMDs is favoured at elevated temperatures. However, we have no data to support this hypothesis and therefore refrained from explicitly stating this possibility.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      Quantitative interactome mapping of skeletal muscle insulin resistance

      Ng et al. present a series of proteomics/interactomics studies in skeletal muscle to identify insulin-regulated complexes/interactions and changes to these in insulin-resistant muscle. More mechanistically, the Authors focus on changes in interactions involving chaperones in the ER/SR, presenting interesting data on how PDIA6 overexpression alters insulin sensitivity in muscle ex vivo.

      Major Comments:

      Regarding the section entitled "Validating the regulation of PPIs with insulin resistance in C2C12 myotubes with quantitative XL-MS": this is not really a validation of the previous data as presented, but more an orthogonal assay that helped pinpoint the interest in the ER. Suggest adjusting the title.

      Figure 3B - the "decrease" in AS160 pS588 regulation appears to be due to increased basal phosphorylation, not decreased phosphorylation after insulin. This should be commented on or clarified.

      PDIA6 is down-regulated in muscle from people with T2D - so why did the authors decide to overexpress PDIA6? I note this rationale is explained in the discussion, and could be articulated better in the results.

      Figure 5J and K. The TA muscles are substantially larger from PDIA6 OE mice. Are the muscle fibres also larger? This relates to the normalisation of data in K, which appears to be normalised to g tissue. If so, is the difference between control and OE mice being driven by the increase in muscle mass, with uptake per muscle or per fibre the same?

      Minor Comments:

      For the PCP-MS data from C2C12 cells: the Authors use an analysis of AUC to assess protein abundance, which, as they state, is important for chronic treatments if total protein is not separately quantified. However, the analysis of changes in protein distribution is less clear from the text in the results section. Intuitively, a profile that is normalised to total intensity in all fractions would provide a protein abundance-independent read-out for changes in protein distribution. Does the "local analysis" capture this same information? Could the Authors provide a little more information here?
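
      As a rough illustration of the normalisation suggested above (a hypothetical sketch with invented fraction intensities, not the Authors' "local analysis"), dividing each fraction by the summed intensity removes abundance differences, so only a change in elution position registers as a shift. The function names and the half-L1 distance used here are assumptions chosen for simplicity.

```python
# Hypothetical sketch: abundance-independent comparison of elution profiles.
# Intensities are invented; the distance metric is an illustrative choice.

def normalise_profile(intensities):
    """Scale a fractionation profile so its fractions sum to 1."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("profile has no signal to normalise")
    return [x / total for x in intensities]

def distribution_shift(profile_a, profile_b):
    """Half the L1 distance between normalised profiles (0 = same shape, 1 = no overlap)."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same fractions")
    a = normalise_profile(profile_a)
    b = normalise_profile(profile_b)
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))

basal = [0, 10, 40, 10, 0]      # reference profile
doubled = [0, 20, 80, 20, 0]    # twice the abundance, same distribution
shifted = [0, 0, 10, 40, 10]    # same abundance, eluting one fraction later

print(distribution_shift(basal, doubled))  # abundance change alone
print(distribution_shift(basal, shifted))  # change in distribution
```

      Here the doubled profile gives no shift while the displaced profile does, which is the abundance-independent behaviour the comment asks about.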

      Figure 1M - are the Authors sure that VPS41 should be in this panel? It doesn't seem to be insulin regulated, and the arrow appears to refer to movement between insulin sensitive and insulin resistant.

      Figure 1N - "This includes an array of TBC1 domain-containing proteins (TBC1D15, TBC1D17, TBC1D8B) that are consistently reduced with IR". Do the Authors mean the abundance was less, or that complex formation was reduced?

      Optional. In general, there is a lot of text discussing the literature around proteins highlighted in the analysis. This is useful to an extent, but the Authors might consider streamlining this a little (perhaps moving some of the information to supp tables?).

      Why do the Authors think the crosslinking MS was not able to capture acute PPI changes like the PCP-MS was?

      For the EDL crosslinking data. Are the Authors able to provide a comparison with C2C12 data - to highlight the differences and similarities between tissue and the cell model? This may be a challenge if the authors think most differences may be technical.

      Please check - "reduces free-glycerol levels essential for fatty acid synthesis". Glycerol does not directly contribute to FA synthesis, but it is needed for triglyceride synthesis.

      Do the Authors think that the change in PDIA6 interactions may be a general/indirect indication of changes in ER redox and/or protein misfolding in insulin resistance?

      Is PDIA6 an ER luminal protein? If so, it being phosphorylated is interesting.

      Referees cross-commenting

      Similarly, reviewer #1 raises important points on the description of key parts of the analysis that will need to be addressed. I think we agree that the manuscript encompasses a great deal of data, and that it is somewhat difficult to follow why PDIA6 was selected for validation. Overall, the reviews pick up on different aspects of the manuscript that could be improved.

      Significance

      Overall, the strength of the paper is in the underlying proteomics workflows and analysis. The work presented is of very high technical quality, and I have no doubt the data presented will be of use to the field beyond the analysis in this current publication.

      However, a weakness is doubts over the relevance of the data on PDIA6 overexpression in muscle insulin resistance.

      This will be of interest to those in the proteomics, interactomics and metabolism fields.

      My expertise is in glucose metabolism, insulin signalling and insulin resistance.

    1. [[Aria Khodaverdi p]] on [[Martijn Aslander p]] lls. Compares it to [[Doug Engelbart Demo]] and [[Vannevar Bush As We May Think 20210304173014]] but light on examples that trigger his fascination

    1. Temporary characters must cease operation as soon as practicable and cannot be transferred to another person.

      What is the reasoning behind this? Example - with Amity changes, we have created a part-time Ambassador character that is written by one of our writers in our group, under my overall direction. Sometimes I may write for this character too. This helps the campaign region and overall direction for IC story lines. I think the wording on (d) could be improved, and I'd like to see this provision relaxed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The study by Lotonin et al. investigates correlates of protection against African swine fever virus (ASFV) infection. The study is based on a comprehensive work, including the measurement of immune parameters using complementary methodologies. An important aspect of the work is the temporal analysis of the immune events, allowing for the capture of the dynamics of the immune responses induced after infection. Also, the work compares responses induced in farm and SPF pigs, with the latter showing an enhanced capacity to induce protective immunity. Overall, the results obtained are interesting and relevant for the field. The findings described in the study further validate work from previous studies (critical role of virus-specific T cell responses) and provide new evidence on the importance of a balanced innate immune response during the immunization process. This information increases our knowledge on basic ASF immunology, one of the important gaps in ASF research that needs to be addressed for a more rational design of effective vaccines. Further studies will be required to corroborate that the results obtained based on the immunization of pigs by a not completely attenuated virus strain are also valid in other models, such as immunization using live attenuated vaccines.

      While overall the conclusions of the work are well supported by the results, I consider that the following issues should be addressed to improve the interpretation of the results:

      We thank Reviewer #1 for their thoughtful and constructive feedback, which significantly contributed to improving the clarity and quality of our manuscript. Below, we respond to each of the reviewer’s comments and describe the revisions that were incorporated.

      (1) An important issue in the study is the characterization of the infection outcome observed upon Estonia 2014 inoculation. Infected pigs show a long period of viremia, which is not linked to clinical signs. Indeed, animals are recovered by 20 days post-infection (dpi), but virus levels in blood remain high until 141 dpi. This is uncommon for ASF acute infections and rather indicates a potential induction of a chronic infection. Have the authors analysed this possibility deeply? Are there lesions indicative of chronic ASF in infected pigs at 17 dpi (when they have sacrificed some animals) or, more importantly, at later time points? Does the virus persist in some tissues at late time points, once clinical signs are not observed? Has all this been tested in previous studies?

      Tissue samples were tested for viral loads only at 17 dpi during the immunization phase, and long-term persistence of the virus in tissues has not been assessed in our previous studies. At 17 dpi, lesions were most prominently observed in the lymph nodes of both farm and SPF pigs. In a previous study using the Estonia 2014 strain (doi: 10.1371/journal.ppat.1010522), organs were analyzed at 28 dpi, and no pathological signs were detected. This finding calls into question the likelihood of chronic infection being induced by this strain.

      (2) Virus loads post-Estonia infection significantly differ from whole blood and serum (Figure 1C), while they are very similar in the same samples post-challenge. Have the authors validated these results using methods to quantify infectious particles, such as Hemadsorption or Immunoperoxidase assays? This is important, since it would determine the duration of virus replication post-Estonia inoculation, which is a very relevant parameter of the model.

      We did not perform virus titration but instead used qPCR as a sensitive and standardized method to assess viral genome loads. Although qPCR does not distinguish between infectious and non-infectious virus, it provides a reliable proxy for relative viral replication and clearance dynamics in this model. Unfortunately, no sample material remains from this experiment, but we agree that subsequent studies employing infectious virus quantification would be valuable for further refining our understanding of viral persistence and replication following Estonia 2014 infection.

      (3) Related to the previous points, do the authors expect the induction of immunosuppressive mechanisms during such a prolonged virus persistence, as described in humans and mouse models? Have the authors analysed the presence of immunosuppressive mechanisms during the virus persistence phase (IL10, myeloid-derived suppressor cells)? Have the authors used T cell exhaustion markers to immunophenotype ASFV Estonia-induced T cells?

      We agree with the reviewer that the lack of long-term protection can be linked to immunosuppressive mechanisms, as demonstrated for genotype I strains (doi: 10.1128/JVI.00350-20). The proposed markers were not analyzed in this study but represent important targets for future investigation. We addressed this point in the discussion.

      (4) A broader analysis of inflammatory mediators during the persistence phase would also be very informative. Is the presence of high VLs at late time points linked to a systemic inflammatory response? For instance, levels of IFNa are still higher at 11 dpi than at baseline, but they are not analysed at later time points.

      While IFN-α levels remain elevated at 11 dpi, this response is typically transient in ASFV infection and likely not linked to persistent viremia. We agree that analyzing additional inflammatory markers at later time points would be valuable, and future studies should be designed to further understand viral persistence.

      (5) The authors observed a correlation between IL1b in serum before challenge and protection. The authors also nicely discuss the potential role of this cytokine in promoting memory CD4 T cell functionality, as demonstrated in mice previously. However, the cells producing IL1b before ASFV challenge are not identified. Might it be linked to virus persistence in some organs? This important issue should be discussed in the manuscript.

      We agree that identifying the cellular source of IL-1β prior to challenge is important, and this should be addressed in subsequent studies. We included a discussion on the potential link between elevated IL-1β levels and virus persistence in certain organs.

      (6) The lack of non-immunized controls during the challenge makes the interpretation of the results difficult. Has this challenge dose been previously tested in pigs of the age to demonstrate its 100% lethality? Can the low percentage of protected farm pigs be due to a modulation of memory T and B cell development by the persistence of the virus, or might it be related to the duration of the immunity, which in this model is tested at a very late time point? Related to this, how has the challenge day been selected? Have the authors analysed ASFV Estonia-induced immune responses over time to select it?

      In our previous study, intramuscular infection with ~3–6 × 10² TCID₅₀/mL led to 100% lethality (doi: 10.1371/journal.ppat.1010522), which is notably lower than the dose used in the present study, although the route here was oronasal. The modulation of memory responses could be more thoroughly assessed in future studies using exhaustion markers. The challenge time point was selected based on the clearance of the virus from blood and serum. We agree that the lack of protection in some animals is puzzling and warrants further investigation, particularly to assess the role of immune duration, potential T cell exhaustion caused by viral persistence, or other immunological factors that may influence protection. Based on our experience, vaccine virus persistence alone does not sufficiently explain the lack-of-protection phenomenon. We incorporated these important aspects into the revised discussion.

      (7) Also, non-immunized controls at 0 dpc would help in the interpretation of the results from Figure 2C. Do the authors consider that the pig's age might influence the immune status (cytokine levels) at the time of challenge and thus the infection outcome?

      We support the view that including non-immunized controls at 0 dpc would strengthen the interpretation of cytokine dynamics and will consider this in future experimental designs. Regarding age, while all animals were within a similar age range at the time of challenge, we acknowledge that age-related differences in immune status could influence baseline cytokine levels and infection outcomes, and this is an important factor to consider.

      (8) Besides anti-CD2v antibodies, anti-C-type lectin antibodies can also inhibit hemadsorption (DOI: 10.1099/jgv.0.000024). Please correct the corresponding text in the results and discussion sections related to humoral responses as correlates of protection. Also, a more extended discussion on the controversial role of neutralizing antibodies (which have not been analysed in this study), or other functional mechanisms such as ADCC against ASFV would improve the discussion.

      The relevant text in the Results and Discussion sections was revised accordingly, and the discussion was extended to more thoroughly address the roles of antibodies.

      Reviewer #2 (Public review):

      Summary:

      In the current study, the authors attempt to identify correlates of protection for improved outcomes following re-challenge with ASFV. An advantage is the study design, which compares the responses to a vaccine-like mild challenge and during a virulent challenge months later. It is a fairly thorough description of the immune status of animals in terms of T cell responses, antibody responses, cytokines, and transcriptional responses, and the methods appear largely standard. The comparison between SPF and farm animals is interesting and probably useful for the field in that it suggests that SPF conditions might not fully recapitulate immune protection in the real world. I thought some of the conclusions were over-stated, and there are several locations where the data could be presented more clearly.

      Strengths:

      The study is fairly comprehensive in the depth of immune read-outs interrogated. The potential pathways are systematically explored. Comparison of farm animals and SPF animals gives insights into how baseline immune function can differ based on hygiene, which would also likely inform interpretation of vaccination studies going forward.

      Weaknesses:

      Some of the conclusions are over-interpreted and should be more robustly shown or toned down. There are also some issues with data presentation that need to be resolved and data that aren't provided that should be, like flow cytometry plots.

      We appreciate the feedback from Reviewer #2 and acknowledge the concerns raised regarding data presentation. In the revised manuscript, we clarified our conclusions where needed and ensured that interpretations were better aligned with the data shown.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In the Introduction, more details on the experimental model would be appreciated. A short summary of findings obtained with this model in previous works from the authors would help to better understand the context of the study.

      Basic information on the model was added in the Introduction section of the revised manuscript.

      (2) In Figure 1, the addition of more time points on the x-axes would help the interpretation of the figures.

      We agree and have added extra time points to the x-axes.

      (3) To better understand the results in Figure 2A, a figure showing cytokine levels post-Estonia infection of only challenged pigs would help, indicating protected and non-protected animals as in Figure 2C. This figure would be better linked to the corresponding dot plot (Figure 2B).

      Our statistical analyses in Figure 2A are based on using both challenged and non-challenged pigs to assess differences between SPF and farm pigs. We prefer not to remove the non-challenged pigs in order to avoid losing statistical power. Moreover, even when non-challenged and challenged pigs are displayed in the plots, upregulation of IFN-α and IL-8 can be visualized and remains consistent with the positive and negative correlates of protection shown in Figure 2C.

      (4) Dark red colour associated with SPF non-protected is difficult to differentiate from light red in some figures.

      We thank the reviewer for this remark. To preserve the color scheme across the paper, we changed the circle data points to squares for the non-protected SPF pig in the most crowded figures: Figures 1–3 and Supplementary Figures 2 and 8.

      (5) In Supplementary figures 12-16, grouping of the animal numbers (SPF vs farm) would facilitate the interpretation of the results.

      Information on the animal numbers for each group (SPF vs. farm) has been added to the figure captions.

      (6) Are the results shown in Figure 8 based on absolute scores as mentioned? Results from 0 dpc are not shown. Is that correct?

      That is correct. BTM expression values are absolute and could not be normalized, as RNA was not isolated either immediately before the challenge or on day 0 post-challenge. This information is now clarified in the figure captions.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors use the words "predicted" and "predicts" although they haven't used any methods to show that this is true, such as a multivariate analysis. I don't think correlation coefficients are sufficient to indicate prediction. This needs to be fixed.

      We agree with this and have made changes in the text to avoid this impression.

      (2) "Lower baseline immune activation was linked to increased protective immunity." Presumably, the authors mean prior to challenge, not prior to "vaccination"?

      In this sentence written in the Abstract, we refer to baseline immune activation in the steady state, i.e., prior to any infection, as demonstrated in a previous study by Radulovic et al. (2022). The sentence was adapted accordingly. This concept is further explored in the Discussion section.

      (3) The abstract mentioned the comparison between farm and SPF pigs, but didn't provide any context for those findings. It could be added here.

      In the new version, we have added information on this model in the Introduction section.

      (4) Figure legends need N to be indicated. For example, the viral load figures don't appear to be representative of all 9 or 5 animals. Is there a reason why not all were challenged, and how were those 5 challenged selected?

      Numbers of animals in each group were added to the figure captions. We have also provided details regarding the animals sacrificed at different time points of the experiment in the ‘Animal experiment’ section of the Methods.

      (5) 1A doesn't have a legend to indicate whether dark or light color indicates sampling.

      Fair point. We have added the information to the figure.

      (6) For Figure 3C, it's not clear how the correlation is presented. The legend indicates in writing that the color indicates the outcome it correlates with, but the legend suggests that it is r.

      The method of presenting correlation data is consistent across all figures, including Figure 3C. The color reflects the direction and strength of the correlation, corresponding to the r coefficient obtained from correlating immunological parameters with clinical scores. We have clarified this description in the figure caption to improve readability.

      (7) For some of the correlation data in 2D and 3C, it would be nice to provide the plots in the supplemental. Also, are there enough data points for a robust interpretation of correlation curves?

      We agree that providing the plots will improve clarity and have included them in the supplementary material. While we acknowledge that the number of data points is modest, we believe it is sufficient to support a robust interpretation of the correlation curves. Corresponding p-value cutoffs are noted in the figure captions.

      (8) The figure 2C method of indicating significance is confusing. There must be a clearer way to present this figure.

      Analyzing statistical significance for the dataset shown in Figure 2C is challenging due to the small number of animals. We carefully considered alternative ways of presenting statistical significance; however, given the limited group sizes, we believe that the current approach provides the most transparent and informative representation of the data.

      For clarity, we divided the animals into SPF and farm groups, as well as into protected (4 SPF, 2 farm pigs) and non-protected (1 SPF, 3 farm pigs) categories, and performed both group-based (unpaired t-test) and time-based (mixed-effects analysis) comparisons. All significant differences were added to the plots so that readers could directly visualize the observed trends and compare them with the correlation analysis presented in Figure 2D.

      (9) Please note that "viremia" means the presence of a virus specifically in the blood. Other descriptions of viral load should be used if this was not measured.

      We have clarified this in the text. When referring to organs, we use the term “viral loads.”

      (10) The way of putting a square around boxes that are significant can be misleading when a box is surrounded by other significant comparisons. Like for Figure 6B - probably all of these are really significant, but I can't tell for sure.

      Good point. We changed rectangles to circles for better readability of the figures.

      (11) There is a potential argument that these correlates of protection might only be valid for this specific vaccine. It should be noted that comparisons of multiple vaccines would be needed before assuming the correlates are broadly relevant.

      We agree with this statement and address it in the Discussion section.

      (12) For the circled pathways in Figure 9, it is not clear from the diagram if there is a directionality to the involvement of those pathways. Modulated or induced?

      When discussing pathways identified by transcriptome analysis, we are always referring to their induction, as this is based on the normalized enrichment score (NES). We have now specified this in the figure caption.

      (13) The authors speculate about NK cells, but this is based on transcriptional pathways identified and the literature. Is there any indication from the flow cytometry data whether activated NK cells versus NKT cells are associated with protection? Also, the memory phenotype of those cells?

      Regarding NK cells, the BTM analysis was corroborated by the flow cytometry data shown in Supplementary Figure 8. NK cells were defined as CD3<sup>-</sup>CD8α<sup>+</sup>. Specific markers to distinguish NKT cells or to assess memory phenotypes were not included in our panel.

      (14) In the discussion, "Our study demonstrates that T cell activation represents a robust correlate of protection against ASFV" doesn't indicate whether they mean after vaccination or after challenge. Re-using the same time points throughout the manuscript compounds this confusion.

      In this case, we mean that T cell activation upon immunization/vaccination and challenge correlates with protection. This information has been added to the sentence. Although some time points overlap between the immunization and challenge phases, we consistently use “dpi” and “dpc” to clearly distinguish them.

      (15) Flow cytometry gating strategies should be provided in the supplemental, particularly since this species is less frequently studied using flow cytometry; it would be helpful to understand gating and expression levels of key markers.

      We have provided the gating strategy in Supplementary Figure 7, which is also referenced in the “Flow cytometry and hematology analysis” section of the Methods.

      (16) Some of the discussion is a bit long and repetitive - e.g. the parts on antibodies and the last paragraph with multiple other parts of the discussion and manuscript.

      While we agree that some sections are extensive, we think that this level of detail is necessary to integrate the different datasets and to place our findings in the context of previous literature.

    1. Author response:

      eLife Assessment

      This study uses a Bayesian framework to characterize latent brain state dynamics associated with memory encoding and performance in children, as measured with functional magnetic resonance imaging. The novelty of the approach offers valuable insights into memory-related brain activity, but the consideration of developmental changes in memory and brain dynamics, and the evidence to support the proposed mapping between specific states and distinct aspects of memory, are incomplete. This work will be of interest to researchers interested in cognitive neuroscience and the development of memory.

      We are grateful to the editor and reviewers for their positive feedback and constructive evaluation. Their comments have identified important areas where the manuscript can be strengthened. Below, we outline our planned revisions.

      Reviewer #1 (Public review):

      Zeng et al. characterized the dynamic brain states that emerged during episodic encoding and the reactivation of these states during the offline rest period in children aged 8-13. In the study, participants encoded scene images during fMRI and later performed a memory recognition test. The authors adopted the BSDS approach and identified four states during encoding, including an "active-encoding" state. The occupancy rate of, and the state transition rates towards, this active-encoding state positively predicted memory accuracy across participants. The authors then decoded the brain states during pre- and post-encoding rests with the model trained on the encoding data to examine state reactivation. They found that the state temporal profile and transition structure shifted from encoding to post-encoding rest. They also showed that the mean lifetime and stability (measured with self-transition probability) of the "default-mode" state during post-encoding rest predict memory performance. How brain dynamics during encoding and offline rest support long-term memory remains understudied, particularly in children. Thus, this study addresses an important question in the field. The authors implemented an advanced computational framework to identify latent brain states during encoding and carefully characterized their spatiotemporal features. The study also showed evidence for the behavioral relevance of these states, providing valuable insights into the link between state dynamics and successful encoding and consolidation.

      We thank Reviewer #1 for the positive and constructive feedback on our study. We plan to incorporate detailed methodological justifications and a thorough analysis of limitations. We also plan to improve the overall logical coherence of the manuscript to ensure a more robust and scientifically sound presentation.

      Weaknesses:

      (1) If applicable, please provide information on the decoding performance of states during pre- and post-encoding rests. The Methods noted that the authors applied a threshold of 0.1 z-scored likelihood, and based on Figure S2, it seems like most TRs were assigned a reinstated state during post-encoding rest. It would be useful to know, for the decodable TRs, how strong the evidence was in favor of one state over others. Further, was decoding performance better during post- vs. pre-encoding rest? This is critical for establishing that these states were indeed "reinstated" during rest. The authors showed individual-specific correlations between encoding and post-encoding state distribution, which is an important validation of the method, but this result alone is not sufficient to suggest that the states during encoding were the ones that occurred during rest. The authors found that the state dynamics vary substantially between encoding and rest, and it would be helpful to clarify whether these differences might be related to decoding performance. I am also curious whether, if the authors apply the BSDS approach to independently identify brain states during rest periods (instead of using the trained model from encoding), they find similar states during rest as those that emerged during encoding.

      We plan three additional analyses to strengthen the evidence for state reinstatement during rest. First, we will report quantitative decoding confidence metrics for each decoded time point, including the log-likelihood difference between the winning state and the next-best state. We will compare these distributions between pre- and post-encoding rest to test whether decoding quality differs between conditions, as the reviewer suggests. Second, we will provide a more detailed characterization of the decoding process, including the proportion of TRs that survive the log-likelihood threshold of 0.1 during pre- vs. post-encoding rest and whether this proportion relates to memory performance. Third, we will train an independent BSDS model directly on the rest data (rather than using the encoding-trained model) and assess the degree of correspondence between the independently discovered rest states and the encoding states in terms of amplitude profiles and covariance structures. Convergence between the two approaches would provide strong validation that the encoding-defined states genuinely re-emerge at rest. Together with the evidence from our previous analyses, these additional analyses will strengthen our claims.
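
      The first planned metric can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `decoding_margins` and the example values are hypothetical, and the input is assumed to be one row of per-state log-likelihoods for each TR.

```python
def decoding_margins(loglik_per_tr):
    """For each TR, return (winning_state_index, margin).

    The margin is the log-likelihood of the winning state minus that of the
    next-best state; larger margins mean stronger evidence for one state.
    """
    results = []
    for row in loglik_per_tr:
        # Rank state indices from highest to lowest log-likelihood.
        ranked = sorted(range(len(row)), key=lambda k: row[k], reverse=True)
        best, runner_up = ranked[0], ranked[1]
        results.append((best, row[best] - row[runner_up]))
    return results


# Hypothetical log-likelihoods for 3 TRs x 4 states (illustrative values only).
example = [
    [-2.0, -5.0, -1.0, -6.0],  # state index 2 wins clearly
    [-3.0, -3.1, -4.0, -7.0],  # state index 0 wins narrowly
    [-9.0, -2.5, -8.0, -2.0],  # state index 3 wins over state index 1
]
margins = decoding_margins(example)
```

      Comparing the distribution of such margins between pre- and post-encoding rest is one way to ask whether decoding quality differs between the two conditions.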

      (2) During post-encoding rest, the intermediate activation state (S1) became the dominant state. Overall, the paper did not focus too much on this state. For example, when examining the relationship between state transitions and memory performance, the authors also did not include this state as a part of the analyses presented in the paper (lines 203-211). Could the authors report more information about this state and/or discuss how this state might be relevant to memory formation and consolidation?

      We thank the reviewer for this suggestion. During encoding, S1 had the lowest occupancy (~10%) and showed no significant relationship with memory performance, which led us to interpret it as a non-essential transient configuration. In the revision, we will provide a more thorough characterization of S1, and conduct correlation analyses to probe whether its dynamic properties during post-encoding rest correlate with individual memory performance.

      (3) Two outcome measures from the BSDS model were the occupancy rate and the mean lifetime. The authors found a significant association with behavior and occupancy rate in some analyses, and mean lifetime in others. The paper would benefit from a stronger theoretical framing explaining how and why these two different measures provide distinct information about the brain dynamics, which will help clarify the interpretation of results when association with behavior was specific to one measure.

      We thank the reviewer for this suggestion. Occupancy rate and mean lifetime, while related, capture fundamentally different aspects of brain state dynamics. Occupancy rate reflects the total proportion of time the brain spends in a given state, capturing the overall prevalence of that configuration across the scanning session. Mean lifetime, by contrast, measures the average uninterrupted duration of each state visit, indexing the temporal stability or persistence of a given network configuration once it is entered. Critically, two states could have identical occupancy rates but very different mean lifetimes, a state visited frequently but briefly versus one visited rarely but sustained, implying distinct underlying neural dynamics. In the context of memory, high occupancy of the active-encoding state may reflect repeated engagement of encoding-optimal circuits, while long mean lifetime of the default-mode state during rest may reflect sustained consolidation-related processing. We will expand the theoretical framework in the revised manuscript to articulate these distinctions and connect them to extant findings suggesting that temporal stability versus frequency of state visits may have dissociable behavioral correlates in working memory and episodic memory (He et al., 2023; Stevner et al., 2019).
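
      The distinction can be made concrete with a minimal Python sketch. The function names and the two toy state sequences are hypothetical and are not taken from the manuscript; each sequence is assumed to hold one state label per TR.

```python
from itertools import groupby


def occupancy_rate(states, state):
    """Fraction of time points (TRs) assigned to `state`."""
    return sum(s == state for s in states) / len(states)


def mean_lifetime(states, state):
    """Average length, in TRs, of uninterrupted runs of `state`."""
    runs = [len(list(group)) for label, group in groupby(states) if label == state]
    return sum(runs) / len(runs) if runs else 0.0


# Two hypothetical 12-TR sequences in which state "A" fills half of the TRs.
frequent_brief = ["A", "B"] * 6           # "A" visited often, one TR at a time
rare_sustained = ["A"] * 6 + ["B"] * 6    # "A" visited once, for six TRs

occ_brief = occupancy_rate(frequent_brief, "A")
occ_sustained = occupancy_rate(rare_sustained, "A")
life_brief = mean_lifetime(frequent_brief, "A")
life_sustained = mean_lifetime(rare_sustained, "A")
```

      Both sequences give the same occupancy rate for "A", but the mean lifetime is one TR in the first and six TRs in the second, which is the dissociation described above.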

      (4) For performance on a memory recognition test, d' is a more common metric in the literature as it isolates the memory signal for the old items from response bias. According to Methods (line 451), the authors have computed a different metric as their primary behavioral measure (hits + correct rejections - misses - false alarms). Please provide a rationale for choosing this measure instead. Have the authors considered computing d' as well and examining brain-behavior relationships using d'?

      Our primary memory recognition metric, computed as (hits + correct rejections − misses − false alarms) / total trials, provides an unbiased linear estimate of discrimination ability that is mathematically consistent with d' in directional effects. We selected this measure because it is particularly robust with limited trial counts per condition (Verde et al., 2006; Wickens, 2001). Nonetheless, we agree that reporting d' is important for comparability with the broader literature. In the revision, we will compute d' for each participant and conduct parallel brain–behavior correlation analyses to demonstrate that our findings are robust across both metrics.
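
      For readers comparing the two measures, the following minimal Python sketch computes both from raw trial counts. It is an illustration, not the authors' analysis code: the function names are hypothetical, and the log-linear correction used to keep d' finite when a rate is 0 or 1 is an assumption, since the response does not say which correction (if any) will be applied. With equal numbers of old and new items, the linear score reduces to hit rate minus false-alarm rate.

```python
from statistics import NormalDist


def net_accuracy(hits, misses, false_alarms, correct_rejections):
    """(hits + correct rejections - misses - false alarms) / total trials."""
    total = hits + misses + false_alarms + correct_rejections
    return (hits + correct_rejections - misses - false_alarms) / total


def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate).

    Assumption: a log-linear correction (add 0.5 to each count and 1 to each
    total) keeps the z-transform finite when a raw rate would be 0 or 1.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical participant: 100 old and 100 new items.
score = net_accuracy(hits=80, misses=20, false_alarms=20, correct_rejections=80)
sensitivity = d_prime(hits=80, misses=20, false_alarms=20, correct_rejections=80)
```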

      (5) While this study examined brain state dynamics in children, there was no adult sample to compare with. Therefore, it is hard to conclude whether the findings are specific to children (or developing brains). It would be helpful to discuss this point in the paper.

      We thank the reviewer for raising this point. While several studies have documented memory-related replay and reinstatement in adults at both the regional and systems levels (Tambini et al., 2017; Wimmer et al., 2020), few have examined whether analogous state-level reinstatement occurs in children. Our study was motivated by this gap: we sought to test whether children show dynamic brain state reinstatement mechanisms similar to those described in adults. However, we acknowledge that without a direct adult comparison, we cannot determine whether the observed patterns are unique to children or reflect general principles of episodic memory organization. In the revised manuscript, we will: (a) frame the study more carefully as examining whether established state-level consolidation mechanisms also operate during childhood, (b) discuss findings in relation to adult studies, and (c) include exploratory analyses of age-related variability in both memory performance and BSDS dynamics within our sample, while acknowledging that the narrow age range (8–13) and small sample size limit the power of such developmental analyses. We will clearly identify the absence of an adult comparison as a limitation.

      Reviewer #2 (Public review):

      This paper investigates the latent dynamic brain states that emerge during memory encoding and predict later memory performance in children (N = 24, ages: 8-13 years). A novel computational approach (Bayesian Switching Dynamic Systems, BSDS) discovers latent brain states from fMRI data in an unsupervised and parameter-free manner that is agnostic to external stimuli, resulting in 4 states: an active-encoding state, a default-mode state, an inactive state, and an intermediate state. The key finding is that the percentage of time occupied in the active-encoding state (characterized by greater activity in hippocampal, visual, and frontoparietal regions), as well as greater transitions to this state, predicts memory accuracy. Memory accuracy was also predicted by the mean lifetime and transitions to the default-mode state (characterized by greater activity in medial prefrontal cortex and posterior cingulate cortex) during post-encoding rest. Together, the results provide insights into dynamic interactions between brain regions that may be optimal for encoding novel information and consolidating memories for long-term retention.

      We thank Reviewer #2 for recognizing the novelty and broader utility of our methodology and for noting that the manuscript is well-written and concise.

      Weaknesses:

      (1) The study focuses on middle childhood, but there is a lack of engagement in the Introduction or Discussion about what is known about memory development and the brain during this period. Many of the brain regions examined in this study, particularly frontoparietal regions, undergo developmental changes that could influence their involvement in memory encoding and consolidation. The paper would be strengthened by more directly linking the findings to what is already known about episodic memory development and the brain.

      We thank the reviewer for this suggestion. In response, we will substantially expand the Introduction and Discussion to situate our findings within the developmental cognitive neuroscience literature on episodic memory. In particular, we will address the protracted developmental trajectory of frontoparietal regions, the well-documented maturation of hippocampal–cortical connectivity during middle childhood, and how these developmental changes may influence the brain state configurations we observed (He et al., 2023; Ryali et al., 2016). This will provide the necessary developmental context for interpreting our state dynamics results.

      (2) A more thorough overview of the BSDS algorithm is needed, since this is likely a novel method for most readers. Although many of the nitty-gritty details can be referenced in prior work, it was unclear from the main text if the BSDS algorithm discovered latent states based on activation patterns, functional connectivity, or both. Figure 1F is not very informative (and is missing labels).

      We thank the reviewer for this suggestion. We agree that a more accessible overview of the BSDS algorithm (Lee et al., 2025; Taghia et al., 2018) is needed. In the revision, we will expand the Methods and provide a concise algorithmic overview in the main text that clarifies the following key points: (a) BSDS operates on multivariate time series from the ROIs and infers latent brain states defined jointly by their mean activation patterns (amplitude vectors) and inter-regional covariance matrices (functional connectivity); (b) it employs a hidden Markov model framework with Bayesian inference and automatic relevance determination to identify the number of states without manual specification; and (c) state assignments are made at each TR, yielding a temporal sequence that enables computation of occupancy rates, mean lifetimes, and transition probabilities. We will also revise Figure 1F to include appropriate labels and a clearer schematic of the model's inputs, latent structure, and outputs.
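
      Point (c) can be illustrated with a minimal Python sketch that estimates transition probabilities from a per-TR state sequence. This is a simple empirical count, not the BSDS inference itself; the function name and the toy sequence are hypothetical. The diagonal entries are the self-transition probabilities, which index how stable a state is once entered.

```python
def transition_matrix(states, n_states):
    """Empirical transition probabilities from a sequence of state indices.

    Entry [i][j] is the fraction of TRs spent in state i that were followed
    by state j at the next TR. Rows for states that never occur (or occur
    only at the final TR) are left as zeros.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Hypothetical 8-TR sequence over three states (indices 0, 1, 2).
sequence = [0, 0, 1, 1, 1, 2, 0, 0]
P = transition_matrix(sequence, 3)
self_transition = [P[k][k] for k in range(3)]
```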

      (3) A further confusion about the BSDS algorithm was whether it necessarily had to work on the rest data. Figure 4A suggests that each TR was assigned one of the four states based on the maximum win from the log-likelihood estimation. Without more details about how this algorithm was applied to the rest data, it is difficult to evaluate the claim on page 14 about the spontaneous emergence of the states at rest.

      The key methodological point is that the BSDS model, once trained on encoding data, can be applied to new (rest) time series via log-likelihood estimation: for each TR during rest, the model computes the log-likelihood of each state given the observed multivariate signal, and the state with the maximum log-likelihood is assigned to that TR. This "decoding" approach tests whether the spatial configurations learned during encoding are present during rest, rather than fitting new states de novo. We applied a threshold to the log-likelihood values to exclude TRs where the evidence for any single state was weak, thus controlling for potential misassignment. We will substantially clarify this process in the revised Methods and main text, and as described in our response to Reviewer #1 point 1, we will also conduct additional analyses to address the concerns raised.
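
      The decoding step described above can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the authors' implementation: the function names are hypothetical, the input is assumed to be a table of already standardized per-state likelihood values for each TR, and the rule "leave a TR unassigned when its best value falls below the threshold" is one plausible reading of the thresholding described in the response.

```python
def decode_states(loglik_per_tr, threshold):
    """Assign each TR the state with the highest value, or None if even the
    best value falls below `threshold` (weak evidence for any single state)."""
    assignments = []
    for row in loglik_per_tr:
        best = max(range(len(row)), key=lambda k: row[k])
        assignments.append(best if row[best] >= threshold else None)
    return assignments


def fraction_decoded(assignments):
    """Proportion of TRs that received a state assignment."""
    return sum(a is not None for a in assignments) / len(assignments)


# Hypothetical standardized values for 3 TRs x 4 states (illustrative only).
rest_scores = [
    [0.90, -0.30, 0.20, -0.80],   # state index 0 assigned
    [0.05, 0.02, -0.04, -0.03],   # best value below threshold: unassigned
    [-0.50, 1.20, -0.40, -0.30],  # state index 1 assigned
]
decoded = decode_states(rest_scores, threshold=0.1)
coverage = fraction_decoded(decoded)
```

      Reporting `coverage` separately for pre- and post-encoding rest would show how many TRs survive the threshold in each condition.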

      (4) Although the BSDS algorithm was validated in prior simulations and task-based fMRI using sustained block designs in adults, it is unclear whether it is appropriate for the kind of event-related design used in the current study. Figure 1G shows very rapid state changes, which is quantified in the low mean lifetime of the states (between 1-3 TRs on average) in Figure 4C. On the one hand, it is a strength of the algorithm that it is not necessarily tied to external stimuli. On the other hand, it would be helpful to see simulations validating that rapid transitions between states in fMRI data are meaningful and not due to noise.

      This is an important methodological question. The rapid state changes observed in our event-related design (mean lifetimes of 1–3 TRs) differ from the longer state durations typically observed with block designs (He et al., 2023; Zeng et al., 2024), where sustained cognitive demands stabilize brain configurations. We believe these rapid transitions are consistent with the inherent dynamics of event-related encoding, where each trial involves rapid shifts between sensory processing, memory binding, and attentional engagement. Several considerations support the meaningfulness of these transitions: (a) the identified states have interpretable amplitude profiles consistent with well-established memory-related brain systems; (b) state dynamics show statistically significant, directionally consistent correlations with subsequent memory performance; and (c) the transition structure during encoding is distinct from that observed during rest, indicating sensitivity to task demands. Nonetheless, we acknowledge the concern about noise and will conduct additional analyses in the revision to address the concerns raised.

      (5) The Methods section mentions that participants actively imagined themselves within the encoded scenes and were instructed to memorize the images for a later test during the post-encoding rest scan. This detail needs to be included in the main text and incorporated into the interpretation of the findings, as there are likely mechanistic differences between spontaneous memory replay/reinstatement vs. active rehearsal.

      We thank the reviewer for this suggestion. We will include these experimental details in the main text and incorporate them into the interpretation of our findings in the context of spontaneous memory replay/reinstatement vs. active rehearsal (Liu et al., 2019; Wimmer et al., 2020).

      (6) Information about the general linear model used to discover the 16 ROIs that showed a subsequent memory effect is missing, such as: covariates in the model (motion, etc.), group analysis approach (parametric or nonparametric), whether and how multiple-comparisons correction was performed, if clusters were overlapping at all or distinct, if the total number of clusters was 16 or if this was only a subset of regions that showed the effect.

      We apologize for the missing methodological details. In the revised manuscript, we will provide complete information on the general linear model used to identify the 16 ROIs, including: the event regressors and parametric modulators included in the model, nuisance covariates (motion parameters, white matter and CSF regressors), the group-level analysis approach and statistical thresholding, the method for multiple-comparisons correction, whether the 16 ROIs represent all significant clusters or a subset, and whether any clusters were spatially overlapping. We will also clarify how peak voxels were selected for ROI definition.

      Reviewer #3 (Public review):

      This paper uses a novel method to look at how stable brain states and the transitions between them promote memory formation during encoding and post-encoding rest in children. I think the paper has some weaknesses (detailed below) that mean that the authors fall short of achieving their aims. Although the paper has an interesting methodological approach, the authors need better logic, and are potentially "double dipping" in their results - meaning their logic is circular. I think the method that they are using could be useful to the broader neuroimaging community, although they need to make this argument clearer in the paper.

      We thank Reviewer #3 for recognizing the novelty of our approach and its potential utility for the broader neuroimaging community.

      (1) The authors use children as their study subjects but fail to reconcile why children are used, if the same phenomena are expected to be seen in adults (or only children), and if and how their findings change with age across an age range that spans middle childhood into early adolescence. They need to include more consideration for the development of their subject population. The authors should make it clear why and how memory was tested in children and not adults. Are adults and children expected to encode and consolidate in a similar manner? Do the findings here also apply to adults? How was the age range of 8-13-year-old children selected? Why didn't the authors look at change with age? Does memory performance change with age? Do the BSDS dynamics change with age in the authors' sample?

      Our study was motivated by the observation that while adult studies have documented memory replay and reinstatement, very little is known about whether these dynamic state-level mechanisms operate during middle childhood, a period characterized by substantial improvements in episodic memory ability and ongoing maturation of frontoparietal and hippocampal–cortical circuits. The age range of 8–13 was defined a priori based on typical developmental classifications of middle childhood through early adolescence, representing a period when episodic memory abilities are developing rapidly.

      In response to the reviewer's specific questions: (a) we will conduct exploratory analyses testing whether memory accuracy, BSDS state dynamics (occupancy, mean lifetime, transitions), and brain–behavior correlations vary as a function of age within our sample; (b) we will clearly discuss whether adults are expected to show similar patterns, drawing on the extant adult literature; and (c) we will acknowledge as a limitation that our sample size (N = 24) and narrow age range provide limited statistical power for detecting continuous age-related changes, and that a dedicated cross-sectional or longitudinal developmental design would be needed to draw firm conclusions about developmental trajectories. Please also see responses to Reviewer #1 point 5 and Reviewer #2 point 1.

      (2) The authors look for brain state dynamics within a preselected set of ROIs that are selected because they display a subsequent memory effect. This is problematic because the state that is most associated with subsequent memory (S3, or State 3) is also the one that shows most activity in these regions (that have already been a priori selected due to displaying a subsequent memory effect). This logic is circular. It would be helpful if they could look at brain state dynamics in a more ROI agnostic whole brain approach so that we can learn something beyond what a subsequent memory analysis tells us. I think the authors are "double dipping" in that they selected regions for further analysis based on a subsequent memory association (remembered > forgotten contrast) and then found states within those regions showing a subsequent memory effect to further analyze for being associated with subsequent memory. Would it be possible instead to do a whole-brain analysis (something a bit more agnostic to findings) using the BSDS framework, and then, from a whole-brain perspective, look for particular brain states associated with subsequent memory? As it stands, it looks like S3 (state 3) has greater overall activation in all brain regions associated with subsequent memory, so it makes sense that this brain state is also most associated with subsequent memory. The BSDS analysis is therefore not adding anything new beyond what the authors find with the simple subsequent memory contrast that they show in Figure 1C. 
      This particularly affects the following findings: (a) active-encoding state occupancy rate correlated positively with memory accuracy, (b) transitions to the active-encoding state were beneficial / Conversely, transitions toward the inactive state (S4) were detrimental, with incoming transitions showing negative correlations with memory accuracy / The active-encoding state serves as a "hub" configuration that facilitates memory formation, while pathways leading to this state enhance performance and transitions away from it impair encoding.

      We appreciate this critique, which raises an important concern about analytical circularity.

      a) Why BSDS adds information beyond the static subsequent memory contrast. The reviewer notes that S3 (the active-encoding state) shows high activation in the same regions selected by the subsequent memory contrast, and therefore questions whether BSDS provides new information. We respectfully argue that BSDS captures dimensions of neural organization that a static contrast cannot. Specifically: (i) the subsequent memory contrast identifies which regions are differentially active for remembered vs. forgotten items, averaged across the entire encoding session; it provides no temporal information about when or for how long these regions are co-active; (ii) BSDS reveals the moment-to-moment temporal evolution of brain states, including the duration and stability of each configuration (mean lifetime), which independently predicts behavior; (iii) BSDS uniquely captures transition dynamics (the rates and patterns of switching between states), which we show are predictive of memory in ways not derivable from the contrast map (e.g., transitions from S2→S3 positively predict memory, transitions toward S4 negatively predict memory); and (iv) BSDS characterizes the full covariance structure among regions within each state, revealing distinct connectivity patterns (e.g., the high clustering coefficient and global efficiency of S3), which are not captured by univariate activation contrasts. Thus, while the ROI selection is informed by the subsequent memory effect, the information BSDS extracts from those regions (temporal dynamics, transition patterns, and multivariate covariance) is orthogonal to the information used for selection.

      b) Additional validation. To address the circularity concern empirically, we will conduct an additional analysis using ROIs defined independently of the subsequent memory contrast, drawn from previous studies (e.g., network templates), meta-analyses, or Neurosynth (He et al., 2023; Meer et al., 2020; Taghia et al., 2018).

      (3) The task used to test memory in children seems strange. Why should children remember arbitrary scenes? How this was chosen for encoding needs to be made clear. There needs to be more description of the memory task and why it was chosen. Why was scene encoding chosen? What does scene encoding have to do with the stated goal of (a) "Understanding how children's brains form lasting memories", (b) "optimizing education" and (c) "identifying learning disabilities"? What was the design of the recognition memory test? How many novel scenes were included in the test, and how were they chosen? How close were the "new" images to previously seen "old" images? Was this varied parametrically (i.e., was the similarity between new and old images assessed and quantified?)

      Scene encoding was chosen for several reasons: (a) scenes are rich, complex stimuli that engage the hippocampal–parahippocampal memory system, eliciting robust subsequent memory effects suitable for BSDS modeling; (b) scene encoding recruits distributed networks spanning visual cortex, MTL, and frontoparietal regions, enabling detection of multi-region brain states; and (c) scene encoding paradigms have been widely used in both adult and developmental studies of episodic memory and replay (Tambini et al., 2017; Tompary et al., 2017), facilitating comparison with prior work.

      Regarding the recognition test: participants viewed 200 images (100 old, 100 new), with novel scenes drawn from the same categories (buildings and natural scenes) but chosen to be perceptually distinct from studied images. Similarity between old and new images was not parametrically manipulated or quantified; we will note this limitation. We will also expand the main text to include full task details, and we have deleted claims about implications for educational optimization and learning disability identification (see also Reviewer #3 point 7).

      (4) They ultimately found four brain states during encoding. It would be helpful if they could make the logic and foundation for arriving at this number clear.

      The number of brain states is not predetermined by the user but is automatically determined by the BSDS algorithm through Bayesian automatic relevance determination (ARD). The model is initialized with a maximum number of possible states, and during inference, states that contribute minimally to explaining the data are effectively pruned: their associated parameters are driven to near-zero by the ARD prior. In our data, the model converged on four states. This is a key advantage of BSDS over conventional HMM approaches, which require the user to specify the number of states a priori. We will clarify this process in the revised Methods and Results, referencing the original BSDS methodology paper (Taghia et al., 2018) for full mathematical details.

      (5) There is already extant work on whether brain states during post-encoding rest predict memory outcomes. This work needs to be cited and referred to. The present manuscript needs to be better situated within prior work. The authors should look at the work by Alexa Tompary and Lila Davachi. They have already addressed many of the questions that the authors seek to answer. The authors should read their papers (and the papers they cite and that cite them) and then situate their work within the prior literature.

      We agree that the manuscript must be better situated within the existing literature on post-encoding rest and memory consolidation. We will revise the Introduction and Discussion to engage more fully with the foundational work in adults by Tompary & Davachi (2017, Neuron; 2024, eLife) on consolidation-related hippocampal–mPFC representational overlap, as well as Tambini & Davachi (2013, PNAS; 2019, Trends in Cognitive Sciences) on hippocampal persistence during post-encoding rest and awake reactivation (Tambini et al., 2019; Tambini et al., 2017; Tompary et al., 2017). We will explicitly discuss how our BSDS-based approach to state-level reinstatement complements and extends these earlier findings, which largely focused on region-specific pattern similarity or hippocampal–cortical connectivity, by characterizing reinstatement at the level of dynamic, whole-network configurations.

      (6) The authors should back up the claim that "successful episodic memory formation critically depends on the temporal coordination between these systems. Brain regions must coordinate their activity through dynamic functional interactions, rapidly reconfiguring their activity and connectivity patterns in response to changing cognitive demands and stimulus characteristics." Do they have any specific evidence supporting this claim?

      The claim that episodic memory depends on temporal coordination and dynamic functional interactions is supported by several lines of evidence: (a) within our study, the significant correlations between state transition rates and memory performance directly demonstrate that dynamic inter-state communication predicts memory outcomes; (b) prior studies show that hippocampal–prefrontal theta coherence during encoding predicts subsequent memory (e.g., Zielinski et al., 2020); and (c) recent work demonstrates that rapid reconfiguration of large-scale brain networks supports cognitive functions including working memory (Braun et al., 2015; Shine et al., 2018) and episodic encoding (Phan et al., 2024). We will revise this passage to include specific citations and to make clear that our own transition–behavior correlations constitute direct evidence for this claim.

      (7) These claims seem overstated: "this work has broad implications for understanding memory function in children, for developing educational interventions that enhance memory formation, and enabling early identification of children at risk for learning disabilities." Can the authors add citations that would support these claims, or if not, remove them?

      We thank the reviewer for raising this point. We agree that the current framing overstates the practical implications. We have now removed these claims and remark on future studies that are needed here.

      References

      (1) Braun, U., Schafer, A., Walter, H., Erk, S., Romanczuk-Seiferth, N., Haddad, L., . . . Bassett, D. S. (2015). Dynamic reconfiguration of frontal brain networks during executive cognition in humans. Proc Natl Acad Sci U S A, 112(37), 11678-11683.

      (2) He, Y., Liang, X., Chen, M., Tian, T., Zeng, Y., Liu, J., . . . Qin, S. (2023). Development of brain-state dynamics involved in working memory. Cerebral Cortex.

      (3) Lee, B., Young, C. B., Cai, W., Yuan, R., Ryman, S., Kim, J., . . . Menon, V. (2025). Dopaminergic modulation and dosage effects on brain state dynamics and working memory component processes in Parkinson’s disease. Nature Communications, 16(1), 2433.

      (4) Liu, Y., Dolan, R. J., Kurth-Nelson, Z., & Behrens, T. E. J. (2019). Human Replay Spontaneously Reorganizes Experience. Cell, 178(3), 640-652.e614.

      (5) Meer, J. N. v. d., Breakspear, M., Chang, L. J., Sonkusare, S., & Cocchi, L. (2020). Movie viewing elicits rich and reliable brain state dynamics. Nature Communications, 11(1), 5004.

      (6) Phan, A. T., Xie, W., Chapeton, J. I., Inati, S. K., & Zaghloul, K. A. (2024). Dynamic patterns of functional connectivity in the human brain underlie individual memory formation. Nature Communications, 15(1), 8969.

      (7) Ryali, S., Supekar, K., Chen, T., Kochalka, J., Cai, W., Nicholas, J., . . . Menon, V. (2016). Temporal Dynamics and Developmental Maturation of Salience, Default and Central-Executive Network Interactions Revealed by Variational Bayes Hidden Markov Modeling. PLoS Comput Biol, 12(12), e1005138.

      (8) Shine, J. M., & Poldrack, R. A. (2018). Principles of dynamic network reconfiguration across diverse brain states. Neuroimage, 180, 396-405.

      (9) Stevner, A. B. A., Vidaurre, D., Cabral, J., Rapuano, K., Nielsen, S. F. V., Tagliazucchi, E., . . . Kringelbach, M. L. (2019). Discovery of key whole-brain transitions and dynamics during human wakefulness and non-REM sleep. Nature Communications, 10(1), 1035.

      (10) Taghia, J., Cai, W., Ryali, S., Kochalka, J., Nicholas, J., Chen, T., & Menon, V. (2018). Uncovering hidden brain state dynamics that regulate performance and decision-making during cognition. Nature Communications, 9(1), 2505.

      (11) Tambini, A., & Davachi, L. (2019). Awake Reactivation of Prior Experiences Consolidates Memories and Biases Cognition. Trends in Cognitive Sciences, 23(10), 876-890.

      (12) Tambini, A., Rimmele, U., Phelps, E. A., & Davachi, L. (2017). Emotional brain states carry over and enhance future memory formation. Nature Neuroscience, 20(2), 271-278.

      (13) Tompary, A., & Davachi, L. (2017). Consolidation Promotes the Emergence of Representational Overlap in the Hippocampus and Medial Prefrontal Cortex. Neuron, 96(1), 228-241.e225.

      (14) Verde, M. F., Macmillan, N. A., & Rotello, C. M. (2006). Measures of sensitivity based on a single hit rate and false alarm rate: The accuracy, precision, and robustness of d′, Az, and A′. Perception & Psychophysics, 68(4), 643-654.

      (15) Wickens, T. D. (2001). Elementary signal detection theory. Oxford University Press.

      (16) Wimmer, G. E., Liu, Y., Vehar, N., Behrens, T. E. J., & Dolan, R. J. (2020). Episodic memory retrieval success is associated with rapid replay of episode content. Nature Neuroscience, 23(8), 1025-1033.

      (17) Zeng, Y., Xiong, B., Gao, H., Liu, C., Chen, C., Wu, J., & Qin, S. (2024). Cortisol awakening response prompts dynamic reconfiguration of brain networks in emotional and executive functioning. Proceedings of the National Academy of Sciences, 121(52), e2405850121.

      (18) Zielinski, M. C., Tang, W., & Jadhav, S. P. (2020). The role of replay and theta sequences in mediating hippocampal-prefrontal interactions for memory and cognition. Hippocampus, 30(1), 60-72.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      General Statements

      Our study identifies characteristics of secretory signal peptides in fungi, and how their sequence determines which alternative pathways proteins take to the endoplasmic reticulum. All 3 reviewers grasp this, and agree that the study is publishable. Reviewer 3 puts it well, that we "convincingly show that the length of the hydrophobic helix in a signal peptide is the main factor distinguishing [...] pathways. This simplifies a previous model [...] provides a modest but important advancement to the field of protein secretion. ... The study extends its computational analysis beyond the model yeast Saccharomyces cerevisiae to a diverse range of fungal species."

      Thank you to all the reviewers: we found the reviews fair and constructive, and have addressed them in full.

      In the process of responding to reviews, we softened the claim in the title to "Protein secretion routes in fungi are predicted by the length of the hydrophobic helix in the signal sequence". We also reorganised the manuscript to put the cross-fungal analysis first, followed by the more detailed mechanistic analysis. We feel that this leads a broader audience through the story more effectively. This reorganisation also moved some material from introduction to discussion. Also on larger-scale changes, we reformatted the materials and methods section as requested.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary:

      In this manuscript the authors analyze characteristics of secretory signal peptides in fungi. They identify length of the hydrophobic core rather than overall hydrophobicity as the parameter that determines whether proteins use SRP-dependent cotranslational import through the Sec61 channel, or SRP-independent posttranslational translocation through the hetero-heptameric Sec complex to enter the ER.

      Major comments

      1. The authors need to adequately use the existing nomenclature in the field:

        There is no 'Sec63 translocon'. Proteins with more hydrophobic signal sequences are targeted to the ER by SRP and its receptor, and these proteins are translocated cotranslationally by the Sec61 channel (aka the translocon). Proteins with less hydrophobic signal sequences are imported into the ER posttranslationally by the Sec complex consisting of the Sec61 channel and hetero-tetrameric Sec63 complex (Sec62, Sec63, Sec71, Sec72).

        Sec63 on its own also contributes to co-translational import (Brodsky et al, PNAS, 1995), so the term 'Sec63 translocon' is really confusing and should be replaced by the standard nomenclature as above throughout the paper.

      We sincerely appreciate the advice in correctly navigating terminology in the secretion and translocation field. We now say "Sec complex", and not the incorrect "Sec63 translocon". In the same spirit, we have replaced the terminology "Sec63-dependent" with "Sec-dependent", which is a more accurate description of the overall role of the Sec complex. For example, Ast et al. primarily assayed dependence on the Sec complex using sec72∆ strains.

      The paper should contain a proper methods section.

      We have reformatted the manuscript with a separate materials and methods section in the main manuscript, per Genetics/G3 journal family guidelines.

      The authors should explain more explicitly the differences of the Phobius and DeepTMHMM algorithms. Why was that particular algorithm chosen for comparison to Phobius?

      We initially focused on algorithms that distinguish SPs and TM sequences in a single tool, which both Phobius and DeepTMHMM do. This differs from other algorithms such as the SignalP family, which do not also predict TM sequences; SignalP version 4.0 onwards was indeed trained to exclude TM sequences from its predictions (PMID: 21959131).

      In response to this and the similar comment from reviewer 2, we expanded our analysis to compare with the SignalP6.0 algorithm as well as DeepTMHMM.

      Minor comments

      • p2, para 2: ER protein import has been studied for 50 years, and its complexity has been obvious for well over a decade

      We corrected this to "However, detailed functional investigations of secretion mechanisms in eukaryotes have focused on a handful of model yeasts and mammalian cells, revealing unexpected complexity"

      • p2, para 3: ref for the signal sequence should be one of the original Blobel papers instead of [8]

      We added the citation to Blobel and Sabatini, 1971, and kept the 1979 citation as we find the additional context is helpful to readers.

      • p3, para 1: ref for SRP should be Walter, Ibrahimi, & Blobel, JCB 1981, instead of [11]

      We added the original citation, and again kept the more modern citation that summarizes the field in decades following initial discovery.

      • p3, para 1: NB: SRP and its receptor do NOT translocate anything, they TARGET proteins to the ER

      We have corrected this, thank you.

      Reviewer #1 (Significance (Required)):

      The authors report an interesting observation which is of interest to the field and sufficiently well documented in this manuscript to be convincing. The paper does extend our understanding of the critical characteristics of secretory signal peptides.

      A limitation of all signal peptide prediction by current algorithms is that they are trained on 'standard' signal peptides and tend to miss ones that do not sufficiently conform to the standard parameters.

      Thank you for this point: the "standard/non-standard" conceptualization is helpful and we now mention this in our expanded discussion. We agree that testing the limits of these models would involve experimental screening of non-standard or non-natural sequences.

      Reviewer's expertise: SRP and Sec61 channel structure/function analysis, cell-free assays for ER protein import, yeast genetics

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Review of manuscript of Sones-Dykes et al. entitled: 'Protein secretion routes in fungi are mostly determined by the length of the hydrophobic helix in the signal peptide'

      This manuscript deals with the important question of how different fungi exhibit variety in protein targeting to the secretory pathway mostly using bioinformatic sequence analysis. This is important for understanding the evolution of the diverse targeting routes within the early secretory pathway, but also for biotechnology since diverse fungi are used as "biofactories" in biotechnological production of secreted proteins. While the results of the current study mostly confirm the analyses already carried out in S.cerevisiae, the work is important and warrants publication in a suitable journal.

      We appreciate this positive and balanced appraisal.

      Major points:

      1. Could the authors elaborate what was the motivation to use Phobius and not some other signal peptide predictor? I am wondering because of the cited Ast et al. paper is already several years old and new improved prediction tools such as the latest SignalP iteration have been developed since that study.

      The main motivation to use Phobius, and check with DeepTMHMM, was that these tools simultaneously predict cleaved signal peptides and transmembrane helices, unlike other tools that predict only cleaved signal peptides and can give false positives with N-terminal transmembrane helices.

      To clarify this point, we also emailed Prof. Henrik Nielsen, the lead developer of SignalP. I asked: "Although we mostly used Phobius prediction and also compared to DeepTMHMM, reviewers have asked us to also compare to SignalP. A critical part of our argument is about predictions of the h-region length, so we would like to compare h-region lengths to SignalP4.1 HMM mode in addition to SignalP6.0."

      Prof. Nielsen replied:

      As for your question, I must tell you that SignalP 4.1 does not have an HMM mode at all. The last SignalP version to have an HMM mode was 3.0. Therefore, 4.0, 4.1, and 5.0 do not output signal peptide regions; this was first reintroduced with version 6.0. See also the FAQ tab at the website.

      You could try to install version 3.0, but for your purpose, I would not recommend it. The old HMM module had a strong preference for certain h-region lengths because of a specific kind of overtraining. This was, at least partially, solved in Phobius through regularization of the length distribution. Since h-region length is a crucial parameter in your analysis, I would not trust the region assignments by SignalP 3.0. You are welcome to cite me for that to the reviewers, if needed.

      But comparing the region assignments between Phobius and SignalP 6.0 will be interesting.

      Regarding SignalP3.0, we now cite Liaci et al., who analysed all experimentally verified eukaryotic signal peptides using SignalP 3.0, and Xue et al., who analysed S. cerevisiae signal peptides, and both arrived at similar conclusions that cleaved signal peptides have hydrophobic regions of length 8-14 amino acids.

      Also, we have expanded our analysis to also compare Phobius and SignalP6.0 predictions of entire signal peptides and of h-regions. The comparisons are now in Figures 4, S3, and S4.

      I am slightly puzzled by the analysis of the annotation of the Sec63- and SRP-dependent targeting sequences presented in Fig. 1. Could the "SRP-dependent" sequences with long hydrophobic sequences simply be called transmembrane helices? Based on structure of the SPC, it has been proposed that cleavable signal peptides with h-regions beyond 18 residues are extremely rare so I would imagine that majority of these sequences are longer transmembrane segments.

      The point of this figure is to compare lists of proteins that are experimentally verified to be Sec-dependent or SRP-dependent in their targeting, so that's the correct way to refer to them for the purpose of this analysis. Yes, the conclusion of this paper and other work (e.g. Ast et al.) is that these SRP-dependent sequences with long hydrophobic sequences are mostly transmembrane (TM) helices.

      I appreciate the analysis of protein targeting features in evolutionarily distinct fungal species, but since the authors highlight importance of fungi in heterologous industrial protein production, it would have been satisfying to see some of these fungi included in this analysis. In particular, Pichia pastoris and Trichoderma reesei are commonly used fungi with apparently a highly specialized secretory machinery capable of very high production levels of different secretory proteins. I would urge the authors to consider the aspect of selecting optimal secretion signals for these industrial fungi and perhaps include some discussion of it in this manuscript.

      We added Pichia pastoris (Komagataella phaffii) and Trichoderma reesei to the analysis. We appreciate the suggestion to discuss optimal secretion signals; however, our analysis doesn't directly address that, so we chose to leave that point out.

      Minor points:

      1. The authors state that both Sec63 and SRP pathways converge at the Sec61 translocon. However, we now know that targeting of proteins to Sec61 is even more complicated and for example the EMC is a complex that delivers some proteins to Sec61. It might be appropriate to cite some recent reviews on complexity of early protein targeting to Sec61 in the Introduction.

      As a review of the complexity of early protein targeting, we cite Aviram and Schuldiner, 2017 (Targeting and translocation of proteins to the endoplasmic reticulum at a glance). We could add other citations if the reviewer considers this to be necessary.

      Page 5. The authors repeat the compound hydropathy analysis of Ast et al. and used the earlier reported 9-amino acid window for this. Is this analysis result robust with other window sizes?

      Ast et al. checked that this result is robust to window sizes of 9, 11, or 19 aa, in their Figure S1A, which we now specifically mention. In our manuscript, we instead check robustness to different hydropathy scales and prediction algorithms.
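      For readers unfamiliar with window-based hydropathy analysis, the sketch below shows what changing the window size means in practice: it slides a fixed-width window along a sequence and reports the highest mean Kyte-Doolittle hydropathy. This is a generic illustration only; it is not the compound hydropathy score of Ast et al., and the function name and toy sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def max_window_hydropathy(sequence, window=9):
    """Return (best_mean_score, start_index) of the most hydrophobic window.

    Hypothetical helper: a plain sliding-window mean, assumed here for
    illustration rather than taken from any published pipeline.
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    if len(scores) < window:
        raise ValueError("sequence is shorter than the window")

    best_score, best_start = None, None
    for start in range(len(scores) - window + 1):
        mean_score = sum(scores[start:start + window]) / window
        # Keep the first window that achieves the maximum.
        if best_score is None or mean_score > best_score:
            best_score, best_start = mean_score, start
    return best_score, best_start


# Toy sequence: a nine-leucine stretch flanked by lysines.
toy = "KKKK" + "L" * 9 + "KKKK"
for w in (9, 11):
    print(w, max_window_hydropathy(toy, window=w))
```

      With a window wider than the hydrophobic stretch, flanking residues dilute the mean, which is why robustness to window size is worth checking.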

      Page 12. Authors state that "cleaved signal peptides do not need to span a membrane". A recent structure of the signal peptidase complex (PMID: 34388369) directly suggests that the signal peptide does span the membrane immediately before its final cleavage. Importantly, the SPC thins the membrane in this region to accommodate the shorter signal peptide h-region and this is proposed as a basis for SPC discriminating between signal peptides and longer transmembrane segments. It would be appropriate to cite this paper in the Discussion.

      Thank you for bringing this important paper to our attention. We have clarified our wording here and cited Liaci et al. (PMID: 34388369) in the updated manuscript, both for its detailed structural discussion and for its similar conclusion that in mammals "Signal peptides possess short h-regions".

      Reviewer #2 (Significance (Required)):

      Protein targeting into the early secretory pathway is an important general concept, and recent years have revealed many new aspects into the diverse mechanisms that cells employ for targeting of proteins with diverse folding needs by use of protein-specific targeting sequences. Also, how proteins are targeted is an important biotechnological question as choice of e.g. the signal peptide can have a dramatic impact on quantity and quality of the produced protein.

      This work is generally interesting to cell biologists studying mechanisms of protein targeting, but the results are mostly confirmatory. Still, no-one has carried out such analysis and fungi are remarkably diverse with potential for new innovations in protein targeting and therefore, the work should be published in my opinion. The suitable audience in my view is quite specialized and could be cell biologists with high interest in fungal protein secretion or biotechnologists using fungi for heterologous expression. For the latter, I would request the authors to extend the data analysis to a few more most biotechnologically relevant fungi and add some discussion on choice of signal peptide in biotechnological protein production in fungi.

      We appreciate this fair perspective. Indeed, we have added analyses of the biotechnologically relevant fungi Komagataella phaffii (Pichia pastoris), and Trichoderma reesei.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary:

      This manuscript revisits the analysis of hydrophobic forces driving endoplasmic reticulum translocation in fungi. Sones-Dykes and Wallace convincingly show that the length of the hydrophobic helix in a signal peptide is the main factor distinguishing SRP-dependent and Sec63-dependent pathways. This simplifies a previous model that relied on a compound hydropathy score, which incorporated both length and hydrophobicity. The analysis, confirmed by Phobius and DeepTMHMM, indicates that length alone is an equally effective and simpler metric for predicting the translocation route in fungi. The study extends its computational analysis beyond the model yeast Saccharomyces cerevisiae to a diverse range of fungal species. It finds that the bimodal distribution of hydrophobic helix lengths-short for predicted Sec63-dependent and long for SRP-dependent proteins-is highly conserved. By broadly identifying proteins with short hydrophobic helixes, the research suggests that the Sec63 translocation route is crucial for cell wall biogenesis and secretion (likely encompassing and the secretion of virulence factors). This provides a functional and pathological context for the translocation pathway choice.

      The manuscript was well written, and its central messages were clear.

      We appreciate this, and are glad that the messages came across clearly.

      Major points:

      • Extension of analysis to human secretome: In Fig 4, the helix length analysis is extended to additional organisms, among them Homo sapiens. It is observed that 'h-region lengths in humans had a similar distribution'. However, as the authors themselves note in the introduction, the functional thresholds of signal peptides are dramatically different in mammalian cells. Without overlaying 'ground truth' data of Sec63-dependence in humans, it is difficult to draw any conclusions about the meaning of h region length on human translocation preferences. I would suggest either: (1) Performing an analysis similar to that done in Fig 1 for the human secretome (2) Removing the human outgroup from the analysis in Fig 4.

      We appreciate the reviewer's point, but decided to keep the human analysis as an outgroup in Fig. 4 only. This manuscript focuses on fungi by extrapolating and testing results from S. cerevisiae on other fungi. A mechanistic interpretation of signal peptides in human cells is out of scope due to the mentioned differences in functional thresholds of signal peptides in human cells. However, including humans gives a context that we feel readers would ask for if we did not include it.

      If we wanted to analyse the human signal peptides thoroughly then it would be interesting to extend to a more diverse range of eukaryotes, and extend beyond signal peptide prediction algorithms to structural modeling of signal peptides into cognate translocon structures. That's a whole different project.

      • Incorporate additional cross-validation: Since the key findings from this paper stem from hydrophobic segment predictions, it would be beneficial to augment the conclusions with another independent analysis. The Hessa scale (PMID: 15674282) has the advantage of being a 'biological' hydrophobicity scale defined by transmembrane helix insertion. It would be important to show that the findings obtained with Phobius (e.g. no improvement in categorization with compound score) also hold with this scale.

      Thank you for this helpful and important point. We also performed the analysis with the Hessa scale, included in the updated manuscript as Figure S2. The Hessa scale looks like a better predictor than the Kyte-Doolittle or Rose scales in that the distributions are clearly different for SRP-dependent and Sec63-dependent proteins. However, there is no improvement in classification, both because the Hessa maximum hydrophobicity distributions for SP and TM groups overlap, and also because the 97.5% accuracy of the length-based prediction is already so good that there's no room to improve in classifying this set of S. cerevisiae sequences.

      Minor points:

      • Incorporate GO analysis in Fig 4: Visualization of the GO analysis referenced in the text (Fig 4) may be useful to drive home the point of .

      We have indicated the top enriched GO terms in the paper, and also provided the full GO results in the supplementary data at https://github.com/TristanSones-Dykes/TMSP_Pub. There's not really more information in these GO analyses that makes it worth plotting. For example, for predicted signal peptides in all annotated fungi, "extracellular region" and "cell wall" come up as very highly enriched with extremely low p-values.

      • Cite origin of 'ground truth' protein list: The authors cite 83 and 107 bona-fide Sec63-dependent and SRP-dependent proteins which were used to define the 'ground truth' lists. It would be informative to define how these lists were collected; for example, the Ast et al. paper referenced appears to validate ~40-50 proteins as Sec63-dependent.

      The 'ground truth' protein list was collected and curated in the paper by Ast et al., and thoroughly explained there. In our expanded methods section, we now explain their classification based on localisation/mislocalisation of GFP-tagged proteins in sec72∆ (Sec63 complex deficient) strains. After careful checking, we didn't find any flaws in their analysis or any better yeast datasets more recent than 2013. So, we think the approach of giving a brief description here and referring to Ast et al. for a thorough description is most helpful for readers.

      Reviewer #3 (Significance (Required)):

      This manuscript by Sones-Dykes and Wallace provides a modest but important advancement to the field of protein secretion. While previous work has already identified that Sec63-dependent proteins in baker's yeast have moderately hydrophobic signal peptides, this paper refines this concept and extends it for additional fungal species. It will be of interest to researchers studying protein translocation/secretion pathways and fungal biology.

      Thank you for supporting the main point of our paper. We agree with the assessment, and that this analysis needed to be done to discover if and how results from S. cerevisiae extend to other fungi. We hope that this paper will encourage new work on mechanisms of protein secretion in other fungi, especially of the role of the Sec63 complex.

    1. Author response:

      Reviewer 1 (Public review):

      (1) Figure 1B shows the PREDICTED force-extension curve for DNA based on a worm-like chain model. Where is the experimental evidence for this curve? This issue is crucial because the F-E curve will decide how and when a catch-bond is induced (if at all it is) as the motor moves against the tensiometer. Unless this is actually measured by some other means, I find it hard to accept all the results based on Figure 1B.

      The Worm-Like-Chain model for the elasticity of DNA was established by early work from the Bustamante lab (Smith et al., 1992) and Marko and Siggia (Marko and Siggia, 1995), and was further validated and refined by the Block lab (Bouchiat et al., 1999; Wang et al., 1997). The 50 nm persistence length is the consensus value, and was shown to be independent of force and extension in Figure 3 of Bouchiat et al. (Bouchiat et al., 1999). However, we would like to stress that for our conclusions, the precise details of the Force-Extension relationship of our dsDNA are immaterial. The key point is that the motor stretches the DNA and stalls when it reaches its stall force. Our claim of the catch-bond character of kinesin is based on the longer duration at stall compared to the run duration in the absence of load. Provided that the motor is indeed stalling because it has stretched out the DNA (which is strongly supported by the repeated stalling around the predicted extension corresponding to ~6 pN of force), then the stall duration depends on neither the precise value for the extension nor the precise value of the force at stall.
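
      For readers who want to reproduce a predicted force-extension curve of this kind, the following is a minimal sketch using the Marko-Siggia interpolation formula with the 50 nm persistence length quoted above. The contour length of 1020 nm and the thermal energy of 4.11 pN·nm are assumptions made for illustration only; they are not stated in this response, and the function name is hypothetical.

```python
# Thermal energy near room temperature, in pN*nm (assumed value).
KT_PN_NM = 4.11


def wlc_force(extension_nm, contour_nm, persistence_nm=50.0, kT=KT_PN_NM):
    """Worm-like-chain force (pN) from the Marko-Siggia interpolation formula."""
    z = extension_nm / contour_nm  # fractional extension
    if not 0.0 <= z < 1.0:
        raise ValueError("extension must lie in [0, contour length)")
    return (kT / persistence_nm) * (0.25 / (1.0 - z) ** 2 - 0.25 + z)


# Hypothetical contour length, chosen only for illustration.
CONTOUR_NM = 1020.0

for x_nm in (650.0, 855.0, 960.0):
    print(f"{x_nm:.0f} nm -> {wlc_force(x_nm, CONTOUR_NM):.2f} pN")
```

      Under these assumed parameters the force stays well below 1 pN over most of the extension range and rises steeply only as the tether approaches its contour length, which is the qualitative behaviour the argument relies on.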

      (2) The authors can correct me on this, but I believe that all the catch-bond studies using optical traps have exerted a load force that exceeds the actual force generated by the motor. For example, see Figure 2 in reference 42 (Kunwar et al). It is in this regime (load force > force from motor) that the dissociation rate is reduced (catch-bond is activated). Such a regime is never reached in the DNA tensiometer study because of the very construction of the experiment. I am very surprised that this point is overlooked in this manuscript. I am therefore not even sure that the present experiments even induce a catch-bond (in the sense reported for earlier papers).

      It is true that Kunwar et al measured binding durations at super-stall loads and used that to conclude that dynein does act as a catch-bond (but kinesin does not) (Kunwar et al., 2011). However, we would like to correct the reviewer on this one. This approach of exerting super-stall forces and measuring binding durations is in fact less common than the approach of allowing the motor to walk up to stall and measuring the binding duration. This ‘fixed trap’ approach has been used to show catch-bond behavior of dynein (Leidel et al., 2012; Rai et al., 2013) and kinesin (Kuo et al., 2022; Pyrpassopoulos et al., 2020). For the non-processive motor Myosin I, a dynamic force clamp was used to keep the actin filament in place while the myosin generated a single step (Laakso et al., 2008). Because the motor generates the force, these are not superstall forces either.

      (3) I appreciate the concerns about the Vertical force from the optical trap. But that leads to the following questions that have not at all been addressed in this paper:

      (i) Why is the Vertical force only a problem for Kinesins, and not a problem for the dynein studies?

      Actually, we do not claim that vertical force is not a problem for dynein; our data do not speak to this question. There is debate in the literature as to whether dynein has catch bond behavior in the traditional single-bead optical trap geometry - while some studies have measured dynein catch bond behavior (Kunwar et al., 2011; Leidel et al., 2012; Rai et al., 2013), others have found that dynein has slip-bond or ideal-bond behavior (Ezber et al., 2020; Nicholas et al., 2015; Rao et al., 2019). This discrepancy may relate to vertical forces, but not in an obvious way.

      (ii) The authors state that "With this geometry, a kinesin motor pulls against the elastic force of a stretched DNA solely in a direction parallel to the microtubule". Is this really true? What matters is not just how the kinesin pulls the DNA, but also how the DNA pulls on the kinesin. In Figure 1A, what is the guarantee that the DNA is oriented only in the plane of the paper? In fact, the DNA could even be bending transiently in a manner that it pulls the kinesin motor UPWARDS (Vertical force). How are the authors sure that the reaction force between DNA and kinesin is oriented SOLELY along the microtubule?

      We acknowledge that “solely” is an absolute term that is too strong to describe our geometry. We will soften this term in our revision to “nearly parallel to the microtubule”. In the Geometry Calculations section of Supplementary Methods, we calculate that if the motor and streptavidin are on the same protofilament, the vertical force will be <1% of the horizontal force. We also note that if the motor is on a different protofilament, there will be lateral forces and forces perpendicular to the microtubule surface, but these are oriented toward rather than away from the microtubule. The DNA can surely bend due to thermal forces, but because inertia plays a negligible role at the nanoscale (Howard, 2001; Purcell, 1977), any resulting upward forces will only be thermal forces, which the motor is already subjected to at all times.

      (4) For this study to be really impactful and for some of the above concerns to be addressed, the data should also have included DNA tensiometer experiments with Dynein. I wonder why this was not done?

      As much as we would love to fully characterize dynein here, this paper is about kinesin and it took a substantial effort. The dynein work merits a stand-alone paper.

      While I do like several aspects of the paper, I do not believe that the conclusions are supported by the data presented in this paper for the reasons stated above.

      The three key points the reviewer makes are the validity of the worm-like-chain model, the question of superstall loads, and the role of DNA bending in generating vertical forces. We hope that we have fully addressed these concerns in our responses above.

      Reviewer #2 (Public review):

      Major comments:

      (1) The use of the term "catch bond" is misleading, as the authors do not really mean consistently a catch bond in the classical sense (i.e., a protein-protein interaction having a dissociation rate that decreases with load). Instead, what they mean is that after motor detachment (i.e., after a motor protein dissociating from a tubulin protein), there is a slip state during which the reattachment rate is higher as compared to a motor diffusing in solution. While this may indeed influence the dynamics of bidirectional cargo transport (e.g., during tug-of-war events), the used terms (detachment (with or without slip?), dissociation, rescue, ...) need to be better defined and the results discussed in the context of these definitions. It is very unsatisfactory at the moment, for example, that kinesin-3 is at first not classified as a catch bond, but later on (after tweaking the definitions) it is. In essence, the typical slip/catch bond nomenclature used for protein-protein interaction is not readily applicable for motors with slippage.

      We appreciate the reviewer’s point and we will work to streamline and define terms in our revision.

      (2) The authors define the stall duration as the time at full load, terminated by >60 nm slips/detachments. Isn't that a problem? Smaller slips are not detected/considered... but are also indicative of a motor dissociation event, i.e., the end of a stall. What is the distribution of the slip distances? If the slip distances follow an exponential decay, a large number of short slips are expected, and the presented data (neglecting those short slips) would be highly distorted.

      The reviewer brings up a good point that there may be undetected slips. To address this question, we plotted the distribution of slip distances for kinesin-3, which had by far the most slip events. As the reviewer suggested, it is indeed an exponential distribution. Our preliminary analysis suggests that roughly 20% of events are missed due to this 60 nm cutoff. This will change our unloaded duration numbers slightly, but this will not alter our conclusions.
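
      The relationship between a detection cutoff and the fraction of missed events can be sketched as follows, assuming that slip distances follow a single exponential distribution. The function names are hypothetical, and the mean slip distance computed here is only what a pure exponential would imply for a 20% loss at 60 nm; it is not a value reported in the manuscript.

```python
import math


def fraction_below_cutoff(cutoff_nm, mean_slip_nm):
    """Fraction of exponentially distributed slips shorter than the cutoff."""
    return 1.0 - math.exp(-cutoff_nm / mean_slip_nm)


def mean_for_missed_fraction(cutoff_nm, missed_fraction):
    """Mean slip distance implied by a given missed fraction at a given cutoff."""
    return -cutoff_nm / math.log(1.0 - missed_fraction)


# Illustrative numbers: a 60 nm cutoff with roughly 20% of events missed.
implied_mean_nm = mean_for_missed_fraction(60.0, 0.20)
print(f"implied mean slip distance: {implied_mean_nm:.1f} nm")
print(f"check, fraction missed: {fraction_below_cutoff(60.0, implied_mean_nm):.2f}")
```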

      (3) Along the same line: Why do the authors compare the stall duration (without including the time it took the motor to reach stall) to the unloaded single motor run durations? Shouldn't the times of the runs be included?

      The elastic force of the DNA spring is variable as the motor steps up to stall, and so if we included the entire run duration then it would be difficult to specify what force we were comparing to unloaded. More importantly, if we assume that any stepping and detachment behavior is history independent, then it is mathematically proper to take any arbitrary starting point (such as when the motor reaches stall), start the clock there, and measure the distribution of detachment durations relative to that starting point.

      More importantly, what we do in Fig. 3 is to separate out the ramps from the stalls and, using a statistical model, we compute a separate duration parameter (which is the inverse of the off-rate) for the ramp and the stall. What we find is that the relationship between ramp, stall, and unloaded durations is different for the three motors, which is interesting in itself.
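
      The history-independence argument above is the memoryless property of an exponential waiting time. A minimal numerical check, assuming a single constant off-rate (the duration parameter `tau_s` and the clock offset `t0_s` are arbitrary illustrative values):

```python
import math


def survival(t_s, tau_s):
    """Probability that an exponential process with mean tau_s lasts beyond t_s."""
    return math.exp(-t_s / tau_s)


def conditional_survival(t_s, t0_s, tau_s):
    """P(duration > t0_s + t_s | duration > t0_s): the clock restarts at t0_s."""
    return survival(t0_s + t_s, tau_s) / survival(t0_s, tau_s)


tau_s = 3.0  # illustrative duration parameter
t0_s = 0.7   # arbitrary point at which the clock is started
for t_s in (0.5, 1.0, 2.0):
    print(t_s, survival(t_s, tau_s), conditional_survival(t_s, t0_s, tau_s))
```

      The two printed probabilities agree for every `t_s`, so the measured duration distribution does not depend on where the clock is started, provided the off-rate really is constant.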

      (4) At many places, it appears too simple that for the biologically relevant processes, mainly/only the load-dependent off-rates of the motors matter. The stall forces and the kind of motor-cargo linkage (e.g., rigid vs. diffusive) do likely also matter. For example: "In the context of pulling a large cargo through the viscous cytoplasm or competing against dynein in a tug-of-war, these slip events enable the motor to maintain force generation and, hence, are distinct from true detachment events." I disagree. The kinesin force at reattachment (after slippage) is much smaller than at stall. What helps, however, is that due to the geometry of being held close to the microtubule (either by the DNA in the present case or by the cargo in vivo) the attachment rate is much higher. Note also that upon DNA relaxation, the motor is likely kept close to the microtubule surface, while, for example, when bound to a vesicle, the motor may diffuse away from the microtubule quickly (e.g., reference 20).

      We appreciate the reviewer’s detailed thinking here, and we offer our perspective. As to the first point, we agree that the stall force is relevant and that the rigidity of the motor-cargo linkage will play a role. The goal of the sentence on pulling cargo that the reviewer highlights is to set up our analysis of slips, which we define as rearward displacements that don’t return to the baseline before force generation resumes. We agree that force after slippage is much smaller than at stall, and we plan to clarify that section of text. However, as shown in the model diagram in Fig. 5, we differentiate between the slip state (and recovery from this slip state) and the detached state (and reattachment from this detached state). This delineation is important because, as the reviewer points out, if we are measuring detachment and reattachment with our DNA tensiometer, then the geometry of a vesicle in a cell will be different and diffusion away from the microtubule or elastic recoil perpendicular to the microtubule will suppress this reattachment.

      Our evidence for a slip state in which the motor maintains association with the microtubule comes from optical trapping work by Toleikis et al. (Toleikis et al., 2020) and Sudhakar et al. (Sudhakar et al., 2021). In particular, Sudhakar et al. used small, high-index germanium microspheres that had a low drag coefficient. They showed that during ‘slip’ events, the relaxation time constant of the bead back to the center of the trap was nearly 10-fold slower than the trap response time, consistent with the motor exerting drag on the microtubule. (With larger beads, the drag of the bead swamps the motor-microtubule friction.) Another piece of support for the motor maintaining association during a slip is work by Ramaiya et al., who used birefringent microspheres to exert and measure rotational torque during kinesin stepping (Ramaiya et al., 2017). In most traces, when the motor returned to baseline following a stall, the torque was dissipated as well, consistent with a ‘detached’ state. However, a slip event is shown in S18a where the motor slips backward while maintaining torque. This is best explained by the motor slipping backward in a state where the heads are associated with the microtubule (at least sufficiently to resist rotational forces). Thus, we term the resumption after slip to be a rescue from the slip state rather than a reattachment from the detached state.

      To finish the point, with the complex geometry of a vesicle, during slip events the motor remains associated with the microtubule and hence primed for recovery. This recovery rate is expected to be the same as for the DNA tensiometer. Following a detachment, however, we agree that there will likely be a higher probability of reattachment in the DNA tensiometer due to proximity effects, whereas with a vesicle any elastic recoil or ‘rolling’ will pull the detached motor away from the microtubule, suppressing reattachment. We plan to clarify these points in the text of the revision.

      (5) Why were all motors linked to the neck-coil domain of kinesin-1? Couldn't it be that for normal function, the different coils matter? Autoinhibition can also be circumvented by consistently shortening the constructs.

      We chose this dimerization approach to focus on how the mechanochemical properties of kinesins vary between the three dominant transport families. We agree that in cells, autoinhibition of both kinesins and dynein likely plays a role in regulating bidirectional transport, as will the activity of other regulatory proteins. The native coiled-coils may act as ‘shock absorbers’ due to their compliance, or they might slow the motor reattachment rate due to the relatively large search volumes created by their long lengths (10s of nm). These are topics for future work. By using the neck-coil domain of kinesin-1 for all three motors, we eliminate any differences in autoinhibition or other regulation between the three kinesin families and focus solely on differences in the mechanochemistry of their motor domains.

      (6) I am worried about the neutravidin on the microtubules, which may act as roadblocks (e.g. DOI: 10.1039/b803585g), slip termination sites (maybe without the neutravidin, the rescue rate would be much lower?), and potentially also DNA-interaction sites? At 8 nM neutravidin and the given level of biotinylation, what density of neutravidin do the authors expect on their microtubules? Can the authors rule out that the observed stall events are predominantly the result of a kinesin motor being stopped after a short slippage event at a neutravidin molecule?

      We will address these points in our revision.

      (7) Also, the unloaded runs should be performed on the same microtubules as in the DNA experiments, i.e., with neutravidin. Otherwise, I do not see how the values can be compared.

      We will address this point in our revision.

      (8) If, as stated, "a portion of kinesin-3 unloaded run durations were limited by the length of the microtubules, meaning the unloaded duration is a lower limit." corrections (such as Kaplan-Meier) should be applied, DOI: 10.1016/j.bpj.2017.09.024.

      (9) Shouldn't Kaplan-Meier also be applied to the ramp durations ... as a ramp may also artificially end upon stall? Also, doesn't the comparison between ramp and stall duration have a problem, as each stall is preceded by a ramp ...and the (maximum) ramp times will depend on the speed of the motor? Kinesin-3 is the fastest motor and will reach stall much faster than kinesin-1. Isn't it obvious that the stall durations are longer than the ramp duration (as seen for all three motors in Figure 3)?

      The reviewer rightly notes the many challenges in estimating the motor off-rates during ramps. To estimate ramp off-rates and as an independent approach to calculating the unloaded and stall durations, we developed a Markov model coupled with Bayesian inference methods to estimate a duration parameter (equivalent to the inverse of the off-rate) for the unloaded, ramp, and stall duration distributions. With the ramps, we have left censoring due to the difficulty in detecting the start of the ramps in the fluctuating baseline, and we have right censoring due to reaching stall (with different censoring of the ramp duration for the three motors due to their different speeds). The Markov model assumes a constant detachment probability and history independence, and thus is robust even in the face of left and right censoring (details in the Supplementary section). This approach is preferred over Kaplan-Meier because, although these non-parametric methods make no assumptions for the distribution, they require the user to know exactly where the start time is.

      Regarding the potential underestimate of the kinesin-3 unloaded run duration due to finite microtubule lengths, we make two points. The first is that the unloaded duration data in Fig. 2C are quite linear up to 6 s and are well fit by the single-exponential fit (the points above 6 s don’t affect the fit very much). The second is that when we used our Markov model (which is robust against right censoring) to estimate the unloaded and stall durations, the results agreed with the single-exponential fits very well (Table S2). For instance, the single-exponential fit for the kinesin-3 unloaded duration was 2.74 s (2.33 – 3.17 s 95% CI) and the estimate from the Markov model was 2.76 s (2.28 – 3.34 s 95% CI). Thus, we chose not to make any corrections due to finite microtubule lengths.
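
      To illustrate why a constant-off-rate model tolerates right censoring, here is a minimal sketch of the standard maximum-likelihood estimator for an exponential duration parameter with censored observations. This is a textbook estimator shown for illustration; it is not the Markov model with Bayesian inference described above, and the function name and example data are hypothetical.

```python
def exponential_duration_mle(durations_s, detached):
    """Estimate the duration parameter (1 / off-rate) with right censoring.

    durations_s: observed time for each event, in seconds.
    detached:    True if the event ended in a detachment, False if the
                 observation was cut short (e.g. end of microtubule, stall).
    Censored events still add their observed time to the total, but they
    do not count as detachments.
    """
    if len(durations_s) != len(detached):
        raise ValueError("inputs must have the same length")
    n_detachments = sum(1 for ended in detached if ended)
    if n_detachments == 0:
        raise ValueError("at least one observed detachment is required")
    return sum(durations_s) / n_detachments


# Hypothetical data: the second run was cut short before detachment.
print(exponential_duration_mle([1.0, 2.0, 3.0], [True, False, True]))
```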

      (10) It is not clear what is seen in Figure S6A: It looks like only single motors (green, w/o a DNA molecule) are walking ... Note: the influence of the attached DNA onto the stepping duration of a motor may depend on the DNA conformation (stretched and near to the microtubule (with neutravidin!) in the tethered case and spherically coiled in the untethered case).

      In the Figure S6A kymograph, the green traces are GFP-labeled kinesin-1 without DNA attached (which are in excess) and the red diagonal trace is a motor with DNA attached. There are also two faint horizontal red traces, which are labeled DNA diffusing by (smearing over a large area during a single frame). Panel S6B shows run durations of motors with DNA attached. We agree that the DNA conformation will differ if it is attached and stretched (more linear) versus simply being transported (random coil), but by its nature this control experiment is only addressing random coil DNA.

      (11) Along this line: While the run time of kinesin-1 with DNA (1.4 s) is significantly shorter than the stall time (3.0 s), it is still larger than the unloaded run time (1.0 s). What do the authors think is the origin of this increase?

      Our interpretation of the unloaded kinesin-DNA result is that the much slower diffusion constant of the DNA relative to the motor alone enables motors to transiently detach and rebind before the DNA cargo has diffused away, thus extending the run duration. In contrast, such detachment events for motors alone normally result in the motor diffusing away from the microtubule, terminating the run. This argument has been used to reconcile the longer single-motor run lengths in the gliding assay versus the bead assay (Block et al., 1990). Notably, this slower diffusion constant should not play a role in the DNA tensiometer geometry because if the motor transiently detaches, then it will be pulled backward by the elastic forces of the DNA and detected as a slip or detachment event. We will address this point in the revision.

      (12) "The simplest prediction is that against the low loads experienced during ramps, the detachment rate should match the unloaded detachment rate." I disagree. I would already expect a slight increase.

      Agreed. We will change this text to: “The prediction for a slip bond is that against the low loads experienced during ramps, the detachment rate should be equal to or faster than the unloaded detachment rate.”

      (13) Isn't the model over-defined by fitting the values for the load-dependence of the strong-to-weak transition and fitting the load dependence into the transition to the slip state?

      Yes, it is essentially overdefined, but that is by design and the model is still very useful. Our goal here was to make as simple a model as possible that could account for the data and use it to compare model parameters for the different motor families. Ignoring the complexity of the slip and detached states, a model with a strong and weak state in the stepping cycle and a single transition out of the stepping cycle is the simplest formulation possible. And having rate constants (k<sub>S-W</sub> and k<sub>slip</sub> in our case) that vary exponentially with load makes thermodynamic sense for modeling mechanochemistry (Howard, 2001). Thus, we were pleasantly surprised that this bare-bones model could recapitulate the unloaded and stall durations for all three motors (Fig. 5C-E).
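
      A rate constant that varies exponentially with load is commonly written in a Bell-type form, k(F) = k0 · exp(F · δ / kT). The sketch below assumes that form; the distance parameter `delta_nm`, the unloaded rate `k0_per_s`, and all numerical values are hypothetical and are not fitted parameters from the manuscript.

```python
import math


def load_dependent_rate(k0_per_s, force_pN, delta_nm, kT_pN_nm=4.11):
    """Bell-type rate: the unloaded rate scaled by exp(F * delta / kT).

    A positive delta_nm makes the transition faster under load; a negative
    delta_nm makes it slower under load.
    """
    return k0_per_s * math.exp(force_pN * delta_nm / kT_pN_nm)


# Illustrative values only.
for force_pN in (0.0, 3.0, 6.0):
    faster = load_dependent_rate(1.0, force_pN, delta_nm=1.0)
    slower = load_dependent_rate(1.0, force_pN, delta_nm=-1.0)
    print(f"F = {force_pN:.0f} pN: {faster:.2f} /s vs {slower:.2f} /s")
```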

      (14) "When kinesin-1 was tethered to a glass coverslip via a DNA linker and hydrodynamic forces were imposed on an associated microtubule, kinesin-1 dissociation rates were relatively insensitive to loads up to ~3 pN, inconsistent with slip-bond characteristics (37)." This statement appears not to be true. In reference 37, very similar to the geometry reported here, the microtubules were fixed on the surface, and the stepping of single kinesin motors attached to large beads (to which defined forces were applied by hydrodynamics) via long DNA linkers was studied. In fact, quite a number of statements made in the present manuscript have been made already in ref. 37 (see in particular sections 2.6 and 2.7), and the authors may consider putting their results better into this context in the Introduction and Discussion. It is also noteworthy to discuss that the (admittedly limited) data in ref. 37 does not indicate a "catch-bond" behavior but rather an insensitivity to force over a defined range of forces.

      The reviewer misquoted our sentence. The actual wording of the sentence was: “When kinesin-1 was connected to micron-scale beads through a DNA linker and hydrodynamic forces parallel to the microtubule imposed, dissociation rates were relatively insensitive to loads up to ~3 pN, inconsistent with slip-bond characteristics (Urbanska et al., 2021).” The sentence the reviewer quoted was in a previous version that is available on BioRxiv and perhaps they were reading that version. Nonetheless, in the revision we will note in the Discussion that this behavior was indicative of an ideal bond (not a catch-bond), and we will also add a sentence in the Introduction highlighting this work.

      Reviewer #3 (Public review):

      The authors attribute the differences in the behaviour of kinesins when pulling against a DNA tether compared to an optical trap to the differences in the perpendicular forces. However, the compliance is also much different in these two experiments. The optical trap acts like a ~ linear spring with stiffness ~ 0.05 pN/nm. The dsDNA tether is an entropic spring, with negligible stiffness at low extensions and very high compliance once the tether is extended to its contour length (Fig. 1B). The effect of the compliance on the results should be addressed in the manuscript.

      This is an interesting point. To address it, we calculated the predicted stiffness of the dsDNA by taking the slope of the theoretical force-extension curve in Fig. 1B. Below 650 nm extension, the stiffness is <0.001 pN/nm; it reaches 0.01 pN/nm at 855 nm, and at 960 nm where the force is 6 pN the stiffness is roughly 0.2 pN/nm. That value is higher than the quoted 0.05 pN/nm trap stiffness, but for reference, at this stiffness, an 8 nm step leads to a 1.6 pN jump in force, which is reasonable. Importantly, the stiffness of kinesin motors has been estimated to be in the range of 0.3 pN/nm (Coppin et al., 1996; Coppin et al., 1997). Granted, this stiffness is also nonlinear, but what this means is that even at stall, our dsDNA tether has a similar predicted compliance to the motor that is pulling on it. We will address this point in our revision.
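
      The stiffness calculation described above can be sketched by differentiating the Marko-Siggia worm-like-chain formula with respect to extension. The 50 nm persistence length is taken from the earlier response; the 1020 nm contour length and the 4.11 pN·nm thermal energy are assumptions chosen for illustration, and the function name is hypothetical.

```python
def wlc_stiffness(extension_nm, contour_nm, persistence_nm=50.0, kT_pN_nm=4.11):
    """Slope dF/dx (pN/nm) of the Marko-Siggia worm-like-chain curve."""
    z = extension_nm / contour_nm  # fractional extension
    if not 0.0 <= z < 1.0:
        raise ValueError("extension must lie in [0, contour length)")
    prefactor = kT_pN_nm / (persistence_nm * contour_nm)
    return prefactor * (0.5 / (1.0 - z) ** 3 + 1.0)


CONTOUR_NM = 1020.0  # assumed contour length, for illustration only
STEP_NM = 8.0        # kinesin step size

for x_nm in (650.0, 855.0, 960.0):
    k = wlc_stiffness(x_nm, CONTOUR_NM)
    print(f"{x_nm:.0f} nm: {k:.4f} pN/nm, {k * STEP_NM:.2f} pN per 8 nm step")
```

      With these assumed parameters the sketch gives stiffness values close to those quoted in the response at 650, 855, and 960 nm.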

      Compared to an optical trapping assay, the motors are also tethered closer to the microtubule in this geometry. In an optical trap assay, the bead could rotate when the kinesin is not bound. The authors should discuss how this tethering is expected to affect the kinesin reattachment and slipping. While likely outside the scope of this study, it would be interesting to compare the static tether used here with a dynamic tether like MAP7 or the CAP-GLY domain of p150glued.

      Please see our response to Reviewer #2 Major Comment #4 above, which asks this same question in the context of intracellular cargo. We plan to address this in our revision. Regarding a dynamic tether, we agree that’s interesting – there are kinesins that have a second, non-canonical binding site that achieves this tethering (ncd and Cin8); p150glued likely does this naturally for dynein-dynactin-activator complexes; and we speculated in a review some years ago (Hancock, 2014) that during bidirectional transport kinesin and dynein may act as dynamic tethers for one another when not engaged, enhancing the activity of the opposing motor.

      In the single-molecule extension traces (Figure 1F-H; S3), the kinesin-2 traces often show jumps in position at the beginning of runs (e.g., the four runs from ~4-13 s in Fig. 1G). These jumps are not apparent in the kinesin-1 and -3 traces. What is the explanation? Is kinesin-2 binding accelerated by resisting loads more strongly than kinesin-1 and -3?

      Due to the compliance of the dsDNA, the 95% limits for the initial attachment position are +/- 290 nm (Fig. S2). Thus, some apparent ‘jumps’ from the detached state are expected. We will take a closer look at why there are jumps for kinesin-2 that aren’t apparent for kinesin-1 or -3.

      When comparing the durations of unloaded and stall events (Fig. 2), there is a potential for bias in the measurement, where very long unloaded runs cannot be observed due to the limited length of the microtubule (Thompson, Hoeprich, and Berger, 2013), while the duration of tethered runs is only limited by photobleaching. Was the possible censoring of the results addressed in the analysis?

      Yes. Please see response to Reviewer #2 points (8) and (9) above.

      The mathematical model is helpful in interpreting the data. To assess how the "slip" state contributes to the association kinetics, it would be helpful to compare the proposed model with a similar model with no slip state. Could the slips be explained by fast reattachments from the detached state?

      In the model, the slip state and the detached states are conceptually similar; they only differ in the sequence (slip to detached) and the transition rates into and out of them. The simple answer is: yes, the slips could be explained by fast reattachments from the detached state. In that case, the slip state and recovery could be called a “detached state with fast reattachment kinetics”. However, the key data for defining the kinetics of the slip and detached states is the distribution of Recovery times shown in Fig. 4D-F, which required a triple exponential to account for all of the data. If we simplified the model by eliminating the slip state and incorporating fast reattachment from a single detached state, then the distribution of Recovery times would be a single-exponential with a time constant equivalent to t<sub>1</sub>, which would be a poor fit to the experimental distributions in Fig. 4D-F.

      We appreciate the efforts and helpful suggestions of all three reviewers and the Editor.

      References:

      Block, S.M., L.S. Goldstein, and B.J. Schnapp. 1990. Bead movement by single kinesin molecules studied with optical tweezers. Nature. 348:348-352.

      Bouchiat, C., M.D. Wang, J. Allemand, T. Strick, S.M. Block, and V. Croquette. 1999. Estimating the persistence length of a worm-like chain molecule from force-extension measurements. Biophys J. 76:409-413.

      Coppin, C.M., J.T. Finer, J.A. Spudich, and R.D. Vale. 1996. Detection of sub-8-nm movements of kinesin by high-resolution optical-trap microscopy. Proc Natl Acad Sci U S A. 93:1913-1917.

      Coppin, C.M., D.W. Pierce, L. Hsu, and R.D. Vale. 1997. The load dependence of kinesin's mechanical cycle. Proc Natl Acad Sci U S A. 94:8539-8544.

      Ezber, Y., V. Belyy, S. Can, and A. Yildiz. 2020. Dynein Harnesses Active Fluctuations of Microtubules for Faster Movement. Nat Phys. 16:312-316.

      Hancock, W.O. 2014. Bidirectional cargo transport: moving beyond tug of war. Nat Rev Mol Cell Biol. 15:615-628.

      Howard, J. 2001. Mechanics of Motor Proteins and the Cytoskeleton. Sinauer Associates, Inc., Sunderland, MA. 367 pp.

      Kunwar, A., S.K. Tripathy, J. Xu, M.K. Mattson, P. Anand, R. Sigua, M. Vershinin, R.J. McKenney, C.C. Yu, A. Mogilner, and S.P. Gross. 2011. Mechanical stochastic tug-of-war models cannot explain bidirectional lipid-droplet transport. Proc Natl Acad Sci U S A. 108:18960-18965.

      Kuo, Y.W., M. Mahamdeh, Y. Tuna, and J. Howard. 2022. The force required to remove tubulin from the microtubule lattice by pulling on its alpha-tubulin C-terminal tail. Nature communications. 13:3651.

      Laakso, J.M., J.H. Lewis, H. Shuman, and E.M. Ostap. 2008. Myosin I can act as a molecular force sensor. Science. 321:133-136.

      Leidel, C., R.A. Longoria, F.M. Gutierrez, and G.T. Shubeita. 2012. Measuring molecular motor forces in vivo: implications for tug-of-war models of bidirectional transport. Biophys J. 103:492-500.

      Marko, J.F., and E.D. Siggia. 1995. Stretching DNA. Macromolecules. 28:8759-8770.

      Nicholas, M.P., F. Berger, L. Rao, S. Brenner, C. Cho, and A. Gennerich. 2015. Cytoplasmic dynein regulates its attachment to microtubules via nucleotide state-switched mechanosensing at multiple AAA domains. Proc Natl Acad Sci U S A. 112:6371-6376.

      Purcell, E.M. 1977. Life at low Reynolds Number. Amer J. Phys. 45:3-11.

      Pyrpassopoulos, S., H. Shuman, and E.M. Ostap. 2020. Modulation of Kinesin's Load-Bearing Capacity by Force Geometry and the Microtubule Track. Biophys J. 118:243-253.

      Rai, A.K., A. Rai, A.J. Ramaiya, R. Jha, and R. Mallik. 2013. Molecular adaptations allow dynein to generate large collective forces inside cells. Cell. 152:172-182.

      Ramaiya, A., B. Roy, M. Bugiel, and E. Schaffer. 2017. Kinesin rotates unidirectionally and generates torque while walking on microtubules. Proc Natl Acad Sci U S A. 114:10894-10899.

      Rao, L., F. Berger, M.P. Nicholas, and A. Gennerich. 2019. Molecular mechanism of cytoplasmic dynein tension sensing. Nature communications. 10:3332.

      Smith, S.B., L. Finzi, and C. Bustamante. 1992. Direct mechanical measurements of the elasticity of single DNA molecules by using magnetic beads. Science. 258:1122-1126.

      Sudhakar, S., M.K. Abdosamadi, T.J. Jachowski, M. Bugiel, A. Jannasch, and E. Schaffer. 2021. Germanium nanospheres for ultraresolution picotensiometry of kinesin motors. Science. 371.

      Toleikis, A., N.J. Carter, and R.A. Cross. 2020. Backstepping Mechanism of Kinesin-1. Biophys J. 119:1984-1994.

      Urbanska, M., A. Ludecke, W.J. Walter, A.M. van Oijen, K.E. Duderstadt, and S. Diez. 2021. Highly-Parallel Microfluidics-Based Force Spectroscopy on Single Cytoskeletal Motors. Small. 17:e2007388.

      Wang, M.D., H. Yin, R. Landick, J. Gelles, and S.M. Block. 1997. Stretching DNA with optical tweezers. Biophys J. 72:1335-1346.

    1. Author Response:

      eLife Assessment

      The nematode C. elegans is an ideal model in which to achieve the ambitious goal of a genome-wide atlas of protein expression and localization. In this paper, the authors explore the utility of a new and efficient method for labeling proteins with fluorescent tags, evaluating its potential to be the basis for a larger, genome-wide effort that is likely to be very useful for the community. While the evidence for the method itself is solid, carrying out this project at a large scale will require significant additional feasibility studies.

      We appreciate the editor’s recognition that the evidence for our method is solid and that a genome-wide protein atlas in C. elegans would be highly valuable to the community. However, we respectfully disagree that significant additional feasibility studies are required. As a comparison, the yeast proteome-wide GFP tagging project (Huh et al., Nature 2003) achieved ~75% coverage of ~6,000 proteins directly from an established protocol without any prior significant feasibility studies, at least to our knowledge. While the C. elegans genome is 3 times the size, we would argue that our tagging protocol may even be less labor intensive as it does not involve any cloning and the screening is visual, requiring no molecular biology skills. Reviewer 3 notes: “They also provide convincing evidence that labelling the whole proteome is an achievable goal with relatively limited resources and time.”

      Our pilot study validates all key parameters for genome-wide scaling: editing efficiency at novel loci with untested reagents, viability of tagged worms, and detectability of multiple spectrally separated fluorophores across expression ranges. These address the core technical, biological, and practical challenges of large-scale endogenous tagging in a multicellular organism, leaving no fundamental barriers in our view.

      The proposed cost and timeline align quite favorably with established large-scale consortium projects: e.g., ENCODE pilot analyzed 1% of the human genome at ~$55 million over 4 years; Mouse Knockout Consortium scaled to ~20,000 genes over 20 years (ongoing) with ~$100 million; Human Protein Atlas mapped ~87% of proteins with antibodies in fixed cells (through much more labor intensive methods) over 20+ years at >$100 million. With ~8% of C. elegans genes already tagged (WormTagDB), scaling our protocol to the proteome is feasible, potentially covering the genome in 5-6 years by a single lab or faster with distributed effort at a reagent cost of merely $2.2 million. The main barriers now are funding commitment and assembling collaborators, not further feasibility testing.
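
      The scaling argument above can be made concrete with a back-of-envelope sketch. It is an editorial illustration, not the authors' calculation. Only the ~8% already-tagged fraction, the pooling factor of 3, the 5-6 year horizon, and the $2.2 million reagent figure come from the response. The gene count (20,000), the number of working days per year (250), and the assumption that the reagent figure covers only the untagged genes are placeholders chosen for illustration.

```python
import math


def scaling_estimate(total_genes, fraction_tagged, genes_per_injection_set,
                     reagent_budget_usd, years, working_days_per_year):
    """Back-of-envelope numbers for a pooled tagging effort (illustrative only)."""
    # Genes that still need a tag.
    remaining = round(total_genes * (1.0 - fraction_tagged))
    # Injection sets needed when several genes are pooled per injection mix.
    pooled_sets = math.ceil(remaining / genes_per_injection_set)
    return {
        "remaining_genes": remaining,
        "pooled_injection_sets": pooled_sets,
        # Without pooling, every gene needs its own injection set.
        "single_gene_injection_sets": remaining,
        "sets_per_working_day": pooled_sets / (years * working_days_per_year),
        "reagent_cost_per_gene_usd": reagent_budget_usd / remaining,
    }


estimate = scaling_estimate(
    total_genes=20_000,            # assumption: approximate protein-coding gene count
    fraction_tagged=0.08,          # "~8%" from the response
    genes_per_injection_set=3,     # triple pooling, from the response
    reagent_budget_usd=2_200_000,  # "$2.2 million" from the response
    years=5.5,                     # midpoint of "5-6 years"
    working_days_per_year=250,     # assumption
)
for key, value in estimate.items():
    print(f"{key}: {value:,.2f}")
```

      Under these assumptions, pooling cuts the number of injection sets to roughly a third of the single-gene count, which is the throughput gain the response relies on. Changing any placeholder changes the per-day and per-gene figures proportionally.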

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Eroglu and Hobert demonstrate that injecting CRISPR guides and repair constructs to target three genes at a time, tagging each with a different fluorescent protein, and selecting which gene to tag with which fluorophore based on genes' expression levels, can improve the efficiency of gene tagging.

      Strengths:

      This manuscript demonstrates that three genes can be targeted efficiently with three different fluorophores. It also presents some practical considerations, like using the fluorophore least complicated by agar/worm autofluorescence for genes with low expression levels, and cost calculations if the same methods were used on all genes.

      Weaknesses:

      Eroglu has demonstrated in a previous publication that single-stranded DNA injection can increase the efficiency of CRISPR in C. elegans while inserting two fluorescent proteins and a co-CRISPR marker into three loci. The current work is, therefore, an incremental advance. In general, I applaud the authors' willingness to think ahead to how whole proteome tagging might be accomplished, but I predict that the advance here will be one of many small advances that will get the field to that goal.

      Our manuscript indeed builds on prior multiplex editing (including our own co-CRISPR work), but the manuscript's primary contribution is not a novel technical breakthrough per se. Instead, our main goal was to pilot and strategize a feasible path to whole-proteome tagging in C. elegans and importantly test the following key parameters: (1) success rate of triple pools with previously untested reagents at novel targets; (2) utility of fluorophores across expression levels; (3) major effects on tagged protein function. In prior multiplexing, we used two targets which we already knew could be edited quite efficiently, with the 3rd target a point mutation with nearly 100% efficiency. Thus, it was not at all clear that picking 3 random genes and replacing the 3rd highly efficient locus with another less efficient large insertion would work or be sufficiently scalable for thousands of novel genes with unvalidated reagents at first pass.

      The title vastly oversells the advance in my view, and the first sentence of the Discussion seems a more apt summary of the key advance here.

      Some injections target genes on the same chromosome together, which will create unnecessary issues when doing necessary backcrossing, especially if the mutation rate is increased by CRISPR.

      We disagree with the reviewer’s assessment of the need for backcrossing, for two reasons: (1) Prior studies have shown that off-target mutations are not a serious concern in C. elegans (reviewed in PMID: 26336798 and PMID: 24685391). For instance, WGS of strains after CRISPR/Cas9 found negligible off-target effects (PMID: 25249454, PMID: 30420468 – using similar RNP/ssDNA method and multiple guides; PMID: 23979577, PMID: 27650892 using other methods). Targeted sequencing studies have reported similar findings, using various CRISPR/Cas9 methods, with essentially no mutations at sites other than the intended target (PMID: 23995389; PMID: 23817069). (2) If the goal is to tag the entire genome, the introduction of backcrossing should not reasonably be a routine part of the initial tagging.

      Lastly, if one wants to backcross at a later stage, the existence of tags on the same chromosome is actually an advantage because it permits selection for recombinants with wild-type chromosomes.

      Also, the need for backcrossing and perhaps sequencing made me wonder if injecting 3 together really is helpful vs targeting each gene separately, since only 5 worms need to be injected.

      Apart from our disagreement regarding backcrossing, we are puzzled by the reviewer’s suggestion that injecting three targets together may be no more helpful than targeting each gene separately. Why would one tag a single gene at a time rather than three, if the whole point of the paper is to demonstrate the scalability of tagging, that is, that joint tagging shortens the task of tagging all genes by a factor of 3? It is important to keep in mind that the rate-limiting step for tagging the whole genome is the number of injections that can be done per day. Since there is no cloning to generate the repair templates/guides and all other reagents are commercially available and not sample specific, these can be prepared quite rapidly. Being able to isolate multiple lines (together or independently) from the same injection increases throughput 3-fold and in our view does not provide any disadvantages as individual tags can be isolated independently if desired.

      Beyond the numerous technical advantages pooling provides (also lower cost and throughput for making injection mixes as well as imaging), our results show that it yields epistemic benefits as well: we would never have noted the subcellular pattern in Fig. 6B, C with different sets of mitochondria being marked by different mitochondrial proteins had we imaged them separately or even aligned to a pan-mitochondrial landmark. As we mentioned in the discussion, grouping proteins predicted to localize to the same compartment together can simultaneously test how uniform or differentiated such compartments are during the screen.

      The limited utility of current blue fluorescent proteins makes me wonder if it's worth using at all at this stage, before there are better blue (or far red) fluorescent proteins.

      We do not think that the utility of current BFPs is very limiting. The theoretical brightness of mTagBFP2 is comparable to that of EGFP (PMID: 30886412), which was useful for the bulk of currently tagged proteins. Due to modestly higher autofluorescence in the blue spectrum, the practical brightness is somewhat less ideal, but we have shown that many proteins are expressed highly enough to be detected quite well with mTagBFP2 by eye at low magnification. We also note that many tags that are not visible by eye under a dissection scope become visible with long exposure cameras of widefield microscopes or modern confocal (GaAsP) detectors, so the list of genes detectable with mTagBFP2 is likely to be much longer. We routinely use mTagBFP2 to super-resolve subnuclear structures with endogenous tags (e.g., in the nucleolus), with some tags having lower annotated FPKMs than the genes tested here.

      Some literature reviews, particularly in the Introduction and Abstract, rely too much on recent examples from the authors' laboratory instead of presenting the state of the field. I'd like to have known more thoroughly what exactly has been done with simultaneous injection targeting multiple loci, comparing what various laboratories have accomplished to date.

      We are not sure what the reviewer is referring to when bemoaning that the Abstract and Introduction are too focused on our paper and not presenting the state of the field. In the Abstract, we do not refer to any literature. In the Introduction, we cite 28 papers, 6 of those from our lab (4 of which provide examples of protein tags). We do not believe that this can be fairly called an unbalanced presentation of the state of the field.

      This being said, we will gladly expand our Introduction to provide more background on co-CRISPRing. Labs have routinely used co-conversion (“coCRISPR”) markers for picking out their intended edits (e.g., point mutations or insertions), as it has been shown by multiple groups that a CRISPR/Cas9 edit at one locus correlates with efficiency at other simultaneous targets (PMID: 25161212). Generally, making point mutations with the Cas9/RNP protocol is highly efficient, especially at specific loci such as dpy-10. However, multiple FP-sized insertions have not been routinely attempted. We and only one other group have successfully attempted it using previously working targets and reagents (e.g., 28% in PMID: 26187122). Importantly, the efficiency of such multiple insertions has never been assessed at scale and using entirely untested reagents at novel sites – critical parameters to determine for a whole genome approach. So, we test here (1) the efficiency of triple insertions and (2) the chance of getting them with new and untested guides and reagents.

      In our view, since we have to use some injection/coCRISPR marker anyway for those genes which are not expressed at dissecting-scope visible levels (likely most genes), using highly expressed intended targets as improvised markers in a pooled approach makes our approach much more efficient. It allows us to find the worms with the highest chance of yielding CRISPR insertions, which we can screen with higher power methods for the dimmer targets, while enabling us to co-isolate other intended targets. Insertions, being often heterozygous in F1, can be segregated independently if desired, or homozygosed together to facilitate maintenance then outcrossed individually by those interested in studying specific genes in more detail.

      In the revised version of this manuscript, we will discuss some of these points in the first paragraph of the results section:

      “In C. elegans, screening for novel CRISPR/Cas9-induced genomic edits is facilitated either by use of co-injection markers (i.e., plasmids that form extrachromosomal arrays) that yield phenotypes or fluorescence in progeny of successfully injected worms, or co-editing well characterized loci using established and highly efficient reagents which likewise yield visible phenotypes. In the latter approach, termed “co-CRISPR”, worms edited at the marker locus are most likely to also carry the intended edit (Arribere et al., 2014).”

      “These attempts pooled reagents previously established to work efficiently and targeted genes that were known to yield functional fusion proteins when tagged. Thus, while in principle current methods could allow tagging of at least 3 independent loci in one injection if a co-CRISPR marker is omitted, it is not known to what extent such an approach could be generalized across the genome with previously unvalidated reagents (i.e., guides and repair template homology arms) at novel loci.”

      Reviewer #2 (Public review):

      The manuscript by Eroglu and Hobert presents a set of strains each harboring up to three fluorescently tagged endogenous proteins. While there is technically nothing wrong with the method and the images are beautiful, we struggled to appreciate the advance of this work - who is this paper for?

      We consider this paper to have two purposes: (1) motivate the community to come together to consider such a genome-wide tagging approach; (2) provide a reference point for funding agencies that such an aim is not unreasonable and will provide novel interesting insights.

      As a technical method, the advance is minimal since the first author had already demonstrated that three mutations (fluorophore insertion and co-CRISPR marker) could be introduced simultaneously.

      We agree that the basic principle is similar. However, it was not clear that triple pooling three novel large edits would work, given the numbers in our original paper or that it would be scalable.

      The dpy-10 coCRISPR marker previously used is a highly efficient single site, with close to 100% hit rate. We also knew in the earlier study that the two pooled insertions already worked quite efficiently and did not disrupt the function of targeted proteins. Exchanging these plus dpy-10 for three novel tags was not guaranteed to succeed for many potential reasons, both biological and technical. For instance, such a “marker free” approach necessitates that a significant number of targets in the genome should be expressed highly enough to be visible by fluorescence stereomicroscopy when tagged with the current best fluorophores. The chance of disrupting gene function by tagging was also not explored in detail in C. elegans, nor whether one untested guide is generally sufficient. We think that establishing these parameters was meaningful and necessary for the goal of whole genome tagging. We have clarified some of these points in the text.

      As a pilot for creating genome-scale resources, it is not clear whether three different fluorophores in one animal, while elegantly designed and implemented, will be desired by the broader community.

      The usage of three different fluorophores is largely driven by the ability to co-inject and therefore cut injection effort by a factor of three. Moreover, having all three fluorophores together facilitates imaging and maintenance. Lastly, co-labeling has the potential to reveal unexpected patterns of co-localization or lack thereof (example: two mitochondrial proteins that we found to not have overlapping distribution). We clarified this point in the revised text in both the results and discussion.

      Finally, the interpretation of the patterns observed in the created lines is somewhat lacking. A Table with all the observations must be included. This can replace the descriptions of the observations with the different lines, which could be somewhat laborious for the reader, and are often wrong. There are numerous mistaken expectations of protein expression here, but two examples include:

      We are not convinced that expectations are mistaken. Below we respond to the reviewer’s specific examples and we are open to hear from the reviewer about additional cases.

      (1) The expectation that ACDH-10 is enriched in the intestine and epidermal tissues (hypodermis).

      There are multiple paralogs of this protein (see WormPaths or WormFlux) that may share functions in different tissues. There is also no reason to assume that fatty acid metabolism does not occur in other tissues (including the germline). Finally, there are no published studies about this enzyme, so we really don't know for sure what it's doing.

      The expression of acdh-10 is annotated in multiple scRNA datasets as intestine and epidermal enriched (Packer et al 2019, highest intestine and hyp; Ghaddar et al 2023 intestine, sheath and BWM, and even oocyte). We did not mean to imply that fatty acid metabolism does not occur in the gonad, nor that a paralog of acdh-10 could not be performing the same function in tissues where acdh-10 is not expressed.

      However, this raises an important question: why have different paralogs doing the same thing? Duplicate genes with the same function are generally not evolutionarily stable (PMID: 11073452, PMID: 24659815). That there are such striking tissue specific expression patterns of an essential or widely expressed protein class suggests that paralogs of the gene likely differ in some meaningful parameter that might align with tissue-specific functional needs or regulation. The reviewer’s statement that “there are no published studies about this enzyme, so we really don't know for sure what it's doing” is in fact an excellent demonstration of our point; finding out where the duplicates are expressed can provide a starting point to uncover potential differences between the paralogs. At the very least it can delineate to what degree paralogs diverge in their expression across the proteome and identify which such cases merit further study. In a more ideal scenario, prior information of protein function could indicate that the involved pathway requires tissue specific regulation.

      (2) The expectation that HXK-1 is ubiquitously expressed.

      Three paralogous enzymes are all associated with the same reaction, and we have shown that these three function redundantly in vivo, perhaps in different tissues (PMID: 40011787).

      The cited paper (PMID: 40011787) does not show where they are expressed. We discussed redundancy/paralogs above in point 1, and in our view the same applies here. They may perform the same reaction but are likely to differ in some meaningful way, be it regulation or rate of activity, for them to be stably maintained as functional genes over evolution.

      Moreover, single-cell RNA-seq data (PMID: 38816550) also show enrichment of hxk-1 in gonadal sheath cells.

      We note that the Ghaddar et al. and CeNGEN/Taylor et al. datasets do not. The scRNA paper cited by the referee (PMID: 38816550) also shows enrichment in neurons and pharynx, which we did not note. In our view, these in fact further support our goals: often, transcript datasets alone (frequently used to infer tissue function) do not sufficiently predict protein expression. One can post hoc find an scRNA-seq dataset that aligns somewhat with our protein observations, but how does one know which to trust a priori? Disagreements between transcript datasets will ultimately require resolution at the protein level, in our view.

      To clarify these points, we will add the following to the discussion section:

      “We also noted unexpected cell type dependent distributions of proteins involved in broadly important metabolic processes such as ACDH-10, which was depleted from the germline compared to other tissues, and HXK-1, which was highly enriched in the gonadal sheath. Notably, for these as well as other cases, scRNA-seq datasets were not sufficient to deduce a priori the observed cell type specific differences at the protein level. Importantly, many genes encoding metabolic enzymes including acdh-10 and hxk-1 have paralogs that likely perform similar catalytic functions. Yet, duplicate genes with identical functions are generally not evolutionarily stable (Adler et al., 2014; Lynch and Conery, 2000); thus such genes are likely to differ in some meaningful parameter (e.g., regulation or activity) that might align with tissue-specific functional needs. Fully annotating the expression patterns of paralogs at the protein level could indicate which tissues require unique metabolic needs and indicate which paralogous genes have undergone sub- versus neo-functionalization. For those proteins that are less functionally understood, unexpected distributions might indicate which merit further study.”

      The table should have at least the following information: gene/protein name - Wormbase ID - TPM levels of single cell data assigned to tissues for L2, L4, and adult (all published) - tissues in which expression is observed in the lines presented by the authors.

      We will add this information to the table including annotated expression levels in young adults from various datasets (but not larval datasets as we did not image these). We note that each of these studies uses a different pipeline and reports different metrics (scaled TPM/Z-score versus Seurat average expression versus TPM), so comparisons between them are not informative unless they are integrated and analyzed together.

      Reviewer #3 (Public review):

      Summary:

      The authors argue that establishing the expression pattern and subcellular localisation of an animal's proteome will highlight many hypotheses for further study. To make this point and show feasibility, they developed a pipeline to knock in DNA encoding fluorescent tags into C. elegans genes.

      Strengths:

      The authors effectively make the points above. For example, they provide evidence of two populations of mitochondria in the C. elegans germline that differ qualitatively in the proteins they express. They also provide convincing evidence that labelling the whole proteome is an achievable goal with relatively limited resources and time.

      We are grateful for the referee’s appreciation that whole proteome tagging is feasible.

      Weaknesses:

      Cell biology in C. elegans is challenging because of the small size of many of its cells, notably neurons. This can make establishing the sub-cellular localisation of a fluorescently tagged protein, or co-localizing it with another protein, tricky. The authors point out in their introduction that advances in light microscopy, such as diSPIM, STED, and ISM (a close relative of SIM), have increased the resolution of light microscopy. They also point out that recent advances in expansion microscopy can similarly help overcome the resolution limit.

      (1) Have the authors investigated if the three fluorescent tags they use are appropriate for super-resolution microscopy of C. elegans, e.g., STED or SIM? Would Electra be better than mTagBFP2? How does mScarlet3-S2 compare to mScarlet3?

      All three tags work for ISM (i.e., Airyscan). We previously tried Electra (not for the genes tested here) but could not isolate positive tags. Given Electra is not that much brighter on paper than mTagBFP2 we did not pursue it further, though we recognize that these may simply have been unlucky injections. mScarlet3-S2 is quite a bit dimmer than mScarlet3 on paper – the advantage is that it has higher photostability. In our view, the limiting factor will be having FPs that are bright enough to screen, image and scale to the whole genome, so brightness will likely provide an advantage over photostability at this stage.

      (2) Have the authors investigated what tags could be used in expansion microscopy - that is, which retain antigenicity or even fluorescence after the protocol is applied? It may be useful to add different epitope tags to the knock-in cassettes for this purpose.

      mSG and mSc3 retain fluorescence after fixing with formaldehyde. We have not tested mTagBFP2 fluorescence in fixed worms. We agree that adding different epitope tags would be useful.

      The paper is fine as it stands. The experiments above could add value to it and future-proof it, but are not essential. If the experiments are not attempted, the authors could refer to the points above in the discussion.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      We thank the reviewers for their close reading of the manuscript and detailed comments.

      Reviewer #1

      1. The idea that Xrp1 induction switches around 16 h post-IR, becomes RpS12-dependent, and subsequently engages cell competition is interesting and potentially important. However, the evidence supporting RpS12-dependence of Xrp1 induction is currently not sufficiently convincing. For example, based on the images in Figure 6F-supplement 1, the conclusion that Xrp1 is induced in an RpS12-dependent manner appears difficult to support. The authors should strengthen and quantify this result or provide the raw image data. In addition, because this point is central to the authors' model, they should move the key supporting data from the supplementary figures to the main figures to ensure that this critical claim is clearly supported and readily accessible to readers.

      We apologize for confusing all three reviewers with this figure. Figure 6F-supplement 1 does not in fact compare RpS12-dependent and -independent Xrp1-HA expression. Instead, it shows that the RpS12-independent Xrp1-HA expression is only mildly p53-dependent, which is consistent with our idea. We had not examined the RpS12-dependence of Xrp1 expression in this manuscript because we had published that previously and found a substantial dependency (Fig 1N-P of Ji et al 2021). Because that previous paper used an anti-Xrp1 antibody, and the present paper measures an HA-tagged Xrp1 protein, it is probably a good idea to include the RpS12-dependence of late Xrp1 expression again, using the Xrp1-HA reagent. We have these data, which show ~75% dependence, a statistically highly significant effect. We will include these data in the revised manuscript, within one of the main figures.

      • The authors suggest a model in which Xrp1 executes two qualitatively distinct "modes" (pro-repair/acute DDR and elimination of aneuploid cells), but this remains only partially convincing as currently presented. The authors should at least (i) provide quantitative evidence that could explain how Xrp1 might produce distinct outcomes across phases (e.g., comparing Xrp1-HA levels and/or the fraction of Xrp1-HA-positive cells at 2-4 h versus 16-24 h post-IR), and (ii) explicitly discuss plausible mechanisms in the Discussion. Even if the molecular "switch" is not fully resolved experimentally, a clearer, data-grounded discussion of how Xrp1 could mediate these temporally distinct functions is needed. In addition, since ISR signaling (e.g., eIF2α phosphorylation) has been implicated as a single feature associated with Xrp1-dependent loser elimination, the authors should consider assessing p-eIF2α levels in Xrp1-HA positive cells at early versus late time points after IR (e.g., 4 h vs 24 h).

      We thank the reviewer for highlighting the need for this discussion. We will clarify these issues in the revised manuscript but do not think further experiments are necessary.

      1. It was well established previously and confirmed here that little DNA damage remains ~24h after IR. This is sufficient to explain why there is little DDR at this stage. We will make this clear in the revision.
      2. We did not intend to claim that no cell competition happens during the acute DDR ~4h after IR. We are not aware of experiments showing the DDR is strictly cell autonomous and not influenced by neighboring cells. If the acute DDR is indeed cell autonomous, or mostly so, this could be due to the additional genes induced directly by p53 that are not induced by Xrp1 ~24h after IR. The cell death gene Rpr is one example reported in our paper. We will discuss this in the revision.
      3. The reference to ISR as the single feature inducing Xrp1 expression is referring to two Nature Cell Biology papers published in 2021 (Baumgartner et al 2021; Recasens-Alvarez et al 2021). This idea has not stood the test of time. The ISR reporter activities shown in these papers were later shown to be downstream of Xrp1, not upstream (Langton et al 2021; Kiparaki et al 2022). Langton et al argued that there could be an initial ISR that was too small to be detectable, but this is hypothetical. There are now multiple papers and preprints showing that it is the long isoforms of Xrp1 that are ISR responsive, but that short isoforms of Xrp1 initiate cell competition, and that RpS12-dependent alternative splicing produces the short isoform. The short Xrp1 isoforms lack the uORF that responds to ISR (eLife 2021 Oct 4:10:e74047; bioRxiv 06.15.659587; bioRxiv 2025.10.29.685279). This is not consistent with the idea that the ISR initiates cell competition. Because we and others have shown that it is Xrp1 activity that induces eIF2α phosphorylation (Ochi et al 2021, Langton et al 2021, Kiparaki et al 2022), eIF2α phosphorylation in Xrp1 expressing cells would not prove a role for ISR and we do not propose to make these measurements. We are undecided whether to include this discussion of the ISR in the paper. It would lengthen the paper and we do not think it is directly relevant.
      • The idea that aneuploid cells, or cells with altered ribosomal gene dosage, could be removed via Xrp1-mediated cell competition is intriguing. However, the manuscript does not currently provide any evidence that such cells are, in fact, being eliminated. The authors should therefore (i) quantify cell-level overlap metrics, such as the fraction of γH2Av-positive cells that are Xrp1-HA-positive (and vice versa), as well as the fraction of γH2Av-positive cells that are cleaved Dcp-1-positive (and vice versa) at 24 h post-IR. These quantitative analyses would clarify whether the late Xrp1-HA-positive population corresponds to persistently damaged cells and whether it is enriched for cells undergoing apoptosis/clearance. The authors should also (ii) directly assess aneuploidy/segmental copy-number imbalance in the late Xrp1-HA-positive clusters (e.g., by DNA FISH targeting one or two chromosome arms/regions), and if these experiments cannot be completed within a reasonable revision timeframe, the authors should temper their wording and present aneuploidy and selective elimination as a plausible interpretation supported by RpS12 dependency and prior literature, rather than as a demonstrated conclusion in the current study.

      We agree that aneuploidy is not demonstrated in the current study. Elimination of aneuploid cells with altered Rp gene dose was already established by previous papers. We cited previous work in the manuscript but did not summarize the evidence explicitly, so we are not sure whether the referee was fully aware. Ji et al (2021) created 17 different segmental aneuploidies using Flp/FRT recombination including or abutting 10 different Rp genes, together covering >20% of the euploid genome. The results showed that segmental aneuploidies are largely removed by Rp gene dose-dependent cell competition using the RpS12 and Xrp1 genes. Others have since confirmed that aneuploidies are removed by cell competition and that the effects of Rp gene dose depend on Xrp1 (Fusari et al Cell Genomics 2025). Therefore, we consider it established that aneuploid cells with altered Rp gene dosage are removed by this mechanism. We will discuss this explicitly in the revised manuscript.

      The question of whether cells dying in a p53-independent manner ~24h after irradiation are aneuploid cells undergoing cell competition was also addressed previously. Ji et al 2021 already showed that most of these cells are eliminated by RpS12 and Xrp1, consistent with altered Rp gene dosage, and that preventing cell competition leads to persistence into adulthood of cells that can be recognized as Rp+/- from their bristle phenotype. Evidence was shown that most such cells are segmental aneuploids, consistent with earlier studies of DNA repair mutants (Baker, 1978). We will summarize this in the revised manuscript so that it is not necessary to read the cited references to appreciate the evidence. The only new observation being made in this paper about the ~24h cell death stage is that loss of p53 increases the number of these cells, which could be because inadequate DNA repair leads to more aneuploid cells.

      It is important to appreciate that we do not claim that cells labeled by the DNA damage marker γH2Av are aneuploid, or being removed by cell competition. On the contrary, γH2Av labels cells with unrepaired DNA damage, whereas segmental aneuploidy can only occur as a consequence of completed DNA repair. Thus γH2Av-labeled cells are not generally expected to be Xrp1 positive or undergoing cell competition. Some may be, if they are cells that have both unrepaired DNA damage and repaired DNA damage that led to aneuploidy. We cannot quantify overlap in the existing data, since mouse antibodies for γH2Av and HA-tag were used in separate experiments. Repeating the experiments with different antibodies to measure the overlap would not address any outstanding questions.

      We doubt FISH would be effective at measuring aneuploidy because only gene dose corresponding to the probes would be detected. Only small portions of the genome could be assessed at a time so the frequency at which aneuploidy could be detected would be low. We will make it clear in the revised manuscript that cell competition of aneuploid cells is not a new claim of this paper but something that has been studied before.

      • Regarding the statistical analysis, revisions are warranted. In multiple panels, Student's t-tests are repeatedly performed against the same control, which inflates the family-wise error rate and increases the risk of false-positive findings. In such cases, an overall ANOVA (one-way) followed by an appropriate multiple-comparison procedure, such as Dunnett's test, would be more appropriate.

      This concern applies in particular to:

      Figure 1A- Supplement 1

      Figure 2M-R

      Figure 3Q, R

      Figure 5D

      Figure 5J- Supplement 1

      Figure 6G- Supplement 1

      Figure 6I- Supplement 2

      We agree and will apply ANOVA with multiple-comparison procedures in the revised manuscript.
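
      The reviewer's concern about repeated t-tests against one control can be illustrated with a small calculation. The sketch below is only a rough illustration: it assumes the comparisons are independent (comparisons sharing a control are in fact correlated, so the true inflation is somewhat smaller), and the function names are hypothetical, not taken from the manuscript or from any statistics package.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across n_tests
    independent tests, each run at significance level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise rate at or below alpha."""
    return alpha / n_tests


# Show how the family-wise rate grows with the number of comparisons.
for n_tests in (1, 3, 5, 10):
    rate = familywise_error_rate(0.05, n_tests)
    print(f"{n_tests} comparisons at alpha=0.05: family-wise rate ~ {rate:.3f}")
```

      Procedures such as Dunnett's test are designed for the many-versus-one-control case and account for the correlation between comparisons, which the simple Bonferroni threshold above does not.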

      Minor comments:

      1. Figure 2E is not cited in the text, and it is difficult to tell from the images as presented whether p53DN overexpression suppresses the Gstd-lacZ signal at 4 h post-IR.

      We will replace Fig 2E with a clearer example, and add a quantification of all our data, with statistics, as a supplemental figure. Note that the conclusion is already substantiated by qRT-PCR data (Figure 2M).

      In Figure 4, rpr150-lacZ does not appear to be upregulated by Xrp1 overexpression. Therefore, the authors should revise the figure title to avoid misleading readers, because rpr, a well-known p53-responsive pro-apoptotic gene, is not induced under this condition.

      We will change the Figure title. Failure to induce rpr150-LacZ here is a control to show that Xrp1 overexpression does not induce p53 activity.

      In Figure 6E, based on the data as presented, it is difficult to determine whether cleaved Dcp-1 (cDCP1)-positive cell counts are reduced upon Xrp1 knockdown. The authors should provide clearer representative images and/or include the underlying raw images as supplementary source data to support the conclusion.

      We will replace Fig 6E with a clearer example, and add a quantification of all the data.

      The authors should (i) show raw data points overlaid on summary plots (e.g., dot plots on top of bar graphs/box plots) to convey data distribution and (ii) include higher-magnification insets and/or quantitative localization/overlap analyses where colocalization is central to the interpretation (e.g., Xrp1-HA relative to γH2Av).

      We agree regarding the data display. As discussed later, colocalization is not relevant to the interpretation.

      Reviewer #2

      1. First, authors present evidence that Xrp1 is induced in wing discs exposed to ionizing radiation (IR, known to cause DSBs) and that this induction relies on p53 regulating Xrp1 transcription (Figure 1 and S1). Data are clear but there is a puzzling result. Xrp1-lacZ (a reporter of Xrp1 transcription) is induced by IR but independently of p53. These results need attention as they appear to be contradictory (why Xrp1-mRNA but not Xrp1-lacZ relies on p53). Nicely, authors show that Xrp1-lacZ induction relies on Xrp1/Irbp18 autoregulatory feedback. Is the lacZ insertion somehow interfering with the capacity of p53 to bind and regulate Xrp1 expression?

      We agree that it is a puzzling result. We have also noted elsewhere that Xrp1-LacZ does not always reflect Xrp1 mRNA and protein expression (Kumar and Baker 2022). We can add the reviewer's hypothesis to the manuscript, although it does not explain why Xrp1-LacZ is induced by IR.

      • Second, authors use a collection of reporter genes and show that Xrp1 regulates most, but not all, Dp53 target genes. It is really unclear whether the reaper-lacZ used in Figure 3L-P recapitulates the induction of reaper by p53. I know this reporter was claimed by others to do so, but NOT in the wing disc. I would then remove it as mRNA data are clear.

      rpr150-lacZ was used as a p53 reporter in wing imaginal discs by Wells et al. 2011 (PMC3296280). We will cite this in the revised manuscript. We prefer not to remove it as we also use this reporter for the experiment shown in Fig 4.

      3. Third, authors show that Xrp1, as expected from the previous data in Figure 2 and 3, also mediated the role of Dp53 in inducing cell death, although only partially, and these differences are attributed to the gene reaper (p53 but not Xrp1 target). Dcp1 should be cDcp1 and clones should be magnified in Fig 5E-G.

      We will follow this advice in the revised manuscript.

      • First, the impact of Xrp1 on the levels of DNA damage and cell death after 24h of IR are shown in a p53 mutant background (6E1-6E3). Authors should present the data in a clean +/+ background. Quantification of 6F should also be done in the same background.

      This data was presented in the p53 mutant background to focus on the p53-independent removal of cells by cell competition. We can perform an experiment in the presence of wild type p53 for completeness if desired, but a mixture of DDR and cell competition effects may result.

      Second, hid-GFP is being induced by IR already at 4 h after IR and this induction relies on p53 and Xrp1 activities as shown in previous figures. Thus, the data presented in 6G-J could be a trivial consequence of the strong perdurance of the GFP protein.

      hid-GFP is not expressed at 4 hours in p53DN and Xrp1 K/D (Fig 3D,E), so the expression in 6G-J cannot be explained by GFP perdurance from the earlier timepoint.

      Third, the role of cell competition (driven by Minute aneuploids) is not demonstrated and relies simply on the potential role of Xrp1 in the late wave of cell death, proposal that has not been demonstrated in this paper either. Indeed, the no-role of RpS12 in the late induction (24 h wave) of Xrp1 (Figure 6 S1-F) reinforces my doubts. Authors should reflect in the introduction and discussion sections the most recent literature in the field.

      The role of Xrp1 in the late wave of p53-independent cell death is shown in Fig 6D-F. As discussed above (reviewer 1 point 1), Fig 6S1-F shows the limited role of p53 in rpS12-independent Xrp1 induction, not the role of RpS12. We will add a figure to the revised manuscript showing the strong RpS12 dependence of the late induction of Xrp1-HA and explain this more clearly. We did not include this in the first manuscript version because we had already published this result, albeit with an anti-Xrp1 antibody (Ji et al Fig 1 N-P). As also discussed above (reviewer 1 point 3), we agree that the role of cell competition in removing aneuploid cells is not demonstrated in the present manuscript, but we considered that this had been demonstrated previously (Ji et al 2021), and parts of that study were recently confirmed by others (Fusari 2025 Cell Genomics), so it is not necessary to add further experimental support here, although it will be useful to explain the published literature more fully.

      Reviewer #3

      1. Figure 2E. Based on the text, I think the authors are claiming that the expression of GStD-LacZ is reduced in the posterior compartment of panel 2E compared to 2D. This is unconvincing. If at all, the expression along the DV boundary in the posterior compartment is stronger in E than in D. Am I missing something?

      We will replace Fig 2E with a clearer example, and add a quantification of all our data, with statistics, as a supplemental figure. Note that the conclusion is already substantiated by qRT-PCR data (Figure 2M).

      Figure 3I - K. The expression in the posterior compartment is supposed to be reduced compared to the anterior compartment. Once again, these differences are not easily apparent to me. Perhaps these images need to be quantified to illustrate the supposed difference.

      We are sorry that the reviewer found the images unconvincing. We will replace these figures with other examples, and add quantifications of all data, with statistics, as a supplemental figure. Note that the conclusions are already substantiated by qRT-PCR data (Figure 3R).


      Line 286. The heading "Xrp1 is sufficient for the expression of p53-dependent DDR genes" is misleading. As stated in the final sentence of paragraph 2 of this section, the authors show that Xrp1 functions downstream of p53 and is sufficient for expressing a subset of p53-dependent DDR genes.

      We apologize for misleading the reviewer. We will change the heading to "Xrp1 is sufficient for the expression of many p53-dependent DDR genes", which is the meaning we intended.

      Figure 5, panels F and G could be made much easier for the reader to follow. The labels in these two panels are very difficult to see and understand. It might be better to show some high magnification regions (e.g. insets) that show the differences in the prevalence of cell death in regions with different genotypes. Also, why is Xrp1 +/- not quantified in panel H since the authors claim that cell death is reduced even in the heterozygous cells?

      It is a good idea to add enlarged figures, and we will do so. We can quantify the Xrp1+/- genotype as well.

      Line 363 and Figure 6D, E. The authors argue that the increase in H2Av in the posterior compartment implies that cells with damaged DNA are not being eliminated when Xrp1 function is reduced. An alternative explanation is that the p53 mutation together with the Xrp1 knockdown impairs the DDR even more, resulting in increased H2Av staining. I don't know how the authors' data can exclude this possibility.

      We agree with the reviewer and did not intend to exclude this possibility. We will rewrite this text to make both explanations clear.

      Line 365. Is the resolution of the "double labeling" sufficient to conclude that some of the H2Av cells upregulate Xrp1-HA? A more conservative interpretation would be that in these regions that have increased H2Av, that there is more expression of Xrp1-HA.

      We apologize for a mistake in the submitted manuscript. In fact the anti-H2Av and anti-HA primary antibodies used were both raised in mouse, and Fig 6G,H show distinct wing discs, not double labels. We will replace line 365 with the sentence suggested by the reviewer.

      Figure 6 - supplement 1. The expression of Xrp1-HA is reduced in the p53DN cells when they are a loss mutant for rps12. Although statistically significant, this reduction is modest. If this induction were due to a cell competition like phenomenon, would you not expect the induction to be completely abolished since rpS12 mutations abolish cell competition completely? Please explain.

      We apologize for confusing all three reviewers with Figure 6F supplement 1. This figure does not compare RpS12-dependent and -independent Xrp1-HA expression. Instead, it shows that the rps12-independent Xrp1-HA expression is only mildly p53-dependent, which is consistent with our conclusions. We will add a figure to the revised manuscript showing the strong RpS12 dependence of the late induction of Xrp1-HA and explain this more clearly. We did not include this in the initial manuscript version because we had already published this result, albeit with an anti-Xrp1 antibody (Ji et al Fig 1 N-P).

    1. Since mm. 35–39 hold onto the dominant harmony from the end of TR, what we find is a blurred entry into S-space. As a result, commentators have differed about where the secondary theme begins. This problem can occur when S-themes start on or over the dominant, following an HC:MC in the key of S. Sonata Theory regards such an opening as one type of S0 (S-zero) or S1.0 theme: a new melodic idea, usually with a clear initiating function, but a theme that, at its opening, “retains the MC’s active dominant, which continues to ring through the succeeding music as momentarily fixed or immobile . . . [rather like] a prolongation of the caesura-dominant itself” (EST, 142–43). Emerging out of the low-register darkness and directed forward by the now diatonically inflected wobble in the viola, D3-C♮3, the cello opens the exposition’s part 2 in m. 35 with S0. It begins with a triadic climb on the sustained dominant, D2-F♯2-A2 (5̂-7̂-2̂), mm. 35–36, releasing the preceding G minor into G major with the B♮ upper-neighbor at the end of m. 35. At the same time, it reanimates the cello’s dotted-eighth-and-three-sixteenths rhythm from mm. 31–32 (traceable back to the P1.3 melody in mm. 13–17), the task of whose pulsations is always to flow into the succeeding bar: it will recur throughout much of S. Recalling Adorno’s suggestion that this movement may be heard “as the [unfolding] history of the opening fifth,” we may be invited to hear a relationship between the D-F♯-A opening of S0 and the blunt fifth-leap of P0. As we shall observe, other aspects of the subsequent S-theme also suggest back-references to P, continuing the sense of this music as enacting a process of ramification and becoming. As so often in Beethoven, it is possible to hear S as an imaginative recasting of several of P’s characteristic features: the principle, once again, of contrasting derivation. 
If one wishes to underscore this point, it is possible, with due cautionary nuances, to suggest that a new subrotation begins at m. 35. But to claim, with Adorno, that our task must be to show the “mediated identity” of P and S (my italics) is an ideologically grounded step too far (1998, 13). The cello’s D2-F♯2-A2 is answered three octaves higher and in retrograde by the first violin, A5-F♯5-D5, mm. 36–37. Continuing the process of S-emergence in the manner of a question or proposal, the cello climbs higher on the rungs of the V7/III chord, F♯2-A2-C3, mm. 37–38. The first violin responds with a reply that floats upward into the highest available register, sweeping the fog away into a patch of momentarily confident serenity, gliding along with the now-rolling meter. Triggered by the I6 chord in m. 39 (reckoning now in G major), the seraphic mm. 39–40, with fluttering inner voices, sound a complete cadential progression and produce a seemingly trouble-free III:IAC on the second beat of m. 40. Mm. 35–40 can be grouped as a compressed, six-bar sentential phrase. Even while they prolong a V7 harmony, mm. 35–36 and 37–38 suggest the onset of a rhetorical presentation (2+2, αα′). In this case, Beethoven omits the usual continuation idea (β) and proceeds immediately to the S1.2 cadential unit (γ). Let’s call the presentation, mm. 35–38, S1.1 (S0==>S1.1) and attach the designator S1.2 to the cadence, mm. 39–40. Grasping the import of this six-bar phrase, mm. 35–40, is critical to understanding all that follows in the exposition. Recall the menacing E-minor threat from P, remembering also that no E-minor PAC had been sounded in that zone: that chilling seal of negativity had been pushed aside, repressed in m. 19. The point now, in S, is to secure a major-mode III:PAC with the hope of resolving it into a I:PAC in the parallel spot of the recapitulation, whereby the mechanics of the sonata process would overturn the initial E minor into E major. 
While by no means providing terminal closure, sounding the serene, G-major IAC in m. 40 is the first step of this attempt. It could be understood, for instance, as a six-bar antecedent, naïvely hoping for a consequent. But no consequent follows it. Instead, m. 41 backs up to sound a variant of m. 39, a phrase-extension seeking to replicate the III:IAC with the melody now in the second violin. Near the cadential moment, m. 42, the predicted cadence falls apart on an f♯o7 chord (viio7, with the cello also shifting momentarily into a higher register), slipping onto V65 at the end of the bar. Nonetheless, gliding along on the metrical rails, the sense of local serenity spins onward in mm. 43–45, S1.3, piano and dolce. These bars constitute another, similar cadential unit, I-ii6-V(7)-I, producing a second III:IAC at the downbeat of m. 45, again with B5 in the topmost voice. As before, the IAC is not allowed to settle, but is immediately subjected to a variant of S1.3', mm. 45–46 (= mm. 43–44). This time the potential IAC-effect in m. 47 is softened through melodic diminution, and instead the tonic chord on m. 47 starts the gentle push of yet another cadential progression, mm. 47–48, this time clearly headed for a desired III:PAC downbeat and the hoped-for structural closure in 49. More than that, the V65/V in the second half of m. 47 and, above all, the melodic descent in the first violin in m. 48 (6̂-1̂-3̂-2̂) recall and transpose m. 18 from P—the E-minor cadential moment whose seemingly inevitable i:PAC had been subverted. And similarly, Beethoven subverts the predicted G-major cadence in m. 49 with an unexpected forte, f#o42—enharmonically the same diminished seventh that had thwarted the E-minor cadence in m. 19. By now it has become clear that sounding that III:PAC (EEC) is not going to be an easy task. For all of its dolce serenity up to this point, S is now running the risk of being reduced to a string of failed cadential modules. 
The diminished-seventh bluster of mm. 49–50, S1.4, not only blocks the expected III:PAC but also assumes the role of a two-bar anacrusis: a new, energetic windup gathering up strength to throw off a hopefully more secure approach to the anticipated structural cadence. Once again, the procedure in play—backing up to restate or refashion an earlier, unsuccessful cadential module—is the familiar “one-more-time technique” (Schmalfeldt 1992). Its first release, with the viola now in the upper voice, is in mm. 51–52, an S1.3 variant now falling, with the viola’s 6̂-5̂-4̂-3̂-2̂-(1̂) descent, toward a promised III:PAC. But again the cadence is blocked by an even more emphatic intervention of the S1.4 anacrusis-windup, mm. 53–54, expanding outward in an aggressively strenuous wedge. This opens onto a climactic cadential in m. 55, with registral extremes in the outer voices. At this point the S zone’s “one-more-time” strategy changes. With the F♮6 in the first violin, m. 55, we abandon the quest for a straightforward cadential module. The three bars of mm. 55–57—at first a near-gravityless hovering, then a dolce, rapid plunging down to earth—close the wide-open wedge and signal a preparation for something new. They land on the downbeat of m. 58, where something different starts to generate. Call it S1.5: a more decisive buildup, begun in a hushed, secretive pianissimo: reculer pour mieux sauter. If the soaring mm. 55–57 had struck us as a metrical expansion, unpinning our entrainment with the previously smooth-flowing meter, the chromatic mm. 58–64 give us a different sense of metrical compression or disruption. The off-kilter rhythms and tied eighth notes set the notated meter into conflict with what soon locks into an implicit displaced from the barline by a half-beat: a metrically offset hemiola. While anticipated in m. 58, this becomes clearly apparent by m. 
59, where the “misaligned ” implications are more securely established with the second eighth note of the bar. Their metrical-clash tuggings, which Kerman characterized as “nervous . . . twitchy syncopation” (1966, 126), are unmistakable in the buildup occupying mm. 60–64. Reinforcing the edgy tension of mm. 58–64 are the chromatic bass-line windings around the ever-strengthening dominant (notice the potent augmented-sixth approach to the in mm. 62–63) and the inexorable homophonic crescendo. By m. 64 the now-supercharged V7 is sounded forte, with ringing double-stops in the upper three parts. The import of all this could not be clearer: the drawing-back of the tensest possible bowstring in preparation for a potent downbeat-release. The arrow is shot forth with the sforzando tonic chord in m. 65, elided with and setting off a new, decisive thematic module. Notice also how Beethoven enhances m. 65’s shooting-forth through a foreshortening of the last of the metrically displaced “” implications by an eighth note. Thus the ensemble’s final bow-stroke in m. 64, marked staccato, becomes the trigger-moment that snaps the off-kilter syncopations back into realignment with the notated barlines, restoring our entrainment with meter. We now confront the most analytically challenging moment of the exposition, one that will shape any larger interpretive reading that we have of the movement. M. 65 is certainly a point of strong tonic arrival: G major rings out with celebratory flourishes, and it is emphatically prepared by a preceding V7. But does it qualify as a structural cadence? For Sonata Theory the question matters, since one of its central concerns is to attend to the manner of attaining, or not attaining, the generically mandated, non-tonic PAC near the end of any exposition: the completion of the essential expositional trajectory with the cadential production of the EEC. For all of the sense of euphoric arrival at m. 
65, the notational evidence on behalf of an unassailably secured structural cadence is not complete, leaving open the possibility for two different understandings of this moment. In such cases Sonata Theory’s maxim is to explicate the ambiguities rather than to insist upon only one right way to understand the situation. Why might one hesitate before endorsing m. 65 as a structural cadence? What I’ll call Reading 1 draws attention to its cadential complications. Here at the downbeat of m. 65 we first notice that the topmost voice is on 5̂, D6, setting off an arpeggio cascade down to another 5̂, D4. From that perspective m. 65 might be heard as a III:IAC, not a III:PAC, and that accented high D6 continues to ring through mm. 65–68 as if sustained or frozen in that register. Moreover, at m. 65 Beethoven silences the second violin for two blank bars: its valenced leading-tone in m. 64, F♯5, is kept from its predicted resolution onto G5. Why? (As we shall see, in the parallel passage in the recapitulation this does not happen.) To be sure, the sforzando kickoff to the new thematic idea is forcefully accented, but the m. 65 reduction from the preceding double-stop thickness to a three-part texture is at least worthy of our notice. We might also observe that in m. 65 the downbeat G2 in the cello is of the briefest possible duration, and the vigorous G2-D2 alternation in the cello keeps the D2 dominant of mm. 63–64 in play through m. 68, albeit on metrically weak offbeats. This means that the thematic bolt shot forth in mm. 65–68 is registrally framed by a quasi-sustained D6 on the top and D2 on the bottom: the theme is encased within 5̂ above and 5̂ below. To what degree does all this undercut, or at least attenuate, the impression of a structural cadence? Or, in extreme versions of Reading 1, is it conceivable to hear m. 65 as anything other than a cadence? The alternative would be to hear S1.5, mm. 
58–64, less as a cadential-function module than as a broad anacrusis that lands squarely on the tonic at m. 65 to set free a fresh, resolute thematic idea. (As noted in chapter 4, the music preceding elided PACs or PAC-effects, particularly when the thematic material of the cadential downbeat is vectored determinedly forward, can often take on the additional, preparatory function of an extended anacrusis, released at the point of tonic arrival.) But what would such a reading suggest? M. 65 surely marks an attainment of some sort. But it may be that m. 65’s G major is insisted upon by a dogged force of will, not attained by a problem-free cadence: a hyper-strong downbeat prepared by a metrically conflicted, seven-bar anacrusis in mm. 58–64. “If G major cannot be secured with an unequivocal cadence—if there is no literal PAC—we will at least proclaim G major to be sufficiently attained by fiat. Plant the flag with fortitude even though the territory is not yet fully conquered.” This would mean that m. 65 falls short of being read as an EEC. And yet for all of these complications most listeners would probably find it more intuitive to hear an implicit cadential arrival at m. 65, especially in the immediate secondary-theme context of repeated cadential frustration through the several preceding “one-more-time” blockages, which are generically common toward the ends of secondary-theme zones. Those favoring a (quasi-) cadential understanding of m. 65—call it Reading 2—might suggest that the “PAC” resolution of the preceding V7 is something to be conceptually understood, even though upon examination it is not literally present: the forceful, sforzando elision of the newly released theme blots the implicit PAC out of audibility. Listeners, the argument might go, will hear a PAC-effect at m. 65 even though a check of the notation does not provide the written evidence for one. 
Such a PAC-effect, in turn, could be understood as providing at least a locally credible EEC-effect. Within the flexibilities afforded by Sonata Theory practice, the argument would be that, given the strength of the m. 65 arrival and the manner in which it is prepared, it could be considered a deformational EEC—a contextually practical substitute for it—seeking to ground the G-major tonic by assertion, that is, by means other than the prototypically normative cadence. In sum, Reading 1 (no structural cadence) argues that the generically expected III:PAC is so compromised at m. 65 that we should not conclude that the EEC has been satisfactorily accomplished. Reading 2 (implicit cadence-effect) allows for a sufficient EEC-effect via a cadentially attenuated but practicable stand-in for the EEC. Is it obligatory to choose either the one way or the other? Or might it be, in the reading that I prefer, that Beethoven has purposely composed these ambiguities into mm. 58–65 in order to unsettle our confidence in what, now mulling over the matter two centuries later, Sonata Theory regards as a normatively secured EEC? Perhaps the point is precisely that of its almost-ness, its combination of yes-and-no features, both of which play into the dramatic staging of the movement’s larger {– +} drama of modal reversal or non-reversal. Any such conclusion would have to be a central part of one’s hermeneutic reading of the movement. What then do we make of the theme that begins in m. 65? Should we think of it as a closing theme (post-EEC) or not? It may sound like a characteristic C theme, or a C theme that could have been, but, again, the confidence of its C-status can be called into question through the multiple attenuations of the PAC-effect at m. 65. How to resolve this question? 
As I have also noted in chapter 5’s discussion of the first movement of Haydn’s “Military” Symphony, Sonata Theory refers to such a thing as an SC  theme: “the presence of a theme literally in precedential, S-space that in other respects sounds as though it is more characteristically a closing theme.” This kind of theme seems “to bestride both the S- and C-concepts” (EST, 190–91). While regarding m. 65 as self-evidently precadential is a step too far, my preference is to call this an SC theme, if only to remind myself of the problems surrounding the m. 65 moment. If you are convinced by the EEC-effect at m. 65 and wish to regard the new theme as C, that’s also fine: substitute your C for my SC in what follows. In most cases SC themes will lead to a clearer production of an EEC (and C themes will normally confirm the EEC with one or more cadences). That’s not the case here. This SC (or C) theme starts out as a confident sentence, with presentation αα‎′ (mm. 65–66, 67–68), but the sentence is cut short in m. 69a. Its bluff bravado is redirected elsewhere; the theme is cut off at the knees. (The brutality of the truncation is not adequately captured by the benign connotation of the word “retransition,” RT.) Even if we have considered m. 65 to mark a sufficient EEC, that G-major confidence cannot be reaffirmed with closing material. This leaves the exposition cadentially open. Under these circumstances m. 65’s “EEC-effect” is at best left undersecured and uncertain. And with SC’s inadequacy now demonstrated, m. 70a brings back the malevolent E minor with a vengeance. We are thrown back to m. 1 and the repeat of the exposition. In sum, this {– +} exposition (E minor, G major, i-III) has produced at best a tenuous EEC-effect, one that has proved unable to be confirmed—and in fact is lost—in the brief music that follows, producing a non-closed exposition. Given m. 
65’s ambiguity, I suggest that this movement is at least in dialogue with the concept of what Sonata Theory calls a failed exposition, not at all in the sense that Beethoven has composed it poorly but rather in the sense that he has staged a musical drama of cadential ambiguity (an EEC almost but perhaps not quite attained) within an exposition that, by its end, is left open. The expositional tale told here is one in which the major mode (III), while very much present, has proven unable to produce and maintain an unequivocal, major-mode PAC close. In turn this means that the expositional hope of producing an unequivocal I:PAC/ESC in the recapitulation is cast into doubt. On the other hand, we should remember that there have also been no E-minor PACs in the exposition. A bitter struggle is brewing. But before getting to the recapitulation, we have to pass through the trials of the development. Development (mm. 70b–138) Rotation 1 (mm. 70b–107) In both the first and second endings Beethoven suppress

      We enter S-space blurrily, starting over a dominant. Commentators differ on where the secondary theme begins because it opens over the dominant following an HC:MC; Sonata Theory calls such an opening an S0 (or S1.0) theme. The S theme suggests back-references to P, and the book allows that one could, with caution, argue that a new subrotation begins at m. 35. Mm. 35–40 are a first step toward securing a major-mode cadence (III:PAC). The book labels mm. 35–38 as S1.1 (S0==>S1.1) and the cadence in mm. 39–40 as S1.2. The 6/8 meter is disrupted around m. 58, giving the feeling of a displaced 3/4. Mm. 60–64 are characterized (after Kerman) as "nervous . . . twitchy syncopation." M. 65 is a point of strong tonic arrival in G major, but whether it is a structural cadence, and so whether it produces the EEC that completes the essential expositional trajectory, is complicated. Under Reading 1, m. 65 falls short of an EEC because there is no literal PAC; under Reading 2, a listener readily hears a cadence there, and the book allows that it could be considered a deformational EEC. The author suggests the movement is at least in dialogue with the concept of a failed exposition.