    1. eLife Assessment

      This important study investigates how signals from the nervous system can influence the response to different food sources. To demonstrate the role of specific neuronal and intestinal regulators in sensing food quality and modulating digestion, the authors present evidence through a combination of genetic screening, RNA-seq analysis, and functional studies. These findings shed light on an adaptive strategy to integrate food perception with physiological responses, with a mix of solid and convincing evidence supporting the work.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Liu et al. have tried to dissect the neural and molecular mechanisms that C. elegans uses to avoid digesting harmful bacterial food. Liu et al. show that C. elegans uses the ON-OFF state of AWC olfactory neurons to regulate the digestion of the harmful Gram-positive bacterium S. saprophyticus (SS). The authors show that when C. elegans is fed SS, AWC neurons switch to the OFF fate, which prevents the digestion of S. saprophyticus and helps C. elegans avoid these harmful bacteria. Using genetic and transcriptional analysis, as well as previously published findings, Liu et al. implicate the p38 MAPK pathway (in particular NSY-1, the C. elegans homolog of the MAPKKK ASK1) and insulin signaling in this process.

      Strengths:

      The revised manuscript has improved significantly. The authors have addressed almost all the comments that I had in my initial review.

      Weaknesses:

      None.

    3. Reviewer #2 (Public review):

      Summary:

      Using C. elegans as a model, the authors present an interesting story demonstrating a new regulatory connection between olfactory neurons and the digestive system. Mechanistically, they identified key factors (NSY-1, STR-130, etc.) in neurons, as well as critical 'signaling factors' (INS-23, DAF-2) that bridge different cells/tissues to execute the digestive shutdown induced by poor-quality food (Staphylococcus saprophyticus, SS).

      Strengths:

      The conclusions of this manuscript are mostly well supported by the experimental results shown.

      Weaknesses:

      The authors have done a nice job in addressing my comments.

    4. Reviewer #3 (Public review):

      Summary:

      The study explores a molecular mechanism by which C. elegans detects low-quality food through neuron-digestive crosstalk, offering new insights into food quality control systems. Liu and colleagues demonstrated that NSY-1, expressed in AWC neurons, is a key regulator for sensing Staphylococcus saprophyticus (SS), inducing avoidance behavior and shutting down the digestive system via intestinal BCF-1. They further revealed that INS-23, an insulin peptide, interacts with the DAF-2 receptor in the gut to modulate SS digestion. The study uncovers a food quality control system connecting neural and intestinal responses, enabling C. elegans to adapt to environmental challenges.

      Strengths:

      The study employs a genetic screening approach to identify nsy-1 as a critical regulator in detecting food quality and initiating adaptive responses in C. elegans. The use of RNA-seq analysis is particularly noteworthy, as it reveals distinct regulatory pathways involved in food sensing (Figure 4) and digestion of Staphylococcus saprophyticus (Figure 5). The strategic application of both positive and negative data mining enhances the depth of analysis. Importantly, the discovery that C. elegans halts digestion in response to harmful food and employs avoidance behavior highlights a physiological adaptation mechanism.

      Weaknesses:

      Major weaknesses have been addressed.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Liu et al. have tried to dissect the neural and molecular mechanisms that C. elegans uses to avoid digesting harmful bacterial food. Liu et al. show that C. elegans uses the ON-OFF state of AWC olfactory neurons to regulate the digestion of the harmful Gram-positive bacterium S. saprophyticus (SS). The authors show that when C. elegans is fed SS, AWC neurons switch to the OFF fate, which prevents the digestion of S. saprophyticus and helps C. elegans avoid these harmful bacteria. Using genetic and transcriptional analysis, as well as previously published findings, Liu et al. implicate the p38 MAPK pathway (in particular NSY-1, the C. elegans homolog of the MAPKKK ASK1) and insulin signaling in this process.

      Strengths:

      The authors have used multiple approaches to test the hypothesis that they present in this manuscript.

      Weaknesses:

      Overall, I am not convinced that the authors have provided sufficient evidence to support the various components of their hypothesis. While they present data that loosely align with their hypothesis, they fail to consider alternative explanations and do not use rigorous approaches to strengthen their overall hypothesis. The selective picking of genes from the RNA sequencing data and forcing the data to fit the proposed hypothesis based on previously published findings, without exploring other approaches, indicates a lack of thoroughness and rigor. These critical shortcomings significantly diminish enthusiasm for the manuscript in its totality. In my opinion, this is the biggest weakness in this manuscript.

      We appreciate all of the reviewer’s suggestions, which helped us improve this paper. We have addressed the reviewer’s comments in the section “Reviewer #1 (Recommendations for the authors)”.

      Reviewer #2 (Public review):

      Summary:

      Using C. elegans as a model, the authors present an interesting story demonstrating a new regulatory connection between olfactory neurons and the digestive system.

      Mechanistically, they identified key factors (NSY-1, STR-130, etc.) in neurons, as well as critical 'signaling factors' (INS-23, DAF-2) that bridge different cells/tissues to execute the digestive shutdown induced by poor-quality food (Staphylococcus saprophyticus, SS).

      Strengths:

      The conclusions of this manuscript are mostly well supported by the experimental results shown.

      Weaknesses:

      Several issues could be addressed and clarified to strengthen their conclusions.

      (1) The word "olfactory" should be carefully used and checked in this manuscript. Although AWCs are classic olfactory neurons in C. elegans, no data in this manuscript supports the idea that olfactory signals from SS drive the responses in the digestive system. To validate that it is truly olfaction, the authors may want to check the responses of worms (e.g. AWC, digestive shutdown, INS-23 expression) to odors from SS.

      We appreciate the reviewer’s careful attention to terminology. We agree that the term "olfactory" requires direct experimental validation. In this paper, however, we used "olfactory" only to describe the AWC neurons. Following the reviewer’s suggestion, we have now deleted the word “olfactory”.

      (2) In line 113, what does "once the digestive system is activated" mean? The authors need to provide a clearer statement about 'digestive activation' and 'digestive shutdown'.

      Previously, we observed that activating larval digestion with heat-killed E. coli or E. coli cell wall peptidoglycan (PGN) enabled the digestion of SS as food (Hao et al., 2024). Additionally, when animals reached the L2 stage by feeding on a normal OP50 diet, they could utilize SS as a food source to support growth (Figure 1-figure supplement 1D). These findings suggest that once digestion is activated (via E. coli components or L2-stage maturation), worms gain the capacity to process SS as a viable food source, abolishing SS-induced growth impairment (Hao et al., 2024; Figure 1-figure supplement 1D).

      (3) No control data on OP50. This would affect the conclusions generated from Figures 2A, 2B, 2D, 3B, 3C, 3G, 4D-G, 5D-E, 6B-D.

      We appreciate this point. The central goal of the experiments listed (Figures 2A,B,D; 3B,C,G; 4D-G; 5D-E; 6B-D) was not to compare growth or behavior between SS and OP50 under standard conditions, but rather to understand the genetic basis of the C. elegans response specifically to SS, as identified through our nsy-1 mutant screen.

      Our data in Figure 1 clearly establishes the fundamental difference in growth and feeding behavior when larvae encounter SS compared to OP50 (Figures 1A,B). Having established SS as an unfavorable food source that triggers a specific protective response (digestive shutdown), the subsequent experiments focus on deciphering how this response is mediated.

      Therefore, within these specific experimental contexts under SS feeding, the primary comparison is between wild-type (N2) and nsy-1 mutant animals. All assays (growth, behavior, survival) are performed under the same SS feeding conditions for both genotypes.

      This design allows us to directly assess the functional role of NSY-1 in mediating the SS-specific response pathway we are investigating. Including an OP50 control for every figure would not address this core genetic question and could introduce confounding variables given the established difference in how C. elegans treats these two food sources. The critical internal control for these specific experiments is the performance of the wild-type under SS versus the mutant under SS.

      (4) Do the authors know which factors are released from AWC neurons to drive the digestive shutdown?

      Enrichment analysis revealed that genes related to extracellular functions, such as insulin-related genes, are induced in nsy-1 mutant animals (Figure 5—figure supplement 1A, Supplementary file 4). Further analysis of insulin-related genes from the RNA-seq data showed that ins-23 is predominantly induced in nsy-1 mutant animals (Figure 5—figure supplement 1B), suggesting its potential role in promoting SS digestion. We found that knockdown of ins-23 in nsy-1 mutants inhibited SS digestion (Figure 5D). Given that INS-23 is expressed in AWC neurons (Figure 5-figure supplement 3A, CeNGEN), this suggests increased production and likely enhanced release of INS-23 from AWC neurons in the nsy-1 mutant background, which promotes SS digestion.

      The insulin/insulin-like growth factor signaling (IIS) pathway, particularly through the DAF-2 receptor, integrates nutritional signals to regulate various behavioral and physiological responses related to food (Kodama et al., 2006; Ryu et al., 2018). It has been shown that INS-23 acts as an antagonist for the DAF-2 receptor to promote larval diapause (Matsunaga et al., 2018). To test whether ins-23 induction in nsy-1 mutants promotes SS digestion through its receptor, DAF-2, we constructed a nsy-1; daf-2 double mutant. We found that the SS digestion ability of the nsy-1 mutant was inhibited by the daf-2 mutation. This suggests that the nsy-1 mutation induces the insulin peptide ins-23, which promotes SS digestion through its potential receptor, DAF-2.

      The data support a model where AWC neurons regulate digestion via the release of INS-23. Loss of nsy-1 function increases INS-23 release from AWC, activating DAF-2 signaling and promoting digestion. Conversely, in wild-type animals, reduced INS-23 release from AWC contributes to digestive shutdown in response to SS food.

      Reviewer #3 (Public review):

      Summary:

      The study explores a molecular mechanism by which C. elegans detects low-quality food through neuron-digestive crosstalk, offering new insights into food quality control systems. Liu and colleagues demonstrated that NSY-1, expressed in AWC neurons, is a key regulator for sensing Staphylococcus saprophyticus (SS), inducing avoidance behavior and shutting down the digestive system via intestinal BCF-1. They further revealed that INS-23, an insulin peptide, interacts with the DAF-2 receptor in the gut to modulate SS digestion. The study uncovers a food quality control system connecting neural and intestinal responses, enabling C. elegans to adapt to environmental challenges.

      Strengths:

      The study employs a genetic screening approach to identify nsy-1 as a critical regulator in detecting food quality and initiating adaptive responses in C. elegans. The use of RNA-seq analysis is particularly noteworthy, as it reveals distinct regulatory pathways involved in food sensing (Figure 4) and digestion of Staphylococcus saprophyticus (Figure 5). The strategic application of both positive and negative data mining enhances the depth of analysis. Importantly, the discovery that C. elegans halts digestion in response to harmful food and employs avoidance behavior highlights a physiological adaptation mechanism.

      Weaknesses:

      Major points:

      (1) While NSY-1 positively regulates str-130 expression in AWC neurons and is critical for SS avoidance and survival, the authors should examine whether similar phenotypes are observed in str-130 mutants.

      In this study, we mainly focused on how worms sense adverse food sources (SS food) and shut down digestion (using growth as the readout of digestive shutdown). We found that nsy-1 in AWC neurons plays a key role in the response to SS food: when nsy-1 is mutated, the animals cannot detect SS food, so they digest it and grow on it. From the RNA-seq data, we found that nsy-1 positively regulates several sensory perception-related genes (sra-32, str-87, str-112, str-130, str-160, str-230) (Figure 4-figure supplement 1A, Supplementary file 2). After screening these genes, we found that knockdown of str-130 in wild-type animals promoted SS digestion, thereby supporting animal growth (Figure 4D), and that the proportion of animals with two AWC<sup>OFF</sup> neurons decreased (Figure 4E). Conversely, overexpression of str-130 in nsy-1 mutant animals inhibited SS digestion, thereby slowing animal growth (Figure 4F), and the proportion of animals with two AWC<sup>OFF</sup> neurons increased (Figure 4G). These results demonstrate that NSY-1 promotes the AWC<sup>OFF</sup> state by inducing str-130 expression, which in turn inhibits SS digestion in C. elegans.

      (2) NSY-1 promotes the AWC-OFF state through str-130, inhibiting SS digestion. The authors should investigate whether STR-130 in AWC neurons regulates bcf-1 expression levels in the intestine.

      We agree with the reviewer's suggestion regarding the potential role of STR-130 in AWC neurons regulating intestinal bcf-1 expression. To address this, we generated transgenic worms with AWC-specific knockdown of str-130, achieved by rescuing sid-1 cDNA expression under the ceh-36 promoter (AWC-specific) in sid-1(qt9);BCF-1::GFP background worms.

      We observed that AWC neuron-specific RNAi of str-130 elevated intestinal BCF-1::GFP expression (Figure 6—figure supplement 1B). This demonstrates that STR-130 functions cell-non-autonomously in AWC neurons to repress BCF-1 expression in the intestine.

      (3) The current results rely on str-2 expression levels to indicate the AWC state. Ablating AWC neurons and testing the effects on digestion would provide stronger evidence for their role in digestive regulation.

      To confirm the importance of the AWC state in SS digestion, we performed AWC-specific neuron ablation experiments using a previously validated transgenic strain that expresses cleaved caspase under the AWC-specific promoter, ceh-36 (ceh-36p::caspase). Critically, worms with ablated AWC neurons completely failed to digest SS food (Figure 3—figure supplement 4), phenocopying the non-digesting state of wild-type worms on SS when AWC-OFF signaling is impaired. This result directly confirms that functional AWC neurons are essential for initiating SS digestion, aligning with our model where the AWC-OFF state (induced by SS) inhibits digestion while the AWC-ON state promotes it.

      Furthermore, our previous study found that AWC ablation activates the intestinal mitochondrial unfolded protein response and inhibits food digestion, mechanistically linking neuronal integrity to gut stress responses and digestive inhibition.

      Together, these functional ablation studies provide compelling physiological evidence that AWC neurons act as central regulators of food-state sensing and gut function.

      (4) The claim that NSY-1 inhibits INS-23 and that INS-23 interacts with DAF-2 to regulate bcf-1 expression (Line 339-340) requires further validation. Neuron-specific disruption of INS-23 and gut-specific rescue of DAF-2 should be tested.

      We agree with the reviewer that the proposed NSY-1 ⊣ INS-23 → DAF-2 → BCF-1 signaling axis requires tissue-specific validation. To address this, we conducted compartment-specific functional dissection of INS-23 and DAF-2:

      AWC neuronal role of INS-23:

      To test whether INS-23 acts in AWC neurons to regulate intestinal BCF-1, we generated AWC-specific knockdown strains by rescuing sid-1 cDNA expression under the ceh-36 promoter in a sid-1(qt9);BCF-1::GFP background. We found that AWC-restricted ins-23 knockdown significantly reduced intestinal BCF-1::GFP expression (Figure 6—figure supplement 1A). This confirms that INS-23 functions cell-non-autonomously within AWC sensory neurons to activate intestinal BCF-1, consistent with NSY-1’s upstream inhibition of INS-23 in this neuronal subtype.

      Intestinal role of DAF-2 as INS-23 receptor:

      To investigate whether DAF-2 acts as the gut-localized receptor for neuronal INS-23 signaling, we performed tissue-specific rescue experiments in the nsy-1(ag3);daf-2(e1370) double mutant. When DAF-2 was re-introduced specifically in the intestine (using the ges-1 promoter), we observed a significant suppression of SS digestion (Figure 5—figure supplement 3B), but no rescue of the digestive defect. This indicates that INS-23 induction in nsy-1 mutants promotes digestion independently of intestinal DAF-2 function.

      (5) Figure Reference Errors: Lines 296-297 mention Figure 6E, which does not exist in the main text. This appears to refer to Figure 5E, which has not been described.

      We corrected this.

      Reviewer #1 (Recommendations for the authors):

      I would like the authors to address the following comments in a resubmission.

      (1) The hallmark of an activated p38 MAPK pathway is phosphorylation of the most downstream kinase of the cascade, p38 (PMK-1/PMK-2 in C. elegans). Previous work from the Bargmann lab showed that the most downstream kinase of this pathway, PMK-1/PMK-2, is not required for AWC asymmetry. I wonder whether that is also the case for the model that Liu et al have presented in this manuscript. Since p38/PMK-1 undergoes activation (phosphorylation) in response to pathogenic bacteria like P. aeruginosa, it is worth testing whether PMK-1 plays a role downstream of NSY-1 in the model that Liu et al present in this manuscript. It would be worth testing whether there is increased phosphorylation of p38 when C. elegans are fed SS and whether that phosphorylation regulates downstream components that Liu et al have identified in this manuscript.

      We thank the reviewer for raising this important point regarding PMK-1/p38 MAPK signaling. As established in our prior work (Reference 1), SS exposure triggers phosphorylation of PMK-1 (P-PMK-1) in C. elegans, and pmk-1 mutants exhibit enhanced growth on SS (Figure-1, Figure-2). This confirms that PMK-1-mediated innate immune signaling actively regulates SS responsiveness and digestion.

      To address whether PMK-1 functions downstream of NSY-1 within our proposed model, we performed critical epistasis analyses. While we observed that nsy-1 mutation elevates ins-23 (indicating NSY-1 suppression of ins-23), knockdown of pmk-1 did not alter ins-23 expression levels (Figure 5-figure supplement 3C). This demonstrates that PMK-1 does not operate through the ins-23 pathway to regulate SS digestion. Thus, although both pathways respond to SS, the PMK-1-mediated innate immune response and the NSY-1/INS-23 axis constitute distinct regulatory mechanisms governing digestive adaptation.

      Reference 1: Geng, S., Li, Q., Zhou, X., Zheng, J., Liu, H., Zeng, J., Yang, R., Fu, H., Hao, F., Feng, Q., & Qi, B. (2022). Gut commensal E. coli outer membrane proteins activate the host food digestive system through neural-immune communication. Cell host & microbe, 30(10), 1401–1416.e8. https://doi.org/10.1016/j.chom.2022.08.004

      (2) Since p38 MAPK pathway has a well-established role in host defense in the C. elegans intestine, it is important to show that NSY-1 does not function in the intestine in the model that Liu et al present. I would like the authors to reintroduce nsy-1 in C. elegans intestine in nsy-1 mutant animals and then test whether it has any effect on worm length on SS food (similar to what is done in Figure 3 for AWC-specific nsy-1).

      Beyond its established role in AWC neurons, we detected NSY-1 expression in the intestine (Figure 3-figure supplement 2A). To assess intestinal NSY-1 function, we performed tissue-specific rescue experiments in nsy-1 mutants using the intestine-specific vha-1 promoter. Intestinal expression of NSY-1 significantly suppressed the enhanced SS digestion phenotype in nsy-1 mutants (Figure 3-figure supplement 2B), demonstrating functional involvement of gut-localized NSY-1 in regulating digestive responses. We propose intestinal NSY-1 mediates this effect through innate immune signaling, consistent with its known pathway components. As previously established (Reference 1), the canonical PMK-1/p38 MAPK pathway functions downstream of NSY-1, with both sek-1 and pmk-1 knockdown enhancing SS digestion through immune modulation. This suggests that intestinal NSY-1 may suppress digestion through PMK-1-mediated immune responses. Since neuronal NSY-1's role in digestive control was previously undefined, we prioritized mechanistic analysis of its neuronal function in digestion regulation.

      Notably, this immune-mediated mechanism operates independently of NSY-1's neuronal regulation pathway. In AWC neurons, NSY-1 controls digestion exclusively through the neuropeptide signaling axis (INS-23/DAF-2/BCF-1) without engaging innate immune components.

      Reference 1: Geng, S., Li, Q., Zhou, X., Zheng, J., Liu, H., Zeng, J., Yang, R., Fu, H., Hao, F., Feng, Q., & Qi, B. (2022). Gut commensal E. coli outer membrane proteins activate the host food digestive system through neural-immune communication. Cell host & microbe, 30(10), 1401–1416.e8. https://doi.org/10.1016/j.chom.2022.08.004

      (3) At multiple places, wild-type (WT) controls have been labeled as N2. It is better to label all controls as WT (and not as N2).

      Corrected.

      (4) In Figure 2B, the aversion response should be scored at multiple time points, like Figure 1C, rather than at just one timepoint.

      We thank the reviewer for suggesting multi-timepoint analysis of aversion behavior. In accordance with this recommendation, we have now quantified SS avoidance at multiple time points. As shown in the revised Figure 2B, nsy-1 mutants exhibited significantly impaired avoidance responses at both 4h and 6h but not at 8h, confirming that NSY-1 is essential for sustained aversion to SS food in the early response. These data demonstrate the critical role of NSY-1 in food discrimination during the initial sensory response.

      (5) Does the re-introduction of nsy-1 in AWC neurons in nsy-1 mutant background help animals avoid SS in dwelling and food-choice assays? Along the same lines, does the CRISPR-generated AWC-specific mutant of NSY-1 fail to avoid SS in dwelling and food-choice assays similar to the whole-animal mutant? These behavioral data are missing in Figure 3.

      We thank the reviewer for prompting behavioral validation of AWC-specific nsy-1 functions. To determine whether NSY-1 in AWC neurons mediates SS sensory perception, we performed dwelling (avoidance) and food-choice assays using AWC-specific nsy-1 knockout and AWC-rescued strains (nsy-1(ag3); Podr-1::nsy-1). In dwelling assays, AWC-specific nsy-1 KO mutants exhibited significantly impaired SS avoidance at 6h (Figure 3-figure supplement 3A), while AWC-rescued strains restored avoidance capacity at 2-6h (Figure 3-figure supplement 3B). Food-choice assays further revealed that AWC nsy-1 KO mutants preferentially migrated toward SS (Figure 3-figure supplement 3C), whereas AWC-rescued animals showed no preference between SS and HK-E. coli (Figure 3-figure supplement 3D). These data conclusively demonstrate that NSY-1 acts in AWC neurons to mediate SS recognition and aversion behaviors.

      (6) In Figure 3E and F, the number of animals that were used for scoring AWC str-2p::GFP expression should be specified.

      We added the number of animals to the figure.

      (7) RNA-seq analysis identified multiple GPCRs (including STR-130) that are upregulated in an NSY-1-dependent manner when animals are fed with SS bacteria. However, the authors decided to only characterize STR-130 because of previously published findings. It is important to rule out the role of other GPCRs since all are upregulated on SS food as shown in Figure S4 B. I would like the authors to knock down other GPCRs in the same manner as they did for STR-130 and demonstrate that only str-130 knockdown behaves similarly to the nsy-1 mutant (if that is the case) using the assay presented in Figure 4 D.

      We appreciate the reviewer’s suggestion to comprehensively evaluate NSY-1-regulated GPCRs. In response, we extended our functional analysis to all six GPCRs (str-130, str-230, str-87, str-112, str-160, and sra-32) identified as NSY-1-dependent and SS-induced in RNA-seq (Figure 4—figure supplement 1).

      Using RNAi knockdown and the SS growth assay, we observed that RNAi of str-130, str-230, str-87, or str-112 significantly enhanced SS growth (Figure 4—figure supplement 2A), with str-130 RNAi exhibiting the most robust phenotype—phenocopying nsy-1 mutants. Crucially, none of these GPCR knockdowns further enhanced growth in nsy-1(ag3) mutants (Figure 4—figure supplement 2B), confirming their position downstream of NSY-1. These data establish str-130 as the dominant effector of NSY-1-mediated SS response regulation, while suggesting minor contributions from other GPCRs (str-230, str-87, str-112).

      (8) In Figure 4E and G, the number of animals that were used for scoring GFP expression should be specified.

      We added the number of animals to the figure.

      (9) When comparing Figure 3E and Figure 4E, it appears that the loss of str-130 RNAi does not phenocopy nsy-1 mutant. This raises the question of whether the inefficiency of RNAi targeting str-130 is the cause, or if STR-130 is not the only GPCR regulated by NSY-1 on SS food. I would like the authors to address this discrepancy. If RNAi inefficiency is indeed the cause, using an RNAi-sensitive background, such as an eri-1 mutant, could help strengthen the data presented in Figure 4E. Conversely, if RNAi inefficiency is not responsible for the discrepancy, I suggest that the authors investigate the roles of other GPCRs that were identified by RNA sequencing.

      We appreciate the reviewer’s observation regarding the phenotypic difference between nsy-1 mutants and str-130 (RNAi) animals on SS food (Fig. 3E vs Fig. 4E).

      While both genetic perturbations significantly enhance SS growth and increase the proportion of animals exhibiting AWC<sup>ON</sup> states compared to wild type (indicating enhanced digestion), the specific AWC<sup>ON</sup> neuron configurations differ: nsy-1 mutants predominantly show 2 AWC<sup>ON</sup> animals, whereas str-130(RNAi) animals primarily exhibit the 1 AWC<sup>ON</sup>/1 AWC<sup>OFF</sup> configuration (Fig. 3E vs Fig. 4E).

      This difference likely arises because STR-130 is the key GPCR mediating NSY-1's inhibitory effect on SS digestion, but it is not the sole GPCR involved, as evidenced by our RNAi screen identifying several additional NSY-1-regulated GPCRs (str-230, str-87, str-112) whose depletion also enhanced SS growth (Fig. 4A-D).

      The robust SS growth enhancement and AWC<sup>ON</sup> state increase caused by str-130(RNAi) (phenocopying the nsy-1 mutant’s functional outcome of enhanced digestion) (Figure 4D, 4E) indicate effective RNAi knockdown for this specific assay. Therefore, the distinct neural configurations reflect the partial redundancy among GPCRs downstream of NSY-1, rather than an inherent inefficiency of the str-130 RNAi.

      The nsy-1 mutant phenotype represents the complete loss of all inhibitory GPCR signaling coordinated by NSY-1, while str-130(RNAi) represents the loss of its major component. Investigating the roles of other identified GPCRs (str-230, str-87, str-112) in modulating AWC<sup>ON</sup> neuron states is an important direction for future research.

      (10) In Figure 4 F and 4 G, the authors show that the overexpression of STR-130 rescues the nsy-1 mutant phenotype suggesting that NSY-1 might function through STR-130 to control digestion on SS food. These data place STR-130 downstream of NSY-1. To further strengthen these epistasis data, authors should knock down str-130 in nsy-1 mutant animals and show that the combined loss of both genes produces the same effect as the loss of either gene alone.

      We thank the reviewer for the insightful suggestion to further define the genetic relationship between nsy-1 and str-130. To strengthen our epistasis analysis, we performed RNAi knockdown of str-130 in the nsy-1(ag3) mutant background and assessed development on SS food. Consistent with STR-130 acting downstream of NSY-1, the loss of str-130 via RNAi did not further enhance the developmental capacity (i.e., growth phenotype) of nsy-1(ag3) mutant animals on SS. This lack of enhancement indicates that str-130 and nsy-1 function within the same genetic pathway, with str-130 acting epistatically downstream of nsy-1 (Figure 4—figure supplement 3). This finding reinforces the model proposed from our overexpression data (Fig. 4F-G) – that NSY-1 primarily exerts its inhibitory effect on SS digestion by inducing expression of the GPCR STR-130.
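
      The non-additivity argument in this response can be sketched in code. The following is only an illustrative sketch, not the authors' analysis: the class and function names, the two-SEM threshold, and every number are hypothetical, and a real comparison would rest on the statistical tests reported with the figures. It shows the logic that two perturbations are read as acting in one pathway when the double perturbation is no stronger than the stronger single one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """Summary of one genotype's growth on SS (hypothetical values only)."""
    mean: float  # mean relative body length
    sem: float   # standard error of the mean


def differs(a: Measurement, b: Measurement, n_sem: float = 2.0) -> bool:
    """Crude separation check: means differ by more than n_sem combined SEMs."""
    combined = (a.sem ** 2 + b.sem ** 2) ** 0.5
    return abs(a.mean - b.mean) > n_sem * combined


def same_pathway(single_a: Measurement, single_b: Measurement,
                 double: Measurement) -> bool:
    """True if the double perturbation is no stronger than the stronger single."""
    strongest = max(single_a, single_b, key=lambda m: m.mean)
    enhanced = double.mean > strongest.mean and differs(double, strongest)
    return not enhanced


# Hypothetical numbers, chosen only to illustrate the two possible outcomes.
nsy1 = Measurement(mean=1.80, sem=0.05)          # nsy-1 mutant alone
str130_rnai = Measurement(mean=1.60, sem=0.05)   # str-130 RNAi alone
non_additive = Measurement(mean=1.82, sem=0.05)  # double: no further enhancement
additive = Measurement(mean=2.40, sem=0.05)      # double: clear enhancement
```

      With these made-up values, same_pathway(nsy1, str130_rnai, non_additive) evaluates to True, the pattern described in the response, whereas the additive case evaluates to False and would instead point to parallel pathways.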

      (11) In Figure 5C, please mention "ins-23 transcript levels" on the top of the graph so that it is clear what these data represent.

      We appreciate the reviewer’s suggestion.

      (12) Since all ins genes were upregulated in nsy-1 mutants (though ins-23 was indeed the most highly upregulated gene) on SS food from RNA seq analysis (Figure S5 B), it is important to first phenotypically characterize all of them using "worm length assay". If this analysis shows that ins-23 has the most robust phenotype, it would make more sense to just focus on ins-23.

      We agree with the reviewer that initial phenotypic characterization of candidate genes identified through transcriptomic analysis is valuable. Our RNA-seq data revealed that several insulin-like peptide genes, including ins-22, ins-23, ins-24, and ins-27, were significantly upregulated in the nsy-1 mutant on SS food (Figure 5—figure supplement 1B). We prioritized these insulin-like peptide genes for functional validation because previous studies have shown that they act as neuropeptides capable of mediating non-cell-autonomous signaling (Shao et al., 2016).

      To determine if any were functionally responsible for the enhanced SS growth observed in nsy-1 mutants, we performed functional phenotypic screening using the SS growth assay (worm length assay). We individually knocked down each of these candidates (ins-22, ins-23, ins-24, ins-27) in the nsy-1(ag3) mutant background. Among these, only RNAi targeting ins-23 significantly attenuated (i.e., suppressed) the enhanced development of the nsy-1(ag3) mutant on SS (Figure 5—figure supplement 2). This targeted functional screening revealed that ins-23 has the most robust and specific role in mediating the enhanced digestion phenotype downstream of NSY-1 loss, providing the critical justification for our subsequent focus on this particular insulin-like peptide.

      Ref:

      Shao, L. W., Niu, R., & Liu, Y. (2016). Neuropeptide signals cell non-autonomous mitochondrial unfolded protein response. Cell research, 26(11), 1182–1196. https://doi.org/10.1038/cr.2016.118

      Reviewer #2 (Recommendations for the authors):

      There are several minor errors and typos in the manuscript

      (1) A number of typos in the figures, like "length".

      Corrected.

      (2) The 'axis labels' are inconsistent from panel to panel, like "relative body length" and "relative worm length".

      Corrected.

      (3) The fonts are inconsistent from panel to panel.

      Corrected.

      (4) There is no Ex unique number for transgenic lines.

      Corrected.

      Reviewer #3 (Recommendations for the authors):

      Minor points:

      (1)  Figure 3B, 3C, 3G, 4D, 4F, 5D, 5E, and 6C: Replace "lenth" with "length" (consistent with Figure 2A).

      Corrected.

      (2) Figure 4D: Correct "ctontrol" to "control."

      Corrected.

      (3) Figure 4G: Update the co-injection marker to Podr-1::GFP instead of Pstr-2::GFP.

      Corrected.

      (4) Figure 5C: This figure is missing from the Results section.

      Corrected.

      (5) Figure 6A: Label the graph with Pbcf-1::bcf-1::GFP, as in Figure 6D.

      Corrected.

      (6) Italicization: Lines 588 and 603-italicize nsy-1.

      Corrected.

      (7) Supplementary Figure S2A: Correct "Screeng" to "Screening."

      Corrected.

      (8) Spelling/Proofreading: Ensure consistent spelling and grammar, such as correcting "mutan" to "mutant" in Figure 4A.

      Corrected.

    1. eLife Assessment

      In this valuable manuscript, Rao and colleagues investigate the UFD-1/NPL-4 complex, which is involved in extracting misfolded proteins in the plasma membrane and the accumulation of pathogenic bacteria in the intestine. Using convincing methods, the authors find that knockdown of the ufd-1 and npl-4 genes leads to shortened lifespan of the nematode C. elegans and reduced accumulation of the bacterial pathogen P. aeruginosa in the intestine.

    2. Reviewer #1 (Public review):

      The authors adequately addressed the concerns I raised in my initial review, which are noted below.

      (1) I suggest that the authors choose a different term in their title, abstract and manuscript to describe the phenotypes associated with ufd-1 and npl-4 knockdown other than an "inflammation-like response." Inflammation is a pathological term with four cardinal signs: redness (rubor), swelling (tumor), warmth (calor) and pain (dolor). These are not symptoms known to occur in C. elegans. The authors could consider using "inappropriate," "aberrant" or "toxic" immune activation in the title and abstract.

      (2) I think it is important to point out in the context of the authors' novelty claim in the abstract and manuscript that the toxic effects of inappropriate immune activation in C. elegans have been widely catalogued. For example: doi.org/10.1371/journal.ppat.1011120 (2023); doi:10.1186/s12915-016-0320-z (2016); doi:10.1126/science.1203411 (2011); doi:10.1534/g3.115.025650 (2016). In addition, doi:10.7554/eLife.74206 (2022) previously described a mutation that caused innate immune activation that reduced accumulation of P. aeruginosa in the intestine, but also caused animals to have a shortened lifespan.

      Thus, I do not think this study reveals the existence of inflammatory-like responses in C. elegans, as stated by the authors. Indeed, I think it is important for the authors to remove this novelty claim from their paper and discuss their work in the context of these studies in a paragraph in the introduction.

      (3) The authors rely on the use of RNAi of ufd-1 and npl-4 to study their effect on P. aeruginosa colonization and pathogen resistance throughout the manuscript. To address the possibility of off-target effects of the RNAi, the authors should consider both (i) showing with qRT-PCR that these genes are indeed targeted during RNAi, and (ii) confirming their phenotypes with an orthologous technique, preferably by studying ufd-1 and npl-4 loss-of-function mutants [both in the wild-type and sek-1(km4) backgrounds]. If mutation of these genes is lethal, the authors could use Auxin Inducible Degron (AID) technology to induce the degradation of these proteins in post-developmental animals.

      (4) I am confused about the authors' explanation regarding their observation that inhibition of the UFD-1/ NPL-4 complex extends the lifespan of sek-1(km25) animals, but not pmk-1(km25) animals, as SEK-1 is the MAPKK that functions immediately upstream of the p38 MAPK PMK-1 to promote pathogen resistance.

      I am also confused why their RNA-seq experiment revealed a signature of intracellular pathogen response genes and not PMK-1 targets, which the authors propose is accounting for toxic immune activation. Activation of which immune response leads to toxicity?

      (5) The authors did not test alternative explanations for why UFD-1/ NPL-4 complex inhibition compromises survival during pathogen infection, other than exuberant immune activation. For example, it is possible that inhibition of this proteasome complex shortens lifespan by compromising the general health/ normal physiology of nematodes. Immune responses could be activated as a secondary consequence of this stress, and not be a direct cause of early mortality. Does the sek-1(km4) mutant suppress the shortened lifespan of ufd-1 and npl-4 knockdown? This experiment should also be done with loss-of-function mutants, as noted in point 3.

      (6) The conclusion of Figure 6 hinges on an experiment that uses double RNAi to knockdown two genes at the same time (Fig. 6D and 6G), an approach that is inherently fraught in C. elegans biology owing to the likelihood that the efficiency of RNAi-mediated gene knockdown is compromised and may account for the observed phenotypes. The proper control for double RNAi is not empty vector + ufd-1(RNAi), but rather gfp(RNAi) + ufd-1(RNAi), as the introduction of a second hairpin RNA is what may compromise knockdown efficiency. In this context, it is important to confirm that knockdown of both genes occurs as expected (with qRT-PCR) and to confirm this phenotype using available elt-2 loss-of-function mutants.

      (7) A supplementary table with the source data for at least three replications (mean lifespan, n, statistical comparison) for each pathogenesis assay should be included in this manuscript.

      Comments on revisions:

      The authors adequately addressed the concerns I raised.

    3. Reviewer #2 (Public review):

      Summary:

      The authors aimed to uncover what role, if any, the UFD1/NPL4 complex might play in innate immune responses of the nematode C. elegans. The authors find that loss of the complex renders animals more sensitive to both pathogenic and non-pathogenic bacteria. However, there appears to be a complex interplay with known innate immune pathways since loss of UFD1/NPL4 actually results in increased survival of animals lacking the canonical innate immune pathways.

      Strengths:

      The authors perform robust genetic analysis to exclude and include possible mechanisms by which the UFD1/NPL4 pathway acts in the innate immune response.

      Weaknesses:

      The argument that the loss of the UFD1/NPL4 complex triggers a response that mimics that of an intracellular pathogen is not thoroughly investigated. Additionally, the finding of a role of the GATA transcription factor, ELT-2, in this response is suggestive, but experiments showing sufficiency in the context of loss of the UFD1/NPL4 complex need to be explored.

      Comments on revisions:

      The authors have performed several control experiments for their RNAi based experiments and also tested the requirement for xbp-1s in their paradigm. The findings and their interpretations are acceptable.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) I suggest that the authors choose a different term in their title, abstract and manuscript to describe the phenotypes associated with ufd-1 and npl-4 knockdown other than an "inflammation-like response." Inflammation is a pathological term with four cardinal signs: redness (rubor), swelling (tumor), warmth (calor) and pain (dolor). These are not symptoms known to occur in C. elegans. The authors could consider using "tolerance" instead, as this term may better describe their findings.

      We have changed “inflammation-like response” to “aberrant immune response” throughout the manuscript.

      (2) It would help the reader to better understand the novelty of the findings in this study if the authors include a paragraph in their introduction to put their results in context of the published literature that has examined the relationship between immune activation and nematode health and survival. In particular, I suggest that the authors discuss doi:10.7554/eLife.74206 (2022), a study that characterized a similar observation to what the authors are reporting. This study found that low cholesterol reduces pathogen tolerance and host survival during pathogen infection. Cholesterol scarcity increases p38 PMK-1 phosphorylation, priming immune effector induction in a manner that reduces pathogen accumulation in the intestine during a subsequent infection. I also suggest that the authors highlight in this introductory paragraph that the toxic effects of inappropriate immune activation in C. elegans have been widely catalogued. For example: doi.org/10.1371/journal.ppat.1011120 (2023); doi:10.1186/s12915-016-0320-z (2016); doi:10.1126/science.1203411 (2011); doi:10.1534/g3.115.025650 (2016).

      In this context, the authors could consider re-wording their novelty claim in the abstract and introduction to take into account this previous body of work.

      We have added a paragraph to the Discussion section to place our findings in the context of previous research. The revised manuscript now includes the following text (page 11, lines 336–344): “Previous studies have shown that hyperactivation of immune pathways can negatively affect organismal development. For example, sustained activation of the p38 MAPK pathway impairs development in C. elegans (Cheesman et al., 2016; Kim et al., 2016), and excessive activation of the IPR also leads to developmental defects (Lažetić et al., 2023). Similar to our current study, recent work has demonstrated that heightened immune responses can reduce gut pathogen load while paradoxically decreasing host survival during infection (Ghosh and Singh, 2024; Peterson et al., 2022). However, our study uniquely shows that while such heightened immune responses are detrimental to immunocompetent animals, they can be beneficial in the context of immunodeficiency.”

      (3) The authors rely on the use of RNAi of ufd-1 and npl-4 to study their effect on P. aeruginosa colonization and pathogen resistance throughout the manuscript. To address the possibility of off-target effects of the RNAi, the authors should consider both (i) showing with qRT-PCR that these genes are indeed targeted during RNAi, and (ii) confirming their phenotypes with an orthologous technique, preferably by studying ufd-1 and npl-4 loss-of-function mutants [both in the wild-type and sek-1(km4) backgrounds]. If mutation of these genes is lethal, the authors could use Auxin Inducible Degron (AID) technology to induce the degradation of these proteins in post-developmental animals.

      We attempted several protocols of CRISPR in our laboratory to generate ufd-1 loss-of-function mutants; however, these efforts were unsuccessful. While this does not rule out the possibility of generating ufd-1 mutants, the failure is likely due to technical limitations on our part rather than an inherent inability to disrupt the gene. Nevertheless, to confirm the specificity of our RNAi-based approach, we quantified ufd-1 and npl-4 mRNA levels following RNAi treatment and found that each gene was specifically and effectively downregulated by its respective RNAi. 

      Importantly, ufd-1 and npl-4 RNA sequences do not share significant homology, yet knockdown of either gene results in nearly identical phenotypes, including reduced survival on P. aeruginosa, diminished intestinal colonization, and shortened lifespan. These consistent outcomes strongly support the conclusion that the phenotypes are attributable to the disruption of the functional UFD-1-NPL-4 complex. We have added these results in the revised manuscript (pages 4-5, lines 114-125): “To confirm the specificity of the RNAi knockdowns and rule out potential off-target effects, we examined transcript levels of ufd-1 and npl-4 following RNAi treatment. RNAi against ufd-1 significantly reduced ufd-1 mRNA levels without reducing npl-4 expression, while npl-4 RNAi specifically downregulated npl-4 transcripts with no impact on ufd-1 mRNA levels (Figure 1—figure supplement 1A and B). Additionally, alignment of ufd-1 and npl-4 mRNA sequences against the C. elegans transcriptome revealed no significant similarity to other genes, supporting the specificity of the RNAi constructs. Moreover, the ufd-1 and npl-4 RNA sequences do not share significant sequence similarity. Therefore, the highly similar phenotypes observed in ufd-1 and npl-4 knockdown animals, including shortened lifespan, reduced survival on P. aeruginosa, and decreased intestinal colonization with P. aeruginosa, strongly suggest that these outcomes result from the disruption of the functional UFD-1-NPL-4 complex.”

      (4) I am confused about the authors' explanation regarding their observation that inhibition of the UFD-1/ NPL-4 complex extends the lifespan of sek-1(km25) animals, but not pmk-1(km25) animals, as SEK-1 is the MAPKK that functions immediately upstream of the p38 MAPK PMK-1 to promote pathogen resistance.

      I am also confused why their RNA-seq experiment revealed a signature of intracellular pathogen response genes and not PMK-1 targets, which the authors propose is accounting for toxic immune activation. Activation of which immune response leads to toxicity?

      We consistently observe that sek-1(km4) mutants are more sensitive to P. aeruginosa infection than pmk-1(km25) mutants, a finding also reported in previous studies (for example, PMID: 33658510). Given that SEK-1 functions upstream of PMK-1 in the MAPK signaling cascade, it is plausible that SEK-1 also regulates additional MAP kinases, such as PMK-2 (PMID: 25671546), which could contribute to the enhanced susceptibility observed in sek-1 mutants.

      Our results show that inhibition of the UFD-1-NPL-4 complex improves survival specifically in severely immunocompromised animals, such as sek-1(km4) mutants, but not in pmk-1(km25) mutants. To further validate this, we generated the double mutant dbl-1(nk3);pmk-1(km25), which exhibits reduced survival on P. aeruginosa compared to either single mutant.

      Notably, inhibition of the UFD-1-NPL-4 complex also enhances survival in the dbl-1(nk3);pmk-1(km25) background, reinforcing the observation that this response is specific to severely compromised immune states.

      We would also like to clarify that the observed phenotypes are independent of the SEK-1/PMK-1 pathway, as shown in Figure 3A-3C, Figure 3—figure supplement 1, and Figure 4A-4C. The IPR seems to play a role in the observed phenotypes, as inhibition of some of the protease and pals genes (IPR genes) leads to increased P. aeruginosa colonization in ufd-1 knockdown animals (Figure 6—figure supplement 1). The other immune response pathway that leads to the observed phenotypes is ELT-2, as explained in Figure 6. Finally, we have included in the revised manuscript a note that, in addition, as-yet unidentified pathways are also likely contributing to the phenotypes triggered by disruption of the UFD-1-NPL-4 complex.

      (5) The authors did not test alternative explanations for why UFD-1/ NPL-4 complex inhibition compromises survival during pathogen infection, other than exuberant immune activation. For example, it is possible that inhibition of this proteasome complex shortens lifespan by compromising the general health/ normal physiology of nematodes. Immune responses could be activated as a secondary consequence of this stress, and not be a direct cause of early mortality. Does the sek-1(km4) mutant suppress the shortened lifespan of ufd-1 and npl-4 knockdown? This experiment should also be done with loss-of-function mutants, as noted in point 3.

      We have already included this data in Figure 4D, where we observed that ufd-1 and npl-4 knockdown reduce the lifespan of sek-1(km4) animals. It is possible that immune activation is a secondary consequence of cellular stress induced by inhibition of the UFD-1-NPL-4 complex. However, our data strongly suggest that the observed phenotypes, including reduced gut pathogen load and decreased survival on the pathogen, are due to the aberrant immune response activated by the inhibition of the UFD-1-NPL-4 complex. Evidence from sek-1(km4) mutants particularly underscores the role of this dysregulated immune activation. While this aberrant immune response is detrimental to wild-type animals under pathogenic conditions, it appears to be beneficial in severely immunocompromised backgrounds. Specifically, in sek-1(km4) mutants, inhibition of the UFD-1-NPL-4 complex enhances survival during P. aeruginosa infection (Figure 4A). However, under non-infectious conditions, where sek-1(km4) mutants exhibit a normal lifespan, the same immune activation becomes harmful (Figure 4D). Together, these findings demonstrate that the aberrant immune response induced by UFD-1–NPL-4 inhibition is context-dependent: it is advantageous only for immunocompromised animals under infection, but deleterious to healthy animals under infection and to both healthy and immunocompromised animals under non-infectious conditions.

      (6) The conclusion of Figure 6 hinges on an experiment that uses double RNAi to knockdown two genes at the same time (Fig. 6D and 6G), an approach that is inherently fraught in C. elegans biology owing to the likelihood that the efficiency of RNAi-mediated gene knockdown is compromised and may account for the observed phenotypes. The proper control for double RNAi is not empty vector + ufd-1(RNAi), but rather gfp(RNAi) + ufd-1(RNAi), as the introduction of a second hairpin RNA is what may compromise knockdown efficiency. In this context, it is important to confirm that knockdown of both genes occurs as expected (with qRT-PCR) and to confirm this phenotype using available elt-2 loss-of-function mutants.

      We thank the reviewer for this helpful suggestion. We have repeated all double RNAi experiments using gfp RNAi as a control instead of the empty vector (Figure 6 and Figure 6—figure supplement 1). Additionally, we assessed the efficiency of gene knockdown in the double RNAi conditions (Figure 6—figure supplement 2) and found that RNAi efficacy was not compromised by the double RNAi treatment.

      (7) A supplementary table with the source data for at least three replications (mean lifespan, n, statistical comparison) for each pathogenesis assay should be included in this manuscript.

      The source data is provided for all the data presented in the manuscript.

      Reviewer #2 (Public Review):

      Summary:

      The authors aimed to uncover what role, if any, the UFD1/NPL4 complex might play in the innate immune responses of the nematode C. elegans. The authors find that loss of the complex renders animals more sensitive to both pathogenic and non-pathogenic bacteria. However, there appears to be a complex interplay with known innate immune pathways since the loss of UFD1/NPL4 actually results in increased survival of animals lacking the canonical innate immune pathways.

      We thank the reviewer for providing an excellent summary of our work.

      Strengths:

      The authors perform robust genetic analysis to exclude and include possible mechanisms by which the UFD1/NPL4 pathway acts in the innate immune response.

      We thank the reviewer for highlighting the strengths of our work.

      Weaknesses:

      The argument that the loss of the UFD1/NPL4 complex triggers a response that mimics that of an intracellular pathogen has not been thoroughly investigated. Additionally, the finding of a role of the GATA transcription factor, ELT-2, in this response is suggestive, but experiments showing sufficiency in the context of loss of the UFD1/NPL4 complex need to be explored.

      We have investigated the role of IPR genes in the phenotypes observed upon ufd-1 knockdown (Figure 6—figure supplement 1), and our results suggest that the IPR may contribute, at least in part, to the phenotypic outcomes of ufd-1 RNAi. In the Discussion section (pages 11–12, lines 345–356), we have included a detailed discussion on the possible mechanisms underlying IPR activation upon inhibition of the UFD-1–NPL-4 complex. We agree that the interaction between the UFD-1–NPL-4 complex and the IPR is intriguing and warrants further investigation. However, we believe that an in-depth exploration of this interaction lies beyond the scope of the current study.

      We have incorporated new data on ELT-2 overexpression in the revised manuscript. Overexpression of ELT-2 partially phenocopies the effects of ufd-1 knockdown, supporting the idea that other pathways likely contribute to the full spectrum of phenotypes observed upon UFD-1-NPL-4 complex inhibition. The revised manuscript reads (page 10, lines 311-319): “To determine whether ELT-2 activation alone is sufficient to recapitulate the phenotypes observed upon UFD-1-NPL-4 complex inhibition, we analyzed animals overexpressing ELT-2. Similar to ufd-1 knockdown, ELT-2 overexpression led to a significant reduction in the colonization of the gut by P. aeruginosa (Figure 6—figure supplement 3A and 3B). However, overexpression of ELT-2 did not alter the survival of worms on P. aeruginosa (Figure 6—figure supplement 3C). Taken together, these findings suggest that the phenotypes triggered by disruption of the UFD-1-NPL-4 complex are partially mediated by ELT-2. However, additional pathways, yet to be identified, likely cooperate with ELT-2 to regulate both pathogen resistance and host survival.”

      Reviewer #1 (Recommendations For The Authors):

      The authors could consider avoiding the use of descriptors (e.g., "drastic") when presenting their data.

      We have removed the descriptors.

      Reviewer #2 (Recommendations For The Authors):

      What happens with overexpression of ELT2?

      Overexpression of ELT-2 partially recapitulates the phenotypes of ufd-1 knockdowns, indicating that additional pathways are likely involved in controlling the phenotypes observed upon inhibition of the UFD-1-NPL-4 complex. The revised manuscript reads (page 10, lines 311-319): “To determine whether ELT-2 activation alone is sufficient to recapitulate the phenotypes observed upon UFD-1-NPL-4 complex inhibition, we analyzed animals overexpressing ELT-2. Similar to ufd-1 knockdown, ELT-2 overexpression led to a significant reduction in the colonization of the gut by P. aeruginosa (Figure 6—figure supplement 3A and 3B). However, overexpression of ELT-2 did not alter the survival of worms on P. aeruginosa (Figure 6—figure supplement 3C). Taken together, these findings suggest that the phenotypes triggered by disruption of the UFD-1-NPL-4 complex are partially mediated by ELT-2. However, additional pathways, yet to be identified, likely cooperate with ELT-2 to regulate both pathogen resistance and host survival.”

      The data with xbp-1 loss of function is very different than that of pek1 and atf-6. Does loss of ufd1/npl4 suppress the increased pathogen survival of xbp-1s overexpressing animals?

      We have examined worms overexpressing XBP-1s and found that overexpression of XBP-1s does not rescue the phenotypes caused by ufd-1 knockdown. The revised manuscript reads (page 6, lines 167-174): “To further examine the role of XBP-1 in this context, we assessed the effect of ufd-1 knockdown in animals neuronally overexpressing the constitutively active spliced form of XBP-1 (XBP-1s), which has been previously associated with enhanced longevity (Taylor and Dillin, 2013). Knockdown of ufd-1 resulted in the reduced survival of XBP-1s-overexpressing animals on P. aeruginosa, despite a concurrent decrease in bacterial colonization of the gut (Figure 2—figure supplement 1A-C). This indicated that the XBP-1 pathway was not required for the reduced P. aeruginosa colonization of ufd-1 knockdown animals.” 

      Lastly, while the pathogen burden is reduced in ufd1/npl4 loss and pumping rates are marginally affected, have you checked defecation rates? Could they be increased?

      We thank the reviewer for this valuable suggestion. We measured defecation rates following ufd-1 and npl-4 knockdown and, unexpectedly, found that inhibition of ufd-1/npl-4 leads to a reduction in defecation frequency. These findings clearly indicate that altered defecation cannot explain the observed decrease in gut colonization. The revised manuscript reads (page 5, lines 138-148): “The clearance of intestinal contents through the defecation motor program (DMP) is known to influence gut colonization by P. aeruginosa in C. elegans (Das et al., 2023). It is therefore conceivable that knockdown of the UFD-1-NPL-4 complex might increase defecation frequency, thereby promoting the physical expulsion of bacteria and resulting in reduced gut colonization. To test this possibility, we measured DMP rates in animals subjected to ufd-1 and npl-4 RNAi. Contrary to this hypothesis, both ufd-1 and npl-4 knockdown animals exhibited a significant reduction in defecation frequency compared to control RNAi-treated animals (Figure 1—figure supplement 2C). This reduction in DMP rate persisted even after 12 hours of exposure to P. aeruginosa (Figure 1—figure supplement 2D). Thus, the change in the DMP rate in ufd-1 and npl-4 knockdown animals is unlikely to be the reason for the reduced gut colonization by P. aeruginosa.”

      In summary, we would like to thank the reviewers again for providing constructive and thoughtful feedback. We believe we have fully addressed all the concerns of the reviewers by carrying out several new experiments and modifying the text. The manuscript has undergone substantial revision and has thereby improved significantly. We do hope that the evidence in support of the conclusions is found to be complete in the revised manuscript.

    1. eLife Assessment

      The identification of RBMX2 as a novel regulator linking mycobacterial infection to Epithelial-Mesenchymal Transition and cancer progression are fundamental findings that advance our understanding of a major research question about the link between infectious and non-infectious diseases, microbiology and oncology. It does so by introducing RBMX2 as a novel host factor, a potential therapeutic target and biomarker for both TB and lung cancer. The evidence provided is convincing because it is appropriate and the validated multi-omics methodologies used are in line with the current state of the art. This study will be of interest to scientists working in the fields of drug discovery, microbiology and oncology.

    2. Reviewer #3 (Public review):

      Summary:

      This study investigates the role of the host protein RBMX2 in regulating the response to Mycobacterium bovis infection and its connection to epithelial-mesenchymal transition (EMT), a key pathway in cancer progression. Using bovine and human cell models, the authors have wisely shown that RBMX2 expression is upregulated following M. bovis infection and promotes bacterial adhesion, invasion, and survival by disrupting epithelial tight junctions via the p65/MMP-9 signaling pathway. They also demonstrate that RBMX2 facilitates EMT and is overexpressed in human lung cancers, suggesting a potential link between chronic infection and tumor progression. The study highlights RBMX2 as a novel host factor that could serve as a therapeutic target for both TB pathogenesis and infection-related cancer risk.

      Strengths:

      The major strengths lie in its multi-omics integration (transcriptomics, proteomics, metabolomics) to map RBMX2's impact on host pathways, combined with rigorous functional assays (knockout/knockdown, adhesion/invasion, barrier tests) that establish causality through the p65/MMP-9 axis. Validation across bovine and human cell models and in clinical tissue samples enhances translational relevance. Finally, identifying RBMX2 as a novel regulator linking mycobacterial infection to EMT and cancer progression opens exciting therapeutic avenues.

      Weaknesses:

      There are a few minor weaknesses like grammatical errors, spelling mistakes. Also, the manuscript is too dense; improving the narratives in the Results and Discussion section could help readers follow the logic of the experimental design and conclusions.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This manuscript presents a compelling study identifying RBMX2 as a novel host factor upregulated during Mycobacterium bovis infection.

      The study demonstrates that RBMX2 plays a role in:

      (1) Facilitating M. bovis adhesion, invasion, and survival in epithelial cells.

      (2) Disrupting tight junctions and promoting EMT.

      (3) Contributing to inflammatory responses and possibly predisposing infected tissue to lung cancer development.

      By using a combination of CRISPR-Cas9 library screening, multi-omics, coculture models, and bioinformatics, the authors establish a detailed mechanistic link between M. bovis infection and cancer-related EMT through the p65/MMP-9 signaling axis. Identification of RBMX2 as a bridge between TB infection and EMT is novel.

      Strengths:

      This topic and data are both novel and significant, expanding the understanding of transcriptomic diversity beyond RBM2 in M. bovis responsive functions.

      Weaknesses:

      (1) The abstract and introduction sometimes suggest RBMX2 has protective anti-TB functions, yet results show it facilitates pathogen adhesion and survival. The authors need to rephrase claims to avoid contradiction.

      We sincerely appreciate the reviewer's valuable feedback regarding the need to clarify RBMX2's role throughout the manuscript. We have carefully revised the text to ensure consistent messaging about RBMX2's function in promoting M. bovis infection. Below we detail the specific modifications made:

      (1) Introduction Revisions:

      Changed "The objective of this study was to elucidate the correlation between host genes and the susceptibility of M.bovis infection" to "The objective of this study was to identify host factors that promote susceptibility to M.bovis infection"

      Revised "RBMX2 polyclonal and monoclonal cell lines exhibited favorable phenotypes" to "RBMX2 knockout cell lines showed reduced bacterial survival"

      Replaced "The immune regulatory mechanism of RBMX2" with "The role of RBMX2 in facilitating M.bovis immune evasion"

      (2) Results Revisions:

      Modified "RBMX2 fails to affect cell morphology and the ability to proliferate and promotes M.bovis infection" to "RBMX2 does not alter cell viability but significantly enhances M.bovis infection"

      Strengthened conclusion in Figure 4: "RBMX2 actively disrupts tight junctions to facilitate bacterial invasion"

      (3) Discussion Revisions:

      Revised screening description: "We screened host factors affecting M.bovis susceptibility and identified RBMX2 as a key promoter of infection"

      Strengthened concluding statement: "In summary, RBMX2 drives TB pathogenesis by compromising epithelial barriers and inducing EMT"

      These targeted revisions ensure that:

      All sections consistently present RBMX2 as promoting infection; the language aligns with our experimental findings; and potential protective interpretations have been eliminated. We believe these modifications have successfully addressed the reviewer's concern while maintaining the manuscript's original structure and scientific content. We appreciate the opportunity to improve our manuscript and thank the reviewer for this constructive suggestion.

      (2) While p65/MMP-9 is convincingly implicated, the role of MAPK/p38 and JNK is less clearly resolved.

      We sincerely appreciate the reviewer's insightful comment regarding the roles of MAPK/p38 and JNK in our study. Our experimental data clearly demonstrated that RBMX2 knockout significantly reduced phosphorylation levels of p65, p38, and JNK (Fig. 5A), indicating potential involvement of all three pathways in RBMX2-mediated regulation.

      Through systematic functional validation, we obtained several important findings:

      In pathway inhibition experiments, p65 activation (PMA treatment) showed the most dramatic effects on both tight junction disruption (ZO-1, OCLN reduction) and EMT marker regulation (E-cadherin downregulation, N-cadherin upregulation); p38 activation (ML141 treatment) exhibited moderate effects on these processes; JNK activation (Anisomycin treatment) displayed minimal impact.

      Most conclusively, siRNA-mediated silencing of p65 alone was sufficient to:

      Restore epithelial barrier function

      Reverse EMT marker expression

      Reduce bacterial adhesion and invasion

      These results establish a clear hierarchy in pathway importance: p65 serves as the primary mediator of RBMX2's effects, while p38 plays a secondary role and JNK appears non-essential under our experimental conditions. We have now clarified this relationship in the revised Discussion section to strengthen this conclusion.

      This refined understanding of pathway hierarchy provides important mechanistic insights while maintaining consistency with all our experimental data. We thank the reviewer for this valuable suggestion that helped improve our manuscript.

      (3) Metabolomics results are interesting but not integrated deeply into the main EMT narrative.

      Thank you for this constructive suggestion. In this article, we detected the metabolome of RBMX2 knockout and wild-type cells after Mycobacterium bovis infection, which mainly served as supporting evidence for our EMT model. However, we did not conduct an in-depth discussion of these findings. We have now added a detailed discussion of this section to further support our EMT model.

      ADD: Meanwhile, metabolic pathways enriched after RBMX2 deletion, such as nucleotide metabolism, nucleotide sugar synthesis, and pentose interconversion, primarily support cell proliferation and migration during EMT by providing energy precursors, regulating glycosylation modifications, and maintaining redox balance; cofactor synthesis and amino sugar metabolism participate in EMT regulation through influencing metabolic remodeling and extracellular matrix interactions; chemokine and cGMP-PKG signaling pathways may further mediate inflammatory responses and cytoskeletal rearrangements, collectively promoting the EMT process.

      (4) A key finding and starting point of this study is the upregulation of RBMX2 upon M. bovis infection. However, the authors have only assessed RBMX2 expression at the mRNA level following infection with M. bovis and BCG. To strengthen this conclusion, it is essential to validate RBMX2 expression at the protein level through techniques such as Western blotting or immunofluorescence. This would significantly enhance the credibility and impact of the study's foundational observation.

      Thank you for your comment. We have supplemented the experiments in this part and found that Mycobacterium bovis infection can significantly enhance the expression level of RBMX2 protein.

      (5) The manuscript would benefit from a more in-depth discussion of the relationship between tuberculosis (TB) and lung cancer. While the study provides experimental evidence suggesting a link via EMT induction, integrating current literature on the epidemiological and mechanistic connections between chronic TB infection and lung tumorigenesis would provide important context and reinforce the translational relevance of the findings.

      We sincerely appreciate the valuable comments from the reviewer. We fully agree with your suggestion to further explore the relationship between tuberculosis (TB) and lung cancer. In the revised manuscript, we will add a new paragraph in the Discussion section to systematically integrate the current literature on the epidemiological and mechanistic links between chronic tuberculosis infection and lung cancer development, including the potential bridging roles of chronic inflammation, tissue damage repair, immune microenvironment remodeling, and the epithelial-mesenchymal transition (EMT) pathway. This addition will help more comprehensively interpret the clinical implications of the observed EMT activation in the context of our study, thereby enhancing the biological plausibility and clinical translational value of our findings.

      ADD: There is growing epidemiological evidence suggesting that chronic TB infection represents a potential risk factor for the development of lung cancer. Studies have shown that individuals with a history of TB exhibit a significantly increased risk of lung cancer, particularly in areas of the lung with pre-existing fibrotic scars, indicating that chronic inflammation, tissue repair, and immune microenvironment remodeling may collectively contribute to malignant transformation 74. Moreover, EMT not only endows epithelial cells with mesenchymal features that enhance migratory and invasive capacity but is also associated with the acquisition of cancer stem cell-like properties and therapeutic resistance 75. Therefore, EMT may serve as a crucial molecular link connecting chronic TB infection with the malignant transformation of lung epithelial cells, warranting further investigation in the intersection of infection and tumorigenesis.

      Reviewer #2 (Public review):

      Summary:

      I am not familiar with cancer biology, so my review mainly focuses on the infection part of the manuscript. Wang et al. identified an RNA-binding protein, RBMX2, that links Mycobacterium bovis infection to the epithelial-mesenchymal transition and lung cancer progression. Upon mycobacterial infection, the expression of RBMX2 was moderately increased in multiple bovine and human cell lines, as well as bovine lung and liver tissues. Using global approaches, including RNA-seq and proteomics, the authors identified differential gene expression caused by the RBMX2 knockout during M. bovis infection. Knockout of RBMX2 led to significant upregulation of tight-junction-related genes such as CLDN-5, OCLN, and ZO-1, whereas M. bovis infection affects the integrity of epithelial cell tight junctions and inflammatory responses. This study establishes that RBMX2 is an important host factor that modulates the infection process of M. bovis.

      Strengths:

      (1) This study tested multiple types of bovine and human cells, including macrophages, epithelial cells, and clinical tissues at multiple timepoints, and firmly confirmed the induced expression of RBMX2 upon M. bovis infection.

      (2) The authors have generated the monoclonal RBMX2 knockout cell lines and comprehensively characterized the RBMX2-dependent gene expression changes using a combination of global omics approaches. The study has validated the impact of RBMX2 knockout on the tight-junction pathway and on the M. bovis infection, establishing RBMX2 as a crucial host factor.

      Weaknesses:

      (1) The RBMX2 was only moderately induced (less than 2-fold) upon M. bovis infection, arguing its contribution may be small. Its value as a therapeutic target is not justified. How RBMX2 was activated by M. bovis infection was unclear.

      Thank you for your valuable and constructive comments. In this study, we primarily utilized the CRISPR whole-genome screening approach to identify key factors involved in bovine tuberculosis infection. Through four rounds of screening using a whole-genome knockout cell line of bovine lung epithelial cells infected with Mycobacterium bovis, we identified RBMX2 as a critical factor.

      Although the transcriptional level change of RBMX2 was less than two-fold, following the suggestion of Reviewer 1, we examined its expression at the protein level, where the change was more pronounced, and we have added these results to the manuscript.

      Regarding the mechanism by which RBMX2 is activated upon M. bovis infection, we previously screened for interacting proteins using a Mycobacterium tuberculosis secreted and membrane protein library, but unfortunately, we did not identify any direct interacting proteins from M. tuberculosis (https://doi.org/10.1093/nar/gkx1173).

      (2) Although multiple time points have been included in the study, most analyses lack temporal resolution. It is difficult to appreciate the impact/consequence of M. bovis infection on the analyzed pathways and processes.

      We appreciate the valuable comments from the reviewers. Although our study included multiple time points post-infection, in our experimental design we focused on different biological processes and phenotypes at distinct time points:

      During the early phase (e.g., 2 hours post-infection), we focused on barrier phenotypes; during the intermediate phase (e.g., 24 hours post-infection), we concentrated more on pathway activation and EMT phenotypes;

      and during the later phase (e.g., 48–72 hours post-infection), we focused more on cell death phenotypes, which were validated in another article in Frontiers in Immunology (https://doi.org/10.3389/fimmu.2024.1431207).

      We also examined the impact of varying infection durations on RBMX2 knockout EBL cell lines via GO analysis. At 0 hpi, genes were primarily related to the pathways of cell junctions, extracellular regions, and cell junction organization. At 24 hpi, genes were mainly associated with pathways of the basement membrane, cell adhesion, integrin binding, and cell migration. By 48 hpi, genes were annotated to epithelial cell differentiation and were negatively regulated during epithelial cell proliferation. This indicated that RBMX2 can regulate cellular connectivity throughout the stages of M. bovis infection.

      For KEGG analysis, genes linked to the MAPK signaling pathway, chemical carcinogen-DNA adducts, and chemical carcinogen-receptor activation were observed at 0 hpi. At 24 hpi, significant enrichment was found in the ECM-receptor interaction, PI3K-Akt signaling pathway, and focal adhesion. Upon enrichment analysis at 48 hpi, significant enrichment was noted in the TGF-beta signaling pathway, transcriptional misregulation in cancer, microRNAs in cancer, small cell lung cancer, and p53 signaling pathway.

      Reviewer #3 (Public review):

      Summary:

      This study investigates the role of the host protein RBMX2 in regulating the response to Mycobacterium bovis infection and its connection to epithelial-mesenchymal transition (EMT), a key pathway in cancer progression. Using bovine and human cell models, the authors have wisely shown that RBMX2 expression is upregulated following M. bovis infection and promotes bacterial adhesion, invasion, and survival by disrupting epithelial tight junctions via the p65/MMP-9 signaling pathway. They also demonstrate that RBMX2 facilitates EMT and is overexpressed in human lung cancers, suggesting a potential link between chronic infection and tumor progression. The study highlights RBMX2 as a novel host factor that could serve as a therapeutic target for both TB pathogenesis and infection-related cancer risk.

      Strengths:

      The major strengths lie in its multi-omics integration (transcriptomics, proteomics, metabolomics) to map RBMX2's impact on host pathways, combined with rigorous functional assays (knockout/knockdown, adhesion/invasion, barrier tests) that establish causality through the p65/MMP-9 axis. Validation across bovine and human cell models and in clinical tissue samples enhances translational relevance. Finally, identifying RBMX2 as a novel regulator linking mycobacterial infection to EMT and cancer progression opens exciting therapeutic avenues.

      Weaknesses:

      Although it's a solid study, there are a few weaknesses noted below.

      (1) In the transcriptomics analysis, the authors performed enrichment analysis (GO/KEGG) to explore biological functions. Did they perform the search locally or globally? If the search was performed with a global reference, then I would recommend doing a local search. That would give more relevant results. What is the logic behind highlighting some of the enriched pathways (in red), and how are they relevant to the current study?

      We appreciate the reviewer's thoughtful questions regarding our transcriptomic analysis. In this study, we employed a localized enrichment approach focusing specifically on gene expression profiles from our bovine lung epithelial cell system. This cell-type-specific analysis provides more biologically relevant results than global database searches alone.

      Regarding the highlighted pathways, these represent:

      Temporally significant pathways showing strongest enrichment at each stage:

      (1) 0h: Cell junction organization (immediate barrier response)

      (2) 24h: ECM-receptor interaction (early EMT initiation)

      (3) 48h: TGF-β signaling (chronic remodeling)

      Mechanistically linked to our core findings about RBMX2's role in:

      (1) Epithelial barrier disruption

      (2) Mesenchymal transition

      (3) Chronic infection outcomes

      We selected these particular pathways because they:

      (1) Showed the most statistically significant changes (FDR <0.001)

      (2) Formed a coherent biological narrative across infection stages

      (3) Were independently validated in our functional assays

      This targeted approach allows us to focus on the most infection-relevant pathways while maintaining statistical rigor.
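      As an illustration of what an FDR cutoff such as the one quoted above involves, the following minimal sketch adjusts raw enrichment p-values with the Benjamini-Hochberg procedure and keeps pathways below a threshold. The response does not state which FDR method or software was used, so the choice of Benjamini-Hochberg, the function names, and the pathway p-values are all assumptions made for this example.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_pathways(pathway_pvalues, threshold):
    """Keep pathways whose adjusted p-value is below the threshold."""
    names = list(pathway_pvalues)
    adjusted = benjamini_hochberg([pathway_pvalues[n] for n in names])
    return {n: q for n, q in zip(names, adjusted) if q < threshold}


# Hypothetical p-values chosen for illustration; not data from the study.
example = {"pathway_a": 0.01, "pathway_b": 0.04, "pathway_c": 0.03, "pathway_d": 0.005}
print(significant_pathways(example, threshold=0.03))
```

      With a threshold of 0.001, as in the response, only pathways whose adjusted value falls below 0.001 would be retained.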

      (2) While the authors show that RBMX2 expression correlates with EMT-related gene expression and barrier dysfunction, the evidence for direct association remains limited in this study. How does RBMX2 activate p65? Does it bind directly to p65 or modulate any upstream kinases? Could ChIP-seq or CLIP-seq provide further evidence for direct RNA or DNA targets of RBMX2 that drive EMT or NF-κB signaling?

      We sincerely appreciate the reviewer's in-depth questions regarding the mechanisms by which RBMX2 activates p65 and its association with EMT. Although the molecular mechanism remains to be fully elucidated, our study has provided experimental evidence supporting a direct regulatory relationship between RBMX2 and the p65 subunit of the NF-κB pathway. Specifically, we investigated whether the transcription factor p65 could directly bind to the promoter region of RBMX2 using ChIP experiments. The results demonstrated that the transcription factor p65 can physically bind to the RBMX2 promoter region.

      Furthermore, dual-luciferase reporter assays were conducted, showing that p65 significantly enhances the transcriptional activity of the RBMX2 promoter, indicating a direct regulatory effect of p65 on RBMX2 expression.

      These findings support our hypothesis that RBMX2 activates the NF-κB signaling pathway through direct interaction with the p65 protein, thereby participating in the regulation of EMT progression and barrier function.

      In our subsequent work, we will also employ experiments such as CLIP to further investigate the specific mechanisms through which RBMX2 exerts its regulatory functions.

      ADD and Revise in Results:

      To thoroughly verify the regulatory mechanism between RBMX2 and p65, we initiated our investigation by conducting an in-depth analysis of the RBMX2 promoter region to identify potential interactions with the transcription factor p65. Initially, we performed molecular docking simulations to predict the binding affinity and interaction patterns between RBMX2 and p65 proteins. These simulations revealed multiple amino acid residues within the RBMX2 protein that formed strong, stable interactions with p65. The docking analysis yielded a high docking score of 1978.643 (Fig. 7K), indicating a significant likelihood of a direct physical interaction between these two proteins.

      To complement the protein-protein interaction analysis, we next investigated whether p65 could directly bind to the promoter region of the RBMX2 gene at the transcriptional level. Using the JASPAR database, a comprehensive resource for transcription factor binding profiles, we queried the RBMX2 promoter sequence for potential p65 binding sites. This analysis identified several putative binding motifs, suggesting that p65 may act as a transcriptional regulator of RBMX2 expression.

      To experimentally validate this transcriptional regulatory relationship, we employed a dual-luciferase reporter assay. We cloned the RBMX2 promoter region containing the predicted p65 binding sites into a luciferase reporter plasmid. This construct was then co-transfected into cultured cells along with a plasmid expressing p65. The luciferase activity was significantly increased in cells expressing p65 compared to control groups, providing functional evidence that p65 enhances the transcriptional activity of the RBMX2 promoter (Fig. 7I).

      Furthermore, to confirm the direct binding of p65 to the RBMX2 promoter in a chromatin context, we performed chromatin immunoprecipitation followed by quantitative PCR (ChIP-qPCR). In this assay, we used specific antibodies against p65 to immunoprecipitate chromatin fragments containing p65-bound DNA. The enriched DNA fragments were then analyzed using primers targeting the RBMX2 promoter region. Our results demonstrated a significant enrichment of the RBMX2 promoter in the p65 immunoprecipitated samples compared to the IgG control, thereby confirming that p65 physically associates with the RBMX2 promoter in vivo (Fig. 7J). Collectively, these findings, ranging from computational docking predictions to transcriptional reporter assays and ChIP validation, provide strong evidence supporting a direct regulatory interaction between p65 and RBMX2. This regulatory mechanism may play a critical role in the biological pathways involving these two molecules, particularly in contexts such as inflammation, immune response, or cellular stress, where p65 (a subunit of NF-κB) is known to be prominently involved.

      (3) The manuscript suggests that RBMX2 enhances adhesion/invasion of several bacterial species (e.g., E. coli, Salmonella), not just M. bovis. This raises questions about the specificity of RBMX2's role in Mycobacterium-specific pathogenesis. Is RBMX2 a general epithelial barrier regulator or does it exhibit preferential effects in mycobacterial infection contexts? How does this generality affect its potential as a TB-specific therapeutic target?

      Thank you for your valuable comments. When we initially designed this experiment, we were interested in whether the RBMX2 knockout cell line could confer effective resistance not only against Mycobacterium bovis but also against Gram-negative and Gram-positive bacteria. Surprisingly, we indeed observed resistance to the invasion of these pathogens, albeit weaker compared to that against Mycobacterium bovis.

      Nevertheless, we believe these findings merit publication in eLife. Moreover, RBMX2 knockout does not affect the phenotype of epithelial barrier disruption under normal conditions; its significant regulatory effect on barrier function is only evident upon infection with Mycobacterium bovis.

      Importantly, during our genome-wide knockout library screening, RBMX2 was not identified in the screening models for Salmonella or Escherichia coli, but was consistently detected across multiple rounds of screening in the Mycobacterium bovis model.

      (4) The quality of the figures is very poor. High-resolution images should be provided.

      Thank you for your feedback; we have provided higher-resolution images.

      (5) The methods are not very descriptive, particularly the omics section.

      Thank you for your comments; we have revised the description of the sequencing section.

      (6) The manuscript is too dense, with extensive multi-omics data (transcriptomics, proteomics, metabolomics) but relatively little mechanistic integration. The authors should have focused on the key mechanistic pathways in the figures. Improving the narratives in the Results and Discussion section could help readers follow the logic of the experimental design and conclusions.

      Thank you for your valuable comments. We have streamlined the figures and revised the description of the results section accordingly.

      Reviewer #2 (Recommendations for the authors):

      (1) The first part of the results and the major conclusions largely overlap with the previous paper by the same authors (Frontiers in Immunology, https://doi.org/10.3389/fimmu.2024.1431207). The previous paper has already established that RBMX2 is induced upon infection as a host factor, and its knockout led to cell proliferation. Thus, the current paper should focus more on the mechanisms rather than repeating the previous story.

      We appreciate the reviewer's careful reading and constructive feedback. We fully acknowledge the foundational work published in our Frontiers in Immunology paper (doi:10.3389/fimmu.2024.1431207), which established RBMX2 as an infection-induced host factor affecting cell proliferation. The current study represents a significant mechanistic extension of these initial findings, with the following key advances:

      (1) Novel Mechanistic Insights (Current Study Focus):

      Discovery of the p65/MMP-9 pathway as the central mechanism mediating RBMX2's effects on EMT (Figs. 4-6)

      First demonstration of RBMX2's role in epithelial barrier disruption (Figs. 2-3)

      Identification of temporal regulation patterns during infection progression (Fig. 7)

      (2) Expanded Biological Scope:

      Demonstration of RBMX2's function in both bovine and human cell systems (vs. previous bovine-only data)

      Clinical correlation with TB lesions

      Therapeutic potential assessment through pathway inhibition

      (3) Technical Advancements:

      CRISPR-based mechanistic validation (vs. previous siRNA approach)

      Multi-omics integration (transcriptomics + metabolomics)

      Advanced live-cell imaging

      We have now:

      Removed redundant proliferation data from Results

      Sharpened the Introduction to highlight mechanistic questions

      Added explicit discussion comparing both studies

      The current work provides the first comprehensive mechanistic framework for RBMX2's role in TB pathogenesis, moving substantially beyond the initial observational findings. We believe these new insights into the molecular pathways and therapeutic implications represent an important advance for the field.

      (2) Line 107-110: The CRISPR screening results are not provided. Has it been published, or is it an unpublished dataset? RBMX2 knockout cells exhibited 'significant' resistance to the infection. How significant? Data?

      Thank you for your valuable comments. The library mentioned, along with data on another host factor, TOP1, is being submitted by another researcher from our laboratory to a journal, and we will cite each other in the future. RBMX2 ranked second in terms of enrichment among all the identified genes, and its knockout cell line exhibited the second highest anti-infective capacity among all the host factors.

      (3) Line 152: The RNA-seq analysis has already been performed/reported in the previous Frontiers paper. Therein, 173 genes were found to be differentially expressed. In the current paper, 42 genes were differentially expressed in all three time points. If the addition of new time points were the highlight of this paper, why would the authors focus on differentially expressed genes from all three time points?

      Thank you for your valuable comments.

      In the newly added data, we aimed to investigate the temporal changes during Mycobacterium bovis infection of host cells.

      Previous study (Frontiers): Single 24h timepoint → 173 DEGs

      Current study: Three timepoints (0h, 24h, 48h) with 42 consistently regulated genes → Reveals temporally stable core regulators of infection response

      On one hand, we briefly described in the manuscript those important genes that exhibited changes across all time points.

      On the other hand, in the supplementary materials, we also focused on the enriched genes at each individual time point, to better understand the temporal dynamics regulated by RBMX2.
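      The distinction drawn here, between genes changed at every time point and genes enriched at a single time point, amounts to a set intersection and set differences. A minimal sketch, assuming each time point yields a set of differentially expressed gene identifiers; the function name and the gene labels are hypothetical placeholders, not genes from the study:

```python
def core_and_stage_specific(degs_by_timepoint):
    """Split DEG sets into the shared core and genes unique to each time point."""
    sets = {tp: set(genes) for tp, genes in degs_by_timepoint.items()}
    # Core: genes differentially expressed at every time point.
    core = set.intersection(*sets.values())
    unique = {}
    for tp, genes in sets.items():
        # Union of the DEGs from all other time points.
        others = set().union(*(g for other, g in sets.items() if other != tp))
        unique[tp] = genes - others
    return core, unique


# Hypothetical placeholder identifiers for illustration only.
example = {
    "0h": {"geneA", "geneB", "geneC"},
    "24h": {"geneA", "geneB", "geneD"},
    "48h": {"geneA", "geneE"},
}
core, unique = core_and_stage_specific(example)
print(sorted(core))
print({tp: sorted(genes) for tp, genes in unique.items()})
```

      Genes shared by only two of the three time points (geneB in this toy input) fall into neither group, which is one reason the per-time-point lists are worth reporting alongside the core set.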

      (4) Line 153: The '0 h' time point is in fact 2 h post-infection. Why did the authors skip the real 0h time point? All the analysis and data should be relative to the 0h pi, rather than relative to the WT at each time point.

      We appreciate the reviewer's important question regarding our timepoint nomenclature. The experimental timeline was designed as follows:

      (1) Infection Protocol:

      -2h to 0h: Bacterial co-culture (MOI 20:1)

      0h: Gentamicin (100 μg/ml) added to kill extracellular bacteria

      0h+: Monitored intracellular survival

      (2) Rationale for "0h" Designation:

      This marks the onset of the intracellular infection phase, when:

      Extracellular bacteria are eliminated (validated by plating)

      Host cell responses to intracellular pathogens begin

      All subsequent measurements reflect genuine infection (not attachment)

      (3) Technical Validation:

      Confirmed complete extracellular killing by:

      Culture supernatant plating (0 CFU after gentamicin)

      Microscopy (no surface-associated bacteria)

      (4) Comparative Analysis:

      All data are presented as:

      Fold-change relative to uninfected controls at each timepoint

      We have now:

      Clarified the timeline in Methods

      Specified "0h = post-gentamicin" in all figure legends

      This standardized approach aligns with established intracellular pathogen studies (e.g., Cell Microbiol. 2018;20:e12840). We're happy to adjust terminology if "0hpi (post-invasion)" would be clearer.

      (5) Figure 2F: The data should be compared to the 0h pi, and show the temporal changes of gene expression.

      Thank you for your suggestion. We have added additional information to this section. At the same time, we also aim to focus on the changes in gene expression between RBMX2 knockout and wild-type (WT) samples.

      We have now:

      Added temporal expression profiles relative to 0hpi baseline (SFig.4C).

      Clarified the dual normalization approach in Methods

      Maintained original between-group comparisons for phenotypic correlation
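      The two comparisons listed above can be illustrated with a short sketch: one log2 fold change tracks each genotype over time relative to its own 0 hpi baseline, and the other compares knockout with wild type at each time point. The exact normalization used in the manuscript is not described in this response, so the function names, the "WT"/"KO" labels, and the expression values are assumptions for illustration.

```python
import math


def log2_fold_change(value, reference):
    """log2 ratio of an expression value to its reference value."""
    return math.log2(value / reference)


def dual_normalization(expr, baseline="0h"):
    """expr maps genotype -> {timepoint: expression level} for one gene."""
    # Temporal profile: each genotype relative to its own baseline time point.
    temporal = {
        genotype: {tp: log2_fold_change(v, series[baseline]) for tp, v in series.items()}
        for genotype, series in expr.items()
    }
    # Between-group comparison: knockout relative to wild type at each time point.
    between = {tp: log2_fold_change(expr["KO"][tp], expr["WT"][tp]) for tp in expr["WT"]}
    return temporal, between


# Hypothetical expression values for a single gene; not data from the study.
expr = {
    "WT": {"0h": 1.0, "24h": 2.0, "48h": 4.0},
    "KO": {"0h": 1.0, "24h": 1.0, "48h": 2.0},
}
temporal, between = dual_normalization(expr)
print(temporal)
print(between)
```

      In this toy input the gene rises over time in both genotypes, yet the knockout stays one log2 unit below wild type from 24 h onward, a difference that only the between-group comparison makes visible.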

      (6) Line 207. Not all the proteins were down-regulated post-infection.

      Thank you for your comment. The overall level of tight junction-related proteins is downregulated, although it may not show a significant change at a specific time point.

      We have revised our description, changing the keyword from "All" to "Most."

      (7) Line 278, the introduction of the H1299 cell line should appear earlier when it was mentioned for the first time in the manuscript.

      Thank you for your comment. We have provided a description in the Abstract and Result 1.

      ADD:

      Abstract: Meanwhile, we also validated the EMT process in human lung epithelial cancer cells H1299.

      Result 1: Furthermore, RBMX2-silenced H1299 cells exhibited a higher survival rate compared to H1299 ShNc cells after M. bovis infection (Fig. 1H).

      (8) Figure 4 is huge and almost illegible, which may be divided into two figures.

      Thank you for your valuable comments. We have streamlined the figures and revised the description of the results section accordingly.

      Reviewer #3 (Recommendations for the authors):

      I encountered frequent grammatical and syntactic issues. Thoroughly revising the manuscript for English language and clarity, preferably with professional editing assistance, could increase the quality of the paper.

      Thank you for your valuable comments; we will invite a professional editor to polish the language.

    1. eLife Assessment

      The article presents important findings describing the role of IL27 in maintaining HSCs at steady state, and in emergency haematopoiesis in response to T. gondii by limiting the inflammatory monocyte outcomes. The evidence provided is solid and supports that IL27 acts at the level of HSCs and not downstream. This study will be of interest to immunologists and hematologists, as well as infectious disease researchers.

    2. Reviewer #1 (Public review):

      In the manuscript, Aldridge and colleagues investigate the role of IL-27 in regulating hematopoiesis during T. gondii infection. Using loss-of-function approaches, reporter mice, and the generation of serial chimeric mice, they elegantly demonstrate that IL-27 induction plays a critical role in modulating bone marrow myelopoiesis and monocyte generation to the infection site. The study is well-designed, with clear experimental approaches that effectively address the mechanisms by which IL-27 regulates bone marrow myelopoiesis and prevents HSC exhaustion. I have two minor comments that could enhance the conceptual framework of this study:

      (1) The authors indirectly show that IL-27R expression on HSPCs is necessary for regulating HSC proliferation and preventing exhaustion. However, given that they have access to IL-27RFlox mice, they could cross these with Fgd5Cre mice to specifically delete IL-27R on long-term HSCs. This would provide direct evidence for the role of IL-27 signaling in LTHSCs during infection.

      (2) Since memory T and B cells often home to the bone marrow, it would be interesting to consider the potential cross-talk between these cells, HSPCs, and IL-27 signaling during secondary T. gondii infection. A brief discussion of this possibility would strengthen the study's broader implications.

    3. Reviewer #2 (Public review):

      Aldridge et al. demonstrate the important role of IL-27 in limiting emergency myelopoiesis in response to Toxoplasma gondii infection. Interestingly, IL-27 acts specifically at the level of early haematopoietic progenitors, inducing STAT signalling, which, in this case, dampens proliferation and preserves HSC fitness.

      They used different mouse genetic models such as HSC lineage tracing, IL27 and IL27R-deficient mice to show that:

      HSCs actively participate in emergency myelopoiesis during Toxoplasma gondii infection.

      The absence of IL27 and IL27R increases monocyte progenitors and monocytes, mainly inflammatory monocytes CCR2hi.

      At steady state, loss of IL27 impairs HSC fitness as competitive transplantation shows long-term engraftment deficiency of IL27 BM cells. This impairment is exacerbated after infection.

      IL27 is produced by various BM and other tissue cells at steady state and its expression increases with infection, mainly by increasing the number of monocytes producing it.

      This article highlights a new mechanism that acts directly at the level of early hematopoietic cells to limit over-inflammation during infection.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      In the manuscript, Aldridge and colleagues investigate the role of IL-27 in regulating hematopoiesis during T. gondii infection. Using loss-of-function approaches, reporter mice, and the generation of serial chimeric mice, they elegantly demonstrate that IL-27 induction plays a critical role in modulating bone marrow myelopoiesis and monocyte generation to the infection site. The study is well-designed, with clear experimental approaches that effectively address the mechanisms by which IL-27 regulates bone marrow myelopoiesis and prevents HSC exhaustion.

      Reviewer #2 (Public review):

      Summary:

      Aldridge et al. aim to demonstrate the role of IL27 in limiting emergency myelopoiesis in response to Toxoplasma gondii infection by acting directly at the level of early haematopoietic progenitors.

      They used different mouse genetic models, such as HSC lineage tracing, IL27 and IL27R-deficient mice, to show that:

      (1) HSCs actively participate in emergency myelopoiesis during Toxoplasma gondii infection.

      (2) The absence of IL27 and IL27R increases monocyte progenitors and monocytes, mainly inflammatory monocytes CCR2hi.

      (3) At steady state, loss of IL27 impairs HSC fitness as competitive transplantation shows long-term engraftment deficiency of IL27 BM cells. This impairment is exacerbated after infection.

      (4) IL27 is produced by various BM and other tissue cells at steady state, and its expression increases with infection, mainly by increasing the number of monocytes producing it.

      Although it is indisputable that IL27 has a role in emergency myelopoiesis by limiting the number of proinflammatory monocytes in response to infection, the authors' claim that it acts only on HSCs and not on more committed progenitors (CMP, GMP, MP) is not supported by the quality of the data presented here, as described below in the weakness section. In addition, this study highlights a role for IL27 during infection, but does not focus on trained immunity, which is the focus of the targeted elife issue.

      We thank the reviewer for these comments. We did try (and perhaps failed) to highlight that all cells within the HSPC category, which includes HSCs and MPPs, have the potential to contribute. The lack of IRGM1-RFP reporter expression in CMPs (Supp. Fig. 5C) suggests that only HSCs and MPPs are progenitors that respond to IL-27 within the bone marrow, and thus that IL-27 signaling on these contributes to the effects observed on monopoiesis and peripheral monocyte populations. We have emphasized this in the revised manuscript, particularly in the introduction (line 82) and discussion (lines 469-472). While this manuscript does not focus solely on trained immunity, the impacts of infection regulating HSC differentiation and having a long-term impact on this compartment are a central theme of trained immunity. For example, Figure 6 and the supporting supplemental figures almost exclusively focus on the differentiation potential that is programed into LTHSCs by infection and the role of IL-27 in regulating this programing. Additionally, Figure 7 shows the long-term consequences of such training. The introduction and discussion have been modified to emphasize these connections to trained immunity.

      Weakness

      (1) In Figure 4, MFI quantification is required. This figure also shows the expression level (FACS and RNA) in progenitors (GMP and CMP, GP, MP), which is quite similar to that of HSC at this level, so it is really surprising that CMP does not respond at all to IL27 (S5C).

      As requested, we have included the MFIs, calculated as a fold change over control FMOs, in the revised manuscript. While HSPCs and CMPs show relatively similar RNA expression of Il27ra (Supp. Fig. 5A), the level of surface IL-27R expression by CMPs is lower than that of HSPCs (Fig. 4C, revised). Additional downstream progenitors (including GMPs) show highly reduced RNA expression and a corresponding low expression of the receptor protein. This is now more apparent with the quantified MFIs (Fig. 4-5).

      (2) Total BM was used to test the direct effect of IL27 on HSC. There could be an indirect effect from other more mature BM cells, even if they show lower receptor expression than HSC. This should be done on a different sorted population to prove the direct effect of IL27 on HSC. The authors need to look more closely at some stat-dependent genes or stat itself in different sorted cell populations, not just irgm1. It is also known that Stat is associated with increased HSC proliferation in response to IFN, which is the opposite of what is observed here.

      We thank the reviewer for this question. We have found that the methanol fixation required to detect pSTAT disrupted the ability to stain for HSPCs by flow cytometry. Thus, we used the IRGM1 reporter, which we have found to be a sensitive and high-fidelity reporter of STAT1 activity while preserving epitope markers of HSPCs.

      We agree that the use of bulk bone marrow in the in vitro stimulations could allow for the activation of non-HSPC cell types that are IL-27R+. This is now emphasized in the text. However, there are advantages to this bulk approach as it allows simultaneous analysis of all HSPC populations and downstream progenitors in the same cultures, allowing the ability to assess how the small numbers of IL-27R expressing lymphocytes present in these cultures respond (data that are now included, Supp. Fig. 5C). These cultures also allow a direct comparison of our IL-27R expression analysis with responsiveness to IL-27. Only a selection of the populations analyzed are shown in these data; however, all populations in Figure 4A were also analyzed in Supp. Fig. 5C. These data sets directly correlate receptor expression with sensitivity to IL-27. If this effect were indirect (i.e., the ability of IL-27 to induce IFN-γ) then we would expect more robust expression of the IRGM1 reporter across other cell populations. However, while IFN-γ stimulates broad expression of IRGM1, the effects of IL-27 are restricted to HSPC and mature lymphocytes (Supp. Fig. 5C). In other words, the cells that express the highest levels of the IL-27R are most responsive to IL-27.

      While we do not directly measure HSPC proliferation in these cultures, we agree with the reviewer that the decreased proportions of proliferating HSPCs seen in the absence of IL-27 during infection (Fig. 7A) is a complex data set. The reviewer is also correct that interferons can promote HSC proliferation; however, they can also promote cell stress, DNA damage, and even cell death of HSCs during chronic exposure (reviewed extensively in Demerdash, Y., et al. Exp Hematol. 2021. PMID: 33571568). Thus IFNs, much like IL-27, appear to regulate HSPCs with contextual importance, inducing their proliferation but also death. The activation of STAT1 and STAT3 by IL-27 may be at the core of some of these effects observed in our data, and we point out that IL-10, another activator of STAT1+3, has been shown to limit HSC responses to inflammation (lines 58-62), but we have also presented other possibilities in the discussion.

      (3) The decrease in HSC fitness in IL27R KO at steady state could be an indirect effect of the increase in proinflammatory monocytes contributing to high levels of inflammatory cytokines in the BM and thus chronic HSC activation that is enhanced in response to infection. What is the pro-inflammatory cytokine profile of the BM of IL27 or IL27R deficient mice and of mixed chimera mice?

      We thank the reviewer for this insightful comment. This was part of our stated rationale in generating the mixed WT:IL-27R-/- BM chimeras presented in Figure 2. In this mixed setting, there remained differences between the ability of the IL-27R sufficient and deficient stem cells to generate inflammatory macrophages. These results suggest that differences in the inflammatory environment do not account for the differences observed. This conclusion is further supported by the observation that the infection-induced levels of IFN-γ in the bone marrow are equivalent in the presence or absence of IL-27 (now included in the revised manuscript, Supp. Fig. 1F).

      (4) Furthermore, the FACS profile of Ki67/BrdU of Figure 7 is doubtful, as it is shown in different literature that KSL are not predominantly quiescent as shown here, but about 50% are Ki67-. This is also inconsistent with the increase of HSC observed in Figure 1. Quantification of total BrdU+ HSC and other progenitors is also important to quantify all cells that have proliferated during infection. As the repopulation of IL27-deficient BM is also lower in the absence of infection, the proliferation of HSC in IL27R KO mice in the absence of infection is also important.

      The comment indicates that the reviewer is concerned that our staining for Ki67 is on the low end of reported literature (~10-50% of LSKs, depending on age of the mice and stimulation (Thapa R, et al. Stem Cell Res Ther. 2023. PMID: 37280691; Nies KPH, et al. Cytometry A. 2018. PMID: 30176186)). Our stains were performed on cells from infected mice, which does alter the classic markers used to identify HSPCs. For this reason, we are stringent with our gating strategy and may be excluding more HSPCs than are included in other reports. We have included our FMO control in the revised manuscript to indicate our gating approach (Supp. Fig. 9A). While the population of Ki67+ HSPCs is low, these results were consistent between our experiments and provide data sets that are interpretable.

      (5) The immunofluorescence in Figure 3 shows a high level of background and it is difficult to see the GFP and tomato positive cells. In this sense, the number of HSCs quantified as Procr+ (more than 8000 on a single BM section) is inconsistent with the total number of HSCs that a BM can contain (i.e., around 6000 per BM as quantified in Figure 1).

      We agree with the reviewer and have found that there is a high level of background in these stains. We have thresholded these images, as described in our methods, to minimize this. Additionally, the increased numbers of Procr+ cells in the imaging vs our flow data is expected, and has been reported by others (Steinert, EM, et al. Cell. 2015. PMID: 25957682).

      (6) The addition of arrows to the figure will help to visualise positive cells. It is also not clear why the author normalised the GFP+ cells to the tomato+ cells in Figure 3D.

      We thank the reviewer for this comment and have added the suggested arrows. We have also included a more detailed explanation for our normalization strategy.

      (7) Furthermore, even if monocytes represent a high proportion of IL27-producing cells, they are only 50% of the cells at 5dpi, as shown in Figure 3 and S4. Without other monocyte markers, line 307 is incorrect.

      We thank the reviewer for this clarification and have adjusted the text accordingly.

      (8) How do the authors explain that in Figure 1, 5-10% of labelled precursors and monocytes can give 100% of monocytes? This would mean that only labelled HSC can differentiate into PEC monocytes.

      We thank the reviewer for their interest in this result. Monocytes and macrophages are some

      Reviewer #1 (Recommendations for the authors):

      I have two minor comments that could enhance the conceptual framework of this study:

      (1) The authors indirectly show that IL-27R expression on HSPCs is necessary for regulating HSC proliferation and preventing exhaustion. However, given that they have access to IL-27RFlox mice, they could cross these with Fgd5Cre mice to specifically delete IL-27R on long-term HSCs. This would provide direct evidence for the role of IL-27 signaling in LTHSCs during infection.

      We appreciate this comment and did attempt this experiment with several HSPC-specific Cres, including the Procr-cre (used elsewhere in the manuscript) and the MDS1-cre-ERT2 (Jackson Laboratory Strain #:032863). Unfortunately, validation revealed that deletion of the IL-27R with these HSC-specific Cre lines was inefficient, and so experiments are ongoing to enhance efficiency of the deletion and test alternative Cre lines (such as the Fgd5-cre).

      (2) Since memory T and B cells often home to the bone marrow, it would be interesting to consider the potential cross-talk between these cells, HSPCs, and IL-27 signaling during secondary T. gondii infection. A brief discussion of this possibility would strengthen the study's broader implications.

      We thank the reviewer for this opportunity. We have previously investigated the interplay between immune cells in the bone marrow (Glatman Zaretsky A, et al. Cell Rep. 2017. PMID: 28228257) and now include these possibilities in the discussion (line 465-470).

      Reviewer #2 (Recommendations for the authors):

      Minor points:

      (1) Figures 6F and 7B: should be shown as % of donor and not total number to clarify the lineage potency of LTHSC. The fact that the results of transplantation are separated into different figures makes it not easy to follow. To see if the increase in monocyte production by IL27 KO BM is specific, the percent of donor-derived cells for other populations, such as lymphoid, but also in MP, and inflammatory monocytes, is necessary to confirm Figure 2.

      Perhaps there has been a misunderstanding? In these plots, we are not analyzing mixed chimeras but single transfer chimeras into lethally irradiated hosts. Thus, the % of donor reaches ~80-90%. However, to measure the actual output of the HSPCs, the cell number was necessary to compare amongst groups. Additional description is provided in the figure legends and in the text of the manuscript (lines 391-392, 434-436, 651-653, and 680-682).

      (2) The heavy UMAP description is unnecessary.

      As requested, we have reduced this description of how the UMAPs were derived.

    1. eLife Assessment

      This important study describes the effect of beta-glucan innate training of macrophages and its effect on uptake of tumour cells and on the production of inflammatory cytokines. The data are convincing and show decreased phagocytic activity of apoptotic tumour cells accompanied by lower levels of secreted IL-1β, and in vivo findings are also provided in the revision. This finding has potential impact on designing potential macrophage-targeted cancer immuno-therapeutic approaches.

    2. Reviewer #1 (Public review):

      Summary:

      The authors were attempting to describe whether trained innate immunity would modulate antibody-dependent cellular phagocytosis (ADCP) and/or efferocytosis.

      Strengths:

      The use of primary murine macrophages, and not a cell line, is considered a strength.

      The trained immunity-mediated changes to phagocytosis affected both melanoma and breast cancer cells. The broad effect is consistent with trained immunity.

      In this revised manuscript, the authors now include in vivo data to show in vivo relevance.

      Weaknesses:

      There are many types of cancers so it would be helpful to focus the title more for the types of cancers included in the present study, the most relevant of course would be the type of cancer used for the in vivo model.

    3. Reviewer #3 (Public review):

      Summary:

      Chatzis et al showed that β-glucan trained macrophages have decreased phagocytic activity of apoptotic tumor cells and that is accompanied by lower levels of secreted IL-1β using a mouse model.

      Strengths:

      This finding has potential impact on designing new cancer immunotherapeutic approaches by targeting macrophage efferocytosis.

      The concerns have been addressed.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors were attempting to describe whether trained innate immunity would modulate antibody-dependent cellular phagocytosis (ADCP) and/or efferocytosis.

      Strengths:

      The use of primary murine macrophages, and not a cell line, is considered a strength. The trained immunity-mediated changes to phagocytosis affected both melanoma and breast cancer cells. The broad effect is consistent with trained immunity.

      Weaknesses:

      The most significant weakness, also noted by the authors in the discussion, is the lack of in vivo data. Without these data, it is not possible to put the in vitro data in context. It is unknown if the described effects on efferocytosis will be relevant to the in vivo progression of cancer.

      We thank the reviewer for these comments. To examine the role of trained immunity on the modulation of macrophage efferocytosis in vivo, we performed immunostaining analysis in sections from B16F10 tumour samples.

      Importantly, we found that macrophage efferocytosis of apoptotic tumour cells was significantly decreased in the tumour tissue that was excised from mice treated with β-glucan 7 days prior to tumour inoculation (supplementary Figure 3). These data are consistent with our findings using co-culture assays further strengthening the impact of our key findings in this report.

      Reviewer #2 (Public review):

      Summary:

      The authors follow up their preclinical work on beta-glucan-induced trained immunity in murine tumor models that they published in Cell in 2020. In particular, they focus on the role of trained immunity and efferocytosis of cancer cells.

      Strengths:

      While properly conducted, the work is underwhelming and fully depends on in vitro observations performed with co-cultures of bone marrow derived macrophages from beta-glucan-treated mice and tumor cell lines. From these in vitro studies, the authors conclude that trained immunity induction has no effect on antibody-dependent cellular phagocytosis, while it decreases efferocytosis.

      Weaknesses:

      It would be important to study these phenomena in tumor mouse models in vivo. The authors clearly have the expertise, as they have shown in previous studies. This is especially so because the in vitro observation appears to conflict with the in vivo anti-tumor effect found in mice prophylactically treated with beta-glucan. Clearly, trained immunity is associated with diverse cellular responses and mechanisms, some of which may promote tumor growth, as the current manuscript suggests, but in the absence of in vivo studies, it is merely a mechanistic exercise of which the relevance is difficult to determine.

      We thank the reviewer for raising this important comment. We have followed the reviewer’s suggestion and examined the role of trained immunity on the modulation of macrophage efferocytosis in vivo. As mentioned in our response to Reviewer 1, we demonstrate that efferocytosis of apoptotic melanoma cells in situ was attenuated in tumour samples from ‘trained’ mice as compared to those from control-treated mice.

      Efferocytosis displays a pro-tumour and immunosuppressive role, therefore both our in vitro co-culture (Figure 1) and in vivo (supplementary Figure 3) findings are consistent with our previously published in vivo data supporting the tumour-suppressive role of prophylactic treatment with β-glucan (Kalafati, Kourtzelis et al, PMID: 33125892). 

      Reviewer #3 (Public review):

      Summary:

      Chatzis et al showed that β-glucan trained macrophages have decreased phagocytic activity of apoptotic tumor cells and that is accompanied by lower levels of secreted IL-1β using a mouse model.

      Strengths:

      This finding has a potential impact on designing new cancer immunotherapeutic approaches by targeting macrophage efferocytosis.

      Weaknesses:

      Whether this finding could be applied to other scenarios is underdetermined.

      (1) Does the decrease of efferocytosis also occur in human monocytes/macrophages after training?

      (2) Both β-glucan and BCG are well-known trained innate immunity agents. The authors showed that β-glucan decreased efferocytosis via IL-1β, so it is interesting to know whether BCG has a similar effect.

      We thank the reviewer for these comments. Our data suggest that induction of trained immunity with β-glucan contributes to decreased macrophage efferocytosis of tumour cells based on co-culture and in vivo approaches in a mouse setting.  

      We agree with the reviewer that utilisation of a human setting would be important to provide additional validation of our findings.

      Induction of trained immunity entails epigenetic and metabolic reprogramming of hematopoietic stem and progenitor cells (HSPCs). As such, the elucidation of mechanisms that modulate trained immunity in human cells would require the establishment of a macrophage differentiation model based on the use of HSPCs rather than the stimulation of monocytes or macrophages with β-glucan.

      Additionally, the investigation of the impact of BCG in trained immunity-dependent phagocytosis would require the assessment of all different types of phagocytic cargos (apoptotic melanoma and breast cancer cells, apoptotic neutrophils, microbial bioparticles) as we did in the case of the β-glucan. The capacity of different molecules to induce trained immunity in the efferocytosis setting requires further investigation that would be beyond the scope of this study. Therefore, we plan to address these very interesting points in a future study.

      Additional text was added in the Discussion section to clarify the reviewer's points. In addition, we provide a more specific title that reflects better the specificity of our findings.

    1. eLife Assessment

      The manuscript provides important findings on how striatal projection neurons regulate spontaneous locomotion speed in the context of implicit motivation and distinct contextual valence. The manuscript presented convincing supporting evidence for the findings. This work will be of broad interest to neuroscientists in the fields of basal ganglia, movement control, and cognition.

    2. Reviewer #1 (Public review):

      Summary:

      This fundamental work employed multidisciplinary approaches and conducted rigorous experiments to study how a specific subset of neurons in the dorsal striatum (i.e., "patchy" striatal neurons) modulates locomotion speed depending on the valence of naturalistic contexts.

      Strengths:

      The scientific findings are novel and original and significantly advance our understanding of how the striatal circuit regulates spontaneous movement in various contexts.

      Weaknesses:

      This is extensive research involving various circuit manipulation approaches. Some of these circuit manipulations are not physiological. This is discussed.

    3. Reviewer #2 (Public review):

      Hawes et al. investigated the role of striatal neurons in the patch compartment of the dorsal striatum. Using Sepw1-Cre line, the authors combined a modified version of the light/dark transition box test that allows them to examine locomotor activity in different environmental valence with a variety of approaches, including cell-type-specific ablation, miniscope calcium imaging, fiber photometry, and opto-/chemogenetics. First, they found ablation of patchy striatal neurons resulted in an increase in movement vigor when mice stayed in a safe area or when they moved back from more anxiogenic to safe environments. The following miniscope imaging experiment revealed that a larger fraction of striatal patchy neurons was negatively correlated with movement speed, particularly in an anxiogenic area. Next, the authors investigated differential activity patterns of patchy neurons' axon terminals, focusing on those in GPe, GPi, and SNr, showing that the patchy axons in SNr reflect movement speed/vigor. Chemogenetic and optogenetic activation of these patchy striatal neurons suppressed the locomotor vigor, thus demonstrating their causal role in the modulation of locomotor vigor when exposed to valence differentials. Unlike the activation of striatal patches, such a suppressive effect on locomotion was absent when optogenetically activating matrix neurons by using the Calb1-Cre line, indicating distinctive roles in the control of locomotor vigor by striatal patch and matrix neurons. Together, they have concluded that nigrostriatal neurons within striatal patches negatively regulate movement vigor, dependent on behavioral contexts where motivational valence differs.

      The strengths of this work include the use of multiple experimental approaches, including genetic/viral ablation of patch neurons, miniscope single-cell imaging, as well as projection-specific recording of axonal activity by fiber photometry, and causal manipulation of the neurons by chemogenetics and optogenetics. Although similar findings were reported previously, the authors' results will be of value owing to multiple levels of investigation. In my view, this study will add to the important literature by demonstrating how patch (striosomal) neurons in the striatum control movement vigor.

    4. Reviewer #3 (Public review):

      Hawes et al. combined behavioral, optical imaging, and activity manipulation techniques to investigate the role of striatal patch SPNs in locomotion regulation. Using Sepw1-Cre transgenic mice, they found that patch SPNs encode locomotion deceleration in a light-dark box procedure through optical imaging techniques. Moreover, genetic ablation of patch SPNs increased locomotion speed, while chemogenetic activation of these neurons decreased it. The authors concluded that a subtype of patch striatonigral neurons modulates locomotion speed based on external environmental cues.

      In the revision, the authors have largely addressed my concerns with additional explanation and discussion, although some of the key experiments to strengthen the authors' claim by identifying the function of specific cell populations remain to be conducted due to technical challenges. Nevertheless, the current results remain valuable and interesting to a wide audience in the field.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review): 

      Summary:

      This fundamental work employed multidisciplinary approaches and conducted rigorous experiments to study how a specific subset of neurons in the dorsal striatum (i.e., "patchy" striatal neurons) modulates locomotion speed depending on the valence of the naturalistic context. 

      Strengths: 

      The scientific findings are novel and original and significantly advance our understanding of how the striatal circuit regulates spontaneous movement in various contexts.

      We appreciate the reviewer’s positive evaluation.

      Weaknesses: 

      This is extensive research involving various circuit manipulation approaches. Some of these circuit manipulations are not physiological. A balanced discussion of the technical strengths and limitations of the present work would be helpful and beneficial to the field. Minor issues in data presentation were also noted. 

      We have incorporated the recommended discussion of technical limitations and addressed the physiological plausibility of our manipulations on Page 33 of the revised Discussion section. Specifically, we wrote: 

      “Judicious interpretation of the present data must consider the technical limitations of the various methods and circuit-level manipulations applied. Patchy neurons are distributed unevenly across the extensive structure of the striatum, and their targeted manipulation is constrained by viral spread in the dorsal striatum. Somatic calcium imaging using single-photon microscopy captures activity from only a subset of patchy neurons within a narrow focal plane beneath each implanted GRIN lens. Similarly, limitations in light diffusion from optical fibers may reduce the effective population of targeted fibers in both photometry and optogenetic experiments. For example, the more modest locomotor slowing observed with optogenetic activation of striatonigral fibers in the SNr compared to the stronger effects seen with Gq-DREADD activation across the dorsal striatum could reflect limited fiber optic coverage in the SNr. Alternatively, it may suggest that non-striatonigral mechanisms also contribute to generalized slowing. Our photometry data do not support a role for striatopallidal projections from patchy neurons in movement suppression. The potential contribution of intrastriatal mechanisms, discussed earlier, remains to be empirically tested. Although the behavioral assays used were naturalistic, many of the circuit-level interventions were not. Broad ablation or widespread activation of patchy neurons and their efferent projections represent non-physiological manipulations. Nonetheless, these perturbation results are interpreted alongside more naturalistic observations, such as in vivo imaging of patchy neuron somata and axon terminals, to form a coherent understanding of their functional role”.

      Reviewer #2 (Public review):

      Hawes et al. investigated the role of striatal neurons in the patch compartment of the dorsal striatum. Using Sepw1-Cre line, the authors combined a modified version of the light/dark transition box test that allows them to examine locomotor activity in different environmental valence with a variety of approaches, including cell-type-specific ablation, miniscope calcium imaging, fiber photometry, and opto-/chemogenetics. First, they found ablation of patchy striatal neurons resulted in an increase in movement vigor when mice stayed in a safe area or when they moved back from more anxiogenic to safe environments. The following miniscope imaging experiment revealed that a larger fraction of striatal patchy neurons was negatively correlated with movement speed, particularly in an anxiogenic area. Next, the authors investigated differential activity patterns of patchy neurons' axon terminals, focusing on those in GPe, GPi, and SNr, showing that the patchy axons in SNr reflect movement speed/vigor. Chemogenetic and optogenetic activation of these patchy striatal neurons suppressed the locomotor vigor, thus demonstrating their causal role in the modulation of locomotor vigor when exposed to valence differentials. Unlike the activation of striatal patches, such a suppressive effect on locomotion was absent when optogenetically activating matrix neurons by using the Calb1-Cre line, indicating distinctive roles in the control of locomotor vigor by striatal patch and matrix neurons. Together, they have concluded that nigrostriatal neurons within striatal patches negatively regulate movement vigor, dependent on behavioral contexts where motivational valence differs.

      We are grateful for the reviewer’s thorough summary of our main findings.

      In my view, this study will add to the important literature by demonstrating how patch (striosomal) neurons in the striatum control movement vigor. This study has applied multiple approaches to investigate their functionality in locomotor behavior, and the obtained data largely support their conclusions. Nevertheless, I have some suggestions for improvements in the manuscript and figures regarding their data interpretation, accuracy, and efficacy of data presentation.

      We appreciate the reviewer’s overall positive assessment and have made substantial improvements to the revised manuscript in response to reviewers’ constructive suggestions.

      (1) The authors found that the activation of the striatonigral pathway in the patch compartment suppresses locomotor speed, which contradicts the canonical role of the direct pathway. It would be great if the authors could provide mechanistic explanations in the Discussion section. One possibility is that striatal D1R patch neurons directly inhibit dopaminergic cells that regulate movement vigor (Nadel et al., Sci. Rep., 2021; Okunomiya et al., J Neurosci., 2025). Providing plausible explanations will help readers infer possible physiological processes and give them ideas for future follow-up studies.

      We have added the recommended data interpretation and future perspectives on Page 30 of the revised Discussion section. Specifically, we wrote:

      “Potential mechanisms by which striatal patchy neurons reduce locomotion involve the suppression of dopamine availability within the striatum. Dopamine, primarily supplied by neurons in the SNc and VTA, broadly facilitates locomotion (Gerfen and Surmeier 2011, Dudman and Krakauer 2016). Recent studies have shown that direct activation of patchy neurons leads to a reduction in striatal dopamine levels, accompanied by decreased walking speed (Nadel, Pawelko et al. 2021, Dong, Wang et al. 2025, Okunomiya, Watanabe et al. 2025). Patchy neuron projections terminate in structures known as “dendron bouquets”, which enwrap SNc dendrites within the SNr and can pause tonic dopamine neuron firing (Crittenden, Tillberg et al. 2016, Evans, Twedell et al. 2020). The present work highlights a role for patchy striatonigral inputs within the SN in decelerating movement, potentially through GABAergic dendron bouquets that limit dopamine release back to the striatum (Dong, Wang et al. 2025). Additionally, intrastriatal collaterals of patch spiny projection neurons (SPNs) have been shown to suppress dopamine release and associated synaptic plasticity via dynorphin-mediated activation of kappa opioid receptors on dopamine terminals (Hawes, Salinas et al. 2017). This intrastriatal mechanism may further contribute to the reduction in striatal dopamine levels and the observed decrease in locomotor speed, representing a compelling avenue for future investigation.”

      (2) On page 14, Line 301, the authors stated that "Cre-dependent mCherry signals were colocalized with the patch marker (MOR1) in the dorsal striatum (Fig. 1B)". But I could not find any mCherry on that panel, so please modify it.

      We have included representative images of mCherry and MOR1 staining in Supplementary Fig. S1 of the revised manuscript.

      (3) From data shown in Figure 1, I've got the impression that mice ablated with striatal patch neurons were generally hyperactive, but this is probably not the case, as two separate experiments using LLbox and DDbox showed no difference in locomotor vigor between control and ablated mice. For the sake of better interpretation, it may be good to add a statement in Lines 365-366 that these experiments suggest the absence of hyperactive locomotion in general by ablating these specific neurons.

      As suggested by the reviewer, we have added the following statement on Page 17 of the revised manuscript: “These data also indicate that PA elevates valence-specific speed without inducing general hyperactivity”.

      (4) In Line 536, where Figure 5A was cited, the authors mentioned that they used inhibitory DREADDs (AAV-DIO-hM4Di-mCherry), but I could not find associated data in Figure 5. Please cite Figure S3 accordingly.

      We have added the citation for the now Fig. S4 on Page 25 of the revised manuscript.

      (5) Personally, the Figure panel labels of "Hi" and "ii" were confusing at first glance. It would be better to have alternatives.

      As suggested by the reviewer, we have now labeled each figure panel with a distinct single alphabetical letter.

      (6) There is a typo on Figure 4A: tdTomata → tdTomato

      We have made the correction on the figure.

      Reviewer #3 (Public review):

      Hawes et al. combined behavioral, optical imaging, and activity manipulation techniques to investigate the role of striatal patch SPNs in locomotion regulation. Using Sepw1-Cre transgenic mice, they found that patch SPNs encode locomotion deceleration in a light-dark box procedure through optical imaging techniques. Moreover, genetic ablation of patch SPNs increased locomotion speed, while chemogenetic activation of these neurons decreased it. The authors concluded that a subtype of patch striatonigral neurons modulates locomotion speed based on external environmental cues. Below are some major concerns:

      The study concludes that patch striatonigral neurons regulate locomotion speed. However, unless I missed something, very little evidence is presented to support the idea that it is specifically striatonigral neurons, rather than striatopallidal neurons, that mediate these effects. In fact, the optogenetic experiments shown in Fig. 6 suggest otherwise. What about the behavioral effects of optogenetic stimulation of striatonigral versus striatopallidal neuron somas in Sepw1-Cre mice?

      Our photometry data implicate striatonigral neurons in locomotor slowing, as evidenced by a negative cross-correlation with acceleration and a negative lag, indicating that their activity reliably precedes—and may therefore contribute to—deceleration. In contrast, photometry results from striatopallidal neurons showed no clear correlation with speed or acceleration.
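The lag argument above can be made concrete with a minimal sketch. It is not the authors' analysis pipeline: the function name `peak_lag`, the sign convention, and the synthetic signals are assumptions chosen for illustration. Under this convention, a negative lag means the neural signal leads the behavioral one, and a negative correlation at that lag means higher activity is followed by lower acceleration (deceleration).

```python
import numpy as np

def peak_lag(neural, behavior, max_lag):
    """Return (lag, r) at the largest-magnitude normalized cross-correlation.

    r(k) is the mean of z(neural)[t + k] * z(behavior)[t].
    A negative lag k therefore means `neural` leads `behavior` by |k| samples.
    (Hypothetical helper; the sign convention is an assumption of this sketch.)
    """
    x = (neural - neural.mean()) / neural.std()
    y = (behavior - behavior.mean()) / behavior.std()
    n = len(x)
    lags = np.arange(-max_lag, max_lag + 1)
    r = np.empty(len(lags))
    for i, k in enumerate(lags):
        if k < 0:
            # earlier neural samples paired with later behavior samples
            r[i] = np.mean(x[:n + k] * y[-k:])
        else:
            # later neural samples paired with earlier behavior samples
            r[i] = np.mean(x[k:] * y[:n - k])
    best = np.argmax(np.abs(r))
    return int(lags[best]), float(r[best])

# Synthetic example: "acceleration" is an inverted copy of the neural
# signal delayed by 5 samples, plus a little noise.
rng = np.random.default_rng(0)
neural = rng.standard_normal(2000)
accel = -np.roll(neural, 5) + 0.1 * rng.standard_normal(2000)
lag, r = peak_lag(neural, accel, max_lag=20)
print(lag, round(r, 3))
```

With real recordings, the lag in samples would be divided by the sampling rate to express it in seconds, and a lead in time would still not establish causation by itself, which is why the response phrases the contribution as "may".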

      Figure 6 demonstrates that optogenetic manipulation within the SNr of Sepw1-Cre+ striatonigral axons recapitulated context-dependent locomotor changes seen with Gq-DREADD activation of both striatonigral and striatopallidal Sepw1-Cre+ cells in the dorsal striatum but failed to produce the broader locomotor speed change observed when targeting all Sepw1-Cre+ cells in the dorsal striatum using either ablation or Gq-DREADD activation. The more subtle speed-restrictive phenotype resulting from ChR activation in the SNr could, as the reviewer suggests, implicate striatopallidal neurons in broad locomotor speed regulation. However, our photometry data indicate that this scenario is unlikely, as activity of striatopallidal Sepw1-Cre+ fibers is not correlated with locomotor speed. Another plausible explanation is that the optogenetic approach may have affected fewer striatonigral fibers, potentially due to the limited spatial spread of light from the optical fiber within the SNr. Broad locomotor speed change in LDbox might require the recruitment of a larger number of striatonigral fibers than we were able to manipulate with optogenetics. We have added discussion of these technical limitations to the revised manuscript. Additionally, we now discuss the possibility that intrastriatal collaterals may contribute to reduced local dopamine levels by releasing dynorphin, which acts on kappa opioid receptors located on dopamine fibers (Hawes, Salinas et al. 2017), thereby suppressing dopamine release.

      The reviewer also suggests an interesting experiment involving optogenetic stimulation of striatonigral versus striatopallidal somata in Sepw1-Cre mice. While we agree that this approach would yield valuable insights, we have thus far been unable to achieve reliable results using retroviral vectors. Moreover, selectively targeting striatopallidal terminals optogenetically remains technically challenging, as striatonigral fibers also traverse the pallidum, and the broad anatomical distribution of the pallidum complicates precise targeting. This proposed work will need to be pursued in a future study, either with improved retrograde viral tools or the development of additional mouse lines that offer more selective access to these neuronal populations as we documented recently (Dong, Wang et al. 2025).

      In the abstract, the authors state that patch SPNs control speed without affecting valence. This claim seems to lack sufficient data to support it. Additionally, speed, velocity, and acceleration are very distinct qualities. It is necessary to clarify precisely what patch neurons encode and control in the current study.

      We believe the reviewer’s interpretation pertains to a statement in the Introduction rather than the Abstract: “Our findings reveal that patchy SPNs control the speed at which mice navigate the valence differential between high- and low-anxiety zones, without affecting valence perception itself.” Throughout our study, mice consistently preferred the dark zone in the Light/Dark box, indicating intact perception of the valence differential between illuminated areas. While our manipulations altered locomotor speed, they did not affect time spent in the dark zone, supporting the conclusion that valence perception remained unaltered. We appreciate the reviewer’s insight and agree it is an intriguing possibility that locomotor responses could, over time, influence internal states such as anxiety. We addressed this in the Discussion, noting that while dark preference was robust to our manipulations, future studies are warranted to explore the relationship between anxious locomotor vigor and anxiety itself. We report changes in scalar measures of animal speed across Light/Dark box conditions and under various experimental manipulations. Separately, we show that activity in both patchy neuron somata and striatonigral fibers is negatively correlated with acceleration—indicating a positive correlation with deceleration. Notably, the direction of the cross-correlational lag between striatonigral fiber activity and acceleration suggests that this activity precedes and may causally contribute to mouse deceleration, thereby influencing reductions in speed. To clarify this, we revised a sentence in the Results section:

      “Moreover, patchy neuron efferent activity at the SNr may causally contribute to deceleration, as indicated by the negative cross-correlational lag, thereby reducing animal speed.” We also updated the Discussion to read: “Together, these data specifically implicate patchy striatonigral neurons in slowing locomotion by acting within the SNr to drive deceleration.”

      One of the major results relies on chemogenetic manipulation (Figure 5). It would be helpful to demonstrate through slice electrophysiology that hM3Dq and hM4Di indeed cause changes in the activity of dorsal striatal SPNs, as intended by the DREADD system. This would support both the positive (Gq) and negative (Gi) findings, where no effects on behavior were observed.

      We were unable to perform this experiment; however, hM3Dq has previously been shown to be effective in striatal neurons (Alcacer, Andreoli et al. 2017). The lack of effect observed in Gi-DREADD mice serves as an unintended but valuable control, helping to rule out off-target effects of the DREADD agonist JHU37160 and thereby reinforcing the specificity of hM3Dq-mediated activation in our study. We have now included an important caveat regarding the Gi-DREADD results, acknowledging the possibility that they may not have worked effectively in our target cells:

      “Potential explanations for the negative results in Gi-DREADD mice include inherently low basal activity among patchy neurons or insufficient expression of GIRK channels in striatal neurons, which may limit the effectiveness of Gi coupling in suppressing neuronal activity (Shan, Fang et al. 2022).”

      Finally, could the behavioral effects observed in the current study, resulting from various manipulations of patch SPNs, be due to alterations in nigrostriatal dopamine release within the dorsal striatum?

      We agree that this is an important potential implication of our work, especially given that we and others have shown that patchy striatonigral neurons provide strong inhibitory input to dopaminergic neurons involved in locomotor control (Nadel, Pawelko et al. 2021, Lazaridis, Crittenden et al. 2024, Dong, Wang et al. 2025, Okunomiya, Watanabe et al. 2025). Accordingly, we have expanded the discussion section to include potential mechanistic explanations that support and contextualize our main findings.

      Reviewer #1 (Recommendations for the authors):

      Here are some minor issues for the authors' reference:

      (1) This work supports the motor-suppressing effect of patchy SPNs, and >80% of them are direct pathway SPNs. This conclusion is not expected from the traditional basal ganglia direct/indirect pathway model. Most experiments were performed using nonphysiological approaches to suppress (i.e., ablation) or activate (i.e., continuous chemo-optogenetic stimulation). It remains uncertain if the reported observations are relevant to the normal biological function of patchy SPNs under physiological conditions. Particularly, under what circumstances an imbalanced patch/matrix activity may be induced, as proposed in the sections related to the data presented in Figure 6. A thorough discussion and clarification remain needed. Or it should be discussed as a limitation of the present work.

      We have added discussion and clarification of physiological limitations in response to reviewer feedback. Additionally, we revised the opening sentence of an original paragraph in the discussion section to emphasize that it interprets our findings in the context of more physiological studies reporting natural shifts in patchy SPN activity due to cognitive conflict, stress, or training. The revised opening sentence now reads: “Together with previous studies of naturally occurring shifts in patchy neuron activation, these data illustrate ethologically relevant roles for a subgroup of genetically defined patchy neurons in behavior.”

      (2) Lines 499-500: How striatonigral cells encode speed and deceleration deserves a thorough discussion and clarification. These striatonigral cells can target both SNr GABAergic neurons and dendrites of the dopaminergic neurons. A discussion of microcircuits formed by the patchy SPN axons with SNr GABAergic and SNc DAergic neurons should be presented.

      We have added this point at lines 499–500, including a reference to a relevant review of microcircuitry. Additionally, we expanded the discussion section to address microcircuit mechanisms that may underlie our main findings.

      (3) Line 70: "BNST" should be spelled out at the first time it is mentioned.

      This has been done.

      (4) Line 133: only GCaMP6 was listed in the method, but GCaMP8 was also used (Figure 4). Clarification or details are needed.

      Thank you for your careful attention to detail. We have corrected the typographical errors in the Methods section. Specifically, in the Stereotaxic Injections section, we corrected “GCaMP83” to “GCaMP8s.” In the Fiber Implant section, we removed the incorrect reference to “GCaMP6s” and clarified that GCaMP8s was used for photometry, and hChR2 was used for optogenetics.

      (5) Line 183: Can the authors describe more precisely what "a moment" means in terms of seconds or minutes?

      This has been done.

      (6) Line 288: typo: missing / in ΔF

      Thank you; this has been fixed.

      (7) Line 301-302: the statement of "mCherry and MOR1 colocalization" does not match the images in Figure 1B.

      This has been corrected by providing a new Supplementary Figure S1.

      (8) Related to the statement between Lines 303-304: Figure 1C data may reflect changes in MOR1 protein or cell loss. Quantification of NeuN+ neurons within the MOR1 area would strengthen the conclusion of 60% patchy cell loss in Figure 1C.

      Since the efficacy of AAV-FLEX-taCasp3 in cell ablation has been well established in our previous publications and those of others (Yang, Chiang et al. 2013, Wu, Kung et al. 2019), we do not believe the observed loss of MOR1 staining in Fig. 1C merely reflects reduced MOR1 expression. Moreover, a general neuronal marker such as NeuN may not reliably detect the specific loss of patchy neurons in our ablation model, given the technical limitations of conventional cell-counting methods like MBF’s StereoInvestigator, which typically exhibit a variability margin of 15–20%.

      (9) Lines 313-314: "Similarly, PA mice demonstrated greater stay-time in the dark zone (Figure 1E)." Revision is needed to better reflect what is shown in Figure 1E and avoid misunderstandings.

      Thank you; this has been addressed.

      (10) The color code in Figure 2Gi seems inconsistent with the others? Clarifications are needed

      Color coding in Figure 2Gi differs from that in 2Eii out of necessity. For example, the "Light" cells depicted in light blue in 2Eii are represented by both light gray and light red dots in 2Gi. Importantly, Figure 2G does not encode specific speed relationships; instead, any association with speed is indicated by a red hue.

      (11) Lines 538-539: the statement of "Over half of the patch was covered" was not supported by Figure 5C. Clarification is needed.

      Thank you. For clarity, we updated the x-axis labels in Figures 1C and 5C from “% area covered” to “% DS area covered,” and defined “DS” as “dorsal striatal” in the corresponding figure legends. Additionally, we revised the sentence in question to read: “As with ablation, histological examination indicated that a substantial fraction of dorsal patch territories, identified through MOR1 staining, were impacted (Fig. 5C).”

      (12) Figure 3: statistical significance in Figure 3 should be labeled in various panels.

      We believe the reviewer's concern pertains to the scatter plot in panel F—specifically, whether the data points are significantly different from zero. In panel 3F, the 95% confidence interval clearly overlaps with zero, indicating that the results are not statistically significant.
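The criterion used in this response (an interval that overlaps zero is read as not significantly different from zero) can be sketched as follows. This is only an illustration: the source does not say how the interval in panel 3F was computed, so the percentile bootstrap, the function names `bootstrap_ci_mean` and `excludes_zero`, and the example values are all assumptions.

```python
import numpy as np

def bootstrap_ci_mean(values, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean.

    Hypothetical helper: resamples `values` with replacement `n_boot`
    times and returns the (alpha/2, 1 - alpha/2) quantiles of the means.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    means = values[idx].mean(axis=1)
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

def excludes_zero(ci):
    """True if the whole interval lies on one side of zero."""
    lo, hi = ci
    return lo > 0 or hi < 0

# Made-up correlation coefficients scattered around zero.
example = [-0.10, 0.08, -0.03, 0.05, 0.00, -0.06, 0.04]
ci = bootstrap_ci_mean(example)
print(ci, excludes_zero(ci))
```

Reporting the interval bounds alongside the plot, as the editor's note requests for key statistics, lets readers check the overlap with zero directly.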

      (13) Figures 6D-E: the absence of a difference in speed between control mice and ChR2 mice under continuous optical stimulation was not expected. It differs from the Gq-DREADD study in Figure 5E-F. Clarifications are needed.

      For mice undergoing constant ChR2 activation of Sepw1-Cre+ SNr efferents, overall locomotor speed does not differ from controls. However, the BIL (bright-to-illuminated) effect on zone transitions is disrupted: activating Sepw1-Cre+ fibers in the SNr blunts the typical increase in speed observed when mice flee from the light zone toward the dark zone. This impaired BIL-related speed increase upon exiting the light was similarly observed in the Gq-DREADD cohort. The reviewer is correct that this optogenetic manipulation within the SNr did not produce the more generalized speed reductions seen with broader Gq-DREADD activation of all Sepw1-Cre+ cells in the dorsal striatum. A likely explanation is the difference in targeting: ChR2 specifically activates SNr-bound terminals, whereas Gq-DREADD broadly activates entire Sepw1-Cre+ cells. Notably, many of the generalized speed profile changes observed with chemogenetic activation are opposite to those resulting from broad ablation of Sepw1-Cre+ cells. The more subtle speed-restrictive phenotype observed with ChR2 activation targeted to the SNr may suggest that fewer striatonigral fibers were affected by this technique, possibly due to the limited spread of light from the fiber optic. Broad locomotor speed change in LDbox might require the recruitment of a larger number of striatonigral fibers than we were able to manipulate with an optogenetic approach. Alternatively, it could indicate that non-striatonigral Sepw1-Cre+ projections, such as striatopallidal or intrastriatal pathways, play a role in more generalized slowing. If striatopallidal fibers contributed to locomotor slowing, we would expect to see non-zero cross-correlations between neural activity and speed or acceleration, along with negative lag indicating that neural activity precedes the behavioral change. However, our fiber photometry data do not support such a role for Sepw1-Cre+ striatopallidal fibers. We have also referenced the possibility that intrastriatal collaterals could suppress striatal dopamine levels, potentially explaining the stronger slowing phenotype observed when the entire striatal population is affected, as opposed to selectively targeting striatonigral terminals. These technical considerations and interpretive nuances have been incorporated and clarified in the revised discussion section.

      (14) Lines 632: "compliment": a typo?

      Yes, it should be “complement”.

      (15) Figure 4 legend: descriptions of panels A and B were swapped

      Thank you. This has been corrected.

      (16) Friedman (2020) was listed twice in the bibliography (Lines 920-929).

      Thank you. This has been corrected.

      Reviewer #3 (Recommendations for the authors):

      It will be helpful to label and add figure legends below each figure.

      Thank you for the suggestion.

      Editor's note:

      Should you choose to revise your manuscript, if you have not already done so, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and, where appropriate, 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript. We noted some instances where only p values are reported.

      Readers would also benefit from coding individual data points by sex and noting N/sex

      We have included detailed statistical information in the revised manuscript. Both male and female mice were used in all experiments in approximately equal numbers. Since no sex-related differences were observed, we did not report the number of animals by sex.

      References

      Alcacer, C., L. Andreoli, I. Sebastianutto, J. Jakobsson, T. Fieblinger and M. A. Cenci (2017). "Chemogenetic stimulation of striatal projection neurons modulates responses to Parkinson's disease therapy." J Clin Invest 127(2): 720-734.

      Crittenden, J. R., P. W. Tillberg, M. H. Riad, Y. Shima, C. R. Gerfen, J. Curry, D. E. Housman, S. B. Nelson, E. S. Boyden and A. M. Graybiel (2016). "Striosome-dendron bouquets highlight a unique striatonigral circuit targeting dopamine-containing neurons." Proc Natl Acad Sci U S A 113(40): 11318-11323.

      Dong, J., L. Wang, B. T. Sullivan, L. Sun, V. M. Martinez Smith, L. Chang, J. Ding, W. Le, C. R. Gerfen and H. Cai (2025). "Molecularly distinct striatonigral neuron subtypes differentially regulate locomotion." Nat Commun 16(1): 2710.

      Dudman, J. T. and J. W. Krakauer (2016). "The basal ganglia: from motor commands to the control of vigor." Curr Opin Neurobiol 37: 158-166.

      Evans, R. C., E. L. Twedell, M. Zhu, J. Ascencio, R. Zhang and Z. M. Khaliq (2020). "Functional Dissection of Basal Ganglia Inhibitory Inputs onto Substantia Nigra Dopaminergic Neurons." Cell Rep 32(11): 108156.

      Gerfen, C. R. and D. J. Surmeier (2011). "Modulation of striatal projection systems by dopamine." Annual review of neuroscience 34: 441-466.

      Hawes, S. L., A. G. Salinas, D. M. Lovinger and K. T. Blackwell (2017). "Long-term plasticity of corticostriatal synapses is modulated by pathway-specific co-release of opioids through kappa-opioid receptors." J Physiol 595(16): 5637-5652.

      Lazaridis, I., J. R. Crittenden, G. Ahn, K. Hirokane, T. Yoshida, A. Mahar, V. Skara, K. Meletis, K. Parvataneni, J. T. Ting, E. Hueske, A. Matsushima and A. M. Graybiel (2024). "Striosomes Target Nigral Dopamine-Containing Neurons via Direct-D1 and Indirect-D2 Pathways Paralleling Classic Direct-Indirect Basal Ganglia Systems." bioRxiv.

      Nadel, J. A., S. S. Pawelko, J. R. Scott, R. McLaughlin, M. Fox, M. Ghanem, R. van der Merwe, N. G. Hollon, E. S. Ramsson and C. D. Howard (2021). "Optogenetic stimulation of striatal patches modifies habit formation and inhibits dopamine release." Sci Rep 11(1): 19847.

      Okunomiya, T., D. Watanabe, H. Banno, T. Kondo, K. Imamura, R. Takahashi and H. Inoue (2025). "Striosome Circuitry Stimulation Inhibits Striatal Dopamine Release and Locomotion." J Neurosci 45(4).

      Shan, Q., Q. Fang and Y. Tian (2022). "Evidence that GIRK Channels Mediate the DREADD-hM4Di Receptor Activation-Induced Reduction in Membrane Excitability of Striatal Medium Spiny Neurons." ACS Chem Neurosci 13(14): 2084-2091.

      Wu, J., J. Kung, J. Dong, L. Chang, C. Xie, A. Habib, S. Hawes, N. Yang, V. Chen, Z. Liu, R. Evans, B. Liang, L. Sun, J. Ding, J. Yu, S. Saez-Atienzar, B. Tang, Z. Khaliq, D. T. Lin, W. Le and H. Cai (2019). "Distinct Connectivity and Functionality of Aldehyde Dehydrogenase 1a1-Positive Nigrostriatal Dopaminergic Neurons in Motor Learning." Cell Rep 28(5): 1167-1181 e1167.

    1. eLife Assessment

      In this manuscript, Park et al. developed a multiplexed CRISPR construct to genetically ablate the GABA transporter GAT3 in the mouse visual cortex, with effects on population-level neuronal activity. This work is important, as it sheds light on how GAT3 controls the processing of visual information. The findings are compelling, leveraging state-of-the-art gene CRISPR/Cas9, in vivo two-photon laser scanning microscopy, and advanced statistical modeling.

    2. Reviewer #1 (Public review):

      Summary:

      The authors have investigated the role of GAT3 in the visual system. First, they have developed a CRISPR/Cas9-based approach to locally knock out this transporter in the visual cortex. They then demonstrated electrophysiologically that this manipulation increases inhibitory synaptic input into layer 2/3 pyramidal cells. They further examined the functional consequences by imaging neuronal activity in the visual cortex in vivo. They found that absence of GAT3 leads to reduced spontaneous neuronal activity and attenuated neuronal responses and reliability to visual stimuli, but without an effect on orientation selectivity. Further analysis of this data suggests that Gat3 removal leads to less coordinated activity between individual neurons and in population activity patterns, thereby impaired information encoding. Overall, this is an elegant and technically advanced study that demonstrates a new and important role of GAT3 in controlling processing of visual information.

      Strengths:

      Development of a new approach for a local knockout (GAT3)

      Important and novel insights into visual system function and its dependence on GAT3

      Plausible cellular mechanism

      Weaknesses:

      No major weaknesses.

    3. Reviewer #2 (Public review):

      Summary:

      Park et al. have made a tool for spatiotemporally restricted knockout of the astrocytic GABA transporter GAT3, leveraging CRISPR/Cas9 and viral transduction in adult mice, and evaluated the effects of GAT3 on neural encoding of visual stimulation.

      Strengths:

      This concise manuscript leverages state-of-the-art gene CRISPR/Cas9 technology for knocking out astrocytic genes. This has only to a small degree been performed previously in astrocytes, and it represents an important development in the field. Moreover, they utilize in vivo two-photon imaging of neural responses to visual stimuli as a readout of neural activity, in addition to validating their data with ex vivo electrophysiology. Lastly, they use advanced statistical modeling to analyze the impact of GAT3 knockout. Overall, the study comes across as rigorous and convincing.

      Weaknesses:

      Adding the following experiments would potentially have strengthened the conclusions and helped interpret the findings, although may be considered outside the scope of this manuscript, and be pursued in future work:

      (1) Neural activity is quite profoundly influenced by GAT3 knockout. Corroborating these relatively large changes to neural activity with in vivo electrophysiology of some sort as an additional readout would have strengthened the conclusions.

      (2) Given the quite large effects on neural coding in visual cortex assessed with jRGECO imaging, it would have been interesting if the mouse groups could have been subjected to behavioral testing assessing the visual system.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors have investigated the role of GAT3 in the visual system. First, they have developed a CRISPR/Cas9-based approach to locally knock out this transporter in the visual cortex. They then demonstrated electrophysiologically that this manipulation increases inhibitory synaptic input into layer 2/3 pyramidal cells. They further examined the functional consequences by imaging neuronal activity in the visual cortex in vivo. They found that the absence of GAT3 leads to reduced spontaneous neuronal activity and attenuated neuronal responses and reliability to visual stimuli, but without an effect on orientation selectivity. Further analysis of this data suggests that Gat3 removal leads to less coordinated activity between individual neurons and in population activity patterns, thereby impairing information encoding. Overall, this is an elegant and technically advanced study that demonstrates a new and important role of GAT3 in controlling the processing of visual information.

      We are grateful to the reviewer for their positive appraisal of our work, including our technical advances and our demonstration of how cortical astrocytes play a role in visual information processing by neurons via GAT3-mediated regulation of activity.

      Strengths:

      (1)  Development of a new approach for a local knockout (GAT3).

      (2)  Important and novel insights into visual system function and its dependence on GAT3.

      (3)  Plausible cellular mechanism.

      Weaknesses:

      No major weaknesses were identified by this reviewer.

      We thank the reviewer for highlighting the strengths of our study, including the development of a novel local knockout strategy for GAT3, the discovery of important functional consequences for visual system processing, and the identification of a plausible underlying cellular mechanism.

      Reviewer #2 (Public review):

      Summary:

      Park et al. have made a tool for spatiotemporally restricted knockout of the astrocytic GABA transporter GAT3, leveraging CRISPR/Cas9 and viral transduction in adult mice, and evaluated the effects of GAT3 on neural encoding of visual stimulation.

      Strengths:

      This concise manuscript leverages state-of-the-art gene CRISPR/Cas9 technology for knocking out astrocytic genes. This has only to a small degree been performed previously in astrocytes, and it represents an important development in the field. Moreover, the authors utilize in vivo two-photon imaging of neural responses to visual stimuli as a readout of neural activity, in addition to validating their data with ex vivo electrophysiology. Lastly, they use advanced statistical modeling to analyze the impact of GAT3 knockout. Overall, the study comes across as rigorous and convincing.

      We appreciate the reviewer’s endorsement of our experimental rigor and methodological innovation. We agree that combining in vivo and ex vivo measurements with rigorous analytical methods strengthens the overall conclusions of the study and demonstrates the important role of astrocytic GAT3 in cortical visual processing.

      Weaknesses:

      Adding the following experiments would potentially have strengthened the conclusions and helped with interpreting the findings:

      (1) Neural activity is quite profoundly influenced by GAT3 knockout. Corroborating these relatively large changes to neural activity with in vivo electrophysiology of some sort as an additional readout would have strengthened the conclusions.

      We agree that further investigation of neuronal activity at higher temporal resolution would provide valuable complementary data, particularly given the profound effects we observed using a pan-neuronal calcium indicator. Detailed in vivo electrophysiology—such as large-scale Neuropixel recordings—would allow assessment of single-neuron spiking dynamics and potentially cell-type specific responses following GAT3 deletion. While such an investigation is beyond the scope of the current study, we concur that it would be an important follow-up direction to further dissect the effects of GAT3 knockout on neuron activity profiles at both single-cell and population levels.

      (2) Given the quite large effects on neural coding in visual cortex assessed by jRGECO imaging, it would have been interesting if the mouse groups could have been subjected to behavioral testing, assessing the visual system.

      We appreciate the reviewer’s suggestion to explore potential behavioral consequences of GAT3 deletion. Based on our observed alterations in visual cortical activity, we agree that GAT3 knockout could impact visual discrimination-based behaviors. Astrocytes in the visual cortex are highly tuned to sensory and motor events and are generally known to shape behavioral outputs (Slezak et al., 2019; Kofuji & Araque, 2021). Our study suggests that regulation of inhibitory signaling via GAT3 transporters is a possible mechanism by which astrocytes influence visually guided behaviors. Although behavioral assessments fall beyond the scope of the current work, we agree with the reviewer’s suggestion and will pursue future experiments employing paradigms such as go/no-go visual detection or two-alternative forced choice to determine whether astrocytic GAT3 modulates visually guided behaviors and perceptual decision-making.

      Reviewer #1 (Recommendations for the authors):

      It could be more clearly stated from the very beginning that a method was developed and used which, by itself, apparently has no cell type selectivity. It is highly plausible that the effects are mostly due to the absence of astrocytic GAT3, as discussed by the authors, but the distinction of what has been done and what is interpretation based on the literature is occasionally a bit blurry. This is also important because there are CRISPR/Cas9-based approaches that are astrocyte-specific (e.g., GEARBOCS).

      We thank the reviewer for this helpful suggestion. As noted, our current approach does not confer cell-type specificity on its own. Although our interpretation, supported by expression patterns and prior literature, attributes the observed effects primarily to astrocytic GAT3 loss, we agree that this distinction should be explicitly stated. We have revised the Introduction section (lines 83-87) to clarify that while MRCUTS allows for local gene knockout, it is not inherently cell-type specific unless combined with cell-type-restricted Cre drivers, as is possible in future applications.

      A change of ambient GABA following GAT3 deletion is central to the proposed cellular mechanism. Demonstrating this directly would strengthen the manuscript (e.g., changed tonic GABAergic current in the absence of GAT3, and insensitivity to SNAP-5114).

      While we recognize that directly quantifying ambient GABA levels would further strengthen our study, substantial evidence supports the role of GABA transporters in coordinately regulating both phasic and tonic inhibition and cellular excitability (Kinney, 2005; Keros & Hablitz, 2005; Semyanov et al., 2003).

      Moreover, tonic GABA currents have been shown to strongly correlate with phasic inhibitory bursts (Glykys & Mody, 2007; Farrant & Nusser, 2005; Ataka & Gu, 2006), suggesting shared underlying regulatory mechanisms. Furthermore, as the reviewer correctly points out, alternative mechanisms such as non-vesicular GABA release or disinhibition via interneuron suppression cannot be excluded (also discussed in Kinney 2005). Given these considerations, we prioritized sIPSC measurements as a more integrative and reliable proxy for altered GABAergic signaling in L2/3 pyramidal neurons. We have revised the Discussion section (lines 329-333) to explain our choice of approach for further clarification.

      We also agree it would be of interest to test whether GAT3 KO neurons exhibit insensitivity to SNAP-5114, both ex vivo and in vivo. However, based on our SNAP-5114 application experiments in vivo, which revealed only subtle effects on single-neuron properties (Figure S2A-F), we anticipate that interpreting a lack of effect in the KO condition would be challenging and potentially inconclusive.  

      References

      Ataka, T. & Gu, J. G. Relationship between tonic inhibitory currents and phasic inhibitory activity in the spinal cord lamina II region of adult mice. Mol. Pain. (2006).  

      Bright, D. & Smart, T. Methods for recording and measuring tonic GABAA receptor-mediated inhibition. Front. Neural Circuits. 7, (2013).

      Farrant, M. & Nusser, Z. Variations on an inhibitory theme: phasic and tonic activation of GABAA receptors. Nat. Rev. Neurosci. 6, 215–229 (2005).  

      Glykys, J. & Mody, I. Activation of GABAA Receptors: Views from Outside the Synaptic Cleft. Neuron. 56, 763-770 (2007).

      Keros, S. & Hablitz, J. J. Subtype-Specific GABA Transporter Antagonists Synergistically Modulate Phasic and Tonic GABAA Conductances in Rat Neocortex. J. Neurophysiol. 94, 2073–2085 (2005).

      Kinney, G. A. GAT-3 Transporters Regulate Inhibition in the Neocortex. J. Neurophysiol. 94, 4533–4537 (2005).

      Kofuji, P. & Araque, A. Astrocytes and Behavior. Annu. Rev. Neurosci. 44, 49–67 (2021).

      Semyanov, A., Walker, M. & Kullmann, D. GABA uptake regulates cortical excitability via cell type–specific tonic inhibition. Nat. Neurosci. 6, 484–490 (2003).

      Slezak, M., Kandler, S., Van Veldhoven, P. P., Van den Haute, C., Bonin, V. & Holt, M.G. Distinct Mechanisms for Visual and Motor-Related Astrocyte Responses in Mouse Visual Cortex. Curr. Biol. 18, 3120-3127 (2019).

    1. eLife Assessment

      This important study presents a cross-species and cross-disciplinary analysis of cortical folding. The authors use a combination of physical gel models, computational simulations, and morphometric analysis, extending prior work in human brain development to macaques and ferrets. The findings support the hypothesis that mechanical forces driven by differential growth can account for major aspects of gyrification. The evidence presented, though limited in certain species-specific and parametric details, is overall strong and convincingly supports the central claims; the findings will be of broad interest in developmental neuroscience.

    2. Reviewer #1 (Public review):

      The manuscript by Yin and colleagues addresses a long-standing question in the field of cortical morphogenesis, regarding factors that determine differential cortical folding across species and individuals with cortical malformations. The authors present work based on a computational model of cortical folding evaluated alongside a physical model that makes use of gel swelling to investigate the role of a two-layer model for cortical morphogenesis. The study assesses these models against empirically derived cortical surfaces based on MRI data from ferret, macaque monkey, and human brains.

      The manuscript is clearly written and presented, and the experimental work (physical gel modeling as well as numerical simulations) and analyses (subsequent morphometric evaluations) are conducted at the highest methodological standards. It constitutes an exemplary use of interdisciplinary approaches for addressing the question of cortical morphogenesis by bringing together well-tuned computational modeling with physical gel models. In addition, the comparative approaches used in this paper establish a foundation for broad-ranging future lines of work that investigate the impact of perturbations or abnormalities during cortical development.

      The cross-species approach taken in this study is a major strength of the work. However, correspondence across the two methodologies did not appear to be equally consistent in predicting brain folding across all three species. The results presented in Figures 4 (and Figures S3 & S4) show broad correspondence in shape index and major sulci landmarks across all three species. Nevertheless, the results presented for the human brain lack the same degree of clear correspondence for the gel model results as observed in the macaque and ferret. While this study clearly establishes a strong foundation for comparative cortical anatomy across species and the impact of perturbations on individual morphogenesis, further work that fine-tunes physical modeling of complex morphologies, such as that of the human cortex, may help to further understand the factors that determine cortical functionalization and pathologies.

    3. Reviewer #2 (Public review):

      This manuscript explores the mechanisms underlying cerebral cortical folding using a combination of physical modelling, computational simulations, and geometric morphometrics. The authors extend their prior work on human brain development (Tallinen et al., 2014; 2016) to a comparative framework involving three mammalian species: ferrets (Carnivora), macaques (Old World monkeys), and humans (Hominoidea). By integrating swelling gel experiments with mathematical differential growth models, they simulate sulcification instability and recapitulate key features of brain folding across species. The authors make commendable use of publicly available datasets to construct 3D models of fetal and neonatal brain surfaces: fetal macaque (ref. [26]), newborn ferret (ref. [11]), and fetal human (ref. [22]).

      Using a combination of physical models and numerical simulations, the authors compare the resulting folding morphologies to real brain surfaces using morphometric analysis. Their results show qualitative and quantitative concordance with observed cortical folding patterns, supporting the view that differential tangential growth of the cortex relative to the subcortical substrate is sufficient to account for much of the diversity in cortical folding. This is a very important point in our field, and can be used in the teaching of medical students.

      Brain folding remains a topic of ongoing debate. While some regard it as a critical specialization linked to higher cognitive function, others consider it an epiphenomenon of expansion and constrained geometry. This divergence was evident in discussions during the Strüngmann Forum on cortical development (Silver et al., 2019). Though folding abnormalities are reliable indicators of disrupted neurodevelopmental processes (e.g., neurogenesis, migration), their relationship to functional architecture remains unclear. Recent evidence suggests that the absolute number of neurons varies significantly with position-sulcus versus gyrus-with potential implications for local processing capacity (e.g., https://doi.org/10.1002/cne.25626). The field is thus in need of comparative, mechanistic studies like the present one.

      This paper offers an elegant and timely contribution by combining gel-based morphogenesis, numerical modelling, and morphometric analysis to examine cortical folding across species. The experimental design - constructing two-layer PDMS models from 3D MRI data and immersing them in organic solvents to induce differential swelling - is well-established in prior literature. The authors further complement this with a continuum mechanics model simulating folding as a result of differential growth, as well as a comparative analysis of surface morphologies derived from in vivo, in vitro, and in silico brains.

      I offer a few suggestions here for clarification and further exploration:

      Major Comments

      (1) Choice of Developmental Stages and Initial Conditions

      The authors should provide a clearer justification for the specific developmental stages chosen (e.g., G85 for macaque, GW23 for human). How sensitive are the resulting folding patterns to the initial surface geometry of the gel models? Given that folding is a nonlinear process, early geometric perturbations may propagate into divergent morphologies. Exploring this sensitivity-either through simulations or reference to prior work-would enhance the robustness of the findings.

      (2) Parameter Space and Breakdown Points

      The numerical model assumes homogeneous growth profiles and simplifies several aspects of cortical mechanics. Parameters such as cortical thickness, modulus ratios, and growth ratios are described in Table II. It would be informative to discuss the range of parameter values for which the model remains valid, and under what conditions the physical and computational models diverge. This would help delineate the boundaries of the current modelling framework and indicate directions for refinement.

      (3) Neglected Regional Features: The Occipital Pole of the Macaque

      One conspicuous omission is the lack of attention to the occipital pole of the macaque, which is known to remain smooth even at later gestational stages and has an unusually high neuronal density (2.5× higher than adjacent cortex). This feature is not reproduced in the gel or numerical models, nor is it discussed. Acknowledging this discrepancy-and speculating on possible developmental or mechanical explanations-would add depth to the comparative analysis. The authors may wish to include this as a limitation or a target for future work.

      (4) Spatio-Temporal Growth Rates and Available Human Data

      The authors note that accurate, species-specific spatio-temporal growth data are lacking, limiting the ability to model inhomogeneous cortical expansion. While this may be true for ferret and macaque, there are high-quality datasets available for human fetal development, now extended through ultrasound imaging (e.g., https://doi.org/10.1038/s41586-023-06630-3). Incorporating or at least referencing such data could improve the fidelity of the human model and expand the applicability of the approach to clinical or pathological scenarios.

      (5) Future Applications: The Inverse Problem and Fossil Brains

      The authors suggest that their morphometric framework could be extended to solve the inverse growth problem-reconstructing fetal geometries from adult brains. This speculative but intriguing direction has implications for evolutionary neuroscience, particularly the interpretation of fossil endocasts. Although beyond the scope of this paper, I encourage the authors to elaborate briefly on how such a framework might be practically implemented and validated.

      Conclusion

      This is a well-executed and creative study that integrates diverse methodologies to address a longstanding question in developmental neurobiology. While a few aspects-such as regional folding peculiarities, sensitivity to initial conditions, and available human data-could be further elaborated, they do not detract from the overall quality and novelty of the work. I enthusiastically support this paper and believe that it will be of broad interest to the neuroscience, biomechanics, and developmental biology communities.

      Note: The paper mentions a companion paper [reference 11] that explores the cellular and anatomical changes in the ferret cortex. I did not have access to this manuscript, but judging from the title, this paper might further strengthen the conclusions.

    4. Author response:

      Reviewer 1 (Public review):

      The manuscript by Yin and colleagues addresses a long-standing question in the field of cortical morphogenesis, regarding factors that determine differential cortical folding across species and individuals with cortical malformations. The authors present work based on a computational model of cortical folding evaluated alongside a physical model that makes use of gel swelling to investigate the role of a two-layer model for cortical morphogenesis. The study assesses these models against empirically derived cortical surfaces based on MRI data from ferret, macaque monkey, and human brains.

      The manuscript is clearly written and presented, and the experimental work (physical gel modeling as well as numerical simulations) and analyses (subsequent morphometric evaluations) are conducted at the highest methodological standards. It constitutes an exemplary use of interdisciplinary approaches for addressing the question of cortical morphogenesis by bringing together well-tuned computational modeling with physical gel models. In addition, the comparative approaches used in this paper establish a foundation for broad-ranging future lines of work that investigate the impact of perturbations or abnormalities during cortical development.

      The cross-species approach taken in this study is a major strength of the work. However, correspondence across the two methodologies did not appear to be equally consistent in predicting brain folding across all three species. The results presented in Figures 4 (and Figures S3 and S4) show broad correspondence in shape index and major sulci landmarks across all three species. Nevertheless, the results presented for the human brain lack the same degree of clear correspondence for the gel model results as observed in the macaque and ferret. While this study clearly establishes a strong foundation for comparative cortical anatomy across species and the impact of perturbations on individual morphogenesis, further work that fine-tunes physical modeling of complex morphologies, such as that of the human cortex, may help to further understand the factors that determine cortical functionalization and pathologies.

      We thank the reviewer for the positive assessment and helpful comments. Indeed, the physical gel model of the human brain has a lower similarity index with the real brain than the macaque and ferret models do. There are several reasons.

      First, the highly convoluted human cortex has a few major folds (primary sulci) and a very large number of minor folds associated with secondary or tertiary sulci (on scales of order comparable to the cortical thickness), relative to the ferret and macaque cerebral cortex. In our gel model, the exact shapes, positions, and orientations of these minor folds are stochastic, which makes it hard to have a very high similarity index of the gel models when compared with the brain of a single individual.

      Second, in real human brains, these minor folds evolve dynamically with age and show differences among individuals. In experiments with the gel brain, multiscale folds form and eventually disappear as the swelling progresses through the thickness. Our physical model results are snapshots during this dynamical process, which makes it hard to have a concrete one-to-one correspondence between the instantaneous shapes of the swelling gel and the growing human brain.

      Third, the growth of the brain cortex is inhomogeneous in space and varying with time, whereas, in the gel model, swelling is relatively homogeneous.

      We agree that further systematic work, based on our proposed methods, with more fine-tuned gel geometries and properties, might provide a deeper understanding of the relations between brain geometry and growth-induced folds, and of their functionalization and pathologies. Further analysis of cortical pathologies using computational and physical gel models can be found in our companion paper (Choi et al., 2025), also submitted to eLife:

      G. P. T. Choi, C. Liu, S. Yin, G. Séjourné, R. S. Smith, C. A. Walsh, L. Mahadevan, Biophysical basis for brain folding and misfolding patterns in ferrets and humans. Preprint, bioRxiv 2025.03.05.641682.

      Reviewer 2 (Public review):

      This manuscript explores the mechanisms underlying cerebral cortical folding using a combination of physical modelling, computational simulations, and geometric morphometrics. The authors extend their prior work on human brain development (Tallinen et al., 2014; 2016) to a comparative framework involving three mammalian species: ferrets (Carnivora), macaques (Old World monkeys), and humans (Hominoidea). By integrating swelling gel experiments with mathematical differential growth models, they simulate sulcification instability and recapitulate key features of brain folding across species. The authors make commendable use of publicly available datasets to construct 3D models of fetal and neonatal brain surfaces: fetal macaque (ref. [26]), newborn ferret (ref. [11]), and fetal human (ref. [22]).

      Using a combination of physical models and numerical simulations, the authors compare the resulting folding morphologies to real brain surfaces using morphometric analysis. Their results show qualitative and quantitative concordance with observed cortical folding patterns, supporting the view that differential tangential growth of the cortex relative to the subcortical substrate is sufficient to account for much of the diversity in cortical folding. This is a very important point in our field, and can be used in the teaching of medical students.

      Brain folding remains a topic of ongoing debate. While some regard it as a critical specialization linked to higher cognitive function, others consider it an epiphenomenon of expansion and constrained geometry. This divergence was evident in discussions during the Strüngmann Forum on cortical development (Silver et al., 2019). Though folding abnormalities are reliable indicators of disrupted neurodevelopmental processes (e.g., neurogenesis, migration), their relationship to functional architecture remains unclear. Recent evidence suggests that the absolute number of neurons varies significantly with position-sulcus versus gyrus-with potential implications for local processing capacity (e.g., https://doi.org/10.1002/cne.25626). The field is thus in need of comparative, mechanistic studies like the present one.

      This paper offers an elegant and timely contribution by combining gel-based morphogenesis, numerical modelling, and morphometric analysis to examine cortical folding across species. The experimental design - constructing two-layer PDMS models from 3D MRI data and immersing them in organic solvents to induce differential swelling - is well-established in prior literature. The authors further complement this with a continuum mechanics model simulating folding as a result of differential growth, as well as a comparative analysis of surface morphologies derived from in vivo, in vitro, and in silico brains.

      We thank the reviewer for the very positive comments.

      I offer a few suggestions here for clarification and further exploration:

      Major Comments

      (1) Choice of Developmental Stages and Initial Conditions

      The authors should provide a clearer justification for the specific developmental stages chosen (e.g., G85 for macaque, GW23 for human). How sensitive are the resulting folding patterns to the initial surface geometry of the gel models? Given that folding is a nonlinear process, early geometric perturbations may propagate into divergent morphologies. Exploring this sensitivity-either through simulations or reference to prior work-would enhance the robustness of the findings.

      The initial geometry is one of the important factors that determine the final folding pattern. The smooth brain in the early developmental stage shows a broad consistency across individuals, and we expect the main folds to form similarly across species and individuals.

      Generally, we choose the initial geometry when the brain cortex is still relatively smooth. For the human, this corresponds approximately to GW23, as the major folds, such as the Rolandic fissure (central sulcus), arise during this developmental stage. For the macaque brain, we chose developmental stage G85, primarily because of the availability of the dataset corresponding to this time, which is also the least folded geometry available to us.

      We expect that large-scale folding patterns are strongly sensitive to the initial geometry but fine-scale features are not. Since our goal is to explain the large-scale features, we expect sensitivity to the initial shape.

      Enclosed are some results from other researchers that are consistent with this idea. Below are some images of simulations from Wang et al. obtained by perturbing the geometry of a sphere to an ellipsoid. We see that the growth-induced folds mostly maintain their width (wavelength), but change their orientations.

      Reference:

      Wang, X., Lefèvre, J., Bohi, A., Harrach, M.A., Dinomais, M. and Rousseau, F., 2021. The influence of biophysical parameters in a biomechanical model of cortical folding patterns. Scientific Reports, 11(1), p.7686.

      Related results from the same group show that slight perturbations of brain geometry also tend to change the orientations of these folds but not their width/wavelength (Bohi et al., 2019).

      Reference:

      Bohi, A., Wang, X., Harrach, M., Dinomais, M., Rousseau, F. and Lefèvre, J., 2019, July. Global perturbation of initial geometry in a biomechanical model of cortical morphogenesis. In 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) (pp. 442-445). IEEE.

      Finally, a systematic discussion of the role of perturbations on the initial geometries and physical properties can be seen in our work on understanding a different system, gut morphogenesis (Gill et al., 2024).

      We have added the discussion about geometric sensitivity in the section Methods-Numerical Simulations:

      “Small perturbations on initial geometry would affect minor folds, but the main features of major folds, such as orientations, width, and depth, are expected to be conserved across individuals [49, 50]. For simplicity, we do not perturb the fetal brain geometry obtained from datasets.”

      (2) Parameter Space and Breakdown Points

      The numerical model assumes homogeneous growth profiles and simplifies several aspects of cortical mechanics. Parameters such as cortical thickness, modulus ratios, and growth ratios are described in Table II. It would be informative to discuss the range of parameter values for which the model remains valid, and under what conditions the physical and computational models diverge. This would help delineate the boundaries of the current modelling framework and indicate directions for refinement.

      Exploring the valid parameter space is a key problem. We have tested a series of growth parameters and will state them explicitly in our revision. In the current version, we chose the ones that yield a relatively high similarity index to the animal brains. More generally, folding patterns are largely regulated by geometry as well as physical parameters, such as cortical thickness, modulus ratios, growth ratios, and inhomogeneity. In our previous work on a different system, gut morphogenesis, where similar folding patterns are seen, we have explored these features (Gill et al., 2024).

      Reference:

      Gill, H.K., Yin, S., Nerurkar, N.L., Lawlor, J.C., Lee, C., Huycke, T.R., Mahadevan, L. and Tabin, C.J., 2024. Hox gene activity directs physical forces to differentially shape chick small and large intestinal epithelia. Developmental Cell, 59(21), pp.2834-2849.

      (3) Neglected Regional Features: The Occipital Pole of the Macaque

      One conspicuous omission is the lack of attention to the occipital pole of the macaque, which is known to remain smooth even at later gestational stages and has an unusually high neuronal density (2.5× higher than adjacent cortex). This feature is not reproduced in the gel or numerical models, nor is it discussed. Acknowledging this discrepancy-and speculating on possible developmental or mechanical explanations-would add depth to the comparative analysis. The authors may wish to include this as a limitation or a target for future work.

      Yes, we have added that the omission of the Occipital Pole of the macaque is one of our paper’s limitations. Our main aim in this paper is to explore the formation of large-scale folds, so the smooth region is neglected. But future work could include this to make the model more complete.

      The main text has been modified in Methods, 3D model reconstruction, pre-processing:

      “To focus on fold formation, we neglected some smooth regions such as the Occipital Pole of the macaque.”

      (4) Spatio-Temporal Growth Rates and Available Human Data

      The authors note that accurate, species-specific spatio-temporal growth data are lacking, limiting the ability to model inhomogeneous cortical expansion. While this may be true for ferret and macaque, there are high-quality datasets available for human fetal development, now extended through ultrasound imaging (e.g., https://doi.org/10.1038/s41586-023-06630-3). Incorporating or at least referencing such data could improve the fidelity of the human model and expand the applicability of the approach to clinical or pathological scenarios.

      We thank the reviewer for pointing out the very useful datasets that exist for exploring folding patterns driven by inhomogeneous growth. We have cited this paper when suggesting further work on the role of growth inhomogeneities.

      We have referred to this high-quality dataset in our main text, Discussion:

      “...the effect of inhomogeneous growth needs to be further investigated by incorporating regional growth of the gray and white matter not only in human brains [29, 31] based on public datasets [45], but also in other species.”

      A few works have tried to incorporate inhomogeneous growth in simulating human brain folding by separating the central sulcus area into several lobes (e.g., lobe parcellation method, Wang, PhD Thesis, 2021). Since our goal in this paper is to explain the large-scale features of folding in a minimal setting, we have kept our model simple and show that it is still capable of capturing the main features of folding in a range of mammalian brains.

      Reference:

      Xiaoyu Wang. Modélisation et caractérisation du plissement cortical. Signal and Image Processing. École nationale supérieure Mines-Télécom Atlantique, 2021. English. 〈NNT : 2021IMTA0248〉.

      (5) Future Applications: The Inverse Problem and Fossil Brains

      The authors suggest that their morphometric framework could be extended to solve the inverse growth problem-reconstructing fetal geometries from adult brains. This speculative but intriguing direction has implications for evolutionary neuroscience, particularly the interpretation of fossil endocasts. Although beyond the scope of this paper, I encourage the authors to elaborate briefly on how such a framework might be practically implemented and validated.

      For the inverse problem, we could use the following strategies:

      a. Perform systematic simulations using different geometries and physical parameters to obtain the variation in morphologies as a function of parameters.

      b. Use either supervised training or unsupervised training (physics-informed neural networks, PINNs) to learn these characteristic morphologies and classify their dependence on the parameters using neural networks. These networks can then be trained to determine the possible range of geometrical and physical parameters that yield the buckled patterns seen in the systematic simulations.

      c. Reconstruct the 3D surface from fossil endocasts. Using the well-trained neural network, it should be possible to predict the initial shape of the smooth brain cortex, growth profile, and stiffness ratio of the gray and white matter.

      As an example in this direction, supervised neural networks have been used recently to solve the forward problem to predict the buckling pattern of a growing two-layer system (Chavoshnejad et al., 2023). The inverse problem can then be solved using machine-learning methods when the training datasets are the folded shapes, which are then used to predict the initial geometry and physical properties.
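
      The strategy in steps a to c can be illustrated with a deliberately small sketch. Everything in it is an assumption made for illustration and is not the authors' method: the forward model is the classical scaling for a stiff film wrinkling on a soft substrate (wavelength 2πh(r/3)^(1/3) and critical strain (1/4)(3/r)^(2/3), with h the film thickness and r the film-to-substrate stiffness ratio), which is not claimed to describe brain tissue; a linear least-squares fit in log space stands in for the neural network; and the names `forward`, `make_dataset`, `fit_inverse`, and `predict_params` are hypothetical.

```python
import numpy as np


def forward(h, r):
    """Toy forward model standing in for a full folding simulation (assumption).

    h: film thickness; r: film-to-substrate stiffness ratio.
    Returns the wrinkle wavelength and critical strain from the classical
    stiff-film-on-soft-substrate scaling.
    """
    wavelength = 2.0 * np.pi * h * (r / 3.0) ** (1.0 / 3.0)
    critical_strain = 0.25 * (3.0 / r) ** (2.0 / 3.0)
    return wavelength, critical_strain


def make_dataset(n, rng):
    """Step a: sample parameters and record the resulting morphology descriptors."""
    h = rng.uniform(0.5, 2.0, n)
    r = rng.uniform(5.0, 100.0, n)
    wavelength, strain = forward(h, r)
    # Features: log descriptors plus a bias column; targets: log parameters.
    X = np.column_stack([np.log(wavelength), np.log(strain), np.ones(n)])
    Y = np.column_stack([np.log(h), np.log(r)])
    return X, Y


def fit_inverse(X, Y):
    """Step b: supervised fit of the inverse map (descriptors -> parameters)."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W


def predict_params(W, wavelength, strain):
    """Step c: estimate the parameters behind an observed morphology."""
    x = np.array([np.log(wavelength), np.log(strain), 1.0])
    log_h, log_r = x @ W
    return float(np.exp(log_h)), float(np.exp(log_r))


rng = np.random.default_rng(0)
X, Y = make_dataset(200, rng)
W = fit_inverse(X, Y)

# Pretend these descriptors were measured on an observed folded surface.
observed_wavelength, observed_strain = forward(1.2, 30.0)
h_hat, r_hat = predict_params(W, observed_wavelength, observed_strain)
print(h_hat, r_hat)
```

      Because this toy forward model is a power law, its inverse is exactly linear in log space, so the fit recovers the generating parameters up to round-off. A realistic inverse problem for brain folding would be nonlinear and possibly non-unique, which is why the response proposes neural networks trained on systematic simulations.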

      Reference:

      Chavoshnejad, P., Chen, L., Yu, X., Hou, J., Filla, N., Zhu, D., Liu, T., Li, G., Razavi, M.J. and Wang, X., 2023. An integrated finite element method and machine learning algorithm for brain morphology prediction. Cerebral Cortex, 33(15), pp.9354-9366.

      Conclusion

      This is a well-executed and creative study that integrates diverse methodologies to address a longstanding question in developmental neurobiology. While a few aspects (such as regional folding peculiarities, sensitivity to initial conditions, and available human data) could be further elaborated, they do not detract from the overall quality and novelty of the work. I enthusiastically support this paper and believe that it will be of broad interest to the neuroscience, biomechanics, and developmental biology communities.

      Note: The paper mentions a companion paper [reference 11] that explores the cellular and anatomical changes in the ferret cortex. I did not have access to this manuscript, but judging from the title, this paper might further strengthen the conclusions.

      The companion paper (Choi et al., 2025) has also been submitted to eLife and can be found on bioRxiv here:

      G. P. T. Choi, C. Liu, S. Yin, G. Séjourné, R. S. Smith, C. A. Walsh, L. Mahadevan, Biophysical basis for brain folding and misfolding patterns in ferrets and humans. bioRxiv 2025.03.05.641682.

    1. eLife Assessment

      This valuable study introduces a novel experimental and modeling framework to quantify passive joint torques in Drosophila, revealing that passive forces are insufficient to support body weight, contrary to prior assumptions based on larger insects. The approach is technically impressive, combining genetic silencing, kinematic tracking, and biomechanical modeling. However, the strength of evidence is incomplete, limited by concerns about the specificity of the genetic tools, simplifications in the mechanical model, and limited functional interpretation.

    2. Reviewer #1 (Public review):

      Summary:

      In this work, Wang et al. use a combination of genetic tools, novel experimental approaches and biomechanical models to quantify the contribution of passive leg forces in Drosophila. They also deduce that passive forces are not sufficient to support the body weight of the animal. Overall, the contribution of passive forces reported in this work is much less than what one would expect based on the size of the organism and previous literature from larger insects and mammals. This is an interesting finding, but some major caveats in their approach remain unanswered.

      Strengths:

      (1) The authors combine experimental measurements and modeling to quantify the contributions of passive forces at limb joints in Drosophila.

      (2) The authors replicate a previous experimental strategy (Hooper et al 2009, J. Neuro) to suspend animals in air for measuring passive forces and, as in previous studies, find that passive forces are much stronger than gravitational forces acting on the limbs. While many invasive approaches for accurately quantifying passive forces are possible in these previous studies of large insects (e.g., physically cutting nerves, directly measuring muscle forces in isolated preparations, etc.), the small size of Drosophila makes this difficult. The authors overcome this using a novel approach where they attach additional weight to the leg (changes gravitational force) and inactivate motor neurons (remove active forces). With a few approximations and assumptions, the authors then deduce the contribution of passive forces at each joint for each leg.

      (3) The authors find interesting differences in passive forces across different legs. This could have behavioral implications.

      (4) Finally, the authors compare experimental results of how a free-standing Drosophila is lowered ("falls down") on silencing motor neurons, to a biomechanical "OpenSim" model for deducing the role of passive forces in supporting the body weight of the fly. Using this approach, they conclude that passive forces are not sufficient to support the body weight of the fly.

      Weaknesses:

      (1) Line 65 "(Figure 1A). Inactivation causes a change in the leg's rest position; however, in preliminary experiments, the body rotation did not have a large effect on the rest positions of the leg following inactivation. This result is consistent with the one already reported for stick insects and shows that passive forces within the leg are much larger than the gravitational force on a leg and dominate limb position [1]." This is the direct replication of the previous work by Hooper et al 2009 and therefore authors should ideally show the data for this condition (no weight attached).

      (2) The authors use vglut-gal4, a very broad driver for inactivating motor neurons. The driver labels all glutamatergic neurons, including brain descending neurons and nerve cord interneurons, in addition to motor neurons. Additionally, the strength of inactivation might differ in different neurons (including motor neurons) depending on the expression levels of the opsins. As a result, in this condition, the authors might not be removing all active forces. This is a major caveat that the authors do not address. They explore the possibility that they are not silencing all inputs to muscles by using an additional octopaminergic driver, but this doesn't address the points mentioned above. At the very least, the authors should try using other motor neuron drivers, as well as other neuronal silencers. This driver is so broad that the authors couldn't even use it for physiology experiments. Additionally, the authors could silence VGlut-labeled motor neurons and record muscle activity (potentially using GCaMP as has been done in several recent papers cited by the authors, Azevedo et al, 2020) as a much more direct readout.

      (3) Figure 4 uses an extremely simplified OpenSim model that makes several assumptions that are known to be false. For example, the Thorax-Coxa joint is assumed to be a ball and socket joint, which it is not. Tibia-tarsus joint is completely ignored and likely makes a major contribution in supporting overall posture, given the importance of the leg "claw" for adhering to substrates. Moreover, there are a couple of recent open-source neuromechanical models that include all these details (NeuromechFly by Lobato-Rios et al, 2022, Nat. Methods, and the fly body model by Vaxenburg et al, 2025, Nature). Leveraging these models to rule in or rule out contributions at other joints that are ignored in the authors' OpenSim model would be very helpful to make their case.

      (4) Figure 5 shows the experimental validation of Figure 4 simulations; however, it suffers from several caveats.

      a) The authors track a single point on the head of the fly to estimate the height of the fly. This has several issues. Firstly, it is not clear how accurate the tracking would be. Secondly, it is not clear how the fly actually "falls" on VGlut silencing; do all flies fall in a similar manner in every trial? Almost certainly, there will be some "pitch" and "roll" in the way the fly falls. These will affect the location of this single tracked point in ways that do not reflect the authors' expectations. Unless the authors track multiple points on the fly and show examples of tracked videos, it is hard to believe this dataset and, hence, any of the resulting interpretations.

      b) As described in the previous point, the "reason" the fly falls on silencing all glutamatergic neurons could be due to silencing all sorts of premotor/interneurons in addition to the silencing of motor neurons.

      c) (line 175) "The first finding is that there was a large variation in the initial height of the fly (Figure 5C), consistent with a recent study of flies walking on a treadmill[20]." The cited paper refers to how height varies during "walking". However, in the current study, the authors are only looking at "standing" (i.e. non-walking) flies. So it is not the correct reference. In my opinion, this could simply reflect poor estimation of the fly's height based on poor tracking or other factors like pitch and roll.

      d) "The rate at which the fly fell to the ground was much smaller in the experimental flies than it was in the simulated flies (Figure 5E). The median rate of falling was 1.3 mm/s compared to 37 mm/s for the simulated flies (Figure 5F). (Line 190) The most likely reason for the longer than expected time for the fly to fall is delays associated with motor neuron inactivation and muscle inactivation." I don't believe this reasoning. There are so many caveats (which I described in the above points) in the model and the experiment, that any of those could be responsible for this massive difference between experiment and modeling. Simply not getting rid of all active forces (inadequate silencing) could be one obvious reason. Other reasons could be that the model is using underestimates of passive forces, as alluded to in point 3.

      (5) Final figure (Figure 6) focuses on understanding the time course of neuronal silencing. First of all, I'm not entirely sure how relevant this is for the story. It could be an interesting supplemental data. But it seems a bit tangential. Additionally, it also suffers from major caveats.

      a) The authors now use a new genetic driver for which they don't have any behavioral data in any previous figures. So we do not know if any of this data holds true for the previous experiments. The authors perform whole-cell recordings from random unidentified motor neurons labeled by E49-Gal4>GtACR1 to deduce a time constant for behavioral results obtained in the VGlut-Gal4>GtACR1 experiments.

      b) The DMD setup is useful for focal inactivation, however, the appropriate controls and data are not presented. Line 200 "A spot of light on the cell body produces as much of the hyperpolarization as stimulating the entire fly (mean of 11.3 mV vs 13.1 mV across 9 neurons). Conversely, excluding the cell body produces only a small effect on the MN (mean of 2.6 mV)." First of all, the control experiment for showing that DMD is indeed causing focal inactivation would be to gradually move the spot of light away from the labeled soma, i.e. to the neighboring "labelled" soma and show that there is indeed focal inactivation. Instead, the authors move it quite a long distance into unlabeled neuropil. Secondly, I still don't get why the authors are doing this experiment. Even if we believe the DMD is functioning perfectly, all this really tells us is that a random subset of motor neurons (maybe 5 or 6 cells, legend is missing this info) labeled by E49-Gal4 is strongly hyperpolarized by its own GtACR1 channel opening, rather than being impacted because of hyperpolarizations in other E49-Gal4 labeled neurons. This has no relevance to the interpretation of any of the VGlut-Gal4 behavioral data. VGlut-Gal4 is much broader and also labels all glutamatergic neurons, most of which are inhibitory interneurons whose silencing could lead to disinhibition of downstream networks.

    3. Reviewer #2 (Public review):

      Summary:

      The authors aim to quantify passive muscle forces in the legs of Drosophila, and test the hypothesis that these forces would be sufficient to support body weight in small insects. They take advantage of the genetic tools available in Drosophila, and use a combination of genetic silencing (optogenetic inactivation of motor neurons), kinematic measurements, and simulations using OpenSim. This integrative toolkit is used to examine the role of passive torques across multiple leg joints. They find that passive forces are weaker than expected - in particular, passive forces were found to be too weak to support the body weight of the fly. This challenges previous scaling assumptions derived from studies in larger insects and has potential implications for our understanding of motor control in small animals.

      Strengths:

      The primary strength of this work lies in its integration of multiple analyses. By pulling together simulations, kinematic measurements from high-resolution videos, and genetic manipulation, they are able to overcome limitations of past studies. In particular, optogenetic manipulation allowed for measurements to be made in whole animals, and the modeling component is valuable because it both validates experimental findings and elucidates the mechanism behind some of the observed dynamic consequences (e.g., the rapid fall after motor inactivation). The conclusions made in the study are well-supported by the data and could have an impact on a number of fields, including invertebrate neurobiology and bioinspired design.

      Weaknesses:

      While (as mentioned above) the study's conclusions are well-supported by the results and modeling, limitations arise because of the assumptions made. For instance, using a linear approximation may not hold at larger joint angles, and future studies would benefit from accounting for nonlinearities. Future studies could also delve into the source of passive forces, which is important for more deeply understanding the anatomical and physical basis of the results in this study. For instance, assessments of muscle or joint properties to correlate stiffness values with physical structure might be an area of future consideration.

    4. Reviewer #3 (Public review):

      Summary:

      The authors present a novel method to measure passive joint torques - torques due to internal forces other than active muscle contraction - in the fruit fly: genetically inactivating all motor neurons in an intact limb acted upon by a gravitational load results in a change in limb configuration; evaluating the moment equilibrium condition about the limb joints then yields a direct estimate of the passive joint torques. Deactivating all motor neurons in an intact standing fly provided two further conclusions: First, because deactivation causes the fly to drop to the floor, the passive joint torques are deemed insufficient to maintain rotational equilibrium against the body weight; using a multi-body-dynamics simulation, the authors estimate that the passive torques would need to be about 40-80 times higher to maintain a typical posture without active muscle action. Second, a delay between the motor neuron inactivation and the onset of the "free fall" motivates the authors to invoke a simple exponential decay model, which is then used to derive a time constant for muscle deactivation, in robust agreement with direct electro-physiological recordings.
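      The moment-equilibrium idea summarized above can be sketched for a single planar joint. This is an illustrative simplification, not the authors' analysis: the point-mass loads, the lever arms, the angle convention (measured from the vertical), and all numerical values below are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2

def gravitational_torque(mass_kg, lever_m, angle_from_vertical_rad):
    """Torque magnitude about the joint from a point mass on a rigid segment."""
    return mass_kg * G * lever_m * math.sin(angle_from_vertical_rad)

def passive_torque_at_rest(loads, angle_from_vertical_rad):
    """With motor neurons silenced and the limb at rest, the passive torque
    balances the summed gravitational torques, so the two have equal magnitude.
    `loads` is a list of (mass_kg, lever_m) pairs."""
    return sum(gravitational_torque(m, r, angle_from_vertical_rad) for m, r in loads)

def linear_stiffness(angle_a, torque_a, angle_b, torque_b):
    """Slope of an assumed linear torque-angle relation through two rest positions."""
    return (torque_b - torque_a) / (angle_b - angle_a)

# Hypothetical example: a segment's own mass, then the same segment with an added weight.
segment = (2e-9, 0.3e-3)       # (kg, m), made-up values
added_weight = (1e-7, 0.6e-3)  # (kg, m), made-up values
angle_unloaded = math.radians(20.0)
angle_loaded = math.radians(35.0)

torque_unloaded = passive_torque_at_rest([segment], angle_unloaded)
torque_loaded = passive_torque_at_rest([segment, added_weight], angle_loaded)
stiffness = linear_stiffness(angle_unloaded, torque_unloaded, angle_loaded, torque_loaded)
print(torque_unloaded, torque_loaded, stiffness)
```

      Comparing two rest positions under different loads gives one stiffness estimate per joint under the linearity assumption. A multi-joint leg would need the full set of coupled equilibrium equations.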

      Strengths:

      The experimental design that permits determination of passive joint torques is elegant, effective, novel, and altogether excellent; it permits measurements previously impossible. A careful error analysis is presented, and a spectrum of technically challenging methods, including multi-body dynamics and e-phys, is deployed to further interpret and contextualise the results.

      Weaknesses:

      (1) Passive torques are measured, but only some short speculative statements, largely based on previous work, are offered on their functional significance; some of these claims are not well supported by experimental evidence or theoretical arguments. Passive forces are judged as "large" compared to the weight force of the limb, but the arguably more relevant force is the force limb muscles can generate, which, even in equilibrium conditions, is already about two orders of magnitude larger. The conclusion that passive forces are dynamically irrelevant seems natural, but contrasts with the assertion that "passive forces [...] will have a strong influence on limb kinematics". As a result, the functional significance of passive joint torques in the fruit fly, if any, remains unclear, and this ambiguity represents a missed opportunity. We now know the magnitude of passive joint torques - do they matter and for what? Are they helpful, for example, to maintain robust neuronal control, or a mechanical constraint that negatively impacts performance, e.g., because they present a sink for muscle work?

      (2) The work is framed with a scaling argument, but the assumptions that underpin the associated claims are not explicit and can thus not be evaluated. This is problematic because at least some arguments appear to contradict textbook scaling theory or everyday experience. For example, active forces are assumed to scale with limb volume, when every textbook would have them scale with area instead; and the asserted scaling of passive forces involves some hidden assumptions that demand more explicit discussion to alert the reader to associated limitations. Passive forces are said to be important only in small animals, but a quick self-experiment confirms that they are sufficient to stabilize human fingers or ankles against gravity, systems orders of magnitude larger than an insect limb, in seeming contradiction with the alleged dominance of scale. Throughout the manuscript, there are such and similar inaccuracies or ambiguities in the mechanical framing and interpretation, making it hard to fairly evaluate some claims, and rendering others likely incorrect.

    5. Author response:

      Reviewer 1:

      (1) Line 65 "(Figure 1A). Inactivation causes a change in the leg's rest position; however, in preliminary experiments, the body rotation did not have a large effect on the rest positions of the leg following inactivation. This result is consistent with the one already reported for stick insects and shows that passive forces within the leg are much larger than the gravitational force on a leg and dominate limb position [1]." This is the direct replication of the previous work by Hooper et al 2009 and therefore authors should ideally show the data for this condition (no weight attached).

      We did not present these data (the effect of inactivation on the leg’s rest position in an unweighted leg) because the effect was already reported in stick insects. However, we understand the reviewer’s point that it is important to present the data showing this replication. We will include these data in the revised version.

      (2) The authors use vglut-gal4, a very broad driver for inactivating motor neurons. The driver labels all glutamatergic neurons, including brain descending neurons and nerve cord interneurons, in addition to motor neurons. Additionally, the strength of inactivation might differ in different neurons (including motor neurons) depending on the expression levels of the opsins. As a result, in this condition, the authors might not be removing all active forces. This is a major caveat that the authors do not address. They explore the possibility that they are not silencing all inputs to muscles by using an additional octopaminergic driver, but this doesn't address the points mentioned above. At the very least, the authors should try using other motor neuron drivers, as well as other neuronal silencers. This driver is so broad that the authors couldn't even use it for physiology experiments. Additionally, the authors could silence VGlut-labeled motor neurons and record muscle activity (potentially using GCaMP as has been done in several recent papers cited by the authors, Azevedo et al, 2020) as a much more direct readout.

      This critique concerns the use of vglut-gal4, a broad driver, to inactivate motor neurons (MNs). The reviewer argues that the use of a broad driver might result in some effects that are not due to MN inactivation. Conversely, it is possible that not all MNs are inactivated. These critiques raise important points that we will address in the revision by 1) performing experiments with other MN drivers as suggested by the reviewer, and 2) performing experiments in flies that are inactivated by freezing. These measurements will provide other estimates of passive forces, allowing us to better triangulate the range of values for the passive forces. Moreover, it appears that one of the reviewer’s main concerns is that the passive forces are overestimated because of residual active forces. We will discuss this possibility in detail. It is important to note that, in the end, what we hope to accomplish is to provide a useful estimate of the passive forces. It is unlikely that the passive force will be a precise number like a physical constant, as the passive forces likely depend on recent history.

      (3) Figure 4 uses an extremely simplified OpenSim model that makes several assumptions that are known to be false. For example, the Thorax-Coxa joint is assumed to be a ball and socket joint, which it is not. Tibia-tarsus joint is completely ignored and likely makes a major contribution in supporting overall posture, given the importance of the leg "claw" for adhering to substrates. Moreover, there are a couple of recent open-source neuromechanical models that include all these details (NeuromechFly by Lobato-Rios et al, 2022, Nat. Methods, and the fly body model by Vaxenburg et al, 2025, Nature). Leveraging these models to rule in or rule out contributions at other joints that are ignored in the authors' OpenSim model would be very helpful to make their case.

      Our OpenSim model predates the newer mechanical models. In the revised manuscript, we will revisit the model in light of these recent developments.

      (4) Figure 5 shows the experimental validation of Figure 4 simulations; however, it suffers from several caveats.

      a) The authors track a single point on the head of the fly to estimate the height of the fly. This has several issues. Firstly, it is not clear how accurate the tracking would be. Secondly, it is not clear how the fly actually "falls" on VGlut silencing; do all flies fall in a similar manner in every trial? Almost certainly, there will be some "pitch" and "roll" in the way the fly falls. These will affect the location of this single tracked point in ways that do not reflect the authors' expectations. Unless the authors track multiple points on the fly and show examples of tracked videos, it is hard to believe this dataset and, hence, any of the resulting interpretations.

      b) As described in the previous point, the "reason" the fly falls on silencing all glutamatergic neurons could be due to silencing all sorts of premotor/interneurons in addition to the silencing of motor neurons.

      c) (line 175) "The first finding is that there was a large variation in the initial height of the fly (Figure 5C), consistent with a recent study of flies walking on a treadmill[20]." The cited paper refers to how height varies during "walking". However, in the current study, the authors are only looking at "standing" (i.e. non-walking) flies. So it is not the correct reference. In my opinion, this could simply reflect poor estimation of the fly's height based on poor tracking or other factors like pitch and roll.

      d) "The rate at which the fly fell to the ground was much smaller in the experimental flies than it was in the simulated flies (Figure 5E). The median rate of falling was 1.3 mm/s compared to 37 mm/s for the simulated flies (Figure 5F). (Line 190) The most likely reason for the longer than expected time for the fly to fall is delays associated with motor neuron inactivation and muscle inactivation." I don't believe this reasoning. There are so many caveats (which I described in the above points) in the model and the experiment, that any of those could be responsible for this massive difference between experiment and modeling. Simply not getting rid of all active forces (inadequate silencing) could be one obvious reason. Other reasons could be that the model is using underestimates of passive forces, as alluded to in point 3.

      (4a) Although we agree that measuring different points on the body would allow us to estimate the moments, we disagree that the height of the fly cannot be evaluated from the measurement of a single point. The measurements were performed using the same techniques that we used to assess the fly’s height in a different study, where we estimated the resolution of our imaging system to be ~20 mm (Chun et al., 2021). We will include these details in the revised manuscript. The videos of the falling experiments are not currently available or referenced in the manuscript; they will be made available.

      b) We will repeat the “falling” experiment with a more restrictive driver.

      c) We disagree with the reviewer on this point. The system has a resolution of ~20 mm, which is sufficient to draw conclusions about differences in the height of the fly. We will clarify this point in the revised manuscript.

      d) We do not follow the reviewer’s rationale here. The passive forces (along with any residual forces) are the same in the model as in the experiment. Moreover, there will be a delay between light onset, neuronal inactivation, and muscle inactivation. These processes are not instantaneous. In Figure 6, we estimate these delays and conclude that they will cause a substantial delay. In the revised manuscript, we will discuss the other reasons for the delay suggested by the reviewer.
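      The delay argument can be made concrete with a minimal sketch, assuming that the active supporting force decays exponentially after light onset while the passive force stays constant. The function names, the single time constant, and the example numbers are illustrative assumptions, not values from the manuscript.

```python
import math

def active_force(t_s, initial_force, tau_s):
    """Active supporting force at time t after light onset (exponential decay)."""
    return initial_force * math.exp(-t_s / tau_s)

def fall_onset_delay(initial_force, passive_force, body_weight, tau_s):
    """Time at which active plus passive support first drops below body weight.

    Returns math.inf if the passive force alone can carry the weight, and 0.0
    if the support is already insufficient at light onset."""
    deficit = body_weight - passive_force
    if deficit <= 0:
        return math.inf
    if initial_force <= deficit:
        return 0.0
    return tau_s * math.log(initial_force / deficit)

# Hypothetical numbers in arbitrary force units: the fly starts with twice the
# active force it needs, passive support covers 2% of body weight, tau is 50 ms.
delay = fall_onset_delay(initial_force=2.0, passive_force=0.02, body_weight=1.0, tau_s=0.05)
print(delay)
```

      Under this assumption the fall begins only after the active force has decayed to the shortfall left by the passive force, so a larger initial active force or a longer time constant lengthens the delay.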

      (5) Final figure (Figure 6) focuses on understanding the time course of neuronal silencing. First of all, I'm not entirely sure how relevant this is for the story. It could be an interesting supplemental data. But it seems a bit tangential. Additionally, it also suffers from major caveats.

      a) The authors now use a new genetic driver for which they don't have any behavioral data in any previous figures. So we do not know if any of this data holds true for the previous experiments. The authors perform whole-cell recordings from random unidentified motor neurons labeled by E49-Gal4>GtACR1 to deduce a time constant for behavioral results obtained in the VGlut-Gal4>GtACR1 experiments.

      b) The DMD setup is useful for focal inactivation, however, the appropriate controls and data are not presented. Line 200 "A spot of light on the cell body produces as much of the hyperpolarization as stimulating the entire fly (mean of 11.3 mV vs 13.1 mV across 9 neurons). Conversely, excluding the cell body produces only a small effect on the MN (mean of 2.6 mV)." First of all, the control experiment for showing that DMD is indeed causing focal inactivation would be to gradually move the spot of light away from the labeled soma, i.e. to the neighboring "labelled" soma and show that there is indeed focal inactivation. Instead, the authors move it quite a long distance into unlabeled neuropil. Secondly, I still don't get why the authors are doing this experiment. Even if we believe the DMD is functioning perfectly, all this really tells us is that a random subset of motor neurons (maybe 5 or 6 cells, legend is missing this info) labeled by E49-Gal4 is strongly hyperpolarized by its own GtACR1 channel opening, rather than being impacted because of hyperpolarizations in other E49-Gal4 labeled neurons. This has no relevance to the interpretation of any of the VGlut-Gal4 behavioral data. VGlut-Gal4 is much broader and also labels all glutamatergic neurons, most of which are inhibitory interneurons whose silencing could lead to disinhibition of downstream networks.

      (5a) However, we can address the reviewer’s critique by recording from the Vglut line while using a MN line to target the recordings to MNs.

      b) Once we use the Vglut driver to perform these recordings, it will help assess how much of the MN inactivation is due to the GtACR expressed in the MN versus other neurons.

      Reviewer 2:

      While (as mentioned above) the study's conclusions are well-supported by the results and modeling, limitations arise because of the assumptions made. For instance, using a linear approximation may not hold at larger joint angles, and future studies would benefit from accounting for nonlinearities. Future studies could also delve into the source of passive forces, which is important for more deeply understanding the anatomical and physical basis of the results in this study. For instance, assessments of muscle or joint properties to correlate stiffness values with physical structure might be an area of future consideration.

      We agree with these comments but believe that these studies represent avenues for future work.

      Reviewer 3:

      (1) Passive torques are measured, but only some short speculative statements, largely based on previous work, are offered on their functional significance; some of these claims are not well supported by experimental evidence or theoretical arguments. Passive forces are judged as "large" compared to the weight force of the limb, but the arguably more relevant force is the force limb muscles can generate, which, even in equilibrium conditions, is already about two orders of magnitude larger. The conclusion that passive forces are dynamically irrelevant seems natural, but contrasts with the assertion that "passive forces [...] will have a strong influence on limb kinematics". As a result, the functional significance of passive joint torques in the fruit fly, if any, remains unclear, and this ambiguity represents a missed opportunity. We now know the magnitude of passive joint torques - do they matter and for what? Are they helpful, for example, to maintain robust neuronal control, or a mechanical constraint that negatively impacts performance, e.g., because they present a sink for muscle work?

      To us, measuring passive forces was the first step to understanding neural/biomechanical control of the limb. In general, we agree with these comments and would like to understand the role of passive forces in the overall control of the limb. A complete discussion of the functional significance of passive forces in limb control is beyond the scope of this study. We would like to note that it is unlikely that the active forces are two orders of magnitude larger during unloaded movement of the limb. However, these issues will have to be settled in future work.

      (2) The work is framed with a scaling argument, but the assumptions that underpin the associated claims are not explicit and can thus not be evaluated. This is problematic because at least some arguments appear to contradict textbook scaling theory or everyday experience. For example, active forces are assumed to scale with limb volume, when every textbook would have them scale with area instead; and the asserted scaling of passive forces involves some hidden assumptions that demand more explicit discussion to alert the reader to associated limitations. Passive forces are said to be important only in small animals, but a quick self-experiment confirms that they are sufficient to stabilize human fingers or ankles against gravity, systems orders of magnitude larger than an insect limb, in seeming contradiction with the alleged dominance of scale. Throughout the manuscript, there are such and similar inaccuracies or ambiguities in the mechanical framing and interpretation, making it hard to fairly evaluate some claims, and rendering others likely incorrect.

      We interpret this comment as making two separate points. The first is that our statement that active forces depend on the third power of limb length (L<sup>3</sup>) is incorrect. We agree and apologize for this oversight. Specifically, on L6-7 we say, “both inertial forces and active forces scale with the mass of the limb which in turn scales with the volume of the limb and therefore depends on the third power of limb length (L<sup>3</sup>)”. Instead, this statement should read “inertial forces scale with the mass of the limb which in turn scales with the volume of the limb and therefore depend on the third power of limb length (L<sup>3</sup>)”. However, this oversight does not affect the scaling argument, as the scaling arguments in the rest of the manuscript involve only inertial forces and not active forces.

      The second point is about the scaling law that governs passive forces. In the current manuscript, we have assumed that the passive forces scale as L<sup>2</sup> based on previous work. The reviewer has pointed out that this assumption might be incorrect or at the very least needs a rationale. We agree with this assessment: passive forces that arise in the muscle are likely to scale as L<sup>2</sup> but passive forces that arise in the joint might not. In the revised manuscript, we will discuss this concern.
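      The scaling argument discussed in these two points can be written out as a short worked relation. It takes the assumptions stated above at face value (inertial forces scale with limb mass, passive forces scale as L<sup>2</sup>) and is a sketch of the argument, not a validated scaling law.

```latex
\begin{align}
F_{\text{inertial}} &\propto m \propto L^{3}, \\
F_{\text{passive}} &\propto L^{2} \quad \text{(assumed; plausible for muscle-derived forces)}, \\
\frac{F_{\text{passive}}}{F_{\text{inertial}}} &\propto \frac{L^{2}}{L^{3}} = L^{-1}.
\end{align}
```

      Under these assumptions, a limb 100 times shorter has a passive-to-inertial force ratio 100 times larger. If joint-derived passive forces follow a different exponent, the ratio changes accordingly.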

      Response to the public comment:

      There was a comment from a reader: “None of our work cited in various places in this preprint (i.e., Zakotnik et al. 2006, Guschlbauer et al. 2007, Page et al. 2008, Hooper et al. 2009, Hooper 2012, Ache and Matheson 2012, Blümel et al. 2012, Ache and Matheson 2013, von Twickel et al. 2019, and Guschlbauer et al. 2022) claims or implies that passive forces could be sufficient to support the weight of an insect or any animal. To claim or suggest otherwise (as done in lines 33-35) is incorrect and sets up a misleading straw man that misrepresents our work. All statements in the preprint regarding our work related to this specific matter need to be removed or edited accordingly. For instance, the investigations, calculations, and interpretations in Hooper et al. 2009 are solely about limbs that are not being used in stance or other loaded tasks (indeed, the article's title specifically refers to "unloaded" leg posture and movements). Trying to use this work to predict whether passive muscle forces alone can support a stick insect against gravity requires considering much more than the oversimplified calculation given in lines 290-292. Other “back of the envelope calculations” (lines 299-300) are likely also insufficient and erroneous. The discussion in lines 289-304 needs to be edited accordingly”

      We thank the reader for their comment. However, we interpret these studies differently. The studies above rightly focused on unloaded legs because it would be difficult to study passive forces in an intact insect without genetic tools. The commenter correctly points out that these studies do not comment on whether passive forces are strong enough to support the weight of the fly. However, we disagree that our arguments based on their results are unreasonable or a straw man. We think that our interpretation of their measurements is correct. Moreover, we were motivated by Yox et al. (1982), who state in so many words: “Stiffness of the muscles in the joints of all the legs might be sufficient to support a resting arthropod. A more rigorous analysis of all supporting limbs and joint angles would be required to prove this hypothesis”. In the revised manuscript, we will make it clear that the statement made in Line 33 is based on Yox et al. and our interpretation of measurements made by others.

    1. eLife Assessment

      This important study characterises the morphogenesis of cortical folding in the ferret and human cerebral cortex using complementary physical and computational modelling. Notably, these approaches are applied to charting, in the ferret model, known abnormalities of cortical folding in humans. The study finds that variation in cortical thickness and expansion account for deviations in morphology, and supports these findings using cutting-edge approaches from both physical gel models and numerical simulations. The strength of evidence is convincing, and although it could benefit from more quantitative assessment, the study will be of broad interest to the field of developmental neuroscience.

    2. Reviewer #1 (Public review):

      The manuscript by Choi and colleagues investigates the impact of variation in cortical geometry and growth on cortical surface morphology. Specifically, the study uses physical gel models and computational models to evaluate the impact of varying specific features/parameters of the cortical surface. The study makes use of this approach to address the topic of malformations of cortical development and finds that cortical thickness and cortical expansion rate are the drivers of differences in morphogenesis.

      The study is composed of two main sections. First, the authors validate numerical simulation and gel model approaches against real cortical postnatal development in the ferret. Next, the study turns to modelling malformations in cortical development using modified tangential growth rate and cortical thickness parameters in numerical simulations. The findings investigate three genetically linked cortical malformations observed in the human brain to demonstrate the impact of the two physical parameters on folding in the ferret brain.

      This is a tightly presented study that demonstrates a key insight into cortical morphogenesis and the impact of deviations from normal development. The dual physical and computational modeling approach offers the potential for unique insights into mechanisms driving malformations. This study establishes a strong foundation for further work directly probing the development of cortical folding in the ferret brain. One weakness of the current study is that the interpretation of the results in the context of human cortical development is at present indirect, as the modelling results are solely derived from the ferret. However, these modelling approaches demonstrate proof of concept for investigating related alterations more directly in future work through similar approaches to models of the human cerebral cortex.

    3. Reviewer #2 (Public review):

      Summary:

      Based on MRI data of the ferret (a gyrencephalic non-primate animal, in whom folding happens postnatally), the authors create in vitro physical gel models and in silico numerical simulations of typical cortical gyrification. They then use genetic manipulations of animal models to demonstrate that cortical thickness and expansion rate are primary drivers of atypical morphogenesis. These observations are then used to explain cortical malformations in humans.

      Strengths:

      The paper is very interesting and original, and combines physical gel experiments, numerical simulations, as well as observations in MCD. The figures are informative, and the results appear to have good overall face validity.

      Weaknesses:

      On the other hand, I perceived some lack of quantitative analyses in the different experiments, and currently, there seems to be rather a visual/qualitative interpretation of the different processes and their similarities/differences.

      Ideally, the authors also quantify local/pointwise surface expansion in the physical and simulation experiments, to more directly compare these processes. Time courses of, e.g., cortical curvature changes could also be plotted and compared for those experiments.

      I had a similar impression about the comparisons between simulation results and human MRI data. Again, face validity appears high, but the comparison appeared mainly qualitative.

      I felt that MCDs could have been better contextualized in the introduction.

    4. Author response:

      Reviewer 1 (Public review):

      The manuscript by Choi and colleagues investigates the impact of variation in cortical geometry and growth on cortical surface morphology. Specifically, the study uses physical gel models and computational models to evaluate the impact of varying specific features/parameters of the cortical surface. The study makes use of this approach to address the topic of malformations of cortical development and finds that cortical thickness and cortical expansion rate are the drivers of differences in morphogenesis.

      The study is composed of two main sections. First, the authors validate numerical simulation and gel model approaches against real cortical postnatal development in the ferret. Next, the study turns to modelling malformations in cortical development using modified tangential growth rate and cortical thickness parameters in numerical simulations. The findings investigate three genetically linked cortical malformations observed in the human brain to demonstrate the impact of the two physical parameters on folding in the ferret brain.

      This is a tightly presented study that demonstrates a key insight into cortical morphogenesis and the impact of deviations from normal development. The dual physical and computational modeling approach offers the potential for unique insights into mechanisms driving malformations. This study establishes a strong foundation for further work directly probing the development of cortical folding in the ferret brain. One weakness of the current study is that the interpretation of the results in the context of human cortical development is at present indirect, as the modelling results are solely derived from the ferret. However, these modelling approaches demonstrate proof of concept for investigating related alterations more directly in future work through similar approaches to models of the human cerebral cortex.

      We thank the reviewer for the very positive comments. While the current gel and organismal experiments focus on the ferret only, we want to emphasize that our analysis does consider previous observations of human brains and morphologies therein (Tallinen et al., Proc. Natl. Acad. Sci. 2014; Tallinen et al., Nat. Phys. 2016), which we compare and explain. This allows us to draw out the broader implications of our study for explaining cortical malformations in humans, with the ferret serving as the motivating system. Further analysis of normal human brain growth using computational and physical gel models can be found in our companion paper (Yin et al., 2025), also submitted to eLife:

      S. Yin, C. Liu, G. P. T. Choi, Y. Jung, K. Heuer, R. Toro, L. Mahadevan, Morphogenesis and morphometry of brain folding patterns across species. bioRxiv 2025.03.05.641692.

      In future work, we plan to obtain malformed human cortical surface data, which would allow us to further investigate related alterations more directly.

      Reviewer 2 (Public review):

      Summary:

      Based on MRI data of the ferret (a gyrencephalic non-primate animal, in whom folding happens postnatally), the authors create in vitro physical gel models and in silico numerical simulations of typical cortical gyrification. They then use genetic manipulations of animal models to demonstrate that cortical thickness and expansion rate are primary drivers of atypical morphogenesis. These observations are then used to explain cortical malformations in humans.

      Strengths:

      The paper is very interesting and original, and combines physical gel experiments, numerical simulations, as well as observations in MCD. The figures are informative, and the results appear to have good overall face validity.

      We thank the reviewer for the very positive comments.

      Weaknesses:

      On the other hand, I perceived some lack of quantitative analyses in the different experiments, and currently, there seems to be rather a visual/qualitative interpretation of the different processes and their similarities/differences. Ideally, the authors also quantify local/pointwise surface expansion in the physical and simulation experiments, to more directly compare these processes. Time courses of, e.g., cortical curvature changes could also be plotted and compared for those experiments. I had a similar impression about the comparisons between simulation results and human MRI data. Again, face validity appears high, but the comparison appeared mainly qualitative.

      We thank the reviewer for the comments. Besides the visual and qualitative comparisons between the models, we would like to point out that we have included the quantification of the shape difference between the real and simulated ferret brain models via spherical parameterization and the curvature-based shape index as detailed in main text Fig. 4 and SI Section 3. We have also utilized spherical harmonics representations for the comparison between the real and simulated ferret brains at different maximum order N. In our revision, we plan to further include the curvature-based shape index calculations for the comparison between the real and simulated ferret brains at more time points.
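
      To make the notion of a curvature-based shape index concrete, the sketch below implements one common textbook form of the index, computed from the two principal curvatures at a surface point. This is an illustrative assumption: the authors' exact definition (SI Section 3) is not reproduced here and may differ, for example in sign convention, which depends on the orientation of the surface normal. The function name `shape_index` is hypothetical.

```python
from math import atan2, pi


def shape_index(k1, k2):
    """Shape index in [-1, 1] from two principal curvatures (order-independent).

    Convention assumed here: +1 for a cap whose principal curvatures are both
    positive and equal, -1 for the mirror-image cup, 0 for a symmetric saddle.
    """
    k_max, k_min = max(k1, k2), min(k1, k2)
    # atan2 avoids dividing by zero at umbilic points (k_max == k_min).
    # At a perfectly flat point (both zero) the index is undefined;
    # atan2(0, 0) returns 0.0 in Python, so this function returns 0.0 there.
    return (2.0 / pi) * atan2(k_max + k_min, k_max - k_min)


if __name__ == "__main__":
    for k1, k2 in [(1.0, 1.0), (1.0, 0.0), (1.0, -1.0), (-1.0, -1.0)]:
        print((k1, k2), round(shape_index(k1, k2), 3))
```

      Because the index depends only on the ratio of the principal curvatures, it is insensitive to overall size, which is one reason such indices are convenient for comparing real and simulated surfaces of different scales.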

      As for the comparison between the malformation simulation results and human MRI data in the current work, since the human MRI data are two-dimensional while our computational models are three-dimensional, we focus on the qualitative comparison between them. In future work, we plan to obtain malformed human cortical surface data, from which we can then perform the parameterization-based and curvature-based shape analysis for a more quantitative assessment.

      I felt that MCDs could have been better contextualized in the introduction.

      We thank the reviewer for the comment and will include a more detailed introduction to MCDs in our revision.

    1. eLife Assessment

      This is an important study reporting a new phenotype for a gene cluster that has previously been associated with the responses of the Gram-negative opportunistic pathogen Pseudomonas aeruginosa to fluid flow. Expression of the froABCD gene cluster is induced by HOCl in vitro and by activated immune cells, which produce these types of reactive chlorine species. Overall, the evidence presented by the authors is solid; however, the mechanism of fro-induction by HOCl remains unclear, and the evidence in support of the authors' claims is descriptive, which needs to be improved. This study is of interest to infection biologists interested in mechanisms of bacterial pathogenicity.

    2. Reviewer #1 (Public review):

      Summary:

      Foik et al. report that hypochlorous acid, a reactive chlorine species generated during host defense, activates the transcription of the froABCD gene cluster in P. aeruginosa. This gene cluster had previously been associated with a potential role during the flow of fluids and appears to be regulated by the sigma factor FroR and its anti-sigma factor FroI. In the present study, the authors show that froABCD is expressed both in neutrophils and macrophages, which they claim is likely a result of HOCl but not H2O2 production. Fro expression is also induced in a murine model of corneal infection, which is characterized by immune cell invasion. Expression of the fro system can be quenched by several antioxidants, such as methionine, cysteine, and others. FroR-deficient cells that lack froABCD expression during HOCl stress appear more sensitive to the oxidant.

      Strengths:

      The authors provide a number of data supporting their claim that transcription of the froABCD system is induced by reactive chlorine species. This was shown by RNAseq, qRT-PCR, and through microscopy using a transcriptional reporter fusion. Likewise, elevated expression of froABCD was shown in vitro and in vivo, excluding potential in vitro artifacts. The manuscript, while mostly descriptive, is easy to follow, and the data were presented clearly.

      Weaknesses:

      (1) Lines 60-62: Some of the authors' conclusions are not supported by the data and thus appear unfounded. One example: "we determine that fro upregulation.....These data suggest a novel mechanism..." Their data do not show that MSR upregulation is a direct effect of FroABCD. Instead, it could be possible that the FroR sigma factor also controls the expression of msr genes, which would be independent of froABCD.

      (2) The authors show increased fro transcription both in neutrophils and macrophages; however, the two types of immune cells differ quite dramatically with respect to myeloperoxidase activation and HOCl production. Neither has this been discussed nor considered here.

      (3) With respect to the activation of fro expression upon challenge with conditioned media from stimulated neutrophils, does the conditioned media contain detectable amounts of HOCl? Do chloramines, which are byproducts of HOCl oxidation with amines, also stimulate expression?

      (4) A better control to prove that this fro expression is indeed induced by HOCl in activated neutrophils would be to conduct the experiments in the presence of a myeloperoxidase inhibitor.

      (5) The work was conducted with two different P. aeruginosa strains (i.e., AL143 and PAO1F). None of the figure legends provides details on which strain was used. For instance, in line 111, the authors refer to Figure S1B for data that I thought were done with PAO1F, while in line 154, data were presented in the context of the infection model, which was conducted with the other strain.

      (6) It would be good if immune cell recruitment at 2 h and 20 h post-infection could be quantified.

      (7) The conclusions of Figure 4 are, in my opinion, weak (line 187-188; "It is possible that ....."). These antioxidants likely quench the low amounts of NaOCl directly. This would significantly reduce the NaOCl concentrations to a level that no longer activates expression of fro. There is no direct evidence provided that oxidized methionine induces fro expression. Do the authors postulate that this is free methionine, or could methionine and/or cysteine oxidation in FroR increase the binding affinity of the sigma factor to the promoter? Another possibility is that NaOCl deactivates the anti-sigma factor. None of these scenarios has been considered here.

      (8) Line 184: The reaction constants of HOCl with Cys and Met are similar.

      (9) Treatment with 16 µM NaOCl caused a growth arrest of ~15 h in the WT (Figure 5A), whereas no growth at all was recorded with 7.5 µM in Figure 3A.

      (10) The concentration range of NaOCl causing fro expression is extremely narrow, while oxidative burst rapidly generates HOCl at much higher concentrations. This should be discussed in more detail.

    3. Reviewer #2 (Public review):

      Summary:

      Foik et al. studied the regulation of the fro operon in response to HOCl, an oxidant derived from immune cells, especially neutrophils. They use a transcriptional fusion of YFP to the froA promoter in an mCherry-expressing P. aeruginosa strain to determine fro-induction under the microscope. They use this system to study fro expression in medium, in the presence of neutrophils and macrophages, neutrophil-conditioned medium, and several chemical stimuli, including NaCl, HOCl, hydrogen peroxide, nitric acid, hydrochloric acid, and sodium hydroxide. They also use a corneal infection model to demonstrate that froA is upregulated in P. aeruginosa 20 h post-infection and perform transcriptional analyses in WT and a froR mutant in response to HOCl.

      Strengths:

      Their data clearly show that HOCl is a strong inducer of the fro operon. The addition of HOCl-quenching chemicals together with HOCl abrogates the response. They also show that a froR mutant is more susceptible to HOCl than WT. Their transcriptomic data reveal genes under control of the FroR/FroI sigma factor/anti-sigma factor system.

      Weaknesses:

      Although the presented evidence is mostly solid, some of their findings need to be evaluated more carefully; explaining the rationale behind some of the experiments might enhance the article, and some of the models proposed by the authors seem far-fetched, as outlined below:

      (1) In line 76 the authors claim "Relative to P. aeruginosa that were incubated in host cell-free media, P. aeruginosa in close proximity to human neutrophils or that were engulfed in mouse macrophages appeared to increase fro expression (Fig. 1C)". Counting bacterial cells in Figure 1C shows that 1 in 17 bacteria (5.9%) induce the froA promoter in media in the absence of immune cells, while 4 in 72 bacteria (only 5.6%) do the same in the presence of neutrophils. Contrary to the authors' claims, it appears that P. aeruginosa does not increase, and if anything slightly decreases, fro-expression in close proximity to neutrophils. There is a slight increase in fro-expression in bacteria co-incubated with macrophages (3 in 21, or 14.3%). A more rigorous statistical analysis might substantiate the authors' claim, but, as is, the claim "neutrophils increase fro expression" is untenable.

      (2) The authors should explain the rationale behind some of the chemicals used. Why did they use nitric acid? Especially at these high concentrations, a strong acid such as nitric acid might have a significant influence on the medium pH. I understand that the medium is phosphate-buffered, but 25 mM nitric acid in an unbuffered medium would shift the pH well below 2. Similar considerations apply to hydrochloric acid and sodium hydroxide.

      (3) In line 187, the authors state that "It is possible that oxidized methionine increases fro expression" and they suggest a model to that effect in Figure 5D. It is unclear why the authors singled out methionine sulfoxide, since a number of other things get oxidized by HOCl. In line 184, the authors state, in the same vein, that "HOCl oxidizes methionine residues 100-fold more rapidly than other cellular components". The authors should state which other cellular compounds they are referring to. Certainly not cysteine and other thiols, which react equally fast and are highly abundant in the cell: P. aeruginosa contains 340 µM GSH, 140 µM CoA-SH (https://doi.org/10.1074/jbc.RA119.009934) plus free cysteine and cysteines in proteins (based on codon usage, 1.34% of amino acids in proteins are cysteine, while methionine is only slightly more present at 2.10%, although a number of starting methionines are removed from mature proteins).

      (4) Overall (and this is probably not addressable with the authors' data), some very interesting questions remain unanswered: what is the molecular mechanism of fro-induction? How is the FroR/FroI system modulated by HOCl? Does the system sense free or protein-bound methionine-sulfoxide? Are certain methionine residues in these proteins directly oxidized by HOCl? Many "HOCl-sensing" proteins are also modified at cysteine residues or amino groups; could those play a role? And lastly: what is the connection between shear/fluid flow and HOCl, or are these totally separate mechanisms of fro-induction?
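
      Two of the quantitative points in the list above can be checked with a short script: the cell counts quoted in point (1) and the pH estimate in point (2). The sketch below is illustrative only. It assumes the counts are exactly those the reviewer read off Figure 1C, uses a hand-written two-sided Fisher exact test (not necessarily the analysis the reviewer or authors would choose), and treats nitric acid as a fully dissociated monoprotic acid in ideal, unbuffered water. All function names are hypothetical.

```python
from math import comb, log10


def percent(k, n):
    """Percentage of k out of n, rounded to one decimal place."""
    return round(100.0 * k / n, 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = {x: comb(row1, x) * comb(row2, col1 - x) / total
             for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Small relative tolerance so ties with the observed table are included.
    return sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9))


def strong_acid_ph(molar):
    """Idealized pH of a fully dissociated monoprotic acid, no buffering."""
    return -log10(molar)


# Counts as quoted in point (1): (fro-positive bacteria, total bacteria).
counts = {"media": (1, 17), "neutrophils": (4, 72), "macrophages": (3, 21)}
for name, (pos, tot) in counts.items():
    print(f"{name}: {percent(pos, tot)}%")

# Each immune-cell condition compared with media alone.
p_neut = fisher_exact_two_sided(1, 16, 4, 68)
p_mac = fisher_exact_two_sided(1, 16, 3, 18)
print(f"neutrophils vs media: p = {p_neut:.2f}")
print(f"macrophages vs media: p = {p_mac:.2f}")

# Point (2): 25 mM strong acid in unbuffered water.
print(f"pH at 25 mM: {strong_acid_ph(0.025):.2f}")
```

      With these counts, neither comparison against media alone comes anywhere near conventional significance, which is consistent with the request for quantification over many more cells. The idealized pH of about 1.6 agrees with the statement that 25 mM nitric acid would bring an unbuffered medium well below pH 2.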

    4. Author response:

      We greatly appreciate the efforts of the reviewers, who have provided insightful and helpful comments to improve the manuscript. The feedback touches upon a number of topics, focusing on clarification or justification of experimental techniques and on understanding the mechanism by which P. aeruginosa detects HOCl. All reviewers raised the issue of how HOCl activates fro expression, including whether free or protein-bound methionine, cysteine, or other HOCl byproducts induce this expression. For the upcoming revision, we plan to perform experiments that address this issue and will discuss potential mechanistic models in light of the new data. In addition, we plan to perform additional experiments to address a reviewer’s concerns regarding the dependence of the fro response on HOCl production by neutrophils. The revision will correct imprecise statements pointed out by reviewers, and address all remaining issues requiring clarification or further discussion, including the range of HOCl sensitivity, the relationship between HOCl and flow sensitivity, and the justification for testing the fro response to nitric acid.

    1. eLife Assessment

      This study provides valuable insights into the host's variable susceptibility to Mycobacterium tuberculosis, using a novel collection of wild-derived inbred mouse lines from diverse geographic locations, along with immunological and single-cell transcriptomic analyses. While the data are convincing, a deeper mechanistic investigation into neutrophil subset functions would have further enhanced the study. This work will interest microbiologists and immunologists in the tuberculosis field.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigated the heterogeneous responses to Mycobacterium tuberculosis (Mtb) in 19 wild-derived inbred mouse strains collected from various geographic locations. The goal of this study is to identify novel mechanisms that regulate host susceptibility to Mtb infection. Using the genetically resistant C57BL/6 mouse strain as the control, they successfully identified a few mouse strains that revealed higher bacterial burdens in the lung, implicating increased susceptibility in those mouse strains. Furthermore, using flow cytometry analysis, they discovered strong correlations between CFU and various immune cell types, including T cells and B cells. The higher neutrophil numbers correlated with significantly higher CFU in some of the newly identified susceptible mouse strains. Interestingly, MANB and MANC mice exhibited comparable numbers of neutrophils but showed drastically different bacterial burdens. The authors then focused on the neutrophil heterogeneity and utilized a single-cell RNA-seq approach, which led to identifying distinct neutrophil subsets in various mouse strains, including C57BL/6, MANA, MANB, and MANC. Pathway analysis on neutrophils in susceptible MANC strain revealed a highly activated and glycolytic phenotype, implicating a possible mechanism that may contribute to the susceptible phenotype. Lastly, the authors found that a small group of neutrophil-specific genes are expressed across many other cell types in the MANC strain.

      Strengths:

      This manuscript has many strengths.

      (1) Utilizing and characterizing novel mouse strains that complement the current widely used mouse models in the field of TB. Many of those mouse strains will be novel tools for studying host responses to Mtb infection.

      (2) The study revealed very unique biology of neutrophils during Mtb infection. It has been well-established that high numbers of neutrophils correlate with high bacterial burden in mice. However, this work uncovered that some mouse strains could be resistant to infection even with high numbers of neutrophils in the lung, indicating the diverse functions of neutrophils. This information is important.

      Weaknesses:

      The main weakness of the manuscript is that the work is relatively descriptive. It is unclear whether the neutrophil subsets are indeed functionally different. While single-cell RNA-seq did provide some clues at the transcriptional level, functional and mechanistic investigations are lacking. Similarly, it is unclear how highly activated and glycolytic neutrophils in the MANC strain contribute to its susceptibility.

    3. Reviewer #2 (Public review):

      Summary:

      These studies investigate the phenotypic variability and roles of neutrophils in tuberculosis (TB) susceptibility by using a diverse collection of wild-derived inbred mouse lines. The authors aimed to identify new phenotypes during Mycobacterium tuberculosis infection by developing, infecting, and phenotyping 19 genetically diverse wild-derived inbred mouse lines originating from different geographic regions in North America and South America. The investigators achieved their main goals, which were to show that increasing genetic diversity increases the phenotypic spectrum observed in response to aerosolized M. tuberculosis, and further to provide insights into immune and/or inflammatory correlates of pulmonary TB. Briefly, investigators infected wild-derived mice with aerosolized M. tuberculosis and assessed early infection control at 21 days post-infection. The time point was specifically selected to correspond to the period after infection when acquired immunity and antigen-specific responses manifest strongly, and when early susceptibility (morbidity and mortality) due to M. tuberculosis infection has been observed in other highly susceptible wild-derived mouse strains, some Collaborative Cross inbred strains, and approximately 30% of individuals in the Diversity Outbred mouse population. Here, the investigators normalized bacterial burden across mice based on inoculum dose and determined the percentage of immune cells using flow cytometry, primarily focused on macrophages, neutrophils, CD4 T cells, CD8 T cells, and B cells in the lungs. They also used single-cell RNA sequencing to identify neutrophil subpopulations and immune phenotypes, elegantly supplemented with in vitro macrophage infections and antibody depletion assays to confirm immune cell contributions to susceptibility. The main results from this study confirm that mouse strains show considerable variability in susceptibility to M. tuberculosis. The authors observed that enhanced infection control correlated with higher percentages of CD4 and CD8 T cells, and B cells, but not necessarily with the percentage of interferon-gamma (IFN-γ) producing cells. High levels of neutrophils and immature neutrophils (band cells) were associated with increased susceptibility, and the mouse strain with the most neutrophils, the MANC line, exhibited a transcriptional signature indicative of a highly activated state and containing potentially tissue-destructive mediators that could contribute to the strain's increased susceptibility and be leveraged to understand how neutrophils drive lung tissue damage, cavitation, and granuloma necrosis in pulmonary TB.

      Strengths:

      The strengths are addressing a critically important consideration in the tuberculosis field - mouse model(s) of the human disease, and taking advantage of the novel phenotypes observed to determine potential mechanisms. Notable strengths include:

      (1) Innovative generation and use of mouse models: Developing wild-derived inbred mice from diverse geographic locations is innovative, and this approach expands the range of phenotypic responses observed during M. tuberculosis infection. Additionally, the authors have deposited strains at The Jackson Laboratory making these valuable resources available to the scientific community.

      (2) Potential for translational research: The findings have implications for human pulmonary TB, particularly the discovery of neutrophil-associated susceptibility in primary infection and/or neutrophil-mediated disease progression that could both inform the development of therapeutic targets and also be used to test the effectiveness of such therapies.

      (3) Comprehensive experimental design: The investigators use many complementary approaches, including in vivo M. tuberculosis infection, in vitro macrophage studies, neutrophil depletion experiments, flow cytometry, and a number of data mining, machine learning, and imaging methods, to produce robust and comprehensive analyses of the wild-derived strains and neutrophil subpopulations at 3 weeks after M. tuberculosis infection.

      Weaknesses:

      The manuscript and studies have considerable strengths and very few weaknesses. One minor consideration is that phenotyping is limited to a single time point; however, this time point was carefully selected and has a strong biological rationale provided by the investigators. This potential weakness does not diminish the overall findings, exciting results, or conclusions.

    4. Author response:

      Reviewer #1 (Public review):

      […] Strengths:

      This manuscript has many strengths.

      (1) Utilizing and characterizing novel mouse strains that complement the current widely used mouse models in the field of TB. Many of those mouse strains will be novel tools for studying host responses to Mtb infection.

      (2) The study revealed very unique biology of neutrophils during Mtb infection. It has been well-established that high numbers of neutrophils correlate with high bacterial burden in mice. However, this work uncovered that some mouse strains could be resistant to infection even with high numbers of neutrophils in the lung, indicating the diverse functions of neutrophils. This information is important.

      We are grateful for the reviewer’s thoughtful consideration of our work and appreciate their comment that our mouse strains can benefit the models available in the TB field. We further appreciate the recognition of the importance of neutrophil diversity during Mtb infection.

      Weaknesses:

      The main weakness of the manuscript is that the work is relatively descriptive. It is unclear whether the neutrophil subsets are indeed functionally different. While single-cell RNA-seq did provide some clues at the transcriptional level, functional and mechanistic investigations are lacking.

      We appreciate this comment and agree that further research needs to be done on the functionality of the neutrophils to discover mechanistic differences between the mouse genotypes. Our attempts at extracting sufficient RNA from sorted neutrophils from the mouse lungs were unsuccessful. However, future attempts at comparing RNA expression between mouse genotypes, as well as proteomic data, are necessary to determine the mechanistic differences in neutrophil biology in these mice.

      Similarly, it is unclear how highly activated and glycolytic neutrophils in MANC strain contribute to its susceptibility.

      This is a fair comment and we agree that it is still unclear how these neutrophils contribute to MANC susceptibility. Growing the neutrophils ex vivo and infecting them with Mtb is technically challenging, due to the slow growth of Mtb and the short lifespan of the neutrophils. As mentioned in the comment above, future in vivo characterization and RNA expression studies will be necessary to address these questions.

      Reviewer #2 (Public review):

      […] Strengths:

      The strengths are addressing a critically important consideration in the tuberculosis field - mouse model(s) of the human disease, and taking advantage of the novel phenotypes observed to determine potential mechanisms. Notable strengths include,

      (1) Innovative generation and use of mouse models: Developing wild-derived inbred mice from diverse geographic locations is innovative, and this approach expands the range of phenotypic responses observed during M. tuberculosis infection. Additionally, the authors have deposited strains at The Jackson Laboratory making these valuable resources available to the scientific community.

      (2) Potential for translational research: The findings have implications for human pulmonary TB, particularly the discovery of neutrophil-associated susceptibility in primary infection and/or neutrophil-mediated disease progression that could both inform the development of therapeutic targets and also be used to test the effectiveness of such therapies.

      (3) Comprehensive experimental design: The investigators use many complementary approaches including in vivo M. tuberculosis infection, in vitro macrophage studies, neutrophil depletion experiments, flow cytometry, and a number of data mining, machine learning, and imaging methods to produce robust and comprehensive analyses of the wild-derived strains and neutrophil subpopulations at 3 weeks after M. tuberculosis infection.

      We thank the reviewer for their thorough and thoughtful assessment of our study. We appreciate the recognition that this mouse model can become a resource and can benefit the study of different immune responses to Mtb infection as well as be informative for studying human TB. We further appreciate their comment that the complementary approaches we have used to characterize the mouse phenotypes strengthen this study.

      Weaknesses:

      The manuscript and studies have considerable strengths and very few weaknesses. One minor consideration is that phenotyping is limited to a single limited-time point; however, this time point was carefully selected and has a strong biological rationale provided by investigators. This potential weakness does not diminish the overall findings, exciting results, or conclusions.

      We thank the reviewer for pointing out that a single time point has been studied, and that this time point is biologically relevant. We agree that additional time points, including later time points that address systemic dissemination, should be included in future studies.

    1. eLife Assessment

      In this important study, the authors develop a microfluidic "Vessel-on-Chip" model to study Neisseria meningitidis interactions in an in vitro vascular system. Compelling evidence demonstrates that endothelial cell-lined channels can be colonized by N. meningitidis, triggering neutrophil recruitment with advantages over complex surgical xenograft models. This system offers potential for follow-on studies of N. meningitidis pathogenesis, though it lacks the cellular complexity of true vasculature including smooth muscle cells and pericytes.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      The work by Pinon et al describes the generation of a microvascular model to study Neisseria meningitidis interactions with blood vessels. The model uses a novel and relatively high throughput fabrication method that allows full control over the geometry of the vessels. The model is well characterized from the vascular standpoint and shows improvements when exposed to flow. The authors show that Neisseria binds to the 3D model in a geometry similar to that in the animal xenograft model, induces an increase in permeability shortly after bacterial perfusion, and induces endothelial cytoskeleton rearrangements including a honeycomb actin structure. Finally, the authors show neutrophil recruitment to bacterial microcolonies and phagocytosis of Neisseria.

      Strengths:

      The article is overall well written, and it is a great advancement in the bioengineering and sepsis infection field. The authors achieved their aim of establishing a good model for Neisseria vascular pathogenesis and the results support the conclusions. I support the publication of the manuscript. I include below some clarifications that I consider would be good for readers.

      One of the most novel things of the manuscript is the use of a relatively quick photoablation system. Could this technique be applied in other laboratories? While the revised manuscript includes more technical details as requested, the description remains difficult to follow for readers from a biology background. I recommend revising this section to improve clarity and accessibility for a broader scientific audience.

      The authors suggest that in the animal model, early 3h infection with Neisseria does not show an increase in vascular permeability, contrary to their findings in the 3D in vitro model. However, they show a non-significant increase in permeability of 70 kDa Dextran in the animal xenograft early infection. As a bioengineer, this seems to me to indicate that, had the experiment been done with a lower molecular weight tracer, significant increases in permeability could have been detected. I would suggest doing this experiment, which could capture early events in vascular disruption.

      One of the great advantages of the system is the possibility of visualizing infection-related events at high resolution. The authors show the formation of an actin honeycomb structure beneath the bacterial microcolonies. This only occurred in 65% of the microcolonies. Is this result similar to in vitro 2D endothelial cultures in static and under flow? Also, the group has shown in the past positive staining of other cytoskeletal proteins, such as ezrin in the ERM complex. Does this also occur in the 3D system?

      Significance:

      The manuscript is comprehensive, complete and represents the first bioengineered model of sepsis. One of the major strengths is the careful characterization and benchmarking against the animal xenograft model. Beyond the technical achievement, the manuscript is also highly quantitative and includes advanced image analysis that could benefit many scientists. The authors show a quick photoablation method that would be useful for the bioengineering community and improve the state of the art by providing a new experimental model for sepsis.

      My expertise is on infection bioengineered models.

    3. Reviewer #2 (Public review):

      Pinon and colleagues have developed a Vessel-on-Chip model showcasing geometrical and physical properties similar to the murine vessels used in the study of systemic infections. The vessel was created via highly controllable laser photoablation in a collagen matrix, subsequent seeding of human endothelial cells, and flow perfusion to induce mechanical cues. This model could be infected with Neisseria meningitidis as a model of systemic infection. In this model, microcolony formation and dynamics, and effects on the host were very similar to those described for the human skin xenograft mouse model (the current gold standard for systemic studies) and were consistent with observations made in patients. The model could also recapitulate the neutrophil response upon N. meningitidis systemic infection.

      The claims and the conclusions are supported by the data, the methods are properly presented, and the data is analyzed adequately. The most important strength of this manuscript is the technology developed to build this model, which is impressive and very innovative. The Vessel-on-Chip can be tuned to acquire complex shapes and, according to the authors, the process has been optimized to produce models very quickly. This is a great advancement compared with the technologies used to produce other equivalent models. This model proves to be equivalent to the most advanced model used to date (skin xenograft mouse model). The human skin xenograft mouse model requires complex surgical techniques and has the practical and ethical limitations associated with the use of animals. However, the Vessel-on-chip model is free of ethical concerns, can be produced quickly, and allows to precisely tune the vessel's geometry and to perform higher resolution microscopy. Both models were comparable in terms of the hallmarks defining the disease, suggesting that the presented model can be an effective replacement of the animal use in this area. In addition, the Vessel-on-Chip allows to perform microscopy with higher resolution and ease, which can in turn allow more complex and precise image-based analysis.

      A limitation of this model is that it lacks the multicellularity that characterizes other similar models, which could be useful to research disease more extensively. However, the authors discuss the possibilities of adding other cells to the model, for example, fibroblasts. It is also not clear whether the technology presented in the current paper can be adopted by other labs. The methodology is complex and requires specialized equipment and personnel, which might hinder widespread utilization of this model by researchers in the field.

      This manuscript will be of interest for a specialized audience focusing on the development of microphysiological models. The technology presented here can be of great interest to researchers whose main area of interest is the endothelium and the blood vessels, for example, researchers on the study of systemic infections, atherosclerosis, angiogenesis, etc. This manuscript can have great applications for a broad audience and it can present an opportunity to begin collaborations, aimed at answering diverse research questions with the same model.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript Pinon et al. describe the development of a 3D model of human vasculature within a microchip to study Neisseria meningitidis (Nm)- host interactions and validate it through its comparison to the current gold-standard model consisting of human skin engrafted onto a mouse. There is a pressing need for robust biomimetic models with which to study Nm-host interactions because Nm is a human-specific pathogen for which research has been primarily limited to simple 2D human cell culture assays. Their investigation relies primarily on data derived from microscopy and its quantitative analysis, which support the authors' goal of validating their Vessel-on-Chip (VOC) as a useful tool for studying vascular infections by Nm, and by extension, other pathogens associated with blood vessels.

      Strengths:

      • Introduces a novel human in vitro system that promotes control of experimental variables and permits greater quantitative analysis than previous models

      • The VOC model is validated by direct comparison to the state-of-the-art human skin graft on mouse model

      • The authors make significant efforts to quantify, model, and statistically analyze their data

      • The laser ablation approach permits defining custom vascular architecture

      • The VOC model permits the addition and/or alteration of cell types and microbes added to the model

      • The VOC model permits the establishment of an endothelium developed by shear stress and active infusion of reagents into the system

      Weaknesses:

      • The work presented here is mostly descriptive, with little new information that is learned about the biology of Nm or endothelial cells. However, the goal of this study was to establish the VOC model, and the validation presented here is necessary for follow-on studies on Nm pathogenesis and host response.

      • The VOC model contains one cell type, human umbilical cord vascular endothelial cells (HUVECs), while true vasculature contains a number of other cell types that associate with and affect the endothelium, such as smooth muscle cells, pericytes, and components of the immune system. These and other shortcomings of the VOC model as it currently stands warrant additional discussion.

      Impact:

      The VOC model presented by Pinon et al. is an exciting advancement in the set of tools available to study human pathogens interacting with the vasculature. This manuscript focuses on validating the model, and as such sets the foundation for impactful research in the future. Of particular value is the photoablation technique that permits the custom design of vascular architecture without the use of artificial scaffolding structures described in previously published works.

    5. Author response:

      Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility, and clarity):

      The work by Pinon et al describes the generation of a microvascular model to study Neisseria meningitidis interactions with blood vessels. The model uses a novel and relatively high throughput fabrication method that allows full control over the geometry of the vessels. The model is well characterized. The authors then study different aspects of Neisseria-endothelial interactions and benchmark the bacterial infection model against the best disease model available, a human skin xenograft mouse model, which is one of the great strengths of the paper. The authors show that Neisseria binds to the 3D model in a geometry similar to that in the animal xenograft model, induces an increase in permeability shortly after bacterial perfusion, and induces endothelial cytoskeleton rearrangements. Finally, the authors show neutrophil recruitment to bacterial microcolonies and phagocytosis of Neisseria. The article is overall well written, and it is a great advancement in the bioengineering and sepsis infection field, and I only have a few major comments and some minor ones.

      Major comments:

      Infection-on-chip. I would recommend that the authors change the terminology of "infection on chip" to better reflect their work. The term is vague and it decreases novelty, as there are multiple infection-on-chip models that recapitulate other infections (recently reviewed in https://doi.org/10.1038/s41564-024-01645-6) including Ebola, SARS-CoV-2, Plasmodium and Candida. Maybe the term "sepsis on chip" would be more specific and better exemplify the work and its novelty. Also, I would suggest that the authors carefully review the text and consider when they use VoC or the current term IoC, as at present they are sometimes used interchangeably, with VoC being used occasionally in bacteria-perfused experiments.

      We thank Reviewer #1 for this suggestion. Indeed, we have chosen to replace the term "Infection-on-Chip" by "infected Vessel-on-chip" to avoid any confusion in the title and the text. Also, we have removed all the terms "IoC" which referred to "Infection-on-Chip" and replaced with "VoC" for "Vessel-on-Chip". We think these terms will improve the clarity of the main text.

      Author response image 1.

      F-actin (red) and ezrin (yellow) staining after 3h of infection with N. meningitidis (green) in 2D (top) and 3D (bottom) vessel-on-chip models.

      Fig 3 and Supplementary 3: Permeability. The authors suggest that early 3h infection with Neisseria does not show an increase in vascular permeability in the animal model, contrary to their findings in the 3D in vitro model. However, they show a non-significant increase in permeability of 70 kDa Dextran in the animal xenograft early infection. This seems to indicate that, had the experiment been done with a lower molecular weight tracer, significant increases in permeability could have been detected. I would suggest doing this experiment, which could capture early events in vascular disruption.

      Comparing permeability under healthy and infected conditions using Dextran smaller than 70 kDa is challenging. Previous research (1) has shown that molecules below 70 kDa already diffuse freely in healthy tissue. Given this high baseline diffusion, we believe that no significant difference would be observed before and after N. meningitidis infection, and these experiments were not carried out. As discussed in the manuscript, bacteria-induced permeability in mice occurs at later time points, 16h post infection, as shown previously (2). This difference between the xenograft model and the chip likely reflects the absence in the chip of various cell types present in the tissue parenchyma.

      The authors show the formation of an actin honeycomb structure beneath the bacterial microcolonies. This only occurred in 65% of the microcolonies. Is this result similar to in vitro 2D endothelial cultures in static and under flow? Also, the group has shown in the past positive staining of other cytoskeletal proteins, such as ezrin in the ERM complex. Does this also occur in the 3D system?

      We thank the Reviewer #1 for this suggestion.

      • Following this recommendation, we imaged monolayers of endothelial cells in the flat regions of the chip (the two lateral channels) using the same microscopy conditions (i.e., Obj. 40X N.A. 1.05) that were used to detect honeycomb structures in the 3D vessels in vitro. We showed that more than 56% of infected cells present these honeycomb structures in 2D, which is 13% less than in 3D; this difference is not significant given the distributions of both populations. Thus, we conclude that under both in vitro conditions, 2D and 3D, the number of infected cells exhibiting cortical plaques is similar. We have added the graph and the confocal images in Figure S4B and lines 418-419 of the revised manuscript.

      • We recently performed staining of ezrin in the chip and imaged both the 3D and 2D regions. Although ezrin staining was visible in 3D (Fig. 1 of this response), it was not as obvious as other markers under these infected conditions and we did not include it in the main text. Interpretation of this result is not straightforward because, for instance, the substrate of the cells is different, and it would require further studies on the behaviour of ERM proteins in these different contexts.

      One of the most novel things of the manuscript is the use of a relatively quick photoablation system. I would suggest that the authors add a more extensive description of the protocol in methods. Could this technique be applied in other laboratories? If this is a major limitation, it should be listed in the discussion.

      Following the Reviewer’s comment, we introduced more detailed explanations regarding the photoablation:

      • L157-163 (Results): "Briefly, the chosen design is digitalized into a list of positions to ablate. A pulsed UV-LASER beam is injected into the microscope and shaped to cover the back aperture of the objective. The laser is then focused on each position that needs ablation. After introducing endothelial cells (HUVEC) in the carved regions,…"

      • L512-516 (Discussion): "The speed capabilities drastically improve with the pulsing repetition rate. Given that our laser source emits pulses at 10kHz, as compared to other photoablation lasers with repetitions around 100 Hz, our solution could potentially gain a factor of 100."

      • L1082-1087 (Materials and Methods): "…, and imported in a python code. The control of the various elements is embedded and checked for this specific set of hardware. The code is available upon request." Adding these three paragraphs gives more details on how photoablation works thus improving the manuscript.
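      The workflow quoted above (a design digitalized into a list of positions, then ablated position by position with a pulsed laser) can be illustrated with a minimal Python sketch. This is not the authors' code, which is stated to be available upon request. The function names, the grid-based design, the pixel size, and the number of pulses per position are all hypothetical, and the time estimate ignores stage travel and any other overhead. It only shows why, under these assumptions, ablation time scales inversely with the pulse repetition rate.

```python
def design_to_positions(mask, pixel_size_um):
    """Turn a 2D grid of booleans (True = ablate) into a list of (x, y)
    positions in micrometres. The grid representation is an assumption."""
    return [
        (col * pixel_size_um, row * pixel_size_um)
        for row, line in enumerate(mask)
        for col, marked in enumerate(line)
        if marked
    ]


def ablation_time_s(n_positions, pulses_per_position, repetition_rate_hz):
    """Idealized ablation time: total number of pulses divided by pulse rate.
    Stage movement and settling times are deliberately ignored."""
    return n_positions * pulses_per_position / repetition_rate_hz


# Hypothetical design: a straight channel, 3 pixels wide, on a 20 x 100 grid.
mask = [[9 <= row < 12 for _ in range(100)] for row in range(20)]
positions = design_to_positions(mask, pixel_size_um=2.0)

# Same design and pulse count, two repetition rates quoted in the response.
t_10khz = ablation_time_s(len(positions), pulses_per_position=10,
                          repetition_rate_hz=10_000)
t_100hz = ablation_time_s(len(positions), pulses_per_position=10,
                          repetition_rate_hz=100)
speedup = t_100hz / t_10khz  # ratio of the two repetition rates
```

      In this idealized picture the speed-up equals the ratio of the repetition rates, which matches the factor of 100 quoted above for 10 kHz versus 100 Hz. A real system would gain less if stage movement dominates the total time.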

      Minor comments:

      Supplementary Fig 2. The reference to subpanels H and I is swapped.

      The references to subpanels H and I have been corrected in the revised version.

      Line 203: I would suggest to delete this sentence. Although a strength of the submitted paper is the direct comparison of the VoC model with the animal model to better replicate Neisseria infection, a direct comparison with animal permeability is not needed in all vascular engineering papers, as vascular permeability measurements in animals have been well established in the past.

      The sentence "While previously developed VoC platforms aimed at replicating physiological permeability properties, they often lack direct comparisons with in vivo values." has been removed from the revised text.

      Fig 3: Bacteria binding experiments. I would suggest the addition of more methodological information in the main results text to guarantee a good interpretation of the experiment. First, it would be better that wall shear stress rather than flow rate is described in the main text, as flow rate is dependent on the geometry of the vessel being used. Second, how long was the perfusion of Neisseria in the binding experiment performed to quantify colony doubling or elongation? As per figure 1C, I would guess that it was 100 min, but it would be better if this information were given directly to the readers.

      We thank Reviewer #1 for these two suggestions that will improve the text clarity (e.g., L316). (i) Indeed, we now express the flow rate in terms of shear stress. (ii) Also, we have normalized the quantification of the colony doubling time according to the first time-point where a single bacterium is attached to the vessel wall. Thus, early-adhering bacteria are described by a longer curve and late-adhering bacteria by a shorter curve. In total, the experiment lasted for 3 hours (modifications appear in L318 and L321-326).
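      The reviewer's point that flow rate depends on vessel geometry while wall shear stress does not can be made concrete with a short Python sketch. It uses the standard Poiseuille relation for steady laminar flow of a Newtonian fluid in a straight cylindrical tube, τ = 4μQ/(πr³). The circular cross-section, the 50 µm radius, and the medium viscosity of 0.7 mPa·s are assumptions chosen for illustration and are not values taken from the manuscript; only the 1 µl/min flow rate lies in the range mentioned elsewhere in this review.

```python
import math


def wall_shear_stress_pa(flow_ul_per_min, radius_um, viscosity_pa_s):
    """Wall shear stress (Pa) for Poiseuille flow in a cylindrical tube:
    tau = 4 * mu * Q / (pi * r**3), with Q in m^3/s and r in m."""
    flow_m3_per_s = flow_ul_per_min * 1e-9 / 60.0  # 1 microlitre = 1e-9 m^3
    radius_m = radius_um * 1e-6
    return 4.0 * viscosity_pa_s * flow_m3_per_s / (math.pi * radius_m ** 3)


# Hypothetical vessel: radius 50 um, medium viscosity 0.7 mPa.s, flow 1 ul/min.
tau = wall_shear_stress_pa(flow_ul_per_min=1.0, radius_um=50.0,
                           viscosity_pa_s=0.7e-3)

# The same flow rate in a vessel of twice the radius gives 1/8 the shear stress.
tau_wide = wall_shear_stress_pa(flow_ul_per_min=1.0, radius_um=100.0,
                                viscosity_pa_s=0.7e-3)
```

      Because the shear stress falls with the cube of the radius, the same pump setting produces very different mechanical conditions in vessels of different sizes, which is why reporting shear stress makes experiments comparable across geometries.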

      Fig 4: The honeycomb structure is not visible in the 3D rendering of panel D. I would recommend to show the actin staining in the absence of Neisseria staining as well.

      Following this suggestion, a zoom of the 3D rendering of the cortical plaque without the colony has been added to Figure 4 of the revised manuscript.

      Line 421: E-selectin is referred as CD62E in this sentence. I would suggest to use the same terminology everywhere.

      We have replaced the "CD62E" term with "E-selectin" to improve clarity.

      Line 508: "This difference is most likely associated with the presence of other cell types in the in vivo tissues and the onset of intravascular coagulation". Do the authors refer to the presence of perivascular cells, pericytes or fibroblasts? If so, it could be good to mention them, as well as those future iterations of the model could include the presence of these cell types.

      By "other cell types", we refer to pericytes (3), fibroblasts (4), and perivascular macrophages (5), which surround endothelial cells and contribute to vessel stability. The main text was modified to include this information (Lines 548 and 555-570) and to discuss their potential roles during infection.

      Discussion: The discussion covers very well the advantages of the model over in vitro 2D endothelial models and the animal xenograft but fails to include limitations. This would include the choice of HUVEC cells, an umbilical vein cell line to study microcirculation, the lack of perivascular cells or limitations on the fabrication technique regarding application in other labs (if any).

      We thank Reviewer #1 for this suggestion. Indeed, our manuscript did not sufficiently explain the limitations of the model, and adding them improves the text:

      • The perspectives of our model include introducing perivascular cells surrounding the vessel and fibroblasts into the collagen gel as discussed previously and added in the discussion part (L555-570).

      • Our choice for HUVEC cells focused on recapitulating the characteristics of venules that respect key features such as the overexpression of CD62E and adhesion of neutrophils during inflammation. Using microvascular endothelial cells originating from different tissues would be very interesting. This possibility is now mentioned in the discussion lines 567-568.

      • Photoablation is a homemade fabrication technique that can be implemented in any lab harboring an epifluorescence microscope. This method is described in more detail in the revised manuscript (L1085-1087).

      Line 576: The authors state that the model could be applied to other systemic infections but failed to mention that some infections have already been modelled in 3D bioengineered vascular models (examples found in https://doi.org/10.1038/s41564-024-01645-6). This includes a capillary photoablated vascular model to study malaria (DOI: 10.1126/sciadv.aay724).

      These two important references have been introduced in the main text (L84, 647, 648).

      Line 1213: Is the 6M neutrophil solution in 10 µl under flow? Also, I would suggest rewriting this sentence in the following line: "After, the flow has been then added to the system at 0.7-1 µl/min."

      We now specified that neutrophils are circulated in the chip under flow conditions, lines 1321-1322.

      Significance

      The manuscript is comprehensive, complete and represents the first bioengineered model of sepsis. One of the major strengths is the careful characterization and benchmarking against the animal xenograft model. Its main limitation is the brief description of the photoablation methodology, and more clarity is needed in the description of bacteria perfusion experiments, given their complexity. The manuscript will be of interest for the general infection community and to the tissue engineering community if more details on fabrication methods are included. My expertise is on infection bioengineered models.

      Reviewer #2 (Evidence, reproducibility, and clarity):

      Summary:

      The authors develop a Vessel-on-Chip model, which has geometrical and physical properties similar to the murine vessels used in the study of systemic infections. The vessel was created via highly controllable laser photoablation in a collagen matrix, subsequent seeding of human endothelial cells and flow perfusion to induce mechanical cues. This vessel could be infected with Neisseria meningitidis, as a model of systemic infection. In this model, microcolony formation and dynamics, and effects on the host were very similar to those described for the human skin xenograft mouse, which is the current gold standard for these studies, and were consistent with observations made in patients. The model could also recapitulate the neutrophil response upon N. meningitidis systemic infection.

      Major comments:

      I have no major comments. The claims and the conclusions are supported by the data, the methods are properly presented and the data is analyzed adequately. Furthermore, I would like to propose an optional experiment that could improve the manuscript. In the discussion it is stated that the vascular geometry might contribute to bacterial colonization in areas of lower velocity. It would be interesting to recapitulate this experimentally. It is of course optional but it would be of great interest, since this is something that can only be proven in the organ-on-chip (where flow speed can be tuned) and not as much in animal models. Besides, it would increase impact, demonstrating the superiority of the chip in this area rather than proving to be equal to current models.

      We have conducted additional experiments on infection in different vascular geometries and have now added these results to Figure 3/S3 and lines 288-305. We compared shear stress levels, as determined by Comsol simulation, with experimentally determined bacterial adhesion sites. In the conditions used, the range of shear generated by the tested geometries does not appear to change the efficiency of bacterial adhesion. These results are consistent with a previous study from our group, which shows that in this range of shear stresses the effect on adhesion is limited (6). Furthermore, qualitative observations in the animal model indicate that bacteria do not have an obvious preference in terms of binding site.

      Minor comments:

      I have a series of suggestions which, in my opinion, would improve the discussion. They are further elaborated in the following section, in the context of the limitations.

      • How to recapitulate the vessels in the context of a specific organ or tissue? If the pathogen is often found in the luminal space of other organs after disseminating from the blood, how can this process be recapitulated with this model, if at all?

      For reasons that are not fully understood, postmortem histological studies reveal bacteria only inside blood vessels but rarely, if ever, in the organ parenchyma. The presence of intravascular bacteria could nevertheless alter cells in the tissue parenchyma. The notable exception is the brain, where bacteria exit the vascular lumen to access the cerebrospinal fluid. The chip we describe is fully adapted to develop a blood brain barrier model and more specific organ environments. This implies the addition of more cell types in the hydrogel. A paragraph on this topic has been added (Lines 548 and 552-570).

      • Similarly, could other immune responses related to systemic infection be recapitulated? The authors could discuss the potential of including other immune cells that might be found in the interstitial space, for example.

      This important discussion point has been added to the manuscript (L623-636). As suggested by Reviewer #2, other immune cells respond to N. meningitidis and can be explored using our model. For instance, macrophages and dendritic cells are activated upon N. meningitidis infection, eliminate the bacteria through phagocytosis, and produce pro-inflammatory cytokines and chemokines, potentially activating lymphocytes (7). Such an immune response, although complex, would be interesting to study in our model, as skin-xenograft mice are deprived of B and T lymphocytes to ensure acceptance of human skin grafts.

      • A minor correction: in line 467 it should probably be "aspects" instead of "aspect", and the authors could consider rephrasing that sentence slightly for increased clarity.

      We have corrected the sentence with "we demonstrated that our VoC strongly replicates key aspects of the in vivo human skin xenograft mouse model, the gold standard for studying meningococcal disease under physiological conditions." in lines 499-503.

      Strengths and limitations

      The most important strength of this manuscript is the technology they developed to build this model, which is impressive and very innovative. The Vessel-on-Chip can be tuned to acquire complex shapes and, according to the authors, the process has been optimized to produce models very quickly. This is a great advancement compared with the technologies used to produce other equivalent models. This model proves to be equivalent to the most advanced model used to date, but allows microscopy to be performed with higher resolution and ease, which can in turn allow more complex and precise image-based analysis. However, the authors do not seem to present any new mechanistic insights obtained using this model. All the findings obtained in the infection-on-chip demonstrate that the model is equivalent to the human skin xenograft mouse model, and can offer superior resolution for microscopy. However, the advantages of the model do not seem to be exploited to obtain more insights on the pathogenicity mechanisms of N. meningitidis, host-pathogen interactions or potential applications in the discovery of potential treatments. For example, experiments to elucidate the role of certain N. meningitidis genes on infection could enrich the manuscript and prove the superiority of the model. However, I understand these experiments are time-consuming and out of the scope of the current manuscript. In addition, the model lacks the multicellularity that characterizes other similar models. The authors mention that the pathogen can be found in the luminal space of several organs; however, this luminal space has not been recapitulated in the model. Even though this would be a new project, it would be interesting for the authors to hypothesize about the possibilities of combining this model with other organ models. The inclusion of circulating neutrophils is a great asset; however, it would also be interesting to hypothesize about how to recapitulate other immune responses related to systemic infection.

      We thank Reviewer #2 for his/her comment on the strengths and limitations of our work. The difficulty is that our study opens many future research directions and applications. We hope that the work serves as the basis for many future studies, but only a limited set of experiments can be addressed in a single manuscript.

      • Experiments investigating the role of N. meningitidis genes require significant optimization of the system. Multiplexing is a potential avenue for future development, which would allow the testing of many mutants. The fast photoablation approach is particularly amenable to such adaptation.

      • Cells and bacteria inside the chambers could be isolated and analyzed at the transcriptomic level or by flow cytometry. This would imply optimizing a protocol for collecting cells from the device via collagenase digestion, for instance. This type of approach would also benefit from multiplexing to enhance the number of cells.

      • As mentioned above, the revised manuscript discusses the multicellular capabilities of our model, including the integration of additional immune cells and potential connections to other organ systems. We believe that these approaches are feasible and valuable for studying various aspects of N. meningitidis infection.

      Advance

      The most important advance of this manuscript is technical: the development of a model that proves to be equivalent to the most complex model used to date to study meningococcal systemic infections. The human skin xenograft mouse model requires complex surgical techniques and has the practical and ethical limitations associated with the use of animals. However, the Infection-on-chip model is completely in vitro, can be produced quickly, and allows the vessel’s geometry to be tuned precisely and microscopy to be performed at higher resolution. Both models were comparable in terms of the hallmarks defining the disease, suggesting that the presented model can be an effective replacement for animal use in this area.

      Other vessel-on-chip models can recapitulate an endothelial barrier in a tube-like morphology, but do not recapitulate other complex geometries, that are more physiologically relevant and could impact infection (in addition to other non-infectious diseases). However, in the manuscript it is not clear whether the different morphologies are necessary to study or recapitulate N. meningitidis infection, or if the tubular morphologies achieved in other similar models would suffice.

      Audience

      This manuscript might be of interest for a specialized audience focusing on the development of microphysiological models. The technology presented here can be of great interest to researchers whose main area of interest is the endothelium and the blood vessels, for example, researchers on the study of systemic infections, atherosclerosis, angiogenesis, etc. Thus, the tool presented (vessel-on-chip) can have great applications for a broad audience. However, even when the method might be faster and easier to use than other equivalent methods, it could still be difficult to implement in another laboratory, especially if it lacks expertise in bioengineering. Therefore, the method could be more of interest for laboratories with expertise in bioengineering looking to expand or optimize their toolbox. Alternatively, this paper presents itself as an opportunity to begin collaborations, since the model could be used to test other pathogens or conditions.

      Field of expertise:

      Infection biology, organ-on-chip, fungal pathogens.

      I lack the expertise to evaluate the image-based analysis.

      References

      (1) Gyohei Egawa, Satoshi Nakamizo, Yohei Natsuaki, Hiromi Doi, Yoshiki Miyachi, and Kenji Kabashima. Intravital analysis of vascular permeability in mice using two-photon microscopy. Scientific Reports, 3(1):1932, Jun 2013. ISSN 2045-2322. doi: 10.1038/srep01932.

      (2) Valeria Manriquez, Pierre Nivoit, Tomas Urbina, Hebert Echenique-Rivera, Keira Melican, Marie-Paule Fernandez-Gerlinger, Patricia Flamant, Taliah Schmitt, Patrick Bruneval, Dorian Obino, and Guillaume Duménil. Colonization of dermal arterioles by Neisseria meningitidis provides a safe haven from neutrophils. Nature Communications, 12(1):4547, Jul 2021. ISSN 2041-1723. doi: 10.1038/s41467-021-24797-z.

      (3) Mats Hellström, Holger Gerhardt, Mattias Kalén, Xuri Li, Ulf Eriksson, Hartwig Wolburg, and Christer Betsholtz. Lack of pericytes leads to endothelial hyperplasia and abnormal vascular morphogenesis. Journal of Cell Biology, 153(3):543–554, Apr 2001. ISSN 0021-9525. doi: 10.1083/jcb.153.3.543.

      (4) Arsheen M. Rajan, Roger C. Ma, Katrinka M. Kocha, Dan J. Zhang, and Peng Huang. Dual function of perivascular fibroblasts in vascular stabilization in zebrafish. PLOS Genetics, 16(10):1–31, 10 2020. doi: 10.1371/journal.pgen.1008800.

      (5) Huanhuan He, Julia J. Mack, Esra Güç, Carmen M. Warren, Mario Leonardo Squadrito, Witold W. Kilarski, Caroline Baer, Ryan D. Freshman, Austin I. McDonald, Safiyyah Ziyad, Melody A. Swartz, Michele De Palma, and M. Luisa Iruela-Arispe. Perivascular macrophages limit permeability. Arteriosclerosis, Thrombosis, and Vascular Biology, 36(11):2203–2212, 2016. doi: 10.1161/ATVBAHA.116.307592.

      (6) Emilie Mairey, Auguste Genovesio, Emmanuel Donnadieu, Christine Bernard, Francis Jaubert, Elisabeth Pinard, Jacques Seylaz, Jean-Christophe Olivo-Marin, Xavier Nassif, and Guillaume Dumenil. Cerebral microcirculation shear stress levels determine Neisseria meningitidis attachment sites along the blood–brain barrier. Journal of Experimental Medicine, 203(8):1939–1950, 07 2006. ISSN 0022-1007. doi: 10.1084/jem.20060482.

      (7) Riya Joshi and Sunil D. Saroj. Survival and evasion of Neisseria meningitidis from macrophages. Medicine in Microecology, 17:100087, 2023. ISSN 2590-0978. doi: 10.1016/j.medmic.2023.100087.

    1. eLife Assessment

      Yabaji et al. report a fundamental study highlighting the mechanistic basis of susceptibility to TB infection via the sst1 locus, which was shown to involve increased IFN and Myc production causing the down-regulation of anti-oxidant defence genes and chronic lipidation. Ultimately, lipid peroxidation may underlie infectivity and macrophage dysfunction. Overall, the data presented are compelling, supported by a well-designed multi-omics approach, and the findings will be of broad interest to researchers investigating the molecular mechanisms of TB infection.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      In this report, Yabaji et al describe studies designed to address the mechanism behind the TB susceptibility gene sst1. This locus is known to affect expression of IFN and synergizes with Myc to potentiate infectivity. Using a variety of molecular expression and imaging techniques, the authors demonstrate that mice harboring an sst1 transgene (compared to B6 controls) are highly susceptible to TB infection via a mechanism involving loss of antioxidant defense systems, the down regulation of key antioxidant genes and ferritin controlling intracellular iron levels. The combination of increased iron plus decreased antioxidant defense systems in turn increases lipid peroxidation and downstream sequelae. Inhibition of peroxidation diminishes infectivity and increases ferritin levels. Furthermore, the authors demonstrate that Myc activation potentiates this process and that down regulation of NRF2 antioxidant defenses accompanies potentiated infectivity. Increased peroxidation products (4-HNE) may activate the ASK1/JNK system leading to IFNb superinduction and diminished macrophage viability, thereby diminishing ability to withstand TB infection. Extending these findings, additional mouse models plus some work in humans support the peroxidation hypothesis. Overall, the work is significant for it introduces a molecular basis for TB infectivity and presents a potential novel therapeutic opportunity.

      Strengths:

      (1) Strengths of this study include a multi-omic analysis of infectivity combining gene expression analysis with biochemical and cell biological evaluation.

      (2) Novel identification of an iron-catalyzed lipid peroxidation based mechanism for why the sst1 locus is linked to TB infection.

      (3) Parallels to human biology are included via analysis of Myc upregulation in peripheral blood from patients.

      (4) Appropriate statistical analysis

      Weaknesses:

      (1) Lipid peroxidation is a broad phenotype process and the authors homed in on 4-HNE dependent processes as a likely mechanism because they can measure 4-HNE conjugated proteins. However, lipid peroxidation is a complex phenomenon and the work presented herein is largely descriptive.

      (2) The authors continually refer to increased 4-HNE, but they do not measure this 9-carbon lipid; they actually measure 4-HNE conjugated proteins immunochemically.

      (3) The authors do not distinguish between increased protein-HNE adducts and increased membrane peroxidation (or both) as mechanistically linked to infectivity.

    3. Author response:

      General Statements

      We are grateful for the reviewers’ constructive comments and criticisms and have thoroughly addressed all major and minor comments in the revised manuscript.

      Summary of new data.

      We have performed the following additional experiments to support our concept:

      (1) The kinetics of ROS production in B6 and B6.Sst1S macrophages after TNF stimulation (Fig. 3I and J, Suppl. Fig. 3G);

      (2) Time course of stress kinase activation (Fig.3K) that clearly demonstrated the persistent stress kinase (phospho-ASK1 and phospho-cJUN) activation exclusively in the B6.Sst1S macrophages;

      (3) New Fig.4 C-E panels include comparisons of the B6 and B6.Sst1S macrophage responses to TNF and effects of IFNAR1 blockade in both backgrounds.

      (4) We performed new experiments demonstrating that the synthesis of lipid peroxidation products (LPO) occurs in TNF-stimulated macrophages earlier than the IFNβ super-induction (Suppl.Fig.4A and B).

      (5) We demonstrated that the IFNAR1 blockade 12, 24 and 32 h after TNF stimulation still reduced the accumulation of LPO product (4-HNE) in TNF-stimulated B6.Sst1S BMDMs (Suppl.Fig.4 E-G).

      (6) We added comparison of cMyc expression between the wild type B6 and B6.Sst1S BMDMs during TNF stimulation for 6-24 h (Fig.5I-J).

      (7) New data comparing 4-HNE levels in Mtb-infected B6 wild type and B6.Sst1S macrophages and quantification of replicating Mtb was added (Fig.6B, Suppl.Fig.7C and D).

      (8) In vivo data described in Fig.7 was thoroughly revised and new data was included. We demonstrated increased 4-HNE loads in multibacillary lesions (Fig.7A, Suppl. Fig.9A) and the 4-HNE accumulation in CD11b+ myeloid cells (Fig.7B and Suppl.Fig.9B). We demonstrated that the Ifnb-expressing cells are activated iNOS+ macrophages (Fig.7D and Suppl.Fig.13A). Using new fluorescent multiplex IHC, we have shown that stress markers phospho-cJun and Chac1 in TB lesions are expressed by Ifnb- and iNOS-expressing macrophages (Fig.7E and Suppl.Fig.13D-F).

      (9) We performed an additional experiment to demonstrate that naïve (non-BCG vaccinated) lymphocytes did not improve Mtb control by Mtb-infected macrophages, in agreement with previously published data (Suppl.Fig.7H).

      Summary of updates

      Following the reviewers’ requests, we updated figures to include isotype control antibodies, effects of inhibitors on non-stimulated cells, positive and negative controls for labile iron pool, additional images of 4-HNE and live/dead cell staining.

      Isotype controls for IFNAR1 blockade were included in Fig.3M, Fig.4C-E, Fig.6L-M, Suppl.Fig.4F-G and 7I.

      Positive and negative controls for labile iron pool measurements were added to Fig.3E, Fig.5D, Suppl.Fig.3B.

      Cell death staining images were added to Suppl.Fig.3H.

      Co-staining of 4-HNE with tubulin was added to Suppl.Fig.3A.

      High magnification images for Figure 7 were added in Suppl.Fig.8 to demonstrate paucibacillary and multibacillary image classification.

      Single-channel color images for individual markers were provided in Fig.7E and Suppl.Fig.13B-F.

      Inhibitor effects on non-stimulated cells were included in Fig.5 D-H, Suppl.Fig.6A and B. Titration of CSF1R inhibitors for non-toxic concentration determination is included in Suppl.Fig.6D.

      In addition, we updated the figure legends in the revised manuscript to include more details about the experiments. We also clarified our conclusions in the Discussion. Responses to every major and minor comment of the reviewers are provided below.

      Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      The study by Yabaji et al. examines macrophage phenotypes in B6.Sst1S mice, a mouse strain with increased susceptibility to M. tuberculosis infection that develops necrotic lung lesions. Extending previous work, the authors specifically focus on delineating the molecular mechanisms driving aberrant oxidative stress in TNF-activated B6.Sst1S macrophages that has been associated with impaired control of M. tuberculosis. The authors use scRNAseq of bone marrow-derived macrophages to further characterize distinctions between B6.Sst1S and control macrophages and ascribe distinct trajectories upon TNF stimulation. Combined with results using inhibitory antibodies and small molecule inhibitors in in vitro experimentation, the authors propose that TNF-induced protracted c-Myc expression in B6.Sst1S macrophages disables the cellular defense against oxidative stress, which promotes intracellular accumulation of lipid peroxidation products, fueled at least in part by overexpression of type I IFNs by these cells. Using lung tissue sections from M. tuberculosis-infected B6.Sst1S mice, the authors suggest that the presence of a greater number of cells with lipid peroxidation products in lung lesions with high counts of stained M. tuberculosis is indicative of progressive loss of host control due to the TNF-induced dysregulation of macrophage responses to oxidative stress. In patients with active tuberculosis disease, the authors suggest that peripheral blood gene expression indicative of increased Myc activity was associated with treatment failure.

      Major comments

      The authors describe differences in protein expression, phosphorylation or binding when referring to Fig 2A-C, 2G, 3D, 5B, 5C. However, such differences are not easily apparent or very subtle and, in some cases, confounded by differences in resting cells (e.g. pASK1 Fig 3L; c-Myc Fig 5B) as well as analyses across separate gels/blots (e.g. Fig 3K, Fig 5B). Quantitative analyses across different independent experiments with adequate statistical analyses are required to strengthen the associated conclusions.

      We updated our Western blots as follows:

      (1) Densitometry of normalized bands is included above each lane (Fig.2A-C; Fig.3C-D and 3K; Fig.4A-B; Fig.5B,C,I,J). New data in Fig.3K is added to highlight differences between B6 and B6.Sst1S at individual timepoints after TNF stimulation. In Fig.5I we added new data comparing Myc levels in B6 and B6.Sst1S with and without JNK inhibitor and updated the results accordingly. New Fig.3K clearly demonstrates the persistent activation of p-cJun and pAsk1 at 24 and 36 h of TNF stimulation. In Fig.5B we clearly demonstrate that Myc levels were higher in B6.Sst1S after 12 h of TNF stimulation. At 6 h, however, the basal Myc levels are consistently higher in B6.Sst1S, and the induction by TNF is similar (1.6-fold) in both backgrounds. We noted this in the text.

      (2) A representative experiment is shown in individual panels and the corresponding figure legend contains information on number of biological repeats. Each Western blot was repeated 2 – 4 times.
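For readers less familiar with the term, "densitometry of normalized bands" generally means dividing each target band intensity by its loading control and expressing the result relative to a reference lane. The sketch below illustrates only that arithmetic. The function name, the example intensities, and the choice of the first lane as reference are hypothetical and are not taken from the manuscript.

```python
def normalized_densitometry(target, loading, ref_lane=0):
    """Return loading-normalized band intensities relative to a reference lane.

    target   -- raw intensities of the protein of interest, one per lane
    loading  -- raw intensities of the loading control, one per lane
    ref_lane -- index of the lane that is set to 1.0 (an assumption here)
    """
    if len(target) != len(loading):
        raise ValueError("target and loading must have one value per lane")
    # Step 1: correct each lane for the amount of protein loaded.
    ratios = [t / l for t, l in zip(target, loading)]
    # Step 2: express every lane as a fold change over the reference lane.
    ref = ratios[ref_lane]
    return [r / ref for r in ratios]


# Hypothetical raw intensities (arbitrary units), one value per lane.
target_band = [100.0, 160.0, 240.0]
loading_band = [50.0, 50.0, 60.0]
fold = normalized_densitometry(target_band, loading_band)
print([round(f, 2) for f in fold])
```

Reporting such values across independent blots, rather than for one representative blot, is what allows the statistical comparison the reviewer asks for.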

      The representative images of fluorescence microscopy in Fig 3H, 4H, 5H, S3C, S3I, S5A, S6A seem to suggest that under some conditions the fluorescence signal is located just around the nucleus rather than absent or diminished from the cytoplasm. It is unclear whether this reflects selective translocation of targets across the cell, morphological changes of macrophages in culture in response to the various treatments, or variations in focal point at which images were acquired. Control images (e.g. cellular actin, DIC) should be included for clarification. If cell morphology changes depending on treatments, how was this accounted for in the quantitative analyses? In addition, negative controls validating specificity of fluorescence signals would be warranted.

      Our conclusion of higher LPO production is based on several parameters: 4-HNE staining, measurements of MDA in cell lysates and oxidized lipids using BODIPY C11. Taken together they demonstrate a significant and reproducible increase in LPO accumulation in TNF-stimulated B6.Sst1S macrophages. This excludes an imaging artefact related to unequal 4-HNE distribution noted by the reviewer. In fact, we also noted that the 4-HNE was spread within the cell body of B6.Sst1S macrophages and confirmed it using co-staining with tubulin, as suggested by the reviewer (new Suppl.Fig.3A). Since low molecular weight LPO products, such as MDA and 4-HNE, traverse cell membranes, it is unlikely that they will be strictly localized to a specific membrane-bound compartment. However, we agree that at lower concentrations, there might be some restricted localization, explaining a visible perinuclear ring of 4-HNE staining in B6 macrophages. This phenomenon may be explained simply by thicker cytoplasm surrounding the nucleus in activated macrophages spread on an adherent plastic surface, or by proximity to specific organelles involved in generation or clearance of LPO products, and definitely warrants further investigation.

      We also included images of non-stimulated cells in Fig.3H, Suppl.Fig.3A and 3E. We used multiple fields for imaging and quantified fluorescence signals (Suppl. Fig.3D and 3F, Suppl.Fig.4G, Suppl.Fig.6A and B).

      We used negative controls without primary antibodies for the initial staining optimization, but did not include it in every experiment.

      To interpret the evaluation on the hierarchy of molecular mechanisms in B6.Sst1S macrophages, comparative analyses with B6 control cells should be included (e.g. Fig 4C-I, Fig 5, Fig 6B, E-M, S6C, S6E-F). This will provide weight to the conclusions that the dysregulated processes are specifically associated with the susceptibility of B6.Sst1S macrophages.

      Understanding the sst1-mediated effects on macrophage activation is the focus of our previously published studies (Bhattacharya et al., JCI, 2021) and this manuscript. The data comparing B6 and B6.Sst1S macrophages are presented in Fig.1, Fig.2, Fig.3, Fig.4, Fig.5A-C, I and J, Fig.6A-C, 6J and corresponding supplemental figures 1, 2, 3, 4A and B, Suppl.Fig.5, Suppl.Fig.6C, Suppl.Fig.7A-D,7F.

      Once we identified the aberrantly activated pathways in the B6.Sst1S, we used specific inhibitors to correct the aberrant response in B6.Sst1S.

      All experiments using inhibitory antibodies require comparison to the effect of a matched isotype control in the same experiment (e.g. Fig 3J, 4F, G, I; 6L, 6M, S3G, S6F).

      Isotype controls for IFNAR1 blockade were included in Fig.3M, Fig.4C-E, Fig.6L-M, Suppl.Fig.4F-G and 7I.

      Experiments using inhibitors require inclusion of an inhibitor-only control to assess inhibitor effects on unstimulated cells (e.g. Fig 4I, 5D-I)

      Inhibitor effects on non-stimulated cells were included in Fig.5 D-H, Suppl.Fig.6A and B.

      Fig 3K and Fig 5J appear to contain the same images for p-c-Jun and b-tubulin blots.

      Fig.3K and 5J partially overlapped but had a different focus: Fig.3K has been updated to reflect the time course of stress kinase activation. Fig.5J is updated (currently Fig.5I and J) to display B6 and B6.Sst1S macrophage data including cMyc and p-cJun levels.

      Data of TNF-treated cells in Fig 3I appear to be replotted in Fig 3J.

      Currently these data are presented in Fig.3L and 3M and have been updated to include comparison of B6 and B6.Sst1S cells (Fig.3L) and effects of inhibitors in Fig.3M.

      It is stated that lungs from 2 mice with paucibacillary and 2 mice with multi-bacillary lesions were analysed. There is contradicting information on whether these tissues were collected at the same time post infection (week 14?) or whether the pauci-bacillary lesions were in lungs collected at earlier time points post infection (see Fig S8A). If the former, how do the authors conclude that multi-bacillary lesions are a progression from paucibacillary lesions and indicative of loss of M. tuberculosis control, especially if only one lesion type is observed in an individual host? If the latter, comparison between lesions will likely be dominated by temporal differences in the immune response to infection.

      In either case, it is relevant to consider density, location, and cellular composition of lesions (see also comments on GeoMx spatial profiling). Is the macrophage number/density per tissue area comparable between pauci-bacillary and multi-bacillary lesions?

      We did not collect lungs at the same time point. As described in greater detail in our preprints (Yabaji et al., https://doi.org/10.1101/2025.02.28.640830 and https://doi.org/10.1101/2023.10.17.562695), pulmonary TB lesions in our model of slow TB progression are heterogeneous between animals at the same timepoint, as observed in human TB patients and other chronic TB animal models. Therefore, we perform analyses of individual TB lesions that are classified by a certified veterinary pathologist in a blinded manner based on their morphology (H&E) and acid-fast staining of the bacteria, as depicted in Suppl.Fig.8. Currently it is impossible to monitor progression of individual lesions in mice. However, in mice TB is a progressive disease, and no healing or recovery from the disease has been observed in our studies or reported in the literature. Therefore, we assumed that paucibacillary lesions preceded the multibacillary ones, and not vice versa, thus reflecting the disease progression. In our opinion, this conclusion most likely reflects the natural course of the disease. However, we edited the text: instead of disease progression, we refer to paucibacillary and multibacillary lesions.

      Does 4HNE staining align with macrophages and if so, is it elevated compared to control mice and driven by TNF in the susceptible vs more resistant mice?

      We performed additional staining and analyses to demonstrate the 4-HNE accumulation in CD11b+ myeloid cells of macrophage morphology. Non-necrotic lesions contain a negligible proportion of neutrophils (Fig.7B, Suppl.Fig.9B). B6 mice do not develop advanced multibacillary TB lesions containing 4-HNE+ cells. Also, 4-HNE staining was localized to TB lesions and was not found in uninvolved lung areas of the infected mice, as shown in Suppl.Fig.9A (left panel).

      It is well established that TNF plays a central role in the formation and maintenance of TB granulomas in humans and in all animal models. Therefore, TNF neutralization would lead to rapid TB progression, rapid Mtb growth and lesion destruction in both B6 and B6.Sst1S genetic backgrounds.

      Pathway analysis of spatial transcriptomic data (Suppl.Fig.11) identified TNF signaling via NFkB among dominant pathways upregulated in multibacillary lesions, suggesting that the 4-HNE accumulation paralleled increased TNF signaling. In addition, in vivo other cytokines, including IFN-I, could activate macrophages and stimulate production of reactive oxygen and nitrogen species and lead to the accumulation of LPO products as shown in this manuscript.

      It would be relevant to state how many independent lesions per host were sampled in both the multiplex IHC as well as the GeoMx data. Can the authors show the selected regions of interest in the tissue overview and in the analyses to appreciate within-host and across-host heterogeneity of lesions. The nature of the spatial transcriptomics platform used is such that the data are derived from tissue areas that contain more than just Iba1+ macrophages. At later stages of infection, the cellular composition of such macrophage-rich areas will be different when compared to lesions earlier in the infection process. Hence, gene expression profiles and differences between tissue regions cannot be attributed to macrophages in this tissue region but are more likely a reflection of a mix of cellular composition and per-cell gene expression.

      We used Iba1 staining to identify macrophages in TB lesions and programmed the GeoMx instrument to collect spatial transcriptomics probes from Iba1+ cells within ROIs. Also, we selected regions of interest (ROI) avoiding necrotic areas (depicted in Suppl.Fig.10). We agree that the Iba1+ macrophage population is heterogeneous: some Iba1+ cells are activated iNOS+ macrophages, others are iNOS-negative (Fig.7C and D, and Suppl.Fig.13A). Multibacillary lesions contain larger areas occupied by activated (iNOS+) macrophages (Fig.7D, Suppl.Fig.13B and 13F). Although the GeoMx spatial transcriptomic platform does not provide single cell resolution, it allowed us to compare populations of Iba1+ cells in paucibacillary and multibacillary TB lesions and to identify a shift in their overall activation pattern.

      It is stated that loss of control of M. tuberculosis in multibacillary lesions was associated with "downregulation of IFNg-inducible genes". If the authors base this on the tissue expression of individual genes, this requires further investigation to support such conclusion (also see comment on GeoMx above). Furthermore, how might this conclusion be compatible with significantly elevated iNOS+ cells (Fig 7D) in multibacillary lesions?

      We demonstrated that Ciita gene expression is specifically induced by IFN-gamma and is suppressed by IFN-I (Fig.6M). The expression of Ciita in paucibacillary lesions suggests the presence of IFN-gamma-activated cells, and its disappearance in multibacillary lesions is consistent with massive activation of the IFN-I pathway (Fig.7C).

      It is appreciated that the human blood signature analyses contain Myc-signatures but the association with treatment failure is not very strong based on the data in Fig 13B and C (Suppl.Fig.15B and C now). The authors indicate that they have no information on disease severity, but it should perhaps not be assumed that treatment failure is indicative of poor host control of the infection. Perhaps independent analyses in separate cohort/data set can add strength and provide additional insights (e.g. PMID: 35841871; PMID: 32451443, PMID: 17205474, PMID: 22872737). In addition, the human data analyses could be strengthened by extension to additional signatures such as IFN, TNF, oxidative stress. Details of the human study design are not very clear and are lacking patient demographics, site of disease, time of blood collection relative to treatment onset, approving ethics committees.

      The X axis of Suppl.Fig.15A represents pre-defined molecular signature gene sets (MSigDB) in the Gene Set Enrichment Analysis (GSEA) database (https://www.gsea-msigdb.org/gsea/msigdb). The Y axis shows the area under the curve (AUC) score for each gene set. The Myc upregulated gene set myc_up was identified among the top gene sets associated with treatment failure using the unbiased ssGSEA algorithm. The upregulation of the Myc pathway in the blood transcriptome associated with TB treatment failure most likely reflects a greater proportion of immature cells in peripheral blood, possibly due to increased myelopoiesis.

      Pathway analysis of the differentially expressed genes revealed that treatment failures were associated with the following pathways relevant to this study: NF-kB Signaling, Flt3 Signaling in Hematopoietic Progenitor Cells (indicative of common myeloid progenitor cell proliferation), SAPK/JNK Signaling and Senescence (indicative of oxidative stress). The upregulation of these pathways in human patients with poor TB treatment outcomes correlates with our findings in TB susceptible mice. The detailed analysis of differentially regulated pathways in human TB patients is beyond the scope of this study and is presented in another manuscript entitled “Tuberculosis risk signatures and differential gene expression predict individuals who fail treatment” by Arthur VanValkenburg et al., submitted for publication.

      Blood collection for PBMC gene expression profiling of TB patients was prior to TB treatment or within the first week of treatment commencement. Boxplot of bootstrapped ssGSEA enrichment AUC scores from several oncogene signatures ranked from lowest to highest AUC score, with myc_up and myc_dn genes highlighted in red.
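To make the notion of a bootstrapped AUC score concrete, here is a minimal sketch under stated assumptions. It takes one enrichment score per patient for a single gene set and a binary outcome, computes AUC as the fraction of (failure, cured) pairs in which the failure sample scores higher (ties count one half), and resamples patients with replacement within each outcome group. The function names, toy scores, resample count and the within-group resampling scheme are all hypothetical. The pipeline actually used for Suppl.Fig.15 is not described in this response.

```python
import random


def auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs where the positive scores higher.

    Ties contribute 0.5, which makes this the rank-based AUC.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def bootstrap_auc(scores_pos, scores_neg, n_boot=1000, seed=0):
    """Return n_boot AUC values from resampling each group with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    values = []
    for _ in range(n_boot):
        pos = [rng.choice(scores_pos) for _ in scores_pos]
        neg = [rng.choice(scores_neg) for _ in scores_neg]
        values.append(auc(pos, neg))
    return values


# Hypothetical per-patient enrichment scores for one gene set.
failure = [0.62, 0.55, 0.71, 0.48]
cured = [0.40, 0.52, 0.35, 0.58, 0.44]

point_estimate = auc(failure, cured)
boot_values = bootstrap_auc(failure, cured, n_boot=200)
```

The spread of `boot_values` is what a boxplot per gene set would summarize, and ranking gene sets by their median AUC gives an ordering of the kind shown in the figure.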

      We agree with the reviewer that not every gene in the myc_up gene set correlates with the treatment outcome. But the association of the gene set is statistically significant, as presented in Suppl.Fig.15B – C.

      We updated the details of the study, including study sites and the ethics committee approval statement and references describing these cohorts.

      Other comments

      It is excellent that the authors provide individual data points. Choosing a colour other than black would increase clarity when black bars are used.

      We followed this useful suggestion and selected consistent color codes for B6 and B6.Sst1S groups to enhance clarity throughout the revised manuscript.

      Error bars are inconsistently depicted as either bi-directional or just unidirectional.

      We used bi-directional error bars in the revised manuscript.

      Fig 1E, G, H - please include a scale to clarify what the heat map is representing.

      We have included the expression key in Fig.1E,G and H and Suppl.Fig.1C and D in the revised version.

      Fig 2K, Fig S10A gene information cannot be deciphered.

      We increased the font in the previous Fig.2K and moved it to the supplement to keep larger fonts (current Suppl.Fig.2G).

      Fig S4A,B please add error bars.

      These data are presented as Suppl.Fig.5 in the revised version. We performed one experiment to test the hypothesis. Because the data indicated no clear increase in transposon small RNAs in the sst1S macrophages, we did not pursue this hypothesis further, and therefore the error bars were not included. However, we decided to include these negative data because they reject a very attractive and plausible hypothesis.

      Please use gene names as per convention (e.g. Ifnb1) to distinguish gene expression from protein expression in figures and text.

      We addressed the comment in the revised manuscript.

      Fig S8B. Contrary to the description of results, there seems to be minimal overlap between the signal for YFP and the Ifnb1 probe. Is the Ifnb1 reporter mouse a legacy reporter? If so, it is worth stating this and including such considerations in the data interpretation.

      The YFP reporter expresses YFP protein under the control of the Ifnb1 promoter. The YFP protein accumulates within the cells, whereas Ifnb protein is rapidly secreted and does not accumulate in the producing cells in appreciable amounts. So YFP is not a lineage tracing reporter, but its accumulation marks the Ifnb1 promoter activity in cells, although the YFP protein half-life is longer than that of the Ifnb1 mRNA, which is rapidly degraded (Witt et al., BioRxiv, 2024; doi:10.1101/2024.08.28.61018). Therefore, there is no precise spatiotemporal coincidence of these readouts.

      Please clarify what is meant by "normal interstitium" ? If the tissue is from uninfected mice, please state clearly.

      In this context we refer to the uninvolved areas of the infected lungs. In every sample we compare uninvolved lung areas and TB lesions of the same animal. We also stained lungs of non-infected mice as additional controls.

      If macrophage cultures underwent media changes every 48h, how was loss of liberated Mtb taken into account especially if differences in cell density/survival were noted? The assessment of M. tuberculosis load by qPCR is not well described. In particular, the method of normalization applied within the experiments (not within the qPCR) here remains unclear, even with reference to the authors' prior publication.

      Our lab has many years of experience working with macrophage monolayers infected with virulent Mtb and uses optimized protocols to avoid cell losses and related artifacts. Recently we published a detailed protocol for this methodology in STAR Protocols (Yabaji et al., 2022; PMID 35310069). In brief, it includes preparation of single cell suspensions of Mtb by filtration to remove clumps, use of low multiplicity of infection, preparation of healthy confluent monolayers and use of nutrient rich culture medium and medium change every 2 days. We also rigorously control for cell loss using whole well imaging and quantification of cell numbers and live/dead staining.

      Please add citation for the limma package.

      The reference has been added (Ritchie et al., NAR 2015; PMID 25605792).

      The description of methodology relating to the "oncogene signatures" is unclear.

      This signature was described in Bild et al., Nature, 2006 and McQuerry JA, et al., 2019 “Pathway activity profiling of growth factor receptor network and stemness pathways differentiates metaplastic breast cancer histological subtypes”, BMC Cancer 19: 881, and is cited in the Methods section “Oncogene signatures”.

      Please clearly state time points post infection for mouse analyses.

      We collected lung samples from Mtb infected mice 12 – 20 weeks post infection. The lesions were heterogeneous and were individually classified using criteria described above.

      Reference is made to "a list of genes unique to type I [interferon] genes [....]" (p29). Can the authors indicate the source of the information used for compiling this list?

      The lists were compiled from Reactome, EMBL's European Bioinformatics Institute and GSEA databases. The links for all datasets are provided in Suppl.Table 8 “Expression of IFN pathway genes in Iba1+ cells from pauci- and multi-bacillary lesions of Mtb infected B6.Sst1S mouse lungs” in the “Pool IFN I & II gene sets” worksheet.

      The discussion at present is very long, contains repetition of results and meanders on occasion.

      Thank you for this suggestion. We critically revised the text for brevity and clarity.

      Reviewer #1 (Significance):  

      Strengths and limitations  

      Strengths: multi-pronged analysis approaches for delineating molecular mechanisms of macrophage responses that might underpin susceptibility to M. tuberculosis infection; integration of mouse tissues and human blood samples  

      Weaknesses: not all conclusions supported by data presented; some concerns related to experimental design and controls; links between findings in human cohort and the mechanistic insights gained in mouse macrophage model uncertain

      The revised manuscript addresses every major and minor comment of the reviewers, including isotype controls and naïve T cells, to provide additional support for our conclusions. Our study revealed causal links between Myc hyperactivity, the deficiency of antioxidant defense, and type I interferon pathway hyperactivity. We have shown that Myc hyperactivity in TNF-stimulated macrophages compromises antioxidant defense, leading to autocatalytic lipid peroxidation and interferon-beta superinduction that in turn amplifies lipid peroxidation, thus forming a vicious cycle of destructive chronic inflammation. This mechanism offers a plausible explanation for the association of Myc hyperactivity with poorer treatment outcomes in TB patients and provides a novel target for host-directed TB therapy.

      Advance

      The study has the potential to advance molecular understanding of the TNF-driven state of oxidative stress previously observed in B6.Sst1S macrophages and possible implications for host control of M. tuberculosis in vivo.

      Audience

      Experts seeking understanding of host factors mediating M. tuberculosis control, or failure thereof, with appreciation for the utility of the featured mouse model in assessing TB diseases progression and severe manifestation. Interest is likely extended to audience more broadly interested in TNF-driven macrophage (dys)function in infectious, inflammatory, and autoimmune pathologies.

      Reviewer expertise

      In preparing this review, I am drawing on my expertise in assessing macrophage responses and host defense mechanisms in bacterial infections (incl. virulent M. tuberculosis) through in vitro and in vivo studies. This includes but is not limited to macrophage infection and stimulation assays, microscopy, intra-macrophage replication of M. tuberculosis, analyses of lung tissues using multi-plex IHC and spatial transcriptomics (e.g. GeoMx). I am familiar with the interpretation of RNAseq analyses in human and mouse cells/tissues, but can provide only limited assessment of appropriateness of algorithms and analysis frameworks.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Yabaji et al. investigated the effects of BMDMs stimulated with TNF from both WT and B6.Sst1S mice, which have previously been identified to contain the sst1 locus conferring susceptibility to Mycobacterium tuberculosis. They identified that B6.Sst1S macrophages show a superinduction of IFNß, which might be caused by increased c-Myc expression, expanding on the mechanistic insights made by the same group (Bhattacharya et al. 2021). Furthermore, prolonged TNF stimulation led to oxidative stress, which WT BMDMs could compensate for by the activation of the antioxidant defense via NRF2. On the other hand, B6.Sst1S BMDMs lack the expression of SP110 and SP140, co-activators of NRF2, and were therefore subjected to maintained oxidative stress. Yabaji et al. could link those findings to in vivo studies by correlating the presence of stressed and aberrantly activated macrophages within granulomas to the failure of Mtb control, as well as the progression towards necrosis. As the knowledge regarding Mtb progression and necrosis of granulomas is not yet well understood, findings that might help provide novel therapy options for TB are crucial. Overall, the manuscript has interesting findings with regard to macrophage responses in Mycobacteria tuberculosis infection.

      However, in its current form there are several shortcomings, both with respect to the precision of the experiments and conclusions drawn.

      In particular a) important controls are often missing, e.g. T-cells from non-immune mice in Fig. 6J, in F, effectivity of BCG in B6 mice in 6N; b) single experiments are shown throughout the manuscript, in particular western blots and histology without proper quantification and statistics, this is absolutely not acceptable; c) very few repetitions are shown in in vitro experiments, where there is no evidence for limitation in resources (usually not more than 3), it is not clear what "independent experiment" means - i.e. the robustness of the findings is questionable; d) data are often normalized multiple times, e.g. in the case of qPCR, and the methods of normalization are not clear (what house-keeping gene exactly?);

      Moreover, experiments regarding IFN I signaling (e.g. short term TNF treatment of BMDMs to analyze LPO, making sure that the reporter mouse for IFNß works in vivo) and c-Myc (e.g. the increase after M-CSF addition might impact on other analysis as well and the experiments should be adjusted to control for this effect; MYC expression in the human samples) should be carefully repeated and evaluated to draw correct conclusions.

      In addition, we would like to strongly encourage the authors to more precisely outline the experimental set-ups and figure legends, so that the reader can easily understand and follow them. In other words: The legends are - in part very - incomplete. In addition, the authors should be mindful of gene names vs. protein names and italicize where appropriate.

      We appreciate the very thorough evaluation of our manuscript by this reviewer. Their insightful comments helped us improve the manuscript. As outlined in the point-by-point responses below, we added important controls, including isotype control antibodies in IFNAR blocking experiments and non-vaccinated T cells in T cell – macrophage interaction experiments. We also updated figure legends to indicate, where this information was missing, the number of repeated experiments when a representative experiment is shown, the numbers of mouse lungs and individual lesions, and the methods of data normalization. We also explained our in vitro experimental design and how we analyzed and excluded effects of media change and fresh CSF1 addition by using a rest period before TNF stimulation and Mtb infection. The data shown in Suppl. Fig. 6C (previously Suppl. Fig. 5B) demonstrate that Myc levels induced by CSF1 return to the basal level at 12 h after media change. Our detailed in vitro protocol that contains these details has been published (Yabaji et al., STAR Protocols, 2022). We added new data demonstrating ROS and LPO production at 6 h of TNF stimulation, while the Ifnb1 mRNA super-induction occurred at 16 – 18 h, and edited the text to highlight these dynamics. The upregulation of the Myc pathway in human samples does not necessarily mean the upregulation of Myc itself; it could be due to the dysregulation of downstream pathways. The upregulation of the Myc pathway in the blood transcriptome associated with TB treatment failure most likely reflects a greater proportion of immature cells in peripheral blood, possibly due to increased myelopoiesis. A detailed analysis of these cell populations in human patients is suggested by our findings, but it is beyond the scope of this study.

      The reviewer’s comments also suggested that a summary of our findings was necessary. The main focus of our study was to untangle connections between oxidative stress and Ifnb1 superinduction. It revealed that Myc hyperactivity caused partial deficiency of antioxidant defense leading to type I interferon pathway hyperactivity that in turn amplifies lipid peroxidation, thus establishing a vicious cycle driving inflammatory tissue damage.

      Our laboratory has worked on mechanisms of TB granuloma necrosis for more than two decades using genetic, molecular and immunological analyses in vitro and in vivo. This work provided a mechanistic basis for independent studies in other laboratories that used our mouse model and further expanded our findings, thus supporting the reproducibility and robustness of our results and our lab’s expertise.

      Specific comments to the experiments and data:

      - Fig. 1E: Evaluation of differences in up- and downregulation between B6 and B6.Sst1S cells should highlight where these cells are within the heatmap, as it is only labelled with the clusters, or it should be depicted differently (in particular for cluster 1 and 2). Furthermore, a more simple labelling of the pathways would increase the readability of the data.

      For our scRNAseq data presentation, we used formats accepted by the computational community. To clarify Fig.1E, we added labels above the B6- and B6.Sst1S-specific clusters.

      - Fig. 2D, E: The staining legend is missing. For the quantification it is not clear what % total means. Is this based on the intensity or area? What do the dots represent in the bar chart? Is one data point pooled from several pictures? If not, the experiments need to be repeated, as three pictures might not be representative for evaluation.

      - Fig. 2E: Statistics comparing B6/B6.Sst1S with TNF (different) is required: Absence of induction is not a proof for a difference!

      We included staining with NRF2-specific antibodies and performed area quantification per field using ImageJ to calculate the NRF2 total signal intensity per field. Each dot in the graph represents the average intensity of 3 fields in a representative experiment. The experiment was repeated 3 times. We included pairwise comparison of TNF-stimulated B6 and B6.Sst1S macrophages and updated the figure legend.

      - Fig. 3E: Positive and negative control need to be depicted in the figure (see legend).

      We have added the positive and negative controls for the determination of labile iron pool to the data in Fig. 3E and related Suppl. Fig. 3B and to Fig. 5D that also demonstrates labile iron determination.

      - Fig. 3I: A quantification by flow cytometry or total cell counts are important, as 6% cell death in cell culture is a very modest observation. Otherwise, confocal images of the quantification would be a good addition to judge the specificity of the viability staining.

      To validate the specificity of the viability staining method, we have provided fluorescent images as Suppl.Fig.3H. The main point of this experiment was to demonstrate a modest, but reproducible, increase in cell death in the sst1-mutant macrophages that suggested an IFN-dependent oxidative damage. In our study, we did not focus on mechanisms of cell death, but on a state of chronic oxidative stress in the sst1 mutant live cells during TNF stimulation.

      - Fig. 3I, J: What does one dot represent?

      We performed this assay in 96-well format, and each dot represents the % cell death in an individual well.

      - Fig. 3K,L: For the B6 BMDMs it seems that p-cJun is highly increased at 12h in (L), while it is not in (K). On the other hand, for the B6.Sst1S BMDMs it peaks at 24h in (K), while in (L) it seems to at 12h. According to the data in (L) it seems that p-cJun is rather earlier and stronger activated in B6 BMDMs and has a weakened but prolonged activation in the B6.Sst1S BMDMs, which would not fit with your statement in the text that B6.Sst1S BMDMs show an upregulation.

      These experiments need repetitions and quantification and statistics.

      Fig. 3L: ASK1 seems to be higher at 12h for the B6 BMDMs and similar for both lines at 24h, which is not fitting to the statement in the text. ("Also, the ASK1 - JNK - cJun stress kinase axis was upregulated in B6.Sst1S macrophages, as compared to B6, after 12 - 36 h of TNF stimulation")

      These experiments were repeated, and new data were added to highlight differences in ASK1 and c-Jun phosphorylation between B6 and B6.Sst1S at individual timepoints after TNF stimulation (presented in new Fig.3K). It demonstrated that after TNF stimulation the activation of stress kinases ASK1 and c-Jun initially increased in both genetic backgrounds. However, their upregulation was maintained exclusively in the sst1-susceptible macrophages from 24 to 36 h of TNF stimulation, while in the resistant macrophages their upregulation was transient. Thus, during prolonged TNF stimulation, B6.Sst1S macrophages experience stress that cannot be resolved, as evidenced by this kinetic analysis. The quantification of the band intensity was added to Western blot images above individual lanes.

      Reviewer 2 pointed to missing isotype control antibodies in Fig.3 and Fig.4:

      - Figure 3J: the isotype control for the IFNAR antibody is missing

      - Figure 4E: It seems the isotype control itself has already an effect in the reduction of IFNb.

      - Fig. 4H: It seems that the Isotype control antibody had an effect to increase 4-HNE (compared to TNF stimulated only).

      We always include isotype control antibodies in our experiments because antibodies are known to modulate macrophage activation via binding to Fc receptors. To address the reviewer’s comments, we updated all panels that present the effects of IFNAR1 blockade to include isotype-matched non-specific control antibodies in the revised manuscript. Specifically, we included isotype controls in Fig. 3M (previously Fig.3J), Fig.4I, Suppl.Fig.4E-G, Fig.6L-M, and Suppl.Fig.7I (previously Suppl.Fig.6F).

      - Fig.4A - C: "IFNAR1 blockade, however, did not increase either the NRF2 and FTL protein levels, or the Fth, Ftl and Gpx1 mRNA levels above those treated with isotype control antibodies"

      Maybe not above the isotype but it is higher than the TNF alone stimulation at least for NRF2 at 8h and for Ftl at both time points. Why does the isotype already cause stimulation/induction of the cells? !These experiments need repetitions and quantification and statistics!

      To determine specific effects of IFNAR blockade, we compared the effects of the non-specific isotype control and IFNAR1-specific antibodies. In our experiments, the isotype control antibody modestly increased the Nrf2 and Ftl protein levels and the Fth and Ftl mRNA levels, but its effects were similar to the effect of the IFNAR-specific antibody. The non-IFN-specific effects of antibodies, although of potential biological significance, are modest in our model, and their analysis is beyond the scope of this study.

      - Fig.4H Was the AB added also at 12h post stimulation? Figure legend should be adjusted.

      The IFNAR1 blocking antibodies and isotype control antibodies were added at 2 h after TNF stimulation in Fig.4H and 4I, as described in the corresponding figure legend. The data demonstrating effects of IFNAR blockade after 12, 24, and 33 h of TNF stimulation are presented in Suppl.Fig.4E-G.

      - Figure 4I: How was the data measured here, i.e. what is depicted? The isotype control is missing. It seems a two-way ANOVA was used, yet it is stated differently. The figure legend should be revised, as Dunnett's multiple comparison would only check for significances compared to the control.

      The microscopy images and bar graphs were updated to include isotype control and presented in Suppl. Fig.4E - G of the revised version. We also revised the statistical analysis to include correction for multiple comparisons.
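
      For readers unfamiliar with what a correction for multiple comparisons does, the following is a minimal sketch. The response above does not name the procedure that was applied, so the Holm step-down method shown here is an illustrative assumption rather than the authors' analysis, and the function name and p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Holm step-down adjustment of a list of raw p-values.

    Illustrative only: the procedure actually used in the revised
    manuscript is not stated in the response.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values from three pairwise comparisons.
raw = [0.01, 0.04, 0.03]
print(holm_adjust(raw))
```

      Unlike Dunnett's test, which compares each group only against a single control, a step-down adjustment of this kind can be applied to any set of pairwise comparisons.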

      - Figure 4C and subsequent: How exactly was the experiment done (house-keeping gene)?

      We included the details in the figure legends of the revised version. We quantified gene expression by the ΔΔCt method using β-actin (for Fig. 4C-E) and 18S (for Fig. 4F and G) as internal controls.
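
      As a minimal sketch of the calculation referred to here, the standard 2^-ΔΔCt formula can be written as follows. It assumes roughly 100% amplification efficiency for both the target and the internal control; the function name and the Ct values are hypothetical and are not taken from the manuscript.

```python
def fold_change_ddct(ct_target_treated, ct_ref_treated,
                     ct_target_control, ct_ref_control):
    """Relative expression by the 2^-ddCt method.

    Assumes ~100% primer efficiency for target and reference genes.
    """
    # Normalize the target gene to the internal control within each sample.
    d_ct_treated = ct_target_treated - ct_ref_treated
    d_ct_control = ct_target_control - ct_ref_control
    # Compare the treated sample with the time-matched untreated control.
    dd_ct = d_ct_treated - d_ct_control
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: target gene and internal control (e.g. beta-actin)
# in TNF-stimulated versus untreated cells.
fold = fold_change_ddct(22.0, 15.0, 28.0, 15.0)
print(fold)  # 64.0
```

      In this form the data are normalized twice by construction: once to the internal control and once to the untreated sample.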

      - Figure 4D,E: Information on cells used is missing. Why the change in stimulation time? Did it not work after 12h? Then the experiments in A-C should be repeated for 16h.

      The updated Fig. 4D and E present a comparison of B6 and B6.Sst1S BMDMs that clearly demonstrates a significant difference between these macrophages in Ifnb1 mRNA expression 16 h after TNF stimulation, in agreement with our previous publication (Bhattacharya et al., 2021). There we studied the time course of responses of B6 and B6.Sst1S macrophages to TNF at 2 h intervals and demonstrated the divergence between their activation trajectories starting at 12 h of TNF stimulation. Therefore, to reveal the underlying mechanisms, we focused our analyses on this critical timepoint, i.e. as close to the divergence as possible. However, the difference between the strains in Ifnb1 mRNA expression achieved significance only by 16 h of TNF stimulation. That is why we used this timepoint for the Ifnb1 and Rsad2 analyses. It clearly shows that the superinduction was not driven by the positive feedback via IFNAR, as has been shown previously by the Ivashkiv lab for B6 wild type macrophages (PMID 21220349).

      - Figure 4E: It would be helpful to see if these transcripts are actually translated into protein levels, e.g. perform an ELISA. Authors state that IFNAR blockages does not alter the expression but you statistic says otherwise.

      - The data for Ifnb expression (or better protein level) should be provided for B6 BMDMs as well.

      We have previously reported the differences in Ifnb protein secretion (He et al., Plos Pathogens, 2013 and Bhattacharya et al., JCI 2021). We use mRNA quantification by qRT-PCR as a more sensitive and direct measurement of the sst1-mediated phenotype. The revised Fig.4D and E include responses of B6 in addition to the B6.Sst1S to demonstrate that the IFNAR blockade does not reduce the Ifnb1 mRNA levels in TNF-stimulated B6.Sst1S mutant to the B6 wild type levels. A slight reduction can be explained by a known positive feedback loop in the IFN-I pathway (see above). In this experiment we emphasized that the effect of the sst1 locus is substantially greater, as compared to the effect of the IFNAR blockade (Fig.4D), and updated the text accordingly.

      - Fig. 4F: To what does the fold induction refer to? If it is again to unstimulated cells, then why is the induction now so much higher than in (E) where it was only 50x (now to 100x).

      - Figure 4G: Again to what is the fold induction referring to? It seems your Fer-1 treatment only contains 2 data points. This needs to be fixed.

      Yes, the fold induction was calculated by normalizing mRNA levels to the untreated control incubated for the same time. Regarding the variation in Ifnb1 mRNA levels, a two-fold variation is not unusual in these experiments, so the Ifnb1 mRNA superinduction may range from 50- to 200-fold at this timepoint (16 h). The graph in Fig.4G was modified to make all data points more visible.

      - "These data suggest that type I IFN signaling does not initiate LPO in our model but maintains and amplifies it during prolonged TNF stimulation that, eventually, may lead to cell death". Data for a short term TNF stimulation are not shown, however, so it might impact also on the initiation of LPO.

      - The overall conclusion drawn from Fig. 3 and 4 is not really clear with regard that IFN does not initiate LPO. Where is that shown? Data on earlier stimulation time points should be added to make this clear.

      We demonstrated ROS production (new Suppl.Fig.3G) and the rate of LPO biosynthesis (new Suppl.Fig.4E-F) at 6 h post TNF stimulation, while the Ifnb1 superinduction occurs between 12-18 h post TNF stimulation. This temporal separation supports our conclusion that IFN-β superinduction does not initiate LPO. We clarified it in the text:

      “Thus, Ifnb1 super-induction and IFN-I pathway hyperactivity in B6.Sst1S macrophages follow the initial LPO production, and maintain and amplify it during prolonged TNF stimulation”. (Previously: These data suggest that type I IFN signaling does not initiate LPO in our model). We also edited the conclusion in this section to explain the hierarchy of the sst1-regulated AOD and IFN-I pathways better:

      “Taken together, the above experiments allowed us to reject the hypothesis that IFN-I hyperactivity caused the sst1-dependent AOD dysregulation. In contrast, they established that the hyperactivity of the IFN-I pathway in TNF-stimulated B6.Sst1S macrophages was itself driven by the initial dysregulation of AOD and iron-mediated lipid peroxidation. During prolonged TNF stimulation, however, the IFN-I pathway was upregulated, possibly via ROS/LPO-dependent JNK activation, and acted as a potent amplifier of lipid peroxidation”.

      We believe that these additional data and explanation strengthen our conclusions drawn from Figures 3 and 4.

      - "A select set of mouse LTR-containing endogenous retroviruses (ERV's) (Jayewickreme et al, 2021), and non-retroviral LINE L1 elements were expressed at a basal level before and after TNF stimulation, but their levels in the B6.Sst1S BMDMs were similar to or lower than those seen in B6". This sentence should be revised as the differences between B6 and B6.Sst1S BMDMs seem small and are not there after 48h anymore. Are these mild changes really caused by the mutation or could they result from different housing conditions and/or slowly diverging genetically lines. How many mice were used for the analysis? Is there already heterogeneity between mice from the same line?

      We agree with the reviewer that the data presented in Suppl.Fig.4 (Suppl.Fig.5 in the revised version) indicated no increase in single- and double-stranded transposon RNAs in the B6.Sst1S macrophages. The purpose of this experiment was to test the hypothesis that increased transposon expression might be responsible for triggering the superinduction of the type I interferon response in TNF-stimulated B6.Sst1S macrophages. In collaboration with a transposon expert, Dr. Nelson Lau (co-author of this manuscript), we demonstrated that transposon expression was not increased above the B6 level and thus rejected this attractive hypothesis. We explained the purpose of this experiment in the text, adequately described our findings as “the levels in the B6.Sst1S BMDMs were similar to or lower than those seen in B6”, and concluded that “the above analyses allowed us to exclude the overexpression of persistent viral or transposon RNAs as a primary mechanism of the IFN-I pathway hyperactivity” in the sst1-mutant macrophages.

      - Fig. 5A: Indeed, it even seems that Myc is upregulated for the mutant BMDMs. Yet, there are only 2 data points for B6 12h.

      These experiments need repetitions and quantification and statistics.

      We observed these differences in c-Myc mRNA levels by independent methods: RNAseq and qRT-PCR. The qRT-PCR experiments were repeated 3 times. A representative experiment in Fig.5A shows 3 data points for each condition. We reformatted the panel to make all data points clearly visible.

      - Fig. 5B: Why would the protein level decrease in the controls over 6h of additional cultivation? Is this caused by fresh M-CSF? In this case maybe cells should be left to settle for one day before stimulating them to properly compare c-Myc induction. Comment on two c-Myc bands is needed. At 12h only the upper one seems increased for TNF stimulated mutant BMDMs compared to B6 BMDMs.

      We agree with the reviewer’s point that cells need to be rested after media change that contains fresh CSF-1. Indeed, in Suppl.Fig.6C, we show that after media change containing 10% L929 supernatant (a source of CSF1) there is an increase in c-Myc protein levels that takes approximately 12 hours to return to baseline.

      Our protocol includes resting period of 18-24 h after medium change before TNF stimulation.

      We updated Methods to highlight this detail. Thus, the increase in c-Myc levels we observe at 12 h of TNF stimulation (Fig.5B) is induced by TNF, not the addition of growth factors, as further discussed in the text.

      The two c-Myc bands observed in Fig.5B,I and J, are similar to patterns reported in previous studies that used the same commercial antibodies (PMIDs: 24395249, 24137534, 25351955). Whether they correspond to different c-Myc isoforms or post-translational modifications is unknown.

      - Fig. 5A,B: It seems that not all the RNA is translated into protein, as c-Myc at 12h in the mutant BMDMs seems to be lower than at 6h, while the gene expression implicates it vice versa.

      In addition to Fig.5B, the time course of Myc protein expression up to 24 h is presented in new panels Fig. 5I-5J. It demonstrates the gradual decrease of Myc protein levels. The observed dissociation between the mRNA and protein levels in the sst1-mutant BMDMs at 12 and 24 h is most likely due to translation inhibition resulting from the development of the integrated stress response, ISR (as shown in our previous publication by Bhattacharya et al., JCI, 2021). Translation of Myc is known to be particularly sensitive to the ISR (PMID 18551192, PMID 25079319, PMID 28490664). The IFN-driven ISR may thus serve as a backup mechanism for Myc downregulation. We are planning to investigate these regulatory mechanisms in greater detail in the future.

      - Fig. 5J: Indeed, the inhibitor seems to cause the downregulation of the proteins. Explanation?

      This experiment was repeated twice, and the average normalized densitometry values are presented in the updated Fig.5J. The main question addressed in this experiment was whether hyperactivity of JNK in TNF-stimulated sst1 mutant macrophages contributed to Myc upregulation, as had been previously shown in cancer. Comparing the effects of JNK inhibition on phospho-cJun and c-Myc protein levels in TNF-stimulated B6.Sst1S macrophages (updated Fig.5J), we rejected the hypothesis that JNK activity might have a major role in c-Myc upregulation in sst1 mutant macrophages.

      - "TNF stimulation tended to reduce the LPO accumulation in the B6 macrophages and to increase it in the B6.Sst1S ones" However, this is not apparent in Sup. Fig. 6B. Here it seems that there might be a significant increase.

      Suppl.Fig.6B (currently Suppl.Fig.7B) shows the 4-HNE accumulation at day 3 post infection. The data obtained after 5 days of Mtb infection are shown in Fig.6A. We clarified this in the text: “By day 5 post infection, TNF stimulation induced significant LPO accumulation only in the B6.Sst1S macrophages (Fig.6A)”.

      - Fig. 6B: Mtb and 4-HNE should be shown in two different channels in order to really assign each staining correctly.

      What time point is this? Are the mycobacteria cleared at MOI1, since it looks that there are fewer than that? How does this look like for the B6 BMDMs? Are there even less mycobacteria?

      We included B6 infection data in the updated Fig.6B and added Suppl.Fig.7C and 7D, which address this reviewer’s comment. The data represent day 5 of Mtb infection, as indicated in the legends of the updated Fig.6B and Suppl.Fig.7C and 7D. New Suppl.Fig.7D shows quantification of replicating Mtb using an Mtb replication reporter strain expressing a single strand DNA binding protein GFP fusion, as described in Methods. We observed fewer Mtb and a lower percentage of replicating Mtb in B6 macrophages, but we did not observe complete Mtb elimination in either background.

      We used red fluorescence for both Mtb::mCherry and 4-HNE staining to clearly visualize the SSB-GFP puncta in replicating Mtb DNA. In the revised manuscript, we have included the relevant channels in Suppl. Fig.7C and D to demonstrate clearly distinct patterns of Mtb::mCherry and 4-HNE signals. We did not aim to quantify the 4-HNE signal intensity in this experiment. For the 4-HNE quantification we use Mtb that expressed no reporter proteins (Fig.6A-B and Suppl.Fig.7A-B).

      - Fig 6E: In the context of survival a viability staining needs to be included, as well as the data from day 0. Then it needs to be analyzed whether cell numbers remain the same from D0 or if there is a change.

      We updated the Fig.6 legend to indicate that the cell number percentages were calculated based on the number of cells at Day 0 (immediately after Mtb infection). We routinely use fixable cell death staining to enumerate cell death and to exclude artifacts due to cell loss. A brief protocol containing this information is included in the Methods section. The detailed protocol, including normalization using a BCG spike, has been published (Yabaji et al., STAR Protocols, 2022). Here we did not present the dead cell percentage because it remained low and we did not observe damage to macrophage monolayers. The fold change of Mtb was calculated after normalization to the Mtb load at Day 0 after infection and washes.
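
      The two Day 0 normalizations described in this response can be sketched as follows. This is an illustration under stated assumptions, not the published protocol: the function names and numbers are hypothetical, and the BCG spike normalization used in the qPCR-based protocol is not represented.

```python
def percent_of_day0(cell_count, day0_cell_count):
    """Cell number as a percentage of the Day 0 count for the same condition."""
    if day0_cell_count <= 0:
        raise ValueError("Day 0 cell count must be positive")
    return 100.0 * cell_count / day0_cell_count


def fold_change_from_day0(mtb_load, day0_mtb_load):
    """Mtb load expressed as fold change over the Day 0 load (after washes)."""
    if day0_mtb_load <= 0:
        raise ValueError("Day 0 Mtb load must be positive")
    return mtb_load / day0_mtb_load


# Hypothetical values for a single well at a later time point.
print(percent_of_day0(9000, 10000))         # 90.0
print(fold_change_from_day0(4.0e5, 1.0e5))  # 4.0
```

      Reporting both values side by side makes it possible to see whether an apparent change in bacterial load could be explained by a change in the number of surviving host cells.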

      "The 3D imaging demonstrated that YFP-positive cells were restricted to the lesions, but did not strictly co-localize with intracellular Mtb, i.e. the Ifnb promoter activity was triggered by inflammatory stimuli, but not by the direct recognition of intracellular bacteria. We validated the IFNb reporter findings using in situ hybridization with the Ifnb probe, as well as anti-GFP antibody staining (Suppl.Fig.8B - E)." The colocalization is not present within the tissue sections. It seems that the reporter line does not show the same staining pattern in vivo as the IFNß probe or the anti GFP antibody staining. The reporter line has to be tested for the specificity of the staining. Furthermore, to state that it was restricted to the lesions, an uninvolved tissue area needs to be depicted.

      The Ifnb secreting cells are notoriously difficult to detect in vivo using direct staining of the protein. Therefore, lineage tracing of reporter expression is used as a surrogate. The Ifnb reporter used in our study has been developed by the Locksley laboratory (Scheu et al., PNAS, 2008, PMID: 19088190) and has been validated in many independent studies. The reporter mice express the YFP protein under the control of the Ifnb1 promoter. The YFP protein accumulates within the cells, while Ifnb protein is rapidly secreted and does not accumulate in the producing cells in appreciable amounts. Also, the kinetics of YFP protein degradation is much slower than that of the endogenous Ifnb1 mRNA that was detected using in situ hybridization. Thus, there is no precise spatiotemporal coincidence of these readouts in Ifnb expressing cells in vivo. However, this methodology more closely reflects the Ifnb expressing cells in vivo, as compared to a Cre-lox mediated lineage tracing approach. In the revised manuscript we demonstrate that the YFP and mRNA signals partially overlap (Suppl.Fig.12B). In Suppl.Fig.12B we also included a new panel showing no YFP expression in the uninvolved area of the reporter mice infected with Mtb. The YFP expression by activated macrophages is demonstrated by co-staining with Iba1- and iNOS-specific antibodies (new Fig.7D and Suppl.Fig.13A). Our specificity control also included TB lesions in mice that do not carry the YFP reporter and did not express the YFP signal, as reported elsewhere (Yabaji et al., BioRxiv, https://doi.org/10.1101/2023.10.17.562695).

      - Are paucibacillary and multibacillary lesions different within the same animal or does one animal have one lesion phenotype? If that is the case, what is causing the differences between mice? Bacterial counts for the mice are required.

      The heterogeneity of pulmonary TB lesions has been widely acknowledged in the clinic and highlighted in recent experimental studies. In our model of chronic pulmonary TB (described in detail in Yabaji et al., https://doi.org/10.1101/2025.02.28.640830 and https://doi.org/10.1101/2023.10.17.562695) the development of pulmonary TB lesions is not synchronized, i.e. the lesions are heterogeneous between animals and within individual animals at the same timepoint. Therefore, we performed a lesion stratification in which individual lesions were classified by a certified veterinary pathologist in a blinded manner based on their morphology (H&E) and acid fast staining of the bacteria, as depicted in Suppl.Fig.8.

      - "Among the IFN-inducible genes upregulated in paucibacillary lesions were Ifi44l, a recently described negative regulator of IFN-I that enhances control of Mtb in human macrophages (DeDiego et al, 2019; Jiang et al, 2021) and Ciita, a regulator of MHC class II inducible by IFNy, but not IFN-I (Suppl.Table 8 and Suppl.Fig.10 D-E)." Why is Sup. Fig. 10 D, E referred to? The figure legend is also not clear, e.g. what means "upregulated in a subset of IFN-inducible genes"? Input for the hallmarks needs to be defined.

      These data are now presented in Suppl.Fig.11 and, following the reviewer’s comment, we moved the reference to panels 11D – E up to the previous paragraph in the main text, where it naturally belongs. We also edited the figure legend to refer to the list of IFN-inducible genes compiled from the literature that is discussed in the text. We appreciate the reviewer’s suggestion, which helped us improve the clarity of the text. The inputs for the Hallmark pathway analysis are presented in Suppl.Tables 7 and 8, as described in the text.

      - Fig. 7C: Single channel pictures are required as it is hard to see the differences in staining with so many markers. Why is there no iNOS expression in the bottom row? What does the rectangle indicate on the bottom right? As black is chosen for DAPI, it is not visible at all. In case the signal is needed, a visible color should be chosen.

      We thoroughly revised this figure to address the reviewer’s concern about the lack of clarity. We provide individual channels for each marker in Fig.7D – E and Suppl.Fig.13F. We had to show DAPI in gray scale in these images to better visualize the other markers.

      - "In the advanced lesions these markers were primarily expressed by activated macrophages (Iba1+) expressing iNOS and/or Ifny (YFP+)(Fig.7D)" Iba1 is needed in the quantification. Based on the images, iNOS seems to be highly produced in Iba1 negative cells. Which cells do produce it then? Flow cytometry data for this quantification are required. This would allow you to specifically check which cells express the markers and allow for a more precise analysis of double positive cells.

      Currently these data demonstrating the co-localization of stress markers phospho-c-Jun and Chac1 with YFP are presented in Fig.7E (images) and Suppl.Fig.13D (quantification). The co-localization of stress markers phospho-cJun and Chac1 with iNOS is presented in Suppl.Fig.13F (images) and Suppl.Fig.13E (quantification). We agree that some iNOS+ cells are Iba1-negative (Fig.7D). We manually quantified the percentages of Iba1+iNOS+ double positive cells and demonstrated that they represent the majority of the iNOS+ population (Suppl.Fig.13A). Regarding the requested FACS analysis, we focus on spatial approaches because the heterogeneity of the lesions would be lost if the lungs were dissociated for FACS. We are working on spatial transcriptomics at single cell resolution, which preserves the spatial organization of TB lesions, to address the reviewer’s comment and will present our results in the future.

      - Results part 6: In general, can you please state for each experiment at what time point mice were analyzed? You should include an additional macrophage staining (e.g. MerTK, F4/80), as alveolar macrophages are not staining well for Iba1 and you might therefore miss them in your IF microscopy. It would be very nice if you could perform flow cytometry to really check on the macrophages during infection and distinguish subsets (e.g. alveolar macrophages, interstitial macrophages, monocytes).

      We have included the details of time post infection in the figure legends for Fig.7 and Suppl.Figures 8, 9, 12B, 13 and 14A of the revised manuscript. We have performed staining with CD11b, CD206 and CD163 to differentiate the recruited and lung resident macrophages and determined that, in chronic pulmonary TB lesions in our model, the vast majority of macrophages are recruited CD11b+ macrophages rather than resident (CD206+ and CD163+) macrophages. These data are presented in another manuscript (Yabaji et al., BioRxiv https://doi.org/10.1101/2023.10.17.562695).

      - Spatial sequencing: The manuscript would highly profit from more data on that. It would be very interesting to check for the DEGs and show differential spatial distribution. Expression of marker genes should be inferred to further define macrophage subsets (e.g. alveolar macrophages, interstitial macrophages, recruited macrophages) and see if these subsets behave differently within the same lesion but also between the lesions. Additional bioinformatic approaches might allow you to investigate cell-cell interactions. There is a lot of potential with such a dataset, especially from TB lesions, that would elevate your findings and prove interesting to the TB field.

      - "Thus, progression from the Mtb-controlling paucibacillary to non-controlling multibacillary TB lesions in the lungs of TB susceptible mice was mechanistically linked with a pathological state of macrophage activation characterized by escalating stress (as evidenced by the upregulation phospho-cJUN, PKR and Chac1), the upregulation of IFNβ and the IFN-I pathway hyperactivity, with a concurrent reduction of IFNγ responses." To really show the upregulation within macrophages and their activation, a more detailed IF microscopy with the inclusion of additional macrophage markers needs to be provided. Flow cytometry would enable analysis of the differences between alveolar and interstitial macrophages, as well as monocytes. However, as it seems that the majority of iNOS, as well as the stress associated markers, are not produced by Iba1+ cells, analyzing granulocytes and T lymphocytes should be considered.

      We appreciate the reviewer’s suggestion. Indeed, our model provides an excellent opportunity to investigate macrophage heterogeneity and cell interactions within chronic TB lesions. We are working on spatial transcriptomics at a single cell resolution that would address the reviewer’s comment and will present our results in the future.

      In agreement with the classical literature, the overwhelming majority of myeloid cells in chronic pulmonary TB lesions are macrophages. Neutrophils are detected at the necrotic stage, but our study is focused on pre-necrotic stages to reveal the earlier mechanisms predisposing to necrotization. We never observed neutrophils or T cells expressing iNOS in our studies.

      - It's mentioned in the method section that controls in the IF staining were only fixed for 10min, while the infected cells were fixed for 30min. Consistency is important as the PFA fixation might impact on the fluorescence signal. Therefore, controls should be repeated with the same fixation time.

      We have carefully considered the impact of fixation time on fluorescence and have separately analyzed the non-infected and infected samples to address this concern. For the non-infected samples, we examined the effect of TNF in both B6 and B6.Sst1S backgrounds, ensuring that a consistent fixation protocol (10 min) was applied across all experiments without Mtb infection.

      For the Mtb infection experiments, we employed an optimized fixation protocol (30 min) to ensure that Mtb was killed before handling the plates, which is critical for preserving the integrity of the samples. In this context, we compared B6 and B6.Sst1S samples to evaluate the effects of fixation and Mtb infection on lipid peroxidation (LPO) induction.

      We believe this approach balances the need for experimental consistency with the specific requirements for handling infected cells, and we have revised the manuscript to reflect this clarification.

      - Reactive oxygen species levels should be determined in B6 and B6.Sst1S BMDMs (stimulated and unstimulated), as they are very important for oxidative stress.

      We have conducted experiments to measure ROS production in both B6 and B6.Sst1S BMDMs and demonstrated higher levels of ROS in the susceptible BMDMs after prolonged TNF stimulation (new Fig.3I-J and Suppl. Fig. 3G). Additionally, we have previously published a comparison of ROS production between B6 and B6.Sst1S by FACS (PMID: 33301427), which also supports the findings presented here.

      - Sup. Fig 2C: The inclusion of an unstimulated control would be advisable in order to evaluate if there are already difference in the beginning.

      We have added the untreated control to Suppl. Fig. 2C (currently Suppl. Fig. 2D) in the revised manuscript.

      - Sup. Fig. 3F: Why is the fold change now lower than in Fig. 4D (fold change of around 28 compared to 120 in 4D)?

      The data in Fig.4D (Fig.4E in the revised manuscript) and Suppl.Fig.3F (currently Suppl.Fig.4C) represent separate experiments. This variation between experiments is commonly observed in qRT-PCR, which is affected by slight variations in expression in the unstimulated controls used for normalization and by the kinetics of the response. A 2-4 fold difference between the same treatments in separate experiments, as compared to the 30 – 100 fold and higher induction by TNF, does not affect the data interpretation.
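
      To illustrate why small shifts in the unstimulated control translate into 2-4 fold differences in the reported induction, here is a minimal Python sketch. It assumes the widely used 2^-ΔΔCt calculation, which this response does not explicitly name, and all Ct values are hypothetical.

```python
def fold_change_ddct(ct_target_treated, ct_ref_treated,
                     ct_target_control, ct_ref_control):
    """Relative expression by the 2^-ddCt method (assumed here, not stated by the authors)."""
    delta_treated = ct_target_treated - ct_ref_treated   # dCt, treated sample
    delta_control = ct_target_control - ct_ref_control   # dCt, unstimulated control
    return 2.0 ** -(delta_treated - delta_control)


# Hypothetical Ct values: identical treated samples, but the unstimulated
# control of the second experiment is one cycle lower for the target gene.
experiment_1 = fold_change_ddct(20.0, 15.0, 27.0, 15.0)
experiment_2 = fold_change_ddct(20.0, 15.0, 26.0, 15.0)

print(experiment_1, experiment_2, experiment_1 / experiment_2)
```

      In this sketch a one-cycle difference in the baseline Ct alone changes the fold induction by a factor of two, while both experiments still show induction far above the unstimulated level.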

      - Sup. Fig. 5C, D: The data seems very interesting as you even observe an increase in gene expression. Data for the B6 mice should be evaluated for increase to a similar level as the TNF treated mutants. Data on the viability of the cells are necessary, as they no longer receive MCSF and might be dying at this point already.

      To ensure that the observed effects were not confounded by cytotoxicity, we determined non-toxic concentrations of the CSF1R inhibitors during 48h of incubation and used them in our experiments that lasted for 24h. To address this valid comment, we have included cell viability data in the revised manuscript to confirm that the treatments did not result in cell death (Suppl. Fig. 6D). This experiment rejected our hypothesis that CSF1 driven Myc expression could be involved in the Ifnb superinduction. Other effects of CSF1R inhibitors on type I IFN pathway are intriguing but are beyond the scope of this study.

      - Sup. Fig 12: the phospho-c-Jun picture for (P) is not the same as in the merged one with Iba1. Double positive cells are mentioned to be analyzed, but from the staining it appears that P-c-Jun is expressed by other cells. You do not indicate how many replicates were counted and if the P and M lesions were evaluated within the same animal. What does the error bar indicate? It seems unlikely from the plots that the double positive cells are significant. Please provide the p values and statistical analysis.

      We thank the reviewer for bringing this inadvertent field replacement in the single phospho-cJun channel to our attention. However, the quantification of Iba1+phospho-cJun+ double positive cells in Suppl.Fig.12 and our conclusions were not affected. In the revised manuscript, images and quantification of phospho-cJun and Iba1 co-expression are shown in new Suppl.Fig.13B and C, respectively. We have also updated the figure legends to denote the number of lesions analyzed and the statistical tests. Specifically, lesions from 6–8 mice per group (paucibacillary and multibacillary) were evaluated. Each dot in the panels of Suppl.Fig.13 represents an individual lesion.

      - Sup. Fig. 13D (suppl.Fig.15D now): What about the expression of MYC itself? Should other parts of the signaling pathway be analyzed (e.g. IFNb, JNK)?

      MYC mRNA expression tended to be higher in TB patients with poor outcomes, but the difference was not statistically significant after correction for multiple testing. The upregulation of the Myc pathway in the blood transcriptome associated with TB treatment failure most likely reflects a greater proportion of immature cells in peripheral blood, possibly due to increased myelopoiesis. Pathway analysis of the differentially expressed genes revealed that treatment failures were associated with the following pathways relevant to this study: NF-kB Signaling, Flt3 Signaling in Hematopoietic Progenitor Cells (indicative of common myeloid progenitor cell proliferation), SAPK/JNK Signaling and Senescence (possibly indicative of oxidative stress). The upregulation of these pathways in human patients with poor TB treatment outcomes correlates with our findings in TB susceptible mice.

      - In the mfIHC, the usage of anti-mouse antibodies is mentioned. Pictures of sections incubated with the secondary antibody alone are required to exclude the possibility that the staining is not specific, especially as these data are essential to the manuscript and mouse-anti-mouse antibodies are notorious for background noise.

      We are well aware of the technical difficulties associated with using mouse on mouse staining. In those cases, we use rabbit anti-mouse isotype specific antibodies specifically developed to avoid non-specific background (Abcam cat#ab133469). Each antibody panel for fluorescent multiplexed IHC is carefully optimized prior to studies. We did not use any primary mouse antibodies in the final version of the manuscript and, hence, removed this mention from the Methods.

      - In order to tie the story together, it would be interesting to treat infected mice with an IFNAR antibody, as well as perform this experiment with a Myc antibody. According to your data, you might expect the survival of the mice to be increased or bacterial loads to be affected.

      In collaboration with the Vance laboratory, we tested the effects of type I IFN pathway inhibition in B6.Sst1S mice on TB susceptibility: either type I receptor knockout or blocking antibodies increased their resistance to virulent Mtb (published in Ji et al., 2019; PMID 31611644). Unfortunately, blocking Myc using neutralizing antibodies in vivo is not currently achievable. Specifically blocking Myc using small molecule inhibitors in vivo is notoriously difficult, as recognized in the oncology literature. We are considering using small molecule inhibitors of either Myc translation or specific pathways downstream of Myc in the future.

      - It is surprising that you not even once cite or mention your previous study on bioRxiv considering the similarity of the results and topic (https://doi.org/10.1101/2020.12.14.422743). Is not even your Figure 1I and Figure 2 J, K the same as in that study depicted in Figure 4?

      The reviewer refers to the first version of this manuscript uploaded to BioRxiv, which has never been published. We continued this work and greatly expanded our original observations, as presented in the current manuscript. Therefore, we do not consider the previous version an independent manuscript and do not cite it.

      - Please revise spelling of the manuscript and pay attention to write gene names in italics

      Thank you, we corrected the gene and protein names according to current nomenclature.

      Minor points:

      - Fig. 1: Please provide some DEGs that explain why you used this resolution for the clustering of the scRNAseq data and that these clusters are truly distinct from each other.

      Differential gene expression in clusters is presented in Suppl.Fig.1C (interferon response) and Suppl.Fig.1D (stress markers and interferon response previously established in our studies).

      - Fig. 1F: What do the two lines represent (magenta, green)?

      The lines indicate pseudotime trajectories of B6 (magenta) and B6.Sst1S (green) BMDMs.

      - Fig. 1F, G: Why was cluster 6 excluded?

      This cluster was not different between B6 and B6.Sst1S, so it was not useful for drawing the strain-specific trajectories.

      - Fig. 1E, G, H: The intensity scales are missing. They are vital to understand the data.

      We have included the scale in revised manuscript (Fig.1E,G,H and Suppl.Fig.1C-D).

      - Fig. 2G-I: please revise order, as you first refer to Fig. 2H and I

      We revised the panels’ order accordingly.

      - Fig. 5: You say the data represents three samples but at least in D and E you have more. Please revise. Why do you only include at (G) the inhibitor only control?

      We added the inhibitor only controls to Fig. 5D - H. We also indicated the number of replicates in the updated Fig.5 legend.

      - Figure 7A, Sup. Fig. 8: Are these maximum intensity projection? Or is one z-level from the 3D stack depicted?

      Fig. 7A shows 3D images with all the stacks combined.

      - Fig. 7B: What do the white boxes indicate?

      We have removed this panel in the revised version and replaced it with better images.

      - Sup. Fig. 1A: The legend for the staining is missing

      Suppl. Fig.1A shows the relative proportions of either naïve (R and S) or TNF-stimulated (RT and ST) B6 or B6.Sst1S macrophages within the individual single cell clusters depicted in Fig.1B. The color code is shown next to the graph on the right.

      - Sup. Fig. 1B: The feature plots are not clear: The legend for the expression levels is missing. What does the heading mean?

      We updated the headings, as in Fig.1C. The dots represent individual cells expressing Sp110 mRNA (upper panels) and Sp140 mRNA (lower panels).

      - Sup. Fig. 3C: The scale bar is barely visible.

      We resized the scale bar to make it visible and presented it in Suppl. Fig.3E (previously Suppl. Fig.3C).

      - Sup. Fig. 3D: There is no figure legend or the legend to C-E is wrong.

      - Sup. Fig. 3F, G: You do not state to what the data is relative to.

      We identified an error in the Suppl.Fig.3 legend referring to specific panels. The Suppl.Fig.3 legend has been updated accordingly. New panels were added and the Suppl.Fig.3F-G panels are now Suppl.Fig.4C-D.

      - Sup. Fig. 3H: It seems you used a two-way ANOVA, yet state it differently. Please revise the figure legend, as Dunnett's multiple comparison would only check for significances compared to the control.

      Following the reviewer’s comment, we repeated statistical analysis to include correction for multiple comparisons and revised the figure and legend accordingly.

      - Sup. Fig. 4A, B: It is not clear what the lines depict as the legend is not explained. Names that are not required should be changed to make it clear what is depicted (e.g. "TE@" what does this refer to?)

      This previous Sup. Fig 4 is now Sup. Fig. 5. The “TE@” is a leftover label from the bioinformatics pipeline, referring to “Transposable Element”. We apologize for this confusion and have removed these extraneous labels. We have also added transposon names of the LTR (MMLV30 and RTLV4) and L1Md to Suppl.Fig.5A and 5B legend, respectively.

      - Sup. 4B: What does the y-scale on the right refer to?

      We apologize for the missing label for the y-scale on the right, which represents the mRNA expression level of the SetDB1 gene. Because SetDB1 has a much lower steady state level than the LINE L1Md, we plotted two Y-scales so that both the gene and the transposon can be visualized on this graph.

      - Sup. 4C: Interpretation of the data is highly hindered by the fact that the scales differ between the B6 and B6.Sst1. The scales are barely visible.

      We apologize for the missing labels for the y-scales of these coverage plots, which were originally meant only to give a qualitative picture of the small RNA sequencing that was already quantitated by the total amounts in Sup. 4B. We have added the auto-scaled Y-scales to Sup. 4C and improved the presentation of this figure.

      - Sup. Fig. 5A, B: Is the legend correct? Did you add the antibody for 2 days or is the quantification from day 3?

      We recognize that the reviewer refers to Suppl.Fig.6A-B (Suppl.Fig.7A-B in the revised manuscript). We did not add antibodies to live cells. The figure legend describes staining with 4HNE-specific antibodies 3 days post Mtb infection.

      - Sup. Fig. 8A: Are the "early" and "intermediate" lesions from the same time points? What are the definitions for these stages?

      We discussed our lesion classification according to histopathology and bacterial loads above. Of note, in the revised manuscript we simplified our classification to denote paucibacillary and multibacillary lesions only. We agree with the reviewers that the designation of lesions as early, intermediate and advanced was based on our assumptions regarding the time course of their progression from low to high bacterial loads.

      - Sup. Fig. 8E: You should state that the bottom picture is an enlargement of an area in the top one. Scale bars are missing.

      We replaced this panel with clearer images in Suppl.Fig.12B.

      - Sup. Fig. 11A: The IF staining is only visible for Iba and iNOS. Please provide single channels in order to make the other staining visible.

      Suppl.Fig.11A (now Suppl.Fig.13B) shows the low-magnification images of TB lesions. In Fig. 7 and Suppl. Fig. 13F of the revised manuscript we provide images for the individual markers.

      - Sup. Fig. 13A (Suppl.Fig.15A now): Your axis label is not clear. What do the numbers behind the genes indicate? Why did you choose oncogene signatures and not inflammatory markers to check for a correlation with disease outcome?

      The X axis of Suppl.Fig.15A represents pre-defined molecular signature gene sets (MSigDB) in the Gene Set Enrichment Analysis (GSEA) database (https://www.gseamsigdb.org/gsea/msigdb). The Y axis shows the area under the curve (AUC) score for each gene set.
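
      As an illustration of what an AUC score per gene set expresses, the sketch below computes it as the probability that a randomly chosen treatment-failure sample has a higher enrichment score than a randomly chosen cured sample. This rank-based definition is an assumption; the response does not state how the AUC was calculated, and the scores shown are hypothetical.

```python
def auc_score(scores_failure, scores_cure):
    """Rank-based AUC: P(failure score > cure score), ties count as 0.5."""
    if not scores_failure or not scores_cure:
        raise ValueError("both groups must contain at least one sample")
    wins = 0.0
    for f in scores_failure:
        for c in scores_cure:
            if f > c:
                wins += 1.0
            elif f == c:
                wins += 0.5
    return wins / (len(scores_failure) * len(scores_cure))


# Hypothetical per-sample enrichment scores for a single gene set
failure = [0.8, 0.6, 0.9]
cure = [0.3, 0.7, 0.5]
print(auc_score(failure, cure))
```

      An AUC of 0.5 corresponds to no separation between the outcome groups, and values approaching 1 indicate that the gene set score is consistently higher in one group.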

      - Sup. 13D(Suppl.Fig.15D now): Maybe you could reorder the patients, so that the impression is clearer, as right now only the top genes seem to show a diverging gene signature, while the rest gives the impression of an equal distribution.

      The Myc upregulated gene set myc_up was identified among the top gene sets associated with treatment failure using the unbiased ssGSEA algorithm. We agree with the reviewer that not every gene in the myc_up gene set correlates with the treatment outcome. However, the association of the gene set is statistically significant, as presented in Suppl.Fig.15B – C.

      - The scale bars for many microscopy pictures are missing.

      We have added clearly visible scale bars to all the microscopy images in the revised version.

      - The black bar plots should be changed (e.g. in color), since the single data points cannot be seen otherwise.

      - It would be advisable that a consistent color scheme would be used throughout the manuscript to make it easier to identify similar conditions, as otherwise many different colours are not required and lead right now rather to confusion (e.g. sometimes a black bar refers to BMDMs with and sometimes without TNF stimulation, or B6 BMDMs). Furthermore, plot sizes and fonts should be consistent within the manuscript (including the supplemental data)

      We followed this useful suggestion and selected consistent color codes for B6 and B6.Sst1S groups to enhance clarity throughout the revised manuscript.

      Within the methods section:

      - At which concentration did you use the IFNAR antibody and the isotype?

      We updated the Methods section to include the respective concentrations in the revised manuscript.

      - Were mice maintained under SPF conditions? At what age were they used?

      Yes, the mice are specific pathogen free. We used 10 - 14 week old mice for Mtb infection.

      - The BMDM cultivation is not clear. According to your cited paper you use LCCM but can you provide how much M-CSF it contains? How do you make sure that amounts are the same between experiments and do not vary? You do not mention how you actually obtain this conditioned medium. Is there the possibility of contamination or transferred fibroblasts that would impact on the data analysis? Is LCCM also added during stimulation and inhibitor treatment?

      We obtain LCCM by collecting the supernatant from L929 cells that form a confluent monolayer, according to well-established protocols for LCCM collection. The supernatants are filtered through 0.22 micron filters to exclude contamination with L929 cells and bacteria. The medium is prepared in 500 ml batches that are sufficient for multiple experiments. Each batch of L929-conditioned medium is tested for biological activity using serial dilutions.

      - How was the BCG infection performed? How much bacteria did you use? Which BCG strain was used?

      We infected mice with M. bovis BCG Pasteur subcutaneously in the hock using 10^6 CFU per mouse.

      - At what density did you seed the BMDMs for stimulation and inhibitor experiments?

      In 96 well plates, we seed 12,000 cells per well and allow the cells to grow for 4 days to reach confluency (approximately 50,000 cells per well). For a 6-well plate, we seed 2.5 × 10^5 cells per well and culture them for 4 days to reach confluency. For a 24-well plate, we seed 50,000 cells per well and keep the cells in media for 4 days before starting any treatments. This ensures that the cells are in a proliferative or near-confluent state before beginning the stimulation or inhibitor treatments. Our detailed protocol is published in STAR Protocols (Yabaji et al., 2022; PMID 35310069).

      - What machine did you use to perform the bulk RNA sequencing? How many replicates did you include for the sequencing?

      For bulk sequencing we used 3 RNA samples for each condition. The samples were sequenced at the Boston University Microarray & Sequencing Resource service using an Illumina NextSeq™ 2000 instrument.

      - How many replicates were used for the scRNA sequencing? Why is your threshold for the exclusion of mitochondrial DNA so high? A typical threshold of less than 5% has been reported to work well with mouse tissue.

      We used one sample per condition. For the mitochondrial cutoff, we usually base it off of the total distribution. There is no "universal" threshold that can be applied to all datasets. Thresholds must be determined empirically.
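
      One common way to derive such a threshold from the distribution itself is a median plus a multiple of the median absolute deviation (MAD). The Python sketch below illustrates that idea only; the response does not state which rule was applied, and the function name, the choice of 3 MADs and the values are hypothetical.

```python
import statistics


def empirical_mito_cutoff(percent_mito, n_mads=3.0):
    """Cutoff = median + n_mads * MAD of the per-cell mitochondrial percentages."""
    values = list(percent_mito)
    if not values:
        raise ValueError("no cells provided")
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return med + n_mads * mad


# Hypothetical per-cell mitochondrial read percentages
percent_mito = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 30.0]
cutoff = empirical_mito_cutoff(percent_mito)
kept = [v for v in percent_mito if v <= cutoff]

print(cutoff, len(kept))
```

      Because the cutoff follows the observed distribution, a dataset with a broadly higher mitochondrial fraction yields a higher threshold than a fixed 5% rule would allow.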

      - You do not mention how many PCAs were considered for the scRNA sequencing analysis.

      We considered 50 principal components; this information was added to the Methods.

      - You should name all the package versions you used for the scRNA sequencing (e.g. for the slingshot, VAM package)

      The following package versions were used: Seurat v4.0.4, VAM v1.0.0, Slingshot v2.3.0, SingleCellTK v2.4.1, Celda v1.10.0. We added this information to the Methods.

      - You mention two batches for the human samples. Can you specify what the two batches are?

      Human blood samples were collected at five sites, as described in the updated Methods section. The RNAseq samples were processed in two separate batches, which required batch correction.

      - At which temperature was the IF staining performed?

      We performed the IF at 4 °C. We included the details in the revised version.

      Reviewer #2 (Significance):

      Overall, the manuscript has interesting findings with regard to macrophage responses in Mycobacterium tuberculosis infection.

      However, in its current form there are several shortcomings, both with respect to the precision of the experiments and conclusions drawn.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary

      The authors use a mouse model designed to be more susceptible to M.tb (addition of sst1 locus) which has granulomatous lesions more similar to human granulomas, making this mouse highly relevant for M.tb pathogenesis studies. Using WT B6 macrophages or sst1B6 macrophages, the authors seek to understand how the sst1 locus affects the macrophage response to prolonged TNFa exposure, which can occur during a pro-inflammatory response in the lungs. Single cell RNA-seq revealed clusters of mutant macrophages with upregulated genes associated with oxidative stress responses and IFN-I signaling pathways when treated with TNF, compared to WT macs. The authors go on to show that mutant macrophages have decreased NRF2, decreased antioxidant defense genes and less Sp110 and Sp140. Mutant macrophages are also more susceptible to lipid peroxidation and iron-mediated oxidative stress. The IFN-I pathway hyperactivity is caused by the dysregulation of iron storage and antioxidant defense. These mutant macrophages are more susceptible to M.tb infection, showing they are less able to control bacterial growth even in the presence of T cells from BCG vaccinated mice. The transcription factor Myc is more highly expressed in mutant macs during TNF treatment and inhibition of Myc led to better control of M.tb growth. Myc is also more abundant in PBMCs from M.tb infected humans with poor outcomes, suggesting that Myc should be further investigated as a target for host-directed therapies for tuberculosis.

      Major Comments

      Isotypes for IF imaging and confocal IF imaging are not listed, or not performed. It is a concern that the microscopy images throughout the manuscript do not have isotype controls for the primary antibodies.

      Fig 4 (and later) the anti-IFNAR Ab is used along with the Isotype antibody, Fig 4I does not show the isotype. Use of the isotype antibody is also missing in later figures as well as Fig 3J. Why was this left off as the proper control for the Ab?

      We addressed the comment in revised manuscript as described above in summary and responses to reviewers 1 and 2. Isotype controls for IFNAR1 blockade were included in Fig.3M (previously 3J), Fig. 4I, Suppl.Fig.4G (previously Fig.4I), and updated Fig.4C-E, Fig.6L-M, Suppl.Fig.4F-G, 7I.

      Conclusions drawn by the authors from some of the WB data are worded strongly, yet by eye the blots don't look as dramatically different as suggested. It would be very helpful to quantify the density of bands when making conclusions. (for example, Fig 4A).

      We added the densitometry of Western blot values after normalization above each lane in Fig.2A-C, Fig.3C-D and 3K; Fig.4A-B, Fig.5B,C,I,J.

      Fig 5A is not described clearly. If the gene expression is normalized to untreated B6 macs, then the level of untreated B6 macs should be 1. In the graph the blue bars are slightly below 1, which would not suggest that levels "initially increased and subsequently downregulated" as stated in the text. It seems like the text describes the protein expression but not the RNA expression. Please check this section and more clearly describe the results.

      We appreciate the reviewer’s comment and modified the text to specify the mRNA and protein expression data, as follows:

      “We observed that Myc was regulated in an sst1-dependent manner: in TNF-stimulated B6 wild type BMDMs, c-Myc mRNA was downregulated, while in the susceptible macrophages c-Myc mRNA was upregulated (Fig.5A). The c-Myc protein levels were also higher in the B6.Sst1S cells in unstimulated BMDMs and 6 – 12 h of TNF stimulation (Fig.5B)”.

      Also, why look at RNA through 24h but protein only through 12h? If c-myc transcripts continue to increase through 24h, it would be interesting to see if protein levels also increase at this later time point.

      The time-course of Myc expression up to 24 h is presented in new panels Fig. 5I-5J. It demonstrates the decrease of Myc protein levels at 24 h. In the wild type B6 BMDMs the levels of Myc protein significantly decreased in parallel with the mRNA suppression presented in Fig.5A. In contrast, we observed the dissociation of the mRNA and protein levels in the sst1 mutant BMDMs at 12 and 24 h, most likely because the mutant macrophages develop an integrated stress response (as shown in our previous publication by Bhattacharya et al., JCI, 2021) that is known to inhibit Myc mRNA translation.

      Fig 5J the bands look smaller after D-JNK1 treatment at 6 and 12h though in the text it says no change. Quantifying the bands here would be helpful to see if there really is no difference.

      This experiment was repeated twice, and the average normalized densitometry values are presented in the updated Fig.5J. The main question addressed in this experiment was whether the hyperactivity of JNK in TNF-stimulated sst1 mutant macrophages contributed to Myc upregulation, as was previously shown in cancer. Comparing effects of JNK inhibition on phospho-cJun and c-Myc protein levels in TNF stimulated B6.Sst1S macrophages (updated Fig.5J), we concluded that JNK did not have a major role in c-Myc upregulation in this context.

      Section 4, third paragraph, the conclusion that JNK activation in mutant macs drives pathways downstream of Myc is not supported here. Are there data or other literature from the lab that support this claim?

      This statement was based on evidence from available literature where JNK was shown to activate oncogenes, including Myc. In addition, inhibition of Myc in our model upregulated ferritin (Fig.5C), reduced the labile iron pool, prevented the LPO accumulation (Fig.5D - G) and inhibited stress markers (Fig.5H). However, we do not have direct experimental evidence in our model that Myc inhibition reduces ASK1 and JNK activities. Hence, we removed this statement from the text and plan to investigate this in the future.

      Fig 6N Please provide further rationale for the BCG in vivo experiment. It is unclear what the hypothesis was for this experiment.

      In the current version BCG vaccination data are presented in Suppl.Fig.14B. We demonstrate that stressed BMDMs do not respond to activation by BCG-specific T cells (Fig.6J) and their unresponsiveness is mediated by type I interferon (Fig.6L and 6M). The observed accumulation of the stressed macrophages in pulmonary TB lesions of the sst1-susceptible mice (Fig.7E, Suppl.Fig.13 and 14A) and the upregulation of type I interferon pathway (Fig.1E, 1G, 7C, Suppl.Fig.1C and 11) suggested that the effect of further boosting T lymphocytes using BCG in Mtb-infected mice will be neutralized due to the macrophage unresponsiveness. This experiment provides a novel insight explaining why BCG vaccine may not be efficient against pulmonary TB in susceptible hosts.

      The in vitro work is all concerning treatment with TNFa and how this exposure modifies the responses in B6 vs sst1B6 macrophages; however, this is not explored in the in vivo studies. Are there differences in TNFa levels in the pauci- vs multi-bacillary lesions that lead to (or correlate with) the accumulation of peroxidation products in the intralesional macrophages? How do the experiments with TNFa in vitro relate back to how the macrophages are responding in vivo during infection?

      Our investigation of mechanisms of necrosis of TB granulomas stems from and is supported by in vivo studies as summarized below.

      This work started with the characterization of necrotic TB granulomas in C3HeB/FeJ mice in vivo followed by a classical forward genetic analysis of susceptibility to virulent Mtb in vivo.

      That led to the discovery of the sst1 locus and demonstration that it plays a dominant role in the formation of necrotic TB granulomas in mouse lungs in vivo. Using genetic and immunological approaches we demonstrated that the sst1 susceptibility allele controls macrophage function in vivo (Yan, et al., J.Immunol. 2007) and an aberrant macrophage activation by TNF and increased production of Ifn-b in vitro (He et al. Plos Pathogens, 2013). In collaboration with the Vance lab we demonstrated that the type I IFN receptor inactivation reduced the susceptibility to intracellular bacteria of the sst1-susceptible mice in vivo (Ji et al., Nature Microbiology, 2019). Next, we demonstrated that the Ifnb1 mRNA superinduction results from combined effects of TNF and JNK leading to integrated stress response in vitro (Bhattacharya, JCI, 2021). Thus, our previous work started with extensive characterization of the in vivo phenotype that led to the identification of the underlying macrophage deficiency that allowed for the detailed characterization of the macrophage phenotype in vitro presented in this manuscript. In a separate study, the Sher lab confirmed our conclusions and their in vivo relevance using Bach1 knockout in the sst1-susceptible B6.Sst1S background, where boosting antioxidant defense by Bach1 inactivation resulted in decreased type I interferon pathway activity and reduced granuloma necrosis. We have chosen TNF stimulation for our in vitro studies because this cytokine is most relevant for the formation and maintenance of the integrity of TB granulomas in vivo as shown in mice, non-human primates and humans. Here we demonstrate that although TNF is necessary for host resistance to virulent Mtb, its activity is insufficient for full protection of the susceptible hosts, because of altered macrophage responsiveness to TNF. Thus, our exploration of the necrosis of TB granulomas encompasses both in vitro and extensive in vivo studies.

      Minor comments

      Introduction, while well written, is longer than necessary. Consider shortening this section. Throughout figures, many graphs show a fold induction/accumulation/etc, but it is rarely specified what the internal control is for each graph. This needs to be added.

      Paragraph one, authors use the phrase "the entire IFN pathway was dramatically upregulated..." seems to be an exaggeration. How do you know the "entire" IFN pathway was upregulated in a dramatic fashion?

      (1) We shortened the introduction and discussion; (2) verified that the figure legends state the internal controls that were used to calculate fold induction; (3) removed the word “entire” to avoid overinterpretation.

      Figures 1E, G and H and supp fig 1C, the heat maps are missing an expression key. Section 2 second paragraph refers to figs 2D, E as cytoplasmic in the text, but figure legend and y-axis of 2E show total protein.

      The expression keys were added to Fig.1E,G,H, Fig.7C, Suppl.Fig.1C and 1D and Suppl.Fig.11A of the revised manuscript.

      Section 3 end of paragraph 1 refers to Fig 3h. Does this also refer to Supp Fig 3E?

      Yes, Fig.3H shows microscopy of 4-HNE and Suppl.Fig.3H shows quantification of the image analysis. In the revised manuscript these data are presented in Fig.3H and Suppl.Fig.3F. The text was modified to reflect this change.

      Supplemental Fig 3 legend for C-E seems to incorrectly also reference F and G.

      We corrected this error in the figure legend. New panels were added to Suppl.Fig.3 and previous Suppl.Fig.3F and G were moved to Suppl.Fig.4 panels C and D of the revised version.

      Fig 3K, the p-cJun was inhibited with the JNK inhibitor, however it’s unclear why this was done or the conclusion drawn from this experiment. Use of the JNK inhibitor is not discussed in the text.

      The JNK inhibitor was used to confirm that c-Jun phosphorylation in our studies is mediated by JNK and to compare effects of JNK inhibition on phospho-cJun and Myc expression. This experiment demonstrated that the JNK inhibitor effectively inhibited c-Jun phosphorylation but not Myc upregulation, as shown in Fig.5I-J of the revised manuscript.

      Fig 4 I and Supp Fig 3 H seem to have been swapped? The graph in Fig 4I matches the images in Supp Fig 3I. Please check.

      We reorganized the panels to provide microscopy images and corresponding quantification together in the revised panels Fig. 4H and Fig. 4I, as well as in Suppl. Fig. 4F and Suppl. Fig. 4G.

      Fig 6, it is unclear what % cell number means. Also for bacterial growth, the data are fold change compared to what internal control?

      We updated Fig.6 legend to indicate that the cell number percentages were calculated based on the number of cells at Day 0 (immediately after Mtb infection). We routinely use fixable cell death staining to enumerate cell death. Brief protocol containing this information is included in Methods section. The detailed protocol including normalization using BCG spike has been published – Yabaji et al, STAR Protocols, 2022. Here we did not present dead cell percentage as it remained low and we did not observe damage to macrophage monolayers. This allows us to exclude artifacts due to cell loss. The fold change of Mtb was calculated after normalization using Mtb load at Day 0 after infection and washes.
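
      The Day 0 normalization described in this response can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names and all example counts are hypothetical and chosen only to show the arithmetic.

```python
def percent_of_day0(count, day0_count):
    """Cell number expressed as a percentage of the Day 0 count."""
    if day0_count <= 0:
        raise ValueError("Day 0 count must be positive")
    return 100.0 * count / day0_count


def fold_change(load, day0_load):
    """Bacterial load expressed as fold change relative to the Day 0 load."""
    if day0_load <= 0:
        raise ValueError("Day 0 load must be positive")
    return load / day0_load


# Hypothetical example values keyed by day post infection
# (illustrative only, not data from the manuscript).
cells = {0: 20000, 3: 18000, 5: 15000}
mtb = {0: 1.0e4, 3: 4.0e4, 5: 1.2e5}

cell_pct = {day: percent_of_day0(n, cells[0]) for day, n in cells.items()}
mtb_fold = {day: fold_change(n, mtb[0]) for day, n in mtb.items()}
```

      With this convention the Day 0 values are 100% and 1-fold by construction, which is the internal control the reviewer asked to have stated.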

      Fig 7B needs an expression key

      The expression key was added to Fig.7C (previously Fig. 7B).

      Supp Fig 7 and Supp Fig 8A, what do the arrows indicate?

      In Suppl.Fig.8 (previously Suppl.Fig.7) the arrows indicate acid fast bacilli (Mtb). In figures Fig.7A and Suppl.Fig.9A arrows indicate Mtb expressing fluorescent reporter mCherry. Corresponding figure legends were updated in the revised version.

      Supp Fig 9A, two ROI appear to be outlined in white, not just 1 as the legend says. Methods:

      We updated the figure legend.

      Certain items are listed in the Reagents section that are not used in the manuscript, such as necrostatin-1 or Z-VAD-FMK. Please carefully check the methods to ensure extra items or missing items do not occur.

      These experiments were performed, but not included in the final manuscript. Hence, we removed the “necrostatin-1 or Z-VAD-FMK” from the reagents section in the methods of the revised version.

      Western blot, method of visualizing/imaging bands is not provided, method of quantifying density is not provided, though this was done for fig 5C and should be performed for the other WBs.

      We used GE ImageQuant LAS4000 Multi-Mode Imager to acquire the Western blot images and the densitometric analyses were performed by area quantification using ImageJ. We included this information in the method section. We added the densitometry of Western blot values after normalization above each lane in Fig.2A-C, Fig.3C-D and 3K; Fig.4A-B, Fig.5B,C,I,J.
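
      The response does not spell out how the band areas were normalized. A common convention, assumed here purely for illustration, divides each target band by its loading-control band and then scales every lane to a chosen reference lane. The function name, the reference-lane choice and the band areas below are hypothetical.

```python
def normalize_bands(target, loading, reference_lane=0):
    """Normalize target-band areas to a loading control, then to one lane.

    target, loading: per-lane band areas (for example, exported from ImageJ).
    reference_lane: index of the lane whose normalized value is set to 1.0.
    """
    if len(target) != len(loading):
        raise ValueError("target and loading must have the same number of lanes")
    if any(area <= 0 for area in loading):
        raise ValueError("loading-control areas must be positive")
    ratios = [t / l for t, l in zip(target, loading)]
    ref = ratios[reference_lane]
    return [r / ref for r in ratios]


# Hypothetical band areas for four lanes (not data from the manuscript).
target_areas = [1200.0, 2400.0, 3000.0, 900.0]
loading_areas = [1000.0, 1000.0, 1250.0, 750.0]

relative = normalize_bands(target_areas, loading_areas)
```

      Values computed this way are the kind of per-lane numbers that could be printed above each lane, with the reference lane equal to 1.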

      Reviewer #3 (Significance):

      The work of Yabaji et al is of high significance to the field of macrophage biology and M.tb pathogenesis in macrophages. This work builds from previously published work (Bhattacharya 2021) in which the authors first identified the aberrant response induced by TNF in sst1 mutant macrophages. Better understanding how macrophages with the sst1 locus respond not only to bacterial infection but stimulation with relevant ligands such as TNF will aid the field in identifying biomarkers for TB, biomarkers that can suggest a poor outcome vs. "cure" in response to antibiotic treatment or design of host-directed therapies.

      This work will be of interest to those who study macrophage biology and who study M.tb pathogenesis and tuberculosis in particular. This study expands the knowledge already gained on the sst1 locus to further determine how early macrophage responses are shaped that can ultimately determine disease progression.

      Strengths of the study include the methodologies, employing both bulk and single cell-RNA seq to answer specific questions. Data are analyzed using automated methods (such as HALO) to eliminate bias. The experiments are well planned and designed to determine the mechanisms behind the increased iron-related oxidative stress found in the mutant macrophages following TNF treatment. Also, in vivo studies were performed to validate some of the in vitro work. Examining pauci-bacillary lesions vs multi-bacillary lesions and spatial transcriptomics is a significant strength of this work. The inclusion of human data is another strength of the study, showing increased Myc in humans with poor response to antibiotics for TB.

      Limitations include the fact that the work is all done with BMDMs. Use of alveolar macrophages from the mice would be a more relevant cell type for M.tb studies. AMs are less inflammatory, therefore treatment with TNF of AMs could result in different results compared to BMDMs. Reviewer's field of expertise: macrophage activation, M.tb pathogenesis in human and mouse models, cell signaling.

      Limitations: not qualified to evaluate single cell or bulk RNA-seq technical analysis/methodology or spatial transcriptomics analysis.

    1. eLife Assessment

      This useful study shows that stimuli of a certain size elicit theta oscillations in V1 neurons both in spikes and local field potentials, and monkeys performing a dot detection task on these stimuli show theta rhythmicity in their response times. This replicates previous findings showing rhythmic theta activity in V4 and behaviour when stimuli are presented in the receptive field along with a surrounding flanker stimulus. However, there is incomplete evidence that rhythmicity in neural activity is related to the rhythmicity in behavior, and the mechanisms underlying these oscillations remain unclear.

    2. Reviewer #1 (Public review):

      Summary:

      The authors add to the body of evidence showing theta rhythmic modulations of neuronal activity and behavior.

      Strengths:

      Precise characterization of the effects of visual stimulation on theta-induced neuronal oscillations of spiking neurons in V1 and its relevance for behavior.

      The manuscript is well-written and clearly presented.

      Weaknesses:

      The advances are limited over the established body of evidence. Both theta-induced visual oscillations and their relevance for behavior have been firmly established by prior work, including prior work from the authors. There is no major new technique, data, finding, or insight that extends our knowledge in a majorly significant way beyond existing knowledge, in my opinion. I would suggest that the authors re-evaluate the body of existing work to more strongly place their work in the context of existing work. A study that targets fundamental holes or open questions in the field would have been viewed as more impactful.

    3. Reviewer #2 (Public review):

      Summary:

      Schmid & colleagues test an interesting hypothesis that V1 neurons might act as theta-tuned filters to incoming sensory information, and thereby influence downstream processing and detection performance.

      Strengths:

      The authors report that circular stimuli elicit theta oscillations in V1 single units and population activity. They also report that the phase of the theta oscillations influences performance in a change detection task.

      Weaknesses:

      The results are reported in terms of specific stimulus sizes. To truly reflect general-purpose spatial computations in the primary visual cortex, it will be important to establish a relationship between stimulus size and receptive field size.

      I have several major concerns that I would like the authors to address:

      (1) First paragraph of Results: The results are presented at very specific stimulus sizes: 0.3-degree, 1-degree, 4-degree, and so on. A key missing piece of information is the size of the receptive fields (RFs) that were recorded from. A related missing information is at what eccentricity these RFs were recorded from. Since there is nothing magical about a 1-degree stimulus, any general-purpose computation in the primary visual cortex has to establish a relationship between RF size and stimulus size.

      (2) Second paragraph of Results: The authors state that "specific stimulus sizes consistently induced strong theta rhythmic activity: 1° in MUA and 2° in LFP". What is the interpretation of these specific sizes? Given that the LFP and MUAe reflect different aspects of neural activity, how does one interpret the discrepancy?

      (3) Third paragraph of Results: Again related to (1), what is the relationship between the stimulus size that elicited the largest theta peaks and RF size at the population level? (1)-(3) taken together, there seems to be an opportunity to reveal something more fundamental about V1 processing that the authors might have missed here.

      (4) Change detection task: It was not clear to me whether the timing of the luminance change, which varied from 500ms to 1500ms, was drawn from an exponential distribution or a uniform distribution. Only an exponential distribution has the property of a flat hazard function, which will be important to establish that the animal could not anticipate the timing of the upcoming change.

      (5) Figure 3D: Have the authors tried to fit the data separately for each animal? There seems to be an inconsistency in the results between the 2 animals. The circular data points ('AL') seem positively correlated, similar to the overall trend, but the diamond data points ('DP') seem to have a negative slope.
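
      Point (4) rests on the hazard function h(t) = f(t) / (1 - F(t)), the instantaneous probability that the change occurs at time t given that it has not yet occurred. The sketch below contrasts the two candidate distributions over the 500 to 1500 ms window mentioned in the comment. The function names and the 300 ms mean of the exponential are hypothetical choices for illustration, not parameters from the study.

```python
def uniform_hazard(t, start=500.0, end=1500.0):
    """Hazard rate (per ms) of a change time drawn uniformly from [start, end)."""
    if not (start <= t < end):
        raise ValueError("t must lie in [start, end)")
    # f(t) = 1 / (end - start) and 1 - F(t) = (end - t) / (end - start),
    # so the ratio simplifies to 1 / (end - t).
    return 1.0 / (end - t)


def shifted_exponential_hazard(t, start=500.0, mean_wait=300.0):
    """Hazard rate (per ms) of start + Exponential(mean_wait); constant in t."""
    if t < start:
        raise ValueError("t must be >= start")
    return 1.0 / mean_wait


times = [500.0, 1000.0, 1400.0]
uniform = [uniform_hazard(t) for t in times]
exponential = [shifted_exponential_hazard(t) for t in times]
```

      Under the uniform distribution the hazard rises as the window closes, so elapsed time becomes informative about the upcoming change, whereas the exponential hazard stays flat. An exponential that is truncated at 1500 ms is not perfectly flat either: its hazard also rises close to the cutoff.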

    4. Reviewer #3 (Public review):

      Summary:

      This paper investigates changes in brain oscillations in V1 in response to experimentally manipulating visual stimulus features (size, contrast at optimal size) and examines whether these effects are of perceptual relevance. The results reveal prominent stimulus-related theta oscillations in V1 that match in frequency the rhythms of behavioural performance (response speed in detecting targets in the visual display). Phase analyses relate these fluctuations of detection performance more formally to opposite theta phase angles in V1.

      Strengths:

      The non-human primate model provides unique findings on how brain oscillations relate to rhythms in perception (in two rhesus monkeys) that align well with findings from human studies (as occurring in the theta band). However, theta rhythms in humans are typically associated with fronto-parietal activity in the domain of spatial orienting, attentional sampling, while here the focus is on V1. Importantly, microsaccade-controls seem to speak against a spatial orienting/ attentional sampling mechanism to explain the observed effects (at least regarding overt attention).

      Weaknesses:

      This study provides interesting clues on perceptually relevant brain oscillations. Despite the microsaccade-control, I believe it remains an open question whether the V1 rhythmicity is of pure V1 origin, or driven by top-down input, as it is conceivable that specific stimuli capture attention differently (and hence induce specific covert attentional (re)orienting patterns). For perceptually relevant (yet beta) rhythmicity over occipital areas that are top-down generated, see e.g., Veniero et al., 2019.

    1. eLife Assessment

      In this useful study, ectopic expression and knockdown strategies were used to assess the effects of increasing and decreasing Cyclic di-AMP on the developmental cycle in Chlamydia. The authors convincingly demonstrate that overexpression of the dacA-ybbR operon results in increased production of c-di-AMP and early expression of the transitionary gene hctA and late gene omcB. Whilst the authors have attempted to revise the submission, the model proposed in the revised manuscript is still not fully supported by the data presented.

    2. Reviewer #2 (Public review):

      This manuscript describes the role of the production of c-di-AMP on the chlamydial developmental cycle. The main findings remain the same. The authors show that overexpression of the dacA-ybbR operon results in increased production of c-di-AMP and early expression of transitionary and late genes. The authors also knocked down the expression of the dacA-ybbR operon and reported a modest reduction in the expression of both hctA and omcB. The authors conclude with a model suggesting the amount of c-di-AMP determines the fate of the RB, continued replication, or EB conversion.

      Overall, this is a very intriguing study with important implications; however, the data is very preliminary and the model is very rudimentary. The data support the observation that dramatically increased c-di-AMP has an impact on transitionary gene expression and late gene expression, suggesting dysregulation of the developmental cycle. This effect goes away with modest changes in c-di-AMP (deltaTM-DacA vs deltaTM-DacA (D164N)). However, the model's prediction that low levels of c-di-AMP delay EB production is not well supported by the data. If this prediction were true then the growth rate would increase with c-di-AMP reduction and the data does not show this. The levels of c-di-AMP at the lower levels need to be better validated as it seems like only very high levels make a difference for dysregulated late gene expression. However, on the low end it's not clear what levels are needed to have an effect as only DacAopMut and DacAopKD show any effects on the cycle and the c-di-AMP levels are only different at 24 hours.

      The data still do not support the overall model.

      In Figure 1 the authors show at 24 hpi.

      DacA overexpression increases cdiAMP to ~4000 pg/ml

      DacAmut overexpression reduces cdiAMP dramatically (to ~256 pg/ml).

      DacATM overexpression increases cdiAMP to ~4000 pg/ml.

      DacAmutTM overexpression does not seem to change cdiAMP (~1500 pg/ml).

      dacAKD decreases cdiAMP to ~300 pg/ml.

      dacAKDcom increased cdiAMP to ~8000 pg/ml.

      DacA-ybbRop overexpression increased cdiAMP to ~500,000 pg/ml.

      DacA-ybbRopmut ~300 pg/ml.

      However in Figure 2 the data show that overexpression of DacA (cdiAMP ~4000 pg/ml) did not have a different phenotype than over expression of the mutant (cdiAMP ~256 pg/ml). HctA expression down, omcB expression down, euo not much change, replication down, and IFUs down. Additionally, Figure 3 shows no differences in anything measured although cdiAMP levels were again dramatically different. DacATM overexpression (~4000 pg/ml) and DacAmutTM (~1500). This makes it unclear what cdiAMP is doing to the developmental cycle.

      In Figure 4 the authors knock down dacA (dacA-KD) and complement the knockdown (dacA-KDcom). dacAKD decreases cdiAMP (~300) while DacA-KDcom increases cdiAMP much above wt (~8000).

      KD decreased hctA and omcB at 24hpi. Complementation resulted in a moderate increase in hctA at a single time point but not at 24 hpi and had no effect on euo or omcB expression. Importantly, complementation decreased the growth rate. Based on the proposed model, growth rate should increase as the chlamydia should all be RBs and replicating and not exiting the cell cycle to become EBs (not replicating). Interestingly reducing cdiAMP levels by over expressing DacAmut (~256 pg/ml) did not have an effect on the cycle but the reduction in cdiAMP by knockdown of dacA (~300 pg/ml) did have a moderate effect on the cycle.

      For Figure 5 DacA-ybbRop was overexpressed and this increased cdiAMP dramatically ~500,000 pg/ml as compared to wt ~1500. This increased hctA only at an early timepoint and not at 24hpi and again had no effect on omcB or euo. Overexpression of the operon with the mutation DacA-ybbRopmut reduced cdiAMP to ~300 pg/ml and this showed a reduction in growth rate similar to dacAmut but a more dramatic decrease in IFUs.

      Overall:

      DacA overexpression increases cdiAMP to ~4000 pg/ml (decreased everything except euo)

      DacAmut overexpression reduces cdiAMP dramatically (~256 pg/ml). (decreased everything except euo)

      DacATM overexpression increases cdiAMP to ~4000 pg/ml (no changes noted)

      DacAmutTM overexpression does not seem to change cdiAMP ~1500 pg/ml (no changes noted)

      dacAKD decrease cdiAMP to ~300 pg/ml (decreased everything except euo)

      dacAKDcom increased cdiAMP to ~8000 pg/ml (decreases growth rate, increase hctA a little but not omcB)

      DacA-ybbRop overexpression increased cdiAMP to ~500,000 pg/ml (decreases growth rate, increase hctA a little but not omcB)

      DacA-ybbRopmut ~300 pg/ml (decreased everything except euo)

      Overall, the data show that increasing cdiAMP only has a phenotype if it is dramatically increased, no effect at 4000 pg/ml. Decreasing cdiAMP has a consistent effect, decreased growth rate, IFU, hctA expression and omcB expression. However, if their proposed model was correct and low levels of cdiAMP blocked EB conversion then more chlamydial cells would be RBs (dividing cells) and the growth rate should increase. Conversely, if cdiAMP levels were dramatically raised then all RBs would convert and the growth rate would be very low. When cdiAMP was raised to ~4000 pg/ml there was no effect on the growth rate. However, an increase to ~8000 pg/ml resulted in a significant decrease but growth continued. Increasing cdiAMP to ~500,000 pg/ml had less of an impact on the growth rate. Overall, the data does not cleanly support the proposed model.
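
      To make the scale of these differences easier to compare, the approximate levels listed above can be expressed relative to the wild-type value of ~1500 pg/ml quoted for Figure 5. The sketch below only restates the reviewer's approximate readings at 24 hpi; the variable names are illustrative.

```python
WT_LEVEL = 1500.0  # approximate wild-type level quoted in the review, pg/ml

# Approximate 24 hpi c-di-AMP levels (pg/ml) as listed by the reviewer.
levels = {
    "DacA": 4000.0,
    "DacAmut": 256.0,
    "DacATM": 4000.0,
    "DacAmutTM": 1500.0,
    "dacAKD": 300.0,
    "dacAKDcom": 8000.0,
    "DacA-ybbRop": 500000.0,
    "DacA-ybbRopmut": 300.0,
}

fold_vs_wt = {strain: level / WT_LEVEL for strain, level in levels.items()}

# Sort from lowest to highest level relative to wild type.
ranked = sorted(fold_vs_wt.items(), key=lambda item: item[1])

for strain, fold in ranked:
    print(f"{strain:16s} {fold:8.2f}x wild type")
```

      On this scale the listed strains range from roughly one-sixth of wild type to more than 300 times wild type, which is the spread the comparison of phenotypes has to account for.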

    3. Author response:

      The following is the authors’ response to the current reviews

      Reviewer #2 (Public review): 

      This manuscript describes the role of the production of c-di-AMP on the chlamydial developmental cycle. The main findings remain the same. The authors show that overexpression of the dacA-ybbR operon results in increased production of c-di-AMP and early expression of transitionary and late genes. The authors also knocked down the expression of the dacA-ybbR operon and reported a modest reduction in the expression of both hctA and omcB. The authors conclude with a model suggesting the amount of c-di-AMP determines the fate of the RB, continued replication, or EB conversion. 

      Overall, this is a very intriguing study with important implications; however, the data is very preliminary and the model is very rudimentary. The data support the observation that dramatically increased c-di-AMP has an impact on transitionary gene expression and late gene expression, suggesting dysregulation of the developmental cycle. This effect goes away with modest changes in c-di-AMP (deltaTM-DacA vs deltaTM-DacA (D164N)). However, the model's prediction that low levels of c-di-AMP delay EB production is not well supported by the data. If this prediction were true then the growth rate would increase with c-di-AMP reduction and the data does not show this. The levels of c-di-AMP at the lower levels need to be better validated as it seems like only very high levels make a difference for dysregulated late gene expression. However, on the low end it's not clear what levels are needed to have an effect as only DacAopMut and DacAopKD show any effects on the cycle and the c-di-AMP levels are only different at 24 hours.

      These appear to be the same comments the reviewer presented last time, so we will reiterate our prior points here and elsewhere. We do not think, nor do we predict, that low c-di-AMP levels should increase growth rate (as measured by gDNA levels), and this conclusion cannot be drawn from our data. Rather, we predict that the inability to accumulate c-di-AMP should delay production of EBs, and this is what the data show. The reviewer has applied their own subjective (and erroneous) interpretation to the model. The asynchronicity of the normal developmental cycle means RBs continue to replicate as EBs are forming, so gDNA levels cannot be used as the sole metric for determining RB levels. We show that reduced c-di-AMP levels reduce EB levels as well as transcripts associated with late stages of development. The parsimonious interpretation of these data supports that low c-di-AMP levels delay progression through the developmental cycle, consistent with our model.

      The data still do not support the overall model.

      We disagree.  We have presented quantified data that include appropriate controls and statistical tests, and the reviewer has not disputed that or pointed to additional experiments that need to be performed.  The reviewer has imposed a subjective interpretation of our model based on their own biases.  A reader is free, of course, to disagree with our model, but a reviewer should not block a manuscript based on such a disagreement if no experimental flaws have been identified. 

      In Figure 1 the authors show at 24 hpi. 

      We also showed data from 16hpi, which is a more relevant timepoint for assessing premature transition to EBs.  In contrast, the 24hpi is more important for assessing developmental effects of reduced c-di-AMP levels.

      DacA overexpression increases cdiAMP to ~4000 pg/ml 

      DacAmut overexpression reduces cdiAMP dramatically (to ~256 pg/ml).

      DacATM overexpression increases cdiAMP to ~4000 pg/ml.

      DacAmutTM overexpression does not seem to change cdiAMP (~1500 pg/ml).

      dacAKD decreases cdiAMP to ~300 pg/ml.

      dacAKDcom increased cdiAMP to ~8000 pg/ml. 

      DacA-ybbRop overexpression increased cdiAMP to ~500,000 pg/ml. 

      DacA-ybbRopmut ~300 pg/ml. 

      However in Figure 2 the data show that overexpression of DacA (cdiAMP ~4000 pg/ml) did not have a different phenotype than over expression of the mutant (cdiAMP ~256 pg/ml). HctA expression down, omcB expression down, euo not much change, replication down, and IFUs down. Additionally, Figure 3 shows no differences in anything measured although cdiAMP levels were again dramatically different. DacATM overexpression (~4000 pg/ml) and DacAmutTM (~1500). This makes it unclear what cdiAMP is doing to the developmental cycle. 

      As we have explained in the text and in response to reviewer comments on previous rounds of review, overexpressing the full-length WT or mutant DacA is detrimental to developmental cycle progression for reasons that have nothing to do with c-di-AMP levels (likely disrupting membrane function), since, as the reviewer notes, the WT DacA deltaTM strain had similar c-di-AMP levels but no negative effects on growth/development. If we had not presented the effects of overexpressing the individual isoforms, then a reviewer would surely have requested such, which is why we present these data even though they don’t seem to support our model.  This is an honest representation of our findings.  The reviewer seems intent on nitpicking a minor datapoint that seems to contradict the rest of the manuscript while ignoring or not carefully reading the rest of the manuscript.

      In Figure 4 the authors knockdown dacA (dacA-KD) and complement the knockdown (dacA-KDcom) 

      dacAKD decreases cdiAMP (~300) while DacA-KDcom increases cdiAMP much above wt (~8000). 

      KD decreased hctA and omcB at 24hpi. Complementation resulted in a moderate increase in hctA at a single time point but not at 24 hpi and had no effect on euo or omcB expression.

      By 24hpi, late gene transcripts are being maximally produced during a normal developmental cycle. It is unclear why the reviewer thinks that these transcripts should be elevated above this level in any of our strains that prematurely transition to EBs. There is no basis in the literature to support such an assumption. As we noted in the text, the dacA-KDcom strain phenocopied the dacAop OE strain, and we showed RNAseq data and EB production curves for the latter that support our conclusions of the effect of increased c-di-AMP levels on developmental progression.

      Importantly, complementation decreased the growth rate.

      Yes, since the c-di-AMP levels breached the “EB threshold” at 16hpi, it causes premature transition to EBs, which do not replicate their gDNA, at an earlier stage of the cycle when fewer organisms are present. Therefore, the gDNA levels are decreased at 24hpi, which is consistent with our model.

      Based on the proposed model, growth rate should increase as the chlamydia should all be RBs and replicating and not exiting the cell cycle to become EBs (not replicating).

      This is a spurious conclusion from the reviewer. As we clearly showed, the dacA-KDcom did not restore a wild-type phenotype and instead mimicked the dacAop OE strain. This was commented on in the text.

      Interestingly, reducing cdiAMP levels by overexpressing DacAmut (~256 pg/ml) did not have an effect on the cycle, but the reduction in cdiAMP by knockdown of dacA (~300 pg/ml) did have a moderate effect on the cycle.

      This is again a spurious conclusion from the reviewer. The dacAMut and dacA-KD strains are distinct. As noted in the text and above for DacA WT OE, overexpressing the DacAMut similarly disrupts organism morphology, which is different from dacA-KD. These strains should not be directly compared because of this. This point has been previously highlighted in the text (in Results and Discussion).

      For Figure 5 DacA-ybbRop was overexpressed and this increased cdiAMP dramatically ~500,000 pg/ml as compared to wt ~1500. This increased hctA only at an early timepoint and not at 24hpi and again had no effect on omcB or euo.

      As we explained in prior reviews, our RNAseq data more comprehensively assessed transcripts for the dacAop OE strain. These data show convincingly that late gene transcripts (not just hctA and omcB) are elevated earlier in the developmental cycle. Again, it is not clear why the reviewer should expect that late gene transcripts should be higher in these strains than they are during a normal developmental cycle. This is not part of our model and appears to be a bias that the reviewer has imposed that is not supported by the literature.

      Overexpression of the operon with the mutation DacA-ybbRopmut reduced cdiAMP to ~300 pg/ml and this showed a reduction in growth rate similar to dacAmut but a more dramatic decrease in IFUs. 

      As we described in the text, in earlier revisions, and above, the dacAMut OE strain has distinct effects unrelated to c-di-AMP levels and, therefore, should not be compared to other strains in terms of linking its c-di-AMP levels to its phenotype.

      Overall: 

      DacA overexpression increases cdiAMP to ~4000 pg/ml (decreased everything except euo) 

      DacAmut overexpression reduces cdiAMP dramatically (~256 pg/ml). (decreased everything except euo) 

      DacATM overexpression increases cdiAMP to ~4000 pg/ml (no changes noted) 

      DacAmutTM overexpression does not seem to change cdiAMP ~1500 pg/ml (no changes noted) 

      dacAKD decrease cdiAMP to ~300 pg/ml (decreased everything except euo) 

      dacAKDcom increased cdiAMP to ~8000 pg/ml (decreases growth rate, increase hctA a little but not omcB) 

      DacA-ybbRop overexpression increased cdiAMP to ~500,000 pg/ml (decreases growth rate, increase hctA a little but not omcB)

      DacA-ybbRopmut ~300 pg/ml (decreased everything except euo)

      Overall, the data show that increasing cdiAMP only has a phenotype if it is dramatically increased, no effect at 4000 pg/ml.

      Yes, this clearly shows there is a threshold - as we hypothesize!  However, these thresholds are more important at the 16hpi timepoint not 24hpi (which the reviewer is referencing) when assessing premature transition to EBs.  We specifically highlighted in our prior revision in Figure 1E this EB threshold to make this point clearer for the reader.  Once the threshold is breached, then the overall c-di-AMP levels become irrelevant as the RBs have begun their transition to EBs.

      Decreasing cdiAMP has a consistent effect, decreased growth rate, IFU, hctA expression and omcB expression. However, if their proposed model was correct and low levels of cdiAMP blocked EB conversion then more chlamydial cells would be RBs (dividing cells) and the growth rate should increase.

      The only effect should be normal gDNA levels, which is what we see in the dacA-KD.  Given the asynchronicity of a normal developmental cycle in which RBs continue to replicate as EBs are still forming, there is no basis to assume gDNA levels should increase under these conditions for the dacA-KD strain at 24hpi.

      Conversely, if cdiAMP levels were dramatically raised then all RBs would all convert and the growth rate would be very low.

      We agree. This is what is reflected by the dacAop OE and dacA-KDcom strains, with reduced gDNA levels at 24hpi since organisms have transitioned to EBs at an earlier time post-infection.

      When cdiAMP was raised to ~4000 pg/ml there was no effect on the growth rate.

      Yes, because it had not breached the EB threshold at 16hpi – consistent with our model!  The reviewer is confusing effects of elevated c-di-AMP at 24hpi when they should be assessed at the 16hpi timepoint for strains overproducing this molecule.

      However, an increase to ~8000 pg/ml resulted in a significant decrease but growth continued.

      If the reviewer is referring to the dacA-KDcom strain, then this is not accurate. gDNA levels were decreased in this strain at 24hpi when the c-di-AMP levels were increased compared to the WT (mCherry OE) control at 16hpi, indicating this strain had breached the “EB threshold” and initiated conversion to EBs at an earlier timepoint post-infection when fewer organisms were present.

      Increasing cdAMP to ~500,000 pg/ml had less of an impact on the growth rate.

      It is not clear what this conclusion is based on and what the reviewer is comparing to.  This is a subjective assessment not based on our data.

      Overall, the data does not cleanly support the proposed model.

      It is an unfortunate aspect of biology, particularly for obligate intracellular bacteria – a challenging experimental system on which to work, that the data are not always “clean”.  The overall effects of increased c-di-AMP levels on chlamydial developmental cycle progression we have documented support our model, and we think the reader, as always, should make their own assessment.


      The following is the authors’ response to the original reviews.

      Reviewer #2 (Public review): 

      This manuscript describes the role of the production of c-di-AMP on the chlamydial developmental cycle. The main findings remain the same. The authors show that overexpression of the dacA-ybbR operon results in increased production of c-di-AMP and early expression of transitionary and late genes. The authors also knocked down the expression of the dacA-ybbR operon and reported a modest reduction in the expression of both hctA and omcB. The authors conclude with a model suggesting the amount of c-di-AMP determines the fate of the RB, continued replication, or EB conversion. 

      Overall, this is a very intriguing study with important implications; however, the data are very preliminary, and the model is very rudimentary. The data support the observation that dramatically increased c-di-AMP has an impact on transitionary gene expression and late gene expression, suggesting dysregulation of the developmental cycle. This effect goes away with modest changes in c-di-AMP (deltaTM-DacA vs deltaTM-DacA (D164N)). However, the model's prediction that low levels of c-di-AMP delay EB production is not well supported by the data. If this prediction were true, then the growth rate would increase with c-di-AMP reduction, and the data do not show this.

      Thank you for the comments. We have apparently not adequately communicated our predictions and the model. We do not think and nor do we predict that low c-di-AMP levels should increase growth rate, and there is no basis in any of our data to support that. Rather, we predict that the inability to accumulate c-di-AMP should delay production of EBs, and this is what the data show. We have clarified this in the text (line 89 paragraph).

      The levels of c-di-AMP at the lower levels need to be better validated as it seems like only very high levels make a difference for dysregulated late gene expression. However, on the low end it's not clear what levels are needed to have an effect as only DacAopMut and DacAopKD show any effects on the cycle and the c-di-AMP levels are only different at 24 hours.

      Our hypothesis is that increasing concentrations of c-di-AMP within a given RB is a signal for it to undergo secondary differentiation to the EB, and the data support this as noted by the reviewers. Again, we stress that low levels of c-di-AMP are irrelevant to the model. We have revised Figure 1E to indicate the level of c-di-AMP in the control strain at the 24hpi timepoint that coincides with increased EB levels. We hope this will further clarify the goals of our study. That a given strain might be below the EB control is not relevant to the model beyond indicating that it has not reached the necessary threshold for triggering secondary differentiation.

      The authors responded to reviewers' critiques by adding the overexpression of DacA without the transmembrane region. This addition does not really help their case. They show that deltaTM-DacA and deltaTM-DacA (D164N) had the same effects on c-di-AMP levels but the figure shows no effects on the developmental cycle.

      As it relates directly to the reviewer’s point, the delta-TM strains did not show the same level of c-di-AMP. It may be that the reviewer misread the graph. The purpose of testing these strains was to show that the negative effects of overexpressing full-length WT DacA were due to its membrane localization. Both the FL and deltaTM-DacA (WT) overexpression had equivalent c-di-AMP levels even though the delta-TM overexpression looked like the mCherry-expressing strain based on the measured parameters. This shows that the c-di-AMP levels were irrelevant to the phenotypes observed when overexpressing these WT isoforms. For the mutant isoforms, the delta-TM looked like the mCherry-expressing control while the FL isoform was negatively impacted for reasons we described in the Discussion (e.g., dominant negative effect). In addition, at 16hpi, neither delta-TM strain had c-di-AMP levels that approached the 24h control as denoted in Figure 1E (dashed line) and in the text, which explains why these strains did not show increased late gene transcripts at an earlier timepoint like the dacAop and dacA-KDcom strains.

      Describing the significance of the findings: 

      The findings are important and point to very exciting new avenues to explore the important questions in chlamydial cell form development. The authors present a model that is not quantified and does not match the data well. 

      We respectfully disagree with this assessment as noted above in response to the reviewer’s critique. All of our data are quantified and support the hypothesis as stated.

      Describing the strength of evidence: 

      The evidence presented is incomplete. The authors do a nice job of showing that overexpression of the dacA-ybbR operon increases c-di-AMP and that knockdown or overexpression of the catalytically dead DacA protein decreases the c-di-AMP levels. However, the effects on the developmental cycle and how they fit the proposed model are less well supported. 

      Overall this is a very intriguing finding that will require more gene expression data, phenotypic characterization of cell forms, and better quantitative models to fully interpret these findings. 

      It is not clear what quantitative models the reviewer would prefer, but, ultimately, it is up to the reader to decide whether they agree or not with the model we present. The data are the data, and we have tried to present them as clearly as possible. We would emphasize that, with the number of strains we have analyzed, we have presented a huge amount of data for a study with an obligate intracellular bacterium. As a comparison, most publications on Chlamydia might use a handful of transformant strains, if any. Given the cost and time associated with performing such studies, it is prohibitive to attempt all the time points that one might like to do, and it is not clear to us that further studies will add to or alter the conclusions of the current manuscript.

      Reviewer #2 (Recommendations for the authors): 

      Minor critiques 

      The graphs have red and blue lines but the figure legends are red and black. It would be better if these matched. 

      Changed.

      For Figure 1C. The labels are not very helpful. It's not clear what is HeLa vs mCherry. I believe it is uninfected vs Chlamydia infected.

      Changed.

    1. eLife Assessment

      This valuable study revisits the effects of substitution model selection on phylogenetics by comparing reversible and non-reversible DNA substitution models. The authors provide solid evidence that 1) it can be beneficial to include non-time-reversible models in addition to general time-reversible models when inferring phylogenetic trees out of simulated viral genome sequence data sets, and that 2) non time-reversible models may fit the real data better than the reversible substitution models commonly used in phylogenetics, a finding consistent with previous work.

    1. Reviewer #1 (Public review):

      Summary:

      This is a contribution to the field of developmental bioelectricity. How do changes of resting potential at the cell membrane affect downstream processes? Zhou et al. reported in 2015 that phosphatidylserine and K-Ras cluster upon plasma membrane depolarization and that voltage-dependent ERK activation occurs when constitutively active K-RasG12V mutants are overexpressed. In this paper, the authors advance the knowledge of this phenomenon by showing that membrane depolarization up-regulates mitosis and that this process is dependent on voltage-dependent activation of ERK. ERK activity's voltage-dependence is derived from changes in the dynamics of phosphatidylserine in the plasma membrane and not by extracellular calcium dynamics. This paper reports an interesting and important finding. It is somewhat derivative of Zhou et al., 2015. (https://www.science.org/doi/full/10.1126/science.aaa5619). The main novelty seems to be that they find quantitatively different conclusions upon conducting similar experiments, albeit with a different cell line (U2OS) than those used by Zhou et al. Sasaki et al. do show that increased K+ levels increase proliferation, which Zhou et al. did not look at. The data presented in this paper are a useful contribution to a field often lacking such data.

      Strengths:

      Bioelectricity is an important field for areas of cell, developmental, and evolutionary biology, as well as for biomedicine. Confirmation of ERK as a transduction mechanism and a characterization of the molecular details involved in the control of cell proliferation are interesting and impactful.

      Weaknesses:

      The authors lean heavily on the assumption that the Nernst equation is an accurate predictor of membrane potential based on K+ level. This is a large oversimplification that undermines the authors' conclusions, most glaringly in Figure 2C. The authors' conclusions should be weakened to reflect that the activity of voltage-gated ion channels and homeostatic compensation are unaccounted for.
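
      To make the idealization concrete, here is a minimal sketch of the Nernst prediction for K+ at the extracellular concentrations discussed in these reviews (5, 15, 30, and 145 mM). The intracellular K+ concentration (140 mM) and the temperature (37 °C) are assumptions chosen for illustration; neither value comes from the manuscript, and the function name is hypothetical. The output is the equilibrium potential for K+ alone, not a measured membrane potential, which is the gap this comment points to.

```python
import math

# Physical constants (CODATA values).
R = 8.314462618      # gas constant, J / (mol K)
F = 96485.33212      # Faraday constant, C / mol


def nernst_potential_mV(k_out_mM, k_in_mM=140.0, temp_C=37.0, z=1):
    """Equilibrium potential for a single ion species, in millivolts.

    k_in_mM and temp_C are illustrative assumptions, not values
    taken from the manuscript under review.
    """
    temp_K = temp_C + 273.15
    return 1000.0 * (R * temp_K) / (z * F) * math.log(k_out_mM / k_in_mM)


# Extracellular K+ concentrations mentioned in the reviews (mM).
for k_out in (5, 15, 30, 145):
    print(f"[K+]o = {k_out:>3} mM -> E_K = {nernst_potential_mV(k_out):6.1f} mV")
```

      Any contribution from other permeant ions, voltage-gated channels, or homeostatic compensation would shift the real membrane potential away from these values, which is why a direct measurement is the stronger evidence.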

      Grammatical errors occur throughout the paper (e.g., line 99: "This kinetics" should be "These kinetics").

      Line 71: Zhou et al. use BHK, N2A, PSA-3 cells, this paper uses U2OS (osteosarcoma) cells. Could that explain the differences in bioelectric properties that they describe? In general, there should be more discussion of the choice of cell line. Why were U2OS cells chosen? What are the implications of the fact that these are cancer cells, and bone cancer cells in particular? Does this paper provide specific insights for bone cancers? And crucially, how applicable are findings from these cells to other contexts?

      Line 115: The authors use EGF to calibrate 'maximal' ERK stimulation. Is this level near saturation? Either way is fine, but it would be useful to clarify.

      Line 121: Starting line 121 the authors say "Of note, U2OS cells expressed wild-type K-Ras but not an active mutant of K-Ras, which means voltage dependent ERK activation occurs not only in tumor cells but also in normal cells". Given that U2OS cells are bone sarcoma cells, is it appropriate to refer to these as 'normal' cells in contrast to 'tumor' cells?

      Line 101: These normalizations seem reasonable, the conclusions sufficiently supported and the requisite assumptions clearly presented. Because the dish-to-dish and cell-to-cell variation may reflect biologically relevant phenomena it would be ideal if non-normalized data could be added in supplemental data where feasible.

      Figure 2C is listed as Figure 2D in the text

      There is no Figure 2F (Referenced in line 148)

    2. Reviewer #2 (Public review):

      Sasaki et al. use a combination of live-cell biosensors and patch-clamp electrophysiology to investigate the effect of membrane potential on the ERK MAPK signaling pathway, and probe associated effects on proliferation. This is an effect that has long been proposed, but a convincing demonstration has remained elusive, because it is difficult to perturb membrane potential without disturbing other aspects of cell physiology in complex ways. The time-resolved measurements here are a nice contribution to this question, and the perforated patch clamp experiments with an ERK biosensor are fantastic - they come closer to addressing the above difficulty of perturbing voltage than any prior work. It would have been difficult to obtain these observations with any other combination of tools.

      However, there are still some concerns as detailed in specific comments below:

      Specific comments:

      (1) All the observations of ERK activation, by both high extracellular K+ and voltage clamp, could be explained by cell volume increase (more discussion in subsequent comments). There is a substantial literature on ERK activation by hypotonic cell swelling (e.g. https://doi.org/10.1042/bj3090013, https://doi.org/10.1002/j.1460-2075.1996.tb00938.x, among others). Here are some possible observations that could demonstrate that ERK activation by volume change is distinct from the effects reported here:

      i) Does hypotonic shock activate ERK in U2OS cells?

      ii) Can hypotonic shock activate ERK even after PS depletion, whereas extracellular K+ cannot?

      iii) Does high extracellular K+ change cell volume in U2OS cells, measured via an accurate method such as fluorescence exclusion microscopy?

      iv) It would be helpful to check the osmolality of all the extracellular solutions, even though they were nominally targeted to be iso-osmotic.

      (2) Some more details about the experimental design and the results are needed from Figure 1:

      i) For how long are the cells serum-starved? From the Methods section, it seems like the G1 release in different K+ concentration is done without serum, is this correct? Is the prior thymidine treatment also performed in the absence of serum?

      ii) There is a question of whether depolarization constitutes a physiologically relevant mechanism to regulate proliferation, and how depolarization interacts with other extracellular signals that might be present in an in vivo context. Does depolarization only promote proliferation after extended serum starvation (in what is presumably a stressed cell state)? What fraction of total cells are observed to be mitotic (without normalization), and how does this compare to the proliferation of these cells growing in serum-supplemented media? Can K+ concentration tune proliferation rate even in serum-supplemented media?

      (3) In Figure 2, there are some possible concerns with the perfusion experiment:

      i) Is the buffer static in the period before perfusion with high K+, or is it perfused? This is not clear from the Methods. If it is static, how does the ERK activity change when perfused with 5 mM K+? In other words, how much of the response is due to flow/media exchange versus change in K+ concentration?

      ii) Why do there appear to be population-average decreases in ERK activity in the period before perfusion with high K+ (especially in contrast to Fig. 3)? The imaging period does not seem frequent enough for photobleaching to be significant.

      (4) Figure 3 contains important results on couplings between membrane potential and MAPK signaling. However, there are a few concerns:

      i) Does cell volume change upon voltage clamping? Previous authors have shown that depolarizing voltage clamp can cause cells to swell, at least in the whole-cell configuration:

      https://www.cell.com/biophysj/fulltext/S0006-3495(18)30441-7 . Could it be possible that the clamping protocol induces changes in ERK signaling due to changes in cell volume, and not by an independent mechanism?

      ii) Does the -80 mV clamp begin at time 0 minutes? If so, one might expect a transient decrease in sensor FRET ratio, depending on the original resting potential of the cells. Typical estimates for resting potential in HEK293 cells range from -40 mV to -15 mV, which would reach the range that induces an ERK response by depolarizing clamp in Fig. 3B. What are the resting potentials of the cells before they are clamped to -80 mV, and why do we not see this downward transient?

      (5) The activation of ERK by perforated voltage clamp and by high extracellular K+ are each convincing, but it is unclear whether they need to act purely through the same mechanism - while additional extracellular K+ does depolarize the cell, it could also be affecting function of voltage-independent transporters and cell volume regulatory mechanisms on the timescales studied. To more strongly show this, the following should be done with the HEK cells where there is already voltage clamp data:

      i) Measure resting potential using the perforated patch in zero-current configuration in the high K+ medium. Ideally this should be done in the time window after high K+ addition where ERK activation is observed (10-20 minutes) to minimize the possibility of drift due to changes in transporter and channel activity due to post-translational regulation.

      ii) Measure YFP/CFP ratio of the HEK cells in the high K+ medium (in contrast to the U2OS cells from Fig. 2 where there is no patch data).

      iii) The assertion that high K+ is equivalent to changes in Vmem for ERK signaling would be supported if the YFP/CFP change from K+ addition is comparable to that induced by voltage clamp to the same potential. This would be particularly convincing if the experiment could be done with each of the 15 mM, 30 mM, and 145 mM conditions.

      (6) Line 170: "ERK activity was reduced with a fast time course (within 1 minute) after repolarization to -80 mV." I don't see this in the data: in Fig. 3C, it looks like ERK remains elevated for > 10 min after the electrical stimulus has returned to -80 mV.

      Comments on revisions:

      The authors have done a good job addressing the comments on the previous submission.

    3. Reviewer #3 (Public review):

      Summary:

      This paper demonstrates that membrane depolarization induces a small increase in cell entry into mitosis. Based on previous work from another lab, the authors propose that ERK activation might be involved. They show convincingly using a combination of assays that ERK is activated by membrane depolarization. They show this is Ca2+ independent and is a result of activation of the whole K-Ras/ERK cascade which results from changed dynamics of phosphatidylserine in the plasma membrane that activates K-Ras. Although the activation of the Ras/ERK pathway by membrane depolarization is not new, linking it to an increase in cell proliferation is novel.

      Strengths

      A major strength of the study is the use of different techniques - live imaging with ERK reporters, as well as Western blotting to demonstrate ERK activation as well as different methods for inducing membrane depolarization. They also use a number of different cell lines. Via Western blotting the authors are also able to show that the whole MAPK cascade is activated.

      Weaknesses

      A weakness of the study is the data in Figure 1 showing that membrane depolarization results in an increase of cells entering mitosis. There are very few cells entering mitosis in their sample in any condition. This should be done with many more cells to increase the confidence in the results. The study also lacks a mechanistic link between ERK activation by membrane depolarization and increased cell proliferation.

      The authors did achieve their aims, with the caveat that the cell proliferation results could be strengthened. The results, for the most part, support the conclusions.

      This work suggests that alterations in membrane potential may have more physiological functions than action potential in the neural system as it has an effect on intracellular signalling and potentially cell proliferation.

      In the revised manuscript, the authors have now addressed the issues with Figure 1, and the data presented are much clearer. They did also attempt to pinpoint when in the cell cycle ERK is having its activity, but unfortunately, this was not conclusive.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This is a contribution to the field of developmental bioelectricity. How do changes of resting potential at the cell membrane affect downstream processes? Zhou et al. reported in 2015 that phosphatidylserine and K-Ras cluster upon plasma membrane depolarization and that voltage-dependent ERK activation occurs when constitutively active K-RasG12V mutants are overexpressed. In this paper, the authors advance the knowledge of this phenomenon by showing that membrane depolarization up-regulates mitosis and that this process is dependent on voltage-dependent activation of ERK. ERK activity's voltage-dependence is derived from changes in the dynamics of phosphatidylserine in the plasma membrane and not by extracellular calcium dynamics.

      Strengths:

      Bioelectricity is an important field for areas of cell, developmental, and evolutionary biology, as well as for biomedicine. Confirmation of ERK as a transduction mechanism, and a characterization of the molecular details involved in control of cell proliferation, is interesting and impactful.

      Weaknesses:

      The functional cell division data need to be stronger. They show that increasing K+ increases proliferation and argue that since a MEK inhibitor (U0126) reduces proliferation in K+ treated cells, K+ induces cell division via ERK. But I don't see statistics to show that the rescue is significant, and I don't see a key U0126-only control. If the U0126 alone reduces proliferation, the combined effect wouldn't prove much.

      We thank the reviewer for constructive feedback. We repeated the experiment including the U0126-only control (5K+U). We updated Fig.1, presenting the newly obtained data with statistical analysis.

      Also, unless I'm missing something, it looks like every sample in their control has exactly the same number of mitotic cells. I understand that they are normalizing to this column, but shouldn't they be normalizing to the mean, with the independent values scattering around 1? It doesn't seem like it can be paired replicates since there are 6 replicates in the control and 4 replicates in one of the conditions? 

      We apologize for the unclear description. As the reviewer pointed out, the experiments were not paired replicates, because only a limited number of conditions can be run in a single experiment. To overcome this problem, we always included a control condition (i.e., 5K), to which the other conditions in that experiment were normalized. This is why the 5K value is always 1 and why 5K has the largest sample size. Data include 100-900 mitotic cells within the 6-hr imaging window. We rewrote the figure legend (Fig1) and the main text, which we hope clarifies our experimental framework.
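
      The normalization described above can be sketched as follows. This is an illustration only: the mitotic-cell counts are made-up placeholders, the condition labels other than 5K are hypothetical, and the function is not the authors' analysis code. It shows why the control column is exactly 1 in every experiment and why the control has more replicates than any single treatment.

```python
from collections import defaultdict


def normalize_to_control(experiments, control="5K"):
    """Divide every condition's count by the control count of the SAME experiment.

    `experiments` is a list of dicts mapping condition label -> number of
    mitotic cells counted in that experiment. Each experiment must contain
    the control condition, but may contain any subset of the others.
    """
    normalized = defaultdict(list)
    for counts in experiments:
        baseline = counts[control]
        for condition, n_mitotic in counts.items():
            normalized[condition].append(n_mitotic / baseline)
    return dict(normalized)


# Placeholder counts (NOT data from the manuscript): three experiments,
# each with the 5K control plus a subset of the other conditions.
experiments = [
    {"5K": 200, "15K": 260},
    {"5K": 400, "145K": 520},
    {"5K": 100, "15K": 120, "145K": 150},
]

ratios = normalize_to_control(experiments)
for condition, values in ratios.items():
    print(condition, values)
```

      Because each ratio is taken within one experiment, the control carries no visible scatter; this is the trade-off behind the reviewer's question about normalizing to the mean of the controls instead.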

      Reviewer #2 (Public review):

      Sasaki et al. use a combination of live-cell biosensors and patch-clamp electrophysiology to investigate the effect of membrane potential on the ERK MAPK signaling pathway, and probe associated effects on proliferation. This is an effect that has long been proposed, but convincing demonstration has remained elusive, because it is difficult to perturb membrane potential without disturbing other aspects of cell physiology in complex ways. The time-resolved measurements here are a nice contribution to this question, and the perforated patch clamp experiments with an ERK biosensor are fantastic - they come closer to addressing the above difficulty of perturbing voltage than any prior work. It would have been difficult to obtain these observations with any other combination of tools.

      However, there are still some concerns as detailed in specific comments below:

      Specific comments:

      (1) All the observations of ERK activation, by both high extracellular K+ and voltage clamp, could be explained by cell volume increase (more discussion in subsequent comments). There is a substantial literature on ERK activation by hypotonic cell swelling (e.g. https://doi.org/10.1042/bj3090013, https://doi.org/10.1002/j.1460-2075.1996.tb00938.x, among others). Here are some possible observations that could demonstrate that ERK activation by volume change is distinct from the effects reported here:

      (i) Does hypotonic shock activate ERK in U2OS cells?

      (ii) Can hypotonic shock activate ERK even after PS depletion, whereas extracellular K+ cannot?

      (iii) Does high extracellular K+ change cell volume in U2OS cells, measured via an accurate method such as fluorescence exclusion microscopy?

      (iv) It would be helpful to check the osmolality of all the extracellular solutions, even though they were nominally targeted to be iso-osmotic.

      This is an important point. We conducted several experiments and provided explanations to rule out the possibility that ERK activation can be explained solely by cell volume change. We measured the osmolarity of all solutions used in this paper, which ranged from 296 to 305 mOsm/L. This information was added to the Materials and Methods section (line 387). Under our experimental conditions, ERK activation was not observed with hypotonic solutions of either 70% or 50% osmolarity (Fig.S2).

      It is therefore unlikely that the main cause of ERK activation upon high K<sup>+</sup> perfusion is due to cell volume change. We would like to pursue this issue further when we obtain capacity to measure accurate cell volume change in the future.

      (2) Some more details about the experimental design and the results are needed from Figure 1:

      (i) For how long are the cells serum-starved? From the Methods section, it seems like the G1 release in different K+ concentration is done without serum, is this correct? Is the prior thymidine treatment also performed in the absence of serum?

      Only the high K+ incubation phase was serum-free. We added the following sentence to the main text (line 63), and an experimental diagram was added as Fig. 1A: “Cells were incubated in the presence of serum except for the phase with altered K+ concentration.”

      (ii) There is a question of whether depolarization constitutes a physiologically relevant mechanism to regulate proliferation, and how depolarization interacts with other extracellular signals that might be present in an in vivo context.

      This is a very important point. However, the significance of membrane depolarization for cell proliferation in vivo is beyond the scope of this study. This important question will be addressed in the future.

      Does depolarization only promote proliferation after extended serum starvation (in what is presumably a stressed cell state)?

      Cells were cultured in the presence of serum prior to the high K+ incubation phase, as described above. We added a new figure (Fig. 1A).

      What fraction of total cells are observed to be mitotic (without normalization), and how does this compare to the proliferation of these cells growing in serum-supplemented media? Can K+ concentration tune proliferation rate even in serum-supplemented media?

      We included data recorded in serum-supplemented conditions (Fig.1), which showed a high mitotic rate. This is presumably due to the growth factors included in serum. There is no significant difference between 5K+FBS and 15K+FBS.

      (3) In Figure 2, there are some possible concerns with the perfusion experiment:

      (i) Is the buffer static in the period before perfusion with high K+, or is it perfused? This is not clear from the Methods. If it is static, how does the ERK activity change when perfused with 5 mM K+? In other words, how much of the response is due to flow/media exchange versus change in K+ concentration?

      The buffer was static prior to high K+ perfusion. We confirmed that perfusion alone does not activate ERK (Fig. S2). We added the following sentence to the main text: “We also confirmed that the effect of perfusion was negligible, as ERK activation was not observed upon start of the 5K+ perfusion” (line 150).

      (ii) Why do there appear to be population-average decreases in ERK activity in the period before perfusion with high K+ (especially in contrast to Fig. 3)? The imaging period does not seem frequent enough for photobleaching to be significant.

      Although we don’t have a clear answer to this question, we speculate that several aspects of the experimental setup may have contributed to the difference. The cell lines and imaging systems used in Fig. 2 and Fig. 3 were different. The expression level may differ between U2OS cells and HEK 293 cells: expression was transient in U2OS cells but stable in HEK 293 cells. This difference may lead to a different signal-to-noise ratio. The imaging system used in Fig. 2 is an epi-illumination microscope with excitation through a 439/24 bandpass filter and detection at 483/32 (CFP) and 542/27 (YFP), while the imaging system used in Fig. 3 is a confocal microscope with excitation by a 458 nm laser and detection at 475-525 (CFP) and LP530 (YFP). These optical setups may also contribute to the different population-average properties before stimulation.

      (4) Figure 3 contains important results on couplings between membrane potential and MAPK signaling. However, there are a few concerns:

      (i) Does cell volume change upon voltage clamping? Previous authors have shown that depolarizing voltage clamp can cause cells to swell, at least in the whole-cell configuration: https://www.cell.com/biophysj/fulltext/S0006-3495(18)30441-7 . Could it be possible that the clamping protocol induces changes in ERK signaling due to changes in cell volume, and not by an independent mechanism?

      We do not know whether cell volume is altered in the perforated-patch configuration. As discussed above, however, the effect of cell volume changes on ERK activity seemed to be negligible, because ERK activation was not observed with hypotonic solutions of either 70% or 50% osmolarity (Fig. S2).

      (ii) Does the -80 mV clamp begin at time 0 minutes? If so, one might expect a transient decrease in sensor FRET ratio, depending on the original resting potential of the cells. Typical estimates for resting potential in HEK293 cells range from -40 mV to -15 mV, which would reach the range that induces an ERK response by depolarizing clamp in Fig. 3B. What are the resting potentials of the cells before they are clamped to -80 mV, and why do we not see this downward transient?

      We set the potential to -80 mV immediately after giga-seal formation and waited for at least 5 minutes to allow pore formation by gramicidin. We started imaging only after the membrane potential was expected to have reached a steady state at -80 mV. We have now included this sentence in the ‘Material and Methods’ section (line 398).

      (5) The activation of ERK by perforated voltage clamp and by high extracellular K+ are each convincing, but it is unclear whether they need to act purely through the same mechanism - while additional extracellular K+ does depolarize the cell, it could also be affecting function of voltage-independent transporters and cell volume regulatory mechanisms on the timescales studied. To more strongly show this, the following should be done with the HEK cells where there is already voltage clamp data:

      (i) Measure resting potential using the perforated patch in zero-current configuration in the high K+ medium. Ideally this should be done in the time window after high K+ addition where ERK activation is observed (10-20 minutes) to minimize the possibility of drift due to changes in transporter and channel activity due to post-translational regulation.

      We measured membrane potential in the perforated-patch configuration and confirmed that there is negligible potential drift within 20 minutes of perfusion with 145 mM K+ (only a 1-5 mV change during perfusion).

      (ii) Measure YFP/CFP ratio of the HEK cells in the high K+ medium (in contrast to the U2OS cells from Fig. 2 where there is no patch data).

      YFP/CFP ratio data in HEK cells are shown in Fig. S1. As the signal-to-noise level is affected by the expression level of the probe, it is difficult to compare between cells with different expression levels. A higher YFP/CFP value in HEK cells compared to HeLa cells and A431 cells (Fig. S1) does not necessarily mean that HEK cells have higher ERK activity.

      (iii) The assertion that high K+ is equivalent to changes in Vmem for ERK signaling would be supported if the YFP/CFP change from K+ addition is comparable to that induced by voltage clamp to the same potential. This would be particularly convincing if the experiment could be done with each of the 15 mM, 30 mM, and 145 mM conditions.

      The experimental system using a fluorescent biosensor cannot measure absolute ERK activity; it can only measure the amount of change after a specific stimulus relative to the period before the stimulus. In electrophysiology experiments, the pre-stimulation membrane potential was clamped to -80 mV, whereas in the perfusion experiment, the membrane potential varied between individual cells (-35 to -15 mV). It is therefore difficult to compare the results of electrophysiology experiments with those of the perfusion system. Unlike for ion channels, it is currently not possible to plot absolute ERK activity against membrane potential. In the present study, we therefore discussed the change rather than the absolute value of ERK activity.
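
As an illustration of what "change relative to the pre-stimulus period" can mean in practice, the following is a minimal sketch, not the authors' analysis pipeline. It assumes per-frame YFP and CFP intensities for one cell and a known frame index at which the stimulus begins; the function name, the intensity values, and the choice of dividing by the mean pre-stimulus ratio are all hypothetical.

```python
from statistics import mean


def normalized_ratio(yfp, cfp, stim_index):
    """Divide each frame's YFP/CFP ratio by the mean pre-stimulus ratio.

    yfp, cfp   -- per-frame intensities for one cell (same length)
    stim_index -- index of the first frame after the stimulus starts
    """
    if len(yfp) != len(cfp):
        raise ValueError("yfp and cfp must have the same number of frames")
    if not 0 < stim_index <= len(yfp):
        raise ValueError("stim_index must leave at least one baseline frame")
    ratios = [y / c for y, c in zip(yfp, cfp)]
    baseline = mean(ratios[:stim_index])  # pre-stimulus reference level
    return [r / baseline for r in ratios]


# Hypothetical intensities: two baseline frames, then two post-stimulus frames.
yfp = [100.0, 100.0, 120.0, 130.0]
cfp = [50.0, 50.0, 50.0, 50.0]
trace = normalized_ratio(yfp, cfp, stim_index=2)
```

Because every trace is expressed relative to its own baseline, two cells with different probe expression levels can be compared in terms of change, but not in terms of absolute ERK activity, which is the limitation described above.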

      (6) Line 170: "ERK activity was reduced with a fast time course (within 1 minute) after repolarization to -80 mV." I don't see this in the data: in Fig. 3C, it looks like ERK remains elevated for > 10 min after the electrical stimulus has returned to -80 mV.

      Thank you for pointing out that our description was confusing. We changed the sentence to clarify the point we wanted to make. It now reads as follows. “ERK activity showed signs of reduction within 1 minute after repolarization to -80 mV.” (line 174)

      Reviewer #3 (Public review):

      Summary:

      This paper demonstrates that membrane depolarization induces a small increase in cell entry into mitosis. Based on previous work from another lab, the authors propose that ERK activation might be involved. They show convincingly using a combination of assays that ERK is activated by membrane depolarization. They show this is Ca2+ independent and is a result of activation of the whole K-Ras/ERK cascade which results from changed dynamics of phosphatidylserine in the plasma membrane that activates K-Ras. Although the activation of the Ras/ERK pathway by membrane depolarization is not new, linking it to an increase in cell proliferation is novel.

      Strengths

      A major strength of the study is the use of different techniques - live imaging with ERK reporters, as well as Western blotting to demonstrate ERK activation as well as different methods for inducing membrane depolarization. They also use a number of different cell lines. Via Western blotting the authors are also able to show that the whole MAPK cascade is activated.

      Weaknesses

      A weakness of the study is the data in Figure 1 showing that membrane depolarization results in an increase of cells entering mitosis. There are very few cells entering mitosis in their sample in any condition. This should be done with many more cells to increase confidence in the results.

      We apologize that the description was not clear. Because only a limited number of conditions can be run in a single experiment, we always included a control condition (i.e. 5K) and normalized the data against the control condition of the initial 1.5 hr. Data were from 100-900 mitotic cell counts within the 6 hr imaging time window. We rewrote the figure legend (Fig. 1) and the main text.

      The study also lacks a mechanistic link between ERK activation by membrane depolarization and increased cell proliferation.

      The present study focused on the link between membrane potential and the ERK activity; the mechanistic link between ERK activity and cell proliferation is beyond the scope of the present study. This important topic will be pursued further in subsequent studies.

      The authors did achieve their aims with the caveat that the cell proliferation results could be strengthened. The results for the most part support the conclusions.

      This work suggests that alterations in membrane potential may have more physiological functions than action potential in the neural system as it has an effect on intracellular signalling and potentially cell proliferation.

      Reviewer #1 (Recommendations for the authors):

      minor typo:

      "ERK activity has voltage-dependency with the physiological rang of membrane potential": "rang" should be "range"

      Corrected

      Reviewer #2 (Recommendations for the authors):

      Small points:

      Line 82: rang -> range

      Corrected

      Line 102: ". they were stimulated" -> ". The cells were stimulated"

      Corrected

      Figs. 2C, 2D show exactly the same data points and the same information. Please cut one of these figures.

      We deleted Fig. 2C, added its information to Fig. 2D, and made a new Fig. 2C.

      For all figs: Please indicate # of cells and # of independent dishes used in each experiment, and make clear whether individual data-points correspond to cells, dishes, or some other unit of measure.

      We added this information to the figure legends.

      Reviewer #3 (Recommendations for the authors):

      The authors should repeat the cell proliferation experiments with more cells to strengthen the data. They could also use alternative assays, such as phosphorylated histone H3 staining for cells in M phase, that might be easier to quantitate.

      We repeated the experiment, and Fig. 1 was replaced with the new Fig. 1.

      The authors should investigate how the upregulation of ERK is driving cells into mitosis. At what point in the cell cycle is activated ERK induced by membrane depolarization having the effect? Is it entry into mitosis or earlier in the cell cycle?

      The cells were incubated with a high K+ solution 8-9 hr after G1 release, which is supposed to correspond to G2. These data suggest that mitotic activity is stimulated when ERK is activated at G2. However, we lack conclusive data at present to show the consequence of ERK activation during G2. We therefore cannot pinpoint the stage of cell cycle where depolarization-activated ERK exerts its effect.

      The authors refer a lot to the work of Zhou et al 2015 throughout the paper. This is not necessary and is a bit distracting.

      We deleted several sentences from the manuscript.

    5. eLife Assessment

      This useful paper presents evidence from several experimental approaches that suggest that changes in membrane potential directly affect ERK signaling to regulate cell division. This result is relevant because it supports an ion channel-independent pathway by which changes in membrane voltage can affect cell growth. The reviewers point out that while some experimental results and interpretations are compelling, the strength of evidence is still incomplete and changes to the manuscript are needed to rule out other possible interpretations of the data.

    1. eLife Assessment

      This is a fundamental study providing molecular insight into how cross-talk between histone modifications regulates the histone H3K36 methyltransferase SETD2. The manuscript contains excellent quality data, and the conclusions are convincing and justified. This work will be of interest to many biochemists working in the field of chromatin biology and epigenetics.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Mack and colleagues investigate the role of posttranslational modifications, including lysine acetylation and ubiquitination, in methyltransferase activity of SETD2 and show that this enzyme functions as a tumor suppressor in a KRASG12C-driven lung adenocarcinoma. In contrast to H3K36me2-specific oncogenic methyltransferases, the deletion of SETD2, which is capable of H3K36 trimethylation, increases lethality in a KRASG12C-driven lung adenocarcinoma mouse tumor model. In vitro, the authors demonstrate that polyacetylation of histone H3, particularly of H3K27, H3K14 and H3K23, promotes the catalytic activity of SETD2, whereas ubiquitination of H2A and H2B has no effect.

      Strengths:

      Overall, this is a well-designed study that addresses an important biological question regarding the functioning of the essential chromatin component. The manuscript contains excellent quality data, and the conclusions are convincing and justified. This work will be of interest to many biochemists working in the field of chromatin biology and epigenetics.

      Comments on revisions:

      All previous comments are well addressed, and I enthusiastically support publication.

    3. Reviewer #2 (Public review):

      Summary:

      Human histone H3K36 methyltransferase Setd2 has been previously shown to be a tumor suppressor in lung and pancreatic cancer. In this manuscript by Mack et al., the authors first use a mouse KRASG12D-driven lung cancer model to confirm in vivo that Setd2 depletion exacerbates tumorigenesis. They then investigate the enzymatic regulation of the Setd2 SET domain in vitro, demonstrating that H2A, H3, or H4 acetylation stimulates Setd2-SET activity, with specific enhancement by mono-acetylation at H3K14ac or H3K27ac. In contrast, histone ubiquitination has no effect. The authors propose that H3K27ac may regulate Setd2-SET activity by facilitating its binding to nucleosomes. This work provides insight into how cross-talk between histone modifications regulates Setd2 function.

      Comments on revisions:

      (1) Regarding New Figure 2F lane 1, please reference PMID: 33972509 Fig 4D bottom. Setd2-SET is a well-known robust K36 trimethylase. Why, under the authors' conditions, do WT nucleosomes show a significant amount of K36me1 and K36me2 accumulation, whereas K36me3 is not as pronounced? As a comparison, the authors should also report the evidence for the efficiency of each chemical modification that generates K36 methylation mimic.

      (2) The bottom panel of Figure 2B does not match the top one; the number of repeats should be indicated in the figure legends.

      (3) In Figure 4E, the differences between Setd2-bound WT and acetylated nucleosomes are minimal, as judged by both the decreasing trend of unbound nucleosomes and the increasing trend of bound fractions. This experiment needs to be quantified based on multiple repeats.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) Labels should be added in the Figures and should be uniform across all Figures (some are distorted).

      We thank the Reviewer for pointing out this issue. As requested, labels have been edited to ensure they are legible and are consistent in font, size, and style.  

      Reviewer #2 (Public review):

      (1) As for Figure 2F, Setd2-SET activity on WT rNuc (H3) appears to be significantly lower compared to what is extensively reported in the literature. This is particularly puzzling given that Figure 2B suggests that using 3H-SAM, H3-nuc are much better substrates than K36me1, whereas in Figure 3F, rH3 is weaker than K36me1. It is recommended for the authors to perform additional experimental repeats and include a quantitative analysis to ensure the consistency and reliability of these findings.  

      We appreciate the Reviewer’s points. We respectfully suggest that these comments may reflect potential confusion around interpreting how different assays detect in vitro methylation, what data can and cannot be compared, and the nature of the different substrates used. 

      With respect to point 1 (Western signal significantly lower compared to extensive literature): To the best of our knowledge, it would be extremely challenging to make a quantitative argument comparing the strength of the Western signal in Figure 2F with results reported in the literature. Specifically, comparing our results with previous studies would require (1) all the studies to have used the exact same antibodies, as antibody signal intensities vary depending on the specific activity and selectivity of a particular antibody and even its lot number, (2) similar in vitro methylation reaction conditions, (3) the same type of recombinant nucleosomes, and so on. Further, given that these are Western blots, we do not understand how one could interpret an absolute activity level. In the figure, all we can conclude is that in in vitro methylation reactions, our recombinant SETD2 protein methylates rNucs to generate mono-, di-, and tri-methylation at K36 (using vetted antibodies (see Fig. 2e)). If there is a specific paper within the extensive literature that the Reviewer highlights, we could look more into the details of why the signals are different (our guess is that any difference would largely be due to the use of different antibodies). We add that it might be challenging to find a similar experiment performed in the literature; we are not aware of one.

      With respect to comparing Figure 2B and 2F: We do not understand how one can meaningfully compare incorporation of radiolabeled SAM to antibody-based detection on film using an antibody against specific methyl states. In particular, on the question of comparing rH3 vs H3K36me1 nucleosomes, we point out that when using recombinant nucleosomes installed with native modifications (e.g. H3K36me1), in which the entire population of the starting material is mono-methylated, the Western signal with an anti-H3K36me1 antibody will naturally be strong. In Fig. 2b, the assay is incorporation of radiolabeled methyl, which is added to the preexisting mono-methylated substrate. In other words, the results are entirely consistent if one understands how the methylation reactions were performed, how methylation was detected, and the nature of the reagents.

      (2) The additional bands observed in Figure 4B, which appear to be H4, should be accompanied by quantification of the intensity of the H3 bands to better assess K36me3 activity. Additionally, the quantification presented in Figure 4C for SAH does not seem accurate as it potentially includes non-specific methylation activity, likely from H4. This needs to be addressed for clarity and accuracy. 

      We thank the reviewer for this comment. The additional bands observed in Figure 4B represent degradation products of histone H3, not H4 methylation. This is commonly seen in in vitro reactions using recombinant nucleosomes, where partial proteolysis of H3 can occur under the assay conditions.  

      (3) In Figure 4E, the differences between bound and unbound substrates are not sufficiently pronounced. Given the modest differences observed, authors might want to consider repeating the assay with sufficient replicates to ensure the results are statistically robust.

      In Figure 4E, we observe a clear difference between the bound and unbound substrate. To aid interpretation, we have clarified in the figure where the bound complex migrates on the gel, while the unbound nucleosomes migrate at the bottom of the gel. The differences are indeed subtle, which we highlight in the text.  

      (4) Regarding labeling, there are multiple issues that need correction: In the depiction of Epicypher's dNuc, it is crucial to clearly mark H2B as the upper band, rather than ambiguously labeling H2A/H2B together when two distinct bands are evident. In Figure 3B and D, the histones appear to be mislabeled, and the band corresponding to H4 has been cut off. It would be beneficial to refer to Figure 3E for correct labeling to maintain consistency and accuracy across figures. 

      Thank you for pointing this out. To avoid any confusion, we have delineated the H2B and H2A markers and indicate the band corresponding to H4.

      (5) There are issues with the image quality in some blots; for instance, Figure 2EF and Figure 2D exhibit excessive contrast and pixelation, respectively. These issues could potentially obscure or misrepresent the data, and thus, adjustments in image processing are recommended to provide clearer, more accurate representations. 

      Contrast adjustments were applied uniformly across each entire image and were not used to modify any specific region of the blot. We have corrected the issue of increased pixelation in Figure 2D. 

      (6) The authors are recommended to provide detailed descriptions of the materials used, including catalog numbers and specific products, to allow for reproducibility and verification of experimental conditions. 

      We have added the missing product specifications and catalog numbers to ensure clarity and reproducibility of the experiments.

      (7) The identification of Setd2 as a tumor suppressor in KrasG12C-driven LUAD is a significant finding. However, the discussion on how this discovery could inspire future therapeutic approaches needs to be more balanced. The current discussion (Page 10) around the potential use of inhibitors is somewhat confusing and could benefit from a clearer explanation of how Setd2's role could be targeted therapeutically. It would be beneficial for the authors to explore both current and potential future strategies in a more structured manner, perhaps by delineating between direct inhibitors, pathway modulators, and other therapeutic modalities. 

      SETD2 is a tumor suppressor in lung cancer (as we show here and many others have clearly established in the literature) and thus we would recommend avoiding a SETD2 inhibitor to treat solid tumors, as it could have a very much unwanted effect. Our discussion addresses a different point regarding the relative importance of the enzymatic activity versus other, nonenzymatic functions of SETD2. We believe that a detailed exploration of the therapeutic potential of inhibiting SETD2 would be better suited to a review or a more therapy-focused manuscript.

    1. eLife Assessment

      This study identifies novel approaches to improving transgene expression in the injured mammalian myocardium through a combination of a tissue regeneration enhancer element and engineered AAVs - specifically, a liver-detargeting capsid, AAV.cc84, and an in vivo library screen-selected AAV-IR41. The evidence is convincing, and the AAV vectors are of fundamental value to the field of cardiac gene therapy. Future research exploring how to combine the features of AAV.cc84 and AAV-IR41 could yield an even more promising vector for therapeutic use.

    2. Reviewer #1 (Public review):

      In this manuscript, Wolfson and co-authors demonstrate a combination of an injury-specific enhancer and engineered AAV that enhances transgene expression in injured myocardium. The authors characterize spatiotemporal dynamics of TREE-directed AAV expression in the injured heart using a non-invasive longitudinal monitoring system. They show that transgene expression is drastically increased 3 days post-injury, driven by 2ankrd1a. They reported a liver-detargeted capsid, AAV.cc84, with decreased viral entry into the liver while maintaining TREE transgene specificity. They further identified the IR41 serotype with enhanced transgene expression in injured myocardium from AAV library screening. This is an interesting study that optimizes the potential application of TREE delivery for cardiac repair.

      Comments on revisions:

      The authors are responsive and have addressed my concerns.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript by Wolfson et al., various adeno-associated viruses (AAVs) were delivered to mice to assess the cardiac-specificity, injury border-zone cardiomyocyte transduction rate, and temporal dynamics, with the goal of finding better AAVs for gene therapies targeting the heart. The authors delivered tissue regeneration enhancer elements (TREEs) controlling luciferase expression and used IVIS imaging to examine transduction in the heart and other organs. They found that luciferase expression increased in the first week after injury when using AAV9-TREE-Hsp68 promoter, waning to baseline levels by 7 weeks. However, AAV9 vectors transduced the liver, which was significantly reduced by using an AAV.cc84 liver de-targeting capsid. The authors then performed in vivo screening of AAV9 capsids and found AAV-IR41 to preferentially transduce injured myocardium when compared to AAV9. Finally, the authors combined TREEs with AAV-IR41 to show improved luciferase expression compared to AAV9-TREE at 7, 14 and 21 days after injury.

      Overall, this manuscript provides insights into TREE expression dynamics when paired with various heart-targeting capsids, which can be useful for researchers studying ischemic injury of murine hearts. While the authors have shown the success of using AAV9-TREEs in porcine hearts, it is unknown whether the expression dynamics would be similar in pigs or humans, as mentioned in the limitations.

      Strengths:

      Important contribution to the AAV gene therapy literature.

      Comments on revised version:

      My concerns have been adequately addressed.

    4. Reviewer #3 (Public review):

      Summary:

      The tissue regeneration enhancer elements (TREEs) identified in zebrafish have been shown to drive injury-activated temporal-spatial gene expression in mice and large animals. These findings increase the translational potential of findings in zebrafish to mammals. In this manuscript, the authors tested TREEs in combination with different adeno-associated viral (AAV) vectors using in vivo luciferase bioluminescent imaging that allows for longitudinal tracking. The TREE-driven luciferase delivered by a liver de-targeted AAV.cc84 decreased off-target transduction in liver. They further screened an AAV library to identify capsid variants that display enhanced transduction for infarcted myocardium post ischemia reperfusion and myocardial infarction. A new capsid variant, AAV.IR41, was found to show increased transduction post I/R and MI.

      Strengths:

      The authors injected AAV-cargo several days after ischemia/reperfusion (I/R) injury as a clinically relevant approach. Overall, this study is significant in that it identifies new AAV vectors that can be used to deliver promising genes as potential new gene therapies in the future. The manuscript is well-written and the data are also of high quality.

      Weaknesses:

      The authors have addressed my previous concerns.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      In this manuscript, Wolfson and co-authors demonstrate a combination of an injury-specific enhancer and engineered AAV that enhances transgene expression in injured myocardium. The authors characterize spatiotemporal dynamics of TREE-directed AAV expression in the injured heart using a non-invasive longitudinal monitoring system. They show that transgene expression is drastically increased 3 days post-injury, driven by 2ankrd1a. They reported a liver-detargeted capsid, AAV.cc84, with decreased viral entry into the liver while maintaining TREE transgene specificity. They further identified the IR41 serotype with enhanced transgene expression in injured myocardium from AAV library screening. This is an interesting study that optimizes the potential application of TREE delivery for cardiac repair. However, several concerns were raised prior to publication:

      Major Concerns:

      (1) In Figure 1, the authors demonstrated that 2ankrd1aEN is not responsive to sham injury after AAV delivery, but Figure 3 shows a strong response to sham when AAV is delivered after injury. The authors do not provide an explanation for this observation.

      This discrepancy is due to the timing of AAV delivery. In Figure 1, AAV was delivered 60 days prior to IVIS imaging and cardiac injury, allowing time for the baseline level of AAV transgene expression to reach a plateau. From this baseline level, we were able to measure fold change in luminescence signal before and after cardiac injury. In Figure 3, AAV was delivered 4 days after cardiac injury. Luminescence in the heart was measured 3 days later (day 7), when the baseline of AAV transgene expression is still building. The data from Figure 1C-D inform us that the 2ankrd1aEN response to cardiac injury peaks within the first week and returns to baseline levels after 5-7 weeks. In Figure 3E, we show that 2ankrd1aEN provides a baseline level of expression that is present in sham hearts and reaches its plateau after 6 weeks. In contrast, I/R injured hearts show enhanced expression in the first 3-4 weeks, corresponding with the dynamics of 2ankrd1aEN’s response to injury observed in Figure 1C. We have now included a phrase in the revised manuscript on p. 7, paragraph 1 to clarify.
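
The fold-change readout described in this response can be sketched as follows. This is only an illustration under stated assumptions, not the authors' analysis: the function name, the units, and all luminescence values are hypothetical, and the baseline is assumed to be a single pre-injury measurement taken after expression has plateaued.

```python
def fold_change(signal_by_day, baseline_signal):
    """Express each post-injury luminescence value relative to the pre-injury baseline.

    signal_by_day   -- mapping of day post-injury to luminescence (arbitrary units)
    baseline_signal -- plateau luminescence measured before injury (same units)
    """
    if baseline_signal <= 0:
        raise ValueError("baseline_signal must be positive")
    return {day: signal / baseline_signal for day, signal in signal_by_day.items()}


# Hypothetical values: signal rises early after injury and returns to baseline by week 7.
baseline = 2.0e5
post_injury = {3: 8.0e5, 7: 6.0e5, 49: 2.0e5}
changes = fold_change(post_injury, baseline)
```

With this kind of readout, a stable plateau gives a fold change near 1 in the absence of injury. If the vector is instead delivered after injury, as in Figure 3, the baseline is still rising during the measurement window, so a sham animal can show an apparent increase that reflects the build-up of expression and not an injury response.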

      (2) In Figure 4, a higher GFP signal is observed in all areas of the heart of the IR41-treated mouse compared to AAV9. The authors should compare GFP expression between AAV9 and IR41 in uninjured hearts and provide insights into enhanced cardiac tropism to confirm that IR41 is MI injury enriched, not Sham as well.

      We sought to address this question with the experiments presented in Figure 5. We treated sham mice with AAV9 and IR41 containing 2ankrd1aEN. Figure 5D showed IR41 delivered more vector genomes to the sham heart on average, though not with a p-value less than 0.05 compared with AAV9. In Supplemental Figure 5B, IR41 also provided higher luminescence at day 7 post-sham but was comparable at day 14 and day 21. These data suggest IR41 might increase heart tropism in healthy hearts, but IR41’s effect is most dramatic when delivered to injured hearts, where cardiac vector genomes are highest (Figure 5D). We have now included a sentence in the revised manuscript on p. 8, paragraph 2 to clarify.

      (3) The authors should clarify which model is being used between myocardial infarction (MI) and Ischemia-reperfusion (IR) throughout the figures, as the experimental schemes and figure legends did not match with each other (MI or IR in Figure 1A, 1D, 3A, and 3E). Both models cause different types of injuries. The authors should explain the difference in TREE expression in both models.

      We have revised the figures to specify the model, where I/R or MI is used.

      (4) In Figure 2, the authors use REN instead of 2ankrd1aEN to demonstrate liver-detargeting using AAV cc.84. Is there a specific reason?

      Our data in Figure 1 informed us that off-target liver expression is more specifically an issue for REN compared to 2ankrd1aEN. Baseline levels of luminescence in the heart could not be as clearly marked due to off-target expression in the liver, which was showcased in Figure 2B with AAV9 delivery to sham mice. As discussed above, 2ankrd1aEN provided stronger baseline levels of expression of the heart which could be more clearly marked in IVIS images for tracking fold changes over time. For these reasons, we sought to explore how incorporation of the AAV.cc84 capsid could be utilized to minimize off-target liver expression. We have now included a sentence in the revised manuscript on p. 5, paragraph 3 to clarify.

      Reviewer #2 (Public review):

      In this manuscript by Wolfson et al., various adeno-associated viruses (AAVs) were delivered to mice to assess the cardiac-specificity, injury border-zone cardiomyocyte transduction rate, and temporal dynamics, with the goal of finding better AAVs for gene therapies targeting the heart. The authors delivered tissue regeneration enhancer elements (TREEs) controlling luciferase expression and used IVIS imaging to examine transduction in the heart and other organs. They found that luciferase expression increased in the first week after injury when using AAV9-TREE-Hsp68 promoter, waning to baseline levels by 7 weeks. However, AAV9 vectors transduced the liver, which was significantly reduced by using an AAV.cc84 liver de-targeting capsid. The authors then performed in vivo screening of AAV9 capsids and found AAV-IR41 to preferentially transduce injured myocardium when compared to AAV9. Finally, the authors combined TREEs with AAV-IR41 to show improved luciferase expression compared to AAV9-TREE at 7, 14, and 21 days after injury.

      Overall, this manuscript provides insights into TREE expression dynamics when paired with various heart-targeting capsids, which can be useful for researchers studying ischemic injury of murine hearts. While the authors have shown the success of using AAV9-TREEs in porcine hearts, it is unknown whether the expression dynamics would be similar in pigs or humans, as mentioned in the limitations.

      The following questions and concerns can be addressed to improve the manuscript:

      (1) From the IVIS data, it seems that the Hsp68 promoter might not be "normally silent in mouse tissues," specifically in the liver (Figure S1B). Are there any other promoters that can be combined with TREEs to induce cardiac-injury specific expression while minimizing liver expression? This could simplify capsid design to focus on delivery to injured areas.

      Indeed we found the Hsp68 promoter does provide low levels of baseline expression, especially in the liver of mice. The Hsp68 promoter was initially chosen due to its permissive nature allowing for assessment of expression directed by TREEs. Many or most groups use the Hsp68 promoter for enhancer tests in mice, but we agree that other permissive promoters might have lower baseline levels of expression and might have the benefit of smaller size. We have not rigorously tested other permissive promoters in our experiments.

      (2) Why is it that AAV9-TREE-Hsp68-Luc wanes in expression (Figure 1C and 1D), whereas AAV.cc84-TREE-Hsp68-Luc expresses stably for over 2 months (3E)? This has important implications for the goal of transience in gene delivery.

      Please see our response to reviewer 1’s comment #1 above.

      (3) AAV-IR41 was found to transduce cardiomyocytes in the injured zone. However, this capsid also shows a very strong off-target liver expression. From a capsid design perspective, is it possible to combine AAV-cc84 and AAV-IR41?

      This approach is in theory possible as these epitopes are structurally distinct. However, since the mechanism (receptor usage) is currently unknown, it would not be possible to predict whether the properties are mutually exclusive. Further, we would need to ensure that combining modifications does not impact vector yield. We can explore such features with next generation candidates as we continue to improve the platform. We have now included a sentence in the revised manuscript on p. 9, paragraph 3, mentioning the possibility of combining the two capsid mutations.

      (4) It would be helpful to see immunostaining for the various time points in Figure 5. Is it possible to use an anti-luciferase antibody (or AAV-TREE-Hsp68-eGFP) to compare the two TREE capsids?

      We were not able to do immunostaining of luciferase expression, because the biopsied hearts were used to quantify vector genomes via qPCR. We have previously reported results of immunostaining of EGFP expression directed by 2ankrd1aEN in I/R-injured mouse hearts (Yan et al., 2023), which we expect to match the expression seen in these experiments.

      Reviewer #3 (Public review):

      Summary:

      The tissue regeneration enhancer elements (TREEs) identified in zebrafish have been shown to drive injury-activated temporal-spatial gene expression in mice and large animals. These findings increase the translational potential of findings in zebrafish to mammals. In this manuscript, the authors tested TREEs in combination with different adeno-associated viral (AAV) vectors using in vivo luciferase bioluminescent imaging that allows for longitudinal tracking. The TREE-driven luciferase delivered by a liver de-targeted AAV.cc84 decreased off-target transduction in the liver. They further screened an AAV library to identify capsid variants that display enhanced transduction for myocardium post-myocardial infarction. A new capsid variant, AAV.IR41, was found to show increased transduction at the infarct border zones.

      Strengths:

      The authors injected AAV-cargo several days after ischemia/reperfusion (I/R) injury as a clinically relevant approach. Overall, this study is significant in that it identifies new AAV vectors for potential new gene therapies in the future. The manuscript is well-written, and their data are also of high quality.

      Weaknesses:

      The authors might be using MI (myocardial infarction) and I/R injury interchangeably in their text and labels. For instance, "We systemically transduced mice at 4 days after permanent left coronary artery ligation with either AAV9 or IR41 harboring a 2ankrd1aEN-Hsp68::fLuc transgene. IVIS imaging revealed higher expression levels in animals transduced with IR41 compared to AAV9, in both sham and I/R groups (Fig. 5A)". They should keep it consistent. There is also no description for the MI model.

      We have adjusted figure labels and main text to ensure the injury model is described correctly.

      We have also addressed all additional Recommendations for the authors, which requested minor modifications to figures like error bars and image annotation.

    1. eLife Assessment

      This important study provides a conceptual advance in our understanding of how membrane geometry modulates the balance between specific and non-specific molecular interactions, reversing multiphase morphologies in postsynaptic protein assemblies. Using a mesoscale simulation framework grounded in experimental binding affinities, the authors successfully recapitulate key experimental observations in both solution and membrane-associated systems, providing novel mechanistic insight into how spatial constraints regulate postsynaptic condensate organization. The conclusions are supported by solid strength of evidence and the findings are of broad significance for both computational and experimental biologists.

    2. Reviewer #2 (Public review):

      This is a timely and insightful study aiming to explore the general physical principles for the sub-compartmentalization--or lack thereof--in the phase separation processes underlying the assembly of postsynaptic densities (PSDs), especially the markedly different organizations in three-dimensional (3D) droplets on one hand and the two-dimensional (2D) condensates associated with a cellular membrane on the other. Simulation of a highly simplified model (one bead per protein domain) is apparently carefully executed. Based on a thorough consideration of various control cases, the main conclusion regarding the trade-off between repulsive excluded volume interactions and attractive interactions among protein domains in determining the structures of 3D vs 2D model PSD condensates is quite convincing. The novel results in this manuscript should be published.

      Comment on the revised manuscript:

      The authors have adequately addressed all my previous concerns. The manuscript is now much improved, ready for publication as a version of record.

    3. Reviewer #3 (Public review):

      Summary:

      In this work, Yamada, Brandani and Takada have developed a mesoscopic model of the interacting proteins in the postsynaptic density. They have performed simulations, based on this model and using the software ReaDDy, to study the phase separation in this system in 2D (on the membrane) and 3D (in the bulk). They have carefully investigated the reasons behind different morphologies observed in each case, and have looked at differences in valency, specific/non-specific interactions and interfacial tension.

      Strengths:

      The simulation model is developed very carefully, with strong reliance on binding valency and geometry, experimentally measured affinities, and physical considerations like the hydrodynamic radii. The presented analyses are also thorough, and great effort has been put into investigating different scenarios that might explain the observed effects.

      Weaknesses:

      The biggest weakness of the study, in my opinion, has been a lack of more in-depth and quantitative physical insights about phase separation theories. In the revised version, the authors have added text to point the interested reader to the respective theories, and have included a qualitative assessment of their findings in the light of said theories. This better positions their discussion. I still believe the role of entropic effects needs more attention, which can be the subject of future studies.

      The authors have revised their Introduction and added text to the Discussion, to enrich their view on the attractive and repulsive forces as well as mixing entropy. This version better covers the physics of phase separation.

      I appreciate the added discussion about the different diffusive behavior in the membrane in contrast to the bulk (i.e. the Saffman-Delbrück model). This paves the way for future studies, including realistic kinetics of the studied system.
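
      For orientation, the contrast between bulk and membrane diffusion mentioned above can be sketched numerically. The snippet below is a minimal illustration, not the authors' implementation: it evaluates the Stokes-Einstein relation for a sphere in bulk fluid and the Saffman-Delbrück expression for a cylindrical inclusion in a membrane. All parameter values (particle radius, viscosities, membrane thickness, temperature) are illustrative assumptions and are not taken from the manuscript.

```python
import math

K_B = 1.380649e-23          # Boltzmann constant, J/K
EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def stokes_einstein(radius_m, viscosity_pa_s, temperature_k=310.0):
    """Diffusion coefficient (m^2/s) of a sphere in a bulk fluid."""
    return K_B * temperature_k / (6.0 * math.pi * viscosity_pa_s * radius_m)


def saffman_delbruck(radius_m, membrane_viscosity_pa_s, membrane_thickness_m,
                     solvent_viscosity_pa_s, temperature_k=310.0):
    """Diffusion coefficient (m^2/s) of a cylindrical membrane inclusion.

    Valid only when radius << membrane_viscosity * thickness / solvent_viscosity.
    """
    log_term = math.log(
        membrane_viscosity_pa_s * membrane_thickness_m
        / (solvent_viscosity_pa_s * radius_m)
    )
    prefactor = K_B * temperature_k / (
        4.0 * math.pi * membrane_viscosity_pa_s * membrane_thickness_m
    )
    return prefactor * (log_term - EULER_GAMMA)


# Illustrative (assumed) values: 3 nm particle, 5 nm thick membrane,
# membrane viscosity 0.1 Pa*s, aqueous viscosity 1e-3 Pa*s.
d_bulk = stokes_einstein(3e-9, 1e-3)
d_membrane = saffman_delbruck(3e-9, 0.1, 5e-9, 1e-3)
print(f"bulk: {d_bulk:.3e} m^2/s, membrane: {d_membrane:.3e} m^2/s")
```

      With these assumed values the membrane-embedded particle diffuses more slowly than the bulk one; the size of the gap depends strongly on the membrane viscosity chosen. The Saffman-Delbrück result depends only logarithmically on particle radius, whereas the Stokes-Einstein result scales as the inverse of the radius.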

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This study uses mesoscale simulations to investigate how membrane geometry regulates the multiphase organization of postsynaptic condensates. It reveals that dimensionality shifts the balance between specific and non-specific interactions, thereby reversing domain morphology observed in vitro versus in vivo.

      Strengths:

      The model is grounded in experimental binding affinities, reproduces key experimental observations in 3D and 2D contexts, and offers mechanistic insight into how geometry and molecular features drive phase behavior.

      Weaknesses:

      The model omits other synaptic components that may influence domain organization and does not extensively explore parameter sensitivity or broader physiological variability.

      We thank the reviewer for his/her time and effort on our manuscript. We agree that the contribution of other synaptic components should be addressed. We have included a discussion of the effects of environmental factors such as protein and ion concentrations, as well as other omitted postsynaptic components (SAPAP, Shank, and Homer), on phase morphology. In the middle of the 2nd paragraph of the Discussion, we added:

      “While these in vivo results contain additional scaffold and cytoskeletal elements omitted in our model, such as SAPAP, Shank and Homer, nearly all proteins in the middle and lower layers of the PSD associate directly or indirectly with PSD-95 in the upper PSD layer. Consequently, it is probable that other scaffold proteins contribute to the mobility of AMPAR-containing and NMDAR-containing nanodomains indistinguishably. They may increase the stability of the AMPAR and NMDAR clusters but are unlikely to have a distinct effect to reverse the phase-separation phenomenon.”

      Also, as the reviewer pointed out, we agree that physiological factors such as ion concentration may influence the phase behavior. However, conditions such as ion concentration are implicitly represented by the specific and non-specific interactions in this model, which makes it difficult to estimate the effect of each physiological condition individually. We added the potential variability of physiological conditions to the Discussion as a limitation of this model. To investigate parameter sensitivity in more detail, we performed additional MD simulations with weakened membrane constraints to probe the behavior between 3D and 2D. We added:

      “First, our results did not provide direct insights to physiological conditions, such as ion concentrations. Since such factors are implicitly implemented in our model, it is difficult to estimate these effects individually. This suggests the need for future implementation of environmental factors and validation under a broader range of in vivo-like settings.”

      Reviewer #2 (Public review):

      This is a timely and insightful study aiming to explore the general physical principles for the sub-compartmentalization--or lack thereof--in the phase separation processes underlying the assembly of postsynaptic densities (PSDs), especially the markedly different organizations in three-dimensional (3D) droplets on one hand and the two-dimensional (2D) condensates associated with a cellular membrane on the other. Simulation of a highly simplified model (one bead per protein domain) is carefully executed. Based on a thorough consideration of various control cases, the main conclusion regarding the trade-off between repulsive excluded volume interactions and attractive interactions among protein domains in determining the structures of 3D vs 2D model PSD condensates is quite convincing. The results in this manuscript are novel; however, as it stands, there is substantial room for improvement in the presentation of the background and the findings of this work. In particular,

      (i) conceptual connections with prior works should be better discussed 

      (ii) essential details of the model should be clarified, and

      (iii) the generality and limitations of the authors' approach should be better delineated.

      We appreciate the reviewer for his/her time and effort on our manuscript and for encouraging comments and helpful suggestions. We answered every technical comment the reviewer mentioned below.

      Specifically, the following items should be addressed (with the additional references mentioned below cited and discussed):

      (1) Excluded volume effects are referred to throughout the text by various terms and descriptions such as "repulsive force according to the volume" (e.g., in the Introduction), "nonspecific volume interaction", and "volume effects" in this manuscript. This is somewhat curious and not conducive to clarity, because these terms have alternate meanings or connotations (e.g., in biomolecular modeling, repulsive interactions usually refer to those with longer spatial ranges, such as that between like charges). It will be much clearer if the authors simply refer to excluded volume interactions as excluded volume interactions (or effects).

      Thank you for this comment. We have substituted the words “excluded volume interactions” for words of similar meaning. However, we have left the expression of “non-specific interactions” as they are referring to explicit interactions that are given as force fields in the model, rather than in the general meaning of excluded volume effect.

      (2) In as much as the impact of excluded volume effects on subcompartmentalization of condensates ("multiple phases" in the authors' terminology), it has been demonstrated by both coarse-grained molecular dynamics and field-theoretic simulations that excluded volume is conducive to demixing of molecular species in condensates [Pal et al., Phys Rev E 103:042406 (2021); see especially Figures 4-5 of this reference]. This prior work bears directly on the authors' observation. Its relationship with the present work should be discussed.  

      We appreciate the reviewer’s insightful comment. We have now included a more detailed discussion on excluded volume effect in the revised manuscript, which provides important context for our findings. Furthermore, we have cited the references to support and enrich the discussion, as recommended.

      (3)  In the present model setup, activation of the CaMKII kinase affects only its binding to GluN2Bc. This approach is reasonable and leads to model predictions that are essentially consistent with the experiment. More broadly, however, do the authors expect activation of the CaMKII kinase to lead to phosphorylation of some of the molecular species involved with PSDs? This may be of interest since biomolecular condensates are known to be modulated by phosphorylation [Kim et al., Science 365:825-829 (2019); Lin et al, eLife 13:RP100284 (2025)].  

      We agree that the effect of phosphorylation on phase separation is an important and interesting aspect to consider. Some experimental results have shown that activation of CaMKII can lead to phosphorylation of various proteins and make the PSD condensate more stable by altering their interactions. We included the sentence below in the limitations:

      “In this context, we also do not explicitly account for downstream phosphorylation events. Although such proteins are not included in the current components, they will regulate PSD-95, affecting its binding valency, or diffusion coefficient. This is a subject worthy of future research.”

      (4) The forcefield for confinement of AMPAR/TARP and NMDAR/GluN2Bc to 2D should be specified in the main text. Have the authors explored the sensitivity of their 2D findings on the strength of this confinement?

      We thank the reviewer for the helpful recommendation. We have revised the manuscript to describe the membrane-mimicking potential in the main text. We also think that exploring how the shape of the 3D/2D condensate phase depends on the strength of the confinement is a very interesting point. We have additionally performed MD simulations with smaller/larger membrane constraints and included the results in the supporting information as Figure S5. The following text was added:

      “We further attempted to mimic intermediate conditions between 3D and 2D systems in two different manners. First, we applied a weaker membrane constraint in 2D system. Even when the strength of membrane constraints is reduced by a factor of 1000, NMDARs are located on the inner side when the CaMKII was active, as well as the result in 2D system (Fig.S5ABC). Second, to weaken further the effect of membrane constraints, we artificially altered the membrane thickness from 5 nm to 50 nm, in addition to reducing the membrane constraints by 1000. As a result, NMDAR clusters move to the bottom and surround AMPAR (Fig.S5DEF). In this artificial intermediate condition, both states in which the NMDARs are outside (corresponding to 3D) and in which the NMDARs are inside (corresponding to 2D) are observed, depending on the strength of the membrane constraint.”
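
      The two manipulations described in this passage (scaling the constraint strength down by a factor of 1000, and widening the membrane from 5 nm to 50 nm) can be pictured with a generic slab-confinement potential. The sketch below is hypothetical: the flat-bottomed harmonic form, the function name, and the unit force constant are assumptions chosen for illustration, since the functional form of the authors' membrane-mimicking potential is not given here.

```python
def slab_confinement_energy(z_nm, thickness_nm, force_constant):
    """Energy of a particle at height z_nm relative to the slab midplane.

    Hypothetical flat-bottomed harmonic restraint: zero inside a slab of the
    given thickness, quadratic penalty outside it. Units are arbitrary.
    """
    half_thickness = thickness_nm / 2.0
    excess = max(0.0, abs(z_nm) - half_thickness)  # distance outside the slab
    return 0.5 * force_constant * excess ** 2


# Assumed reference strength of 1.0 (arbitrary units); particle 10 nm off the midplane.
strong = slab_confinement_energy(10.0, thickness_nm=5.0, force_constant=1.0)
weak = slab_confinement_energy(10.0, thickness_nm=5.0, force_constant=1.0 / 1000)
wide = slab_confinement_energy(10.0, thickness_nm=50.0, force_constant=1.0 / 1000)
print(strong, weak, wide)
```

      Under this assumed form, weakening the force constant lowers the penalty for leaving the slab, while widening the slab removes the penalty entirely over a larger range of heights. That is one way to read why the second manipulation moves the system further toward 3D-like behavior.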

      (5)  Some of the labels in Figure 1 are confusing. In Figure 1A, the structure labeled as AMPAR has the same shape as the structure labeled as TARP in Figure 1B, but TARP is labeled as one of the smaller structures (like small legs) in the lower part of AMPAR in Figure 1A. Does the TARP in Figure 1B correspond to the small structures in the lower part of AMPAR? If so, this should be specified (and better indicated graphically), and in that case, it would be better not to use the same structural drawing for the overall structure and a substructure. The same issue is seen for NMDAR in Figure 1A and GluN2Bc in Figure 1B. 

      (6) In addition to clarifying Figure 1, the authors should clarify the usage of AMPAR vs TARP and NMDAR vs GluN2Bc in other parts of the text as well.

      (7) The physics of the authors' model will be much clearer if they provide an easily accessible graphical description of the relative interaction strengths between different domain-representing spheres (beads) in their model. For this purpose, a representation similar to that given by Feric et al., Cell 165:1686-1697 (2016) (especially Figure 6B in this reference) of the pairwise interactions among the beads in the authors' model should be provided as an additional main-text figure. Different interaction schemes corresponding to inactive and activated CAMKII should be given. In this way, the general principles (beyond the PSD system) governing 3D vs 2D multiple-component condensate organization can be made much more apparent.

      We sincerely appreciate the reviewer’s comments. Following the recommendation, we have replaced the diagram in Figure 1B with an interaction matrix showing each mesoscale molecular representation, and we have revised the main text to clarify the relationship between AMPAR and TARP and between NMDAR and GluN2Bc. The former diagram of the specific interaction pairs has been moved to a supplementary figure.

      (8) Can the authors' rationalization of the observed difference between 3D and 2D model PSD condensates be captured by an intuitive appreciation of the restriction on favorable interactions by steric hindrance and the reduction in interaction cooperativity in 2D vs 3D?  

      We thank the reviewer for the comment. As pointed out, the multiphase morphology change observed in this study can be attributed to a decrease in coordination number in 2D compared to 3D. We have included the physicochemical rationalization in the discussion.  

      (9) In the authors' model, the propensity to form 2D condensates is quite weak. Is this prediction consistent with the experiment? Real PSDs do form 2D condensates around synapses.  

      We are grateful to the reviewer for highlighting this important point. We agree that the real PSD forms 3D condensates beneath the 2D membrane. Some lower PSD components under the membrane (i.e., SAPAP, Shank, and Homer) are omitted in our system, which may cause the weak condensation. To emphasize this, we have added the following sentence:

      “While these in vivo results contain additional scaffold and cytoskeletal elements omitted in our model, such as SAPAP, Shank and Homer, nearly all proteins in the middle and lower layers of the PSD associate directly or indirectly with PSD-95 in the upper PSD layer. Consequently, it is probable that other scaffold proteins contribute to the mobility of AMPAR-containing and NMDAR-containing nanodomains indistinguishably. They may increase the stability of the AMPAR and NMDAR clusters but are unlikely to have a distinct effect to reverse the phase-separation phenomenon.”

      However, we believe that the clusters formed on the 2D membrane are not a robust “phase” because they do not follow a scaling law. In fact, in our previous study of a PSD system with AMPAR(TARP)<sub>4</sub> and PSD-95, we already reported that phase separation is less likely to occur in 2D than in 3D. That result suggests that phase separation on the membrane may be difficult to achieve, which is consistent with the results of this study.

      (10) More theoretical context should be provided in the Introduction and/or Discussion by drawing connections to pertinent prior works on physical determinants of co-mixing and de-mixing in multiple-component condensates (e.g., amino acid sequence), such as Lin et al., New J Phys 19:115003 (2017) and Lin et al., Biochemistry 57:2499-2508 (2018). 

      (11) In the discussion of the physiological/neurological significance of PSD in the Introduction and/or Discussion, for general interest it is useful to point to a recently studied possible connection between the hydrostatic pressure-induced dissolution of model PSD and high-pressure neurological syndrome [Lin et al., Chem Eur J 26:11024-11031 (2020)].

      We thank the reviewer for the helpful recommendation. We have added the recommended references in each relevant part in introduction, respectively.

      (12) It is more accurate to use "perpendicular to the membrane" rather than "vertical" in the caption for Figure 3E and other such descriptions of the orientation of the CaMKII hexagonal plane in the text.

      We thank you for your comment. We replaced the word “vertical” with “perpendicular" in the main text and caption.

      Reviewer #3 (Public review):

      Summary:

      In this work, Yamada, Brandani, and Takada have developed a mesoscopic model of the interacting proteins in the postsynaptic density. They have performed simulations, based on this model and using the software ReaDDy, to study the phase separation in this system in 2D (on the membrane) and 3D (in the bulk). They have carefully investigated the reasons behind different morphologies observed in each case, and have looked at differences in valency, specific/non-specific interactions, and interfacial tension.

      Strengths:

      The simulation model is developed very carefully, with strong reliance on binding valency and geometry, experimentally measured affinities, and physical considerations like the hydrodynamic radii. The presented analyses are also thorough, and great effort has been put into investigating different scenarios that might explain the observed effects.

      Weaknesses:

      The biggest weakness of the study, in my opinion, has to do with a lack of more in-depth physical insight about phase separation. For example, the authors express surprise about similar interactions between components resulting in different phase separation in 2D and 3D. This is not surprising at all, as in 3D, higher coordination numbers and more available volume translate to lower free energy, which easily explains phase separation. The role of entropy is also significantly missing from the analyses. When interaction strengths are small, entropic effects play major roles. In the introduction, the authors present an oversimplified view of associative and segregative phase transitions based on the attractive and repulsive interactions, and I'm afraid that this view, in which all the observed morphologies should have clear pairwise enthalpic explanations, diffuses throughout the analysis. Meanwhile, I believe the authors correctly identify some relevant effects, where they consider specific/nonspecific interactions, or when they investigate the reduced valency of CaMKII in the 2D system.

      We thank the reviewer for the insightful and constructive comments. Regarding the difference in phase behavior between 2D and 3D systems, we appreciate the reviewer’s clarification that differences in coordination number and entropy in higher dimensions can account for the observed morphology of the phases. While it may be clear that entropy decreases due to the decrease of coordination number, our objective was to uncover how such an isotropic entropy reduction regulates the behavior of each phase driven by different interactions, which remains largely unknown. To emphasize this, we modified the introduction and have now included a discussion of the entropic contributions to phase behavior in both 2D and 3D systems, and we have made this clearer in the revised manuscript by referencing relevant theoretical frameworks. In the Discussion, we added the sentence below:

      “Generally, phase separation can be explained by the Flory-Huggins theory and its extensions: phase separation can be favored by the difference in the effective pairwise interactions in the same phase compared to those across different phases, and is disfavored by mixing entropy. The effective interactions contain various molecular interactions, including direct van der Waals and electrostatic interactions, hydrophobic interactions, and purely entropic macromolecular excluded volume interactions. For the latter, Asakura-Oosawa depletion force can drive the phase separation. Furthermore, the demixing effect was explicitly demonstrated in previous simulations and field theory (61). Importantly, we note that the effective pairwise interactions scale with the coordination number z. The coordination number is a clear and major difference between 3D and 2D systems. In 3D systems, large z allows both relatively strong few specific interactions and many weak non-specific interactions. While a single specific interaction is, by definition, stronger than a single non-specific interaction, contribution of the latter can have strong impact due to its large number. On the other hand, a smaller z in the membrane-bound 2D system limits the number of interactions. In case of limited competitive binding, specific interactions tend to be prioritized compared to non-specific ones. In fact, Fig. 3A clearly shows that number of specific interactions in 2D is similar to that in 3D, while that of non-specific interactions is dramatically reduced in 2D. In the current PSD system, CaMKII is characterized by large valency and large volume. In the 3D solution system, non-specific excluded volume interactions drive CaMKII to the outer phase, while this effect is largely reduced in 2D, resulting in the reversed multiphase.”
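
      As a reminder of how the coordination number enters the argument quoted above, the textbook binary Flory-Huggins free energy of mixing can be written as follows. This is the standard two-component lattice form, shown only as a sketch. The symbols ($\phi$, $N_A$, $N_B$, $\chi$, $\epsilon_{ij}$) are generic and are not the authors' notation, and the four-component PSD system would need a multicomponent extension.

```latex
% Free energy of mixing per lattice site (binary Flory-Huggins)
\frac{f_{\mathrm{mix}}}{k_B T}
  = \frac{\phi}{N_A}\ln\phi
  + \frac{1-\phi}{N_B}\ln(1-\phi)
  + \chi\,\phi\,(1-\phi)

% Interaction parameter: proportional to the coordination number z
\chi = \frac{z}{k_B T}\left[\epsilon_{AB} - \tfrac{1}{2}\left(\epsilon_{AA} + \epsilon_{BB}\right)\right]
```

      The two logarithmic terms are the mixing entropy, which always favors mixing. The last term collects the effective pairwise interactions and favors demixing when $\chi$ is sufficiently positive. Because $\chi$ is proportional to $z$, lowering the coordination number, as in a membrane-bound 2D system, weakens the net interaction term relative to the entropic terms.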

      Also, I sense some haste in comparing the findings with experimental observations. For example, the authors mention that "For the current four component PSD system, the product of concentrations of each molecule in the dilute phase is in good agreement with that of the experimental concentrations (Table S2)." But the data used here is the dilute phase, which is the remnant of a system prepared at very high concentrations and allowed to phase separate. The errors reported in Table S2 already cast doubt on this comparison. 

      We thank the reviewer for the insightful comment. In the validation process, we adjusted the parameters so that the number of molecules in the dilute phase is consistent with the experimental lower limit of phase separation, based on the assumption that the dilute phase after phase separation has the same concentration as the critical concentration. That is why we focus on comparing the dilute-phase concentration in Table S2. However, in our simulations, the number of protein molecules is relatively small since it is based on the average number per synaptic spine. For example, there are only about 60 CaMKII molecules at most, and their presence in the dilute phase is highly sensitive to concentration, as the reviewer pointed out. This is one of the limitations, so we have added a description to the Limitations section. We added:

      “Second, parameter calibration contains some uncertainty. Previous in vitro study results used for parameter validation are at relatively high concentrations for phase separation, which may shift critical thresholds compared to that in in vivo environments. Also, since the number of molecules included in the model is small, the difference of a single molecule could result in a large error during this validation process.”

      Or while the 2D system is prepared via confining the particles to the vicinity of the membrane, the different diffusive behavior in the membrane, in contrast to the bulk (i.e., the Saffman-Delbrück model), is not considered. This would thus make it difficult to interpret the results of a coupled 2D/3D system and compare them to the actual system.

      We appreciate the reviewer’s helpful comment. We agree that there is a concern that the Einstein-Stokes equation does not adequately reproduce the diffusion of membrane-embedded particles. We recalculated the diffusion coefficients for every membrane particle used in this model using the Saffman-Delbrück model and found that the diffusion coefficients for the receptor cores (AMPAR and NMDAR) were approximately three times larger. These values are still about 10 times smaller than those of molecules diffusing in the cytoplasm. Additionally, since this study focuses on the morphology of the phase/cluster at thermodynamic equilibrium, we think that the magnitude of the diffusion coefficient has little influence on the final structure of the cluster. However, we will incorporate membrane-embedded diffusion as a future improvement for better modelling and implementation. We added:

      “Third, we estimated all the diffusion coefficients from the Einstein-Stokes equation, which may oversimplify membrane-associated dynamics. Applying the Saffman-Delbrück model to membrane-embedded particles would be desirable, although the resulting diffusion coefficients remain of the same order of magnitude. These limitations highlight the need for further research, yet they do not undermine the core significance of the present findings in advancing our understanding of multiphase morphologies.”

    1. eLife Assessment

      Kin selection and inclusive fitness have generated significant controversy. This paper reconsiders the general form of Hamilton's rule in which benefits and costs are defined as regression coefficients, with higher-order coefficients being added to accommodate non-linear interactions. The paper is a landmark contribution to the field with compelling, systematic analysis, giving clarity to long-standing debates.

    2. Joint Public Review:

      This manuscript reconsiders the "general form" of Hamilton's rule, in which "benefit" and "cost" are defined as regression coefficients. It points out that there is no reason to insist on Hamilton's rule of the form -c+br>0, and that, in fact, arbitrarily many terms (i.e. higher-order regression coefficients) can be added to Hamilton's rule to reflect nonlinear interactions. Furthermore, it argues that insisting on a rule of the form -c+br>0 can result in conditions that are true but meaningless and that statistical considerations should be employed to determine which form of Hamilton's rule is meaningful for a given dataset or model.

      Comments on latest version:

      The authors have provided a robust, valuable and detailed response to the previous reviews.

      Comments from Reviewer #1: I have nothing further to add.

      Comments from Reviewer #2: I appreciate the clarifications the author has made to the manuscript regarding (i) "sample covariance" terminology, (ii) the generality of the "generalized Price equation", and (iii) the distinction between the covariance and regression forms of the Price equation. I also appreciate that the ms now engages more deeply with some of the previous literature on regression-based Hamilton's rules (e.g. Smith et al., 2010; Rousset 2015). I feel these revisions make this contribution more valuable, and also more technically sound, since the term "sample covariance" is no longer used incorrectly.

      I also add that I agree with the substance of the authors' response to Reviewer #3. That is, the original submission was very clear that the regression-based Hamilton's rule is already completely general in the range of situations to which it applies, and that the added "generality" in the present ms refers to the variety of regression models that can be applied to these situations. In this way, the original ms already anticipates and addresses the criticism that Reviewer #3 raises.

      Reviewer #3 did not provide comments on the revised version.

    3. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      There has been intense controversy over the generality of Hamilton's inclusive fitness rule for how evolution works on social behaviors. All generally agree that relatedness can be a game changer, for example allowing for otherwise unselectable altruistic behaviors when 𝑐 < 𝑟𝑏, where 𝑐 is the fitness cost to the altruism, 𝑏 is the fitness benefit to another, and 𝑟 their relatedness. Many complications have been successfully incorporated into the theory, including different reproductive values and viscous population structures.

      I agree, especially if by incorporating viscous population structures, the reviewer means the discovery of the cancellation effect (Wilson, Pollock, and Dugatkin, 1992, Taylor, 1992).

      The controversy has centered on another dimension; Hamilton's original model was for additive fitness, but how does his result hold when fitnesses are non-additive? One approach has been not to worry about a general result but just find results for particular cases. A consistent finding is that the results depend on the frequency of the social allele - nonadditivity causes frequency dependence that was absent in Hamilton's approach.

      Just to be extra precise: Hamilton’s (1964) original model did not use the Price equation nor the regression approach to define costs and benefits, and it did indeed simply presuppose fixed, additive fitness effects.

      Also for extra precision on terminology: many researchers will describe all fitnesses in social evolution as frequency dependent. The reason they do is that, with or without additivity, both the fitness of cooperators (with the social allele) and the fitness of defectors (without the social allele) typically increase in the frequency of cooperators in the population; the more cooperators there are, the more individuals run into them, which increases average fitness. The result depending on the frequency I take to mean that which of those two fitnesses is larger flips at a certain frequency, which automatically implies that the difference between them depends on the frequency of the social allele. This is indeed the result of non-additivity. We will return to this in more detail in the response to Reviewer #3. Also at the end of Appendix B I have added a bit to be extra precise regarding frequency dependence.

      Two other approaches derive from Queller via the Price equation. Queller 1 is to find forms like Hamilton's rule, but with additional terms that deal with non-additive interaction, each with an r-like population structure variable multiplied by a b-like fitness effect (Queller, 1985). Queller 2 redefines the fitness effects c and b as partial regressions of the actor's and recipient's genes on fitness. This leaves Hamilton's rule intact, just with new definitions of c and b that depend on frequency (Queller, 1992a).

      Queller 2 is the version that has been most adopted by the inclusive fitness community along with assertions that Hamilton's rule is completely general. In this paper, van Veelen argues that Queller 1 is the correct approach. He derives a general form that Queller only hinted at. He does so within a more rigorous framework that puts both Price's equation and Hamilton's rule on firmer statistical ground. Within that framework, the Queller 2 approach is seen to be a statistical misspecification - it employs a model without interaction in cases that actually do have interaction. If we accept that this is a fatal flaw, the original version of Hamilton's rule is limited to linear fitness models, which might not be common.

      I totally agree.
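
      The misspecification point can be made concrete with a small numerical sketch. It is not taken from the manuscript: the fitness function, the parameter values (c, b, d), and the way pair frequencies are written in terms of a cooperator frequency p and an assortment parameter r are all assumptions chosen for illustration. The sketch fits a regression with and without an interaction term at two cooperator frequencies. Under these assumptions, the additive fit returns coefficients that move with the frequency, while the fit that includes the interaction term returns the same coefficients at both frequencies.

```python
import numpy as np

def pair_frequencies(p, r):
    """Frequencies of (own type x, partner type y) pairs.

    Hypothetical parametrisation: p is the cooperator frequency and
    r the assortment (relatedness-like) parameter.
    """
    return {
        (1, 1): p * p + r * p * (1 - p),
        (1, 0): (1 - r) * p * (1 - p),
        (0, 1): (1 - r) * p * (1 - p),
        (0, 0): (1 - p) ** 2 + r * p * (1 - p),
    }

def fitness(x, y, c=1.0, b=2.0, d=1.5):
    # Assumed "true" model: additive cost and benefit plus a synergy term d*x*y.
    return 1.0 - c * x + b * y + d * x * y

def weighted_fit(p, r, with_interaction):
    """Least-squares fit of fitness on (1, x, y[, x*y]), weighted by pair frequency."""
    rows, targets, sqrt_weights = [], [], []
    for (x, y), freq in pair_frequencies(p, r).items():
        rows.append([1.0, x, y] + ([x * y] if with_interaction else []))
        targets.append(fitness(x, y))
        sqrt_weights.append(np.sqrt(freq))
    sw = np.array(sqrt_weights)
    # Scaling rows by sqrt(weight) turns ordinary into weighted least squares.
    X = np.array(rows) * sw[:, None]
    t = np.array(targets) * sw
    coef, *_ = np.linalg.lstsq(X, t, rcond=None)
    return coef

additive_low = weighted_fit(0.2, 0.5, with_interaction=False)
additive_high = weighted_fit(0.8, 0.5, with_interaction=False)
full_low = weighted_fit(0.2, 0.5, with_interaction=True)
full_high = weighted_fit(0.8, 0.5, with_interaction=True)

print("additive fit, p = 0.2:", additive_low)
print("additive fit, p = 0.8:", additive_high)
print("with interaction, p = 0.2:", full_low)
print("with interaction, p = 0.8:", full_high)
```

      In the additive fit, the coefficients on x and y absorb part of the synergy term, and how much they absorb depends on p and r. That is the sense in which the regression-defined cost and benefit stop describing the behaviour alone.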

      Strengths:

      While the approach is not entirely new, this paper provides a more rigorous approach and a more general result. It shows that both Queller 1 and Queller 2 are identities and give accurate results, because both are derived from the Price equation, which is an identity. So why prefer Queller 1? It identifies the misspecification issue with the Queller 2 approach and points out its consequences. For example, it will not give the minimum squared differences between the model and data. It does not separate the behavioral effects of the individuals from the population state (𝑏 and 𝑐 become dependent on 𝑟 and the population frequency).

      Just to be precise on a detail: in the data domain, as long as the number of parameters in a statistical model is lower than the number of data points, adding parameters typically (generically) lowers the sum of squared errors. That is to say, for an underspecified statistical model, the sum of squared errors goes down if a parameter is added, but for an already overspecified statistical model, the same is still true (although, typically, by how much the sum of squared errors is reduced will differ). The model specification task for a statistician includes knowing when to keep adding parameters, because the data suggest that the model is still underspecified, and when to stop adding parameters, because the model is well-specified, even if adding parameters still reduces the sum of squared errors.

      In a modeling context, on the other hand, one can say that sum of squared differences will stop decreasing at the point where the statistical model is well-specified, that is: when it matches the model we are considering.
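
      The statement about the data domain can be checked with a short sketch. The data-generating process below (a linear model with Gaussian noise), the sample size, and the polynomial degrees are assumptions made only for illustration. The true model is linear, yet each added polynomial term still lowers, or at least does not raise, the sum of squared errors.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.uniform(-1.0, 1.0, size=n)
# Assumed data-generating process: linear in x, plus noise.
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)

def sse_for_degree(degree):
    """Sum of squared errors of a least-squares polynomial fit of the given degree."""
    X = np.vander(x, degree + 1, increasing=True)  # columns 1, x, x^2, ...
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

sse = [sse_for_degree(k) for k in range(6)]
for k, value in enumerate(sse):
    print(f"degree {k}: SSE = {value:.4f}")
```

      Because the models are nested, the sequence cannot increase. Deciding that degree 1 is the right place to stop therefore needs a specification criterion beyond the raw sum of squared errors.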

      The paper also shows how the same problems can apply to non-social traits. Epistasis is the non-additivity of effects of two genes within the individual. (So one wonders why have we not had a similarly fierce controversy over how we should treat epistasis?)

      The paper is clearly written. Though somewhat repetitive, particularly in the long supplement, most of that repetition has the purpose of underscoring how the same points apply equally to a variety of different models.

      Finally, this may be a big step towards reconciliation in the inclusive fitness wars. Van Veelen has been one of the harshest critics of inclusive fitness, and now he is proposing a version of it.

      I am very happy to hear this, because I am indeed hopeful for reconciliation. I would like to add a comment, though. The debate on Hamilton’s rule/inclusive fitness is regularly thought of as a battle between two partisan camps, where both sides care at least as much about winning as they do about getting things right. This is totally understandable, because to some degree that is true. Also, I agree that it is fair to position me in the camp that is critical of the inclusive fitness literature. However, I would like to think that I have not been taking random shots at Hamilton’s rule. I have pointed to problems with the typical use of the Price equation and Hamilton’s rule, and I think I did for very good reasons. I am obviously very happy that finding the Generalized Price equation, and the general version of Hamilton’s rule, allowed me to go beyond this, and (finally) offer a correct alternative, and I totally appreciate that this opens the door for reconciliation, as this reviewer points out. But I would not describe this as a road-to-Damascus moment. In order to illustrate the continuity in my work, I would like to point to three papers.

      In van Veelen (2007), I pointed to the missing link between the central result in Hamilton’s (1964) famous paper (which states that selection dynamics take the population to a state where mean inclusive fitness is maximized), and Hamilton’s actual rule (which states that selection will lead to individuals maximizing their individual inclusive fitness). My repair stated the additional assumptions that were necessary to make the latter follow from the former. I would say that this can hardly be characterized as an attack on Hamilton’s rule. Reading Hamilton (1964) with enough care to notice something is missing, and then repairing it, I think is a sign of respect, and not an attack.

      Van Veelen (2011) is about the replicator dynamics for n-player games, with the possibility of assortment. This puts the paper in a domain that does not assume weak selection, and that is typically not much oriented towards inclusive fitness. I included a theorem that implies that, under the condition of linearity, inclusive fitness not only gets the direction of selection right, but 𝑟𝑏 − 𝑐 becomes a parameter that also determines the speed of selection. This I think is representative, in the sense that in many of my papers, I carefully stake out when the classic version of Hamilton’s rule does work.

      In Akdeniz and van Veelen (2020), we moreover take a totally standard inclusive fitness approach in a model of the cancellation effect at the group level.

      I would say that this does not line up with the image of a harsh critic that takes random shots at Hamilton’s rule or inclusive fitness.

      Weaknesses:

      van Veelen argues that the field essentially abandoned the Queller 1 approach after its publication. I think this is putting it too strongly - there have been a number of theoretical studies that incorporate extra terms with higher-order relatednesses. It is probably accurate to say that there has been relative neglect. But perhaps this is partly due to a perception that this approach is difficult to apply.

      I can imagine that the perceived difficulty in application may have played a role in the neglect of the Queller 1 approach. What for sure has played a role, and I would think a much bigger one, is that the literature has been pretty outspoken that the Queller 1 approach is the wrong way to go. The main text cites a number of papers that hold this position very emphatically (The first one of those was a News and Views by Alan Grafen (1985) that accompanied the paper in which Queller presented his Queller 1 approach. I am very happy that Appendix B shows on how many levels this News and Views was wrong.). There is only a handful of papers that follow the Queller 1 example.

      The model in this paper is quite elegant and helps clarify conceptual issues, but I wonder how practical it will turn out to be. In terms of modeling complicated cases, I suspect most practitioners will continue doing what they have been doing, for example using population genetics or adaptive dynamics, without worrying about neatly separating out a series of terms multiplying fitness coefficients and population structure coefficients.

      I am not sure if I see what the reviewer envisions practitioners that use population genetics will keep on doing. I would think that the Generalized Price equation in regression form is a description of population genetic dynamics, and therefore, if practitioners will not make an effort to “neatly separate out a series of terms multiplying fitness coefficients and population structure coefficients”, then all I can say is that they should. I cannot do more than explain why, if they do not, they are at risk of mischaracterizing what gets selected and why.

      Regarding those that use adaptive dynamics, I would say that this is a whole different approach. Within this approach, one can also apply inclusive fitness; see Section 6 and Appendix D of van Veelen et al. (2017). Appendix D is full of deep technical results and was done by Benjamin Allen.

      For empirical studies, it is going to be hard to even try to estimate all those additional parameters. In reality, even the standard Hamilton's rule is rarely tested by trying to estimate all its parameters. Instead, it is commonly tested more indirectly, for example by comparative tests of the importance of relatedness. That of course would not distinguish between additive and non-additive models that both depend on relatedness, but it does test the core idea of kin selection. It will be interesting to see if van Veelen's approach stimulates new ways of exploring the real world.

      Regarding the impact on empirical studies, there are a few things that I would like to say. The first is that I would just like to repeat, maybe a bit more elaborately, what I wrote at the end of the main text. Given that the generalized version of Hamilton’s rule produces a host of Hamilton-like rules, and given the fact that all of them by construction indicate the direction of selection accurately, the question whether or not Hamilton’s rule holds turns out to be ill-posed. That means that we can stop doing empirical tests of Hamilton’s rule, which are predicated on the idea that Hamilton’s rule, with benefits and costs being determined by the regression method, could be violated – which it cannot (Side note: it is possible to violate Hamilton’s rule, if costs and benefits are defined according to the counterfactual method; see van Veelen et al. (2017) and van Veelen (2018). This way of defining costs and benefits is less common, although there are authors that find this definition natural enough to assume that this is the way in which everybody defines costs and benefits (Karlin and Matessi, 1983, Matessi and Karlin, 1984).). Instead, we should do empirical studies to find out which version of Hamilton’s rule applies to which behaviour in which species.

      I would like to not understate what a step forward this is. The size of the step forwards is of course also due to the dismal point of departure. As theorists, we have failed our empiricists, because all 12 studies included in the review by Bourke (2014) of papers that explicitly test Hamilton’s rule are based on the misguided idea that the traditional Hamilton’s rule, with costs and benefits defined according to the regression method, can be violated. While the field does sometimes have disdain for mathematical nit-picking, this is a point where a little more attention to detail would have really helped. If the hypothesis is that Hamilton’s rule holds, and the null is that it does not, then trying to specify how the empirical quantity that reflects inclusive fitness would be distributed under the null hypothesis (in order to do the right statistical tests) would have forced researchers to do something with the information that this quantity is not distributed at all, because Hamilton’s rule is general (in the sense that it holds for any way in which the world works). If one would prefer to reverse the null and the alternative hypothesis, one would run into similar problems. Understanding that the question is ill-posed therefore is a big step forwards from the terrible state of statistics and the waste of research time, attention and money on the empirical side of this field (see also Section 8 of van Veelen et al., 2017).

      I would agree that doing comparative statics may not be much affected by this. Section 5 of van Veelen et al. (2017) indicates that there can be a large set of circumstances under which the general idea “relatedness up → cooperation up” still applies. But that may be a bit unambitious, and Section 8 of van Veelen et al. (2017), and the final section of van Veelen (2018) contain some reflections on empirical testing that may allow us to go beyond that. As long as there is change happening in the Generalized Price equation, the population is not in equilibrium. For empirical tests, one can either aim to capture selection as it happens, or assume that what we observe reflects properties of an equilibrium. This leads to interesting reflections on how to do empirics, which may differ between traits that are continuous and traits that are discrete (again: see van Veelen et al. (2017), and van Veelen (2018)).

      Reviewer #2 (Public review):

      Summary:

      This manuscript reconsiders the "general form" of Hamilton's rule, in which "benefit" and "cost" are defined as regression coefficients. It points out that there is no reason to insist on Hamilton's rule of the form −𝑐 + 𝑏𝑟 > 0, and that, in fact, arbitrarily many terms (i.e. higher-order regression coefficients) can be added to Hamilton's rule to reflect nonlinear interactions. Furthermore, it argues that insisting on a rule of the form −𝑐 + 𝑏𝑟 > 0 can result in conditions that are true but meaningless and that statistical considerations should be employed to determine which form of Hamilton's rule is meaningful for a given dataset or model.

      Totally right. I cannot help wanting to be extra precise, though, by distinguishing between the data domain and the modelling domain. In the data domain, statistical considerations apply in order to avoid misspecification. In this domain, avoiding misspecification can be complicated, because we do not know the underlying data generating process, and we depend on noisy data to make a best guess. In the modelling domain, however, there is no excuse for misspecification, as the model is postulated by the modeller. I therefore would think that in this domain, it does not really require “statistical considerations” to minimize the probability of misspecification; we can get the probability of misspecification all the way down to 0 by just choosing not to do it.

      Strengths:

      The point is an important one. While it is not entirely novel - the idea of adding extra terms to Hamilton's rule has arisen sporadically (Queller, 1985, 2011; Fletcher et al., 2006; van Veelen et al., 2017) - it is very useful to have a systematic treatment of this point. I think the manuscript can make an important contribution by helping to clarify a number of debates in the literature. I particularly appreciate the heterozygote advantage example in the SI.

      Me too, and I really hope the readers make it this far! I have thought of putting it in the main text, but did not know where that would fit.

      Weaknesses:

      Although the mathematical analysis is rigorously done and I largely agree with the conclusions, I feel there are some issues regarding terminology, some regarding the state of the field, and the practice of statistics that need to be clarified if the manuscript is truly to resolve the outstanding issues of the field. Otherwise, I worry that it will in some ways add to the confusion.

      (1) The "generalized" Price equation: I agree that the equations labeled (PE.C) and (GPE.C) are different in a subtle yet meaningful way. But I do not see any way in which (GPE.C) is more general than (PE.C). That is, I cannot envision any circumstance in which (GPE.C) applies but (PE.C) does not. A term other than "generalized" should be used.

      This is a great point! Just to make sure that those that read the reports online understand this point, let me add some detail. The equation labeled (PE.C), which is short for Price equation in covariance form, is 𝑤̄Δ𝑝̄ = 𝐶𝑜𝑣(𝑤, 𝑝).

      The derivation in Appendix A then assumes that we have a statistical model that includes a constant and a linear term for the p-score. It then defines the model-estimated fitness of individual 𝑖 as 𝑤̂<sub>𝑖</sub> = 𝑤<sub>𝑖</sub> − 𝜀<sub>𝑖</sub>, where 𝑤<sub>𝑖</sub> is the realized number of offspring of individual 𝑖, and 𝜀<sub>𝑖</sub> is the error term; it is the sum over all individuals of the squared error term that is minimized. The vector of model-estimated fitnesses 𝑤̂ will typically be different for different choices of the statistical model. Appendix A then goes on to show that, whatever statistical model is used, 𝐶𝑜𝑣(𝑤̂, 𝑝) = 𝐶𝑜𝑣(𝑤, 𝑝), as long as the statistical model includes a constant and a linear term for the p-score. That means that we can rewrite (PE.C) as 𝑤̄Δ𝑝̄ = 𝐶𝑜𝑣(𝑤̂, 𝑝), which is the equation labeled (GPE.C).

      The point that the reviewer is making is that this is not really a generalization. For a given dataset (or, more generally, for a given population transition, whether empirical or in a model), 𝐶𝑜𝑣(𝑤, 𝑝) is just a number, and it happens to be the case that 𝐶𝑜𝑣(𝑤̂, 𝑝) returns the same number, whatever statistical model we use for determining what the model-estimated fitnesses 𝑤̂<sub>𝑖</sub> are (as long as the statistical model includes a constant and a linear term for the p-score). In other words, (PE.C) is not really nested in (GPE.C), so (GPE.C) is not a proper generalization of (PE.C).

      This is a totally correct point, and I had actually struggled a bit with the question what terminology to use here. Equation (GPE.C) is definitely general, in the sense that we can change the statistical model, and thereby change the vector of model-estimated fitnesses 𝑤̂, but as long as we keep the constant and the linear term in the statistical model, the equation still applies. But it is not a generalization of (PE.C).
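
      The invariance discussed here is a property of least squares and can be verified numerically. In the sketch below, the simulated population, the partner score q, and the three regression models are hypothetical and serve only as an illustration. The covariance is computed with denominator N. Every model contains a constant and a linear term in the p-score, and in each case the covariance between fitted fitness and p-score matches the covariance between realized fitness and p-score.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
p = rng.uniform(0.0, 1.0, size=n)   # p-scores of the individuals
q = rng.uniform(0.0, 1.0, size=n)   # hypothetical p-scores of their partners
# Hypothetical realized offspring numbers, with an interaction between p and q.
w = rng.poisson(lam=1.0 + p + 2.0 * p * q).astype(float)

def cov_n(a, b):
    """Covariance with denominator N (no Bessel's correction)."""
    return float(np.mean(a * b) - np.mean(a) * np.mean(b))

def fitted_fitness(extra_columns):
    """Least-squares fitted values for a model with a constant, p, and extra terms."""
    X = np.column_stack([np.ones(n), p] + extra_columns)
    coef, *_ = np.linalg.lstsq(X, w, rcond=None)
    return X @ coef

models = {
    "constant + p": [],
    "constant + p + q": [q],
    "constant + p + q + p*q": [q, p * q],
}
target = cov_n(w, p)
fitted_covs = {name: cov_n(fitted_fitness(cols), p) for name, cols in models.items()}

print("Cov(w, p) =", target)
for name, value in fitted_covs.items():
    print(f"Cov(w_hat, p) under '{name}' =", value)
```

      The reason is that least-squares residuals are orthogonal to every regressor, so they have zero mean and zero covariance with the p-score once the constant and the linear term are in the model.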

      I do however have a hard time coming up with a better label. The General Price equation may be a bit better, but it still suggests generalization. The Statistical Model-based Price equation does not suggest or imply generalization, but it does not convey how general it is, and it suggests that it could be an alternative to the normal Price equation that one may or may not choose to use – while this version really is the one we should use. It may moreover create the impression that this is only for doing statistics, and one might use the traditional Price equation for anything that is not statistics. I cannot really think of other good alternatives, but I am of course open to suggestions.

      So, for lack of a better label, I called this the Generalized Price equation in covariance form. Though clearly imperfect, there are still a few good things about this label. The first is that, as mentioned above, this equation is general, in the sense that it holds, regardless of the statistical model. The second reason is that this is Step 1 in a sequence of three steps, the other two of which do produce proper generalizations. Step 2 goes from this equation in covariance form to the Generalized Price Equation in regression form, which is a proper generalization of the traditional Price equation in regression form. Step 3 goes from the Generalized Price Equation in regression form to the general version of Hamilton’s rule, which is also a proper generalization of the classical Hamilton’s rule. Since I would suggest that Step 1 on its own is kind of useless, and therefore Step 1 and Step 2 will typically come as a package, I would be tempted to think that this justifies the abuse of terminology for the Price Equation in covariance form. I did however add the observation made by the reviewer at the point where the Generalized Price equation (in both forms) is derived, so I hope this at least partly addresses this concern.

      (2) Regression vs covariance forms of the Price equation: I think the author uses "generalized" in reference to what Price called the "regression form" of his equation. But to almost everyone in the field, the "Price Equation" refers to the covariance form. For this reason, it is very confusing when the manuscript refers to the regression form as simply "the Price Equation".

      As an example, in the box on p. 15, the manuscript states "The Price equation can be generalized, in the sense that one can write a variety of Price-like equations for a variety of possible true models, that may have generated the data." But it is not the Price equation (covariance form) that is being generalized here. It is only the regression that Price used that is being generalized.

      To be consistent with the field, I suggest the term "Price Equation" be used only to refer to the covariance form unless it is otherwise specified as in "regression form of the Price equation".

      I am not sure about the level of confusion induced here, but I totally see that it can be helpful to avoid all ambiguity. I therefore went over everything, and whenever I wrote “Price equation”, I tried to make sure it comes either with “in covariance form” or with “in regression form”. At some places, it is a bit over the top to keep repeating “in regression form”, when it is abundantly clear which form is being discussed. Also, I added no qualifiers if a statement is true for both forms of the Price equation, or if the claim refers to the whole package of going through Step 1 and Step 2 mentioned above.

      (3) Sample covariance: The author refers to the covariance in the Price equation as “sample covariance”. This is not correct, since sample covariance has a denominator of N-1 rather than N (Bessel’s correction). The correct term, when summing over an entire population, is “population covariance”. Price (1972) was clear about this: “In this paper we will be concerned with population functions and make no use of sample functions”. This point is elaborated on by Frank (2012), in the subsection “Interpretation of Covariance”.

      I totally agree. On page 418 of van Veelen (2005), I wrote:

      “Another possibility is that we think of 𝑧<sub>i</sub> and 𝑞<sub>i</sub>, 𝑖 = 1,…,𝑁 as realizations of a jointly distributed random variable. […] In that case the expression between square brackets is a good approximation for what statisticians […] call a sample covariance. A sample covariance is defined as 1/(𝑁 − 1) ∑<sub>i</sub> (𝑧<sub>i</sub> − 𝑧̄)(𝑞<sub>i</sub> − 𝑞̄), but in large samples it is OK to replace 𝑁 − 1 by 𝑁, and then this formula reduces to Price’s 𝐶𝑜𝑣(𝑧, 𝑞).”

      In van Veelen et al. (2012), I slid a little, because in Box 1 on page 66, I wrote that is the sample covariance, and only in footnote 1 on the same page did I include Bessel’s correction, when I wrote:

      “To be perfectly precise, the sample covariance is defined as

      In this manuscript, I slid a little further, and left Bessel’s correction out altogether. I am happy that the reviewer pointed this out, so I can make this maximally precise again.

      The reviewer also quotes Price (1972), page 485:

      “In this paper we will be concerned with population functions and make no use of sample functions”.

      Below, the reviewer will return to the issue of distinguishing between the sample covariance with Bessel’s correction, and the sample covariance without Bessel’s correction, where the latter is regularly also referred to as the population covariance. A natural interpretation of the quote from Price (1972), if we read a bit around this quote in the paper, is that the difference between his “population functions” and his “sample functions” is indeed Bessel’s correction.

      The reviewer also states that Frank (2012) elaborates on this in the subsection “Interpretation of Covariance”. What is interesting, though, is that, when Frank (2012) writes, on page 1017

      “It is important to distinguish between population measures and sample measures”,

      the difference between those is not that one does, and the other does not include Bessel’s correction. The difference between “population measures” and “sample measures” in Frank (2012), page 1017, is that

      “In many statistical applications, one only has data on a subset of the full population, that subset forming a sample.”

      The distinction between a population covariance and a sample covariance in Frank (2012) therefore is that they are “covariances” of different things (where the word covariances is in quotation marks, because, again, they are not really covariances). Besides just making sure that Price (1972) and Frank (2012) are not using these terms in the same way, this also perfectly illustrates the mix-up between statistical populations (or data generating processes) and biological populations that I discuss on pages 8 and 9 of Appendix A. I will return to this below, when I explain why I want to avoid using the word “population covariance” for the sample covariance without Bessel’s correction.

      Of course, the difference is negligible when the population is large. However, the author applies the covariance formula to populations as small as 𝑁 = 2, for which the correction factor is significant.

      Absolutely right.
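      As a minimal numerical sketch of how much the correction matters at 𝑁 = 2, the snippet below computes the covariance with and without Bessel’s correction. The two-individual population and the helper `covariance` are made up for illustration and do not come from the manuscript.

```python
def covariance(xs, ys, bessel):
    """Covariance of two equal-length lists.

    Divides by N - 1 when bessel is True (sample covariance),
    and by N when bessel is False (no Bessel's correction).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / (n - 1 if bessel else n)


# Hypothetical population of N = 2 parents: fitness w and p-score p.
w = [2.0, 0.0]
p = [1.0, 0.0]

with_bessel = covariance(w, p, bessel=True)      # divides by N - 1 = 1
without_bessel = covariance(w, p, bessel=False)  # divides by N = 2
ratio = without_bessel / with_bessel             # equals (N - 1) / N

print(with_bessel, without_bessel, ratio)
```

      With these numbers the two versions differ by a factor of one half, which is (𝑁 − 1)/𝑁 for 𝑁 = 2.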

      The author objects to using the term "population covariance" (SI, pp. 8-9) on the grounds that it might be misleading if the covariance, regression coefficients, etc. are used for inference because in this case, what is being inferred is not a population statistic but an underlying relationship. However, I am not convinced that statistical inference is or should be the primary use of the Price equation (see next point). At any rate, avoiding potential confusion is not a sufficient reason to use incorrect terminology.

      There are a few related, but separate issues. One is what to call the 𝐶𝑜𝑣(𝑤, 𝑝)-term. Another, somewhat broader, is to avoid mixing up statistical populations and biological populations. A third is what the primary use of the Price equation is. The third issue I will respond to below, where it reappears. Here I will focus on the first two, which can be discussed without addressing the third.

      In a data context, I now call the 𝐶𝑜𝑣(𝑤, 𝑝)-term “(𝑁 − 1)/𝑁 times the sample covariance, or, in other words, the sample covariance without Bessel’s correction”. This should be unambiguous. In a modeling context I refer to the 𝐶𝑜𝑣(𝑤, 𝑝)-term as “the 𝐶𝑜𝑣(𝑤, 𝑝)-term” and describe it as a summary statistic or a notational convention. There are two reasons for this choice.

      The first is that neither of these uses the word “population”. I like this, because there is persistent scope for confusion between statistical populations and biological populations (as exemplified by Frank, 2012). This leads to an incorrect, but widespread intuition that if we “know the entire (biological) population” in a data context, there is nothing that can be estimated. This is what pages 8 and 9 of Appendix A are all about.

      The second reason is that by using two labels, I also differentiate between the data context and the modeling context. This is important for reasons I will return to later.

      Relatedly, I suggest avoiding using 𝐸 for the second term in the Price equation, since (as the ms points out), it is not the expectation of any random variable. It is a population mean. There is no reason not to use something like Avg or bar notation to indicate population mean. Price (1972) uses "ave" for average.

      I totally agree that the second term in the Price equation is not an expectation. I made this point in van Veelen (2005), and I repeated this in the manuscript. This remark by the reviewer prompted me to spell this out a bit more emphatically in Appendix A. That still leaves me with the choice of what notation to use.

      I therefore looked up all contributions to the Theme issue “Fifty years of the Price equation” in the Philosophical Transactions of the Royal Society B, and found that almost all contributions use 𝐸, sometimes saying that this refers to an expectation or an average. Of course, this is wrong. However (and this is another argument), it is just as wrong as using 𝐶𝑜𝑣 or 𝑉𝑎𝑟. The terms abbreviated as 𝐶𝑜𝑣 and 𝑉𝑎𝑟 are no more a covariance and a variance than the term abbreviated as 𝐸 is an expectation. So I would think that there are a few reasons for sticking with 𝐸 here: 1) consistency with the literature; 2) consistency with the treatment of other terms; and 3) the fact that this term is not really of any importance in this manuscript. I do however totally understand the reviewer’s reasons, which I suppose include that for 𝐸, there are relatively unproblematic alternatives (ave or upper bar) that are not available for the other terms. I hope therefore that being a bit more emphatic in the manuscript about 𝐸 not being an expectation at least partly addresses this concern.

      I should add, however, that the distinction between population statistics vs sample statistics goes away for regression coefficients (e.g. b, c, and r in Hamilton's rule) since in this case, Bessel's correction cancels out.

      Totally correct.
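      A short sketch of why the correction cancels in a regression coefficient: the slope is a ratio of a covariance and a variance, so dividing both by 𝑁 or both by 𝑁 − 1 gives the same number. The data and the helper names (`centered_cross_sum`, `slope`) are hypothetical and chosen only for illustration.

```python
def centered_cross_sum(xs, ys):
    """Sum over i of (x_i - mean(x)) * (y_i - mean(y))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


def slope(ys, xs, bessel):
    """Simple regression slope of ys on xs, computed as covariance / variance."""
    n = len(xs)
    denominator = n - 1 if bessel else n
    cov_xy = centered_cross_sum(xs, ys) / denominator
    var_x = centered_cross_sum(xs, xs) / denominator
    return cov_xy / var_x  # the common denominator cancels here


# Hypothetical data: p-scores and fitnesses of three parents.
p = [0.0, 1.0, 1.0]
w = [1.0, 2.0, 3.0]

slope_with = slope(w, p, bessel=True)
slope_without = slope(w, p, bessel=False)
print(slope_with, slope_without)
```

      The same cancellation holds for any sample size, including 𝑁 = 2, as long as the variance in the denominator is nonzero.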

      (4) Descriptive vs. inferential statistics: When discussing the statistical quantities in the Price Equation, the author appears to treat them all as inferential statistics. That is, he takes the position that the population data are all generated by some probabilistic model and that the goal of computing the statistical quantities in the Price Equation is to correctly infer this model.

      Before I respond to this, I would like to point out that this literature has started going off the rails right from the very beginning. One of the initial construction errors was to use the ungeneralized Price equation in regression form. The other one is that the paper in which Price (1970) presented his equation is inconsistent, and suggests that the equation can be used for constructing hypotheses and for testing them at the same time (see van Veelen (2005), page 416). That, of course, is not possible; the first happens in the theory/modeling domain, and the second in the empirical testing/statistics domain, and they are separate exercises.

      These construction errors have warped the literature based on it, and have resulted in a lot of mental gymnastics and esoteric statements, which are needed if we are not willing to consider the possibility that there could be anything amiss with the original paper by Price (1970).

      In this paper, I undo both of these construction errors. Undoing the second one means exploring both domains separately. In Sections 2-4 of Appendix A I explore the possibility that the Price equation is applied to data. In Section 5 of Appendix A I explore the possibility that it is used in a modelling context. The primary effort here is just to do it right, and I have not read anything to suggest that I did not succeed in doing this. Secondarily, of course, I also want to contrast this to what happens in the existing literature. That is what this point by the reviewer is about. It is therefore important to be aware that seeing the contrast accurately is complicated by the apologetic warp in the existing literature.

      As a first effort to unwarp, I would like to point to the fact that I am not taking any position on what the Price equation should be used for. All I do here is explore (and find) possibilities, both in the statistical inference domain and in the modeling domain. I also find that there is scope for misspecification in both, and that, in both domains, we should want to avoid misspecification. The thing that I criticize in the existing literature therefore is not the choice of domain. The thing that I criticize is the insistence on, and celebration of, what is most accurately described as misspecification. This typically happens in the modeling domain.

      It is worth pointing out that those who argue in favor of the Price Equation do not see it this way: "it is a mistake to assume that it must be the evolutionary theorist, writing out covariances, who is performing the equivalent of a statistical analysis." (Gardner, West, and Wild, 2011); "Neither data nor inferences are considered here" (Rousset, 2015). From what I can tell, to the supporters of the Price equation and the regression form of Hamilton's rule, the statistical quantities involved are either population-level *descriptive* statistics (in an empirical context), or else are statistics of random variables (in a stochastic modeling context).

      Again, this description of the friction between my paper and the existing literature is predicated on the suggestion that I have only one domain in mind where the Price equation can be applied. That is not the case; I consider both.

      In the previous paragraph, the reviewer states that I “treat statistical quantities as inferential statistics”, and in this paragraph the reviewer contrasts that with the supporters of the (ungeneralized) Price equation that supposedly treat the same quantities as “descriptive statistics”. This is also beside the point, but it will take some effort to sort out the spaghetti of entangled arguments (where the spaghetti is the result of the history in this field, as indicated earlier).

      First of all, it is not unimportant to point out that the way most people use the terms “inferential statistics” and “descriptive statistics” is that the first refers to an activity, and the second to a function of a bunch of numbers, typically data. Inferential statistics is a combination of parameter estimation and model specification (those are activities). Descriptive statistics are for instance the average values of variables of interest (which makes them a function of a set of numbers). When doing inferential statistics (or statistical inference), looking at the descriptive statistics of the dataset is just a routine before the real work begins. It is important to remember that.

      Now I suppose that this reviewer uses these words a little differently. When he or she writes that I “treat statistical quantities as inferential statistics”, I assume that the reviewer means that I want to use a term like for doing statistical inference, or that, when I want to interpret such a term, I include considerations typical of statistical inference. Within the data domain, that is totally correct. In the paper I argue that there are very good reasons for this. We would like to know what the data can tell us about the actual fitness function, and if we do our statistical inference right, and choose our Price-like equation accordingly, then that means that we would be able to give a meaningful interpretation to a term like . It also means that we then have an equation that describes the genetic population dynamics accurately.

      When the reviewer states that other papers treat them as “population level descriptive statistics” in an empirical context, I have a hard time coming up with papers for which that is the case. Most papers apply the Price equation in the modeling domain (That is to say: this is true in evolution. In ecology the Price equation is often applied to data; see Pillai and Gouhier (2019) and Bourrat et al. (2023)). But even if there are researchers that apply the Price equation to data, then considering these statistical quantities as “descriptive statistics” would not make sense. Looking at the descriptive statistics alone is not an empirical exercise; it is just a routine that happens before the actual statistical inference starts. In a data context, saying that considerations that are standard in statistical inference do not apply, because one is just not doing statistical inference, is the equivalent of an admission of guilt. If you do not consider statistical significance, and never mention that sample size could matter, because you are using these terms as “descriptive statistics, not inferential statistics”, then you’re basically admitting to not doing a serious empirical study.

      Besides treating statistical quantities as descriptive statistics in a data context, the reviewer also states that, in a stochastic modeling context, other researchers treat the same statistical quantities as “statistics of random variables”. This is first of all very generous to the existing literature. I imagine that the reviewer is imagining a modeling exercise where for instance the covariance between two variables is postulated. A theory exercise would then take that as a starting point for the derivation of some theoretical result. This, however, is not what happens in most of the literature.

      There are two things that I would like to point out. First of all, postulating covariances and deriving results from assumptions regarding those covariances is not an activity that requires using the Price equation. There are many stochastic models that function perfectly fine without the Price equation. This is maybe a detail, but it is important to realize that what the reviewer probably thinks of as a legitimate theoretical exercise may be something that can very well be done without the Price equation.

      Secondly, I would like to repeat something that I have pointed out before, which is that the Price equation can be written for any transition, whether this transition is likely or unlikely, given a model, and even for transitions that are impossible. For all of those transitions, one can write the (ungeneralized) Price equation, and for all of those, the Price equation will be an identity, and it will contain the things that the reviewer refers to as “statistical quantities”. It is important to realize that these “statistical quantities”, therefore, are properties of a transition, and that every transition comes with its own “statistical quantity”. That implies that they are not properties of random variables; they reflect something regarding one transition. What one could imagine, though, is the following. To fix ideas, let’s take the Price equation in regression form, and focus on 𝛽. A meaningful modeling exercise starts with assumptions about the likelihood of all different transitions, and therefore the likelihood of different values of 𝛽 materializing – or it starts with assumptions that imply those probabilities. In a theoretical exercise, one could then derive statements about the expectation and variance of those “statistical quantities”. For instance, one can calculate the expected value 𝐸[𝛽] and the variance 𝑉𝑎𝑟[𝛽], where this expectation is a proper expectation (taken over the probabilities with which these transitions materialize) and this variance is a proper variance, for the same reason.
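      The distinction can be sketched numerically. In the toy model below (an assumption made purely for illustration, not a model from the manuscript), two parents with p-scores 0 and 1 leave two offspring in total, and each offspring picks its parent at random, independently of the p-score. Every possible transition has its own 𝛽, taken here to be the slope of offspring number on p-score, 𝐶𝑜𝑣(𝑤, 𝑝)/𝑉𝑎𝑟(𝑝) with 1/𝑁 normalisation. The proper expectation and variance are then taken over the transition probabilities.

```python
def beta(w, p):
    """Slope of fitness w on p-score p for ONE transition (1/N normalisation)."""
    n = len(p)
    mean_w = sum(w) / n
    mean_p = sum(p) / n
    cov_wp = sum((wi - mean_w) * (pi - mean_p) for wi, pi in zip(w, p)) / n
    var_p = sum((pi - mean_p) ** 2 for pi in p) / n
    return cov_wp / var_p


# Hypothetical parent population: p-scores of the two parents.
parent_p = [0.0, 1.0]

# All possible transitions (offspring numbers per parent) with their
# probabilities under the assumed neutral toy model.
transitions = [
    ((1, 1), 0.50),
    ((0, 2), 0.25),
    ((2, 0), 0.25),
]

# One beta per transition: a property of that transition, not of a random variable.
betas = [(beta(w, parent_p), prob) for w, prob in transitions]

# Proper expectation and proper variance, taken over the transition probabilities.
expected_beta = sum(b * prob for b, prob in betas)
variance_beta = sum(prob * (b - expected_beta) ** 2 for b, prob in betas)

print([b for b, _ in betas], expected_beta, variance_beta)
```

      In this sketch the individual transitions yield 𝛽 = 0, 2 and −2, while the expectation over transitions is 0, in line with the p-score having no effect on fitness in the assumed model.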

      This is what I do on page 416 of van Veelen (2005) and in Section 5 of Appendix A. I think something like this is what the reviewer may have in mind, but it is worth pointing out that this still does not mean that the 𝛽 from the Price equation for any given transition is now a property of a random variable. Much of the literature, however, is not at the level of sophistication that I imagine the reviewer has in mind – although there are papers that are; see the discussion below of Rousset and Billiard (2000) and Van Cleve (2015).

      In the appendix to this reply, I will address the quotes from Gardner, West, and Wild (2011) and Rousset (2015). This takes up some space, so that is why it is at the end of this reply.

      In short, the manuscript seems to argue that Price equation users are performing statistical inference incorrectly, whereas the users insist that they are not doing statistical inference at all.

      That is not what the manuscript argues, but I am happy to clarify. The manuscript explores both the use of the Price equation when applied to data (and therefore for statistical inference) and when applied to transitions in a model. The criticism of the existing literature is not that it performs statistical inference incorrectly. The criticism is that the literature insists on misspecification, which typically happens in a modelling context.

      The problem (and here I think the author would agree with me) arises when users of the Price equation go on to make predictive or causal claims that would require the kind of statistical analysis they claim not to be doing. Claims of the form "Hamilton's rule predicts.." or use of terms like "benefit" and "cost" suggest that one has inferred a predictive or causal relationship in the given data, while somehow bypassing the entire theory of statistical inference.

      I do not really know how to interpret this paragraph. The use of the word “data” suggests that this pertains to a data context, but I do not know what would qualify as a “predictive claim” in that domain, or how any study would go from data to a claim of the form “Hamilton’s rule predicts …”. Again, I do not really know of papers that apply the Price equation to data. None of the empirical papers reviewed in Bourke (2014), for instance, do. I would however agree that it is close to obvious that an approach that does indeed bypass the entire theory of statistical inference cannot identify causal relations in datasets. I think the examples in Section 2 of Appendix A also clearly illustrate that a literature in which the term “sample size” is absent cannot be doing statistical inference.

      There is also a third way to use the Price equation which is entirely unobjectionable: as a way to express the relationship between individual-level fitness and population-level gene frequency change in a form that is convenient for further algebraic manipulation. I suspect that this is actually the most common use of the Price equation in practice.

      I am not sure if I understand what it means for the Price equation to “express the relationship between individual-level fitness and population-level gene frequency change”. That is a bit reminiscent of how John Maynard Smith saw the Price equation (Okasha, 2005), but he also emphasized that he was unable to follow George Price and his equation. For sure, it cannot be that one side of the Price equation reflects something at the individual level and the other something at the population level, because both sides of the Price equation are equally aggregated over the population. Just to be safe, and to avoid unwarranted associative thinking, I would therefore choose to be minimalistic, and say that the Price equation is an identity for a transition between a parent population and an offspring population.
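      That the Price equation is an identity for any transition can be checked numerically. The sketch below assumes the standard form 𝑤̄Δ𝑝̄ = 𝐶𝑜𝑣(𝑤, 𝑝) + 𝐸(𝑤Δ𝑝), with 1/𝑁 normalisation throughout; the parent population, the offspring numbers, and the function name `price_terms` are made up, and the numbers are deliberately arbitrary.

```python
def price_terms(w, p, p_offspring):
    """Return (left-hand side, Cov-term, E-term) of the Price equation.

    w[i]           number of offspring of parent i
    p[i]           p-score of parent i
    p_offspring[i] average p-score of the offspring of parent i
                   (irrelevant when w[i] == 0, since it gets weight zero)
    """
    n = len(w)
    mean_w = sum(w) / n
    mean_p = sum(p) / n
    # Average p-score in the offspring population.
    mean_p_next = sum(wi * qi for wi, qi in zip(w, p_offspring)) / sum(w)

    lhs = mean_w * (mean_p_next - mean_p)
    cov_term = sum((wi - mean_w) * (pi - mean_p) for wi, pi in zip(w, p)) / n
    e_term = sum(wi * (qi - pi) for wi, pi, qi in zip(w, p, p_offspring)) / n
    return lhs, cov_term, e_term


# An arbitrary, hypothetical transition between a parent and an offspring population.
w = [3, 1, 0, 2]
p = [1.0, 0.0, 1.0, 0.5]
p_offspring = [1.0, 0.0, 0.0, 0.75]

lhs, cov_term, e_term = price_terms(w, p, p_offspring)
print(lhs, cov_term + e_term)
```

      Both sides are computed from the same aggregated quantities of the two populations, so they agree whatever numbers are plugged in; nothing about the likelihood of the transition enters.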

      Regardless of the words we choose, however, the question of how harmless or objectionable the use of the Price equation is in the literature is absolutely relevant. In earlier papers I have tried to cover a spectrum of examples of different ways to use (or misuse) the Price equation. In van Veelen (2005) I cover Grafen (1985a), Taylor (1989), Price (1972), and Sober and Wilson (2007). The main paper that is discussed in van Veelen et al. (2012) is Queller (1992b), but Section 7 of that paper also discusses the way the Price equation is used in Rousset and Billiard (2000), Taylor (1989), Queller (1985), and Page and Nowak (2002). These discussions also come with a description of how much it takes to repair them, and this varies all the way from nothing, or a bit of minor rewording, to being beyond repair.

      What is good to observe is that the papers in which the use of the Price equation is the least problematic are also the papers in which, if the reference to the Price equation were taken out, nothing really changes. These are papers that start with a model, or a collection of models, and that, at some point in the derivation of their results, point to a step that can, but does not have to, be described as using the Price equation. An example of this is Rousset and Billiard (2000); see the detailed description in Section 7 of van Veelen et al. (2012).

      I am happy to point to a few more papers on the no harm, no foul end of the spectrum here.

      Allen and Tarnita (2012) discuss properties of the dynamics in a well-defined set of models. Towards the end of the paper, a version of the Price equation more or less naturally appears. This is more of an interesting aside, though, and does not really play a role in the derivation of the core results of the paper.

      Van Cleve (2015) is similar to Rousset and Billiard (2000), in that the “application of the Price equation” there is a minor ingredient of the derivation of the results. (A detail that this reviewer may find worth mentioning, given earlier comments, is that Van Cleve (2015) writes the left-hand side of the Price equation as 𝐸(𝑤Δ𝑝|𝐩), instead of 𝑤̄Δ𝑝̄. First, two very unimportant things. Van Cleve (2015) uses 𝑤 for mean fitness, for which 𝑤̄ is a more common symbol. Another detail of lesser importance is that it includes the vector of parent p-scores in the notation, which in their notation is 𝐩. More important, however, is that Van Cleve (2015) writes 𝐸(Δ𝑝) for Δ𝑝̄, which extends the (mis)use of the symbol 𝐸 for what really is just an average. This is consistent within the Price equation, in the sense that it now denotes the average with 𝐸, both on the right-hand side and on the left-hand side of the Price equation. It can however be a little bit confusing, because when Rousset and Billiard (2000) write an expectation of the change in 𝑝, this is a proper expectation. In their case, this summarizes all possible transitions out of a given state, and weighs them by their probabilities of happening, given a state summarized by 𝑝.)

      I am also happy to extend the spectrum a bit here. Some papers on inclusive fitness do not use the Price equation at all, even though one could imagine places where it could be inserted. A nice example of such a paper is Taylor et al. (2007).

      In this paper, I hope I can be excused from taking a complete inventory of this literature, and I hope that I do not have to count how many papers fall into the different categories. This would help assess the veracity of the suspicion the reviewer has, which is that the most common use of the Price equation is entirely unobjectionable, but I just do not have the time. I would however not want to underestimate the aggregate damage done in this field. The spectrum spanned in my earlier papers does include a fair amount of nonsense results. This typically happens in papers that do not study a specific model or set of models, but that take the Price equation as their point of departure for their theorizing. Also there seems to be a positive correlation between how exalted and venerating the language is that is used when describing the wonders and depths of the Price equation, and how little sense the claims make that are “derived” with it.

      We also should not set the bar too low. This is a literature that, at the starting point, has a few construction errors in it, as described in the paper. That is reason for concern. Moreover, one of the main end products of this literature is what we send our empiricists to the field with. As Section 8 of van Veelen et al. (2017) indicates, what we have supplied to our empiricists to work with is nothing short of terrible. I would therefore want to maintain that the damage done is enormous, and if there are also a few papers around that may use the ungeneralized Price equation in an innocuous way, then that is not enough redemption for my taste. We are still facing a literature in which, at every instance where the Price equation is used, we still need to check in which category it falls.

      For a paper that aims to clarify these thorny concepts in the literature, I think it is worth pointing out these different interpretations of statistical quantities in the Price equation (descriptive statistics vs inferential statistics vs algebraic manipulation). One can then critique the conclusions that are inappropriately drawn from the Price equation, which would require rigorous statistical inference to draw. Without these clarifications, supporters of the Price equation will again argue that this manuscript has misunderstood the purpose of the equation and that they never claimed to do inference in the first place.

      I would like to return to the point that I made at the beginning of my response to point (4), which is that the “thorniness” of these concepts is the result of the warp in the literature, resulting from the construction errors in Price (1970). If people want to understand how to apply the Price equation right, I think that reading Appendix A and B would work just fine. Again, I have not read anything that suggests that there is anything incorrect in there, so if the literature contains “thorny” concepts, it might just be that this is the result of the mental gymnastics necessitated by the unwillingness to accept that there might be something not completely right with Price (1970). Moreover, given my experiences in the field, I am not sure that there is anything that I could say that would convince the supporters of the ungeneralized Price equation.

      (5) "True" models: Even if one accepts that the statistical quantities in the Price equation are inferential in nature, the author appears to go a step further by asserting that, even in empirical populations, there is a specific "true" model which it is our goal to infer. This assumption manifests at many points in the SI when the author refers to the "true model" or "true, underlying population structure" in the context of an empirical population.

      Again, in Appendix A I explore both a data context and a modeling context. In the modeling context none of this applies, because in such a context, there is only the model that we postulate. In the part in which I explore what the Price equation can do in a data context, I do indeed use words like “true model” or "true underlying population structure".  

      I do not think it is necessary or appropriate, in empirical contexts, to posit the existence of a Platonic "true" model that is generating the data. Real populations are not governed by mathematical models. Moreover, the goal of statistical inference is not to determine the "true model" for given data but to say whether a given statistical model is justified based on this data. Fitting a linear model, for example, does not rule out the possibility there may be higher-order interactions - it just means we do not have a statistical basis to infer these higher-order interactions from the data (say, because their p-scores are insignificant), and so we leave them out.

      This remark suggests that the statistical approach in Sections 2-4 of Appendix A is more naïve than it should be, and that I would overlook the possibility of, for instance, interaction effects that are really nonzero, but that are statistically not significant. Now first of all, at a superficial level, I would like to say that this strikes me as somewhat inconsistent. In the remarks further back, the reviewer seems to excuse those that use the Price equation on data without any statistical considerations whatsoever. The reason why the reviewer is giving them a pass, is that they are “just not doing statistical inference”. Instead, they are doing this whole other thing with, you know, descriptive statistics. As I indicated above, that is just a fancy way of saying that they are not doing serious statistics – or serious empirics, for that matter.

      In this comment, on the other hand, the reviewer also suggests that the statistics I use to fill the total absence of any statistical considerations are not quite up to snuff. Below, I will indicate why that is not the case at all, but I think it is also worth registering a touch of irony there.

      In order to address this issue, it is worth first observing that the whole of classical statistics is based on probability theory in the following sense. We are always asking ourselves the question: if the data generating process works like this, what would the likelihood be of certain outcomes (datasets); and if the data generating process works some other way (sometimes: the complement of whatever “this” is), what would the likelihood then be of the same outcomes. By comparing those, we draw inferences about the underlying data generating process (which is a word suggestive of a “Platonic” world view that the reviewer seems to reject). Therefore, if one would impose a ban on using Platonic words like “true data generating process”; “actual fitness function”; or “the population structure that is out there”, it would be impossible to teach any course in statistics, basic or advanced. Also it would be impossible to practice, and talk about, applied statistics.

      Now the reviewer claims that “Real populations are not governed by mathematical models”. I do not really know if I agree or disagree with that statement, but the example that the reviewer gives does not fit that claim. The reviewer suggests that if we find a higher order term not to be statistically significant (and therefore we reject the hypothesis that it is nonzero), then that would not necessarily mean that it is not there. That is totally true, and statisticians tend to be fully aware of that. But that does not imply that there is no true data-generating process; the whole premise of this example is that there is, but that the sample size is not large enough to determine it in a detailed enough way so as to include this interaction effect, that apparently is small relative to the sample size.

      The third thing to reflect on here, is that the reviewer seems to suggest that the Generalized Price equation in regression form, as presented in my paper, comes with a specific statistical approach, that he or she classifies as philosophically naïve or unsophisticated. That, however, is not the case, and I am very grateful that this remark by this reviewer allows me to make a point that I think shines a light on how the Generalized Price equation puts the train that started going off the rails in 1970 back on track, and reconnects it with the statistics it borrows its terminology from. To see that, it is good to be aware that statistics never gives certainty. The whole discipline is built around the awareness that it is possible to draw the wrong inference, and the aim is to determine, minimize, and balance, the likelihoods of making different wrong inferences. So, statistics produces statements about the confidence with which one can say that something works one way or the other. In some instances, the data are not enough to say anything with any confidence. In other cases, the data are rich enough so that it is really unlikely that we incorrectly infer that for instance a certain gene matters for fitness.

      The nice thing about the setup with the Generalized Price equation, is that those statistical considerations translate one-to-one to considerations regarding which Price-like equation to choose. If the data do not allow us to pick any model with confidence, then we should be equally agnostic about which Price-like equation describes the population genetic dynamics accurately. If the statistics gives us high confidence that a certain model matches the data, then we should pick the matching Price-like equation with the same confidence. This also carries over to higher level statistical considerations.

      If we think about terms that, if we would gather a gargantuan amount of data, might be statistically significant, but very small, then economists call those statistically significant, but economically insignificant. When rejecting the statistical significance on the basis of a not gargantuan dataset, statisticians are aware that terms that really have a zero effect, as well as terms, the effect of which is really small, are rejected with the same statistical test – and that we should be fine with that. All such considerations carry over to what we think of regarding the choice of a Price-like equation to describe the population genetic dynamics. Even if people disagree about whether or not to include a term that is statistically significant, but relatively small, such a disagreement can still happen within this setup, and just translates to a disagreement on which Price-like equation to choose.

      Similarly, people could also disagree about whether it is justified to use polynomials to characterize a fitness function. If we decide that we can, because of Taylor expansions, then the core result of the paper implies that the population genetic dynamics can be summarized by a generalized Hamilton’s rule (as long as the fitness function includes a constant and a linear term regarding the p-score). On the other hand, if we do not believe this is justified, and prefer to use an altogether different family of fitness functions, then we can no longer do this. All of this leaves space for all kinds of statistical considerations and disagreements, that just carry over to the choice for one or the other Price-like equation as an accurate description of the population genetic dynamics. Or, if one does not believe polynomials should be used, then this leads to not picking any Price-like equation at all.

      So, this is a long way of saying that the Generalized Price equation creates space for all statistical considerations to regain their place, and does not hinge on one approach to statistics or another.

      What we can say is that if we apply the statistical model to data generated by a probabilistic model, and if these models match, then as the number of observations grows to infinity, the estimators in the statistical model converge to the parameters of the data-generating one.

      But this is a mathematical statement, not a statement about real-world populations.

      Again, I do not know if I agree or disagree with the last sentence. However, that does not really matter, because either option only has implications for how we are to think of the relation between a Price-like equation describing a population genetic dynamics and real-world populations. It is not relevant for the question which Price-like equation to pick, or whether to pick one at all.
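
      For readers who would like to see the convergence statement quoted above at work, the following is a minimal numerical sketch, not part of the manuscript. It assumes a linear data-generating model with p-scores of 0 or 1, Gaussian noise, and arbitrarily chosen parameter values; the function names `simulate` and `ols_fit` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

# Assumed (illustrative) parameters of the data-generating model
# w = alpha + beta_p * p + beta_q * q + noise
TRUE = np.array([1.0, -0.2, 0.5])

def simulate(n, rng, noise_sd=1.0):
    """Draw n individuals with own p-score p, partner p-score q, and response w."""
    p = rng.integers(0, 2, size=n).astype(float)
    q = rng.integers(0, 2, size=n).astype(float)
    w = TRUE[0] + TRUE[1] * p + TRUE[2] * q + rng.normal(0.0, noise_sd, size=n)
    return p, q, w

def ols_fit(p, q, w):
    """Least-squares estimates for the matching statistical model."""
    X = np.column_stack([np.ones_like(p), p, q])
    coef, *_ = np.linalg.lstsq(X, w, rcond=None)
    return coef

rng = np.random.default_rng(0)
errors = {}
for n in (100, 10_000, 200_000):
    est = ols_fit(*simulate(n, rng))
    # Largest absolute deviation of any estimate from its true value
    errors[n] = float(np.max(np.abs(est - TRUE)))
    print(n, est.round(3), round(errors[n], 4))
```

      With matching models, the estimates settle close to the assumed parameters as the number of observations grows. How fast they do so in any one run depends on the random draw.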

      A resolution I suggest to points 3, 4, and 5 above is:

      *A priori, the statistical quantities in the Price Equation are descriptive statistics, pertaining only to the specific population data given.

      *If one wishes to impute any predictive power, generalizability, or causal meaning to these statistics, all the standard considerations of inferential statistics apply. In particular, one must choose a statistical model that is justified based on the given data. In this case, one is not guaranteed to obtain the standard (linear) Hamilton's rule and may obtain any of an infinite family of rules.

      *If one uses a model that is not justified based on the given data, the results will still be correct for the given population data but will lack any meaning or generalizability beyond that.

      *In particular, if one considers data generated by a probabilistic model, and applies a statistical model that does not match the data-generating one, the results will be misleading, and will not generalize beyond the randomly generated realization one uses.

      Of course, the author may propose a different resolution to points 3-5, but they should be resolved somehow. Otherwise, the terminology in the manuscript will be incorrect and the ms will not resolve confusion in the field.

      I have outlined my solutions extensively above. I really appreciate that Reviewers #1 and #2 have spent time and attention on the manuscript and on the long appendices.  

      Appendix to the response to reviewer #2: Some remarks on Gardner, West & Wild (2011), Frank (2012), and Rousset (2015)

      An accurate response to the quote from Gardner, West, and Wild (2011) in the review report takes up space. I therefore wanted to put that in an appendix to the response to reviewer #2. I also include a few paragraphs regarding Frank (2012) and Rousset (2015), both of which are also mentioned by reviewer #2. All of this might also be of interest to people that are curious about how what I find in my paper relates to the existing literature.

      Gardner, West & Wild (2011)

      The quote I am responding to is “it is a mistake to assume that it must be the evolutionary theorist, writing out covariances, who is performing the equivalent of a statistical analysis”. I want to put that into context, so I will go over the whole paragraph that surrounds the quote. The paragraph is called Statistics and Evolutionary Theory and can be found on page 1038 of the paper. I think that it is worth pointing out that it is not easy to respond to their somewhat impressionistic collages of words and formulas. I will therefore cut the paragraph up into a few smaller bits and try to make sense of it bit by bit. The paragraph begins with:

      “Our account of the general theory of kin selection has been framed in statistical terms.”

      Based on what they write two sentences down, the best match between those words and what they do in the paper would be: “our account uses words like “covariance”, “variance” and “expectation” for things that are not what “covariance”, “variance” and “expectation” mean in probability theory and statistics.” I would be totally open to an argument why that is nonetheless OK to do, but the way Gardner, West, and Wild (2011) phrase it obscures the fact that this needs any justification or reflection at all. “Framing something in statistical terms” is unspecific enough to sound completely harmless.

      “The use of statistical methods in the mathematical development of Darwinian theory has itself been subjected to recent criticism (van Veelen, 2005; Nowak et al., 2010b), so we address this criticism here.”

      Also here, specifics would be helpful. The “use of statistical methods” sounds like it is more than just using terms from statistics, so this might refer to the minimizing of the sum of squared differences, which is also mentioned a sentence down in Gardner, West, and Wild (2011). If it does, then it is worth observing that in statistics, the minimizing of the sum of squared differences (or residuals, or errors) comes with theorems that point very clearly to what is being achieved by doing this. The Gauss–Markov theorem states that the ordinary least squares (OLS) estimator has the lowest variance within the class of linear unbiased estimators. This implies that minimizing the sum of squared errors helps answer a well-defined question in statistics; under certain conditions, an OLS estimator is our best shot at uncovering an unknown relation between variables. To also minimize a sum of squared differences, but now in the modeling domain, qualifies as “use of statistical methods” only in a very shallow way. It means that a similar minimization is performed. Without an equivalent of the Gauss–Markov theorem that would shine a light on what it is that is being achieved by doing so, that does not carry the same weight as it does in the statistics domain – in that it does not carry any weight at all.
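
      To make concrete what the Gauss–Markov theorem delivers in the statistics domain, here is a small sketch that compares the sampling variance of the OLS slope with that of one other linear unbiased estimator of the slope. The setting is an assumption made for illustration: a simple regression with uncorrelated errors of equal variance, ten equally spaced values of the regressor, and an "endpoint" estimator (the slope through the first and last observation) chosen purely as a convenient competitor. None of the names below come from the manuscript.

```python
import numpy as np

def slope_variance(weights, sigma2=1.0):
    """Variance of the linear estimator sum_i weights[i] * y[i] when the
    errors are uncorrelated with common variance sigma2."""
    return sigma2 * float(np.sum(weights ** 2))

def is_unbiased_for_slope(weights, x, tol=1e-12):
    """A linear estimator is unbiased for the slope of y = a + b*x + error
    if its weights sum to 0 and their inner product with x equals 1."""
    return abs(weights.sum()) < tol and abs(weights @ x - 1.0) < tol

x = np.arange(10, dtype=float)  # assumed design points

# Weights that reproduce the OLS slope estimator
ols_weights = (x - x.mean()) / np.sum((x - x.mean()) ** 2)

# Illustrative competitor: slope through the first and last observation
endpoint_weights = np.zeros_like(x)
endpoint_weights[0] = -1.0 / (x[-1] - x[0])
endpoint_weights[-1] = 1.0 / (x[-1] - x[0])

var_ols = slope_variance(ols_weights)
var_endpoint = slope_variance(endpoint_weights)
print("OLS slope variance:     ", var_ols)
print("endpoint slope variance:", var_endpoint)
```

      Both estimators are unbiased, and the OLS one has the smaller variance, which is what the theorem guarantees for every linear unbiased competitor under these assumptions. There is no counterpart to this guarantee when the same minimization is carried out on a postulated model.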

      “The concern is that statistical terms – such as covariances and least-squares regressions – should properly be reserved for conventional statistical analyses, where hypotheses are tested against explicit data, and that they are out of place in the foundations of evolutionary theory (van Veelen, 2005; Nowak et al., 2010b).”

      Again, a few things are a bit vague. What are “explicit data”? Are there data that are not explicit? Why the generic “foundations of evolutionary theory”, instead of a more specific description of what these statistical terms are used for? But either way, this is a misrepresentation of what I wrote in van Veelen (2005). I did not suggest to “reserve statistical terms for conventional statistical analysis” just because. As I do here in the current paper, what I did there was explore the possibilities for the Price equation to help with what I then called Type I and Type II questions. Type I questions find themselves in the modeling domain and Type II questions find themselves in the statistical domain. I was not arguing for a ban on applying statistical concepts outside of the domain of statistical inference. All that I said is that in its current practice, it does not really help answer questions of either type.

      “However, this concern is misplaced. First, natural selection is a statistical process, and it is therefore natural that this should be defined in terms of aggregate statistics, even if only strictly by analogy (Frank, 1997a, 1998).”

      This is a vague non-argument. Almost nothing is well-defined here. What does it mean for natural selection to be a statistical process? Is that just an unusual term for a random process? If so, then I suppose I agree, but that has nothing to do with what I state or claim. And what does it mean to be defined in terms of aggregate statistics? What is the alternative? I have no idea how any of this relates to anything that I claim or state in my papers.

      “Second, Fisher (1930, p198) coined the term ‘covariance’ in the context of his exposition of the genetical theory of natural selection, so the evolutionary usage of this term has precedent over the way the term is used in other fields.”

      This is what I would call a “historic fallacy”. The fact that Fisher coined the term “covariance” in a book on genetics and natural selection does not mean that any “evolutionary usage” of the term “covariance”, however nonsensical, now has precedent over the way the term is used in other fields. Irrespective of the path that the history of science, genetics, or statistics took, right now we are in a place where nearly every student at every university anywhere in the world who takes a course in probability theory and/or statistics learns that covariance is a property of a random variable (see also Wikipedia). And they do for a very good reason; it is essential in recognizing the relation between probability theory on the one hand and statistics on the other. Being curious how this “evolutionary usage” of the term covariance works, if covariance turns out not to be a property of a random variable, is therefore perfectly justified, and “Fisher coined the term” is not a safe word that exempts it from scrutiny.

      “Third, it is a mistake to assume that it must be the evolutionary theorist, writing out covariances, who is performing the equivalent of a statistical analysis.”

      Again, that is just not what anyone is saying. Nobody is suggesting that an evolutionary theorist should perform the equivalent of statistical analysis. All I did was point to how little is being achieved by transferring formulas from statistics to a modeling context.

      “A better analogy is to regard Mother Nature in the role of statistician, analysing fitness effects of genes by the method of least-squares, and driving genetic change according to the results of her analyses (cf. Crow, 2008).”

      I have no idea what any of this means. Mother Nature is a personification of something that is not a person, and that does not have cognition. Without sentience, “Mother Nature” cannot assume the role of statistician, and cannot analyse fitness effects.

      “More generally, analogy is the basis of all understanding, so when isomorphisms arise unexpectedly between different branches of mathematics (in this case, theoretical population genetics and statistical least-squares analysis) this represents an opportunity for advancing scientific progress and not an anomaly that is to be avoided.”

      This is a strawman argument, puffed up with platitudes. Nobody is arguing against analogies. But what is the analogy supposed to be here? Just taking least squares from statistical inference and performing it in a modeling context does not make it an analogy. The Gauss–Markov theorem, which is the basis for why least squares helps answer questions in statistics, just does not mean anything in a modeling context. OLS in modeling is just willful misspecification, and nothing that it does in statistics translates to anything meaningful in modeling. Again, declaring it an analogy, or an isomorphism, does not make it one.

      Frank (2012)

      Because the reviewer also mentions Frank (2012), I would like to include a small remark on this paper too. “Natural Selection. IV. The Price equation” by Frank (2012) is partly a response to my earlier criticism of the use of the Price equation. Much like Gardner, West, and Wild (2011), I would describe this paper as what is called a “flight forwards” in Dutch. While the questions I ask are relatively prosaic (such as: how does the Price equation help derive a prediction from model assumptions?), Frank (2012) pivots to suggesting that there is a profound philosophy-of-science disagreement that I am on the wrong side of. It is close to impossible to respond to Frank (2012), because it is a labyrinth of arguments that sound deep and impressive, but that are just not specific enough to know how they relate to points that I made – or even just what they mean in general. Just to pick a random paragraph:

      “Is there some reorientation for the expression of natural selection that may provide subtle perspective, from which we can understand our subject more deeply and analyse our problems with greater ease and greater insight? My answer is, as I have mentioned, that the Price equation provides that sort of reorientation. To argue the point, I will have to keep at the distinction between the concrete and the abstract, and the relative roles of those two endpoints in mature theoretical understanding.”

      For many of those terms, I have no real idea what they mean, and reading the rest of the paper also does not help in understanding what this has to do with the more prosaic questions that are waiting for an answer. What is “reorientation”? What does “concrete” versus “abstract” have to do with the question of what is being achieved by doing least squares regressions in modeling? What would be an example of a mature and an immature theoretical understanding?

      Rousset (2015) is also mentioned by the reviewer. This paper is not esoteric. It states, as reviewer #2 points out, that "neither data nor inferences are considered". This paper therefore finds itself in the modeling domain, and not in the data domain. It does however still dodge the question what the benefits are of misspecification in the modeling domain. As a matter of fact, it denies that there is misspecification at all.

      “In the presence of synergies, the residuals have zero mean and are uncorrelated to the predictors. No further assumption is made about the distribution of the residuals. Thus, there is no sense in which the regression is misspecified.”

      This is a remarkable quote, and testament to the lasting impact of the construction errors in Price (1970). Misspecification is literally defined as getting the model wrong. In statistics, avoiding misspecification can be complicated, because of the noise in the data. The real data-generating process is unknown, and because of the noise, there is always the possibility that data that are generated by one model look like they could also have been generated by another. The challenge is to reduce the odds of getting the model wrong to acceptable proportions, which is what statistical tests are for. But in modeling, we know what the model is; it is postulated by the modeler. Therefore, misspecification can be avoided by just not replacing it with a different model.

      What is being discussed in this part of Rousset (2015) is replacing what in this manuscript is called Model 3 (𝑤<sub>𝑖</sub> = 𝛼 + 𝛽<sub>1,0</sub>𝑝<sub>𝑖</sub> + 𝛽<sub>0,1</sub>𝑞<sub>𝑖</sub> + 𝛽<sub>1,1</sub>𝑝<sub>𝑖</sub>𝑞<sub>𝑖</sub> + 𝜀<sub>𝑖</sub>) with Model 2 (𝑤<sub>𝑖</sub> = 𝛼 + 𝛽<sub>1,0</sub>𝑝<sub>𝑖</sub> + 𝛽<sub>0,1</sub>𝑞<sub>𝑖</sub> + 𝜀<sub>𝑖</sub>), and choosing the parameters in Model 2 so that it is as close as it can be to Model 3. This is just the definition of misspecification. That is to say: the misspecification part is the choosing of Model 2 as a reference model. The minimizing of the sum of squared residuals one could consider as minimizing the damage.

      While Rousset (2015) finds itself in the modeling domain, it does nonetheless point to the field of statistics here, by stating that “the residuals have zero mean and are uncorrelated to the predictors”. From this, the paper concludes that “there is no sense in which the regression is misspecified”. That is just plain wrong. Minimizing the sum of the squared residuals guarantees that the residuals are uncorrelated with the variables that are included in the reference model, with respect to which the squared sum of residuals is minimized. The criterion that Rousset (2015) uses is that the model is well-specified if there is no correlation between the residuals (here: 𝜀<sub>𝑖</sub>) and the variables included in the reference model (here: 𝑝<sub>𝑖</sub> and 𝑞<sub>𝑖</sub>). But according to this criterion, all models would always be well-specified, and no model could ever be misspecified. The correct criterion, however, also requires that the residuals are not correlated with variables not included in the reference model. And here, the residuals are in fact correlated with 𝑝<sub>𝑖</sub>𝑞<sub>𝑖</sub>, which is the variable that is included in Model 3, but not in Model 2. Therefore, according to the correct version of this criterion, this model is in fact misspecified – as it should be, because getting the model wrong is the definition of misspecification.

      In order to make sure that there can be no misunderstanding, I have added subsections at the end of Section 2 and Section 4 of Appendix A, and at the end of Section 2 of Appendix B. These subsections show that the algebra of minimizing the sum of squared errors implies that there is no correlation between the errors, or the residuals, and the variables that are included in the model. This is by no means something new; it is the reason why we do OLS to begin with. For additional details about misspecification, I would refer to Section 1b (viii) in van Veelen (2020).
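
      The point about residuals can also be checked numerically. The sketch below is illustrative only: it assumes a noise-free fitness function of the Model 3 type (own p-score, partner p-score, and their product), arbitrarily chosen coefficients, and an arbitrarily chosen population of 100 pairs. It then fits the Model 2 form (no interaction term) by least squares and reports the covariance of the residuals with the included variables and with the omitted product term. The helper `cov` and all numbers are assumptions of this example.

```python
import numpy as np

def cov(a, b):
    """Population covariance of two equally long arrays."""
    return float(np.mean(a * b) - np.mean(a) * np.mean(b))

# Assumed population: counts of individuals per (own p-score, partner p-score)
cells = {(0, 0): 40, (1, 0): 10, (0, 1): 10, (1, 1): 40}
p = np.concatenate([np.full(n, pi, dtype=float) for (pi, qi), n in cells.items()])
q = np.concatenate([np.full(n, qi, dtype=float) for (pi, qi), n in cells.items()])

# True fitness follows the Model 3 form, with illustrative coefficients
w = 1.0 - 0.2 * p + 0.4 * q + 0.5 * p * q

# Reference model of the Model 2 form: constant, p, and q only
X2 = np.column_stack([np.ones_like(p), p, q])
coef, *_ = np.linalg.lstsq(X2, w, rcond=None)
resid = w - X2 @ coef

cov_p = cov(resid, p)       # included variable
cov_q = cov(resid, q)       # included variable
cov_pq = cov(resid, p * q)  # omitted variable
print("mean residual:", resid.mean())
print("Cov(resid, p):", cov_p)
print("Cov(resid, q):", cov_q)
print("Cov(resid, p*q):", cov_pq)
```

      The first three numbers are zero up to rounding error whatever the true fitness function is, because that is what the least-squares algebra enforces. Only the last one carries information about whether the reference model leaves something out.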

      Finally, there is a detail worth noticing. In the main text, as well as in Appendix B, I use an analogy (and, unlike what Gardner, West, and Wild, 2011, refer to as an analogy, this actually is one). This is an analogy between two choices. On the one hand, there is the choice between Price-like equation 1 (based on Model 1 as a reference model) and Price-like equation 2 (based on Model 2 as a reference model) both applied to Model 2. On the other hand, there is the choice between Price-like equation 2 (based on Model 2 as a reference model) and Price-like equation 3 (based on Model 3 as a reference model) both applied to Model 3. Model 1 is the non-social model, Model 2 is the social model without interaction term, and Model 3 is the social model with interaction term. That makes the first choice a choice between treating a social model as a social model, or as a non-social model. The second choice is between treating a social model with interaction term as a social model with interaction term, or as a social model without interaction term. The power of this analogy is that every argument against treating the social model as if it is a non-social model is also an argument against treating the social model with interaction term as if it is a social model without interaction term.

      This ties in with the incorrect criterion for when a model is well-specified from Rousset (2015) as follows. His criterion (that there should be no correlation between the residuals and the variables in the model) declares the social model without interaction term well-specified as a reference model, when we are considering a social model with interaction term. According to the same criterion, however, the non-social model would also have to be declared to be well-specified as a reference model, when the model we are considering is a social model. The reason is that also here, there is no correlation between the residuals and the variables that are included in this model. This is clearly not what anyone is advocating for, and for good reasons. The residuals here would, after all, be correlated with the p-score of the partner, which is a variable that is not included in the non-social model. This is a good indication that we should not use the non-social model for a social trait.

      Reviewer #3 (Public review):

      Before responding to this review, I would like to express that I appreciate the fact that the reviews and the responses are public at eLife. Besides just being useful in general, this also allows readers to get a behind the scenes glimpse into the state of the field, and the level of the reviewing. While the reports by Reviewers #1 and #2 show openness and an interest in getting things right, the report by Reviewer #3 is representative of the many review reports that I have received from the inclusive fitness community in the past. These reports tend to be rhetorically strong, and to those who do not have the time to dig deeper in the details, these reports are probably also convincing. I will therefore go through this review line by line to show how little there is behind the confident off-hand dismissal.

      There is an interesting mathematical connection – an "isomorphism" – between Price's equation and least-squares linear regression.

      This is esoteric and needlessly vague. Why is the word “isomorphism” used? In mathematics, an isomorphism is a structure-preserving mapping. The Price equation is an equation, or an identity, which makes it a bit difficult to imagine what the set of objects is on one end of the mapping. Least-squares linear regression can perhaps be seen as a function of a dataset, which would make it a single object (one function). This complicates things at the other end of the mapping too, if that set is a singleton set. The only isomorphism that I can think of is a trivial isomorphism where one equation is mapped onto one function and vice versa. It seems unlikely that this is what the reviewer means. The word isomorphism moreover is in quotes, so maybe this is supposed to be figurative. But what would it be that is being suggested here by this figure of speech? Just saying that there is, as the reviewer puts it, an “interesting mathematical connection”, does not make it so. It would already be a start to just specify what the mathematical connection is, because I have a hard time seeing what that would be. Is it just that, if you divide the Cov(𝑤, 𝑝)-term by the Var(𝑝)-term, then you get a regression coefficient? If that is what the reviewer has in mind, that would be a rather shallow observation.

      Some people have misinterpreted this connection as meaning that there is a generality-limiting assumption of linearity within Price's equation, and hence that Hamilton's rule – which is derived from Price's equation – provides only an approximation of the action of natural selection.

      Here, the reviewer pulls a switcheroo. The use of the word “general”, or “generality”, here refers to the fact that the classical Price equation is an identity for all possible transitions between a parent and an offspring population. This is the sense in which the inclusive fitness literature uses the word general, and so do I in the relevant places in the manuscript. When I do, I make sure to add phrases like “in the sense that whatever the true model is, it always gets the direction of selection right”. As a consequence, the classical Hamilton’s rule is also totally general, in the same sense.

      One of the core points of the paper is that this is not unique to the classical Price equation. As a matter of fact, there is a large set of Price-like equations and Hamilton-like rules that are just as much identities, and just as general (in the sense that they get the direction of selection right for all possible transitions). Being an identity and being completely general (in this sense) therefore cannot be a decisive criterion in favour of the classical Price equation and the classical Hamilton’s rule.

      On the other hand, the way in which my Generalized Price equation and my generalized version of Hamilton’s rule are general, is that they do not restrict the statistical model with respect to which errors are squared, summed and minimized to one linear statistical model. This generalization generates the variety of Price-like equations and Hamilton-like rules mentioned above (all of which are general in the sense of always getting the direction of selection right) and it gives us the flexibility to pick one that separates terms that reflect the fitness function from terms that reflect the population state.

      In response to my generalizing the Price equation and Hamilton’s rule in this second sense, the criticism of the reviewer comes down to saying that the Price equation and Hamilton’s rule do not need generalizing, because they already are general – the switcheroo being that this refers to generality in the first sense. That makes it sound like this could be an honest mistake, confusing one way in which these can be described as general with another. However, I really hammered this point home in the manuscript. Even a cursory reading of the manuscript reveals that I am fully aware that the classical Price equation and the classical Hamilton’s rule are general in the first sense.

      It is also not helpful that, as a description of what I supposedly claim, this is impressionistic, and lacks specificity. The Price equation is an equation, or an identity. What does it mean for there to be an “assumption of linearity” within it? For the classical Price equation in covariance form (which Reviewer #2 argues is what most people think of as “the Price equation”) there is no way in which one can transform this into a meaningful statement. There is just nothing in there to which the adjective “linear” can be applied. Linearity only becomes a thing when we ask ourselves how we can interpret the regression coefficient in the classical Price equation in regression form. That would be the linearity of the statistical model the differences with which are squared, summed and minimized in the regression.

      This is in contrast to the majority view that Hamilton's rule is a fully general and exact result.

      Again, in this manuscript, I write, time and again, that the classical Hamilton’s rule is fully general (in the sense that it applies to any transition), and exact (if that means that it always gets the direction of selection right). So, this is clearly not where the contrast with the majority view lies. The contrast with the majority view is that the majority insist on misspecification, and I suggest not to do that.

      To briefly give some mathematical details: Price's equation defines the action of natural selection in relation to a trait of interest as the covariance between fitness 𝑤 and the genetic breeding value 𝑔 for the trait, i.e. Cov(𝑤, 𝑔);

      The Price equation is an identity, not a definition. When deciding on a definition, there is some freedom. We can choose to define ⊂ so that 𝐴 ⊂ 𝐵 means that 𝐴 is a strict subset of 𝐵; or we can choose to define ⊂ so that 𝐴 ⊂ 𝐵 means that 𝐴 is a (not necessarily strict) subset of 𝐵. The Price equation does not “define the action of natural selection”, because it is an identity. There is no freedom to “define” any other way.

      The more serious reason why this is conceptually also a little dangerous is the following. Imagine a locus with two alleles. Both of them are non-coding bits of DNA. Selection therefore does not act on either of them. Now imagine a parent population with an average p-score of 0.5, or, in other words, the frequency of these alleles in the parent population is 50-50. That makes the expected value of the p-score in the offspring population 0.5 too. In finite populations, however, randomness can make the p-score grow a bit larger or a bit smaller than 0.5. If the parent population is small, the variance (the expected squared deviation from 0.5) can actually be sizeable. If the p-score in the offspring population lands above 0.5, then the Price equation has a Δ𝑝̄ > 0 and a 𝐶𝑜𝑣(𝑤, 𝑝) > 0. Describing the Price equation as “defining the action of natural selection” now suggests that higher p-scores have been selected for (or, in other words, that “the action of natural selection in relation to a trait of interest” is positive). With equal probability, however, Δ𝑝̄ < 0 and therefore also 𝐶𝑜𝑣(𝑤, 𝑝) < 0, and this would then make us draw the opposite conclusion, that natural selection has acted to lower the p-scores in the population. Both of those would be wrong, because in this situation, it would have been randomness that changed the average p-score.
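
      This thought experiment is easy to simulate. The sketch below is an illustration under stated assumptions, not something taken from the manuscript: a haploid, asexual parent population of 20 individuals, half of which carry each allele, and neutral Wright–Fisher reproduction in which every offspring picks its parent uniformly at random. For each replicate it computes the change in the average p-score and Cov(𝑤, 𝑝), checks that the Price identity holds, and counts how often the covariance comes out positive or negative. The function name `neutral_generation` is hypothetical.

```python
import numpy as np

def neutral_generation(p, rng):
    """One neutral generation: offspring numbers are multinomial with equal
    probabilities, so no allele has a fitness advantage."""
    n = p.size
    w = rng.multinomial(n, np.full(n, 1.0 / n)).astype(float)  # realized fitnesses
    cov_wp = float(np.mean(w * p) - np.mean(w) * np.mean(p))
    delta_p = float(np.sum(w * p) / np.sum(w) - np.mean(p))    # change in average p-score
    return float(w.mean()), delta_p, cov_wp

rng = np.random.default_rng(1)
p = np.array([1.0] * 10 + [0.0] * 10)  # 50-50 parent population
reps = 10_000
results = [neutral_generation(p, rng) for _ in range(reps)]

# The Price identity (mean fitness times change in average p-score equals
# Cov(w, p)) holds in every replicate
max_gap = max(abs(w_bar * dp - c) for w_bar, dp, c in results)

n_pos = sum(c > 1e-12 for _, _, c in results)
n_neg = sum(c < -1e-12 for _, _, c in results)
print("largest deviation from the Price identity:", max_gap)
print("replicates with Cov(w, p) > 0:", n_pos)
print("replicates with Cov(w, p) < 0:", n_neg)
```

      The identity holds every time, and the covariance is positive about as often as it is negative, even though by construction nothing is being selected for.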

      this is a fully general result that applies exactly to any arbitrary set of (𝑔, 𝑤) data; without any loss of generality this covariance can be expressed as the product of genetic variance Var(𝑝) and a coefficient 𝑏(𝑔, 𝑤), the coefficient simply being defined as 𝑏(𝑔, 𝑤) = Cov(𝑤, 𝑔)/Var(𝑝) for all Var(𝑝) > 0; it happens that if one fits a straight line to the same (𝑔, 𝑤) data by means of least-squares regression then the slope of that line is equal to 𝑏(𝑔, 𝑤).

      Why this needs to be explained is a bit of a mystery. These “mathematical details” are in almost all Price equation papers, and they are the point of departure of my Appendix A (it is on page 7 of a more than 90 page long set of appendices). Seeing the need to explain this suggests that the reviewer thinks that there is a chance that I or anyone reading this paper would have missed this. I have not, and, more importantly, none of this invalidates the point I make in the paper.   

      All of this has already been discussed, repeatedly, in the literature.

      All of this has already been discussed, repeatedly, in the literature indeed. It is just that it does not engage with anything I write in the manuscript, or that I wrote in my other papers.

      Now turn to the present paper: the first sentence of the Abstract says "The generality of Hamilton's rule is much debated", and then the next sentence says "In this paper, I show that this debate can be resolved by constructing a general version of Hamilton's rule".

      This is correct.

      But immediately it's clear that this isn't really resolving the debate, what this paper is actually doing is asserting the correctness of the minority view (i.e. that Hamilton's rule as it currently stands is not a general result)

      It seems to me that the reason why this is “immediately clear” to this reviewer is that the reviewer has not processed the contents of the paper. I am not sure if I have to repeat this, but I am not saying that “Hamilton’s rule as it currently stands” is not general (in the sense that it always gets the direction of selection right). It is, and I say that it is a bunch of times. But so are other rules.

      and then attempting to build a more general form of Hamilton's rule upon that shaky foundation.

      I am not just “attempting to build a more general form of Hamilton's rule”. I did in fact build a more general form of Hamilton’s rule (where the generality refers to the richer set of reference statistical models).

      Predictably, the paper erroneously interprets the standard formulation of Hamilton's rule as a linear approximation and develops non-linear extensions to improve the goodness of fit for a result that is already exactly correct.

      Nowhere in the paper or the appendices do I describe the standard formulation of Hamilton’s rule (or, for that matter, any formulation of Hamilton’s rule) as an “approximation”. It is just not a word that has anything to do with this. If we are doing statistical inference, and the sum of squared errors that is minimized decreases by adding a variable in the statistical model with regard to which the sum of squared errors is minimized, then that will typically improve the goodness of fit. In statistics, that is not described as an improvement in how well the statistical model “approximates” the data, or whatever it is that the reviewer would suggest is being approximated here.
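
      What adding a variable does to the minimized sum of squared errors can be seen in a few lines. The sketch below is illustrative: the simulated data, the noise level, and the coefficients are assumptions, and the three reference models mirror the non-social model, the social model without interaction term, and the social model with interaction term. The helper `sse` is a hypothetical name.

```python
import numpy as np

def sse(X, w):
    """Minimized sum of squared errors when w is regressed on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, w, rcond=None)
    resid = w - X @ coef
    return float(resid @ resid)

rng = np.random.default_rng(2)
n = 200
p = rng.integers(0, 2, size=n).astype(float)  # own p-score
q = rng.integers(0, 2, size=n).astype(float)  # partner's p-score
# Assumed data-generating process with an interaction term and noise
w = 1.0 - 0.2 * p + 0.4 * q + 0.5 * p * q + rng.normal(0.0, 0.3, size=n)

one = np.ones(n)
sse_1 = sse(np.column_stack([one, p]), w)            # non-social reference model
sse_2 = sse(np.column_stack([one, p, q]), w)         # social, no interaction
sse_3 = sse(np.column_stack([one, p, q, p * q]), w)  # social, with interaction
print("SSE, reference model with p:          ", sse_1)
print("SSE, reference model with p and q:    ", sse_2)
print("SSE, reference model with p, q and pq:", sse_3)
```

      Because the models are nested, the minimized sum of squared errors cannot go up when a variable is added. Whether the decrease is large enough to justify the richer model is a question for a statistical test, not a matter of approximation.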

      This is not a convincing contribution. It will not change minds or improve understanding of the topic.

      There is indeed plenty of scope for this not to change minds or improve understanding of the topic. It will not change the minds or improve the understanding of those that are not really interested in getting this right. Obviously, it will also not convince those that do not read it.

      Nor is it particularly novel. Smith et al (2010, "A generalisation of Hamilton's rule for the evolution of microbial cooperation" Science 328, 1700-1703) similarly interpreted Hamilton's rule as a linear model and provided a corresponding polynomial expansion - usefully fitting the model to microbial data so as to learn something about the costs and benefits of cooperation in an empirical setting. it's odd that this paper isn't cited here.

      Let me begin by pointing to what I agree with. Given that Smith et al. (2010) and my manuscript are both in the business of generalizing Hamilton’s rule, it would be helpful to the reader if my paper included more information about how the two efforts relate. I will discuss the relation below, and I will also include that in Appendix B, and point to it in the main text. Before I do, however, I would like to point to two details in the review report that fit a pattern.

      The first is that the reviewer describes what Smith et al. (2010) do as “useful”, and seems to think of fitting polynomial expansions as a legitimate way to “learn something about the costs and benefits of cooperation in an empirical setting”. That sounds quite positive. My paper, in which I supposedly repeat this, however, is characterized as misguided. This fits a pattern: all of the reviews I received from the inclusive fitness community include a “done before”, and regularly the “done before” is described approvingly, while my paper is described as fundamentally flawed.

      Also customary is the lack of detail. What would be really useful here is something like “equation A.14 in this manuscript is the same as equation 6 in Smith et al. (2010) if we choose …”. This kind of statement would pin down the way in which what I do has been done before. That, however, would require going into detail, at the risk of finding out that what is done in my manuscript is actually quite different from what happens in Smith et al. (2010). That is also a recurrent thing. When I look up the “done before”, I typically find something that is not quite the same.

      Now on to the paper. What Smith et al. (2010) try to do is something that I wholeheartedly support. It is an empirical study that tries to capture non-linearity. A first point of order is that it is worth asking ourselves: linear or non-linear in what? For that, I would like to go back to the setup of my manuscript. Model 2 from the Main Text is linear in the two p-scores, with intercept 𝛼, coefficient 𝛽<sub>1,0</sub> on the individual’s own p-score, and coefficient 𝛽<sub>0,1</sub> on the partner’s p-score.

      In this fitness function, 𝑝<sub>i</sub> is the p-score of individual 𝑖 and 𝑞<sub>i</sub> is the p-score of the partner that individual 𝑖 is matched with. This is a standard model of social behaviour if 𝛽<sub>1,0</sub> < 0 and 𝛽<sub>0,1</sub> > 0. Such choices for 𝛽<sub>1,0</sub> and 𝛽<sub>0,1</sub> indicate that having a higher p-score decreases the fitness of individual 𝑖 and increases the fitness of its partner. Here we assume that 𝛼 = 1, 𝛽<sub>1,0</sub> = −1, and 𝛽<sub>0,1</sub> = 2. We assume that p-scores can only be 0 or 1, or, in other words, we assume that there are only cooperators and defectors in the population (or, in terms of Smith et al., 2010: cooperators and cheaters).

      For a well-mixed population, where the likelihood of being matched with a cooperator is the same for cooperators and defectors (it is equal to the frequency of cooperators for both), we can now plot the fitnesses of cooperators (red) and defectors (blue) as a function of the frequency of cooperators (Appendix 1-figure 6 left).

      We can do the same for a population with relatedness 𝑟, where the probability of being matched with a cooperator is 𝑟 + (1 − 𝑟)𝑓<sub>c</sub> for cooperators, and (1 − 𝑟)𝑓<sub>c</sub> for defectors, where 𝑓<sub>c</sub> is the frequency of cooperators (Appendix 1-figure 6 right). For relatedness 𝑟 = 0 and for the relatedness used in the right panel, cooperation is selected against at every frequency.

      Increasing relatedness further, we would find that for 𝑟 = 1/2 the lines coincide, which implies that at every frequency, cooperation is neither selected for nor against. For 𝑟 > 1/2, cooperation will be selected for at every frequency. This pattern implies that, as we have seen in the manuscript, the classical Hamilton’s rule works perfectly fine for Model 2; with 𝑐 = −𝛽<sub>1,0</sub> = 1 and 𝑏 = 𝛽<sub>0,1</sub> = 2, cooperation is selected for if and only if 𝑟𝑏 > 𝑐. The fitnesses of cooperators and defectors as functions of the frequency of cooperators, moreover, are always parallel lines, regardless of relatedness.
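
      The comparison for Model 2 can be checked numerically. The sketch below is illustrative only and is not taken from the manuscript. It assumes a fitness function that is linear in the individual's own p-score and the partner's p-score, with the parameter values stated above, and it assumes that a cooperator is matched with a cooperator with probability r + (1 − r)f_c while a defector is matched with a cooperator with probability (1 − r)f_c. All function names are hypothetical.

```python
from fractions import Fraction

ALPHA, BETA_10, BETA_01 = 1, -1, 2  # parameter values used in the text for Model 2


def match_probabilities(f_c, r):
    """Assumed matching rule: chance that the partner is a cooperator.

    Returns (probability for a cooperator, probability for a defector).
    """
    return r + (1 - r) * f_c, (1 - r) * f_c


def model2_fitnesses(f_c, r):
    """Expected fitness of cooperators (p = 1) and defectors (p = 0)."""
    q_coop, q_def = match_probabilities(f_c, r)
    w_coop = ALPHA + BETA_10 * 1 + BETA_01 * q_coop
    w_def = ALPHA + BETA_10 * 0 + BETA_01 * q_def
    return w_coop, w_def


def fitness_gap(f_c, r):
    """Cooperator fitness minus defector fitness at frequency f_c."""
    w_coop, w_def = model2_fitnesses(f_c, r)
    return w_coop - w_def


if __name__ == "__main__":
    b, c = BETA_01, -BETA_10
    frequencies = [Fraction(k, 10) for k in range(11)]
    for r in (Fraction(0), Fraction(1, 2), Fraction(3, 4)):
        gaps = {fitness_gap(f, r) for f in frequencies}
        # A single distinct gap per r means the two fitness lines are parallel.
        print(f"r = {r}: gaps = {[str(g) for g in sorted(gaps)]}, r*b - c = {r * b - c}")
```

      Under these assumptions the gap between the two lines does not depend on the frequency of cooperators and equals 𝑟𝑏 − 𝑐, which is why the sign of 𝑟𝑏 − 𝑐 settles the direction of selection at every frequency.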

      Model 3 in the main text extends Model 2 by adding an interaction term, with coefficient 𝛽<sub>1,1</sub>, in the product of the two p-scores.

      Now we choose 𝛼 = 1, 𝛽<sub>1,0</sub> = −1, 𝛽<sub>0,1</sub> = 1, and 𝛽<sub>1,1</sub> = 1. We again draw the fitnesses of cooperators and defectors, both at relatedness 𝑟 = 0 (Appendix 1-figure 7 left) and at a higher relatedness (Appendix 1-figure 7 right). In the manuscript, I argue that the appropriate version of Hamilton’s rule here is Queller’s rule: 𝑟<sub>0,1</sub>𝑏<sub>0,1</sub> + 𝑟<sub>1,1</sub>𝑏<sub>1,1</sub> > 𝑐 with 𝑐 = −𝛽<sub>1,0</sub> = 1, 𝑏<sub>0,1</sub> = 𝛽<sub>0,1</sub> = 1, and 𝑏<sub>1,1</sub> = 𝛽<sub>1,1</sub> = 1. The fitnesses of cooperators and defectors as functions of the frequency of cooperators are still straight lines, but they are no longer parallel.

      The first thing to observe, therefore, is that a model with synergy, in which the classic version of Hamilton’s rule would be misspecified, and Queller’s rule would be well-specified, does not require the fitnesses as functions of the frequencies of cooperators to be non-linear. All that changes with the addition of the interaction term, is that they stop being parallel.
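
      The point that synergy removes parallelism without introducing curvature can be illustrated with a small sketch. It is not taken from the manuscript. It assumes that Model 3 adds a term 𝛽<sub>1,1</sub> times the product of the two p-scores to a linear fitness function, uses the parameter values stated above, and assumes that a cooperator meets a cooperator with probability r + (1 − r)f_c and a defector meets a cooperator with probability (1 − r)f_c. The function names are hypothetical.

```python
from fractions import Fraction

ALPHA, BETA_10, BETA_01, BETA_11 = 1, -1, 1, 1  # values used in the text for Model 3


def model3_fitness(p, q):
    """Assumed Model 3 fitness: linear terms plus an interaction term p * q."""
    return ALPHA + BETA_10 * p + BETA_01 * q + BETA_11 * p * q


def expected_fitnesses(f_c, r):
    """Expected fitness of cooperators and defectors at cooperator frequency f_c."""
    q_coop = r + (1 - r) * f_c  # assumed chance that a cooperator meets a cooperator
    q_def = (1 - r) * f_c       # assumed chance that a defector meets a cooperator
    w_coop = q_coop * model3_fitness(1, 1) + (1 - q_coop) * model3_fitness(1, 0)
    w_def = q_def * model3_fitness(0, 1) + (1 - q_def) * model3_fitness(0, 0)
    return w_coop, w_def


def slopes(r):
    """Slopes of both fitness lines in f_c, taken between f_c = 0 and f_c = 1."""
    coop_0, def_0 = expected_fitnesses(Fraction(0), r)
    coop_1, def_1 = expected_fitnesses(Fraction(1), r)
    return coop_1 - coop_0, def_1 - def_0


if __name__ == "__main__":
    for r in (Fraction(0), Fraction(1, 2)):
        slope_coop, slope_def = slopes(r)
        print(f"r = {r}: cooperator slope = {slope_coop}, defector slope = {slope_def}")
```

      In this sketch both expected fitnesses remain linear in the frequency of cooperators, but the cooperator line is steeper than the defector line whenever 𝑟 < 1, so the gap between them changes with frequency.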

      The paper by Smith et al. (2010) is an effort to capture non-linearities in the way fitnesses depend on the frequency of cooperators. That, therefore, goes beyond the step from Model 2 to Model 3. Whether it uses the right method to capture those non-linearities, we will come back to in a second, but it is important to realize that also without these non-linearities, the classic version of Hamilton’s rule can be too limiting to accurately describe selection. (Here, I should add that this implies that we were wrong in Wu et al. (2013), when we suggested that “for this experiment, it seems unnecessary to use the generalized Hamilton’s rule, if instead the Malthusian fitness is adopted. In other words, the Wrightian fitness approach calls for a generalization of Hamilton’s rule, whereas the Malthusian fitness approach does not (or at least not in a drastic way, as Malthusian fitnesses are almost linear in the frequency of cooperators).” Using Malthusian fitnesses, the functions were close to linear, but not close to parallel, and therefore also here, Hamilton’s rule needs generalizing - albeit in a different way than Smith et al. (2010) did).

      The cooperation that is observed in the Myxococcus xanthus studied by Smith et al. (2010) is not a good match with a model where individuals are matched in pairs for an interaction that determines their fitnesses. These microbes cooperate in large groups, and a better match would therefore be the n-player public goods games studied in van Veelen (2018). There, we see that simple, straightforward ways to describe synergies (or anti-synergies) can easily lead to fitnesses not being linear in the frequency of cooperators.

      The way Smith et al. (2010) try to capture those non-linearities, however, is not free of complications. We addressed those in Wu et al. (2013), and I summarized them briefly in van Veelen (2018). One of the issues is that most of the non-linearity Smith et al. (2010) pick up is the result of considering Wrightian fitness rather than Malthusian fitness. In a continuous time model with a constant growth rate, the population size at time 𝑡 is 𝑁(𝑡) = 𝑒<sup>mt</sup>𝑁(0), where 𝑚 is the Malthusian fitness. In a discrete time model with a constant average number of offspring per individual, the population at time 𝑡 is 𝑁(𝑡) = 𝑤<sup>t</sup>𝑁(0), where 𝑤 is the Wrightian fitness. If we take 𝑚 = ln 𝑤, these are the same, and if 𝑤 is close to 1, then 𝑚 can be approximated by 𝑤 − 1. That also implies that if 𝑤 is close to 1 (or, equivalently, if 𝑚 is close to 0) one is locally linear if the other is too. However, in the experiment by Smith et al. (2010) the aggregate fitness effects are not small, and what is highly nonlinear in terms of Wrightian fitness is close to linear in Malthusian fitness.
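
      The relation between the two fitness measures can be made concrete with a short sketch. It only encodes the identities stated above, 𝑁(𝑡) = 𝑒<sup>mt</sup>𝑁(0), 𝑁(𝑡) = 𝑤<sup>t</sup>𝑁(0), and 𝑚 = ln 𝑤. The function names and the numerical values of 𝑤 are hypothetical and chosen for illustration, not taken from any data set.

```python
import math


def malthusian_from_wrightian(w):
    """m = ln(w), so that exp(m * t) equals w ** t."""
    return math.log(w)


def population_continuous(n0, m, t):
    """N(t) = exp(m t) N(0) for a constant Malthusian fitness m."""
    return math.exp(m * t) * n0


def population_discrete(n0, w, t):
    """N(t) = w^t N(0) for a constant Wrightian fitness w."""
    return (w ** t) * n0


def linearisation_error(w):
    """Gap between the approximation w - 1 and the exact m = ln(w)."""
    return (w - 1) - math.log(w)


if __name__ == "__main__":
    # Illustrative values only: the approximation m ~ w - 1 degrades as w moves away from 1.
    for w in (1.01, 1.5, 10.0, 1000.0):
        m = malthusian_from_wrightian(w)
        print(f"w = {w}: m = {m:.4f}, w - 1 = {w - 1:.4f}, "
              f"error = {linearisation_error(w):.4f}")
```

      For 𝑤 close to 1 the error is negligible, so linearity in one measure carries over to the other. For large fitness effects the two measures diverge strongly, which is the situation described above for the experiment.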

      Another complication is that the Taylor coefficients that Smith et al. (2010) find are the result of a combination of the data and the functional form they choose to first apply to their data. That means that a different choice of a functional form would have given different Taylor coefficients, while the in-between transformation can also be skipped. Also, the number of Taylor coefficients is larger than the dimensionality of the data, which are based on averages for 6 frequencies. For more details on these complications, I would like to refer to Wu et al. (2013) and van Veelen (2018). A nice detail is that if we consider the way the fitnesses of cooperators and defectors compare when using Malthusian fitnesses, then a comparison of the slopes actually suggests anti-synergies, which leads to a stable mix of cooperators and cheaters, already in the absence of population structure. This matches what is suggested by Archetti and Scheuring (2011, 2012) and Archetti (2018).

      Besides these technical complications, Smith et al. (2010) is also different, in the sense that it is an empirical paper. It does not contain the Generalized Price equation, it contains no insights regarding how to derive population genetic dynamics from the Generalized Price equation, or how to derive the appropriate rules from those, and it has a very different approach to separating fitness effects and population structure.

      To end on a positive note, I would like to quote a bit out of Wu et al. (2013):

      “While we criticise these mathematical issues, we are convinced that Smith et al. (2010) aim into the right direction: to incorporate the nonlinearities characteristic of biology into social evolution, we may have to extend and generalize the approach of inclusive fitness. It would be beautiful if such a generalization would ultimately include Hamilton’s original rule as a special case […].”

      I like to think that this is exactly what I have done in this paper.

      References

      Akdeniz, A., & van Veelen, M. (2020). The cancellation effect at the group level. Evolution, 74(7), 1246–1254. doi: 10.1111/evo.13995

      Allen, B., & Tarnita, C. E. (2012). Measures of success in a class of evolutionary models with fixed population size and structure. Journal of Mathematical Biology, 68, 109–143. doi: 10.1007/s00285-012-0622-x

      Archetti, M. (2018). How to Analyze Models of Nonlinear Public Goods. Games, 9(2), 17. doi: 10.3390/g9020017

      Archetti, M., & Scheuring, I. (2011). Coexistence of cooperation and defection in public goods games. Evolution, 65(4), 1140–1148. doi: 10.1111/j.1558-5646.2010.01185.x

      Archetti, M., & Scheuring, I. (2012). Review: Game theory of public goods in one-shot social dilemmas without assortment. Journal of Theoretical Biology, 299, 9–20. doi: 10.1016/j.jtbi.2011.06.018

      Bourke, A. F. G. (2014). Hamilton’s rule and the causes of social evolution. Philosophical Transactions of the Royal Society B: Biological Sciences, 369(1642), 20130362. doi: 10.1098/rstb.2013.0362

      Bourrat, P., Godsoe, W., Pillai, P., Gouhier, T. C., Ulrich, W., Gotelli, N. J., & van Veelen, M. (2023). What is the price of using the Price equation in ecology? Oikos, 2023(8). doi: 10.1111/oik.10024

      Crow, J. F. (2008). Commentary: Haldane and beanbag genetics. International Journal of Epidemiology, 37(3), 442–445. doi: 10.1093/ije/dyn048

      Fisher, R. (1930). The genetical theory of natural selection. Retrieved from https://www.cabidigitallibrary.org/doi/full/10.5555/19601600934

      Fletcher, J. A., & Zwick, M. (2006). Unifying the theories of inclusive fitness and reciprocal altruism. American Naturalist, 168(2), 252–262. doi: 10.1086/506529

      Frank, S. A. (1997). The Price equation, Fisher’s fundamental theorem, kin selection, and causal analysis. Evolution, 51(6), 1712–1729. doi: 10.1111/j.1558-5646.1997.tb05096.x

      Frank, S. A. (1998). Foundations of social evolution. Princeton: Princeton University Press.

      Frank, S. A. (2012). Natural selection. IV. The Price equation. Journal of Evolutionary Biology, 25(6), 1002–1019. doi: 10.1111/j.1420-9101.2012.02498.x

      Gardner, A., West, S. A., & Wild, G. (2011). The genetical theory of kin selection. Journal of Evolutionary Biology, 24(5), 1020–1043. doi: 10.1111/j.1420-9101.2011.02236.x

      Grafen, A. (1985a). A geometric view of relatedness. Oxford Surveys in Evolutionary Biology, 2(2), 28-89.

      Grafen, A. (1985b). News and Views. Evolutionary theory: Hamilton’s rule OK. Nature, 318(6044), 310–311. doi: 10.1038/318310a0

      Hamilton, W. D. (1964). The genetical evolution of social behaviour. I. Journal of Theoretical Biology, 7(1), 1–16. doi: 10.1016/0022-5193(64)90038-4

      Karlin, S., & Matessi, C. (1983). The eleventh R. A. Fisher Memorial Lecture - Kin selection and altruism. Proceedings of the Royal Society of London. Series B. Biological Sciences, 219(1216), 327–353. doi: 10.1098/rspb.1983.0077

      Matessi, C., & Karlin, S. (1984). On the evolution of altruism by kin selection. Proceedings of the National Academy of Sciences, 81(6), 1754–1758. doi: 10.1073/pnas.81.6.1754

      Nowak, M. A., Tarnita, C. E., & Wilson, E. O. (2010). The evolution of eusociality. Nature, 466(7310), 1057–1062. doi: 10.1038/nature09205

      Okasha, S. (2005). Maynard Smith on the levels of selection question. Biology and Philosophy, 20(5), 989–1010. doi: 10.1007/s10539-005-9019-1

      Page, K. M., & Nowak, M. A. (2002). Unifying evolutionary dynamics. Journal of Theoretical Biology, 219(1). doi: 10.1016/S0022-5193(02)93112-7

      Pillai, P., & Gouhier, T. C. (2019). Not even wrong: the spurious measurement of biodiversity’s effects on ecosystem functioning. Ecology, 100(7), e02645. doi: 10.1002/ecy.2645

      Price, G. R. (1970). Selection and Covariance. Nature, 227(5257), 520–521. doi: 10.1038/227520a0

      Price, G. R. (1972). Extension of covariance selection mathematics. Annals of Human Genetics, 35(4), 485-490.

      Queller, D. C. (1985). Kinship, reciprocity and synergism in the evolution of social behaviour. Nature, 318(6044), 366–367. doi: 10.1038/318366a0

      Queller, D. C. (1992a). A general model for kin selection. Evolution, 46(2), 376–380. doi: 10.1111/j.1558-5646.1992.tb02045.x

      Queller, D. C. (1992b). Quantitative Genetics, Inclusive Fitness, and Group Selection. The American Naturalist, 139(3), 540–558. doi: 10.1086/285343

      Queller, D. C. (2011). Expanded social fitness and Hamilton’s rule for kin, kith, and kind. Proceedings of the National Academy of Sciences, 108(supplement_2), 10792–10799. doi: 10.1073/pnas.1100298108

      Rousset, F., & Billiard, S. (2000). A theoretical basis for measures of kin selection in subdivided populations: Finite populations and localized dispersal. Journal of Evolutionary Biology, 13(5). doi: 10.1046/j.1420-9101.2000.00219.x

      Rousset, F. (2015). Regression, least squares, and the general version of inclusive fitness. Evolution, 69(11), 2963–2970. doi: 10.1111/evo.12791

      Smith, J., Van Dyken, J. D., & Zee, P. C. (2010). A generalization of Hamilton’s rule for the evolution of microbial cooperation. Science, 328(5986), 1700–1703. doi: 10.1126/science.1189675

      Sober, E., & Wilson, D. S. (2007). Unto others: The evolution and psychology of unselfish behavior. Retrieved from https://www.hup.harvard.edu/books/9780674930476

      Taylor, P. D. (1992). Altruism in viscous populations - an inclusive fitness model. Evolutionary Ecology, 6(4), 352–356. doi: 10.1007/bf02270971

      Taylor, P. D. (1989). Evolutionary stability in one-parameter models under weak selection. Theoretical Population Biology, 36(2), 125–143. doi: 10.1016/0040-5809(89)90025-7

      Taylor, P. D., Day, T., & Wild, G. (2007). Evolution of cooperation in a finite homogeneous graph. Nature, 447(7143), 469–472. doi: 10.1038/nature05784

      Van Cleve, J. (2015). Social evolution and genetic interactions in the short and long term. Theoretical Population Biology, 103. doi: 10.1016/j.tpb.2015.05.002

      van Veelen, M. (2005). On the use of the Price equation. Journal of Theoretical Biology, 237(4). doi: 10.1016/j.jtbi.2005.04.026

      van Veelen, M. (2007). Hamilton’s missing link. Journal of Theoretical Biology, 246(3). doi: 10.1016/j.jtbi.2007.01.001

      van Veelen, M. (2011). The replicator dynamics with n players and population structure. Journal of Theoretical Biology, 276(1). doi: 10.1016/j.jtbi.2011.01.044

      van Veelen, M. (2018). Can Hamilton’s rule be violated? eLife, 7. doi: 10.7554/eLife.41901

      van Veelen, M. (2020). The problem with the Price equation. Philosophical Transactions of the Royal Society B: Biological Sciences, 375(1797), 20190355. doi: 10.1098/rstb.2019.0355

      van Veelen, M., Allen, B., Hoffman, M., Simon, B., & Veller, C. (2017). Hamilton’s rule. Journal of Theoretical Biology, 414. doi: 10.1016/j.jtbi.2016.08.019

      van Veelen, M., García, J., Sabelis, M. W., & Egas, M. (2012). Group selection and inclusive fitness are not equivalent; the Price equation vs. models and statistics. Journal of Theoretical Biology, 299. doi: 10.1016/j.jtbi.2011.07.025

      Wilson, D. S., Pollock, G. B., & Dugatkin, L. A. (1992). Can altruism evolve in purely viscous populations? Evolutionary Ecology, 6(4), 331–341. doi: 10.1007/bf02270969

      Wu, B., Gokhale, C. S., van Veelen, M., Wang, L., & Traulsen, A. (2013). Interpretations arising from Wrightian and Malthusian fitness under strong frequency dependent selection. Ecology and Evolution, 3(5). doi: 10.1002/ece3.500

    1. eLife Assessment

      The ratio of nuclei to cell volume is a well-controlled parameter in eukaryotic cells. This study now reports important findings that expand our understanding of the regulatory relationship between cell size and number of nuclei. The evidence supporting the conclusions is convincing and was obtained by applying appropriate and validated methodology in line with the current state of the art. The paper will be of broad interest to cell biologists and fungal biotechnologists seeking to understand the mechanisms that determine cell size and number of nuclei, and why this knowledge may also matter for enzyme production, and thus for production strains, not only of Aspergillus oryzae but also of other industrially used fungi.

    2. Reviewer #1 (Public review):

      Filamentous fungi are established workhorses in biotechnology, with Aspergillus oryzae as a prominent example with a thousand-year history. Still, the cell biology and biochemical properties of the production strains are not well understood. The paper of the Takeshita group describes the change in nuclear numbers and correlates it to different production capacities. They used microfluidic devices to really correlate the production with nuclear numbers. In addition, they used microdissection to understand expression profile changes and found an increase of ribosomes. The analysis of two genes involved in cell volume control in S. pombe did not reveal conclusive answers to explain the phenomenon. It appears that it is a multi-trait phenotype. Finally, they identified SNPs in many industrial strains and tried to correlate them to the capability of increasing their nuclear numbers.

      The methods used in the paper range from high-quality cell biology, Raman spectroscopy, to atomic force and electron microscopy, and from laser microdissection to the use of microfluidic devices to study individual hyphae.

      This is a very interesting, biotechnologically relevant paper with the application of excellent cell biology.

      Comments on revised version:

      The authors addressed all suggestions satisfactorily.

    3. Reviewer #2 (Public review):

      Summary:

      In the study presented by Itani and colleagues, it is shown that some strains of Aspergillus oryzae - especially those used industrially for the production of sake and soy sauce - develop hyphae with a significantly increased number of nuclei and cell volume over time. These thick hyphae are formed by branching from normal hyphae and grow faster and therefore dominate the colonies. The number of nuclei positively correlates with the thicker hyphae and also the amount of secreted enzymes. The addition of nutrients such as yeast extract or certain amino acids enhanced this effect. Genome and transcriptome analyses identified genes, including rseA, that are associated with the increased number of nuclei and enzyme production. The authors conclude from their data involvement of glycosyltransferases, calcium channels, and the tor regulatory cascade in the regulation of cell volume and number of nuclei. Thicker hyphae and an increased number of nuclei were also observed in high-production strains of other industrially used fungi such as Trichoderma reesei and Penicillium chrysogenum, leading to the hypothesis that the mentioned phenotypes are characteristic of production strains, which is of significant interest for fungal biotechnology.

      Strengths:

      The study is very comprehensive and involves the application of diverse state-of-the-art cell biological, biochemical, and genetic methods. Overall, the data are properly controlled and analyzed, figures and movies are of excellent quality.

      The results are particularly interesting with regard to the elucidation of molecular mechanisms that regulate the size of fungal hyphae and their number of nuclei. For this, the authors have discovered a very good model: (regular) strains with a low number of nuclei and strains with a high number of nuclei. Also, the results can be expected to be of interest for the further optimization of industrially relevant filamentous fungi.

      In the revision the authors addressed all my comments and as a result produced an even stronger study.

    4. Reviewer #3 (Public review):

      Summary:

      The authors seek to determine the underlying traits that support the exceptional capacity of Aspergillus oryzae to secrete enzymes and heterologous proteins. To do so, they leverage the availability of multiple domesticated isolates of A. oryzae along with other Aspergillus species to perform comparative imaging and genomic analysis.

      Strengths:

      The strength of this study lies in the use of multifaceted approaches to identify significant differences in hyphal morphology that correlate with enzyme secretion, which is then followed by the use of genomics to identify candidate functions that underlie these differences.

      Weaknesses:

      Although the image analysis and data interpretation are convincing, the genetic data supporting the authors' model are somewhat more speculative and will likely require additional investigation.

      Overall, the authors have achieved their aims in that they are able to clearly document the presence of two distinct hyphal forms in A. oryzae and other Aspergillus species, and to correlate the presence of the thicker rapidly growing form with enhanced enzyme secretion. The image analysis is convincing. The discovery that addition of yeast extract and specific amino acids can stimulate formation of the novel hyphal form is also notable. Although the conclusions are generally supported by the results, this is perhaps less so for the genetic analysis as it remains unclear how direct the role of RseA and the calcium transporters might be in supporting the formation of the thicker hyphae.

      The results presented here will impact the field. The complexity of hyphal morphology and how it affects secretion are not well understood despite the importance of these processes for the fungal lifestyle. In addition, the description of approaches that can be used to facilitate the study of these different hyphal forms (i.e., stimulation using yeast extract or specific amino acids) will benefit future efforts to understand the molecular basis of their formation.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Filamentous fungi are established workhorses in biotechnology, with Aspergillus oryzae as a prominent example with a thousand-year history. Still, the cell biology and biochemical properties of the production strains are not well understood. The paper of the Takeshita group describes the change in nuclear numbers and correlates it to different production capacities. They used microfluidic devices to really correlate the production with nuclear numbers. In addition, they used microdissection to understand expression profile changes and found an increase in ribosomes. The analysis of two genes involved in cell volume control in S. pombe did not reveal conclusive answers to explain the phenomenon. It appears that it is a multi-trait phenotype. Finally, they identified SNPs in many industrial strains and tried to correlate them to the capability of increasing their nuclear numbers.

      The methods used in the paper range from high-quality cell biology, Raman spectroscopy, to atomic force and electron microscopy, and from laser microdissection to the use of microfluidic devices to study individual hyphae.

      This is a very interesting, biotechnologically relevant paper with the application of excellent cell biology. I have only minor suggestions for improvement. 

      We sincerely appreciate your fair and positive evaluation of our work. Thank you for your suggestions for improvement. We respond to each of them appropriately.

      Reviewer #2 (Public review): 

      Summary: 

      In the study presented by Itani and colleagues, it is shown that some strains of Aspergillus oryzae - especially those used industrially for the production of sake and soy sauce - develop hyphae with a significantly increased number of nuclei and cell volume over time. These thick hyphae are formed by branching from normal hyphae and grow faster and therefore dominate the colonies. The number of nuclei positively correlates with the thicker hyphae and also the amount of secreted enzymes. The addition of nutrients such as yeast extract or certain amino acids enhanced this effect. Genome and transcriptome analyses identified genes, including rseA, that are associated with the increased number of nuclei and enzyme production. The authors conclude from their data involvement of glycosyltransferases, calcium channels, and the tor regulatory cascade in the regulation of cell volume and number of nuclei. Thicker hyphae and an increased number of nuclei were also observed in high-production strains of other industrially used fungi such as Trichoderma reesei and Penicillium chrysogenum, leading to the hypothesis that the mentioned phenotypes are characteristic of production strains, which is of significant interest for fungal biotechnology. 

      Strengths: 

      The study is very comprehensive and involves the application of diverse state-of-the-art cell biological, biochemical, and genetic methods. Overall, the data are properly controlled and analyzed, figures and movies are of excellent quality.

      The results are particularly interesting with regard to the elucidation of molecular mechanisms that regulate the size of fungal hyphae and their number of nuclei. For this, the authors have discovered a very good model: (regular) strains with a low number of nuclei and strains with a high number of nuclei. Also, the results can be expected to be of interest for the further optimization of industrially relevant filamentous fungi.

      Weaknesses: 

      There are only a few open questions concerning the activity of the many nuclei in production strains (active versus inactive), their number of chromosomes (haploid/diploid), and whether hyper-branching always leads to propagation of nuclei. 

      We are very grateful for your recognition of our findings, the proposed model, and their significance for future applications. We are grateful for the questions, which contribute to a more accurate understanding. 

      Our responses to each are provided below.  

      Reviewer #3 (Public review): 

      Summary: 

      The authors seek to determine the underlying traits that support the exceptional capacity of Aspergillus oryzae to secrete enzymes and heterologous proteins. To do so, they leverage the availability of multiple domesticated isolates of A. oryzae along with other Aspergillus species to perform comparative imaging and genomic analysis. 

      Strengths: 

      The strength of this study lies in the use of multifaceted approaches to identify significant differences in hyphal morphology that correlate with enzyme secretion, which is then followed by the use of genomics to identify candidate functions that underlie these differences. 

      Weaknesses: 

      There are aspects of the methods that would benefit from the inclusion of more detail on how experiments were performed and data interpreted. 

      Overall, the authors have achieved their aims in that they are able to clearly document the presence of two distinct hyphal forms in A. oryzae and other Aspergillus species, and to correlate the presence of the thicker, rapidly growing form with enhanced enzyme secretion. The image analysis is convincing. The discovery that the addition of yeast extract and specific amino acids can stimulate the formation of the novel hyphal form is also notable. Although the conclusions are generally supported by the results, this is perhaps less so for the genetic analysis as it remains unclear how direct the role of RseA and the calcium transporters might be in supporting the formation of the thicker hyphae. 

      The results presented here will impact the field. The complexity of hyphal morphology and how it affects secretion are not well understood despite the importance of these processes for the fungal lifestyle. In addition, the description of approaches that can be used to facilitate the study of these different hyphal forms (i.e., stimulation using yeast extract or specific amino acids) will benefit future efforts to understand the molecular basis of their formation.

      We are very grateful for your fair and thoughtful evaluation of our work. We agree that the genetic analysis in the latter part is relatively weaker compared to the imaging analysis in the first half. Rather than a single mutation causing a dramatic phenotypic change, we believe that the accumulation of various mutations through breeding leads to the observed phenotype, making it difficult to clearly demonstrate causality. Since transcriptome and SNP analyses have revealed key pathways and phenotypes, it would be gratifying if these insights could contribute to future applications utilizing filamentous fungi.

      Reviewer #1 (Recommendations for the authors): 

      I was wondering what happens if thick hyphae were taken as inoculum for a new colony or thin hyphae. Is it possible to enrich for one or the other type of hyphae? Perhaps in the presence of yeast extract or certain amino acids. 

      Added an explanation in the discussion.

      L304-306. When thick hyphae were cultured on fresh medium, thin hyphae initially emerged, suggesting that sustained metabolic activity is required for the formation of thick hyphae with a high number of nuclei.    

      L120-121. In some cases, thick hyphae emerged by branching from thick hyphae (Fig. 2D, left), while in other cases, thin hyphae emerged from thick hyphae (Fig. 2D, right). Thin hyphae emerge in the early stage of cultivation even in the presence of yeast extract or certain amino acids.

      In the Discussion, they hypothesize that the primary effect could be on cell wall rigidity. I am wondering if that hypothesis could be tested by adding, for instance, sublethal concentrations of cytochalasin to hyphae of A. nidulans to weaken the cell wall. 

      The question is reasonable. To ensure accurate understanding, we moved Fig. S6 to Fig. 6 and revised the discussion as follows. 

      L294-295. In our model, cell wall loosening at a branching site and regulation of cell volume by turgor pressure constitute necessary conditions for increasing cell volume and maintaining thick hyphae. L306-309. Weakening the cell wall by treatment with a low concentration of calcofluor white did not lead to hyphal thickening or an increase in nuclear number. On the contrary, thick hyphae have thicker cell walls (Fig. 2H-K), which are necessary to maintain the increased cell volume.

      I recommend including some older literature. It was described already 20 years ago that A. niger differentiates hyphae with different capacities to secrete proteins (PMID: 16238620). In addition, there are old reports in A. nidulans reporting high numbers of nuclei (https://doi.org/10.1099/00221287-60-1-133). Perhaps it is worth trying to reproduce those culture conditions. At least this should be discussed. Along the same lines, the number of nuclei increases a lot in the stalk of conidiophores in A. nidulans. These observations could be used as examples that the phenomenon observed in A. oryzae may be of general importance. 

      Thank you for the suggestion. It is a very interesting proposal. We checked the nuclei distribution of A. nidulans on the media and added the following discussion.

      L328-334. A previous study reported an increase in the number of nuclei in A. nidulans (62, 63). Here, we examined the nuclear distribution of A. nidulans grown on the culture media but did not find class III hyphae like those observed in A. oryzae. Even in A. nidulans, conidiophore stalks contain a high number of nuclei. It has been shown that A. oryzae has a taller conidiophore stalk (64). In the thick hyphae of A. oryzae, the expression level of flbA, an early regulator of conidiophore development (65), was elevated. This suggests that differentiation to aerial hyphae may be involved in the increase of hyphal volume and nuclear number. 

      (62) Clutterbuck A.J. Synchronous Nuclear Division and Septation in Aspergillus nidulans. J Gen Microbiol 60, 133-135 (1970).

      (63) Vinck, A., Terlou, M., Pestman, W.R., Martens, E.P., Ram, A.F., van den Hondel, C.A., Wösten, H.A. Hyphal differentiation in the exploring mycelium of Aspergillus niger. Mol Microbiol 58, 693-9 (2005).

      (64) Wada R, Maruyama J, Yamaguchi H, Yamamoto N, Wagu Y, Paoletti M, Archer DB, Dyer PS, Kitamoto K. Presence and functionality of mating type genes in the supposedly asexual filamentous fungus Aspergillus oryzae. Appl Environ Microbiol 78, 2819-29 (2012).

      (65) Lee, B.N., Adams, T.H. Overexpression of flbA, an early regulator of Aspergillus asexual sporulation, leads to activation of brlA and premature initiation of development. Mol Microbiol 14, 323-34 (1994).

      Reviewer #2 (Recommendations for the authors): 

      I suggest addressing the following questions to strengthen the manuscript: 

      (1) Do the authors have an explanation for their result that with an increase in the number of nuclei the individual nucleus is smaller? Have the authors checked whether all the nuclei are haploid or diploid?

      Thank you for the very important question. We added new results to Fig. S5D and S5E and the following discussion.

      L335-340. We investigated whether the reduction in nuclear size observed in thick hyphae was due to a change from diploid to haploid status. However, no difference in GFP-histone fluorescence intensity was detected between thick and thin hyphae (Fig. S5D). In both RIB40 and RIB915 strains, no significant difference in conidial spore size was observed despite the large difference in the number of nuclei within the hyphae (Fig. S5E). These results suggest that both thick and thin hyphae remain haploid, and that the smaller nuclear size observed in thick hyphae is likely due to a higher nuclear density.

      (2) In this context, the biological relevance of the increase in the number of nuclei should also be discussed in more detail. It remains to be clarified whether in hyphae with a high number of nuclei all nuclei are functionally active or whether many nuclei are possibly "inactive". Studies on the transcriptional activity of individual nuclei or on DNA replication (e.g., by EdU labeling) could clarify this. 

      Added the explanation below.

      L102-105. The transcriptional activity of each nucleus is unknown. However, a previous study (Yasui et al., FBB 2020) demonstrated that nuclear division is synchronized even when there are more than 200 nuclei. This suggests that DNA replication occurs similarly in most nuclei. Furthermore, since the germination rate of conidia and the colonies formed from individual conidia show no significant abnormalities, it is suggested that nearly all nuclei possess normal genomes and chromosomes.

      (3) It becomes not entirely clear what the underlying signal is that causes a thin hypha to branch into a thick multinucleated cell. This needs to be discussed in more detail. 

      Thanks for the suggestion. We clarified the signal to increase nuclear number and cell volume.

      L294-309. Although it is speculative, we propose a model to aid interpretation in the discussion. We have clarified that both genetic potential and environmental signals such as nutrients are important.

      (4) Is increased branching always correlated with an increased number of nuclei? 

      It is not an increase in branching, but rather the thickening of hyphae and an increase in cell volume that is consistently associated with an increase in nuclear number. Approximately 40 hours after inoculation, within 400 μm from the tip, the number of branches was 3.4 (SD=2.4) in thin hyphae and 2.6 (SD=0.5) in thick hyphae, suggesting that branching does not increase (n=4). Since thick hyphae elongate faster, it seems that fewer branches are present near the tip, even if the branching frequency itself remains unchanged.
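      The comparison in this response can be checked from the reported summary statistics alone. The sketch below computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom from the means and SDs quoted above. It assumes that n=4 applies to each group, which the response does not state explicitly, and the function name is illustrative rather than taken from the manuscript.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    # Squared standard error of each group mean.
    var_a = sd_a ** 2 / n_a
    var_b = sd_b ** 2 / n_b
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1)
    )
    return t, df


# Branch counts within 400 um of the tip, as quoted in the response.
# ASSUMPTION: n = 4 hyphae in each group.
t_stat, dof = welch_t(3.4, 2.4, 4, 2.6, 0.5, 4)
print(f"t = {t_stat:.2f}, df = {dof:.2f}")
```

      Under these assumptions, t is about 0.65 with roughly 3.3 degrees of freedom, which is in line with the statement that branching does not increase. With so few hyphae per group, the comparison can only detect large differences.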

      (5) The abstract does not summarize the many findings of the manuscript in an adequate way. 

      We have revised the abstract.

      Minor: 

      (1) Lines 49-50: Why italics? 

      corrected.

      (2) Line 179: process. 

      corrected.

      (3) Lines 313-314: Do not forget (and discuss) in this context mycorrhiza fungi with up to thousands of nuclei that were apparently selected during evolution for this high number of nuclei. 

      Thank you for the very interesting suggestion. We have added the following discussion.

      L339-351. The regulation of nuclear number and its ecological strategy are intriguing in other fungi such as N. crassa, which rapidly spreads after wildfires (68), and arbuscular mycorrhiza fungi that form symbiotic relationships with plants and contain thousands of nuclei within hyphae lacking septa (69).

      (68) Jacobson, D. J. et al. Neurospora in temperate forests of western North America. Mycologia 96, 66–74 (2004).

      (69) Kokkoris V, Stefani F, Dalpé Y, Dettman J, Corradi N. Nuclear Dynamics in the Arbuscular Mycorrhizal Fungi. Trends Plant Sci. 25, 765-778 (2020).

      (4) Lines 356-358: many typos.

      corrected.

      Reviewer #3 (Recommendations for the authors): 

      Specific suggestions or clarifications for the authors include: 

      (1) Lines 49-50: Is this sentence italicized for a reason? 

      It was a mistake, so we have corrected it.

      (2) Line 83: More detail on the specific characteristics of the different classes of hyphae would be helpful. Perhaps include a schematic drawing that emphasizes the differences between class I,II, and III hyphae. 

      L398-400. The classification is described in the Methods section: Class I – nuclei are distributed at regular intervals without overlapping; Class II – nuclei are aligned but occasionally overlap; Class III – nuclei are scattered throughout the hyphae without alignment. Representative images are shown in a previous study (Yasui et al., FBB 2020). 

      L82-84. We have added this information to clarify the classification.

      (3) Lines 102-103: It was not very clear how this experiment was done. Are you counting nuclei within 100 um of the tip? Are these all in one hyphal compartment? These details could be provided in a drawing that would make it easier for the reader to understand how this was done. 

      L109. Due to variation in the distance from the hyphal tip to the septum, we counted the number of nuclei within 100 μm from the hyphal tip. When septa were present, nuclei were counted in the same manner, so multiple compartments may be included. Changed the explanation.

      (4) Lines 134-140: Is there a way to calibrate levels of secreted protein or amylase activity per nucleus? That is, if the ratio of cytoplasmic volume per nucleus is constant, does the same apply to the secreted product? Knowing this would help to clarify whether the key feature in enhanced secretion is nuclear (e.g., gene expression) versus a cytoplasmic trait (e.g., vesicle trafficking). 

      Enzyme activity was measured across the entire mycelium, which includes a mixture of hyphae with high and low numbers of nuclei. Therefore, it is difficult to assess the correlation between enzyme activity and nuclear number. Enzyme activity was normalized by fungal biomass. The size of each colony is shown in Fig. 1B. Additionally, the correlation between the proportion of hyphae with increased nuclear number and enzyme activity is shown in Fig. 3H. In the experiment where enzyme activity was measured in a single hypha, we attempted to measure the number of nuclei; however, we could not use the nuclear GFP strain because the substrate exhibits green fluorescence. DAPI staining also failed due to limited dye access to the microfluidic channel. We changed the section title from ‘Correlation between nuclear number and enzyme secretion’ to ‘Increase in nuclear number and enzyme secretion’.

      (5) Line 151 and Figure 3F: YE also triggered a ~5-fold enhancement of secretion in A. nidulans without a concomitant increase in hyphal width. This merits some comment in the text.  

      Added an explanation, L156-157.

      In A. nidulans, the addition of yeast extract did not cause a dramatic increase in nuclear number, but hyphal width increased 1.4-fold and protein secretion increased 5.1-fold.

      (6) Line 252: Were nimE levels detected or altered in thick hyphae? The levels of this cycling might play a more important role in a shortened cell cycle than the authors have considered, especially as NimE functions during both G1 and G2. 

      Added an explanation below, L260-262.

      The expression level of nimE (AO090003000993) was low in both thick and thin hyphae, with no significant difference observed. As known in other organisms, its function is likely regulated through phosphorylation and protein degradation.

      (7) Line 254: Please provide a citation for the statement that branches emerge as a result of cell wall loosening. 

      Rephrased and added a citation, L263.

      Branching is thought to occur through the degradation and reconstruction of the cell wall at the branching site (54).

      Harris SD. Branching of fungal hyphae: regulation, mechanisms and comparison with other branching systems. Mycologia 100, 823-32 (2008).   

      (8) Lines 275-277: It would be interesting to know whether the addition of rapamycin also suppressed the ability of amino acids to trigger greater numbers of class III hyphae. 

      We added new results at Fig. S2G.

      L168. Rapamycin decreased the ratio of hyphae with increased nuclei even in the medium with yeast extract (Fig. S2G).

      (9) Lines 282-289: My sense is that this model is too speculative at this time. The role of RseA seems very broad based on the strong deletion phenotype. How would the removal of RseA be regulated to limit its effect to the branch site? Also, the msyA deletion phenotype isn't entirely consistent with what you would expect if it were necessary to maintain thick hyphae. Lastly, the authors do not show that translational capacity is enhanced in thick hyphae. I would suggest that these statements be tempered to some degree. 

      Thank you for your comment. We agree that it was too speculative, but we believe that some explanatory interpretation is necessary. Therefore, we have revised the text as follows, L294-300. In our model, cell wall loosening during branching and regulation of cell volume by turgor pressure constitute necessary conditions for increasing cell volume and maintaining thick hyphae. RseA and MsyA may be involved in these processes. At the same time, enhanced translational capacity through increased expression of ribosomal genes, possibly associated with TOR activation by specific amino acids, and mechanisms that accelerate the cell cycle represent another essential condition that enables an increase in nuclear number.

      (10) General: how do the authors reconcile the observation that YE and amino acids stimulate the formation of thicker hyphae, yet the time lapse imaging (Figure 2E) suggests that these hyphae arise at a later time during colony development when these resources might be limiting? The authors should consider providing some insight into this in the Discussion. 

      L300-305. Added a discussion below.

      Both genetic potential and nutritional environmental signals are likely required for the formation of thick hyphae with a high number of nuclei. When thick hyphae were cultured on fresh medium, thin hyphae initially emerged, suggesting the necessity of sustained high metabolic activity.

    1. eLife Assessment

      This important study reports that an oncogenic population in an epithelium can either be repressed or spread, depending on the tissues. This is explained based on the differential interfacial tension hypothesis, and supported by pharmacological perturbations and numerical simulations using the vertex model. The study conveys a key message, but, as it stands, the strength of evidence is incomplete, and a more detailed analysis of the mechanistic origin of the different tensions and better comparison between experiments and simulations would strongly strengthen the message.

    2. Reviewer #1 (Public review):

      Summary:

      The behaviour of cells expressing constitutively active HRas is examined in mosaic monolayers, both in MCF10a breast epithelial and Beas2b bronchial epithelial cell lines, mimicking the potential initial phase of development of carcinoma. Single HRas-positive cells are excluded from MCF10a but not Beas2b monolayers. Most interestingly, however, when in groups, these cells are not excluded, but rather sharply segregated within a MCF10a monolayer. In contrast, they freely mix with wt Beas2b cells. Biophysical analysis identifies high tension at heterotypic interfaces between HRas and wild-type cells as the likely reason for segregation of MCF10a cells. The hypothesis is supported experimentally, as myosin inhibition abolishes segregation. The probable reason for the lack of segregation in the bronchial epithelium is to be found in the different intrinsic properties of these cells, which form a looser tissue with lower basal actomyosin activity. The behaviour of single cells and groups is recapitulated in a vertex model based on the principle of differential interfacial tension, under the condition of high heterotypic interfacial tension.

      Strengths:

      Despite being long recognized as a crucial event during cancer development, segregation of oncogenic cells has been a largely understudied question. This nice work addresses the mechanics of this phenomenon through a straightforward experimental design, applying the biophysical analytical approaches established in the field of morphogenesis. Comparison between two cell types provides some preliminary clues on the diversity of effects in various cancers.

      Weaknesses:

      Although not calling into question the main message of this study, there are a few issues that one may want to address:

      (1) One may be careful in interpreting the comparison between MCF10a and Beas2b cells as used in this study. The conditions may not necessarily be representative of the actual properties of breast and bronchial epithelia. How much of the epithelial organization is reconstituted under these experimental conditions remains to be established. This is particularly obvious for bronchial cells, which would need quite specific culture conditions to build a proper bronchial layer. In this study, they seemed to be on the verge of a mesenchymal phenotype (large gaps, huge protrusions, cells growing on top of each other, as mentioned in the manuscript).

      As an alternative to Beas2b, comparison of MCF10a with another cell line capable of more robust in vitro epithelial organization, but ideally with different adhesive and/or tensile properties, would be highly interesting, as it may narrow down the parameters involved in segregation of oncogenic cells.

      (2) While the seminal description of tissue properties based on interfacial tensions (Brodland 2002) is clearly key to interpreting these data, the actual "Differential Interfacial Tension Hypothesis" poses that segregation results from global differences, i.e., juxtaposition of two tissues displaying different intrinsic tensions. On the contrary, the results of the present work support a different scenario, where what counts is the actual difference in tension ALONG the tissue boundary, in other words, that segregation is driven by high HETEROTYPIC interfacial tension. This is an important distinction that should be clarified.

      (3) Related: The fact that actomyosin accumulates at the heterotypic interface is key here. It would be quite informative to better document the pattern of this accumulation, which is not clear enough from the images of the current manuscript: Are we talking about the actual interface between mutant and wt cells (membrane/cortex of heterotypic contacts)? Or is it more globally overactivated in the whole cell layer along the border? Some better images and some quantification would help.

      (4) In the case of Beas2b cells, mutant cells show higher actin than wt cells, while actin is, on the contrary, lower in mutant MCF10a cells (Figure 2b). Has this been taken into account in the model? It may be in line with the idea that HRas may have a different action on the two cell types, a possibility that would certainly be worth considering and discussing.
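      The distinction raised in point (2) can be illustrated with a toy calculation. The sketch below is not the authors' vertex model: it places two cell types on a square lattice, assigns every shared edge a line tension that depends only on whether the two neighbours are of the same type, and compares a fully mixed arrangement with a sorted one. The lattice, the unit edge lengths, and the tension values `lam_homo` and `lam_hetero` are all illustrative assumptions.

```python
def interfacial_energy(grid, lam_homo, lam_hetero):
    """Sum line tensions over all shared edges of a square lattice of cells.

    Each cell is 0 (wild type) or 1 (mutant). Every edge has unit length.
    Homotypic edges cost lam_homo and heterotypic edges cost lam_hetero.
    """
    rows, cols = len(grid), len(grid[0])
    energy = 0.0
    for r in range(rows):
        for c in range(cols):
            # Visit the right and lower neighbours so each edge is counted once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    same = grid[r][c] == grid[rr][cc]
                    energy += lam_homo if same else lam_hetero
    return energy


# Checkerboard (fully mixed) versus two compact domains (sorted).
mixed = [[(r + c) % 2 for c in range(4)] for r in range(4)]
segregated = [[0, 0, 1, 1] for _ in range(4)]

# ASSUMPTION: heterotypic tension is twice the homotypic tension.
e_mixed = interfacial_energy(mixed, lam_homo=1.0, lam_hetero=2.0)
e_segregated = interfacial_energy(segregated, lam_homo=1.0, lam_hetero=2.0)
print(e_mixed, e_segregated)
```

      With these illustrative values the sorted arrangement has the lower total energy because it minimizes the number of heterotypic edges, regardless of any difference in the bulk tension of the two populations.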

      In conclusion, the study conveys an important message, but, as it stands, the strength of evidence is incomplete. It would greatly benefit from a more detailed and complete analysis of the experimental data, a better fit between this analysis and the corresponding vertex model, and a more in-depth discussion of biological and biophysical aspects. These revisions should be rather easily done, and would then make the evidence much more solid.

    3. Reviewer #2 (Public review):

      Summary:

      The authors investigate the behavior of oncogenic cells in mammary and bronchial epithelia. They observe that individual oncogenic cells are preferentially excluded from the mammary epithelium, but they remain integrated in the bronchial epithelium. They also observe that clusters of oncogenic cells form a compact cluster in the mammary epithelium, but they disperse in the bronchial epithelium. The authors demonstrate experimentally and in the vertex model simulations that the difference in observed behavior is due to the differential tension between the mutant and wild-type cells due to a differential expression of actin and myosin.

      Strengths:

      (1) Very detailed analysis of experiments to systematically characterize and quantify differences between mammary and bronchial epithelia.

      (2) Detailed comparison between the experiments and vertex model simulations to identify the differential cell line tension between the oncogenic and wild-type cells as one of the key parameters that are responsible for the different behavior of oncogenic cells in mammary and bronchial epithelia

      Weaknesses:

      (1) It is unclear what the mechanistic origin of the shape-tension coupling is, which is used in the vertex model, and how important that coupling is for the presented results. The authors claim that the shape-tension coupling is due to the anisotropic distribution of stress fibers when cells are under external stress. It is unclear why the stress fibers should affect an effective line tension on the cell boundaries and why the stress fibers should be sensitive to the magnitude of the internal isotropic cell pressure. In experiments, it makes sense that stress fibers form when cells are stretched. Similar stress fibers form when the cytoskeleton or polymer networks are stretched. It is unclear why the stress fibers should be sensitive to the magnitude of internal isotropic cell pressure. If all the surrounding cells have the same internal pressure, then the cell would not be significantly deformed due to that pressure, and stress fibers would not form. The authors should better justify the use of the shape-tension coupling in the model and also present simulation results without that coupling. I expect that most of the observed behavior is already captured by the differential tension, even if there is no shape-tension coupling.

      (2) The observed difference of shape indices between the interfacial and bulk cells in simulations in the absence of differential line tension is concerning. This suggests that either there are not enough statistics from the simulations or that something is wrong with the simulations. For all presented simulation results, the authors should repeat multiple simulations and then present both averages and standard deviations. This way, it would be easier to determine whether the observed differences in simulations are statistically significant.

      (3) The authors should also analyze the cell line tension data in simulations and make a comparison with experiments.
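      The statistics requested in point (2) could be assembled along the following lines. This is a minimal sketch, assuming each simulation run yields a list of cell polygons: it computes the shape index (perimeter divided by the square root of the area) per cell and then reports the mean and standard deviation across independent runs. The function names and the replicate values are hypothetical placeholders, not results from the study.

```python
import math
import statistics


def shape_index(vertices):
    """Perimeter / sqrt(area) of a simple polygon given as ordered (x, y) vertices."""
    n = len(vertices)
    perimeter = 0.0
    twice_area = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        perimeter += math.hypot(x1 - x0, y1 - y0)
        twice_area += x0 * y1 - x1 * y0  # shoelace formula
    return perimeter / math.sqrt(abs(twice_area) / 2.0)


def summarize_runs(values_per_run):
    """Average within each run first, then report mean and SD across runs."""
    run_means = [statistics.mean(values) for values in values_per_run]
    return statistics.mean(run_means), statistics.stdev(run_means)


# Sanity checks on known shapes.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
hexagon = [(math.cos(k * math.pi / 3), math.sin(k * math.pi / 3)) for k in range(6)]
print(shape_index(square), shape_index(hexagon))

# PLACEHOLDER values: shape indices of interfacial cells from three runs.
runs = [[3.8, 3.9], [3.7, 3.9], [3.9, 4.1]]
mean_si, sd_si = summarize_runs(runs)
print(f"{mean_si:.3f} +/- {sd_si:.3f}")
```

      Averaging within each run before taking the standard deviation treats the run, not the cell, as the independent unit, which is one reasonable choice when cells within a simulation are correlated.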

    4. Author response:

      Reviewer #1 (Public review):

      Summary:

      The behavior of cells expressing constitutively active HRas is examined in mosaic monolayers, both in MCF10a breast epithelial and Beas2b bronchial epithelial cell lines, mimicking the potential initial phase of development of carcinoma. Single HRas-positive cells are excluded from MCF10a but not Beas2b monolayers. Most interestingly, however, when in groups, these cells are not excluded, but rather sharply segregated within a MCF10a monolayer. In contrast, they freely mix with wt Beas2b cells. Biophysical analysis identifies high tension at heterotypic interfaces between HRas and wild-type cells as the likely reason for segregation of MCF10a cells. The hypothesis is supported experimentally, as myosin inhibition abolishes segregation. The probable reason for the lack of segregation in the bronchial epithelium is to be found in the different intrinsic properties of these cells, which form a looser tissue with lower basal actomyosin activity. The behaviour of single cells and groups is recapitulated in a vertex model based on the principle of differential interfacial tension, under the condition of high heterotypic interfacial tension.

      Strengths:

      Despite being long recognized as a crucial event during cancer development, segregation of oncogenic cells has been a largely understudied question. This nice work addresses the mechanics of this phenomenon through a straightforward experimental design, applying the biophysical analytical approaches established in the field of morphogenesis. Comparison between two cell types provides some preliminary clues on the diversity of effects in various cancers.

      Weaknesses:

      Although not calling into question the main message of this study, there are a few issues that one may want to address:

      (1) One may be careful in interpreting the comparison between MCF10a and Beas2b cells as used in this study. The conditions may not necessarily be representative of the actual properties of breast and bronchial epithelia. How much of the epithelial organization is reconstituted under these experimental conditions remains to be established. This is particularly obvious for bronchial cells, which would need quite specific culture conditions to build a proper bronchial layer. In this study, they seemed to be on the verge of a mesenchymal phenotype (large gaps, huge protrusions, cells growing on top of each other, as mentioned in the manuscript).

      We thank the reviewer for this important point. We agree that our experimental conditions do not fully recapitulate the in vivo architecture of either breast or bronchial epithelia. However, our intention here is to compare two well-established epithelial lines with distinct intrinsic mechanical and organizational properties, rather than to reproduce the in vivo microenvironment. Nevertheless, to address this point, we have now strengthened our quantitative analysis of epithelial integrity in Beas2b monolayers by including ZO-1 immunofluorescence along with E-cadherin immunofluorescence. These measurements confirm that Beas2b monolayers under our culture conditions retain junctional organization, albeit with larger gaps and protrusions compared to MCF10a. We will revise the text to make this distinction explicit.

      As an alternative to Beas2b, comparison of MCF10a with another cell line capable of more robust in vitro epithelial organization, but ideally with different adhesive and/or tensile properties, would be highly interesting, as it may narrow down the parameters involved in segregation of oncogenic cells.

      We agree with the reviewer that the inclusion of an additional epithelial model system with distinct adhesive and organizational properties would provide valuable insights. In line with this suggestion, we are currently repeating the key experiments using Madin-Darby Canine Kidney (MDCK) cells, a well-established model epithelial cell line. We believe this complementary system will allow us to further dissect the behaviour of HRasV12-expressing cells.

      (2) While the seminal description of tissue properties based on interfacial tensions (Brodland 2002) is clearly key to interpreting these data, the actual "Differential Interfacial Tension Hypothesis" poses that segregation results from global differences, i.e., juxtaposition of two tissues displaying different intrinsic tensions. On the contrary, the results of the present work support a different scenario, where what counts is the actual difference in tension ALONG the tissue boundary, in other words, that segregation is driven by high HETEROTYPIC interfacial tension. This is an important distinction that should be clarified.

      We thank the reviewer for this insightful comment. As correctly noted, Brodland’s 2002 work provided a seminal formulation of the Differential Interfacial Tension Hypothesis (DITH), which frames tissue organization in terms of effective interfacial tensions. In its original form, DITH emphasized segregation as a consequence of global differences in the intrinsic (bulk) tensions of juxtaposed tissues.

      Our results specifically show that segregation is determined by local interfacial mechanics between transformed and host cells. However, in our blebbistatin experiments, segregation was lost when global contractility was reduced, so we believe that the differences in local interfacial mechanics also stem from global differences that are intrinsic to the two tissues discussed here.

      To directly map global interfacial tension, in the revised manuscript we aim to stain both tissues for E-cadherin and actin and to measure cortical actin, stress fibers, and E-cadherin levels at the cell-cell junctions. Once the global tissue mechanics are mapped, we can be more confident about our claim regarding DITH. Nevertheless, we will also clarify this distinction in the text and explicitly state that, while DITH provided the foundation for conceptualizing tissue mechanics, our findings on interactions between transformed and healthy cells specifically demonstrate that segregation is driven by high heterotypic interfacial tension at the tissue boundary.

      (3) Related: The fact that actomyosin accumulates at the heterotypic interface is key here. It would be quite informative to better document the pattern of this accumulation, which is not clear enough from the images of the current manuscript: Are we talking about the actual interface between mutant and wt cells (membrane/cortex of heterotypic contacts)? Or is it more globally overactivated in the whole cell layer along the border? Some better images and some quantification would help.

      We agree that more detailed visualization of actomyosin distribution would strengthen our conclusion. We are currently working on re-imaging the heterotypic interfaces at higher magnification and are quantifying fluorescence intensity of actin and myosin-II along cell–cell boundaries. All of this will be integrated in the next version of the manuscript.

      (4) In the case of Beas2b cells, mutant cells show higher actin than wt cells, while actin is, on the contrary, lower in mutant MCF10a cells (Author response image 2). Has this been taken into account in the model? It may be in line with the idea that HRas may have a different action on the two cell types, a possibility that would certainly be worth considering and discussing.

      Our current vertex model does not explicitly incorporate actin levels; rather, it captures their functional consequences indirectly through effective mechanical parameters such as cortical tension and adhesion strength. Nonetheless, we agree that the opposite trends in actin enrichment between Beas2b and MCF10a HRasV12 mutants raise the important possibility that HRas signaling may act through distinct mechanisms in the two cell types.

      To further investigate this, we are currently culturing MCF10a and Beas2b HRasV12 mutant populations separately (i.e., without wild-type cells) to assess their intrinsic organization and behavior in isolation. These experiments will help us disentangle how HRas activation differentially impacts epithelial architecture in these two cellular contexts, and we will discuss these ongoing efforts in the revised manuscript.

      From the modelling perspective, the model currently does not account for the different actin levels of mutants with respect to wt cells in the two tissues. This can be accounted for by having different  and  for mutants and wt in the two cases in simulation.

      In conclusion, the study conveys an important message, but, as it stands, the strength of evidence is incomplete. It would greatly benefit from a more detailed and complete analysis of the experimental data, a better fit between this analysis and the corresponding vertex model, and a more in-depth discussion of biological and biophysical aspects. These revisions should be rather easily done, and would then make the evidence much more solid.

      Reviewer #2 (Public review):

      Summary:

      The authors investigate the behavior of oncogenic cells in mammary and bronchial epithelia. They observe that individual oncogenic cells are preferentially excluded from the mammary epithelium, but they remain integrated in the bronchial epithelium. They also observe that clusters of oncogenic cells form a compact cluster in the mammary epithelium, but they disperse in the bronchial epithelium. The authors demonstrate experimentally and in the vertex model simulations that the difference in observed behavior is due to the differential tension between the mutant and wild-type cells due to a differential expression of actin and myosin.

      Strengths:

      (1) Very detailed analysis of experiments to systematically characterize and quantify differences between mammary and bronchial epithelia.

      (2) Detailed comparison between the experiments and vertex model simulations to identify the differential cell line tension between the oncogenic and wild-type cells as one of the key parameters that are responsible for the different behavior of oncogenic cells in mammary and bronchial epithelia.

      Weaknesses:

      (1) It is unclear what the mechanistic origin of the shape-tension coupling is, which is used in the vertex model, and how important that coupling is for the presented results. The authors claim that the shape-tension coupling is due to the anisotropic distribution of stress fibers when cells are under external stress. It is unclear why the stress fibers should affect an effective line tension on the cell boundaries and why the stress fibers should be sensitive to the magnitude of the internal isotropic cell pressure. In experiments, it makes sense that stress fibers form when cells are stretched. Similar stress fibers form when the cytoskeleton or polymer networks are stretched. It is unclear why the stress fibers should be sensitive to the magnitude of internal isotropic cell pressure. If all the surrounding cells have the same internal pressure, then the cell would not be significantly deformed due to that pressure, and stress fibers would not form. The authors should better justify the use of the shape-tension coupling in the model and also present simulation results without that coupling. I expect that most of the observed behavior is already captured by the differential tension, even if there is no shape-tension coupling. 

      While the segregation behavior can be captured by the differential tension without the shape-tension coupling, we noticed unjamming and aligned movement of wild-type cells at the mutant-cell interface. This was only captured when we incorporated shape-tension coupling in the model, suggesting that changes in cell shape due to differential interfacial tension are essential in driving the fate of the mutants. Below, the difference between the shape indices of cells at the interface and away from the boundary is plotted versus the interfacial tension in the case of no shape-tension coupling [Author response image 1]. The red dashed line represents the experimental value of the shape index difference. The blue line is the shape index difference between two randomly chosen groups of cells (half of the total number of cells in each group is taken). At zero line-tension, the difference in shape index between interface cells and cells away from the interface is the same as that between randomly chosen groups of cells, which is expected since there should be no interface at zero line-tension. The no shape-tension data presented here are averaged over 19 seeds. Although the results without shape-tension coupling reach experimental values at high enough differential tension [Author response image 2], a closer inspection of the simulation results shows that the cells are just squeezed and are aligned perpendicular to the interface, which is contrary to what is seen in experiments.

      Author response image 1.

      Shape indices versus the interfacial line tension

      Calculating the average of the absolute value of the dot product of the nematic director and the interface edge for simulations with and without shape-tension coupling clearly shows that, with shape-tension coupling, the cells align and elongate along the interface as is seen in experiments, as indicated by an interface dot product value > 0.5 at high enough line-tension values. Further, shape-tension coupling or biased edge tension has been used before to model cell elongation during embryo elongation [1], and here we use it as an active line-tension force, which elongates cells along the interface, in addition to the differential tension, which is passive. This additional quantification of the alignment and elongation of cells along the interface will be added to the Supplementary Information (SI).

      [1] Dye, N. A., Popović, M., Iyer, K. V., Fuhrmann, J. F., Piscitello-Gómez, R., Eaton, S., & Jülicher, F. (2021). Self-organized patterning of cell morphology via mechanosensitive feedback. Elife, 10, e57964.
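
      As a minimal sketch of the alignment measure described above, the snippet below averages the absolute dot product between a unit cell director and the unit tangent of the interface edge it touches. The function names and the pairing of one director with one edge are assumptions made for illustration, not the authors' implementation. Values near 1 indicate elongation along the interface; values near 0 indicate cells oriented perpendicular to it.

```python
import math


def unit(v):
    """Return the 2D vector v scaled to unit length."""
    norm = math.hypot(v[0], v[1])
    if norm == 0:
        raise ValueError("zero-length vector has no direction")
    return (v[0] / norm, v[1] / norm)


def interface_alignment(directors, edges):
    """Mean |n . e| over paired cell directors n and interface edge vectors e.

    The absolute value makes the measure nematic: n and -n give the same result.
    """
    if len(directors) != len(edges) or not directors:
        raise ValueError("need equally many directors and edges, at least one")
    total = 0.0
    for n, e in zip(directors, edges):
        nx, ny = unit(n)
        ex, ey = unit(e)
        total += abs(nx * ex + ny * ey)
    return total / len(directors)


# Hypothetical inputs: cells elongated along, across, and at 45 degrees to an edge.
parallel = interface_alignment([(1, 0), (-2, 0)], [(0.5, 0), (1, 0)])
perpendicular = interface_alignment([(0, 1)], [(1, 0)])
diagonal = interface_alignment([(1, 1)], [(1, 0)])
```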

      Author response image 2.

      Change in interfacial tension with and without shape tension coupling

      (2) The observed difference of shape indices between the interfacial and bulk cells in simulations in the absence of differential line tension is concerning. This suggests that either there are not enough statistics from the simulations or that something is wrong with the simulations. For all presented simulation results, the authors should repeat multiple simulations and then present both averages and standard deviations. This way, it would be easier to determine whether the observed differences in simulations are statistically significant.

      The reviewer is right in pointing out that statistics for the plots must be shown. The difference in shape indices between the interfacial and bulk cells in simulations has been calculated over 11 different seed values. The observed differences in simulations along with the standard deviations have been plotted below [Author response image 3]. This figure in the paper will be updated to include the standard deviations. The non-zero difference in shape index in the absence of differential line tension for low values of stress threshold is due to the shape-tension coupling acting even at low differential tension. Thus, a non-zero, sufficiently high value of the stress threshold is required in our model with shape-tension coupling, for the model to make sense. This has also been stated in section 4 of the paper. The importance of the stress-tension coupling has been stated in response to the previous point.

      Author response image 3.
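
      The per-seed statistics requested by the reviewer can be sketched as follows. The dictionary layout (line tension mapped to one shape-index difference per seed) and the numbers are hypothetical placeholders, not simulation output; the sketch only shows how a mean and a sample standard deviation would be reported for each tension value.

```python
import statistics


def summarize_over_seeds(values_by_tension):
    """Map each line-tension value to (mean, sample standard deviation) across seeds."""
    summary = {}
    for tension, values in values_by_tension.items():
        if len(values) < 2:
            raise ValueError("need at least two seeds to estimate a standard deviation")
        summary[tension] = (statistics.mean(values), statistics.stdev(values))
    return summary


# Hypothetical shape-index differences (interface minus bulk), four seeds per tension.
example = {
    0.0: [0.01, -0.02, 0.00, 0.01],
    0.2: [0.10, 0.12, 0.08, 0.10],
}
summary = summarize_over_seeds(example)
```

      Plotting the mean with the standard deviation as an error bar makes it possible to judge whether a non-zero difference at zero line tension lies within seed-to-seed variability.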

      (3) The authors should also analyze the cell line tension data in simulations and make a comparison with experiments.

      We agree with the reviewer that cell line tension data should also be analyzed and compared with experiments. This will be added to the next version of the paper.

    1. eLife Assessment

      In this important study, the authors use computational modeling to explore how fast learning can be reconciled with the accumulation of stable memories in the olfactory bulb, where adult neurogenesis is prominent. Their model demonstrates that changes in excitability, plasticity, and susceptibility to apoptosis during the maturation of adult-born granule cells can help resolve the flexibility-stability dilemma. These compelling results provide a coherent picture of a neurogenesis-dependent learning process that is consistent with diverse experimental observations and may serve as a foundation for further experimental and computational studies.

    2. Reviewer #1 (Public review):

      Summary:

      Sakelaris and Riecke used computational modeling to explore how neurogenesis and sequential integration of new neurons into a network support memory formation and maintenance. They focus on the integration of granule cells in the olfactory bulb, a brain area where adult neurogenesis is prominent. Experimental results published during recent years provide an excellent basis to address the question at hand by biologically constrained models. The study extends previous computational models and provides a coherent picture of how multiple processes may act in concert to enable rapid learning, high stability of memories, and high memory capacity. This computational model generates experimentally testable predictions and is likely to be valuable to understand roles of neurogenesis and related phenomena in memory. One of the key findings is that important features of the memory system depend on transient properties of adult-born granule cells such as enhanced excitability and apoptosis during specific phases of the development of individual neurons. The model can explain many experimental observations, and suggests specific functions for different processes (e.g., importance of apoptosis for continual learning). While this model is obviously a massive simplification of the biological system, it conceptualizes diverse experimental observations into a coherent picture, it generates testable predictions for experiments, and it will likely inspire further modeling and experimental studies.

      Strengths:

      - The model can explain diverse experimental observations

      - The model directly represents the biological network

      Weaknesses:

      - As many other models of biological networks, this model contains major simplifications.

    3. Reviewer #2 (Public review):

      Summary:

      The authors propose a mechanism to provide flexibility to learn new information while preserving stability in neural networks by combining structural plasticity and synaptic plasticity.

      Strengths:

      An intriguing idea, well embedded in experimental data.

      Authors have done a great job addressing reviewers' concerns

      Weaknesses:

      None

    4. Reviewer #3 (Public review):

      The manuscript is focused on local bulbar mechanisms to solve the flexibility-stability dilemma in contrast to long range interactions documented in other systems (hippocampus-cortex). The network performance is assessed in a perceptual learning task: the network is presented with alternating, similar artificial stimuli (defined as enrichment) and the authors assess its ability to discriminate between these stimuli by comparing the mitral cell representations quantified by Fisher discriminant analysis. The authors use enhancement in discriminability between stimuli as function of the degree of specificity of connectivity in the network to quantify the formation of an odor-specific network structure which as such has memory - they quantify memory as the specificity of that connectivity.

      The focus on neurogenesis, excitability and synaptic connectivity of abGCs is topical, and the authors systematically built their model, clearly stating their assumptions and setting up the questions and answers. In my opinion, the combination of latent dendritic representations, excitability and apoptosis in an age-dependent manner is interesting and as the authors point out leads to experimentally testable hypotheses.

      In the revised manuscript, the authors have systematically addressed my previous concerns. In particular, they now refer to previous work on granule cells-mitral cell interactions more generally, they explain the pros and cons for usage of specificity in connectivity as a proxy for memory capacity, and the biological plausibility of the model.

    5. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      (1) Figure 2 and related text: it would be useful to explain more explicitly what is meant by "neurogenic" and "non-neurogenic" models. I presume that the total number of neurons in non-neurogenic models is lower than in neurogenic models because no new neurons are added. It would be useful to plot the number of GCs as a function of timesteps.

      We have clarified the distinction between neurogenic and non-neurogenic models in the text (Lines 142-145), explicitly noting that in non-neurogenic models, no new GCs are added, resulting in a lower total neuron count over time. In response to the reviewer’s suggestion, we generated a plot showing the number of GCs over time (see below). Because the neurogenic model exhibits a simple linear increase, we found this plot not especially informative for inclusion in the manuscript. However, we agree with the reviewer’s later comments that similar plots are useful for interpreting specific results, and we have included those where appropriate.

      Author response image 1.

      Number of GCs over time for neurogenic (solid line) and non-neurogenic (dotted line) networks

      (2) Figure 2F, G: memory declines dramatically when the number of GCs at enrichment onset increases beyond an optimum. Why?

      We have explained the reasoning more thoroughly in the text (Lines 174-177) and added a new supplemental figure to support this reasoning (Figure S2). As the number of GCs increases, the network becomes overly inhibited and the response of abGCs to the stimuli decreases (Fig S2A). This leads to a smaller population of GCs being able to integrate with the stimulus (Fig S2B) which is expected given the activity-dependent plasticity rule. Moreover, it can be seen in Fig S2C that for networks with increasing size, the GCs that do learn only connect to MCs that are driven strongest by the stimuli until they struggle to connect to any MCs at all.

      In principle, a homeostatic mechanism like synaptic scaling could reduce activity to restore balance, but such a mechanism would also likely disrupt existing memories. Alternatively, we suggest activity-dependent apoptosis as a superior homeostatic mechanism because it leads to a stable level of activity without substantially erasing existing memories.

      (3) The paragraph describing synaptic connectivity of abGCs (related to Figure 2H) is confusing. What is the directionality of synapses considered here: mitral-to-granule, or granule-to-mitral? The text is opaque here. Connectivity matrix in Figure 2H: who is presynaptic, who is postsynaptic? If I understand correctly, these questions are actually irrelevant because all mitral-granule synapses in the network are reciprocal. This should be pointed out explicitly in the figure legend. Generally: the fact that the network is fully reciprocal (if I understand correctly) is very important but not stated with sufficient emphasis. It should be stated very explicitly in the text that connectivity matrices are fully reciprocal, and an equation clarifying this point should be included in Methods.

      (6) Connectivity matrix: to what degree was connectivity between mitral and granule cells reciprocal (fraction of connections in either direction that were paired with a connection in the opposite direction between the same cell pair)? Was connectivity shaped by experience (enrichment) reciprocal?

      (7) Directly related to the above: it would be useful to show the disynaptic connectivity matrix between mitral cells and analyze its symmetry. For the symmetric component, it should then be analyzed what fraction of this can be attributed to the reciprocal synapses, and what fraction is contributed by connectivity via different granule cells. This should then be compared to models with biologically realistic fractions of reciprocal connections. Is the model proposed here consistent with a biologically realistic fraction of reciprocal synapses between mitral-granule cell pairs?

      We appreciate these insightful and detailed comments. We agree that the assumption that MC-GC synapses were fully reciprocal was not clearly stated. We now explicitly state this in the main text (lines 90-94, 369-370, Figure 2 caption) and methods (line 561), and emphasize its importance. As the reviewer points out, this is a simplifying assumption and does not fully reflect the biology because not all synapses are reciprocal in the true system. We also note that our synaptic plasticity model does not break the reciprocity assumption: all connections added or pruned during learning remain reciprocal. As a result, the disynaptic connectivity matrix (Bottom panel below, MCs sorted by stimulus as shown in the top panel) is always symmetric.

      We have now made these statements explicit in the main text and in the methods. Regarding functional consequences of this assumption, earlier work by our group has examined the impact of the degree of reciprocity of MC-GC synapses in a similar OB model (Chow, Wick & Riecke, Plos Comp Bio 2012). The study examined three different changes in reciprocity by (1) redirecting a fraction of the inhibitory connections of each GC to randomly chosen MCs instead of the MCs that drive that GC, (2) allowing heterogeneity in reciprocal weights so that there is no relationship between the strength of the MC -> GC synapse and the GC -> MC synapse, (3) reducing the level of self-inhibition a MC receives from the GCs that it excites. The model was found to be quite robust to each of these manipulations, suggesting that our present model likely remains functionally relevant even if biological reciprocity is partial. We reference this work now in the discussion, lines 490-492.

      Author response image 2.

      Disynaptic connectivity. Top: MC activity in response to the two stimuli, sorted by MC selectivity. Bottom: Disynaptic connectivity matrix (diagonal subtracted).
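
      The symmetry argument above can be illustrated with a small sketch. Here the entry D[i][j] counts the GCs that are excited by MC i and inhibit MC j; under full reciprocity the GC-to-MC matrix is the transpose of the MC-to-GC matrix, so D is symmetric. The binary toy matrices and function names are assumptions for illustration only and are not taken from the model.

```python
def transpose(m):
    """Transpose a rectangular list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def disynaptic(mc_to_gc, gc_to_mc):
    """D[i][j] = sum_k mc_to_gc[i][k] * gc_to_mc[k][j], with the diagonal set to zero.

    mc_to_gc[i][k] is 1 if MC i excites GC k; gc_to_mc[k][j] is 1 if GC k inhibits MC j.
    """
    n_mc = len(mc_to_gc)
    n_gc = len(gc_to_mc)
    d = [[0] * n_mc for _ in range(n_mc)]
    for i in range(n_mc):
        for j in range(n_mc):
            if i != j:  # diagonal (self-inhibition) subtracted
                d[i][j] = sum(mc_to_gc[i][k] * gc_to_mc[k][j] for k in range(n_gc))
    return d


def is_symmetric(m):
    """True if m equals its own transpose."""
    return all(m[i][j] == m[j][i] for i in range(len(m)) for j in range(len(m)))


# Hypothetical network: 4 MCs (rows) and 3 GCs (columns).
w = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 1],
     [1, 1, 1]]

# Fully reciprocal case: every MC -> GC synapse has a matching GC -> MC synapse.
d_reciprocal = disynaptic(w, transpose(w))

# Non-reciprocal case: GC 0 redirects one inhibitory output from MC 1 to MC 2.
g = transpose(w)
g[0] = [1, 0, 1, 1]
d_redirected = disynaptic(w, g)
```

      In this toy example the reciprocal matrix is symmetric, while redirecting a single inhibitory connection already breaks the symmetry.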

      (4) How were mitral cells sorted in Figure 2H? This needs to be explained.

      (5) Directly related to the point above: the text mentions that synaptic connectivity between GCs of the "learning cluster" and mitral cells (which direction?) is increased for mitral cells responding by enrichment odors, but this is not shown in the figure. This statement suggests that mitral cells sorted to the bottom of the y-axis respond more strongly to enrichment odors, but the information is not given directly. Please provide more information to back up your statements.

      Indeed as the reviewer inferred, MCs in Figure 2H were sorted so that those that receive the strongest stimulation from the odor were at the bottom of the y-axis. We have clarified this in the Figure 2 caption and added a subplot to Figure 2H showing the average MC input to make this more explicit.

      (8) Apoptosis (Figure 4 and related text): paragraph 231ff is somewhat difficult to comprehend because the "number" of enrichments should really be the "frequency" of enrichments. In Figure 4, it is not mentioned explicitly that each enrichment is with different random new odors.

      We agree that the term “number” of enrichments was imprecise and have revised the text to refer instead to the frequency of enrichment events (Lines 255-267). We also clarified that in Figure 4, each enrichment corresponds to a different set of randomly sampled odors, and we now state this explicitly in both the Figure 4 legend and main text (Lines 260-261).

      (9) Apoptosis: apoptosis improves memory but the underlying reason remains opaque. A simple prediction of the data in Figure 4D and 4E is that the number of GCs in 4D is lower than in 4E. It would be helpful to show this. Furthermore, an obvious question that arises is whether a higher frequency of enrichments improves memories because the total number of granule cells is kept low, or because granule cells are removed specifically based on their activity (or both). This could be addressed easily by artificially removing a random subset of granule cells in a simulation such as 4E to match granule cell numbers to the case in 4D.

      Apoptosis improves learning because it reduces the total inhibition in the network by removing GCs and thus prevents the deficits in learning that occur in Fig. 2G as GCs accumulate in the network. As the reviewer inferred, the number of GCs in Figure 4D is lower than in 4E, and this is now clarified in the text. This difference was shown implicitly in Supplementary Figure S4D (previously S3D), but we now explicitly reference this plot to support this point as well (Line 266).

      As the reviewer notes, there is a question of whether increased enrichment frequency improves memory because it limits the total number of GCs, or because apoptosis selectively removes GCs based on their activity, or both. Our model supports both mechanisms. Importantly, simply reducing GC numbers through random deletion will degrade existing memories: random removal erodes memory representations encoded by those GCs. In contrast, our age- and activity-dependent apoptosis rule targets a specific cohort of adult-born GCs. This selective removal minimizes damage to existing memories encoded by GCs outside of this cohort while keeping GC numbers within a regime that supports robust learning (as shown in Figure 2G).

      However, we note that if enrichment frequency becomes too high, even recent memories can be lost due to premature pruning of GCs that have not yet stabilized their synaptic connections. This tradeoff has been shown experimentally (Forest et al., Nat Comm 2019) which we reproduce in our model (Figure S4).
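
      The contrast between selective and random removal can be pictured with the following sketch. The rule shown here (remove a GC only if its age lies in a critical window and its activity is below a threshold) is a deliberately simplified, hypothetical stand-in; the window, threshold, and field names are assumptions for illustration and are not the parameters or implementation of the model.

```python
import random


def selective_apoptosis(cells, age_window, activity_threshold):
    """Keep every GC except those inside the age window with sub-threshold activity."""
    youngest, oldest = age_window
    return [
        c for c in cells
        if not (youngest <= c["age"] <= oldest and c["activity"] < activity_threshold)
    ]


def random_deletion(cells, n_remove, rng):
    """Remove n_remove GCs uniformly at random, regardless of age or activity."""
    doomed = set(rng.sample(range(len(cells)), n_remove))
    return [c for i, c in enumerate(cells) if i not in doomed]


# Hypothetical population: one mature GC and three adult-born GCs of different ages.
cells = [
    {"id": 0, "age": 200, "activity": 0.1},  # mature: outside the window, always kept
    {"id": 1, "age": 30, "activity": 0.9},   # in the window but active: kept
    {"id": 2, "age": 30, "activity": 0.1},   # in the window and inactive: removed
    {"id": 3, "age": 5, "activity": 0.0},    # too young for the window: kept
]

survivors = selective_apoptosis(cells, age_window=(20, 60), activity_threshold=0.5)
thinned = random_deletion(cells, n_remove=1, rng=random.Random(0))
```

      Both procedures remove one cell here, but only the random deletion can hit the mature GC, which in this picture is the cell holding an older memory.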

      (10) Text related to Figure 5: "Learning flexibility...approached a steady state when the growth of the network started to saturate". Please show the growth (better: size) of the network (total number of GCs) for these simulations (and other panels in Figure 5). It would also be useful to show the total number of GCs in other figures (e.g. Figure 4; see above).

      We have now added a supplementary figure (Figure S6) that shows the total number of GCs over time for the simulations presented. This confirms that the network size approaches a steady state around the same time that learning flexibility begins to plateau, as noted in the original text (now line 275), and highlights the large number of GCs without apoptosis as well as the slightly reduced number of GCs in the permanent encoding model (line 312).

      (11) As much as I appreciate the comprehensive discussion of the results in a broader context, I feel that the discussion can be somewhat shortened. The section on lateral inhibition is not fully valid given that synaptic connectivity is reciprocal. I also feel that much of the final section (Model assumptions and outlook) can be dropped (except for the last paragraph), not because anything is irrelevant, but because these points have been made, often repeatedly, in the text above.

      We agree that the discussion could be streamlined and have revised the manuscript accordingly. Specifically, we have shortened the section on lateral inhibition and clarified that the OB relies predominantly on reciprocal connectivity (Line 370). We also agree that parts of the final section were repetitive and have removed these. However, to address comments by Reviewer 3, we also expanded on some of the model assumptions. We thank the reviewer for helping us improve the clarity and focus of the manuscript.

      (12) Figure 5: bolding every 5th curve is confusing.

      We have adjusted our figure accordingly.

      (13) "...we biased the dendritic field...": it would be helpful to explain the idea of a "dendritic field" in a bit more detail prior to this sentence.

      We have now noted that GC’s "dendritic field" refers to the subset of MCs with which it is capable of forming synaptic connections when we initially describe the model (Line 97).

      Reviewer #3:

      (1) The authors find that a network with age-dependent synaptic plasticity outperforms one with constant age-independent plasticity and that having more GCs per se is not sufficient to explain this effect. In addition, having an initial higher excitability of GCs leads to increased performance. To what degree is the increased excitability of abGCs conceptually independent of their having higher synaptic plasticity rates / fast synapses?

      We thank the reviewer for this question, as the difference between excitability and plasticity rate in memory formation is something we intended to highlight in this study. We have updated the text (Lines 157-198) to clarify this.

      At the cellular level, a neuron's excitability and its rate of synaptic plasticity are mechanistically distinct: excitability is governed by factors such as ion channel expression or membrane resistance, whereas plasticity rates are influenced by molecular pathways involved in synapse and dendritic spine formation and remodeling. While these are independent properties, they are functionally coupled: most synaptic plasticity rules are activity-dependent, so greater excitability can increase the likelihood of plasticity being induced but does not itself guarantee learning.

      Our model reflects this distinction. Increased excitability biases which neurons become activated and thus eligible to undergo plasticity, but actual learning still depends on the plasticity rate itself. This can be seen by comparing the model with constant plasticity and excitability (solid blue and green curves in Figure 2C) to the model with only transient excitability (solid blue and green lines in Figure 2E). In both cases, the strength and duration of the memory remain limited by the plasticity rate. We note additionally that, in this network, neurons compete to learn new stimuli: as GCs start to learn, they suppress MC activity through recurrent inhibition, which suppresses learning in other GCs that otherwise would have been in a position to learn the odor. As a result, there is not a significant increase in the overall number of neurons recruited to learn (Figure 2J). In a different network architecture, such as a feedforward network, we would not expect this to be the case; greater excitability in a population of neurons would likely increase the memory by increasing the number of neurons recruited to learn. Transiently enhanced excitability biases which neurons join the memory engram (Figure 2J), but the extent and rate of learning still depend on the plasticity rates themselves. We did note in the original text (now lines 284-286) that this bias in recruitment subtly increases memory stability, but the extent is not great. In principle, a model can be engineered to rely on transiently increased excitability to encode memories in orthogonal subpopulations of neurons, and this could resolve the flexibility-stability dilemma. However, in that case, the number of memories that can be stored within a short time would be bounded by the size of this subpopulation, such that even if a large number of odors are presented, mature GCs cannot become part of the engram and the network would likely fail to learn the stimuli. When this was tested experimentally (Forest et al. Cereb Cor. 2020), however, it was found that mature GCs participated in the engram when the number of odors was sufficiently high. Our results are consistent with these experiments: for complex odor environments, neonatal GCs, which are mature during odor exposure, and abGCs both participate in the engrams.

      Author response image 3.

      Simulating learning in more complex odor environments. Top: enrichment consisted of three odor pairs presented sequentially in a random order. Bottom: enrichment consisted of five odor pairs. Left: discriminability of the odor pairs over time. Middle: connectivity between MCs (sorted by odor selectivity) and GCs (sorted by age). In both cases AbGCs develop a clear connectivity structure. In more complex environments neonatal GCs also start to develop a clear connectivity structure. Right: combined engram membership across all stimuli by GC age.

      In sum, transiently increased excitability alone will not make learning any faster, so a fast learning system must have a high plasticity rate. If this plasticity rate stays high, then memories stored in these neurons, even if no longer highly excitable, will be vulnerable as the neurons can still be driven above their plasticity threshold by moderately interfering stimuli and will thus be quickly forgotten. Conversely, if the reviewer is wondering if a greater increase in the plasticity rate of new neurons can compensate for a lack of excitability, this is not the case: if a newborn neuron is not sufficiently driven by the stimulus it will not learn regardless of how high its plasticity rate is.

      (2) The authors do not mention previous theoretical work on the specificity of mitral to granule cell interactions from several groups (Koulakov & Rinberg - Neuron, 2011; Gilra & Bhalla, PLoSOne, 2015; Grabska-Bawinska...Mainen, Pouget, Latham, Nat. Neurosci. 2017; Tootoonian, Schaefer, Latham, PLoS Comput. Biol., 2022), nor work on the relevance of top-down feedback from the olfactory cortex on the abGC during odor discrimination tasks (Wu & Komiyama, Sci. Adv. 2020), or of top-down regulation from the olfactory cortex on regulating the activity of the mitral/tufted cells in task engaged mice (Lindeman et al., PLoS Comput. Biol., 2024), or in naïve mice that encounter odorants (in the absence of specific context; Boyd, et al., Cell Rep, 2015; Otazu et al., Neuron 2015, Chae et al., Neuron, 2022). In particular, the presence of rich top-down control of granule cell activity (including of abGCs) puts into question the plausibility of one of the opening statements of the authors with respect to relying solely on local circuit mechanisms to solve the flexibility-stability dilemma. I think the discussion of this work is important in order to put into context the idea of specific interactions between the abGCs and the mitral cells.

      We thank the reviewer for these detailed and thorough comments, and whole-heartedly agree that it is important to discuss the listed studies in order to contextualize our work through the broader lens of how information is processed in the OB. We have expanded our discussion to further acknowledge and integrate insight from previous theoretical and experimental work cited by the reviewer. (Lines 361-366, 493-550)

      Regarding the importance of top-down feedback, we of course recognize that in practice cortical inputs play a critical role in abGC survival and synaptic integration. However, its nature is not quite clear and is likely variable across behavioral settings. In the paradigm that we study in the manuscript, there is likely no key reward value or contextual signal that is relayed to the OB. One plausible interpretation is that in this task, cortical feedback provides a random, variable baseline excitatory drive to GCs. This would likely be consistent with many of the listed studies, e.g.

      (1) Glomerular layer targeting of feedback would be explicitly unrelated to glomerular odor specificity, as in Boyd et al.

      (2) GC activity would decrease if these cortical inputs were silenced, resulting in stronger MC responses as in Otazu et al., Chae et al.

      (3) Silencing PCx during learning would prevent GCs from reaching activity-dependent plasticity thresholds, resulting in decreased spine density as in Wu & Komiyama.

      Likewise activating PCx would lead to increased spine density.

      In this interpretation, the effect of top-down input could be captured implicitly by adjusting model parameters such as activity or plasticity thresholds. For the purposes of our study, we opted to neglect these inputs in favor of model simplicity.

      Critically, even if top-down inputs play a substantially larger role, by perhaps even going as far as providing signals to abGCs to modulate their development, the core solution to the flexibility-stability dilemma that we describe stays local: we predict that the memory persists in the same network in which it was formed.

      (3) To what degree does specific connectivity reflect a specific stimulus configuration, and is it a good proxy for determining stimulus discriminability and memory capacity in terms of temporal activity patterns (difference in latency/phase with respect to the respiration cycle, etc.), which may account for a substantial fraction of the ability to discriminate between stimuli? The authors mention in the discussion that this is, indeed, an upper bound and that specific connectivity is necessary for different temporal activity patterns, but a further expansion on this topic would help in understanding the limitations of the model.

      We thank the reviewer for raising this important point. Indeed, there have been several recent experimental studies indicating that much of the information needed for olfactory discrimination is encoded in the temporal activity patterns of mitral and tufted cells. Our model does not explicitly simulate these dynamics. It was for this reason that we defined memory in terms of the learned structure of the network rather than by firing rate activity. This is motivated by the idea that learned patterns of connectivity constrain the space of neural activity the network can support, and thus shape stimulus responses. We now make this limitation more explicit in the discussion and clarify that the specific MC–GC connectivity we analyze should be seen as a structural substrate that constrains the possible temporal transformations the network could support (Lines 492-506).

      (4) Reward or reward prediction error signals are not considered in the model. They however are ubiquitous in nature and likely to be encountered and shape the connectivity and activity patterns of the abGC-mitral cell network. Including a discussion of how the model may be adjusted to incorporate reward/error signals would strengthen the manuscript.

      We appreciate the reviewer’s suggestion and agree that reward and reward prediction error signals are critical components of many learning paradigms. We deliberately chose not to model associative learning, reward signals or top-down neuromodulation in this work. Our goal is to investigate the role of adult neurogenesis in a regime where its contribution has been shown to be experimentally necessary. Specifically, we focused on an unsupervised perceptual learning paradigm where adult neurogenesis is required for successful odor discrimination (Moreno et al. PNAS, 2008). In contrast, when the same odors are used in a rewarded learning paradigm, performance remains intact even when adult neurogenesis is ablated (Imayoshi et al., Nat. Neuro., 2008). This dissociation suggests that neurogenesis is dispensable in contexts where reward can guide learning. As such, we argue that isolating the contribution of local circuit dynamics in an unsupervised setting is critical to understanding what neurogenesis is uniquely enabling, especially given the evolutionary cost of maintaining it.

      We agree that extending this work to incorporate reward-driven plasticity or neuromodulatory influences would be a valuable direction for future research. In particular, it could help clarify how different learning paradigms engage distinct abGC cohorts (e.g., Mandairon et al., eLife 2018; Wu & Komiyama, Sci. Adv. 2020), and how task structure shapes memory allocation and engram composition. We have incorporated this into the discussion regarding extending our model to include top-down feedback (lines 539-553).

      Specific comments

      (1) Lines 84-86; 507-509; Eq(3): Sensory input is defined by a basal parameter of MCs spontaneous activity (Sspontaneus) and the odor stimuli input (Siodor) but is not clear from the main text or methods how sensory inputs (glomerular patterns) were modeled

      We now clarify in the Methods section "Stimulus model" how the sensory inputs were modeled. Specifically, odor-evoked inputs to mitral cells (Siodor) were generated either as Gaussian profiles across the mitral cell population (Figs. 2,3) or as sparser random patterns (Figs. 4,5). In Figures 2 and 3, the denser Gaussian stimuli require more GCs to learn the odors, aiding in visualization of the connectivity matrix (Figure 2H) and abGC recruitment plots (Figure 2I,J; Figure 3C,E). However, real olfactory stimuli activate a sparse set of MCs, so in Figures 4 and 5, where we address learning of many stimuli, we utilize sparser, binary stimuli delivered to only 10% of MCs, within the range of experimental data (Wachowiak and Cohen, Neuron, 2001). The fact that the stimuli are binary, however, is not realistic and leads to denser representations. This leads to a worst-case scenario for the model, as denser memory representations are easier to overwrite. These points have been added explicitly to the Methods section "Stimulus model" to improve clarity.
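      For illustration only, the two stimulus classes described above could be sketched as follows. This is not the code used for the manuscript: the function names, the Gaussian width, the baseline value, and the additive combination of spontaneous and odor-evoked input are all assumptions made for the example.

```python
import math
import random


def gaussian_stimulus(n_cells, center, width, amplitude=1.0):
    """Dense odor input: a Gaussian bump over the mitral-cell index (hypothetical helper)."""
    return [
        amplitude * math.exp(-((i - center) ** 2) / (2.0 * width ** 2))
        for i in range(n_cells)
    ]


def sparse_binary_stimulus(n_cells, fraction=0.10, amplitude=1.0, rng=None):
    """Sparse odor input: a random `fraction` of cells receive `amplitude`, all others 0."""
    rng = rng or random.Random()
    n_active = round(fraction * n_cells)
    active = set(rng.sample(range(n_cells), n_active))
    return [amplitude if i in active else 0.0 for i in range(n_cells)]


def total_input(odor, spontaneous=0.2):
    """Assumed form of the sensory input: uniform spontaneous baseline plus odor-evoked input."""
    return [spontaneous + s for s in odor]


rng = random.Random(0)
dense = gaussian_stimulus(n_cells=100, center=50, width=10)
sparse = sparse_binary_stimulus(n_cells=100, fraction=0.10, rng=rng)
sparse_with_baseline = total_input(sparse)
```

      In this sketch the Gaussian pattern drives many cells at graded levels, whereas the binary pattern drives exactly 10% of cells at full amplitude, which is the contrast between the two stimulus classes drawn in the response above.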

      (2) Lines 118-122: The used perceptual learning task explanation is done only in the context of the discriminability of similar artificial stimuli using the Fisher discriminant and "Memory" metric. A detailed description of the logic of the perceptual learning task methods and objective, taking into account Comment 1, would help to better understand the model.

      We thank the reviewer for pointing out that we had not adequately described the task and have updated the main text (lines 125-132) and included a new methods section "Perceptual learning task" to describe it more explicitly. The experiments that inspired the simulation followed an ecological model of discrimination learning (Moreno et al. PNAS 2009): For one hour a day over a ten-day "enrichment period", two tea balls containing similar but distinct odors were suspended from the lid of each mouse's home cage. The mice engaged with the stimuli under self-directed conditions, therefore learning through natural experience. As a result, the mice learn to use olfactory information to discriminate between the similar stimuli, a skill potentially relevant for navigation or social behaviors.

      In our simulations, we model these experiments as follows. During the enrichment period, the model is stimulated with a randomly selected stimulus chosen from a set of two similar stimuli, corresponding to a mouse choosing to sniff one of the tea balls. During enrichment, in between these bouts of "sniffing", the model only receives spontaneous activity, reflecting the temporal sparsity of sensory input even over the enrichment period. Outside of enrichment, the model again receives only spontaneous input.
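      As a rough illustration of this protocol (a sketch under stated assumptions, not the simulation code itself), the input schedule could be generated as below. The time-bin size, the daily enrichment window, and the sniffing probability `p_sniff` are hypothetical parameters chosen only for the example.

```python
import random


def enrichment_schedule(n_days, enrichment_days, bins_per_day, enrichment_bins, p_sniff, rng):
    """Label every time bin as odor 'A', odor 'B', or 'spontaneous' (hypothetical helper).

    Odors can only appear in the first `enrichment_bins` bins of an enrichment day;
    within that window each bin is a sniff bout with probability `p_sniff`.
    """
    schedule = []
    for day in range(n_days):
        for time_bin in range(bins_per_day):
            in_enrichment = (day in enrichment_days) and (time_bin < enrichment_bins)
            if in_enrichment and rng.random() < p_sniff:
                # The "mouse" sniffs one of the two tea balls, chosen at random.
                schedule.append(rng.choice(("A", "B")))
            else:
                # Between bouts and outside enrichment: spontaneous activity only.
                schedule.append("spontaneous")
    return schedule


rng = random.Random(1)
schedule = enrichment_schedule(
    n_days=14,                    # 2 days before, 10 days of enrichment, 2 days after
    enrichment_days=range(2, 12),
    bins_per_day=240,             # 6-minute bins (assumed resolution)
    enrichment_bins=10,           # one hour of enrichment per day
    p_sniff=0.5,                  # assumed probability of a sniff bout per bin
    rng=rng,
)
```

      Each label would then be mapped to the corresponding input vector (one of the two similar odor patterns, or the spontaneous baseline alone) before being passed to the network.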

      (3) Rapid re-learning of a forgotten odor pair is enabled by sensory-dependent dendritic elaboration of neurons that initially encoded the odors, and the observed re-learning would occur even if neurogenesis was blocked following the first enrichment and even though the initial learning did require neurogenesis. When would this ever occur in nature? The re-learning of an odor period? Why is this highlighted in the study?

      We believe that this sort of learning is certainly relevant in nature. To clarify: by “learning,” we do not refer to the memory of an entire “odor period”, but simply an altered mapping of specific stimuli. Therefore, forgetting could occur if these specific stimuli are absent from the environment for a period of time, and re-learning would occur when these stimuli are re-encountered. Natural odor environments are highly dynamic, as environmental conditions and social contexts change over time. The odors an animal encounters also depend strongly on its own behavior; as it explores different environments, it may be exposed to particular odors intermittently: it could encounter them in one location, then not return to that location for some time before returning again.

      Such natural variability in odor exposure makes the ability to forget and re-learn especially valuable, allowing the animal to prioritize relevant information while maintaining flexibility. To this end, we show in Figure 5G that the synaptic forgetting of odors is beneficial to the performance of the model because it reduces interference in the network. Therefore, we highlight that re-learning enabled by adult neurogenesis is a highly efficient strategy for memory storage and retrieval, which is why we emphasize it in this study.

      (4) Figure 2A: I understand that the ages shown at the bottom of the colored boxes represent the GC age. If so, find a better way to express that to avoid confusing 'GC ages' from the days shown in the perceptual learning task description (Figure 2B).

      We have updated the text in the figure to disambiguate the two and now refer to the “days” shown in the perceptual learning task description as “time relative to enrichment”.

      (5) Figure 2B: Clarify how the two-dimensional arrays are arranged to represent the patterns shown. Does each point of the array represent one neuron? If so, are these neurons re-arranged to help the readers visually differentiate patterns A and B? Are the patterns of activity of MCs in the model spatially and temporally sparse as observed in experimental work?

      In Figure 2B, each point in the two-dimensional array represents the activity of a single mitral cell. The layout is purely for visualization—neurons are re-arranged to make the differences between odor patterns A and B visually apparent. This ordering does not reflect anatomical position or model architecture. We revised the Figure 2 caption to say this explicitly.

      Regarding spatial sparseness, as we mentioned in the response to the reviewer’s comment (1), the activity of mitral cells in response to odors is spatially sparse in the model. Regarding temporal sparseness, the model is not spiking and does not include temporal dynamics within the timescale of the breath; however, odor input is delivered in discrete, odor-specific epochs interleaved with periods of no input, which leads to temporally structured activity patterns. This information has been made explicit in the new methods sections "Stimulus model" and "Perceptual learning task".

      (6) Figure 3C and Line 189: potential confusion between the color code mentioned in the legend for the enrichment and developing periods.

      It appeared to be a confusion in the text and has been corrected (Lines 212-213).

      (7) Figure 5F: For clarity, this would benefit from replacing the bold line with areas in the plot to depict the enrichment periods.

      We agree that replacing the bolded line segments with shaded areas is clearer and have updated the figure accordingly. We appreciate the reviewer's suggestion.

      (8) Lines 380, 416: Potential role of cortical feedback and or neuromodulation depending on behavioral relevance or permanent exposure? Later mentioned in Lines 467 - 474.

      We have updated the text to acknowledge the role of potential cortical feedback and neuromodulation, now in lines 403-407.

    1. eLife Assessment

      This important work sets out to identify the neural substrates of associative fear responses in adult zebrafish. Through a compelling and innovative paradigm and analysis, the authors suggest brain regions associated with individual differences in fear memory. While several findings are well supported, aspects of the interpretation and presentation are partially incomplete, and the manuscript would benefit from adjusting key claims or including additional experiments. Nonetheless, this study showcases the strength of zebrafish for systems-level neuroscience and will be of broad interest to the neuroscience community.

    2. Reviewer #1 (Public review):

      Summary:

      This work provides a comprehensive analysis of how adult zebrafish show fear responses to conspecific alarm substances (CAS) and retain their associative memory. It shows that freezing is a more reliable measure of fear response and memory compared to evasive swimming, and that the reactivity and the type of responses depend on the zebrafish strain. It further suggests neuronal substrates of different fear responses based on c-Fos mapping.

      Strengths:

      The behavioral part is the most comprehensive and detailed yet in the zebrafish field, providing strong support for the authors' claim. The flow from Figure 1 to Figure 4 is very smooth. They provide extremely detailed, yet complementary and necessary, analyses of how different categories of behavior emerge over time during the CAS exposure and memory retrieval. I'm convinced that neuro researchers who study fear/stress responses will always refer to this paper to plan and interpret their future experiments.

      Weaknesses:

      The neural analysis part is very comprehensive. Figure 5 and Figure 6 are independent but complement each other very well. They together support that the cerebellar system is the key brain component for a freezing response. Their extreme focus on high-level analyses, however, came at the expense of biological intuitions. I suggest adding some figure panels and result/discussion paragraphs to help with that aspect.

    3. Reviewer #2 (Public review):

      In this study, Fontana et al. develop a paradigm for associative conditioning by pairing exposure to an alarm substance with a novel tank. Exposure to conspecific alarm substance (CAS) in the novel tank triggers freezing and what they characterize as evasive swimming behaviour, which is subsequently seen in a re-exposure to the novel tank without the CAS present. Importantly, these states are identified via automated processes, including postural tracking and a random forest classification process, which could be very useful tools for subsequent studies.

      In their experiments, they focus on the differences in behaviour among strains of zebrafish (both males and females), and among individual zebrafish. For males and females of different strains, they find some differences, though the clearest message seems to be that the most robust measure of the behaviour in response to both the CAS and in the memory trials is the freezing behaviour, while evasive behaviour is more variable and not always seen. This may relate to their observation of significant "evasiveness" in vehicle control experiments (discussed further below).

      Moving on to individual variation from within this multi-strain male/female dataset, they first examine transition matrices between states and find that this is not dramatically altered by stimulus exposure. They then use clustering to identify 4 different "classes" of zebrafish that differ in their expression (or not) of two types of behaviour: freezing and/or evasive behaviour. They show that over the three exposure epochs of the experiment, this classification is somewhat stable in an individual fish, though many fish change their behaviour - e.g., evading + freezing -> only freezing.

      In the final set of experiments, the authors move beyond behavioural analyses and perform whole-brain cFos mapping of these individual zebrafish. They perform analyses aimed at identifying correlations between individual behavioural expression and the number of cFos-positive cells in different brain regions. Using partial least squares analysis, they find areas associated with two types of behavioural contrasts, which differ in their weighting of different behavioural expression during the Memory trials. Covariation and network structure analysis within different classes of larvae also find some differences in covariation among brain areas, providing hypotheses as to underlying network effects that may govern the expression of freezing and/or evasive behavior in the memory trial phases.

      Overall, I find this to be an interesting study that employs state-of-the-art methods of behavioural analysis and whole-brain cFos analysis, but I am left a little bit confused as to what the take-home message is and what can be concluded from this complex study that mixes in analyses of strain, sex, and individuality within a quite complex assay with multiple behavioural parameters.

      My suggestions are as follows:

      (1) My first concern relates to the claim in the abstract that "We found that fear memory behavior fell into four distinct groups: non-reactive, evaders, evading freezers, and freezers".

      In my opinion, the "freezing" aspect is well supported as being both triggered by the CAS and for memory effect upon re-exposure to the tank, but I am less convinced about the "evasive" behaviour. In Figure 2, it appears that "evasiveness" is generally not increased in both the Exposure or Memory phases for many groups, and in Figure 5, it appears that "evasiveness" is expressed by nearly 50% of the fish in the pre-exposure condition before CAS addition and in all phases in the vehicle condition. Therefore, it appears that most of the expression of this behaviour is independent of any memory-based effect.

      (2) My second concern relates to the claim in the abstract that "background strain and sex influenced how fish respond to CAS, with males more likely to increase evasive behaviors than females and the TU strain more likely to be non-reactive."

      My understanding, based on the introduction and on the methods, is that it is likely important that the CAS be prepared from conspecifics of the same strain and sex, and for this reason, they prepared different CAS specific for each strain and each sex. Therefore, the "CAS" that is applied is necessarily different for each condition, and I am concerned that the differences observed could relate more to variation in the quality, purity, concentration, etc. of the specific CAS samples for different groups, rather than their reactivity to the substance or their ability to form memories based on such experiences.

      (3) My third concern relates to the interpretation of the cFos data.

      As I mentioned above, I feel as though the behavioural analysis is perhaps more complex than is warranted via the inclusion of evasiveness, and I wonder if the conclusions from the experiments would be simpler if analyzed only from the perspective of freezing.

      But considering the presented analyses: while I don't think there is anything wrong with the partial least squares approach and the network analyses, I am concerned that the simple messaging in the text does not reflect the complexity of this analysis combining different weightings of different behavioural characteristics in a behavioural contrast, or covariations among many regions and what such analyses mean at the level of brain function. For these reasons, I feel like statements along the lines of "Behavioral variation is driven by differences in the activity of brain regions outside the telencephalon, such as the cerebellum, preglomerular nuclei, preoptic area and hypothalamus" are not well supported.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript by Fontana et al. sets out to fill a critical gap in our understanding of how individuality in fear responses corresponds to changes in brain activity. Previous work has shown in myriad species that fear behaviors are highly variable, and these variabilities correlate with sex and strain, with epigenetic modifications, and neural activity in specific regions of the brain, such as the amygdala. However, a whole-brain functional assessment of whether activity in different regions of the brain is associated with fear behavior has been difficult to assess, in part due to the large size and opacity of the brain. The Kenney group overcomes these limitations using the zebrafish, together with powerful behavioral and brain imaging approaches pioneered by their lab. To overcome the technical obstacles of delivering a reproducible unconditioned stimulus in water and quantifying nuanced behavioral responses, the authors developed a three-day conditioning paradigm in which fish were repeatedly exposed to CAS in one tank context and to control water in another. Leveraging automated cluster analysis across over 300 individuals from four inbred strains, they identified four distinct memory-recall phenotypes - non-reactive, evaders, evading freezers, and freezers - demonstrating both the robustness of their assay and the influence of genetic background and sex on fear learning. Finally, whole-brain imaging using the AZBA atlas (Kenney et al. eLife) and cfos mapping coupled with multivariate analysis revealed that although all fish reengaged telencephalic regions during recall, high-freezing phenotypes uniquely recruited cerebellar, preglomerular, and pretectal nuclei, whereas mixed evasion-freezing fish showed preferential activation of preoptic and hypothalamic areas - a finding that lays the groundwork for dissecting the distributed neural substrates of associative fear in zebrafish.

      Strengths:

      The strengths of the study lie in the use of zebrafish and the innovative behavioral, modeling, and brain imaging tools applied to address this question. The question of how brain-wide activity correlates with variations in fear behavior is fundamental, and arguably, this system is the only system that could be used to address this. The statistics are appropriate, and the study is well reasoned. Overall, I like this manuscript very much and think it adds invaluable information to the field of fear/anxiety.

      Weaknesses:

      I have a few questions and suggestions.

      (1) The three-day contextual fear paradigm, as implemented - one CAS pairing on day 2 followed by a single recall test on day 3 - inevitably conflates acquisition and long-term memory, making it impossible to know whether strains like TU truly recall the association poorly or simply learn it more slowly. For example, given that TU fish extinguish fear faster than AB or TL strains in extended protocols, they may simply require additional or repeated CAS pairings to achieve the same asymptotic performance. To disentangle learning kinetics from recall strength, the assay could be revised to include multiple acquisition trials (e.g., conditioning on two or more consecutive days) with an immediate post-conditioning probe to assess acquisition independent of consolidation, and continuous measurement of freezing and evasive behaviors across each trial to fit learning curves for each strain. Such refinements - even if on a subset of the strains - would reveal whether "non-reactive" phenotypes reflect genuine recall deficits or merely delayed acquisition.

      (2) My second major question is with respect to Figure 3 panel B. This is a complex figure, and I can understand the gist of what the authors are attempting to show, but it is difficult to understand as it is. Can this be represented in a way that is clearer and explained a bit more easily?

      (3) The brain mapping is by far one of the most interesting aspects of this study, and the methods that the group used are interesting. The brain mapping, however, relies on generating "contrasting" groups (Figure 6A), and I was not clear as to how these two groups were formed. Could the authors elaborate a bit?

    1. eLife Assessment

      This valuable study focuses on defining how the HSP70 chaperone system utilizes J-domain proteins to regulate the heat shock response-associated transcription factor HSF1. Using a combination of orthogonal techniques in yeast, this manuscript provides compelling evidence that the J-domain protein Apj1 facilitates attenuation of HSF1 transcriptional activity through a mechanism involving its dissociation from heat shock gene promoter regions. This work generates new insight into the mechanism of HSF1 transcriptional regulation and is a significant contribution of broad interest to cell biologists interested in proteostasis, chaperone networks, and stress-responsive signaling.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors present a thorough mechanistic study of the J-domain protein Apj1 in Saccharomyces cerevisiae, establishing it as a key repressor of Hsf1 during the attenuation phase of the heat shock response (HSR). The authors integrate genetic, transcriptomic (ribosome profiling), biochemical (ChIP, Western), and imaging data to dissect how Apj1, Ydj1 and Sis1 modulate Hsf1 activity under stress and non-stress conditions. The work proposes a model where Apj1 specifically promotes displacement of Hsf1 from DNA-bound heat shock elements, linking nuclear PQC to transcriptional control.

      Strengths:

      Overall, the work is highly novel-this is the first detailed functional dissection of Apj1 in Hsf1 attenuation. It fills an important gap in our understanding of how Hsf1 activity is fine-tuned after stress induction, with implications for broader eukaryotic systems. I really appreciate the use of innovative techniques including ribosome profiling and time-resolved localization of proteins (and tagged loci) to probe Hsf1 mechanism. The overall proposed mechanism is compelling and clear-the discussion proposes a phased control model for Hsf1 by distinct JDPs, with Apj1 acting post-activation, while Sis1 and Ydj1 suppress basal activity.

      The manuscript is well-written and will be exciting for the proteostasis field and beyond.

      Comments on revised version:

      The authors have addressed all my concerns.

    3. Reviewer #2 (Public review):

      Summary:

      Overall, the work is exceptionally well done and controlled and the results properly and appropriately interpreted. While several of the approaches, while powerful, are somewhat indirect (i.e., following gene expression via ribosomal profiling) additional experiments utilizing traditional gene expression assays added in revision combine to ultimately provide a compelling answer to the main questions being asked.

      The key finding from this work is the discovery that Apj1 regulates Hsf1 attenuation in a manner that includes Hsp70. That finding is strongly supported by the experimental data. While it would be ideal to also demonstrate Apj1-controlled differential binding of Ssa1/2 to Hsf1 at either the N- or C-terminal binding sites during attenuation, the Hsp70-Hsf1 interactions are difficult to reproducibly assess in cell extracts and are likely beyond the scope of this study. However, this work paves the way in the future for potential biochemical reconstitution assays that could elucidate both Hsp70-Hsf1 interactions as well as the distinct JDP-Hsf1 interactions reported here.

      This discovery raises additional new questions about JDP specificity in HSR regulation and the role of JDPs in navigating protein aggregation and sensing of proteostatic challenge in the nucleus, thus advancing the field and opening new, exciting avenues for exploration.

    4. Reviewer #3 (Public review):

      Summary:

      The heat shock response (HSR) is an inducible transcriptional program that has provided paradigmatic insight into how stress cues feed information into the control of gene expression. The recent elucidation that the chaperone Hsp70 controls the DNA binding activity of the central HSR transcription factor Hsf1 by direct binding has spurred the question of how such a general chaperone obtains specificity. This study has addressed the next logical question, how J-domain proteins execute this task in budding yeast, the leading cell model for studying the HSR. While an involvement and in part overlapping function of the general class A and B J-domain proteins, Ydj1 and Sis1, are indicated by the genetic analysis, a highly specific role for the class A Apj1 in displacing Hsf1 from the promoters is found, unveiling specificity in the system.

      Strengths

      The central strong point of the paper is the identification of class A J-domain protein Apj1 as a specific regulator of the attenuation of the HSR by removing Hsf1 from HSEs at the promoters. The genetic evidence and the ChIP data strongly support this claim. This identification of a specific role for a lowly expressed nuclear J-domain protein changes how the wiring of the HSR should be viewed. It also raises important questions regarding the model of chaperone titration, the concept that a chaperone with limiting availability is involved in a tug of war involving competing interactions with misfolded protein substrates and regulatory interactions with Hsf1. Perhaps Apj1 with its low levels and interactions with misfolded and aggregated proteins in the nucleus is the titrated Hsp70 (co)chaperone that determines the extent of the HSR? This would mean that Apj1 is at the nexus of the chaperone titration mechanism. Although Apj1 is not a highly conserved J domain protein among eukaryotes, the strength of the study is that it provides a conceptual framework for what may be required for chaperone titration in other eukaryotes: One or more nuclear J-domain proteins with low nuclear levels that have an affinity for Hsf1 and that can become limiting due to interactions with misfolded Hsp70 proteins. This provides a pathway for how these may be identified using, for example, ChIP-seq.

      Weakness

      A built-in challenge when studying the mechanism of the HSR is the general role of the Hsp70 chaperone system and its J domain proteins. Indeed, a weakness of the study is that it is unclear which of the phenotypic effects stem from direct, J domain protein-dependent recruitment of Hsp70 to Hsf1 and which are instead an indirect effect of protein misfolding caused by the mutation. This interpretation problem is clearly and appropriately dealt with in the manuscript text and in experiments but is of such a fundamental nature that it cannot easily be fully ruled out.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      We thank the reviewer for his/her very positive comments.

      Reviewer #2 (Public review):

      We thank the reviewer for his/her positive evaluation. We plan to add RNAseq data of yeast wild-type and JDP mutant strains as more direct readout for the role of Apj1 in controlling Hsf1 activity. We agree with the reviewer that our study includes one major finding: the central role of Apj1 in controlling the attenuation phase of the heat shock response. In accordance with the reviewer we consider this finding highly relevant and interesting for a broad readership. We agree that additional studies are now necessary to mechanistically dissect how the diverse JDPs support Hsp70 in controlling Hsf1 activity. We believe that such analysis should be part of an independent study but we will indicate this aspect as part of an outlook in the discussion section of a revised manuscript.

      Reviewer #3 (Public review):

      We thank the reviewer for his/her suggestions. We agree that it is sometimes difficult to distinguish direct effects of JDP mutants on heat shock regulation from indirect ones, which can result from the accumulation of misfolded proteins that titrate Hsp70 capacity. We also agree that an in vitro reconstitution of Hsf1 displacement from DNA by Apj1/Hsp70 will be important, also to dissect Apj1 function mechanistically. We will add this point as outlook to the revised manuscript.

      Reviewer #1 (Recommendations for the authors): 

      (1) Can the authors submit the raw translatome data to a standard repository? Also, the data should be summarized in a supplemental Excel table. 

      We submitted the raw translatome data to the NCBI Gene Expression Omnibus and added the analyzed data sets (shown in Figures 1 and 5) as Supplementary Tables S4/S5 (excel sheets). We additionally included RNAseq analysis of yeast WT and JDP mutants set grown at 25°C, complementing and confirming our former translatome analysis (new Figure 5, Figure Supplement 2). Respective transcriptome raw data were also deposited at the NCBI Gene Expression Omnibus and analyzed data are available as Supplementary Table S7.

      (2) MW indicators need to be added to the Western Blot figures. 

      We added molecular weight markers to the Western Blot figures.

      (3) Can the authors please include the sequences of the primers used in all the RT-qPCR experiments? They mention they are in the supplemental information, but I couldn't locate them. 

      We added the sequences of the RT-qPCR primers as Supplementary Table S4.

      (4) Given the clear mechanism proposed, it would be nice if the authors could provide a nice summary figure. 

      We followed the suggestion of the reviewer and illustrate our main finding as new Figure 7.

      Reviewer #2 (Recommendations for the authors): 

      (1) As mentioned above, a co-IP experiment between Hsf1 and Ssa1/2 in APJ1 and apj1∆ cells, utilizing Hsf1 alleles with and without the two known binding sites, would cement the assignment of Apj1 in the Hsf1 regulatory circuit. 

      We agree with the reviewer that Hsf1-Ssa1/2 pulldown experiments, as done by Pincus and colleagues (1), will further specify the role of Apj1 in targeting Hsp70 to Hsf1 during the attenuation phase of the heat shock response. We made extensive attempts at such pulldown experiments to document dissociation of Ssa1/2 from Hsf1 upon heat shock in yeast wild-type cells. While we could specifically detect Ssa1/2 upon Hsf-HA1 pulldown, our results after heat shock were highly variable and inconclusive and did not allow us to probe for a role of Apj1 or the two known Ssa1/2 binding sites in the phase-specific targeting. We now discuss the potential roles of the two distinct Ssa1/2 binding sites for phase-specific regulation of Hsf1 activity in the revised manuscript (page 12, lines 17-21).

      (2) Experiments in Figure 3 nicely localize CHIP reactions with known HSEs. A final confirmatory experiment utilizing a mutated HSE (another classic experiment in the field) would cement this finding and validate the motif and reporter-based analysis. 

      We thank the reviewer for this meaningful suggestion. We have performed a comparable experiment by using the non-Hsf1 regulated gene BUD3, which lacks HSEs, as reference. We engineered a counterpart, termed “BUD3 HS-UAS”, which bears inserted HSEs, derived from the native UAS of HSP82, within the BUD3 UAS. We show that BUD3+ lacking HSEs is not occupied by Hsf1 and Apj1 under either non-stress or heat shock conditions, while BUD3-HSE is clearly occupied under both, paralleling Hsf1 and Apj1 occupancy of HSP82 (Figure 3E). We have renamed the engineered allele to “BUD3-HSE” to clarify the experimental design and output.

      (3) Page 8 - the ydj1-4xcga allele is introduced without explaining why it's needed, since ydj1∆ cells are viable. The authors should acknowledge the latter fact, then justify why the RQC depletion approach is preferred. Especially since the ydj1∆ mutant appears in Figure 5B. 

      ydj1∆ cells are viable, yet they grow extremely slowly at 25°C and hardly at all at 30°C, making them difficult to handle. The RQC-mediated depletion of Ydj1 in ydj1-4xcga cells allows for solid growth at 30°C, facilitating strain handling and analysis of Ydj1 function. Importantly, ydj1-4xcga cells are still temperature-sensitive and exhibit the same deregulation of the heat shock response upon combination with apj1∆ as observed for ydj1∆ cells. Thus ydj1 knockout and knockdown cells do not differ in the relevant phenotypes reported here, and we performed most of the analysis with ydj1-4xcga cells due to their growth advantage. We added a respective explanation to the text (page 8, lines 13-14).

      (4) The authors raise the possibility that Sis1, Apj1, and Ydj1 may all be competing for access to Ssa1/2 at different phases of the HSR, and that access may be dictated by conformational changes in Hsf1. Given that there are at least two known Hsp70 binding sites that have negative regulatory activity in Hsf1, the possibility that domain-specific association governs the different roles should be considered. It is also unclear how the JDPs are associating with Hsf1 differentially if all binding is through Ssa1/2. 

      We thank the reviewer for the comment and will add the possibility of specific roles of the identified Hsp70 binding sites in regulating Hsf1 activity at the different phases of the heat shock response to the discussion section. Binding of Ssa1/2 to substrates (including Hsf1) is dependent on J-domain proteins (JDPs), which differ in substrate specificity. It is tempting to speculate that the distinct JDPs recognize different sites in Hsf1 and are responsible for mediating the specific binding of Ssa1/2 to either N- or C-terminal sites in Hsf1. Thus, the specific binding of a JDP to Hsf1 might dictate the binding of Ssa1/2 to either binding site. We discuss this aspect in the revised manuscript (page 12, lines 17-21).

      (5) Figure 6 - temperature sensitivity of hsf1 and ydj1 mutants has been linked to defects in the cell wall integrity pathway rather than general proteostasis collapse. This is easily tested via plating on osmotically supportive media (i.e., 1M sorbitol) and should be done throughout Figure 6 to properly interpret the results.

      Our data indicate proteostasis breakdown in ydj1 cells by showing strongly altered localization of Sis1-GFP, pointing to massive protein aggregation (Figure 6 – Figure Supplement 1D).

      We followed the suggestion of the reviewer and performed spot tests in the presence of 1 M sorbitol (see figure below). Sorbitol improves growth of ydj1-4xcga mutant cells at increased temperatures, in agreement with the remark of the reviewer. We do not think, however, that growth rescue by sorbitol points to specific defects of the ydj1 mutant in cell wall integrity. Sorbitol functions as a chemical chaperone and has been shown to have protective effects on cellular proteostasis and to rescue phenotypes of diverse point mutants in yeast cells by facilitating folding of the respective mutant proteins and suppressing their aggregation (2-4). Thus sorbitol can broadly restore proteostasis, which can also explain its effects on growth of ydj1 mutants at increased temperatures. Because the readout of the spot test with sorbitol is therefore ambiguous, we prefer not to show it in the manuscript.

      Author response image 1.

      Serial dilutions of indicated yeast strains were spotted on YPD plates without and with 1 M sorbitol and incubated at indicated temperatures for 2 days.

      Reviewer #3 (Recommendations for the authors): 

      (1) Line 154: Can the authors, by analysis, offer an explanation for why HSR attenuation varies between genes for the sis1-4xcga strain? Is it, for example, a consequence of using a hypomorph rather than a knockout, an mRNA turnover issue, or that Hsf1 has different affinities for the HSEs in the promoters? 

      We used the sis1-4xcga knock-down strain because Sis1 is essential for yeast viability. The point raised by the reviewer is highly valid, and we thought extensively about the diverse consequences of Sis1 depletion on levels of, e.g., translated BTN2 (minor impact) and HSP104 (strong impact) mRNA. We have since performed transcriptome analysis and confirmed the specific impact of Sis1 depletion on HSP104 mRNA levels, while BTN2 mRNA levels remained much less affected (new Figure 5 - Figure Supplement 2A/B). We compared numbers and spacings of HSEs in the respective target genes but could not identify obvious differences. Hsf1 occupancy within the UAS region of both BTN2 and HSP104 is very comparable at three different time points of a 39°C heat shock: 0, 5 and 120 min, arguing against different Hsf1 affinities to the respective HSEs (5). The molecular basis for the target-specific derepression upon Sis1 depletion thus remains to be explored. We added a respective comment to the revised version of the manuscript (page 12, lines 3-8).

      (2) Line 194: The analysis of ChIP-seq is not very elaborated in its presentation. How specific is this interaction? Can it be ruled out by analysis that it is simply the highly expressed genes after the HS that lead to Apj1 appearing there? More generally: Can the data in the main figure be presented to give a more unbiased genome-wide view of the results?

      Overall, we observed a low number of Apj1 binding events in the UAS of genes. The interaction of Apj1 with HSEs is specific, as we do not observe Apj1 binding to the UAS of well-expressed non-heat shock genes. Similarly, Apj1 does not bind to ARS504 (Figure S3 – Figure Supplement 1). We extended the description of our ChIP-seq analysis procedures leading to the identification of HSEs as Apj1 target sites to make the data analysis easier to understand. We additionally re-analysed the two Apj1 binding peaks that did not reveal an HSE in our original analysis. Using a modified setting, we can identify a slightly degenerate HSE in the promoter region of the two genes (TMA10, RIE1) and changed Figure 3C accordingly. Notably, TMA10 is a known target gene of Hsf1. The expanded analysis further documents the specificity of the Apj1 binding peaks.

      (3) Line 215. Figure 3. The clear anticorrelation is puzzling. Presumably, Apj1 binds Hsf1 as a substrate, and then a straight correlation is expected: When Hsf1 substrate levels decrease at the promoters, also Apj1 signal is predicted to decrease. What explanations could there be for this? Is it, for example, that Hsf1 is not always available as a substrate on every promoter, or is Apj1 tied up elsewhere in the cell/nucleus early after HS? 

      We propose that Apj1 binds HSE-bound Hsf1 only after clearance of nuclear inclusions, which form upon heat stress. Apj1 thereby couples the restoration of nuclear proteostasis to the attenuation of the heat shock response. This explains the delayed binding of Apj1 to HSEs (via Hsf1), while Hsf1 shows highest binding upon activation of the heat shock response (early timepoints). Notably, the binding efficiencies of Hsf1 and Apj1 (% input) differ greatly: we determine strong binding of Hsf1 five min post heat shock (30-40% of input), whereas at most 3-4% of the input is pulled down with Apj1 (60 min post heat shock) (Figure 3D). Even at this late timepoint, 10-20% of the input is pulled down with Hsf1. The different kinetics and pulldown efficiencies suggest that Apj1 displaces Hsf1 from HSEs, and accordingly Hsf1 stays bound to HSEs in apj1∆ cells (Figure 4). This activity of Apj1 explains the anti-correlation: increased targeting of Apj1 to HSE-bound Hsf1 will lower the absolute levels of HSE-bound Hsf1. What we observe in the ChIP experiment at the individual timepoints is a snapshot of this reaction. Accordingly, at the last timepoint analyzed (120 min after heat shock), we observe low binding of both Hsf1 and Apj1, as the heat shock response has been shut down.

      (4) Line 253: "Sis-depleted".  

      We have corrected the mistake.

      (5) Line 332: Fig. 6C SIS1 OE from pRS315. A YIP would have been better, 20% of the cells will typically not express a protein with a CEN/ARS of the pRS-series so the Sis1 overexpression phenotype may be underestimated and this may impact on the interpretation. 

      We agree with the reviewer that Yeast Integrated Plasmids (YIP) represent the gold standard for complementation assays. We are not aware of a study showing that 20% of cells harboring pRS-plasmids do not express the encoded protein. The results shown in Fig. 8C/D demonstrate that even strong overproduction of Sis1 cannot restore Hsf1 activity control. This interpretation would not be affected even if a certain percentage of these cells did not express Sis1. Nevertheless, we added a comment to the respective section pointing to the possibility that the Sis1 effect might be underestimated due to variations in Sis1 expression (page 11, lines 15-19).

      (6) Figure 1C. Since n=2, a more transparent way of showing the data is the individual data points. It is used elsewhere in the manuscript, and I recommend it. 

      We agree that showing individual data points can enhance transparency, particularly with small sample sizes. However, the log2 fold change (log2FC) values presented in Figure 1C and other figures derived from ribosome profiling and RNAseq experiments were generated using the DESeq2 package. The DESeq2 pipeline is widely used in analyzing differential gene expression and is known for its statistical robustness. It performs differential expression analysis based on a model that incorporates normalization, dispersion estimation, and shrinkage of fold changes. The pipeline automatically accounts for biological and technical variability as well as batch effects, thereby improving the reliability of results. These log2FC values are not directly calculated from log-transformed normalized counts of individual samples but are instead estimated from a fitted model comparing group means. Therefore, DESeq2 log2FC values cannot be shown for individual replicates.
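
      The point that a group-level log2FC is a single value per gene, rather than one value per replicate, can be illustrated with a deliberately simplified sketch. It is not the DESeq2 algorithm: it omits size-factor normalization, dispersion estimation, the negative binomial model fit and fold-change shrinkage, and only takes the log2 ratio of two group means. The function name, the pseudocount parameter and the counts are hypothetical and chosen for illustration only.

```python
import math


def group_log2_fold_change(control, treated, pseudocount=1.0):
    """Return a single log2 fold change from the means of two replicate groups.

    Simplified illustration only: a plain ratio of group means, with a
    pseudocount to avoid taking the log of zero. This is NOT how DESeq2
    estimates fold changes; it only shows that the result is one number
    per gene and comparison, not one number per replicate.
    """
    mean_control = sum(control) / len(control)
    mean_treated = sum(treated) / len(treated)
    return math.log2((mean_treated + pseudocount) / (mean_control + pseudocount))


# Hypothetical normalized counts for one gene, n = 2 replicates per condition.
wild_type = [100.0, 140.0]
mutant = [430.0, 530.0]

log2fc = group_log2_fold_change(wild_type, mutant)
print(f"group-level log2FC = {log2fc:.2f}")
```

      Because both replicates of each condition enter through the group mean, there is no per-replicate fold change to plot; replicate-level transparency would have to come from the normalized counts themselves.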

      (7) Figure 1D. Please add the number of minutes on the X-axis. Figure legend: "Cycloheximide" is capitalized.  

      We revised the figure and figure legend as recommended.

      (8) Several figure panels: Statistical tests and SD error bars for experiments performed in duplicates simply feel wrong for this reviewer. I do recognize that parts of the community are calculating, in essence, quasi-p-values using parametric methods for experiments with far too low sample numbers, but I recommend not doing so. In my opinion, better to show the two data points and interpret with caution.

      We followed the advice of the reviewer and removed statistical tests for experiments based on duplicates.

      References

      (1) Krakowiak, J., Zheng, X., Patel, N., Feder, Z. A., Anandhakumar, J., Valerius, K. et al. (2018) Hsf1 and Hsp70 constitute a two-component feedback loop that regulates the yeast heat shock response eLife 7,

      (2) Guiberson, N. G. L., Pineda, A., Abramov, D., Kharel, P., Carnazza, K. E., Wragg, R. T. et al. (2018) Mechanism-based rescue of Munc18-1 dysfunction in varied encephalopathies by chemical chaperones Nature communications 9, 3986

      (3) Singh, L. R., Chen, X., Kozich, V., and Kruger, W. D. (2007) Chemical chaperone rescue of mutant human cystathionine beta-synthase Mol Genet Metab 91, 335-342

      (4) Marathe, S., and Bose, T. (2024) Chemical chaperone - sorbitol corrects cohesion and translational defects in the Roberts mutant bioRxiv 10.1101/2024.09.04.610945

      (5) Pincus, D., Anandhakumar, J., Thiru, P., Guertin, M. J., Erkine, A. M., and Gross, D. S. (2018) Genetic and epigenetic determinants establish a continuum of Hsf1 occupancy and activity across the yeast genome Mol Biol Cell 29, 3168-3182

    1. eLife Assessment

      This important study provides compelling evidence that fever-like temperatures enhance the export of Plasmodium falciparum transmembrane proteins, including the cytoadherence protein PfEMP1 and the nutrient channel PSAC, to the red blood cell surface, thereby increasing cytoadhesion. Using rigorous and well-controlled experiments, the authors convincingly demonstrate that this effect results from accelerated protein trafficking rather than changes in protein production or parasite development. These findings significantly advance our understanding of parasite virulence mechanisms and offer insights into how febrile episodes may exacerbate malaria severity.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript from Jones and colleagues investigates a previously described phenomenon in which P. falciparum malaria parasites display increased trafficking of proteins displayed on the surface of infected RBCs, as well as increased cytoadherence in response to febrile temperatures. While this parasite response was previously described, it was not uniformly accepted, and conflicting reports can be found in the literature. This variability likely arises due to differences in the methods employed and the degree of temperature increase to which the parasites were exposed. Here, the authors are very careful to employ a temperature shift that likely reflects what is happening in infected humans and that they demonstrate is not detrimental to parasite viability or replication. In addition, they go on to investigate what steps in protein trafficking are affected by exposure to increased temperature and show that the effect is not specific to PfEMP1 but rather likely affects all transmembrane domain-containing proteins that are trafficked to the RBC. They also detect increased rates of phosphorylation of trafficked proteins, consistent with overall increased protein export.

      Strengths:

      The authors used a relatively mild increase in temperature (39 degrees), which they demonstrate is not detrimental to parasite viability or replication. This enabled them to avoid potential complications of a more severe heat shock that might have affected previously published studies. They employed a clever method of fractionation of RBCs infected with a var2csa-nanoluc fusion protein expressing parasite line to determine which step in the export pathway was likely accelerating in response to increased temperature. This enabled them to determine that export across the PVM is being affected. They also explored changes in phosphorylation of exported proteins and demonstrated that the effect is not limited to PfEMP1 but appears to affect numerous (or potentially all) exported transmembrane domain-containing proteins.

      Weaknesses:

      All the experiments investigating changes resulting from increased temperature were conducted after an increase in temperature from 16 to 24 hours, with sampling or assays conducted at the 24 hr mark. While this provided consistency throughout the study, this is a time point relatively early in the export of proteins to the RBC surface, as shown in Figure 1E. At 24 hrs, only approximately 50% of wildtype parasites are positive for PfEMP1, while at 32 hrs this approaches 80%. Since the authors only checked the effect of heat stress at 24 hrs, it is not possible to determine if the changes they observe reflect an overall increase in protein trafficking or instead a shift to earlier (or an accelerated) trafficking. In other words, if a second time point had been considered (for example, 32 hrs or later), would the parasites grown in the absence of heat stress catch up?

    3. Reviewer #2 (Public review):

      This manuscript describes experiments characterising how malaria parasites respond to physiologically relevant heat-shock conditions. The authors show, quite convincingly, that moderate heat-shock appears to increase cytoadherence, likely by increasing trafficking of surface proteins involved in this process.

      While generally of a high quality and including a lot of data, I have a few small questions and comments, mainly regarding data interpretation.

      (1) The authors use sorbitol lysis as a proxy for trafficking of PSAC components. This is a very roundabout way of doing things and does not, I think, really show what they claim. There could be a myriad of other reasons for this increased activity (indeed, the authors note potential PSAC activation under these conditions). One further reason could be a difference in the membrane stability following heat shock, which may affect sorbitol uptake, or the fragility of the erythrocytes to hypotonic shock. I really suggest that the authors stick to what they show (increased PSAC) without trying to use this as evidence for increased trafficking of a number of non-specified proteins that they cannot follow directly.

      (2) Supplementary Figure 6C/D: The KAHRP signal does not look like it should. In fact, it doesn't look like anything specific. The HSP70-X signal is also blurry and overexposed. These pictures cannot be used to justify the authors' statements about a lack of colocalisation in any way.

      (3) Figure 6: This experiment confuses me. The authors purport to fractionate proteins using differential lysis, but the proteins they detect are supposed to be transmembrane proteins and thus should always be found associated with the pellet, whether lysis is done using equinatoxin or saponin. Have they discovered a currently unknown trafficking pathway to tell us about? Whilst there is a lot of discussion about the trafficking pathways for TM proteins through the host cell, a number of studies have shown that these proteins are generally found in a membrane-bound state. The authors should elaborate, or choose an experiment that is capable of showing compartment-specific localisation of membrane-bound proteins (protease protection, for example).

      (4) The red blood cell contains, in addition to HSP70-X, a number of human HSPs (HSP70 and HSP90 are significant in this current case). As the name suggests, these proteins non-specifically shield exposed hydrophobic domains revealed upon partial protein unfolding following thermal insult. I would thus have expected to find significantly more enrichment following heat shock, but this is not the case. Is it possible that the physiological heat shock conditions used in this current study are not high enough to cause a real heat shock?

    4. Reviewer #3 (Public review):

      Summary:

      In this paper, it is established that high fever-like 39 °C temperatures cause parasite-infected red blood cells to become stickier. It is thought that high temperatures might help the spleen to destroy parasite-infected cells, and they become stickier in order to remain trapped in blood vessels, so they stop passing through the spleen.

      Strengths:

      The strength of this research is that it shows that fever-like temperatures can cause parasite-infected red blood cells to stick to surfaces designed to mimic the walls of small blood vessels. In a natural infection, this would cause parasite-infected red blood cells to stop circulating through the spleen, where the parasites would be destroyed by the immune system. It is thought that fevers could lead to infected red blood cells becoming stiffer and therefore more easily destroyed in the spleen. Parasites respond to fevers by making their red blood cells stickier, so they stop flowing around the body and into the spleen. The experiments here prove that fever temperatures increase the export of Velcro-like sticky proteins onto the surface of the infected red blood cells and are very thorough and convincing.

      Weaknesses:

      A minor weakness of the paper is that the effects of fever on the stiffness of infected red blood cells were not measured. This can be easily done in the laboratory by measuring how the passage of infected red blood cells through a bed of tiny metal balls is delayed under fever-like temperatures.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      This manuscript from Jones and colleagues investigates a previously described phenomenon in which P. falciparum malaria parasites display increased trafficking of proteins displayed on the surface of infected RBCs, as well as increased cytoadherence in response to febrile temperatures. While this parasite response was previously described, it was not uniformly accepted, and conflicting reports can be found in the literature. This variability likely arises due to differences in the methods employed and the degree of temperature increase to which the parasites were exposed. Here, the authors are very careful to employ a temperature shift that likely reflects what is happening in infected humans and that they demonstrate is not detrimental to parasite viability or replication. In addition, they go on to investigate what steps in protein trafficking are affected by exposure to increased temperature and show that the effect is not specific to PfEMP1 but rather likely affects all transmembrane domain-containing proteins that are trafficked to the RBC. They also detect increased rates of phosphorylation of trafficked proteins, consistent with overall increased protein export.

      Strengths:

      The authors used a relatively mild increase in temperature (39 degrees), which they demonstrate is not detrimental to parasite viability or replication. This enabled them to avoid potential complications of a more severe heat shock that might have affected previously published studies. They employed a clever method of fractionation of RBCs infected with a var2csa-nanoluc fusion protein expressing parasite line to determine which step in the export pathway was likely accelerating in response to increased temperature. This enabled them to determine that export across the PVM is being affected. They also explored changes in phosphorylation of exported proteins and demonstrated that the effect is not limited to PfEMP1 but appears to affect numerous (or potentially all) exported transmembrane domain-containing proteins.

      Weaknesses:

      All the experiments investigating changes resulting from increased temperature were conducted after an increase in temperature from 16 to 24 hours, with sampling or assays conducted at the 24 hr mark. While this provided consistency throughout the study, this is a time point relatively early in the export of proteins to the RBC surface, as shown in Figure 1E. At 24 hrs, only approximately 50% of wildtype parasites are positive for PfEMP1, while at 32 hrs this approaches 80%. Since the authors only checked the effect of heat stress at 24 hrs, it is not possible to determine if the changes they observe reflect an overall increase in protein trafficking or instead a shift to earlier (or an accelerated) trafficking. In other words, if a second time point had been considered (for example, 32 hrs or later), would the parasites grown in the absence of heat stress catch up?

      We did not assess cytoadhesion at later stages, but in the supplementary figures we show that at 40 hours post infection both heat stress and control conditions have comparable proportions of VAR2CSA-positive iRBCs, whilst they differ at 24 hours. This is true for the DMSO-treated (control, wild-type-like) HA-tagged lines of HSP70x and PF3D7_072500 (Supplementary Figures 9 and 12, respectively). Given that protein levels appear unchanged, we conclude that trafficking is accelerated during these earlier timepoints but remains comparable at later stages. This would still increase the overall bound parasite mass, as parasites start to adhere earlier during or after a heat stress.

      Reviewer #2 (Public review):

      This manuscript describes experiments characterising how malaria parasites respond to physiologically relevant heat-shock conditions. The authors show, quite convincingly, that moderate heat-shock appears to increase cytoadherence, likely by increasing trafficking of surface proteins involved in this process.

      While generally of a high quality and including a lot of data, I have a few small questions and comments, mainly regarding data interpretation.

      (1) The authors use sorbitol lysis as a proxy for trafficking of PSAC components. This is a very roundabout way of doing things and does not, I think, really show what they claim. There could be a myriad of other reasons for this increased activity (indeed, the authors note potential PSAC activation under these conditions). One further reason could be a difference in the membrane stability following heat shock, which may affect sorbitol uptake, or the fragility of the erythrocytes to hypotonic shock. I really suggest that the authors stick to what they show (increased PSAC) without trying to use this as evidence for increased trafficking of a number of non-specified proteins that they cannot follow directly.

      This is a valid point; however, uninfected RBCs do not lyse following heat stress, nor do much younger iRBCs, indicating that the observed effect is specific to infected RBCs at a defined stage. The sorbitol sensitivity assay is performed at 37°C under normal conditions after cells are returned to non-heat stress temperatures, so the effect is not due to transient changes in membrane permeability at elevated temperature.

      Planned experiment: To increase the strength of our conclusions and further test our hypothesis, we will perform sorbitol sensitivity assays on iRBCs at >20 hours post infection following heat stress, in the presence and absence of furosemide, a PSAC inhibitor. If iRBC lysis is abolished with furosemide present, this would confirm that the effect is PSAC-dependent. However, the effect could also be due to altered PSAC activity during heat stress that is maintained at lower temperatures, as outlined in the discussion.

      (2) Supplementary Figure 6C/D: The KAHRP signal does not look like it should. In fact, it doesn't look like anything specific. The HSP70-X signal is also blurry and overexposed. These pictures cannot be used to justify the authors' statements about a lack of colocalisation in any way.

      Planned experiment: We agree that the IFAs are not the best as presented and will include better quality supplementary images in a revised version.

      (3) Figure 6: This experiment confuses me. The authors purport to fractionate proteins using differential lysis, but the proteins they detect are supposed to be transmembrane proteins and thus should always be found associated with the pellet, whether lysis is done using equinatoxin or saponin. Have they discovered a currently unknown trafficking pathway to tell us about? Whilst there is a lot of discussion about the trafficking pathways for TM proteins through the host cell, a number of studies have shown that these proteins are generally found in a membrane-bound state. The authors should elaborate, or choose an experiment that is capable of showing compartment-specific localisation of membrane-bound proteins (protease protection, for example).

      We do not believe we identified a novel trafficking pathway, but rather that we capture trafficking intermediates of PfEMP1 between the PVM and the RBC periphery, in small vesicles and/or possibly Maurer’s clefts. These would still be membrane embedded but, because of their small size, would not be pelleted at the centrifugation speeds used in our study (we did not use ultracentrifugation). This explanation, we believe, is in line with the current hypothesis of PfEMP1 and other exported TMD protein trafficking to the periphery or the Maurer’s clefts.

      (4) The red blood cell contains, in addition to HSP70-X, a number of human HSPs (HSP70 and HSP90 are significant in this current case). As the name suggests, these proteins non-specifically shield exposed hydrophobic domains revealed upon partial protein unfolding following thermal insult. I would thus have expected to find significantly more enrichment following heat shock, but this is not the case. Is it possible that the physiological heat shock conditions used in this current study are not high enough to cause a real heat shock?

      As noted by the reviewer, we do not see enrichment of red blood cell heat shock proteins following heat stress, either with FIKK10.2-TurboID or in the phosphoproteome. We used a physiologically relevant heat stress that significantly modifies the iRBC, as shown by our functional assays. While a higher temperature might induce an association of red blood cell heat shock proteins, such conditions may not accurately reflect the most commonly found context of malaria infection.

      Reviewer #3 (Public review):

      Summary:

      In this paper, it is established that high fever-like 39 °C temperatures cause parasite-infected red blood cells to become stickier. It is thought that high temperatures might help the spleen to destroy parasite-infected cells, and they become stickier in order to remain trapped in blood vessels, so they stop passing through the spleen.

      Strengths:

      The strength of this research is that it shows that fever-like temperatures can cause parasite-infected red blood cells to stick to surfaces designed to mimic the walls of small blood vessels. In a natural infection, this would cause parasite-infected red blood cells to stop circulating through the spleen, where the parasites would be destroyed by the immune system. It is thought that fevers could lead to infected red blood cells becoming stiffer and therefore more easily destroyed in the spleen. Parasites respond to fevers by making their red blood cells stickier, so they stop flowing around the body and into the spleen. The experiments here prove that fever temperatures increase the export of Velcro-like sticky proteins onto the surface of the infected red blood cells and are very thorough and convincing.

      Weaknesses:

      A minor weakness of the paper is that the effects of fever on the stiffness of infected red blood cells were not measured. This can be easily done in the laboratory by measuring how the passage of infected red blood cells through a bed of tiny metal balls is delayed under fever-like temperatures.

      Previous work by Marinkovic et al. (cited in this manuscript) reported that all RBCs, both infected and uninfected, increase in stiffness at 41 °C compared with 37 °C, with trophozoites and schizonts exhibiting a particularly pronounced increase. We agree that it would be interesting to determine whether similar changes occur at physiological fever-like temperatures, and whether this increase in stiffness coincides with the period of elevated protein trafficking. However, since we have already demonstrated enhanced protein export using multiple complementary approaches, we have chosen to address these questions in a follow-up study.

    1. eLife Assessment

      This study provides important insights into how the EBH domain of microtubule end-binding protein 1 (EB1) interacts with SxIP peptides derived from the MACF plus-end tracking protein. The revised manuscript includes convincing ITC and NMR experiments that clarify the role of flanking residues and address the influence of dimerization and cooperativity on binding. While some mechanistic aspects remain difficult to resolve experimentally, the data and analysis now more clearly justify the proposed "dock-and-lock" model and its interpretive value. This work will be of interest to structural biologists and biophysicists studying microtubule-associated protein interactions.

    2. Reviewer #1 (Public review):

      Summary:

      In this article, Almeida and colleagues use a combination of NMR and ITC to study the interaction of the EBH domain of microtubule end-binding protein 1 (EB1) with SxIP peptides derived from the MACF plus-end tracking protein. EBH forms a dimer and in isolation has previously been shown to have a disordered C-terminal tail. Here, the authors use NMR to determine a solution structure of the EBH dimer bound to 11-mer SxIP peptides derived from MACF, and observe that the disordered C-terminal of EBH is recruited by residues C-terminal to the SxIP motif to fold into the final complex. By comparison of binding in different length peptides, and of EBH lacking the C-terminal tail, they show that these additional contacts increase binding affinity by an order of magnitude, greatly stabilising the interaction, in a binding mode they term 'dock-and-lock'.

      The authors also use their new structural knowledge to design peptides with higher affinities, and show in a cell model that these can be weakly recruited to microtubule ends, although a dimeric construct is necessary for efficient recruitment. Ultimately, by demonstrating the feasibility of targeting these proteins, this work points towards the possibility of designing small molecules to block the interactions.

    3. Reviewer #2 (Public review):

      Summary:

      The C-terminal region of EB1 is responsible for protein-protein interactions, thereby recruiting the binding partners of EB1 to microtubules; the coiled-coil region (EBH) and the acidic tail are critical for binding these partners. Using NMR, the authors demonstrated that EBH binds the SxIP motif in a two-step process termed "dock-and-lock". The ITC analysis supports the results obtained from NMR. The initial version of the manuscript contained ambiguities in the ITC data; however, the results of the revised manuscript are convincing and support the two-step binding model.

      Strength:

      The authors propose a novel model of "dock-and-lock" by using multiple methods of NMR, ITC and cell biology.

    4. Author response:

      The following is the authors’ response to the original reviews

      We would like to express our sincere gratitude to the reviewers for their thorough analysis of the manuscript and their extremely helpful comments. We have taken all the suggestions into consideration and conducted a range of additional experiments to address the points raised. We have also extensively revised the manuscript to clarify descriptions, correct inaccuracies and remove inconsistencies. We have modified the figures for clarity and content.

      Overall, we expanded the description of the EBH structure to emphasise its dimeric nature and the impact of the two binding sites on interpreting the binding data, including cooperativity. Using ITC, we tested the effect of the pre-SxIP residues on the binding affinity with additional peptides. We found that these residues had a significant effect, albeit much smaller than that of the post-SxIP residues. We analysed the binding of the 11MACF-VLL mutant with EBH-ΔC and evaluated the exchange rates. In agreement with our model, we found that the EBH affinity for the SxIP peptide from CK5P2 (KKSRLPRILIKRSR), which has a C-terminal sequence similar to that of the 11MACF-VLLRK mutant, is 21 nM, which is similar to the affinity of the mutant itself. This demonstrates the significant variation in affinity observed among natural SxIP ligands, as predicted by our study. Our responses to the specific points raised by the reviewers are provided below.

      Reviewer #1 (Public Review):

      There is no direct experimental evidence for independent dock and lock steps. The model is certainly plausible given their structural data, but all titration and CEST measurements are fully consistent with a simple one-step binding mechanism. Indeed, it is acknowledged that the results for the VLL peptide are not consistent with the predictions of this model, as affinity and dissociation rates do not co-vary. The model may still be a helpful way to interpret and discuss their results, and may indeed be the correct mechanism, but this has not yet been proven.

      Unfortunately, it is not possible to obtain direct experimental evidence because the folding of the C-terminus is too fast to influence the NMR parameters. However, as the reviewer pointed out, our structural data support the two-step model, since folding of the C-terminus is only possible once the ligand containing the post-SxIP residues has bound. By adopting a mechanistically supported model, we can analyse the contributions to binding and relate them to the structural characteristics of the complex. This provides a clearer insight into the roles of the various regions in the interaction and allows them to be modified rationally to enhance the ligand affinity.

      In the revised version, we restate the equations in terms of comparing the on-rates. This provides a clearer view of the effect of the additional stage, which cannot increase the overall on-rate since the two stages are sequential. If the forward rate of the second stage is comparable to or slower than the off-rate of the first stage, the overall on-rate decreases. Conversely, if the forward rate is much faster, the overall on-rate remains unchanged. For the wild-type 11MACF peptide, we observed that the presence of the EBH C-terminus does not affect the on-rate of binding, which is in perfect agreement with the two-step model and indicates that the C-terminus folds very quickly.
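
      The dependence of the overall on-rate on the second stage can be illustrated with a minimal numerical sketch. It assumes the standard steady-state approximation for the docked intermediate in the scheme E + L <-> EL -> EL*, which gives an overall on-rate of k1*k2/(k_minus1 + k2). The exact equations in the revised manuscript are not reproduced here, and all rate constants below are hypothetical values chosen only to show the two limiting regimes.

```python
def apparent_on_rate(k1, k_minus1, k2):
    """Steady-state overall on-rate for E + L <-> EL -> EL* (dock, then lock).

    k1       : docking on-rate (M^-1 s^-1)
    k_minus1 : off-rate of the docked intermediate (s^-1)
    k2       : forward rate of the locking (folding) step (s^-1)
    """
    return k1 * k2 / (k_minus1 + k2)


# Hypothetical rate constants, not taken from the manuscript.
k1 = 1.0e6        # M^-1 s^-1
k_minus1 = 100.0  # s^-1

for k2 in (1.0, 100.0, 1.0e5):
    ratio = apparent_on_rate(k1, k_minus1, k2) / k1
    print(f"k2 = {k2:g} s^-1 -> overall on-rate = {ratio:.3f} x k1")
```

      Under this approximation the ratio to k1 can never exceed 1, it drops to 0.5 when k2 equals k_minus1, and it approaches 1 when k2 is much faster than k_minus1.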

      Additionally, we evaluated the binding of the 11MACF-VLL mutant to EBH-ΔC and observed a twofold decrease in Kd compared to WT 11MACF, primarily due to an increase in the on-rate. Interestingly, this rate is approximately half the overall on-rate for EBH/11MACF-VLL binding, which contradicts the sequential two-step model. This suggests a more complex binding process where binding is accelerated by additional hydrophobic interactions with the unfolded C-terminus. However, given the difficulty of quantifying very slow exchange rates, it is more likely that the discrepancy is due to the accuracy of the rate measurements. Therefore, the model allows the rational analysis of changes in binding parameters due to mutations.

      There is little discussion of the fact that binding occurs to EBH dimers, either in terms of the functional significance of this or in the acquisition and analysis of their data. There is no discussion of cooperation in binding (or its absence), either in the analysis of NMR titrations or in ITC measurements. Complete ITC fit results have not been reported so it is not possible to evaluate this for oneself.

      We added information about the dimer to the introduction, emphasising its role in enhancing interaction with microtubules (MTs) and its structural role in SxIP binding. The ITC data do not exhibit any biphasic behaviour and can be fitted to a single-site model with 1:1 stoichiometry relative to the EB1c monomer. This corresponds to two independent binding sites in the dimer. We have added the stoichiometry to Table 1 and the description. The NMR titration data for the 11MACF and 11MACF-VLL interactions were fitted to the TITAN dimer model, which includes cooperativity parameters. For WT 11MACF, both cooperativity parameters were zero, corresponding to independent binding sites in the ITC model. For 11MACF-VLL, the fitting suggests weak negative cooperativity, with a ~3-fold increase in Kd for binding to the second site and no change in the off-rate. This difference in Kd is likely to be too small to induce a biphasic shape to the ITC curve. As the cooperativity effect on the NMR spectra is small and absent in the ITC, we used the independent sites model for data analysis, as there is insufficient justification for introducing extra parameters into the model. Crucially, fitting to this model did not alter the off-rate value obtained by NMR or affect the conclusions. We added a description of cooperativity to the results and discussion.
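
      One simple way to picture the difference between independent sites and weak negative cooperativity is a two-site binding polynomial. The sketch below is an illustration only: it is not the TITAN dimer model or the ITC fitting routine, it is written in terms of free (not total) ligand concentration, and the Kd values are hypothetical. Only the ~3-fold ratio between the second-site and first-site Kd is taken from the text above.

```python
def sites_bound_per_dimer(free_ligand, kd_first, kd_second):
    """Mean number of occupied sites on a two-site dimer.

    kd_first and kd_second are microscopic dissociation constants for binding
    to the first and second site. All three arguments share the same units.
    """
    x1 = free_ligand / kd_first
    x2 = free_ligand / kd_second
    # Binding polynomial: free dimer + two singly bound states + one doubly bound state.
    polynomial = 1.0 + 2.0 * x1 + x1 * x2
    return (2.0 * x1 + 2.0 * x1 * x2) / polynomial


kd = 1.0  # hypothetical microscopic Kd, arbitrary concentration units
for ligand in (0.1, 1.0, 10.0):
    independent = sites_bound_per_dimer(ligand, kd, kd)
    negative_coop = sites_bound_per_dimer(ligand, kd, 3.0 * kd)
    print(f"[L] = {ligand:g}: independent = {independent:.3f}, "
          f"3-fold weaker second site = {negative_coop:.3f}")
```

      When both Kd values are equal, the expression reduces to 2[L]/(Kd + [L]), which is the independent-sites case with 1:1 stoichiometry per monomer.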

      Three peptides are used to examine the role of C-terminal residues in SxIP motifs: 4-MACF (SKIP), 6-MACF (SKIPTP), and 11-MACF (KPSKIPTPQRK). The 11-mer demonstrates the strongest binding, but this has added residues to the N-terminal as well. It has also introduced charges at both termini, further complicating the interpretation of changes in binding affinities. Given this, I do not believe the authors can reasonably attribute increased affinities solely to post-SxIP residues.

      We tested the 9MACF peptide SKIPTPQRK, which has the same N-terminus as the 4- and 6-MACF peptides, and found that its binding affinity is ~10-fold weaker than that of 11MACF. This demonstrates the contribution of both the pre- and post-SxIP residues. This is likely due to electrostatic interactions between the positively charged N-terminus and the negatively charged EBH surface, similar to those involving the positive charges at the peptide C-terminus. Although significant, the contribution of the N-terminal peptide region is approximately one order of magnitude lower than that of the post-SxIP residues, meaning the post-SxIP region is the main affinity modulator. We have added the binding data on 9MACF and a discussion of the contributions to the manuscript.
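
      For orientation, a fold-change in Kd can be converted into a difference in binding free energy using ΔΔG = RT ln(Kd ratio). The sketch below assumes a temperature of 298.15 K, which is not stated in the response, and uses the ~10-fold ratio quoted above for 9MACF relative to 11MACF.

```python
import math

GAS_CONSTANT = 8.314462618  # J mol^-1 K^-1


def ddg_from_kd_ratio(kd_ratio, temperature_k=298.15):
    """Free energy difference (kJ/mol) corresponding to a fold-change in Kd.

    temperature_k is an assumed value; adjust to the experimental temperature.
    """
    return GAS_CONSTANT * temperature_k * math.log(kd_ratio) / 1000.0


ddg_kj = ddg_from_kd_ratio(10.0)
print(f"10-fold weaker binding ~ {ddg_kj:.2f} kJ/mol "
      f"({ddg_kj / 4.184:.2f} kcal/mol) at 298.15 K")
```

      Because the relationship is logarithmic, each additional order of magnitude in Kd adds the same increment of free energy at a given temperature.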

      Experimental uncertainties are, with exceptions, not reported.

      Uncertainties have been added to the numbers in Table 1 and the text. Information on how the uncertainties were calculated has been added to Table 1.

      Reviewer #1 (Recommendations For The Authors):

      (1) Have you tested the binding of the WT dimer in your cell model?

      We haven’t tested the WT dimer because it has already been reported in the 2009 Cell paper by Honnappa et al. In the cell experiments, our main focus was on recruiting the high-affinity mutant to MTs. The low level of recruitment, despite the mutant's high affinity, highlights the importance of dimerisation or additional contributions to binding.

      (2) Please deposit all NMR dynamics measurements (relaxation rates and derived model-free parameters) alongside structural data in the BMRB.

      The relaxation data have been submitted to BMRB, IDs 53187 and 53188.

      (3) Please report complete fitting results, e.g. for ITC, including stoichiometries. Clarify what this means for binding to a dimer, and if there is any evidence of cooperativity. Figure 3C, right hand panel, shows an unusual stoichiometry, can the authors comment on this?

      We have added more information on stoichiometry and cooperativity; please refer to our response to the above comment for details. We repeated the titration for the VLLRK mutant using fresh peptide stock. As expected, the stoichiometry was close to 1:1 relative to the EB1c monomer. The new data are now included in the table and figure.

      (4) Please report uncertainties for all measurements of Kd, koff, kon, ∆G, ∆H, ∆S, and explain whether these are determined from statistical analysis, technical or biological repeats (and where reported, clarify between standard deviation/standard error). Please also be aware of standard guidelines for reporting significant figures for data with uncertainties, as these have not been followed in Table 1.

      Uncertainties have been added to the numbers in Table 1 and the text. Information on how the uncertainties were calculated has been added to Table 1.

      (5) The construct design for the cell model is unclear - given the importance of flanking residues, please report and discuss how the sequences are attached to venus: which terminus is attached, and what is the linker composition?

      We cloned the peptides at the C-terminus of mTFP, after the GS linker of the vector. The peptide itself contains a GS sequence at the N-terminus, creating a highly flexible GSGS linker that separates the SxIP region from mTFP and minimises the potential effect of mTFP on binding. We followed the design of Honnappa et al. to enable direct comparison with the published results. We have added this information to the 'Methods' section.

      (6) Which HSQC pulse sequence was used for 2D lineshape analysis? The authors mention non-linear chemical shift changes, presumably associated with the dimer interface - this would be useful to expand upon and clarify.

      For the lineshape analysis, we used the standard Bruker sequence hsqcfpf3gpphwg with soft-pulse watergate water suppression and flip-back. This sequence is included in the TITAN model. We added the description of the non-linear chemical shift changes and connection of these changes to the allosteric effect of the binding to the supplementary information describing details of the lineshape analysis.

      (7) Figure 1A could usefully highlight the dimer interface in the surface representation also.

      We believe that including the interface would make the figure too complicated. The dimer configuration is shown in different colours for the two subunits, clearly demonstrating their involvement in forming the binding site.

      (8) Figures 1C and 1D could usefully show a secondary structure schematic to assist the reader. The x-axis in these figures is not linear and this should be corrected. The calculation of combined chemical shift perturbations should be described.

      Thank you for the helpful suggestion. We changed the scale of the figures and added the diagram of the secondary structure.

      (9) Units are missing from many figure axes.

      We added missing units to the axes. Thank you for highlighting this.

      (10) What peptide concentrations are used in Figure 1C? Presumably, these should be reported at saturation for this to be a fair comparison, this should be clarified.

      The protein concentration was 50 µM. Peptides 4MACF and 6MACF were added at a 100-fold molar excess and peptide 11MACF was added at a 4-fold excess. Saturation was achieved for 11MACF. This was impossible for the short peptides due to their mM affinity. This information has been added to the figure legend. The figure's main aim is to illustrate the differences in the chemical shift perturbation profiles, which can be achieved even if full saturation is not attained. Although the absolute value of the chemical shifts is proportional to the degree of saturation, the distribution of the largest chemical shift changes is independent of this degree. Therefore, we can draw conclusions about the distribution of changes by comparing under non-saturation conditions.
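
      The degree of saturation reached under these conditions can be estimated with the standard 1:1 binding quadratic. The sketch below uses the protein concentration and molar excesses stated above, but the Kd values (1 mM for a short peptide, 3 µM for 11MACF) are hypothetical placeholders, since the measured values are not quoted in this response.

```python
import math


def fraction_protein_bound(protein_total, ligand_total, kd):
    """Fraction of protein sites occupied for 1:1 binding (all inputs in µM)."""
    b = protein_total + ligand_total + kd
    complex_conc = (b - math.sqrt(b * b - 4.0 * protein_total * ligand_total)) / 2.0
    return complex_conc / protein_total


protein = 50.0  # µM, as stated in the response

# Hypothetical Kd values, for illustration only.
short_peptide = fraction_protein_bound(protein, 100 * protein, kd=1000.0)
long_peptide = fraction_protein_bound(protein, 4 * protein, kd=3.0)

print(f"short peptide, 100-fold excess, Kd 1 mM: {short_peptide:.2f} bound")
print(f"11MACF-like peptide, 4-fold excess, Kd 3 µM: {long_peptide:.2f} bound")
```

      With a millimolar Kd, even a 100-fold excess leaves a noticeable fraction of the protein free, whereas a low-micromolar Kd gives near-complete saturation at a 4-fold excess.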

      (11) The presentation of raw peak intensities in Figure 1D shows primarily the flexibility of the C-terminal region associated with high intensities. Beyond this, when comparing the binding of peptides it would be much more informative to show relative peak intensities. Residues around 210-225 appear to show strong broadening in the presence of peptide, but this is masked by the low initial intensity. Can the authors clarify and discuss this? Also, what peptide concentrations were used for this comparison? For a fair comparison, it should be close to saturation - particularly to exclude exchange broadening contributions.

      The protein concentration was 50 µM. 4MACF and 6MACF peptides were added at a 100-fold excess and 11MACF at a 4-fold excess. Saturation was achieved for 11MACF. This was impossible to achieve for the short peptides due to their mM affinity. This information has been added to the figure legend. Upon checking the data, we found a small systematic offset in the coiled-coil region of some of the complexes, as the integral intensity had been used in the initial plot. While this does not change the conclusion regarding the high dynamics of the C-terminus, it does create an inaccurate perception of the relative intensities of the folded regions in the different complexes, as noted by the reviewer. We have now plotted the amplitudes at the maximum of the peaks, which do not exhibit any systematic offset as they are much less susceptible to baseline distortions. We are grateful to the reviewer for highlighting this apparent discrepancy.

      (12) Figure 2 - the scale for S2 order parameters appears to be backwards, given the caption, but its range should be indicated. Similarly, the range of values for Rex should also be indicated. These data should also be tabulated/plotted in supporting information.

      We have corrected the figure legend and added S2 and Rex plots to the supplementary material. The figure aims to highlight regions of increased mobility, while the plots provide full quantitative information on the values. We thank the reviewer for pointing out the error in the figure legend and for the suggestions regarding the plots.

      (13) The scale in Figure 3B is illegible. Indeed, the whole structure is quite small and could usefully be expanded.

      We increased the size of the structure panels and added a scale.

      (14) Figure 4 does not show a decrease in exchange rates, as per the caption - no comparison of exchange rates is shown, only thermodynamic information in panel E. Panel C shows CEST measurements, but it is not clear what system this is for - please clarify, and consider showing the comparable data for the ∆C construct for comparison.

      We have amended the figure legend to clarify that the figure shows binding parameters. We added information about the CEST profiles for the EBH/11MACF interaction to the figure legend (Figure 4C). Exchange with the ∆C construct is too fast for CEST measurements. We used lineshape analysis to evaluate the exchange rates for this construct.

      (15) The schematics shown in Figure 4D, and elsewhere, are really quite difficult to understand. They may pose additional challenges to colourblind readers. Please consider ways that this could be clarified.

      We simplified the colour scheme in the model to make the colours easier to see and to highlight SxIP and non-SxIP regions. We believe that this improved the clarity of the figure.

      (16) Figures S1D/E - the x-axes are unclear and units are missing from the y-axes.

      We re-labelled the axes to clarify the scale and units. Thank you for pointing this out.

      Reviewer #2 (Public Review):

      The C-terminal tail of EB1, which is adjacent to EBH and is not analyzed in this study, is highly acidic and plays an important role in protein interactions. If the authors discuss the C-terminus of EB1, they should analyze the whole C-terminus of EB1, which would strengthen the conclusion they have made.

      Honnappa et al., Cell, 2009, reported chemical shift perturbations (CSPs) on peptide binding for the full EB1c fragment, which includes the negatively charged C-terminus. Similar to our study, they observed significant CSPs in the FVIP region but negligible CSPs at the negatively charged EEY end. They concluded that the final eight EB1c residues did not contribute to binding and used a truncated EB1c construct for their structural analysis. Building on that study, we used the same EEY-truncated construct to analyse the contribution of the C-terminus in more detail. We believe that conducting additional experiments with the full C-terminus with respect to SxIP binding would be superfluous, as it would merely replicate the findings of Honnappa et al. We have added the rationale for selecting the truncated EB1c construct to the text, referencing Honnappa et al.

      Reviewer #2 (Recommendations For The Authors):

      (1) Figure 2C: The authors can analyze the 11MACF peptide as well, to provide more assurance to their argument. It would be easier to distinguish the sequences of "SKIP" and "FVIP" by changing their colors.

      Our relaxation analysis (Fig. 2C) focuses on the dynamics of the unstructured C-terminal region in both the free and complex forms. Further relaxation analysis of the peptide would not provide additional information on this, and would be complicated by the presence of free peptide in solution.

      (2) Figure 3B: Acidic residues in EBH should be labeled. Page 6, line 11: If the authors insist that the acidic patch will influence the interactions between EB1 and the peptide, the data of the analysis using the entire EB1 C-terminus should be included, given that the C-terminal tail of EB1 is highly acidic.

      To test the contribution of charge to binding, we conducted an ITC experiment at increasing salt concentrations. We observed a significant increase in Kd values when the concentration of NaCl increased from 50 to 150 mM, which supports our conclusion regarding the significant electrostatic contribution. This conclusion is independent of the presence or absence of the C-terminus.

      As we explained earlier, Honnappa et al., Cell 2009, conducted an NMR experiment on the full EB1c and observed no CSPs in the EEY region, indicating a negligible contribution from the EEY region to SxIP binding. Therefore, we think that additional experiments involving the entire C-terminus are unnecessary, as they would simply replicate the results of Honnappa et al. We have added the rationale for selecting the truncated EB1c to the text, referencing Honnappa et al.

      It would be very difficult to label the acidic residues without enlarging 3B considerably. However, we do not think this is necessary as we are not discussing any specific residues. The current figure shows the distribution of the surface charge, which is sufficient for our purposes.

      (3) Figure 2B (Page 4, line 27): The side chain of S5477 should be drawn. The authors should include a figure of the crystal structure of EBH and SxIP as a comparison (Honnappa et al., Cell, 2009). In their paper, Honnappa et al. performed chemical shift perturbation titrations by NMR. From their analysis, I imagine that the EB1 tail may not be critical for the EB1 C-terminus:SxIP interactions, since the signals in the tail are not significantly perturbed. The authors should cite this paper.

      We are grateful to the reviewer for highlighting this. The CSP analysis of Honnappa et al. revealed significant changes in the FVIP region, which we also observed. They also reported negligible CSPs at the EEY end, demonstrating that this part of the tail is non-critical and can be removed. We have added text to the manuscript to highlight the similarity between our CSPs and those observed by Honnappa et al. Figure 2B shows the side chains for the residues with the strongest detected contacts. These do not include S5477.

      (4) Figure 3C (ITC data): The stoichiometric ratios in the ITC data look strange. EBH vs KPSKIPVLLRKRK, is it 1:1?

      We repeated the ITC experiments using a new stock of the peptide and a new batch of the protein, checking the concentrations using UV spectroscopy. The new experiments produced a stoichiometry close to 1, as shown in the table.

      (5) Page 10, line 27: "The TPQ sequence of 11MACF is not optimal...": What is the meaning of "optimal"? The transient interaction between EB1 and its binding partner is responsible for the dynamics of the microtubule cytoskeleton. In a sense, the relatively weak interaction is "optimal" for the system. The authors should rephrase the word.

      We agree that weak interactions are optimal from a functional perspective, as they have been selected through evolution. In our case, 'optimal' refers to the hydrophobic interaction with the C-terminus. We replaced 'optimal' with 'ideal' to draw more attention to the second part of the sentence, which clarifies the context.

      (6) Page 11, line 2: "small number of comets enriched in the peptide that were too faint for the quantitative analysis, comparable to the reported previously (Honnappa, Gouveia et al. 2009)." Honnappa et al. used EGFP-fusion constructs in their study: EGFP forms a weak dimer, which presumably gave different results from the authors' mTFP-constructs. The authors can note this point in the text.

      We are grateful to the reviewer for highlighting this. This aligns well with our conclusion that dimerisation is important for localisation to comets. We have added this point to the text.

      (7) Page 10, line 21: The authors calculate the free energy of complex formation between EBH and MACF peptide and explain in the text, but it is hard to follow.

      We simplified and clarified the description of the energy contributions by focusing on the SxIP and non-SxIP regions of the peptide, as well as the EBH C-terminus.

      Minor points:

      Page 2, line 9: IP motifs are not usually located in the C-terminus. For example, SxIP in Tastin is located in the N-terminal region, and SxIPs in CLASP are in the middle.

      We corrected this statement, removing C-terminal.

      Page 3, line 4: The authors should note the residue numbers of SKIP.

      We think that in this context the residue numbers of the SxIP region are not important and would be distracting.

      Figure 3D and Figure S3F: Make the colors and the order the same between the two figures.

      We changed the colour scheme and the order of ITC parameters in S3F to match the main figure.

      Figure 1A, 2B, Figure S5: Change the color of SKIP from other residues in the same chain, otherwise the readers cannot distinguish. Likewise, change the color of FVIP in Figure 2B.

      We think that changing the colours would complicate the figures unnecessarily. The corresponding residues are clearly labelled in the figures.

      Figure 3, Figure S5, S6, S7: Box the letters of SKIP for clarity.

      We boxed the SxIP region in S5 (new S6) and underlined in S6 (new S7). In S7 (new S8) the location of SxIP is very clear from the homology.

      Figure 3B; Figure S2: Hard to recognize the peptide (MACF in green).

      We increased the size of 3D and S2, making it easier to see the peptide.

      Figure 1C and D: Make the residual numbers of the x-axes the same between the two graphs.

      We made new plots with a linear scale for the residue numbers.

      Figure 2A: The structures shown are not EB1. It should be described as EBH or EB1(191-260 a.a.).

      Corrected.

      Page 5, line 17: "the S2 values of the C-terminus" should be "the S2 values of the C-terminal loop in EBH", otherwise it is confusing.

      Corrected.

      Page 6, line 27; Figure S3C and S6: Please indicate the assignments of the resonances from "253FVI255" in the Figures.

      We labelled the peaks corresponding to the 253FVI255 region in figure S6 (new S7). Figure S3 shows EBH-ΔC that does not include this region.

      Page 7, line 25: Figure S7 should be S8.

      Corrected.

      Page 12, line 6: "sulfatrahsferases" must be a typo.

      Corrected.

    1. eLife Assessment

      This useful study develops an individual-based model to investigate the evolution of division of labor in vertebrates, comparing the contributions of group augmentation and kin selection. The model incorporates several biologically relevant features, including age-dependent task switching and separate manipulation of relatedness and group-size benefits. However, the evidence remains inadequate to support the authors' central claim that group augmentation is the primary driver of vertebrate division of labor. Key modelling assumptions (such as floater dominance advantages, the absence of task synergy, and the narrow parameter space explored) restrict the potential for kin selection to produce division of labor, thereby limiting the generality of the conclusions.

    2. Reviewer #1 (Public review):

      This paper presents a computational model of the evolution of two different kinds of helping ("work," presumably denoting provisioning, and defense tasks) in a model inspired by cooperatively breeding vertebrates. The helpers in this model are a mix of previous offspring of the breeder and floaters that might have joined the group, and can either transition between the tasks as they age or not. The two types of help have differential costs: "work" reduces "dominance value," (DV), a measure of competitiveness for breeding spots, which otherwise goes up linearly with age, but defense reduces survival probability. Both eventually might preclude the helper from becoming a breeder and reproducing. How much the helpers help, and which tasks (and whether they transition or not), as well as their propensity to disperse, are all evolving quantities. The authors consider three main scenarios: one where relatedness emerges from the model, but there is no benefit to living in groups, one where there is no relatedness, but living in larger groups gives a survival benefit (group augmentation, GA), and one where both effects operate. The main claim is that evolving defensive help or division of labor requires the group augmentation; it doesn't evolve through kin selection alone in the authors' simulations.

      This is an interesting model, and there is much to like about the complexity that is built in. Individual-based simulations like this can be a valuable tool to explore the complex interaction of life history and social traits. Yet, models like this also have to take care of both being very clear on their construction and exploring how some of the ancillary but potentially consequential assumptions affect the results, including robust exploration of the parameter space. I think the current manuscript falls short in these areas, and therefore, I am not yet convinced of the results.

      In this round, the authors provided some clarity, but some questions still remain, and I remain unconvinced by a main assumption that was not addressed.

      Based on the authors' response, if I understand the life history correctly, dispersers either immediately join another group (with probability 1 minus the probability of dispersing), or remain floaters until they successfully compete for a breeder spot or die? Is that correct? I honestly cannot decide, because this seems implicit in the first response, but the response to my second point raises the possibility that floaters do not work while floating but can work if they later join a group as subordinates. If it is the case that floaters can have multiple opportunities to join groups as subordinates (not as breeders; I assume that this is the case for breeding competition), this should be stated, with more details about how this happens.

      So there is still some clarification to be done, and more to the point, the clarification that did happen appears only in the response. The authors should add these details to the main text. Currently, the main text only says vaguely that joining a group after dispersing "is also controlled by the same genetic dispersal predisposition" without saying how.

      In response to my query about the reasonableness of the assumption that floaters are in better condition (in the KS treatment) because they don't do any work, the authors have done some additional modeling, but I fail to see how that addresses my point. The additional simulations do not touch the feature I was commenting on, and arguably make it stronger (since assuming a positive beta_r, which btw is listed as 0 in Table 1, would make floaters on average even stronger than subordinates). It also again confuses me with regard to the previous point, since it implies that dispersal is now also potentially a lifetime event. Is that true?

      Meanwhile, the simplest and most convincing robustness check, which I had suggested last round, is not done: simply reduce the increase in the R of the floater by age relative to subordinates. I suspect this will actually change the results. It seems fairly transparent to me that an average floater in the KS scenario will have R about 15-20% higher than the subordinates (given that no defense evolves, y_h=0.1, H_work evolves to be around 5, and the average lifespan for both floaters and subordinates is in the range of 3.7-2.5 roughly, depending on m). That could be a substantial advantage in competition for breeding spots, depending on how that scramble competition actually works. I asked about this function in the last round (how non-linear is it?) but the authors seem to have neglected to answer.

      More generally, I still find the assumption (and it is an assumption) that floaters are better off than subordinates in a territory questionable. There is no attempt to justify this with any data, and any data I can find points the other way (though typically they compare breeders and floaters, e.g.: https://bioone.org/journals/ardeola/volume-63/issue-1/arla.63.1.2016.rp3/The-Unknown-Life-of-Floaters--The-Hidden-Face-of/10.13157/arla.63.1.2016.rp3.full concludes "the current preliminary consensus is that floaters are 'making the best of a bad job'."). I think if the authors really want to assume that floaters have higher dominance than subordinates, they should justify it. This is driving at least one and possibly most of the key results, since it affects the reproductive value of subordinates (and therefore the costs of helping).

      Regarding division of labor, I think I was not clear, so I will try again. The authors assume that the group reproduction is 1+H_total/(1+H_total), where H_total is the sum of all the defense and work help, but with the proviso that if one of the totals is higher than "H_max", the average of the two totals (plus k_m, but that's set to a low value, so we can ignore it), it is replaced by that. That means, for example, if total "work" help is 10 and "defense" help is 0, total help is given by 5 (well, 5.1 but will ignore k_m). That's what I meant by "marginal benefit of help is only reduced by a half" last round, since in this scenario, adding 1 to work help would make total help go to 5.5 vs. adding 1 to defense help which would make it go to 6. That is a pretty weak form of modeling "both types of tasks are necessary to successfully produce offspring" as the newly added passage says (which I agree with), since if you were getting no defense but a lot of food, adding more food should plausibly have no effect on your production whatsoever (not just half of adding a little defense). This probably explains why often the "division of labor" condition isn't that different than the no DoL condition.

    3. Reviewer #2 (Public review):

      Summary:

      This paper formulates an individual-based model to understand the evolution of division of labor in vertebrates. The model considers a population subdivided in groups, each group has a single asexually-reproducing breeder, other group members (subordinates) can perform two types of tasks called "work" or "defense", individuals have different ages, individuals can disperse between groups, each individual has a dominance rank that increases with age, and upon death of the breeder a new breeder is chosen among group members depending on their dominance. "Workers" pay a reproduction cost by having their dominance decreased, and "defenders" pay a survival cost. Every group member receives a survival benefit with increasing group size. There are 6 genetic traits, each controlled by a single locus, that control propensities to help and disperse, and how task choice and dispersal relate to dominance. To study the effect of group augmentation without kin selection, the authors cross-foster individuals to eliminate relatedness. The paper allows for the evolution of the 6 genetic traits under some different parameter values to study the conditions under which division of labour evolves, defined as the occurrence of different subordinates performing "work" and "defense" tasks. The authors envision the model as one of vertebrate division of labor.

      The main conclusion of the paper is that group augmentation is the primary factor causing the evolution of vertebrate division of labor, rather than kin selection. This conclusion is drawn because, for the parameter values considered, when the benefit of group augmentation is set to zero, no division of labor evolves and all subordinates perform "work" tasks but no "defense" tasks.

      Strengths:

      The model incorporates various biologically realistic details, including the possibility to evolve age polyethism where individuals switch from "work" to "defence" tasks as they age or vice versa, as well as the possibility of comparing the action of group augmentation alone with that of kin selection alone.

      Weaknesses:

      The model and its analysis are limited, which makes the results insufficient to reach the main conclusion that group augmentation and not kin selection is the primary cause of the evolution of vertebrate division of labor. There are several reasons.

      First, the model strongly restricts the possibility that kin selection is relevant. The two tasks considered essentially differ only by whether they are costly for reproduction or survival. "Work" tasks are those costly for reproduction and "defense" tasks are those costly for survival. The two tasks provide the same benefits for reproduction (eqs. 4, 5) and survival (through group augmentation, eq. 3.1). So, whether one, the other, or both tasks evolve presumably only depends on which task is less costly, not really on which benefits it provides. As the two tasks give the same benefits, there is no possibility that the two tasks act synergistically, where performing one task increases a benefit (e.g., increasing someone's survival) that is going to be compounded by someone else performing the other task (e.g., increasing that someone's reproduction). So, there is very little scope for kin selection to cause the evolution of labour in this model. Note synergy between tasks is not something unusual in division of labour models, but is in fact a basic element in them, so excluding it from the start in the model and then making general claims about division of labour is unwarranted. I made this same point in my first review, although phrased differently, but it was left unaddressed.

      Second, the parameter space is very little explored. This is generally an issue when trying to make general claims from an individual-based model where only a very narrow parameter region has been explored of a necessarily particular model. However, in this paper, the issue is more evident. As in this model the two tasks ultimately only differ by their costs, the parameter values specifying their costs should be varied to determine their effects. Instead, the model sets a very low survival cost for work (yh=0.1) and a very high survival cost for defense (xh=3), the latter of which can be compensated by the benefit of group augmentation (xn=3). Some very limited variation of xh and xn is explored, always for very high values, effectively making defense unevolvable except if there is group augmentation. Hence, as I stated in my previous review, a more extensive parameter exploration addressing this should be included, but this has not been done. Consequently, the main conclusion that "division of labor" needs group augmentation is essentially enforced by the limited parameter exploration, in addition to the first reason above.

      Third, what is called "division of labor" here is an overinterpretation. When the two tasks evolve, what exists in the model is some individuals that do reproduction-costly tasks (so-called "work") and survival-costly tasks (so-called "defense"). However, there are really no two tasks that are being completed, in the sense that completing both tasks (e.g., work and defense) is not necessary to achieve a goal (e.g., reproduction). In this model there is only one task (reproduction, equation 4,5) to which both "tasks" contribute equally and so one task doesn't need to be completed if the other task compensates for it. So, this model does not actually consider division of labor.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This paper presents a computational model of the evolution of two different kinds of helping ("work," presumably denoting provisioning, and defense tasks) in a model inspired by cooperatively breeding vertebrates. The helpers in this model are a mix of previous offspring of the breeder and floaters that might have joined the group, and can either transition between the tasks as they age or not. The two types of help have differential costs: "work" reduces "dominance value," (DV), a measure of competitiveness for breeding spots, which otherwise goes up linearly with age, but defense reduces survival probability. Both eventually might preclude the helper from becoming a breeder and reproducing. How much the helpers help, and which tasks (and whether they transition or not), as well as their propensity to disperse, are all evolving quantities. The authors consider three main scenarios: one where relatedness emerges from the model, but there is no benefit to living in groups, one where there is no relatedness, but living in larger groups gives a survival benefit (group augmentation, GA), and one where both effects operate. The main claim is that evolving defensive help or division of labor requires the group augmentation; it doesn't evolve through kin selection alone in the authors' simulations.

      This is an interesting model, and there is much to like about the complexity that is built in. Individual-based simulations like this can be a valuable tool to explore the complex interaction of life history and social traits. Yet, models like this also have to take care of both being very clear on their construction and exploring how some of the ancillary but potentially consequential assumptions affect the results, including robust exploration of the parameter space. I think the current manuscript falls short in these areas, and therefore, I am not yet convinced of the results. Much of this is a matter of clearer and more complete writing: the Materials and Methods section in particular is incomplete or vague in some important junctions. However, there are also some issues with the assumptions that are described clearly.

      Below, I describe my main issues, mostly having to do with model features that are unclear, poorly motivated (as they stand), or potentially unrealistic or underexplored.

      We would like to thank the reviewer for the thoughtful comments that helped us to greatly improve the clarity of our paper.  

      One of the main issues I have is that there is almost no information on what happens to dispersers in the model. Line 369-67 states dispersers might join another group or remain as floaters, but gives no further information on how this is determined. Poring through the notation table also comes up empty as there is no apparent parameter affecting this consequential life history event. At some point, I convinced myself that dispersers remain floaters until they die or become breeders, but several points in the text contradict this directly (e.g., l 107). Clearly this is a hugely important model feature since it determines fitness cost and benefits of dispersal and group size (which also affects relatedness and/or fitness depending on the model). There just isn't enough information to understand this crucial component of the model, and without it, it is hard to make sense of the model output.

      We use the same dispersal gene β to represent the likelihood an individual will either leave or join a group, thereby quantifying both dispersal and immigration using the same parameter. Specifically, individuals with higher β are more likely to remain as floaters (i.e., disperse from their natal group to become a breeder elsewhere), whereas those with lower β are either more likely to remain in their natal group as subordinates (i.e., queue in a group for the breeding position) or join another group if they dispersed.  

      We added in the text “Dispersers may migrate to another group to become subordinates or remain as floaters waiting for breeding opportunities, which is also controlled by the same genetic dispersal propensity as subordinates” to clarify this issue. We also added in Table 1 that β is the “genetic predisposition to disperse versus remain in a group”, and to Figure 1 that “subordinates in the group (natal and immigrants) […]” after we already clarified that “Dispersers/floaters may join a random group to become subordinates.”

      Related to that, it seems to be implied (but never stated explicitly) that floaters do not work, and therefore their DV increases linearly with age (H_work in eq.2 is zero). That means any floaters that manage to stick around long enough would have higher success in competition for breeding spots relative to existing group members. How realistic is this? I think this might be driving the kin selection-only results that defense doesn't evolve without group augmentation (one of the two main ways). Any subordinates (which are mainly zero in the no GA, according to the SI tables; this assumes N=breeder+subordinates, but this isn't explicit anywhere) would be outcompeted by floaters after a short time (since they evolve high H and floaters don't), which in turn increases the benefit of dispersal, explaining why it is so high. Is this parameter regime reasonable? My understanding is that floaters often aren't usually high resource holding potential individuals (either b/c high RHP ones would get selected out of the floater population by establishing territories or b/c floating isn't typically a thriving strategy, given that many resources are tied to territories). In this case, the assumption seems to bias things towards the floaters and against subordinates to inherit territories. This should be explored either with a higher mortality rate for floaters and/or a lower DV increase, or both.

      When it comes to floaters replacing dead breeders, the authors say a bit more, but again, the actual equation for the scramble competition (which only appears as "scramble context" in the notation table) is not given. Is it simply proportional to R_i/\sum_j R_j ? Or is there some other function used? What are the actual numbers of floaters per breeding territory that emerge under different parameter values? These are all very important quantities that have to be described clearly.
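
      The proportional rule the reviewer asks about, R_i/\sum_j R_j, can be sketched as follows. This is only an illustration of that one candidate rule, not the authors' implementation; the function names and the dominance values are hypothetical.

```python
import random


def breeder_win_probabilities(ranks):
    """Win probability of each candidate, proportional to its dominance value R_i."""
    total = sum(ranks)
    if total <= 0:
        raise ValueError("at least one candidate needs a positive dominance value")
    return [r / total for r in ranks]


def draw_new_breeder(ranks, rng=random):
    """Draw the index of the winning candidate under the proportional rule."""
    return rng.choices(range(len(ranks)), weights=ranks, k=1)[0]


# Hypothetical dominance values: two subordinates and one floater.
ranks = [2.5, 2.5, 3.0]
probs = breeder_win_probabilities(ranks)
winner = draw_new_breeder(ranks, random.Random(1))
```

      Under this rule a 20% higher R gives a 20% higher chance of winning relative to each competitor; a steeper (more non-linear) function would amplify the same difference.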

      Although it is true that dispersers do not work when they are floaters, they may later help if they immigrate into a group as a subordinate. Consequently, immigrant subordinates have no inherent competitive advantage over natal subordinates (as step 2.2. “Join a group” is followed by step 3. “Help”, which occurs before step 5. “Become a breeder”). Nevertheless, floaters can potentially outcompete subordinates of the same age if they attempt to breed without first queuing as a subordinate (step 5) when subordinates are engaged in work tasks. We believe that this assumption is realistic and constitutes part of the costs associated with work tasks. However, floaters are at a disadvantage for becoming a breeder because: (1) floaters incur higher mortality than individuals within groups (Eq. 3); and (2) floaters may only attempt to become breeders in some breeding cycles (versus subordinate group members, who are automatically candidates for an open breeding position in the group in each cycle). Therefore, due to their higher mortality, floaters are rarely older than individuals within groups, which heavily influences their dominance value and competitiveness. Additionally, any competitive advantage that floaters might have over other subordinate group members is unlikely to drive the kin selection-only results because subordinates would preferably choose defense tasks instead of work tasks so as not to be at a competitive disadvantage compared to floaters.

      Regarding the concern that floaters are not usually high resource holding potential (RHP) individuals and that our assumptions might therefore be unrealistic, empirical work in a number of species has shown that dispersers are not necessarily those of lower RHP or of lower quality. In fact, according to the ecological constraints hypothesis, one might predict that high quality individuals are the ones that disperse because only individuals in good condition (e.g., larger body size, better energy reserves) can afford the costs associated with dispersal (Cote et al., 2022). To allow differences in dispersal propensity depending on RHP, we extended our model in the Supplemental Materials by incorporating a reaction norm of dispersal based on their rank, D = 1 / (1 + exp(β<sub>R</sub> * R − β<sub>0</sub>)), under the section “Dominance-dependent dispersal propensities”, now referenced in L195. This approach allows individuals to adjust their dispersal strategy to their competitiveness and to avoid kin competition by remaining as a subordinate in another group. Results show that the addition of the reaction norm of dispersal to rank did not qualitatively influence the results described in the main text.
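
      A minimal sketch of such a logistic reaction norm, assuming it has the form D = 1 / (1 + exp(β_R * R − β_0)), i.e. that the two terms in the exponent are separated by a minus sign. That sign convention, the function name and the parameter values are assumptions made for illustration and are not taken from the manuscript.

```python
import math


def dispersal_probability(rank, beta_r, beta_0):
    """Logistic reaction norm D = 1 / (1 + exp(beta_r * rank - beta_0)) (assumed form)."""
    return 1.0 / (1.0 + math.exp(beta_r * rank - beta_0))


# With beta_r = 0 the propensity does not depend on rank at all.
flat = [dispersal_probability(r, beta_r=0.0, beta_0=1.0) for r in (0, 2, 4)]

# Under the assumed sign convention, beta_r > 0 makes D fall as rank rises.
falling = [dispersal_probability(r, beta_r=0.5, beta_0=1.0) for r in (0, 2, 4)]
```

      With the opposite sign convention the direction of the rank effect would flip, so the sign matters when interpreting what a positive β_R implies for the competitiveness of floaters.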

      We also added “number of floaters” present in the whole population to the summary tables as requested.  

      As a side note, the “scramble context” we mention was an additional implementation in which we made rank independent of age. However, since the main conclusions remained unchanged, we decided to remove it for simplicity from the final manuscript, but we forgot to remove it from Table 1 before submission.  

      I also think the asexual reproduction with small mutations assumption is a fairly strong one that also seems to bias the model outcomes in a particular way. I appreciate that the authors actually measured relatedness within groups (though if most groups under KS have no subordinates, that relatedness becomes a bit moot), and also eliminated it with their ingenious swapping-out-subordinates procedure. The fact remains that unless they eliminate relatedness completely, average relatedness, by design, will be very high. (Again, this is also affected by how the fate of the dispersers is determined, but clearly there isn't a lot of joining happening, just judging from mean group sizes under KS only.) This is, of course, why there is so much helping evolving (even if it's not defensive) unless they completely cut out relatedness.

      As we showed in the Supplementary Tables and the section on relatedness in the SI (“Kin selection and the evolution of division of labor”), high relatedness does not appear to explain our results. In evolutionary biology generally and in game theory specifically (with the exception of models on sexual selection or sex-specific traits), asexual reproduction is often modelled because it reduces unnecessary complexity. To further study the effect of relatedness on kin structures more closely resembling those of vertebrates, however, we created an additional “relatedness structure level”, where we shuffled half of the philopatric offspring using the same method used to remove relatedness completely, effectively reducing within-group relatedness structure by half. As shown in the new Figure S3, the conclusions of the model remain unchanged.

      Finally, the "need for division of labor" section is also unclear, and its construction also would seem to bias things against division of labor evolving. For starters, I don't understand the rationale for the convoluted way the authors create an incentive for division of labor. Why not implement something much simpler, like a law of minimum (i.e., the total effect of helping is whatever the help amount for the lowest value task is) or more intuitively: the fecundity is simply a function of "work" help (draw Poisson number of offspring) and survival of offspring (draw binomial from the fecundity) is a function of the "defense" help. As it is, even though the authors say they require division of labor, in fact, they only make a single type of help marginally less beneficial (basically by half) if it is done more than the other. That's a fairly weak selection for division of labor, and to me it seems hard to justify. I suspect either of the alternative assumptions above would actually impose enough selection to make division of labor evolve even without group augmentation.
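
      The "law of minimum" alternative mentioned above can be sketched as follows, to show how strongly it would penalize an unbalanced mix of tasks. The function names and help totals are hypothetical, and this is the reviewer's suggested alternative, not the rule implemented in the manuscript.

```python
def effective_help_minimum(work, defense):
    """Law of the minimum: the scarcer task limits the total effect of help."""
    return min(work, defense)


def marginal_gain(help_fn, work, defense, d_work=0.0, d_defense=0.0):
    """Change in effective help when a little work and/or defense help is added."""
    return help_fn(work + d_work, defense + d_defense) - help_fn(work, defense)


# A group with plenty of work help but no defense help at all.
gain_more_work = marginal_gain(effective_help_minimum, 10.0, 0.0, d_work=1.0)
gain_more_defense = marginal_gain(effective_help_minimum, 10.0, 0.0, d_defense=1.0)
```

      Here extra work help yields no gain at all while extra defense help yields the full unit, which is a much stronger incentive to balance tasks than a rule that merely halves the marginal benefit of the over-supplied task.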

      In nature, multiple tasks are often necessary to successfully rear offspring. We simplify this principle in the model by maximizing reproductive output when both tasks are carried out to a similar extent, allowing for some flexibility from the mean. We added to the manuscript “For example, in many cooperatively breeding birds, the primary reasons that individuals fail to produce offspring are (1) starvation, which is mitigated by the feeding of offspring, and (2) nest depredation, which is countered by defensive behavior. Consequently, both types of tasks are necessary to successfully produce offspring, and focusing solely on one while neglecting the other is likely to result in lower reproductive success than if both tasks are performed by individuals within the group.”

      Regarding making fecundity a function of work tasks and offspring survival as a function of defensive tasks, these are actually equivalent in model terms, as it’s the same whether breeders produce three offspring and two die, or if they only produce one. This represents, of course, an oversimplification of the natural context, where breeding unsuccessfully is more costly (in terms of time and energy investment) than not breeding at all.

      Overall, this is an interesting model, but the simulation is not adequately described or explored to have confidence in the main conclusions yet. Better exposition and more exploration of alternative assumptions and parameter space are needed.

      We hope that our clarifications and extension of the model satisfy your concerns.  

      Reviewer #2 (Public review):

      Summary:

      This paper formulates an individual-based model to understand the evolution of division of labor in vertebrates. A main conclusion of the paper is that direct fitness benefits are the primary factor causing the evolution of vertebrate division of labor, rather than indirect fitness benefits.

      Strengths:

      The paper formulates an individual-based model that is inspired by vertebrate life history. The model incorporates numerous biologically realistic details, including the possibility to evolve age polyethism where individuals switch from work to defence tasks as they age or vice versa, as well as the possibility of comparing the action of group augmentation alone with that of kin selection alone.

      Weaknesses:

      The model makes assumptions that restrict the possibility that kin selection leads to the evolution of helping. In particular, the model assumes that in the absence of group augmentation, subordinates can only help breeders but cannot help non-breeders or increase the survival of breeders, whereas with group augmentation, subordinates can help both breeders and non-breeders and increase the survival of breeders. This is unrealistic as subordinates in real organisms can help other subordinates and increase the survival of non-breeders, even in the absence of group augmentation, for instance, with targeted helping to dominants or allies. This restriction artificially limits the ability of kin selection alone to lead to the evolution of helping, and potentially to division of labor. Hence, the conclusion that group augmentation is the primary factor driving vertebrate division of labor appears forced by the imposed restrictions on kin selection. The model used is also quite particular, and so the claimed generality across vertebrates is not warranted.

      We would like to thank the reviewer for the in-depth review. We respond to these and other comments below.  

      I describe some suggestions for improving the paper below, more or less in the paper's order.

      First, the introduction goes to great lengths trying to convince the reader that this model is the first in this or another way, particularly in being only for vertebrates, as illustrated in the abstract where it is stated that "we lack a theoretical framework to explore the conditions under which division of labor is likely to evolve" (line 13). However, this is a risky and unnecessary motivation. There are many models of division of labor and some of them are likely to be abstract enough to apply to vertebrates even if they are not tailored to vertebrates, so the claims for being first are not only likely to be wrong but will put many readers in an antagonistic position right from the start, which will make it harder to communicate the results. Instead of claiming to be the first or that there is a lack of theoretical frameworks for vertebrate division of labor, I think it is enough and sufficiently interesting to say that the paper formulates an individual-based model motivated by the life history of vertebrates to understand the evolution of vertebrate division of labor. You could then describe the life history properties that the model incorporates (subordinates can become reproductive, low relatedness, age polyethism, etc.) without saying this has never been done or that it is exclusive to vertebrates; indeed, the paper states that these features do not occur in eusocial insects, which is surprising as some "primitively" eusocial insects show them. So, in short, I think the introduction should be extensively revised to avoid claims of being the first and to make it focused on the question being addressed and how it is addressed. I think this could be done in 2-3 paragraphs without the rather extensive review of the literature in the current introduction.

      We have revised the novelty statements in the Introduction by more clearly emphasizing how our model addresses gaps in the existing literature. More details are provided in the comments below.

      Second, the description of the model and results should be clarified substantially. I will give specific suggestions later, but for now, I will just say that it is unclear what the figures show. First, it is unclear what the axes in Figure 2 show, particularly for the vertical one. According to the text in the figure axis, it presumably refers to T, but T is a function of age t, so it is unclear what is being plotted. The legend explaining the triangle and circle symbols is unintelligible (lines 227-230), so again it is unclear what is being plotted; part of the reason for this unintelligibility is that the procedure that presumably underlies it (section starting on line 493) is poorly explained and not understandable (I detail why below). Second, the axes in Figure 3 are similarly unclear. The text in the vertical axis in panel A suggests this is T, however, T is a function of t and gamma_t, so something else must be being done to plot this. Similarly, in panel B, the horizontal axis is presumably R, but R is a function of t and of the helping genotype, so again some explanation is lacking. In all figures, the symbol of what is being plotted should be included.

      We added the symbols of the variables to the Figure axes to increase clarity. In Figure 3A, we corrected the subindex t in the x-axis; it should be subindex R (reaction norm to dominance rank instead of age). As described in Table 1, all values of T, H and R are phenotypically expressed values. For instance, T values are the phenotypically expressed values from the individuals in the population according to their genetic gamma values and their current dominance rank at a given time point.  

      Third, the conclusions sound stronger than the results are. A main conclusion of the paper is that "kin selection alone is unlikely to select for the evolution of defensive tasks and division of labor in vertebrates" (lines 194-195). This conclusion is drawn from the left column in Figure 2, where only kin selection is at play, and the helping that evolves only involves work rather than defense tasks. This conclusion follows because the model assumes that without group augmentation (i.e., xn=0, the kin selection scenario), subordinates can only help breeders to reproduce but cannot help breeders or other subordinates to survive, so the only form of help that evolves is the least costly, not the most beneficial as there is no difference in the benefits given among forms of helping. This assumption is unrealistic, particularly for vertebrates where subordinates can help other group members survive even in the absence of group augmentation (e.g., with targeted help to certain group members, because of dominance hierarchies where the helping would go to the breeder, or because of alliances where the helping would go to other subordinates). I go into further details below, but in short, the model forces a narrow scope for the kin selection scenario, and then the paper concludes that kin selection alone is unlikely to be of relevance for the evolution of vertebrate division of labor. This conclusion is particular to the model used, and it is misleading to suggest that this is a general feature of such a particular model.

      The scope of this paper was to study division of labor in cooperatively breeding species with fertile workers (i.e., primarily vertebrates), in which help is exclusively directed towards breeders to enhance offspring production (i.e., alloparental care). Our focus is in line with previous work in most other social animals, including eusocial insects and humans, which emphasizes how division of labor maximizes group productivity. Other forms of “general” help are not considered in the paper, and such forms of help are rarely considered in cooperatively breeding vertebrates or in the division of labor literature, as they do not result in task partitioning to enhance productivity.

      Overall, I think the paper should be revised extensively to clarify its aims, model, results, and scope of its conclusions.

      Recommendations for the authors: 

      Reviewer #1 (Recommendations for the authors):

      I reserved this section for more minor comments, relating to clarity and a general admonition to give us more detail and exploration of some basic population genetic quantities.

      Another minor point, although depending on whether I assume right or wrong, it could be major: I am not entirely sure that dispersers help in the groups they join as helpers, because of line 399, which states specifically that individuals who do remain in natal territories do. But I assume dispersers help (elsewhere, the authors state helping is not conditional on relatedness to the breeder). Otherwise, this model becomes even weirder for me. Either way, please clarify.

      Apologies if this was not clear. Immigrants that join a group (i.e., dispersers from another group) as subordinates help and queue for a breeding position, as does any natal subordinate born into the group. We rephrased the sentence to “Subordinate group members, either natal or immigrants to the group, […]”

      More generally, in simulation studies like this, there can be interactions between the strength of selection (which affects overall genetic variation maintained in the population), population size, and mutation rate/size, which can affect, for example, relatedness values. None of these quantities is explored here (and their interactions are not quantified), so it is not possible to evaluate the robustness of any of these results.

      Thank you for your comments about the parameter landscape. It is important to point out that variations in the mutation rate do not qualitatively affect our results, as this is something we explored in previous versions of the model (not shown). Briefly, we find that variations in the mutation rates only alter the time required to reach equilibrium. Increasing the step size of mutation diminishes the strength of selection by adding stochasticity and reducing the genetic correlation between offspring and their parents. Population size could, in theory, affect our results, as small populations are more prone to extinction. Since this was not something we planned to explore in the paper directly, we specifically chose a large population size, or better said, a large number of territories (i.e. 5000) that can potentially host a large population.  

      The authors also never say how it is actually determined. There is the evolved helping variable, and there is also the evolved reaction norm. I assume that the actual amount of help of each type is given by the product of T (equation 1) and H (for defense) and (1-T) and H (for work), but this should be stated explicitly.  

      Help provided is an interaction between H (total effort) and T (proportion of total effort invested in each type of task). To clarify the distinction between these two processes, we have now added “Hence, the gene α regulates the amount of help expressed, while the genes γ determine which specific helping tasks are performed at different time points in the breeding cycle”.  
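
The partition described here can be sketched in a few lines. This is a minimal illustration, assuming the reviewer's reading above that defense effort is the product of H and T and work effort is the product of H and (1 - T); the function and argument names are hypothetical and are not taken from the model code.

```python
def partition_help(total_help: float, task_share: float) -> tuple[float, float]:
    """Split total helping effort H into (defense, work) using the proportion T.

    Assumption: T is the share of help allocated to defense, so T = 0 means
    pure work and T = 1 means pure defense.
    """
    if total_help < 0:
        raise ValueError("total_help (H) must be non-negative")
    if not 0.0 <= task_share <= 1.0:
        raise ValueError("task_share (T) must lie in [0, 1]")
    defense = total_help * task_share        # H * T
    work = total_help * (1.0 - task_share)   # H * (1 - T)
    return defense, work


if __name__ == "__main__":
    # A helper with total effort H = 2.0 that allocates a quarter of it to defense.
    print(partition_help(2.0, 0.25))
```

Under this reading, an intermediate T splits a single individual's effort between both tasks, whereas T = 0 or T = 1 corresponds to full specialization in work or defense.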

      It is also weird that after introducing the T variable as a function of age, Figure 3 actually depicts it as a function of dominance value.

      Thank you for pointing out an error in Eq. 1. This inequality was indeed written incorrectly in the paper (but is correct in the model code); it is dominance rank instead of age (see code in Individual.cpp lines 99-119). We corrected this mistake throughout the manuscript.

      What is "scramble context"?

      “Scramble context” was an additional implementation that we decided to remove from the final manuscript, but we forgot to remove from Table 1 before submission. We have now removed it from the table.

      Reviewer #2 (Recommendations for the authors):

      Some specific comments:

      (1) L 31: "All theoretical..." These absolute statements are risky and unnecessary.

      Rephrased to “To date, most theoretical and empirical work…”

      (2) L 46: I believe Tom Wenseleers has published on the evolution of division of labor with reproductive workers and high within-colony conflict.

      Tom Wenseleers has indeed produced some models on the evolution of cooperation in social insects where some workers may reproduce. However, these models focus on the relevance of relatedness and policing in selecting for a reduction in within-group conflict and the evolution of reproductive division of labor. Our model focuses instead on division of labor among workers (helpers). We have rephrased this section to “task specialization is linked to sterility and where conflict of interest is generally low” to account for social insect species in which variation in relatedness between group members and higher levels of reproductive conflict may arise. We also cited one of his papers.

      (3) L 57: Again, unnecessary categorical statements.

      Rephrased to “Although a great deal of recent empirical work highlights the importance of direct benefits in the evolution of cooperative breeding behavior in vertebrates [21–24], we lack understanding on the joint influence of direct and indirect fitness benefits in the evolution of division of labor.”

      (4) L 67: This is said to be a key distinction, but in the paper, such a key role is not clearly shown. This and other tangential points are unnecessary to keep the introduction to the point.

      The different fitness costs of different tasks is the basis of our model on division of labor. Therefore, this is a key distinction and basis from which to describe different tasks in the model. We have left this sentence unchanged.

      (5) L 61-73: "In vertebrates, however, helpers may obtain fitness benefits directly via reproduction..." Some social insects may do so as well. It seems unnecessary and incorrect to say that vertebrate sociality is fundamentally different from invertebrate one. I think it is sufficiently interesting to say this work aims to understand vertebrate division of labor, by explicitly modeling aspects of its life history, without saying this can't happen in invertebrates or that no other model has ever done anything like it.

      Our point is not that workers in some social insects cannot obtain direct fitness benefits, but that previous models focusing on the colony's reproductive outcome are only a good approximation for eusocial insects with sterile workers. However, to make this clearer we have added “In vertebrates and social insect with fertile workers, however, helpers may obtain fitness benefits directly via […]”.

      (6) L 74-86: By this point, the introduction reads like a series of disconnected comments without a clear point.

      In L60 we added: “Understanding how direct and indirect benefits interact is particularly important in systems where individuals may differentially bear the fitness costs of cooperation”. By adding this sentence, we emphasize our focus on the largely unexplored direct fitness benefits and costs, as well as their interaction with indirect fitness. We then proceed to explain why it is crucial to consider that tasks have varying direct fitness costs and how the fitness benefits derived from cooperation change with age and resource-holding potential. These elements are essential for studying the division of labour in species with totipotent workers.

      (7) L 87: This sentence gives a clear aim. It would be clearer if the introduction focused on this aim.

      With the new sentence added in L60 (see previous comment), we bring the focus to the main question that we are trying to address in this paper earlier in the Introduction.  

      (8) L 88: "stochastic model" should be changed to "individual-based model".

      Done.

      (9) L 104: "limited number" is unclear. Say a fixed finite number, or something specific.

      Done.

      (10) L 105: "unspecified number" is unclear. Say the number of subordinates emerges from the population dynamics.

      Changed to “variable number of subordinate helpers, the number of which is shaped by population dynamics, with all group members capable of reproducing during their lifetime”.

      (11) L 112: "Dispersers" is used, but in the previous lines 107-109, the three categories introduced used different terms. Those three terms introduced should be used consistently throughout the paper, without using two or more terms for one thing.

      We use the term “disperser” to describe individuals that disperse from their natal group.

      Dispersers can assume one of three roles: (1) they can join another group as "subordinates"; (2) they can join another group as "breeders" if they successfully outcompete others; or (3) they can remain as "floaters" if they fail to join a group. "Floaters" are individuals who persist in a transient state without access to a breeding territory, waiting for opportunities to join a group in an established territory. We rephrased the sentence to “Dispersers cannot reproduce without acquiring a territory (denoted here as floaters)”. This was also clarified in other instances where the term “dispersers” was used (e.g. L407). In other instances where this might not have been clear, we replaced “dispersers” with “floaters”.

      (12) L 112: "(floaters)" Unclear parenthesis.

      See previous comment.  

      (13) L 115: There should be a reference to Methods around here.

      Added a reference to Figure 1.

      (14) L 117: To be clearer, say instead that dominance value is a linearly increasing function of age as a proxy of RHP and a linearly decreasing function of help provided due to the costs of working tasks. And refer to equation 2.

      Rephrased to “We use the term dominance value to designate the competitiveness of an individual compared to other candidates in becoming a breeder, regardless of group membership, that increases as a function of age, serving as a proxy for resource holding potential (RHP), and decreases as a function of help provided, reflecting costs to body condition from performing working tasks (Eq. 2).” We did not include “linearly” to keep it simpler, since it is clear from Eq. 2, which is now referenced here.  
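
As a rough illustration of this definition, the sketch below assumes the linear form suggested by the reviewer: dominance value rises with age and falls with work help, scaled by the cost parameter yh. The exact form of Eq. 2 is not reproduced in this response, so the unit slope on age, the absence of an intercept, and all names are assumptions made for illustration only.

```python
def dominance_value(age: int, work_help: float, y_h: float) -> float:
    """Illustrative linear dominance value.

    Assumptions (not taken from Eq. 2): age contributes with slope 1 as a
    proxy for resource holding potential, and work help reduces the value
    in proportion to the cost parameter y_h.
    """
    if age < 0 or work_help < 0 or y_h < 0:
        raise ValueError("age, work_help and y_h must be non-negative")
    return age - y_h * work_help


if __name__ == "__main__":
    # Two candidates of the same age: the one that performed more work tasks
    # ends up with the lower dominance value.
    idle = dominance_value(age=5, work_help=0.0, y_h=0.5)
    worker = dominance_value(age=5, work_help=2.0, y_h=0.5)
    print(idle, worker)
```

The point of the sketch is only the direction of the two effects: older individuals compete better for a breeding position, and work tasks erode that advantage.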

      (15) L 119: "Subordinate helpers". As all subordinates are helpers, the helper qualifier is confusing.

      Subordinates are not necessarily helpers, as they can evolve help values of 0, which is why we make it explicit here.

      (16) L 119: "choose". This terminology may be misleading. The way things are implemented in the model is that individuals are assigned a task depending on their genetic traits gamma. Perhaps it would be better to use a less intentional term, like perform one of two tasks.

      We changed “choose between two” to “engage in one of two”, which has fewer connotations of intentionality.

      (17) L 124: "Subordinates can [...] exhibit task specialization that [...] varies with their dominance value". It should be that it varies with age.

      Apologies. The equation was wrong; it does vary with dominance value. We corrected it accordingly.

      (18) L 133: "maximised" This is apparently important for the modelling procedure, but it is completely unclear what it means. Equation 4 comes out of nowhere, and it is said that such an equation is the maximum amount of help that can affect fecundity. Why? What does this mean? If there is something that is maximised, this should be proven. This value is then used for something (line 507), but it is unclear why or what it is used for (it says "we use the value of Hmax instead" without saying what for, no justification for the listed inequalities are given, and the claimed maximisation of an unspecified variable at those H values is not proven). Moreover, the notation in this section is also unclear: what are the sums over? Also, Hdefence and Hwork should vary over the index that is summed over, but the notation suggests that those quantities don't vary.

      We changed “maximized” to “greatest”, and we added a clarification of the rationale behind maximizing the impact of help on the breeder’s productivity: “For example, in many cooperatively breeding birds, the primary reasons that breeders fail to produce offspring are (1) starvation, which is mitigated by the feeding of offspring, here considered as a work task, and (2) nest depredation, which is countered by defensive behavior. Consequently, both types of tasks are often necessary for successful reproduction, and focusing solely on one while neglecting the other is likely to result in lower reproductive success than if both tasks are performed by helpers within the group.”

      We now also clarify that the sums are for help given within a group (L 507), and added indexes to the equations.

      (19) L 152: "habitat saturation" How is this implemented? How is density dependence implemented? Or can the population size keep increasing indefinitely? It would be good to plot the population size over time, the group size over time, and the variance in group size over time. This could substantiate later statements about enhancing group productivity and could all be shown in the SI.

      Habitat saturation emerges from population dynamics: territories are limited while the number of individuals fluctuates, so highly productive environments become saturated. Because the number of group members is not restricted in our model, the population could theoretically increase indefinitely. However, this is not observed in the results presented here, as we selected parameter landscapes that stabilize population numbers. For simplicity, we did not incorporate density-dependent mortality, so we confined our parameters to those under which the population neither increased indefinitely nor collapsed. Consequently, the group size reported in the SI, where the standard deviation is already included, closely represents group size at any other time during equilibrium.

      L 336: we changed “environments with habitat saturation” to “environments that lead to habitat saturation”, to increase clarity.

      (20) L 152: "lifecycle". Rather than the lifecycle, the figure describes the cycle of events in a single time step. The lifecycle (birth to death) goes over multiple time steps (as individuals live over multiple steps). So this figure shouldn't be called a life cycle.

      We changed “lifecycle” to “breeding cycle”.

      (21) L 156: "generation". This is not a generation but a time step.

      We changed “generation” to “breeding cycle”.

      (22) L 157: "previous life cycle" would mean that the productivity of a breeder depends on the number of helpers that its parents had, which is not what is meant.

      We changed “lifecycle” to “breeding cycle”.

      (23) L 158: "Maximum productivity is achieved when different helping tasks are performed to a similar extent." Again, unclear why that is the case.

      We added a clarification on this, see response to comment 18.  

      (24) L 160: "Dispersers/floaters". Use just one term for a single thing.

      See response to comment 11.   

      (25) L 162: "dispersal costs". I don't recall these being described in Methods.

      Individuals that disperse do not enjoy the protection of living in a territory and within a group of other individuals, so they have a higher mortality risk, as described in Eq. 3.3 (negative values in the exponential part of the equation increase survival). The cost of dispersal is the same as that borne by individuals that remain as floaters at a given time step.

      (26) L 164: "generation" -> time step.

      We changed this to “breeding cycle”.  

      (27) L 170: "Our results show that division of labor initially emerges because of direct fitness benefits..." This is a general statement, but the results are only particular to the model. So this statement and others in the manuscript should be particular to the model. Also, Figure 2 doesn't say anything about what evolves "initially" as it only plots evolutionary equilibria.

      We rephrased this statement to “Our results suggest that voluntary division of labor involving tasks with different fitness costs is more likely to emerge initially because of direct fitness benefits”, to more accurately represent the conditions under which we modeled the division of labor.  

      Our reference to “initially” is regarding group formation (family groups versus aggregations of unrelated individuals or a mix). This is shown in the comparison between the different graphs at equilibrium. The initial state of the simulation is that all individuals disperse and do not cooperate.  

      (28) L 171: "but a combination of direct and indirect fitness benefits leads to higher rates and more stable forms of division of labor". What do you mean by "higher rates and more stable forms of division of labor"? Say how division of labor is shown in the figure (with intermediate T?).

      Yes, intermediate values of T show division of labor if γR ≠ 0. This is described under the section “The role of dominance in task specialization”. We added “with intermediate values suggesting a division of labor” to the Figure 2 legend.  

      (29) L173-175: "as depicted in Figure 2, intermediate values of task specialization indicate in all cases age/dominance-mediated task specialization (γt ≠ 0; Table 1) and never a lack of specialization (γt = 0; Table 1)". This sentence is unclear and imprecise. Does this sentence want to say that in Figure 2, all plots with intermediate values of T involve gamma t different from zero? If so, just say that.

      Rephrased to: “In Figure 2, all plots depicting intermediate values of T exhibit non-zero γR values and, hence, division of labor”.

      (30) L179-180: "forms of help that impact survival never evolve under any environmental condition when only kin selection occurs". This is misleading because under the KS scenario, help cannot positively impact survival in this model, so they never evolve.

      Help cannot affect survival but could potentially affect group persistence. If helpers increase breeder productivity and offspring remain philopatric and queue for the breeding position, then they will receive help from related individuals.   

      (31) L 210: "initially". What do you mean by that?

      In our model, help only evolves in family groups, which may then open the door for the evolution of help in mixed-kin groups. Therefore, we use “initially” to refer to the ancestral group structure that likely led to cooperation under benign environmental conditions. We rephrased this section to “in more benign (and often highly productive) environments that lead to habitat saturation, help likely evolved initially in family groups, and defensive tasks are favored because competition for the breeding position is lower under kin selection.”

      (32) L 212: "kin selection is achieved". What does that mean?

      Rephrased to “kin selection acts not only by selecting subordinates in their natal group to increase the productivity of a related breeder […]”

      (33) L 216: "division of labor seems to be more likely to evolve in increasingly harsh environments". Say in parentheses where this is shown.

      Added.  

      (34) L 218: "help evolves in benign environments". I don't see where this is shown. Figure 2 doesn't show that H is higher with lower m (e.g., in KS+GA column).

      Help does not evolve in benign environments under only direct fitness benefits derived from group augmentation (shown in Figure 2).  

      (35) L 225: "y-axis" should be "vertical axis", as y has another meaning in the model.

      Done.

      (36) L 226: "likelihood". Here and throughout, "likelihood" should be changed to probability. Likelihood means something else.

      Thank you for the advice; we have corrected this throughout the manuscript.

      (37) L 236: "the slope of the reaction norm for the dominance value in task specialization". Unclear. Clearer to say: the rate at which individuals shift from defense to work as they age.

      The important part is not so much the rate but the direction, that is, from work task to defense (or vice versa) as their rank increases. Changed to “the direction and rate of change in task specialization with dominance”.

      (38) L 257: "(task = 0; cost to dominance value)," This seems out of place.

      This aims to clarify that work tasks have a cost to dominance, while defense tasks have a cost to survival. This is particularly relevant in this model since different helping tasks are defined by their fitness costs.

      (39) L 258: "increase"-> "increase with age".

      Added “with dominance”.

      (40) L 262: "division of labor equilibria" What is that?

      Changed to “at equilibrium when division of labor evolves”

      (41) L 268: "Our findings suggest that direct benefits of group living play a driving role in the evolution of division of labor via task specialization in species with totipotent workers". This is a very general statement, but the results are much more circumscribed. First, the model is quite specific by assuming that, in the absence of group augmentation (xn=0), indirect fitness benefits can only be given to breeders (Equation 5) but not to other subordinates (Equations 2, 3.1). This is unrealistic, particularly for vertebrates, and reduces the possibility that indirect fitness benefits play a role.  

      As previously discussed, the scope of this paper was to study division of labor in cooperatively breeding species with fertile workers in which help is exclusively directed towards breeders to enhance offspring production through alloparental care. Other forms of “general” help do not result in task partitioning to enhance productivity.

      Second, the difference in costs of work and defense are what drive the evolution of "division of labor" (understood as intermediate T in case this is what the authors mean) in the KS scenario, but the functional forms of those two costs are quite specific and not of the same form, so these functions may bias the results found. Specifically, R is an unbounded linear function of work and the effect of this function becomes weaker as the individual ages due to the weakening force of selection with age (Equation 2) whereas Sh is a particular bounded nonlinear function of defense (Equation 3.1). These differences may tend to make the effect of Sh stronger due to the particular functions chosen.  

      The difference in costs is inherent to the nature of the different tasks (work versus defense): while survival is naturally bounded, with death as the lower bound, dominance costs are potentially unbounded, as they are influenced by dynamic social contexts and potential competitors. Therefore, we believe that the model’s cost structure is not too different from that in nature.  

      Third, no parameter sweep is given to see to what extent these results hold across the many parameters involved. So, in summary, the discussion should at least reflect that the results are of a restricted nature rather than giving the impression that they are of the suggested level of generality.

      During the exploratory phase of the model development, various parameters and values were assessed. However, to keep the manuscript clear and concise, it only details the ranges of values and parameters where changes in the behaviors of interest were observed. For instance, variation in yh (the cost of help on dominance when performing “work tasks”) led to behavioral changes similar to those caused by changes in xh (the cost of help in survival when performing “defensive tasks”), as both are proportional to each other. Specifically, since an increase in defense costs raises the proportion of work relative to defense tasks, while an increase in the costs of work tasks has the opposite effect, only results for the variation of xh were included in the manuscript to avoid redundancy. Added to Table 1: “To maintain conciseness, further exploration of the parameter landscape was not included in the manuscript”.

      (42) L 270: "in eusocial insects often characterized by high relatedness and reproductive inhibition, sterile workers acquire fitness benefits only indirectly". This is misleading. Sterile workers of any taxa, be it insects or vertebrates, can only acquire fitness benefits indirectly as they are sterile, but eusocial insects involve not only sterile workers.

      Rephrased to “In contrast, in eusocial species characterized by high relatedness and permanent worker sterility, such as most eusocial insects, workers acquire fitness benefits only indirectly”. In any case, permanent sterility only occurs in eusocial invertebrates; in vertebrates with reproductive inhibition, sterility is only temporary and context dependent. Therefore, in vertebrates, sterile workers may potentially obtain direct fitness benefits if the social context changes, as is the case in naked mole-rats.

      (43) L 273: "Group members in eusocial species are therefore predicted to maximize colony fitness due to the associated lower within-group conflict". Again, this is incorrect. Primitively eusocial insects have high conflict.

      We added “Group members in such eusocial species” to clarify that we are not referring here to primitively eusocial species but those with permanent sterile workers.  

      (44) L 277: "when the benefits of cooperation are evenly distributed among group members". In this model, the benefits of cooperation are not evenly distributed among group members: breeders reproduce, but subordinates don't.

      Subordinates may reproduce if they become breeders later in life. However, subordinates also benefit from cooperation as subordinates directly (greater survival in larger groups), and indirectly if they are related to the breeder. Here we refer to the first one, and we expand on that in the following sentence.  

      (45) L 280: "survival fitness benefits derived from living in larger groups seem to be key for the evolution of cooperative behavior in vertebrates [22, 63], and may also translate into low within-group conflict. This suggests that selection for division of labor in vertebrates is stronger in smaller groups". I don't see how the previous sentence suggests this. The paper does not present results to support this statement (i.e., no selection gradients in smaller vs larger groups are shown).

      The benefits of living in a larger group entail diminishing returns, so individuals in smaller groups benefit more from an increase in productivity and group size than those in larger groups.
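
The diminishing-returns argument can be made concrete with a small numerical sketch. The saturating function below is purely illustrative and is not the survival function used in the model (Eq. 3); its form and the `half_saturation` parameter are assumptions chosen only to show that, when benefits saturate with group size, adding one member matters more in a small group than in a large one.

```python
def group_benefit(group_size: int, half_saturation: float = 5.0) -> float:
    """Hypothetical saturating benefit of group size, bounded between 0 and 1."""
    if group_size < 0:
        raise ValueError("group_size must be non-negative")
    return group_size / (group_size + half_saturation)


def marginal_gain(group_size: int, half_saturation: float = 5.0) -> float:
    """Benefit gained by adding one more member to a group of the given size."""
    return (group_benefit(group_size + 1, half_saturation)
            - group_benefit(group_size, half_saturation))


if __name__ == "__main__":
    # The gain from one extra group member shrinks as the group grows.
    for size in (2, 5, 10, 20):
        print(size, round(marginal_gain(size), 4))
```

Any concave, saturating benefit function would give the same qualitative pattern; whether the pattern translates into stronger selection for division of labor in small groups still depends on the selection gradients the reviewer asks about.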

      (46) L 284: "Our model demonstrates that vertebrates evolve a more stable division of labor". Where is that shown? How is "more stable" measured?

      Rephrased to “vertebrates are more likely to evolve division of labor”. This is shown in Figure 2, in which division of labor evolves under a wider range of environmental conditions and to a higher degree (intermediate values of T).

      (47) L 287: "direct fitness benefits in the form of group augmentation select more strongly for defensive tasks". Where is that shown? Establishing this would entail comparing selection gradients with direct fitness benefits of group augmentation and without them.

      In Figure 2, when we compare the GA column to the KS+GA column, we see that at equilibrium, more helpers choose defense tasks, especially when they are free to choose their preferred task (circles).

      (48) L 288: "kin selection alone seems to select only for work tasks." Again, this may be an artifact of the model assuming that helpers cannot increase non-breeders' fitness components except via group augmentation, and that defense tasks are inherently more costly than work tasks.

      As stated previously, we are studying task specialization in cooperative breeders where help is in the form of alloparental care (from allofeeding and egg care to defense from predators). We also assume that the costs are different, but whether one or the other is more costly depends on the relative context (e.g., a task can be more costly if it affects competitiveness in a very competitive environment). It is important to note that we name these tasks “work” and “defense” for practical reasons, but the focus of the paper is on tasks with different fitness costs that, given their characteristics, may not fit so well under this terminology. While we acknowledge that most tasks have both kinds of fitness costs to a degree, here we focus on the main fitness costs of each kind of task (L430-436).

      (49) L 290: "are comparatively large". This sounds as if the tasks are large, which is presumably not what is meant.

      Rephrased to “costs to dominance value and to the probability of attaining a breeding position are comparatively larger than survival costs.”

      (50) L 298: "helpers are predicted to increase defensive tasks with age or rank, whereas in harsh environments, work tasks are predicted to increase with age or rank." Add parentheses referring to where this is shown.

      This is shown in Figure 3, but since this is described in the discussion, we did not add a reference to the figure. If the editor would like us to refer to figures here, we can (see also comments below relating to the same issue).

      (51) L 308: "the role of age and environmental harshness on the evolution of division of labor". What is the prediction? Simply, the role of age is an assumption, not a prediction.

      Rephrased to “the role of environmental harshness on the evolution of division of labor via age-dependent task specialization”.

      (52) L 315: "individuals shifting from work tasks such as foraging for food, digging, and maintaining the burrow system, to defensive tasks such as guarding and patrolling as individuals grow older and larger". Say in parentheses where this is predicted.

      This prediction comes from Figure 3, we do not reference it here since we are in the Discussion section.  

      (53) L 320: "Under these conditions, our model predicts the highest levels of task partitioning and division of labor." Where is this predicted? Add parentheses referring to where this is shown. As it is, it is not possible to check the validity of the statement.

      This prediction comes from Figure 2 column KS+GA, we do not reference it here since we are in the Discussion section. The results with references to the figures are found under the Results section. In the discussion, we reiterate the results already described and add some examples from real data that seem to confirm our predictions.  

      (54) L 322: "In line with our model predictions, larger and older helpers of this species invest relatively more in territory maintenance, whereas younger/smaller helpers defend the breeding shelter of the dominant pair to a greater extent against experimentally exposed egg predators". These predictions are neat, but are now very difficult to understand from the figures. Maybe at the bottom of 3A, you could add a diagram work->defense for negative gamma_t and defense>work for positive gamma_t (or whatever order it is).

      Done.

      (55) L 325: "Territory maintenance has been shown to greatly affect routine metabolic rates and, hence, growth rates [80], which directly translates into a decrease in the likelihood of becoming dominant and attaining breeding status, as predicted by our model." This seems to be an assumption, not a prediction.

      That is true. We removed: “as predicted by our model”.  

      (56) L 352: "controlled". This means something else.

      Changed to “addressed”.

      (57) L 356: "summary, our study represents the first theoretical model aimed at elucidating the potential mechanisms underlying division of labor between temporal non-reproductives via task specialization in taxa beyond eusocial organisms". Again, claiming to be the first is risky and unnecessary.

      Rephrased to “our study helps to elucidate”.

      (58) L 358: "Harsh environments, where individuals can obtain direct fitness benefits from group living, favor division of labor, thereby enhancing group productivity and, consequently, group size." I'm not sure about this conclusion as harsh environments (large m in Figure 2) also involve the evolution of no division of labor (from the triangles and circles that are zero in the right bottom panel) and perhaps more so than with less harsh environments (intermediate m). Incidentally, in the bottom right panel of Figure 2, do the two separate clusters of triangles and circles mean that there is some sort of evolutionary branching?

      Yes, there are two different equilibria for the same set of conditions. Although it is true that for m=0.3 less division of labor evolves when kin selection and group augmentation act together, this is not the case when only group augmentation takes place. In addition, we classify m=0.2 as harsh, as opposed to the benign environment (m=0.1) in which we observe the rise of habitat saturation. m=0.3 is then an extremely harsh environment, in which several parameter combinations cause population collapse (see figures in the Supplemental Material).

      (59) L 360: "Variation in the relative fitness costs of different helping tasks with age favors temporal polyethism". I don't see that this has been shown. Temporal polyethism evolves here whenever gamma_t evolves non-zero values. Figure 3A shows that non-zero gamma_t evolves with harsher environments, but I don't see what the "variation in relative fitness costs of different helping tasks" refers to.

      The evolved reaction norms of the model are towards different fitness costs depending on the task performed, since this is how we define the different types of tasks in the model.  

      (60) L 382: "undefined". Say variable. Undefined is something else.

      Undefined is more accurate, since we did not define how many subordinates there were per group, while “variable” could have been defined within a range, which was not the case in this model.  

      (61) L 390: "each genetic locus". Say earlier that each genetic trait is controlled by a single locus.

      Added.  

      (62) L 395: "complete" and "consistent" -> "certain".

      We changed one to “certain” and another to “absolute” to avoid using the same adjective twice in a sentence.  

      (63) L 396: What determines whether dispersers become subordinates or floaters? A trait? Or a fixed probability?

      We added “which is also controlled by the same genetic dispersal predisposition as for subordinates”.

      (64) L 412-413: "cycle". This should be a breeding step.

      Changed to “season” instead.

      (65) L 418: Say negatively impacts (it could also be positively impacts, which I guess is not what you mean).

      Done.

      (66) L 425: "a sample of floaters". Chosen how?

      Added “randomly drawn”.

      (67) L 426-428. But the equation in Table 1 indicates that all floaters compete for breeding spots, not a sample of floaters. This is not clear.

      The number of floaters sampled to try to breed at a given group is N<sub>f,b</sub> = 𝑓∗𝑁<sub>𝑓</sub>/𝑁<sub>𝑏</sub> (Table 1).

      Therefore, N<sub>f,b</sub> is the sample size of floaters for a given open breeding position, and f is how many groups on average a floater attempts to access in each time step.  
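
      The sampling rule in the response to comment 67 can be sketched as follows. This is only an illustration: it assumes floaters are drawn uniformly at random without replacement and that the expected sample size is rounded to an integer and capped at the pool size. The function and variable names are hypothetical and are not taken from the authors' model code.

```python
import random


def floater_sample_size(f, n_floaters, n_breeders):
    """Expected number of floaters bidding for one open breeding position.

    Implements N_fb = f * N_f / N_b as stated in the response (Table 1).
    """
    if n_breeders <= 0:
        raise ValueError("n_breeders must be positive")
    return f * n_floaters / n_breeders


def sample_floaters(floaters, f, n_breeders, rng=None):
    """Randomly draw the floaters that compete for one open position.

    Rounding the sample size and capping it at the pool size are
    assumptions of this sketch, not details confirmed by the source.
    """
    rng = rng or random.Random()
    size = round(floater_sample_size(f, len(floaters), n_breeders))
    size = min(size, len(floaters))
    return rng.sample(floaters, size)


# Hypothetical numbers: 100 floaters, 50 breeding positions, and each
# floater attempting to access f = 2 groups per time step on average.
floaters = list(range(100))
contenders = sample_floaters(floaters, f=2, n_breeders=50, rng=random.Random(1))
```

      With these hypothetical numbers, each open position is contested by a random sample of four floaters rather than by the whole floater pool.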

      (68) L 432. In the figure, the breeding cycle is called a step, but here it is called a cycle. There should be a single term used throughout. Breeding is not really a cycle here (it doesn't involve multiple steps that are repeated cyclically), so it seems more appropriate to call this breeding steps or breeding seasons.

      Taking into account previous comments, we changed the terms “generation” and “life cycle” to “breeding cycle”. We added “or seasons”.

      (69) L 439: "generations". What are generations here, as generations are overlapping? You probably mean time steps or something else.

      Changed to “breeding cycles”.

      (70) L 439: "equilibrium was reached". Presumably, equilibrium is reached only asymptotically, so some cutoff is implemented in practice. So maybe say explicitly what cutoff was implemented.

      As mentioned, we ran the model for 200’000 time steps, and if equilibrium was not reached for the phenotypic values, we ran the model for longer, with 400’000 time steps being the maximum at which all simulations reached equilibrium. In some cases, genetic values did not reach equilibrium at ranges at which there was no impact on phenotypic values, so these were disregarded when assessing whether equilibrium was reached.

      (71) L 452: "Even though individuals are likely to change the total amount of help given throughout their lives". Do you mean in real organisms or in the model? Say which. If it is in the model, it is not clear how.

      We added “in nature” to clarify that this was not the case in the model.  

      (72) L 455: "For more details on how individuals may adapt their level of help with age and social and environmental conditions, see [63]." Do you mean real individuals or in the model? Again, if it is in the model, it is unclear how this is possible and should be explained in this paper at least briefly rather than citing another one.

      We rephrased it to “How individuals in the model may adapt their level of help with age and social and environmental conditions has been described elsewhere.” We do not go into detail here because it is not within the scope of the paper, and those results have been described elsewhere.  

      (73) L 475: "helpers". Make terminology consistent throughout.

      All helpers are subordinates, but not all subordinates are helpers, as they may evolve no help. Since here we are describing those subordinates that do help, we use that terminology. We added “subordinate helpers” to clarify this further.  

      (74) L 476: "proportional". The dependence in Equation 1 is not "proportional to". Say something like "a survival probability (not rate) that decreases with the amount of help provided".

      Done.

      (75) L 482: "environmental"-> baseline, as defined first.

      Done.

      (76) L 486: "benefits". Can you briefly say in parentheses what those benefits are in real organisms? As in line 475, where you reminded the reader of survival costs due to predator defense.

      Added “such as those offered by safety in numbers or increased resource defense potential”.

      (77) L 494. "we first outline a basic model in which individuals". It is not clear what this sentence says, and the remainder of this section does not clarify it.

      We made two models for comparison, one where individuals can choose freely which task they prefer to perform, and another in which there is an increase in productivity when both kinds of tasks are performed to a similar extent at group level. In the latter model, individuals may choose an unpreferred task at certain times during their lives to increase the effect of the help provided on the breeder’s (and group’s) productivity.

      We rephrased this section to “we first outline a basic model where individuals evolve their preferred helping task. Then we compare this to another model in which the breeder’s reproductive outcome is maximized when the group’s helping effort in each kind of tasks is performed to a roughly equal degree.”

      (78) L 496: "by performing both tasks". Sounds as if the breeder performs both tasks, not helpers.

      We changed to “when the group’s helping effort in each kind of tasks”.

      (79) L 497: "the maximum amount of cumulative help of each type (sigma Hmax) that can affect fecundity is given by Eq. 4:" This statement is imprecise. Presumably, what is meant is that this level of help maximises breeder productivity, as stated earlier in the paper. However, there is no proof that this level of help maximises breeder productivity, so this expression seems unjustified and it is unclear how it is used.

      This is a description of the model setup. As described later in the same section, the cumulative help of each type that can influence the breeder’s fecundity is at most Hmax. Therefore, it does represent the maximum amount of cumulative help of each type that can affect the breeder’s fecundity.

      (80) L 500: "reproduced" -> "reproduce".

      Done.  

      (81) L 503. Say here what K is so that the reader knows what equation 5 is showing.

      Added “K” to the “The quantity of offspring produced (K)”.

      (82) L 503: "diminishing returns" -> "diminishing returns as help increases".

      Done.  

      (83) L 507: Why these inequalities?

      These inequalities explain the use of Hmax (response to comment 79). We rephrased it to “the cumulative defense effort is larger than or the cumulative work effort is larger than ”.

      (84) L 526: "removing the influence of relatedness from the model". It would be helpful to plot relatedness in this and the other scenario to check that it is indeed low here and high in the other.

      The actual values of relatedness are provided in the Supplemental Material Table S1. We added this reference to Figure 2.  

      (85) L 528: "It is possible that direct and indirect fitness benefits could have an additive effect on the evolution of alloparental care". This is technically incorrect. It is also unclear what the point of this sentence is.

      We have removed this sentence.  

      (86) Table 1: Say what are the allowed values for these genotypic traits (can they take negative values, be greater than one, are they continuous or discrete?): e.g., alpha \in [0,1] or alpha \in (-infinity, infinity). For phenotypic traits, it would be helpful if the third column lists the equation where the trait is defined. As the variables in the first column are scalars, they should not be bold face. Survival "rate" should be survival "probability" throughout.

      All genetic traits can take any real number (-infinity, infinity), but the phenotypic values are either constrained by the equation like for logistic formulas, or manually constrained like for dispersal propensity or help (only positive numbers allowed). We added “Each genetic trait is controlled by a single locus, and may take any real number” (L403), and added the boundaries for help and dominance value in Table 1. We decided against including the equations in the table due to space constraints. We removed the bold face as suggested. We changed all instances of “survival rate” to “survival probability”.

      (87) Figures S1, S2: I don't recall seeing references to these figures in the main text, but there should be, as well as for Tables S1-S3.

      Table S1 is now referenced in Figure 2. The other figures are now referenced in the main text when we reference the different sections in the Supplemental Materials (L190 and L198). Other Tables are referenced in their respective Figures in the SI.

    1. eLife Assessment

      This study introduces a novel and potentially valuable metric, phenological lag, to quantify the gap between observed and expected phenological shifts under climate warming. While the dataset is extensive and the framework is clearly defined, key assumptions (e.g., base temperature, linear forcing response) are not empirically tested, and the analysis underexplores key spatial and climatic gradients. The strength of evidence is mostly solid but would benefit from further validation and deeper analysis.

    2. Reviewer #1 (Public review):

      Jiang et al. present a measure of phenological lag by quantifying the effects of abiotic constraints on the differences between observed and expected phenological changes, using a combination of previously published phenology change data for 980 species, and associated climate data for study sites. They found that, across all samples, observed phenological responses to climate warming were smaller than expected responses for both leafing and flowering spring events. They also show that data from experimental studies included in their analysis exhibited increased phenological lag compared to observational studies, possibly as a result of reduced sensitivity to climatic changes. Furthermore, the authors present evidence that spatial trends in phenological responses to warming may differ from what would be expected from phenological sensitivity, due to the seasonal timing of when warming occurs. Thus, climate change may not result in geographic convergences of phenological responses. This study presents an interesting way to separate the individual effects of climate change and other abiotic changes on the phenological responses across sites and species.

      Strengths:

      A straightforward mathematical definition of phenological lag allows for this method to potentially be applied in different geographic contexts. Where data exists, other researchers can partition the effects of various abiotic forcings on phenological responses that differ from those expected from warming sensitivity alone.

      Identifying phenological lag, and associated contributing factors, provides a method by which more nuanced predictions of phenological responses to climate change can be made. Thus, this study could improve ecological forecasting models.

      Weaknesses:

      The analysis here could be more robust. A more thorough examination of phenological lag would provide stronger evidence that the framework presented has utility. The differences in phenological lag by study approach, species origin, region, and growth form are interesting, and could be expanded. For example, the authors have the data to explore the relationships between phenological lag and the quantitative variables included in the final model (altitude, latitude, mean annual temperature) and other spatial or temporal variables. This would also provide stronger evidence for the authors' claims about potential mechanisms that contribute to phenological lag.

      The authors include very few data visualizations, and instead report results and model statistics in tables. This is difficult to interpret and may obscure underlying patterns in the data. Including visual representations of variable distributions and between-variable relationships, in addition to model statistics, provides stronger evidence than model statistics alone.

    3. Reviewer #3 (Public review):

      Summary:

      The authors developed a new phenological lag metric and applied this analytical framework to a global dataset to synthesize shifts in spring phenology and assess how abiotic constraints influence spring phenology.

      Strengths:

      The dataset developed in this study is extensive, and the phenological lag metric is valuable.

      Weaknesses:

      The stability of the method used in this study needs improvement, particularly in the calculation of forcing requirements. In addition, the visualization of the results (such as Table 1) should be enhanced.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Jiang et al. present a measure of phenological lag by quantifying the effects of abiotic constraints on the differences between observed and expected phenological changes, using a combination of previously published phenology change data for 980 species, and associated climate data for study sites. They found that, across all samples, observed phenological responses to climate warming were smaller than expected responses for both leafing and flowering spring events. They also show that data from experimental studies included in their analysis exhibited increased phenological lag compared to observational studies, possibly as a result of reduced sensitivity to climatic changes. Furthermore, the authors present compelling evidence that spatial trends in phenological responses to warming may differ from what would be expected from phenological sensitivity, due to the seasonal timing of when warming occurs. Thus, climate change may not result in geographic convergences of phenological responses. This study presents an interesting way to separate the individual effects of climate change and other abiotic changes on the phenological responses across sites and species.

      Greater phenological lag with experimental studies results in reduced sensitivity to climatic changes, not the other way around.

      Strengths:

      A clearly defined and straightforward mathematical definition of phenological lag allows for this method to be applied in different scientific contexts. Where data exists, other researchers can partition the effects of various abiotic forcings on phenological responses that differ from those expected from warming sensitivity alone.

      Sensitivity does not tell the magnitude of phenological changes, nor does it provide indications of mechanisms responsible for changes in spring phenology. Because of uneven warming, the same average temperature change (annual or spring temperatures) can have greater (greater warming prior to budburst) or smaller (smaller warming prior to budburst) phenological change than that with even warming. When average temperature change is close to zero, uneven warming can lead to infinite sensitivity values, either advanced (warmer temperatures prior to budburst) or delayed (cooler temperatures prior to budburst) spring phenology.

      It is not clear why sensitivity is so popularly used in phenological research.
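
      The point about unbounded sensitivity values can be illustrated with a short numerical sketch. It uses the definition given later in this response (observed response divided by average temperature change). The numbers and the function name are hypothetical: they stand for a case in which warming before budburst advances phenology by a fixed number of days while cooler temperatures later in the averaging period pull the average temperature change towards zero.

```python
def temperature_sensitivity(observed_response_days, mean_temp_change_c):
    """Phenological change in days per degree Celsius of average temperature change."""
    return observed_response_days / mean_temp_change_c


# Hypothetical case: the same 4-day advance (negative = earlier) is paired
# with progressively smaller average temperature changes.
advance_days = -4.0
for delta_t in (2.0, 0.5, 0.1, 0.01):
    sensitivity = temperature_sensitivity(advance_days, delta_t)
    print(f"mean change {delta_t:>5} C -> sensitivity {sensitivity:8.1f} days/C")
```

      The observed response is identical in every row, yet the sensitivity grows without bound as the average temperature change approaches zero, which is why the ratio alone says little about the magnitude or the drivers of the phenological change.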

      Identifying phenological lag and associated contributing factors provides a method by which more nuanced predictions of phenological responses to climate change can be made. Thus, this study could improve ecological forecasting models.

      Weaknesses:

      The authors include very few data visualizations, and instead report results and model statistics in tables. This is difficult to interpret and may obscure underlying patterns in the data. Including visual representations of variable distributions and between-variable relationships, in addition to model statistics, provides stronger evidence than model statistics alone.

      The use of stepwise, automated regression may be less suitable than a hypothesis-driven approach to model selection, combined with expanded data visualization. The use of stepwise regression may produce inappropriate models based on factors of the sample data that may preclude or require different variable selection.

      We used two statistical methods, variance analysis to examine differential phenological responses (Figure 2) and regression analysis to determine the relative importance of forcing change, budburst temperature, and physiological lag, the drivers of changes in spring phenology (Table 2). Our objective was to understand why plants show differential responses by research approach, species origin, climatic region, and growth form identified in previous research. Variable selection may affect minor (altitude, latitude, MAT, and average spring temperature change) or insignificant (photoperiod and long-term precipitation) variables, but not those related to drivers of spring phenology. We are not sure how hypothesis-driven approach can help with our objective.

      Reviewer #2 (Public review):

      Summary:

      This is a meta-analysis of the relative contributions of spring forcing temperature, winter chilling, photoperiod and environmental variables in explaining plant flowering and leafing phenology. The authors develop a new summary variable called phenology lag to describe why species might have different responses than predicted by spring temperature.

      Strengths:

      The summary statistic is used to make a variety of comparisons, such as between observational studies and experimental studies.

      Weaknesses:

      By combining winter chilling effects, photoperiod effects, and environmental stresses that might affect phenology, the authors create a new variable that is hard to interpret. The authors do not provide information in the abstract about new insights that this variable provides.

      Phenological lag contains effects of all constraints that may include chilling effects, photoperiod effects, and environmental stresses and is, indeed, hard to interpret without investigation of individual constraints. In our synthesis, spring phenology (or photoperiod effect) is not significant across all studies compiled. It is also unlikely that lack of winter chilling causes the systemic differences in phenological lag between observational and experimental studies or between native and exotic species (see discussion at lines 335-339). At individual study level, the contribution of different constraints to the overall lag effect can be specifically determined if moisture stresses, species chilling and photoperiod effects, or cold hardiness are known from on-site monitoring or previous research.

      The meaning of phenological lag is described at lines 34-38 in the abstract.

      Comments:

      It would be useful to have a map showing the sites of the studies.

      A map showing the sites of the studies was added as supplementary Figure S1.

      The authors should provide a section in which the strengths and weaknesses of the approach are discussed. Is it possible that mixing different types of data, studies, sample sizes, number of years, experimental set-ups, and growth habits results in artifacts that influence the results?

      Both strengths and weaknesses are discussed at various places throughout the paper. The weakness of our method, as indicated by the reviewer, is the inclusion of different constraints in the phenological lag, and it has been described at lines 34-38 in the abstract and lines 80-86 in the introduction of the concept. We have also expanded the Conclusion section to discuss possible caveats at lines 369-393.

      As in all data analyses, the results can change with addition of more/different data, especially when sample size is relatively small. Ideally, comparisons are made among levels of fixed effects while controlling variations of other conditions. In phenological studies, however, climatic, phenological, and biological conditions all vary. For example, observational and experimental studies differ not only in the nature of warming (natural climate change vs artificial warming), but also in levels of warming (greater warming with experimental studies) and climatic, phenological, and biological conditions (Table 1). All phenological syntheses (or meta-analyses) have to make do with this uncontrolled nature of phenological data.

      Now that the authors have created this new variable, phenological lag, which of the components that contribute to it has the most influence on it? Or which components are most influential in which circumstances? For example, what are some examples where photoperiod causes a phenological lag?

      Any of the phenological constraints identified can contribute alone or in combination with others to the overall effect of phenological lag. Across all studies with this synthesis, the lack of significance with spring phenology rules out photoperiod effect, while the association of longer phenological lags with longer accumulation of winter chilling does not suggest general chilling shortage with the current extent of climate change.

      Although spring phenology is not significant across all studies, photoperiod effect can be influential in individual studies where changes in spring phenology are large. However, reported photoperiod effects in the literature are mostly confounded with temperature, i.e., longer photoperiods are associated with longer hours of high daytime temperatures (see Chu et al., 2021). Other than European beech under an unlikely scenario of climate change (growth resumes at beginning of winter), there has been no clear evidence showing the effect of photoperiod in constraining spring phenology.

      Another confounding effect with photoperiod is extra heating effect with artificial light sources in warming experiments. Some early studies have shown that leaf temperature can be several degrees above the ambient air, due to long-wave radiation with artificial light sources. It is hard to believe the constraining effect of photoperiod on spring phenology if phenological changes are within inter-annual variations (can be a few weeks), although photoperiod effect has been increasingly discussed recently.

      Recommendations for the authors:

      Reviewing Editor:

      A key methodological concern is the inconsistent definition of growth temperature across observations. It is calculated over the interval between the baseline phenological date and the expected date under warming - a window that varies by species, site, and treatment. This variability limits comparability across observations and may introduce circularity, as growth temperature is derived from the same modelled expectation (i.e., the expected phenological advance) that it is later used to explain.

      The term “growth temperature” has been replaced with “budburst temperature” to indicate temperatures at species events. Budburst temperature is the average temperature within the window of expected response with the warmer climate and, as indicated by the editor, varies by species, sites, and treatments. This species-specific temperature provides an opportunity to compare among species, sites, and treatments and helps explain differences in observed responses, as demonstrated in the discussion of results in this synthesis.

      Forcing change, budburst temperature, and expected response are related. High budburst temperatures are associated with smaller expected responses, which helps explain smaller observed responses with late season species and areas of warm climates that have been often attributed to chilling or photoperiod effect.

      Additionally, the use of degree days above 0 °C as a universal metric for spring forcing oversimplifies species' temperature responses. This approach assumes not only a fixed base temperature but also a linear response to temperature accumulation, which overlooks well-established nonlinear or species-specific thermal response curves. To improve the robustness and interpretability of the phenological lag framework, we encourage the authors to consider these limitations and explore ways to test or justify these modelling assumptions more explicitly.

      The use of a 0 °C base temperature may not be the best choice for some species. Except for some early work, there has been little experimental research on the physiological aspects of chilling and forcing processes. A popular alternative is modelling using assumed temperature response models. As variables influencing chilling and forcing processes are not controlled, the determined base temperatures and temperature response models may be adequate for the species studied under particular conditions but would be inappropriate for applications beyond them. It is hard to believe that the species in a study all have different base temperatures for the accumulation of spring forcing and different optimum temperatures for winter chilling. Apparently, this is the result of model fitting, not the actual dynamics of chilling and forcing processes.

      Two base temperatures are commonly used, 0 and 5 °C, although the choice is not generally justified. It has long been known that temperatures above 0 °C contribute to spring forcing. My personal experience at a tree nursery suggests that seedlings will flush after winter cold storage, even at forcing temperatures ≤ 5 °C in the dark. The use of 5 °C is a choice of tradition (5 °C is commonly used to define the growing season) rather than of scientific justification. The use of high base temperatures may not make much difference at high temperatures due to short forcing duration but will underestimate forcing at low temperatures due to long forcing duration and the large proportion of forcing between 0 °C and the base temperature. We are not aware of any experimental studies that demonstrate non-zero base temperatures.
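
      The effect of the base temperature on accumulated forcing can be made concrete with a minimal sketch. It assumes the simple linear degree-day accumulation discussed in this response (daily mean temperature minus base temperature, floored at zero). The temperature series and the function name are hypothetical and are not taken from the study's data.

```python
def degree_days(daily_mean_temps_c, base_c=0.0):
    """Sum of daily mean temperatures above a base temperature (linear forcing)."""
    return sum(max(t - base_c, 0.0) for t in daily_mean_temps_c)


# Hypothetical forcing periods: a long cool period and a short warm period.
cool = [6.0] * 30   # 30 days at 6 C
warm = [20.0] * 10  # 10 days at 20 C

for label, temps in (("cool", cool), ("warm", warm)):
    dd0 = degree_days(temps, base_c=0.0)
    dd5 = degree_days(temps, base_c=5.0)
    print(f"{label}: base 0 C = {dd0:.0f}, base 5 C = {dd5:.0f}, ratio = {dd5 / dd0:.2f}")
```

      In this example the 5 °C base retains three quarters of the forcing counted with a 0 °C base during the short warm period, but only one sixth during the long cool period, which is the underestimation at low temperatures that the response describes.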

      Within the dominant range of spring temperatures (e.g., between 5 and 25 °C), the forcing responses to temperatures can be approximated with linear models. Again, we are not aware of any non-linear forcing models that can be safely applied beyond the species studied under particular conditions.

      Regardless, the uses of different base temperatures or forcing models would not affect the partitioning of phenological changes, simply because temperature response models reflect physiological aspects of chilling and forcing processes and would not change with climate warming.

      The authors introduce a new metric, phenological lag, to assess how phenological constraints influence spring phenology, offering new insights into phenological research. However, there are several concerns. First, the research question and the study's aim are not clearly presented. The authors primarily analyzed phenological lag and simply compared it across different groups, but additional analyses are needed to adequately address the research question. In addition, the broader importance of this study is not clearly explained - why this research is necessary and what it contributes to the field should be explicitly stated.

      The research question is outlined at lines 92-108. We added “Our objective was to determine how phenological responses differ among different groups and how differential responses are related to drivers of spring phenology, i.e., forcing change, budburst temperature, and phenological lag” at lines 106-108.

      (1) Abstract: The methodological improvements and more key results should be included.

      Growth temperature has been replaced with “budburst temperature” to indicate temperatures at time of budburst. More results are added at lines 40-48.

      (2) Line 32: Terms such as "sensitivity analysis" and "phenological lag" need clearer definitions.

      We added at lines 32-33 to define sensitivity analysis “that is based on rates of phenological changes, not on drivers of spring phenology”. Phenological lag is defined at lines 34-38.

      (3) Lines 38-47: Further results and the urgency or importance of the study should be conveyed.

      More results are added at lines 40-48. The importance of this study is described at lines 48-50.

      (4) Line 57-58: This sentence is unclear - please clarify.

      The sentence is modified to “difficult using sensitivity analysis that is based on rates of phenological changes, not on drivers of spring phenology".

      (5) Line 60: break "endodormancy".

      Breaking dormancy would mean endodormancy.

      (6) Line 67: What does "growth temperature" refer to?

      Growth temperature has been replaced with “budburst temperature” to indicate temperatures at time of budburst. It is calculated as the average temperature within the window of expected response with the warmer climate.

      (7) Lines 87-94: The specific purpose of the study is vague. Why is this method needed, and how will it serve future research?

      We have modified the paragraph at lines 92-108 to provide justification and objective of the study.

      (8) Lines 163-164: The rationale for exploring differences in observed responses and phenological lag needs to be better justified.

      We added explanations at lines 179-182 why observed responses and phenological lag were chosen in the analysis.

      (9) Lines 178-183: Tables and figures should be properly cited within the text.

      Table S3 was added at line 197.

      (10) Lines 195-198: Clarify whether variables were scaled before model analysis.

      We clarified at line 192 “variables were not standardized prior to regression analysis”.

      (11) Line 206-207: The observed response is presented as the number of advanced days, while temperature sensitivity refers to the response of spring phenology to temperature - these are different variables and should not be conflated.

      The two variables are related but show different aspects of phenological changes. Observed response divided by average temperature change gives temperature sensitivity. Observed response is the total change in number of days observed, while temperature sensitivity is the change in number of days per unit change in average temperature (°C). Sensitivity may reflect rates of phenological change with temperature (see responses to reviewer 1).

      (12) In the discussion section, the authors compared phenological responses among different groups separately. This section requires substantial improvement to more clearly answer the research question.

      These discussions are related to our objective “how phenological responses differ among different groups identified in previous research (i.e., research approach, species origin, climatic region, and growth form) and how these differential responses are related to drivers of spring phenology, i.e., forcing change, budburst temperature, and phenological lag”.

    1. eLife Assessment

      This paper presents a valuable software package, named "Virtual Brain Inference" (VBI), that enables faster and more efficient inference of parameters in dynamical system models of whole-brain activity, grounded in artificial neural networks for Bayesian statistical inference. The authors have provided convincing evidence, across several case studies, for the utility and validity of the methods using simulated data from several commonly used models, but more thorough benchmarking could be used to demonstrate the practical utility of the toolkit. This work will be of interest to computational neuroscientists interested in modelling large-scale brain dynamics.

    2. Reviewer #1 (Public review):

      This work provides a new Python toolkit for combining generative modeling of neural dynamics and inversion methods to infer likely model parameters that explain empirical neuroimaging data. The authors provided tests to show the toolkit's broad applicability, accuracy, and robustness; hence, it will be very useful for people interested in using computational approaches to better understand the brain.

      Strengths:

      The work's primary strength is the tool's integrative nature, which seamlessly combines forward modelling with backward inference. This is important as available tools in the literature can only do one and not the other, which limits their accessibility to neuroscientists with limited computational expertise. Another strength of the paper is the demonstration of how the tool can be applied to a broad range of computational models popularly used in the field to interrogate diverse neuroimaging data, ensuring that the methodology is not optimal to only one model. Moreover, through extensive in-silico testing, the work provided evidence that the tool can accurately infer ground-truth parameters even in the presence of noise, which is important to ensure results from future hypothesis testing are meaningful.

      Weaknesses

      The paper still lacks appropriate quantitative benchmarking relative to non-Bayesian-based inference tools, especially with respect to performance accuracy and computational complexity and efficiency. Without this benchmarking, it is difficult to fully comprehend the power of the software or its ability to be extended to contexts beyond large-scale computational brain modelling.

    3. Reviewer #2 (Public review):

      Summary:

      Whole-brain network modeling is a common type of dynamical systems-based method to create individualized models of brain activity incorporating subject-specific structural connectome inferred from diffusion imaging data. This type of model has often been used to infer biophysical parameters of the individual brain that cannot be directly measured using neuroimaging but may be relevant to specific cognitive functions or diseases. Here, Ziaeemehr et al introduce a new toolkit, named "Virtual Brain Inference" (VBI), offering a new computational approach for estimating these parameters using Bayesian inference powered by artificial neural networks. The basic idea is to use simulated data, given known parameters, to train artificial neural networks to solve the inverse problem, namely, to infer the posterior distribution over the parameter space given data-derived features. The authors have demonstrated the utility of the toolkit using simulated data from several commonly used whole-brain network models in case studies.

      Strength:

      - Model inversion is an important problem in whole-brain network modeling. The toolkit presents a significant methodological step up from common practices, with the potential to broadly impact how the community infers model parameters.
      - Notably, the method allows the estimation of the posterior distribution of parameters instead of a point estimation, which provides information about the uncertainty of the estimation, which is generally lacking in existing methods.
      - The case studies were able to demonstrate the detection of degeneracy in the parameters, which is important. Degeneracy is quite common in this type of model. If not handled mindfully, it may lead to spurious or stable parameter estimation. Thus, the toolkit can potentially be used to improve feature selection or to simply indicate the uncertainty.
      - In principle, the posterior distribution can be directly computed given new data without doing any additional simulation, which could improve the efficiency of parameter inference once the artificial neural network is well-trained.

      Weaknesses:

      - The z-scores used to measure prediction error are generally between 1-3, which seems quite large to me. It would give readers a better sense of the utility of the method if comparisons to simpler methods, such as k-nearest neighbor methods, are provided in terms of accuracy.
      - A lot of simulations are required to train the posterior estimator, which is computationally more expensive than existing approaches. Inferring from Figure S1, at the required orders of magnitude of the number of simulations, the simulation time could range from days to years, depending on the hardware. The payoff is that once the estimator is well-trained, the parameter inversion will be very fast given new data. However, it is not clear to me how often such use cases would be encountered. It would be very helpful if the authors could provide a few more concrete examples of using trained models for hypothesis testing, e.g., in various disease conditions.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This work provides a new Python toolkit for combining generative modeling of neural dynamics and inversion methods to infer likely model parameters that explain empirical neuroimaging data. The authors provided tests to show the toolkit's broad applicability and accuracy; hence, it will be very useful for people interested in using computational approaches to better understand the brain.

      Strengths:

      The work's primary strength is the tool's integrative nature, which seamlessly combines forward modelling with backward inference. This is important as available tools in the literature can only do one and not the other, which limits their accessibility to neuroscientists with limited computational expertise. Another strength of the paper is the demonstration of how the tool can be applied to a broad range of computational models popularly used in the field to interrogate diverse neuroimaging data, ensuring that the methodology is not optimal to only one model. Moreover, through extensive in-silico testing, the work provided evidence that the tool can accurately infer ground-truth parameters, which is important to ensure results from future hypothesis testing are meaningful.

      We are happy to hear the positive feedback on our effort to provide an open-source and widely accessible tool for both fast forward simulations and flexible model inversion, applicable across popular models of large-scale brain dynamics.

      Weaknesses:

      Although the tool itself is the main strength of the work, the paper lacked a thorough analysis of issues concerning robustness and benchmarking relative to existing tools.

      The first issue is the robustness to the choice of features to be included in the objective function. This choice significantly affects the training and changes the results, as the authors even acknowledged themselves multiple times (e.g., Page 17 last sentence of first paragraph or Page 19 first sentence of second paragraph). This brings the question of whether the accurate results found in the various demonstrations are due to the biased selection of features (possibly from priors on what worked in previous works). The robustness of the neural estimator and the inference method to noise was also not demonstrated. This is important as most neuroimaging measurements are inherently noisy to various degrees.

      The second issue is on benchmarking. Because the tool developed is, in principle, only a combination of existing tools specific to modeling or Bayesian inference, the work failed to provide a more compelling demonstration of its added value. This could have been demonstrated through appropriate benchmarking relative to existing methodologies, specifically in terms of accuracy and computational efficiency.

      We fully agree with the reviewer that the VBI estimation heavily depends on the choice of data features, and this is the core of the inference procedure, not its weakness. We have demonstrated different scenarios showing how the informativeness of features (commonly used in the literature) results in varying uncertainty quantification. For instance, using summary statistics of functional connectivity (FC) and functional connectivity dynamics (FCD) matrices to estimate the global coupling parameter leads to fast convergence; however, it is not sufficient to accurately estimate the whole-brain heterogeneous excitability parameter, which requires features such as statistical moments of time series. VBI provides a taxonomy of data features that users can employ to test their hypotheses. It is important to note that one major advantage of VBI is its ability to make estimates using a battery of data features, rather than relying on a limited set (such as only FC or FCD) as is often the case in the literature. In the revised version, we will elaborate further by presenting additional scenarios to demonstrate the robustness of the estimation. We will also evaluate the robustness of the neural density estimators to (dynamical/additive) noise.

      More importantly, relative to benchmarking, we would like to draw attention to a key point regarding existing tools and methods. The literature often uses optimization for fitting whole-brain network models, and its limitations for reliable causal hypothesis testing have been pointed out in the Introduction/Discussion. As also noted by the reviewer under strengths, and to the best of our knowledge, there are no existing tools other than VBI that can scale and generalize to operate across whole-brain models for Bayesian model inversion. Previously, we developed Hamiltonian Monte Carlo (HMC) sampling for the Epileptor model in epilepsy (Hashemi et al., 2020, Jha et al., 2022). This phenomenological model is very well-behaved in terms of numerical integration, gradient calculation, and dynamical system properties (Jirsa et al., 2014). However, this does not directly generalize to other models, particularly the Montbrió model for resting-state, which exhibits bistability with noise driving transitions between states. As shown in Baldy et al., 2024, even at the level of a single neural mass model (i.e., one brain region), gradient-based HMC failed to capture such switching behaviour, particularly when only one state variable (membrane potential) was observed while the other (firing rate) was missing. Our attempts to use other methods (e.g., the second-derivative-based Laplace approximation used in Dynamic Causal Modeling) also failed, due to divergence in gradient calculation. Nevertheless, reparameterization techniques (Baldy et al., 2024) and hybrid algorithms (Gabrié et al., 2022) could offer improvements, although this remains an open problem for these classes of computational models.

      In sum, for oscillatory systems, it has been shown previously that the SBI approach used in VBI substantially outperforms both gradient-based and gradient-free alternative methods (Gonçalves et al., 2020, Hashemi et al., 2023, Baldy et al., 2024). Importantly, for bistable systems with switching dynamics, gradient-based methods fail to converge, while gradient-free methods do not scale to the whole-brain level (Hashemi et al., 2020). Hence, the generalizability of VBI relies on the fact that neither the model nor the data features need to be differentiable. We will clarify this point in the revised version. Moreover, we will provide better explanations for some terms mentioned by the reviewer in Recommendations.

      Hashemi, M., Vattikonda, A. N., Sip, V., Guye, M., Bartolomei, F., Woodman, M. M., & Jirsa, V. K. (2020). The Bayesian Virtual Epileptic Patient: A probabilistic framework designed to infer the spatial map of epileptogenicity in a personalized large-scale brain model of epilepsy spread. NeuroImage, 217, 116839.

      Jha, J., Hashemi, M., Vattikonda, A. N., Wang, H., & Jirsa, V. (2022). Fully Bayesian estimation of virtual brain parameters with self-tuning Hamiltonian Monte Carlo. Machine Learning: Science and Technology, 3(3), 035016.

      Jirsa, V. K., Stacey, W. C., Quilichini, P. P., Ivanov, A. I., & Bernard, C. (2014). On the nature of seizure dynamics. Brain, 137(8), 2210-2230.

      Baldy, N., Breyton, M., Woodman, M. M., Jirsa, V. K., & Hashemi, M. (2024). Inference on the macroscopic dynamics of spiking neurons. Neural Computation, 36(10), 2030-2072.

      Baldy, N., Woodman, M., Jirsa, V., & Hashemi, M. (2024). Dynamic Causal Modeling in Probabilistic Programming Languages. bioRxiv, 2024-11.

      Gabrié, M., Rotskoff, G. M., & Vanden-Eijnden, E. (2022). Adaptive Monte Carlo augmented with normalizing flows. Proceedings of the National Academy of Sciences, 119(10), e2109420119.

      Gonçalves, P. J., Lueckmann, J. M., Deistler, M., Nonnenmacher, M., Öcal, K., Bassetto, G., ... & Macke, J. H. (2020). Training deep neural density estimators to identify mechanistic models of neural dynamics. elife, 9, e56261.

      Hashemi, M., Vattikonda, A. N., Jha, J., Sip, V., Woodman, M. M., Bartolomei, F., & Jirsa, V. K. (2023). Amortized Bayesian inference on generative dynamical network models of epilepsy using deep neural density estimators. Neural Networks, 163, 178-194.

      Reviewer #2 (Public review):

      Summary:

      Whole-brain network modeling is a common type of dynamical systems-based method to create individualized models of brain activity incorporating subject-specific structural connectome inferred from diffusion imaging data. This type of model has often been used to infer biophysical parameters of the individual brain that cannot be directly measured using neuroimaging but may be relevant to specific cognitive functions or diseases. Here, Ziaeemehr et al introduce a new toolkit, named "Virtual Brain Inference" (VBI), offering a new computational approach for estimating these parameters using Bayesian inference powered by artificial neural networks. The basic idea is to use simulated data, given known parameters, to train artificial neural networks to solve the inverse problem, namely, to infer the posterior distribution over the parameter space given data-derived features. The authors have demonstrated the utility of the toolkit using simulated data from several commonly used whole-brain network models in case studies.

      Strengths:

      (1) Model inversion is an important problem in whole-brain network modeling. The toolkit presents a significant methodological step up from common practices, with the potential to broadly impact how the community infers model parameters.

      (2) Notably, the method allows the estimation of the posterior distribution of parameters instead of a point estimation, which provides information about the uncertainty of the estimation, which is generally lacking in existing methods.

      (3) The case studies were able to demonstrate the detection of degeneracy in the parameters, which is important. Degeneracy is quite common in this type of model. If not handled mindfully, it may lead to spurious or stable parameter estimation. Thus, the toolkit can potentially be used to improve feature selection or to simply indicate the uncertainty.

      (4) In principle, the posterior distribution can be directly computed given new data without doing any additional simulation, which could improve the efficiency of parameter inference once the artificial neural network is well-trained.
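      The amortization idea in points (2) and (4) can be illustrated with a deliberately simplified sketch. It is not the VBI implementation: a linear-Gaussian regression stands in for the neural density estimator, and the toy simulator, prior range, and function names are all hypothetical. The estimator is trained once on simulated (parameter, feature) pairs and then returns an approximate posterior for new features without running further simulations.

```python
import numpy as np

def simulate_features(theta, rng, n_samples=50):
    """Hypothetical toy simulator: noisy samples around theta, reduced to two features."""
    x = theta + rng.normal(0.0, 1.0, size=n_samples)
    return np.array([x.mean(), x.std()])

def train_estimator(n_sims, rng, prior_low=-2.0, prior_high=2.0):
    """Fit theta ~ a + b . features on simulated pairs; return coefficients and residual std."""
    theta = rng.uniform(prior_low, prior_high, size=n_sims)       # draw from the prior
    feats = np.stack([simulate_features(t, rng) for t in theta])  # run the simulator
    design = np.column_stack([np.ones(n_sims), feats])
    coef, *_ = np.linalg.lstsq(design, theta, rcond=None)
    resid = theta - design @ coef
    sigma = resid.std(ddof=design.shape[1])
    return coef, sigma

def approximate_posterior(coef, sigma, observed_features):
    """Amortized step: Gaussian approximation N(mean, sigma^2), no new simulations needed."""
    mean = coef[0] + observed_features @ coef[1:]
    return float(mean), float(sigma)

rng = np.random.default_rng(0)
coef, sigma = train_estimator(2000, rng)          # expensive part, done once
observed = simulate_features(0.7, rng)            # stands in for features of new data
post_mean, post_std = approximate_posterior(coef, sigma, observed)
print(f"approximate posterior: N({post_mean:.2f}, {post_std:.2f}^2)")
```

      The output is a distribution (mean and spread) rather than a point estimate, and the cost of training is paid once up front, which is the trade-off discussed under Weaknesses.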

      We thank the reviewer for the careful consideration of important aspects of the VBI tool, such as uncertainty quantification, degeneracy detection, parallelization, and amortization strategy.

      Weaknesses:

      (1) While the posterior estimator was trained with a large quantity of simulated data, the testing/validation is only demonstrated with a single case study (one point in parameter space) per model. This is not sufficient to demonstrate the method's accuracy and reliability, but only its feasibility. Demonstrating the accuracy and reliability of the posterior estimation in large test sets would inspire more confidence.

      (2) The authors have only demonstrated validation of the method using simulated data, but not features derived from actual EEG/MEG or fMRI data. So, it is unclear if the posterior estimator, when applied to real data, would produce results as sensible as using simulated data. Human data can often look quite different from the simulated data, which may be considered out of distribution. Thus, the authors should consider using simulated test data with out-of-distribution parameters to validate the method and using real human data to demonstrate, e.g., the reliability of the method across sessions.

      (3) The z-scores used to measure prediction error are generally between 1-3, which seems quite large to me. It would give readers a better sense of the utility of the method if comparisons to simpler methods, such as k-nearest neighbor methods, are provided in terms of accuracy.

      (4) A lot of simulations are required to train the posterior estimator, which seems much more than existing approaches. Inferring from Figure S1, at the required orders of magnitude of the number of simulations, the simulation time could range from days to years, depending on the hardware. Although once the estimator is well-trained, the parameter inversion given new data will be very fast, it is not clear to me how often such use cases would be encountered. Because the estimator is trained based on an individual connectome, it can only be used to do parameter inversion for the same subject. Typically, we only have one session of resting state data from each participant, while longitudinal resting state data, where we can assume the structural connectome remains constant, is rare. Thus, the cost-efficiency and practical utility of training such a posterior estimator remains unclear.

      We agree with the reviewer that it is necessary to show results on larger synthetic test sets, and we will elaborate further by presenting additional scenarios to demonstrate the robustness of the estimation. However, there are some points raised by the reviewer that we need to clarify.

      The validation on empirical data was beyond the scope of this study, as it relates to model validation rather than the inversion algorithms. This is also because we aimed to avoid repetition, given that we have previously demonstrated model validation on empirical data using these techniques, for invasive sEEG (Hashemi et al., 2023), MEG (Sorrentino et al., 2024), EEG (Angiolelli et al., 2025) and fMRI (Lavanga et al., 2023, Rabuffo et al., 2025). Note that if the features of the observed data are not included during training, VBI ignores them, as it requires an invertible mapping function between parameters and data features.

      We have used z-scores and posterior shrinkage to measure prediction performance, as these are Bayesian metrics that take into account the variance of both prior and posterior rather than only the mean value or thresholding for ranking of the prediction used in k-NN or confusion matrix methods. This helps avoid biased accuracy estimation, for instance, if the mean posterior is close to the true value but there is no posterior shrinkage. Although shrinkage is bounded between 0 and 1, we agree that z-scores have no upper bound for such diagnostics.
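      For readers unfamiliar with these diagnostics, the sketch below computes both from samples. It assumes the commonly used definitions (z-score as the distance between posterior mean and true value in units of posterior standard deviation, and shrinkage as one minus the ratio of posterior to prior variance); the function names and the example numbers are hypothetical.

```python
import numpy as np

def posterior_z_score(posterior_samples, true_value):
    """Distance of the posterior mean from the true value, in posterior standard deviations."""
    return abs(np.mean(posterior_samples) - true_value) / np.std(posterior_samples)

def posterior_shrinkage(posterior_samples, prior_samples):
    """1 - var(posterior) / var(prior); close to 1 when the data are informative."""
    return 1.0 - np.var(posterior_samples) / np.var(prior_samples)

# Hypothetical samples: a broad prior and a narrow posterior near the true value 1.2.
rng = np.random.default_rng(0)
prior = rng.uniform(0.0, 2.0, size=5000)
posterior = rng.normal(1.15, 0.05, size=5000)
print("z-score:  ", posterior_z_score(posterior, true_value=1.2))
print("shrinkage:", posterior_shrinkage(posterior, prior))
```

      A posterior whose mean sits on the true value but whose variance equals the prior variance would have a z-score near 0 and a shrinkage near 0, which is the biased-accuracy case described above.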

      Finally, the number of required simulations depends on the dimensionality of the parameter space and the informativeness of the data features. For instance, estimating a single global scaling parameter requires around 100 simulations, whereas estimating whole-brain heterogeneous parameters requires substantially more simulations. Nevertheless, we have provided fast simulations, and one key advantage of VBI is that simulations can be run in parallel (unlike MCMC sampling, which is more limited in this regard). Hence, with commonly accessible CPUs/GPUs, the fast simulations and parallelization capabilities of the VBI tool allow us to run on the order of 1 million simulations within 2–3 days on desktops, or in less than half a day on supercomputers at cohort level, rather than over several years! It has been previously shown that the SBI method used in VBI provides an order-of-magnitude faster inversion than HMC for whole-brain epilepsy spread (Hashemi et al., 2023). Moreover, after training, the amortized strategy is critical for enabling hypothesis testing within seconds to minutes. We agree that longitudinal resting-state data under the assumption of a constant structural connectome is rare; however, this strategy is essential in brain diseases such as epilepsy, where experimental hypothesis testing is prohibitive.

      We will clarify these points and better explain some terms mentioned by the reviewer in the revised manuscript.

      Hashemi, M., Vattikonda, A. N., Jha, J., Sip, V., Woodman, M. M., Bartolomei, F., & Jirsa, V. K. (2023). Amortized Bayesian inference on generative dynamical network models of epilepsy using deep neural density estimators. Neural Networks, 163, 178-194.

      Sorrentino, P., Pathak, A., Ziaeemehr, A., Lopez, E. T., Cipriano, L., Romano, A., ... & Hashemi, M. (2024). The virtual multiple sclerosis patient. IScience, 27(7).

      Angiolelli, M., Depannemaecker, D., Agouram, H., Regis, J., Carron, R., Woodman, M., ... & Sorrentino, P. (2025). The virtual parkinsonian patient. npj Systems Biology and Applications, 11(1), 40.

      Lavanga, M., Stumme, J., Yalcinkaya, B. H., Fousek, J., Jockwitz, C., Sheheitli, H., ... & Jirsa, V. (2023). The virtual aging brain: Causal inference supports interhemispheric dedifferentiation in healthy aging. NeuroImage, 283, 120403.

      Rabuffo, G., Lokossou, H. A., Li, Z., Ziaee-Mehr, A., Hashemi, M., Quilichini, P. P., ... & Bernard, C. (2025). Mapping global brain reconfigurations following local targeted manipulations. Proceedings of the National Academy of Sciences, 122(16), e2405706122.

      Recommendations for the authors:

      We appreciate the time and effort of the reviewers, and their insightful and constructive comments to improve the paper. We have now addressed the reviewers’ comments in our revised manuscript and provide here below detailed explanations of the changes.

      We have adapted the Wilson-Cowan model to follow the same brain network modeling notation as the other models (Fig. 3 in the main text and Figs. S2–S4 in the supplementary materials). Additionally, we have included multiple figures in the supplementary material presenting extensive in-silico testing to demonstrate the accuracy and reliability of the estimations across different configurations, as well as the sensitivity to both additive and dynamical noise.

      Reviewer #1 (Recommendations for the authors):

      (1) There were some inaccurate statements throughout the text that need to be corrected.

      a) In section 2.1, paragraph 1, the authors mentioned that they would describe network models corresponding to different types of neuroimaging recordings. This is inaccurate. The models were developed to approximate various aspects of the architecture of neural circuits. They were not developed per se to solely describe a specific neuroimaging modality.

      Thank you for pointing this out. We agree that our phrasing in Section 2.1, paragraph 1, was not clear that the network models were developed to generate neural activity at the source level, and that a projection needs to be established to transform the simulated neural activity into empirically measurable quantities, such as BOLD fMRI, EEG, or MEG. We have revised the wording in the revised manuscript to clarify this point accordingly.

      b) The use of the term "spatio-temporal data features" is misleading as there are no true spatial features extracted.

      We have clarified that: Following Hashemi et al., 2024, we use the term spatio-temporal data features to refer to both statistical and temporal features derived from time series. In contrast, we refer to the connectivity features extracted from FC/FCD matrices as functional data features. We would like to retain this term, as it is used consistently in the code.

      (2) The authors need to improve the model descriptions in Equations (1)-(10). Several variables/parameters were not explained, limiting the accessibility of the work to those without prior experience in computational modeling.

      Thank you for pointing this out. In the revised manuscript, we have improved the model descriptions and now explain all variables and parameters used in these equations.

      (3) Various things need further clarification and/or explanation:

      a) There is a need to highlight that the models section only provides examples of one of the many possible variants of the models. For example, the Wilson-Cowan model described is not your typical and more popular cortico-cortical-based Wilson-Cowan model. This is important to ensure that the work reflects an accurate account of the literature, avoiding future references that the models presented are THE models.

      This is a very important point. We have now highlighted that each model represents one of many possible variants. Moreover, we adapted the Wilson-Cowan model as a whole-brain network modeling approach to harmonize with all other models.

      b) In Figure 1, it is unclear where the empirical data come into play. The neural density estimator also sounds like a black box and needs further explanation (e.g., its architecture).

      Thank you for the careful reading. This is correct. We have now clarified where the empirical data enters as input to the neural density estimator and have added further explanation in section 2.2.

      c) There is also a need to better explain what shrinkage means and what the z-score vs shrinkage implies.

      We have elaborated on the definition of posterior z-score and shrinkage.

      d) It is unclear how the authors decided on the number of training samples to use.

      There is no specific rule for determining the optimal number of simulations required for training. In general, the larger the number of simulations, within the available computational budget, the better the posterior estimation is likely to be. In the case of synthetic data, we have monitored the z-score and posterior shrinkage to assess the quality and reliability of the inferred parameters. This also critically depends on the parameter dimensionality. For instance, in estimating only the global coupling parameter, a maximum of 300 simulations was used, demonstrating accurate estimation across models and different realizations (Fig S20), except for the Jansen-Rit model, where coupling did not induce a significant change in the intrinsic frequency of regional activity. We have now pointed this out in the discussion.

      e) In the Results section, paragraph 1, there is a need to clarify that "ground truth" is available because you simulate data using predefined parameters. In fact, these predefined parameters and how they were chosen to generate the observed data were never described in the text.

      The "ground truth" is often chosen randomly within biologically plausible ranges, typically with some level of heterogeneity, and this has now been highlighted.

      f) Can the authors comment on why the median of the posterior distributions (e.g., in Figure 4E) is actually far off from the ground truth parameters? This is probably understandable in the Jansen-Rit model due to complexity, but not obvious in the very low-dimensional Stuart-Landau oscillator model.

      This can happen due to non-identifiability in high-dimensional settings. Figure 4E represents the posterior estimation using the Jansen-Rit model with high-dimensional parameters. An accurate estimation close to the true values can be observed in the low-dimensional Stuart-Landau model, as shown in Figure 5.

      g) In Figure 7, the FC and FCD matrices look weird relative to those typically seen in other works.

      We have updated Figure 7. To the best of our ability, we have followed the code and parameters from Kong et al., Nat Commun 12, 6373 (2021), and the following repository: https://github.com/ThomasYeoLab/CBIG/blob/master/stable_projects/fMRI_dynamics/Kong2021_pMFM/examples/scripts/CBIG_pMFM_parameter_estimation_example.py

      We used 300 iterations to optimize the parameters with the CMA-ES method, a window length of 60 s, and TR = 0.72 s, yielding a 1118 × 1118 FCD matrix for each run. Nevertheless, some discrepancy with the shown FC/FCD can arise from the convergence of the optimization process and from other model parameters.
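      The size of the FCD matrix follows from the sliding-window construction, which can be sketched as below. The number of time points (1200) and the step of one TR are assumptions made here for illustration; they are not stated in the response, although they reproduce the 1118 × 1118 size. The function name and the random input signal are also hypothetical.

```python
import numpy as np

def fcd_matrix(bold, window_len, step=1):
    """Sliding-window FCD for a BOLD array of shape (n_timepoints, n_regions).

    Each window gives an FC matrix; the FCD entry (i, j) is the correlation
    between the upper triangles of the FC matrices of windows i and j.
    """
    n_timepoints, n_regions = bold.shape
    upper = np.triu_indices(n_regions, k=1)
    starts = range(0, n_timepoints - window_len + 1, step)
    fc_vectors = np.array(
        [np.corrcoef(bold[s:s + window_len].T)[upper] for s in starts]
    )
    return np.corrcoef(fc_vectors)

tr = 0.72                        # seconds, from the response
window_len = int(60 / tr)        # 60 s window -> 83 frames
n_timepoints = 1200              # assumption, not stated in the response
n_windows = n_timepoints - window_len + 1

rng = np.random.default_rng(0)
bold = rng.standard_normal((n_timepoints, 10))   # placeholder signal, 10 regions
fcd = fcd_matrix(bold, window_len)
print(window_len, n_windows, fcd.shape)
```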

      h) In Figure 8, results for the J parameter are missing. Also, the BOLD signal time series of some regions in Figure 8B looks very weird, with some having very large deflections.

      We have updated Figure 8. In this figure, the parameter J is not inferred; it is instead presented in the appendix (S18). Please note that the system is in a bistable regime. We have implemented the full Wong-Wang model (Deco, 2014, Journal of Neuroscience), optimizing the external current and global coupling (using CMA-ES optimization) to maximize the fluidity of the FCD, as typically seen in other works:

      Author response image 1.

      i) On page 14, the authors mentioned that they perform a PCA on the FC/FCD matrices. Can the authors explain this step further and what it specifically gives out, as this is something unusual in the generative model fitting literature?

      Indeed, PCA is a widely used dimension reduction method in machine learning. Please note that in SBI, any dimensionality reduction technique, such as PCA, can be used, as long as it preserves information relevant to the target parameters.
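      As a minimal sketch of PCA as a dimensionality-reduction step for connectivity features: the layout used here, with one row per simulation holding the vectorized upper triangle of its FC matrix, is an assumption for illustration and is not described in the response. The function name and the random input are hypothetical.

```python
import numpy as np

def pca_features(feature_matrix, n_components):
    """Project rows of feature_matrix onto the leading principal components.

    Returns the low-dimensional scores and the fraction of variance
    explained by each retained component.
    """
    centered = feature_matrix - feature_matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2)[:n_components] / np.sum(s ** 2)
    return scores, explained

# Hypothetical input: 20 simulations, each with 45 FC values (10 regions, upper triangle).
rng = np.random.default_rng(0)
fc_vectors = rng.standard_normal((20, 45))
scores, explained = pca_features(fc_vectors, n_components=3)
print(scores.shape, explained)
```

      The reduced scores, rather than the full matrices, would then be passed to the density estimator as data features, provided they retain the information relevant to the target parameters.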

      j) On page 3, what does ABC in ABC methods stand for?

      ABC stands for Approximate Bayesian Computation, which is now spelled out in the text.

      Reviewer #2 (Recommendations for the authors):

      Overall, I found the paper well-written. These are basically just minor comments:

      We appreciate your positive feedback.

      (1) P3:

      - Amortization requires more explanation for the neuroscience audience.

      - What does ABC stand for?

      We have elaborated on Amortization. ABC stands for Approximate Bayesian Computation, which is now spelled out in the text.

      (2) Section 2.1:

      Should clarify the parcellation used

      In section 2.1, we now mention that: “The structural connectome was built with TVB-specific reconstruction pipeline using generally available neuroimaging software (Schirner et al., Neuroimage 2015)”.

      (3) P20: The method for sensitivity analysis (Figure 5F) is not clearly described.

      We have now added a subsection in the Methods section to explain the sensitivity analysis.

      (4) P21: statement that 10k simulations took less than 1 min doesn't match info shown in Figure S1. Please clarify.

      This is correct, as for the Epileptor model, the total integration time is less than 100 ms. Due to the model’s stable behavior with a large time step and the use of 10 CPU cores, all simulations were completed in less than a minute. It was previously reported (Hashemi et al., 2023) that each VEP run simulating 100 s of whole-brain epileptic patterns takes only 0.003 s using a JIT compiler. The other models incur a higher computational cost due to longer integration durations and smaller time steps. We have clarified this point.

      (5) P23-24: the distribution of FCDs also doesn't match well even if we don't consider element-wise correspondence. Please clarify.

      This is correct, as we used summary statistics of the FCD, such as fluidity, and due to noise, each realization of the FCD matrix exhibits different element-wise correspondence. We have already mentioned this point.
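
      The distinction drawn in this response, between element-wise FCD correspondence and a summary statistic such as fluidity, can be sketched as follows. This is an illustrative sketch only, not the authors' code: it assumes fluidity is taken as the variance of the off-diagonal FCD entries (one common definition), uses non-overlapping or strided sliding windows, and all function names and parameters are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def windowed_fc(ts, start, width):
    """Upper triangle of the FC matrix for one window.

    ts: list of regions, each a list of samples (e.g. BOLD values).
    """
    seg = [region[start:start + width] for region in ts]
    n = len(seg)
    return [pearson(seg[i], seg[j]) for i in range(n) for j in range(i + 1, n)]


def fcd_fluidity(ts, width, step):
    """Variance of the off-diagonal FCD entries (assumed fluidity definition)."""
    n_samples = len(ts[0])
    fcs = [windowed_fc(ts, s, width)
           for s in range(0, n_samples - width + 1, step)]
    if len(fcs) < 2:
        raise ValueError("need at least two windows to build an FCD matrix")
    # FCD entry (a, b) is the correlation between the FC patterns
    # of windows a and b; only the upper triangle is needed.
    fcd_upper = [pearson(fcs[a], fcs[b])
                 for a in range(len(fcs)) for b in range(a + 1, len(fcs))]
    mean = sum(fcd_upper) / len(fcd_upper)
    return sum((x - mean) ** 2 for x in fcd_upper) / len(fcd_upper)
```

      Two noisy realizations of the same model will generally yield different FCD matrices entry by entry, yet a scalar such as this one can remain comparable across realizations, which is why it is a convenient fitting target.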

    1. eLife Assessment

      This important study identifies and partially characterises two proteins optimised for coordinated peptidoglycan degradation during two spore morphogenesis programs in the bacterium Myxococcus xanthus. The evidence supporting the conclusions is solid, although the description of the data is somewhat overstated. After some editing, the paper will be of interest to those studying peptidoglycan synthesis and reorganisation, which is a central aspect of microbial cell biology.

    2. Reviewer #1 (Public review):

      Summary:

      Ramirez Carbo et al. use the powerful M. xanthus spore morphogenesis model to address fundamental mechanisms in coordinated peptidoglycan remodeling and degradation. As peptidoglycan is an essential macromolecule and difficult to study in vivo, the authors use indirect but important methodology. The authors first identify two lytic transglycosylase (Ltg) enzymes necessary for spore morphogenesis using mutant phenotypic studies. They characterize these mutants for their role in coordinating spore morphogenesis induced either in fruiting bodies (starvation-dependent) or in liquid-rich media conditions (chemical-dependent). They conclude from these phenotypic and epistatic analyses that LtgA is necessary for morphogenesis during chemical-induced sporulation, and LtgB appears to be necessary to coordinate LtgA activity by interfering with LtgA function. Under starvation-induced sporulation, the absence of LtgB interferes with the building of fruiting bodies. LtgA does not appear to play a primary role in promoting aggregation into fruiting bodies, nor in degradation of peptidoglycan as assayed by loss of signal in anti-PG immunofluorescence. The authors demonstrate that the purified periplasmic domain of LtgA is highly active in degrading purified PG sacculi in vitro, while that of LtgB is highly reduced (relative to LtgA or lysozyme). The authors use photoactivated mCherry Ltg fusions and PALM to track the fusion protein mobility, which they state correlates with activity as immobilization results from PG binding. They demonstrate that in vegetative cells, a greater proportion of LtgA-PAmCh is more immobile (more active) than LtgB-PAmCh, but that directly after chemical-induction of sporulation, LtgB-PAmCh becomes more immobile (active). These analyses in the partner mutant backgrounds suggest that LtgA-PAmCh is more immobile (less active) in the absence of LtgB, but the reverse is not observed. Finally, the authors demonstrate that overexpression of LtgA in vegetative conditions leads to cell rounding, likely because of uncontrolled PG degradation, while overexpression of LtgB displays no phenotype.

      Strengths:

      This paper capitalizes on a novel spore morphogenesis mechanism to define proteins and mechanisms involved in peptidoglycan reorganization. The authors use the powerful PALM microscopy technique to assess Ltg activity in vivo by assaying for immobility as a proxy for PG binding. The authors elucidate a novel mechanism by which two Ltgs function together, with one (LtgB) seeming to regulate the activity of the other (the primary Ltg).

      Despite some weaknesses, there is no question that this study provides important insight into mechanisms of peptidoglycan remodeling, a difficult but highly impactful area of study with implications for the development of novel therapeutics and the discovery of mechanisms of fundamental bacterial physiology.

      Weaknesses:

      In many places, the authors do not adequately justify interpretations of their assays, leading to some apparently unjustified conclusions. Many of these are minor and may just require citations to demonstrate that the interpretations are justified by previous studies (detailed in recommendations below), but two bigger concerns are as follows:

      (1) It is not clear how the muropeptides listed in Figure 1 were assigned, and it is missing in the methods. In the sporulating conditions, the spectra look like combinations of multiple peaks, and the data, as stated, is not convincing to the non-specialist eye.

      (2) The observation that the ltgB mutant prevents appropriate aggregation into fruiting bodies does not allow the interpretation that the absence of LtgB prevents PG morphogenesis in the starvation-induced sporulation pathway, per se. It is more likely that in the ltgB mutant, the morphogenesis program is not even triggered. This is because signaling proteins and regulators (specifically, C-signal accumulation/activated FruA), which are dependent on increased cell-cell signaling in the fruiting body, do not accumulate appropriately in shallow aggregates. C-signal/FruA are necessary to trigger the sporulation program in FBs. Incidentally, a hypothesis to explain the indirect effect of ltgB absence on aggregation could be that UDP-precursors are not regulated appropriately (unregulated LtgA?), so polysaccharides necessary for motility are not properly produced.

      Along these lines, fruiting body formation does not equal sporulation, and even "darkened" fruiting bodies can be misleading, as some mutants form polysaccharide-rich fruiting bodies (that appear dark under certain light conditions in the stereomicroscope) but do not sporulate efficiently. The wording in the text suggests that the authors assume that sporulation levels are normal because fruiting bodies are produced (see specific comments for details).

      (3) The authors repeatedly state that production of spore coat polysaccharides likely affects the PG IP staining (see below), but this is not well justified. A citation is needed if this has already been directly shown, or the language needs to be softened.

      (4) Better justification for the immobility of Ltg proteins in vivo as an assay for activity may be required. If this is well known in the field, it should be explicitly stated. The authors address this better in the discussion, but still state it is a correlation.

    3. Reviewer #2 (Public review):

      Summary:

      The authors' initial goal was to demonstrate loss of PG during the slow sporulation process of Myxococcus xanthus, with examination of the PG degradation products in order to implicate possible enzymes involved. Upon finding a predominance of LGT products, they examined sporulation in strains lacking each of the 14 candidate LGTs encoded in the genome, leading to the identification of two sporulation-linked LGTs. An extensive characterization of the roles played by these LGTs followed. One LGT is responsible for the slow sporulation PG degradation, while another is required for the rapid sporulation process. Interestingly, the "slow" LGT seems to provide an important regulatory brake on the rapid enzyme. Single-molecule fluorescent tracking of these enzymes was used to develop a model for their interaction with PG that mimics their observed activity. The rate of PG synthesis activity was also shown to impact the rate of PG degradation, suggesting potential interplay between the synthetic and degradative enzymes.

      Strengths:

      The genetic analysis to identify sporulation-linked LGTs and their effects on growth, sporulation, and spore properties was well done and productive. The fluorescence microscopy to track LGT mobility, presumably tied to activity, produced a convincing argument about the mechanism of regulation of one LGT by another.

      Weaknesses:

      While the impact of LGTs on sporulation was clearly demonstrated, the PG analysis that resulted from the study of LGTs raised some important unanswered questions. The analyses suggest that the PG is degraded to quite small fragments, which would normally be lost during the purification of PG. How these small fragments were thus detected is unclear, and this suggests a more complex story concerning PG metabolism during sporulation. An anti-PG antibody is used to quantify PG in the spores, but it is not made clear what the specificity of this antibody is, and thus whether it would recognize the LGT-altered PG of the spore. The authors suggest a "new mechanism of sporulation" when they have actually simply identified an important factor (PG degradation by LGTs) within a complex "process of sporulation".

    1. eLife Assessment

      In this manuscript, Chen et al. used cryo-ET and an in vitro reconstituted system to demonstrate that the autoinhibited form of LRRK2 can also assemble into filaments on the microtubule surface, with a new interface involving the N-terminal repeats that were disordered in the previous active-LRRK2 filament structure. The structure obtained in this study is the highest resolution of LRRK2 filaments done by subtomogram averaging, representing a major technical advance compared to the previous paper from the same group. This is an important study, especially considering the pharmacological implications of the effect of inhibitors of the protein. The strengths of the data are convincing, but the study would be considerably strengthened if the authors explored the physiological significance of the new interfaces and the incomplete decoration of microtubules described here.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Chen et al. use cryo-electron tomography and an in vitro reconstitution system to demonstrate that the autoinhibited form of LRRK2 can assemble into filaments that wrap around microtubules. These filaments are generally shorter and less ordered than the previously characterized active-LRRK2 filaments. The structure reveals a novel interface involving the N-terminal repeats, which were disordered in the earlier active filament structure. Additionally, the autoinhibited filaments exhibit distinct helical parameters compared to the active form.

      Strengths:

      This study presents the highest-resolution structure of LRRK2 filaments obtained via subtomogram averaging, marking a significant technical advance over the authors' previous work published in Cell. The data are well presented, with high-quality visualizations, and the findings provide meaningful insights into the structural dynamics of LRRK2.

      Weaknesses and Suggestions:

      The revised manuscript by Chen et al. has fully addressed all of my previous suggestions regarding the rearrangement of the main figures.

    3. Reviewer #2 (Public review):

      The authors of this paper have done much pioneering work to decipher and understand LRRK2 structure and function and uncover the mechanism by which LRRK2 binds to microtubules and to study the roles that this may play in biology. Their previous data demonstrated that LRRK2 in the active conformation (pathogenic mutation or Type I inhibitor complex) bound to microtubule filaments in an ordered helical arrangement. This they showed induced a "roadblock" in the microtubule impacting vesicular trafficking. The authors have postulated that this is a potentially serious flaw with Type 1 inhibitors and that companies should consider generating Type 2 inhibitors in which the LRRK2 is trapped in the inactive conformation. Indeed the authors have published much data that LRRK2 complexed to Type 2 inhibitors does not seem to associate with microtubules and cause roadblocks in parallel experiments to those undertaken with type 1 inhibitors published above.

      In the current study the authors have undertaken an in vitro reconstitution of microtubule-bound filaments of LRRK2 in the inactive conformation, which surprisingly revealed that inactive LRRK2 can also interact with microtubules in its auto-inhibited state. The authors' data shows that while the same interfaces are seen with both the active LRRK2 and inactive microtubule-bound forms of LRRK2, they identified a new interface that involves the WD40-ARM-ANK domains that reportedly contributes to the ability of the inactive form of LRRK2 to bind to microtubule filaments. The structures of the inactive LRRK2 complexed to microtubules are of medium resolution and do not allow visualisation of side chains.

      This study is extremely well written and the figures incredibly clear and well presented. The finding that LRRK2 in the inactive autoinhibited form can associate with microtubules is an important observation that merits further investigation. This new observation makes an important contribution to the literature and builds upon the pioneering research that this team of researchers has contributed to the LRRK2 fields.

      Comments on revised version:

      The authors have adequately addressed my questions and those of the other Reviewers in my opinion.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript by Chen et al examines the structure of the inactive LRRK2 bound to microtubules using cryo-EM tomography. Mutations in this protein have been shown to be linked to Parkinson's Disease. It is already shown that the active-like conformation of LRRK2 binds to the MT lattice, but this investigation shows that full-length LRRK2 can oligomerize on MTs in its autoinhibited state with different helical parameters than were observed with the active-like state. The structural studies suggest that the autoinhibited state is less stable on MTs.

      Strengths:

      The protein of interest is very important biomedically and a novel conformational binding to microtubules is proposed.

      The authors have addressed my original critique.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Chen et al. used cryo-ET and an in vitro reconstituted system to demonstrate that the autoinhibited form of LRRK2 can also assemble into filaments that wrap around the microtubule, although the filaments are typically shorter and less regular compared to the previously reported active-LRRK2 filaments. The structure revealed a new interface involving the N-terminal repeats that were disordered in the previous active-LRRK2 filament structure. The autoinhibited-LRRK2 filament also has different helical parameters compared to the active form.

      Strengths:

      The structure obtained in this study is the highest resolution of LRRK2 filaments done by subtomogram averaging, representing a major technical advance compared to the previous Cell paper from the same group. Overall, I think the data are well presented with beautiful graphic rendering, and valuable insights can be gained from this structural study.

      Weaknesses:

      (1) There are only three main figures, together with 9 supplemental figures. The authors may consider breaking the currently overwhelming Figures 1 and 3 into smaller figures and moving some of the supplemental figures to the main figure, e.g., Figure S7.

      (2) The key analysis of this manuscript is to compare the current structure with the previous active-LRRK2 filament structure. Currently, such a comparison is buried in Figure 3H. It should be part of Figure 1.

      We thank the reviewer for this suggestion. As suggested, we have rearranged the figures, split Figures 1 and 3 into smaller figures, and moved the comparison analysis in Figure 3H to the new Figure 1. Specifically, the old Figure 1 is separated into two figures, introducing the model-building process and describing the two symmetric axes. The old Figure 3 is also separated into two smaller figures, describing the geometric analysis and model comparison, respectively.

      Reviewer #2 (Public review):

      The authors of this paper have done much pioneering work to decipher and understand LRRK2 structure and function, to uncover the mechanism by which LRRK2 binds to microtubules, and to study the roles that this may play in biology. Their previous data demonstrated that LRRK2 in the active conformation (pathogenic mutation or Type I inhibitor complex) bound to microtubule filaments in an ordered helical arrangement. This they showed induced a "roadblock" in the microtubule impacting vesicular trafficking. The authors have postulated that this is a potentially serious flaw with Type 1 inhibitors and that companies should consider generating Type 2 inhibitors in which the LRRK2 is trapped in the inactive conformation. Indeed the authors have published much data that LRRK2 complexed to Type 2 inhibitors does not seem to associate with microtubules and cause roadblocks in parallel experiments to those undertaken with type 1 inhibitors published above.

      In the current study, the authors have undertaken an in vitro reconstitution of microtubule-bound filaments of LRRK2 in the inactive conformation, which surprisingly revealed that inactive LRRK2 can also interact with microtubules in its auto-inhibited state. The authors' data shows that while the same interfaces are seen with both the active LRRK2 and inactive microtubule-bound forms of LRRK2, they identified a new interface that involves the WD40-ARM-ANK domains that reportedly contributes to the ability of the inactive form of LRRK2 to bind to microtubule filaments. The structures of the inactive LRRK2 complexed to microtubules are of medium resolution and do not allow visualisation of side chains.

      This study is extremely well-written and the figures are incredibly clear and well-presented. The finding that LRRK2 in the inactive autoinhibited form can be associated with microtubules is an important observation that merits further investigation. This new observation makes an important contribution to the literature and builds upon the pioneering research that this team of researchers has contributed to the LRRK2 fields. However, in my opinion, there is still significant work that could be considered to further investigate this question and understand the physiological significance of this observation.

      We thank the reviewer for the positive comments and we agree that more work can be done next to understand the physiological significance of the autoinhibited LRRK2 in cellular environments. We are actively working on understanding how the stability of autoinhibited full-length LRRK2 is regulated, especially how the transition between autoinhibited and active forms of LRRK2 can happen. Our in situ data (Watanabe et al. 2020) indicates that overexpressed hyperactive PD-mutant LRRK2 mainly adopts its active-like conformation in cells. Thus, learning how the state transition occurs will allow us to target autoinhibited LRRK2 specifically and efficiently in cells and study its structure and function in physiological conditions.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Chen et al examines the structure of the inactive LRRK2 bound to microtubules using cryo-EM tomography. Mutations in this protein have been shown to be linked to Parkinson's Disease. It is already shown that the active-like conformation of LRRK2 binds to the MT lattice, but this investigation shows that full-length LRRK2 can oligomerize on MTs in its autoinhibited state with different helical parameters than were observed with the active-like state. The structural studies suggest that the autoinhibited state is less stable on MTs.

      Strengths:

      The protein of interest is very important biomedically and a novel conformational binding to microtubules is proposed.

      Weaknesses:

      (1) The structures are all low resolution.

      We thank the reviewer for the comments on both the strengths and weaknesses of the manuscript. We agree with the reviewer that higher resolution would provide more information about how LRRK2 interacts with microtubules and oligomerizes in its autoinhibited form. However, with the current resolution, our model-building benefited significantly from the published high-resolution models and the AlphaFold predictions. We used cryo-ET and subtomogram analysis to solve the structure because this filament is less regular than the right-handed active LRRK2 filament, preventing us from using conventional single-particle analysis. As highlighted by reviewer 1, being able to push the resolution to sub-nanometer is an important advance reflecting state-of-the-art subtomogram analysis, especially for a heterogeneous sample. Notably, the microtubule reconstruction reached higher resolution, comparable to our previous single-particle studies on LRRK2-RCKW (Snead and Matyszewski et al.), confirming the data quality.

      (2) There are no measurements of the affinity of the various LRRK2 molecules (with and without inhibitors) to microtubules. This should be addressed through biochemical sedimentation assay.

      We thank the reviewer for the suggestion and we agree that learning the binding affinity between LRRK2 and microtubules would be informative. We attempted to purify LRRK2 with mutations at the WD40:ARM/ANK interface we identified in the manuscript. Unfortunately, for both LRRK2 and LRRK2<sup>I2020T</sup> carrying the N-terminal mutations (R521A/F573A/E854K), the yield and purity of the final samples were significantly worse than those of our routine LRRK2 prep. Our chromatography and gel electrophoresis results indicate that the proteins are degrading during purification.

      Author response image 1.

      While we have attached the results here, and it would be interesting to investigate why N-terminal mutations destabilize LRRK2, we anticipate that significant efforts would be required for further experiments, which we respectfully consider outside of the scope of this manuscript. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure S9, the graphic definition of "chain length" in panel A is misleading. The authors can simply note in the figure legend that "chain length is the number of asymmetric units in a continuous chain".

      We thank the reviewer for the suggestion. The updated figure and legend have incorporated the changes.

      (2) In Figure S7B, the conformation changes of the 'G-loop' and the 'DYG' motifs are not so convincing at the current resolution.

      We thank the reviewer for pointing it out. We agree that our model resolution is not high enough to support the unbiased observation of the conformation changes of the key kinase motifs. In the revised manuscript, we avoided emphasizing the comparison between the two models. Instead, we state that for both the MLi-2 bound map and the GZD-824 bound map, the corresponding published high-resolution models fit into each kinase map, but the MLi-2 bound model doesn’t fit as well in the GZD-824 bound map, with the correlation value dropping from 0.44 to 0.4, supporting our statement that “full-length LRRK2 bound to microtubules is in its autoinhibited state in our reconstituted system”.

      Reviewer #2 (Recommendations for the authors):

      (1) Are there any cellular experiments that could be done to demonstrate that inactive LRRK2 associates with microtubules in cells?

      We thank the reviewer for pointing out this direction for future studies. We are studying the physiological significance of the autoinhibited LRRK2 in cells, but haven’t yet been successful at demonstrating physiological binding to microtubules. Further, as noted in our response to the public review, we are also actively working on understanding how the stability of autoinhibited full-length LRRK2 is regulated, especially how the transition between autoinhibited and active forms of LRRK2 can happen. Our in situ data (Watanabe et al. 2020) indicates that overexpressed hyperactive PD-mutant LRRK2 mainly adopts its active-like conformation in cells. Thus, learning how the state transition occurs will allow us to target autoinhibited LRRK2 specifically and efficiently in cells and study its structure and function in physiological conditions.

      (2) Previous work that the authors and others have undertaken has suggested that only LRRK2 in its active conformation can associate with microtubule filaments and the authors have shown that this leads to a roadblock in vesicular transport only when LRRK2 is complexed with Type 1 but not Type 2 inhibitors. There seems to be some discrepancy here that is not addressed in the paper as based on the current results one would also expect LRRK2 bound to Type 2 inhibitors to induce roadblocks in microtubule filaments. How can this be explained?

      We thank the reviewer for raising this important question. Taking all of our published data together, we believe that LRRK2 can introduce roadblocks with Type 1 inhibitor bound in the active-like conformation, where N-terminus LRRK2 domains are flexible and don’t block the kinase active site. In other words, full-length LRRK2 can form roadblocks when it behaves more like the truncated LRRK2<sup>RCKW</sup> variant. The autoinhibited LRRK2 forms shorter and less stable oligomers on microtubules, making it harder to block transport. Consistent with this, our in situ LRRK2-microtubule structure was observed in cells where LRRK2 is in an active-like conformation, and the LRRK2 N-terminus appeared to be flexible and away from the microtubule when forming right-handed filaments.

      (3) Does the finding that inactive LRRK2 only binds to microtubules as a short filament, explain the differences between the inactive and active forms of LRRK2 binding to microtubules and causing roadblocks?

      We thank the reviewer for discussing this point with us and asking the question. As we replied in the previous comment, the reviewer’s conclusion explains how the roadblock phenomenon occurs only under certain circumstances. We expanded our discussion to add the following and address the question:

      “Notably, we previously demonstrated that active‐like LRRK2, when bound to a Type I inhibitor, can form roadblocks that impair vesicular transport. Since autoinhibited LRRK2 assembles into shorter, less stable oligomers on microtubules, we anticipate it will exert reduced road‐blocking effects in cells, regardless of the inhibitor bound.”

      (4) Could the authors undertake further characterization of the new WD40-ARM-ANK interface that they have identified? Is this important for the binding of the autoinhibited mutant? Could mutants be made in this interface to see if this prevents the autoinhibited but not the active conformation of LRRK2 binding to microtubules?

      We thank the reviewer for the comment. As mentioned in our response to Reviewer #3, public comment #2, we attempted multiple times to purify LRRK2 with mutations at the WD40:ARM/ANK interface we identified in the manuscript. Unfortunately, for both LRRK2 and LRRK2<sup>I2020T</sup> carrying the N-terminal mutations (R521A/F573A/E854K), the yield and purity of the final samples were significantly worse than those of our routine LRRK2 prep. Our chromatography and gel electrophoresis results indicate that the proteins are degrading during purification.

      (5) The authors identify several disease-relevant missense mutations that appear to lie within the novel interface that the authors have characterised in this study. Although this is discussed in the Discussion, some experimental data demonstrating how these missense mutations impact the ability of inactive LRRK2 to bind to microtubule filaments in the presence or absence of Type 1 and Type 2 compounds could provide further experimental data that emphasises the physiological importance of the results presented in this study.

      We thank the reviewer for discussing this interesting direction. The disease-relevant missense mutations can have a direct or indirect impact on the binding of autoinhibited LRRK2 to microtubules, and we agree that it would be interesting to test it out in the future. However, we anticipate that significant effort would be required for further experiments. Alas, our funding for this project ended suddenly and we want to report our results to the community.

      (6) For the data that is shown in Figure 1, could the authors explain how this differs from results in previous papers of the authors showing that the active form of LRRK2 binds microtubules? How does the binding observed here differ from that observed in the previous studies? To a non-specialist reader, the data looks fairly like what has previously been reported.

      We thank the reviewer for asking the question. As mentioned in the response to the public review, the detailed comparison between the data and the previous papers is described in Figure 3, and we agree that it is helpful to incorporate this information in Figure 1. In the revised manuscript, we have incorporated the comparison panel in Figure 1.

      (7) The finding that the autoinhibited LRRK2 forms short and sparse oligomers on microtubules raises the question of how physiological this observation is. Having some data that suggests that this is physiologically relevant would boost the impact of this study.

      We agree with the reviewer on this comment. As discussed in the response to the first comment from the reviewer, we have not been able to assess the physiological relevance of LRRK2 binding to microtubules in either the active or the inactive state, but continue to pursue this line of research. We are aware and regret that this lessens the impact of this work.

      (8) For the more general reader the authors could potentially better highlight why the key finding in this paper is important.

      We thank the reviewer for the suggestion. To further address the significance of the key findings, especially how they can open up more possibilities for inhibitor-based drug development, we expanded our discussion section to include the following:

      “Understanding how Type I and Type II inhibitors’ binding to LRRK2 affects its mechanism is vital to the design of inhibitor-based PD drug development strategies. Our findings revealed that different LRRK2 kinase inhibitors bind to autoinhibited LRRK2 similarly either in solution or on microtubules. Furthermore, the observation of autoinhibited LRRK2 forming short, less stable oligomers on microtubules opens new possibilities to inhibit LRRK2 activity in PD patients. A Type I inhibitor specifically targeting autoinhibited LRRK2 may alleviate the effect of LRRK2 roadblocks on microtubules. Alternatively, a promising strategy of LRRK2 inhibitor design can focus on the stabilization of allosteric N-terminus blocking on the kinase domain, which favors the formation of autoinhibited LRRK2 oligomers on microtubules and causes fewer side effects.”

      Reviewer #3 (Recommendations for the authors):

      In the third paragraph of the introduction, expand on whether type-1 inhibitors, which "capture kinases in a closed, 'active-like' conformation", still inhibit the kinase activity.

      We thank the reviewer for the request to expand this paragraph. We added the following explanation for better understanding in the third paragraph:

      “Type-I inhibitors bind to the ATP binding site and target the kinase in its ‘active-like’ conformation, inhibiting its kinase activity.”

    1. eLife Assessment

      This is an important study that demonstrates that blood pressure variability impairs myogenic tone and diminishes baroreceptor reflex. The study also provides evidence that blood pressure variability blunts functional hyperemia and contributes to cognitive decline. The evidence is compelling whereby the authors use appropriate and validated methodology in line with or more rigorous than the current state-of-the-art.

    2. Reviewer #1 (Public review):

      This study examined the effect of blood pressure variability on brain microvascular function and cognitive performance. By implementing a model of blood pressure variability using intermittent infusion of AngII for 25 days, the authors examined different cardiovascular variables, cerebral blood flow and cognitive function during midlife (12-15-month-old mice). Key findings from this study demonstrate that blood pressure variability impairs baroreceptor reflex and impairs myogenic tone in brain arterioles, particularly at higher blood pressure. They also provide evidence that blood pressure variability blunts functional hyperemia and impairs cognitive function and activity. Simultaneous monitoring of cardiovascular parameters, in vivo imaging recordings, and the combination of physiological and behavioral studies reflect rigor in addressing the hypothesis. The experiments are well designed, and data generated are clear.

      A number of issues raised earlier were addressed by the authors in the revised manuscript. The responses are convincing. These included circadian rhythm considerations, baroreflex findings, BP fluctuations driven by animal movement, and data presentation.

      Overall, this is a solid study with huge physiological implications. I believe that it will be of great benefit to the field.

    3. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This study examined the effect of blood pressure variability on brain microvascular function and cognitive performance. By implementing a model of blood pressure variability using an intermittent infusion of AngII for 25 days, the authors examined different cardiovascular variables, cerebral blood flow, and cognitive function during midlife (12-15-month-old mice). Key findings from this study demonstrate that blood pressure variability impairs baroreceptor reflex and impairs myogenic tone in brain arterioles, particularly at higher blood pressure. They also provide evidence that blood pressure variability blunts functional hyperemia and impairs cognitive function and activity. Simultaneous monitoring of cardiovascular parameters, in vivo imaging recordings, and the combination of physiological and behavioral studies reflect rigor in addressing the hypothesis. The experiments are well-designed, and the data generated are clear. I list below a number of suggestions to enhance this important work:

      (1) Figure 1B: It is surprising that the BP circadian rhythm is not distinguishable in either group. Figure 2, however, shows differences in circadian rhythm at different timepoints during infusion. Could the authors explain the lack of circadian effect in the 24-h traces?

      The circadian rhythm pattern is apparent in Figure 2 (Active BP higher than Inactive BP), where BP is presented as 12-hour averages. When the BP data are expressed as one-hour averages (rather than minute-to-minute) over 24 hours, now included in the revised manuscript as Supplemental Figure 3C-D, the circadian rhythm becomes noticeable. In addition, we have included one-hour average BP data for all mice in the control and BPV groups (Supplemental Figure 3A-B).

      Notably, the Ang II-induced pulsatile BP pattern remains evident in the one-hour averages for the BPV group (Supplemental Figure 3B). To minimize bias and validate variability, pump administration start times were randomized for both control and BPV groups (Supplemental Figure 3A-B). Despite these adjustments, the circadian rhythm profile of BP is consistently maintained across individual mice and in the collective dataset (Supplemental Figure 3C-D).
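
      The averaging step described above can be sketched as follows. This is a minimal illustration, assuming per-minute telemetry readings held in a plain list; the function name and the sample values are hypothetical and do not reproduce the authors' analysis code.

```python
from statistics import mean


def hourly_means(minute_values):
    """Collapse per-minute readings into consecutive one-hour averages."""
    if len(minute_values) % 60 != 0:
        raise ValueError("expected a whole number of hours of per-minute data")
    return [
        mean(minute_values[start:start + 60])
        for start in range(0, len(minute_values), 60)
    ]


# Hypothetical two-hour recording: one quiet hour, one hour with higher BP.
readings = [100] * 60 + [120] * 60
print(hourly_means(readings))
```

      Binning in this way smooths brief movement-related spikes, which is why a slow circadian pattern can be hidden in minute-to-minute traces yet visible in hourly averages.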

      (2) While saline infusion does not result in elevation of BP when compared to Ang II, there is an evident "and huge" BP variability in the saline group, at least 40mmHg within 1 hour. This is a significant physiological effect to take into consideration, and therefore it warrants discussion.

      Thank you for this comment. The large variations in BP in the raw traces during saline infusion reflect transient BP changes induced by movement/activity, which is now included in Figure 1B (maroon trace). The revised manuscript now includes Line 222 “Note that dynamic activity-driven BP changes were apparent during both saline- and Ang II infusions, Figure 1B”.

      (3) The decrease in DBP in the BPV group is very interesting. It is known that chronic Ang II increases cardiac hypertrophy, are there any changes to heart morphology, mass, and/or function during BPV? Can the decrease in DBP in BPV be attributed to preload dysfunction? This observation should be discussed.

      The lower DBP in the BPV group was already present at baseline, while both groups were still infused with saline, and was a difference beyond our control. However, this is an important and valid consideration, particularly considering the minimal yet significant increase in SBP within the BPV group (Figure 1D). Our goal was to induce significant transient blood pressure responses (BPV) and investigate the impact on cardiovascular and neurovascular outcomes in the absence of hypertension. We did not anticipate any major cardiac remodeling at this early time point (considering the absence of overt hypertension) and thus cardiac remodeling was not assessed and this is now discussed in the revised manuscript (Line 443-453).

      (4) Examining the baroreceptor reflex during the early and late phases of BPV is quite compelling. Figures 3D and 3E clearly delineate the differences between the two phases. For clarity, I would recommend plotting the data as is shown in panels D and E, rather than showing the mathematical ratio. Alternatively, plotting the correlation of ∆HR to ∆SBP and analyzing the slopes might be more digestible to the reader. The impairment in baroreceptor reflex in the BPV during high BP is clear, is there any indication whether this response might be due to loss of sympathetic or gain of parasympathetic response based on the model used?

      We appreciate the reviewer’s suggestion and have accordingly generated new figures displaying scatter plots of SBP vs HR with linear regression analysis (Figure 3D-G). Our goal is to further investigate which branch of the autonomic nervous system is affected in this model. The loss of a bradycardic response suggests either an enhancement of sympathetic activity, a reduction in parasympathetic activity, or a combination of both. This is briefly discussed in the revised manuscript (Line 486-496).

      Heart rate variability (HRV) serves as an index of neurocardiac function and dynamic, non-linear autonomic nervous system processes, as described in Shaffer and Ginsberg[1]. However, given that our data were limited to BP and HR readings collected at one-minute intervals, our primary assessment of autonomic function is limited to the bradycardic response. Further studies will be necessary to fully characterize the autonomic parameters influenced by chronic BPV.
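
      The regression of HR on SBP mentioned above can be illustrated with an ordinary least-squares fit. The sketch below is generic and assumes paired SBP and HR values; the function name and the numbers are hypothetical, and the authors' actual fitting procedure is not specified here.

```python
def regression_slope(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical pairs: HR (bpm) falling as SBP (mmHg) rises.
sbp = [100, 120, 140]
hr = [600, 560, 520]
slope, intercept = regression_slope(sbp, hr)
print(slope, intercept)
```

      In this framing, a steeper negative slope corresponds to a stronger bradycardic response per mmHg of pressure rise, and a slope closer to zero to a blunted response.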

      (5) Figure 3B shows a drop in HR when the pump is ON irrespective of treatment (i.e., independent of BP changes). What is the underlying mechanism?

      We apologize for any lack of clarity. These observed heart rate (HR) changes occurred during Ang II infusion, when blood pressure (BP) was actively increasing. In the control group, the pump solution was switched to Ang II during specific periods (days 3-5 and 21-25 of the treatment protocol) to induce BP elevations and a baroreceptor response, allowing direct comparisons between the control and BPV group.

      To clarify this point, we have revised Line 260-263 of the manuscript: “To compare pressure-induced bradycardic responses between BPV and control mice at both early and later treatment stages, a cohort of control mice received Ang II infusion on days 3-5 (early phase) (Supplemental Figure 4) and days 21-25 (late phase) thereby transiently increasing BP”.

      Additionally, a detailed description has been added to the Methods section (Line 96-101): “Controls receiving Ang II: To facilitate between-group comparisons (control vs BPV), a separate cohort of control mice were subjected to the same pump infusion parameters as BPV mice but for a brief period receiving Ang II infusions on days 3-5 and 21-25 for experiments assessing pressure-evoked responses, including bradycardic reflex, myogenic response, and functional hyperemia at high BP.”

      (6) The correlation of ∆diameter vs MAP during low and high BP is compelling, and the shift in the cerebral autoregulation curve is also a good observation. I would strongly recommend that the authors include a schematic showing the working hypothesis that depicts the shift of the curve during BPV.

      Thank you for this insightful comment. The increase in vessel reactivity to BP elevations in parenchymal arterioles of BPV mice suggests that chronic BPV induces a leftward shift and a potential narrowing of the cerebral autoregulation range (lower BP thresholds for both the upper and lower limits of autoregulation). This has been incorporated (and discussed) into the revised manuscript (see Figure 5N).

      One potential explanation for these changes is that the absence of sustained hypertension, a prominent feature in most rodent models of hypertension, limits adaptive processes that protect the cerebral microcirculation from large BP fluctuations (e.g., vascular remodeling). While this study does not specifically address arteriole remodeling, the lack of such adaptation may reduce pressure buffering by upstream arterioles, thereby rendering the microcirculation more vulnerable to significant BP fluctuations.

      The unique model allows for measurements of parenchymal arteriole reactivity to acute dynamic changes in BP (both an increase and decrease in MAP). Our findings indicate that chronic BPV enhances the reactivity of parenchymal arterioles to BP changes—both during an increase in BP and upon its return to baseline, Supplemental Figure 5C, F. The data suggest an increased myogenic response to pressure elevation, indicative of heightened contractility, a common adaptive process observed in rodent models of hypertension[2-4]. However, our model also reveals a notable tendency for greater dilation when the BP drops, Supplemental Figure 5F. This intriguing observation may suggest ischemia during the vasoconstriction phase (at higher BP), leading to enhanced release of dilatory signals, which subsequently manifest as a greater dilation upon BP reduction. This phenomenon bears similarities to chronic hypoperfusion models[5,6], where vasodilatory mechanisms become more pronounced in response to sustained ischemic conditions. Future studies investigating the effects of BPV on myogenic responses and brain perfusion will be a priority for our ongoing research.

      (7) Functional hyperemia impairment in the BPV group is clear and well-described. Pairing this response with the kinetics of the recovery phase is an interesting observation. I suggest elaborating on why BPV group exerts lower responses and how this links to the rapid decline during recovery.

      Based on the heightened reactivity of BPV parenchymal arterioles to intravascular pressure (Figure 5), we anticipate that the reduction of sensory-evoked dilations results from an increased vasoconstrictive activity and/or a decreased availability of vasodilatory signaling pathways (NO, EETs, COX-derived prostaglandins)[7,8]. Consequently, the magnitude of the FH response is blunted during periods of elevated BP in BPV mice.

      Additionally, upon termination of the stimulus-induced response, when vasodilatory signals would typically dominate, vasoconstrictive mechanisms are rapidly engaged (or unmasked), leading to a quicker return to baseline. This shift in the balance between vasodilatory and vasoconstrictive forces favors vasoconstriction, contributing to the altered recovery kinetics observed in BPV mice. This has been included in the Discussion section of the revised manuscript.

      (8) The experimental design for the cognitive/behavioral assessment is clear and it is a reasonable experiment based on previous results. However, the discussion associated with these results falls short. I recommend that the authors describe the rationale to assess recognition memory, short-term spatial memory, and mice activity, and explain why these outcomes are relevant in the BPV context. Are there other studies that support these findings? The authors discussed that no changes in alternation might be due to the age of the mice, which could already exhibit cognitive deficits. In this line of thought, what is the primary contributor to behavioral impairment? I think that this sentence weakens the conclusion on BPV impairing cognitive function and might even imply that age per se might be the factor that modulates the various physiological outcomes observed here. I recommend clarifying this section in the discussion.

      We thank the reviewer for this comment. Clinical studies have demonstrated that patients with elevated BPV exhibit impairments across multiple cognitive domains, including declines in processing speed[9] and episodic memory[10]. To evaluate memory function, we utilized behavioral tests: the novel object recognition (NOR) task to assess episodic memory[11] and the spontaneous Y-maze to evaluate short-term spatial memory[12].

      Previous research indicates that older C57Bl6 mice (14-month-old) exhibit cognitive deficits compared to younger counterparts (4- and 9-month-old)[13]. To ensure rigorous selection for behavioral testing, we conducted a preliminary NOR assessment, evaluating recognition memory at the one-hour delay but observing failures at the four- and 24-hour delays, indicating age-related deficits. Based on these results, animals failing recognition criteria were excluded from subsequent behavioral assessment. However, because no baseline cognitive testing was conducted for the spontaneous Y-maze, it is possible that some mice with age-related deficits were included in this test, which may have influenced data interpretation.

      Additionally, the absence of differences in the Y-maze performance may suggest that short-term spatial memory remains intact following 25 days of BPV, a point that is now discussed in the revised manuscript.

      (9) Why were only male mice used?

      We appreciate this comment and acknowledge the importance of conducting experiments in both male and female mice. Studies involving female mice are currently ongoing, with telemetry data collection approximately halfway completed and two-photon imaging studies on functional hyperemia also partially completed. However, using middle-aged mice for these experiments has proven challenging due to high mortality rates following telemetry surgeries. As a result, we initially limited our first cohort to male mice.

      (10) In the results for Figure 3: "Ang II evoked significant increases in SBP in both control and BPV groups;...". Also, in the figure legend: "B. Five-minute average HR when the pump is OFF or ON (infusing Ang II) for control and BPV groups...." The authors should clarify this as the methods do not state a control group that receives Ang II.

      Please refer to response to comment 5.

      Reviewer #2 (Public review):

      Summary:

      Blood pressure variability has been identified as an important risk factor for dementia. However, there are no established animal models to study the molecular mechanisms of increased blood pressure variability. In this manuscript, the authors present a novel mouse model of elevated BPV produced by pulsatile infusions of high-dose angiotensin II (3.1ug/hour) in middle-aged male mice. Using elegant methodology, including direct blood pressure measurement by telemetry, programmable infusion pumps, in vivo two-photon microscopy, and neurobehavioral tests, the authors show that this BPV model resulted in a blunted bradycardic response and cognitive deficits, enhanced myogenic response in parenchymal arterioles, and a loss of the pressure-evoked increase in functional hyperemia to whisker stimulation.

      Strengths:

      As the presentation of the first model of increased blood pressure variability, this manuscript establishes a method for assessing molecular mechanisms. The state-of-the-art methodology and robust data analysis provide convincing evidence that increased blood pressure variability impacts brain health.

      Weaknesses:

      One major drawback is that there is no comparison with another pressor agent (such as phenylephrine); therefore, it is not possible to conclude whether the observed effects are a result of increased blood pressure variability or caused by direct actions of Ang II.

      We acknowledge this limitation and have attempted to address the concern by introducing an alternative vasopressor, norepinephrine (NE), Figure 4. A subcutaneous dose of 45 µg/kg/min was titrated to match Ang II-induced transient BP pulses (Systolic BP ~150-180 mmHg), Figure 4A. Similar to Ang II-treated mice, NE-treated mice exhibited no significant changes in average mean arterial pressure (MAP) throughout the 20-day treatment period (Figure 4B). Although there was a trend (P=0.08) towards increased average real variability (ARV) (Figure 4C left), it did not reach statistical significance. The coefficient of variation (CV) (Figure 4C right) was significantly increased by day 3-4 of treatment (P=0.02).
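
      For readers unfamiliar with the two variability indices named above, the sketch below uses their commonly cited definitions: ARV as the mean absolute difference between successive readings, and CV as the standard deviation divided by the mean, expressed as a percentage. The use of the sample standard deviation, the function names, and the example values are assumptions made for illustration; the exact computation used in the study is not stated here.

```python
from statistics import mean, stdev


def average_real_variability(values):
    """Mean absolute difference between consecutive readings."""
    if len(values) < 2:
        raise ValueError("need at least two readings")
    return mean([abs(b - a) for a, b in zip(values, values[1:])])


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean (assumed convention)."""
    return 100 * stdev(values) / mean(values)


# Hypothetical MAP readings in mmHg.
print(average_real_variability([100, 110, 105, 120]))
print(coefficient_of_variation([90, 100, 110]))
```

      Because ARV depends on the order of readings and CV does not, the two indices can diverge, which is consistent with one reaching significance while the other shows only a trend.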

      Notably, unlike the bradycardic response observed during Ang II-induced BP elevations, NE infusions elicited a tachycardic response (Figure 4A), likely due to β-1 adrenergic receptor activation. However, significant mortality was observed within the NE cohort: three of six mice died prematurely during the second week of treatment, and two additional mice required euthanasia on days 18 and 20 due to lethargy, impaired mobility, and tachypnea.

      While we recognize the importance of comparing results across vasopressors, further investigation using additional vasopressors would require a dedicated study, as each agent may induce distinct off-target effects, potentially generating unique animal models. Alternatively, a mechanical approach, such as implanting a tethered intra-aortic balloon[14] connected to a syringe pump, could be explored to modulate blood pressure variability without pharmacological intervention. However, such an approach falls beyond the scope of the present study.

      Ang II is known to have direct actions on cerebrovascular reactivity, neuronal function, and learning and memory. Given that Ang II is increased in only 15% of human hypertensive patients (and an even lower percentage of non-hypertensive), the clinical relevance is diminished. Nonetheless, this is an important study establishing the first mouse model of increased BPV.

      We agree that high Ang II levels are not a predominant cause of hypertension in humans, which is why it is critical that our pulsatile Ang II dosing did not cause overt hypertension (no increase in 24-hour MAP). Ang II was solely a tool to produce controlled, transient increases in BP to yield a significant increase in BPV.

      Regarding BPV specifically, prior studies indicate that primary hypertensive patients with elevated urinary angiotensinogen-to-creatinine ratio exhibit significantly higher mean 24-hour systolic ARV compared to those with lower ratios[15]. However, the fundamental mechanisms driving these harmful increases in BPV remain poorly defined. A central theme across clinical BPV studies is impaired arterial stiffness, which has been proposed to contribute to BPV through reduced arterial compliance and diminished baroreflex sensitivity. Moreover, increased BPV can exert mechanical stress on arterial walls, leading to arterial remodeling and stiffness, ultimately perpetuating a detrimental feed-forward cycle[16].

      In our model, male BPV mice exhibited a minimal yet significant elevation in SBP without corresponding increases in DBP, potentially reflecting isolated systolic hypertension, which is strongly associated with arterial stiffness[17,18]. Our initial goal was to establish controlled rapid fluctuations in BP, and Ang II was selected as the pressor due to its potent vasoconstrictive properties and short half-life[19].

      We appreciate the reviewer’s insightful comment and acknowledge the necessity of exploring alternative mechanisms underlying BPV that are independent of Ang II. It is our long-term goal to investigate these factors in further studies.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) How was the dose of Ang II determined? It seems that this dose (3.1ug/hr) is quite high.

      The Ang II dose was titrated in a preliminary study to one that induced a significant and transient BP response without increasing 24-hour blood pressure (i.e. no hypertension).

      Ang II was delivered subcutaneously at 3.1 µg/hr, an infusion rate comparable to high-dose Ang II administration via mini-osmotic pumps (~1700 ng/kg/min)[20], with one-hour pulses occurring every 3-4 hours. With 6 pulses per day, the total daily dose equates to 18.6 µg/day in a ~30 g mouse.

      For comparison, if the same 18.6 µg/day dose were administered continuously via a mini-osmotic pump (18.6 µg/0.03kg/1440min), the resulting dosage would be approximately 431 ng/kg/min[21,22], aligning with subpressor dose levels. Thus, while the total dose may appear high, it is not delivered in a constant manner but rather intermittently, allowing for controlled, rapid variations in blood pressure.
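
      The dose arithmetic in this response can be checked with a short calculation. The sketch below uses only the figures quoted above (3.1 µg/hr, six one-hour pulses per day, a ~30 g mouse); the variable and function names are illustrative.

```python
RATE_UG_PER_HR = 3.1   # infusion rate while the pump is on
PULSES_PER_DAY = 6     # one-hour pulses per day
PULSE_HOURS = 1        # duration of each pulse
BODY_MASS_KG = 0.03    # ~30 g mouse


def ng_per_kg_per_min(dose_ug, minutes, mass_kg):
    """Convert a dose in micrograms delivered over `minutes` to ng/kg/min."""
    return dose_ug * 1000 / mass_kg / minutes


# Rate during a pulse: 3.1 ug over 60 min, about 1722 ng/kg/min.
during_pulse = ng_per_kg_per_min(RATE_UG_PER_HR, 60, BODY_MASS_KG)

# Total daily dose: 3.1 ug/hr x 1 hr x 6 pulses = 18.6 ug/day.
daily_total_ug = RATE_UG_PER_HR * PULSE_HOURS * PULSES_PER_DAY

# Same daily dose spread evenly over 1440 min, about 431 ng/kg/min.
continuous_equivalent = ng_per_kg_per_min(daily_total_ug, 1440, BODY_MASS_KG)

print(round(during_pulse), round(daily_total_ug, 1), round(continuous_equivalent))
```

      The roughly fourfold gap between the in-pulse rate and the continuous-equivalent rate follows directly from the pump being on for 6 of every 24 hours.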

      (2) Were behavioral studies performed on the same mice that were individually housed? Individual housing causes significant stress in mice that can affect learning and memory tasks (PMC6709207). It's not a huge issue since the control mice would have been housed the same way, but it is something that could be mentioned in the discussion section.

      Behavioral studies were performed on mice that were individually housed following the telemetry surgery. The study was started once BP levels stabilized, as mice required several days to achieve hemodynamic stability post-surgery. Consequently, all mice were individually housed for several days before undergoing behavioral assessment.

      To account for potential cognitive variability, earlier novel object recognition (NOR) tests were conducted to establish cognitive capacity, and mice that did not meet criteria were excluded from further behavioral testing. However, we acknowledge that individual housing induces stress, which can influence learning and memory, and this is a factor we were unable to fully control. Given that both experimental and control groups experienced the same housing conditions, this stress effect should be comparable across cohorts. A discussion on this limitation is now included in the text.

      (3) It looks like one control mouse that was included in both Figures 1 and 2 (control n=12) but was excluded in Table 1 (control n=11), this isn't mentioned in the text - please include the exclusion criteria in the manuscript.

      We apologize for the typo: 12 control animals were consistently utilized across Figures 1-2, Table 1, Supplemental Table 1, Figure 6C, and Supplemental Figure 2B. Since the initial submission, one additional control mouse was completed and included in the telemetry control cohort. Thus, in the updated manuscript, we have corrected the control sample size to 13 mice across these figures, ensuring consistency.

      Additionally, exclusion criteria have now been explicitly included in the manuscript (Line 173-175). Mice were excluded from the study if they died prematurely (prior to treatment onset) or if they exhibited abnormally elevated pressure while receiving saline, likely due to complications from telemetry surgery.

      (4) Please include a statement on why female mice were not included in this study.

      As discussed in our response to Reviewer #1, our initial intention was to include both male and female mice in this study. However, high mortality rates following telemetry surgeries significantly constrained our ability to advance all aspects of the study. As a result, we limited our first cohort to males to establish the basics of the model. A statement is now included in the manuscript, Line 50-53: “Female mice were not included in the present study due to high post-surgery mortality observed in 12-14-month-old mice following complex procedures. To minimize confounding effects of differential survival and to establish foundational data for this model, we restricted the investigation to male mice.”

      Potential sex differences may be complex and warrant separate studies, which are currently ongoing, to comprehensively assess sex as a biological variable.

      (5) On page 14, "experiments from control vs experimental mice were not equally conducted in the same season raising the possibility for a seasonal effect" - does this mean that control experiments were not conducted at the same time as the Ang II infusions in BPV mice? This has huge implications on whether the effects observed are induced by treatment or just batch seasonal effects.

      We fully acknowledge the reviewer’s concern, and our statement aims to provide transparency regarding the study’s limitations. Several challenges contributed to this outcome, including high mortality rates following surgeries (primarily telemetry implantation) and technical issues related to instrumentation, particularly telemetry functionality.

      Differences between BPV and saline mice emerged primarily due to mortality or telemetry failures: some mice did not survive post-surgery, while others remained healthy but had non-functional telemeters. This issue was particularly pronounced in 14-month-old mice, as their fragile vasculature occasionally prevented proper BP readings.

      Each experiment required a minimum of two and a half months per mouse to complete, with a cost (also per mouse) exceeding $1500 USD ($300 pump, $175 mouse, $900 telemeters, per diem, drugs, reagents etc.). Despite our best effort to ensure comparable seasonal/batch data, these logistical and technical constraints prevented perfect synchronization.

      To evaluate whether seasonal differences influenced our results, we incorporated additional telemetry data into the control cohort. Of the seven included control mice, six underwent the same treatment but were allocated to a separate branch of the study, whose endpoints did not require a chronic cranial window. We found no significant differences in 24-hour average MAP during the baseline period between control mice with or without a cranial window, Supplemental Figure 2A. Additionally, we grouped mice into seasonal categories based on Georgia’s climate: “Spring-Summer” (May-September) and “Fall-Winter” (October-April) but observed no BP differences between these periods, Supplemental Figure 2B.
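
      The seasonal grouping described above amounts to a simple mapping from calendar month to category. The sketch below encodes the stated month ranges; the function name is hypothetical, and how the authors assigned a month to each experiment (for example, by start date) is not specified and is left open here.

```python
def season_category(month):
    """Map a calendar month (1-12) to the two seasonal categories quoted above."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    # May through September form "Spring-Summer"; October through April form "Fall-Winter".
    return "Spring-Summer" if 5 <= month <= 9 else "Fall-Winter"


print([season_category(m) for m in (4, 5, 9, 10)])
```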

      Given the absence of seasonal effects on BP and the fact that mice were sourced from two independent suppliers (Jackson Laboratory and NIA), we anticipate that the observed results are driven by treatment rather than seasonal or batch effects.

      (6) Methods, two-photon imaging: did the authors mean "retro-orbital" instead of "intra-orbital" injection of the Texas red dye? Also, is this a Texas red-dextran? If so, what molecular weight?

      Thank you for this comment. The correct terminology is “retro-orbital” rather than “intra-orbital” injection. Additionally, we utilized Texas Red-dextran (70 kDa, 5% [wt/vol] in saline) for the imaging experiments. These details have now been incorporated into the Methods section.

      (1) Shaffer F, Ginsberg JP. An Overview of Heart Rate Variability Metrics and Norms. Front Public Health. 2017;5:258. doi: 10.3389/fpubh.2017.00258

      (2) Pires PW, Jackson WF, Dorrance AM. Regulation of myogenic tone and structure of parenchymal arterioles by hypertension and the mineralocorticoid receptor. Am J Physiol Heart Circ Physiol. 2015;309:H127-136. doi: 10.1152/ajpheart.00168.2015

      (3) Iddings JA, Kim KJ, Zhou Y, Higashimori H, Filosa JA. Enhanced parenchymal arteriole tone and astrocyte signaling protect neurovascular coupling mediated parenchymal arteriole vasodilation in the spontaneously hypertensive rat. J Cereb Blood Flow Metab. 2015;35:1127-1136. doi: 10.1038/jcbfm.2015.31

      (4) Diaz JR, Kim KJ, Brands MW, Filosa JA. Augmented astrocyte microdomain Ca(2+) dynamics and parenchymal arteriole tone in angiotensin II-infused hypertensive mice. Glia. 2019;67:551-565. doi: 10.1002/glia.23564

      (5) Kim KJ, Diaz JR, Presa JL, Muller PR, Brands MW, Khan MB, Hess DC, Althammer F, Stern JE, Filosa JA. Decreased parenchymal arteriolar tone uncouples vessel-to-neuronal communication in a mouse model of vascular cognitive impairment. GeroScience. 2021. doi: 10.1007/s11357-020-00305-x

      (6) Chan SL, Nelson MT, Cipolla MJ. Transient receptor potential vanilloid-4 channels are involved in diminished myogenic tone in brain parenchymal arterioles in response to chronic hypoperfusion in mice. Acta Physiol (Oxf). 2019;225:e13181. doi: 10.1111/apha.13181

      (7) Tarantini S, Hertelendy P, Tucsek Z, Valcarcel-Ares MN, Smith N, Menyhart A, Farkas E, Hodges EL, Towner R, Deak F, et al. Pharmacologically-induced neurovascular uncoupling is associated with cognitive impairment in mice. J Cereb Blood Flow Metab. 2015;35:1871-1881. doi: 10.1038/jcbfm.2015.162

      (8) Ma J, Ayata C, Huang PL, Fishman MC, Moskowitz MA. Regional cerebral blood flow response to vibrissal stimulation in mice lacking type I NOS gene expression. Am J Physiol. 1996;270:H1085-1090. doi: 10.1152/ajpheart.1996.270.3.H1085

      (9) Sible IJ, Nation DA. Blood Pressure Variability and Cognitive Decline: A Post Hoc Analysis of the SPRINT MIND Trial. Am J Hypertens. 2023;36:168-175. doi: 10.1093/ajh/hpac128

      (10) Epstein NU, Lane KA, Farlow MR, Risacher SL, Saykin AJ, Gao S. Cognitive dysfunction and greater visit-to-visit systolic blood pressure variability. Journal of the American Geriatrics Society. 2013;61:2168-2173. doi: 10.1111/jgs.12542

      (11) Antunes M, Biala G. The novel object recognition memory: neurobiology, test procedure, and its modifications. Cognitive processing. 2012;13:93-110. doi: 10.1007/s10339-011-0430-z

      (12) Kraeuter AK, Guest PC, Sarnyai Z. The Y-Maze for Assessment of Spatial Working and Reference Memory in Mice. Methods Mol Biol. 2019;1916:105-111. doi: 10.1007/978-1-4939-8994-2_10

      (13) Singhal G, Morgan J, Jawahar MC, Corrigan F, Jaehne EJ, Toben C, Breen J, Pederson SM, Manavis J, Hannan AJ, et al. Effects of aging on the motor, cognitive and affective behaviors, neuroimmune responses and hippocampal gene expression. Behav Brain Res. 2020;383:112501. doi: 10.1016/j.bbr.2020.112501

      (14) Tediashvili G, Wang D, Reichenspurner H, Deuse T, Schrepfer S. Balloon-based Injury to Induce Myointimal Hyperplasia in the Mouse Abdominal Aorta. J Vis Exp. 2018. doi: 10.3791/56477

      (15) Ozkayar N, Dede F, Akyel F, Yildirim T, Ates I, Turhan T, Altun B. Relationship between blood pressure variability and renal activity of the renin-angiotensin system. J Hum Hypertens. 2016;30:297-302. doi: 10.1038/jhh.2015.71

      (16) Kajikawa M, Higashi Y. Blood pressure variability and arterial stiffness: the chicken or the egg? Hypertens Res. 2024;47:1223-1224. doi: 10.1038/s41440-024-01589-8

      (17) Laurent S, Boutouyrie P. Arterial Stiffness and Hypertension in the Elderly. Front Cardiovasc Med. 2020;7:544302. doi: 10.3389/fcvm.2020.544302

      (18) Wallace SM, Yasmin, McEniery CM, Maki-Petaja KM, Booth AD, Cockcroft JR, Wilkinson IB. Isolated systolic hypertension is characterized by increased aortic stiffness and endothelial dysfunction. Hypertension. 2007;50:228-233. doi: 10.1161/HYPERTENSIONAHA.107.089391

      (19) Al-Merani SA, Brooks DP, Chapman BJ, Munday KA. The half-lives of angiotensin II, angiotensin II-amide, angiotensin III, Sar1-Ala8-angiotensin II and renin in the circulatory system of the rat. J Physiol. 1978;278:471-490. doi: 10.1113/jphysiol.1978.sp012318

      (20) Zimmerman MC, Lazartigues E, Sharma RV, Davisson RL. Hypertension caused by angiotensin II infusion involves increased superoxide production in the central nervous system. Circ Res. 2004;95:210-216. doi: 10.1161/01.RES.0000135483.12297.e4

      (21) Gonzalez-Villalobos RA, Seth DM, Satou R, Horton H, Ohashi N, Miyata K, Katsurada A, Tran DV, Kobori H, Navar LG. Intrarenal angiotensin II and angiotensinogen augmentation in chronic angiotensin II-infused mice. Am J Physiol Renal Physiol. 2008;295:F772-779. doi: 10.1152/ajprenal.00019.2008

      (22) Nakagawa P, Nair AR, Agbor LN, Gomez J, Wu J, Zhang SY, Lu KT, Morgan DA, Rahmouni K, Grobe JL, et al. Increased Susceptibility of Mice Lacking Renin-b to Angiotensin II-Induced Organ Damage. Hypertension. 2020;76:468-477. doi: 10.1161/HYPERTENSIONAHA.120.14972

    1. eLife Assessment

      This study offers a valuable contribution to our understanding of the role of layer 6b cortical neurons in sleep-wake regulation, providing new insight into how this understudied neural population may regulate cortical arousal via orexin signaling. The evidence supporting these findings is solid, although somewhat constrained by limitations in the specificity of the genetic targeting strategy. Nonetheless, the work introduces new avenues for uncovering how the classical wake-promoting peptide, orexin, exerts its effects on the cortex.

    2. Reviewer #1 (Public review):

      Summary:

      Meijer et al. sought to investigate the role of cortical layer 6b (L6b) neurons in modulating sleep-wake states and cortical oscillations under baseline and sleep-deprived conditions and in response to orexin A and B. Using chronic EEG recordings in mice with silencing of Drd1a+ neurons (via constitutive Cre-dependent knockout of SNAP25), the authors report that while overall baseline sleep-wake architecture and the response to sleep deprivation were minimally changed or unchanged, "L6b silencing" leads to a slowing of theta activity during wakefulness and REM sleep, and a reduction in EEG power during NREM sleep. Additionally, orexin B-induced increases in theta activity were attenuated in L6b silenced mice, which the authors state suggests a modulatory role for L6b in orexin-mediated arousal regulation. The manuscript is generally well written with clarity and transparency. However, a major concern is the lack of specificity in the genetic manipulation, which targets Drd1a+ neurons not exclusive to L6b, undermining the attribution of observed effects solely to L6b. Verification of neuronal silencing is also unclear, and statistical inconsistencies between the main text and figures/tables make it difficult to effectively evaluate the text and stated outcomes.

      Strengths:

      (1) The text is well written.

      (2) The authors are transparent about methodological details.

      (3) The stated sleep, circadian, and orexin infusion experiments appear to be well designed, executed, and analyzed (with the exceptions of some statistical analyses detailed below).

      Weaknesses:

      (1) All outcomes are attributed specifically to L6b neurons, but the genetic manipulation is not specific to L6b neurons. The authors acknowledge this as a limitation, but in my view, this global manipulation is more than a limitation - it affects the overall interpretations of the data. The Hoerder-Suabedissen et al., 2018 paper shows sparse, but also dense, expression of Drd1a+ neurons in brain regions outside of the L6b. Given this issue, the results are largely overstated throughout the paper.

      (2) It is not clear to me that the "silencing" of Drd1a+ neurons was verified.

      (3) There were various discrepancies (and potentially misattributions) between the stated significant differences in Supplementary Table T1 data and Figure 3a & S2 spectral plots. This issue makes it difficult to effectively evaluate the main text and stated outcomes.

      Related, the authors stated that post hoc comparisons of EEG spectral frequency bins were not corrected for multiple testing. Instead, significance was only denoted if changes in at least two consecutive frequency bins were significant. However, there are multiple plots in which a single significance marker is placed over an isolated bin (i.e., 4c, 6, S5, S6). Unless each marker is equivalent to 2 consecutive frequency bins, these markers should be removed from the plots. Otherwise, please define the frequency and size of these markers in the main text.

      (4) A rainbow color scale, as in Figure 3, we've now learned, can be misleading and difficult to interpret. The viridis color scale or a different diverging color scale are good alternatives.

      (5) How much time elapsed between vehicle/orexin A & B infusions?

      (6) For Figure 6, there are statistical discrepancies between the main text and the plots (pg. 10):

      a) The text claims post hoc differences for relative ORXA frontal EEG, but there are no significance markers on the plot.

      b) The text states that there were no post hoc differences for the relative ORXA occipital EEG, but significance markers are on the plot.

      c) The main test for the relative ORXB frontal EEG was not significant, but there are post hoc significance markers on the plot.

      d) For relative ORXB occipital EEG, there are significant markers on the plot outside of the stated range in the text.

      (7) Some important details are only available in figure captions, making it difficult to understand the main text. For example, when describing Figure 3c in the main text on page 7, it is not clear what type of transitions are being discussed without reading the figure caption. Likewise, a "decrease," "shift," and "change" are mentioned, but relative to what? Similar comment for the EEG theta activity description on pages 7 - 8. Please add relevant details to the main text.

      (8) Statistical comparisons for data in Figure 3e, post hoc analyses for data in Figure S7a-b REM data, and post hoc analyses for Figure S7c (not b) occipital EEG should be included to support differences claims. Please denote these differences on the respective plots.

      (9) In the subsection titled "Layer 6b mediates effects of orexin on vigilance states (pg. 8)," there does not seem to be any stated differences between control and L6b silenced mice. A more accurate subtitle is needed.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Meijer and colleagues investigated the effects of inactivation (conditional silencing) of cortical layer 6b neurons on sleep-wake states and EEG spectral power under the following three conditions: during natural sleep-wake states, after sleep deprivation, or after intracerebroventricular administration of orexin A and B. The authors report that silencing of L6b neurons did not have a significant effect on the total time spent in sleep-wake states, duration, or number of state epochs, or the response to sleep deprivation. However, silencing of L6b neurons did slow down theta-frequency (6-9 Hz) during wake and REM sleep, and reduced the total EEG power during NREM sleep. Infusion of orexin A in the mice in which cortical layer 6b neurons were inactivated produced an increase in wakefulness. A similar effect was observed after infusion of orexin A in the mice in which these neurons were not silenced, but the effect (i.e., increase in wakefulness) was of a smaller magnitude. Silencing of cortical layer 6b neurons attenuated the effect of orexin B in increasing theta activity, as was observed in the control mice. The authors conclude that the cortical neurons in layer 6b play an essential role in state-dependent dynamics of brain activity, vigilance state control, and sleep regulation.

      Strengths:

      (1) A focus on cortical layer 6b neurons, which are an understudied neuronal population, especially in the context of brain and behavioral state transitions.

      (2) The authors used a well-established mouse model to study the effect of inactivation of cortical layer 6b neurons.

      Weaknesses:

      (1) Although the authors used a highly selective approach to silence layer 6b neurons, the observed changes in EEG oscillations cannot be solely attributed to layer 6b neurons because of the ICV route for orexin administration.

      (2) The rationale for using only male rats is not provided.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      (1) All outcomes are attributed specifically to L6b neurons, but the genetic manipulation is not specific to L6b neurons. The authors acknowledge this as a limitation, but in my view, this global manipulation is more than a limitation - it affects the overall interpretations of the data. The Hoerder-Suabedissen et al., 2018 paper shows sparse, but also dense, expression of Drd1a+ neurons in brain regions outside of the L6b. Given this issue, the results are largely overstated throughout the paper.

      We appreciate the reviewer’s careful reading and concern that some of our statements may have overstated the implications of our data. The Drd1a Cre mouse model used (FK164) has a relatively selective expression of Drd1a Cre in cortex, especially in layer 6b, but indeed some expression is seen in layer 6a and subcortically. We will nuance our claims throughout the paper to ensure that the conclusions are supported by our findings, and further discuss the impact of this limitation on the overall interpretation of our results. Specifically, we will discuss the potential contribution of relevant subcortical areas and layer 6a in the effects we observed.

      (2) It is not clear to me that the "silencing" of Drd1a+ neurons was verified.

      In our previous publications, we showed confirmation of the loss of regulated synaptic vesicle release from the Cre positive neuronal population (Marques-Smith et al., 2016; Hoerder-Suabedissen et al., 2018; Messore et al., 2024), which validates our approach to “silence” cortical neurons. We will discuss this further in the revised manuscript.

      (3) There were various discrepancies (and potentially misattributions) between the stated significant differences in Supplementary Table T1 data and Figure 3a & S2 spectral plots. This issue makes it difficult to effectively evaluate the main text and stated outcomes.

      We thank the reviewer for spotting the inconsistencies in how the statistical comparisons were presented: indeed, in the text we described two-way ANOVAs with posthoc tests but in the figures significance markers were positioned based on multiple t-tests. We have revised Supplementary Table T1, Figure 3a and S2 to ensure that all statistics are presented consistently throughout the manuscript, i.e. with two-way ANOVAs and accompanying posthoc tests.

      Related, the authors stated that post hoc comparisons of EEG spectral frequency bins were not corrected for multiple testing. Instead, significance was only denoted if changes in at least two consecutive frequency bins were significant. However, there are multiple plots in which a single significance marker is placed over an isolated bin (i.e., 4c, 6, S5, S6). Unless each marker is equivalent to 2 consecutive frequency bins, these markers should be removed from the plots. Otherwise, please define the frequency and size of these markers in the main text.

      In line with the previous comment, we have adjusted markers to reflect the results from posthoc tests after two-way ANOVAs in Figure 6 and Supplementary Figures S5 and S6.

      We thank the reviewer for pointing out that, in our comparisons of EEG spectra, single isolated frequency bins whose p-value reached 0.05 were in some cases shown as significantly different. This could indeed have occurred by chance given that, in line with previous literature, we did not correct for multiple comparisons. In the revised manuscript we will use an unbiased approach by plotting the actual p-values for all bins and moderate our conclusions accordingly, giving readers the opportunity to evaluate the magnitude and extent of the differences directly rather than relying on an arbitrary threshold for significance.

      (4) A rainbow color scale, as in Figure 3, we've now learned, can be misleading and difficult to interpret. The viridis color scale or a different diverging color scale are good alternatives.

      Thank you for pointing this out, we have adjusted the colour scale.

      (5) How much time elapsed between vehicle/orexin A & B infusions?

      There were 2-4 non-infusion days between infusions. We will add this information to the Methods when revising the manuscript.

      (6) For Figure 6, there are statistical discrepancies between the main text and the plots (pg. 10):

      a) The text claims post hoc differences for relative ORXA frontal EEG, but there are no significance markers on the plot.

      b) The text states that there were no post hoc differences for the relative ORXA occipital EEG, but significance markers are on the plot.

      c) The main test for the relative ORXB frontal EEG was not significant, but there are post hoc significance markers on the plot.

      d) For relative ORXB occipital EEG, there are significant markers on the plot outside of the stated range in the text.

      Thank you for your careful observations. These issues reflect the same inconsistency as raised above, where the text describes two-way ANOVAs and the figures refer to results obtained with multiple t-tests. We shall adjust the markers in the figures so that they are shown only when the ANOVA is significant, and show the results of posthoc tests after ANOVAs instead of the results of multiple t-tests.

      (7) Some important details are only available in figure captions, making it difficult to understand the main text. For example, when describing Figure 3c in the main text on page 7, it is not clear what type of transitions are being discussed without reading the figure caption. Likewise, a "decrease," "shift," and "change" are mentioned, but relative to what? Similar comment for the EEG theta activity description on pages 7 - 8. Please add relevant details to the main text.

      We will adjust the wording in the main text to reflect more precisely which comparisons are shown in the figures.

      (8) Statistical comparisons for data in Figure 3e, post hoc analyses for data in Figure S7a-b REM data, and post hoc analyses for Figure S7c (not b) occipital EEG should be included to support differences claims. Please denote these differences on the respective plots.

      We have added the statistical comparisons for Figure 3e to the results section.

      We have added the statistical comparisons for Figure S7a to the results section.

      We have added the statistical comparison for Figure S7b to the results section.

      In Figure S7c, there was an overall genotype difference, but there was not a time x genotype interaction, so we have not performed posthoc tests and did not plot posthoc significance markers for this figure. We have adjusted the wording in the results section to make this clearer.

      We have adjusted the reference to the figure S7c which was incorrect, thank you for your careful attention.

      (9) In the subsection titled "Layer 6b mediates effects of orexin on vigilance states (pg. 8)," there does not seem to be any stated differences between control and L6b silenced mice. A more accurate subtitle is needed.

      We shall change the subtitle to: “The effects of orexin on vigilance states in L6b silenced mice”. The main finding described in this section is that the increase in EEG theta frequency after ORXB infusion is attenuated in L6b silenced mice, so a statement summarizing this finding could be an alternative title. However, then it would not accurately reflect other, less conspicuous, yet potentially important findings described in this section (during NREM sleep, only in L6b silenced animals there is an increase in power in the lower frequency bins in the frontal derivation; in the occipital derivation, levels of relative SWA during NREM sleep after ORXA infusion were lower in L6b silenced than in control animals).

      Reviewer #2 (Public review):

      Weaknesses:

      (1) Although the authors used a highly selective approach to silence layer 6b neurons, the observed changes in EEG oscillations cannot be solely attributed to layer 6b neurons because of the ICV route for orexin administration.

      We completely agree, and did not want to imply that orexin administered through the ICV route reaches cortical Drd1a Cre expressing neurons only. We will re-word the corresponding sentences accordingly throughout the manuscript.

      (2) The rationale for using only male rats is not provided.

      We agree that this is an important limitation and will acknowledge and discuss it further in the revised manuscript. Unfortunately, our experimental protocol precluded accurate monitoring of the oestrous cycle, which is well known to influence sleep-wake architecture, brain oscillations, and orexin signalling and receptor abundance. We therefore decided to use male mice only for the current study, but plan to use both sexes in our follow-up work.

    1. eLife Assessment

      In this valuable study, the authors use a cutting-edge method to perform voltage imaging of CA1 pyramidal cells in head-fixed mice running on a track while local field potentials (LFPs) were recorded in the contralateral hemisphere. The authors provide solid evidence of synchronous ensembles of CA1 pyramidal neurons that are associated with contralaterally recorded theta rhythms but not with contralaterally recorded sharp wave-ripples during exploration of a novel environment. The paper will be of interest to scientists who are interested in hippocampal neuronal coding of novel environments, particularly those with experimental questions that can benefit from this cutting-edge imaging technique.

    2. Joint Public Review:

      Summary:

      There has been extensive electrophysiological research investigating the relationship between local field potential patterns and individual cell spike patterns in the hippocampus. In this study, the authors used innovative imaging techniques to examine spike synchrony of hippocampal cells during locomotion and immobility states. The authors report that hippocampal place cells exhibit prominent synchronous spikes that co-occur with theta oscillations during exploration of novel environments.

      Strengths:

      The single cell voltage imaging used in this study is a highly novel method that may allow recordings that were not previously possible using traditional methods.

      Weaknesses:

      Local field potential recordings were obtained from the contralateral hemisphere for technical reasons, which limits some of the study's claims.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Joint Public Review:

      Summary:

      For many years, there has been extensive electrophysiological research investigating the relationship between local field potential patterns and individual cell spike patterns in the hippocampus. In this study, using innovative imaging techniques, they examined spike synchrony of hippocampal cells during locomotion and immobility states. The authors demonstrated that hippocampal place cells exhibit prominent synchronous spikes locked to theta oscillations.

      Strengths:

      The single cell voltage imaging used in this study is a highly novel method that may allow recordings that were not previously possible using existing methods.

      We thank the reviewer for recognizing the strengths of our study.

      Weaknesses:

      The strength of evidence remains incomplete because of the main claim that synchronous events are not associated with ripples. As was mentioned in previous rounds of review, ripples emerge locally and independently in the two hemispheres. Thus, obtaining ripple recordings from the contralateral hemisphere does not provide solid evidence for this claim. The papers the authors are citing to make the claim that "Additionally, we implanted electrodes in the contralateral CA1 region to monitor theta and ripple oscillations, which are known to co-occur across hemispheres (29-31)" do not support this claim. For example, reference 29 contains the following statement: "These findings suggest that ripples emerge locally and independently in the two hemispheres".

      In our previous revisions, we took care to limit our claim to what our data directly supported: that synchronous ensembles of CA1 neurons were not associated with ripple oscillations recorded in the contralateral hippocampus. To address reviewer concerns, we changed the Title, modified the Abstract, adjusted relevant text in the Results, and explicitly acknowledged the methodological limitations in the Discussion. 

      In this round, we further revised the manuscript to directly address the editor’s and reviewer’s remaining concerns: 

      (1) We replaced the word “surprisingly” with a more neutral “Moreover” to avoid implying that the observed dissociation was unexpected given the use of contralateral recordings.

      Introduction (line 67-69):

      “Moreover, these synchronous ensembles occurred outside of contralateral ripples (c-ripples) …”

      (2) We removed the clause stating that ripples “co-occur across hemispheres”, along with the associated citation to Buzsaki et al. (2003), to avoid potential misinterpretation. The sentence now simply states that we recorded ripple and theta oscillations in the contralateral CA1.

      Introduction (line 63-64):

      “Additionally, we implanted electrodes in the contralateral CA1 region to monitor theta and ripple oscillations.” (co-occurrence claim removed)

      (3) We carefully replaced all mentions of “ripples” in the manuscript with “c-ripples” (i.e., contralateral ripples) to ensure that the scope of our findings is clearly defined and cannot be misinterpreted.

      (4) We strengthened the acknowledgment of the methodological limitations in the Discussion. 

      Discussion (line 528-533): 

      “While contralateral LFP recordings can capture large-scale hippocampal theta and ripple oscillations, they do not fully reflect ipsilateral-specific dynamics, such as variation in theta phase alignment or locally generated ripple events (Buzsaki et al., 2003; Szabo et al., 2022; Huang et al., 2024). Given that ripple oscillations can emerge locally and independently in each hemisphere, interpretations based on contralateral recordings must be made with caution. Further studies incorporating simultaneous ipsilateral field potential recordings will be essential to more precisely understand local-global network interactions.”

      These revisions ensure that our manuscript now presents a consistent and appropriately limited interpretation across all sections. We hope these clarifications address all remaining concerns and accurately reflect the scope of our findings.

    1. eLife Assessment

      This paper reports a valuable discovery that specific-mode electroacupuncture (EA) transiently opens the blood-brain barrier (BBB) in rats. The evidence is solid but lacks functional validation of BBB permeability changes. The work will be of interest to medical scientists working in the field of electroacupuncture and drug delivery.

    2. Reviewer #1 (Public review):

      Summary:

      This paper maps the transcriptional landscape and identifies EA-responsive cell types (endothelial, microglia). The data suggest that EA modulates the BBB via immune pathways and cell communication. However, claims of "BBB opening" are not directly proven (no permeability data).

      Strengths:

      First scRNA-seq atlas of EA effects on BBB, revealing 23 cell clusters and 8 cell types. High cell throughput (98,338 cells), doublet removal, and robust clustering (Seurat, SingleR). Comprehensive bioinformatics (GO/KEGG, CellPhoneDB for ligand-receptor interactions). Raw data were deposited in GEO (GSE272895) and can be accessed.

      Weaknesses:

      (1) No in vivo/in vitro assays confirm BBB permeability changes (e.g., Evans blue leakage, TEER).

      (2) Only male rats were used, ignoring sex-specific BBB differences.

      (3) Pericytes and neurons, critical for the BBB, were not captured, likely due to dissociation artifacts.

      (4) Protein-level validation (Western blot, IHC) absent for key genes (e.g., LY6E, HSP90).

      (5) Fixed stimulation protocol (2/100 Hz, 40 min); no dose-response or temporal analysis.

    3. Reviewer #2 (Public review):

      Summary:

      This study uses single-cell RNA sequencing to explore how electroacupuncture (EA) stimulation alters the brain's cellular and molecular landscape after blood-brain barrier (BBB) opening. The authors aim to identify changes in gene expression and signaling pathways across brain cell types in response to EA stimulation using single-cell RNA sequencing. This direction holds promise for understanding the consequences of noninvasive methods of BBB opening for therapeutic drug delivery across the BBB.

      Strengths:

      (1) The study addresses an emerging and potentially important application of noninvasive stimulation methods to manipulate BBB permeability.

      (2) The dataset provides broad transcriptional profiling across multiple brain cell types using single-cell resolution, which could serve as a valuable community resource.

      (3) Analyses of receptor-ligand signaling and cell-cell communication are included and have the potential to offer mechanistic insight into BBB regulation.

      Weaknesses:

      (1) The work falls short in its current form. The experimental design lacks a clear justification, and readers are not provided with sufficient background information on the extent, timing, or regional specificity of BBB opening in this EA model. These details, established in prior work, are critical to understanding the rationale behind the current transcriptomic analyses.

      (2) Further, the results are often presented with minimal context or interpretation. There is no model of intercellular or molecular coordination to explain the BBB-opening process, despite the stated goal of identifying such mechanisms. The statement that EA induces a "unique frontal cortex-specific transcriptome signature" is not supported, as no data from other brain regions are presented. Biological interpretation is at times unclear or inaccurate - for instance, attributing astrocyte migration effects to endothelial cell clusters or suggesting microglial tight junction changes without connecting them meaningfully to endothelial function.

      (3) The study does include analyses of receptor-ligand signaling and cell-cell communication, which could be among its most biologically rich outputs. However, these are relegated to supplementary material and not shown in the leading figures. This choice limits the utility of the manuscript as a hypothesis-generating resource.

      (4) Overall, while the dataset may be of interest to BBB researchers and those developing technologies for drug delivery across the BBB, the manuscript in its current form does not yet fulfill its interpretive goals. A more integrated and biologically grounded analysis would be beneficial.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper maps the transcriptional landscape and identifies EA-responsive cell types (endothelial, microglia). The data suggest that EA modulates the BBB via immune pathways and cell communication. However, claims of "BBB opening" are not directly proven (no permeability data).

      (1) No in vivo/in vitro assays confirm BBB permeability changes (e.g., Evans blue leakage, TEER).  

      (2) Only male rats were used, ignoring sex-specific BBB differences.

      (3) Pericytes and neurons, critical for the BBB, were not captured, likely due to dissociation artifacts.

      (4) Protein-level validation (Western blot, IHC) absent for key genes (e.g., LY6E, HSP90).

      (5) Fixed stimulation protocol (2/100 Hz, 40 min); no dose-response or temporal analysis.

      (1) We sincerely apologize for the oversight regarding the description of changes in blood-brain barrier permeability. Our team conducted a series of preliminary studies that verified this aspect, but we did not describe them in sufficient detail in the Introduction. We will address this in the revised manuscript.

      (2) We are very grateful to the reviewers for pointing out the important and meaningful issue of "sex-specific BBB differences." We will make this a focal point of our future research.

      (3) We acknowledge the importance of pericytes and neurons in the function of the blood-brain barrier. However, neurons are absent because our sample processing method involves dissociation. During the dissociation procedure, neuronal axons, which are relatively long, are filtered out during the frequent cell suspension steps and cannot enter the downstream microfluidic system for analysis, so they are not present in our data. Since this experiment is primarily focused on non-neuronal cells, we did not choose to use nucleus extraction for sample processing. As for pericytes, we believe they were not captured because their proportion in our samples is extremely low. Further research may require single-nucleus transcriptomics or the separate isolation of these two cell types. In our current mechanistic studies, we are also fully considering the important roles these two cell types play in BBB function.

      (4) For verification at the protein level, we have recently conducted some experiments and will include these results in the revised version.

      (5) Lastly, regarding our electroacupuncture intervention model, we actually conducted a series of parameter optimization experiments during the preliminary exploration phase. This part is indeed lacking in our current introduction, and we will add it to the research background and introduction.

      Reviewer #2 (Public review):

      Summary:

      This study uses single-cell RNA sequencing to explore how electroacupuncture (EA) stimulation alters the brain's cellular and molecular landscape after blood-brain barrier (BBB) opening. The authors aim to identify changes in gene expression and signaling pathways across brain cell types in response to EA stimulation using single-cell RNA sequencing. This direction holds promise for understanding the consequences of noninvasive methods of BBB opening for therapeutic drug delivery across the BBB.

      (1) The work falls short in its current form. The experimental design lacks a clear justification, and readers are not provided with sufficient background information on the extent, timing, or regional specificity of BBB opening in this EA model. These details, established in prior work, are critical to understanding the rationale behind the current transcriptomic analyses.

      (2) Further, the results are often presented with minimal context or interpretation. There is no model of intercellular or molecular coordination to explain the BBB-opening process, despite the stated goal of identifying such mechanisms. The statement that EA induces a "unique frontal cortex-specific transcriptome signature" is not supported, as no data from other brain regions are presented. Biological interpretation is at times unclear or inaccurate - for instance, attributing astrocyte migration effects to endothelial cell clusters or suggesting microglial tight junction changes without connecting them meaningfully to endothelial function.

      (3) The study does include analyses of receptor-ligand signaling and cell-cell communication, which could be among its most biologically rich outputs. However, these are relegated to supplementary material and not shown in the leading figures. This choice limits the utility of the manuscript as a hypothesis-generating resource.

      (4) Overall, while the dataset may be of interest to BBB researchers and those developing technologies for drug delivery across the BBB, the manuscript in its current form does not yet fulfill its interpretive goals. A more integrated and biologically grounded analysis would be beneficial.

      (1) It was indeed our mistake that we did not pay attention to the importance of research background factors such as the degree, timing, or regional specificity of BBB opening for the rationale and purpose of this experimental design. In our revision, we will thoroughly elaborate on the relevant previous studies.

      (2) Our current study is actually based on previous findings that electroacupuncture can open the BBB, with a more pronounced effect observed in the frontal lobe (this aspect should be further described in the research background). Building on this foundation, our aim is to delineate the potential biological mechanisms involved. Therefore, we selected frontal lobe tissue as our primary choice for sequencing and have not yet investigated differences across other brain regions, although this may become a focus of future research. Additionally, we recognize that the mechanism underlying BBB opening is complex, and at present, we cannot determine whether it is driven by a single direct factor or by coordinated actions between cells or molecules. As such, our results are presented only briefly for now, and we will carefully consider whether to supplement our findings by incorporating insights from other studies.

      (3) Thank you very much for bringing this to our attention. We will include the key results of the receptor-ligand signaling and cell-cell communication analysis in the main manuscript.

      (4) Indeed, our current dataset and analysis tend to present objective data results. We are also conducting a series of validations that may be related to the biology of the blood-brain barrier, and we look forward to sharing and discussing any future research findings with you and everyone.

    1. eLife Assessment

      This study presents valuable computational findings on the neural basis of learning new motor memories without interfering with previously learned behaviours using recurrent neural networks. The evidence supporting the claims of the authors is solid, but it would benefit from stronger and clearer links with experimental findings. This work will be of interest to computational and experimental neuroscientists working in motor learning.

    2. Reviewer #1 (Public review):

      Summary:

      This work investigates the neural basis of continual motor learning, specifically how brains might accommodate new motor memories without interfering with previously learned behaviours. Mainly drawing inspiration from recent experimental studies in monkeys (Losey et al. and Sun, O'Shea et al.), the authors use recurrent neural networks (RNNs) to model sequential learning and examine the emergence and properties of two proposed neural signatures of motor memory: the "uniform shift" observed in preparatory activity and the "memory trace" observed in execution activity.

      Strengths:

      The work's main contribution is demonstrating that both uniform shifts and memory traces emerge in RNN models trained on a sequential BCI task, without requiring explicit additional mechanisms. The work explores the relationship between these signatures and behavioural savings, finding that the memory trace correlates with immediate retention savings in networks without context, while the uniform shift does not. The study also investigates how properties of the new task perturbation (within- vs. outside-manifold) and the presence of explicit context cues affect these signatures and their relationship to savings, generally finding that context signals and outside-manifold perturbations reduce savings by decreasing the inherent overlap in the neural strategies used to solve the task.

      Weaknesses:

      A primary weakness is the lack of clear definitions of the uniform shift and the memory trace, which are quite different metrics. Another primary weakness is that the task modelled is well-matched to the Losey et al. BCI paradigm, but not well-matched to the Sun, O'Shea et al.'s curl field paradigm, which is likely impacting some of the results, primarily the lack of a relationship between the uniform shift and motor memories. While there are improvements that could be made in this work, we think it is a demonstration that modeling learning in neural activity using neural network models continues to be a valuable tool, moving the field forward.

    3. Reviewer #2 (Public review):

      Summary:

      Chang et al. develop an RNN model of a BCI sequential learning task to examine the emergence of motor memory in the network. They use this system to quantify signatures of memory in continual learning, comparing their model with experimental observations from monkeys in prior publications. They show that the RNN model has signatures of shifts associated with sequential learning without any non-standard learning rules. This convincing study contributes to the knowledge of how motor memories are formed and shaped so that they are flexible in acquiring multiple behaviors.

      Strengths:

      This paper describes a well-designed numerical experiment that comes to a clear interpretation of a set of neural BCI experiments. The learning signatures the authors describe are interesting and well laid out, and the paper is well written. I find it insightful that the neural signature of motor learning emerges in a trained network without special learning rules.

      Weaknesses:

      The paper could be stronger if it made a stronger interpretation of how memory traces and uniform shifts are related. These two observations are taken from the BCI sequential learning literature and introduced by two different prior experimental papers on two different tasks, so it seems like there is an opportunity here to use the RNN model to unite these concepts, or define another metric for signatures of learning from a more normative approach.

    4. Reviewer #3 (Public review):

      Summary:

      The authors build and analyze recurrent neural network (RNN) models of brain-computer interface (BCI) multi-task learning, developing a valuable theoretical understanding of learning-related neural population phenomena ("memory traces" and "uniform shifts") that have been reported in recent experimental studies of BCI and motor learning. The authors find that both phenomena emerge in their RNN models, and both correlate in some manner to learning-related behavioral phenomena ("savings" and "forgetting"). The authors also reveal that RNN training details, in particular, incorporating a task-indicating contextual input, can impact these population-level signatures of learning in RNN activity and their relation to those behavioral phenomena.

      Strengths:

      The text is well written, and the figures are clearly composed to convey the core concepts and findings. The RNN studies are elegant in their ability to recapitulate the memory trace and uniform shift phenomena, and further allow evaluations of novel scenarios that were not tested in the original corpus of the modeled animal experiments. The authors assess the sensitivity of their results to multiple approaches to RNN training, including training connectivity within a model of motor cortex, training only an upstream model that provides inputs to the motor cortex model, and providing task-indicating contextual inputs.

      Weaknesses:

      (1) It is unclear to what extent these RNN models operate in regimes relevant to biological neural networks (e.g., motor cortex), even at the neural-population level of abstraction studied here. Can the authors speak to how sensitive their results are to details that might speak to these operating regimes (e.g., signal-to-noise ratios or dimensionality of the RNN activities)?

      (2) The work could be further strengthened by analyses demonstrating a more direct link between the neural population phenomena (memory trace and uniform shift) and the behavioral phenomena (savings, forgetting, etc). While in animal experiments, it can be exceedingly difficult to demonstrate links beyond correlative effects, the promise of a model is the relative tractability of implementing manipulations that might establish something closer to a causal link between phenomena. Is it the case that the memory trace is a task-dependent, mean-preserving rotation of the across-target task-relevant activity space? And that the uniform shift is a translation (non-mean-preserving) of that space? If so, could the authors design regularization schemes that specifically target each of these effects, enabling a more direct test of the functional role the effects play in driving behavioral phenomena?

      Minor Comments:

      The current study is based on BCI learning of center-out tasks, analogous to the Losey et al. task that initially reported the memory trace phenomena. However, a rather different behavioral task - involving arm movements through curl force fields - was employed by the Sun, O'Shea, et al. study that originally reported the uniform shift phenomena. How should readers interpret the current study's findings related to the uniform shift? To what extent might the behavioral implications of the uniform shift depend on the demands of the task, e.g., the biomechanics, day-to-day experiencing of different curl-field perturbations, etc.?

    5. Author response:

      We thank the reviewers for their thoughtful comments, and we plan to implement many of their suggestions to improve the paper. We agree that the paper can benefit from clearer links between the two neural signatures (memory traces and uniform shifts) themselves, and between the neural signatures and behavioral phenomena. We will address these limitations in multiple ways. First, as the reviewers noted, RNN models have the potential to probe these relationships, so we plan to perform further analyses and modeling experiments to uncover any causal relationships. Second, we will also establish clearer definitions of the neural signatures and explore how these signatures can be unified using our models. Finally, we will compare the experimental paradigms between Losey et al and Sun, O’Shea et al, and discuss how differences between the paradigms may have impacted our observations, particularly in the context of other experimental and modeling papers.

    1. eLife Assessment

      This important study introduces the Life Identification Number (LIN) coding system as a powerful and versatile approach for classifying Neisseria gonorrhoeae lineages. The authors show that LIN codes capture both previously defined lineages and their relationships in a way that aligns with the species' phylogenetic structure. The compelling evidence presented, together with its integration into the PubMLST platform, underscores its strong potential to enhance epidemiological surveillance and advance our understanding of gonococcal population biology.

    2. Reviewer #1 (Public review):

      Summary:

      Bacterial species that frequently undergo horizontal gene transfer events tend to have genomes that approach linkage equilibrium, making it challenging to analyze population structure and establish the relationships between isolates. To overcome this problem, researchers have established several effective schemes for analyzing N. gonorrhoeae isolates, including MLST and NG-STAR. This report shows that Life Identification Number (LIN) Codes provide for a robust and improved discrimination between different N. gonorrhoeae isolates.

      Strengths:

      The description of the system is clear, the analysis is convincing, and the comparisons to other methods show the improvements offered by LIN Codes.

      Weaknesses:

      No major weaknesses were identified by this reviewer.

    3. Reviewer #2 (Public review):

      Summary:

      This paper describes a new approach for analyzing genome sequences.

      Strengths:

      The work was performed with great rigor and provides much greater insights than earlier classification systems.

      Weaknesses:

      A minor weakness is that the clinical application of LIN coding could be articulated in a more in-depth way. The LIN coding system is very impressive and is certainly superior to other protocols. My recommendation, although not necessary for this paper, is that the authors expand their analysis to noncoding sequences, especially those upstream of open reading frames. In this respect, important cis-acting regulatory mutations that might help to further distinguish strains could be identified.

    4. Reviewer #3 (Public review):

      Summary:

      In this well-written manuscript, Unitt and colleagues propose a new, hierarchical nomenclature system for the pathogen Neisseria gonorrhoeae. The proposed nomenclature addresses a longstanding problem in N. gonorrhoeae genomics, namely that the highly recombinant population complicates typing schemes based on only a few loci and that previous typing systems, even those based on the core genome, group strains at only one level of genomic divergence without a system for clustering sequence types together. In this work, the authors have revised the core genome MLST scheme for N. gonorrhoeae and devised life identification numbers (LIN) codes to describe the N. gonorrhoeae population structure.

      Strengths:

      The LIN codes proposed in this manuscript are congruent with previous typing methods for Neisseria gonorrhoeae, like cgMLST groups, Ng-STAR, and NG-MAST. Importantly, they improve upon many of these methods as the LIN codes are also congruent with the phylogeny and represent monophyletic lineages/sublineages.

      The LIN code assignment has been implemented in PubMLST, allowing other researchers to assign LIN codes to new assemblies and put genomes of interest in context with global datasets.

      Weaknesses:

      The authors correctly highlight that cgMLST-based clusters can be fused due to "intermediate isolates" generated through processes like horizontal gene transfer. However, the LIN codes proposed here are also based on single linkage clustering of cgMLST at multiple levels. It is unclear if future recombination or sequencing of previously unsampled diversity within N. gonorrhoeae merges together higher-level clusters, and if so, how this will impact the stability of the nomenclature.

      The authors have defined higher resolution thresholds for the LIN code scheme. However, they do not investigate how these levels correspond to previously identified transmission clusters from genomic epidemiology studies. It would be useful for future users of the scheme to know the relevant LIN code thresholds for these investigations.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Bacterial species that frequently undergo horizontal gene transfer events tend to have genomes that approach linkage equilibrium, making it challenging to analyze population structure and establish the relationships between isolates. To overcome this problem, researchers have established several effective schemes for analyzing N. gonorrhoeae isolates, including MLST and NG-STAR. This report shows that Life Identification Number (LIN) Codes provide for a robust and improved discrimination between different N. gonorrhoeae isolates.

      Strengths:

      The description of the system is clear, the analysis is convincing, and the comparisons to other methods show the improvements offered by LIN Codes.

      Weaknesses:

      No major weaknesses were identified by this reviewer.

      We thank the reviewer for their assessment of our paper.

      Reviewer #2 (Public review):

      Summary:

      This paper describes a new approach for analyzing genome sequences.

      Strengths:

      The work was performed with great rigor and provides much greater insights than earlier classification systems.

      Weaknesses:

      A minor weakness is that the clinical application of LIN coding could be articulated in a more in-depth way. The LIN coding system is very impressive and is certainly superior to other protocols. My recommendation, although not necessary for this paper, is that the authors expand their analysis to noncoding sequences, especially those upstream of open reading frames. In this respect, important cis-acting regulatory mutations that might help to further distinguish strains could be identified.

      We thank the reviewer for their comments. LIN code could be applied clinically, for example in the analysis of antibiotic resistant isolates, or to investigate outbreaks associated with a particular lineage. We will update the text to describe this more thoroughly.

      With regard to non-coding sequences: unfortunately, intergenic regions are generally unsuitable for use in typing systems as (i) they are subject to phase variation, which can occlude relationships based on descent; (ii) they are inherently difficult to assemble and therefore can introduce variation due to the sequencing procedure rather than biology. For the type of variant typing that LIN code represents, which aims to replicate phylogenetic clustering, protein-encoding sequences are the best choice for convenience, stability, and accuracy. This is not to say that basing a nomenclature on intergenic regions is not a valid objective; such a nomenclature might be especially suitable for predicting some phenotypic characters, but it would still be subject to problem (ii), depending on the sequencing technology used. Such a nomenclature system should stand beside, rather than be combined with or used in place of, phylogenetic typing. However, we could certainly investigate the relationship between an isolate's LIN code and regulatory mutations in the future.

      Reviewer #3 (Public review):

      Summary:

      In this well-written manuscript, Unitt and colleagues propose a new, hierarchical nomenclature system for the pathogen Neisseria gonorrhoeae. The proposed nomenclature addresses a longstanding problem in N. gonorrhoeae genomics, namely that the highly recombinant population complicates typing schemes based on only a few loci and that previous typing systems, even those based on the core genome, group strains at only one level of genomic divergence without a system for clustering sequence types together. In this work, the authors have revised the core genome MLST scheme for N. gonorrhoeae and devised life identification numbers (LIN) codes to describe the N. gonorrhoeae population structure.

      Strengths:

      The LIN codes proposed in this manuscript are congruent with previous typing methods for Neisseria gonorrhoeae, like cgMLST groups, Ng-STAR, and NG-MAST. Importantly, they improve upon many of these methods as the LIN codes are also congruent with the phylogeny and represent monophyletic lineages/sublineages.

      The LIN code assignment has been implemented in PubMLST, allowing other researchers to assign LIN codes to new assemblies and put genomes of interest in context with global datasets.

      Weaknesses:

      The authors correctly highlight that cgMLST-based clusters can be fused due to "intermediate isolates" generated through processes like horizontal gene transfer. However, the LIN codes proposed here are also based on single linkage clustering of cgMLST at multiple levels. It is unclear if future recombination or sequencing of previously unsampled diversity within N. gonorrhoeae merges together higher-level clusters, and if so, how this will impact the stability of the nomenclature.

      The authors have defined higher resolution thresholds for the LIN code scheme. However, they do not investigate how these levels correspond to previously identified transmission clusters from genomic epidemiology studies. It would be useful for future users of the scheme to know the relevant LIN code thresholds for these investigations.

      We thank the reviewer for their insightful comments. LIN codes do use multi-level single linkage clustering to define the cluster number of isolates. However, unlike previous applications of simple single linkage clustering such as N. gonorrhoeae core genome groups (Harrison et al., 2020), once assigned in LIN code, these cluster numbers are fixed within an unchanging barcode assigned to each isolate. Therefore, the nomenclature is stable, as the addition of new isolates cannot change previously established LIN codes.
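
      The stability property described above can be illustrated with a minimal sketch. It is not the PubMLST implementation: the function names, the toy allelic profiles, and the three thresholds are assumptions chosen for illustration, and missing loci are ignored. A new isolate inherits the code prefix of its nearest already-coded isolate down to the finest threshold that their allelic mismatch count satisfies, then receives the next unused number at the following position. Existing codes are never rewritten.

```python
from typing import List, Sequence, Tuple

Profile = Tuple[int, ...]   # allele numbers at each cgMLST locus (toy data)
Code = Tuple[int, ...]      # one cluster number per threshold level


def mismatches(a: Profile, b: Profile) -> int:
    """Count loci at which two allelic profiles differ (no missing-data handling)."""
    return sum(1 for x, y in zip(a, b) if x != y)


def assign_code(new: Profile,
                coded: List[Tuple[Profile, Code]],
                thresholds: Sequence[int]) -> Code:
    """Return a code for `new` without modifying any code already in `coded`.

    `thresholds` are allelic-mismatch cut-offs in descending order
    (hypothetical values; the real scheme uses its own list).
    """
    if not coded:
        return tuple(0 for _ in thresholds)

    # Nearest previously coded isolate and its distance to the new one.
    nearest_profile, nearest_code = min(
        coded, key=lambda pc: mismatches(new, pc[0]))
    dist = mismatches(new, nearest_profile)

    # Number of leading levels whose threshold the distance satisfies.
    shared = 0
    for t in thresholds:
        if dist <= t:
            shared += 1
        else:
            break
    if shared == len(thresholds):
        return nearest_code  # indistinguishable at every level

    # Next unused cluster number at the first level that is not shared.
    prefix = nearest_code[:shared]
    used = [c[shared] for _, c in coded if c[:shared] == prefix]
    tail = (0,) * (len(thresholds) - shared - 1)
    return prefix + (max(used) + 1,) + tail


def build_codes(profiles: Sequence[Profile],
                thresholds: Sequence[int]) -> List[Tuple[Profile, Code]]:
    """Assign codes one isolate at a time, in the order given."""
    coded: List[Tuple[Profile, Code]] = []
    for p in profiles:
        coded.append((p, assign_code(p, coded, thresholds)))
    return coded
```

      Because each code is computed once and then stored, adding an intermediate isolate later can only create a new code; it cannot merge or renumber earlier ones, which is the contrast with re-running single linkage clustering on the whole collection.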

      Cluster stability was considered during the selection of allelic mismatch thresholds. By choosing thresholds based on natural breaks in population structure (Figure 3), applying clustering statistics such as the silhouette score, and by assessing where cluster stability has been maintained within the previous core genome groups nomenclature, we can have confidence that the thresholds which we have selected will form stable clusters. For example, with core genome groups there has been significant group fusion with clusters formed at a threshold of 400 allelic differences, while clustering at a threshold of 300 allelic differences has remained cohesive over time (supported by a high silhouette score) and so was selected as an important threshold in the gonococcal LIN code. LIN codes have now been applied to >27000 isolates in PubMLST, and the nomenclature has remained effective despite the continual addition of new isolates to this collection. The manuscript will be revised to emphasise these points.

      Work is in progress to explore what LIN code thresholds are generally associated with transmission chains. These will likely be the last 7 thresholds (25, 10, 7, 5, 3, 1, 0) as previous work has suggested that isolates linked by transmission within one year are associated with <14 single nucleotide polymorphism differences (De Silva et al., 2016). The results of this analysis will be described in a future article, currently in preparation.

      Harrison, O.B., et al. Neisseria gonorrhoeae Population Genomics: Use of the Gonococcal Core Genome to Improve Surveillance of Antimicrobial Resistance. The Journal of Infectious Diseases 2020.

      De Silva, D., et al. Whole-genome sequencing to determine transmission of Neisseria gonorrhoeae: an observational study. The Lancet Infectious Diseases 2016;16(11):1295-1303.

    1. eLife Assessment

      This study provides valuable insights into microtubule remodeling during liver-stage Plasmodium berghei development, demonstrating that deletion of the alpha-tubulin C-terminal tail impairs parasite growth in mosquitoes and abolishes infection in HeLa cells. The work is technically ambitious, employing advanced microscopy, genetic mutants, and pharmacological approaches. However, key claims are only partially supported due to incomplete evidence linking tubulin modifications to microtubule dynamics and uncertain antibody-based PTM detection.

    2. Reviewer #1 (Public review):

      The authors investigate how a population of microtubules (LSPMB) that originates from sporozoite subpellicular microtubules (SSPM) is remodelled during liver-stage development of malaria parasites. These bundles shrink over time and help form structures needed for cell division. The authors have used expansion microscopy, live-cell imaging, genetically engineered mutants, and pharmacological perturbation to study parasite development within liver cells.

      A major strength of the manuscript is the live cell imaging and expansion microscopy to study this challenging liver stage of parasite development. It gives important knowledge that PTMs of α-tubulin, such as polyglutamylation and tyrosination/detyrosination, are crucial for microtubule stability. Mutations in α-tubulin reduce the parasite's ability to move and proliferate in the liver cells. The drug oryzalin, which targets microtubules, also blocks parasite development, showing how important dynamic microtubules are at this stage.

      The major problem in the manuscript was the way it flows, as the authors keep shifting from the liver stage to the sporogony stages and then back to the liver stages. It was very confusing at times to know what the real focus of the study is, whether sporozoite development or liver stage development. The flow of the manuscript could be improved. Some of the findings reported here substantiate the previous electron microscopy.

      Overall, the study represents an important contribution towards understanding cytoskeletal remodelling during liver stage infection. The study suggests that tubulin modifications are key for the parasite's survival in the liver and could be targets for new malaria treatments. This is also the stage that has been used for vaccine development, so any knowledge of how parasites proliferate in the liver cells will be beneficial towards intervention approaches.

    3. Reviewer #2 (Public review):

      Summary:

      The authors investigated microtubule distribution and their possible post-translational modifications (PTM) in Plasmodium berghei during development of the liver stage, using either hepatocytes or HeLa cells as models. They used conventional immunofluorescence assays and expansion microscopy with various antibodies recognising tubulin and, in the second part of the work, its candidate PTMs, as well as markers of Plasmodium, in addition to live imaging with a fluorescent marker for tubulin. In the third part of the study, they generated 3 mutants deprived of either the last four residues or the last 11 residues, or where a candidate polyglutamylation site was substituted by an alanine residue.

      Strengths:

      In the first part, microtubules are monitored by a combination of two approaches (IFA and live), revealing nicely the evolution of the sporozoite subpellicular microtubules (SSPM, the sporozoite is the developmental stage present in salivary glands of the mosquitoes and that infects hepatocytes) into a different structure termed liver-stage parasite microtubule bundle (LSPMB). The LSPMB shrinks during the course of parasite development and finally disappears while hemi-spindles emerge over time. Contact points between these two structures are observed frequently in live cells and occasionally in fixed cells, suggesting the intriguing possibility that tubulin might be recycled from the LSPMB to contribute to hemi-spindle formation.

      In the second part, antibodies recognising (1) the final tyrosine found at the C-terminal tail and (2) a stretch of 3 glutamate residues in a side chain are used to monitor these candidate PTMs. Signals are positive at the SSPM; at the LSPMB, the signal remains positive for polyglutamylation but becomes negative for the final tyrosine, while a positive signal emerges at hemi-spindles at later stages of development.

      In the last part, the three mutants are fed to mosquitoes, where they show reduced development, the one lacking the alpha-tubulin tail even failing to reach the salivary glands. However, the two other mutants infect HeLa cells normally, whereas sporozoites with the C-terminal tail deletion recovered from the haemolymph did not develop in these cells.

      The first part provides convincing evidence that microtubules are extensively remodelled during the infection of hepatocytes and HeLa cells, in agreement with the spectacular Plasmodium morphogenetic changes accompanying massive and rapid proliferation. The third part brings further confirmation that the C-terminal tail of alpha-tubulin is essential for multiple stages of parasite development, in agreement with previous work (50). Since it is the region where several post-translational modifications take place in other organisms (detyrosination, polyglutamylation, glycylation), it makes sense to propose that the essential function is related to these PTMs also in Plasmodium.

      Weaknesses:

      The significance of tubulin PTM relies on two antibodies whose reactivity to Plasmodium tubulins is unclear (see below). The interpretation of the literature on detyrosination and polyglutamylation is confusing in several places, meaning that the statements about the possible role of these PTMs need to be carefully revisited.

      The authors use the term "tyrosination" but the alpha1-tubulin studied here possesses the final tyrosine when it is synthesised, so it is "tyrosinated" by default. It could potentially be removed by a tyrosine carboxypeptidase of the vasoinhibin family (VASH) as reported in other species. After removal, this tyrosine can be added again by a tubulin-tyrosine ligase (TTL) enzyme. It is therefore more appropriate to talk about detyrosination-retyrosination rather than tyrosination (this confusion is unfortunately common in the literature, see Janke & Magiera, 2020).

      The difficulty here is that there is so far no evidence that detyrosination takes place in Plasmodium. Neither VASH nor TTL could be identified in the Plasmodium genome (ref 31, something we can confirm with our unsuccessful BLAST analyses), and mass spectrometry studies of purified tubulin, albeit from blood stages, did not find evidence for detyrosination (reference 43). Western blots using an antibody against detyrosinated tubulin did not produce a positive signal, neither on purified tubulin, nor on whole parasites (43). Of course, the situation could be different in liver stages, but the question of the detyrosinating enzyme is still there. The existence of a unique Plasmodium system for detyrosination cannot be formally ruled out, but given the high degree of conservation of these PTMs and their associated enzymes, it sounds difficult to imagine.

      The fact that the anti-tyrosinated antibody still produced a signal in the cell line where the final tyrosine is deleted raises issues about its specificity. A cross-reactivity with beta-tubulin is proposed, but the Plasmodium beta-tubulin does not carry a final tyrosine, further raising concerns about antibody specificity.

      The interpretation of these results should therefore be considered carefully. There also seems to be some confusion in the function of detyrosination cited from the literature. It is said in line 229 that "tyrosination has been associated with stable microtubules" (33, 34, 50, 55). References 33 and 34 actually show that tyrosinated microtubules turn over faster in neurons or in epithelial cells, respectively, while references 50 and 55 do not study de/retyrosination. The general consensus is that tyrosinated microtubules are more dynamic (see reference 24).

      The situation is a bit different for polyglutamylation since several candidate poly- or mono-glutamylases have been identified in the Plasmodium genome, and at least mono-glutamylation of beta-tubulin has been formally proven, still in bloodstream stages (ref 43). The authors propose that the residue E445 is the polyglutamylation site. To our knowledge, this has not been demonstrated for Plasmodium. This residue is indeed the favourite one in several organisms such as humans and trypanosomes (Eddé et al., Science 1990; Schneider et al., JCS, 1997), and it is tempting to propose it would be the same here. However, TTLLs bind the tubulin tails from their C-terminal end like a glove on a finger (Garnham et al., Cell, 2015), and the presence of two extra residues in Plasmodium tubulins would mean that the reactive glutamate might be in position E447 rather than E445. This is worth discussing.

      On the positive side, it is encouraging to see that signals for both anti-tyrosinated tail and poly-glutamylated side chain are going down in the various mutants, but this would need validation with a comparison for alpha-tubulin signal.

      Line 316: polyglutamylation "is commonly associated with dynamic microtubule behavior (78-80)". Actually, references 78 and 79 show the impact of this PTM on interaction with spastin, and reference 80 discusses polyglutamylation as a marker of stable microtubules in the context of cilia and flagella. The consensus is that polyglutamylated microtubules tend to be more stable (ref 24).

      Conclusion:

      The first and the third parts of this manuscript - evolution of microtubules and importance of the C-terminal tails for Plasmodium development - are convincing and well supported by data. However, the presence and role of tubulin PTM should be carefully reconsidered.

      Plasmodium tubulins are more closely related to plant tubulins and are sensitive to inhibitors that do not affect mammalian microtubules. They therefore represent promising drug targets as several well-characterised compounds used as herbicides are available. The work produced here further defines the evolution of the microtubule network in sporozoites and liver stages, which are the initial and essential first steps of the infection. Moreover, Plasmodium has multiple specificities that make it a fascinating organism to study both for cell biology and evolution. The data reported here are elegant and will attract the attention of the community working on parasites but also on the cytoskeleton at large. It will be interesting to have the feedback of other people working on tubulin PTMs to figure out the significance of this part of the work.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript by Atchou et al. investigates the role of the microtubule cytoskeleton in sporozoites of Plasmodium berghei, including possible functions of microtubule post-translational modifications (tyrosination and polyglutamylation) in the development of sporozoites in the liver. They also assessed the development of sporozoites in the mosquito. Using cell culture models and in vivo infections with parasites that contain tubulin mutants deficient in certain PTMs, they show that many aspects of the life cycle progression are impaired. The main conclusion is that microtubule PTMs play a major role in the differentiation processes of the parasites.

      However, there are a number of major and minor points of criticism that relate to the interpretation of some of the data.

      Comments:

      (1) The first paragraph of "Results" almost suggests that the presence of a subpellicular MT-array in sporozoites is a new discovery. This is not the case, see e.g. the recent publication by Ferreira et al. (Nature Communications, 2023).

      (2) Why were HeLa cells and not hepatocytes (as in Figure 3) used for measuring infection rates of the mutants in Figure 5H and 5L? As I understand, HeLa cells are not natural host cells for invading sporozoites. HeLa cells are epithelial cells derived from a cervical tumour. I am not an expert in Plasmodium biology, but is a HeLa infection an accepted surrogate model for liver stage development?

      (3) The tubulin staining in Figures 1A and 1B is confusing and doesn't seem to make sense. Whereas in 1A the antibody nicely stains host and parasite tubulin, in 1B, only parasite tubulin is visible. If the same antibody and the same host cells have been used, HeLa cytoplasmic microtubules should be visible in 1B. In fact, they should be the predominant antigen. The same applies to Figure 2, where host microtubules are also not visible.

      (4) In Figures 2A and B, the host nuclei appear to have very different sizes in the DMSO controls and in the drug-treated cells. For example, in the 20 µM (-) image (bottom right), the nuclei are much larger than in the DMSO (-) control (top left). If this is the case, expansion microscopy hasn't worked reproducibly, and therefore, quantification of fluorescence is problematic. The scalebar is the same for all panels.

      (5) I don't quite follow the argument that spindles and the LSPMB are dynamic structures (e.g., lines 145, 174). That is a trivial statement for the spindle, as it is always dynamic, but beyond that, it has only been shown that the structure is sensitive to oryzalin. That says little about any "natural" dynamic behaviour. Any microtubule structure can be destroyed by a particular physical or chemical treatment, but that doesn't mean all structures are dynamic. It also depends on the definition of "dynamic" in a particular context, for example, the time scale of dynamic behaviour (changes within seconds, minutes, or hours).

      (6) I am not sure what part in the story EB1 plays. The data are only shown in the Supplements and don't seem to be of particular relevance. EB1 is a ubiquitous protein associated with microtubule plus ends. The statement (line 192) that it "may play a broader role..." is unsubstantiated and cannot be based merely on the observation that it is expressed in a particular life cycle stage.

      (7) Line 196 onwards: The antibody IN105 is better known in the field as polyE. Maybe that should be added in Materials and Methods. Also, the antibody T9028 against tyrosinated tubulin is poorly validated in the literature and rarely used. Usually, researchers in this field use the monoclonal antibody YL1/2. I am not sure why this unusual antibody was chosen in this study. In fact, has its specificity against tyrosinated α-tubulin from Plasmodium berghei ever been shown? The original antigen was human and had the sequence EGEEY. The Plasmodium sequence is YEADY and hence very different. It is stated that the LSPMB is both polyglutamylated and tyrosinated. This is unusual because polyglutamylated microtubules are usually indicative of stable microtubules, whereas tyrosinated microtubules are found on freshly polymerised and dynamic microtubules. However, a co-localisation within the same cell has not been attempted. This is, however, possible since polyE is a rabbit antibody and T9028 is a mouse antibody. I suspect that differences or gradients along the LSPMB would have been noticed. Also, in lines 207/208, it is said that tyrosination disappears after hepatocyte invasion, which is shown in Figure 3. However, in Figure 3A, quite a lot of positive signals for tyrosination are visible in the 54 and 56 hpi panels.

      (8) In line 229, it is stated that tyrosination "has previously been associated with stable microtubule in motility". This statement is not correct. In fact, none of the cited references that apparently support this statement show that this is the case. On the contrary, stable microtubules, such as flagellar axonemes, are almost completely detyrosinated. Therefore, tyrosination is a marker for dynamic microtubules, whereas detyrosinated microtubules are indicative of stable microtubules. This is an established fact, and it is odd that the authors claim the opposite.

      (9) Line 236 onwards: Concerning the generation of tubulin mutants, I think it is necessary to demonstrate successful replacement of the wild-type allele by the mutant allele. I am sure the authors have done this by amplification and subsequent sequencing of the genomic locus using PCR primers outside the plasmid sequences. I suggest including this information, e.g., by displaying the chromatograph trace in a supplementary figure. Or are the sequences displayed in Figure S3B already derived from sequenced genomic DNA? This is not described in the Legend or in Materials and Methods. The left PCR products obtained for Figure S3 B would be a suitable template for sequencing.

      (10) It is also important to be aware of the fact that glutamylation also occurs on β-tubulin. This signal will also be detected by polyE (IN105). Therefore, it is surprising that IN105 immunofluorescence is negative on the C-term Δ cells (Figure S3 D). Is there anything known about confirmed polyglutamylation sites on both α- and β-tubulins in Plasmodium, e.g., by MS? In Toxoplasma, both α- and β-tubulin have been shown to be polyglutamylated.

      (11) Figure S3 is very confusing. In the legend, certain intron deletions are mentioned. How does this relate to posttranslational tubulin modifications? The corresponding section in Results (lines 288-292) is also not very helpful in understanding this.

      (12) Figure 4E doesn't look like brightfield microscopy but like some sort of fluorescent imaging. In Figure 4C, were the control (NoΔ) cells with an integrated cassette, but no mutations, or non-transgenic cells?

      (13) It is difficult to understand why the TyΔ and the CtΔ mutants still show quite a strong signal using the anti-tyrosination antibody. If the mutants have replaced all wild-type alleles, the signal should be completely absent, unless the antibody (see my comment above concerning T9028) cross-reacts with detyrosinated microtubules. Therefore, the quantitation in Figures 5F and 5G is actually indicative of something that shouldn't be like that. The quantitation of 5F is at odds with the microscopy image in 5D. If this image is representative, the anti-Ty staining in TyΔ is as strong as in the control NoΔ.

      (14) The statement that the failure of CtΔ mutants to generate viable sporozoites is due to the lack of microtubule PTMs (lines 295-296) is speculative. The lack of the entire C-terminal tail could have a number of consequences, such as impaired microtubule assembly or failure to recruit and bind associated proteins. This is not necessarily linked to PTMs. Also, it has been shown in yeast that, for microtubules to form properly, an exquisite regulation (proteostasis) of the ratio between α- and β-tubulin is essential (Wethekam and Moore, 2023). I am not sure, but according to Materials and Methods (line 423), the gene cassettes for replacing the wild-type tubulin gene with the mutant versions contain a selectable marker gene for pyrimethamine selection. Are there qPCR data that show that expression levels of mutant α-tubulin are more or less the same as the wild-type levels?

      (15) In the Discussion, my impression is that two recent studies, the superb Expansion Microscopy study by Bertiaux et al. (2021) and the cryo-EM study by Ferreira et al. (2023), are not sufficiently recognised (although they are cited elsewhere in the manuscript). The latter study includes a detailed description of the microtubule cytoskeleton in sporozoites. However, the present study clearly expands the knowledge about the structure of the cytoskeleton in liver stage parasites and is one of the few studies addressing the distribution and function of microtubule post-translational modifications in Plasmodium.

      (16) I somewhat disagree with the statement of a co-occurrence of polyglutamylated and tyrosinated microtubules. I think the resolution is too low to reach that conclusion. As this is a bold claim, and would be contrary to what is known from other organisms, it would require a more rigorous validation. Given the apparent problems with the anti-Ty antibody (signal in the TyΔ mutant), one should be very cautious with this claim.

      (17) In the Discussion (lines 311 and 377), it is again claimed that tyrosinated microtubules are "a well-known marker of stable microtubules". This statement is completely incorrect, and I am surprised by this serious mistake. A few lines later, the authors say that polyglutamylation is "commonly associated with dynamic microtubule behaviour". Again, this is completely incorrect and is the opposite of what is firmly established in the literature. Polyglutamylation and detyrosination are markers of stable microtubules.

      (18) In line 339, the authors interpret the residual antibody staining after the introduction of the mutant tubulin as a compensatory mechanism. There is no evidence for this. More likely explanations are, firstly, the quality of the anti-Ty antibody used (see comment above) and, secondly, the fact that β-tubulin also carries C-terminal polyglutamylation sites, which haven't been investigated in this study. PTMs on β-tubulin are not compensatory, but normal PTMs, at least in all other organisms where microtubule PTMs have been investigated.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      The authors investigate how the population of microtubules (LSPMB) that originates from sporozoite subpellicular microtubules (SSPM) is remodelled during liver-stage development of malaria parasites. These bundles shrink over time and help form structures needed for cell division. The authors have used expansion microscopy, live-cell imaging, genetically engineered mutants, and pharmacological perturbation to study parasite development within liver cells.

      A major strength of the manuscript is the live cell imaging and expansion microscopy to study this challenging liver stage of parasite development. It gives important knowledge that PTMs of α-tubulin, such as polyglutamylation and tyrosination/detyrosination, are crucial for microtubule stability. Mutations in α-tubulin reduce the parasite's ability to move and proliferate in the liver cells. The drug oryzalin, which targets microtubules, also blocks parasite development, showing how important dynamic microtubules are at this stage.

      The major problem in the manuscript was the way it flows, as the authors keep shifting from the liver stage to the sporogony stages and then back to the liver stages. It was very confusing at times to know what the real focus of the study is, whether sporozoite development or liver stage development. The flow of the manuscript could be improved. Some of the findings reported here substantiate the previous electron microscopy.

      Overall, the study represents an important contribution towards understanding cytoskeletal remodelling during liver stage infection. The study suggests that tubulin modifications are key for the parasite's survival in the liver and could be targets for new malaria treatments. This is also the stage that has been used for vaccine development, so any knowledge of how parasites proliferate in the liver cells will be beneficial towards intervention approaches.

      We would like to express our sincere gratitude to Reviewer #1 for the positive and encouraging feedback on our manuscript. We are delighted that the reviewer found our experimental design and methodologies appropriate and that our study represents an important contribution to understanding cytoskeletal remodelling during liver stage infection, a critical phase for vaccine development. We are also grateful to the reviewer for highlighting the issue with the manuscript's flow. We acknowledge this limitation and will significantly improve the narrative structure and logical progression in the revised manuscript to ensure clarity and avoid any potential confusion. Thank you again for your thoughtful and constructive comments.

      Reviewer #2 (Public review):

      Summary:

      The authors investigated microtubule distribution and their possible post-translational modifications (PTM) in Plasmodium berghei during development of the liver stage, using either hepatocytes or HeLa cells as models. They used conventional immunofluorescence assays and expansion microscopy with various antibodies recognising tubulin and, in the second part of the work, its candidate PTMs, as well as markers of Plasmodium, in addition to live imaging with a fluorescent marker for tubulin. In the third part of the study, they generated 3 mutants deprived of either the last four residues or the last 11 residues, or where a candidate polyglutamylation site was substituted by an alanine residue.

      Strengths:

      In the first part, microtubules are monitored by a combination of two approaches (IFA and live), revealing nicely the evolution of the sporozoite subpellicular microtubules (SSPM, the sporozoite is the developmental stage present in salivary glands of the mosquitoes and that infects hepatocytes) into a different structure termed liver-stage parasite microtubule bundle (LSPMB). The LSPMB shrinks during the course of parasite development and finally disappears while hemi-spindles emerge over time. Contact points between these two structures are observed frequently in live cells and occasionally in fixed cells, suggesting the intriguing possibility that tubulin might be recycled from the LSPMB to contribute to hemi-spindle formation.

      In the second part, antibodies recognising (1) the final tyrosine found at the C-terminal tail and (2) a stretch of 3 glutamate residues in a side chain are used to monitor these candidate PTMs. Signals are positive at the SSPM; at the LSPMB, the signal remains positive for polyglutamylation but becomes negative for the final tyrosine, while a positive signal emerges at hemi-spindles at later stages of development.

      In the last part, the three mutants are fed to mosquitoes, where they show reduced development, the one lacking the alpha-tubulin tail even failing to reach the salivary glands. However, the two other mutants infect HeLa cells normally, whereas sporozoites with the C-terminal tail deletion recovered from the haemolymph did not develop in these cells.

      The first part provides convincing evidence that microtubules are extensively remodelled during the infection of hepatocytes and HeLa cells, in agreement with the spectacular Plasmodium morphogenetic changes accompanying massive and rapid proliferation. The third part brings further confirmation that the C-terminal tail of alpha-tubulin is essential for multiple stages of parasite development, in agreement with previous work (50). Since it is the region where several post-translational modifications take place in other organisms (detyrosination, polyglutamylation, glycylation), it makes sense to propose that the essential function is related to these PTMs also in Plasmodium.

      Weaknesses:

      The significance of tubulin PTM relies on two antibodies whose reactivity to Plasmodium tubulins is unclear (see below). The interpretation of the literature on detyrosination and polyglutamylation is confusing in several places, meaning that the statements about the possible role of these PTMs need to be carefully revisited.

      The authors use the term "tyrosination" but the alpha1-tubulin studied here possesses the final tyrosine when it is synthesised, so it is "tyrosinated" by default. It could potentially be removed by a tyrosine carboxypeptidase of the vasoinhibin family (VASH) as reported in other species. After removal, this tyrosine can be added again by a tubulin-tyrosine ligase (TTL) enzyme. It is therefore more appropriate to talk about detyrosination-retyrosination rather than tyrosination (this confusion is unfortunately common in the literature, see Janke & Magiera, 2020).

      The difficulty here is that there is so far no evidence that detyrosination takes place in Plasmodium. Neither VASH nor TTL could be identified in the Plasmodium genome (ref 31, something we can confirm with our unsuccessful BLAST analyses), and mass spectrometry studies of purified tubulin, albeit from blood stages, did not find evidence for detyrosination (reference 43). Western blots using an antibody against detyrosinated tubulin did not produce a positive signal, neither on purified tubulin, nor on whole parasites (43). Of course, the situation could be different in liver stages, but the question of the detyrosinating enzyme is still there. The existence of a unique Plasmodium system for detyrosination cannot be formally ruled out, but given the high degree of conservation of these PTMs and their associated enzymes, it sounds difficult to imagine.

      The fact that the anti-tyrosinated antibody still produced a signal in the cell line where the final tyrosine is deleted raises issues about its specificity. A cross-reactivity with beta-tubulin is proposed, but the Plasmodium beta-tubulin does not carry a final tyrosine, further raising concerns about antibody specificity.

      The interpretation of these results should therefore be considered carefully. There also seems to be some confusion in the function of detyrosination cited from the literature. It is said in line 229 that "tyrosination has been associated with stable microtubules" (33, 34, 50, 55). References 33 and 34 actually show that tyrosinated microtubules turn over faster in neurons or in epithelial cells, respectively, while references 50 and 55 do not study de/retyrosination. The general consensus is that tyrosinated microtubules are more dynamic (see reference 24).

      The situation is a bit different for polyglutamylation since several candidate poly- or mono-glutamylases have been identified in the Plasmodium genome, and at least mono-glutamylation of beta-tubulin has been formally proven, still in bloodstream stages (ref 43). The authors propose that the residue E445 is the polyglutamylation site. To our knowledge, this has not been demonstrated for Plasmodium. This residue is indeed the favourite one in several organisms such as humans and trypanosomes (Eddé et al., Science 1990; Schneider et al., JCS, 1997), and it is tempting to propose it would be the same here. However, TTLLs bind the tubulin tails from their C-terminal end like a glove on a finger (Garnham et al., Cell, 2015), and the presence of two extra residues in Plasmodium tubulins would mean that the reactive glutamate might be in position E447 rather than E445. This is worth discussing.

      On the positive side, it is encouraging to see that signals for both anti-tyrosinated tail and poly-glutamylated side chain are going down in the various mutants, but this would need validation with a comparison for alpha-tubulin signal.

      Line 316: polyglutamylation "is commonly associated with dynamic microtubule behavior (78-80)". Actually, references 78 and 79 show the impact of this PTM on interaction with spastin, and reference 80 discusses polyglutamylation as a marker of stable microtubules in the context of cilia and flagella. The consensus is that polyglutamylated microtubules tend to be more stable (ref 24).

      Conclusion:

      The first and the third parts of this manuscript - evolution of microtubules and importance of the C-terminal tails for Plasmodium development - are convincing and well supported by data. However, the presence and role of tubulin PTM should be carefully reconsidered.

      Plasmodium tubulins are more closely related to plant tubulins and are sensitive to inhibitors that do not affect mammalian microtubules. They therefore represent promising drug targets as several well-characterised compounds used as herbicides are available. The work produced here further defines the evolution of the microtubule network in sporozoites and liver stages, which are the initial and essential first steps of the infection. Moreover, Plasmodium has multiple specificities that make it a fascinating organism to study both for cell biology and evolution. The data reported here are elegant and will attract the attention of the community working on parasites but also on the cytoskeleton at large. It will be interesting to have the feedback of other people working on tubulin PTMs to figure out the significance of this part of the work.

      We thank Reviewer #2 for the thoughtful and detailed evaluation of our manuscript. We are pleased that the reviewer found our study elegant and believes it will attract the attention of the broader scientific community, both those working on parasites and those focused on cytoskeleton biology. We also acknowledge the concerns raised regarding the specificity of the antibodies used to detect tubulin post-translational modifications (PTMs), as well as the interpretation of their signals and the current lack of identified detyrosination enzymes in the Plasmodium genome. We agree that these are important limitations, and we will address them thoroughly in the revised manuscript. This includes clarifying our interpretation of tyrosination versus detyrosination, adjusting our claims regarding polyglutamylation sites, and carefully revisiting the literature cited to ensure accurate contextualization of PTM function in microtubule stability.

      We are grateful for the reviewer’s close reading and critical feedback, which will help us substantially improve the clarity, precision, and strength of our manuscript.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Atchou et al. investigates the role of the microtubule cytoskeleton in sporozoites of Plasmodium berghei, including possible functions of microtubule post-translational modifications (tyrosination and polyglutamylation) in the development of sporozoites in the liver. They also assessed the development of sporozoites in the mosquito. Using cell culture models and in vivo infections with parasites that contain tubulin mutants deficient in certain PTMs, they show that many aspects of the life cycle progression are impaired. The main conclusion is that microtubule PTMs play a major role in the differentiation processes of the parasites.

      However, there are a number of major and minor points of criticism that relate to the interpretation of some of the data.

      We thank Reviewer #3 for the overall positive assessment of our study and for recognizing its contribution to advancing our understanding of Plasmodium biology and malaria pathogenesis. We appreciate the reviewer’s constructive feedback, particularly regarding the interpretation of some of our data. These comments have been very helpful in guiding our revisions, and we have worked to improve both the clarity of our presentation and the precision of our interpretations in the revised manuscript.

      Below, we respond in detail to each of the reviewer’s points.

      Comments:

      (1) The first paragraph of "Results" almost suggests that the presence of a subpellicular MT-array in sporozoites is a new discovery. This is not the case, see e.g. the recent publication by Ferreira et al. (Nature Communications, 2023).

      We thank the reviewer for pointing this out and fully agree that the subpellicular microtubule (SPM) array in sporozoites is well established, as documented in earlier work (e.g., Cyrklaff et al., 2007) and more recently by Ferreira et al. (Nat. Commun., 2023). Our intention was not to suggest that the existence of the SSPM is a novel finding. Rather, our study builds on this existing knowledge by demonstrating that these sporozoite-derived microtubules are not disassembled upon hepatocyte entry but are repurposed into a newly described structure, the liver stage parasite microtubule bundle (LSPMB). This reorganization, its persistence into liver stage development, and its dynamic role in microtubule remodeling and nuclear division are, to our knowledge, novel observations. We will revise the manuscript to make this distinction clearer in the introduction and the results section.

      (2) Why were HeLa cells and not hepatocytes (as in Figure 3) used for measuring infection rates of the mutants in Figure 5H and 5L? As I understand, HeLa cells are not natural host cells for invading sporozoites. HeLa cells are epithelial cells derived from a cervical tumour. I am not an expert in Plasmodium biology, but is a HeLa infection an accepted surrogate model for liver stage development?

      We appreciate the opportunity to clarify our experimental model. While HeLa cells are not the natural host cells, they are a well-established and validated in vitro model for studying Plasmodium berghei liver stage development in our lab and others. In this system, the parasite completes its full development and generates infectious merozoites. Numerous studies have successfully used HeLa cells as a liver stage infection model, with key findings subsequently validated in primary hepatocytes or in vivo, confirming its utility as a representative model. We employed this cell line primarily to reduce animal usage in accordance with the 3Rs principles (Replacement, Reduction, Refinement). Importantly, to ensure the biological relevance of our discoveries in HeLa cells, we validated our key findings in primary mouse hepatocytes, as shown in Figure 3. Furthermore, we confirmed the in vivo infectivity of mutant parasite lines that produced typical salivary gland sporozoites through an in vivo infection assay, presented in Figure S4C.

      (3) The tubulin staining in Figures 1A and 1B is confusing and doesn't seem to make sense. Whereas in 1A the antibody nicely stains host and parasite tubulin, in 1B, only parasite tubulin is visible. If the same antibody and the same host cells have been used, HeLa cytoplasmic microtubules should be visible in 1B. In fact, they should be the predominant antigen. The same applies to Figure 2, where host microtubules are also not visible.

      We thank the reviewer for this careful observation regarding the α-tubulin staining in Figures 1A and 1B. The same host cell type (HeLa) and α-tubulin antibody were indeed used in both experiments. Figure 1A shows results from conventional immunofluorescence assays, where both host and parasite microtubules are clearly stained. In contrast, Figure 1B shows the outcome of ultrastructure expansion microscopy (U-ExM), where parasite microtubules appear prominently, while host microtubules are less visible.

      This effect appears to be a technical outcome of the U-ExM protocol, which can differentially preserve or reveal microtubule epitopes. We consistently observed stronger parasite signal across various cell types, including primary hepatocytes (Figure 3A,B). The lack of visible host microtubules in some U-ExM images does not reflect their absence, but rather reduced signal intensity relative to the parasite structures. This is not observed with all antibodies; for example, host microtubules stain strongly with anti-tyrosinated α-tubulin (Figure 3B), likely reflecting their high tyrosination state.

      To overcome this limitation, we employed PS-ExM and combined PS-ExM/U-ExM approaches (as described in reference 56), which allowed simultaneous high-resolution visualization of both host and parasite microtubule networks. These combined methods are now being used in follow-up studies to investigate host–parasite microtubule interactions in more detail.

      We will clarify this point in the revised manuscript to avoid confusion.

      (4) In Figures 2A and B, the host nuclei appear to have very different sizes in the DMSO controls and in the drug-treated cells. For example, in the 20 µM (-) image (bottom right), the nuclei are much larger than in the DMSO (-) control (top left). If this is the case, expansion microscopy hasn't worked reproducibly, and therefore, quantification of fluorescence is problematic. The scalebar is the same for all panels.

      The expansion microscopy methods used in this study have been rigorously validated for both reproducibility and isotropicity. However, as the reviewer rightly notes, host cell nuclei can vary in size due to several factors, including cell cycle stage, infection status, and the extent of parasite development, all of which can influence host nuclei morphology and size.

      Importantly, the quantifications relevant to our conclusions were focused specifically on parasite structures. We did not rely on host nuclear size or host fluorescence intensity as a quantitative readout in this context. While we acknowledge the observed variability in host nuclear dimensions, it does not compromise the accuracy or reproducibility of the parasite-specific measurements central to our study.

      We will clarify this point in the revised figure legend and manuscript.

      (5) I don't quite follow the argument that spindles and the LSPMB are dynamic structures (e.g., lines 145, 174). That is a trivial statement for the spindle, as it is always dynamic, but beyond that, it has only been shown that the structure is sensitive to oryzalin. That says little about any "natural" dynamic behaviour. Any microtubule structure can be destroyed by a particular physical or chemical treatment, but that doesn't mean all structures are dynamic. It also depends on the definition of "dynamic" in a particular context, for example, the time scale of dynamic behaviour (changes within seconds, minutes, or hours).

      We agree that sensitivity to chemical depolymerization alone does not necessarily indicate dynamic behavior, particularly in the absence of data on turnover kinetics or temporal changes.

      Our interpretation was based on two observations: first, that the LSPMB, which derives from the highly stable sporozoite subpellicular microtubules (known to be drug-resistant), becomes susceptible to depolymerization during the liver stage; and second, that the LSPMB gradually shrinks over time during parasite development. These features suggested a transition toward a more dynamic state compared to its origin. However, we fully agree that “dynamic” is a context-dependent term and that direct evidence, such as turnover rates or structural changes on short time scales, is required to rigorously define microtubule dynamics.

      We will revise the manuscript to clarify our use of this term and explicitly acknowledge the need for further studies to characterize the timescale and mechanisms underlying LSPMB remodeling.

      (6) I am not sure what part in the story EB1 plays. The data are only shown in the Supplements and don't seem to be of particular relevance. EB1 is a ubiquitous protein associated with microtubule plus ends. The statement (line 192) that it "may play a broader role..." is unsubstantiated and cannot be based merely on the observation that it is expressed in a particular life cycle stage.

      We agree that EB1 is a ubiquitous microtubule plus-end binding protein and that its presence alone does not imply a novel function. Previous studies (e.g., Maurer et al., 2023; Yang et al., 2023; Zeeshan et al., 2023) have focused on its role during Plasmodium sexual stages, while its expression during liver and mosquito stages has not been previously documented.

      Our data extend this knowledge by showing that EB1 is also expressed during liver stage development, particularly during the highly mitotic schizont phase. While we agree that this observation alone does not prove functional involvement, it raises the possibility of a broader role for EB1 in regulating microtubule dynamics beyond sexual stages. To avoid overinterpretation, we have presented these findings in the supplementary material and will revise the manuscript to tone down speculative statements and clearly frame this as a preliminary observation that warrants further investigation.

      (7) Line 196 onwards: The antibody IN105 is better known in the field as polyE. Maybe that should be added in Materials and Methods. Also, the antibody T9028 against tyrosinated tubulin is poorly validated in the literature and rarely used. Usually, researchers in this field use the monoclonal antibody YL1/2. I am not sure why this unusual antibody was chosen in this study. In fact, has its specificity against tyrosinated α-tubulin from Plasmodium berghei ever been shown? The original antigen was human and had the sequence EGEEY. The Plasmodium sequence is YEADY and hence very different. It is stated that the LSPMB is both polyglutamylated and tyrosinated. This is unusual because polyglutamylated microtubules are usually indicative of stable microtubules, whereas tyrosinated microtubules are found on freshly polymerised and dynamic microtubules. However, a co-localisation within the same cell has not been attempted. This is, however, possible since polyE is a rabbit antibody and T9028 is a mouse antibody. I suspect that differences or gradients along the LSPMB would have been noticed. Also, in lines 207/208, it is said that tyrosination disappears after hepatocyte invasion, which is shown in Figure 3. However, in Figure 3A, quite a lot of positive signals for tyrosination are visible in the 54 and 56 hpi panels.

      First, we acknowledge that the IN105 antibody is more widely known as "polyE" in the field. We will update the Materials and Methods section accordingly to reflect this nomenclature.

      Regarding the use of the T9028 antibody against tyrosinated α-tubulin: we agree that this monoclonal antibody is less commonly used than YL1/2, and we appreciate the reviewer drawing attention to this. The original antigen for T9028 is based on the mammalian C-terminal sequence EGEEY, which differs from the Plasmodium α1-tubulin sequence (YEADY). Like many in the field, we face the challenge that most available antibodies are raised against mammalian epitopes, and specificity in Plasmodium can vary. Nonetheless, the literature (e.g., Hirst et al., 2022; Fennell et al., 2008) has demonstrated that tyrosination occurs in Plasmodium α1-tubulin, using anti-tyrosination antibodies including YL1/2.

      Following the reviewer’s excellent suggestion, we are currently repeating the key experiments using the YL1/2 antibody to compare staining patterns directly with those obtained using T9028. We will include these results in the revised manuscript.

      Concerning the potential co-localization of polyglutamylation and tyrosination on the LSPMB: we agree that this is an interesting and testable hypothesis. In the current manuscript, Figures 3A and 3B were generated from independent experiments, and thus co-localization was not assessed. However, as the reviewer correctly notes, polyE and T9028 antibodies are raised in rabbit and mouse, respectively, making co-staining feasible. We will follow up on this experimentally and, if feasible within our revision timeline, include data in the revised version or highlight this as a future direction.

      Finally, with regard to Figure 3 and the observation that tyrosination appears to persist at 54 and 56 hpi (Figure 3B): the reviewer is correct that tyrosination signal is still detectable at these time points. Our statement that tyrosination “disappears after hepatocyte invasion” was intended to refer to an overall decrease in signal intensity during early liver stage development, with a reappearance at later stages (e.g., cytomere formation). We will rephrase this section for greater clarity and ensure that figure annotations and legends unambiguously reflect the dynamics observed.

      (8) In line 229, it is stated that tyrosination "has previously been associated with stable microtubule in motility". This statement is not correct. In fact, none of the cited references that apparently support this statement show that this is the case. On the contrary, stable microtubules, such as flagellar axonemes, are almost completely detyrosinated. Therefore, tyrosination is a marker for dynamic microtubules, whereas detyrosinated microtubules are indicative of stable microtubules. This is an established fact, and it is odd that the authors claim the opposite.

      We fully agree that in canonical eukaryotic systems, tyrosinated microtubules are generally markers of dynamic microtubule populations, whereas detyrosinated microtubules are typically associated with stability, particularly in structures such as flagellar axonemes.

      Our original statement will be corrected. In our study, we observed that tyrosinated microtubules are prevalent in invasive stages (sporozoites and merozoites), while detyrosinated forms become more prominent during intracellular liver stage development. This pattern is consistent with the established link between tyrosination and dynamic microtubules.

      What is particularly intriguing in Plasmodium is the apparent cycling of tyrosination despite the absence of known tubulin tyrosine ligase (TTL) homologs in the genome. This suggests either a highly divergent enzyme or the involvement of host cell factors, a hypothesis supported by the reappearance of tyrosinated microtubules during liver stage schizogony (Figure 3B).

      We will revise the relevant text and the Discussion section to reflect these mechanistic considerations more accurately and to avoid misrepresenting established principles of microtubule biology.

      (9) Line 236 onwards: Concerning the generation of tubulin mutants, I think it is necessary to demonstrate successful replacement of the wild-type allele by the mutant allele. I am sure the authors have done this by amplification and subsequent sequencing of the genomic locus using PCR primers outside the plasmid sequences. I suggest including this information, e.g., by displaying the chromatograph trace in a supplementary figure. Or are the sequences displayed in Figure S3B already derived from sequenced genomic DNA? This is not described in the Legend or in Materials and Methods. The left PCR products obtained for Figure S3 B would be a suitable template for sequencing.

      Indeed, these data are presented in Figure 4B and the corresponding sequence data are shown in Figure S3B. We appreciate the reviewer’s suggestion, which will help improve the transparency and reproducibility of our methodology.

      (10) It is also important to be aware of the fact that glutamylation also occurs on β-tubulin. This signal will also be detected by polyE (IN105). Therefore, it is surprising that IN105 immunofluorescence is negative on the C-term Δ cells (Figure S3 D). Is there anything known about confirmed polyglutamylation sites on both α- and β-tubulins in Plasmodium, e.g., by MS? In Toxoplasma, both α- and β-tubulin have been shown to be polyglutamylated.

      Indeed, polyglutamylation is known to occur not only on α-tubulin but also on β-tubulin in many organisms, including Toxoplasma gondii, and the polyE (IN105) antibody is expected to detect polyglutamylation on both tubulin isoforms.

      The parasites shown in Figure S3D correspond to mutant lines originally generated by Spreng et al. (2019): the IntronΔ mutant (with deletion of introns in the Plasmodium α1-tubulin gene) and the C-termΔ mutant (with deletion of the final three C-terminal residues: ADY). As the reviewer correctly notes, this particular C-terminal deletion does not include the predicted polyglutamylation site (E445 or E447, depending on alignment), and thus should not abolish all polyglutamylation. However, in our experiments, the IN105 signal is substantially reduced in this mutant. This may suggest that structural alterations in the tubulin tail affect accessibility of the polyglutamylation epitope or influence the modification itself, though we cannot exclude other possibilities, including changes in antibody recognition.

      To date, polyglutamylation sites in Plasmodium tubulins have not been definitively confirmed by mass spectrometry. However, a recent MS-based study (reference 43) detected monoglutamylation on β-tubulin in blood stage parasites. Direct MS evidence for polyglutamylation of either α- or β-tubulin in Plasmodium liver stages is still lacking. We will clarify these points in the revised manuscript to avoid potential confusion and to highlight the need for future biochemical validation of PTM sites.

      (11) Figure S3 is very confusing. In the legend, certain intron deletions are mentioned. How does this relate to posttranslational tubulin modifications? The corresponding section in Results (lines 288-292) is also not very helpful in understanding this.

      The parasite lines shown in Figure S3D were originally generated by Spreng et al. (2019) and are not directly part of the main set of PTM-targeted mutants described in our study. Specifically, the IntronΔ line carries deletions in introns of the Plasmodium α1-tubulin gene, while the C-termΔ line lacks the final three C-terminal residues (ADY). These lines were included for comparative purposes to explore whether structural changes in α-tubulin could impact polyglutamylation signal, as detected by the polyE (IN105) antibody.

      We acknowledge that the figure legend and corresponding text (lines 288–292) did not adequately explain the rationale for including these control lines. We will revise both the legend and Results section to more clearly describe the origin, purpose, and relevance of these mutants to the overall study.

      (12) Figure 4E doesn't look like brightfield microscopy but like some sort of fluorescent imaging. In Figure 4C, were the control (NoΔ) cells with an integrated cassette, but no mutations, or non-transgenic cells?

      The reviewer is absolutely correct: Figure 4E shows a fluorescent image acquired using widefield microscopy and not a brightfield image. We will revise the figure legend accordingly to avoid confusion. The “BF” (brightfield) label applies only to the left panel in Figure 4C, which depicts oocysts imaged using transmitted light.

      Regarding the controls labeled "NoΔ" in Figure 4C, we confirm that these parasites contain the integrated selection cassette but do not harbor any mutations in the target gene. They serve as proper integration controls, allowing us to distinguish the effects of the point mutations or deletions introduced in the experimental lines.

      (13) It is difficult to understand why the TyΔ and the CtΔ mutants still show quite a strong signal using the anti-tyrosination antibody. If the mutants have replaced all wild-type alleles, the signal should be completely absent, unless the antibody (see my comment above concerning T9028) cross-reacts with detyrosinated microtubules. Therefore, the quantitation in Figures 5F and 5G is actually indicative of something that shouldn't be like that. The quantitation of 5F is at odds with the microscopy image in 5D. If this image is representative, the anti-Ty staining in TyΔ is as strong as in the control NoΔ.

      We agree that the persistence of anti-tyrosination signal in the TyΔ and CtΔ mutant lines is unexpected, given that all wild-type alleles were replaced. This discrepancy has led us to further investigate the specificity of the T9028 antibody, as raised in the reviewer’s earlier comment. To address this concern, we are currently repeating the key experiments using the well-established YL1/2 monoclonal antibody, which is widely accepted for detecting tyrosinated α-tubulin in other systems.

      We also acknowledge that Figure 5F shows residual tyrosination signal, and the reviewer is correct that this should not occur if the modified residues are the exclusive PTM sites. One possible explanation is that adjacent residues or even alternative tubulin isoforms may serve as substrates. While α1-tubulin is the dominant isoform in Plasmodium, low-level expression of α2-tubulin has been detected in liver stages based on transcriptomic data, and it may contribute to the observed signal.

      Regarding the apparent discrepancy between the quantification in Figure 5F and the representative image in Figure 5D, we will revise the figure legend to clarify that image selection aimed to show detectable signal, not necessarily the average phenotype. We will also reassess and, if needed, repeat the quantification with improved image sets to ensure accuracy and consistency.

      We will revise the manuscript to reflect these points and include a more nuanced interpretation of the residual staining in the mutant lines.

      (14) The statement that the failure of CtΔ mutants to generate viable sporozoites is due to the lack of microtubule PTMs (lines 295-296) is speculative. The lack of the entire C-terminal tail could have a number of consequences, such as impaired microtubule assembly or failure to recruit and bind associated proteins. This is not necessarily linked to PTMs. Also, it has been shown in yeast that, for microtubules to form properly, an exquisite regulation (proteostasis) of the ratio between α- and β-tubulin is essential (Wethekam and Moore, 2023). I am not sure, but according to Materials and Methods (line 423), the gene cassettes for replacing the wild-type tubulin gene with the mutant versions contain a selectable marker gene for pyrimethamine selection. Are there qPCR data that show that expression levels of mutant α-tubulin are more or less the same as the wild-type levels?

      We agree that attributing the developmental failure of the CtΔ mutants solely to the absence of microtubule post-translational modifications (PTMs) is speculative. As the reviewer rightly points out, deletion of the entire C-terminal tail may have multiple effects, including impaired microtubule assembly, altered α/β-tubulin stoichiometry, or disruption of interactions with essential microtubule-associated proteins (MAPs). These consequences may arise independently of PTMs.

      That said, we note that PTMs, particularly polyglutamylation, can modulate MAP binding by altering the surface charge of microtubules (Genova et al., 2023; Mitchell et al., 2010). Therefore, while PTM loss may be a contributing factor, we acknowledge that the phenotype likely results from a combination of mechanisms. We will revise the relevant section of the manuscript to present a more cautious and balanced interpretation.

      Regarding the reviewer’s question on expression levels: although the replacement constructs include a pyrimethamine resistance cassette, we have not yet quantified α-tubulin transcript levels by qPCR. In the interim, the study by Spreng et al. (2019) (reference 50) on related α1-tubulin mutations provides valuable insight. They observed no difference in mRNA levels in day 12 oocysts, yet reported fainter microtubule staining and shorter sporozoites, suggesting a post-transcriptional mechanism affecting protein expression or function in later stages. Furthermore, the phenotypic spectrum across their mutant panel (Suppl. Fig. 3 D and E) implies that robust α-tubulin regulation is highly sensitive to specific sequences.

      We acknowledge this as a current limitation in our study and will address it in the revised manuscript, noting that direct measurement of transcript levels is a key area for future investigation.

      (15) In the Discussion, my impression is that two recent studies, the superb Expansion Microscopy study by Bertiaux et al. (2021) and the cryo-EM study by Ferreira et al. (2023), are not sufficiently recognised (although they are cited elsewhere in the manuscript). The latter study includes a detailed description of the microtubule cytoskeleton in sporozoites. However, the present study clearly expands the knowledge about the structure of the cytoskeleton in liver stage parasites and is one of the few studies addressing the distribution and function of microtubule post-translational modifications in Plasmodium.

      Indeed, our work builds upon the established knowledge from Bertiaux et al. (2021) and the cryo-EM study by Ferreira et al. (2023), as rightly mentioned by the reviewer. We agree that these foundational studies, combined with our findings, will significantly expand the understanding of Plasmodium biology and cytoskeleton dynamics across its life cycle and will open the door for further investigations. We are grateful for this suggestion and will ensure these key studies are appropriately acknowledged in the revised manuscript.

      (16) I somewhat disagree with the statement of a co-occurrence of polyglutamylated and tyrosinated microtubules. I think the resolution is too low to reach that conclusion. As this is a bold claim, and would be contrary to what is known from other organisms, it would require a more rigorous validation. Given the apparent problems with the anti-Ty antibody (signal in the TyΔ mutant), one should be very cautious with this claim.

      This is a very important point to clarify. As mentioned previously, the initial experiments for these modifications were performed independently. It is established that sporozoite subpellicular microtubules exhibit both tyrosination and polyglutamylation. We will revise the manuscript to temper this statement and clearly indicate that the co-occurrence of these PTMs remains a hypothesis that requires more rigorous validation. As suggested, we are now conducting additional co-staining experiments using the better-validated YL1/2 antibody to re-examine and directly compare the distribution of both PTMs within the same cell. These follow-up experiments will help clarify whether both modifications occur simultaneously on the same microtubule structures in Plasmodium liver stages.

      (17) In the Discussion (lines 311 and 377), it is again claimed that tyrosinated microtubules are "a well-known marker of stable microtubules". This statement is completely incorrect, and I am surprised by this serious mistake. A few lines later, the authors say that polyglutamylation is "commonly associated with dynamic microtubule behaviour". Again, this is completely incorrect and is the opposite of what is firmly established in the literature. Polyglutamylation and detyrosination are markers of stable microtubules.

      Indeed, in canonical eukaryotic systems, tyrosinated microtubules are generally considered markers of dynamic microtubule populations, whereas detyrosinated and polyglutamylated microtubules are more commonly associated with stability.

      We acknowledge this mistake and will revise the Discussion to correct these statements accordingly. In the context of Plasmodium, our observations suggest an unusual regulation of microtubule dynamics, which may reflect parasite-specific adaptations. For example, we observed tyrosinated α-tubulin in the subpellicular microtubules of sporozoites, structures typically known for their exceptional stability. This atypical association implies either non-canonical roles for tyrosination or parasite-specific mechanisms for modulating microtubule properties. Additionally, the presence of both PTMs at different stages of development and on different microtubule populations suggests tightly regulated spatial and temporal modulation of microtubule function.

      We will carefully revise the relevant sections of the manuscript to remove incorrect generalizations and ensure accurate representation of the current consensus in the field, while emphasizing the possibility of Plasmodium-specific adaptations that merit further study.

      (18) In line 339, the authors interpret the residual antibody staining after the introduction of the mutant tubulin as a compensatory mechanism. There is no evidence for this. More likely explanations are firstly the quality of the anti-Ty-antibody used (see comment above), and the fact that also β-tubulin carries C-terminal polyglutamylation sites, which haven't been investigated in this study. PTMs on β-tubulin are not compensatory, but normal PTMs, at least in all other organisms where microtubule PTMs have been investigated.

      As mentioned above, we are currently repeating the key experiments with the YL1/2 antibody, as suggested. Furthermore, we fully agree with the reviewer's point regarding polyglutamylation on β-tubulin. The C-terminal tail of β-tubulin does indeed contain polyglutamylation sites. As we noted in the manuscript (lines 340-352), this aspect has not been investigated in the present study, and we acknowledge it as a valuable direction for future research. We will revise the text accordingly to avoid overinterpretation and to more accurately reflect the limitations of our current data.

    1. eLife Assessment

      This important work shows that fine particulate matter exposure to the lungs led to nociceptor-dependent neutrophilic inflammation. Likely macrophage-neuronal crosstalk, via release of artemin from macrophages and activation of Gfra3 on the JNC neuron, potentiated the response. The data convincingly strengthen links between pollutants, immune and neural interactions.

    2. Reviewer #1 (Public review):

      Summary:

      In the presented study, the authors aim to explore the role of nociceptors in the fine particulate matter (FPM) mediated Asthma phenotype, using rodent models of allergic airway inflammation. This manuscript builds on previous studies, and identifies transcriptomic reprogramming and an increased sensitivity of the jugular nodose complex (JNC) neurons, one of the major sensory ganglia for the airways, on exposure to FPM along with Ova during the challenge phase. The authors then use QX-314, a selectively permeable form of lidocaine, and TRPV1 knockouts to demonstrate that nociceptor blocking can reduce airway inflammation in their experimental setup.

      The authors further identify the presence of Gfra3 on the JNC neurons, a receptor for the protein Artemin, and demonstrate their sensitivity to Artemin as a ligand. They further show that alveolar macrophages release Artemin on exposure to FPM.

      Strengths:

      The study builds on results available from multiple previous works, and presents important results which allow insights into the mixed phenotypes of Asthma seen clinically. In addition, by identifying the role of nociceptors, they identify potential therapeutic targets which bear high translational potential.

      Weaknesses:

      While the results presented in the study are highly relevant, there is a need for further mechanistic dissection to allow better inferences. Currently, certain results seem associative. Also, certain visualisations and experimental protocols presented in the manuscript need careful assessment and interpretation.

      While Asthma is a chronic disease, the presented results are particularly important to explore Asthma exacerbations in response to acute exposure to air pollutants. This is relevant in today's age of increasing air pollution and increasing global travel.

      Comments on revisions:

      Thank you for addressing the suggestions. No further comments.

    3. Reviewer #2 (Public review):

      Summary:

      The authors sought to investigate the role of nociceptor neurons in the pathogenesis of pollution-mediated neutrophilic asthma. The authors overall achieved the aim of demonstrating that nociceptor neurons are important to the pathogenesis of pollution-exacerbated asthma. Their results support their conclusions overall, although there are ways the study findings can be strengthened. This work further evaluates how nociceptor neurons contribute to asthma pathogenesis, which is important to consider when proposing treatment strategies for undertreated asthma endotypes.

      Strengths:

      The authors utilize TRPV1 ablated mice to confirm the effects of intranasally administered QX-314 utilized to block sodium currents.

      Use of intravital microscopy to track alveolar macrophage and neutrophil motility in their model.

      The authors demonstrate that artemin, which is upregulated in alveolar macrophages in response to pollution, sensitizes JNC neurons, thereby increasing their responsiveness to pollution. Ablation or inactivity of nociceptor neurons prevented the pollution-induced increase in inflammation.

      Weaknesses:

      Although the model is neutrophilic, the endotype of asthma it represents is unclear.

      Comments on revisions:

      The authors have addressed or commented on all concerns.

    4. Reviewer #3 (Public review):

      Asthma is a complex disease that includes endogenous epithelial, immune and neural components that respond to environmental stimuli. Small airborne particles with diameters in the range of 2.5 micrometers or less, so-called PM2.5, are thought to contribute to some forms of asthma. These forms of asthma may have neutrophils, eosinophils and macrophages in bronchoalveolar lavage. Here, Wang and colleagues build on a recent model that incorporated PM2.5 which was found to have a neutrophilic component. Wang altered the model to provide an extra kick via the incorporation of ovalbumin. The major strength of this work is that silencing TRPV1-expressing neurons, either pharmacologically or genetically, modulated inflammation and the motility of neutrophils. By examining bronchoalveolar lavage fluid, they found not only that levels of a number of cytokines were increased, but also that artemin, a protein that supports neuronal development and function, was elevated, which did not occur in nociceptor-ablated mice. Their data strengthen links between pollutants, immune and neural interactions.

      Comments on revisions:

      The manuscript has been revised extensively, including the addition of new experiments, such as intravital microscopy. Did the comments from the reviewers, manifest by additional experiments and modifying how some of the data was presented, result in any changes in the hypotheses or the interpretation of such?

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      In the presented study, the authors aim to explore the role of nociceptors in the fine particulate matter (FPM) mediated Asthma phenotype, using rodent models of allergic airway inflammation. This manuscript builds on previous studies and identifies transcriptomic reprogramming and an increased sensitivity of the jugular nodose complex (JNC) neurons, one of the major sensory ganglia for the airways, on exposure to FPM along with Ova during the challenge phase. The authors then use QX-314, a selectively permeable form of lidocaine, and TRPV1 knockouts to demonstrate that nociceptor blocking can reduce airway inflammation in their experimental setup. The authors further identify the presence of Gfra3 on the JNC neurons, a receptor for the protein Artemin, and demonstrate their sensitivity to Artemin as a ligand. They further show that alveolar macrophages release Artemin on exposure to FPM.

      We thank the reviewer for their valuable comments, which have significantly enhanced the quality of our manuscript. A point-by-point rebuttal is provided below.

      Strength

      The study builds on results available from multiple previous works and presents important results which allow insights into the mixed phenotypes of Asthma seen clinically. In addition, by identifying the role of nociceptors, they identify potential therapeutic targets which bear high translational potential.

      Weakness

      While the results presented in the study are highly relevant, there is a need for further mechanistic dissection to allow better inferences. Currently certain results seem associative. Also, certain visualisations and experimental protocols presented in the manuscript need careful assessment and interpretation. While Asthma is a chronic disease, the presented results are particularly important to explore Asthma exacerbations in response to acute exposure to air pollutants. This is relevant in today's age of increasing air pollution and increasing global travel.

      Major

      The JNC is a major group of neurons responsible for receiving sensory inputs from the airways. However, the DRG also contains nociceptors and is known to receive afference from the upper airways. An explanation of why the study was restricted to the JNC would be important.

      We acknowledge that some afferents to the upper airways do arise from the DRG, specifically in the upper thoracic segments (T1–T5). We have added a statement in the text to note this subset of nociceptive and spinally mediated pathways. However, the preponderance of evidence indicates that the majority of airway and lung afferents (70–80%, sometimes up to 90%) originate from the jugular–nodose complex (JNC). Given this large imbalance, and because our study focuses on the mechanosensory and chemosensory functions mediated primarily by the JNC, we restricted our analysis to this main vagal pathway. By contrast, DRG innervation, though functionally important for nociception and irritation-related reflexes, accounts for a smaller yet significant (~20–30%) fraction of the total afferent pool. The referenced tracing studies[1,2] support this distribution and are cited to clarify our rationale for emphasizing the JNC in our work.

      Similarly, the role of Artemin in the study remains associative. The study results indicate that Artemin sensitizes nociceptors, leading to an increased inflammatory response (Supplementary Figure 2); however, both upstream and downstream evidence for this inference needs to be dissected further. For instance, the evidence for the role of Artemin in the model comes from ex vivo experiments with alveolar macrophages, but not from the experimental model created. Blocking or activation experiments could be performed, along with investigating the change in the total number of nociceptors with Artemin exposure. Similarly, the downstream effects of the potential Artemin-mediated JNC stimulation should be explored in the context of this experimental setup. A detailed dissection of the mechanisms is important. Additionally, it is also important to discuss the hypothesis leading to the selection of Artemin as a target, which currently seems arbitrary.

      Our data show that (i) OVA-FPM-exposed AMs secrete Artemin and (ii) exogenous recombinant Artemin can sensitize nociceptors, potentially heightening the inflammatory response. As suggested, we agree that more upstream and downstream evidence is needed for definitive mechanistic insight. In response, we have expanded our experiments to include intravital microscopy, which demonstrates impaired motility of alveolar macrophages and neutrophils in nociceptor-ablated mice, suggesting a bidirectional crosstalk between AMs and nociceptor neurons.

      In future studies, we will perform blocking or activation studies to clarify Artemin’s in vivo effects and confirm its role in modulating airway nociceptors. We also recognize the importance of examining whether Artemin exposure alters the phenotype of these neurons and lung innervation density. As recommended, we plan targeted interventions (e.g., Artemin-neutralizing antibodies or overexpression strategies) to delineate the mechanisms by which Artemin-mediated nociceptor stimulation influences the local inflammatory environment.

      We have expanded our discussion to clarify that Artemin is a recognized growth factor known to sensitize certain sensory neurons, including those responsive to tissue injury and inflammation. This literature-based rationale guided our hypothesis that Artemin might increase nociceptor reactivity in the lung and thereby influence alveolar macrophage function. By combining ex vivo and intravital approaches, we have begun to map these interactions but agree that further in vivo studies are necessary to confirm causality, dissect signal transduction pathways, and fully validate Artemin’s contributions to AM–nociceptor crosstalk. We have revised our manuscript accordingly to highlight these limitations.

      A deeper exploration of the inflammatory parameters could be performed. The multiplex cytokine analysis shows a reduction in certain cytokines like IL-6 and MCP (Figure 3F), which needs to be discussed. Additionally, investigating the change in proportions of the different immune cell populations is important; the analysis is currently restricted to the eosinophil and neutrophil counts in the BAL. This is also important as the study builds on work from Prof. Chang's group, which also identified the expansion of an invariant iNKT cell population by FPM, regulatory in nature. Adding data on airway hyperresponsiveness, if possible, would be a welcome addition, considering Asthma as the disease context.

      We thank the reviewer for highlighting the need for a more comprehensive exploration of inflammatory parameters. To address these concerns:

      (1) Cytokine Analysis: We re-ran all statistical analyses, including the CBA and ELISA assays, and confirmed that TNFα and Artemin are the only differentially expressed cytokines across experimental groups. We have expanded the Discussion to emphasize TNFα’s role in this context.

      (2) Immune Cell Profiling in BALF: Our data show that co-exposure with FPM exacerbates the infiltration of CD45+ cells, eosinophils, neutrophils, T cells, and monocytes. Notably, CD45+ cells and neutrophils were the only populations reduced under nociceptor neuron loss-of-function conditions (QX-314-treated or TRPV1-DTA mice, Author response image 1).

      Of note, we also confirmed these data using intravital imaging and in a second line of nociceptor-ablated mice (NaV1.8DTA). We are aware of Prof. Chang’s work suggesting expansion of an invariant iNKT cell population and intend to examine this population in future studies.

      (3) Airway Hyperresponsiveness (AHR): We recognize that adding AHR data would strengthen the asthma-related context. Unfortunately, we are not currently equipped to perform AHR measurements, but we intend to include this in future experiments to provide a more complete assessment of airway function.

      Author response image 1.

      The authors could revisit the data presented in terms of visualization. For instance, the pooled data presented in some of the figures is probably leading to a wide variation which makes interpretation more difficult. Presenting data separately for each experimental replicate might help the reader. This is also important considering the possible variation seen between experiments (for instance, in Figure 3A and 3C and 3B and 3D, the neutrophil and eosinophil panels for the same groups seem to have an almost 2-fold difference.). Similarly, in the cytokine analysis, the authors have used a common axis for depicting all cytokine values which leads to difficulties in interpretation (Figure 3F). Analysis of the RNA seq results and the DEGs could be revisited to include pathway analysis etc (Figure 2), and the supplementary information could include detailed lists of the major target genes.

      To address this query, we have completely reformatted all graphs and included both gene lists and lists of enriched pathways for all three comparisons in Supplementary Table 1. We also confirmed our flow cytometry analysis functionally by performing intravital imaging.

      The authors should also consider citing the previous experimental setup used for some particular protocols. For instance, the use of the specified protocol for OVA in a C57 background needs to be justified, as there are various protocols reported in the literature. Additionally, doses used in some experiments seem arbitrary (The FPM and Artemin exposure in Figure 4). Depicting the dose-response curve or citing previous literature for the same would be important. Similarly, different sample sizes seen in experiments should be explained, whether they are due to mortality, failure to exhibit phenotypes, or due to technical failures. The RNA seq experiment mentions only 2 biological replicates in one of the groups which should be addressed either by increasing the sample size or by replicating the experiment. Moreover, nested comparisons in experiments performed for Figure 1 need to be performed. Neurons isolated from each mouse should be maintained and analysed separately to retain biological replicates to better represent the heterogeneity.

      We appreciate the request for clarity regarding the experimental protocols and sample sizes:

      OVA Model in C57BL/6 Mice: We adapted a previously published OVA protocol in C57BL/6 mice[3-5] (PMID: 39661516), which uses two doses of sensitization to compensate for the lower Th2 response compared to BALB/c[6]. We increased the dose of OVA (100 µg) because our initial experiments produced low eosinophil infiltration. This dosage is on the higher side, and some studies have noted local IFNγ induction in C57BL/6 mice; however, we did not detect IFNγ in our setup.

      FPM and Artemin Doses: We did not perform a full dose-response assay for FPM and Artemin but used 100 ng/mL as reported in prior literature, where TRPA1 and TRPV1 mRNA were upregulated after 18 hours of incubation[7]. This reference has been added for clarity.

      Sample Sizes and Exclusions: One control mouse was excluded from the RNA-seq experiment because a parallel PCA analysis indicated it was an outlier. This was the only exclusion in the study, and this has been indicated in the Methods section of the article.

      Nested Comparisons and Biological Replicates: We reanalyzed the relevant data with a nested one-way ANOVA and updated the figures accordingly. Measurements from the neurons isolated from each mouse were first averaged to preserve biological replicates and capture potential heterogeneity, and the data were then analysed on the per-mouse averages.
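
      The per-mouse averaging step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the group labels, mouse IDs, and response values are invented placeholders, and only the Python standard library is used.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-neuron measurements: (group, mouse_id, response).
# All values are made-up placeholders for illustration only.
neurons = [
    ("OVA", "m1", 0.20), ("OVA", "m1", 0.30),
    ("OVA", "m2", 0.10), ("OVA", "m2", 0.50), ("OVA", "m2", 0.30),
    ("OVA-FPM", "m3", 0.60), ("OVA-FPM", "m3", 0.80),
    ("OVA-FPM", "m4", 0.70),
]

def per_mouse_means(rows):
    """Collapse neuron-level values to one mean per mouse, keyed by group."""
    by_mouse = defaultdict(list)
    for group, mouse, value in rows:
        by_mouse[(group, mouse)].append(value)
    by_group = defaultdict(dict)
    for (group, mouse), values in by_mouse.items():
        by_group[group][mouse] = mean(values)
    return {group: dict(mice) for group, mice in by_group.items()}

# Each group now contributes one value per mouse (n = mice, not neurons).
mouse_means = per_mouse_means(neurons)
```

      Any downstream comparison would then be run on these per-mouse values, so that a mouse contributing many neurons does not carry more weight than one contributing few.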

      The manuscript should be more detailed regarding the statistics employed. Currently, there is a section mentioned in the methods section, but details of corrections employed and specific stats for specific experiments should be described. There are also some minor grammatical errors and incomplete sentences in the manuscript which should be corrected. The authors should also consider a more expansive literature review in the introduction/discussion sections.

      We have updated the figure legends and methods to include more detailed information on the specific statistical tests used for each experiment. In addition, we have fixed minor grammatical errors and incomplete sentences throughout the manuscript. Finally, we have expanded our Introduction and Discussion to include additional references and a broader literature context.

      Reviewer #2 (Public review):

      The authors sought to investigate the role of nociceptor neurons in the pathogenesis of pollution-mediated neutrophilic asthma.

      We thank the reviewer for their valuable comments, which have significantly enhanced the quality of our manuscript. A point-by-point rebuttal is provided below.

      Strength

      The authors utilize TRPV1-ablated mice to confirm the effects of intranasally administered QX-314, which is used to block sodium currents. The authors demonstrate that artemin, which is upregulated in alveolar macrophages in response to pollution, sensitizes JNC neurons, thereby increasing their responsiveness to pollution. Ablation or inactivity of nociceptor neurons prevented the pollution-induced increase in inflammation.

      Weakness

      While neutrophilic, the model used doesn't appear to truly recapitulate a Th2/Th17 phenotype. No IL-17A is evident in the BALF within the model (Figure 3F). The relevance of the RNAseq dataset is unclear, as none of the identified DEGs were evaluated in the context of mechanism. The authors overall achieved the aim of demonstrating that nociceptor neurons are important to the pathogenesis of pollution-exacerbated asthma. Their results support their conclusions overall, although there are ways the study findings can be strengthened. This work further evaluates how nociceptor neurons contribute to asthma pathogenesis, which is important to consider when proposing treatment strategies for undertreated asthma endotypes.

      Major

      Utilizing a different model, one using house dust mite, Alternaria alternata, or similar, that is able to induce a true Th2/Th17-type response and is also more translatable to humans, would be valuable for confirmation.

      We appreciate the suggestion to use additional allergen models. In a pilot study, we did observe increased Artemin in the BALF of house dust mite–treated mice, although the levels were low under our current dosing schedule (20 µg/dose daily from Day 0–4 and Day 7–9, with sacrifice on Day 10; Author response image 2). Conversely, using an Alternaria alternata model at 100 µg/dose daily from Day 0–2 (sacrificed on Day 3) did not yield a detectable increase in Artemin. We suspect these findings may reflect the specific dose and timing used. We plan to refine our protocols (e.g., longer exposures or higher doses) for HDM and/or Alternaria to better model a Th2/Th17 response and further validate our observations in a setting more translatable to human asthma.

      Author response image 2.

      Additional analysis, perhaps pathway analysis, of the RNAseq dataset presented in Figure 2 would help. It is unclear how these genes are relevant or how they affect functionality. At present it is acceptable to say the neurons are transcriptionally reprogrammed, but no protein evaluation is provided, which would get more at function. However, the authors do show some functional data in Figure 1, so maybe this could be discussed in relation to Figure 2.

      We have expanded our RNA-seq analysis to include gene lists and enriched pathways for all three comparisons in Supplementary Table 1. We have also revised our discussion to align these transcriptomic changes with the functional data shown in Figure 1. While we have not yet performed protein-level validation for all identified genes, the patterns observed in our RNA-seq dataset suggest pathways potentially tied to nociceptor activation and the downstream inflammatory response. We plan to conduct targeted protein analyses in future studies to further substantiate these findings.

      Histology and localization of neutrophils/nociceptor neurons/alveolar macrophages would enhance the study findings.

      We appreciate the reviewer’s suggestion to include histological data showing the distribution of neutrophils, nociceptor neurons, and alveolar macrophages. While we have not yet performed detailed histological staining of these cell types, we have added live in-vivo intravital microscopy data (Figure 4) that illustrate impaired AM and neutrophil motility in nociceptor-ablated mice. We plan to include additional histological analyses in future studies to further localize these cells in the lung tissue.

      Minor:

      The first 3 figures are small and hard to read.

      We have enlarged Figures 1 and 3 in the revised manuscript to improve readability. We have also added the corresponding gene lists and enriched pathways to Supplementary Table 1 for clarity.

      The figures are mislabeled in the text. Figure 2 is discussed twice in two different contexts; the second mention is supposed to be labeled as Figure 2.

      We corrected the mislabeled figures in the text, ensuring that each figure is referenced accurately.

      Figure 4 isn't cited in the text. I think it is supposed to be referenced in the paragraph before the discussion starts and is currently labeled as Figure 1.

      We have updated the text to properly cite Figure 4 in the relevant paragraph before the Discussion begins, rather than labeling it as Figure 1.

      Notating which statistical analysis was used with each figure/subfigure would be beneficial. Also, it's important to notate if the data was analyzed for multiple comparisons.

      We have revised each figure/subfigure legend to specify the statistical tests used, including information on whether corrections for multiple comparisons were applied. This provides a clearer understanding of how each dataset was analyzed.

      Reviewer #3 (Public review):

      Asthma is a complex disease that includes endogenous epithelial, immune, and neural components that respond awkwardly to environmental stimuli. Small airborne particles with diameters in the range of 2.5 micrometers or less, so-called PM2.5, are generally thought to contribute to some forms of asthma. These forms of asthma may have increased numbers of neutrophils and/or eosinophils present in bronchoalveolar lavage fluid and are difficult to treat effectively as they tend to be poorly responsive to steroids. Here, Wang and colleagues build on a recent model that incorporated PM2.5, which was found to have a neutrophilic component. Wang altered the model to provide an extra kick via the incorporation of ovalbumin. Building on their prior expertise linking nociceptors and inflammation, they find that silencing TRPV1-expressing neurons, either pharmacologically or genetically, abrogated inflammation and the accumulation of neutrophils. By examining bronchoalveolar lavage fluid, they found not only that levels of a number of cytokines were increased, but also that artemin, a protein that supports neuronal development and function, was elevated, which did not occur in nociceptor-ablated mice. They also found that alveolar macrophages exposed to PM2.5 particles had increased artemin transcription, suggesting a further link between pollutants and immune and neural interactions.

      We thank the reviewer for their valuable comments, which have significantly enhanced the quality of our manuscript. A point-by-point rebuttal is provided below.

      Weakness

      There are substantial caveats that must be attached to the suggestions by the authors that targeting nociceptors might provide an approach to the treatment of neutrophilic airway inflammation in pollution-driven asthma in general and wildfire-associated respiratory problems in particular.

      These caveats include the uncertainty of the relevance of the conventional source of PM2.5, to pollution and asthma. According to the National Institute of Standards and Technology (NIST), the standard reference material (SRM) 2786 is a mix obtained from an air intake system in the Czech Republic. It is not clear exactly what is in the mix, and a recent bioRxiv preprint, https://www.biorxiv.org/content/10.1101/2023.08.18.553903v3.full.pdf reveals the presence of endotoxin. Care should thus be taken in interpreting data using particulate matter. Regarding wildfires, there is data that indicates that such exposure is toxic to macrophages. What impact might that then have on the production of cytokines, and artemin, in humans?

      We recognize the potential limitations of using SRM2786 (obtained from a Czech air-intake system) as a model for real-world PM2.5 exposure. Our rationale for choosing SRM2786 is that it is commercially available and represents a broad spectrum of ambient air pollutants, in contrast to more specialized sources like diesel exhaust particles. However, we acknowledge in the discussion the presence of endotoxin in SRM2786, as suggested by recent reports, and agree that this may influence immune responses and should be considered when interpreting our data.

      Regarding wildfire-associated exposure, we are aware that certain components of wildfire smoke can be toxic to macrophages. We do not think this plays a significant role in the current study design, as the numbers of AMs, as determined by flow cytometry and intravital microscopy, are similar when comparing OVA-exposed mice to OVA-FPM-exposed animals. Thus, these results rule out significant AM toxicity by FPM.

      Ultimately, while our findings suggest that modulating nociceptor activity may reduce neutrophilic inflammation, we emphasize that additional research (including different PM2.5 sources, validation of endotoxin levels, and in vivo confirmation in human-relevant models) is necessary before drawing definitive conclusions about treating pollution-driven asthma or wildfire-induced respiratory problems.

      The Introductory paragraph implies links between wildfire events, particulate exposure, and neutrophilic asthma. I am not aware of such a link having been established, in which case the paragraph needs revision. In the paragraph that begins with 'Urban pollution', it is suggested that eosinophilic asthma is treatment responsive in comparison to the neutrophilic form. That may not be the case, and these cellular components may often occur together. In much of the manuscript, there is a mismatch between the text and the figure numbers. For example, in the Results, Figure 2 should be Figure 3 some of the time, and Figure 3 is actually Figure 4, while the reference to Figure 1F-H is Figure 4H. Please check carefully.

      (a) Introduction Paragraph and Wildfire–Neutrophilic Asthma Link

      We have added references to the Introduction to support the link between wildfire smoke, respiratory symptoms, and neutrophilic asthma [8-12].

      (b) Distinction Between Eosinophilic and Neutrophilic Asthma

      We recognize that eosinophilic and neutrophilic airway infiltrates can co-occur in the same individual and that treatment responsiveness can vary considerably. Our intention was to note that conventional asthma therapies (e.g., inhaled corticosteroids) are generally more effective for eosinophilic-driven disease than for neutrophilic phenotypes, but we agree that these inflammatory endotypes often overlap in clinical practice. We have revised the text in the “Urban pollution” section to acknowledge this complexity and to clarify that inflammatory cell populations in asthma are not always discrete.

      (c) Figure Numbering and Text–Figure Mismatch

      We sincerely apologize for the confusion caused by mismatched figure labels and references in the Results section. We have carefully reviewed and corrected all figure references throughout the manuscript to ensure accuracy.

      References

      (1) Kim, S. H. et al. Mapping of the Sensory Innervation of the Mouse Lung by Specific Vagal and Dorsal Root Ganglion Neuronal Subsets. eNeuro 9 (2022). https://doi.org/10.1523/ENEURO.0026-22.2022

      (2) McGovern, A. E. et al. Evidence for multiple sensory circuits in the brain arising from the respiratory system: an anterograde viral tract tracing study in rodents. Brain Struct Funct 220, 3683-3699 (2015). https://doi.org/10.1007/s00429-014-0883-9

      (3) Shen, C.-C., Wang, C.-C., Liao, M.-H. & Jan, T.-R. A single exposure to iron oxide nanoparticles attenuates antigen-specific antibody production and T-cell reactivity in ovalbumin-sensitized BALB/c mice. International journal of nanomedicine, 1229-1235 (2011).  

      (4) Delayre-Orthez, C., De Blay, F., Frossard, N. & Pons, F. Dose-dependent effects of endotoxins on allergen sensitization and challenge in the mouse. Clinical & Experimental Allergy 34, 1789-1795 (2004).  

      (5) Morokata, T., Ishikawa, J. & Yamada, T. Antigen dose defines T helper 1 and T helper 2 responses in the lungs of C57BL/6 and BALB/c mice independently of splenic responses. Immunology letters 72, 119-126 (2000).  

      (6) Li, L., Hua, L., He, Y. & Bao, Y. Differential effects of formaldehyde exposure on airway inflammation and bronchial hyperresponsiveness in BALB/c and C57BL/6 mice. PLoS One 12, e0179231 (2017).  

      (7) Ikeda-Miyagawa, Y. et al. Peripherally increased artemin is a key regulator of TRPA1/V1 expression in primary afferent neurons. Molecular pain 11, s12990-12015-10004-12997 (2015).  

      (8) Baan, E. J. et al. Characterization of Asthma by Age of Onset: A Multi-Database Cohort Study. J Allergy Clin Immunol Pract 10, 1825-1834 e1828 (2022). https://doi.org/10.1016/j.jaip.2022.03.019

      (9) de Nijs, S. B., Venekamp, L. N. & Bel, E. H. Adult-onset asthma: is it really different? Eur Respir Rev 22, 44-52 (2013). https://doi.org/10.1183/09059180.00007112

      (10) Gianniou, N. et al. Acute effects of smoke exposure on airway and systemic inflammation in forest firefighters. J Asthma Allergy 11, 81-88 (2018). https://doi.org/10.2147/JAA.S136417

      (11) Noah, T. L., Worden, C. P., Rebuli, M. E. & Jaspers, I. The Effects of Wildfire Smoke on Asthma and Allergy. Curr Allergy Asthma Rep 23, 375-387 (2023). https://doi.org/10.1007/s11882-023-01090-1

      (12) Wilgus, M. L. & Merchant, M. Clearing the Air: Understanding the Impact of Wildfire Smoke on Asthma and COPD. Healthcare (Basel) 12 (2024). https://doi.org/10.3390/healthcare12030307

    1. eLife Assessment

      CCL2 is a chemokine with immune cell chemoattractant properties, and it appears to play a role in several chronic inflammatory diseases. The RNA-binding protein HuR controls the stability and translation of CCL2 mRNA. This paper presents convincing evidence that a relatively common genetic variant tied to several disease phenotypes affects the interaction between the mRNA of CCL2 and the RNA-binding protein HuR. While the experiments cannot definitively distinguish between effects on RNA transcription and stability, CCL2 is thought to be relevant for leukocyte migration in various conditions, including chronic inflammation and cancer, and the study presents important findings that may be relevant to a broad audience.

    2. Reviewer #1 (Public review):

      Summary:

      This paper presents evidence that a relatively common genetic variant tied to several disease phenotypes affects the interaction between the mRNA of CCL2 and the RNA binding protein HuR. CCL2 is an immune cell chemoattractant protein.

      Strengths:

      The study is well conducted with relevant controls. The techniques are appropriate, and several approaches provided concordant results that were generally supportive of the conclusions reached. The impact of this work, identifying a genetic variant that works by altering the binding of an RNA-regulatory protein, has important implications given that the HuR protein could be a drug target to improve its function and override this genetic change. This could have important implications for a number of diseases where this genetic variant contributes to disease risk.

      The authors have done a nice job of citing prior work. Details of the experimental protocols are well elaborated and the significance of the findings is well contextualized.

      Weaknesses:

      Authors have addressed prior weaknesses.

    3. Reviewer #2 (Public review):

      This study focuses on the differential binding of the RNA-binding protein HuR to CCL2 transcript (genetic variants rs13900 T or C). The study explores how this interaction influences the stability and translation of CCL2 mRNA. Employing a combination of bioinformatics, reporter assays, binding assays, and modulation of HuR expression, the study proposes that the rs13900T allele confers increased binding to HuR, leading to greater mRNA stability and higher translational efficiency. These findings indicate that rs13900T allele might contribute to heightened disease susceptibility due to enhanced CCL2 expression mediated by HuR. The study is interesting and most results are convincing, however the interpretation relative to RNA transcription and/or stability must be modified, and some data need better presentation or interpretation.

      Major Points

      Figure 2C:

      The authors describe an experiment to assess mRNA stability by labeling nascent RNA with EU for 3 hours, followed by washout of EU, and then incubation with or without actinomycin D for an additional 4 hours before measuring the remaining EU-labeled RNA. While the approach to label nascent RNA with EU is appropriate for tracking RNA decay, I have concerns regarding the use and interpretation of actinomycin D in this context.

      After EU washout, the pool of EU-labeled RNA is fixed and no new EU incorporation can occur. Therefore, the addition of actinomycin D at this stage should not affect the decay rate of the already labeled RNA, as transcription of EU-labeled RNA has effectively ceased. In this design, measuring the decrease in EU-labeled RNA over time reflects mRNA stability (even in absence of actinomycin D) rather than transcriptional activity.

      Therefore, the authors' statement that the non-actinomycin D treatment group represents transcriptional changes is not accurate here. Since EU labeling was stopped prior to the 4-hour incubation, any changes in EU-labeled RNA levels during this period reflect RNA decay, not new transcription.

      In summary:

      To assess transcriptional changes, one would compare the amount of EU-labeled RNA synthesized during the initial labeling period (the first 3 hours), before washout.

      If the authors wish to use actinomycin D to block transcription, this should be done in a separate decay assay without EU labeling.

      In the current experimental setup, actinomycin D is unnecessary after EU washout and does not influence the decay of the labeled RNA.

      I recommend that the authors reconsider the interpretation of their data accordingly. I recommend removing the data points obtained in the presence of actinomycin D, as the non-actinomycin D samples are already representative of post-transcriptional changes given that EU was washed out. If the authors want to assess transcriptional changes, they would have to assess the levels during the initial labeling period (before the washout). Transcriptional differences were not assessed, so I would modify the text accordingly.

      In this context, any changes observed in the actinomycin D-treated samples are likely attributable to general cellular stress induced by actinomycin D, which is known to be highly stressful for cells. This stress could indirectly influence the decay rates of already-labeled EU-RNA.
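
      To make the pulse-chase reasoning above concrete: once EU is washed out, the labeled pool can only shrink, so the fraction remaining at the end of the chase reports on decay alone. The sketch below assumes simple first-order decay (an assumption not stated in the review) and uses invented placeholder fractions, not data from the study; the function and variable names are illustrative only.

```python
import math

def half_life_hours(fraction_remaining, chase_hours):
    """Half-life under first-order decay: N(t) = N0 * exp(-k * t)."""
    if not 0.0 < fraction_remaining < 1.0:
        raise ValueError("fraction_remaining must be strictly between 0 and 1")
    k = -math.log(fraction_remaining) / chase_hours  # decay constant, per hour
    return math.log(2) / k

# Hypothetical fractions of EU-labeled transcript left after a 4-hour chase.
t_half_allele_a = half_life_hours(0.50, 4.0)  # half remaining: 4 h half-life
t_half_allele_b = half_life_hours(0.25, 4.0)  # a quarter remaining: 2 h half-life
```

      Under this model, a difference between alleles in the fraction remaining translates directly into a difference in half-life, with no contribution from new transcription.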

      Figure 4C and 4D:

      The authors provided an updated gel with relative quantification, which effectively shows the enhanced binding of CCL2 mRNA carrying the T variant to HuR, but they only provided it as data for reviewers (Figure R1). I highly recommend using these data in the final manuscript instead of the data currently presented in Figure 4C and 4D. This would be important so as not to create confusion in the reader or raise concerns regarding probe degradation or saturation.

      Minor points

      For the IP, I recommend explaining in the final version why the input was not provided (lack of material) and clarifying that the specific binding of Actin was used as a loading control in the absence of input. This would be highly beneficial for the readers.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Recommendations For The Authors):

      Comment 1: The authors need to do more to cite the prior work of others. CCL2 allelic expression imbalance tied to the rs13900 alleles was first reported by Johnson et al. (Pharmacogenet Genomics. 2008 Sep; 18(9): 781-791) and should be cited in the Introduction on line 128 next to the Pham 2012 reference. Also, in the Results section, line 142, please provide references for the statement "We and others have previously reported a perfect linkage disequilibrium between rs1024611 in the CCL2 cis-regulatory region and rs13900 in its 3′ UTR" since the linkage disequilibrium for these 2 SNPs is not reported in the ENSEMBL server for the 1000 genomes dataset.

      We thank the reviewer for pointing out the omission regarding the citation of prior work. We acknowledge that Johnson et al. (2008) reported the association between rs13900 and CCL2 allelic expression imbalance based on SNaPshot methodology while examining cis-acting variants of 42 candidate genes. To acknowledge these prior studies, we have cited the previous works of Johnson et al. (Johnson et al., 2008) along with Pham et al. (Pham et al., 2012) that linked rs13900 to CCL2 allelic expression imbalance. The text in the introduction section (Lines 128-130) has been updated to reflect the above-mentioned changes.

      “We and others have demonstrated AEI in CCL2 using rs13900 as a marker with the T allele showing a higher expression level relative to C allele (Johnson et al., 2008; Pham et al., 2012).”

      We have cited previous studies that suggested strong linkage disequilibrium between rs1024611 and rs13900 within the CCL2 gene, with D’=1 and R<sup>2</sup>=0.96 (Hubal et al., 2010; Intemann et al., 2011; Kasztelewicz et al., 2017; Pham et al., 2012) on Line 144. To address the concern regarding unreported linkage disequilibrium between rs1024611 and rs13900, we reviewed the pairwise linkage disequilibrium data by population in the ENSEMBL server for the 1000 Genomes dataset and confirmed that linkage disequilibrium (LD) between rs1024611 and rs13900 has been observed, with D’=1 and R<sup>2</sup>=0.92 to 1.0 in specific populations. We have included a table (Author response table 1) depicting pairwise LD between rs13900 and rs1024611 as reported in the ENSEMBL server for the 1000 Genomes dataset, along with a URL reference to the ENSEMBL server data.

      Author response table 1.

      Pairwise linkage disequilibrium data between rs13900 and rs1024611 by population, as reported in the ENSEMBL server for the 1000 Genomes dataset.

      F. Variant, focus variant; R<sup>2</sup>, squared correlation between the pair of loci; D’, normalized difference between the observed and expected frequency of a given haplotype.

      URL: https://www.ensembl.org/Homo_sapiens/Variation/HighLD?db=core;r=17:34252269-34253269;v=rs1024611;vdb=variation;vf=959559590;second_variant_name=rs13900
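
      As an illustration of how the two statistics in the table legend relate, the following minimal sketch computes D, D’ and R<sup>2</sup> for two biallelic loci from haplotype and allele frequencies. The function name and the example frequencies are hypothetical and are not taken from the ENSEMBL data; the sketch only shows why D’=1 can coexist with R<sup>2</sup> slightly below 1 when the two allele frequencies differ.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, D_prime, r_squared) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and allele B (locus 2)
    p_a, p_b: frequencies of allele A and allele B
    """
    # D: observed haplotype frequency minus the frequency expected under independence
    d = p_ab - p_a * p_b
    # D_max: largest |D| that is possible given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r^2: squared correlation between the allelic states at the two loci
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r_squared


# Hypothetical frequencies: every haplotype with allele A also carries allele B,
# but allele B is slightly more common than allele A.
d, d_prime, r_squared = linkage_disequilibrium(p_ab=0.30, p_a=0.30, p_b=0.31)
print(round(d_prime, 3), round(r_squared, 3))
```

      With these made-up inputs D’ reaches its maximum while R<sup>2</sup> stays below 1, which is the same qualitative pattern as the values quoted above.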

      Comment 2: Certain details of the experimental protocols need to be further elaborated or clarified to contextualize the significance of the findings. For example, in the results line 184 the authors state "Using nascent RNA allows accurate determination of mRNA decay by eliminating the effects of preexisting mRNA." How does measuring nascent RNA enable the accurate determination of mRNA decay? Doesn't it measure allele-specific mRNA synthesis? Please elaborate, as this is a key result of the study. Can the authors provide a reference supporting this statement?

      It is worth noting that mRNA decay can be measured precisely once the effect of any preexisting mRNA is eliminated. Metabolic labeling with 4-thiouridine allows exclusive capture of newly synthesized RNA, so RNA decay can be quantified without interference from preexisting RNA. We agree that nascent RNA measurement primarily reflects synthesis rate rather than degradation. However, in conjunction with actinomycin-D-based inhibition studies, it can be exploited for accurate determination of the decay of newly synthesized RNA (Russo et al., 2017). Our aim, therefore, was to use nascent RNA to study decay kinetics. The imbalance in CCL2 allele expression does occur at the transcriptional level, as seen in the group not treated with actinomycin-D (Figure 2C), although the impact of post-transcriptional mechanisms that alter transcript stability cannot be ruled out. We therefore employed a novel approach that could assess both synthesis and degradation by combining actinomycin-D inhibition and nascent RNA capture in the same experimental setup. In the presence of actinomycin-D, we detected a much greater allelic difference in the expression levels of the rs13900 T and C alleles four hours post-treatment, suggesting a role for post-transcriptional mechanisms in CCL2 AEI.

      We have expanded the Methods section in the revised draft to include experimental details on the capture of nascent RNA and subsequent downstream analysis (Lines 553-563).

      “Newly synthesized RNA was isolated using the Click-It Nascent RNA Capture Kit (Invitrogen, Cat No: C10365) following the manufacturer’s protocol. Peripheral blood mononuclear cells (PBMCs) or monocyte-derived macrophages (MDMs) obtained from heterozygous individuals were stimulated with lipopolysaccharide (LPS) for 3 hours in the presence of 0.2 mM 5-ethynyl uridine (EU) (Jao and Salic, 2008; Paulsen et al., 2013). After the pulse, the culture medium was replaced with fresh growth medium devoid of EU. To assess RNA stability, actinomycin-D (5 µg/mL) was added, and samples were collected at 0, 1, 2, and 4 h post-treatment. The EU-labeled RNA was subjected to a click reaction that adds a biotin handle, and the biotinylated RNA was then captured on streptavidin beads. The captured RNA was used for cDNA synthesis (Superscript Vilo kit, Cat No: 11754250), PCR amplification, and allelic quantification.”
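
      To make the decay readout from such a time course concrete, the sketch below estimates a first-order decay constant and half-life from the fraction of labeled transcript remaining at 0, 1, 2, and 4 h. It assumes simple exponential decay and uses made-up values; the function name and the numbers are hypothetical and do not come from the study.

```python
import math


def decay_half_life(times_h, remaining):
    """Fit ln(remaining) = ln(A0) - k * t by ordinary least squares.

    times_h:   sampling times in hours after transcription is blocked
    remaining: transcript abundance at each time (any positive units)
    Returns (k, half_life_h); half_life_h is math.inf if no decay is detected.
    """
    logs = [math.log(x) for x in remaining]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    k = -sxy / sxx  # decay constant is the negative slope
    half_life = math.log(2) / k if k > 0 else math.inf
    return k, half_life


# Hypothetical, noise-free example in which the transcript halves every hour
k, t_half = decay_half_life([0, 1, 2, 4], [1.0, 0.5, 0.25, 0.0625])
print(round(t_half, 3))
```

      Applying the same fit separately to the T-allele and C-allele signals would give one half-life per allele, which is one way to express a stability difference as a single number.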

      Comment 3: Also, they next state that the assay was carried out using cells treated with actinomycin D (line 186). Doesn't actinomycin D block transcription? The original study by Jia et al 2008 in PNAS reported that low concentration of ActD (100 nM) blocked RNA pol I and higher concentration (2 uM) blocked RNA pol II. This or the study on which the InVitrogen kit is based should be cited. The concentration of actinomycin D used to treat the cells should be given. They report that the T allele transcript was more abundant than the C allele transcript in nascent RNA. Why doesn't that argue for a transcriptional mechanism rather than an RNA-stability mechanism? This result should be discussed in the Discussion.

      In our study, we used a concentration of 5 µg/mL (3.98 µM), which, as noted by the reviewer, can effectively inhibit RNA polymerase II (Pol II) activity. We have updated our manuscript to include these details and cited the original work (Jao and Salic, 2008; Paulsen et al., 2013), which thoroughly investigated the effect of various concentrations of ActD on RNA polymerase I and II (Line 557). A discussion of the RNA stability mechanism is provided in the Results section (Lines 196-198).
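
      The conversion between the mass and molar concentrations quoted above can be checked with a short calculation. This sketch assumes a molecular weight of about 1255.4 g/mol for actinomycin D; the function name is illustrative only.

```python
ACTINOMYCIN_D_MW = 1255.4  # g/mol, assumed molecular weight of actinomycin D


def ug_per_ml_to_um(conc_ug_per_ml, mw_g_per_mol):
    """Convert a mass concentration in µg/mL to a molar concentration in µM."""
    # µg/mL equals mg/L; dividing by g/mol gives mmol/L; multiplying by 1000 gives µmol/L
    return conc_ug_per_ml / mw_g_per_mol * 1000.0


print(round(ug_per_ml_to_um(5.0, ACTINOMYCIN_D_MW), 2))  # 3.98
```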

      Comment 4: In their bioinformatics analysis of the allele-specific CCL2 mRNAs, they reported that the analysis obtained a score of 1e (line 214). What does that mean? Is it significant?

      We acknowledge that the notation “a score of 1e” was unclear and thank the reviewer for pointing it out. We have clarified its significance in the revised manuscript. The following text has been included in the Results section (Line 223).

      “The score of 1e was obtained using RBP-Var, a bioinformatics tool that scores variants involved in posttranscriptional interaction and regulation (Mao et al., 2016). Here, the annotation system rates the functional confidence of variants from category 1 to 6. Category 1 is the most significant category and includes variants that are known to be expression quantitative trait loci (eQTLs), likely affecting RBP binding site, RNA secondary structure and expression, whereas category 6 is assigned to variants with minimal possibility of affecting RBP binding. Additionally, subcategories provide further annotation ranging from the most informational variants (a) to the least informational variants (e). The reported 1e denotes that the variant has a motif for RBP binding. Although the scoring system is hierarchical from 1a to 1e, with decreasing confidence in the variant’s function, all the variants in category 1 are considered potentially functional to some degree.”

      Comment 5: In Figure 3A, why is the rare SNP rs181021073 shown? This SNP does not come up anywhere else in the paper. For clarity, it should be removed from Figure 3A.

      We thank the reviewer for pointing out the error in Figure 3A and apologize for the oversight. We agree that the SNP rs1810210732 is not mentioned anywhere in the manuscript and its inclusion in Figure 3A may have caused confusion. We have removed this SNP from the revised figure.

      Comment 6: For the RNA EMSA results presented in Fig. 4C with recombinant ELAVL1 (HuR), there is clearly a loss of unbound T allele probe with increasing concentrations of the recombinant protein (without a concomitant increase in shifted complex). This suggests that the T allele probe is degraded or loses its fluorescent tag in the presence of recombinant HuR, whereas the C allele probe does not. The quantitation of the shifted complex presented in Fig. 4D as a percentage of bound and unbound probe is therefore artificially elevated for the T allele compared to the C allele. In fact, there seems to be little difference between the shifted complexes with the T and C allele probes. The authors should explain this difference in free probe levels.

      We appreciate the reviewer’s constructive critique of the RNA EMSA results in Fig. 4C. To address it, we repeated the experiments to analyze the differential binding of probes bearing the rs13900 T or C allele with increasing concentrations of recombinant HuR. No degradation or loss of the fluorescent tag was noted for the T allele in the presence of recombinant HuR in three independent experiments (Author response image 1). This indicates that the probes with the C or T allele show comparable stability and are not affected by increasing concentrations of recombinant HuR. The apparent reduction in the unbound T allele probe in Figure 4C may be due to saturation at higher HuR concentrations rather than degradation.

      Author response image 1.

      Differential binding and stability of oligoribonucleotide probes containing rs13900C or T alleles with recombinant HuR. (A) REMSA with labeled oligoribonucleotides containing either rs13900C or rs13900T and recombinant HuR at the indicated concentrations. (B&C) Representative quantitative densitometric analysis of HuR binding to the oligoribonucleotides bearing rs13900 T or C. The signal in the bound fractions was normalized to the free probe. The figure represents data from three independent experiments (mean ± SEM).

      Comment 7: In the Methods section, concentrations and source of reagents should be given. For example, what was the bacterial origin of LPS and concentration? What concentration of actinomycin D? What was the source? Was it provided with the nascent RNA kit? In describing the riboprobes used for REMSA, please underline the allele in the sequences (lines 549 and 550).

      Thank you for your detailed feedback and suggestions regarding the Materials and Methods section. We regret the oversight in providing detailed information on reagent concentrations and sources. We have now rectified this omission by providing the necessary details, and a summary of the materials and reagents used is presented as a supplementary table (Supplementary Table 4) to enable others to replicate our experiments accurately. Regarding the description of riboprobes for the RNA Electrophoretic Mobility Shift Assay, we have underlined and bolded the allele in the sequences as suggested (Lines 603-604).

      Comment 8: For polysome profiling on line 603, please provide a protocol for the differentiation of primary macrophages from monocytes (please cite an original protocol, not a prior paper that does not give a detailed protocol).

      We agree with the reviewer’s comment and have included the following text on primary macrophage differentiation from monocytes in the Methods section, citing the original protocol (Line 668).

      “Human monocytes were isolated from fresh blood as described earlier (Gavrilin et al., 2009) with slight modification. Briefly, peripheral blood mononuclear cells were isolated by density gradient centrifugation using Histopaque, followed by immunomagnetic negative selection using EasySep Human Monocyte isolation kit. A high purity level for CD14+ cells was consistently achieved (≥90%) through this procedure, as confirmed by flow cytometry. The purified monocytes were immediately used for macrophage differentiation by treating them with 50 ng/mL M-CSF (PeproTech) for 72 h and flow cytometric measurement of surface markers CD64+, CD206+, CD44 was used to confirm the differentiation.” These data are now shown in the new Supplementary Figure S6.

      Comment 9: In the legend of Figure 2, please replace "5 ug of actinomycin D" with the actual concentration used.

      We appreciate your attention to detail and thank you for pointing out the error in the legend of Figure 2. We regret the oversight and have made the suggested change (Line 739).

      Comment 10: In the Discussion, the authors cite the study of CCL2 mRNA stabilization by HuR in mice by Sasaki et al (lines 407-9). Is regulation of CCL2 mRNA by HuR in the mouse relevant to human studies?

      How conserved is the 3'UTR of mouse and human CCL2? Is the rs13900 variant located in a conserved region? How many putative HuR sites are found in the 3'UTR of human and mouse CCL2 3'UTR? Does HuR dimerize (see Pabis et al 2019, NAR)? This information could be added to the Discussion.

      Thank you for your valuable comment. We appreciate your suggestion to include information on the dimerization of HuR in our discussion. While reporting the overall structure and domain arrangement of HuR, Pabis et al. (2019) identified dimerization involving Trp261 in RRM3 as a key requirement for the functional activity of HuR in vitro. This finding provides additional context for understanding HuR’s role in regulating CCL2 expression. We have added the following lines to the discussion (Lines 421-428), acknowledging HuR’s ability to dimerize and citing the relevant references.

      “HuR consists of three RNA recognition motifs (RRMs) that are highly conserved and canonical in nature (Ripin et al., 2019). In the absence of RNA, the three RRMs are flexibly linked, but upon RNA binding they transition to a more compact arrangement. Mutational analysis revealed that HuR function is inseparably linked to RRM3 dimerization and RNA binding. Dimerization enables recognition of tandem AREs by dimeric HuR (Pabis et al., 2019) and explains how this protein family can regulate numerous targets found in pre-mRNAs, mature mRNAs, miRNAs and long noncoding RNAs.”

      We aligned the CCL2 3’UTR from five different mammalian species and found that the region flanking rs13900/ HuR binding site is relatively conserved (Author response image 2). Based on PAR-CLIP datasets there are four HuR binding regions in human CCL2 3’ UTR (Lebedeva et al., 2011). However, the region overlapping rs13900 seems to be predominantly involved in the CCL2 regulation (Fan et al., 2011). This information has been included in the discussion.

      Author response image 2.

      Cross-species alignment of the CCL2 3’UTR region flanking rs13900 using homologous regions from 5 different mammals (Hu, human; CH, chimpanzee; MO, mouse; RA, rat; DO, dog). rs13900 is shown within brackets; Y, pyrimidine.

      Reviewer #2 (Recommendations For The Authors):

      Comment 1: The supplemental figures need appropriate figure legends.

      We regret the oversight and thank the reviewer for bringing it to our attention. We have now included figure legends for the supplemental figures in the revised manuscript.

      Comment 2: The data on LPS-induced CCL2 expression in PBMCs should be represented as a scatter plot with statistical significance to enhance clarity and interpretability.

      We thank the reviewer for this constructive suggestion. In the revised Figure 2A the induction of CCL2 expression by LPS in PBMCs obtained from 6 volunteers is represented as a scatter plot. We have also included individual data points in the updated figure and statistical significance to improve clarity and interpretability.

      Comment 3: The stability of CCL2 mRNA in control cells needs comparison with treated cells for context. The stability of a housekeeping gene (such as GAPDH or ACTB) should always be included as a control in actinomycin D experiments. Clarify the differential stability of rs13900C vs. rs13900T alleles.

      We used 18S rRNA to normalize data for the mRNA stability studies because it is abundant and has been recommended for such studies: in well-controlled experiments it remains relatively unaltered following Act D treatment when compared with other housekeeping genes (Barta et al., 2023). We also compared 18S Ct values between the Act D-treated and untreated samples in this study and found them to be comparable (Author response image 3).

      Author response image 3.

      Ct values of 18S rRNA in Act D-treated and control samples in Fig. 2.

      Comment 4: In the main text and the methods, the authors state that nascent RNA was obtained in the presence of actinomycin D and EU. However, actinomycin D blocks the transcription of nascent RNAs, therefore the findings in Figure 2C do not reflect nascent RNA

      Please see our response to Reviewer 1 Comment 2. We would like to emphasize that, to assess the differential role of rs13900 in nascent RNA decay, we integrated nascent RNA labeling and transcriptional inhibition. Briefly, PBMCs from a heterozygous individual were either left unstimulated or stimulated with LPS and pulsed with 5-ethynyl uridine (0.2 mM) for 3 h, after which the medium was replaced with EU-free growth medium. RNA was obtained at 0, 1, 2, and 4 h following actinomycin-D treatment (5 µg/mL) to assess the stability of nascent RNA.

      Comment 5: Figure 4A is not clearly described or labeled. What are lanes 2 and 6?

      Figure 4 has now been updated to clearly describe all the lanes. Lanes 2 and 6 represent the mobility shift seen following incubation of whole-cell extracts with oligonucleotide probes bearing rs13900C and rs13900T, respectively.

      Comment 6: Figure 4C and Figure 4D: the charts in Figure 4D do not seem to reflect the changes in Figure 4C. How was the mean variant calculated? How do the authors explain the different quantities in unbound/free RNA in rs13900C compared to rs13900T?

      We appreciate the reviewer’s constructive critique of the RNA EMSA results in Fig. 4C. To address it, we repeated the experiments to analyze the differential binding of rs13900T/C probes with increasing concentrations of recombinant HuR. No degradation or loss of the fluorescent tag was noted for the T allele in the presence of HuR (Author response image 1). This indicates that the C and T allele probes exhibit comparable stability and are not affected by increasing concentrations of recombinant HuR. The apparent reduction in the unbound T allele probe in Figure 4C may be due to saturation at higher HuR concentrations rather than degradation. Please also note that under limiting HuR concentration (50 µM), the T-bearing oligoribonucleotide binds more purified HuR (compare lanes 2 & 6 in Author response image 1).

      Comment 7: Figure 5A does not look like an IP. The authors should show the heavy and light chains and clarify why there is co-precipitation of beta-actin with IgG and HuR. Also, they should include input samples. Figure 5B: given that in a traditional RIP the mRNA is not cross-linked and fragmented, any region of CCL2 mRNA would be amplified, not just the 3'UTR. In other words, Figure 5B can be valuable to show the enrichment of CCL2 mRNA in general, but not the enrichment of a specific region.

      We understand the reviewer’s concern about Figure 5A and 5B. Due to sample limitations, we are unable to confirm these results using heavy- and light-chain antibodies. However, it is important to note that co-precipitation of β-actin with IgG and HuR can be due to its non-specific binding to protein G. In a recent study, non-specific precipitation by protein G or A was reported for proteins such as p53, p65 and β-actin (Zeng et al., 2022). We are including a figure provided by MBL Life Sciences as the quality check document for their RIP Assay Kit (RN 1001), which was used in our study. It is evident from Author response image 4 that even pre-clearing the lysate may not remove ubiquitously expressed proteins such as β-actin or GAPDH, and they will persist as contaminants in pull-down samples. Hence, the presence of β-actin in the IgG and HuR IP fractions may be due to non-specific interactions with the agarose beads.

      Author response image 4.

      MBL RIP-Assay Kit’s Quality Check. Quality check of immunoprecipitated endogenous PTBP1 expressed in Jurkat cells. Lane 1: Jurkat (WB positive cells), Lane 2: Jurkat + normal Rabbit IgG, Lane 3: Jurkat+ anti-PTBP1.

      We agree with the reviewer’s comments that traditional RIP without cross-linking and fragmentation allows amplification of any region of CCL2 mRNA. However, the upregulation of CCL2 gene expression in α-HuR immunoprecipitated samples indirectly reflects the enrichment of CCL2 mRNA associated with HuR. Moreover, 3’-UTR targeting primers were used for amplification to examine HuR binding at this region. We believe this approach ensures that the above enrichment specifically reflects HuR association with the 3’-UTR rather than other parts of the transcript.

      Comment 8: Construct Validation in Luciferase Assays (Figure 6): The authors need to confirm equal transfection amounts of constructs and show changes in luciferase mRNA levels. It would be better to use a dual luciferase construct for internal normalization.

      We would like to thank the reviewer for the concern and comments related to the luciferase reporter assay. As mentioned in the Methods, equal transfection amounts (0.5 µg) were used in our study (Line 658). We chose to normalize the reporter activity using total protein concentration instead of using a dual-reporter system to avoid crosstalk with co-transfected control plasmids. This is now included in the Materials and Methods section (Lines 662-664). The optimized design of the LightSwitch Assay system used in our study allows a single assay design when a highly efficient transfection system is used (as recommended by the manufacturer). We verified the presence of the correct insert in the CCL2 LightSwitch 3’UTR reporter constructs (Author response image 5). We also sequenced the vector backbone of both constructs to rule out any inadvertently introduced mutations.

      Author response image 5.

      Schematic of the LightSwitch 3’UTR vector. (A) Vector information. The vector contains a multiple cloning site (MCS) downstream of the Renilla luciferase gene (RenSP). The human CCL2 3’UTR is cloned into the MCS downstream of the reporter gene and becomes part of a hybrid transcript in which the luciferase coding sequence is fused to the UTR sequence of CCL2. Constructs containing the rs13900C or rs13900T allele were generated by site-specific mutagenesis of the CCL2 LightSwitch 3’UTR reporter. The constructs were validated by Sanger sequencing. (B&C) Sequence chromatograms of the constructs containing the CCL2 3’UTR insert showing rs13900C and rs13900T, respectively. The result confirms the fidelity of the constructs used in the reporter assay.

      Comment 9: Polysome Data Presentation: The authors should present the distribution of luciferase mRNA (rs13900T and rs13900C) in all fractions separately and include data on the translation of a control like ACTB or GAPDH.

      Since our assessment of CCL2 allele-specific enrichment in the polysome fractions from MDMs of heterozygous donors did not yield a consistent pattern of differential loading (Supplementary Table 3), we used a 3’UTR reporter-based assay that estimated the impact of the rs13900 T and C alleles on overall translational output (translatability). Translatability was calculated as luciferase activity normalized by luciferase mRNA levels after adjusting for protein and 18S rRNA, using a previously reported method (Zhang et al., 2017). As the measurement of relative allele enrichment in polysome fractions was not part of our in vitro reporter assays, it is not possible to present the distribution of luciferase mRNA in the various fractions separately. Author response image 6 shows the proportion of CCL2 mRNA in the cytosolic, monosome and polysome fractions obtained from MDM lysates from heterozygous donors, along with 18S rRNA quantification.
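
      One way to read the translatability measure described above is as protein-adjusted reporter activity divided by 18S-adjusted reporter mRNA. The sketch below follows that reading; it is an assumption about the arithmetic, not a reproduction of the method in Zhang et al. (2017), and all function names and input values are hypothetical.

```python
def relative_mrna(ct_target, ct_reference):
    """Relative transcript level by the delta-Ct method (assumes ~100% PCR efficiency)."""
    return 2.0 ** -(ct_target - ct_reference)


def translatability(luminescence, total_protein_ug, ct_luc, ct_18s):
    """Reporter activity per unit of reporter mRNA.

    luminescence:     raw luciferase signal
    total_protein_ug: total protein in the lysate, used to adjust the activity
    ct_luc, ct_18s:   qPCR Ct values for luciferase mRNA and 18S rRNA
    """
    activity = luminescence / total_protein_ug   # protein-adjusted activity
    mrna = relative_mrna(ct_luc, ct_18s)         # 18S-adjusted luciferase mRNA
    return activity / mrna


# Hypothetical readings for the two reporter constructs (not data from the study)
t_allele = translatability(12000.0, 10.0, 22.0, 10.0)
c_allele = translatability(8000.0, 10.0, 22.0, 10.0)
print(t_allele / c_allele)  # 1.5
```

      Because both constructs have the same mRNA level in this made-up example, the ratio reflects the difference in reporter activity alone.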

      Author response image 6.

      Determination of rs13900C/T allelic enrichment in polysome fractions and its effect on polysome loading. Polysome profile obtained by sucrose gradient centrifugation of macrophages before and after stimulation with LPS (1 µg/mL) for 3 h. (A&B) The CCL2 mRNA shifts from monosome-associated fractions to heavier polysomes following LPS stimulation, indicating increased translation efficiency. (C&D) In contrast, the distribution of 18S shows no significant shift due to LPS treatment (mean ± SEM, n=4). The percentage of mRNA loading on polysomes was calculated using the ΔCT method (mean ± SEM, n=4). (E&F) CCL2 AEI measurement in polysomes of macrophages from heterozygous donors (n=2). Genomic DNA and cDNA were subjected to Sanger sequencing, and the peak heights of both alleles were used to determine the relative abundance of each allele.
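
      The percentage of a transcript in each gradient fraction can be derived from Ct values along the following lines. This is a minimal sketch that assumes equal RNA input volume per fraction and roughly 100% amplification efficiency; the function name and Ct values are hypothetical, and the authors’ exact calculation may differ.

```python
def percent_per_fraction(ct_values):
    """Convert per-fraction Ct values into the percentage of transcript in each fraction."""
    reference = min(ct_values)  # most abundant fraction serves as the reference
    # Relative amount in each fraction: 2^-(delta Ct)
    relative = [2.0 ** -(ct - reference) for ct in ct_values]
    total = sum(relative)
    return [100.0 * r / total for r in relative]


# Hypothetical Ct values for cytosolic, monosome and polysome fractions
print(percent_per_fraction([20.0, 21.0, 21.0]))  # [50.0, 25.0, 25.0]
```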

      Comment 10: Please explain in detail how primary monocytes were transfected with siRNAs for more than 72 hours. Typically, primary monocytes are very hard to transfect, have a very limited lifespan in culture (around 48 hours), and show a high level of cell death upon transfection. If monocytes were differentiated from macrophages, explain in detail how it was done and provide supporting citations from the literature.

      We acknowledge the challenges associated with transfecting primary monocytes, including their limited lifespan in culture and susceptibility to cell death following transfection, and we apologize for not elaborating on the lentiviral transduction of primary macrophages in the Methods section. To overcome these limitations, we used monocytes undergoing differentiation into macrophages rather than fully differentiated macrophages for our experiments. Cells were transduced using a slightly modified version of the method described by Plaisance-Bonstaff et al. (2019). Briefly, monocytes were purified by negative selection from PBMCs obtained from donors homozygous for rs13900C or rs13900T. After purification, cells were resuspended in 24-well plates at a seeding density of 0.5 x10<sup>6</sup> cells per well and were further cultured in medium supplemented with 50 ng/mL M-CSF (Fig. S6 and Fig. S7). After 24 h, ready-to-use GFP-tagged pCMV6-HuR or CMV-null lentiviral particles (Amsbio, Cambridge, MA) were transduced into 0.5 x10<sup>6</sup> cells in the presence of polybrene (60 µg/mL) at an MOI of 1. The cells were processed for HuR and CCL2 expression 72 h after transduction, following stimulation with LPS for 3 h. These data are now shown in the new Supplementary Figure S7.

      Comment 11: The authors should prove the binding of HuR to the 3'UTR of CCL2 not only in vitro but also in cells. For this aim, a CLIP including RNA fragmentation followed by RT-PCR or sequencing would be more informative than a RIP. It would be helpful also to demonstrate the different binding to the 3'UTR variants (rs13900C vs. rs13900T).

      We thank the reviewer for the valuable suggestion on validating the binding of HuR to the 3’UTR in cells. It is important to highlight that several independent datasets, including CLIP, have already demonstrated that HuR binds to the 3’UTR of CCL2, including the region spanning the rs13900 locus. We have summarized the relevant studies in tabular form (Supplementary Table 2). We are unable to confirm these results in new experiments due to sample limitations. The existing data and the experimental evidence provided in this manuscript strongly suggest that HuR binds within the 3’UTR. In addition, a previously published study (Fan et al., 2011) showed that only the first 125 bp of the CCL2 3’UTR, which flanks rs13900, bound strongly to HuR, whereas the CCL2 coding region and other regions of the 3’UTR did not. This further suggests that HuR binding to CCL2 is localized to the part of the 3’UTR that flanks rs13900. Please note that the primers used for amplification of the RIP material were 3’UTR-specific.

      Comment 12: To quantify nascent RNA, Figure 2C should be replaced by new experiments. To label nascent RNA, authors can perform a run on/run-off experiments only with EU, without actinomycin D. As aforementioned, ActD blocks the transcription of new RNA, therefore is not useful for studying nascent RNA.

      We thank the reviewer for the suggestion and would like to emphasize that, when measuring the rs13900C/T allelic ratio in nascent RNA, the experimental setup included evaluating the AEI both in the presence and in the absence of the transcriptional inhibitor actinomycin D. The data presented in Figure 2C show that the AEI in the presence of actinomycin D is amplified in comparison with the untreated condition. This provides definitive evidence for our hypothesis that rs13900T confers greater stability on the CCL2 message. We apologize for the oversight of not mentioning the treatment without actinomycin D in the Methods. The necessary changes have been made to the revised manuscript (Lines 553-63).
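
      An allelic ratio of the kind compared here can be derived from Sanger peak heights at the heterozygous position, with the cDNA ratio normalized to the genomic DNA ratio from the same donor to correct for allele-specific differences in peak height. The sketch below shows that common normalization; it is an assumption about the arithmetic rather than the authors’ exact procedure, and the function name and peak heights are hypothetical.

```python
def normalized_aei(cdna_t, cdna_c, gdna_t, gdna_c):
    """Allelic expression imbalance as the T/C ratio in cDNA normalized to gDNA.

    cdna_t, cdna_c: peak heights of the T and C alleles in cDNA
    gdna_t, gdna_c: peak heights of the same alleles in genomic DNA,
                    where the true ratio in a heterozygote is 1:1
    """
    cdna_ratio = cdna_t / cdna_c
    gdna_ratio = gdna_t / gdna_c  # captures assay bias between the two alleles
    return cdna_ratio / gdna_ratio


# Hypothetical peak heights with and without actinomycin D
untreated = normalized_aei(cdna_t=600.0, cdna_c=400.0, gdna_t=500.0, gdna_c=500.0)
treated = normalized_aei(cdna_t=750.0, cdna_c=250.0, gdna_t=500.0, gdna_c=500.0)
print(untreated, treated)  # 1.5 3.0
```

      A value above 1 indicates relatively more T-allele transcript; a larger value after transcriptional inhibition would be consistent with slower decay of the T-allele message.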

      Comment 13: The authors should also investigate the role of TIA1 as a potential RBP and explore the possibility that TIA1 may interact more with the C allele to suppress translation.

      Based on existing studies, we highlighted the importance of RNA-binding proteins such as TIA1 and U2AF56 that may interact with the CCL2 transcript (Lines 408-09). However, exploring TIA1 binding and its functional consequences is beyond the scope of the current study. We thank the reviewer for this comment, and this aspect will be pursued in future studies.

      Comment 14: It would be informative if the authors included study limitations and potential clinical implications of these findings, particularly regarding therapeutic approaches targeting CCL2.

      We would like to inform the reviewer that the submitted manuscript included the limitations of our study. They were discussed at the appropriate places rather than in a separate section. For instance, Line 398 emphasizes the need for in-depth studies of the association between rs13900 and the canonical CCL2 transcript. The need for additional studies regarding SNP-induced structural changes in RNA and their implications for RBP accessibility was highlighted at Lines 417-419. The inconclusive results on differential polysome loading and the need for further research on the impact of rs13900 on CCL2 translatability in primary cells were noted at Lines 457-459. At Lines 484-485, we noted our further studies exploring the differential binding of HuR to other regions of the CCL2 3’UTR.

      Multiple studies have proposed functional interference with HuR as a novel therapeutic strategy, particularly in the context of cancer, inflammation, neurodegeneration, and autoimmune disorders. These approaches include inhibitors such as MS-444, KH-3, and CMLD-2 that disrupt the interaction between HuR and ARE elements or mRNAs of target genes involved in disease pathology (Chaudhary et al., 2023; Fattahi et al., 2022; Lang et al., 2017; Liu et al., 2020; Wang et al., 2019; Wei et al., 2024), offering a potential new avenue for disease treatment. Findings from our studies provide unique insights into the regulation of CCL2 expression by both rs13900 and HuR. We strongly believe that the SNP rs13900 and HuR represent a new druggable target for M/M-mediated disorders such as inflammatory diseases, cancer, and cardiovascular diseases. The potential clinical implications have been discussed in the revised manuscript (Lines 487-494).

      References

      Barta, N., Ordog, N., Pantazi, V., Berzsenyi, I., Borsos, B.N., Majoros, H., Pahi, Z.G., Ujfaludi, Z., Pankotai, T., 2023. Identifying Suitable Reference Gene Candidates for Quantification of DNA Damage-Induced Cellular Responses in Human U2OS Cell Culture System. Biomolecules 13.

      Chaudhary, S., Appadurai, M.I., Maurya, S.K., Nallasamy, P., Marimuthu, S., Shah, A., Atri, P., Ramakanth, C.V., Lele, S.M., Seshacharyulu, P., Ponnusamy, M.P., Nasser, M.W., Ganti, A.K., Batra, S.K., Lakshmanan, I., 2023. MUC16 promotes triple-negative breast cancer lung metastasis by modulating RNA-binding protein ELAVL1/HUR. Breast Cancer Res 25, 25.

      Fan, J., Ishmael, F.T., Fang, X., Myers, A., Cheadle, C., Huang, S.K., Atasoy, U., Gorospe, M., Stellato, C., 2011. Chemokine transcripts as targets of the RNA-binding protein HuR in human airway epithelium. J Immunol 186, 2482-2494.

      Fattahi, F., Ellis, J.S., Sylvester, M., Bahleda, K., Hietanen, S., Correa, L., Lugogo, N.L., Atasoy, U., 2022. HuR-Targeted Inhibition Impairs Th2 Proinflammatory Responses in Asthmatic CD4(+) T Cells. J Immunol 208, 38-48.

      Hubal, M.J., Devaney, J.M., Hoffman, E.P., Zambraski, E.J., Gordish-Dressman, H., Kearns, A.K., Larkin, J.S., Adham, K., Patel, R.R., Clarkson, P.M., 2010. CCL2 and CCR2 polymorphisms are associated with markers of exercise-induced skeletal muscle damage. J Appl Physiol (1985) 108, 1651-1658.

      Intemann, C.D., Thye, T., Forster, B., Owusu-Dabo, E., Gyapong, J., Horstmann, R.D., Meyer, C.G., 2011. MCP1 haplotypes associated with protection from pulmonary tuberculosis. BMC Genet 12, 34.

      Jao, C.Y., Salic, A., 2008. Exploring RNA transcription and turnover in vivo by using click chemistry. Proc Natl Acad Sci U S A 105, 15779-15784.

      Johnson, A.D., Zhang, Y., Papp, A.C., Pinsonneault, J.K., Lim, J.E., Saffen, D., Dai, Z., Wang, D., Sadee, W., 2008. Polymorphisms affecting gene transcription and mRNA processing in pharmacogenetic candidate genes: detection through allelic expression imbalance in human target tissues. Pharmacogenet Genomics 18, 781-791.

      Kasztelewicz, B., Czech-Kowalska, J., Lipka, B., Milewska-Bobula, B., Borszewska-Kornacka, M.K., Romanska, J., Dzierzanowska-Fangrat, K., 2017. Cytokine gene polymorphism associations with congenital cytomegalovirus infection and sensorineural hearing loss. Eur J Clin Microbiol Infect Dis 36, 1811-1818.

      Lang, M., Berry, D., Passecker, K., Mesteri, I., Bhuju, S., Ebner, F., Sedlyarov, V., Evstatiev, R., Dammann, K., Loy, A., Kuzyk, O., Kovarik, P., Khare, V., Beibel, M., Roma, G., Meisner-Kober, N., Gasche, C., 2017. HuR Small-Molecule Inhibitor Elicits Differential Effects in Adenomatosis Polyposis and Colorectal Carcinogenesis. Cancer Res 77, 2424-2438.

      Lebedeva, S., Jens, M., Theil, K., Schwanhausser, B., Selbach, M., Landthaler, M., Rajewsky, N., 2011. Transcriptome-wide analysis of regulatory interactions of the RNA-binding protein HuR. Mol Cell 43, 340-352.

      Liu, S., Huang, Z., Tang, A., Wu, X., Aube, J., Xu, L., Xing, C., Huang, Y., 2020. Inhibition of RNA-binding protein HuR reduces glomerulosclerosis in experimental nephritis. Clin Sci (Lond) 134, 1433-1448.

      Mao, F., Xiao, L., Li, X., Liang, J., Teng, H., Cai, W., Sun, Z.S., 2016. RBP-Var: a database of functional variants involved in regulation mediated by RNA-binding proteins. Nucleic Acids Res 44, D154-163.

      Pabis, M., Popowicz, G.M., Stehle, R., Fernandez-Ramos, D., Asami, S., Warner, L., Garcia-Maurino, S.M., Schlundt, A., Martinez-Chantar, M.L., Diaz-Moreno, I., Sattler, M., 2019. HuR biological function involves RRM3-mediated dimerization and RNA binding by all three RRMs. Nucleic Acids Res 47, 1011-1029.

      Paulsen, M.T., Veloso, A., Prasad, J., Bedi, K., Ljungman, E.A., Tsan, Y.C., Chang, C.W., Tarrier, B., Washburn, J.G., Lyons, R., Robinson, D.R., Kumar-Sinha, C., Wilson, T.E., Ljungman, M., 2013. Coordinated regulation of synthesis and stability of RNA during the acute TNF-induced proinflammatory response. Proc Natl Acad Sci U S A 110, 2240-2245.

      Pham, M.H., Bonello, G.B., Castiblanco, J., Le, T., Sigala, J., He, W., Mummidi, S., 2012. The rs1024611 regulatory region polymorphism is associated with CCL2 allelic expression imbalance. PLoS One 7, e49498.

      Plaisance-Bonstaff, K., Faia, C., Wyczechowska, D., Jeansonne, D., Vittori, C., Peruzzi, F., 2019. Isolation, Transfection, and Culture of Primary Human Monocytes. J Vis Exp.

      Ripin, N., Boudet, J., Duszczyk, M.M., Hinniger, A., Faller, M., Krepl, M., Gadi, A., Schneider, R.J., Sponer, J., Meisner-Kober, N.C., Allain, F.H., 2019. Molecular basis for AU-rich element recognition and dimerization by the HuR C-terminal RRM. Proc Natl Acad Sci U S A 116, 2935-2944.

      Russo, J., Heck, A.M., Wilusz, J., Wilusz, C.J., 2017. Metabolic labeling and recovery of nascent RNA to accurately quantify mRNA stability. Methods 120, 39-48.

      Wang, J., Hjelmeland, A.B., Nabors, L.B., King, P.H., 2019. Anti-cancer effects of the HuR inhibitor, MS-444, in malignant glioma cells. Cancer Biol Ther 20, 979-988.

      Wei, L., Kim, S.H., Armaly, A.M., Aube, J., Xu, L., Wu, X., 2024. RNA-binding protein HuR inhibition induces multiple programmed cell death in breast and prostate cancer. Cell Commun Signal 22, 580.

      Zeng, X., Zeng, W.H., Zhou, J., Liu, X.M., Huang, G., Zhu, H., Xiao, S., Zeng, Y., Cao, D., 2022. Removal of nonspecific binding proteins is required in co-immunoprecipitation with nuclear proteins. Biotechniques 73, 289-296.

      Zhang, X., Chen, X., Liu, Q., Zhang, S., Hu, W., 2017. Translation repression via modulation of the cytoplasmic poly(A)-binding protein in the inflammatory response. Elife 6.

    1. eLife Assessment

      This study is a fundamental advance in the field of developmental biology and transcriptional regulation that demonstrates the use of hPSC-derived organoids to generate reproducible organoids to study the mechanisms that drive neural tube closure. The work is exceptional and solid, providing both technical advances and new knowledge on human development through embryo models.

    2. Reviewer #1 (Public review):

      Summary:

      This is a wonderful and landmark study in the field of human embryo modeling. It uses patterned human gastruloids and conducts a functional screen on neural tube closure, and identifies positive and negative regulators, and defines the epistasis among them.

      Strengths:

      The above was achieved following optimization of the micro-pattern-based gastruloid protocol to achieve high efficiency, and then optimized to conduct and deliver CRISPRi without disrupting the protocol. This is a technical tour de force as well as one of the first studies to reveal new knowledge on human development through embryo models, which has not been done before.

      The manuscript is very solid and well-written. The figures are clear, elegant, and meaningful. The conclusions are fully supported by the data shown. The methods are well-detailed, which is very important for such a study.

      Weaknesses:

      This reviewer did not identify any meaningful, major, or minor caveats that need addressing or correcting.

      A minor weakness is that one can never find out whether the findings from human embryo models can be revalidated in humans in vivo. This is for obvious and justified ethical reasons. However, the authors acknowledge this point in the section of the manuscript detailing the limitations of their study.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript is a technical report on a new model of early neurogenesis, coupled to a novel platform for genetic screens. The model is more faithful than others published to date, and the screening platform is an advance over existing ones in terms of speed and throughput.

      Strengths:

      It is novel and useful.

      Weaknesses:

      The novelty of the results is limited in terms of biology, mainly a proof of concept of the platform and a very good demonstration of the hierarchical interactions of the top regulators of GRNs.

      The value of the manuscript could be enhanced in two ways:

      (1) by showing its versatility and transforming the level of neural tube to midbrain and hindbrain, and looking at the transcriptional hierarchies there.

      (2) by relating the patterning of the organoids to the situation in vivo, in particular with the information in reference 49. The authors make a statement "To compare our findings with in vivo gene expression patterns, we applied the same approach to published scRNA-seq data from 4-week-old human embryos at the neurula stage" but it would be good to have a more nuanced reference: what stage, what genes are missing, what do they add to the information in that reference?

    1. eLife Assessment

      In this important study, the authors engineered and characterised novel genetically encoded calcium indicators (GECIs) and an analytical tool (CaFire) capable of reporting and quantifying various sub-synaptic events, including miniature synaptic events, with a speed and sensitivity approaching that of intracellular electrophysiological recordings. While the evidence supporting the improvements in the speed and accuracy of these tools is convincing, including additional information about key imaging parameters, the Bar8f experiments, and CaFire would strengthen the study. This work will be of interest to neurobiologists studying synaptic calcium dynamics in various model systems.

    2. Reviewer #1 (Public review):

      Summary:

      Chen et al. engineered and characterized a suite of next-generation GECIs for the Drosophila NMJ that allow for the visualization of calcium dynamics within the presynaptic compartment, at presynaptic active zones, and in the postsynaptic compartment. These GECIs include ratiometric presynaptic Scar8m (targeted to synaptic vesicles), ratiometric active zone localized Bar8f (targeted to the scaffold molecule BRP), and postsynaptic SynapGCaMP8m. The authors demonstrate that these new indicators are a large improvement on the widely used GCaMP6 and GCaMP7 series GECIs, with increased speed and sensitivity. They show that presynaptic Scar8m accurately captures presynaptic calcium dynamics with superior sensitivity to the GCaMP6 and GCaMP7 series and with similar kinetics to chemical dyes. The active-zone targeted Bar8f sensor was assessed for the ability to detect release-site-specific nanodomain changes, but the authors concluded that this sensor is still too slow to accurately do so. Lastly, the use of postsynaptic SynapGCaMP8m was shown to enable the detection of quantal events with similar resolution to electrophysiological recordings. Finally, the authors developed a Python-based analysis software, CaFire, that enables automated quantification of evoked and spontaneous calcium signals. These tools will greatly expand our ability to detect activity at individual synapses without the need for chemical dyes or electrophysiology.

      Strengths:

      (1) In this study, the authors rigorously compare their newly engineered GECIs to those previously used at the Drosophila NMJ, highlighting improvements in localization, speed, and sensitivity. These comparisons appropriately substantiate the authors' claim that their GECIs are superior to those currently in use.

      (2) The authors demonstrate the ability of Scar8m to capture subtle changes in presynaptic calcium resulting from differences between MN-Ib and MN-Is terminals and from the induction of presynaptic homeostatic potentiation (PHP), rivaling the sensitivity of chemical dyes.

      (3) The improved postsynaptic SynapGCaMP8m is shown to approach the resolution of electrophysiology in resolving quantal events.

      (4) The authors created a publicly available pipeline that streamlines and standardizes analysis of calcium imaging data.

      Weaknesses:

      (1) Given the superior performance of GCaMP8m in the vesicle-tethered and postsynaptic applications, an analysis of its functionality at individual active zones ("Bar8m") would be a useful addition to this compendium, especially since the authors show that the faster kinetics of GCaMP8f are still not fast enough to resolve active zone-specific calcium dynamics.

      (2) Description of the CaFire pipeline could be clearer (for example, what exactly is the role of Excel?), and the GitHub user guide could be more fleshed out (with the addition of example ImageJ scripts and analyzed images).

    3. Reviewer #2 (Public review):

      Summary:

      Calcium ions play a key role in synaptic transmission and plasticity. To improve calcium measurements at synaptic terminals, previous studies have targeted genetically encoded calcium indicators (GECIs) to pre- and postsynaptic locations. Here, Chen et al. improve these constructs by incorporating the latest GCaMP8 sensors and a stable red fluorescent protein to enable ratiometric measurements. In addition, they develop a new analysis platform, 'CaFire', to facilitate automated quantification. Using these tools, the authors demonstrate favorable properties of their sensors relative to earlier constructs. Impressively, by positioning postsynaptic GCaMP8m near glutamate receptors, they show that their sensors can report miniature synaptic events with speed and sensitivity approaching that of intracellular electrophysiological recordings. These new sensors and the analysis platform provide a valuable tool for resolving synaptic events using all-optical methods.

      Strengths:

      The authors present a rigorous characterization of their sensors using well-established assays. They employ immunostaining and super-resolution STED microscopy to confirm correct subcellular targeting. Additionally, they quantify response amplitude, rise and decay kinetics, and provide side-by-side comparisons with earlier-generation GECIs. Importantly, they show that the new sensors can reproduce known differences in evoked Ca²⁺ responses between distinct nerve terminals. Finally, they present what appears to be the first simultaneous calcium imaging and intracellular mEPSP recording to directly assess the sensitivity of different sensors in detecting individual miniature synaptic events.

      Weaknesses:

      Major points:

      (1) While the authors rigorously compared the response amplitude, rise, and decay kinetics of several sensors, key parameters like brightness and photobleaching rates are not reported. I feel that including this information is important as synaptically tethered sensors, compared to freely diffusible cytosolic indicators, can be especially prone to photobleaching, particularly under the high-intensity illumination and high-magnification conditions required for synaptic imaging. Quantifying baseline brightness and photobleaching rates would add valuable information for researchers intending to adopt these tools, especially in the context of prolonged or high-speed imaging experiments.

      (2) In several places, the authors compare the performance of their sensors with synthetic calcium dyes, but these comparisons are based on literature values rather than on side-by-side measurements in the same preparation. Given differences in imaging conditions across studies (e.g., illumination, camera sensitivity, and noise), parameters like indicator brightness, SNR, and photobleaching are difficult to compare meaningfully. Additionally, the limited frame rate used in the present study may preclude accurate assessment of rise times relative to fast chemical dyes. These issues weaken the claim made in the abstract that "...a ratiometric presynaptic GCaMP8m sensor accurately captures .. Ca²⁺ changes with superior sensitivity and similar kinetics compared to chemical dyes." The authors should clearly acknowledge these limitations and soften their conclusions. A direct comparison in the same system, if feasible, would greatly strengthen the manuscript.

      (3) The authors state that their indicators can now achieve measurements previously attainable with chemical dyes and electrophysiology. I encourage the authors to also consider how their tools might enable new measurements beyond what these traditional techniques allow. For example, while electrophysiology can detect summed mEPSPs across synapses, imaging could go a step further by spatially resolving the synaptic origin of individual mEPSP events. One could, for instance, image MN-Ib and MN-Is simultaneously without silencing either input, and detect mEPSP events specific to each synapse. This would enable synapse-specific mapping of quantal events - something electrophysiology alone cannot provide. Demonstrating even a proof-of-principle along these lines could highlight the unique advantages of the new tools by showing that they not only match previous methods but also enable new types of measurements.

      (4) For ratiometric measurements, it is important to estimate and subtract background signals in each channel. Without this correction, the computed ratio may be skewed, as background adds an offset to both channels and can distort the ratio. However, it is not clear from the Methods section whether, or how, background fluorescence was measured and subtracted.

      (5) At line 212, the authors claim "... GCaMP8m showing 345.7% higher SNR over GCaMP6s....(Fig. 3D and E) ", yet the cited figure panels do not present any SNR quantification. Figures 3D and E only show response amplitudes and kinetics, which are distinct from SNR. The methods section also does not describe details for how SNR was defined or computed.

      (6) Lines 285-287 "As expected, summed ΔF values scaled strongly and positively with AZ size (Fig. 5F), reflecting a greater number of Cav2 channels at larger AZs". I am not sure about this conclusion. A positive correlation between summed ΔF values and AZ size could simply reflect more GCaMP molecules in larger AZs, which would give rise to larger total fluorescence change even at a given level of calcium increase.

      (7) Lines 313-314: "SynapGCaMP quantal signals appeared to qualitatively reflect the same events measured with electrophysiological recordings (Fig. 6D)." This statement is quite confusing. In Figure 6D, the corresponding calcium and ephys traces look completely different and appear to reflect distinct sets of events. It was only after reading Figure 7 that I realized the traces shown in Figure 6D might not have been recorded simultaneously. The authors should clarify this point.

      (8) Lines 310-313: "SynapGCaMP8m .... striking an optimal balance between speed and sensitivity", and Lines 314-316: "We conclude that SynapGCaMP8m is an optimal indicator to measure quantal transmission events at the synapse." Statements like these are subjective. In the authors' own comparison, GCaMP8m is significantly slower than GCaMP8f (at least in terms of decay time), despite having a moderately higher response amplitude. It is therefore unclear why GCaMP8m is considered 'optimal'. The authors should clarify this point or explain their rationale for prioritizing response amplitude over speed in the context of their application.
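
      Points (4) and (5) above turn on two computations that the Methods reportedly leave unspecified: per-channel background subtraction before forming a ratio, and an explicit SNR definition. The following is only a minimal sketch of one common way to do each, not the authors' or CaFire's implementation. All function names and intensity values are hypothetical, and the SNR definition used here (peak deviation from baseline divided by the baseline standard deviation) is an assumption, since the manuscript's definition is not given.

```python
from statistics import mean, stdev


def background_subtracted_ratio(green, red, green_bg, red_bg):
    """Per-frame green/red ratio after subtracting each channel's background."""
    return [(g - green_bg) / (r - red_bg) for g, r in zip(green, red)]


def fractional_change(trace, n_baseline):
    """Change relative to the mean of the first n_baseline frames."""
    base = mean(trace[:n_baseline])
    return [(x - base) / base for x in trace]


def peak_snr(trace, n_baseline):
    """Assumed SNR definition: peak deviation from baseline / baseline SD."""
    base = trace[:n_baseline]
    return (max(trace) - mean(base)) / stdev(base)


# Hypothetical raw intensities (arbitrary units); first 4 frames are baseline.
green = [110.0, 112.0, 108.0, 110.0, 210.0, 160.0]
red = [150.0] * 6
green_bg, red_bg = 10.0, 50.0  # hypothetical background levels

raw_ratio = [g / r for g, r in zip(green, red)]
corrected_ratio = background_subtracted_ratio(green, red, green_bg, red_bg)

# Uncorrected background offsets both channels and compresses the ratio change.
raw_change = max(fractional_change(raw_ratio, 4))
corrected_change = max(fractional_change(corrected_ratio, 4))

# Amplitude and SNR are separate quantities: SNR also depends on baseline noise.
snr = peak_snr(green, 4)
```

      In this toy trace the background-corrected ratio shows a larger fractional change than the uncorrected one, which is the distortion point (4) describes. The `peak_snr` value depends on baseline noise as well as amplitude, which is why amplitude panels alone cannot stand in for an SNR measurement.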

    4. Reviewer #3 (Public review):

      Genetically encoded calcium indicators (GECIs) are essential tools in neurobiology and physiology. Technological constraints in targeting and kinetics of previous versions of GECIs have limited their application at the subcellular level. Chen et al. present a set of novel tools that overcome many of these limitations. Through systematic testing in the Drosophila NMJ, they demonstrate improved targeting of GCaMP variants to synaptic compartments and report enhanced brightness and temporal fidelity using members of the GCaMP8 series. These advancements are likely to facilitate more precise investigation of synaptic physiology.

      This is a comprehensive and detailed manuscript that introduces and validates new GECI tools optimized for the study of neurotransmission and neuronal excitability. These tools are likely to be highly impactful across neuroscience subfields. The authors are commended for publicly sharing their imaging software.

      This manuscript could be improved by further testing the GECIs across physiologically relevant ranges of activity, including at high frequency and over long imaging sessions. The authors provide a custom software package (CaFire) for Ca2+ imaging analysis; however, to improve clarity and utility for future users, we recommend providing references to existing Ca2+ imaging tools for context and elaborating on some conceptual and methodological aspects, with more guidance for broader usability. These enhancements would strengthen this already strong manuscript.

    1. eLife Assessment

      This important study describes a non-canonical role for IκBα in regulating mouse embryonic stem cell pluripotency and differentiation, independent of the classical NF-κB pathway. The conclusions are convincingly supported through orthogonal approaches and separation of function mutants. The findings add new insight into pluripotency regulation in mouse cells.

    2. Reviewer #1 (Public review):

      Summary:

      This study probes the role of the NF-κB inhibitor IκBa in the regulation of pluripotency in mouse embryonic stem cells (mESCs). It follows from previous work that identified a chromatin-specific role for IκBa in the regulation of tissue stem cell differentiation. The work presented here shows that a fraction of IκBa specifically associates with chromatin in pluripotent stem cells. Using three Nfkbia-knockout lines, the authors show that IκBa ablation impairs the exit from pluripotency, with embryonic bodies (an in vitro model of mESC multi-lineage differentiation) still expressing high levels of pluripotency markers after sustained exposure to differentiation signals. The maintenance of aberrant pluripotency gene expression under differentiation conditions is accompanied by pluripotency-associated epigenetic profiles of DNA methylation and histone marks. Using elegant separation of function mutants identified in a separate study, the authors generate versions of IκBa that are either impaired in histone/chromatin binding or NF-κB binding. They show that the provision of the WT IκBa, or the NF-κB-binding mutant can rescue the changes in gene expression driven by loss of IκBa, but the chromatin-binding mutant cannot. Thus the study identifies a chromatin-specific, NF-κB-independent role of IκBa as a regulator of exit from pluripotency.

      Strengths:

      The strengths of the manuscript lie in: (a) the use of several orthogonal assays to support the conclusions on the effects of exit from pluripotency; (b) the use of three independent clonal Nfkbia-KO mESC lines (lacking IκBa), which increase confidence in the conclusions; and (c) the use of separation of function mutants to determine the relative contributions of the chromatin-associated and NF-κB-associated IκBa, which would otherwise be very difficult to unpick.

      Weaknesses:

      No notable weaknesses remain in this revised version.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript investigates the role of IκBα in regulating mouse embryonic stem cell (ESC) pluripotency and differentiation. The authors demonstrate that IκBα knockout impairs the exit from the naïve pluripotent state during embryoid body differentiation. Through mechanistic studies using various mutants, they show that IκBα regulates ESC differentiation through chromatin-related functions, independent of the canonical NF-κB pathway.

      Strengths:

      The authors nicely investigate the role of IκBα in pluripotency exit, using embryoid body formation and complementing the phenotypic analysis with a number of genome-wide approaches, including transcriptomic, histone marks deposition, and DNA methylation analyses. Moreover, they generate a first-of-its-kind mutant set that allows them to uncouple IκBα's function in chromatin regulation versus its NF-κB-related functions. This work contributes to our understanding of cellular plasticity and development, potentially interesting a broad audience including developmental biologists, chromatin biology researchers, and cell signaling experts.

      Weaknesses:

      Future experiments will likely help establish a more direct mechanistic link between IκBα activity and the chromatin remodeling events observed in pluripotent cells.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study probes the role of the NF-κB inhibitor IκBa in the regulation of pluripotency in mouse embryonic stem cells (mESCs). It follows from previous work that identified a chromatin-specific role for IκBa in the regulation of tissue stem cell differentiation. The work presented here shows that a fraction of IκBa specifically associates with chromatin in pluripotent stem cells. Using three Nfkbia-knockout lines, the authors show that IκBa ablation impairs the exit from pluripotency, with embryonic bodies (an in vitro model of mESC multi-lineage differentiation) still expressing high levels of pluripotency markers after sustained exposure to differentiation signals. The maintenance of aberrant pluripotency gene expression under differentiation conditions is accompanied by pluripotency-associated epigenetic profiles of DNA methylation and histone marks. Using elegant separation of function mutants identified in a separate study, the authors generate versions of IκBa that are either impaired in histone/chromatin binding or NF-κB binding. They show that the provision of the WT IκBa, or the NF-κB-binding mutant can rescue the changes in gene expression driven by loss of IκBa, but the chromatin-binding mutant cannot. Thus the study identifies a chromatin-specific, NF-κB-independent role of IκBa as a regulator of exit from pluripotency.

      Strengths:

      The strengths of the manuscript lie in: (a) the use of several orthogonal assays to support the conclusions on the effects of exit from pluripotency; (b) the use of three independent clonal Nfkbia-KO mESC lines (lacking IκBa), which increase confidence in the conclusions; and (c) the use of separation of function mutants to determine the relative contributions of the chromatin-associated and NF-κB-associated IκBa, which would otherwise be very difficult to unpick.

      Weaknesses:

      In this reviewer's view, the term "differentiation" is used inappropriately in this manuscript. The data showing aberrant expression of pluripotency markers during embryoid body formation are supported by several lines of evidence and are convincing. However, the authors call the phenotype of Nfkbia-KO cells a "differentiation impairment" while the data on differentiation markers are not shown (beyond the fact that H3K4me1, marking poised enhancers, is reduced in genes underlying GO processes associated with differentiation and organ development). Data on differentiation marker expression from the transcriptomic and embryoid body immunofluorescent experiments, for example, should be at hand without the need to conduct many more experiments and would help to support the conclusions of the study or make them more specific. The lack of probing the differentiation versus pluripotency genes may be a missed opportunity in gaining in-depth understanding of the phenotype associated with loss of the chromatin-associated function of IκBa.

      Reviewer #2 (Public review):

      Summary:

      This manuscript investigates the role of IκBα in regulating mouse embryonic stem cell (ESC) pluripotency and differentiation. The authors demonstrate that IκBα knockout impairs the exit from the naïve pluripotent state during embryoid body differentiation. Through mechanistic studies using various mutants, they show that IκBα regulates ESC differentiation through chromatin-related functions, independent of the canonical NF-κB pathway.

      Strengths:

      The authors nicely investigate the role of IκBα in pluripotency exit, using embryoid body formation and complementing the phenotypic analysis with a number of genome-wide approaches, including transcriptomic, histone marks deposition, and DNA methylation analyses. Moreover, they generate a first-of-its-kind mutant set that allows them to uncouple IκBα's function in chromatin regulation versus its NF-κB-related functions. This work contributes to our understanding of cellular plasticity and development, potentially interesting a broad audience including developmental biologists, chromatin biology researchers, and cell signaling experts.

      Weaknesses:

      - The study's main limitation is the lack of crucial controls using bona fide naïve cells across key experiments, including DNA methylation analysis, gene expression profiling in embryoid bodies, and histone mark deposition. This omission makes it difficult to evaluate whether the observed changes in IκBα-KO cells truly reflect naïve pluripotency characteristics.

      - Several conclusions in the manuscript require a more measured interpretation. The authors should revise their statements regarding the strength of the pluripotency exit block, the extent of hypomethylation, and the global nature of chromatin changes.

      - From a methodological perspective, the manuscript would benefit from additional orthogonal approaches to strengthen the knockout findings, which may be influenced by clonal expansion of ES cells.

      Overall, this study makes an important contribution to the field. However, the concerns raised regarding controls, data interpretation, and methodology should be addressed to strengthen the manuscript and support the authors' conclusions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I have the following comments and suggestions for the authors to consider:

      (1) Fig. 1D: the number of replicates for this experiment is not mentioned. It would be good to see if the apparent accumulation of IκBa on chromatin of S/L cells is reproducible. If it is, does the accumulation of IκBa "prime" chromatin for differentiation?

      We apologize for missing this information in the figure legend. We have repeated the experiment two independent times, and confirmed the localization of IκBα in the chromatin fraction of mESCs cultured in Serum/LIF (S/L). We have included the information in the figure legend.

      Regarding the second question, we do believe that the presence of IκBα primes mESCs to exit from pluripotency. Previous data from the lab (Mulero et al., Cancer Cell 2012; Marruecos et al., EMBO Reports 2020) demonstrated that IκBα regulates important developmental genes (Hox genes and differentiation-related genes), which become dysregulated upon IκBα depletion. Based on those previous results, together with our results demonstrating that lack of IκBα hyperactivates the pluripotency network, we conclude that IκBα is a crucial element to attenuate pluripotency programs, allowing a successful exit from naïve pluripotency and differentiation.

      (2) Fig. 1E: From what is shown, Rela doesn't agree (i.e. no enrichment in EpiSCs in the Atlasi data). Are the culture conditions in Atlasi 2020 the same as in this paper (base medium etc.)? Also, why not label all genes/proteins that are shown in 1C?

      Differences observed between our data and the in-silico data might be due to differences in the culture conditions used by Atlasi and colleagues. In particular, Atlasi et al. cultured the mESCs in 2i/LIF for 2 consecutive months, whereas we induced the ground state of naïve pluripotency (2i/LIF) for only 96h. In the case of EpiSC differentiation, similar protocols are used in both our work and in Atlasi et al. Nevertheless, despite existing differences, in both studies IκBα is enriched in the ground state of naïve pluripotency.

      Some proteins that appear in Figure 1C are missing from Figure 1E because they were not detected in the mass spectrometry experiment.

      (3) Fig. 1F: The word "clustering" here is misleading. While Nfkbia shows similar dynamics as pluripotency genes, clustering should not be used unless clusters of genes are shown in the same heatmap (and the transcripts naturally cluster together). The figure would be even more informative if all the genes from the 4 different categories were presented on the same heatmap.

      As suggested by the reviewer, we have generated a heatmap in which the genes from the four different categories (Figure 1F) are displayed and clustered together:

      Author response image 1.

      Heatmap including all the genes from Figure 1F of the manuscript and clustering is simultaneously conducted over the four categories.
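
      A simple way to ask whether one gene's expression dynamics resemble one category more than another, in the spirit of the joint clustering described above, is to correlate its profile with each category's mean profile. The sketch below is illustrative only: the expression values are toy numbers invented for the example (not data from the study), the names `pearson`, `mean_profile` and `closest_category` are hypothetical, and correlation to a category mean is a stand-in for the hierarchical clustering behind the heatmap, not a reproduction of it.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def mean_profile(profiles):
    """Average several gene profiles time point by time point."""
    return [sum(column) / len(column) for column in zip(*profiles)]


def closest_category(query, categories):
    """Return the category whose mean profile correlates best with the query."""
    scores = {name: pearson(query, mean_profile(genes))
              for name, genes in categories.items()}
    return max(scores, key=scores.get), scores


# Toy values (assumption): three time points, e.g. mESC, 48h EB, 96h EB.
categories = {
    "pluripotency": [[9.0, 6.0, 3.0], [8.5, 5.0, 2.5]],
    "differentiation": [[2.0, 5.0, 8.0], [1.5, 4.0, 9.0]],
}
query_gene = [7.0, 4.5, 3.0]  # hypothetical gene that decreases over time

best, scores = closest_category(query_gene, categories)
print(best, scores)
```

      With these toy numbers the query gene correlates positively with the pluripotency mean and negatively with the differentiation mean, which is the kind of pattern the response describes for Nfkbia.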

      As shown in the heatmap above, most of the NF-κB genes (except for Nfkbia and Nfkbid) clustered together with differentiation markers.

      Nonetheless, to stay closer to the original Figure 1F and to keep the gene categories clear, we have updated the figure with a combined heatmap, sliced by gene category. In this updated version, we can observe that expression of the IκBα gene, though classified under the biological process to which it classically belongs (NF-κB pathway), is higher at pluripotency and decreases upon differentiation induction, similarly to most of the pluripotency genes.

      We have also changed the text accordingly and have added the following sentences in the main text (lines 121-125): “The expression pattern of Nfkbia was similar to the pluripotency genes whereas most of the NF-κB genes were upregulated upon differentiation, showing an analogous expression dynamics as developmental genes, as previously described”.

      (4) This reviewer felt that the statement "Notably, several polycomb elements were highly expressed in mESCs, consistent with the possibility that chromatin-bound IκBα modulates PRC2 activity in the pluripotent state" (p.5, lines 125-127) is premature here. While similar expression dynamics may be consistent with a linked function, they in no way suggest this. This can be more accurately stated to point out that Nfkbia shows similar expression dynamics in pluripotency and differentiation as Polycomb component genes.

      We agree that the statement is premature and we have replaced it with: “Previous reports have demonstrated that chromatin-bound IκBα modulates PRC2 activity in different adult stem cell models [27]. Interestingly, we observed that most of the Polycomb target genes follow a similar expression pattern of Nfkbia and pluripotency, with higher expression in mESCs (Figure 1F).” (lines 125-128 in the manuscript).

      (5) Top of p. 6: the results are mis-attributed to Fig. 1, it should be Fig. 2.

      We thank the reviewer for this observation. We have corrected it in the main text.

      (6) Fig. 1B and Fig. 5I: the images of the AP stains are very difficult to see, better resolution images should be used.

      We have increased both the resolution and the size of the AP colonies.

      (7) Line 142 (p.6): Fig. S1B should be S1C. In general the manuscript would benefit from review of the order and labeling of the figure panels as there are a number of inconsistencies.

      We have better organized the figures in the new version of the manuscript. In particular, we have reorganized the Figure S1 to have a more logical order. We have done the same for the Figure 2 and Figure 5 and they are updated in the new version of the reviewed manuscript.

      (8) The authors call the phenotype of Nfkbia-KO cells a "differentiation impairment". Do the EBs shown in Fig. 2 also express differentiation markers? Do they fail to up-regulate those markers or just fail to down-regulate pluripotency markers? At the transcriptomic level the Nfkbia-KO cells still change significantly upon provision of differentiation signals (Fig. 2C), what types of gene processes underlie the differences between WT and KO cells and which processes are common? Also, based on this figure, the phenotype looks to be more of a delay than a failure in differentiation, as the cells still follow the same trajectory but lag behind the WT cells. It is difficult to discern whether this is the case based on Fig. 2E-G as we don't see the later time point (up to Day 9).

      In general, with the data presented in Fig. 2C and Fig. S1, the authors show that many of the hallmarks of exit from pluripotency are impaired in Nfkbia-KO cells, as well as the general "transcriptional status" of the cells, but they don't show differentiation markers (which would be necessary to conclude an impairment in differentiation). The data should be readily available in the datasets that are in the manuscript already and it will be informative to extract and present them. The data are not currently publicly accessible (unavailable until July 2025) so it was not possible to mine them.

      We appreciate the observation, and we have included more data to confirm that the IκBα-KO cells show a differentiation impairment. In the first version of the manuscript, differentiation markers are displayed in Figures 2E-G, where genes from the three germ layers (ectoderm, mesoderm and endoderm) are not activated in IκBα-KO EBs at 48h and 96h. Moreover, the volcano plot displayed in Figure S1F of the first version clearly shows a downregulation of important differentiation genes such as T, Eomes, Lhx1 and Foxa2. We agree that 96h EBs is an early time point to talk about differentiation impairment. For that reason, we have also included the same pluripotency and differentiation genes in 216h EBs (Figures S1F-G of the newer version of the manuscript). It is clearly observed that IκBα-KO 216h EBs maintain an upregulation of pluripotency programs, which correlates with a lower differentiation capability. Moreover, the impairment in differentiation, with a higher expression of pluripotency markers, is confirmed by the presence of high SSEA-1 expression in IκBα-KO 216h EBs (Figure S1C of the manuscript) and alkaline phosphatase (AP) staining (Figure 2C of the manuscript). Lastly, the fact that IκBα-KO teratomas contain a higher proportion of OCT3/4+ cells further confirms that IκBα-KO cells cannot differentiate because of their inability to exit from pluripotency.

      Finally, the data generated in this study, deposited in the GEO repository under SuperSeries ID GSE239565, are already publicly available.

      (9) Fig. 5A: even if there are no global changes in NF-κB target genes, could a small subset of NF-κB target genes still mediate the IκBa effects?

      We have analyzed the whole NF-κB signature, and we have identified a small cluster of genes that are differentially expressed at 96h EBs between IκBα-KO and IκBα-WT (Author response image 2). Interestingly, what we observed is the opposite of what was expected, since we see a downregulation of that subset in the IκBα-KO 96h EBs (Author response image 3). For that reason, the changes detected in NF-κB target gene expression after deletion of Nfkbia do not support an NF-κB inhibitory role for IκBα in pluripotent ESCs.

      Author response image 2.

      Heatmap of NF-κB genes expression at the different time points of differentiation (mESCs, 48h EBs, 96h EBs). Highlighted region marks the genes that are differentially expressed between both genotypes at 96h EBs.

       

      Author response image 3.

      Violin plot of genes from the NF-κB pathway which are differentially expressed at 96h EBs.

      (10) Lines 233-238, the part of the text is repeated.

      We appreciate the observation and have deleted the repeated part.

      (11) The data in Fig. 5D-E make it difficult to be sure whether the conclusions on the relative subcellular localisations of the different mutants are accurate, as the chromatin-binding mutant seems to be less abundant than the other mutants (judging from the Input in Fig. 5C and also from the tubulin loading controls in Fig. 5D-E). Showing the IκBa levels in total extracts would make the interpretation of these data more robust. The authors do mention that the chromatin-binding mutant IκBa protein is consistently expressed at lower levels but they do not comment on how this may affect the data interpretation - could the lack of rescue be due to lower levels of the chromatin-binding mutant IκBa relative to the wild-type IκBa? This should be addressed in the Discussion, if not tested formally by normalising the expression levels of the different forms of IκBa in the rescue experiments.

      Although protein stability is different among the SOF mutants, IκBα<sup>ΔChromatin</sup> is exclusively detected in the cytoplasm, with lack of detection in the chromatin compartment (Figures 5D-E of the reviewed manuscript). For this reason, we believe that the quantitative differences in protein levels of the different mutants cannot explain the subcellular localization differences and the phenotype observed.

      Nonetheless, we cannot rule out that differences in protein levels between the SOF mutants affect the rescue phenotype, and we have specified so in the discussion section of the manuscript.

      (12) Lines 260-261: "Induction of i-IκBαWT and i-IκBαΔNF-κB reduced the expression levels of the naive pluripotent genes Zfp42, Klf2, Sox2 and Tbx3, which were increased by i-IκBαΔChromatin (Figure 5F)." This is not an accurate statement. The expression was not reduced by the ΔChrom mutant in the same way as it was by the WT and the ΔNF-κB mutant, but it was not increased.

      We have better specified the description of the results displayed in Figure 5F (lines 258-261 of the main manuscript):

      “Induction of i-IκBα<sup>WT</sup> and i-IκBα<sup>ΔNF-κB</sup> reduced the expression levels of the naïve pluripotent genes Zfp42, Klf2, Sox2 and Tbx3. On the other hand, the same genes either do not change their expression (Zfp42, Sox2, Klf2) or increase their levels (Tbx3) upon i-IκBα<sup>ΔChromatin</sup>  induction (Figure 5F).”

      (13) In Fig. 5J the images will ideally be shown before and after Doxycycline treatment, to better support the conclusions.

      We have included a new panel in Figure S4 (Figure S4E in the reviewed manuscript) showing the no-doxycycline control 216h EBs for the different conditions (i-IκBα<sup>WT</sup>, i-IκBα<sup>ΔChrom</sup> and i-IκBα<sup>ΔNF-κB</sup>).

      Reviewer #2 (Recommendations for the authors):

      - The PCA analysis in Figure 2 appears to contradict the authors' conclusions about global transcriptome changes in KO cells. Furthermore, there is a discrepancy between immunofluorescence data showing near-complete methylation loss and the methylation array analysis results.

      Although there is a differentiation block in the IκBα-KO EBs, it is not complete, and the cells show some differentiation trend after 96h (Fig 2C). Moreover, acquisition of differentiation genes from all three germ layers is strongly affected (Figure 2E of the reviewed manuscript), and at later time points (216h) these programs remain downregulated while pluripotency genes are still expressed in IκBα-KO EBs (Fig 2B). Altogether, these data demonstrate that the lack of IκBα impairs differentiation and the silencing of the pluripotency network.

      Discrepancies between methylation array and immunofluorescence are expected since immunofluorescence is not quantitative and the methylation array is very precise.  

      - The authors should revise their statements regarding the strength of the pluripotency exit block, the extent of hypomethylation, and the global nature of chromatin changes. For example, the observed chromatin changes, including H3K27ac modifications, appear relatively modest and should be described as such.

      - The manuscript would benefit from additional orthogonal approaches to strengthen the knockout findings, which may be influenced by clonal expansion of ES cells. Additionally, the emphasis on overlapping H3K4me3 and H3K27me3 regions should be reduced, as these represent a minor fraction of the affected regions (only 41 regions).

      We have revised the text and have included the following text in the discussion section (lines 327-331 in the reviewed manuscript):

      “Although IκBα KO  mESCs  exhibit a transcriptional phenotype and hypomethylation state  that resembles the ground state of naïve pluripotency, there are only modest changes on histone marks associated to enhancers (H3K27Ac) or gene regulation (H3K4me3 and H3K27me3). Altogether indicates that further experiments are required to fully elucidate the effect of chromatin IκBα.”

      We have also included Fig S3E-S3F to show that differences in H3K4me3 and H3K27me3 similar to those between WT and KO cells are observed between serum/LIF and 2i conditions, further supporting the fact that KO cells in serum/LIF resemble WT cells in the 2i condition.

    1. eLife Assessment

      This paper presents an important theoretical exploration of how a flexible protein domain with multiple DNA binding sites may simultaneously provide stability to the DNA-bound state and enable exploration of the DNA strand. The authors propose a mechanism ("octopusing") in which a protein performs a random walk while bound to DNA, which simultaneously enables exploration of the DNA strand and enhances the stability of the bound state. This study presents compelling evidence that these findings have implications for the way intrinsically disordered regions (IDRs) of transcription factor (TF) proteins can enhance their ability to efficiently find the binding site on the DNA from which they exert control over the transcription of their target gene. The paper concludes with a comparison of model predictions with experimental data, which gives further support to the proposed model.

    2. Reviewer #1 (Public review):

      Summary:

      The authors define the principles that, from first-principles reasoning, should guide the optimisation of transcription factors with intrinsically disordered regions (IDRs). The authors introduce an original search process, coined "octopusing", that involves the transcription factor IDR and its binding affinities to optimise search times and binding affinities. The first part concerns the optimal strategies to define binding affinities to the genome in the receiving region that is called the "antenna", highlighting the following: (i) reduce the target to IDR-binding distance on the genome; (ii) optimise the distance between the DNA binding domain and the binding sites on the IDR to be as close as possible to the distance between their binding sites on the genome; (iii) keep the same number of binding sites and their targets and modulate this number with binding strength, reducing them with increased strength; (iv) modulate the binding strength to be above a threshold that depends on the proportion of IDR binding sites in the antenna. The second part concerns the scaling of the search time as a function of key parameters such as the volume of the nucleus and the size of the antenna, derived as a combination of 3D search and 1D "octopusing". The third part focuses on validation, where the current results are compared to binding probability data from a single experiment, and new experiments are proposed to further validate the model as well as to test designed transcription factors.

      Strengths:

      The strength of this work is that it provides simple, interpretable and testable theoretical conclusions. This will allow the derived design principles to be understood, evaluated and improved in the future. The theoretical derivations are rigorous. The authors provide a comparison to experiments, and also propose new experiments to be performed in the future. This is a great value in the paper since it will set the stage and inspire new experimental techniques. Further, the field needs inspiration and motivation to develop these techniques, since they are required to benchmark the transcription factors designed with the methods presented in this paper, as well as to develop novel data based or in vivo methods that would greatly benefit the field. As such, this paper is a fundamental contribution to the field.

      Weaknesses:

      The model presents many first principles to drive the design of transcription factors, but arguably, other principles and mechanisms might also play a role by being beneficial to the search and binding process. These other principles are mentioned at the end of the discussion part of the paper. On the other hand, an important task left to do is to critically consider these principles altogether, and analyse the available data to quantify which role is predominant among transcription factor IDR functions. Further, since one function doesn't exclude another, a theoretical investigation of possible crosstalk, interaction, and cooperativity of those different hypothetical functions is still missing.

    3. Reviewer #2 (Public review):

      Summary:

      This is an interesting theoretical exploration of how a flexible protein domain, which has multiple DNA-binding sites along it, affects the stability of the protein-DNA complex. It proposes a mechanism ("octopusing") for a protein performing a random walk while bound to DNA, which simultaneously enables exploration of the DNA strand and stability of the bound state.

      Strengths:

      Stability of the protein-DNA bound state and the ability of the protein to perform 1D diffusion along the DNA are two properties of a transcription factor that are usually seen as being in opposition to each other. The octopusing mechanism is an elegant resolution of the puzzle of how both could be accommodated. This mechanism has interesting biological implications for the functional role of intrinsically disordered domains in transcription factor (TF) proteins. The authors show theoretically how these domains, if flexible and able to make multiple weak contacts with the DNA, can enhance the ability of the TF to efficiently find its binding site on the DNA, from which it exerts control over the transcription of its target gene. The paper concludes with a comparison of model predictions with experimental data which gives further support to the proposed mechanism. Overall, this is an interesting and well-executed theoretical paper that proposes an interesting idea about the functional role for IDR domains in TFs.

      Weaknesses:

      It is not clear how ubiquitous among eukaryotic transcription factors the DNA binding sites for multiple subdomains along the IDR, which are assumed by the model, actually are. These assumptions, though, provide interesting points of departure for further experiments.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      The authors define the principles that, from first-principles reasoning, should guide the optimisation of transcription factors with intrinsically disordered regions (IDRs). The first part of the study defines the following principles to optimize the binding affinities to the genome in the receiving region that is called the "antenna": (i) reduce the target to IDR-binding distance on the genome; (ii) optimise the distance between the DNA binding domain and the binding sites on the IDR to be as close as possible to the distance between their binding sites on the genome; (iii) keep the same number of binding sites and their targets and modulate this number with binding strength, reducing them with increased strength; (iv) modulate the binding strength to be above a threshold that depends on the proportion of IDR binding sites in the antenna. The second part defines the scaling of the search time as a function of key parameters such as the volume of the nucleus and the size of the antenna, derived as a combination of 3D search of the antenna and 1D "octopusing" on the antenna. The third part focuses on validation, where the current results are compared to binding probability data from a single experiment, and new experiments are proposed to further validate the model as well as to test designed transcription factors.

      Strengths:

      The strength of this work is that it provides simple, interpretable and testable theoretical conclusions. This will allow the derived design principles to be understood, evaluated and improved in the future. The theoretical derivations are rigorous. The authors provide a comparison to experiments, and also propose new experiments to be performed in the future. This is a great value in the paper since it will set the stage and inspire new experimental techniques. Further, the field needs inspiration and motivation to develop these techniques, since they are required to benchmark the transcription factors designed with the methods presented in this paper, as well as to develop novel data based or in vivo methods that would greatly benefit the field. As such, this paper is a fundamental contribution to the field.

      Weaknesses:

      The model assumption that the interaction between the transcription factor and the DNA outside of the antenna region is negligible is probably too strong for many/most transcription factors, particularly in organisms with a longer genome than yeasts. The model presents many first principles to drive the design of transcription factors, but arguably, other principles and mechanisms might also play a role by being beneficial to the search and binding process. Specifically: (i) a role of the IDR in complex formation and cooperativity between multiple transcription factors, (ii) ability of the IDR to do parallel searching based on multiple DNA binding sites spaced by disordered regions, (iii) affinity of the IDR to specific compartmentalisations in the nucleus reducing the search time, etc. The paper would be improved by a discussion of alternative mechanisms.

      We thank the reviewer for highlighting that our work delivers simple, interpretable and rigorously derived conclusions, backed by experimental comparison and concrete proposals for future studies.

      Regarding interactions outside the antenna region, Supplementary S10 shows that the non-specific IDR–DNA interactions (on the order of 1 kBT) only slightly alter the 3D diffusion coefficient and thus do not affect our conclusions regarding the optimal search process.

      We have also added sentences in the discussion section regarding the alternative mechanism.

      Reviewer #2 (Public review):

      Summary:

      This is an interesting theoretical exploration of how a flexible protein domain, which has multiple DNA-binding sites along it, affects the stability of the protein-DNA complex. It proposes a mechanism ("octopusing") for a protein performing a random walk while bound to DNA, which simultaneously enables exploration of the DNA strand and stability of the bound state.

      Strengths:

      Stability of the protein-DNA bound state and the ability of the protein to perform 1D diffusion along the DNA are two properties of a transcription factor that are usually seen as being in opposition to each other. The octopusing mechanism is an elegant resolution of the puzzle of how both could be accommodated. This mechanism has interesting biological implications for the functional role of intrinsically disordered domains in transcription factor (TF) proteins. The authors show theoretically how these domains, if flexible and able to make multiple weak contacts with the DNA, can enhance the ability of the TF to efficiently find its binding site on the DNA, from which it exerts control over the transcription of its target gene. The paper concludes with a comparison of model predictions with experimental data which gives further support to the proposed model. Overall, this is an interesting and well-executed theoretical paper that proposes an interesting idea about the functional role for IDR domains in TFs.

      Weaknesses:

      IDR domains are assumed flexible which I believe is not always the case. Also, I’m not sure how ubiquitous are the assumed binding sites on the DNA for multiple subdomains along the IDR. These assumptions though seem like interesting points of departure for further experiments.

      We thank the reviewer for their careful and insightful evaluation of our work. In particular, we appreciate the reviewer’s emphasis on the inherent trade-off between binding stability and one-dimensional diffusion, and the recognition of how the octopusing mechanism elegantly reconciles these conflicting requirements.

      To address the flexibility of TFs with IDRs, we incorporated the spring’s rest length—effectively introducing tunable rigidity—in Supplementary Section S1, and we show that our design principles for binding probability remain robust. Indeed, this is a highly interesting point; a comprehensive study will require more detailed modeling alongside experimental validation.

      We acknowledge that the current evidence for IDR-directed DNA binding is primarily derived from a limited number of well-studied cases, particularly Msn2 in yeast, and the ubiquity of this mechanism across diverse transcription factors remains to be established.

      Reviewer #1 (Recommendations for the authors):

      The paper jumps too fast to the results; a longer introduction might improve it. Further, at line 50, I don’t think that the figure is properly referenced. Formula 2 is confusing because the target volume V1 is not explained in the context of the formula; please expand the explanations.

      We appreciate the reviewer’s valuable recommendations. We have expanded the Introduction, clarified V<sub>1</sub>, and updated line 50.

      Reviewer #2 (Recommendations for the authors):

      I have some mostly minor suggestions to the authors for improving the manuscript:

      In the abstract and introduction on at least two occasions the authors talk about IDRs as though they’re necessarily flexible. My understanding is that, while this is a very reasonable assumption, I don’t think this is something we know with any certainty for most IDRs. If the authors agree with my assessment I think they should reflect this uncertainty in the writing.

      Thank you for the recommendations. We revised the wording to reflect the uncertainty, changing it to: “... commonly assumed to behave as a long, flexible...” and “...can be assumed as flexible....”.

      It took me a bit of time to figure out what’s going on in Figure 1b. To help the reader I would suggest labeling the DBD targets (yellow square) and the IDR targets (gray squares) as such. The figure also left me guessing whether the DBD domain can bind to the IDR targets non-specifically. (I presume not.) This also brought a slightly bigger question into focus for me: wouldn’t the presence of the IDR binding "sites" (since these "sites" are on the protein, I think the term "domains" would be more appropriate than "sites") increase the time the protein is bound non-specifically somewhere far from the target, thereby increasing the search time? Or is the ability of the protein to bind specifically to DNA away from the DBD target ignored?

      We have labeled the DBD targets and IDR targets in the figure. ‘Domains’ usually refers to structured parts; we keep using ‘sites’ and clarify that they correspond to short linear motifs.

      The reviewer is correct. Our model omits any non-specific binding between the DBD and IDR-binding targets, as well as between the TF and other DNA regions. If such interactions were to substantially lengthen the search time, they would effectively revert our mechanism to the classical bacterial facilitated-diffusion model, which is generally considered inappropriate for IDR-mediated TF search in eukaryotic cells. However, Supplementary Figure S10 demonstrates that non-specific IDR–DNA interactions induce only marginal changes in the effective three-dimensional diffusion coefficient within complex chromatin environments, and therefore do not alter our conclusions regarding the optimal search process.

      In Equation 2 and the text that follows I was left wondering what the target volume V1 is. Also, I think it would be helpful to the reader to give them a sense of scale for the dimensionful quantities appearing in Equation 2. This is done later when comparing the theory to experimental data, but I think it would be helpful to give a sense of size earlier in the manuscript.

      V<sub>1</sub> denotes the volume of the IDR–binding target region, which is on the order of bp<sup>3</sup>. f(d,l<sub>0</sub>) has units of inverse volume. We have included the units and specified the order of magnitude of V<sub>1</sub> after Equation 2.

      The binding energy EB is discussed a number of times but it wasn’t clear to me that this quantity referred to the energy per IDR site on the DNA or the total energy when the IDR is bound to DNA. In Figure 1 it would seem that the model allows only one IDR domain bound at a given time but I think the model allows for multiple IDR domains to be bound to the IDR target sites simultaneously. Right? Maybe make this clear in the Figure and the text.

      E<sub>B</sub> denotes the binding energy per binding site, where each site corresponds to a short linear motif. Yes, we allow for multiple IDR domains to be bound to the IDR target sites simultaneously. We have clarified the definition of E<sub>B</sub> and adjusted the figure slightly to avoid any misunderstanding.

      After Eq 4 the discussion suggests that for ϕ << 1 the threshold energy is much greater than kBT, but that’s hard to imagine given the logarithmic dependence of the latter on the former. Also, in Figure 2d it seems that the threshold energy is about 8 kBT. Clearly this is not a big deal; I just thought the authors might want to revise the language.

      Thank you. We now clarify the sentence using the representative values of ϕ and E<sub>th</sub> after Equation 4.

      Right after Figure 2 there is a discussion of the different parameters that the authors vary. I suggest having a figure that illustrates these parameters (possibly in Figure 1b) to make it easier to follow the discussion.

      We have added explanations of the relevant parameters in Figure 1 for clarity.

      When discussing the dynamics of search, the result stated is that the search time is minimum for a specific value of R. I think it would be useful to translate this into a TF concentration. Also, if R represents the radius of the cell’s nucleus, 1/6 µm is almost an order of magnitude smaller than the size of a typical nucleus. Is this a worry? Either way, some clarification of this number would be helpful.

      Thank you for the suggestion. As noted later in this section, we have translated R into an equivalent TF concentration, and we clarify that we assume the scaling of the minimum search time remains unchanged when extrapolated to the size of a typical nucleus.
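
      As a rough sense of scale for the conversion discussed here, the sketch below turns a radius R into the molar concentration of a single molecule in a sphere of that radius. This is a back-of-the-envelope calculation under an assumption not stated in the response (exactly one TF copy per spherical volume of radius R); the conversion the authors actually use in the manuscript may differ, and the function name is hypothetical.

```python
from math import pi

AVOGADRO = 6.02214076e23  # molecules per mole


def single_molecule_concentration_nM(radius_um):
    """Concentration (nM) of one molecule in a sphere of the given radius (µm).

    Assumption: exactly one TF copy in a spherical volume of radius R.
    """
    radius_dm = radius_um * 1e-5  # 1 µm = 1e-5 dm, and 1 dm^3 = 1 L
    volume_litres = (4.0 / 3.0) * pi * radius_dm ** 3
    molar = 1.0 / (AVOGADRO * volume_litres)
    return molar * 1e9


concentration = single_molecule_concentration_nM(1.0 / 6.0)
print(f"{concentration:.1f} nM")
```

      For R = 1/6 µm this gives a concentration of roughly 86 nM, and because the volume scales as R cubed, doubling R lowers the single-molecule concentration eightfold.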

      There is a comment regarding the role of the DNA persistence length and how it was not accounted for. It would be helpful if the authors could add a sentence or two explaining how a folded DNA conformation, as is the case in the nucleus, would affect their calculation. (So that the reader gets an idea without having to get into the details described in the Supplement).

      Thank you. We have revised the sentence to: “We have verified that reducing the DNA persistence length, which promotes increased DNA coiling, results in only a modest increase in mean search time. Even under extreme coiling conditions, the increase remains below 30% of the baseline value, as detailed in Supplementary S9.”.

    1. eLife Assessment

      This paper reports a useful low-cost platform for studying mosquito behaviors such as flight activity, sugar feeding, and host-seeking responses over the course of several weeks, and demonstrates key applications of this platform. While the authors provide a biological proof of principle, the evidence that supports the validation of the tracking algorithm is incomplete; it lacks biological replicates, independent confirmation of the tracking algorithm, and data on mosquito survival.

    2. Reviewer #1 (Public review):

      Summary:

      This paper describes a behavioral platform "BuzzWatch" and its application in long-term behavioral monitoring. The study tested the system with different mosquito species and Aedes aegypti colonies and monitored behavioral response to blood feeding, change in photoperiod, and host-cue application at different times of the day.

      Strengths:

      BuzzWatch is a novel, custom-built behavioral system that can be used to monitor time-of-day-specific and long-term mosquito behaviors. The authors provide detailed documentation of the construction of the assay and custom flight tracking algorithm on a dedicated website, making them accessible to other researchers in the field. The authors performed a wide range of experiments using the BuzzWatch system and discovered differences in midday activity level among Aedes aegypti colonies, and reversible change in the daily activity profile post-blood-feeding.

      Weaknesses:

      The authors report the population metric "fraction flying" as their main readout of the daily activity profile. It is worth explaining why conventional metrics like travel distance/activity level are not reported. Alternatively, these metrics could be shown, considering the development and implementation of a flight trajectory tracking pipeline in this paper.

      The authors defined the sugar-feeding index using occupancy on the sugar feeder. However, the correlation between landing on the sugar feeder and active sugar feeding is not mentioned or tested in this paper. Is sugar feeding always observed when mosquitoes land on the sugar feeder? Do they leave the sugar feeding surface once sugar feeding is complete? One can imagine that texture preference and prolonged occupancy may lead to inaccurate reporting of sugar feeding. While occupancy on the sugar feeder is an informative behavioral readout, its link with sugar feeding activity (consumption) needs to be evaluated. Otherwise, the authors should discuss the caveats that this method presents explicitly to avoid overinterpretation of their results.

      Throughout the manuscript, the authors mention existing mosquito activity monitoring systems and their drawbacks. However, many of these statements are misleading and sometimes incorrect. The authors claim that beam-break monitors are "limited to counting active versus inactive states". Though these systems provide indirect readouts that may underreport activity, the number of beam-breaks in a time interval is correlated with activity level and is commonly used and reported in Drosophila and mosquitoes. A number of reports in mosquitoes also use an updated LAM system with larger behavioral arenas and multiple infrared beams. The authors also mention the newer, camera-based alternatives to beam-break monitors, but again refer to these systems as "only detecting activity when a moving insect blocks a light beam"; these systems actually use video tracking (e.g., Araujo et al. 2020).

      The fold change in behavior presented in Figure 4D is rather confusing. Under the two different photoperiods, it is not clear how an hourly comparison is justified (i.e., comparing the light-on activity in the 20L4D condition with scotophase activity in the 12L12D condition). The same point applies to Figure 4H.

      The behavioral changes after changing photoperiod (Figure 4) require a control group (12L12D throughout) to account for age-related effects. This is controlled for the experiment in Figure 3 but not for Figure 4.

    3. Reviewer #2 (Public review):

      Summary:

      This study establishes a platform for studying mosquito flight activity over the course of several weeks and demonstrates key applications of such a paradigm: the comparison of daily activity profiles across different Aedes aegypti populations and the quantification of responses to physiological and environmental perturbations.

      Strengths:

      (1) Overall, the authors succeed in setting up a low-cost, scalable tracking system that stably records mosquito flight activity for several weeks and uses it to demonstrate compelling use cases.

      (2) The text is organized well, is easy to read, and is understandable for a broad audience.

      (3) Instructions for constructing housing and for performing tracking with a dedicated GUI are available on an accompanying website, with open-source (and well-organized) code.

      (4) A complementary pair of methods (one testing for activity signals at specific times of the day, and the other capturing broader daily patterns) is used effectively.

      Weaknesses:

      (1) In the interval-based GLMM results, since each time interval is tested independently, p-values should be corrected for multiple hypotheses (for instance, through controlling the false discovery rate).

      (2) The accompanying GUI application needs some modifications to fully work out of the box on a sample video.
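
      The correction suggested in point (1) can be illustrated with a minimal sketch of the Benjamini-Hochberg procedure, which controls the false discovery rate across a set of tests. The function name and the p-values below are hypothetical and purely illustrative; they are not taken from the manuscript or from the BuzzWatch code.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values, one per tested time interval (illustrative only).
interval_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

q_values = benjamini_hochberg(interval_p)
n_uncorrected = sum(p <= 0.05 for p in interval_p)
n_corrected = sum(q <= 0.05 for q in q_values)
```

      In this made-up example, five intervals fall below 0.05 before correction but only two remain after it, which is the kind of difference the correction is meant to expose.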

    4. Reviewer #3 (Public review):

      Summary:

      In this paper, the authors introduce BuzzWatch, an open-source, low-cost (200-300 Euros) platform for long-term monitoring of mosquito flight and behavior. They use a Raspberry Pi with a NoIR v2 camera set up under laboratory conditions to observe 3 different species of mosquitoes. The system captures a variety of multimodal data, such as flight activity, sugar feeding, and host-seeking responses, with the help of external modules like CO2 and fructose-soaked cotton. They also release a GUI for automated tracking and behavior analysis, which runs on a personal laptop rather than on the Pi.

      Four main use cases are demonstrated:

      (1) Characterizing diel rhythms in various Aedes aegypti populations.

      (2) Differentiating behaviors of native African vs. invasive human-adapted subspecies.

      (3) Assessing physiological (blood-feeding) and environmental (light regime) perturbations.

      (4) Testing time-of-day variation in responses to host-associated cues like CO₂ and heat.

      Description (Strengths):

      (1) The authors introduce a low-cost, scalable system that uses flight tracking in 2D as an alternative to 3D multi-camera systems.

      (2) Due to the low pixel quality required by the system, they can record for weeks at a time, capturing long temporal and behavioral activities.

      (3) They also integrate external modules such as lights, CO2, and heat as a way to measure responses to a variety of stimuli.

      (4) They also provide a wiki that serves as a guide for building replicates of the setup and for using the GUI module.

      (5) They implement both an hourly GLMM analysis and a PCA of the behavior data.

      Limitations - Major Comments:

      (1) Most experiments are only done with single replicates per colony. If the setup is claimed to be cheap and replicable, there should be clearer replicates across experiments.

      (2) No external validation for the flight tracking algorithm using manual annotation or comparison with field data. The authors focus early on biological proof of principle, but the validity of the tracking algorithm is not presented. How accurate is the algorithm at classifying behaviours (e.g., vs human ground truth)? How reliable is tracking?

      (3) Why develop a custom GUI instead of using established packages such as rethomics (https://rethomics.github.io/) that are already available for behavioral analysis?

      (4) Why use RGB light strips when perceptual white light for humans is not relevant for mosquitoes? The choice of lighting should be based on the mosquito's visual perception. - https://pmc.ncbi.nlm.nih.gov/articles/PMC12077400/ .

      (5) Why use GLMMs instead of GAMs (with explicit periodic components)? With GLMMs, you do not account for temporal structure, which is highly relevant and autocorrelated in behavioral time series data.

      (6) What is the proportion of mosquitoes that stay alive throughout the experiments? How do you address dead animals in tracking? No data are available on whether all mosquitoes made it through the monitoring period. No survival data is mentioned in the paper, and in the wiki, it is not clear how it is used or how it affects the analyses - https://theomaire.github.io/buzzwatch/analyze.html#diff-cond .

      (7) The sugar feeding behavior is not manually validated.

      (8) Figure 4d is difficult to understand - how did you align time? Why is ZT4 aligning with ZT0? Should you "warp" the time series to compare them (e.g., from dawn to dusk)?

      (9) No video recordings are made available for demonstration or validation purposes.
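
      To make the idea of an "explicit periodic component" in point (5) concrete, the sketch below fits a single 24-hour harmonic (a cosinor fit) to an activity time series. This is not a GAM and not the authors' method; it is the simplest periodic term one could add, shown under stated assumptions. The closed-form estimates are exact least squares only for evenly sampled data that cover a whole number of periods. The function name and the synthetic data are hypothetical.

```python
import math


def cosinor_fit(values, period):
    """Fit y ~ mesor + amplitude * cos(2*pi*(t - peak_time)/period), t = 0..N-1.

    Assumes even sampling over a whole number of periods, so the cosine and
    sine regressors are orthogonal and projections give the least-squares fit.
    """
    n = len(values)
    if period < 3 or n == 0 or n % period != 0:
        raise ValueError("need whole periods of evenly sampled data, period >= 3")
    omega = 2.0 * math.pi / period
    mesor = sum(values) / n
    a = 2.0 / n * sum(y * math.cos(omega * t) for t, y in enumerate(values))
    b = 2.0 / n * sum(y * math.sin(omega * t) for t, y in enumerate(values))
    amplitude = math.hypot(a, b)
    peak_time = (math.atan2(b, a) / omega) % period
    return mesor, amplitude, peak_time


# Synthetic hourly "fraction flying" over 3 days, peaking at hour 18
# (illustrative values only, not data from the study).
synthetic = [0.2 + 0.1 * math.cos(2.0 * math.pi * (t - 18) / 24) for t in range(72)]

mesor, amplitude, peak_time = cosinor_fit(synthetic, 24)
```

      A full treatment along the lines the reviewer suggests would also need to model residual autocorrelation and between-cage variation, which this sketch deliberately leaves out.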

      Appraisal

      (1) The core conclusions (that BuzzWatch can capture multiscale mosquito behavioral rhythms and quantify the effect of genetic, environmental, and physiological variation) show promise but require stronger validation.

      (2) Statistical approaches (GLMM, PCA) are chosen but may not be optimal for temporal data with autocorrelation.

      (3) The host-seeking module shows a differential response, which is a potentially valuable feature.

    1. eLife Assessment

      This valuable work shows that subcortically-generated behaviors, like grooming, can have widespread representations in cortical activity. While the evidence is solid, additional analyses are necessary to strengthen the claims associated with outsized cortical representations of grooming onsets, as well as to address atypical grooming events. This work will be of interest to neuroscientists interested in how subcortically-generated behaviors are represented across the cortex.

    2. Reviewer #1 (Public review):

      In their manuscript, Michelson et al use a combination of mesoscopic 1p and single-cell resolution 2p imaging to characterise cortical encoding of grooming behaviour. Despite their subcortical locus of control (and non-reliance on cortex), the authors report that grooming movements are accompanied by widespread activation of dorsal cortex. Different grooming movements elicit distinct spatiotemporal cortical activity patterns. They find that cortical engagement is greater at the beginning of grooming episodes than at their end. They also report greater cortical activation for atypical unilateral grooming movements seen under head-restraint in comparison to cortical activity during bilateral movements typical of unrestrained or spontaneous grooming.

      While this is not the first study to report cortical representations of subcortically controlled behaviours, and the authors themselves cite many previous reports of cortical activation during locomotion and even grooming (Sjöbom et al 2020), the value of the present study lies in their use of imaging to reveal the widespread nature of cortical activation during execution of a complex, innate behaviour. I also appreciate the systematic approach used by the authors to break down grooming episodes into their constituent movements and reveal their transition structure.

      I do have concerns, however, that some of the authors' claims are insufficiently supported by their results, and more analysis is required to convincingly rule out alternative interpretations.

      (1) One possible explanation for the gradual decline in cortical activity is that unilateral movements associated with greater cortical activation dominate early in grooming episodes, whereas bilateral movements that elicit weaker cortical activity dominate later (Figure 3G and 2C). The authors could check whether cortical activity associated with the *same* grooming movement is constant or declines during such episodes. A related point: doesn't the regression analysis shown in Figure 3, Supplement 2, assume a stationary relationship between movement and spatiotemporal patterns of cortical activity?

      (2) From the decline in cortical responses during long grooming episodes, the authors suggest that "mesoscale cortical activity mostly reflects the initiation of subcortically-mediated behaviors, rather than the behavior itself". The authors have taken a lot of trouble to come up with a rich, detailed segmentation and clustering of the grooming behaviour into its constituent movements (Figure 1). Therefore, I am somewhat surprised that they make this claim solely from analysis of averaged cortical activity during nearly minute-long grooming episodes rather than a higher time resolution analysis of transitions between distinct grooming movements (like the prior study by Sjöbom et al and related work in striatal encoding of innate movement sequences by Markowitz et al).

      (3) The authors find that unilateral, atypical grooming movements elicit cortical activity that is distinct from the more naturalistic bilateral movements. They interpret this as reflecting the temporal transition structure of the behaviour. However, an alternative explanation is that the differences (or similarities) in evoked activity simply reflect differences (or similarities) in the kinematics of these movements, with bilateral movements appearing more similar to each other than to unilateral movements. A related point: there is little description of the "non-grooming forelimb movements". Were these kinematically similar to the unilateral forelimb movements, which may explain why they cluster together in Figure 4H?

      (4) Page 13, last paragraph: the authors suggest that similar encoding of non-grooming forelimb movements and unilateral grooming movements may reflect a shared reliance on the cortex. This is rather speculative. Several studies have demonstrated that voluntary unilateral movements employed for reaching or lever pressing are not generally reliant on the cortex (Whishaw et al, Beh Brain Res, 1991; Kawai et al, 2015). There isn't, in my opinion, a broad consensus for the authors' statement that "reaching for food is a cortex-dependent action". Rather than extrapolating from past studies, could the authors not experimentally assess whether unilateral grooming movements are more sensitive to cortical silencing than bilateral ones, possibly revealing a cortical locus of control?

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Michelson, Gupta, and Murphy use calcium imaging to map the distribution of neural activity across the cerebral cortex of grooming, head-restrained mice. Animals groomed spontaneously and in response to wetting of the face. Individual movement elements, such as bilateral strokes across the face, resembled those observed in freely-moving animals. Sequencing of movement elements was structured, but did not consist of full "syntactic grooming chains." Widefield imaging across the cortex revealed distinct patterns of activity for distinct movement elements. Individual neurons responded strongly during movement and had largely similar properties across cortical areas.

      Strengths:

      In my opinion, this is a solid paper that will be of interest to the mouse sensorimotor neuroscience community. The experiments are technically sound, the text is well-written, and the figures are clear. The activity maps are presented in standardized Allen Atlas coordinates, and I expect they will be very useful for future studies of orofacial and limb movement.

      Weaknesses:

      While the manuscript provides a valuable description of cortical activity during head-restrained grooming, I think it could engage a bit more with contemporary theories and debates in cortical physiology and motor control. The Abstract nicely highlights an apparent paradox: the motor cortex sends strong projections to the spinal cord, and is strongly modulated during behaviors like grooming. Nevertheless, blocking corticospinal traffic by inactivating or lesioning the motor cortex leaves such behaviors intact. There are several potential resolutions to this paradox. First, cortical activity during grooming could be confined to an "output-null" subspace that is responsible for monitoring sensorimotor events and preparing voluntary movements, but does not drive muscle activity (c.f. work in the macaque: Kaufman et al., Nature Neuroscience 2014; Churchland & Shenoy, Nature Reviews Neuroscience 2024). Second, cortical activity during grooming could be transmitted to lower centers, but gated out through inhibition. Third, it is possible that cortical activity in intact animals does contribute to muscle activation during grooming, but following a lesion or inactivation, other descending pathways compensate for the cortical deficit. The authors might wish to discuss their findings in light of these considerations.

      In the first paragraph of the Introduction, it could be made clearer which results are specific to mice. The Niell & Stryker finding, for example, holds in mice, but not marmosets (Liska et al., eLife 2024).

      The "hotspots" in Figure 3G appear to be more anterior during bilateral elliptical than unilateral elliptical movements. How do the authors interpret this finding?

      The distribution of single-neuron responses looks relatively similar across cortical areas, including forelimb, hindlimb, and trunk somatosensory cortex, and primary and secondary forelimb motor cortex. What do the authors make of this?