  1. Feb 2025
    1. Note: This response was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      General Statements

      The reviewer comments helped us improve the paper by including new computations, figures, and analyses related to vasopressin, drug dosages, and treatment cessation. We have also removed confusing terminology from the text. We believe that the paper is now more comprehensive, clear, and rigorous.

      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      The authors address the question of lowering long-term elevated cortisol levels by affecting the parameters in a previously published mathematical model of the hypothalamic-pituitary-adrenal (HPA) axis. The parameters are related to various pathways. The elevation in cortisol levels is related to diseases, e.g., mood disorders and Cushing's syndrome.
      The authors conducted a systematic in silico analysis of various points of intervention in the HPA axis. They found that only two interventions targeting corticotropin-releasing hormone (CRH) can lower long-term cortisol. Other drug targets either fail to lower cortisol due to gland-mass compensation or lower cortisol but harm other aspects of the HPA axis. Thus, they identify potential drug targets, including CRH-neutralizing antibodies and CRH synthesis inhibitors, for lowering long-term cortisol in mood disorders and in those suffering from chronic stress.
      The method used is in silico investigations of the mathematical model.
      The draft is well written with a single typo in line 270. I have no further comments!

      Response: The typo is fixed.

      Reviewer #1 (Significance):

      In silico predictions without direct use of data is a weakness but the conducted analysis is convincing. An improved understanding of why some drugs work and others do not is important and is postulated to agree with clinical evidence.

      Response: We thank the reviewer for this endorsement.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Summary

      The authors utilise a mathematical model of the hypothalamic-pituitary-adrenal axis to address the utility of interventions altering its various outputs (CRH, ACTH and cortisol) to ameliorate axis disruption in response to chronic stress. They show that a lowering of circulating CRH by either blocking its synthesis or increasing its clearance is effective at returning the HPA axis to basal activity at all levels. In contrast, interventions altering ACTH or cortisol production, their circulating levels or actions are ineffective in the model. This is consistent with data on the long-term efficacy of drugs reducing excess corticosteroids in patients and animal models. The use of mathematical models to describe complex interactions in endocrine systems is a valuable advance in our understanding of potential mechanisms and therapies and this is an excellent example.

      Response: We thank the reviewer for this endorsement.

      Major comments

      1. The model of the HPA axis that the authors have described previously is a little simplistic when considering the known physiology. Specifically, this model ignores the contribution of vasopressin to the axis, which has been described as being the primary hypothalamic factor driving HPA axis activity in chronic stress (see doi.org/10.1016/S0079-6123(08)00403-2). Including this may be beyond the scope of the current model; however, it should be considered and at least commented on. It is notable that the model fits the clinical and animal model data, which may suggest that the contribution of vasopressin in the long term may be overestimated, possibly as a result of differential effects of the two hypothalamic factors, with CRH driving ACTH release and POMC gene expression, whilst vasopressin only increases ACTH release without augmenting POMC expression. This is worthy of discussion.

      Response: We thank the reviewer for this comment which helped us discuss vasopressin. We agree that adding it as a variable in the model is beyond the scope of the current study. We describe its effects in the introduction and discussion sections. Interestingly, when one considers the best characterized effect of vasopressin, namely enhancing CRH-dependent ACTH release, one can use this model to investigate the effects of inhibiting vasopressin. We predict that vasopressin inhibition is unlikely to be an effective strategy for lowering long-term cortisol and alleviating stress-related mental disorders, as evidenced by the failure of clinical trials.

      In the introduction we add:
      1. “CRH stimulates the secretion of adrenocorticotropic hormone (ACTH) by corticotroph cells in the anterior pituitary, an effect enhanced by vasopressin (Aguilera et al, 2008; Antoni, 2017).” (lines 35-37)
      2. Clinical trials for two vasopressin 1b receptor antagonist candidates, SSR149415 and TS-121, in the table of HPA-related clinical trials (Table 1)

      In the discussion we add (lines 398-409): “One important factor not explicitly considered in the model is the contribution of vasopressin to the axis. Vasopressin potentiates the CRH-dependent release of ACTH from pituitary corticotrophs by acting on the V1b receptor (V1bR) (Aguilera et al, 2008; Antoni, 2017). Including this hormone explicitly is beyond the current scope. However, a simple analysis indicates that the effect of elevated vasopressin can be modeled by increasing the ACTH secretion parameter b2. This suggests that vasopressin V1b receptor antagonists should have effects similar to inhibitors of ACTH production. As such, vasopressin receptor antagonists should be compensated by the HPA axis without long-term effects on cortisol. Accordingly, V1bR antagonists did not show statistically significant efficacy in clinical trials for major depressive disorder and generalized anxiety disorder (Griebel et al, 2012; Chaki, 2021; Kamiya et al, 2020). However, vasopressin may have additional relevant effects on the HPA axis and the central nervous system which warrant a more detailed modeling analysis.”
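The compensation argument in the quoted discussion can be illustrated with a small simulation. This is only a sketch under stated assumptions: the equations below are a simplified gland-mass form that is not quoted anywhere in this response, the parameter names `b1`, `a1`, `b2`, `b3` are borrowed from the text, and the turnover rates (1/20 and 1/30 per day) and the function name `simulate` are hypothetical choices made for illustration. Hormones are assumed to equilibrate much faster than gland masses, so they are set to their quasi-steady state at each step.

```python
def simulate(u=1.0, b1=1.0, a1=1.0, b2=1.0, b3=1.0, days=2000.0, dt=0.05):
    """Return (CRH, ACTH, cortisol) after `days` of constant stress input u.

    Assumed simplified model (illustrative, not taken from the paper):
      CRH      x1 = b1*u / (a1*x3)
      ACTH     x2 = b2*P*x1 / (a2*x3)
      cortisol x3 = b3*A*x2 / a3
      dP/dt = P*(bP*x1 - aP),  dA/dt = A*(bA*x2 - aA)
    """
    a2 = a3 = 1.0                 # hormone removal rates (normalised, assumed)
    bP = aP = 1.0 / 20.0          # corticotroph mass turnover per day (assumed)
    bA = aA = 1.0 / 30.0          # adrenal mass turnover per day (assumed)
    P = A = 1.0                   # functional masses start at baseline
    x1 = x2 = x3 = 1.0
    for _ in range(int(days / dt)):
        # Fast hormones: solve the three algebraic equations for x3 first.
        K = (b1 * b2 * b3) / (a1 * a2 * a3)
        x3 = (K * A * P * u) ** (1.0 / 3.0)
        x1 = b1 * u / (a1 * x3)
        x2 = b2 * P * x1 / (a2 * x3)
        # Slow functional masses: forward Euler step.
        P += dt * P * (bP * x1 - aP)
        A += dt * A * (bA * x2 - aA)
    return x1, x2, x3


stressed = simulate(u=2.0)              # chronic stress, no drug
v1b_block = simulate(u=2.0, b2=0.5)     # lower ACTH secretion (V1bR antagonist analogue)
crh_inhib = simulate(u=2.0, b1=0.5)     # CRH-synthesis inhibitor analogue
crh_antibody = simulate(u=2.0, a1=2.0)  # faster CRH removal (antibody analogue)

for name, state in [("stressed", stressed), ("b2 halved", v1b_block),
                    ("b1 halved", crh_inhib), ("a1 doubled", crh_antibody)]:
    print(f"{name:>10}: cortisol = {state[2]:.3f}")
```

In this sketch, halving `b2` leaves long-term cortisol at the stressed level because the corticotroph mass `P` grows to compensate, whereas changing `b1` or `a1` shifts the cortisol steady state.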

      2. The model that this study relies on is dependent on slow changes in the various levels of the endocrine axis and the authors have focused on alterations in cell number as the process leading to a prolongation of their dysfunction. For the stress axis, the evidence for changes in corticotroph cell number is weak and the recent paper of Lopez et al (DOI: 10.1126/sciadv.abe44) suggests that chronic stress, at least over a period of 3 weeks, does not lead to an alteration in the number of corticotrophs, despite cell population changes in the adrenal gland. There are other processes which could lead to prolonged alteration of corticotroph output and it would be better to focus (as the authors have in places) on functional mass, rather than cell number which may suggest it is not the trophic effect of CRH that is important for increased functional mass.

      Response: We thank the reviewer for this. We now refer only to functional mass changes. We corrected all places in which hyperplasia of corticotrophs is mentioned. We also state in lines 125-126 that the model is agnostic as to whether growth in functional mass is due to hyperplasia or hypertrophy.
      We also added a citation for Lopez et al. 2021 (line 86) to support the growth of cortisol-secreting cells in the zona fasciculata of the adrenal gland under stress conditions.

      3. The parameters in the model for interventions are described as simply being less than or greater than one. To what extent are the effects of these interventions dependent on their specific value? For example, presumably if the I1 value is close to zero, then the CRH-synthesis inhibitor would be ineffective. Likewise, if it were close to 1 then there would be negligible release of CRH in response to stress, and the preservation of a response to acute stress would be lost. Can the authors show the range of values for I1, C1 and A1 where the interventions are effective at normalising HPA axis function whilst (for I1 and A1) still preserving the acute stress response?

      Response: We thank the reviewer for this comment that helped us to add a new section in the results on dose response, and three new figures (Figure 4, Figure S2 and Figure S3):

      “CRH interventions have a dose-dependent response in the model

      We computed the effects of drug doses by varying the relevant model parameter, where zero dose means no change in the parameter and high doses mean large changes in the parameter. We find that both candidate interventions for lowering cortisol - CRH-synthesis inhibitors and CRH-blocking antibodies - cause a dose-dependent reduction of steady-state cortisol (Figure 4A). This indicates that putative treatment may require finding the appropriate dose to return the patients to their normal cortisol baseline range. Other drug candidates have no effect on long-term cortisol steady state (Figure S2).

      At all doses, the steady states of CRH and ACTH remain normal (Figure 4B-C). The acute stress response, defined as peak cortisol upon acute stress input relative to steady-state cortisol, is dose dependent (Figure 4D and Figure S3). At a dose that returns cortisol to the normal range, the acute response is also normalized.

      We also tested the effects of abrupt treatment cessation. For both CRH interventions, stopping treatment led to a rapid return to hypercortisolemia (Figure 4E-F and Figure S4).

      Figure 4. Predicted effective interventions have a dose-dependent effect on cortisol, and cortisol abruptly rises when treatment is ceased. (A) Cortisol steady state in the model upon changes in doses of CRH-synthesis inhibitors and CRH-blocking antibodies. (B-C) The same changes in drug doses have no effect on ACTH (B) and CRH (C) steady state levels. (D) Cortisol peak response to an acute stress relative to steady state for different drug doses. (E-F) HPA dynamics upon cessation of CRH-synthesis inhibitors (E) and anti-CRH antibodies (F) after 50 days.”

      In the supplemental information:

      “Cortisol dose response to HPA-targeting drugs

      Figure S2. Cortisol steady state dose response to HPA-targeting drugs, related to Figure 4.

      Figure S3. Cortisol peak response to acute stressor under varying concentrations of HPA-targeting drugs, related to Figure 4.”

      4. In the models that the authors describe with CRH interventions, what is the impact of stopping the intervention on axis output in the short and long-term? Presumably ceasing the use of CRH antagonists would lead to much more severe axis dysregulation than CRH neutralising antibodies or CRH synthesis inhibitors.

      Response: We have now added a new analysis of drug cessation (new Figure 4E-F, Figure S4). After a 50-day treatment, sudden cessation caused a rapid return to hypercortisolemia.
      We added in lines 277-278: “We also tested the effects of abrupt treatment cessation. For both CRH interventions, stopping treatment led to a rapid return to hypercortisolemia (Figure 4E-F).”
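A cessation experiment of this kind can be sketched in a few lines. The code below is an illustration only, not the authors' simulation: it assumes a simplified gland-mass model with all rate constants normalised to 1 except the CRH production parameter `b1`, hypothetical mass turnover times of 20 and 30 days, and quasi-steady-state hormones. The helper names `hormones` and `run` are made up for this example.

```python
def hormones(P, A, u, b1):
    """Quasi-steady-state CRH, ACTH and cortisol for given functional masses.

    Assumed relations (illustrative): x1 = b1*u/x3, x2 = P*x1/x3, x3 = A*x2.
    """
    x3 = (b1 * A * P * u) ** (1.0 / 3.0)  # cortisol
    x1 = b1 * u / x3                      # CRH
    x2 = P * x1 / x3                      # ACTH
    return x1, x2, x3


def run(days, P, A, u, b1, dt=0.05):
    """Advance the slow masses P and A with forward Euler; return (P, A, cortisol)."""
    for _ in range(int(days / dt)):
        x1, x2, _x3 = hormones(P, A, u, b1)
        P += dt * P * (x1 - 1.0) / 20.0   # corticotroph functional mass (assumed rate)
        A += dt * A * (x2 - 1.0) / 30.0   # adrenal functional mass (assumed rate)
    return P, A, hormones(P, A, u, b1)[2]


u = 2.0                                              # chronic stress input (assumed value)
P, A, c_stress = run(3000, 1.0, 1.0, u, b1=1.0)      # untreated chronic stress
P, A, c_treated = run(50, P, A, u, b1=0.5)           # 50 days with CRH synthesis halved
c_day0 = hormones(P, A, u, b1=1.0)[2]                # cortisol immediately after stopping
P, A, c_off = run(3000, P, A, u, b1=1.0)             # long after cessation

print(f"stressed {c_stress:.2f}, treated {c_treated:.2f}, "
      f"just after stop {c_day0:.2f}, long after stop {c_off:.2f}")
```

In this sketch cortisol rises as soon as `b1` is restored and eventually settles back at the untreated, stressed level; the exact time course depends on the assumed rates.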

      Reviewer #2 (Significance):

      Whilst the study builds on the use of a previously described mathematical model, its utility in identifying potential targets for therapy within the important area of chronic stress makes it an important example of the value of the modelling approach to decisions on appropriate targets for therapy. The model does not include important known factors which have been described as being important in the HPA axis response to chronic stress and would be considerably improved if these could be incorporated.
      The study builds on conceptual insights into the role a delayed or slow functional mass change might play in dysregulation of endocrine axes and this could be applied to other physiological systems and will be of interest to modellers and physiologists alike. The authors are leaders in this field and there are few other modellers considering systems level interactions over this timescale.

      Response: We thank the reviewer for this endorsement.

      As a pituitary physiologist, my review has focused on the interactions between the various players in HPA axis function; I do not have the expertise to comment on mathematical modelling aspects.

      Reviewer #3 (Evidence, reproducibility and clarity):

      This extremely interesting paper asks why various attempts to treat depression and bipolar disorder with glucocorticoid antagonists or cortisol synthesis inhibitors have failed. The starting point for their analysis is a simple computational model that, importantly, includes the facts that CRH stimulates not only ACTH release but also corticotroph growth and ACTH stimulates not only cortisol production but also the growth of cells in the adrenal cortex. They call this the "gland mass model". According to the model, if the hypothalamus receives a continuous stress input, all of the HPA hormones will be elevated: CRH transiently and the others in a sustained fashion. Adding a sufficient dose of a CRH inhibitor (decreasing the rate constant b1 in the model) or a CRH antibody (increasing the rate constant a1) normalizes the hormone levels, whereas blocking cortisol function or production does not. This is demonstrated by numerical simulations and backed up by deriving analytical expressions for the hormone concentrations at steady state. The paper provides a plausible explanation for why past therapeutic efforts have failed and points to a couple of approaches that might succeed. These conclusions are hypotheses (they haven't been tested experimentally, and we really don't know how accurately the system is described by this nice, simple model), but they are really intriguing hypotheses that could lead to therapeutic breakthroughs. I strongly recommend publication.

      Response: We thank the reviewer for this endorsement.

      My only criticisms are minor:

      1. The authors should specify what exact change in the model's parameters they are making to implement their therapeutic interventions. E.g. in Fig 1B top left and 2A, what is the change in the value of b1 that corresponds to the addition of a CRH-synthesis inhibitor? (I'd guess it's being dropped to zero, but if this is stated, I missed it)

      Response: We thank the reviewer for this comment, which helped us clarify the parameter change required to normalize cortisol. We have now added in lines 173-175: “According to equation (1), as a general guideline, treating cortisol levels that are x-fold higher than baseline requires a drug dose that alters the relevant parameter (e.g., CRH production or removal rate) by a similar x-fold.”
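The x-fold guideline can be made concrete with a short steady-state calculation. Equation (1) of the manuscript is not reproduced in this response, so the form below is an assumption: a simplified gland-mass model in which CRH ($x_1$) is produced at rate $b_1 u / x_3$ and removed at rate $a_1 x_1$, and the corticotroph functional mass $P$ grows at rate $b_P x_1$ and is removed at rate $a_P$. The symbols $q_1$, $b_P$, and $a_P$ are introduced here for illustration only.

```latex
% Assumed simplified equations (not quoted from the manuscript)
\frac{dx_1}{dt} = q_1\left(\frac{b_1 u}{x_3} - a_1 x_1\right), \qquad
\frac{dP}{dt} = P\left(b_P x_1 - a_P\right)

% Setting both derivatives to zero:
x_1^{*} = \frac{a_P}{b_P}, \qquad
x_3^{*} = \frac{b_1 u}{a_1 x_1^{*}} = \frac{b_1\, b_P}{a_1\, a_P}\, u

% A stress input that raises cortisol x-fold (u \to x\,u) is therefore offset by
b_1 \to \frac{b_1}{x} \quad \text{or} \quad a_1 \to x\, a_1
```

Under these assumptions the cortisol steady state depends on the drug-sensitive parameters only through the ratio $b_1/a_1$, which is why an x-fold elevation calls for a similar x-fold change in either parameter.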

      2. I think it would also be useful to show a dose-response relationship for the various interventions.

      Response: We thank the reviewer for this comment that helped us to add a new section in the results on dose response, and three new figures (Figure 4, Figure S2 and Figure S3):

      “CRH interventions have a dose-dependent response in the model

      We computed the effects of drug doses by varying the relevant model parameter, where zero dose means no change in the parameter and high doses mean large changes in the parameter. We find that both candidate interventions for lowering cortisol - CRH-synthesis inhibitors and CRH-blocking antibodies - cause a dose-dependent reduction of steady-state cortisol (Figure 4A). This indicates that putative treatment may require finding the appropriate dose to return the patients to their normal cortisol baseline range. Other drug candidates have no effect on long-term cortisol steady state (Figure S2).

      At all doses, the steady states of CRH and ACTH remain normal (Figure 4B-C). The acute stress response, defined as peak cortisol upon acute stress input relative to steady-state cortisol, is dose dependent (Figure 4D and Figure S3). At a dose that returns cortisol to the normal range, the acute response is also normalized.

      We also tested the effects of abrupt treatment cessation. For both CRH interventions, stopping treatment led to a rapid return to hypercortisolemia (Figure 4E-F and Figure S4).

      Figure 4. Predicted effective interventions have a dose-dependent effect on cortisol, and cortisol abruptly rises when treatment is ceased. (A) Cortisol steady state in the model upon changes in doses of CRH-synthesis inhibitors and CRH-blocking antibodies. (B-C) The same changes in drug doses have no effect on ACTH (B) and CRH (C) steady state levels. (D) Cortisol peak response to an acute stress relative to steady state for different drug doses. (E-F) HPA dynamics upon cessation of CRH-synthesis inhibitors (E) and anti-CRH antibodies (F) after 50 days.”

      In the supplemental information:

      “Cortisol dose response to HPA-targeting drugs

      Figure S2. Cortisol steady state dose response to HPA-targeting drugs, related to Figure 4.

      Figure S3. Cortisol peak response to acute stressor under varying concentrations of HPA-targeting drugs, related to Figure 4.”

      *Referees cross-commenting*

      It looks like we are all enthusiastic about this work.

      Response: Thank you.

      Reviewer #3 (Significance):

      Strengths: It's a beautiful new insight on a really important topic, extracted from a simple, understandable mathematical model of the HPA axis.

      Weaknesses: It is based on a model and the model could be wrong. This does not, however, diminish my enthusiasm for this provocative work.

      Advance: It is highly original.

      Audience: I hope it attracts a wide audience--modelers, endocrinologists, psychiatrists, drug developers.

      My expertise: I am a systems biologist, have taught psychopharmacology to medical students, and have an interest in endocrine signaling.

    1. Those with disabilities often find ways to cope with their disability, that is, find ways to work around difficulties they encounter and seek out places and strategies that work for them (whether realizing they have a disability or not). Additionally, people with disabilities might change their behavior (whether intentionally or not) to hide the fact that they have a disability, which is called masking and may take a mental or physical toll on the person masking, which others around them won’t realiz

      People with disabilities often face difficulties because of the challenges their conditions create, so we should do as much as possible to make their lives easier. For example, I think it is thoughtful to reserve parking spots for disabled people in parking lots so that they can park close by without having to move around much.

    2. 10.2. Accessible Design

      There are several ways of managing disabilities. All of these ways of managing disabilities might be appropriate at different times for different situations.

      10.2.1. Coping Strategies

      Those with disabilities often find ways to cope with their disability, that is, find ways to work around difficulties they encounter and seek out places and strategies that work for them (whether realizing they have a disability or not). Additionally, people with disabilities might change their behavior (whether intentionally or not) to hide the fact that they have a disability, which is called masking and may take a mental or physical toll on the person masking, which others around them won’t realize. For example, kids who are nearsighted and don’t realize their ability to see is different from other kids will often seek out seats at the front of classrooms where they can see better. As for us two authors, we both have ADHD and were drawn to PhD programs where our tendency to hyperfocus on following our curiosity was rewarded (though executive dysfunction with finishing projects created challenges)[1].

      This way of managing disabilities puts the burden fully on disabled people to manage their disability in a world that was not designed for them, trying to fit in with “normal” people.

      10.2.2. Modifying the Person

      Another way of managing disabilities is assistive technology [j13], which is something that helps a disabled person act as though they were not disabled. In other words, it is something that helps a disabled person become more “normal” (according to whatever a society’s assumptions are). For example:

      - Glasses help people with near-sightedness see in the same way that people with “normal” vision do
      - Walkers and wheelchairs can help some disabled people move around closer to the way “normal” people can (though stairs can still be a problem)
      - A spoon might automatically balance itself [j14] when held by someone whose hands shake
      - Stimulants (e.g., caffeine, Adderall) can increase executive function in people with ADHD, so they can plan and complete tasks more like how neurotypical people do.

      Assistive technologies give tools to disabled people to help them become more “normal.” So the disabled person becomes able to move through a world that was not designed for them. But there is still an expectation that disabled people must become more “normal,” and often these assistive technologies are very expensive. Additionally, attempts to make disabled people (or people with other differences) act “normal” can be abusive, such as Applied Behavior Analysis (ABA) therapy for autistic people [j15], or “Gay Conversion Therapy” [j16].

      10.2.3. Making an environment work for all

      Another strategy for managing disability is to use Universal Design [j17], which originated in architecture. In universal design, the goal is to make environments and buildings have options so that there is a way for everyone to use it[2]. For example, a building with stairs might also have ramps and elevators, so people with different mobility needs (e.g., people with wheelchairs, baby strollers, or luggage) can access each area. In the elevators the buttons might be at a height that both short and tall people can reach. The elevator buttons might have labels both drawn (for people who can see them) and in braille (for people who cannot), and the ground floor button may be marked with a star, so that even those who cannot read can at least choose the ground floor.

      In this way of managing disabilities, the burden is put on the designers to make sure the environment works for everyone, though disabled people might need to go out of their way to access features of the environment.

      10.2.4. Making a tool adapt to users

      When creating computer programs, programmers can do things that aren’t possible with architecture (where Universal Design came out of), that is: programs can change how they work for each individual user. All people (including disabled people) have different abilities, and making a system that can modify how it runs to match the abilities a user has is called Ability based design [j18]. For example, a phone might detect that the user has gone from a dark to a light environment, and might automatically change the phone brightness or color scheme to be easier to read. Or a computer program might detect that a user’s hands tremble when they are trying to select something on the screen, and the computer might change the text size, or try to guess the intended selection.

      In this way of managing disabilities, the burden is put on the computer programmers and designers to detect and adapt to the disabled person.

      10.2.5. Are things getting better?

      We could look at inventions of new accessible technologies and think the world is getting better for disabled people. But in reality, it is much more complicated. Some new technologies make improvements for some people with some disabilities, but other new technologies are continually being made in ways that are not accessible. And, in general, cultures shift in many ways all the time, making things better or worse for different disabled people.

      The comparison between assistive technology and universal design also made me reflect on how differently society perceives accommodations. Glasses, for example, are a widely accepted assistive tool, to the point where people forget that nearsightedness is technically a disability. Meanwhile, other assistive devices, like wheelchairs or ADHD medication, can sometimes carry stigma, even though they serve the same purpose—helping people function in a world that isn’t designed for them. T

  2. social-media-ethics-automation.github.io
    1. A disability is an ability that a person doesn’t have, but that their society expects them to have.[1] For example:

      - If a building only has staircases to get up to the second floor (it was built assuming everyone could walk up stairs), then someone who cannot get up stairs has a disability in that situation.
      - If a physical picture book was made with the assumption that people would be able to see the pictures, then someone who cannot see has a disability in that situation.
      - If tall grocery store shelves were made with the assumption that people would be able to reach them, then people who are short, or who can’t lift their arms up, or who can’t stand up, all would have a disability in that situation.
      - If an airplane seat was designed with little leg room, assuming people’s legs wouldn’t be too long, then someone who is very tall, or who has difficulty bending their legs would have a disability in that situation.

      Which abilities are expected of people, and therefore what things are considered disabilities, are socially defined [j1]. Different societies and groups of people make different assumptions about what people can do, and so what is considered a disability in one group, might just be “normal” in another.

      There are many things we might not be able to do that won’t be considered disabilities because our social groups don’t expect us to be able to do them. For example, none of us have wings that we can fly with, but that is not considered a disability, because our social groups didn’t assume we would be able to. Or, for a more practical example, let’s look at color vision:

      Most humans are trichromats, meaning they can see three base colors (red, green, and blue), along with all combinations of those three colors. Human societies often assume that people will be trichromats. So people who can’t see as many colors are considered to be color blind [j2], a disability. But there are also a small number of people who are tetrachromats [j3] and can see four base colors[2] and all combinations of those four colors. In comparison to tetrachromats, trichromats (the majority of people), lack the ability to see some colors. But our society doesn’t build things for tetrachromats, so their extra ability to see color doesn’t help them much. And trichromats’ relative reduction in seeing color doesn’t cause them difficulty, so being a trichromat isn’t considered to be a disability.

      Some disabilities are visible disabilities that other people can notice by observing the disabled person (e.g., wearing glasses is an indication of a visual disability, or a missing limb might be noticeable). Other disabilities are invisible disabilities that other people cannot notice by observing the disabled person (e.g., chronic fatigue syndrome [j4], contact lenses for a visual disability, or a prosthetic for a missing limb covered by clothing). Sometimes people with invisible disabilities get unfairly accused of “faking” or “making up” their disability (e.g., someone who can walk short distances but needs to use a wheelchair when going long distances).

      Disabilities can be accepted as socially normal, like is sometimes the case for wearing glasses or contacts, or it can be stigmatized [j5] as socially unacceptable, inconvenient, or blamed on the disabled person.

      Some people (like many with chronic pain) would welcome a cure that got rid of their disability. Others (like many autistic people [j6]), are insulted by the suggestion that there is something wrong with them that needs to be “cured,” and think the only reason autism is considered a “disability” at all is because society doesn’t make reasonable accommodations for them the way it does for neurotypical [j7] people.

      Many of the disabilities we mentioned above were permanent disabilities, that is, disabilities that won’t go away. But disabilities can also be temporary disabilities, like a broken leg in a cast, which may eventually get better. Disabilities can also vary over time (e.g., “Today is a bad day for my back pain”). Disabilities can even be situational disabilities, like the loss of fine motor skills when wearing thick gloves in the cold, or trying to watch a video on your phone in class with the sound off, or trying to type on a computer while holding a baby.

      As you look through all these types of disabilities, you might discover ways you have experienced disability in your life. Though please keep in mind that different disabilities can be very different, and everyone’s experience with their own disability can vary. So having some experience with disability does not make someone an expert in any other experience of disability.

      As for our experience with disability, Kyle has been diagnosed with generalized anxiety disorder [j8] and Susan has been diagnosed with depression [j9]. Kyle and Susan also both have:

      - near sightedness [j10]: our eyes cannot focus on things far away (unless we use corrective lenses, like glasses or contacts)
      - ADHD [j11]: we have difficulty controlling our focus, sometimes being hyperfocused and sometimes being highly distracted and also have difficulties with executive dysfunction [j12].

      [1]

      This made me think about how I’ve encountered situational disabilities in my own life. For example, trying to use a smartphone in bright sunlight when the screen becomes unreadable is a form of situational disability. Similarly, being in a loud space where I can't hear a conversation well might resemble the experience of someone with hearing loss, even if it's only temporary. It’s a reminder that disability is fluid and context-dependent, not just a fixed identity that applies to a specific group of people.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:  

      Reviewer #1 (Public Review):  

      Summary:  

      The study by Pudlowski et al. investigates how the intricate structure of centrioles is formed by studying the role of a complex formed by delta- and epsilon-tubulin and the TEDC1 and TEDC2 proteins. For this, they employ knockout cell lines, EM, and ultrastructure expansion microscopy as well as pull-downs. Previous work has indicated a role of delta- and epsilon-tubulin in triplet microtubule formation. Without triplet microtubules centriolar cylinders can still form, but are unstable, resulting in futile rounds of de novo centriole assembly during S phase and disassembly during mitosis. Here the authors show that all four proteins function as a complex and knockout of any of the four proteins results in the same phenotype. They further find that mutant centrioles lack inner scaffold proteins and contain an extended proximal end including markers such as SAS6 and CEP135, suggesting that triplet microtubule formation is linked to limiting proximal end extension and formation of the central region that contains the inner scaffold. Finally, they show that mutant centrioles seem to undergo elongation during early mitosis before disassembly, although it is not clear if this may also be due to prolonged mitotic duration in mutants.  

      Strengths:  

      Overall this is a well-performed study, well presented, with conclusions mostly supported by the data. The use of knockout cell lines and rescue experiments is convincing.  

      Weaknesses:  

      In some cases, additional controls and quantification would be needed, in particular regarding cell cycle and centriole elongation stages, to make the data and conclusions more robust. 

      We thank the reviewer for these comments and have improved our analyses of these as detailed below.

      Reviewer #2 (Public Review):  

      Summary:  

      In this article, the authors study the function of TEDC1 and TEDC2, two proteins previously reported to interact with TUBD1 and TUBE1. Previous work by the same group had shown that TUBD1 and TUBE1 are required for centriole assembly and that human cells lacking these proteins form abnormal centrioles that only have singlet microtubules that disintegrate in mitosis. In this new work, the authors demonstrate that TEDC1 and TEDC2 depletion results in the same phenotype with abnormal centrioles that also disintegrate into mitosis. In addition, they were able to localize these proteins to the proximal end of the centriole, a result not previously achieved with TUBD1 and TUBE1, providing a better understanding of where and when the complex is involved in centriole growth.  

      Strengths:  

      The results are very convincing, particularly the phenotype, which is the same as previously observed for TUBD1 and TUBE1. The U-ExM localization is also convincing: despite a signal that's not very homogeneous, it's clear that the complex is in the proximal region of the centriole and procentriole. The phenotype observed in U-ExM on the elongation of the cartwheel is also spectacular and opens the question of the regulation of the size of this structure. The authors also report convincing results on direct interactions between TUBD1, TUBE1, TEDC1, and TEDC2, and an intriguing structural prediction suggesting that TEDC1 and TEDC2 form a heterodimer that interacts with the TUBD1-TUBE1 heterodimer.  

      Weaknesses:  

      The phenotypes observed in U-ExM on cartwheel elongation merit further quantification, enabling the field to appreciate better what is happening at the level of this structure.  

      We thank the reviewer for these comments and have improved our analyses of cartwheel elongation as detailed below.

      Reviewer #3 (Public Review):  

      Summary:  

      Human cells deficient in delta-tubulin or epsilon-tubulin form unstable centrioles, which lack triplet microtubules and undergo a futile formation and disintegration cycle. In this study, the authors show that human cells lacking the associated proteins TEDC1 or TEDC2 have these identical phenotypes. They use genetics to knock out TEDC1 or TEDC2 in p53-negative RPE-1 cells and expansion microscopy to structurally characterize mutant centrioles. Biochemical methods and AlphaFold-multimer prediction software are used to investigate interactions between tubulins and TEDC1 and TEDC2.  

      The study shows that mutant centrioles are built only of A tubules, which elongate and extend their proximal region, fail to incorporate structural components, and finally disintegrate in mitosis. In addition, they demonstrate that delta-tubulin or epsilon-tubulin and TEDC1 and TEDC2 form one complex and that TEDC1 TEDC2 can interact independently of tubulins. Finally, they show that the localization of four proteins is mutually dependent.  

      Strengths:  

      The results presented here are mostly convincing, the study is exciting and important, and the manuscript is well-written. The study shows that delta-tubulin, epsilon-tubulin, TEDC1, and TEDC2 function together to build a stable and functional centriole, significantly contributing to the field and our understanding of the centriole assembly process.  

      Weaknesses:  

      The ultrastructural characterization of TEDC1 and TEDC2 obtained by U-ExM is inconclusive. Improving the quality of the signals is paramount for this manuscript.  

      We thank the reviewer for these comments and have improved our imaging of TEDC1 and TEDC2 localization, as detailed below.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):  

      The reviewers agreed that the conclusions are largely supported by solid evidence, but felt that improving the following aspects would make some of the data more convincing:  

      (1) The UExM localizations of TEDC1/2 are not very convincing and the reviewers suggest to complement these with alternative super-resolution approaches (e.g. SIM) and/or different labeling techniques such as pre-expansion labeling using STAR red/orange secondaries (also robust for SIM and STED), use of the Halo tag, different tag antibodies, etc 

      We thank the reviewers for these recommendations and have adapted two of these strategies to improve our imaging of TEDC1 and TEDC2 localization. First, we used an alternative super-resolution approach, a Yokogawa CSU-W1 SoRA confocal scanner (resolution = 120 nm) and imaged cells grown on coverslips (not expanded). We found that TEDC1 and TEDC2 localize to procentrioles and the proximal end of parental centrioles (Fig 2 – Supplementary Figure 1a, b). Second, we used a recently described expansion gel chemistry (Kong et al., Methods Mol Biol 2024) combined with Abberior Star red and orange secondary antibodies. This technique resulted in robust signal at centrosomes and in the cytoplasm and indicated that TEDC1 and TEDC2 localize near the centriole walls of procentrioles and the proximal region of parental centrioles, near CEP44 (Fig 2 – Supplementary Figure 1c, d). These results complement and support our initial observations (Fig 2C, D) and we have edited the text to reflect this (lines 157-163). We also note that these Flag tag and V5 tag primary antibodies are specific and have little background signal in all applications (Fig 2 – Supplementary Fig 1E-J), while other commercially available antibodies against these tags did exhibit non-specific signal. 

      (2) The cell cycle classifications of centrioles would strongly benefit, apart from a better description, from adding quantifications of average centriole length at a given stage based on tubulin staining (not acTub). 

      We thank the reviewers for these recommendations. We have added an improved description of our cell cycle analyses (lines 234-237). We have also added new analyses for centriole length as measured by staining with alpha-tubulin (Fig 4 – Supp 3 and Fig 4 – Supp 4). We find that in all mutants, acetylated tubulin elongates along with alpha-tubulin in a similar way as control centrioles.

      Reviewer #1 (Recommendations For The Authors):  

      Specific points:  

      (1) The introduction is a bit oddly structured. About halfway through it summarizes what is going to be presented in the study, giving the impression that it is about to conclude, but then continues with additional, detailed introduction paragraphs. Overall, the authors may also want to consider making it more concise.

      We thank the reviewer for these suggestions and have shortened and restructured the introduction for clarity and conciseness.

      (2) The text should explain to the non-expert reader why endogenous proteins are not detected and why exogenously expressed, tagged versions are used. Related to this, the authors state overexpression, but what is this assessment based on? Does expression at the endogenous level also rescue? At least by western blot, these questions should be addressed. 

      In the text, we have added clarification about why endogenous proteins were not detected for immunofluorescence (lines 149-151). To quantify the overexpression, we have added Western blots of TEDC1 and TEDC2 to Fig 1 – Supplementary Figure 1E,F. We note that endogenous levels of both proteins are very low, and the rescue constructs are overexpressed 20 to 70 fold above endogenous levels.  

      (3) The figures should clearly indicate when tagged proteins are used and detected. Currently, this info is only found in the legends but should be in the figure panels as well. 

      We have made these changes to the figure panels in Fig 2, Fig 2 – Supp 1, and Fig 3.

      (4)  I could not find a description and reference to Figure 2 Supplement 2 and 3. 

      We have replaced these supplements with new supplementary figures for TEDC1 and TEDC2 localization (Fig 2 – Supp 1).

      (5) The multiple bands including unspecific (?) bands should be labeled to guide the reader in the western blots. 

      We have labeled nonspecific bands in our Western blots with asterisks (Fig 1 – Supp 1, Fig 3)

      (6) The alphafold prediction suggests that TUBD1 can bind to the TED complex in the absence of TUBE1 can this be shown? This would be a nice validation of the predicted architecture of the complex. I also missed a bit of a discussion of the predicted architecture. How could it be linked to triplet microtubule formation? Is the latest alphafold version 3 adding anything to this analysis? 

      In our pulldown experiments, we found that TUBD1 cannot bind to TEDC1 or TEDC2 in the absence of TUBE1 (Fig 3C, D, IB: TUBD1). We performed this experiment with three biological replicates and found the same result. It is possible that TUBD1 and TUBE1 form an intact heterodimer, similar to alpha-tubulin and beta-tubulin, and this will be an exciting area of future research.

      We have added new analysis from AlphaFold3 (Fig 3 – Supp 1B). AlphaFold3 predicts a similar structure as AlphaFold Multimer.

      We have also added additional discussion about the AlphaFold prediction to the text (lines 220-222, 365-367). Thanks to the reviewer for pointing out this oversight.

      (7) I suggest briefly explaining in the text how cells and centrioles at different cell cycle stages were identified. I found some info in the legend of Figure 1, but no info for other figures or in the text. Related to this, how are procentrioles defined in de novo formation? There is no parental centriole to serve as a reference. 

      We have added a brief explanation of the synchronization and identification in lines 234-237. We have also clarified the text regarding de novo centrioles, and now term these “de novo centrioles in the first cell cycle after their formation” (lines 271-272).

      (8) Related to point 7: using acetylated tubulin as a universal length and width marker seems unreliable since it is a PTM. The authors should use general tubulin staining to estimate centriole dimensions, or at least establish that acetylated tubulin correlates well with the overall tubulin signal in all mutants. 

      We have added two supplementary data figures (Fig 4 – supp 3 and Fig 4 – supp 4) in which we co-stain control and mutant centrioles with alpha-tubulin. We found that acetylated tubulin marked mutant centrioles well and as alpha-tubulin length increased, acetylated tubulin length also increased. 

      (9) Presence and absence of various centriolar proteins. These analyses lack a clear reference for the precise centriole elongation stage. This is particularly problematic for proteins that are recruited at specific later stages (such as inner scaffold proteins). The staining should be correlated with centriole length measurements, ideally using general tubulin staining.  

      As described for point 8, we have added two supplementary data figures in which we co-stain control and mutant centrioles with alpha-tubulin and found that acetylated tubulin also increases as overall tubulin length increases in all mutants. We note that inner scaffold proteins are absent in all our mutant centrioles at all stages of the cell and centriole cycle, as also previously reported for POC5 in Wang et al., 2017.

      Reviewer #2 (Recommendations For The Authors):  

      Here's a list of points I think could be improved:  

      -  As the authors previously published, the centriole appears to have a smaller internal diameter than mature centrioles. Could the authors measure to see if the phenotype is identical? Is the centriole blocked in the bloom phase (Laporte et al. 2024)? 

      We have added an additional supplementary figure (Fig 4 – supp 5) to show that mutant centrioles have smaller diameters than mature centrioles, as we previously reported for the delta-tubulin and epsilon-tubulin mutant centrioles by EM. We thank the reviewers for the additional question of the bloom phase. Given the comparatively smaller number of centrioles we analyzed in this paper compared to Laporte et al (50 to 80 centrioles per condition here, versus 800 centrioles in Laporte et al), it is difficult to definitively conclude whether there is a block in bloom phase. This would be an interesting area for future research.  

      -  The images of the centrioles in EM are beautiful. Would it be possible to apply a symmetrisation on it to better see the centriolar structures? For example, is the A-C linker present? 

      We thank the reviewer for this excellent suggestion. Using centrioleJ, we find that the A-C linker is absent from mutant centrioles. The symmetrized images have been added to Fig 1 – Supplementary Fig 2, and additional discussion has been added to the text (lines 143-144, lines 368-374).  

      -  How many EM images were taken? Did the centrioles have 100% A-microtubule only or sometimes with B-MT? 

      For TEM, we focused on centrioles that were positioned to give perfect cross-section images of the centriolar microtubules, and thus did not take images of off-angle or rotated centrioles. Given the difficulty of this experiment (centrioles are small structures within the cell, centrosomes are single-copy organelles, and off-angle centrioles were not imaged), we were lucky to image 3 centrioles that were in perfect cross-section: 2 for Tedc1<sup>-/-</sup> and 1 for Tedc2<sup>-/-</sup>. Our images indicate that these centrioles only have A-tubules (Fig 1 – Supp Fig 2).

      -  In Figure 2 - it would be preferable to write TEDC2-flag or TEDC1-flag and not TEDC2/1. 

      We have made this change.

      -  It seems that Figures 2C and D aren't cited, and some of the data in the supplemental data are not described in the main text. 

      We have replaced these supplements with new supplementary figures for TEDC1 and TEDC2 localization (Fig 2 – Supp 1).

      -  The signal in U-ExM with the anti-Flag antibody is heterogeneous. Did the authors test several anti-FLAG antibodies in U-ExM? 

      We tested several anti-Flag and anti-V5 antibodies for our analyses, and chose these because they have little background signal in all applications (Fig 2 – Supplementary Fig 1E-J). Other commercially available antibodies against these tags did exhibit non-specific signal.

      -  The AlphaFold prediction is difficult to interpret, the authors should provide more views and the PDB file. 

      We have added 2 additional views of the AlphaFold prediction in Fig 3 – Supp 1A.

      -  In general, but particularly for Figure 4: the length doesn't seem to be divided by the expansion factor, it is therefore difficult to compare with known EM dimensions. Can the authors correct the scale bars? 

      We have corrected the scale bars for all figures to account for the expansion factor.
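      The scale-bar correction described above is a division of each post-expansion measurement by the expansion factor. As a minimal sketch of that arithmetic: the function name, the expansion factor of 4.2, and the example lengths below are hypothetical illustrations and are not values reported by the authors.

```python
def to_pre_expansion_nm(measured_um: float, expansion_factor: float) -> float:
    """Convert a length measured in an expanded gel (micrometres)
    to the estimated pre-expansion length (nanometres)."""
    if expansion_factor <= 0:
        raise ValueError("expansion factor must be positive")
    return measured_um * 1000.0 / expansion_factor


# Hypothetical values for illustration only (not from the manuscript).
EXPANSION_FACTOR = 4.2
measured_lengths_um = [1.9, 2.1]

# Estimated pre-expansion lengths, comparable to EM-scale dimensions.
corrected_nm = [to_pre_expansion_nm(x, EXPANSION_FACTOR) for x in measured_lengths_um]
```

      In practice the expansion factor would be measured for each gel rather than assumed, since it varies between experiments.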

      -  Concerning Gamma-tubulin that is "recruited to the lumen of centrioles by the inner scaffold, had localization defects in mutant centrioles. However, we were unable to reliably detect gamma-tubulin within the lumen of control or de novo-formed centrioles in S or G2-phase (Figure 4 - Supplement 1E), and thus were unable to test this hypothesis". In Laporte et al 2024, Gamma-tubulin arrives later than the inner scaffold and only on mature centrioles, so this result appears to be in line with previous observation. However, the authors should be able to detect a proximal signal under the microtubules of the procentriole, is this the case? 

      We agree that this is an exciting question. However, in our expansion microscopy staining, we frequently observe that gamma-tubulin surrounds centrioles, corresponding to its role in the pericentriolar material (PCM). In our hands, we find it difficult to distinguish between centriolar gamma-tubulin at the base of the A-tubule from gamma-tubulin within the PCM.  

      -  In the signal elongation of SAS-6, STIL, CEP135, CPAP, and CEP44, would it be possible to quantify the length of these signals (with dimensions divided by the expansion factor for comparison with known TEM distances)? 

      We have quantified the lengths of SAS-6 and CEP135 in new Fig 4 – Supp 3 and Fig 4 – Supp 4.  

      -  The authors observe that centrin is present, but only as a SFI1 dot-like localization (which is another protein that would be interesting to look at), and not an inner scaffold localization. Can the authors elaborate? These results suggest that the distal part is correctly formed with only a microtubule singlet. 

      We agree with the reviewer’s interpretation that the centriole distal tip is likely correctly formed with only singlet microtubules, as both distal centrin and CP110 are present. We have added this point to the discussion (line 415).

      -The authors observe that CPAP is elongated, but CPAP has two locations, proximal and distal. Is it distal or proximal elongation? Is the proximal signal of CPAP longer than that of CEP44 in the mutants? The authors discuss that the elongation could come from overexpression of CPAP, but here it seems that the centriole is not overlong, just the structures around the cartwheel. 

      We thank the reviewer for this point. It is difficult for us to conclude whether the proximal or distal region is extended in the mutants, as our mutant centrioles lack a visible separation between these two regions. It would be interesting to probe this question in the future by testing whether subdomains of CPAP may be differentially regulated in our mutants.

      Reviewer #3 (Recommendations For The Authors):  

      It isn't apparent to me what was counted in Figure 1C. Were all centrioles (mother centrioles and procentrioles) counted? Where is the 40% in control cells coming from? Can this set of data be presented differently? 

      We apologize for the confusion. In this figure, all centrioles were counted. We have updated the figure legend for clarity. We performed this analysis in a similar way as in Wang et al., 2017 to better compare phenotypes.  

      Figure 2C. and the text lines 182-187: The ultrastructural characterization of TEDC1 and TEDC2 suffers from the low quality of the TEDC1 and TEDC2 signals obtained post-expansion. In comparison with robust low-resolution immunosignal, it appears that most of the signal cannot be recovered after expansion. Another sub-resolution imaging method to re-analyze TEDC1 and TEDC2 localization would be essential. The same concern applies to Figures 2 - Supplement 2 and 3. Also, Figure 2 - Supplement 2 and Supplement 3 do not seem to be cited. 

      We thank the reviewer for these recommendations. As also mentioned above, we used an alternative super-resolution approach, a Yokogawa CSU-W1 SoRA confocal scanner (resolution = 120 nm), and found that TEDC1 and TEDC2 localize to procentrioles and the proximal end of parental centrioles (Fig 2 – Supplementary Figure 1a, b). Second, we used a recently described expansion gel chemistry (Kong et al., Methods Mol Biol 2024) combined with Abberior Star red and orange secondary antibodies. This technique resulted in robust signal at centrosomes and in the cytoplasm and indicated that TEDC1 and TEDC2 localize near the centriole walls of procentrioles and the proximal region of parental centrioles, near CEP44 (Fig 2 – Supplementary Figure 1c, d). These stainings complement and support our initial observations (Fig 2C, D) and we have edited the text to reflect this (lines 157-163). We have also removed the supplementary figures that were uncited in the text.

      TUBD1 and TUBE1 form a dimer and TEDC2 and TEDC1 can interact. Any speculation as to why TEDC2 does not pull down both TUBE1 and TUBD1? 

      We apologize for the confusion. TEDC2 does pull down both TUBE1 and TUBD1 (Fig 3D, pull-down, second column, Tedc2-V5-APEX2 rescuing the Tedc2<sup>-/-</sup> cells pulls down TUBD1, TUBE1, and TEDC1).  

      Figure 4A and B. The authors use acetylated tubulin to determine the length of procentrioles in the S and G2 phases. However, procentrioles are not acetylated on their distal ends in these cell cycle phases (as the authors also mention further in the text). Why has alpha tubulin not been used since it works well in U-ExM? The average size of the control, G2 procentrioles, seems too small in Figure 4A and not consistent with other imaging data (for instance, in Figure 4 - Supplement 1 C, Cp110, and CPAP staining). There is no statistical analysis in F4A.  

      We have added two supplementary data figures (Fig 4 – supp 3 and Fig 4 – supp 4) in which we co-stain control and mutant centrioles with alpha-tubulin. We found that acetylated tubulin correlates well with overall tubulin signal in all mutants. We have added statistical analysis to the figure legend of Fig 4A.

      Lines 260 - 262: "These results indicate that centrioles with singlet microtubules can elongate to the same length as controls, and therefore that triplet microtubules are not essential for regulating centriole length." It is hard to agree with this statement. Mutant procentrioles show aberrantly elongated proximal signals of several tested proteins. In addition, in lines 326 - 328, the authors state that "Together, these results indicate that centrioles lacking compound microtubules are unable to properly regulate the length of the proximal end."  

      We thank the reviewer and have clarified the statement to state that these results indicate that centrioles with singlet microtubules can elongate to the same overall length as control centrioles in G2 phase.  

      Line 353: The authors suggest that elongated procentriole structure in mitosis may represent intermediates in centriole disassembly. Another interpretation, more in line with the EM data from Wang et al., 2017, would be that these mutant procentrioles first additionally elongate before they disassemble in late mitosis. The aberrant intermediate structure concept would need further exploration. For instance, anti-alpha/beta-tubulin antibodies could be used to investigate centriole microtubules.  

      We apologize for the confusion and have edited this section for clarity (lines 341-343): “We conclude that in our mutant cells, centrioles elongate in early mitosis to form an aberrant intermediate structure, followed by fragmentation in late mitosis.”

      References need to be included in lines 122, 277, 279. 

      We have added these references.

      Line 281: Add references PMID: 30559430 and PMID: 32526902.  

      We have added these references (lines 265-266).

      Line 289: "Moreover, our results suggest that centriole glutamylation is a multistep process, in which long glutamate side chains are added later during centriole maturation." This does not seem like an original observation. For instance, see PMID: 32526902.  

      We have added this reference (lines 273-274).

    1. The pharmaceutical industry is a massive elephant. Like the blind men of the famous parable, we each catch hold of a tiny piece of it (leg, tail, trunk) and think we have a handle on it: it is strong and solid, it is hairy, it moves like a snake. From about $880 billion dollars of sales for 2011, the industry is expected to grow approximately 5 percent a year in the future.

      This analogy presents the pharmaceutical industry as a powerful but multifaceted entity that different stakeholders perceive differently. The industry's financial growth suggests continued expansion, but the metaphor subtly implies the challenges of incomplete understanding and differing priorities among those interacting with it. Scientists may focus on research and innovation, while policymakers see regulation, and patients experience cost and accessibility concerns.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review): 

      Hotinger et al. explore the population dynamics of Salmonella enterica serovar Typhimurium in mice using genetically tagged bacteria. In addition to physiological observations, pathology assessments, and CFU measurements, the study emphasizes quantifying host bottleneck sizes that limit Salmonella colonization and dissemination. The authors also investigate the genetic distances between bacterial populations at various infection sites within the host.

      Initially, the study confirms that pretreatment with the antibiotic streptomycin before inoculation via orogastric gavage increases the bacterial burden in the gastrointestinal (GI) tract, leading to more severe symptoms and heightened fecal shedding of bacteria. This pretreatment also significantly reduces between-animal variation in bacterial burden and fecal shedding. The authors then calculate founding population sizes across different organs, discovering a severe bottleneck in the intestine, with founding populations reduced by approximately 10^6-fold compared to the inoculum size. Streptomycin pretreatment increases the founding population size and bacterial replication in the GI tract. Moreover, by calculating genetic distances between populations, the authors demonstrate that, in untreated mice, Salmonella populations within the GI tract are genetically dissimilar, suggesting limited exchange between colonization sites. In contrast, streptomycin pretreatment reduces genetic distances, indicating increased exchange.

      In extraintestinal organs, the bacterial burden is generally not substantially increased by streptomycin pretreatment, with significant differences observed only in the mesenteric lymph nodes and bile. However, the founding population sizes in these organs are increased. By comparing genetic distances between organs, the authors provide evidence that subpopulations colonizing extraintestinal organs diverge early after infection from those in the GI tract. This hypothesis is further tested by measuring bacterial burden and founding population sizes in the liver and GI tract at 5 and 120 hours post-infection. Additionally, they compare orogastric gavage infection with the less injurious method of infection via drinking, finding similar results for CFUs, founding populations, and genetic distances. These results argue against injuries during gavage as a route of direct infection. 

      To bypass bottlenecks associated with the GI tract, the authors compare intravenous (IV) and intraperitoneal (IP) routes of infection. They find approximately a 10-fold increase in bacterial burden and founding population size in immune-rich organs with IV/IP routes compared to orogastric gavage in streptomycin-pretreated animals. This difference is interpreted as a result of "extra steps required to reach systemic organs."

      While IP and IV routes yield similar results in immune-rich organs, IP infections lead to higher bacterial burdens in nearby sites, such as the pancreas, adipose tissue, and intraperitoneal wash, as well as somewhat increased founding population sizes. The authors correlate these findings with the presence of white lesions in adipose tissue. Genetic distance comparisons reveal that, apart from the spleen and liver, IP infections lead to genetically distinct populations in infected organs, whereas IV infections generally result in higher genetic similarity. 

      Finally, the authors investigate GI tract reseeding, identifying two distinct routes. They observe that the GI tracts of IP/IV-infected mice are colonized either by a clonal or a diversely tagged bacterial population. In clonally reseeded animals, the genetic distance within the GI tract is very low (often zero) compared to the bile population, which is predominantly clonal or pauciclonal. These animals also display pathological signs, such as cloudy/hardened bile and increased bacterial burden, leading the authors to conclude that the GI tract was reseeded by bacteria from the gallbladder bile. In contrast, animals reseeded by more complex bacterial populations show that bile contributes only a minor fraction of the tags. Given the large founding population size in these animals' GI tracts, which is larger than in orogastrically infected animals, the authors suggest a highly permissive second reseeding route, largely independent of bile. They speculate that this route may involve a reversal of known mechanisms that the pathogen uses to escape from the intestine. 

      The manuscript presents a substantial body of work that offers a meticulously detailed understanding of the population dynamics of S. Typhimurium in mice. It quantifies the processes shaping the within-host dynamics of this pathogen and provides new insights into its spread, including previously unrecognized dissemination routes. The methodology is appropriate and carefully executed, and the manuscript is well-written, clearly presented, and concise. The authors' conclusions are well-supported by experimental results and thoroughly discussed. This work underscores the power of using highly diverse barcoded pathogens to uncover the within-host population dynamics of infections and will likely inspire further investigations into the molecular mechanisms underlying the bottlenecks and dissemination routes described here.

      Major point:

      Substantial conclusions in the manuscript rely on genetic distance measurements using the Cavalli-Sforza chord distance. However, it is unclear whether these genetic distance measurements are independent of the founding population size. I would anticipate that in populations with larger founding population sizes, where the relative tag frequencies are closer to those in the inoculum, the genetic distances would appear smaller compared to populations with smaller founding sizes independent of their actual relatedness. This potential dependency could have implications for the interpretation of findings, such as those in Figures 2B and 2D, where antibiotic-pretreated animals consistently exhibit higher founding population sizes and smaller genetic distances compared to untreated animals.

      Thank you for raising this important point regarding reliance on chord distances for gauging genetic distance in barcoded populations. The reviewer is correct that samples with more founders will be more similar to the inoculum and thus inherently more similar to other samples that also have more founders. However, creation of libraries containing very large numbers of unique barcodes can often circumvent this issue. In this case, the effect size of chance-based similarity is not large enough to change the interpretation of the data in Figures 2B and 2D. In our case, the library has ~6x10<sup>4</sup> barcodes, and the founding populations in Figure 2B are ~10<sup>3</sup>. Randomly resampling to create two populations of 10<sup>3</sup> cells from an initial population with 6x10<sup>4</sup> barcodes is expected to yield largely distinct populations with very little similarity. Thus, the similarity between streptomycin-treated populations in Figure 2D is likely the result of biology rather than chance.  
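      The resampling argument above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' STAMPR pipeline: it assumes all barcodes are equally frequent in the inoculum, that each founder is drawn independently, and it uses the standard single-locus form of the Cavalli-Sforza chord distance, (2/π)·sqrt(2·(1 − Σ sqrt(p_i·q_i))), whose maximum is 2√2/π (about 0.90). The function names are hypothetical.

```python
import math
import random
from collections import Counter


def chord_distance(counts_a: Counter, counts_b: Counter) -> float:
    """Cavalli-Sforza chord distance between two barcode count tables."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    # Only barcodes present in both samples contribute to the overlap term.
    shared = counts_a.keys() & counts_b.keys()
    overlap = sum(
        math.sqrt((counts_a[k] / total_a) * (counts_b[k] / total_b))
        for k in shared
    )
    overlap = min(overlap, 1.0)  # guard against floating-point overshoot
    return (2.0 / math.pi) * math.sqrt(2.0 * (1.0 - overlap))


def sample_founders(n_barcodes: int, n_founders: int, rng: random.Random) -> Counter:
    """Draw founders uniformly at random from the barcode library (assumption)."""
    return Counter(rng.randrange(n_barcodes) for _ in range(n_founders))


rng = random.Random(0)
N_BARCODES = 60_000  # library size quoted in the response (~6x10^4)
N_FOUNDERS = 1_000   # founding population size quoted in the response (~10^3)

pop_a = sample_founders(N_BARCODES, N_FOUNDERS, rng)
pop_b = sample_founders(N_BARCODES, N_FOUNDERS, rng)

d_random = chord_distance(pop_a, pop_b)  # two independent random draws
d_self = chord_distance(pop_a, pop_a)    # identical populations
max_distance = 2.0 * math.sqrt(2.0) / math.pi
```

      Under these assumptions two independent draws share only about 17 barcodes on average (10<sup>3</sup> x 10<sup>3</sup> / 6x10<sup>4</sup>), so `d_random` lands close to `max_distance`, while `d_self` is essentially zero. A real inoculum has uneven barcode frequencies, which would raise chance similarity somewhat, so this is an idealized lower bound on that effect.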

      Reviewer #2 (Public review):

      In this paper, Hotinger et al. propose an improved barcoded library system, called STAMPR, to study Salmonella population dynamics during infection. Using this system, the authors demonstrate significant diversity in the colonization of different Salmonella clones (defined by the presence of different barcodes) not only across different organs (liver, spleen, adipose tissues, pancreas, and gall bladder) but also within different compartments of the same gastrointestinal tissue. Additionally, this system revealed that microbiota competition is the major bottleneck in Salmonella intestinal colonization, which can be mitigated by streptomycin treatment. However, this has been demonstrated previously in numerous publications. They also show that there was minimal sharing between populations found in the intestine and those in the other organs. Upon IV and IP infection to bypass the intestinal bottleneck, they were able to demonstrate, using this library, that Salmonella can re-enter the intestine through two possible routes. One route is essentially the reverse path used to escape the gut, leading to a diverse intestinal population; while the other, through the bile, typically results in a clonal population. Although the authors showed that the STAMPR pipeline improved the ability to identify founder populations and their diversity within the same animal during infections, some of the conclusions appear speculative and not fully supported.

      (1) It's particularly interesting how the authors, using this system, demonstrate the dominant role of the microbiota bottleneck in Salmonella colonization and how it is widened by antibiotic treatment (Figure 1). Additionally, the ability to track Salmonella reseeding of the gut from other organs starting with IV and IP injections of the pathogen provides a new tool to study population dynamics (Figure 5). However, I don't think it is possible to argue that the proximal and distal small intestine, Peyer's patches (PPs), cecum, colon, and feces have different founder populations for reasons other than stochastic variations. All the barcoded Salmonella clones have the same fitness, and the fact that some are found or expanded in one region of the gastrointestinal tract rather than another likely results from random chance (such as being forced into a specific region of the gut for physical or spatial reasons) and subsequent expansion, rather than any inherent biological cause. For example, some bacteria may randomly adhere to the mucus, some may swim toward the epithelial layer, while others remain in the lumen; all will proliferate in those respective sites. In this way, different founder populations arise based on random localization during movement through the gastrointestinal tract, which is an observation, but it doesn't significantly contribute to understanding pathogen colonization dynamics or pathogenesis. Therefore, I would suggest placing less emphasis on describing these differences or better discussing this aspect, especially in the context of the gastrointestinal tract.

      Thank you for helping us identify this area for further clarification. We agree with the reviewer’s interpretation that seeding of proximal and distal small intestine, Peyer's patches (PPs), cecum, colon, and feces with different founder populations is likely caused by stochastic variations, consistent with separate stochastic bottlenecks to establishing these separate niches. To clarify this point we have modified the text in the results section, “Streptomycin treatment decreases compartmentalization of S. Typhimurium populations within the intestine”.

      Change to text:

      “Except for the cecum and colon, in untreated animals the S. Typhimurium populations in different regions of the intestine were dissimilar (Avg. GD ranged from 0.369 to 0.729, 2D left); i.e., there is little sharing between populations in the intestine. These data suggest that there are separate bottlenecks in different regions of the intestine that cause stochastic differences in the identity of the founders. Interestingly, when these founders replicate, they do not mix, remaining compartmentalized with little sharing between populations throughout the intestinal tract (i.e., barcodes found in one region are not in other regions, Figure S3). This was surprising as the luminal contents, an environment presumably conducive to bacterial movement, were not removed from these samples.”

      In this section we are interested in the underlying biology that occurs after the initial bottleneck to preserve this compartmentalization during outgrowth of the intestinal population. In other words, what prevents these separate populations from merging (e.g., what prevents the bacteria replicating in the proximal small intestine from traveling through the intestine and establishing a niche in the distal small intestine)? While we do not explore the mechanisms of compartmentalization, we observe that it is disrupted by streptomycin pretreatment, suggesting a microbiota-dependent biological cause. 

      (2) I do think that STAMPR is useful for studying the dynamics of pathogen spread to organs where Salmonella likely resides intracellularly (Figure 3). The observation that the liver is colonized by an early intestinal population, which continues to proliferate at a steady rate throughout the infection, is very interesting and may be due to the unique nature of the organ compared to the mucosal environment. What is the biological relevance during infection? Do the authors observe the same pattern (Figures 3C and G) when normalizing the population data for the spleen and mesenteric lymph nodes (mLN)? If not, what do the authors think is driving this different distribution?

      Thank you for raising this interesting point. These data indicate that the liver is seeded from the intestine early during infection. The timing and source of dissemination have relevance for understanding how host and pathogen variables control the spread of bacteria to systemic sites. For example, our conclusion (early dissemination) indicates that the immune state of a host at the time of exposure to a pathogen, and for a short period thereafter, is what primarily influences the process of dissemination, not the later response to an active infection.

      We observe that the liver and mucosal environments within the intestine have similar colonization behaviors. Both niches are seeded early during infection, followed by steady pathogen proliferation and compartmentalization that apparently inhibits further seeding. This results in the identity of barcodes in the liver population remaining distinct from the intestinal populations, and the intestinal populations remaining distinct from each other.

      We observe a similar pattern to the liver in the spleen and MLN (the barcodes in the spleen and MLN are dissimilar to the population in the intestine). To clarify this point, we have modified the text (below) and added this analysis as a supplemental figure (S4).

      Change to text:

      Genetic distance comparison of liver samples to other sites revealed that, regardless of streptomycin treatment, there was very little sharing of barcodes between the intestine and extraintestinal sites (Avg. GD >0.75, Figure 3C). Furthermore, the MLN and spleen populations also lacked similarity with the intestine (Figure S4). These analyses strongly support the idea that S. Typhimurium disseminates to extraintestinal organs relatively early following inoculation, before it establishes a replicative niche in the intestine.

      (3) Figure 6: Could the bile pathology be due to increased general bacterial translocation rather than Salmonella colonization specifically? Did the authors check for the presence of other bacteria (potentially also proliferating) in the bile? Do the authors know whether Salmonella's metabolic activity in the bile could be responsible for gallbladder pathology?

      The reviewer raises interesting points for future work. We did not check whether other bacterial species are translocating during S. Typhimurium infection. The relevance of Salmonella’s metabolic activity is also very interesting, and we hope these questions will be answered by future studies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor points:

      (1) P. 9/10 "... the marked delay in shedding after IP and IV relative to orogastric inoculation suggest that the S. Typhimurium population encounters substantial bottleneck(s) on the route(s) from extraintestinal sites back to the intestine.": Can you conclude that from the data? It could also be possible that there is a biological mechanism (other than chance events) that delays the re-entry to the intestine.

      We propose that the delay in shedding indicates additional obstacles that bacteria face when re-entering the intestine, and that there are likely biological mechanisms that cause this delay. However, these unknown mechanisms effectively act as additional bottlenecks by causing a stochastic loss of population diversity. 

      (2) P. 11 "...both organs would likely contain all 10 barcodes. In contrast, a library with 10,000 barcodes can be used to distinguish between a bottleneck resulting in Ns = 1,000 and Ns = 10,000, since these bottlenecks result in a different number of barcodes in output samples. Furthermore, high diversity libraries reduce the likelihood that two tissue samples share the same barcode(s) due to random chance, enabling more accurate quantification of bacterial dissemination.": I agree with the general analysis, but I find it misleading to talk about the presence of barcodes when the analyses in this manuscript are based on the much more powerful comparison of relative abundance of individual tags instead of their presence or absence.

      The reviewer raises an excellent point, and the distinction between relative abundance versus presence/absence is discussed extensively in the original STAMPR manuscript. Although relative abundance is powerful, the primary metric used in this study (Ns) is calculated principally from the number of barcodes, corrected (via simulations) for the probability of observing the same barcode across distinct founders. Although this correction procedure does rely on barcode abundance, the primary driver of founding population quantification is the number of barcodes.
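
      The dependence of barcode counts on library diversity can be illustrated with a simplified calculation. The sketch below assumes a perfectly uniform library and counts only distinct barcodes; it is not the STAMPR Ns algorithm, which also corrects for barcode abundance via simulations, and the function name is hypothetical.

```python
def expected_distinct_barcodes(library_size: int, founders: int) -> float:
    """Expected number of distinct barcodes among `founders` random draws.

    Assumes all barcodes are equally abundant (an idealization).
    """
    # Each barcode is missed by every founder with probability (1 - 1/B)**N.
    p_missing = (1.0 - 1.0 / library_size) ** founders
    return library_size * (1.0 - p_missing)


for library_size in (10, 10_000):
    for founders in (1_000, 10_000):
        distinct = expected_distinct_barcodes(library_size, founders)
        print(f"library={library_size:>6}, founders={founders:>6}: "
              f"~{distinct:,.0f} distinct barcodes expected")
```

      With 10 barcodes, both bottleneck sizes saturate the library at 10 barcodes and cannot be told apart. With 10,000 barcodes, the two bottlenecks give roughly 950 versus 6,300 distinct barcodes, so the count alone separates them under these assumptions.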

      (3) P.14 "the library in LB supplemented with SM was not significantly different than the parent strain" and Figure 2C: How was significance tested? How many times were the growth curves recorded? On my print-out, the red color has different shades for different growth curves.

      Significance was tested with a Mann-Whitney test, and growth curves were performed 5 times. Growth curves are displayed with 50% opacity, and as a result multiple curves directly on top of each other appear darker. The legend to Figure S2 has been modified accordingly.

      (4) P.16: close bracket in the equation for FRD calculation.

      Done

      (5) Figure 2C "Average CFU per founder": I found the wording confusing at first as I thought you divided the average bacterial burden per organ by Ns, instead of averaging the CFU/Ns calculated for each mouse.

      The wording has been clarified. 

      (6) Figure 3B: It would be helpful to include expected genetic distances in the schematic as it is difficult to infer the genetic distance when only two of three, respectively, different "barcode colors" are used. While I find the explanation in the main text intuitive, a graphical representation would have helped me.

      Thank you for the suggestion. Unfortunately, using colors to represent barcodes is imperfect and limits the diversity that can be depicted. We have modified Figure 3B to further clarify. 

      (7) Figure 3C: Why do you compare the genetic distance to the liver, when you discuss the genetic distance of the intestinal population? Is it not possible that the intestinal populations are similar to the extraintestinal organs except the liver?

      For clarity, we chose to highlight exclusively the liver. However, we observed a similar pattern to the liver in other extraintestinal organs. To clarify the generalizability of this point we have added a supplemental figure with comparisons to MLN and Spleen (Supplemental figure S4) as well as further text.

      (8) Figure 3C & S5A: I found "+SM" and "+SM, Drinking" confusing and would have preferred "+SM, Gavage" and "+SM, Drinking" for clarity.

      Done, thank you for the suggestion.

      (9) Figure 3G&H: I find it worthy of discussion that the bacterial burden increases over time, while the founding population decreases. Does that not indicate that replication only occurs at specific sites leading to the amplification of only a few barcodes and thereby a larger change of the relative barcode abundance compared to the inoculum?

      From 5h to 120h the size of the founding population decreases in multiple intestinal sites. This likely indicates that the impact of the initial bottleneck is still ongoing at 5h, although further temporal analysis would be required to define the exact timing of the bottleneck. Notably, the passage time through the mouse intestine is ~5h. Many of the founders observed at 5h could belong to a population that will never establish a replicative niche and, failing to colonize, will be shed in the feces, bottlenecking the population between 5h and 120h. To clarify this point we have added the following text:

      Section “S. Typhimurium disseminates out of the intestine before establishing an intestinal replicative niche”.

      “In contrast to the liver, there were more founders present in samples from the intestine (particularly in the colon) at 5 hours versus 120 hours (Figure 3H). These data likely indicate that many of the founders observed in the intestine at 5 hours are shed in the feces prior to establishing a replicative niche, and demonstrate that the forces restricting the S. Typhimurium population in the intestine act over a period of > 5 hours.”

      (10) Figure S2A: I do not understand this figure. Why are there more than 70.000 tags listed? I was under the impression the barcode library in S. Typhimurium had 55.000 tags while only the plasmid pSM1 had more than 70.000 (but the plasmid should not be relevant here). Why are there distinct lines at approximately 10^-5 and a bit lower? I would have expected continuously distributed barcode frequencies.

      During barcode analysis, each library is mapped to the total barcode list in the barcode donor pSM1, which contains ~70,000 barcodes. This enables consistent analysis across different bacterial libraries. The designation “barcode number” refers to the barcode number in pSM1, meaning many of the barcodes in the Salmonella library are at zero reads. This graph type was chosen to show there was no bias toward a particular barcode; however, there is significant overlap of the points, making individual barcode frequencies difficult to see. We have changed the x-axis to state “pSM1 Barcode Number” and clarified this in the figure legend.

      Since the y-axes on these graphs are on a log10 scale, the lines represent barcodes with 1 read, 2 reads, 3 reads, etc. As the number of reads per barcode increases linearly, the space between the lines decreases on logarithmic axes.
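
      The spacing of these lines follows directly from the logarithm. In the minimal sketch below, the total read count is a hypothetical value chosen so that a single read corresponds to a frequency of 10<sup>-5</sup>; it is not taken from the sequencing data.

```python
import math

# Hypothetical sequencing depth, chosen so that 1 read = frequency 1e-5.
total_reads = 100_000

# log10 frequency of barcodes supported by 1, 2, 3, 4, and 5 reads.
log_freqs = [math.log10(reads / total_reads) for reads in range(1, 6)]

# Vertical distance between neighbouring lines on a log10 axis.
gaps = [upper - lower for lower, upper in zip(log_freqs, log_freqs[1:])]

for reads, log_freq in enumerate(log_freqs, start=1):
    print(f"{reads} read(s): log10(frequency) = {log_freq:.3f}")
print("gaps between lines:", [round(gap, 3) for gap in gaps])
```

      The gap between the lines for k and k+1 reads is log10((k+1)/k), which shrinks as k grows, so the discrete lines merge into an apparently continuous band at higher read counts.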

      (11) There are a few typos in the figure legends of the supplementary material. For example Figure S2: S. Typhimurium not italicized, ~7x105 no superscript. Fig. S4&5 ", Open circles" is "O" is capitalized.

      Typos have been corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This is an interesting manuscript where the authors systematically measure rG4 levels in brain samples at different ages of patients affected by AD. To the best of my knowledge this is the first time that BG4 staining is used in this context and the authors provide compelling evidence to show an association between BG4 staining and age or AD progression, which interestingly indicates that such RNA structure might play a role in regulating protein homeostasis as previously speculated. The methods used and the results reported seem robust and reproducible. There were two main things that needed addressing:

      (1) Usually in BG4 staining experiments to ensure that the signal detected is genuinely due to rG4 an RNase treatment experiment is performed. This does not have to be extended to all the samples presented but having a couple of controls where the authors observe loss of staining upon RNase treatment will be key to ensure with confidence that rG4s are detected under the experimental conditions. This is particularly relevant for these brain tissue samples where BG4 staining has never been performed before.

      (2) The authors have an association between rG4-formation and age/disease progression. They also observe distribution dependency of this, which is great. However, this is still an association which does not allow the model to be supported. This is not something that can be fixed with an easy experiment and it is what it is, but my point is that the narrative of the manuscript should be more fair and reflect the fact that, although interesting, what the authors are observing is a simple correlation. They should still go ahead and propose a model for it, but they should be more balanced in the conclusion and do not imply that this evidence is sufficient to demonstrate the proposed model. It is absolutely fine to refer to the literature and comment on the fact that similar observations have been reported and this is in line with those, but still this is not an ultimate demonstration.

      Comments on current version:

      The authors have now addressed my concerns.

      We thank the reviewer for their support!

      Reviewer #2 (Public review):

      RNA guanine-rich G-quadruplexes (rG4s) are non-canonical higher order nucleic acid structures that can form under physiological conditions. Interestingly, cellular stress is positively correlated with rG4 induction.

      In this study, the authors examined human hippocampal postmortem tissue for the formation of rG4s in aging and Alzheimer Disease (AD). rG4 immunostaining strongly increased in the hippocampus with both age and with AD severity. 21 cases were used in this study (age range 30-92).

      This immunostaining co-localized with hyper-phosphorylated tau immunostaining in neurons. The BG4 staining levels were also impacted by APOE status. rG4 structure was previously found to drive tau aggregation. Based on these observations, the authors propose a model of neurodegeneration in which chronic rG4 formation drives proteostasis collapse.

      This model is interesting, and would explain different observations (e.g., RNA is present in AD aggregates and rG4s can enhance protein oligomerization and tau aggregation).

      Main issue from the previous round of review:

      There is indeed a positive correlation between Braak stage severity and BG4 staining, but this correlation is relatively weak and borderline significant (R = 0.52, p value = 0.028). This is probably the main limitation of this study, which should be clearly acknowledged (together with a reminder that "correlation is not causality"). Related to this, there is no clear justification to exclude the four individuals in Fig 1d (without them R increases to 0.78). Please remove this statement. On the other hand, the difference based on APOE status is more striking.

      Comments on current version:

      The authors have made laudable efforts to address the criticisms I made in my evaluation of the original manuscript.

      We thank the reviewer for their support!

      Recommendations for the authors:

      Reviewing Editor:

      I would suggest two minor edits:

      - The findings are correlative and descriptive, but the title implies functionality (A New Role for RNA G-quadruplexes in Aging and Alzheimer's Disease). I would suggest toning down this title.

      - While I understand the limitations in performing additional biochemical experiments to validate the immunofluorescence study, I think this is worth mentioning as a limitation in the text.

      We have made these two changes as requested, altering the title to remove the word "Role", which may imply more meaning than intended, and adding a line to the discussion on the need for future additional biochemical experiments.

      Reviewer #1 (Recommendations for the authors):

      Thanks for addressing the concerns raised.

      We thank the reviewer for their support!

      Reviewer #2 (Recommendations for the authors):

      Minor point:

      Related to the "correlation is not causality" remark I made in my evaluation of the original manuscript: the authors' answer is reasonable. Still, I would suggest to modify the abstract: "we propose a model of neurodegeneration in which chronic rG4 formation drives proteostasis collapse" => "we propose a model of neurodegeneration in which chronic rG4 formation is linked to proteostasis collapse"

      All other remarks I made have been answered properly.

      We thank the reviewer for their support! We have made the change exactly as requested by the reviewer.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The manuscript investigates lipid scrambling mechanisms across TMEM16 family members using coarse-grained molecular dynamics (MD) simulations. While the study presents a statistically rigorous analysis of lipid scrambling events across multiple structures and conformations, several critical issues undermine its novelty, impact, and alignment with experimental observations.

      Critical issues:

      (1) Lack of Novelty:

      The phenomenon of lipid scrambling via an open hydrophilic groove is already well-established in the literature, including through atomistic MD simulations. The authors themselves acknowledge this fact in their introduction and discussion. By employing coarse-grained simulations, the study essentially reiterates previously known findings with limited additional mechanistic insight. The repeated observation of scrambling occurring predominantly via the groove does not offer significant advancement beyond prior work.

      We agree with the reviewer’s statement regarding the lack of novelty when it comes to our observations of scrambling in the groove of open Ca<sup>2+</sup>-bound TMEM16 structures. However, we feel that the inclusion of closed structures in this study, which attempts to address the as yet unanswered question of how scrambling by TMEM16s occurs in the absence of Ca<sup>2+</sup>, offers new observations for the field. In our study we specifically address to what extent the induced membrane deformation, which has been theorized to help lipids cross the bilayer especially in the absence of Ca<sup>2+</sup>, contributes to the rate of scrambling (see references 36, 59, and 66). There are also several TMEM16F structures solved under activating conditions (bound to Ca<sup>2+</sup> and in the presence of PIP2) which feature structural rearrangements to TM6 that may be indicative of an open state (PDB 6P48) and had not been tested in simulations. We show that these structures do not scramble and thereby present evidence against an out-of-the-groove scrambling mechanism for these states. Although we find a handful of examples of lipids being scrambled by Ca<sup>2+</sup>-free structures of TMEM16 scramblases, none of our simulations suggest that these events are related to the degree of deformation.

      (2) Redundancy Across Systems:

      The manuscript explores multiple TMEM16 family members in activating and non-activating conformations, but the conclusions remain largely confirmatory. The extensive dataset generated through coarse-grained MD simulations primarily reinforces established mechanistic models rather than uncovering fundamentally new insights. The effort, while statistically robust, feels excessive given the incremental nature of the findings.

      Again, we agree with the reviewer’s statement that our results largely confirm those published by other groups and our own. We think, however, that there is value in comparing the scrambling competence of these TMEM16 structures in a consistent manner in a single study, to reduce inconsistencies that may be introduced by the different simulation methods, parameters, and environmental variables (such as lipid composition) used in other published studies of single family members. The consistency across our simulations and the high number of observed scrambling events have allowed us to confirm that the mechanism of scrambling is shared by multiple family members and relies most obviously on groove dilation.

      (3) Discrepancy with Experimental Observations:

      The use of coarse-grained simulations introduces inherent limitations in accurately representing lipid scrambling dynamics at the atomistic level. Experimental studies have highlighted nuances in lipid permeation that are not fully captured by coarse-grained models. This discrepancy raises questions about the biological relevance of the reported scrambling events, especially those occurring outside the canonical groove.

      We thank the reviewer for bringing up the possible inaccuracies introduced by coarse graining our simulations. This is also a concern for us, and we address this issue extensively in our discussion. As the reviewer pointed out above, our CG simulations have largely confirmed existing evidence in the field which we think speaks well to the transferability of observations from atomistic simulations to the coarse-grained level of detail. We have made both qualitative and quantitative comparisons between atomistic and coarse-grained simulations of nhTMEM16 and TMEM16F (Figure 1, Figure 4-figure supplement 1, Figure 4-figure supplement 5) showing the two methods give similar answers for where lipids interact with the protein, including outside of the canonical groove. We do not dispute the possible discrepancy between our simulations and experiment, but our goal is to share new nuanced ideas for the predicted TMEM16 scrambling mechanism that we hope will be tested by future experimental studies.

      (4) Alternative Scrambling Sites:

      The manuscript reports scrambling events at the dimer-dimer interface as a novel mechanism. While this observation is intriguing, it is not explored in sufficient detail to establish its functional significance. Furthermore, the low frequency of these events (relative to groove-mediated scrambling) suggests they may be artifacts of the simulation model rather than biologically meaningful pathways.

      We agree with the reviewer that our observed number of scrambling events in the dimer interface is too low to present it as strong evidence for it being the alternative mechanism for Ca<sup>2+</sup>-independent scrambling. This will require additional experiments and computational studies which we plan to do in future research. However, we are less certain that these are artifacts of the coarse-grained simulation system as we observed a similar event in an atomistic simulation of TMEM16F.

      Conclusion:

      Overall, while the study is technically sound and presents a large dataset of lipid scrambling events across multiple TMEM16 structures, it falls short in terms of novelty and mechanistic advancement. The findings are largely confirmatory and do not bridge the gap between coarse-grained simulations and experimental observations. Future efforts should focus on resolving these limitations, possibly through atomistic simulations or experimental validation of the alternative scrambling pathways.

      Reviewer #2 (Public review):

      Summary:

      Stephens et al. present a comprehensive study of TMEM16-members via coarse-grained MD simulations (CGMD). They particularly focus on the scramblase ability of these proteins and aim to characterize the "energetics of scrambling". Through their simulations, the authors interestingly relate protein conformational states to the membrane's thickness and link those to the scrambling ability of TMEM members, measured as the trespassing tendency of lipids across leaflets. They validate their simulation with a direct qualitative comparison with Cryo-EM maps.

      Strengths:

      The study demonstrates an efficient use of CGMD simulations to explore lipid scrambling across various TMEM16 family members. By leveraging this approach, the authors are able to bypass some of the sampling limitations inherent in all-atom simulations, providing a more comprehensive and high-throughput analysis of lipid scrambling. Their comparison of different protein conformations, including open and closed groove states, presents a detailed exploration of how structural features influence scrambling activity, adding significant value to the field. A key contribution of this study is the finding that groove dilation plays a central role in lipid scrambling. The authors observe that for scrambling-competent TMEM16 structures, there is substantial membrane thinning and groove widening. The open Ca<sup>2+</sup>-bound nhTMEM16 structure (PDB ID 4WIS) was identified as the fastest scrambler in their simulations, with scrambling rates as high as 24.4 ± 5.2 events per μs. This structure also shows significant membrane thinning (up to 18 Å), which supports the hypothesis that groove dilation lowers the energetic barrier for lipid translocation, facilitating scrambling.

      The study also establishes a correlation between structural features and scrambling competence, though analyses often lack statistical robustness and quantitative comparisons. The simulations differentiate between open and closed conformations of TMEM16 structures, with open-groove structures exhibiting increased scrambling activity, while closed-groove structures do not. This finding aligns with previous research suggesting that the structural dynamics of the groove are critical for scrambling. Furthermore, the authors explore how the physical dimensions of the groove qualitatively correlate with observed scrambling rates. For example, TMEM16K induces increased membrane thinning in its open form, suggesting that membrane properties, along with structural features, play a role in modulating scrambling activity.

      Another significant finding is the concept of "out-of-the-groove" scrambling, where lipid translocation occurs outside the protein's groove. This observation introduces the possibility of alternate scrambling mechanisms that do not follow the traditional "credit-card model" of groove-mediated lipid scrambling. In their simulations, the authors note that these out-of-the-groove events predominantly occur at the dimer interface between TM3 and TM10, especially in mammalian TMEM16 structures. While these events were not observed in fungal TMEM16s, they may provide insight into Ca<sup>2+</sup>-independent scrambling mechanisms, as they do not require groove opening.

      Weaknesses:

      A significant challenge of the study is the discrepancy between the scrambling rates observed in CGMD simulations and those reported experimentally. Despite the authors' claim that the rates are in line with experiment, the observed differences can mean large energetic discrepancies in describing scrambling (larger than 1 kT barrier in reality). For instance, the authors report scrambling rates of 10.7 events per μs for TMEM16F and 24.4 events per μs for nhTMEM16, which are several orders of magnitude faster than experimental rates. While the authors suggest that this discrepancy could be due to the Martini 3 force field's faster diffusion dynamics, this explanation does not fully account for the large difference in rates. A more thorough discussion on how the choice of force field and simulation parameters influence the results, and how these discrepancies can be reconciled with experimental data, would strengthen the conclusions. Likewise, rate calculations in the study are based on 10 μs simulations, while experimental scrambling occurs over seconds. This timescale discrepancy limits the study's accuracy, as the simulations may not capture rare or slow scrambling events that are observed experimentally and therefore might underestimate the kinetics of scrambling. It's however important to recognize that it's hard (borderline unachievable) to pinpoint reasonable kinetics for systems like this using the currently available computational power and force field accuracy. The faster diffusion in simulations may lead to overestimated scrambling rates, making the simulation results less comparable to real-world observations. I would therefore read the findings qualitatively rather than quantitatively.

      An interesting observation is the asymmetry observed in the scrambling rates of the two monomers. Since MARTINI is known to be limited in correctly sampling protein dynamics, the authors - in order to preserve the fold - have applied a strong (500 kJ mol<sup>-1</sup> nm<sup>-2</sup>) elastic network. However, I am wondering how the ENM applies across the dimer and if any asymmetry can be noticed in the application of restraints for each monomer and at the dimer interface. How can this have potentially biased the asymmetry in the scrambling rates observed between the monomers? Is this artificially obtained from restraining the initial structure, or is the asymmetry somehow gatekeeping the scrambling mechanism to occur majorly across a single monomer? Answering this question would have far-reaching implications to better describe the mechanism of scrambling.

      The main aim of our computational survey was to directly compare all relevant published TMEM16 structures in both open and closed states using the Martini 3 CGMD force field. Our standardized simulation and analysis protocol allowed us to quantitatively compare scrambling rates across the TMEM16 family, something that has never been done before. We do acknowledge that direct comparison between simulated and experimental scrambling rates is complicated and is best interpreted qualitatively. In line with other reports (e.g., Li et al, PNAS 2024), lipid scrambling in CGMD is 2-3 orders of magnitude faster than typical experimental findings. In the CG simulation field, these increased dynamics due to the smoother energy landscape are a well-known phenomenon. In our view, this is a valuable trade-off for being able to capture statistically robust scrambling dynamics and gain mechanistic understanding in the first place, since these are currently challenging to obtain otherwise. For example, with all-atom MD it would have been near-impossible to conclude that groove openness and high scrambling rates are closely related, simply because one would only measure a handful of scrambling events in (at most) a handful of structures.

      Considering the elastic network: the reviewer is correct in that the elastic network restrains the overall structure to the experimental conformation. This is necessary because the Martini 3 force field does not accurately model changes in secondary (and tertiary) structure. In fact, by retaining the structural information from the experimental structures, we argue that the elastic network helped us arrive at the conclusion that groove openness is the major contributing factor in determining a protein’s scrambling rate. This is best exemplified by the asymmetric X-ray structure of TMEM16K (5OC9), in which the groove of one subunit is more dilated than the other. In our simulation, this information was stored in the elastic network, yielding a 4x higher rate in the open groove than in the closed groove, within the same trajectory.

      Notably, the manuscript does not explore the impact of membrane composition on scrambling rates. While the authors use a specific lipid composition (DOPC) in their simulations, they acknowledge that membrane composition can influence scrambling activity. However, the study does not explore how different lipids or membrane environments or varying membrane curvature and tension, could alter scrambling behaviour. I appreciate that this might have been beyond the scope of this particular paper and the authors plan to further chase these questions, as this work sets a strong protocol for this study. Contextualizing scrambling in the context of membrane composition is particularly relevant since the authors note that TMEM16K's scrambling rate increases tenfold in thinner membranes, suggesting that lipid-specific or membrane-thickness-dependent effects could play a role.

      Considering different membrane compositions: for this study, we chose to keep the membranes as simple as possible. We opted for pure DOPC membranes because DOPC (1) has negligible intrinsic curvature, (2) forms fluid membranes, and (3) was used previously by others (Li et al, PNAS 2024). As mentioned by the reviewer, we believe our current study defines a good standardized protocol and solid baseline for future efforts looking into the additional effects of membrane composition, tension, and curvature that could all affect TMEM16-mediated lipid scrambling.

      Reviewer #3 (Public review):

      Strengths:

      The strength of this study emerges from a comparative analysis of multiple structural starting points and understanding global/local motions of the protein with respect to lipid movement. Although the protein is well-studied, both experimentally and computationally, the understanding of conformational events in different family members, especially membrane thickness less compared to fungal scramblases offers good insights.

      We appreciate the reviewer recognizing the value of the comparative study. In addition to valuable insights from previous experimental and computational work, we hope to put forward a unifying framework that highlights various TMEM16 structural features and membrane properties that underlie scrambling function.

      Weaknesses:

      The weakness of the work is to fully reconcile with experimental evidence of Ca²⁺-independent scrambling rates observed in prior studies, but this part is also challenging using coarse-grain molecular simulations. Previous reports have identified lipid crossing, packing defects, and other associated events, so it is difficult to place this paper in that context. However, the absence of validation leaves certain claims, like alternative scrambling pathways, speculative.

      It is generally difficult to quantitatively compare bulk measurements of scrambling phenomena with simulation results. The advantage of simulations is to directly observe the transient scrambling events at a spatial and temporal resolution that is currently unattainable for experiments. The current experimental evidence for the precise mechanism of Ca²⁺-independent scrambling is still under debate. We therefore hope to leverage the strength of MD and statistical rigor of coarse-grained simulations to generate testable hypotheses for further structural, biochemical, and computational studies.

    1. 3.5 IPC in Shared-Memory Systems Interprocess communication using shared memory requires communicating processes to establish a region of shared memory. Typically, a shared-memory region resides in the address space of the process creating the shared-memory segment. Other processes that wish to communicate using this shared-memory segment must attach it to their address space. Recall that, normally, the operating system tries to prevent one process from accessing another process's memory. Shared memory requires that two or more processes agree to remove this restriction. They can then exchange information by reading and writing data in the shared areas. The form of the data and the location are determined by these processes and are not under the operating system's control. The processes are also responsible for ensuring that they are not writing to the same location simultaneously. To illustrate the concept of cooperating processes, let's consider the producer–consumer problem, which is a common paradigm for cooperating processes. A producer process produces information that is consumed by a consumer process. For example, a compiler may produce assembly code that is consumed by an assembler. The assembler, in turn, may produce object modules that are consumed by the loader. The producer–consumer problem also provides a useful metaphor for the client–server paradigm. We generally think of a server as a producer and a client as a consumer. For example, a web server produces (that is, provides) web content such as HTML files and images, which are consumed (that is, read) by the client web browser requesting the resource. One solution to the producer–consumer problem uses shared memory. To allow producer and consumer processes to run concurrently, we must have available a buffer of items that can be filled by the producer and emptied by the consumer. This buffer will reside in a region of memory that is shared by the producer and consumer processes. 
A producer can produce one item while the consumer is consuming another item. The producer and consumer must be synchronized, so that the consumer does not try to consume an item that has not yet been produced. Two types of buffers can be used. The unbounded buffer places no practical limit on the size of the buffer. The consumer may have to wait for new items, but the producer can always produce new items. The bounded buffer assumes a fixed buffer size. In this case, the consumer must wait if the buffer is empty, and the producer must wait if the buffer is full. Let's look more closely at how the bounded buffer illustrates interprocess communication using shared memory. The following variables reside in a region of memory shared by the producer and consumer processes:
    #define BUFFER_SIZE 10
    typedef struct {
         . . .
    } item;
    item buffer[BUFFER_SIZE];
    int in = 0;
    int out = 0;
The shared buffer is implemented as a circular array with two logical pointers: in and out. The variable in points to the next free position in the buffer; out points to the first full position in the buffer. The buffer is empty when in == out; the buffer is full when ((in + 1) % BUFFER_SIZE) == out. The code for the producer process is shown in Figure 3.12, and the code for the consumer process is shown in Figure 3.13. The producer process has a local variable next_produced in which the new item to be produced is stored. The consumer process has a local variable next_consumed in which the item to be consumed is stored.
    item next_produced;
    while (true) {
         /* produce an item in next_produced */
         while (((in + 1) % BUFFER_SIZE) == out)
              ; /* do nothing */
         buffer[in] = next_produced;
         in = (in + 1) % BUFFER_SIZE;
    }
Figure 3.12 The producer process using shared memory.
    item next_consumed;
    while (true) {
         while (in == out)
              ; /* do nothing */
         next_consumed = buffer[out];
         out = (out + 1) % BUFFER_SIZE;
         /* consume the item in next_consumed */
    }
Figure 3.13 The consumer process using shared memory.
This scheme allows at most BUFFER_SIZE − 1 items in the buffer at the same time. We leave it as an exercise for you to provide a solution in which BUFFER_SIZE items can be in the buffer at the same time. In Section 3.7.1, we illustrate the POSIX API for shared memory. One issue this illustration does not address concerns the situation in which both the producer process and the consumer process attempt to access the shared buffer concurrently. In Chapter 6 and Chapter 7, we discuss how synchronization among cooperating processes can be implemented effectively in a shared-memory environment.

      Interprocess communication (IPC) in shared-memory systems allows processes to communicate by creating a shared-memory region. Normally, operating systems restrict processes from accessing each other’s memory, but shared memory requires processes to agree to lift this restriction. These processes determine data structure and management without operating system intervention. A common example is the producer–consumer problem, where a producer generates data consumed by a consumer. This paradigm extends to client-server models, such as web servers providing content to browsers. Shared memory enables concurrent execution of producers and consumers through a buffer, which can be either unbounded (allowing unlimited production) or bounded (with a fixed size, requiring synchronization). A bounded buffer, implemented as a circular array, uses two pointers, in and out, to manage data flow. The buffer is empty when in == out and full when ((in + 1) % BUFFER_SIZE) == out. The producer adds items to the buffer, while the consumer removes them. However, simultaneous access by both processes can lead to conflicts, requiring synchronization techniques, discussed in later chapters. This model enhances efficiency by minimizing kernel intervention, but careful synchronization is necessary to avoid issues like race conditions and data inconsistency.
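      The empty and full tests described above can be tried out in a single process. The following is only a minimal sketch, not the textbook's code: the helper names try_produce and try_consume and the int payload inside item are assumptions made for illustration, and the busy-wait loops are replaced by a boolean return value so the logic can be exercised without a second process or real shared memory.

```c
#include <stdbool.h>

#define BUFFER_SIZE 10

/* Placeholder payload: the passage elides the contents of item,
   so a single int is assumed here purely for illustration. */
typedef struct { int value; } item;

static item buffer[BUFFER_SIZE];
static int in = 0;   /* index of the next free position */
static int out = 0;  /* index of the first full position */

/* Hypothetical non-blocking producer step.
   Returns false instead of spinning when the buffer is full. */
static bool try_produce(item next_produced) {
    if (((in + 1) % BUFFER_SIZE) == out)
        return false;              /* full: one slot always stays unused */
    buffer[in] = next_produced;
    in = (in + 1) % BUFFER_SIZE;
    return true;
}

/* Hypothetical non-blocking consumer step.
   Returns false instead of spinning when the buffer is empty. */
static bool try_consume(item *next_consumed) {
    if (in == out)
        return false;              /* empty: nothing to read */
    *next_consumed = buffer[out];
    out = (out + 1) % BUFFER_SIZE;
    return true;
}
```

      Because the full test keeps one slot free to tell "full" apart from "empty", ten consecutive calls to try_produce on a fresh buffer succeed nine times, which matches the BUFFER_SIZE − 1 limit stated in the passage. The sketch does nothing about concurrent access; with two real processes, in and out would still need the synchronization the passage defers to later chapters.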

    1. There are many design principles in broad use that are a bit more precise, even though they might not be universally good in all contexts:
       Simple. This is a design aesthetic that prizes minimalism and learnability. These can be good qualities, reducing how much people have to learn to use an interface and how long it takes to learn. But simplicity isn’t always good. Should moderation tools in social media be simple? There’s nothing inherently simple about regulating speech, so they might need to be complicated, to reflect the complexity of preventing hate speech.
       Novel. In some design cultures (e.g., fashion design), the best design is the new design that pushes boundaries and explores undiscovered territories. Novelty is powerful in that it has the power to surprise and empower in new ways. It also has the power to convey status, because possession of new design suggests knowledge and awareness of the bleeding edge of human creativity, which can have status in some cultures. But novelty trades off against simplicity, because simplicity often requires familiarity and convention (Norman, D. A. (1999). Affordance, conventions, and design. ACM interactions).
       Powerful. This aesthetic values the ability of designs to augment human ability. Take, for example, a graphing calculator. These are exceedingly complex little devices with thousands of functions that can support almost any kind of mathematics. It’s certainly not simple or novel, but it’s extremely powerful. But power isn’t always good. Sometimes power leads to complexity that poses barriers to use and adoption. Powerful designs can also amplify harm; for example, powerful saved searches on Twitter enable trolls to quickly find people to harass by keyword. Is that harm worth whatever other positive might come from that power, such as saved time?
       Invisible. Some trends in design aesthetics value designs that “get out of the way”, trying to bring a person as close as possible to their activity, their information, and their goals. Designs that achieve invisibility don’t try to be the center of attention, but rather put the attention on the work that a person is doing with the design. Good examples of designs that attempt to be invisible are the many intelligent assistants such as Siri and Alexa, which try to provide “natural” interfaces that don’t need to be learned, personalized, or calibrated. All of this may come at the expense of power and control, however, as the mechanisms we often use for invisibility are automated.
       Universal. The premise of universal design (Story, M. F. (1998). Maximizing usability: the principles of universal design. Assistive Technology) is that design is something that all of humanity should be able to access, prizing equality over other values. For example, designing a website that is screen readable so that people who are blind can read it often constrains the type of interactivity that can be used on a site. What’s better: power and novelty or universal access? Maybe there are some types of designs that are so powerful, they should only be used by certain people with certain knowledge and skills. Of course, universal designs are rarely universal; all designs exclude somehow.
       Just. The premise of design justice (Costanza-Chock, S. (2020). Design justice: Community-led practices to build the worlds we need. MIT Press) is that the purpose of design should not be to amplify inequities and injustices in the world, but to dismantle them. This might mean that a design that ultimately serves to enrich and empower the wealthy (e.g., Facebook Ads) might be deemed worse than a design that helps dismantle an unjust system (e.g., a social media network for small-business loan networking amongst Black-owned businesses).

      This reading talks about different ways to design things, like making them simple, new, powerful, hidden, fair, or useful for everyone. I agree that powerful designs can be both helpful and harmful, like how saved searches on Twitter can be used for good or bad. It made me think about how designers need to be careful about how their work affects people, not just how easy or exciting it is to use.

    1. In 1982 my initial goal was to reacquaint myself with the people of O Cruzeiro. We had lost contact with each other for many years. Letter writing was complicated by my friends' illiteracy, and after a few years both sides desisted. And if many of my Alto friends were peripatetic rural migrants, I was even more so during the early years of life "in the academy," when my family and I constantly moved back and forth across the country. Nonetheless, prior to my return in 1982 I sent dozens of letters to everyone I could think of ... and received no response. I feared returning to a social void and felt that I might as well begin my research anywhere at all as in Bom Jesus da Mata, for clearly the social world I once knew had evaporated. But curiosity and my saudades, as Brazilians call the pull of nostalgic longings, led me to persist in the plan to return to Bom Jesus. In my letters I had mentioned the approximate date of my arrival in the capital city of Recife but had given no other details. Yet when we stepped off the plane, there in the crowd waving madly to us was my old friend and sometime adversary, Seu Felix, still the reigning prefeito and "boss" of Bom Jesus. "Did I forget to send you a reply?" Felix asked in his usual distracted way. I had indeed come home.

      The emotional and practical challenges of sustaining long-term relationships despite distance and literacy obstacles are highlighted in this chapter. The author's worry about returning to a "social void" is an issue pertinent to migration studies and reflects larger worries about displacement and the loss of previous relationships. Furthermore, Seu Felix's casual comment about forgetting to reply highlights the unpredictable nature of interpersonal relationships: social ties can persist in unexpected ways, even when written correspondence does not. This raises the question of how many cultures manage to stay connected in the face of challenges like migration and illiteracy. In comparable communities, are there non-written means of communication that could take the place of letters?

  3. Jan 2025
    1. The theologian Buber confronted the "suspension of the ethical" in accordance with the will and purpose of something "higher," the Divine;

      This phrase, the "suspension of the ethical," stood out to me because of how often we see the people in the book do this. So far, in just the Introduction, we have seen many different people suspend their ethics for some outside reason. While their reasoning may be different from the 'divine', it's a similar concept. Instead of women mourning the loss of their children, they have learned to expect that most of their kids will die. The way I see it, they are suspending their ethics/morals because that has become the 'norm'. It made me think about how many times we see people suspend their ethics for an outside being even today, whether it be for a 'divine' reason or for a similar reason: that they can act a certain way because it is 'normal'. In my eyes, this is a way for someone to feel better about not acting or feeling a certain way. The reason these mothers, even fathers, aren't upset with how many losses they've endured is because they can blame it on society's expectations. Or, in a religious context as described in the passage, Abraham can blame the divine for sacrificing his son. The reason this resonated so much with me is because I feel the need to connect this with everything going on in the United States: how people are passing certain laws because of the need to please a divine being, or how others are placing blame on certain groups to elevate their status. They're suspending their ethics because of some outside influence. The way I see it, the suspension of ethics is finding a moral scapegoat, and it is a dangerous way of thinking.

    1. These psychological processes have implications for our communication because when we attribute causality to another person’s personality, we tend to have a stronger emotional reaction and tend to assume that this personality characteristic is stable, which may lead us to avoid communication with the person or to react negatively.

      I never thought of it in these terms, but it really makes sense that if we assume someone has done something because of their personality (as opposed to external factors), we react more emotionally. I certainly get more emotional if I think someone is late for our meeting due to internal factors (laziness, not thinking the meeting is important, etc.) than if I think their car might have broken down.

    2. Race, gender, sexual orientation, class, ability, nationality, and age all affect the perceptions that we make. The schemata through which we interpret what we perceive are influenced by our cultural identities. As we are socialized into various cultural identities, we internalize beliefs, attitudes, and values shared by others in our cultural group. Schemata held by members of a cultural identity group have similarities, but schemata held by different cultural groups may vary greatly. Unless we are exposed to various cultural groups and learn how others perceive us and the world around them, we will likely have a narrow or naïve view of the world and assume that others see things the way we do. Exposing yourself to and experiencing cultural differences in perspective doesn’t mean that you have to change your schema to match another cultural group’s. Instead, it may offer you a chance to better understand why and how your schemata were constructed the way they were.

      This is gold, and I think it's one of the most important lessons and attitudes about life. I think I can tell within about 10 minutes of casually chatting with someone if they've been much of a world traveler or not (without overtly asking if a person has travelled). From a lifetime of observation alone, I've found that people who travel extensively for both work and enjoyment tend to be much more open-minded and comfortable about other cultures and values. Seeing how people live in relative poverty or in different communities can certainly help a person appreciate where they themselves come from. Travelling can help a person feel more curious and less threatened by new experiences and ideas that are different from the norms back home. Generally I find that people who never left their home states tend to be fearful about the world, unable to understand different religions and value systems, and very certain that the way things are done 'back home' is the only acceptable way to do things. That's a shame. The world is such a big place, and it's much more interesting when you have an open mind and a willing attitude for exploring and experiencing.

    1. Whenever we try to pierce the meanings of lives very different from our own, we face two interpretive risks. On the one hand, we may be tempted to attribute our own ways of thinking and feeling to "other" mothers. Any suggestion of radically different existential premises (such as those, for example, that guide selective neglect in Northeast Brazil) is rejected out of hand as impossible, unthinkable. To describe some poor women as aiding and abetting the deaths of certain of their infants can only be seen as "victim blaming." But the alternative is to cast women as passive "victims" of their fate, as powerless, without will, agency, or subjectivity. Part of the difficulty lies in the confusion between causality and blame. There must be a way to look dispassionately at the problem of child survival and conclude that a child died from mortal neglect, even at her or his mother's own hands, without also blaming the mother, that is, without holding her personally and morally accountable.

      I think this challenges the way we often judge people from outside their circumstances. For example, when we see a mother making decisions that seem unthinkable to us, it's easy to assume she's cold or uncaring. But could it be that the lack of resources, along with cultural beliefs about life and death, shapes her actions? What if we shifted our focus from blaming individuals to addressing the systems that lead to these difficult choices? It makes me wonder how many of our assumptions about “right” and “wrong” are influenced by our own privilege, and how different the world would look if we looked deeper into the larger forces that continue to shape people’s lives.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      MacDonald et al. investigated the consequence of double knockout of substance P and CGRPα on pain behaviors using a newly created mouse model. The investigators used two methods to confirm knockout of these neuropeptides: traditional immunolabeling and a neat in vitro assay where sensory neurons from either wildtype or double knockout mice are co-cultured with substance P "sniffer cells", HEK cells stably expressing NKR1 (a substance P receptor), GCaMP6s and Gα15. It should be noted that functional assays confirming CGRPα knockout were not performed. Subsequently, the authors assayed double knockout (DKO) mice and wildtype (WT) mice in numerous behavioral assays using different pain models, including acute pain and itch stimuli, intraplantar injection of Complete Freund's Adjuvant, prostaglandin E2, capsaicin, AITC, oxaliplatin, as well as the spared nerve injury model. Surprisingly, the authors found that pain behaviors did not differ between DKO and WT mice in any of the behavioral assays or pain paradigms. Importantly, female and male mice were included in all analyses. These data are important and significant, as both substance P and CGRPα have been implicated in pain signaling, though the magnitude of the effect of a single knockout of either gene has been variable and/or small between studies.

      The conclusions of the study are largely supported by the data; however, additional experimental controls and analyses would strengthen the authors' claims.

      We thank the reviewer for their insightful comments and have answered them below.

      (1) The authors note that single knockout models of either substance P or CGRPα have produced variable effects on pain behaviors that are study-dependent. Therefore, it would have strengthened the study if the authors had included these single knockout strains in a side-by-side analysis (in at least some of the behavioral assays), as has been done in prior studies in the field when using double- or triple-knockout mouse models (for example, see PMID: 33771873). If, in the authors' hands, single knockouts of either peptide also show no significant differences in pain behaviors, then the finding that double knockouts also do not show significant differences would be less surprising.

      In our study, we found no phenotypic differences between WT and DKO mice, suggesting Substance P and CGRPα are largely dispensable for pain behavior. We agree that if we had observed significant changes in behavior, it would have been interesting to examine the effects of knocking out each gene individually to determine which peptide is responsible for the phenotype. However, given the double deletion had no effect, we can predict that loss of each alone would have no or minor effects. In line with this, a more recent study that comprehensively phenotyped the Calca KO mouse found no deficits in a range of danger-related behaviors (PMID: 34376756). Overall, as we are reporting negative data about the double KO, we do not believe extensive studies of the single KOs are necessary to support the findings of our paper.

      (2) It is unclear why the authors only show functional validation of substance P knockout using "sniffer" cells, but not CGRPα. Inclusion of this experiment would have added an additional layer of rigor to the study.

      Imaging of CGRPα release is more challenging using the ‘sniffer’ approach because functional CGRP receptors require the expression of two genes: Calcrl (or Calcr) along with Ramp1. We now have succeeded in generating a new stable cell line expressing Calcrl and Ramp1, along with GCaMPs and human Galpha15 and include new data in the revised Figure 1F-H and Figure Supplement 1B. These cells respond robustly to CGRPalpha, but not to SP. In contrast, the existing SP cell line responds to SP but not CGRPalpha. Capsaicin evokes a strong response in these cells in co-culture with DRGs. This response is dramatically reduced in the DKO. This data therefore confirms our mice have a loss of CGRPalpha signaling as indicated by IHC.

      (3) The authors should be a bit more reserved in the claims made in the manuscript. The main claim of the study is that "CGRPα and substance P are not required for pain transmission." However, the authors also note that neuropeptides can have opposing effects that may produce a net effect of no change. In my view, the data presented show that double knockout of substance P and CGRPα do not affect somatic pain behaviors, but do not preclude a role for either of these molecules in pain signaling more generally. Indeed, the authors also note that these neuropeptides could be involved in nociceptor crosstalk with the immune or vascular systems to promote headache. The authors only assayed pain responses to glabrous skin stimulation. How the DKO mice would behave in orofacial pain assays, migraine assays, visceral pain assays, or bone/joint pain assays, for example, was not tested. I do not suggest the authors include these experiments, only that they address the limitations/weaknesses of their study more thoroughly.

      The reviewer makes an important point that we agree with. Our study assesses acute and chronic pain in peptide DKO mice lacking Substance P and CGRPα. Most of our data focuses on the hindpaw as pain in the paw is the gold-standard approach for phenotyping pain targets and numerous well-validated chronic pain models have been developed for this body site.  However, to extend the conclusions to other tissues, we did also look at visceral pain and GI distress using acetic acid and LiCl models (Figure 2J and Figure 2 supplement). We agree with the reviewer that given the utility of CGRP monoclonal antibodies, migraine experiments would be interesting for future studies using these mice, a point we highlight in the discussion. Bone/joint pain is also clearly important from a translational perspective, but outside the scope of the current study.

      (4) A more minor but important point, the authors do not describe the nature of the WT animals used. Are the littermates or a separately maintained colony of WT animals? The WT strain background should be included in the methods section.

      The WT mice are C57BL/6J from Jackson Lab. This has been added to the methods.

      Reviewer #2 (Public Review):

      Summary:

      The paper aimed to examine the effect of co-ablating Substance P and CGRPα peptides on pain using Tac1 and Calca double knockout (DKO) mice. The authors observed no significant changes in acute, inflammatory, and neuropathic pain. These results suggest that Substance P and CGRPα peptides do not play a major role in mediating pain in mice. Moreover, they reveal that the lack of behavioral phenotype cannot be explained by the redundancy between the two peptides, which are often co-expressed in the same neuron.

      Strengths:

      The paper uses a straightforward approach to address a significant question in the field. The authors confirm the absence of Substance P and CGRPα peptides at the levels of DRG, spinal cord, and midbrain. Subsequently, they employ a comprehensive battery of behavioral tests to examine pain phenotypes, including acute, inflammatory, and neuropathic pain. Additionally, they evaluate neurogenic inflammation by measuring edema and extravasation, revealing no changes in DKO mice. The data are compelling, and the study's conclusions are well-supported by the results. The manuscript is succinct and well-presented.

      We thank the reviewer for their enthusiasm for the importance of our work.

      Reviewer #3 (Public Review):

      In this study, the authors assessed the role of double global knockout of substance P and CGRPα in the transmission of acute and chronic pain. The authors first generated the double knockout (DKO) mice and validated their animal model. This was followed by a series of acute and chronic pain assessments to evaluate whether these neuropeptides are important in modulating acute and chronic pain behaviors. The authors found that, in these DKO mice, Substance P and CGRPα are not required for the transmission of acute and chronic pain, although both neuropeptides are strongly implicated in chronic pain. This study does provide more insight into the role of these neuropeptides in chronic pain processing; however, more work still needs to be done (see the comments below).

      We thank the reviewer for their detailed and constructive feedback, and below outline the steps we have taken to answer their concerns.

      (1) In assessing the double KO (result #1), why are different regions of the brains shown for substance P and CGRPα (for example, midbrain for substance P and amygdala for CGRPα)? Since the authors mentioned that these peptides co-expressed in the brain (as in the introduction), shouldn't the same brain regions be shown for both IHC? It would be ideal if the authors could show both regions (midbrain and amygdala) in addition to the DRG and spinal cord for both peptides in their findings.<br /> In addition, since this is double KO, the authors should show more representative IHC-stained brain regions (spanning from the anterior to posterior).

      We could not co-stain both SP and CGRP in the same sections as the DKO mouse has endogenous GFP and RFP fluorescence, limiting us to one channel (far red). Specifically, we use a Calca KO that is a Cre:GFP knock-in/knockout (Chen et al 2018, PMID: 30344042) and a Tac1 KO that is a tagRFP knock-in/knockout (Wu et al 2018, PMID: 29485996). This is why we show different brain sections.

      (2) It is also unclear as to why the authors only assessed the loss of substance P signaling in the double KO mice. Shouldn't the same be done for CGRPα signaling? Either the authors assess this, or the authors have to provide clear explanations as to why only substance P signaling was assessed.

      As noted in our response to Reviewer 1, imaging of CGRP release is more challenging using the ‘sniffer’ approach because functional CGRP receptors require the expression of two genes: Calcrl (or Calcr) along with Ramp1. We have now generated this cell line and performed the experiment (see revised Figure 1 and Figure 1 Supplement).

      (3) Has the naturalistic behavior of these animals been assessed after the double KO (food intake, sleep, locomotion for example)? I think this is important as changes to these naturalistic behaviors can affect pain processes or outcomes.

      We agree that assessment of naturalistic behavior including food intake, sleep and locomotion would be interesting to look at in DKO mice. However, our study is focused on acute and chronic pain behavior of these animals, and therefore a comprehensive phenotypic assessment of naturalistic home-cage behavior is outside the scope of our study.

      (4) Figure 2H: The authors acknowledge that there is a trend to decrease with capsaicin-evoked coping-like responses. However, a close look at the graph suggests that the lack of significance could be driven by 1 mouse. Have the authors run an outlier test? Alternatively, the authors should consider adding more n to these experiments to verify their conclusions.

      We were reluctant to add more animals searching for significance. Instead, we investigated the potential phenotype further by looking at cFos staining in the cord and found no differences (Figure 2, supplement 1). This result suggests that loss of the two peptides does not grossly disrupt capsaicin-evoked pain signal transmission between the nociceptor and post-synaptic dorsal neurons in the spinal cord.

      (5) Similarly, the values for WT in the evoked cFos activity (Figure 2- Suppl Figure 1) are pretty variable. Considering that the n number is low (n = 5), authors should consider adding more n.<br /> Also, since the n number is low in this experiment (eg. 5 vs 4), does this pass the normality test to run a parametric unpaired t-test? Either the authors increase their n numbers or run the appropriate statistical test.

      As described in the statistical tables, the Shapiro-Wilk test indicates these data do pass the normality test. Therefore, we retain the use of the unpaired t test, which demonstrates no significant difference between the groups.

      (6) In most of the results, authors ran a parametric test despite the low n number. Authors have to ensure that they are carrying out the appropriate statistical test for their dataset and n number.

      We now provide a table of the statistical results, which provides detailed information about all statistical tests performed in this study. For experiments where we make a single comparison between the two distributions (WT vs DKO), we have run a Shapiro-Wilk test. Where the data from both groups pass the normality test, we retain the use of the unpaired t test. Where the Shapiro-Wilk test indicates data from either group are unlikely to be normally distributed, we now use a Mann-Whitney U test to compare the groups, as this non-parametric test makes no assumptions about the underlying distribution.
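      The decision rule described above for single WT vs DKO comparisons can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, the 0.05 threshold for "passing" the Shapiro-Wilk test is an assumption, and the Shapiro-Wilk p values are taken as inputs rather than computed. The second function shows why the Mann-Whitney U statistic needs no distributional assumption: it depends only on the ordering of values between the groups.

```python
def choose_two_group_test(p_normal_wt, p_normal_dko, alpha=0.05):
    """Pick the comparison test from per-group Shapiro-Wilk p values.

    alpha is an assumed threshold; a group "passes" normality if p > alpha.
    """
    both_pass = p_normal_wt > alpha and p_normal_dko > alpha
    return "unpaired t test" if both_pass else "Mann-Whitney U test"


def mann_whitney_u(x, y):
    """U statistic for group x: number of (xi, yj) pairs with xi > yj.

    Ties contribute 0.5. Only the relative order of values matters,
    so no assumption is made about the underlying distribution.
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u
```

      In practice a statistics package would supply the Shapiro-Wilk p values and the p value associated with U; the sketch only makes the branching logic explicit.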

      Many experiments involved two factors (genotype, and e.g. temperature, drug, time-point). These data were analyzed in the original submission using 2-WAY ANOVA or Repeated Measures 2-WAY ANOVA, followed by post-hoc Sidak’s tests to compute p values adjusted for multiple comparisons. Because there is no widely agreed non-parametric alternative to 2-WAY ANOVA that handles two factors and accounts for multiple comparisons, we used 2-WAY ANOVA, as is typical in the field for these kinds of experiments. We reasoned that staying with the 2-WAY ANOVA was the best course of action based on information provided by the statistical software used for this study - https://www.graphpad.com/support/faq/with-two-way-anova-why-doesnt-prism-offer-a-nonparametric-alternative-test-for-normality-test-for-homogeneity-of-variances-test-for-outliers/

      We note that regardless of the test, our conclusion that there are no major changes in acute or chronic pain behaviors is clear and strongly supported.

      (7) Along the same line of comment with the previous, authors should increase the n number for DKO for staining (Figure 4) as n number is only 3 and there is variability in the cFos quantification in the ipsilateral side.

      We believe this is not necessary as the finding is clear that there is no difference.

      (8) Authors should provide references for the statement made in Lines 319-321, where the authors mention that there is accumulating evidence indicating that secretion of these neuropeptides from nociceptor peripheral terminals modulates immune cells and the vasculature in diverse tissues.

      We now provide several references to primary papers and reviews supporting this statement.

      (9) Authors state that the sample size used was similar to those from previous studies, but no references were provided. Also, even though the sample sizes used were similar, I believe that the right statistic test should be used to analyze the data.

      We have now cited several classic studies phenotyping mouse KOs in pain in the methods that used similar sample sizes. As detailed above, we have taken the reviewer’s feedback on board and performed normality testing to ensure the correct statistical test is used for each experiment.

      (10) In the discussion, the authors noted that knocking out of a gene remains the strongest test of whether the molecule is essential for a biological phenomenon. At the same time, it was acknowledged that Substance P infusion into the spinal cord elicits pain, but it is analgesic in the brain. The authors might want to expand more on this discussion, including how we can selectively assess the role of these neuropeptides in areas of interest. For example, knocking out both Substance P and CGRPα in selected areas instead of the global KO since there are reported compensatory effects.

      This is highlighted in the closing paragraph: “Emerging approaches to image and manipulate these molecules (Girven et al., 2022; Kim et al., 2023), as well as advances in quantitating pain behaviors (Bohic et al., 2023; MacDonald and Chesler, 2023), may ultimately reveal the fundamental roles of neuropeptides in generating our experience of pain.” The Kim preprint (now published, and so the citation has been updated in the text) describes a method of inactivating neuropeptide transmission in select brain regions in a cell-type specific manner.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      I do not have any major comments. My minor comments are as follows:

      (1) What was the control group for all behavioral studies? Was it WT from an independent colony or one of the littermates was used for generating controls?

      We used C57BL/6 mice from Jax. This is now mentioned in the methods.

      (2) In Fig. 2H, it seems that the effect will become significant if several mice are added.

      We are reluctant to add mice searching for significance. Sample sizes were determined before we collected the data blind.

      (3) There is no figure 3, but two figures 4.

      Thank you. This has been corrected.

      (4) Multiple typos in the legend for figure 4 (lines 234-254). Line 242 (& n=8 (3M, 3F)), line 243 (swelling and plasma), line 252 ((n=8 for) & n=6 for DKO (4M, 4F)).

      Thank you. This has been corrected.

      (5) In Figure 4 (lines 273-285), the contralateral side is mentioned in B but no images are shown.

      Thank you. We removed the mention.

      (6) Although ligand knockouts cannot be compared directly with receptor inhibition, the readers could benefit from discussing studies of receptor ablation and/or pharmacological inhibition.

      We do discuss the classic studies of receptor KO, and the clinical data on receptor blockers here –

      “However, selective antagonists of the Substance P receptor NKR1 failed to relieve chronic pain in human clinical trials (Hill, 2000). Although CGRP monoclonal antibodies and receptor blockers have proven effective for subsets of migraine patients, their usefulness for other types of pain in humans is unclear (De Matteis et al., 2020; Jin et al., 2018). In line with this, knockout mice deficient in Substance P, CGRPα or their receptors have been reported to display some pain deficits, but the analgesic effects are neither large nor consistent between studies (Cao et al., 1998; De Felipe et al., 1998; Guo et al., 2012; Salmon et al., 2001, 1999; Zimmer et al., 1998).” 

      Reviewer #3 (Recommendations For The Authors):

      Minor comments:

      (1) Figure 1E: What does chambers mean? Additionally, are the 12 chambers equally from the male and female samples (6 from male and 6 from female)?

      We have changed this to well. Each replicate is an individual well from 8 well chamber slide. In all these experiments, the wells are approximately evenly distributed by mouse, because from each mouse we cultured around 8 wells’ worth of DRGs.

      (2) Figure 1D: What does low and high mean in the Hargreaves test?

      These refer to low and high active intensities of the radiant heat stimulus: 40 and 55, respectively, in the intensity units used by the instrument. The values are now described in the methods.

      (3) Figure 2-Suppl Figure 1: Authors should provide a bigger image of the image so that it is clearer to the readers.

      We think the image is of a reasonable size and comparable to the images used elsewhere in the paper.

      (4) Authors should consider labeling their supplementary figures in running numbers or combining supplementary figures together to avoid confusion. For example, Figure 2-Supplementary Figure 1 and Figure 2- Supplementary Figure 2 can be combined as just Supplementary Figure 2.

      We agree with the reviewer this would be clearer, but we have followed eLife’s convention for labelling and numbering supplements.

      (5) Figure 3 is mislabeled as Figure 4.

      Thank you. We have corrected this.

      (6) Only female mice were used in the CFA experiment, which does not go in line with the rest of the results which consist of both sexes.

      We have repeated the experiment with additional male mice. To be consistent with the von Frey data, these were followed for 7 days, and so the figure now shows a 7-day time course.

      (7) Typo in line 243. The word "and" is subscript.

      Thank you. We have corrected this.

      (8) There is a typo in the legend for Figure 4 where E is labeled I, G is labeled as F, and J is labeled as J.

      Thank you. We have corrected this.

      (9) Authors should specify what "several weeks" means (Line 263).

      It means three weeks; we tested to 21 days. We will replace “several” with “three”.

      (10) Authors should specify what "one day" means (Line 267). For example, how many days after the intraplantar oxaliplatin treatment? Also, authors should justify why that specific time point was selected or have a reference for it.

      This means one day (24 hours) after treatment. Please see PMID: 33693512. Two references are provided in the methods.

      (11) Figure 4 legend: authors should again be specific on what "prolonged" entails (Line 277).

      We have replaced “prolonged” with “30 minutes brushing”. Specifically, there were 3 x 10 min stimulation periods, with 1 min rest between periods. This is described in the methods.
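      For readers who want the timing spelled out, the brushing protocol above (3 x 10 min stimulation with 1 min rest between periods) can be laid out as a schedule. This is an illustrative sketch only; the function name is hypothetical and it assumes the first period starts at time zero with no rest after the final period.

```python
def stimulation_schedule(n_periods=3, stim_min=10.0, rest_min=1.0):
    """Return (start, end) times in minutes for each stimulation period."""
    schedule = []
    start = 0.0
    for _ in range(n_periods):
        end = start + stim_min
        schedule.append((start, end))
        start = end + rest_min  # rest only separates consecutive periods
    return schedule


schedule = stimulation_schedule()
total_stim_min = sum(end - start for start, end in schedule)  # brushing time
session_length_min = schedule[-1][1]  # brushing time plus the two rests
```

      Under these assumptions the total brushing time is 30 minutes and the whole session spans 32 minutes.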

      (12) In the methods section, authors state that both male and female mice were used for all experiments. However, only female mice were used in the CFA experiment (see minor comment #6). Authors should verify and correct this.

      This is correct. We only used female mice for one of the groups. We have since repeated with males, now included in the data.

      (13) Authors should be more specific in the methods section on how long the habituation was per day, over how many days, and what the mice were habituated to (experimenter, room, chamber, etc.).

      As noted in the methods, mice are habituated for at least an hour to the chambers, and thus implicitly to the room. We do not perform explicit habituation to the investigator such as repeated handling.

      (14) Authors need to provide more information on the semi-automated procedure they are referring to in Line 397. Also, authors should also provide the criteria for cFos quantification (eg. Intensity, etc). If this has been published before, they should provide the reference.

      We have added this. We used the ‘Find maxima’ and ‘Analyze particles’ functions in FIJI, followed by a manual curation step.

      (15) How much acetone was applied and how was it applied to the paw? (Line 495)

      We used the same applicator (1ml syringe with a well at the top) to generate a droplet of acetone that was used for all mice. This has been added to methods.

      (16) Authors should specify the amount of capsaicin injected in μl (Line 500).

      20 ul. We have added this.

      (17) Authors should explain or reference why they are analyzing the 15 min interval between 5 and 20 minutes after injection (Lines 507-508).

      Acetic acid behaviour lasts around 30 mins in our hands. We chose the 15 minute interval because it reduces burdensome hand scoring time by 50% versus doing the whole 30 mins. We reasoned that in the first 5 mins post injection the animal behaviour may be contaminated by stress related to handling, injection and return to chamber. Thus, the interval between 5 and 20 minutes provided a sensible time-frame for scoring the behavior when it is at its peak.
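      The scoring-window reasoning above can be made concrete with a small sketch. The function names and the example timestamps in the usage note are hypothetical, and treating the window as half-open (5 min inclusive, 20 min exclusive) is an assumption; the response does not specify how boundary events were handled.

```python
def count_events_in_window(event_times_min, start_min=5.0, end_min=20.0):
    """Count scored behaviours whose timestamp falls in [start_min, end_min)."""
    return sum(1 for t in event_times_min if start_min <= t < end_min)


def scoring_time_saved(window_min, full_min):
    """Fraction of hand-scoring time avoided by scoring only the window."""
    return 1.0 - window_min / full_min
```

      For example, `scoring_time_saved(15, 30)` returns 0.5, matching the 50% reduction stated in the response, and `count_events_in_window([1, 5, 12.5, 19.9, 20, 28])` counts only the three events at 5, 12.5 and 19.9 minutes.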

      (18) Authors have to provide more information/explanation on how they decide on the conditioned taste aversion protocol. Like why they do 30 mins exposure to a single water-containing bottle followed 90 mins exposure to both bottles. If this has been published before, they should provide the reference.

      We read dozens of different published protocols in the literature, and piloted one that was something of an amalgam of some of them with various adaptations of convenience. Because it worked on our first attempt, we stuck to it. The advantage of the CTA assay is that it is incredibly robust to changes in the specifics of the paradigm, evincing the clear survival value of learning to avoid tastes that make you sick.

      (19) Authors again should provide more detail in their methods section.

      a. Specify the time frame that they are assessing here (Line 533).

      This can be seen in the Figure. 0 to 120 mins. We have added it to the methods.

      b. How long were the mice allowed to recover post-SNI before mechanical allodynia was assessed (Line 545)?

      This is apparent in the figures. 2 days to 21 days. We have added it to the methods.

      c. How much of the oxaliplatin was injected into the mice?

      40 ug / 40 ul (see PMID:33693512)

      Editor's note: Reviewers agreed that addressing the concerns about power, outliers, and statistics, as well as functional validation of CGRPα, would raise the strength of evidence to compelling, and inclusion of comparison to single KO would raise it to exceptional.

      Should you choose to revise your manuscript, please check to ensure full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05.

    1. Reviewer #3 (Public review):

      Summary:

      Overall, this is a clearly written manuscript with nice hypothesis testing in a non-model organism that addresses the mechanism of Wolbachia-mediated male killing. The authors aim to determine how five previously identified male-killing genes (encoded in the prophage region of the wHm Wolbachia strain) impact the native host, Homona magnanima moths. This work builds on the authors' previous studies in which<br /> (1) they tested the impact of these same wHm genes via heterologous expression in Drosophila melanogaster<br /> (2) also examined the activity of other male-killing genes (e.g., from the wFur Wolbachia strain in its native host: Ostrinia furnacalis moths).

      Advances here include identifying which wHm gene most strongly recapitulates the male-killing phenotype in the native host (rather than in Drosophila), and the finding that the Hm-Oscar protein has the potential for male-killing in a diverse set of lepidopterans, as inferred by the cell-culture assays.

      Strengths:

      Strengths of the manuscript include the reverse genetics approaches to dissect the impact of specific male-killing loci, and use of a "masculinization" assay in Lepidopteran cell lines to determine the impact of interactions between specific masc and oscar homologs.

      Weaknesses:

      It is clear from Figure 1 that the combinations of wmk homologs do not cause male killing on their own here. While I largely agree with the authors' conclusions that oscar is the primary MK factor in this system, I don't think we can yet rule out that wmk(s) may work synergistically or interactively with oscar in vivo. This might be worth a small note in the discussion (e.g., at line 294, "indicating that wmk likely targets factors other than masc": this could be downstream of the impacts of oscar, perhaps dependent on oscar-mediated impacts on masc first).

      Regarding the perceived male-bias in Figure 2a: I think readers might be interpreting "unhatched" as "total before hatching". You could eliminate ambiguity by perhaps splitting the bars into male and female, and then within a bar, coloring by hatched versus unhatched. But this is a minor point, and I think the updated text helps clarify this.

      The new Figure 4b looks to be largely redundant with the oscar information in Figure 1a.

      Updated statistical comparisons for the RNA-seq analysis are helpful. However these analyses are based on single libraries (albeit each a pool of many individuals), so this is still a weaker aspect of the manuscript.

      The new information on masc similarity is useful (Fig 4d) - if the authors could please include a heatmap legend for the colors, that would be helpful. Also, please avoid green and red in the same figure when key for interpretation.

      Figure 1A "helix-turn-helix" is misspelled. ("tern").

    2. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Insects and their relatives are commonly infected with microbes that are transmitted from mothers to their offspring. A number of these microbes have independently evolved the ability to kill the sons of infected females very early in their development; this male killing strategy has evolved because males are transmission dead-ends for the microbe. A major question in the field has been to identify the genes that cause male killing and to understand how they work. This has been especially challenging because most male-killing microbes cannot be genetically manipulated. This study focuses on a male-killing bacterium called Wolbachia. Different Wolbachia strains kill male embryos in beetles, flies, moths, and other arthropods. This is remarkable because how sex is determined differs widely in these hosts. Two Wolbachia genes have been previously implicated in male-killing by Wolbachia: oscar (in moth male-killing) and wmk (in fly male-killing). The genomes of some male-killing Wolbachia contain both of these genes, so it is a challenge to disentangle the two.

      This paper provides strong evidence that oscar is responsible for male-killing in moths. Here, the authors study a strain of Wolbachia that kills males in a pest of tea, Homona magnanima. Overexpressing oscar, but not wmk, kills male moth embryos. This is because oscar interferes with masculinizer, the master gene that controls sex determination in moths and butterflies. Interfering with the masculinizer gene in this way leads the (male) embryo down a path of female development, which causes problems in regulating the expression of genes that are found on the sex chromosomes.

      We would like to thank you for evaluating our manuscript.

      Strengths:

      The authors use a broad number of approaches to implicate oscar, and to dissect its mechanism of male lethality. These approaches include:

      (1) Overexpressing oscar (and wmk) by injecting RNA into moth eggs.

      (2) Determining the sex of embryos by staining female sex chromosomes.

      (3) Determining the consequences of oscar expression by assaying sex-specific splice variants of doublesex, a key sex determination gene, and by quantifying gene expression and dosage of sex chromosomes, using RNASeq.

      (4) Expressing oscar along with masculinizer from various moth and butterfly species, in a silkmoth cell line.

      This extends recently published studies implicating oscar in male-killing by Wolbachia in Ostrinia corn borer moths, although the Homona and Ostrinia oscar proteins are quite divergent. Combined with other studies, there is now broad support for oscar as the male-killing gene in moths and butterflies (i.e. order Lepidoptera). So an outstanding question is to understand the role of wmk. Is it the master male-killing gene in insects other than Lepidoptera and if so, how does it operate?

      Thank you for your comments. Wolbachia strains often carry wmk genes, but as observed in this study, the homologs in Homona showed no apparent MK ability. These showed strong male lethality in D. melanogaster, but it is still unclear whether the genes are the master male-killing gene in Diptera. It is also possible that the genes show toxicities in other lepidopteran insects as well as in other insect taxa. Further functional validation assays in different insects are warranted to clarify whether wmk shows toxicity in different insect taxa. We have also discussed the functions of wmk in the Discussion section (lines 301-306).

      Weaknesses:

      I found the transfection assays of oscar and masculinizer in the silkworm cell line (Figure 4) to be difficult to follow. There are also places in the text where more explanation would be helpful for non-experts (see recommendations).

      Thank you for your suggestion. We have thoroughly revised the manuscript to address all the questions, comments and suggestions you raised in “recommendations”. In particular, we have revised the section on the transfection assays of Oscar and Masc in BmN-4 cells (result section “Hm-oscar suppresses the masculinizing functions of lepidopteran masc genes” starts on line 214 and Fig. 4; materials and methods section “Transfection assays and quantification of BmIMP<sup>M</sup>”, starts on line 483). We have also provided more detailed explanations for non-experts in some contexts (in response to your recommendation). We believe that the resulting revisions have significantly improved the quality and comprehensiveness of our manuscript.

      Reviewer #2 (Public review):

      Summary:

      Wolbachia are maternally transmitted bacteria that can manipulate host reproduction in various ways. Some Wolbachia induce male killing (MK), where the sons of infected mothers are killed during development. Several MK-associated genes have been identified in Homona magnanima, including Hm-oscar and wmk-1-4, but the mechanistic links between these Wolbachia genes and MK in the native host are still unclear.

      In this manuscript, Arai et al. show that Hm-oscar is the gene responsible for Wolbachia-induced MK in Homona magnanima. They provide evidence that Hm-Oscar functions through interactions with the sex determination system. They also found that Hm-Oscar disrupts sex determination in male embryos by inducing female-type dsx splicing and impairing dosage compensation. Additionally, Hm-Oscar suppresses the function of Masc. The manuscript is well-written and presents intriguing findings. The results support their conclusions regarding the diversity and commonality of MK mechanisms, contributing to our understanding of the mechanisms and evolutionary aspects of Wolbachia-induced MK.

      We would like to thank you for evaluating our manuscript.

      Strengths/weaknesses:

      (1) The authors found that transient overexpression of Hm-oscar, but not wmk-1-4, in Wolbachia-free H. magnanima embryos induces female-biased sex ratios. These results are striking and mirror the phenotype of the wHm-t infected line (WT12). However, Table 1 lists the "male ratio," while the text presents the "female ratio" with standard deviation. For consistency, the calculation term should be uniform, and the "ratio" should be listed for each replicate.

      We have revised the first results section (Hm-oscar induces female-biased sex ratios, starting from line 147) accordingly to maintain the consistency in the calculation term. In the revised manuscript, the 'male ratio' is now consistently used, in alignment with Fig. 1. In addition, we have included all sex ratio information (number of males and females) in the supplementary data file for transparency and clarity.
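      To make the reporting convention explicit, the “male ratio” used throughout can be computed per replicate from the male and female counts. This is a minimal sketch assuming the ratio is defined as males divided by all sexed individuals; the function name is hypothetical and no values from the manuscript are reproduced here.

```python
def male_ratio(n_males, n_females):
    """Proportion of males among sexed individuals in one replicate."""
    total = n_males + n_females
    if total == 0:
        raise ValueError("no sexed individuals in this replicate")
    return n_males / total
```

      Under this definition the female ratio is simply `1 - male_ratio(...)`, so reporting one term consistently loses no information.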

      (2) The error bars in Figure 3 are quite large, and the figure lacks statistical significance labels. The authors should perform statistical analysis to demonstrate that Hm-oscar-overexpressed male embryos have higher levels of Z-linked gene expression.

      The large error bars on each chromosome (Fig. 3a-d) likely reflect the overall variation in expression levels across different transcripts. Accordingly, we have included statistical data for Figure 3 based on the Steel-Dwass test for expression levels. However, displaying statistical significance directly on the whisker plots would make the figure too cluttered due to the numerous combinations. Instead, we have provided all the statistical data in the supplementary data file. To further support the claim that Z-linked genes are more highly expressed in wHm-t-infected/Hm-Oscar-injected embryos, we have included the expression data for a Z-linked gene tpi, along with its statistical data in the revised manuscript (Fig. 3e, lines 210-212).

      (3) The authors demonstrated that Hm-Oscar suppresses the masculinizing functions of lepidopteran Masc in BmN-4 cells derived from the female ovaries of Bombyx mori. They should clarify why this cell line was chosen and its biological relevance. Additionally, they should explain the rationale for evaluating the expression levels of the male-specific BmIMP variant and whether it is equivalent to dsx.

      Thank you for your suggestion. We selected BmN-4 cell line because previous studies have established it as a reliable model for investigating the functions of lepidopteran masc genes and the interactions between masc and Oscar genes (Katsuma et al., 2019; 2022). In addition, BmIMP<sup>M</sup> is a male-specific regulator of the male-type dsx, making it an ideal target for assessing the 'maleness' induced by transfection of the masc gene in female-derived BmN-4 cells (Suzuki et al., 2010; Katsuma et al., 2015). We have included more detailed background information in the revised manuscript and have thoroughly revised this section (Hm-oscar suppresses the masculinizing functions of lepidopteran masc genes, starting at line 214) and Figure 4 for better clarity.

      (4) Although the authors show that Hm-oscar is involved in Wolbachia-induced MK in Homona magnanima and interacts with the sex determination system in lepidopteran insects, the precise molecular mechanism of Hm-oscar-induced MK remains unclear. Further studies are needed to elucidate how Hm-oscar regulates Homona magnanima genes to induce MK, though this may be beyond the scope of the current manuscript.

      Based on our findings and previous studies in Homona, Ostrinia and Bombyx (Arai et al., 2023a; Katsuma et al., 2023; Kiuchi et al., 2014), we hypothesize that the molecular mechanisms underlying wHm-induced MK are likely linked to impaired dosage compensation caused by the inhibition of Masc function by the Hm-Oscar protein. While the precise mechanisms remain unclear, unbalanced Z-linked gene expression due to the impaired dosage compensation (i.e., 2-fold higher Z-linked gene expression compared to normal males) is known to be lethal for lepidopteran males (Kiuchi et al., 2014; Fukui et al., 2015; Visser et al., 2021). We have outlined this hypothesis in the Discussion section (lines 245-254).
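      The “2-fold higher Z-linked gene expression” described above can be illustrated with a simple calculation. This sketch is not the authors' RNA-seq pipeline: normalizing median Z-linked expression to median autosomal expression is an assumed approach, the function names are hypothetical, and the expression values in the usage lines are invented toy numbers chosen only to show the arithmetic.

```python
import math
from statistics import median


def z_to_autosome_ratio(z_expr, autosome_expr):
    """Median Z-linked expression relative to median autosomal expression."""
    return median(z_expr) / median(autosome_expr)


def log2_fold_vs_control(treated_ratio, control_ratio):
    """log2 of the treated Z:A ratio relative to control males."""
    return math.log2(treated_ratio / control_ratio)


# Toy values (hypothetical, not data from the study).
control = z_to_autosome_ratio([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
treated = z_to_autosome_ratio([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
shift = log2_fold_vs_control(treated, control)
```

      A log2 shift of 1 corresponds to the 2-fold excess of Z-linked expression expected when dosage compensation fails in a ZZ male.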

      Reviewer #3 (Public review):

      Summary:

      Overall, this is a clearly written manuscript with nice hypothesis testing in a non-model organism that addresses the mechanism of Wolbachia-mediated male killing. The authors aim to determine how five previously identified male-killing genes (encoded in the prophage region of the wHm Wolbachia strain) impact the native host, Homona magnanima moths. This work builds on the authors' previous studies in which:

      (1) They tested the impact of these same wHm genes via heterologous expression in Drosophila melanogaster.

      (2) They examined the activity of other male-killing genes (e.g., from the wFur Wolbachia strain in its native host: Ostrinia furnacalis moths).

      Advances here include identifying which wHm gene most strongly recapitulates the male-killing phenotype in the native host (rather than in Drosophila), and the finding that the Hm-Oscar protein has the potential for male-killing in a diverse set of lepidopterans, as inferred by the cell-culture assays.

      Strengths:

      Strengths of the manuscript include the reverse genetics approaches to dissect the impact of specific male-killing loci, and the use of a "masculinization" assay in Lepidopteran cell lines to determine the impact of interactions between specific masc and oscar homologs.

      We would like to thank you for evaluating our manuscript.

      Weaknesses:

      My major comments are related to the lack of statistics for several experiments (and the data normalization process), and opportunities to make the manuscript more broadly accessible.

      Thank you for your suggestions. We have thoroughly revised the manuscript to provide clearer explanations for non-experts. In addition, we have included more detailed statistical data for Figure 3 and Figure 4 based on the Steel-Dwass tests. For Figure 3a-d, displaying statistical significance directly on the whisker plots would make the figure too cluttered due to the numerous combinations. Therefore, we have provided all the statistical data in the supplementary data file. To further support the claim that Z-linked genes are more highly expressed in wHm-t-infected/Hm-Oscar-injected embryos, we have included the expression data for a Z-linked gene tpi, along with its statistical data in the revised manuscript (Fig. 3e, lines 210-212). Regarding Figure 4, we have revised the Figure based on the reviewer’s suggestions, and provided more detailed information on how the expression data were analyzed (Transfection assays and quantification of BmIMP<sup>M</sup>, lines 495-520). We have also included more detailed background information on the assay system (Hm-oscar suppresses the masculinizing functions of lepidopteran masc genes, lines 215-237). Although we did not observe statistical significance based on the Steel-Dwass test, likely due to limited replicates, the observed changes in the IMP gene expression remain clearly evident.

      The manuscript I think would be much improved by providing more details regarding some of the genes and cross-lineage comparisons. I know some of this is reported in previous publications, but some summary and/or additional analysis would make this current manuscript much more approachable for a broader audience, and help guide readers to specific important findings. For example, a graphic and/or more detail on how the wmk/oscar homologs (within and across Wolbachia strains) differ (e.g., domains, percent divergence, etc) would be helpful for contextualizing some of the results. I recognize the authors discuss this in parts (e.g., lines 223-227), but it does require some bouncing between sections to follow. Similarly, the experiments presented in Figure 4 indicate that Hm-oscar has broad spectrum activity: how similar are the masc proteins from these various lepidopterans? Are they highly conserved? Rapidly evolving? Do the patterns of masc protein evolution provide any hints at how Oscar might be interacting with masc?

      Thank you for your valuable suggestion. To address this, we have included a visualization of the structural differences between the Oscar and wmk homologs in Figure 1a of the revised manuscript. In addition, we have included more detailed information for these genes and revised the introduction (lines 110-114; 124-137) and discussion (lines 255-266) to provide a clearer and more comprehensive overview. We have also described the similarity of the Masc proteins and Oscar proteins that we used, which is now reflected in the revised Figure 4b and 4d. More detailed information on these proteins is available in the supplementary data. Notably, Masc proteins exhibit high sequence variability with conserved domains (Figure 4d). A previous study identified the N-terminal region of Masc as crucial for the Oscar function (Katsuma et al., 2022). The wide spectrum of the actions of Hm-Oscar likely stems from these conserved structures of Masc, but the effects might have undergone evolutionary tuning through interactions with the native host, as discussed in lines 293-294.

      It is clear from Figure 1 that the combinations of wmk homologs do not cause male killing on their own. Did the authors test if any of the wmk homologs impact the MK phenotype of oscar? It looks like a previous study tested this in wFur (noted in lines 250-252), but given that the authors also highlight the differences between the wFur-oscar and Hm-oscar proteins, this may be worth testing in this system. Related to this, what is the explanation for why there would be 4 copies of wmk in Hm?

      Thank you for your valuable suggestion. Unfortunately, we have not yet tested the effects of co-expression of wmk and Oscar. Due to a technical issue, mixing multiple constructs results in a reduced amount of mRNA (i.e., mixing wmk-3 and Hm-Oscar at the same concentration results in a 2-fold lower mRNA concentration for both genes compared to mono-injected groups). In addition, we have previously tested injecting mRNA at a twofold higher concentration (i.e., 2 µg/µl mRNA), which resulted in very low hatchability regardless of the genes. Katsuma et al. (2022) tested the effect of wmk on the sex determination system, but did not test the effect of co-injection/transfection of wmk and Oscar. Considering the results of this and previous studies (Katsuma et al., 2022; Arai et al., 2023), it is likely that the targets of the wmk and oscar genes are different (as discussed in lines 267-289). Co-injection of wmk and oscar may not produce additive effects. Nevertheless, we would like to test the results in future studies using the Drosophila system as well.

      As you point out, it is interesting that the moth-derived MK Wolbachia wHm-t encodes four wmk genes, although they have no apparent effect on host survival. The exact functional relevance of these wmk homologs remains unclear. However, they may play a role in Wolbachia biology as transcriptional regulators, given that they encode HTH domains. Wolbachia generally encode several wmk homologs in their genome, regardless of whether they induce MK. This suggests that the functions of the wmk genes may be 'suppressed' in certain Wolbachia-host systems. The wmk and Hm-oscar genes are located within a prophage region, and some wmk genes are tandemly arrayed (as described in Arai et al., 2023). These wmk homologs may have increased in number by horizontal phage transfer, and the region containing wmk and adjacent sequences may act as a genomic island for virulence. So far, the function of wmk homologs has only been tested in D. melanogaster and H. magnanima, and further studies in other Wolbachia-host systems are highly warranted to test whether wmk exerts MK effects in other insect models. These points have been briefly discussed in the revised manuscript (lines 301-306; 318-320).

      Why are some of the broods male-biased (2/3) rather than ~50:50? (Lines 170-175, Figure 2a). For example, there is a strong male bias in un-hatched oscar-injected and naturally infected embryos, whereas the control uninfected embryos have normal 50:50 sex ratios. It is difficult to interpret the rate of male-killing given that the sex ratios of different sets of zygotes are quite variable.

      The observed male-biased sex ratios in unhatched embryos are due to the occurrence of MK during embryogenesis. In the unhatched groups, the skew towards males reflects the fact that the male embryos were targeted and killed by Wolbachia/Oscar, leading to a surplus of unhatched male embryos. Conversely, hatched individuals show a higher proportion of females because many of the males were eliminated during embryogenesis. Thus, the unhatched embryos are more male-biased, while the hatched individuals are more female-biased in the Hm-oscar/wHm-t-treated groups. We have revised the relevant section (Males are killed mainly at the embryonic stage, lines 179-186) and provided more detailed information to clarify this explanation.

      Figure 2b - it appears there are both male and female bands in the HmOsc male lane. I think this makes sense (likely a partial phenotype due to the nature of the overexpression approach), but this is worth highlighting, especially in the context of trying to understand how much of the MK phenotype might be recapitulated through these methods. Related, there is no negative control for this PCR.

      Thank you for your suggestion. As you noted, a faint dsx-M band is visible in the Hm-oscar-treated group in Figure 2b. This is consistent with previous findings by Arai et al. (2023), which reported that male embryos with low-density wHm-t showed double bands of dsx-M and dsx-F, similar to what we observed in this study. This information has been included in the revised manuscript in lines 196-198, as follows:

      “Notably, male embryos expressing Hm-oscar also exhibited weak male-type dsx splicing in addition to the female-type splicing, resembling the previously observed pattern in male embryos infected with low-titer wHm-t (Arai et al., 2023a).”

      We also appreciate your comment regarding the missing negative control. We realized that the negative control lane had been lost during the preparation of the figure, and the figure has now been revised to restore it. We have also included the relevant molecular marker information in both the figure legends and Figure 2b.

      It appears the RNA-seq analysis (Figure 3) is based on a single biological replicate for each condition. And, there are no statistical comparisons that support the conclusions of a shift in dosage compensation. Finally, it is unclear what exactly is new data here: the authors note "The expression data of the wHm-t-infected and non-infected groups were also calculated based on the transcriptome data included in Arai et al. (2023a)" - So, are the data in Figure 3c and 3d a re-print of previous data? The level of dosage compensation inferred by visually comparing the control conditions in 3b and 3d does not appear consistent. With only one biological replicate library per condition, what looks like a re-print of previous data, and no statistical comparisons, this is a weakly supported conclusion.

      Thank you for your suggestion. In this study, we generated the RNA-seq data for the Hm-oscar/GFP-injected groups, but did not sequence the wHm-t-infected/NSR lines. Instead, the previously generated RNA-seq data of wHm-t-infected/NSR (Arai et al., 2023) were re-analyzed (rather than simply reprinted) to evaluate whether the expression patterns of Hm-oscar-injected and wHm-t-infected groups are similar. We have revised the Results section (Hm-oscar impairs dosage compensation in male embryos, lines 200-212), the Materials and methods section (Quantification of Z chromosome-linked genes, lines 454-456), and the figure legends to provide more precise information about this analysis.

      Although we did not perform replicates for the RNA-seq comparisons, it is important to note that each RNA-seq sample contains 50-60 male/female individuals. We believe the results are still robust and clearly indicative of the trends we observe. This was further supported by the quantification of Hmtpi gene expression, which we have visualized in Figure 3e (and lines 210-212). As you noted, the expression patterns in Figure 3b (GFP injected) and Figure 3d (NSR) are not completely identical. This discrepancy may be due to the differences between injection treatments and natural infections. Nevertheless, both treatments are consistent in showing that gene expressions on the Z chromosome (Chr01 and Chr15) are not upregulated.

      We have also added more detailed statistical data for Figure 3 based on the Steel-Dwass tests. For Figure 3a-d, however, showing the statistical significance directly on the whisker plots would create excessive clutter due to the numerous combinations of chromosomes. Instead, we have provided the full statistical data in the supplementary data file. Furthermore, to strengthen our conclusion that Z-linked genes are highly expressed in wHm-t-infected/Hm-Oscar-injected embryos, we have included expression data for the Z-linked gene tpi, along with statistical data, in the revised manuscript (Fig. 3e, lines 210-212).

      In Figure 4: There are no statistics to support the conclusions presented here. Additionally, the data have gone through a normalization process, but it is difficult to follow exactly how this was done. The control conditions appear to always be normalized to 100 ("The expression levels of BmImpM in the Masc and Hm-Oscar/Oscar co-transfected cells were normalized by setting each Masc-transfected cell as 100"). I see two problems with this approach:

      (1) This has eliminated all of the natural variation in BmImpM expression, which is likely not always identical across cells/replicates.

      (2) How then was the percentage of BmImpM calculated for each of the experimental conditions? Was each replicate sample arbitrarily paired with a control sample? This can lead to very different outcomes depending on which samples are paired with each other. The most appropriate way to calculate the change between experimental and control would be to take the difference between every single sample (6 total, 3 control, 3 experimental) and the mean of the control group. The mean of the control can then be set at 100 as the authors like, but this also maintains the variability in the dataset and then eliminates the issue of arbitrary pairings. This approach would also then facilitate statistical comparisons which is currently missing.

      Thank you for your suggestion. As you pointed out in (1), the previous analysis did indeed eliminate the natural variation in BmIMP-M expression. In the revised manuscript and Figure 4, we have reanalyzed the data following your suggestion and have described the variation across replicates.

      For (2), the data shown in the previous manuscript were normalized to 100 for each Masc-treated group. In doing so, each replicate sample was arbitrarily paired with a control sample from the same cell lot to account for variations that might occur due to differences in cell lots. However, following your recommendation, we have revised the figure to set the average of the Hm-masc treated group to 100, rather than using arbitrary pairings. More detailed normalization procedures have been provided in the section 'Transfection assays and quantification of BmIMP' (lines 483-520). Additionally, we have provided more detailed background information on the assay system in lines 218-223. Although we did not observe statistical significance based on the Steel-Dwass test, likely due to the limited number of replicates, the differences in IMP gene expression between the Masc-treated and Masc&Hm-oscar-treated groups remain evident.
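
The normalization recommended by the reviewer can be illustrated with a minimal Python sketch. The values below are made up, and the function and variable names are hypothetical rather than taken from the manuscript; the point is only that scaling every replicate by the control mean sets that mean to 100 while keeping the spread within each group.

```python
from statistics import mean

def normalize_to_control_mean(control, experimental):
    """Scale all replicates so that the control mean equals 100."""
    scale = 100.0 / mean(control)
    return [v * scale for v in control], [v * scale for v in experimental]

# Made-up relative expression values, three replicates per group.
masc_only = [1.10, 0.95, 1.25]
masc_plus_oscar = [0.30, 0.42, 0.25]

ctrl_pct, exp_pct = normalize_to_control_mean(masc_only, masc_plus_oscar)
print([round(v, 1) for v in ctrl_pct])  # control replicates still vary around 100
print([round(v, 1) for v in exp_pct])   # experimental replicates on the same scale
```

Because no replicate is paired with a particular control sample, the outcome does not depend on an arbitrary pairing, and both groups retain the variance needed for a statistical comparison.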

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Line 38: change to: 'Wolbachia are maternally transmitted'.

      Revised accordingly (line 38).

      Line 69: remove 'seemingly'.

      Revised accordingly (line 69).

      Paragraph starting line 123: I don't think this is so clear to a reader who is not familiar with the work and system. It would be helpful to more clearly explain that candidate male-killing genes from Wolbachia that infect Homona were inserted into Drosophila melanogaster, and that their expression was then induced, with interesting patterns (and that it can be a bit difficult to interpret the transgenic expression of genes from a moth male-killer that are inserted into a fly). Also, the sentence about the combined action of cifA and cifB in Drosophila cytoplasmic incompatibility is also confusing to a non-expert. I would suggest removing it.

      Thank you for your suggestion. We have revised the paragraph (lines 124-139) to provide clearer background information, making it easier for non-experts to follow. We have also removed the sentence regarding the combined effect of cifA and cifB to improve the flow and overall clarity.

      Line 170: what is the explanation for the male-biased sex ratio instead of 50-50?

      The male-biased sex ratio occurs because MK happens during embryogenesis. Unhatched embryos include males that were killed by Wolbachia/Oscar, resulting in a higher proportion of unhatched male embryos. Conversely, the hatched individuals display a female bias, as most of the males were eliminated during embryogenesis. Thus, the unhatched embryos are more male-biased, while the hatched individuals are more female-biased in the Hm-oscar/wHm-t-treated groups. We have revised the section “Males are killed mainly at the embryonic stage” (lines 170-186) to include more detailed information explaining this phenomenon.

      Line 190: please explain what are the Z chromosomes in Bombyx and Homona and Lepidoptera in general (chromosomes 1 and 15?), as this is not so clear for a non-expert.

      Thank you for your suggestion. We have revised the section (lines 200-212) to include more precise background information about the chromosome constitutions in lines 202-204, as follows:

      “Unlike other lepidopteran species, Tortricidae, including H. magnanima, generally possess a large Z chromosome that is homologous to B. mori chromosomes 1 (Z) and 15 (autosome).”

      Line 222: please explain oscar diversity and classification in more detail, as this is not so clear for a non-expert.

      Thank you for your suggestion. We have revised the sentences to provide clearer background information on the diversity of oscar genes (lines 255-264).

      Figure 4: I found this difficult to follow. Why are there 2 rows (HmOscar and Oscar)? Does oscar here refer to oscar from Ostrinia? I am also a bit confused about the baseline control of Masc in these cell lines. If I understand Lepidoptera sex determination, then these cell lines are expressing high levels of female-specific piRNAs that suppress Masc. How specific are these piRNAs (i.e. do Bombyx piRNAs suppress Mascs from other Lepidoptera)? How much extra Masc will override endogenous piRNA? Information is lost by setting Masc expression to 100% in each separate comparison.

      Yes, Oscar here refers to the wFur-encoded oscar (Oscar from Ostrinia), which was tested to compare its function with that of the Homona-derived Hm-oscar gene. In addition, following the reviewer's suggestions, we have revised the figure and included more detailed information on how we adjusted the expression levels in the M&M section.

      A previous study (Shoji et al., 2017, RNA 23:86–97) demonstrated that the Fem piRNA (29 bp) in Bombyx mori requires a 17 bp complementary sequence from its 5' region for its function. However, in species other than B. mori, no significant homology (i.e., over 17 bp matches) was found between the B. mori Fem piRNA and the masc genes analyzed in this study. Therefore, it is likely that the Fem piRNA expressed in BmN-4 cells is unable to suppress the masculinizing function driven by masc genes in other lepidopteran species. In addition, we did not quantify the levels of piRNA in this system, but the expression levels of masc are probably too high to be suppressed.
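
As a rough illustration of the criterion described above (a target must carry a stretch complementary to the first 17 nt of the piRNA), the check can be sketched in Python. The sequences here are toy strings, not the real Fem piRNA or any masc gene, and the helper names are hypothetical; the actual homology search used for the manuscript is not specified in this response.

```python
# Complement table for RNA; DNA input is converted by replacing T with U.
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def to_rna(seq):
    return seq.upper().replace("T", "U")

def reverse_complement(seq):
    return to_rna(seq).translate(COMPLEMENT)[::-1]

def has_seed_match(pirna, target, seed_len=17):
    """True if target contains a site perfectly complementary to the
    first seed_len nucleotides (5' region) of the piRNA."""
    site = reverse_complement(to_rna(pirna)[:seed_len])
    return site in to_rna(target)

# Toy 29-nt piRNA and two toy targets (illustrative only).
toy_pirna = "AUGGCUAGCUAGGAUCCGAUCGAUUAGCA"
target_with_site = "AAA" + reverse_complement(toy_pirna[:17]) + "AAA"
target_without_site = "A" * 40

print(has_seed_match(toy_pirna, target_with_site))
print(has_seed_match(toy_pirna, target_without_site))
```

A perfect-match search like this is stricter than a real targeting rule, which may tolerate mismatches, so it should be read only as a first-pass screen.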

      Figure 4 legend: spelling of Spodoptera.

      Revised accordingly.

      Reviewer #2 (Recommendations for the authors):

      In Figure 2, what is the dsx splicing type for the hatched male in the Hm-oscar-injected group and the wHm-t infected line? Dsx-F or dsx-M?

      Thank you for your suggestion. Unfortunately, we have not tested splicing in the hatched male neonates (1st instar larvae), partly due to difficulties in obtaining sufficient material for RNA extraction. Based on the previous publication in the Ostrinia system, where Oscar-bearing wSca induces MK, the hatched males (ZZ) exhibit female-type dsx as observed in the male embryos (Herran et al., 2022). The hatched Homona males may show double bands for dsx-M and dsx-F as observed in this study.

      The size of the markers (in kilobase pairs) should be indicated in Figure 2.

      We have accordingly included the marker information in the revised Figure 2b and the figure legends.

      In Figure 3, could the authors identify which genes exhibit higher expression levels in the Hm-oscar-injected group and the wHm-t infected line? Could they provide hints for the possible mechanism of male-killing?

      In the RNA-seq data shown in Figure 3a-d, we observed that both the Hm-oscar-injected and wHm-infected groups generally exhibited upregulated expression of Z-linked genes. Rather than the upregulation or downregulation of a specific gene, we consider that global upregulation of Z-linked genes, caused by improper dosage compensation, is lethal for males. The Z chromosome contains various genes involved in key biological processes such as endocrine function and detoxification, and disruption of these processes may contribute to male lethality. Additionally, in this revised manuscript, we have provided more detailed information on the expression level of the Z-linked gene tpi. We have also discussed the potential mechanisms of MK in the Discussion section (lines 245-254).

      The format of the references should be consistent. Gene and species names should be italicized.

      We have formatted the references, gene names, and species names accordingly.

      Reviewer #3 (Recommendations for the authors):

      The authors use the term "upstream" (e.g., Oscar suppressed the function of masculinizer, the upstream male sex determinant...), which was sometimes confusing. In many cases, it reads as though the masculinizer was upstream of oscar, but what I think the authors are trying to convey is that masculinizer is a primary sex-determining factor.

      Thank you for your suggestion. We have accordingly revised the term.

      Line 101: which insect is wFur from?

      It is from Ostrinia furnacalis - line 104 has been revised.

      Figure 1: it would be helpful to indicate the statistical results on the figure.

      Accordingly, we have added statistical data (binomial test) for Figure 1. The data for the Steel-Dwass test have been included in the supplementary data.
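
For readers unfamiliar with the test, an exact two-sided binomial test of a sex ratio against 50:50 can be sketched in a few lines of Python. The counts below are invented for illustration, the function name is hypothetical, and this is not necessarily the software or the exact variant used for Figure 1.

```python
from math import comb

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: sum the probabilities of all
    outcomes that are no more likely than the observed count k."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= cutoff))

# Invented example: 12 males among 60 adults versus an expected 1:1 ratio.
print(binomial_two_sided_p(12, 60))
```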

      Figure 2b: please label the ladder on the gel.

      Thank you for your suggestion. We have accordingly labeled the DNA ladder on the gel.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This study by Wu et al. provides valuable computational insights into PROTAC-related protein complexes, focusing on linker roles, protein-protein interaction stability, and lysine residue accessibility. The findings are significant for PROTAC development in cancer treatment, particularly breast and prostate cancers.

      The authors' claims about the role of PROTAC linkers and protein-protein interaction stability are generally supported by their computational data. However, the conclusions regarding lysine accessibility could be strengthened with more in-depth analysis. The use of the term "protein functional dynamics" is not fully justified by the presented work, which focuses primarily on structural dynamics rather than functional aspects.

      Strengths:

      (1) Comprehensive computational analysis of PROTAC-related protein complexes.

      (2) Focus on critical aspects: linker role, protein-protein interaction stability, and lysine accessibility.

      Weaknesses:

      (1) Limited examination of lysine accessibility despite its stated importance.

      (2) Use of RMSD as the primary metric for conformational assessment, which may overlook important local structural changes.

      Reviewer #1 (Recommendations for the authors):

      (1) The authors' claims about the role of PROTAC linkers and protein-protein interaction stability are generally supported by their computational data. However, the conclusions regarding lysine accessibility could be strengthened with more in-depth analysis. Expand the analysis of lysine accessibility, potentially correlating it with other structural features such as linker length.

      We thank the reviewer for the suggestions! We performed a time-dependent correlation analysis to correlate the dihedral angles of the PROTACs with the Lys-Gly distance (Figures 6 and S17). We included a detailed explanation on page 16:

      “To further examine the correlation between PROTAC rotation and the Lys-Gly interaction, we performed a time-dependent correlation analysis. This analysis showed that PROTAC rotation translates motion over time, leading to the Lys-Gly interaction, with a correlation peak around 60-85 ns, marking the time of the interaction (Figure 6 and Figure S17). In addition, the pseudo dihedral angles also showed a high correlation (0.85 in the case of dBET1) with Lys-Gly distance. This indicated that degradation complex undergoes structural rearrangement and drives the Lys-Gly interaction.”
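
One generic way to look for a delayed relationship between two time series, such as a dihedral angle and a distance, is a lagged Pearson correlation. The Python sketch below uses synthetic data and hypothetical function names; it is offered as an assumption about what a time-dependent correlation analysis can look like, not as the procedure actually used for Figure 6.

```python
from math import sin, sqrt
from statistics import mean

def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def lagged_correlation(x, y, max_lag):
    """Correlation between x[t] and y[t + lag] for lag = 0..max_lag."""
    return {lag: pearson(x[: len(x) - lag], y[lag:]) for lag in range(max_lag + 1)}

# Synthetic example: y repeats x with a delay of 5 frames.
x = [sin(0.3 * t) for t in range(200)]
y = [0.0] * 5 + x[:-5]

corr = lagged_correlation(x, y, max_lag=10)
best_lag = max(corr, key=corr.get)
print(best_lag, round(corr[best_lag], 3))
```

Dihedral angles are periodic, so in practice they are often converted to sine and cosine components (or to a pseudo-dihedral that stays away from the ±180° wrap) before a linear correlation is computed.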

      (2) The use of the term "protein functional dynamics" is not fully justified by the presented work, which focuses primarily on structural dynamics rather than functional aspects. Consider changing "protein functional dynamics" to "protein dynamics" to more accurately reflect the scope of the study.

      We thank the reviewer for suggesting more accurate terminology! We agree that if we kept “protein functional dynamics” in the title, we would need to focus on how the overall protein dynamics link to function. The function is directly related to the PROTAC-induced structural dynamics, as is commonly seen in protein structure-function relationships, but it is not our main focus. Therefore, we changed the title, replacing “functional” with “structural”.

      (3) Incorporate more local and specific characterization methods in addition to RMSD for a more comprehensive conformational assessment.

      We thank the reviewer for the suggestion. We performed a time-dependent correlation analysis to understand how the rotation of PROTACs can translate to the Lys-Gly interaction. In addition, we performed a dihedral entropy analysis for each dihedral angle in the linker of the PROTACs to better examine the flexibility of each PROTAC.

      We included a detailed explanation on page 18: “Our dihedral entropies analysis showed that dBET57 has ~0.3 kcal/mol lower entropies than the other three linkers, suggesting dBET57 is less flexible than other PROTACs (Figure S18).”
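
A simple way to estimate the entropy of a single dihedral angle is to histogram its sampled values and apply the Gibbs/Shannon formula, S = -R Σ p_i ln p_i. The Python sketch below assumes a 36-bin histogram, a temperature of 298.15 K, and made-up angle samples; these choices and the function name are illustrative assumptions, since the estimator used for Figure S18 is not described in this response.

```python
from math import log

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def dihedral_entropy(angles_deg, n_bins=36):
    """Histogram (Shannon) entropy of one dihedral, in kcal/(mol*K)."""
    width = 360.0 / n_bins
    counts = [0] * n_bins
    for angle in angles_deg:
        idx = min(int((angle % 360.0) / width), n_bins - 1)
        counts[idx] += 1
    total = len(angles_deg)
    return -R_KCAL * sum((c / total) * log(c / total) for c in counts if c > 0)

# Made-up samples: a rigid dihedral versus one that visits three rotamers.
rigid = [62.0, 58.0, 61.0, 63.0, 59.0, 60.5]
flexible = [60.0, 180.0, 300.0, 65.0, 175.0, 295.0]

temperature = 298.15  # K, assumed
print(round(temperature * dihedral_entropy(rigid), 3))     # T*S in kcal/mol
print(round(temperature * dihedral_entropy(flexible), 3))  # larger T*S
```

The absolute value depends on the bin width, so only differences between dihedrals computed with the same binning are meaningful.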

      Reviewer #2 (Public review):

      Summary:

      The manuscript reports the computational study of the dynamics of PROTAC-induced degradation complexes. The research investigates how different linkers within PROTACs affect the formation and stability of ternary complexes between the target protein BRD4BD1 and Cereblon E3 ligase, and the degradation machinery. Using computational modeling, docking, and molecular dynamics simulations, the study demonstrates that although all PROTACs form ternary complexes, the linkers significantly influence the dynamics and efficacy of protein degradation. The findings highlight that the flexibility and positioning of Lys residues are crucial for successful ubiquitination. The results also discussed the correlated motions between the PROTAC linker and the complex.

      Strengths:

      The field of PROTAC discovery and design, characterized by its limited research, distinguishes itself from traditional binary ligand-protein interactions by forming a ternary complex involving two proteins. The current understanding of how the structure of PROTAC influences its degradation efficacy remains insufficient. This study investigated the atomic-level dynamics of the degradation complex, offering potentially valuable insights for future research into PROTAC degradability.

      Reviewer #2 (Recommendations for the authors):

      (1) Regarding the modeling of the ternary complex, the BRD4 structure (3MXF) is from humans, whereas the CRBN structure in 4CI3 is derived from Gallus gallus. Is there a specific reason for not using structures from the same species, especially considering that human CRBN structures are available in the Protein Data Bank (e.g., 8OIZ, 4TZ4)?

      We appreciate the reviewer’s insightful comment regarding the choice of BRD4 and CRBN crystal structures from two different species. Our initial selection of 4CI3 for the CRBN structure was based on its high resolution and its publication in Nature. Furthermore, the Gallus gallus CRBN structure shares a high degree of sequence and structural similarity with Homo sapiens CRBN, especially in the ligand-binding region. At the time of our study, we were aware of 4TZ4 as a Homo sapiens CRBN structure; however, we did not use it because no publication or detailed experimental description was associated with it. Additionally, PDB 8OIZ was not yet publicly available at the time.

      (2) Based on the crystal structure (PDB ID: 6BNB) discussed in Reference 6, the ternary complex of dBET57 exhibits a conformation distinct from other PROTACs, with CRBN adopting an "open" conformation. Using the same CRBN structure for dBET57 as for other PROTACs might result in inaccurate docking outcomes.

      Thank you for the reviewer’s comment! As noted by the authors in Reference 6, the observed open conformation of CRBN in the dBET57 ternary complex may result from the high salt crystallization conditions, which could drive structural rearrangement, and crystal contacts that may induce this conformation. The authors also mentioned that this open conformation could, in part, reflect CRBN’s intrinsic plasticity. However, they acknowledged that further studies are needed to determine whether this conformational flexibility is a characteristic feature of CRBN that enables it to accommodate a variety of substrates. Despite these observations, we believe that the compatibility of the observed BRD4<sup>BD1</sup> binding conformation with both open and closed CRBN states suggests that these conformational changes are all possible. Therefore, we believe using the same initial CRBN structure for dBET57 as for other PROTACs can still reasonably reveal the dynamic nature of the ternary complex and would not significantly affect the accuracy of our docking outcomes either.

      (3) Figure 2 displays only a single frame from the simulations, which might not provide a comprehensive representation. Could a contact frequency heatmap of PROTAC with the proteins be included to offer a more detailed view?

      We thank the reviewer for the suggestion! We performed a contact map analysis to examine the average distance between the PROTACs and BRD4<sup>BD1</sup> over 400 ns of MD simulation (new Figure S4 added).

      We included a detailed explanation on pages 8 and 9: “The residues contact map throughout the 400ns MD simulation also showed different pattern of protein-protein interactions, indicating that the linkers were able to adopt different conformations (Figure S4).”

      (4) The conclusions in Figure 3 and S11 are based on a single 400 ns trajectory. The reproducibility of these results is therefore uncertain.

      We thank the reviewer for the suggestion! We added one more MD simulation with a different random seed for each PROTAC to ensure the reproducibility of the results. The results are shown in Figure S21, and the details for each MD run are updated in Table 1.

      (5) Figure 4 indicates significant differences between the first and last 100 ns of the simulations. Does this suggest that the simulations have not converged? If so, how can the statistical analysis presented in this paper be considered reliable?

      We thank the reviewer for the question. The simulation was initiated with a 10-15 Å gap between BRD4 and Ub to monitor the movement of the degradation machinery and the Lys-Gly interaction. The significant changes in the pseudo dihedral angles in Figure 4 show that the large-scale movement of the degradation complex can initiate Lys-Gly binding. They do not reflect unstable sampling, because the system remains very stable once BRD4 comes close to Ub.

      (6) In Figure 5, the dihedral angle of dBET57_#9MD1 is marked on a peptide bond. Shouldn't this angle have a high energy barrier for rotation?

      We thank the reviewer for catching the error! Indeed, the dihedral angle was mistakenly marked on the peptide bond. We reworked the figure and double-checked our dihedral correlation analysis. The updated selection of correlated dihedral angles and the corresponding correlation coefficients are shown in Figure 5.

      (7) Given that crystal structures for dBET 70, 23, and 57 are available, why is there a need to model the complex using protein-protein docking?

      We thank the reviewer for the feedback. Only dBET23 has a crystal structure of the complete ternary complex, containing the PROTAC and both proteins; complete ternary complexes are not available for dBET1, dBET57 and dBET70. Although dBET70 has a crystal structure, the conformation of its PROTAC is not resolved, and we therefore decided to still perform protein-protein docking for dBET70.

      We included the explanation on page 8: “Only dBET23 crystal structure is available with the PROTAC and both proteins, while the experimentally determined ternary complexes of dBET1, dBET57 and dBET70 are not available.”

      (8) On page 9, it is mentioned that "only one of the 12 PDB files had CRBN bound to DDB1 (PDB ID 4TZ4)." However, there are numerous structures of the DDB1-CRBN complex available, including those used for docking like 4CI3, as well as 4CI1, 4CI2, 8OIZ, etc.

      We thank the reviewer for the comment! We acknowledge the existence of several DDB1-CRBN complex crystal structures, such as PDB IDs 4CI1, 4CI2, 4CI3, and the more recent 8OIZ. For our study, we chose to use 4TZ4 to maintain consistency in complex construction and to align with the methodology established in a previously published JBC paper (https://doi.org/10.1016/j.jbc.2022.101653), which successfully utilized the same structure for a similar construct. At the time our study was conducted, the 8OIZ structure had not yet been released. We appreciate your suggestion and will consider incorporating alternative structures in future studies to further investigate our findings.

      (9) Table 2 is first referenced on page 8, while Table 1 is mentioned first on page 10. The numbering of these tables should be reversed to reflect their order of appearance in the text.

      We thank the reviewer for catching the error! We switched the order of Table 1 and Table 2.

      Reviewer #3 (Public review):

      The authors offer an interesting computational study on the dynamics of PROTAC-driven protein degradation. They employed a combination of protein-protein docking, structural alignment, atomistic MD simulations, and post-analysis to model a series of CRBN-dBET-BRD4 ternary complexes, as well as the entire degradation machinery complex. These degraders, with different linker properties, were all capable of forming stable ternary complexes but had been shown experimentally to exhibit different degradation capabilities. While in the initial models of the degradation machinery complex, no surface Lys residue(s) of BRD4 were exposed sufficiently for the crucial ubiquitination step, MD simulations illustrated protein functional dynamics of the entire complex and local side-chain arrangements to bring Lys residue(s) to the catalytic pocket of E2/Ub for reactions. Using these simulations, the authors were able to present a hypothesis as to how linker property affects degradation potency. They were able to roughly correlate the distance of Lys residues to the catalytic pocket of E2/Ub with observed DC50/5h values. This is an interesting and timely study that presents interesting tools that could be used to guide future PROTAC design or optimization.

      Reviewer #3 (Recommendations for the authors):

      (1) My most important comment refers to the MM/PBSA analysis, the results of which are shown in Figure S9: binding affinities of -40 to -50 kcal/mol are unrealistic. This would correspond to a dissociation constant of 10^-37 M. This analysis needs to be removed or corrected.

      We thank the reviewer for the comment! MM/PBSA analysis indeed cannot give a realistic binding free energy. It does not include the configurational entropy loss, which should be a large positive value. In addition, while the implicit PBSA solvent model computes the solvation free energy, the absolute values may not be very accurate. However, because this is a commonly used energy calculation, and some readers may like to see quantitative values to ensure that the systems have stable intermolecular attractions, we kept the analysis in the SI. We edited the figure legend, moved the figure (Figure S10, SI page 19), and added sentences to clearly state that the calculations did not include the configurational entropy loss: “Note that the energy calculations focus on non-bonded intermolecular interactions and solvation free energy calculations using MM/PBSA, where the configuration entropy loss during protein binding was not explicitly included.”
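
      The reviewer's order-of-magnitude estimate follows from the standard relation ΔG° = RT ln(Kd). The short sketch below converts a binding free energy into a dissociation constant; the temperature (298.15 K) and the 1 M standard state are assumptions for illustration and are not stated in the response.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def dissociation_constant(delta_g_kcal, temperature_k=298.15):
    """Return Kd in mol/L (1 M standard state) for a binding free energy in kcal/mol.

    Uses dG = RT ln(Kd); the default temperature is an assumption.
    """
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))

# Compare a typical tight binder with the MM/PBSA values questioned by the reviewer.
for dg in (-10.0, -40.0, -50.0):
    kd = dissociation_constant(dg)
    print(f"dG = {dg:6.1f} kcal/mol -> log10(Kd / M) = {math.log10(kd):6.1f}")
```

      With these assumptions, -50 kcal/mol corresponds to a Kd on the order of 10^-37 M, which is why such values are better read as relative interaction energies than as absolute affinities.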

      (2) I think that the analysis of what in the different dBETx makes them cause different degradation potency is underdeveloped. The dihedral angle analysis (Figure 4B) did not explain the observed behavior in my opinion. Please add additional, clearer analysis as to what structural differences in the dBETx make them sample very different conformations.

      We thank the reviewer for the suggestions! Based on the suggestion, we further performed dihedral entropy analysis for each dihedral angle in the linker part of the PROTAC to examine the flexibility of each PROTAC. Because each PROTAC has a different linker, we now clearly label them in a new Figure S18 on SI page 27. Low dihedral entropies indicate a more rigid structure with less flexibility, which makes it more difficult for a PROTAC to rearrange and facilitate the protein structural dynamics necessary for ubiquitination.

      We added detailed explanation on page 18: “Our dihedral entropy analysis showed that dBET57 has ~0.3 kcal/mol lower configuration entropies than the other dBETs with three different linkers, suggesting that dBET57 is less flexible than the other PROTACs (Figure S18).”
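
      As an illustration of how a per-dihedral entropy can be estimated from simulation frames, the sketch below bins the sampled angles and evaluates the discrete Gibbs entropy, reported as T·S in kcal/mol. The response does not specify the estimator used, so the histogram approach, the bin count, the temperature, and the synthetic angle series are all assumptions.

```python
import numpy as np

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def dihedral_ts(angles_deg, n_bins=36, temperature_k=298.15):
    """Return T*S (kcal/mol) for one dihedral from a histogram of sampled angles.

    S = -R * sum(p_i ln p_i) over occupied bins; bin count and temperature
    are illustrative assumptions.
    """
    # Wrap angles into [-180, 180) so the histogram covers one full period.
    wrapped = (np.asarray(angles_deg, dtype=float) + 180.0) % 360.0 - 180.0
    counts, _ = np.histogram(wrapped, bins=n_bins, range=(-180.0, 180.0))
    p = counts[counts > 0] / counts.sum()
    return -temperature_k * R_KCAL * float(np.sum(p * np.log(p)))

# Synthetic angle series (not simulation data): one rigid, one freely rotating.
rng = np.random.default_rng(1)
rigid = rng.normal(60.0, 10.0, size=5000)
flexible = rng.uniform(-180.0, 180.0, size=5000)
print(f"rigid:    T*S = {dihedral_ts(rigid):.2f} kcal/mol")
print(f"flexible: T*S = {dihedral_ts(flexible):.2f} kcal/mol")
```

      A dihedral confined to one rotamer gives a lower value than one that samples the full circle, which is the sense in which a lower dihedral entropy indicates a more rigid linker.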

      (3) "The movement of the degradation machinery correlated with rotations of specific dihedrals of the linker region in dBETs (Figure 5).": this is not sufficiently clear from the figure. Definitely not in a quantitative way.

      We thank the reviewers for the suggestions! To further understand the correlation between the PROTAC dihedral angles and the movement of the degradation machinery, we performed a time-dependent correlation analysis to correlate the dihedral angles of the PROTACs with the Lys-Gly distance (Figures 6 and S17).

      We included detailed explanation on page 16:

      “To further examine the correlation between PROTAC rotation and the Lys-Gly interaction, we performed a time-dependent correlation analysis. This analysis showed that PROTAC rotation translates motion over time, leading to the Lys-Gly interaction, with a correlation peak around 60-85 ns, marking the time of the interaction (Figure 6 and Figure S17). In addition, the pseudo dihedral angles also showed a high correlation (0.85 in the case of dBET1) with Lys-Gly distance. This indicated that degradation complex undergoes structural rearrangement and drives the Lys-Gly interaction.”
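
      One common way to carry out a time-dependent correlation between two trajectory observables is a lagged Pearson correlation, sketched below. This is not necessarily the procedure used in the manuscript: the function name, the synthetic series, and the 20-frame delay are hypothetical, and periodic dihedral angles would normally be unwrapped or converted to sine/cosine components first.

```python
import numpy as np

def lagged_correlation(x, y, max_lag):
    """Pearson correlation between x(t) and y(t + lag) for lag = 0..max_lag frames."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    corr = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        # Pair each x frame with the y frame that occurs `lag` frames later.
        corr[lag] = np.corrcoef(x[: len(x) - lag], y[lag:])[0, 1]
    return corr

# Synthetic example: y follows x with a delay of 20 frames plus small noise.
rng = np.random.default_rng(0)
shift = 20
signal = np.convolve(rng.normal(size=1100), np.ones(25) / 25, mode="valid")
n = len(signal) - shift
x = signal[shift:]                                # stand-in for a linker dihedral
y = signal[:n] + rng.normal(scale=0.02, size=n)   # stand-in for the Lys-Gly distance
corr = lagged_correlation(x, y, max_lag=60)
best_lag = int(np.argmax(corr))
print(f"correlation peaks at a lag of {best_lag} frames")
```

      The lag at which the correlation peaks, multiplied by the frame spacing, gives the delay between the two motions.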

      (4) Cartoons are needed at multiple stages throughout the paper to enhance the clarity of what the modeled complexes looked like (e.g. which subunits they contained).

      We thank the reviewers for the suggestions. We added and remade several Figures with cartoons to better represent the stages. We also used higher resolution and included clearer labels for each protein system.

      (5) The difference between CRL4A E3 ligase and CRBN E3 ligase is not clear to the non-expert reader.

      We thank the reviewer for the comment! To clarify the terms "CRL4A E3 ligase" and "CRBN E3 ligase", which refer to different levels of description for the protein complexes, we added a couple of sentences to the Figure 1 legend so that non-expert readers can clearly see the difference.

      As illustrated in Figure 1,

      • CRL4A E3 ligase refers to the full E3 ligase complex, which includes all protein components such as CRBN, DDB1, CUL4A, and RBX1.

      • CRBN E3 ligase, on the other hand, is a more colloquial term typically used to describe just the CRBN protein, often in isolation from the full CRL4A complex.

      (6) Figure 1, legend: unclear why it's E3 in A and E2 in B.

      We thank the reviewer for the question! E3 ligase in Figure 1A refers to CRBN E3 ligase, which researchers also simply call CRBN. We have added a sentence to specify that CRBN E3 ligase is also termed CRBN for simplicity. In Figure 1B, the meaning of E2 was unclear. The full name of E2 is E2 ubiquitin-conjugating enzyme. Because the name is long, researchers also call it the E2 enzyme. We have corrected the legend and now use "E2 enzyme" to make it clearer.

      (7) "Although the protein-protein binding affinities were similar, other degraders such as dBET1 and dBET57 had a DC50/5h of about 500 nM". It's unclear what experimental data supports the assertion that the protein-protein binding affinities are similar.

      We thank the reviewer for the question. Indeed, the statement is unclear.

      We corrected the sentence on page 6: “Although utilizing the exact same warheads, other degraders such as dBET1 and dBET57 had a DC<sub>50/5h</sub> of about 500 nM.”

      (8) Was the construction of the degradation machinery complex guided by experimental data (maybe cryo-EM or tomography)? If not, what is the accuracy of the starting complex for MD? This may impact the reliability of the obtained results.

      Thank you for your insightful comments! Yes, the construction of the degradation machinery complex was guided by available high-resolution crystal structures, which were selected to maintain consistency and align with the methodology established in a previously published JBC paper (https://doi.org/10.1016/j.jbc.2022.101653).

      We acknowledged that static crystal structures represent only a single snapshot of the system and may not capture the full conformational flexibility of the complex. To address this limitation, we performed MD simulations using multiple starting structures. This approach allowed us to explore a broader conformational landscape and reduced the dependence on any single starting configuration, thereby enhancing the reliability of the results.

      We hope this clarifies the robustness of our methodology and the steps taken to ensure accuracy in our simulations.

      (9) "With quantitative data, we revealed the mechanism underlying dBETx-induced degradation machinery": I think this may be too strong of an assertion. The authors may have developed a mechanistic hypothesis that can be tested experimentally in the future.

      We thank the reviewer for the suggestion. This is indeed a strong assertion and needs to be modified. We edited the sentence on page 7: “With quantitative data, we revealed the importance of the structural dynamics of dBETx-induced motions, which arrange positions of surface lysine residues of BRD4<sup>BD1</sup> and the entire degradation machinery.”

      (10) Figure S2: are the RMSDs calculated over all residues? Or just the BRD4 residues? Given that the structures are aligned with respect to CRBN, the reported RMSD numbers might be artificially low since there are many more CRBN residues than there are BRD4 residues. Also, why weren't the crystal structures used for dBET 23 and 70 for the modeling? Wouldn't you want to use the most accurate possible structures? Simulations were run for 23. Why not for 70?

      We thank the reviewer for the suggestion. We added a sentence to more clearly explain the RMSD calculations in Figure S2: “The structural superposition is performed based on the backbone of CRBN and RMSD calculation is conducted based on the backbone of BRD4<sup>BD1</sup>.”
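
      The fit-on-one-subunit, measure-on-another procedure described in that sentence can be sketched with a Kabsch superposition. The coordinate arrays below are random stand-ins for the CRBN and BRD4<sup>BD1</sup> backbone atoms, and the function names are hypothetical; this is a minimal illustration, not the authors' analysis script.

```python
import numpy as np

def kabsch(mobile, target):
    """Return rotation R and translation t that best map mobile onto target."""
    mc, tc = mobile.mean(axis=0), target.mean(axis=0)
    h = (mobile - mc).T @ (target - tc)            # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))         # guard against reflections
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return rot, tc - rot @ mc

def rmsd_after_fit(fit_mobile, fit_target, eval_mobile, eval_target):
    """Superpose on the fit atoms, then compute RMSD over the evaluation atoms only."""
    rot, trans = kabsch(fit_mobile, fit_target)
    moved = eval_mobile @ rot.T + trans
    return float(np.sqrt(((moved - eval_target) ** 2).sum(axis=1).mean()))

# Synthetic coordinates in angstroms (not real structures).
rng = np.random.default_rng(2)
crbn_ref = rng.normal(scale=10.0, size=(50, 3))    # stand-in for CRBN backbone
brd4_ref = rng.normal(scale=10.0, size=(20, 3))    # stand-in for BRD4 backbone
theta = np.deg2rad(30.0)
rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
shift = np.array([5.0, -2.0, 1.0])
crbn_model = crbn_ref @ rz.T + shift
# In the model, BRD4 is displaced by 3 angstroms relative to CRBN.
brd4_model = (brd4_ref + np.array([3.0, 0.0, 0.0])) @ rz.T + shift
value = rmsd_after_fit(crbn_model, crbn_ref, brd4_model, brd4_ref)
print(f"BRD4 RMSD after CRBN superposition: {value:.2f} A")
```

      Because only the BRD4 atoms enter the RMSD, the value reflects the relative placement of BRD4 and is not diluted by the larger number of CRBN atoms.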

      Although dBET70 has a crystal structure, its PROTAC structure is not resolved, and thus we decided to still perform protein-protein docking with dBET70. dBET1 and dBET57 do not have crystal structures of their ternary complexes.

      We included the explanation on page 8: “Only dBET23 crystal structure is available with the PROTACs and both proteins, while the experimentally determined ternary complexes of dBET1, PROTACs of dBET57 and dBET70 are not available.”

      a. And there are no crystal structures available for 1 and 57? If so, please clearly say that. Otherwise please report the RMSD.

      We thank the reviewer for the suggestion. We included the explanation on page 8: “Only dBET23 crystal structure is available with the PROTACs and both proteins, while the experimentally determined ternary complexes of dBET1, PROTACs of dBET57 and dBET70 are not available.”

      (11) Table 2 is referenced before Table 1.

      We thank the reviewer for catching the error! We switched the order for Table 1 and Table 2.

      (12) Figure S3 is not referenced in the main paper.

      We thank the reviewer for catching the error! We now refer to Figure S3 on page 7.

      (13) Minor comments on grammar and sentence structure:

      a. It should be "binding of a ternary complex"

      b. "Our shows the importance": word missing.

      c. "...providing insights into potential orientations for ubiquitination. observe whether the preferred conformations are pre-organized for ubiquitination." Word or words missing.

      We thank the reviewer for catching the errors! We corrected grammatical errors and unclear sentences throughout the paper and revised the sentences to make them easily understandable for non-expert readers.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We are grateful to the reviewers for their detailed evaluation and insightful comments, which have improved the clarity and readability of this manuscript. We have addressed all reviewer comments and incorporated their suggested changes into the text and figures. The line numbers in our response correspond to those in the revised manuscript. Following reviewer 3’s comment, we have repeated the structural refinement of G234A and G234V apo crystal structures without water molecules, which improved the reliability of the data.

      Reviewer #1

      1. Abstract: The current abstract is challenging to follow. For instance, the phrase "The detached head preferentially binds to the forward tubulin-binding site after ATP binding, but the mechanism preventing premature binding to the microtubule while awaiting ATP remains unknown" could imply that the tethered head binds ATP, which is misleading. A clearer statement would be: "The detached head preferentially binds to the forward tubulin-binding site after ATP binding to the leading, microtubule-bound head, but the mechanism preventing premature binding to the microtubule while its partner awaits ATP remains unknown."

      Response: We thank the reviewer for the suggestion to improve clarity. We have revised the indicated sentence and updated the abstract to enhance clarity.

      Terminology: In the introduction, consider rephrasing to "...its two motor domains ("heads")."

      Response: We have corrected the phrase accordingly (line 44).

      Lines 71-72: The sentences "This mechanism explains how the tethered head preferentially binds to the forward-binding site 'after ATP binding.' However, it does not clarify how the tethered head is prevented from rebinding to the rear-tubulin binding site 'before ATP binding'" could be rephrased for clarity. A suggested revision is: "This mechanism explains how the tethered head preferentially binds to the forward-binding site after ATP binding to the microtubule-bound, leading head. However, it does not clarify how the tethered head is prevented from rebinding to the rear-tubulin binding site before ATP binds to the leading head."

      Response: We appreciate the suggestion for clarification. We have corrected the phrase accordingly (lines 72-75).

      Line 98: Consider revising "could release both ADP" to "could release both ADPs" or "could release both ADP molecules."

      Response: We have corrected the phrase accordingly (line 100).

      Lines 103-104: The statement "Therefore, these results suggest the tension posed to the neck linker plays a critical role in suppressing microtubule-binding of the tethered head" should be clarified. Since tension only develops in the two-heads-bound state, using "steric hindrance" instead of "tension" may improve precision.

      Response: We have corrected this sentence as follows: “These findings suggest that constraints on the neck linker (whether from steric hindrance or interactions with the head or microtubule) are crucial in preventing the tethered head from binding to microtubule” (lines 105-107).

      Lines 374-375: Replace "...before ATP-binding triggers the forward stepping..." with "...before ATP binding to the leading head triggers the forward stepping..."

      Response: We have corrected the phrase accordingly (lines 374-375).

      Tense Consistency: Ensure consistent use of present or past tense throughout the manuscript for clarity.

      Response: We have reviewed the manuscript and corrected the verb tenses.

      Reviewer #2

      1. Lines 72-73 can be deleted as they are repetitive with lines 95-96.

      Response: While we acknowledge the reviewer’s point about redundancy, we would like to retain this sentence as it provides an important connection to the opening sentence of the next paragraph, where we explain why the rear-head gating model is required.

      Line 87: The authors should cite Mickolajczyk et al. PNAS 2015 and Sudhakar et al. Science 2021 as these studies also observed that the trailing head takes a sub-step and is located on the right side of the leading head before it moves forward and completes the step.

      Response: We did not cite these two papers because they contradict the statement in this sentence and instead suggest that kinesin waits for ATP binding in the “two-head-bound” state. We interpret this discrepancy as follows:

      1) Mickolajczyk et al.’s observations likely represent movement driven by multiple motors. Ensuring monovalency of bead labeling is essential. In optical trapping assays, it is established that >98% of the bead motility is driven by a single motor when fewer than 50% of the beads move along the microtubule after being brought into contact with it using the optical trap. The corresponding author has extensive experience preparing monovalent probes for optical trapping bead assays and high-speed single-molecule assays using gold probes (Tomishige et al., J. Cell Biol. 142, 989 (1998)), having established reliable protocols for monovalent labeling of kinesin with gold probes (refer to methods in Isojima et al., Nat. Chem. Biol. 2016 and Niitani et al. bioRxiv 2024). The colloidal gold was coated with three SAMs (self-assembled monolayers) in a ratio of 1:10:10 (biotin-SAM:carboxy-SAM:hydroxy-SAM) to reduce surface biotin molecules and non-specific kinesin binding. The gold particles and kinesin-streptavidin complex were mixed at a 1:1 ratio, though this mixing ratio does not guarantee that 100% of the gold particle movements along the microtubule are driven by single motors. We established that the standard deviations (s.d.) of on- and off-axis displacements (especially the off-axis s.d.) are key indicators for distinguishing between single- and multiple-motor-driven motility of the gold probe. Under the above single-molecule conditions, the majority of off-axis s.d. traces exhibited clear two-state transitions between microtubule-bound (low s.d.) and -unbound (high s.d.) states of the gold-labeled head, while under multivalent conditions (with a higher kinesin:gold ratio and/or a higher biotin-SAM ratio on the gold surface), most traces showed sub-steps but lacked these two-state transitions, instead displaying uncorrelated on- and off-axis s.d. traces. In contrast, Mickolajczyk et al. used commercial streptavidin-coated gold nanoparticles mixed with kinesin at a 6:1 motor-to-gold ratio. While their 2016 and 2017 papers did not show s.d. traces, their Biophys. J. 2019 paper (Fig. 4) displayed s.d. traces that are characteristic of multivalent bead motility according to the criteria described above.

      2) Sudhakar et al.’s interpretation that rapid sub-steps between 8-nm steps represent tethered head movement (illustrated in Fig. 4 of their paper) is likely incorrect. The optical trap force acts on the neck linker of the microtubule-bound head, not on the neck linker of the tethered head. Consequently, trailing head detachment should not cause significant displacement of the trapped bead (as illustrated in Fig. 4 of Carter and Cross, Nature 2005). Instead, conformational changes in the neck linker of the microtubule-bound head (i.e., cover-neck bundle formation after ATP binding (Hwang et al. Structure 2008)) would cause bead displacement, supporting that kinesin waits for ATP in the “one-head-bound state”.

      Lines 103: The authors should cite Benoit et al. kinesin14 and Kif1A structures as these studies directly show the conformations of the neck-linkers when both heads are bound to the microtubule.

      Response: We cited the paper (line 105).

      Line 113: There is an extra "e" on "nucleotide".

      Response: We have corrected the typo (line 117).

      Line 118: I would delete "universal" as it is not clear whether all kinesins use a tension-based mechanism.

      Response: We agree with the reviewer’s comment. Further, reviewer 3 noted that recent studies showed that kinesin-3 may not be explained by this mechanism, so we have removed the word “universal” from this sentence as well as from the Abstract and Discussion.

      Line 132: Why did the authors decide to use a cys-lite mutant for X-ray and cryo-EM studies?

      Response: We used the Cys-light mutant to maintain consistency across various experimental techniques in this paper and to enable direct comparison with the nucleotide-free kinesin-1 structures reported by Cao et al. (2014, 2017), who used the same Cys-light construct. To express this, we revised the sentence as follows: “For consistency across experimental techniques and comparison with the previously solved nucleotide-free kinesin-1 structures, we used a cysteine-light mutant kinesin, where surface-exposed cysteines were replaced with either Ala or Ser” (lines 135-138).

      Line 192: The authors refer to Figures 3A and B when they discuss ATP-like and ADP-like conformations. However, these figures refer to open, semi-open, and closed conformations. Things become clear later in the text, but this is confusing, as is. I recommend the authors either show ATP-like and ADP-like classification as a supplemental figure and refer to that figure or not refer to the figure in this sentence.

      Response: To explain the result in this paragraph, we should reference these figures, while we acknowledge the reviewer’s comment about the confusing nomenclature in Fig.3. To address this, Fig. 3A now lists both the old terminology (nucleotide-free, ADP-like, and ATP-like) alongside the new terminology (open, semi-open, and closed).

      Lines 259-260: I would delete "as evidenced by..." and just cite those papers.

      Response: We have corrected this sentence accordingly (lines 265-266).

      Lines 262-276: The authors should cite the relevant literature in this paragraph as most of their conclusions here were already shown by previous structural studies.

      Response: Reviewer 3 also noted that this paragraph outlines our current understanding, which seems out of place in the Results and more relevant for the Discussion. Therefore, we have moved this paragraph to the Discussion section and added relevant citations from the literature (lines 390-406).

      Recent biophysical studies claim that neck-linker docking is a two-step process that occurs in ATP binding and ATP hydrolysis. Do the authors agree with this model? Can they comment on why the neck-linker only partially docks during ATP binding, and require ATP hydrolysis to complete the docking? If they disagree with this model, this should be explained in the Discussion.

      Response: This paper focuses on the neck linker’s extensibility in coordinated motility rather than its docking onto the head. The correlation between ATP binding/hydrolysis and neck linker docking has been examined in a concurrent paper by Niitani et al. (bioRxiv 10.1101/2024.09.19.613828) and is discussed in their Discussion section. In this paper, using a loose backward constraint on the neck linker, we demonstrated that docking of the initial neck linker segment is sufficient to half-open the gate. Furthermore, extending the neck linker length increased the ATP off-rate of the rear E236A head, indicating that forward neck linker strain plays a crucial role in stabilizing the closed state. These findings support the hypothesis that neck linker docking remains partially unstable in the one-head-bound state and achieves full stabilization only after transitioning to the two-head-bound state.

      Lines 285: The authors should cite Benoit et al. as they showed this clearly in their structure. Benoit et al. showed that, even though both heads are bound to AMP-PNP, the neck linkers are pointed in opposite directions and the rigid body conformations of the trailing and leading heads are different. Do the authors take this into account when they model the Topen-Lopen state? Can they also comment on why the heads can have different rigid body conformations even though they are bound to the same nucleotide? Is this because tension on the neck-linker is too high if both heads are in the open conformation?

      Response: We have added a citation to Benoit et al. 2021. The Topen-Lopen state is an off-pathway conformational state that differs from the on-pathway two-head-bound states (Tclosed-Lopen) studied using cryoEM. Using smFRET, we showed that this state appeared only in the neck linker-extended mutants, for which no cryoEM observations exist. Therefore, we modeled the Topen-Lopen state by assuming both heads adopt identical conformations in the open state, and showed that this off-pathway transition is suppressed because it would cause an intolerable increase in neck linker tension. Benoit et al.’s finding that the front open head can bind AMPPNP aligns with Niitani et al.’s observation (bioRxiv 2024) that while the front head can bind ATP, it maintains a low ATP affinity state, unlike the rear head, which exhibits high ATP affinity. This suggests that ATP binding (nucleotide state) is not tightly coupled to the open-to-closed conformational transition of the head.

      Line 308: How do the authors estimate the tension on the neck linker? This needs to be explained briefly in the main text as it is central to the conclusions of this work.

      Response: While we briefly described the method to estimate the tension in the text, we did not specify which part of the disordered neck linker was used for this calculation. We have now added this explanation as follows: “To estimate the amount of this tension, we isolated the disordered neck linker segments from both the leading and trailing heads that are stretched between the motor domains without steric hindrance or docking onto the head (Fig. S4 D). Then, we applied a harmonic potential to the Cα atoms at both ends of the stretched region and calculated the tension from the average displacement of the Cα atom from the potential minimum using MD simulations (Fig. 7, A and B)” (lines 300-306).
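
      The principle behind reading a tension from a harmonic restraint is that, for a potential U = ½k|r − r₀|², the mean restoring force along the pulling axis is k times the mean displacement from the minimum. The sketch below illustrates that arithmetic; the spring constant, the coordinates, and the convention for k (with the factor of ½) are hypothetical and are not taken from the manuscript.

```python
import numpy as np

# 1 kcal/(mol*angstrom) expressed in piconewtons.
KCAL_MOL_A_TO_PN = 69.48

def mean_tension_pn(positions, minimum, pull_axis, k_kcal_mol_a2):
    """Mean tension (pN) on a harmonically restrained atom.

    positions: (n_frames, 3) atom coordinates in angstroms.
    minimum:   (3,) location of the restraint minimum in angstroms.
    pull_axis: (3,) direction along which the chain is stretched.
    Assumes U = 0.5 * k * |r - r0|^2, so F = k * <displacement along axis>.
    """
    axis = np.asarray(pull_axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    disp = (np.asarray(positions, dtype=float) - np.asarray(minimum, dtype=float)) @ axis
    return k_kcal_mol_a2 * disp.mean() * KCAL_MOL_A_TO_PN

# Two made-up frames whose mean displacement along x is 0.5 angstrom.
frames = [[0.4, 0.1, 0.0], [0.6, -0.1, 0.0]]
tension = mean_tension_pn(frames, minimum=[0.0, 0.0, 0.0],
                          pull_axis=[1.0, 0.0, 0.0], k_kcal_mol_a2=1.0)
print(f"mean tension = {tension:.2f} pN")
```

      In practice the displacement would be averaged over the equilibrated part of the trajectory, and the result depends on the restraint stiffness being soft enough for the atom to report the chain's pull.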

      Line 308: Calculated tension is a lot higher than the force needed to pull a tubulin out from its tail from the microtubule (Kuo et al. Nat Comms 2022). Even the lowest tension they reported is a lot higher than the estimates made by Clancy et al. and Hyeon and Onuchic. The authors should comment on why this might be the case.

      Response: The neck linker tension between two heads differs from the force applied by the optical trap to the bead attached to the coiled-coil stalk. Because these forces act in different directions and the coiled-coil stalk contains flexible hinges, torques, rather than forces, should be compared, though this is difficult to estimate (as described in Figure S16 in Hwang and Karplus, Structure 16, 62-71 (2008)). Hyeon & Onuchic (2007) and Hariharan & Hancock (2009) calculated the neck linker tension using a worm-like chain model, yielding different results of 12-15 pN and 28 pN, respectively (Clancy et al. cited these results). This discrepancy stems from the different end-to-end distances used in their calculations (3.1 nm versus 4 nm). The 4 nm distance used by Hariharan and Hancock likely represents the tension in the two-head-bound state, as it equals half the distance between two heads on adjacent tubulin-binding sites. Using MD simulation, Hariharan and Hancock further estimated a neck linker tension of 15 pN in constraint force mode and 35 pN in force-clamp mode. Our estimated tension (39 pN) in the Tclosed-Lopen state is comparable to the upper limit of these calculations. This tension, estimated using isolated neck linkers, is likely an overestimate, since the stretched neck linker in the presence of the motor domain includes an additional energetic contribution from its direct interaction with the leading head, which is described in detail in our response to reviewer 2’s comment #16. To address this, we have included the following sentence: “The tension in the Tclosed-Lopen state is likely an overestimate since this measurement excludes the enthalpic component discussed above, though it is comparable to previous MD measurements and theoretical calculations using a worm-like chain model (Hariharan and Hancock, 2009).” (lines 307-311)
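
      To show why the end-to-end distance has such a strong effect in a worm-like chain estimate, the sketch below evaluates the widely used Marko-Siggia interpolation formula, F = (k_BT/L_p)[1/(4(1 − x/L_c)²) − 1/4 + x/L_c]. The contour and persistence lengths are illustrative assumptions and are not the parameters used in the cited papers, so the numbers it prints should not be compared directly with the published values.

```python
KBT_PN_NM = 4.114  # k_B * T in pN*nm at about 298 K

def wlc_force_pn(extension_nm, contour_nm, persistence_nm, kbt=KBT_PN_NM):
    """Worm-like chain force (pN) from the Marko-Siggia interpolation formula."""
    rel = extension_nm / contour_nm
    if not 0.0 <= rel < 1.0:
        raise ValueError("extension must be in [0, contour length)")
    return (kbt / persistence_nm) * (0.25 / (1.0 - rel) ** 2 - 0.25 + rel)

# Hypothetical chain parameters chosen only for illustration.
CONTOUR_NM = 5.3
PERSISTENCE_NM = 0.5
for distance in (3.1, 4.0):
    force = wlc_force_pn(distance, CONTOUR_NM, PERSISTENCE_NM)
    print(f"end-to-end distance {distance:.1f} nm -> {force:.1f} pN")
```

      Because the force diverges as the extension approaches the contour length, a difference of less than 1 nm in the assumed end-to-end distance changes the estimated tension severalfold.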

      Line 321: I would also cite Shastry and Hancock here.

      Response: We have cited this paper (line 322).

      Lines 387: "...the transition from one-head-bound to two-head-bound Topen-Lopen state".

      Response: We have corrected the phrase accordingly (lines 387-388).

      Lines 418-428: The authors assume that the neck-linker extension is purely entropic. However, neck linkers are almost fully stretched especially in unfavorable two-head-bound conformations, and they can potentially make contact with the motor domains. Therefore, this process may not be purely entropic and may also involve energetic terms when considering the free energy of neck linker docking.

      Response: We appreciate the reviewer’s comment, as we had overlooked this important point. After examining the simulation movies of neck linker dynamics in Topen-Lopen and Tclosed-Lopen states (Fig. S4B, C and Videos 3, 4), we found that the stretched neck linker region in the Topen-Lopen state was displaced from the head and showed no interaction with the head during the simulation period. However, in the Tclosed-Lopen state, we observed a stable interaction between the K326 residue in the neck linker and the D37 and F48 residues of the leading open head (which can be seen in Video 4). This interaction was not included in our tension estimation (Fig. S4D), which assumed the tension had a purely entropic origin. Therefore, the estimated tension in the Tclosed-Lopen state is likely an overestimate, while the tension in the Topen-Lopen state remains purely entropic. We have added two sentences to describe these observations as follows: “Throughout the simulation, the stretched neck linker remained displaced from the head without any interaction, suggesting that the neck linker behaves as an entropic spring.” (lines 288-290), and “During this simulation, we observed a stable contact between the K326 side chain of the disordered neck linker and the D37 and F48 residues of the leading head (see Video 4), suggesting that the neck linker tension in Tclose-Lopen state includes an energetic component.” (lines 293-296)

      Lines 452-454: I think this sentence summarizes the most significant contribution of this work and should be clearly mentioned in the abstract.

      Response: We thank the reviewer for this suggestion and have incorporated the sentence into the abstract.

      Lines 476-479: This sentence claims that neck linker docking is not necessary. Instead, rotation of the R-sub domain of the motor domain is sufficient to trigger the forward step. I would omit this sentence, as the rationale is not well explained, and it conflicts with a large body of literature on neck-linker docking. This could be an interesting idea to discuss in a perspective article or a topic of future research, but it may unnecessarily confuse the reader at the conclusion of this work.

      Response: We included this sentence because it provides a testable prediction for neck linker-docking independent stepping, and we are preparing a manuscript to experimentally test this hypothesis. However, we agree with the reviewer’s comment that this statement conflicts with the common view in this field, and without additional verification or statement, it would confuse readers. Therefore, we have removed this sentence from the manuscript.

      Reviewer #3

      Major Comments:

      1. The Abstract is not clearly written to distinguish which kinesin head is being discussed.

      Response: We revised the second sentence in the abstract to distinguish between the tethered and microtubule-bound heads and updated the abstract to enhance clarity.

      The authors describe the bulge formed by the terminus of helix 4 as an obstruction that is "creating an intolerable increase in neck linker tension", but could it not simply be that forward head binding is conformationally disfavoured? Perhaps these ideas are not mutually exclusive.

      Response: We agree with the reviewer that in the ATP-waiting state, the tethered head might also be prevented from binding to the tubulin-binding site due to the neck linker requiring a highly stretched configuration—this could occur before the tension increase that accompanies the transition from semi-open to open conformation. While we addressed this possibility in the Discussion section (lines 398-405 of the original version), our explanation was not sufficiently clear. We have therefore revised the sentence to clarify this point as follows: “Therefore, we can only speculate that the tension would lie somewhere between that of the Tclose-Lopen and Topen-Lopen states, and that microtubule binding of the tethered semi-open head may be restricted because the disordered neck linkers would need to adopt highly stretched configurations.” (lines 421-424)

      The term "universal" in describing this tension-based regulation mechanism seems unjustified without examination of other kinesins. They might consider Kif1A as a subject given its shorter and seemingly more entropically-constrained neck linker. Structures of Kif1A bound to MTs in two-heads-bound states have recently been described by Benoit et al. (Nat Comm. 2024).

      Response: We agree with the reviewer and acknowledge that this tension-based regulation mechanism may not apply to some other kinesin subfamilies, which have different neck linker properties, such as varying neck linker lengths or specific interactions with the motor domain. We removed the word “universal” from the Abstract, Introduction and Discussion and added a final sentence to the Discussion as follows: “Additionally, studies are needed to examine whether this mechanism extends to other kinesin subfamilies with different neck linker properties, such as varying neck linker lengths (kinesin-2: Hariharan and Hancock, 2009; kinesin-3: Benoit et al., 2024) or specific interactions with the motor domain (kinesin-6: Guan et al., 2016; Ranaivoson et al., 2023).” (lines 501-505).

      The authors should consider discussing how having two chains in the asymmetric unit of the APO motor impacts the NL structure.

      Response: The G234A apo and G234V apo crystals share the same asymmetric unit since the G234A crystal was grown from a G234V crystal seed. We inspected the structures near the proximal end of the neck linker (or the C-terminus of the α6 helix connected to the neck linker) for elements that could cause steric hindrance or direct interaction with the initial segment of the neck linker. The closest element of the adjacent chain was L5, which was separated by 1.1 nm from the proximal end of the neck linker (K324 residue) and did not interact with it. The proximal ends of the neck linkers of chains A and B face each other, with a cylindrical cavity between them. This cavity in G234V apo allows an antiparallel β-sheet formation between the two stretched neck linkers of chains A and B (Figure S2A). However, we did not observe density corresponding to the antiparallel β-sheet in the cavity of G234A apo, likely due to its slightly smaller cavity size. Notably, this antiparallel β-sheet formation would be geometrically impossible for the two neck linkers in a dimer since their C-termini are joined in parallel by the neck coiled-coil. These explanations have been added to the text (lines 154-156) and the legend of Figure S2.

      At barely 3 angstroms, how are waters modelled and how is it their B-factors are so low? Rfact and Rfree are also quite divergent for the GA mutant (APO) structure.

      Response: To improve the R-factor, we placed water molecules to account for unmodeled and discontinuous electron density peaks that were too small to be interpreted as polypeptides. However, this treatment was likely incorrect and is the primary reason for both the low B-factor and Rfree values, which led to the large discrepancy between Rwork and Rfree. To address this issue, we repeated the structural refinement of G234A and G234V apo structures by removing water molecules placed on unmodeled density peaks. We retained only one water molecule in the nucleotide pocket of chain A in the G234A apo structure due to its well-defined density (Figure S1). This improved refinement significantly reduced the discrepancy between Rwork and Rfree of G234A apo from 20.0/28.1% to 20.7/26.5%. For G234V apo, while the discrepancy remained unchanged, the overall values were improved from 24.4/29.2% to 20.0/25.8%. We updated Table 1 and deposited these refined structures to the Protein Data Bank (PDB# 9L78 and 9L6K) with details provided in the “Data availability” section.
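      The Rwork/Rfree figures quoted in this response can be tabulated as a quick arithmetic check. The following is a minimal sketch using only the percentages stated above; the helper name `r_gap` and the dictionary layout are illustrative and not part of the authors' refinement workflow.

```python
# Rwork / Rfree values (percent) as quoted in the response above.
r_factors = {
    "G234A apo": {"before": (20.0, 28.1), "after": (20.7, 26.5)},
    "G234V apo": {"before": (24.4, 29.2), "after": (20.0, 25.8)},
}


def r_gap(r_work, r_free):
    """Return the Rfree - Rwork gap in percentage points, rounded to 0.1."""
    return round(r_free - r_work, 1)


# Gap before and after re-refinement for each structure.
gaps = {
    name: {stage: r_gap(*values) for stage, values in stages.items()}
    for name, stages in r_factors.items()
}

for name, stages in gaps.items():
    print(f"{name}: gap before = {stages['before']}, after = {stages['after']}")
```

      The gap is only one indicator of over-fitting; the absolute Rwork and Rfree values reported in Table 1 should be read alongside it.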

      Lines 262-276: This section describes our current understanding of the mechanism of neck linker docking in accord with NP closure, which seems out of place in the Results and more relevant for the Discussion. Likewise, the two paragraphs before and after the description of the gold nanocluster study describe a re-evaluation and graphical/animated description of others' findings (Figure 4 and videos 1 and 2), rather than analysis of structural data obtained experimentally in this study.

      Response: We acknowledge that this paragraph describes previous findings rather than current results. Therefore, we have relocated it to the Discussion section with appropriate citations from the literature (lines 390-406). In addition, the paragraph that precedes the gold nanocluster study draws on previous research using different subdomain boundaries, so we added the relevant citations accordingly (line 238).

      It is mentioned in the Discussion that the neck linker-docking is not necessary to trigger the forward step after ATP binding, but rather the rotation of the R-domain is sufficient to diminish the steric hindrance that limits tethered head binding. Are they suggesting that the neck linker could be undocked or disordered when making the forward step of a two-headed motor? According to other structural studies, a fully docked neck-linker is required to adopt the closed conformation. Moreover, binding of the leading head to the MT is necessary for complete closure of the nucleotide-binding pocket of the trailing head.

      Response: This sentence was included because it offers a testable prediction for neck linker-docking independent stepping, and we are currently preparing a manuscript to test this hypothesis experimentally. The prediction is supported by Niitani et al.’s finding (biorxiv 10.1101/2024.09.19.613828) that loose neck linker crosslinking, which allows docking of the initial segment of the neck linker onto the head but prevents complete neck docking, reduced ATP-induced microtubule detachment rate by half. However, since this statement challenges the conventional understanding in this field and requires further verification, as noted by reviewer 2, we have removed it to avoid confusion.

      Minor Comments:

      Line 113 - "nucletodiee-free" spelling.

      Response: We have corrected the typo (line 117).

      Lines 118-122 - Final sentence of Introduction needs improvement: "Moderate neck-linker extension"? Terms are not defined/vague.

      Response: To clarify this point, we revised this sentence as follows: “among possible conformational transitions, the one that requires less entropy reduction from stretching the disordered neck linker is favored” (lines 123-125).

      Line 131 - Possible Error: "N-terminal motor domain (1-332 residues)" - should this be 1-322?

      Response: This was our mistake, and we have corrected the residue numbers (line 134).

      It could be difficult for some readers to follow the naming convention used Tapo-Lapo which is equivalent to Topen-Lopen in the final mechanistic model figure.

      Response: In response to the reviewer’s comment, we have removed the reference to the Tapo-Lapo state from the Introduction and revised the notation in the Results section from Tapo-Lapo to Topen-Lopen.

    1. Old English is so unlike the modern version that it feels like a stretch to think of them as the same language at all

      It may be useful to teach this in schools, granted we have translations, to preserve the language and to continue the tradition of Old English. This is similar to how Jewish people study and learn Hebrew for their religious traditions, which keeps many ancient texts readable.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews: 

      Reviewer #1 (Public review): 

      Koren et al. derive and analyse a spiking network model optimised to represent external signals using the minimum number of spikes. Unlike most prior work using a similar setup, the network includes separate populations of excitatory and inhibitory neurons. The authors show that the optimised connectivity has a like-to-like structure, which leads to the experimentally observed phenomenon of feature competition. The authors also examine how various (hyper)parameters, such as adaptation timescale, the excitatory-to-inhibitory cell ratio, regularization strength, and background current, affect the model. These findings add biological realism to a specific implementation of efficient coding. They show that efficient coding explains, or at least is consistent with, multiple experimentally observed properties of excitatory and inhibitory neurons.

      As discussed in the first round of reviews, the model's ability to replicate biological observations such as the 4:1 ratio of excitatory vs. inhibitory neurons hinges on somewhat arbitrary hyperparameter choices. Although this may limit the model's explanatory power, the authors have made significant efforts to explore how these parameters influence their model. It is an empirical question whether the uncovered relationships between, e.g., metabolic cost and the fraction of excitatory neurons are biologically relevant.

      The revised manuscript is also more transparent about the model's limitations, such as the lack of excitatory-excitatory connectivity. Further improvements could come from explicitly acknowledging additional discrepancies with biological data, such as the widely reported weak stimulus tuning of inhibitory neurons in the primary sensory cortex of untrained animals.

      We thank the Reviewer for their insightful characterization of our paper and for further suggestions on how to improve it. We have now further improved the transparency about the model’s limitations, and we explicitly acknowledge the discrepancy with biological data regarding connection probability and the selectivity of inhibitory neurons (pages 4 and 15).

      Reviewer #2 (Public review): 

      Summary: 

      In this work, the authors present a biologically plausible, efficient E-I spiking network model and study various aspects of the model and its relation to experimental observations. This includes a derivation of the network into two (E-I) populations, the study of single-neuron perturbations and lateral-inhibition, the study of the effects of adaptation and metabolic cost, and considerations of optimal parameters. From this, they conclude that their work puts forth a plausible implementation of efficient coding that matches several experimental findings, including feature-specific inhibition, tight instantaneous balance, a 4 to 1 ratio of excitatory to inhibitory neurons, and a 3 to 1 ratio of I-I to E-I connectivity strength.

      Strengths: 

      While many network implementations of efficient coding have been developed, such normative models are often abstract and lacking sufficient detail to compare directly to experiments. The intention of this work to produce a more plausible and efficient spiking model and compare it with experimental data is important and necessary in order to test these models. In rigorously deriving the model with real physical units, this work maps efficient spiking networks onto other more classical biophysical spiking neuron models. It also attempts to compare the model to recent single-neuron perturbation experiments, as well as some long-standing puzzles about neural circuits, such as the presence of separate excitatory and inhibitory neurons, the ratio of excitatory to inhibitory neurons, and E/I balance. One of the primary goals of this paper, to determine if these are merely biological constraints or come from some normative efficient coding objective, is also important. Lastly, though several of the observations have been reported and studied before, this work arguably studies them in more depth, which could be useful for comparing more directly to experiments.

      Weaknesses: 

      This work is the latest among a line of research papers studying the properties of efficient spiking networks. Many of the characteristics and findings here have been discussed before, thereby limiting the new insights that this work can provide. Thus, the conclusions of this work should be considered and understood in the context of those previous works, as the authors state. Furthermore, the number of assumptions and free parameters in the model, though necessary to bring the model closer to biophysical reality, make it more difficult to understand and to draw clear conclusions from. As the authors state, many of the optimality claims depend on these free parameters, such as the dimensionality of the input signal (M=3), the relative weighting of encoding error and metabolic cost, and several others. This raises the possibility that it is not the case that the set of biophysical properties measured in the brain are accounted for by efficient coding, but rather that theories of efficient coding are flexible enough to be consistent with this regime. With this in mind, some of the conclusions made in the text may be overstated and should be considered in this light.

      Conclusions, Impact, and additional context: 

      Notions of optimality are important for normative theories, but they are often studied in simple models with as few free parameters as possible. Biophysically detailed and mechanistic models, on the other hand, will often have many free parameters by their very nature, thereby muddying the connection to optimality. This tradeoff is an important concern in neuroscientific models. Previous efficient spiking models have often been criticized for their lack of biophysically-plausible characteristics, such as large synaptic weights, dense connectivity, and instantaneous communication. This work is an important contribution in showing that such networks can be modified to be much closer to biophysical reality without losing their essential properties. Though the model presented does suffer from complexity issues which raise questions about its connections to "optimal" efficient coding, the extensive study of various parameter dependencies offers a good characterization of the model and puts its conclusions in context.

      We thank the Reviewer for their thorough and accurate assessment of our paper.  

      Reviewer #3 (Public review): 

      Summary: 

      In their paper the authors tackle three things at once in a theoretical model: how can spiking neural networks perform efficient coding, how can such networks limit the energy use at the same time, and how can this be done in a more biologically realistic way than previous work. 

      They start by working from a long-running theory on how networks operating in a precisely balanced state can perform efficient coding. First, they assume split networks of excitatory (E) and inhibitory (I) neurons. The E neurons have the task to represent some lower dimensional input signal, and the I neurons have the task to represent the signal represented by the E neurons. Additionally, the E and I populations should minimize an energy cost represented by the sum of all spikes. All this results in two loss functions for the E and I populations, and the networks are then derived by assuming E and I neurons should only spike if this improves their respective loss. This results in networks of spiking neurons that live in a balanced state, and can accurately represent the network inputs. 
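      The spiking rule summarized above (a neuron fires only if the spike lowers its population's loss) can be illustrated with a toy, single-population sketch. It assumes a quadratic encoding error plus a fixed cost per spike; the decoder matrix `D`, the cost `beta`, and the function name `greedy_spike_step` are hypothetical and are not taken from the paper, which derives separate E and I losses with biophysical units.

```python
import numpy as np


def greedy_spike_step(x, x_hat, D, beta):
    """Return the index of the neuron whose spike most reduces the loss, or None.

    x     : target signal, shape (M,)
    x_hat : current network estimate, shape (M,)
    D     : assumed decoding weights, shape (M, N); a spike of neuron i
            shifts the estimate by D[:, i]
    beta  : assumed cost charged for emitting one spike
    """
    # Loss if no neuron spikes: squared encoding error only.
    base_loss = np.sum((x - x_hat) ** 2)
    # Estimate after a hypothetical spike of each neuron, shape (M, N).
    candidates = x_hat[:, None] + D
    # Loss after each hypothetical spike: new error plus the spike cost.
    losses = np.sum((x[:, None] - candidates) ** 2, axis=0) + beta
    best = int(np.argmin(losses))
    # Spike only if doing so strictly lowers the loss.
    return best if losses[best] < base_loss else None


x = np.array([1.0, 0.0])
D = np.array([[0.5, -0.5], [0.0, 0.0]])
print(greedy_spike_step(x, np.zeros(2), D, beta=0.1))  # estimate far from target
print(greedy_spike_step(x, x.copy(), D, beta=0.1))     # estimate already exact
```

      In this toy setting a larger `beta` suppresses spikes that bring only small error reductions, which is the trade-off between encoding error and metabolic cost that the reviews discuss.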

      They then investigate in depth different aspects of the resulting networks, such as responses to perturbations, the effect of following Dale's law, spiking statistics, the excitation (E)/inhibition (I) balance, optimal E/I cell ratios, and others. Overall, they expand on previous work by taking a more biological angle on the theory and show the networks can operate in a biologically realistic regime.

      Strengths: 

      * The authors take a much more biological angle on the efficient spiking networks theory than previous work, which is an essential contribution to the field

      * They make a very extensive investigation of many aspects of the network in this context, and do so thoroughly

      * They put sensible constraints on their networks, while still maintaining the good properties these networks should have

      Weaknesses: 

      * One of the core goals of the paper is to make a more biophysically realistic network than previous work using similar optimization principles. One of the important things they consider is a split into E and I neurons. While this works fine, and they consider the coding consequences of this, it is not clear from an optimization perspective why the split into E and I neurons and following Dale's law would be beneficial. This would be out of scope for the current paper however.

      * The theoretical advances in the paper are not all novel by themselves, as most of them (in particular the split into E and I neurons and the use of biophysical constants) had been achieved in previous models. However, the authors discuss these links thoroughly and do more in-depth follow-up experiments with the resulting model. 

      Assessment and context: 

      Overall, although much of the underlying theory is not necessarily new, the work provides an important addition to the field. The authors succeeded well in their goal of making the networks more biologically realistic, and incorporate aspects of energy efficiency. For computational neuroscientists this paper is a good example of how to build models that link well to experimental knowledge and constraints, while still being computationally and mathematically tractable. For experimental readers the model provides a clearer link of efficient coding spiking networks to known experimental constraints and provides a few predictions.

      We thank the Reviewer for a positive assessment and for pointing out the merits of our work.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors):

      The authors have addressed my previous concerns, and I agree that the manuscript has improved. However, I believe they could still do more to acknowledge two notable mismatches between the model and experimental data.

      (1) Stimulus selectivity of excitatory and inhibitory neurons 

      In the model, excitatory and inhibitory neurons exhibit similar stimulus selectivity, which appears inconsistent with most experimental findings. The authors argue that whether inhibitory neurons are less selective remains an open question, citing three studies in support. However, only one of these studies (Runyan) was conducted in primary sensory cortex and it is, to my knowledge, one of the few papers showing this (indeed, it's often cited as an exception). The other two studies (Kuan and Najafi) recorded from the parietal cortex of mice trained on decision making tasks, and therefore seem less relevant to the model.

      In contrast to the cited studies, the overwhelming majority of the work has found that inhibitory neurons in sensory cortex, in particular those expressing Parvalbumin, are less stimulus selective than excitatory cells. And this is indeed the prevailing view, as summarized by the review from Hu et al. (Science, 2014): "PV+ interneurons exhibit broader orientation tuning and weaker contrast specificity than pyramidal neurons." This view emerged from numerous classical studies, including Sohya et al. (J. Neurosci., 2007), Cardin (J. Neurosci., 2007), Nowak (Cereb. Cortex, 2008), Niell et al. (J. Neurosci., 2008), Liu (J. Neurosci., 2009), Kerlin (Neuron, 2010), Ma et al. (J. Neurosci., 2010), Hofer et al. (Nature Neurosci. 2011), and Atallah et al. (Neuron 2012). Weak inhibitory tuning has been confirmed by recent studies, such as Sanghavi & Kar (biorxiv 2023), Znamenskiy et al. (Neuron 2024), and Hong et al. (Nature, 2024).

      The authors should acknowledge this consensus and cite the conflicting evidence. Failing to do so is cherry picking from the literature. Since training can increase the stimulus selectivity of PV+ neurons to that of Pyr levels, also in primary visual cortex (Khan et al. Neuron 2018), a favourable interpretation of the model is that it represents a highly optimized, if not overtrained, state.

      We have carefully considered the literature cited by the Reviewer. We agree with the interpretation that stimulus selectivity of inhibitory neurons in our model is higher than the stimulus selectivity of Parvalbumin-positive inhibitory neurons in the primary sensory cortex of naïve animals. We have edited the text in Discussion (page 14).

      (2) Connection probability 

      The manuscript claims that "rectification sets the overall connection probability to 0.5, consistent with experimental results (Pala & Petersen; Campagnola et al.)." However, the cited studies, and others, report significantly lower probabilities, except for Pyr-PV (E-I connections in the model). For example, Campagnola et al. measured PV-Pyr connectivity at 34% in L2/3 and 20% in L5.

      It's perfectly acceptable that the model cannot replicate every detail of biological circuits. But it's important to be cautious when claiming consistency with experimental data.

      Here as well, we agree with the Reviewer that the connection probability of 0.5 is consistent with the reported connectivity of Pyr-PV neurons, but less so with the reported connectivity of PV-Pyr neurons. We have now made our claim about the compatibility of the connection probability in our model with empirical observations more precise (page 4).

      Reviewer #2 (Recommendations for the authors): 

      I commend the authors for an extremely thorough and detailed rebuttal, and for all of the additional work put in to address the reviewer concerns. For the most part, I am satisfied with the current state of the manuscript. 

      We thank the Reviewer for recognizing our effort to address the first round of Reviews to our best ability.

      Here are some small points still remaining that I think the authors should address: 

      (1) Pg. 8, "We verified the robustness of the model to small deviations from the optimal synaptic weights" - while the authors now cite Calaim et al. 2022 in the discussion, its relevance to several of the results justify its inclusion in other places. Here is one place where the authors test something that was also studied in this previous paper.

      The Reviewer is correct that Calaim et al. (eLife 2022) addressed the robustness of synaptic weights, and we now cite this study when describing our results on jittering of synaptic connections (page 8).

      (2) Pg. 9, "In our optimal E-I network we indeed found that optimal coding efficiency is achieved in absence of within-neuron feedback or with weak adaptation in both cell types" Pg. 10, "the absence of within-neuron feedback or the presence of weak and short-lasting spike-triggered adaptation in both E and I neurons are optimally efficient solutions" The authors seem to state that both weak adaptation and no adaptation at all are optimal. In contrast to the rest of the results presented, this is very vague and does not give a particular level of adaptation as being optimal. The authors should make this more clear. 

      We agree that the text about the optimal level of adaptation was unclear. The optimal solution is no adaptation, while weak and short-lasting adaptation defines a slightly suboptimal, yet still efficient, network state, as now stated on page 10.

      (3) Pg. 13, "In summary our analysis suggests that optimal coding efficiency is achieved with four times more E neurons than I neurons and with mean I-I synaptic efficacy about 3 times stronger..." --- claims such as these are still too strong, in my opinion. It is rather the case that the particular ratio of E to I neurons and connections strengths can be made consistent with an optimally efficient regime.

      We agree here as well. We have revised the text (page 13) to better explain our results.

      (4) Pg. 14, "firing rates in the 1CT model were highly sensitive to variations in the metabolic constant" (Fig. 8I, as compared to Fig. 6C). This difference between the 1CT and E-I networks is striking, and I would suspect it is due to some idiosyncrasies in the difference between the two models (e.g., the relative amount of delay that it takes for lateral inhibition to take effect, or the fact that E-E connections have not been removed in this model). The authors should ideally back up this result with some justified explanation. 

      We agree with the Reviewer that the delay for lateral inhibition in the E-I model is twice that of the 1CT model and that the E-I model gains stability from the lack of E-E connectivity. Furthermore, the tuning is stronger in I compared to E neurons in the E-I model, which contributes to making the E-I network inhibition-dominated (Fig. 1H). In contrast, the average excitation and inhibition in the 1CT model are of exactly the same magnitude. The property of being inhibition-dominated makes the E-I model more stable. We report these observations in the revised text (pages 14-15).

      Reviewer #3 (Recommendations for the authors): 

      Overall my points were very well responded to and I removed most of my weaknesses.

      I appreciate the authors implementing my suggested analysis change for Figure 8, and I find the result very clear. I would further suggest they add a bit of text for the reader as to why this is done. For a new reader without much knowledge of these networks at first it seems the inhibitory population is very good at representation in fig 8G: so why is it not further considered in fig 8H?

      We thank the reviewer for providing further suggestions. We have now clarified in the text why only the excitatory population of the E-I model is considered in the comparison between the E-I and 1 cell type models (page 14).

      Thanks for sharing the code. From a quick browse through it looks very manageable to implement for follow up work, although some more guidance for how to navigate the quite complicated codebase and how to reproduce specific paper results would be helpful.

      We have also updated the code repository, where we have included more complete instructions on how to reproduce the results of each figure. We renamed the folders with the computer code so that they point to a specific figure in the paper. The repository has been completed with the output of the numerical simulations we ran, which allows immediate replotting of all figures. We have deposited the repository at Zenodo to have the final version of the code associated with the DOI https://doi.org/10.5281/zenodo.14628524. This is mentioned in the section Code availability (page 17).

    1. We also need to consider the story of our data when working with qualitative data, such as quotations, observations, or descriptions.

      nm563 We should also think about possible outliers and what caused some of the data, as well as who the audience is and how that can affect the credibility of the data. How the data was obtained may also affect its credibility.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The authors constructed a novel HSV-based therapeutic vaccine to cure SIV in a primate model. The novel HSV vector is deleted for ICP34.5. Evidence is given that this protein blocks HIV reactivation by interference with the NF-kB pathway. The deleted construct supposedly would reactivate SIV from latency. The SIV genes carried by the vector ought to elicit a strong immune response. Together the HSV vector would elicit a shock and kill effect. This is tested in a primate model.

      Thank you for your kind comments and suggestions, which are very helpful in improving our manuscript. We have carefully revised our manuscript and performed additional experiments accordingly, and we now think this version has been substantially improved for your reconsideration.

      Strengths and weaknesses:

      (1) Deleting ICP34.5 from the HSV construct has a very strong effect on HIV reactivation. Why is no eGFP readout given in Figure 1C as for WT HSV? The mechanism underlying increased activation by deleting ICP34.5 is only partially explored. Overexpression of ICP34.5 has a much smaller effect (reduction in reactivation) than deletion of ICP34.5 (strong activation); so the story seems incomplete.

      Thank you for your careful review and kind reminder.

      (1) We are sorry for the misunderstanding of Figure 1C. In the experiment of Figure 1C, we used an HSV-1 17 strain containing GFP (HSV-GFP) and HSV-ΔICP34.5 (recombinant HSV-1 17 strain with ICP34.5 deletion based on HSV-GFP) to reactivate the HIV latency cell line (J-Lat 10.6 cell). Since detecting GFP cannot distinguish between HSV infection and HIV reactivation, we assessed the reactivation by measuring the mRNA levels of HIV LTR upon stimulation with either HSV-GFP or HSV-ΔICP34.5. In fact, in Figure 1B, we had verified the reactivation efficacy by infecting J-Lat 10.6 cells with the HSV-1 17 strain containing GFP (HSV-GFP) and found significant upregulation of mRNA levels of HIV-1 LTR, Tat, Gag, Vif, and Vpr. We have adjusted the corresponding descriptions accordingly in the revised manuscript.

      (2) We agree with your insightful comment that the mechanism underlying increased activation by HSV-ΔICP34.5 is worth exploring further in future studies. In this study, we found that ICP34.5 plays an antagonistic role in the reactivation of HIV latency by HSV-1 mainly through the modulation of host NF-κB and HSF1 pathways, while HSV-1 (especially HSV-ΔICP34.5) might reactivate HIV latency through NF-κB, HSF1, and other yet-to-be-determined mechanisms. Thus, ICP34.5 overexpression has only a partial effect on the reduction of the HIV latency reactivation by HSV-1. We have mentioned this issue in the revised “Discussion section”. Intriguingly, these findings collectively indicated that ICP34.5 might play an antagonistic role in the reactivation of HIV by HSV-1, and thus our modified HSV-ΔICP34.5 constructs can effectively reactivate HIV/SIV latency through the release of imprisonment from ICP34.5. However, ICP34.5 overexpression had only a partial effect on the reduction of the HIV latency reactivation, indicating that HSV-ΔICP34.5-based constructs can also reactivate HIV latency through other yet-to-be-determined mechanisms. (Lines 334 to 340).

      (2) No toxicity data are given for deleting ICP34.5. How specific is the effect for HIV reactivation? An RNA seq analysis is required to show the effect on cellular genes.

      Thank you for your questions and suggestions.

      (1) It’s well known that ICP34.5 is a neurotoxicity factor that can antagonize host immune responses, and previous studies (in gene therapy and oncolytic virotherapy) have shown that the safety of recombinant HSV-based vector can be improved by deleting ICP34.5. In this study, we also found that HSV-ΔICP34.5 exhibited lower virulence and replication ability than its parental strain (HSV-GFP) (Figure 1D, Figure S1). In addition, HSV-ΔICP34.5 induced a lower level of inflammatory cytokines (including IL-6, IL-1β, and TNF-α) in primary CD4+ T cells from PLWH compared to HSV-GFP stimulation, likely due to its lower virulence and replication ability (Figure 1I-K). In addition, the CD4+/CD8+ T cell ratio (Figure 5I) and body weight (Figure S9) after treatment were effectively ameliorated in the SIV-infected macaques of the ART+HSV-ΔICP34.5-sPD1-SIVgag/SIVenv group. Our data also demonstrated that there was no significant effect on the cell composition of peripheral blood in the SIV-infected macaques of ART+HSV-sPD1-SIVgag/SIVenv group (Figure S10). Thus, these data suggest the safety of HSV-ΔICP34.5 in PLWH might be tolerable. We have added the corresponding description in the revised manuscript.

      (2) In our study, we found both adenovirus and vaccinia virus cannot reactivate HIV latency (Figure S3). In addition, the deletion of ICP0 gene from HSV-1 diminished the reactivation effect of HIV latency by HSV-1 (Figure S4). Thus, these data suggested the reactivation of HIV latency by HSV-1 might be virus-specific. Of course, this might be further investigated in future studies. We have added the corresponding description in the revised manuscript.

      (3) To explore the mechanism of reactivating viral latency by HSV-ΔICP34.5-based constructs, we performed RNA-seq analysis (Figure S5). We have added the corresponding description accordingly in the revised manuscript.

      (3) The primate groups are too small and the results too variable to make averages. In Figure 5, the group with ART and saline has two slow rebounders. It is not correct to average those with a single quick rebounder. Here the interpretation is NOT supported by the data.

      We agree with you that this is a pilot study with a limited number of rhesus macaques. Although the number of macaques was relatively limited, these nine macaques were distributed evenly based on the background level of age, sex, weight, CD4 count, and viral load (VL) (Table S2). All SIV-infected macaques used in this study had a long history of SIV infection and had received several courses of ART, which mimics treatment of chronic HIV-1 infection in humans. These macaques had been infected with SIVmac239 for more than 5 years, and highly pathogenic SIV-infected macaques have been well validated as a stringent model to recapitulate HIV-1 pathogenesis and persistence during ART in humans. Indeed, in our Chinese rhesus model, ART effectively suppressed SIV infection to undetectable levels in plasma, and upon ART discontinuation, virus rapidly rebounded, which is very similar to what is seen in ART-treated HIV patients. We think the results of this pilot study are very promising for further studies, which will expand the number of animals and then proceed to preclinical and clinical studies in our next projects. Thank you for your understanding.

      As for your question regarding “the two animals with low VL and slow rebound”, our explanation is as follows. As mentioned above, these macaques were distributed evenly based on the background level of CD4 count and VL (Table S2), and the groups then showed different changes in viral load and viral rebound. Thus, we think these data can support our interpretation. Moreover, our conclusion is also supported by at least three lines of evidence.

      (1) The VL in the ART+saline group promptly rebounded after ART discontinuation, with an average 8.63-fold increase in the rebounded peak VL compared with the pre-ART VL (Figure 5A, D and E). However, plasma VL in the ART+HSV-sPD1-SIVgag/SIVenv group exhibited a delayed rebound interval (Figure 5B-D).

      (2) There was a lower rebounded peak VL than pre-ART VL in the ART+HSV-sPD1-SIVgag/SIVenv group (average 12.20-fold decrease), while a higher rebounded peak VL than pre-ART VL in the ART+HSV-empty group (average 2.74-fold increase) (Figure 5E).

      (3) We found significant suppression of total SIV DNA and integrated SIV DNA provirus in the ART+HSV-sPD1-SIVgag/SIVenv group. However, the copies of the SIV DNA provirus were significantly increased in the ART+HSV-empty group and ART+saline group (Figure 5F-G).

      Thank you for your understanding.

      Discussion

      HSV vectors are mainly used in cancer treatment partially due to induced inflammation. Whether these are suitable to cure PLWH without major symptoms is a bit questionable to me and should at least be argued for.

      Thank you for your comment and question. We confirmed the enhanced reactivation of HIV latency by HSV-ΔICP34.5 in primary CD4+ T cells from people living with HIV (PLWH) (Figure S2). As mentioned above, previous studies have shown that the safety of recombinant HSV-based vector can be improved by deleting ICP34.5. In this study, we also found that HSV-ΔICP34.5 exhibited lower virulence and replication ability than its parental strain (HSV-GFP) (Figure 1D, Figure S1). In addition, HSV-ΔICP34.5 induced a lower level of inflammatory cytokines (including IL-6, IL-1β, and TNF-α) in primary CD4+ T cells from PLWH compared to HSV-GFP stimulation, likely due to its lower virulence and replication ability (Figure 1I-K). In addition, the CD4+/CD8+ T cell ratio (Figure 5I) and body weight (Figure S9) after treatment were effectively ameliorated in the SIV-infected macaques of the ART+HSV-ΔICP34.5-sPD1-SIVgag/SIVenv group. Our data also demonstrated that there was no significant effect on the cell composition of peripheral blood in the SIV-infected macaques of ART+HSV-sPD1-SIVgag/SIVenv group (Figure S10). Thus, these data suggest the safety of HSV-ΔICP34.5 in PLWH might be tolerable. We have added the corresponding description in the revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      In this article, Wen et al. describe the development of a 'proof-of-concept' bi-functional vector based on HSV-deltaICP-34.5's ability to purge latent HIV-1 and SIV genomes from cells. They show that co-infection of latent J-Lat T-cell lines with an HSV-deltaICP-34.5 vector can reactivate HIV-1 from a latent state. Over- or stable expression of ICP34.5 ORF in these cells can arrest latent HIV-1 genomes from transcription, even in the presence of latency reversal agents. ICP34.5 can co-IP with and de-phosphorylate IKKα/β to block its interaction with the NF-κB transcription factor. Additionally, ICP34.5 can interact with HSF1, which was identified by mass-spec. Thus, the authors propose that the latency reversal effect of HSV-deltaICP-34.5 in co-infected J-Lat cells is due to modulatory effects on the IKKα/β-NF-κB and PP1-HSF1 pathways.

      Next, the authors cleverly construct a bifunctional HSV-based vector with deleted ICP34.5 and 47 ORFs to purge latency and avoid immunological refluxes, and additionally, expand the application of this construct as a vaccine by introducing SIV genes. They use this 'vaccine' in mouse models and show the expected SIV-immune responses. Experiments in rhesus macaques (RM), further elicit the potential for their approach to reactivate SIV genomes and at the same time block their replication by antibodies. What was interesting in the SIV experiments is that the dual-functional vector vaccine containing sPD1- and SIV Gag/Env ORFs effectively delayed SIV rebound in RMs and in some cases almost neutralized viral DNA copy detection in serum. Very promising indeed, however, there are some questions I wish the authors had explored to get answers to, detailed below.

      Overall, this is an elegant and timely work demonstrating the feasibility of reducing virus rebound in animals, with the potential to expand to clinical studies. The work was well-written, and sections were clearly discussed.

      Strengths:

      The work is well designed, rationale explained, and written very clearly for lay readers.<br /> Claims are adequately supported by evidence and well-designed experiments including controls.

      Thank you for your nice comments regarding our work.

      Weaknesses:

      (1) While the mechanism of ICP34.5 interaction and modulation of the NF-kB and HSF1 pathways are shown, this only proves ICP34.5 interactions but does not give away the mechanism of how the HSV-deltaICP-34.5 vector purges HIV-1 latency. What other components of the vector are required for latency reversal? Perhaps serial deletion experiments of the other ORFs in the HSV-deltaICP-34.5 vector might be revealing.

      Thank you for your valuable suggestion. In fact, we are currently further exploring some potential viral genes of HSV-1 that might play a role in the reactivation of HIV latency. We have found that the deletion of ICP0 gene from HSV-1 diminished the reactivation effect of HIV latency by HSV-1 (Figure S4), showing that ICP0 might play a vital role for the reactivation. Of course, this might be further investigated in future studies. We have added the corresponding description in the revised manuscript.

      (2) The efficacy of the HSV vaccine vectors was evaluated in Rhesus Macaque model animals. Animals were chronically infected with SIV (a parent of HIV), treated with ART, challenged with bi-functional HSV vaccine or controls, and discontinued treatment, and the resulting virus burden and immune responses were monitored. The animals showed SIV Gag and Env-specific immune responses, and delayed virus rebound (however rebound is still there), and below-detection viral DNA copies. What would make a more convincing argument to this reviewer will be data to demonstrate that after the bi-functional vaccine, the animals show overall reduction in the number of circulating latent cells. The feasibility of obtaining such a result is not clearly demonstrated.

      Thank you for your valuable comment. We have now provided more data about this issue. We found significant suppression of total SIV DNA and integrated SIV DNA provirus in the ART+HSV-sPD1-SIVgag/SIVenv group. However, the copies of the SIV DNA provirus were significantly increased in the ART+HSV-empty group and ART+saline group (Figure 5F-G). We have added the corresponding description in the revised manuscript.

      (3) The authors state that the reduced virus rebound detected following bi-functional vaccine delivery is due to latent genomes becoming activated and steady-state neutralization of these viruses by antibody response. This needs to be demonstrated. Perhaps cell-culture experiments from specimens taken from animals might help address this issue. In lab cultures one could create environments without antibody responses, under these conditions one would expect a higher level of viral loads to be released in response to the vaccine in question.

      Thanks for your kind mention and suggestion. We performed the following cell experiment to address this issue. Primary CD4+ T cells from people living with HIV (PLWH) were isolated, and then infected with HSV or HSV-∆ICP34.5 constructs. As expected, we confirmed the enhanced reactivation of HIV latency by HSV-∆ICP34.5 (Figure S2). Thank you.

      (4) How do the authors imagine neutralizing HIV-1 envelope epitopes by a similar strategy? A discussion of this point may also help.

      Thank you for your kind comment. We have added the corresponding discussion in the revised manuscript. “The current consensus on HIV/AIDS vaccines emphasizes the importance of simultaneously inducing broadly neutralizing antibodies and cellular immune responses. Therefore, we believe that incorporating the induction of broadly neutralizing antibodies into our future optimizing approaches may lead to better therapeutic outcomes.” (Lines 384 to 388)

      (5) I thought the empty HSV-vector control also elicited somewhat delayed kinetics in virus rebound and neutralization, can the authors comment on why this is the case?

      Thank you for your careful review and comment. We agree with you that the HSV-1 empty vector does exhibit a somewhat delayed rebound. We think the possible reason is as follows: although the empty HSV vector cannot elicit SIV-specific CTL responses, it effectively activates the latent SIV reservoirs, and these activated virions can then be partially killed by ART drugs. Therefore, even without carrying HIV/SIV antigens, somewhat delayed kinetics in virus rebound may be observed. Thank you.

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors should provide toxicity data for HSV transduction after deleting ICP34.5 and provide an explanation of why overexpression of ICP34.5 has such a small effect.

      Thank you for your questions and suggestions. As mentioned above, we have now provided data on the safety of HSV-ΔICP34.5-based constructs.

      (1) It’s well known that ICP34.5 is a neurotoxicity factor that can antagonize host immune responses, and previous studies (in gene therapy and oncolytic virotherapy) have shown that the safety of recombinant HSV-based vector can be improved by deleting ICP34.5. In this study, we also found that HSV-ΔICP34.5 exhibited lower virulence and replication ability than its parental strain (HSV-GFP) (Figure 1D, Figure S1). In addition, HSV-ΔICP34.5 induced a lower level of inflammatory cytokines (including IL-6, IL-1β, and TNF-α) in primary CD4+ T cells from PLWH compared to HSV-GFP stimulation, likely due to its lower virulence and replication ability (Figure 1I-K). In addition, the CD4+/CD8+ T cell ratio (Figure 5I) and body weight (Figure S9) after treatment were effectively ameliorated in the SIV-infected macaques of the ART+HSV-ΔICP34.5-sPD1-SIVgag/SIVenv group. Our data also demonstrated that there was no significant effect on the cell composition of peripheral blood in the SIV-infected macaques of ART+HSV-sPD1-SIVgag/SIVenv group (Figure S10). Thus, these data suggest the safety of HSV-ΔICP34.5 in PLWH might be tolerable. We have added the corresponding description in the revised manuscript.

      (2) We agree with your insightful comment that the mechanism underlying the increased activation by HSV-ΔICP34.5 is worth exploring further in future studies. In this study, we found that ICP34.5 plays an antagonistic role in the reactivation of HIV latency by HSV-1, mainly through the modulation of host NF-κB and HSF1 pathways, while HSV-1 (especially HSV-ΔICP34.5) might reactivate HIV latency through NF-κB, HSF1, and other yet-to-be-determined mechanisms. Thus, ICP34.5 overexpression can have only a partial effect on the reduction of HIV latency reactivation by HSV-1. We have mentioned this issue in the revised Discussion section: “Intriguingly, these findings collectively indicated that ICP34.5 might play an antagonistic role in the reactivation of HIV by HSV-1, and thus our modified HSV-ΔICP34.5 constructs can effectively reactivate HIV/SIV latency through the release of imprisonment from ICP34.5. However, ICP34.5 overexpression had only a partial effect on the reduction of the HIV latency reactivation, indicating that HSV-ΔICP34.5-based constructs can also reactivate HIV latency through other yet-to-be-determined mechanisms.” (Lines 334 to 340).

      (2) How specific is the effect for HIV reactivation? An RNA seq analysis is required to show the effect on cellular genes.

      Thank you for your questions and suggestions.

      (1) In our study, we found both adenovirus and vaccinia virus cannot reactivate HIV latency (Figure S3). In addition, the deletion of ICP0 gene from HSV-1 diminished the reactivation effect of HIV latency by HSV-1 (Figure S4). Thus, these data suggested the reactivation of HIV latency by HSV-1 might be virus-specific. Of course, this might be further investigated in future studies. We have added the corresponding description in the revised manuscript.

      (2) To explore the mechanism of reactivating viral latency by HSV-ΔICP34.5-based constructs, we performed RNA-seq analysis (Figure S5). Results showed that there were numerous differentially expressed genes (DEGs) in response to HSV-ΔICP34.5 infection. Among them, 2288 genes were upregulated, and 611 genes were downregulated. GO analysis showed the enrichment of these DEGs in cell cycle, cellular development, and cell proliferation, and KEGG enrichment analysis indicated the enrichment in pathways such as cell cycle and cytokine-cytokine receptor interaction. We have added the corresponding description accordingly in the revised manuscript.

      (3) A comparison in primates has to be given for constructs with or without ICP34.5 to validate cell culture data (what is an empty vector?)

      Thank you for your reminder. In the revised manuscript, we performed the following cell experiment to address this issue. Primary CD4+ T cells from people living with HIV (PLWH) were isolated, and then infected with HSV or HSV-∆ICP34.5 constructs. As expected, we confirmed the enhanced reactivation of HIV latency by HSV-∆ICP34.5 (Figure S2). Thank you.

      (4) Legends should be improved in writing and content.

      Thank you for your kind comment. In the revised version, we have improved the manuscript content and carefully revised the legends of all figures in both writing and content. Thank you.

      (5) The primate groups should be enlarged before any reliable conclusions can be made. Inflammatory/tox data should be provided.

      Thank you for your question.

      (1) As mentioned above, we agree with you that this is a pilot study with a limited number of rhesus macaques. Although the number of macaques was relatively limited, these nine macaques were distributed evenly based on the background level of age, sex, weight, CD4 count, and viral load (VL) (Table S2). All SIV-infected macaques used in this study had a long history of SIV infection and had received several courses of ART, which mimics treatment of chronic HIV-1 infection in humans. These macaques had been infected with SIVmac239 for more than 5 years, and highly pathogenic SIV-infected macaques have been well validated as a stringent model to recapitulate HIV-1 pathogenesis and persistence during ART in humans. Indeed, in our Chinese rhesus model, ART effectively suppressed SIV infection to undetectable levels in plasma, and upon ART discontinuation, virus rapidly rebounded, which is very similar to what is seen in ART-treated HIV patients. We think the results of this pilot study are very promising for further studies, which will expand the number of animals and then proceed to preclinical and clinical studies in our next projects. Thank you for your understanding.

      (2) As is well known, ICP34.5 is a neurotoxicity factor that can antagonize host immune responses, and previous studies have shown that the safety of recombinant HSV-based vector can be improved by deleting ICP34.5. In this study, we also found that HSV-ΔICP34.5 exhibited lower virulence and replication ability than its parental strain (HSV-GFP) (Figure 1D, Figure S1). In addition, HSV-ΔICP34.5 induced a lower level of inflammatory cytokines (including IL-6, IL-1β, and TNF-α) in primary CD4+ T cells from PLWH compared to HSV-GFP stimulation, likely due to its lower virulence and replication ability (Figure 1I-K). In addition, the CD4+/CD8+ T cell ratio (Figure 5I) and body weight (Figure S9) after treatment were effectively ameliorated in the SIV-infected macaques of the ART+HSV-ΔICP34.5-sPD1-SIVgag/SIVenv group. Our data also demonstrated that there was no significant effect on the cell composition of peripheral blood in the SIV-infected macaques of ART+HSV-sPD1-SIVgag/SIVenv group (Figure S10). Thus, these data suggest the safety of HSV-ΔICP34.5 in PLWH might be tolerable. We have added the corresponding description in the revised manuscript.

      (6) Discuss the potential of inflammatory HSV vaccines to be used in PLWH without clinical symptoms.

      Thank you for your comment. As discussed above, we found that HSV-ΔICP34.5 exhibited lower virulence and replication ability than its parental strain (Figure 1D, Figure S1), and we also found that HSV-ΔICP34.5 induced a lower level of inflammatory cytokines (including IL-6, IL-1β, and TNF-α) in primary CD4+ T cells from PLWH compared to HSV-GFP stimulation, likely due to its lower virulence and replication ability (Figure 1I-K). In addition, the CD4+/CD8+ T cell ratio (Figure 5I) and body weight (Figure S9) after treatment were effectively ameliorated in the SIV-infected macaques of the ART+HSV-ΔICP34.5-sPD1-SIVgag/SIVenv group. Our data also demonstrated that there was no significant effect on the cell composition of peripheral blood in the SIV-infected macaques of ART+HSV-sPD1-SIVgag/SIVenv group (Figure S10). Thus, these data suggest the safety of HSV-ΔICP34.5 in PLWH might be tolerable. We have added the corresponding description in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      I think the authors have done due diligence to the experimental system, and collected evidence to show the feasibility of delaying virus rebound in macaques. However, I would encourage the authors to perform experiments that can back up the claim that delayed virus rebound is due to neutralization effects, or perhaps due to a reduction in viral reservoir. I believe insights into this process will add rigor, and push the relevance of the study to the next level.

      Thank you for your nice comment and valuable suggestion. We have now provided more data about this issue. We found significant suppression of total SIV DNA and integrated SIV DNA provirus in the ART+HSV-sPD1-SIVgag/SIVenv group. However, the copies of the SIV DNA provirus were significantly increased in the ART+HSV-empty group and ART+saline group (Figure 5F-G). In the revised Discussion section, we also noted that incorporating the induction of broadly neutralizing antibodies into our future optimization approaches may lead to better therapeutic outcomes. We have added the corresponding description in the revised manuscript. Thank you.

      Altogether, all of the above comments and suggestions have been very helpful in improving our manuscript. We have taken these comments seriously and tried our best to address these questions point by point. After making extensive revisions, we now submit this revised manuscript for your reconsideration. Thank you again for all of your comments and suggestions.

    1. amorphous. What makes a work “artistic”? How do we define “superior” or “lasting”? Let’s break down some of the defining qualities of literature in a bit more detail, starting with the word “artistic.” Consider the following works of art: "Wanderer Above the Sea of Fog" by Caspar David Friedrich (1818), which is in the public domain, and "stick figure self portrait" by marco on Flickr (2019), which is licensed CC BY-NC-SA 2.0. Which of these images do you feel is higher quality or more “artistic”? Which is lower-quality or less artistic? Why? So how does this relate to our attempts to define literature? Literature is art, but with words. While the artist uses different colors, paintbrushes, mediums, canvases, and techniques, the writer uses different genres and literary techniques called literary devices. Just like different types of paint, paintbrushes, and artistic tools, there are literally hundreds of literary devices, but some of the most common are metaphor, simile, personification, and imagery. Genre is the type or style of literature. Each genre has its own conventions. Literary genres include creative nonfiction, fiction, drama, and poetry. Works that are literary tend to masterfully use genre conventions and literary devices to create a world in the mind of the reader. Works that are less literary tend to be for practical and/or entertainment purposes, and the writer dedicates less focused energy towards artfully employing literary devices. However, just because a work is not as literary as another does not mean it cannot be enjoyed. Just like a stick figure or cartoon character might be perfectly fine if intended for a particular audience or purpose, readers can still enjoy People Magazine even though it is not of the same literary quality as Hamlet.
So, to use an example from earlier, compare Hamlet by William Shakespeare with People Magazine. Hamlet has lasted hundreds of years; is written by a master of his craft; covers deep, meaningful concepts like love, loss, war, and political corruption; and uses many literary devices such as metaphors. People Magazine is often forgotten by the next issue; is written by a pop culture writer; covers shallow issues like what plastic surgeries a starlet has had; and usually does not use literary devices in a masterful way, but merely to capture the attention of an audience. While some literature falls into clear designations of literature or not literature, most works are open to debate. Given the sometimes difficult task of determining whether a work falls into one camp or the other, it may be more helpful to think of Literature less as a dichotomy than a spectrum, with popular magazines on one end and works like Hamlet and Beloved on the other, and most written works falling somewhere between the two extremes. (Figure: The Literary Spectrum.) This spectrum can be a helpful way to think about literature because it provides a more open-ended way to discuss writing as art than simply labeling works as literary or not. After viewing the above chart, why do you think popular magazines and a Calculus textbook are considered "less literary"? In terms of popular magazines, they do not fit the definition of literature as "lasting" in the sense that they usually fade from relevancy quickly after publication. Additionally, the authors of such magazines are striving for quick entertainment rather than leaving a meaningful impression on the reader. They tend not to use literary devices, such as metaphor, in a masterful way. On the other end, Shakespeare's Hamlet definitely fits the definition of "lasting," in that it has survived hundreds of years.
It is full of literary devices used for rhetorical effect and, one would argue, it touches upon deep themes such as death, the afterlife, murder, vengeance, and love, rather than trifling issues such as a starlet's most recent plastic surgery. Certainly, works of literature are up for debate: that is the quintessential question literary scholars might ask. What makes certain literary works survive the test of time? What makes a story, poem, or drama "good"? While literary scholars are less interested in proving a certain work is "good" or not -- and more focused on analyzing the ways to illuminate a given work -- it can be helpful for you to consider what kinds of literature you like and why you like it. What about the way it was written causes you to feel the way you do about it? Who Decides What is Literature? Now that we have at least somewhat clarified the definition of literature, who decides what works are or are not literature? Historically speaking, kings, queens, publishers, literary critics, professors, colleges, and readers (like you!) have decided which works survive and which works do not. Aristotle was one of the first writers to attempt to decide what works fall into the category of literature, and what works do not. While Aristotle was most famous for his contributions to science and philosophy, he is also considered one of the first literary critics. A literary critic is a person who studies and analyzes literature. A literary critic produces scholarship called literary criticism. An example of this would be Aristotle’s Poetics, in which he identifies the defining qualities of a “good” Tragedy. Aristotle’s analysis of Tragedy was so influential that it is still used today, over two thousand years later! When a work is officially decided to constitute literature, it enters something called the Canon. 
Not to be confused with the large metal tube that shoots bombs popular in the 16th through the 19th centuries (cannon), the Literary Canon is a collection of works that are considered by the powers that be to constitute literature. A work that falls into this designation is called canonical. So, to use an example from Aristotle’s Poetics, Aristotle defined Sophocles’ Oedipus Trilogy as the pinnacle of the Tragic Genre. From there, in part due to Aristotle's influence, Greek society valued Oedipus so much that they kept discussing, reading, referencing, and teaching it. Thus, it became a kind of shining example of the Tragic Canon, one which has lasted thousands of years and continues to be read and lauded to this day. Other tragedies, fairly or not, are often judged on their quality in comparison to Sophocles' works. It seems crazy to think, but someone who died thousands of years ago still influences what we consider literature today! Memes and Video Games: Today's Literature? All this talk of thousands-of-years-old texts might seem out of touch. A lot of people think "old and boring" and literature are synonymous. Students are often surprised to hear that comic books and video games can, arguably, be considered literature, too. There are plenty of arguments to be made that comic books, such as Maus by Art Spiegelman (1991) or Fun Home by Alison Bechdel (2006), are literature. Cutting-edge literary scholars argue video games like Kentucky Route Zero by Cardboard Computer (2015) can be considered literary. There is also literature that is published in tweets, like Jennifer Egan's "Black Box" (2012). Some might even consider memes literature! Generative question: do you think memes can be literary? A meme is an image or video containing cultural values or ideas, often represented through allusion (implied reference to another work, without naming that work or its author). Memes can spread rapidly through social media. Why?
Because the best ones are #relatable; that is, they speak to a common human experience. Usually memes take the form of text superimposed on an image. For example, the meme above conveys the dramatic reaction students sometimes give when I assign an essay. This is done primarily through a literary device called hyperbole, or exaggeration for rhetorical effect. It conveys its message comically through certain conventions that come along with the meme genre, such as the syntactic structure "me, a [insert noun]" and asterisks, which convey action. Just like in the Shakespearean drama, the colon indicates what each character (me and the students, in this case) is saying or doing. My chihuahuas' face looks silly and very dramatic. Through this use of image, text, format, and convention, the meaning I intended to convey was that I was making fun of my students for being over-dramatic about what to me seems like a fairly simple assignment. While some might dismiss memes as shallow, when you start to unravel the layers of meaning, they can actually be very complex and even, dare I say, literary! Think about a recent meme you have seen, or your favorite meme of all time. Imagine explaining this meme to someone who has no idea what it means. What is the message or idea behind the meme? What cultural reference points does it use to convey this message? In what ways might this meme be considered literature? How might this compare to a short poem, like a haiku? Not Literature Let's say you come to the conclusion that a meme, a gossip magazine, or the Twilight Series is not literary. Does that mean you have to feel guilty and give up reading it forever? Or that it is not "good"? No! Just because a work is not literary does not mean it is "bad," that it does not have value, or that one cannot enjoy it. Indeed, there are plenty of examples of written works that are on the less literary side of the spectrum but are still fun and enriching to read. 
Joe Dirt is not on the same artistic level of cinema as Schindler's List, but my husband still loves watching it. Nothing Taylor Swift has produced is as deep as Tupac Shakur's "Changes" (1992) or Mitski's "Last Words of a Shooting Star" (2014), but listening to Taylor Swift is my guilty pleasure. This is all to say that whether a text is literary or not is not as important as the methods of analyzing texts. In fact, texts which were excluded from literature are often argued into the literary canon through such analysis. Part of what makes analyzing literature so fun is that it means the definition of literature is always up for debate! This is especially important given the history of the canon. The Problem with the Canon In an ideal world, literature would be celebrated purely based on its artistic merit. Well-written works would last, poorly-written works would wither from public memory. However, that is not always the case. Works often achieve public prominence or survive based on qualities unrelated to skill or aesthetics, such as an author's fame, wealth, connections, or acceptance by the dominant culture. William Wordsworth, for example, was named Poet Laureate of England and has been taught as one of the #Big6 major Romantic-era authors ever since. Indeed, he is accepted as part of the literary canon. One would be hard-pressed to find a Literature anthology that does not feature William Wordsworth. However, how many people have read or heard of Dorothy Wordsworth, William Wordsworth's sister, who arguably depicted Romantic themes with equal skill and beauty? Or James Hogg, a Scottish contemporary of Wordsworth who was a lower-class shepherd? Similarly, while most readers have encountered F. Scott Fitzgerald or Edgar Allan Poe in their high school literature classes, how many have read Frederick Douglass?
In short, all artistic skill (arguably) considered equal, why do some authors predominantly feature in the Canon while others do not? Let’s perform an experimental activity. On a scratch piece of paper, write down as many works of literature that you feel constitute “Big L Literature.” Perhaps they are works you read in high school, works which have been made into films, or works you have been taught or told are literary masterworks. Don’t turn the page until you have written them down. Try to think of at least 10, but a larger sample size is better. Once you are finished, continue to the next paragraph. Alright, now look at your list. If you know the author of the literary texts you named, write their name next to the work. If you do not know the author, Google the information and write it down. Continue doing this until you have named the author of each work. Once you are finished, read on to the next paragraph. Now, as uncomfortable as it seems, label the gender/race/age/presumed sexual orientation of the authors you listed. After you have categorized them to the best of your ability, consider the following questions: What percentage of the authors are male? What percentage of the authors are white? What percentage of the authors are old/dead? What patterns do you notice? Why do you think this is? As a cultural relic, similar to art, many scholars suggest literature is a reflection of the society which produces it. This includes positive aspects of society (championing values such as love, justice, and good triumphing over evil), but it can also reflect negative aspects of society (such as discrimination, racism, sexism, homophobia, historical lack of opportunity for marginalized authors). For example, enslaved Africans were often prevented from learning to read and write as a form of control. When Phillis Wheatley published her book of poetry, Poems on Various Subjects, Religious and Moral 
(1773) she had to defend the fact that she wrote it, as racist views that slaves were incapable of writing poetry were popularly held. Later, Frederick Douglass wrote about how his masters banned him from reading and writing, as the slaveowners realized "education and slavery were incompatible with each other" (Douglass). He later championed his learning to read and write as the means which conveyed him to freedom. However, even when trying to publish The Narrative of the Life of Frederick Douglass, his publishers were forced to prove that it was, in fact, a slave who wrote the story and not a white man who wrote it for him. Slave owners actively attempted to keep this book from circulation as it threatened the institution of slavery upon which they depended. Indeed, to this day, Douglass' book continues to be banned in some prisons (Darby, Gilroy). How could black writers enter the canon en masse if they were not allowed to read or write? Or if they were forced to spend all of their waking hours working? And if those who had the means to read and write had to jump through absurd hoops just to have their works published? And if even those texts which were published were banned? Similarly, throughout much of Western history, women have been discouraged from pursuing reading and writing, as it distracted from society's expectations for women to focus on motherly and household duties. Until the 1700s, women were not allowed to go to college. Even then, very few went: only the extremely wealthy. It was not until the 19th century that women truly began attending college. Virginia Woolf wrote in A Room of One's Own that if there are fewer works of literature written by women, it is only because society, historically, has not given women the time, education, funding, or space to do so. In this extended essay, she describes an imaginary sister of William Shakespeare who could have been just as great of a writer had she the same opportunities as her brother. 
I told you in the course of this paper that Shakespeare had a sister; but do not look for her in Sir Sidney Lee's life of the poet. She died young—alas, she never wrote a word. She lies buried where the omnibuses now stop, opposite the Elephant and Castle. Now my belief is that this poet who never wrote a word and was buried at the cross-roads still lives. She lives in you and in me, and in many other women who are not here tonight, for they are washing up the dishes and putting the children to bed. But she lives; for great poets do not die; they are continuing presences; they need only the opportunity to walk among us in the flesh. This opportunity, as I think, it is now coming within your power to give her. Woolf argues that in our time those who have been excluded from literature can now join the canon by adding their voices. The inequity of representation in literature -- which has arguably improved, but in many ways persists today -- can be remedied if more people from a wide array of backgrounds and walks of life are empowered to study and create Literature. That is one reason why the current study of literature is so exciting. As a student and budding literary scholar, you have the power to influence culture through your reading and analysis of literature! Works Cited Bacon, Katie. "An African Voice." The Atlantic, 2000. "Battle of the Authors: Are The Most Popular Rated Fiction Books Written by Men or Women?" Wordery, 1 Mar. 2019. Darby, Luke. "Illinois Prison Bans Frederick Douglass's Memoir and Other "Racial" Books." GQ, 20 August 2019. Douglass, Frederick. The Narrative of the Life of Frederick Douglass. 1845. Friedrich, Caspar David. "Wanderer Above the Sea of Fog." Hamburger Kunsthalle Museum, 1818. Gilroy, Paul. "Banned Books of Guantánamo: 'An American Slave' by Frederick Douglass." Vice, 14 Nov. 2014. 
"literature, n.; 3b & 5" OED Online, Oxford University Press, September 2019, www.oed.com/view/Entry/109080. Accessed 6 September 2019. Rollison, David. "Big L vs Little L Literature." Survey of World Literature I. College of Marin, 2008. Lecture. Wheatley, Phillis. Poems on Various Subjects, Religious and Moral. 1773. Woolf, Virginia. A Room of One's Own. 1929. Contributed by Heather Ringo & Athena Kashyap, City College of San Francisco. Sourced from ASCCC Open Educational Resources Initiative.

      amorphous = without form. Big L literature has lasting artistic merit. Little L literature is anything published.

    1. That overconfidence is bad for learning because if we think we already know something, we might study less.

      The hunger for knowledge will disappear over time, as some people may not even feel a need to learn when they can just Google something and find the answer without remembering it.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors constructed a novel HSV-based therapeutic vaccine to cure SIV in a primate model. The novel HSV vector is deleted for ICP34.5. Evidence is given that this protein blocks HIV reactivation by interference with the NFkappaB pathway. The deleted construct supposedly would reactivate SIV from latency. The SIV genes carried by the vector ought to elicit a strong immune response. Together the HSV vector would elicit a shock and kill effect. This is tested in a primate model.

      Strengths and weaknesses:

      (1) Deleting ICP34.5 from the HSV construct has a very strong effect on HIV reactivation. The mechanism underlying increased activation by deleting ICP34.5 is only partially explored. Overexpression of ICP34.5 has a much smaller effect (reduction in reactivation) than deletion of ICP34.5 (strong activation); the authors acknowledge that no full mechanistic explanation can be given at this moment.

      Thank you for your comments. We agree that the mechanism underlying the increased reactivation after deleting ICP34.5 is only partially explored. As you pointed out, deletion of ICP34.5 leads to significant reactivation, while overexpression of ICP34.5 has a relatively weak inhibitory effect on reactivation. This difference prompts us to further consider the role of HSV-1 in regulating HIV latency and reactivation. Our data (Figure S4), along with previous literature (Mosca et al., 1987; Nabel et al., 1988), indicate that the ICP0 protein might play a crucial role in the reactivation of latent HIV. However, we found for the first time that ICP34.5 can antagonize this reactivation. This is a very interesting topic for understanding the complicated interactions between host cells and different viruses. We will investigate the mechanism in more depth in future studies, and we have mentioned this limitation in the revised Discussion section. Thank you!

      (2) No toxicity data are given for deleting ICP34.5. How specific is the effect for HIV reactivation? An RNA-seq analysis is required to show the effect on cellular genes.

      An RNA-seq analysis was done in the revised manuscript comparing the effect of HSV-1 and the deleted vector in J-Lat cells (Fig S5). More than 2000 genes are upregulated after transduction with the modified vector in comparison with the WT vector. Hence, the specificity of upregulation of SIV genes is questioned. The authors do NOT comment on these findings. In my view, this calls the utility of the approach into question.

      Thank you for your comments.

      (1) As for the toxicity of HSV-ΔICP34.5, it is well known that ICP34.5 is a neurotoxicity factor that can antagonize host immune responses, and thus deleting ICP34.5 improves the safety of HSV-based constructs. As expected, we have demonstrated experimentally that HSV-ΔICP34.5 exhibited lower virulence and replication ability than wild-type HSV-1 (Figure S1). Importantly, we also observed a significant decrease in the expression of inflammatory factors in primary CD4+ T cells from PLWH when compared to wild-type HSV-1 (Figure 1I-K). These data suggest that HSV-ΔICP34.5 should be better tolerated than the wild-type HSV vector.

      (2) The RNA-seq analysis was intended to explore the signaling pathways induced by HSV-ΔICP34.5, and these data are not suitable for assessing the toxicity of HSV-ΔICP34.5 constructs. As for the RNA-seq data, we think it is reasonable to observe many upregulated genes (which are involved in a variety of signaling pathways), since HSV-ΔICP34.5 constructs reactivated HIV latency more effectively than wild-type HSV by modulating the IKKα/β-NF-κB pathway and the PP1-HSF1 pathway.

      (3) To further validate whether HSV-ΔICP34.5 can specifically activate the HIV latent reservoir, we conducted additional experiments using vaccinia virus and adenovirus as controls, and the results showed that neither vaccinia virus nor adenovirus can effectively reactivate HIV latency (Figure S3). Moreover, deleting the ICP0 gene from HSV-1 diminished the reactivation of latent HIV by HSV-1, and overexpressing ICP0 strongly reactivated latent HIV (Figure S4, Figure S5), implying that this reactivation is virus-specific and that ICP0 is an important factor in reversing HIV latency. Interestingly, we found here that ICP34.5 can act as an antagonistic factor for this reactivation of HIV latency by HSV-1. Thus, after the deletion of ICP34.5, the ability of HSV to reverse HIV latency was significantly enhanced. Our research group will investigate the underlying mechanism in future studies. Thank you for your insightful comment.

      (3) The primate groups are too small and the results too variable to make averages. In Fig 5, the group with ART and saline has two slow rebounders. It is not correct to average those with the single quick rebounder. Here the interpretation is NOT supported by the data.

      Although the authors provided some promising SIV DNA data, no additional animals were added. Groups of 3 animals are too small to draw any conclusion, especially given the huge variability in response. The average numbers out of 3 are still presented in the paper, which is not proper science.

      No data are given on the effect of the deletion in primates. Now the deleted construct is compared with an empty vector containing no SIV genes. The authors provide new data in Fig S2 on the comparison of WT and modified vector in cells from PLWH, but the data are not that convincing. A significant difference in reactivation is seen for LTR in only 2/4 donors and for Gag in 3/4 donors. (Additional question: what is the meaning of LTR mRNA? Do the authors mean genomic RNA?)

      Thank you for your thorough review and kind reminder.

      (1) We agree that it is not appropriate to use averages for this pilot study with a limited number of macaques. We are currently unable to conduct another experiment with a larger number of macaques, but we think the results of this pilot study are very promising for further studies. Following your suggestion, we have removed the averages and now present the data for each monkey individually in the revised manuscript. We have also modified the corresponding description accordingly (Lines 254 to 262). Thank you for your understanding.

      (2) Regarding your comment about the lack of data on the deletion of ICP34.5 from HSV-1, we apologize for the previously unclear description. In fact, the empty vector used in our animal experiments not only lacks SIV antigens but also carries the ICP34.5 deletion. We have revised the corresponding description accordingly (for example, we now use HSV-ΔICP34.5ΔICP47-empty and HSV-ΔICP34.5ΔICP47-sPD1-SIVgag/SIVenv instead of HSV-empty and HSV-sPD1-SIVgag/SIVenv). We hope this revision addresses your question.

      (3) As for the reactivation effects observed in PLWH samples, the data may not be perfect, but we think this result (a significant difference in reactivation for LTR in 2/4 donors and for Gag in 3/4 donors) is promising support for our conclusion that HSV-ΔICP34.5 reactivates latent virus in primary CD4+ T cells more strongly than wild-type HSV. The purpose of detecting LTR RNA is to evaluate the level of virus replication. Of course, we recognize the need for more samples to gain a comprehensive understanding of the reactivation effect in different individuals in future studies. In addition, we corrected the description of LTR RNA (Lines 99-106 and 115-116). Thank you for the reminder!

      Discussion

      HSV vectors are mainly used in cancer treatment partially due to induced inflammation. Whether these are suitable to cure PLWH without major symptoms is a bit questionable to me and should at least be argued for.

      The RNA-seq data add to this worry and should at least be discussed.

      Thank you for your comment. As mentioned above, the RNA-seq analysis was intended to explore the signaling pathways induced by HSV-ΔICP34.5, and these data are not suitable for assessing the toxicity of HSV-ΔICP34.5 constructs. ICP34.5 is a neurotoxicity factor that can antagonize innate immune responses, and thus deleting ICP34.5 improves the safety of HSV-based constructs. As expected, our data demonstrate experimentally that HSV-ΔICP34.5 exhibited lower virulence and replication ability than wild-type HSV-1 (Figure S1). Importantly, HSV-ΔICP34.5 induced a lower level of inflammatory cytokines (including IL-6, IL-1β, and TNF-α) in primary CD4+ T cells from PLWH compared to wild-type HSV stimulation, likely due to its lower virulence and replication ability (Figure 1I-K). In addition, the CD4+/CD8+ T cell ratio (Figure 5H) and body weight (Figure S10) after treatment were effectively ameliorated in the SIV-infected macaques of the ART+HSV-ΔICP34.5ΔICP47-sPD1-SIVgag/SIVenv group. Our data also demonstrated that there was no significant effect on the cell composition of peripheral blood in the SIV-infected macaques of the ART+HSV-ΔICP34.5ΔICP47-sPD1-SIVgag/SIVenv group (Figure S11). These data suggest that HSV-ΔICP34.5 should be better tolerated than the wild-type HSV vector. We have added a more comprehensive description in the revised Discussion (Lines 328-334). Thank you again for all of your kind comments and suggestions.

      Reviewer #2 (Public review):

      Summary:

      In this article, Wen et al. describe the development of a 'proof-of-concept' bi-functional vector based on the ability of HSV-ΔICP34.5 to purge latent HIV-1 and SIV genomes from cells. They show that co-infection of latent J-Lat T-cell lines with an HSV-ΔICP34.5 vector can reactivate HIV-1 from a latent state. Over- or stable expression of the ICP34.5 ORF in these cells can arrest latent HIV-1 genomes from transcription, even in the presence of latency reversal agents. ICP34.5 can co-IP with and dephosphorylate IKKα/β to block its interaction with the NF-κB transcription factor. Additionally, ICP34.5 can interact with HSF1, which was identified by mass spectrometry. Thus, the authors propose that the latency reversal effect of HSV-ΔICP34.5 in co-infected J-Lat cells is due to modulatory effects on the IKKα/β-NF-κB and PP1-HSF1 pathways.

      Next, the authors cleverly construct a bifunctional HSV-based vector with deleted ICP34.5 and ICP47 ORFs to purge latency and avoid immunological refluxes, and additionally expand the application of this construct as a vaccine by introducing SIV genes. They use this 'vaccine' in mouse models and show the expected SIV-immune responses. Experiments in rhesus macaques (RMs) further show the potential of their approach to reactivate SIV genomes and at the same time block their replication by antibodies. What was interesting in the SIV experiments is that the dual-functional vector vaccine containing sPD1 and SIV Gag/Env ORFs effectively delayed SIV rebound in RMs and in some cases almost neutralized viral DNA copy detection in serum. Very promising indeed; however, there are some questions I wish the authors had explored, detailed below.

      Overall, this is an elegant and timely work demonstrating the feasibility of reducing virus rebound in animals, with potential to expand to clinical studies. The work is well written, and sections are clearly discussed.

      Strengths:

      The work is well designed, rationale explained and written very clearly for lay readers.

      Claims are adequately supported by evidence and well designed experiments including controls.

      We appreciate your positive comments on our work.

      Weaknesses:

      (1) It looks like ICP0 is also involved in latency reversal effects. More follow-up work will be required to test if this is in fact true.

      Both our data (Figure S4, Figure S5) and previous literature (Nabel et al., 1988; Mosca et al., 1987) indicate that HSV ICP0 may play a role in reversing HIV latency. However, the exact mechanisms behind this effect have not yet been fully elucidated. Of note, we report here for the first time that ICP34.5 can act as an antagonistic factor for this reactivation of HIV latency by HSV-1. Thus, after the deletion of ICP34.5, the ability of HSV to reverse HIV latency was significantly enhanced. Our research group will investigate the underlying mechanism in future studies. Thank you for your insightful comment.

      (2) It is difficult to estimate the depletion of the latent viral reservoir. The authors have tried to address this issue. A more convincing argument to this reviewer will be data to demonstrate that after the bi-functional vaccine, the animals show overall reduction in the number of circulating latent cells. The feasibility to obtain such a result is not clearly demonstrated.

      Thank you for your comment. As you mentioned, we have indeed measured both total DNA and integrated DNA (iDNA) in blood cells (see Figure 5E-F), which can provide support for the reduction of the latent viral reservoir. Thank you for your kind reminder.

      (3) The authors state that the reduced virus rebound detected following bi-functional vaccine delivery is due to latent genomes becoming activated and steady-state neutralization of these viruses by the antibody response. This needs to be demonstrated. Perhaps cell-culture experiments on specimens taken from animals might help address this issue. In lab cultures, one could create environments without antibody responses; under these conditions, one would expect higher viral loads to be released in response to the vaccine in question.

      Thank you for your valuable suggestion. We believe that the reduced virus rebound observed may be influenced by immune responses from T cells and antibodies induced by both ART and the vaccine. We appreciate your insight and agree that future studies should focus on investigating the activation effects of the vaccine under controlled conditions that simulate the absence of immune responses in primary animal cells. This will help us better understand the mechanisms involved and address your concerns more comprehensively.

      Reviewer #2 (Recommendations for the authors):

      The Authors have sufficiently addressed my comments. Below are a few minor changes that can help with clarity.

      Lines 126-127: This sentence should be changed. Perhaps, "these data suggests that .... Safety of... in PLWH might be tolerable, at least in vitro."

      Thanks for your suggestion. We have revised it accordingly. (Line 130).

      Lines 128-132: Would this not mean that reactivation is due to ICP0 gene? Have the authors tried to express ICP0-gene into J-Lat cells and see if that is the reason for reactivation? This seems somewhat incomplete. At the end of 132, please add ", in the presence of ICP0". Also a sentence describing this effect is warranted.

      Thank you for your insightful suggestion. Yes, both our data and previous literature support a significant role for the ICP0 gene in the reactivation of HIV latency (Figure S4, Figure S5). Of note, we report here for the first time that ICP34.5 can act as an antagonistic factor for this reactivation of HIV latency by HSV-1. Thus, after the deletion of ICP34.5, the ability of HSV to reverse HIV latency was significantly enhanced. We have described this effect in the revised version accordingly. Additionally, we have added the phrase “in the presence of ICP0” to the Results section (Line 137) to clarify this point.

      MOSCA, J. D., BEDNARIK, D. P., RAJ, N. B., ROSEN, C. A., SODROSKI, J. G., HASELTINE, W. A., HAYWARD, G. S. & PITHA, P. M. 1987. Activation of human immunodeficiency virus by herpesvirus infection: identification of a region within the long terminal repeat that responds to a trans-acting factor encoded by herpes simplex virus 1. Proc Natl Acad Sci U S A 84: 7408. DOI: https://doi.org/10.1073/pnas.84.21.7408, PMID: 2823260

      NABEL, G. J., RICE, S. A., KNIPE, D. M. & BALTIMORE, D. 1988. Alternative mechanisms for activation of human immunodeficiency virus enhancer in T cells. Science 239: 1299. DOI: https://doi.org/10.1126/science.2830675, PMID: 2830675

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper introduces a new approach to modeling human behavioral responses using image-computable models. They create a model (VAM) that is a combination of a standard CNN coupled with a standard evidence accumulation model (EAM). The combined model is then trained directly on image-level data using human behavioral responses. This approach is original and can have wide applicability. However, many of the specific findings reported are less compelling.

      Strengths:

      (1) The manuscript presents an original approach to fitting an image-computable model to human behavioral data. This type of approach is sorely needed in the field.

      (2) The analyses are very technically sophisticated.

      (3) The behavioral data are large both in terms of sample size (N=75) and in terms of trials per subject.

      Weaknesses:

      Major

      (1) The manuscript appears to suggest that it is the first to combine CNNs with evidence accumulation models (EAMs). However, this was done in a 2022 preprint (https://www.biorxiv.org/content/10.1101/2022.08.23.505015v1) that introduced a network called RTNet. This preprint is cited here, but never really discussed. Further, the two unique features of the current approach discussed in lines 55-60 are both present to some extent in RTNet. Given the strong conceptual similarity in approach, it seems that a detailed discussion of similarities and differences (of which there are many) should feature in the Introduction.

      Thanks for pointing this out—we agree that the novel contributions of our model (the VAM) with respect to prior related models (including RTNet) should be clarified, and have revised the Introduction accordingly. We include the following clarifications in the Introduction:

      “The key feature of the VAM that distinguishes it from prior models is that the CNN and EAM parameters are jointly fitted to the RT, choice, and visual stimulus data from individual participants in a unified Bayesian framework. Thus, both the visual representations learned by the CNN and the EAM parameters are directly constrained by behavioral data. In contrast, prior models first optimize the CNN to perform the behavioral task, then separately fit a minimal set of high-level CNN parameters [RTNet, Rafiei et al., 2024] and/or the EAM parameters to behavioral data [Annis et al., 2021; Holmes et al., 2020; Trueblood et al., 2021]. As we will show, fitting the CNN with human data—rather than optimizing the model to perform a task—has significant consequences for the representations learned by the model.”

      For example, in the case of RTNet, the variability of the Bayesian CNN weight distribution, the decision threshold, and the magnitude of the noise added to the images are adjusted to match the average human accuracy (separately for each task condition). RTNet is an interesting and useful model that we believe has complementary strengths to our own work.

      Since there are several other existing models in addition to the VAM and RTNet that use CNNs to generate RTs or RT proxies (by our count, at least six that we cite earlier in the Introduction), we felt it was inappropriate to preferentially include a detailed comparison of the VAM and RTNet beyond the passage quoted above.

      (2) In the approach here, a given stimulus is always processed in the same way through the core CNN to produce activations v_k. These v_k's are then corrupted by Gaussian noise to produce drift rates d_k, which can differ from trial to trial even for the same stimulus. In other words, the assumption built into VAM appears to be that the drift rate variability stems entirely from post-sensory (decisional) noise. In contrast, the typical interpretation of EAMs is that the variability in drift rates is sensory. This is also the assumption built into RTNet where the core CNN produces noisy evidence. Can the authors comment on the plausibility of VAM's assumption that the noise is post-sensory?

      In our view, the VAM is compatible with a model in which the drift rate variability for a given stimulus is due to sensory noise, since we do not specify the origin of the Gaussian noise added to the drift rates. As the reviewer notes, the CNN component of the VAM processes a given stimulus deterministically, yielding the mean drift rates. This does not preclude us from imagining an additional (unmodeled) sensory process that adds variability to the drift rates. The VAM simply represents this and other hypothetical sources of variability as additive Gaussian noise. We agree however that it is worthwhile to think about the origin of the drift rate variability, though it is not a focus of our work.
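
      The noise assumption discussed above can be made concrete with a small simulation. The following is a minimal, hypothetical sketch and not the authors' implementation: `mean_drifts` stands in for the deterministic CNN outputs v_k, Gaussian noise with an assumed standard deviation `noise_sd` produces trial-specific drift rates d_k, and a simplified linear ballistic race (fixed threshold, uniform start point, assumed non-decision time `t0`) turns them into a choice and an RT. All function names and parameter values are invented for illustration.

```python
import random


def simulate_trial(mean_drifts, noise_sd=0.3, threshold=1.0,
                   max_start=0.5, t0=0.2, rng=random):
    """Simulate one trial of a simplified linear ballistic race.

    mean_drifts: deterministic per-response drift means (stand-in for v_k).
    Returns (choice_index, rt_seconds).
    """
    while True:
        # Trial-specific drift rates: d_k = v_k + Gaussian noise.
        drifts = [v + rng.gauss(0.0, noise_sd) for v in mean_drifts]
        # Resample if no accumulator could ever reach the threshold.
        if any(d > 0 for d in drifts):
            break
    finish_times = []
    for d in drifts:
        start = rng.uniform(0.0, max_start)  # start point below threshold
        # Accumulators with non-positive drift never finish.
        finish_times.append((threshold - start) / d if d > 0 else float("inf"))
    choice = min(range(len(drifts)), key=lambda k: finish_times[k])
    return choice, t0 + finish_times[choice]


rng = random.Random(0)
mean_drifts = [1.2, 0.6]  # same stimulus, so the same means on every trial
trials = [simulate_trial(mean_drifts, rng=rng) for _ in range(2000)]
accuracy = sum(choice == 0 for choice, _ in trials) / len(trials)
mean_rt = sum(rt for _, rt in trials) / len(trials)
```

      Because the means are fixed while the noise is redrawn on each call, repeated presentations of one stimulus yield different choices and RTs. The sketch is agnostic about whether that noise is sensory or decisional, which mirrors the point made in the response.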

      (3) Figure 2 plots how well VAM explains different behavioral features. It would be very useful if the authors could also fit simple EAMs to the data to clarify which of these features are explainable by EAMs only and which are not.

      In our view, fitting simple EAMs to the data would not be especially informative and poses a number of challenges for the particular task we study (LIM) that are neatly avoided by using the VAM. In particular, as we show in Figure 2, the stimuli vary along several dimensions that all appear to influence behavior: horizontal position, vertical position, layout, target direction, and flanker direction. Since the VAM is stimulus-computable, fitting the VAM automatically discovers how all of these stimulus features influence behavior (via their effect on the drift rates outputted by the CNN). In contrast, fitting a simple EAM (e.g. the LBA model) necessitates choosing a particular parameterization that specifies the relationship between all of the stimulus features and the EAM model parameters. This raises a number of practical questions. For example, should we attempt to fit a separate EAM for each stimulus feature, or model all stimulus features simultaneously?

      Moreover, while we could in principle navigate these issues and fit simple EAMs to the data, we do not intend to claim that simple EAMs fail to explain the relationship between stimulus features and behavior as well as the VAM. Rather, the key strength of the VAM relative to simple EAMs is that it includes a detailed and biologically plausible model of human vision. The majority of the paper capitalizes on this strength by showing how behavioral effects of interest (namely congruency effects) can be explained in terms of the VAM’s visual representations.

      (4) VAM is tested in two different ways behaviorally. First, it is tested to what extent it captures individual differences (Figure 2B-E). Second, it is tested to what extent it captures average subject data (Figure 2F-J). It wasn't clear to me why for some metrics only individual differences are examined and for other metrics only average human data is examined. I think that it will be much more informative if separate figures examine average human data and individual difference data. I think that it's especially important to clarify whether VAM can capture individual differences for the quantities plotted in Figures 2F-J.

      We would like to clarify that Fig. 2J in fact already shows how well the VAM captures individual differences for the average subject data shown in Fig. 2H (stimulus layout) and Fig. 2I (stimulus position). For a given participant and stimulus feature, we calculated the Pearson's r between model/participant mean RTs across each stimulus feature value. Fig. 2J shows the distribution of these Pearson’s r values across all participants for stimulus layout and horizontal/vertical position.

      Fig. 2G also already shows how well the VAM captures individual differences in behavior. Specifically, this panel shows individual differences in mean RT attributable to differences in age. For Fig. 2F, which shows how the model drift rates differ on congruent vs. incongruent trials, there is no sensible way to compare the models to the participants at any level of analysis (since the participants do not have drift rates). 
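
      The per-participant comparison described for Fig. 2J can be sketched as follows. This is an illustrative example under stated assumptions, not the authors' analysis code: the participant IDs, the mean RT values, and the helper `pearson_r` are all hypothetical, and each list holds one mean RT per value of a single stimulus feature (for example, four layouts).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented mean RTs (seconds) per stimulus-feature value, for illustration only.
participant_rts = {
    "p01": [0.61, 0.64, 0.70, 0.66],
    "p02": [0.55, 0.58, 0.57, 0.63],
}
model_rts = {
    "p01": [0.60, 0.65, 0.69, 0.67],
    "p02": [0.56, 0.57, 0.59, 0.61],
}

# One correlation per participant; the distribution of these values across
# participants is what a panel like Fig. 2J would summarize.
r_by_participant = {
    pid: pearson_r(participant_rts[pid], model_rts[pid])
    for pid in participant_rts
}
```

      Computing one r per participant, rather than correlating pooled data, keeps the comparison at the individual level, which is the distinction the response draws.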

      (5) The authors look inside VAM and perform many exploratory analyses. I found many of these difficult to follow since there was little guidance about why each analysis was conducted. This also made it difficult to assess the likelihood that any given result is robust and replicable. More importantly, it was unclear which results are hypothesized to depend on the VAM architecture and training, and which results would be expected in performance-optimized CNNs. The authors train and examine performance-optimized CNNs later, but it would be useful to compare those results to the VAM results immediately when each VAM result is first introduced.

      Thanks for pointing this out—we apologize for any confusion caused by our presentation of the CNN analyses. We have added in additional motivating statements, methodological clarifications, and relevant references to our Results, particularly for Figure 3 in which we first introduce the analyses of the CNN representations/activity. In general, each analysis is prefaced by a guiding question or specific rationale, e.g. “How do the models' visual representations enable target selectivity for stimuli that vary along several irrelevant dimensions?” We also provide numerous references in which these analysis techniques have been used to address similar questions in CNNs or the primate visual cortex.

      We chose to maintain the current organization of our results in which the comparison between the VAM and the task-optimized models are presented in a separate figure. We felt that including analyses of both the VAM and task-optimized models in the initial analyses of the CNN representations would be overwhelming for many readers. As the reviewer acknowledges, some readers may already find these results challenging to follow. 

      (6) The authors don't examine how the task-optimized models would produce RTs. They say in lines 371-2 that they "could not examine the RT congruency effect since the task-optimized models do not generate RTs." CNNs alone don't generate RTs, but RTs can easily be generated from them using the same EAM add-on that is part of VAM. Given that the CNNs are already trained, I can't see a reason why the authors can't train EAMs on top of the already trained CNNs and generate RTs, so these can provide a better comparison to VAM.

      We appreciate this suggestion, but we consider the proposal to “train EAMs on top of the already trained CNNs and generate RTs” to be a significant expansion of the scope of the paper with multiple possible roads forward. In particular, one must specify how the outputs of the task-optimized CNN (logits for each possible response) relate to drift rates, and there is no widely-accepted or standard way to do this. Previously proposed methods include transforming representation distances in the last layer to drift rates (https://doi.org/10.1037/xlm0000968), fitting additional subject-specific parameters that map the logits to drift rates (https://doi.org/10.1007/s42113-019-00042-1), or using the softmax-scored model outputs as drift rates directly (https://doi.org/10.1038/s41562-024-01914-8), though in the latter case the RTs are not on the same scale as human data. In our view, evaluating these different methods is beyond the scope of this paper. An advantage of the VAM is that one does not have to fit two separate models (a CNN and an EAM) to generate RTs.

      Nonetheless, we agree that it would be informative to examine something like RTs in the task-optimized models. Our revised Results section now includes an analysis of the confidence of the task-optimized models’ decisions, which we use as a proxy for RTs:

      “Since the task-optimized models do not generate RTs, it is not possible to directly measure RT congruency effects in these models without making additional assumptions about how the CNN's classification decisions relate to RTs. However, as a coarse proxy for RT, we can examine the confidence of the CNN's decisions, defined as the softmax-scored logit (probability) of the most probable direction in the final CNN layer. This choice of RT proxy is motivated by some prior studies that have combined CNNs with EAMs [Annis et al., 2021; Holmes et al., 2020; Trueblood et al., 2021]. These studies explicitly or implicitly derive a measure of decision confidence from the activity of the last CNN layer. The confidence measure is then mapped to the EAM drift rates, such that greater decision confidence generally corresponds to higher drift rates (and therefore shorter RTs).

      We calculated the average confidence of each task-optimized CNN separately for congruent vs. incongruent trials. On average, the task-optimized models showed higher confidence on congruent vs. incongruent trials (W = 21.0, p < 1e-3, Wilcoxon signed-rank test; Cohen's d = 0.99; n = 75 models). These analyses therefore provide some evidence that task-optimized CNNs have the capacity to exhibit congruency effects, though an explicit comparison of the magnitude of these effects with human data requires additional modeling assumptions (e.g., fitting a separate EAM).”
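
      The confidence measure defined in the quoted passage (the softmax probability of the most probable direction) can be illustrated with a short sketch. The logit values and the `(is_congruent, logits)` trial format below are hypothetical, chosen only to show the computation; they are not taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def confidence(logits):
    """Decision confidence: probability assigned to the most probable class."""
    return max(softmax(logits))

def mean_confidence(trials, congruent):
    """Average confidence over trials of one congruency condition."""
    values = [confidence(logits) for is_con, logits in trials
              if is_con == congruent]
    return sum(values) / len(values)

# Hypothetical final-layer logits for four response directions.
trials = [
    (True,  [4.0, 0.5, 0.2, 0.1]),   # congruent: one direction dominates
    (True,  [3.5, 0.3, 0.4, 0.2]),
    (False, [2.0, 1.5, 0.2, 0.1]),   # incongruent: flanker direction competes
    (False, [2.2, 1.8, 0.1, 0.3]),
]
conf_congruent = mean_confidence(trials, congruent=True)
conf_incongruent = mean_confidence(trials, congruent=False)
```

      In this toy example the competing logit on incongruent trials lowers the maximum probability, which is the direction of the effect the authors report. Mapping such confidence values onto RTs would still require the additional modeling assumptions they mention.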

      (7) The Discussion felt very long and mostly a summary of the Results. I also couldn't shake the feeling that it had many just-so stories related to the variety of findings reported. I think that the section should be condensed and the authors should be clearer about which explanations are speculations and which are air-tight arguments based on the data.

      We have shortened the Discussion modestly and added language that distinguishes arguments that are more speculative from those directly supported by our data.

      Specifically, we added in the phrase “we speculate that…” for two suggestions in the Discussion (paragraphs 3 and 5), and we ensured that any other more speculative suggestions contain such clarifying language. We have also added in subheadings in the Discussion to help readers navigate this section. 

      (8) In one of the control analyses, the authors train different VAMs on each RT quantile. I don't understand how it can be claimed that this approach can serve as a model of an individual's sensory processing. Which of the 5 sets of weights (5 VAMs) captures a given subject's visual processing? Are the authors saying that the visual system of a given subject changes based on the expected RT for a stimulus? I feel like I'm missing something about how the authors think about these results.

      We agree that these particular analyses may cause confusion and have removed them from our revised manuscript.

      Reviewer #2 (Public Review):

      In an image-computable model of speeded decision-making, the authors introduce and fit a combined CNN-EAM (a 'VAM') to flanker-task-like data. They show that the VAM can fit mean RTs and accuracies as well as the congruency effect that is present in the data, and subsequently analyze the VAM in terms of where in the network congruency effects arise.

      Overall, combining DNNs and EAMs appears to be a promising avenue to seriously model the visual system in decision-making tasks compared to the current practice in EAMs. Some variants have been proposed or used before (e.g., doi.org/10.1016/j.neuroimage.2017.12.078 , doi.org/10.1007/s42113-019-00042-1), but always in the context of using task-trained models, rather than models trained on behavioral data. However, I was surprised to read that the authors developed their model in the context of a conflict task, rather than a simpler perceptual decision-making task. Conflict effects in human behavior are particularly complex, and thereby, the authors set a high goal for themselves in terms of the to-be-explained human behavior. Unfortunately, the proposed VAM does not appear to provide a great account of conflict effects that are considered fundamental features of human behavior, like the shape of response time distributions, and specifically, delta plots (doi.org/10.1037/0096-1523.20.4.731). The authors argue that it is beyond the scope of the presented paper to analyze delta plots, but as these are central to studies of human conflict behavior, models that aim to explain conflict behavior will need to be able to fit and explain delta plots.

      Theories on conflict often suggest that negative/positive-trending delta plots arise through the relative timing of response activation related to relevant and irrelevant information. Accumulation for relevant and irrelevant information would, as a result, either start at different points in time or the rates vary over time. The current VAM, as a feedforward neural network model, does not appear to be able to capture such effects, and perhaps fundamentally not so: accumulation for each choice option is forced to start at the same time, and rates are a static output of the CNN.

      The proposed solution of fitting five separate VAMs (one for each of five RT quantiles) is not satisfactory: it does not explain how delta plots result from the model, for the same reason that fitting five evidence accumulation models (one per RT quantile) does not explain how response time distributions arise. If, for example, one would want to make a prediction about someone's response time and choice based on a given stimulus, one would first have to decide which of the five VAMs to use, which is circular. But more importantly, this way of fitting multiple models does not explain the latent mechanism that underlies the shape of the delta plots.

      As such, the extensive analyses on the VAM layers and the resulting conclusions that conflict effects arise due to changing representations across layers (e.g., "the selection of task-relevant information occurs through the orthogonalization of relevant and irrelevant representations") remain, while inspiring, hard to weigh, as they are contingent on the assumption that the VAM can capture human behavior in the conflict task, which it struggles with. That said, the promise of combining CNNs and EAMs is clearly there. A way forward could be to either adjust the proposed model so that it can explain delta plots, which would potentially require temporal dynamics and time-varying evidence accumulation rates, or perhaps to start simpler and combine CNN-EAMs that are able to fit more standard perceptual decision-making tasks without conflict effects.

      We thank the reviewer for their thoughtful comments on our work. However, we note that the VAM does in fact capture the positive-trending RT delta plot observed in the participant data (Fig. S4A), though the intercepts for models/participants differ somewhat. On the other hand, the conditional accuracy functions (Fig. S4B) reveal a more pronounced difference between model and participant behavior. As the reviewer points out, capturing these effects is likely to require a model that can produce time-varying drift rates, whereas our model produces a fixed drift rate for a given stimulus. We also agree that fitting a separate VAM to each RT quantile is not a satisfactory means of addressing this limitation and have removed these analyses from our revised manuscript.

      However, while we agree that accurately capturing these dynamic effects is a laudable goal, it is in our view also worthwhile to consider explanations for the mean behavioral effect (i.e. the accuracy congruency effect), which can occur independently of any consideration of dynamics. One of our main findings is that across-model variability in accuracy congruency effects is better attributed to variation in representation geometry (target/flanker subspace alignment) vs. variation in the degree of flanker suppression. This finding does not require any consideration of dynamics to be valid at the level of explanation we pursue (across-user variability in congruency effects), but also does not preclude additional dynamic processes that could give rise to more specific error patterns. Our revised discussion now includes a section where we summarize and elaborate on these ideas:

      “It is not difficult to imagine how the orthogonalization mechanism described above, which explains variability in accuracy congruency effects across individuals, could act in concert with other dynamic processes that explain variability in congruency effects within individuals (e.g., as a function of RT). In general, any process that dynamically gates the influence of irrelevant sensory information on behavioral outputs could accomplish this, for example ramping inhibition of incorrect response activation [https://doi.org/10.3389/fnhum.2010.00222], a shrinking attention spotlight [https://doi.org/10.1016/j.cogpsych.2011.08.001], or dynamics in neural population-level geometry [https://doi.org/10.1038/nn.3643]. To pursue these ideas, future work may aim to incorporate dynamics into the visual component and decision component of the VAM with recurrent CNNs [https://doi.org/10.48550/arXiv.1807.00053, https://doi.org/10.48550/arXiv.2306.11582] and the task-DyVA model [https://doi.org/10.1038/s41562-022-01510-8], respectively.”
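
      The RT delta plot discussed in this exchange expresses the congruency effect as a function of RT: at each quantile, the incongruent-minus-congruent RT difference is plotted against the mean of the two quantile RTs. A minimal sketch follows; the quantile probabilities, the interpolation rule, and the toy RTs are assumptions for illustration and are not taken from the paper.

```python
def quantile(sorted_values, p):
    """Linear-interpolation quantile of pre-sorted data, with 0 <= p <= 1."""
    pos = p * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac

def delta_plot(congruent_rts, incongruent_rts,
               probs=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Return (mean quantile RT, congruency effect) pairs, one per quantile."""
    con = sorted(congruent_rts)
    inc = sorted(incongruent_rts)
    points = []
    for p in probs:
        q_con = quantile(con, p)
        q_inc = quantile(inc, p)
        points.append(((q_con + q_inc) / 2, q_inc - q_con))
    return points

# Hypothetical RTs (seconds) in which the congruency effect grows with RT.
congruent = [0.40, 0.45, 0.50, 0.55, 0.60]
incongruent = [0.45, 0.52, 0.59, 0.66, 0.73]
points = delta_plot(congruent, incongruent)
deltas = [effect for _, effect in points]
```

      A positive-trending delta plot corresponds to `deltas` increasing across quantiles, as in this toy example. A negative-trending one would show the effect shrinking at slower RTs.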

      Reviewer #3 (Public Review):

      Summary:

      In this article, the authors combine a well-established choice-response time (RT) model (the Linear Ballistic Accumulator) with a CNN model of visual processing to model image-based decisions (referred to as the Visual Accumulator Model - VAM). While this is not the first effort to combine these modeling frameworks, it uses this combination of approaches uniquely.

      Specifically, the authors attempt to better understand the structure of human information representations by fitting this model to behavioral (choice-RT) data from a classic flanker task. This objective is made possible by using a very large (by psychological modeling standards) industry data set to jointly fit both components of this VAM model to individual-level data. Using this approach, they illustrate (among other results) (1) how the interaction between target and flanker representations influences the presence and strength of congruency effects, (2) how the structure of representations changes (distributed versus more localized) with depth in the CNN model component, and (3) how different model training paradigms change the nature of information representations. This work contributes to the ML literature by demonstrating the value of training models with richer behavioral data. It also contributes to cognitive science by demonstrating how ML approaches can be integrated into cognitive modeling. Finally, it contributes to the literature on conflict modeling by illustrating how information representations may lead to some of the classic effects observed in this area of research.

      Strengths:

      (1) The data set used for this analysis is unique and is made publicly available as part of this article. Specifically, they have access to data for 75 participants with >25,000 trials per participant. This scale of data/individual is unusual and is the foundation on which this research rests.

      (2) This is the first time, to my knowledge, that a model combining a CNN with a choice-RT model has been jointly fit to choice-RT data at the level of individual people. This type of model combination has been used before but in a more restricted context. This joint fitting, and in particular, learning a CNN through the choice-RT modeling framework, allows the authors to probe the structure of human information representations learned directly from behavioral data.

      (3) The analysis approaches used in this article are state-of-the-art. The training of these models is straightforward given the data available. The interesting part of this article (opinion of course) is the way in which they probe what CNN has learned once trained. I find their analysis of how distractor and target information interfere with each other particularly compelling as well as their demonstration that training on behavioral data changes the structure of information representations when compared to training models on standard task-optimized data.

      Weaknesses:

      (1) Just as the data in this article is a major strength, it is also a weakness. This type of modeling would be difficult, if not impossible to do with standard laboratory data. I don't know what the data floor would be, but collecting tens of thousands of decisions for a single person is impractical in most contexts. Thus this type of work may live in the realm of industry. I do want to re-iterate that the data for this study was made publicly available though!

      We suspect (but have not systematically tested) that the VAMs can be fitted with substantially less data. We use data augmentation techniques (various randomized image transformations) during training to improve the generalization capabilities of the VAMs, and these methods are likely to be particularly important when training on smaller datasets. One could consider increasing the amount of image data augmentation when working with smaller datasets, or pursuing other forms of data augmentation like resampling from estimated RT distributions (see https://doi.org/10.1038/s41562-022-01510-8 for an example of this). In general, we don’t think that prospective users of our approach should be discouraged if they have only a few hundred trials per subject (or less) - it’s worth trying!

      (2) While this article uses choice-RT data it doesn't fully leverage the richness of the RT data itself. As the authors point out, this modeling framework, the LBA component in particular, does not account for some of the more nuanced but well-established RT effects in this data. This is not a big concern given the already nice contributions of this article and it leads to an opportunity for ongoing investigation.

      We agree that fully capturing the more nuanced behavioral effects you mention (e.g. RT delta plots and conditional accuracy functions) is a worthwhile goal for future research; see our response to Reviewer #2 for a more detailed discussion.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The phrase in the Abstract "convolutional neural network models of visual processing and traditional EAMs are jointly fitted" made me initially believe that the two models were fitted independently. You may want to re-word to clarify.

      We think that the phrase “jointly fitted” already makes it clear that both the CNN and EAM parameters are estimated simultaneously, in agreement with how this term is usually used. But we have nonetheless appended some additional clarifying language to that sentence (“in a unified Bayesian framework”).

      (2) Lines 27-28: EAMs "are the most successful and widely-used computational models of decision-making." This is only true for the specific type of decision-making examined here, namely joint modeling of choice and response times. Signal detection theory is arguably more widely-used when response times are not modeled.

      Thanks for pointing this out - we have revised the referenced sentence accordingly.

      (3) Could the authors clarify what is plotted in Figure 2F?

      Fig. 2F shows the drift rates for the target, flanker, and “other” (non-target/non-flanker) accumulators averaged over trials and models for congruent vs. incongruent trials. In case this was a source of confusion, we do not show the value of the flanker drift rates on congruent trials because the flanker and target accumulators are identical (i.e. the flanker/congruent drift rates are equivalent to the target/congruent drift rates).

      (4) Lines 214-7: "The observation that single-unit information for target direction decreased between the fourth and final convolutional layers while population-level decoding remained high is especially noteworthy in that it implies a transition from representing target direction with specialized "target neurons" to a more distributed, ensemble-level code." Can the authors clarify why this is the only reasonable explanation for these results? It seems like many other explanations could be construed.

      We have added additional clarification to this section and now use more tentative language:

      “The observation that single-unit information for target direction decreased between the fourth and final convolutional layers indicates that the units become progressively less selective for particular target directions. Since population-level decoding remained high in these layers, this suggests a transition from representing target direction with specialized "target neurons" to a more distributed, ensemble-level code.”

      (5) Lines 372-376: "Thus, simply training the model to perform the task is not sufficient to reproduce a behavioral phenomenon widely-observed in conflict tasks. This challenges a core (but often implicit) assumption of the task-optimized training paradigm, namely that training a model to do a task well will result in model representations that are similar to those employed by humans." While I agree with the general sentiment, I feel that its application here is strange. Unless I'm missing something, in the context of the preceding sentence, the authors seem to be saying that researchers in the field expect that CNNs can produce a behavioral phenomenon (RTs) that is completely outside of their design and training. I don't think that anyone actually expects that.

      We moved the discussion/analyses of RTs to the next paragraph. It should now be clear that this statement refers specifically to the absence of an accuracy congruency effect in the task-optimized models.

      (6) Lines 387-389: "As a result, the VAMs may learn richer representations of the stimuli, since a variety of stimulus features-layout, stimulus position, flanker direction-influence behavior (Figure 2)." That is certainly true of tasks like this one where an optimal model would only focus on a tiny part of the image, whereas humans are distracted by many features. I'm not sure that this distractibility is the same as "richer representations". When CNNs classify images based on the background, would the authors claim that they have richer representations than humans?

      We agree that “richer” may not be the best way to characterize these representations, and have changed it to “more complex”.

      (7) Is it possible that drift rate d_k for each response happens to be negative on a given trial? If so, how is the decision given on such trials (since presumably none of the accumulators will ever reach the boundary)?

      It is indeed possible for all of the drift rates to be negative, though we found that this occurred for a vanishingly small number of trials (mean ± s.e.m. percent trials/model: 0.080 ± 0.011%, n = 75 models), as reported in the Methods. These trials were excluded from analyses.
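
      To make the mechanics behind this answer concrete, the sketch below simulates a generic linear ballistic accumulator trial in which each accumulator rises linearly from a random start point at a sampled drift rate. It is an illustration under assumed parameter names and values (`threshold`, `max_start`, `t0`, `sd`), not the authors' implementation. When every sampled drift rate is non-positive, no accumulator reaches the threshold and the trial yields no response.

```python
import random

def lba_decision(drifts, starts, threshold, t0):
    """Return (choice, rt) for one trial, or None if no accumulator finishes.

    An accumulator with non-positive drift never reaches the threshold,
    so its finishing time is treated as infinite.
    """
    finish_times = [
        (threshold - k) / d if d > 0 else float("inf")
        for d, k in zip(drifts, starts)
    ]
    fastest = min(finish_times)
    if fastest == float("inf"):
        return None  # all drift rates non-positive: no response is produced
    return finish_times.index(fastest), t0 + fastest

def simulate(mean_drifts, n_trials, sd=1.0, threshold=1.0,
             max_start=0.5, t0=0.2, seed=0):
    """Simulate trials with Gaussian drift rates and uniform start points."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        drifts = [rng.gauss(v, sd) for v in mean_drifts]
        starts = [rng.uniform(0.0, max_start) for _ in mean_drifts]
        results.append(lba_decision(drifts, starts, threshold, t0))
    return results

# Hypothetical mean drift rates for four response directions.
results = simulate([2.5, 1.0, 0.5, 0.5], n_trials=2000)
valid = [r for r in results if r is not None]
excluded_fraction = 1 - len(valid) / len(results)
```

      Trials that return `None` here play the role of the trials the authors report excluding. With positive mean drift rates they are rare, consistent with the small percentage quoted above.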

      (8)  Can the authors comment on how they chose the CNN architecture and whether they expect that different architectures will produce similar results?

      Before establishing the seven-layer CNN architecture used throughout the paper, we conducted some preliminary experiments using other architectures that differed primarily in the number of CNN layers. We found that models with significantly fewer than seven layers typically failed to reach human-level accuracy on the task while larger models achieved human-level accuracy but (unsurprisingly) took longer to train.

      Reviewer #3 (Recommendations For The Authors):

      - In the introduction to this paper (particularly the paragraph beginning in line 33), the authors note that EAMs have typically been used in simplified settings and that they do not provide a means to account for how people extract information from naturalistic stimuli. While I agree with this, the idea of connecting CNNs of visual processing with EAMs for a joint modeling framework has been done. I recommend looking at and referencing these two articles as well as adjusting the tenor of this part of an introduction to better reflect the current state of the literature. For full disclosure, I am one of the authors on these articles. https://link.springer.com/article/10.1007/s42113-019-00042-1 https://www.sciencedirect.com/science/article/abs/pii/S0010027721001323

      We agree—thanks for pointing this out. The revised Introduction now discusses prior related models in more detail (including those referenced above) and better clarifies the novel contributions of our model. We specifically highlight that a novel contribution of the VAM is that “the CNN and EAM parameters are jointly fitted to the RT, choice, and visual stimulus data from individual participants in a unified Bayesian framework.”

      - The statement in lines 56-58 implies that this is the first article to glue CNNs together with EAMs. I would edit this accordingly based on the prior comment here and references provided. I will note that the second feature of the approach in this paper is still novel and really nice, namely the fact that the CNN and the EAM are jointly fitted. In the aforementioned references, the CNN is trained on the image set, and individual level Bayesian estimation was only applied to the EAM. Thus, it may be useful to highlight the joint estimation aspect of this investigation as well as how the uniqueness of the data available makes it possible.

      Agreed—see above.

      - Figure 3c and associated text. I understand the MI analysis you are performing here, however it is difficult to interpret as it stands. In the figure, what does a MI of 0.1 mean?? Can you give some context to that scale? I do find the interpretation of the hunchback shape in lines 210-222 to be somewhat of a stretch. The discussion that precedes (lines 199-209) this is clear and convincing. Can this discussion be strengthened more? And more interpretability of Figure 3c would be helpful; entropic scales can be hard to interpret without some context or scale associated.

      The MI analyses in Fig. 3C (and also Figs. 4C and 6E) show normalized MI, in which the raw MI has been divided by the entropy of the stimulus feature distribution. This normalization facilitates comparing the MI for different stimulus features, which is relevant for Figs. 4C and 6E. The normalized MI has a possible range of [0, 1], where 1 indicates perfect correlation between the two variables and 0 indicates complete independence. We now note in the legend of these figures that the possible normalized MI range is [0, 1], which should help with interpreting these values. Our revised results section for Fig. 3C now also includes some additional remarks on our interpretation of the hunchback shape of the MI.
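
      The normalization described above (raw MI divided by the entropy of the stimulus feature distribution) can be sketched for discrete variables as follows. The example assumes unit activity has already been discretized into bins; the binning, function names, and toy data are assumptions for illustration, since the response does not specify those details.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def mutual_information(x, y):
    """Mutual information (bits) between two paired discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )

def normalized_mi(feature, activity):
    """MI divided by the entropy of the stimulus feature, giving [0, 1]."""
    return mutual_information(feature, activity) / entropy(feature)

# Hypothetical target directions and binned activity of two units.
direction = ["L", "L", "R", "R", "U", "U", "D", "D"]
selective_unit = [0, 0, 1, 1, 2, 2, 3, 3]    # one bin per direction
unselective_unit = [0, 1, 0, 1, 0, 1, 0, 1]  # unrelated to direction

nmi_selective = normalized_mi(direction, selective_unit)
nmi_unselective = normalized_mi(direction, unselective_unit)
```

      Here the selective unit reaches the top of the scale and the unselective unit the bottom, which gives a reference for reading intermediate values such as the 0.1 the reviewer asks about.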

      - Lines 244-248 and the analyses in Figure 3 suggest a change in the behavior of the CNN around layer 4. This is just a musing, but what would happen if you just used a 4 layer CNN, or even a 3 layer? This is not just a methods question. Your analysis suggests a transition from localized to distributed information representation. Right now, the EAM only sees the output of the distributed representation. What if it saw the results of the more local representations from early layers? Of course, a shallower network may just form the distributed representations earlier, but it would be interesting if there were a way to tease out not just the presence of distributed vs local representations, but the utility of those to the EAM.

      Thanks for this interesting suggestion. We did do some preliminary experiments in models with fewer layers, though we only examined the outputs of these models and did not assess their representations. We found that models with 3–5 layers generally failed to achieve human-level accuracy on the task. In principle, one could relate this observation to the representations of these models as a means of assessing the relative utility of distributed/local representations. However, there are confounding factors that one would ideally control for in order to compare models with different numbers of layers in this fashion (namely, the number of parameters).

      - Section Line 359 (Task optimized models) - It would be helpful to clarify here what these task-optimized models are being trained to do. As I understand it, they are being trained to directly predict the target direction. But are you asking them to learn to predict the true target direction? Or are you training them to predict what each individual responds? I think it is the second (since you have 75 of these), but it's not clear. I looked at the methods and still couldn't get a clear description of this. Also, are you just stripping the LBA off of the end of the CNN and then essentially putting a softmax in its place? If so, it would be helpful to say so.

      The task-optimized models were actually trained to output the true target direction in each stimulus, rather than trained to match the decisions of the human participants. We trained 75 such models since we wanted to use exactly the same stimuli as were used to train each VAM. The task-optimized CNNs were identical to those used in the VAMs, except that the outputs of the last layer were converted to softmax-scored probabilities for each direction rather than drift rates. The Results and Methods sections now include additional commentary that clarifies these points.

      - Line 373-376: This statement is pretty well established at this point in the similarity judgement literature. I recommend looking at and referencing https://onlinelibrary.wiley.com/doi/full/10.1111/cogs.13226 https://www.nature.com/articles/s41562-020-00951-3 https://link.springer.com/article/10.1007/s42113-020-00073-z

      Thanks for pointing this out. For reference, the statement in question is “Thus, simply training the model to perform the task is not sufficient to reproduce a behavioral phenomenon widely-observed in conflict tasks. This challenges a core (but often implicit) assumption of the task-optimized training paradigm, namely that training a model to do a task well will result in model representations that are similar to those employed by humans.”

      We agree that the first and third reference you mention are relevant, and we now cite them along with some other relevant work. In our view, the second reference you mention is not particularly relevant (that paper introduces a new computational model for similarity judgements that is fit to human data, but does not comment on training models to perform tasks vs. fitting to human data).

      - Line 387-388: "VAMs may learn richer representations". This is a bit of a philosophical point, but I'll go ahead and mention it. The standard VAM does not necessarily learn "richer" feature representations. Rather, you are asking the VAM and task-optimized models to do different things. As a result, they learn different representations. "Better" or "richer" is in the eye of the beholder. In one view, you could view the VAM performance as sub-par since it exhibits strange artifacts (congruency effects) and the expansion of dimensionality in the VAM representations is merely a side-effect of poor performance. I'm not advocating this view, just playing devil's advocate and suggesting a more nuanced discussion of the difference between the VAM and task-optimized models.

      We agree—this is a great point. We have changed this statement to read “the VAMs may learn more complex [rather than richer] representations of the stimuli”.

      - Lines 567-570: Here you discuss how the LBA backend of the VAM can't account for shrinking spotlight-like RT effects but that fitting models to different RT quantiles helps overcome this. I find this to be one of the weakest points of the paper (the whole process of fitting RT quantiles separately to begin with). This is just a limitation of the RT component of the model. This is a great paper but this is just a limitation inherent in the model. I don't see a need to qualify this limitation and think it would be better to just point out that this is a limitation of the LBA itself (be more clear that it is the LBA that is the limiting factor here) and that this leaves room for future research. From your last sentence of this paragraph, I agree that recurrent CNNs would be interesting. I will note that RNN choice-RT models are out there (though not with CNNs as part of the model).

      We agree and have revised this section of the Discussion accordingly (see our response to Reviewer #2 for more detail). We also removed the analyses of models trained on separate RT quantiles.

    1. Reviewer #1 (Public review):

      Summary:

      In a previous work, Prut and colleagues had shown that during reaching, high-frequency stimulation of the cerebellar outputs resulted in reduced reach velocity. Moreover, they showed that the stimulation produced reaches that deviated from a straight line, with the shoulder and elbow movements becoming less coordinated. In this report, they extend their previous work by the addition of modeling results that investigate the relationship between the kinematic changes and torques produced at the joints. The results show that the slowing is not due to reductions in interaction torques alone, as the reductions in velocity occur even for movements that are single joints. More interestingly, the experiment revealed evidence for the decomposition of the reaching movement, as well as an increase in the variance of the trajectory.

      Strengths:

      This is a rare experiment in a non-human primate that assessed the importance of cerebellar input to the motor cortex during reaching.

      Weaknesses:

      My major concerns are described below.

      If I understand the task design correctly, the monkeys did not need to stop their hand at the target. I think this design may be suboptimal for investigating the role of the cerebellum in control of reaching because a number of earlier works have found that the cerebellum's contributions are particularly significant as the movement ends, i.e., stopping at the target. For example, in mice, interposed nucleus neurons tend to be most active near the end of the reach that requires extension, and their activation produces flexion forces during the reach (Becker and Person 2019). Indeed, the inactivation of interposed neurons that project to the thalamus results in overshooting of reaching movements (Low et al. 2018). Recent work has also found that many Purkinje cells show a burst-pause pattern as the reach nears its endpoint, and stimulation of the mossy fibers tends to disrupt endpoint control (Calame et al. 2023). Thus, the fact that the current paper has no data regarding endpoint control of the reach is puzzling to me.

      Because stimulation continued after the cursor had crossed the target, it is interesting to ask whether this disruption had any effects on the movements that were task-irrelevant. The reason for asking this is because we have found that whereas during task-relevant eye or tongue movements the Purkinje cells are strongly modulated, the modulations are much more muted when similar movements are performed but are task-irrelevant (Pi et al., PNAS 2024; Hage et al. Biorxiv 2024). Thus, it is interesting to ask whether the effects of stimulation were global and affected all movements, or were the effects primarily concerned with the task-relevant movements.

      If the schematic in Figure 1 is accurate, it is difficult for me to see how any of the reaching movements can be termed single joint. In the paper, T1 is labeled as a single joint, and T2-T4 are labeled as dual-joint. The authors should provide data to justify this.

      Because at least part of this work was previously analyzed and published, information should be provided regarding which data are new.

    2. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In a previous work, Prut and colleagues had shown that during reaching, high-frequency stimulation of the cerebellar outputs resulted in reduced reach velocity. Moreover, they showed that the stimulation produced reaches that deviated from a straight line, with the shoulder and elbow movements becoming less coordinated. In this report, they extend their previous work by the addition of modeling results that investigate the relationship between the kinematic changes and torques produced at the joints. The results show that the slowing is not due to reductions in interaction torques alone, as the reductions in velocity occur even for movements that are single joints. More interestingly, the experiment revealed evidence for the decomposition of the reaching movement, as well as an increase in the variance of the trajectory.

      Strengths:

      This is a rare experiment in a non-human primate that assessed the importance of cerebellar input to the motor cortex during reaching.

      Weaknesses:

      My major concerns are described below.

      If I understand the task design correctly, the monkeys did not need to stop their hand at the target. I think this design may be suboptimal for investigating the role of the cerebellum in control of reaching because a number of earlier works have found that the cerebellum's contributions are particularly significant as the movement ends, i.e., stopping at the target. For example, in mice, interposed nucleus neurons tend to be most active near the end of the reach that requires extension, and their activation produces flexion forces during the reach (Becker and Person 2019). Indeed, the inactivation of interposed neurons that project to the thalamus results in overshooting of reaching movements (Low et al. 2018). Recent work has also found that many Purkinje cells show a burst-pause pattern as the reach nears its endpoint, and stimulation of the mossy fibers tends to disrupt endpoint control (Calame et al. 2023). Thus, the fact that the current paper has no data regarding endpoint control of the reach is puzzling to me.

      We appreciate the reviewer’s point that cerebellar contributions can be particularly critical near the endpoint of a reach. In our current task design, monkeys were indeed required to hold at the target briefly (100 ms for Monkeys S and P, and 150 ms for Monkeys C and M) before receiving a reward. However, given the size of the targets and the velocity of movements, the monkey often did not have to stop its movement to obtain a reward. Importantly, we relaxed the task’s requirements (by increasing target size and reducing temporal constraints) to allow monkeys to perform the task under cerebellar block conditions, as we found that the strict criteria yielded a low success rate in these conditions. This design is suboptimal for studying endpoint accuracy which, as we now appreciate, is an important aspect of cerebellar control. In our revision, we will clarify these aspects of the task design and acknowledge that it is suboptimal for examining the role of the cerebellum in endpoint control. Future studies will address this point explicitly.

      Because stimulation continued after the cursor had crossed the target, it is interesting to ask whether this disruption had any effects on the movements that were task-irrelevant. The reason for asking this is because we have found that whereas during task-relevant eye or tongue movements the Purkinje cells are strongly modulated, the modulations are much more muted when similar movements are performed but are task-irrelevant (Pi et al., PNAS 2024; Hage et al. Biorxiv 2024). Thus, it is interesting to ask whether the effects of stimulation were global and affected all movements, or were the effects primarily concerned with the task-relevant movements.

      This is a very interesting suggestion. Although our main analysis focused on target-directed reaching movements, we have the data for the between-trial movements under continuous stimulation (e.g., return to center movements). In our revised supplementary material, we will examine the effect of cerebellar block on endpoint velocities in inter-trial movements versus task-related movements.

      If the schematic in Figure 1 is accurate, it is difficult for me to see how any of the reaching movements can be termed single joint. In the paper, T1 is labeled as a single joint, and T2-T4 are labeled as dual-joint. The authors should provide data to justify this.

      The reviewer is right: movements to all targets engage both the shoulder and the elbow, but the participation of each joint varied in a target-specific manner. In the manuscript, we used the term “single-joint” to indicate a target direction in which one joint remains stationary, resulting in minimal coupling torque at the adjacent joint. Specifically, for Targets 1 and 5 in our experiments, the net torque (and thus acceleration) at the elbow was negligible, and hence the shoulder experienced correspondingly low coupling torque (as illustrated in Figure 3c of our manuscript). To avoid confusion, we will use the term ‘predominantly single-joint’ movements in our revised manuscript to indicate targets with low coupling torques. We will also include an additional figure in the revised supplementary material displaying the net torques at the shoulder and elbow, similar to Figures 2c and 3c. Our goal is to demonstrate that movements to Targets 1 and 5 are characterized by predominantly one-joint engagement (i.e., the elbow is stationary with low net torque) and low coupling torques, rather than implying a purely isolated, single-joint motion.
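
      The link between a stationary elbow and low coupling torque at the shoulder can be illustrated with the textbook equations of motion for a planar two-link arm. The sketch below is an assumption-laden illustration, not the inverse-dynamics model used in the manuscript: the segment parameters are hypothetical, gravity is ignored, and "coupling torque" is taken here to mean the terms of the shoulder torque equation that depend on elbow velocity or acceleration.

```python
import math

# Hypothetical segment parameters (illustrative only, not measured values).
M2 = 0.2    # forearm mass (kg)
L1 = 0.15   # upper-arm length (m)
R2 = 0.08   # distance from elbow to forearm center of mass (m)
I2 = 0.001  # forearm moment of inertia about its center of mass (kg m^2)


def shoulder_coupling_torque(theta2, dtheta1, dtheta2, ddtheta2):
    """Terms of the planar two-link shoulder torque equation that depend on elbow motion.

    theta2   : elbow angle (rad)
    dtheta1  : shoulder angular velocity (rad/s)
    dtheta2  : elbow angular velocity (rad/s)
    ddtheta2 : elbow angular acceleration (rad/s^2)
    """
    # Torque at the shoulder caused by elbow angular acceleration.
    inertial = (I2 + M2 * (R2**2 + L1 * R2 * math.cos(theta2))) * ddtheta2
    # Velocity-dependent (Coriolis and centripetal) terms involving elbow velocity.
    velocity = -M2 * L1 * R2 * math.sin(theta2) * (2.0 * dtheta1 * dtheta2 + dtheta2**2)
    return inertial + velocity


# Elbow held stationary: every coupling term vanishes.
stationary = shoulder_coupling_torque(math.pi / 2, 1.0, 0.0, 0.0)
# Elbow moving: the same shoulder motion now comes with a nonzero coupling torque.
moving = shoulder_coupling_torque(math.pi / 2, 1.0, 2.0, 5.0)
```

      Under these assumptions, any shoulder motion with a motionless elbow produces zero coupling torque at the shoulder, which is the sense in which such movements can be called predominantly single-joint.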

      Because at least part of this work was previously analyzed and published, information should be provided regarding which data are new.

      We will include a clear statement in the Methods section specifying which components of the dataset and analyses are entirely new. While some of the same animals and the stimulation protocol were presented in prior work, the inverse-dynamics modeling, the analyses of progressive movement changes across trials under stimulation, and the invariance of motor noise to movement velocity are newly reported in this manuscript.

      Reviewer #2 (Public review):

      This manuscript asks an interesting and important question: what part of 'cerebellar' motor dysfunction is an acute control problem vs a compensatory strategy to the acute control issue? The authors use a cerebellar 'blockade' protocol, consisting of high-frequency stimuli applied to the cerebellar peduncle which is thought to interfere with outflow signals. This protocol was applied in monkeys performing center-out reaching movements and has been published from this laboratory in several preceding studies. I found the take-home-message broadly convincing and clarifying - that cerebellar block reduces muscle activation acutely particularly in movements that involve multiple joints and therefore invoke interaction torques, and that movements progressively slow down to in effect 'compensate' for these acute tone deficits. The manuscript was generally well written, and the data was clear, convincing, and novel. My comments below highlight suggestions to improve clarity and sharpen some arguments.

      Primary comments:

      (1) Torque vs. tone: Is it known whether this type of cerebellar blockade is reducing muscle tone or inducing any type of acute co-contraction that could influence limb velocity through mechanisms different than 'atonia'? If so, the authors should discuss this information in the discussion section starting around line 336, and clarify that this motivates (if it does) the focus on 'torques' rather than muscle activation. Relatedly, besides the fact that there are joints involved, is there a reason there is so much emphasis on torque per se? If the muscle is deprived of sufficient drive, it would seem that it would be more straightforward to conceptualize the deficit as one of insufficient timed drive to a set of muscles than joint force. Some text better contextualizing the choices made here would be sufficient to address this concern. I found statements like those in the introduction "hand velocity was low initially, reflecting a primary muscle torque deficit" to be lacking in substance. Either that statement is self-evident or the alternative was not made clear. Finally, emphasize that it is a loss of self-generated torque at the shoulder that accounts for the velocity deficits. At times the phrasing makes it seem that there is a loss of some kind of passive torque.

      We appreciate the reviewer’s emphasis on distinguishing reduced muscle tone and altered co-contraction patterns as possible explanations for decreased limb velocity. Our focus on torques arises from previous studies suggesting that the core deficit in cerebellar ataxia is impaired prediction of coupling torques. This point will be added in the discussion section of our revised manuscript where we will explain why we prioritize muscle torques and how muscle-level activation collectively contributes to net joint torques. Also, we will underscore that the observed velocity deficits primarily reflect a reduction of self-generated torque at the shoulder (whether acute or adaptive), rather than any reduction in passive torques.

      (2) Please clarify some of the experimental metrics: Ln 94 RESULTS. The success rate is used as a primary behavioral readout, but what constitutes success is not clearly defined in the methods. In addition to providing a clear definition in the methods section, it would also be helpful for the authors to provide a brief list of criteria used to determine a 'successful' movement in the results section before the behavioral consequences of stimulation are described. In particular, the time and positional error requirements should be clear.

      Successful trials were trials in which monkeys did not leave the center position before the go signal and reached the peripheral target within specific time criteria. These criteria varied across monkeys. We will include detailed definitions of our success criteria in the revised methods section of our manuscript. Specifically, we will update our methods section to include (i) the timing criteria of each phase of the trials and (ii) the size of the peripheral targets indicating the tolerance for endpoint accuracy.

      (3) Based on the polar plot in Figure 1c, it seemed odd to consider Targets 1-4 outward and 5-8 inward movements, when 1 and 5 are side-to-side. Is there a rationale for this grouping or might results be cleaner by cleanly segregating outward (targets 2-4) and inward (targets 6-8) movements? Indeed, by Figure 3 where interaction torques are measured, this grouping would seem to align with the hypothesis much more cleanly since it is with T2,T3,and T4 where clear coupling torques deficits are seen with cerebellar block.

      We acknowledge the reviewer’s observation regarding Targets 1 and 5 being side-to-side rather than strictly “outward” or “inward.” In the first section of our results, we grouped the targets in this way to emphasize the notably stronger effect of the cerebellar block on targets involving shoulder flexion (‘outward’) as compared to those involving shoulder extension (‘inwards’). For subsequent analyses we focused on the effects of cerebellar block on outward targets where movements were single-joint (Target 1) vs. multi-joint (Targets 2-4). To clarify this aspect, in our revised manuscript we will explain the rationale for grouping T1–T4 as “outward” and T5–T8 as “inward,” including how we defined them.

      (4) I did not follow Figure 3d. Both the figure axis labels and the description in the main text were difficult to follow. Furthermore, the color code per animal made me question whether the linear regression across the entire dataset was valid, or would be better performed within animal, and the regressions summarized across animals. The authors should look again at this section and figure.

      We will revise the figure labels and legend to clarify how each axis is defined. Please note that the data were pooled only after confirming that data from each animal showed a similar trend. Specifically, the correlation coefficients were positive in all 4 monkeys and statistically significant in 3 of them. Moreover, following the reviewers’ feedback, we also performed a partial correlation analysis (which controls for the variability across monkeys) and found a significant correlation (r = 0.33, p < 0.001). These points will be described in the revised manuscript.
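
      One common way to compute a correlation that controls for differences between animals is to remove each animal's mean from both variables and then correlate the residuals. The sketch below assumes this approach; the response does not state how the partial correlation was actually computed, and the function name and toy data are hypothetical.

```python
import numpy as np


def partial_corr_by_group(x, y, group):
    """Correlate x and y after removing each group's mean from both variables.

    This is equivalent to a partial correlation that controls for group
    identity (here, the animal) treated as a categorical covariate.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    group = np.asarray(group)
    x_res = x.copy()
    y_res = y.copy()
    for g in np.unique(group):
        mask = group == g
        x_res[mask] -= x[mask].mean()  # remove the between-animal offset in x
        y_res[mask] -= y[mask].mean()  # remove the between-animal offset in y
    return float(np.corrcoef(x_res, y_res)[0, 1])


# Toy data (made up): each "animal" shows a positive trend, but the offsets
# between animals make the pooled correlation negative.
x = [0, 1, 2, 10, 11, 12]
y = [10, 11, 12, 0, 1, 2]
animal = ["A", "A", "A", "B", "B", "B"]

pooled_r = float(np.corrcoef(x, y)[0, 1])
partial_r = partial_corr_by_group(x, y, animal)
```

      In this toy case the pooled correlation is negative while the within-animal correlation is positive, which is the kind of between-animal bias the reviewer was concerned about.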

      (5) Line 206+ The rationale for examining movement decomposition with a cerebellar block is presented as testing the role of the cerebellum in timing. Yet it is not spelled out what movement decomposition and trajectory variability have to do with motor timing per se.

      The reviewer is right: the relations between timing, decomposition and variability need to be presented explicitly. In our revision, we will explain how decomposed movements may reflect impaired temporal coordination across multiple joints, a critical cerebellar function. We will also clarify how increased variability in joint coordination can result in increased trial-to-trial variability of trajectories.

      Reviewer #3 (Public review):

      Summary:

      In their manuscript, "Disentangling acute motor deficits and adaptive responses evoked by the loss of cerebellar output," Sinha and colleagues aim to identify distinct causes of motor impairments seen when perturbing cerebellar circuits. This goal is an important one, given the diversity of movement-related phenotypes in patients with cerebellar lesions or injuries, which are especially difficult to dissect given the chronic nature of the circuit damage. To address this goal, the authors use high-frequency stimulation (HFS) of the superior cerebellar peduncle in monkeys performing reaching movements. HFS provides an attractive approach for transiently disrupting cerebellar function previously published by this group. First, they found a reduction in hand velocities during reaching, which was more pronounced for outward versus inward movements. By modeling inverse dynamics, they find evidence that shoulder muscle torques are especially affected. Next, the authors examine the temporal evolution of movement phenotypes over successive blocks of HFS trials. Using this analysis, they find that in addition to the acute, specific effects on muscle torques in early HFS trials, there was an additional progressive reduction in velocity during later trials, which they interpret as an adaptive response to the inability to effectively compensate for interaction torques during cerebellar block. Finally, the authors examine movement decomposition and trajectory, finding that even when low-velocity reaches are matched to controls, HFS produces abnormally decomposed movements and higher than expected variability in trajectory.

      Strengths:

      Overall, this work provides important insight into how perturbation of cerebellar circuits can elicit diverse effects on movement across multiple timescales.

      The HFS approach provides temporal resolution and enables analysis that would be hard to perform in the context of chronic lesions or slow pharmacological interventions. Thus, this study describes an important advance over prior methods of circuit disruption, and their approach can be used as a framework for future studies that delve deeper into how additional aspects of sensorimotor control are disrupted (e.g., response to limb perturbations).

      In addition, the authors use well-designed behavioral approaches and analysis methods to distinguish immediate from longer-term adaptive effects of HFS on behavior. Moreover, inverse dynamics modeling provides important insight into how movements with different kinematics and muscle dynamics might be differentially disrupted by cerebellar perturbation.

      Weaknesses:

      The argument that there are acute and adaptive effects to perturbing cerebellar circuits is compelling, but there seems to be a lost opportunity to leverage the fast and reversible nature of the perturbations to further test this idea and strengthen the interpretation. Specifically, the authors could have bolstered this argument by looking at the effects of terminating HFS - one might hypothesize that the acute impacts on muscle torques would quickly return to baseline in the absence of HFS, whereas the longer-term adaptive component would persist in the form of aftereffects during the 'washout' period. As is, the reversible nature of the perturbation seems underutilized in testing the authors' ideas.

      We agree that our approach could more explicitly exploit the rapid reversibility of high-frequency stimulation (HFS) by examining post-stimulation ‘washout’ periods. However, for the present dataset, we ended the session after the set of cerebellar block trials. We plan to study the effect of cerebellar block on immediate post-block washout trials in the future.  

      The analysis showing that there is a gradual reduction in velocity during what the authors call an adaptive phase is convincing. That said, the argument is made that this is due to difficulty in compensating for interaction torques. Even if the inward targets (i.e., targets 6-8) do not show a deficit during the acute phase, these targets still have significant interaction torques (Figure 3c). Given the interpretation of the data as presented, it is not clear why disruption of movement during the adaptive phase would not be seen for these targets as well since they also have large interaction torques. Moreover, it is difficult to delve into this issue in more detail, as the analyses in Figures 4 and 5 omit the inward targets.

      The reviewer is right: movements to Targets 6–8 (inward) were seemingly unaffected despite also involving significant interaction torques. In fact, we attempted to address this issue in the discussion section of version 1 of our manuscript. Specifically, we note that while outward targets (2–4) tend to involve higher coupling torque impulses on average, this alone does not fully explain the differential impact of cerebellar block, as illustrated by discrepancies at the individual target level (e.g., target 7 vs. target 1). We proposed two possible explanations: (1) a bias toward shoulder flexion in the effect of cerebellar block, consistent with earlier studies showing ipsilateral flexor activation or tone changes following stimulation or lesioning of the deep cerebellar nuclei; and (2) a posture-related facilitation of inward (shoulder extension) movements from the central starting position.

      The text in the Introduction and in the prior work developing the HFS approach overstates the selectivity of the perturbations. First, there is an emphasis on signals transmitted to the neocortex. As the authors state several times in the Discussion, there are many subcortical targets of the cerebellar nuclei as well, and thus it is difficult to disentangle target-specific behavioral effects using this approach. Second, the superior cerebellar peduncle contains both cerebellar outputs and inputs (e.g., spinocerebellar). Therefore, the selectivity in perturbing cerebellar output feels overstated. Readers would benefit from a more agnostic claim that HFS affects cerebellar communication with the rest of the nervous system, which would not affect the major findings of the study.

      The reviewer is right that the superior cerebellar peduncle carries both descending and ascending fibers, and that cerebellar nuclei project to subcortical as well as cortical targets. However, it is also important to note that in primates the cerebello-thalamo-cortical (CTC) pathway greatly expanded (at the expense of the cerebello-rubro-spinal tract) in mediating cerebellar control of voluntary movements (Horne and Butler, 1995). The cerebello-subcortical pathways lost their importance over the course of evolution (Nathan and Smith, 1982, Padel et al., 1981, ten Donkelaar, 1988). In our previous study we found that the ascending spinocerebellar axons which enter the cerebellum through the SCP are weakly task-related and the descending system is quite small (Cohen et al., 2017). However, we cannot rule out an effect of HFS mediated in part through other systems. In the revised introduction section, we will clarify this point and use more careful language about the scope of our stimulation, emphasizing that HFS disrupts cerebellar communication broadly, rather than solely the cerebello-thalamo-cortical pathway.

      The text implies that increased movement decomposition and variability must be due to noise. However, this assumption is not tested. It is possible that the impairments observed are caused by disrupted commands, independent of whether these command signals are noisy. In other words, commands could be low noise but still faulty.

      We recognize the reviewer’s concern about linking movement decomposition and trial-to-trial trajectory variability with motor noise. As presented in our discussion section, we interpret these motor abnormalities as a form of motor noise in the sense that they are generated by faulty motor commands. We base this interpretation on previous research showing that the cerebellum aids in the state estimation of the limb and the subsequent generation of accurate feedforward commands. Therefore, disruption of the cerebellar output may lead to faulty motor commands resulting in the observed asynchronous joint activations (i.e., movement decomposition) and unpredictable trajectories (i.e., increased trial-to-trial variability). Both observed deficits resemble increased motor noise.

      Throughout the text, the use of the term 'feedforward control' seems unnecessary. To dig into the feedforward component of the deficit, the authors could quantify the trajectory errors only at the earliest time points (e.g., in Figure 5d), but even with this analysis, it is difficult to disentangle feedforward- and feedback-mediated effects when deficits are seen throughout the reach. While outside the scope of this study, it would be interesting to explore how feedback responses to limb perturbation are affected in control versus HFS conditions. However, as is, these questions are not explored, and the claim of impaired feedforward control feels overstated.

      We agree that to strictly focus on feedforward control, we could have examined the measured variables in the first 50-100 ms of the movement which has been shown to be unaffected by feedback responses (Pruszynski et al. 2008, Todorov and Jordan 2002, Pruszynski and Scott 2012, Crevecoeur et al. 2013). However, in our task the amplitude of movements made by our monkeys was small and therefore the response measures we used were too small in the first 50-100 ms for a robust estimation. Also, fixing a time window led to an unfair comparison between control and cerebellar block trials, in which velocity was significantly reduced and therefore movement time was longer. Therefore, we used the peak velocity, torque-impulse at the peak velocity and maximum deviation of the hand trajectory as response measures. We will acknowledge this point in the discussion section of our revised manuscript. We will also tone down references to feedforward control throughout the text of our revised manuscript as suggested by the reviewer.

      The terminology 'single-joint' movement is a bit confusing. At a minimum, it would be nice to show kinematics during different target reaches to demonstrate that certain targets are indeed single joint movements. More of an issue, however, is that it seems like these are not actually 'single-joint' movements. For example, Figure 2c shows that target 1 exhibits high elbow and shoulder torques, but in the text, T1 is described as a 'single-joint' reach (e.g. lines 155-156). The point that I think the authors are making is that these targets have low interaction torques. If that is the case, the terminology should be changed or clarified to avoid confusion.

      Indeed, as Reviewer #1 also noted, movements to Targets 1 and 5 are not purely single-joint but rather have relatively low coupling torques. Our intention in using the term “single-joint” was to indicate a target direction in which one joint remains stationary, resulting in minimal coupling torque at the adjacent joint. Specifically, for Targets 1 and 5 in our experiments, the net torque (and thus acceleration) at the elbow was negligible, and hence the shoulder experienced correspondingly low coupling torque (as illustrated in Figure 3c of our manuscript). To avoid confusion, we will use the term ‘predominantly single-joint’ movements in our revised manuscript to indicate targets with low coupling torques. We will also include an additional figure in the revised supplementary material displaying the net torques at the shoulder and elbow, similar to Figures 2c and 3c. Our goal is to demonstrate that movements to Targets 1 and 5 are characterized by predominantly one-joint engagement (i.e., the elbow is stationary with low net torque) and low coupling torques, rather than implying a purely isolated, single-joint motion.

      The labels in Figure 3d are confusing and could use more explanation in the figure legend.

      In Figure 3d, it is stated that data from all monkeys is pooled. However, if there is a systematic bias between animals, this could generate spurious correlations. Were correlations also calculated for each animal separately to confirm the same trend between velocity and coupling torques holds for each animal?

      We will revise the figure legend and main-text explanation for Figure 3d. Please note that the data were pooled only after confirming that data from each animal showed a similar trend. Specifically, the correlation coefficients were positive in all 4 monkeys and statistically significant in 3 of them. Moreover, following the reviewers’ feedback, we also performed a partial correlation analysis (which controls for the variability across monkeys) and found a significant correlation (r = 0.33, p < 0.001). These points will be described in the revised manuscript.

      In Table S1, it would be nice to see target-specific success rates. The data would suggest that targets with the highest interaction torques will have the largest reduction in success rates, especially during later HFS trials. Is this the case?

      We will provide a breakdown of the success rates as a function of target. However, one should note that success or failure may depend on several factors beyond impaired limb dynamics. In a previous study (Nashef et al. 2019) we identified several causes of failure, such as (i) not entering the central target in time, (ii) moving out too early from the peripheral target, (iii) a reaction time longer than permitted, or (iv) premature exit from the central target before permitted.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors Eapen et al. investigated the peptide inhibitors of Cdc20. They applied a rational design approach, substituting residues found in the D-box consensus sequences to better align the peptides with the Cdc20-degron interface. In the process, the authors designed and tested a series of more potent binders, including ones that contain unnatural amino acids, and verified binding modes by elucidating the Cdc-20-peptide structures. The authors further showed that these peptides can engage with Cdc20 in the cellular context, and can inhibit APC/C<sup>Cdc20</sup> ubiquitination activity. Finally, the authors demonstrated that these peptides could be used as portable degron motifs that drive the degradation of a fused fluorescent protein.

      Strengths:

      This manuscript is clear and straightforward to follow. The investigation of different peptide variations was comprehensive and well-executed. This work provided the groundwork for the development of peptide drug modalities to inhibit degradation or apply peptides as portable motifs to achieve targeted degradation, both of which are impactful.

      Weaknesses:

      A few minor comments:

      (1) In my opinion, more attention to the solubility issue needs to be discussed and/or tested. On page 10, what is the solubility of D2 before a modification was made? The authors mentioned that position 2 is likely solvent exposed, it is not immediately clear to me why the mutation made was from one hydrophobic residue to another. What was the level of improvement in solubility? Are there any affinity data associated with the peptide that differ with D2 only at position 2?

      The reviewer is correct that we have not done any detailed solubility characterisation; we refer only to observations rather than quantitative analysis. We wrote that we reverted from Leu to Ala due to solubility - we will clarify this statement to say that we reverted to Ala, as it was the residue present in D1, for which we observed a measurable affinity by SPR and saw a concentration-dependent response in the thermal shift analysis. We do not have any peptides or affinity data that explore single-site mutations with the parental peptide of D2. D2 is included in the paper because of its link to the consensus D-box sequence and thus was the logical path to the investigations into positions 3 and 7 that come later in the manuscript.

      (2) I'm not entirely convinced that the D19 density not observed in the crystal structure was due to crystal packing. This peptide is peculiar as it also did not induce any thermal stabilization of Cdc20 in the cellular thermal shift assay. Perhaps the binding of this peptide could be investigated in more detail (i.e., NMR?) Or at least more explanation could be provided.

      This section will be clarified. The lack of observed density was likely due to the relatively low affinity of D19 and also to the lack of binding of the three C-terminal residues in the crystal, and consequently it has a further reduced affinity. The current wording in the manuscript puts greater emphasis on this second aspect being a D19-specific issue, even though it applies to all four soaked peptides. The extent of peptide-induced thermal stabilisations observed by TSA and CETSA is different, with the latter experiment consistently showing smaller shifts. This observation may be due to the more complex medium (cell lysate vs. purified protein) and/or different concentrations of the proteins in solution. In the CETSA, we over-expressed a HiBiT-tagged Cdc20, which is present in addition to any endogenously expressed Cdc20. Although we did not investigate it, the near identical D-box binding sites on Cdc20 and Cdh1 would suggest that there will be cross-specificity, which could further influence the CETSA experiments.

      Reviewer #2 (Public review):

      Summary:

      The authors took a well-characterised (partly by them), important E3 ligase, in the anaphase-promoting complex, and decided to design peptide inhibitors for it based on one of the known interacting motifs (called D-box) from its substrates. They incorporate unnatural amino acids to better occupy the interaction site, improve the binding affinity, and lay foundations for future therapeutics - maybe combining their findings with additional target sites.

      Strengths:

      The paper is mostly strengths - a logical progression of experiments, very well explained and carried out to a high standard. The authors use a carefully chosen variety of techniques (including X-ray crystallography, multiple binding analyses, and ubiquitination assays) to verify their findings - and they impressively achieve their goals by honing in on tight-binders.

      Weaknesses:

      Some things are not explained fully and it would be useful to have some clarification. Why did the authors decide to model their inhibitors on the D-box motif and not the other two SLiMs that they describe?

      For completeness, in addition to the D-box we did originally construct peptides based on the ABBA and KEN-box motifs, but they did not show any shift in melting temperature of Cdc20 in the thermal shift assay whereas the D-box peptides did; consequently, we focused our efforts on the D-box peptides. Moreover, there is much evidence from the literature that points to the unique importance of the D-box motif in mediating productive interactions of substrates with the APC/C (i.e. those leading to polyubiquitination & degradation). One of the clearest examples is a study by Mark Hall’s lab (described in Qin et al. 2016), which tested the degradation of 15 substrates of yeast APC/C in strains carrying alleles of Cdh1 in which the docking sites for D-box, KEN or ABBA were mutated. They observed that whereas degradation of all 15 substrates depended on D-box binding, only a subset required the KEN binding site on Cdh1 and only one required the ABBA binding site. A more recent study from David Morgan’s lab (Hartooni et al. 2022) looking at binding affinities of different degron peptides concluded that the KEN motif has very low affinity for Cdc20 and is unlikely to mediate degradation of APC/C-Cdc20 substrates. Engagement of substrate with the D-box receptor is therefore the most critical event mediating APC/C activity and the interaction that needs to be blocked for most effective inhibition of substrate degradation.

      What exactly do they mean when they say their 'observation is consistent with the idea that high-affinity binding at degron binding sites on APC/C, such as in the case of the yeast 'pseudo-substrate' inhibitor Acm1, acts to impede polyubiquitination of the bound protein'? It's an interesting thing to think about, and probably the paper they cite explains it more but I would like to know without having to find that other paper.

      Interesting results from a number of labs (Choi et al. 2008, Enquist-Newman et al. 2008, Burton et al. 2011, Qin et al. 2019) have shown that mutations of degron SLiMs in Acm1 that weaken interaction with the APC/C have the unexpected consequence of converting Acm1 from APC/C inhibitor to APC/C substrate. A necessary conclusion of these studies is that the outcome of degron binding (i.e. whether the binder functions as substrate or inhibitor) depends on factors other than D-box affinity and that D-box affinity can counteract them. One idea is that if a binder interacts too tightly, this removes some flexibility required for the polyubiquitination process. The most recent study on this question (Qin et al. 2019) specifically pins the explanation for the inhibitory function of the high affinity D-box in Acm1 on its ‘D-box Extension’ (i.e. residues 8-12) preventing interaction with APC10. In our current study, the binding affinity of peptides is measured against Cdc20. In cellular assays, however, the D-box must also engage APC10 for degradation to occur. It may be that the peptide binding most strongly to the D-box pocket on Cdc20 is less able to bind to APC10 and therefore less effective in triggering APC10-dependent steps in the polyubiquitination pathway. The important Hartooni et al. paper from David Morgan’s lab confirms that even though the binding of D-box residues to APC10 is very weak on its own, it can contribute a 100X increase in the affinity of a peptide by adding cooperativity to the interaction of the D-box with the co-activator.

      After further reading on this topic, we will modify the relevant piece of text from:

      “However, we found the opposite effect: D2 and D3 showed increased rates of mNeon degradation compared to D1 and D19 (Fig. 8C,D). This observation is consistent with the idea that high-affinity binding at degron binding sites on APC/C, such as in the case of the yeast ‘pseudo-substrate’ inhibitor Acm1, acts to impede polyubiquitination of the bound protein (Qin et al. 2019). Indeed, there is no evidence that Hsl1, which is the highest affinity natural D-box (D1) used in our study, is degraded any more rapidly than other substrates of APC/C in yeast mitosis. As shown in Qin et al., mutation of the high affinity D-box in Acm1 converts it from inhibitor to substrate (Qin et al. 2019). Overall, our results support the conclusions that all the D-box peptides engage productively with the APC/C and that the highest affinity interactors act as inhibitors rather than functional degrons of APC/C.”

      to:

      “However, we found the opposite effect: D2 and D3 showed increased rates of mNeon degradation compared to D1 and D19 (Fig. 8C,D). This observation is consistent with conclusions from other studies that affinity of degron binding does not necessarily correlate with efficiency of degradation. Indeed, there is no evidence that Hsl1, which is the highest affinity natural D-box (D1) used in our study, is degraded any more rapidly than other substrates of APC/C in yeast mitosis. A number of studies of a yeast ‘pseudo-substrate’ inhibitor Acm1, have shown that mutation of the high affinity D-box in Acm1 converts it from inhibitor to substrate (Choi et al. 2008, Enquist-Newman et al. 2008, Burton et al. 2011) through a mechanism that governs recruitment of APC10 (Qin et al. 2019). Our study does not consider the contribution of APC10 to binding of our peptides to APC/C<sup>Cdc20</sup> complex, but since there is strong cooperativity provided by this additional interaction (Hartooni et al. 2022) we propose this as the critical factor in determining the ability of the different peptides to mediate degradation of associated mNeon.”

      Re Figure 6 and the fact that we did look at peptide binding in cells, these experiments were done in unsynchronised cells, so most Cdc20 would not be bound to APC/C.

      Reviewer #3 (Public review):

      Summary:

      Eapen and coworkers use a rational design approach to generate new peptide-inspired ligands at the D-box interface of cdc20. These new peptides serve as new starting points for blocking APC/C in the context of cancer, as well as manipulating APC/C for targeted protein degradation therapeutic approaches.

      Strengths:

      The characterization of new peptide-like ligands is generally solid and multifaceted, including binding assays, thermal stability enhancement in vitro and in cells, X-ray crystallography, and degradation assays.

      Weaknesses:

      One important finding of the study is that the strongest binders did not correlate with the fastest degradation in a cellular assay, but explanations for this behavior were not supported experimentally. Some minor issues regarding experimental replicates and details were also noted.

      Interesting results from a number of labs (Choi et al. 2008, Enquist-Newman et al. 2008, Burton et al. 2011, Qin et al. 2019) have shown that mutations of degron SLiMs in Acm1 that weaken interaction with the APC/C have the unexpected consequence of converting Acm1 from APC/C inhibitor to APC/C substrate. A necessary conclusion of these studies is that the outcome of degron binding (i.e. whether the binder functions as substrate or inhibitor) depends on factors other than D-box affinity and that D-box affinity can counteract them. One idea is that if a binder interacts too tightly, this removes some flexibility required for the polyubiquitination process. The most recent study on this question (Qin et al. 2019) specifically pins the explanation for the inhibitory function of the high affinity D-box in Acm1 on its ‘D-box Extension’ (i.e. residues 8-12) preventing interaction with APC10. In our current study, the binding affinity of peptides is measured against Cdc20. In cellular assays, however, the D-box must also engage APC10 for degradation to occur. It may be that the peptide binding most strongly to the D-box pocket on Cdc20 is less able to bind to APC10 and therefore less effective in triggering APC10-dependent steps in the polyubiquitination pathway. The important Hartooni et al. paper from David Morgan’s lab confirms that even though the binding of D-box residues to APC10 is very weak on its own, it can contribute a 100X increase in the affinity of a peptide by adding cooperativity to the interaction of the D-box with the co-activator.

      After further reading on this topic, we will modify the relevant piece of text from:

      “However, we found the opposite effect: D2 and D3 showed increased rates of mNeon degradation compared to D1 and D19 (Fig. 8C,D). This observation is consistent with the idea that high-affinity binding at degron binding sites on APC/C, such as in the case of the yeast ‘pseudo-substrate’ inhibitor Acm1, acts to impede polyubiquitination of the bound protein (Qin et al. 2019). Indeed, there is no evidence that Hsl1, which is the highest affinity natural D-box (D1) used in our study, is degraded any more rapidly than other substrates of APC/C in yeast mitosis. As shown in Qin et al., mutation of the high affinity D-box in Acm1 converts it from inhibitor to substrate (Qin et al. 2019). Overall, our results support the conclusions that all the D-box peptides engage productively with the APC/C and that the highest affinity interactors act as inhibitors rather than functional degrons of APC/C.”

      to:

      “However, we found the opposite effect: D2 and D3 showed increased rates of mNeon degradation compared to D1 and D19 (Fig. 8C,D). This observation is consistent with conclusions from other studies that affinity of degron binding does not necessarily correlate with efficiency of degradation. Indeed, there is no evidence that Hsl1, which is the highest affinity natural D-box (D1) used in our study, is degraded any more rapidly than other substrates of APC/C in yeast mitosis. A number of studies of a yeast ‘pseudo-substrate’ inhibitor Acm1, have shown that mutation of the high affinity D-box in Acm1 converts it from inhibitor to substrate (Choi et al. 2008, Enquist-Newman et al. 2008, Burton et al. 2011) through a mechanism that governs recruitment of APC10 (Qin et al. 2019). Our study does not consider the contribution of APC10 to binding of our peptides to APC/C<sup>Cdc20</sup> complex, but since there is strong cooperativity provided by this additional interaction (Hartooni et al. 2022) we propose this as the critical factor in determining the ability of the different peptides to mediate degradation of associated mNeon.”

      Re Figure 6 and the fact that we did look at peptide binding in cells, these experiments were done in unsynchronised cells, so most Cdc20 would not be bound to APC/C.

    1. Now, it is clear that the decline of a language must ultimately have political and economic causes: it is not due simply to the bad influence of this or that individual writer. But an effect can become a cause, reinforcing the original cause and producing the same effect in an intensified form, and so on indefinitely. A man may take to drink because he feels himself to be a failure, and then fail all the more completely because he drinks. It is rather the same thing that is happening to the English language. It becomes ugly and inaccurate because our thoughts are foolish, but the slovenliness of our language makes it easier for us to have foolish thoughts. The point is that the process is reversible. Modern English, especially written English, is full of bad habits which spread by imitation and which can be avoided if one is willing to take the necessary trouble. If one gets rid of these habits one can think more clearly, and to think clearly is a necessary first step toward political regeneration: so that the fight against bad English is not frivolous and is not the exclusive concern of professional writers. I will come back to this presently, and I hope that by that time the meaning of what I have said here will have become clearer. Meanwhile, here are five specimens of the English language as it is now habitually written.

      I've noticed some people struggling to read any of this (I can't either). However, from my understanding the point is this:

      We don't really use English in a proper way because our own minds have also been "corrupted". These bad thoughts feed into the degradation of our language, which feeds back into more bad thoughts.

      The way I summarized this is making me think Orwell might be right. -_-

    1. Latané and Darley’s original findings have been replicated in numerous studies. Increasing the number of bystanders inhibited helping behavior with many kinds of people, including children, college students, and future ministers (Darley & Batson, 1973; Latané & Nida, 1981; Plötner et al., 2015); in both small towns and large cities (Latané & Dabbs, 1975); in a variety of settings, such as psychology laboratories, city streets, and subway trains (Harrison & Wells, 1991; Latané & Darley, 1970; Piliavin & Piliavin, 1972); and with different kinds of emergencies, such as seizures, potential fires, fights, and accidents (Latané & Darley, 1968; Shotland & Straw, 1976; Staub, 1974), as well as with less-serious events such as having a flat tire (Hurley & Allen, 1974).

      One thing not mentioned in these discussions, though I imagine at least some of the researchers considered it: Fewer people helping when more bystanders are present may not be because everyone expects someone else to help. It may be because we as individuals do not like to stand out in a crowd, or act differently from others. Most of us are sensitive to what others will think of us. People inclined to help may be more self-secure and confident, or less concerned with public perceptions.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer 1

      Major issue #1. Regarding the conclusions on IRE1 signaling, both yeast species have different IRE1 activities (https://elifesciences.org/articles/00048), the total deletion of IRE1 in S. pombe appears to indicate that expansion of perinuclear ER is independent of IRE1, however since IRE1 signaling has exclusively a negative impact on mRNA expression, it might be relevant to identify mRNA whose expression is stabilized under those circumstances and evaluate whether those could confer a mechanism which would also yield perinuclear ER expansion (e.g. differential deregulation of ER stress controlled lipid biosynthesis required for lipid membrane synthesis). In S. cerevisiae, do the authors observe HAC1 mRNA splicing?

      We have not tested whether HAC1 mRNA is processed in S. cerevisiae. To address this question, we will perform RT-PCR to test it.

      In addition, as requested by the reviewers, we will further test the involvement of Ire1 in the HU/DIA-induced phenotype in S. pombe. For that, we will reassess our RNA-seq data and compare it with data from (Kimmig et al., 2012) (UPR activation in S. pombe). We will test the levels and splicing of mRNA of Bip1 upon HU/DIA treatments by RT-PCR and finally we will test the levels of Gas2p which has been described to decrease upon Ire1/UPR activation in S. pombe.

      We are confident that the results of these experiments and the re-analysis of our RNA-Seq data will help us to infer the mechanisms that modulate the ER response to HU or DIA treatment.

      Major issue #2. The authors indicate that HU and DIA lead to thiol stress, it might be relevant to evaluate the thiol-redox status of major secretory proteins in S. pombe (or even cargo reporters if necessary) to fully document the stress impact on global protein redox status.

      We agree with the reviewer that it is important to determine the redox and the functional state of the secretory pathway in our conditions to fully understand the cellular consequences of these treatments, especially in the case of HU, as it is routinely used in clinics.

      In this context, we have already included new data showing that HU or DIA treatment leads to alterations in the Golgi apparatus and in the distribution of secretory proteins (Figures 3A-B).

      In addition, we plan to perform mass spectrometry experiments to detect protein glutathionylation in our conditions, as it has been previously shown that DIA treatment leads to glutathionylation of key ER proteins such as Bip1, Pdi or Ero1 (Lind et al., 2002; Wang & Sevier, 2016), which might be reproduced upon HU treatment. We will test specifically the redox state of Bip1, Pdi and/or Ero1 by immunoprecipitation and western blot.

      Finally, we plan to test the folding and processing of specific secretory cargoes by western blot in our experimental conditions (See below, Reviewer 2, Major issue #1).

      What happens if HU-treated yeast cells are grown in the presence of n-acetyl cysteine?

      We have tested whether the addition of this antioxidant could prevent and/or revert the N-Cap phenotype. We found that NAC in combination with HU increased N-Cap incidence (Figure 5H). As NAC is a GSH precursor and we find that GSH is required to develop the phenotype of N-Cap (Figure 5A-B, D, G), this result further supports that the HU-induced cellular damage might involve ectopic glutathionylation of proteins.

      Unfortunately, we have not tested NAC in combination with DIA, as NAC seems to reduce DIA as soon as they come into contact, as judged by the change in the characteristic orange color of DIA; the same happens when we combine GSH and DIA (Supplementary Figure 5A-B).

      In this regard, the following information has been added to the manuscript (page 32-33, highlighted in blue):

      "We also tested GSH addition to the medium in combination with either HU or DIA. When mixed with DIA, we noticed that the color of the culture changed after GSH addition (Figure S5A), which suggests that GSH and DIA can interact extracellularly, thus preventing us from being able to draw conclusions from those experiments. On the other hand, combining GSH with HU increased N-Cap incidence (Figure 5G), as expected based on our previous observations. Additionally, we checked whether the addition of the antioxidant N-acetyl cysteine (NAC), a GSH precursor, impacted upon the N-Cap phenotype. The results were the same as with GSH addition: when combined with HU, NAC increased N-Cap incidence (Figure 5H), whereas in combination, the two compounds interacted extracellularly (Figure S5B). These data align with NAC being a precursor of GSH, as incrementing GSH levels augments the penetrance of the HU-induced phenotype".

      Major issue #3. The appearance of cytosolic aggregates is intriguing, do the authors have any idea on the nature of the protein aggregates?

      DIA is a strong oxidant, and HU treatment results in the production of reactive oxygen species (ROS). Therefore, one hypothesis would be that cytoplasmic chaperone foci represent oxidized and/or misfolded soluble proteins. Indeed, this hypothesis is supported by the appearance of cytoplasmic foci containing the guk1-9-GFP and Rho1.C17R-GFP soluble reporters of misfolding upon HU or DIA treatment (Figure 4I-J). We have already tested if they contain Vgl1, which is one of the main components of heat shock induced stress granules in S. pombe (Wen et al., 2010). However, we found that HU or DIA-induced foci lacked this stress granule marker, and indeed Vgl1 did not form any foci in response to these treatments. Therefore, our aggregates differ from the canonical stress-induced granules. We have yet to include this data in the manuscript, but we plan to do that for the final version.

      To further explore the nature of the cytoplasmic aggregates induced by HU and DIA, we will test whether Hsp104-containing foci colocalize with guk1-9-GFP and/or Rho1.C17R-GFP foci.

      Are those resulting from proficient retrotranslocation or reflux of misfolded proteins from the ER?

      To test whether these cytosolic aggregates result from retrotranslocation from the ER, we plan to use the vacuolar Carboxypeptidase Y mutant reporter CPY*, which is misfolded. This misfolded protein is imported into the ER lumen but does not reach the vacuole. Instead, it is retrotranslocated to the cytoplasm, where it is ubiquitinated and degraded by the proteasome (Mukaiyama et al., 2012). We will analyze by fluorescence microscopy the localization of CPY*-GFP and Hsp104-containing aggregates upon HU or DIA treatment and with or without proteasome inhibitors. We can also test the levels, processing and ubiquitination of CPY*-GFP by western blot, as ubiquitination of retrotranslocated proteins occurs once they are in the cytoplasm.

      Are those aggregates membrane bound or do they correspond to aggresomes as initially defined? The Walter lab has demonstrated a tight balance between ER phagy and ER membrane expansion (https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.0040423), which could also impact on the presence of protein aggregates in the cytosol.

      Our results suggest that these aggregates are not bound to ER membranes, as they do not appear in close proximity to the ER area marked by mCherry-AHDL in fluorescence microscopy images.

      To fully rule out this possibility, we will test whether these Hsp104-aggregates colocalize with ER transmembrane proteins such as Rtn1 or Yop1, with Gma12-GFP that marks the Golgi apparatus and with the dye FM4-64 that stains endosomal-vacuole membranes.

      We have tested whether deletion of key genes involved in autophagy affected the N-Cap phenotype. To this end, we used deletions of ypt1, vac8 and atg8 in strains expressing Cut11-GFP and/or mCherry-AHDL and found that none of them affected N-Cap formation. These data suggest that the core machinery of autophagy is not critical for HU/DIA-induced ER expansion. We plan to include this data in the final version of the manuscript along with the rest of experiments proposed.

      To get deeper insights and to fully rule out a possible contribution of macro-autophagy to the HU- and DIA-induced phenotypes, we plan to analyze by western blot whether GFP-Atg8 is induced and cleaved upon HU or DIA treatments which would be indicative of macroautophagy activation.

      To test whether the cytoplasmic aggregates are the result of an imbalance between ER-expansion and ER-phagy we plan to analyze the localization of GFP-Atg8 and Hsp104-RFP in the atg7Δ mutant, impaired in the core macro-autophagy machinery. In these conditions, the number or size of the cytoplasmic aggregates might be impacted.

      On the other hand, it has been recently shown that an ER-selective microautophagy occurs in yeasts upon ER stress (Schäfer et al., 2020; Schuck et al., 2014). This micro-ER-phagy involves the direct uptake of ER membranes into lysosomes, is independent of the core autophagy machinery and depends on the ESCRT system and is influenced by the Nem1-Spo7 phosphatase. ESCRT directly functions in scission of the lysosomal membrane to complete the uptake of the ER membrane. Interestingly, N-Caps are fragmented in the absence of cmp7 and especially in the absence of vps4 or lem2, the nuclear adaptor of the ESCRT (Figure 3E). We had initially interpreted these results as the need to maintain nuclear membrane identity during the process of ER expansion (Kume et al., 2019); however, the appearance of fragmented ER upon HU treatment in the absence of ESCRT might also be due to an inability to complete microautophagic uptake of ER membranes. To test this hypothesis, we plan to analyze whether the fragmented ER in these conditions co-localizes with lysosome/vacuole markers.

      Major issue #4. Nucleotide depletion was previously shown to lead to HSP16 expression through activation of the spc1 MAPK pathway (https://academic.oup.com/nar/article/29/14/3030/2383924), one might think that HU (or diamide) could lead to this through a nucleotide dependent mechanism and not necessary through a thiol-redox protein misfolding stress. This issue has to be sorted out to ensure that the HSP effect is independent of nucleotide depletion.

      As stated in (Taricani et al., 2001), hsp16 expression is strongly induced in a cdc22-M45 mutant background. We performed experiments in this mutant that were included in the original version of the manuscript and remain in the current version (Sup. Fig. 2C) and, under restrictive conditions, we do not see spontaneous N-Cap formation. If Hsp16 overexpression and nucleotide depletion were key to the mechanism triggering N-Cap appearance, we would expect this mutant to eventually form N-Caps when placed at restrictive temperature. Furthermore, Taricani et al. show that Hsp16 expression was abolished in a Δatf1 mutant background in the presence of HU, and we found that this mutant is still able to produce N-Caps in HU; therefore, our results strongly suggest that the N-Cap phenotype is independent of the MAPK pathway and of the expression of hsp16.

      Minor issues

      1. __P1 - UPR = Unfolded Protein Response:__ Corrected in the manuscript.
      2. __P22 - HSP upregulation "might" be indicative of a folding stress:__ Corrected in the manuscript.
      3. __The abstract does not reflect the findings presented in the manuscript. In addition, I would recommend the authors revise the storytelling in their manuscript to push forward the message on either the specific phenotype associated with perinuclear ER or on the characterization of protein misfolding stress.__ We have modified the abstract to better reflect our findings and will further revise our arguments in the final version of the manuscript once we have the results of the experiments proposed.

      Reviewer 2

      Major issue #1. The authors state the cytoplasmic and ER folding are both disrupted. The impact on ER protein biogenesis would be bolstered with some biochemical data focused on the folding of one or more nascent secretory proteins. Is disulfide bond formation and/or protein folding indeed disrupted?

      We have addressed the status of secretion in cells treated with HU or DIA by assessing the morphology of the Golgi apparatus and the localization of several secretory proteins by fluorescence microscopy and found that both HU and DIA treatments impact the secretion system. In addition, we plan on addressing the redox status of ER proteins (Bip1, Pdi or Ero1) by biochemical approaches. Please see the answer to major issue #2 from reviewer 1.

      We will also analyze by western blot the biogenesis and processing of the wildtype vacuolar Carboxypeptidase Y (Cpy1-GFP) and alkaline phosphatase (Pho8-GFP), two widely used markers to test the functionality of the ER/endomembrane system.

      Major issue #2. Increased signal of Bip1 in the expanded perinuclear ER is shown and is suggested as consistent with immobilization of BiP upon binding of misfolded proteins. The authors suggest that this increased signal must reflect Bip1 redistribution because "Bip1 levels are constant". Yet, the western image (Figure 4B) looks to show an increased level of Bip1 protein upon HU treatment. Given the abundance of Bip1 in cells, it seems possible that a two-fold increase in newly synthesized proteins in the perinuclear region may account for the increased signal. The original data cited by the authors use photobleaching (not just fluorescence intensity) to show a change in crowding / mobility, which the authors should consider to support their conclusion. Alternatively, a detected increased engagement of Bip1 with substrates (e.g. pulldown experiment) would be similarly strengthening.

      This same issue arose with reviewer 3, so we decided to change the image of the western blot showing another one with less exposure and added a quantification showing that Bip1-GFP levels remain mostly constant between control conditions and treatments with HU and DIA.

      We have also performed the suggested photobleaching experiment to analyze potential changes in crowding and mobility in Bip1-GFP upon HU treatment. We found that Bip1-GFP signal recovers after photobleaching the perinuclear ER in HU-treated cells that had not yet expanded the ER, showing that Bip1-GFP is dynamic in these conditions. However, Bip1-GFP signal did not recover after photobleaching the whole N-Cap in cells that had fully developed the expanded perinuclear ER phenotype, whereas it did recover when only half of the N-Cap region was bleached. This suggests that Bip1-GFP is mobile within the expanded perinuclear ER but cannot freely diffuse between the cortical and the perinuclear ER once the N-Cap is formed.

      These data have been included in the revised version of the manuscript, in Figure 4B and Sup. Figures 4A-B, and on page 23.

      Major issue #3. It is curious that cycloheximide (CHX) has a distinct impact on HU versus DIA treatment. Blocking protein synthesis with CHX exacerbates the phenotype with DIA, but not HU. The authors use the data with CHX to argue that their drug treatments are interfering with folding during synthesis and translation into the ER. If so, what is the rationale as to why CHX treatment decreases expansion upon HU treatment? Relatedly, is protein synthesis and/or ER import impacted upon treatment with HU and/or DIA?

      As all three reviewers had comments about the CHX and Pm-related data, we revised those experiments and noticed a phenotype occurring upon HU+CHX treatment that had gone unnoticed previously and that changed our understanding about the effect of these drugs on the ER. Briefly, we noticed that, although CHX treatment decreases the HU-induced expansion of the perinuclear ER, it indeed induced expansion but in this case in the cortical area of the ER. This means that the phenotype of ER expansion in HU is not being suppressed by addition of CHX, but rather taking place in another area of the ER (cortical ER). We do not understand why this happens; however, these results show that ER expansion is exacerbated both in DIA and HU when combined with CHX. We have included this data in Figures 3C-D and in page 22.

      We also examined the trafficking of secretory proteins that go from the ER to the cell tips and noticed that this transit was affected under both drugs (Figures 3A-B). This suggests that, although there is still protein synthesis when cells are exposed to the drugs (as can be seen by the higher levels of chaperones induced by both stresses (Figure 4C-E)), their protein synthesis capacity is possibly impaired to a certain degree. All this information is now included in the manuscript (page 19).

      Major issue #4. While the authors suggest that there is disulfide stress in the ER / nucleus, the redox environment in these compartments is not tested directly (only cytoplasmic probes).

      Although we have only included experiments using one redox sensor in the manuscript, we had tested the oxidation of several biosensors during HU and DIA exposure, monitoring cytoplasmic, mitochondrial and glutathione-specific probes. We have tried to use ER-directed probes; however, we have not been successful due to oversaturation of the probe in the highly oxidative environment of the ER lumen.

      Although so far we have not been able to directly test the redox status of the ER with optical probes, we plan to test the folding and redox status of several ER proteins and secretory markers by biochemical approaches, so hopefully these experiments will give us more information on this question (See answer to Reviewer 1, Main Issue #2 and Reviewer 2, Main issue #1).

      Major Issue #5. What do the authors envision is the role of the cytoplasmic chaperone foci? Do CHX / Pm treatment with HU/DIA reverse the chaperone foci?

      Pm causes premature termination of translation, leading to the release of truncated, misfolded, or incomplete polypeptides into the cytosol and the re-engagement of ribosomes in a new cycle of unproductive translation, as puromycin does not block ribosomes (Aviner, 2020; Azzam & Algranati, 1973). This is likely to decrease the number of peptides entering the ER that can be targeted by either HU or DIA, decreasing in turn ER expansion. Indeed, we have found that Pm treatment alone results in the formation of multiple cytoplasmic protein aggregates marked by Hsp104-GFP (Figure 4K), consistent with a continuous release of incomplete and misfolded nascent peptides to the cytoplasm. This would explain why Pm treatment suppresses N-Cap formation when cells are treated with either HU or DIA.

      To further test this idea, we plan to carefully analyze the number, size and dynamics of Hsp104-containing cytoplasmic aggregates in cells treated with HU or DIA and Pm, where N-Caps are suppressed. We expect to find an increase in the accumulation of proteotoxicity in the cytoplasm in these conditions.

      On the other hand, CHX inhibits translation elongation by stalling ribosomes on mRNAs, preventing further peptide elongation but leaving incomplete polypeptides tethered to the blocked ribosomes. This reduces overall protein load entering the ER by blocking new protein synthesis and stabilizes misfolded proteins bound to ribosomes. Accordingly, it has been shown previously that blocking translation with CHX abolishes protein aggregation (Cabrera et al., 2020; Zhou et al., 2014). Similarly, we have found that Hsp104 foci are not observed when we add CHX alone or in combination with HU or DIA (Figures 4K-L). These results suggest that cytoplasmic foci that we observe upon HU or DIA treatment likely contain misfolded proteins derived from ongoing translation.

      As this question has also been raised by reviewer 1, we have decided to further explore the nature of these cytoplasmic foci (please see answer to Reviewer1, Issue 3). Briefly:

      • We plan to test whether they colocalize with the foci of Guk1-9-GFP and Rho1.C17R-GFP reporters of misfolding that appear upon HU or DIA treatments.
      • We will test whether these foci are membrane bound.
      • We plan to test whether the cytoplasmic foci represent proteins retro-translocated from the ER.
      • We will also test whether autophagy or an imbalance between ER expansion and ER-phagy might contribute to the accumulation of cytoplasmic protein foci. The new data regarding the suppression of cytoplasmic foci by CHX treatment has already been included in the current version of the manuscript in Figure 4K and in the text (page 30).

      The authors argue that cytoplasmic foci are "independent" from ER expansion and are "not a direct consequence of thiol stress" based on the observation that DTT does not reverse these foci. This seems like a strong statement based on the limited analysis of these foci.

      We agree with the reviewer. We have toned down our statements about the relationship between thiol stress, the cytoplasmic chaperone foci and their relationship with ER expansion. We have removed from the text the statement that cytoplasmic foci are independent from ER expansion and thiol stress and have further revised our claims about CHX and Pm in the main text and the discussion to address these and the other reviewers' concerns.

      Major Issue #6. Based on the transcriptional data, the authors speculate on a potential role in iron-sulfur cluster protein biogenesis. This would seem to be rather straightforward to test.

      To address this issue, we plan to analyze the localization of proteins involved in iron-sulfur cluster assembly and/or containing iron-sulfur clusters by in vivo fluorescence microscopy, such as DNA polymerase Dna2 or Grx5, during HU or DIA treatments.

      Related to this, we have found that a subunit of the ribonucleotide reductase (RNR) aggregated in the cytoplasm upon HU exposure (Figure S2B). It is worth noting that RNR is an iron-containing protein whose maturation needs cytosolic Grxs (Cotruvo & Stubbe, 2011; Mühlenhoff et al., 2020). The catalytic site, the activity site (which governs overall RNR activity through interactions with ATP) and the specificity site (which determines substrate choice) are located in the R1 (Cdc22) subunits, which are the ones that aggregate, while the R2 subunits (Suc22) contain the di-nuclear iron center and a tyrosyl radical that can be transferred to the catalytic site during RNR activity (Aye et al., 2015). The fact that a subunit of RNR aggregates could be related to an impingement on its synthesis and/or maturation due to defects in iron-sulfur cluster formation, as it has been recently published that RNR cofactor biosynthesis shares components with cytosolic iron-sulfur protein biogenesis and that the iron-sulfur cluster assembly machinery is essential for iron loading and cofactor assembly in RNR in yeast (Li et al., 2017). This information has been added to the discussion.

      Major Issue #7. The authors suggest that "pre-treatment" with DTT before HU addition suppresses formation of the N-Caps. However, these samples (Figure 2J) contain DTT coincident with the treatment as well. To say it is the effect of pre-treatment, the DTT should be added and then washed out prior to HU or DIA addition. Alternatively, the language used to describe these experiments and their outcomes could be revised.

      We modified the language used to describe the experiment in the manuscript, as suggested by the reviewer, to clarify that N-Caps never form while DTT is kept in the medium. In addition, we have performed a true pre-treatment with DTT: we added 1 mM DTT one hour before HU addition, washed out the reducing agent, and then added HU to the medium. The result indicates that pre-treating cells with DTT significantly reduces N-Cap formation after a 4-hour incubation with HU, which suggests that triggering reducing stress "protects" cells from the oxidative damage induced by HU and DIA. This information has also been added to the manuscript (Figure 2J).

      Major Issue #8. For a manuscript with 128 references there is rather limited discussion of the data in the context of the wider literature. The discussion primarily focuses on a recap of the results. The authors do cite several prior works focused on redox-dependent nuclear expansion. However, while cited, there is no real discussion of the relationship between this work in the context of that previously published (including several known disulfide bonded proteins that are involved in nuclear/ER architecture).

      We have revised and expanded our discussion. In addition, in the final revision of our work we will increase the discussion in the context of the new results obtained.

      Minor points

      1. Figure numbering goes from figure 4 to S6 to 5.

      We have updated the numbering of the figures after merging several supplementary figures, so this issue is now fixed.

      It would be helpful to the reader to explain what some of the reporters are in brief. For example, Guk1-9-GFP and Rho1.C17R-GFP reporters.

      Guk1-9-GFP and Rho1.C17R-GFP are thermosensitive mutants of guanylate kinase and Rho1 GTPase, respectively, that have been previously used in S. pombe as soluble reporters of misfolding in conditions of heat stress. During mild heat shock, both mutants aggregate into reversible protein aggregate centers (Cabrera et al., 2020). This information has now been added to the manuscript.

      Supplementary Figure 3. The main text suggests panel 3A is focused on diamide treatment. The figure legend discusses this in terms of HU treatment. Which is correct?

      We thank the reviewer for pointing out this mistake. The experiment was performed in 75 mM HU, so the legend was correct. The main text has now been corrected in the manuscript.

      The authors use ref 110 and 111 to suggest the importance of UPR-independent signaling. However, they do not point out that this UPR-independent signaling referred to in these papers is dependent on the UPR transmembrane kinase IRE1.

      We have included pertinent clarification in the new discussion.

      Reviewer 3

      Major issue #1. It is hard to see how the claim of ER stress can be supported if BiP levels do not change (Fig. 4B). Also, this figure is overexposed. The RNA-seq data should be able to establish ER stress as well, but no rigorous analysis of ER stress markers is presented.

      Regarding the levels of Bip1, we now show in Figure 4 a less exposed image of the western blot, and a quantification of Bip1-GFP intensity from three independent experiments. We find that, in our experimental conditions, neither HU nor DIA treatments significantly altered Bip1 levels.

      With respect to the RNA-Seq, as we mentioned in the major issue 1 from reviewer 1, we plan to reassess our data to further clarify and add information about ER stress markers induced or repressed by HU and DIA. We also will test the levels of Bip1 and several UPR targets by RT-PCR and by western blot.

      Major issue #2. The interpretation of the CHX and puromycin experiments of Figure 3A-B is hard to follow. My best guess is that the authors argue that CHX decreases misfolded protein load and that puromycin increases misfolded protein load, and that since DIA is a stronger oxidative stress than HU, CHX is only protective under HU and not DIA. However, while CHX decreases misfolded protein load, puromycin hasn't been shown directly to increase it and I don't see how this explains puromycin being protective at all.

      We have found that puromycin treatment alone results in the formation of cytoplasmic foci containing Hsp104, suggesting that puromycin indeed increases folding stress in the cytoplasm. We have now included this data in Figure 4K (please see Main Issue #5 from Reviewer 2). Pm suppresses the formation of N-caps induced by HU or DIA; however, we have not addressed cell survival or fitness in these conditions and therefore we cannot conclude about being protective.

      In addition, upon the reevaluation of our data, we have realized that CHX treatment suppresses HU-induced perinuclear expansion, although it does not suppress but instead enhances ER expansion in the cortical region. This data has been added to the present version of the manuscript in Figure 3C-D (page 22).

      Furthermore, puromycin causes Ca leakage from the ER (which can be recapitulated with thapsigargin and blocked with anisomycin; easy experiments), which could be responsible for the differences from CHX, and the model does not address the effects on downstream stress signaling. The authors should be much more clear regarding their argument, since this data is used to support the argument of disrupted ER proteostasis.

      As the reviewer requested, we plan to test the effect of anisomycin. Thapsigargin has been described not to work in yeast, as they lack the SERCA-type Ca2+ pump that this drug targets (Strayle et al., 1999).

      Regarding the downstream effects of HU or DIA treatment on ER proteostasis, we plan to further explore the effect of these drugs on the secretory system (please see major issue #2 from Reviewer 1) and to evaluate the redox state and processing of several key ER and secretory proteins. We will further explore the nature of the aggregates that appear in the cytoplasm in our experimental conditions, which will also shed light into the downstream effects of these drugs in cytoplasmic proteostasis (please see answer to issue #5 from Reviewer 2).

      Major issue #3. The claim that a canonical UPR is not induced is weak. First, the transcriptional program of S. cerevisiae from Travers et al is used as the canonical UPR, and compared to HU/DIA induced stress in S. pombe. These organisms may not be similar enough to assume that they have transcriptionally identical UPRs. Second, no consideration is given to the mechanism by which the different transcripts are modulated between "canonical" and HU/DIA induced UPR. Is it solely through RIDD, or does it point to differences in sensing or signaling transduction?

      We plan on readdressing this topic by analyzing the genes that have been described to be differentially expressed during UPR activation in S. pombe and comparing them with our data, first by reevaluating our transcriptomic data and second by choosing Bip1 and some other of the differentially expressed genes in (Kimmig et al., 2012) (for example, Gas2, Pho1 or Yop1) and assessing by RT-PCR their mRNA levels in our experimental conditions. As an alternative approach, we will also analyse the levels of UPR targets by western blot upon HU or DIA treatment.

      We are confident that the results of these experiments and the re-analysis of our RNA-Seq data will allow us to infer the mechanisms that modulate the ER response to HU or DIA treatment.

      Finally, the p-values used are unadjusted (e.g. by Bonferroni's method or by ANOVA or at least controlled by an FDR approach) and unmodulated (extremely important when n = 3 and variance is poorly sampled), which makes them not dependable. It looks like HSF1 targets are induced, which should be addressed.
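
      One common FDR approach of the kind mentioned here is the Benjamini-Hochberg procedure. The sketch below is illustrative only and is not the pipeline used in the manuscript: the function names are hypothetical, the `alpha` cutoff of 0.05 is an assumption, and variance moderation is not covered.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_differentially_expressed(lfc, padj, lfc_cutoff=1.0, alpha=0.05):
    """|log2FC| >= lfc_cutoff and adjusted p-value below alpha.

    alpha=0.05 is an assumed threshold, used here for illustration only.
    """
    return abs(lfc) >= lfc_cutoff and padj < alpha


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # made-up p-values
    print(benjamini_hochberg(raw))
```

      In practice an established implementation would normally be preferred over hand-written code.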

      We thank the reviewer for pointing this out. We had omitted this information, which now appears in the M&M section as follows:

      "A gene was considered as differentially expressed when it showed an absolute value of log2FC (LFC) ≥ 1 and an adjusted p-value […]

      In this regard, we plan to perform proteome-wide mass spectrometry experiments to detect protein glutathionylation in our conditions, as it has been previously shown that DIA treatment leads to glutathionylation of key ER proteins such as Bip1, Pdi or Ero1 (Lind et al., 2002; Wang & Sevier, 2016), which might be reproduced upon HU treatment. We will also test specifically the redox state of Bip1, Pdi and/or Ero1 by immunoprecipitation and western blot. We also plan to test the folding and processing of specific secretory cargoes by western blot in our experimental conditions (see below, and Reviewer 2, Major issue #1).

      We have already tested whether mutant strains with deletions of key enzymes in both cytoplasmic and ER redox systems are able to expand the ER upon HU or DIA treatment. We have found that only pgr1Δ (glutathione reductase), gsa1Δ (glutathione synthetase) and gcs1Δ (glutamate-cysteine ligase) mutants fully suppressed N-Cap formation, which suggests that glutathione has an important role in the phenotype of ER expansion. We have now added the pgr1Δ mutant strain to the main text of the manuscript (Figure 5C, page 31).

      Major issue #5. Figure S5 presents weak ER expansion in fibrosarcoma cells in response to HU (at very low concentrations and DIA is not included). The lack of any other phenotypes being presented could suggest that such experiments were done but didn't show any effect. The authors should straightforwardly discuss whether they performed experiments looking for perinuclear ER expansion or NPC clustering, and if not, what challenges precluded such experiments. Given how important this line of experimentation is for establishing generality, much more discussion is needed here.

      We investigated the effects on the ER in mammalian cells not only of HU but also of DIA. The results from this experiment mimicked the effect of HU (an increase in ER-ID fluorescence intensity in DIA). We excluded this information from the manuscript only because we were focusing on HU at that point, given its current clinical use. In this new version of the manuscript, we have included an extra panel in supplementary figure 5 to show the results from DIA in mammalian cells.

      Minor concerns

      1) Figure 1A should show individual data points (i.e. 3 averages of independent experiments) in the bar graph.

      Although we initially changed the graph, we believe the bar plot disposition facilitates its comprehension and went back to the initial one. Also, as the rest of the graphs similar to 1A are all expressed as bar plots, changing one would mean that, to avoid visual noise, we should change all. Therefore, we preferred keeping the figure as it was in the original version. However, we include here the graph with each of the averages of the independent experiments.

      2) It is argued that Figure 1B demonstrates that the SPB is clustered with the NPC cluster. However, a single image is not enough to support this claim, as the association could be coincidental.

      We have changed the image to show a whole population of cells, with several of them having NPC clusters, and we have indicated the position of SPB in each of them (all colocalizing with the N-Cap).

      3) Figures 1B through 1D do not indicate the HU concentration.

      We thank the reviewer for pointing out this mistake. Figures 1B and 1C represent cells exposed to 15 mM HU for 4 hours, while the graph in 1D shows the results from cells exposed to 75 mM HU over a 4-hour period. This information has now been added to the corresponding figure legend.

      4) I was confused by the photobleaching experiments of Figure S1. How do the authors know that there is complete photobleaching of the cytoplasm or nucleus in the absence of a positive control? If photobleaching is incomplete, they could be measuring motility without compartments rather than transport between compartments, and hence the conclusion that trafficking is unaffected could be wrong.

      Our control is the background of each microscopy image; we make sure that after the laser bleaches a cell, the bleached area coincides with the background noise. That way, we make sure that fluorescence from any remaining GFP is completely removed from the bleached area.

      5) On page 8, they say "exposure to DIA" when they intend HU.

      This has been corrected in the manuscript.

      6) In Figure S3A, the colocalization of INM proteins with the ER are presented. It is not clearly explained what conclusions are meant to be drawn from this figure, but it seems it would have been more useful to compare INM and Cut11, to see whether the NPCs are localizing at the INM or ONM.

      We have added an explanation in the main text to clarify the main conclusions derived from this figure. We think that NPCs localize in a section of the nucleus where the two membranes (INM and ONM) are still bound together.

      7) I had to read Figure 2C's description and caption several times to understand the experiment. A schematic would be helpful. 20 mM HU is low compared to most conditions used. Does repositioning eventually take place for 75 mM HU or 3 mM DIA treatment, or do the cells just die before they get a chance?

      20 mM HU was used in this experiment to provide a time frame suitable for analysis after HU addition, as a higher HU concentration increases the repositioning time. We found that both HU (75mM 4h) and DIA (3mM 4h)-induced ER expansions are reversible upon drug washout. If HU is kept in the media, ER expansions are eventually resolved. However, DIA is a strong oxidant and if it is kept in the media ER expansions are not resolved and cells do not survive.

      8) Figure 2D shows little oxidative consequence from 75 mM HU treatment until 40 min., the same time that phenotypes are observed (Figure 1D). Is this relationship consistent with the kinetics of other concentrations of HU, or of DIA? Seems like a pretty important mechanistic consideration that can rationalize the effects of the two oxidants.

      Thanks to this comment, we realized that the notation underneath Figure 1D (1E in the new version of the manuscript) could lead to misunderstandings, as the timings there were "random". We have now clarified this panel: the timings are normalized to the moment when NPCs cluster. The fact that this moment previously coincided with "40 minutes" does not mean that N-Caps appear at that time point; quite the opposite, as most of them start to appear after more than 2 hours in HU. We hope this can be better understood now.

      9) Figure S4 is missing the asterisk on the lower left cell.

      Fixed in the corresponding figure.

      10) How is roundness determined in Figure S4B?

      Roundness in Figure S4B (now S2E) is determined the same way as in Figure 1D, and as is described in the Method section (copied below). A clarification has been added to the legend to address that.

      The 'roundness' parameter in the 'Shape Descriptors' plugin of Fiji/ImageJ was used after applying a threshold to the image in order to select only the more intense regions and subtract background noise (Schindelin et al., 2012). Roundness descriptor follows the function:

              $$\text{Round} = \frac{4 \times [\text{Area}]}{\pi \times [\text{Major axis}]^2}$$
      

      where [Area] constitutes the area of an ellipse fitted to the selected region in the image and [Major axis] is the diameter of the round shape that in this case would fit the perimeter of the nucleus.
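
      As a minimal sketch of the roundness descriptor above, the formula can be written out directly. The function and parameter names are illustrative and are not part of Fiji/ImageJ, and the axis lengths in the example are made up.

```python
import math


def ellipse_area(major_axis, minor_axis):
    """Area of an ellipse given its full major and minor axis lengths."""
    return math.pi * major_axis * minor_axis / 4.0


def roundness(area, major_axis):
    """Round = 4 * Area / (pi * MajorAxis^2), as in the formula above."""
    if major_axis <= 0:
        raise ValueError("major_axis must be positive")
    return 4.0 * area / (math.pi * major_axis ** 2)


if __name__ == "__main__":
    # For a perfect ellipse, roundness reduces to minor axis / major axis.
    area = ellipse_area(major_axis=4.0, minor_axis=2.0)
    print(roundness(area, major_axis=4.0))
```

      With this definition a circle scores 1, and the value falls towards 0 as the fitted shape elongates.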

      11) What threshold is used to determine whether cells analyzed in Figures S4C have "small ER" or "large ER"?

      The ER is classified as large when its area in the projection of a 3-Z section exceeds 4 μm² (more than twice the mean area of the ER in cells with N-Caps in milder conditions). This has now been clarified in the legend of the corresponding figure.

      12) The authors interpret Figure 4K as indicating that ER expansion is not involved in the generation of punctate misfolded protein aggregates. However, the washout occurs only after the proteins have already aggregated. The proper interpretation is that the aggregates are not reversible by resolution of the stress, and hence are not physically reliant on disulfide bonds.

      We agree with the reviewer and have modified the interpretation of the indicated figure accordingly (page 30).

      The speculation that these proteins are iron dependent is a stretch; there is no reason to believe that losses of iron metabolism are the most important stress in these cells. It seems at least as likely that oxidizing cysteine-containing proteins in the cytosol or messing with the GSH/GSSG ratio in the cytosol would make plenty of proteins misfold; oxidative stress in budding yeast does activate hsf1. However, this point could be addressed by centrifugation and mass spectrometry to identify the aggregated proteome. It is also surprising that the authors did not investigate ER protein aggregation, perhaps by looking at puncta formation of chaperones beyond BiP. By contrast, the fact that gcs1 deletion prevents ER expansion but does not prevent Hsp104 puncta does support the idea that cytoplasmic aggregation is not dependent on ER expansion.

      To address this suggestion, we plan to analyze the localization of other chaperones and components of the protein quality control such as the ER Hsp40 Scj1 or the ribosome-associated Hsp70 Sks2.

      13) Figure 4L is cited on page 28 when Figure 4K is intended.

      This has been corrected in the text, although new panels have been added and now it is 4N.

      • *
    1. Reviewer #2 (Public review):

      Summary:

      This study aimed to investigate changes in neural responses over time after acute stress and their association with real-life stress. To this end, functional MRI data was collected from 3 tasks (Oddball, 2-back, Associative retrieval) early and late following stress and control conditions. Emotional ratings during a stressful week before an exam and a non-stressful week without an exam were used to index real-world stress. In total, data from 70 individuals were used for the analyses in the paper. Results showed increased oddball related activation early after stress whereas activation to the associative retrieval was reduced across early and late trials following stress compared with control. Brain activation during the oddball task after stress contrasted against control correlated with the index used to measure stress in the real-world. This is a very ambitious study and the findings that stress has opposite effects on the oddball and the associative retrieval tasks is new. However, I am not convinced that brain responses are correlated with real-world stress from the results presented in the paper. I also have several other concerns listed below.

      Strengths:

      The study uses a unique design based on hypotheses firmly grounded in theories of stress-related brain function. Large amounts of data are collected for all of the 70 participants included in the analyses and the hypotheses tested using paired tests have strong statistical power. Data collection methods are sound, aiming to reduce stress induced by being in the scanner environment for the first time and to reduce variation in cortisol due to circadian rhythm.

      Weaknesses:

      An important argument in the paper is that neural responses associated with stress in the lab correspond to stress in real life. This conclusion is based on a single correlation analysis. This is weak evidence because the correlation is based on 70 individuals and may be driven by outliers. In fact, the correlation between the difference in stress-related SN activation (Stress-Control) and real life stress residual is likely to be driven by outliers. In fig 5b, there are 3 persons with SN values of around 2, which is twice as much as the fourth highest value. There is also 1 person with a Real life stress residual of -3 or -4, which is three to four times as much as the person with the second lowest value. These 4 outliers should be removed before calculating the correlation coefficient. Also, no power analysis is presented in the paper showing what effect size is needed for significant results given a sample size of 70.
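
      The concern about outliers can be illustrated with a simple sensitivity check: compute the Pearson correlation with and without the suspect points. The sketch below is an assumption-laden illustration; the helper names are hypothetical and the values are made up, not the study's data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pearson_r_without(x, y, drop):
    """Pearson r after removing the observations whose indices are in `drop`."""
    keep = [i for i in range(len(x)) if i not in drop]
    return pearson_r([x[i] for i in keep], [y[i] for i in keep])


if __name__ == "__main__":
    # Four unrelated points plus one extreme observation (index 4).
    x = [0.0, 1.0, 0.0, 1.0, 5.0]
    y = [0.0, 0.0, 1.0, 1.0, 5.0]
    print("all points:     ", pearson_r(x, y))
    print("outlier removed:", pearson_r_without(x, y, {4}))
```

      In this toy example a single extreme point produces a strong correlation that vanishes once it is removed, which is the pattern the comment above warns about.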

      It is not clear why the activation maps from the tasks performed in the scanner are referred to as the SN, ECN, and DMN. They are discussed as if they were resting state networks. They are however not resting state networks because they are the results of contrasting two task conditions to each other and not the results from correlating BOLD time-series data from different regions within subjects. Even though masks corresponding to SN, ECN, and DMN are used to calculate means of all voxels, I think these contrasts should be referred to as the tasks that were used to evoke them. It becomes misleading to call them networks which usually refers to nodes and edges in fMRI studies. The first scan was a resting state scan, but these data are not presented in the paper.

      Introduction

      In the introduction it is said that there are genomically driven effects of cortisol 1 to 2 hours after stress. This is repeated in the discussion: "[the late stress phase] is thought to be dominated by genomically driven effects of glucocorticoids". (There is no reference to this statement however.) This idea, that gene expression should only be regulated by corticosteroids following stress seems unrealistic. The increase in cortisol was only around 60% from baseline in the current study which seems to be similar to other studies. This means that the baseline cortisol level is far from zero. Therefore, effects of cortisol on gene expression must occur all the time and be tightly regulated by circadian clocks. To propose that genomically driven effects of cortisol only exist 1 to 2 hours following stress is therefore too simplistic.

      In the last paragraph, it says that n=83. However, the final sample consists of 70 people. Correct this number.

      Methods

      The EMA data analysis is difficult to understand. Why are the residuals used instead of means for example? I could not understand how the residual values used in the analysis should be interpreted from the way this section was written. Therefore, I cannot judge whether the index is valid or reliable. Using mean values is more common than using residuals when investigating individual differences in stress responses. The use of residuals needs justification and clarification. The results from an analysis using mean values should also be reported.

      How was AUCi calculated? What software was used to calculate AUCi?

      How was the mediation analysis performed? The only information I found was: "We additionally ran separate models with an interaction term modelled for neural activity in the targeted ROI's to examine the relationship between task performance and neural responses, with random slopes and intercepts also modelled for ROI activity." This is not how mediation analyses are done conventionally. It is common to use structural equation modelling or a series of regression analyses. What is meant by separate models? Was a reduced model compared to a full model with an interaction term? In this case, this is not a mediation analysis. I think the term moderation is better to describe this analysis.

  4. social-media-ethics-automation.github.io social-media-ethics-automation.github.io
    1. When someone presents themselves as open and as sharing their vulnerabilities with us, it makes the connection feel authentic. We feel like they have entangled their wellbeing with ours by sharing their vulnerabilities with us. Think about how this works with celebrity personalities. Jennifer Lawrence became a favorite of many when she tripped at the Oscars [f2], and turned the moment into her persona as someone with a cool-girl, unpolished, unfiltered way about her. She came across as relatable and as sharing her vulnerabilities with us, which let many people feel that they had a closer, more authentic connection with her. Over time, that persona has come to be read differently, with some suggesting that this open-styled persona is in itself also a performance. Does this mean that her performance of vulnerability was inauthentic?

      When I read this paragraph, I found this phenomenon very interesting. In particular, the example of Jennifer Lawrence made me instantly understand what is expressed in the topic sentence. When someone presents themselves as perfect, many people may not like them, because it gives the impression that the person is fake and unfamiliar. However, if some weaknesses or shortcomings are appropriately exposed, the person may appear friendly and sincere. I also looked up some relevant materials and found that we can make use of this psychological effect to help us get closer to others when making friends.

    2. These needs may not always be as obvious in highly individualized societies, like Post-Enlightenment Europe and the United States. The possibility for self-reliance has been created in part by making certain things dependable and institutionalized. You can go get yourself food without feeling like you have to trust anyone because you can just go to the store (which has to adhere to corporate legal requirements) and buy food (the supply of which is made stable by complex networks of growing, manufacturing, and transportation, covered by the assurances of FDA-compliant labeling) from people who work there (and are subject to labor laws and HR regulations, which, if they are not followed, means the staff person does not get paid, so their wellbeing depends on them doing their job). The need to trust other people is obscured by the many institutions that we have created. Institutions have ways, sometimes, of getting around human whims and surprises. But at the end of the day, it is still hugely important to us that we feel clear about who can be trusted, and for what.

      This passage is a logically clear and vivid discussion that successfully explains the deep human need for authenticity and its social significance. I think this example is very good. Everyone has the experience of going to the supermarket to buy things. This everyday example made me understand the argument at once and agree with it.

    3. These reactions make sense. Try to imagine the early days of human social life, before we started attaching our welfare to the land in terms of planting crops and building structures designed for permanence. Our nomadic forebears functioned in groups who coordinated in highly specialized ways to ensure the survival of the whole. Although such communities are often pictured as being prehistoric, primitive, and obsolete, we now know that such societies were and are highly sophisticated, often developing and depending on highly specified legal codes, some of which are still in use today in Bedouin communities in North Africa. Other nomadic groups, such as Roma people (which you may have heard derogatorily called ‘gypsies’), live within and around land-based nations and their various borders and laws. To ensure the survival of their ethnicity, cultures, and languages, they depend on being able to trust each other. The nations whose land we are living and studying on here also knew the importance of being able to know who can be trusted. These needs may not always be as obvious in highly individualized societies, like Post-Enlightenment Europe and the United States. The possibility for self-reliance has been created in part by making certain things dependable and institutionalized. You can go get yourself food without feeling like you have to trust anyone because you can just go to the store (which has to adhere to corporate legal requirements) and buy food (the supply of which is made stable by complex networks of growing, manufacturing, and transportation, covered by the assurances of FDA-compliant labeling) from people who work there (and are subject to labor laws and HR regulations, which, if they are not followed, means the staff person does not get paid, so their wellbeing depends on them doing their job). The need to trust other people is obscured by the many institutions that we have created. 
Institutions have ways, sometimes, of getting around human whims and surprises. But at the end of the day, it is still hugely important to us that we feel clear about who can be trusted, and for what.

      It’s interesting to think about how trust was such a fundamental part of survival in nomadic societies, and how that need for trust hasn’t gone away—it’s just evolved into more complex systems in modern life. I never really thought about it that way before, especially with things like buying food. Even though we don’t always feel like we need to trust the person behind the counter, so many systems rely on trust—like laws, regulations, and the people making sure everything runs smoothly. It makes me realize how much we depend on trust in ways we don’t always see.

    1. "We should make few friends for the sake of pleasure, since but little sweetness suffices to season life, just as little salt suffices for our meat."

      I found this quote/analogy to be really beneficial as a way of making a point, but also interesting for what it might be extended to mean. I think this perfectly describes the previous sentence about a lack of mirth being better than too much merrymaking, since people would generally agree that plain food is more edible than heavily over-salted food. But in either case, doing so once in a while is not going to kill you (nor will an occasional imbalance of mirth condemn a person). Perhaps more interestingly, this analogy made me wonder about the exceptions. For general eating, you may season meat lightly. But to preserve the meat due to certain seasons or circumstances, you might apply a lot of salt (like with jerky). This made me wonder how games would have been considered in very dark times like war, famine, disease, etc. On one hand, I could see a defense for perhaps even more merriment, given that these times would tax the soul further than usual and would thus need sufficient relaxation, as mentioned earlier by Aquinas' story of the tense bow. However, these times might also be deemed too inappropriate for games given the earlier mentioned possible objections of jokes being wrong for certain situations.

    1. However, in teacher research, the data collection effort is purposeful, deliberate, organized, and systematic. The information we gather from our data may serve as evidence that confirms our insights and validates our intuition.

      I chose this section to annotate because, as a social media manager, I also have a purpose with my research, and the data I receive will also confirm my intuition about what I felt I was looking for. I ultimately decided that I will do my research on social media in the sports world since I am in SRM. I haven't exactly decided where I will go with my question, but I have some ideas. I have a feeling about what I think my results may look like, so whatever data I gather from games can help confirm whether I was correct or not.

    1. However, most societies do not value creative thinking and so our skills in generating ideas rapidly atrophies, as we do not practice it, and instead actively learn to suppress it

      I think this point was pretty interesting. This reminds me of how, in class, we talked about how when brainstorming we need to stop filtering our ideas and let them flow, because if we filter them, we may lose out on interesting ideas that can contribute to the bigger picture of what we want the project to look like. Plus, taking a little time and looking back at the idea later can also add interesting insights that make it useful, as opposed to just saying it's a dumb idea and forgetting about it.

    1. One critique of human-centered design is that it narrowly focuses on people and their needs rather than a systems-level view of the activities that people engage in, and the multiple people and systems involved in those activities.

      I understand this point of view because this was my first thought when being introduced to human-centered design. I agree with this statement, but maybe a little too much. What happens when we have to consider too many groups of people involved? A new design/solution may not be able to account for everybody without losing quality or function. How does one deal with that appropriately? I think as I start to design, this question will be the most prominent.

    2. Some design scholars have questioned whether focusing on people and activities is enough to account for what really matters, encouraging designers to consider human values [Friedman, B., & Hendry, D. G. (2019). Value sensitive design: Shaping technology with moral imagination. MIT Press.]. For example, instead of viewing a pizza delivery app as a way to get pizza faster and more easily, we might view it as a way of supporting the independence of elderly who do not have the mobility to pick up a pizza on their own. Or, perhaps more darkly, instead of viewing TSA screening at an airport a way of identifying potential terrorists, we consider it through the value of power, as the screening process had more to do with maintaining political power in times of fear than it did with actually preventing terrorism. This shift in framing can enable designers to better consider the values of design stakeholders through their design process, and identify people they may not have designed for otherwise (e.g., people who are house bound because of injury, or politicians).

      I found this paragraph in Chapter 1 really interesting because the author suggests we should look at design, and situations in general, through a different lens. Too often we focus on the most obvious target and work backwards from there, but it can also be beneficial to first consider multiple perspectives and determine other stakeholders in the process. Having a different lens when viewing design choices can allow you to make better decisions and be more aware of the user. I think what the author is suggesting overall is that we should not hyperfixate on a particular group.

    3. Some design scholars have questioned whether focusing on people and activities is enough to account for what really matters, encouraging designers to consider human values [Friedman, B., & Hendry, D. G. (2019). Value sensitive design: Shaping technology with moral imagination. MIT Press.]. For example, instead of viewing a pizza delivery app as a way to get pizza faster and more easily, we might view it as a way of supporting the independence of elderly who do not have the mobility to pick up a pizza on their own. Or, perhaps more darkly, instead of viewing TSA screening at an airport a way of identifying potential terrorists, we consider it through the value of power, as the screening process had more to do with maintaining political power in times of fear than it did with actually preventing terrorism. This shift in framing can enable designers to better consider the values of design stakeholders through their design process, and identify people they may not have designed for otherwise (e.g., people who are house bound because of injury, or politicians).

      I agree that looking at human values is important when considering "fixing" and "improving" the lives of specific groups. As mentioned, while a mobile app may help a specific group, it can also unintentionally put another group at a disadvantage. In my opinion, it is easy to get so focused on fixing an issue that we neglect or forget about the others involved. I agree that viewing it from this perspective does allow designers to consider those who may be affected "negatively" by the design.

    1. In fact, intercultural communication has the potential to enrich various aspects of our lives. In order to communicate well within various cultural contexts, it is important to keep an open mind and avoid making assumptions about others’ cultural identities. While you may be able to identify some aspects of the cultural context within a communication encounter, there may also be cultural influences that you can’t see. A competent communicator shouldn’t assume to know all the cultural contexts a person brings to an encounter, since not all cultural identities are visible. As with the other contexts, it requires skill to adapt to shifting contexts, and the best way to develop these skills is through practice and reflection.

      This is also one of the challenging parts, especially if your culture differs from others'. In my country, for example, talking to someone older than you while making eye contact with them is disrespectful, but here it's different. As the passage says at the end, we just need to develop these skills through practice and reflection. I think this can take a while, but the more you practice, the better you succeed.

    1. Authors’ Response (31 October 2024)

      GENERAL ASSESSMENT

      Pannexin (Panx) hemichannels are a family of heptameric membrane proteins that form pores in the plasma membrane through which ions and relatively large organic molecules can permeate. ATP release through Panx channels during the process of apoptosis is one established biological role of these proteins in the immune system, but they are widely expressed in many cells throughout the body, including the nervous system, and likely play many interesting and important roles that are yet to be defined. Although several structures have now been solved of different Panx subtypes from different species, their biophysical mechanisms remain poorly understood, including what physiological signals control their activation. Electrophysiological measurements of ionic currents flowing in response to Panx channel activation have shown that some subtypes can be activated by strong membrane depolarization or caspase cleavage of the C-terminus. Here, Henze and colleagues set out to identify endogenous activators of Panx channels, focusing on the Panx1 and Panx2 subtypes, by fractionating mouse liver extracts and screening for activation of Panx channels expressed in mammalian cells using whole-cell patch clamp recordings. The authors present a comprehensive examination with robust methodologies and supporting data that demonstrate that lysophospholipids (LPCs) directly activate Panx1 and Panx2 channels. These methodologies include channel mutagenesis, electrophysiology, ATP release and fluorescence assays, molecular modelling, and cryogenic electron microscopy (cryo-EM). Mouse liver extracts were initially used to identify LPC activators, but the authors go on to individually evaluate many different types of LPCs to determine those that are more specific for Panx channel activation. Importantly, the enzymes that endogenously regulate the production of these LPCs were also assessed along with other by-products that were shown not to promote pannexin channel activation. 
In addition, the authors used synovial fluid from canine patients, which is enriched in LPCs, to highlight the importance of the findings in pathology. Overall, we think this is likely to be a landmark study because it provides strong evidence that LPCs can function as activators of Panx1 and Panx2 channels, linking two established mediators of inflammatory responses and opening an entirely new area for exploring the biological roles of Panx channels. Although the mechanism of LPC activation of Panx channels remains unresolved, this study provides an excellent foundation for future studies and importantly provides clinical relevance.

      We thank the reviewers for their time and effort in reviewing our manuscript. Based on their valuable comments and suggestions, we have made substantial revisions. The updated manuscript now includes two new experiments supporting that lysophospholipid-triggered channel activation promotes the release of signaling molecules critical for immune response and demonstrates that this novel class of agonist activates the inflammasome in human macrophages through endogenously expressed Panx1. To better highlight the significance of our findings, we have excluded the cryo-EM panel from this manuscript. We believe these changes address the main concerns raised by the reviewers and enhance the overall clarity and impact of our findings. Below, we provide a point-by-point response to each of the reviewers’ comments.

      RECOMMENDATIONS

      Essential revisions:

      1. The authors present a tremendous amount of data using different approaches, cells and assays along with a written presentation that is quite abbreviated, which may make comprehension challenging for some readers. We would encourage the authors to expand the written presentation to more fully describe the experiments that were done and how the data were analysed so that the 2 key conclusions can be more fully appreciated by readers. A lot of data is also presented in supplemental figures that could be brought into the main figures and more thoroughly presented and discussed.

      We appreciate and agree with the reviewers’ observation. Our initial manuscript may have been challenging to follow due to our use of both wild-type and GS-tagged versions of Panx1 from human and frog origins, combined with different fluorescence techniques across cell types. In this revision, we used only human wild-type Panx1 expressed in HEK293S GnTI<sup>-</sup> cells, except for activity-guided fractionation experiments, where we used GS-tagged Panx1 expressed in HEK293 cells (Fig. 1). For functional reconstitution studies, we employed YO-PRO-1 uptake assays, as optimizing the Venus-based assay was challenging. We have clarified these exceptions in the main text. We think these adjustments simplify the narrative and ensure an appropriate balance between main and supplemental figures.

      1. It would also be useful to present data on the ion selectivity of Panx channels activated by LPC. How does this compare to data obtained when the channel is activated by depolarization? If the two stimuli activate related open states then the ion selectivity may be quite similar, but perhaps not if the two stimuli activate different open states. The authors' earlier work in eLife shows interesting shifts in reversal potentials (Vrev) when substituting external chloride with gluconate but not when substituting external sodium with N-methyl-D-glucamine, and these changed with mutations within the external pore of Panx channels. Related measurements comparing channels activated by LPC with membrane depolarization would be valuable for assessing whether similar or distinct open states are activated by LPC and voltage. It would be ideal to make Vrev measurements using a fixed step depolarization to open the channel and then various steps to more negative voltages to measure tail currents in pinpointing Vrev (a so-called instantaneous IV).

      We fully agree with the reviewer on the importance of ion selectivity experiments. However, comparing the properties of LPC-activated channels with those activated by membrane depolarization presented technical challenges, as LPC appears to stimulate Panx1 in synergy with voltage. Prolonged LPC exposure destabilizes patches, complicating G-V curve acquisition and kinetic analyses. While such experiments could provide mechanistic insights, we think they are beyond the scope of the current study.

      1. Data is presented for expression of Panx channels in different cell types (HEK vs HEKS GnTI-) and different constructs (Panx1 vs Panx1-GS vs other engineered constructs). The authors have tried to be clear about what was done in each experiment, but it can be challenging for the reader to keep everything straight. The labelling in Fig 1E helps a lot, and we encourage the authors to use that approach systematically throughout. It would also help to clearly identify the cell type and channel construct whenever showing traces, like those in Fig 1D. Doing this systematically throughout all the figures would also make it clear where a control is missing. For example, if labelling for the type of cell was included in Fig 1D it would be immediately clear that a GnTI- vector alone control for WT Panx1 is missing as the vector control shown is for HEK cells and formally that is only a control for Panx2 and 3. Can the authors explain why LPC activates Panx1 overexpressed in HEK293 GnTI- cells but not in HEK293 cells? Is this purely a function of expression levels? If so, it would be good to provide that supporting information.

      As mentioned above, we believe our revised version is more straightforward to digest. We have improved labeling and provided explanations where necessary to clarify the manuscript. While Panx1 expression levels are indeed higher in GnTI<sup>-</sup> than in HEK293 cells, we are uncertain whether the absence of detectable currents in HEK293 cells is solely due to expression levels. Some post-translational modifications that inhibit Panx1, such as lysine acetylation, may also impact activity. Future studies are needed to explore these mechanisms further.

      1. The mVenus quenching experiments are somewhat confusing in the way data are presented. In Fig 2B the y axis is labelled fluorescence (%) but when the channel is closed at time = 0 the value of fluorescence is 0 rather than 100 %, and as the channel opens when LPC is added the values grow towards 100 instead of towards 0 as iodide permeates and quenches. It would be helpful if these types of data could be presented more intuitively. Also, how was the initial rate calculated that is plotted in Fig 2C? It would be helpful to show how this is done in a figure panel somewhere. Why was the initial rate expressed as a percent maximum, what is the maximum and why are the values so low? Why is the effect of CBX so weak in these quenching experiments with Panx1 compared to other assays? This assay is used in a lot of experiments so anything that could be done to bolster confidence in what it reports on would be valuable to readers. Bringing in as many control experiments that have been done, including any that are already published, would be helpful.

      We modified the Y-axis in Figure 2 to “Quench (%)” for clarity. The data reflects fluorescence reduction over time, starting from LPC addition, normalized to the maximal decrease observed after Triton-X100 addition (3 minutes), enabling consistent quenching value comparisons. Although the quenching value appears small, normalization against complete cell solubilization provides reproducible comparisons. We do not fully understand why CBX effects vary in Venus quenching experiments, but we speculate that its steroid-like pentacyclic structure may influence the lysophospholipid agonistic effects. As noted in prior studies (DOI: 10.1085/jgp.201511505; DOI: 10.7554/eLife.54670), CBX likely acts as an allosteric modulator rather than a simple pore blocker, potentially contributing to these variations.
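
      The normalization described in this response can be sketched as follows. This is a minimal illustration, assuming that quench is the drop in fluorescence from its value at LPC addition, expressed as a percentage of the maximal drop measured after Triton X-100; the function name and the numbers are hypothetical and do not reproduce the authors' analysis code.

```python
def quench_percent(trace, f_start, f_triton):
    """Convert raw fluorescence readings to Quench (%).

    trace    -- fluorescence readings taken after LPC addition
    f_start  -- fluorescence at the moment of LPC addition (0 % quench)
    f_triton -- fluorescence after Triton X-100 addition (100 % quench)
    """
    max_drop = f_start - f_triton
    if max_drop <= 0:
        raise ValueError("Triton X-100 reading must be below the starting value")

    # Each point is the fraction of the maximal decrease reached so far.
    return [100.0 * (f_start - f) / max_drop for f in trace]


# Hypothetical readings (arbitrary fluorescence units).
example_quench = quench_percent([1000, 950, 900], f_start=1000, f_triton=500)
```

      Under this convention a closed channel reads 0 % and values rise as iodide enters, which matches the relabelled "Quench (%)" axis.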

      1. Could the authors provide more information to help rationalize how Yo-Pro-1, which has a charge of +2, can permeate what are thought to be anion-favouring Panx channels? We appreciate that the biophysical properties of Panx channels remain mysterious, but it would help to hear a bit more about the authors' thinking. It might also help to cite other papers that have measured Yo-Pro-1 uptake through Panx channels. Was the Strep-tagged construct of Panx1 expressed in GnTI- cells and shown to be functional using electrophysiology?

      Our recent study suggests that the electrostatic landscape along the permeation pathway may influence its ion selectivity (DOI: 10.1101/2024.06.13.598903). However, we have not yet fully elucidated how Panx1 permeates both anions and cations. Based on our findings, ion selectivity may vary with activation stimulus intensity and duration. Cation permeation through Panx1 is often demonstrated with YO-PRO-1, which measures uptake over minutes, unlike electrophysiological measurements conducted over milliseconds to seconds. We referenced two representative studies employing YO-PRO-1 to assess Panx1 activity. Whole-cell current measurements from a similar construct with an intracellular loop insertion indicate that our STREP-tagged construct likely retains functional capacity.

      1. In Fig 5 panel C, data is presented as the ratio of LPC induced current at -60 mV to that measured at +110 mV in the absence of LPC. What is the rationale for analysing the data this way? It would be helpful to also plot the two values separately for all of the constructs presented so the reader can see whether any of the mutants disproportionately alter LPC induced current relative to depolarization activated current. Also, for all currents shown in the figures, the authors should include a dashed coloured line at zero current, both for the LPC activated currents and the voltage steps.

      We used the ratio of LPC-induced current to the current measured at +110 mV to determine whether any of the mutants disproportionately affect LPC-induced current relative to depolarization-activated current. Since the mutants that did not respond to LPC also exhibited smaller voltage-stimulated currents than those that did respond, we reasoned that using this ratio would better capture the information the reviewer is suggesting to gauge. Showing the zero current level may be helpful if the goal was to compare basal currents, which in our experience vary significantly from patch to patch. However, since we are comparing LPC- and voltage-induced currents within the same patch, we believe that including basal current measurements would not add useful information to our study.

      Given that new experiments were included to further highlight the significance of the discovery of Panx1 agonists, we opted to separate the structure-based mechanistic studies from this manuscript and removed this experiment along with the docking and cryo-EM studies.

      1. The fragmented NTD density shown in Fig S8 panel A may resemble either lipid density or the average density of both NTD and lipid. For example, Class7 and Class8 in Fig.S8 panel D displayed split densities, which may resemble a phosphate head group and two tails of lipid. A protomer mask may not be the ideal approach to separate different classes of NTD because as shown in Fig S8 panel D, most high-resolution features are located on TM1-4, suggesting that the classification was focused on TM1-4. A more suitable approach would involve using a smaller mask including NTD, TM1, and the neighbouring TM2 region to separate different NTD classes.

      We agree with the reviewer and attempted 3D classification using multiple smaller masks including the suggested region. However, the maps remained poorly defined, and we were unable to confidently assign the NTD.

      1. The authors don’t discuss whether the LPC-bound structures display changes in the external part of the pore, which is the anion-selective filter and the narrower part of the pore. If there are no conformational changes there, then the present structures cannot explain permeability to large molecules like ATP. In this context, a plot for the pore dimension will be helpful to see differences along the pore between their different structures. It would also be clearer if the authors overlaid maps of protomers to illustrate differences at the NTD and the "selectivity filter."

      Both maps show that the narrowest constriction, formed by W74, has a diameter of approximately 9 Å. Previous steered molecular dynamics simulations suggest that ATP can permeate through such a constriction, implying an ion selection mechanism distinct from a simple steric barrier.

      1. The time between the addition of LPC to the nanodisc-reconstituted protein and grid preparation is not mentioned. Dynamic diffusion of LPC could result in equal probabilities for the bound and unbound forms. This raises the possibility of finding the Primed state in the LPC-bound state as well. Additionally, can the authors rationalize how LPC might reach the pore region when the channel is in the closed state before the application of LPC?

      We appreciate the reviewer’s insight. We incubated LPC and nanodisc-reconstituted protein for 30 minutes, speculating that LPC approaches the pore similarly to other lipids in prior structures. In separate studies, we are optimizing conditions to capture more defined conformations.

      1. In the cryo-EM map of the “resting” state (EMDB-21150), a part of the density was interpreted as NTD flipped to the intracellular side. This density, however, is poorly defined, and not connected to the S1 helix, raising concerns about whether this density corresponds to the NTD as seen in the “resting” state structure (PDB-ID: 6VD7). In addition, some residues in the C-terminus (after K333 in frog PANX1) are missing from the atomic model. Some of these residues are predicted by AlphaFold2 to form a short alpha helix and are shown to form a short alpha helix in some published PANX1 structures. Interestingly, in both the AF2 model and 6WBF, this short alpha helix is located approximately in the weak density that the authors suggest represents the “flipped” NTD. We encourage the authors to be cautious in interpreting this part as the “flipped” NTD without further validation or justification.

      We agree that the density corresponding to the NTD extended into the cytoplasm is relatively weak. In our recent study, we compared two Panx1 structures with or without the mentioned C-terminal helix and found evidence suggesting the likelihood of NTD extension (DOI: 10.1101/2024.06.13.598903). Nevertheless, to prevent potential confusion, we have removed the cryo-EM panel from this manuscript.

      1. Since the authors did not observe densities of bound LPC in the cryo-EM map, it is important to acknowledge in the text the inherent limitations of using docking and mutagenesis methods to locate where LPC binds.

      Thank you for the suggestion. We have removed this section to avoid potential confusion.

      Optional suggestions:

      1. The authors used MeOH to extract mouse liver for reversed-phase chromatography. Was the study designed to focus on hydrophobic compounds that likely bind to the TMD? Panx1 has both ECD and ICD with substantial sizes that could interact with water soluble compounds? Also, the use of whole-cell recordings to screen fractions would not likely identify polar compounds that interact with the cytoplasmic part of the TMD? It would be useful for the authors to comment on these aspects of their screen and provide their rationale for fractionating liver rather than other tissues.

      We have added a rationale in line 90, stating: “The soluble fractions were excluded from this study, as the most polar fraction induced strong channel activities in the absence of exogenously expressed pannexins.” Additionally, we have included a figure to support this rationale (Fig. S1A).

      1. The authors show that LPCs reversibly increase inward currents at a holding voltage of -60 mV (not always specified in legends) in cells expressing Panx1 and 2, and then show families of currents activated by depolarizing voltage steps in the absence of LPC, without asking what happens when you depolarize the membrane after LPC activation. If LPCs can be applied for long enough without disrupting recordings, it would be valuable to obtain both I-V relations and G-V relations before and after LPC activation of Panx channels. Does LPC disproportionately increase current at some voltages compared to others? Is the outward rectification reduced by LPC? Does Vrev remain unchanged (see point above)? It's hard to predict what would be observed, but almost any outcome from these experiments would suggest additional experiments to explore the extent to which the open states activated by LPC and depolarization are similar or distinct.

      Unfortunately, in our hands, the prolonged application of lysolipids at concentrations necessary to achieve significant currents tends to destabilize the patch. This makes it challenging to obtain G-V curves or perform the previously mentioned kinetic analyses. We believe this destabilization may be due to lysolipids’ surfactant-like qualities, which can disrupt the giga seal. Additionally, prolonged exposure seems to cause channel desensitization, which could be another confounding factor.

      1. From the results presented, the authors cannot rule out that mutagenesis-induced insensitivity of Panx channels to LPCs results from allosteric perturbations in the channels rather than direct binding/gating by LPCs. In Fig 5 panel A-C, the authors introduced double mutants on TM1 and TM2 to interfere with LPC binding; however, the double mutants may also disrupt the interaction network formed within NTD, TM1, and TM2. This disruption could potentially rearrange the conformation of NTD, favouring the resting closed state. Three double Asn mutants, which abolished LPC-induced current, also exhibited lower currents through voltage activation in Fig S5, raising the possibility that the mutant channels fail to activate in response to LPC due to an increased energy barrier. One way to gain further insight would be to mutate residues in NTD that interact with those substituted by the three double Asn mutants and to measure currents from both voltage activation and LPC activation. Such results might help to elucidate whether the three double Asn mutants interfere with LPC binding. It would also be important to show that the voltage-activated currents in Fig. S5 are sensitive to CBX.

      Thank you for the comment, with which we agree. Our initial intention was to use the mutagenesis studies to experimentally support the docking study. Due to uncertainties associated with the presented cryo-EM maps, we have decided to remove this study from the current manuscript. We will consider the proposed experiments in a future study.

      1. Could the authors elaborate on how LPC opens Panx1 by altering the conformation of the NTDs in an uncoordinated manner, going from “primed” state to the “active” state. In the “primed” state, the NTDs seem to be ordered by forming interactions with the TMD, thus resulting in the largest (possible?) pore size around the NTDs. In contrast, in the “active” state, the authors suggest that the NTDs are fragmented as a result of uncoordinated rearrangement, which conceivably will lead to a reduction in pore size around NTDs (isn’t it?). It is therefore not intuitive to understand why a conformation with a smaller pore size represents an “active” state.

      We believe the uncoordinated arrangement of NTDs is dynamic, allowing for potential variations in pore size during the activated conformation. Alternatively, NTD movement may be coupled with conformational changes in TM1 and the extracellular domain, which in turn could alter the electrostatic properties of the permeation pathway. We believe a functional study exploring this mechanism would be more appropriately presented as a separate study.

      1. Can the authors provide a positive control for these negative results presented in Fig S1B and C?

      The positive results are presented in Fig. 1D and E.

      1. Raw images in Fig S6 and Fig S7 should contain units of measurement.

      Thank you for pointing this out.

      1. It may be beneficial to show the superposition between primed state and activated state in both protomer and overall structure. In addition, superposition between primed state and PDB 7F8J.

      We attempted to superimpose the cryo-EM maps; however, visually highlighting the differences in figure format proved challenging. Higher-resolution maps would allow for model building, which would more effectively convey these distinctions.

      1. Including particles number in each class in Fig S8 panel C and D would help in evaluating the quality of classification.

      Noted.

      1. A table for cryo-EM statistics should be included.

      Thanks, noted.

      1. n values are often provided as a range within legends but it would be better to provide individual values for each dataset. In many figures you can see most of the data points, which is great, but it would be easy to add n values to the plots themselves, perhaps in parentheses above the data points.

      While we agree that transparency is essential, adding n-values to each graph would make some figures less clear and potentially harder to interpret in this case. We believe that the dot plots, n-value range, and statistical analysis provide adequate support for our claims.

      1. The way caspase activation of Panx channels is presented in the introduction could be viewed as dismissive or inflammatory for those who have studied that mechanism. We think the caspase activation literature is quite convincing and there is no need to be dismissive when pointing out that there are good reasons to believe that other mechanisms of activation likely exist. We encourage you to revise the introduction accordingly.

      Thank you for this comment. Although we intended to support the caspase activation mechanism in our introduction, we understand that the reviewer’s interpretation indicates a need for clarification. We hope the revised introduction removes any perception of dismissiveness.

      1. Why is the patient data in Fig 4F normalized differently than everything else? Once the above issues with mVenus quenching data are clarified, it would be good to be systematic and use the same approach here.

      For Fig. 4F, we used a distinct normalization method to account for substantial day-to-day variation in experiments involving body fluids. Notably, we did not apply this normalization to other experimental panels due to their considerably lower day-to-day variation.

      1. What was the rationale for using the structure from ref 35 in the docking task?

      The docking task utilized the human orthologue with a flipped-up NTD. We believe that this flipped-up conformation is likely the active form that responds to lysolipids. As our functional experiments primarily use the human orthologue for biological relevance, this structure choice is consistent. Our docking data shows that LPC does not dock at this site when using a construct with the downward-flipped NTD.

      1. Perhaps better to refer to double Asn ‘substitutions’ rather than as ‘mutations’ because that makes one think they are Asn in the wt protein.

      Done.

      1. From Fig S1, we gather that Panx2 is much larger than Panx1 and 3. If that is the case, it’s worth noting that to readers somewhere.

      We have added the molecular weight of each subtype in the figure legend.

      1. Please provide holding voltages and zero current levels in all figures presenting currents.

      We provided holding voltages. However, the zero current levels vary among the examples presented, making direct comparisons difficult. Since we are comparing currents with and without LPC, we believe that indicating zero current levels is unnecessary for this study.

      1. While the authors successfully establish lysophospholipid-gating of Panx1 and Panx2, Panx3 appears unaffected. It may be advisable to be more specific in the title of the article.

      We are uncertain whether Panx3 is unaffected by lysophospholipids, as we have not observed activation of this subtype under any tested conditions.

      (This is a response to peer review conducted by Biophysics Colab on version 1 of this preprint.)

    2. Consolidated Peer Review Report (20 December 2023)

      GENERAL ASSESSMENT

      Pannexin (Panx) hemichannels are a family of heptameric membrane proteins that form pores in the plasma membrane through which ions and relatively large organic molecules can permeate. ATP release through Panx channels during the process of apoptosis is one established biological role of these proteins in the immune system, but they are widely expressed in many cells throughout the body, including the nervous system, and likely play many interesting and important roles that are yet to be defined. Although several structures have now been solved of different Panx subtypes from different species, their biophysical mechanisms remain poorly understood, including what physiological signals control their activation. Electrophysiological measurements of ionic currents flowing in response to Panx channel activation have shown that some subtypes can be activated by strong membrane depolarization or caspase cleavage of the C-terminus. Here, Henze and colleagues set out to identify endogenous activators of Panx channels, focusing on the Panx1 and Panx2 subtypes, by fractionating mouse liver extracts and screening for activation of Panx channels expressed in mammalian cells using whole-cell patch clamp recordings. The authors present a comprehensive examination with robust methodologies and supporting data that demonstrate that lysophospholipids (LPCs) directly activate Panx1 and Panx2 channels. These methodologies include channel mutagenesis, electrophysiology, ATP release and fluorescence assays, molecular modelling, and cryogenic electron microscopy (cryo-EM). Mouse liver extracts were initially used to identify LPC activators, but the authors go on to individually evaluate many different types of LPCs to determine those that are more specific for Panx channel activation. Importantly, the enzymes that endogenously regulate the production of these LPCs were also assessed along with other by-products that were shown not to promote pannexin channel activation. In addition, the authors used synovial fluid from canine patients, which is enriched in LPCs, to highlight the importance of the findings in pathology. Overall, we think this is likely to be a landmark study because it provides strong evidence that LPCs can function as activators of Panx1 and Panx2 channels, linking two established mediators of inflammatory responses and opening an entirely new area for exploring the biological roles of Panx channels. Although the mechanism of LPC activation of Panx channels remains unresolved, this study provides an excellent foundation for future studies and importantly provides clinical relevance.

      RECOMMENDATIONS

      Essential revisions:

      1. The authors present a tremendous amount of data using different approaches, cells and assays along with a written presentation that is quite abbreviated, which may make comprehension challenging for some readers. We would encourage the authors to expand the written presentation to more fully describe the experiments that were done and how the data were analysed so that the key conclusions can be more fully appreciated by readers. A lot of data is also presented in supplemental figures that could be brought into the main figures and more thoroughly presented and discussed.
      2. It would also be useful to present data on the ion selectivity of Panx channels activated by LPC. How does this compare to data obtained when the channel is activated by depolarization? If the two stimuli activate related open states then the ion selectivity may be quite similar, but perhaps not if the two stimuli activate different open states. The authors earlier work in eLife shows interesting shifts in reversal potentials (Vrev) when substituting external chloride with gluconate but not when substituting external sodium with N-methyl-D-glucamine, and these changed with mutations within the external pore of Panx channels. Related measurements comparing channels activated by LPC with membrane depolarization would be valuable for assessing whether similar or distinct open states are activated by LPC and voltage. It would be ideal to make Vrev measurements using a fixed step depolarization to open the channel and then various steps to more negative voltages to measure tail currents in pinpointing Vrev (a so called instantaneous IV).
      3. Data is presented for expression of Panx channels in different cell types (HEK293 vs HEK293S GnTI-) and different constructs (Panx1 vs Panx1-GS vs other engineered constructs). The authors have tried to be clear about what was done in each experiment, but it can be challenging for the reader to keep everything straight. The labelling in Fig 1E helps a lot, and we encourage the authors to use that approach systematically throughout. It would also help to clearly identify the cell type and channel construct whenever showing traces, like those in Fig 1D. Doing this systematically throughout all the figures would also make it clear where a control is missing. For example, if labelling for the type of cell was included in Fig 1D it would be immediately clear that a GnTI- vector alone control for WT Panx1 is missing, as the vector control shown is for HEK293 cells and formally that is only a control for Panx2 and 3. Can the authors explain why LPC activates Panx1 overexpressed in HEK293S GnTI- cells but not in HEK293 cells? Is this purely a function of expression levels? If so, it would be good to provide that supporting information.
      4. The mVenus quenching experiments are somewhat confusing in the way data are presented. In Fig 2B the y axis is labelled fluorescence (%) but when the channel is closed at time = 0 the value of fluorescence is 0 rather than 100 %, and as the channel opens when LPC is added the values grow towards 100 instead of towards 0 as iodide permeates and quenches. It would be helpful if these types of data could be presented more intuitively. Also, how was the initial rate calculated that is plotted in Fig 2C? It would be helpful to show how this is done in a figure panel somewhere. Why was the initial rate expressed as a percent maximum, what is the maximum and why are the values so low? Why is the effect of CBX so weak in these quenching experiments with Panx1 compared to other assays? This assay is used in a lot of experiments, so anything that could be done to bolster confidence in what it reports on would be valuable to readers. Bringing in as many control experiments that have been done, including any that are already published, would be helpful.
      5. Could the authors provide more information to help rationalize how Yo-Pro-1, which has a charge of +2, can permeate what are thought to be anion-favouring Panx channels? We appreciate that the biophysical properties of Panx channels remain mysterious, but it would help to hear a bit more about the authors’ thinking. It might also help to cite other papers that have measured Yo-Pro-1 uptake through Panx channels. Was the Strep-tagged construct of Panx1 expressed in GnTI- cells and shown to be functional using electrophysiology?
      6. In Fig 5 panel C, data is presented as the ratio of LPC induced current at -60 mV to that measured at +110 mV in the absence of LPC. What is the rationale for analysing the data this way? It would be helpful to also plot the two values separately for all of the constructs presented so the reader can see whether any of the mutants disproportionately alter LPC induced current relative to depolarization activated current. Also, for all currents shown in the figures, the authors should include a dashed coloured line at zero current, both for the LPC activated currents and the voltage steps.
      7. The fragmented NTD density shown in Fig S8 panel A may resemble either lipid density or the average density of both NTD and lipid. For example, Class7 and Class8 in Fig.S8 panel D displayed split densities, which may resemble a phosphate head group and two tails of lipid. A protomer mask may not be the ideal approach to separate different classes of NTD because as shown in Fig S8 panel D, most high-resolution features are located on TM1-4, suggesting that the classification was focused on TM1-4. A more suitable approach would involve using a smaller mask including NTD, TM1, and the neighbouring TM2 region to separate different NTD classes.
      8. The authors don’t discuss whether the LPC-bound structures display changes in the external part of the pore, which is the anion-selective filter and the narrower part of the pore. If there are no conformational changes there, then the present structures cannot explain permeability to large molecules like ATP. In this context, a plot for the pore dimension will be helpful to see differences along the pore between their different structures. It would also be clearer if the authors overlaid maps of protomers to illustrate differences at the NTD and the "selectivity filter."
      9. The time between the addition of LPC to the nanodisc-reconstituted protein and grid preparation is not mentioned. Dynamic diffusion of LPC could result in equal probabilities for the bound and unbound forms. This raises the possibility of finding the Primed state in the LPC-bound state as well. Additionally, can the authors rationalize how LPC might reach the pore region when the channel is in the closed state before the application of LPC?
      10. In the cryo-EM map of the “resting” state (EMDB-21150), a part of the density was interpreted as NTD flipped to the intracellular side. This density, however, is poorly defined, and not connected to the S1 helix, raising concerns about whether this density corresponds to the NTD as seen in the “resting” state structure (PDB-ID: 6VD7). In addition, some residues in the C-terminus (after K333 in frog PANX1) are missing from the atomic model. Some of these residues are predicted by AlphaFold2 to form a short alpha helix and are shown to form a short alpha helix in some published PANX1 structures. Interestingly, in both the AF2 model and 6WBF, this short alpha helix is located approximately in the weak density that the authors suggest represents the “flipped” NTD. We encourage the authors to be cautious in interpreting this part as the “flipped” NTD without further validation or justification.
      11. Since the authors did not observe densities of bound LPC in the cryo-EM map, it is important to acknowledge in the text the inherent limitations of using docking and mutagenesis methods to locate where LPC binds.
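
      To illustrate the more intuitive presentation requested in point 4, the sketch below expresses a quenching trace as a percentage of its pre-stimulus baseline (100 % while the channel is closed, falling towards 0 as iodide quenches) and estimates an initial rate as the least-squares slope over the first few points after stimulus addition. This is only an assumption about one reasonable analysis, not the authors' method; the function names, the number of baseline points, and the example trace are all hypothetical.

```python
def normalize_to_baseline(trace, n_baseline=3):
    """Express raw fluorescence as % of the mean of the first n_baseline points."""
    baseline = sum(trace[:n_baseline]) / n_baseline
    return [100.0 * f / baseline for f in trace]


def initial_rate(times, values, n_points=4):
    """Least-squares slope (% per unit time) over the first n_points samples."""
    t = times[:n_points]
    v = values[:n_points]
    n = len(t)
    t_mean = sum(t) / n
    v_mean = sum(v) / n
    num = sum((ti - t_mean) * (vi - v_mean) for ti, vi in zip(t, v))
    den = sum((ti - t_mean) ** 2 for ti in t)
    return num / den


# Hypothetical trace: stimulus assumed to be added at index 2.
times = [0, 1, 2, 3, 4, 5]
raw = [200.0, 200.0, 200.0, 180.0, 160.0, 140.0]
percent = normalize_to_baseline(raw)
rate = initial_rate(times[2:], percent[2:])  # negative slope = quenching
```

      With this convention a faster-opening channel gives a more negative initial rate, and the closed-channel baseline sits at 100 % by construction.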

      Optional suggestions:

      1. The authors used MeOH to extract mouse liver for reversed-phase chromatography. Was the study designed to focus on hydrophobic compounds that likely bind to the TMD? Panx1 has both ECD and ICD with substantial sizes that could interact with water soluble compounds? Also, the use of whole-cell recordings to screen fractions would not likely identify polar compounds that interact with the cytoplasmic part of the TMD? It would be useful for the authors to comment on these aspects of their screen and provide their rationale for fractionating liver rather than other tissues.
      2. The authors show that LPCs reversibly increase inward currents at a holding voltage of -60 mV (not always specified in legends) in cells expressing Panx1 and 2, and then show families of currents activated by depolarizing voltage steps in the absence of LPC, without asking what happens when you depolarize the membrane after LPC activation. If LPCs can be applied for long enough without disrupting recordings, it would be valuable to obtain both I-V relations and G-V relations before and after LPC activation of Panx channels. Does LPC disproportionately increase current at some voltages compared to others? Is the outward rectification reduced by LPC? Does Vrev remain unchanged (see point above)? It’s hard to predict what would be observed, but almost any outcome from these experiments would suggest additional experiments to explore the extent to which the open states activated by LPC and depolarization are similar or distinct.
      3. From the results presented, the authors cannot rule out that mutagenesis-induced insensitivity of Panx channels to LPCs results from allosteric perturbations in the channels rather than direct binding/gating by LPCs. In Fig 5 panel A-C, the authors introduced double mutants on TM1 and TM2 to interfere with LPC binding; however, the double mutants may also disrupt the interaction network formed within NTD, TM1, and TM2. This disruption could potentially rearrange the conformation of NTD, favouring the resting closed state. Three double Asn mutants, which abolished LPC-induced current, also exhibited lower currents through voltage activation in Fig S5, raising the possibility that the mutant channels fail to activate in response to LPC due to an increased energy barrier. One way to gain further insight would be to mutate residues in NTD that interact with those substituted by the three double Asn mutants and to measure currents from both voltage activation and LPC activation. Such results might help to elucidate whether the three double Asn mutants interfere with LPC binding. It would also be important to show that the voltage-activated currents in Fig. S5 are sensitive to CBX.
      4. Could the authors elaborate on how LPC opens Panx1 by altering the conformation of the NTDs in an uncoordinated manner, going from “primed” state to the “active” state. In the “primed” state, the NTDs seem to be ordered by forming interactions with the TMD, thus resulting in the largest (possible?) pore size around the NTDs. In contrast, in the “active” state, the authors suggest that the NTDs are fragmented as a result of uncoordinated rearrangement, which conceivably will lead to a reduction in pore size around NTDs (isn’t it?). It is therefore not intuitive to understand why a conformation with a smaller pore size represents an “active” state.
      5. Can the authors provide a positive control for these negative results presented in Fig S1B and C?
      6. Raw images in Fig S6 and Fig S7 should contain units of measurement.
      7. It may be beneficial to show the superposition between primed state and activated state in both protomer and overall structure. In addition, superposition between primed state and PDB 7F8J.
      8. Including particles number in each class in Fig S8 panel C and D would help in evaluating the quality of classification.
      9. A table for cryo-EM statistics should be included.
      10. n values are often provided as a range within legends but it would be better to provide individual values for each dataset. In many figures you can see most of the data points, which is great, but it would be easy to add n values to the plots themselves, perhaps in parentheses above the data points.
      11. The way caspase activation of Panx channels is presented in the introduction could be viewed as dismissive or inflammatory for those who have studied that mechanism. We think the caspase activation literature is quite convincing and there is no need to be dismissive when pointing out that there are good reasons to believe that other mechanisms of activation likely exist. We encourage you to revise the introduction accordingly.
      12. Why is the patient data in Fig 4F normalized differently than everything else? Once the above issues with mVenus quenching data are clarified, it would be good to be systematic and use the same approach here.
      13. What was the rationale for using the structure from ref 35 in the docking task?
      14. Perhaps better to refer to double Asn ‘substitutions’ rather than as ‘mutations’ because that makes one think they are Asn in the wt protein.
      15. From Fig S1, we gather that Panx2 is much larger than Panx1 and 3. If that is the case, it’s worth noting that to readers somewhere.
      16. Please provide holding voltages and zero current levels in all figures presenting currents.
      17. While the authors successfully establish lysophospholipid-gating of Panx1 and Panx2, Panx3 appears unaffected. It may be advisable to be more specific in the title of the article.

      REVIEWING TEAM

      Reviewed by:

      Jorge Contreras, Professor, University of California, Davis, USA: electrophysiology and ion channel mechanisms

      Wei Lü, Associate Professor, Department of Structural Biology, Van Andel Institute, USA: ion channel mechanisms, X-ray crystallography and cryo-EM

      Xiaofeng Tan, Research Fellow, NINDS, NIH, USA: structural biology (X-ray crystallography and cryo-electron microscopy) and ion channel mechanisms

      Kenton J. Swartz, Senior Investigator, NINDS, NIH, USA: ion channel structure and mechanisms, chemical biology and biophysics, electrophysiology and fluorescence spectroscopy

      Curated by:

      Kenton J. Swartz, Senior Investigator, NINDS, NIH, USA

      (This consolidated report is a result of peer review conducted by Biophysics Colab on version 1 of this preprint. Comments concerning minor and presentational issues have been omitted for brevity.)

    1. Reviewer #2 (Public review):

      Summary:

      This manuscript provides experimental evidence on circadian behavioural cycles in Antarctic krill. The krill were obtained directly from krill fishing vessels and the experiments were carried out on board using an advanced incubation device capable of recording activity levels over a number of days. A number of different experiments were carried out where krill were first exposed to simulated light:dark (L:D) regimes for some days followed by continuous darkness (DD). These were carried out on krill collected during late autumn and late summer. A further set of experiments was performed on krill across three different seasons (summer, autumn, winter), where incubations were all DD conditions. Activity was measured as the frequency by which an infrared beam close to the top of the incubation tube was broken over unit time. Results showed that patterns of increased and decreased activity that appeared synchronised to the LD cycle persisted during the DD period. This was interpreted as evidence of the operation of an internal (endogenous) clock. The amplitude of the behavioural cycles decreased with time in DD, which further suggests that this clock is relatively weak. The authors argued that the existence of a weak endogenous clock is an adaptation to life at high latitudes since allowing the clock to be modulated by external (exogenous) factors is an advantage when there is a high degree of seasonality. This hypothesis is further supported by seasonal DD experiments which showed that the periodicity of high and low activity levels differed between seasons.

      Strengths

      Although there has been a lot of field observations of various circadian type behaviour in Antarctic krill, relatively few experimental studies have been published considering this behaviour in terms of circadian patterns of activity. Krill are not a model organism and obtaining them and incubating them in suitable conditions are both difficult undertakings. Furthermore, there is a need to consider what their natural circadian rhythms are without the overinfluence of laboratory-induced artefacts. For this reason alone, the setup of the present study is ideal to consider this aspect of krill biology. Furthermore, the equipment developed for measuring levels of activity is well-designed and likely to minimise artefacts.

      Weaknesses

      I have little criticism of the rationale for carrying out this work, nor of the experimental design. Nevertheless, the manuscript would benefit from a clearer explanation of the experimental design, particularly aimed at readers not familiar with research into circadian rhythms. Furthermore, I have a more fundamental question about the relationship between levels of activity and DVM on which I will expand below. Finally, it was unclear how the observational results made here related to the molecular aspects considered in the Discussion.

      (1) Explanation of experimental design - I acknowledge that the format of this particular journal insists that the Results are the first section that follows the Introduction. This nevertheless presents a problem for the reader since many of the concepts and terms that would generally be in the Methods are yet to be explained to the reader. Hence, right from the start of the Results section, the reader is thrown into the detail of what happened during the LD-DD experiments without being fully aware of why this type of experiment was carried out in the first place. Even after reading the Methods, further explanation would have been helpful. Circadian cycle type research of this sort often entrains organisms to certain light cycles and then takes the light away to see if the cycle continues in complete darkness, but this critical piece of knowledge does not come until much later (e.g. lines 369-372), leaving the reader guessing until this point why the authors took the approach they did. I would suggest the following: (1) that more effort is made in the Introduction to explain the exact LD/DD protocols adopted; (2) that a schematic figure is placed early on in the manuscript where the protocol is explained, including some logical flow charts, e.g. if the behavioural cycle continues in DD then an internal clock exists, versus if the cycle does not continue in DD then exogenous cues dominate, followed by: major decrease in cyclic amplitude = weak clock versus minor decrease = strong clock, and so on.
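
      The flow-chart logic suggested above can be sketched as a small decision function. This is an illustration only: the function name, the use of a single rhythm amplitude per condition, and both cut-off values are hypothetical and are not taken from the manuscript.

```python
def classify_clock(amp_ld, amp_dd, rhythm_threshold=0.1, weak_fraction=0.5):
    """Interpret an LD-then-DD experiment from two rhythm amplitudes.

    amp_ld: amplitude of the activity rhythm under light:dark entrainment
    amp_dd: amplitude of the rhythm after transfer to constant darkness
    rhythm_threshold, weak_fraction: hypothetical cut-offs on amp_dd / amp_ld
    """
    if amp_ld <= 0:
        raise ValueError("amp_ld must be positive")
    retained = amp_dd / amp_ld
    if retained < rhythm_threshold:
        # Cycle does not continue in DD.
        return "no persistent rhythm: exogenous cues dominate"
    if retained < weak_fraction:
        # Cycle continues in DD but with a major decrease in amplitude.
        return "persistent rhythm, major damping: weak endogenous clock"
    # Cycle continues in DD with only a minor decrease in amplitude.
    return "persistent rhythm, minor damping: strong endogenous clock"
```

      For example, under these assumed cut-offs a rhythm that retains 30 % of its LD amplitude in DD would be read as a weak endogenous clock.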

      (2) Activity vs kinesis - in this study, we are shown data that (i) krill have a circadian cycle - incubation experiments; (ii) that krill swarms display DVM in this region - echosounder data (although see my later point). My question here is regarding the relationship between what is being measured by the incubation experiments and the in situ swarm behaviour observations. The incubation experiments are essentially measuring the propensity of krill to swim upwards since the system logs the number of times an individual (or group) breaks a beam towards the top of the incubation tube. I argue that krill may be still highly active in the rest of the tube but just do not swim close to the surface, so this approach may not be a good measure of "activity". Otherwise, I suggest a more correct term for what is being measured is the level of "upward kinesis". As the authors themselves note, krill are negatively buoyant and must always be active to remain pelagic. What changes over the day-night cycle is whether they decide to expend that activity on swimming upwards, downwards or remaining at the same depth. Explaining the pattern as upward kinesis then also explains why swarms move upwards during the night. Just being more active at night may not necessarily result in them swimming upwards.

      (3) Molecular relevance - Although I am interested in molecular clock aspects behind these circadian rhythms, it was not made clear how the results of the present study allow any further insight into this. In lines 282 to 284, the findings of the study by Biscontin et al (2017) are discussed with regard to how TIM protein is degraded by light via the clock photoreceptor CRYPTOCHROME 1. This element of the Discussion would be a lot more relevant if the results of the present study were considered in terms of whether they supported or refuted this or any other molecular clock model. As it stands, this paragraph is purely background knowledge and a candidate for deletion in the interest of shortening the Discussion.

      Other aspects

      (i) 'Bimodal swimming' was used in the Abstract and later in the text without the term being fully explained. I could interpret it to mean a number of things so some explanation is required before the term is introduced.

      (ii) Midnight sinking - I was struck by Figure 2b with regards to the dip in activity after the initial ascent, as well as the rise in activity predawn. Cushing (1951) Biol Rev 26: 158-192 describes the different phases of a DVM common to a number of marine organisms observed in situ where there is a period of midnight sinking following the initial dusk ascent and a dawn rise prior to dawn descent. Tarling et al (2002) observe midnight sinking pattern in Calanus finmarchicus and consider whether it is a response to feeding satiation or predation avoidance (i.e. exogenous factors). Evidence from the present study indicates that midnight sinking (and potential dawn rise) behaviour could alternatively be under endogenous control to a greater or lesser degree. This is something that should certainly be mentioned in the Discussion, possibly in place of the molecular discussion element mentioned above - possibly adding to the paragraph Lines 303-319.

      (iii) Lines 200-207 - I struggled to follow this argument regarding Piccolin et al identifying a 12 h rhythm whereas the present study indicates a ~24 h rhythm. Is one contradicting the other - please make this clear.

      (iv) Although I agree that the hydroacoustic data should be included and is generally supportive of the results, I think that two further aspects should be made clear for context (a) whether there was any groundtruthing that the acoustic marks were indeed krill and not potentially some other group known to perform DVM such as myctophids (b) how representative were these patterns - I have a sense that they were heavily selected to show only ones with prominent DVM as opposed to other parts of the dataset where such a pattern was less clear - I am aware of a lot of krill research where DVM is not such a clear pattern and it is disingenuous to provide these patterns as the definitive way in which krill behaves. I ask this be made clear to the reader (note also that there is a suggestion of midnight sinking in Fig 5b on 28/2).

    2. Author response:

      Reviewer #1 (Public review):  

      Hüppe and colleagues had already developed an apparatus and an analytical approach to capture swimming activity rhythms in krill. In a previous manuscript they explained the system, and here they employ it to show a circadian clock, supplemented by exogenous light, produces an activity pattern consistent with "twilight" diel vertical migration (DVM; a peak at sunset, a midnight sink, and a peak in the latter half of the night). 

      They used light:dark (LD) followed by dark:dark (DD) photoperiods at two times of the year to confirm the circadian clock, coupled with DD experiments at four times of year to show rhythmicity occurs throughout the year along with DVM in the wild population. The individual activity data show variability in the rhythmic response, which is expected. However, their results showed rhythmicity was sustained in DD throughout the year, although the amplitude decayed quickly. The interpretation of a weak clock is reasonable, and they provide a convincing justification for the adaptive nature of such a clock in a species that has a wide distributional range and experiences various photic environments. These data also show that exogenous light increases the activity response and can explain the morning activity bouts, with the circadian clock explaining the evening and late-night bouts. This acknowledgement that vertical migration can be driven by multiple proximate mechanisms is important. 

      The work is rigorously done, and the interpretations are sound. I see no major weaknesses in the manuscript. Because a considerable amount of processing is required to extract and interpret the rhythmic signals (see Methods and previous AMAZE paper), it is informative to have the individual activity plots of krill as a gut check on the group data. 

      The manuscript will be useful to the field as it provides an elegant example of looking for biological rhythms in a marine planktonic organism and disentangling the exogenous response from the endogenous one. Furthermore, as high latitude environments change, understanding how important organisms like krill have the potential to respond will become increasingly important. This work provides a solid behavioral dataset to complement the earlier molecular data suggestive of a circadian clock in this species. 

      We appreciate the positive evaluation of our work by Reviewer 1, acknowledging our approach to record locomotor activity in krill as well as the importance of the findings in assessing krill’s potential to respond to environmental change in their habitat.  

      Reviewer #2 (Public review):  

      Summary: 

      This manuscript provides experimental evidence on circadian behavioural cycles in Antarctic krill. The krill were obtained directly from krill fishing vessels and the experiments were carried out on board using an advanced incubation device capable of recording activity levels over a number of days. A number of different experiments were carried out where krill were first exposed to simulated light:dark (L:D) regimes for some days followed by continuous darkness (DD). These were carried out on krill collected during late autumn and late summer. A further set of experiments was performed on krill across three different seasons (summer, autumn, winter), where incubations were all DD conditions. Activity was measured as the frequency by which an infrared beam close to the top of the incubation tube was broken over unit time. Results showed that patterns of increased and decreased activity that appeared synchronised to the LD cycle persisted during the DD period. This was interpreted as evidence of the operation of an internal (endogenous) clock. The amplitude of the behavioural cycles decreased with time in DD, which further suggests that this clock is relatively weak. The authors argued that the existence of a weak endogenous clock is an adaptation to life at high latitudes since allowing the clock to be modulated by external (exogenous) factors is an advantage when there is a high degree of seasonality. This hypothesis is further supported by seasonal DD experiments which showed that the periodicity of high and low activity levels differed between seasons. 

      Strengths 

      Although there has been a lot of field observations of various circadian type behaviour in Antarctic krill, relatively few experimental studies have been published considering this behaviour in terms of circadian patterns of activity. Krill are not a model organism and obtaining them and incubating them in suitable conditions are both difficult undertakings. Furthermore, there is a need to consider what their natural circadian rhythms are without the overinfluence of laboratory-induced artefacts. For this reason alone, the setup of the present study is ideal to consider this aspect of krill biology.

      Furthermore, the equipment developed for measuring levels of activity is well-designed and likely to minimise artefacts. 

      We would like to thank Reviewer 2 for their positive assessment of our approach to study the influence of the circadian clock on krill behavior. We are delighted that Reviewer 2 found our mechanistic approach to understanding daily behavioral patterns of Antarctic krill using the AMAZE set-up convincing, and that the challenging circumstances of working with a polar, non-model species are acknowledged.

      Weaknesses 

      I have little criticism of the rationale for carrying out this work, nor of the experimental design. Nevertheless, the manuscript would benefit from a clearer explanation of the experimental design, particularly aimed at readers not familiar with research into circadian rhythms. Furthermore, I have a more fundamental question about the relationship between levels of activity and DVM on which I will expand below. Finally, it was unclear how the observational results made here related to the molecular aspects considered in the Discussion. 

      (1) Explanation of experimental design - I acknowledge that the format of this particular journal insists that the Results are the first section that follows the Introduction. This nevertheless presents a problem for the reader since many of the concepts and terms that would generally be in the Methods are yet to be explained to the reader. Hence, right from the start of the Results section, the reader is thrown into the detail of what happened during the LD-DD experiments without being fully aware of why this type of experiment was carried out in the first place. Even after reading the Methods, further explanation would have been helpful. Circadian cycle type research of this sort often entrains organisms to certain light cycles and then takes the light away to see if the cycle continues in complete darkness, but this critical piece of knowledge does not come until much later (e.g. lines 369-372), leaving the reader guessing until this point why the authors took the approach they did. I would suggest the following: (1) that more effort is made in the Introduction to explain the exact LD/DD protocols adopted; (2) that a schematic figure is placed early on in the manuscript where the protocol is explained including some logical flow charts of e.g. if behavioural cycle continues in DD then internal clock exists versus if cycle does not continue in DD, the exogenous cues dominate - followed by - major decrease in cyclic amplitude = weak clock versus minor decrease = strong clock and so on.

      We would like to thank Reviewer 2 for pointing out that the experimental design and the rationale behind it are not becoming clear early in the manuscript, especially for people outside the field of chronobiology. We think that the suggestion to include a schematic figure early in the manuscript is excellent and we plan to implement this in a revised version of the manuscript.  
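
      For readers outside chronobiology, the logical flow the reviewer proposes for the schematic can be written out as a short sketch. This is an illustration only and not part of the authors' analysis: the function name, the `amplitude_ratio` input (rhythm amplitude in DD divided by amplitude in LD), and the 0.5 cut-off between a "major" and a "minor" decrease are all hypothetical choices made here for the example.

```python
def interpret_ld_dd(rhythm_persists_in_dd: bool,
                    amplitude_ratio: float,
                    weak_threshold: float = 0.5) -> str:
    """Walk through the reviewer's suggested decision flow.

    rhythm_persists_in_dd: does the behavioural cycle continue in darkness?
    amplitude_ratio: DD amplitude / LD amplitude (hypothetical summary value).
    weak_threshold: hypothetical cut-off separating a major from a minor
        decrease in amplitude; not a value taken from the manuscript.
    """
    if not rhythm_persists_in_dd:
        # No cycle without light cues: behaviour follows exogenous cues.
        return "exogenous cues dominate"
    if amplitude_ratio < weak_threshold:
        # Cycle continues but damps strongly: weak endogenous clock.
        return "endogenous clock, weak"
    # Cycle continues with little damping: strong endogenous clock.
    return "endogenous clock, strong"


print(interpret_ld_dd(True, 0.3))
print(interpret_ld_dd(True, 0.9))
print(interpret_ld_dd(False, 0.0))
```

      Under these assumptions, a rhythm that persists in DD with a strongly reduced amplitude falls into the "weak clock" branch, which is the interpretation discussed for krill in this review.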

      (2) Activity vs kinesis - in this study, we are shown data that (i) krill have a circadian cycle - incubation experiments; (ii) that krill swarms display DVM in this region - echosounder data (although see my later point). My question here is regarding the relationship between what is being measured by the incubation experiments and the in situ swarm behaviour observations. The incubation experiments are essentially measuring the propensity of krill to swim upwards since it logs the number of times an individual (or group) breaks a beam towards the top of the incubation tube. I argue that krill may be still highly active in the rest of the tube but just do not swim close to the surface, so this approach may not be a good measure of "activity". Otherwise, I suggest that a more correct term for what is being measured is the level of "upward kinesis". As the authors themselves note, krill are negatively buoyant and must always be active to remain pelagic. What changes over the day-night cycle is whether they decide to expend that activity on swimming upwards, downwards or remaining at the same depth. Explaining the pattern as upward kinesis then also explains why swarms move upwards during the night. Just being more active at night may not necessarily result in them swimming upwards.

      We believe that there is a slight misunderstanding about how what we call “activity” is measured. The experimental columns are equipped with five detector modules, evenly distributed over the height of the column. In our analysis we count all beam breaks that are caused by upward movement, i.e. every time a detector module is triggered after a detector module at a lower position has been triggered, and not only when the top detector module is triggered. In this way, we record upward swimming movements throughout the column, and not only when the krill swims all the way to the top of the column. This still means that what we are measuring is swimming activity, caused by upward swimming. We use this measure to deliberately separate increased swimming activity from baseline activity (i.e. swimming which solely compensates for negative buoyancy) and inactivity (i.e. passive sinking).

      A higher activity is thus at first interpreted as an increase in swimming activity, which in the field may result in upwards directed swimming but also could mean a horizontal increase in activity, for example representing increased foraging and feeding activity. This would explain the daily activity pattern observed under LD cycles (Fig. 2), which shows a general increase in activity during the dark phase. This nighttime increase could be used for both upward directed migration during sunset as well as horizontal directed swimming for feeding and foraging throughout the night.

      We will formulate the description of the activity metric more clearly in the revised version of the manuscript.
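
      The counting rule described in this response to point (2) can be illustrated with a minimal sketch. It is not the authors' AMAZE code. It assumes that detector triggers arrive as a time-ordered list of module positions (0 for the lowest module, 4 for the highest, a hypothetical encoding) and that "after a lower module has been triggered" refers to the immediately preceding trigger; the function name is likewise invented for the example.

```python
from typing import Sequence


def count_upward_breaks(modules: Sequence[int]) -> int:
    """Count beam breaks that follow a break at a lower detector module.

    modules: time-ordered positions of triggered detector modules,
        0 = lowest, 4 = highest (hypothetical encoding).
    """
    count = 0
    # Compare each trigger with the one immediately before it.
    for previous, current in zip(modules, modules[1:]):
        if current > previous:
            # The animal moved from a lower to a higher module.
            count += 1
    return count


# A krill rises past modules 0, 1, 2, sinks back to 1, then rises to 3.
events = [0, 1, 2, 1, 3]
print(count_upward_breaks(events))  # 3
```

      In this example the animal never reaches the top module, yet three upward movements are counted, which is the distinction the response draws against counting only breaks of the topmost beam.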

      (3) Molecular relevance - Although I am interested in molecular clock aspects behind these circadian rhythms, it was not made clear how the results of the present study allow any further insight into this. In lines 282 to 284, the findings of the study by Biscontin et al (2017) are discussed with regard to how TIM protein is degraded by light via the clock photoreceptor CRYPTOCHROME 1. This element of the Discussion would be a lot more relevant if the results of the present study were considered in terms of whether they supported or refuted this or any other molecular clock model. As it stands, this paragraph is purely background knowledge and a candidate for deletion in the interest of shortening the Discussion.

      We agree that this part is not directly related to the data presented in the manuscript and will therefore omit this part in the revised version of the manuscript to keep the discussion concise and focused on the results. 

      Other aspects 

      (i) 'Bimodal swimming' was used in the Abstract and later in the text without the term being fully explained. I could interpret it to mean a number of things so some explanation is required before the term is introduced. 

      We thank the Reviewer for pointing this out and will provide an explanation for the term “bimodal swimming” in a revised version of the manuscript. 

      (ii) Midnight sinking - I was struck by Figure 2b with regards to the dip in activity after the initial ascent, as well as the rise in activity predawn. Cushing (1951) Biol Rev 26: 158-192 describes the different phases of a DVM common to a number of marine organisms observed in situ where there is a period of midnight sinking following the initial dusk ascent and a dawn rise prior to dawn descent. Tarling et al (2002) observe midnight sinking pattern in Calanus finmarchicus and consider whether it is a response to feeding satiation or predation avoidance (i.e. exogenous factors). Evidence from the present study indicates that midnight sinking (and potential dawn rise) behaviour could alternatively be under endogenous control to a greater or lesser degree. This is something that should certainly be mentioned in the Discussion, possibly in place of the molecular discussion element mentioned above - possibly adding to the paragraph Lines 303-319. 

      We would like to thank the Reviewer for pointing this out and agree that it would be interesting to add the idea of an endogenous control of midnight sinking to the discussion. We plan to implement this in a revised version of the manuscript. 

      (iii) Lines 200-207 - I struggled to follow this argument regarding Piccolin et al identifying a 12 h rhythm whereas the present study indicates a ~24 h rhythm. Is one contradicting the other - please make this clear. 

      In our study we found that the circadian clock drives a bimodal pattern of swimming activity in krill, meaning it controls two bouts of activity in a 24 h cycle. Piccolin et al. (2020) identified a swimming activity pattern of ~12 h (i.e. two peaks in 24 h) at the group level, which is in line with our findings at the individual level. We will revisit the mentioned section for more clarity in a revised version.   

      (iv) Although I agree that the hydroacoustic data should be included and is generally supportive of the results, I think that two further aspects should be made clear for context (a) whether there was any groundtruthing that the acoustic marks were indeed krill and not potentially some other group known to perform DVM such as myctophids (b) how representative were these patterns - I have a sense that they were heavily selected to show only ones with prominent DVM as opposed to other parts of the dataset where such a pattern was less clear - I am aware of a lot of krill research where DVM is not such a clear pattern and it is disingenuous to provide these patterns as the definitive way in which krill behaves. I ask this be made clear to the reader (note also that there is a suggestion of midnight sinking in Fig 5b on 28/2).

      To clarify the mentioned points concerning the hydroacoustic data:

      a) As mentioned in the Methods section, only hydroacoustic data during active fishing was included in the analysis. E. superba occurs in large monospecific aggregations and the fishery is actively targeting E. superba and monitoring their catch and the proportion of non-target species continuously with cameras. Krill fishery bycatch rates are very low (0.1–0.3%, Krafft et al. 2018), and fishing operations would stop if non-target species were being caught in significant proportions at any time. Therefore, and supported by our own observations when we conducted the experiments, we argue that it is a valid assumption that the backscattering signal shown in Figure 5 is predominantly caused by E. superba. 

      b) We are aware of the fact that DVM patterns of Antarctic krill are highly variable and that normal DVM patterns do not need to be the rule (e.g. see our cited study on the plasticity of krill DVM by Bahlburg et al. 2023). The visualized data were not selected for their DVM pattern but represent the period directly preceding the sampling for behavioral experiments in four different seasons (namely S1-S4), including the day of sampling. These periods were chosen to assess the DVM behavior of krill swarms in the field in the days before and during the sampling for behavioral experiments. 

      We will include these aspects in the Methods section in a revised version of the manuscript in order to improve understanding.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors' research group had previously demonstrated the release of large multivesicular body-like structures by human colorectal cancer cells. This manuscript expands on their findings, revealing that this phenomenon is not exclusive to colorectal cancer cells but is also observed in various other cell types, including different cultured cell lines, as well as cells in the mouse kidney and liver. Furthermore, the authors argue that these large multivesicular body-like structures originate from intracellular amphisomes, which they term "amphiectosomes." These amphiectosomes release their intraluminal vesicles (ILVs) through a "torn-bag mechanism." Finally, the authors demonstrate that the ILVs of amphiectosomes are either LC3B positive or CD63 positive. This distinction implies that the ILVs either originate from amphisomes or multivesicular bodies, respectively.

      Strengths:

      The manuscript reports a potential origin of extracellular vesicle (EV) biogenesis. The reported observations are intriguing.

      Weaknesses:

      It is essential to note that the manuscript has issues with experimental designs and lacks consistency in the presented data. Here is a list of the major concerns:

      (1) The authors culture the cells in the presence of fetal bovine serum (FBS) in the culture medium. Given that FBS contains a substantial amount of EVs, this raises a significant issue, as it becomes challenging to differentiate between EVs derived from FBS and those released by the cells. This concern extends to all transmission electron microscopy (TEM) images (Figure 1, 2P-S, S5, Figure 4 P-U) and the quantification of EV numbers in Figure 3. The authors need to use an FBS-free cell culture medium.

      Although FBS indeed contains bovine EVs, very large multivesicular EVs (amphiectosomes) of the kind our manuscript focuses on have never been observed or reported in FBS. For reported size distributions of EVs in FBS, please find a few relevant references below:

      PMID: 29410778, PMID: 33532042, PMID: 30940830 and PMID: 37298194

      All the above publications show that the number of lEVs > 350-500 nm is negligible in FBS. The average diameter of MV-lEVs (amphiectosomes) described in our manuscript is around 1.00-1.50 micrometers.

      Reviewer #1: These papers evaluated the effectiveness of various methods to eliminate EVs from FBS, emphasizing the challenges associated with the presence of EVs in FBS. They also caution against using FBS in EV studies due to these issues. However, I did not find a clear indication regarding the size distributions of EVs in FBS in these papers.

      Please provide accurate reference supporting the claim that 'lEVs > 350-500 nm are negligible in FBS.' The papers cited by the authors do not address this specific point.

      In the revised manuscript, we addressed the point that, due to sterile filtering, FBS cannot contain large (>0.22 µm) EVs.

      Our response to Reviewer #1 point 2. When we demonstrated the TEM of isolated EVs, we consistently used serum-free conditioned medium (Fig2 P-S, Fig2S5 J, O) as described previously (Németh et al 2021, PMID: 34665280).

      Reviewer #1: This is an important point that is not mentioned in the original main text, figure legend or method. Please address.

      We agree and apologize for the omission. We added this information to the revised manuscript.

      Our response to Reviewer #1 point 3. Our TEM images show cells captured in the process of budding and scission of large multivesicular EVs, excluding the possibility that these structures could have originated from FBS.

      Reviewer #1: These images may also depict the engulfment of EVs in FBS. Hence, it is crucial to utilize EV-free or EV-depleted FBS.

      As we mentioned earlier, we added the information to the revised manuscript that sterile filtering of the FBS presumably removed EVs >0.22 µm.

      Our response to Reviewer #1 point 4. In addition, in our confocal analysis, we studied Palm-GFP positive, cell-line derived MV-lEVs. Importantly, in these experiments, FBS-derived EVs are non-fluorescent, therefore, the distinction between GFP positive MV-lEVs and FBS-derived EVs was evident.

      Reviewer #1: I agree that these fluorescent-labeled assays conclusively indicate that the MV-lEVs are originating from the cells. However, the images of concern are the non-fluorescent-labeled images in (Figure 1, 2P-S, S5, Figure 4 P-U and Figure 3). The MV-lEVs may derive from both the cells and FBS.

      Please see above our response to points 1-3.

      Our response to Reviewer #1 point 5. In addition, culturing cells in FBS-free medium (serum starvation) significantly affects autophagy. Given that in our study, we focused on autophagy related amphiectosome secretion, we intentionally chose to use FBS supplemented medium.

      Reviewer #1: If this is a concern, the authors should use EV-depleted FBS.

      As we discussed above, sterile filtration of FBS removes particles >0.22 µm. In addition, based on our preliminary experiments, EV-depleted serum may affect cell physiology.

      Our response to Reviewer #1 point 6. Even though the authors of this manuscript are not familiar with the technological details of how FBS is processed before commercialization, it is reasonable to assume that the samples are subjected to sterile filtration (through a 0.22 micron filter) after which MV-lEVs cannot be present in the commercial FBS samples.

      Reviewer #1: This is a fair comment that needs to be included in the manuscript.

      As you suggested, this comment is now included in the revised manuscript.

      (2) The data presented in Figure 2 is not convincingly supportive of the authors' conclusion. The authors argue that "...CD81 was present in the plasma membrane-derived limiting membrane (Figures 2B, D, F), while CD63 was only found inside the MV-lEVs (Fig. 2A, C, E)." However, in Figure 2G, there is an observable CD63 signal in the limiting membrane (overlapping with the green signals), and in Figure 2J, CD81 also exhibits overlap with MV-IEVs.

      Both CD63 and CD81 are tetraspanins known to be present both in the membrane of sEVs and in the plasma membrane of cells (for references, please see Uniprot subcellular location maps: https://www.uniprot.org/uniprotkb/P08962/entry#subcellular_location https://www.uniprot.org/uniprotkb/P60033/entry#subcellular_location). However, according to the feedback of the reviewer, for clarity, we will delete the implicated sentence from the text.

      Reviewer #1: Please also justify the statement questioned in (3) as these arguments are interconnected.

      We hope you find our above responses to your comment acceptable.

      (3) Following up on the previous concern, the authors argue that CD81 and CD63 are exclusively located on the limiting membrane and MV-IEVs, respectively (Figure 2-A-M). However, in lines 104-106, the authors conclude that "The simultaneous presence of CD63, CD81, TSG101, ALIX, and the autophagosome marker LC3B within the MV-lEVs..." This statement indicates that CD63 and CD81 co-localize to the MV-IEVs. The authors need to address this apparent discrepancy and provide an explanation.

      There must be a misunderstanding because we did not claim or imply in the text that “CD81 and CD63 are exclusively located on the limiting membrane and MV-IEVs”. Here we studied co-localization of the above proteins in the case of intraluminal vesicles (ILVs). In Fig. 2 we did not show any analysis of limiting membrane co-localization.

      Reviewer #1: I have indicated that this statement is found in lines 104-106, where the authors argue, 'The simultaneous presence of CD63, CD81, TSG101, ALIX, and the autophagosome marker LC3B within the MV-lEVs...' If the authors acknowledge the inaccuracy of this statement, please provide a justification for this argument.

      For clarity, we modified the description of data shown in Fig2 in the revised manuscript.

      (4) The specificity of the antibodies used in Figure 2 should be validated through knockout or knockdown experiments. Several of the antibodies used in this figure detect multiple bands on western blots, raising doubts about their specificity. Verification through additional experimental approaches is essential to ensure the reliability and accuracy of all the immunostaining data in this manuscript.

      We will consider this suggestion during the revision of the manuscript.

      Reviewer #1: Please do so.

      We carefully considered the suggestion, but we realized that it was not feasible for us to perform gene silencing for all of the antibodies we used before resubmission of our revised manuscript. However, we repeated the Western blot for mouse anti-CD81 (Invitrogen MAA5-13548) and replaced the previous Western blot with it in the revised manuscript (Fig.2-S4H).

      (5) In Figures 2P-R, the morphology of the MV-IEVs does not resemble those shown in Figures 1-A, H, and D, indicating a notable inconsistency in the data.

      EM images in Figure 2 P-R show sEVs separated from serum-free conditioned media as opposed to MV-lEVs, which were captured in situ in fixed tissue cultures (Fig1). Therefore, the two EV populations necessarily have different size and structure. Furthermore, Fig. 1 shows images of ultrathin sections while in Figure 2P-R, we used negative-positive contrasting of intact sEVs without embedding and sectioning.

      (6) There are no loading controls provided for any of the western blot data.

      Not even the latest MISEV 2023 guidelines give recommendations for proper loading control for separated EVs in Western blot (MISEV 2023, DOI: 10.1002/jev2.12404 PMID: 38326288). Here we applied our previously developed method (PMID: 37103858), which in our opinion, is the most reliable approach to be used for sEV Western blotting. For whole cell lysates, we used actin as loading control (Fig3-S2B).

      Reviewer #1: The blots referenced here (Fig2-S3; Fig2-S4B; Fig3-S2B) were conducted using total cell lysates, not EV extracts. Only one blot in Fig3-S2B includes an actin control. All remaining blots should incorporate actin controls for consistency.

      Fig2-S3 (corresponding to Fig2-S4 in the revised manuscript) only shows reactivity of the used antibodies. This Western blot is not intended to serve as a basis of any quantitative conclusions. Fig2-S4 (corresponding to Fig2-S5 in the revised manuscript) includes the actin control. Fig3-S2B shows the complete membrane, which was cut into 4 pieces, and the immune reactivity of different antibodies was tested. The actin band was included on the anti-LC3B blot. For clarity, we rephrased the figure legend.

      Additionally, for Figures 2-S4B, the authors should run the samples from lanes i-iii in a single gel.

      Please note that in Figure 2- S4B, we did run a single gel, and the blot was cut into 4 pieces, which were tested by anti-GFP, anti-RFP, anti-LC3A and anti-LC3B antibodies. Full Western blots are shown in Fig.3_S2 B, and lanes “1”, “2” and “3” correspond to “i”, “ii” and “iii” in Fig.2-S4, respectively.

      Reviewer #1: In the original Figure 2- S4B, the blots were sectioned into 12 pieces. If lanes "i," "ii," and "iii" were run on the same blot, the authors are advised to eliminate the grids between these lanes.

      Grids separating the lanes have been eliminated on Fig.2_S4 (now Fig.2_S5 in the revised manuscript).

      (7) In Figure 2-S4, is there co-localization observed between LC3RFP (LC3A?) with other MV-IFV markers? How about LC3B? Does LC3B co-localize with other MV-IFV markers?

      In Supplementary Figure 2-S4, we showed successful generation of the HEK293T-PalmGFP-LC3RFP cell line. In this case we tested the cells, and not the released MV-lEVs. LC3A co-localized with the RFP signal as expected.

      Reviewer #1: Does LC3RFP colocalize with MV-IFV markers in HEK293T-PalmGFP-LC3RFP cell line? This experiment aims to clarify the conclusion made in lines 104-106, where the authors assert that 'The concurrent existence of CD63, CD81, TSG101, ALIX, and the autophagosome marker LC3B within the MV-lEVs...'

      In the case of PalmGFP-LC3RFP cells, LC3-RFP is overexpressed. Simultaneous assessment of this overexpressed protein with non-overexpressed, fluorescent antibody-detected molecules proved to be challenging because of spectral overlaps and inappropriate signal-to-noise ratios. Furthermore, in association with EVs, the number of antibody-detected molecules is substantially lower than in cells. Therefore, even though we tried, we could not successfully perform these experiments.

      (8) The TEM images presented in Figure 2-S5, specifically F, G, H, and I, do not closely resemble the images in Figure 2-S5 K, L, M, N, and O. Despite this dissimilarity, the authors argue that these images depict the same structures. The authors should provide an explanation for this observed discrepancy to ensure clarity and consistency in the interpretation of the presented data.

      As indicated in Material and Methods, Fig 2-S5 F, G, H and I are conventional TEM images of samples fixed with 4% glutaraldehyde and 1% OsO<sub>4</sub> for 2 h and embedded in Epon resin, with post-contrasting by 3.75% uranyl acetate for 10 min and lead citrate for 12 min. Samples processed this way have very high structure preservation and better image quality; however, they are not suitable for immune detection. In contrast, Fig.2-S5 K, L, M, N show immunogold labelling of in situ fixed samples. In this case we used milder fixation (4% PFA, 0.1% glutaraldehyde, postfixed by 0.5% OsO<sub>4</sub> for 30 min) and LR-White hydrophilic resin embedding. This special resin enables immunogold TEM analysis. The sections were exposed to H<sub>2</sub>O<sub>2</sub> and NaBH<sub>4</sub> to render the epitopes accessible in the resin. Because of the different applied techniques, the preservation of the structure is not the same. In the case of Fig.2 J, O, separated sEVs were visualised by negative-positive contrast and immunogold labelling as described previously (PMID: 37103858).

      Reviewer #1: Please include this justification in the revised version.

      We included this justification in the revised manuscript.

      (9) For Figures 3C and 3-S1, the authors should include the images used for EV quantification. Considering the concern regarding potential contamination introduced by FBS (concern 1), it is advisable for the authors to employ an independent method to identify EVs, thereby confirming the reliability of the data presented in these figures.

      In our revised manuscript, we will provide all the images used for EV quantification in Figure 3C. Given that Figures 3C and 3-S1 show MV-lEVs released by HEK293T-PalmGFP cells, the possible interference by FBS-derived non-fluorescent EVs can be excluded.

      Reviewer #1: Please provide all the images.

      Original LASX files are provided (DOI: 10.6019/S-BIAD1456 ).

      Reviewer #1: The images raising concerns regarding the contamination of EVs in FBS primarily consist of transmission electron microscopy (TEM) images, namely, Figure 1, 2P-S, S5, and Figure 4 P-U, along with the quantification of EV numbers in Figure 3. These concerns persist despite the use of fluorescent-labeled experiments. While fluorescent-labeled MV-lEVs are conclusively identified as originating from the cells, the MV-lEVs observed in Figure 1, 2P-S, S5, and Figure 4 P-U and Figure 3 may derive from both the cells and FBS.

      Large EVs (with diameter >800 nm) derived from FBS were not present in our experiments, as discussed above.

      (10) Do the amphiectosomes released from other cell types as well as cells in mouse kidneys or liver contain LC3B positive and CD63 positive ILVs?

      Based on our confocal microscopic analysis, in addition to the HEK293T-PalmGFP cells, HT29 and HepG2 cells also release similar LC3B and CD63 positive MV-lEVs. Preliminary evidence shows MV-lEV secretion by additional cell types.

      The response of Reviewer #1: Please show these data in the revised manuscript. Moreover, do cells in mouse kidneys or liver contain LC3B positive and CD63 positive ILVs?

      We have added new confocal microscopic images to Fig2-S3 showing amphiectosomes released also by the H9c2 (ATCC) cardiomyoblast cell line. To preserve the ultrastructure of MV-lEVs in complex organs like kidney and liver, fixation with 4% glutaraldehyde with 1% OsO4 appears to be essential. This fixation does not allow for immune detection to assess LC3B and CD63 positive MV-lEVs in the ultrathin sections.

      Reviewer #2 (Public Review):

      Summary:

      The authors had previously identified that a colorectal cancer cell line generates small extracellular vesicles (sEVs) via a mechanism where a larger intracellular compartment containing these sEVs is secreted from the surface of the cell and then tears to release its contents. Previous studies have suggested that intraluminal vesicles (ILVs) inside endosomal multivesicular bodies and amphisomes can be secreted by the fusion of the compartment with the plasma membrane. The 'torn bag mechanism' considered in this manuscript is distinctly different because it involves initial budding off of a plasma membrane-enclosed compartment (called the amphiectosome in this manuscript, or MV-lEV). The authors successfully set out to investigate whether this mechanism is common to many cell types and to determine some of the subcellular processes involved.

      The strengths of the study are:

      (1) The high-quality imaging approaches used seem to show good examples of the proposed mechanism.

      (2) They screen several cell lines for these structures, also search for similar structures in vivo, and show the tearing process by real-time imaging.

      (3) Regarding the intracellular mechanisms of ILV production, the authors also try to demonstrate the different stages of amphiectosome production and differently labelled ILVs using immuno-EM.

      Several of these techniques are technically challenging to do well, and so these are critical strengths of the manuscript.

      The weaknesses are:

      (1) Most of the analysis is undertaken with cell lines. In fact, all of the analysis involving the assessment of specific proteins associated with amphiectosomes and ILVs are performed in vitro, so it is unclear whether these processes are really mirrored in vivo. The images shown in vivo only demonstrate putative amphiectosomes in the circulation, which is perhaps surprising if they normally have a short half-life and would need to pass through an endothelium to reach the vessel lumen unless they were secreted by the endothelial cells themselves.

      Our previous results analyzing PFA-fixed, paraffin embedded sections of colorectal cancer patients provided direct evidence that MV-lEV secretion also occurs in humans in vivo (PMID: 31007874). Regarding your comment on the presence of amphiectosomes in the circulation despite their short half-lives, we would like to point out that Fig1.X shows a circulating lymphocyte which releases MV-lEV within the vessel lumen. Furthermore, in the revised manuscript, an additional Fig.1-S1 is provided. Here, we show the release of MV-lEVs both by an endothelial and a sub-endothelial cell (Fig.1-S1G). In addition, these images show the simultaneous presence of MV-lEVs and sEVs in the circulation (Fig.1-S1.A,C,D,H and I). The transmission electron micrographs of mouse kidney and liver sections provide additional evidence that the MV-lEVs are released by different types of cells, and the “torn bag release” also takes place in vivo (Fig.1.V).

      (2) The analysis of the intracellular formation of compartments involved in the secretion process (Figure 2-S5) relies on immuno-EM, which is generally less convincing than high-/super-resolution fluorescence microscopy because the immuno-labelling is inevitably very sporadic and patchy. High-quality EM is challenging for many labs (and seems to be done very well here), but high-/super-resolution fluorescence microscopy techniques are more commonly employed, and the study already shows that these techniques should be applicable to studying the intracellular trafficking processes.

      As you suggested, in the revised manuscript, we present additional super-resolution microscopy (STED) data. The intracellular formation of amphisomes, the fragmentation of LC3B-positive membranes and the formation of LC3B-positive ILVs were captured (Fig. 3B-F).

      (3) One aspect of the mechanism, which needs some consideration, is what happens to the amphisome membrane, once it has budded off inside the amphiectosome. In the fluorescence images, it seems to be disrupted, but presumably, this must happen after separation from the cell to avoid the release of ILVs inside the cell. There is an additional part of Figure 1 (Figure 1Y onwards), which does not seem to be discussed in the text (and should be), that alludes to amphiectosomes often having a double membrane.

      We agree with your comment regarding the amphisome membrane and we added a sentence to the Discussion of the revised manuscript. Fig1Y onwards is now discussed in the manuscript. In addition, we labelled the surface of living HEK293 cells with wheat germ agglutinin (WGA), which binds to sialic acid and N-acetyl-D-glucosamine. After removing the unbound WGA by washes, the cells were cultured for an additional 3 hours, and the release of amphiectosomes was studied. The budding amphiectosome had a WGA-positive membrane, providing evidence that the external limiting membrane had a plasma membrane origin (Fig.3G).

      (4) The real-time analysis of the amphiectosome tearing mechanism seemed relatively slow to me (over three minutes), and if this has been observed multiple times, it would be helpful to know if this is typical or whether there is considerable variation.

      Thank you for this comment. In the revised manuscript, we highlight that the first released LC3 positive ILV was detected within as little as 40 sec.

      Overall, I think the authors have been successful in identifying amphiectosomes secreted from multiple cell lines and demonstrating that the ILVs inside them have at least two origins (autophagosome membrane and late endosomal multivesicular body) based on the markers that they carry. The analysis of intracellular compartments producing these structures is rather less convincing and it remains unclear what cells release these structures in vivo.

      I think there could be a significant impact on the EV field and consequently on our understanding of cell-cell signalling based on these findings. It will flag the importance of investigating the release of amphiectosomes in other studies, and although the authors do not discuss it, the molecular mechanisms involved in this type of 'ectosomal-style' release will be different from multivesicular compartment fusion to the plasma membrane and should be possible to be manipulated independently. Any experiments that demonstrate this would greatly strengthen the manuscript.

      We appreciate these comments of the reviewer. Experiments are on their way to elucidate the mechanism of the “ectosomal style” exosome release and will be the topic of our next publication.

      In general, the EV field has struggled to link up analysis of the subcellular biology of sEV secretion and the biochemical/physical analysis of the sEVs themselves, so from that perspective, the manuscript provides a novel angle on this problem.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors describe a novel mode of release of small extracellular vesicles. These small EVs are released via the rupture of the membrane of so-called amphiectosomes that resemble "morphologically" Multivesicular Bodies.

      These structures have been initially described by the authors as released by colorectal cancer cells (https://doi.org/10.1080/20013078.2019.1596668). In this manuscript, they provide experiments that allow us to generalize this process to other cells. In brief, amphiectosomes are likely released by ectocytosis of amphisomes that are formed by the fusion of multivesicular endosomes with autophagosomes. The authors propose that their model puts forward the hypothesis that LC3 positive vesicles are formed by "curling" of the autophagosomal membrane which then gives rise to an organelle where both CD63 and LC3 positive small EVs co-exist and would be released then by a budding mechanism at the cell surface that appears similar to the budding of microvesicles /ectosomes. Very correctly the authors make the distinction from migrasomes because these structures appear very similar in morphology.

      Strengths:

      The findings are interesting despite that it is unclear what would be the functional relevance of such a process and even how it could be induced. It points to a novel mode of release of extracellular vesicles.

      Weaknesses:

      This reviewer has comments and concerns concerning the interpretation of the data and the proposed model. In addition, in my opinion, some of the results in particular micrographs and immunoblots (even shown as supplementary data) are not of quality to support the conclusions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Highlight MV-lEV, ILV and limiting membrane in Figure-1G, N, and U.

      Based on the suggestion, we revised Figure1

      (2) Figure 1-Y-AF are not mentioned in the text.

      In the revised manuscript, we discuss Figure 1Y-AF

      (3) The term "IEVs" in Figure 2-S2 is not defined.

      We modified the figure legend: we changed MV-lEV to amphiectosome

      (4) Need to quantify co-localization in Figure 2-S2.

      As suggested, we carried out the co-localisation analysis (Fig2-S2I), and Fig2-S2 was re-edited

      Reviewer #2 (Recommendations For The Authors):

      I have two recommendations for improving the manuscript through additional experiments:

      (1) I think the description of the intracellular processes taking place in order to form amphiectosomes would be much stronger if some super-resolution imaging could be undertaken. This should label the different compartments before and after fusion with specific markers that highlight the protein signature of the different limiting and ILV membranes much more clearly than immuno-EM. It will also help in characterising the double-membrane structure of amphiectosomes at the point of budding and reveal whether the patchy labelling of the inner membrane emerges after amphiectosome release (the schematic model currently suggests that it happens before).

      Thank you for your suggestion. STED microscopy was applied and results are shown in new Fig3 and the schematic model was modified accordingly.

      (2) The implications of the manuscript would be more wide-ranging if the authors could test genetic manipulations that are believed to block exosome or ectosome release, eg. Rab27a or Arrdc1 knockdown. This may allow them to determine whether MV-lEVs can be released independently of the classical exosome release mechanism because they use a different route to be released from the plasma membrane. This experiment is not essential, but I think it would start to address the core regulatory mechanisms involved, and if successful, would easily allow the authors to determine the ratio of CD63-positive sEVs being secreted via classical versus amphiectosome routes.

      The suggestion is very valuable for us and these studies are being performed in a separate project.

      I think there are several other ways in which the manuscript could be improved to better explain some of the approaches, findings and interpretation:

      (1) Include some explanation in the text of certain key tools, particularly:

      a. Palm-GFP and whether its expression might alter the properties of the plasma membrane since this is used in a lot of experiments and is the only marker that seems to uniformly label the outer membrane of amphiectosomes. One concern might be that its expression drives amphiectosome secretion.

      We found evidence for amphiectosome release also in the case of several different cells not expressing Palm-GFP. We believe this excludes the possibility that Palm-GFP expression is the inducer of the amphiectosome release. By both fluorescent and electron microscopy, the cells not expressing Palm-GFP showed very similar MV-lEVs. In addition, in the case of non-transduced HEK293 and fluorescent WGA-binding, we made similar observations.

      b. Lactadherin - does this label the amphiectosomes after their release or does the wash-off step mean that it only labels cells, which subsequently release amphiectosomes?

      Lactadherin labels the amphiectosomes after their release and fixation. Living cells cannot be labelled by lactadherin as PS is absent in the external plasma membrane layer of living cells. We used WGA on HEK293 cells to further support the plasma membrane origin of the external membrane of amphiectosomes.

      (2) Explain the EM and confocal imaging approaches more clearly. Most importantly, is a 3D reconstruction always involved to confirm that 'separated' amphiectosomes are not joined to cells in another Z-plane.

      Thank you for your suggestion. We have modified the manuscript accordingly

      (3) Presenting triple-labelled images with red, green and yellow channels does not allow individual labelling to be determined without single-channel images and even then, it is much more informative to use three distinguishable colours that make a different colour with overlap, eg. CMY? Fig.2_S2D and E do not display individual channels, so definitely need to be changed.

      In the case of Fig.2_S2D, we now show the individual channels; the earlier E image has been removed. In the case of the STED images, CMY colors were used, as you suggested.

      (4) Please discuss in the text the data in Figure 1Y onwards concerning single/double membranes on MV-lEVs.

      In the revised manuscript, we discuss the question on single/double membranes and we refer to Figure 1Y-AF

      (5) On line 162, reword 'intraluminal TSPAN4 only' to 'one in which TSPAN4 is only intraluminal' to make it clear that other proteins are also marking the intraluminal region, not TSPAN4 only.

      We modified the text accordingly.

      (6) Points for further discussion and further conclusions:

      a. In vivo experiments - discuss the limitations of this part of the analysis - it seems that none of the amphiectosome markers have been analysed in this part of the study and the MV-lEVs are only in the circulation.

      b. Can the authors give any further indication of the levels of MV-lEVs relative to free sEVs from any of their studies?

      Using our current approach, it is not possible to determine the levels of MV-lEVs relative to free sEVs. Without analyzing serial ultrathin sections, determination of the relative ratio of MV-lEVs and sEVs would depend on the actual section plane. In future projects, we will determine the ratio of LC3 positive and negative sEVs by single EV analysis techniques (such as SP-IRIS). In the revised manuscript, additional TEM images are included to provide evidence for the simultaneous presence of sEVs and MV-lEVs, and for the presence of MV-lEVs both inside and outside of the circulation.

      c. Please discuss the single versus double membrane issue (relating to experiments proposed above).

      We discuss this question in more detail in the revised manuscript.

      d. Please point out that the release mechanism (plasma membrane budding) will involve different molecular mechanisms to establish exosome release, and this might provide a route to determine relative importance.

      We are currently running a systematic analysis of the release mechanism of amphiectosomes, and this will be the topic of a separate manuscript.

      Reviewer #3 (Recommendations For The Authors):

      * The model is not supported.

      * The data is not of quality.

      * The appropriate methods are not exploited.

      We are sorry, we cannot respond to these unsupported critiques.

    1. Unit 1 Socratic Seminar: Is Social Media More Beneficial or Negative to Society? Directions: Read and Annotate the readings by making comments in the document, or the margins if you’re doing it on paper on the 2 articles listed below. Use a physical highlighter or highlighter tool for quotes/ideas you want to explore more and talk about. TYPE YOUR ANSWERS IN BLUE IF DOING THIS DIGITALLY.<br /> Fill out the Summaries at the bottom of Fill out the 6 questions you will use during the Socratic to drive the conversation. Do this part LAST!

      Write out the 6 Questions you will use during the Socratic Seminar. (Do this LAST IN BLUE FONT) 1. How will social media evolve in the future? 2. How is social media affecting our outside interactions with others? 3. 4. 5. 6.

      Reading #1: Supporters Argue: Social Media Is Beneficial Overall 1a Supporters argue that social networking is a phenomenon that is beneficial overall and has changed the world for the better. Perhaps the greatest measure of social media's success, they contend, is the role it played in ousting undemocratic governments in Tunisia and Egypt. Journalist Peter Beaumont of the British newspaper the Guardian argued in 2011 that "a young woman or a young man with a smartphone" was the "defining" image of the Arab Spring. "The instantaneous nature of how social media communicate self-broadcast ideas, unlimited by publication deadlines and broadcast news slots, explains in part the speed at which these revolutions have unraveled, their almost viral spread across a region," he contended. "It explains, too, the often loose and non-hierarchical organization of the protest movements unconsciously modeled on the networks of the web." 2a Indeed, supporters argue that social media can be extremely useful in encouraging people who would not typically be politically motivated to engage in various issues or causes. While such statements are sometimes derided by critics as "hashtag activism" or "slacktivism," defenders insist that such actions really can make a difference. "What is commonly called slacktivism is not at all about 'slacking activists,'" Harvard University sociology professor Zeynep Tufekci wrote on her blog in 2012. "[R]ather it is about non-activists taking symbolic action—often in spheres traditionally engaged only by activists or professionals (governments, NGOs, international institutions.). Since these so-called 'slacktivists' were never activists to begin with, they are not in dereliction of their activist duties. On the contrary, they are acting, symbolically and in a small way, in a sphere that has traditionally been closed off to 'the masses' in any meaningful fashion." 
3a Social media has many other benefits, advocates contend, including the potential to assist during times of catastrophe. During and after the terrorist attacks that rocked Paris, France, in November 2015, supporters note, people took to Facebook, Twitter, and other social media to communicate to loved ones that they were safe, or to offer refuge to people stranded in the city. "The attacks which ravaged the French capital yesterday showed how social media can also play a much more positive role," Forbes contributor Federico Guerrini wrote. "Facebook activated its Safety Check tool…to help people in areas affected by a disaster let their Facebook friends know they are safe. Twitter was also helpful: residents used the hashtag #porteouverte [open doors] to offer shelter to people stranded in the city." Advocates of social networking contend that sites like Facebook and Twitter have brought people closer together. "It has never been easier to make friends than it is right now, mainly thanks to social networking sites," writer Dave Parrack argued on the technology website MakeUseOf.com in 2012. "Just a few decades ago it was pretty tough to connect with people unless you were the overly outgoing type able to make conversation with anyone at a party. The rise of mobile phones helped change this, connecting people in a new way, but then social networks sprang up and the whole idea of friendship changed once more and forever." 4a Supporters maintain that social networking sites increasingly function as a refuge where people can relax with their friends and family. "This is where social media become a powerful social force in the modern sphere," Taso Lagos of the University of Washington wrote in the Seattle Times in 2012. 
"Because we live in a world of constant anxiety and stress about our lives, our careers, the planet and the fate of our families and friends, trusted sites like Facebook and Twitter are places we turn to relieve this tension and allow us to live and express our humanity." Social media, he argued, are "the community centers of the future." 5a Such sites provide many valuable benefits, defenders argue, including enhancing people's sense of self-worth. The act of taking and posting selfies, they contend, helps people exert control over their self-image and the way they are viewed. "The harshly judged practice of self-picture taking," Huffington Post contributor Molly Fosco wrote in March 2014, "while perhaps excessive or annoying at times, can actually be a really simple way to feel really good about yourself…. Although our selfies might be veiled in narcissism, self-obsession, or boastfulness I think that for many it's a genuine attempt to boost self-esteem. Seeing a close-up picture of your own face and willingly showing it to thousands of people with one click is a form of self-confidence that I don't think should be quickly dismissed." 6a Supporters of social media discount many of the fears typically raised by opponents, noting that it is common for new technology to stir criticism. In the late 19th century, they note, some observers predicted that the telephone would severely damage interpersonal relationships, just as detractors of social media do today. The telephone "was going to bring down our society," Megan Moreno of the University of Wisconsin in Madison told the New York Times in 2012. "Men would be calling women and making lascivious comments, and women would be so vulnerable, and we'd never have civilized conversations again." She added, "When a new technology comes out that is something so important, there is this initial alarmist reaction." 
Write out a 100-word summary of your thoughts/ideas/opinions of the strengths and weaknesses of the Beneficial Side. (TYPE IN BLUE FONT) Social media supporters argue that it is good for the world, pointing to evidence that it helped movements like the Arab Spring spread quickly and made global peace talks more constructive. It reaches a lot of people and even calls on them to fight for their causes. It is useful for giving real-time help and information during an emergency, and it brings people who don't live near each other closer together. Social media can also help build self-esteem through many apps. These benefits come with risks, however, including heavy dependence on technology, exposure to wrong information, and the threat of encountering harmful sites or people, despite how useful social media can sometimes be.

      Reading #2: Opponents Argue: Social Media Is Not Beneficial Overall 1b Opponents of social networking argue that such sites are not beneficial overall and that they gradually erode many essential aspects of communication and socialization. "The shortcomings of social media would not bother me awfully if I did not suspect that Facebook friendship and Twitter chatter are displacing real rapport and real conversation," New York Times commentator Bill Keller argued in 2011. "The things we may be unlearning, tweet by tweet—complexity, acuity, patience, wisdom, intimacy—are things that matter." 2b Indeed, critics contend, the rise of social networking has coincided with a decline in the quality of conversation. "As we ramp up the volume and velocity of online connections, we start to expect faster answers," MIT psychology professor Sherry Turkle wrote in the New York Times in 2012. "To get these, we ask one another simpler questions; we dumb down our communications, even on the most important matters." 3b Opponents argue that social media can contribute to feelings of sadness and loneliness. A study by researchers at the University of Michigan in 2013, they note, found that college-aged users felt worse the more they used Facebook. Because people's Facebook personas are often curated to make their lives seem fun or perfect, critics argue, browsing social media can contribute to feelings of inadequacy. "When you're on a site like Facebook, you get lots of posts about what people are doing," co-author John Jonides, a cognitive neuroscientist at the Department of Psychology at the University of Michigan, told National Public Radio in 2013. "That sets up a social comparison — you maybe feel your life is not as full and rich as those people you see on Facebook." 4b Social media, critics charge, can lead people to obsess about themselves and their self-image to the point where it can be harmful. 
People need to look deeper for self-worth, they contend, than achieving "likes" by posting selfies on social media. "[I]f you've just spent half an hour editing a photo by blurring around your eyes with one app, adding eyelashes with another, then changing the colors with a third," Teen Vogue contributor Tiffany Perry wrote in March 2016, "chances are you're giving too much merit to how others perceive you." 5b Other critics claim that the impact of social media on political phenomena like the Arab Spring has been overstated. New Yorker columnist Malcolm Gladwell noted in 2011 that many revolutions took place throughout history before the advent of social networking. "People with a grievance will always find ways to communicate with each other," he wrote. "How they choose to do it is less interesting, in the end, than why they were driven to do it in the first place." 6b Opponents also assert that promoting political or social causes on social media has little real impact other than to make the person making the post feel good about themselves. In 2013, for example, the United Nations Children's Emergency Fund (UNICEF), a U.N. organization that raises money to help and protect children throughout the world, ran an ad campaign with a slogan that read "Like us on Facebook, and we will vaccinate zero children against polio." The point of the campaign, UNICEF explained, was not to disparage "likes" but to encourage more active support, such as contributing money to buy vaccines. "Slacktivism's inherent laziness disqualifies it as a real agent of progress because it does not possess the enthusiasm necessary for change," contributor Elias Tavaras wrote for the Hill in January 2016. "How can a post on Facebook inspire necessary action, especially when sitting down on a comfy computer chair? Indeed, the passion one may feel disappears, with a simple scroll or is drowned out by the other slacktivist posts." 
7b Critics charge that social media users are in danger of having their online personas co-opted by corporations eager to collect the information users share and employ it for marketing purposes. Robert Barry of the pop culture website The Quietus argues that social media is turning people into "branded products." "Online businesses which seem to be promising something for nothing—from social networking to file sharing—are really offering you, their audience, as a readymade and fully packaged item for purchase," he argued, "be that by the ghost of advertising's future, or the investor whose faith gives that ghost substance." Write out a 100-word summary of your thoughts/ideas/opinions of the strengths and weaknesses of the Against Side. (TYPE IN BLUE FONT)

    1. Why

      In the context of this poem, “why” does not seem to be a genuine question inviting explanation but rather a rhetorical one. If we read “why” this way, I think there are three main effects. One effect becomes obvious when considered in conjunction with the personified phrases that follow the word “why”: “fled the Ocean,” “skipt the Mountains,” and “turned the Jordan.” Based on the rest of the poem, we already know that it was God who caused these phenomena. However, the “why” serves as a sort of rhetorical emphasis, forcing the recognition that it was God—no one or nothing else—who animated the inanimate. Secondly, given that God accomplishes this seemingly impossible feat, Milton’s use of “why” conjures a sense of awe. That is to say, by asking why an unmovable object moves, Milton forces his audience to confront that there is no rational explanation for these occurrences. Thus, the audience must instead indulge their awe, inspired by the inability to truly comprehend the extent and inner workings of God’s omnipotence. However, Milton’s questioning via the use of “why” may also seem somewhat foreboding when considered alongside the strength of these natural forces. In other words, why did these strong and dynamic entities flee? Knowing that the answer is God, the “why” almost seems to invite the realization that whatever caused these entities to skip or turn away must be fearfully powerful. These three readings of but a single word seem to summarize Milton’s portrayal of God in this poem: as a powerful and awesome—yet fear-inspiring—entity.

    1. 26. When Jesus was in Galilee at the beginning of his fourth year he was playing by the Jordan, and made seven pools. A boy spoilt them, and was struck dead. The parents complained. Joseph asked Mary to admonish Jesus. She begged him not to do such things, and he, not willing to grieve her, ‘smote the back side of the dead boy with his foot and bade him rise: which he did, and Jesus went on with his pools’. 27. He took clay from the pools and made twelve sparrows, on the sabbath. A Jew saw it and spoke to Joseph, who spoke to Jesus. Jesus clapped his hands and bade the sparrows fly away. All marvelled, and some went and told the chief priests and Pharisees. 28. The son of Annas the priest broke up the pools with a stick, and Jesus with a word withered him up. 29. Joseph was afraid and took Jesus home. On the way a boy ran against Jesus and got on his shoulder, meaning to hurt him. Jesus said, ‘May you not return whole from the way you go.’ He fell dead. Complaints of the parents, as in Thomas. Joseph to Jesus: ‘Why do you do such things? Many are now complaining against you and hate us on your account, and we suffer injuries through you.’ Jesus: ‘No son is wise whom his father has not taught according to the knowledge of this age, and the curse of his father hurts no man except those who do ill.’ All reviled Jesus to Joseph and he was afraid. ‘Then Jesus took the dead boy by the ear and held him up by it in the sight of all, and they saw Jesus speaking to him as a father to his son. And his spirit returned unto him and he lived again, and all marvelled.’ 30. Master Zacchaeus spoke reproachfully to Joseph; ‘You and Mary think more of your son than of the traditions of the elders.’ Joseph: ‘But who can teach him? if you can do so, we are very willing.’ Jesus overhearing said, ‘What you say is well for ordinary people: I have no earthly father. When I am lifted up from the earth I will make all mention of your descent to cease. 
I know when you were born and how long you have to live.’ All cried out in wonder, ‘We have never heard the like.’ Jesus: ‘Does this surprise you? I will tell you more. I have seen Abraham and spoken with him, and he has seen me.’ None could answer. Jesus: ‘I have been among you with the children, and you have not known me. I have spoken with you as with the wise and you have not understood my voice, for you are less than me, and of little faith.’ 31. Zacchaeus said, ‘Give him to me and I will take him to Levi who shall teach him letters.’ Levi bade him answer to Aleph: he was silent. Levi smote him with a rod of storax on the head. Jesus: ‘Why do you hit me? Know of a truth that he who is smitten teaches the smiter more than he is taught of him. For I can teach you the things that you yourself say. But all these who speak and hear are blind like sounding brass or a tinkling cymbal wherein is no perception of those things that are signified by their sound.’ Further he said to Zacchaeus, ‘Every letter from Aleph to Thau is discerned by the arrangement of it. First say what Thau is, and I will tell you what Aleph is.’ And again he said, ‘They who do not know Aleph, how can they tell Thau, hypocrites that they are? Say what Aleph is first and then will I believe you when you say Beth. He said to the master, ‘Let the master of the law say what the first letter is, or why it has many triangles [eight adjectives follow].’ Levi was stupefied and then began to lament, ‘Ought he to live on the earth? Nay, rather is he worthy to be hung on a great cross. He can put out fire and escape all torments by guile. I think he was born before the flood, before the deluge. What womb bare him? What mother gave him birth? What breasts suckled him? I fly before him’, etc., etc. 
Jesus smiled and said with command to all the children of Israel that stood and heard him, ‘Let the unfruitful bear fruit, and the blind see, and the lame walk straight, and the poor enjoy good things, and the dead revive, and every one return into a restored state, and abide in him who is the root of life and of everlasting sweetness.’ All were healed who had fallen into evil infirmities. No one thereafter dared to say aught to him or hear aught of him. 32. At Nazareth the boy Zeno fell from the upper storey and was raised. Joseph, Mary, and Jesus went thence to Jericho. 33. Jesus' pitcher was broken by a child, and he brought water in his cloak. 34. He took a little corn out of his mother's barn and sowed it. When reaped it made three measures, which he gave away. 35–6. [Translated below.] 37. A bed of six cubits was ordered of Joseph, and he told his lad to cut a beam of the right length, but he made it too short. Joseph was troubled. Jesus pulled it out to the right length. 38. He went to school the second time. ‘Say Alpha.’ Jesus: ‘Tell me first what Beta is, and I will tell you what Alpha is.’ The master smote him and died.

      Disconcerting childhood.

    1. To modern ears such language mocking and other Asian mocking may seem novel, but it is actually an old part of the white racist framing of Asian Americans. White English speakers on the West Coast developed this mocking in the mid- to late nineteenth century as their way of making fun of the English-Chinese speech of Chinese workers, as well as of racializing them. An early 1900s ragtime song goes, “Ching, Chong, Oh Mister Ching Chong, You are the king of Chinatown. Ching Chong, I love your sing-song.”2

      This makes me think about how racial stereotypes are a big part of American history. The "Ching Chong" slur against Asian Americans was a way to make them seem less than human. This shows that these harmful attitudes are still around today, and it makes it clear that we need to deal with them.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Therefore, their tool may be useful for stimulating multiple populations using a blue excitatory opsin in neuron A and their tool for red excitation of neuron B… Yet, there are no data presented that showcases their new tool for this purpose

      We agree with the reviewer that in this manuscript we have not experimentally shown the applicability of our system for dual optical stimulation. However, the suppression of blue-light excitation of ZipV/T-IvfChr-expressing neurons strongly suggests that it can be used in experiments exciting populations of neurons, similar to those shown for BiPOLES. We see no theoretical reason why this experiment could not be done if sufficient cell-targeting mechanisms (such as the use of cre-lox or retroAAV) are utilized. We have started several projects pursuing these utilities in the meantime.

      While they do show that red light = excitation and blue light = inhibition, they neither show 1) all-optical on/off modulation of the same cell; nor 2) high-frequency inhibition or excitation (max stim rate of 20hz, which is the same as the BiPOLES paper used for their LC stimulation paradigm; Vierock, as above, Figure 7a-d).

      Regarding point 1, we understand that the reviewer asks whether we have optically excited (with red light) and inhibited (with blue light) the same neurons. If so, Figure 4B1 (optical excitation of ZipT-IvfChr with red light) and Figure 5A (optical inhibition of ZipT-IvfChr with blue light) represent largely the same set of neurons.

      Regarding point 2, we respectfully disagree with the reviewer’s interpretation of Figure 7a-d in Vierock et al. As we understand it, in this part the authors apply a 20 Hz optical stimulation protocol to the LC neurons in vivo. However, there are no data showing that individual neurons follow this stimulation protocol. To be clear, we are not saying that BiPOLES cannot drive 20 Hz APs. Very likely it can. It is based on ChrimsonR, which is capable of doing so (Klapoetke et al., Figure 2). Although we have not shown data for optical stimulation above 20 Hz in this manuscript, our system is based on vfChrimson, which is known to drive APs at 100 Hz and above (Mager et al., Figures 2 and 3).

      they must revise the manuscript to show that their approach is both 1) different in some way when compared to BiPOLES (it is my understanding that they did not do this, as per the supplementary alignment of the BiPOLES sequence and the sequence of the BiPOLES-like construct that they did test) and 2) that the properties that the investigators specifically tailored their construct to have confer some sort of experimental advantage when compared to the existing standard.

      In the latest version of the manuscript, we have compared our ZipV-IvfChr and the BiPOLES construct adapted with vfChrimson (Fig. 2 Suppl 1). The mean photocurrent amplitude of IvfChr in the ZipV-IvfChr construct is ~2.7 x higher than BiPOLES adapted with vfChrimson (14 randomly selected HEK293 cells in each group) (Fig. 2 Suppl 1B). We conducted this experiment in HEK293 cells to ensure accurate voltage-clamping and less biased cell selection. Even adjusting for the smaller photocurrent of vfChrimson vs ChrimsonR, this would still translate to ~1.6 x greater photocurrent with ZipV-IvfChr compared to the original BiPOLES utilizing ChrimsonR. We believe the increased efficiency of excitation is an important aspect of adapting vfChrimson for red-light excitation of neurons.

      Reviewer #2 (Public Review):

      (1) In the Introduction or Discussion, the authors could better motivate the need for a red-shifted actuator that lacks blue crosstalk, by giving some specific examples of how the tool could be productively used, e.g. pairing with another blue-shifted excitatory opsin in a different population, or pairing with a GFP-based fluorescent indicator, e.g. GCaMP. The motivation for the current tool is not obvious to non-experts.

      In the discussion, we now provide examples of potential uses of the tool. For example, one key application is the induction of spike-timing-dependent plasticity with two wavelengths of light, where a blue-light channelrhodopsin such as oChIEF is used to evoke presynaptic release and ZipT-IvfChr is expressed in the postsynaptic neuron. In this situation, the rapid termination of the inhibitory response is critical so that it does not interfere with the induction of LTP or LTD. Other possible experiments include the alternating control of projection neurons and interneurons in cortical areas, and the independent control of neurons of the direct and indirect pathways in the striatum to manipulate behavior.

      (2) Simultaneous excitation and inhibition are not the same as non-excitation. The authors mentioned shunting briefly. Another possible issue is changes in osmotic balance. Activation of a Na+ channel and a Cl- channel will lead to net import of NaCl into the cell, possibly changing osmotic pressure. Please discuss.

      We agree that osmotic, ionic and pH changes in small neuronal structures can disrupt physiology. This is the reason we developed our approach around the fastest channelrhodopsins, so that the channel opening time and the flux of ions through the channels are minimized when brief light pulses are applied. Not only is the flux of protons, sodium ions and calcium ions minimized, but the flux of chloride should be minimal as well (the membrane potential should be close to the chloride reversal potential, hence low ion flow). Our approach should therefore be minimally disruptive compared to most other existing channelrhodopsin-based approaches when short or minimal light pulses are used in conjunction with our tools. This recommendation is included in the updated manuscript.

      (3) The authors showed that in ZipT-IvfChr, orange light drives excitation and blue light does not. But what about simultaneous blue and orange light? Can the blue light overwhelm the effect of the orange light? Since the stated goal is to open the blue part of the spectrum for other applications, one is now worried about "negative" crosstalk. Please discuss and, ideally, characterize this phenomenon.

      We have now performed this experiment. Simultaneous blue (470 nm) and red (635 nm) light stimulation does not produce APs (Fig. 4 Suppl 1A). This suggests that the inhibitory effect of the ACR is more efficient than the excitatory effect of IvfChr, owing to the higher conductance of the ACR. It also re-emphasizes that rapid termination of the ACR effect is critical for minimal disruption of physiology in such a pairing strategy.

      (3.1) Does the use of the new tool require careful balancing of the expression levels of the ZipT and the IvfChr? Does it require careful balancing of blue and orange light intensities?

      As with any optogenetic tool, users should validate the efficacy of the tool in their own system. Our tool relies solely on the balanced expression of the 2A system, the efficiency of the two opsins, and their degradation over the time span of expression. These aspects would be better addressed in future versions of the tool or through improvement of the BiPOLES-type tandem expression in subsequent versions. On the instrumentation side, the light intensity and differential penetration depth require careful consideration. However, this holds true for most optogenetic and fluorescence imaging-based approaches as well. In the current update of the manuscript, we have included further discussion of these aspects.

      (3.2) Also, many opsins show complex and nonlinear responses to dual-wavelength illumination, so each component should be characterized individually under simultaneous blue + orange light.

      We have now performed this experiment (please see our response to point 3).

      (3.3) I was expecting to see photocurrents at different holding potentials as a function of illumination wavelength for the coexpressed construct (i.e. to see at what wavelength it switches from being excitatory to inhibitory); and also to see I-V curves of the photocurrent at blue and orange wavelengths for the co-expressed constructs (i.e. to see the reversal potential under blue excitation). Overall, the patch clamp and spectroscopic characterization of the individual constructs was stronger than that of the combined constructs.

      We have added the I-V curves for the co-expressed construct at different holding potentials for the 470 nm and 635 nm wavelengths. These show the reversal potentials for the two wavelengths that are intended for in vitro and in vivo applications. Performing a similar experiment for a variety of wavelengths would not be as valuable, in part because of the enormous amount of data generated. As we have shown in the study, the response of any channelrhodopsin varies with light duration and light intensity in addition to wavelength and holding potential. The results for each recorded cell could include stimulation at different wavelengths, at different illumination intensities, and with different light durations, in addition to different holding potentials. Not only would the results be highly variable from cell to cell, but there would potentially be hundreds or thousands of combinations to test per cell (e.g., 5 light intensities @ 1, 2.5, 5, 10 and 20 mW/mm<sup>2</sup>, 8 different wavelengths @ 450nm, 475nm, 500nm, 525nm, 550nm, 575nm, 600nm and 625nm, 7 light durations @ 1ms, 5ms, 10ms, 50ms, 100ms, 500ms and 1s, and 6 holding potentials @ -80mV, -70mV, -60mV, -40mV, -20mV and 0mV would result in 1680 stimulation conditions per recorded cell). Technically, the significant lowering of membrane resistance when both IvfChr and ZipACR variants are activated simultaneously would compromise the quality of voltage-clamping even in HEK293 cells with series resistance compensation. We have yet to see any other study that included such an ambitious electrophysiology experiment for channelrhodopsin characterization, likely because such an experiment is not feasible.
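
      The count of 1680 conditions quoted above follows from multiplying the sizes of the four parameter sets (5 × 8 × 7 × 6). As a minimal sketch, the full grid can be enumerated in Python; the variable names are illustrative assumptions, and the values are the ones listed in the response.

```python
from itertools import product

# Parameter values as listed in the response; variable names are illustrative.
intensities_mw_per_mm2 = [1, 2.5, 5, 10, 20]
wavelengths_nm = [450, 475, 500, 525, 550, 575, 600, 625]
durations_ms = [1, 5, 10, 50, 100, 500, 1000]  # 1 s expressed as 1000 ms
holding_potentials_mv = [-80, -70, -60, -40, -20, 0]

# Every combination of intensity, wavelength, duration and holding potential.
conditions = list(
    product(
        intensities_mw_per_mm2,
        wavelengths_nm,
        durations_ms,
        holding_potentials_mv,
    )
)
n_conditions = len(conditions)

if __name__ == "__main__":
    print(f"{n_conditions} stimulation conditions per recorded cell")
```

      Adding even one more value to any parameter set scales the total multiplicatively, which is why a full wavelength-by-potential sweep quickly becomes impractical for a single patched cell.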

      Reviewer #3 (Public Review):

      (1) The enhanced vf-Chrimson could potentially be a highlight of the manuscript, serving broader applications. Yet, gauging the overall improvements of ivf-Chrimson in comparison to other Chrimson variants remains intricate due to several reasons. First, photocurrents from ivf-Chrimson seem smaller than those from C-Chrimson (Supplemental Figure 3), and a direct comparison with standard vf-Chrimson is absent.

      We appreciate the reviewer’s positive view of our modified variant. We did not emphasize this particular modification as it was identical to our previously published modification and similar to those previously published by others (CsChrimson and C1Chrimson). In all these cases, improved membrane expression was consistently detected. We believe that the expression data and our comparison of C-Chrimson and IvfChr are sufficient to justify the improved membrane expression and function.

      Second, while membrane expression of ivf-Chrimson appears enhanced in provided brightfield recordings, the quantitative analysis would necessitate confocal microscopy and a membrane marker (Supplemental Figure)

      We have now quantified the results with a membrane-targeted palmitoylated mCherry using confocal microscopy, shown in Fig. 2 Suppl 1A. We measured the Pearson correlation coefficient of the mCherry signal with the EGFP or Citrine signal for the 6 constructs (vfChrimson, vfChrimson with trafficking sequence, vfChrimson with N-terminal signaling peptide from oChIEF (C-vfChrimson), vfChrimson with trafficking sequence and N-terminal signaling peptide from oChIEF (IvfChr), BiPOLES with EGFP or citrine and vfChrimson), and the results were consistent with the prior results obtained using epifluorescence microscopy.
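
      As a rough illustration of the colocalization metric named above, the sketch below computes a Pearson correlation coefficient between two intensity channels. It is not the authors' analysis pipeline: the function name and the pixel values are made-up assumptions, and a real analysis would run over segmented confocal image data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (>= 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant channel")
    return cov / sqrt(var_x * var_y)


# Hypothetical pixel intensities along a line crossing the cell membrane.
membrane_marker = [10, 200, 220, 15, 180, 12]    # e.g. membrane mCherry
opsin_at_membrane = [12, 190, 230, 20, 170, 10]  # opsin tag tracking the marker
opsin_diffuse = [100, 110, 95, 105, 98, 102]     # opsin tag not tracking it

r_membrane = pearson_r(membrane_marker, opsin_at_membrane)
r_diffuse = pearson_r(membrane_marker, opsin_diffuse)

if __name__ == "__main__":
    print(f"membrane-localized: r = {r_membrane:.3f}")
    print(f"diffuse:            r = {r_diffuse:.3f}")
```

      A coefficient close to 1 indicates that the opsin fluorescence follows the membrane marker, whereas a value near zero indicates little spatial correspondence between the two channels.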

      (2) Finally, other N-terminal modified Chrimson variants, like CsChrimson by Klapoetke et al. in 2014 and C1Chrimson by Oda et al. in 2018, have been generated. Comparing ivf-Chrimson to vf-CsChrimson or vf-C1Chrimson would be important to evaluate the benefits of the applied N-terminal modification.

      Our development of IvfChrimson is similar to the approach of vf-CsChrimson and identical to that of vf-C1Chrimson and we do not claim these modifications to be unique or superior. However, we have developed our design independently of these other studies and we have more extensive functional comparison and characterization data of our IvfChrimson variant than the other studies.

      (2.1) The action spectra of ZipACR suggest peak absorption of ZipACR WT and its mutant at 525 - 550 nm (Fig. 3). This is even further red-shifted than previously reported by Govorunova et al. Further action spectra recordings differ for all constructs between recordings initiated with blue or red light (Supplementary Fig. 5). This discrepancy is unexpected and should be discussed.

      We thank the reviewer for the comment. This was a mistake in the traces used for the figure. The example traces showed the spectral response measured in the 400 nm to 650 nm order instead of the 650 nm to 400 nm order shown in the spectral data. This has now been corrected.

      Additionally, the representative photocurrents of Zip(151V) in Fig. 3D1 do not align with the corresponding action spectrum in Fig. 3D2 as they show maximal photocurrents for 400 nm excitation.

      Please see the point above.

      (3) The authors introduce two different bicistronic expression cassettes-ZipT-IvfChR and ZipV-IvfChR-without providing clear guidelines on their conditions of use. Although the authors assert that ZipT is slower and further red-shifted than ZipV, the differences in the data for both ACR mutants are small and the benefits of the different final constructs should be explained.

      In our testing in neurons, ZipT produced fewer ‘escaped’ spikes after the termination of the light pulses. However, this depends on the membrane properties of the cells, such as capacitance and resistance. ZipV has a faster termination time and may be necessary in some situations because it reduces the disruption of physiological processes.

      We have now included this discussion in our updated manuscript.

      (4) The ZipT/V-IvfChRs are designed as bicistronic constructs; yet, disparities in membrane trafficking and protein degradation between the two channels could lead to divergences in blue and red light photoresponses. For future applicants, understanding the extent of expression ratio variations across cells using the presented expression cassettes could be of significance and should be discussed.

      We now have included this discussion in our responses above.

      Reviewer #1 (Recommendations For The Authors):

      (1) The Figure 1a mV cartoon traces for chloride are confusing. The chloride currents are depolarizing, not hyperpolarizing. As noted by the authors, these channels largely generate AP blockade through shunting inhibition (division), not hyperpolarization (subtraction).

      The figure has been corrected.

      (2) Figure 2A does not show where the light is applied. Why are some of the bars blue and some of them not filled?

      This has been corrected.

      (3) Figure 2C1 does not show where the light is applied. There should be an inset to detail the blue-light-cessation-evoked AP. Also doesn't give the holding potential.

      The requested details are added.

      (4) Figure 2C2 inset is described as showing that "Light-induced currents with 470 nm illumination were initially outward but turned inward immediately following light offset." Is that correct? It looks to me like the current turns inward about half-way through the light pulse and then becomes even stronger after the light turns off. That is also consistent with the CC traces, which appear to show a transition toward depolarization during the light pulse before the AP initiation at light offset.

      Yes, the reviewer's observation is correct. There are blue light-induced outward and inward current peaks at the onset and offset of the light. Accordingly, we have modified the phrasing for Fig. 2C2.

      (5) Figure 3D1 shows that Zip(151V) has a peak current at 400nm, with a steady increase in current from red to blue, however, this is not the case in the summary data in 3D2. It's also not shown in Supplementary Figure 5B. What's going on?

      We apologize for the error in the version of the figure included with the first submission. The example traces from 400 nm -> 650 nm were incorrectly included in the figure, whereas the 650 nm -> 400 nm example traces should have been included. This has been corrected.

      (6) Figure 3D1 has no time scale.

      It has now been included.

      (7) Figure 3E1 should read "Transduced" and not "Transfected"

      This has been corrected.

      (8) IvfChr fidelity drops off dramatically at 20hz...down to 50% efficiency of generating APs. This is described in the legend as "high frequency". Maybe the cart came before the horse in this figure...as it looks like in panel C that using less light power density improves fidelity in the dual opsin configuration with red light stimulation...why not use that power for the characterization? Did you try any higher frequencies? Or longer pulse widths? This is an important characterization to inform further use of the tool. This shortcoming isn't a cell-intrinsic limitation, as the 470nm stim with IVfChr was 100% successful at both 10hz and 20hz.

      It is known that red but not blue light pulses induce desensitization (optical fatigue) in red-shifted ChR variants. Indeed, one can reinstate the response to red light by giving violet-blue light pulses (Fig. 4 Suppl 2). We think this is the reason that the 470 nm stimulation was more effective in inducing APs in cells expressing IvfChR. Higher light intensities induce greater desensitization, but are preferred for faster opening of channels and depolarization of neurons. This can explain why, in some situations, lower light intensities were more effective in producing APs when pulse trains were used. We have recordings from cells firing APs at 40 Hz (not included). All these cells had high expression levels of the opsin.

      (9) Figure 4D: why use 100ms pulse width? How do you know that this isn't causing depol block? Or some of the nefarious concerns that are raised in the discussion, such as "...disrupt[ion of] normal neuronal physiology and signal processing that occurs in millisecond time scale"?

      We used 100ms pulse duration to follow the published protocol that this experiment is based on (Lin et al., 2013, Nature Neuroscience). 

      (10) Figure 4E-bottom: What is the blue peak at light onset? Is the tool driving early activation before silencing?

      There seems to be an early, sharp and brief activation by blue light. We do not know the definite cause of this, but we speculate that it is driven by blue-light activation of ZipACR and not the IvfChr portion of the construct. The reason is that such a sharp rise is absent when only IvfChr is expressed (Fig. 4E, upper panel). A soma-targeting motif tethered to channelrhodopsins is known to result in preferential expression of the channels close to the soma, but it does not exclude expression in axonal and dendritic compartments, especially when animals are allowed to recover for a long period of time after viral injection. We believe that ZipACR at axonal terminals, where the chloride concentration is high, can still cause blue-light-evoked depolarization and transmitter release. We observed this phenomenon in two mice in their first trial. The data for individual trials for each mouse are included in a supplementary table.

      (11) Figure 4G: Earlier in this same figure (B2, C), 470nm light was more effective at stimulating IvfChr than 635nm light. Is it unexpected that 638nm light would in this in vivo context be more effective at driving IvfChr responses than 450 nm light (at least as reflected by the AUC measurements)? Does this reflect fiber placement and light penetration/scattering?

      The spectral peaks of Chrimson-based variants, including vfChrimson, are all centered around 600 nm; under 635 / 638 nm light, the photoresponse amplitude declines, channel onset slows, and the channels suffer greater desensitization. In isolated preparations, where light penetration is similar between 635 / 638 nm and 470 nm, 470 nm responses can outperform 635 / 638 nm responses because of their lack of desensitization and higher consistency. This is also a strong reason why we developed our current approach. In the in vivo preparation shown in Fig. 4D-G, the much higher tissue penetration of 638 nm light, due to reduced absorption and scattering, can offset the stronger response of IvfChr to 450 nm light.

      (12) In the methods, it is noted that different viral batches appear to generate different levels of neuronal toxicity. If that is the case, how did you differentiate between true differences between constructs vs. differential cell health effects?

      For figure 4D-F (whisker movement), we determined virus toxicity using NeuN staining. In slice recordings, we used the electrophysiological property of the neurons to assess their health. For this manuscript, we had one batch of virus that produced toxicity. We did not include any data from this batch.

      Reviewer #2 (Recommendations For The Authors):

      ● Define AUC on first use.

      It is now defined.

      ● Figure 3C2: Please explain how the photocurrents were normalized. As presented, it looks like under strong orange light, the ZipACR has higher photocurrent than the ivfChr.

      This is due to the fact that vfChrimson and other Chrimson-based variants do not fully recover in the dark after 590 nm stimulation. We tested IvfChrimson both with and without a 405 nm reconditioning light pulse, and we could consistently reach a greater ‘maximal’ response from the same cell after 405 nm reconditioning (see Fig. 4 Suppl 2). We therefore normalized the response to the maximal recorded response of the cell, often achieved with 10 or 20 mW/mm<sup>2</sup> 590 nm stimulation after 405 nm reconditioning. We understand this can be confusing and have now replaced the light-intensity response in Fig. 3C2 with the one obtained with 405 nm reconditioning, which is easier for readers to interpret.

      ● P. 3: "As expected, blue light pulses induce transient membrane suppression..." Unclear what "suppression" means. Shunting? Hyperpolarization?

      We rephrased this to “As expected, blue light pulses transiently suppress APs…”

      ● P. 3: "illumination at 470 nm and 590 nm wavelengths led to similar amounts of courtship song (110.1 ± 12.8 and 78.5 ± 11.6, n = 16-17, respectively)". What are the units of "courtship song"?

      The unit for courtship song is the number of pulses per 10 seconds. This has been clarified in the figure.

      ● P. 5: The quantification of photocurrent in terms of pA/pF/A.U. is non-standard. I understand the impetus to normalize by expression to give something proportional to per-molecule conductance, but a user cares about overall photocurrent. Please also give the real photocurrents, either pA or pA/pF.

      We have provided the real photocurrent in pA or pA/pF where scientifically appropriate. To avoid selection and experimenter bias in our data, we did not set criteria for eliminating cells based on fluorescence intensity or photocurrent amplitude. As a result, responses from the same construct can vary by up to 20-fold in many experiments. We do not believe that averaging absolute photocurrent amplitudes would be justified, given the resulting imbalance in weighting. We acknowledge that not selecting or eliminating data points introduces higher noise in recordings with smaller responses, but this is preferable to the selection or experimenter bias that would likely be introduced otherwise.

      ● Please quote illumination intensities wherever possible.

      ● P. 7: why was the red light crosstalk into Zip(151T) tested at 635 nm instead of 590 nm? Isn't the relevant parameter 590 nm, since that will be used for the excitatory opsin?

      In all our characterizations of the constructs using slice electrophysiology recordings, we used 635nm instead of 590nm. The reason is that compared to 590nm wavelength, at 635nm the photocurrent for Zip(151T) and Zip(151V) is significantly reduced (Fig. 3D1,D2).

      ● P. 10: "we examined the power at which responses to 470 nm and 635 nm lights induce APs in neurons expressing ZipT-IvfChr, ZipV-IvfChr, or IvfChr", but the preceding sentence says you didn't test the ZipT-IvfChr. This is confusing, please clarify.

      The previous paragraph refers to the photocurrent recordings in HEK293 cells where our fast LED based illumination system is limited to 590 nm light, whereas the subsequent paragraph refers to the brain slice neuronal recordings. We have now emphasized the difference of the experiments in the rewrite.

      ● Fig. 4B1, top: Why don't the blue traces return to the same baseline after the stimulus epochs?

      We observed this shift in baseline (~4 mV more depolarized) in cells expressing IvfChR (or vfChR) only with blue light stimulation. This was observed in neurons recorded in CA1 as well (data not shown). There was no such change following red light stimulation (Fig. 4B1). Therefore, this should not affect the applicability of our construct. The original paper introducing vfChR did not test the responses of its constructs to blue light. There could be another photocycle state that is activated more strongly by 470 nm than by 590 nm light and has a slow off-rate, but this is only speculation on our part. It must be noted that we did not observe such a phenomenon in cells expressing ChrimsonR (Fig. 1 Suppl 1C).

      ● Fig. S3B, right: The two colors are barely distinguishable on the graph. Consider more distinct colors and/or different symbols.

      It has been changed accordingly.

      ● P. 15: "However, we do not recommend the use of orange light pulses, as we observed a significant photocurrent in this wavelength." Not clear what this is referring to. Which construct? Under which circumstances shouldn't one use orange light pulses? Where's the data showing this?

      This refers to Fig. 3D1,D2 and Fig. 4 Suppl 2, which show a normalized photocurrent of ~40-50% at 590 nm. The figure references for these data have now been added to the text.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Audio et al. measured cerebral blood volume (CBV) across cortical areas and layers using high-resolution MRI with contrast agents in non-human primates. While the non-invasive CBV MRI methodology is often used to enhance fMRI sensitivity in NHPs, its application for baseline CBV measurement is rare due to the complexities of susceptibility contrast mechanisms. The authors determined the number of large vessels and the areal and laminar variations of CBV in NHP and compared those with various other metrics.

      Strengths:

      Non-invasive mapping of relative cerebral blood volume is novel for non-human primates. A key finding was the observation of variations in CBV across regions; primary sensory cortices had high CBV, whereas other higher areas had low CBV. The measured CBV values correlated with previously reported neuronal and receptor densities.

      Weaknesses:

      A weakness of this manuscript is that the quantification of CBV with postprocessing approaches to remove susceptibility effects from pial and penetrating vessels, as well as orientation dependency, is not fully validated, especially on a laminar scale. Further specific comments follow.

      We suspect that the comment regarding the lack of validation at the laminar level stems from an error made by the corresponding author in the original bioRxiv submission (v1, May 17th https://www.biorxiv.org/content/10.1101/2024.05.16.594068v1?versioned=true), where Figure 3, which contains the laminar validation, was lost during PDF conversion. After submission to eLife, this mistake was quickly identified, and a corrected manuscript was re-uploaded to bioRxiv (v2, June 5th, https://doi.org/10.1101/2024.05.16.594068). Although we informed the eLife staff about the update, it appears that the revised manuscript may not have reached reviewer #1 in time. We sincerely apologize for any confusion or inconvenience this may have caused.

      (1) Baseline CBV indices were determined using contrast agent-enhanced MRI (deltaR2*). Although this approach is suitable for areal comparisons, its application on a laminar scale has not been validated in the literature or in this study. By comparing with histological vascular information of V1, the authors attempted to validate their approach. However, the generalization of their method is questionable. The main issue is whether the large vessel contribution is minimized by processing approaches properly in various cortical areas (such as clusters 1-3 in Figure 5). It would be beneficial to compare deltaR2* with deltaR2 induced by contrast agents in a few selected slices, as deltaR2 is supposed to be sensitive to microvessels, not macrovessels. Please discuss this issue.

      The requested validation is presented in Figure 3F, which compares our deltaR2* measurements with previous invasive estimates of large-vessel, capillary and cytochrome oxidase (CO) levels in V1 (Weber et al., 2008; doi.org/10.1093/cercor/bhm259). Our deltaR2* values show a stronger correspondence with microvascularity and CO levels than with large vessels. Moreover, Figure 3D illustrates relative differences between V1 and V2, which closely align with the relative vascular volume differences reported by Zheng et al., 1991. It is important to note that Weber and colleagues averaged across V2-V5 due to similar vascularity across these areas. In our material, we also observed similar vascularity in these areas, though V5 (e.g., MT) has slightly denser vascularity, in agreement with reports of CO staining.

      Additionally, we report similar GM/WM vascular density, and high vascular density in primary sensory areas. Unfortunately, available ground-truth data on vascularity does not provide further (general) validation data for laminar vasculature in macaques (such as those in cluster 1-3; Fig. 5). That said, we have provided substantial evidence linking whole-brain vascular measures with variations in neuron (for data distribution, see Supp. Fig. 6F) and receptor densities, which we believe provides strong support for our approach.

      We would like to clarify that the authors do not assert that gradient-echo MRI is exclusively sensitive to microvessels and not macrovessels. This is not stated anywhere in the manuscript. If any sentence appears misleading, please let us know, and we will consider revising it. It is well-established that large vessels contribute to ΔR2* (Ogawa et al., 1993; Boxerman et al., 1995), and this is clearly stated in the manuscript (introduction, methods, results and discussion) and demonstrated in Figures 2A, B, and Supp. Figs. 2, 3, and 4. The primary concern, as the reviewer also noted, is whether we have sufficiently minimized the contribution of large vessels in our parcellated data analysis.

      At the parcellated level, we used the median value to avoid skewness in the data distribution, which primarily arises from large vessels, as regions near these vessels exhibit higher ΔR2*. The skewness of ΔR2* is also visible in Figure 1F, G. While this approach mitigates the large-small vessel issue, it does not entirely resolve it, as a slight linear increase toward the cortical surface remains (in all parcels). This is likely due to our inability to delineate all penetrating vessels, as shown in Figure 2E, and because the contrast agent accumulates toward the superficial layers, where blood originates from and returns to the pial surface. To mitigate this issue, we detrended the parcellated profiles across layers, obtaining results similar to the ground-truth measures of vascularity in V1-V5 and CO histology in V1.
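
      The two steps described above (a per-layer parcel median followed by removal of a linear trend across layers) can be sketched as follows. This is only a minimal illustration under stated assumptions: the function names, the choice to keep the profile mean after detrending, and the example numbers are hypothetical and are not taken from the authors' pipeline.

```python
from statistics import median


def parcel_layer_medians(values_by_layer):
    """Median deltaR2* per layer for one parcel.

    values_by_layer: one list of vertex values per equivolumetric layer.
    The median limits the influence of the long tail caused by large vessels.
    """
    return [median(layer_values) for layer_values in values_by_layer]


def detrend_linear(profile):
    """Remove the least-squares line fitted across layer index.

    Assumption: the profile mean is preserved, only the slope is removed.
    """
    n = len(profile)
    xs = list(range(n))
    x_mean = sum(xs) / n
    y_mean = sum(profile) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, profile))
    slope = sxy / sxx
    return [y - slope * (x - x_mean) for x, y in zip(xs, profile)]


# Hypothetical parcel with three layers; the value 50 mimics a voxel near a large vessel.
example_parcel = [[9.0, 10.0, 11.0, 50.0], [7.0, 8.0, 9.0], [5.0, 6.0, 7.0]]
layer_profile = parcel_layer_medians(example_parcel)
detrended_profile = detrend_linear(layer_profile)
```

      Using the median rather than the mean keeps a single vessel-adjacent value from dominating a layer, and the detrending step removes only the surface-to-white-matter slope, leaving any non-linear laminar structure in place.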

      (2) High-resolution MRI with a critical sampling frequency estimated from previous studies (Weber 2008, Zheng 1991) was performed to separate penetrating vessels, which is considered one of the major advancements in this study. However, this approach is still insufficient to accurately identify the number of vessels due to the blooming effects of susceptibility and insufficient spatial resolution. There was no detailed description of the detection criteria. More importantly, the number of observable penetrating vessels is dependent on imaging parameters and the dose of the contrast agent. If imaging slices were obtained in parallel to the cortex with higher in-plane resolution, it would likely improve the detection of penetrating vessels. Using higher-field MRI would further enhance the detection of penetrating vessels. Therefore, the reported value is only applicable to the experimental and processing conditions used in this study. Detailed selection criteria should be mentioned, and all potential pitfalls should be discussed.

      We believe that Figure 2 represents a significant conceptual and data analysis advancement in the field of vascular imaging. To the best of our knowledge, this is the first MRI study attempting to assess vessel density across cortical layers and compare the number of vessels to the known ground-truth. While we do not claim to have achieved a perfect solution (as shown in Figure 2), we offer a robust challenge to the imaging community by introducing this novel benchmarking approach. Our hope is that this conceptual framework will inspire the MR imaging community to tackle this challenge.

      Regarding imaging parameters, TE did not have much effect on our results, with a slight effect observed in the superficial layers due to the presence of large pial vessels (blooming effect; Fig. 2C). This also suggests that similar results could be achieved by changing the contrast agent dose, though there are, of course, CNR requirements and limitations at either end of the spectrum.

      We completely agree with the reviewer that spatial resolution is critical in resolving the arterio-venous networks, and we have dedicated significant attention to this topic in the introduction, results and discussion sections. We also agree with the reviewer that if imaging slices were obtained in parallel to the cortex with higher in-plane resolution, it would improve the detection of vessels. However, while this approach is ideal for counting vessels in a single plane and isolated region of cortex, it is less suited to the surface mapping of vessels, which is the focus of our study.

      Regarding the exclusion of vessels, based on visual comparison of vessels in volume space, Frangi-filter detection of vessels in volume space, and surface detection of vessels, we found no evidence to develop additional exclusion criteria (Supp. Fig. 3). On the contrary, we identified a number of false negatives in both the surface maps and volume maps. Notable exceptions to this rule seemed to occur at premotor areas F2 and F3 (Matelli et al., 1984; Patterns of cytochrome oxidase activity in the frontal agranular cortex of the macaque monkey). In these regions, we observed peculiar “pockets” of signal drop-out in equivolumetric layers 4-5. It is unclear what these signal-voids represent but it is interesting to note that these cortical areas F1-F5 were originally delineated by distinct CO+ positive large cells (Matelli et al., 1984).

      (3) Attempts to obtain pial vascular structures were made (Figure 2). As mentioned in this manuscript, the blooming effect of susceptibility contrasts is problematic. In the MRI community, T1-based Gd contrast agents have been used for mapping large vasculature, which is a better approach for obtaining pial vascular structures. Alternatively, computer tomography with a blood contrast agent can be used for mapping blood vasculature noninvasively. This issue should be discussed.

      We agree with the reviewer that T1-based contrast agents may offer more precise direct localization of large vessels in pial vasculature. However, the primary focus of our study was not on visualizing pial vascular structures, but rather on measuring vascular volume across cortical layers. For this purpose, we opted to use ferumoxytol, which provides superior T2*-contrast and about ten times longer plasma half-life compared to gadolinium. While we anticipated artifacts from the pial network, we developed a novel method to indirectly map these long-distance susceptibility artifacts arising from large vessels onto the cortical surface (Fig. 2A). If the goal were to specifically visualize pial vessels, we applaud the high-resolution TOF angiography developed for direct vessel visualization (Bollmann et al., 2022; https://doi.org/10.7554/eLife.71186).

      Changes in text:

      “4.1 Methodological considerations - vessel density informed MRI

      While the pial vessels can be directly visualized using high-resolution time-of-flight MRI (Bollmann et al., 2022), and computed tomography (Starosolski et al., 2015), imaging of the dense vascularity within the large and highly convoluted primate gray matter presents other formidable challenges. Here, we used a combination of ferumoxytol contrast agent and cortical layer resolution 3D gradient-echo MRI to map cerebrovascular architecture in macaque monkeys. These methods allowed us to indirectly delineate large vessels and indirectly estimate translaminar variations in cortical microvasculature.”

      (4) Since baseline R2* is related to baseline R2, vascular volume, iron content, and susceptibility gradients, it is difficult to correlate it with physiological parameters. Baseline R2* is also sensitive to imaging parameters; higher spatial resolution tends to result in lower R2* values (closer to the R2 value). Therefore, baseline R2* findings need to be emphasized.

      We agree with the reviewer's comment on the complexity of correlating baseline R2* with vasculature, given its sensitivity to multiple factors such as venous oxygenation, iron content, and imaging parameters such as image resolution. While our study focuses on vascular measurements, one could also highlight iron’s role in brain energy metabolism. Deoxygenated blood affects R2*, iron in oligodendrocytes supports myelination and neuronal signaling, and iron’s role in cytochrome c oxidase during electron transport impacts mitochondrial energy production. These metabolic factors collectively affect baseline R2* and link it to vasculature. Though quantitative susceptibility mapping (QSM) could help differentiate these different factors, it is beyond the scope of this study.

      (5) CBV-weighted deltaR2* is correlated with various other metrics (cytoarchitectural parcellation, myelin/receptor density, cortical thickness, CO, cell-type specificity, etc.). While testing the correlation between deltaR2* and these other metrics may be acceptable as an exploratory analysis, it is challenging for readers to discern a causal relationship between them. A critical question is whether CBV-weighted deltaR2* can provide insights into other metrics in diseased or abnormal brain states. If this is the case, then high-resolution deltaR2* will be useful. Please comment on this possibility.

      We agree with the reviewer that correlating deltaR2* with other metrics, such as myelin and cortical thickness, receptors and interneuron types, remains exploratory. Establishing causal relationships requires advanced multivariate analysis across cortical layers, but mapping histological stains to cortical layers is still under development. While this exploratory approach is promising, the ability to apply these insights to diseased or abnormal brain states is not yet clear. Layer-specific analysis of vasculature and function in disease is a future goal, and ongoing work aims to expand this line of inquiry. For now, while high-resolution deltaR2* may indeed offer diagnostic potential, we prefer to refrain from overstating its clinical utility at this stage. We agree that multimodal studies integrating neuroanatomy, function, and vascular metrics will be valuable for deeper insights into brain abnormalities.

      Changes in text:

      “4.3 The vascular network architecture is intricately connected to the neuroanatomical organization within cerebral cortex

      …To comprehensively understand the factors contributing to the vascular organization of the brain, experimental disentanglement through multivariate analysis of laminar cell types and receptor densities is needed (Hayashi et al., 2021, Froudist-Walsh et al., 2023).”

      (6) There is no discussion about the deltaR2* difference across subcortical areas (Figure 1). This finding is intriguing and warrants a thorough discussion in the context of the cortical findings.

      We thank the reviewer for this comment. We have expanded discussion on subcortical structures:

      Section 4.3, 1st paragraph:

      “In the cerebral cortex, neurons account for a significant portion (≈80-90%) of energy demand, with most of this energy allocated to signaling (≈80%) and maintaining membrane resting potentials (≈20%) (Attwell and Laughlin, 2001; Howarth et al., 2012). Since firing frequency is modulatory and the neural networks utilize distributed coding, the maintenance of resting-state membrane potential determines the minimal energy budget and the lower-limit for cerebral perfusion. Based on neuronal variability and energy dedicated to maintaining surface potential, this suggests an approximate (4 × 20% ≈) 80% variation in CBF and a resultant 25% variation in CBV across the cortex, in line with Grubb's law (CBV = 0.80 × CBF<sup>0.38</sup>) (Grubb et al., 1974). In the cerebellar cortex, neuron density is higher, and the resting potentials are thought to account for more than 50% of energy usage (Howarth et al., 2012), aligning with its higher vascular volume compared to the cerebral cortex (Fig. 1F). However, this is a simplified estimation, and a more comprehensive assessment would need to account for an aggregate of biophysical factors such as…”
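
      The arithmetic behind the quoted 80% and 25% figures can be checked with a few lines. This is a minimal sketch that assumes only the power law stated above (CBV = 0.80 × CBF raised to 0.38); the function names are hypothetical.

```python
def grubb_cbv(cbf, scale=0.80, exponent=0.38):
    """CBV predicted from CBF by the power law quoted in the text."""
    return scale * cbf ** exponent


def relative_cbv_change(relative_cbf_change, exponent=0.38):
    """Fractional CBV change implied by a fractional CBF change.

    The scale factor cancels in the ratio, so only the exponent matters.
    """
    return (1.0 + relative_cbf_change) ** exponent - 1.0


# 4 x 20% variation in CBF, as stated in the quoted paragraph.
cbf_variation = 4 * 0.20
cbv_variation = relative_cbv_change(cbf_variation)
print(f"CBF variation: {cbf_variation:.0%}, implied CBV variation: {cbv_variation:.1%}")
```

      Because the scale factor drops out of the ratio, the implied CBV variation depends only on the exponent, which is why an 80% CBF range maps to roughly a 25% CBV range.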

      Section 4.3, 4th paragraph:

      “When viewed in terms of information flow, CBV appears to decrease along the canonical circuit pathway (e.g., L4→L2/3→L5) in the primary visual cortex (Douglas and Martin, 2007) and as one ascends the hierarchy (e.g., V1→V2→V3&4→MT→7A) from primary sensory areas (Fig. 3F, Supp. Fig. 8) (Felleman and Van Essen, 1991, Markov et al., 2014). A similar pattern is observed in the auditory hierarchy, where the inferior colliculus, an early processing hub, exhibits the highest vascular volume, followed by a gradual reduction along cortical auditory ‘where’ and ‘what’ pathways (Fig. 1F, Fig. 3B).”

      (7) Figure 3 is missing. Several statements in the manuscript require statistics (e.g., bimodality in Figure 2D, Figure 3F).

      We apologize to the reviewer for the absence of Figure 3 in the initial submission.

      As for statistical testing of bimodality, we respectfully disagree and feel that this would not add much value to the manuscript. We think a descriptive, rather than rigorous, approach is sufficient in this context.

      Reviewer #2 (Public review):

      Summary:

      This manuscript presents a new approach for non-invasive, MRI-based measurements of cerebral blood volume (CBV). Here, the authors use ferumoxytol, a high-contrast agent, and apply specific sequences to infer CBV. The authors then move to statistically compare measured regional CBV with the known distribution of different types of neurons, markers of metabolic load, and others. While the presented methodology captures an estimated 30% of the vasculature, the authors corroborated previous findings regarding the lack of vascular compartmentalization around functional neuronal units in the primary visual cortex.

      Strengths:

      Non-invasive methodology geared to map vascular properties in vivo.

      Implementation of a highly sensitive approach for measuring blood volume.

      Ability to map vascular structural and functional vascular metrics to other types of published data.

      Weaknesses:

      The key issue here is the underlying assumption about the appropriate spatial sampling frequency needed to capture the architecture of the brain vasculature. Namely, ~7 penetrating vessels / mm2 as derived from Weber et al 2008 (Cer Cor). The cited work begins by characterizing the spacing of penetrating arteries and ascending veins using a vascular cast of 7 monkeys (Macaca mulatta, same as in the current paper). The ~7 penetrating vessels / mm2 are computed by dividing the total number of identified vessels by the area imaged. The problem here is that all measurements were made in a "non-volumetric" manner and only in V1. Extrapolating from here to the entire brain seems like an over-assumption, particularly given the region-dependent heterogeneity that the current paper reports.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      - For broader readership, it would be beneficial to provide a guide on how to interpret baseline R2* versus ΔR2*.

      The text was edited as follows:

      “…For quantitative assessment, R<sub>2</sub>* values were estimated from multi-echo gradient-echo images acquired both before and after the administration of ferumoxytol contrast agent (Table 1). Subsequently, the baseline R<sub>2</sub>* and ΔR<sub>2</sub>*, an indirect proxy measure of CBV (Boxerman et al., 1995), volume maps for each subject were mapped onto the twelve native equivolumetric layers (ELs) (Fig. 1C). Each vertex was then corrected for the orientation of the cortical normal relative to the B<sub>0</sub> direction (Supp. Fig. 1). Surface maps for each subject were registered onto a Mac25Rhesus average surface using cortical curvature landmarks and then averaged across the subjects (Fig. 1D, E). Around cortical midthickness, the distribution of R<sub>2</sub>*, an aggregate measure for ferritin-bound iron, myelin content and venous oxygenation levels (Langkammer et al., 2012), resembled the spatial pattern of ΔR<sub>2</sub>* vascular volume. However, across cortical layers, these measures exhibited reversed patterns: R<sub>2</sub>* increased toward the white matter surface, whereas ΔR<sub>2</sub>* decreased (Fig. 1E, G).”
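
      For readers less familiar with the two quantities, the relationship between baseline R2* and ΔR2* can be illustrated with a small sketch. It assumes a mono-exponential signal decay, S(TE) = S0 · exp(−TE · R2*), fitted by a log-linear least-squares regression; the fitting procedure, echo times and signal values used here are hypothetical and are not taken from the manuscript.

```python
import math


def fit_r2star(echo_times_s, signals):
    """Estimate R2* (in 1/s) from multi-echo signals.

    Assumes S(TE) = S0 * exp(-TE * R2*), so log(S) is linear in TE
    and R2* is the negative slope of the least-squares line.
    """
    logs = [math.log(s) for s in signals]
    n = len(echo_times_s)
    t_mean = sum(echo_times_s) / n
    l_mean = sum(logs) / n
    stt = sum((t - t_mean) ** 2 for t in echo_times_s)
    stl = sum((t - t_mean) * (l - l_mean) for t, l in zip(echo_times_s, logs))
    return -stl / stt


def delta_r2star(echo_times_s, pre_signals, post_signals):
    """Contrast-induced change in R2*: post-contrast minus baseline."""
    return fit_r2star(echo_times_s, post_signals) - fit_r2star(echo_times_s, pre_signals)


# Hypothetical echo times (seconds) and noiseless synthetic signals for one voxel.
echo_times = [0.005, 0.010, 0.015, 0.020]
baseline = [100.0 * math.exp(-t * 30.0) for t in echo_times]       # R2* = 30 1/s
post_contrast = [100.0 * math.exp(-t * 80.0) for t in echo_times]  # R2* = 80 1/s
example_delta = delta_r2star(echo_times, baseline, post_contrast)
```

      In this picture, baseline R2* reflects tissue properties present before injection, whereas ΔR2* isolates the additional signal decay introduced by the intravascular contrast agent, which is why it serves as the CBV-weighted measure.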

      - The legends in Figure 1 describe green/cyan arrows, which are not visible in the figure itself.

      We thank the reviewer for noting this discrepancy. The reference to green/cyan arrows was removed from the Figure 1 legend.

      - There are typos in Section 3.3: "(Figure 4A, E)" and "(cluster 3; Figure 3)" should be corrected to Figure 5.

      We thank the reviewer for noting this error. The references to the Figures were corrected.

      Reviewer #2 (Recommendations for the authors):

      The work is elegantly presented and very easy to follow. The figures and the data presented there are compelling and well-organized. I have enjoyed reading the paper, despite my disagreement with the validity of the methodology presented.

      Validation against MRA methods (high resolution needed here, Bolan et al 2006, cited also by the authors). Certainly, that work used a much higher magnetic field. This could be done through collaboration if such a magnet is not available. In my humble opinion, the current arguments provided in the paper as validation fall short in convincing future readers. Other TOF approaches might be better suited (in combination with line scanning or single plane sequences) for the 3T used in this work.

      We appreciate the reviewer’s suggestion regarding time-of-flight (TOF) angiography at ultra-high magnetic fields, such as 9.4T, for improved visualization of fast-flowing blood in arterial vessels, as elegantly demonstrated in Bolan et al., 2006. However, our focus was on mapping vasculature across cortical layers, and TOF is not optimal for imaging slow capillary blood inflow. To enhance CNR at the capillary level as well, we used the ferumoxytol contrast agent to create quantitative CBV-weighted cortical layer maps (Boxerman et al., 1995).

      We are open to collaborative opportunities to revisit this work using ultra-high magnetic field strengths and more detailed neuroanatomical ground-truth measures. However, the recommended line scanning or single-plane sequences, at least on first impression, seem inadequate for whole-brain coverage and cortical surface mapping.

      Some of the methodology can be made more accessible to non-MRI readers. For example, a more elaborate explanation of R2* and ΔR2 could benefit future readers.

      Elaborated as requested (see above reply).

      A more detailed discussion of the limitations of the methodology could also be beneficial here. Explain the potential implications of under-sampling denser vascular areas (i.e. with potentially more than 7 penetrating vessels per mm2).

      V1, with its highest neuronal density, likely also has the highest feeding/draining vessel density. Based on this, we hypothesized that a 0.23 mm isotropic image resolution would sufficiently capture cortical arterio-venous networks, but we did not achieve the expected detection of 7 penetrating vessels per mm<sup>2</sup>. Consequently, we refrained from quantifying vessel density in other areas, albeit we did report the total vessel count.

      This under-sampling likely biases our ΔR2* estimates, skewing them toward larger vessels. To address this, we used median parcel values to avoid over-representing large vessels (the long tail in the ΔR2* parcel data distribution represents large vessels) and corrected for the cortical surface bias where blood originates from and returns to the pial network. These steps helped mitigate large vessel bias as described in the methods, results and discussion (see also our response to Reviewer #1, question #1).

      To improve clarity for readers, we further clarified:

      Methods:

      “The effect of blood accumulation in large feeding arteries and draining veins toward the superficial layers was estimated using a linear model and regressed out from the parcellated ΔR<sub>2</sub>* maps.”

      Results:

      “To mitigate bias resulting from undersampling the large-caliber vessels (Fig. 2A, B), median parcel values were obtained and M132 parcellated ΔR2* profiles were then detrended across ELs in each subject and then averaged.”

      Discussion:

      “This methodology, however, has known limitations. First, gradient-echo imaging is more sensitized toward large pial vessels running along the cortical surface and large penetrating vessels, which could differentially bias the estimation of ΔR<sub>2</sub>* across cortical layers (Fig. 2A, 2B) (Boxerman et al., 1995; Zhao et al., 2006). Additionally, vessel orientation relative to the B<sub>0</sub> direction introduces strong layer-specific biases in quantitative ΔR<sub>2</sub>* measurements (Supp. Fig. 1C) (Ogawa et al., 1993; Viessmann et al., 2019; Lauwers et al., 2008). To address these concerns, we conducted necessary corrections for B<sub>0</sub>-orientation, obtained parcel median values and regressed out the linear trend, thereby mitigating the effect of undersampling large-caliber vessels across ELs (Fig. 2C, Supp. Fig. 1).”

      Please note, we are currently unable to create BALSA links to the figures due to maintenance issues at the data repository. As a result, we have opted to remove the links:

    1. Reviewer #1 (Public review):

      Summary:

      Boldt et al. test several possible relationships between transdiagnostically-defined compulsivity and cognitive offloading in a large online sample. To do so, they develop a new and useful cognitive task to jointly estimate biases in confidence and reminder-setting. In doing so, they find that over-confidence is related to less utilization of reminder-setting, which partially mediates the negative relationship between compulsivity and lower reminder-setting. The paper thus establishes that, contrary to the over-use of checking behaviors in patients with OCD, greater levels of transdiagnostically-defined compulsivity predict less deployment of cognitive offloading. The authors offer speculative reasons as to why (perhaps it's perfectionism in less clinically-severe presentations that lowers the cost of expending memory resources), and set an agenda to understand the divergence in cognitive offloading between clinical and nonclinical samples. Because only a partial mediation had robust evidence, multiple effects may be at play, whereby compulsivity impacts cognitive offloading via overconfidence and also by other causal pathways.

      Strengths:

      The study develops an easy-to-implement task to jointly measure confidence and replicates several major findings on confidence and cognitive offloading. The study uses a useful measure of cognitive offloading - the tendency to set reminders to augment accuracy in the presence of experimentally manipulated costs. Moreover, the study utilizes multiple measures of presumed biases -- overall tendency to set reminders, the empirically estimated indifference point at which people engage reminders, and a bias measure that compares optimal indifference points to engage reminders relative to the empirically observed indifference points. That the study observes convergence along all these measures strengthens the inferences made relating compulsivity to the under-use of reminder-setting. Lastly, the study does find evidence for one of several a priori hypotheses and sets a compelling agenda to try to explain why such a finding diverges from an ostensible opposing finding in clinical OCD samples and the over-use of cognitive offloading.

      Weaknesses:

      Although I think this design and study are very helpful for the field, I felt that a feature of the design might reduce the task's sensitivity to measuring dispositional tendencies to engage cognitive offloading. In particular, the design introduces prediction errors that could induce learning and interfere with natural tendencies to deploy reminder-setting behavior. These PEs comprise whether a given selected strategy will or will not be allowed to be engaged. We know individuals with compulsivity can learn even when instructed not to learn (e.g., Sharp, Dolan and Eldar, 2021, Psychological Medicine), and that more generally, they have trouble with structure knowledge (e.g., Seow et al; Fradkin et al), and thus might be sensitive to these PEs. Thus, a dispositional tendency to set reminders might be differentially impacted for those with compulsivity after an NPE, where they want to set a reminder, but aren't allowed to. After such an NPE, they may be even more inclined to avoid setting reminders. Those with compulsivity likely have superstitious beliefs about how checking behaviors lead to a resolution of catastrophes, which might in part originate from inferring structure in the presence of noise or from purely irrelevant sources of information for a given decision problem.

      It would be good to know if such learning effects exist, if they're modulated by PE (you can imagine PEs are higher if you are more incentivized - e.g., 9 points as opposed to only 3 points - to use reminders, and you are told you cannot use them), and if this learning effect confounds the relationship between compulsivity and reminder-setting.

      A more subtle point: I think this study is better described as an exploration than as a deductive test of a particular model -> hypothesis -> experiment. Typically, when we test a hypothesis, we contrast it with competing models. Here, the tests were two-sided because multiple models, with mutually exclusive predictions (over-use or under-use of reminders), were tested. Moreover, it's unclear exactly how to make sense of what is called the direct mechanism, which is supported by the partial (as opposed to complete) mediation.

    1. This means that we need to study the problems of today, not those of yesterday.

      Work culture is always undergoing change. The quote that I highlighted underscores this premise. As we read throughout chapter 1, we learn about the vast differences of I/O Psychology throughout history and in different parts of the globe. For instance, we can evaluate gender differences through 1985-2003. Although this time frame may seem small in the context of our planet's history, we can actually observe a huge shift in the number of women who entered the field of I/O Psychology, which doubled! Or, perhaps we can observe how the Civil Rights Act of 1964 affected work culture. As this ended employment discrimination, we can think about how diversity brought about so many new and fresh ideas to various work spaces. I/O Psychology is not the ultimate answer to solve all problems, but we can use it as an aid in changing our perspective and approach in workplace challenges. Whether it’s addressing issues of equity, enhancing collaboration, or improving employee well-being, I/O Psychology helps us navigate the complexities of an evolving work culture.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper investigates the effects of the explicit recognition of statistical structure and sleep consolidation on the transfer of learned structure to novel stimuli. The results show a striking dissociation in transfer ability between explicit and implicit learning of structure, finding that only explicit learners transfer structure immediately. Implicit learners, on the other hand, show an intriguing immediate structural interference effect (better learning of novel structure) followed by successful transfer only after a period of sleep.

      Strengths:

      This paper is very well written and motivated, and the data are presented clearly with a logical flow. There are several replications and control experiments and analyses that make the pattern of results very compelling. The results are novel and intriguing, providing important constraints on theories of consolidation. The discussion of relevant literature is thorough. In summary, this work makes an exciting and important contribution to the literature.

      Weaknesses:

      There have been several recent papers that have identified issues with alternative forced choice (AFC) tests as a method of assessing statistical learning (e.g. Isbilen et al. 2020, Cognitive Science). A key argument is that while statistical learning is typically implicit, AFC involves explicit deliberation and therefore does not match the learning process well. The use of AFC in this study thus leaves open the question of whether the AFC measure benefits the explicit learners in particular, given the congruence between knowledge and testing format, and whether, more generally, the results would have been different had the method of assessing generalization been implicit. Prior work has shown that explicit and implicit measures of statistical learning do not always produce the same results (eg. Kiai & Melloni, 2021, bioRxiv; Liu et al. 2023, Cognition).

      We agree that numerous papers in the Statistical Learning literature discuss how different test measures can lead to different results and, in principle, using a different measure could have led to varying results in our study. In addition, we believe there are numerous additional factors relevant to this issue, including the dichotomous vs. continuous nature of implicit vs. explicit learning and the complexity of the interactions between the (degree of) explicitness of the participants' knowledge and the applied test method, that transcend a simple labeling of tests as implicit or explicit and that strongly constrain the type of variations the results of different tests would produce. Therefore, running the same experiments with different learning measures in future studies could provide additional interesting data with potentially different results.

      However, the most important aspect of our reply concerning the reviewer's comment is that although quantitative differences between the learning rate of explicit and implicit learners are reported in our study, they are not of central importance to our interpretations. What is central are the different qualitative patterns of performance shown by the explicit and the implicit learners, i.e., the opposite directions of learning differences for “novel” and “same” structure pairs, which are seen in comparisons within the explicit group vs. within the implicit group and in the reported interaction. Following the reviewer's concern, any advantage an explicit participant might have in responding to 2AFC trials using “novel” structure pairs should also be present in the replies of 2AFC trials using the “same” structure pairs and this effect, at best, could modulate the overall magnitude of the across groups (Expl/Impl.) effect but not the relative magnitudes within one group. Therefore, we see no parsimonious reason to believe that any additional interaction between the explicitness level of participants and the chosen test type would impede our results and their interpretation.

      Given that the explicit/implicit classification was based on an exit survey, it is unclear when participants who are labeled "explicit" gained that explicit knowledge. This might have occurred during or after either of the sessions, which could impact the interpretation of the effects.

      We agree that this is a shortcoming of the current design, and obtaining the information about participants’ learning immediately after Phase 1 would have been preferred. However, we made this choice deliberately, as the disadvantage of assessing the level of learning at the end of the experiment is far less damaging than the alternative of exposing the participants to the exit survey questions earlier and thereby letting them achieve explicitness, or otherwise influencing their mindset, through contemplating the survey questions before Phase 2. Our Experiment 5 shows how realistic this danger of unwanted influence is: with a single sentence alluding to pairs in the instructions of Exp 5, we could completely change participants' quantitative performance and qualitative response pattern. Unfortunately, there is no implicit assessment of explicitness we could use in our experimental setup. We also note that given the cumulative nature of statistical learning, we expect that the effect of using an exit survey for this assessment only shifts absolute magnitudes (i.e. the fraction of people who would fall into the explicit vs. implicit groups) but not aspects of the results that would influence our conclusions.

      Reviewer #2 (Public Review):

      Summary:

      Sleep has not only been shown to support the strengthening of memory traces but also their transformation. A special form of such transformation is the abstraction of general rules from the presentation of individual exemplars. The current work used large online experiments with hundreds of participants to shed further light on this question. In the training phase, participants saw composite items (scenes) that were made up of pairs of spatially coupled (i.e., they were next to each other) abstract shapes. In the initial training, they saw scenes made up of six horizontally structured pairs, and in the second training phase, which took place after a retention phase (2 min awake, 12 h incl. sleep, 12 h only wake, 24 h incl. sleep), they saw pairs that were horizontally or vertically coupled. After the second training phase, a two-alternatives-forced-choice (2-AFC) paradigm, where participants had to identify true pairs versus randomly assembled foils, was used to measure the performance of all pairs. Finally, participants were asked five questions to identify, if they had insight into the pair structure, and post-hoc groups were assigned based on this. Mainly the authors find that participants in the 2-minute retention experiment without explicit knowledge of the task structure were at chance level performance for the same structure in the second training phase, but had above chance performance for the vertical structure. The opposite was true for both sleep conditions. In the 12 h wake condition these participants showed no ability to discriminate the pairs from the second training phase at all.

      Strengths:

      All in all, the study was performed to a high standard and the sample size in the implicit condition was large enough to draw robust conclusions. The authors make several important statistical comparisons and also report an interesting resampling approach. There is also a lot of supplemental data regarding robustness.

      Weaknesses:

      My main concern regards the small sample size in the explicit group and the lack of experimental control.

      The sample sizes of the explicit participants in our experiments are, indeed, much smaller than those of the implicit participants due to the process of how we obtain the members of the two groups. However, these sample sizes of the explicit groups are not small at all compared to typical experiments reported in Visual Statistical Learning studies; rather, they tend to be average to large. It is the sizes of the implicit subgroups that are unusually high due to the aforementioned data collecting process. Moreover, the explicit subgroups have significantly larger effect sizes than the implicit subgroup, bolstering the achieved power. This is also confirmed by the reported Bayes Factors, which support the “effect” or the “no effect” conclusions in the various tests with values ranging from substantial to very strong. Based on these statistical measures, we think the sample sizes of the explicit participants in our studies are adequate.

      As for the lack of experimental control, indeed, we could not fully randomize consolidation condition assignment. Instead, the assignment was a product of when the study was made available on the online platform Prolific. This method could, in theory, lead to an unobserved covariate, such as morningness, being unbalanced between conditions. We do not have any reasons to believe that such a condition would critically alter the effects reported in our study, but as it follows from the nature of unobserved variables, we obviously cannot state this with certainty. Therefore, we added an explicit discussion of these potential pitfalls in the revised version of the manuscript.

      Reviewer #3 (Public Review):

      In this project, Garber and Fiser examined how the structure of incidentally learned regularities influences subsequent learning of regularities, that either have the same structure or a different one. Over a series of six online experiments, it was found that the structure (spatial arrangement) of the first set of regularities affected the learning of the second set, indicating that it has indeed been abstracted away from the specific items that have been learned. The effect was found to depend on the explicitness of the original learning: Participants who noticed regularities in the stimuli were better at learning subsequent regularities of the same structure than of a different one. On the other hand, participants whose learning was only implicit had an opposite pattern: they were better in learning regularities of a novel structure than of the same one. This opposite effect was reversed and came to match the pattern of the explicit group when an overnight sleep separated the first and second learning phases, suggesting that the abstraction and transfer in the implicit case were aided by memory consolidation.

      These results are interesting and can bridge several open gaps between different areas of study in learning and memory. However, I feel that a few issues in the manuscript need addressing for the results to be completely convincing:

      (1) The reported studies have a wonderful and complex design. The complexity is warranted, as it aims to address several questions at once, and the data is robust enough to support such an endeavor. However, this work would benefit from more statistical rigor. First, the authors base their results on multiple t-tests conducted on different variables in the data. Analysis of a complex design should begin with a large model incorporating all variables of interest. Only then, significant findings would warrant further follow-up investigation into simple effects (e.g., first find an interaction effect between group and novelty, and only then dive into what drives that interaction). Furthermore, regardless of the statistical strategy used, a correction for multiple comparisons is needed here. Otherwise, it is hard to be convinced that none of these effects are spurious. Last, there is considerable variation in sample size between experiments. As the authors have conducted a power analysis, it would be good to report that information per each experiment, so readers know what power to expect in each.

      Answering the questions we were interested in required us to investigate two related but separate types of effects within our data: general above-chance performance in learning, and within- and across-group differences.

      Above-chance performance: As typical in SL studies, we needed to assess whether learning happened at all and which types of items were learned. For this, a comparison to the chance level is crucial and, therefore, one-sample t-test is the statistical test of choice. Note that all our t-tests were subject to experiment-wise correction for multiple comparisons using the Holm-Bonferroni procedure, as reported in the Supplementary Materials.
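For readers unfamiliar with the Holm-Bonferroni procedure named above, the following is a minimal sketch of the step-down rule in plain Python. It is an illustration only, not the authors' analysis code; the function name `holm_bonferroni` and the example p-values are assumptions made up for this sketch.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected.

    Hypothetical helper illustrating the step-down rule; not taken from the paper.
    """
    m = len(p_values)
    # Visit the hypotheses from the smallest to the largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared to alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            # Once one test fails, all larger p-values are retained as well.
            break
    return reject


# Made-up p-values, e.g. from four one-sample t-tests against chance.
example_p = [0.01, 0.04, 0.03, 0.005]
print(holm_bonferroni(example_p))  # [True, False, False, True]
```

The procedure controls the family-wise error rate like the plain Bonferroni correction, but it is uniformly more powerful because the threshold relaxes after each rejection.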

      Within- and across-group differences: To obtain our results regarding group and pair-type differences and their interactions, we used mixed ANOVAs and appropriate post-hoc tests as the reviewer suggested. These results are reported in the Methods section.

      Concerning power analysis, in the revised version of the manuscript we added analysis of achieved power for the statistical tests most critical to our arguments.

      (2) Some methodological details in this manuscript I found murky, which makes it hard to interpret results. For example, the secondary results section of Exp1 (under Methods) states that phase 2 foils for one structure were made of items of the other structure. This is an important detail, as it may make testing in phase 2 easier, and tie learning of one structure to the other. As a result, the authors infer a "consistency effect", and only 8 test trials are said to be used in all subsequent analyses of all experiments. I found the details, interpretation, and decision in this paragraph to lack sufficient detail, justification, and visibility. I could not find either of these important design and analysis decisions reflected in the main text of the manuscript or in the design figure. I would also expect to see a report of results when using all the data as originally planned.

      We thank the reviewer for pointing out these critical open questions in our manuscript that need further clarification. The inferred “consistency effect” is based on patterns found in the data, which show an increase in negative correlation between test types during the test phase. As this is apparently an effect of the design of the test phase and not an effect of the training phase, which we were interested in, we decided to minimize this effect as far as possible by focusing on the early test trials. For the revised version of the manuscript, we revamped and expanded the discussion of how this issue was handled and also added a short comment in the main text, mentioning the use of only a subset of test trials and pointing the interested reader to the details.

      Similarly, the matched sample analysis is a great addition, but details are missing. Most importantly, it was not clear to me why the same matching method should be used for all experiments instead of choosing the best matching subgroup (regardless of how it was arrived at), and why the nearest-neighbor method with replacement was chosen, as it is not evident from the numbers in Supplementary Table 1 that it was indeed the best-performing method overall. Such omissions hinder interpreting the work.

      Since our approach provided four different balanced metrics (see Supp. Tables 1-4) for each matching method, it is not completely straightforward to make a principled decision across the methods. In addition, selecting the best method for each experiment separately carries the suspicion of cherry-picking the most suitable results for our purposes. For the revised version, we expanded on our description of the matching and decision process and added supplementary descriptive plots showing what our data looks like under each matching method for each experiment. These plots highlight that the matching techniques produce qualitatively similar results and that picking one of them over the others does not alter the conclusions of the test. The plots give the interested reader all the necessary information to assess the extent to which our design decisions influence our results.
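To make the idea of nearest-neighbor matching with replacement concrete, here is a minimal sketch. It assumes, purely for illustration, that each explicit participant is matched to the implicit participant with the closest Phase 1 score; the actual matching variables and implementation used in the study may differ, and the function name and scores below are hypothetical.

```python
def match_with_replacement(target_scores, pool_scores):
    """For each target score, return the index of the nearest score in the pool.

    Hypothetical helper: the same pool member may be selected more than once,
    which is what "with replacement" means here.
    """
    matches = []
    for score in target_scores:
        nearest = min(range(len(pool_scores)),
                      key=lambda j: abs(pool_scores[j] - score))
        matches.append(nearest)
    return matches


# Made-up Phase 1 proportions correct for two explicit and four implicit participants.
explicit_phase1 = [0.90, 0.88]
implicit_phase1 = [0.50, 0.87, 0.99, 0.60]
print(match_with_replacement(explicit_phase1, implicit_phase1))  # [1, 1]
```

In this toy example both explicit participants are matched to the same implicit participant (index 1), which shows why matching with replacement can shrink the effective size of the matched sample.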

      (3) To me, the most surprising result in this work relates to the performance of implicit participants when phase 2 followed phase 1 almost immediately (Experiment 1 and Supplementary Experiment 1). These participants had a deficit in learning the same structure but a benefit in learning the novel one. The first part is easier to reconcile, as primacy effects have been reported in statistical learning literature, and so new learning in this second phase could be expected to be worse. However, a simultaneous benefit in learning pairs of a new structure ("structural novelty effect") is harder to explain, and I could not find a satisfactory explanation in the manuscript.

      Although we might not have worded it clearly, we do not claim that our "structural novelty effect" comes from a “benefit” in learning pairs of the novel structure. Rather, we used the term “interference” and lack of this interference. In other words, we believe that one possible explanation is that there is no actual benefit for learning pairs of the novel structure but simply unhindered learning for pairs of the novel structure and simultaneous interference for learning pairs of the same structure. Stronger interference for the same compared to the novel structure items seems a reasonable interpretation, as similarity-based interference is well established in the general (not SL-specific) literature under the label of proactive interference.

      After possible design and statistical confounds (my previous comments) are ruled out, a deeper treatment of this finding would be warranted, both empirically (e.g., do explicit participants collapsed across Experiments 1 and Supplementary Experiment 1 show the same effect?) and theoretically (e.g., why would this phenomenon be unique only to implicit learning, and why would it dissipate after a long awake break?).

      Across all experiments, the explicit participants showed the same pattern of results but no significant difference between pair types, probably due to insufficient sample sizes. We already included in the main text the collapsed explicit results across Experiments 1-4 and Supplementary Experiment 1 (p. 16). This analysis confirmed that, indeed, there was a significant generalization for explicit participants across the two learning phases. We could re-run the same analysis for only Experiment 1 and Supplementary Experiment 1, but due to the small sample of N=12 in Suppl. Exp. 1, this test would likely be severely underpowered. Obtaining a sufficient sample size for this one test would require an excessive number (several hundred) of new participants.

      In terms of theoretical treatment, we already presented our interpretation of our results in the discussion section, which we expanded on in the revised manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) It would be very useful to add individual data points (and/or another depiction of the distribution) to the bar plots. If not in the main figures, as added figures in the supplement.

      We added violin plots for all results in the Supplementary.

      (2) It would be helpful to include in the supplement some examples of responses that led to the 'explicit' or 'implicit' classification. Specifically, what kind of response was considered to contain a partial recognition of the underlying structure vs. no recognition?

      We added example responses used for classification in the Supplementary.

      (3) It would be useful to show the results of Experiment 5 as well as the diagonal version as supplemental figures.

      We added the requested figures in the Supplementary.

      Typos: page 10: "in in the tests", page 15: "rerun"

      Fixed.

      Reviewer #2 (Recommendations For The Authors):

      (1) My strongest reservation relates to the small sample size in the explicit group. The authors do report stats for all experiments together in one analysis and I think this is the only robust finding for this group. I would suggest removing any comparisons between this smaller group and the larger implicit group since they do not make a lot of sense due to the imbalance in sample size in my opinion. If they do want to report the explicit group individually for each experiment, they should at least test for differences between the experiments also for this group using ANOVA.

      We do agree that the unbalanced nature of the sample sizes can be problematic for the between-group comparisons. The t-tests reported for between-group comparisons are in fact Welch’s t-tests, which are better suited for unequal sample sizes and variances. Previously, we failed to report that these t-tests were Welch’s t-tests, which we fixed in the revised version.

      In the Supplementary, we previously reported an ANOVA including all explicit participants from all experiments. This showed a significant main effect of Experiment and test type, but no significant interaction. We take this as evidence that although specific levels of learning vary by experimental condition, the overall pattern of learning (i.e. which pairs are learned better) is the same across all experiments.
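As a reminder of why Welch's t-test suits groups of unequal size and variance, the sketch below computes the Welch statistic and the Welch-Satterthwaite degrees of freedom using only the Python standard library. It is an illustration under stated assumptions, not the authors' code; the function name `welch_t` and the sample values are made up.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance t-test (hypothetical helper)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Each group contributes its own variance estimate; no pooling is done.
    se_a = statistics.variance(sample_a) / n_a
    se_b = statistics.variance(sample_b) / n_b
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


# Made-up scores for two groups with clearly different variances.
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t_stat, 3), round(dof, 3))  # -1.897 5.882
```

Because the degrees of freedom shrink toward those of the noisier group, the test stays appropriately conservative when a small, variable group is compared against a large one.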

      (2) Moreover, the explicit group does not only differ in the explicitness of their memory but also regarding learning performance per se (as evidenced by performance differences for the first training). This important confound needs to be acknowledged and discussed more thoroughly!

      We agree that this topic is important; this is why the subsection “The Type of Transfer Depends on Quality of Knowledge, Not Quantity of Knowledge” deals exclusively with this issue. See our reply to the next point.

      (3) The resampling approach is somewhat interesting to solve the issue raised in 2. However, I doubt that the authors actually achieve what they are claiming. Since we have a 2-AFC task the possibility must be considered that participants who chose correctly in the implicit group did so by chance. This means that the assumption that the matched pairs actually have the same amount of memory for the first training period as the explicit group is likely false. Therefore, this analysis is still comparing apples and oranges.

      We address this idea in detail in the supplementary materials, first by pointing out that the matched results showed the same pattern as the full results, suggesting that Phase 1 and Phase 2 results are independent for this group, and second by arguing that a randomly selected subset of participants should not show a significant deviation from null performance in the Same vs. Novel performance in Phase 2.

      (4) One important issue when conducting online experiments is ensuring random allocation of participants. How did the authors recruit participants to ensure they did not select participants for the different experiments that differed regarding their preference for wake vs. sleep retention intervals? If no care was taken in this regard, I would suggest reporting this and maybe briefly discussing it.

      This shortcoming was now reported and addressed in the discussion section of the revised manuscript.

      (5) I could not find any information about the exact questions that were asked about the task rules. Also, there was no information on how the answers were used to assign groups. Both should be added.

      The exact questions were added to the revised Supplementary.

      (6) I think that the literature on sleep and rule extraction is well-represented in the manuscript. However, I think also referring more thoroughly to the literature on how sleep leads to gist extraction, schemas, and insight would help understand the relevance of the present research.

      We subsumed references to the mentioned areas of research under the labels of abstraction and generalization. In the revised section, we listed the appropriate labels along with the already used references to make the connection to a vast literature treating generalization in related but distinct ways more explicit.

      (7) It is unclear to me why the items learned in the first learning phase interfere with those learned in the second learning phase (without sleep) and not vice versa. What is the authors' explanation for this?

      We added a paragraph on this to our revised discussion section. In short, there may also be retroactive interference. However, we would need yet another variation of the paradigm to properly measure it, and this was outside the scope of the current work.

      (8) As far as I can tell the study lacks all of the usual control tasks that are used in the field of sleep and memory (especially subjective sleepiness and objective vigilance). In addition, this research has the circadian confound, and therefore additional controls would have been warranted, e.g., morningness-eveningness, retrieval capabilities. Also, performance immediately after training phase 1 was not tested, which would serve as an important control for circadian differences in initial learning of the rule.

      The study uses a number of the control measures established in the sleep and memory literature, such as habitual sleep quality and sleep quality during the night of and the night before the experiment. However, there are, of course, more potentially interesting measures, such as the ones named by the reviewer.

      Testing performance right after training phase 1 would have been very interesting indeed. However, due to the nature of statistical learning tasks, this would have completely confounded the implicitness of learning by presenting participants with segmented input; i.e. isolated pairs. Therefore, we opted for the lesser of two evils in our design decision.

      (9) As far as I can tell, there is no effect of sleep on correctly identifying pairs from training phase 1. This would be expected and thus should be discussed.

      As noted and referenced in the discussion section, the effect of sleep on statistical learning per se is a subject of controversy in the literature, where some studies apparently find effects, while others find no effect on statistical learning whatsoever.

      (10) The manuscript should explicitly mention if the study was preregistered.

      It was not.

      Reviewer #3 (Recommendations For The Authors):

      The topic of this project is close to my heart, and I commend the authors for conducting numerous variations of the experiment with large sample sizes. I have some suggestions I feel will make the paper stronger, and a few minor comments that caught my eye during reading:

      (1) First and foremost, I found the paper's structure cumbersome. For instance, different aspects of Experiment 1 results are reported in (1) the main text, (2) under methods, and (3) in Supplementary. This makes reading unnecessarily difficult. This relates not only to the analysis results - the sample size is reported as 226 in the main text, 226+3 in Methods, and 226+3+19 in Supplementary. I strongly suggest removing all results from the Methods section and merging the supplementary results with the main results.

      We overhauled the structure of the paper, moving much more information into the proper method section and out of the Supplementary.

      (2) "Attention checks" and "response bias" appear first in Supplementary Experiment 1 but are explained only later under Experiment 1. The same thing for the experimental procedure. I therefore suggest placing Experiment 1 before Supplementary Experiment 1, but related to my previous comment - have one paragraph dedicated to Subject Exclusion of all experiments.

      The new structure of the Method sections solves this.

      (3) Figure 4 is mentioned but does not appear in the manuscript.

      This has been fixed. The paragraph in question now references the correct supplementary figure.

      (4) OSF project includes only data with no README file on how to understand the data. The work would also benefit from sharing the experimental and analysis codes.

      A README file was added.

      (5) This sentence is repeated in relation to four experiments: "Bayes Factors from Bayesian t-tests for implicit participants reported for experiments 1, 2, and 3 used an r-scale parameter of 0.5 instead of the default √2/2, reflecting that Experiment 1 found small effect sizes for this group". First, it is missing an explanation of what the r-scale means. Second, it sounds as if this was a product of the procedure, but in fact it was a decision by the researcher if I am correct. If so, it is missing a description of how and why this choice was made.

      This was indeed a decision by the researchers, in line with a Bayesian logic of evidence accumulation. We made the explanation in the paper clearer.
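For context on what the r-scale controls: in the commonly used default Bayesian t-test, the alternative hypothesis places a Cauchy prior with scale r on the standardized effect size, so a smaller r concentrates prior mass on smaller effects. The sketch below, which is an illustration and not part of the authors' analysis, uses the closed-form Cauchy distribution function to compare the two scales mentioned in the reviewer's quote; the function name is hypothetical.

```python
import math


def cauchy_prior_mass(limit, r_scale):
    """Prior probability that |effect size| < limit under a Cauchy(0, r_scale) prior.

    Hypothetical helper; uses the Cauchy CDF, F(x) = 1/2 + atan(x / r) / pi.
    """
    return (2.0 / math.pi) * math.atan(limit / r_scale)


default_r = math.sqrt(2) / 2   # the default scale quoted in the comment
narrow_r = 0.5                 # the scale the authors report using

for r in (narrow_r, default_r):
    mass = cauchy_prior_mass(0.5, r)
    print(f"r = {r:.3f}: P(|d| < 0.5) = {mass:.3f}")
# r = 0.500: P(|d| < 0.5) = 0.500
# r = 0.707: P(|d| < 0.5) = 0.392
```

With r = 0.5, half of the prior mass lies on effect sizes below 0.5 in absolute value, compared with roughly 39% under the default, which matches the stated motivation of expecting small effects in the implicit group.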

      (6) Did I understand correctly that each pair was tested 4 times? Was it against the same foil? Did you make sure not to repeat the same pair in back-to-back trials? These details, in addition to what I noted in the public review, are needed.

      Each pair was tested 4 times. Each time against a different foil pair. Details have been added to the Method section.

      (7) Also in relation to my public review, I could not understand why the sample size was overshot by so much in Experiment 1 (229 instead of 198.15)?

      The calculated sample size of 198.15 was for the implicit subgroup alone, while 229 included explicit and implicit participants.

      (8) The correlation between phase 1 and phase 2 is only tested in explicit participants. Why is that? A test in implicit participants is needed for completeness.

      Correlations for implicit participants have been added.

      (9) There is a known asymmetry between the horizontal and vertical planes in our visual system (with preference for horizontal stimuli). I was missing a comparison between learning in the two structures, and a report of how many participants received either in Phase 1.

      The allocation of participants to horizontal and vertical conditions was balanced. In the Method section we already report an ANOVA testing for a potential effect of orientation condition, which was not significant.

      Minor/aesthetic comments:

      (1) "In Phase 2, explicit participants performed above chance for learning pairs that shared their higher level orientation structure with that of pairs in Phase 1". This sounds as if there was a separate test following the two learning phases. Perhaps reword to "for phase 2 pairs".

      Fixed

      (2) "the two asleep-consolidation groups (Exp. 3 and 4)" - I think you mean Exp. 2 and 4.

      Fixed.

      (3) "acquiring explicitness in Experiment 5 as compared to 1" I think you mean Supplementary Experiment 1 as compared to 1.

      Fixed

      (4) "without such a redescription, the previously learned patterns in Phase 1 interfere with new ones in Phase 2, when redescription occurs..." The comma should be a dot.

      Fixed

      (5) In Experiment 4, did 168 or 169 participants survive exclusion? Both accounts exist, and so do reports of degrees of freedom that allow both 23 and 24 explicit participants.

      Fixed.

      (6) "Implicit learners also performed above chance.." in Experiment 2 is missing (n=XX).

      Fixed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      We are grateful to the reviewers and the editorial team for their feedback and thorough revisions of our paper. We also appreciate their acknowledgement that this study represents a significant advancement in the field of reproductive neuroendocrinology and offers insights on the contribution of obesity vs melanocortin signaling in women’s fertility. In the revised version, we will provide a more detailed clarification of the data and methodology and adhere to the reviewers’ suggestions.

      Please find below our answers to specific concerns in the public review:

      Given the fact that mice lacking MC4R in Kiss1 neurons remained fertile despite some reproductive irregularities, the overall tone and some of the conclusions of the manuscript (e.g., from the abstract: "... Mc4r expressed in Kiss1 neurons is required for fertility in females") were overstated. Perhaps this can be described as a contributing pathway, but other mechanisms must also be involved in conveying metabolic information to the reproductive system.

      We will tone down these statements throughout the manuscript to indicate that MC4R in Kiss1 neurons plays a role in the metabolic control of fertility (rather than “…is required for fertility”)

      The mechanistic studies evaluating melanocortin signalling in Kiss1 neurons were all completed in ovariectomised animals (with and without exogenous hormones) that do not experience cyclical hormone changes. Such cyclical changes are fundamental to how these neurons function in vivo and may dynamically alter the way they respond to neuropeptides. Therefore, eliminating this variable makes interpretation difficult.

      Mice lack true follicular and luteal phases and therefore it is impossible to separate estrogen-mediated changes from progesterone-mediated changes (e.g., in a proestrous female). Therefore, we use an ovariectomized female model in which we can generate a LH surge with an E2-replacement regimen [1]. This model enables us to focus on estrogen effects, exclude progesterone effects, and minimize variability. Inclusion of cycling females would make interpretation much more difficult.

      (1) Bosch et al., 2013 Mol & Cell Endo; https://doi.org/10.1016/j.mce.2012.12.021

      Use of the POMC-Cre to target optogenetic inputs to Kiss1 neurons might have targeted a wider population of cells than intended.

      POMC is transiently expressed during embryonic development in a portion of cells fated to be Kiss1 or NPY/AgRP neurons [1-2]. Therefore, this is a valid concern when crossing with a floxed mouse. However, use of AAVs in adult animals avoids this issue and leads to specific expression in POMC neurons [3]. This POMC-Cre mouse has been used extensively with AAVs to drive specific expression in POMC neurons by other laboratories [4-7]. Therefore, we are confident that our optogenetic studies have narrowly targeted POMC inputs.

      (1) Padilla et al., 2010 Nat Med; https://doi.org/10.1038/nm.2126

      (2) Lam et al., 2017 Mol Metab; https://doi.org/10.1016/j.molmet.2017.02.007

      (3) Stincic et al., 2018 eNeuro; https://doi.org/10.1523/eneuro.0103-18.2018

      (4) Fenselau et al., 2017 Nat Neuro; https://doi.org/10.1038/nn.4442

      (5) Rau & Hentges, 2019 J Neuro; https://doi.org/10.1523/jneurosci.3193-18.2019

      (6) Fortin et al., 2021 Nutrients; https://doi.org/10.3390/nu13051642

      (7) Villa et al., 2024 J Neuro; https://doi.org/10.1523/jneurosci.0222-24.2024

      Recommendations for Authors

      We thank the reviewers and the editorial team for their comments and thorough revisions of our paper. We have now addressed their comments and edited the manuscript accordingly:

      Reviewer #1 (Recommendations For The Authors):

      L80 -This is an awkward sentence; it isn't an inverse agonist of the AgRP; this may read better just to say that the inverse agonist, AgRP.

      Thank you for this comment. This has now been changed in the text (L80).

      L86 - This text reads as if mice have an inherent obesity issue.

      This has also now been addressed in the text (L86).

      L131 - The numbers of digits past the decimal point should match for both mean and SEM.

      This has also now been addressed throughout the text.

      Figure 1D: Revise the bar graphs with distinct SEM bars, as these data are not generated within the same mice.

      The graphs are now changed, and they include distinct SEM and individual data points.

      Figure 2I-L - An n of 3 for controls is pretty minimal, though the clustering of data points is tight.

      We thank the reviewer for this comment. We agree that an n=3 for controls is minimal; however, the mRNA values in this group are close to one another, so the data points cluster tightly. We are happy to provide the raw data values for these groups if the reviewer wishes.

      L159 - The role of reduced dynorphin mRNA is pretty speculative with regard to basal levels of LH, especially since no other indices of LH secretion were affected. It should also be recognized that mRNA levels do not always equate to activity.

      We agree with the reviewer that our explanation of the role of reduced dynorphin in the elevated basal LH is speculative. However, we only report that the higher LH levels correlate with lower Pdyn expression, which is in line with the well-documented role of dynorphin in inhibiting LH secretion. We also recognize that mRNA levels don’t necessarily reflect activity. We have now added this statement to the text (L159).

      L164 - Given the ovary data, it seems that the increase seen in KO mice isn't quite sufficient, but is it known how much of a surge is necessary for ovulation in mice?

      We agree with the reviewer’s comment that the LH surge in the Kiss1MC4RKO group is not enough to consistently induce ovulation, which is supported by the decrease in the number of corpora lutea (Figure 2, O).

      According to the literature, an LH surge in female mice is defined by an LH value >4 ng/ml (Bahougne et al., 2020). By this criterion, our data show that only two of six females had an LH surge in the KO group, while four of five females had an LH surge in the control group.
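
      The >4 ng/ml criterion above can be sketched as a simple threshold test. This is a minimal illustration only: the function names and the peak LH values below are hypothetical placeholders (they are not the study's measurements) and were chosen merely so that the resulting counts agree with those stated in the response.

```python
# Surge criterion cited in the response (Bahougne et al., 2020): LH > 4 ng/ml.
SURGE_THRESHOLD_NG_ML = 4.0


def has_lh_surge(peak_lh_ng_ml, threshold=SURGE_THRESHOLD_NG_ML):
    """Return True when an animal's peak LH value exceeds the threshold."""
    return peak_lh_ng_ml > threshold


def surge_fraction(peak_values, threshold=SURGE_THRESHOLD_NG_ML):
    """Return (number of animals with a surge, total number of animals)."""
    n_surge = sum(has_lh_surge(v, threshold) for v in peak_values)
    return n_surge, len(peak_values)


# Hypothetical peak LH values (ng/ml), invented for illustration only.
ko_peaks = [5.1, 1.2, 0.9, 6.3, 2.0, 1.5]
control_peaks = [7.4, 5.8, 3.1, 9.0, 6.2]

print(surge_fraction(ko_peaks))
print(surge_fraction(control_peaks))
```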

      L211 - According to the figure, LH pulses were not recovered and remained similar to KO levels. Looking at the LH secretory patterns presented, it seems like the pulse frequency data should be interpreted with some caution, given that some of the pulses identified are tenuous at best.

      We agree that the LH pulses identified by our software (criteria described in the methods) are variable in shape (LH pulses are difficult to detect clearly in gonad-intact females) and did not differ in number between groups; however, the reinsertion of Mc4r within Kiss1 neurons restored LH basal levels, amplitude and total secretory mass, which are clear indicators of a significant improvement in the ability of these mice to release LH.

      L218 - Is there a reason why the surge was not looked at in these groups?

      Ovarian histology is the best indicator of ovulation. In these mice, corpora lutea were absent, indicating impaired ovulation; thus, we did not consider an LH surge protocol necessary.

      L244 - This would also fit with previous findings in sheep that not all Kiss neurons express MC receptors

      We agree with this comment.

      L329 - Given the rapidity of its actions, how would this membrane ER function during a normal surge?

      Rapid estrogen signaling can act to ease transitions between states. Membrane delimited E2 actions can quickly attenuate or enhance coupling between receptors and signaling cascades. These effects will precede E2-driven changes in gene expression that produce more stable alterations in signaling. This combination of mechanisms will reduce any lag between rises in serum E2 and physiological effects. Considering the abbreviated mouse reproductive cycle, parallel mechanisms acting on different timescales are particularly important.

      L365 - I'm a little confused as to how this particular work sheds light on a role for MC3R. Is the relative distribution of the two isoforms within Kiss neurons known?

      In the present study, we report that hypothalamic Mc3r expression decreases leading up to the age of puberty onset (p30), in line with the profile of expression of Mc4r and a recent publication involving Mc3r in puberty onset (Lam et al., 2021), suggesting that both receptors may be involved in the control of reproductive function, potentially through the direct regulation of Kiss1 neurons as characterized in our present study.

      L422 - While I understand the nature of this statement, the receptor may simply reflect the activity of what binds to it, i.e., AgRP vs. alpha-MSH, suggesting that maybe the prepubertal period is more AgRP-dominated.

      We agree with this statement, and this needs to be further investigated.

      L495 - Reinsertion of Mc4R in Kiss1 neurons

      Thank you for this comment. This is now corrected in the text (L501).

      L524 - Bilateral ovariectomy of 6-month

      Thank you for this comment. This is now corrected in the text (L530).

      L538 - Is it known what stage of the cycle these mice were in when samples were collected?

      Yes, the samples were collected in diestrus. This is now mentioned in the text (L548).

      L556 - Pulse amplitude is usually measured relative to the preceding nadir.

      The method that we have consistently used in our lab is the average of the 4 highest LH values in the sample collection period for each animal. We have found this to be consistent and representative of the overall amplitude (McCarthy et al., 2021; Talbi et al., 2021).
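
      As a minimal sketch of the amplitude measure described above (the mean of the 4 highest LH values per animal), assuming one list of LH concentrations per animal. The function name and the example series are hypothetical and are not taken from the study.

```python
from statistics import mean


def lh_amplitude(lh_values, n_highest=4):
    """Mean of the n highest LH values in one animal's sampling period."""
    if len(lh_values) < n_highest:
        raise ValueError("need at least n_highest samples per animal")
    # Sort in descending order and average the top n values.
    return mean(sorted(lh_values, reverse=True)[:n_highest])


# Hypothetical LH series (ng/ml) for a single animal, for illustration only.
example_series = [0.4, 0.5, 1.8, 0.9, 0.3, 2.2, 1.1, 0.6, 1.5, 0.4]

print(lh_amplitude(example_series))
```

      Unlike a nadir-referenced amplitude, this measure does not depend on identifying individual pulses, which is why it can be applied even when pulse shapes are variable.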

      L594 - This is a little confusing - the whole MBH would contain the ARH, but only the ARH was collected from the KO mice. If the whole MBH, dynorphin and Tac3, and Tac3 are expressed outside of the ARC, making interpretation of changes specifically within the ARH is difficult.

      Here (L592), we describe two different experiments, as mentioned by i) and ii).

      For experiment 1 (i): MBH was used in the WT mice at ages P10, P15, P22 and P30 to investigate the expression of the melanocortin genes (Agrp, Pomc, Mc3r and Mc4r).

      For experiment 2 (ii): In both KO and control groups, only the micro-dissected ARH was used to investigate the expression of Pdyn, Kiss1, Tac2 and Tacr3.

      Reviewer #2 (Recommendations For The Authors):

      The validation experiments for the various manipulations are currently presented in the supplementary data. Still, in my opinion, these are critically important for interpreting the data, and it should be considered to present these more comprehensively in the main body of the manuscript. In Figure S1, it seems that the exposure of the two images is not the same, with a higher background in the control. Has this image been adjusted to highlight the staining, while the other has not? It looks like there remains a low level of expression still present in at least some of the KO cells - this may reflect difficulties using RNAscope (with its extreme amplification) to detect the absence of a signal, or it could also be that the knockout is incomplete. A percentage of cells still express MC4R. I think this should be acknowledged or discussed.

      We thank the reviewer for the feedback. While we agree that the validation of the mouse model is critical, we would like to keep it in the supplemental data.

      We also agree that the exposure looks different between the KO and WT controls, and we thank the reviewer for this comment. The quality of the photograph decreased when transferring to the manuscript. This has now been improved in the revised figure.

      As for the MC4R expression in some of the KO cells, we believe that MC4R is expressed in non-Kiss1 cells, as shown in the merged figure. Therefore, we believe that the knockout of Mc4r in Kiss1 neurons is complete in these mice.

      The clear difference from the PVN's lack of effect is convincing and indicates that a specific knockout has been achieved. Is equivalent data also available for the AVPV population of cells that are examined later in the manuscript? Do those Kiss1 neurons also express the MC4R? The same question applies to the knock-in experiment: Was the expression of MC4R also driven in the AVPV population using this approach?

      Yes, Kiss1 neurons in the AVPV also express MC4R as indicated in this study, and thus Mc4r is removed/reinserted in the AVPV as well in this mouse model.

      The quantitative RT-qPCR data on developmental changes in metabolic signaling molecules are really peripheral to the paper's main question. Relative to the validation experiments (as discussed above), I think these are less important data and could be placed into a supplementary figure. The discussion of these data becomes problematic, e.g., on line 359, the changes are described as "a low melanocortin tone..." but this seems problematic when referring to reduced expression of AgRP, an inverse agonist at the MC4R. If you are going to present these data, individual data points should be shown. Similarly, the question about whether this is a PCOS-like phenotype is perhaps worth asking. Still, the simple assessment of T and AMH could also be reported in a sentence without necessarily showing the data (or placing it in a supplementary figure). Better to focus on the key question - which is the role of MC4R signaling in Kiss1 neurons.

      We understand this reviewer’s concerns. However, given the impact of MC4R signaling (particularly in the context of AgRP) on puberty, we strongly believe that the reader will benefit from the expression profile across ages, so we respectfully disagree and will keep these data in the main figure.

      Per this reviewer’s comment, we have now added individual data points to Figure 1D.

      We also agree with the reviewer that the T and AMH data are not in the main scope of the paper, but since we uncovered a PCOS-like phenotype in female mice with specific deletion of Mc4r from Kiss1 neurons, it is important to keep these data in the main figure to show that the phenotype does not fully resemble a PCOS model.

      Having praised the experimental design, I think it is fair to acknowledge that the reproductive data from these experiments remain difficult to interpret. I understand that it is difficult to illustrate estrous cycles, but the "quantitative" data on percentages of time spent in any one stage are not as informative as seeing the actual individual patterns in Figure 2B. Were all of the animals consistently like the one illustrated, with persistent diestrus and only occasional evidence of ovulation?

      We agree that Figure 2C may be difficult to interpret, but it is the best way to capture all the data points for each group.

      All 5 Kiss1MC4RKO females had persistent diestrus phases with only one or two days in estrus over 15 days (except for one female that had 4 days in estrus), compared to control females, which had 7 to 9 days in estrus, as shown in the graph (except for one female that had 5 days in estrus over the 15-day period).

      Given that LH pulses appear to be normal, does this, in fact, suggest an ovarian problem? Is that possible? Are MC4R and Kiss1 co-expressed in the ovary? Or do you think this suggests an ovulation problem, perhaps driven by the impaired LH surge?

      This reviewer is correct in that our findings suggest a central defect in ovulation based on the deficit observed in the preovulatory LH surge. Thus, it is possible to have normal LH pulses, which are driven by one population of Kiss1 neurons (ARH), together with an impaired LH surge, which is driven by a distinct population of Kiss1 neurons (AVPV).

      Similarly, the response to the "LH surge induction protocol" is impaired (why not look at endogenous LH surges?). It seems that ovulation should be an all-or-none phenomenon in that if the LH surge is sufficient to induce ovulation, then all available follicles would be ovulated. If it is not, then no follicles will be ovulated. Why fewer follicles are ovulated in the gene-targeted animals seems more likely to be due to impaired follicular development rather than a subthreshold LH surge. So, this again points back to the ovary. Or perhaps we need a more thorough assessment of the pattern of LH pulses throughout the cycles in these animals.

      An LH surge induction protocol allows us to submit all female mice to the same conditions and expect a similar response, which is optimal for comparison with animals with an expected ovulation deficit, as it eliminates external factors. We disagree that ovulation is an all-or-none phenomenon: in mice, numerous follicles mature at the same time, and thus a decrease in the number of ovulated oocytes may be significant between groups even if the animals are not completely infertile.

      Collectively, my assessment of these data is that there are effects on reproduction, but they are actually relatively subtle. There were abnormal cycles and impaired LH surge in response to exogenous estrogen. But the animals are not actually infertile, so can ovulate and express normal reproductive behavior. So while there is a role for MC4R signalling in Kiss1 neurons, it may be a contributing modulatory role rather than a major regulatory mechanism. I think the tone of the descriptions should reflect this. I like the way it is framed in some parts of the discussion ("reproductive impairments...mediated by MC4R in Kiss1 neurons and not by their obese phenotype"), but the overall significance of this is overstated in some places, such as the abstract and in other parts of the discussion ("this population is tightly controlled by melanocortins").

      As mentioned in previous responses, ovulation in mice is not all-or-nothing. So, while the mice can reproduce, the disruption of the central mechanisms that control ovulation and the irregular estrous cycles are a significant advancement in the field, with strong translational potential to species in which only one oocyte is usually ovulated. This includes humans, in whom reproductive disorders in MC4R patients have been attributed to the obesity phenotype rather than to a central action of MC4R (as the reviewer captured in their comment). This is one of the main findings of this study.

      The overstatement has now been addressed throughout the text.

      For in vitro studies, all mice were ovariectomized and given estradiol "replacement." What was the rationale for this? Wouldn't this suppress the basal activity of these neurons? Then it appears that some of the animals were studied as ovariectomised (for an unspecified time but apparently ">7 days"), without hormone replacement. The basal activity of these cells would be dramatically different. I think these artificial manipulations make these data quite difficult to interpret. How does this reflect the situation in a normal (or abnormal) estrous cycle? My understanding is that the brain slice approach already compromises the ability of this population of cells to function as a coordinated network (i.e., coordinated episodes of activity that are seen in vivo have not been observed in vitro in brain slices). Ovariectomizing and providing exogenous hormones also removes the additional regulatory elements of the cyclical changes in hormone inputs, so the cells may or may not behave like they would in vivo. Perhaps the authors could justify their choice of experimental model.

      We have clarified that the mice were ovariectomized for 7-10 days. A group of 3 mice is OVXed at once and then used on subsequent days a week later. This delay is both for the recovery of the animal and to allow for “washout” of endogenous ovarian hormones. For optogenetic studies, we were not measuring basal activity. Rather, we prioritized the ability to detect a postsynaptic response. While E2 decreases the networked activity of Kiss1<sup>ARH</sup> neurons, the Hcn channels, calcium channels, and Vglut2 expression are all increased. This leads to increased excitability and more glutamate release. Mice lack true follicular and luteal phases and therefore it is impossible to separate estrogen-mediated changes from progesterone-mediated changes (e.g., in a proestrous female). Therefore, we use an ovariectomized female model in which we can generate an LH surge with an E2-replacement regimen (Bosch et al., Mol Cell Endocrinol 2013). This model enables us to focus on estrogen effects, exclude progesterone effects, and minimize variability. Finally, we have documented that Kiss1<sup>ARH</sup> neurons retain the synchronization of their neuronal firing in the hypothalamic slice preparation (Qiu et al., eLife 2016).

      Figure 4E shows neurons' staining after expressing a Cre-dependent channel rhodopsin vector into POMC-Cre mice. The number of labelled cells looks markedly larger than expected for adult POMC neurons. Was the specificity of this approach to neurons expressing POMC checked? I understand that the POMC-Cre mice have been criticised for ectopic expression of Cre during development in other populations of neurons in the arcuate nucleus that does not express POMC, such as the AgRP neurons (e.g., PMID: 22166984). Is it possible that this is not a problem in adult animals? Has that been validated in these animals? The description of the method suggests that it is acknowledged that some of the expression driven in these animals might be in AgRP neurons. Still, optogenetic activation of these cells will include all cells expressing Cre at the time of AAV administration.

      POMC is transiently expressed during embryonic development in a portion of cells fated to be Kiss1 or NPY/AgRP neurons. Therefore, this is a valid concern when crossing with a floxed mouse. However, use of AAVs in adult animals avoids this issue and leads to specific expression in POMC neurons. This POMC-Cre mouse has been used extensively with AAVs to drive specific expression in POMC neurons by other laboratories (Padilla et al., Nat Med 2010; Lam et al., Mol Metab 2017; Stincic et al., eNeuro 2018; Fenselau et al., Nat Neuro 2017). We have previously shown that AAV-driven mCherry expression is limited to cells labeled with a beta-endorphin antibody (Stincic et al., eNeuro 2018). Therefore, we are confident that our optogenetic studies have narrowly targeted POMC inputs.

      Some additional explanation of the electrophysiology result may be required. For example, on Line 292, I'm confused by Fig 4M. Why is the response to 20Hz stimulation different in this cell (compared to the one in 4L) before administering naloxone? What proportion of cells showed this opposite response? On line 307: Is 5 cells sufficient for testing the POMC inputs onto AVPV and PeN Kiss1 neurons? How many slices/animals are included in collecting these 5 cells? The rapid action of STX illustrates the ability to modulate the response to MTII, but I am struggling to understand the implications of this in a physiological context. Suppose this response is desensitized by longer-term treatment with E2 (as indicated in the manuscript). Is it relevant to normal regulation during the cycle (particularly in the AVPV, where the key regulatory step seems to be the prolonged exposure to high estradiol as part of the preovulatory signals leading up to the LH surge)?

      As stated in the text, E2 has been shown to increase POMC expression and beta-endorphin immunostaining. We do not know the effects of E2 on αMSH expression and release. E2 also tends to attenuate the coupling between inhibitory postsynaptic metabotropic (Gi,o-coupled) receptors and signaling cascades. So, there is likely a combination of pre- and post-synaptic mechanisms contributing to these responses. However, the focus of the current studies was on the predominant melanocortin signaling and, as such, we chose to eliminate the influence of opioid signaling. We have added two more cells to this group, both of which were successfully rescued for a total of 5 of 6 cells (6 slices, 5 animals). Between the labeling of beta-endorphin fibers and the high rate of rescue, we do believe that this is sufficient evidence to support a direct POMC input to Kiss1<sup>AVPV/PeN</sup> neurons.

      Line 52: "Here, we show that Mc4r expressed in Kiss1 neurons is required for fertility in females." The knockout animals remain fertile, so this conclusion needs to be re-worded.

      Thank you for this comment. This has now been changed (L52).

      Line 80: "The melanocortin 4 receptor (MC4R) binds α-melanocyte stimulating hormone (αMSH), an agonist product of the pro-opiomelanocortin (Pomc) gene, and the inverse agonist of the agouti-related peptide (AgRP) to regulate food intake and energy expenditure" Is this the correct wording? I think it should be stated that AgRP is an inverse agonist at the MC4R, not that αMSH is the inverse agonist of AgRP. Re-work this sentence.

      Thank you for this comment. This has now been changed (L79-80).

      Line 88: "... however, conflicting reports exist". Describe what these conflicting reports show. Many MC4 variants ("mutations") are expressed in humans, but few will fully inactivate signalling like the mouse knockout.

      We thank the reviewer for this comment. By conflicting data, we refer to the studies that report no reproductive impairments in women with MC4R mutations. This may be either because the metabolic impairments (obesity, hyperphagia, hyperinsulinemia, hyperleptinemia, etc.) are so strong that the focus is skewed toward these issues, without a full reproductive assessment in these women, or simply because, as the reviewer mentioned, not all MC4R mutations fully inactivate its signaling in humans. This is in contrast to mouse models, where reproductive disruption has been described previously in full-body MC4RKOs.

      Line 91: "...that largely affects females". Is this a genuine sex difference, or are reproductive deficits simply more overt in female rodents? I think the Coss paper (reference 19 in the manuscript) showed a greater effect of diet-induced obesity in males than in females.

      We believe that sex differences exist with regards to the role of MC4R in the regulation of fertility, as we show that most of this effect is mediated by MC4R signaling in Kiss1 AVPV neurons, a neuronal population that is specific to the female brain.

      As far as we can tell, the Coss paper (Villa et al., 2024) tested only males, not females. Moreover, they investigated the effect of diet-induced obesity on fertility in mice (specifically LH secretion), while in this study we are specifically looking at the deletion of MC4R from Kiss1 neurons, and these mice were not obese (Figure 2A). While both of these conditions impair fertility, the mechanisms and signaling pathways are different (our mice lack MC4R signaling, while the obese mice have a decrease in MC4R expression but the signaling is still functional).

      Line 392: also Hessler et al. PMID: 32337804.

      This reference is now added to the text (Line 393).

      Line 433. The discussion of how advanced puberty onset (seen in the Kiss1-specific KO animals) might be caused by MC4R signalling in AVPV Kiss1 neurons, which are sexually dimorphic, which might explain sex differences in puberty timing in mammals seems extremely speculative and based on limited data. More targeted experiments would be needed to address this, and I think this speculation should be removed here.

      This speculation has now been removed from the text.

      Line 438: "Furthermore, our findings suggest that metabolic cues, through the regulation of the melanocortin output onto Kiss1AVPV/PeN neurons, are essential for the timing and magnitude of the GnRH/LH surge." Again, I think this is overstating the present data, which has only looked at an artificial hormone administration regime. The animals are fertile and, thus, must be able to mount a sufficient LH surge. The major effect, in fact, seems to be on their cycle, perhaps leading to impaired follicular development. Please acknowledge that this will be one of the multiple pathways by which metabolic information is fed into the HPG axis.

      In addition to the effect on their cycles as mentioned by the reviewer, the Kiss1MC4RKO females also display impaired fertility (Figure 2, S-T) and fewer corpora lutea, which is in line with the impaired mounting of the LH surge (Figure 2, M). Even if the LH surge is induced by the hormone administration protocol, it still reflects the natural ability of the HPG axis to mount the surge, as this regimen is only there to mimic, in a controlled manner, the endogenous hormonal changes leading to the LH surge and therefore ovulation. Nonetheless, we agree with this reviewer that this is not the sole mechanism by which metabolism regulates reproductive function, and this has been emphasized in the paper (line 443).

      Reviewer #3 (Recommendations For The Authors):

      The decreased melanocortin tone drives puberty onset (Figure 1D), and this is correlative. The transgenic animals' hypothalamic expression of Agrp, Pomc, Mc4r, and Mc3r can be measured to strengthen the claim. Hprt expression should be demonstrated, as this housekeeping gene was used as a common denominator.

      We thank the reviewer for this comment. While we agree that measuring Agrp, Pomc, Mc4r, and Mc3r gene expression in the transgenic mice would strengthen our claim and give more insight into the melanocortin tone during pubertal maturation, this is unfortunately not feasible, as it would involve generating a large number of mice at different ages throughout puberty (at least n=40 pups for an n=5/group, KO and control littermates, females only, which would require setting up many breeding pairs).

      As for the gene expression of Hprt, because we have 6 mice per age and 4 ages in total, every gene (Agrp, Pomc, Mc4r, Mc3r) was run on a separate plate with Hprt as its own housekeeping gene. Samples were run in duplicate for both Hprt and the melanocortin gene on a 96-well plate: 48 wells for Hprt and 48 wells for the melanocortin gene. Therefore, it is not possible to represent a single Hprt expression for all four genes; however, every gene was normalized to the Hprt expression that was run on the same plate.
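
      The plate arithmetic described above can be checked with a short sketch. The well counts follow directly from the numbers in the response; the normalization formula, however, is an assumption (a common 2^-ΔCt calculation), since the response states only that each gene was normalized to the Hprt run on the same plate and does not give the formula. All names are hypothetical.

```python
MICE_PER_AGE = 6   # stated in the response
N_AGES = 4         # stated in the response
REPLICATES = 2     # samples run in duplicate
PLATE_WELLS = 96


def wells_per_gene(mice_per_age=MICE_PER_AGE, n_ages=N_AGES,
                   replicates=REPLICATES):
    """Wells needed to run every sample for one gene."""
    return mice_per_age * n_ages * replicates


def relative_expression(ct_target, ct_hprt):
    """ASSUMED 2^-dCt normalization against Hprt from the same plate.

    The response does not specify the formula; this is one common choice.
    """
    return 2.0 ** -(ct_target - ct_hprt)


# One target gene plus Hprt fills a single 96-well plate.
print(wells_per_gene(), 2 * wells_per_gene() == PLATE_WELLS)
```

      Because Hprt occupies half of every plate, each target gene has its own Hprt reference values, which is why a single shared Hprt panel cannot be shown for all four genes.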

      In Figures 4 and 5, dot plots can be used (as opposed to the bar graphs) to better reflect the individual data points.

      Figures 4 and 5 have been revised to include individual data points.

      The electrophysiology experiment requires more details in the method section. In addition to the publication cited, a brief recap of the methodology used in this paper, such as the focal application of MTII (Figure 4B), is also needed.

      We have added more details to the Methods.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors observed a decline in autophagy and proteasome activity in the context of Milton knockdown. Through proteomic analysis, they identified an increase in the protein levels of eIF2β, subsequently pinpointing a novel interaction within eIF subunits where eIF2β contributes to the reduction of eIF2α phosphorylation levels. Furthermore, they demonstrated that overexpression of eIF2β suppresses autophagy and leads to diminished motor function. It was also shown that in a heterozygous mutant background of eIF2β, Milton knockdown could be rescued. This work represents a novel and significant contribution to the field, revealing for the first time that the loss of mitochondria from axons can lead to impaired autophagy function via eIF2β, potentially influencing the acceleration of aging. To further support the authors' claims, several improvements are necessary, particularly in the methods of quantification and the points that should be demonstrated quantitatively. It is crucial to investigate the correlation between aging and the proteins eIF2β and eIF2α.

      Thank you so much for your review and comments. We included analyses of protein levels of eIF2α, eIF2β, and eIF2γ at 7 days and 21 days (Figure 4D). The manuscript was revised as below:

      Lines 242-245 ‘As for the other subunits of eIF2 complex, proteome analysis did not detect a significant difference in the protein levels of eIF2α and eIF2γ between milton knockdown and control flies at 7 and 21 days (Figure 4D).’

      Reviewer #2 (Public Review):

      In the manuscript, the authors aimed to elucidate the molecular mechanism that explains neurodegeneration caused by the depletion of axonal mitochondria. In Drosophila, starting with siRNA depletion of Milton and Miro, the authors attempted to demonstrate that the depletion of axonal mitochondria induces the defect in autophagy. From proteome analyses, the authors hypothesized that autophagy is impacted by the abundance of eIF2β and the phosphorylation of eIF2α. The authors followed up the proteome analyses by testing the effects of eIF2β overexpression and depletion on autophagy. With the results from those experiments, the authors proposed a novel role of eIF2β in proteostasis that underlies neurodegeneration derived from the depletion of axonal mitochondria.

      The manuscript has several weaknesses. The reader should take extra care while reading this manuscript and when acknowledging the findings and the model in this manuscript.

      The defect in autophagy by the depletion of axonal mitochondria is one of the main claims in the paper. The authors should work more on describing their results of LC3-II/LC3-I ratio, as there are multiple ways to interpret the LC3 blotting for the autophagy assessment. Lysosomal defects result in the accumulation of LC3-II thus the LC3-II/LC3-I ratio gets higher. On the other hand, the defect in the early steps of autophagosome formation could result in a lower LC3-II/LC3-I ratio. From the results of the actual blotting, the LC3-I abundance is the source of the major difference for all conditions (Milton RNAi and eIF2β overexpression and depletion). In the text, the authors simply state the observation of their LC3 blotting. The manuscript lacks an explanation of how to evaluate the LC3-II/LC3-I ratio. Also, the manuscript lacks an elaboration on what the results of the LC3 blotting indicate about the state of autophagy by the depletion of axonal mitochondria.

      Thank you for pointing this out, and we apologize for the insufficient description of the result. We included quantitation of the levels of LC3-I and LC3-II in Figure 2A, 2D, 3D, 6B and 7B. As the reviewer pointed out, changes in the LC3-II/LC3-I ratio do not necessarily indicate autophagy defects. However, since p62 also accumulated (Figure 2B, 2E, 3E, 6C, 7C in the original manuscript), these results collectively suggest that autophagy is lowered. We revised the manuscript to include this discussion as below:

      Lines 174-186 ‘During autophagy progression, LC3 is conjugated with phosphatidylethanolamine to form LC3-II, which localizes to isolation membranes and autophagosomes. LC3-I accumulation occurs when autophagosome formation is impaired, and LC3-II accumulation is associated with lysosomal defects(31,32). p62 is an autophagy substrate, and its accumulation suggests autophagic defects(31,32). We found that milton knockdown increased LC3-I, and the LC3-II/LC3-I ratio was lower in milton knockdown flies than in control flies at 14-day-old (Figure 2A). We also analyzed p62 levels in head lysates sequentially extracted using detergents with different stringencies (1% Triton X-100 and 2% SDS). Western blotting revealed that p62 levels were increased in the brains of 14-day-old of milton knockdown flies (Figure 2B). The increase in the p62 level was significant in the Triton X-100-soluble fraction but not in the SDS-soluble fraction (Figure 2B), suggesting that depletion of axonal mitochondria impairs the degradation of less-aggregated proteins.’

      Line 189-190 : ‘At 30 day-old, LC3-I was still higher, and the LC3-II/LC3-I ratio was lower, in milton knockdown compared to the control (Figure 2D).’

      Line 199-201: ‘However, in contrast with milton knockdown, Pfk knockdown did not affect the levels of LC3-I, LC3-II or the LC3-II/LC3-I ratio (Figure 3D).’

      Line 275-281: ‘Neuronal overexpression of eIF2β increased LC3-II, while the LC3-II/LC3-I ratio was not significantly different (Figure 6A and B). Overexpression of eIF2β significantly increased the p62 level in the Triton X-100-soluble fraction (Figure 6C, 4-fold vs. control, p < 0.005 (1% Triton X-100)) but not in the SDS-soluble fraction (Figure 6C, 2-fold vs. control, p = 0.062 (2% SDS)), as observed in brains of milton knockdown flies (Figure 2B). These data suggest that neuronal overexpression of eIF2β accumulates autophagic substrates.’

      Line 307-315: ‘Neuronal knockdown of milton causes accumulation of autophagic substrate p62 in the Triton X-100-soluble fraction (Figure 2B), and we tested if lowering eIF2β ameliorates it. We found that eIF2β heterozygosity caused a mild increase in LC3-I levels and decreases in LC3-II levels, resulting in a significantly lower LC3-II/LC3-I ratio in milton knockdown flies (Figure 7B). eIF2β heterozygosity decreased the p62 level in the Triton X-100-soluble fraction in the brains of milton knockdown flies (Figure 7C). The p62 level in the SDS-soluble fraction, which is not sensitive to milton knockdown (Figure 2B), was not affected (Figure 7C). These results suggest that suppression of eIF2β ameliorates the impairment of autophagy caused by milton knockdown.’

      Another main point of the paper is that the up-regulation of eIF2β by depleting the axonal mitochondria leads to the proteostasis crisis. This claim is formed by the findings from the proteome analyses. The authors should have presented their proteomic data with a much more thorough presentation and explanation. As in the experiment scheme shown in Figure 4A, the authors did two proteome analyses: one from the 7-day-old sample and the other from the 21-day-old sample. The manuscript only shows a plot of the result from the 7-day-old sample, but not that of the result from the 21-day-old sample. For the 21-day-old sample, the authors only provided data in the supplemental table, in which the abundance ratio of eIF2β from the 21-day-old sample is 0.753, meaning eIF2β is depleted in the 21-day-old sample. The authors should have explained the impact of the eIF2β depletion in the 21-day-old sample, so the reader could fully understand the authors' interpretation of the role of eIF2β on proteostasis.

      Thank you for pointing it out. We included plots of the results of the 21-day-old proteome as a part of the main figure (Figure 4C). As the reviewer pointed out, eIF2β protein levels are reduced at 21 days old. Since a reduction in eIF2β ameliorated milton knockdown-induced locomotor defects in aged flies (Figure 7D), the reduction in eIF2β observed in the 21-day-old milton knockdown flies is not likely to negatively contribute to milton knockdown-induced defects. We included this discussion in the manuscript as below:

      Lines 337-341:‘eIF2β protein levels are reduced at the 21-day-old; however, since a reduction in the eIF2β ameliorated milton knockdown-induced locomotor defects in aged flies (Figure 7), the reduction in eIF2β observed in the 21-day-old is not likely to negatively contribute to milton knockdown-induced defects.’

      The manuscript consists of several weaknesses in its data and explanation regarding translation.

      (1) The authors are likely misunderstanding the effect of phosphorylation of eIF2α on translation. The P-eIF2α is inhibitory for translation initiation. However, the authors seem to be mistaken that the down-regulation of P-eIF2α inhibits translation.

      We are sorry for our insufficient explanation in the previous version. As the reviewer pointed out, it is well known that the phosphorylated form of eIF2α inhibits translation initiation. Neuronal knockdown of milton caused a reduction in p-eIF2α (Figure 4J and K), and it also lowered translation (Figure 5); the relationship between these two events is currently unclear. We do not think that a reduction in p-eIF2α suppressed translation; rather, we propose that an imbalance in the expression levels of the components of the eIF2 complex negatively affects translation. We revised the discussion section to describe our interpretation in more detail as below:

      Line 368-378: ‘eIF2β is a component of eIF2, which mediates translational regulation and ISR initiation. When ISR is activated, phosphorylated eIF2α suppresses global translation and induces translation of ATF4, which mediates transcription of autophagy-related genes(39,40). Since ISR can positively regulate autophagy, we suspected that suppression of ISR underlies a reduction in autophagic protein degradation. We found neuronal knockdown of milton reduced phosphorylated eIF2α, suggesting that ISR is reduced (Figure 4). However, we also found that global translation was reduced (Figure 5). It may be possible that increased levels of eIF2β disrupt the eIF2 complex or alter its functions. The stoichiometric mismatch caused by an imbalance of eIF2 components may inhibit ISR induction. Supporting this model, we found that eIF2β upregulation reduced the levels of p-eIF2α (Figure 6).’

      We have revised the graphical abstract and removed the eIF2 complex since its role in the loss of proteostasis caused by milton knockdown has not been elucidated yet.

      (2) The result of polysome profiling in Figure 4H is implausible. By 10%-25% sucrose density gradient, polysomes are not expected to be observed. The authors should have used a gradient with much denser sucrose, such as 10-50%.

      Thank you for pointing it out. This was a mistake on our part: the gradient was 10-50%, and we apologize for the oversight. It was corrected (Figure 5).

      (3) Also on the polysome profiling, as in the method section, the authors seemed to fractionate ultra-centrifuged samples from top to bottom and then measured A260 by a plate reader. In that case, the authors should have provided a line plot with individual data points, not the smoothly connected ones in the manuscript.

      Thank you for pointing it out. We revised the graph (Figure 5).

      (4) For both the results from polysome profiling and puromycin incorporation (Figure 4H and I), the differences between control siRNA and Milton siRNA are subtle, if not nonexistent. This might arise from the lack of spatial resolution in their experiment, as the authors used head lysate for these data but the ratio of Phospho-eIF2α/eIF2α only changes in the axons, based on their results in Figure 4E-G. The authors could have attempted to capture the spatial resolution for the axonal translation to see the difference between control siRNA and Milton siRNA.

      Thank you for your comment. We agree that it would be an interesting experiment, but it will take a considerable amount of time to analyze axonal translation with spatial resolution. We will try to include such analyses in the future. For this manuscript, we revised the discussion section to include the reviewer's suggestion as below:

      Lines 351-353: ‘Further analyses to dissect the effects of milton knockdown on proteostasis and translation in the cell body and axon by experiments with spatial resolution would be needed.’

      Recommendations for the authors:

      From the Reviewing Editor:

      As the Reviewing Editor, I have read your manuscript and the associated peer reviews. I have concerns about publishing this work in its current form. I think that your manuscript cannot claim to have found a novel function of eIF2beta because of technical uncertainties and conceptual problems that should be addressed.

      Thank you so much for your review and comments. We addressed all the concerns raised by the reviewers. Point-by-point responses are listed below.

      First, your manuscript is based partly on what appears to be a mistaken understanding of the mechanistic basis of the ISR. Specifically, eIF2 is a heterotrimeric complex of alpha, beta, and gamma subunits. When eIF2a is phosphorylated, the heterotrimer adopts a new conformation. This conformation directly binds and inhibits eIF2B, the decameric GEF that exchanges the GDP bound to the gamma subunit of the eIF2 complex for GTP. Unless I misunderstood your paper, you seem to propose that decreasing levels of phospho-eIF2a will inhibit translation, but this is backward from what we know about the ISR.

      Thank you for your insightful comment, and we are sorry for the confusion. We did not mean to propose that decreasing levels of phospho-eIF2α inhibit translation. We apologize for our insufficient explanation, which might have caused a misunderstanding (Lines 312-318 in the original version). We agree with the reviewer that ‘mismatch due to elevated eIF2-beta could change the behavior of the ISR’. We revised the text in the Results and Discussion sections as follows:

      Lines 259-264 (in the Result section) ‘Phosphorylation of eIF2α induces conformational changes in the eIF2 complex and inhibits global translation(36). To analyze the effects of milton knockdown on translation, we performed polysome gradient centrifugation to examine the level of ribosome binding to mRNA. Since p-eIF2α was downregulated, we hypothesized that milton knockdown would enhance translation. However, unexpectedly, we found that milton knockdown significantly reduced the level of mRNAs associated with polysomes (Figure 5A and B).’

      Lines 368-378 (in the Discussion section): ‘eIF2β is a component of eIF2, which mediates translational regulation and ISR initiation. When ISR is activated, phosphorylated eIF2α suppresses global translation and induces translation of ATF4, which mediates transcription of autophagy-related genes(39,40). Since ISR can positively regulate autophagy, we suspected that suppression of ISR underlies a reduction in autophagic protein degradation. We found neuronal knockdown of milton reduced phosphorylated eIF2α, suggesting that ISR is reduced (Figure 4). However, we also found that global translation was reduced (Figure 5). It may be possible that increased levels of eIF2β disrupt the eIF2 complex or alter its functions. The stoichiometric mismatch caused by an imbalance of eIF2 components may inhibit ISR induction. Supporting this model, we found that eIF2β upregulation reduced the levels of p-eIF2α (Figure 6).’

      It may be possible that a stoichiometric mismatch due to elevated eIF2-beta could change the behavior of the ISR, but your paper doesn't adequately address the expression levels of all three eIF2 subunits: alpha, beta, and gamma. The proteomic data shown in Fig 4B is unconvincing on its own because the changes in the beta subunit are subtle. The Western blot in Figure 4C suggests that the KD changes the mass or mobility of the beta subunit, and most importantly, there are no Western blots measuring the levels of eIF2a, eIF2a-phospho, or eIF2-gamma.

      We appreciate the reviewer’s comment and agree that the stoichiometric mismatch due to elevated eIF2β may interfere with ISR. We found that overexpression of eIF2β lowered p-eIF2α (Figure S2 in V1), which supports this model. We included this data in the main figure in the revised manuscript (Figure 6D) and revised the text as below:

      Lines 279-281: ‘Since milton knockdown reduced the p-eIF2α level (Figure 4K), we asked whether an increase in eIF2β affects p-eIF2α. Neuronal overexpression of eIF2β did not affect the eIF2α level but significantly decreased the p-eIF2α level (Figure 6D, E).’

      Expression data for eIF2α and eIF2γ were extracted from the proteome analyses and included as a table (Figure 4D). Western blots of phospho-eIF2α (Figure S1 in V1) are now included in the main figure (Figure 4G). The result section was revised as below:

      Lines 242-245: ‘As for the other subunits of eIF2 complex, proteome analysis did not detect a significant difference in the protein levels of eIF2α and eIF2γ between milton knockdown and control flies at 7 and 21 days (Figure 4D).’

      Reviewer #1 (Recommendations For The Authors):

      L125-128: In this section, while the efficiency of Milton knockdown is referenced from a previous publication, it is necessary to also mention that the Miro knockdown has been similarly reported in the literature. Additionally, the Methods section lacks details on the Miro RNAi line used, and Table 2 does not include the genotype for Miro RNAi. This information should be included for clarity and completeness.

      Thank you for pointing it out. Knockdown efficiency with this strain has been reported (Iijima-Ando et al., PLoS Genet, 2012). We revised the text to include citation and knockdown efficiency as follows:

      Lines 139-147: ‘There was no significant increase in ubiquitinated proteins in milton knockdown flies at 1-day old, suggesting that the accumulation of ubiquitinated proteins caused by milton knockdown is age-dependent (Figure S1). We also analyzed the effect of the neuronal knockdown of Miro, a partner of milton, on the accumulation of ubiquitin-positive proteins. Since severe knockdown of Miro in neurons causes lethality, we used UAS-Miro RNAi strain with low knockdown efficiency, whose expression driven by elav-GAL4 caused 30% reduction of Miro mRNA in head extract(24). Although there was a tendency for increased ubiquitin-positive puncta in Miro knockdown brains, the difference was not significant (Figure 1B, p>0.05 between control RNAi and Miro RNAi). These data suggest that the depletion of axonal mitochondria induced by milton knockdown leads to the accumulation of ubiquitinated proteins before neurodegeneration occurs.’

      L132-L136: The current phrasing in this section suggests an increase in ubiquitinated proteins for both Milton and Miro knockdowns. However, since there is no significant difference noted for Miro, it is incorrect to state an increase in ubiquitin-positive puncta. Furthermore, combining the results of Milton knockdown to claim an increase in ubiquitinated proteins prior to neurodegeneration is misleading. At the very least, the expression here needs to be moderated to accurately reflect the findings.

      Thank you for pointing it out. We revised the text as above.

      L137-L141: Results in Figure 1 indicate that Milton knockdown leads to an increase in ubiquitinated proteins at 14 days, while Miro knockdown shows no difference from the control at either 14 or 30 days. Conversely, both the control and Miro exhibit an increase in ubiquitinated proteins with aging, but this trend does not seem to apply to Milton knockdown. This observation suggests that Milton KD may not affect the changes in protein quality control associated with aging. It implies that Milton's function might be more related to protein homeostasis in younger cells, or that changes due to aging might overshadow the effects of Milton knockdown. These interpretations should be included in the Results or Discussion sections for a more comprehensive analysis.

      Thank you for your insightful comment. We revised the text to include those points as follows:

      Lines 152-153: ‘These results suggest that depletion of axonal mitochondria may have more impact on proteostasis in young neurons than in old neurons.’

      Lines 355-362: ‘The depletion of axonal mitochondria and accumulation of abnormal proteins are both characteristics of aged brains(37,38). Our results suggest that the loss of axonal mitochondria is an event upstream of proteostasis collapse during aging. Neuronal knockdown of milton had more impact on proteostasis in young neurons than the old neurons (Figure 1). Proteome analyses also showed that age-related pathways, such as immune responses, are enhanced in young flies with milton knockdown (Table 2). The reduction in axonal transport of mitochondria may be one of the triggering events of age-related changes and accelerates the onset of aging in the brain.’

      L143 : Please remove the erroneously included quotation mark.

      Thank you for pointing it out. We corrected it.

      L145-L147:

      - While it is understood that Milton knockdown results in a reduction of mitochondria in axons, as reported previously and seemingly indicated in Figure 1E, this paper repeatedly refers to axonal depletion of mitochondria. Therefore, it would be beneficial to quantitatively assess the number of mitochondria in the axonal terminals located in the lamina via electron microscopy. Such quantification would robustly reinforce the argument that mitochondrial absence in axons is a consequence of Milton knockdown.

      Thank you for pointing it out. We included quantitation of the number of mitochondria in the synaptic terminals (Figure 1E).

      The text and figure legend were revised accordingly:

      Lines 156-157: ‘As previously reported(24), the number of mitochondria in presynaptic terminals decreased in milton knockdown (Figure 1E).’

      - The knockdown of Milton is known to reduce mitochondrial transport from an early stage, but what about swelling? By observing swelling at 1 day and 14 days, it may be possible to confirm the onset of swelling and discuss its correlation with the accumulation of ubiquitinated proteins.

      Quantitation of axonal swelling has also been included (Figure 1F).

      We appreciate the reviewer’s comments on the correlation between the accumulation of ubiquitinated proteins and axonal swelling. Axonal swelling was not observed at 3 days old (Iijima-Ando et al., PLoS Genetics, 2012), indicating that axonal swelling is an age-dependent event. Dense materials are found in swollen axons more often than in normal axons, suggesting a positive correlation between disruption of proteostasis and axonal damage. It would be interesting to analyze the time course of events further; however, we feel it is beyond the scope of this manuscript. We revised the text as below to include this discussion:

      Lines 157-159: ‘The swelling of presynaptic terminals, characterized by the enlargement and roundness, was not reported at 3-day-old(24) but observed at this age with about 4% of total presynaptic terminals (Figure 1F, asterisks).’

      Lines 162-167: ‘Dense materials are rarely found in age-matched control neurons, indicating that milton knockdown induces abnormal protein accumulation in the presynaptic terminals (Figure 1G and H). In milton knockdown neurons, dense materials are found in swollen presynaptic terminals more often than in presynaptic terminals without swelling, suggesting a positive correlation between the disruption of proteostasis and axonal damage (Figure 1G).’

      Lines 362-365: ‘Disruption of proteostasis is expected to contribute to neurodegeneration(38), and it would be interesting to analyze the sequence of protein accumulation and axonal degeneration in milton knockdown ((24,29) and Figure 1) in detail with higher time resolution.’

      L147-L151: Though Figures 1F and 1G provide qualitative representations, it is advisable to quantitatively assess whether dense materials significantly accumulate. Such quantitative analysis would be required to verify the accumulation of dense materials in the context of the study.

      Thank you for pointing it out. We included quantitation of the number of neurons with dense material (Figure 1G). We revised the manuscript as follows:

      Line 161-163: ‘Dense materials are rarely found in age-matched control neurons, indicating that milton knockdown induces abnormal protein accumulation in the presynaptic terminals (Figure 1G and H).’

      Regarding Figure 1B, C:

      - Even though the count of puncta in the whole brain appears to be fewer than 400, the magnification of the optic lobe suggests a substantial presence of puncta. Please clarify in the Methods section what constitutes a puncta and whether the quantification in the whole brain is based on a 2D or 3D analysis. Detail the methodology used for quantification.

      Thank you for your comment. We revised the method section to include more details as below:

      Lines 434-437: ‘Quantitative analysis was performed using ImageJ (National Institutes of Health) with maximum projection images derived from Z-stack images acquired with the same settings. Puncta were identified with mean intensity and area using ImageJ.’
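      For readers unfamiliar with this kind of quantification, the general idea (maximum projection of a Z-stack, then counting objects that pass intensity and area criteria) can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' ImageJ procedure: the function names, the 4-connectivity rule, and the use of a per-pixel intensity threshold rather than a per-object mean intensity are all assumptions made for the example.

```python
def max_projection(z_stack):
    """Collapse a Z-stack (list of 2D lists) into one image by taking,
    for every pixel position, the maximum value across all slices."""
    return [[max(pixel_values) for pixel_values in zip(*rows)]
            for rows in zip(*z_stack)]


def count_puncta(image, min_intensity, min_area, max_area):
    """Count connected bright objects whose pixel count lies in [min_area, max_area].

    Assumption: a pixel belongs to an object if its value >= min_intensity,
    and pixels are connected through their 4 direct neighbours.
    """
    n_rows, n_cols = len(image), len(image[0])
    visited = [[False] * n_cols for _ in range(n_rows)]
    count = 0
    for r in range(n_rows):
        for c in range(n_cols):
            if visited[r][c] or image[r][c] < min_intensity:
                continue
            # Flood fill from this seed pixel to measure the object's area.
            visited[r][c] = True
            pending = [(r, c)]
            area = 0
            while pending:
                y, x = pending.pop()
                area += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < n_rows and 0 <= nx < n_cols
                            and not visited[ny][nx]
                            and image[ny][nx] >= min_intensity):
                        visited[ny][nx] = True
                        pending.append((ny, nx))
            if min_area <= area <= max_area:
                count += 1
    return count


# Toy two-slice stack (made-up values): a 2x2 bright object in slice 0
# and a single bright pixel in slice 1.
slice_0 = [[10, 10, 0, 0], [10, 10, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
slice_1 = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 10]]
projection = max_projection([slice_0, slice_1])
```

      With this toy stack, requiring an area of at least 2 pixels keeps only the 2x2 object, whereas allowing single-pixel objects counts both. This shows why the area criterion matters when reporting puncta numbers.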

      - What about 1-day-old specimens? Does Milton knockdown already show an increase in ubiquitinated protein accumulation at this early stage? Investigating whether ubiquitin-protein accumulation is involved in aging promotion or is already prevalent during developmental stages is a necessary experiment.

      Thank you for your comment. We carried out immunostaining with an anti-ubiquitin antibody in the brains at 1-day-old. No significant difference was detected between the control and milton knockdown. This result has been included as Figure S1 in the revised manuscript. The result section was revised as below:

      Line 136-139 ‘There was no significant increase in ubiquitinated proteins in milton knockdown flies at 1-day old, suggesting that the accumulation of ubiquitinated proteins caused by milton knockdown is age-dependent (Figure S1).’

      For Figure 1E: In the Electron Microscopy section of the Methods, define how swollen axons were identified and describe the quantification methodology used.

      Thank you for your comment. Swollen axons are, unlike normal axons, round in shape and enlarged. We revised the text as below:

      Lines 157-160: ‘The swelling of presynaptic terminals, characterized by the enlargement and roundness, was not reported at 3-day-old(24) but observed at this age with about 4% of total presynaptic terminals (Figure 1F, asterisks).’

      Lines 683-684, Figure 1 legend: ‘Swollen presynaptic terminals (asterisks in (F)), characterized by the enlargement and higher circularity, were found more frequently in milton knockdown neurons.’

      L218-L219: Throughout the text, the expression 'eIF2β is "upregulated" in response to Milton knockdown' is frequently used. However, considering the presented results, it might be more accurate to interpret that under the condition of Milton knockdown, eIF2β is not undergoing degradation but rather remains stable.

      Thank you for pointing it out. We replaced ‘upregulated’ with ‘increased’ throughout the text.

      L234-L235: On what basis is the conclusion drawn that there is a reduction? Given that three experiments have been conducted, it would be possible and more convincing to quantify the results to determine if there is a significant decrease.

      Thank you for pointing it out. We quantified the AUC of polysome fraction and carried out statistical analysis. There is a significant decrease in polysome in milton knockdown, and this result has been included in Figure 5B. We revised the figure and the legend accordingly.
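      As a rough illustration of what an area-under-the-curve (AUC) quantification of the polysome fractions involves, the following Python sketch applies the trapezoidal rule to A260 readings and reports the share of the total area that falls in the polysome region. It is a sketch under stated assumptions, not the authors' analysis: the function names, the unit spacing between fractions, the choice of boundary index, and the example readings are all hypothetical.

```python
def trapezoid_area(values):
    """Area under evenly spaced readings (unit spacing) by the trapezoidal rule."""
    return sum((values[i] + values[i + 1]) / 2.0 for i in range(len(values) - 1))


def polysome_share(a260, polysome_start):
    """Fraction of the total A260 area that falls in the polysome fractions.

    a260: absorbance per collected fraction, ordered top to bottom.
    polysome_start: index of the first polysome fraction (a hypothetical
    boundary that the experimenter would choose from the profile).
    """
    if not 0 <= polysome_start < len(a260) - 1:
        raise ValueError("polysome_start must leave at least two polysome readings")
    total = trapezoid_area(a260)
    if total <= 0:
        raise ValueError("total A260 area must be positive")
    return trapezoid_area(a260[polysome_start:]) / total


# Made-up readings for illustration only; not data from the manuscript.
example_profile = [0.9, 1.4, 2.0, 1.1, 0.6, 0.5, 0.4, 0.3]
share = polysome_share(example_profile, polysome_start=4)
```

      Computing one such value per biological replicate gives the numbers that a statistical test would then compare between genotypes.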

      L236: 5H-> 4H

      Thank you for pointing it out, and we are sorry for the confusion. We corrected it.

      L238-L239: Since there is no significant difference observed, it may not be accurate to interpret a reduction in puromycin incorporation.

      Thank you for pointing it out. As described above, quantification of polysome fractions showed that milton knockdown significantly reduced polysomes (Figure 5B). We revised the manuscript as below:

      Lines 263-264: ‘However, unexpectedly, we found that milton knockdown significantly reduced the level of mRNAs associated with polysomes (Figure 5A and B).’

      Figure 5D and Figure 6D: Climbing assays have been conducted, but I believe experiments should also be performed to examine whether overexpression or heterozygous mutants of eIF2β induce or suppress degeneration.

      Thank you for pointing it out. We analyzed the eyes with eIF2β overexpression for neurodegeneration. Although there was a tendency of elevated neurodegeneration in the retina with eIF2β overexpression, the difference between control and eIF2β overexpression did not reach statistical significance (Figure S2). This result has been included as Figure S2 in the revised manuscript, and the following sentences have been included in the text:

      Lines 288-293: ‘We asked if eIF2β overexpression causes neurodegeneration, as depletion of axonal mitochondria in the photoreceptor neurons causes axon degeneration in an age-dependent manner(24). eIF2β overexpression in photoreceptor neurons tends to increase neurodegeneration in aged flies, while it was not statistically significant (p>0.05, Figure S2).’

      L271-L272: The results in Figure 6B are surprising. I anticipated a greater increase compared to the Milton knockdown alone. While p62 appears to be reduced, it is not clear why these results lead to the conclusion that lowering eIF2β rescues autophagic impairment. Please add a discussion section to address this point.

      Thank you for pointing it out. We apologize for the unclear description of the result. Milton knockdown flies show p62 accumulation (Figure 2), and deleting one copy of eIF2β in the milton knockdown background reduced p62 accumulation (Figure 7C). We revised the text as below:

      Lines 307-315: ‘Neuronal knockdown of milton causes accumulation of autophagic substrate p62 in the Triton X-100-soluble fraction (Figure 2B), and we tested if lowering eIF2β ameliorates it. We found that eIF2β heterozygosity caused a mild increase in LC3-I levels and decreases in LC3-II levels, resulting in a significantly lower LC3-II/LC3-I ratio in milton knockdown flies (Figure 7B). eIF2β heterozygosity decreased the p62 level in the Triton X-100-soluble fraction in the brains of milton knockdown flies (Figure 7C). The p62 level in the SDS-soluble fraction, which is not sensitive to milton knockdown (Figure 2B), was not affected (Figure 7C). These results suggest that suppression of eIF2β ameliorates the impairment of autophagy caused by milton knockdown.’

      L369: Please specify the source of the anti-ubiquitin antibody used.

      Thank you for pointing it out. We included the antibody information in the method section.

      Figure 7: While the relationship between Milton knockdown and the eIF2β and eIF2α proteins has been elucidated through the authors' efforts, I would like to see an investigation into whether eIF2β is upregulated and eIF2α phosphorylation is reduced in simply aged Drosophila. This would help us understand the correlation between aging and eIF2 protein dynamics.

      Thank you for your comment. We agree that it is an important question, and we are working on it. However, we feel that it is beyond the scope of the current manuscript.

      L645-L646: If the mushroom body is identified using mito-GFP, then include mito-GFP in the genotype listed in Supplementary Table 2.

      We are sorry for the oversight. We corrected it in Supplementary Table 2.

      Additionally, while it is presumed that the mito-GFP signal decreases in axons with Milton RNAi, how was the lobe tips area accurately selected for analysis? Please include these details along with a comprehensive description of the quantification methodology in the Methods section.

      Thank you for your comment. Although the mito-GFP signal in the axon is weak in the milton knockdown neurons, it is sufficient to distinguish the mushroom body structure from the background. We revised the method section to include this information:

      Line 437-438: ‘For eIF2α and p-eIF2α immunostaining, the mushroom body was detected by mitoGFP expression.’

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this work, the authors present a cornucopia of data generated using deep mutational scanning (DMS) of variants in MET kinase, a protein target implicated in many different forms of cancer. The authors conducted a heroic amount of deep mutational scanning, using computational structural models to augment the interpretation of their DMS findings.

      Strengths:

      This powerful combination of computational models, experimental structures in the literature, dose-response curves, and DMS enables them to identify resistance and sensitizing mutations in the MET kinase domain, as well as consider inhibitors in the context of the clinically relevant exon-14 deletion. They then try to use the existing language model ESM1b augmented by an XGBoost regressor to identify key biophysical drivers of fitness. The authors provide an incredible study that has a treasure trove of data on a clinically relevant target that will appeal to many.

      We thank Reviewer 1 for their generous assessment of our manuscript!

      Weaknesses:

      However, the authors do not equally consider alternative possible mechanisms of resistance or sensitivity beyond the impact of mutation on binding, even though the measure used to discuss resistance and sensitivity is ultimately a resistance score derived from the increase or decrease of the presence of a variant during cell growth.

      For this resistance screen, Ba/F3 was a carefully chosen cellular selection system due to its addiction to exogenously provided IL-3, undetected expression of endogenous RTKs (including MET), and dependence on kinase transgenes to promote signaling and growth under IL-3 withdrawal. Together this allows for the readout of variants that alter kinase-driven proliferation without the caveat of bypass resistance. In our previous phenotypic screen (Estevam et al., 2024, eLife), we also carefully examined the impact of all possible MET kinase domain mutations both in the presence and absence of IL-3 withdrawal, but no inhibitors. There, we identified a small group of mutations that were associated with gain-of-function behavior located at conserved regulatory motifs outside of the catalytic site, yet these mutations were largely sensitive to inhibitors within this screen.

      Here, the majority of resistance mutations were located at or near the ATP-binding pocket, suggesting an impact on resistance through direct drug interactions. However, there was also a small population of distal mutations that met our statistical definitions of resistance. Within the crizotinib selection, sites such as T1293, L1272, and T1261, amongst others, demonstrated resistance profiles but were located in the C-lobe away from the catalytic site. While we did not experimentally validate these specific mutations, it is possible that mutations at sites that do not directly contact the drug instead promote resistance through allosteric or conformational mechanisms which preserve kinase activity and signaling. Indeed, our ML framework explicitly included conformational and stability effects as significant in improving predictions.

      We would be happy to further discuss any specific alternative resistance mechanisms Reviewer 1 has in mind! Thank you for highlighting this!

      There are also points of discussion and interpretation that rely heavily on docked models of kinase-inhibitor pairs without considering alternative binding modes or providing any validation of the docked pose. Lastly, the use of ESM1b is powerful but constrained heavily by the limited structural training data provided, which can lead to misleading interpretations without considering alternative conformations or poses.

      The majority of our interpretations are grounded in the X-ray structures of WT MET bound to the inhibitors studied (or close analogs). The use of docked models (note - to mutant structures predicted by UMol, not ESM, that can have conformational changes) is primarily in the ML part of the manuscript. Indeed, in our models, conformational and binding mode changes are taken into account as features (see Ligand RMSD, Residue RMSD). There are certainly improved methods (AF3 variants) emerging that might have even more power to model these changes, but they come with greater computational costs and are something we will be evaluating in the future.

      We added to the results section: “While our features can account for some changes in MET-mutant conformation and altered inhibitor binding pose, the prediction of these aspects can likely be improved with new methods.”

      Reviewer #2 (Public review):

      Summary:

      This manuscript provides a comprehensive overview of potential resistance mutations within MET Receptor Tyrosine Kinase and defines how specific mutations affect different inhibitors and modes of target engagement. The goal is to identify inhibitor combinations with the lowest overlap in their sensitivity to resistant mutations and determine if certain resistance mutations/mechanisms are more prevalent for specific modes of ATP-binding site engagement. To achieve this, the authors measured the ability of ~6000 single mutants of MET's kinase domain (in the context of a cytosolic TPR fusion) to drive IL-3-independent proliferation (used as a proxy for activity) of Ba/F3 cells (deep mutational profiling) in the presence of 11 different inhibitors. The authors then used co-crystal and docked structures of inhibitor-bound MET complexes to define the mechanistic basis of resistance and applied a protein language model to develop a predictive model of inhibitor sensitivity/resistance.

      Strengths:

      The major strengths of this manuscript are the comprehensive nature of the study and the rigorous methods used to measure the sensitivity of ~6000 MET mutants in a pooled format. The dataset generated will be a valuable resource for researchers interested in understanding kinase inhibitor sensitivity and, more broadly, small molecule ligand/protein interactions. The structural analyses are systematic and comprehensive, providing interesting insights into resistance mechanisms. Furthermore, the use of machine learning to define inhibitor-specific fitness landscapes is a valuable addition to the narrative. Although the ESM1b protein language model is only moderately successful in identifying the underlying mechanistic basis of resistance, the authors' attempt to integrate systematic sequence/function datasets with machine learning serves as a foundation for future efforts.

      We thank Reviewer 2 for their thoughtful assessment of our manuscript!

      Weaknesses:

      The main limitation of this study is that the authors' efforts to define general mechanisms between inhibitor classes were only moderately successful due to the challenge of uncoupling inhibitor-specific interaction effects from more general mechanisms related to the mode of ATP-binding site engagement. However, this is a minor limitation that only minimally detracts from the impressive overall scope of the study.

      We agree. We have added to the discussion: “A full landscape of mutational effects can help to predict drug response and guide small molecule design to counteract acquired resistance. The ability to define molecular mechanisms towards that goal will likely require more purposefully chosen chemical inhibitors and combinatorial mutational libraries to be maximally informative.”

      Reviewer #3 (Public review):

      Summary:

      In the manuscript 'Mapping kinase domain resistance mechanisms for the MET receptor tyrosine kinase via deep mutational scanning' by Estevam et al, deep mutational scanning is used to assess the impact of ~5,764 mutants in the MET kinase domain on the binding of 11 inhibitors. Analyses were divided by individual inhibitor and kinase inhibitor subtypes (I, II, I 1/2, and III). While a number of mutants were consistent with previous clinical reports, novel potential resistance mutants were also described. This study has implications for the development of combination therapies, namely which combination of inhibitors to avoid based on overlapping resistance mutant profiles. While one pair of inhibitors with the least overlapping resistance mutation profiles was suggested, this manuscript presents a proof of concept toward a more systematic approach for improved selection of combination therapeutics. Furthermore, in a final part of this manuscript the data was used to train a machine learning model, the ESM-1b protein language model augmented with an XGBoost Regressor framework, and the authors found that they could improve predictions of resistance mutations above the initial ESM-1b model.

      Strengths:

      Overall this paper is a tour-de-force of data collection and analysis to establish a more systematic approach for the design of combination therapies, especially in targeting MET and other kinases, a family of proteins significant to therapeutic intervention for a variety of diseases. The presentation of the work is mostly concise and clear with thousands of data points presented neatly and clearly. The discovery of novel resistance mutants for individual MET inhibitors, kinase inhibitor subtypes within the context of MET, and all resistance mutants across inhibitor subtypes for MET has clinical relevance. However, probably the most promising outcome of this paper is the proposal of the inhibitor combination of Crizotinib and Cabozantinib as Type I and Type II inhibitors, respectively, with the least overlapping resistance mutation profiles and therefore potentially the most successful combination therapy for MET. While this specific combination is not necessarily the point, it illustrates a compelling systematic approach for deciding how to proceed in developing combination therapy schedules for kinases. In an insightful final section of this paper, the authors approach using their data to train a machine learning model, perhaps understanding that performing these experiments for every kinase for every inhibitor could be prohibitive to applying this method in practice.

      We thank Reviewer 3 for their assessment of our manuscript (we are very happy to have it described as a tour-de-force!)

      Weaknesses:

      This paper presents a clear set of experiments with a compelling justification. The content of the paper is overall of high quality. The points below mostly concern clarifications in presentation.

      Two places could use more computational experiments and analysis, however. Both are presented as suggestions, but at least a discussion of these topics would improve the overall relevance of this work. In the first case it seems that while the analyses conducted on this dataset were chosen with care to be the most relevant to human health, further analyses of these results and their implications for our understanding of allosteric interactions and their effects on inhibitor binding would be a relevant addition. For example, for any given residue type found to be a resistance mutant, are there consistent amino acid mutations for which a large or small effect is found? For example, is a mutation from alanine to phenylalanine always deleterious, though one can assume the exact location of a residue matters significantly? Some of this analysis is done in dividing resistance mutants by those that are near the inhibitor binding site and those that aren't, but more of these types of analyses could help the reader understand the large amount of data presented here. A mention at least of the existing literature in this area and the lack or presence of trends would be worthwhile. For example, is there any correlation with a simpler metric like the Grantham score to predict effects of mutations (in a way the ESM-1b model is a better version of this, so this is somewhat implicitly discussed)?

      Indeed we experimented with including these types of features in the XGBoost scheme (particularly residue volume change and distance) to augment the predictive power of the ESM model - see Figure 8 - figure supplement 1; however, we didn’t find them as significant. Therefore, the signal is likely very small and/or incorporated into the baseline ESM model.

      Indeed, this discussion relates to the second point this manuscript could improve upon: the machine learning section. The main actionable item here is that this results section seems the least polished and could do a better job describing what was done. In the figure it looks like results for certain inhibitors were held out as test data - was this all mutants for a single inhibitor, or some other scheme? Overall I think the implications of this section could be fleshed out, potentially with more experiments.

      Figure 8A and the methods section contain a very detailed explanation of test data. We have thought about it and do not have any easy path to improve the description, which we reproduce here:

      “Experimental fitness scores of MET variants in the presence of DMSO and AMG458 were ignored in model training and testing since having just one set of data for a type I ½ inhibitor and DMSO leads to learning by simply memorizing the inhibitor type, without generalizability. The remaining dataset was split into training and test sets to further avoid overfitting (Figure 8A). The following data points were held out for testing - (a) all mutations in the presence of one type I (crizotinib) and one type II (glesatinib analog) inhibitor, (b) 20% of randomly chosen positions (columns) and (c) all mutations in two randomly selected amino acids (rows) (e.g. all mutations to Phe, Ser). After splitting the dataset into train and test sets, the train set was used for XGBoost hyperparameter tuning and cross-validation. For tuning the hyperparameters of each of the XGBoost models, we held out 20% of randomly sampled data points in the training set and used the remaining 80% data for Bayesian hyperparameter optimization of the models with Optuna (Akiba et al., 2019), with an objective to minimize the mean squared error between the fitness predictions on 20% held out split and the corresponding experimental fitness scores. The following hyperparameters were sampled and tuned: type of booster (booster - gbtree or dart), maximum tree depth (max_depth), number of trees (n_estimators), learning rate (eta), minimum leaf split loss (gamma), subsample ratio of columns when constructing each tree (colsample_bytree), L1 and L2 regularization terms (alpha and beta) and tree growth policy (grow_policy - depthwise or lossguide). After identifying the best combination of hyperparameters for each of the models, we performed 10-fold cross validation (with re-sampling) of the models on the full training set. The training set consists of data points corresponding to 230 positions and 18 amino acids. 
We split these into 10 parts such that each part corresponds to data from 23 positions and 2 amino acids. Then, at each of 10 iterations of cross-validation, models were trained on 9 of 10 parts (207 positions and 16 amino acids) and evaluated on the 1 held out part (23 positions and 2 amino acids). Through this protocol we ensure that we evaluate performance of the models with different subsets of positions and amino acids. The average Pearson correlation and mean squared error of the models from these 10 iterations were calculated and the best performing model out of 8192 models was chosen as the one with the highest cross-validation correlation. The final XGBoost models were obtained by training on the full training set and also used to obtain the fitness score predictions for the validation and test sets. These predictions were used to calculate the inhibitor-wise correlations shown in Figure 8B.”
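
      To make the partition concrete, the position-by-amino-acid cross-validation split described in the quoted methods can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the code used in the study: the function names, the integer position indices, the random seed, and the choice of Phe and Ser as the two test-set amino acids are all hypothetical, and amino acids are drawn independently for each fold to mimic the "with re-sampling" wording.

```python
import random


def make_block_folds(positions, amino_acids, n_folds=10, aa_per_fold=2, seed=0):
    """Return n_folds pairs of (held-out positions, held-out amino acids).

    Positions are partitioned into equal disjoint blocks; amino acids are
    re-sampled for every fold (an assumption about "with re-sampling").
    """
    rng = random.Random(seed)
    shuffled = list(positions)
    rng.shuffle(shuffled)
    block = len(shuffled) // n_folds
    folds = []
    for k in range(n_folds):
        held_positions = set(shuffled[k * block:(k + 1) * block])
        held_amino_acids = set(rng.sample(list(amino_acids), aa_per_fold))
        folds.append((held_positions, held_amino_acids))
    return folds


def split_variants(variants, held_positions, held_amino_acids):
    """Split (position, amino_acid) pairs into train and evaluation sets.

    A variant is used for training only if neither its position nor its
    amino acid is held out, and for evaluation only if both are held out.
    Variants with exactly one held-out coordinate are left unused.
    """
    train = [(p, a) for p, a in variants
             if p not in held_positions and a not in held_amino_acids]
    evaluation = [(p, a) for p, a in variants
                  if p in held_positions and a in held_amino_acids]
    return train, evaluation


# Hypothetical training grid: 230 positions x 18 amino acids
# (20 standard residues minus two assumed test-set residues, F and S).
positions = range(230)
amino_acids = list("ACDEGHIKLMNPQRTVWY")
variants = [(p, a) for p in positions for a in amino_acids]

folds = make_block_folds(positions, amino_acids)
train, evaluation = split_variants(variants, *folds[0])
print(len(train), len(evaluation))
```

      With these sizes, each fold trains on 207 positions and 16 amino acids and evaluates on 23 positions and 2 amino acids, matching the numbers given in the methods text.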

      As mentioned in the 'Strengths' section, one of the appealing aspects of this paper is indeed its potential wide applicability across kinases -- could you use this ML model to predict resistance mutants for an entirely different kinase? This doesn't seem far-fetched, and would be an extremely compelling addition to this paper to prove the value of this approach.

      This is exactly where we want to go next! But as we see here, it is going to be hard and require more purposeful selection of chemicals and likely combinatorial mutations to be maximally informative (see also reviewer 2 response where we have added text)

      Another area in which this paper could improve its clarity is in the description of caveats of the assay. The exact math used to define resistance mutants and its dependence on the DMSO control is interesting, it is worth discussing where the failure modes of this procedure might be. Could it be that the resistance mutants identified in this assay would differ significantly from those found in patients? That results here are consistent with those seen in the clinic is promising, but discrepancies could remain.

      Thank you for pointing this out. The greatest trade-off of probing the intracellular MET kinase (juxtamembrane, kinase domain, c-tail) in the constitutively active TPR system is that while we gain cytoplasmic expression, constitutive oligomerization, and HGF-independent activation, other features like membrane-proximal effects are lost and translatability of some mutations in non-proliferative conditions may also be limited. Nevertheless, the Ba/F3 system allows IL-3 withdrawal to serve as an effective readout of transgenic kinase variant effects, because Ba/F3 cells have undetectable expression of endogenous RTKs and are addicted to exogenous interleukin-3 (IL-3).

      In our previous study, we were also interested in comparing the phenotypic results to available patient populations in cBioPortal. We observed that our DMS captured known oncogenic MET kinase variants, in addition to a population of gain-of-function variants within clinical residue positions that have not been clinically reported. Interestingly, the population of possible novel gain-of-function mutant codons were more distant in genetic space (2-3 Hamming distance) from wild type than the clinically reported variant codon (1-2 Hamming distance).
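
      For readers less familiar with the metric, the codon Hamming distance is the number of nucleotide positions at which two codons differ, so a distance of 1 corresponds to a single-nucleotide substitution. A minimal sketch follows; the function name and the example codons are illustrative and are not taken from our dataset.

```python
def codon_hamming(codon_a, codon_b):
    """Count the nucleotide positions at which two codons differ."""
    if len(codon_a) != 3 or len(codon_b) != 3:
        raise ValueError("codons must be exactly three nucleotides long")
    return sum(x != y for x, y in zip(codon_a.upper(), codon_b.upper()))


# Illustrative codons only.
print(codon_hamming("CTG", "CTC"))  # one substitution
print(codon_hamming("CTG", "GTA"))  # two substitutions
print(codon_hamming("CTG", "GAC"))  # three substitutions
```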

      For this inhibitor screen, we also carefully compared previously reported and validated resistance mutations across referenced publications to those of our inhibitor screen, and observed large agreement as noted in-text. While discrepancies could definitely remain, there is precedent for consistency.

      Furthermore a more in depth discussion of the MetdelEx14 results is warranted. For example, why is the DMSO signature in Figure 1 - supplement 4 so different from that of Figure 1?

      In our previous study (Estevam et al., 2024), we more directly compared MET and METΔExon14, and while we observed several differences, especially at conserved regulatory motifs, the TPR expression system did not provide a robust differential. Therefore, we hypothesize that a membrane-bound context is likely necessary to obtain a differential that captures juxtamembrane regulatory effects for these two isoforms. For that reason, we did not place heavy emphasis on the differences between MET and METΔExon14 in this study. Nevertheless, we performed parallel analysis of the METΔExon14 inhibitor DMS and provided all source and analyzed data in our GitHub repository (https://github.com/fraser-lab/MET_kinase_Inhibitor_DMS).

      In our analysis of resistance, we used Rosace to score and compare DMSO and inhibitor landscapes. We present the full distribution of raw scores in Figure 1 for each condition. However, to visually highlight resistance mutations as a heatmap, we subtracted the scores of each variant in each inhibitor condition from the raw DMSO score, making the heatmaps in Figure 1 - supplement 4 appear more “blue.”

      And finally, there is a lot of emphasis put on the unexpected results of this assay for the tivantinib "type III" inhibitor - could this in fact be because the molecule "is highly selective for the inactive or unphosphorylated form of c-Met" according to Eathiraj et al JBC 2011?

      The work presented by Eathiraj et al JBC 2011 is a key study we reference and is foundational to tivantinib. While the point brought up about tivantinib’s selective preference for an inactive conformation is valid, this is also true for type II kinase inhibitors. In our study, regardless of inhibitor conformational preference, tivantinib was the only one with a nearly identical landscape to DMSO and exhibited selection even in the absence of Ba/F3 MET-addiction (Figure 1E). This result is in closer agreement with the MET-agnostic behavior reported by Basilico et al., 2013 and Katayama et al., 2013.

      While this paper is crisply written with beautiful figures, the complexity of the data warrants a bit more clarity in how the results are visualized. Namely, clearly highlighting mutants that have previously reported and those identified by this study across all figures could help significantly in understanding the more novel findings of the work.

      To better compare and contrast novel mutations identified in this study to others, we compiled a list of reported resistance mutations from recent clinical and experimental studies (Pecci et al., 2024; Yao et al., 2023; Bahcall et al., 2022; Recondo et al., 2020; Rotow et al., 2020; Fujino et al., 2019), since a direct database with resistance annotations does not exist for MET, to the best of our knowledge. In total, this amounted to 31 annotated resistance mutations across crizotinib, capmatinib, tepotinib, savolitinib, cabozantinib, merestinib, and glesatinib, which we have now tabulated in a new figure (Figure 4) and commentary in the main text:

      To assess the agreement between our DMS and previously annotated resistance mutations, we compiled a list of reported resistance mutations from recent clinical and experimental studies (Pecci et al., 2024; Yao et al., 2023; Bahcall et al., 2022; Recondo et al., 2020; Rotow et al., 2020; Fujino et al., 2019) (Figure 4A,B). Overall, previously discovered mutations are strongly shifted to a GOF distribution for the drugs where resistance is reported from treatment or experiment; in contrast, the distribution is centered around neutral for those sites for other drugs not reported in the literature (Figure 4C). However, even in cases such as L1195V, we observe GOF DMS scores indicative of resistance to previously reported inhibitors. Given this overall strong concordance with prior literature and clinical results, we can also provide hypotheses to clarify the role of mutations that are observed in combination with others. For example, H1094Y is a reported driver mutation that has been linked to resistance in METΔEx14 for glesatinib with either the secondary L1195V mutation or in isolation (Recondo et al., 2020). However, in our assay H1094Y demonstrated slight sensitivity to glesatinib, suggesting that resistance is linked to the exon 14 deletion isoform, the L1195V mutation, or a cellular factor not modeled well by the Ba/F3 system.

      Finally, the potential impacts and follow-ups of this excellent study could be communicated better - it is recommended that they advertise better this paper as a resource for the community both as a dataset and as a proof of concept. In this realm I would encourage the authors to emphasize the multiple potential uses of this dataset by others to provide answers and insights on a variety of problems.

      Please see below

      Related to this, the decision to include the MetdelEx14 results, but not discuss them at all is interesting, do the authors expect future analyses to lead to useful insights? Is it surprising that trends are broadly the same to the data discussed?

      Our previous paper suggests that Ba/F3 isn’t a great model for measuring the differences between MET and METΔEx14, so we haven’t emphasized these results other than to point to our previous paper. We include the full analysis here nonetheless as a resource. The greatest differences between resistance mutant behaviors would potentially be observed in the full-length, membrane-bound MET and METΔEx14 receptor isoforms. While outside of the scope of this study, there is great potential to use the resistance mutations identified in this study as a filtered group to test and map differential inhibitor sensitivities between receptor isoforms.

      And finally it could be valuable to have a small addition of introspection from the authors on how this approach could be altered and/or improved in the future to facilitate the general application of this approach for combination therapies for other targets.

      See also reviewer 2 response where we have added text.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major points of revision:

      (1) It seems like much of the structural interpretation of the inhibitor binding mode, outside of crizotinib binding, appears to come from docked models of the inhibitor to the MET kinase domain. Given the potential variability of the docked structure to the kinase domain, it would be useful for the authors to consider alternative possible binding modes that their docking pipeline may have suggested. It could also be useful to provide some degree of validation or contextualization of their docking models.

      All individual figures are very carefully inspected based on either existing crystal structures of the inhibitor or closely related inhibitors (ATP, 3DKC; crizotinib, 2WGJ; tepotinib, 4R1V; tivantinib, 3RHK; AMG-458, 5T3Q; NVP-BVU972, 3QTI; merestinib, 4EEV; savolitinib, 6SDE). In total, four structural interpretations were the result of docking onto reference experimental structures (capmatinib, cabozantinib, glumetinib, glesatinib). As we wrote above, different conformations and binding modes are possible in predicted mutant structures (as we did here at scale) and included in the ML analysis already.

      (2) In the first section, the authors classify an inhibitor as Type Ia on docking models, but mention the conflicting literature describing it as type Ib - it would be helpful to provide a contextualization of why this distinction between Ia and Ib matters, and what difference it might make. It would also be useful to know if their docking score only suggested poses compatible with Ia or if other poses were provided as well. Validation using other method might be beneficial, especially since they acknowledge the conflicting literature for classification. Or at least recontextualization that more evidence would be needed.

      Kinase inhibitors have several canonical structural definitions, on which we base the classifications in this study. Specifically, type I inhibitors are classified in MET by interactions with Y1230, D1228, and K1110, in addition to their conformation in the ATP-binding site. Type I inhibitors are further subdivided into type Ia in MET if they leverage interactions with the solvent front and residue G1163. In prior literature referenced, tepotinib was classified as type Ib, which would imply it does not have solvent front interactions, like savolitinib (PDB 6SDE) or NVP-BVU972 (PDB 3QTI). However, in the tepotinib experimental structure (PDB 4R1V), we observed a greater structural resemblance to type Ia inhibitors than to type Ib inhibitors (Figure 1 - figure supplement 1b).

      (3) The measure used to discuss resistance and sensitivity is ultimately a resistance score derived from the increase or decrease of the presence of a variant during cell growth. This is not a measure of direct binding. It would be helpful if the authors discussed alternative mechanisms through which these variants may impact resistance and/or sensitivity, such as stability, protonation effects, or kinase activity. The score itself may be convolving over all these potential mechanisms to drive GOF and LOF observed behavior.

      See the response to the public review. Indeed, our ML framework explicitly included conformational and stability effects as significant in improving predictions.

      (4) While it is promising to try and improve the predictive properties of ESM1b, it is not exactly clear why the authors considered their structural data of 11 inhibitors a sufficient dataset with which to augment the model. It would be useful for the authors to provide some additional context for why they wished to augment ESM1b in particular with their dataset, and provide any metrics indicating that their training data of 11 inhibitors provided an adequate statistical sample.

      We don’t understand what this means. Sorry!

      (5) The authors use ESM-1b to predict the fitness impact of each mutation and augment it using protein structural data of drug-target interactions. However, using an XGBoost regressor on a single set of 11 kinase-inhibitor interaction pairs is an incredibly sparse dataset to train upon. It would be useful for the authors to consider the limitations of their model, as well as its extensibility in the context of alternate binding poses, alternate conformations, or changes in protonation states of ligand or inhibitor.

      On the contrary - this is 11 chemicals across 3000 mutations. We have discussed alternative interpretations above.

      Minor points:

      (1) It would also be useful for the authors to provide more context around their choice of regressor. XGBoost is a powerful regressor but can easily overfit high dimensional data when paired with language models such as ESM-1b. This would be particularly useful since some of the features to train on were also generated using existing models such as ThermoMPNN.

      Yes - we are quite concerned about overfitting and have tried to assess overfitting by careful design of test and validation sets.

      (2) The authors also mention excluding their DMSO and AMG458 scores in the model training and testing due to overfitting issues - it would be useful to have an SI figure pointing to this data.

      No - we exclude the DMSO because that is the reference (baseline) and AMG because it has a different binding mode. This isn’t related to overfitting.

      (3) The authors mention in their docking pipeline that 5 binding modes were used for each ligand docking, but it appears that only one binding mode is considered in the main figures. It would be useful for the authors to provide additional details about what were the other binding modes used for, how different were each binding mode, and how was the "primary" mode selected (and how much better was its score than the others).

      The reviewer misinterprets the difference between poses shown in figures, based on mostly crystal structures or carefully selected templates, and the use of docked models in feature engineering for the ML part of the study. Where existing crystal structures do not exist, we performed docking for capmatinib, cabozantinib, glumetinib, glesatinib onto reference structures bound to type I (2WGJ) and type II (4EEV) inhibitors. We selected one representative binding mode based on the reference inhibitor, and while not exact, at a minimum these models provide a basis for structural interpretation.

      Reviewer #2 (Recommendations for the authors):

      My main suggestion is for the authors to add a few sentences (in non-technical language) to the results section, specifically before the results shown in Figure 3, defining gain-of-function, loss-of-function, resistance, and sensitivity. While these definitions are present in the materials and methods section, explicitly discussing them prior to the relevant results would significantly improve the overall readability of the manuscript.

      We defined “gain-of-function” and “loss-of-function” mutations as those with fitness scores statistically greater or lower than wild-type, respectively. Within the DMSO condition, gain-of-function and loss-of-function labels describe mutational perturbation to protein function, whereas within inhibitor conditions, the labels describe the difference in fitness introduced by an inhibitor.

      We have also clarified these definitions where the terms are first introduced: “As expected, the DMSO control population displayed a bimodal distribution with mutations exhibiting wild-type fitness centered around 0, with a wider distribution of mutations that exhibited loss- or gain-of-function effects, as defined by fitness scores with statistically significant lower or greater scores than wild-type, respectively.”

      Figure 7D. Please add a bit more detail to the legend on how fold change (y-axis) was calculated.

      Here, fold change represents the number of viable cells at each inhibitor concentration relative to the TKI-free control, measured with the CellTiter-Glo® Luminescent Cell Viability Assay (Promega) as an end point readout. We have updated the legend of Figure 7D with calculation details: “Dose-response for each inhibitor concentration is represented as the fraction of viable cells relative to the TKI-free control.”
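
      As a worked illustration of this normalization, the calculation amounts to dividing each end point signal by the signal of the inhibitor-free control. The sketch below uses made-up luminescence values and a hypothetical function name; it is not the analysis script used for Figure 7D.

```python
def fraction_viable(signals, control_signal):
    """Express each end point signal as a fraction of the inhibitor-free control."""
    if control_signal <= 0:
        raise ValueError("control signal must be positive")
    return [signal / control_signal for signal in signals]


# Made-up luminescence readings at increasing inhibitor concentration.
readings = [1000.0, 500.0, 250.0]
fractions = fraction_viable(readings, control_signal=1000.0)
print(fractions)
```

      A value of 1.0 means viability equal to the untreated control, and values below 1.0 indicate growth inhibition at that concentration.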

      I must admit, I did not understand what "Specific inhibitor fitness landscapes also aid in identifying mutations with potential drug sensitivity, such as R1086 and C1091 in the MET P-loop" means. These are positions where most mutations lead to greater sensitivity to crizotinib. Is the idea that there are potentially clinically-relevant MET mutations that can be targeted over wild type with crizotinib?

      Thank you for highlighting this! The P-loop (phosphate-binding loop) is a glycine-rich structural motif conserved in kinase domains. This motif is located in the N-lobe, where its primary role is to gate ATP entry into the active site and stabilize the phosphate groups of ATP when bound. Therefore, the P-loop is a common target region for ATP-competitive inhibitor design, but also a site where resistance can emerge (Roumiantsev et al., 2002). The idea we’d like to convey is that identifying residues that offer the potential for drug stabilization, with the added benefit of a lower risk of resistance, is an attractive consideration for novel inhibitor design.

      We have added to the text: “Individual inhibitor resistance landscapes also aid in identifying target residues for novel drug design by providing insights into mutability and known resistance cases. This enables the selection of vectors for chemical elaboration with potential lower risk of resistance development. Sites with mutational profiles such as R1086 and C1091, located in the common drug target P-loop of MET, could be likely candidates for crizotinib.”

      Reviewer #3 (Recommendations for the authors):

      (1) Suggested Improvements to the Figures:

      a)  Figure 4A - T1261 seems to be mislabeled

      b)  In Figure 3A it's suggested to highlight mutants determined to be resistance mutants by this scheme.

      c)  In Figure 3D it would be informative to highlight which of these resistance mutants have already been previously reported and which are novel to this study

      d)  Throughout figures 3A, 3D, and 4G the graphical choices on how to highlight synonymous mutations and mutations not performed in the assay needs improvement.

      The Green vs Grey 'TRUE' vs 'FALSE' boxes are confusing. Just a green box indicating synonymous mutations would be sufficient. Additionally these green boxes are hard to see, and often edges of this green box are currently missing making it even more difficult to see and interpret.

      * In Figure 4A mutants do not seem to be indicated by a line or plus sign, but this is not explained in the legend or the caption. Please add.

      * In 3D and 4G it is not clear if the mutants not performed are indicated at all - perhaps they are indicated in white, making them indistinguishable from scores with 0. Please clarify.

      T1261 and G1242 are now correctly labeled.

      In text we have also highlighted reported resistance mutations for crizotinib, which are inclusive of clinical reports and in vitro characterization: “These sites, and many of the individual mutations, have been noted in prior reports, such as: D1228N/H/V/Y, Y1230C/H/N/S, G1163R.”

      We have adjusted the heatmaps to improve visual clarity. Mutations with score 0 are white, as indicated by the scale bar, and mutations uncaptured by the screen are now in light yellow. The green outline distinguishing WT synonymous mutations has also been adjusted so edges are no longer cut off. In our representations, we only distinguished mutations by the score color scale bar and WT outline. What looked like a “plus” or “line” in the original figure was only the heatmap background, which now should be resolved in the updated figure and legends for Figure 3 and Figure 4.

      (2) Some Minor Suggested Improvements to the Text:

      a)  The abbreviation CBL for 'CBL docking site' is used without being defined.

      b)  Figure 3G is referenced, but it does not exist.

      c)  In the sentence 'Beyond these well characterized sites, regions with sensitivity occurred throughout the kinase, primarily in loop-regions which have the greatest mutational tolerance in DMSO, but do not provide a growth advantage in the presence of an inhibitor (Figure 1 - Figure Supplement 1; Figure 1 - Figure Supplement 2).'. It is not clear why these supplemental figures are being referenced.

      d)  In the supplement section 'Enrich2 Scoring' has what seem like placeholders for citations in [brackets]

      Cbl is an E3 ubiquitin ligase that plays a role in MET regulation through engagement with exon 14, specifically at Y1003 when phosphorylated. This mode of regulation was highlighted more extensively in our previous study. However, since Cbl was only mentioned briefly in this study, we have removed reference to it to simplify the text.

      In addition, we have removed the figure 3G reference and corrected the in-text range. We have also removed references to figure supplements where unnecessary and edited the “Enrich2 scoring” method section to now reference missing citations.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary: 

      In organisms with open mitosis, nuclear envelope breakdown at mitotic entry and re‐assembly of the nuclear envelope at the end of mitosis are important, highly regulated processes. One key regulator of nuclear envelope re‐assembly is the BAF (Barrier‐to‐Autointegration) protein, which contributes to cross‐linking of chromosomes to the nuclear envelope. Crucially, BAF has to be in a dephosphorylated form to carry out this function, and PP2A has been shown to be the phosphatase that dephosphorylates BAF. The Ankle2/LEM4 protein has previously been identified as an important regulator of PP2A in the dephosphorylation of BAF but its precise function is not fully understood, and Li and colleagues set out to investigate the function of Ankle2/LEM4 in both Drosophila flies and Drosophila cell lines.

      Strengths: 

      The authors use a combination of biochemical and imaging techniques to understand the biology of Ankle2/LEM4. On the whole, the experiments are well conducted and the results look convincing. A particular strength of this manuscript is that the authors are able to study both cellular phenotypes and organismal effects of their mutants by studying both Drosophila D‐mel cells and whole flies.

      The work presented in this manuscript significantly enhances our understanding of how Ankle2/LEM4 supports BAF dephosphorylation at the end of mitosis. Particularly interesting is the finding that Ankle2/LEM4 appears to be a bona fide PP2A regulatory protein in Drosophila, as well as the localisation of Ankle2/LEM4 and how this is influenced by the interaction between Ankle2 and the ER protein Vap33. It would be interesting to see, though, whether these insights are conserved in mammalian cells, e.g. does mammalian Vap33 also interact with LEM4? Is LEM4 also a part of the PP2A holoenzyme complex in mammalian cells? 

      We feel that conducting experiments to test the level of conservation of our findings in mammalian cells is outside the scope of our study, and we will leave it for other labs to investigate.

      Weaknesses: 

      This work is certainly impactful but more discussion and comparison of the Drosophila versus mammalian cell system would be helpful. Also, to attract the largest possible readership, the Ankle2 protein should be referred to as Ankle2/LEM4 throughout the paper to make it clear that this is the same molecule. 

      We have reinforced our presentation and discussion of similarities and differences between Ankle2 from Drosophila vs humans where relevant throughout the Introduction and Discussion sections. Additionally, we have added the mention that Ankle2 is also called LEM4 in humans in the Abstract and Introduction. However, when referring to Drosophila Ankle2, we do not use LEM4 because it is not listed as an alternate name for this gene/protein in FlyBase.

      A schematic model at the end of the final figure would be very useful to summarise the findings.

      We have already provided a schematic model in Figure S3, where we think it is better placed.

      Reviewer #2 (Public review):

      The authors first identify Ankle2 as a regulatory subunit and direct interactor of PP2A, showing they interact both in vitro and in vivo to promote BAF dephosphorylation. The Ankyrin domain of Ankle2 is important for the interaction with PP2A. They then show Ankle2 also interacts with the ER protein Vap33 through FFAT motifs and they particularly co‐localize during mitosis. The recruitment of Ankle2 to Vap33 is essential to ER and nuclear envelope membrane in telophase, while earlier in mitosis it relies on the C terminus but not the FFAT motifs for recruitment to the nuclear membrane and spindle envelope. The molecular determinants and receptors are currently not known. The authors check the function of the PP2A recruitment to Ankle2/Vap33 in the context of embryos and show this recruitment pathway is functionally important. While the Ankle2/Vap33 interaction is dispensable in adult flies (looking at wing development), the PP2A/Ankle2 interaction is essential for correct wing and fly development. Overall, this is a very complete paper that reveals the molecular mechanism of PP2A recruitment to Ankle2 and studies both the cellular and the physiological effect of this interaction in the context of fly development.

      Strengths: 

      The paper is well written and the narrative is well‐developed. The figures are of high quality, well‐controlled, clearly labelled, and easy to understand. They support the claims made by the authors. 

      Weaknesses: 

      The study would benefit from being discussed in the context of what is already known on Ankle2 biology in C. elegans and human cells. It is important to highlight that the structures shown in the paper are AlphaFold models, rather than validated structures. 

      We have enhanced our presentation of what is known about LEM‐4L/Ankle2 in C. elegans and humans in the Introduction, and further developed comparisons of our findings regarding Drosophila Ankle2 with these orthologs in the Results and Discussion sections. We have also specified in all sections and figure legends that the structures shown are AlphaFold3 models.

      Reviewer #3 (Public review): 

      Summary: 

      The authors were interested in how Ankle2 regulates nuclear envelope reformation after cell division. Other published manuscripts, including those from the authors, show without a doubt that Ankle2 plays a role in this critical process. However, the mechanism by which Ankle2 functions was unclear. Previous work using worms and humans (Asencio et al., 2012) established that human ANKLE2 could bind endogenous PP2A subunits. The binding was direct and was mediated through a region before and including the first ankyrin repeat in human ANKLE2. In addition to its interaction with PP2A, Asencio et al., 2012 also show that ANKLE2 regulates VRK1 kinase activity. Together PP2A and VRK1 regulate BAF phosphorylation for proper nuclear envelope reformation. Here, the authors provide more evidence for interaction with PP2A by also mapping the domain of interaction to the ankyrin repeat in Drosophila. In addition, the ankyrin repeat is essential for nuclear envelope reformation after division. They show that Ankle2 can bind in a PP2A complex without other known regulatory subunits of PP2A. The authors also identify a novel interaction with ER protein Vap33, but functional relevance for this interaction in nuclear envelope reformation is not provided in the manuscript, which the authors explicitly state. This manuscript does not comment on the activity of Ballchen/VRK1 in relation to Ankle2 loss and BAF phosphorylation or nuclear envelope reformation, even though links were previously shown by multiple studies (Asencio et al., Link et al., Apridita Sebastian et al.,). Nuclear envelope defects were rescued by the reduction of VRK1 in two of these manuscripts. It is possible that BAF phosphorylation phenotypes can be contributed by both PP2A inactivity and VRK1 overactivity due to the loss of Ankle2.

      Strengths: 

      This manuscript presents a useful finding linking Ankle2 function during nuclear envelope reformation to the PP2A complex. The authors present solid data showing that Ankle2 can form a complex with PP2A‐29B and Mts and generate a phosphoproteomic resource that is fundamentally important to understanding Ankle2 biology. 

      Weaknesses: 

      However, the main findings/conclusions about subcellular localization might be incomplete since they are drawn from overexpression experiments. In addition, throughout the text, some conclusions are overstated or are not supported by data. 

      It is true that all experiments studying subcellular localization were done with tagged proteins overexpressed in flies and cell culture. Nevertheless, we show that Ankle2‐GFP is functional since it rescues phenotypes resulting from the loss of endogenous Ankle2 in both flies and cultured cells. The antibodies we generated against Ankle2 were unable to reliably detect the endogenous protein by immunofluorescence. We have now stated this caveat in our manuscript. Regarding the validity of our conclusions in relation to our data, we address each point raised by the reviewer under the Recommendations for the authors. In some cases, we have adjusted our conclusions and in other cases, we have provided additional clarification or justification. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      There are a few experimental issues that should be addressed, specific comments are listed below: 

      (1) Figure 1F: In this experiment, the authors immunoprecipitate GFP‐PP2A‐29B or PP2A‐29B‐GFP and Western blot for Ankle2 and Mts to demonstrate that both are co‐immunoprecipitated. To demonstrate that these interactions are specific, the authors should also blot for a protein that is expected to definitely NOT co‐immunoprecipitate with PP2A‐29B; e.g. tubulin. 

      Our conclusion that GFP‐PP2A‐29B and PP2A‐29B‐GFP specifically interact with Ankle2 and Mts is also based on mass spectrometry analysis of the purification products from embryos and cells in culture, comparing with products of purification of GFP alone (Fig 1E‐F, S1C‐D and Tables S2, S3). The lists of identified proteins reveal that most proteins (including tubulins) are not enriched with GFP‐PP2A‐29B or PP2A‐29B‐GFP like Ankle2 and Mts are.

      (2) Figure 2A: The colour coding of the dots is not explained in the figure legend. 

      We have now added the explanation.

      (3) Figure 2B: The competition experiment is a good idea. Do the authors get the same results when they conduct the experiment the other way round, i.e. keep the concentration of Tws the same but increase the concentration of Ankle2? 

      We have tried this reverse experiment but saw little effect. The failure to observe displacement of Tws by Ankle2 in this context could be due to a higher affinity of Tws than Ankle2 in the PP2A complex, or to lower expression levels achieved for Ankle2 (a larger protein) relative to Tws.

      (4) Figure 5D: The hyperphosphorylation of BAF is very difficult to see, and it is impossible to tell whether the hyperphosphorylation has been rescued or not by the different Ankle2 constructs. Can the phosphorylated and the hyperphosphorylated bands be separated better? This panel needs significant improvements to support the claims in the text.

      In our opinion, the hyperphosphorylated (upper band) and unphosphorylated (lower band) forms of BAF are well resolved and readily distinguishable. The fainter band in the middle could correspond to a partially phosphorylated form of BAF but we do not venture to speculate on its precise identity nor do we need it to draw our conclusions. The important information from this blot is that the level of unphosphorylated BAF after Ankle2 RNAi increases when Ankle2WT‐GFP and Ankle2Fm+FL1‐GFP are expressed but not when Flag‐GFP or Ankle2ANK‐GFP are expressed. In these experiments, the rescue of unphosphorylated BAF is incomplete because not all cells express the GFP‐tagged protein in our non‐clonal stable cell lines.

      Reviewer #2 (Recommendations for the authors):

      (1) The alphafold models need to be labelled as such better on the figures, to distinguish them from X‐ray crystallography structures. Alphafold will always propose a solution but it is not necessarily correct. 

      We have added the note “MODEL” directly in Figures 2C, 2D, 4F and S3B, in addition to the information already provided in the text and figure legends specifying that these are models generated by AlphaFold3.

      (2) Figure 4 F. Annotate the Ankle2 FL1 peptide. 

      We have indicated the amino acid residues in the figure.

      (3) Problems with the statistical tests. T‐tests cannot be used for comparing multiple groups, as this favors error propagation. 

      All of our t‐tests compare only two groups at a time, as indicated. In this regard, our labeling in Fig 5C may have been misleading. We have now changed it.
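      For context on the reviewer's concern, the inflation of false positives across repeated pairwise tests can be worked out directly. The following is a minimal sketch, not part of the manuscript's analysis, assuming the tests are independent and share one significance threshold; the function names and the example values (threshold 0.05, six comparisons) are hypothetical.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive among n_tests
    independent tests, each run at significance threshold alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise rate at or below alpha."""
    return alpha / n_tests


# Hypothetical example: six pairwise comparisons, each at alpha = 0.05.
uncorrected = familywise_error_rate(0.05, 6)
corrected = familywise_error_rate(bonferroni_threshold(0.05, 6), 6)
print(f"uncorrected: {uncorrected:.3f}, Bonferroni-corrected: {corrected:.3f}")
```

      Under these assumptions the uncorrected family-wise rate is well above the nominal threshold, which is why the number of pairwise comparisons per panel matters even when each test compares only two groups.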

      (4) Close‐ups of ring canal in Figure S2. In Figure S2, there seem to be lots of GFP‐Ankle2 vesicles in the cytoplasm of the oocyte. 

      We agree that the image showing Ankle2‐GFP alone in the RNAi Vap33 condition suggested a cytoplasmic granular localization of unknown nature. However, upon examination, we realized that this image did not correspond to the same z‐step as the matching merged image (which also included DNA staining). We have now replaced the image with the correct one.

      Reviewer #3 (Recommendations for the authors): 

      Be more accurate about what conclusions can be made from reported data, particularly from overexpression and deletion studies. 

      (1) The domain analysis for physical interaction is quite thorough. However, localization information is taken from overexpressed constructs. While these data show what could happen, the authors are not using endogenous levels of Ankle2 in cells or tissues that are known to require Ankle2. As a result, it is difficult to determine whether localization results are biologically meaningful. 

      We have added the following text at the end of the third Results section:

      “We were unable to examine the localization of endogenous Ankle2 because the antibodies that we generated gave inconclusive results in immunofluorescence. For the remainder of our study, we relied on the overexpression of Ankle2‐GFP, which may not perfectly reflect the localization and function of endogenous Ankle2. However, Ankle2‐GFP is functional as it can rescue phenotypes observed when endogenous Ankle2 is depleted (see below).”

      (2) The data showing that Ankle2 is a regulatory subunit of the PP2A complex also relies on in vitro binding assays in an over‐expression context. Data certainly show Ankle2 can bind proteins in the PP2A complex when overexpressed. However, the authors could not isolate enough of the complex from the animal to test function, so Ankle2 acting as a regulatory subunit isn't functionally shown. There are other possibilities, such as Ankle2 acting as a scaffold for complex assembly.

      The competition experiments shown in Fig 2 are based on complexes assembling in cells and are not in vitro binding assays. We show 4 lines of evidence supporting the idea that Ankle2 functions as a regulatory subunit of PP2A: 1) Ankle2 interacts with the structural (PP2A‐29B) and catalytic (Mts) subunits of PP2A without any known regulatory subunit of PP2A. 2) Depletion of Ankle2 leads to the hyperphosphorylation of the known PP2A substrate BAF. 3) The PP2A regulatory subunit Tws/B55 competes with Ankle2 for formation of a complex with PP2A. 4) AlphaFold3 predicts that Ankle2 engages in a complex with PP2A at a position similar to that of known regulatory subunits of PP2A including Tws/B55, and consistent with their mutually exclusive presence in PP2A complexes. If Ankle2 acted as a scaffold for the formation of a PP2A complex containing other regulatory subunits, we would expect to detect Ankle2 and another regulatory subunit in the same complex.

      (3) Throughout the text, some conclusions are overstated or are not supported by data. Examples are below: 

      a. Page 1: "we show for the first time that Ankle2 is a regulatory subunit of PP2A"  The authors show binding and changes in BAF phosphorylation levels, but changes in PP2A activity with modulation of Ankle2 weren't shown. 

      We have replaced this phrase with this one:

      “…we provide several lines of evidence that suggest that Ankle2 is a regulatory subunit of PP2A…”

      b. Page 3: "The requirement for Ankle2 in the development of the central nervous system was initially discovered through its targeting by the microcephaly‐causing Zika virus (Shah et al., 2018)." 

      This is not the first paper showing ANKLE2 plays a role in the development of the CNS. Yamamoto et al., 2014 identified mutants in Ankle2 with defects in CNS development in flies and humans, establishing it as a human microcephaly‐causing gene. 

      We are sorry for this oversight. We have now cited this important work.

      c. Page 6: "Moreover, BAF appears to be the only obligatory substrate of Ankle2‐dependent dephosphorylation for cell proliferation as lowering the dose of the BAF kinase NHK‐1/Ballchen rescues wing development defects caused by the partial depletion of Ankle2 (Li et al., 2024)."  It is unclear why the authors conclude this since Ballchen/VRK1 can phosphorylate many things besides BAF. 

      Although the conclusion cannot be drawn categorically, it seems to be by far the most likely scenario. We agree that in principle, other mechanisms could also account for these genetic observations, such as the dephosphorylation of another, still unidentified obligatory substrate of PP2A‐Ankle2 that would also be phosphorylated by NHK‐1/Ballchen. Nevertheless, we have also shown that expression of an unphosphorylatable mutant form of BAF rescues phenotypes observed upon loss of Ankle2 function (Li et al, 2024). We have changed our sentence as follows:

      “Moreover, BAF could be the only obligatory substrate of Ankle2‐dependent dephosphorylation for cell proliferation as lowering the dose of the BAF kinase NHK‐1/Ballchen or expression of an unphosphorylatable mutant form of BAF rescues wing development defects caused by the partial depletion of Ankle2 (Li et al., 2024).”

      d. Page 10: "These results suggest that a Vap33‐Ankle2‐PP2A complex can mediate the recruitment of a pool of PP2A at the NE."

      There is insufficient evidence to indicate that Vap33‐Ankle2‐PP2A exists in a stable state in the cell and that this complex mediates recruitment of PP2A at the NE. The images do not include Vap33, showing no evidence it is present when PP2A is at the NE and the complex could only be detected with overexpression. 

      We agree with this caveat and recognize the need to be cautious when proposing our model. In this regard, we feel that our wording is reasonable and appropriate, using “suggest” rather than “prove”, “show” or “indicate”.

      e. Page 11: "These results suggest that the interaction of Ankle2 with PP2A is essential for its function in BAF dephosphorylation and nuclear reassembly." Page 14: "these results indicate that the interaction of Ankle2 with PP2A is essential during embryo". Page 14: "These results indicate that the interaction of Ankle2 with PP2A but not with Vap33 is essential for its function during cell proliferation in imaginal wing disc development." 

      These experiments show that the ankyrin repeat in Ankle2 is necessary for these processes. It does not say PP2A interaction with Ankle2 is necessary because other things could bind the domain. 

      We have revised the segments of the text mentioned, taking the reviewer’s legitimate concerns into consideration. We have also added the following sentence to the Discussion:

      “However, it remains formally possible that the deletion of Ankyrin repeats used to disrupt the Ankle2‐PP2A interaction abrogated another, unknown aspect of Ankle2 function.”

      f. Page 12: "Overall, we conclude that in addition to its N‐terminal PP2A‐interacting Ankyrin domain, Ankle2 requires the integrity of its C‐terminal portion for its essential function in nuclear reassembly." 

      No data was shown for differences in nuclear reassembly, only the ability for ANKLE2 truncation mutants to localize to the nuclear envelope. It isn't clear whether the nuclear envelope reformation is normal in Figure S6 which the authors refer to. Lamin staining could help determine and conclude the C‐terminal region is important for nuclear envelope reformation. 

      Our conclusion is drawn from the results shown in Figures S4 and S5 (described in the same section), where a rescue assay in cells was performed to assess the functionality of different variants of Ankle2‐GFP when endogenous Ankle2 was depleted. In this assay, Lamin and DNA staining were used to examine nuclear reassembly (as in Figure 5). Figure S6 shows the localizations of the different variants of Ankle2‐GFP, but endogenous Ankle2 is not depleted in these cells.

      g. Page 13: "We conclude that the ability of Ankle2 to interact with PP2A is required for the timely recruitment of BAF at reassembling nuclei and ensuing NE reassembly."

      It's possible the Ankyrin domain in ANKLE2 is interacting with proteins other than PP2A to recruit BAF at reassembling nuclei, especially since ANKLE2 is found to regulate VRK1 (Link 2019) which has been found to phosphorylate BAF during the cell cycle (Molitor 2014). Additionally, the images in Figure 6A appear to show fully reassembled nuclear envelopes in all mutants by 180s. 

      This point relates to point e, raised above by this reviewer. We have re‐written the sentence as follows:

      “We conclude that the Ankyrin domain, required for the ability of Ankle2 to interact with PP2A, is necessary for the timely recruitment of BAF at reassembling nuclei and ensuing NE reassembly.”

      Please note that in this paragraph, we discuss a delay in RFP‐BAF recruitment, rather than the complete elimination of this recruitment. 

      h. Page 16: "Our unbiased phosphoproteomic analysis confirmed that BAF dephosphorylation depends on Ankle2, despite the absence of a detectable interaction between Drosophila Ankle2 and BAF, which may be due to the lack of a LEM domain in the former (Fishburn et al., 2024). Moreover, while Ankle2 was shown to bind and inhibit the BAF counteracting kinase VRK1 in humans (Asencio et al., 2012), we detected no interaction between Ankle2 and NHK‐1/Ballchen (VRK1 ortholog) in Drosophila. This suggests that the loss of Ankle2 causes BAF hyperphosphorylation by preventing PP2A‐dependent dephosphorylation rather than by preventing inhibition of NHK‐1"

      There could be transient binding between Ankle2 and Ballchen/VRK1/NHK‐1 or activity can be indirect, but that doesn't mean there is not a contribution of BAF phosphorylation by Ballchen/VRK1/NHK‐1. Genetic evidence from three model systems, including Drosophila, indicates there is a strong genetic interaction between Ankle2 and Ballchen/VRK1/NHK‐1 that includes rescue of lethality.

      We agree and we have re‐written in this way:

      “While a putative interaction between Ankle2 and NHK‐1 in Drosophila could occur transiently, thereby escaping detection, the simplest interpretation of our results is that the loss of Ankle2 causes BAF hyperphosphorylation by preventing PP2A‐dependent dephosphorylation rather than by preventing inhibition of NHK‐1.”

      We do not question the fact that Ballchen/VRK1/NHK‐1 phosphorylates BAF and genetically interacts with Ankle2. The antagonistic relationship between Ballchen/VRK1/NHK‐1 and Ankle2 observed genetically can be explained by the fact that the kinase phosphorylates BAF while PP2A‐Ankle2 dephosphorylates it, without the need to invoke an additional inhibition of the kinase by Ankle2.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The hypothesis is based on the idea that inversions capture genetic variants that have antagonistic effects on male sexual success (via some display traits) and survival of females (or both sexes) until reproduction. Furthermore, a sufficiently skewed distribution of male sexual success will tend to generate synergistic epistasis for male fitness even if the individual loci contribute to sexually selected traits in an additive way. This should favor inversions that keep these male-beneficial alleles at different loci together in cis-LD. A series of simulations is presented and shows that the scenario works at least under some conditions. While a polymorphism at a single locus with large antagonistic effects can be maintained for a certain range of parameters, a second such variant with somewhat smaller effects tends to be lost unless closely linked. It becomes much more likely for genomically distant variants that add to the antagonism to spread if they get trapped in an inversion; the model predicts this should drive accumulation of sexually antagonistic variants on the inversion versus standard haplotype, leading to the evolution of haplotypes with very strong cumulative antagonistic pleiotropic effects. This idea has some analogies with one of the predominant hypotheses for the evolution of sex chromosomes, and the authors discuss these similarities. The model is quite specific, but the basic idea is intuitive and thus should be robust to the details of model assumptions. It makes perfect sense in the context of the geographic pattern of inversion frequencies. One prediction of the model (notably, that it leads to the evolution of nearly homozygously lethal haplotypes) does not seem to reflect the reality of chromosomal inversions in Drosophila, as the authors carefully discuss, but it is the case for some other "supergenes", notably in ants. So the theoretical part is a strong novel contribution.

      We appreciate the detailed and accurate summary of our main theoretic results.
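      The point in the summary that skewed male mating success can generate synergistic epistasis from additive trait effects can be illustrated with a toy calculation. This is a minimal sketch under assumptions that are ours and not the manuscript's simulation: a hypothetical "best-of-n" rule in which each female compares n randomly drawn males and mates with the one carrying the highest display trait (ties split evenly), two haploid loci with display-allele frequency 0.5 each, and each display allele adding 1 to the trait.

```python
from math import comb


def win_probability(p_less, p_equal, n_males):
    """Chance that a focal male is chosen when a female picks the top-trait
    male among n_males sampled at random, splitting ties evenly."""
    m = n_males - 1  # number of competitors
    total = 0.0
    for k in range(m + 1):  # k competitors tie with the focal male
        total += comb(m, k) * p_equal**k * p_less**(m - k) / (k + 1)
    return total


# Hypothetical additive trait: values 0, 1, 2 (number of display alleles)
# occur with probabilities 0.25, 0.5, 0.25 at allele frequency 0.5.
trait_probs = {0: 0.25, 1: 0.5, 2: 0.25}
n_males = 5  # assumed number of males each female compares

success = {}
for t in trait_probs:
    p_less = sum(p for v, p in trait_probs.items() if v < t)
    success[t] = win_probability(p_less, trait_probs[t], n_males)

# Epistasis on the mating-success scale: positive means the double carrier
# gains more than the sum of the two single-allele gains.
epistasis = success[2] - 2 * success[1] + success[0]
print(success, epistasis)
```

      In this toy setting the trait is strictly additive, yet mating success rises faster than linearly with the number of display alleles, so the epistasis term is positive. Other mate-choice rules or allele frequencies would change the magnitude.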

      To provide empirical support for this idea, the authors study the dynamics of inversions in population cages over one generation, tracking their frequencies through amplicon sequencing at three time points: (young adults), embryos and very old adult offspring of either sex (>2 months from adult emergence). Out of four inversions included in the experiment, two show patterns consistent with antagonistic effects on male sexual success (competitive paternity) and the survival of offspring, especially females, until an old age, which the authors interpret as consistent with their theory.

      As I have argued in my comments on previous versions, the experiment only addresses one of the elements of the theoretical hypothesis, namely antagonistic effects of inversions on male reproductive success and other fitness components, in particular of females. Furthermore, the design of this experiment is not ideal from the viewpoint of the biological hypothesis it is aiming to test. This is in part because, rather than testing for the effects of inversion on male reproductive success versus the key fitness components of survival to maturity and female reproductive output, it looks at the effects on male reproductive success versus survival to a rather old age of 2 months. The relevance of survival until old age to fitness under natural conditions is unclear, as the authors now acknowledge. Furthermore, up to 15% of males that may have contributed to the next generation did not survive until genotyping, and thus the difference between these males' inversion frequency and that in their offspring may be confounded by this potential survival-based sampling bias. The experiment does not test for two other key elements of the proposed theory: the assumption of frequency-dependence of selection on male sexual success, and the prediction of synergistic epistasis for male fitness among genetic variants in the inversion. To be fair, particularly testing for synergistic epistasis would be exceedingly difficult, and the authors have now included a discussion of the above caveats and limitations, making their conclusions more tentative. This is good but of course does not make these limitations of the experiment go away. These limitations mean that the paper is stronger as a theoretical than as an empirical contribution.

      We discuss the choice to focus on exploring the potential antagonistic effects of the inversion karyotype on male reproductive success and survival in our general response above. Primarily, this prediction seemed to be the most specific to the proposed model as compared to other alternate models. Still, further studies are clearly needed to elucidate the potential frequency dependence and genetic architecture of the inversions.

      Regarding the choice of age at collection, it is unknown to what degree our selected collection age of 10 weeks correlates with survival in the wild, but we feel confident that there will be some positive correlation.

      We now further clarify that across our experiments, a minimum of 5% and a mean of 9% of the males used in the parental generation died before collection. These proportions do not appear sufficient to explain the differences between paternal and embryo inversion frequencies shown in Figure 9.
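      The size of the bias that unsampled paternal deaths could introduce can be bounded with simple arithmetic. The sketch below is an illustration under our own assumptions, not the authors' analysis: if a fraction d of parental males died before genotyping and the inversion frequency among survivors is p, the frequency among all parental males lies between (1 − d)·p (no dead male carried the inversion) and (1 − d)·p + d (every dead male did). The function name and the example survivor frequency of 0.5 are hypothetical; the 9% mortality is the mean reported above.

```python
def true_frequency_bounds(freq_in_survivors, fraction_died):
    """Range of the inversion frequency among all parental males, given the
    frequency observed in survivors and the fraction that died ungenotyped."""
    if not (0.0 <= freq_in_survivors <= 1.0 and 0.0 <= fraction_died <= 1.0):
        raise ValueError("both arguments must lie in [0, 1]")
    surviving_part = (1.0 - fraction_died) * freq_in_survivors
    # Extremes: no dead male carried the inversion, or every dead male did.
    return surviving_part, surviving_part + fraction_died


# Hypothetical survivor frequency of 0.5 with the reported mean mortality of 9%.
low, high = true_frequency_bounds(0.5, 0.09)
print(f"true paternal frequency between {low:.3f} and {high:.3f}")
```

      The width of this interval always equals the fraction that died, so the largest possible shift from the observed survivor frequency is at most that fraction. Whether this is smaller than the paternal-versus-embryo differences depends on the values in Figure 9, which are not reproduced here.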

      Reviewer #2 (Public review):

      Summary:

      In their manuscript the authors address the question whether the inversion polymorphism in D. melanogaster can be explained by sexually antagonistic selection. They designed a new simulation tool to perform computer simulations, which confirmed their hypothesis. They also show a tradeoff between male reproduction and survival. Furthermore, some inversions display sex-specific survival.

      Strengths:

      It is an interesting idea on how chromosomal inversions may be maintained.

      Weaknesses:

      The authors motivate their study by the observation that inversions are maintained in D. melanogaster and because inversions are more frequent closer to the equator, the authors conclude that it is unlikely that the inversion contributes to adaptation in more stressful environments. Rather the inversion seems to be more common in habitats that are closer to the native environment of ancestral Drosophila populations.

      While I do agree with the authors that this observation is interesting, I do not think that it rules out a role in local adaptation. After all, the inversion is common in Africa, so it is perfectly conceivable that the non-inverted chromosome may have acquired a mutation contributing to the novel environment.

      Based on their hypothesis, the authors propose an alternative strategy, which could maintain the inversion in a population. They perform some computer simulations, which are in line with the predicted behavior. Finally, the authors perform experiments and interpret the results as empirical evidence for their hypothesis. While the reviewer is not fully convinced about the empirical support, the key problem is that the proposed model does not explain the patterns of clinal variation observed for inversions in D. melanogaster. According to the proposed model, the inversions should have a similar frequency along latitudinal clines. So in essence, the authors develop a complicated theory because they felt that the current models do not explain the patterns of clinal variation, but this model also fails to explain the pattern of clinal variation.

      To the contrary, in the Discussion paragraph beginning on Line 671, we explain why we would predict that a tradeoff between survival and reproduction should lead to clinal inversion frequencies. We suggest that a karyotype associated with a survival penalty should be increasingly disadvantageous in more challenging environments (such as high altitudes and latitudes for this species). Furthermore, an advantage in male reproductive competition conferred by that same haplotype may be reduced by the lower population densities that we would expect in more challenging environments (meaning that each female should encounter fewer males). Individually or jointly, these two factors predict that the equilibrium frequency of a balanced inversion polymorphism should depend on a local population’s environmental harshness and population density, with the ensuing prediction that inversion frequency should correlate with certain environmental variables.

      Reviewer #3 (Public review):

      Summary:

      In this study, McAllester and Pool develop a new model to explain the maintenance of balanced inversion polymorphism, based on (sexually) antagonistic alleles and a trade-off between male reproduction and survival (in females or both sexes). Simulations of this model support the plausibility of this mechanism. In addition, the authors use experiments on four naturally occurring inversion polymorphisms in D. melanogaster and find tentative evidence for one aspect of their theoretical model, namely the existence of the above-mentioned trade-off in two out of the four inversions.

      Strengths:

      (1) The study develops and analyzes a new (Drosophila melanogaster-inspired) model for the maintenance of balanced inversion polymorphism, combining elements of (sexually) antagonistically (pleiotropic) alleles, negative frequency-dependent selection and synergistic epistasis. Simulations of the model suggest that the hypothesized mechanism might be plausible.

      (2) The above-mentioned model assumes, as a specific example, a trade-off between male reproductive display and survival; in the second part of their study, the authors perform laboratory experiments on four common D. melanogaster inversions to study whether these polymorphisms may be subject to such a trade-off. The authors observe that two of the four inversions show suggestive evidence that is consistent with a trade-off between male reproduction and survival.

      Open issues:

      (1) A gap in the current modeling is that, while a diploid situation is being studied, the model does not investigate the effects of varying degrees of dominance. It would thus be important and interesting, as the authors mention, to fill this gap in future work.

      (2) It will also be important to further explore and corroborate the potential importance and generality of trade-offs between different fitness components in maintaining inversion polymorphisms in future work.

      We appreciate the work put in to evaluating, improving, and summarizing our study. We agree that further work studying the effects of dominance and of the fitness components of the inversions is important.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      l. 354 : I don't understand what the authors mean by "an antagonistic and non-antagonistic allele". If there is an antagonistic polymorphism at a locus, then both alleles have antagonistic effects; i.e., allele B increases trait 1 and reduces trait 2 relative to allele A and vice versa.

      Edited, agreed that the terminology used here was sub-optimal.

      Reviewer #2 (Recommendations for the authors):

      The motivation for their model is their claim that the clinal inversion frequencies are not compatible with local adaptation. The reviewer doubts this strong statement. Furthermore, the proposed model also fails to explain the inversion frequencies in natural populations.

      Hence, rather than building a straw man, it would be better if the authors first show their experiments and then present their model as an explanation for the empirical results. Nevertheless, it is also clear that the empirical data are not very strong and cannot be fully explained by the proposed model.

      This claim that we reject any role of local adaptation in clinal variation and selection upon inversion polymorphism does not hold up in a reading of our manuscript. We even suggest that locally varying selective pressures must be playing some role, although that does not imply that local adaptation is the ultimate driver of inversion frequencies. Indeed, we suggest that local adaptation alone is an insufficient explanation for inversion frequency clines in D. melanogaster, including because (1) these frequency clines do not approach the alternate fixed genotypes predicted by local directional selection, (2) these derived inversions tend to be more frequent in more ancestral environments (l.113-158).

      In our public review response above, and in the Discussion section of our paper, we explain why our model can predict both the clinal frequencies of many Drosophila inversions and their intermediate maximal frequencies. Of course, we do not predict that most inversions in this species should follow the specific tradeoff investigated here. In fact, we were surprised to find even two inversions that experimentally supported our predicted tradeoff. Still, it remains possible that other inversions in this species are subject to other balanced tradeoffs not investigated here, which could help explain why they rarely reach high local frequencies.

      Reviewer #3 (Recommendations for the authors):

      My previous comments have been adequately addressed.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      […]

      To provide empirical support for this idea, the authors study the dynamics of inversions in population cages over one generation, tracking their frequencies through amplicon sequencing at three time points: parents (young adults), embryos and very old adult offspring of either sex (>2 months from adult emergence). Out of four inversions included in the experiment, two show patterns consistent with antagonistic effects on male sexual success (competitive paternity) and the survival of offspring, especially females, until an old age, which the authors interpret as consistent with their theory.

      There are several reasons why the support from these data for the proposed theory is not watertight.

      (1) As I have already pointed out in my previous review, survival until 2 months (in fact, it is 10 weeks and so 2.3 months) of age is of little direct relevance to fitness, whether under natural conditions or under typical lab conditions.

      The authors argue this objection away with two arguments:

      First, citing Pool (2015) they claim that the average generation time (i.e. the average age at which flies reproduce) in nature is 24 days. That paper made an estimate of 14.7 generations per year under the North Carolina climate. As also stated in Pool (2015), the conditions in that locality for Drosophila reproduction and development are not suitable during three months of the year. This yields an average generation length of about 19.5 days during the 9 months during which the flies can reproduce. On the highly nutritional food used in the lab and at the optimal temperature of 25 C, Drosophila need about 11-12 days to develop from egg to adult. Even assuming these perfect conditions, the average age (counted from adult eclosion) would be about 8 days. In practice, larval development in nature is likely longer for nutritional and temperature reasons, and thus the genomic data analyzed by Pool imply that the average adult age of reproducing flies in nature would be about 5 days, and not 24 days, and even less 10 weeks. This corresponds neatly to the 2-6 days median life expectancy of Drosophila adults in the field based on capture-recapture (e.g., Rosewell and Shorrocks 1987).

      Second, the authors also claim that survival over a period of 2 months is highly relevant because flies have to survive long periods where reproduction is not possible. However, to survive the winter flies enter a reproductive diapause, which involves profound physiological changes that indeed allow them to survive for months, remaining mostly inactive, stress resistant and hidden from predators. Flies in the authors' experiment were not diapausing, given that they were given plentiful food and kept warm. It is still possible that survival to the ripe old age of 10 weeks under these conditions correlates well with surviving diapause under harsh conditions, but if so, the authors should cite relevant data. Even then, I do not think this allows the authors to conclude that longevity is "the main selective pressure" on Drosophila (l. 936).

      This is overall a thoughtfully presented critique and we have endeavored to improve our discussion of Pool (2015) and to clarify some of the language used about survival elsewhere. While we agree that challenges other than survival to 10 weeks are very relevant to Drosophila melanogaster, collection at 10 weeks does encompass some of these other challenges. Egg to adult viability still contributes to the frequencies of the inversions at collection and is not separable from longevity in this data. Collection at longevity was chosen in part to encompass all lifetime fitness challenges that might influence the inversion frequency at collection, albeit still within permissive laboratory conditions. Future experiments exploring specific stressors independently and beyond permissive lab conditions would generate a clearer picture.

      In addition to general edits, the specific phrase mentioned at l. 936 [now line 1003] has been revised from “In many such cases females are in reproductive diapause, and so longevity is the main selective pressure.” to “While longevity is a key selective pressure underlying overwintering, the relationship between longevity in permissive lab conditions without diapause and in natural conditions under diapause is unclear (Schmidt et al. 2005; Flatt 2020), and our experiment represents just one of many possible ways to examine tradeoffs involving survival.”

      (2) It appears that the "parental" (in fact, paternal) inversion frequency was estimated by sequencing sires that survived until the end of the two-week mating period. No information is provided on male mortality during the mating period, but substantial mortality is likely given constant courtship and mating opportunities. If so, the difference between the parental and embryo inversion frequency could reflect the differential survival of males until the point of sampling rather than / in addition to sexual selection.

      We have further clarified that when referenced as parental frequency, the frequency presented is ½ the paternal frequency as the mothers were homokaryotypic for the standard arrangement. We chose to present both due to considerations in representing the frequency change from paternal to embryo frequencies, where a hypothetical change from 0.20 frequency in fathers to 0.15 frequency in embryos represents a selective benefit (a frequency increase in the population), despite the reality that this is a decrease in allele frequency between paternal and embryo cohorts.
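      The bookkeeping behind this choice can be sketched in a few lines. This is only an illustration: the function name is hypothetical, and it assumes that each embryo receives one maternal and one paternal gamete and that the mothers carry no inversions, as described above.

```python
def parental_frequency(paternal_freq, maternal_freq=0.0):
    """Inversion frequency averaged over the two parental gamete pools.

    Assumes each offspring receives one maternal and one paternal gamete.
    maternal_freq defaults to 0 because the mothers were homokaryotypic
    for the standard arrangement.
    """
    return 0.5 * (paternal_freq + maternal_freq)


# Hypothetical numbers taken from the example in the text.
paternal = 0.20
embryo = 0.15

parental = parental_frequency(paternal)   # neutral expectation among embryos
embryo_change = embryo - parental         # positive: the inversion gained frequency

print(f"parental = {parental:.2f}, change to embryos = {embryo_change:+.2f}")
```

      Read this way, a drop from 0.20 in fathers to 0.15 in embryos is an increase relative to the parental frequency of 0.10.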

      We mentioned a maximum 15% paternal mortality at line 827 [now l.1056], but have now added complete data on the counts of flies in the experiment as a supplemental table (Table S1) and have added or corrected further references to this in the results and methods [lines 555, 638, 975]. It is true that this may influence the observed frequency changes to some degree, and while we adjusted our sampling method to account for the effects of this mortality on statistical power [l.1056ff], we have now edited the manuscript to better highlight potential effects of this phenomenon on the recorded frequency changes.

      It is also worth noting that, if mortality among fathers over the mating period is codirectional with mortality among aged offspring, this would bias the results against detecting an opposing antagonistic selective effect of the inversions on paternity share. This is now also mentioned in the manuscript, l.639ff.

      (3) Finally, irrespective of the above caveats, the experimental data only address one of the elements of the theoretical hypothesis, namely antagonistic effects of inversions on reproduction and survival, notably that of females. It does not test for two other key elements of the proposed theory: the assumption of frequency-dependence of selection on male sexual success, and the prediction of synergistic epistasis for male fitness among genetic variants in the inversion. To be fair, particularly testing the latter prediction would be exceedingly difficult. Nonetheless, these limitations of the experiment mean that the paper is a much stronger theoretical than empirical contribution.

      This is a fair criticism of the limitations of our results, and we now summarize such caveats more directly in the discussion summary, lines 876ff.

      Reviewer #2 (Public Review): 

      […]

      Comments on the latest version:

      I would like to give an example of the confusing terminology of the authors:

      "Additionally, fitness conveyed by an allele favoring display quality is also frequency-dependent: since mating success depends on the display qualities of other males, the relative advantage of a display trait will be diminished as more males carry it..."

      I do not understand how this differs from an ordinary advantageous allele: as such an allele increases in frequency, its rate of frequency increase declines, but this has nothing to do with frequency-dependent selection. In my opinion, the authors redefine frequency-dependent selection: for selection to be frequency dependent, the selection coefficient itself needs to change with frequency, but this is not clear from their verbal description.

      We have edited this text for greater clarity, now line 232ff. We did not seek to redefine frequency dependence, and did mean by “the relative advantage of a display trait will be diminished” that an equivalent s would diminish with frequency. We have now remedied terminological issues introduced in the prior revision with regard to frequency dependent selection.

      One example of how challenging the style of the manuscript is comes from their description of the DNA extraction procedure. In principle a straightforward method, but even here the authors provide a convoluted uninformative description of the procedure.

      We have edited for clarity the text on lines 1016-1020. Citing a published protocol and mentioning our modifications seems an appropriate trade-off between representing what was done accurately, citing the sources we relied on in doing it, and limiting the volume of information in the main text for such a straightforward and common method. 

      It is not apparent to the reviewer why the authors have not invested more effort to make their manuscript digestible.

      We have invested a great deal of effort in making this manuscript as clear as we are able to. We regret that our writing has not been to this reviewer’s liking. We believe we have been highly responsive to all specific criticisms, including revising all passages cited as unclear. In this round, we have again scrutinized the entire manuscript for any opportunity to clarify it, and we have made further changes throughout. Although our subject matter is conceptually nuanced, we nevertheless remain optimistic that a careful, fresh reading of our revised manuscript would yield a more favorable impression.

      Reviewer #3 (Public Review):

      […]

      Weaknesses:

      A gap in the current modeling is that, while a diploid situation is being studied, the model does not investigate the effects of varying degrees of dominance. It would be important and interesting to fill this gap in future work.

      Agreed, and now reinforced at lines 892ff.

      Comments on the latest version:

      Most of the comments which I have made in my public review have been adequately addressed.

      Some of the writing still seems somewhat verbose and perhaps not yet maximally succinct; some additional line-by-line polishing might still be helpful at this stage in terms of further improving clarity and flow (for the authors to consider and decide).

      We have made further changes and some polishing in this draft, and greatly appreciate the guidance provided in improving the draft so far. 

      Reviewer #1 (Recommendations For The Authors):

      (1) While the model results are convincing, some of the verbal interpretation is confusing. In particular, the authors state that in their model the allele favoring male display quality shows a negative frequency dependence whereas the alternative allele has a positive frequency dependence. This does not make sense to me in the context of population genetics theory. For a one-locus, two-allele model the change of allele frequency under selection depends on the fitness of the genotypes concerned relative to each other. Thus, at least under no dominance assumed in this model, if the relative fitness of AA decreases with the frequency of allele A, the relative fitness of aa must decrease with the frequency of allele a. I.e., if selection is negatively frequency dependent, then it is so for both alleles.

      This phrasing was wrong, and we have edited the relevant section.
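      The reviewer's point can be illustrated with a toy calculation. The sketch below is not the manuscript's model: the linear fitness function and the value of `s0` are assumptions chosen only to show that, once the fitness of AA relative to aa falls as A becomes common, the fitness of aa relative to AA necessarily falls as a becomes common.

```python
def relative_fitness_AA(p, s0=0.5):
    """Fitness of AA relative to aa when allele A is at frequency p.

    Hypothetical linear form: A is favored when rare (p < 0.5)
    and disfavored when common (p > 0.5).
    """
    return 1.0 + s0 * (1.0 - 2.0 * p)


def relative_fitness_aa(q, s0=0.5):
    """Fitness of aa relative to AA, as a function of q, the frequency of a."""
    p = 1.0 - q
    return 1.0 / relative_fitness_AA(p, s0)


# Each allele's homozygote does worse as that allele becomes more common.
for freq in (0.1, 0.5, 0.9):
    print(freq,
          round(relative_fitness_AA(freq), 3),
          round(relative_fitness_aa(freq), 3))
```

      Both columns decline as the allele's own frequency rises, so negative frequency dependence applies to both alleles at once.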

      (2) I am still not entirely sure that the synergistic epistasis assumed in the verbal model is actually generated in the simulations; this would be easy enough to check by extracting the mating success of males with different genotypes from the simulation output, which should be reported, e.g., as a figure supplement.

      Our new Figure S2, which depicts haplotype frequencies for a set of the simulations presented in Figure 4, should demonstrate a necessary presence of synergistic epistasis. These results further clarify that the weaker allele B is only kept when linked to A. The same fitness classes of genotype are present in the simulations with and without the inversion, so the only mechanical difference is the rate of recombination, and the only way this might change selection on the alleles is if a variant has a different fitness in one haplotype background than another – i.e. epistasis. The maintenance of haplotypes AB and ab to the exclusion of Ab and aB relies on the lesser relative fitness of Ab and aB. And since survival values are multiplicative, this additional contribution must come from the mate success of AB being disproportionately larger than Ab or aB, indicating the emergent synergistic epistasis posited by our model. We have clarified this point in the text at line 363ff.
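      The logic of this argument can be sketched numerically. The values below are hypothetical and are not taken from the simulations; the sketch treats fitness per haplotype for simplicity and measures epistasis as the deviation from multiplicativity on the log scale, which is one common convention rather than a quantity reported in the manuscript.

```python
import math


def log_epistasis(w_ab, w_Ab, w_aB, w_AB):
    """Deviation from multiplicative fitness on the log scale.

    Zero means the two loci combine multiplicatively;
    a positive value indicates synergistic epistasis.
    """
    return math.log(w_AB) + math.log(w_ab) - math.log(w_Ab) - math.log(w_aB)


# Hypothetical values: survival costs multiply across loci,
# while AB males are disproportionately successful at mating.
survival = {"ab": 1.0, "Ab": 0.90, "aB": 0.95, "AB": 0.90 * 0.95}
mating = {"ab": 1.0, "Ab": 1.20, "aB": 1.05, "AB": 2.00}
total = {h: survival[h] * mating[h] for h in survival}

order = ("ab", "Ab", "aB", "AB")
survival_eps = log_epistasis(*(survival[h] for h in order))
mating_eps = log_epistasis(*(mating[h] for h in order))
total_eps = log_epistasis(*(total[h] for h in order))

print(f"survival: {survival_eps:.3f}, mating: {mating_eps:.3f}, total: {total_eps:.3f}")
```

      With multiplicative survival the survival term is zero, so any epistasis in total fitness is contributed entirely by mating success.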

      (3) l. 318ff: What was this set number of males? I could not find this information anywhere. Also, this model of the mating system is commonly referred to as "best of N", so the authors may want to include this label in the description.

      We indicate this detail just after the referenced line, now reworded and on l. 338-340 as “For each female’s mating competition, 100 males were sampled, though see Figure S1 for plots with varying encounter number.”  Among these edits, “one hundred” has been changed to a numeral for easier skimming, and Figure S1 is now referenced here earlier in the text. Several edits have also been made in the caption of Figures 2 and 3, and in the relevant methods section to clarify the number of encountered males simulated, mention best of N terminology, and clarify how the quality score is used in the mate competition.
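      For readers unfamiliar with the term, a generic best-of-N mate choice step might look like the sketch below. This is not SAIsim code: the function name, the sampling without replacement, and the deterministic choice of the top-scoring male are all assumptions made for illustration, and the simulator may use the quality score differently.

```python
import random


def best_of_n(male_scores, n, rng):
    """Return the index of the male chosen by one female.

    The female encounters up to n males sampled without replacement
    and mates with the one that has the highest display score.
    """
    k = min(n, len(male_scores))
    encountered = rng.sample(range(len(male_scores)), k)
    return max(encountered, key=lambda i: male_scores[i])


rng = random.Random(0)                         # fixed seed for reproducibility
scores = [rng.random() for _ in range(1000)]   # hypothetical display scores
winner = best_of_n(scores, 100, rng)           # 100 encounters, as in the text
print(winner, scores[winner])
```

      Lowering `n` weakens the advantage of high display scores, which is the kind of variation in encounter number explored in Figure S1.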

      (4) The description of the experiment is still confusing. The number of individuals of each sex entered in each mating cage is missing from the Methods (l. 914); although I did finally find it in the Results. These flies were laying over 2 weeks - does this mean that offspring from the entire period were used to obtain the embryo and aged offspring frequencies, or only from a particular egg collection? If the former, does this mean that the offspring obtained from different egg batches were aged separately? Were the offspring aged in cages or bottles, at what density? Given that only those males that survived until the end of the two-week mating period were sequenced, it is important to know what % of the initial number of males these survivors were. A substantial mortality of the parental males could bias the estimate of parental frequencies. How many parental males, embryos and aged offspring were sequenced? Were all individuals of a given cage and stage extracted and sequenced as a single pool or were there multiple pools? The description could also be structured better. For example, the food and grape agar recipes and cage construction are inserted at random points of the description of the crossing design, which does not help.

      We have now reorganized and edited these portions of the Methods text. Portions of this comment overlap with edits responding to (2) of the Public Review and below for l. 921 in Details. Offspring from different laying periods were aged in different bottles, further separated by the time at which they eclosed. They were then pooled for DNA extraction and library preparation by sex and a binary early or late eclosion time. This data was present in the “D. mel. Sample Size” column of supplemental tables S6 and S7 (now S7 and S8), but we have added and referenced a new table to specifically collate the sample sizes of different experimental stages, table S1. Now referenced at lines 555, 638, 975, 1057.

      (5) The caption of figure 9 and the discussion of its results should be clear and explicit about the fact that "adult offspring" in Fig 9A and "female" and "male" refer to adults surviving to old age (whereas "parental" in Fig 9A refers to young adults in their reproductive prime). This has consequences for the interpretation of the difference between "parental" and "adult offspring", as it combines one generation of usual selection as it occurs under the conditions of the lab culture (young adult at generation t -> young adult in generation t+1) with an additional step of selection for longevity. Thus, a marked change in allele frequency does not imply that the "parental" frequency does not represent an equilibrium frequency of the inversions under the lab culture conditions. Furthermore, it would be useful to state explicitly that Figure 9B represents the same results as figure 9A, but with the aged offspring split by sex.

      Figure caption edited to provide further clarity on the age of cohorts and presented data, along with the relevant results section (2.3) referencing this figure.

      We avoid making any statements about the equilibrium frequencies of inversions under lab conditions, and whether or not any step of our experiment reflects such equilibria, because our investigation does not rely upon or test for such conditions. Instead, our analysis focuses on whether inversions have contrasting effects (as indicated by frequency changes that are incompatible with neutral sampling) between different life history components.  Under our model, such frequency reversals might be detectable both at equilibrium balanced inversion frequencies and also at frequencies some distance away from equilibria. We have now clarified this point at l. 970-972.

      Details:

      l. 211: this should be modified as male-only costs are now included.

      Edited. “survival likelihood (of either or both sexes).”

      l. 343: misplaced period

      Edited.

      l. 814: "We confirmed model predictions...": This sounds like it refers to an empirical confirmation of a theory prediction, but I think the authors just want to say that their simulations predicted antagonistic variants can be maintained at an intermediate equilibrium frequency. So the wording should be changed to avoid ambiguity.

      Edited. Now line 869.

      l. 853: How can a genome be "empty"? Do the authors mean an absence of any polymorphism?

      Edited to: “In SAIsim, a population is instantiated as a python object, and populated with individuals which are also represented by python objects. These individuals may be instantiated using genomes specified by the user, or by default carry no genomic variation.” Lines 913ff.

      l. 853: I do not see this diagramed in Figure 5

      Apologies, fixed to Fig. 2

      l. 864: is crossing-over in the model limited to female gametogenesis (reflecting the Drosophila case) or does it occur in both sexes?

      There is a variable in the simulator to make crossover female-specific. All simulations were performed with female-only crossover. Edited for clarity. “While the simulator can allow recombination in both sexes, all simulations presented only generate crossovers and gene conversion events for female gametes, in accordance with the biology of D. melanogaster.” Lines 928-929.

      l. 906: "F2" is ambiguous; does this mean that the mix of lines was allowed to breed for two generations? Also, in other places in the manuscript these flies appear to be referred to as "parental". So do not use F2.

      Edited, F2 language removed and replaced with being allowed to breed for two generations. Now lines 967ff.

      l. 910: this is incorrect/imprecise; what can be inferred is the frequency of the inversions in male gametes that contributed to fertilization. This would correspond to the frequency in successful males only if each successful male genotype had the same paternity share.

      Edited, now “Since no inversions could be inherited through the mothers, inversion frequencies among successful male gametes could be inferred from their pooled offspring.” Now line 994.

      l. 912: "without a controlled day/night cycle" meaning what? Constant light? Constant darkness? Daylight falling through the windows?

      Edited to “Unless otherwise noted, all flies were kept in a lab space of 23°C with around a degree of temperature fluctuation and without a controlled day/night cycle. Light exposure was dependent on the varying use of the space by laboratory workers but amounted to near constant exposure to at least a minimal level of lighting, with some variable light due to indirect lighting from adjacent rooms with exterior windows.” Now lines 1007-1010.

      l. 921: I cannot parse this sentence. Were the offspring isolated as virgins?

      No, the logistics of collecting virgins would have been prohibitive, and it did not seem essential for our experiment. Hopefully the edits to this section are clearer, now lines 978ff.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Rigor in the design and application of scientific experiments is an ongoing concern in preclinical (animal) research. Because findings from these studies are often used in the design of clinical (human) studies, it is critical that the results of the preclinical studies are valid and replicable. However, several recent peer-reviewed published papers have shown that some of the research results in cardiovascular research literature may not be valid because their use of key design elements is unacceptably low. The current study is designed to expand on and replicate previous preclinical studies in nine leading scientific research journals. Cardiovascular research articles that were used for examination were obtained from a PubMed Search. These articles were carefully examined for four elements that are important in the design of animal experiments: use of both biological sexes, randomization of subjects for experimental groups, blinding of the experimenters, and estimating the proper size of samples for the experimental groups. The findings of the current study indicate that the use of these four design elements in the reported research in preclinical research is unacceptably low. Therefore, the results replicate previous studies and demonstrate once again that there is an ongoing problem in the experimental design of preclinical cardiovascular research.

      Strengths:

      This study selected four important design elements for study. The descriptions in the text and figures of this paper clearly demonstrate that the rate of use of all four design elements in the examined research articles was unacceptably low. The current study is important because it replicates previous studies and continues to call attention once again to serious problems in the design of preclinical studies, and the problem does not seem to lessen over time.

      Weaknesses:

      The current study uses both descriptive and inferential statistics extensively in describing the results. The descriptive statistics are clear and strong, demonstrating the main point of the study, that the use of these design elements is quite low, which may invalidate many of the reported studies. In addition, inferential statistical tests were used to compare the use of the four design elements against each other and to compare some of the journals. The use of inferential statistical tests appears weak because the wrong tests may have been used in some cases. However, the overall descriptive findings are very strong and make the major points of the study.

      We sincerely appreciate the reviewer’s comments and detailed feedback and their recognition of the importance of this work in replicating previous studies and calling attention to the problems in preclinical study design. In response to the reviewer’s suggestions, we have recalculated our inferential statistics. In place of our previous inferential statistics, we have used an alternative correction calculation for p-values (Holm-Bonferroni corrections) and used median-based linear model analyses and nonparametric Kruskal-Wallis tests that are more appropriate for analyzing this dataset. Our overall trends in results remain the same.
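      For reference, the Holm-Bonferroni step-down adjustment mentioned here can be sketched as follows. This is a generic textbook implementation, not the authors' analysis code, and the raw p-values are hypothetical.

```python
def holm_bonferroni(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original order.

    The smallest raw p-value is multiplied by m, the next by m - 1, and so on;
    adjusted values are forced to be non-decreasing and are capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


raw = [0.01, 0.04, 0.03, 0.20]   # hypothetical raw p-values
adjusted = holm_bonferroni(raw)
print([round(p, 4) for p in adjusted])
```

      Because the multiplier shrinks at each step, the procedure is never less powerful than a plain Bonferroni correction while still controlling the family-wise error rate.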

      Reviewer #2 (Public Review):

      Summary

      This study replicates a 2017 study in which the authors reviewed papers for four key elements of rigor: inclusion of sex as a biological variable, randomization of subjects, blinding outcomes, and pre-specified sample size estimation. Here they screened 298 published papers for the four elements. Over a 10-year period, rigor (defined as including any of the 4 elements) failed to improve. They could not detect any differences across the journals they surveyed, nor across models. They focused primarily on cardiovascular disease, which helps focus the research but limits the potential generalizability to a broader range of scientific investigation. There is no reason, however, to believe rigor is any better or worse in other fields, and hence this study is a good 'snapshot' of the progress of improving rigor over time.

      Strengths

      The authors randomly selected papers from leading journals (e.g., PNAS). Each paper was reviewed by 2 investigators. They pulled papers over a 10-year period, 2011 to 2021, and have a good sample of time over which to look for changes. The analysis followed generally accepted guidelines for a structured review.

      Weaknesses

      The authors did not use the exact same journals as they did in the 2017 study. This makes comparing the results complicated. Also, they pulled papers from 2011 to 2021, and hence cannot assess the impact of their own prior paper.

      The authors write "the proportion of studies including animals of both biological sexes generally increased between 2011 and 2021, though not significantly (R2= 0.0762, F(1,9)= 0.742, p= 0.411 (corrected p=8.2". This statement is not rigorous because the regression result is not statistically significant. Their data supports neither a claim of an increase nor a decrease over time. A similar problem repeats several times in the remainder of their results presentation.
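      As a side check, the quoted R2 and F values can be related to each other. The sketch below uses the standard identity linking F to R2 for a linear regression; it does not reproduce the authors' analysis, and it does not attempt the p-value or its correction.

```python
def f_from_r_squared(r_squared, df_model, df_residual):
    """F statistic implied by R^2 for a linear regression.

    F = (R^2 / df_model) / ((1 - R^2) / df_residual)
    """
    return (r_squared / df_model) / ((1.0 - r_squared) / df_residual)


# Values quoted from the manuscript: R2 = 0.0762 with F(1, 9).
f_stat = f_from_r_squared(0.0762, df_model=1, df_residual=9)
print(round(f_stat, 3))
```

      The result agrees with the reported F(1,9) of 0.742 to three decimal places, so the two quoted statistics are consistent with each other; an R2 this small leaves most of the year-to-year variation unexplained.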

      I think the Introduction and the Discussion are somewhat repetitive and the wording could be reduced.

      Impact and Context

      Lack of reproducibility remains an enormous problem in science, plaguing both basic and translational investigations. With the increased scrutiny on rigor, and requirements at NIH and other funding agencies for more rigor and transparency, one would expect to find increasing rigor, as evidenced by authors including more study design elements (SDEs) that are recommended. This review found no such change, and this is quite disheartening. The data implies that journals (editors and reviewers) will have to increase their scrutiny and standards applied to preclinical and basic studies. This work could also serve as a call to action to investigators outside of cardiovascular science to reflect on their own experiences and when planning future projects.

      We sincerely appreciate the reviewer’s insights and comments and recognition of our work contributing to the growing body of evidence on the lack of rigor in preclinical cardiovascular research study design. Regarding the weaknesses the reviewer noted, the referenced 2017 publication details a study by Ramirez et al. and was not conducted by our group. Our study aimed to expand upon their findings by using a more recent timeframe and an alternative list of highly respected cardiovascular research journals. We have now better clarified this distinction in the manuscript. We have also addressed our phrasing regarding the lack of statistical significance in the increase of the proportion of studies including animals of both sexes from 2011-2021.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Many of the methods in this study were strong or adequate. Although the descriptive statistics appear solid, there are significant problems that need to be addressed in the selection and use of inferential statistics.

      (1) One of the design elements that was studied was sample size estimation. This is usually done by a power analysis. The authors should consider what group size for the examined journals is adequate for their statistics to be valid. Or they could report the power of their studies to achieve a given meaningful difference.

      We thank the reviewer for this excellent observation. We unfortunately failed to conduct an a priori power analysis. Previous research (Gupta, et al. 2016) suggests that post-hoc power calculations should not be carried out after the study has been conducted. We acknowledge the importance of establishing a sufficient sample size to draw sound conclusions based on an adequate effect size, and we regret that we did not carry out the appropriate estimations. We are very appreciative of the reviewer’s suggestions and aim to implement such an appropriate study design element in future studies.

      Gupta KK, Attri JP, Singh A, Kaur H, Kaur G. Basic concepts for sample size calculation: Critical step for any clinical trials!. Saudi J Anaesth. 2016;10(3):328-331. doi:10.4103/1658-354X.174918
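
      As an illustration of the a priori sample size estimation discussed in this exchange, the sketch below computes the per-group sample size needed to detect a difference between two proportions, using the standard normal-approximation formula with unpooled variance. It is a minimal sketch, not the authors' method: the function name and the example proportions (20% vs. 40% of papers reporting a design element) are hypothetical and are not taken from the study.

```python
import math
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group n for a two-sided test of two independent proportions.

    Uses the normal approximation with unpooled variance:
        n = (z_{1-alpha/2} + z_{power})^2 * [p1(1-p1) + p2(1-p2)] / (p1-p2)^2
    All inputs are illustrative assumptions, not values from the study.
    """
    if p1 == p2:
        raise ValueError("p1 and p2 must differ to define an effect size")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-sided
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2
    return math.ceil(n)  # round up to a whole number of papers per group


if __name__ == "__main__":
    # Hypothetical: detect a rise in reporting from 20% to 40% of papers.
    print(sample_size_two_proportions(0.20, 0.40))
```

      Under these assumed inputs, the calculation would be run before data collection; smaller expected differences or higher target power increase the required number of papers per group.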

      (2) A Bonferroni correction was used extensively. Because of its use, the corrected p values often appear much too high. The Bonferroni test becomes much too conservative for more than 3 or 4 tests. I suggest using a different test for multiple comparisons.

      We thank the reviewer for their insightful suggestion. We have updated all p-values to reflect a Holm-Bonferroni correction instead. All p-values have been corrected and updated.
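
      For readers unfamiliar with the difference between the two corrections, the following sketch contrasts a plain Bonferroni adjustment with the Holm-Bonferroni step-down adjustment. It is a generic illustration with hypothetical p-values, not the authors' analysis code; the function names are assumptions made for this example.

```python
def bonferroni(p_values):
    """Bonferroni: multiply each p-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]


def holm_bonferroni(p_values):
    """Holm step-down adjusted p-values, returned in the original order.

    The smallest p-value is multiplied by m, the next by m - 1, and so on;
    a running maximum keeps the adjusted values monotone, and each is
    capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical raw p-values
    print("Bonferroni:", bonferroni(raw))
    print("Holm:      ", holm_bonferroni(raw))
```

      Holm-adjusted values are never larger than the Bonferroni-adjusted ones, which is why the procedure is less conservative while still controlling the family-wise error rate. Capping at 1 also avoids reporting corrected p-values greater than 1.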

      (3) The use of the chi-square test for categorical data is appropriate. However, the t-test and multiple regression tests are designed for continuous variables. Here, it appears that they were used for the nominal variables (Table 1). For these nominal data, other nonparametric tests should be used.

      We thank the reviewer for this valuable insight. We have updated our statistical analysis methods and now use nonparametric Kruskal-Wallis tests to analyze differences in SDE reporting across journals, instead of the chi-square test. Our reported p-values have been adjusted accordingly.
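
      To make the test named above concrete, the sketch below computes the Kruskal-Wallis H statistic with the standard tie correction, using only the Python standard library. The data are hypothetical (1 = design element reported, 0 = not reported), and the code is an illustrative assumption rather than the authors' pipeline. Converting H to a p-value requires the chi-square distribution with the returned degrees of freedom, which a statistics package such as SciPy's `scipy.stats.kruskal` provides.

```python
from collections import Counter


def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return (H, degrees of freedom) with tie correction applied."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction, len(groups) - 1


if __name__ == "__main__":
    # Hypothetical reporting indicators for three journals.
    journals = [[1, 0, 0, 1, 0], [0, 0, 0, 1, 0], [1, 1, 0, 1, 1]]
    print(kruskal_wallis_h(journals))
```

      With binary indicators nearly all observations are tied, so the tie correction matters considerably in this setting.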

      (4) It is not clear exactly when each test is used. The stats section in Methods should better delineate when each test is used. In addition, it would be helpful to include the test used in the figure legends.

      We thank the reviewer for bringing up this important point. We have now updated the methods section to better delineate which tests were used, and also included the specific tests in the figure legends.

      (5) You will need to rewrite some sections of the text to reflect the changes due to changing your use of statistics.

      We have rewritten the sections of the text to reflect the changes in our use of statistics.

      Here are a few comments on the presentation.

      (1) Some of the figure legends are almost impossible to read. They are too congested.

      We thank the reviewer for pointing this out. We have edited the figure legends to make them more readable. We will also attach a pdf with the graphs to allow for easier formatting.

      (2) Also, is it possible to drop some of the panels in Figure 1?

      The panels in Figure 1 have been rearranged to make them more readable. We believe that each panel provides a valuable visual summary of our data that will aid readers in understanding our results.

      (3) It is not mandatory that values of y-axis on the graphs go up 100% (Figs 2 and 3). Using a maximum value of 100% clumps the lines visually. I suggest a max value on the y-axis of the graph of 50% or 60%. That will spread the lines better visually so differences can better be seen.

      We thank the reviewer for considering the experience of our paper’s readers. The y-axes of Figures 2 and 3 have been truncated to 50%. The trend lines in each Figure now appear more separated and differences can better be seen.

      Reviewer #2 (Recommendations For The Authors):

      The authors did not use the exact same journals as they did in the 2017 study. This makes comparing the results complicated. Also, they pulled papers from 2011 to 2021, and hence cannot assess the impact of their own prior paper.

      We appreciate the reviewer’s concern in maintaining consistency with the paper published by Ramirez et al. in 2017. To clarify, our efforts focused on providing a replication study that expanded upon the original Ramirez publication, with which we have no affiliation. For our study, we used different academic journals than those used by Ramirez et al., and also a different time frame. We have updated the language in the manuscript to better clarify the purpose and parameters of our study relative to the previous, unaffiliated study.

      The authors write "the proportion of studies including animals of both biological sexes generally increased between 2011 and 2021, though not significantly (R2= 0.0762, F(1,9)= 0.742, p= 0.411 (corrected p=8.2". This statement is not rigorous because the regression result is not statistically significant. Their data supports neither a claim of an increase nor a decrease over time. A similar problem repeats several times in the remainder of their results presentation.

      Thank you for bringing this information to our attention. We agree with the concern regarding the statement, “the proportion of studies including animals of both biological sexes generally increased between 2011 and 2021, though not significantly (R2= 0.0762, F(1,9)= 0.742, p= 0.411 (corrected p=8.2.” We have rephrased the statement. Our updated Holm-Bonferroni corrected p-value is now noted in this more appropriately worded description of our results. Lastly, we have addressed the wording and redundancy seen in both the introduction and discussion and have made both more concise.
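
      As a consistency check on the quoted regression statistics, the F statistic of a simple linear regression can be recovered from R². The worked relation below assumes one predictor (year) and 11 annual data points (2011 to 2021), which is inferred here from the reported degrees of freedom F(1,9) and is not stated explicitly in the exchange.

```latex
F = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)}
  = \frac{0.0762 / 1}{(1 - 0.0762) / (11 - 1 - 1)}
  = \frac{0.0762 \times 9}{0.9238}
  \approx 0.742
```

      Under that assumption, the reported R² and F agree with each other, and an R² this small means that year explains under 8% of the variance in the annual proportions.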

      I think the Introduction and the Discussion are somewhat repetitive and the wording could be reduced.

      We thank the reviewer for bringing this to our attention. We have addressed the redundancy across the Introduction and the Discussion. We have also altered the wording to reflect a more concise explanation of our study.

      The 'trends' are not statistically significant. A non-significant trend does not exist and no claim of a 'trend' is justified by the data.

      We thank the reviewer for this observation. We have updated the phrasing of ‘trends’ in all areas of the manuscript.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this paper Kawasaki et al describe a regulatory role for the PIWI/piRNA pathway in rRNA regulation in Zebrafish. This regulatory role was uncovered through a screen for gonadogenesis defective mutants, which identified a mutation in the meioc gene, a coiled-coil germ granule protein. Loss of this gene leads to redistribution of Piwil1 from germ granules to the nucleolus, resulting in silencing of rRNA transcription.

      Strengths:

      Most of the experimental data provided in this paper is compelling. It is clear that in the absence of meioc, PiwiL1 translocates into the nucleolus and results in downregulation of rRNA transcription. The genetic compensation of meioc mutant phenotypes (both organismal and molecular) through reduction in PiwiL1 levels is evidence for a direct role for PiwiL1 in mediating the phenotypes of the meioc mutant.

      Weaknesses:

      Questions remain on the mechanistic details by which PiwiL1 mediates rRNA downregulation, and whether this is a function of Piwi in an unperturbed/wildtype setting. There is certainly some evidence provided in support of the natural function for piwi in regulating rRNA transcription (figure 5A+5B). However, the de-enrichment of H3K9me3 in the heterozygous (Figure 6F) is very modest and in my opinion not convincingly different relative to the control provided. It is certainly possible that PiwiL1 is regulating levels through cleavage of nascent transcripts. Another aspect I found confounding here is the reduction in rRNA small RNAs in the meioc mutant; I would have assumed that the interaction of PiwiL1 with the rRNA is mediated through small RNAs but the reduction in numbers does not support this model. But perhaps it is simply a redistribution of small RNAs that is occurring. Finally, the ability to reduce PiwiL1 in the nucleolus through polI inhibition with actD and BMH-21 is surprising. What drives the accumulation of PiwiL1 in the nucleolus then if in the meioc mutant there is less transcription anyway?

      Despite the weaknesses outlined, overall I find this paper to be solid and valuable, providing evidence for a consistent link between PIWI systems and ribosomal biogenesis. Their results are likely to be of interest to people in the community, and provide tools for further elucidating the reasons for this link.

      The amount of cytoplasmic rRNA in piwi+/- was increased by 26% on average (figure 5A+5B), the amount of ChIP-qPCR of H3K9 was decreased by about 26% (Figure 6F), and ChIP-qPCR of Piwil1 was decreased by 35% (Figure 6G), so we don't think there is a big discrepancy. On the other hand, the amount of ChIP-qPCR of H3K9 in meioc<sup>mo/mo</sup> was increased by about 130% (Figure 6F), while ChIP-qPCR of Piwil1 was increased by 50%, so there may be a mechanism for H3K9 regulation of Meioc that is not mediated by Piwil1. As for what drives the accumulation of Piwil1 in the nucleolus, although we have found that Piwil1 has affinity for rRNA (Fig. 6A), we do not know what recruits it. Significant increases in the 18-35nt small RNA of 18S, 28S rRNA and R2 were not detected in meioc<sup>mo/mo</sup> testes enriched for 1-8 cell spermatogonia, compared with meioc<sup>+/mo</sup> testes. The nucleolar localization of Piwil1 has been revealed in this study, and it will be a new topic for future research.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors report that Meioc is required to upregulate rRNA transcription and promote differentiation of spermatogonial stem cells in zebrafish. The authors show that upregulated protein synthesis is required to support spermatogonial stem cells' differentiation into multi-celled cysts of spermatogonia. Coiled coil protein Meioc is required for this upregulated protein synthesis and for increasing rRNA transcription, such that the Meioc knockout accumulates 1-2 cell spermatogonia and fails to produce cysts with more than 8 spermatogonia. The Meioc knockout exhibits continued transcriptional repression of rDNA. Meioc interacts with and sequesters Piwil1 to the cytoplasm. Loss of Meioc increases Piwil1 localization to the nucleolus, where Piwil1 interacts with transcriptional silencers that repress rRNA transcription.

      Strengths:

      This is a fundamental study that expands our understanding of how ribosome biogenesis contributes to differentiation and demonstrates that zebrafish Meioc plays a role in this process during spermatogenesis. This work also expands our evolutionary understanding of Meioc and Ythdc2's molecular roles in germline differentiation. In mouse, the Meioc knockout phenocopies the Ythdc2 knockout, and studies thus far have indicated that Meioc and Ythdc2 act together to regulate germline differentiation. Here, in zebrafish, Meioc has acquired a Ythdc2-independent function. This study also identifies a new role for Piwil1 in directing transcriptional silencing of rDNA.

      Weaknesses:

      There are limited details on the stem cell-enriched hyperplastic testes used as a tool for mass spec experiments, and additional information is needed to fully evaluate the mass spec results. What mutation do these testes carry? Does this protein interact with Meioc in the wildtype testes? How could this mutation affect the results from the Meioc immunoprecipitation?

      Stem cell-enriched hyperplastic testes came from wild-type adult sox17::GFP transgenic zebrafish. Sperm were found in these hyperplastic testes, and when stem cells were transplanted, they self-renewed and differentiated into sperm. It is not known if the hyperplasias develop due to a genetic variant in the line. We will add the following comment.

      “The stem cell-enriched hyperplastic testes, which are occasionally found in adult wildtype zebrafish, contain cells at all stages of spermatogenesis. Hyperplasia-derived SSCs self-renewed and differentiated in the same manner as SSCs of normal testes in transplants of aggregates mixed with normal testicular cells.”

      Reviewer #3 (Public review):

      Summary:

      The paper describes the molecular pathway to regulate germ cell differentiation in zebrafish through ribosomal RNA biogenesis. Meioc sequesters Piwil1, a Piwi homolog, which suppresses the transcription of the 45S pre-rDNA by the formation of heterochromatin, to the perinuclear bodies. The key results are solid and useful to researchers in the field of germ cell/meiosis as well as RNA biosynthesis and chromatin.

      Strengths:

      The authors nicely provided the molecular evidence on the antagonism of Meioc to Piwil1 in the rRNA synthesis, which is supported by the genetic evidence that the inability of the meioc mutant to enter meiosis is suppressed by the piwil1 heterozygosity.

      Weaknesses:

      (1) Although the paper provides very convincing evidence for the authors' claim, the scientific contents are poorly written and incorrectly described. As a result, it is hard to read the text. Checking by scientific experts would be highly recommended. For example, on line 38, "the global translation activity is generally [inhibited]", is incorrect and, rather, a sentence like "the activity is lowered relative to other cells" is more appropriate here. See minor points for more examples.

      Thank you for pointing that out. I will correct the parts pointed out.

      (2) In some figures, it is hard for readers outside of zebrafish meiosis to evaluate the results without more explanation and drawing.

      We will refine Figure 1A and add a schematic of the spermatogonia culture system in a supplemental figure.

      (3) Figure 1E, F, cycloheximide experiments: Please mention the toxicity of the concentration of the drug in cell proliferation and viability.

      When testicular tissue culture was performed at 0.1, 1, 10, 100, 250, and 500mM, abnormal strong OP-puro signals including nuclei were found in cells at 10mM or more. We will add the results in the Supplemental Material. In addition, at 1mM, growth was perturbed in fast-growing 32≤-cell cysts of spermatogonia, but not in 1-4-cell spermatogonia, as described in L122-125.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      By way of background, the Jiang lab has previously shown that loss of the type II BMP receptor Punt (Put) from intestinal progenitors (ISCs and EBs) caused them to differentiate into EBs, with a concomitant loss of ISCs (Tian and Jiang, eLife 2014). The mechanism by which this occurs was activation of Notch in Put-deficient progenitors. How Notch was upregulated in Put-deficient ISCs was not established in this prior work. In the current study, the authors test whether a very low level of Dl was responsible. But co-depletion of Dl and Put led to a similar phenotype as depletion of Put alone. This result suggested that Dl was not the mechanism. They next investigate genetic interactions between BMP signaling and Numb, an inhibitor of Notch signaling. Prior work from Bardin, Schweisguth and other labs has shown that Numb is not required for ISC self-renewal. However, the authors wanted to know whether loss of both the BMP signal transducer Mad and Numb would cause ISC loss. This result was observed for RNAi depletion from progenitors and for mad, numb double mutant clones. Of note, ISC loss was observed in 40% of mad, numb double mutant clones, whereas 60% of these clones had an ISC. They then employed a two-color tracing system called RGT to look at the outcome of ISC divisions (asymmetric (ISC/EB) or symmetric (ISC/ISC or EB/EB)). Control clones had 69%, 15% and 16%, respectively, whereas mad, numb double mutant clones had much lower ISC/ISC (11%) and much higher EB/EB (37%). They conclude that loss of Numb in moderate BMP loss of function mutants increased symmetric differentiation, which caused ISC loss. They also reported that Numb<sup>15</sup> and numb<sup>4</sup> clones had a moderate but significant increase in ISC-lacking clones compared to control clones, supporting the model that Numb plays a role in ISC maintenance. Finally, they investigated the relevance of these observations during regeneration. After bleomycin treatment, there was a significant increase in ISC-lacking clones and a significant decrease in clone size in numb<sup>4</sup> and Numb<sup>15</sup> clones compared to control clones. Because bleomycin treatment has been shown to cause variation in BMP ligand production, the authors interpret the numb clone under bleomycin results as demonstrating an essential role of Numb in ISC maintenance during regeneration.

      Strengths:

      (i) Most data is quantified with statistical analysis

      (ii) Experiments have appropriate controls and large numbers of samples

      (iii) Results demonstrate an important role of Numb in maintaining ISC number during regeneration and a genetic interaction between Mad and Numb during homeostasis.

      Weaknesses:

      (i) No quantification for Fig. 1

      Thank you for your suggestion. Quantification of Fig.1 will be added.  

      (ii) The premise is a bit unclear. Under homeostasis, strong loss of BMP (Put) leads to loss of ISCs, presumably regardless of Numb level (which was not tested). But moderate loss of BMP (Mad) does not show ISC loss unless Numb is also reduced. I am confused as to why numb does not play a role in Put mutants. Did the authors test whether concomitant loss of Put and Numb leads to even more ISC loss than Put-mutation alone.

      Thank you for your comment. We have tested the genetic interaction between punt and numb using punt RNAi and numb RNAi driven by esg<sup>ts</sup>. According to the results in this study and our previously published data, punt mutant clones or esg<sup>ts</sup>> punt RNAi could induce a rapid loss of ISCs (within 8 days). We did not observe further enhancement by numb RNAi of the stem cell loss phenotype caused by punt RNAi.

      (iii) I think that the use of the word "essential" is a bit strong here. Numb plays an important role, but during either homeostasis or regeneration, most numb clones or mad, numb double mutant clones still have ISCs. Therefore, I think that the authors should temper their language about the role of Numb in ISC maintenance.

      Thank you. We will revise the language.

      Reviewer #2 (Public review):

      Summary:

      This work assesses the genetic interaction between the Bmp signaling pathway and the factor Numb, which can inhibit Notch signalling. It follows up on the previous studies of the group (Tian, Elife, 2014; Tian, PNAS, 2014) regarding BMP signaling in controlling stem cell fate decision as well as on the work of another group (Sallé, EMBO, 2017) that investigated the function of Numb on enteroendocrine fate in the midgut. This is an important study providing evidence of a Numb-mediated back up mechanism for stem cell maintenance.

      Strengths:

      (1) Experiments are consistent with these previous publications while also extending our understanding of how Numb functions in the ISC.

      (2) Provides an interesting model of a "back up" protection mechanism for ISC maintenance.

      Weaknesses:

      (1) Aspects of the experiments could be better controlled or annotated:

      (a) As they "randomly chose" the regions analyzed, it would be better to have all from a defined region (R4 or R2, for example) or to at least note the region as there are important regional differences for some aspects of midgut biology.

      Thank you. Since we mainly focus on region 4, we have added the clarification in the manuscript.

      (b) It is not clear to me why MARCM clones were induced and then flies grown at 18°C? It would help to explain why they used this unconventional protocol.

      To avoid spontaneous clones, we kept the flies at 18°C.

      (2) There are technical limitations with trying to conclude from double-knockdown experiments in the ISC lineage, such as those in Figure 1 where Dl and put are both being knocked down: depending on how fast both proteins are depleted, it may be that only one of them (put, for example) is inactivated and affects the fate decision prior to the other one (Dl) being depleted. Therefore, it is difficult to definitively conclude that the decision is independent of Dl ligand.

      In our hands, Dl-RNAi is very effective and exhibited loss of N pathway activity as determined by the N pathway reporter Su(H)-lacZ (Fig. 1D). Therefore, the ectopic Su(H)-lacZ expression in Punt Dl double RNAi (Fig. 1E) is unlikely to be due to residual Dl expression. Nevertheless, we will change the statement “BMP signaling blocks ligand-independent N activity” to “Loss of BMP signaling results in ectopic N pathway activity even when Dl is depleted”.

      (3) Additional quantification of many phenotypes would be desired.

      (a) It would be useful to see esg-GFP cells/total cells and not just field as the density might change (2E for example).

      We focused on R4 region for quantification where the cell density did not exhibit apparent change in different experimental groups. In addition, we have examined many guts for quantification. It is unlikely that the difference in the esg+ cell number is caused by change in cell density.

      (b) Similarly, for 2F and 2G, it would be nice to see the % of ISC/ total cell and EB/total cell and not only per esgGFP+ cell.

      Unfortunately, we didn’t have the suggested quantification. However, we believe that quantification of the percentage of ISC or EB among all progenitor cells, as we did here, provides a faithful measurement of the self-renewal status of each experimental group.

      (c) Fig1: There is no quantification - specifically it would be interesting to know how many esg+ are su(H)lacZ positive in Put- Dl- condition compared to WT or Put- alone. What is the n?

      Quantification will be added.

      (d) Fig2: Pros + cells are not seen in the image? Are they all DllacZ+?

      Anti-Pros and anti-E(spl)mβ-CD2 were stained in the same channel (magenta).  Pros+ is nuclear dot-like staining, while CD2 outlined the cell membrane of EB cell.

      (e) Fig3: it would be nice to have the size clone quantification instead of the distribution between groups of 2 cell 3 cells 4 cell clones.

      Thank you for your suggestion. In this study, we have quantified the clone size of each clone and calculated the average size for each genotype. However, the frequency distribution analysis was chosen because it highlights the significance of the clone size differences among genotypes.

      (f) How many times were experiments performed?

      All experiments were performed 3 times.

      (4) The authors do not comment on the reduction of clone size in DSS treatment in Figure 6K. How do they interpret this? Does it conflict with their model of Bleo vs DSS?

      Guts containing numb<sup>4</sup> clones and treated with DSS exhibited a slight reduction of clone size, evident from a higher percentage of 2-cell clones and a lower percentage of > 8 cell clones. This reduction is less significant in guts containing numb<sup>15</sup> clones. However, the percentage of Dl<sup>+</sup>-containing clones is similar between DSS and mock-treated guts. It is possible that ISC proliferation is slightly reduced due to the numb<sup>4</sup> mutation or the genetic background.

      (5) There is probably a mistake on sentence line 314 -316 "Indeed, previous studies indicate that endogenous Numb was not undetectable by Numb antibodies that could detect Numb expression in the nervous system".

      We will make a correction of the sentence.

      Reviewer #3 (Public review):

      Summary:

      The authors provide an in-depth analysis of the function of Numb in adult Drosophila midgut. Based on RNAi combinations and double mutant clonal analyses, they propose that Numb has a function in inhibiting Notch pathway to maintain intestinal stem cells, and is a backup mechanism with BMP pathway in maintaining midgut stem cell mediated homeostasis.

      Strengths:

      Overall, this is a carefully constructed series of experiments, and the results and statistical analyses provide believable evidence that Numb has a role, albeit weak compared to other pathways, in sustaining ISC and in promoting regeneration especially after damage by bleomycin, which may damage enterocytes and therefore disrupt BMP pathway more. The results overall support their claim.

      The data are highly coherent, and support a genetic function of Numb, in collaborating with BMP signaling, to maintain the number and proliferative function of ISCs in adult midguts. The authors used appropriate and sophisticated genetic tools of double RNAi, mutant clonal analysis and dual marker stem cell tracing approaches to ensure the results are reproducible and consistent. The statistical analyses provide confidence that the phenotypic changes are reliable albeit weaker than many other mutants previously studied.

      Weaknesses:

      In the absence of Numb itself, the midgut has a weak reduction of ISC number (Fig. 3 and 5), as well as weak albeit not statistically significant reduction of ISC clone size/proliferation. I think the authors published similar experiments with BMP pathway mutants. The mad<sup>1-2</sup> allele used here as stated below may not be very representative of other BMP pathway mutants. Therefore, it could be beneficial to compare ISC number and clone sizes between other BMP experiments to provide the readers with a clearer picture of how these two pathways individually contribute (stronger/weaker effects) to the ISC number and gut homeostasis.

      Thank you for your comment. We have tested other components of the BMP pathway in our previous study (Tian et al., 2014). More complete loss of BMP signaling (for example, Put clones, Put RNAi, Tkv/Sax double mutant clones or double RNAi) resulted in ISC loss regardless of the status of numb, suggesting a more predominant role of BMP signaling in ISC self-renewal compared with Numb. We speculate that the weak stem cell loss phenotype associated with numb mutant clones in an otherwise wild type background could be due to fluctuation of BMP signaling in homeostatic guts.

      The main weakness of this manuscript is the analysis of the BMP pathway components, especially the mad<sup>1-2</sup> allele. The mad RNAi and mad<sup>1-2</sup> alleles (P insertion) are supposed to be weak alleles and that might be suitable for genetic enhancement assays here together with numb RNAi. However, the mad<sup>1-2</sup> allele, and sometimes the mad RNAi, showed weakly increased ISC clone size. This is kind of counter-intuitive that they should have a similar ISC loss and ISC clone size reduction.

      We used mad<sup>1-2</sup> and mad RNAi here to test the genetic interaction with numb because our previous studies showed that partial loss of BMP signaling under these conditions did not cause stem cell loss and therefore may provide a sensitized background to determine the role of Numb in ISC self-renewal. The increased ISC proliferation/clone size associated with mad<sup>1-2</sup> and mad RNAi is due to the fact that the reduction of BMP signaling in either EC or EB will non-autonomously induce stem cell proliferation. However, in mad numb double mutant clones, there was a reduction in clone size, which correlated with loss of ISC.

      A much stronger phenotype was observed when numb mutants were subject to treatment with the tissue-damaging agent Bleomycin, which causes damage in different ways than DSS. Bleomycin was previously shown to cause mainly enterocyte damage, and is therefore more likely to disrupt BMP signaling from ECs. Therefore, this treatment together with loss of numb led to a highly significant reduction of ISC in clones and reduction of clone size/proliferation. One improvement is that it is not clear whether the authors discussed the nature of the two numb mutant alleles used in this study and the comparison to the strength of the RNAi allele. Because the phenotypes are weak and more variable, the use of specific reagents is important.

      Numb<sup>15</sup> is a null allele, and the nature of numb<sup>4</sup> has not been elucidated. According to Domingos, P.M. et al., numb<sup>15</sup> induced a more severe phenotype than numb<sup>4</sup> did. Consistently, we also found that more numb<sup>15</sup> mutant clones were void of stem cell than numb<sup>4</sup>.

      Furthermore, the use of possible activating alleles of either or both pathways to test genetic enhancement or synergistic activation will provide strong support for the claims.

      Activation of BMP (Tkv<sup>CA</sup>) also induced stem cell tumors (Tian et al., 2014), which makes it unsuitable for synergistic activation experiments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study offers a useful treatment of how the population of excitatory and inhibitory neurons integrates principles of energy efficiency in their coding strategies. The analysis provides a comprehensive characterisation of the model, highlighting the structured connectivity between excitatory and inhibitory neurons. However, the manuscript provides an incomplete motivation for parameter choices. Furthermore, the work is insufficiently contextualized within the literature, and some of the findings appear overlapping and incremental given previous work.

      We are genuinely grateful to the Editors and Reviewers for taking time to provide extremely valuable suggestions and comments, which will help us to substantially improve our paper. We decided to do our very best to implement all suggestions, as detailed in the point-by-point rebuttal letter below. We feel that our paper has improved considerably as a result. 

      Public Reviews:

      Reviewer #1 (Public Review): 

      Summary: Koren et al. derive and analyse a spiking network model optimised to represent external signals using the minimum number of spikes. Unlike most prior work using a similar setup, the network includes separate populations of excitatory and inhibitory neurons. The authors show that the optimised connectivity has a like-to-like structure, leading to the experimentally observed phenomenon of feature competition. They also characterise the impact of various (hyper)parameters, such as adaptation timescale, ratio of excitatory to inhibitory cells, regularisation strength, and background current. These results add useful biological realism to a particular model of efficient coding. However, not all claims seem fully supported by the evidence. Specifically, several biological features, such as the ratio of excitatory to inhibitory neurons, which the authors claim to explain through efficient coding, might be contingent on arbitrary modelling choices. In addition, earlier work has already established the importance of structured connectivity for feature competition. A clearer presentation of modelling choices, limitations, and prior work could improve the manuscript.

      Thanks for these insights and for this summary of our work.  

      Major comments:

      (1) Much is made of the 4:1 ratio between excitatory and inhibitory neurons, which the authors claim to explain through efficient coding. I see two issues with this conclusion: (i) The 4:1 ratio is specific to rodents; humans have an approximate 2:1 ratio (see Fang & Xia et al., Science 2022 and references therein); (ii) the optimal ratio in the model depends on a seemingly arbitrary choice of hyperparameters, particularly the weighting of encoding error versus metabolic cost. This second concern applies to several other results, including the strength of inhibitory versus excitatory synapses. While the model can, therefore, be made consistent with biological data, this requires auxiliary assumptions.

      We now describe better the ratio of numbers of E and I neurons found in real data, as suggested. The first submission already contained an analysis of how the optimal ratio of E vs I neuron numbers depends in our model on the relative weighting of the loss of E and I neurons and on the relative weighting of the encoding error vs the metabolic cost in the loss function (see Fig. 7E). We revised the text on page 12 describing Fig. 7E. 

      To allow readers to easily form a clear idea of how the weighting of the error vs the cost may influence the optimal network configuration, we now systematically present how optimal parameters depend on this weighting, by including this type of analysis when studying all other model parameters (time constants of single E and I neurons, noise intensity, metabolic constant, ratio of mean I-I to E-I connectivity). These results are shown in Supplementary Fig. S4 A-D and H, and we comment briefly on each of them in the Results sections (pages 9, 10, 11 and 12) that analyze each of these parameters.

      Following this Reviewer’s comment, we now include a joint analysis of network performance relative to the ratio of E-I neuron numbers and the ratio of mean I-I to E-I connectivity (Fig. 7J). We found a positive correlation between the optimal values of these two ratios. This implies that a lower ratio of E-I neuron numbers, such as the 2:1 ratio in human cortex mentioned by the reviewer, predicts a lower optimal ratio of I-I to E-I connectivity and thus weaker inhibition in the network. We made sure that this finding is suitably described in the revision (page 13).

      (2) A growing body of evidence supports the importance of structured E-I and I-E connectivity for feature selectivity and response to perturbations. For example, this is a major conclusion from the Oldenburg paper (reference 62 in the manuscript), which includes extensive modelling work. Similar conclusions can be found in work from Znamenskiy and colleagues (experiments and spiking network model; bioRxiv 2018, Neuron 2023 (ref. 82)), Sadeh & Clopath (rate network; eLife, 2020), and Mackwood et al. (rate network with plasticity; eLife, 2021). The current manuscript adds to this evidence by showing that (a particular implementation of) efficient coding in spiking networks leads to structured connectivity. The fact that this structured connectivity then explains perturbation responses is, in the light of earlier findings, not new.

      We agree that the main contribution of our manuscript in this respect is to show how efficient coding in spiking networks can lead to structured connectivity implementing lateral inhibition similar to that proposed in the recent studies mentioned by the Reviewer. We apologize if this was not clear enough in the previous version. We streamlined the presentation to make it clearer in revision.  We nevertheless think it useful to report the effects of perturbations within this network because these results give information about how lateral inhibition works in our network. Thus, we kept presenting it in the revised version, although we de-emphasized and simplified its presentation. We now give more emphasis to the novelty of the derivation of this connectivity rule from the principles of efficient coding (pages 4 and 6). We also describe better (page 8) what the specific results of our simulated perturbation experiments add to the existing literature.

      (3) The model's limitations are hard to discern, being relegated to the manuscript's last and rather equivocal paragraph. For instance, the lack of recurrent excitation, crucial in neural dynamics and computation, likely influences the results: neuronal time constants must be as large as the target readout (Figure 4), presumably because the network cannot integrate the signal without recurrent excitation. However, this and other results are not presented in tandem with relevant caveats.

      We improved the Limitations paragraph in Discussion, and also anticipated caveats in tandem with results when needed, as suggested. 

      We now mention the assumption of equal time constants between the targets and readouts in the Abstract. 

      We have now added an analysis of the network performance and dynamics as a function of the time constant of the target (τ<sub>x</sub>) to Supplementary Fig. S5 (C-E). These results are briefly discussed in the text on page 13. The only measure sensitive to τ<sub>x</sub> is the encoding error of E neurons, with a minimum at τ<sub>x</sub> = 9 ms, while I neurons and metabolic cost show no dependency. Firing rates, variability of spiking, and the average and instantaneous balance show no dependency on τ<sub>x</sub>. We note that τ<sub>x</sub> = τ, with τ = 1/λ the time constant of the population readout (Eq. 9), is an assumption we use when we derive the model from the efficiency objective (Eq. 18 to 23). In our new and preliminary work (Koren, Emanuel, Panzeri, bioRxiv 2024), we derived a more general class of models where this assumption is relaxed, which gives a network with E-E connectivity that adapts to the time constant of the stimulus. Thus, the reviewer is correct in the intuition that the network requires E-E connectivity to better integrate target signals with a time constant different from that of the membrane. We now better emphasize this limitation in Discussion (page 16).

      (4) On repeated occasions, results from the model are referred to as predictions claimed to match the data. A prediction is a statement about what will happen in the future – but most of the “predictions” from the model are actually findings that broadly match earlier experimental results, making them “postdictions”.

      This distinction is important: compared to postdictions, predictions are a much stronger test because they are falsifiable. This is especially relevant given (my impression) that key parameters of the model were tweaked to match the data.

      We now comment on every result from the model as either matching earlier experimental results, or being a prediction for experiments. 

      In Section “Assumptions and emergent properties of the efficient E-I network derived from first principles”, we report (page 4) that neural networks have connectivity structure that relates to tuning similarity of neurons (postdiction). 

      In Section “Encoding performance and neural dynamics in an optimally efficient E-I network” we report (page 5) that in a network with optimal parameters, I neurons have higher firing rate than E neurons (postdiction), that single neurons show temporally correlated synaptic currents (postdiction) and that the distribution of firing rates across neurons is log-normal (postdiction). 

      In Section “Competition across neurons with similar stimulus tuning emerging in efficient spiking networks” we report (page 6)  that the activity perturbation of E neurons induces lateral inhibition on other E neurons, and that the strength of lateral inhibition depends on tuning similarity (postdiction). We show that activity perturbation of E neurons induces lateral excitation in I neurons (prediction). We moreover show that the specific effects of the perturbation of neural activity rely on structured E-I-E connectivity (prediction for experiments, but similar result in Sadeh and Clopath, 2020). We show strong voltage correlations but weak spike-timing correlations in our network (prediction for experiments, but similar result in Boerlin et al. 2013). 

      In Section “The effect of structured connectivity on coding efficiency and neural dynamics”, we report (page 7) that our model predicts a number of differences between networks with structured and unstructured (random) connectivity. In particular, structured networks differ from unstructured ones by showing better encoding performance, lower metabolic cost, weaker variance over time in the membrane potential of each neuron, lower firing rates and weaker average and instantaneous balance of synaptic currents.

      In Section “Weak or no spike-triggered adaptation optimizes network efficiency”, we report (page 9) that our model predicts better encoding performance in networks with adaptation compared to facilitation. Our results suggest that adaptation should be stronger in E compared to I (PV+) neurons (postdiction). In the same section, we report (page 10) that our results suggest that the instantaneous balance is a better predictor of model efficiency than average balance (prediction).

      In Section “Non-specific currents regulate network coding properties”, we report (page 10) that our model predicts that more than half of the distance between the resting potential and firing threshold is taken by external currents that are unrelated to feedforward processing (postdiction). We also report (page 11) that our model predicts that moderate levels of uncorrelated (additive) noise are beneficial for efficiency (prediction for experiments, but similar results in Chalk et al., 2016, Koren et al., 2017, Timcheck et al. 2022).

      In Section “Optimal ratio of E-I neuron numbers and of mean I-I to E-I synaptic efficacy coincide with biophysical measurements”, we predict the optimal ratio of E to I neuron numbers to be 4:1 (postdiction) and the optimal ratio of mean I-I to E-I connectivity to be 3:1 (postdiction). Further, we report (page 13) that our results predict that a decrease in the ratio of E-I neuron numbers is accompanied by a decrease in the ratio of mean I-I to E-I connectivity.

      Finally, in Section “Dependence of efficient coding and neural dynamics on the stimulus statistics”, we report (page 13) that our model predicts that the efficiency of the network has almost no dependence on the time scale of the stimulus (prediction). 

      Reviewer #2 (Public Review):

      Summary:

      In this work, the authors present a biologically plausible, efficient E-I spiking network model and study various aspects of the model and its relation to experimental observations. This includes a derivation of the network into two (E-I) populations, the study of single-neuron perturbations and lateral-inhibition, the study of the effects of adaptation and metabolic cost, and considerations of optimal parameters. From this, they conclude that their work puts forth a plausible implementation of efficient coding that matches several experimental findings, including feature-specific inhibition, tight instantaneous balance, a 4 to 1 ratio of excitatory to inhibitory neurons, and a 3 to 1 ratio of I-I to E-I connectivity strength. It thus argues that some of these observations may come as a direct consequence of efficient coding.

      Strengths:

      While many network implementations of efficient coding have been developed, such normative models are often abstract and lacking sufficient detail to compare directly to experiments. The intention of this work to produce a more plausible and efficient spiking model and compare it with experimental data is important and necessary in order to test these models.

      In rigorously deriving the model with real physical units, this work maps efficient spiking networks onto other more classical biophysical spiking neuron models. It also attempts to compare the model to recent single-neuron perturbation experiments, as well as some longstanding puzzles about neural circuits, such as the presence of separate excitatory and inhibitory neurons, the ratio of excitatory to inhibitory neurons, and E/I balance. One of the primary goals of this paper, to determine if these are merely biological constraints or come from some normative efficient coding objective, is also important.

      Though several of the observations have been reported and studied before (see below), this work arguably studies them in more depth, which could be useful for comparing more directly to experiments.

      Thanks for these insights and for the kind words of appreciation of the strengths of our work.  

      Weaknesses:

      Though the text of the paper may suggest otherwise, many of the modeling choices and observations found in the paper have been introduced in previous work on efficient spiking models, thereby making this work somewhat repetitive and incremental at times. This includes the derivation of the network into separate excitatory and inhibitory populations, discussion of physical units, comparison of voltage versus spike-timing correlations, and instantaneous E/I balance, all of which can be found in one of the first efficient spiking network papers (Boerlin et al. 2013), as well as in subsequent papers. Metabolic cost and slow adaptation currents were also presented in a previous study (Gutierrez & Deneve 2019). Though it is perfectly fine and reasonable to build upon these previous studies, the language of the text gives them insufficient credit.

      We indeed built our work on these important previous studies, and we apologize if this was not clear enough. We thus improved the text to make sure that credit to previous studies is more precisely and more clearly given (see detailed reply for the list of changes made). 

      To facilitate the understanding of how we built on previous work, we expanded the comparison of our results with the results of Boerlin et al. (2013) about voltage correlations and uncorrelated spiking (page 7), the comparison with the derivation of physical units of Boerlin et al. (2013) (page 3), the discussion of how results on the ratio of the number of E to I neurons relate to Calaim et al. (2022) and Barrett et al. (2016) (page 16), and the comment on the previous work by Gutierrez and Deneve about adaptation (page 8).

      Furthermore, the paper makes several claims of optimality that are not convincing enough, as they are only verified by a limited parameter sweep of single parameters at a time, are unintuitive and may be in conflict with previous findings of efficient spiking networks. This includes the following. 

      Coding error (RMSE) has a minimum at intermediate metabolic cost (Figure 5B), despite the fact that intuitively, zero metabolic cost would indicate that the network is solely minimizing coding error and that previous work has suggested that additional costs bias the output. 

      Coding error also appears to have a minimum at intermediate values of the ratio of E to I neurons (effectively the number of I neurons) and the number of encoded variables (Figures 6D, 7B). These both have to do with the redundancy in the network (number of neurons for each encoded variable), and previous work suggests that networks can code for arbitrary numbers of variables provided the redundancy is high enough (e.g., Calaim et al. 2022). 

      Lastly, the performance of the E-I variant of the network is shown to be better than that of a single cell type (1CT: Figure 7C, D). Given that the E-I network is performing a similar computation as to the 1CT model but with more neurons (i.e., instead of an E neuron directly providing lateral inhibition to its neighbor, it goes through an interneuron), this is unintuitive and again not supported by previous work. These may be valid emergent properties of the E-I spiking network derived here, but their presentation and description are not sufficient to determine this.

      With regard to the concern that our previous analyses considered optimal parameter sets determined with a sweep of a single parameter at a time, we have addressed this issue in two ways. First, we present (Fig. 6I and 7J and text on pages 11 and 13) results of joint sweeps of pairs of parameters whose joint variations are expected to influence optimality in a way that cannot be understood by varying one parameter at a time. These new analyses complement the joint parameter sweep of the time constants of single E and I neurons (τ<sub>r</sub><sup>E</sup> and τ<sub>r</sub><sup>I</sup>) that was already presented in Fig. 5A (former Fig. 4A). Second, we conducted, within a reasonable/realistic range of possible variations of each individual parameter, a Monte-Carlo random joint sampling (10,000 simulations with 20 trials each) of all 6 model parameters that we explored in the paper. We present these new results in Fig. 2 and discuss them on pages 5-6.

      The Reviewer is correct in stating that the error (RMSE) exhibits a counterintuitive minimum as a function of the metabolic constant despite the fact that, intuitively, for vanishing metabolic constant the network is solely minimizing the coding error (Fig. 6B). In our understanding, this counterintuitive finding is due to the presence of noise in the membrane potential dynamics. In the presence of noise, a non-vanishing metabolic constant is needed to suppress “inefficient” spikes purely induced by noise that do not contribute to coding and increase the error. This gives rise to a form of “stochastic resonance”, where the noise improves detection of the signal coming from the feedforward currents. We note that the metabolic constant and the noise variance both appear in the non-specific external current (Eq. 29f in Methods), and, thus, a covariation in their optimal values is expected. Indeed, we find that the optimal metabolic constant monotonically increases as a function of the noise variance, with stronger regularization (larger beta) required to compensate for larger variability (larger sigma) (Fig. 6I). Finally, we note that a moderate level of noise (which, in turn, induces a non-trivial minimum of the coding error as a function of beta) in the network is optimal. The beneficial effect of moderate levels of noise on performance in networks with efficient coding has been shown in different contexts in previous work (Chalk et al. 2016, Koren and Deneve, 2017). The intuition is that the noise prevents the excessive synchronization of the network and insufficient single neuron variability that decrease the performance. The points above are now explained in the revised text on page 11.

      The Reviewer is also correct in stating that the network exhibits an optimal performance for intermediate values of the number of I neurons and the number of encoded features. In our understanding, the optimal number of encoded features of M=3 arises simply because all the other parameters were optimized for that value of M. The purpose of those analyses was not to state that a network optimally encodes only a given number of features, but to show that a network whose parameters are optimized for a given M performs reasonably well when M is varied. We clarify this on page 13 of Results and in Discussion on page 16. In the same Discussion paragraph we also refer to the results of Calaim et al. mentioned by the Reviewer.

      To address the concern about the comparison of efficiency between the E-I and the 1CT model, we took advantage of the Reviewer’s suggestions to consider this issue more deeply. In revision, we now compare the efficiency of the 1CT model with the E population of the E-I model (Fig. 8H). This new comparison changes the conclusion about which model is more efficient, as it shows the 1CT model is slightly more efficient than the E-I model. Nevertheless, the E-I model performance is more robust to small variations of optimal parameters, e.g., it exhibits biologically plausible firing rates for non-optimal values of the metabolic constant. See also the reply to point 3 of the Public Review of Reviewer 2 for more detail. We added these results and the ensuing caveats for the interpretation of this comparison on Page 14, and also revised the title of the last subsection of Results.  

      Alternatively, the methodology of the model suggests that ad hoc modeling choices may be playing a role. For example, an arbitrary weighting of coding error and metabolic cost of 0.7 to 0.3, respectively, is chosen without mention of how this affects the results. Furthermore, the scaling of synaptic weights appears to be controlled separately for each connection type in the network (Table 1), despite the fact that some of these quantities are likely linked in the optimal network derivation. Finally, the optimal threshold and metabolic constants are an order of magnitude larger than the synaptic weights (Table 1). All of these considerations suggest one of the following two possibilities. One, the model has a substantial number of unconstrained parameters to tune, in which case more parameter sweeps would be necessary to definitively make claims of optimality. Or two, parameters are being decoupled from those constrained by the optimal derivation, and the optima simply corresponds to the values that should come out of the derivation.

      We thank the reviewer for raising these important questions.

      In the first submission, we presented both the encoding error and the metabolic cost separately as a function of the parameters, so that readers could get an understanding of how stable optimal parameters would be to the change of the relative weighting of encoding error and metabolic cost. We specified this in Results (page 5) and we kept presenting separately encoding and metabolic terms in the revision.

      However, we agree that it is important to present an explicit quantification of how the optimal parameters may depend on g<sub>L</sub>. In the first submission, we showed the analysis for all possible weightings in the case of the two parameters for which we found this analysis most relevant: the ratio of neuron numbers (Fig. 7E, Fig. 6E in first submission) and the optimal number of input features M (see last paragraph on page 13 and Fig. 8D). We now show this analysis also for the rest of the studied model parameters in Supplementary Fig. S4 (A-D and H). This is discussed on pages 9, 10, 11 and 12.
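
      The dependence of an optimum on the weighting can be illustrated with a minimal sketch. It assumes that the loss is a convex combination of encoding error and metabolic cost with weight g<sub>L</sub> (consistent with the 0.7 to 0.3 weighting mentioned by the Reviewer), and it uses two hypothetical quadratic curves in place of the simulated error and cost. The function names, the curves, and the parameter grid are illustrative assumptions and are not taken from the model.

```python
# Toy illustration only: the error and cost curves below are hypothetical
# stand-ins for quantities that would come from network simulations.

def weighted_loss(error, cost, g_l):
    """Convex combination of encoding error and metabolic cost (assumed form)."""
    return g_l * error + (1.0 - g_l) * cost


def toy_error(p):
    # Hypothetical encoding-error curve with its minimum at p = 4.0
    return (p - 4.0) ** 2 + 1.0


def toy_cost(p):
    # Hypothetical metabolic-cost curve with its minimum at p = 2.0
    return (p - 2.0) ** 2 + 1.0


def optimal_parameter(g_l, grid):
    """Return the grid value that minimizes the weighted loss for weighting g_l."""
    return min(grid, key=lambda p: weighted_loss(toy_error(p), toy_cost(p), g_l))


# Candidate parameter values 0.0, 0.1, ..., 6.0
grid = [i / 10 for i in range(0, 61)]

# Optimal parameter for several weightings of error versus cost
optima = {g_l: optimal_parameter(g_l, grid) for g_l in (0.0, 0.3, 0.5, 0.7, 1.0)}
```

      In this toy setting the optimum moves smoothly from the minimum of the cost curve (weight 0) to the minimum of the error curve (weight 1), which is the kind of dependence that a sweep over all weightings is meant to expose.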

      With regard to the concern that the scaling of synaptic weights should not be controlled separately for each connection type in the network, we agree and we would like to clarify that we did not control such scaling separately. Apologies if this was not clear enough. From the optimal analytical solution, we obtained that the connectivity scales with the standard deviation of decoding weights (σ<sub>w</sub><sup>E</sup> and σ<sub>w</sub><sup>I</sup>) of the pre- and postsynaptic populations (Methods, Eq. 32). We studied the network properties as a function of the ratio of average I-I to E-I connectivity (Fig. 7 F-I; Supplementary Fig. S4 D-H), which is equivalent to the ratio of standard deviations σ<sub>w</sub><sup>I</sup>/σ<sub>w</sub><sup>E</sup> (see Methods, Eq. 35). We clarified this in the text on page 12.

      Next, it is correct that our synaptic weights are an order of magnitude smaller than the metabolic constant. We analysed a simpler version of the network that has coding and dynamics identical to our full model (Methods, Eq. 25) but without the external currents. We found that the optimal parameters determining the firing threshold in such a simpler network were biologically implausible (see Supplementary Text 2 and Supplementary Table S1). As another simple solution, we considered rescaling the synaptic efficacy so as to obtain a biologically plausible threshold. However, that gave an implausible mean synaptic efficacy (see Supplementary Text 2). Thus, to be able to define a network with a biologically plausible firing threshold and mean synaptic efficacy, we introduced the non-specific external current. After introducing such a current, we were able to shift the firing threshold to biologically plausible values while keeping realistic values of mean synaptic efficacy. Biologically plausible values for the firing threshold are around 15–20 mV above the resting potential (Constantinople and Bruno, 2013), which is the value that we have in our model. A plausible value for the average synaptic strength is between a fraction of a millivolt and a couple of millivolts (Constantinople & Bruno, 2013, Campagnola et al. 2022), which also corresponds to the values that the synaptic weights take. The above results are briefly explained in the revised text on page 4.

      Finally, to study the optimality of the network when changing multiple parameters at a time, we added a new analysis with Monte-Carlo random joint sampling (10,000 parameter sets with 20 trials for each set) of all 6 model parameters that we explored in the paper. We compared (Fig. 2) the results of each simulation with those obtained from the understanding gained from varying one or two parameters at a time (optimal parameters reported in Table 1 and used throughout the paper). We found (Fig. 2) that the optimal configuration in Table 1 was never improved upon by any other simulation we performed, and that the first three random simulations that came the closest to the optimal one of Table 1 had stronger noise intensity but also stronger metabolic cost than the configuration in Table 1. The second, third and fourth configurations had longer time constants of both E and I single neurons (adaptation time constants). The ratio of E-I neuron numbers and the ratio of I-I to E-I connectivity in the second, third and fourth best configurations were either jointly increased or jointly decreased with respect to our configuration. These results are reported in Fig. 2 and in Tables 2-3 and are discussed in Results (page 5).
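
      The logic of such a random joint sampling can be sketched as follows. This is a minimal illustration, not the simulation code used in the paper: the six parameter names, their ranges, the reference configuration, and the quadratic toy loss are hypothetical placeholders, and the averaging over trials is omitted. In the actual analysis, the loss for each parameter set would come from simulations of the spiking network.

```python
import random


def sample_parameters(ranges, rng):
    """Draw one parameter set uniformly and independently within each range."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}


def random_joint_search(loss_fn, ranges, n_samples, seed=0):
    """Evaluate loss_fn on n_samples random parameter sets, sorted best first."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_samples):
        params = sample_parameters(ranges, rng)
        results.append((loss_fn(params), params))
    # Sort by loss only; the parameter dictionaries are never compared.
    results.sort(key=lambda item: item[0])
    return results


# Hypothetical ranges for six placeholder parameters (illustrative values only).
RANGES = {f"p{i}": (0.0, 2.0) for i in range(1, 7)}

# Hypothetical reference configuration, standing in for a hand-tuned optimum.
REFERENCE = {name: 1.0 for name in RANGES}


def toy_loss(params):
    # Quadratic bowl centred on the reference configuration. By construction
    # no sample can beat the reference, so this only demonstrates the procedure.
    return sum((params[name] - REFERENCE[name]) ** 2 for name in REFERENCE)


results = random_joint_search(toy_loss, RANGES, n_samples=1000)
reference_loss = toy_loss(REFERENCE)

# Number of random configurations that improve on the reference configuration
n_better = sum(1 for loss, _ in results if loss < reference_loss)

# The closest random configurations, which can be inspected parameter by parameter
closest = results[:4]
```

      Inspecting the best few random configurations, as done above with `closest`, is what allows statements about how the near-optimal parameter sets differ from the reference one.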

      Reviewer #3 (Public Review):

      Summary:

      In their paper the authors tackle three things at once in a theoretical model: how can spiking neural networks perform efficient coding, how can such networks limit the energy use at the same time, and how can this be done in a more biologically realistic way than previous work?

      They start by working from a long-running theory on how networks operating in a precisely balanced state can perform efficient coding. First, they assume split networks of excitatory (E) and inhibitory (I) neurons. The E neurons have the task to represent some lower dimensional input signal, and the I neurons have the task to represent the signal represented by the E neurons. Additionally, the E and I populations should minimize an energy cost represented by the sum of all spikes. All this results in two loss functions for the E and I populations, and the networks are then derived by assuming E and I neurons should only spike if this improves their respective loss. This results in networks of spiking neurons that live in a balanced state, and can accurately represent the network inputs.

      They then investigate in-depth different aspects of the resulting networks, such as responses to perturbations, the effect of following Dale's law, spiking statistics, the excitation (E)/inhibition (I) balance, optimal E/I cell ratios, and others. Overall, they expand on previous work by taking a more biological angle on the theory and showing the networks can operate in a biologically realistic regime.

      Strengths:

      (1) The authors take a much more biological angle on the efficient spiking networks theory than previous work, which is an essential contribution to the field.

      (2) They make a very extensive investigation of many aspects of the network in this context, and do so thoroughly.

      (3) They put sensible constraints on their networks, while still maintaining the good properties these networks should have.

      Thanks for this summary and for these kind words of appreciation of the strengths of our work.  

      Weaknesses:

      (1) The paper has somewhat overstated the significance of their theoretical contributions, and should make much clearer what aspects of the derivations are novel. Large parts were done in very similar ways in previous papers. Specifically: the split into E and I neurons was also done in Boerlin et al (2008) and in Barrett et al (2016). Defining the networks in terms of realistic units was already done by Boerlin et al (2008). It would also be worth it to discuss Barrett et al (2016) specifically more, as there they also use split E/I networks and perform biologically relevant experiments.

      We improved the text to make sure that credit to previous studies is more precisely and more clearly given (see rebuttal to the specific suggestions of Reviewer 2 for a full list).

      We apologize if this was not clear enough in the previous version. 

      With regard to the specific point raised here about the E-I split, we revised the text on page 2. With regard to the realistic units, we revised the text on page 3. Finally, we commented on the relation between our results and the results of the study by Barrett et al. (2016) on page 16.

      (2) It is not clear from an optimization perspective why the split into E and I neurons and following Dale's law would be beneficial. While the constraints of Dale's law are sensible (splitting the population in E and I neurons, and removing any non-Dalian connection), they are imposed from biology and not from any coding principles. A discussion of how this could be done would be much appreciated, and in the main text, this should be made clear.

      We indeed removed non-Dalian connections because Dale’s law is a major constraint for biological plausibility. Our logic was to consider efficient coding within the space of networks that satisfy this (and other) biological plausibility constraints. We did not intend to claim that removing the non-Dalian connections was the result of an analytical optimization. We clarified this in revision (page 4).

      (3) Related to the previous point, the claim that the network with split E and I neurons has a lower average loss than a 1 cell-type (1-CT) network seems incorrect to me. Only the E population coding error should be compared to the 1-CT network loss, or the sum of the E and I populations (not their average). In my author recommendations, I go more in-depth on this point.

      We carefully considered these possibilities and decided to compare only the E population of the E-I model with the 1-CT model. In Fig. 8G (7C in the first submission), E neurons have a slightly higher error and cost than the 1-CT network. In the revision, we compared the loss of the E neurons of the E-I model with the loss of the 1-CT model and found that the 1-CT network has a lower loss and is therefore more efficient. We revised Figure 8H and the text on page 14 to address this point.

      (4) While the paper is supposed to bring the balanced spiking networks they consider in a more experimentally relevant context, for experimental audiences I don't think it is easy to follow how the model works, and I recommend reworking both the main text and methods to improve on that aspect.

      We tried to make the presentation of the model more accessible to a non-computational audience in the revised paper. We carefully edited the text throughout to make it as accessible as possible. 

      Assessment and context:

      Overall, although much of the underlying theory is not necessarily new, the work provides an important addition to the field. The authors succeeded well in their goal of making the networks more biologically realistic, and incorporating aspects of energy efficiency. For computational neuroscientists, this paper is a good example of how to build models that link well to experimental knowledge and constraints, while still being computationally and mathematically tractable. For experimental readers, the model provides a clearer link between efficient coding spiking networks to known experimental constraints and provides a few predictions.

      Thanks for these kind words. We revised the paper to make sure that these points emerge more clearly and in a more accessible way from the revised paper.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Referring to the major comments:

      (1) Be upfront about particular modelling choices and why you made them; avoid talk of a "striking/surprising", etc. ability to explain data when this actually requires otherwise-arbitrary choices and auxiliary assumptions. Ideally, this nuance is already clear from the abstract.

      We removed all the "striking/surprising" and similar expressions from the text. 

      We added to the Abstract the assumption of equal time constants of the stimulus and of the membrane of E and I neurons and the assumption of the independence of encoded stimulus features.

      In revision, we performed additional analyses (joint parameter sweeps, Monte-Carlo joint sampling of all 6 model parameters) providing additional evidence that the network parameters in Table 1 capture reasonably well the optimal solution. These are reported on Figs. 2, 6I and 7J and in Results (pages 5, 11 and 13). See rebuttal to weaknesses of the public review of the Referee 2 for details.

      (2) Make even more of an effort to acknowledge prior work on the importance of structured E-I and I-E connectivity.

      We have revised the text (page 4) to better place our results within previous work on structured E-I and I-E connectivity.

      (3) Be clear about the model's limitations and mention them throughout the text. This will allow readers to interpret your results appropriately.

      We now comment more on the model's limitations, in particular the simplifying assumption about the network's computation (page 16), the lack of E-E connectivity (page 3), the absence of long-term adaptation (page 10), and the simplification of having only one type of inhibitory neuron (page 16).

      (4) Present your "predictions" for what they are: aspects of the model that can be made consistent with the existing data after some fitting. Except in the few cases where you make actual predictions, which deserve to be highlighted.

      We followed the suggestion of the reviewer and distinguished cases where the model is consistent with the data (postdictions) from actual predictions, where empirical measurements are not available or not conclusive. We compiled a list of predictions and postdictions in response to point 4 of Reviewer 1. In the revision, we now describe every property of the model either as reproducing a known property of biological networks (postdiction) or as a prediction. We improved the text in Results on pages 4, 5, 6, 7, 9, 10, 11, 12 and 13 to accommodate these requests.

      Minor comments and recommendations

      It's a sizable list, but most can be addressed with some text edits.

      (1) The image captions should give more details about the simulations and analyses, particularly regarding sample sizes and statistical tests. In Figure 5, for example, it is unclear if the lines represent averages over multiple signals and, if so, how many. It's probably not a single realization, but if it is, this might explain the otherwise puzzling optimal number of three stimuli. Box plots visualize the distribution across simulation trials, but it's not clear how many. In Figure 7d, a star suggests statistical significance, but the caption does not mention the test or its results; the y-axis should also have larger limits.

      All statistical results were computed on 100 or 200 simulation trials, depending on the figure, with each trial lasting 1 second of simulated time. To compute the statistical results in Fig. 1, we used 10 trials of 10 seconds each. Each trial consisted of M independent realizations of Ornstein-Uhlenbeck (OU) processes as stimuli, independent noise in the membrane potential and an independent draw of tuning parameters, so that the results generalize across specific realizations of these random variables. Realizations of the OU processes were independent across stimulus dimensions and across trials. We added this information to the caption of each figure.
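
      As an illustration of the stimulus construction described above, the following is a minimal sketch of how M independent OU processes could be generated for one trial. It is not the authors' code: the function name `simulate_ou`, the Euler-Maruyama discretization, and the values of `dt`, `tau` and `sigma` are assumptions chosen only for illustration.

```python
import numpy as np

def simulate_ou(n_dims, n_steps, dt, tau, sigma, rng):
    """Euler-Maruyama simulation of n_dims independent, zero-mean OU processes.

    tau may be a scalar or an array of length n_dims (one time constant per
    stimulus dimension). All parameter values are illustrative assumptions.
    """
    x = np.zeros((n_steps, n_dims))
    # Independent Gaussian increments for every time step and dimension.
    noise = rng.standard_normal((n_steps - 1, n_dims))
    for t in range(n_steps - 1):
        drift = -(x[t] / tau) * dt
        diffusion = sigma * np.sqrt(dt) * noise[t]
        x[t + 1] = x[t] + drift + diffusion
    return x

# One trial of 1 s simulated time with M = 3 stimulus features
# (dt = 0.1 ms and tau = 10 ms are hypothetical choices).
rng = np.random.default_rng(0)
stimuli = simulate_ou(n_dims=3, n_steps=10_000, dt=1e-4, tau=0.01, sigma=1.0, rng=rng)
```

      Drawing a fresh random generator state for every trial gives realizations that are independent across trials, and passing an array for `tau` would give each stimulus dimension its own time constant.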

      The optimal number of M=3 stimuli is the result of measuring the performance of the network in 100 simulation trials (for each parameter value), thus following the same procedure as for all other parameters. Boxplots in Fig. 8G-H were also generated from results computed in 100 simulation trials, which we have now specified in the caption of the figure, together with the statistical test used for assessing significance (two-tailed t-test). We also enlarged the limits of Fig. 8H (7D in the previous version).

      (2) The Oldenburg paper (reference 62) finds suppression of all but nearby neurons in response to two-photon stimulation of small neural ensembles (instead of single neurons, as in Chettih & Harvey). This isn't perfectly consistent with the model's results, even though the Oldenburg experiments seem more relevant given the model's small size, and strong connectivity/high connection probability between similarly tuned neurons. What might explain the potential mismatch?

      We sincerely apologize for not having been precise enough on this point when comparing our model against Chettih & Harvey and Oldenburg et al. We corrected the sentence (page 6) to remove the claim that our model reproduces both. 

      We speculate that the discrepancy between perturbing our model and the Oldenburg data may arise from the lack of E-E connectivity in our model. Synaptic connections between E neurons with similar selectivity could create enhancement instead of suppression between neuronal pairs with very similar tuning. We added a sentence about this in the section with perturbation experiments “Competition across neurons with similar stimulus tuning emerging in efficient spiking networks” (page 7), where we discuss this limitation of our model. We feel that this example shows the utility of deriving perturbation results from our model, as not all networks with some degree of lateral inhibition will show the same perturbation results. Comparing perturbation results from our model with perturbation results from real data thus helps to better appreciate the strengths and limitations of our approach.

      (3) "Previous studies optogenetically stimulated E neurons but did not determine whether the recorded neurons were excitatory or inhibitory " (p. 11). I believe Oldenburg et al. did specifically image excitatory neurons.

      The reviewer is correct about Oldenburg et al. imaging specifically excitatory neurons. We have revised this part of the Discussion (page 15). 

      (4) The authors write that efficiency is particularly achieved where adaptation is stronger in E compared to I neurons (p. 7; Figure 4). Although this would be consistent with experimental data (the I neurons in the model seem akin to fast-spiking Pv+ cells), I struggle to see it in the figure. Instead, it seems like there are roughly two regimes. If either of the neuronal timescales is faster than the stimulus timescale, the optimisation fails. If both are at least as slow, optimisation succeeds.

      We agree with the reviewer that the adaptation properties of our inhibitory neurons are compatible with Pv+ cells. What is essential for determining the dynamical regime of the network is less the relation to the time constant of the stimulus (t<sub>x</sub>) but rather the relation between the time constant of the population readout (t, which is also the membrane time constant) and the time constant of the single neuron (t<sub>r</sub><sup>y</sup> for y=E and y=I; see Eq. 23, 25 or 29e). The relation between t and t<sub>r</sub><sup>y</sup> determines if single neurons generate spike-triggered adaptation (t<sub>r</sub><sup>y</sup> > t) or spike-triggered facilitation (t<sub>r</sub><sup>y</sup> < t; see Table 4). In regimes with facilitation in either E or I neurons (or both), the network performance strongly deteriorates compared to regimes with adaptation (Fig. 5A). 
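
      The rule stated above can be summarized in a few lines. This is only a sketch of the stated inequalities; the function name `spike_triggered_regime` and the example values are hypothetical, and `tau` and `tau_r` stand for the time constants written as t and t<sub>r</sub><sup>y</sup> in the text.

```python
def spike_triggered_regime(tau_r, tau):
    """Classify the single-neuron regime from the two time constants.

    tau_r: single-neuron time constant (for y = E or y = I).
    tau:   time constant of the population readout (membrane time constant).
    """
    if tau_r > tau:
        return "adaptation"
    if tau_r < tau:
        return "facilitation"
    # Equal time constants: the spike-triggered current vanishes.
    return "none"

# Hypothetical values in seconds, for illustration only.
regime_e = spike_triggered_regime(tau_r=0.020, tau=0.010)
regime_i = spike_triggered_regime(tau_r=0.005, tau=0.010)
```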

      Beyond adaptation leading to better performance, we also found different effects of adaptation in E and I neurons. We acknowledge that the difference of these effects was difficult to see from the Fig. 4B in the first submission. We have now replotted results from previously shown Fig. 4B to focus on the adaptation regime only, (since the Fig. 5A already establishes that this is the regime with better performance). We also added figures showing the differential effect of adaptation in E and I cell type on the firing rate and on the average loss (Fig. 5C-D). Fig. 5B and C (top plots) show that with adaptation in E neurons, the error and the loss increase more slowly than with adaptation in I neurons. Moreover, the firing rate in both cell types decreases with adaptation in E neurons, while this is not the case with adaptation in I neurons (Fig. 5D). These results are added to the figure panels specified above and discussed in text on page 9.

      To clarify the relation between neuronal and stimulus timescale, we now also added the analysis of network performance as a function of the time constant of the stimulus t<sub>x</sub> (Supplementary Fig. S5 C-E). We found that the model's performance is optimal when the time constant of the stimulus is close to the membrane time constant t. This result is expected, because the equality of these time constants was imposed in our analytical derivation of the model (t<sub>x</sub>  = t). We see a similar decrease in performance for values of t<sub>x</sub>  that are faster and slower with respect to the membrane time constant (Supplementary Fig. S5C, top). These results are added to the figure panels specified above and discussed in text on page 13.

      (5) A key functional property of cortical interneurons is their lower stimulus selectivity. Does the model replicate this feature?

      We think that whether I neurons are less selective than E neurons is still an open question. A number of recent empirical studies reported that the selectivity of I neurons is comparable to the selectivity of E neurons (see., e.g., Kuan et al. Nature 2024, Runyan et al. Neuron 2010, Najafi et al. Neuron 2020). In our model, the optimal solution prescribes a precise structure in recurrent connectivity (see Eq. 24 and Fig. 1C(ii)) and structured connectivity endows I neurons with stimulus selectivity. To show this, we added plots of example tuning curves and the distribution of the selectivity index across E and I neurons (Fig. 8E-F) and described these new results in Results (page 14). Tuning curves in our network were similar to those computed in a previous work that addressed stimulus tuning in efficient spiking networks (Barrett et al. 2016). We evaluated tuning curves using M=3 constant stimulus features and we varied one of the features while the two others were kept fixed. We provided details on how the tuning curves and the selectivity index were computed in a new Methods subsection (“Tuning curves and selectivity index”) on page 50.

      (6) The final panels of Figure 4 are presented as an approach to test the efficiency of biological networks. The authors seem to measure the instantaneous (and time-averaged) E-I balance while varying the adaptation parameter and then correlate this with the loss. If that is indeed the approach (it's difficult to tell), this doesn't seem to suggest a tractable experiment. Also, the conclusion is somewhat obvious: the tighter the single neuron balance, the fewer unnecessary spikes are fired. I recommend that the authors clearly explain their analysis and how they envision its application to biological data.

      We indeed measured the instantaneous (and time-averaged) E-I balance while varying the adaptation parameters and then correlated this with the loss. We did not want to imply that the latter panels of Figure 4 are a means to test the efficiency of biological networks or that we are suggesting new and possibly unfeasible experiments. We see it as a way to better understand conceptually how spike-triggered adaptation helps the network’s coding efficiency, by tightening the E-I balance in a way that reduces the number of unnecessary spikes. We apologize if the previous text was confusing in this respect. We have now removed the initial paragraph of the former Results subsection (including the subsection title) and added new text about the different effects of adaptation in E and I neurons on page 9. We also thoroughly revised Figure 5.

      (7) The external stimuli are repeatedly said to vary (or be tracked) across "multiple time scales", which might inadvertently be interpreted as (i) a single stimulus containing multiple timescales or (ii) simultaneously presented stimuli containing different timescales. These scenarios are potential targets for efficient coding through neuronal adaptation (reference 21 in the manuscript and Pozzorini et al. Nat. Neuro. 2013), but they are not addressed in the current model. I recommend the authors clarify their statements regarding timescales (and if they're up for it, acknowledge this as a limitation).

      We thank the reviewer for bringing up this interesting point. To address the second point raised by the Reviewer (simultaneously presented stimuli containing multiple timescales), we performed new analyses to test the model with simultaneously presented stimuli that have different timescales. We found that the model encodes efficiently such stimuli.  We tested the case with a 3-dimensional stimulus where each dimension is an Ornstein-Uhlenbeck process with a different time constant. More precisely, we kept the time constant in the first dimension fixed (at 10 ms), and varied the time constant in the second and third dimension such that the time constant in the third dimension is doubled with respect to the second dimension. We plotted the encoding error in every stimulus dimension for E and I neurons (Fig. 8B, left plot) as well as the encoding error and the metabolic cost averaged across stimulus dimensions (Fig. 8B, right plot). The results are briefly described with text on page 13.

      Regarding the case i) (single stimulus containing multiple timescales), we considered two possibilities. One possibility is that timescales of the stimulus are separable, and in this case a single stimulus containing several time scales can be decomposed in several stimuli with a single time scale each. As we assign a new set of weights for each dimension of the decomposed stimulus, this case is similar to the case ii) that we already addressed. Another possibility is that timescales of the stimulus cannot be separated. This case is not covered in the present analysis and we listed it among the limitations of the model. We revised the text (page 13) around the question of multiple time scales and included the citation of Pozzorini et al. (2013). 

      (8) It is claimed that the model uses a mixed code to represent signals, citing reference 47 (Rigotti et al., Nature 2013). But whereas the model seems to use linear mixed selectivity, the Rigotti reference highlights the virtues of nonlinear mixed selectivity. In my understanding, a linearly mixed code does not enjoy the same benefits since it’s mathematically equivalent to a non-mixed code (simply rotate the readout matrix). I recommend that the authors clarify the type of selectivity used by their model and how it relates to the paper(s) they cite.

      The reviewer is correct that our selectivity is a linear mixing of input variables, and differs from the selectivity in Rigotti et al. (2013) which is non-linear. We revised the sentence on page 4 to clarify better that the mixed selectivity we consider is linear and we removed Rigotti’s citation. 

      (9) Reference 46 is cited as evidence that leaky integration of sensory features is a relevant computation for sensory areas. I don’t think this is quite what the reference shows. Instead, it finds certain morphological and electrophysiological differences between single pyramidal neurons in the primary visual cortex compared to the prefrontal cortex. Reference 46 then goes on to speculate that these are differences relevant to sensory computation. This may seem like a quibble, but given the centrality of the objective function in normative theories, I think it's important to clarify why a particular objective is chosen.

      We agree that our reference of Amatrudo et al was not the best reference and that the previous text was confusing. We thus tried to improve on its clarity. We looked at the previous theoretical efficient coding papers introducing this leaky integration and we could not find in the previous theoretical work a justification of this assumption based on experimental papers. However, there is evidence that neurons in sensory structures, and in cortical association areas respond to time varying sensory evidence by summing stimuli over time with a weight that decreases steadily going back in time from the time of firing, which suggests that neurons integrate time-varying sensory features. In many cases, these integration kernels decay approximately exponentially going back in time, and several models explaining successfully perceptual readouts of neural activity work assuming leaky integration. This suggests that the mathematical approximation of leaky integration of sensory evidence, though possibly simplistic, is reasonable.  We revised the text in this respect (page 2).  
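
      To make the notion of leaky integration concrete, the following minimal sketch integrates a stimulus feature with a first-order leak, which is equivalent to weighting past inputs with an exponentially decaying kernel. The specific form dx/dt = -x/tau + s(t), the function name `leaky_integrate` and all parameter values are assumptions for illustration, not taken from the manuscript.

```python
import numpy as np

def leaky_integrate(s, dt, tau):
    """Forward-Euler solution of dx/dt = -x / tau + s(t), with x(0) = 0."""
    x = np.zeros(len(s), dtype=float)
    for t in range(len(s) - 1):
        x[t + 1] = x[t] + dt * (-x[t] / tau + s[t])
    return x

dt, tau = 1e-4, 0.01  # hypothetical step size and time constant (seconds)

# A constant input is integrated up to a plateau at tau * input.
step_response = leaky_integrate(np.ones(2000), dt, tau)

# A brief pulse is forgotten exponentially: the response is the decaying kernel.
pulse = np.zeros(2000)
pulse[0] = 1.0 / dt
kernel = leaky_integrate(pulse, dt, tau)
```

      In this sketch, inputs further back in time contribute with a weight that decays by a factor of (1 - dt / tau) per step, which is the discrete counterpart of an exponential integration kernel.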

      (10) The definition of the objective function uses beta as a tuning parameter, but later parts of the text and figures refer to a parameter g_L which might only be introduced in the convex combination of Eq. 40a.

      This is correct. Parameter optimization has been performed on a weighted sum of the average encoding error and cost as given by the Eq. 39a (40a in first submission), with the weighting g<sub>L</sub> for the error versus the cost, and not the beta that is part of the objective in Eq.10. The convex combination in Eq. 39a allowed us to find a set of optimal parameters that is within biologically realistic parameter ranges, which includes realistic values for the firing threshold. The average encoding error and metabolic cost (the two terms on the right-hand side of Eq. 39a, without weighting with g<sub>L</sub>) in our network are of the same order (see Fig 8G for the E-I model where these values are plotted separately for the optimal network). Weighing the cost with optimal beta that is in the range of ~10 would have yielded a network that optimizes almost exclusively the metabolic cost and would bias the results towards solutions with poor encoding accuracy.
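
      For reference, a convex combination of this kind can be written schematically as follows. This is a sketch of the general form described in the text, not a transcription of Eq. 39a: the symbols \bar{\varepsilon} (average encoding error) and \bar{\kappa} (average metabolic cost) are placeholder names assumed here.

```latex
\mathcal{L}(g_L) \;=\; g_L \,\bar{\varepsilon} \;+\; (1 - g_L)\,\bar{\kappa},
\qquad 0 \le g_L \le 1 .
```

      Under this form, g<sub>L</sub> close to 1 weights the encoding error almost exclusively, while g<sub>L</sub> close to 0 weights the metabolic cost almost exclusively.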

      To document more fully how the choice of weighting of the error with the cost (g<sub>L</sub>) affects the optimal parameters, we have now added a new analysis (Fig. 8D and Supplementary Fig. S4 A-D and H) showing the optimal parameters as a function of this weighting. We commented on these results in the text on pages 9-11 and 12. For further details, please see also the reply to point 1 of Reviewer 1.

      (11) Figure 1J: "In E neurons, the distribution of inhibitory and of net synaptic inputs overlap". In my understanding, they are in fact identical, and this is by construction. It might help the reader to state this.

      We apologize for an unclear statement. In E neurons, net synaptic current is the sum of the feedforward current and of recurrent inhibition (Eq. 29c and Eq. 42). With our choice of tuning parameters that are symmetric around zero and with stimulus features that have vanishing mean, the mean of the feedforward current is close to zero. Because of this, the mean of the net current is negative and is close to the mean of the inhibitory current. We have clarified this in the text (page 5).

      (12) A few typos:

      -  p1. "Minimizes the encoding accuracy" should be "maximizes..."

      -  p1: "as well the progress" should be something like "as well as the progress"

      -  p.11 "In recorded neurons where excitatory or inhibitory.", "where" should be "were"

      -  Fig3: missing parentheses (B)

      -  Fig4B: the 200 ticks on the y-scale are cut off.

      -  Panel Fig. 5a: "stimulus" should be "stimuli".

      -  Ref 24 "Efficient andadaptive sensory codes" is missing a space.

      -  p. 26: "requires" should be "required".

      -  On several occasions, the article "the" is missing.

      We thank the reviewer for kindly pointing out these typos, which we have now corrected.

      Reviewer #2 (Recommendations For The Authors):

      I would like to give the authors more details about the two main weaknesses discussed above, so that they may address specific points in the paper. First, there is the relation to previous work. Several published articles have presented very similar results to those discussed here, including references 5, 26, 28, 32, 33, 42, 43, 48, and an additional reference not cited by the authors (Calaim et al. 2022 eLife e73276). This includes:

      (1) Derivation of an E-I efficient spiking network, which is found in refs. 28, 42, 43, and 48. This is not reflected in the text: e.g., "These previous implementations, however, had neurons that did not respect Dale's law" (Introduction, pg. 1); "Unlike previous approaches (28, 48), we hypothesize that E and I neurons have distinct normative objectives...". The authors should discuss how their derivation compares to these.

      We have now fully clarified on page 3 that our model builds on the seminal previous works that introduced E-I networks with efficient coding (Supplementary text in Boerlin et al. 2013, Chalk et al. 2016, Barrett et al. 2016). 

      (2) Inclusion of a slow adaptation current: I believe this also appears in a previous paper (Gutierrez & Deneve 2019, ref. 33) in almost the exact same form, and is again not reflected in the text: "The strength of the current is proportional to the difference in inverse time constants ... and is thus absent in previous studies assuming that these time constants are equal (... ref. 33)". Again, the authors should compare their derivation to this previous work.

      We thank the reviewer for pointing this out. We sincerely apologize if our previous version did not recognize sufficiently clearly that the previous work of Gutierrez and Deneve (eLife 2019; ref 33) introduced first the slow adaptation current that is similar to spike-triggered adaptation in our model. We have made sure that the revised text recognizes it more clearly. We also explained better what we changed or added with respect to this previous work (see revised text on page 8). 

      The work by Gutierrez and Deneve (2019) emphasizes the interplay between a single-neuron property (an adapting current in single neurons) and a network property (network-level coding through structured recurrent connections). They use a network that does not distinguish E and I neurons. Our contribution instead focuses on adaptation in an E-I network. To improve the presentation following the Reviewer’s comment, we now better emphasize the differential effect of adaptation in E and in I neurons in the revision (Fig. 5 B-D). Moreover, Gutierrez and Deneve studied the effect of adaptation on slower time scales (1 or 2 seconds), while we study adaptation on a finer time scale of tens of milliseconds. The revised text detailing this is on page 8.

      (3) Background currents and physical units: Pg. 26: "these models did not contain any synaptic current unrelated to feedforward and recurrent processing" and "Moreover previous models on efficient coding did not thoroughly consider physical units of variables" - this was briefly described in ref. 28 (Boerlin et al. 2013), in which the voltage and threshold are transformed by adding a common constant, and additional aspects of physical units are discussed.

      It is correct that Boerlin et al (2013) suggested adding a common constant to introduce physical units. We now revised the text to make clearer the relation between our results and the results of Boerlin et al. (2013) (page 3). In our paper, we built on Boerlin et al. (2013) and assigned physical units to computational variables that define the model's objective (the targets, the estimates, the metabolic constant, etc.). We assigned units to computational variables in such a way that physical variables (such as membrane potential, transmembrane currents, firing thresholds and resets) have the correct physical units.  We have now clarified how we derived physical units in the section of Results where we introduce the biophysical model (page 3) and specified how this derivation relates to the results in Boerlin et al. (2013).

      (4) Voltage correlations, spike correlations, and instantaneous E/I balance: this was already pointed out in Boerlin et al. 2013 (ref 28; from that paper: "Despite these strong correlations of the membrane potentials, the neurons fire rarely and asynchronously") and others including ref. 32. The authors mention this briefly in the Discussion, but it should be more prominent that this work presents a more thorough study of this well-known characteristic of the network.

      We agree that it would be important to comment on how our results relate to these results in Boerlin et al. (2013). It is correct that in Boerlin et al. (2013) neurons have strong correlations in the membrane potentials, but fire asynchronously, similarly to what we observe in our model. However, asynchronous dynamics in Boerlin et al. (2013) strongly depends on the assumption of instantaneous synaptic transmission and time discretization, with a “one spike per time bin” rule in numerical implementation. This rule enforces that at most one spike is fired in each time bin, thus actively preventing any synchronization across neurons. If this rule is removed, their network synchronizes, unless the metabolic constant is strong enough to control such synchronization to bring it back to asynchronous regime (see ref. 36). Our implementation does not contain any specific rule that would prevent synchronization across neurons. We now cite the paper by Boerlin and colleagues and briefly summarize this discussion when we describe the result of Fig. 3D on page 7. 

      (5) Perturbations and parameters sweep: I found one previous paper on efficient spiking networks (Calaim et al. 2022) which the authors did not cite, but appears to be highly relevant to the work presented here. Though the authors perform different perturbations from this previous study, they should ideally discuss how their findings relate to this one. Furthermore, this previous study performs extensive sweeps over various network parameters, which the authors might discuss here, when relevant. For example, on pg. 8, the authors write “We predict that, if number of neurons within the population decreases, neurons have to fire more spikes to achieve an optimal population readout” – this was already shown in Calaim et al. 2022 Figure 5, and the authors should mention if their results are consistent.

      We apologize for not being aware of Calaim et al. (2022) when we submitted the first version of our paper. This important study is now cited in the revised version. As suggested, we have now performed sweeps of multiple parameters inspired by the work of Calaim et al. This new analysis is described extensively in the reply to the Weaknesses in the Public Review of Reviewer 2; it is shown in Figs. 2, 6I and 7J and described on pages 5, 11 and 13.

      The Reviewer is also correct that the compensation mechanism that applies when changing the ratio of E-I neuron numbers is similar to the one described in Barrett et al. (2016) and related to our claim “if number of neurons within the population decreases, neurons have to fire more spikes to achieve an optimal population readout”. We have now added (page 11) that this prediction is consistent with the finding of Barrett et al. (2016).

      With regard to the dependence of optimal coding properties on the number of neurons, we have tried to better describe similarities and differences with our work and that of Calaim et al as well as with the work of Barrett et al. (2016) which reports highly relevant results. These additional considerations are summarized in a paragraph in Discussion (page 16).

      (6) Overall, the authors should distinguish which of their results are novel, which ones are consistent with previous work on efficient spiking networks, and which ones are consistent in general with network implementations of efficient and sparse coding. In many of the above cases, this manuscript goes into much more depth and study of each of the network characteristics, which is interesting and commendable, but this should be made clear. In clarifying the points listed above, I hope that the authors can better contextualize their work in relation to previous studies, and highlight what are the unique characteristics of the model presented here.

      We made a number of clarifications of the text to provide better contextualization of our model within existing literature and to credit more precisely previous publications. This includes commenting on previous studies that introduced separate objective functions of E and I neurons (page 2), spike-triggered adaptation (page 8), physical units (page 3), and changes in the number of neurons in the network (page 16). 

      Next, there are the claims of optimal parameters. As explained on pg. 35 (criterion for determining optimal model parameters), it appears to me that they simply vary each parameter one at a time around the optimal value. This argument appears somewhat circular, as they would need to know the optimal parameters before starting this sweep. In general, I find these optimality considerations to be the most interesting and novel part of the paper, but the simulations are relatively limited, so I would ask the authors to either back them up with more extensive parameter sweeps that consider covariations in different parameters simultaneously (as in Calaim et al. 2022). Furthermore, the authors should make sure that they are not breaking any of the required relationships between parameters necessary for the optimization of the loss function. Again, some of the results (such as coding error not being minimized with zero metabolic cost) suggests that there might be issues here. 

      We thank the reviewer for this insightful suggestion. We have now added a joint sweep of all relevant model parameters using a Monte-Carlo parameter search with 10,000 iterations. We randomly drew parameter configurations from predetermined parameter ranges that are detailed in the newly added Table 2. Parameters were sampled from a uniform distribution. We varied all six model parameters studied in the paper (metabolic constant, noise intensity, time constant of single E and I neurons, ratio of E to I neurons and ratio of the mean I-I to E-I connectivity). We now present these results in a new Figure 2. We did not find any set of parameters with lower loss than the parameters in Table 1 when the weighting of the error with the cost was in the following range: 0.4<g<sub>L</sub><0.81 (Fig. 2C). While our large but finite Monte-Carlo random sampling does not fully prove that the configuration we selected as optimal (in Table 1) is a global optimum, it shows that this configuration is highly efficient. Further, and as detailed in the rebuttal to the Weaknesses of the Public Review of Referee 2, analyses of the near-optimal solutions are compatible with the notion (resulting from the joint parameter sweep studies that we added to Figures 6 and 7) that network optimality may be influenced by joint covariations in parameters. These new results are reported in Results (pages 5, 11 and 13) and in Figures 2, 6I and 7J.
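
      The random search described above can be sketched as follows. This is only an illustration under stated assumptions: the parameter names and ranges are hypothetical (the actual ranges are those of Table 2), `toy_loss` is a placeholder standing in for a full network simulation, and the convex combination of error and cost weighted by `g_L` is assumed for illustration rather than taken from Eq. 39.

```python
import numpy as np

# Hypothetical ranges for the six swept parameters (NOT the values of Table 2).
PARAM_RANGES = {
    "metabolic_constant": (1.0, 30.0),
    "noise_intensity": (0.0, 10.0),
    "tau_E": (5.0, 30.0),
    "tau_I": (5.0, 30.0),
    "ratio_E_to_I": (1.0, 8.0),
    "ratio_II_to_EI": (1.0, 6.0),
}


def toy_loss(params, g_L=0.7):
    """Placeholder loss; a real sweep would simulate the network here."""
    error = (params["metabolic_constant"] - 14.0) ** 2 / 100.0
    error += params["noise_intensity"] / 10.0
    cost = params["tau_E"] / 30.0
    # Assumed weighting of error against cost.
    return g_L * error + (1.0 - g_L) * cost


def random_search(loss_fn, ranges, n_iter=10_000, seed=0):
    """Draw each parameter uniformly from its range and keep the best draw."""
    rng = np.random.default_rng(seed)
    best_params, best_loss = None, np.inf
    for _ in range(n_iter):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
        loss = loss_fn(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


best_params, best_loss = random_search(toy_loss, PARAM_RANGES, n_iter=2000, seed=1)
```

      Because all parameters are drawn jointly in every iteration, such a search probes covariations between parameters that one-at-a-time sweeps cannot reveal, at the price of offering no guarantee of finding the global optimum.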

      Some more specific points:

      (1) In general, I find it difficult to understand the scaling of the RMSE, cost, and loss values in Figures 4-7. Why are RMSE values in the range of 1-10, whereas loss and cost values are in the range of 0-1? Perhaps the authors can explicitly write the values of the RMSE and loss for the simulation in Figure 1G as a reference point.

      Encoding error (RMSE), metabolic cost (MC) and average loss for a well-performing network are within the range of 1-10 (see Fig. 8G or 7C in the first submission). To ease the visualization of results, we normalized the cost and the loss in Figs. 6-8 in order to plot them on the same figure (while the computation of the optima follows Eq. 39 and is done without normalization). We have now explicitly written the values of RMSE, MC and the average loss (non-normalized) for the simulation in Fig. 1D on page 5, as suggested by the reviewer. We have also revised Fig. 4 and now show the absolute and not the relative values of the RMSE and the MC (metabolic cost).

      (2) Optimal E-I neuron ratio of 4:1 and efficacy ratio of 3:1: besides being unintuitive in relation to previous work, are these two optimal settings related to one another? If there are 4x more excitatory neurons than inhibitory neurons, won't this affect the efficacy ratio of the weights of the two populations? What happens if these two parameters are varied together?

      Thanks for this insightful point. Indeed, the optima of these two parameters are interdependent and positively correlated: if we decrease the E-I neuron ratio, the optimal efficacy ratio decreases as well. To better show this relation, we added figures with a two-dimensional parameter search (Fig. 7J) where we jointly varied the two ratios. The red cross on the right figure marks the optimal ratios used as optimal parameters in our study. These findings are discussed on page 13.

      (3) Optimal dimensionality of M=[1,4]: Again, previous work (Calaim et al. 2022) would suggest that efficient spiking networks can code for arbitrary dimensional signals, but that performance depends on the redundancy in the network - the more neurons, the better the coding. From this, I don't understand how or why the authors find a minimum in Figure 7B. Why does coding performance get worse for small M?

      We optimized all model parameters with M=3 and this is the reason why M=3 is the optimal number of inputs when we vary this parameter. Our network shows a distinct minimum of the encoding error as a function of the stimulus dimensionality for both E and I neurons (Fig. 8C, top). This minimum is reflected in the minimum of the average loss (Fig. 8C, bottom). The minimum of the loss is shifted (or biased) by the metabolic cost, with strong weighting of the cost lowering the optimal number of inputs. This is discussed on pages 13-14.

      Here are a list of other, more minor points, that the authors can consider addressing to make the results and text more clear:

      (1) Feedforward efficient coding models: in the introduction (pg. 1) and discussion (pg. 11) it is mentioned that early efficient coding models, such as that of Olshausen & Field 96, were purely feedforward, which I believe to be untrue (e.g., see Eq. 2 of O&F 96). Later models made this even more explicit (Rozell et al. 2008). Perhaps the authors can either clarify what they meant by this, or downplay this point.

      We sincerely apologize for the oversight present in the previous version of the text. We agree with the reviewer that the model in Olshausen and Field (1996) indeed defines a network with recurrent connections, and the same type of recurrent connectivity has been used by Rozell et al. (2008, 2013). The structure of the connectivity in Olshausen and Field (as well as in Rozell et al (2008)) is closely related to the structure of connectivity that we derived in our model. We have corrected the text in the introduction (page 1) to remove these errors.

      (2) Pg. 2 - The authors state: "We draw tuning parameters from a normal distribution...", but in the methods, it states that these are then normalized across neurons, so perhaps the authors could add this here, or rephrase it to say that weights are drawn uniformly on the hypersphere.

      We rephrased the description of how weights were determined (page 2).

      (3) Pg. 2 - "We hypothesize the time-resolved metabolic cost to be proportional to the estimate of a momentary firing rate of the neural population" - from what I can see, this is not the usual population rate, which would be an average or sum of rates across the population.

      Indeed, the time-dependent metabolic cost is not the population rate (in the sense of the sum of instantaneous firing rates across neurons), but is proportional to it by a factor of 1/t. More precisely, we can define the instantaneous estimate of the firing rate of a single neuron i as z<sub>i</sub>(t) = 1/t<sub>r</sub> r<sub>i</sub>(t) with r<sub>i</sub>(t) as in Eq. 7. We have clarified this in the revised text on page 3. 

      (4) Pg. 3: "The synaptic strength between two neurons is proportional to their tuning similarity if the tuning similarity is positive" - based on the figure and results, this appears to be the case for I-E, E-I, and I-I connections, but not for E-E connections. This should be clarified in the text. Furthermore, one reference given in the subsequent sentence (Ko et al. 2011, ref. 51), is specifically about E-E connections, so doesn't appear to be relevant here.

      We have now specified that Eq. 24 does not describe E-E connections. We also agree that the reference (Ko et al. 2011) did not adequately support our claim; we therefore removed it and revised the text on page 3 accordingly.

      (5) Pg. 3: "the relative weight of the metabolic cost over the encoding error controls the operating regime of the network" and "and an operating regime controlled by the metabolic constant" - what do you mean by operating regime here?

      We used the expression “operating regime” in the sense of a dynamical regime of the network. However, we agree that this expression may be confusing and we removed it in the revision.

      (6) Pg. 3: "Previous studies interpreted changes of the metabolic constant beta as changes to the firing thresholds, which has less biological plausibility" - can the authors explain why this is less plausible, or ideally provide a reference for it?

      In biological networks, global variables such as brain state can strongly modulate the way neural networks respond to a feedforward stimulus. These variables influence neural activity in at least two distinct ways. One is by changing non-specific synaptic inputs to neurons, which is a network-wide effect (Destexhe and Pare, Nature Reviews Neurosci. 2003). This is captured in our model by changing the strength of the mean and fluctuations in the external currents. Beyond modulating synaptic currents, another way of modulating neural activity is by changing cell-intrinsic factors that modulate the firing threshold in biological neurons (Pozzorini et al. 2013). Previous studies on spiking networks with efficient coding interpreted the effect of the metabolic constant as changes to the firing threshold (Koren and Deneve, 2017, Gutierrez and Deneve 2019), which corresponds to cell-intrinsic factors. Here we instead propose that the metabolic constant modulates the neural activity by changing the non-specific synaptic input, homogeneously across all neurons in the network. Interpreting the metabolic constant as setting the mean of the non-specific synaptic input was necessary in our model to find an optimal set of parameters (as in Table 1) that is also biologically plausible. We revised the text accordingly (page 4).

      (7) Pg. 4: Competition across neurons: since the model lacks E-E connectivity, it seems trivial to conclude that there is competition through lateral inhibition, and it can be directly determined from the connectivity. What is gained from running these perturbation experiments?

      We agree that a reader with a good understanding of sparse / efficient coding theory can tell that there is competition across neurons with similar tuning already from the equation for the recurrent connectivity (Eq. 24). However, we presume that not all readers can see this from the equations and that it is worth showing this with simulations.

      Following the reviewer's comment, we have now downplayed the result about the model manifesting lateral inhibition in general on page 6. We have also removed its extensive elaboration in Discussion.

      One reason to run perturbation experiments was to test to what extent the optimal model qualitatively replicates empirical findings, in particular, single neuron perturbation experiments in Chettih and Harvey, 2019, without specifically tuning any of the model parameters. We found that the model reproduces qualitatively the main empirical findings, without tuning the model to replicate the data. We revised the text on page 5 accordingly.

      A further reason to run these experiments was to refine predictions about the minimal amount of connectivity structure that generates perturbation response profiles that are qualitatively compatible with empirical observations. To establish this, we did perturbation experiments while removing the connectivity structure of particular connectivity sub-matrices (E-I, I-I or I-E; Fig. S3 F). This allowed us to determine which connectivity matrix has to be structured to observe results that qualitatively match empirical findings. We found that the structure of E-I and I-E connectivity is necessary, but not the structure of I-I connectivity. Finally, we tested partial removal of the connectivity structure, where we replaced the precise (and optimal) connectivity structure with a simpler connectivity rule. In the optimal connectivity, the connection strength is proportional to the tuning similarity. A simpler connectivity rule, in contrast, only specifies that neurons with similar tuning share a connection, and beyond this the connection strength is random. Running perturbation experiments in such a network obeying a simpler connectivity rule still qualitatively replicated empirical results from Chettih and Harvey (2019). This is shown in Supplementary Fig. S2F and described on page 8.

      (8) Pg. 4: "the optimal E-I network provided a precise and unbiased estimator of the multidimensional and time-dependent target signal" - from previous work (e.g., Calaim et al. 2022), I would guess that the estimator is indeed biased by the metabolic cost. Why is this not the case here? Did you tune the output weights to remove this bias?

      Output weights were not tuned to remove the bias. In Fig. 1H of the first submission we plotted the bias for the network that minimizes the encoding error. We forgot to specify this in the text and figure caption, for which we apologize. We have now replaced this figure with a new one (Fig. 1E) where we plot the bias of the network minimizing the average loss (with parameters as in Table 1). The bias of the network minimizing the error is close to zero, B^E = 0.02 and B^I = 0.03. The bias of the network minimizing the loss is stronger and negative, B^E = -0.15 and B^I = -0.34. In the text of Results, we now report the bias of both networks (i.e., optimizing the encoding error and optimizing the loss). We also added a plot showing trial-averaged estimates and a time-dependent bias in each stimulus dimension (Supplementary figure S1 F). Note that the network minimizing the encoding error requires a lower metabolic constant (β = 6) than the network optimizing the loss (β = 14); however, the optimal metabolic cost in both networks is nonzero. We revised the text and explained these points on page 5.

      (9) Pg. 4: "The distribution of firing rates was well described by a log-normal distribution" - I find this quite interesting, but it isn't clear to me how much this is due to the simulation of a finite-time noisy input. If the neurons all have equal tuning on the hypersphere, I would expect that the variability in firing is primarily due to how much the input correlates with their tuning. If this is true, I would guess that if you extend the duration of the simulation, the distribution would become tighter. Can you confirm that this is the stationary distribution of the firing rates?

      We now simulated the network with longer simulation time (10 seconds of simulated time instead of 2 seconds used previously) and also iterated the simulation across 10 trials to report a result that is general across random draws of tuning parameters (previously a single set of tuning parameters was used). The reviewer is correct that the distribution of firing rates of E neurons has become tighter with longer simulation time, but distributions remain log-normal. We also recomputed the coefficient of variation (CV) using the same procedure. We updated these plots on Fig. 1F.

      (10) Pg. 4: "We observed a strong average E-I balance" - based on the plots in Figure 1J, the inputs appear to be inhibition-dominated, especially for excitatory neurons. So by what criterion are you calling this strong average balance?

      The reviewer is correct about the fact that the net synaptic input to single neurons in our optimal network shows excess inhibition and the network is inhibition-dominated, so we revised this sentence (page 5) accordingly.  

      (11) Pg. 4: Stronger instantaneous balance in I neurons compared to E neurons - this is curious, and I have two questions: (1) can the authors provide any intuition or explanation for why this is the case in the model? and (2) does this relate to any literature on balance that might suggest inhibitory neurons are more balanced than excitatory neurons?

      In our model, I neurons receive excitatory and inhibitory synaptic currents through synaptic connections that are precisely structured. E neurons receive structured inhibition and a feedforward current. The feedforward current consists of M=3 independent OU processes projected on the tuning vectors of E neurons w<sub>i</sub><sup>E</sup>. We speculate that because the synaptic inhibition and feedforward current are different processes and the 3 OU inputs are independent, it is harder for E neurons to achieve the instantaneous balance that would be as precise as in I neurons. While we think that the feedforward current in our model reflects biologically plausible sensory processing, it is not a mechanistic model of feedforward processing. In biological neurons, real feedforward signals are implemented as a series of complex feedforward synaptic inputs from downstream areas, while the feedforward current in our model is a sum of stimulus features, and is thus a simplification of a biological process that generates feedforward signals. We speculate that a mechanistic implementation of the feedforward current could increase the instantaneous balance in E neurons.  Furthermore, the presence of EE connections could potentially also increase the instantaneous balance in E neurons. We revised the Discussion about these important questions that lie on the side of model limitations and could be advanced in future work. We could not find any empirical evidence directly comparing the instantaneous balance in E versus I neurons.  We have reported these considerations in the revised Discussion (page 16).

      (12) Pg. 5, comparison with random connectivity: "Randomizing E-I and I-E connectivity led to several-fold increases in the encoding error as well as to significant increases in the metabolic cost" and Discussion, pg. 11: "the structured network exhibits several fold lower encoding error compared to unstructured networks": I'm wondering if these comparisons are fair. First, regarding activity changes that affect the metabolic cost - it is known that random balanced networks can have global activity control, so it is not straightforward that randomizing the connectivity will change the metabolic cost. What about shuffling the weights but keeping an average balance for each neuron's input weights? Second, regarding coding error, it is trivial that random weights will not map onto the correct readout. A fairer comparison, in my opinion, would at least be to retrain the output weights to find the best-fitting decoder for the three-dimensional signal, something more akin to a reservoir network.

      Thank you for raising these interesting questions. The purpose of comparing networks with and without connectivity structure was to observe causal effects of the connectivity structure on the neural activity. We agree that the effect on the encoding error is close to trivial, because shuffling of connectivity weights decouples neural dynamics from decoding weights. We have carefully considered Reviewer's suggestions to better compare the performance of structured and unstructured networks. 

      In reply to the first point, we followed the reviewer's suggestion and compared the optimal network with a shuffled network that matched the optimal network in its average balance. This was achieved by increasing the metabolic constant, decreasing the noise intensity and slightly decreasing the feedforward stimulus (we did not find a way to match the net current in both cell types by changing a single parameter). As we compared the metabolic cost between the optimal and the shuffled network with matched average balance, we still found lower metabolic cost in the optimal network, even though the difference was now smaller. We replaced Fig. 3B from the first submission with these new results in Fig. 4B and commented on them in the text (page 7).

      In reply to the second point, we followed the reviewer’s suggestion and compared the encoding error (RMSE) of the optimal network and the network with shuffled connectivity, with decoding weights trained to optimally reconstruct the target signal. As suggested, we now analyzed the encoding error of the networks using decoding weights trained on the set of spike trains generated by the network, using linear least-squares regression to minimize the decoding error. For a fair and quantitative comparison, and because we did not train decoding weights of our structured model, we performed this same analysis using spike trains generated by networks with structured and shuffled recurrent connectivity. We found that the encoding error is smaller in the E population and much smaller in the I population in the structured compared to the random network. Decoding weights found numerically in the optimal network approach the uniform distribution of weights that we used in our model (Fig. 4A, right). In contrast, decoding weights obtained from the random network do not converge to a uniform distribution, but instead form a much sparser distribution, in particular in I neurons (Supplementary Fig. S3 A). These additional results, reported in the above-mentioned figures, are discussed in the text on page 14.
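
      For illustration, training a linear decoder by least squares can be sketched as below. This is a minimal example under assumptions, not our analysis code: `rates` stands for low-pass filtered spike trains (here replaced by synthetic data), `target` for the M=3 dimensional signal, and all function and variable names are hypothetical.

```python
import numpy as np


def fit_decoder(rates, target):
    """Least-squares decoding weights.

    rates:  (T, N) array, one column per neuron (filtered spike trains).
    target: (T, M) array, the signal to reconstruct.
    Returns an (N, M) weight matrix minimizing ||rates @ W - target||^2.
    """
    weights, *_ = np.linalg.lstsq(rates, target, rcond=None)
    return weights


def rmse(rates, weights, target):
    """Root mean squared error of the linear reconstruction."""
    return float(np.sqrt(np.mean((rates @ weights - target) ** 2)))


# Synthetic stand-in data: 500 time steps, 40 neurons, 3 signal dimensions.
rng = np.random.default_rng(0)
T, N, M = 500, 40, 3
rates = rng.random((T, N))
true_w = rng.normal(size=(N, M))
target = rates @ true_w + 0.01 * rng.normal(size=(T, M))

w_hat = fit_decoder(rates, target)
error = rmse(rates, w_hat, target)
```

      Applying the same fitting procedure to activity from the structured and from the shuffled network puts both on an equal footing, since neither is then penalized for a mismatch between its dynamics and a fixed readout.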

      (13) Pg. 5: "a shift from mean-driven to fluctuation-driven spiking" and Pg. 11 "a network structured as in our efficient coding solution operates in a dynamical regime that is more stimulus-driven, compared to an unstructured network that is more fluctuation driven" - I would expect that the balanced condition dictates that spiking is always fluctuation driven. I'm wondering if the authors can clarify this.

      We agree with the reviewer that networks with and without connectivity structure are fluctuation-driven, because in a mean-driven network the mean current must be suprathreshold (Ahmadian and Miller, 2021), which is not the case of either network. We removed the claim of the change from mean to fluctuation driven regime in the revised paper. We are grateful to the Reviewer for helping us tighten the elaboration of our findings.

      (14) Pg. 5: "suggesting that variability of spiking is independent of the connectivity structure" - the literature of balanced networks argues against this. Is this not simply because you have a noisy input? Can you test this claim?

      We thank the reviewer for the suggestion. We tested this claim by measuring the coefficient of variation in networks receiving a constant stimulus. In particular, we set the same strength in each of the M=3 stimulus dimensions and set the stimulus amplitude such as to match the firing rate of the optimal network in response to the OU stimulus. We computed the coefficient of variation in 200 simulation trials.  The removal of connectivity structure did not cause significant change of the coefficient of variation in a network driven by a constant stimulus (Fig. 4E). These additional results are discussed in text on page 7. 
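
      As a reference for the measure used here, the coefficient of variation of a single spike train can be sketched as follows, assuming the usual definition (standard deviation of the interspike intervals divided by their mean). The function name and the handling of short spike trains are illustrative choices.

```python
import numpy as np


def coefficient_of_variation(spike_times):
    """CV of interspike intervals; NaN if fewer than two intervals exist."""
    isi = np.diff(np.sort(np.asarray(spike_times, dtype=float)))
    if isi.size < 2:
        return float("nan")
    return float(np.std(isi) / np.mean(isi))
```

      Under this definition a perfectly regular spike train has a CV of zero, while a Poisson spike train has a CV close to one, which gives a scale for judging how irregular the spiking is.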

      We have also taken the reviewer's point about the claim that variability of spiking is independent of the connectivity structure. We removed this claim in the revision, because we only tested a couple of specific cases where the connectivity is structured with respect to tuning similarity (fully structured, fully unstructured and partially unstructured networks). This is not exhaustive of all possible structures that recurrent connectivity may have.

      (15) Pg. 6: "we also removed the connectivity structure only partially, keeping like-to-like connectivity structure and removing all structure beyond like-to-like" - can you clarify what this means, perhaps using an equation? What connectivity structure is there besides like-to-like?

      In the optimal model, the strength of the synapse between a pair of neurons is proportional to the tuning similarity of the two neurons, Y<sub>ij</sub> proportional to J<sub>ij</sub> for Y<sub>ij</sub> >0 (see Eq. 24 and Fig. 1C(ii)). Besides networks with optimal connectivity, we also tested networks with a simpler connectivity rule. Such a simpler rule prescribes a connection if the pair of neurons has similar tuning (Y<sub>ij</sub> >0), and no connection otherwise. The strength of the connection following this simpler connectivity rule is otherwise random (and not proportional to pairwise tuning similarity Y<sub>ij</sub> as it is in the optimal network). We clarified this in the revision (page 8), also by avoiding the term “like-to-like” for the second type of networks, which could indeed be prone to confusion.
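
      The difference between the two rules can be sketched as follows. This is an illustration under assumptions: the proportionality constant is set to one, the random strengths of the simpler rule are obtained by permuting the positive similarities among the connected pairs (one possible way of drawing random strengths), and all names are hypothetical.

```python
import numpy as np


def unit_tuning_vectors(n_features, n_neurons, rng):
    """Random tuning vectors (columns) normalized to unit length."""
    w = rng.normal(size=(n_features, n_neurons))
    return w / np.linalg.norm(w, axis=0, keepdims=True)


def tuning_similarity(w_pre, w_post):
    """Pairwise similarity Y, shape (N_post, N_pre)."""
    return w_post.T @ w_pre


def optimal_rule(similarity):
    """Strength equals the similarity where it is positive, zero otherwise."""
    return np.maximum(similarity, 0.0)


def simpler_rule(similarity, rng):
    """Same set of connected pairs, but strengths unrelated to similarity."""
    weights = np.zeros_like(similarity)
    mask = similarity > 0
    # Assumption: random strengths are drawn by permuting the positive values.
    weights[mask] = rng.permutation(similarity[mask])
    return weights


rng = np.random.default_rng(0)
w_E = unit_tuning_vectors(3, 20, rng)
w_I = unit_tuning_vectors(3, 10, rng)
Y = tuning_similarity(w_E, w_I)
J_optimal = optimal_rule(Y)
J_simple = simpler_rule(Y, rng)
```

      In this sketch both matrices connect exactly the same pairs of neurons and share the same distribution of strengths; only the relation between strength and tuning similarity is lost under the simpler rule.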

      (16) Pgs. 6-7: "we indeed found that optimal coding efficiency is achieved with weak adaptation in both cell types" and "adaptation in E neurons promotes efficient coding because it enforces every spike to be error-correcting" - this was not clear to me. First, it appears as though optimal efficiency is achieved without adaptation nor facilitation, i.e., when the time constants are all equal. Indeed, this is what is stated in Table 1. So is there really a weak adaptation present in the optimal case? Second, it seems that the network already enforces each spike to be error-correcting without adaptation, so why and how would adaptation help with this?

      We agree with the Reviewer that the network without adaptation in E and I neurons is already optimal. It is also true that most spikes in an optimal network should already be error-correcting (besides some spikes that might be caused by the noise). However, regimes with weak adaptation in E neurons remain close to optimality. Spike-triggered facilitation, meanwhile, adds spikes that are unnecessary and decreases network efficiency. We revised Fig. 5 (Fig. 4 in the first submission) and replaced the 2-dimensional plots in Fig. 4 C-F with plots that show the differential effect of adaptation in E neurons (top) and in I neurons (bottom plots) for the measures of the encoding error (RMSE), the efficiency (average loss) and the firing rate (Fig. 5B-D). In the new Fig. 5C it is evident that the loss of the E and I populations grows slowly with adaptation in E neurons (top) while it grows faster with adaptation in I neurons (bottom). These considerations are explained in the revised text on page 9.

      (17) Pg. 7: "adaptation in E neurons resulted in an increase of the encoding error in E neurons and a decrease in I neurons" - it would be nice if the authors could provide any explanation or intuition for why this is the case. Could it perhaps be because the E population has fewer spikes, making the signal easier to track for the I population?

      We agree that this could indeed be the case. We commented on it in revision (page 9).

      (18) Pg. 7: "The average balance was precise...with strong adaptation in E neurons, and it got weaker when increasing the adaptation in I neurons (Figure 4E)" - I found the wording of this a bit confusing. Didn't the balance get stronger with larger I time constants?

      By increasing the time constant of I neurons, the average imbalance got weaker (closer to zero) in E neurons (Fig. 5G, left), but stronger (further away from zero) in I neurons (Fig. 5G, right). We have revised the text on page 9 to make this clearer.

      (19) Pg. 7: Figure 4F is not directly described in the text.

      We have now added text (page 9) commenting on this figure in revision.

      (20) Pg. 8: "indicating that the recurrent network dynamics generates substantial variability even in the absence of variability in the external current" -- how does this observation relate to your earlier claim (which I noted above) that "variability of spiking is independent of connectivity structure"?

      We agree that the claim about variability of spiking being independent of connectivity structure was overstated and we thus removed it. The observation that we wanted to report is that both structured and unstructured networks have very similar levels of variability of spiking of single neurons. The fact that much of the variability of the optimal network is generated by recurrent connections is not incompatible with this observation. We revised the related text (page 11) for clarity.

      (21) Pg. 9: "We found that in the optimally efficient network, the mean E-I and I-E synaptic efficacy are exactly balanced" - isn't this by design based on the derivation of the network?

      True, the I-E connectivity matrix is the transpose of the E-I connectivity matrix, and their means are the same by the analytical solution. This however remains a finding of our study. We have clarified this in the revised text (page 12).

      (22) Pg. 30, eq. 25: the authors should verify if they include all possible connectivity here, or if they exclude EE connectivity beforehand.

      We now specify that the equation for recurrent connectivity (Eq. 24, Eq. 25 in first submission) does not include the E-E connectivity in the revised text (page 41).

      Reviewer #3 (Recommendations For The Authors):

      Essential

      (1)  Currently, they measure the RMSE and cost of the E and I population separately, and the 1CT model. Then, they average the losses of the E and I populations, and compare that to the 1CT model, with the conclusion that the 1CT model has a higher average loss. However, it seems to me that only the E population should be compared to the 1CT model. The I population loss determines how well the I population can represent the E population representation (which it can do extremely well). But the overall coding accuracy of the network of the input signal itself is only represented by the E population. Even if you do combine the E and I losses, they should be summed, not averaged. I believe a more fair conclusion would be that the E/I networks have generally slightly worse performance because of needing to follow Dale's law, but are still highly efficient and precise nonetheless. Of course, I might be making a critical error somewhere above, and happy to be convinced otherwise!

      We carefully considered the reviewer's comment and tested different ways of combining the losses of the E and I population. We decided to follow the reviewer's suggestion and to compare the loss of the E population of the E-I model with the loss of the one cell type model. As is already evident from Fig. 8G, such a comparison indeed changes the result and makes the 1CT model more efficient. Also, the sum of losses of E and I neurons results in the 1CT model being more efficient than the E-I model. Note, however, the robustness of the E-I model to changes in the metabolic constant (Fig. 6C, top). The firing rates of the E-I model stay within physiological ranges for any value of the metabolic constant, while the firing rates of the 1CT model skyrocket for metabolic constants that are lower than optimal (Fig. 8I).

      We added to Results (page 14) a summary of these findings.

      (2) The methods and main text should make much clearer what aspects of the derivation are novel, and which are not novel (see review weaknesses for specifics).

      We specified these aspects, as discussed in more detail in the above reply to point 4 of the public review of Reviewer 1.

      Request:

      If possible, I would like to see the code before publication and give recommendations on that (is it easy to parse and reproduce, etc.)

      We are happy to share the computer code with the reviewer and the community. We added a link to our public repository containing the computer code that we used for simulations and analysis to the preprint and submission (section “Code availability” on page 17). 

      Suggestions:

      (1) I believe that for an eLife audience, the main text is too math-heavy at the beginning, and it could be much simplified, or more effort could be made to guide the reader through the math.

      We tried to do our best to improve the clarity of description of mathematical expressions in the main text.

      (2) Generally vector notation makes network equations for spiking neurons much clearer and easier to parse, I would recommend using that throughout the paper (and not just in the supplementary methods).

      We now use vector notation throughout the paper whenever we think that this improves the intelligibility of the text. 

      (3) In the discussion or at the end of the results adding a clear section summarizing what the minimal requirements or essential assumptions are for biological networks to implement this theory would be helpful for experimentalists and theorists alike.

      We have added such a section in Discussion (page 15). 

      (5) I think the title is a bit too cumbersome and hard to parse. Might I suggest something like 'Efficient coding and energy use in biophysically realistic excitatory-inhibitory spiking networks' or 'Biophysically constrained excitatory-inhibitory spiking networks can efficiently implement efficient coding'.

      We followed the reviewer’s suggestion and changed the title to “Efficient coding in biophysically realistic excitatory-inhibitory spiking networks.”

      (6) How the connections were shuffled exactly was not clear to me in how it was described now. Did they just take the derived connectivity, and shuffle the connections around? I recommend a more explicit methods section on it (I might have missed it).

      Indeed, the connections of the optimal network were randomly shuffled, without repetition, between all neuronal pairs of a specific connectivity matrix. This preserves all properties of the distribution of connectivity weights and removes only the structure of the connectivity, which is precisely what we wanted to test. We have now added a section in Methods (“Removal of connectivity structure”) on pages 51-52 where we explain how the connectivity structure is removed.
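
      The shuffling described in this reply can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `shuffle_connectivity`, the example matrix, and the choice to permute every entry (including zeros and any diagonal entries) are assumptions made here for illustration.

```python
import random


def shuffle_connectivity(weights, seed=None):
    """Return a copy of `weights` with all entries randomly permuted.

    `weights` is a list of rows (one connectivity matrix). Each weight is
    reused exactly once ("without repetition"), so the distribution of
    weights is unchanged while the pairing of weights to neuron pairs,
    i.e. the connectivity structure, is destroyed.
    """
    rng = random.Random(seed)
    n_rows = len(weights)
    n_cols = len(weights[0]) if n_rows else 0
    flat = [w for row in weights for w in row]
    rng.shuffle(flat)  # in-place random permutation of all entries
    return [flat[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


# Hypothetical 3x3 connectivity matrix, for illustration only.
W = [[0.0, 0.5, 1.2],
     [0.3, 0.0, 0.0],
     [0.9, 0.1, 0.0]]
W_shuffled = shuffle_connectivity(W, seed=0)
```

      Because the operation is a permutation, sorting the entries of the original and the shuffled matrix gives identical lists, which is the sense in which the weight distribution is preserved.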

      (7) Figure 1 sub-panel ordering was confusing to read (first up down, then left right). Not sure if re- arranging is possible, but perhaps it could be A, B, and C at the top, with subsublabels (i) and (ii). Might become too busy though.

      We rearranged Fig. 1 as suggested by the reviewer.

      (8) Equation 3 in the main text should specify that 'y' stands for either E or I.

      This has been specified in the revision (page 3). 

      (9) Figure 1D shows a rough sketch of the types of connectivities that exist, but I would find it very useful to also see the actual connection strengths and the effect of enforcing Dale's law.

      We revised this figure (now Fig. 1B (ii)) and added connection strengths as well as a sketch of a connection that was removed because of Dale’s law.

      (10) The main text mentions how the readout weights are defined (normal distributions), but I think this should also be mentioned in the methods.

      Agreed. We indeed had a Methods section, “Parametrization of synaptic connectivity” (page 46), where we explain how readout weights are defined. We apologize if the pointer to this section was not salient enough in the first submission. We made sure that the revised main text contains a clear pointer to this Methods section for details.

      (11) The text seems to mix ‘decoding weights’ and ‘readout weights’.

      Thanks for this suggestion to use consistent language. We opted for ‘decoding weights’ and removed ‘readout weights’.

      (12) The way the paper is written makes it quite hard to parse what are new experimental predictions, and what results reproduce known features. I wonder if some sort of 'box' is possible with novel predictions that experimentalists could easily look at and design an experiment around.

      We have now revised the text. For every property of the model, we clarified whether it is a prediction that has not yet been experimentally tested or whether it accounts for previously observed properties of biological neurons. Please see the reply to point 4 of Reviewer 1.

      (13) Typos etc.:

      Page 5 bottom -- ("all") should have one of the quotes change direction (common latex typo, seems to be the only place with the issue).

      We thank the reviewer for pointing out this typo, which has been corrected in the revision.

    1. It is not, I think, uncharitable to say that we can see in his argument that he has only half digested the spiritual message of liberalism which he is seeking to convey to the legal profession. For everything that he says is really dependent upon an enormous overvaluation of the importance of the bare fact that a rule may be said to be a valid rule of law, as if this, once declared, was conclusive of the final moral question: "Ought this rule of law to be obeyed?" Surely the truly liberal answer to any sinister use of the slogan "law is law" or of the distinction between law and morals is, "Very well, but that does not conclude the question. Law is not morality; do not let it supplant morality."

      He has digested only half of the spiritual message of liberalism. What half did he digest and which did he omit?

      He recognized the liberal principle that a law is law regardless of moral principles. However, he neglected the second half of the principle, which stipulates that the fact that 'law is law' does not mean the law has to be obeyed.

      Once again, we reaffirm that law is NOT morality; do not let it supplant morality.


    1. Reviewer #1 (Public review):

      Summary:

      The Authors investigated the anatomical features of the excitatory synaptic boutons in layer 1 of the human temporal neocortex. They examined the size of the synapse, the macular or the perforated appearance and the size of the synaptic active zone, the number and volume of the mitochondria, the number of the synaptic and the dense core vesicles, also differentiating between the readily releasable, the recycling and the resting pool of synaptic vesicles. The coverage of the synapse by astrocytic processes was also assessed, and all the above parameters were compared to other layers of the human temporal neocortex. The Authors conclude that the subcellular morphology of the layer 1 synapses is suitable for the functions of the neocortical layer, i.e. the synaptic integration within the cortical column. The low glial coverage of the synapses might allow glutamate spillover from the synapses, enhancing synaptic crosstalk within this cortical layer.

      Strengths:

      The strengths of this paper are the abundant and very precious data about the fine structure of the human neocortical layer 1. Quantitative electron microscopy data (especially that derived from the human brain) are very valuable, since this is highly time- and energy-consuming work. The techniques used to obtain the data, as well as the analyses and the statistics performed by the Authors are all solid, strengthen this manuscript, and mainly support the conclusions drawn in the discussion.

      Comments on latest version:

      The corrected version of the article titled "Ultrastructural sublaminar specific diversity of excitatory synaptic boutons in layer 1 of the adult human temporal lobe neocortex" has been improved thanks to the comments and suggestions of the reviewers. The Authors implemented several of my comments and suggestions. However, many of them were not addressed. It is understandable that the Authors did not start a whole new series of experiments investigating inhibitory synapses (as this was a misunderstanding affecting two of the three reviewers). But the English text is still very hard to understand and has many mistakes, although I suggested an extensive review of the use of English. Furthermore, my suggestions about avoiding the many abbreviations in the abstract, analysing and discussing the perforated synapses in more depth, the figure presentation (Figure 3), and including data about the astrocytic coverage in the Results section were not implemented. My questions about the number of docked vesicles and p10 vesicles, as well as about the different categories of the vesicle pools, have not been answered either. Many other minor comments and suggestions were answered, corrected and implemented, but I think the manuscript could have been improved more if the Authors had taken into account all of the reviewers' suggestions, not only some of them. I still have several main and minor concerns, including a few new ones that I did not notice earlier but still think are important.

      Main concerns:

      (1) Epileptic patients:<br /> As all patients were epileptic, it is not correct to state in the abstract that non-epileptic tissue was investigated. Even if the seizure onset zone was not in the region investigated, seizures usually invade the temporal lobe in TLE. If you can prove that no spiking activity occurred in the sample you investigated and the seizures did not invade that region, then you can write that it is presumably non-epileptic. I would suggest writing "L1 of the human temporal lobe neocortical biopsy tissue". See also Methods lines 608-612. Write "non-epileptic" or "non-affected" only if you verified it with ECoG. If this was the case, please write a few sentences about it in the Methods.

      (2) About the inhibitory/excitatory synapses.<br /> Since our focus was on excitatory synaptic boutons as already stated in the title we have not analyzed inhibitory SBs.<br /> Now, I do understand that only excitatory synapses were investigated. Although it was written in the title, I did not realize it, since throughout the manuscript the Authors were writing about synapses, distinguishing between inhibitory and excitatory synapses in the text, showing numerous excitatory and inhibitory synapses in Figure 2, and discussing inhibitory interneurons in the Discussion as well. Maybe this was the reason why two of the three reviewers (including myself) thought you investigated both types of synapses but did not differentiate between them. So, please emphasize in the Abstract (line 40), Introduction (e.g. lines 92-97) and the Discussion (line 369) that only excitatory synaptic boutons were investigated.<br /> As this paper investigated only excitatory synaptic boutons, I think it is irrelevant to write such a long section in the Discussion about inhibitory interneurons and their functions in L1 of the human temporal lobe neocortex. The same applies to the schematic drawing of the possible wiring of L1 (Figure 7). As neither inhibitory interneurons nor the connections of the different excitatory cells were examined, only the morphology of single synaptic boutons without any reference to their origin, I think this figure does not illustrate the work done in this paper. This could be a figure of a review paper about the human L1, but it is inappropriate in this study.

      (3) Perforated synapses<br /> "the findings of the Geinisman group suggesting that perforated synapses are more efficient than non-perforated ones is nowadays very controversially discussed"<br /> I did not ask the Authors to say that perforated synapses are more efficient. However, based on the literature (e.g. Harris et al, 1992; Carlin and Siekevitz, 1982; Nieto-Sampedro et al., 1982) the presence of perforated synapses is indeed a good sign of synapse division/formation - which in turn might be coupled to synaptic plasticity (Geinisman et al, 1993), increased synaptic activity (Vrensen and Cardozo, 1981), LTP (Geinisman et al, 1991, Harris et al, 2003), pathological axonal sprouting (Frotscher et al, 2006), etc. I think it is worth mentioning this at least in the Discussion.

      (4) Question about the vesicle pools<br /> Results, Line 271: It is still not understandable why the RRP was defined as ≤10 nm and ≤20 nm. Why did you use two categories? One would be sufficient (for example ≤20 nm). Or were the vesicles between 10 and 20 nm considered to be part of the RRP? In this case there is a typo; it should be ≥10 nm and ≤20 nm.<br /> The Authors' answer to my question was: We decided that also those very close within 10 and 20 nm away from the PreAZ, which is less than a SV diameter may also contribute to the RRP since it was shown that SVs are quite mobile.<br /> This does not clarify why you used two categories. Furthermore, I (like Referee #2) did not receive an answer to my question on how you could have 3x as many docked vesicles as vesicles ≤10 nm. The category ≤10 nm should also contain the docked vesicles. Or if this is not the case, please clarify what your categories were.

      (5) Astrocytic coverage<br /> In Fig. 6, data are presented on the astrocytic coverage derived from L1 and L4. In my previous review I asked to include this in the text of the Results as well, but I still do not see it. The Results also do not state how many samples from which layer were investigated in this analysis. Only percentages are given, and only for L1 (but how many patients, and whether L1a and/or L1b and/or L4, is not provided). In contrast, Figure 6 and Supplementary Table 2 (patient table) contain the information that this analysis was made in L4 as well. Please include this information in the text as well (around lines 348-360).<br /> About how to determine glial elements. I cannot agree with the Authors that glial elements can be determined with high certainty based only on the anatomical features of the profiles seen in the EM. "With 25 years of experience in (serial) EM work" I would say that glial elements can be very similar to spine necks and axonal profiles.<br /> All in all, if similar methods were used to determine the glial coverage in the different layers of the human neocortex, then it can be compared (I guess this is the case). However, I would say in the text that proper determination would need immunostaining and a new analysis. This only gives an estimate with the possibility of a certain degree of error.

      (6) Large interindividual differences in the synapse density should be discussed in the Discussion.

    2. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigated the anatomical features of the synaptic boutons in layer 1 of the human temporal neocortex. They examined the size of each synapse, the macular or perforated appearance, the size of the synaptic active zone, the number and volume of the mitochondria, and the number of synaptic and dense core vesicles, also differentiating between the readily releasable, the recycling, and the resting pool of synaptic vesicles. The coverage of the synapse by astrocytic processes was also assessed, and all the above parameters were compared to other layers of the human temporal neocortex. The authors conclude that the subcellular morphology of the layer 1 synapses is suitable for the functions of the neocortical layer, i.e. the synaptic integration within the cortical column. The low glial coverage of the synapses might allow increased glutamate spillover from the synapses, enhancing synaptic crosstalk within this cortical layer.

      Strengths:

      The strengths of this paper are the abundant and very precious data about the fine structure of the human neocortical layer 1. Quantitative electron microscopy data (especially that derived from the human brain) are very valuable since this is a highly time- and energy-consuming work. The techniques used to obtain the data, as well as the analyses and the statistics performed by the authors are all solid, strengthen this manuscript, and mainly support the conclusions drawn in the discussion.

      We would like to thank reviewer #1 for his very positive comments on our manuscript, stating that such data about the fine structure of the human neocortex are highly relevant.

      Weaknesses:

      There are several weaknesses in this work. First, the authors should check and review extensively for improvements to the use of English. Second, several additional analyses performed on the existing data could substantially elevate the value of the data presented. Much more information could be gained from the existing data about the functions of the investigated layer, of the cortical column, and about the information processing of the human neocortex. Third, several methodological concerns weaken the conclusions drawn from the results.

      We would like to thank the reviewer for his critical and thus helpful comments on our manuscript. We took up the reviewer’s first comment concerning the English and have improved our manuscript by rephrasing and shortening sentences. Secondly, according to the reviewer, several additional analyses should be performed on the existing data, which could substantially elevate the value of the data presented. We will implement some of the suggestions in the improved version of the manuscript where appropriate. We give a more detailed answer to the reviewer’s queries in our replies to her/his suggestions to the authors (see below). However, the reviewer states himself: “The techniques used to obtain the data, as well as the analyses and the statistics performed by the authors are all solid, strengthen this manuscript, and mainly support the conclusions drawn in the discussion”.

      Reviewer #2 (Public review):

      Summary:

      The study of Rollenhagen et al. examines the ultrastructural features of Layer 1 of the human temporal cortex. The tissue was derived from drug-resistant epileptic patients undergoing surgery, and was selected as far as possible from the epilepsy focus, and as such considered to be non-epileptic. The analyses included 4 patients with different ages, sex, medication, and onset of epilepsy. The manuscript is a follow-on study with 3 previous publications from the same authors on different layers of the temporal cortex:

      Layer 4 - Yakoubi et al 2019 eLife

      Layer 5 - Yakoubi et al 2019 Cerebral Cortex

      Layer 6 - Schmuhl-Giesen et al 2022 Cerebral Cortex.

      They find, that the L1 synaptic boutons mainly have a single active zone, a very large pool of synaptic vesicles, and are mostly devoid of astrocytic coverage.

      Strengths:

      The manuscript is well-written and easy to read. The Results section gives a detailed set of figures showing many morphological parameters of synaptic boutons and glial elements. The authors provide comparative data of all the layers examined by them so far in the Discussion. Given that anatomical data in the human brain are still very limited, the current manuscript has substantial relevance. The work appears to be generally well done, the EM and EM tomography images are of very good quality. The analysis is clear and precise.

      We would like to thank the reviewer for his very positive evaluation of our paper and the comments that such data have a substantial relevance, in particular in the human neocortex. In contrast to reviewer#1, this reviewer’s opinion is that the manuscript is well written and easy to read.

      Weaknesses:

      One of the main findings of this paper is that "low degree of astrocytic coverage of L1 SBs suggests that glutamate spillover and as a consequence synaptic cross-talk may occur at the majority of synaptic complexes in L1". However, the authors only quantified the volume ratio of astrocytes in all 6 layers, which is not necessarily the same as the glial coverage of synapses. In order to strengthen this statement, the authors could provide 3D data (that they have from the aligned serial sections) detailing the percentage of synapses that have glial processes in close proximity to the synaptic cleft, that would prevent spillover.

      We agree with the reviewer that we only quantified the volume ratio of the astrocytic coverage but not necessarily the percentage of synapses that may or may not contribute to the formation of the ‘tripartite’ synapse. As suggested, we will re-analyze our material with respect to the percentage of coverage for individual synaptic boutons in each layer and will implement the results in the improved version of the manuscript. However, since this is a completely new and time-consuming analysis, we would like to ask the reviewer for additional time to perform this task.

      A specific statement is missing on whether only glutamatergic boutons were analyzed in this MS, or GABAergic boutons were also included. There is a statement, that they can be distinguished from glutamatergic ones, but it would be useful to state it clearly in the Abstract, Results, and Methods section what sort of boutons were analyzed. Also, what is the percentage of those boutons from the total bouton population in L1?

      We would like to thank the reviewer for this comment. Although our title clearly states that we focused on quantitative 3D-models of excitatory synaptic boutons, we will point that out more clearly in the Methods and Results chapters. Our data support recent findings by others (see for example Cano-Astorga et al. 2023, 2024; Shapson-Coe et al. 2024) that have evaluated the ratio of excitatory to inhibitory synaptic boutons in the temporal lobe neocortex, the same area as in our study, and found 10-15% inhibitory terminals, with significant layer- and region-specific differences. We will include the excitatory vs. inhibitory ratio and the corresponding citations in the Results section.

      Synaptic vesicle diameter (that has been established to be ~40nm independent of species) can properly be measured with EM tomography only, as it provides the possibility to find the largest diameter of every given vesicle. Measuring it in 50 nm thick sections results in underestimation (just like here the values are ~25 nm) as the measured diameter will be smaller than the true diameter if the vesicle is not cut in the middle, (which is the least probable scenario). The authors have the EM tomography data set for measuring the vesicle diameter properly.

      We partially disagree with the reviewer on this point. Using high-resolution transmission electron microscopy, we measured the outer-to-outer membrane distance only on those synaptic vesicles that were round in shape with a clear ring-like structure to avoid double counts, and discarded all those that were only partially cut, according to criteria developed by Abercrombie (1946) and Boissonnat (1988). We assumed that within a 55±5 nm thick ultrathin section (silver to gray interference contrast) all clear-ring-like vesicles were distributed in this section, assuming a vesicle diameter between 25 and 40 nm. For large DCVs, double-counts were excluded by careful examination of adjacent images, and DCVs were only counted in the image where they appeared largest.

      In addition, we have measured synaptic vesicles using TEM tomography and came to similar results. We will state in Material and Methods that both methods were used.

      It is a bit misleading to call vesicle populations at certain arbitrary distances from the presynaptic active zone as readily releasable pool, recycling pool, and resting pool, as these are functional categories, and cannot directly be translated to vesicles at certain distances. Indeed, it is debated whether the morphologically docked vesicles are the ones, that are readily releasable, as further molecular steps, such as proper priming are also a prerequisite for release.

      We thank the reviewer for this comment. However, nobody before us tried to define a morphological correlate for the three functionally defined pools of synaptic vesicles since synaptic vesicles normally are distributed over the entire nerve terminal. As already mentioned above, after long and thorough discussions with Profs. Bill Betz, Chuck Stevens, Thomas Schikorski and other experts in this field we tried to define the readily releasable (RRP), recycling (RP) and resting pools by measuring the distance of each synaptic vesicle to the presynaptic density (PreAZ). Using distance as a criterion, we defined the RRP as including all vesicles located within a distance (perimeter) of 10 to 20 nm from the PreAZ, which is less than an average vesicle diameter (between 25 and 40 nm). The RP was defined as vesicles within a distance of 60-200 nm, still quite close and rapidly available on demand, and the remaining ones beyond 200 nm were suggested to belong to the resting pool. This concept was developed for our first publication (Sätzler et al. 2002), and this approximation has since been widely acknowledged by scientists working in the field of synaptic neuroscience and by computational neuroscientists. We were asked by several labs worldwide whether they can use our data of the perimeter analysis for modeling. We agree that our definition of the three pools can be seen as arbitrary, but we never claimed that our approach is the truth and nothing but the truth. Concerning the debate whether only docked vesicles or also those very close to the PreAZ should constitute the RRP, we have a paper in preparation using our perimeter analysis, EM tomography and simulations trying to clarify this debate. Our preliminary results suggest that the size of the RRP should be reconsidered.
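
      The distance criteria stated in this reply can be written down as a small classifier. This is only a sketch of the thresholds as quoted above, not the authors' analysis code: the function names are hypothetical, treating every vesicle at 20 nm or closer as RRP is an assumption, and the reply does not assign vesicles lying between 20 and 60 nm to any pool, so they are reported as unassigned here.

```python
from collections import Counter


def classify_vesicle(distance_nm):
    """Assign a vesicle to a pool from its distance to the PreAZ (in nm).

    Thresholds follow the reply: RRP up to 20 nm, RP at 60-200 nm,
    resting pool beyond 200 nm. Distances between 20 and 60 nm are not
    covered by the stated definition and are returned as "unassigned".
    """
    if distance_nm < 0:
        raise ValueError("distance must be non-negative")
    if distance_nm <= 20:  # assumption: includes docked vesicles
        return "RRP"
    if 60 <= distance_nm <= 200:
        return "RP"
    if distance_nm > 200:
        return "resting"
    return "unassigned"


def pool_sizes(distances_nm):
    """Count vesicles per pool for one synaptic bouton."""
    return Counter(classify_vesicle(d) for d in distances_nm)


# Hypothetical distances for one bouton, for illustration only.
example = pool_sizes([5, 18, 40, 75, 150, 300, 450])
```

      Making the unassigned band explicit shows how strongly the pool sizes depend on where the boundaries are drawn, which is the point both reviewers raise about the categories.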

      Tissue shrinkage due to aldehyde fixation is a well-documented phenomenon that needs compensation when dealing with density values. The authors cite Korogod et al 2015 - which actually draws attention to the problem comparing aldehyde fixed and non-fixed tissue, still the data is non-compensated in the manuscript. Since all the previous publications from this lab are based on aldehyde fixed non-compensated data, and for this sake, this dataset should be kept as it is for comparative purposes, it would be important to provide a scaling factor applicable to be able to compare these data to other publications.

      We thank the reviewer for his suggestion. However, for several reasons we did not correct for shrinkage caused by aldehyde fixation. Papers by Eyre et al. (2007) and the mentioned paper by Korogod et al. 2015 have demonstrated that cryo-fixation reveals larger numbers of docked synaptic vesicles, a smaller glial volume, and a less intimate glial coverage of synapses and blood vessels compared to chemical fixation. Other structural subelements such as active zone size and shape and the total number of synaptic vesicles remained unaffected. In two further publications, Zhao et al. (2012a, b), investigating hippocampal mossy fiber boutons using cryo-fixation and substitution, came to similar results with respect to bouton and active zone size and number and diameter of synaptic vesicles compared to aldehyde-fixation as described by Rollenhagen et al. 2007 for the same nerve terminal. This was one of the reasons for not correcting for shrinkage. In addition, all cited papers state that chemical fixation in general provides a much better ultrastructural preservation of tissue samples when compared with cryo-fixation and substitution, where optimal preservation is only regional within a block of tissue and therefore less suitable for large-scale ultrastructural analyses such as those we performed.

      Reviewer #3 (Public review):

      Summary:

      Rollenhagen et al. offer a detailed description of layer 1 of the human neocortex. They use electron microscopy to assess the morphological parameters of presynaptic terminals, active zones, vesicle density/distribution, mitochondrial morphology, and astrocytic coverage. The data is collected from tissue from four patients undergoing epilepsy surgery. As the epileptic focus was localized in all patients to the hippocampus, the tissue examined in this manuscript is considered non-epileptic (access) tissue.

      Strengths:

      The quality of the electron microscopic images is very high, and the data is analyzed carefully. Data from human tissue is always precious and the authors here provide a detailed analysis using adequate approaches, and the data is clearly presented.

      We are very thankful to the reviewer for his very positive comments about our data analysis and presentation.

      Weaknesses:

      The study provides only morphological details, these can be useful in the future when combined with functional assessments or computational approaches. The authors emphasize the importance of their findings on astrocytic coverage and suggest important implications for glutamate spillover. However, the percentage of synapses that form tripartite synapses has not been quantified, the authors' functional claims are based solely on volumetric fraction measurements.

      We thank the reviewer for his critical comments on our findings concerning the layer-specific astrocytic coverage as also suggested by reviewer#2. As already stated above we will analyze the astrocytic coverage and the layer-specific percentage of astrocytic contribution to the ‘tripartite’ synapse in more detail. We are, however, a bit puzzled by the comment, one that structural anatomists often receive, that our study only provides morphological details. Our thorough analysis of structural and synaptic parameters of synaptic boutons underlies and might even predict the function of synaptic boutons in a given microcircuit or network and will thus very much improve our understanding and knowledge about the functional properties of these structures, in particular in the human brain where such studies are still quite rare. The main goal of our studies in the human neocortex was the quantitative morphology of synaptic boutons and thus the synaptic organization of the cortical column, layer by layer, which to our knowledge is the first such detailed study undertaken in the human brain. Our efforts have set a gold standard in the analysis of synaptic boutons embedded in different microcircuits and are by now internationally very well accepted.

      The distinction between excitatory and inhibitory synapses is not clear, they should be analyzed separately.

      As already stated above in response to reviewer#1, our study focused on excitatory synaptic boutons since they represent the majority of synapses. However, in the improved version of our manuscript, we included in the Material and Methods section a paragraph with structural criteria to distinguish excitatory from inhibitory terminals (see also our comment to reviewer#1 concerning this point), including appropriate citations.

      The text connects functional and morphological characteristics in a very direct way. For example, connecting plasticity to any measurement the authors present would be rather difficult without any additional functional experiments. References to various vesicle pools based on the location of the vesicles are also more complex than suggested in the manuscript. The text should better reflect the limitations of the conclusions that can be drawn from the authors' data.

      We thank the reviewer for this comment. However, numerous publications have by now shown that the shape and size of the active zone, together with the pool of synaptic vesicles and the astrocytic coverage, critically determine synaptic transmission and synaptic strength, and can also contribute to the modulation of synaptic plasticity (see also citations within the text). It has been shown that synaptic boutons can switch upon certain stimulation conditions to different modes of release (uni- vs. multiquantal, uni- vs. multivesicular release) and from asynchronous to synchronous release, leading also to the modulation of synaptic short- and long-term plasticity. To the second comment: When we started with our first paper about the Calyx of Held – principal neuron synapse in the MNTB (Sätzler et al. 2002) we tried to define a morphological correlate for the three functionally defined pools. As already mentioned above in our reply to the other two reviewers, this is rather difficult since synaptic vesicles are normally distributed over the entire nerve terminal. After long and thorough discussions with Bill Betz, Chuck Stevens and other leading scientists in the field of synaptic neuroscience, we together with Bert Sakmann tried to define a morphological correlate for the functionally defined pools using a perimeter analysis. We defined the readily releasable pool as vesicles 10 to 20 nm away from the presynaptic active zone, the recycling pool as those at 60-200 nm distance, and the remaining vesicles as belonging to the resting pool. However, it has been shown by capacitance measurements (see for example Hallermann et al 2003), FM1-43 investigations (see for example Henkel et al. 1996) and high-resolution electron microscopy (see for example Schikorski and Stevens 2001; Schikorski 2014) that our estimate of the RRP nearly perfectly matches the functionally defined pools at hippocampal and cortical synapses (Silver et al. 2003). In addition, in one of our own papers (Rollenhagen et al. 2018) we also estimated the RP functionally from trains of EPSPs using an exponential fit analysis and obtained a size similar to that from the perimeter analysis.

      Of course, as stated by the reviewer, the scenario could be more complex if other criteria were used. We never claimed that our morphologically defined pools are the truth and nothing but the truth, but we believe they offer a quite good approximation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Abstract:

      Avoid the numerous abbreviations in the abstract. The paragraph describing the results obtained in this study is too short. Include more results, such as the size of the active zone, the proportion of perforated synapses, the ratio of synapses terminating on dendrites/spines, the percentage of volume occupied by mitochondria, etc. In the last paragraph, compare the layer-specific data to other layers of the neocortex before writing the concluding sentence.

      To meet the word limits of the abstract (150 words) defined by eLife we had to use abbreviations. We followed the suggestions by the reviewer and expanded our abstract by adding the proportion of macular vs. perforated active zone and the percentage of mitochondria within an SB. However, we did not include the comparison of structural parameters in the Abstract since this is discussed thoroughly in the MS at other places (see Results and Discussion).

      Results:

      First of all, wonderful data! Lots of work, very valuable quantitative electron microscopy results.

      Main concerns:

      Adding several analyses would give much more information about the cortical synaptic organization. It would be very useful to differentiate between excitatory and inhibitory terminals (and give their ratio) and include this information in all different analyses, such as in the SV number, SV pool analysis, mitochondrion analysis, etc., that would give functional information as well. You have all the data for this, and you know how to differentiate between inhibitory and excitatory synapses, it can be done. We could see the possible morphological differences between excitatory and inhibitory synapses (maybe one is larger/has more SVs, etc. than the other). Based on these possible differences conclusions could be drawn about functional hypotheses, such as one or the other is more efficient in inducing postsynaptic potentials, excitation or inhibition is more pronounced in layer 1, etc. Furthermore, looking at the ratio of perforated synapses, we could gain information about the formation of new synapses. Maybe there is a difference between excitatory and inhibitory circuits in this point of view.

      To the first point: Since our focus was on excitatory synaptic boutons, as already stated in the title, we have not analyzed inhibitory SBs. To do so, we would have to re-analyze our complete data, which is time-consuming and an additional workload. However, we can give the ratio of excitatory vs. inhibitory synaptic boutons: inhibitory boutons accounted for 10-15%, with layer-specific differences. Our findings are in good agreement with a recent publication in Science by the Lichtman group (Shapson-Coe et al. 2024) and work by the DeFelipe group (Cano-Astorga et al. 2023, 2024), which, as we did, estimated the number of inhibitory boutons in different layers of the temporal lobe neocortex at 10-15%. We included a small paragraph about inhibitory synapses and their percentage, together with the citations, in our Results section. Concerning the ratio between macular, non-perforated and perforated active zones, we stated that the majority of synaptic boutons were of the macular, non-perforated type (~75%; see improved version of the MS). Perforations, where present, were found predominantly on the postsynaptic site, but were quite rare in L1 SBs. Since GABAergic terminals had only a small or no clearly visible PSD, this would be hard to assess.

      To the last point, it has been demonstrated that the number of dense core vesicles and their fusion with the presynaptic density could be a critical factor in the build-up of the active zone. In addition, the findings of the Geinisman group suggesting that perforated synapses are more efficient than non-perforated ones are nowadays highly controversial, since other factors such as the size of the active zone (see for example Matz et al. 2010; Holderith et al. 2012) and the astrocytic coverage contribute to synaptic efficacy and strength.

      Related to this topic: although in the case of rat CA1 pyramidal cells all inhibitory synapses terminated on dendritic shafts (Megias et al., Neuroscience 2001), please be aware that both excitatory and inhibitory synapses can terminate on both dendritic shafts and spines in humans (inhibitory synapses are though rare on spines, usually less than 10%, but they do exist, see for example Wittner et al, Neuroscience, 2001). Please, define the excitatory/inhibitory nature of the synapses based on morphological features (not on their postsynaptic target), i.e., flattened vesicles and thin postsynaptic density for GABAergic synapses, whereas larger, round vesicles and thick postsynaptic density for glutamatergic synapses. Anyway, the ratio of excitatory and inhibitory synapses on dendrites and spines in the two sublamina would also give useful information about the synaptic organization of the human neocortical layer 1.

      We are aware that not all terminals targeting spines are excitatory; conversely, it has been shown that not all terminals on shafts are inhibitory, as was long thought (Silver et al. 2003). However, as stated by the reviewer, their abundance on spines is rather low. At the moment it is rather unclear what functional impact inhibitory terminals on spines have, apart from local inhibition (see for example Kubota et al. eLife 2015), and thus their role is rather speculative since excitatory synapses are the predominant class on dendritic spines. As already stated above, the ratio of excitatory vs. inhibitory terminals is between 10-15% and not significantly different between the two sublaminae. We are willing to add this to the Results section (see the improved version of the manuscript).

      (2) About the glial coverage: Please, specify how glial elements were determined. What were the morphological features specific to astroglial processes? In Figure 5, how could we know whether the glial element marked by green is not a spine neck? The lack of morphological features specific to glial processes makes this analysis weak. The most accurate would be to make it with the aid of GFAP staining. I know this is not possible with your existing data, but at least, provide information on how glial processes were identified.

      We used the criteria first described by Peters et al. (1991) and Ventura and Harris (1999), identifying astrocytic profiles by their irregular stellate shape, relatively clear cytoplasm, numerous glycogen granules and bundles of intermediate filaments. After more than 20 years of structural investigations, we hope that the reviewers will believe us that we can identify astrocytic processes at the high-resolution TEM level. In some of our publications (Rollenhagen et al. 2007; 2015; 2018; Yakoubi et al. 2019a) we have used glutamine synthetase pre-embedding immunohistochemistry to identify astrocytic processes, but a disadvantage of this method is the reduced ultrastructural preservation of the tissue. We have included the criteria used to identify astrocytic processes for the analysis of glial coverage in our manuscript, together with the two citations (see improved version of the manuscript).

      (3) The authors state that the total number of SVs was very variable. How was the distribution of the number of SVs? Homogenous distribution suggests that different types of synapses cannot be distinguished based on their morphological features, whereas distribution with more than one peak would suggest that different types of synapses are present in L1, and that they can be differentiated by their appearance (number of SVs, for example). This might be also related to the type of synapse (i.e., excitatory or inhibitory). The same applies to the number of RP and resting pool SVs.

      To look for differences in structural and synaptic parameters that could further classify synaptic boutons, we performed a hierarchical cluster and multivariate analysis. However, it turned out that no further classification into subtypes was possible on the basis of structural and functional parameters.

      (4) The authors should check and review extensively for improvements to the use of English. The Results and Discussion sections contain many sentences which are not easy to understand. They have either a too complicated structure, or they are incomplete and hard to follow. A few examples: "The RRP/PreAZ at p20 nm criterium was on average 19.05 ± 17.23 SVs (L1a: 25.04 ± 21.09 SVs and L1b: 13.07 ± 13.87SVs) and thus nearly 2-fold larger for L1a." If you take out the parenthesis, the sentence has no meaning. "The majority of SBs in L1 of the human TLN had a single at most three AZs that could be of the non-perforated macular or perforated type comparable with results for other layers in the human TLN but by ~1.5-fold larger than in rodent and non-human primates." Rephrase these types of sentences, please.

      We partially agree with the reviewer. We have improved our manuscript by rephrasing and shortening sentences.

      Other suggestions:

      (1) Put the synaptic density part after the description of the neuronal and synaptic composition part, it is more logical this way (i.e., first qualitative description, the distinction between sublayers, then quantitative data). Please write down in the description of the neuronal and synaptic composition part how L1a and L1b were differentiated (see also my comment on Figure 1).

      We agree with the reviewer and did the change according to the suggestion. For a better understanding, we have also expanded the neuronal and synaptic description of the two sublaminae in L1.

      (2) Introduce a list of abbreviations at the beginning, that would help.

      It is quite unusual to provide a list of abbreviations in eLife. However, when used first the full meaning of the abbreviations is now given.

      (3) What is cleft width? Usually, it refers to the distance between the pre- and the postsynaptic membrane, but here, I think it refers to the size (diameter) of the active zone. Please, clarify in the Result section (as it appears earlier than the Methods section, where it is explained). I would probably use the expression "synaptic cleft size" instead of "synaptic cleft width" to avoid misunderstanding.

      We thank the reviewer for the suggestion and used synaptic cleft size for better clarity and have transferred the sentence from the Material and Methods to the Results section.

      (4) The description of the different SVs (RRP, RP, etc.) is not clear in lines 236-242. What does it mean, that RRP vesicles are located ≤10 nm and ≤20 nm from the active zone? Explain, why the two different distance criteria were used. Furthermore, how were the vesicles located at p20-p60 defined? Why were these vesicles not considered in the determination of the different pools?

      As stated in the public review in response to the reviewer's concern, we have tried to define a morphological correlate of the three functionally defined pools. After thorough discussions with leading scientists in the field of synaptic neuroscience, we decided to use the distance of individual vesicles from the PreAZ and to sort vesicles according to these criteria. One can argue that this approach is arbitrary; however, these distance criteria were described by Rizzoli and Betz (2004, 2005) and Denker and Rizzoli (2010). As also stated in the public review, there is still a controversial discussion as to whether only docked or omega-shaped SVs constitute the RRP. We decided that SVs within 10 and 20 nm of the PreAZ, which is less than one SV diameter, may also contribute to the RRP, since it has been shown that SVs are quite mobile.
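
      The distance-based sorting described above can be sketched in a few lines. This is only an illustration, not the authors' analysis code: the function names, the "unassigned" label for vesicles that fall between the RRP cutoff and 60 nm, and the example distances are assumptions introduced here. The cutoffs (10 or 20 nm for the RRP, 60-200 nm for the recycling pool, more than 200 nm for the resting pool) are the ones quoted in the replies.

```python
from collections import Counter


def classify_vesicle(distance_nm, rrp_cutoff_nm=20.0):
    """Assign one vesicle to a pool from its distance to the PreAZ (in nm).

    Hypothetical helper: the pool boundaries follow the perimeter criteria
    quoted in the text; the 'unassigned' label is an assumption for vesicles
    lying between the RRP cutoff and 60 nm.
    """
    if distance_nm < 0:
        raise ValueError("distance must be non-negative")
    if distance_nm <= rrp_cutoff_nm:
        return "RRP"          # readily releasable pool (p10 or p20 criterion)
    if 60.0 <= distance_nm <= 200.0:
        return "RP"           # recycling pool
    if distance_nm > 200.0:
        return "resting"      # resting pool
    return "unassigned"       # gap between the RRP cutoff and 60 nm


def pool_sizes(distances_nm, rrp_cutoff_nm=20.0):
    """Count vesicles per pool for one bouton."""
    return Counter(classify_vesicle(d, rrp_cutoff_nm) for d in distances_nm)


# Made-up distances for a single bouton, compared under both RRP criteria.
distances = [5.0, 15.0, 40.0, 80.0, 150.0, 250.0, 400.0]
print(pool_sizes(distances, rrp_cutoff_nm=10.0))
print(pool_sizes(distances, rrp_cutoff_nm=20.0))
```

      Running both cutoffs on the same bouton shows how the p10 and p20 criteria change only the RRP and the unassigned counts, while the recycling and resting pools are unaffected.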

      (5) Please, explain how the number of docked vesicles can be 3x larger in L1b, than the number of vesicles located at p10? Docked vesicles are the closest (with the membrane touching the PreAZ)... if this comes from the fact that another pool of boutons was used for the EM tomography analysis than the entire pool of boutons analyzed, then it means that the selection of boutons for the EM tomography is highly biased. This also implies that EM tomography data are most probably not valid for the entire L1b. The difference might also come from the different ratios of dendrite/spine synapses included in the two different analyses. In this case, it would be helpful to distinguish between synapses terminating on dendrites/spines and analyse them separately (same as for inhibitory/excitatory, which is not exactly the same as dendrite/spine!). Different n numbers of synapses are given in the text (n=25, 25, 25, 25) and in Table 2 (n=91, 98, 87, and 84) for the analysis of the docked vesicles, please, correct this.

      This is a correct value and thus there is a nearly 3-fold difference. The TEM tomography was carried out on the same blocks that have been used for our 3D-volume reconstructions. To carry out TEM tomography we had to use thicker sections (250 nm) to look for complete SBs as we also did in our serial sections, but of course, we could not quantify the same SBs. The completeness of SBs was one of our main criteria to reconstruct structural and synaptic parameters. The second was that the synaptic cleft was cut perpendicular. Only SBs that met these criteria were chosen for further quantitative analysis. In this respect we are of course biased in both methods.

      Secondly, as already stated we did not quantify inhibitory terminals in serial sections. However, we did not find significant differences between shaft vs. spine synapses.

      Finally, Table 2 gives the total number of ‘docked’ SVs counted across the total number of SBs analyzed.

      Discussion:

      Please include the recent findings of human L1 neurons, including the "rosehip" cells in the L1 neuronal network, see Boldog et al., Nat Neurosci 2018. It would be also useful to consider in the discussion the human-specific cortical synchrony and integration phenomena derived from in vitro data (Mansvelder, Lein, Tamas, Wittner, Larkum, Huberfeld labs, etc.), and how the synaptic morphology can be related to these.

      We thank the reviewer and have included the reference in our section on functional significance.

      Figures and Tables:

      Figure 1: In the legend, it is written that CR cells are marked by an asterisk, but on the figure it is marked by arrowheads. H: I would put the dashed line slightly lower, just above the two neuronal cell bodies. Now it looks like in the middle of the astrocytic layer. One of the asterisks marking the CR cell is not above the nucleus of that cell. I: the gabaergic neuron is outside of the framed area. I would delete the frame, anyway, the arrowheads and the asterisk are enough to show what the authors want to show.

      We have changed the Figure according to the suggestions raised by the reviewer.

      Figure 3: The transparent yellow is not visible. It is a bit disturbing that the contours of the boutons are not visible, I would make the transparent yellow stronger (less transparent). The SVs in green/magenta will be still visible.

      We wanted to highlight the internal subelements of SBs and thus made the covering transparent but we think it is still visible.

      Figure 6C: The data concerning other layers than L1 are most probably taken from other publications of the research group. One is cited (for L6), but not the others. Please correct this, or if not, then write this in the Results and Methods.

      We changed the citation in the improved version of the manuscript. We overlooked that the values for L4 and L5 were already published in Schmuhl-Giesen et al. 2022.

      Table 1: What does central and lateral cleft width mean in Table 1? Furthermore, please, give the name for abbreviations CV and IQR in Tables 1 and 2.

      The measurements of the synaptic cleft are now described in detail in the Results section. We now have given the full names for CV and IQR in the legends of tables 1 and 2.

      Supplemental Figures 1 and 2: Why Hu01 and Hu02 are twice? What is the difference? Based on the figure legend, it is L1a and L1b? If yes, please, indicate on the figure or in the legend.

      Supplemental Table 1: What is TLE in the case of Hu_04? If it is temporal lobe epilepsy, then why age at epilepsy onset is missing?

      Yes, Hu01 and Hu02 were selected for both L1a and L1b in separate serial sections preparations each. We indicated this now in the figure legend. Concerning Hu_04, unfortunately we do not have any further information about the medical background of the patient.

      In Supplemental Table 1 (patient table), many abbreviations are explained that do not appear in the table (lBAZ: Brivaracetam CBZ: Carbamazepine; CLB: Clobazam; ESL: Eslicarbazepin; GGL: Ganglioglioma, etc.), please check and correct.

      We have removed the unnecessary abbreviations.

      Other minor suggestions:

      What is Pr? Please, give the name a first appearance (line 368).

      We explained Pr (release probability) when used for the first time.

      Give the name for t-LTD, please (lines 442-443).

      We explained t-LTD (timing-dependent long-term depression) when used for the first time.

      Typo in line 169: DCW instead of DCV (dense core vesicle), DCV is used in the figure legends.

      We changed DCW to DCV.

      Typo in line 190: Yokoubi instead of Yakoubi (reference).

      We changed Yokoubi to Yakoubi.

      Typo in line 237: Rizzoloi instead of Rizzoli (reference).

      We changed Rizzoloi to Rizzoli.

      Line 229-230: One reference is not inserted properly - Piccolo and Bassoon.

      The references to Schoch and Gundelfinger and to Mukherjee concerning the build-up of the active zone and the role of DCVs containing Piccolo and Bassoon are properly cited in the text.

      Typo in line 398: exit instead of exist.

      Corrected

      Typo in line 700: Reynolds (1063) instead of 1963.

      Corrected

      Reviewer #2 (Recommendations for the authors):

      Abstract:

      The last sentence seems far-fetched, and unrelated to the manuscript. How mostly single active zone boutons can "mediate, integrate and synchronize contextual and cross-modal information, enabling flexible and state-dependent processing of feedforward sensory inputs from other layers of the cortical column"? Which of the anatomical findings of the manuscript led to these conclusions?

      According to the review by Schuman et al. (2021), layer 1 is regarded as a layer that mediates, integrates and synchronizes contextual and cross-modal information, enabling flexible and state-dependent processing of feedforward sensory inputs from other layers of the cortical column. The structural quantitative 3D-models of SBs contribute to this view, since SBs are an integral element connecting neurons and building networks.

      I am also puzzled by the authors' statement in more than one place of the manuscript that "L1a can be characterized as a predominantly astrocytic sublamina". If the L1 contains the lowest measured volume ratio of glial processes (Figure 6), then this description does not seem to hold. Please rephrase.

      The reviewer is right and we rephrased the sentences for more clarity in the improved version of our manuscript.

      Results:

      The authors find large inter-patient variability in the synapse density at L1, which raises the issue of what were the criteria to include certain patients in the analyses. Apparently, these are different from the ones analysed in their previous papers, and all the provided parameters were different (sex, age, medication, onset of epilepsy), and any of them can result in altered synapse density.

      First, we have not used all patients for this study. Secondly, it was not possible to use all patients for all six layers.

      It would be useful to add a panel for Figure 1 with synapse density across the different layers, as they provide this data in the Discussion.

      We implemented a Supplementary Table 1 with the synaptic density values over all layers compared in the Discussion.

      I cannot find Source Data 1 in the manuscript although it is referred to in more than 1 place (e.g. page 5 line 100).

      Source data were uploaded as Supplemental Material when our manuscript was submitted directly to eLife. However, as stated by bioRxiv, ‘any Supplemental Materials associated with this manuscript have not been transferred to bioRxiv to avoid the posting of potentially sensitive information’; the source data were therefore not uploaded to the preprint server.

      Page 5 line 100 the correct value is 7.3*10^7 or rather 10^8?

      We corrected the value in the improved version of the MS.

      It would be nice to put the synapse density values into context by comparing them to e.g. mouse, rat, or monkey data.

      Since we are working on the human temporal lobe neocortex, we avoided comparing these data with those estimated in experimental animals. In addition, as discussed by DeFelipe et al. (1999), different methods were used to quantify synaptic density in experimental animals, so these results are difficult to compare.

      Page 5 Line 117 CR-cells stands for Cayal-Retzius cells?

      CR-cells is the abbreviation for Cajal-Retzius cells.

      Page 6 Line 146 repeated sentence.

      We deleted the repeated sentence.

      Page 7 Line 154 "file-scale TEM" ??

      We replaced file-scale by fine-scale.

      Page 7 Line 164 "GABAergic synapses identified by the smaller more spherical SVs". With this fixation condition, GABAergic vesicles are more ovoid than glutamatergic ones. What were the criteria to distinguish them?

      To our knowledge, numerous publications using the same fixation have meanwhile shown that inhibitory terminals contain smaller, more spherical and not roundish synaptic vesicles and show no clear, prominent PSDs, as described in our paper. We have addressed this more clearly in the Results section of the improved version of the MS.

      Page 8 line 197 "The majority (~98%) of SBs in L1a and L1b had only a single (Figures 2C-E, 3A-C, E) at most two or three AZs" is in striking contrast with the other statement from page 7 Line 163 "Numerous SBs in both sublaminae were seen to establish either two or three synaptic contacts on the same spine or dendrite". Which of these statements is valid? Please provide exact quantification for this statement and decide which one is true.

      It is true that the majority of synaptic boutons had a single active zone. However, for example on a spine not only a single but also two or three SBs can be found. We have rephrased this sentence for more clarity.

      Page 9 Line 206 "L1 AZs did not show a large variability in size as indicated by the low SD, CV, and variance (Table 1)" Is this inter-patient variance of mean values? As in Supplementary Figure 1, both the SBs volume and PreAZ area show large variability in a given patient sample. Only the inter-patient variability of mean values seems low. Please state it clearly throughout the MS for other datasets as well.

      For clarity concerning the variability between patients and structural parameters we have generated box plots (Suppl. Figures 1 and 2).

      Page 9 Line 208 data is on Figure 5A and not 8A.

      We thank the reviewer and have corrected the citation of the Figure.

      Page 12 Line 295 how can the number of docked vesicles for L1b be larger than the one measured by the perimeter p10 nm? This latter should contain the docked and PreAZ membrane proximal pool as well. This difference is even larger if we assume, that at EM tomography only partial AZs were analysed in a 200 nm thick section, not the entire AZ as for the perimeter measurement. Can the authors provide density estimates by dividing the docked / p10 nm vesicle numbers with the AZ area and comparing them?

      This is a result of comparing both methods. To the second concern: As stated in the text, only synaptic boutons where the active zone could be followed from its beginning to its end and where the synaptic cleft was cut perpendicularly were included in the TEM tomography sample, as we also did in our 3D-volume reconstructions.

      Methods:

      Page 25 Line 624 While the PSD area can be equivocally measured, due to the dense appearance of the PSD on the EM images, the PreAZ is more difficult to outline due to lack of evident anatomical markers except the synaptic cleft (the dense material is much thinner). That is why in many publications the PreAZ area is considered to be identical to the PSD area. What are the anatomical criteria used here for the PreAZ? Why do the authors correct the PSD area, which is easy to measure with the PreAZ area that is much less certain to outline?

      As stated in Material and Methods, the pre- and postsynaptic densities were not defined by placing a closed contour around each density, because one cannot be certain about the dense accumulation of particles defining both areas: the impregnation (staining) and contrast of both structures depend critically on the uranyl and lead staining, and different staining results could lead to misinterpretation. That is why we drew a contour line from the beginning to the end of the presynaptic density and extrapolated it to the postsynaptic density (for details see Material and Methods). In our samples, both the pre- and postsynaptic densities were always clearly visible in the boutons that were further analyzed.

      Page 26 Line 640 vesicle density measurement: All the synaptic vesicles that are in the 50 nm thick section in their entirety are missed, and there are methods based on EM tomography to correct these estimations. One can not assume, that the error caused by "double counts" of vesicles cancels for the lost ones. There are stereological methods to estimate both types of error please include them and correct the values.

      We would like to point out that the whole body of our work on the structural analysis of vesicle pools is based on image data from transmission electron microscopy (TEM), which generates a projection of the entire volume of the ultra-thin section, and NOT from scanning electron microscopy (SEM), where only a small volume close to the surface of the section would be captured. Operating in TEM mode ensures that no vesicle is missed merely because it is embedded in its entirety in the section, as postulated by the reviewer. Hence, EM tomography, which is basically TEM operated at different incident angles in relation to the specimen or section, does not provide any advantage in detecting these vesicles. It does, however, help to better position a 3D object within the section volume itself and therefore makes it possible to detect objects that overlap from one viewing angle by using another angle. As the average vesicle diameter is similar to the section thickness, however, the probability of a complete overlap is almost zero. And as we only count clear ring-like structures, a stereological correction factor calculated according to Abercrombie (1946) would underestimate real counts (see also Saetzler et al. 2002). If there is, however, relevant literature on "methods based on EM tomography" and "stereological methods to estimate both types of error" (over- and underestimates) that we are missing, we would appreciate the reviewer providing us with the corresponding references so that we can include such calculations in our paper.
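
      For orientation, the correction of Abercrombie (1946) mentioned above is commonly written as shown below. The symbols are introduced here for illustration only and do not appear in the manuscript: $n$ is the raw profile count, $T$ the section thickness, $D$ the mean object diameter, and $N$ the corrected count.

```latex
N = n \cdot \frac{T}{T + D}
```

      Assuming, purely as an example, $T = 50$ nm and a vesicle diameter of similar size ($D \approx T$), the factor $T/(T+D)$ is about $1/2$, so a raw count would be roughly halved. This is the size of the reduction behind the argument that applying the factor to counts of complete ring-like profiles would underestimate the real numbers.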

      Page 27 Line 664 and 665 "sections" are still tissue blocks, as sectioning comes after if the process is correctly written. Please correct.

      We have corrected this according to the reviewer’s comment.

      Page 43 Figure 4 D Data for L1b is missing, only the correlation line is visible.

      Corrected in a new Figure.

      Page 44 Figure 5 C arrowheads are in the correct places? Some of them do not seem to point to the edge of the synapse.

      We carefully checked the Figure and adjusted the arrowheads.

      Figure 5 E lower arrowhead labels something, that is difficult to identify but does not seem to be a vesicle.

      We agree with the reviewer on this point and changed the figure accordingly.

      Figure 5 F, the upper vesicle is at least 10 nm apart from the PreAZ membrane. Did the authors consider it as docked (indicated with arrowhead, according to the legend it labels docked vesicles)?

      We agree with the reviewer on this point and changed the figure accordingly.

      Page 45 Figure 6 B one of the 2 synaptic boutons (sb), sb2 has a tangential active zone that precludes the identification of the pre- and post-synaptic membranes, still 2 "docked vesicles" are labeled. How were they classified as docked? Please remove these tangential synapses from the dataset, as membranes can not be identified.

      The reviewer is right that the active zone is tangentially cut, however, the two vesicles are associated with the AZ. In addition, we did not use this AZ for vesicle data analysis.

      Page 46 Line 1124 interneuron axon labelled in green not brown.

      Corrected as suggested by the reviewer.

      Line 1129 SStC is missing.

      Changed according to the reviewer’s comment.

      Page 48 Table 2 Number of docked vesicles Median values are rounded to integer values? If yes why?

      The statistic package used rounded to the given values.

      Page 51 Supplementary Table 1 Hu_04 Histopathology, what does TLE stands for?

      TLE: temporal lobe epilepsy. We included the abbreviation in the legend of Supplementary Table 1, which is now Table 2.

      Reviewer #3 (Recommendations for the authors):

      (1) Reanalysis of astrocytic coverage based on the % of synapses that form tripartite synapses.

      We have reanalyzed the data concerning this point (new Figure 6D).

      (2) Segregation of excitatory and inhibitory synapses.

      We have now included a paragraph in our results section to distinguish between excitatory and inhibitory synapses.

      (3) Better explanation of the limits of the study to assess functional parameters.

      We disagree with the reviewer on this point and have not included an explanation concerning the limits of this study.

    1. Another explanation for this disconnection may be that administrators are more upbeat about AI’s influence on teaching than professors are.

      I haven't had much interaction with administrators as far as talking about the use of AI, but I think that it definitely shows in the fact that it is so available to students, even on UVU sites. Most teachers have a positive attitude about it, but the use of it has obviously been discussed a lot.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigate the effects of aging on auditory system performance in understanding temporal fine structure (TFS), using both behavioral assessments and physiological recordings from the auditory periphery, specifically at the level of the auditory nerve. This dual approach aims to enhance understanding of the mechanisms underlying observed behavioral outcomes. The results indicate that aged animals exhibit deficits in behavioral tasks for distinguishing between harmonic and inharmonic sounds, which is a standard test for TFS coding. However, neural responses at the auditory nerve level do not show significant differences when compared to those in young, normal-hearing animals. The authors suggest that these behavioral deficits in aged animals are likely attributable to dysfunctions in the central auditory system, potentially as a consequence of aging. To further investigate this hypothesis, the study includes an animal group with selective synaptic loss between inner hair cells and auditory nerve fibers, a condition known as cochlear synaptopathy (CS). CS is a pathology associated with aging and is thought to be an early indicator of hearing impairment. Interestingly, animals with selective CS showed physiological and behavioral TFS coding similar to that of the young normal-hearing group, contrasting with the aged group's deficits. Despite histological evidence of significant synaptic loss in the CS group, the study concludes that CS does not appear to affect TFS coding, either behaviorally or physiologically.

      We agree with the reviewer’s summary.

      Strengths:

      This study addresses a critical health concern, enhancing our understanding of mechanisms underlying age-related difficulties in speech intelligibility, even when audiometric thresholds are within normal limits. A major strength of this work is the comprehensive approach, integrating behavioral assessments, auditory nerve (AN) physiology, and histology within the same animal subjects. This approach enhances understanding of the mechanisms underlying the behavioral outcomes and provides confidence in the actual occurrence of synapse loss and its effects. The study carefully manages controlled conditions by including five distinct groups: young normal-hearing animals, aged animals, animals with CS induced through low and high doses, and a sham surgery group. This careful setup strengthens the study's reliability and allows for meaningful comparisons across conditions. Overall, the manuscript is well-structured, with clear and accessible writing that facilitates comprehension of complex concepts.

      Weaknesses:

      The stimulus and task employed in this study are very helpful for behavioral research, and using the same stimulus setup for physiology is advantageous for mechanistic comparisons. However, I have some concerns about the limitations in auditory nerve (AN) physiology. Due to practical constraints, it is not feasible to record from a large enough population of fibers that covers a full range of best frequencies (BFs) and spontaneous rates (SRs) within each animal. This raises questions about how representative the physiological data are for understanding the mechanism in behavioral data. I am curious about the authors' interpretation of how this stimulus setup might influence results compared to methods used by Kale and Heinz (2010), who adjusted harmonic frequencies based on the characteristic frequency (CF) of recorded units. In contrast, the harmonic frequencies in this study are fixed across all CFs, meaning that many AN fibers may not be tuned closely to the stimulus frequencies.

      We chose the stimuli for the AN recordings to be identical to the stimuli used in the behavioral evaluation of the perceptual sensitivity. Only with this approach can we directly compare the response of the population of AN fibres with perception measured in behaviour. We will address this more clearly in the revision.

      If units are not responsive to the stimulus, further clarification on detecting mistuning and phase locking to TFS effects within this setup would be valuable.

      It is unclear to us what the reviewer is alluding to. We ask the reviewer to rephrase the question.

      Given the limited number of units per condition (sometimes as few as three for certain conditions), I wonder if CF-dependent variability might impact the results of the AN data in this study; discussing this factor could help with better understanding the results. While the use of the same stimuli for both behavioral and physiological recordings is understandable, a discussion on how this choice affects interpretation would be beneficial. In addition, a 60 dB stimulus could saturate high spontaneous rate (HSR) AN fibers, influencing neural coding and phase-locking to TFS. Separating SR groups could potentially help address these issues and improve interpretive clarity.

      In the discussion of a revised version of the manuscript, we will point out the pros and cons of using fixed-level stimuli that were not adjusted in frequency to the BF.

      A deeper discussion on the role of fiber spontaneous rate could also enhance the study. How might considering SR groups affect AN results related to TFS coding? While some statistical measures are included in the supplement, a more detailed discussion in the main text could help in interpretation.

      We do not think that it will be necessary to conduct any statistical analysis in addition to that already reported in the supplement. We will consider moving some supplementary information back into the main manuscript when revising.

      Although Figure S2 indicates no change in median SR, the high-dose treatment group lacks LSR fibers, suggesting a different distribution based on SR for different animal groups, as seen in similar studies on other species. A histogram of these results would be informative, as LSR fiber loss with CS, whether induced by ouabain in gerbils or noise in other animals, is well documented (e.g., Furman et al., 2013).

      We will add information on the distribution when revising.

      Although ouabain effects on gerbils have been explored in previous studies, since these data already seem to have been recorded for the animals in this study, a brief description of changes in auditory brainstem response (ABR) thresholds, wave 1 amplitudes, and tuning curves for animals with cochlear synaptopathy (CS) in this study would be beneficial. This would confirm that ouabain selectively affects synapses without impacting outer hair cells (OHCs). For aged animals, since ABR measurements were taken, comparing hearing differences between normal and aged groups could provide insights into the pathologies besides CS in aged animals. Additionally, examining subject variability in treatment effects on hearing and how this correlates with behavior and physiology would yield valuable insights. If space is limited, a brief clarification or inclusion in the supplementary material may be sufficient.

      We do indeed have data on ABR amplitudes and the wave 1 growth functions but only in response to broadband clicks. For more frequency-specific information, mass-potential recordings are available, obtained before and after ouabain treatment. Regarding neural tuning, we did not obtain full frequency-threshold curves but do have bandwidths for response curves recorded close to threshold. We are in the process of analyzing all these data further and will consider how to best incorporate them into the manuscript, to address the reviewer’s concerns.

      Another suggestion is to discuss the potential role of MOC efferent system and effect of anesthesia in reducing efferent effects in AN recordings. This is particularly relevant for aged animals, as CS might affect LSR fibers, potentially disrupting the medial olivocochlear (MOC) efferent pathway. Anesthesia could lessen MOC activity in both young and aged animals, potentially masking efferent effects that might be present in behavioral tasks. Young gerbils with functional efferent systems might perform better behaviorally, while aged gerbils with impaired MOC function due to CS might lack this advantage. A brief discussion on this aspect could potentially enhance mechanistic insights.

      Our provisional response below will be integrated in similar form into the Discussion.

      Olivocochlear efferent activity is a potential modulator of OHC gain (by medial olivocochlear neurons, MOC) and afferent activity (by lateral olivocochlear neurons, LOC). Beyond this general observation it is, however, difficult to speculate about its specific role in the TFS1 test, as almost nothing is known about efferent activity under naturalistic conditions in a behaving animal (reviewed by Lauer et al., 2022). We note, however, that efferent activity is believed to be reduced under general anesthesia (reviewed by Guinan, 2011, DOI 10.1007/978-1-4419-7070-1_3) and possibly abnormal in other ways, considering the potential top-down inputs to the efferent neurons from extensive brain networks (reviewed by Schofield, 2011, DOI 10.1007/978-1-4419-7070-1_9; Romero and Trussell, 2022, DOI: 10.1016/j.heares.2022.108516). Thus, it is reasonable to assume a reduced efferent influence in our auditory-nerve data, compared to the behavioral test situation. In contrast, we assume more comparable efferent influences in young-adult and old gerbils. It was recently shown that, despite age-related losses in both MOC and LOC cochlear innervation, this basically reflected the loss of efferent target structures (OHC and type-I afferents), with the surviving cochlear circuitry remaining largely normal (Steenken et al., 2024, DOI: 10.3389/fnsyn.2024.1422330). The main difference was an increased proportion of OHC without any efferent innervation, predominantly in low-frequency cochlear regions (Steenken et al., 2024). Such OHC are thus not under efferent control, and they are more numerous (about 10 – 30%) in old gerbils.

      Lastly, although synapse counts did not differ between the low-dose treatment and NH I sham groups, separating these groups rather than combining them with the sham might reveal differences in behavior or AN results, particularly regarding the significance of differences between aged/treatment groups and the young normal-hearing group.

      To maximize statistical power, we combined those groups in the statistical analysis. These two groups did not differ in synapse number and had quite similar ABR wave 1 growth functions.

      Reviewer #2 (Public review):

      Summary:

      Using a gerbil model, the authors tested the hypothesis that loss of synapses between sensory hair cells and auditory nerve fibers (which may occur due to noise exposure or aging) affects behavioral discrimination of the rapid temporal fluctuations of sounds. In contrast to previous suggestions in the literature, their results do not support this hypothesis; young animals treated with a compound that reduces the number of synapses did not show impaired discrimination compared to controls. Additionally, their results from older animals showing impaired discrimination suggest that age-related changes aside from synaptopathy are responsible for the age-related decline in discrimination.

      We agree with the reviewer’s summary.

      Strengths:

      (1) The rationale and hypothesis are well-motivated and clearly presented.

      (2) The study was well conducted with strong methodology for the most part, and good experimental control. The combination of physiological and behavioral techniques is powerful and informative. Reducing synapse counts fairly directly using ouabain is a cleaner design than using noise exposure or age (as in other studies), since these latter modifiers have additional effects on auditory function.

      (3) The study may have a considerable impact on the field. The findings could have important implications for our understanding of cochlear synaptopathy, one of the most highly researched and potentially impactful developments in hearing science in the past fifteen years.

      Weaknesses:

      (1) My main concern is that the stimuli may not have been appropriate for assessing neural temporal coding behaviorally. Human studies using the same task employed a filter center frequency that was (at least) 11 times the fundamental frequency (Marmel et al., 2015; Moore and Sek, 2009). Moore and Sek wrote: "the default (recommended) value of the centre frequency is 11F0." Here, the center frequency was only 4 or 8 times the fundamental frequency (4F0 or 8F0). Hence, relative to harmonic frequency, the harmonic spacing was considerably greater in the present study. By my calculations, the masking noise used in the present study was also considerably lower in level relative to the harmonic complex than that used in the human studies. These factors may have allowed the animals to perform the task using cues based on the pattern of activity across the neural array (excitation pattern cues), rather than cues related to temporal neural coding. The authors show that mean neural driven rate did not change with frequency shift, but I don't understand the relevance of this. It is the change in response of individual fibers with characteristic frequencies near the lowest audible harmonic that is important here.

      The auditory filter bandwidth of the gerbil is about double that of human subjects. Because of this, the masking noise has a larger overall level within the auditory filter than in the human studies. This precludes the gerbils from using excitation pattern cues, especially in the condition with a center frequency of 1600 Hz and a fundamental of 200 Hz and in the condition with a center frequency of 3200 Hz and a fundamental of 400 Hz.

      The case against excitation pattern cues needs to be better made in the Discussion. It could be that gerbil frequency selectivity is broad enough for this not to be an issue, but more detail needs to be provided to make this argument. The authors should consider what is the lowest audible harmonic in each case for their stimuli, given the level of each harmonic and the level of the pink noise. Even for the 8F0 center frequency, the lowest audible harmonic may be as low as the 4th (possibly even the 3rd). In human, harmonics are thought to be resolvable by the cochlea up to at least the 8th.

      Because of the gerbil’s broader auditory filters, harmonics are not resolved, with the exception of the condition with a center frequency of 1600 Hz and a fundamental of 400 Hz. We will expand the topic of potential excitation pattern cues in the discussion of the revised version and add results on modeled excitation patterns to the supplement.
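
      The relation between the stimulus conditions named in these responses and the reviewer's "4F0 or 8F0" description can be checked with simple arithmetic. The following Python sketch is illustrative only: the function name is hypothetical, and the (fundamental, center frequency) pairs are just the three conditions mentioned in the text above.

```python
# (fundamental F0, filter center frequency) in Hz, as named in the responses above.
conditions = [(200, 1600), (400, 1600), (400, 3200)]


def center_harmonic_rank(f0_hz, center_hz):
    """Return the filter center frequency expressed as a multiple of F0."""
    return center_hz / f0_hz


# Map each condition to its center frequency in units of F0.
ranks = {cond: center_harmonic_rank(*cond) for cond in conditions}

for (f0, fc), rank in ranks.items():
    print(f"F0 = {f0} Hz, center = {fc} Hz -> {rank:g} x F0")
```

      Under these assumptions, two conditions sit at 8F0 and one at 4F0, compared with the 11F0 default that the reviewer quotes from Moore and Sek.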

      (2) The synapse reductions in the high ouabain and old groups were relatively small (mean of 19 synapses per hair cell compared to 23 in the young untreated group). In contrast, in some mouse models of the effects of noise exposure or age, a 50% reduction in synapses is observed, and in the human temporal bone study of Wu et al. (2021, https://doi.org/10.1523/JNEUROSCI.3238-20.2021) the age-related reduction in auditory nerve fibres was ~50% or greater for the highest age group across cochlear location. It could be simply that the synapse loss in the present study was too small to produce significant behavioral effects. Hence, although the authors provide evidence that in the gerbil model the age-related behavioral effects are not due to synaptopathy, this may not translate to other species (including human). This should be discussed in the manuscript.

      Our provisional response below will be integrated in similar form into the Discussion.

      The observed extent of age-related or noise-induced loss of type-I afferent synapses on IHC varies widely between species and studies. For example, in ageing CBA/CaJ mice, mean losses of between 20 and 50% of afferent synapses (depending on cochlear location and precise age) were reported (Sergeyenko et al., 2013, DOI: 10.1523/JNEUROSCI.1783-13.2013; Kobrina et al., 2020, DOI: 10.1016/j.neurobiolaging.2020.08.012). Humans showed more pronounced losses of peripheral axons, of 40–100%, again depending on cochlear location, precise age, and noise history (Wu et al., 2019, DOI: 10.1016/j.neuroscience.2018.07.053; 2021, DOI: 10.1523/JNEUROSCI.3238-20.2021). The age-related and induced synapse losses in our gerbils were in a more moderate range, around 20% (Steenken et al., 2021, DOI: 10.1016/j.neurobiolaging.2021.08.019; this study). Thus, it is possible that a more severe, induced synaptopathy would have resulted in behavioral deficits in young-adult gerbils. However, in the absence of additional noise or pharmacologically induced damage, our study provides strong evidence for other factors causing temporal processing problems with advancing age. Our 3-year-old gerbils are approximately comparable to a 60-year-old human (Castano-Gonzalez et al., 2024, DOI: 10.1016/j.heares.2024.108989) with beginning but not yet clinically relevant hearing loss (Hamann et al., 2002, DOI: 10.1016/S0378-5955(02)00454-9).
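
      As a quick arithmetic check on the "around 20%" figure, the mean counts quoted by the reviewer (19 synapses per hair cell versus 23 in the young untreated group) can be converted into a relative loss. The Python sketch below is illustrative only: the function name is hypothetical, and the inputs are the rounded group means quoted in the review, not the underlying data.

```python
def percent_loss(reference, observed):
    """Relative reduction of `observed` with respect to `reference`, in percent."""
    return 100.0 * (reference - observed) / reference


# Rounded group means quoted by Reviewer #2 (synapses per inner hair cell).
young_mean = 23
treated_or_old_mean = 19

loss = percent_loss(young_mean, treated_or_old_mean)
print(f"Relative synapse loss: {loss:.1f}%")
```

      With these rounded means the loss comes out slightly below 20%, well short of the 50% reductions that the reviewer cites for some mouse models.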

      It would be informative to provide synapse counts separately for the animals who were tested behaviorally, to confirm that the pattern of loss across the group was the same as for the larger sample.

      Yes, the pattern was the same for the subgroup of behaviorally tested animals. We will add this information to the revised version of the manuscript.

      (3) The study was not pre-registered, and there was no a priori power calculation, so there is less confidence in replicability than could have been the case. Only three old animals were used in the behavioral study, which raises concerns about the reliability of comparisons involving this group.

      The results for the three old subjects differed significantly from those of young subjects and young ouabain-treated subjects. This indicates sufficient statistical power, since otherwise no significant differences would have been observed.

      Reviewer #3 (Public review):

      This study is a part of the ongoing series of rigorous work from this group exploring neural coding deficits in the auditory nerve, and dissociating the effects of cochlear synaptopathy from other age-related deficits. They have previously shown no evidence of phase-locking deficits in the remaining auditory nerve fibers in quiet-aged gerbils. Here, they study the effects of aging on the perception and neural coding of temporal fine structure cues in the same Mongolian gerbil model.

      They measure TFS coding in the auditory nerve using the TFS1 task which uses a combination of harmonic and tone-shifted inharmonic tones which differ primarily in their TFS cues (and not the envelope). They then follow this up with a behavioral paradigm using the TFS1 task in these gerbils. They test young normal hearing gerbils, aged gerbils, and young gerbils with cochlear synaptopathy induced using the neurotoxin ouabain to mimic synapse losses seen with age. In the behavioral paradigm, they find that aging is associated with decreased performance compared to the young gerbils, whereas young gerbils with similar levels of synapse loss do not show these deficits. When looking at the auditory nerve responses, they find no differences in neural coding of TFS cues across any of the groups.

      However, aged gerbils show an increase in the representation of periodicity envelope cues (around f0) compared to young gerbils or those with induced synapse loss. The authors hence conclude that synapse loss by itself doesn't seem to be important for distinguishing TFS cues, and rather the behavioral deficits with age are likely having to do with the misrepresented envelope cues instead.

      We agree with the reviewer’s summary.

      The manuscript is well written, and the data presented are robust. Some of the points below will need to be considered while interpreting the results of the study, in its current form. These considerations are addressable if deemed necessary, with some additional analysis in future versions of the manuscript.

      Spontaneous rates - Figure S2 shows no differences in median spontaneous rates across groups. But taking the median glosses over some of the nuances there. Ouabain (in the Bourien study) famously affects low spont rates first, and at a higher degree than median or high spont rates. It seems to be the case (qualitatively) in Figure S2 as well, with almost no units in the low spont region in the ouabain group, compared to the other groups. Looking at distributions within each spont rate category and comparing differences across the groups might reveal some of the underlying causes for these changes. Given that overall, the study reports that low-SR fibers had a higher ENV/TFS log-z-ratio, the distribution of these fibers across groups may reveal specific effects of TFS coding by group.

      As the reviewer points out, our sample from the group treated with a high concentration of ouabain showed very few low-spontaneous-rate auditory-nerve fibers, as expected from previous work. However, this was also true, e.g., for our sample from sham-operated animals, and may thus well reflect a sampling bias. We are therefore reluctant to attach much significance to these data distributions. We will consider moving some supplementary information back into the main manuscript when revising.

      Threshold shifts - It is unclear from the current version if the older gerbils have changes in hearing thresholds, and whether those changes may be affecting behavioral thresholds. The behavioral stimuli appear to have been presented at a fixed sound level for both young and aged gerbils, similar to the single unit recordings. Hence, age-related differences in behavior may have been due to changes in relative sensation level. Approaches such as using hearing thresholds as covariates in the analysis will help explore if older gerbils still show behavioral deficits.

      Unfortunately, we did not obtain behavioral thresholds that could be used here. The ABR thresholds, although not directly comparable to behavioral thresholds, suggest that our old animals had at most a moderate threshold increase in quiet. Furthermore, we want to point out that the TFS 1 stimuli had an overall level of 68 dB SPL, and the pink noise masker would have increased the threshold more than expected from the moderate, age-related hearing loss in quiet. Thus, the masked thresholds for all gerbil groups are likely similar and should have no effect on the behavioral results.

      Task learning in aged gerbils - It is unclear if the aged gerbils really learn the task well in two of the three TFS1 test conditions. The d' of 1, which is usually used as the criterion for learning, was not reached by the aged gerbils even in the easiest case in all but one condition (Fig. 5H), and in that condition there do not seem to be any age-related deficits in behavioral performance (Fig. 6B). Hence, dissociating the inability to learn the task from the inability to perceive TFS1 cues in those animals becomes challenging.

      Even in the group of gerbils with the lowest sensitivity, the animals achieved an average d’ above 1 for the 400/1600 condition. Furthermore, stimuli were well above threshold and audible, even when no discrimination could be observed. Finally, as explained in the methods, different stimulus conditions were interleaved in each session, providing stimuli that were easy to discriminate together with those that were difficult to discriminate. This approach ensures that the gerbils were under stimulus control, meaning that they were properly trained to perform the task. Thus, an inability to discriminate does not indicate a lack of proper training.
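
      For readers less familiar with the sensitivity index, d' is conventionally computed as the difference between the z-transformed hit rate and false-alarm rate. The Python sketch below assumes this standard definition; the exact procedure used in the manuscript is not described here, and the rates in the example are hypothetical.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Standard sensitivity index: z(hit rate) - z(false-alarm rate).

    Both rates must lie strictly between 0 and 1, because the inverse
    normal CDF is undefined at exactly 0 or 1.
    """
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical rates, chosen only to illustrate a value near the d' = 1 criterion.
example = d_prime(0.7, 0.3)
print(f"d' = {example:.2f}")
```

      Equal hit and false-alarm rates give d' = 0 (no discrimination), which is why a criterion such as d' = 1 is used to mark reliable discrimination.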

      Increased representation of periodicity envelope in the AN - the mechanisms for increased representation of periodicity envelope cues is unclear. The authors point to some potential central mechanisms but given that these are recordings from the auditory nerve what central mechanisms these may be is unclear. If the authors are suggesting some form of efferent modulation only at the f0 frequency, no evidence for this is presented. It appears more likely that the enhancement may be due to outer hair cell dysfunction (widened tuning, distorted tonotopy). Given this increased envelope coding, the potential change in sensation level for the behavior (from the comment above), and no change in neural coding of TFS cues across any of the groups, a simpler interpretation may be -TFS coding is not affected in remaining auditory nerve fibers after age-related or ouabain induced synapse loss, but behavioral performance is affected by altered outer hair cell dysfunction with age.

      A similar point is made by Reviewer #1. As indicated above, we do have limited data on neural bandwidths and will explore if these are sufficient to address the reviewers’ questions about potential, age-related changes in neural tuning in our sample. Previous work found no substantial OHC losses (Tarnowski et al., 1991, DOI: 10.1016/0378-5955(91)90142-V; Adams and Schulte, 1997, DOI: 10.1016/S0378-5955(96)00184-0; Steenken et al., 2024, DOI: 10.3389/fnsyn.2024.1422330) nor any deterioration in neural frequency tuning (Heeringa et al., 2020, DOI: 10.1523/JNEUROSCI.2784-18.2019), in quiet-aged gerbils of similar age as the ones used here.

      Emerging evidence seems to suggest that cochlear synaptopathy and/or TFS encoding abilities might be reflected in listening effort rather than behavioral performance. Measuring some proxy of listening effort in these gerbils (like reaction time) to see if that has changed with synapse loss, especially in the young animals with induced synaptopathy, would make an interesting addition to explore perceptual deficits of TFS coding with synapse loss.

      This is an interesting suggestion that we will explore in the revision of the manuscript. Reaction times were recorded for the responses and can be used as a proxy for listening effort.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1

      Evidence, reproducibility and clarity

      Summary: The manuscript by Yang et al. describes a new CME accessory protein. CCDC32 has been previously suggested to interact with AP2 and in the present work the authors confirm this interaction and show that it is a bona fide CME regulator. In agreement with its interaction with AP2, CCDC32 recruitment to CCPs mirrors the accumulation of clathrin. Knockdown of CCDC32 reduces the amount of productive CCPs, suggestive of a stabilisation role in early clathrin assemblies. Immunoprecipitation experiments mapped the interaction of CCDC32 to the α-appendage of the AP2 complex α-subunit. Finally, the authors show that the CCDC32 nonsense mutations found in patients with cardio-facial-neuro-developmental syndrome disrupt the interaction of this protein with the AP2 complex. The manuscript is well written and the conclusions regarding the role of CCDC32 in CME are supported by good quality data. As detailed below, a few improvements/clarifications are needed to reinforce some of the conclusions, especially the ones regarding CFNDS.

      Response: We thank the referee for their positive comments. In light of a recently published paper describing CCDC32 as a co-chaperone required for AP2 assembly (Wan et al., PNAS, 2024, see reviewer 2), we have added several additional experiments to address all concerns and consequently gained further insight into CCDC32-AP2 interactions and the important dual role of CCDC32 in regulating CME.

      Major comments:

      1) Why could the protein only be visualized at CCPs after knockdown of the endogenous protein? This is highly unusual, especially in stable cell lines. Could it be that the tag interferes with the function of the expressed protein, rendering it incapable of outcompeting the endogenous protein? Does this point to a regulated recruitment?

      Response: The reviewer is correct that this would be unusual; however, it is not the case. We misspoke in the text (although the figure legend was correct): these experiments were performed without siRNA knockdown, and we can indeed detect eGFP-CCDC32 being recruited to CCPs in the presence of endogenous protein. Nonetheless, we repeated the experiment to be certain.

      2) The disease mutation used in the paper does not correspond to the truncation found in patients. The authors use a 1-54 truncation, but the patients described in Harel et al. have frameshifts at positions 19 (Thr19Tyrfs*12) and 64 (Glu64Glyfs*12), while the patient described in Abdalla et al. has a deletion of two introns, leading to a frameshift around amino acid 90. Moreover, to precisely test the function of these disease mutations, one would need to add the extra amino acids generated by the frameshift. For example, as denoted in the mutation description in Harel et al., the frameshift at position 19 changes Threonine 19 to a Tyrosine and adds a run of 12 extra amino acids (Thr19Tyrfs*12).

      Response: The labels of the disease mutants p.(Thr19Tyrfs∗12) and p.(Glu64Glyfs∗12) are based on a 194 aa polypeptide version of CCDC32 initiated at a nonconventional start site that contains a 9 aa peptide (VRGSCLRFQ) upstream of the N-terminus we show. Thus, we are indeed using the appropriate mutation site (see: https://www.uniprot.org/uniprotkb/Q9BV29/entry). The reviewer is correct that we have not included the extra 12 aa in our construct; however, as these residues are not present in the other CFNDS mutants, we think it unlikely that they contribute to the disease phenotype. Rather, as neither of the clinically observed mutations contains the 78-98 aa sequence required for AP2 binding and CME function, we are confident that this defect contributed to the disease. Thus, we are including the data on the CCDC32(1-54) mutant, as we believe these results provide a valuable physiological context to our studies.

      3) The frameshift caused by the CFNDS mutations (especially the one studied) will likely lead to nonsense mediated RNA decay (NMD). The frameshift is well within the rules where NMD generally kicks in. Therefore, I am unsure about the functional insights of expressing a disease-related protein which is likely not present in patients.

      Response: We thank the reviewer for bringing up this concern. However, as shown in new Figure S1, the mutant protein is expressed at levels comparable to the WT, suggesting that NMD is not occurring.

      4) Coiled coils generally form stable dimers. The typically hydrophobic core of these structures is not suitable for transient interactions. This complicates the interpretation of the results regarding the role of this region as the place where the interaction to AP2 occurs. If the coiled coil holds a stable CCDC32 dimer, disrupting this dimer could reduce the affinity to AP2 (by reduced avidity) to the actual binding site. A construct with an orthogonal dimeriser or a pulldown of the delta78-98 protein with the GST AP2a-AD could be a good way to sort out this issue.

      Response: We were unable to model a stable dimer (or other oligomer) of this protein with high confidence using AlphaFold 3.0. Moreover, we were unable to detect endogenous CCDC32 co-immunoprecipitating with eGFP-CCDC32 (Fig. S6C). Thus, we believe that the moniker, based solely on the alpha-helical content of the protein, is a misnomer. We have explained this in the main text.

      Minor comments:

      1) The authors interchangeably use the term "flat CCPs" and "flat clathrin lattices". While these are indeed related, flat clathrin lattices have been also used to refer to "clathrin plaques". To avoid confusion, I suggest sticking to the term "flat CCPs" to refer to the CCPs which are in their early stages of maturation.

      Response: Agreed. Thank you for the suggestion. We have renamed these structures flat clathrin assemblies, as they do not acquire the curvature needed to classify them as pits, and do not grow to the size that would classify them as plaques.

      Significance

      General assessment: CME drives the internalisation of hundreds of receptors and surface proteins in practically all tissues, making it an essential process for various physiological processes. This versatility comes at the cost of a large number of molecular players and regulators. To understand this complexity, unravelling all the components of this process is vital. The manuscript by Yang et al. gives an important contribution to this effort as it describes a new CME regulator, CCDC32, which acts directly at the main CME adaptor AP2. The link to disease is interesting, but the authors need to refine their experiments. The requirement for endogenous knockdown for recruitment of the tagged CCDC32 is unusual and requires further exploration.

      Advance: The increased frequency of abortive events presented by CCDC32 knockdown cells is very interesting, as it hints to an active mechanism that regulates the stabilisation and growth of clathrin coated pits. The exact way clathrin coated pits are stabilised is still an open question in the field.

      Audience: This is a basic research manuscript. However, given the essential role of CME in physiology and the growing number of CME players involved in disease, this manuscript can reach broader audiences.

      Response: We thank the referee for recognizing the 'interesting' advances our studies have made and for considering these studies as 'an important contribution' to 'an essential process for various physiological processes' and able 'to reach broader audiences'. We have addressed and reconciled the reviewer's concerns in our revised manuscript.

      Field of expertise of the reviewer: Clathrin mediated endocytosis, cell biology, microscopy, biochemistry.


      Reviewer #2

      Evidence, reproducibility and clarity

      In this manuscript, the authors demonstrate that CCDC32 regulates clathrin-mediated endocytosis (CME). Some of the findings are consistent with a recent report by Wan et al. (2024 PNAS), such as the observation that CCDC32 depletion reduces transferrin uptake and diminishes the formation of clathrin-coated pits. The primary function of CCDC32 is to regulate AP2 assembly, and its depletion leads to AP2 degradation. However, this study did not examine AP2 expression levels. CCDC32 may bind to the appendage domain of AP2 alpha, but it also binds to the core domain of AP2 alpha. Overall, while this work presents some interesting ideas, it remains unclear whether CCDC32 regulates AP2 beyond the assembly step.

      Response: We thank the reviewer for drawing our attention to the Wan et al. paper, that appeared while this work was under review. However, our in vivo data are not fully consistent with the report from Wan et al. The discrepancies reveal a dual function of CCDC32 in CME that was masked by complete knockout vs siRNA knockdown of the protein, and also likely affected by the position of the GFP-tag (C- vs N-terminal) on this small protein. Thus:

      • Contrary to Wan et al., we do not detect any loss of AP2 expression (see new Figure S3A-B) upon siRNA knockdown. Most likely the ~40% residual CCDC32 present after siRNA knockdown is sufficient to fulfill its catalytic chaperone function but not its structural role in regulating CME beyond the AP2 assembly step.
      • Contrary to Wan et al., we have shown that CCDC32 indeed interacts with the intact AP2 complex (Figure S3C and 6B,C): all 4 subunits of the AP2 complex co-IP with full-length eGFP-CCDC32. Interestingly, whereas full-length CCDC32 pulls down the intact AP2 complex, the ∆78-98 mutant retains its ability to pull down the β2-µ2 hemicomplex but its interactions with α:σ2 are severely reduced. While this result is consistent with the report of Wan et al. that CCDC32 binds to the α:σ2 hemicomplex, it also suggests that the interactions between CCDC32 and AP2 are more complex and will require further studies.
      • Contrary to Wan et al., we provide strong evidence that CCDC32 is recruited to CCPs. Interestingly, modeling with AlphaFold 3.0 identifies a highly probable interaction between alpha helices encoded by residues 66-91 on CCDC32 and residues 418-438 on α. The latter are masked by µ2-C in the closed conformation of the AP2 core, but exposed in the open conformation triggered by cargo binding, suggesting that CCDC32 might only bind to membrane-bound AP2. Thus, our findings are indeed novel and indicate striking multifunctional roles for CCDC32 in CME, making the protein well worth further study.

      • Besides its role in AP2 assembly, CCDC32 may potentially have another function on the membrane. However, there is no direct evidence showing that CCDC32 associates with the plasma membrane.

      Response: We disagree. Our data clearly show that CCDC32 is recruited to CCPs (Fig. 1B) and that CCPs that fail to recruit CCDC32 are short-lived and likely abortive (Fig. 1C). Wan et al. did not observe any colocalization of C-terminally tagged CCDC32 to CCPs, whereas we detect recruitment of our N-terminally tagged construct, which we also show is functional (Fig. 6F). Further, we have demonstrated the importance of the C-terminal region of CCDC32 in membrane association (see new Fig. S7). Thus, we speculate that a C-terminally tagged CCDC32 might not be fully functional. Indeed, SIM images of the C-terminally tagged CCDC32 in Wan et al. show large (~100 nm) structures in the cytosol, which may reflect aggregation.

      CCDC32 binds to multiple regions on AP2, including the core domain. It is important to distinguish the functional roles of these different binding sites.

      Response: We have localized the AP2-ear binding region to residues 78-99 and shown these to be critical for the functions we have identified. As described above, we now include data that are complementary to those of Wan et al. However, our data also clearly point to additional binding modalities. We agree that it will be important to map these additional interactions and identify their functional roles, but this is beyond the scope of this paper.

      AP2 expression levels should be examined in CCDC32 depleted cells. If AP2 is gone, it is not surprising that clathrin-coated pits are defective.

      Response: Agreed and we have confirmed this by western blotting (Figure S3A-B) and detect no reduction in levels of any of the AP2 subunits in CCDC32 siRNA knockdown cells. As stated above this could be due to residual CCDC32 present in the siRNA KD vs the CRISPR-mediated gene KO.

      If the authors aim to establish a secondary function for CCDC32, they need to thoroughly discuss the known chaperone function of CCDC32 and consider whether and how CCDC32 regulates a downstream step in CME.

      Response: Agreed. We have described the Wan et al paper, which came out while our manuscript was in review, in our Introduction. As described above, there are areas of agreement and of discrepancies, which are thoroughly documented and discussed throughout the revised manuscript.

      The quality of Figure 1A is very low, making it difficult to assess the localization and quantify the data.

      Response: The low signal:noise in Fig. 1A that the reviewer is concerned about is due to a diffuse distribution of CCDC32 on the inner surface of the plasma membrane. We now describe this binding more explicitly; we believe it reflects a specific interaction mediated by the C-terminus of CCDC32. Thus, the degree of diffuse membrane binding we observe follows the order: eGFP-CCDC32(FL) > eGFP-CCDC32(∆78-98) > eGFP-CCDC32(1-54) ~ eGFP/background (see new Fig. S7). Importantly, the colocalization of CCDC32 at CCPs is confirmed by the dynamic imaging of CCPs (Fig. 1B).

      In Figure 6, why aren't AP2 mu and sigma subunits shown?

      Response: Agreed. Not being aware of CCDC32's possible dual role as a chaperone, we had assumed that the AP2 complex was intact. We have now added this data in Figure 6 B,C and Fig. S3C, as discussed above.

      Page 5, top, this sentence is confusing: "their surface area (~17 x 10 nm2) remains significantly less than that required for the average 100 nm diameter CCV (~3.2 x 103 nm2)."

      Response: Thank you for the criticism. We have clarified the sentence and corrected a typo, which would definitely be confusing. The section now reads, "While the flat CCSs we detected in CCDC32 knockdown cells were significantly larger than in control cells (Fig. 4D, mean diameter of 147 nm vs. 127 nm, respectively), they are much smaller than typical long-lived flat clathrin lattices (d ≥ 300 nm) (Grove et al., 2014). Indeed, the surface area of the flat CCSs that accumulate in CCDC32 KD cells (mean ~1.69 × 10⁴ nm²) remains significantly less than the surface area of an average 100 nm diameter CCV (~3.14 × 10⁴ nm²). Thus, we refer to these structures as 'flat clathrin assemblies' because they are neither curved 'pits' nor large 'lattices'. Rather, the flat clathrin assemblies represent early, likely defective, intermediates in CCP formation."
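
The two areas quoted above can be checked with a short calculation. This is a minimal sketch, assuming (our assumption for illustration, not a statement of the method used in the manuscript) that a flat CCS is approximated as a circular disc and a CCV as a sphere; the function names are hypothetical.

```python
from math import pi


def disc_area(diameter_nm: float) -> float:
    """Area of a flat circular patch with the given diameter (nm^2)."""
    radius = diameter_nm / 2
    return pi * radius ** 2


def sphere_surface_area(diameter_nm: float) -> float:
    """Surface area of a sphere with the given diameter (nm^2)."""
    radius = diameter_nm / 2
    return 4 * pi * radius ** 2


# Mean diameters quoted in the response (Fig. 4D) and a 100 nm CCV.
flat_kd = disc_area(147)          # flat CCS in CCDC32 knockdown cells
flat_ctrl = disc_area(127)        # flat CCS in control cells
ccv = sphere_surface_area(100)    # average coated vesicle

print(f"flat CCS (KD):      {flat_kd:.3g} nm^2")
print(f"flat CCS (control): {flat_ctrl:.3g} nm^2")
print(f"100 nm CCV:         {ccv:.3g} nm^2")
print(f"KD flat CCS / CCV:  {flat_kd / ccv:.2f}")
```

Under these assumptions, a 147 nm disc covers roughly half the surface of a 100 nm vesicle, which is consistent with the statement that the flat assemblies hold less clathrin-coated area than a complete CCV requires.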

      Significance

      Please see above.(from above: Overall, while this work presents some interesting ideas, it remains unclear whether CCDC32 regulates AP2 beyond the assembly step)

      Response: Our responses above argue that we have indeed established that CCDC32 regulates AP2 beyond the assembly step. We have also identified several discrepancies between our findings and those reported by Wan et al., most notably binding between CCDC32 and mature AP2 complexes and the AP2-dependent recruitment of CCDC32 to CCPs. It is possible that these discrepancies may be due to the position of the GFP tag (ours is N-terminal, theirs is C-terminal; we show that the N-terminal tagged CCDC32 rescues the knockdown phenotype, while Wan et al., do not provide evidence for functionality of the C-terminal construct).

      Reviewer #3

      Evidence, reproducibility and clarity (Required):

      In this manuscript, Yang et al. characterize the endocytic accessory protein CCDC32, which has implications in cardio-facio-neuro-developmental syndrome (CFNDS). The authors clearly demonstrate that the protein CCDC32 has a role in the early stages of endocytosis, mainly through the interaction with the major endocytic adaptor protein AP2, and they identify regions taking part in this recognition. Through live cell fluorescence imaging and electron microscopy of endocytic pits, the authors characterize the lifetimes of endocytic sites, the formation rate of endocytic sites and pits and the invagination depth, in addition to transferrin receptor (TfnR) uptake experiments. Binding between CCDC32 and CCDC32 mutants to the AP2 alpha appendage domain is assessed by pull down experiments. Together, these experiments allow deriving a phenotype of CCDC32 knock-down and CCDC32 mutants within endocytosis, which is a very robust system, in which defects are not so easily detected. A mutation of CCDC32, known to play a role in CFNDS, is also addressed in this study and shown to have endocytic defects.

      Response: We thank the reviewer for their positive remarks regarding the quality of our data and the strength of our conclusions.

      In summary, the authors present a strong combination of techniques, assessing the impact of CCDC32 in clathrin mediated endocytosis and its binding to AP2, whereby the following major and minor points remain to be addressed:

      • The authors show that CCDC32 depletion leads to the formation of brighter and static clathrin coated structures (Figure 2), but that these were only prevalent to 7.8% and masked the 'normal' dynamic CCPs. At the same time, the authors show that the absence of CCDC32 induces pits with shorter life times (Figure 1 and Figure 2), the 'majority' of the pits. Clarification is needed as to how the authors arrive at these conclusions and these numbers. The authors should also provide (and visualize) the corresponding statistics. The same statement is made again later on in the manuscript, where the authors explain their electron microscopy data. Was the number derived from there?

      These points are critical to understanding CCDC32's role in endocytosis and is key to understanding the model presented in Figure 8. The numbers of how many pits accumulate in flat lattices versus normal endocytosis progression and the actual time scales could be included in this model and would make the figure much stronger.

      Response: Thank you for these comments. We understand the paradox between the visual impression and the reality of our dynamic measurements. We have been visually misled by this in previous work (Chen et al., 2020), which emphasizes the importance of unbiased image analysis afforded to us through the well-documented cmeAnalysis pipeline, developed by us (Aguet et al., 2013) and now used by many others (e.g. (He et al., 2020)).

      The % of static structures was not derived from electron microscopy data, but quantified using cmeAnalysis, which automatically provides the lifetime distribution of CCPs. We have now clarified this in the manuscript and added a histogram (Fig. S4) quantifying the fraction of CCPs in lifetime cohorts 150s (static).

      • In relation to the above point, the statistics of Figure 2E-G and the analysis leading there should also be explained in more detail: For example, what are the individual points in the plot (also in Figures 6G and 7G)? The authors should also use a few phrases to explain software they use, for example DASC, in the main text.

      Response: Each point in these bar graphs represents a movie, where n ≥ 12. These details have been added to the respective figure legends. We have also added a brief description of DASC analysis in the text.

      • There are several questions related to the knock-down experiments that need to be addressed:

      Firstly, knock-down of CCDC32 does not seem to be very strong (Figure S2B). Can the level of knock-down be quantified?

      Response: We have now quantified the KD efficiency. It is ~60%. This turns out to be fortuitous (see responses to reviewer 2), as a recent publication, which came out after we completed our study, has shown by CRISPR-mediated knockout that CCDC32 also plays an essential chaperone function required for AP2 assembly. We do not see any reduction in AP2 levels or its complex formation under our conditions (see new Supplemental Figure S3), which suggests that the effects of CCDC32 on CCP dynamics are more sensitive to CCDC32 concentration than its role as a chaperone. Our phenotypes would have been masked by more efficient depletion of CCDC32.

      In page 6 it is indicated that the eGFP-CCDC32(1-54) and eGFP-CCDC32(∆78-98) constructs are siRNA-resistant. However in Fig S2B, these proteins do not show any signal in the western blot, so it is not clear if they are expressed or simply not detected by the antibody. The presence of these proteins after silencing endogenous CCDC32 needs to be confirmed to support Figures 6 and Figures 7, which critically rely on the presence of the CCDC32 mutants.

      Response: Unfortunately, the C-terminally truncated CCDC32 proteins are not detected because they lack the antibody epitope; indeed, even the ∆78-98 deletion is poorly detected (compare the GFP blot in new S1A with the anti-CCDC32 blot in S1B). However, these constructs contain the same siRNA-resistance mutation as the full-length protein. That they are expressed and siRNA resistant can be seen in Fig. S2A (now Fig. S1A) blotting for GFP.

      In Figures 6 and 7, siRNA knock-down of CCDC32 is only indicated for sub-figures F to G. Is this really the case? If not, the authors should clarify. The siRNA knock-down in Figure 1 is also only mentioned in the text, not in the figure legend. The authors should pay attention to make their figure legends easy to understand and unambiguous.

      Response: No, it is not the case. Thank you for pointing out the uncertainty. We have added these details to the Figure legends and checked all Figure legends to ensure that they clearly describe the data shown.

      • It is not exactly clear how the curves in Figure 3C (lower panel) on the invagination depth were obtained. Can the authors clarify this a bit more? For example, what are kT and kE in Figure 3A? What is I0? And how did the authors derive the logarithmic function used to quantify the invagination depth? In the main text, the authors say that the traces were 'logarithmically transformed'. This is not a technical term. The authors should refer to the actual equation used in the figure.

      Response: This analysis was developed by the Kirchhausen lab (Saffarian and Kirchhausen, 2008). We have added these details and reference them in the Figure legend and in the text. We also now use the more accurate descriptor 'log-transformed'.

      • In the discussion, the claim 'The resulting dysregulation of AP2 inhibits CME, which further results in the development of CFNDS.' is maybe a bit too strong of a statement. Firstly, because the authors show themselves that CME is perturbed, but by no means inhibited. Secondly, the molecular link to CFNDS remains unclear. Even though CCDC32 mutants seem to be responsible for CFNDS and one of the mutant has been shown in this study to have a defect in endocytosis and AP2 binding, a direct link between CCDC32's function in endocytosis and CFNDS remains elusive. The authors should thus provide a more balanced discussion on this topic.

      Response: We have modified and softened our conclusions, which now read that the phenotypes we see likely "contribute to" rather than "cause" the disease.

      • In Figure S1, the authors annotate the presence of a coiled-coil domain, which they also use later on in the manuscript to generate mutations. Could the authors specify (and cite) where and how this coiled-coil domain has been identified? Is this predicted helix indeed a coiled-coil domain, or just a helix, as indicated by the authors in the discussion?

      Response: See response to Reviewer 1, point 4. We have changed this wording to alpha-helix. The 'coiled-coil' reference is historical and unlikely to be a true reflection of CCDC32 structure. AlphaFold 3.0 predictions were unable to identify with certainty any coiled-coil structures, even when we modelled potential dimers or trimers, and we find no evidence of dimerization of CCDC32 in vivo. We have clarified this in the text.

      Minor comments

      • In general, a more detailed explanation of the microscopy techniques used and the information they report would be beneficial to provide access to the article also to non-expert readers in the field. This concerns particularly the analysis methods used, for example: How were the cohort-averaged fluorescence intensity and lifetime traces obtained? How do the tools cmeAnalysis and DASC work? A brief explanation would be helpful.

      Response: We have expanded Methods to add these details, and also described them in the main text.

      • The axis label of Figure 2B is not quite clear. What does 'TfnR uptake % of surface bound' mean? Maybe the authors could explain this in more detail in the figure legend? Is the drop in uptake efficiency also accessible by visual inspection of the images? It would be interesting to see that.

      Response: This is a standard measure of CME efficiency. 'TfnR uptake % of surface bound' = Internalized TfnR/Surface bound TfnR. Again, images may be misleading as defects in CME lead to increased levels of TfnR on the cell surface, which in turn would result in more Tfn uptake even if the rate of CME is decreased.
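
The ratio defined above, and the reason raw images can mislead, can be illustrated with a short sketch. The function name and all numbers below are hypothetical, chosen only to show the arithmetic; they are not measurements from the study.

```python
def uptake_percent_of_surface(internalized: float, surface_bound: float) -> float:
    """Internalized TfnR expressed as a percentage of surface-bound TfnR."""
    if surface_bound <= 0:
        raise ValueError("surface-bound signal must be positive")
    return 100.0 * internalized / surface_bound


# Hypothetical signals in arbitrary units (not data from the manuscript).
control = uptake_percent_of_surface(internalized=40.0, surface_bound=100.0)
knockdown = uptake_percent_of_surface(internalized=45.0, surface_bound=180.0)

print(f"control:   {control:.1f}% of surface bound")
print(f"knockdown: {knockdown:.1f}% of surface bound")
```

In this made-up example the knockdown cells internalize more TfnR in absolute terms (45 vs. 40 units) because more receptor sits on the surface, yet the normalized uptake efficiency is lower, which is why the normalized ratio rather than the image intensity is reported.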

      • Figure 4: How is the occupancy of CCPs in the plasma membrane measured? What are the criteria used to divide CCSs into Flat, Dome or Sphere categories?

      Response: We have expanded Methods to add these details. Based on the degree of invagination, the shapes of CCSs were classified as either: flat CCSs with no obvious invagination; dome-shaped CCSs that had a hemispherical or less invaginated shape with visible edges of the clathrin lattice; and spherical CCSs that had a round shape with the edges of the clathrin lattice not visible in 2D projection images. In most cases, the shapes were obvious in 2D PREM images. In uncertain cases, the degree of CCS invagination was determined using images tilted at ±10-20 degrees. The areas of CCSs were measured using ImageJ and used for the calculation of the CCS occupancy on the plasma membrane.

      • Figure 5B: Can the authors explain, where exactly the GFP was engineered into AP2 alpha? This construct does not seem to be explained in the methods section.

      Response: We have added this information. The construct, which corresponds to an insertion of GFP into the flexible hinge region of AP2, at aa649, was first described by (Mino et al., 2020) and shown to be fully functional. This information has been added to the Methods section.

      • Figure S1B: The authors should indicate the colour code used for the structural model.

      Response: We have expanded our structural modeling using AlphaFold 3.0 in light of the recent publication suggesting the CCDC32 interacts with the µ2 subunit and does not bind full length AP2. These results are described in the text. The color coding now reflects certainty values given by AlphaFold 3.0 (Fig. S6B, D).

      • The list of primers referred to in the materials and methods section does not exist. There is a Table S1, but this contains different data. The actual Table S1 is not referenced in the main text. This should be done.

      Response: We apologize for this error. We have now added this information in Table S2.

      Significance (Required):

      In this study, the authors analyse a so-far poorly understood endocytic accessory protein, CCDC32, and its implication for endocytosis. The experimental tool set used, allowing to quantify CCP dynamics and invagination is clearly a strength of the article that allows assessing the impact of an accessory protein towards the endocytic uptake mechanism, which is normally very robust towards mutations. Only through this detailed analysis of endocytosis progression could the authors detect clear differences in the presence and absence of CCDC32 and its mutants. If the above points are successfully addressed, the study will provide very interesting and highly relevant work allowing a better understanding of the early phases in CME with implication for disease.

      The study is thus of potential interest to an audience interested in CME, in disease and its molecular reasons, as well as for readers interested in intrinsically disordered proteins to a certain extent, claiming thus a relatively broad audience. The presented results may initiate further studies of the so-far poorly understood and less well known accessory protein CCDC32.

      Response: We thank the reviewer for their positive comments on the significance of our findings and the importance of our detailed phenotypic analysis made possible by quantitative live cell microscopy. We also believe that our new structural modeling of CCDC32 and our findings of complex and extensive interactions with AP2 make the reviewer's point regarding intrinsically disordered proteins even more interesting and relevant to a broad audience. We trust that our revisions indeed address the reviewer's concerns.

      The field of expertise of the reviewer is structural biology, biochemistry and clathrin mediated endocytosis. Expertise in cell biology is rather superficial.


      References:

      Aguet, F., C.N. Antonescu, M. Mettlen, S.L. Schmid, and G. Danuser. 2013. Advances in Analysis of Low Signal-to-Noise Images Link Dynamin and AP2 to the Functions of an Endocytic Checkpoint. Developmental Cell. 26:279-291.

      Chen, Z., R.E. Mino, M. Mettlen, P. Michaely, M. Bhave, D.K. Reed, and S.L. Schmid. 2020. Wbox2: A clathrin terminal domain-derived peptide inhibitor of clathrin-mediated endocytosis. Journal of Cell Biology. 219.

      Grove, J., D.J. Metcalf, A.E. Knight, S.T. Wavre-Shapton, T. Sun, E.D. Protonotarios, L.D. Griffin, J. Lippincott-Schwartz, and M. Marsh. 2014. Flat clathrin lattices: stable features of the plasma membrane. Mol Biol Cell. 25:3581-3594.

      He, K., E. Song, S. Upadhyayula, S. Dang, R. Gaudin, W. Skillern, K. Bu, B.R. Capraro, I. Rapoport, I. Kusters, M. Ma, and T. Kirchhausen. 2020. Dynamics of Auxilin 1 and GAK in clathrin-mediated traffic. J Cell Biol. 219.

      Mino, R.E., Z. Chen, M. Mettlen, and S.L. Schmid. 2020. An internally eGFP-tagged α-adaptin is a fully functional and improved fiduciary marker for clathrin-coated pit dynamics. Traffic. 21:603-616.

      Saffarian, S., and T. Kirchhausen. 2008. Differential evanescence nanometry: live-cell fluorescence measurements with 10-nm axial resolution on the plasma membrane. Biophys J. 94:2333-2342.

    1. Reviewer #3 (Public review):

      In this paper, the authors use a three-phase economic game to examine the tendency to engage in prosocial versus competitive exchanges with three anonymous partners. In particular, they consider individual differences in the tendency to infer about others' tendencies based on one's preferences and to update one's preferences based on observations of others' behavior. The study includes a sample of individuals diagnosed with borderline personality disorder and a matched sample of psychiatrically healthy control participants.

      On the whole, the experimental design is well-suited to the questions and the computational model analyses are thorough, including modern model-fitting procedures. I particularly appreciated the clear exposition regarding model parameterization and the descriptive Table 2 for qualitative model comparison. My broad question about the experiment (in terms of its clinical and cognitive process relevance): Does the task encourage competition or give participants a reason to take advantage of others? I don't think it does, so it would be useful to clarify the normative account for prosociality in the introduction (e.g., some of Robin Dunbar's work).

      The finding that individuals with BPD do not engage in self-other generalization on this task of social intentions is novel and potentially clinically relevant. The authors find that BPD participants' tendency to be prosocial when splitting points with a partner does not transfer into their expectations of how a partner will treat them in a task where they are the passive recipient of points chosen by the partner. In the discussion, the authors reasonably focus on model differences between groups (Bayesian model comparison), yet I thought this finding -- BPD participants not assuming prosocial tendencies in phase 2 while CON participants did -- merited greater attention. Although the BPD group was close to 0 on the \beta prior in Phase 2, their difference from CON is still in the direction of being more mistrustful (or at least not assuming prosociality). This may line up with broader clinical literature on mistrustfulness and attributions of malevolence in BPD (e.g., a 1992 paper by Nigg et al. in Journal of Abnormal Psychology). My broad point is to consider further the Phase 2 findings in terms of the clinical interpretation of the shift in \beta relative to controls.

      On the conceptual level, I had two additional concerns. First, the authors note that they have "proposed a theory with testable predictions" (p. 4 but also elsewhere) but they do not state any clear predictions in the introduction, nor do they consider what sort of patterns will be observed in the BPD group in view of extant clinical and computational literature. Rather, the paper seems to be somewhat exploratory, largely looking at group differences (BPD vs. CON) on all of the shared computational parameters and additional indices such as belief updating and reaction times. Given this, I would suggest that the authors make stronger connections between extant research on intention representation in BPD and their framework (model and paradigm). In particular, the authors do not address related findings from Ereira (2020) and Story (2024) showing that, in a false belief task, BPD participants *overgeneralize* from self to other. A critical comparison of this work to the present study, including an examination of how the two tasks differ in the processes they measure, is important.

      In addition, perhaps it is fairer to note more explicitly the exploratory nature of this work. Although the analyses are thorough, many of them are not argued for a priori (e.g., rate of belief updating in Figure 2C) and the reader amasses many individual findings that need to be synthesized.

      Second, in the discussion, the authors are too quick to generalize to broad clinical phenomena in BPD that are not directly connected to the task at hand. For example, on p. 22: "Those with a diagnosis of BPD also show reduced permeability in generalising from other to self. While prior research has predominantly focused on how those with BPD use information to form impressions, it has not typically examined whether these impressions affect the self." Here, it's not self-representation per se (typically, identity or one's view of oneself), but instead cooperation and prosocial tendencies in an economic context. It is important to clarify what clinical phenomena may be closely related to the task and which are more distal and perhaps should not be approached here.

      On a more technical level, I had two primary concerns. First, although the authors consider alternative models within a hierarchical Bayesian framework, some challenges arise when one analyzes parameter estimates fit separately to two groups, particularly when the best-fitting model is not shared. In particular, although the authors conduct a model confusion analysis, they do not as far as I could tell (and apologies if I missed it) demonstrate that the dynamics of one model are nested within the other. Given that M4 has free parameters governing the expectations on the absolute and relative reward preferences in Phase 2, is it necessarily the case that the shared parameters between M1 and M4 can be interpreted on the same scale? Relatedly, group-specific model fitting has virtues when one believes there to be two distinct populations, but there is also a risk of overfitting potentially irrelevant sample characteristics when parameters are fit group by group.

      To resolve these issues, I saw one straightforward solution (though in modeling, my experience is that what seems straightforward on first glance may not be so upon further investigation). M1 assumes that participants' own preferences (posterior central tendency) in Phase 1 directly transfer to priors in Phase 2, but presumably the degree of transfer could vary somewhat without meriting an entirely new model (i.e., the authors currently place this question in terms of model selection, not within-model parameter variation). I would suggest that the authors consider a model parameterization fit to the full dataset (both groups) that contains free parameters capturing the *deviations* in the priors relative to the preceding phase's posterior. That is, the free parameters $\bar{\alpha}_{par}^m$ and $\bar{\beta}_{par}^m$ govern the central tendency of the Phase 2 prior parameter distributions directly, but could be reparametrized as deviations from Phase 1 $\theta^m_{ppt}$ parameters in an additive form. This allows for a single model to be fit to all participants that encompasses the dynamics of interest such that between-group parameter comparisons are not biased by the strong assumptions imposed by M1 (that phase 1 preferences and phase 2 observations directly transfer to priors). In the case of controls, we would expect these deviation parameters to be centred on 0 insofar as the current M1 fit them best, whereas BPD participants should have significant deviations from earlier-phase posteriors (e.g., the shift in \beta toward prior neutrality in phase 2 compared to one's own prosociality in phase 1).
I think it's still valid for the authors to argue for stronger model constraints for Bayesian model comparison, as they do now, but inferences regarding parameter estimates should ideally be based on a model that can encompass the full dynamics of the entire sample, with simpler dynamics (like posterior -> prior transfer) being captured by near-zero parameter estimates.
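
One way the additive reparameterization might be written is sketched here. The deviation terms $\delta_{\alpha}$ and $\delta_{\beta}$ are hypothetical symbols introduced only for illustration, and $\alpha_{ppt}^{m}$, $\beta_{ppt}^{m}$ are assumed to denote the corresponding components of the Phase 1 $\theta^m_{ppt}$; the authors' notation may differ.

```latex
\begin{align}
\bar{\alpha}_{par}^{m} &= \alpha_{ppt}^{m} + \delta_{\alpha}, \\
\bar{\beta}_{par}^{m}  &= \beta_{ppt}^{m}  + \delta_{\beta}.
\end{align}
```

Under this sketch, $\delta_{\alpha} = \delta_{\beta} = 0$ recovers the direct posterior-to-prior transfer assumed by M1, while nonzero deviations would capture departures of the Phase 2 prior from one's own Phase 1 preferences within a single model fit to both groups.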

      My second concern pertains to the psychometric individual difference analyses. These were not clearly justified in the introduction, though I agree that they could offer potentially meaningful insight into which scales may be most related to model parameters of interest. So, perhaps these should be earmarked as exploratory and/or more clearly argued for. Crucially, however, these analyses appear to have been conducted on the full sample without considering the group structure. Indeed, many of the scales on which there are sizable group differences are also those that show correlations with psychometric scales. So, in essence, it is unclear whether most of these analyses are simply recapitulating the between-group tests reported earlier in the paper or offer additional insights. I think it's hard to have one's cake and eat it, too, in this regard and would suggest the authors review Preacher et al. 2005, Psychological Methods for additional detail. One solution might be to always include group as a binary covariate in the symptom dimension-parameter analyses, essentially partialing the correlations for group status. I remain skeptical regarding whether there is additional signal in these analyses, but such controls could convince the reader. Nevertheless, without such adjustments, I would caution against any transdiagnostic interpretations such as this one in the Highlights: "Higher reported childhood trauma, paranoia, and poorer trait mentalizing all diminish other-to-self information transfer irrespective of diagnosis." Since many of these analyses relate to scales on which the groups differ, the transdiagnostic relevance remains to be demonstrated.
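
To make the suggested control concrete, here is a minimal sketch of a correlation partialled for a binary group covariate. For a two-level covariate, regressing each variable on group (with an intercept) is equivalent to centring it on its group mean, so the sketch does that and then correlates the residuals. The function names and the toy data are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def center_within_group(values, group):
    """Subtract each group's mean; same as residualizing on a group dummy."""
    means = {}
    for g in set(group):
        members = [v for v, gg in zip(values, group) if gg == g]
        means[g] = sum(members) / len(members)
    return [v - means[g] for v, g in zip(values, group)]


def partial_corr_given_group(x, y, group):
    """Correlation between x and y after removing group mean differences."""
    return pearson(center_within_group(x, group), center_within_group(y, group))


# Toy data: the two groups differ on both measures, but within each
# group the symptom score and the model parameter are unrelated.
group = [0, 0, 0, 0, 1, 1, 1, 1]
symptom = [-1, 1, -1, 1, 4, 6, 4, 6]
parameter = [-1, -1, 1, 1, 4, 4, 6, 6]

raw = pearson(symptom, parameter)
partial = partial_corr_given_group(symptom, parameter, group)
print(f"full-sample correlation: {raw:.2f}")
print(f"partialled for group:    {partial:.2f}")
```

In this constructed example the full-sample correlation is large purely because of the group difference and vanishes once group is partialled out, which is the pattern that would indicate the dimensional analyses are recapitulating the between-group tests.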

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We are grateful for the overall positive feedback and constructive suggestions. We have been able to experimentally address several of the suggested points and provide here a revision plan addressing all of the reviewers’ additional concerns.

      In summary, this study is of fundamental novelty and high impact as it:

      1. Reveals an unexpected role of ErbB3 in controlling Integrin β1 trafficking and thus epithelial cell motility and extracellular vesicle secretion. This may provide important insights into the role of ErbB3 in cancer.
      2. Uncovers the first ligand-independent, non-canonical cellular function for ErbB3 as a scaffold for the Arf6-Rabaptin5-GGA3 endosomal sorting complex.
      3. Provokes the notion that pseudo-RTKs may have evolved cellular functions beyond receptor signaling, such as by scaffolding endosomal sorting compartments.

      We hope that you share our view that these conceptually groundbreaking findings will be of interest to a broad cross-disciplinary audience interested in cell signaling, cancer biology, endocytic trafficking and integrin biology.

      1. Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      ErbB3 is well-known for its significance in cancer, which is dependent on ligand-binding and heterodimerization with other ErbB family members. In the current work, Rodrigues-Junior et al. identified novel, unexpected functions of ErbB3 in promoting early endocytic recycling and restricting exocytic trafficking (extracellular vesicles secretion) of membrane receptors, such as integrin b1 and transferrin receptor, via stabilizing the Arf6-GGA3-Rabaptin5 endosomal sorting complex. Via ErbB3 siRNA knockdown, they observed an impaired recycling of transferrin receptor and integrin b1 back to the cell membrane. The recycling assay condition (growth factor-deprived) provided a very clean result to support that this ErbB3-dependent endocytic trafficking is ligand-binding independent. The trafficking-dependence on ErbB3 (both the endocytic and the exocytic) was further supported by integrin b1 functional assays (scratch closure assay and Matrigel invasion assay). There are still some details that need to be clarified to fully understand the conclusion.

      Major points:

        • The manuscript started with a pathological correlation between high ErbB3 level and poor patient survival rate. In Fig. 1, the impaired TfR recycling and the co-localization between ErbB3 and integrin b1 were also examined in the pathological breast cancer cell line, MCF7. While investigating integrin b1 recycling, the authors suddenly switched to two other non-malignant human breast epithelial cell lines, which makes it difficult to relate ErbB3-mediated recycling back to the disease situation. The authors should state this point more clearly, rather than referring to data not shown. This inconsistency also occurred in other assays: for example, MCF7 was used when addressing trafficking from the TGN to the cell surface, whereas MCF10A was used when addressing extracellular vesicle secretion.

      Response: we thank the reviewer for the comment. The rationale for using different cell lines or primary cells is now better explained in the manuscript. We found that depletion of ErbB3 impaired recycling of Integrin β1 in the non-malignant cells, including MCF10A and primary breast epithelial cells, but not in malignant MCF7 cells that overexpress ErbB3 (data not shown). We now speculate in the manuscript that perhaps the dependence on ErbB3 for Integrin β1 recycling is lost at some point during carcinogenesis, although further studies will be needed to address this possibility. MCF7 cells were used to detect endogenous ErbB3 as normal expression levels of ErbB3 (primary MECs and MCF10A) were not detectable by immunofluorescence microscopy in our hands with a range of antibodies we tested. With regard to the transferrin recycling assay, we first attempted to use MCF10A cells for consistency; however, we found that transferrin internalized poorly in these cells and the limited pool of transferrin that internalised was retained in these cells for an extended time (3 h), thus rendering them unsuitable for our transferrin experiments.

      Concerning the data on trafficking from the TGN to the cell surface, we mistakenly wrote that they were performed in MCF7 cells although they were in fact done in MCF10A cells. This is now corrected in the new version of this manuscript.

      Additionally, based on the constructive comment by this reviewer, we have now extended the analysis of EV secretion in ErbB3, Rab4 and Rabaptin5 silenced cells to MCF7 cells. The new data is in line with our findings in MCF10A and prHMEC cells, that absence of ErbB3 significantly increased EV secretion. Moreover, Rab4 and Rabaptin5 knockdown also enhanced the amount of EVs secreted by MCF7 cells. These results were incorporated in the manuscript as new Supplementary Figure S7F-G and new Supplementary Figure S9F-G, as recommended. Furthermore, we also included in this new version that GGA3 and to a lesser extent Rab GTPase-binding effector protein 1 (Rabaptin5 or RABPT5) shared colocalisation with endogenous ErbB3 in MCF7 cells as the new Supplementary Figure 9A, B. Finally, we also attempted to conduct the Arf6 IP in MCF7 cells, but as opposed to MCF10A cells, the yield of Arf6 in pull down experiments was much lower than in MCF10A cells, and interacting proteins were not detectable.

      It was shown before that ErbB3 undergoes constitutive internalization and degradation within several hours that is independent of ligand-binding (ref#13). Can the authors provide experimental evidence to show the correlation of TfR or integrin b1 recycling with these dynamic ErbB3 levels rather than ErbB3 knockdown?

      Response: we have performed colocalization of ErbB3, traced Integrin β1 and the recycling endosome marker EHD1, showing triple colocalization in a subset of endosomes, as shown in the new Supplementary Figure S2H. Experimental limitations prevented us from including EEA1 in triple staining for mCherry-ErbB3 or endogenous ErbB3 protein. Furthermore, ectopically expressed ErbB3 in MCF10A cells did not show convincing co-localisation. We hope that the new EHD1 triple colocalization with ErbB3 and Integrin β1 in endosomal compartments satisfies this specific comment.

      As mentioned above, regarding the transferrin recycling assay, we first attempted to use MCF10A cells for consistency, however we found that transferrin internalized poorly in these cells and the limited pool of transferrin that internalised was retained in these cells for an extended time (3 h), thus preventing their use.

      The efficiency of siRNA knockdown of ErbB3 (both #1 and #2) should support the observed phenotype (Fig. 1I-J, K-L). Is there a correlation between the ErbB3 level with integrin recycling? For example, siRNA#2 led to more efficient knockdown of ErbB3 in MCF10A?

      Response: notably, the immunoblots presented here to assess the efficiency of the two different siRNAs are one example. We noted some variability between experiments but find that both siRNAs work well and yield comparable effects on recycling of Integrin β1. Importantly, the recycling data represent biological repeats of independently performed experiments, which yielded reproducible and consistent ErbB3 silencing using both siRNAs. This is reflected in the lack of a significant difference between ErbB3 knockdown cells in Fig. 1I-J and K-L. Hence, we consider that both siRNAs against ErbB3 worked efficiently with comparable outcome. Please also note our reply to Rev2 #07.

      ErbB3 loss led to more extracellular vesicles secretion, but also lysosomal degradation of integrin b1. This conclusion is supported by results shown in Fig.4D-E and Fig. S8A-B, while the analysis from the same cell line (MCF10A, Fig. S3A) results in no change of integrin b1 levels upon ErbB3 depletion. Fig. S3B showed also no change in a second non-malignant cell line (prHMEC). How do the authors explain this conflict?

      Response: we thank the reviewer for this comment. We believe that the increase in EV secretion and lysosomal degradation is compensated by an increase in de novo synthesis of Integrin β1 (see data below, from Fig. S3C). In the original manuscript we did not perform the appropriate statistical analysis of the RT-qPCR data. The unpaired two-tailed Student’s t-test is only suitable for normally distributed samples, which is not the case here. Instead, we performed the appropriate Mann-Whitney U-test assuming non-normal distribution, yielding an exact p-value of 0.017. The figure S3A and associated text have been modified accordingly.
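
      For readers unfamiliar with how an exact Mann-Whitney p-value arises, the following is a minimal sketch under stated assumptions: it uses only the Python standard library, assumes there are no tied values, and runs on made-up fold-change numbers rather than the authors' RT-qPCR data. The function name `mann_whitney_u_exact` is hypothetical. The exact two-sided p-value is the fraction of all possible rank assignments whose U statistic lies at least as far from its expected value as the observed one.

```python
from itertools import combinations


def mann_whitney_u_exact(x, y):
    """Exact two-sided Mann-Whitney U test (assumes no tied values)."""
    n1, n2 = len(x), len(y)
    pooled = sorted(list(x) + list(y))
    # Rank 1 is the smallest value; valid only when all values are distinct
    rank_of = {value: i + 1 for i, value in enumerate(pooled)}
    rank_sum_x = sum(rank_of[v] for v in x)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    expected_u = n1 * n2 / 2
    observed_dev = abs(u_x - expected_u)

    # Enumerate every way the n1 ranks of group x could have been drawn
    extreme = total = 0
    for ranks in combinations(range(1, n1 + n2 + 1), n1):
        u = sum(ranks) - n1 * (n1 + 1) / 2
        total += 1
        if abs(u - expected_u) >= observed_dev:
            extreme += 1
    return u_x, extreme / total


# Illustrative values only (not from the manuscript)
control = [1.0, 1.2, 1.1, 0.9, 1.3]
knockdown = [2.0, 2.4, 2.2, 1.9, 2.6]
u_stat, p_value = mann_whitney_u_exact(control, knockdown)
```

      Because the test depends only on ranks, it makes no assumption about the shape of the underlying distribution, which is why it is preferred over the t-test for small, non-normal samples.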

      Minor points: 1. Is TfR also colocalizing with endogenous ErbB3?

      Response: as mentioned in the major comment #02, we attempted to perform the transferrin recycling assay using MCF10A cells to enable direct comparisons with the integrin b1 recycling, but found that transferrin internalized poorly in these cells.

      Fig. 3J, TSG101, T is masked by 3I

      Response: we apologize for this oversight. We have gone through the manuscript in detail and corrected all the errors pointed out.

      Page 10, the description of the EV secretion in prHMEC cells is annotated to the wrong figure. Fig. S5D → S7D; S5E → S7E

      Response: we apologize for this oversight and have now corrected the mistake.

      Fig. 4M: How was the motility/invasion into Matrigel determined? Images? Only quantifications are shown.

      Response: the Matrigel invasion assay was described in the Materials and Methods section. Accordingly, the data were expressed as the percentage of invasion based on the ratio of the mean number of cells invading through Matrigel matrix per mean number of cells in the uncoated support. For this rebuttal letter, the reviewer can find representative images of invaded MCF10A siCtrl non-treated (Ctrl) or treated with VSF secreted from MCF10A siCtrl or siErbB3. Since this is an established method to measure cell invasion, we hope the reviewer agrees that these images do not add value to the manuscript.
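
      The percentage-of-invasion ratio described in this response can be written out as a short sketch. The cell counts and the function name `percent_invasion` are illustrative assumptions, not values or code from the study.

```python
from statistics import mean


def percent_invasion(matrigel_counts, uncoated_counts):
    """Mean cells crossing Matrigel as a percentage of mean cells crossing the uncoated support."""
    return 100.0 * mean(matrigel_counts) / mean(uncoated_counts)


# Hypothetical counts per field for one condition
invasion_pct = percent_invasion([10, 20, 30], [40, 40, 40])
```

      Dividing by the uncoated-support count corrects for differences in baseline migration, so the resulting percentage reflects the ability to traverse the matrix rather than motility alone.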

      Fig. 4M: Exosomes collected from ErbB3-depleted cells promotes the migration in MCF10A-wild type cells, how about the effects on ErbB3-depleted cells? This group should be included for analysis.

      Response: as proposed, we have treated both control and ErbB3-silenced MCF10A cells with normalized concentrations of EVs secreted from siCtrl and siErbB3 (1 × 10⁹ nanoparticles/mL) for 48 hours, followed by cell viability and cell invasion assays. The new data show that both EV pools modestly increased cell viability and substantially increased invasiveness of both wild-type and ErbB3-depleted cells through Matrigel (new Figures 4K and L). Together, our results indicate that while ErbB3-silenced MCF10A cells exhibited lower basal motility, ErbB3 is not required for the observed EV-induced motility. The new Figures 4K and L were included and further discussed in this manuscript.

      Quantification of the blots should be provided for Fig. 5A (GGA3), 5B (GGA3, Rabaptin5 and Arf6), 5F (GGA3) and 5G (GGA3, Rabaptin5 and Arf6). What is mock IP in each graph? The mock IP is neither mentioned in methods nor in legends.

      Response: we have now carried out densitometry analysis in all the requested immunoblots shown in Figure 5. We also changed the mock IP term to IgG IP for clarity. The use of non-immunogenic IgG in control IPs is now specified in the methods and respective figure legend.


      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: In their manuscript, Rodrigues-Junior and colleagues identify a novel ligand-independent function of the tyrosine kinase receptor (RTK) ErbB3 as a regulator of integrin β1 recycling. In particular, the authors demonstrate that ErbB3 depletion reduces β1 integrin surface expression, triggering its lysosomal degradation and increasing its secretion in extracellular vesicles (EVs). Moreover, the authors show that these EVs enhanced the invasive capacity of ErbB3 wild type breast epithelial cells. In addition, the authors provide evidence for the interaction between ErbB3, GGA3 and Rabaptin5. Loss of any of these proteins destabilizes this interaction, which abrogates integrin β1 recycling and leads to its degradation and secretion. The work is potentially interesting; however, there are some aspects that need to be analyzed in a more robust manner.

      Major comments:

      1. The manuscript is mainly focused on β1 integrin endocytic and post-endocytic fate following ErbB3 silencing, describing also a molecular mechanism underlying these observations. Although the cited manuscript by Deneka, A. and colleagues indicates a similar mechanism for transferrin receptor (TfR) recycling, the Authors only studied the receptor internalization upon ErbB3 silencing. Therefore, this observation does not add any significance to the main topic of the manuscript and its removal should be considered.

      Response: we agree with the reviewer that the fate of Integrin β1 is the main focus of this manuscript. We would however favour retaining the TfR data as it implies a wider role of ErbB3, beyond trafficking of Integrin β1. We ask for the reviewer’s understanding of our rationale.

      2. Data from Figure S1A seem not to be normally distributed. Have the Authors tested the data for normal distribution? If not, please consider it. If the data are not normally distributed, a non-parametric Mann-Whitney U-test would be more suitable.

      Response: we thank the reviewer for the comment. The differential ErbB3 mRNA expression analysis was retrieved from the widely used GEPIA2 portal (to date about 600 manuscripts cite this portal on PubMed), based on the selected datasets (“TCGA tumors vs TCGA normal + GTEx normal” or “TCGA tumors vs TCGA normal”). The method for differential analysis is one-way ANOVA, using disease state (Tumor or Normal) as variable for calculating differential expression, as it considers differential expression among several tumors.

      Tang, Z., Kang, B., Li, C., Chen, T., and Zhang, Z. (2019). GEPIA2: an enhanced web server for large-scale expression profiling and interactive analysis. Nucleic Acids Res. 47, W556–W560. https://doi.org/10.1093/nar/gkz430.

      1. The Authors studied the colocalization of ErbB3, Rab4 and Rab11, observing an increased colocalization between ErbB3 and Rab4 10 minutes following primaquine. However, the Authors previously referred to Sönnichsen, B et al. manuscript, in which TfR colocalized with Rab11 at 30min. It would be interesting to see whether ErbB3 and Rab11 colocalize at later time points in the presence or absence of primaquine. This will reinforce the conclusion that ErbB3 is involved in early Rab4-dependent recycling.

      Response: we appreciate the reviewer’s comment. However, we consider that these requested experiments will not add significant value to the novelty of this manuscript and hope that the reviewer accepts that we politely refrain from reproducing them.

      In Figure 4C the Authors observed a reduction in β1 integrin levels in ErbB3 silenced cells compared to the control already at the beginning of tracing (0 min), which might be due to accelerated turnover at the internalization step of their experimental design. To confirm this, immunofluorescence of β1 integrin in control and ErbB3 silenced cells could be performed just right after the 15min integrin internalization.

      Response: this is likely a misunderstanding as the timepoint (0 min) is defined as the point after the 15 min internalization step when the imaging-based tracing begins, which aligns perfectly with the reviewer’s request.

      In the discussion, the Authors indicate that "loss of ErbB3 redirects Integrin β1 towards lysosomes for degradation, mimicking loss of GGA3 that similarly redirects both Integrin β1 and c-Met towards lysosomal degradation, or Rabaptin5 depletion that we find similarly redirects trafficking of internalised Integrin β1 towards lysosomal degradation". However, the involvement of lysosomal degradation was only studied for ErbB3 silencing by employing chloroquine. To further support this statement, the use of chloroquine in Rabaptin5- and GGA3-depleted cells is recommended.

      Response: we appreciate the reviewer’s comment, but since these findings have been published earlier, we think that they will not add significant value to the manuscript and hope that the reviewer accepts that we politely refrain from reproducing them.

      Minor comments:

      6. The Authors should consider shortening the following sentences from the Introduction: "GGA proteins contain several functional domains that...thereby regulating sorting of cargo including Integrin β3 and TfR into recycling endosomes".

      Response: we thank the reviewer for the comment. We have now divided this sentence into two for smoother reading.

      The Authors do not show ErbB3 silencing efficiency at the protein level until Figure 3G, which should have been shown in Figure 1 or Supplementary Figure 1, as all the research is based on it. Moreover, GGA3 silencing efficiency was never tested.

      Response: we thank the reviewer for this comment. We have included a new immunoblot confirming the silencing of ErbB3 by two independent siRNAs in MCF7 cells, as the new Supplementary Figure S2A. Please note that GGA3 silencing was shown in the main Figure 6J.

      Figure 1I and Figure 1K may include the representative images for the missing siErbB3 to properly illustrate the associated quantification.

      Response: we thank the reviewer for the comment. We have now included the representative images, as suggested.

      Consider including a Western blot showing the effect of lapatinib in EGFR, ErbB2 and ErbB3 protein expression, including their phosphorylated forms.

      Response: we thank the reviewer for the comment. As requested, we now show that, at the concentration used, lapatinib efficiently blocked tyrosine phosphorylation of ErbB3 and ERK1/2, without perturbing EGFR or ErbB3 expression levels. We also considered it relevant to show that the 1 µM lapatinib used was not cytotoxic to MCF10A and MCF7 cells. We hope that these new results satisfy this specific request.

      Some supplementary figures are mislabelled, such as Supplementary Figure S5D and S5E on page 10, which should be S7D and S7E, respectively. Supplementary Figure S7C on page 15 should be S9C.

      Response: we apologize for this oversight and have performed the corrections.

      The following sentence on page 8 should be revised as a verb is missing: "which corresponds to the reported peak time when colocalization of Rab4 with traced TfR, preceding Rab11 and TfR colocalization that peaks later at 30 minutes".

      Response: we apologize for this oversight. It now reads: "which corresponds to the reported peak time of colocalization of Rab4 with traced TfR, which precedes Rab11 and TfR colocalization that peaks later at 30 minutes".

      The main text indicates that the amount of VSV-G transported to the cell surface after 30 min is not affected by ErbB3 silencing. However, in Figure 3E it seems to slightly decrease following the silencing. The Authors may consider employing another Western blot image to match the main text and the quantification in Figure 3F.

      Response: as the reviewer noted the immunoblot showed a slight decrease. It is however a very modest decrease that is also observed in the positive control (MUC1) in the same Streptavidin IP sample. We ask for permission to keep these representative images.

      In the main text, a significant difference in the nanoparticles/cell between ErbB3-depleted cells and wild type or control cells was reported. However, Figure 3I only showed the statistics of each siRNA vs the control and not the wild type condition.

      Response: we apologize for this oversight. We removed from the text the comparison with the wild-type non-transfected cells to avoid misunderstanding.

      The Authors concluded that "chloroquine treatment significantly restored traced Integrin β1 levels". However, this conclusion is not reflected in the statistical analysis reported in Figure 4H, which only showed the differences between control and ErbB3 silenced cells. Thus, the statistics reported for the chloroquine results should be added.

      Response: we appreciate the comment by the reviewer. The requested comparison is now included in the new Figure 4H.

      The Authors concluded that "loss of either GGA3 or Rabaptin5 mimics the effect of loss of ErbB3 on endocytic trafficking of Integrin β1, consistent with the hypothesis that GGA3 and Rabaptin5 are effectors of ErbB3 in promoting endosomal recycling and impeding EV release". To confirm this conclusion, the inclusion of siRabaptin5 results in Figures 6H and 6J is suggested.

      Response: we thank the reviewer for the comment. We have now included immunoblots of MCF10A cell lysate after silencing ErbB3 or Rabaptin5, as the results shown in the previous Figure 6G. We believe that these new data satisfy the specific request.

      To be consistent with the results presentation:

      • The inclusion of Modal size is recommended in Figure 6I.

      • Some graphs show the number of cells or biological replicates while others do not.

      • Figure 4E showed different time points for both siRNAs.

      Response: we appreciate the comment and we have now included as the new main Figure 6H the modal size for the EVs secreted by MCF10A cells upon Rabaptin5 silencing. We will ensure that all respective Figure legends indicate the number of replicates. The intermediate time points shown in the main Figure 4E are different; however, since the final readouts at 9 h using two independent siRNAs against ErbB3 are directly comparable, we ask permission to maintain the time points with respect to the analysis we performed.

      Figure 1E represents the squared regions of Figure 1D, but it is not indicated in the figure legend.

      Response: we apologize for this oversight. We have now indicated in Figure 1 legend that Figure 1E represents the squared regions of Figure 1D, as suggested.

      In the legend of Figure 1D-G, 30min of integrin internalization is reported, where it should be 15min according to main text and methods.

      Response: we apologize for this oversight and we thank the reviewer for this comment. We have now indicated the correct time point in Figure 1 legend.

      The addition of representative images in Figure 6A is recommended, as already present in Figure 1I.

      Response: we thank the reviewer for the comment. Representative images of Fig. 6A-D were included as the new panel Fig. 6B.

      As two different siRNAs for ErbB3 were used and not in all experiments, the employed siRNA should be indicated in each experiment. In the cases where both ErbB3 siRNAs were employed, figures should report them either as main results or supplementary.

      Response: we appreciate this meticulous comment. We have now indicated in the figure and in the respective figure legends which siRNA was used in the respective set of experiments (siErbB3 #01 or #02).

      Why do the Authors use EVs enriched in the VSF or by UC to show the same result? What are the criteria for choosing one or the other? For example, in Figures 6G and 6K.

      Response: based on the guidelines suggested by MISEV 2018 and 2023, there is no gold standard method for EV isolation. Thus, by using at least two independent methods (i.e., tangential flow filtration, followed by immuno-affinity and ultracentrifugation; UC) we validate the enrichment of EVs in our sample preparations, showing reproducible results among the different EV enrichment protocols (Figure 3).


      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The paper by Dorival Mendes Rodrigues-Junior et al., focuses on a novel ligand-independent role of ErbB3 receptor, modulating Transferrin receptor and integrin beta1 early recycling. Authors perform several in vitro studies where they show how ErbB3 depletion diverts integrin beta1 from recycling towards lysosomal degradation and extracellular vesicle secretion, impairing cell migration. They also provide mechanistic experiments showing the role of ErbB3 on Arf6-GGA3-Rabaptin5 endosomal complex assembly.

      Major comments:

      1. Fig. 1. Authors should co-stain with early endosomal markers (such as EEA1) to clearly show endogenous ErbB3 and Beta1 integrin endosomal co-localization. Including some insets with higher magnifications would also improve visual inspection of such interactions.

      Response: as requested, we have performed colocalization of ErbB3, traced Integrin β1 and the recycling endosome marker EHD1, showing triple colocalization in a subset of endosomes, as shown in the new Supplementary Figure S2H. Experimental limitations prevented us from including EEA1 in triple staining for mCherry-ErbB3 or endogenous ErbB3 protein. Furthermore, ectopically expressed ErbB3 in MCF10A cells did not show convincing co-localisation with EEA1. We believe that the new triple colocalization showing ErbB3 and Integrin β1 in EHD1-positive endosomal compartments satisfies this specific comment.

      Fig. 1H and 1I. Authors need to provide TIRF penetration depth to better evaluate the potential cytosolic contribution. Additionally, plasma membrane purification studies would help to validate their live imaging results.

      Response: the TIRF penetration depth was 83 nm, which has now been added to the methods section. Purification of plasma membrane fractions, following recycling of traced surface-labelled Integrin β1 in control or siErbB3 depleted cells, by cell surface biotinylation and immunoblotting of the recovered proteins is indeed a valuable approach to validate our findings. Nevertheless, we are confident in our confocal imaging results. Thus, including these results might not contribute significantly to the novelty of this manuscript. Hence, we ask permission to publish the paper at this stage, without the plasma membrane purification, as this requires optimization and will delay the publication of our paper, in addition to exhausting our limited financial resources.

      Fig. 1J. Authors should explain better how they calculated normalized fluorescence.

      Response: the normalized fluorescence is explained in the Fig. 1J legend and in the respective method section. Alexa488 intensity was normalized between 0 and 1, with the control as reference, where Fnorm = (F - Fmin)/(Fmax - Fmin). All data points were background corrected, followed by normalization to the pre-stimulatory level (F/F0).
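
      As a rough illustration of the normalization steps named in this response, the sketch below applies background correction, normalization to the pre-stimulatory level (F/F0), and rescaling to the 0-1 range. It assumes the conventional min-max form (F - Fmin)/(Fmax - Fmin), a single scalar background value, and a baseline taken as the mean of the first few frames; the order of operations, the function names and the intensity values are assumptions for illustration, not the authors' pipeline.

```python
def baseline_normalise(raw, background, n_baseline):
    """Subtract background, then divide by the mean of the first n_baseline frames (F/F0)."""
    corrected = [f - background for f in raw]
    f0 = sum(corrected[:n_baseline]) / n_baseline
    return [f / f0 for f in corrected]


def min_max(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw intensities for one cell over four frames
raw_trace = [110, 110, 60, 35]
f_over_f0 = baseline_normalise(raw_trace, background=10, n_baseline=2)
f_norm = min_max(f_over_f0)
```

      Expressing each trace relative to its own baseline removes differences in labelling intensity between cells, and the 0-1 rescaling makes traces from different conditions directly comparable on one axis.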

      Fig. 2B. Authors should include some plasma membrane markers (such as WGA) to better localize cell surface after beta1 integrin tracing.

      Response: we appreciate the reviewer’s comment, and have attempted the suggested experiment, but in our hands, WGA did not give a clear membrane staining but a diffuse faint signal in MCF10A cells for reasons we do not fully understand.

      Fig. 1J, 1M-1L: beta1 integrin endocytic recycling should be compared across the same time-points to better evaluate kinetic differences.

      Response: the intermediate time points shown in the main Figure 1J, M-L are based on the final readout. We understand that it could be interesting to evaluate the kinetic differences, but this would generate a substantial number of comparisons that might be difficult to visualize. We ask permission to keep the comparisons among the latest respective time points with respect to the performed analysis.

      Fig. 3. Author should consider adding additional experiments with Rab4 and Rab11 dominant negative forms to validate their results.

      Response: the experiments proposed have been performed, but the ectopic expression of dominant negative Rab4 and Rab11 had detrimental effects on the cells, with the formation of large endosomal blobs and rounding up of the MCF10A cells. Consequently, we do not feel confident about the possible conclusions from these data. We ask the reviewer to understand this technical detail and accept the fact that we are not able to address this point.

      Fig. 4M. To validate authors' claim on the role of integrin Beta1-containing EVs on invasive behaviour, they should repeat the experiment using blocking beta1 antibodies prior to EV addition.

      Response: we thank the reviewer for this comment. As requested, we performed the experiment using the Integrin β1 blocking monoclonal antibody (mAb; clone P4C10). The new data show that P4C10 treatment alone or in combination with EVs derived from MCF10A cells transfected with siCtrl or siErbB3 significantly reduced invasiveness in comparison to IgG treatments, confirming the mechanistic role of Integrin β1 in promoting MCF10A invasive behaviour. The new Figure 4M was included and further discussed in this manuscript.

      While authors claim that their results could potentially clarify different aspects of tumour dissemination, most of their experiments are done in MCF10A, a non-tumorigenic epithelial cell line. To better support their conclusion, they should reproduce key experiments in MCF7 or other tumorigenic cell line.

      Response: we thank the reviewer for the comment. As explained in response to reviewer 1, the rationale for using different cell lines or primary cells is now better explained in the manuscript. We found that depletion of ErbB3 impaired recycling of Integrin β1 in the normal non-malignant cells including MCF10A and primary breast epithelial cells, but not in malignant MCF7 cells that overexpress ErbB3 (data not shown), which is now discussed in the paper. Moreover, MCF7 cells were used to detect endogenous ErbB3 as normal expression levels of ErbB3 (primary MECs and MCF10A) were not detectable by immunofluorescence microscopy with a range of antibodies we tested. Furthermore, we also included in this new version that GGA3 and Rab GTPase-binding effector protein 1 (Rabaptin5 or RABPT5) shared colocalisation with endogenous ErbB3 in MCF7 cells as the new Supplementary Figure 9A, B. Finally, we also attempted to conduct the Arf6 IP in MCF7 cells, but as opposed to MCF10A cells, the yield of Arf6 in pull down experiments was much lower than in MCF10A cells, and interacting proteins were not detectable.

      Minor comments:

      1. Fig. 1D-1F: please explain better if beta1 integrin surface signal was quenched in this specific set of studies.

      Response: Beta1 Integrin was quenched on ice with an antibody against Alexa488 as described by Arjonen et al. (Traffic, 2012; DOI: 10.1111/j.1600-0854.2012.01327.x), and further outlined in the methods section and results section (page 6 and schematic Fig4A).

      Suppl. Fig. 3A: last WB lane should read "siErbB2" instead of "siErbB3".

      Response: we thank the reviewer and we apologize for this oversight. We corrected the siErbB2 lane in Supplementary Figure 3A, as requested.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment:

      This study provides valuable insights, addressing the growing threat of multi-drug-resistant (MDR) pathogens by focusing on the enhanced efficacy of colistin when combined with artesunate and EDTA against colistin-resistant Salmonella strains. The evidence is solid, supported by comprehensive microbiological assays, molecular analyses, and in vivo experiments demonstrating the effectiveness of this synergic combination. However, the discussion on the clinical application challenges of this triple combination is incomplete, and it would benefit from addressing the high risk associated with using three potential nephrotoxic agents in vivo.

      The development of novel pharmaceutical dosage forms and the pharmacokinetic, pharmacodynamic and safety analyses of the triple combination will be conducted in our next study to provide a theoretical basis for future clinical use. A discussion of the potential toxicity of AS, colistin, EDTA and the triple combination has been added in lines 318 to 337.

      Public Reviews:

      Reviewer #1 (Public Review):

      (1) The study focuses on a limited number of Salmonella strains, and broader testing on various MDR pathogens would strengthen the findings.

      The number of COL-resistant clinical strains actually used to evaluate the antimicrobial activities of AS, EDTA and COL, alone or in combination, was larger than that mentioned in our original article. However, because the results for the mcr-1-positive Salmonella strains were superfluous, we omitted them from the original article to avoid redundant data presentation (see Table supplements 7 and 8 in the revised supplementary materials). Additionally, further Gram-negative and Gram-positive MDR bacteria, such as Klebsiella pneumoniae, Pseudomonas aeruginosa and Staphylococcus aureus, will be included in our next study, which will cover the development of novel pharmaceutical dosage forms and pharmacokinetic, pharmacodynamic and safety analyses.

      (2) While the study elucidates several mechanisms, further molecular details could provide deeper insights into the interactions between these drugs and bacterial targets.

      In our next study, we will focus on further molecular details of the regulatory targets of the CheA- and SpvD-related pathways, as well as the precise targets through which the triple combination inhibits the MCR protein, by generating deletion or point mutations and analysing intermolecular interactions.

      (3) The time-kill experiment was conducted over 12 hours instead of the recommended 24 hours. To demonstrate a synergistic effect among the drugs, a reduction of at least 2 log10 in colony count should be shown in a 24-hour experiment. Additionally, clarifying the criteria for selecting drug concentrations is important to improve the interpretation of the results.

      The time-kill experiment has been repeated over 24 hours, and the results replace Figure 1 of the original paper. The new Figure 1 has been uploaded, and the change does not affect our interpretation of the results.

      In vitro studies have shown that the synergistic antibacterial activity is gradually enhanced with increasing doses of AS and EDTA, but higher doses may also cause more toxic side effects. Thus, in our study, 1/8 MICs of AS and EDTA were selected to ensure excellent antibacterial activity while minimizing potential toxicity. An explanation of the selection of drug concentrations has been added in lines 323 to 326.

      (4) While the combination of EDTA, artesunate, and colistin shows promising in vitro results against Salmonella strains, the clinical application of this combination warrants careful consideration due to potential toxicity issues associated with these compounds.

      The development of novel pharmaceutical dosage forms and the pharmacokinetic, pharmacodynamic and safety analyses of the triple combination will be conducted in our next study to provide a theoretical basis for future clinical use.

      Reviewer #2 (Public Review):

      (1) The study by Zhai et al describes repurposing of artesunate, to be used in combination with EDTA to resensitize Salmonella spp. to colistin. The observed effect applied both to strains with and without mobile colistin resistance determinants (MCR). It was already known that EDTA in combination with colistin has an inhibitory effect on MCR-enzymes, but at the same time, both colistin and EDTA can contribute to nephrotoxicity, something which is also true for artesunate. Thus, the triple combination of three nephrotoxic agents has significant challenges in vivo, which is not particularly discussed in this paper.

      A discussion of the potential toxicity of the triple combination has been added in lines 318 to 337.

      (2) The selection of strains is not very clear. Nothing is known about the sequence types of the strains or how representative they are for strains circulating in general. Thus, it is difficult to generalize from this limited number of isolates, although the studies done in these isolates are comprehensive.

      The strains tested in this study were all COL-resistant clinical isolates; genome sequencing and comparative analysis of these strains have not been performed. The antibacterial activities of different antimicrobial drugs against the S16 and S30 strains have been measured and are listed in Table supplement 9 of the revised supplementary materials. Considering that the number of COL-resistant clinical strains actually used was larger than that mentioned in our original article (see response No. 1 to Reviewer #1, Public Review), we think that the results obtained in this study are representative to some extent.

      (3) Nothing is known about the susceptibility of the strains to other novel antimicrobial agents. Colistin has a limited role in the treatment of gram-negative infections, and although it can be used sometimes in combination, it is not clear why it would be combined with two other nephrotoxic agents and how this could have relevance in a clinical setting.

      The antibacterial activities of different antimicrobial drugs against the S16 and S30 strains have been measured and are listed in Table supplement 9 of the revised supplementary materials. Additionally, a discussion of the potential toxicity of the triple combination has been added in lines 318 to 337.

      (4) It is not clear whether their transcriptomics analysis should at least be carried out in duplicate for reasons of being able to assess reproducibility. It is also not clear why the samples were incubated for 6 hours - no discussion is presented on the selection of a time point for this.

      As can be seen from the time-kill curves, the number of surviving bacteria started to decrease after 4 h of incubation with the drug combinations. If the incubation time is too short (for example, less than 4 h), the differentially expressed genes cannot be fully revealed, whereas too long an incubation time (such as 8 h or 12 h) may lead to a significant reduction in bacterial CFU and result in inaccurate sequencing results. Therefore, we selected an incubation time of 6 h, at which point the drugs exhibited significant antibacterial effects and enough surviving bacteria remained in the sample for transcriptome analysis. Each sample had three replicates to ensure the accuracy of the results.

      (5) Discussion is lacking on the reproducibility and selection of details for the methodology.

      The results obtained in this paper have been repeated several times, which indicates that the detailed operating steps described in the Materials and Methods section are reproducible. To avoid redundancy, we did not include too many details in the Discussion section.

      Reviewer #3 (Public Review):

      (1) Number of strains tested.

      The number of COL-resistant clinical strains actually used was larger than that mentioned in our original article (see response No. 1 to Reviewer #1, Public Review).

      (2) Response to comment: Lack of data on cytotoxicity.

      The pharmacokinetic, pharmacodynamic and safety analyses of the triple combination will be conducted in our next study to provide a theoretical basis for future clinical use.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Introduction:

      The introduction should provide more context about the pathogen Salmonella, its significance in both human and veterinary medicine, and the impact of colistin resistance in these pathogens. Salmonella is a leading cause of foodborne illnesses worldwide, resulting in substantial morbidity and mortality. It can cause a range of diseases, from gastroenteritis to more severe systemic infections like typhoid fever and invasive non-typhoidal salmonellosis. In veterinary medicine, Salmonella infections can lead to significant economic losses in livestock industries due to illness and death among animals, as well as through the contamination of animal products.

      The description has been added in the introduction section in line 47 to 53.

      (2) Results and Discussion:

      (1) While the combination of EDTA, artesunate, and colistin shows promising in vitro results against Salmonella, the clinical application of this combination warrants careful consideration due to potential toxicity issues associated with these compounds. Colistin is known for nephrotoxicity and neurotoxicity, limiting its use to severe cases where the benefits outweigh the risks. EDTA, as a chelating agent, can disrupt essential metal ions in the body, posing risks of metabolic imbalances. Although it has clinical applications, primarily in cases of heavy metal poisoning, its use as an adjuvant in antibiotics may present risks. Although generally well-tolerated for malaria, interactions of artesunate with other drugs and long-term safety in combined therapies require thorough investigation.

      A discussion of the potential toxicity of the triple combination has been added in lines 318 to 337.

      (2) Table 1: The manuscript mentions that some strains used in the study are mcr-positive and mcr-negative. It is important to indicate in Table 1, in addition to the identification of Salmonella species, which strains are mcr-positive or mcr-negative.

      The relevant information has been added in Table 1.

      (3) Figure 2: What is the authors' hypothesis regarding the growth curves labeled "a" and "e" where strains JS and S16 resume growth 12 hours after treatment with AS? In the legend of Figure 2, describe what was used as the "positive control group."

      The growth curves labeled “a” and “e” are in Figure 1. After incubation with AC for 8 h, the surviving CFUs of the JS and S16 strains showed a slight reduction, but living cells remained. Because the bactericidal activity of AC is not strong enough to be sustained, these remaining living cells resume growth by 12 h of treatment with AC. The “positive control group” in the legend of Figure 2 has been defined in line 724.

      (4) What is the authors' hypothesis for the differences observed in the transcriptome and metabolome?

      Changes in gene transcription levels may cause corresponding changes in protein levels, but not all of these proteins are involved in bacterial metabolic processes. For example, the MCR protein is encoded by the COL resistance gene mcr and mediates the modification of lipid A, but is not involved in cellular metabolic processes. Therefore, a transcriptomic change in the mcr gene may affect MCR protein production but not bacterial metabolic processes, which explains why differences are observed between the transcriptome and the metabolome.

      (5) In some parts of the text, the authors state that artesunate and EDTA potentiate the action of colistin, which is a bacteriostatic drug. However, in other parts, the authors describe the effect of the AEC combination as bacteriostatic (Abstract: line 32; Results: line 179). How do the authors explain this inconsistency?

      Artesunate and EDTA can be regarded as “adjuvants” for the bacteriostatic drug colistin. Adjuvants themselves have no or only weak antibacterial effects. In the context of antimicrobial drugs, “adjuvants” are compounds generally used in combination with antibacterial drugs to re-sensitize bacteria that have developed drug resistance. Thus, in this paper the AEC combination can be regarded as bacteriostatic.

      (6) According to Brennan & Kirby (2019; doi: 10.1016/j.cll.2019.04.002), to evaluate the synergism between different drug combinations, bacterial growth curves need to be assessed over 24 hours. If the colony count is ≥ 2 log10 lower than that of the most active antimicrobial alone, the combination is considered synergistic. Based on the growth curve results shown in Figure 1, the experiment was conducted for 12 hours, and in some cases, only a small reduction in growth was observed, even at the maximum concentration of colistin. Moreover, in some cases, the curve resumes rising between 8 and 12 hours. What is the authors' hypothesis in this case? It is important to conduct the assay over 24 hours to confirm the synergism between these drugs.

      The time-kill experiment has been repeated over 24 hours, and the results replace Figure 1 of the original paper. Additionally, the phenomenon that “the curve resumes rising between 8 and 12 hours” has been explained in our response to “Reviewer #1 (Recommendations For The Authors), Results and Discussion, (3) Figure 2”.
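
      The synergy criterion quoted by the reviewer (a colony count at 24 hours that is at least 2 log10 lower than that of the most active single agent) can be expressed as a small check. The following is a minimal sketch only: the function names and the CFU/mL values in the usage note are hypothetical illustrations and are not data from the study.

```python
import math


def log10_reduction(cfu_combination, cfu_best_single):
    """Return how many log10 units the combination count lies below
    the count of the most active single agent (positive = fewer cells)."""
    return math.log10(cfu_best_single) - math.log10(cfu_combination)


def is_synergistic(cfu_combination_24h, cfu_singles_24h, threshold=2.0):
    """Apply the >= 2 log10 rule at the 24 h time point.

    cfu_singles_24h maps each single agent to its CFU/mL at 24 h;
    the most active agent is the one with the lowest count.
    """
    best_single = min(cfu_singles_24h.values())
    return log10_reduction(cfu_combination_24h, best_single) >= threshold
```

      For example, with hypothetical 24 h counts of 1e6, 1e8 and 1e8 CFU/mL for three single agents, a combination count of 1e3 CFU/mL would meet the criterion (a 3 log10 reduction), whereas 1e5 CFU/mL would not (a 1 log10 reduction).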

      (7) To prove that CheA and SpvD play a critical role in the effect of the AEC combination, deletion of these genes should be performed, and the mutant strains should be tested.

      The deletion of cheA and spvD will be carried out in our next study.

      (8) To demonstrate that the flagellum is no longer assembled, a transmission electron microscopy image using antibodies against flagellin should be performed, along with motility tests.

      The motility assays have been performed and are displayed as Figure supplement 5 in the revised supplementary materials.

      (9) Figure 7: In the X-axis legend, specify what "model" refers to.

      The “model” refers to the PBS control group, in which mice were treated with PBS after intraperitoneal injection of 100 µL of bacterial solution (1.31 × 10<sup>5</sup> CFU).

      (10) Figure 8 Legend: In the legend of Figure 8 (line 717), are the authors referring to E. coli or Salmonella?

      It referred to Salmonella, which has already been illustrated in the headline of Figure 8 in the revised manuscript.

      (3) Materials and Methods:

      (1) Bacterial Strains and Agents: It would be beneficial to include in the table the species of the strains used in the study, as well as the concentrations of colistin, artesunate, and EDTA utilized (lines 321 - 332).

      We tried to add the above information to Table 1, but doing so made the table too large to fit within the page margins, so we chose to present this information in the Materials and Methods section instead.

      (2) Antibacterial Activity In Vitro: Ensure clarity and well-defined ranges for the concentrations of colistin, EDTA, and artesunate used separately and in combinations (lines 335 - 344).

      The drug concentrations are listed in lines 369 to 371.

      (3) Time-Kill Assays: Clarify the criteria for selecting concentrations, whether based on MICs or peak and trough concentrations relevant to human and animal treatments with colistin (lines 345 - 351).

      In vitro studies have shown that the synergistic antibacterial activity is gradually enhanced with increasing doses of AS and EDTA, but higher doses may also cause more toxic side effects. Thus, in our study, 1/8 MICs of AS and EDTA were selected to ensure excellent antibacterial activity while minimizing potential toxicity. An explanation of the selection of drug concentrations has been added in lines 323 to 326.

      (4) General Corrections: Throughout the manuscript, correct typographical errors and consistently include the concentration values in mg/L alongside the MIC fractions. Specify the strains used for all experiments to ensure clarity. In the manuscript, the term "medication regimens" is used to describe the experimental setups involving different combinations of drugs tested in vitro. To improve accuracy and clarity, it is recommended to use the term "drug combination" instead. This term is more appropriate for in vitro experiments and will help avoid confusion with clinical treatment protocols.

      The typographical errors have been checked and corrected throughout the manuscript, and the “medication regimens” have been replaced by “drug combinations”.

      Reviewer #2 (Recommendations For The Authors):

      Please see above for recommendations on what can be done to improve the manuscript.

      While other omics analyses have been conducted herein, the authors do not comment on the genomic analysis of their own strains. It would have been a natural step to sequence all the strains used in the experiments.

      Due to limited program funding, genome sequencing and comparative analysis of these strains have not been performed. The antibacterial activities of different antimicrobial drugs against the S16 and S30 strains have been measured and are listed in Table supplement 9 of the revised supplementary materials.

      Some minor comments:

      (1) There are some spelling errors - e.g. "bacteria strains" instead of "bacterial strains".

      The grammar and spelling errors have been corrected throughout the manuscript.

      (2) I would avoid words like "unfortunately".

      The word “unfortunately” has been changed.

      (3) Some MIC-values in Table 1 seem incorrect - e.g. 24 mg/L. This is not a 2-log value - the value should be 32 mg/L if the dilution series has been carried out correctly.

      We apologize for the mistake. The data have been corrected, and we have also checked the other data.
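
      The reviewer's point about two-fold dilution series can be checked mechanically. The sketch below assumes a standard doubling series anchored at 1 mg/L (so valid values are powers of two such as 0.5, 1, 2, 4, ...); that anchoring and the function names are assumptions made for illustration, and the sketch does not indicate which value is correct for any particular strain.

```python
import math


def is_twofold_dilution_value(mic_mg_per_l, tol=1e-9):
    """True if the MIC is a power of two, i.e. log2(MIC) is an integer."""
    if mic_mg_per_l <= 0:
        return False
    exponent = math.log2(mic_mg_per_l)
    return abs(exponent - round(exponent)) < tol


def neighbouring_dilution_values(mic_mg_per_l):
    """Return the two adjacent series values that bracket an off-series MIC."""
    lower = 2 ** math.floor(math.log2(mic_mg_per_l))
    return lower, lower * 2
```

      Under this assumption, 24 mg/L is flagged as off-series and is bracketed by 16 and 32 mg/L, while 32 mg/L is accepted.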

      Reviewer #3 (Recommendations For The Authors):

      Below are some suggestions.

      (1) Sentences L47 & L48 "Infections with antibiotic-resistant pathogens, especially carbapenemase-producing Enterobacteriaceae, represent an impending catastrophe of a return to the pre-antibiotic era" - this is slightly exaggerated! I also wonder if we need to use Enterobacterales instead of Enterobacteriaceae.

      The sentences in L47 & L48 have been changed. We searched for the term “carbapenemase-producing Enterobacteriaceae” and found that it is used frequently in numerous reports.

      (2) L48. The drying up of the antibiotic discovery pipeline is NOT necessarily the reason to use colistin as a drug of last resort!

      The sentence has been revised.

      (3) The manuscript requires extensive English editing but has merit based on the strong compilation of data.

      We have optimized and revised the writing of the whole article.

      (4) I suggest the authors have some data on the cytotoxicity of AS alone, colistin alone, and both of them against eucaryotic cells (Caco-) and if possible determine IS (index selectivity). This additional experiment will strengthen the quality of the manuscript. The authors must also explain how to put such tri-therapy into practice.

      The development of novel pharmaceutical dosage forms and the pharmacokinetic, pharmacodynamic and safety analyses of the triple combination will be conducted in our next study to provide a theoretical basis for future clinical use. A discussion of the potential toxicity of AS, colistin, EDTA and the triple combination has been added in lines 318 to 337.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      We thank the reviewers for their thorough assessment of our manuscript and their constructive suggestions for further improvement. We are pleased that the reviewers recognise that “this work represents an important and substantive contribution” to the field of genome organization and gene transcription.

      Reviewer 1

      1) Does the CTCF degron substantially remove CTCF from the Mnx1/Shh TAD border? In prior AID-CTCF degron studies a considerable fraction of cohesin dependent TAD borders are retained upon CTCF removal. Moreover, CTCF sites at these retained borders still have clear ChIP-seq peaks - even though the protein is >95% depleted and scarcely detectable by western. Thus, while I suspect that the authors are correct that the shorter distance of the 35 kb border deletion contributes substantially to the increased crosstalk between the Mnx1 and Shh-enhancers, I suspect part of the reason for a lack of a similar effect in the CTCF degron is due to the known challenges in removing CTCF from this border. To argue that the border but not the CTCF is important, I think it would be helpful to show the CTCF signal is sufficiently lost in the degron by ChIP-seq and/or show that this TAD border has been lost by Hi-C. Alternatively, the authors could tone down this claim to something more conservative, as I did not find it to be presented as a key conclusion of the paper as a whole.

      We used the CTCF-AID mESC line published by Nora et al (2017). In our previous manuscript (Kane et al., 2022) we presented the published Hi-C and CTCF-ChIP-seq data from these cells at the Shh TAD (Fig 2c of Kane et al) – reproduced below for the reviewer’s benefit. This shows the loss of insulation at the Shh/Mnx1 TAD boundary when CTCF is degraded, and the loss of CTCF ChIP-seq signal at this boundary.

      2) In my opinion, the authors' description of existing data for the importance of TAD borders in enhancer-promoter regulation is not sufficiently balanced and complete, and the overall impression given by the text is that CTCF-marked borders have little serious evidence for a role in developmental enhancer specificity and are maybe a cancer thing. This is doubly unfortunate, as it undermines the impact of the authors' work in expanding our view of what TAD borders are in a regulatory sense, as well as presenting an unbalanced view of work in the field. This is of course easily corrected. In particular, I recommend the following revisions: Where the text states "depletion of CTCF has only a small effect on transcription in cell culture (Nora et al., 2017; Hsieh et al., 2022)", it should be clarified that there is only a small *acute* effect on transcription (in the first 6-12 hours), which may tell us more about the timescale at which promoters sample, integrate and respond to changes in their enhancer environment than about the roles of CTCF particularly. Notably, this degradation is *lethal*; it results in massive changes in transcription after 4 days, and I suspect the authors agree that this lethal effect arises from CTCF's role in transcription regulation (if you remove some key cytoskeletal protein or metabolic enzyme the primary cause of cell death is not transcriptional, but almost all the evidence for CTCF's vital role in the cell is linked in one way or another to transcription).

      As suggested by the reviewer we have inserted the word “acute” into that sentence.

      The discussion of TAD border deletions is more one-sided than ideal. I appreciate the discussion is usually even more unbalanced when presenting the opposite view in the literature - many works only cite the examples where border deletion does lead to ectopic expression and phenotypes. The current text presented a subset of these border deletion data in such a way as to give me the impression the authors are deeply skeptical that CTCF plays a role as an insulator of E-P interactions in a developmental context (rather than just as a weird cancer thing). For example: Pennacchio's lab has analyzed a series of TAD border deletions with more examples of both lethal effects and effects with no apparent phenotype 3

      I appreciate that Bickmore and colleagues found quite phenotypically normal mice upon deletion of CTCF sites from Shh, but it might be balanced to still reference the work from Ushiki et al that indicates that in humans the CTCF site does play a role in Shh - ZRS communication. As the authors are doubtless aware, Andrey and colleagues show a CTCF-dependent enhancement of a sensitized ZRS enhancer. Zuin et al., in an elegant experiment in which an enhancer is mobilized to different distances away from its promoter using transposon induction, reported a complete lack of detection of enhancers mobilizing outside the TAD to activate gene expression. A balanced presentation of the data on CTCF's role might include some discussion of the above. In light of these earlier works, the findings the authors report about border bypass are all the more surprising.

      We thank the reviewer for highlighting some of these studies, especially for drawing our attention to the interesting recent preprint from Chakraborty et al. (doi.org/10.1101/2024.08.03.606480), which we now discuss in the revised manuscript. As suggested by the reviewer, we now also cite Ushiki et al., 2021 in the Introduction in the context of CTCF-associated phenotypes, rather than just in the Discussion as in the original submission. We already cited the work of Andrey and colleagues (Paliou et al). However, we chose not to cite the Pennacchio study, because the deletions used were large – all >10kb and some as large as 80kb. Therefore, we consider it highly likely that other regulatory sequences beyond CTCF sites themselves may have been deleted, complicating conclusions drawn about the function of the TAD boundaries per se. We have also chosen to focus our discussion on studies of enhancers in their native genomic locus, and predominantly in vivo analyses, rather than ectopic enhancer integrations (such as Zuin et al) in cell lines.

      4) By contrast, direct evidence for cross TAD interactions at endogenous loci has not to my knowledge been shown as clearly as described in the current manuscript. Recent work from Rocha and colleagues showed evidence that some enhancers upstream of Sox2 can pass ectopically induced boundaries. While recent work has described examples of 'TAD border bypass' at endogenous loci (e.g. for Pitx1 8, Hoxa regulation 9), these reports really just expand the view of regulatory boundaries rather than provide evidence against it. They invoke a 3D stacking of boundaries that allows boundary proximal enhancers and promoters to stack with (and so bypass) an intervening TAD boundary. Notably, in this view enhancers and promoters that lie away from the border of their respective TADs are still separate, and indeed intervening genes between distal enhancers for Pitx1 and Hoxa appear to follow these rules.2 Mnx1 and the Shh enhancers by contrast do not appear to be an example of border stacking. Given that Sox2 at least is also a TAD border, and the position of the bypassing enhancers is not precisely known in the work from Rocha, it is possible that that case is also an example of boundary stacking, which appears less likely in the case of Mnx1 (which does not appear to be at CTCF marked border, at least in mESCs).

      We thank the reviewer for highlighting some of these studies. We had already discussed the study from Rocha and colleagues (Chakraborty et al., 2023) and we had discussed the boundary stacking paper from Hung et al, (2024). However, based on the reviewer’s comment we now include a specific discussion about TAD boundary stacking and boundary proximal enhancer bypass, noting that Mnx1 is not close to a TAD boundary. This will become even more relevant in our planned revised manuscript where we will investigate possible Mnx1 activation by Shh enhancers (SBE2/3) located even further away from the Shh/Mnx1 TAD boundary.

      Statistics: Some of the bar graphs quantifying the %-expressing cells do not obviously have associated n-values, nor do some of the violin plots of the distances. I think all these bar graphs could also benefit from adding error bars (e.g. by bootstrapping from the sampled population). This will help the reader more easily appreciate how sampling error and sample size affect the variation seen in the plots.

      We will add the n-values to all graphs. Regarding error bars, we think that showing the data from the two biological replicates separately is a better way to convey data reproducibility to the reader than using bootstrapping to estimate error bars.
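
      For reference, the bootstrapping approach mentioned by the reviewer could look roughly like the following. This is a minimal sketch under stated assumptions: each scored allele is represented as 1 (expressing) or 0 (not expressing), the function name and default parameters are hypothetical, and a simple percentile interval is used; it is not the analysis performed in the manuscript.

```python
import random


def bootstrap_fraction_ci(flags, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the fraction of expressing alleles.

    flags: list of 0/1 values, one per scored allele.
    Returns (lower, upper) bounds of the (1 - alpha) interval.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(flags)
    # Resample the alleles with replacement and record each resampled fraction.
    fractions = sorted(
        sum(rng.choices(flags, k=n)) / n for _ in range(n_boot)
    )
    lower = fractions[int((alpha / 2) * n_boot)]
    upper = fractions[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

      Calling `bootstrap_fraction_ci([1] * 60 + [0] * 40)` on 100 hypothetical alleles, 60 of which are expressing, returns an interval around 0.6 whose width reflects the sample size; the interval narrows as more alleles are scored.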

      Figures 2 and 3: I would have preferred the authors zoom in more on the FISH spots to help the reader appreciate the proximity. I do appreciate also seeing a field of more than 1 cell (to give some sense of the variability), but these images mostly have only 1 spot pair per panel, which is exceedingly small as they contain parts of more than 1 nucleus. There is also unnecessary white space in this figure that could have been used to show zoom in panels.

      The same applies to the image panels in Figure 3 as for Figure 2 - there is considerable unused whitespace, the image panels capture mostly a single nucleus and its pattern of DAPI dense heterochromatin (which isn't particularly relevant to the narrative) while the fluorescent spots that are the focus of the narrative are quite small. It is nice to have an example of the cell to see that this isn't just random background (that there is just one spot per cell) - in that sense though it's equally helpful to show it's not just 1 cell in the field that has the signal-to-noise ratio (SNR) shown. For this figure and the panels in Figure 2, I'd recommend showing a zoom out showing ~3 nuclei with transcription foci (at least in the regions where the % transcribing is >60% it should be fine to have adjacent nuclei transcribing, for those where it is 10%, 1 of 3 nuclei transcribing in the image selected would also help get the sense of the data). These zoom out images would also give a sense of the SNR in the image, and then a zoom in where the FISH spots are sizable would make it easier to see the neighboring transcripts. Extended Data Fig 3 does a better job showing the context of the limb and then zooming in to an image where the RNA spots are appreciable. It looks like the resolution of the zoom in is lower, such that zooming in further on the spots in this data may not enhance the image.

      In response to the reviewer’s comment, we will present zoomed-out and zoomed-in images as suggested.

      1. Figure 3 - DNA FISH: It would be helpful to include a diagram indicating where the DNA FISH probes are located on the genome and their size in kb as an inset in the figure.

      We will indicate the locations of DNA-FISH probes in a revised version of Figure 1a. Probe sizes are listed in the supplementary tables. We have now made this clearer in the legend to Figure 3.

      Reviewer 2

      The authors claim that co-expression of Mnx1 and Shh in the foregut and lung buds is also driven by boundary crossing contacts with the MACS1 enhancer. However, the effect of the boundary deletion on the co-transcription of Shh and Mnx1 is only shown for the ZPA. In this sense I find potentially misleading the statement of the authors in the following paragraph: "In the ZPA, the foregut, and the lung buds, the majority of Mnx1 RNA-FISH signals are at alleles that show simultaneous signal for Shh nascent transcript from the same allele (closely apposed signals) (Fig. 2a, b and Extended Data Fig. 2a). In del 35 embryos, an even higher proportion of Mnx1 transcribing alleles also transcribe Shh (Fig. 2b,Extended Data Fig. 2a, Extended Data Table 3.). These data suggest that both the ZRS and MACS1 enhancers are able to simultaneously activate transcription at two gene loci on the same chromosome". In my opinion this phrasing implicitly extends the increase in Mnx1-Shh co-expressing nuclei observed in the ZPA of 35 del embryos to the expression of these two genes in the foregut and lung buds (driven by the MACS1 enhancer) while this effect has not been specifically addressed. In a previous work, the authors showed that boundary deletion does not impact Mnx1 expression in the foregut and lungs. It would be important to clarify whether more precise analyses in this study have led to different conclusions or, alternatively, appropriately discuss the results. Ideally the authors should analyse the effect of the 35 del allele in the foregut / lung buds or rephrase the statement about the sharing of the MACS1 enhancer.

      The reviewer is correct that in our previous publication (Williamson et al., 2019) we did not detect Mnx1 expression in the lungs of 35kb del embryos. However, we only examined this by in situ hybridisation so we probably lacked the sensitivity to detect weak Mnx1 expression. In response to the reviewer’s comments, we now propose to do RNA FISH for transcription at Mnx1 in other tissues of 35kb del embryos.

      The authors use the quantifications of nuclei co-expressing Mnx1 and Shh from the same allele as an indicator of simultaneous transcription of the two genes by the sharing of the enhancer, as opposed to a model of alternate transcriptional bursts. However, I am concerned that the time scale at which looping and transcriptional bursts occur is at odds with the detection of nascent transcription in FISH experiments, so it cannot be excluded that shifting of the enhancer from one promoter to the other could still result in detection of nascent RNA of the two genes at the same allele. In any case, following the authors' argument, the fraction of nuclei expressing Mnx1 alone does not appear to be significantly different from those expressing Mnx1 and Shh, and the increase in Mnx1-expressing nuclei upon boundary deletion seems proportionally similar to the increase in Mnx1+/Shh+ nuclei. In my opinion, this makes it difficult to interpret the detection of Mnx1 alone or both Mnx1-Shh expression as a reflection of alternate looping and transcriptional burst from enhancer sharing. Determining whether the two promoters compete for the interaction with the enhancer or share it would require estimating whether, in the 35 del homozygote embryos, Shh expression is reduced compared to wts as a result of the increased interaction of the ZRS with the enhancer. The authors claim that there are no differences in the % of cells expressing Shh upon boundary deletion, but in my opinion this measurement is not sufficient to estimate a change in transcriptional rate (frequency of bursting). Nascent mRNA level detection in single cells would allow a better assessment of competition or concomitant activation of the two genes. As I am not an expert in the RNA FISH technique, it is not clear to me whether fluorescence intensity could be used as an estimator of transcription. From the images of the authors, in some cases it seems that expression of Shh alone is higher than when both Shh and Mnx1 are transcribed from the same allele (Fig. 2a, left panel; Fig. 2c, left vs right panel). However, in other cases an opposite trend can be observed (Mnx1 intensity in Fig. 2a, central vs right panel). Thus, a single nuclei PCR or RNAseq approach may be more suited for this assessment.

      We respectfully disagree with the reviewer. We argue that nascent RNA FISH, using probe pools that for the most part detect the introns of Shh and Mnx1, is a better measure of transcription bursting/frequency (on or off) than probe signal intensity and therefore is a measurement of transcription rate. Single nuclei PCR or RNAseq would not assay nascent transcription and would not distinguish between alleles.

      Minor comments: 3. In the mESC model overexpressing the tZRS-VP64 construct, Shh and Mnx1 seem to be transcribed at similar rates compared to what is observed in vivo (where only a minor fraction of Shh+ cells express Mnx1). Thus, despite the fact that TAD boundary deletion increases Mnx1, but not Shh, expression, the ZRS activity seems to overcome the border more easily in this context than in vivo. Could the authors comment on this interesting observation? Might it relate to the insulation score of TAD boundaries in the mESCs compared to in vivo? Alternatively, could it reflect that combinatorial TF binding to an enhancer contributes to its directionality?

      These are interesting speculations by the reviewer, but we would argue that it is hard to compare in vivo and in vitro experiments. For example, in the limb bud, the ZPA region where the ZRS is active cannot be distinguished morphologically from the surrounding mesenchymal cells, therefore it is likely that some nuclei that are just outside the ZPA may be included in the analysis.

      Overall, figure organization and clarity could be improved. For example, the enlarged RNA FISH images in Fig. 1 could be made larger (the same size as the broad-view image), and the RNA FISH signal could be highlighted with arrowheads. Panel distribution could also be optimized.

      We will try to clarify these figures – see also response to reviewer 1 (point 6).

      Reviewer 3

      There are a couple of claims and conclusions that are not fully supported by the data, and which I think could be resolved by rephrasing them and/or qualify them as preliminary or speculative. The authors often indicate co-expression as suggestive of co-regulation by a single enhancer, when in most cases this is not formally shown; such suggestion remains one among other possibilities. For instance, co-expression of Shh and Mnx1 in the developing bud is attributed to the ZRS enhancer, co-expression of Shh and Mnx1 in the foregut is attributed to MACS1 enhancer. Do the authors have any evidence that when deleting these enhancers, Mnx1 expression is abolished (or reduced) in the respective tissues?

      If not, I think the following sentences need revision, because causality is implied by the way it is written but it is not formally shown (and the data could suggest other options too):

      "However, we have previously identified that ZRS can also drive low level expression of Mnx1, located 150kb away in the adjacent TAD, in the developing limb bud (Williamson et al., 2019)." No genetic evidence is provided in Williamson et al. 2019

      i) It is true that in Williamson et al., we did not provide genetic evidence that ZRS is the enhancer responsible for Mnx1 expression in the limb bud ZPA. However, there is no other known enhancer in biology with activity specific to the ZPA, and when the ZRS is deleted the ZPA no longer functions as a signaling centre for the limb bud. As a compromise, we have rephrased the indicated text to “However, we have previously identified that ZRS also appears to be able to drive low level expression of Mnx1, located 150kb away in the adjacent TAD, in the ZPA of the developing limb bud”.

      "However, we also detect nascent transcription from Mnx1 in the Shh expressing portions of the developing ventral foregut and the lung bud of E10.5 embryos, an activity that is driven by the Shh MACS1 enhancer, located a further 100kb into the Shh TAD from ZRS (Sagai et al., 2017) and therefore able to induce transcription at Mnx1 across a TAD boundary from a distance of >260 kb (Fig. 1a)."

      ii) We have modified the text to now read “However, we also detect nascent transcription from Mnx1 in the Shh expressing portions of the developing ventral foregut and the lung bud of E10.5 embryos, an activity that is likely to be driven by the Shh MACS1 enhancer, located a further 100kb into the Shh TAD from ZRS”.

      "These data suggest that both the ZRS and MACS1 enhancers are able to simultaneously activate transcription at two gene loci on the same chromosome."

      iii) We have modified this statement to now read that these enhancers “may be able to simultaneously activate transcription at two gene loci on the same chromosome”.

      "This is the first report of two endogenous mammalian genes transcribed simultaneously under the control of the same enhancer" (can the authors really claim this without genetic evidence, i.e., deleting the enhancer? Isn't that the golden standard in the field?).

      iv) We stand by this claim, because we have been able to provide evidence in support of our observations in tissues by using synthetic enhancer activation in cell culture, where we can be absolutely sure which enhancer is responsible for activation.

      "Therefore, the Shh ZRS enhancer can simultaneously activate transcription at two genes and across an intact, but porous, TAD boundary. See response (iv) above

      "This is a consequence of ZRS-driven activation, not Mnx1 transcription per se."

      v) We stand by this claim.

      The mathematical model, even if simple, is very poorly described. In the results section, it is not easy to understand what the model takes into account, etc; it would be important for non-experts to understand as well what is at stake. In the methods section, it does not seem to be properly described; it is only stated "The association between the transcription of Shh and Mnx1 regulated by the same enhancer was done by linear modelling with binomial link function." Would this be enough to recreate / reproduce the same model? I am not a mathematician, but I suspect more details would be needed.

      We apologize if our approach was not clear. We used logistic regression, not a mathematical model. We have expanded the relevant Methods section, which now reads:

      “To test whether or not there is a tendency of coexpression between two loci on the same chromatid, only nuclei with exactly one signal of each locus are informative. For these nuclei, we scored how many had expression in cis and how many in trans. To assess whether there was chromatid-specific coexpression, we tested statistically whether there was an excess of nuclei showing expression in cis. We did this using logistic regression, a form of generalized linear regression model. More specifically, we tested, for each model, whether the model intercept was significantly different from zero by using the z-scaled test statistic returned by these models and converting it to a p-value.”
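
The test described in this Methods text can be illustrated with a minimal sketch. It assumes an intercept-only logistic regression, for which the fitted intercept is the log-odds of cis versus trans nuclei and the Wald z statistic has a closed form. The function name and the nuclei counts below are hypothetical and are not taken from the manuscript; the authors' actual software and model specification may differ.

```python
import math


def intercept_only_logit_test(n_cis, n_trans):
    """Wald z-test that the intercept of an intercept-only logistic
    regression (cis vs. trans nuclei) differs from zero.

    An intercept of zero corresponds to equal numbers of cis and trans
    nuclei, i.e. no chromatid-specific coexpression.
    """
    if n_cis <= 0 or n_trans <= 0:
        raise ValueError("both counts must be positive")
    # Maximum-likelihood intercept: log-odds of observing cis expression.
    intercept = math.log(n_cis / n_trans)
    # Standard error of the log-odds from the binomial likelihood.
    std_err = math.sqrt(1.0 / n_cis + 1.0 / n_trans)
    z = intercept / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return intercept, std_err, z, p_value


# Hypothetical counts, for illustration only.
intercept, std_err, z, p_value = intercept_only_logit_test(n_cis=70, n_trans=30)
print(f"intercept={intercept:.3f} z={z:.2f} p={p_value:.2e}")
```

A positive intercept with a small p-value would indicate an excess of nuclei expressing both genes in cis. A general-purpose GLM routine with a binomial family should give the same intercept and z statistic for this intercept-only case.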

      The authors claim that an enhancer working exclusively on one gene at a time would lead to a preference in individual expression - is this really the case? Could the authors show the expected scenarios for [one enhancer - two common targets] versus [two enhancers - two independent targets] and how this compares to the data?

      Our statistical analysis is restricted to the scenario of one enhancer acting on two genes (either simultaneously or alternately). We do not test a scenario of two enhancers and two target genes because it is not relevant to our experimental analyses using synthetic activation of a single enhancer (with tZRS-Vp64, Extended Data Table 4).

      1. The results obtained with the VP64 activation (activation of ZRS leads to increased expression of Mnx1) are used by the authors as another piece of evidence that ZRS controls Mnx1 - but could VP64 activation be inducing chromatin opening / enhanced accessibility and therefore increased expression across the TAD boundary? I am not sure the authors need to test this, but they should at least acknowledge other possibilities (in relation to point 1).

      We have previously shown (Benabdallah et al., 2019) that tal-VP64 activators alter chromatin structure (H3K27ac) in the Shh TAD only locally at the site of binding and at the Shh gene, and that this does not spread more generally. We have clarified this in the revised text. We also note that the effect of both the 35kb deletion and cohesin degradation on Mnx1 activation from the tZRS-Vp64 activator would not be consistent with a model of general chromatin opening/accessibility. The same argument applies to the DNA-FISH experiment (Fig 3) showing Mnx1 activation in the limb bud (ZPA) occurs specifically in the context of a compact chromatin conformation.

      "In the nuclei of pre-motor neurons, where Mnx1 expression is driven from its own proximal enhancers (Fig. 1a), Mnx-ZRS and Mnx1-Shh distances are not different between Mnx1 expressing and non-expressing alleles." The authors use this as an argument to claim that Mnx1 expression per se does not explain the distance differences observed in the limb bud - but can such comparisons of expression and distances between loci be made between different cell types? Is there enough evidence for this to be a valid assumption? If not, then the assumption should be explicitly presented.

      We believe that the reviewer is confused here. We are not suggesting that Mnx1 expression per se doesn’t explain the distance differences in the limb bud, rather that these distance differences in the limb bud associated with Mnx1 transcription do not occur in the pre-motor neurons where activation is not dependent on distal enhancers, particularly in the Shh TAD.

      1. In Fig. 3b the authors show that shorter distances between the loci (Mnx1, Shh, ZRS) were associated with simultaneous transcription at Mnx1 and Shh, implying throughout that this would be associated with common activation by ZRS; but the shorter distances between the three loci are also associated with Mnx1 transcription alone. How is this explained?

      This is explained by the configuration of the Shh TAD and the general spatial proximity of Shh-ZRS in both expressing and non-expressing tissues due to the CTCF-mediated loop, which is apparent in Hi-C heat maps.

      1. The text could be revised to look out for "expression levels" versus "expression frequency" - in several instances the authors mention expression "levels" when they are referring to % of cells expressing a given gene, which would thus be more appropriately called "expression frequency"?

      The reviewer makes an important point. In the revised manuscript we have removed all mention of “expression levels” and have replaced these with “frequency”.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In this study, Bu et al examined the dynamics of TRPV4 channel in cell overcrowding in carcinoma conditions. They investigated how cell crowding (or high cell confluence) triggers a mechano-transduction pathway involving TRPV4 channels in high-grade ductal carcinoma in situ (DCIS) cells that leads to large cell volume reduction (or cell volume plasticity) and proinvasive phenotype.

      In vitro, this pathway is highly selective for highly malignant invasive cell lines derived from a normal breast epithelial cell line (MCF10CA) compared to the parent cell line, but not present in another triple-negative invasive breast epithelial cell line (MDA-MB-231). The authors convincingly showed that enhanced TRPV4 plasma membrane localization correlates with high-grade DCIS cells in patient tissue samples.

      Specifically in invasive MCF10DCIS.com cells, they showed that overcrowding or overconfluence leads to a decrease in cell volume and intracellular calcium levels. This condition also triggers the trafficking of TRPV4 channels from intracellular stores (nucleus and potentially endosomes), to the plasma membrane (PM). When these over-confluent cells are incubated with a TRPV4 activator, there is an acute and substantial influx of calcium, attesting to the fact that there are a high number of TRPV4 channels present on the PM. Long-term incubation of these over-confluent cells with the TRPV4 activator results in the internalization of the PM-localized TRPV4 channels.

      In contrast, cells plated at lower confluence primarily have TRPV4 channels localized in the nucleus and cytosol. Long-term incubation of these cells at lower confluence with a TRPV4 inhibitor leads to the relocation of TRPV4 channels to the plasma membrane from intracellular stores and a subsequent reduction in cell volume. Similarly, incubation of these cells at low confluence with PEG 3000 (a hyperosmotic agent) promotes the trafficking of TRPV4 channels from intracellular stores to the plasma membrane.

      Strengths:

      The study is elegantly designed and the findings are novel. Their findings on this mechanotransduction pathway involving TRPV4 channels, calcium homeostasis, cell volume plasticity, motility, and invasiveness will have a great impact in the cancer field and are potentially applicable to other fields as well. Experiments are well-planned and executed, and the data is convincing. The authors investigated TRPV4 dynamics using multiple different strategies (overcrowding, hyperosmotic stress, and pharmacological means) and showed a good correlation between different phenomena.

      Weaknesses:

      A major emphasis in the study is on pharmacological means to relate TRPV4 channel function to the phenotype. I believe the use of genetic means would greatly enhance the impact and provide compelling proof for the involvement of TRPV4 channels in the associated phenotype.

      In this regard, I wonder if siRNA-mediated knockdown of TRPV4 in over-confluent cells (or knockout) would lead to an increase in cell volume and normalize the intracellular calcium levels back to normal, thus ultimately leading to a decrease in cell invasiveness.

      We greatly appreciate the positive feedback regarding the design of our study and the novelty of our findings. We also acknowledge the valuable suggestion to complement our pharmacological approaches with genetic manipulation of TRPV4.

      In response to the comment regarding siRNA-mediated knockdown or knockout of TRPV4, we fully agree that this would further substantiate our findings. In the revised manuscript, we implemented shRNA targeting TRPV4 to investigate its functional effects on intracellular calcium level changes, cell volume plasticity, and invasiveness phenotypes, assessed through single-cell motility assays under cell crowding or hyperosmotic stress. These results have been incorporated into the revised manuscript, and detailed descriptions of these findings are included below.

      Using the shRNA approach that resulted in ~50% reduction of TRPV4 expression (Supplementary Figure 6A and 6B show TRPV4 expression levels via IF and immunoblots, respectively), we examined the effect of reduced TRPV4 on intracellular calcium levels in MCF10DCIS.com cells under normal density (ND) and stress conditions (confluent; Con and hyperosmotic; PEG) using Fluo-4 AM imaging (Fig. 4S-X). We found that shRNA TRPV4 slightly decreased calcium levels in ND cells, likely due to fewer active calcium channels at the plasma membrane resulting from lower TRPV4 expression (as shown in the summary plot in Fig. 4W). With fewer active calcium channels, cells treated with shRNA TRPV4 exhibited less reduction in intracellular calcium levels under cell crowding conditions compared to control cells. Additionally, hyperosmotic stress using PEG 300 induced smaller calcium spikes in shRNA cells compared to the significant spike observed in control cells. This reduced calcium response to Con and hyperosmotic stress in shRNA cells was reflected in the decreased cell volume reduction by PEG 300 shown in Fig. 4Y. Consequently, shRNA-mediated TRPV4 reduction impaired cell volume plasticity in MCF10DCIS.com cells and abolished the pro-invasive mechanotransduction capability involving cell volume reduction, as evidenced by no increase in cell motility (both cell diffusivity and directionality) under hyperosmotic conditions (Fig. 5H-J). These findings demonstrate the critical role of TRPV4 in conferring pro-invasive mechanotransduction capability to MCF10DCIS.com cells through cell volume reduction.

      Reviewer #2 (Public review):

      Summary:

      Metastasis poses a significant challenge in cancer treatment. During the transition from non-invasive cells to invasive metastatic cells, cancer cells usually experience mechanical stress due to a crowded cellular environment. The molecular mechanisms underlying mechanical signaling during this transition remain largely elusive. In this work, the authors utilize an in vitro cell culture system and advanced imaging techniques to investigate how non-invasive and invasive cells respond to cell crowding, respectively.

      Strengths:

      The results clearly show that pre-malignant cells exhibit a more pronounced reduction in cell volume and are more prone to spreading compared to non-invasive cells. Furthermore, the study identifies that TRPV4, a calcium channel, relocates to the plasma membrane both in vitro and in vivo (patient samples). Activation and inhibition of the TRPV4 channel can modulate the cell volume and cell mobility. These results unveil a novel mechanism of mechanical sensing in cancer cells, potentially offering new avenues for therapeutic intervention targeting cancer metastasis by modulating TRPV4 activity. This is a very comprehensive study, and the data presented in the paper are clear and convincing. The study represents a very important advance in our understanding of the mechanical biology of cancer.

      Weaknesses:

      However, I do think that there are several additional experiments that could strengthen the conclusions of this work. A critical limitation is the absence of genetic ablation of the TRPV4 gene to confirm its essential role in the response to cell crowding.

      We are deeply grateful for the positive assessment of our study and its contribution to advancing our understanding of mechanical signaling in cancer progression. We also greatly appreciate the suggestion to incorporate genetic ablation experiments to further validate the role of TRPV4 in cell crowding responses.

      As noted in our response to Reviewer #1, we employed an shRNA approach to investigate the functional effects of TRPV4 knockdown on intracellular calcium level changes, cell volume plasticity, and invasiveness phenotypes. We assessed these effects using Fluo-4 AM calcium assay, single-cell volume measurements, and single-cell motility assays under cell crowding or hyperosmotic stress. These results have been incorporated into the revised manuscript and are described in detail in our response to Reviewer #1's "weaknesses" comment.

      Reducing TRPV4 expression levels by shRNA diminished mechanosensing intracellular calcium changes under cell crowding and hyperosmotic conditions using PEG 300 treatment. Furthermore, a significantly reduced cell volume plasticity was observed under hyperosmotic conditions in shRNA treated cells compared to control cells (Fig. 4S-X). This diminished mechanosensing capability abolished the pro-invasive mechanotransduction effect, as assessed by single cell motility under hyperosmotic conditions (Fig. 5H-J). These findings demonstrate the critical role of TRPV4 in conferring pro-invasive mechanotransduction capability to MCF10DCIS.com cells through cell volume reduction.

      Reviewer #1 (Recommendations for the authors):

      The way the results or discussion section is written was a little confusing for me when relating some phenomena. For example, it is not clear how TRPV4 inhibition (due to overcrowding) leads to a decrease in intracellular calcium levels, especially when TRPV4 channels were intracellular (not on the PM) to begin with (in normal density (ND) conditions). Along the same lines, how does GSK219 cause a dip in calcium levels in ND cells when TRPV4 channels are primarily intracellular (Figure 4E)? If most of the TRPV4 channels that are translocated to the PM in response to cell crowding are in an inactive state, how do they confer enhanced cell volume plasticity relative to non-invasive cell lines?

      Thank you very much for raising this important point. We fully agree with your concern and have significantly revised the manuscript to clarify this aspect. Specifically, we have emphasized that a modest level of TRPV4 channels are constitutively active at the plasma membrane in normal density (ND) cells. This is now discussed in detail in the context of Fig. 4:

      Page 14: “Considering these factors, we hypothesized that cell crowding might inhibit calcium-permeant ion channels that are constitutively active at the plasma membrane, including TRPV4, which would then lower intracellular calcium levels and subsequently reduce cell volume via osmotic water movement.”

      Page 16-17: “… However, the temporal profile of Fluo-4 intensity in Fig. 4E, which corresponds to the time points marked in Fig. 4D (t<sub>1</sub>: baseline and t<sub>2</sub>: dip), clearly shows the dip at t<sub>2</sub>, indicated by ΔCa (the vertical dashed line between the dip and baseline). This modest Fluo-4 dip at t<sub>2</sub> represents the inhibition of activity by GSK219 on a small population of constitutively active TRPV4 channels at the plasma membrane under ND conditions.

      In Con cells, 1 nM GSK219 caused a smaller dip in Fluo-4 intensity compared to the one observed in ND cells, with no subsequent changes. This is likely due to fewer constitutively active TRPV4 at the plasma membrane in Con cells than in ND cells. …These findings suggest that a substantial portion of TRPV4 channels relocated to the plasma membrane under cell crowding was inactive, and some constitutively active TRPV4 channels already present in the membrane became inactive as a result of cell crowding.”

      'Internalization' might be a better word than 'uptake' in the following line in the results section

      "...activating TRPV4 under cell crowding conditions triggered channel uptake, indicating that TRPV4 trafficking depended on the channel's activation status."

      Thank you very much for this suggestion. As recommended, we replaced ‘uptake’ with ‘internalization’ on page 18:

      “However, in Con cells, where a large number of inactive TRPV4 channels are likely located at the plasma membrane, GSK101 treatment notably reduced plasma membrane-associated TRPV4 in a dose-dependent manner through internalization (Fig. 4O, 4Q), consistent with previous findings<sup>65</sup>. These data suggest that plasma membrane TRPV4 levels were largely regulated by the channel activity status. Specifically, channel activation led to the internalization of TRPV4, while channel inhibition promoted the relocation of TRPV4 to the plasma membrane.”

      1. Out of curiosity:

      2. Is there any information on what the intracellular TRPV4 channels are doing in the cytosol and in the nucleus? Is there any role of intracellular calcium stores in the proposed pathway?

      We greatly appreciate this insightful question. Although we were unable to find studies specifically exploring the roles of cytosolic TRPV4, a recent study (Reference 74) identified a role for nuclear TRPV4 in regulating calcium within the nucleus. We speculate that when TRPV4 activity is severely impaired, such as with additional TRPV4 inhibition under cell crowding conditions, some TRPV4 channels may be redirected to the nucleus. This redistribution could help maintain nuclear calcium homeostasis.

      This discussion is included on page 18 of the manuscript:

      “These findings suggest that further TRPV4 inhibition under crowding conditions triggers a distinct trafficking alteration. Recent studies have implicated nuclear TRPV4 in regulating nuclear Ca<sup>2+</sup> homeostasis and Ca<sup>2+</sup>-regulated transcription<sup>74</sup>. In light of this study and our findings, TRPV4 may relocate to the nucleus as a compensatory mechanism to maintain nuclear calcium regulation. This relocation could reflect an adaptive response to preserve calcium-dependent transcriptional programs or other nuclear processes essential for cell survival under mechanical stress.”

      One recommendation is to add some explanation or some minor details for the convenience of the reader. For example:

      At normal or lower confluence, cells show an acute large dip in intracellular calcium when an inhibitor is applied, implying that there are a few TRPV4 channels on the PM and that they are constitutively active.

      Thank you very much for highlighting this important point and for the helpful suggestion to improve clarity. We have significantly revised the text associated with Fig. 4 to ensure this point is clear. Specifically, we have added the following explanation on page 16:

      "This modest Fluo-4 dip at t2 represents the inhibition of activity by GSK219 on a small population of constitutively active TRPV4 channels at the plasma membrane under ND conditions."

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 1. The authors frequently change the medium to prevent acidification in over-confluent cultures. A cell viability assay should be performed to ensure that the over-confluent cells remain healthy and viable during the experiments. There are commercial kits that can be easily used to quantify the number of viable cells and the extent of cell toxicity. The number of viable cells would provide a more reliable basis for comparison between normal density and over-confluent conditions.

      Thank you very much for raising this important point. We have consistently observed that cell crowding does not induce significant cell death in MCF10DCIS.com cells. To address your recommendation, we performed a viability assay using propidium iodide (PI) to selectively stain dead cells and WGA-488 to stain all live cells. Cell death was quantified under normal density (ND) conditions and at 1, 3, 5, 7, and 10 days post-confluence.

      Our results indicate that cells remain similarly viable post-confluence, with minimal cell death (~1.5%) compared to ND cells (~0.75%). These findings are summarized in Supplementary Figure 2, demonstrating that over-confluent cultures remain healthy and viable during the experiments.

      (2) Figure 2. To determine whether the reduction in cell volume is reversible, over-confluent cells can be further diluted back to normal density. Additionally, the reversibility of TRPV4 channel trafficking to the plasma membrane should be assessed under these conditions in IF experiments and cell surface biotinylation.

      Thank you for this suggestion. We reseeded the previously overcrowded (OC) cells at normal density and observed that their TRPV4 distribution predominantly returned to being intracellular, with only modest plasma membrane localization, as shown by line analysis (Supplementary Figure 10A-C, page 13). Furthermore, their invasiveness decreased to levels comparable to the original normal density (ND) cells (Supplementary Figure 3C and 3E, page 6). These results demonstrate the reversibility of TRPV4 trafficking changes and the increase in invasiveness under mechanical stress.

      Page 6. "The enhanced invasiveness of MCF10DCIS.com cells under cell crowding was largely reversible. When OC cells were reseeded at normal density for invasion assays, their invasive cell fraction decreased to approximately 15%, slightly lower (p = 0.012) than the initial value of around 24% (Suppl. Fig. 3C, 3E)."

      Page 13. “We investigated whether TRPV4 relocation to the plasma membrane induced by cell crowding is reversible, as suggested by its impact on invasiveness (Suppl. Fig. 3E). To test this, previously OC MCF10DCIS.com cells were reseeded under ND conditions. We then assessed TRPV4 localization via immunofluorescence (IF) imaging to determine if most channels returned to the cytoplasm and could be relocated to the plasma membrane under mechanical stress, such as hyperosmotic conditions. Consistent with their initial ND state, reseeded ND MCF10DCIS.com cells displayed intracellular TRPV4 distribution (Suppl. Fig. 10A). Upon exposure to hyperosmotic stress (74.4 mOsm/kg PEG 300), TRPV4 was again relocated to the plasma membrane (Suppl. Fig. 10B). These findings, quantified through line analysis (Suppl. Fig. 10C), demonstrate that the mechanosensing response of MCF10DCIS.com cells is reversible.”

      (3) Figure 3B. A control using intracellular proteins such as GAPDH or Tubulin is missing. Including this control would help exclude the possibility of cell rupture or compromised cell membranes in crowded environments, which is very common in a cell crowding environment.

      Thank you very much for pointing this out. The control lanes (GAPDH) were already included in the full gel results shown in Supplementary Figure 5. For the immunoprecipitation and immunoblotting of surface-biotinylated cell lysates, we did not expect to detect GAPDH; however, some GAPDH signals were still observed. As shown for MCF10DCIS.com cells, less GAPDH was detected under OC conditions, but the immunoprecipitated samples displayed significantly higher levels of TRPV4 on the cell surface compared to ND cells (Supplementary Figure 5A). For the whole cell lysates, TRPV4 protein levels were comparable across different cell lines based on the immunoblot results, with consistent GAPDH signals serving as a loading control (Supplementary Figure 5B).

      (4) Figure 4. To convincingly demonstrate TRPV4 relocation to the plasma membrane, IF should be performed under non-permeable conditions (i.e., without detergents like saponin). This approach ensures that only plasma membrane proteins are accessible to antibodies, reducing intracellular background. The same approach should be applied to Piezo1 and TfR.

      Thank you for this suggestion. We observed that under non-permeable conditions, primary antibodies could still access intracellular proteins. To address this issue, we employed extracellular-binding TRPV4 antibodies to selectively detect TRPV4 relocation to the plasma membrane under hyperosmotic conditions (74.4 mOsm/kg PEG 300) in live MCF10DCIS.com cells, as shown in Supplementary Figure 9. These results clearly demonstrate the plasma membrane relocation of TRPV4 under hyperosmotic conditions, distinguishing it from control conditions. Unfortunately, we were unable to identify high-affinity extracellular-binding antibodies for Piezo1 and TfR. Nevertheless, our findings strongly support the mechanosensing plasma membrane relocation of TRPV4.

      Essential Weakness:

      Throughout the study, only TRPV4 inhibitors and activators were used to show that TRPV4 relocation is associated with intracellular calcium concentration and cell size changes. It is crucial to use TRPV4 KO or KD cells to confirm that the observed effects are specific to TRPV4 and not due to off-target effects on other proteins. Additionally, fusing a plasma membrane targeting sequence to TRPV4 to make a constitutive plasma membrane-localized construct could demonstrate the opposite effect.

      Thank you very much for this important comment. As noted in our response to Reviewer #1, we employed an shRNA approach to investigate the functional effects of TRPV4 knockdown on intracellular calcium level changes, cell volume plasticity, and invasiveness phenotypes. We assessed these effects using Fluo-4 AM calcium assay, single-cell volume measurements, and single-cell motility assays under cell crowding or hyperosmotic stress. These results have been incorporated into the revised manuscript and are described in detail in our response to Reviewer #1's "weaknesses" comment.

      Reducing TRPV4 expression levels by shRNA diminished mechanosensing intracellular calcium changes under cell crowding and hyperosmotic conditions using PEG 300 treatment. Furthermore, a significantly reduced cell volume plasticity was observed under hyperosmotic conditions in shRNA treated cells compared to control cells (Fig. 4S-X). This diminished mechanosensing capability abolished the pro-invasive mechanotransduction effect, as assessed by single cell motility under hyperosmotic conditions (Fig. 5H-J). These findings demonstrate the critical role of TRPV4 in conferring pro-invasive mechanotransduction capability to MCF10DCIS.com cells through cell volume reduction.

      Minor Points:

      The introduction section is poorly written; many results currently included in the introduction would be more appropriately placed in the discussion section. The long redundant introduction makes the article hard to read through.

      Thank you very much for pointing this out. In the revised introduction, we have significantly reduced references to the results, streamlining the section to make it more concise and focused. This adjustment ensures the introduction is clearer and avoids redundancy, improving the readability of the manuscript.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      When you search for something, you need to maintain some representation (a "template") of that target in your mind/brain. Otherwise, how would you know what you were looking for? If your phone is in a shocking pink case, you can guide your attention to pink things based on a target template that includes the attribute 'pink'. That guidance should get you to the phone pretty effectively if it is in view. Most real-world searches are more complicated. If you are looking for the toaster, you will make use of your knowledge of where toasters can be. Thus, if you are asked to find a toaster, you might first activate a template of a kitchen or a kitchen counter. You might worry about pulling up the toaster template only after you are reasonably sure you have restricted your attention to a sensible part of the scene.

      Zhou and Geng are looking for evidence of this early stage of guidance by information about the surrounding scene in a search task. They train Os to associate four faces with four places. Then, with Os in the scanner, they show one face - the target for a subsequent search. After an 8 sec delay, they show a search display where the face is placed on the associated scene 75% of the time. Thus, attending to the associated scene is a good idea. The questions of interest are "When can the experimenters decode which face Os saw from fMRI recording?" "When can the experimenters decode the associated scene?" and "Where in the brain can the experimenters see evidence of this decoding?" The answer is that the face but not the scene can be read out during the face's initial presentation. The key finding is that the scene can be read out (imperfectly but above chance) during the subsequent delay when Os are looking at just a fixation point. Apparently, seeing the face conjures up the scene in the mind's eye.

      This is a solid and believable result. The only issue, for me, is whether it is telling us anything specifically about search. Suppose you trained Os on the face-scene pairing but never did anything connected to the search. If you presented the face, would you not see evidence of recall of the associated scene? Maybe you would see the activation of the scene in different areas and you could identify some areas as search specific. I don't think anything like that was discussed here.

      You might also expect this result to be asymmetric. The idea is that the big scene gives the search information about the little face. The face should activate the larger useful scene more than the scene should activate the more incidental face, if the task was reversed. That might be true if the finding is related to a search where the scene context is presumed to be the useful attention guiding stimulus. You might not expect an asymmetry if Os were just learning an association.

      It is clear in this study that the face and the scene have been associated and that this can be seen in the fMRI data. It is also clear that a valid scene background speeds the behavioral response in the search task. The linkage between these two results is not entirely clear but perhaps future research will shed more light.

      It is also possible that I missed the clear evidence of the search-specific nature of the activation by the scene during the delay period. If so, I apologize and suggest that the point be underlined for readers like me.

      We will respond to this question by acknowledging that the reviewer is right in that the delay period activation of the scene is not necessarily search-specific. We will then discuss how this possibility affects the interpretation of our results and what kind of studies would need to be conducted in order to fully establish a causal link between delay period activity and visual search performance. We will also discuss the literature on cued attention and situate our work within the context of these other studies that have used similar task paradigms to infer attentional processes. Finally, we will discuss the interpretation of delay period activity in PPA and IFJ.

      Reviewer #2 (Public review):

      Summary:

      This work is one of the best instances of a well-controlled experiment and theoretically impactful findings within the literature on templates guiding attentional selection. I am a fan of the work that comes out of this lab and this particular manuscript is an excellent example as to why that is the case. Here, the authors use fMRI (employing MVPA) to test whether during the preparatory search period, a search template is invoked within the corresponding sensory regions, in the absence of physical stimulation. By associating faces with scenes, a strong association was created between two types of stimuli that recruit very specific neural processing regions - FFA for faces and PPA for scenes. The critical results showed that scene information that was associated with a particular cue could be decoded from PPA during the delay period. This result strongly supports the invoking of a very specific attentional template.

      Strengths:

      There is so much to be impressed with in this report. The writing of the manuscript is incredibly clear. The experimental design is clever and innovative. The analysis is sophisticated and also innovative. The results are solid and convincing.

      Weaknesses:

      I only have a few weaknesses to point out.

      This point is not so much of a weakness, but a further test of the hypothesis put forward by the authors. The delay period was long - 8 seconds. It would be interesting to split the delay period into the first 4 seconds and the last 4 seconds and run the same decoding analyses. The hypothesis here is that semantic associations take time to evolve, and it would be great to show that decoding gets stronger in the second delay period as opposed to the period right after the cue. I don't think this is necessary for publication, but I think it would be a stronger test of the template hypothesis.

      We will conduct the suggested analysis. Depending on the outcome, we will include it in supplemental materials or the main text.

      Typo in the abstract: "curing" vs "during."

      We will fix this.

      It is hard to know what to do with significant results in ROIs that are not motivated by specific hypotheses. However, for Figure 3, what are the explanations for ROIs that show significant differences above and beyond the direct hypotheses set out by the authors?

      We will address how each of the ROIs was selected based on the use of a priori networks as masks with ROIs as sub-parcels. We will explain why specific ROIs were associated with the strongest hypotheses but how the entire networks are relevant and related to existing literatures on attentional control and working memory. This content will be included in the introduction and discussion sections.

      Reviewer #3 (Public review):

      The manuscript contains a carefully designed fMRI study, using MVPA pattern analysis to investigate which high-level association cortices contain target-related information to guide visual search. A special focus is hereby on so-called 'target-associated' information that has previously been shown to help in guiding attention during visual search. For this purpose the authors trained their participants and made them learn specific target-associations, in order to then test which brain regions may contain neural representations of those learnt associations. They found that at least some of the associations tested were encoded in prefrontal cortex during the cue and delay period.

      The manuscript is very carefully prepared. As far as I can see, the statistical analyses are all sound and the results integrate well with previous findings.

      I have no strong objections against the presented results and their interpretation.

      Thank you.

    1. Does it matter whether we think of homework as formative or summative? Definitely! How you think of homework affects the policies you associate with your assignments. If homework is summative, you want to ensure that it represents only (or primarily) the student’s work, and you forbid students to talk to each other about it [3]. If homework is formative, it seems reasonable to allow students to talk to each other. The latter means that there’s a risk that the work we see is only partly the student’s. However, we hope they learned along the way (and that they’ve cited the people they’ve gotten help from).

      Yes. And, I've struggled with detangling "formative" from "summative" ever since I learned them in my graduate training. I have subsequently come to some thoughts that the distinction between these relies heavily on strong links to particular goals/learning outcomes and the available time for the learning experience. The very same assessment can be considered "formative" or "summative" if one changes the goal/LO or the learning timeframe examined. For example, a final exam in CRS 101 is likely to be summative for that course, but may be considered formative for the next course in the curricular sequence (CRS 102) if the major were designed so that the courses seamlessly progressed from one to the next.

    1. But the second one you referred to is Vannevar Bush, who wrote this beautiful essay in 1945 called “As We May Think.” In it, he basically envisions the internet. He envisions this personal computer called the “memex,” from memory and index. It’s extraordinarily prophetic — not just the technology but the relationships that we’ll have with knowledge, with information, with each other. He talks about this — he said there will be a new profession of trailblazers who will make a career out of finding useful trails through the common record. I love this notion of the common record. In a way, so much of what I do is an attempt to make sense of humanity’s common record.

      memex personal computer

      memory and index

      not just interpersonal but interplanetary and local first autonomous private secure permanent evergreen

      will be a new profession of trailblazers who will make a career out of finding useful trails through the common record.

      not just finding but naming and creating/naming trails linking meaningfully the personal and collaboratively emerging interintellect grounded in individuals indranet.work spaces

      indy.memex interpersonal computer

      spiritual reparenting

    1. One may be tempted to assume that GenAI tools, like ChatGPT, have negated the need for many types of knowledge. Asking for facts, procedures, or an analysis of facts is easily within the range of many GenAI tools now. However, Neelen and Kirschner (2020) respond to this type of thinking in detail in the context of learners and the Google search engine. They address the learning myth, “Google can replace human knowledge” by examining types of knowledge (e.g., propositional, tacit, etc.) and present well-documented arguments for such statements as: “Let’s assume for a second that Google can replace our own knowledge. We’d still have to interpret the information that Google gives us to make it meaningful” (p. 122) and; “If we’re trying to solve very complex problems, we run into several issues when relying on Google. The main problems are that we need to know what we’re looking for and that we need to be able to judge the information we find based on the knowledge that’s in our head” (p. 130)

      This section makes me reflect on the common misconception that tools like GenAI or even Google can replace human knowledge. As an educator, I see how tempting it might be for students (and even teachers) to rely heavily on these tools, but this dependency can create significant gaps in critical thinking and problem-solving. The quote about interpreting information resonates with me because technology can provide data, but understanding and applying it require skills and context that only humans bring.

      Personally, I agree with the statement that solving complex problems requires more than just finding information online. It reminds me of situations in my professional role where I’ve had to assess the validity of data or consider the nuances of a problem—something no search engine or AI can do without my input and expertise. GenAI can be a powerful assistant, but the “knowledge in our head” is what allows us to navigate ambiguity and discern quality.

      I wonder if relying too much on tools like ChatGPT might weaken students’ ability to critically evaluate information or even know where to start when they don’t have a foundation of knowledge. While GenAI can support learning, I see a real danger if we let it replace the essential process of building and applying our understanding. What do you think—is there a way to balance using these tools without diminishing the development of core skills?

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      (1) As VRMate (a component of behaviorMate) is written using Unity, what is the main advantage of using behaviorMate/VRMate compared to using Unity alone paired with Arduinos (e.g. Campbell et al. 2018), or compared to using an existing toolbox to interface with Unity (e.g. Alsbury-Nealy et al. 2022, DOI: 10.3758/s13428-021-01664-9)? For instance, one disadvantage of using Unity alone is that it requires programming in C# to code the task logic. It was not entirely clear whether VRMate circumvents this disadvantage somehow -- does it allow customization of task logic and scenery in the GUI? Does VRMate add other features and/or usability compared to Unity alone? It would be helpful if the authors could expand on this topic briefly.

      We have updated the manuscript (lines 412-422) to clarify the benefits of separating the VR system as an isolated program and a UI that can be run independently. We argue that “…the recommended behaviorMate architecture has several important advantages. Firstly, by rendering each viewing angle of a scene on a dedicated device, performance is improved by splitting the computational costs across several inexpensive devices rather than requiring specialized or expensive graphics cards in order to run…, the overall system becomes more modular and easier to debug [and] implementing task logic in Unity would require understanding Object-Oriented Programming and C# … which is not always accessible to researchers that are typically more familiar with scripting in Python and Matlab.”

      VRMate receives detailed configuration info from behaviorMate at runtime as to which VR objects to display and receives position updates during experiments. Any other necessary information about triggering rewards or presenting non-VR cues is still handled by the UI, so no editing of Unity is necessary. Scene configuration information is in the same JSON format as the settings files for behaviorMate. Additionally, Unity Editor scripts are provided in the VRMate repository which permit customizing scenes through a “drag and drop” interface and then writing the scene configuration files programmatically. Users interested in these features should see our GitHub page to find example scene.vr files and download the VRMate repository (including the editor scripts). We provide four VR contexts, as well as a settings file that uses one of them, which can be found on the behaviorMate GitHub page (https://github.com/losonczylab/behaviorMate) in the “vr_contexts” and “example_settigs_files” directories. These examples are provided to assist VRMate users in getting set up and could provide a more detailed example of how VRMate and behaviorMate interact.

      (2) The section on "context lists", lines 163-186, seemed to describe an important component of the system, but this section was challenging to follow and readers may find the terminology confusing. Perhaps this section could benefit from an accompanying figure or flow chart, if these terms are important to understand.

      We maintain the use of the terms "context" and "context list" in order to maintain a degree of parity with the Java code. However, we have updated lines 173-175 to define the term context for the behaviorMate system: “... a context is grouping of one or more stimuli that get activated concurrently. For many experiments it is desirable to have multiple contexts that are triggered at various locations and times in order to construct distinct or novel environments.”

      a. Relatedly, "context" is used to refer to both when the animal enters a particular state in the task like a reward zone ("reward context", line 447) and also to describe a set of characteristics of an environment (Figure 3G), akin to how "context" is often used in the navigation literature. To avoid confusion, one possibility would be to use "environment" instead of "context" in Figure 3G, and/or consider using a word like "state" instead of "context" when referring to the activation of different stimuli.

      Thank you for the suggestion. We have updated Figure 3G to say “Environment” in order to avoid confusion.

      (3) Given the authors' goal of providing a system that is easily synchronizable with neural data acquisition, especially with 2-photon imaging, I wonder if they could expand on the following features:

      a. The authors mention that behaviorMate can send a TTL to trigger scanning on the 2P scope (line 202), which is a very useful feature. Can it also easily generate a TTL for each frame of the VR display and/or each sample of the animal's movement? Such TTLs can be critical for synchronizing the imaging with behavior and accounting for variability in the VR frame rate or sampling rate.

      Different experimental demands require varying levels of precision in these kinds of synchronization signals. For this reason, we have opted against a “one-size-fits-all” approach to synchronization with physiology data in behaviorMate. Importantly, this keeps the individual rig costs low, which can be useful when constructing setups specifically for training animals. behaviorMate will log TTL pulses sent to GPIO pins set up as sensors, and can be configured to generate TTL pulses at regular intervals. Additionally, all UDP packets received by the UI are time stamped and logged. We also include the output of the Arduino millis() function in all UDP packets, which can be used for further investigation of clock drift between system components. Importantly, since the system is event driven, there cannot be accumulating drift across running experiments between the behaviorMate UI and networked components such as the VR system.

      For these reasons, we have not needed to implement a VR frame synchronization TTL for any of our experiments, however, one could extend VRMate to send "sync" packets back to behaviorMate to log when each frame was displayed precisely or TTL pulses (if using the same ODROID hardware we recommend in the standard setup for rendering scenes). This would be useful if it is important to account for slight changes in the frame rate at which the scenes are displayed. However, splitting rendering of large scenes between several devices results in fast update times and our testing and benchmarks indicate that display updates are smooth and continuous enough to appear coupled to movement updates from the behavioral apparatus and sufficient for engaging navigational circuits in the brain.
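
      As an illustration of how the logged timestamps and millis() values described in this response might be used, the following is a minimal Python sketch that estimates clock drift by fitting a line to pairs of (PC receive time, controller millis() value). The function name and the list-of-pairs input are assumptions made for illustration; they do not reflect behaviorMate's actual log format, which would first need to be parsed into such pairs.

```python
from typing import Sequence, Tuple


def estimate_clock_drift_ppm(pairs: Sequence[Tuple[float, float]]) -> float:
    """Estimate controller clock drift relative to the PC clock, in parts per million.

    `pairs` holds (pc_time_ms, controller_millis) samples. This input format is
    hypothetical; it is not the behaviorMate log format.
    """
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_pc = sum(pc for pc, _ in pairs) / n
    mean_ctl = sum(ctl for _, ctl in pairs) / n
    # Ordinary least-squares slope of controller time against PC time.
    sxx = sum((pc - mean_pc) ** 2 for pc, _ in pairs)
    if sxx == 0:
        raise ValueError("PC timestamps must not all be identical")
    sxy = sum((pc - mean_pc) * (ctl - mean_ctl) for pc, ctl in pairs)
    slope = sxy / sxx
    # A slope of exactly 1 means both clocks advance at the same rate.
    return (slope - 1.0) * 1e6
```

      A positive result would indicate that the controller clock runs fast relative to the PC clock; the constant offset between the two clocks does not affect the estimate.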

      b. Is there a limit to the number of I/O ports on the system? This might be worth explicitly mentioning.

      We have updated lines 219-220 in the manuscript to provide this information: Sensors and actuators can be connected to the controller using one of the 13 digital or 5 analog input/output connectors.

      c. In the VR version, if each display is run by a separate Android computer, is there any risk of clock drift between displays? Or is this circumvented by centralized control of the rendering onset via the "real-time computer"?

      This risk is mitigated by the real-time computer/UI sending position updates to the VR displays. The maximum amount scenes can be out of sync is limited because they will all recalibrate on every position update – which occurs multiple times per second as the animal is moving. Moreover, because position updates are constantly being sent by behaviorMate to VRMate and VRMate is immediately updating the scene according to this position, the most the scene can become out of sync with the mouse's position is proportional to the maximum latency multiplied by the running speed of the mouse. For experiments focusing on eliciting an experience of navigation, such a degree of asynchrony is almost always negligible. For other experimental demands it could be possible to incorporate more precise frame timing information but this was not necessary for our use case and likely for most other use cases. Additionally, refer to the response to comment 3a.
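
      The bound described in this response can be made concrete with a short Python sketch. It treats the worst-case mismatch between the displayed scene and the animal's position as the product of latency and running speed, which is an approximation of the proportionality stated above. The function name and the example values are assumptions chosen for illustration, not measurements from the behaviorMate system.

```python
def max_position_error_cm(latency_s: float, speed_cm_per_s: float) -> float:
    """Approximate worst-case scene/position mismatch for one update.

    Both arguments are illustrative inputs; real values would come from
    benchmarking a specific rig.
    """
    if latency_s < 0 or speed_cm_per_s < 0:
        raise ValueError("latency and speed must be non-negative")
    # Distance the animal can cover before the display catches up.
    return latency_s * speed_cm_per_s


# Hypothetical example: 20 ms worst-case latency, animal running at 50 cm/s.
example_error_cm = max_position_error_cm(0.020, 50.0)
```

      Under these assumed values the mismatch is about 1 cm, which gives a sense of the scale against which an experiment's tolerance could be compared.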

      Reviewer #2 (Public review):

      (1) The central controlling logic is coupled with GUI and an event loop, without a documented plugin system. It's not clear whether arbitrary code can be executed together with the GUI, hence it's not clear how much the functionality of the GUI can be easily extended without substantial change to the source code of the GUI. For example, if the user wants to perform custom real-time analysis on the behavior data (potentially for closed-loop stimulation), it's not clear how to easily incorporate the analysis into the main GUI/control program.

      Without any edits to the existing source code behaviorMate is highly customizable through the settings files, which allow users to combine the existing contexts and decorators in arbitrary combinations. Therefore, users have been able to perform a wide variety of 1D navigation tasks, well beyond our anticipated use cases by generating novel settings files. The typical method for providing closed-loop stimulation would be to set up a context which is triggered by animal behavior using decorators (e.g. based on position, lap number and time) and then trigger the stimulation with a TTL pulse. Rarely, if users require a behavioral condition not currently implemented or composable out of existing decorators, it would require generating custom code in Java to extend the UI. Performing such edits requires only knowledge of basic object-oriented programming in Java and generating a single subclass of either the BasicContextList or ContextListDecorator classes. In addition, the JavaFX (under development) version of behaviorMate incorporates a plugin which doesn't require recompiling the code in order to make these changes. However, since the JavaFX software is currently under development, documentation does not yet exist. All software is open-sourced and available on github.com for users interested in generating plugins or altering the source code.

      We have added the additional caveat to the manuscript in order to clarify this point (Line 197-202): “However, if the available set of decorators is not enough to implement the required task logic, some modifications to the source code may be necessary. These modifications, in most cases, would be very simple and only a basic understanding of object-oriented programming is required. A case where this might be needed would be performing novel customized real-time analysis on behavior data and activating a stimulus based on the result”

      (2) The JSON messaging protocol lacks API documentation. It's not clear what the exact syntax is, supported key/value pairs, and expected response/behavior of the JSON messages. Hence, it's not clear how to develop new hardware that can communicate with the behaviorMate system.

      The most common approach for adding novel hardware is to use TTL pulses (or accept an emitted TTL pulse to read sensor states). This type of hardware addition is possible through the existing GPIO without the need to interact with the software or JSON API. Users looking to take advantage of the ability to set up and configure novel behavioral paradigms without the need to write any software would be limited to adding hardware which could be triggered with and report to the UI with a TTL pulse (however fairly complex actions could be triggered this way).

      For users looking to develop more customized hardware solutions that interact closely with the UI or GPIO board, additional documentation on the JSON messaging protocol has been added to the behaviormate-utils repository (https://github.com/losonczylab/behaviormate_utils). Additionally, we have added a link to this repository in the Supplemental Materials section (line 971) and referenced this in the manuscript (line 217) to make it easier for readers to find this information.

      Furthermore, developers looking to add completely novel components to the UI can implement the interface described by Context.java in order to exchange custom messages with hardware (described in the JavaDoc: https://www.losonczylab.org/behaviorMate-1.0.0/). These messages would be defined within the custom context and interact with the custom hardware (meaning the interested developer would make a novel addition to the messaging API). Additionally, it should be noted that without editing any software, any UDP packets sent to behaviorMate from an IP address specified in the settings will get time stamped and logged in the stored behavioral data file, meaning that there is a large variety of hardware implementation solutions, using both standard UDP messaging and TTL pulses, that can work with behaviorMate with minimal effort. Finally, see the response to R2.1 for a discussion of the JavaFX version of the behaviorMate UI, including plugin support.

      (3) It seems the existing control hardware and the JSON messaging only support GPIO/TTL types of input/output, which limits the applicability of the system to more complicated sensor/controller hardware. The authors mentioned that hardware like Arduino natively supports serial protocols like I2C or SPI, but it's not clear how they are handled and translated to JSON messages.

      We provide an implementation for an I2C-based capacitance lick detector, which interested developers may wish to copy if support for novel I2C or SPI hardware is needed. Users with less development experience wishing to expand the hardware capabilities of behaviorMate could also develop adapters which can be triggered on a TTL input/output. Additionally, more information about the JSON API and how messages are transmitted to the PC by the Arduino is provided in our response to point (2) and in the expanded online documentation.

      a. Additionally, because it's unclear how easy to incorporate arbitrary hardware with behaviorMate, the "Intranet of things" approach seems to lose attraction. Since currently, the manuscript focuses mainly on a specific set of hardware designed for a specific type of experiment, it's not clear what are the advantages of implementing communication over a local network as opposed to the typical connections using USB.

      As opposed to serial communication protocols, as are typical with USB, networking protocols seamlessly function based on asynchronous message passing. Messages may be routed internally (e.g. to a PC's localhost address, i.e. 0.0.0.0) or to a variety of external hardware (e.g. using IP addresses such as those in the range 192.168.1.2 - 192.168.1.254). Furthermore, network-based communication allows modules, such as VR, to be added easily. behaviorMate systems can be easily expanded using low-cost Ethernet switches and consume only a single network adapter on the PC (e.g. not limited by the number of physical USB ports). Furthermore, UDP message passing is implemented in almost all modern programming languages in a platform independent manner (meaning that the same software can run on OSX, Windows, and Linux). Lastly, as we have pointed out (Line 117), a variety of tools exist for inspecting network packets and debugging, meaning that it is possible to run behaviorMate with simulated hardware for testing and debugging.

      The IOT nature of behaviorMate means there is no requirement for novel hardware to be implemented using an Arduino, since any system capable of UDP communication can be configured. For example, VRMate is usually run on Odroid C4s; however, one could easily create a system using Raspberry Pis or even additional PCs. behaviorMate is agnostic to the format of the UDP messages, but packaging any data in the JSON format for consistency would be encouraged. If new hardware is a sensor that has input requiring it to be time stamped and logged, then all that is needed is to add the IP address and port information to the ‘controllers’ list in a behaviorMate settings file. If more complex interactions are needed with novel hardware, then a custom implementation of ContextList.java may be required (see response to R2.2). However, the provided UdpComms.java class could be used to easily send/receive messages from custom Context.java subclasses.
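
      To illustrate the point that any device capable of UDP communication can act as a sensor, the following is a minimal Python sketch of simulated hardware sending a JSON message over UDP, with the receiver time stamping it on arrival. It runs entirely on the loopback interface. The message contents, function names, and record layout are hypothetical; they are not the behaviorMate JSON API, which is documented in the behaviormate_utils repository.

```python
import json
import socket
import time


def send_json(sock: socket.socket, message: dict, address: tuple) -> None:
    """Serialize `message` as JSON and send it as a single UDP datagram."""
    sock.sendto(json.dumps(message).encode("utf-8"), address)


def receive_and_stamp(sock: socket.socket, timeout_s: float = 1.0) -> dict:
    """Wait for one datagram and return it with a receive timestamp attached."""
    sock.settimeout(timeout_s)
    data, sender = sock.recvfrom(4096)
    return {
        "received_at": time.time(),  # PC-side timestamp, in seconds
        "sender": sender[0],
        "message": json.loads(data.decode("utf-8")),
    }


def demo() -> dict:
    """Send one hypothetical sensor message to a local receiver and log it."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Hypothetical payload; real message keys are defined by the JSON API.
        send_json(sender, {"lick": {"millis": 1234}}, receiver.getsockname())
        return receive_and_stamp(receiver)
    finally:
        sender.close()
        receiver.close()
```

      Calling `demo()` returns a record holding the decoded message, the sender's address, and the time of receipt, which mirrors the time stamping and logging behavior described above without requiring any physical hardware.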

      Solutions for highly customized hardware do require basic familiarity with object-oriented programming using the Java programming language. However, in our experience most behavioral experiments do not require these kinds of modifications. The majority of 1D navigation tasks, which behaviorMate is currently best suited to control, require touch/motion sensors, LEDs, speakers, or solenoid valves, which are easily controlled by the existing GPIO implementation. It is unlikely that custom subclasses would even be needed.

      Reviewer #3 (Public review):

      (1) While using UDP for data transmission can enhance speed, it is thought that it lacks reliability. Are there error-checking mechanisms in place to ensure reliable communication, given its criticality alongside speed?

      The provided GPIO/behavior controller implementation sends acknowledgement packets in response to all incoming messages as well as start and stop messages for contexts and “valves”. In this way the UI can update to reflect both requested state changes as well as when they actually happen (although there is rarely a perceptible gap between these two states unless something is unplugged or not functioning). See Line 85 in the revised manuscript “acknowledgement packets are used to ensure reliable message delivery to and from connected hardware”.
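
      The general idea of layering acknowledgements on top of UDP can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the behaviorMate implementation: the "id" and "ack" fields, the retry count, and the timeout are all hypothetical, and run_fake_device only stands in for real hardware.

```python
import json
import socket


def is_ack(datagram: bytes, message_id) -> bool:
    """True if the datagram is a JSON object acknowledging message_id."""
    try:
        reply = json.loads(datagram.decode("utf-8"))
    except ValueError:  # covers malformed JSON and undecodable bytes
        return False
    return isinstance(reply, dict) and reply.get("ack") == message_id


def send_with_ack(sock, payload, address, retries=3, timeout=0.5) -> bool:
    """Send payload and wait for a matching acknowledgement.

    The message is re-sent after each timeout, up to `retries` attempts.
    Returns True once acknowledged, False if every attempt goes unanswered.
    """
    data = json.dumps(payload).encode("utf-8")
    sock.settimeout(timeout)
    for _ in range(retries):
        sock.sendto(data, address)
        try:
            reply, _ = sock.recvfrom(4096)
        except socket.timeout:
            continue  # nothing heard in time: try again
        if is_ack(reply, payload["id"]):
            return True
    return False


def run_fake_device(sock, n_messages=1) -> None:
    """Stand-in for hardware: acknowledge the next n_messages datagrams."""
    for _ in range(n_messages):
        datagram, sender = sock.recvfrom(4096)
        message = json.loads(datagram.decode("utf-8"))
        sock.sendto(json.dumps({"ack": message["id"]}).encode("utf-8"), sender)
```

      A caller can use the boolean result to distinguish a requested state change from a confirmed one, for example by flagging a device as unresponsive when send_with_ack returns False.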

      (2) Considering this year's price policy changes in Unity, could this impact the system's operations?

      VRMate is not affected by the recent changes in pricing structure of the Unity project.

      The existing compiled VRMate software does not need to be regenerated to update VR scenes, or implement new task logic (since this is handled by the behaviorMate GUI). Therefore, the VRMate program is robust to any future pricing changes or other restructuring of the Unity program and does not rely on continued support of Unity. Additionally, while the solution presented in VRMate has many benefits, a developer could easily adapt any open-source VR Maze project to receive the UDP-based position updates from behaviorMate or develop their own novel VR solutions.

      (3) Also, does the Arduino offer sufficient precision for ephys recording, particularly with a 10ms check?

      Electrophysiology recording hardware typically has additional I/O channels which can provide assistance with tracking behavior/synchronization at a high resolution. While behaviorMate could still be used to trigger reward valves, either the ephys hardware or some additional high-speed DAQ would be recommended to maintain accurate alignment with high-speed physiology data. behaviorMate could still be set up as normal to provide closed and open-loop task control at behaviorally relevant timescales alongside a DAQ circuit recording events at a consistent temporal resolution. While this would increase the relative cost of the individual recording setup, identical rigs for training animals could still be configured without the DAQ circuit, avoiding unnecessary cost and complexity.

      (4) Could you clarify the purpose of the Sync Pulse? In line 291, it suggests additional cues (potentially represented by the Sync Pulse) are needed to align the treadmill screens, which appear to be directed towards the Real-Time computer. Given that event alignment occurs in the GPIO, the connection of the Sync Pulse to the Real-Time Controller in Figure 1 seems confusing.

      A number of methods exist for synchronizing recording devices like microscopes or electrophysiology recordings with behaviorMate’s time-stamped logs of actuators and sensors. For example, the GPIO circuit can be configured to send sync triggers, or receive timing signals as input. Alternatively a dedicated circuit could record frame start signals and relay them to the PC to be logged independently of the GPIO (enabling a high-resolution post-hoc alignment of the time stamps). The optimal method to use varies based on the needs of the experiment. Our setups have a dedicated BNC output and specification in the settings file that sends a TTL pulse at the start of an experiment in order to trigger 2p imaging setups (see line 224, specifically that this is a detail of “our” 2p imaging setup). We provide this information as it might be useful suggesting how to have both behavior and physiology data start recording at the same time. We do not intend this to be the only solution for alignment. Figure 1 indicates an “optional” circuit for capturing a high speed sync pulse and providing time stamps back to the real time PC. This is another option that might be useful for certain setups (or especially for establishing benchmarks between behavior and physiology recordings). In our setup event alignment does not exclusively occur on the GPIO.

      a. Additionally, why is there a separate circuit for the treadmill that connects to the UI computer instead of the GPIO? It might be beneficial to elaborate on the rationale behind this decision in line 260.

      Event alignment does not occur on the GPIO. Separating the concerns of position tracking from more general input/output features improves performance and simplifies debugging. In this sense we maintain a single event loop on the Arduino, avoiding the need to either run multithreaded operations or rely extensively on interrupts, which can cause unpredictable code execution (e.g. when multiple interrupts occur at the same time). Our position tracking circuit is therefore coupled to a separate, low-cost Arduino Mini which has the singular responsibility of position tracking.

      b. Moreover, should scenarios involving pupil and body camera recordings connect to the Analog input in the PCB or the real-time computer for optimal data handling and processing?

      Pupil and body camera recordings would be independent data streams which can be recorded separately from behaviorMate. Aligning these forms of full motion video could require frame triggers, which could be configured on the GPIO board using single TTL-like outputs or by configuring a valve to be “pulsed”, which is a provided type of customization.

      We also note that a more advanced developer could easily leverage camera signals to provide closed loop control by writing an independent module that sends UDP packets to behaviorMate. For example, a separate computer-vision-based position tracking module could be written in any preferred language and use UDP messaging to send body tracking updates to the UI without editing any of the behaviorMate source code (and could even be used for updating 1D location).

      (5) Given that all references, as far as I can see, come from the same lab, are there other labs capable of implementing this system at a similar optimal level?

      To date two additional labs have published using behaviorMate, the Soltez and Henn labs (see revised lines 341-342). Since behaviorMate has only recently been published and made available open source, only external collaborators of the Losonczy lab have had access to the software and design files needed to do this. These collaborators did, however, set up their own behavioral setups in separate locations with minimal direct support from the authors, similar to what anyone seeking to set up a behaviorMate system would find online on our GitHub page or by posting to the message board.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (4) To provide additional context for the significance of this work, additional citations would be helpful to demonstrate a ubiquitous need for a system like behaviorMate. This was most needed in the paragraph from lines 46-65, specifically for each sentence after line 55, where the authors discuss existing variants on head-fixed behavioral paradigms. For instance, for the clause "but olfactory and auditory stimuli have also been utilized at regular virtual distance intervals to enrich the experience with more salient cues", suggested citations include Radvansky & Dombeck 2018 (DOI: 10.1038/s41467-018-03262-4), Fischler-Ruiz et al. 2021 (DOI: 10.1016/j.neuron.2021.09.055).

      We thank the reviewer for the suggested missing citations and have updated the manuscript accordingly (see line 58).

      (5) In addition, it would also be helpful to clarify behaviorMate's implementation in other laboratories. On line 304 the authors mention "other labs" but the following list of citations is almost exclusively from the Losonczy lab. Perhaps the citations just need to be split across the sentence for clarity? E.g. "has been validated by our experimental paradigms" (citation set 1) "and successfully implemented in other labs as well" (citation set 2).

      We have split the citation set as suggested (see lines 338-342).

      Minor Comments:

      (6) In the paragraph starting line 153 and in Fig. 2, please clarify what is meant by "trial" vs. "experiment". In many navigational tasks, "trial" refers to an individual lap in the environment, but here "trial" seems to refer to the whole behavioral session (i.e. synonymous with "experiment"?).

      In our software implementation we had originally used “trial” to refer to an imaging session rather than experiment (and have made updates to start moving to the more conventional lexicon). To avoid confusion we have removed this use of “trial” throughout the manuscript and replaced it with “experiment” whenever possible.

      (7) This is very minor, but in Figure 3 and 4, I don't believe the gavage needle is actually shown in the image. This is likely to avoid clutter but might be confusing to some readers, so it may be helpful to have a small inset diagram showing how the needle would be mounted.

      We assessed the image both with and without the gavage needle and found the version in the original (without) to be easier to read and less cluttered and therefore maintained that version in the manuscript.

      (8) In Figure 5 legend, please list n for mice and cells.

      We have updated the Figure 5 legend to indicate that for panels C-G, n=6 mice (all mice were recorded in both VR and TM systems), with 3253 cells classified as significantly tuned place cells in VR and 6101 tuned cells in TM.

      (9) Line 414: It is not necessary to tilt the entire animal and running wheel as long as the head-bar clamp and objective can rotate to align the imaging window with the objective's plane of focus. Perhaps the authors can just clarify the availability of this option if users have a microscope with a rotatable objective/scan head.

      We have added the suggested caveat to the manuscript in order to clarify when the goniometers might be useful (see lines 281-288).

      (10) Figure S1 and S2 could be referenced explicitly in the main text with their related main figures.

      We have added explicit references to figures S1 and S2 in the relevant sections (see lines 443, 460 and 570).

      (11) On line 532-533, is there a citation for "proximal visual cues and tactile cues (which are speculated to be more salient than visual cues)"?

      We have added citations to both Knierim & Rao 2003 and Renaudineau et al. 2007 which discuss the differential impact of proximal vs distal cues during navigation as well as Sofroniew et al. 2014 which describe how mice navigate more naturally in a tactile VR setup as opposed to purely visual ones.

      (12) There is a typo at the end of the Figure 2 legend, where it should say "Arduino Mini."

      This typo has been fixed.

      Reviewer #2 (Recommendations For The Authors):

      (4) As mentioned in the public review: what is the major advantage of taking the IoT approaches as opposed to USB connections to the host computer, especially when behaviorMate relies on a central master computer regardless? The authors mentioned the readability of the JSON messages, making the system easier to debug. However, the flip side of that is the efficiency of data transmission. Although the bandwidth/latency is usually more than enough for transmitting data and commands for behavior devices, the efficiency may become a problem when neural recording devices (imaging or electrophysiology) need to be included in the system.

      behaviorMate is not intended to do everything, and is limited mainly to controlling behavior and providing some synchronizing TTL-style triggers. In this way the system can easily and inexpensively be replicated across multiple recording setups; this is particularly useful for constructing additional animal training setups. The system is very much sufficient for capturing behavioral inputs at relevant timescales (see the benchmarks in Figures 3 and 4 as well as the position correlated neural activity in Figures 5 and 6 for demonstration of this). Additional hardware might be needed to align the behaviorMate output with neural data; for example, a high-speed DAQ or input channels on electrophysiology recording setups could be utilized (if provided). As all recording setups are different, the ideal solution would depend on details which are hard to anticipate. We do not mean to convey that the full neural data would be transmitted to the behaviorMate system (especially using the JSON/UDP communications that behaviorMate relies on).

      (5) The author mentioned labView. A popular open-source alternative is bonsai (https://github.com/bonsai-rx/bonsai). Both include a graphical-based programming interface that allows the users to easily reconfigure the hardware system, which behaviorMate seems to lack. Additionally, autopilot (https://github.com/auto-pi-lot/autopilot) is a very relevant project that utilizes a local network for multiple behavior devices but focuses more on P2P communication and rigorously defines the API/schema/communication protocols for devices to be compatible. I think it's important to include a discussion on how behaviorMate compares to previous works like these, especially what new features behaviorMate introduces.

      We believe that behaviorMate provides a more opinionated and complete solution than the projects mentioned. A wide variety of 1D navigational paradigms can be constructed in behaviorMate without the need to write any novel software. For example, bonsai is a “visual programming language” and would require experimenters to construct a custom implementation of each of their experiments. We have opted to use Java for the UI with distributed computations across modules in various languages. Given the IOT methodology it would be possible to use any number of programming languages or APIs; a large number of design decisions were made when building the project and we have opted to not include this level of detail in the manuscript in order to maintain readability. We strongly believe in using non-proprietary and open source projects, when possible, which is why the comparison with LabView based solutions was included in the introduction. Also, we have added a reference to the autopilot project in the section of the introduction where this is discussed.

      (6) One of the reasons labView/bonsai are popular is they are inherently parallel and can simultaneously respond to events from different hardware sources. While the JSON events in behaviorMate are asynchronous in nature, the handling of those events seems to happen only in a main event loop coupled with GUI, which is sequential by nature. Is there any multi-threading/multi-processing capability of behaviorMate? If so it's an important feature to highlight. If not I think it's important to discuss the potential limitation of the current implementation.

      IOT solutions are inherently concurrent since the computation is distributed. Additional parallelism could be added by further distributing concerns between additional independent modules running on independent hardware. The UI has an event loop which aggregates inputs and then updates contexts based on the current state of those inputs sequentially. This sort of “snapshot” of the current state is necessary to reason about when to start certain contexts based on their settings and applied decorators. While the behaviorMate UI uses multithreading libraries in Java to be more performant in certain cases, the degree to which this represents true vs “virtual” concurrency would depend on the individual PC architecture it is run on and how the operating system allocates resources. For this reason, we have argued in the manuscript that behaviorMate is sufficient for controlling experiments at behaviorally relevant timescales, and have presented benchmarks and discussed different synchronization approaches to permit users to determine if this is sufficient for their needs.

      (7) The context list is an interesting and innovative approach to abstract behavior contingencies into a data structure, but it's not currently discussed in depth. I think it's worth highlighting how the context list can be used to cover a wide range of common behavior experimental contingencies with detailed examples (line 185 might be a good example to give). It's also important to discuss the limitation, as currently the context lists seem to only support contingencies based purely on space and time, without support for more complicated behavior metrics (e.g. deliver reward only after X% correct).

      To access more complex behavior metrics during runtime, custom context list decorators would need to be implemented. While this is less common in the sort of 1D navigational behaviors the project was originally designed to control, adding novel decorators is a simple process that only requires basic object-oriented programming knowledge. As discussed, we are also implementing a plugin architecture in the JavaFX update to streamline these types of additions.

      Minor Comments:

      (8) In line 202, the author suggests that a single TTL pulse is sent to mark the start of a recording session, and this is used to synchronize behavior data with imaging data later. In other words, there are no synchronization signals for every single sample/frame. This approach either assumes the behavior recording and imaging are running on the same clock or assumes evenly distributed recording samples over the whole recording period. Is this the case? If so, please include a discussion on limitations and alternative approaches supported by behaviorMate. If not, please clarify how exactly synchronization is done with one TTL pulse.

      While the TTL pulse triggers the start of neural data recording in our setups, various options exist for controlling for the described clock drift across experiments, and the appropriate one depends on the type of recordings made, the frame rate, the duration of recording, etc. Therefore behaviorMate leaves open many options for synchronization at different time scales (e.g. adding a frame-sync circuit as shown in Figure 1 or sending TTL pulses to the same DAQ recording electrophysiology data). Expanded consideration of different synchronization methods has been included in the manuscript (see lines 224-238).
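
      One generic way to correct for drift when repeated sync pulses are logged on both systems is to fit a linear map between the two clocks. The Python sketch below is offered only as an illustration of that idea and is not a method described in the manuscript. It assumes that the same pulses have been time stamped on both the behavior clock and the imaging clock, and that drift is approximately linear over the recording; all names are hypothetical.

```python
def fit_clock_map(behavior_times, imaging_times):
    """Least-squares fit of imaging_time = slope * behavior_time + offset.

    Both lists hold time stamps of the same sync pulses, one list per clock.
    A slope different from 1 indicates drift; the offset captures the
    difference in start time.
    """
    n = len(behavior_times)
    if n < 2 or n != len(imaging_times):
        raise ValueError("need at least two paired time stamps")
    mean_b = sum(behavior_times) / n
    mean_i = sum(imaging_times) / n
    var_b = sum((b - mean_b) ** 2 for b in behavior_times)
    if var_b == 0:
        raise ValueError("behavior time stamps must not all be identical")
    cov = sum((b - mean_b) * (i - mean_i)
              for b, i in zip(behavior_times, imaging_times))
    slope = cov / var_b
    offset = mean_i - slope * mean_b
    return slope, offset


def to_imaging_clock(behavior_time, slope, offset):
    """Convert one behavior-clock time stamp to the imaging clock."""
    return slope * behavior_time + offset
```

      With a single start-of-recording pulse only the offset can be estimated, which is why additional pulses or a frame-sync signal are needed whenever drift matters.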

      (9) Is the computer vision-based calibration included as part of the GUI functionality? Please clarify. If it is part of the GUI, it's worth highlighting as a very useful feature.

      The computer vision-based benchmarking is not included in the GUI. It is in the form of a script made specifically for this paper. However, for treadmill-based experiments behaviorMate has other calibration tools built into it (see lines 301-303).

      (10) I went through the source code of the Arduino firmware, and it seems most "open X for Y duration" functions are implemented using the delay function. If this is indeed the case, it's generally a bad idea since delay completely pauses the execution and any events happening during the delay period may be missed. As an alternative, please consider approaches comparing timestamps or using interrupts.

      We have avoided the use of interrupts on the GPIO due to the potential for unpredictable code execution. There is a delay which is only executed if the duration is 10 ms or less, as we cannot guarantee that the Arduino event loop cycles faster than this. Durations longer than 10 ms would be time stamped and non-blocking. We have adjusted this MAX_WAIT to be specified as a macro so it can be more easily adjusted (or set to 0).
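
      The logic described here, a blocking wait for very short pulses and a time-stamp comparison for longer ones, can be sketched as follows. The actual firmware is Arduino code and is not reproduced here; this Python version only illustrates the control flow, and the class and its clock/sleep arguments are hypothetical. MAX_WAIT_MS mirrors the MAX_WAIT threshold mentioned above.

```python
MAX_WAIT_MS = 10  # pulses at or below this length are handled by blocking


class TimedOutput:
    """An output (e.g. a valve) that can be opened for a fixed duration."""

    def __init__(self, clock_ms, sleep_ms):
        self._clock_ms = clock_ms  # function returning the current time in ms
        self._sleep_ms = sleep_ms  # function that blocks for a number of ms
        self.is_open = False
        self._close_at = None      # deadline for a pending non-blocking close

    def open_for(self, duration_ms):
        self.is_open = True
        if duration_ms <= MAX_WAIT_MS:
            # Short pulse: the event loop may not cycle fast enough to time
            # it accurately, so block for the whole duration.
            self._sleep_ms(duration_ms)
            self.is_open = False
        else:
            # Long pulse: record the deadline and return immediately.
            self._close_at = self._clock_ms() + duration_ms

    def update(self):
        """Call once per event-loop pass; closes the output when due."""
        if self._close_at is not None and self._clock_ms() >= self._close_at:
            self.is_open = False
            self._close_at = None
```

      On a PC the two arguments could be supplied as `lambda: time.monotonic() * 1000` and `lambda ms: time.sleep(ms / 1000)`. Passing them in also makes the timing logic testable with a simulated clock.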

      (11) Figure 3 B, C, D, and Figure 4 D, E suffer from noticeable low resolution.

      We have converted Figure 3B, C, D and 4C, D, E to vector graphics in order to improve the resolution.

      (12) Figure 4C is missing, which is an important figure.

      This figure appeared when we rendered and submitted the manuscript. We apologize if the figure was generated such that it did not load properly in all pdf viewers. The panel appears correctly in the online eLife version of the manuscript. Additionally, we have checked the revision in Preview on Mac OS as well as Adobe Acrobat and the built-in viewer in Chrome and all figure panels appear in each so we hope this issue has been resolved.

      (13) There are thin white grid lines on all heatmaps. I don't think they are necessary.

      The grid lines have been removed from the heatmaps as suggested.

      (14) Line 562 "sometimes devices directly communicate with each other for performance reasons", I didn't find any elaboration on the P2P communication in the main text. This is potentially worth highlighting as it's one of the advantages of taking the IoT approaches.

      In our implementation it was not necessary to rely on P2P communication beyond what is indicated in Figure 1. The direct communication referred to in line 562 is meant only to refer to the examples expanded on in the rest of the paragraph, i.e. the behavior controller may signal the microscope directly using a TTL signal without looping back to the UI. As necessary, users could implement UDP message passing between devices, but this is outside the scope of what we present in the manuscript.

      (15) Line 147 "Notably, due to the systems modular architecture, different UIs could be implemented in any programming language and swapped in without impacting the rest of the system.", this claim feels unsupported without a detailed discussion of how new code can be incorporated in the GUI (plugin system).

      This comment refers to the idea of implementing “different UIs”. This would entail users desiring to take advantage of the JSON messaging API and the proposed electronics while fully implementing their own interface. In order to facilitate this option we have improved documentation of the messaging API posted in the README file accompanying the Arduino source code. We have added a reference to the supplemental materials where readers can find a link to the JSON API implementation to clarify this point.

      Additionally, while a plugin system is available in the JavaFX version of behaviorMate, this project is currently under development, and we will update the online documentation as it matures. It is, however, unrelated to the intended claim about completely swapping out the UI.

      Reviewer #3 (Recommendations For The Authors):

      (6) Figure 1 - the terminology for each item is slightly different in the text and the figure. I think making the exact match can make it easier for the reader.

      - Real-time computer (figure) vs real-time controller (ln88).

      The manuscript was adjusted to match figure terminology.

      - The position controller (ln565) - position tracking (Figure).

      We have updated Figure 1 to highlight that the position controller does the position tracking.

      - Maybe add a Behavior Controller next to the GPIO box in Figure 1.

      We updated Figure 1 to highlight that the Behavior Controller performs the GPIO responsibility such that "Behavior Controller" and "GPIO circuit" may be used interchangeably.

      - Position tracking (fig) and position controller (subtitle - ln209).

      We updated Figure 1 to highlight that the position controller does the position tracking.

      - Sync Pulse is not explained in the text.

      The caption for Figure 1 has been updated to better explain the Sync pulse and additional systems boxes.

      (7) For Figure 3B/C: What is the number of data points? It would be nice to see the real population, possibly using a swarm plot instead of box plots. How likely are these outliers to occur?

      In order to better characterize the distributions presented in our benchmarking data we have added mean and standard deviation information to the plots in Figures 3 and 4. For Figure 3B: 0.0025 +/- 0.1128, Figure 3C: 12.9749 +/- 7.6581, Figure 4C: 66.0500 +/- 15.6994, Figure 4E: 4.1258 +/- 3.2558.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Time periods in which experience regulates early plasticity in sensory circuits are well established, but the mechanisms that control these critical periods are poorly understood. In this manuscript, Leier and Foden and colleagues examine early-life critical periods that regulate the Drosophila antennal lobe, a model sensory circuit for understanding synaptic organization. Using early-life (0-2 days old) exposure to distinct odorants, they show that constant odor exposure markedly reduces the volume, synapse number, and function of the VM7 glomerulus. The authors offer evidence that these changes are mediated by invasion of ensheathing glia into the glomerulus where they phagocytose connections via a mechanism involving the engulfment receptor Draper.

      This manuscript is a striking example of a study where the questions are interesting, the authors spent a considerable amount of time to clearly think out the best experiments to ask their questions in the most straightforward way, and expressed the results in a careful, cogent, and well-written fashion. It was a genuine delight to read this paper. I have two experimental suggestions that would really round out existing work to better support the existing conclusions and some instances where additional data or tempered language in describing results would better support their conclusions. Overall, though, this is an incredibly important finding, a careful analysis, and an excellent mechanistic advance in understanding sensory critical period biology.

      We thank the reviewer for their thoughtful and constructive comments on our manuscript. In response to their critiques, we conducted several new experiments and additional analyses and made changes to the text. As requested, we carried out an electrophysiological analysis of VM7 PN firing in draper knockdown animals with and without odor exposure. To our surprise, loss of glial Draper fully suppresses the dramatic reduction in spontaneous PN activity observed following critical period ethyl butyrate exposure, arguing that the functional response is restored alongside OSN morphology. It also suggests that the OR42a OSN terminals are intact and functional until they are phagocytosed by ensheathing glia. In other words, glia are not merely clearing axon terminals that have already degenerated. This evidence provides additional support to the claim that the VM7 glomerulus will be an outstanding model for defining mechanisms of experience-dependent glial pruning. Detailed responses to the reviewers’ comments follow below.

      Regarding the apparent disconnect between the near complete silencing of PNs versus the 50% reduction in OR42a OSN infiltration volume, we agree with the reviewer that this tracks with previous data in the field. While our Imaris pipeline is relatively sensitive, it may not pick up modest changes to terminal arbor architecture. Indeed, as described in Jindal et al. (2023) and in the Methods in this manuscript, we chose conservative software settings that, if anything, would undercount the percent change in infiltration volume. We also note that increased inhibitory LN inputs onto PNs could contribute to the dramatic PN silencing we observe. While fascinating, we view LN plasticity as beyond the scope of the current manuscript. We removed any mention of ‘silent synapses’ and now speculate about increased inhibition.

      Reviewer #1 (Recommendations For The Authors):

      Major Elements:

      (1) The authors demonstrate that loss of draper in glia can suppress many of the pruning related phenotypes associated with EB exposure. However, they do not assess electrophysiological output in these experiments, only morphology. It would be great to see recordings from those animals to see if the functional response is also restored.

      We performed the experiment the reviewer requested (see Figure 4F-J). We are pleased to report that our recordings from VM7 PNs match our morphology measurements: in repo-GAL4>UAS-draper RNAi flies, there was no difference in the innervation of VM7 PNs between animals exposed to mineral oil or 15% EB from 0-2 DPE. This result is in sharp contrast to the near-total loss of OSN-PN innervation in flies with intact glial Draper signaling, and strongly validates the role we propose for Draper in the Or42a OSN critical period.

      (2) There is a disconnect between physiology and morphology with a near complete loss of activity from VM7 PNs but a less severe loss of ORN synapses. While not completely incongruent (previous work in the AL showed a complete loss of attractive behavior though synapse number was only reduced 40% - Mosca et al. 2017, eLife), it is curious. Can the authors comment further? Ideally, some of these synapses could be visualized by EM to determine if the remaining synapses are indeed of correct morphology. If not, this could support their assertion of silent inputs from page 7. Further, what happens to the remaining synapses? VM7 PNs should be receiving some activity from other local interneurons as well as neighboring PNs.

      We agree that on the surface, our electrophysiology results are more striking than one might expect solely from our measurements of VM7 morphology and presynaptic content. As the reviewer points out, previous studies of fly olfaction have consistently found that relatively modest shifts in glomerular volume in response to prolonged early-life odorant exposure can be accompanied by drastic changes in physiology and behavior (in addition, we would add Devaud et al., 2003; Devaud et al., 2001; Acebes et al., 2012; and Chodankar et al., 2020, as foundational examples of this phenomenon).

      A major driver of these changes appears to be remodeling of antennal lobe inhibitory LNs (see Das et al., 2011; Wilson and Laurent, 2005; Chodankar et al., 2020), especially GABAergic inhibitory interneurons. Perhaps increased LN inhibition of chronically activated PNs, on top of the reduced excitatory inputs resulting from ensheathing glial pruning of the Or42a OSN terminal arbor, would explain the near-total loss of VM7 PN activity we observe after critical period EB exposure. However, given that the scope of our study is limited to critical-period glial biology and does not address the complex topics of LN rewiring or synapse morphology, we have removed the sentence in which we raise the possibility of “silent synapses” in order to avoid confusion. The reviewer is also correct that VM7 PNs have inputs from non-ORN presynaptic partners, including LNs and PNs. So again, perhaps increased inhibitory inputs contribute to the near-complete silencing of the PNs. Given the heterogeneity of LN populations, we view this area as fertile ground for future research.

      Language / Data Considerations:

      (1) Or42a OSNs have other inputs, namely, from LNs. What are they doing here? Are they also affected?

      As discussed above, the question of how LN innervation of Or42a OSNs is altered by critical-period EB exposure is an intriguing one that fully deserves its own follow-up study, and we have tried to avoid speculation about the role of LNs when discussing our pruning phenotype. We note at multiple points throughout the text the importance of LNs and refer to previous studies of LN plasticity in response to chronic odorant exposure. 

      (2) In all of the measurements, what happens to synaptic density? Is it maintained? Does it scale precisely? This would be helpful to know.

      We have performed the analysis as requested, which is now included in a supplement to Figure 5. We found that synaptic density shows no trend in variation across conditions and glial driver genotypes.

      (3) In Figure 5, the controls for the alrm-GAL4 experiments show a much more drastic phenotype than controls in previous figures? Does this background influence how we can interpret the results? Could the response have instead hit a floor effect and it's just not possible to recover?

      The reviewer is correct that following EB exposure, astrocyte vs. ensheathing glial driver backgrounds displayed modest differences in the extent of pruning by volume (0.27 for astros, 0.36 for EG). We note that the two drpr RNAi lines that we used had non-significant (but opposite) effects on the estimated Or42a OSN volume in combination with the astrocyte driver, arguing against a floor effect. In addition, a recent publication by Nelson et al. (2024) replicated our findings with a different astrocyte GAL4 driver and draper RNAi line. Thus, we are confident that this result is biologically meaningful and not an artifact of genetic background.

      (4) The estimation of infiltration measurement in Figure 6 is tricky to interpret. It implies that the projections occupy the same space, which cannot be possible. I'd advocate a tempering of some of this language and consider an intensity measurement in addition to their current volume measurements (or perhaps an "occupied space" measurement) to more accurately assess the level of resolution that can be obtained via these methods.

      We completely agree that our language in describing EG infiltration could have been more precise, and we have modified it as suggested. The Or42a-mCD8::GFP label we and others use, our use of confocal microscopy, and our Surface pipeline in Imaris together create a glomerular mask that traces the outline of the OSN terminal arbor but is nonetheless not 100% “filled” by neuronal membrane and/or glial processes.

      (5) Do the authors have the kind of resolution needed to tell whether there is indeed Or42a-positive axon fragmentation (as asserted on p16 and from their data in figures 4, 5, 7). If the authors want to say this, I would advocate for a measurement of fragmentation / total volume to prove it - if not, I would advocate tempering of the current language.

      The reviewer brings up a fair criticism: while our assertion about axon fragmentation was based on our visual observations of hundreds of EB-exposed brains, the resolution limits of confocal microscopy do not allow us to rigorously rule out fragmentation within a bundle of OSN axons. Instead, our most compelling evidence for the lack of EB-induced Or42a OSN fragmentation in the absence of glial Draper comes from our new electrophysiology data (Figure 4F-J) in repo-GAL4>UAS-draper RNAi animals. We found no difference in spontaneous release from Or42a terminals in flies exposed to mineral oil or 15% EB from 0-2 DPE, which would not be the case if there were Draper-independent fragmentation along the axons or terminal arbors upon EB exposure. We have updated our discussion of fragmentation so that our statements are based on this new evidence, and not on confocal microscopy.

      (6) There is an interesting Discussion opportunity missed here. Some experiments would, ostensibly, require pupae to detect odorants within the casing via structures consistently in place for olfaction during pupation. It would be useful for the authors to discuss a little more deeply when this critical period may arise and why the experiment where pupae are exposed to EB two days before eclosion and there is no response, occurs as it does. I agree that it's clearly a time when they are not sensitive to the odorant, but that could just be because there's no ability to detect odorants at that time. Is it a question of non-sensitivity to EB or just non-sensitivity to everything?

      We share the reviewer’s interest in the plasticity of the olfactory circuit during pupariation, although, as they correctly point out, it is difficult to conceive of an odorant-exposure experiment that could disentangle the barrier effects of the puparium from the sensitivity of the circuit itself, and our pre-eclosion data in Figure 3A, D, G do not distinguish between the two. While an investigation into the mechanism by which the critical period for ethyl butyrate exposure opens and closes is outside the scope of the present study, we would consider the physical barrier of the puparium to be a satisfactory explanation for why eclosion marks the functional opening of experience-dependent plasticity. As the reviewer suggests, we have added this important nuance to our discussion of the opening of the critical period in the corresponding paragraph of the Results, as well as to the Discussion section “Glomeruli exhibit dichotomous responses to critical period odor exposure.”

      Minor Elements:

      (1) Page 6 bottom: "Or4a-mCD8::GFP" should be "Or42a-mCD8::GFP"

      (2) Page 15, end of last full paragraph. Remove the "e"

      Thank you for pointing out these typos. They have been corrected. 

      Reviewer #2 (Public Review):

      Sensory experiences during developmental critical periods have long-lasting impacts on neural circuit function and behavior. However, the underlying molecular and cellular mechanisms that drive these enduring changes are not fully understood. In Drosophila, the antennal lobe is composed of synapses between olfactory sensory neurons (OSNs) and projection neurons (PNs), arranged into distinct glomeruli. Many of these glomeruli show structural plasticity in response to early-life odor exposure, reflecting the sensitivity of the olfactory circuitry to early sensory experiences.

      In their study, the authors explored the role of glia in the development of the antennal lobe in young adult flies, proposing that glial cells might also play a role in experience-dependent plasticity. They identified a critical period during which both structural and functional plasticity of OSN-PN synapses occur within the ethyl butyrate (EB)-responsive VM7 glomerulus. When flies were exposed to EB within the first two days post-eclosion, significant reductions in glomerular volume, presynaptic terminal numbers, and postsynaptic activity were observed. The study further highlights the importance of the highly conserved engulfment receptor Draper in facilitating this critical period plasticity. The authors demonstrated that, in response to EB exposure during this developmental window, ensheathing glia increase Draper expression, infiltrate the VM7 glomerulus, and actively phagocytose OSN presynaptic terminals. This synapse pruning has lasting effects on circuit function, leading to persistent decreases in both OSN-PN synapse numbers and spontaneous PN activity as analyzed by perforated patch-clamp electrophysiology to record spontaneous activity from PNs postsynaptic to Or42a OSNs.

      In my view, this is an intriguing and potentially valuable set of data. However, since I am not an expert in critical periods or habituation, I do not feel entirely qualified to assess the full significance or the novelty of their findings, particularly in relation to existing research.

      We thank the reviewer for their insightful critique of our work. In response to their comments, we added additional physiological analysis and tempered our language around possible explanations for the apparent disconnect between the physiological and morphological effects of critical period odor exposure. These changes are explained in more detail in the response to the public review by Reviewer 1 and also in our responses outlined below.

      Reviewer #2 (Recommendations For The Authors):

      I though do have specific comments and questions concerning the presynaptic phenotype they deduce from confocal BRP stainings and electrophysiology.

      Concerning the number of active zones: this can hardly be deduced from standard-resolution confocal images and, maybe more importantly, lacking postsynaptic markers. This particularly also in the light of them speculating about "silent synapses". There are now tools existing concerning labeled, cell type specific expression of acetylcholine-receptor expression and cholinergic postsynaptic density markers (importantly Drep2). Such markers should be entailed in their analysis. They should refer to previous work concerning "brp-short" concerning its original invention and prior usage.

      We thank the reviewer for their thoughtful approach to our methodology and claims. While the use of confocal microscopy of Bruchpilot puncta to estimate numbers of presynapses is standard practice (see Furusawa et al., 2023; Aimino et al., 2022; Urwyler et al., 2019; Ackerman et al., 2021), the reviewer is correct that a punctum does not an active zone make. Bruchpilot staining and quantification is a well-validated tool for approximating the number of presynaptic active zones, not a substitute for super-resolution microscopy. We made changes to our language about active zones to make this distinction clearer. We have also removed the sentence where we discuss the possibility of “silent synapses,” which both reviewers felt was too speculative for our existing data. Finally, we are highly interested in characterizing the response of PNs and higher-order processing centers to critical-period odorant exposure as a future direction for our research. However, given the complexity of the subject, we chose to limit the scope of this study to the interactions between OSNs and glia. 

      Regarding their electrophysiological analysis and the plausibility of their findings: I am uncertain whether the moderate reduction in BRP puncta at the relevant OSN::PN synapse can fully account for the significantly reduced spontaneous PN activity they report. This seems particularly doubtful in the absence of any direct evidence for postsynaptically silent synapses. Perhaps this is my own naivety, but I wonder why they did not use antennal nerve stimulation in their experiments?

      We refer to previous studies of the AL indicating that moderate changes in glomerular volume and presynaptic content can translate to far more striking alterations in electrophysiology and behavior (Devaud et al., 2003; Devaud et al., 2001; Acebes et al., 2012; Chodankar et al., 2020; Mosca et al., 2017). This literature has demonstrated that chronic odorant exposure can result in remodeling of inhibitory local interneurons to suppress over-active inputs from OSNs. While we do not address the complex subject of interneuron remodeling in the present study, we find it highly likely that there would be significant changes in interneuron innervation of PNs, independent of glial phagocytosis of OSN excitatory inputs, resulting in additional inhibition. Moving forward, we are very interested in expanding these studies to include odor-evoked changes in PN activity.

      Additional minor point: The phrase "Soon after its molecular biology was described (et al., 1999), the Drosophila melanogaster" seems somewhat misleading. Isn't the field still actively describing the molecular biology of the fly olfactory system?

      We completely agree and have removed this sentence entirely.  

      Reviewing Editor's Note: to enhance the evidence from mostly compelling in most facets to solid would be to add physiology to the Draper analysis.

      These experiments have been completed and are presented in Figure 4F-J. 

      References

      Acebes A, Devaud J-M, Arnés M, Ferrús A. 2012. Central Adaptation to Odorants Depends on PI3K Levels in Local Interneurons of the Antennal Lobe. J Neurosci 32:417–422. doi:10.1523/jneurosci.2921-11.2012

      Ackerman SD, Perez-Catalan NA, Freeman MR, Doe CQ. 2021. Astrocytes close a motor circuit critical period. Nature 592:414–420. doi:10.1038/s41586-021-03441-2

      Aimino MA, DePew AT, Restrepo L, Mosca TJ. 2022. Synaptic Development in Diverse Olfactory Neuron Classes Uses Distinct Temporal and Activity-Related Programs. J Neurosci 43:28–55. doi:10.1523/jneurosci.0884-22.2022

      Chodankar A, Sadanandappa MK, VijayRaghavan K, Ramaswami M. 2020. Glomerulus-Selective Regulation of a Critical Period for Interneuron Plasticity in the Drosophila Antennal Lobe. J Neurosci 40:5549–5560. doi:10.1523/jneurosci.2192-19.2020

      Das S, Sadanandappa MK, Dervan A, Larkin A, Lee JA, Sudhakaran IP, Priya R, Heidari R, Holohan EE, Pimentel A, Gandhi A, Ito K, Sanyal S, Wang JW, Rodrigues V, Ramaswami M. 2011. Plasticity of local GABAergic interneurons drives olfactory habituation. Proc Natl Acad Sci 108:E646–E654. doi:10.1073/pnas.1106411108

      Devaud J, Acebes A, Ramaswami M, Ferrús A. 2003. Structural and functional changes in the olfactory pathway of adult Drosophila take place at a critical age. J Neurobiol 56:13–23. doi:10.1002/neu.10215

      Devaud J-M, Acebes A, Ferrús A. 2001. Odor Exposure Causes Central Adaptation and Morphological Changes in Selected Olfactory Glomeruli in Drosophila. J Neurosci 21:6274–6282. doi:10.1523/jneurosci.21-16-06274.2001

      Furusawa K, Ishii K, Tsuji M, Tokumitsu N, Hasegawa E, Emoto K. 2023. Presynaptic Ube3a E3 ligase promotes synapse elimination through down-regulation of BMP signaling. Science 381:1197–1205. doi:10.1126/science.ade8978

      Mosca TJ, Luginbuhl DJ, Wang IE, Luo L. 2017. Presynaptic LRP4 promotes synapse number and function of excitatory CNS neurons. eLife 6:e27347. doi:10.7554/elife.27347

      Nelson N, Vita DJ, Broadie K. 2024. Experience-dependent glial pruning of synaptic glomeruli during the critical period. Sci Rep 14:9110. doi:10.1038/s41598-024-59942-3

      Urwyler O, Izadifar A, Vandenbogaerde S, Sachse S, Misbaer A, Schmucker D. 2019. Branch-restricted localization of phosphatase Prl-1 specifies axonal synaptogenesis domains. Science 364. doi:10.1126/science.aau9952

      Wilson RI, Laurent G. 2005. Role of GABAergic Inhibition in Shaping Odor-Evoked Spatiotemporal Patterns in the Drosophila Antennal Lobe. J Neurosci 25:9069–9079. doi:10.1523/jneurosci.2070-05.2005

    1. Author response:

      We thank all the reviewers for their insightful comments on this work.

      Response to Reviewer #1:

      We greatly appreciate your comments on the general reliability and significance of our work. We fully agree that it would have been ideal to have additional evidence related to the role of PEBP1 in HRI activation. Unfortunately, we have not been able to find phospho-HRI antibodies that work reliably. The literature seems to agree with this as a band shift using total-HRI antibodies is usually used to study HRI activation. However, with the cell lines showing the most robust effect with PEBP1 knockout or knockdown, we are yet to convince ourselves with the band shifts we see. This could be addressed by optimizing phos-tag gels although these gels can be a bit tricky with complex samples such as cell lysates which contain many phosphoproteins.

      To address the interaction between PEBP1 and eIF2alpha more rigorously, we were inspired by the insights you and reviewer #2 provided. While we are unable to do further experiments, we now think it would indeed be possible to do this using purified proteins and/or CETSA WB. These experiments could also provide further evidence for the role of PEBP1 phosphorylation. Although phosphorylation of PEBP1 at S153 has been implicated as being important for other functions of PEBP1, we are not sure about its role here. It may indeed have little relevance for ISR signalling.

      For the in vitro thermal shift assay, we have performed two independent experiments. While it appears that there is a slight destabilization of PEBP1 by oligomycin, the conclusion of this experiment remains incomplete: despite the apparent simplicity of the assay, the background fluorescence of oligomycin alone leaves room for alternative explanations. We now provide a lysate-based CETSA analysis, which does not display the same PEBP1 stabilization as the intact cell experiment. As for the signal saturation in the ATF4-luciferase reporter assay, this is a valid point.

      Response to Reviewer #2:

      We strongly agree that CETSA has a lot of potential to inform us about cellular state changes and this was indeed the starting point for this project. We apologize for being (too) brief with the explanations of the TPP/MS-CETSA approach and we have now added a bit more detail. With regard to the cut-offs used for the mass spectrometry analysis, you are absolutely right that we did not establish a stringent cut-off that would show the specificity of each drug treatment. Our take on the data was that using the p values (and ignoring the fold-changes) of individual protein changes as in Fig 1D, we can see that mitochondrial perturbations display a coordinated response. We now realize that the downside of this representation is that it obscures the largest and specific drug effects. As mentioned in the response to Reviewer #1, we now also think that it would be possible to obtain more evidence for the potential interaction between PEBP1 and eIF2alpha using CETSA-based assays.

      Response to Reviewer #3:

      Thank you for your assessment, we agree that this manuscript would have been made much stronger by having clearer mechanistic insights. As mentioned in the responses to other reviewers above, we aim to address this limitation in part by looking at the putative interaction between PEBP1 and eIF2alpha with orthogonal approaches. However, we do realize that analysis of protein-protein interactions can be notoriously challenging due to false negative and false positive findings. As with any scientific endeavor, we will keep in mind alternative explanations to the observations, which could eventually provide that cohesive model explaining how precisely PEBP1, directly or indirectly, influences ISR signalling.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      The data overall are very solid, and I would only recommend the following minor changes: 

      (1) Line 187 and line 268: there is perhaps a trend towards slightly increased ATF4-luc reporter with PEBP1-S153D, but it is not statistically significant, so I would tone down the wording here. 

      We have now modified this part to "This data is consistent with the modest increase…".

      (2) The recently discovered SIFI complex (Haakonsen 2024, https://doi.org/10.1038/s41586-023-06985-7) regulates both HRI and DELE1 through bifunctional localization/degron motifs. It seems like PEBP1 also contains such a motif, which suggests a potential mechanism for enrichment near mitochondria, perhaps even in response to stress. Maybe the authors could further speculate on this in the discussion.

      While working on the manuscript, we considered the possibility that PEBP1 function could be related to the SIFI complex and concluded that there is a critical difference: while SIFI specifically acts to turn off stress response signalling, loss of PEBP1 prevents eIF2alpha phosphorylation. We did not, however, consider that PEBP1 could have a localization/degron motif. Motif analysis by deepmito (busca.biocomp.unibo.it) and similar tools did not identify any conventional mitochondrial targeting signal, although we acknowledge that PEBP1 has a terminal alpha-helix of the kind identified for SIFI complex recognition. We are not sure why you think PEBP1 contains such a motif and are therefore hesitant to speculate on this further in the manuscript.

      (3) Line 358: references 50 and 45 are identical. 

      Thank you for spotting this. Corrected now. 

      (4) Figure S1D: it looks like Oligomycin has a significant background fluorescence, which makes interpretation of these graphs difficult - do you have measurements of the compound alone that can be used to subtract this background from the data? Based on the Tm I would say it does stabilize recombinant PEBP1, and there is no quantification of the variance across the 3 replicates to say there is no difference. 

      You are right, this assay is problematic due to the background fluorescence. Measuring oligomycin alone and subtracting this background results in slightly negative values and nonsensical thermal shift curves. We now additionally show quantification from two different experiments (unfortunately we ran out of reagents for further experiments), and this quantification shows that, if anything, oligomycin causes mild destabilization of recombinant PEBP1. We also used a lysate CETSA assay, which does not show thermal stabilization of PEBP1 by oligomycin, arguing against a direct effect. We attempted to use ferrostatin-1 as a positive control, as it may bind the PEBP1-ALOX protein complex, and it appeared to show marginal stabilization of PEBP1.

      Reviewer #2 (Recommendations for the authors): 

      I have a few comments for the authors to address: 

      (1) The MS-CETSA experiment is quite briefly described and this could be expanded somewhat. Not clear if multiple biological replicates are used. Is there any cutoff in data analysis based on fold change size (which correlated to the significance of cellular effects), etc? As expected from only one early timepoint (see eg PMID: 38328090), there appear to be a limited number of significant shifts over the background (as judged from Figure S1A). In the Excel result file, however (if I read it right) there are large numbers of proteins that are assigned as stabilized or destabilized. This might be to mark the direction of potential shifts, but considering that most of these are likely not hits, this labeling could give a false impression. Could be good to revisit this and have a column for what could be considered significant hits, where a fold change cutoff could help in selecting the most biologically relevant hits. This would allow Figure 1D to be made crisper when it likely dramatically overestimates the overlap between significant CETSA shifts for these drugs.  

      Fair point; while we focused more on PEBP1, it is important to have a sufficient description of the methods. We used duplicate samples for the MS, which is probably the most important point that was absent from the original submission and is now added to the methods. We also added slightly more description of the data analysis. While the AID method does not explicitly use log2 fold changes, it does consider the relative abundance of proteins under different temperature fractions. Since the Tm (melting temperature) for each protein can be at any temperature, we felt that it would be complicated to compare fractions where the protein stability is changed the most, and even more so if we consider both significance and log2FC. Therefore, we used this multivariate approach, which indicates the proteins with the most likely changes across the range of temperatures. To acknowledge that most of the statistically significant changes are not much above the background, as you correctly pointed out, we now add to the main text that “However, most of these changes are relatively small. To focus our analysis on the most significant and biologically relevant changes…” We also agree that it may be confusing that the AID output reports de/stabilization direction for all proteins. In general, we are not big fans of cutoffs as these are always arbitrary, but with a multivariate p value of 0.1 it becomes clear that there are only a relatively small number of hits with larger changes. We have now added to the guide in the data sheet that "Primarily, use the adjusted p value of the log10 Multivariate normal pvalue for selecting the overall statistically significant hits (p<0.05 equals -1.30 or smaller; p<0.01 equals -2 or smaller)". We have also added to the guide part of the table that “Note that this prediction does not consider whether the change is significant or not, it only shows the direction of change”.
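
      The log10 thresholds quoted in the data-sheet guide follow directly from the p value cutoffs. As a minimal sketch of that arithmetic (the function names and the example values are hypothetical and are not taken from the AID output or the data sheet):

```python
import math


def log10_threshold(p_cutoff: float) -> float:
    """Convert a p value cutoff to the log10 scale used in the data sheet guide."""
    return math.log10(p_cutoff)


def is_hit(log10_p: float, p_cutoff: float = 0.05) -> bool:
    """A protein counts as a hit when its log10 p value is at or below the threshold."""
    return log10_p <= log10_threshold(p_cutoff)


print(round(log10_threshold(0.05), 2))  # -1.3
print(round(log10_threshold(0.01), 2))  # -2.0

# Hypothetical log10 adjusted p values, for illustration only.
print(is_hit(-1.5))                  # passes the p < 0.05 cutoff
print(is_hit(-1.5, p_cutoff=0.01))   # does not pass the stricter p < 0.01 cutoff
```

      More negative values therefore correspond to smaller p values, which is why the guide reads "or smaller".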

      (2) On page 4 the authors state "We reasoned that thermal stability of proteins might be particularly interesting in the context of mitochondrial metabolism as temperature-sensitive fluorescent probes suggest that mitochondrial temperature in metabolically active cells is close to 50 °C". I don't see the relevance of this statement as an argument for using TPP/CETSA. When this is also not further addressed in the work, it could be deleted.

      Deleted. We agree, while this is an interesting point, it is not that relevant in this paper. 

      (3) To exclude direct drug binding to PEBP1, a thermofluor experiment is performed (Fig S1D). However, the experiment gives a high background at the lower temperatures and it could be argued that this is due to the fluoroprobe binding to a hydrophobic pocket of the protein, and that oligomycin at higher concentrations competes with this binding, attenuating fluorescence. These are complex experiments and there could be other explanations, but the authors should address this. An alternative means to provide support for non-binding would be a lysate CETSA experiment, with very short (1-3 minutes) drug exposure before heating. This would typically give a shift when the protein is indicated to be CETSA responsive as in this case.

      Agree. However, we don't have good means to perform the thermofluor experiments to rule out alternative explanations. What we can say is (as discussed above for reviewer #1, point 4) that quantification from two different experiments shows that oligomycin does not thermally stabilize recombinant PEBP1. To complement this conclusion, we used a lysate CETSA assay, which does not show thermal stabilization of PEBP1 by oligomycin. In this assay we attempted to use ferrostatin-1 as a positive control, as it may bind the PEBP1-ALOX protein complex, and it appeared to show marginal stabilization of PEBP1. But since we lack a robust positive control for these assays, some doubt will inevitably remain.

      (4) The authors appear to have missed that there is already a MS-CETSA study in the literature on oligomycin, from Sun et al (PMID: 30925293). Although this data is from a different cell line and at a slightly longer drug treatment and is primarily used to access intracellular effects of decreased ATP levels induced by oligomycin, the authors should refer to this data and maybe address similarities if any.  

      Apologies for the oversight; the oligomycin data from this paper eluded us as it was mainly presented in the supplementary data. We compared the two datasets and found some overlap despite the differences in the experimental details. Both datasets share translational components (e.g. EIF6 and ribosomal proteins), but most notably our other top hit BANF1, which we mentioned in the main text, was also identified by Sun et al. We have updated the manuscript text as "Other proteins affected by oligomycin included BANF1, which binds DNA in an ATP dependent manner [16], and has also been identified as an oligomycin stabilized protein in a previous MS-CETSA experiment [23]", citing the Sun et al paper.

      (5) The confirmation of protein-protein interaction is notoriously prone to false positives. The authors need to use overexpression and a sensitive reporter to get positive data but collect additional data using mutants which provide further support. Typically, this would be enough to confirm an interaction in the literature, although some doubt easily lingers. When the authors already have a stringent in-cell interaction assay for PEBP1 in the CETSA thermal shift, it would be very elegant to also apply the CETSA WB assay to the overexpressed constructs and demonstrate differences in the response of oligomycin, including the mutants. I am not sure this is feasible but it should be straightforward to test. 

      This is a very good suggestion. Unfortunately, due to the time constraints of the graduate students (who must write up their thesis very soon), we are not able to perform and repeat such experiments to the level of confidence that we would like.

      (6) At places the story could be hard to follow, partly due to the frequent introduction of new compounds, with not always well-stated rationale. It could be useful to have a table also in the main manuscript with all the compounds used, with the rationale for their use stated. Although some of the cellular pathways addressed are shown in miniatures in figures, it could be useful to have an introduction figure for the known ISR pathways, at least in the supplement. There are also a number of typos to correct. 

      We agree that there are many compounds used. We have attempted to clarify their use by adding this information into the table of used compounds in the methods and adding an overall schematic to Fig S1G and a note on line 132 "(see Figure 1-figure supplement 1G for summary of drugs used to target PEBP1 and ISR in this manuscript)". We have also attempted to remove typos as far as possible.

      (7) EIF2a phosphorylation in S1E does not appear to be more significant for Sodium Arsenite argued to be a positive control, than CCCP, which is argued to be negative. Maybe enough with one positive control in this figure? 

      This experiment was used as a justification for our 30 min time point for the proteomics. By showing the 30 min and 4 h time points as Fig 1G and Figure 1-figure supplement 1F, our point was to demonstrate that the kinetics of phosphorylation and dephosphorylation are relevant. As you correctly pointed out, the stress response induced by sodium arsenite, and also by tunicamycin, is already attenuated at the 4 h time point. We prefer to keep all samples to facilitate comparisons.

      (8) Page 7 reference to Figure S2H, which doesn't exist. Should be S3H.  

      Apologies for the mistake, now corrected to Figure 2-figure supplement 1B.

      (9) Finally, although the TPP labeling of the method is used widely in the literature this is CETSA with MS detection and MS-CETSA is a better term. This is about thermal shifts of individual proteins which is a very well-established biophysical concept. In contrast, the term Thermal Proteome Profiling does not relate to any biophysical concept, or real cell biology concept, as far as I can see, and is a partly misguided term. 

      We changed the term TPP into MS-CETSA, but also include the term TPP in the introduction to facilitate finding this paper by people using the TPP term.

      Reviewer #3 (Recommendations for the authors): 

      Major Issues 

      (1) The one major issue of this work is the lack of a mechanism showing precisely how PEBP1 amplifies the mitochondrial integrated stress response. The work, as it is described, presents data suggesting PEBP1's role in the ISR but fails to present a more conclusive mechanism. The idea of mitochondrial stress causing PEBP1 to bind to eIF2a, amplifying ISR is somewhat vague. Thus, the lack of a more defined model considerably weakens the argument, as the data is largely corollary, showing KO and modulation of PEBP1 definitely has a unique effect on the ISR, however, it is not conclusive proof of what the authors claim. While KO of PEBP1 diminishes the phosphorylation of eIF2a, taken together with the binding to eIF2a, different pathways could be simultaneously activated, and it seems premature to surmise that PEBP1 is specific to mitochondrial stress. Could PEBP1 be reacting to decreased ATP? Release of a protein from the mitochondria in response to stress? Is PEBP1's primary role as a modulator of the ISR, or does it have a role in non-stress-related translation? A cohesive model would tie together these separate indirect findings and constitute a considerable discovery for the ISR field, and the mitochondrial stress field.  

      Thank you for your assessment. We agree that this manuscript would have been much stronger with clearer mechanistic insights. As with any scientific endeavor, we will keep in mind alternative explanations for the observations, which could eventually provide a cohesive model explaining precisely how PEBP1, directly or indirectly, influences ISR signalling.

      (2) The data relies on the initial identification of PEBP1 thermal stabilization concomitant with mitochondrial ISR induction post-treatment of several small molecules. However, the experiment was performed using a single timepoint of 30 minutes. There was no specific rationale for the choice of this time point for the thermal proteome profiling. 

      The reasoning for this was explicitly stated: "We reasoned that treating intact cells with the drugs for only 30 min would allow us to observe rapid and direct effects related to metabolic flux and/or signaling related to mitochondrial dysfunction in the absence of major changes in protein expression levels."

      Minor Issues 

      (1) In Lines 163-166 the authors state "The cells from Pebp1 KO animals displayed reduced expression of common ISR genes (Figure 2F), despite upregulation of unfolded protein response genes Ern1 (Ire1α) and Atf6 genes. This gene expression data therefore suggests that Pebp1 knockout in vivo suppresses induction of the ISR". This statement should be reassessed. While an arm of the UPR does stimulate ISR, this arm is controlled by PERK, and canonically IRE1 and ATF6 do not typically activate the ISR, thus their upregulation is likely unrelated to ISR activation and does not contribute the evidence necessary for this statement. 

      Apologies for the confusion, we aimed to highlight that as there is an increase in the two UPR arms, it is more likely that ISR instead of UPR is reduced. We have now changed the statement to the following:

      "The cells from Pebp1 PEBP1 KO animals displayed reduced expression of common ISR genes (Figure 2F), while there was mild upregulation of the unfolded protein response genes Ern1 (Ire1α) and Atf6 genes. This gene expression data therefore suggests that the reduced expression of common ISR genes is less likely to be mediated by changes in PERK, the third UPR arm, and more likely due to suppression of ISR by Pebp1 knockout in vivo."

      (2) In Lines 169 and 170 the authors state "Western blotting indicated reduced phosphorylation of eIF2α in RPE1 cells lacking PEBP1, suggesting that PEBP1 is involved in regulating ISR signaling between mitochondria and eIF2α". This conclusion is not supported by evidence. A number of pathways could be activated in these knockout cells, and simply observing an increase in p-eIF2α after knocking out PEBP1 does not constitute an interaction, as correlation doesn't mean causation. This KO could indirectly affect the ISR, with PEBP1 having no role in the ISR. While taken together there is enough circumstantial evidence in the manuscript to suggest a role for PEBP1 in the ISR, statements such as these have to be revised so as not to overreach the conclusions that can be achieved from the data, especially with no discernible mechanism.  

      We have now revised this statement by removing the conclusion and stating only the observation:  "Western blotting indicated reduced phosphorylation of eIF2α in RPE1 cells lacking PEBP1 (Fig. 3A)."

    1. Author response:

      Reviewer #2 (Public Review): 

      Comment 1: In terms of the biological significance of this interaction, it would be good to examine (via co-immunoprecipitation) whether the CEP89/NCS-1/C3ORF14 interaction takes place upon serum starvation. Does the complex change? 

      NCS1 centriolar localization requires CEP89, as no NCS1 localization was observed in CEP89 knockout cells (Figure 2L; Figure 2-figure supplement 2B). Both CEP89 and NCS1 centriolar localization were observed (Figure 2C; Figure 1D of PMID: 36711481) in cells grown in serum-containing media, although their localization was further enhanced in serum-starved cells. From these results, we predict that CEP89 and NCS1 can interact and colocalize in both serum-fed and serum-depleted conditions. We think it may not be easy to assess a change in interaction with the co-immunoprecipitation assay, as the interactions occur in a test tube, which may not reflect the binding conditions inside the cells.

      Comment 2: Also, for the subdistal appendage localization of NCS-1 and C3ORF14, would this also change upon serum starvation? 

      We agree that it would be interesting to see whether the subdistal appendage localization changes upon serum starvation, as NCS1 may capture the ciliary vesicle at the subdistal appendages as we discussed. However, the loss of the subdistal appendage protein, CEP128, blocks subdistal appendage localization of CEP89 [PMID: 32242819] without affecting cilium formation [PMID: 27818179]. This suggests that the subdistal appendage localization of NCS1 or C3ORF14 is likely dispensable for cilium formation.

      Comment 3: For the ciliation results and the recruitment of IFT88 in CEP89 knockout cell lines, this contradicts previous work from Tanos et al (PMID: 23348840), as well as Hou et al (PMID: 36669498). A parallel comparison using siRNA, a transient knockout system, or a degron system would help understand this. A similar point goes for Figure 4, where the effect on ciliogenesis is minimal in knockout cells, but acute siRNA has been shown to have a stronger phenotype. 

      Hou et al. [PMID: 36669498] investigated the role of the distal appendage proteins CEP164, CEP89, and FBF1 in the ciliated chordotonal organ of Drosophila melanogaster by generating knockout Drosophila strains. The results were markedly different from what was observed in mammalian cells. Notably, CEP164 is not required for cilium formation, and CEP89 is required for FBF1 localization in the animal. CEP89 was required for cilium formation in the cells of the ciliated chordotonal organ, whose cilium formation depends on the IFT machinery. They did not show whether IFT centriolar recruitment is affected in the CEP89 mutant cells. These differences likely reflect the divergence of the organization of distal appendages during evolution.

      The ciliation phenotype of our CEP89 knockout cells is milder than what was shown in Tanos et al. [PMID: 23348840], but largely consistent with the results from the Bornens group, which used siRNA to deplete CEP89 [PMID: 23789104]. In addition, NCS1 knockout cells showed a very similar phenotype to the CEP89 knockout cells, and relatively acute deletion of NCS1 (14 days after infection with the lentivirus containing sgNCS1, without single-cell cloning) displayed an almost identical ciliation defect (Figure 4B-C). Thus, we believe CEP89 is only partially required for cilium formation in RPE-hTERT cells and that the differences are more technical than definitive.

      Comment 4: An elegant phenotype rescue is shown in Figure 5. An interesting question would be, how does this mutant and/or the myristoylation affect the recruitment of C3ORF14? 

      NCS1 is not required for the localization of C3ORF14 (Figure 2M; Figure 2-figure supplement 2C), so we can assume that the myristoylation-defective mutant does not affect C3ORF14 recruitment.

      Comment 5: For the EF-hand mutants, it would be good to use control mutants, from known Ca2+ binding proteins as a control for the experiment shown. 

      In Figure 5-figure supplement 1A-C, we generated a series of EF-hand mutants of NCS1 to see if calcium binding affects the CEP89 interaction, NCS1 localization, and cilium formation. NCS1 is the only protein among the calcium-binding NCS family proteins that was found as a positive hit in the mass spec data of the CEP89 tandem affinity purification. Therefore, we cannot use other NCS family proteins as a control for CEP89 binding, NCS1 localization, and cilium formation.

    1. Reviewer #2 (Public review):

      This revised manuscript mostly addresses previous concerns by doubling down on the model without providing additional direct evidence of interactions between Srs2 and PCNA, and that "precise sites of Srs2 actions in the genome remain to be determined." One additional Srs2 allele has been examined, showing some effect in combination with rfa1-zm2.

      Many of the conclusions are based on reasonable assumptions about the consequences of various mutations, but direct evidence of changes in Srs2 association with PCNA or other interactors is still missing. There is an assumption that a deletion of a Rad51-interacting domain or a PCNA-interacting domain has no pleiotropic effects, which may not be the case. How SLX4 might interact with Srs2 is unclear to me, again assuming that the SLX4 defect is "surgical" - removing only one of its many interactions.

      One point of concern is the use of t-tests without some sort of correction for multiple comparisons - in several figures. I'm quite sceptical about some of the p < 0.05 calls surviving a Bonferroni correction. Also in 4B, which comparison is **? Also, admittedly by eye, the changes in "active" Rad53 seem much greater than 5x. (also in Fig. 3, normalizing to a non-WT sample seems odd).
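
      The multiple-comparison concern above can be illustrated with a minimal sketch. The p-values below are hypothetical and are not taken from the manuscript; the function name `bonferroni` is likewise an illustrative assumption. With m comparisons, the Bonferroni procedure tests each p-value against alpha / m, which is equivalent to multiplying each p-value by m (capped at 1) and comparing it to alpha.

```python
def bonferroni(p_values, alpha=0.05):
    """Apply a Bonferroni correction to a list of raw p-values.

    Returns one (raw_p, adjusted_p, significant) tuple per input value.
    """
    m = len(p_values)        # number of comparisons made
    threshold = alpha / m    # per-comparison significance threshold
    results = []
    for p in p_values:
        adjusted = min(p * m, 1.0)  # adjusted p-value, capped at 1
        results.append((p, adjusted, p <= threshold))
    return results


# Hypothetical raw p-values from four pairwise t-tests (illustration only).
raw = [0.01, 0.04, 0.03, 0.002]
for p, adj, sig in bonferroni(raw):
    print(f"raw p = {p:.3f}, adjusted p = {adj:.3f}, significant = {sig}")
```

      In this hypothetical set, all four raw p-values fall below 0.05, but only two remain below the corrected threshold of 0.0125, which is the kind of attrition the comment anticipates. Bonferroni is conservative; less stringent procedures such as Holm's step-down method are common alternatives.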

      What is the WT doubling time for this strain? From the FACS it seems as if in 2 h the cells have completed more than 1 complete cell cycle. Also in 5D. Seems fast...

      I have one over-arching confusion. Srs2 was shown initially to remove Rad51 from ssDNA and the suppression of some of srs2's defects by deleting rad51 made a nice, compact story, though exactly how srs2's "suppression of rad6" fit in isn't so clear (since Rad6 ties into Rad18 and into PCNA ubiquitylation and into PCNA SUMOylation). Now Srs2 is invoked to remove RPA. It seems to me that any model needs to explain how Srs2 can be doing both. I assume that if RPA and Rad51 are both removed from the same ssDNA, the ssDNA will be "trashed" as suggested by Symington's RPA depletion experiments. So building a model that accounts for selective Srs2 action at only some ssDNA regions might be enhanced by also explaining how Rad51 fits into this scheme.

      As a previous reviewer has pointed out, CPT creates multiple forms of damage. Foiani showed that 4NQO would activate the Mec1/Rad53 checkpoint in G1- arrested cells, presumably because there would be single-strand gaps but no DSBs. Whether this would be a way to look specifically at one type of damage is worth considering; but UV might be a simpler way to look.

      As also noted, the effects on the checkpoint and on viability are quite modest. Because it isn't clear (at least to me) why rfa1 mutants are so sensitive to CPT, it's hard for me to understand how srs2-zm2 has a modest suppressive effect: is it by changing the checkpoint response or facilitating repair or both? Or how srs2-3KR or srs2-dPIM differ from Rfa1-zm2 in this respect. The authors seem to lump all these small suppressions under the rubric of "proper levels of RPA-ssDNA" but there are no assays that directly get at this. This is the biggest limitation.

      Srs2 has also been implicated as a helicase in dissolving "toxic joint molecules" (Elango et al. 2017). Whether this activity is changed by any of the mutants (or by mutations in Rfa1) is unclear. In their paper, Elango et al. write: "Rare survivors in the absence of Srs2 rely on structure-specific endonucleases, Mus81 and Yen1, that resolve toxic joint-molecules." Given the involvement of SLX4, perhaps the authors should examine the roles of structure-specific nucleases in CPT survival?

      Experiments that might clarify some of these ambiguities are proposed to be done in the future. For now, we have a number of very interesting interactions that may be understood in terms of a model that supposes discriminating among gaps and ssDNA extensions by the presence of PCNA, perhaps modified by SUMO. As noted above, it would be useful to think about the relation to Rad6.

    1. Watch out for metaphors, slang, and figurative language that simply have no meaning to non-native speakers of English. Many American expressions have to do with sports—everything from poker to football—and have no significance to those who have not grown up around those sports. Some of our expressions are actually racist or have a racist past, without our knowing or recognizing it because we do not know the origin of the phrase. Even a phrase that seems innocuous such as “bury the hatchet” could be viewed as culturally insensitive to Native Americans. If you use it, you are referring (inadvertently) to ethnic stereotypes as well as using references that non-U.S. cultures would not understand.

      Just because we think we know the meaning of a commonly used phrase or expression, it does not mean everyone will know it or interpret it the same way. For example, non-native English speakers may not fully understand the meaning of some expressions or slang in English. Also, we may not know where some of these expressions or slang terms originate. It is always important to be sensitive to the reader.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Using a knock-out mutant strain, the authors tried to decipher the role of the last gene in the mycofactocin operon, mftG. They found that MftG was essential for growth in the presence of ethanol as the sole carbon source, but not for the metabolism of ethanol, evidenced by the equal production of acetaldehyde in the mutant and wild type strains when grown with ethanol (Fig 3). The phenotypic characterization of ΔmftG cells revealed a growth-arrest phenotype in ethanol, reminiscent of starvation conditions (Fig 4). Investigation of cofactor metabolism revealed that MftG was not required to maintain redox balance via NADH/NAD+, but was important for energy production (ATP) in ethanol. Since mycobacteria cannot grow via substrate-level phosphorylation alone, this pointed to a role of MftG in respiration during ethanol metabolism. The accumulation of reduced mycofactocin points to impaired cofactor cycling in the absence of MftG, which would impact the availability of reducing equivalents to feed into the electron transport chain for respiration (Fig 5). This was confirmed when looking at oxygen consumption in membrane preparations from the mutant and wild type strains with reduced mycofactocin electron donors (Fig 7). The transcriptional analysis supported the starvation phenotype, as well as perturbations in energy metabolism, and may be beneficial if described prior to respiratory activity data.

      The data and conclusions support the role of MftG in ethanol metabolism.

      We thank the reviewer for the positive evaluation of our manuscript.

      Reviewer #3 (Public review):

      Summary:

      The work by Graca et al. describes a GMC flavoprotein dehydrogenase (MftG) in the ethanol metabolism of mycobacteria and provides evidence that it shuttles electrons from the mycofactocin redox cofactor to the electron transport chain.

      Strengths:

      Overall, this study is compelling, exceptionally well designed and thoroughly conducted. An impressively diverse set of different experimental approaches is combined to pin down the role of this enzyme and scrutinize the effects of its presence or absence in mycobacteria cells growing on ethanol and other substrates. Other strengths of this work are the clear writing style and stellar data presentation in the figures, which makes it easy also for non-experts to follow the logic of the paper. Overall, this work therefore closes an important gap in our understanding of ethanol oxidation in mycobacteria, with possible implications for the future treatment of bacterial infections.

      Weaknesses:

      I see no major weaknesses of this work, which in my opinion leaves no doubt about the role of MftG.

      We thank the reviewer for the positive evaluation of our manuscript.

      Reviewer #4 (Public review):

      Summary:

      The manuscript by Graça et al. explores the role of MftG in the ethanol metabolism of mycobacteria. The authors hypothesise that MftG functions as a mycofactocin dehydrogenase, regenerating mycofactocin by shuttling electrons to the respiratory chain of mycobacteria. Although the study primarily uses M. smegmatis as a model microorganism, the findings have more general implications for understanding mycobacterial metabolism. Identifying the specific partner to which MftG transfers its electrons within the respiratory chain of mycobacteria would be an important next step, as pointed out by the authors.

      Strengths:

      The authors have used a wide range of tools to support their hypothesis, including co-occurrence analyses, gene knockout and complementation experiments, as well as biochemical assays and transcriptomics studies.

      An interesting observation is that the mftG deletion mutant grown on ethanol as the sole carbon source exhibited a growth defect resembling a starvation phenotype.

      MftG was shown to catalyse the electron transfer from mycofactocinol to components of the respiratory chain, highlighting the flexibility and complexity of mycobacterial redox metabolism.

      Weaknesses:

      Could the authors elaborate more on the differences between the WT strains in Fig. 3C and 3E? In Fig. 3C, the ethanol concentration for the WT strain is similar to that of WT-mftG and ∆mftG-mftG, whereas the acetate concentration in the WT strain differs significantly from the other two strains. How does this observation relate to ethanol oxidation, as indicated on page 12?

      This is a good question, and we agree with the reviewer that the sum of processes leading to the experimental observations shown in Figure 3 is not completely understood. For instance, when looking at ethanol concentrations, evaporation is a dominating effect and the situation is furthermore confounded by the fact that the rate of ethanol evaporation appears to be inversely correlated to the optical density of the samples (see Figure 3E and compare media control as well as the samples of ∆mftG and ∆mftG at OD<sub>600</sub> = 1). Additionally, the growth rate and thus the OD<sub>600</sub> of all strains monitored are different at each time point, thus further complicating the analysis. This is why we assume that the rate of ethanol oxidation is mirrored more clearly by acetate formation, at least in the early phase before 48 h (Figure 3E), i.e., before acetate consumption becomes dominant in ∆mftG-mftG and WT-mftG. Here, we see that the rate of acetate formation is zero for media controls, low for ∆mftG, but high for WT as well as ∆mftG-mftG and WT-mftG. The latter two strains also showed an earlier starting point of growth as well as acetate formation and the following phase of acetate depletion.

      All of these observations are in line with our general statement, i.e., “Parallel to the accelerated and enhanced growth described above (Figure 3A), the overexpression strains displayed higher rates of ethanol consumption as well as an earlier onset of acetate overflow metabolism and acetate consumption (Figure 3D).” We are still convinced that this summary describes the findings well and avoids unnecessary speculation.

      The authors conclude from their functional assays that MftG catalyses single-turnover reactions, likely using FAD present in the active site as an electron acceptor. While this is plausible, the current experimental setup doesn't fully support this conclusion, and the language around this claim should be softened.

      This is a fair point. We revised our claim accordingly. In particular, we changed:

      Page 28: we added “possibly”

      Page 28: we changed “single-turnover reactions” to “reactions reminiscent of a single-turnover process”.

      The authors suggest in the manuscript that the quinone pool (page 24) may act as the electron acceptor from mycofactocinol, but later in the discussion section (page 30) they propose cytochromes as the potential recipients. If the authors consider both possibilities valid, I suggest discussing both options in the manuscript.

      This is true. However, no change to the manuscript is necessary, since both options were discussed on page 30.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      It is appreciated that the authors addressed some of the original recommendations, e.g. the title change. Other recommendations that were not adequately addressed would mostly improve the clarity and help comprehension for the reader, but they are at the authors' discretion.

      Reviewer #3 (Recommendations for the authors):

      Abstract: "Here, we show that MftG enzymes strictly require mft biosynthetic genes and are found in 75% of organisms harboring these genes". I read this sentence several times and I am still somewhat confused and not sure what exactly is meant here. I suggest to rephrase, e.g., to "Here, we show that in 75% of all organisms that harbour the mft biosynthetic genes, MftG enzymes are also encoded and functionally associated with these genes" (if that was meant; also the abbreviation mft should be introduced in the abstract or otherwise the full name be used).

      We thank the reviewer for the good hint. We changed the sentence to “Here, we show that MftG enzymes are almost exclusively found in genomes containing mycofactocin biosynthetic genes and are present in 75% of organisms harboring these genes”.

      p.3, 2nd paragraph: "Although the role of MFT in alcohol metabolism is well established, further biological roles of mycofactocin appear to exist." Mycofactocin is once written as MFN and once in full length, which is slightly confusing. Consider rephrasing, e.g., to "...further biological roles of this cofactor appear to exist".

      Thank you, we adopted the suggested change.

      Fig. 1: Consider adding MftG in brackets after "mycofactocin dehydrogenase" in panel B.

      Good suggestion. We added (MftG) to the figure.

      Fig. 3: Legend should be corrected. The color of the signs should be teal diamond for "M. smegmatis double presence of the mftG gene" and orange upward facing triangle for "Medium with 10 g L-1 of ethanol without bacterial inoculation". Aside from the coloration, the order should ideally also be identical to the one shown in the upper right part.

      Thank you for the valuable hint! We corrected the legend and unified the legends in the figure caption and figure.

      p.20 : It is not exactly clear to me why "semipurified cell-free extracts from M. smegmatis ∆mftG-mftGHis6 " were used here rather than the purified enzyme. Was the purification by HisTrap columns not feasible or was the protein unstable when fully purified? In any case, it would help the reader to quickly state the reason in this section.

      Indeed, the problem with M. smegmatis as an expression host was a combination of low protein yield and poor binding to Ni-NTA columns. In E. coli, poor expression, low solubility or poor binding was the issue. Unfortunately, the use of other affinity tags resulted in either poor expression or inactive protein. We have briefly mentioned the major issues on page 21 and prefer not to focus on failed attempts too much.

      p. 21: "We, therefore, concluded that MftG can indeed interact with mycofactocins as electron donors but might require complex electron acceptors, for instance, proteins present in the respiratory chain." I agree. For the future it might be worthwhile to determine the redox potential of MftG, which could provide hints on the natural electron acceptor.

      Thank you for the suggestion. We will consider this question in our future work.

      p. 23: "In M. smegmatis, cyanide is a known inhibitor of the cytochrome bc/aa3 but not of cytochrome bd (34), therefore, the decrease of oxygen consumption when MFTs were added to the membrane fractions in combination with KCN (Figure 7), revealed that MFT-induced oxygen consumption is indeed linked to mycobacterial respiration." It might be a good idea to quickly recapitulate the functions of these cytochromes here. Also, I think it should read "bc1aa3" (also correct in legend of Fig. 8 that says "bcc-aa3").

      Thank you for the good observation. We changed all instances to the correct designation (bc1-aa3).

      Reviewer #4 (Recommendations for the authors):

      Abstract: revise the wording "MftG enzymes strictly require mft biosynthetic genes". It should be either mftG gene with the mft biosynthetic genes or MftG enzyme with the Mft biosynthetic proteins. I also suggest replacing "require" with a more appropriate term.

      This was taken care of. See above.

      Page 3, end of the first paragraph; does the alcohol dehydrogenase refer to Mno/Mdo?

      Partially, yes, but also to other alcohol dehydrogenases.

      Page 4, radical SAM; define upon first use

      Good point, we changed “radical SAM” to “radical S-adenosyl methionine (rSAM)”.

      Page 6; Rossman fold refers to the fold and not only the FAD binding pocket.

      Good point. We deleted “(Rossman fold)”

      Page 11; not exactly sure what this means "the growth curve of the complemented strain, which could be dysregulated in mftG expression"

      By “dysregulated” expression, we mean that the expression of mftG could be higher or lower than in the WT and could follow different regulatory signals than in the wild type. Since this phenomenon is not well understood, we would like to avoid speculative discussions.

      Page 11; Figures 2E and 2C should be 3E and 3C. Likewise on page 12 Figure 2D.

      Thank you very much for the valuable hint. We corrected the figure numbers as suggested.

      Page 12; the last Figure 3D in the page should be 3E?

      Yes, good catch, we corrected the Figure number.

      Page 17, KO; define upon first use.

      Good suggestion, we changed both instances of “KO” to “knockout”.

      Page 24; revise: "for instance. For example"

      We deleted “for instance”.

      Page 26; change 6.506 to 6,506

      Corrected.

      Page 23; "In M. smegmatis, cyanide is a known inhibitor ..." is too long and not easy to understand/follow.

      Good suggestion. We simplified the sentence to “Therefore, the decrease of oxygen consumption in the presence of KCN (Figure 7) revealed…”

      Page 29; "single-turnover reactions could be observed". There are no experiments to support this statement, except the results shown in Figure 7F. I suggest softening the language, as it has been done on page 21. To claim single-turnover, a proper kinetic analysis would be necessary, which is not included in the current manuscript.

      This is true and has been taken care of. See above.

      Figure 1; Indicate mycofactocin dehydrogenase as MftG

      Done.

      Figure 5A; what is the significance of comparing ∆mftG glucose with WT ethanol?

      We agree that, although the difference between the two columns is significant, it does not have any relevant meaning. Therefore, we removed the bracket with the p-value in Panel A.

      Make HdB-Tyl/HdB-tyloxapol usage consistent throughout the document. Likewise, re the usage of mycobacteria/Mycobacteria/Mycobacteria

      Thank you for the valuable hint, we unified the usage throughout the document.

    1. Author response:

      Reviewer #1:

      Summary:

      Beyond what is stated in the title of this paper, not much needs to be summarized. eIF2A in HeLa cells promotes translation initiation of neither the main ORFs nor short uORFs under any of the conditions tested.

      Strengths:

      Very comprehensive, in fact, given the huge amount of purely negative data, an admirably comprehensive and well-executed analysis of the factor of interest.

      Weaknesses:

      The study is limited to the HeLa cell line, focusing primarily on KO of eIF2A and neglecting the opposite scenario, higher eIF2A expression which could potentially result in an increase in non-canonical initiation events.

      We thank the reviewer for the positive evaluation. As suggested by the reviewer in the detailed recommendations, we will clarify in the title, abstract and text that our conclusions are limited to HeLa cells. Furthermore, as suggested we will test the effect of eIF2A overexpression on the luciferase reporter constructs, and will upload a revised manuscript.

      Reviewer #2:

      Summary

      Roiuk et al. describe a work in which they have investigated the role of eIF2A in translation initiation in mammals without much success. Thus, the manuscript focuses on negative results. Further, the results, while original, are generally not novel, but confirmatory, since related claims have been made before independently in different systems, with the Gaikwad et al. study recently published in eLife being the most relevant.

      Despite this, we find this work highly important. This is because of a massive wealth of unreliable information and speculations regarding the role of eIF2A in translation, arising from a series of artifacts that began at the moment of eIF2A discovery. This, in combination with its unfortunate naming (eIF2A is often mixed up with the alpha subunit of eIF2, eIF2S1), has generated widespread confusion among researchers who are not experts in eukaryotic translation initiation. Given this, it is not only justifiable but critical to make independent efforts to clear up this confusion and I very much appreciate the authors' efforts in this regard.

      Strengths

      The experimental investigation described in this manuscript is thorough, appropriate and convincing.

      Weaknesses

      However, we are not entirely satisfied with the presentation of this work which we think should be improved.

      We thank the reviewer for the positive evaluation. We will revise the manuscript according to the reviewer's suggestions made in the detailed recommendations.

      Reviewer #3:

      Summary:

      This is a valuable study providing solid evidence that the putative non-canonical initiation factor eIF2A has little or no role in the translation of any expressed mRNAs in cultured human (primarily HeLa) cells. Previous studies have implicated eIF2A in GTP-independent recruitment of initiator tRNA to the small (40S) ribosomal subunit, a function analogous to canonical initiation factor eIF2, and in supporting initiation on mRNAs that do not require scanning to select the AUG codon or that contain near-cognate start codons, especially upstream ORFs with non-AUG start codons, and may use the cognate elongator tRNA for initiation. Moreover, the detected functions for eIF2A were limited to, or enhanced by, stress conditions where canonical eIF2 is phosphorylated and inactivated, suggesting that eIF2A provides a back-up function for eIF2 in such stress conditions. CRISPR gene editing was used to construct two different knock-out cell lines that were compared to the parental cell line in a large battery of assays for bulk or gene-specific translation in both unstressed conditions and when cells were treated with inhibitors that induce eIF2 phosphorylation. None of these assays identified any effects of eIF2A KO on translation in unstressed or stressed cells, indicating little or no role for eIF2A as a back-up to eIF2 and in translation initiation at near-cognate start codons, in these cultured cells.

      The study is very thorough and generally well executed, examining bulk translation by puromycin labeling and polysome analysis and translational efficiencies of all expressed mRNAs by ribosome profiling, with extensive utilization of reporters equipped with the 5'UTRs of many different native transcripts to follow up on the limited number of genes whose transcripts showed significant differences in translational efficiencies (TEs) in the profiling experiments. They also looked for differences in translation of uORFs in the profiling data and examined reporters of uORF-containing mRNAs known to be translationally regulated by their uORFs in response to stress, going so far as to monitor peptide production from a uORF itself. The high precision and reproducibility of the replicate measurements instil strong confidence that the myriad of negative results they obtained reflects the lack of eIF2A function in these cells rather than data that would be too noisy to detect small effects of the eIF2A mutations. They also tested and found no evidence for a recent claim that eIF2A localizes to the cytoplasm in stress and exerts a global inhibition of translation. Given the numerous papers that have been published reporting functions of eIF2A in specific and general translational control, this study is important in providing abundant, high-quality data to the contrary, at least in these cultured cells.

      Strengths:

      The paper employed two CRISPR knock-out cell lines and subjected them to a combination of high-quality ribosome profiling experiments, interrogating both main coding sequences and uORFs throughout the translatome, which was complemented by extensive reporter analysis, and cell imaging in cells both unstressed and subjected to conditions of eIF2 phosphorylation, all in an effort to test previous conclusions about eIF2A functioning as an alternative to eIF2.

      Weaknesses:

      There is some question about whether their induction of eIF2 phosphorylation using tunicamycin was extensive enough to state forcefully that eIF2A has little or no role in the translatome when eIF2 function is strongly impaired. Also, similar conclusions regarding the minimal role of eIF2A were reached previously for a different human cell line from a study that also enlisted ribosome profiling under conditions of extensive eIF2 phosphorylation; although that study lacked the extensive use of reporters to confirm or refute the identification by ribosome profiling of a small group of mRNAs regulated by eIF2A during stress.

      We thank the reviewer for the positive evaluation. We will revise the manuscript according to the detailed recommendations. Regarding the two points mentioned here:

      (1) The detected eIF2alpha phosphorylation does not increase appreciably because the antibody is unfortunately very poor. That our treatment induces the Integrated Stress Response (ISR) can be seen, for instance, from the strong increase in ATF4 protein levels in the very same samples where eIF2alpha phosphorylation does not increase much (Suppl. Fig. 5E). We will strengthen the conclusion that the ISR is indeed activated with additional experiments/data as suggested by the reviewer.

      (2) We agree that our results are in line with results from the previous study mentioned by the reviewer, so we will revise the manuscript to mention this other study more extensively in the discussion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Review:

      The overall goal of this manuscript is to understand how Notch signaling is activated in specific regions of the endocardium, including the OFT and AVC, that undergo EMT to form the endocardial cushions. Using dofetilide to transiently block circulation in E9.5 mice, the authors show that Notch receptor cleavage still occurs in the valve-forming regions due to mechanical shear stress as Notch ligand expression and oxygen levels are unaffected. The authors go on to show that changes in lipid membrane structure activate mTOR signaling, which causes phosphorylation of PKC and Notch receptor cleavage.

      The strengths of the manuscript include the dual pharmacological and genetic approaches to block blood flow in the mouse, the inclusion of many controls including those for hypoxia, the quality of the imaging, and the clarity of the text. However, several weaknesses were noted surrounding the main claims where the supporting data are incomplete.

      PKC - Notch1 activation:

      (1) Does deletion of Prkce and Prkch affect blood flow, and if so, might that be suppressing Notch1 activation indirectly?

      To address this concern, we performed echocardiography of Prkce<sup>+/-</sup>;Prkch<sup>+/-</sup>, Prkce<sup>-/-</sup>;Prkch<sup>+/-</sup>, and Prkce<sup>+/-</sup>;Prkch<sup>-/-</sup> mouse hearts (Figure 3-supplement figure 2D), showing no significant effect in heartbeat and blood flow. (Line 308)

      (2) It would be helpful to visualize the expression of prkce and prkch by in situ hybridization in E9.5 embryos.

      We have now added immunofluorescence staining results for both PKCE and PKCH, as shown in Figure 3-supplement figure 2B. In the E9.5 embryonic heart, PKCH is mainly expressed in the endocardium overlying the AV canal and the base of the trabeculae, overlapping with the expression pattern of NICD and pPKC<sup>Ser660</sup>. PKCE is expressed in both endocardium and myocardium. In the endocardium, PKCE is mainly expressed in the region overlying the AV canal. (Line 312-314)

      (2) PMA experiments: Line 223-224: A major concern is related to the conclusion that "blood flow activates Notch in the cushion endocardium via the mTORC2-PKC signaling pathway". To make that claim, the authors show that a pharmacological activation with a potent PKC activator, PMA, rescues NICD levels in the AVC in dofetilide-treated embryos. This claim would also need proof that a lack of blood flow alters the activity of mTORC2 to phosphorylate the targets of PKC phosphorylation. Also, this observation does not explain the link between PKC activity and Notch activation.

      Both AKT Ser473 and PKC Ser660 are well-characterized phosphorylation sites regulated by mTORC2 (Baffi TR et al., mTORC2 controls the activity of PKC and Akt by phosphorylating a conserved TOR interaction motif. Sci Signal. 2021;14.). pAKT<sup>Ser473</sup> is widely used as an indicator of mTORC2 activity. Therefore, the reduced staining intensity of pAKT<sup>Ser473</sup> and pPKC<sup>Ser660</sup> observed in the dofetilide-treated embryos should reflect the reduced activity of their common upstream activator mTORC2. This information is provided in Line 317-321.

      As PMA is a well-characterized specific activator of PKC, we believe the rescue of NICD by PMA could explain the link between PKC activity and Notch activation.

      (3) In addition, the authors hypothesise that shear stress lies upstream of PKC and Notch activation, and that because shear stress is highest at the valve-forming regions, PKC and Notch activity is localised to the valve-forming regions. Since PMA treatment affects the entire endocardium which expresses Notch1, NICD should be seen in areas outside of the AVC in the PMA+dofetilide condition. Please clarify.

      As shown in Figure 3C and Figure 3-supplement figure 2B, pPKC, PKCH and PKCE expression are all confined to the AVC region. This explains why PMA activates NICD specifically in the valve-forming region. This information is added in Line 312-314.

      Lipid Membrane:

      (1) It is not clear how the authors think that the addition of cholesterol changes the lipid membrane structure or alters Cav-1 distribution. Can this be addressed? Does adding cholesterol make the membrane more stiff? Does increased stiffness result from higher shear stress?

      We do not know exactly how the addition of cholesterol alters membrane structure and influences mTORC2-PKC-Notch signaling. As cholesterol is an important component of lipid rafts and caveolae, it is possible that enrichment of cholesterol alters the membrane structure so that the lipid raft structure becomes less dependent on shear stress. This hypothesis needs to be tested in further in vitro studies. This information is added to Line 433-436.

      (2) The loss of blood flow apparently affects Cav1 membrane localization and causes a redistribution from the luminal compartment to lateral cell adhesion sites. Cholesterol treatment of dofetilide-treated hearts (lacking blood flow) rescued Cav1 localization to luminal membrane microdomains and rescued NICD expression. It remains unclear how the general addition of cholesterol would result in a rescue of regionalized membrane distribution within the AVC and in high-shear stress areas.

      We do not know the exact mechanism. As replied in the previous question, future cell-based work is needed to address these important questions. (Line 433-436)

      (3) The authors do not show the entire heart in that rescue treatment condition (cholesterol in dofetilide-treated hearts). Also, there is no quantification of that rescue in Figure 4B. Currently, only overview images of the heart are shown but high-resolution images on a subcellular scale (such as electron microscopy) are needed to resolve and show membrane microdomains of caveolae with Cav1 distribution. This is important because Cav-1 could have functions independent of caveolae.

      In Figure 4C, most panels display a large part of the heart, including the AVC, atrium and ventricle. The images in the third column appeared to be more restricted to the AVC. We have now replaced these images to reveal the AVC and part of the atrium and ventricle.

      The quantification has also been provided in Figure 4C. We also added a new panel of scanning EM of AVC endocardium, showing numerous membrane invaginations on the luminal surface of the endocardial cells. The size of the invaginations ranges from 50 to 100 nm, consistent with the reported size of caveolae. Dofetilide significantly reduced the number of membrane invaginations, which recovered after restoration of blood flow at 5 hours post dofetilide treatment. The reduction of membrane invaginations could also be rescued by ex vivo cholesterol treatment. This information is added to Line 342-349.

      Figure Legends, missing data, and clarity:

      (1) The number of embryos used in each experiment is not clear in the text or figure legends. In general, figure legends are incomplete (for instance in Figure 1).

      Thank you for the reminder. We have now added the numbers of embryos to the figure legends.

      (2) Line 204: The authors refer to unpublished endocardial RNAseq data from E9.5 embryos. These data must be provided with this manuscript if it is referred to in any way in the text.

      The RNAseq data of PKC isoforms is now provided in Figure3-Figure supplement 2A, Line 301-302.

      (3) Figure 1 shows Dll4 transcript levels, which do not necessarily correlate with protein levels. It would be important to show quantifications of these patterns as Notch/Dll4 levels are cycling and may vary with time and between different hearts.

      The Dll4 immuno-staining in Figure 1B,C is indeed Dll4 protein, not transcript. The quantification is added in Figure 1—Figure supplement 1C. Line 215.

      (4) Line 212-214: The authors describe cardiac cushion defects due to the loss of blood flow and refer to some quantifications that are not completely shown in Figure 3. For instance, quantifications for cushion cellularity and cardiac defects at three hours (after the start of treatment?) are missing.

      The formation of the defects is a developmental process and time dependent. To address this concern, we quantified the cushion cellularity at 5 hours post dofetilide treatment and showed that cell density significantly decreased in the dofetilide treated embryos, albeit less pronounced than the difference at E10.5. (Line 256-257)

      (5) Related to Figure 5. The work would be strengthened by quantification of the effects of dofetilide and verapamil on heartbeat at the doses applied. Is the verapamil dosage used here similar to the dose used in the clinic?

      We are grateful for this suggestion. The effect of dofetilide on heartbeat has already been shown in Figure 2A. We have now additionally measured the heartbeat rate of verapamil-treated embryos, and provided the results in Figure 5E. For verapamil injection in mice, a single i.p. dose of 15 mg/kg was used, which is equivalent to 53 mg/m<sup>2</sup> body surface. Verapamil is used in the clinic at dosages ranging from 200 to 480 mg/day, equivalent to 3.33 - 8 mg/kg or 117 - 282 mg/m<sup>2</sup> body surface. Therefore, the dosage used in the mouse is not excessively high compared to clinical use. (Line 361-365)
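
      The clinical figures quoted above can be checked with a short calculation. The sketch below is only illustrative: it assumes a 60 kg adult with a body surface area of 1.7 m<sup>2</sup>, values that the response does not state but that approximately reproduce the quoted 3.33 - 8 mg/kg and 117 - 282 mg/m<sup>2</sup> ranges. The function names are hypothetical, and the mouse conversion is not reproduced because the surface-area factor used for it is not given.

```python
# Illustrative check of the clinical verapamil dose conversions.
# ASSUMPTIONS (not stated in the response): 60 kg adult, 1.7 m^2 body surface area.
ASSUMED_WEIGHT_KG = 60.0
ASSUMED_BSA_M2 = 1.7


def per_kg(daily_dose_mg, body_weight_kg):
    """Convert a daily dose in mg to mg per kg of body weight."""
    return daily_dose_mg / body_weight_kg


def per_m2(daily_dose_mg, body_surface_m2):
    """Convert a daily dose in mg to mg per m^2 of body surface."""
    return daily_dose_mg / body_surface_m2


# Clinical daily dose range quoted in the response (mg/day).
for dose in (200.0, 480.0):
    print(
        f"{dose:.0f} mg/day -> "
        f"{per_kg(dose, ASSUMED_WEIGHT_KG):.2f} mg/kg, "
        f"{per_m2(dose, ASSUMED_BSA_M2):.1f} mg/m^2"
    )
```

      With different assumed body weight or surface area the converted ranges shift accordingly, so the comparison with the mouse dose should be read as an order-of-magnitude check.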

      Overstated Claims:

      (1) The authors claim that the lipid microstructure/mTORC2/PKC/Notch pathway is responsive to shear stress, rather than other mechanical forces or myocardial function. Their conclusions seem to be extrapolated from various in vitro studies using non-endocardial cells. To solidify this claim, the authors would need additional biomechanical data, which could be obtained via theoretical modelling or using mouse heart valve explants. This issue could also be addressed by the authors simply softening their conclusions.

      We agree with the reviewer’s comment. We have now revised the statement as “Our data support a model that membrane lipid microdomain acts as a shear stress sensor and transduces the mechanical cue to activate intracellular mTORC2-PKC-Notch signaling pathway in the developing endocardium. (line 416-418) It is noteworthy that the methodology used to alter blood flow in this study inevitably affects myocardial contraction. Additional work to uncouple shear stress with other changes of mechanical properties of the myocardium with the aid of theoretical modelling or using mouse heart valve explants is needed to fully characterize the effect of shear stress on mouse endocardial development.” (Line 436-440)

      (2) Line 263-264: In the discussion, the authors conclude that "Strong fluid shear stress in the AVC and OFT promotes the formation of caveolae on the luminal surface of the endocardial cells, which enhances PKCε phosphorylation by mTORC2." This link was shown rather indirectly, rather than by direct evidence, and therefore the conclusion should be softened. For example, the authors could state that their data are consistent with this model.

      We have revised the statement as “Strong fluid shear stress in the AVC and OFT enhances PKC phosphorylation by mTORC2 possibly by maintaining a particular membrane microstructure.” (Line 372-374)

      (3) In the Discussion, it says: "Mammalian embryonic endocardium undergoes extensive EMT to form valve primordia while zebrafish valves are primarily the product of endocardial infolding (Duchemin et al., 2019)." In the paper cited, Duchemin and colleagues described the formation of the zebrafish outflow tract valve. The zebrafish atrioventricular valve primordia is formed via partial EMT through Dll-Notch signaling (Paolini et al. Cell Reports 2021) and the collective cell migration of endocardial cells into the cardiac jelly. Then, a small subset of cells that have migrated into the cardiac jelly give rise to the valve interstitial cells, while the remainder undergo mesenchymal-to-endothelial transition and become endothelial cells that line the sinus of the atrioventricular valve (Chow et al., doi: 10.1371/journal.pbio.3001505). The authors should modify this part of the Discussion and cite the relevant zebrafish literature.

      Thank you for the valuable comments. We have now revised the statement as “Mammalian embryonic endocardium undergoes extensive EMT to form valve primordia while zebrafish atrioventricular valve primordia is formed via partial EMT and the collective cell migration of endocardial cells into the cardiac jelly followed by tissue sheet delamination.” with relevant references added. (Line 411-414)

      Recommendations to the Authors:

      (1) One issue that the authors could address is the organization of figures. There are several cases where positive data that are central to the conclusions are placed in the supplement and should be moved to the main figures. Places where this occurred are listed below:

      - The Tie2 conditional deletion of Dll4 showing retention of NICD in the OFT and AVC regions is highly supportive of the model. The authors should consider moving these data to main Figure 1.

      Thanks for the suggestion. We have reorganized the figure as requested.

      - The ligand expression data in Figure 2- Supplement Figure 1 A is VERY important to the conclusions drawn from the dofetilide treatment. The authors should move these data to main Figure 2.

      The ligand expression data in Figure 2- Supplement Figure 1A are now moved to Figure 2B.

      - In Figure 3A - the area in the field of view should be stated in the Figure (is it the AVC?). Figure 3 - Supplement 1 proximal OFT data should be moved to main Figure 3 as it is central to the conclusions. Negative DA data can be left in the supplement. Again, for Figure 3 - Supplement 1 Staurosporine treatment data should be moved to the main figure as it is positive data that are central to the conclusions.

      Thanks for the suggestion. We have reorganized the figure as requested.

      (2) Antibody used for Twist1 detection is not listed in the resource table.

      The Twist1 antibody was purchased from Abcam; the detailed information is now available in the resource table.

      (3) Missing arrowhead in Figure 4A, last row.

      We apologize for the oversight. The arrowhead has now been added.

      (4) Line 286. "OFT" pasted on the word "endothelium".

      “OFT” is now removed.

      (5) Related to Figure 2C. The fast response of NICD to flow cessation was used as an argument to support post-translational modification. It is not clear why Sox9 and Twist1 expression also responds so quickly.

      Sox9 and Twist1 expression does seem to respond very quickly. Whether additional regulatory pathways, such as Wnt or Vegf signaling, also respond to shear stress needs to be investigated in the future.

      (6) Line 200: The sentence should end with a period.

      Sorry for the oversight. It is now corrected.

      (7) Lines 34 to 35: the authors phrase that Notch is "allowed" to be specifically activated in the AVC and outflow tract by shear stress.

      We have rephrased the statement with “enabling Notch to be specifically activated in AVC and OFT by regional increased shear stress.” (Line 27)

      (8) Lines 96-100: At the end of the introduction, the text is copied from the abstract. New text should be written or summarized in a different way.

      The last sentence of introduction is now changed to “The results uncovered a new mechanism whereby mechanical force serves as a primary cue for endocardial patterning in mammalian embryonic heart.” (Line 93-95)

      (9) Line 125: The term "agreed with the Dll4 transcript.." should be replaced with a better term like "overlapped" or "was identical with".

      The word “agreed” is now “overlapped”. (Line 219)

      (10) Line 291: "Thus, through these sophisticated mechanisms, the developing mouse hearts may achieve three purposes:"- The English should be adjusted here since it sounds like hearts are aiming to achieve a purpose, which is unlikely what was meant by the authors.

      This sentence is rephrased to “Thus, in the developing mouse hearts: (1) VEGF signaling is reduced to permit endocardial EMT; (2) Dll4 expression is reduced to prevent widespread endocardial Notch activation and make endocardium sensitive to flow; (3) a proper cushion size and shape is maintained by limiting the flanking endocardium to undergo EMT despite physically close to the field of BMP2 derived from AVC myocardium (Figure 6).” (Line 402-406)

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This work uses transgenic reporter lines to isolate entpd5a+ cells representing classical osteoblasts in the head and non-classical (osterix-) notochordal sheath cells. The authors also include entpd5a- cells, col2a1a+ cells to represent the closely associated cartilage cells. In a combination of ATAC and RNA-Seq analysis, the genome-wide transcriptomic and chromatin status of each cell population is characterized, validating their methodology and providing fundamental insights into the nature of each cell type, especially the less well-studied notochordal sheath cells. Using these data, the authors then turn to a thorough and convincing analysis of the regulatory regions that control the expression of the entpd5a gene in each cell population. Determination of transcriptional activities in developing zebrafish, again combined with ATAC data and expression data of putative regulators, results in a compelling and detailed picture of the regulatory mechanisms governing the expression of this crucial gene.

      Strengths:

      The major strength of this paper is the clever combination of RNA-Seq and ATAC analysis, further combined with functional transcriptional analysis of the regulatory elements of one crucial gene. This results in a very compelling story.

      Weaknesses:

      No major weaknesses were identified, except for all the follow-up experiments that one can think of, but that would be outside of the scope of this paper.

      Reviewer #2 (Public Review):

      Summary:

      Complementary to mammalian models, zebrafish has emerged as a powerful system to study vertebrate development and to serve as a go-to model for many human disorders. All vertebrates share the ancestral capacity to form a skeleton. Teleost fish models have been a key model to understand the foundations of skeletal development and plasticity, pairing with more classical work in amniotes such as the chicken and mouse. However, the genetic foundation of the diversity of skeletal programs in teleosts has been hampered by mapping similarities from amniotes back and not objectively establishing more ancestral states. This is most obvious in systematic, objective analysis of transcriptional regulation and tissue specification in differentiated skeletal tissues. Thus, the molecular events regulating bone-producing cells in teleosts have remained largely elusive. In this study, Petratou et al. leverage spatial experimental delineation of specific skeletal tissues -- that they term 'classical' vs 'non-classical' osteoblasts -- with associated cartilage of the endo/peri-chondrial skeleton and inter-segmental regions of the forming spine during development of the zebrafish, to delineate molecular specification of these cells by current chromatin and transcriptome analysis. The authors further show functional evidence of the utility of these datasets to identify functional enhancer regions delineating entpd5a expression in 'classical' or 'non-classical' osteoblast populations. By integration with paired RNA-seq, they delineate broad patterns of transcriptional regulation of these populations as well as specific details of regional regulation via predictive binding sites within ATACseq profiles. Overall the paper was very well written and provides an essential contribution to the field that will provide a foundation to promote modeling of skeletal development and disease in an evolutionary and developmentally informed manner.

      Strengths:

      Taken together, this study provides a comprehensive resource of ATAC-seq and RNA-seq data that will be very useful for a wide variety of researchers studying skeletal development and bone pathologies. The authors show specificity in the different skeletal lineages and show the utility of the broad datasets for defining regulatory control of gene regulation in these different lineages, providing a foundation for hypothesis testing of not only agents of skeletal change in evolution but also function of genes and variations of unknown significance as it pertains to disease modeling in zebrafish. The paper is excellently written, integrating a complex history and experimental analysis into a useful and coherent whole. The terminology of 'classical' and 'non-classical' will be useful for the community in discussing the biology of skeletal lineages and their regulation.

      Weaknesses:

      Two items arose that were not critical weaknesses but areas for extending the description of methods and integration into the existing data on the role of non-classical osteoblasts and establishment/canalization of this lineage of skeletal cells.

      (1) In reading the text it was unclear how specific the authors' experimental dissection of the head/trunk was in isolating different entpd5a osteoblast populations. Obviously, this was successful given the specificity in DEG of results, however, analysis of contaminating cells/lineages in each population would be useful - e.g. using specific marker genes to assess. The text uses terms such as 'specific to' and 'enriched in' without seemingly grounded meaning of the accuracy of these comments. Is it really specific - e.g. not seen in one or other dataset - or is there some experimental variation in this?

      We thank the reviewer for pointing this out. Given that the separation of head and trunk is done manually, there will be some experimental variability. We have used anatomical hallmarks (cleithrum and swim bladder), and therefore would expect the variability to be small. Regarding classical osteoblasts contaminating trunk tissue, head removal was consistently performed using the aforementioned anatomical hallmarks in a manner that ensures that the cleithrum does not remain in the trunk tissue. In order to alleviate concerns regarding trunk cell populations contaminating cranial populations, and to further clarify our strategy, we add the following statement to the Materials and Methods section: “The procedure does not allow for a complete separation of notochordal non-classical osteoblasts from cranial classical osteoblasts, as the notochord extends into the cranium. However, the amount of sheath cells in that portion of the notochord is negligible, compared both to the number of classical (cranial) osteoblasts in head samples, and to notochord cells isolated in trunk samples.”

      (2) Further, it would be valuable to discuss NSC-specific genes such as calymmin (Peskin 2020) which has species and lineage-specific regulation of non-classical osteoblasts likely being a key mechanistic node for ratcheting centra-specific patterning of the spine in teleost fishes. What are dynamics observed in this gene in datasets between the different populations, especially when compared with paralogues - are there obvious cis-regulatory changes that correlate with the co-option of this gene in the early regulation of non-classical osteoblasts? The addition of this analysis/discussion would anchor discussions of the differential between different osteoblasts lineages in the paper.

      This is an interesting idea that we will consider in a possible revision or, if it requires substantial additional effort, in a possible new line of research. It is an excellent starting point for further studies using our datasets.

      Reviewer #3 (Public Review):

      Summary:

      This study characterizes classical and nonclassical osteoblasts as both types were analyzed independently (integrated ATAC-seq and RNAseq). It was found that gene expression in classical and nonclassical osteoblasts is not regulated in the same way. In classical osteoblasts, Dlx family factors seem to play an important role, while Hox family factors are involved in the regulation of spinal ossification by nonclassical osteoblasts. In the second part of the study, the authors focus on the promoter structure of entpd5a. Through the identification of enhancers, they reveal complex modes of regulation of the gene. The authors suggest candidate transcription factors that likely act on the identified enhancer elements. All the results taken together provide comprehensive new insights into the process of bone development, and point to spatio-temporally regulated promoter/enhancer interactions taking place at the entpd5a locus.

      Strengths:

      The authors have succeeded in justifying a sound and consistent buildup of their experiments, and meaningfully integrating the results into the design of each of their follow-up experiments. The data are solid, insightfully presented, and the conclusion valid. This makes this manuscript of great value and interest to those studying (fundamental) skeletal biology.

      Weaknesses:

      The study is solidly constructed, the manuscript is clearly written and the discussion is meaningful - I see no real weaknesses.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor issues that may need to be addressed or detailed:

      Supplementary Figures 1I-J, text page 4, line 24: "photoconversion and imaging": this needs some more detailed description: green fluorescent cells should be actively expressing Kaede, but only if there is a delay between photoconversion and imaging. What is the reason that Supplementary Figure 1F shows mainly green fluorescent cells, contrary to 1G-J?

      In our experiments, we could see new Kaede expression under the control of the entpd5a promoter region within 1.5 hours of photoconversion, as shown in Suppl. Figure 1E-H, suggesting that this time window was sufficient for protein generation. We believe that Suppl. Fig. 1F shows more green fluorescence because of the high rate of transcriptional activity at that stage throughout the notochord progenitor cells. In addition, we attribute this effect to the relatively small number of cells producing red fluorescence at that stage, due to photoconversion of only a few Kaede+ cells at the 15 somites stage (Suppl. Fig. 1E). Therefore, the masking effect of the green fluorescence by the red is not as significant as in G and H, where the red fluorescence resulting from photoconversion right after imaging at 18s and 21s, respectively, significantly overlaps with new green fluorescence. This can be seen in the image as the presence of orange fluorescence in G and H, instead of the clear red shown in E, I and J.

      In addition to this, we would like to point out that in Suppl. Fig. 1I, J the reason that green fluorescence is only detected in the ventral region of the notochord, is because the promoter of entpd5a only remains active in the ventral-most sheath cells at that stage. This is stated in the results section of the main text, first subsection, paragraph 3. The reason for this very interesting, strictly localised expression pattern remains unclear.

      Somewhat intriguing: green fluorescence in Figure 1B, C (osx:GAL4FF) and Supplementary Figure 1C (entpd5a:GAL4FF) in the CNS? Would that be an artefact of the GAL4FF/UAS:GFP system?

      We are confident that the fluorescence pointed out by the reviewer is not an artefact of the GAL4FF/UAS system, for a few reasons. Firstly, osx (Sp7) has been shown to be expressed and to function in the nervous system in mice (Park et al, BBRC, 2011; Elbaz et al, Neuron, 2023). Secondly, not only osx, but also entpd5a can be readily detected in a subset of cranial and spinal neurons in early development using the entpd5a:GAL4FF; UAS:GFP transgenic line (Suppl. Fig 1C). Finally, when establishing transgenic lines with the entpd5a(1.1):GFP construct, expression was almost invariably present in diverse elements of the nervous system, but not in bone (data not shown). This led us to hypothesise that the minimal promoter of entpd5a (and possibly also that of osx) is activated by transcription factors active in the nervous system, and this effect is likely controlled by the surrounding enhancers, but also the genome location. It is unclear at present what the endogenous neural expression of the two genes is like, and we did not further investigate this in this study, as the focus was on the skeleton.

      Figure 2: What exactly is "Corrected Total Cell Fluorescence"? Is it green + red fluorescence?

      We thank the reviewer for pointing out the absence of more information on this. Corrected total cell fluorescence does not correspond to green+ red fluorescence, rather it is calculated as follows for a single channel:

      CTCF = Integrated Density – (Area of selected cell × Mean fluorescence of background readings)

      More details can be found in the following website: https://theolb.readthedocs.io/en/latest/imaging/measuring-cell-fluorescence-using-imagej.html

      We have edited the Materials and Methods section under “Imaging and image analysis” to include the aforementioned information.
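
      As an illustration of the formula above, the following minimal sketch computes CTCF for a single channel. The function name and the example numbers are hypothetical and are not taken from the manuscript; the inputs are assumed to be the integrated density and area of the selected cell and a list of mean fluorescence values from background regions, as measured in ImageJ.

```python
def corrected_total_cell_fluorescence(integrated_density, cell_area, background_means):
    """Return CTCF = integrated density - (cell area * mean background fluorescence).

    background_means holds the mean fluorescence of each background region
    measured near the cell; their average is used as the background level.
    """
    if not background_means:
        raise ValueError("at least one background reading is required")
    mean_background = sum(background_means) / len(background_means)
    return integrated_density - cell_area * mean_background


# Hypothetical single-channel measurements, for illustration only.
ctcf = corrected_total_cell_fluorescence(
    integrated_density=50000.0,
    cell_area=400.0,
    background_means=[10.0, 12.0, 14.0],
)
print(ctcf)
```

      Because the calculation is per channel, green and red fluorescence would each be corrected separately with their own background readings.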

      Page 11, line 34: The authors may have missed the recently published "Raman et al., Biomolecules 2024 Vol. 14; doi:10.3390/biom14020139" describing RNA-Seq in 4 dpf osterix+ osteoblasts.

      We thank the reviewer for drawing our attention to the Raman et al publication. The reference has now been added in the manuscript.

      Figure 5A and B: use a higher resolution version to make the numbers and gene names more readable. Figures 5C and 6A could also use a larger font for the text and numbers.

      High resolution files are now included with the revised manuscript, which should significantly help in making figures more easily readable. Although we agree with the reviewer that larger fonts would improve readability, due to the nature of the graphs (very small spaces in some cases, where the numbers would have to fit) this would not be easy to achieve. However, we believe that this issue will be resolved with the availability of higher resolution files. If readability remains a concern, we would be happy to attempt re-organising the graphs to allow for larger fonts.

      Reviewer #2 (Recommendations For The Authors):

      I suggest no further experiments, but do suggest that a few points be clarified.

      In the Discussion, the text "the less evolved osteoblasts of fish and amphibians..." is not accurate. These cells are not less evolved as they represent an independent lineage to tetrapods that have evolved with different stresses for a similar time. However, as teleost fishes and amphibians share characteristics and all share a common ancestor, these signatures represent a putative ancestral state of skeletal differentiation not seen in amniotes, including humans.

      We thank the reviewer for pointing out the unfortunate phrasing. The text has now been modified as follows: “Specifically, the osteoblasts of teleost fish and amphibians, whose characteristics are putatively closer to a more ancestral state of skeletal differentiation compared to amniotes, appear to share gene expression with chondrocytes”.

      The title could potentially be shortened to reach a broader audience by removing the initial clause of 'integration of ATAC and RNA seq' as this is a commonly performed analysis - "Chromatin and transcriptomic signature in classical and non-classical zebrafish osteoblasts indicate mechanisms of ancestral skeletal differentiation" is more descriptive of the findings and not focused on the method.

      We have discussed this internally, but would prefer to retain the current title. The reason is that we would like to see our methodology and datasets used as a platform for further studies, and the current title, in our opinion, facilitates this. With regard to replacing “mechanisms of entpd5a regulation” with “mechanisms of ancestral skeletal differentiation”, we think this would not give an accurate description of our work, which is primarily focused on elucidating entpd5a promoter dynamics.

      All datasets should be made available as soon as possible for use in the field.

      The datasets (raw and processed) are available on the GEO database. The corresponding accession numbers can be found in our data availability statement.

      Minor comments:

      (1) Figure 1A. The labels are missing for grey and light blue structures.

      These structures are together making up the “notochord sheath”, which is comprised of the basal lamina (grey), the medial layer of fibrillar collagen (light blue) and the outer layer of loosely arranged matrix (lighter blue). We modified the figure legend to indicate that the three layers all correspond to the notochord sheath.

      (2) Figure 2A. The constructs in the lower part of the panel are not discussed in the legend and seem out of place in terms of data type and analysis.

      We would argue that indicating which non-coding regions and which ATAC peaks were responsible for driving GFP expression in each construct aids in a better understanding of our results. We thank the reviewer for pointing out the lack of mention of these constructs in the figure legend. This issue has now been resolved.

      (3) Be wary of red/green color combinations, especially in the figures where these are juxtaposed with each other.

      We apologise for the use of red/green colour. Although it is not possible for this manuscript to change the colour patterns, we will make sure to avoid the use of these colours in conjunction in the future.

      (4) The use of fish as a term should be classified as teleost fish, as authors are not addressing non-teleost basal ray-finned fishes or the fact that tetrapods are within bony fishes overall.

      This is well spotted, and we have now remedied this by editing the manuscript. Where the term “fish” was used, we now state “teleost fish”.

      (5) Age information is missing in several Figures (e.g. 1D and 2C).

      In some of the figures, space constraints did not allow for including the stage on the figure itself. However, we have made sure that in those cases the stage is incorporated in the figure legend.

      (6) The resolution of several Figures (e.g. Figure 5 and Supplementary Figure 3) is low.

      We address this issue by providing high resolution figures with the revised manuscript.

      (7) In the sentence (top page before Discussion) "The same conclusion was reached upon isolation from these three..", it was unclear what 'upon isolation' referred to.

      We agree with the reviewer that this phrasing is unclear. To enhance clarity, the manuscript now reads as follows: “The same conclusion was reached upon isolation of the DEGs highlighted by our RNA-seq results, from the three aforementioned groups of genes associated with ATAC peaks for each cell population.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The work analyzes how centrosomes mature before cell division. A critical aspect is the accumulation of pericentriolar material (PCM) around the centrioles to build competent centrosomes that can organize the mitotic spindle. The present work builds on the idea that the accumulation of PCM is catalyzed either by the centrioles themselves (leading to a constant accumulation rate) or by enzymes activated by the PCM itself (leading to autocatalytic accumulation). These ideas are captured by a previous model derived for PCM accumulation in C. elegans (ref. 8) and are succinctly summarized by Eq. 1. The main addition of the present work is to allow the activated enzymes to diffuse in the cell, so they can also catalyze the accumulation of PCM in other centrosomes (captured by Eqs. 2-4). The authors claim that this helps centrosomes to reach the same size, independent of potential initial mismatches.

      A strength of the paper is the simplicity of the equations, which are reduced to the bare minimum and thus allow a detailed inspection of the physical mechanism. One shortcoming of this approach is that all equations assume that the diffusion of molecules is much faster than any of the reactive time scales, although there is no experimental evidence for this.

      We appreciate the reviewer’s recognition of the strengths of our work. Indeed, the centrosome growth model incorporates multiple timescales corresponding to various reactions, and existing experimental data do not directly provide diffusion constants for the cytosolic proteins. However, we can estimate these diffusion constants using protein mass, based on the Stokes-Einstein relation, and compare the diffusion timescales with the reaction timescales obtained from FRAP analysis. For example, we estimate that the diffusion timescale for centrosomes separated by 5-10 micrometers is much smaller than the reaction timescales deduced from the FRAP experiments. Specifically, for SPD-5, a scaffold protein with a mass of ~150 kDa, the estimated diffusion constant is ~17 µm<sup>2</sup>/s, using the Stokes-Einstein relation and a reference diffusion constant of ~30 µm<sup>2</sup>/s for a 30 kDa GFP protein (reference: Bionumbers book). This results in a diffusion timescale of ~1 second for centrosomes 10 µm apart. In contrast, FRAP recovery timescales for SPD-5 in C. elegans embryos are on the order of several minutes, suggesting that scaffold protein binding reactions are much slower than diffusion. Therefore, a reaction-limited model is appropriate for studying PCM self-assembly during centrosome maturation. We have revised the manuscript to clarify this point and to include a discussion of the diffusion and reaction timescales.
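
      The order-of-magnitude estimate above can be reproduced with a short sketch. It assumes a globular protein whose radius scales as mass^(1/3), so that the Stokes-Einstein diffusion constant scales as mass^(-1/3), and a 3D diffusion time of L²/(6D); both are simplifying assumptions, and the function names are illustrative.

```python
def scaled_diffusion_constant(mass_kda, ref_mass_kda=30.0, ref_d_um2_s=30.0):
    """Diffusion constant (um^2/s) scaled from a reference protein.

    Assumption: globular protein, radius ~ mass^(1/3), hence D ~ mass^(-1/3).
    The defaults are the GFP reference values quoted in the response.
    """
    return ref_d_um2_s * (ref_mass_kda / mass_kda) ** (1.0 / 3.0)


def diffusion_time_s(distance_um, d_um2_s, dims=3):
    """Time (s) to diffuse a given distance, from <r^2> = 2 * dims * D * t."""
    return distance_um ** 2 / (2 * dims * d_um2_s)


d_spd5 = scaled_diffusion_constant(150.0)   # SPD-5, ~150 kDa
tau_diff = diffusion_time_s(10.0, d_spd5)   # centrosomes 10 um apart
print(f"D ~ {d_spd5:.1f} um^2/s, diffusion time ~ {tau_diff:.2f} s")
```

      With these inputs the sketch gives a diffusion constant of roughly 17.5 µm²/s and a diffusion time of roughly 1 s, which is far shorter than FRAP recovery times of several minutes.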

      Spatially extended model with diffusion

      Both the reviewers have pointed out the importance of considering diffusion effects in centrosome size dynamics, and we agree that this is important to explore. We have developed a spatially extended 3D version of the centrosome growth model, incorporating stochastic reactions and diffusion (see Appendix 4). In this model, the system is divided into small reaction volumes (voxels), where reactions depend on local density, and diffusion is modeled as the transport of monomers/building blocks between voxels.

      We find that diffusion can alter the timescales of growth, particularly when the diffusion timescale is comparable to or slower than the reaction timescale, potentially mitigating size inequality by slowing down autocatalysis. However, the main conclusions of the catalytic growth model remain unchanged, showing robust size regulation independent of diffusion constant or centrosome separation (Figure 2—figure supplement 3). Hence, we focused on the effect of subunit diffusion on the autocatalytic growth model. We find that in the presence of diffusion, the size inequality reduces with increasing diffusion timescale, i.e., increasing distance between centrosomes and decreasing diffusion constant (Figure 2—figure supplement 4). However, the lack of robustness in size control in the autocatalytic growth model remains, i.e., the final size difference increases with increasing initial size difference. Notably, in the diffusion-limited regime (very small diffusion or large distances), the growth curve loses its sigmoidal shape, resembling the behavior in the non-autocatalytic limit (Figure 2). These findings are discussed in the revised manuscript.

      Another shortcoming of the paper is that it is not clear what species the authors are investigating and how general the model is. There are huge differences in centrosome maturation and the involved proteins between species. However, this is not mentioned in the abstract or introduction. Moreover, in the main body of the paper, the authors mention C. elegans on pages 2 and 3, but refer to Drosophila on page 4, switching back to C. elegans on page 5, and discuss Drosophila on page 6. This is confusing and looks as if they are cherry-picking elements from various species. The original model in ref. 8 was constructed for C. elegans and it is not clear whether the autocatalytic model is more general than that. In any case, a more thorough discussion of experimental evidence would be helpful.

      We believe one strength of our approach is its applicability across organisms. Our goal in comparing the theoretical model with experimental data from C. elegans and D. melanogaster is to demonstrate that the apparent qualitative differences in centrosome growth across species (see e.g., the extent of size scaling discussed in the section “Cytoplasmic pool depletion regulates centrosome size scaling with cell size”) may arise from the same underlying mechanisms in the theoretical model, albeit with different parameter values. We acknowledge differences in regulatory molecules between species, but the core pathways remain conserved (see e.g. Raff, Trends in Cell Biology 2019, section: “Molecular Components of the Mitotic Centrosome Scaffold Appear to Have Been Conserved in Evolution from Worms to Humans”). In the revised manuscript, we have expanded the introduction to clarify this point and explain how our theory applies across species. We have also provided a clearer discussion of the experimental systems used throughout the manuscript and the available experimental evidence.

      The authors show convincingly that their model compensates for initial size differences in centrosomes and leads to more similar final sizes. These conclusions rely on numerical simulations, but it is not clear how the parameters listed in Table 1 were chosen and whether they are representative of the real situation. Since all presented models have many parameters, a detailed discussion on how the values were picked is indispensable. Without such a discussion, it is not clear how realistic the drawn conclusions are. Some of this could have been alleviated using a linear stability analysis of the ordinary differential equations from which one could have gotten insight into how the physical parameters affect the tendency to produce equal-sized centrosomes.

      Following the suggestion of the reviewer, we have revised the manuscript to add references and discussions justifying the choice of the parameter values used for the numerical simulations. These references and parameter choices can be found in Table 1 and Table 2, and are also discussed in relevant figure captions and within the manuscript text.

      We thank the reviewer for the excellent suggestion of including linear stability analysis of the ODE models of centrosome growth. We included linear stability analyses of the catalytic and autocatalytic growth models in Appendix 3. Analysis of the catalytic growth model reaffirms the robustness of size equality and the analysis of autocatalytic growth provides an approximate condition of size inequality. We have modified the revised manuscript to discuss these results.

      The authors use the fact that their model stabilizes centrosome size to argue that their model is superior to the previously published one, but I think that this conclusion is not necessarily justified by the presented data. The authors claim that "[...] none of the existing quantitative models can account for robustness in centrosome size equality in the presence of positive feedback." (page 1; similar sentence on page 2). This is not shown convincingly. In fact, ref. 8 already addresses this problem (see Fig. 5 in ref. 8) to some extent.

      The linear stability analysis shown in Fig 5 in ref 8 (Zwicker et al, PNAS, 2014) shows that the solutions are stable around the fixed point and it was inferred from this result that Ostwald ripening can be suppressed by the catalytic activity of the centriole, therefore stabilizing the centrosomes (droplets) against coarsening by Ostwald ripening. But, if size discrepancy arises from the growth process (e.g., due to autocatalysis) the timescale of relaxation for such discrepancy is not clear from the above-mentioned result. We show (in figure 2 - figure supplement 3) that for any appreciable amount of positive feedback, the solution moves very slowly around the fixed point (almost like a line attractor) and cannot reach the fixed point in a biologically relevant timescale. Hence the model in ref 8 does not provide a robust mechanism for size control in the presence of autocatalytic growth. We have added this discussion in the Discussion section.

      More importantly, the conclusion seems to largely be based on the analysis shown in Fig. 2A, but the parameters going into this figure are not clear (see the previous paragraph). In particular, the initial size discrepancy of 0.1 µm^3 seems quite large, since it translates to a sphere of a radius of 300 nm. A similarly large initial discrepancy is used on page 3 without any justification. Since the original model itself already showed size stability, a careful quantitative comparison would be necessary.

      We thank the reviewer for the valuable suggestions. The parameters used in Fig. 2A are listed in Table 1 with corresponding references, and we used the parameter values from Zwicker et al. (2014) for rate constants and concentrations.

      The issue of initial size differences between centrosomes is important, but quantitative data on this are not readily available for C. elegans and Drosophila. Centrosomes may differ initially due to disparities in the amount and incorporation rate of PCM between the mother and daughter centrioles. Based on available images and videos (Cabral et al, Dev. Cell, 2019, DOI: https://doi.org/10.1016/j.devcel.2019.06.004), we estimated an initial radius of ~0.5 μm for centrosomes. Accounting for a 5% radius difference would lead to a volume difference of ~0.1 μm<sup>3</sup>, which was used in our analysis (Fig. 2A). These differences likely arise from distinct growth conditions of centrosomes containing different centrioles (older mother and newer daughter).
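
      The geometry behind these numbers can be checked with a short sketch, assuming spherical centrosomes (the variable names are illustrative). It computes the volume difference for a 5% radius difference around 0.5 µm, and the radius of a sphere whose volume equals 0.1 µm³, the comparison made by the reviewer.

```python
import math


def sphere_volume(radius_um):
    """Volume (um^3) of a sphere of the given radius (um)."""
    return 4.0 / 3.0 * math.pi * radius_um ** 3


def sphere_radius(volume_um3):
    """Radius (um) of a sphere of the given volume (um^3)."""
    return (3.0 * volume_um3 / (4.0 * math.pi)) ** (1.0 / 3.0)


r0 = 0.5                                   # estimated initial radius (um)
dv = sphere_volume(1.05 * r0) - sphere_volume(r0)
r_equiv = sphere_radius(0.1)               # sphere with volume 0.1 um^3
print(f"dV ~ {dv:.3f} um^3, equivalent radius ~ {r_equiv:.3f} um")
```

      Under these assumptions the volume difference comes out at roughly 0.08 µm³, the same order as the ~0.1 µm³ used in the analysis, and a 0.1 µm³ sphere has a radius of about 0.29 µm, in line with the reviewer's figure of 300 nm.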

      More importantly, we emphasize that the initial size difference does not qualitatively alter the results presented in Figure 2. We agree that a quantitative analysis will further clarify our conclusions, and we have revised the manuscript accordingly. For example, Figure 2—figure supplement 3 provides a detailed analysis of how the final centrosome size depends on initial size differences across various parameter values. Additionally, Appendix 3 now includes analytical estimates of the onset of size inequality as a function of these parameters.

      The analysis of the size discrepancy relies on stochastic simulations (e.g., mentioned on pages 2 and 4), but all presented equations are deterministic. It's unclear what assumptions go into these stochastic equations, and how they are analyzed or simulated. Most importantly, the noise strength (presumably linked to the number of components) needs to be mentioned. How is this noise strength determined? What are the arguments for this choice? This is particularly crucial since the authors quote quantitative results (e.g., "a negligible difference in steady-state size (∼ 2% of mean size)" on page 4).

      As described in the Methods, we used the exact Gillespie method (Gillespie, JPC, 1977) to simulate the evolution of the stochastic trajectories of the systems, corresponding to the deterministic growth and reaction kinetics outlined in the manuscript. We've expanded the Methods to include further details on the stochastic simulations and refer to Appendix 1, where we describe the chemical master equations governing autocatalytic growth.
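
      For readers unfamiliar with the Gillespie method, the following is a minimal sketch for a deliberately simplified toy system: a single structure that gains subunits from a finite pool at a constant per-subunit rate and loses them at a size-proportional rate. This is not the centrosome model of the manuscript; the reaction scheme, parameter values, and names are hypothetical and chosen only to illustrate the algorithm.

```python
import random


def gillespie_growth(n_total=1000, k_on=0.01, k_off=0.005, t_end=2000.0, seed=0):
    """Exact stochastic simulation of assembly/disassembly from a finite pool.

    n_total: total number of subunits (pool + structure); hypothetical value
    k_on:    assembly rate per free subunit; hypothetical value
    k_off:   disassembly rate per incorporated subunit; hypothetical value
    """
    rng = random.Random(seed)
    t, n = 0.0, 0                      # time and structure size (subunits)
    times, sizes = [t], [n]
    while t < t_end:
        a_on = k_on * (n_total - n)    # propensity of adding one subunit
        a_off = k_off * n              # propensity of losing one subunit
        a_tot = a_on + a_off
        if a_tot == 0.0:
            break                      # no reaction can fire
        t += rng.expovariate(a_tot)    # exponentially distributed waiting time
        if rng.random() < a_on / a_tot:
            n += 1                     # assembly event
        else:
            n -= 1                     # disassembly event
        times.append(t)
        sizes.append(n)
    return times, sizes


times, sizes = gillespie_growth()
print(f"final size: {sizes[-1]} of 1000 subunits")
```

      In this toy system the size fluctuates around n_total · k_on / (k_on + k_off), and the relative fluctuation shrinks as the pool grows, which is the sense in which noise strength depends on pool size.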

      The noise strength (fluctuations about the mean size of centrosome) does depend on the total monomer concentration (the pool size), and this may affect size inequality. Similar values of the total monomer concentration were used in the catalytic (0.04 µM) and autocatalytic growth (0.33 µM) simulations. These values for the pool size are similar to previous studies (Zwicker et al, PNAS, 2014) and have been optimized to obtain a good fit with experimental growth curves from C. elegans embryo data.

      To present more quantitative results, we have revised our manuscript to add data showing the effect of pool size on centrosome size inequality (Figure 3 - figure supplement 2). We find the size inequality in catalytic growth to increase with decreasing pool size as the origin of this inequality is the stochastic fluctuation in individual centrosome size. The size inequality (ratio of dv/<V>) in the autocatalytic growth does not depend (strongly) on the pool size (dv and <V> both increase similarly with pool size).

      Moreover, the two sets of testable predictions that are offered at the end of the paper are not very illuminative: The first set of predictions, namely that the model would anticipate an "increase in centrosome size with increasing enzyme concentration, the ability to modify the shape of the sigmoidal growth curve, and the manipulation of centrosome size scaling patterns by perturbing growth rate constants or enzyme concentrations.", are so general that they apply to all models describing centrosome growth. Consequently, these observations do not set the shared enzyme pool apart and are thus not useful to discriminate between models. The second part of the first set of predictions about shifting "size scaling" is potentially more interesting, although I could not discern whether "size scaling" referred to scaling with cell size, total amount of material, or enzymatic activity at the centrioles. The second prediction is potentially also interesting and could be checked directly by analyzing published data of the original model (see Fig. 5 of ref. 8). It is unclear to me why the authors did not attempt this.

      In response to the reviewers' valuable feedback, we have revised the manuscript to include results on potential methods for distinguishing catalytic growth from autocatalytic growth. Since the growth dynamics of a single centrosome do not significantly differ between these two models, it is necessary to experimentally examine the growth dynamics of a centrosome pair under various initial size perturbations. In Figure 3-figure supplement 2, we present theoretical predictions for both catalytic and autocatalytic growth models, illustrating the correlation between initial and final sizes after maturation. The figure demonstrates that the initial and final size differences should be correlated only in autocatalytic growth, and that the relative size inequality decreases with increasing subunit pool size in catalytic growth while it remains almost unchanged in autocatalytic growth. These predictions can be experimentally examined by inducing varying centrosome sizes at the early stage of maturation for different expression levels of the scaffold-forming proteins.

      A second experimentally testable feature of the catalytic growth model involves sharing of the enzyme between both centrosomes. This could be tested through immunofluorescent staining of the kinase or by constructing a FRET reporter for PLK1 activity, where it can be studied if the active form of the PLK1 is found in the cytoplasm around the centrosomes indicating a shared pool of active enzyme. Additionally, photoactivated localization microscopy could be employed, where fluorescently tagged enzyme can be selectively photoactivated in one centrosome and intensity can be measured at the other centrosome to find the extent of enzyme sharing between the centrosomes.

      We also discuss shifts in centrosome size scaling behavior with cell size by varying parameters of the catalytic growth model (Fig 4). While quantitative analysis of size scaling in Drosophila is currently unavailable, such an investigation could enable us to distinguish the catalytic growth model from other models. We have included this point in the Discussion section.

      “The second prediction is potentially also interesting …” We assume the reviewer is referencing the scenario in Zwicker et al. (ref 8), where differences in centriole activity lead to unequal centrosome sizes. The data in that study represent a case of centrosome growth with variable centriole activity, resulting in size differences in both autocatalytic and catalytic growth models. This differs from our proposed experiment, where we induce unequal centrosome sizes without modifying centriole activity. We have now revised the text to clarify this distinction.

      Taken together, I think the shared enzyme pool is an interesting idea, but the experimental evidence for it is currently lacking. Moreover, the model seems to make little testable predictions that differ from previous models.

      We appreciate the reviewer’s interest in the core idea of our work. As mentioned earlier, we have improved the clarity in model predictions in the revised discussion section. Unfortunately, the lack of publicly available experimental data limits our ability to provide more direct experimental evidence. However, we are hopeful that our theoretical model will inspire future experiments to test these model predictions.

      Reviewer #2 (Public Review):

      Summary:

      In this paper, Banerjee & Banerjee argue that a solely autocatalytic assembly model of the centrosome leads to size inequality. The authors instead propose a catalytic growth model with a shared enzyme pool. Using this model, the authors predict that size control is enzyme-mediated and are able to reproduce various experimental results such as centrosome size scaling with cell size and centrosome growth curves in C. elegans.

      The paper contains interesting results and is well-written and easy to follow/understand.

      We are delighted that the reviewer finds our work interesting, and we appreciate the thoughtful suggestions provided. In response, we have revised the text and figures to incorporate these recommendations. Below, we address each of the reviewer’s comments point by point:

      Suggestions:

      ● In the Introduction, when the authors mention that their "theory is based on recent experiments uncovering the interactions of the molecular components of centrosome assembly" it would be useful to mention what particular interactions these are.

      As the reviewer suggested, we have modified the introduction section to add the experimental observations upon which we build our model.

      ● In the Results and Discussion sections, the authors note various similarities and differences between what is known regarding centrosome formation in C. elegans and Drosophila. It would have been helpful to already make such distinctions in the Introduction (where some phenomena that may be C. elegans specific are implied to hold for centrosomes universally). It would also be helpful to include more comments on the possible implications for other systems in which centrosomes have been studied, such as human, Zebrafish, and Xenopus.

      We thank the reviewer for this suggestion. We have modified the Introduction to motivate the comparative study of centrosome growth in different organisms and draw relevant connections to centrosome growth in other commonly studied organisms like Zebrafish and Xenopus.

      ● For Fig 1.C, the two axes are very close to being the same but are not. It makes the graph a little bit more difficult to interpret than if they were actually the same or distinctly different. It would be more useful to have them on the same scale and just have a legend.

      We have modified Figure 1C in the revised manuscript. The plot now shows the growth of a single centrosome and of a pair of centrosomes on the same y-axis scale.

      ● The authors refer to Equation 1 as resulting from an "active liquid-liquid phase separation", but it is unclear what that means in this context because the rheology of the centrosome does not appear to be relevant.

      We used the term “active liquid-liquid phase separation” simply to refer to a previous model proposed by Zwicker et al (PNAS, 2014) where the underlying process of growth results from liquid-liquid phase separation. We agree with the reviewer that the rheological property of the centrosome is not very relevant in our discussions and we have thus removed the sentence from the revised manuscript to avoid any confusion.

      ● The authors reject the non-cooperative limit of Eq 1 because, even though it leads to size control, it does not give sigmoidal dynamics (Figure 2B). While I appreciate that this is just meant to be illustrative, I still find it to be a weak argument because I would guess a number of different minor tweaks to the model might keep size control while inducing sigmoidal dynamics, such as size-dependent addition or loss rates (which could be due to reactions happening on the surface of the centrosome instead of in its bulk, for example). Is my intuition incorrect? Is there an alternative reason to reject such possible modifications?

      The reviewer raises an interesting point here. However, we disagree with the idea that minor adjustments to the model can produce sigmoidal growth curves while still maintaining size control. In the absence of an external, time-dependent increase in building block concentration (which would lead to an increasing growth rate), achieving sigmoidal growth requires a positive feedback mechanism in the growth rate. This positive feedback alone could introduce size inequality unless shared equally between the centrosomes, as it is in our model of catalytic growth in a shared enzyme pool. The proposed modification involving size-dependent addition or loss rates due to surface assembly/disassembly may result in unequal sizes precisely because of this positive feedback. A similar example is provided in Appendix 1, where assembly and disassembly across the pericentriolar material volume lead to sigmoidal growth but also generate significant size inequality and lack of robustness in size control.

      ● While the inset of Figure 3D is visually convincing, it would be good to include a statistical test for completeness.

      Following the reviewer’s suggestion, we present a statistical analysis in Figure 3 - Figure supplement 2 in the modified manuscript to enhance clarity. We show that the size difference values are uncorrelated (Pearson’s correlation coefficient ~ 0) with the initial size difference indicating the robustness of the size regulation mechanism.
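
      For reference, Pearson's correlation coefficient between initial and final size differences can be computed as in the sketch below. The two small data sets are synthetic and chosen only to show a perfectly correlated and an uncorrelated case; they are not data from the manuscript.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Covariance and variance sums (the 1/n factors cancel in the ratio).
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Synthetic illustration: initial size differences vs. final size differences.
initial_diff = [1.0, 2.0, 3.0, 4.0]
final_diff_correlated = [2.0, 4.0, 6.0, 8.0]      # grows with initial difference
final_diff_uncorrelated = [1.0, -1.0, -1.0, 1.0]  # no linear dependence

print(pearson_r(initial_diff, final_diff_correlated))
print(pearson_r(initial_diff, final_diff_uncorrelated))
```

      A coefficient near 0, as reported for catalytic growth, indicates that the final size difference carries no linear memory of the initial one, whereas a coefficient near 1 would be the signature expected for autocatalytic growth.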

      ● The authors note that the pulse in active enzyme in their model is reminiscent of the Polo kinase pulse observed in Drosophila. Can the authors use these published experimental results to more tightly constrain what parameter regime in their model would be relevant for Drosophila? Can the authors make predictions of how this pulse might vary in other systems such as C. elegans?

      Thank you for the insightful suggestion regarding the use of pulse dynamics in experiments to better constrain the model’s parameter regime. In our revised manuscript, we attempted this analysis; however, the data from Wong et al. (EMBO 2022) for Drosophila are presented as normalized intensity in arbitrary units, rather than as quantitative measures of centrosome size or Polo enzyme concentration. This lack of quantitative data limits our ability to benchmark the model beyond capturing qualitative trends. We thus believe that quantitative measurements of centrosome size and enzyme concentration are necessary to achieve a tighter alignment between model predictions and biological data.

      We discuss the enzyme dynamics in C. elegans in the revised manuscript. We find the enzyme dynamics corresponding to the fitted growth curves of C. elegans centrosomes are distinctly different from the ones observed in Drosophila. Instead of the pulse-like feature, we find a step-like increase in (cytosolic) active enzyme concentration.

      ● The authors mention that the shared enzyme pool is likely not diffusion-limited in C. elegans embryos, but this might change in larger embryos such as Drosophila or Xenopus. It would be interesting for the authors to include a more in-depth discussion of when diffusion will or will not matter, and what the consequence of being in a diffusion-limit regime might be.

      Both the reviewers have pointed out the importance of considering diffusion effects in centrosome size dynamics, and we agree that this is important to explore. We have developed a spatially extended 3D version of the centrosome growth model, incorporating stochastic reactions and diffusion (see Appendix 4). In this model, the system is divided into small reaction volumes (voxels), where reactions depend on local density, and diffusion is modeled as the transport of monomers/building blocks between voxels.

      We find that diffusion can alter the timescales of growth, particularly when the diffusion timescale is comparable to or slower than the reaction timescale, potentially mitigating size inequality by slowing down autocatalysis. However, the main conclusions of the catalytic growth model remain unchanged, showing robust size regulation independent of diffusion constant or centrosome separation (Figure 2—figure supplement 3). Hence, we focused on the effect of subunit diffusion on the autocatalytic growth model. We find that in the presence of diffusion, the size inequality reduces with increasing diffusion timescale, i.e., increasing distance between centrosomes and decreasing diffusion constant (Figure 2—figure supplement 4). However, the lack of robustness in size control in the autocatalytic growth model remains, i.e., the final size difference increases with increasing initial size difference. Notably, in the diffusion-limited regime (very small diffusion or large distances), the growth curve loses its sigmoidal shape, resembling the behavior in the non-autocatalytic limit (Figure 2). These findings are discussed in the revised manuscript.

      ● The authors state "Firstly, our model posits the sharing of the enzyme between both centrosomes. This hypothesis can potentially be experimentally tested through immunofluorescent staining of the kinase or by constructing FRET reporter of PLK1 activity." I don't understand how such experiments would be helpful for determining if enzymes are shared between the two centrosomes. It would be helpful for the authors to elaborate.

      Our results indicate the necessity of the centrosome-activated enzyme to be shared for the robust regulation of centrosome size equality. If a FRET reporter of the active form of the enzyme (e.g., PLK1) can be constructed, then the localization of the active form of the enzyme may be determined in the cytosol. We propose this based on reports of studying PLK activities in subcellular compartments using FRET as described in Allen & Zhang, BBRC (2006). Such experiments would provide direct evidence for the shared enzyme pool. Following the reviewer’s suggestion, we have modified the description of the possible FRET-based experimental test for the shared enzyme pool hypothesis in the revised manuscript.

      Additionally, we have added another possible experimental test based on photoactivated localization microscopy (PALM), where tagged enzyme can be selectively photoactivated in one centrosome and intensity measured at the other centrosome to indicate whether the enzyme is shared between the centrosomes.

      Recommendations for the authors:

      The manuscript needs to clarify better what species the model describes, how alternative models were rejected, and how the parameters were chosen.

      In the revised manuscript, we have connected the chemical species in our model to those documented in organisms like Drosophila and C. elegans. This connection is detailed in the main text under the Catalytic Growth Model section and summarized in Table 2. We discuss alternative models and our reasons for excluding them in the first results section on autocatalytic growth, with additional details provided in Appendix 1 and the accompanying supplementary figures. The selection of model parameters is addressed in the main text and methods, with references listed in Table 1. We believe that these revisions, along with our point-by-point responses to reviewer comments, comprehensively address all reviewer concerns.

      Reviewer #1 (Recommendations For The Authors):

      I think the style and structure of the paper could be improved on at least two accounts:

      (1) What's the role of the last section ("Multi-component centrosome model reveals the utility of shared catalysis on centrosome size control.")? It seems to simply add another component, keeping the essential structure of the model untouched. Not surprisingly, the qualitative features of the model are preserved and quantitative features are not discussed anyway.

      This model provides a more realistic description of centrosome growth by incorporating the dynamics of the two primary scaffold-forming subunits and their interactions with an enzyme. It is based on the observation that the major interaction pathways among centrosome components are conserved across many organisms (see Raff, Trends in Cell Biology, 2019 and Table 2), typically involving two scaffold-forming proteins and one enzyme that mediates positive feedback between them. These pathways may involve homologous proteins in different species.

      This model allows us to validate the experimentally observed spatial spread of the two subunits, Cnn and Spd-2, in Drosophila. Additionally, we used it to investigate the impact of relaxing the assumption of a shared enzyme pool on size control. Although similar insights could be obtained using a single-component model, the two-component model offers a more biologically relevant framework. We have highlighted these points in the revised manuscript to ensure clarity.

      (2) The very long discussion section is not very helpful. First, it mostly reiterates points already made in the main text. Second, it makes arguments for the choice of modeling (top left column of page 8), which probably should have been made when introducing the model. Third, it introduces new results (lower left column of page 8), which should probably be moved to the main text. Fourth, the interpretation of the model in light of the known biochemistry is useful and should probably be expanded although I think it would be crucial to keep information from different organisms clearly separate (this last point actually holds for the entire manuscript).

      We thank the reviewer for the feedback. We have modified the discussion section to focus more on the interpretation of the results, model predictions and future outlook with possible experiments to validate crucial aspects of the model. We have moved most of the justifications to the main text model description.

      Here are a few additional minor points:

      * page 1: Typo "for for" → "for"

      * Page 8: Typo "to to" → "to"

      We thank the reviewer for the useful recommendations. We have corrected all the typos in the revised manuscript.

      * Why can diffusion be neglected in Eq. 1? This is discussed only very vaguely in the main text (on page 3). Strangely, there is some discussion of this crucial initial step in the discussion section, although the diffusion time of PLK1 is compared to the centrosome growth time there and not the more relevant enzyme-mediated conversion rate or enzyme deactivation rate.

      We now discuss the justification of neglecting diffusion while motivating the model. We have added a more detailed discussion in the Methods section. We estimate the timescale of diffusion for the scaffold formers and the enzyme and compare them with the turnover timescales of the respective proteins Spd-2, Cnn and Polo. We find the proteins to diffuse fast compared to their FRAP recovery timescales indicating reaction timescales to be slower than the timescales of diffusion. Nevertheless, following the reviewer’s suggestion, we have also investigated the effect of diffusion on the growth process in Appendix 4.
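
The timescale comparison described above can be illustrated with a minimal sketch. It assumes the standard three-dimensional estimate for the time to diffuse a distance L, namely L^2 / (6 D), and compares it with a protein turnover time. The function names, the safety margin, and all numerical values are hypothetical placeholders chosen for illustration; they are not the parameters used in the manuscript.

```python
def diffusion_time(distance_um, diff_const_um2_per_s):
    """Characteristic 3D diffusion time in seconds, from <r^2> = 6 D t."""
    return distance_um ** 2 / (6.0 * diff_const_um2_per_s)


def is_reaction_limited(distance_um, diff_const_um2_per_s,
                        turnover_time_s, margin=10.0):
    """Return True if diffusion is at least `margin` times faster than turnover.

    `margin` is an arbitrary illustrative threshold, not a value from the paper.
    """
    tau_d = diffusion_time(distance_um, diff_const_um2_per_s)
    return tau_d * margin <= turnover_time_s


if __name__ == "__main__":
    # Hypothetical placeholder values (assumptions, not measured data).
    separation_um = 10.0      # distance between centrosomes
    diff_const = 5.0          # cytoplasmic diffusion constant, um^2/s
    turnover_s = 100.0        # FRAP-like turnover timescale

    tau_d = diffusion_time(separation_um, diff_const)
    print(f"diffusion time: {tau_d:.2f} s, turnover time: {turnover_s:.0f} s")
    print("reaction-limited:",
          is_reaction_limited(separation_um, diff_const, turnover_s))
```

With these placeholder numbers the diffusion time is a few seconds, well below the assumed turnover time, which corresponds to the reaction-limited regime in which neglecting diffusion is reasonable. Lowering the diffusion constant or increasing the separation moves the system toward the diffusion-limited regime.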

      * Page 3: The comparison k_0^+ ≫ k_1^+ is meaningless without specifying the number of subunits n. I even doubt that this condition is the correct one since even if k_0^+ is two orders of magnitude larger than k_1^+, the autocatalytic term can dominate if there are many subunits.

      We thank the reviewer for the insightful comment on the comparison between the growth rates k^+_0 and k^+_1. Indeed, the pool size matters and we have now included a linear stability analysis of the autocatalytic growth equations in Appendix 3 to estimate the condition for size inequality. We have commented on these new findings in the revised manuscript.

      * The Eqs. 2-4 are difficult to follow in my mind. For instance, it is not clear why the variables N_av and N_av^E are introduced when they evidently are equivalent to S_1 and E. It would also help to explicitly mention that V_c is the cell volume. Moreover, do these equations contain any centriolar activity? If so, I could not understand what term mediates this. If not, it might be good to mention this explicitly.

      Following the reviewer’s suggestion, we have modified the equations 2-4 and added the definition of V_c to enhance clarity in the revised manuscript. The centriole activity is given by k^+ in the catalytic model. We now explicitly mention it.

      * Page 4: The observed peak of active enzyme (Fig 3C) is compared to experimental observation of a PLK1 peak at centrosomes in Drosophila (ref. 28). However, if I understand correctly, the peak in the model refers to active enzyme in the entire cell (and the point of the model is that this enzymatic pool is shared everywhere), whereas the experimental measurement quantified the amount of PLK1 at the centrosome (and not the activity of the enzyme). How are the quantity in the model related to the experimental measurements?

      The reviewer is correct in pointing out the difference between the quantities calculated from our model and those measured in the experiment by Wong et al. We have clarified this point in the revised manuscript. We hypothesize that if, in future experiments, the active (phosphorylated) Polo can be observed using a FRET reporter of activity, then the cytosolic pulse can be observed too. We discuss this point in the revised manuscript.

      * Page 6: The asymmetry due to differences in centriolar activity has apparently been done for both models (Eq. 1 and Eqs. 2-4), referring to a parameter k_0^+ in both cases. How does this parameter enter in the latter model? More generally, I don't really understand the difference in the two rows in Fig. 5 - is the top row referring to growth driven by centriolar activity while the lower row refers to pure autocatalytic growth? If so, what about the hybrid model where both mechanisms enter? This is particularly relevant, since ref. 8 claims that such a hybrid model explains growth curves of asymmetric centrosomes quantitatively. Along these lines, the analysis of asymmetric growth is quite vague and at most qualitative. Can the models also explain differential growth quantitatively?

      We believe the reviewer’s comment on centrosome size asymmetry may stem from a lack of clarity in our initial explanation. In this section, as shown in Figure 5, we compare the full autocatalytic model (where both k_0^+ and k_1^+ are non-zero) with the catalytic model. The confusion might have arisen due to an unclear definition of centriolar activity in the catalytic growth model, which we have clarified in the revised manuscript. Specifically, we use k^+ in the catalytic model and k_0^+ in the autocatalytic model as indicators of centriolar activity.

      Our findings quantitatively demonstrate that variations in centriole activity can robustly drive size asymmetry in catalytic growth, independent of initial size differences. However, in autocatalytic growth, increased initial size differences make the system more vulnerable to a loss of regulation, as positive feedback can amplify these differences, ultimately influencing the final size asymmetry. Our results do not contradict Zwicker et al. (ref 8); rather, they complement it. We show that size asymmetry in autocatalytic growth is governed by both centriole activity and positive feedback, highlighting that centriole activity alone cannot robustly regulate centrosome size asymmetry within this framework.

      * The code for performing the simulations does not seem to be available

      We have now made the main codes available in a GitHub repository. Link: https://github.com/BanerjeeLab/Centrosome_growth_model

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study demonstrates the significant role of secretory leukocyte protease inhibitor (SLPI) in regulating B. burgdorferi-induced periarticular inflammation in mice. They found that SLPI-deficient mice showed significantly higher B. burgdorferi infection burden in ankle joints compared to wild-type controls. This increased infection was accompanied by infiltration of neutrophils and macrophages in periarticular tissues, suggesting SLPI's role in immune regulation. The authors strengthened their findings by demonstrating a direct interaction between SLPI and B. burgdorferi through BASEHIT library screening and FACS analysis. Further investigation of SLPI as a target could lead to valuable clinical applications.

      The conclusions of this paper are mostly well supported by data, but two aspects need attention:

      (1) Cytokine Analysis:

      The serum cytokine/chemokine profile analysis appears without TNF-alpha data. Given TNF-alpha's established role in inflammatory responses, comparing its levels between wild-type and infected B. burgdorferi conditions would provide valuable insight into the inflammatory mechanism.

      (2) Sample Size Concerns:

      While the authors note limitations in obtaining Lyme disease patient samples, the control group is notably smaller than the patient group. This imbalance should either be addressed by including additional healthy controls or explicitly justified in the methodology section.

      We thank the reviewer for the careful review and positive comments.

      (1) We did look into the level of TNF-alpha in both WT and SLPI-/- mice with and without B. burgdorferi infection. At the serum level, using ELISA, we did not observe any significant difference among the four groups. At the gene expression level, using RT-qPCR on the tibiotarsal tissue, we also did not observe any significant differences. Our RT-qPCR result is consistent with the previous microarray study using the whole murine joint tissue (DOI: 10.4049/jimmunol.177.11.7930). The microarray study did not show significant changes in TNF-alpha level in C57BL/6 mice following B. burgdorferi infection. The above data suggest that TNF-alpha is not involved in SLPI-regulated immune responses in the murine tibiotarsal tissue following B. burgdorferi infection. A brief discussion will be added, and the above data will be provided as a supplemental figure in the revised manuscript.

      (2) We agree with the reviewer that the control group is smaller than the patient group. Among the archived samples that are available, the number of adult healthy controls is limited. It has been shown that the serum level of SLPI in healthy volunteers is on average about 40 ng/ml (DOI: 10.3389/fimmu.2019.00664 and 10.1097/00003246-200005000-00003). The median level in the healthy controls in our data was 38.92 ng/ml, which is comparable to the previous results. A brief discussion will be added in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      This manuscript by Yu and coworkers investigates the potential role of Secretory leukocyte protease inhibitor (SLPI) in Lyme arthritis. They show that, after needle inoculation of the Lyme disease (LD) agent, B. burgdorferi, compared to wild type mice, a SLPI-deficient mouse suffers elevated bacterial burden, joint swelling and inflammation, pro-inflammatory cytokines in the joint, and levels of serum neutrophil elastase (NE). They suggest that SLPI levels of Lyme disease patients are diminished relative to healthy controls. Finally, they find that SLPI may interact directly with B. burgdorferi.

      Strengths:

      Many of these observations are interesting and the use of SLPI-deficient mice is useful (and has not previously been done).

      We appreciate the reviewer’s careful reading and positive comments.

      Weaknesses:

      (a) The known role of SLPI in dampening inflammation and inflammatory damage by inhibition of NE makes the enhanced inflammation in the joint of B. burgdorferi-infected mice a predicted result;

      We agree that the observation of the elevated NE level and the enhanced inflammation is theoretically likely. Indeed, that was the hypothesis that we explored, and often what is theoretically possible does not turn out to occur. In addition, despite the known contribution of neutrophils to the severity of murine Lyme arthritis, the importance of the neutrophil serine proteases and anti-protease has not been specifically studied, and neutrophils secrete many factors. Therefore, our data fill an important gap in the knowledge of murine Lyme arthritis development – and set the stage for the further exploration of this hypothesis in the genesis of human Lyme arthritis.

      (b) The potential contribution of the greater bacterial burden to the enhanced inflammation is not addressed;

      We agree with the reviewer’s viewpoint that the increased infection burden in the tibiotarsal tissue of the infected SLPI-/- mice could contribute to the enhanced inflammation. A brief discussion of this possibility will be added to the revised manuscript.

      (c) The relationship of SLPI binding by B. burgdorferi to the enhanced disease of SLPI-deficient mice is not clear; and

      We agree with the reviewer that we have not shown the importance of the SLPI-B. burgdorferi binding in the development of periarticular inflammation. It is an ongoing project in our lab to identify the SLPI binding partner in B. burgdorferi. Our hypothesis is that SLPI could bind and inhibit an unknown B. burgdorferi virulence factor that contributes to murine Lyme arthritis. We will include the above discussion in the revised manuscript.

      (d) Several methodological aspects of the study are unclear.

      We appreciate the critique and will describe the methods section in greater detail in the revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      The authors investigated the role of secretory leukocyte protease inhibitors (SLPI) in developing Lyme disease in mice infected with Borrelia burgdorferi. Using a combination of histological, gene expression, and flow cytometry analyses, they demonstrated significantly higher bacterial burden and elevated neutrophil and macrophage infiltration in SLPI-deficient mouse ankle joints. Furthermore, they also showed direct interaction of SLPI with B. burgdorferi, which likely depletes the local environment of SLPI and causes excessive protease activity. These results overall suggest ankle tissue inflammation in B. burgdorferi-infected mice is driven by unchecked protease activity.

      Strengths:

      Utilizing a comprehensive suite of techniques, this is the first study showing the importance of anti-protease-protease balance in the development of periarticular joint inflammation in Lyme disease.

      We greatly appreciate the reviewer’s careful reading and positive comments.

      Weaknesses:

      Due to the limited sample availability, the authors investigated the serum level of SLPI in both Lyme arthritis patients and patients with earlier disease manifestations.

      We agree with the reviewer that it would be ideal to have more samples from Lyme arthritis patients. However, among the available archived samples, samples from Lyme arthritis patients are limited. For the samples from patients with single EM, the symptoms persisted 3-4 months after diagnosis, the same timeframe in which arthritis develops. We will add the above discussion in the revised manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure 2, for histological scoring, do they have similar n numbers?

      In panel B, 20 infected WT mice and 19 infected SLPI-/- mice were examined. In panel D, 13 infected WT and SLPI-/- mice were examined. Without infection, WT and SLPI-/- mice do not develop spontaneous arthritis. Due to the slow breeding of the SLPI-/- mice, a small number of uninfected control animals were used.

      (2) In Figure 3, for macrophage population analysis, maybe consider implementing Ly6G-negative gating strategy to prevent neutrophil contamination in macrophage population?

      We appreciate the reviewer’s suggestion. We will analyze the data using the Ly6G-negative gating strategy and provide the result in a supplemental figure. We will compare the results using the two gating strategies in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      (1) The investigators should address the possibility that much of the enhanced inflammatory features of infected SLPI-deficient mice are simply due to the higher bacterial load in the joint.

      We agree with the reviewer’s viewpoint that the increased infection burden in the tibiotarsal tissue of the infected SLPI-/- mice could contribute to the enhanced inflammation. A brief discussion of this possibility will be added to the revised manuscript.

      (2) Fig. 1. (A) There is no statistically significant difference in the bacterial load in the heart or skin, in contrast to the tibiotarsal joint. It would be of interest to know whether other tissues that are routinely sampled to assess the bacterial load, such as injection site, knee, and bladder, also harbored increased bacterial load in SLPI-deficient mice. (B) Heart and joint burden were measured at "21-28" days. The two time points should be analyzed separately rather than pooled.

      (A) We appreciate the reviewer’s suggestion. We agree that looking into the infection load in other tissues is helpful. However, studies into murine Lyme arthritis have been predominantly focused on tibiotarsal tissue, which displays the most consistent and prominent swelling that’s easy to observe and measure. Thus, we focused on the tibiotarsal joint in our study. (B) We collected the heart and joint tissue approximately 3 weeks post infection, within a 3-day window based on the feasibility and logistics of the laboratory. By “21-28 d”, we meant 21-24 days post infection. We apologize for the mislabeling and will correct it in the revised manuscript, stating approximately 3 weeks in the results and defining approximately 3 weeks as 21-24 days in the methods.

      (3) Fig. 2. (A) The same ambiguity as to the days post-infection as cited above in Point 2B exists in this figure. (B) Panel B: Caliper measurements to assess joint swelling should be utilized rather than visual scoring. (In addition, the legend should make clear that the black circles represent mock-infected mice.)

      (A) The histology scoring and histopathology examination were performed at the same time as heart and joint tissue collection, approximately 3 weeks post infection, within a 3-day window based on the feasibility and logistics of the laboratory. We apologize for the mislabeling and will correct it in the revised manuscript. (B) We appreciate the reviewer’s suggestion. However, our extensive experience is that caliper measurement can alter the assessment of swelling by placing pressure on the joints and does not produce consistent results. Double-blinded scoring was thus performed. Histopathology examination was performed by an independent pathologist; it confirmed the histology score and provided additional measurements.

      (4) Fig. 3. (A) See Point 2B. (B) For Panels C-E, uninfected controls are lacking.

      We apologize for this omission. Uninfected controls will be provided in the revised manuscript.

      (5) Fig. 4. Some LD subjects were sampled multiple times (5 samples from 3 subjects with Lyme arthritis; 13 samples from 4 subjects with EM), and samples from same individuals apparently are treated as biological replicates in the statistical analysis. In contrast, the 5 healthy controls were each sampled only once.

      We agree with the reviewer that the control group is smaller than the patient group. Among the archived samples that are available, the number of adult healthy controls is limited, and each was sampled once. We used these samples to establish the baseline level of SLPI in the serum. It has been shown that the serum level of SLPI in healthy volunteers is on average about 40 ng/ml (DOI: 10.3389/fimmu.2019.00664 and 10.1097/00003246-200005000-00003). The median level in the healthy controls in our data was 38.92 ng/ml, which is comparable to the previous results. A brief discussion will be added in the revised manuscript.

      (6) Fig. 5. (A) Panel A: does binding occur when intact bacteria are used? (B) Panels B, C: Were bacteria probed with PI to indicate binding likely to occur to surface? How many biological replicates were performed for each panel? Is "antibody control" a no SLPI control? What is the blue line?

      Actively growing B. burgdorferi were collected and used for binding assays. We do not permeabilize the bacteria for flow cytometry. Thus, all the binding detected occurs at the bacterial surface. Three biological replicates were performed for each panel. The antibody control is a no-SLPI control. For panel D, the bacteria were stained with Hoechst, which shows the morphology of bacteria. We apologize for the missing information. A complete and detailed description of Figure 5 will be provided in the revised manuscript.

      (7) Sup Fig. 1. (A) Panel A: Was this experiment performed multiple times? I.e., how many biological replicates? (B) Panel B: Strain should be specified.

      The binding assay to B. burgdorferi B31A was performed two times. In panel B, B. burgdorferi B31A3 was used. We apologize for the missing information. A complete and detailed description will be provided in the revised manuscript. 

      (8) Fig. S2. It is not clear that the condition (20% serum) has any bactericidal activity, so the potential protective activity of SLPI cannot be determined. (Typical serum killing assays in the absence of specific antibody utilized 40% serum.)

      In Fig. S2, panel B, the first two bars (without SLPI, with 20% WT anti serum) showed around 40% viability. It indicates that the 20% WT anti serum has bactericidal activity. Serum was collected from B. burgdorferi-infected WT mice at 21 dpi, which should contain polyclonal antibody against B. burgdorferi.

      Reviewer #3 (Recommendations for the authors):

      It was a pleasure to review! I congratulate the authors on this elegant study. I think the manuscript is very well-written and clearly conveys the research outcomes. I only have minor suggestions to improve the readability of the text.

      We greatly appreciate the reviewer’s recognition of our work.

      Line 92: Please briefly summarize the key results of the study at the end of the introduction section.

      We appreciate the reviewer’s suggestion. A brief summary will be added in the revised manuscript.

      Line 108: Why did significant inflammation occur only in the ankle joints of SLPI-/- mice? Could you please provide a brief explanation?

      The inflammation may also happen in other joints of the B. burgdorferi-infected SLPI-/- mice, which has not been studied. The study of murine Lyme arthritis has been predominantly done in the tibiotarsal tissue, which displays the most prominent swelling that’s easy to observe and measure. Thus, we focused on the tibiotarsal joint in our study.

      Line 136: Please also include the gene names in Figure 3.

      We apologize for the omission. Gene names will be included in the revised manuscript.

      Line 181: Please briefly introduce BASEHIT. Why did you use this tool? What are the benefits?

      We appreciate the reviewer’s suggestion. We will provide more background information on BASEHIT in the revised manuscript.

    1. Author response:

      eLife Assessment

      This important study reveals a role for IκBα in the regulation of embryonic stem cell pluripotency. The solid data in mouse embryonic stem cells include separation of function mutations in IκBα to dissect its non-canonical role as a chromatin regulator and its canonical function as NF-κB inhibitor. The conclusions could be strengthened by including better markers of differentiation status and additional controls or orthogonal approaches.

      We are thankful to the two reviewers and editors for their kind feedback and for highlighting the impact of NF-κB-independent IκBα function in stabilizing naïve pluripotency.

      To address the reviewers’ comments, we will perform further analysis of differentiation trajectories, as well as a deeper comparison of the epigenetic features in our IκBα-KO mESCs with the Serum/LIF and 2i/LIF conditions. Moreover, we recognize that some sentences need to be modified to soften our conclusions regarding the block in the naïve state and the global epigenetic effects, as the reviewers pointed out.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study probes the role of the NF-κB inhibitor IκBα in the regulation of pluripotency in mouse embryonic stem cells (mESCs). It follows from previous work that identified a chromatin-specific role for IκBα in the regulation of tissue stem cell differentiation. The work presented here shows that a fraction of IκBα specifically associates with chromatin in pluripotent stem cells. Using three Nfkbia-knockout lines, the authors show that IκBα ablation impairs the exit from pluripotency, with embryonic bodies (an in vitro model of mESC multi-lineage differentiation) still expressing high levels of pluripotency markers after sustained exposure to differentiation signals. The maintenance of aberrant pluripotency gene expression under differentiation conditions is accompanied by pluripotency-associated epigenetic profiles of DNA methylation and histone marks. Using elegant separation of function mutants identified in a separate study, the authors generate versions of IκBα that are either impaired in histone/chromatin binding or NF-κB binding. They show that the provision of the WT IκBα, or the NF-κB-binding mutant can rescue the changes in gene expression driven by loss of IκBα, but the chromatin-binding mutant cannot. Thus the study identifies a chromatin-specific, NF-κB-independent role of IκBα as a regulator of exit from pluripotency.

      Strengths:

      The strengths of the manuscript lie in: (a) the use of several orthogonal assays to support the conclusions on the effects of exit from pluripotency; (b) the use of three independent clonal Nfkbia-KO mESC lines (lacking IκBα), which increase confidence in the conclusions; and (c) the use of separation of function mutants to determine the relative contributions of the chromatin-associated and NF-κB-associated IκBα, which would otherwise be very difficult to unpick.

      Weaknesses:

      In this reviewer's view, the term "differentiation" is used inappropriately in this manuscript. The data showing aberrant expression of pluripotency markers during embryoid body formation are supported by several lines of evidence and are convincing. However, the authors call the phenotype of Nfkbia-KO cells a "differentiation impairment" while the data on differentiation markers are not shown (beyond the fact that H3K4me1, marking poised enhancers, is reduced in genes underlying GO processes associated with differentiation and organ development). Data on differentiation marker expression from the transcriptomic and embryoid body immunofluorescent experiments, for example, should be at hand without the need to conduct many more experiments and would help to support the conclusions of the study or make them more specific. The lack of probing the differentiation versus pluripotency genes may be a missed opportunity in gaining in-depth understanding of the phenotype associated with loss of the chromatin-associated function of IκBα.

      Specific answer to weaknesses for Reviewer 1:

      We have data showing the lack of expression of specific differentiation markers that we will add to the manuscript. Moreover, we will also globally analyse differentiation markers in our transcriptomic data to have a more accurate description of the phenotype.

      Reviewer #2 (Public review):

      Summary:

      This manuscript investigates the role of IκBα in regulating mouse embryonic stem cell (ESC) pluripotency and differentiation. The authors demonstrate that IκBα knockout impairs the exit from the naïve pluripotent state during embryoid body differentiation. Through mechanistic studies using various mutants, they show that IκBα regulates ESC differentiation through chromatin-related functions, independent of the canonical NF-κB pathway.

      Strengths:

      The authors nicely investigate the role of IκBα in pluripotency exit, using embryoid body formation and complementing the phenotypic analysis with a number of genome-wide approaches, including transcriptomic, histone marks deposition, and DNA methylation analyses. Moreover, they generate a first-of-its-kind mutant set that allows them to uncouple IκBα's function in chromatin regulation versus its NF-κB-related functions. This work contributes to our understanding of cellular plasticity and development, potentially interesting a broad audience including developmental biologists, chromatin biology researchers, and cell signaling experts.

      Weaknesses:

      - The study's main limitation is the lack of crucial controls using bona fide naïve cells across key experiments, including DNA methylation analysis, gene expression profiling in embryoid bodies, and histone mark deposition. This omission makes it difficult to evaluate whether the observed changes in IκBα-KO cells truly reflect naïve pluripotency characteristics.

      - Several conclusions in the manuscript require a more measured interpretation. The authors should revise their statements regarding the strength of the pluripotency exit block, the extent of hypomethylation, and the global nature of chromatin changes.

      - From a methodological perspective, the manuscript would benefit from additional orthogonal approaches to strengthen the knockout findings, which may be influenced by clonal expansion of ES cells.

      Overall, this study makes an important contribution to the field. However, the concerns raised regarding controls, data interpretation, and methodology should be addressed to strengthen the manuscript and support the authors' conclusions.

      Specific answer to weaknesses for Reviewer 2:

      - As the reviewer pointed out, we have not performed all the analyses by comparing with cells in 2i/LIF, since our initial study was focused on Serum/LIF and differentiation. However, it was the transcriptome analysis in Serum/LIF that showed, by GSEA, that KO cells resembled naïve ES cells in 2i/LIF. We have repeated key experiments with all conditions (Figure 1B, 1D, Figure 3C and 3), but we do not think that repeating all ‘omics’ experiments with 2i/LIF conditions will add important information. Nevertheless, we will analyze different chromatin data (DNA methylation and different histone post-translational modifications) from previously published works in 2i/LIF and Serum/LIF and compare them with our IκBα-WT and IκBα-KO mESCs to better confirm the stabilization of the ground state pluripotency in IκBα-KO mESCs under Serum/LIF conditions.

      - We agree that our statements about the strength of the pluripotency exit block, the extent of hypomethylation, and the global nature of chromatin changes should be toned down. There are many changes in the chromatin that we are trying to better characterize by Hi-C in ongoing studies, which are beyond the scope of this manuscript.

      - We have performed studies in 3 different IκBα-KO and WT clones. In addition, the reconstitution studies with IκBα separation-of-function (SOF) mutants, which show differential effects after expressing the NF-κB-binding form (IκBαΔChrom) or the chromatin-binding form (IκBαΔNF-κB), also support the robustness of this phenotype.

    1. The problem is, once you really understand a problem, you realize that most problems are not solvable at all. They’re tangled webs of causality, which one might call “wicked” problems (Coyne, R. (2005). Wicked problems revisited. Design Studies). The best you can do is understand this complex causality and find opportunities to adjust, nudge, and tweak

      I find this statement to be incredibly useful in understanding problems in our perception and approach to solving them. I think most of us were raised with one-dimensional notions of good and bad, right and wrong. We conflate the problems with the concept of something 'bad' which therefore suggests that there is a 'good' or a right to be made. If we apply the reasoning presented in this question, problems will never actually be solved in the greater context. A situation or instance that may present as a problem to one subject or subjects at a single point in time may not be problematic at all in a different context. Having such a fixed and rigid perspective of problems and their solutions can limit our ability to innovate and develop because such a perspective does not acknowledge or consider the alternative contexts that a 'problem' might exist in.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This study asks whether the phenomenon of crossmodal temporal recalibration, i.e. the adjustment of time perception by consistent temporal mismatches across the senses, can be explained by the concept of multisensory causal inference. In particular, the authors ask whether causal inference explains temporal recalibration better than a model assuming that crossmodal stimuli are always integrated, regardless of how discrepant they are.

      The study is motivated by previous work in the spatial domain, where it has been shown consistently across studies that the use of crossmodal spatial information is explained by the concept of multisensory causal inference. It is also motivated by the observation that the behavioral data showcasing temporal recalibration feature nonlinearities that, by their nature, cannot be explained by a fixed integration model (sometimes also called mandatory fusion).

      To probe this the authors implemented a sophisticated experiment that probed temporal recalibration in several sessions. They then fit the data using the two classes of candidate models and rely on model criteria to provide evidence for their conclusion. The study is sophisticated, conceptually and technically state-of-the-art, and theoretically grounded. The data clearly support the authors’ conclusions.

      I find the conceptual advance somewhat limited. First, by design, the fixed integration model cannot explain data with a nonlinear dependency on multisensory discrepancy, as already explained in many studies on spatial multisensory perception. Hence, it is not surprising that the causal inference model better fits the data.

      We have addressed this comment by including an asynchrony-contingent model, which is capable of predicting the nonlinearity of recalibration effects by employing a heuristic approximation of the causal-inference process (Fig. 3). We also updated the previous competitor model with a more reasonable asynchrony-correction model as the baseline of model comparison, which assumes recalibration aims to restore synchrony whenever the sensory measurement of SOA indicates an asynchrony. The causal-inference model outperformed both models, as indicated by model evidence (Fig. 4A). Furthermore, model predictions show that the causal-inference model more accurately captures recalibration at large SOAs at both the group (Fig. 4B) and the individual levels (Fig. S4).

      Second, and again similar to studies on spatial paradigms, the causal inference model fails to predict the behavioral data for large discrepancies. The model predictions in Figure 5 show the (expected) vanishing recalibration for large delta, while the behavioral data don’t decay to zero. Either the range of tested SOAs is too small to show that both the model and data converge to the same vanishing effect at large SOAs, or the model's formula is not the best for explaining the data. Again, the studies using spatial paradigms have the same problem, but in my view, this poses the most interesting question here.

      We included an additional simulation (Fig. 5B) to show that the causal-inference model can predict non-zero recalibration for long adapter SOAs, especially in observers with a high common-cause prior and low sensory precision. This ability to predict a non-zero recalibration effect even at large SOA, such as 0.7 s, is one key feature of the causal-inference model that distinguishes it from the asynchrony-contingent model.

      In my view there is nothing generally wrong with the study, it does extend the 'known' to another type of paradigm. However, it covers little new ground on the conceptual side.

      On that note, the small sample size of n=10 is likely not an issue, but still, it is on the very low end for this type of study.

      This study used a within-subject design, which included 3 phases each repeated in 9 sessions, totaling 13.5 hours per participant. This extensive data collection allows us to better constrain the model for each participant. Our conclusions are based on the different models’ ability to fit individual data.

      Reviewer #2 (Public Review):

      Summary:

      Li et al.’s goal is to understand the mechanisms of audiovisual temporal recalibration. This is an interesting challenge that the brain readily solves in order to compensate for real-world latency differences in the time of arrival of audio/visual signals. To do this they perform a 3-phase recalibration experiment on 9 observers that involves a temporal order judgment (TOJ) pretest and posttest (in which observers are required to judge whether an auditory and visual stimulus were coincident, auditory leading or visual leading) and a conditioning phase in which participants are exposed to a sequence of AV stimuli with a particular temporal disparity. Participants are required to monitor both streams of information for infrequent oddballs, before being tested again in the TOJ, although this time there are 3 conditioning trials for every 1 TOJ trial. Like many previous studies, they demonstrate that conditioning stimuli shift the point of subjective simultaneity (pss) in the direction of the exposure sequence.

      These shifts are modest - maxing out at around -50 ms for auditory leading sequences and slightly less than that for visual leading sequences. Similar effects are observed even for the longest offsets where it seems unlikely listeners would perceive the stimuli as synchronous (and therefore under a causal inference model you might intuitively expect no recalibration, and indeed simulations in Figure 5 seem to predict exactly that which isn't what most of their human observers did). Overall I think their data contribute evidence that a causal inference step is likely included within the process of recalibration.

      Strengths:

      The manuscript performs comprehensive testing over 9 days and 100s of trials and accompanies this with mathematical models to explain the data. The paper is reasonably clearly written and the data appear to support the conclusions.

      Weaknesses:

      While I believe the data contribute evidence that a causal inference step is likely included within the process of recalibration, this to my mind is not a mechanism but might be seen more as a logical checkpoint to determine whether whatever underlying neuronal mechanism actually instantiates the recalibration should be triggered.

      We have addressed this comment by replacing the fixed-update model with an asynchrony-correction model, which assumes that the system first evaluates whether the measurement of SOA is asynchronous, thus indicating a need for recalibration (Fig. 3). If it does, it shifts the audiovisual bias by a proportion of the measured SOA. We additionally included an asynchrony-contingent model, which is capable of replicating the nonlinearity of recalibration effects by a heuristic approximation of the causal-inference process.

      Model comparisons indicate that the causal-inference model of temporal recalibration outperforms both alternative models (Fig. 4A). Furthermore, the model predictions demonstrate that the causal-inference model more accurately captures recalibration at large SOAs at both the group level (Fig. 4B) and individual level (Fig. S4).
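      The asynchrony-correction rule described above (check whether the measured SOA looks asynchronous, then shift the audiovisual bias by a proportion of that measurement) can be sketched as a short simulation. This is only an illustration under stated assumptions, not the authors' implementation: the function name, the Gaussian measurement noise, the fixed synchrony criterion, and all parameter values are hypothetical.

```python
import random


def asynchrony_correction(adapter_soa, n_trials=250, learning_rate=0.01,
                          criterion=0.1, noise_sd=0.05, seed=0):
    """Return the accumulated bias shift (in seconds) after exposure.

    Assumed for illustration: Gaussian measurement noise and a symmetric
    synchrony criterion; neither is taken from the manuscript.
    """
    rng = random.Random(seed)
    bias_shift = 0.0
    for _ in range(n_trials):
        # Noisy measurement of the adapter SOA, remapped by the current shift.
        measured = adapter_soa + rng.gauss(0.0, noise_sd) - bias_shift
        # Update only when the measurement is judged asynchronous.
        if abs(measured) > criterion:
            bias_shift += learning_rate * measured
    return bias_shift


if __name__ == "__main__":
    for soa in (-0.3, 0.0, 0.3):
        print(soa, round(asynchrony_correction(soa), 3))
```

      In this sketch the shift grows in the direction of the adapter SOA and stalls once the remapped measurement falls inside the criterion, which is the kind of behaviour the baseline model is meant to capture.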

      The authors’ causal inference model strongly predicts that there should be no recalibration for stimuli at 0.7 s offset, yet only 3/9 participants appear to show this effect. They note that a significant difference in their design and that of others is the inclusion of longer lags, which are unlikely to originate from the same source, but don’t offer any explanation for this key difference between their data and the predictions of a causal inference model.

      We added further simulations to show that the causal-inference model can predict non-zero recalibration also for longer adapter SOAs, especially in observers with a large common-cause prior (Fig. 5A) and low sensory precision (Fig. 5B). This ability to predict a non-zero recalibration effect even at longer adapter SOAs, such as 0.7 s, is a key feature of the causal-inference model that distinguishes it from the asynchrony-contingent model.
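      To make the role of the common-cause prior and sensory precision concrete, the generic Bayesian causal-inference posterior can be sketched as follows. This is a simplified illustration, not the model fitted in the manuscript: it assumes Gaussian likelihoods (a narrow SOA distribution under a common cause, a broad one under separate causes), and every name and value is hypothetical.

```python
import math


def gaussian_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_common_cause(measured_soa, p_common, sensory_sd,
                           sd_common=0.05, sd_separate=0.5):
    """Posterior probability that sound and light share a cause.

    Assumed for illustration: true SOAs cluster near zero under a common
    cause (sd_common) and are broadly spread under separate causes
    (sd_separate); sensory noise adds variance to both.
    """
    like_common = gaussian_pdf(measured_soa, 0.0,
                               math.hypot(sd_common, sensory_sd))
    like_separate = gaussian_pdf(measured_soa, 0.0,
                                 math.hypot(sd_separate, sensory_sd))
    numerator = p_common * like_common
    return numerator / (numerator + (1.0 - p_common) * like_separate)


if __name__ == "__main__":
    # Precise observer with a neutral prior vs. noisy observer with a strong prior.
    print(posterior_common_cause(0.7, p_common=0.5, sensory_sd=0.05))
    print(posterior_common_cause(0.7, p_common=0.9, sensory_sd=0.3))
```

      Under these assumptions, the posterior at a 0.7 s SOA is essentially zero for a precise observer with a neutral prior but stays well above zero for a noisy observer with a strong common-cause prior, so an update weighted by this posterior would not vanish.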

      I’m also not completely convinced that the causal inference model isn’t ‘best’ simply because it has sufficient free parameters to capture the noise in the data. The tested models do not (I think) have equivalent complexity - the causal inference model fits best, but has more parameters with which to fit the data. Moreover, while it fits ‘best’, is it a good model? Figure S6 is useful in this regard but is not completely clear - are the red dots the actual data or the causal inference prediction? This suggests that it does fit the data very well, but is this based on predicting held-out data, or is it just that by having more parameters it can better capture the noise? Similarly, S7 is a potentially useful figure but it's not clear what is data and what are model predictions (what are the differences between each row for each participant; are they two different models or pre-test post-test or data and model prediction?!).

      I'm not an expert on the implementation of such models but my reading of the supplemental methods is that the model is fit using all the data rather than fit and tested on held-out data. This seems problematic.

      We recognize the risk of overfitting with the causal-inference model. We now rely on Bayesian model comparisons, which use model evidence for model selection. This method automatically incorporates a penalty for model complexity through the marginalization over the parameter space (MacKay, 2003).
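      The complexity penalty that arises from marginalizing over parameters can be illustrated with a toy example. The sketch below is not the authors' analysis (they approximate model evidence with Variational Bayesian Monte Carlo); it uses a grid approximation on made-up data, and all names, priors, and values are assumptions chosen for illustration.

```python
import math


def log_lik(data, mu, sd=1.0):
    """Gaussian log-likelihood of the data for a given mean."""
    return sum(-0.5 * ((x - mu) / sd) ** 2
               - math.log(sd * math.sqrt(2.0 * math.pi)) for x in data)


def log_evidence_free_mean(data, prior_sd=10.0, lo=-50.0, hi=50.0, n=20001):
    """Riemann-sum approximation of the integral of likelihood times prior."""
    step = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        mu = lo + i * step
        log_prior = (-0.5 * (mu / prior_sd) ** 2
                     - math.log(prior_sd * math.sqrt(2.0 * math.pi)))
        total += math.exp(log_lik(data, mu) + log_prior) * step
    return math.log(total)


# Hypothetical data whose mean is close to zero.
data = [0.1, -0.2, 0.05, 0.3, -0.15]

# Model A: mean fixed at 0 (no free parameter), so evidence equals likelihood.
log_ev_fixed = log_lik(data, 0.0)
# Model B: free mean with a broad prior, marginalized over that prior.
log_ev_free = log_evidence_free_mean(data)
# Best fit of model B on a coarse grid; never worse than model A's fit.
best_fit_free = max(log_lik(data, m / 100.0) for m in range(-100, 101))

log_bayes_factor = log_ev_fixed - log_ev_free  # > 0 favours the simpler model

if __name__ == "__main__":
    print(best_fit_free - log_ev_fixed, log_bayes_factor)
```

      Here the more flexible model fits at least as well at its best parameter value, yet its evidence is lower because most of its prior mass lies on parameter values that explain the data poorly.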

      Our design is not suitable for cross-validation because the model-fitting process is computationally intensive and time-consuming. Each fit of the causal-inference model takes approximately 30 hours, and multiple fits with different initial starting points are required to rule out that the parameter estimates correspond to local minima.

      I would have liked to have seen more individual participant data (which is currently in the supplemental materials, albeit in a not very clear manner as discussed above).

      We have revised Supplementary Figures S4-S6 to show additional model predictions of the recalibration effect for individual participants, and participants’ temporal-order judgments are now shown in Supplementary Figure S7. These figures confirm the better performance of the causal-inference model.

      The way that S3 is described in the text (line 141) makes it sound like everyone was in the same direction; however, it is clear that 2/9 listeners show the opposite pattern, and 2 have confidence intervals close to zero (albeit on the -ve side).

      We have revised the text to clarify that the asymmetry occurs in both directions and is idiosyncratic (lines 168-171). We summarized the distribution of the individual asymmetries of the recalibration effect across visual-leading and auditory-leading adapter SOAs in Supplementary Figure S2.

      Reviewer #3 (Public Review):

      Summary:

      Li et al. describe an audiovisual temporal recalibration experiment in which participants perform baseline sessions of ternary order judgments about audiovisual stimulus pairs with various stimulus-onset asynchronies (SOAs). These are followed by adaptation at several adapting SOAs (each on a different day), followed by post-adaptation sessions to assess changes in psychometric functions. The key novelty is the formal specification and application/fit of a causal-inference model for the perception of relative timing, providing simulated predictions for the complete set of psychometric functions both pre and post-adaptation.

      Strengths:

      (1) Formal models are preferable to vague theoretical statements about a process, and prior to this work, certain accounts of temporal recalibration (specifically those that do not rely on a population code) had only qualitative theoretical statements to explain how/why the magnitude of recalibration changes non-linearly with the stimulus-onset asynchrony of the adapter.

      (2) The experiment is appropriate, the methods are well described, and the average model prediction is a fairly good match to the average data (Figure 4). Conclusions may be overstated slightly, but seem to be essentially supported by the data and modelling.

      (3) The work should be impactful. There seems a good chance that this will become the go-to modelling framework for those exploring non-population-code accounts of temporal recalibration (or comparing them with population-code accounts).

      (4) A key issue for the generality of the model, specifically in terms of recalibration asymmetries reported by other authors that are inconsistent with those reported here, is properly acknowledged in the discussion.

      Weaknesses:

      (1) The evidence for the model comes in two forms. First, two trends in the data (non-linearity and asymmetry) are illustrated, and the model is shown to be capable of delivering patterns like these. Second, the model is compared, via AIC, to three other models. However, the main comparison models are clearly not going to fit the data very well, so the fact that the new model fits better does not seem all that compelling. I would suggest that the authors consider a comparison with the atheoretical model they use to first illustrate the data (in Figure 2). This model fits all sessions but with complete freedom to move the bias around (whereas the new model constrains the way bias changes via a principled account). The atheoretical model will obviously fit better, but will have many more free parameters, so a comparison via AIC/BIC or similar should be informative.

      In the revised manuscript, we switched from AIC to Bayesian model selection, which approximates and compares model evidence. This method incorporates a strong penalty for model complexity through marginalization over the parameter space (MacKay, 2003).

      We have addressed this comment by updating the former competitor model into a more reasonable version that induces recalibration only for some measured SOAs and by including another (asynchrony-contingent) model that is capable of predicting the nonlinearity and asymmetry of recalibration (Fig. 3) while heuristically approximating the causal inference computations. The causal-inference model outperformed the asynchrony-contingent model, as indicated by model evidence (Fig. 4A). Furthermore, model predictions show that the causal-inference model more accurately captures recalibration at large SOAs at both the group (Fig. 4B) and the individual level (Fig. S4).

      (2) It does not appear that some key comparisons have been subjected to appropriate inferential statistical tests. Specifically, lines 196-207 - presumably this is the mean (and SD or SE) change in AIC between models across the group of 9 observers. So are these differences actually significant, for example via t-test?

      We statistically compared the models using Bayes factors (Fig. 4A). The model evidence for each model was approximated using Variational Bayesian Monte Carlo. Bayes factors provided strong evidence in support of the causal-inference model relative to the other models.

      (3) The manuscript tends to gloss over the population-code account of temporal recalibration, which can already provide a quantitative account of how the magnitude of recalibration varies with adapter SOA. This could be better acknowledged, and the features a population code may struggle with (asymmetry?) could be considered.

      We simulated a population-code model to examine its prediction of the recalibration effect for different adapter SOAs (lines 380–388, Supplement Section 8). The population-code model can predict the nonlinearity of recalibration, i.e., a decreasing recalibration effect as the adapter SOA increases. However, to capture the asymmetry of recalibration effects across auditory-leading and visual-leading adapter stimuli, we would need to assume that the auditory-leading and visual-leading SOAs are represented by neural populations with unequal tuning curves.

      (4) The engagement with relevant past literature seems a little thin. Firstly, papers that have applied causal inference modeling to judgments of relative timing are overlooked (see references below). There should be greater clarity regarding how the modelling here builds on or differs from these previous papers (most obviously in terms of additionally modelling the recalibration process, but other details may vary too). Secondly, there is no discussion of previous findings like that in Fujisaki et al.’s seminal work on recalibration, where the spatial overlap of the audio and visual events didn’t seem to matter (although admittedly this was an N = 2 control experiment). This kind of finding would seem relevant to a causal inference account.

      References:

      Magnotti JF, Ma WJ and Beauchamp MS (2013) Causal inference of asynchronous audiovisual speech. Front. Psychol. 4:798. doi: 10.3389/fpsyg.2013.00798

      Sato, Y. (2021). Comparing Bayesian models for simultaneity judgement with different causal assumptions. J. Math. Psychol., 102, 102521.

      We have revised the Introduction and Discussion to better situate our study within the existing literature. Specifically, we have incorporated the suggested references (lines 66–69) and provided clearer distinctions on how our modeling approach builds on or differs from previous work on causal-inference models, particularly in terms of modeling the recalibration process (lines 75–79). Additionally, we have discussed findings that might contradict the assumptions of the causal-inference model (lines 405–424).

      (5) As a minor point, the model relies on simulation, which may limit its take-up/application by others in the field.

      Upon acceptance, we will publicly share the code for all models (simulation and parameter fitting) to enable researchers to adapt and apply these models to their own data.

      (6) There is little in the way of reassurance regarding the model’s identifiability and recoverability. The authors might for example consider some parameter recovery simulations or similar.

      We conducted a model recovery for each of the six models described in the main text and confirmed that the asynchrony-contingent and causal-inference models are identifiable (Supplement Section 11). Simulations of the asynchrony-correction model were sometimes best fit by causal-inference models, because the latter behaves similarly when the prior of a common cause is set to one.

      We also conducted a parameter recovery for the winning model, the causal-inference model with modality-specific precision (Supplement Section 13).

      Key parameters, including audiovisual bias, amount of auditory latency noise, amount of visual latency noise, criterion, and lapse rate, showed satisfactory recovery performance. The less accurate recovery of  is likely due to a tradeoff with the learning rate.

      (7) I don't recall any statements about open science and the availability of code and data.

      Upon acceptance of the manuscript, all code (simulation and parameter fitting) and data will be made publicly available on OSF.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      In addition to the comments below, we would like to offer the following summary based on the discussion between reviewers:

      The major shortcoming of the work is that there should ideally be a bit more evidence to support the model, over and above a demonstration that it captures important trends and beats an account that was already known to be wrong. We suggest you:

      (1) Revise the figure legends (Figure 5 and Figure 6E).

      We revised all figures and figure legends.

      (2) Additionally report model differences in terms of BIC (which will favour the preferred model less under the current analysis);

      We now base the model comparison on Bayesian model selection, which approximates and compares model evidence. This method incorporates a strong penalty for model complexity through marginalization over the parameter space (MacKay, 2003).
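      For readers comparing the criteria mentioned in this exchange, AIC and BIC can be computed from a maximized log-likelihood as sketched below. The formulas are the standard definitions; the function names and the example numbers are hypothetical and are not taken from the manuscript.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_observations):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_observations) - 2 * log_likelihood


# Hypothetical fits: the complex model fits better but has more parameters.
simple = {"log_likelihood": -520.0, "n_params": 4}
complex_ = {"log_likelihood": -510.0, "n_params": 9}
n_obs = 1000

aic_simple = aic(**simple)
aic_complex = aic(**complex_)
bic_simple = bic(n_observations=n_obs, **simple)
bic_complex = bic(n_observations=n_obs, **complex_)

if __name__ == "__main__":
    print(aic_simple, aic_complex)
    print(bic_simple, bic_complex)
```

      With these made-up numbers, AIC prefers the more complex model while BIC prefers the simpler one, because the BIC penalty per parameter (ln n) exceeds the AIC penalty (2) once n is larger than about 7.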

      (3) Move to instead fitting the models multiple times in order to get leave-one-out estimates of best-fitting loglikelihood for each left-out data point (and then sum those for the comparison metric).

      Unfortunately, our design is not suitable for cross-validation methods because the model-fitting process is computationally intensive and time-consuming. Each fit of the causal-inference model takes approximately 30 hours, and multiple fits with different initial starting points are required to rule out local minima.
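      The leave-one-out procedure suggested by the editor requires one refit per held-out data point, which is why fit time dominates its cost. A minimal sketch on a toy model follows; the Gaussian-mean model and all names are assumptions for illustration and stand in for the much more expensive fits discussed here.

```python
import math


def fit_mean(values):
    """Maximum-likelihood estimate of the mean for a unit-variance Gaussian."""
    return sum(values) / len(values)


def gaussian_log_pdf(x, mean, sd=1.0):
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))


def loo_log_likelihood(data):
    """Sum of held-out log-likelihoods, refitting once per left-out point."""
    total = 0.0
    for i, held_out in enumerate(data):
        training = data[:i] + data[i + 1:]
        total += gaussian_log_pdf(held_out, fit_mean(training))
    return total


if __name__ == "__main__":
    print(loo_log_likelihood([0.0, 1.0, 2.0]))
```

      The number of refits equals the number of held-out units, so when a single fit takes many hours the total cost scales accordingly.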

      (4) Offer a comparison with a more convincing model (for example, an atheoretical fit with free parameters for all adapters, as suggested by Reviewer 3).

      We updated the previous competitor model and included an asynchrony-contingent model, which is capable of predicting the nonlinearity of recalibration (Fig. 3). The causal-inference model still outperformed the asynchrony-contingent model (Fig. 4A). Furthermore, model predictions show that only the causal-inference model captures non-zero recalibration effects for long adapter SOAs at both the group level (Fig. 4B) and individual level (Figure S4).

      Reviewer #1 (Recommendations For The Authors):

      A larger sample size would be better.

      This study used a within-subject design, which included 9 sessions, totaling 13.5 hours per participant. This extensive data collection allows us to better constrain the model for each participant. Our conclusions are based on the different models’ ability to fit individual data rather than on group statistics.

      It would be good to better put the study in the context of spatial ventriloquism, where similar model comparisons have been done over the last ten years and there is a large body of work to connect to.

      We now discuss our model in relation to models of cross-modal spatial recalibration in the Introduction (lines 70–78) and Discussion (lines 324–330).

      Reviewer #2 (Recommendations For The Authors):

      Previous authors (e.g., Yarrow et al.) have described latency shift and criterion change models as providing a good fit of experimental data. Did the authors attempt a criterion shift model in addition to a shift model?

      We have considered criterion-shift variants of our atheoretical recalibration models in Supplement Section 1. To summarize the results, we varied two model assumptions: 1) the use of either a Gaussian or an exponential measurement distribution, and 2) recalibration being implemented either as a shift of bias or a criterion. We fit each model variant separately to the ternary TOJ responses of all sessions. Bayesian model comparisons indicated that the bias-shift model with exponential measurement distributions best captured the data of most participants.
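      The difference between a bias shift and a criterion shift can be illustrated with a ternary response model. The sketch below uses the Gaussian measurement variant only (the exponential variant that the authors report as the better fit is not reproduced here), and the function names, sign convention, and parameter values are hypothetical.

```python
import math


def normal_cdf(x, mean, sd):
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def ternary_probs(soa, bias=0.0, sd=0.08, lower=-0.1, upper=0.1):
    """Return P(auditory first), P(simultaneous), P(visual first).

    Assumed convention: positive SOA means the visual stimulus leads, and
    the measured SOA is Gaussian with mean soa + bias. Measurements below
    `lower` or above `upper` yield an order response; those in between
    yield "simultaneous".
    """
    mean = soa + bias
    p_auditory_first = normal_cdf(lower, mean, sd)
    p_visual_first = 1.0 - normal_cdf(upper, mean, sd)
    p_simultaneous = 1.0 - p_auditory_first - p_visual_first
    return p_auditory_first, p_simultaneous, p_visual_first


if __name__ == "__main__":
    print(ternary_probs(0.0))              # baseline
    print(ternary_probs(0.0, bias=0.05))   # bias shift: all three change
    print(ternary_probs(0.0, upper=0.2))   # criterion shift: one side changes
```

      In this sketch a bias shift translates the whole set of psychometric functions along the SOA axis, whereas moving a single criterion widens the simultaneity window on one side and leaves the opposite order response untouched.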

      Figure 4B - I'm not convinced that the modality-independent uncertainty is anything but a straw man. Models not allowed to be asymmetric do not show asymmetry? (the asymmetry index is irrelevant in the fixed update model as I understand it so it is not surprising the model is identical?).

      We included the assumption that temporal uncertainty might be modality-independent for several reasons. First, there is evidence suggesting that a central mechanism governs the precision of temporal-order judgments (Hirsh & Sherrick, 1961), indicating that precision is primarily limited by a central mechanism rather than the sensory channels themselves. Second, from a modeling perspective, it was necessary to test whether an audio-visual temporal bias alone, i.e., assuming modality-independent uncertainty, could introduce asymmetry across adapter SOAs. Additionally, most previous studies implicitly assumed symmetric likelihoods, i.e., modality-independent latency noise, by fitting cumulative Gaussians to the psychometric curves derived from 2AFC-TOJ tasks (Di Luca et al., 2009; Fujisaki et al., 2004; Harrar & Harris, 2005; Keetels & Vroomen, 2007; Navarra et al., 2005; Tanaka et al., 2011; Vatakis et al., 2007, 2008; Vroomen et al., 2004).

      Why does a zero SOA adapter shift the pss towards auditory leading? Is this a consequence of the previous day’s conditioning - it’s not clear from the methods whether all listeners had the same SOA conditioning sequence across days.

      The auditory-leading recalibration effect for an adapter SOA of zero has been consistently reported in previous studies (e.g., Fujisaki et al., 2004; Vroomen et al., 2004). This effect reflects the asymmetry in recalibration. This asymmetry can be explained by differences across modalities in the noisiness of the latencies (Figure 5C) in combination with audiovisual temporal bias (Figure S8).

      We added details about the order of testing to the Methods section (lines 456–457).

      Reviewer #3 (Recommendations For The Authors):

      Abstract

      “Our results indicate that human observers employ causal-inference-based percepts to recalibrate cross-modal temporal perception” Your results indicate this is plausible. However, this statement (basically repeated at the end of the intro and again in the discussion) is - in my opinion - too strong.

      We have revised the statement as suggested.

      Intro and later

      Within the wider literature on relative timing perception, the temporal order judgement (TOJ) task refers to a task with just two response options. Tasks with three response options, as employed here, are typically referred to as ternary judgments. I would suggest language consistent with the existing literature (or if not, the contrast to standard usage could be clarified).

      Ref: Ulrich, R. (1987). Threshold models of temporal-order judgments evaluated by a ternary response task. Percept. Psychophys., 42, 224-239.

      We revised the term for the task as suggested throughout the manuscript.

      Results, 2.2.2

      “However, temporal precision might not be due to the variability of arrival latency.” Indeed, although there is some recent evidence that it might be.

      Ref: Yarrow, K., Kohl, C, Segasby, T., Kaur Bansal, R., Rowe, P., & Arnold, D.H. Neural-latency noise places limits on human sensitivity to the timing of events. Cognition, 222, 105012 (2022).

      We included the reference as suggested (lines 245–248).

      Methods, 4.3.

      Should there be some information here about the order of adaptation sessions (e.g. random for each observer)?

      We added details about the order of testing to the Methods section (lines 456–457).

      Supplemental material section 1.

      Here, you test whether the changes resulting from recalibration look more like a shift of the entire psychometric function or an expansion of the psychometric function on one side (most straightforwardly compatible with a change of one decision criterion). Fine, but the way you have done this is odd, because you have introduced a further difference in the models (Gaussian vs. exponential latency noise) so that you cannot actually conclude that the trend towards a win for the bias-shift model is simply down to the bias vs. criterion difference. It could just as easily be down to the different shapes of psychometric functions that the two models can predict (with the exponential noise model permitting asymmetry in slopes). There seems to be no reason that this comparison cannot be made entirely within the exponential noise framework (by a very simple reparameterization that focuses on the two boundaries rather than the midpoint and extent of the decision window). Then, you would be focusing entirely on the question of interest. It would also equate model parameters, removing any reliance on asymptotic assumptions being met for AIC.

      We revised our exploration of atheoretical recalibration models. To summarize the results, we varied two model assumptions: 1) the use of either a Gaussian or an exponential measurement distribution, and 2) recalibration being implemented either as a shift of the cross-modal temporal bias or as a shift of the criterion. We fit each model separately to the ternary TOJ responses of all sessions. Bayesian model comparisons indicated that the bias-shift model with exponential measurement distributions best described the data of most participants.
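      The reviewer's point about equating parameter counts can be illustrated with a short sketch. Since AIC = 2k - 2 ln L, two models with the same number of free parameters k differ in AIC only through their maximized log-likelihoods, so the penalty term cancels. The function names, log-likelihood values, and parameter count below are hypothetical and are not taken from the manuscript.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: AIC = 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def delta_aic(loglik_a: float, k_a: int, loglik_b: float, k_b: int) -> float:
    """AIC of model A minus AIC of model B (negative values favour A)."""
    return aic(loglik_a, k_a) - aic(loglik_b, k_b)


# Hypothetical maximized log-likelihoods for one participant (illustrative only).
loglik_bias_shift = -512.3
loglik_criterion_shift = -518.9
k = 5  # assumed: both models reparameterized to the same number of free parameters

diff = delta_aic(loglik_bias_shift, k, loglik_criterion_shift, k)
# With equal k, the difference reduces to -2 * (ln L_A - ln L_B).
print(f"delta AIC (bias shift - criterion shift): {diff:.1f}")
```

      With matched parameter counts, the comparison rests entirely on goodness of fit, which is the situation the reviewer describes.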

      References

      Di Luca, M., Machulla, T.-K., & Ernst, M. O. (2009). Recalibration of multisensory simultaneity: cross-modal transfer coincides with a change in perceptual latency. Journal of Vision, 9(12), Article 7.

      Fujisaki, W., Shimojo, S., Kashino, M., & Nishida, S. ’ya. (2004). Recalibration of audiovisual simultaneity. Nature Neuroscience, 7(7), 773–778.

      Harrar, V., & Harris, L. R. (2005). Simultaneity constancy: detecting events with touch and vision. Experimental Brain Research. Experimentelle Hirnforschung. Experimentation Cerebrale, 166(3-4), 465–473.

      Hirsh, I. J., & Sherrick, C. E., Jr. (1961). Perceived order in different sense modalities. Journal of Experimental Psychology, 62(5), 423–432.

      Keetels, M., & Vroomen, J. (2007). No effect of auditory-visual spatial disparity on temporal recalibration. Experimental Brain Research. Experimentelle Hirnforschung. Experimentation Cerebrale, 182(4), 559–565.

      MacKay, D. J. (2003). Information theory, inference and learning algorithms. https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=201b835c3f3a3626ca07be68cc28cf7d286bf8d5

      Navarra, J., Vatakis, A., Zampini, M., Soto-Faraco, S., Humphreys, W., & Spence, C. (2005). Exposure to asynchronous audiovisual speech extends the temporal window for audiovisual integration. Brain Research. Cognitive Brain Research, 25(2), 499–507.

      Tanaka, A., Asakawa, K., & Imai, H. (2011). The change in perceptual synchrony between auditory and visual speech after exposure to asynchronous speech. Neuroreport, 22(14), 684–688.

      Vatakis, A., Navarra, J., Soto-Faraco, S., & Spence, C. (2007). Temporal recalibration during asynchronous audiovisual speech perception. Experimental Brain Research. Experimentelle Hirnforschung. Experimentation Cerebrale, 181(1), 173–181.

      Vatakis, A., Navarra, J., Soto-Faraco, S., & Spence, C. (2008). Audiovisual temporal adaptation of speech: temporal order versus simultaneity judgments. Experimental Brain Research. Experimentelle Hirnforschung. Experimentation Cerebrale, 185(3), 521–529.

      Vroomen, J., Keetels, M., de Gelder, B., & Bertelson, P. (2004). Recalibration of temporal order perception by exposure to audio-visual asynchrony. Brain Research. Cognitive Brain Research, 22(1), 32–35.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1 (Public Review):

      Summary:

      In this manuscript by Bimbard et al., a new method to perform stable recordings over long periods of time with Neuropixels, as well as the technical details on how the electrodes can be explanted for follow-up reuse, is provided. I think the description of all parts of the method is very clear, and the validation analyses (n of units per day over time, RMS over recording days...) are very convincing. I however missed a stronger emphasis on why this could have a big impact on the ephys community, by enabling new analyses, new behavior correlation studies, or the study of neurophysiological mechanisms across temporal scales.

      Strengths:

      Open source method. Validation across laboratories. Across species (mice and rats) demonstration of its use and in different behavioral conditions (head-fixed and freely moving).

      Weaknesses:

      Weak emphasis on what can be enabled with this new method that didn't exist before.

      We thank the reviewer for highlighting the limited discussion around scientific impact. Our implant has several advantages which combine to make it much more accessible than previous solutions. This enables a variety of recording configurations that would not have been possible with previous designs, facilitating recordings from a wider range of brain regions, animals, and experimental setups. In short, there are three key advances which we now emphasise in the manuscript:

      Adaptability: The CAD files can be readily adapted to a wide range of configurations (implantation depth, angle, position of headstage, etc.). Labs have already modified the design for their needs, and re-shared with the community (Discussion, Para 5).

      Weight: Because of the lightweight design, experimenters can i) perform complex and demanding freely moving tasks as we exemplify in the manuscript, and ii) implant female and water restricted mice while respecting animal welfare weight limitations (Flexible design, Para 1).

      Cost: At ~$10, our implant is significantly cheaper than published alternatives, which makes it affordable to more labs and means that testing modifications is cost-effective (Discussion, Para 4).

      Reviewer 1 (Recommendations For The Authors):

      - Differences between mice and rats seem very significant. Although this is probably not surprising, I suggest that the authors comment on this to make it clear to anyone trying to use in different species that are not quantified in the main figures.

      The reviewer is correct: there are qualitative differences between mice and rats, particularly with respect to the unit median amplitude. We have added a comment in the discussion to highlight these inter-species variations (Discussion, Para 7).

      - Another comment that would be useful to have would be how to tackle the problem of tracking the same neuron across days. Even if currently impossible, it could be useful to provide discussion along those lines as to where future improvements (either in hardware or software) can be made.

      We thank the reviewer for highlighting this. Figure 5 does show data from tracking the same neuron across days (and even months). We have modified the language to make this clear.

      Reviewer 2 (Public Review):  

      Summary:

      This work by Bimbard et al. introduces a new implant for Neuropixels probes. While Neuropixels probes have critically improved and extended our ability to record the activity of a large number of neurons with high temporal resolution, the use of these expensive devices in chronic experiments has so far been hampered by the difficulty of safely implanting them and, importantly, of explanting and reusing them after conclusion of the experiment. The authors present a newly designed two-part implant, consisting of a docking and a payload module, that allows for secure implantation and straightforward recovery of the probes. The implant is lightweight, making it amenable for use in mice and rats, and customizable. The authors provide schematics and files for printing of the implants, which can be easily modified and adapted to custom experiments by researchers with little to no design experience. Importantly, the authors demonstrate the successful use of this implant across multiple use cases, in head-fixed and freely moving experiments, in mice and rats, with different versions of Neuropixels probes, and across 8 different labs. Taken together, the presented implants promise to make chronic Neuropixels recordings and long-term studies of neuronal activity significantly easier and attainable for both current and future Neuropixels users.

      Strengths:

      The implants have been successfully tested across 8 different laboratories, in mice and rats, in head-fixed and freely moving conditions, and have been adapted in multiple ways for a number of distinct experiments.

      Implants are easily customizable and the authors provide a straightforward approach for customization across multiple design dimensions even for researchers not experienced in design.

      The authors provide clear and straightforward descriptions of the construction, implantation, and explant of the described implants.

      The split of the implant into a docking and payload module makes reuse even in different experiments (using different docking modules) easy.

      The authors demonstrate that implants can be re-used multiple times and still allow for high-quality recordings.

      The authors show that the chronic implantations allow for the tracking of individual neurons across days and weeks (using additional software tracking solutions), which is critical for a large number of experiments requiring the description of neuronal activity, e.g. throughout learning processes.

      The authors show that implanted animals can even perform complex behavioral tasks, with no apparent reduction in their performance.

      Weaknesses:

      While implanted animals can still perform complex behavioral tasks, the authors describe that the implants may reduce the animals' mobility, as measured by prolonged reaction times. However, the presented data does not allow us to judge whether this effect is specifically due to the presented implant or whether any implant or just tethering of the animals per se would have the same effects.

      The reviewer is correct: some of the differences in mouse reaction time could be due to the tether rather than the implant. As these experiments were also performed in water-restricted female mice with the heavier Neuropixels 1.0 implant, our data represent the maximal impact of the implant, and we have highlighted this point in the revision (Freely behaving animals, Para 2).  

      While the authors make certain comparisons to other, previously published approaches for chronic implantation and re-use of Neuropixels probes, it is hard to make conclusive comparisons and judge the advantages of the current implant. For example, while the authors emphasize that the lower weight of their implant allows them to perform recordings in mice (and is surely advantageous), the previously described, heavier implants they mention (Steinmetz et al., 2021; van Daal et al., 2021), have also been used in mice. Whether the weight difference makes a difference in practice therefore remains somewhat unclear.

      The reviewer is correct: without a direct comparison, we cannot be certain that our smaller, lighter implant improves behavioural results (although this is supported by the literature, e.g. Newman et al., 2023). However, the reduced weight of our implant is critical for several laboratories represented in this manuscript due to animal welfare requirements. Indeed, in van Daal et al., the authors “recommend a [mouse] weight of >25 g for implanting Neuropixels 1.0 probes.” This limit precludes using (the vast majority of) female mice, or water-restricted animals. Conversely, our implant can be routinely used with lighter, water-restricted male and female mice. We emphasised this point in the revision (Discussion, Para 2).

      The non-permanent integration of the headstages into the implant, while allowing for the use of the same headstage for multiple animals in parallel, requires repeated connections and does not provide strong protection for the implant. This may especially be an issue for the use in rats, requiring additional protective components as in the presented rat experiments.

      We apologise for not clarifying the various headstage holder options in the manuscript; we have now addressed this in the revision (Freely behaving animals, Para 1&2). Our repository has headstage holder designs (in the XtraModifications/Mouse_FreelyMoving folder). These allow the headstage to be left on the implant, and thus minimize the number of connections (albeit increasing the weight for the mouse). Indeed, mice recorded while performing the task described in our manuscript had the headstage semi-permanently integrated into the implant, and we now highlight this in the revision (Freely behaving animals, Para 1).

      Reviewer 2 (Recommendations For The Authors): 

      The description of the different versions of the head-stage holders should be more clear, listing also advantages/disadvantages of the different solutions. It would be also useful if the authors could comment on the use of these head-stage holders in rats, since they do not seem to offer much protection.

      We thank the reviewer for this point, and we have added notes to the manuscript to clarify the various advantages of the different headstage-holders, and that the headstage can be permanently attached to the implant (Freely behaving animals, Para 1&2). This is the primary advantage of these solutions compared with the minimal implant—at the expense of increasing the implant weight.  

      The reviewer’s concerns regarding the lack of protection for implants in rats are well placed, and we now emphasise that these experiments benefited from the additional protection of an external 3D casing, which is likely critical for use in larger animals (Freely behaving animals, Para 1).

      While re-used probes seem to show similar yields across multiple uses (Figure 4C), it seems as if there is a much higher variability of the yield for probes that are used for the first (maybe also second) time. There are probes with much higher than average yields, but it seems none of the re-used probes show such high yields. Is this a real effect? Is this because the high-yield probes happened to have not been used multiple times? Is there an analysis the authors could provide to reduce the concern that yields may generally be lower for re-used probes/that there are no very high yields for re-used probes?

      We understand the reviewer’s concern with respect to Figure 4C; however, the re-use of any given probe was determined only by the experimental needs of the project. It is therefore not possible that there is a relationship between probes selected for re-use and unit yield. We now specify this in the revised legend of Figure 4C. This variability (and the consistency in yield across uses) likely stems from differences between labs, brain regions, and implantation protocols.

      The authors claim that a 'large fraction' of units could be tracked for the entire duration of the experiment (Figure 5A,B). They mention in the discussion that quantification can be found in a different manuscript (van Beest et al., 2023), but this should also be quantified here in at least some more detail, also for other animals in addition to the one mouse which was recorded for ~100 days. What fraction can be held for different durations? What is the average holding time, etc.?

      We agree with the reviewer, and have now added new panels quantifying the probability and reliability of tracking a neuron across days (Figure 5E-F). We also comment on the change in tracking probability across time, and its variability across recordings (Stability, Para 4).

      Reviewer 3 (Public Reviews):

      Summary:

      In this manuscript, Bimbard and colleagues describe a new implant apparatus called "Apollo Implant", which should facilitate recording in freely moving rodents (mice and rats) using Neuropixels probes. The authors collected data from both mice and rats and used 3 different versions of Neuropixels; impressively, multiple labs have already adopted this method. They openly share their CAD designs and surgery protocol to further facilitate the adaptation of their method.

      Strengths:

      Overall, the "Apollo Implant" is easy to use and adapt, as it has been used in other laboratories successfully and custom modifications are already available. The device is reproducible using common 3D printing services and can be easily modified thanks to its CAD design (the video explaining this is extremely helpful). The weight and price are amazing compared to other systems for rigid silicon probes allowing a wide range of use of the "Apollo Implant".

      Weaknesses:

      The "Apollo Implant" can only handle Neuropixels probes. It cannot hold other widely used and commercially available silicon probes. Certain angles and distances are not possible in their current form (distance between probes 1.8 to 4mm, implantation depth 2-6.5 mm, or angle of insertion up to 20 degrees).

      As we now discuss in the manuscript (Discussion, Para 4), one implant accommodating the diversity of the existing probes is beyond the scope of this project. However, because the design is adaptable, groups should be able to modify the current version of the implant to adapt to their electrodes’ size and format (and can highlight any issues in the Github “Discussions” area).

      With Neuropixels, the current range of depths covers practically all trajectories in the mouse brain. In rats, where deeper penetrations may be useful, the experimenter can attach the probe at a lower point in the payload module to expose more of the shank. We now specify this in the Github repository.  

      We have now extended the range of inter-probe distances from a maximum of 4 mm to 6.5 mm. Distances beyond this may be better served by 2 implants, and smaller distances could be achieved by attaching two probes on the same side of the docking module. These points are now specified in the revised manuscript (Flexible design, Para 2).

      Reviewer 3 (Recommendations For The Authors):

      I have only a few questions and suggestions:

      Is it possible to create step-by-step instructions for explantation (similar to Figure-1 with CAD schematics)? You mention that payload holder is attached to a micromanipulator, but it is unclear how this is achieved. How was the payload secured with a screw (which screw)? My understanding is that as you turn the screw in the payload holder, it will grab onto the payload module from both sides, but the screw is not in contact with the payload module, correct? I found the screw type on your GitHub, but it would be great if you could add a bill of materials in a table format, so readers don't have to jump between GitHub and article.

      We have now added a bill of materials to the revised manuscript (Implant design and materials, Para 2), although up-to-date links are still provided on the Github repository due to changing availability.

      What happens if you do a dual probe implant and cannot avoid blood vessels in one or both of the craniotomies due to the pre-defined geometry? Is this a frequent issue? How can you overcome this during the surgery?

      Blood vessels can be difficult to avoid in some cases, but we are typically able to rotate/reposition the probes to solve this issue. In some cases, with 4-shank probes, the blood vessel can be positioned between individual probe shanks. We now detail this in the revised manuscript (Assembly and implantation, Para 3).

      I assume if the head is not aligned (line-332) the probe can break during recovery. Have you experienced this during explantation?

      As we now specify in the manuscript (Explantation, Para 2), we are careful when explanting the probe to avoid this issue, and due to the flexibility of the shanks, it does not appear to be a major concern.

      Why did you remove the UV glue (line 435)? How can you level the skull? I assume you have covered bregma and lambda in the first surgery which can create an uneven surface to measure even after you remove the UV glue.

      We thank the reviewer for highlighting this omission from the methods. We now explain (Implantation, Carandini-Harris laboratory) that the UV-glue is completely removed during the second surgery, and the skull is cleaned and scored. This improves the adhesion of the dental cement, and allows for reliable levelling of the skull.

      In line 112 you mentioned that the number of recorded neurons was stable; however, you found a 3% mean decrease in unit count per day (line 120). Stability is great until day 10 (in Figure 4A), but it deteriorates quickly after that. I think it would help readers if you could add the mean ± SEM of recorded units in the text for days 1-10, days 11-50, and days 51-100 (using the data from Figure 4A).

      We have now added Supplementary Figure 4 to show unit count across bins of days, and a corresponding comment in the text (Stability, Para 2).
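      The binned summary requested by the reviewer can be sketched as follows. This is a minimal illustration, assuming unit counts are available as (day, count) pairs; the helper name `mean_sem_by_bin` and all values are hypothetical and do not reproduce the authors' analysis or data.

```python
import math
import statistics

# Day bins suggested by the reviewer (inclusive bounds).
DAY_BINS = [(1, 10), (11, 50), (51, 100)]


def mean_sem_by_bin(records, bins=DAY_BINS):
    """Summarize unit counts per day bin.

    records: iterable of (day, unit_count) pairs.
    Returns {(lo, hi): (mean, sem, n)} for every bin that contains data.
    """
    records = list(records)
    summary = {}
    for lo, hi in bins:
        counts = [count for day, count in records if lo <= day <= hi]
        if not counts:
            continue  # no recordings fell into this bin
        mean = statistics.fmean(counts)
        # SEM = sample standard deviation / sqrt(n); undefined for a single value.
        if len(counts) > 1:
            sem = statistics.stdev(counts) / math.sqrt(len(counts))
        else:
            sem = float("nan")
        summary[(lo, hi)] = (mean, sem, len(counts))
    return summary


# Hypothetical (day, unit count) pairs, for illustration only.
records = [(1, 100), (5, 90), (10, 80), (20, 60), (40, 50), (60, 45), (90, 35)]
for (lo, hi), (mean, sem, n) in mean_sem_by_bin(records).items():
    print(f"days {lo}-{hi}: {mean:.1f} ± {sem:.1f} units (n={n})")
```

      Whether the SEM is computed across recordings, probes, or animals changes its interpretation, so the unit of replication would need to be stated alongside any such summary.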

      A full survey of the probe (Figure 4B) means that you recorded neuronal activity across 4-5000 channels (depending on how many channels were in the brain). While it is clear that a full probe survey can reduce the number of animals needed for a study, it is also clear in this figure that by day 25 you can record ~300 neurons on 4000 channels. It would be great to discuss this in the discussion and give a balanced view of the long-term stability of these recordings.

      Overall, keeping a large number of units for a long time still remains a challenge. Here, we could record on average 85 neurons per bank during the first 10 days, and then only 45 after 50 days. It is important to note that our quantification averages across all banks recorded, including those in a ventricle or partly outside of the brain. Thus, our results represent a lower estimate of the total neurons recorded. Our new Supplementary Figure 4 helps to highlight the diversity of neuron number recorded per animal. Further improvements in surgical techniques and spike sorting will likely improve stability further and we have now added this comment in the manuscript (Stability, Para 2). For example, we observed excellent stability in a mouse where the craniotomy was stabilized with KwikSil (Supplementary Figure 5).

      The RMS value was around 20 µV in some of the recordings, and according to Figure 4G it is around 16 µV on average. Is it safe to accept putative single units with 20 µV amplitudes, when the baseline noise level is this close to the spike peak-to-peak amplitude?

      On average, less than 1% of the units selected using all the other metrics except the amplitude had an amplitude below 30 µV, and 2.6% below 50 µV. Increasing the threshold to 30 µV, or even 50 µV, did not affect the results. We have now added this comment in the Methods (Data processing, Para 3).

      Can you add the waveform and ISIH of the example unit from day 106 to Figure 5?

      We have now added 4 units tracked up to day 106 in Figure 5.  

      Could you move Supplementary Figure 3A to Figure 4? The number of units is more valuable information than the RMS noise level. I understand that you don't have such a nice coverage of all the days as in Figure 3 and 4, but you might be able to group for the first 3 days and the last 3 days (and if data is available, the middle 3 days) as a boxplot. The goal would be for the reader to be able to see whether there is any change in the number of single units over time.

      We agree with the reviewer that the number of units is more valuable. We had included this information in Figure 4A-F, but we have edited the text to make it clearer that this is what is being shown. The data from Supplementary Figure 3A are already contained within Figure 4, but in Supplementary Figure 3A the data are separated by individual labs.

      Product numbers are missing in multiple places: line-285 (screw), line-288 (screw), line-290 (screw), line-309 (manipulator), line-374 (gold pin and silver wire), line-384 (Mill-Max), line-394 (silver wire), and many more. It would be great if you could add all these details, so people can replicate your protocol.

      We thank the reviewer for highlighting this, and we have added details of screw thread-size and length to relevant parts of the manuscript, although any type of screw can be used. Similarly, other components are non-specific (e.g. multiple silver-wire diameters were used across labs), so we have not included specific product numbers for general consumer items (like screws and silver wires) to avoid indicating that a specific part must be purchased.

      While it is great to see lab-specific methods, I am not sure in their current form it helps to understand the protocol better. The information is conveyed in different ways (I assume these were written by different people), in different orders, and in different depths (some mention probe implant location relative to bregma and midline, some don't). There are many different glues, epoxies, cement, wires, and pins. I would recommend rewriting these methods sections under a unified template, so it is easier to follow.

      We thank the reviewer for this suggestion and we have rewritten this section of the methods accordingly. We now use a template structure to simplify the comparisons between labs: the same template is used for each lab in each section (payload module assembly, implantation, and data acquisition).

      Line-307: why is a skull screw optional for grounding? What did you use for ground and reference if not a ground screw?

      We now specify in the manuscript that during head-fixed experiments, the animal’s headplate can be used for grounding and that this, combined with the internal referencing provided by the Neuropixels probe, yielded low-noise recordings (Implantation protocol, Methods).

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer 1

      Comment 1: A gallery of different cell cycle stages should be included to define KDM4A centrosomal localization at G1, S and G2 phases and whether it is localized to duplicating centrosomes.

      Response: We thank the reviewer for this excellent suggestion. We have now included Fig S1H demonstrating the persistence/retention of KDM4A at the centrosome through the cell cycle. The text in the Results section has been updated to reflect this addition.

      Comment 2: The immunoprecipitations in Fig. 1 and Supp. Fig. 1 must include appropriate controls. There is no positive control in Fig. 1E and the negative controls for the tagged pulldowns are not appropriate in that there is no other HA-tagged protein in cells. Antibody controls and the reciprocal immunoprecipitations should also be included in the same figure (with controls).

      Response: To address the first point, we have included Histone H3 as a positive control for the KDM4A antibody in Figures 1E and 1F. As for the second point raised by the reviewer, the empty vector is an HA-tagged empty vector and so the antibody controls are already included in the Figure as the ‘empty vector’. We have now included detailed information in the Figure legend to clarify the same. In addition, as suggested by the reviewer we have moved the reverse IPs to the main Figure 1 (Figures 1G and 1I).

      Comment 3: Fig. 1H: The use of overexpressed GFP-centrin for immunoprecipitations is questionable; centrin overexpression can cause centrosome amplification, so the level of centrin relative to the endogenous level should be demonstrated.

      Response: This concerns the renumbered Fig. 1J. We generated hTERT RPE-1 cell lines stably expressing GFP-centrin for our studies. This is a commonly used cell line in the field, and although transient overexpression of GFP-centrin does cause centrosome defects, stable cells are less likely to have elevated centrosome defects. Importantly, centrosome overamplification in these cells is less of a concern given that we only used them to validate the localization of KDM4A to centrosomes, with centrin as a centrosome marker. Nonetheless, to ensure that there is no aberrant increase in centrosome defects in these cells, we have included IF images of our cells (green channel in low-mag and high-mag images below) and are happy to report that we did not observe a significantly elevated incidence of centrosome amplification in these stable cell lines.

      Comment 4: The precise localization of KDM4A should be determined more clearly with respect to known centrosomal structures/regions. One would speculate a PCM localization from the data presented here, but the use of centrobin as a marker does not allow the mother centriole's location to be determined with great clarity. It is unclear why the authors chose centrobin as a marker; further explanation of this might be helpful to the reader. Centrobin is usually cited as a daughter centriole marker (PMID: 16275750, but see 29440264). Supp. Fig. 3J appears to show 2 centrioles labelled with centrobin but the paper does not specify whether centrobin is chosen as a daughter marker or otherwise.

      Response: We thank the reviewer for this astute observation. Our initial rationale for choosing centrobin was simply to use a centrosome marker that worked robustly and reliably with minimal background staining, which is essential for single-molecule super-resolution imaging. Our aim was to define a region of the cell in which the nano-scale localization of KDM4A could be shown. The 2D images shown in Fig. 2 are static, which understandably makes it hard to visualize the 3D distribution of KDM4A, which is not exclusive to centrioles (centrobin, although predominantly a daughter centriole marker, shows a weaker signal at the mother centriole as well). We have now extensively reworked Figure 2 and added a video to the Supplemental Information. We have also included new nano-scale imaging of KDM4A with γ-tubulin (a more traditional centrosome marker), which shows a similar distribution of KDM4A across the centrosome, together with distribution measurements along the x- and y-axes showing both KDM4A and centrobin/γ-tubulin. We have modified the text to refer to centrobin as a centrosome marker (as the reviewer rightly noted, centrobin can localize to both centrioles, although predominantly to the daughter centriole).

      Comment 5: Related to this localization issue, Fig. 2D is unclear to this reviewer. What is this normalized to- a marker or just a set of coordinates? This is an unusual means of representing a localization that does not help the reader understand the (sub-) centrosomal location of KDM4A. The analysis in Supp. Fig. 4 is of somewhat limited value and might be omitted.

      Response: We apologize for the confusion with the Figure and have simplified the graphs to indicate the single-molecule distribution plotted along the x- and y-axes, showing both KDM4A and the centrosome markers, i.e. centrobin and γ-tubulin.

      Comment 6: Fig. 3 shows the amplification of gamma-tubulin signals, but there is no control for cell cycle stage. The Kdm4a knockout cells appear to be twice the size of the controls, suggesting a G2 phase arrest, which can potentiate centrosome overduplication, or cytokinetic failure in a previous cell cycle (this may also be the case in Figs. 6C and 7B). Therefore, these cells should be phenotyped more robustly with respect to their proliferative characteristics and cell cycle phase distribution. Cell cycle phenotypes should also be checked in the rescue experiments.

      Response: We thank the reviewer for the comments above. The cells shown in Fig. 3 are interphase cells evaluated for centrosome numbers in Kdm4a-deficient cells, independent of mitosis. We apologize for the lack of clarity and the confusion generated by our erroneous statement at the beginning of the paragraph “we next investigated a functional role for KDM4A at mitotic centrosomes”. In fact, we started by first evaluating interphase cells to interrogate consequences of losing Kdm4a, followed by evaluations of the mitotic phenotypes once we observed increased centrosome numbers. This error has now been corrected in the Results.

      As for the reviewer’s comment on phenotyping the cells further, we have now performed these evaluations and have included them in Figure 3 (as new panels Figures 3D, 3E, 3F). Our MTT proliferation assays showed the Kdm4a-null cells proliferated slower than control non-targeted MEFs, although this did not result in any significant issues with cell cycle progression with both cell lines progressing without any arrests and importantly without accumulating increased DNA content/aneuploidy. The rescue cell lines were also phenotyped (new Figures 7C, 7D and 7E) and similarly did not show any altered cell cycle progression.

      Comment 7: Related to the previous point, in the DAPI staining in Figure 5A, 'pseudo-bipolar' cells #1 and #3 (from the top) seem to have greatly increased levels of DNA, suggesting failed cytokinesis as a mechanism of centrosome abnormality. This is a very different process to a centrosome overduplication within a single cell cycle; given that these are knockouts, it is not clear what conclusions should be drawn from the current analysis.

      Response: The reviewer makes an excellent point about the increased centrosome numbers arising from failure to complete cytokinesis. We have performed further phenotyping of the Kdm4a-null cells, included as new Figures 3D, 3E and 3F. Although the Kdm4a-deficient cells grew slower than their Kdm4a-proficient counterparts, there were no significant issues with cell cycle progression and, importantly, no evidence of increased aneuploidy. We have also now performed further analysis using centrin as a centriole marker to quantify centrosome numbers (new Figures 4C, 4D and 4E) and have found that there is a significant increase in disjointed centrioles (Figure 4E), suggesting that in addition to any potential amplification there also appears to be an increased loss of cohesion in cells deficient for KDM4A. We have also further confirmed the presence of single/disjointed centrioles using TEM analysis (new Figure 4F).

      Comment 8: The JIB-04 result may suggest that KDM4A inhibition causes fragmentation of spindle poles, given that it is a relatively short treatment that would probably not be long enough for centrosome overduplication. Whether this arises during M phase, distinct from the over duplication phenotype seen where there are >4 centrioles, should be posed as a separate question- these may be distinct outcomes from KDM4A inhibition at different cell cycle times.

      Response: We completely agree with the reviewer that the JIB-04 treatment is relatively short and does in fact suggest that this is independent of any over duplication phenotype observed in the Kdm4a-CRISPR knockouts. We thank the reviewer for the suggestion of posing two separate questions to address this point and have made the changes in the manuscript (see Results). In addition, our new data discussed in Comment 7 above corroborate this hypothesis.

      Comment 9: It is unclear why the authors call the cell shown in Fig. 4B 'pseudo-bipolar'- there are clearly four poles here (as in the multipolar example shown in Fig 5A). This makes the data in Fig. 5 difficult to interpret. The authors should review their classification.

      Response: We thank the reviewer for catching this error. We apologize for the misrepresentation of the representative image and have now included the correct image that shows pseudo-bipolar spindles (new Figure 5D) replacing the multipolar spindle. In addition, we have reviewed our data and the quantitation remains unchanged.

      Comment 10: Expression of the vector control in the Kdm4a nulls in Fig. 7A appears to show a decline in the H3K36me3 levels, confusing the outcome of this experiment. Quantitation should be provided for these blots.

      Response: We have now included the requested quantitation (new Figure 7B) for Figure 7A.

      Comment 11: A rescue experiment should be included for the siRNA knockdown of KDM4A.

      Response: A rescue experiment with the siRNA experiments is challenging as we use a pooled siRNA (4 siRNAs) targeting KDM4A. Rescue with a KDM4A construct would result in the knockdown of the exogenously expressed KDM4A as well. The rescue experiments have been therefore performed with the CRISPR knockout cell lines.

      Comment 12: Size markers should be shown in all immunoblots.

      Response: We have now included size markers as requested by the reviewer for all Figures showing immunoblots (Figures 1, 5, 7 and Supplementary Figures 1, 5).

      Comment 13: p.6, 11 'the resulting payment' and 'caustic chromosome environment' are strange usages and should be rephrased.

      Response: The text has been rephrased.

      Comment 14: Are all panels shown at the same magnification in Fig. 1B? (The telophase DAPI appears different to the anaphase)

      Response: We have confirmed that the magnification is consistent across the entire panel of images in Figure 1.

      Comment 15: Blow-up panels should be shown so that the centrosomes can be visualised more clearly (Fig. 1 and Supp. Fig. 1).

      Response: We have now included blow-up panels for all centrosome images in Figure 1 and Supplemental Figure 1.

      Comment 16: The MT labelling in Fig. 1D is not of good quality; this imaging should be improved.

      Response: We believe that microtubule densities are impacted by modulating KDM4A in cells, likely arising from alternate mechanisms that we are currently investigating. However, to the reviewer’s point, we have placed the transient overexpression images in Supplementary information (Supplemental Figure 1I) and have replaced them with new Figure 1D, using our stable clones expressing RFP-vector or RFP-KDM4A.

      Reviewer 2

      Comment 1: Coimmunoprecipitation and GFP-trap analyses demonstrated interactions between KDM4A and centrobin, CP110, and centrin-2 (Fig. 1). While the authors suggest a functional association with the centrosome, it is noteworthy that no known centriole protein has been identified to interact simultaneously with centrobin, CP110, and centrin-2, located in distinct sub-centriolar regions. Additionally, 3D super-resolution microscopy indicates that KDM4A is not restrained to a particular region of the centrosome, surely not at the centriole (Fig. 4D). These results hint that centrobin, CP110 and centrin-2 may be potential substrates of KDM4A. Therefore, it is worth conducting immunostaining and coimmunoprecipitation analyses with the JIB-04-treated cells.

      Response: The reviewer makes an excellent point. The co-immunoprecipitation studies were not conducted to show a direct interaction between the centrosome proteins and KDM4A, but more as a proof-of-principle that KDM4A is interacting with centrosome proteins (we do not know if this is direct or indirect, although the data would likely suggest an indirect mechanism). Given that we had used centrobin, centrin and CP110 in our immunofluorescence analysis, we also used them for our co-IP studies to provide further evidence of a centrosome localization for KDM4A. It is intriguing that any one of these proteins could in fact be a substrate for KDM4A, although an in-depth study would be required to prove this, since the super-resolution localization would suggest that KDM4A is not at the centrioles per se and is in fact more of a pericentriolar protein. We have clarified this point in the Discussion. Although the experiments suggested with the JIB treatment would be intriguing, identifying a bona fide centrosome substrate for KDM4A’s demethylase activity is not trivial and would require identification of methylation on a substrate, followed by determining whether KDM4A can demethylate the target. Methylation on non-chromatin substrates such as centrosome proteins is not currently well characterized.

      Comment 2: The generation of supernumerary centrioles in Kdm4a KO MEFs is intriguing yet warrants careful description (Fig. 3). First, supernumerary centrioles should be coimmunostained with multiple centriole markers, such as centrin-2, CP110 and centrobin antibodies at synchronized populations such as G1, S and M phases. Second, the number of centrioles per cells may be counted and statistically analyzed.

      Response: We thank the reviewer for making this suggestion. We have now included new Figures 4C, 4D, 4E and 4F where we show immunofluorescence with Centrin 2 in Kdm4a-deficient cells. Having found an increased incidence of unpaired centrioles in cells deficient for Kdm4a we have further performed TEM to show the presence of these unpaired/disjoint centrioles.

      Comment 3: The high proportion of pseudo-bipolar cells in the NT group requires attention (Fig. 5).

      Response: We thank the reviewer for this astute observation. To obtain enough mitotic cells for analysis we synchronized the MEFs, which appeared to increase the baseline of pseudo-bipolar spindles reflected in the quants. Despite this increase the differential between the controls and Kdm4a-null cells is significant, as indicated, and we have now made this evident in the text for clarity.

      Comment 4: The KO-rescue cells should be valuable tools to confirm specific roles of KDM4A at the centrosome (Fig. 7). The authors may generate stable cell lines in which wild type and H188A mutant KDM4A are expressed in the KO cells, and use them for centrosome localization of the ectopic proteins, spindle formation and supernumerary centriole generation.

      Response: The reviewer makes an excellent point and in fact we generated the stables (Figure 7) with this idea in mind. Unfortunately (but not completely surprising as this is frequently observed in comparable settings) we observed decreased mitotic abnormalities and genomic instability in the Kdm4a-null cells over time in culture. This is likely arising from a compensatory mechanism/redundancy that perhaps kicks in to enable survival of these cells. The process of generating the stables was therefore tricky with us only being able to reliably analyze genomic stability as a downstream readout of mitotic abnormalities that might have occurred in these cells (early passages analyzed for genomic stability).

      Reviewer 3

      Comment 1: Figure 1D: the RFP vector alone localizes to the centrosome. How was the signal across the cells? Can the authors provide a fluorescence intensity measurement comparing the negative control RFP and RFP-KDM4A to demonstrate the localization at centrosomes of the enzyme? While I found the endogenous staining convincing, the fusion protein is less.

      Response: The MEFs were transiently transfected with the RFP-vector/KDM4A for the images shown. In our experience it is not uncommon for the RFP/mCherry/GFP tags to be prominent at the spindle and often tagged vector controls are omitted from many prominent publications. However, in our case there is a significant increase in RFP-KDM4A signal observed at the spindle poles and we have now included the quantification of signal from the two poles in Supplemental Figure 1J where the signal is 3 times higher in the RFP-KDM4A expressing cells compared to vector. We have also included new Figure 1D demonstrating the RFP-KDM4A localization to spindle poles in our stable cell lines where the signal for the control RFP-vector is negligible. The transient transfection data has been moved to Supplemental Figure 1 (1I).

      Comment 2: Figure 1E-F: How specific do the authors think the interactions with CP110 and centrobin are? Do they IP the entire centrosome proteome or do they think that they reveal some specific interactions within the centrosome? Can the authors comment on this? What is the significance of these interactions? Do the authors think that KDM4A is a centriolar component? Or a PCM component? This is only briefly mentioned in the discussion, it should be extended. Did they try to IP PCM components as well?

      Response: The reviewer brings up an excellent point. The purpose of the immunoprecipitation was to demonstrate the ability of KDM4A to pull down centrosome associated proteins and vice versa. We are unable to comment on the interactions being direct or indirect, although we suspect that most of the interactions are likely indirect, given that KDM4A is not specifically localized to the centrioles. As per the reviewer’s suggestion, we have now expanded the Discussion to speculate on the potential significance of these interactions and how they might enable identification of novel KDM4A interactors and potential substrates.

      Comment 3: Fig.S3: the signal of KDM4A seems broader than that of centrobin, with an average diameter of 749 nm. What is the diameter of centrobin for comparison using this method? The interpretation of the authors concerning this localization is not clear to me: "The quantification data of the diameter of the KDM4A distribution, independently in the different axes (x, y, z), revealed a relatively uniform/circular distribution (Fig. 2D) suggesting that KDM4A was not restrained to a particular region of the centrosome". Is KDM4A at centriole or at centrosomes? PCM or centriole component? From the interpretation stated above, it seems that KDM4A is everywhere from the proximal to the distal axis of the centriole, is it correct? But isn't more PCM?

      Response: We would like to apologize for the lack of clarity with respect to the centrobin measurements compared to those of KDM4A. We have attempted to clarify the distribution measurements by showing the distributions for both the centrobin and KDM4A signals. In addition, we have now included new data with g-tubulin to show co-localization of KDM4A signal with g-tubulin and to also demonstrate that the signal for KDM4A is not centriole specific but is essentially more uniformly distributed throughout the centrosome. We have also included a video (Video 1) as Supplemental data to clarify this point.

      Comment 4: Fig.4B: The authors established that there is an increase in centrosome number upon short inactivation of KDM4A by JIB-04, which affects its enzymatic activity and not the scaffolding function. In addition, the loss of KDM4A phenocopies the effect of the drug: this means that the enzymatic activity is required to control the centrosome number. This is also re-enforced by the rescue with WT enzyme and not the enzymatically dead mutant of KDM4A (looking at micronuclei formation-Fig.7). Could the author speculate on this? The fast action of the inhibitor would exclude a block in S phase as stated in the discussion. The authors mention centrosome fragmentation but there is no evidence that this is happening here. The authors mentioned several possible mechanisms in the discussion without really exploring them. The authors also mention here that the chronic loss of KDM4A could arise through a distinct mechanism than that of the inhibitor, this statement was surprising. Could the authors check if they have a cell cycle delay or block in their KO cells? While it seems that the authors would like to address these points in the future, I think that the mechanistic aspect is lacking in this study or at least some hints of it.

      Response: We agree with all the points brought up by the reviewer. We have elaborated the discussion as recommended; however, the challenge with a demethylase is identifying a potential methyltransferase that can lay a methyl mark on a potential substrate, followed by establishing KDM4A as an eraser for the same substrate. To address the comment about a cell cycle delay, as also brought up by Reviewer 1 (Comment 6), we have performed additional phenotyping of the cells and these data are now included in Figure 3 (as new panels Figures 3D, 3E, 3F) and new Figures 7C, 7D and 7E (for the rescue cell lines), which did not show any altered cell cycle progression.

      Comment 5: In general, the figures are organized in an unconventional manner with the panels from one figure distributed on several pages. Could the authors group the panels of each figure in one page to ease the understanding and the reading?

      Response: Although we do understand that having multiple panels on several pages makes the figures difficult to read, the immunofluorescence images would be extremely difficult to observe clearly if the panels were grouped on one page. Also, this comment will be resolved once the manuscript is accepted for publication, as we will re-format per journal guidelines.

      Comment 6: Figure S1F-G: the authors provide a large field of view showing a dozen of nuclei. While I acknowledge that this is to show the overall staining, it is difficult to really see the foci of KDM4A or g-Tubulin or centrin. The quality of the images looks really pixelated; this might be due to the PDF compression, but I cannot see any red signal on the panels. Could the authors enhance the B/C of the images so that one can see the signal corresponding to the centrosomes? Is it also possible to have a zoom on the centrosome itself with split channels to illustrate the co-localization? As it is, it is not clearly shown. In the panel G, there are many foci of KDM4A in the nucleus and 2 associated with a centrin staining, which correspond to the centrosomes. However, the signals do not seem to fully colocalize. What do the authors think about this?

      Response: We have provided larger zoomed in view of the cells in Figure S1 as requested.

      Comment 7: Figure 1A: same comment as above concerning the quality of the image. I am also concerned by the g-Tubulin staining as it looks not on focus and I do not see any foci that would correspond to the centrosome position, while the merge image clearly shows yellow signal, proof of co-localisation. Could the author correct this? In the inset, can the authors zoom on the centrosomes and display the split channels so that one can appreciate the co-localization of the 2 signals? The quality and display of Fig.1B is much better. Could we have the same rendering for the interphase cells of 1A?

      Response: The picture in Figure 1A is a raw image. This image has not undergone the same post-image deconvolution applied to the other images in the manuscript. The deconvolved images reduce the KDM4A signal in the nucleus and only demonstrate the highly intense signal at the centrosomes especially in mitotic cells. If we show the deconvolved image here it would lead to the erroneous perception that there is no KDM4A signal in the nucleus and the rest of the cell. To clarify this point we have modified the figure legends to state that this is a raw image. In addition, we have also provided blow-ups of the centrosomes specifically.

      Comment 8: Fig.3D: the nucleus of the cell is really affected with many blobs or micronuclei. Is this cell dying? The authors count the number of g-tubulin foci in interphase (Fig. 3C). Could they do it in mitosis and use centrin? In mitosis, there should be 4

      Response: The cell in question is not dying and is micronucleated. The question of genomic instability is addressed later in the manuscript and hence the point was not made in this figure. We thank the reviewer for suggesting use of centrin. We have now included these data as new Figures 4C, 4D and 4E.

    1. Authors’ Response (31 December 2024)

      GENERAL ASSESSMENT

      In this article, Kay refutes a major claim made by Watson et al., 2023. In the original publication, Watson et al. argue that macromolecular condensation acts as a cellular buffering mechanism to compensate for the effects of osmotic shock. In particular, they claim that, when water is drawn into or out of the cell due to hypo- or hyper-osmotic shock, respectively, macromolecular condensates rapidly capture (during hypo-osmotic shock) or release (during hyper-osmotic shock) free water to maintain a constant water potential (presumably in addition to a constant solute concentration and osmolality) within the cell. While Watson et al. find that macromolecular condensation in cells is responsive to osmotic shock, they do not measure intracellular water potential, osmolality, or macromolecular density in intact cells, and therefore do not directly demonstrate that biocondensation buffers any of these properties in living cells. In response, Kay argues that, while such a water buffer could temporarily maintain an osmolality differential across the membrane, this osmolality differential will necessarily drive water across the membrane until the osmolality within the cell equals the osmolality outside of the cell. Therefore, the steady-state behaviour is expected to be identical with and without the water buffer. Using the well-established pump-leak model for osmotic water transport, Kay further shows that the timescale at which a water buffer can maintain even a 10% osmolality differential across the membrane is at most a minute for a typical animal cell.

      Overall, Kay 2024 provides a compelling rebuttal to a strong claim made by Watson et al. However, there is an opportunity for Kay to acknowledge nuanced situations where such a water buffering mechanism as that posited by Watson may be useful to cells. It’s also unclear if Kay has described a major inconsistency with Watson et al., particularly since the water release rate from condensates is not well quantified.

      I thank the reviewers and curating editor for taking the time to work through our paper and providing guidance for improving it.

      In what follows we have responded to all the points raised during the review in blue and have attached a revised version of the manuscript with all changes marked up. I should note that I have included my collaborator Zahra Aminzare as an author on the paper, since she did significant work in overcoming some difficulties with the simulations.

      RECOMMENDATIONS

      Essential revisions:

      1. The author could acknowledge nuanced situations in which the water buffering mechanism described by Watson et al. may be useful to cells. For example, by slowing the rate of change of intracellular osmolarity due to osmotic shock and thus giving the cell time for more active feedback mechanisms to engage, or in buffering rapid fluctuations in extracellular osmolality

      The only situation that we could think of in which the change in osmolarity is transient and fast is that of blood cells moving through the vasa recta in the kidney (L 143), where the interstitial osmolarity gets as high as 1,200 mOsm. The cells are exposed to this change for a few seconds, approximately every 5 min. However, no one seems to have measured how much the plasma changes as blood transits through the vasa recta. We have elaborated a bit more on this in the revised version of the manuscript.

      1. The flux of water across lipid membranes depends on the pressure difference across the membrane. The author simulated the situation with a 30 mOsm (~75 kPa) osmotic pressure difference. Considering that physiologically relevant pressure fluctuations can be much lower (a few kPa), is it possible that a water buffer would be more effective when there are small pressure differences across the cell membrane? The author should discuss this.

      We have repeated the simulations using a smaller osmotic gradient (15 mOsm). With this gradient the water buffer can buy a little more time, since the rate of water influx is slower. This has been included in the revised paper (L 76-78).
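
      As a rough check on the numbers in this exchange, the van 't Hoff relation (osmotic pressure = R·T·Δc) can be evaluated for the two gradients. This is only a sketch under stated assumptions: ideal dilute-solution behaviour, a temperature of 25 °C (the source does not state one), and a hypothetical helper name `osmotic_pressure_kpa`. It is not the simulation used in the paper.

```python
# Illustrative only: van 't Hoff estimate of osmotic pressure, pi = R * T * delta_c.
# Assumes ideal dilute-solution behaviour; not the pump-leak simulation from the paper.
R = 8.314  # molar gas constant, J/(mol*K)


def osmotic_pressure_kpa(delta_mosm, temp_k=298.15):
    """Osmotic pressure (kPa) for an osmolarity difference given in mOsm.

    1 mOsm = 1 mmol/L = 1 mol/m^3, so R * T * delta_c is already in Pa.
    temp_k defaults to 25 C, which is an assumption of this sketch.
    """
    return R * temp_k * delta_mosm / 1000.0


p30 = osmotic_pressure_kpa(30.0)  # gradient discussed by the reviewers
p15 = osmotic_pressure_kpa(15.0)  # smaller gradient used in the repeated simulations

print(f"30 mOsm: {p30:.1f} kPa")
print(f"15 mOsm: {p15:.1f} kPa")
print(f"ratio:   {p15 / p30:.2f}")
```

      Under these assumptions the 30 mOsm case comes out in the same range as the ~75 kPa quoted by the reviewers, and halving the gradient halves the driving pressure, which is consistent with the slower water influx described above.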

      1. The author should cite the work from which they obtained the water buffer release rate.

      To the best of our knowledge there is no information on this point. When I first presented Watson et al. with my counter argument, I assumed that the release of water was instantaneous. They countered that it is likely to be more gradual, therefore I simulated a slower release process. In the revised version of the paper, we have now included both slow and quick release to show that it makes little difference. We have now noted that for the WB to counteract the change in extracellular osmolarity, the rate of water release must match very closely that of the water influx or efflux, which seems implausible (see L 81-86).

      1. It would be helpful to measure the dynamics of intracellular volume concurrently with biocondensate formation under cells exposed to osmotic shock (ideally under experimental conditions where cells either do or do not form condensates). If Watson et al.’s hypothesis is correct, the volume should not change (this seems unlikely). If your hypothesis is correct that buffering could only ever be temporary, one could then experimentally determine the buffering timescale by measuring the stall time between the shock and when the volume begins to change. The stall should also disappear in conditions where condensate formation is inhibited.

      We have added this suggestion to the revised paper (L 152-154).

      1. There are two inaccuracies in the discussion of membrane tension caused by osmotic pressure that would benefit from being corrected. First, when using Laplace’s Law to calculate membrane tension induced by 30 mOsm pressure, the author used a cell radius of 10 um and calculated a large (180 mN/m) membrane tension. This is significantly overestimated because the cell membrane can form local deformations via attachment to the cytoskeleton. These local deformations are typically around 10 - 100 nm, thus reducing the calculated membrane tension by 2-3 orders of magnitude, below the lysis tension of the membrane (1 - 10 mN/m). Second, the author is correct that measured resting membrane tension is low (< 0.3 mN/m). However, recent evidence suggests that tension on the cell membrane can locally or transiently reach much higher levels (to > 1mN/m). This is supported by activation of mechanosensitive ion channels such as Piezo1, which require an activation membrane tension ~ 1mN/m.

      We have now included a discussion of the first point (see L 127-130). On the second point, if the reviewers could please provide us with a reference to this we would be grateful and will include it in the revised version of the paper.
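
      The scaling argument in the first point can be illustrated with Laplace's law for a thin spherical shell, T = ΔP·r/2. The sketch below is an assumption-laden illustration, not the author's calculation: the factor of 1/2, the 75 kPa pressure (taken from the reviewers' earlier estimate) and the function name `laplace_tension_mn_per_m` are choices made here, so the absolute values need not match the 180 mN/m quoted above. Only the linear dependence on the radius of curvature is the point.

```python
# Illustrative only: Laplace's law for a thin spherical shell, T = dP * r / 2.
# The prefactor and the pressure value are assumptions of this sketch.


def laplace_tension_mn_per_m(delta_p_pa, radius_m):
    """Membrane tension in mN/m for a pressure difference (Pa) and curvature radius (m)."""
    return delta_p_pa * radius_m / 2.0 * 1000.0


DELTA_P = 75e3  # Pa, roughly the pressure quoted for a 30 mOsm difference

whole_cell = laplace_tension_mn_per_m(DELTA_P, 10e-6)  # 10 um cell radius
local_100nm = laplace_tension_mn_per_m(DELTA_P, 100e-9)  # 100 nm local deformation
local_10nm = laplace_tension_mn_per_m(DELTA_P, 10e-9)  # 10 nm local deformation

print(f"whole cell (10 um): {whole_cell:.3g} mN/m")
print(f"local (100 nm):     {local_100nm:.3g} mN/m, reduced {whole_cell / local_100nm:.0f}-fold")
print(f"local (10 nm):      {local_10nm:.3g} mN/m, reduced {whole_cell / local_10nm:.0f}-fold")
```

      Because tension is linear in the radius, moving from a 10 µm cell radius to 10–100 nm local deformations lowers the estimate by a factor of 100 to 1,000, i.e. the 2-3 orders of magnitude mentioned by the reviewers.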

      1. The discussion on non-equilibrium states is not very clear. Is the author suggesting that a water buffer can work more efficiently in an equilibrium system such as a giant vesicle?

      The reviewers raise an interesting point. In a closed system like a test tube a WB could act without opposition, but as soon as one introduces a membrane that is permeable to water it will be overwhelmed, no matter whether the system is an energetically driven one like the PLM or a passive one like a giant vesicle. There is no difference between a passive and an active system. To show this we set up a cell without a sodium pump and used a high concentration of impermeant extracellular molecules to stabilize the volume (see L 102-107, and Fig. 1).

      1. Because the pump-leak model is generic, some contextual discussion of condensates would be helpful. For example, the dynamic formation of hydrogen bonds, van der Waals interactions, and possible charges resulting from hyperosmotic or low osmotic conditions that may indirectly participate in the hypothesis.

      Our argument does not depend on the mechanism of water release or binding. There may indeed be many ways of changing the water bound to a macromolecule, but we will leave it to proponents of the WB hypothesis to explore this.

      1. Has the author considered whether the thermodynamic driving forces associated with phase separation and condensate formation might affect the ability of condensates to buffer intracellular osmolality?

      This is beyond the scope of our paper, so again we will leave it to others to address this. Rather than discuss all the different scenarios we have chosen to quote the papers of (Guttman et al., 1995; Parsegian et al., 2000), which Watson et al. do not refer to.

      Optional suggestions:

      1. The language of the article focuses on the role of membrane permeability, which is of course key. However, it might be helpful to explicitly state that an osmolality differential will always drive water across the membrane, so even if a water buffer could temporarily maintain such an osmolality differential, water will continue to flow across the membrane until the buffer is saturated and this differential is equalized.

      This is indeed what will happen. We have tried to clarify this in the revised version (see L 208-210).
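
      The point about the steady state can be illustrated with a deliberately simple toy model. This is not the pump-leak model used in the paper: it assumes a fixed amount of impermeant intracellular solute, a water flux across the membrane proportional to the osmolality difference, and a hypothetical buffer that releases bound water whenever the intracellular osmolality exceeds a set point. All parameter values and names (`final_osmolality`, `k_flux`, `k_release`, `c_set`) are arbitrary choices for illustration.

```python
# Toy model, illustrative only (NOT the pump-leak model from the paper).
# A cell holds a fixed amount of impermeant solute; water crosses the membrane
# in proportion to the osmolality difference; a hypothetical water buffer
# releases bound water while the cell is above its osmolality set point.


def final_osmolality(buffer_water, c_out=330.0, c_set=300.0, n_solute=300.0,
                     k_flux=0.05, k_release=0.5, dt=0.01, t_end=200.0):
    """Forward-Euler integration after a hyperosmotic step to c_out.

    Returns the intracellular osmolality (arbitrary mOsm-like units) at t_end.
    buffer_water is the water initially bound in the buffer (arbitrary volume units).
    """
    free_water = n_solute / c_set  # start at the set point
    bound = buffer_water
    for _ in range(int(t_end / dt)):
        c_in = n_solute / free_water
        # Positive flux means water entering the cell.
        membrane_flux = k_flux * (c_in - c_out)
        # Buffer releases water only while the cell is above its set point.
        excess = max(c_in - c_set, 0.0) / c_set
        release = k_release * bound * excess
        free_water += dt * (membrane_flux + release)
        bound -= dt * release
    return n_solute / free_water


without_buffer = final_osmolality(buffer_water=0.0)
with_buffer = final_osmolality(buffer_water=0.2)

print(f"final osmolality without buffer: {without_buffer:.2f}")
print(f"final osmolality with buffer:    {with_buffer:.2f}")
```

      In this sketch both runs settle at the extracellular osmolality: the buffer changes how much water passes through the cell on the way, but not where the system ends up, because the membrane flux only vanishes when the inside and outside osmolalities are equal.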

      (This is a response to peer review conducted by Biophysics Colab on version 1 of this preprint.)

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript by Oleh et al. uses in vitro electrophysiology and compartmental modeling (via NEURON) to investigate the expression and function of HCN channels in mouse L2/3 pyramidal neurons. The authors conclude that L2/3 neurons have developmentally regulated HCN channels, the activation of which can be observed when subjected to large hyperpolarizations. They further conclude via blockade experiments that HCN channels in L2/3 neurons influence cellular excitability and pathway-specific EPSP kinetics, which can be neuromodulated. While the authors perform a wide range of slice physiology experiments, concrete evidence that L2/3 cells express functionally relevant HCN channels is limited. There are serious experimental design caveats and confounds that make drawing strong conclusions from the data difficult. Furthermore, the significance of the findings is generally unclear, given modest effect sizes and a lack of any functional relevance, either directly via in vivo experiments or indirectly via strong HCN-mediated changes in known operations/computations/functions of L2/3 neurons.

      Specific points:

      (1) The interpretability and impact of this manuscript are limited due to numerous methodological issues in experimental design, data collection, and analysis. The authors have not followed best practices in the field, and as such, much of the data is ambiguous and/or weak and does not support their interpretations (detailed below). Additionally, the authors fail to appropriately explain their rationale for many of their choices, making it difficult to understand why they did what they did. Furthermore, many important references appear to be missing, both in terms of contextualizing the work and in terms of approach/method. For example, the authors do not cite Kalmbach et al 2018, which performed a directly comparable set of experiments on HCN channels in L2/3 neurons of both humans and mice. This is an unacceptable omission. Additionally, the authors fail to cite prior literature regarding the specificity or lack thereof of Cs+ in blocking HCN. In describing a result, the authors state "In line with previous reports, we found that L2/3 PCs exhibited an unremarkable amount of sag at 'typical' current commands" but they then fail to cite the previous reports.

      We thank the reviewer for the thorough examination of our manuscript; however, we disagree with many of the raised concerns for several reasons, as detailed here:

      To address the lack of certain citations, we would like to emphasize that in the introduction section we initially focused on the several decades-long line of investigation into the HCN channel content of layer 2/3 pyramidal cells (L2/3 PCs), where there has undoubtedly been some controversy as to their functional contribution. We did not explicitly cite papers that reported little or no HCN channel expression or sag, although this would be a significant list of publications from some excellent investigators, because the methods used may have differed from ours, leading to different interpretations. Simply stated, unless one was explicitly looking for HCN in L2/3 PCs, it might go unobserved. However, we have now addressed this more clearly in the revision:

      Just to take one example: in the publication mentioned by the reviewer (Kalmbach et al 2018), the investigators did not carry out voltage clamp or dynamic clamp recordings, as we did in our work here. Furthermore, the reported input resistance values in the aforementioned paper were far above other reports in mice (Routh et al. 2022, Brandalise et al 2022, Hedrick et al 2012; which were similar to our findings here), suggesting that recordings in Kalmbach were carried out at membrane potentials where HCN activation may be less available (Routh, Brager and Johnston 2022).

      Another reason for some mixed findings in the field is undoubtedly the small/nonexistent sag in L2/3 current clamp recordings (in mice). We also observed a very small sag, which can be explained by the following: the ‘sag’ potential is a biphasic voltage response emerging from a relatively fast passive membrane response and a slower Ih activation. In L2/3 PCs, hyperpolarization-activated currents are apparently faster than previously described, and are located proximally (Figure 2 & Figure 5). Therefore, their recruitment in mouse L2/3 PCs is on a similar timescale to the passive membrane response, resulting in a more monophasic response. We now include a fuller set of citations in the updated introduction section, to highlight the importance of HCN channels in L2/3 PCs in mice (and other species).

      The justification for using cesium (i.e., ‘best practices’) is detailed below.

      (2) A critical experimental concern in the manuscript is the reliance on cesium, a nonspecific blocker, to evaluate HCN channel function. Cesium blocks HCN channels but also acts at potassium channels (and possibly other channels as well). The authors do not acknowledge this or attempt to justify their use of Cs+ and do not cite prior work on this subject. They do not show control experiments demonstrating that the application of Cs+ in their preparation only affects Ih. Additionally, the authors write 1 mM cesium in the text but appear to use 2 mM in the figures. In later experiments, the authors switch to ZD7288, a more commonly used and generally accepted more specific blocker of HCN channels. However, they use a very high concentration, which is also known to produce off-target effects (see Chevaleyre and Castillo, 2002). To make robust conclusions, the authors should have used both blockers (at accepted/conservative concentrations) for all (or at least most) experiments. Using one blocker for some experiments and then another for different experiments is fraught with potential confounds.

      To address the concerns regarding the usage of cesium to block HCN channels, we would like to state that neither cesium nor ZD-7288 is without off-target effects; however, in our case the potential off-target effects of external cesium were deemed less impactful, especially concerning AP firing output experiments. Extracellular cesium has been widely accepted as a blocker of HCN channels (Lau et al. 2010, Wickenden et al. 2009, Rateau and Ropert 2005, Hemond et al. 2009, Yang et al. 2015, Matt et al. 2010). However, it is well known to act on potassium channels as well at higher concentrations, which has been demonstrated with intracellular and extracellular application (Puil et al. 1981, Fleidervish et al. 2008, Williams et al. 1991, 2008).

      Although we initially performed ‘internal’ control experiments to ensure the cesium concentration was unlikely to greatly block voltage gated K+ channels during our recordings, we recognize these were not included in the original manuscript. These are detailed as follows: during our recordings cesium had no significant effect on action potential halfwidth, ruling out substantial blocking of potassium channels, nor did it affect any other aspects of suprathreshold activity (now reported in results, page 4 - line 113). Furthermore, we observed similar effects on passive properties (resting membrane potential, input resistance) following ZD-7288 as with cesium, which we now also updated in our figures (Supplementary Figure 1). We did acknowledge that ZD-7288 is a widely accepted blocker of HCN, and for this reason we carried out some of our experiments using this pharmacological agent instead of cesium.

      On the other hand, ZD-7288 suffers from its own side effects, such as potential effects on sodium channels (Wu et al. 2012) and calcium channels (Sánchez-Alonso et al. 2008, Felix et al. 2003). As our aim was to provide functional evidence for the importance of HCN channels, we initially deemed these potential effects unacceptable in experiments where AP firing output (e.g., in cell-attached experiments) was measured. Nonetheless, in new experiments now included here, we found the effects of ZD and cesium on AP output were similar as shown in new Supplemental Figure 1.

      Many experiments were supported by complementary findings using external cesium and ZD-7288. For example, the effect of ZD-7288 on EPSPs was confirmed by similar synaptic stimulation experiments using cesium. This is important, as synaptic inputs of L2/3 PCs are modulated by both dendritic sodium (Ferrarese et al. 2018) and calcium channels (Landau 2022), therefore the application of ZD-7288 alone may have been difficult to interpret in isolation. We thank the reviewer for bringing up this important point.

      (3) A stronger case could be made that HCN is expressed in the somatic compartment of L2/3 cells if the authors had directly measured HCN-isolated currents with outside-out or nucleated patch recording (with appropriate leak subtraction and pharmacology). Whole-cell voltage-clamp in neurons with axons and/or dendrites does not work. It has been shown to produce erroneous results over and over again in the field due to well-known space clamp problems (see Rall, Spruston, Williams, etc.). The authors could have also included negative controls, such as recordings in neurons that do not express HCN or in HCN-knockout animals. Without these experiments, the authors draw a false equivalency between the effects of cesium and HCN channels, when the outcomes they describe could be driven simply by multiple other cesium-sensitive currents. Distortions are common in these preparations when attempting to study channels (see Williams and Womzy, J Neuro, 2011). In Fig 2h, cesium-sensitive currents look too large and fast to be from HCN currents alone given what the authors have shown in their earlier current clamp data. Furthermore, serious errors in leak subtraction appear to be visible in Supplementary Figure 1c. To claim that these conductances are solely from HCN may be misleading.

      We disagree with the argument that “Whole-cell voltage-clamp in neurons with axons and/or dendrites does not work”. Although this method is not without its confounds (i.e. space clamp), it is still a useful initial measure as demonstrated countless times in the literature. However, the reviewer is correct that the best approach to establish the somatodendritic distribution of ion channels is by direct somatic and dendritic outside-out patches. Due to the small diameter of L2/3 PC dendrites, these experiments haven’t been carried out yet in the literature for any other ion channel either to our knowledge. Mapping this distribution electrophysiologically may be outside the scope of the current manuscript, but it was hard for us to ignore the sheer size of the Cs<sup>+</sup> sensitive hyperpolarizing currents in whole cell. Thus, we will opt to report this data.

      Also, we should point out that space clamp-related errors manifest in the overestimation of frequency-dependent features, such as activation kinetics, and underestimation of steady-state current amplitudes. The activation time constant of our measured currents is somewhat faster than previously reported, which reduces major concerns regarding space clamp errors. Furthermore, we simply do not understand what “too large… to be from HCN currents” means. Our voltage-clamp measured currents are similar to previously reported HCN currents (Meng et al. 2011, Li 2011, Zhao et al. 2019, Yu et al. 2004, Zhang et al. 2008, Spinelli et al. 2018, Craven et al. 2006, Ying et al. 2012, Biel et al. 2009).

      Furthermore, we should point out that our measured currents activated at hyperpolarized voltages, had the same voltage dependence as HCN currents, did not show inactivation, influenced both input resistance and resting membrane potential, and are blocked by low concentration extracellular cesium. Each of these features would point to HCN.

      (4) The authors present current-clamp traces with some sag, a primary indicator of HCN conductance, in Figure 2. However, they do not show example traces with cesium or ZD7288 blockade. Additionally, the normalization of current injected by cellular capacitance and the lack of reporting of input resistance or estimated cellular size makes it difficult to determine how much current is actually needed to observe the sag, which is important for assessing the functional relevance of these channels. The sag ratio in controls also varies significantly without explanation (Figure 6 vs Figure 7). Could this variability be a result of genetically defined subgroups within L2/3? For example, in humans, HCN expression in L2/3 varies from superficial and deep neurons. The authors do not make an effort to investigate this. Regardless of inconsistencies in either current injection or cell type, the sag ratio appears to be rather modest and similar to what has already been reported previously in other papers.

      We thank the reviewer for pointing out that our explanation for the modest sag ratio might not have been sufficient to properly understand why this measurement cannot be applied to layer 2/3 pyramidal cells. Briefly: sag potential emerges from a relatively (compared to I<sub>h</sub>) fast passive membrane response and a slower HCN recruitment. The opposing polarity and different timescales of these two mechanisms result in a biphasic response called “sag” potential. However, if the timescale of these two mechanisms is similar, the voltage response is not predicted to be biphasic. We have shown that hyperpolarization-activated currents in our preparations are fast and proximal; therefore, they are recruited during the passive response (see Figure 2g). This means that although a substantial amount of HCN current is activated during hyperpolarization, its activation will not result in substantial sag. Therefore, sag ratio measurement is not necessarily applicable to approximate the HCN content of mouse L2/3 PCs. We would like to emphasize that sag ratio measurements are correct in the case of other cell types (i.e., L5 and CA1 PCs), and our aim is not to discredit the method, but rather to show that it cannot be applied similarly in the case of mouse L2/3 PCs.
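
      The timescale argument above can be illustrated with a minimal single-compartment sketch. This is not the model used in the manuscript: every parameter value, the voltage-independent activation time constant, and the sag ratio definition (peak-to-steady-state rebound divided by the peak deflection) are assumptions chosen only for illustration. In such a toy model, a slow I<sub>h</sub> should yield a pronounced sag, whereas an I<sub>h</sub> that activates faster than the membrane time constant should yield a much smaller one.

```python
import math
import numpy as np

def simulate_step(tau_h_ms, i_step_pa=-200.0, dt=0.05, t_on=1000.0, t_end=2000.0):
    """Forward-Euler simulation of a leak + Ih-like conductance (toy model)."""
    # Hypothetical values, not taken from the manuscript.
    c_pf, g_leak_ns, e_leak_mv = 100.0, 5.0, -70.0   # membrane tau = 20 ms
    g_h_ns, e_h_mv = 5.0, -30.0
    v_half_mv, slope_mv = -90.0, 8.0

    def m_inf(v):
        # Steady-state activation grows with hyperpolarization.
        return 1.0 / (1.0 + math.exp((v - v_half_mv) / slope_mv))

    t = np.arange(0.0, t_end, dt)                    # ms
    i_inj = np.where(t >= t_on, i_step_pa, 0.0)      # pA
    v = np.empty_like(t)
    v[0] = e_leak_mv
    m = m_inf(v[0])
    for k in range(1, t.size):
        v_prev = v[k - 1]
        i_ion = g_leak_ns * (v_prev - e_leak_mv) + g_h_ns * m * (v_prev - e_h_mv)
        v[k] = v_prev + dt * (i_inj[k - 1] - i_ion) / c_pf   # pA / pF = mV / ms
        m += dt * (m_inf(v_prev) - m) / tau_h_ms
    return t, v

def sag_ratio(t, v, t_on=1000.0, dt=0.05):
    """(steady state - peak) / (rest - peak) for a hyperpolarizing step."""
    v_rest = v[t < t_on][-1]
    v_step = v[t >= t_on]
    v_peak = v_step.min()
    v_ss = v_step[-int(50.0 / dt):].mean()           # mean of the last 50 ms
    return (v_ss - v_peak) / (v_rest - v_peak)

slow = sag_ratio(*simulate_step(tau_h_ms=200.0))     # Ih much slower than membrane
fast = sag_ratio(*simulate_step(tau_h_ms=5.0))       # Ih faster than membrane
print(f"sag ratio, slow Ih: {slow:.2f}; fast Ih: {fast:.2f}")
```

      In both runs the same maximal conductance is present; only the activation time constant differs, which is the point of the argument.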

      Our own measurements, similar to others in the literature show that L2/3 PCs exhibit modest sag ratios, however, this does not mean that HCN is not relevant. I<sub>h</sub> activation in L2/3 PCs does not manifest in large sag potential but rather in a continuous distortion of steady-state responses (Figure 2b.). The reviewer is correct that L2/3 PCs are non-homogenous, therefore we sampled along the entire L2/3 axis. This yielded some potential variability in our results (i.e., passive properties); yet we did not observe any cells where hyperpolarizing-activated/Cs<sup>+</sup>-sensitive currents could not be resolved. As structural variability of L2/3 cells does result in variability in cellular capacitance, we compensated for this variability by injecting cellular capacitance-normalized currents. Our measured cellular capacitances were in accordance with previously published values, in the range of 50-120 pF. Therefore, the injected currents were not outside frequently used values. Together, we would like to state that whether substantial sag potential is present or not, initial estimates of the HCN content for each L2/3 PC should be treated with caution.

      (5) In the later experiments with ZD7288, the authors measured EPSP half-width at greater distances from the soma. However, they use minimal stimulation to evoke EPSPs at increasingly far distances from the soma. Without controlling for amplitude, the authors cannot easily distinguish between attenuation and spread from dendritic filtering and additional activation and spread from HCN blockade. At a minimum, the authors should share the variability of EPSP amplitude versus the change in EPSP half-width and/or stimulation amplitudes by distance. In general, this kind of experiment yields much clearer results if a more precise local activation of synapses is used, such as dendritic current injection, glutamate uncaging, sucrose puff, or glutamate iontophoresis. There are recording quality concerns here as well: the cell pictured in Figure 3a does not have visible dendritic spines, and a substantial amount of membrane is visible in the recording pipette. These concerns also apply to the similar developmental experiment in 6f-h, where EPSP amplitude is not controlled, and therefore, attenuation and spread by distance cannot be effectively measured. The outcome, that L2/3 cells have dendritic properties that violate cable theory, seems implausible and is more likely a result of variable amplitude by proximity.

      To resolve this issue, we made a supplementary figure showing elicited amplitudes, which showed no significant distance dependence and minimal variability (new Supplementary Figure 6). We thank the reviewer for suggesting an amplitude-halfwidth comparison control, which is also included in new Supplementary Figure 6. To address the issue of the non-visible spines, we would like to note that these images are of a magnification and power too low to resolve them. The presence of dendritic spines was confirmed in every recorded pyramidal cell observed using 2P microscopy at higher magnification.

      We would like to emphasize that although our recordings “seemingly” violated the cable theory, this is only true if we assume a completely passive condition. As shown in our manuscript, cable theory was not violated, as the presence of NMDA receptor boosting explained the observed ‘non-Rallian’ phenomenon.

      (6) Minimal stimulation used for experiments in Figures 3d-i and Figures 4g-h does not resolve the half-width measurement's sensitivity to dendritic filtering, nor does cesium blockade preclude only HCN channel involvement. Example traces should be shown for all conditions in 3h; the example traces shown here do not appear to even be from the same cell. These experiments should be paired (with and without cesium/ZD). The same problem appears in Figure 4, where it is not clear that the authors performed controls and drug conditions on the same cells. 4g also lacks a scale bar, so readers cannot determine how much these measurements are affected by filtering and evoked amplitude variability. Finally, if we are to believe that minimal stimulation is used to evoke responses of single axons with 50% fail rates, NMDA receptor activation should be minimal to begin with. If the authors wish to make this claim, they need to do more precise activation of NMDA-mediated EPSPs and examine the effects of ZD7288 on these responses in the same cell. As the data is presented, it is not possible to draw the conclusion that HCN boosts NMDA-mediated responses in L2/3 neurons.

      As stated in the figure legends, the control and drug application traces are from the same cell, both in Figure 3 and Figure 4, and the scale bar is not included as the amplitudes were normalized for clarity. We have addressed the effects of dendritic filtering above in answer (5), and cesium blockade above in answer (2). To reiterate, dendritic filtering alone cannot explain our observations, and cesium is often a better choice for blocking HCN channels compared to ZD-7288, which blocks sodium channels as well.

      When an excitatory synaptic signal arrives onto a pyramidal cell in typical conditions, neurotransmitter sensitive receptors transmit a synaptic current to the dendritic spine. This dendritic spine is electrically isolated by the high resistance of the spine neck and due to the small membrane surface of the spine, the synaptic current can elicit remarkably large voltage changes. These voltage changes can be large enough to depolarize the spine close to zero millivolts upon even single small inputs (Jayant et al. 2016). Therefore, to state that single inputs arriving to dendritic spines cannot be large enough to recruit NMDA receptor activation is incorrect. This is further exemplified by the substantial literature showing ‘miniature’ NMDA recruitment via stochastic vesicle release alone.

      (7) The quality of recordings included in the dataset has concerning variability: for example, resting membrane potentials vary by >15-20 mV and the AP threshold varies by 20 mV in controls. This is indicative of either a very wide range of genetically distinct cell types that the authors are ignoring or the inclusion of cells that are either unhealthy or have bad seals.

      Although we are aware of the diversity of L2/3 PCs, resolving further layer depth differences is outside the scope of our current manuscript. However, as shown in Kalmbach et al., resting membrane potential can greatly vary (>15-20 mV) in L2/3 PCs depending on distance from pia. We acknowledge that the variance in AP threshold is large and could be due to genetically distinct cell types.

      (8) The authors make no mention of blocking GABAergic signaling, so it must be assumed that it is intact for all experiments. Electrical stimulation can therefore evoke a mixture of excitatory and inhibitory responses, which may well synapse at very different locations, adding to interpretability and variability concerns.

      We thank the reviewer for pointing out our lack of detail regarding the GABAergic signaling blocker SR 95531. We did include this drug in our recordings of signal summation (50 Hz stimulation), so GABAergic responses did not contaminate our recordings. We have now included this information in the results section (page 5) and the methods section (page 15).

      (9) The investigation of serotonergic interaction with HCN channels produces modest effect sizes and suffers the same problems as described above.

      We do not agree with the reviewer that a 50% drop in neuronal AP firing responses (Figure 7b) is a modest effect size. Thus, we opted to keep this data in the manuscript.

      (10) The computational modeling is not well described and is not biologically plausible. Persistent and transient K channels are missing. Values for other parameters are not listed. The model does not seem to follow cable theory, which, as described above, is not only implausible but is also not supported by the experimental findings.

      The model was downloaded from the Cell Type Database from the Allen Institute, with only minor modifications including the addition of dendritic HCN channels and NMDA receptors, which were varied along a wide parameter space to find a ‘best fit’ to our observations. These additions were necessary to recapitulate our experimental findings. We agree the model likely does not fully recapitulate all aspects of the dendrites, which, as we hope to convey in this manuscript, are not fully resolved in mouse L2/3 PCs. This is a previously published neuronal model, and despite its potential shortcomings, is one among a handful of open-source neuronal models of a fully reconstructed L2/3 PC.

      Reviewer #2 (Public Review):

      Summary:

      This paper by Olah et al. uncovers a previously unknown role of HCN channels in shaping synaptic inputs to L2/3 cortical neurons. The authors demonstrate using slice electrophysiology and computational modeling that, unlike layer 5 pyramidal neurons, L2/3 neurons have an enrichment of HCN channels in the proximal dendrites. This location provides a locus of neuromodulation for inputs onto the proximal dendrites from L4 without an influence on distal inputs from L1. The authors use pharmacology to demonstrate the effect of HCN channels on NMDA-mediated synaptic inputs from L4. The authors further demonstrate the developmental time course of HCN function in L2/3 pyramidal neurons. Taken together, this is a well-constructed investigation of HCN channel function and the consequences of these channels on synaptic integration in L2/3 pyramidal neurons.

      Strengths:

      The authors use careful, well-constrained experiments using multiple pharmacological agents to assess HCN channel contributions to synaptic integration. The authors also use a voltage clamp to directly measure the current through HCN channels across developmental ages. The authors also provide supplemental data showing that their observation is consistent across multiple areas of the cerebral cortex.

      Weaknesses:

      The gradient of the HCN channel function is based almost exclusively on changes in EPSP width measured at the soma. While providing strong evidence for the presence of HCN current in L2/3 neurons, there are space clamp issues related to the use of somatic whole-cell voltage clamps that should be considered in the discussion.

      We thank the reviewer for pointing out our careful and well-constrained experiments and for making suggestions. The potential effects of space clamp errors are detailed in the extended explanations under Reviewer 1, Specific points (3).

      Reviewer #3 (Public Review):

      Summary:

      The authors study the function of HCN channels in L2/3 pyramidal neurons, employing somatic whole-cell recordings in acute slices of visual cortex in adult mice and a bevy of technically challenging techniques. Their primary claim is a non-uniform HCN distribution across the dendritic arbor with a greater density closer to the soma (roughly opposite of the gradient found in L5 PT-type neurons). The second major claim is that multiple sources of long-range excitatory input (cortical and thalamic) are differentially affected by the HCN distribution. They further describe an interesting interplay of NMDAR and HCN, serotonergic modulation of HCN, and compare HCN-related properties at 1, 2 and 6 weeks of age. Several results are supported by biophysical simulations.

      Strengths:

      The authors collected data from both male and female mice, at an age (6-10 weeks) that permits comparison with in vivo studies, in sufficient numbers for each condition, and they collected a good number of data points for almost all figure panels. This is all the more positive, considering the demanding nature of multi-electrode recording configurations and pipette-perfusion. The main strength of the study is the question and focus.

      Weaknesses:

      Unfortunately, in its present form, the main claims are not adequately supported by the experimental evidence: primarily because the evidence is indirect and circumstantial, but also because multiple unusual experimental choices (along with poor presentation of results) undermine the reader's confidence. Additionally, the authors overstate the novelty of certain results and fail to cite important related publications. Some of these weaknesses can be addressed by improved analysis and statistics, resolving inconsistent data across figures, reorganizing/improving figure panels, more complete methods, improved citations, and proofreading. In particular, given the emphasis on EPSPs, the primary data (for example EPSPs, overlaid conditions) should be shown much more.

      However, on the experimental side, addressing the reviewer's concerns would require a very substantial additional effort: direct measurement of HCN density at different points in the dendritic arbor and soma; the internal solution chosen here (K-gluconate) is reported to inhibit HCN; bath-applied cesium at the concentrations used blocks multiple potassium channels, i.e. is not selective for HCN (the fact that the more selective blocker ZD7288 was used in a subset of experiments makes the choice of Cs+ as the primary blocker all the more curious); pathway-specific synaptic stimulation, for example via optogenetic activation of specific long-range inputs, to complement / support / verify the layer-specific electrical stimulation.

      We thank the reviewer for their very careful examination of our manuscript and helpful suggestions. We addressed the concerns raised in the review and presented more raw traces in our figures. Although direct dendritic HCN mapping measurements are outside the scope of the current manuscript due to the morphological constraints presented by L2/3 PCs (which explains why no other full dendritic nonlinearity distribution has been described in L2/3 PCs with this method), we nonetheless supplemented our manuscript with additional experiments as suggested. For example, we included the excellent suggestion of pathway-specific optogenetic stimulation to further validate the disparate effect of HCN channels for distal and proximal inputs. We agree that ZD-7288 is a widely accepted blocker of HCN channels. However, the off-target effects on sodium channels may have significantly confounded our measurements of AP output using extracellular stimulation. Therefore, we chose low concentration cesium as the primary blocker for those experiments, but have now validated several other Cs<sup>+</sup>-based results with ZD-7288 as well.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      I have some issues that need clarification or correction.

      (1) On page 3, line 90, the authors state "We found that bath application of Cs+ (1mM)..." but the methods and Figure 1 state "2mM Cs+". Please check and correct.

      Correct, typo corrected.

      (2) Related to Cs+ application, the methods state that "CsMeSO4 (2mM) was bath applied..." Is this correct? CsMeSO4 is typically used intracellularly while CsCl is used extracellularly. If so, please justify. If not, please correct.

      It is correct. The justification for not using CsCl selectively extracellularly is that introducing intracellular chloride ions can significantly alter basic biophysical properties, unrelated to the cesium effect. However, no similar distinction has been made for CsMeSO4, which would exclude the use of this drug extracellularly.

      (3) The authors normalize the current injections by cell capacitance (pA/pF). Was this done because there is a significant variance in cell morphology? A bit of justification for why the authors chose to normalize the current injection this way would help. If there is significant variation in cell capacitance across cells (or developmental ages), the authors could also include these data.

      Indeed, we chose to normalize current injection to cellular capacitance due to the markedly different morphology of deep and superficial L2/3 PCs. Deeper L2/3 PCs have a pronounced apical branch, closely resembling other pyramidal cell types such as L5 PCs, while superficial L2/3 PCs lack a thick main apical branch and instead are equipped with multiple, thinner apical dendrites. This morphological variation would yield an inherent bias in several of the reported measurements; therefore, we corrected for it by normalizing current injection to cellular capacitance, similar to our previous recent publications (Olah, Goettemoeller et al., 2022, Goettemoeller et al. 2024, Kumar et al. 2024).
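
      As a brief illustration of this normalization, the absolute current delivered to each cell is the chosen current density multiplied by that cell's measured capacitance. The sketch below uses a hypothetical helper name and two hypothetical capacitance values; it is not code from the study.

```python
def injected_current_pa(density_pa_per_pf: float, capacitance_pf: float) -> float:
    """Absolute current (pA) for a capacitance-normalized step (pA/pF)."""
    return density_pa_per_pf * capacitance_pf

# Two hypothetical cells receiving the same -6 pA/pF step.
for c_pf in (70.0, 120.0):
    print(f"{c_pf:.0f} pF cell: {injected_current_pa(-6.0, c_pf):.0f} pA")
```

      A larger cell therefore receives proportionally more current, so the two cells are driven with a similar current per unit of membrane.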

      (4) On page 15, line 445, the section heading is "PV cell NEURON modeling". Is this a typo? The models are of L2/3 pyramidal neurons, correct?  

      Correct, typo corrected.

      (5) Figures 3F and 3I are plots of the voltage integral for different inputs before and after Cs+. The y-axis label units are "pA*ms". This should be "mV*ms" for a voltage integral.  

      Correct, typo corrected.
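
      For reference, the unit follows directly from how such an integral is computed: a voltage trace in mV integrated over time in ms has units of mV*ms. A minimal sketch, assuming a synthetic alpha-function EPSP (hypothetical 1 mV peak, 5 ms time constant) rather than recorded data:

```python
import math
import numpy as np

def alpha_epsp_mv(t_ms, peak_mv=1.0, tau_ms=5.0):
    """Alpha-function EPSP that reaches `peak_mv` at t = tau_ms."""
    x = t_ms / tau_ms
    return peak_mv * x * np.exp(1.0 - x)

def voltage_integral_mv_ms(t_ms, v_mv):
    """Trapezoidal area under a voltage trace; result is in mV * ms."""
    return float(np.sum(0.5 * (v_mv[1:] + v_mv[:-1]) * np.diff(t_ms)))

t_ms = np.arange(0.0, 100.0, 0.1)
area = voltage_integral_mv_ms(t_ms, alpha_epsp_mv(t_ms))
print(f"EPSP integral: {area:.2f} mV*ms")
```

      For an alpha function the analytical area is the peak amplitude times e times the time constant, which gives a quick check on the numerical result.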

      (6) On page 9, line 273, the text reads "Voltage clamp experiments revealed that the rectification of steady-state voltage responses to hyperpolarizing current injection was amplified with 5-CT (Fig. 7c)". Both the text and Figure 7C describe current clamp, not voltage clamp, recordings. Please check and correct.

      Correct, typo corrected.

      (7) Figure 2i looks to be a normalized conductance vs voltage (i.e. activation) plot. The y-axis shows 0-1 but the units are in nS. Is that a coincidence or an error?

      Correct, typo corrected.

      Reviewer #3 (Recommendations For The Authors):

      This is your paper. My comments are my own opinion, I don't expect you to agree or to respond. But I hope that what I wrote below will help you to understand my perspective.

      Please pardon my directness (and sheer volume) in this section - I have a lot of notes/thoughts and hope you may find some of them helpful. My high-level comments are unfortunately rather critical, and in (small) part that is because I encountered too many errors/typos/ambiguities in figures, legend, and text. I expect many would be caught with good proofreading, but uncorrected caused confusion on my part, or an inability to interpret your figures with confidence, given some ambiguity.

      The paper reads a bit like patchwork - likely a result of many "helpful" reviewers who came before me. Consider starting with and focusing on the synaptic findings, expanding the number of figures and panels dedicated to that, showing example traces for all conditions, and giving yourself the space to portray these complex experiments and results. While I'm not a fan of a large number of supplemental figures, I feel you could move the "extra" results to the supplementals to improve the focus and get right to the meat of it.

      For me, the main concern is that the evidence you present for the non-uniform HCN distribution is rather indirect. Ideally, I'd like to see patch recordings from various dendritic locations (as others have done in rats, at least; I'm not sure if L2/3 mice have had such conductance density measurements made in basal and apical dendrites). Otherwise, perhaps optical mapping, either functional or via staining. I also mention some concerns about the choice of internal and cesium. More generally, I want to see more primary data (traces), in particular for the big synaptic findings (non-uniform, L1-vs-L4 differences, NMDAR).

      We thank the reviewer for the helpful suggestions. Indeed, direct patch clamp recording is widely considered to be the best method to identify dendritic ion channel distribution; however, we chose an in silico approach instead, for several reasons. Undoubtedly, one of the main reasons to omit direct dendritic recordings was that, due to the uniquely narrow apical dendrites, this method is extremely challenging, with no previous examples in the literature where isolated dendritic outside-out patch recordings were achieved from this cell type. However, there are theoretical considerations as well. In primates, it has been demonstrated that HCN1 channels are concentrated on dendritic spines (Datta et al., 2023); therefore, direct outside-out recordings are not adequate in these circumstances. In future experiments we could directly target L2/3 PC dendrites for outside-out recordings in order to resolve dendritic nonlinearity distribution, although a cell-attached methodology may be better suited due to the HCN biophysical properties being closely regulated by intracellular signaling pathways.

      The introduction and Figures 1 and 2 are not so interesting and not entirely accurate: L2/3 do not have "abundant" HCN, nor is there an actual controversy about whether they have HCN. It's been clear (published) for years that they have about the same as all other non-PT neocortical pyramidal neurons (see e.g. Larkum 2007; Sheets 2011). Your own Figure 1A has a logarithmic scale and shows L2/3 as having the lowest expression (?) of all pyramidals and roughly 10x lower than L5 PT, but the text says "comparable", which is misleading.

      We thank the reviewer for this comment. Although there are sporadic reports in the literature about the HCN content of L2/3 PCs, most of these publications arrive at the same conclusion from the negligible sag potential (as in the mentioned Larkum et al., 2007 publication); namely that L2/3 PCs do not contain a significant amount of HCN channels. We have shown with voltage and current clamp recordings that this assumption is false, as sag potential is not a reliable indicator of HCN content in L2/3 PCs. With the term “controversial” we aimed to highlight the different conclusions of functional investigations (e.g. Sheets et al., 2011) and sag potential recordings (e.g. Larkum et al., 2007), regarding the importance of HCN channels in L2/3 PCs.

      Non-uniform HCN with distal lower density has already been published for a (rare) pyramidal neuron in CA1 (Bullis 2007), similar to what you found in L2/3, and different from the main CA1 population.

      We thank the reviewer for this suggestion. We have now included the mentioned citation in the introduction section (page 3).

      Express sag as a ratio or percentage, consistently. Figure out why in Figure 7 the average sag ratio is 0.02 while in Fig. S1 it is 0.07 (for V1) - that is a massive difference.

      The calculation of sag ratio is consistent across the manuscript (at -6 pA/pF), except for the experiments depicted in Fig. 7, where sag ratio was calculated from -2 pA/pF steps. Explanation below:

      Sag should be measured at a common membrane potential, with each neuron receiving a current pulse appropriate to reach that potential. Your approach of capacitance-based may allow for the same, but it is not clear which responses are used to calculate a single sag value per cell (as in Figure 2d).

      Thank you, we have now included this information in the methods section. Sag potential was measured at the -6 pA/pF step peak voltage, except for Fig. 7 as noted above. We have now included this discrepancy in the methods section (page 14). The recordings in Fig. 7 took significantly longer than any other recording in the manuscript, as it took considerable time to reach a steady-state response after 5-CT application. A step of -6 pA/pF corresponds to a current injection in the range of 400-800 pA, which proved too severe for continued application in cells after more than an hour of recording. Accordingly, we decided to lower the hyperpolarizing current step in these recordings. The absolute value of sag is thus different in Fig. 7, but the 5-CT effect was nonetheless still significant. Notably, we probably would not have noticed the small sag in L2/3 here (and thus would not have undertaken the entire study), had we not looked at -6 pA/pF to begin with.
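
      As a rough illustration of the capacitance-normalized protocol described above, the sketch below converts a pA/pF step into an absolute current and computes a sag ratio from a hyperpolarizing response. The sag definition used here, (steady state − peak) / (baseline − peak), is one common convention and is assumed for illustration only; it is not taken from the methods section. The function names, capacitance values, and voltages are hypothetical.

```python
def step_current_pa(density_pa_per_pf, capacitance_pf):
    """Absolute current (pA) for a capacitance-normalized step (pA/pF)."""
    return density_pa_per_pf * capacitance_pf


def sag_ratio(baseline_mv, peak_mv, steady_mv):
    """Sag ratio for a hyperpolarizing step (assumed convention).

    0 means no sag; 1 means the voltage relaxes fully back to baseline.
    """
    peak_deflection = baseline_mv - peak_mv
    if peak_deflection <= 0:
        raise ValueError("peak must be more hyperpolarized than baseline")
    return (steady_mv - peak_mv) / peak_deflection


# Illustrative capacitances: a -6 pA/pF step spans roughly -400 to -800 pA
# for cells of about 67-133 pF, the range quoted in the response.
for cm_pf in (67.0, 100.0, 133.0):
    print(cm_pf, step_current_pa(-6.0, cm_pf))

# Toy trace summary: baseline -80 mV, peak -100 mV, steady state -98.6 mV.
print(round(sag_ratio(-80.0, -100.0, -98.6), 3))
```

      Because a smaller step (for example -2 pA/pF) hyperpolarizes the cell less, the same cell can yield a smaller sag value under this convention, which is why sag values should only be compared across recordings made at the same step size.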

      In a paper focused on HCN, I would have liked to see resonance curves in the passive characterization.

      We thank the reviewer for the suggestion. Resonance curves can indeed provide useful insights into the impact of HCN on a cell’s physiological behavior; however, in our opinion they would contribute little to the manuscript without accompanying in vivo recordings, and these experiments are therefore outside the scope of the current manuscript.

      How did you identify L2/3? Did you target cells in L2 or L3 or in the middle, or did you sample across the full layer width for each condition? A quantitative diagram showing where you patched (soma) and where you stimulated (L1, L4) with actual measurements, would be helpful (supplemental perhaps). You mention in the text that some L2/3 don't have a tuft, suggesting some variability in morphology - some info on this would be useful, i.e. since you did fill at least some of the neurons (eg 3A), how similar/different are the dendritic arbors?

      We sampled the entire L2/3 region during our recordings. It has been published that deep and superficial L2/3 PCs are markedly different in their morphology, and a recent publication (Brandelise et al. 2023) has even separated these two subpopulations into broad-tufted and slender-tufted pyramidal cells, which receive distinct subcortical inputs. Although this differentiation opens exciting avenues for future research, examining potential layer gradients in our dataset would require significantly larger sample sizes and is currently outside the scope of our manuscript.

      Distal vs proximal: this could use more clarification, considering how central it is to your results. What about a synapse on a basal dendrite, but 150 or 200 um from the soma, is that considered proximal? Is the distance to the soma you report measured along the 3D dendrite, along the 2D dendrite, as a straight line to the soma, or just relative to some layers or cortical markers? (I apologize if I missed this).

      We thank the reviewer for pointing out the missing description in the results section. We have amended this oversight (p15).  Furthermore, although deeper L3 PCs have characteristic apical and basal dendritic branches, when recordings were made from more superficial L2 cells, a large portion of their dendrites extended radially, which made their classification ambiguous. Therefore, we did not use “apical” and “basal” terminology in the paper to avoid confusion. Distances were measured along the 3D reconstructed surface of the recovered pyramidal cells. This information is now included in the methods.
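
      To make the distinction raised by the reviewer concrete, the following sketch contrasts a distance measured along a 3D reconstructed dendrite with the straight-line distance to the soma. It assumes the reconstruction is available as an ordered list of (x, y, z) points; the coordinates and function names are hypothetical and do not represent the actual reconstruction pipeline.

```python
import math


def path_distance(points):
    """Sum of segment lengths along an ordered list of (x, y, z) points (µm)."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def straight_line_distance(points):
    """Euclidean distance between the first (soma) and last (site) point (µm)."""
    return math.dist(points[0], points[-1])


# Hypothetical dendritic path from the soma to a stimulation site (µm).
path = [
    (0.0, 0.0, 0.0),
    (30.0, 40.0, 0.0),
    (30.0, 40.0, 50.0),
    (60.0, 80.0, 50.0),
]

print(path_distance(path))           # distance along the dendrite
print(straight_line_distance(path))  # shortcut through the tissue
```

      The along-dendrite distance is never shorter than the straight-line distance, so the two measures can classify the same site differently when a fixed proximal/distal cutoff is used.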

      Line 445, "PV cell NEURON modeling" ... hmm. Everyone re-uses methods sections to some degree, but this is not confidence-inspiring, and also not from a proofreading perspective.

      We have corrected the typo.

      It seems that you constructed a new HCN NEURON mechanism when several have been published/reviewed already. Please explain your reasons or at least comment on the differences.

      There are slight differences in our model compared to previously published models. Nevertheless, we took a previously published HCN model as a base (Gasparini et al, 2004), and created our own model to fit our whole-cell voltage clamp recordings.

      Bath-applied Cs+ can change synaptic transmission (in the hippocampus; Chevaleyre 2002). But also ZD7288 has some such effects. Also, see (Harris 1995) for a Cs+ and ZD7288 comparison. As well as (Harris 1994) for more Cs+ side-effects (it broadens APs, etc). Bath-applied blockers may affect both long-range and local synapses in your recordings, via K-channels or perhaps presynaptic HCN (though I am aware of your Fig. 1e). Since you can do intracellular perfusion, you could apply ZD7288 postsynaptically (Sheets 2011), an elegant solution.

      We thank the reviewer for the suggestion. We were aware of the potential presynaptic effects of cesium (i.e., presynaptic Kv or other channel effects) and did measure PPR after cesium application (Fig. 1h), noting no effect. At Cs<sup>+</sup> concentrations used here, we now also include new data in the results showing no effect on somatically recorded AP waveform (i.e., representative of a Kv channel effect). As stated earlier for reviewer 1, we now performed additional experiments using either cesium or ZD-7288 for comparison (e.g., see updated Fig. 1; Supplementary Figure 1; Fig. 3b-e). Intracellular ZD re-perfusion is an elegant solution which we will absolutely consider in future experiments.

      K-Gluconate is reported to inhibit Ih (Velumian 1997), consider at least some control experiments with a different internal for the main synaptic finding - maybe you'll find no big change ...

      We thank the reviewer for the suggestion. Although K-Gluconate can inhibit HCN current, this intracellular solution is often used in the literature to measure this current (Huang & Trussell 2014). We chose this intracellular solution to improve recording stability.

      (Biel 2009) is a very comprehensive HCN review, you may find it useful.

      We thank the reviewer for bringing this to our attention, we have now included the citation in the introduction.

      "Hidden" in your title seems too much.

      We changed the title to more accurately describe our findings and removed ‘hidden’.

      While I'm glad you didn't record at room temperature, the choice of 30C seems a bit unfortunate - if you go to the trouble to heat the bath, why not at least 34C, which is reasonably standard as an approximation for physiological temperature?

      We thank the reviewer for pointing this out. The choice of 30C was made to approach physiological temperature levels while preserving the slices for extended amounts of time, which is a standard approach. Future in vivo experiments should be performed to further understand the naturalistic relevance at ~37C.

      Line 506: do you mean "Hz" here? It's not a frequency, is it? I think it's a unitless ratio?

      Correct, we have amended the typo.

      Line 95: you have not shown that HCN is "essential" for "excess" AP firing.

      We have corrected the phrasing, we agree.

      Fig. 2b,c: is this data from a single example neuron, maybe the same neuron as in 2a? Or from all recorded neurons pooled?

      The data is from several recorded cells pooled.

      Fig. 3 (important figure):

      Why did you not use a paired test for panels e and f? You have the same number of neurons for each condition and the expectation is that you record each neuron in control and then in cesium condition, which would be a paired comparison. Or did you record only 1 condition per neuron?

      This figure presents your main finding (in my opinion). You should show examples of the synaptic responses, i.e. raw traces, for each condition and panel, and overlaid in such a way that the reader can immediately see the relevant comparison - it's worth the space it requires.

      We thank the reviewer for the suggestions. Traces are only overlaid in the paper when they come from the same cell. For Fig. 3d-i, EPSPs in every neuron were evoked in 2-3 different locations (i.e., 1-2 ‘L4’ locations for Type-I and Type-II synapses, and one ‘L1’ location in each) with the same stimulation pipette and one pharmacological condition per cell. Therefore, two-sample t-tests were used, since the control and cesium conditions came from separate cells (i.e., separate observations). This was necessary, as we can never assume that the stimulating electrode can return to the same synapse after it has been moved. We were not comfortable with showing overlaid traces from different cells; however, we did show representative traces from the control and Cs<sup>+</sup> conditions in Fig. 3h. Complementary ZD-7288 experiments can be found in panels b and c, where we did perform within-cell pharmacology (and thus used paired t-tests) from one stimulation area per cell. We hope these complementary experiments increase overall confidence, as neither pharmacological approach is entirely free of off-target effects. We have now also included more overlaid traces where appropriate (i.e., Fig. 3b, and the new Fig. 3k experiments using within-cell pharmacology comparisons). We realize these complementary approaches could confuse the reader, and have done our best to make the slightly different approaches in this figure clearer in the results section.
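
      The difference between the two designs described above can be sketched with the test statistics themselves. The example below computes a two-sample statistic for independent groups and a paired statistic for within-cell comparisons. Welch's form of the two-sample statistic is assumed here because the response does not state which variant was used, and the numbers are made up rather than taken from the manuscript.

```python
import math
import statistics as st


def welch_t(a, b):
    """Two-sample (Welch) t statistic for independent groups, e.g. separate cells."""
    se = math.sqrt(st.variance(a) / len(a) + st.variance(b) / len(b))
    return (st.mean(a) - st.mean(b)) / se


def paired_t(before, after):
    """Paired t statistic for repeated measurements in the same cells."""
    if len(before) != len(after):
        raise ValueError("paired samples must have equal length")
    diffs = [y - x for x, y in zip(before, after)]
    return st.mean(diffs) / (st.stdev(diffs) / math.sqrt(len(diffs)))


# Made-up EPSP half-widths (ms); not data from the manuscript.
control = [20.0, 25.0, 30.0, 35.0]
drug = [22.0, 28.0, 32.0, 38.0]

print(welch_t(drug, control))   # treats the two groups as different cells
print(paired_t(control, drug))  # treats each pair as one cell before/after drug
```

      With these toy numbers the paired statistic is much larger, because pairing removes cell-to-cell variability. That gain is only legitimate when both measurements truly come from the same cell and input, which is the reason given above for using unpaired tests when conditions were recorded in separate cells.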

      Consider repeating at least some of these critical experiments with ZD7288 instead of Cs+ (and not K-gluc), or even with ZD7288 pipette perfusion, if it's technically feasible here.

      We thank the reviewer for the suggestions. Although many of our recordings using Cs<sup>+</sup> already had complementary experiments (such as the synaptic experiments in Figure 3e vs Figure 3b), we recognize the need to extend the manuscript with more ZD-7288 experiments. We have now extended Figure 1 with three panels (Figure 1 c,d,e), which recapitulate a fundamental finding, the change in overall excitability upon HCN channel blockade, using ZD-7288 as well.

      Fig. 3a, why show a schematic (and weirdly scaled) stimulating electrode? Don't you have a BF photo showing the actual stimulating electrode, which you could trace to scale or overlay? Could you use this panel to indicate what counts as "distal" and what as "proximal", visually?

      The stimulating electrode was unfortunately not filled with fluorescent material; therefore, it was not captured in the z-stack.

      Fig. 3b: is the y-axis labeled correctly? A "100% change" would mean a doubling, but based on the data points here I think y=100% means "no change"?

      The scale is labeled correctly, 100% means doubling.
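
      The two conventions the reviewer is distinguishing can be written out explicitly. In this minimal sketch, "percent change" is the convention the response describes for Fig. 3b (100% means doubling), while "percent of control" is the alternative the reviewer had in mind (100% means no change). The function names and values are hypothetical.

```python
def percent_change(control, treated):
    """Percent change relative to control: 0 means no change, 100 means doubling."""
    return (treated - control) / control * 100.0


def percent_of_control(control, treated):
    """Percent of control: 100 means no change, 200 means doubling."""
    return treated / control * 100.0


# Toy half-widths (ms): the treated value is twice the control value.
print(percent_change(10.0, 20.0))
print(percent_of_control(10.0, 20.0))
```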

      Fig. 3b, c: again, show traces representing distal and proximal, not just one example (without telling us how far it was). And use those traces to illustrate the half-width measurement, which may be non-trivial.

      We have extended Figure 3b with an inset showing the effect of ZD-7288 on a proximal stimulating site. The legend now includes additional information indicating a stimulating location 28 µm away from the soma in control conditions (black trace) and upon ZD-7288 application (green trace).

      Line 543, 549: it seems you swapped labels "h" and "i"?

      Typo corrected.

      Fig. 4b: to me, MK-801 only *partially* blocks amplification, but in the text L198 you write "abolish".

      We thank the reviewer for pointing this out. Indeed, there are several other subthreshold mechanisms that are still intact after pipette perfusion, which can cause amplification. We have now clarified this in the text (p7).

      Fig. 4e,f: what is the message? Uniform NMDAR? The red asterisk in (e) is at a proximal/distal ratio of roughly 1. I don't understand the meaning of the asterisk (the legend is too basic) and I'm surprised to see a ratio of 1 as the best fit, and also that the red asterisk is at a dendritic distance of 0 um in (f). This could use more explanation (if you feel it's relevant).

      We thank the reviewer for pointing this out. We have now included a better explanation in the results and figure legend. We have also updated the figure to make it clearer and added model traces in Fig. 4f, which correspond to example data from slices in Fig. 4g (both green). The graph suggests a nonuniform, proximally abundant NMDA receptor distribution. The color coding corresponds to the proximal EPSP halfwidth divided by the distal EPSP halfwidth. It is true that the best-fit dendritic distance ‘center’ was very close to the soma, but note that the half-width of the distribution was >150 µm, so there is a substantial dendritic spread despite the predicted proximal bias. Based on this model, NMDA receptors are likely spread throughout the entire dendrite, but biased proximally. Naturally, future work will need to map this at the spine level, so this is currently an oversimplification. Nonetheless, a proximal NMDA bias was necessary to recapitulate the findings from Fig. 3, and additional slice recordings in Fig. 4 were consistent with this interpretation.

      Fig. 4g: I feel your choice of which traces to overlay is focusing on the wrong question. As the reader, what I want to see here is an overlay of all 4 conditions for one pathway. If this is a sequential recording in a single cell (Cs, Cs+MK801, wash out Cs, MK801), then the overlay would be ideal and need not be scaled. Otherwise, you can scale it. But the L1/L4 comparison does not seem appropriate to me. I find myself trying to imagine what all the dark lines would look like overlaid, and all the light lines overlaid separately. Also, the time axis is missing from this panel. Consider a subtraction of traces (if appropriate).

      In these recordings, all EPSPs were measured using a stimulating electrode that was moved between L1 and L4 (only once, to keep the exact input consistent) to measure the different inputs in a single neuron. In separate sets of experiments, the same method was used but in the presence of Cs<sup>+</sup>, Cs<sup>+</sup> + MK-801, or MK-801 alone. This was the most controlled method in our hands for this type of approach, as drug washouts were either impractical or not possible. Overlaying four traces would have produced a more cluttered image, and the four conditions were not actually recorded in the same cell. As our aim was to resolve the proximal-distal halfwidth relationship, we deemed the within-cell L1 vs. L4 comparison appropriate. We have nonetheless added model traces in Fig. 4f, which correspond to example data from slices in Fig. 4g (both green). The bar graphs should also serve to illustrate the input-specific relationship, i.e., that the only time the L1 and L4 EPSP relationship was inverted was in the presence of Cs<sup>+</sup> (green bars), and that this effect was occluded with simultaneous MK-801 in the pipette (red bars).

      Line 579: should "hyperpolarized" be depolarized?

      Corrected

      Fig. 5a: it looks like the HCN density is high in the most basal dendrites (black curve above), then drops towards the soma, then rises again in the apicals (red curve). Is that indeed how the density was modeled? If so, this is completely at odds with the impression I received from reading your text and experimental data - there, "proximal" seems to mean where the L4 axons are, and "distal" seems to mean where the L1 axons are, in other words, high HCN towards the pia and low HCN towards the white matter. But this diagram suggests a biphasic hill-valley-hill distribution of HCN (meaning there is a second "distal" region below the soma). In that case, would the laterally-distant basal dendrites also be considered distal? How does the model implement the distribution - is it 1D, 2D or 3D? As you can probably tell, this figure raised more questions for me and made me wonder why I don't have a better understanding yet of your definitions.

      We thank the reviewer for pointing this out. We agree that our initial cartoon of the parameter fitting procedure was not accurate and should have depicted just a single ‘curve’. We have now simplified it to better demonstrate what the model is testing, and have also made the terms more consistent and accurate. There is no ‘second’ region in the model. We hope the revised figure illustrates this better. We also edited the legend to be clearer. Because the model description in Fig. 4d suffered from similar shortcomings, we modified it and its figure legend accordingly.

      Fig. 5b: why is the best fit at a proximal/distal ratio of 1, yet sigma is 50 um?

      The proximal/distal bias in this figure was fitted to 0.985 (prox/distal ratio) because we modeled control conditions, with intact NMDA and HCN channels, which closely approximated the control recording comparisons.

      Fig. 6h, Line 662: "vs CsMeSO4 ... for putative LGN events" The panel shows proximal vs distal, not control vs Cs+. What's going on here?

      Typo corrected.

      Fig. 7e: the ctrl sag ratio here averages 0.02, while in Fig. S1 the average (for V1 and others) is about 0.07.

      Please refer to our answer to the previous question regarding sag ratio measurements. Briefly, recordings made with 5-CT application used a less severe, -2 pA/pF current injection to test sag responses. This more modest hyperpolarization activated fewer HCN channels; therefore, the sag ratio is lower compared to previously reported datapoints. We have included this explanation in the methods section (page 14).

      Now here you are using a paired test for this pharmacology, but you didn't previously (see my earlier comments/questions).

      Paired t-tests were used for these experiments, as the control and test datapoints came from the same cell. Cells were recorded in control conditions and again after drug application.

      Line 137: single-axon activation: but cortical axons make multi-synaptic contacts, at least for certain types of pre- and post-synaptic neurons, and (e.g. in L5-L5 pairs) those contacts can be distributed across the entire dendritic arbor. In other words, it's possible that when you stimulate in L1, you activate local axons, and the signal could then propagate to multiple synaptic contact locations, some being distal and some proximal. Maybe you have reasons to believe you're able to avoid this?

      We thank the reviewer for this question. Cortical axons often make distributed contacts; however, the bottom-up and top-down pathways innervating L2/3 PCs are at least somewhat restricted to L2/3/L4 and L1, respectively (Shen et al. 2022, Sermet et al. 2019). Therefore, given the lack of evidence for a heavily mixed topographical distribution of top-down and bottom-up inputs, we have reason to believe that L1 stimulation will result in mainly distal input recruitment, while L4 stimulation will mainly excite proximal dendritic regions. The resolution of our experiments was also improved by minimal stimulation and, in a subset of experiments, visual guidance of the stimulation. Furthermore, new optogenetic experiments stimulating LGN and LM axons, which have previously been anatomically defined as biased to deeper layers and L1, respectively, were also performed (Fig. 3j-l), with cesium effects analogous to those in our local electrical stimulation experiments. Future work using varying optogenetic stimulation parameters will expand on this.

      L140: "previous reports" ==> citation needed.

      We have inserted the citation needed.

      L149: "arriving to layer 1"; but I think earlier you noted that some or many L2/3 neurons lack a dendritic tuft; do they all nevertheless have dendrites in L1? Note that cortico-cortical long-range axons still need to pass through all cortical layers on their way up to L1.

      We thank the reviewer for the question. Although the more superficial L2/3 PCs lack a distinct apical tuft, their dendrites reach the pia similarly to those of deeper L2/3 PCs. All of our recorded and post-hoc recovered cells had dendrites in L1, except in cases where the dendrites were clearly cut during the slicing procedure; these cells were excluded from the study.

      When you write "L4 axons" or "L4 inputs", do you specifically mean long-range thalamic axons? Or axons from local L4 neurons? What about axons in L4 that originate from L5 pyramidal neurons?

      In the case of ‘L4’ axons, we cannot disambiguate these inputs a priori, as they are both part of the bottom-up pathway and are possibly experimentally indistinguishable. Even with restricted optogenetic LGN stimulation, disynaptic inputs via L4 PCs cannot be completely ruled out under our conditions. On the other hand, the probability of L5 PC axons terminating on L2/3 PCs is exceedingly low (a single reported connection out of 1145 potential connections; Hage et al. 2022). We did find two clearly different synaptic subpopulations in L4 (Supp. Fig 3), which it was tempting to classify as one or the other. However, we felt there was not enough evidence, either in the literature or in our additional optogenetic experiments, to classify the source of these different L4 inputs. Thus, we designated them as Type-I or Type-II for now.

      Do you inject more holding current to compensate for the resting membrane potential when Cs+ or ZD7288 is in the bath?

      We thank the reviewer for the question. We did not inject a compensatory current, as we wanted to investigate the dual, physiologically relevant action of HCN channels (George et al. 2009).

      I'd like to see distributions (histograms) of L4 and L1 EPSP amplitudes, under control conditions and ideally also under HCN block.

      We have now extended the manuscript with a supplementary figure (Supplementary Figure 6) to show that EPSP peak was not distance dependent in control conditions, and there was no relationship between peak and halfwidth in our dataset.

      Line 186, custom pipette perfusion: why not use this for internal ZD7288, to make it cell-specific?

      We thank the reviewer for the question, this is a good point. In future work we will consider this when applicable. It is certainly a way to control for bath application confounds in many ways.

      L205: "recapitulate our experimental findings" - which findings do you mean? I think a bit of explanation/referencing would help.

      Corrected.

      Line 210: L4-evoked were narrower than L1-evoked: is this not expected based on filtering?

      We thank the reviewer for pointing this out, the word “Intriguingly” has been omitted.

      Line 231 and 235: "in L5 PCs" should be restricted to L5 PT-type PCs.

      We have corrected this throughout the manuscript.

      Neuromodulation, Fig. 7, L263-282: the neuromodulation finding is interesting. However, a bit like the developmental figure, it feels "tacked on" and the transition feels a bit awkward. I think you may want to discuss/cite more of the existing literature on neuromodulatory interactions with HCN (not just L2/3). Most importantly, what I feel is missing is a connection to your main finding, namely L1 and L4 inputs. Does serotonergic neuromodulation put L1 and L4 back on equal footing, or does it exaggerate the differences?

      We thank the reviewer for the question. We agree with the reviewer that Figure 7 does not give a complete picture about how the adult brain can capitalize on this channel distribution, as our intention was to show that HCN channels are not a stationary feature of L2/3 PC, but a feature which can be regulated developmentally and even in the adult brain via neuromodulation. In other words, the subthreshold NMDA boosting we observed can be gated by HCN, depending on developmental stage and/or neuromodulatory state of the system. We have now added some brief language to better introduce the transition and its relevance to the current study in the results (p8), and discussed the implications in the discussion section of the original manuscript.

      General comment: different types/sources of synapses may have different EPSP kinetics. I feel this is not mentioned/discussed adequately, considering your emphasis on EPSPs/HCN.

      See points above on input-specific synaptic diversity.

      Line 319/320: enriched distal HCN is found in L5 PT-type, not in all L5 PCs.

      Corrected

      L320: CA1 reportedly has a subset of pyramidal neurons that have higher proximal HCN than distal (I gave the citation above). In light of that, I think "unprecedented" is an overstatement.

      Corrected.

      Methods:

      L367: What form of anesthesia was used?

      Amended.

      Which brain areas, and how?

      Amended.

      Why did you first hold slices at 34C, but during recording hold at 30C?

      We held the slices at 34C to accelerate the degradation of superficial damaged parts of the slice, which is in line with currently used acute slice preparation methodologies, regardless of the subsequent recording temperature.

      Pipette resistance/tip size?

      Amended.

      Cell-attached recordings (L385): provide details of recordings. What was the command potential (fixed value, or did you adjust it per neuron by some criteria)?

      Amended.

      What type of stimulating electrode did you use? If glass, what solution is inside, and what tip size?

      We thank the reviewer for pointing these out, the specific points were added to the methods section.

      L392/393: you adjusted the holding (bias) current to sit at -80 mV. What were the range and max values of holding current? Was -80 mV the "raw" potential, or did it account for liquid junction? If you did not account for liquid junction potential, then would -80 in your hands effectively be between -95 and -90 mV? That seems unusually hyperpolarized.

      All cells were held with bias holding currents between -50 pA and 150 pA. To be clear, as mentioned below, we did not change the bias current after any drug applications. We did not correct for liquid junction potential, and cells were ‘held’ with bias current at -80 mV during our recordings, as 1) this value was apparently close to the RMP (i.e., little bias current was needed at this voltage on average) (Fig. 2e) and 2) this kept conditions consistent across recordings. The uncorrected -80 mV is in the range of previously reported membrane potential values both in vivo and in vitro (Svoboda et al. 1999, Oswald et al. 2008, Luo et al. 2017), which found the (corrected) RMP to be below -80 mV. Naturally, this will not reflect every in vivo condition completely, and further investigation under naturalistic conditions is warranted.

      Did you adjust the bias current during/after pharmacology?

      Bias current was not adjusted in order to resolve the effect on resting membrane potential.

      L398: sag calculation could use better explanation: how did you combine/analyze multiple steps from a single neuron when calculating sag? Did you choose one level (how) or did you average across step sizes or ...?

      Sag ratio was measured at -6 pA/pF current step except for one set of experiments in Fig. 7. Methods section was amended.

      L400, 401: 10 uM Alexa-594 or 30 um Alexa-594, which is correct?

      10 µM is correct; the typo was corrected.

      L445: "PV cell" seems like a typo?

      Typo is corrected.

      L450: "altered", please describe the algorithm or manual process.

      Alterations were made manually.

      L474: NDMA, typo.

      Typo is fixed.

      L474: "were adjusted", again please describe the process.

      Adjustments were made by a grid-search algorithm.
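
      For readers unfamiliar with the approach, a grid search of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: it assumes a Gaussian density profile parameterized by a center and a width (sigma), and it scores each parameter pair against a synthetic target profile, whereas the actual objective would be the mismatch between NEURON simulations and recordings. All names and values are hypothetical.

```python
import itertools
import math


def gaussian_density(distance_um, center_um, sigma_um, peak=1.0):
    """Channel density at a dendritic distance, modeled as a Gaussian profile."""
    return peak * math.exp(-((distance_um - center_um) ** 2) / (2.0 * sigma_um ** 2))


def grid_search(target, distances, centers, sigmas):
    """Return ((center, sigma), error) minimizing squared error against target."""
    best, best_err = None, float("inf")
    for center, sigma in itertools.product(centers, sigmas):
        err = sum(
            (gaussian_density(d, center, sigma) - t) ** 2
            for d, t in zip(distances, target)
        )
        if err < best_err:
            best, best_err = (center, sigma), err
    return best, best_err


# Synthetic target generated from known parameters (a stand-in for recorded data).
distances = list(range(0, 301, 25))
target = [gaussian_density(d, 0.0, 150.0) for d in distances]

best, err = grid_search(
    target,
    distances,
    centers=range(0, 201, 50),   # candidate centers (µm)
    sigmas=range(50, 251, 50),   # candidate widths (µm)
)
print(best, err)
```

      The search evaluates every combination on the grid, so its resolution is limited by the grid spacing; a finer grid or a second pass around the best point trades computation time for precision.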

      Biel, M., Wahl-Schott, C., Michalakis, S., & Zong, X. (2009). Hyperpolarization-activated cation channels: from genes to function. Physiological reviews, 89(3), 847-885. https://journals.physiology.org/doi/full/10.1152/physrev.00029.2008 - (very comprehensive review of HCN)

      Bullis JB, Jones TD, Poolos NP. Reversed somatodendritic I(h) gradient in a class of rat hippocampal neurons with pyramidal morphology. J Physiol. 2007 Mar 1;579(Pt 2):431-43. doi: 10.1113/jphysiol.2006.123836. Epub 2006 Dec 21. PMID: 17185334; PMCID: PMC2075407. https://physoc.onlinelibrary.wiley.com/doi/full/10.1113/jphysiol.2006.123836 - (CA1 subset (PLPs) have a reversed HCN gradient; cell-attached patches, NMDAR)

      Velumian AA, Zhang L, Pennefather P, Carlen PL. Reversible inhibition of IK, IAHP, Ih, and ICa currents by internally applied gluconate in rat hippocampal pyramidal neurones. Pflugers Arch. 1997 Jan;433(3):343-50. doi: 10.1007/s004240050286. PMID: 9064651. https://link.springer.com/article/10.1007/s004240050286 - (K-Gluc internal inhibits HCN)

      Sheets, P. L., Suter, B. A., Kiritani, T., Chan, C. S., Surmeier, D. J., & Shepherd, G. M. (2011). Corticospinal-specific HCN expression in mouse motor cortex: I h-dependent synaptic integration as a candidate microcircuit mechanism involved in motor control. Journal of neurophysiology, 106(5), 2216-2231. https://journals.physiology.org/doi/full/10.1152/jn.00232.2011 - (L2/3 IT have same sag ratio as all other non-PT pyramidals, roughly 5% (vs 20% PT); intracellular ZD7288 used at 10 or 25 um)

      Harris NC, Constanti A. Mechanism of block by ZD 7288 of the hyperpolarization-activated inward rectifying current in guinea pig substantia nigra neurons in vitro. J Neurophysiol. 1995 Dec;74(6):2366-78. doi: 10.1152/jn.1995.74.6.2366. PMID: 8747199. https://journals.physiology.org/doi/abs/10.1152/jn.1995.74.6.2366 - (comparison Cs+ and ZD7288)

      Harris, N. C., Libri, V., & Constanti, A. (1994). Selective blockade of the hyperpolarization-activated cationic current (Ih) in guinea pig substantia nigra pars compacta neurones by a novel bradycardic agent, Zeneca ZM 227189. Neuroscience letters, 176(2), 221-225. https://www.sciencedirect.com/science/article/abs/pii/0304394094900876 - (Cs+ is not HCN-selective; it also broadens APs, reduces the AHP)

      Chevaleyre, V., & Castillo, P. E. (2002). Assessing the role of Ih channels in synaptic transmission and mossy fiber LTP. Proceedings of the National Academy of Sciences, 99(14), 9538-9543. https://pnas.org/doi/abs/10.1073/pnas.142213199 - (Cs+ blocks K channels, increases transmitter release; but also ZD7288 affects synaptic transmission)

      Thank you

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      The modeling and experimental work described provide solid evidence that this model is capable of qualitatively predicting alterations to the swing and stance phase durations during locomotion at different speeds on intact or split-belt treadmills, but a revision of the figures to overlay the model predictions with the experimental data would facilitate the assessment of this qualitative agreement. This paper will interest neuroscientists studying vertebrate motor systems, including researchers investigating motor dysfunction after spinal cord injury.

      Figures showing the overlay of the experimental data with the modeling predictions have been included as figure supplements for Figures 5-7. This highlights how accurate the model predictions were.

      Public Reviews:

      Reviewer #1 (Public review):

      We thank the reviewer for the positive evaluation of our paper and emphasizing its strengths in the Summary.

      Weaknesses:

      (1) Could the authors provide a statement in the methods or results to clarify whether there were any changes in synaptic weight or other model parameters of the intact model to ensure locomotor activity in the hemisected model?

      Such a statement has been inserted in Materials and Methods, section “Modeling”. Also, in the 1st paragraph of section “Spinal sensorimotor network architecture and operation after a lateral spinal hemisection”, we stated that no “additional changes or adjustments” were made.

      (2) The authors should remind the reader what the main differences are between state-machine, flexor-driven, and classical half-center regimes (lines 77-79).

      Short explanations/reminders have been inserted (see lines 80-83 of tracked changes document).

      (3) There may be changes in the wiring of spinal locomotor networks after the hemisection. Yet, without applying any sort of plasticity, the model is able to replicate many of the experimental data. Based on what was experimentally replicated or not, what does the model tell us about possible sites of plasticity after hemisection?

      Quantitative correspondence of changes in locomotor characteristics predicted by the model and those obtained experimentally provides additional validation of the model proposed in the preceding paper and used in this paper. This was our ultimate goal. None of the plastic changes during recovery were modeled because of a lack of precise information on these changes. The absence of possible plastic changes may explain the small discrepancies between our simulations and experimental data (see Supplemental Figures that have been added). However, the model only has a simplified description of spinal circuits without motoneurons and without real simulation of leg biomechanics. This limits our analysis or predictions of possible plastic changes within a reasonable degree of speculation. This issue is discussed in section: “Limitations and future directions” in the Discussion. We have also inserted a sentence: “The lack of possible plastic changes in spinal sensorimotor circuits of our model may explain the absence of exact/quantitative correspondences between simulated and experimental data.”

      (4) Why are the durations on the right hemisected (fast) side similar to results in the full spinal transected model (Rybak et al. 2024)? Is it because the left is in slow mode and so there is not much drive from the left side to the right side even though the latter is still receiving supraspinal drive, as opposed to in the full transection model? (lines 202-203).

      This is correct. We have included this explanation in the text (lines 210-211 of tracked changes document).

      (5) There is an error with probability (line 280).

      This typo was corrected.

      Reviewer #2 (Public review):

      This is a nice article that presents interesting findings. One main concern is that I don't think the predictions from the simulation are overlaid on the animal data at any point - I understand the match is qualitative, which is fine, but even that is hard to judge without at least one figure overlaying some of the data.

      We thank the Reviewer for the constructive comments. Figures showing the overlay of the experimental data with the modeling predictions have been included as figure supplements for Figures 5-7. This highlights how accurate the model predictions were.

      Second is that it's not clear how the lateral coupling strengths of the model were trained/set, so it's hard to judge how important this hemi-split-belt paradigm is. The model's predictions match the data qualitatively, which is good; but does the comparison using the hemi-split-belt paradigm not offer any corrections to the model? The discussion points to modeling plasticity after SCI, which could be good, but does that mean the fit here is so good there's no point using the data to refine?

      The model has not been trained or retrained, but was used as it was described in the preceding paper. Quantitative correspondence of changes in locomotor characteristics predicted by the model and those obtained experimentally provides additional validation of the model proposed in the preceding paper and used in this paper. This was our ultimate goal. None of the plastic changes during recovery were modeled because of a lack of precise information on these changes. The absence of possible plastic changes may explain the small discrepancies between our simulations and experimental data (see figure supplements that have been added). However, the model only has a simplified description of spinal circuits without motoneurons and without real simulation of leg biomechanics. This limits our analysis or predictions of possible plastic changes within a reasonable degree of speculation. This issue is discussed in section: “Limitations and future directions” in the Discussion.

      The manuscript is well-written and interesting. The putative neural circuit mechanisms that the model uncovers are great, if they can be tested in an animal somehow.

      We agree and we are considering how we can do this in an animal model.

      Page 2, lines 75-6: Perhaps it belongs in the other paper on the model, but it's surprising that in the section on how the model has been revised to have different regimes of operation as speed increases, there is no reference to a lot of past literature on this idea. Just one example would be Koditschek and Full, 1999 JEB Figure 3, where they talk about exactly this idea, or similarly Holmes et al., 2006 SIAM review Figure 7, but obviously many more have put this forward over the years (Daley and Biewener, etc). It's neat in this model to have it tied down to a detailed neural model that can be compared with the vast cat literature, but the concept of this has been talked about for at least 25+ years. Maybe a review that discusses it should be cited?

      We have revised the Introduction to include the suggested references.

      Page 2, line 88: While it makes sense to think of the sides as supraspinal vs afferent driven, respectively, what is the added insight from having them coupled laterally in this hemisection model? What does that buy you beyond complete transection (both sides no supra) compared with intact?

      We are trying to make one model that could reproduce multiple experimental data in quadrupedal locomotion, including genetic manipulations with (silencing/removal) particular neuron types (and commissural interneurons), as pointed out in the section “Model Description” in the Results. These lateral connections are critical for reproducing and explaining other locomotor behaviors demonstrated experimentally. However, even in this study, these lateral interactions are necessary to maintain left-right coordination and equal left-right frequency (step period) during split-belt locomotion and after hemisection.

      I can see how being able to vary cycle frequencies separately of the two limbs is a good "knob" to vary when perturbing the system in order to refine the model. But there isn't a ton of context explaining how the hemi-section with split belt paradigm is important for refining the model, and therefore the science. Is it somehow importantly related to the new "regimes" of operation versus speed idea for the model?  

      We did not refine the model in this paper. We just used it for new simulations. The predictions strengthen the organization and operation of the model we recently proposed.

      Page 5, line 212: For the predictions from the model, a lot depends on how strong the lateral coupling of the model is, which, in turn, depends on the data the model was trained on. Were the model parameters (especially for lateral coupling of the limbs) trained on data in a context where limbs were pushed out of phase and neuronal connectivity was likely required to bring the limbs back into the same phase relationship? Because if the model had no need for lateral coupling, then it's not so surprising that the hemisected limbs behave like separate limbs, one with supraspinal intact and one without.

      Please see our response above concerning the need for lateral interactions incorporated into the model.

      Page 8, line 360: The discussion of the mechanisms (increased influence of afferents, etc) that the model reveals could be causing the changes is exciting, though I'm not sure if there is an animal model where it can be tested in vivo in a moving animal.

      We agree it may be difficult to test right now but we are considering experimental approaches.

      Page 9, line 395: There are some interesting conclusions that rely on the hemi-split-belt paradigm here.

      We agree with this comment. Thanks.

      Reviewer #2 (Recommendations for the authors):

      Figures: Why aren't there any figures with the simulation results overlaid on the animal data?

      We followed this suggestion. Figures showing the overlay of the experimental data with the modeling predictions have been included as figure supplements.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      A nice study trying to identify the relationship between E. coli O157 from cattle and humans in Alberta, Canada.

      Strengths:

      (1) The combined human and animal sampling is a great foundation for this kind of study.

      (2) Phylogenetic analyses seem to have been carried out in a high-quality fashion.

      Weaknesses:

      I think there may be a problem with the selection of the isolates for the primary analysis. This is what I'm thinking:

      (1) Transmission analyses are strongly influenced by the sampling frame.

      (2) While the authors have randomly selected from their isolate collections, which is fine, the collections themselves are not random.

      (3) The animal isolates are likely to represent a broad swathe of diversity, because of the structured sampling of animal reservoirs undertaken (as I understand it).

      (4) The human isolates are all from clinical cases. Clinical cases of the disease are likely to be closely related to other clinical cases, because of outbreaks (either detected, or undetected), and the high ascertainment rate for serious infections.

      (5) Therefore, taking an equivalent number of animal and clinical isolates, will underestimate the total diversity in the clinical isolates because the sampling of the clinical isolates is less "independent" (in the statistical sense) than sampling from the animal isolates.

      (6) This could lead to over-estimating of transmission from cattle to humans.

      We appreciate the reviewer’s careful thoughts about our sampling strategy. We agree with points (1) and (2), and we have provided additional details on the animal collections as requested (lines 95-101).

      We agree with point (3) in theory but not in fact. As shown in Figure 3, the cattle isolates were very closely related, despite the temporal and geographic breadth of sampling within Alberta. The median SNP distance between cattle sequences was 45 (IQR 36-56), compared to 54 (IQR 43-229) SNPs between human sequences from cases in Alberta during the same years. Additionally, as shown in Figure 2, only clade A and B isolates – clades that diverge substantially from the rest of the tree – were dominated by human cases in Alberta. We have better highlighted this evidence in the revision (lines 234-236 and 247-249).

      We agree with the reviewer in point (4) that outbreaks can be an important confounder of phylogenetic inference. This is why we down-sampled outbreaks (based on genetic relatedness, not external designation) in our extended analyses. We did not do this in the primary analysis, because there were no large clusters of identical isolates. Figure 3b shows a limited number of small clusters; however, clustered cattle isolates outnumbered clustered human isolates, suggesting that any bias would be in the opposite direction the reviewer suggests. In the revision, we down-sampled all analyses and, indeed, the proportion of human lineages descending from cattle lineages increased (lines 259-261). Regarding severe cases being oversampled among the clinical isolates, this is absolutely true and a limitation of all studies utilizing public health reporting data. We made this limitation to generalizability clearer in the discussion. However, as noted above, clinical isolates were more variable than cattle isolates, so it does not appear to have heavily biased the analysis (lines 490-495).

      We disagree with the reviewer on point (5). While the bias toward severe cases could make the human isolates less independent, the relative sampling proportions are likely to induce greater distance between clinical isolates than cattle isolates, which is exactly what we observe (see response to point (3) above). Cattle are E. coli O157:H7’s primary reservoir, and humans are incidental hosts not able to sustain infection chains long-term. Not only is the bacterium prevalent among cattle, but cattle are also highly prevalent in Alberta. Thus, even with 89 sampling points, we are still capturing a small proportion of the E. coli O157:H7 in the province. Being able to sample only a small proportion of cattle’s E. coli O157:H7 increases the likelihood of only sampling from the center of the distribution, making extreme cases such as that shown at the very bottom of the tree in Figure 4 rare and important. In comparison, sampling from human cases constitutes a higher proportion of human infections relative to cattle, and is therefore more representative of the underlying distribution, including extremes. We added this point to the limitations (lines 495-504). As with the clustering above, if anything, this outcome would have biased the study away from identifying cattle as the primary reservoir. Additionally, the relatively small proportion of cattle sampled makes our finding that 15.7% of clinical isolates were within 5 SNPs of a cattle isolate, the distance most commonly used to indicate transmission for E. coli O157:H7, all the more remarkable.

      Because of the aforementioned points, we disagree with the reviewer’s conclusion in point (6). If a bias exists, we believe transmission from cattle-to-humans is likely underestimated for the reasons given above. Not only do all prior studies indicate ruminants as the primary reservoirs of E. coli O157:H7, and humans as only incidental hosts, our specific data do not support the reviewer’s individual contentions. The results of the sensitivity analysis the reviewer recommended are consistent with the points we outlined above, estimating that 94.3% of human lineages arose from cattle lineages (vs. 88.5% in the primary analysis). We have opted to retain the more conservative estimate of the primary analysis, which includes a more representative number of clinical cases.

      (7) "We hypothesize that the large proportion of disease associated with local transmission systems is a principal cause of Alberta's high E. coli O157:H7 incidence" - this seems a bit tautological. There is a lot of O157 because there's a lot of transmission. What part of the fact it is local means that it is a principal cause of high incidence? It seems that they've observed a high rate of local transmission, but the reasons for this are not apparent, and hence the cause of Alberta's incidence is not apparent. Would a better conclusion not be that "X% of STEC in Alberta is the result of transmission of local variants"? And then, this poses a question for future epi studies of what the transmission pathway is.

      The reviewer is correct, and the suggestion for the direction of future studies was our intent with this statement. We have removed this sentence.

      Reviewer #1 (Recommendations For The Authors):

      (1) To address my concerns about the different sampling frames in humans and animals, I would suggest a sensitivity analysis, using something like the following strategy. Make a phylogeny of all the available genome sequences from humans and cattle from Alberta. Phylogenetically sub-sample the tree, using something like Treemmer (https://github.com/fmenardo/Treemmer), to remove phylogenetically redundant isolates from the same host type. Randomly select 100 human and 100 animal isolates from this non-redundant tree, and re-do your analysis.

      Although we originally down-sampled outbreaks for our analysis of the extended Alberta tree (2007-2019), we had not done this systematically for all analyses. We were not able to use the recommended Treemmer tool, because we did not see a way to incorporate the timing of sequences. Because the objective of our study was to evaluate persistence, we did not want to exclude identical sequences that were separated in time and thus could be indicating persistence. To accomplish this, we developed a utility that allowed us to incorporate the temporality of sequences. Using this utility, we systematically down-sampled all sequences that met the following conditions: 1) within 0-2 SNPs of another sequence and 2) no gaps in sequence set >2 months. The second condition means that for any set of sequences within 0-2 SNPs of one another, there can be no more than 2 months without a sequence from the set. Similar sequences that occur beyond this 2-month cutoff would be considered a separate set for down-sampling. This cutoff was chosen based on the epidemiology of E. coli O157 outbreaks, which are generally either point-source or continuous-source outbreaks. Intermittent outbreaks of a single strain are believed to arise from distinct contamination events and are exactly the type of phenomena we are seeking to identify. We have added details on down-sampling to the Methods (lines 178-180).
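
      The two down-sampling conditions can be illustrated with a minimal sketch. This is not the authors' utility: the function name `downsample`, the single-linkage grouping at ≤2 SNPs, the 61-day reading of "2 months", and the choice to keep the earliest sequence of each set are all assumptions made for illustration.

```python
from datetime import date
from typing import Callable

def downsample(isolates: dict[str, date],
               snp_dist: Callable[[str, str], int],
               max_snps: int = 2,
               max_gap_days: int = 61) -> list[str]:
    """Return the isolate ids retained after down-sampling (hypothetical helper).

    Assumptions: near-identical sequences are grouped by single linkage at
    <= max_snps; a group is split wherever consecutive collection dates are
    more than max_gap_days apart; the earliest isolate of each set is kept.
    """
    # Order isolates by collection date (id breaks ties deterministically).
    ids = sorted(isolates, key=lambda i: (isolates[i], i))

    # Union-find for single-linkage grouping on SNP distance.
    parent = {i: i for i in ids}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for idx, a in enumerate(ids):
        for b in ids[idx + 1:]:
            if snp_dist(a, b) <= max_snps:
                parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for i in ids:  # members stay in date order because ids is date-sorted
        groups.setdefault(find(i), []).append(i)

    kept = []
    for members in groups.values():
        kept.append(members[0])  # representative of the first set
        for prev, cur in zip(members, members[1:]):
            # A gap longer than the cutoff starts a separate set.
            if (isolates[cur] - isolates[prev]).days > max_gap_days:
                kept.append(cur)
    return sorted(kept)
```

      Under these assumptions, a strain that reappears after a gap longer than the cutoff contributes a second retained sequence, so potential persistence is not collapsed into a single outbreak.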

      After down-sampling, our primary analysis included 115 human and 84 cattle isolates. To conduct the recommended sensitivity analysis, we further randomly subsampled the human isolates, selecting 84 to match the number of cattle isolates. As we suggested in our initial response, and contrary to the reviewer’s concern, subsampling in this way accentuated the results, with 94.3% of human lineages inferred as arising from cattle lineages, compared to 88.5% in the primary analysis. This sensitivity analysis also identified 10 of the 11 LPLs identified in the primary analysis. The LPL not identified had 5 isolates in the primary analysis, the minimum for definition as an LPL, and was reduced to 4 isolates through subsampling. This sensitivity analysis is shown in Suppl. Figure S3.

      (2) This is the first time I've seen target diagrams used for SNP distances, I'm not sure of their value compared with histograms. They seem to emphasise the maximum distance, rather than the largest number of isolates. I.e. most isolates are closely related, but the diagram emphasises the small number of divergent ones.

      In using the target diagrams, we sought to emphasize the bimodal distribution of human-to-closest-cattle SNP differences. However, this is still mostly visible in a histogram, so we have replaced the target diagrams with a histogram as suggested (Figure 3).
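
      For readers unfamiliar with the quantity being plotted, the human-to-closest-cattle SNP distance and the share of clinical isolates within a transmission threshold could be computed along these lines. The function names and the pairwise-distance callback are hypothetical, and the sketch is not taken from the authors' pipeline.

```python
from typing import Callable, Iterable

def closest_cattle_distances(human_ids: Iterable[str],
                             cattle_ids: Iterable[str],
                             snp_dist: Callable[[str, str], int]) -> dict[str, int]:
    """Map each human isolate to its SNP distance from the nearest cattle isolate."""
    cattle = list(cattle_ids)
    return {h: min(snp_dist(h, c) for c in cattle) for h in human_ids}

def fraction_within(distances: dict[str, int], threshold: int = 5) -> float:
    """Fraction of human isolates whose nearest cattle isolate is within `threshold` SNPs."""
    return sum(d <= threshold for d in distances.values()) / len(distances)
```

      A histogram of the values returned by `closest_cattle_distances` is the kind of plot in which a bimodal distribution would appear as two separated peaks.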

      (3) L130 - fastqc doesn't trim adapters and read ends, there will be something else like trimmomatic which does.

      The reviewer is correct, and we appreciate them catching this error. Trimmomatic is incorporated into the Shovill pipeline, which was the assembler we used through the Bactopia pipeline. We have updated the Methods to indicate this (lines 142-144).

      (4) I find the flow of the article a bit confusing. You have your primary analysis, but Figure 2, which is a secondary analysis, comes before Figure 3. Which is the primary analysis? For me, primary analysis results should come first, or at least signpost a bit better.

      Figure 2 is not a secondary analysis. It is intended to provide an overview of the isolates used from the phylogenetic perspective, just as the diagram in Figure 1 provides an overview of the isolates by analysis. The secondary analyses are shown in Figures 5-7. We have added a sub-header, “Description of Isolates”, to the section referring to Figure 2, to clarify (line 232).

      (5) Locally persistent lineage definition. What is the rationale for the different criteria signifying locally persistent lineages? There is nothing in some of your criteria e.g. all isolates <30 SNPs from each other, which indicates that it is locally persistent - could have been transmitted to Japan (just to pick a place at random), causing a bunch of cases there, and then come back for all we know. Would that be a locally persistent lineage? Did you use the MCC tree here? That is a sub-sample of your full dataset, I am not sure what exactly you're trying to say with the LPLs, but maybe using a larger dataset would be better? Also, there are lots of STEC genomes available from e.g. UK and USA, by only including a fraction of these, you limit the strength of the inferences you can make about locally persistent lineages unless you know that they don't see the G sub-lineage that you observe.

      The reviewer raises multiple points here. First, regarding our definition of LPLs, it is intended to identify those lineages that pose a threat to populations in the specific geographic area (“local”) for at least 1 year (“persistent”) that are likely to be harbored in local reservoirs. Each of the criteria contributes to this definition.

      (1) A single lineage of the MCC tree with a most recent common ancestor (MRCA) with ≥95% posterior probability: This criterion provides confidence in the given isolates being part of a single, defined lineage. The posterior probability gives the probability that the topology of the tree is accurate, based on the data provided and the chosen model of evolution. In other words, we required at least 95% probability that the lineage was correct, and in practice the posterior probability of the lineages we defined as LPLs was 99.7-100% (we have added this detail to the text, lines 269-270). We also added a sensitivity analysis, shown in Suppl. Figure S4, which shows all sampled trees. We find that the essential structure of the tree around the LPLs we defined is well-supported.

      (2) All isolates ≤30 core SNPs from one another: This criterion limited LPLs to those lineages where the isolates were closely related. We did not want to limit LPLs to those that might define an outbreak, for example using a 5-10 SNP threshold, because the point of the study is to identify lineages that persistently cause disease over longer periods than a normal outbreak. Pathogens evolve over time in their reservoirs, leading to greater SNP distances, and we wanted to allow for this. The U.S. CDC has acknowledged a similar concern for such persistent lineages in its definition of REP strains, which it has defined based on ranges of 13-104 allele differences by cgMLST. Thus, our choice of 30 core SNPs as the threshold is in line with current practice in the emerging science on persistence of enteric pathogens. We have also added a sensitivity analysis examining alternate SNP thresholds, shown in Suppl. Figure S5, which results in clusters of LPLs identified in the primary analysis being grouped into larger lineages. Additionally, in the tree showing our primary analysis (Figure 4), we now note the minimum number of SNPs all isolates within the lineage differ by.

      (3) Contained at least 1 cattle isolate: This criterion increases confidence that the lineage is indeed “local”. Unlike humans, cattle are not known to be routinely infected by imported food products, and they do not make roundtrip journeys to other locations, as humans infected during travel do. Cattle themselves may be imported into Alberta while infected, and cattle in Alberta can be infected by other imported animals. In these cases, if the STEC strains the cattle harbor persist for ≥1 year, they become the type of lineages we are interested in as LPLs, regardless where they previously came from, because they are now potential persistent sources of infection in Alberta. By including at least one cattle isolate in each LPL, the only way an identified LPL is not actually local is if cattle are imported from the lineage’s reservoir community elsewhere (e.g., in Japan, as the reviewer suggested), the lineage is persisting in that non-Alberta reservoir, and newly infected cattle are imported repeatedly over 1 or more years. This could feasibly explain G(vi)-AB LPL 5 (Figure 4), which is entirely composed of cattle. Indeed, such an explanation would be consistent with the lack of new cases from this LPL after 2015 in the extended analysis (Figure 5). However, for all other LPLs, which contain both cattle and human isolates, for the LPL to not be local, both cattle and human cases would have to be imported from the same non-Alberta reservoir. While this is possible, the probability of such a scenario is low, and it decreases the more isolates are in an LPL. For the average LPL, this means 4 human and 6 cattle cases would need to be imported from a non-Alberta reservoir over several years. 
      Given that our study is only a random sample of the total STEC cases and cattle in Alberta from 2007-2015, these numbers are underestimates of the true absolute number of cases and cattle associated with LPLs that would have to be explained by importation if the LPL were not local. We have added some explanation of the possibility of importation in the Discussion where we discuss the LPL criteria (lines 376-380).

      (4) Contained ≥5 isolates: In concert with criterion 3, this criterion guards against anomalies being counted as LPLs. By requiring at least 5 isolates in an LPL after down-sampling, at least 5 infection events must have occurred from the LPL, reducing the likelihood of importation explaining the LPL and emphasizing more significant LPLs.

      (5) The isolates were collected at sampling events (for cattle) or reported (for humans) over a period of at least 1 year: This criterion defines the persistence aspect of the LPL. In the primary analysis, the LPLs we identified persisted for an average of 8 years, with the shortest persisting for 5 years (these details have been added to the text, lines 268-269). Incorporating the extended analysis, several LPLs persisted for the full 13 years of the study.
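
      Taken together, the five criteria amount to a simple predicate over a candidate lineage. The sketch below is illustrative only: the `Isolate` record, the `is_lpl` name, the pairwise-distance callback, and the reading of "at least 1 year" as 365 days are assumptions, and the posterior probability of the lineage's MRCA is assumed to be supplied from the MCC tree.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations
from typing import Callable

@dataclass
class Isolate:
    name: str
    host: str        # "cattle" or "human"
    collected: date  # sampling date (cattle) or report date (human)

def is_lpl(isolates: list[Isolate],
           posterior: float,
           snp_dist: Callable[[str, str], int],
           min_posterior: float = 0.95,
           max_snps: int = 30,
           min_isolates: int = 5,
           min_span_days: int = 365) -> bool:
    """Check a candidate lineage against the five LPL criteria (hypothetical helper)."""
    # (1) MRCA supported with at least 95% posterior probability.
    if posterior < min_posterior:
        return False
    # (4) At least 5 isolates after down-sampling.
    if len(isolates) < min_isolates:
        return False
    # (3) At least one cattle isolate.
    if not any(i.host == "cattle" for i in isolates):
        return False
    # (2) Every pair of isolates within 30 core SNPs.
    if any(snp_dist(a.name, b.name) > max_snps
           for a, b in combinations(isolates, 2)):
        return False
    # (5) Collection or report dates span at least one year.
    dates = [i.collected for i in isolates]
    return (max(dates) - min(dates)).days >= min_span_days
```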

      Regarding using additional non-Alberta isolates to help rule out importation, we have expanded the number of U.S. and global isolates included in the importation analysis, over-sampling clade G isolates from the U.S. (Figure 7). As cattle trade is substantially more common with the U.S. than other countries, we felt it most important to focus on the U.S. as a potential source of both imported cattle and human cases. Our results from this analysis show that only 9 of 494 (1.8%) U.S. isolates occurred in the LPLs we defined in the primary analysis, and all occurred after Alberta isolates (lines 313-317). Although we also added more global isolates, we still found that none were associated with the Alberta LPLs.

      (6) Given the importance of sampling for a study like this, some more information on animal sampling studies should be included here.

      We have added details on the cattle sampling to the Methods (lines 95-101).

      (7) L172 - do you mean an MRCA with ≥95% probability of location in Alberta?

      Location in Alberta was not determined from the primary analysis, which defined the LPLs, as only Alberta isolates were included in that analysis. As described above, this criterion meant that we required at least 95% probability that the tree topology at the lineage’s MRCA was correct, and in practice the posterior probability of the lineages we defined as LPLs was 99.7-100%.

      (8) Need a supplementary figure of just clade G from Figure 2.

      We have added a sub-tree diagram of clade G(vi) as Figure 2b.

      Reviewer #2 (Public Review):

      This study identified multiple locally evolving lineages transmitted between cattle and humans persistently associated with E. coli O157:H7 illnesses for up to 13 years. Furthermore, this study mentions a dramatic shift in the local persistent lineages toward strains with the more virulent stx2a-only profile. The authors hypothesized that the large proportion of disease associated with local transmission systems is a principal cause of Alberta's high E. coli O157:H7 incidence. These opinions more effectively explain the role of the cattle reservoir in the dynamics of E. coli O157:H7 human infections.

      (1) The authors acknowledge the possibility of intermediate hosts or environmental reservoirs playing a role in transmission. Further discussion on the potential roles of other animal species commonly found in Alberta (e.g., sheep, goats, swine) could enhance the understanding of the transmission dynamics. Were isolates from these species available for analysis? If not, the authors should clearly state this limitation.

      We have expanded the discussion of other species in Alberta, as suggested, including other livestock, wildlife, and the potential role of birds and flies (lines 353-360). Unfortunately, we did not have sequences available from other species, which we have added to the limitations (lines 487-490).

      (2) The focus on E. coli O157:H7 is understandable given its prominence in Alberta and the availability of historical data. However, a brief discussion on the potential applicability of the findings to non-O157 STEC serogroups, and the limitations therein, would be beneficial. Are there reasons to believe the transmission dynamics would be similar or different for other serogroups?

      We appreciate this comment and have expanded our discussion of relevance to non-O157 STEC (lines 452-460). Other authors have proposed that transmission dynamics differ, and studies of STEC risk factors, including our own, support this. However, there has been very little direct study of non-O157 transmission dynamics and there is even less cross-species genomic and metadata available for non-O157 isolates of concern.

      (3) The authors briefly mention the need for elucidating local transmission systems to inform management strategies. A more detailed discussion on specific public health interventions that could be targeted at the identified LPLs and their potential reservoirs would strengthen the paper's impact.

      We agree with the reviewer that this would be a good addition to the manuscript. The public health implications for control are several and extend to non-STEC reportable zoonotic enteric infections, such as Campylobacter and Salmonella. We have added a discussion of these (lines 460-465, 467-485).

      (4) Understanding the relationship between specific risk factors and E. coli O157:H7 infections is essential for developing effective prevention strategies. Have case-control or cohort studies been conducted to assess the correlation between identified risk factors and the incidence of E. coli O157:H7 infections? What methodologies were employed to control for potential confounders in these studies?

      Yes, there have been several case-control studies of reported cases. Many of these are referenced in the discussion in terms of the contribution of different sources to infection. As risk factors were not the focus of the current study, we believe a thorough discussion of the literature on the aspects of these various studies is beyond our scope. However, we have added some details on the risk factors themselves (lines 72-79).

      (5) The study's findings are noteworthy, particularly in the context of E. coli O157:H7 epidemiology. However, the extent to which these results can be replicated across different temporal and geographical settings remains an open question. It would be constructive for the authors to provide additional data that demonstrate the replication of their sampling and sequencing experiments under varied conditions. This would address concerns regarding the specificity of the observed patterns to the initial study's parameters.

      We appreciate the reviewer’s comment, as we are currently building on this analysis with an American dataset with different types of data available than were used in this study. Aligned with this work, we have added a comment on the adaptation of our method to other settings with different types of data (lines 448-450). We also added a sensitivity analysis to the manuscript simulating a different sampling approach (Suppl. Fig. S3), which should be informative to this question.

      Reviewer #2 (Recommendations For The Authors):

      Minor comments.

      (1) Figure 1: The figure is a critical visual representation of the study's findings and should be given prominent emphasis. It is essential that the key discoveries of the research are clearly depicted and explained in this visual format. The authors should ensure that Figure 1 is detailed and informative enough to stand out as a central piece of the study.

      Figure 1 is the diagram of sample numbers, locations, and corresponding analyses. We assume that the reviewer means to refer to Figure 2. Although the inclusion of >1,200 isolates makes the tree difficult to see in detail, we have made some modifications to make the findings clearer. First, we changed the clade coloration such that the only subclade differentiated is G(vi). We have removed the stx metadata ring to focus attention on the location and species of the isolates, as stx data are described in Table 1. Finally, we have added a sub-tree diagram of clade G(vi), colored by location. This makes clear the large sections of the subclade dominated by isolates from one location or another, and the limited areas where they overlap.

      (2) Figures 2 and 4: While these figures contribute to the presentation of the data, they appear to be somewhat rudimentary in their current form. The lack of detailed annotations regarding the clustering of different strains is a notable omission. I recommend that the authors refine these figures to include comprehensive labeling that clearly delineates the various bacterial clusters. Enhanced graphical representation with clear annotations will aid readers in better understanding the study's findings.

      We appreciate this suggestion. We have remade all trees generated by the BEAST 2 analyses in R, rather than FigTree. This has allowed us to annotate the trees with additional information on the LPLs and we believe provides a clearer picture of each LPL.

      (3) Supplemental Table S1: The supplemental tables are an excellent opportunity to showcase additional data and findings that support the study's conclusions. For Supplemental Table S1, it is recommended that the authors highlight the innovative aspects or novel discoveries presented in this table.

      Suppl. Table S1 shows the modeling specifications and priors used in the analyses. These decisions were not in and of themselves novel. The innovation in our methods is due to the development of the LPLs based on the trees resulting from the analyses detailed in Suppl. Table S1, as well as from the application of these models to E. coli O157:H7 for the first time. However, we understand the reviewer's point and have emphasized the importance of the results shown in Suppl. Table S2 (lines 391-395).

      (4) Line 35: "We assessed the role of persistent cross-species transmission systems in Alberta's E. coli O157:H7 epidemiology." change to "We assessed the impact of persistent cross-species transmission systems on the epidemiology of E. coli O157:H7 in Alberta."

      We have made this change.

      (5) To facilitate a deeper understanding of the core findings of the manuscript and to enable the development of effective response strategies, I suggest that the authors provide more information regarding the sequencing data used in the study. This information should at least include aspects such as data accessibility and quality control measures.

      We have included a Supplemental Data File that lists all isolates used in the analysis, and the QC measures are detailed in the Methods.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Can a plastic RNN serve as a basis function for learning to estimate value? In previous work this was shown to be the case, with a similar architecture to that proposed here. The learning rule in previous work was back-prop with an objective function that was the TD error function (delta) squared. Such a learning rule is non-local, as the changes in weights within the RNN, and from inputs to the RNN, depend on the weights from the RNN to the output, which estimates value. In addition, these weights themselves change over learning. The main idea in this paper is to examine if replacing the values of these non-local changing weights, used for credit assignment, with random fixed weights can still produce similar results to those obtained with complete bp. This random feedback approach is motivated by a similar approach used for deep feed-forward neural networks.

      This work shows that this random feedback in credit assignment performs well, but not as well as the precise gradient-based approach. When more constraints due to biological plausibility are imposed, performance degrades. These results are not surprising given previous results on random feedback. This work is incomplete because the delay times used were only a few time steps, and it is not clear how well random feedback would operate with longer delays. Additionally, the examples simulated with a single cue and a single reward are overly simplistic and the field should move beyond these exceptionally simple examples.
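
      The contrast drawn above between back-prop and random feedback can be sketched in a few lines. This is a minimal one-step illustration, not the authors' implementation: it assumes standard sigmoid units, a linear value readout, and a one-step credit assignment, and the names `td_error`, `recurrent_update`, `gamma`, and `lr` are hypothetical. The only difference between the two rules is which vector gates the recurrent update: the plastic readout weights `w` (back-prop) or a fixed random vector `c` (random feedback).

```python
import numpy as np


def td_error(reward, value_next, value_now, gamma=0.8):
    """Temporal-difference reward prediction error (delta).

    gamma is an assumed discount factor, chosen only for illustration.
    """
    return reward + gamma * value_next - value_now


def recurrent_update(delta, credit, x_now, x_prev, lr=0.1):
    """One-step update of recurrent weights A[i, j] (unit j -> unit i).

    credit is the per-unit credit-assignment vector: the readout weights w
    for back-prop, or a fixed random vector c for random feedback.
    x_now * (1 - x_now) is the derivative of a standard sigmoid.
    """
    post_factor = credit * x_now * (1.0 - x_now)
    return lr * delta * np.outer(post_factor, x_prev)


rng = np.random.default_rng(0)
n_units = 5
x_prev = rng.uniform(0.0, 1.0, n_units)  # activity at the previous step
x_now = rng.uniform(0.0, 1.0, n_units)   # activity at the current step
w = rng.normal(0.0, 1.0, n_units)        # readout (value) weights, plastic
c = rng.normal(0.0, 1.0, n_units)        # random feedback weights, fixed

delta = td_error(reward=1.0, value_next=0.0, value_now=float(w @ x_now))
dA_backprop = recurrent_update(delta, w, x_now, x_prev)  # non-local: needs w
dA_random = recurrent_update(delta, c, x_now, x_prev)    # local: uses fixed c
dw = 0.1 * delta * x_now                 # readout update is local in both cases
```

      In this sketch the two recurrent updates differ only row by row, through the ratio of `c` to `w`, which is why alignment between the two vectors matters for how closely random feedback tracks the gradient.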

      Strengths:

      • The authors show that random feedback can approximate well a model trained with detailed credit assignment.

      • The authors simulate several experiments including some with probabilistic reward schedules and show results similar to those obtained with detailed credit assignments as well as in experiments.

      • The paper examines the impact of more biologically realistic learning rules and the results are still quite similar to the detailed back-prop model.

      Weaknesses:

      • The authors also show that an untrained RNN does not perform as well as the trained RNN. However, they never explain what they mean by an untrained RNN. It should be clearly explained. These results are actually surprising. An untrained RNN with enough units and sufficiently large variance of recurrent weights can have a high dimensionality and generate a complete or nearly complete basis, though not orthonormal (e.g., Rajan & Abbott 2006). It should be possible to use such a basis to learn this simple classical conditioning paradigm. It would be useful to measure the dimensionality of network dynamics, in both trained and untrained RNNs.

      Thank you for pointing out the lack of explanation about the untrained RNN. The untrained RNN in our simulations (except Fig. 6D/6E-gray-dotted) was a randomly initialized RNN (i.e., connection weights were drawn from a pseudo normal distribution) that was used as the initial RNN for training of value-RNNs. As you suggested, the performance of the untrained RNN indeed improved as the number of units increased (Fig. 2J), and its highest part was almost comparable to the highest performance of trained value-RNNs (Fig. 2I). In the revision we will show the dimensionality of network dynamics (as you have suggested), and the eigenvalue spectrum of the network.
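
      One common way to quantify the dimensionality of network dynamics is the participation ratio of the activity covariance, (sum of eigenvalues)^2 / (sum of squared eigenvalues), which equals tr(C)^2 / tr(C^2). The sketch below is only an illustration under stated assumptions: the response does not say which measure will be used, and the tanh units, noise drive, `gain` parameter, and function names are hypothetical choices rather than the paper's model.

```python
import numpy as np


def participation_ratio(states):
    """Participation ratio of activity with shape (time, units).

    Computed as tr(C)^2 / tr(C^2), which equals
    (sum of eigenvalues)^2 / (sum of squared eigenvalues) of the covariance C.
    """
    centered = states - states.mean(axis=0)
    cov = centered.T @ centered / (len(states) - 1)
    return np.trace(cov) ** 2 / np.trace(cov @ cov)


def simulate_random_rnn(n_units, gain, n_steps=500, seed=0):
    """Run an untrained RNN with Gaussian recurrent weights (assumed model)."""
    rng = np.random.default_rng(seed)
    weights = rng.normal(0.0, gain / np.sqrt(n_units), size=(n_units, n_units))
    x = rng.normal(0.0, 1.0, size=n_units)
    states = np.empty((n_steps, n_units))
    for t in range(n_steps):
        # Small noise input keeps the dynamics from settling to a fixed point.
        x = np.tanh(weights @ x + rng.normal(0.0, 0.1, size=n_units))
        states[t] = x
    return states


pr_untrained = participation_ratio(simulate_random_rnn(n_units=20, gain=1.5))
```

      The value lies between 1 (activity confined to a single direction) and the number of units (variance spread evenly over all directions), so the same function could be applied to trained and untrained networks for comparison.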

      • The impact of the article is limited by using a network with discrete time-steps, and only a small number of time steps from stimulus to reward. What is the length of each time step? If it's on the order of the membrane time constant, then a few time steps are only tens of ms. In the classical conditioning experiments typical delays are on the order of hundreds of milliseconds to seconds. Authors should test if random feedback weights work as well for larger time spans. This can be done by simply using a much larger number of time steps.

      Thank you for pointing out this important issue, for which our explanation was lacking and our examination was insufficient. We do not consider that a single time step in our models corresponds to the neuronal membrane time constant. Rather, for the following reasons, we assume that the time step corresponds to several hundreds of milliseconds:

      - We assume that a single RNN unit corresponds to a small population of neurons that intrinsically (for genetic/developmental reasons) share inputs/outputs and are mutually connected via excitatory collaterals.

      - Cortical activity is suggested to be sustained not only by fast synaptic transmission and spiking but also, even predominantly, by slower synaptic neurochemical dynamics (Mongillo et al., 2008, Science "Synaptic Theory of Working Memory" https://www.science.org/doi/10.1126/science.1150769).

      - In line with this theoretical suggestion, previous research examining excitatory interactions between pyramidal cells, to which one of us (the corresponding author Morita) contributed by conducting model fitting (Morishima, Morita, Kubota, Kawaguchi, 2011, J Neurosci, https://www.jneurosci.org/content/31/28/10380), showed that the mean recovery time constant from facilitation for recurrent excitation among one of the two types of cortico-striatal pyramidal cells was around 500 milliseconds.

      If a single time step corresponds to 500 milliseconds, three time steps from cue to reward in our simulations correspond to 1.5 sec, which matches the delay in the conditioning task used in Schultz et al. 1997 Science. Nevertheless, as you pointed out, it is necessary to examine whether our random feedback models can work for longer delays, and we will examine it in our revision.

      • In the section with more biologically constrained learning rules, while the output weights are restricted to only be positive (as well as the random feedback weights), the recurrent weights and weights from input to RNN are still bi-polar and can change signs during learning. Why is the constraint imposed only on the output weights? It seems reasonable that the whole setup will fail if the recurrent weights were only positive as in such a case most neurons will have very similar dynamics, and the network dimensionality would be very low. However, it is possible that only negative weights might work. It is unclear to me how to justify that bipolar weights that change sign are appropriate for the recurrent connections and inappropriate for the output connections. On the other hand, an RNN with excitatory and inhibitory neurons in which weight signs do not change could possibly work.

      Our explanation and examination of this issue were insufficient; thank you for pointing it out and giving us a helpful suggestion. In the Discussion (Line 507-510) of the original manuscript, we wrote "Regarding the connectivity, in our models, recurrent/feed-forward connections could take both positive and negative values. This could be justified because there are both excitatory and inhibitory connections in the cortex and the net connection sign between two units can be positive or negative depending on whether excitation or inhibition exceeds the other." However, we admit that the meaning of this description was not clear, and more explicit modeling will be necessary as you suggested.

      Therefore, in our revision, we will examine models in which inhibitory units (modeling fast-spiking (FS) GABAergic cells) are incorporated and neurons follow Dale's law.
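
      A Dale's-law constraint of the kind proposed here can be sketched as follows. This is a hypothetical illustration, not the planned model: it assumes that each presynaptic unit is either excitatory or inhibitory, that `weights[i, j]` is the connection from unit j to unit i, and that any update that would flip a weight's sign is clipped to zero. The function names and the clipping rule are assumptions.

```python
import numpy as np


def make_dale_weights(n_exc, n_inh, seed=0):
    """Random recurrent weights whose sign is fixed per presynaptic unit."""
    rng = np.random.default_rng(seed)
    n = n_exc + n_inh
    # +1 for excitatory columns, -1 for inhibitory columns.
    signs = np.concatenate([np.ones(n_exc), -np.ones(n_inh)])
    magnitudes = np.abs(rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, n)))
    return magnitudes * signs[np.newaxis, :], signs


def apply_update(weights, delta_w, signs):
    """Apply a plasticity update, then zero any weight whose sign would flip."""
    updated = weights + delta_w
    violates = updated * signs[np.newaxis, :] < 0
    updated[violates] = 0.0
    return updated


weights, signs = make_dale_weights(n_exc=4, n_inh=2)
step = np.random.default_rng(1).normal(0.0, 1.0, size=weights.shape)
weights_after = apply_update(weights, step, signs)
```

      Clipping at zero is one simple design choice; rectifying the magnitudes or parameterizing weights through a non-negative function would enforce the same constraint.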

      • Like most papers in the field this work assumes a world composed of a single cue. In the real world there are many more cues than rewards, some cues are not associated with any rewards, and some are associated with other rewards or even punishments. In the simplest case, it would be useful to show that this network could actually work if there are additional distractor cues that appear at random either before the CS, or between the CS and US. There are good reasons to believe such distractor cues will be fatal for an untrained RNN, but might work with a trained RNN, either using BPTT or random feedback. Although this assumption is a common flaw in most work in the field, we should no longer ignore these slightly more realistic scenarios.

      Thank you very much for this insightful comment. In our revision, we will examine situations where there exist not only a reward-associated cue but also randomly appearing distractor cues.

      Reviewer #2 (Public review):

      Summary:

      Tsurumi et al. show that recurrent neural networks can learn state and value representations in simple reinforcement learning tasks when trained with random feedback weights. The traditional method of learning for recurrent networks in such tasks (backpropagation through time) requires feedback weights which are a transposed copy of the feed-forward weights, a biologically implausible assumption. This manuscript builds on previous work regarding "random feedback alignment" and "value-RNNs", and extends them to a reinforcement learning context. The authors also demonstrate that certain non-negative constraints can enforce a "loose alignment" of feedback weights. The authors' results suggest that random feedback may be a powerful tool of learning in biological networks, even in reinforcement learning tasks.

      Strengths:

      The authors describe well the issues regarding biologically plausible learning in recurrent networks and in reinforcement learning tasks. They take care to propose networks which might be implemented in biological systems and compare their proposed learning rules to those already existing in literature. Further, they use small networks on relatively simple tasks, which allows for easier intuition into the learning dynamics.

      Weaknesses:

      The principles discovered by the authors in these smaller networks are not applied to deeper networks or more complicated tasks, so it remains unclear to what degree these methods can scale up, or can be used more generally.

      In our revision, we will examine more biologically realistic models with excitatory and inhibitory units, as well as more complicated tasks with distractor cues. We will also consider whether/how the depth of networks can be increased, though we do not currently have a concrete idea on this last point. Thank you also for giving us the detailed and insightful 'recommendations for authors'. We will also address them in our revision.

      Reviewer #3 (Public review):

      Summary:

      The paper studies learning rules in a simple sigmoidal recurrent neural network setting. The recurrent network has a single layer of 10 to 40 units. It is first confirmed that feedback alignment (FA) can learn a value function in this setting. Then so-called bio-plausible constraints are added: (1) when value weights (readout) are non-negative, (2) when the activity is non-negative (normal sigmoid rather than downscaled between -0.5 and 0.5), (3) when the feedback weights are non-negative, (4) when the learning rule is revised to be monotonic: the weights are not downregulated. In the simple task considered, none of the four biological features appears to totally impair learning.

      Strengths:

      (1) The learning rules are implemented in a low-level fashion of the form: (pre-synaptic-activity) x (post-synaptic-activity) x feedback x RPE, which is therefore interpretable in terms of quantities measurable in the wet lab.

      (2) I find that non-negative FA (FA with non-negative c and w) is the most valuable theoretical insight of this paper: I understand why the alignment between w and c is automatically better at initialization.

      (3) The task choice is relevant since it connects with experimental settings of reward conditioning with possible plasticity measurements.

      Weaknesses:

      (4) The task is rather easy, so it's not clear that it really captures the computational gap that exists between FA (gradient-like learning) and a simpler learning rule like a delta rule: RPE x (pre-synaptic) x (post-synaptic). To check that the task is not too trivial, I suggest adding a control where the vector c is constant c_i=1.

      Thank you for this insightful comment. We have realized that this is actually an issue that needs to be considered from several angles. A previous study by one of us (Wärnberg & Kumar, 2023 PNAS) assumed that DA represents a vector error rather than a scalar RPE, and thus homogeneous DA was considered a negative control because it cannot represent a vector error other than in the direction of (1, 1, ..., 1). In contrast, the present work assumed that DA represents a scalar RPE, so homogeneous DA (i.e., constant feedback) cannot be called a failure mode, because it can actually represent a scalar RPE, and FA to the direction of (1, 1, ..., 1) should in fact occur. This FA to (1, 1, ..., 1) may actually be interesting because it means that if the heterogeneity of DA inputs is not large and the feedback is not far from (1, 1, ..., 1), states are learned to be represented in such a way that simple summation of cortical neuronal activity approximates value, thereby potentially explaining why value is often correlated with regional activation (fMRI BOLD signal) of not only striatal but also cortical regions (which I have been considering an unresolved mystery). On the other hand, the case with constant feedback is the same as the simple delta rule, as you pointed out, and what the present analyses would then show is that FA is actually occurring behind the successful operation of such a simple rule. In any case, we will examine and consider this issue further.

      (5) Related to point 3), the main strength of this paper is to draw potential connections with experimental data. It would be good to highlight more concretely the predictions of the theory for experimental findings. (Ideally, what should be observed with non-negative FA that is not expected with FA or a delta rule (constant global feedback)?).

      In response to this insightful comment, we considered concrete predictions of our models. In the FA model, the feedback vector c and the value-weight vector w are initially in a random (on average orthogonal) relationship and gradually become aligned, whereas in the non-negative model, the vectors c and w are loosely aligned from the beginning. We considered how the vectors c and w can be experimentally measured. Each element of the feedback vector c is multiplied by the TD-RPE, modulating the degree of update in each pyramidal cell (more accurately, the pyramidal cell population that corresponds to a single RNN unit). Thus each element of c could be measured as the magnitude of the response of each pyramidal cell to DA stimulation. The element of the value-weight vector w corresponding to a given pyramidal cell could be measured, if the striatal neuron that receives input from that pyramidal cell can be identified (although this is technically demanding), as the magnitude of the response of the striatal neuron to activation of the pyramidal cell.
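
      The difference in initial alignment described above can be checked numerically. The sketch below assumes, purely for illustration, that the elements of c and w are drawn independently from a uniform distribution on [-1, 1] (signed case) or [0, 1] (non-negative case); the paper's actual initialization may differ, and `mean_cosine` is a hypothetical helper.

```python
import numpy as np


def mean_cosine(n_units, nonnegative, n_trials=2000, seed=0):
    """Average cosine of the angle between two random vectors c and w."""
    rng = np.random.default_rng(seed)
    low = 0.0 if nonnegative else -1.0
    c = rng.uniform(low, 1.0, size=(n_trials, n_units))
    w = rng.uniform(low, 1.0, size=(n_trials, n_units))
    norms = np.linalg.norm(c, axis=1) * np.linalg.norm(w, axis=1)
    return float(((c * w).sum(axis=1) / norms).mean())


signed_alignment = mean_cosine(n_units=20, nonnegative=False)
nonneg_alignment = mean_cosine(n_units=20, nonnegative=True)
```

      Under these assumptions the signed vectors are orthogonal on average (mean cosine near 0), whereas the non-negative vectors start out with a clearly positive cosine, approaching (1/4)/(1/3) = 0.75 for large numbers of units, which is the "loose alignment from the beginning" referred to in the response.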

      Then, the abovementioned predictions can be tested by (i) identifying cortical, striatal, and VTA regions that are connected by the meso-cortico-limbic pathway and the cortico-striatal-VTA pathway, (ii) identifying pairs of cortical pyramidal cells and striatal neurons that are connected, (iii) measuring the responses of identified pyramidal cells to DA stimulation, as well as the responses of identified striatal neurons to activation of the connected pyramidal cells, and (iv) testing whether the DA->pyramidal responses and the pyramidal->striatal responses are associated across pyramidal cells, and whether such associations develop through learning. We will elaborate on this tentative idea, and also other ideas, in our revision.

      (6a) Random feedback with RNN in RL has been studied in the past, so it is maybe worth giving some insight into how the results and the analyses compare to this previous line of work (for instance in this paper [https://www.nature.com/articles/s41467-020-17236-y]). For instance, I am not very surprised that FA also works for value prediction with TD error. It is also expected from the literature that the RL + RNN + FA setting would scale to tasks that are more complex than the conditioning problem proposed here, so is there a more specific take-home message about non-negative FA? or benefits from this simpler toy task?

      In reply to this suggestion, we will explore how our results compare to the previous studies including the paper [https://www.nature.com/articles/s41467-020-17236-y], and explore benefits of our models. At present, we think of one possible direction. According to our results (Fig. 6E), under the non-negativity constraint, the model with random feedback and monotonic plasticity rule (bioVRNNrf) performed better, on average, than the model with backprop and non-monotonic plasticity rule (revVRNNbp) when the number of units was large, though the difference in the performance was not drastic. We will explore reasons for this, and examine if this also applies to cases with more realistic models, e.g., having separate excitatory and inhibitory units (as suggested by another reviewer).

      (6b) Related to task complexity, it is not clear to me if non-negative value and feedback weights would generally scale to harder tasks. If the task is so simple that a global RPE signal is sufficient to learn (see 4 and 5), then it could be good to extend the task to find a substantial gap between: global RPE, non-negative FA, FA, BP. For a well chosen task, I expect to see a performance gap between any pair of these four learning rules. In the context of the present paper, this would be particularly interesting to study the failure mode of non-negative FA and the cases where it does perform as well as FA.

      In reply to this comment and also another reviewer's comment, we will examine the performance of the different models in more complex tasks, e.g., having distractor cues or longer delays. We will also see whether or not the better performance of bioVRNNrf than revVRNNbp mentioned in the previous point applies to the different tasks.

      (7) I find that the writing could be improved, it mostly feels more technical and difficult than it should. Here are some recommendations:

      (7a) for instance the technical description of the task (CSC) is not fully described and requires background knowledge from other papers, which is not desirable.

      (7b) Also the rationale for the added difficulty with the stochastic reward and new state is not well explained.

      (7c) In the technical description of the results I find that the text dives into descriptive comments of the figures but high-level take home messages would be helpful to guide the reader. I got a bit lost, although I feel that there is probably a lot of depth in these paragraphs.

      Thank you for your helpful suggestions. We will thoroughly revise our writing.

      (8) Related to the writing issue and 5), I wished that "bio-plausibility" was not the only reason to study positive feedback and value weights. Is it possible to develop a bit more specifically what and why this positivity is interesting? Is there an expected finding with non-negative FA both in the model capability? or maybe there is a simpler and crisp take-home message to communicate the experimental predictions to the community would be useful?

      We will consider whether/how the non-negative constraints could have any benefits other than biological plausibility, in particular in theoretical aspects or in applications using neuromorphic hardware, while we will also elaborate on the links to biology and make the model's predictions more concrete.

    1. In her acclaimed recent book, Dear Science and Other Stories, Black studies scholar Katherine McKittrick takes on the project not of history but of science, explaining how an account that centers Black people, Black life, and Blackness more broadly can reveal the "asymmetrically connected knowledge systems" that structure modern scientific inquiry.

      I think you could maybe make McKittrick's book an even bigger part of DxD, maybe even moving it into the introduction, too? Your team is mounting a strong alternative to the simple and instantaneous impression and its associated epistemology, in favor of an epistemology that is still related to empirical experience and quantification but that does not aim to make experience rapidly extractable. I think this is a big point of resonance between your work and Dear Science—to reconceive science as wondering about things we don’t know through creative representations of experience that is located at the intersection of our bodies, our relations to colonial knowledge systems, and grounds that may be outside those systems.

    1. Author response:

      Reviewer #1 (Evidence, reproducibility and clarity):

      This is an interesting manuscript where the authors systematically measure rG4 levels in brain samples at different ages of patients affected by AD. To the best of my knowledge this is the first time that BG4 staining is used in this context and the authors provide compelling evidence to show an association with BG4 staining and age or AD progression, which interestingly indicates that such RNA structure might play a role in regulating protein homeostasis as previously speculated. The methods used and the results reported seem robust and reproducible.

      In terms of the conclusions, however, I think that there are 2 main things that need addressing prior to publication:

      (1) Usually in BG4 staining experiments to ensure that the signal detected is genuinely due to rG4 an RNase treatment experiment is performed. This does not have to be extended to all the samples presented but having a couple of controls where the authors observe loss of staining upon RNase treatment will be key to ensure with confidence that rG4s are detected under the experimental conditions. This is particularly relevant for these brain tissue samples where BG4 staining has never been performed before.

      With what is now known about RNA rG4s and the recent reconciliation of the controversy on rG4 formation (Kharel, Nature Communications 2023), this experiment is no longer strictly required for demonstration of rG4 formation. Despite this change, we did attempt this experiment at the reviewer’s suggestion, but the controls were not successful, suggesting it may not be feasible with our fixing and staining conditions. That said, we agree that despite the G4 staining appearing primarily outside the nucleus, it would be helpful to have some direct indication of whether we were observing primarily RNA or DNA G4s, and so we performed an alternate experiment to determine this.

      In our previous submission, we had performed ribosomal RNA staining (Figure S7), and the staining patterns were similar to that of BG4, especially the punctate pattern near the nuclei. Therefore, we directly asked whether the BG4 was largely binding to rRNA and have now shown the resulting co-stain in Figure 3b. These results show that at least a large amount of the BG4 staining does arise from rG4s in ribosomes. At high magnification, we observe that the BG4 stains a subset of the ribosomes, consistent with previous observations of high rG4 levels in ribosomes both in vitro and in cells (Mestre-Fos, 2019 J Mol Biol, Mestre-Fos 2019 PLoS One, Mestre-Fos 2020 J Biol Chem), but this had never been demonstrated in tissue. This experiment has therefore both answered the primary question of whether we are primarily observing rG4s, as well as provided more detailed information on the cellular sublocalization of rG4 formation, and provided the first evidence of rG4 formation on ribosomes in tissue.

      (2) The authors have an association between rG4-formation and age/disease progression. They also observe distribution dependency of this, which is great. However, this is still an association which does not allow the model to be supported. This is not something that can be fixed with an easy experiment and it is what it is, but my point is that the narrative of the manuscript should be more fair and reflect the fact that, although interesting, what the authors are observing is a simple correlation. They should still go ahead and propose a model for it, but they should be more balanced in the conclusion and not imply that this evidence is sufficient to demonstrate the proposed model. It is absolutely fine to refer to the literature and comment on the fact that similar observations have been reported and this is in line with those, but still this is not an ultimate demonstration.

      We agree that these are correlative studies (of necessity when studying human tissue), but recent experiments have shown that rG4s affect the aggregation of Tau in vitro – and we have now better clarified this in the text itself. We have now also been more careful in drawing causative conclusions as shown in the revised text.

      Minor point:

      (3) rG4s themselves have been shown to generate aggregates in ALS models in the absence of any protein (Ragueso et al. Nat Commun 2023). I think this is also important in the light of my comment on the model, could well be that these rG4s are causing aggregates themselves that act as nucleation point for the proteins as reported in the paper I mentioned. Providing a broader and more unbiased view of the current literature on the topic would be fair, rather than focusing on reports more in line with the model proposed.

      We agree and have modified the discussion and added a broader context, including the Ragueso report described above.

      Reviewer #1 (Significance):

      This is a significant novel study, as per my comments above. I believe that such a study will be of impact in the G4 and neurodegenerative fields. Provided that the authors can address the criticisms above, I strongly believe that this manuscript would be of value to the scientific community. The main strength is the novelty of the study (never done before); the main weakness is the lack of the RNase control at the moment and the slight overinterpretation of the findings (see comments above).

      Reviewer #2 (Evidence, reproducibility and clarity):

      RNA guanine-rich G-quadruplexes (rG4s) are non-canonical higher order nucleic acid structures that can form under physiological conditions. Interestingly, cellular stress is positively correlated with rG4 induction. In this study, the authors examined human hippocampal postmortem tissue for the formation of rG4s in aging and Alzheimer Disease (AD). rG4 immunostaining strongly increased in the hippocampus with both age and with AD severity. 21 cases were used in this study (age range 30-92). This immunostaining co-localized with hyper-phosphorylated tau immunostaining in neurons. The BG4 staining levels were also impacted by APOE status. rG4 structure was previously found to drive tau aggregation. Based on these observations, the authors propose a model of neurodegeneration in which chronic rG4 formation drives proteostasis collapse.

      This model is interesting, and would explain different observations (e.g., RNA is present in AD aggregates and rG4s can enhance protein oligomerization and tau aggregation).

      Main issue:

      There is indeed a positive correlation between Braak stage severity and BG4 staining, but this correlation is relatively weak and borderline significant (R = 0.52, p value = 0.028). This is probably the main limitation of this study, which should be clearly acknowledged (together with a reminder that "correlation is not causality").

      We believe that we had not explained this clearly enough in the text (based on the reviewer’s comment), as the correlation mentioned by the Reviewer was for the CA4 region only, and not the OML, which was substantially more correlated and statistically significant (Spearman R= 0.72, p = 0.00086). As a result, we believe this was a miscommunication that is rectified by the revised text:

      “In the OML, plotting BG4 percent area versus Braak stage demonstrated a strong correlation (Spearman R= 0.72) with highly significantly increased BG4 staining with higher Braak stages (p = 0.00086) (Fig. 2b).”
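
      For readers who want to see what a Spearman correlation of this kind computes, here is a minimal sketch in plain Python. It ranks both variables (averaging tied ranks, which matters for an ordinal scale such as Braak stage) and then takes the Pearson correlation of the ranks. The function names and the example values are hypothetical illustrations and are not the study's data or the authors' analysis code.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman_r(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical values for illustration only (NOT data from the study).
braak_stage = [0, 1, 1, 2, 3, 4, 5, 6]
bg4_percent_area = [0.5, 0.7, 0.4, 1.1, 1.0, 2.3, 2.1, 3.0]
print(round(spearman_r(braak_stage, bg4_percent_area), 3))
```

      Because only ranks enter the calculation, the coefficient reflects how monotonic the relationship is and does not assume that it is linear.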

      Related to this, there is no clear justification to exclude the four individuals in Fig 1d (without them R increases to 0.78). Please remove this statement. On the other hand, the difference based on APOE status is more striking.

      We did not mean to imply that deleting these outliers was correct, but merely were demonstrating that they were in fact outliers. To avoid this misinterpretation, we have now deleted the sentence in the Figure 1d caption mentioning the outliers.

      Minor suggestions

      - "BG4 immunostaining was in many cases localized in the cytoplasm near the nucleus in a punctate pattern". Define "many"

      This is seen in nearly every cell, and the text has been revised accordingly. The puncta are now identified as ribosomes containing rG4s using the rRNA antibody (Fig. 3b).

      - Specify that MABE917 corresponds to the specific single-chain version of the BG4 antibody

      Yes, this is correct, and this clarification has been added to the manuscript.

      - Define PMI, Braak, CERAD (add a list of acronyms or insert these definitions in Fig 1b legend)

      These definitions have all been added when they first appear.

      - Fig 3: scale bar legend missing (50 micrometers?)

      This has been added, and the reviewer was correct that it was 50 micrometers.

      - Supplementary data Table 1: indicate target for all antibodies

      The target for each antibody has been added to supplementary Table 1.

      - Supplementary data Table 2: why give ages with different levels of precision? (e.g. 90.15 vs 63)

      We apologize for this oversight and have altered the ages to the same precision (whole years) in the figure.

      - Supplementary data Fig 1 X-axis legend: add "(nm)" after wavelength. Sequence can also be added in the legend. Why this one? Max/Min Wavelengths in the figure do not match indications in the experimental part. Not sure if that part is actually relevant for this study.

      The CD spectrum in Sup Fig 1 is the sequence that had previously been shown to aid in tau aggregation seeding, but had not been suspected by those authors to be a quadruplex. So we tested that here and showed it is a quadruplex, as described at the end of the introduction. We have added wording to the figure legend to clarify where its corresponding description in the main text can be found. We have also checked and corrected the wavelength and units.

      - Supplementary data Fig 7: Which ribosomal antibody was used?

      The details of this antibody have now been added to Supplementary Table 2 which lists all the antibodies used.

      Reviewer #2 (Significance):

      Provide a link between Alzheimer disease and RNA G-quadruplexes.

      Reviewer #3 (Evidence, reproducibility and clarity):

      This study investigated the formation of RNA G quadruplexes (rG4) in aging and AD in human hippocampal postmortem tissue. The rG4 immunostaining in the hippocampus increases strongly with age and with the severity of AD. Furthermore, rG4 is present in neurons with an accumulation of phosphorylated tau immunostaining.

      Major comments

      (1) The method used in this study is primarily immunostaining of BG4, and the results cannot be considered correct without additional data from more multifaceted analyses (biochemical analysis, RNA expression analysis, etc.).

      We respectfully disagree with the Reviewer’s assessment of the value of these experiments. The most relevant biochemical experiments at the cellular and molecular level showing the role of G4s in aggregation in general and Tau in particular have been done and are referenced in the text. The results here stand on their own and are highly novel and significant, as evaluated by both of the other reviewers. There has been no previous work demonstrating the presence of rG4s in human brain – either in controls or in patients with AD. AD is a complex condition that only occurs spontaneously in the human brain and no other species; because of this complexity, novel aspects are best first studied in human brain tissue using the methods employed here.

      (2) Overall, the quality of the stained images is poor, and detailed quantitative analysis using further high quality data is essential to conclude the authors' conclusions.

      We have again looked at our images and they are not of poor quality; they are confocal images taken at the recommended resolution of the confocal microscope. It is possible that the apparent poor quality came from PDF compression by the manuscript submission portal, which is beyond our control, as the images were uploaded at high resolution. These data were quantified by scientists who were blinded to the diagnosis of each case. The level of description of the detailed quantification is higher than we have observed in similar studies. We therefore disagree with the reviewer’s conclusion.

      Reviewer #3 (Significance):

      Overall, this study is not a deeply analyzed study. In addition, the authors of this study need further understanding regarding G4.

      It is also unclear why the reviewer believes that we do not have sufficient understanding of G4s, and we would request that the reviewer instead provide specific comments regarding what is lacking in terms of knowledge on G4s, as we respectfully disagree with this judgement of our knowledge base (see other G4 papers from the Horowitz lab: Begeman 2020, Guzman 2022, Litberg 2023, Son 2023, referenced below).

      Litberg TJ, Sannapureddi RKR, Huang Z, Son A, Sathyamoorthy B, Horowitz S. Why are G-quadruplexes good at preventing protein aggregation? RNA Biol. 2023 Jan;20(1):495-509. doi: 10.1080/15476286.2023.2228572.

      Son A, Huizar Cabral V, Huang Z, Litberg TJ, Horowitz S. G-quadruplexes rescuing protein folding. Proc Natl Acad Sci U S A. 2023 May 16;120(20):e2216308120. doi: 10.1073/pnas.2216308120.

      Guzman BB, Son A, Litberg TJ, Huang Z, Dominguez, Horowitz S. Emerging Roles for G-Quadruplexes in Proteostasis. FEBS J. 2022. doi: 10.1111/febs.16608.

      Begeman A, Son A, Litberg TJ, Wroblewski TH, Gehring T, Huizar Cabral V, Bourne J, Xuan Z, Horowitz S. G-Quadruplexes Act as Sequence Dependent Protein Chaperones. EMBO Reports. 2020 Sep 18;e49735. doi: 10.15252/embr.201949735.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors present an interesting study using RL and Bayesian modelling to examine differences in learning rate adaptation in conditions of high and low volatility and noise respectively. Through "lesioning" an optimal Bayesian model, they reveal that apparently a suboptimal adaptation of learning rates results from incorrectly detecting volatility in the environment when it is not in fact present.

      Strengths:

      The experimental task used is cleverly designed and does a good job of manipulating both volatility and noise. The modelling approach takes an interesting and creative approach to understanding the source of apparently suboptimal adaptation of learning rates to noise, through carefully "lesioning" an optimal Bayesian model to determine which components are responsible for this behaviour.

      We thank the reviewer for this assessment.

      Weaknesses:

      The study has few substantial weaknesses; the data and modelling both appear robust and informative, and it tackles an interesting question. The model space could potentially have been expanded, particularly with regard to the inclusion of alternative strategies such as those that estimate latent states and adapt learning accordingly.

      We thank the reviewer for this suggestion. We agree that it would be interesting to assess the ability of alternative models to reproduce the sub-optimal choices of participants in this study. The Bayesian Observer Model described in the paper is a form of Hierarchical Gaussian Filter, so we will assess the performance of a different class of models that are able to track uncertainty: RL-based models that are able to capture changes in uncertainty (the Kalman filter, and the model described by Cochran and Cisler, PLoS Comp Biol 2019). We will assess the ability of the models to recapitulate the core behaviour of participants (in terms of learning rate adaptation) and, if possible, assess their ability to account for the pupillometry response.
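
      To illustrate why a Kalman filter counts as an uncertainty-tracking learner, the sketch below implements a generic one-dimensional Kalman filter in Python. The Kalman gain plays the role of a learning rate in a delta-rule update: it rises when the assumed volatility (process variance) is high and falls when the assumed noise (observation variance) is high. This is a textbook formulation with hypothetical parameter names; it is not the authors' Bayesian Observer Model, nor the Cochran and Cisler model.

```python
def kalman_track(outcomes, volatility_var, noise_var, mean0=0.0, var0=1.0):
    """Track a drifting mean; return (estimates, learning_rates).

    volatility_var: assumed variance of trial-to-trial drift in the true mean.
    noise_var:      assumed variance of random outcome noise around the mean.
    """
    mean, var = mean0, var0
    estimates, learning_rates = [], []
    for y in outcomes:
        var += volatility_var            # prediction: the mean may have drifted
        gain = var / (var + noise_var)   # Kalman gain acts as the learning rate
        mean += gain * (y - mean)        # delta-rule update scaled by the gain
        var *= (1.0 - gain)              # posterior uncertainty shrinks
        estimates.append(mean)
        learning_rates.append(gain)
    return estimates, learning_rates

# Hypothetical settings: same outcomes, different assumed environments.
outcomes = [0.0] * 200
_, lr_volatile = kalman_track(outcomes, volatility_var=1.0, noise_var=1.0)
_, lr_noisy = kalman_track(outcomes, volatility_var=0.1, noise_var=4.0)
print("steady-state learning rate, volatile setting:", round(lr_volatile[-1], 3))
print("steady-state learning rate, noisy setting:", round(lr_noisy[-1], 3))
```

      An observer that misattributes noise to volatility would, in this framing, overestimate `volatility_var` and so end up with a learning rate that is too high in noisy blocks.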

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors aimed to investigate how humans learn and adapt their behavior in dynamic environments characterized by two distinct types of uncertainty: volatility (systematic changes in outcomes) and noise (random variability in outcomes). Specifically, they sought to understand how participants adjust their learning rates in response to changes in these forms of uncertainty.

      To achieve this, the authors employed a two-step approach:

      (1) Reinforcement Learning (RL) Model: They first used an RL model to fit participants' behavior, revealing that the learning rate was context-dependent. In other words, it varied based on the levels of volatility and noise. However, the RL model showed that participants misattributed noise as volatility, leading to higher learning rates in noisy conditions, where the optimal strategy would be to be less sensitive to random fluctuations.

      (2) Bayesian Observer Model (BOM): To better account for this context dependency, they introduced a Bayesian Observer Model (BOM), which models how an ideal Bayesian learner would update their beliefs about environmental uncertainty. They found that a degraded version of the BOM, where the agent had a coarser representation of noise compared to volatility, best fit the participants' behavior. This suggested that participants were not fully distinguishing between noise and volatility, instead treating noise as volatility and adjusting their learning rates accordingly.

      The authors also aimed to use pupillometry data (measuring pupil dilation) as a physiological marker to arbitrate between models and understand how participants' internal representations of uncertainty influenced both their behavior and physiological responses. Their objective was to explore whether the BOM could explain not just behavioral choices but also these physiological responses, thereby providing stronger evidence for the model's validity.

      Overall, the study sought to reconcile approximate rationality in human learning by showing that participants still follow a Bayesian-like learning process, but with simplified internal models that lead to suboptimal decisions in noisy environments.

      Strengths:

      The generative model presented in the study is both innovative and insightful. The authors first employ a Reinforcement Learning (RL) model to fit participants' behavior, revealing that the learning rate is context-dependent-specifically, it varies based on the levels of volatility and noise in the task. They then introduce a Bayesian Observer Model (BOM) to account for this context dependency, ultimately finding that a degraded BOM - in which the agent has a coarser representation of noise compared to volatility - provides the best fit for the participants' behavior. This suggests that participants do not fully distinguish between noise and volatility, leading to the misattribution of noise as volatility. Consequently, participants adopt higher learning rates even in noisy contexts, where an optimal strategy would involve being less sensitive to new information (i.e., using lower learning rates). This finding highlights a rational but approximate learning process, as described in the paper.

      We thank the reviewer for their assessment of the paper.

      Weaknesses:

      While the RL and Bayesian models both successfully predict behavior, it remains unclear how to fully reconcile the two approaches. The RL model captures behavior in terms of a fixed or context-dependent learning rate, while the BOM provides a more nuanced account with dynamic updates based on volatility and noise. Both models can predict actions when fit appropriately, but the pupillometry data offers a promising avenue to arbitrate between the models. However, the current study does not provide a direct comparison between the RL framework and the Bayesian model in terms of how well they explain the pupillometry data. It would be valuable to see whether the RL model can also account for physiological markers of learning, such as pupil responses, or if the BOM offers a unique advantage in this regard. A comparison of the two models using pupillometry data could strengthen the argument for the BOM's superiority, as currently, the possibility that RL models could explain the physiological data remains unexplored.

      We thank the reviewer for this suggestion. In the current version of the paper, we use an extremely simple reinforcement learning model solely to measure the learning rate in each task block (as this is the key behavioural metric we are interested in). As the reviewer highlights, this simple model doesn’t estimate uncertainty or adapt to it. Given this, we don’t think we can directly compare this model to the Bayesian Observer Model. For example, in the current analysis of the pupillometry data we classify individual trials based on the BOM’s estimate of uncertainty and show that participants adapt their learning rate as expected to the reclassified trials; this analysis would not be possible with our current RL model. However, there are more complex RL-based models that do estimate uncertainty (as discussed above in response to Reviewer #1) and so may be more directly compared to the BOM. We will attempt to apply these models to our task data and describe their ability to account for participant behaviour and physiological response, as suggested by the Reviewer.

      The model comparison between the Bayesian Observer Model and the self-defined degraded internal model could be further enhanced. Since different assumptions about the internal model's structure lead to varying levels of model complexity, using a formal criterion such as Bayesian Information Criterion (BIC) or Akaike Information Criterion (AIC) would allow for a more rigorous comparison of model fit. Including such comparisons would ensure that the degraded BOM is not simply favored due to its flexibility or higher complexity, but rather because it genuinely captures the participants' behavioral and physiological data better than alternative models. This would also help address concerns about overfitting and provide a clearer justification for using the degraded BOM over other potential models.

      Thank you, we will add this.
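
      As a reference for the criteria the reviewer mentions, the following Python sketch computes AIC and BIC from a model's maximized log-likelihood using the standard definitions (AIC = 2k - 2 ln L; BIC = k ln n - 2 ln L). The model names, log-likelihoods, parameter counts, and trial count are hypothetical placeholders and are not results from the study.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Hypothetical fits for illustration only (NOT values from the study).
fits = {
    "full_model": {"loglik": -512.0, "k": 6},
    "degraded_model": {"loglik": -498.0, "k": 5},
}
n_trials = 480  # hypothetical number of observations per fit

for name, fit in fits.items():
    print(name,
          "AIC =", round(aic(fit["loglik"], fit["k"]), 1),
          "BIC =", round(bic(fit["loglik"], fit["k"], n_trials), 1))
```

      Both criteria penalize extra parameters, with BIC penalizing more heavily once the number of observations exceeds about eight, so a more flexible model is preferred only if its gain in likelihood outweighs the penalty.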

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      How reconsolidation works - particularly in humans - remains largely unknown. With an elegant, 3-day design, combining fMRI and psychopharmacology, the authors provide evidence for a certain role for noradrenaline in the reconsolidation of memory for neutral stimuli. All memory tasks were performed in the context of fMRI scanning, with additional resting-state acquisitions performed before and after recall testing on Day 2. On Day 1, 3 groups of healthy participants encoded word-picture associates (with pictures being either scenes or objects) and then performed an immediate cued recall task to presentation of the word (answering is the word old or new, and whether it was paired with a scene or an object). On Day 2, the cued recall task was repeated using half of the stimulus set words encoded on Day 1 (only old words were presented, with subjects required to indicate prior scene vs object pairing). This test was immediately preceded by the oral administration of placebo, cortisol, or yohimbine (to raise noradrenaline levels) depending on group assignment. On Day 3, all words presented on Day 1 were presented. As expected, on Day 3, memory was significantly enhanced for associations that were cued and successfully retrieved on Day 2 compared to uncued associations. However, for associative d', there was no Cued × Group interaction nor a main effect of Group, i.e., on the standard measure of memory performance, post-retrieval drug presence on Day 2 did not affect memory reconsolidation. As further evidence for a null result, fMRI univariate analyses showed no Cued × Group interactions in whole-brain or ROI activity.

      Strengths:

      There are some aspects of this study that I find impressive. The study is well-designed and the fMRI analysis methodology is innovative and sound. The authors have made meticulous and thorough physiological measurements, and assays of mood, throughout the experiment. By doing so, they have overcome, to a considerable extent, the difficulties inherent in the timing of human oral drug delivery in reconsolidation tasks, where it is difficult to have the drug present in the immediate recall period without affecting recall itself. This is beautifully shown in Figure 3. I also think that having some neurobiological assay of memory reactivation when studying reconsolidation in humans is critical, and the authors provide this. While multi-voxel patterns of hemodynamic responses are, in my view, very difficult to equate with an "engram", these patterns do have something to do with memory.

      We thank the reviewer for considering aspects of our work impressive, the study to be well-designed, and the methodology to be innovative and sound.

      Weaknesses:

      I have major issues regarding the behavioral results and the framing of the manuscript.

      (1) To arrive at group differences in memory performance, the authors performed median splitting of Day 3 trials by short and long reaction times during memory cueing on Day 2, as they took this as a putative measure of high/low levels of memory reactivation. Associative category hits on Day 3 showed a Group by Day 2 Reaction time (short, long) interaction, with post-hocs showing (according to the text) worse memory for short Day 2 RTs in the Yohimbine group. These post-hocs should be corrected for multiple comparisons, as the result is not what would be predicted (see point 2). My primary issue here is that we are not given RT data for each group, nor is the median splitting procedure described in the methods. Was this across all groups, or within groups? Are short RTs in the yohimbine group any different from short RTs in the other two groups? Unfortunately, we are not given Day 2 picture category memory levels or reaction times for each group. This is relevant because (as given in Supplemental Table S1) memory performance (d´) for the Yohimbine group on Day 1 immediate testing is (roughly speaking) 20% lower than the other 2 groups (independently of whether the pairs will be presented again the following day). I appreciate that this is not significant in a group x performance ANOVA but how does this relate to later memory performance? What were the group-specific RTs on Day 1? So, before the reader goes into the fMRI results, there are questions regarding the supposed drug-induced changes in behavior. Indeed, in the discussion, there is repeated mention of subsequent memory impairment produced by yohimbine but the nature of the impairment is not clear.

      Thank you for the opportunity to clarify these important issues.

      Reaction times are well established proxies (correlates) of memory strength and memory confidence in previous research, as they reflect cognitive processes involved in retrieving information. Faster reaction times indicate stronger mnemonic evidence and higher confidence in the accuracy of a memory decision, while slower responses suggest weaker evidence and decision uncertainty or doubt. This relationship is supported by an extensive literature (e.g., Starns 2021; Robinson et al., 1997; Ratcliff & Murdock, 1976; amongst others). Importantly, distinguishing between high and low confidence choices in a memory task serves the purpose of differentiating between particularly strong memory evidence (e.g., in associative cued recall, when remembering is particularly vivid) and weaker memory evidence. Separating low from high confidence responses based on participants’ reaction times was especially important in the current analyses, because previous research demonstrates that reaction times during cued recall tasks inversely correlate with hippocampal involvement (Heinbockel et al., 2024; Gagnon et al. 2019) and that stress-effects on human memory may be particularly pronounced for high-confidence memories (Gagnon et al., 2019).

      In response to Reviewer 1’s comments, we have elaborated on our rationale for the distinction between short and long reaction times in the introduction, results, and methods. Please see page 4, lines 144 to 148:

      “We distinguished between responses with short and long reaction times indicative of high and low confidence responses because previous research showed that reaction times are inversely correlated with hippocampal memory involvement(58-60) and memory strength(61,62), and that high confidence memories associated with short reaction times may be particularly sensitive to stress effects(63).”

      On page 13, lines 520 to 523:

      “Reaction times in the Day 2 Memory cueing task revealed a trial-specific gradient in reactivation strength. Thus, we turned to single-trial analyses, differentiating Day 3 trials by short and long reaction times during memory cueing on Day 2 (median split), indicative of high vs. low memory confidence(58–60) and hippocampal reactivation(26,63).”

      And on page 26, lines 1046 to 1053:

      “Reaction times serve as a proxy for memory confidence and memory strength, with faster responses reflecting higher confidence/strength and slower responses suggesting greater uncertainty/weaker memory. The association between reaction times and memory confidence has been established by previous research(58–60), suggesting that the distinction between high and low confidence responses differentiates vividly recalled associations from decisions based on weaker memory evidence. Reaction times are further linked to hippocampal activity during recall tasks(26,53), and stress effects on memory are particularly pronounced for high-confidence memories(53).”

      With respect to behavioral data reporting, we agree that the critical median-split procedure was not sufficiently clear in the original manuscript. We elaborate on this important aspect of the analysis now on page 26, lines 1053 to 1057:

      “We conducted a median-split within each participant to categorize trials as fast vs. slow reaction time trials during Day 2 memory cueing. We conducted this split on the participant- and not group-level because there is substantial inter-individual variability in overall reaction times. This approach also results in an equal number of trials in the low and high confidence conditions.”
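
      The within-participant median split described above can be sketched in Python as follows. Each trial is labelled relative to that participant's own median reaction time, so a generally slow responder still contributes both fast and slow trials. The function name, the participant labels, the reaction times, and the rule that trials exactly at the median count as slow are all illustrative assumptions, not details taken from the manuscript.

```python
from collections import defaultdict
from statistics import median

def median_split_by_participant(trials):
    """Label each (participant, rt) trial 'fast' or 'slow' relative to
    that participant's own median RT (ties at the median count as 'slow')."""
    rts = defaultdict(list)
    for participant, rt in trials:
        rts[participant].append(rt)
    cutoffs = {p: median(values) for p, values in rts.items()}
    return [(p, rt, "fast" if rt < cutoffs[p] else "slow") for p, rt in trials]

# Hypothetical reaction times in seconds (NOT data from the study).
trials = [
    ("s01", 0.8), ("s01", 1.2), ("s01", 0.9), ("s01", 1.5),
    ("s02", 1.9), ("s02", 2.4), ("s02", 2.1), ("s02", 3.0),
]
for participant, rt, label in median_split_by_participant(trials):
    print(participant, rt, label)
```

      In this toy example a single pooled median would label every trial of the faster participant as fast and every trial of the slower participant as slow, which is the confound a participant-level split avoids.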

      We completely agree that the relevant post-hoc test should be corrected for multiple comparisons. Please note that all reported post-hoc tests had been Bonferroni-corrected already. We clarify this now by explicitly referring to corrected p-values (P<sub>corr</sub>) and indicate in the methods that P<sub>corr</sub> refers to Bonferroni-corrected p-values (please see page 25, lines 1036 to 1038).
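
      For reference, a Bonferroni correction multiplies each raw p-value by the number of comparisons in the family and caps the result at 1. The short Python sketch below shows this; the function name and the p-values are hypothetical and are not values from the manuscript.

```python
def bonferroni(p_values):
    """Return Bonferroni-corrected p-values: p * m, capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical raw p-values from three post-hoc comparisons.
raw = [0.01, 0.04, 0.5]
print(bonferroni(raw))
```

      A corrected value below the chosen alpha level is equivalent to the raw p-value falling below alpha divided by the number of comparisons.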

      We further agree that for a comprehensive overview of the behaviour in terms of memory performance and RTs, these data need to be provided for each group and experimental day. Therefore, we now extended Supplementary Table S1 to include descriptive indices of memory performance (hits, dprime) and RTs for each group for each day. Moreover, we now report ANOVAs for reaction times for each of the experimental days in the main text.

      The ANOVA for Day 1 is now reported on page 6, lines 200 to 204: “To test for potential group differences in reaction times for correctly remembered associations on Day 1, we fit a linear model including the factors Group and Cueing. Critically, we did not observe a significant Group x Cueing interaction, suggesting no RT difference between groups for later cued and not cued items (F(2,58) = 1.41, P = .258, η<sup>2</sup> = 0.01; Supplemental Table S1).”

      The ANOVA for Day 2 is now reported on page 7, lines 243 to 248: “To test for potential group differences in reaction times for correctly remembered associations on Day 2, we fit a linear model including the factors Group and Reaction time (slow/fast) following the subject specific median split. The model did not reveal any main effect or interaction including the factor Group (all Ps > .535; Supplemental Table S1), indicating that there was no RT difference between groups, nor between low and high RT trials in the groups.”

      The ANOVA for Day 3 is reported on page 13 lines 487 to 494: “To test for potential group differences in reaction times for correctly remembered associations on Day 3 we fit a linear model including the factors Group and Cueing. This model did not reveal any main effect or interaction including the factor Group (all Ps > .267), indicating that there was no average RT difference between groups. As expected we observed a main effect of the factor Cueing, indicating a significant difference of reaction times across groups between trials that were successfully cued and those not cued on Day 2 (F(2,58) = 153.07, P < .001, η<sup>2</sup> = 0.22; Supplemental Table S1).”

      (2) The authors should be clearer as to what their original hypotheses were, and why they did the experiment. Despite being a complex literature, I would have thought the hypotheses would be reconsolidation impairment by cortisol and enhancement by yohimbine. Here it is relevant to point out that - only when the reader gets to the Methods section - there is mention of a paper published by this group in 2024. In this publication, the authors used the same study design but administered a stress manipulation after Day 2 cued recall, instead of a pharmacological one. They did not find a difference in associative hit rate between stress and control groups, but - similar to the current manuscript - reported that post-retrieval stress disrupts subsequent remembering (Day 3 performance) depending on neural memory reinstatement during reactivation (specifically driven by the hippocampus and its correlation with neocortical areas).

      Instead of using these results, and other human studies, to motivate the current work, reference is made to a recent animal study: Line 169 "Building on recent findings in rodents (Khalaf et al. 2018), we hypothesized that the effects of post-retrieval noradrenergic and glucocorticoid activation would critically depend on the reinstatement of the neural event representation during retrieval". It is difficult to follow that a rodent study using contextual fear conditioning and examining single neuron activity to remote fear recall and extinction would be relevant enough to motivate a hypothesis for a human psychopharmacological study on emotionally neutral paired associates.

      We agree that our recent publication utilizing a very similar experimental design including three days is highly relevant in the context of the current study and we now refer to this recent study earlier in our manuscript. Please see page 3, lines 89 to 94:  

      “Recently, we showed a detrimental impact of post-retrieval stress on subsequent memory that was contingent upon reinstatement dynamics in the Hippocampus, VTC and PCC during memory reactivation(26). While this study provided initial insights into the potential brain mechanisms involved in the effects of post-retrieval stress on subsequent memory, the underlying neuroendocrine mechanisms remained elusive.”

      Moreover, we explicitly state our hypothesis regarding the neural mechanism, with reference to our recent work, on page 5, lines 166 to 169:

      “Building on our recent findings in humans(26) as well as current insights from rodents(47), we hypothesized that the effects of post-retrieval noradrenergic and glucocorticoid activation would critically depend on the reinstatement of the neural event representation during retrieval.”

      Concerning the potential direction of the effects of post-retrieval cortisol and noradrenaline, the literature is indeed mixed with partially contradicting results, which made it, in our view, difficult to derive a clear hypothesis of potentially opposite effects of cortisol and yohimbine. We summarize the relevant evidence in the introduction on pages 3 to 4, lines 100 to 113:

      “Some studies, using emotional recognition memory or fear conditioning in healthy humans, suggest enhancing effects of post-retrieval glucocorticoids on subsequent memory(30,31). However, rodent studies on neutral recognition memory(21), fear conditioning(32), as well as evidence from humans on episodic recognition memory(33) report impairing effects of glucocorticoid receptor activation on post-retrieval memory dynamics. For noradrenaline, post-retrieval blockade of noradrenergic activity impairs putative reconsolidation or future memory accessibility in human fear conditioning(34), as well as drug (alcohol) memory(35) and spatial memory in rodents(36). However, this effect is not consistently observed in human studies on fear conditioning(40), speaking anxiety(37), inhibitory avoidance(39), traumatic mental imagination (PTSD patients)(38), and might depend on the arousal state of the individual(21) or the exact timing of drug administration as suggested by studies in humans(41) and rodents(42). Thus, while there is evidence that glucocorticoid and noradrenergic activation after retrieval can affect subsequent memory, the direction of these effects remains elusive.”

      In addition to these reviewer comments and in response to the eLife assessment, we would like to emphasize that the present findings are in our view not only relevant for a subfield but may be of considerable interest for researchers from various fields, beyond experimental memory research, including Neurobiology, Psychiatry, Clinical Psychology, Educational Psychology, or Law Psychology. We highlight the relevance of the topic and our findings now more explicitly in the introduction and discussion. Please see page 3:

      “The dynamics of memory after retrieval, whether through reconsolidation of the original trace or interference with retrieval-related traces, have fundamental implications for educational settings, eyewitness testimony, or mental disorders(5,11,12). In clinical contexts, post-retrieval changes of memory might offer a unique opportunity to retrospectively modify or render less accessible unwanted memories, such as those associated with posttraumatic stress disorder (PTSD) or anxiety disorders(13–15). Given these potential far reaching implications, understanding the mechanisms underlying post-retrieval dynamics of memory is essential.”

      On page 17:

      “Upon their retrieval, memories can become sensitive to modification(1,2). Such post-retrieval changes in memory may be fundamental for adaptation to volatile environments and have critical implications for eyewitness testimony, clinical or educational contexts(5,11–15). Yet, the brain mechanisms involved in the dynamics of memory after retrieval are largely unknown, especially in humans.”

      And on page 19:

      “Beyond their theoretical relevance, these findings may have relevant implications for attempts to employ post-retrieval manipulations to modify unwanted memories in anxiety disorders or PTSD(97,98). Specifically, the present findings suggest that such interventions may be particularly promising if combined with cognitive or brain stimulation techniques ensuring a sufficient memory reactivation.”

      Reviewer #1 (Recommendations for the authors):

      (1) Related to major issue 2 in the Public Review. In the introduction, it would be helpful to be specific about the type of memory being probed in the different studies referenced (episodic vs conditioning). For the former, please make it clear whether stimuli to be remembered were emotional or neutral, and for which stimulus class drug effects were observed. This is particularly important given that in the first paragraph, you describe memory reactivation in the context of traumatic memories via mention of PTSD. It would also be helpful to know to which species you refer. For example, in line 115, "timing of drug administration..." a rodent and a human study are cited.

      We completely agree that these aspects are important. We have therefore rewritten the corresponding paragraph in the introduction to clarify the type of memory probed, the emotionality of the stimuli and the species tested. Please see pages 3 to 4, lines 100 to 113:

      “Some studies, using emotional recognition memory or fear conditioning in healthy humans, suggest enhancing effects of post-retrieval glucocorticoids on subsequent memory(30,31). However, rodent studies on neutral recognition memory(21), fear conditioning(32), as well as evidence from humans on episodic recognition memory(33) report impairing effects of glucocorticoid receptor activation on post-retrieval memory dynamics. For noradrenaline, post-retrieval blockade of noradrenergic activity impairs putative reconsolidation or future memory accessibility in human fear conditioning(34), as well as drug (alcohol) memory(35) and spatial memory in rodents(36). However, this effect is not consistently observed in human studies on fear conditioning(40), speaking anxiety(37), inhibitory avoidance(39), traumatic mental imagination (PTSD patients)(38), and might depend on the arousal state of the individual(21) or the exact timing of drug administration as suggested by studies in humans(41) and rodents(42). Thus, while there is evidence that glucocorticoid and noradrenergic activation after retrieval can affect subsequent memory, the direction of these effects remains elusive.”

      (2) The Bos 2014 reference appears incorrect. I think you mean the Frontiers paper of the same year.

      Thank you for noticing this mistake, which has been corrected.

      (3) Line 734 "The study employed a fully crossed, placebo-controlled, double-blind, between-subjects design". What is a fully crossed design?

      A fully-crossed design refers to studies in which all possible combinations of multiple between-subjects factors are implemented. However, because the factor reactivation/cueing was manipulated within-subject in the present study and there is only one between-subjects factor (group/drug), “fully-crossed” may be misleading here. We removed it from the manuscript.

      (4) Supplemental Table S3. Are these ordered in terms of significance? A t- or Z-value for each cluster (either of the peak or a summed value) would be helpful.

      We agree that the ordering of the clusters was not clearly described. In the revised Supplemental Table S3, we have now added a column with the cluster-peak specific T-values and added an explanation in the table caption: “Depicted clusters are ordered by cluster-peak T-values.”

      (5) Please provide the requested memory performance and reaction time data, and relevant group comparisons.

      In response to general comment #1 above, we now provide all relevant accuracy and reaction time data for all groups and experimental days in the revised Supplemental Table S1. Moreover, we now report the relevant group comparisons in the main text on page 6, lines 200 to 204, on page 7, lines 243 to 248, and on page 13, lines 487 to 494.

      (6) Please rewrite the introduction with specific hypotheses, mention your recent results published in Science Advances, and attend to suggestions made in the first comment above.

      We have rewritten parts of the introduction to make the link to our recent publication clearer and to clarify the types of memories and species tested, as suggested by the reviewer (please see pages 3 to 4, lines 100 to 113). Moreover, we explicitly state our hypothesis regarding the neural mechanism on page 5, lines 166 to 169:

      “Building on our recent findings in humans(26) as well as current insights from rodents(47), we hypothesized that the effects of post-retrieval noradrenergic and glucocorticoid activation would critically depend on the reinstatement of the neural event representation during retrieval.”

      In terms of the direction of the potential cortisol and yohimbine effects, we have elaborated on the relevant literature, which in our view does not allow a clear prediction regarding the nature of the drug effects. We have made this explicit by stating that “… while there is evidence that glucocorticoid and noradrenergic activation after retrieval can affect subsequent memory, the direction of these effects remains elusive.” (please see page 4, lines 111 to 113). It would be, in our view, inappropriate to retrospectively add another, more specific “hypothesis”.

      Reviewer #2 (Public review):

      Summary:

      The authors aimed to investigate how noradrenergic and glucocorticoid activity after retrieval influence subsequent memory recall with a 24-hour interval, by using a controlled three-day fMRI study involving pharmacological manipulation. They found that noradrenergic activity after retrieval selectively impairs subsequent memory recall, depending on hippocampal and cortical reactivation during retrieval.

      Overall, there are several significant strengths of this well-written manuscript.

      Strengths:

      (1) The study is methodologically rigorous, employing a well-structured three-day experimental design that includes fMRI imaging, pharmacological interventions, and controlled memory tests.

      (2) The use of pharmacological agents (i.e., hydrocortisone and yohimbine) to manipulate glucocorticoid and noradrenergic activity is a significant strength.

      (3) The clear distinction between online and offline neural reactivation using MVPA and RSA approaches provides valuable insights into how memory dynamics are influenced by noradrenergic and glucocorticoid activity distinctly.

      We thank the reviewer for these very positive and encouraging remarks.

      Weaknesses:

      (1) One potential limitation is the reliance on distinct pharmacodynamics of hydrocortisone and yohimbine, which may complicate the interpretation of the results.

      We agree that the pharmacodynamics of hydrocortisone and yohimbine are different. However, we took these pharmacodynamics into account when designing the experiment and have made an effort to accurately track the indicators for noradrenergic arousal and glucocorticoids across the experiment. As shown in Figure 2, these indicators confirm that both drugs are active within the time window of approximately 40-90 minutes after reactivation. This time window corresponds to the proposed reconsolidation window, which is assumed to open around 10 minutes post-reactivation and to remain open for a few hours (approximately 90 minutes; Monfils & Holmes, 2018; Lee et al., 2017; Monfils et al., 2009).

      We have now acknowledged the distinct pharmacodynamics of hydrocortisone and yohimbine on page 21, lines 845 to 847: “We note that yohimbine and hydrocortisone follow distinct pharmacodynamics(104,105), yet selected the administration timing to ensure that both substances are active within the relevant post-retrieval time window.”

      In the results section, on page 11, lines 437 to 439, we further emphasize this differential dynamic: “Our data demonstrate that, despite the distinct pharmacodynamics of CORT and YOH, both substances are active within the time window that is critical for potential reconsolidation effects(3,4,43).”

      (2) Another point related above, individual differences in pharmacological responses, physiological and cortisol measures may contribute to memory recall on Day 3.

      The administered drugs elicit a pronounced adrenergic and glucocorticoid response, respectively. Specifically, the cortisol levels reached after 20 mg of hydrocortisone correspond to those observed after exposure to a significant stressor. Moreover, individual variation in stress system activation following drug intake tends to be less pronounced than in response to a natural stressor. Nevertheless, we fully agree that individual factors, such as metabolism or body weight, can influence the drug's action.

      We therefore re-analysed the reported Day 3 models, now including individual measures of baseline-to-peak changes in cortisol and systolic blood pressure, respectively. We report these additional analyses in the supplement and refer the interested reader to these analyses on page 15, lines 580 to 586:

      “As individual factors, such as metabolism or body weight, can influence the drug's action, we ran an additional analysis in which we included individual (baseline-to-peak) differences in salivary cortisol and (systolic) blood pressure, respectively. This analysis did not show any group by baseline-to-peak difference interaction suggesting that the observed memory effects were mainly driven by the pharmacological intervention group per se and less by individual variation in responses to the drug (see Supplemental Results).”

      And in the Supplemental Results:

      “To account for individual differences in cortisol responses after pill intake, we fit additional GLMMs predicting Day 3 subsequent memory of cued and correct trials including the factors Individual baseline-to-peak cortisol and Group. Doing so allowed us to account for variation in Day 3 performance, which might have resulted from within-group variation in cortisol responses, in particular in the CORT group. Importantly, none of the models predicting Day 3 memory performance by Day 2 cortisol-increase and Group, median-split RTs (high/low), hippocampal activity and RTs, or hippocampal activity and VTC category reinstatement revealed a significant group x baseline-to-peak cortisol interaction (all Ps > .122). These results suggest that inter-individual differences in cortisol responses did not have a significant impact on subsequent memory, beyond the influence of group per se. The same analyses were repeated for systolic blood pressure employing GLMMs predicting Day 3 subsequent memory of cued and correct trials including the factors Individual baseline-to-peak systolic blood pressure and Group to account for variation in Day 3 performance, which might have resulted from within-group variation in blood pressure response, in particular in the YOH group. While the model predicting Day 3 memory performance revealed a significant Individual baseline-to-peak systolic blood pressure × Group × median-split RTs (high/low) interaction (β = -0.05 ± 0.02, z = -2.04, P = .041, R<sup>2</sup><sub>conditional</sub> = 0.01), post-hoc slope tests, however, did not show any significant difference between groups (all P<sub>Corr</sub> > .329). The remaining models including hippocampal activity and RTs, or hippocampal activity and VTC category reinstatement did not reveal a significant Group × Individual baseline-to-peak systolic blood pressure interaction (all Ps > .101). 
      These results suggest that inter-individual differences in systolic blood pressure responses did not have a significant impact on subsequent memory, beyond the influence of group per se.”

      Although we acknowledge that our study may not have been sufficiently powered for an analysis of individual differences, these data suggest that our memory effects were mainly driven by the pharmacological intervention group per se and less by individual variation in responses. It is to be noted, however, that all participants of the respective groups showed a pronounced increase in cortisol concentrations (on average > 1000% in the CORT group) and autonomic arousal (on average > 10% in the YOH group), respectively. These increases appeared to be sufficient to drive the observed memory effects, irrespective of some individual variation in the magnitude of the response.
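
      For illustration, a baseline-to-peak change of the kind referred to above could be expressed as a percentage, as in the following minimal sketch. It assumes that the peak is simply the highest post-intake sample and that the change is scaled by the baseline value; the response does not spell out the exact computation. The function name and the example values are hypothetical and are not data from the study.

```python
def baseline_to_peak_percent(baseline, post_intake_samples):
    """Percent increase from a baseline value to the highest later sample.

    Assumption: 'peak' means the maximum of the post-intake samples.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive to compute a percent change")
    if not post_intake_samples:
        raise ValueError("at least one post-intake sample is required")

    peak = max(post_intake_samples)
    return (peak - baseline) / baseline * 100.0


# Made-up values in arbitrary units, for illustration only.
increase = baseline_to_peak_percent(5.0, [6.0, 40.0, 60.0, 30.0])
print(increase)
```

      With these made-up numbers the peak is twelve times the baseline, which corresponds to an increase of 1100%.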

      (3) Median-splitting approach for reaction times and hippocampal activity should better be justified.

      Reaction times are well-established proxies (correlates) of memory strength and memory confidence in previous research, as they reflect cognitive processes involved in retrieving information. Faster reaction times indicate stronger mnemonic evidence and higher confidence in the accuracy of a memory decision, while slower responses suggest weaker evidence and decision uncertainty or doubt. This relationship is supported by an extensive literature (e.g., Starns 2021; Robinson et al., 1997; Ratcliff & Murdock, 1976; amongst others). Importantly, distinguishing between high and low confidence choices in a memory task serves the purpose of differentiating between particularly strong memory evidence (e.g., in associative cued recall, when remembering is particularly vivid) and weaker memory evidence. Separating low from high confidence responses based on participants’ reaction times was especially important in the current analyses, because previous research demonstrates that reaction times during cued recall tasks inversely correlate with hippocampal involvement (Heinbockel et al., 2024; Gagnon et al., 2019) and that stress effects on human memory may be particularly pronounced for high-confidence memories (Gagnon et al., 2019).

      In response to the Reviewer comments, we have elaborated on our rationale for the distinction between short and long reaction times in the introduction, results, and methods. Please see page 4, lines 144 to 148:

      “We distinguished between responses with short and long reaction times indicative of high and low confidence responses because previous research showed that reaction times are inversely correlated with hippocampal memory involvement(58–60) and memory strength(61,62), and that high confidence memories associated with short reaction times may be particularly sensitive to stress effects(63).”

      On page 13, lines 520 to 523:

      “Reaction times in the Day 2 Memory cueing task revealed a trial-specific gradient in reactivation strength. Thus, we turned to single-trial analyses, differentiating Day 3 trials by short and long reaction times during memory cueing on Day 2 (median split), indicative of high vs. low memory confidence(58–60) and hippocampal reactivation(26,63).”

      And on page 26, lines 1046 to 1053:

      “Reaction times serve as a proxy for memory confidence and memory strength, with faster responses reflecting higher confidence/strength and slower responses suggesting greater uncertainty/weaker memory. The association between reaction times and memory confidence has been established by previous research(58–60), suggesting that the distinction between high and low confidence responses differentiates vividly recalled associations from decisions based on weaker memory evidence. Reaction times are further linked to hippocampal activity during recall tasks(26,53), and stress effects on memory are particularly pronounced for high-confidence memories(53).”

      We agree that the critical median-split procedure was not sufficiently clear in the original manuscript. We elaborate on this important aspect of the analysis now on page 26, lines 1053 to 1057:

      “We conducted a median-split within each participant to categorize trials as slow vs. fast reaction time trials during Day 2 memory cueing. We chose to conduct this split on the participant- and not group-level because there is substantial inter-individual variability in overall reaction times and to retain an equal number of trials in the low and high confidence conditions.”
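
      A within-participant median split of this kind can be sketched as follows. This is a minimal illustration and not the analysis code used in the study: the dictionary keys `participant` and `rt`, the function name, the tie rule, and the toy reaction times are all assumptions made for the example.

```python
from statistics import median


def median_split_by_participant(trials):
    """Label each trial 'fast' or 'slow' relative to that participant's own median RT.

    `trials` is a list of dicts with the hypothetical keys 'participant' and 'rt'.
    """
    # Collect the reaction times of each participant separately.
    rts_by_participant = {}
    for trial in trials:
        rts_by_participant.setdefault(trial["participant"], []).append(trial["rt"])

    # One median per participant, so overall response speed does not drive the labels.
    medians = {p: median(rts) for p, rts in rts_by_participant.items()}

    # Trials at or below the participant's median count as fast (an arbitrary tie rule).
    return [
        {**trial, "speed": "fast" if trial["rt"] <= medians[trial["participant"]] else "slow"}
        for trial in trials
    ]


# Toy reaction times in seconds; participant p2 is slower overall than p1.
trials = [
    {"participant": "p1", "rt": 0.8}, {"participant": "p1", "rt": 1.0},
    {"participant": "p1", "rt": 1.4}, {"participant": "p1", "rt": 1.6},
    {"participant": "p2", "rt": 2.0}, {"participant": "p2", "rt": 2.2},
    {"participant": "p2", "rt": 2.6}, {"participant": "p2", "rt": 3.0},
]
labelled = median_split_by_participant(trials)
print([t["speed"] for t in labelled])
```

      In this toy example the fastest trial of p2 (2.0 s) is slower than every trial of p1, yet it is still labelled fast. A single group-level median would instead have labelled all p1 trials fast and all p2 trials slow.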

      In addition to these reviewer comments and in response to the eLife assessment, we would like to emphasize that the present findings are in our view not only relevant for a subfield but may be of considerable interest for researchers from various fields, beyond experimental memory research, including Neurobiology, Psychiatry, Clinical Psychology, Educational Psychology, or Law Psychology. We highlight the relevance of the topic and our findings now more explicitly in the introduction and discussion. Please see page 3:

      “The dynamics of memory after retrieval, whether through reconsolidation of the original trace or interference with retrieval-related traces, have fundamental implications for educational settings, eyewitness testimony, or mental disorders(5,11,12). In clinical contexts, post-retrieval changes of memory might offer a unique opportunity to retrospectively modify or render less accessible unwanted memories, such as those associated with posttraumatic stress disorder (PTSD) or anxiety disorders(13–15). Given these potential far reaching implications, understanding the mechanisms underlying post-retrieval dynamics of memory is essential.”

      On page 17:

      “Upon their retrieval, memories can become sensitive to modification(1,2). Such post-retrieval changes in memory may be fundamental for adaptation to volatile environments and have critical implications for eyewitness testimony, clinical or educational contexts(5,11–15). Yet, the brain mechanisms involved in the dynamics of memory after retrieval are largely unknown, especially in humans.”

      And on page 19:

      “Beyond their theoretical relevance, these findings may have relevant implications for attempts to employ post-retrieval manipulations to modify unwanted memories in anxiety disorders or PTSD(97,98). Specifically, the present findings suggest that such interventions may be particularly promising if combined with cognitive or brain stimulation techniques ensuring a sufficient memory reactivation.”

      Reviewer #2 (Recommendations for the authors):

      My comments and/or questions for the authors to improve this well-written manuscript.

      (1) This study identifies the modulatory role of the hippocampus and VTC in the effects of norepinephrine on subsequent memory. Are there functional interactions between these ROIs and other brain regions that could be wise to consider for a more comprehensive understanding of the underlying neural mechanisms?

      We agree that functional interactions of hippocampus and VTC and other regions that were active during Day 2 memory cueing are relevant for our understanding of the underlying mechanisms. We therefore performed connectivity analyses using general psycho-physiological interaction analysis (gPPI; as implemented in SPM), report the results of this analysis on page 16, lines 635 to 644, and added Supplemental Table S4 including gPPI statistics.

      “We conducted general psycho-physiological interaction (gPPI) analyses on the Day 2 memory cueing task (remembered – forgotten), which revealed that successful cueing was accompanied by significant functional connectivity between the left hippocampus, VTC, PCC and MPFC (see Supplemental Table S4). However, using these connectivity estimates to predict Day 3 subsequent memory performance (dprime) via regression did not reveal any significant Group × Connectivity interactions, indicating that the pharmacological manipulation (i.e. noradrenergic stimulation) did not modulate subsequent memory based on functional connectivity during memory cueing (all P<sub>Corr</sub> > .228). The same pattern of results was observed when including single trial beta estimates from multiple ROIs during memory cueing to predict Day 3 memory (all interaction effects P<sub>Corr</sub> > .288).”

      (2) In theory, noradrenergic activity would have a profound impact on activity in widespread brain regions that are closely related to memory function. It would be interesting to know other possible effects beyond the hippocampus and VTC.

      We agree and included in our analysis additional ROIs beyond the HC and VTC; we now report these explorative results on page 16, lines 616 to 633:

      “Beyond hippocampal and VTC activity during memory cueing (Day 2), we exploratively reanalysed the GLMMs predicting Day 3 memory performance including the PCC, which was relevant during memory cueing in the current study and in our previous work(26). Predicting Day 3 memory performance by the factors Group and Single trial beta activity during memory cueing in the PCC did not reveal a significant interaction (P<sub>Corr</sub> = 1); adding the factor Reaction time to the model also did not result in a significant interaction (P<sub>Corr</sub> = 1). We also included the Medial Prefrontal Cortex (MPFC) to predict Day 3 memory performance, as the MPFC has been shown to be sensitive to noradrenergic modulation in previous work(75). Predicting Day 3 memory performance by the factors Group and Single trial beta activity during memory cueing in the MPFC did not reveal a significant interaction (P<sub>Corr</sub> = 1); adding the factor Reaction time to the model also did not result in a significant interaction (P<sub>Corr</sub> = 1), which indicates that the MPFC was not modulated by either pharmacological intervention. Finally, we investigated memory cueing from all remaining ROIs that were significantly activated during the Day 2 memory cueing task (Day 2 whole-brain analysis; correct-incorrect; Supplemental Table S3). We again fit GLMMs predicting Day 3 memory performance by the factors Group and Single trial beta activity during memory cueing. Again, we did not observe any significant interaction effect in any of the ROIs (all interaction P<sub>Corr</sub> > .060) and these results did not change when adding the factor Reaction time to the respective models (all P<sub>Corr</sub> > .075).”

      (3) There are substantial individual differences in pharmacological responses, physiological and cortisol measures, as shown in Figure 3A&B. If such individual differences are taken into account, are there any potential effects on subsequent recall on Day 3 pertaining to the hydrocortisone group?

      In response to this comment (and the General comment #1 of this reviewer), we have now re-analyzed the respective models including individual measures of baseline-to-peak cortisol and systolic blood pressure.

      We re-analysed the reported Day 3 models, now including individual measures of baseline-to-peak changes in cortisol and systolic blood pressure, respectively. We report these additional analyses in the supplement and refer the interested reader to these analyses on page 15, lines 580 to 586:

      “As individual factors, such as metabolism or body weight, can influence the drug's action, we ran an additional analysis in which we included individual (baseline-to-peak) differences in salivary cortisol and (systolic) blood pressure, respectively. This analysis did not show any group by baseline-to-peak difference interaction suggesting that the observed memory effects were mainly driven by the pharmacological intervention group per se and less by individual variation in responses to the drug (see Supplemental Results).”

      And in the Supplemental Results:

      “To account for individual differences in cortisol responses after pill intake, we fit additional GLMMs predicting Day 3 subsequent memory of cued and correct trials including the factors Individual baseline-to-peak cortisol and Group. Doing so allowed us to account for variation in Day 3 performance, which might have resulted from within-group variation in cortisol responses, in particular in the CORT group. Importantly, none of the models predicting Day 3 memory performance by Day 2 cortisol-increase and Group, median-split RTs (high/low), hippocampal activity and RTs, or hippocampal activity and VTC category reinstatement revealed a significant group x baseline-to-peak cortisol interaction (all Ps > .122). These results suggest that inter-individual differences in cortisol responses did not have a significant impact on subsequent memory, beyond the influence of group per se. The same analyses were repeated for systolic blood pressure employing GLMMs predicting Day 3 subsequent memory of cued and correct trials including the factors Individual baseline-to-peak systolic blood pressure and Group to account for variation in Day 3 performance, which might have resulted from within-group variation in blood pressure response, in particular in the YOH group. While the model predicting Day 3 memory performance revealed a significant Individual baseline-to-peak systolic blood pressure × Group × median-split RTs (high/low) interaction (β = -0.05 ± 0.02, z = -2.04, P = .041, R<sup>2</sup><sub>conditional</sub> = 0.01), post-hoc slope tests, however, did not show any significant difference between groups (all P<sub>Corr</sub> > .329). The remaining models including hippocampal activity and RTs, or hippocampal activity and VTC category reinstatement did not reveal a significant Group × Individual baseline-to-peak systolic blood pressure interaction (all Ps > .101). 
      These results suggest that inter-individual differences in systolic blood pressure responses did not have a significant impact on subsequent memory, beyond the influence of group per se.”

      (4) Median-splitting approach for reaction times and hippocampal activity should better be justified.

      Reaction times are well-established proxies (correlates) of memory strength and memory confidence in previous research, as they reflect cognitive processes involved in retrieving information. Faster reaction times indicate stronger mnemonic evidence and higher confidence in the accuracy of a memory decision, while slower responses suggest weaker evidence and decision uncertainty or doubt. This relationship is supported by an extensive literature (e.g., Starns 2021; Robinson et al., 1997; Ratcliff & Murdock, 1976; amongst others). Importantly, distinguishing between high and low confidence choices in a memory task serves the purpose of differentiating between particularly strong memory evidence (e.g., in associative cued recall, when remembering is particularly vivid) and weaker memory evidence. Separating low from high confidence responses based on participants’ reaction times was especially important in the current analyses, because previous research demonstrates that reaction times during cued recall tasks inversely correlate with hippocampal involvement (Heinbockel et al., 2024; Gagnon et al., 2019) and that stress effects on human memory may be particularly pronounced for high-confidence memories (Gagnon et al., 2019).

      In response to the Reviewer comments, we have elaborated on our rationale for the distinction between short and long reaction times in the introduction, results, and methods. Please see page 4, lines 144 to 148:

      “We distinguished between responses with short and long reaction times indicative of high and low confidence responses because previous research showed that reaction times are inversely correlated with hippocampal memory involvement(58–60) and memory strength(61,62), and that high confidence memories associated with short reaction times may be particularly sensitive to stress effects(63).”

      On page 13, lines 520 to 523:

      “Reaction times in the Day 2 Memory cueing task revealed a trial-specific gradient in reactivation strength. Thus, we turned to single-trial analyses, differentiating Day 3 trials by short and long reaction times during memory cueing on Day 2 (median split), indicative of high vs. low memory confidence(58–60) and hippocampal reactivation(26,63).”

      And on page 26, lines 1046 to 1053:

      “Reaction times serve as a proxy for memory confidence and memory strength, with faster responses reflecting higher confidence/strength and slower responses suggesting greater uncertainty/weaker memory. The association between reaction times and memory confidence has been established by previous research(58–60), suggesting that the distinction between high and low confidence responses differentiates vividly recalled associations from decisions based on weaker memory evidence. Reaction times are further linked to hippocampal activity during recall tasks(26,53), and stress effects on memory are particularly pronounced for high-confidence memories(53).”

      Minor comments:

      (5) Please include the full names of key abbreviations in the figure legends, such as "ass.cat.hit" and among others.

      We now include the full names of key abbreviations in all figure legends (e.g., ass.cat.hit = associative category hit).

      (6) Please introduce various metrics used in the study to aid readers in better understanding the measurements they utilized.

      We agree that various measures that were included in our analyses had not been described clearly enough before, especially concerning the multivariate analyses. We therefore added short explanations across the results section.

      Page 8, lines 279 to 280: “Classifier accuracy is derived from the sum of correct predictions the trained classifier made in the test-set, relative to the total number of predictions.”

      Page 8, lines 290 to 292: “Neural reinstatement reflects the extent to which a neural activity pattern (i.e., for objects) that was present during encoding is reactivated during retrieval (e.g., memory cueing).”

      Page 8, lines 299 to 301: “The logits here reflect the log-transformed trial-wise probability of a pattern either representing a scene or an object.”

      Page 10, lines 378 to 380: “Beyond category-level reinstatement, we assessed event-level memory trace reinstatement from initial encoding (Day 1) to memory cueing (Day 2), via RSA, correlating neural patterns in each region (hippocampus, VTC, and PCC) across days.”
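
      The three metrics described above can be illustrated with a short sketch. It is not the pipeline used in the study and rests on several assumptions: the logit is taken in its conventional sense as the log-odds of a probability (the quoted text says only "log-transformed"), event-level similarity is computed as a plain Pearson correlation between two activity patterns, and all function names and toy values are hypothetical.

```python
import math


def classifier_accuracy(predicted, actual):
    """Share of test-set predictions that match the true labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def logit(probability):
    """Log-odds of a probability strictly between 0 and 1 (assumed definition)."""
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    return math.log(probability / (1.0 - probability))


def pattern_similarity(pattern_a, pattern_b):
    """Pearson correlation between two equally long activity patterns."""
    n = len(pattern_a)
    mean_a = sum(pattern_a) / n
    mean_b = sum(pattern_b) / n
    covariance = sum((a - mean_a) * (b - mean_b) for a, b in zip(pattern_a, pattern_b))
    spread_a = math.sqrt(sum((a - mean_a) ** 2 for a in pattern_a))
    spread_b = math.sqrt(sum((b - mean_b) ** 2 for b in pattern_b))
    return covariance / (spread_a * spread_b)


# Toy labels: three of four test trials are classified correctly.
accuracy = classifier_accuracy(
    ["scene", "object", "object", "scene"],
    ["scene", "object", "scene", "scene"],
)

# A classifier probability of 0.5 carries no evidence either way.
neutral_evidence = logit(0.5)

# Toy voxel patterns for one event at encoding (Day 1) and cueing (Day 2).
same_event = pattern_similarity([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
other_event = pattern_similarity([4.0, 1.0, 3.0, 2.0], [1.0, 2.0, 4.0, 3.0])
print(accuracy, neutral_evidence, same_event, other_event)
```

      In this toy case the accuracy is 0.75, the logit of 0.5 is zero, and the cueing pattern that is a scaled copy of the encoding pattern correlates at approximately 1, whereas the mismatched pair correlates at approximately -0.2.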

      (7) Please explain what the different colors represent in Figures 5B and 5C to avoid confusion. It would be good to indicate significant differences in the figures if applicable.

      We have now added line legends to the figure and revised the caption to clarify what exactly is depicted. We also added asterisks to mark significant differences.

      References:

      Monfils, M. H., Cowansage, K. K., Klann, E., & LeDoux, J. E. (2009). Extinction-reconsolidation boundaries: key to persistent attenuation of fear memories. Science, 324(5929), 951-955.

      Monfils, M. H., & Holmes, E. A. (2018). Memory boundaries: opening a window inspired by reconsolidation to treat anxiety, trauma-related, and addiction disorders. The Lancet Psychiatry, 5(12), 1032-1042.

      Lee, J. L. C., Nader, K., & Schiller, D. (2017). An update on memory reconsolidation updating. Trends in Cognitive Sciences, 21, 531-545.

      Radley, J. J., Williams, B., & Sawchenko, P. E. (2008). Noradrenergic innervation of the dorsal medial prefrontal cortex modulates hypothalamo-pituitary-adrenal responses to acute emotional stress. Journal of Neuroscience, 28(22), 5806-5816.

      Heinbockel, H., Wagner, A. D., & Schwabe, L. (2024). Post-retrieval stress impairs subsequent memory depending on hippocampal memory trace reinstatement during reactivation. Science Advances, 10(18), eadm7504.

  5. inst-fs-iad-prod.inscloudgate.net
    1. Americans want neighborhood schools, decentralized decision making, and democratic control. They see these devices in part as ways to ensure that schools can accommodate distinctive community desires, and to give parents a greater say about what goes on in them. Despite the fact that participation in school elections is very low and information on which to base a vote is often scarce, Americans will not surrender local control without a fight. They simply will not permit distant politicians or experts in a centralized civil service to make educational decisions. The reasons for this preference are complicated, including the incredible diversity of the population and the huge size of the country. Not least important, however, is the fact that local districts mirror and reinforce separation by class and race. Democratic control, therefore, not only provides support for public education but also creates a forum for the occasional exercise of bigotry and xenophobia; localism not only accommodates community idiosyncrasies but also serves as a barrier to changes in the distribution of students and resources

      This statement discusses how a community's values are often reflected in its schools because residents are able to vote in school elections. It then points out that this can create "bigotry and xenophobia," which I think raises a question for schools in general, not just the public ones: should we embrace differences from other communities, or should we keep things as they are? Now, this is where it gets complicated. Some may ask why people who are not a part of "said" community should be able to make decisions about it. Another question is, if this were possible, how effective would it be? Would we see a sharp positive reaction, or would it eventually decline?

  6. inst-fs-iad-prod.inscloudgate.net inst-fs-iad-prod.inscloudgate.net
    1. But brilliance can come from anywhere. If we insist on class equity in schools, it will come from everywhere

      I think this excerpt can really speak to some people and even help people be more open-minded. There are people out there who may be overlooked, or whose capabilities may surprise you. Everyone deserves a chance. We should have a goal as a society to make things better for all, so that recognizing brilliance doesn't come from social advantage.

    1. In fact, many argue that to truly be just and inclusive, design should not be done by professionals on behalf of the world, but rather done with the world. This need for radical inclusion in design processes comes from designers’ inability, no matter how committed to understanding other people’s perspectives, to accounting for the needs of a community, or the potential unintended consequences of a design on a community.

      I definitely agree with this statement, as the lived experiences and unique needs of a community will always be more in-depth and might present hidden challenges that a designer who has only done research, no matter how extensive, may miss. I think there is a misconception among designers that inclusion is something they can change individually by drawing on more resources and research about a community, but I think it is more of a systemic issue. In general, the principle seems so simple and even beautiful: the designs we make for the world should be made with communities of the world instead of by a select few who use abstracted information about the world. Yet we still struggle with inclusivity in design.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank the reviewers for their general comment and for the critical evaluation of our analyses and results interpretation. Their comments greatly helped us to improve the manuscript.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: An analysis of an Arabidopsis VPS13 presumed lipid transporter is provided. The analysis pretty much follows similar studies done on yeast and human homologs. Key findings are the identification of multiple products from the locus due to differential splicing, analysis of lipid binding and transport properties, subcellular location, tissue-specific promoter activity, and mutant analysis suggesting a role in lipid remodeling following phosphate deprivation, but no physiological or growth defects of the mutants. Major points: The paper is generally well written and documented, and the experiments are well conducted and follow established protocols. The following major points should be considered:

      1. There are complementary lipid binding assays that should be considered such as liposome binding assays, or lipid/western dot blots. All of these might give slightly different results and may inform a consensus. Of course, non-membrane lipids such as TAG cannot be tested in a liposome assay.

      Concerning lipid transfer proteins (LTPs), it is important to differentiate the lipid binding capacity related to the transport specificity (which lipids are transported by a LTP?) from the lipid binding capacity linked to the targeting of a LTP to a specific membrane (a LTP can bind a specific lipid via a domain distinct from the lipid transfer domain to be targeted in cells, but will not transport this lipid). Both aspects are of high interest to be determined. Our goal here was to focus on the identification of the lipids bound to AtVPS13M1 and to be likely transported, which is why we used a truncation (1-335) corresponding to the N-term part of the hydrophobic tunnel. Liposome binding assays and lipid dot blots are necessary to answer the question of the membrane binding capacity of the protein. We think that this aspect is out of the scope of the current article as it will require to express and purify other AtVPS13M1 domains that are known to bind lipids such as the two PH domains and the C2. This will be the scope of future investigations in our lab.

      Similarly, lipid transfer based only on fluorophore-labeled lipids may be misleading because the fluorophore could affect binding. It is mentioned that the protein in this assay is tethered by 3xHis to the liposomes. Unless I am missing something, I do not understand how that should work. This needs to be better explained.

      We fully agree with Reviewer 1 that the presence of a fluorophore could affect lipid binding to the protein. In this assay, lipids are labeled on their polar head, and it is therefore difficult to draw conclusions about the transport specificity of our protein. This assay is used as a qualitative assay to show that AtVPS13M1(1-335) is able to transfer lipids in vitro. In the manuscript, we did not draw any conclusion about transport specificity from this assay, but rather used the binding assay to assess the binding, and likely transport, specificity of AtVPS13M1. The FRET-based assay is well accepted in the lipid transfer community as an easy way to probe lipid transport in vitro and has been used in the past to assess the transfer capacity of different proteins, including VPS13 proteins (for examples, see (Kumar et al., 2018; Hanna et al., 2022; Valverde et al., 2019)).

      To be able to transfer lipids from one liposome to another, both liposomes have to be in close proximity. Therefore, we attached our protein to the donor liposomes, to favor the transport of the fluorescent lipids from the donor to the acceptor liposomes. Then, we progressively increased the acceptor liposome concentration to favor liposome proximity and the chance of lipid transfer. We added a scheme in Figure 3B of the revised version of the manuscript to clarify the principle of the assay. In addition, we provided further control experiments suggested by Reviewers 2 and 3 showing that the fluorescence signal intensity depends on AtVPS13M1(1-335) protein concentration and that no fluorescence increase is measured with a control protein (Tom20.3) (see Figure 3C-D of the revised manuscript).

      The in vivo lipid binding assay could be obscured by the fact that the protein was produced in insect cells and lipid binding occurs during production. What is the evidence that added plant calli lipids can replace lipids already present during isolation?

      Actually, we do not know whether the insect cell lipids initially bound to AtVPS13M1(1-335) are replaced by calli lipids or whether the calli lipids bind to lipid binding sites still available on the protein. But we have two main lines of evidence showing that our purified protein can bind plant lipids even in the presence of insect cell lipids: 1) our protein can bind SQDG and MGDG, two plant-specific lipids, and 2) as explained p.8 (lines 243-254), lipids coming from the two organisms each have a specific acyl-chain composition, with insect cell fatty acids mainly composed of C16 and C18 with 0 or 1 unsaturation, whereas plant lipids can have up to 3 unsaturations. By analyzing and presenting on the histograms lipid species from insect cells, calli and those bound to AtVPS13M1(1-335), we were able to conclude that for all the lipid classes besides PS, a wide range of lipid species deriving from both organisms was bound to our protein. The data about the lipid species bound to AtVPS13M1(1-335) are presented in Figure 2E and S2.

      The effects on lipid composition of the mutants are not very drastic from what I can tell. Furthermore, how does this fit with the lipid composition of mitochondria where the protein appears to be mostly located?

      It is true that lipid composition variations in the mutants are not drastic but still statistically significant. As a general point in the field of lipid transfer, it is not very common to have major changes in total lipidome on single mutants of lipid transfer proteins because of a high redundancy of lipid transport pathway in cells. This is particularly true for VPS13 proteins, as exemplified by multiple studies. Major lipid phenotypes can be revealed in specific conditions, such as phosphate starvation in our case, or when looking at specific organelles or specific tissues and/or developmental stages. This is explained and illustrated by examples in the discussion part p. 16 (line 526-532). In addition, as suggested by Reviewer 3, we performed further lipid analysis on calli and also on rosettes under Pi starvation and found a similar trend (Figure 4 and S4 of the revised version of the manuscript). Thus, we believe that, even if not drastic, these variations during Pi starvation are a real phenotype of our mutants.

      As we found that our protein is located at the mitochondrial surface, we agree that Reviewer 1’s suggestion to perform lipidomic analyses on isolated mitochondria is of high interest, but this will be the scope of future studies that we will perform in our lab. First, we would like to identify all the organelles at which AtVPS13M1 is localized before performing subfractionations of these different organelles from the same pool of cell cultures grown in the presence or absence of phosphate.

      For the localization of the fusion protein, has it been tested whether the fusion is functional? This should be tested (e.g. by reversion of lipid composition).

      As we did not observe major developmental phenotypes in our mutants, complementation should be indeed tested by performing lipidomic analyses in calli or plants grown in presence or absence of Pi, which is a time-consuming and expensive experiment. Because we used the fusions mainly for tissue expression study and subcellular localization and not for functional analyses, we believe that this is not an essential control to be performed for this work.

      It is speculated that different splice forms are located to different compartments. Can that be tested and used to explain the observed subcellular location patterns?

      Indeed, some splice forms can modify the sequence of domains putatively involved in protein localization. This could be tested by producing synthetic constructs with one specific exon organization, which is challenging given the size of the AtVPS13M1 cDNA (around 12 kb). In addition, our long-read sequencing experiment and PCR analyses revealed the existence of six transcripts, a major one representing around 92% and the five others each representing less than 2.5% (Figure 1D). Among the five less abundant transcripts, four produce proteins with a premature stop codon and are likely to arise from splicing defects, as explained in the discussion part p. 15 (lines 488-496). One produces a full-length protein with an additional loop in the VAB domain, but because of the low abundance of this alternative transcript (1.4%), we believe it does not contribute significantly to the major localization we observed in plants and did not attempt to analyze its localization.

      GUS fusion data only probe promoter activity but not all levels of gene expression. That caveat should be discussed.

      We are aware of this drawback, and that is the reason why we fused the GUS enzyme directly to our protein expressed from its native locus (i.e. with endogenous promoter and exons/introns), as depicted in Figure 5A. Therefore, our construct allows direct assessment of the AtVPS13M1 protein level in plant tissues.

      Minor points: 1. Extraplastidic DGDG and export from chloroplasts following phosphate deprivation was first reported in PMID: 10973486.

      We added this reference in the text.

      Check throughout the correct usage of gene expression as genes are expressed and proteins produced.

      Many thanks for this remark; we modified the text accordingly.

      In general, the paper is too long. Redundancies between introduction, results and discussion should be removed to streamline.

      We reduced the text to avoid redundancy.

      I suggest to redraw the excel graphs to increase line thickness and enlarge font size to increase presentation and readability.

      We enlarged the graphs and font sizes as much as we could to increase readability.

      Reviewer #1 (Significance (Required)):

      Significance: Interorganellar lipid trafficking is an important topic and especially understudied in plants. Identifying components involved represents significant progress in the field. Similarly, lipid remodeling following phosphate deprivation is an important phenomenon and the current work advances our understanding.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: The manuscript "AtVPS13M1 is involved in lipid remodelling in low phosphate and is located at the mitochondria surface in plants" by Leterme et al. identifies the protein VPS13M1 as a lipid transporter in Arabidopsis thaliana with important functions during phosphate starvation. The researchers were able to localise this protein to mitochondria via GFP-targeting in Arabidopsis. Although VPS13 proteins are well described in yeast and mammals, highlighting their importance in many vital cellular processes, there is very little information on them in plants. This manuscript provides new insights into plant VPS13 proteins and contributes to a better understanding of these proteins and their role in abiotic stress responses, such as phosphate starvation.

      Major points: - Please describe and define the domains of the VPS13M1 protein in detail, providing also a figure for that. Figure 1 is mainly describing possible splice variants, whereas the characteristics of the protein are missing.

      We have added information on AtVPS13M1 domain organization in the introduction (p.4, lines 103-109) and referred to Figure 1A, which describes the protein domain organization. We did not add too much detail, as the domain organization of plant VPS13 proteins was extensively described in two previous studies cited several times in the manuscript (Leterme et al., 2023; Levine, 2022).

      • Please compare the expression level of VPS13M1 in the presence and in the absence of phosphate.

      Many thanks for this suggestion. We performed qRT-PCR analyses of AtVPS13M1 from mRNA extracted from calli grown six days in presence and absence of phosphate. The results obtained did not reveal variations in mRNA level. The results were added in Figure S1A of the revised version of the manuscript and discussed in p.5 (lines 154-156).

      • Page 9, second paragraph: Here, the lipid transport capability of AtVPS13M1 is described. Varying concentrations of this recombinant protein should be used in this test. Further, it is not highlighted that a truncated version of VPS13M1 is able to transport lipids. This is surprising, since this truncated version is less than 10% of the total protein (only aa 1-335).

      We agree with reviewer 2 that increasing protein concentration is an important control to perform. We included an experiment with an increasing quantity of protein (2X and 4X) in the revised version of the manuscript and showed that the signal intensity increased faster when protein concentration is higher (Figure 3D of the revised manuscript). As requested by Reviewer 3, we also included a negative control with Tom20.3 to show that the signal increase after the addition of AtVPS13M1(1-335) is specific to this protein (Figure 3C of the revised manuscript).

      The transport ability of the N-terminal part of VPS13 was demonstrated in yeast and in mammalian VPS13D (Kumar et al., 2018; Wang et al., 2021). We highlighted this p. 7 (lines 213-218) of the revised version of the manuscript. This is explained by the inherent structure of VPS13 proteins, which are composed of several repeats of the same domain type called RBG (for repeating β-groove), each forming a β-sheet with a hydrophobic surface. The higher the number of RBG repeats, the longer the hydrophobic tunnel is. The (1-335) N-terminal region corresponds to two RBG unit repeats forming a “small” tunnel able to bind and transfer lipids. The number of RBG repeats influences the quantity of lipids bound per protein in vitro: the longer the protein, the higher the number of lipid molecules bound (Kumar et al., 2018). However, to the best of our knowledge, the effect of protein length on in vitro lipid transfer capacity has not been investigated yet.

      • Also, for phenotype analysis, T-DNA insertion mutants are used that still contain VPS13M1 transcripts. Although protein fragments were not detected by proteomic analysis, this might be due to low sensitivity of the proteomic assay. Further, the lipid transport domain of VPS13M1 (aa 1-335) might not be affected by the T-DNA insertions at all. Here, more detailed analysis needs to be done to prove that loss of protein function indeed occurs in the mutants.

      We do not have methods other than proteomics to test whether our mutants are KO or not. We tried unsuccessfully to produce antibodies. Mass spectrometry is the most sensitive method, but the absence of detection indeed does not mean the absence of the protein. From the proteomic data, we can conclude that our mutants at least present a decrease in AtVPS13M1 protein level; thus, we called them “knock down” in the revised version of the manuscript and added the following sentence p. 9 (lines 297-300): “As the absence of detection of a protein by mass spectrometry-based proteomics does not allow us to strictly claim the absence of this protein in the sample, we concluded that AtVPS13M1 expression in both atvps13m1-1 and atvps13m1-4 was below the detection limit and consider them as knock down (KD) for AtVPS13M1.”

      • Localisation in mitochondria: As the Yepet signal is very weak, a control image of not transfected plant tissue needs to be included. Otherwise, it might be hard to distinguish the Yepet signal from background signal. The localisation data presented in Figure 5 does not allow the conclusion that VPS13M1 is localized at the surface of mitochondria as stated in the title. It only indicates (provided respective controls see above) that VPS13M1 is in mitochondria. Please provide more detailed analysis such as targeting to tobacco protoplasts, immunoblots or in vitro protein import assays. Also test +Pi vs. -Pi to see if VPS13M1 localisation is altered in dependence of Pi.

      Indeed, our Yepet signal is not very strong, but in the experiments we performed on non-transformed Col0 plants, we rarely saw background fluorescence in the vascular tissue of leaves, which is why we focused our study on this tissue. We sometimes observed background signals in some cells that are clearly different from AtVPS13M1-3xYepet signals and never co-localized with mitochondria. Examples of these non-specific signals are presented in Figure S6E of the revised version of the manuscript.

      We agree with reviewer 2 that our confocal images suggested, but did not demonstrate, a localization at the surface of mitochondria. To confirm the localization, we generated calli cell cultures from AtVPS13M1-3xYepet lines and performed subcellular fractionations and western blot analyses, confirming that AtVPS13M1 was indeed enriched in mitochondria and also in microsomal fractions (Figure 6G of the revised version). Then we performed mild proteolytic digestion of the isolated mitochondria with thermolysin and showed that AtVPS13M1 was degraded, like the outer membrane protein Tom20.3 but unlike the inner membrane protein AtMic60, showing that AtVPS13M1 is indeed at the surface of mitochondria (Figure 5H of the revised manuscript). We believe that this experiment, in addition to the confocal images showing a signal around mitochondria, convincingly demonstrates that AtVPS13M1 is located at the surface of mitochondria.

      The localization of AtVPS13M1 under Pi starvation is a very important question that we tried to investigate without success. We intended to perform confocal imaging on seedlings grown in liquid media, in which Pi starvation is easy to apply, as described for the analysis of AtVPS13M1 tissue expression with β-glucuronidase constructs. However, the level of background fluorescence was very high in seedlings and no clear differences between non-transformed and AtVPS13M1-3xYepet lines were observed, even in root tips, where the protein is supposed to be the most highly expressed according to β-glucuronidase assays. Examples of images obtained are presented in Figure R1. We concluded that the level of expression of our construct was too low in seedlings. Constructing lines with a higher AtVPS13M1 expression level, by changing the promoter, in order to better analyze AtVPS13M1 localization in different tissues or in response to Pi starvation will be the scope of future work in our laboratory.

      Phenotype analysis needs to be done under Pi stress and not under cold stress! Further, root architecture and root growth should also be done under Pi depletion. Here the title is also misleading, it is not at all clear why the authors switch from phosphate starvation to cold stress.

      In the revised version of the manuscript, we analyzed seedling root growth of two mutants (atvps13m1-3 and m1-4) under low Pi and did not notice significant differences (Figure 7E, S7D of the revised version). We had analyzed growth under cold stress because this stress also promotes lipid remodeling, but we agree that it goes beyond the scope of this article, which is focused on Pi starvation, and we removed this part from the revised manuscript.

      Minor points: Page 3, line 1: what does the abbreviation VPS stand for?

      The definition of VPS (Vacuolar Protein Sorting) was added.

      Page 3, line 1: change "amino acids residues" to "amino acid residues"

      This was done.

      Page 3, line 8 - 12: please rewrite this sentence. You write, that because of their distribution VPS13 proteins do exhibit many important physiological roles. The opposite is true: They are widely distributed in the cell because of their involvement in many physiological processes.

      We changed the sentence to “ VPS13 proteins localize to a wide variety of membranes and membrane contact sites (MCSs) in yeast and human (Dziurdzik and Conibear, 2021). This broad distribution on different organelles and MCSs is important to sustain their important roles in numerous cellular and organellar processes such as meiosis and sporulation, maintenance of actin skeleton and cell morphology, mitochondrial function, regulation of cellular phosphatidylinositol phosphates level and biogenesis of autophagosome and acrosome (Dziurdzik and Conibear, 2021; Hanna et al., 2023; Leonzino et al., 2021).”

      Page 6, line 6: change "cDNA obtained from A. thaliana" to "cDNA generated from A. thaliana".

      This was done.

      Page 6, line 10: change "7.6kb" to "7.6 kb"

      This was done.

      Page 7: address this question: can the isoforms form functional VPS13 proteins? This might help to postulate whether these isoforms are a result of defective splicing events.

      We addressed this aspect in the discussion p.15 at lines 486-502.

      Figure 2 B: Change "AtVPS13M1" to "AtVPS13M1(1-335)"

      This was done.

      Figure 2, legend: -put a blank before µM in each case.

      This was done.

      -Change 0,125µM to 0.125 µM

      This was done.

      -what does "in absence (A-0µM)" mean?

      This means that the Acceptor liposomes are at 0 µM. To clarify, we changed it to “Acceptor 0 µM” in the revised version of the manuscript (Figure 3C).

      -Which statistical analysis was employed?

      We performed a non-parametric Mann-Whitney test in the revised version of the manuscript. This was indicated in the legend.

      -Further, rewrite the sentence "Mass spectrometry (MS) analysis of lipids bound to AtVPS13M1(1-335) or Tom20 (negative control) after incubation with calli total lipids. Results are expresses in nmol of lipids per nmol of proteins (C) or in mol% (D)". -"C" and "D" are not directly comparable, as in "C" no Tom20 was used and in "C" no insect cells were used.

      -Further, in "D" the experimental setup is not clear. AtVPS13(1-335) is supposed to be purified protein after incubation with calli lipids (figure 2, A). Further, in the same figure, lipid composition of "insect cells" and "calli-Pi" are compared → why? Please clarify this.

      C and D are two different representations of the same results providing different types of information. In C, the results are expressed in nmol of lipids / nmol of proteins to assess 1) whether the level of lipids found in AtVPS13M1(1-335) purifications is significantly higher than what we can expect from the background (assessed using Tom20) and 2) which classes of lipids associate or not with AtVPS13M1(1-335). In D, the lipid distribution in mol% is presented for AtVPS13M1(1-335) as well as for total extracts from calli and insect cells, so that one can see whether a lipid class is particularly enriched or not in AtVPS13M1(1-335) purifications compared to the initial extracts with which the protein was incubated. As an example, it allows us to deduce that the absence of DGDG in the AtVPS13M1(1-335) purifications is not linked to a low level of DGDG in the calli extract, because it represented around 15 mol%, but likely to a weak affinity of the protein for this lipid. We did not represent the Tom20 lipid distribution on this graph because it reflects background lipid binding to the purification column and might suggest that Tom20 binds lipids. We changed the legend in this way and hope that it is clearer now: “C-D. Mass spectrometry (MS) analysis of lipids bound to AtVPS13M1(1-335) or Tom20 (negative control) after incubation with calli total lipids and repurification. Results are expressed in nmol of lipids per nmol of proteins in order to analyze the absolute quantity of the different lipid classes bound to AtVPS13M1(1-335) compared to Tom20 negative control (C), and in mol% to assess the global distribution of lipid classes in AtVPS13M1(1-335) purifications compared to the total lipid extract of insect cells and calli (D).”
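      The difference between the two representations can be illustrated with a minimal sketch. The lipid amounts and the protein quantity below are made-up numbers chosen for the arithmetic only, and the function names are hypothetical; none of them come from the manuscript.

```python
def per_protein(lipid_nmol, protein_nmol):
    """Absolute representation: nmol of each lipid class per nmol of protein."""
    return {cls: amount / protein_nmol for cls, amount in lipid_nmol.items()}


def mol_percent(lipid_nmol):
    """Relative representation: share of each lipid class in the total, in mol%."""
    total = sum(lipid_nmol.values())
    return {cls: 100.0 * amount / total for cls, amount in lipid_nmol.items()}


# Hypothetical values (assumption, not data from the study).
bound_lipids = {"PC": 6.0, "PE": 3.0, "PS": 1.0}  # nmol recovered with the protein
protein_amount = 5.0                              # nmol of purified protein

print(per_protein(bound_lipids, protein_amount))
print(mol_percent(bound_lipids))
```

      The first output depends on how much protein was purified and can be compared against a background control, whereas the second sums to 100 and can only be compared against the distribution in another extract.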

      Figure 3: -t-test requires a normal distribution of the data. This is not possible for an n=3. Please use an adequate analysis.

      We performed more replicates and used non-parametric Mann-Whitney analyses in the revised version of the manuscript.
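      Why more replicates matter for a rank-based test can be shown with a short sketch of an exact two-sided Mann-Whitney test. This is an illustration written in plain Python under our own assumptions (hypothetical function names, made-up sample values), not the analysis pipeline used for the manuscript.

```python
from itertools import combinations


def mann_whitney_u(x, y):
    """U statistic for x: number of (xi, yj) pairs with xi > yj; ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def exact_two_sided_p(x, y):
    """Exact two-sided p-value obtained by enumerating every way of
    splitting the pooled observations into groups of the same sizes."""
    pooled = list(x) + list(y)
    n, m = len(x), len(y)
    center = n * m / 2.0
    observed = abs(mann_whitney_u(x, y) - center)
    total = extreme = 0
    for idx in combinations(range(n + m), n):
        chosen = set(idx)
        gx = [pooled[i] for i in chosen]
        gy = [pooled[i] for i in range(n + m) if i not in chosen]
        total += 1
        if abs(mann_whitney_u(gx, gy) - center) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Made-up, perfectly separated samples (assumption, not data from the study).
print(exact_two_sided_p([1, 2, 3], [4, 5, 6]))                # 2/20 = 0.1
print(exact_two_sided_p([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]))   # 2/252, about 0.0079
```

      With three replicates per group, even complete separation of the two groups cannot give a two-sided p-value below 0.1, so a significance threshold of 0.05 is only reachable with more replicates.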

      -Please clarify the meaning of the letters on the top of the bars in the legend.

      This corresponded to the significance of t-tests performed in the first version of the manuscript that were reported in Table S3. As in the new version we performed Mann-Whitney tests, we highlighted the significance by stars and in the figure legends.

      Please, make it clear that two figures belong to C.

      This was clarified in the legend.

      -Reorganise the order of figure 3 (A→B→C→D)

      Because of the configuration of the different histograms presented in the figure, we were not able to change the order, but we believe that the graphs can be easily read this way.

      Page 10, 3. Paragraph: since the finding that no peptides were found in the VPS13M1 ko lines, although transcription was not altered, is surprising, please include the proteomic data in the supplement

      Proteomic data were deposited on PRIDE with the identifier PXD052019. They will remain not publicly accessible until the acceptance of the manuscript.

      Page 11, line 17: The in vitro experiments showed a low affinity of VPS13M1 towards galactolipids. It is further claimed that this is consistent with the finding in the AtVPS13M1 ko line in vivo that, in the absence of Pi, no change in DGDG content could be observed. However, the "absence" of VPS13M1 in vivo might still result in a bigger VPS13M1 protein than the truncated form (1-335) used for the in vitro experiments

      It is true that our in vitro experiments were performed only with a portion of AtVPS13M1 and that the length of the protein could influence protein binding specificity. We removed this assessment from the manuscript.

      Page 13, line 8: you should reconsider the use of a triple Yepet tag: If two or more identical fluorescent molecules are in close proximity, their fluorescence emission is quenched, which results in a weak signal (as the one that you obtained). See: Zhuang et al. 2000 (PNAS) Fluorescence quenching: A tool for single-molecule protein-folding study

      Many thanks for pointing out this paper. We used a triple Yepet because AtVPS13M1 has a very low level of expression and because this strategy was used successfully to visualize proteins for which the signal was below the detection level with a single GFP (Zhou et al., 2011). The quenching of the 3xYepet might also depend on the conformation the three copies adopt on the tagged protein.

      Page 13, line 14: change 1µm to 1 µm

      This was done.

      Page 13, line 29: please reduce the sentence to the first part: if A does not colocalize with B, it is not necessary to mention that B does not colocalise with A.

      The sentence was modified accordingly.

      Page 14, 2. Paragraph: it is not clear why the phenotype analysis is suddenly conducted with plants under cold stress, since everything so far was about Pi starvation and the role of VPS13M1. Lipid remodelling under Pi stress completely differs from the lipid remodelling under cold stress.

      We eliminated this part in the revised version of the manuscript.

      Page 14, line 20: change figure to Figure

      This was done.

      Page 07, line 17: change artifact to artefact

      This was done.

      Reviewer #2 (Significance (Required)):

      General assessment: The paper is well written and technically sound. However, some points could be identified, that definitely need a revision. Overall, we got the impression that so far, the data gathered are still quite preliminary and need some more detailed investigations prior to publication (see major points).

      Advance: The study definitely fills a gap of knowledge since not much is known on the function of plant VPS13 proteins so far.

      Audience: The study is of very high interest to the plant lipid community but as well of general interest for Plant Molecular Biology and intracellular transport.

      Our expertise: Plant membrane transport and lipid homeostasis.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The manuscript by Leterme et al. (2024) describes the characterization of VPS13M1 from Arabidopsis. VPS13 proteins have been analyzed in yeast and animals, where they establish lipid transfer connections between organelles, but not much is known about VPS13 proteins in plants. First, different splicing forms were characterized, and the form A was identified as the most relevant one with 92% of the transcripts. The protein (just N-terminal 335 amino acids out of ca. 3000 amino acids) was expressed in insect cells and purified. Next, the protein was used for lipid binding assays with NBD-labeled lipids followed by analysis in polyacrylamide gel electrophoresis. VPS13M1 bound to PC, PE, PS and PA. Then, the protein from insect cells was incubated with Arabidopsis callus lipids, and lipids bound to VPS13M1 analyzed by LC-MS/MS. Lipid transfer between liposomes was measured by the change in fluorescence in donor liposomes derived from two labeled lipids after addition of the protein caused by lipid transfer and dilution to acceptor liposomes. T-DNA insertion mutants were isolated and the lipids measured in callus derived from these mutants. Protein localization in different plant organs was recorded with a GUS fusion construct transferred into transgenic plants. The protein was localized to mitochondria using a VPS13M1-Yepet fusion construct transferred into mutant plants. The mutant plants show no visible difference to wild type, even when the plants were grown under stress conditions like low temperature. The main message of the title is that VPS13M1 localizes to the mitochondria which is well documented, and it is involved in lipid remodeling under low phosphate conditions.

      The lipid transfer assay shown in Figure 2F lacks a negative control. This would be the experiment with donor and acceptor liposomes in the presence of another protein like Tom20.

      Many thanks for this suggestion. In the revised version of the manuscript, we performed a fluorescent lipid transport assay with Tom20.3 in the presence of 25 µM of donor liposomes and 1.5 mM of acceptor liposomes, the condition for which we observed maximal transport for AtVPS13M1(1-335). As expected, no fluorescence increase was observed. The results are presented in Figure 3C of the revised manuscript.

      The lipid data (Fig. 3 and Fig. S4) do not sufficiently support the second claim, i.e. that the protein is involved in lipid remodeling under low P. Data in Fig. 3C are derived from only 3 replicates and in Fig. S4 from only 2 replicates with considerable error bars. Having only 2 replicates is definitely not sufficient. Fig. 3C shows a suppression in the decrease in PE and PC at 4 d of P deprivation (significant for two mutants for PE, for only one for PC). Fig. S4A shows suppression of the decrease in PC at 6 d after P deprivation (significant for both mutants), but no significant effect on PE. Fig. S4B shows no significant change in PE or PC at -P after 8 d of P deprivation. The data are not consistent. There are also problems with the statistics in Fig. 3 and Fig. S4. The authors used a t-test, but placed letters a, b, c on top of the bars. Usually, asterisks should be used to indicate significant differences. Data indicate medians and ranges, not mean and SD. In Fig. S4, how can you indicate median and range if you have only 2 replicates? Why did the authors use callus for lipid measurements? Why not use leaves and root tissues? What does adjusted nmol mean? What does the dashed line at 1.05 on the y axis mean? Taken together, I suggest repeating the lipid measurements with leaves and roots from plantlets grown under +P and -P conditions in tissue culture with 5 replicates. Significant differences can be analyzed on the level of absolute (nmol per mg FW/DW) or relative (%) amounts.

      Here are our answers to concerns about the design of our lipidomics experiments:

      We used calli for lipid measurements because it is very easy to control growth conditions and to perform phosphate starvation in these cell cultures. The second reason is that callus is a non-photosynthetic tissue with a high level of phospholipids and a low level of galactoglycerolipids, which makes it easier to monitor changes in the phospholipid/galactoglycerolipid balance. The lipid analyses on calli at 4 days of growth in the presence or absence of Pi were performed on 3 biological replicates but on two different mutants (atvps13m1-1 and m1-3), and we drew our conclusions based on variations that were significant for both mutants. In the revised version of the manuscript, we performed further lipidomic analyses on calli from Col0 and another mutant (atvps13m1-2) after 6 days of growth in the presence or absence of Pi (Figure 4E, S4A-C, n=4-5) and added new data on a photosynthetic tissue (rosettes) from Col0 and the atvps13m1-3 mutant. For the rosette analysis, seeds were germinated for 4 days on plates with 1 mM Pi and then transferred to plates with 1 mM or 5 µM Pi. Rosettes were harvested and lipids analyzed after 6 days (Figure 4F-G, S4D, n=4-5). All the data are represented with medians and ranges because we believe that the median is less sensitive to extreme values than the mean and might better represent what is occurring. Ranges show the minimal and maximal values of the data analyzed, and we believe they give a representative view of the variability we obtained between biological samples.
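The argument that a median is less sensitive to extreme values than a mean can be illustrated with a minimal sketch. The replicate values below are hypothetical and chosen only for illustration; they are not data from the manuscript, and `summarize` is a helper name introduced here.

```python
from statistics import mean, median

def summarize(values):
    """Return (median, minimum, maximum, mean) for one group of replicates."""
    return median(values), min(values), max(values), mean(values)

# Hypothetical replicate measurements (arbitrary units), not data from the study.
replicates = [10.0, 11.0, 12.0, 13.0]
with_outlier = replicates + [40.0]  # same group plus one extreme value

print(summarize(replicates))
print(summarize(with_outlier))
```

With the extra extreme value, the median moves only from 11.5 to 12.0, whereas the mean moves from 11.5 to about 17.2; the range (minimum to maximum) makes the outlier visible.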

      Lipid measurements are done by mass spectrometry. As already reported, mass spectrometry quantification is not trivial because the intensity of the response depends on the nature of the molecule (for a review, see Jouhet et al., 2024). To counteract this ionisation problem, we developed a method with an external standard that we called Quantified Control (QC), corresponding to an A. thaliana callus lipid extract for which the precise lipid composition was determined by TLC and GC-FID. All our MS signals were “adjusted” to the signal of this QC as described in Jouhet et al. (2017). Therefore, our lipid measurements are in adjusted nmol. In the Materials and Methods, we modified the sentence accordingly (p22, lines 720-723): “Lipid amounts (pmol) were adjusted for response differences between internal standards and endogenous lipids and by comparison with a quality control (QC).” This makes it possible to represent all the lipid classes on the same graph and to estimate the distribution of lipid classes. To assess the significance of our results, we used non-parametric Mann-Whitney tests in the revised version of the manuscript and added stars representing the p-value on the charts. This is indicated in the figure legends.
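For readers unfamiliar with the Mann-Whitney test mentioned above, the following is a minimal sketch of an exact two-sided version for small groups. It is not the authors' analysis code, which is not shown in the response; the function names and the example values are assumptions made for illustration, and the p-value is obtained by enumerating every relabeling of the pooled data.

```python
from itertools import combinations

def mann_whitney_u(x, y):
    """U statistic for x: number of (xi, yj) pairs with xi > yj; ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)

def exact_two_sided_p(x, y):
    """Exact two-sided p-value from all ways of splitting the pooled data."""
    pooled = list(x) + list(y)
    n, m = len(x), len(y)
    center = n * m / 2.0  # expected U under the null hypothesis
    observed = abs(mann_whitney_u(x, y) - center)
    extreme = total = 0
    for idx in combinations(range(n + m), n):
        chosen = set(idx)
        gx = [pooled[i] for i in chosen]
        gy = [pooled[i] for i in range(n + m) if i not in chosen]
        total += 1
        # Count relabelings at least as far from the center as the observed U.
        if abs(mann_whitney_u(gx, gy) - center) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Hypothetical values (arbitrary units), not data from the manuscript.
wild_type = [1.0, 2.0, 3.0, 4.0]
mutant = [5.0, 6.0, 7.0, 8.0]
p_value = exact_two_sided_p(wild_type, mutant)
print(mann_whitney_u(wild_type, mutant), p_value)
```

With two groups of four and complete separation, only 2 of the 70 possible splits are as extreme as the observed one, so the smallest attainable two-sided p-value is 2/70 (about 0.029). This is one reason the number of replicates matters for a rank-based test.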

      Here are our answers to concerns about the interpretation of our lipidomics experiments:

      To summarize, in the revised version of the manuscript, lipid analyses were performed in calli from 3 different mutants (two at day 4, one at day 6) and in the rosettes from one of these mutants. All the results are presented in Figure 4 and S4. In all the experiments, we found that in +Pi, there are no major modifications in the lipid content or composition. In –Pi, we found that the total glycerolipid content is always higher in the mutant compared to Col0, whatever the tissue or mutant considered (Figure 4A and S4A, D). In calli, this higher lipid content is mainly due to an accumulation of phospholipids and, in rosettes, of galactolipids. Because of high variability between our biological replicates, we did not always find significant differences in the absolute amount of lipids in –Pi. However, the analysis of the fold change in lipid content in –Pi vs +Pi always pointed toward a reduced extent of phospholipid degradation. We also added the fold change for the total phospholipid and total galactolipid contents to these graphs in the revised version of the manuscript. We believe that the new analyses we performed strengthen our conclusion that AtVPS13M1 plays a role in phospholipid degradation and not in the recycling of precursor backbones to feed galactoglycerolipid synthesis at the chloroplast envelope.
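The fold-change comparison described above can be sketched as follows. The response does not state exactly how the fold change was computed, so taking the ratio of group medians is an assumption, and all numbers are hypothetical values chosen for illustration.

```python
from statistics import median

def fold_change(minus_pi, plus_pi):
    """Ratio of the -Pi median to the +Pi median (assumed definition)."""
    return median(minus_pi) / median(plus_pi)

# Hypothetical phospholipid amounts (adjusted nmol), not data from the study.
col0_plus, col0_minus = [100.0, 110.0, 90.0], [40.0, 50.0, 45.0]
mutant_plus, mutant_minus = [100.0, 105.0, 95.0], [70.0, 75.0, 80.0]

fc_col0 = fold_change(col0_minus, col0_plus)
fc_mutant = fold_change(mutant_minus, mutant_plus)
print(fc_col0, fc_mutant)
```

In this made-up example the wild type retains 45% of its phospholipids under –Pi and the mutant 75%, which is the kind of pattern that would indicate a reduced extent of phospholipid degradation in the mutant.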

      Page 9, line 15: Please use the standard form of abbreviations of lipid molecular species with colon, e.g. PC32:0, not PC32-0

      The lipid species nomenclature has been changed accordingly.

      Page 11, line 4, (atvps13m1.1 and m1.3: please indicate the existence of mutant alleles with dashes, i.e. (atvps13m1-1 and atvps13m1-3

      Names of the mutants have been changed accordingly.

      Page 14, line 21: which line is indicated by atvps13m1.2-4? What does -4 indicate here?

      This indicates that mutants m1-2 to m1-4 were analyzed.

      Page 16, line 25: many abbreviations used here are very specific and not well known to the general audience e.g. ONT, IR, PTC, NMD etc. I think it is OK to mention them here, but still use the full terms, given that they are not used very frequently in the manuscript.

      We kept the ONT abbreviation because it is cited many times in both the Results and Discussion sections. IR, PTC and NMD were cited only in the Discussion, and these abbreviations were removed.

      Page 19, line 11. The authors cite Hsueh et al and Yang et al for LPTD1 playing a role in lipid homeostasis during P deficiency. But Yang et al. described the function of a SEC14 protein in Arabidopsis and rice during P deficiency. Is SEC14 related to LPTD1?

      Many thanks for noticing this mistake. We removed the citation Yang et al. in the revised version of the manuscript.

      Reference Tangpranomkorn et al. 2022: In the text, it says that this is a preprint, but in the Reference list, this is indicated with "Plant Biology" as Journal. On the internet, I could only find this manuscript in bioRxiv.

      This manuscript was accepted in “New Phytologist” in December 2024 and is now cited accordingly in the new version of the manuscript.

      Reviewer #3 (Significance (Required)):

      The manuscript by Leterme et al. describes the characterization of the lipid binding and transport protein VPS13M1 from Arabidopsis. I think that the liposome assay needs to be done with a negative control. Furthermore, I have major concerns with the lipid data in Fig. 3C and Fig. S4. These lipid data of the current manuscript need to be redone. I do not agree that the lipid data allow the conclusion that "AtVPS13M1 is involved in lipid remodeling in low phosphate" as stated in the title.

      References cited in this document:

      Dziurdzik, S.K., and E. Conibear. 2021. The Vps13 Family of Lipid Transporters and Its Role at Membrane Contact Sites. Int J Mol Sci. 22:2905. doi:10.3390/ijms22062905.

      Hanna, M., A. Guillén-Samander, and P. De Camilli. 2023. RBG Motif Bridge-Like Lipid Transport Proteins: Structure, Functions, and Open Questions. Annu Rev Cell Dev Biol. 39:409–434. doi:10.1146/annurev-cellbio-120420-014634.

      Hanna, M.G., P.H. Suen, Y. Wu, K.M. Reinisch, and P. De Camilli. 2022. SHIP164 is a chorein motif lipid transfer protein that controls endosome–Golgi membrane traffic. Journal of Cell Biology. 221:e202111018. doi:10.1083/jcb.202111018.

      Jouhet, J., E. Alves, Y. Boutté, S. Darnet, F. Domergue, T. Durand, P. Fischer, L. Fouillen, M. Grube, J. Joubès, U. Kalnenieks, J.M. Kargul, I. Khozin-Goldberg, C. Leblanc, S. Letsiou, J. Lupette, G.V. Markov, I. Medina, T. Melo, P. Mojzeš, S. Momchilova, S. Mongrand, A.S.P. Moreira, B.B. Neves, C. Oger, F. Rey, S. Santaeufemia, H. Schaller, G. Schleyer, Z. Tietel, G. Zammit, C. Ziv, and R. Domingues. 2024. Plant and algal lipidomes: Analysis, composition, and their societal significance. Progress in Lipid Research. 96:101290. doi:10.1016/j.plipres.2024.101290.

      Jouhet, J., J. Lupette, O. Clerc, L. Magneschi, M. Bedhomme, S. Collin, S. Roy, E. Maréchal, and F. Rébeillé. 2017. LC-MS/MS versus TLC plus GC methods: Consistency of glycerolipid and fatty acid profiles in microalgae and higher plant cells and effect of a nitrogen starvation. PLoS ONE. 12:e0182423. doi:10.1371/journal.pone.0182423.

      Kumar, N., M. Leonzino, W. Hancock-Cerutti, F.A. Horenkamp, P. Li, J.A. Lees, H. Wheeler, K.M. Reinisch, and P. De Camilli. 2018. VPS13A and VPS13C are lipid transport proteins differentially localized at ER contact sites. J Cell Biol. 217:3625–3639. doi:10.1083/jcb.201807019.

      Leonzino, M., K.M. Reinisch, and P. De Camilli. 2021. Insights into VPS13 properties and function reveal a new mechanism of eukaryotic lipid transport. Biochimica et Biophysica Acta (BBA) - Molecular and Cell Biology of Lipids. 1866:159003. doi:10.1016/j.bbalip.2021.159003.

      Leterme, S., O. Bastien, R.A. Cigliano, A. Amato, and M. Michaud. 2023. Phylogenetic and Structural Analyses of VPS13 Proteins in Archaeplastida Reveal Their Complex Evolutionary History in Viridiplantae. Contact (Thousand Oaks). 6:1–23. doi:10.1177/25152564231211976.

      Levine, T.P. 2022. Sequence Analysis and Structural Predictions of Lipid Transfer Bridges in the Repeating Beta Groove (RBG) Superfamily Reveal Past and Present Domain Variations Affecting Form, Function and Interactions of VPS13, ATG2, SHIP164, Hobbit and Tweek. Contact. 5:251525642211343. doi:10.1177/25152564221134328.

      Valverde, D.P., S. Yu, V. Boggavarapu, N. Kumar, J.A. Lees, T. Walz, K.M. Reinisch, and T.J. Melia. 2019. ATG2 transports lipids to promote autophagosome biogenesis. J Cell Biol. 218:1787–1798. doi:10.1083/jcb.201811139.

      Wang, J., N. Fang, J. Xiong, Y. Du, Y. Cao, and W.-K. Ji. 2021. An ESCRT-dependent step in fatty acid transfer from lipid droplets to mitochondria through VPS13D−TSG101 interactions. Nat Commun. 12:1252. doi:10.1038/s41467-021-21525-5.

      Zhou, R., L.M. Benavente, A.N. Stepanova, and J.M. Alonso. 2011. A recombineering-based gene tagging system for Arabidopsis. Plant J. 66:712–723. doi:10.1111/j.1365-313X.2011.04524.x.

    1. Abstract

      Background: Cardamine chenopodiifolia is an amphicarpic plant that develops two fruit morphs, one above and the other below ground. Above-ground fruit disperse their seeds by explosive coiling of the fruit valves, while below-ground fruit are non-explosive. Amphicarpy is a rare trait that is associated with polyploidy in C. chenopodiifolia. Studies into the development and evolution of this trait are currently limited by the absence of genomic data for C. chenopodiifolia.

      Results: We produced a chromosome-scale assembly of the octoploid C. chenopodiifolia genome using high-fidelity long read sequencing with the Pacific Biosciences platform. We successfully assembled 32 chromosomes and two organelle genomes with a total length of 597.2 Mbp and an N50 of 18.8 Mbp (estimated genome size from flow cytometry: 626 Mbp). We assessed the quality of this assembly using genome-wide chromosome conformation capture (Omni-C) and BUSCO analysis (99.8% genome completeness). Additionally, we conducted synteny analysis to infer that C. chenopodiifolia likely originated via allo- rather than auto-polyploidy and phased one of the four sub-genomes.

      Conclusions: This study provides a draft genome assembly for C. chenopodiifolia, which is a polyploid, amphicarpic species within the Brassicaceae family. This genome offers a valuable resource to investigate the under-studied trait of amphicarpy and the origin of new traits by allopolyploidy.

      Reviewer 1. Rie Shimizu

      This manuscript deciphers the complicated genome of an octoploid species, Cardamine chenopodiifolia. They successfully assembled a chromosome-level genome with 32 chromosomes, consistent with the chromosome counting. They evaluated the quality of the genome by several methods (mapping Omni-C reads, BUSCO, variant calling etc.). All benchmarks ensured the high quality of their assembly. They even tried to phase the chromosomes into four subgenomes, and one subgenome was successfully phased thanks to its higher divergence compared to the other three sets. Despite their intensive effort, the other three subgenomes could not be phased, suggesting the relationship originated from the same or closely related species. As a whole, the manuscript is very well written and describes enough details, and the genome data looks like it is already available in a public database. They even added a description of the biological application of this assembly about the amphicarpy.

      I only found a few minor points for which I kindly suggest reconsideration/rephrasing before publication, as listed below. *As the review PDF does not contain the line numbers, I suggest the original description at the first line and then write my comments.

      - C. chenopodiifolia genome is octoploid …, suggesting that its genome is octoploid. They compare the 8C peak of C. hirsuta and the 2C peak of the target, but considering the genome size variation among Cardamine species, I do not think this is an appropriate expression. The pattern may be ‘consistent’ with the expectation from the C. hirsuta peaks but does not ‘suggest’ octoploidy.

      - C. chenopodiifolia chromosome-level genome assembly PacBio Sequel II platform. Neither here nor elsewhere do they mention the mode of sequencing (it is only found in the Methods and the title of a table). Maybe ‘HiFi’ could be added here to make the method clearer.

      - Table 2. It would give a better overview of the genome quality if the N90 and L90 values (or similar, if the assembly is already fragmented at L90) were added (maybe the same for Table 1). Otherwise, Nx curves would also be fine for the same purpose.

      - We obtained only 20800 variants,…as expected for a selfing species. This might be partially due to selfing in the wild habitat, but also to selfing (5 times) in the lab. This should be mentioned here to avoid misleading readers.

      - Table 4. The unit of each item (bp, number, frequency…?) should be specified.

      In addition to the points listed above, I would appreciate more information about the phased chromosome set: what are the total subgenome sizes of this set and of the other three sets (1:3 or imbalanced)? It would be even better with a synteny plot in addition to the colinear plot in Fig 3C (e.g. by GENESPACE or something similar, including the phased and unphased chenopodiifolia chromosome sets and C. hirsuta).
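The N50/N90 and L50/L90 statistics requested for Table 2 follow one definition, sketched below. The sequence lengths are hypothetical and `nx_lx` is a helper name introduced here; integer arithmetic is used to avoid floating-point edge cases at the threshold.

```python
def nx_lx(lengths, percent):
    """Return (Nx, Lx) for a list of contig or scaffold lengths.

    Nx is the length of the sequence at which the cumulative sum, taken from
    longest to shortest, first reaches `percent` % of the total assembly
    length; Lx is the number of sequences needed to get there.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running * 100 >= percent * total:
            return length, count
    raise ValueError("lengths must be non-empty")

# Hypothetical sequence lengths (e.g. in Mbp), not taken from the assembly.
lengths = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = nx_lx(lengths, 50)
n90, l90 = nx_lx(lengths, 90)
print(n50, l50, n90, l90)
```

Calling the function for every percentage from 1 to 100 yields the points of an Nx curve, the alternative presentation mentioned in the comment.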

      Reviewer 2. Qing Liu

      This manuscript, “Polyploid genome assembly of Cardamine chenopodiifolia”, produced a chromosome-scale assembly of the octoploid C. chenopodiifolia genome using high-fidelity long read sequencing with the Pacific Biosciences platform, with two organelle genomes, a total length of 597.2 Mb and an N50 of 18.8 Mb, together with BUSCO analysis (99.8% genome completeness), and phased one of the four sub-genomes. This study provides a valuable resource to investigate the understudied trait of amphicarpy and the origin of new traits by allopolyploidy. The manuscript is suitably edited and presents significant data for amphicarpy breeding of C. chenopodiifolia, except for the revision points below. A major revision is suggested for the current version of the manuscript.

      1. Please clarify whether “an N50 of 18.8 Mb” is the contig or scaffold N50 length.

      2. Please spell out “originated via allo- rather than auto-polyploidy” as “originated via allopolyploidy rather than autopolyploidy”.

      3. Please replace “understudied trait” with a more suitable term.

      4. For “to phase this set of chromosomes by gene tree topology analysis”, “to phase this set of chromosomes by gene phylogeny analysis” is suggested.

      5. In the first section of the Results, “Cardamine chenopodiifolia genome is octoploid” is suggested.

      6. Could Table 1 and Table 2 be combined into one table presenting the sequencing and assembly characterization of the C. chenopodiifolia genome?

      7. Could the centromere locations be predicted in Table 5, which is the summary of the 32 chromosomes of the C. chenopodiifolia genome?

      8. In Table 2, the assembly comprises 32 chromosomes plus two organelle genomes, which are not closely related to the C. chenopodiifolia genome; from my point of view, the two organelle genome assemblies are not a critical section of the manuscript.

      9. Could all figure numbers be placed below each group of figures? For example, the figure below should be numbered before Figure 2A (following the order in which the grouped figures appear). I assume it is Figure 2; the authors want to show the chromosome number 2n=42, but I cannot count 42 chromosomes in the present format. Could the authors use an alternative, clearer figure to show the cytological evidence for the C. chenopodiifolia chromosome number?

      10. In Figure 5A, it is difficult to see the meaning of the first-diverged chromosome from the gene tree: is it a phylogenetically meaningful tree or just a framework? Could the authors redraw Figure 5A so that readers can understand what is meant?

      Reviewer 3. Kang Zhang.

      The paper produced a chromosome-scale assembly of the C. chenopodiifolia genome in the Brassicaceae family, and offers a valuable resource to investigate the understudied trait of amphicarpy and the origin of new traits by allopolyploidy. I have the following comments which can be considered to improve the ms.

      Major points.

      1. The introduction states that Cardamine is among the largest genera within the Brassicaceae family. The octoploid model species C. occulta and the diploid C. hirsuta have been sequenced. Therefore, I propose that a description of the evolutionary relationships among various species be included here. Additionally, the significance of the amphicarpic trait in the study of plant evolution and adaptation could be highlighted when discussing their octoploid characteristics.

      2. The paper omits a detailed description of genome annotation and significant genomic features, which are essential for clearly illustrating the characteristics of the genome. To enhance this aspect, it would be beneficial to include a circular chart that displays fundamental components such as gene density, GC content, TE density, and collinearity links, among others.

      3. The authors employed various techniques to differentiate the four subgenomic sets within the C. chenopodiifolia genome and ultimately managed to isolate a single sub-genomic set. The paper references the assembly of the octoploid genome of another model plant, C. occulta, within the same genus. Could it be used for comparison with C. chenopodiifolia to achieve improvements? In addition, I suggest that the authors examine the gene density differences among these subgenomes, which could be helpful in distinguishing them.

      4. Little important information is included in Tables 1 and 3 and Figure 4. These tables and figures should be moved to the Supplementary data.

      5. Evidence from the Hi-C heatmap should be provided to validate the structural variations among different sets of subgenomes, such as those in Figure 3.

      Minor points.

      1. Figure 5B: please change the vertical coordinate ‘# gene pairs’ to ‘Number of gene pairs’. The fonts in some figures are a little small. I suggest adjusting them to make them easier to read.

    1. Abstract

      The number of high-quality genomes is rapidly growing across taxa. However, it remains limited for coral reef fish of the Pomacentrid family, with most research focused on anemonefish. Here, we present the first assembly for a Pomacentrid of the genus Chrysiptera. Using PacBio long-read sequencing with a coverage of 94.5x, the genome of the Sapphire Devil, Chrysiptera cyanea, was assembled and annotated. The final assembly consisted of 896 Mbp across 91 contigs, with a BUSCO completeness of 97.6%. 28,173 genes were identified. Comparative analyses with available chromosome-scale assemblies for related species identified contig-chromosome correspondences. This genome will be useful as a comparison to study the specific adaptations linked to the symbiotic life of the closely related anemonefish. Furthermore, this species is present in most tropical coastal areas in the Indo-West Pacific and could become a model for environmental monitoring. This work will help expand coral reef research efforts and highlights the power of long-read assemblies to retrieve high-quality genomes.

      This work has been published in GigaByte Journal under a CC-BY 4.0 license (https://doi.org/10.46471/gigabyte.144). These reviews are as follows.

      Reviewer 1. Darrin T. Schultz

      Are all data available and do they match the descriptions in the paper?

      No. The genome is also not yet on NCBI, but it would be good to upload it.

      Are the data and metadata consistent with relevant minimum information or reporting standards?

      Yes. I suggest later that there should be more information about the HiFi library preparation details, as the manuscript lacks them and it appears to be a non-standard (large insert size) library.

      Is the data acquisition clear, complete and methodologically sound?

      No. See above comment.

      Is there sufficient detail in the methods and data-processing steps to allow reproduction?

      No. No parameters are provided for the genome assembly software, for read trimming, or for other software used.

      Is there sufficient data validation and statistical analyses of data quality?

      No. See extended comments - the read data could use more QC, as well as the genome assembly.

      Is the validation suitable for this type of data?

      No.

      Is there sufficient information for others to reuse this dataset or integrate it with other data?

      Yes. There is a degree of information missing about the data, but another researcher could use them for their study.

      Additional Comments:

      Thank you for the opportunity to review the work, The genome of the sapphire damselfish Chrysiptera cyanea: a new resource to support further investigation of the evolution of Pomacentrids, by Gairin and colleagues. In this manuscript, the authors collect an individual of the pomacentrid fish, Chrysiptera cyanea, in Okinawa, Japan. After isolating DNA, the sequencing center at OIST prepared and sequenced a SMRT sequencing library. Additionally, the authors generated some bulk RNA-seq data and sequenced it on the Illumina platform. The authors assembled the genome with two assemblers, and performed some comparisons of the C. cyanea contigs aligned to the chromosome-scale scaffolds of closely related pomacentrids. Given my background, I will mostly comment on the genomic analyses.

      I appreciate the authors' diligence in exploring different genome assembly methods and their efforts in running BUSCO and QUAST to QC the assemblies. The DNA sequencing data and assembly produced contigs that align well with the chromosomes of closely related species (which is convenient for comparative genomics!), and the manuscript presents a solid foundation for better understanding the chromosomal evolutionary history of the Pomacentridae.

      While this work represents an important step toward providing a new genomic resource for Chrysiptera cyanea, I see a few areas where the manuscript could be refined to enhance it as a community resource:

      (1) More information about data generation: Including additional details about the HiFi library preparation, specifically the chemistries used, the number of SMRT cells sequenced, and the bioinformatics steps used to generate the HiFi reads, would improve the manuscript's clarity and reproducibility. I have some questions regarding whether these libraries were prepared for HiFi sequencing: the reported mean read length of 25kbp is 10kbp longer than the standard HiFi library insert size; and the reported amount of bases in the reads, 84 Gbp, is more data than one would expect from a single CCS-processed SMRT cell, but could be the amount of data produced from one CLR run. Characterizing the quality score vs read length distribution could be helpful to characterize the read data. Clarifying these steps taken before the genome was assembled would strengthen the reliability of these reads as a resource.

      (2) Incorporating a few more important quality control (QC) steps would better clarify the completeness of the genome assembly. For instance, an estimate of genome size from the HiFi reads could be performed with jellyfish and GenomeScope, taking advantage of the k-mer fidelity of HiFi reads. This would provide a more conclusive estimate than the current comparison. Additionally, steps such as checking for contamination and providing an explanation for decisions like haplotig removal would make the assembly process more transparent. Lastly, supplementing the QC analysis with Merqury will provide a reliable answer to how complete the assembly represents the information in the individual HiFi reads in a way that complements BUSCO and QUAST.
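The k-mer based genome size estimate suggested in point (2) rests on a simple relation: genome size is roughly the total number of k-mers divided by the k-mer coverage at the main peak of the k-mer spectrum. The sketch below illustrates only that relation. It is not how jellyfish or GenomeScope work internally (GenomeScope fits a mixture model, and jellyfish is usually run on canonical k-mers); the function names, the error cutoff, and the toy histogram are assumptions made for illustration.

```python
from collections import Counter

def kmer_histogram(reads, k):
    """Map multiplicity -> number of distinct k-mers seen that many times."""
    counts = Counter(
        read[i:i + k] for read in reads for i in range(len(read) - k + 1)
    )
    return Counter(counts.values())

def estimate_genome_size(histogram, min_multiplicity=2):
    """Crude estimate: total k-mers divided by the multiplicity of the peak.

    K-mers below `min_multiplicity` are treated as sequencing errors and
    ignored (a simplifying assumption).
    """
    usable = {m: n for m, n in histogram.items() if m >= min_multiplicity}
    peak = max(usable, key=usable.get)  # most common multiplicity
    total_kmers = sum(m * n for m, n in usable.items())
    return total_kmers / peak

# Toy spectrum: 1000 error k-mers seen once, and a coverage peak at 10x.
toy_histogram = {1: 1000, 9: 50, 10: 100, 11: 50}
print(estimate_genome_size(toy_histogram))
```

On real data the peak choice is the delicate step: in a heterozygous or polyploid genome several peaks appear, which is why a model-based tool is preferable to this simple ratio.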

      (3) The initial analyses of chromosome structure are a promising look into some yet-unexplored chromosomal changes in the Pomacentridae, and I think that incorporating a deeper phylogenetic analysis would build on this strength. Situating the chromosomal findings within a phylogenetic framework could provide stronger support, or actually resolve, the evolutionary interpretations presented. Doing this analysis likely could also help resolve whether the structures seen are genome misassemblies, or instead reflect lineage-specific chromosomal changes. The authors could supplement their beautiful figures using other tools that leverage whole-genome alignments and chromosome visualization to help answer these questions. One tool to try for two-genome comparisons, that the authors may have explored already in place of their ggplot script, is D-GENIES.

      Overall, this is a valuable resource, and I commend the authors for taking the steps to analyze the chromosomal evolutionary history within the pomacentrids. I look forward to seeing the authors’ future contributions to the field of genomics and chromosome evolution.

      Minor Points Line 125: Sharing the specific Trimmomatic settings used would enhance the reproducibility of the RNA-seq data processing. The parameters for genome assembly should also be added. Line 212: Are there any replicates for the RNA-seq data? Line 294: Consider uploading the assembly to NCBI for broader visibility and accessibility.

      Reviewer 2. Yue Song.

      Are all data available and do they match the descriptions in the paper?

      No. The authors have provided clues for accessing the data in public databases such as NCBI, but it seems that the data has not been released; At least, I haven't been able to obtain available data using the provided accession number (e.g. PRJNA1167451). I'm not sure if I've missed any information, but I believe it would be better if the data could be easily accessible to the public.

      Is the data acquisition clear, complete and methodologically sound?

      No. The authors used PacBio's third-generation sequencing technology for genome sequencing, which has become a "necessary option" for obtaining high-quality genomes in current genomic research. However, they did not further advance on the path of "assembling a chromosome-level genome" based on this version. Providing a chromosome-level genome would likely be more meaningful.

      Is there sufficient detail in the methods and data-processing steps to allow reproduction?

      No. Regarding the genome assembly and annotation process, the method described by the authors is overly simplistic and lacks detailed information on the parameters and procedures used. This makes it difficult for other researchers to effectively replicate the results described in the article.

      Is there sufficient data validation and statistical analyses of data quality?

      No. The authors have calculated the N50 of contigs and the completeness of BUSCO genes, which are indeed two commonly used indicators for assessing the quality of genome assemblies. However, it is still challenging to gain a clear understanding of the assembly quality based solely on these two indicators. Could other measurements be added, such as comparing the continuity and completeness of the assembly with those of closely related species or other comparable species' genomes? Additionally, there is a point that is difficult to understand: the authors report a BUSCO completeness of approximately 94% for the genome, yet a BUSCO completeness of 97% for the gene set. It is puzzling how BUSCO genes that are not annotated in the genome can still be present in the gene set.

      Is there sufficient information for others to reuse this dataset or integrate it with other data?

      No. As I mentioned earlier, the authors did not provide detailed information about the processing procedures and parameters, which makes it difficult for other researchers to replicate their results.

      Additional Comments: It is recommended that the authors provide a detailed description of the methods and easily accessible data retrieval methods. It would be even better if the authors could further provide a chromosome-level genome, as T2T (telomere-to-telomere) level genomes are becoming increasingly popular.

    1. Reviewer #2 (Public review):

      Summary:

      This paper by Misra and Pessoa uses switching linear dynamical systems (SLDS) to investigate the neural network dynamics underlying threat processing at varying levels of proximity. Using an existing dataset from a threat-of-shock paradigm in which threat proximity is manipulated in a continuous fashion, the authors first show that they can identify states that each has their own linear dynamical system and are consistently associated with distinct phases of the threat-of-shock task (e.g., "peri-shock", "not near", etc). They then show how activity maps associated with these states are in agreement with existing literature on neural mechanisms of threat processing, and how activity in underlying brain regions alters around state transitions. The central novelty of the paper lies in its analyses of how intrinsic and extrinsic factors contribute to within-state trajectories and between-state transitions. A final set of analyses shows how the findings generalize to another (related) threat paradigm.
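For readers new to the method, the following is a minimal generic sketch of a switching linear dynamical system, not the model fitted in the paper. A discrete state follows a Markov chain, and each state has its own linear dynamics acting on a continuous variable, with an intrinsic term (depending on the current value) and an extrinsic term (depending on an external input). The scalar form, all parameter names, and all values are assumptions made for illustration.

```python
import random

def simulate_slds(steps, A, V, b, P, inputs, x0=0.0, z0=0, noise_sd=0.1, seed=0):
    """Simulate a scalar switching linear dynamical system.

    A[z]: intrinsic dynamics, V[z]: input weight, b[z]: offset for state z.
    P[z]: transition probabilities out of state z.
    Returns the lists of discrete states and continuous values.
    """
    rng = random.Random(seed)
    x, z = x0, z0
    states, xs = [], []
    for t in range(steps):
        states.append(z)
        xs.append(x)
        # Linear update under the current state's parameters.
        x = A[z] * x + V[z] * inputs[t] + b[z] + rng.gauss(0.0, noise_sd)
        # Markov switch to the next discrete state.
        z = rng.choices(range(len(P)), weights=P[z])[0]
    return states, xs

# Two hypothetical states with different dynamics and sticky transitions.
states, xs = simulate_slds(
    steps=200,
    A=[0.9, 0.5], V=[0.0, 1.0], b=[0.0, 1.0],
    P=[[0.95, 0.05], [0.10, 0.90]],
    inputs=[1.0 if t >= 100 else 0.0 for t in range(200)],
)
print(states[:10], xs[:3])
```

Fitting such a model to data means inferring the state sequence and the per-state parameters jointly, which is where the question of separating intrinsic from extrinsic contributions arises.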

      Strengths:

      The analyses for this study are conducted at a very high level of mathematical and theoretical sophistication. The paper is very well written and effectively communicates complex concepts from dynamical systems. I am enthusiastic about this paper, but I think the authors have not yet exploited the full potential of their analyses in making this work meaningful toward increasing our neuroscientific understanding of threat processing, as explained below.

      Weaknesses:

      (1) I appreciate the sophistication of the analyses applied and/or developed by the authors. These methods have many potential use cases for investigating the network dynamics underlying various cognitive and affective processes. However, I am somewhat disappointed by the level of inferences made by the authors based on these analyses at the level of systems neuroscience. As an illustration, consider the following quotations from the abstract: "The results revealed that threat processing benefits from being viewed in terms of dynamic multivariate patterns whose trajectories are a combination of intrinsic and extrinsic factors that jointly determine how the brain temporally evolves during dynamic threat" and "We propose that viewing threat processing through the lens of dynamical systems offers important avenues to uncover properties of the dynamics of threat that are not unveiled with standard experimental designs and analyses". I can agree with the claim that we may be able to better describe the intrinsic and extrinsic dynamics of threat processing using this method, but what contribution does this make toward understanding these processes?

      (2) How sure can we be that it is possible to separate extrinsically and intrinsically driven dynamics?

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Response to Reviewers

      We thank the reviewers for their fair and thorough review. Both reviewers’ comments largely focused on a misunderstanding. For unclear reasons, both reviewers thought that most of the data shown was from pregnant or previously pregnant mice, and both requested a significant amount of preliminary data regarding virgin mice (R1 comments #3 and #4, R2 comment #1). This may be due to a (now-corrected) typo in the Results section, despite the Methods section being correct, or to the very few instances in which pregnant mice were used for analyses. As mentioned below, the entire manuscript evaluates virgin mice, with a few specific exceptions, so the preliminary revisions emphasize the parity status of the mice used in every experiment. We regret that this misunderstanding happened, and we are concerned that it may have biased the reviews towards a negative viewpoint. We hope the completed preliminary revisions (indicated in red text in the manuscript) and the planned revisions will together satisfy the reviewers’ concerns and clarify points of confusion, while leading to a greatly improved manuscript.


      Reviewer 1:

      Major points

      1. Major Comment 1: “Several of the conclusions are made based on a limited number of replicates (often n=3) which is not a robust sample size to make a rigorous conclusion.”

      We have consulted with a biostatistician (Adam Lane, now included in acknowledgements) and plan to add at least 3 more mice per group to bring the total sample size to 6-7. Given our results are already statistically significant with an n=3-4, we do not anticipate any changes in the overall results of our data. We have already collected at least 3 more age-matched and parity-matched mice per group for the molecular analyses and are working on performing the immunohistochemical stains, western blots, etc.


      Major Comment 2: The main text for Figure 1C mentions repression of luciferase expression by doxycycline chow, however the figure does not show any discernable repression in the Dek-OE conditions.

      We believe the reviewer may have misinterpreted the figure. The mouse on the far left (“control”) with no luciferase signal is the dox chow-repressed condition. We have revised the figure label to specify that “Control” is the “+dox condition” and throughout the manuscript have specified “+dox controls” instead of just “controls.”


      Major Comment 3: To evaluate the impact of prolonged Dek overexpression on mammary epithelium in Figure 1G and 1H, the authors used multiparous females. One confounding factor with this experimental setup is the impact of previous pregnancies on the development of the mammary epithelium and in lowering tumorigenesis. Therefore, the impact of Dek on tumorigenesis cannot be determined in multiparous animals alone. To get a full picture, nulliparous animals should also be examined.

      We have revised the text on page 6 to explain that we have monitored tumor growth in both aged virgins and in multiparous mice (our female breeders) and neither group develops tumors.

      Major Comment 4: “To elucidate the molecular underpinning of Dek-OE phenotypes, the authors performed bulk RNA sequencing in Figure 2. Similar to point 2 however, only multiparous animals were used. As it has been previously shown that pregnancy significantly impacts the transcriptome of mammary glands, the effects of Dek overexpression can't be generalized to mammary glands as a whole. To make it generalizable, nulliparous Dek-OE animals must also be characterized.”

      As mentioned in the introduction to this response, the reviewer has misunderstood the experimental design, perhaps because of a single typo in the Results section when the Methods were correct, or because of poor writing on our part. Regardless, the RNA-Seq, whole mounts, and all subsequent molecular validations were conducted on virgin mice. The only exceptions are in Figure 4, where we do explore the expression of endogenous Dek during pregnancy and the impact of pregnancy in the transgenic model. We have corrected the typographical error, confirmed the parity status of all mice in the study to date, and have specifically added the parity status to each experiment in the Results section and/or Figure Legend.


      Major Comment 5: To validate findings from their transcriptomics work, the authors used IHC and western blots of candidate proteins that were found to be down regulated. In Figure 3A and 3C, the decrease in p21 protein levels through western blot seem much more modest than what the decrease seen in 3A would suggest.

      We thank the reviewer for pointing this out. With increased sample sizes, as requested, we hope this will resolve. We plan to increase sample size and quantify the p21 western blot to potentially resolve the concern. In addition, we would like to note that the p21 IHC is specific for mammary epithelium signal while the western blot is whole mammary gland lysate that includes quiescent stromal cells, which may explain the slight discrepancy between the two methods.

      Major Comment 6: In Figure 3G-3I, the authors test the CDK4/6 inhibitor palbociclib to establish a direct link between the phenotypes seen in Dek-OE and cell cycle progression in organoid culture. Have the authors verified these findings with treatment of Dek-OE mice with palbociclib? In addition, have the authors checked to see if palbociclib corrected any of the transcriptional features associated with the Dek-OE model found in their transcriptomics data? In addition, the authors claim that the effect is specific to Dek-OE organoids as the effects of palbociclib on growth are not seen in control organoids. However, the data on unperturbed growth of control cells are not seen. To determine the specificity of the effects of palbociclib on Dek-OE derived organoids, the authors must show a time course tracking the growth of organoids with and without palbociclib. Rather than conclude the effects of palbociclib being specific to Dek-OE organoids, the authors most likely wanted to conclude that the increased growth of Dek-OE organoids compared to control organoids is dependent on the increase in cell cycle factors. (The validity of this is also weird though because even if division and growth were triggered through other transcriptional changes they found, like increased metabolism, growth in that scenario would be stopped by palbo as well)

      1. Because the hyperplasia phenotype accumulates over the lifetime of the animal, the amount of treatment time required to abrogate the hyperplasia phenotype could be from days to weeks to months. For this reason, we believe it is outside the scope of this revision to test the effects of palbociclib in vivo.
      2. We plan to re-do this experiment with palbociclib treatment to test organoid growth over time as suggested and, time permitting, perform immunofluorescence for some of the transcription targets such as cyclins, CDKs, Ki67, and p27/p21.
      3. We have revised the text on page 8 to say “We observed that the increased growth of Dek over-expressing organoids was dependent on the Dek-induced increase in CDK4/6, since palbociclib treatment resulted in smaller Dek over-expressing organoids that were comparable to organoids from +dox controls.” We also agree that CDK inhibitor treatment may impact multiple downstream signaling pathways. However, we do not see this as a negative because cell proliferation, induced by cyclin/CDK complexes, requires metabolic regulation to support physical growth of the cell. The two processes are intricately integrated and have a bidirectional relationship. Thus, it is possible that DEK induces both processes, or it may only promote one process (i.e., cell cycle) and the other one (i.e., metabolism) is induced as a secondary result of cell cycle demands. This is one reason why we indicate in the Discussion section that metabolic dysregulation should be further studied. Indeed, a colleague in the DEK field (Susanne Wells) is already working on the relationship between DEK over-expression and metabolic dysfunction, thus this particular aspect of the request is outside the scope of this manuscript.

      Major Comment 7: In the main text of Figure 4, the authors conclude that markers for luminal hormone sensing cells were unchanged in Dek-OE mammary glands, however the data to show this is not shown. This is problematic because the authors are directly drawing the conclusion that Dek-OE specifically upregulates luminal alveolar markers using this data.

      We have revised the manuscript to add a new supplementary figure (now Fig S4) that includes a western blot for HER2 and ERα and a summary of RNA expression data from the bulk RNA-Seq experiment. We will also perform additional western blots to increase the sample size to demonstrate this negative data as part of our planned


      Major Comment 8: In Figure 7, the authors look at a conditional knockout of Dek and conclude that pup death in the knockout was due to insufficient milk production by dams. While the authors establish that H3K27me3 and Ezh2 expression are abrogated, morphological analysis of the ducts is missing and would present convincing data. For instance, in the Dek conditional knockout, are luminal alveolar cells unable to differentiate fully, or are there far fewer? Decreased levels of histone modifications do not tell you much about whether repressive chromatin has changed its landscape in Dek KO mice, which is actually what influences transcription.

      We plan to add histological and whole mount imaging of Dek knockout mammary glands in the revision. We have preliminary data that supports this from 2 mice and will be collecting more samples for the revision. However, as noted in Fig 7C-D, heterozygous females also have small litter sizes and this will pose a breeding challenge for generating knockout females for this experiment in a timely manner.

      Minor Points:

      All figures need some sort of reformatting. Several of the conclusions are made based on a limited number of replicates (often n=3) which is not a robust sample size to make a rigorous conclusion. Many figures have text that is stretched. Histology and whole mount images are missing scale bars. IHC quantifications are obscure - what is an optical density? How many animals were analyzed and how many fields of vision were captured? Figure 2F is absolutely impossible to understand. Neither figures nor legends disclose the number of animals or samples analyzed. The statistical test utilized across all figures is not appropriate. Fig5B GSEA plots are missing statistical significance, and without this information one cannot properly assess the relevance of the findings. Fig5C - how were co-expressed genes defined? Is this just random genes that are expressed in cells that have higher levels of DEK? The term co-expressed suggests a specific type of analysis that would investigate linkage of expression between genes, which I don't think is the case here.

      As the reviewer already mentioned in major comment #1, there was a concern with sample size, which we addressed above in the planned revisions. We believe this concern about sample size was the rationale for the minor comment that “The statistical test utilized across all figures is not appropriate.” We have consulted with a biostatistician, Adam Lane PhD, who has confirmed that our statistical approaches were correct but were limited by our sample size. Thus, we do not agree with the reviewer’s view of the statistical analyses. We have revised the text to include sample size information in figure legends and statistical significance information for GSEA plots in Fig 5. With regard to figure text being stretched, it does not look like that in our version of the document and reviewer 2 did not comment on this, so we would like the reviewer to identify a specific instance of this. We plan to capture images with size bars for IHC while we are performing the additional sample size collection. The reviewer asked about the number of fields of view for IHC quantification and we would like to note that our methods section already had that information in the first submission, “at least 3 fields of view from at least 3 different mice per group.” Our methods section also already had information regarding the identification of co-expressed genes in scRNA-Seq data and quantifying IHC with ImageJ. However, we have revised the text to add some clarifying sentences that we hope help the reviewer better understand our methods. Finally, we are not sure what is “absolutely impossible to understand” about Fig 2F, which is a network visualization of functional enrichment analyses for differentially expressed genes in our RNA-Seq data. Is the text too small, or does the reviewer not understand the network? We would appreciate it if the reviewer could clarify this concern in their next review.


      Minor Point 1: Throughout, it would be better to indicate the genotype of the "Control" animals on each figure so that the rigor of the experiment can be evaluated fully.

      It appears that the reviewer was not aware that all controls were the same genotype and were the bitransgenic mice on dox chow. We have revised the manuscript to better clarify that “controls” = “+dox chow” bitransgenics and have added text on page 5 to directly state this. We have also revised Fig 1C to specify that the mouse with no luciferase signal is the “+dox” control.


      Minor Point 2: Standard nomenclature for gene names and protein names should be corrected throughout the text.

      We have revised the text to confirm gene and protein names are correct. We have followed convention in using italics for gene names, non-italics for protein names, all capital letters for human genes/proteins (e.g., DEK) and only first letter capitalization for non-human gene names (e.g., Dek).


      Minor Point 3: Similar to the point above, the use of Dek-OE to either refer to the mouse model or function as an acronym for "Dek overexpression" is inconsistent throughout the text.

      We thank the reviewer for pointing out this inconsistency and we have revised the text so that the “-OE” notation is only used when discussing the mice and have changed to writing out “over-expression” for function.


      Minor Point 4: In the main text for Figure 4I-J, the authors state that DEK was previously published as an ERα target gene, however there is no citation to support this.

      We have revised the text to include this citation, which is:

      16. Privette Vinnedge, L.M., et al., The DEK Oncogene Is a Target of Steroid Hormone Receptor Signaling in Breast Cancer. PLoS One, 2012. 7(10): p. e46985.

      Minor Point 5: It is unclear what the conclusions drawn from the experiments shown in Figure 4G-H and Figure 4I-J mean with respect to the goal of Figure 4, which was to show that Dek-OE mice have an expanded luminal alveolar compartment.

      We have revised the text to better explain that we were investigating the impact of ovarian hormones and pregnancy on endogenous Dek expression in wild-type mice, since this information has not been previously reported and adds context to our study.


      Minor Point 6: Optical density was used to quantify IHC experiments, which was performed using color deconvolution in ImageJ. Something that is unclear is whether the authors are measuring density in the entire field of view, or if the authors are measuring optical density per cell. This has implications for whether there are more cells expressing the protein of interest, or if the existing cells are expressing a higher level of the protein of interest.

      We have revised the text to include more information in the methods. The Methods section now states: “Image J color deconvolution was utilized to measure the staining intensity only within mammary epithelial cells from at least 3 fields of view from at least 3 different mice per group. Specifically, cross-sections of similarly sized ducts were outlined such that only the collective epithelial cells within that cross section were measured, removing background signal from the stromal cells. Only single cross-sections of ducts were analyzed to minimize the impact of epithelial hyperplasia in experimental mice compared to controls fed dox chow.”


      Minor Point 7: In the main text for Figure 6D, the system being used to overexpress DEK protein is not described. It is not the same genetic system as is used in the Dek-OE mice, as doxycycline is inducing Dek expression.

      We have revised the Figure 6 legend to specify “DEK over-expression was accomplished with a dox-inducible pTRIPZ vector while DEK knockdown was accomplished with a pLKO.1 shRNA vector” and we kindly point the reviewer to the Methods section (“human cell lines” subsection) as written in the first submission, which included detailed information for the subcloning of DEK cDNA into the pTRIPZ vector.


      Reviewer 2

      All comments

      1. Comment 1: This study would be improved by sharing important data including virgin mammary gland development in the DEK-OE and DEK-KO models (ductal growth and branching) and the expression of markers including ESR1, PGR, and ERBB2 (data not shown, page 8). Although there may be no differences, this is important data to share regarding the goal of this study. For example, in the DEK-OE model, data are only evaluated in the aged/multiparous stage and in the DEK-KO model, data are only evaluated during lactation. Furthermore, the DEK-KO model resembles germline DEK loss (under control of the CMV promoter), and there is limited validation of a MEC-intrinsic function.

      We have revised the manuscript to include data on Esr1/ERα and Erbb2/Her2 by western blot in new Fig S4 as well as the bulk RNA-Seq mRNA levels (by FPKM) for select basal and hormone sensing cell populations. The concern regarding parity was also mentioned by Reviewer 1 (major comments 3 and 4 above). Briefly, we have clarified that nearly all data in the manuscript is from nulliparous (virgin) females and have revised the text throughout to more clearly state this fact. We have also revised the text to address the limitation of the CMV promoter. The Discussion section now states “However, it is noted that one weakness of this CMV-Cre knockout model, is that there is a constitutive loss of Dek, which limits the interpretation for mammary epithelial cell-specific Dek functions.”

      Comment 2: Another major concern with this manuscript is the use of immunohistochemistry (IHC) and bulk mammary gland lysate western blots. IHC is non-quantitative, and the images are low resolution. For example, using IHC DEK expression is observed in all MECs (control and DEK-OE mice, Figure 1F), however, in the scRNAseq data DEK expression is confined to basal cells and a subset of stem/progenitor cells (Figure 5A). Furthermore, the hyperplasia in the DEK-OE model will bias bulk analysis (such as western blot and RNAseq) towards increased expression of MEC markers.

      1. We have revised the text to point out that IHC images for Dek in control tissues show some cells have higher expression than others, which is what would be predicted by scRNA-Seq. The text now states on page 16 “The scRNA-Seq data suggests that Dek is more highly expressed in specific subpopulations of cells, and the variable intensity of immunohistochemical staining for Dek in epithelial cells within control mouse tissue supports this (see Fig 3I, 4I, 4K, and 7H).” Furthermore, on page 10 in the Results section we have revised the text to state “The mammary gland undergoes substantial hormone-induced remodeling across the murine lifespan. We show that Dek is highest during pregnancy and minimally expressed during lactation and involution (Fig 4K), and that Dek protein expression is not uniform across all epithelial cells in wild-type glands (Fig 3I, 4I-K). This suggests that certain epithelial subpopulations express more Dek than others.”
      2. We acknowledge that IHC and western blots are only semi-quantitative, which is why we attempt to perform both as orthogonal approaches or find additional ways to support our findings throughout the manuscript (e.g., co-expression at the RNA level from other sources, small molecule inhibitor treatment, etc.). We also note that these methods are used to validate the quantitative method of RNA-Seq, and (often) validation of differentially expressed genes can be limited by antibody availability and the applications those antibodies are suitable for.
      3. We also have revised the text to acknowledge that we knew the bulk RNA-Seq would be biased towards the hyperplastic cells. We wanted to take advantage of that bias to identify a gene signature that could be used to determine which cell type was leading to the hyperplasia phenotype. We used the differentially expressed genes to identify biomarkers for specific cell populations. On pages 6-7 the text now reads “We performed bulk RNA sequencing on whole mammary tissue from two +dox control and two Dek-OE adult virgin females at 15 months of age to discover molecular targets regulated by Dek over-expression and to reveal a gene signature that could help identify the expanded cell population(s) in hyperplastic glands.” And “DEGs were plotted as a heatmap and ontologies for biomarkers of cell populations were defined to help identify the expanded cell population driving Dek-induced hyperplasia.”

      Comment 3: A third major concern is the mechanistic link between DEK and H3K27me3. Most of the data are correlative and rely on bulk analysis or IHC. For example, in the DEK-OE organoid model, is there an increase in H3K27me3? Additionally, in the DEK-OE organoids, can loss of EZH2 block the increased cell proliferation?

      We plan to revise the manuscript to include an experiment in which we treat primary mammary epithelial cell organoids from Dek-OE mice with the EZH2 inhibitor GSK-126, +/- doxycycline, to test for a mechanistic or functional link between DEK and H3K27me3 levels. We will then determine organoid size and attempt molecular characterization with IF. This will support the biochemical studies in Fig 6 showing DEK interacts with the PRC2 complex.


    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      In this manuscript, the authors recorded cerebellar unipolar brush cells (UBCs) in acute brain slices. They confirmed that mossy fiber (MF) inputs generate a continuum of UBC responses. Using systematic and physiological trains of MF electrical stimulation, they demonstrated that MF inputs either increased or decreased UBC firing rates (UBC ON vs. OFF) or induced complex, long-lasting modulation of their discharges. The MF influence on UBC firing was directly associated with a specific combination of metabotropic glutamate receptors, mGluR2/3 (inhibitory) and mGluR1 (excitatory). Ultimately, the amount and ratio of these two receptors controlled the time course of the effect, yielding specific temporal transformations such as phase shifts.

      Overall, the topic is compelling, as it broadens our understanding of temporal processing in the cerebellar cortex. The experiments are well-executed and properly analyzed.

      Strengths:

      (1) A wide range of MF stimulation patterns was explored, including burst duration and frequency dependency, which could serve as a valuable foundation for explicit modeling of temporal transformations in the granule cell layer.

      (2) The pharmacological blockade of mGluR2/3, mGluR1, AMPA, and NMDA receptors helped identify the specific roles of these glutamate receptors.

      (3) The experiments convincingly demonstrate the key role of mGluR1 receptors in temporal information processing by UBCs.

      Weaknesses:

      (1) This study is largely descriptive and represents only a modest incremental advance from the previous work (Guo et al., Nat. Commun., 2021). 

      We feel that the present study is a major advance. It builds on (Guo et al., Nat. Commun., 2021), in which we examined the effects of bursts of 20 stimuli at 100 spk/s. In that study we found that differential expression of mGluR1 and mGluR2 led to a continuum of temporal responses in UBCs, but that AMPARs make a minimal contribution for such bursts. It was not known how UBCs transform realistic mossy fiber input patterns. Here we provide a comprehensive evaluation of a wide range of input patterns, including bursts of 1-20 stimuli and sustained stimulation at 1 spk/s to 60 spk/s. This more thorough assessment of UBC transformations, combined with a pharmacological assessment of the contributions of different glutamate receptor subtypes, provided many new insights:

      • We found that UBC transformations are comprised of two different components: a slow temporally filtered component controlled by an interplay of mGluR1 and mGluR2, and a second component mediated by AMPARs that can convey spike timing information. NMDARs do not make a major contribution to UBC firing. The finding that UBCs simultaneously convey two types of signals, a slow filtered response and responses to single stimuli, has important implications for the computational potential of UBCs and fundamentally changes the way we think about UBCs.  

      • We found that with regard to the slow filtered component mediated by mGluR1 and mGluR2, we could extend the concept of a continuum of responses evoked by 20 stimuli at 100 spk/s (Guo et al., Nat. Commun., 2021) to a wide range of stimuli. It was not a given that this would be the case.   

      • The contribution of AMPARs was surprising. Even though snRNAseq data did not reveal a gradient of AMPAR expression across the population of UBCs (Guo et al., Nat. Commun., 2021), we found that there was a gradient of AMPA-mediated responses, and that the AMPA component was also most prominent in cells with a large mGluR1 component. Our finding that AMPAR accessory proteins exhibit a gradient across the population, which could account for the gradient of AMPAR responses, will prompt additional studies to test their involvement.

      (2) The MF activity used to mimic natural stimulation was previously collected in primates, while the recordings were conducted in mice.

      Our first task was to determine the firing properties of mossy fibers under physiological conditions in UBC rich cerebellar regions. Previous studies have estimated this in anesthetized mice using whole cell granule cell recordings (Arenz et al., 2008; Witter & De Zeeuw 2015). However, for assessing firing patterns during awake behavior, we felt that the most comprehensive data set available in a UBC rich cerebellar region was for mossy fibers involved in smooth pursuit in monkeys (David J. Herzfeld and Stephen G. Lisberger). This revealed the general features of mossy fiber firing that helped us design stimulus patterns to thoroughly probe the properties of MF to UBC transformations. The firing patterns are designed to investigate the transformations for a wide range of activity patterns and have important general implications for UBC transformations that are likely applicable to UBCs in different species that are activated in different ways.   

      (3) Inhibition was blocked throughout the study, reducing its physiological relevance.

      The reviewer correctly brings up the very important issue of inhibition in shaping UBC responses. It is well established that UBCs are inhibited by Golgi cells (Rousseau et al., 2012), and we recently showed that some UBCs are also inhibited by PCs (Guo et al., eLife, 2021). This will undoubtedly influence the firing of UBCs in vivo. We considered examining this issue, but felt that brain slice experiments are not well suited to this. In contrast to MF inputs, which can be activated with a realistic activity pattern, it is exceedingly difficult to know how Golgi cells and Purkinje cells are activated under physiological conditions. Each UBC is activated by a single mossy fiber, but inhibition is provided by Golgi cells that are activated by many mossy fibers and granule cells, and by PCs that are controlled by many granule cells and many other PCs. In addition, we found that many Golgi cells do not survive very well in slices, and the axons of many PCs are severed in brain slices. Although limitations of the slice preparation prevent us from determining the role of inhibition in shaping UBC responses, we have added a section to the discussion in which we address the important issue of inhibition and UBC responses.

      Reviewer #2 (Public review):

      This study addresses the question of how UBCs transform synaptic input patterns into spiking output patterns and how different glutamate receptors contribute to their transformations. The first figure utilizes recorded patterns of mossy fiber firing during eye movements in the flocculus of rhesus monkeys obtained from another laboratory. In the first figure, these patterns are used to stimulate mossy fibers in the mouse cerebellum during extracellular recordings of UBCs in acute mouse brain slices. The remaining experiments stimulate mossy fiber inputs at different rates or burst durations, which is described as 'mossy-fiber like', although these patterns are considerably simpler than those recorded in vivo. As expected from previous work, AMPA mediates the fast responses, and mGluR1 and mGluR2/3 mediate the majority of longer-duration and delayed responses. The manuscript is well organized and the discussion contextualizes the results effectively.

      The authors use extracellular recordings because the washout of intracellular molecules necessary for metabotropic signaling may occur during whole-cell recordings. These cell-attached recordings do not allow one to confirm that electrical stimulation produces a postsynaptic current on every stimulus. Moreover, it is not clear that the synaptic input is monosynaptic, as UBCs synapse on one another. This leaves open the possibility that delays in firing could be due to disynaptic stimulation. Additionally, the result that AMPA-mediated responses were surprisingly small in many UBCs, despite apparent mRNA expression, suggests the possibility that spillover from other nearby synapses activated the higher affinity extrasynaptic mGluRs and that the main mossy fiber input to the UBC was not being stimulated. For these reasons, some whole-cell recordings (or perforated patch) would show that when stimulation is confirmed to be monosynaptic and reliable it can produce the same range of spiking responses seen extracellularly and that AMPA receptor-mediated currents are indeed small or absent in some UBCs.

      We appreciate the reviewer’s concerns regarding the reliability of mossy fiber activation, the possibility of glutamate spillover from other synapses, and the possibility of disynaptic activation involving stimulation of MF→UBC→UBC connections. We examined these issues in a previous study (Guo et al., Nat. Commun., 2021). We made on-cell recordings and followed them with whole-cell voltage-clamp recordings from the same cell (Guo et al., Nat. Commun., 2021, Fig. 5), and the amplitude and timing of spiking agreed well with the time course and amplitudes of the synaptic currents. We also compared responses evoked by focal glutamate uncaging over the brush with those evoked by MF stimulation (Guo et al., Nat. Commun., 2021, Fig. 4) and found that the time courses and amplitudes of the responses were remarkably similar. This strongly suggests that the responses we observe do not reflect disynaptic activation (MF→UBC→UBC connections). We also showed that the responses were all-or-none: at low intensities no response was evoked; as the intensity of extracellular stimulation was increased, a large response was suddenly evoked at a threshold intensity; and further increases in intensity did not increase the amplitude of the response (Guo et al., Nat. Commun., 2021, Extended Data Fig. 1). We can stimulate well above threshold and still evoke the same response, and as a result we do not see stereotyped indications of an inability to stimulate during prolonged high-frequency activation. We recognize the importance of these issues, so we have added a section dealing explicitly with them (pp. 15-16).

      A discussion of whether the tested glutamate receptors affected the spontaneous firing rates of these cells would be informative as standing currents have been reported in UBCs. It is unclear whether the firing rate was normalized for each stimulation, each drug application, or each cell. It would also be informative to report whether UBCs characterized as responding with Fast, Mid-range, Slow, and OFF responses have different spontaneous firing rates or spontaneous firing patterns (regular vs irregular).

      The spontaneous firing of UBCs is indeed an interesting issue that deserves further investigation. It is not currently known how spontaneous firing at rest is regulated in UBCs; however, in previous work we have shown that there is great diversity in the rates across the population of UBCs in the dorsal cochlear nucleus (Huson & Regehr, JNeurosci, 2023, Fig. 4). Unfortunately, during the kind of sustained high-frequency stimulation protocols used in this study, spontaneous firing rates tend to increase, likely as an effect of residual receptor activation. As such, our current dataset is not suitable for in-depth analysis of the effects of the different glutamate receptors on spontaneous firing rates. As this study aims to explore UBC responses to MF inputs, we feel that specific experiments to address spontaneous firing rates are outside its scope.

      As the reviewer points out, there are indeed different ways the firing rates can be normalized for display in the heatmaps, and different normalizations have been used in different figures. We have made sure that the method for normalization is clearly indicated in the figure legend for each heatmap, specifying the protocol and drug application used for normalization.

      Figure 1 shows examples of how Fast, Mid-range, Slow, and OFF UBCs respond to in vivo MF firing patterns, but lacks a summary of how the input is transformed across a population of UBCs. In panel d, it looks as if the phase of firing becomes more delayed across the examples from Fast to OFF UBCs. Quantifying this input/output relationship more thoroughly would strengthen these results.

      The UBC responses to in vivo MF firing patterns are intriguing and we agree that there appears to be increasing delays for slower UBCs visible in Figure 1. However, we feel that the true in vivo MF firing patterns are too complex and irregular for rigorous interpretation. Therefore, we only tested simplified burst and smooth pursuit-like input patterns on the full population of UBCs. Here we indeed do see increasingly delayed responses as UBCs get slower (Fig. 4).

      Inhibition was pharmacologically blocked in these studies. Golgi cells and other inhibitory interneurons likely contribute to how UBCs transform input signals. Speculation on how GABAergic and glycinergic synaptic inhibition may contribute would provide additional context to help readers understand how a circuit with intact inhibition may behave.

      As indicated in our response to reviewer 1, we have added a section discussing the very important issue of inhibition and UBC responses in vivo.   

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Including recordings without inhibition blocked would strengthen the study and provide a more comprehensive view of the transformations made by UBCs at the input stage of the cerebellar cortex.

      See response to public comments.   

      (2) The authors claim that a continuum of temporal responses was observed in UBCs, but they also distinguish between fast, mid-range, slow, and OFF UBCs. While some UBCs fire spontaneously, others are activated by MF inputs. A more thorough classification effort would clarify the various response profiles observed under specific MF stimulation regimes. Have the authors considered using machine learning algorithms to aid in classification? 

      We fundamentally feel that these response properties do not conform to rigid categories. In our previous work we have shown that the UBC population constitutes a continuum in terms of gene expression, and in terms of spontaneous and evoked firing patterns. While it may still be useful, in order to answer some questions empirically, to apply advanced algorithms that enforce separate groups for comparison, in this work we aimed to present the full range of UBC responses without introducing the additional biases that such methods would produce.

      (3) A robust classification could assist in quantifying the temporal shifts observed during smooth pursuit-like MF stimulation, a critical outcome of the study.

      As stated above, we prefer to present an unbiased overview of the continuous nature of the UBC population, as we believe that this is fundamentally the most accurate representation. While it is true that this prevents us from providing a quantification of the different temporal shifts, we believe that the range of shifts across the population is sufficiently large and continuously varying to be convincing (see Figure 4d).

      (4) In Figure 5, contrary to what is described on page 10, Cells 10 and 11 (OFF UBCs) appear to behave differently, as mGluR1 does not seem to affect their firing rates. A specific case should be made for OFF UBCs. 

      Indeed, cells 10 and 11 do not show clear increases in firing and are not strongly affected by blocking of mGluR1. However, as discussed above and explored in our previous work, we feel that the range of UBC increases in firing is best described as a continuum, including the extreme where increases in firing are no longer clearly observable. As the aim in this work is to describe this continuum of responses for physiologically relevant inputs, we do not feel there is a benefit to creating a specific case for OFF UBCs here. It should be pointed out that the number of “pure” OFF UBCs completely lacking an mGluR1 component is very small.  

      (5) A summary diagram should be added at the end of the manuscript to highlight the key temporal features observed in this study. 

      This is a great suggestion and we have prepared such a summary diagram (Figure 6).

      Reviewer #2 (Recommendations for the authors):

      (1) Page 3- "Assed" should be "assessed"

      (2) Page 19- "by integrating" is repeated twice

      (3) It was not noted whether the data would be made available. It could be useful for those interested in implementing UBCs in models of the cerebellar cortex.

      We agree that this data set is invaluable to those interested in implementing UBCs in models of the cerebellar cortex.  We will make the dataset available as described in the text.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #2 (Public review):

      Summary:

      Yamawaki et al. conducted a series of neuroanatomical tracing and whole cell recording experiments to elucidate and characterise a relatively unknown pathway between the endopiriform (EN) and CA1 of the ventral hippocampus (vCA1) and to assess its functional role in social and object recognition using fibre photometry and dual vector chemogenetics. The main findings were that the EN sends robust projections to the vCA1 that collateralise to the prefrontal cortex, lateral entorhinal cortex and piriform cortex, and these EN projection neurons terminate in the stratum lacunosum-moleculare (SLM) layer of distal vCA1, synapsing onto GABAergic neurons that span across the Pyramidal-Stratum Radiatum (SR) and SR-SLM borders. It was also demonstrated that EN input disynaptically inhibits vCA1 pyramidal neurons. vCA1 projecting EN neurons receive afferent input from piriform cortex, and from within EN. Finally, fibre photometry experiments revealed that vCA1 projecting EN neurons are most active when mice explore novel objects or conspecifics, and pathway-specific chemogenetic inhibition led to an impairment in the ability to discriminate between novel vs. familiar objects and conspecifics.

      The authors have addressed most of my concerns, but a few weaknesses remain:<br /> (1) I expected to see the addition of raw interaction times with objects and conspecifics for each phase of social testing (pre-test, sociability test, social discrimination), as per my comment on including raw data. However, the authors only provided total distance traveled and velocity, and total interaction time in Figure S9, which is less informative.

      We apologize for missing this request. We have added the raw interaction times in Fig. S9G.

      (2) The authors observed increased activity in vCA1-projecting EN neurons tracking with the preferred object during the pre-test (object-object exploration) phase of the social tests, and the summary schematic (Figure 9A) depicts animals as showing a preference for one object over the other (although they are identical) in both the social and object recognition tests. However, in the chemogenetic experiment, the data (Fig S9B) indicate that animals did not show this preference for one object over another, making the expected baseline for this task unclear. This also raises an important question of whether the lack of effect from chemogenetic inhibition of vCA1-projecting EN neurons could be attributed to the absence of this baseline preference.

      We appreciate the comments. In Fig. S9B, although the group median at baseline (pretest) showed no preference for one object, individual subjects displayed a preference for one object (i.e., each data point deviated positively or negatively from 0.5) in the saline condition. Therefore, we do not think that a lack of baseline preference accounts for the absence of the inhibition effect in the pretest.
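
The distinction between a group median at chance and individual preferences can be illustrated with a minimal sketch. It assumes the preference index is the fraction of total interaction time spent on one object, with 0.5 meaning no preference; the function name and the interaction times are hypothetical and are not data from the study.

```python
from statistics import median

def preference_index(time_a, time_b):
    """Fraction of total interaction time spent on object A (0.5 = no preference).

    Assumed definition for illustration; the study may compute its index differently.
    """
    return time_a / (time_a + time_b)

# Hypothetical interaction times in seconds (object A, object B); NOT data from the study.
times = [(30, 10), (10, 30), (24, 16), (16, 24), (20, 20)]
indices = [preference_index(a, b) for a, b in times]

# The group median sits at chance because preferences for A and B cancel out...
group_median = median(indices)

# ...while the average distance of each animal from 0.5 is clearly non-zero.
mean_abs_deviation = sum(abs(i - 0.5) for i in indices) / len(indices)

print(group_median, mean_abs_deviation)
```

In this toy case the median equals 0.5 even though four of the five hypothetical animals favor one object, which is the pattern the response describes for the saline condition.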

      Additionally, the finding that vCA1-projecting EN activity is associated with the preferred object exploration appears to counter the authors' argument that novelty engages this circuit (since both objects are novel in this instance). This discrepancy warrants further discussion.

      This is an interesting point. One possibility is that during the pretest, EN activity simply "reports" or "represents" the interaction time without driving exploratory preference. This aligns with our DREADD experiment data, which show that inhibition of EN neurons produced no overall behavioral effect. Innate exploratory behavior has been attributed to various circuits, including the medial preoptic area → PAG circuit (Ryoo et al., 2021, Front. Neurosci.) and the Septal → VTA circuit (Mocellin et al., 2024, Neuron). We found no direct projection from these areas to EN (Fig. 6), but such connections could be established di- or polysynaptically. Moreover, these circuits could be driven by common inputs, such as the locus coeruleus or the cholinergic system for arousal, with only specific downstream targets, excluding EN, playing a key role in driving innate exploration and preference.

      We have inserted the following sentence in the Discussion (lines 253-255):

      “The correlation of EN<sup>vCA1-proj.</sup> activity with novel object preference in the pretest nevertheless suggests that these neurons 'represent' the innate preference without driving it.”

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Line 209: Please remove the reference to neural activity 'predicting' behavior, as correlation analysis does not imply predictive power.

      We have now changed the phrase to “Although EN<sup>vCA1-proj.</sup> activity was correlated with the behavior…”

      Line 236: It is unclear what is meant by: 'This circuit motif may predict the predominant role of ENvCA1-proj. neurons in social recognition memory'

      We have changed the sentence to the following for clarity:

      “Since social odor information is crucial for discriminating conspecifics in rodents, this circuit motif may predict the predominant role of EN<sup>vCA1-proj.</sup> neurons in social recognition memory, given that social odor can engage multiple olfactory pathways innervating the piriform cortex.”

      Fig 7 title: insert 'with' after correlates: 'Activity of ENvCA1-proj. neurons correlates social/object discrimination performance'

      Corrected.

      Fig S1 title: 'Projecing' typo.

      Corrected.

      Fig S8: Please rephrase for clarity: 'In pretest, the object was aligned by longer interaction time (preferred object is plotted in right side)'

      We have now rephrased the sentence to:

      “In the pretest plot, the object that the mice interacted with more is placed on the right side.”

      References:

      A septal-ventral tegmental area circuit drives exploratory behavior. Mocellin, Petra et al. Neuron, Volume 112, Issue 6, 1020-1032.e7

      An inhibitory medial preoptic circuit mediates innate exploration. Ryoo, Jia et al. Front. Neurosci., 23 August 2021. Volume 15- 2021

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife Assessment

      This study presents valuable findings on the potential of short-movie viewing fMRI protocol to explore the functional and topographical organization of the visual system in awake infants and toddlers. Although the data are compelling given the difficulty of studying this population, the evidence presented is incomplete and would be strengthened by additional analyses to support the authors' claims. This study will be of interest to cognitive neuroscientists and developmental psychologists, especially those interested in using fMRI to investigate brain organisation in pediatric and clinical populations with limited fMRI tolerance.

      We are grateful for the thorough and thoughtful reviews. We have provided point-by-point responses to the reviewers’ comments, but first, we summarize the major revisions here. We believe these revisions have substantially improved the clarity of the writing and impact of the results.

      Regarding the framing of the paper, we have made the following major changes in response to the reviews:

      (1) We have clarified that our goal in this paper was to show that movie data contains topographic, fine-grained details of the infant visual cortex. In the revision, we now state clearly that our results should not be taken as evidence that movies could replace retinotopy and have reworded parts of the manuscript that could mislead the reader in this regard.

      (2) We have added extensive details to the (admittedly) complex methods to make them more approachable. An example of this change is that we have reorganized the figure explaining the Shared Response Modelling methods to divide the analytic steps more clearly.

      (3) We have clarified the intermediate products contributing to the results by adding 6 supplementary figures that show the gradients for each IC or SRM movie and each infant participant.

      In response to the reviews, we have conducted several major analyses to support our findings further:

      (1) To verify that our analyses can identify fine-grained organization, we have manually traced and labeled adult data, and then performed the same analyses on them. The results from this additional dataset validate that these analyses can recover fine-grained organization of the visual cortex from movie data.

      (2) To further explore how visual maps derived from movies compare to alternative methods, we performed an anatomical alignment control analysis. We show that high-quality maps can be predicted from other participants using anatomical alignment.

      (3) To test the contribution of motion to the homotopy analyses, we regressed out the motion effects in these analyses. We found qualitatively similar results to our main analyses, suggesting motion did not play a substantial role.

      (4) To test the contribution of data quantity to the homotopy analyses, we correlated the amount of movie data collected from each participant with the homotopy results. We did not find a relationship between data quantity and the homotopy results. 

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Ellis et al. investigated the functional and topographical organization of the visual cortex in infants and toddlers, as evidenced by movie-viewing data. They build directly on prior research that revealed topographic maps in infants who completed a retinotopy task, claiming that even a limited amount of rich, naturalistic movie-viewing data is sufficient to reveal this organization, within and across participants. Generating this evidence required methodological innovations to acquire high-quality fMRI data from awake infants (which have been described by this group, and elsewhere) and analytical creativity. The authors provide evidence for structured functional responses in infant visual cortex at multiple levels of analyses; homotopic brain regions (defined based on a retinotopy task) responded more similarly to one another than to other brain regions in visual cortex during movie-viewing; ICA applied to movie-viewing data revealed components that were identifiable as spatial frequency, and to a lesser degree, meridian maps, and shared response modeling analyses suggested that visual cortex responses were similar across infants/toddlers, as well as across infants/toddlers and adults. These results are suggestive of fairly mature functional response profiles in the visual cortex in infants/toddlers and highlight the potential of movie-viewing data for studying finer-grained aspects of functional brain responses, but further evidence is necessary to support their claims and the study motivation needs refining, in light of prior research.

      Strengths:

      - This study links the authors' prior evidence for retinotopic organization of visual cortex in human infants (Ellis et al., 2021) and research by others using movie-viewing fMRI experiments with adults to reveal retinotopic organization (Knapen, 2021).

      - Awake infant fMRI data are rare, time-consuming, and expensive to collect; they are therefore of high value to the community. The raw and preprocessed fMRI and anatomical data analyzed will be made publicly available.

      We are grateful to the reviewer for their clear and thoughtful description of the strengths of the paper, as well as their helpful outlining of areas we could improve.

      Weaknesses:

      - The Methods are at times difficult to understand and in some cases seem inappropriate for the conclusions drawn. For example, I believe that the movie-defined ICA components were validated using independent data from the retinotopy task, but this was a point of confusion among reviewers. 

      We acknowledge the complexity of the methods and wish to clarify them as best as possible for the reviewers and the readers. We have extensively revised the methods and results sections to help avoid potential misunderstandings. For instance, we have revamped the figure and caption describing the SRM pipeline (Figure 5).

      To answer the stated confusion directly, the ICA components were derived from the movie data and validated on the (completely independent) retinotopy data. There were no additional tasks. The following text in the paper explains this point:

      “To assess the selected component maps, we correlated the gradients (described above) of the task-evoked and component maps. This test uses independent data: the components were defined based on movie data and validated against task-evoked retinotopic maps.” Pg. 11

      In either case: more analyses should be done to support the conclusion that the components identified from the movie reproduce retinotopic maps (for example, by comparing the performance of movie-viewing maps to available alternatives (anatomical ROIs, group-defined ROIs)).

      Before addressing this suggestion, we want to restate our conclusions: features of the retinotopic organization of infant visual cortex could be predicted from movie data. We did not conclude that movie data could ‘reproduce’ retinotopic maps in the sense that they would be a replacement. We recognize that this was not clear in our original manuscript and have clarified this point throughout, including in this section of the discussion:

      “To be clear, we are not suggesting that movies work well enough to replace a retinotopy task when accurate maps are needed. For instance, even though ICA found components that were highly correlated with the spatial frequency map, we also selected some components that turned out to have lower correlations. Without knowing the ground truth from a retinotopy task, there would be no way to weed these out. Additionally, anatomical alignment (i.e., averaging the maps from other participants and anatomically aligning them to a held-out participant) resulted in maps that were highly similar to the ground truth. Indeed, we previously[23] found that adult-defined visual areas were moderately similar to infants. While functional alignment with adults can outperform anatomical alignment methods in similar analyses[27], here we find that functional alignment is inferior to anatomical alignment. Thus, if the goal is to define visual areas in an infant that lacks task-based retinotopy, anatomical alignment of other participants’ retinotopic maps is superior to using movie-based analyses, at least as we tested it.” Pg. 21

      As per the reviewer’s suggestion and alluded to in the paragraph above, we have created anatomically aligned visual maps, providing an analogous test to the between-participant analyses like SRM. We find that these maps are highly similar to the ground truth. We describe this result in a new section of the results:

      “We performed an anatomical alignment analog of the functional alignment (SRM) approach. This analysis serves as a benchmark for predicting visual maps using task-based data, rather than movie data, from other participants. For each infant participant, we aggregated all other infant or adult participants as a reference. The retinotopic maps from these reference participants were anatomically aligned to the standard surface template, and then averaged. These averages served as predictions of the maps in the test participant, akin to SRM, and were analyzed equivalently (i.e., correlating the gradients in the predicted map with the gradients in the task-based map). These correlations (Table S4) are significantly higher than for functional alignment (using infants to predict spatial frequency, anatomical alignment > functional alignment: ∆<sub>Fisher Z</sub> M=0.44, CI=[0.32–0.58], p<.001; using infants to predict meridians, anatomical alignment > functional alignment: ∆<sub>Fisher Z</sub> M=0.61, CI=[0.47–0.74], p<.001; using adults to predict spatial frequency, anatomical alignment > functional alignment: ∆<sub>Fisher Z</sub> M=0.31, CI=[0.21–0.42], p<.001; using adults to predict meridians, anatomical alignment > functional alignment: ∆<sub>Fisher Z</sub> M=0.49, CI=[0.39–0.60], p<.001). This suggests that even if SRM shows that movies can be used to produce retinotopic maps that are significantly similar to a participant, these maps are not as good as those that can be produced by anatomical alignment of the maps from other participants without any movie data.” Pg. 16–17
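
The logic of this benchmark can be sketched in a few lines of Python: a leave-one-out average of the other participants' maps, a correlation of gradients, and a comparison of two alignment methods in Fisher Z units. This is only an illustrative sketch. It assumes each map has been reduced to a 1-D profile of values sampled on a common template, and it uses a percentile bootstrap for the confidence interval, which may differ from the resampling procedure used in the paper. All function names are hypothetical.

```python
import numpy as np

def fisher_z(r):
    """Fisher z-transform of correlation values (clipped to avoid infinities)."""
    return np.arctanh(np.clip(r, -0.999999, 0.999999))

def leave_one_out_mean(maps):
    """For each participant, average the maps of all OTHER participants.

    maps: array of shape (n_participants, n_samples), assumed to be already
    aligned to a common template (the alignment itself is not shown here).
    """
    maps = np.asarray(maps, dtype=float)
    return (maps.sum(axis=0) - maps) / (maps.shape[0] - 1)

def gradient_correlation(predicted, observed):
    """Pearson correlation between the gradients of two 1-D map profiles."""
    return np.corrcoef(np.gradient(predicted), np.gradient(observed))[0, 1]

def delta_fisher_z(r_a, r_b, n_boot=10000, seed=0):
    """Mean paired difference in Fisher z (method A minus method B) with a
    95% percentile-bootstrap confidence interval over participants."""
    diff = fisher_z(np.asarray(r_a)) - fisher_z(np.asarray(r_b))
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, diff.size, size=(n_boot, diff.size))
    boot_means = diff[idx].mean(axis=1)
    return diff.mean(), np.percentile(boot_means, [2.5, 97.5])
```

With per-participant correlations for anatomical alignment in `r_a` and for functional alignment in `r_b`, a positive mean difference whose interval excludes zero would correspond to the "anatomical alignment > functional alignment" pattern reported above.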

      Also, the ROIs used for the homotopy analyses were defined based on the retinotopic task rather than based on movie-viewing data alone - leaving it unclear whether movie-viewing data alone can be used to recover functionally distinct regions within the visual cortex.

      We agree with the reviewer that our approach does not test whether movie-viewing data alone can be used to recover functionally distinct regions. The goal of the homotopy analyses was to identify whether there was functional differentiation of visual areas in the infant brain while they watch movies. This was a novel question that provides positive evidence that these regions are functionally distinct. In subsequent analyses, we show that when these areas are defined anatomically, rather than functionally, they also show differentiated function (e.g., Figure 2). Nonetheless, our intention was not to use the homotopy analyses to define the regions. We have added text to clarify the goal and novelty of this analysis.

      “Although these analyses cannot define visual maps, they test whether visual areas have different functional signatures.” Pg. 6

      Additionally, even if the goal were to define areas based on homotopy, we believe the power of that analysis would be questionable. We would need to use a large amount of the movie data to define the areas, leaving a low-powered dataset to test whether their function is differentiated by these movie-based areas.

      - The authors previously reported on retinotopic organization of the visual cortex in human infants (Ellis et al., 2021) and suggest that the feasibility of using movie-viewing experiments to recover these topographic maps is still in question. They point out that movies may not fully sample the stimulus parameters necessary for revealing topographic maps/areas in the visual cortex, or the time-resolution constraints of fMRI might limit the use of movie stimuli, or the rich, uncontrolled nature of movies might make them inferior to stimuli that are designed for retinotopic mapping, or might lead to variable attention between participants that makes measuring the structure of visual responses across individuals challenging. This motivation doesn't sufficiently highlight the importance or value of testing this question in infants. Further, it's unclear if/how this motivation takes into account prior research using movie-viewing fMRI experiments to reveal retinotopic organization in adults (e.g., Knapen, 2021). Given the evidence for retinotopic organization in infants and evidence for the use of movie-viewing experiments in adults, an alternative framing of the novel contribution of this study is that it tests whether retinotopic organization is measurable using a limited amount of movie-viewing data (i.e., a methodological stress test). The study motivation and discussion could be strengthened by more attention to relevant work with adults and/or more explanation of the importance of testing this question in infants (is the reason to test this question in infants purely methodological - i.e., as a way to negate the need for retinotopic tasks in subsequent research, given the time constraints of scanning human infants?).

      We are grateful to the reviewer for giving us the opportunity to clarify the innovations of this research. We believe that this research contributes to our understanding of how infants process dynamic stimuli, demonstrates the viability and utility of movie experiments in infants, and highlights the potential for new movie-based analyses (e.g., SRM). We have now consolidated these motivations in the introduction to more clearly motivate this work:

      “The primary goal of the current study is to investigate whether movie-watching data recapitulates the organization of visual cortex. Movies drive strong and naturalistic responses in sensory regions while minimizing task demands[12, 13, 24] and thus are a proxy for typical experience. In adults, movies and resting-state data have been used to characterize the visual cortex in a data-driven fashion[25–27]. Movies have been useful in awake infant fMRI for studying event segmentation[28], functional alignment[29], and brain networks[30]. However, this past work did not address the granularity and specificity of cortical organization that movies evoke. For example, movies evoke similar activity in infants in anatomically aligned visual areas[28], but it remains unclear whether responses to movie content differ between visual areas (e.g., is there more similarity of function within visual areas than between[31]). Moreover, it is unknown whether structure within visual areas, namely visual maps, contributes substantially to visual evoked activity. Additionally, we wish to test whether methods for functional alignment can be used with infants. Functional alignment finds a mapping between participants using functional activity – rather than anatomy – and in adults can improve signal-to-noise, enhance across participant prediction, and enable unique analyses[27, 32–34].” Pg. 3-4

      Furthermore, the introduction culminates in the following statement on what the analyses will tell us about the nature of movie-driven activity in infants:

      “These three analyses assess key indicators of the mature visual system: functional specialization between areas, organization within areas, and consistency between individuals.” Pg. 5

      Furthermore, in the discussion we revisit these motivations and elaborate on them further:

      [Regarding homotopy:] “This suggests that visual areas are functionally differentiated in infancy and that this function is shared across hemispheres[31].” Pg. 19

      [Regarding ICA:] “This means that the retinotopic organization of the infant brain accounts for a detectable amount of variance in visual activity, otherwise components resembling these maps would not be discoverable.” Pg. 19–20

      [Regarding SRM:] “This is initial evidence that functional alignment may be useful for enhancing signal quality, like it has in adults[27,32,33], or revealing changing function over development[45].” Pg. 21

      Additionally, we have expanded our discussion of relevant work that uses similar methods such as the excellent research from Knapen (2021) and others:

      “In adults, movies and resting-state data have been used to characterize the visual cortex in a data-driven fashion[25-27].” Pg. 4

      “We next explored whether movies can reveal fine-grained organization within visual areas by using independent components analysis (ICA) to propose visual maps in individual infant brains[25,26,35,42,43].” Pg. 9

      Reviewer #2 (Public Review):

      Summary:

      This manuscript shows evidence from a dataset with awake movie-watching in infants, that the infant brain contains areas with distinct functions, consistent with previous studies using resting state and awake task-based infant fMRI. However, substantial new analyses would be required to support the novel claim that movie-watching data in infants can be used to identify retinotopic areas or to capture within-area functional organization.

      Strengths:

      The authors have collected a unique dataset: the same individual infants both watched naturalistic animations and a specific retinotopy task. These data position the authors to test their novel claim, that movie-watching data in infants can be used to identify retinotopic areas.

      Weaknesses:

      To claim that movie-watching data can identify retinotopic regions, the authors should provide evidence for two claims:

      - Retinotopic areas defined based only on movie-watching data, predict retinotopic responses in independent retinotopy-task-driven data.

      - Defining retinotopic areas based on the infant's own movie-watching response is more accurate than alternative approaches that don't require any movie-watching data, like anatomical parcellations or shared response activation from independent groups of participants.

      We thank the reviewer for their comments. Before addressing their suggestions, we wish to clarify that we do not claim that movie data can be used to identify retinotopic areas, but instead that movie data captures components of the within and between visual area organization as defined by retinotopic mapping. We recognize that this was not clear in our original manuscript and have clarified this point throughout, including in this section of the discussion:

      “To be clear, we are not suggesting that movies work well enough to replace a retinotopy task when accurate maps are needed. For instance, even though ICA found components that were highly correlated with the spatial frequency map, we also selected some components that turned out to have lower correlations. Without knowing the ground truth from a retinotopy task, there would be no way to weed these out. Additionally, anatomical alignment (i.e., averaging the maps from other participants and anatomically aligning them to a held-out participant) resulted in maps that were highly similar to the ground truth. Indeed, we previously[23] found that adult-defined visual areas were moderately similar to infants. While functional alignment with adults can outperform anatomical alignment methods in similar analyses[27], here we find that functional alignment with infants is inferior to anatomical alignment. Thus, if the goal is to define visual areas in an infant that lacks task-based retinotopy, anatomical alignment of other participants’ retinotopic maps is superior to using movie-based analyses, at least as we tested it.” Pg. 21

      In response to the reviewer’s suggestion, we compare the maps identified by SRM to the averaged, anatomically aligned maps from infants. We find that these maps are highly similar to the task-based ground truth and we describe this result in a new section:

      “We performed an anatomical alignment analog of the functional alignment (SRM) approach. This analysis serves as a benchmark for predicting visual maps using task-based data, rather than movie data, from other participants. For each infant participant, we aggregated all other infant or adult participants as a reference. The retinotopic maps from these reference participants were anatomically aligned to the standard surface template, and then averaged. These averages served as predictions of the maps in the test participant, akin to SRM, and were analyzed equivalently (i.e., correlating the gradients in the predicted map with the gradients in the task-based map). These correlations (Table S4) are significantly higher than for functional alignment (using infants to predict spatial frequency, anatomical alignment < functional alignment: ∆<sub>Fisher Z</sub> M=0.44, CI=[0.32–0.58], p<.001; using infants to predict meridians, anatomical alignment < functional alignment: ∆<sub>Fisher Z</sub> M=0.61, CI=[0.47–0.74], p<.001; using adults to predict spatial frequency, anatomical alignment < functional alignment: ∆<sub>Fisher Z</sub> M=0.31, CI=[0.21–0.42], p<.001; using adults to predict meridians, anatomical alignment < functional alignment: ∆<sub>Fisher Z</sub> M=0.49, CI=[0.39–0.60], p<.001). This suggests that even if SRM shows that movies can be used to produce retinotopic maps that are significantly similar to a participant, these maps are not as good as those that can be produced by anatomical alignment of the maps from other participants without any movie data.” Pg. 16–17
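
A minimal sketch of the leave-one-out benchmark described in the quoted passage, written with NumPy on synthetic data. It is not the released analysis code: the function names, the whole-map correlation (the paper correlates gradients along traced lines), the bootstrap settings, and the "noisier" stand-in for a weaker prediction method are all assumptions made for illustration.

```python
import numpy as np

def fisher_z(r):
    """Fisher r-to-z transform, so correlations can be averaged and differenced."""
    return np.arctanh(r)

def leave_one_out_predictions(maps):
    """Predict each participant's map as the mean of all other participants' maps.

    maps: (n_participants, n_vertices), assumed already aligned to a common template.
    """
    n = maps.shape[0]
    return (maps.sum(axis=0, keepdims=True) - maps) / (n - 1)

def similarity_z(predicted, observed):
    """Fisher-z correlation between predicted and observed map, per participant."""
    r = np.array([np.corrcoef(p, o)[0, 1] for p, o in zip(predicted, observed)])
    return fisher_z(r)

def bootstrap_ci(values, n_boot=5000, seed=0):
    """Mean and 95% percentile bootstrap interval, resampling participants."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    low, high = np.percentile(values[idx].mean(axis=1), [2.5, 97.5])
    return values.mean(), (low, high)

# Synthetic demo: six participants share one gradient plus independent noise.
rng = np.random.default_rng(1)
shared_gradient = np.linspace(-1.0, 1.0, 200)
task_maps = shared_gradient + 0.5 * rng.normal(size=(6, 200))

anatomical_pred = leave_one_out_predictions(task_maps)
# Hypothetical weaker method: the same prediction with extra noise added.
noisier_pred = anatomical_pred + rng.normal(size=anatomical_pred.shape)

z_anat = similarity_z(anatomical_pred, task_maps)
z_other = similarity_z(noisier_pred, task_maps)
mean_diff, ci = bootstrap_ci(z_anat - z_other)
```

Here `mean_diff` plays the role of the reported ∆ Fisher Z and `ci` the role of its confidence interval, under the stated assumptions.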

      Note that we do not compare the anatomically aligned maps with the ICA maps statistically. This is because these analyses are not comparable: ICA is run within-participant, whereas anatomical alignment is necessarily between-participant (using either infants or adults as the reference). Nonetheless, an interested reader can refer to the table where we report the results of anatomical alignment (Table S4) and see that anatomical alignment outperforms ICA in terms of the correlation between the predicted and task-based maps.

      Both of these analyses are possible, using the (valuable!) data that these authors have collected, but these are not the analyses that the authors have done so far. Instead, the authors report the inverse of (1): regions identified by the retinotopy task can be used to predict responses in the movies. The authors report one part of (2), shared responses from other participants can be used to predict individual infants' responses in the movies, but they do not test whether movie data from the same individual infant can be used to make better predictions of the retinotopy task data, than the shared response maps.

      So to be clear, to support the claims of this paper, I recommend that the authors use the retinotopic task responses in each individual infant as the independent "Test" data, and compare the accuracy in predicting those responses, based on:

      -  The same infant's movie-watching data, analysed with MELODIC, when blind experimenters select components for the SF and meridian boundaries with no access to the ground-truth retinotopy data.

      -  Anatomical parcellations in the same infant.

      -  Shared response maps from groups of other infants or adults.

      -  (If possible, ICA of resting state data, in the same infant, or from independent groups of infants).

      Or, possibly, combinations of these techniques.

      If the infant's own movie-watching data leads to improved predictions of the infant's retinotopic task-driven response, relative to these existing alternatives that don't require movie-watching data from the same infant, then the authors' main claim will be supported.

      These are excellent suggestions for additional analyses to test the suitability of movie-based maps to replace task-based maps. We hope it is now clear that it was never our intention to claim that movie-based data could replace task-based methods. We want to emphasize that the discoveries made in this paper (that movies evoke fine-grained organization in infant visual cortex) do not rely on movie-based maps being better than alternative methods for producing maps, such as the newly added anatomical alignment.

      The proposed analysis above solves a critical problem with the analyses presented in the current manuscript: the data used to generate maps is identical to the data used to validate those maps. For the task-evoked maps, the same data are used to draw the lines along gradients and then test for gradient organization. For the component maps, the maps are manually selected to show the clearest gradients among many noisy options, and then the same data are tested for gradient organization. This is a double-dipping error. To fix this problem, the data must be split into independent train and test subsets.

      We appreciate the reviewer’s concern; however, we believe it is a result of a miscommunication in our analytic strategy. We have now provided more details on the analyses to clarify how double-dipping was avoided. 

      To summarize, a retinotopy task produced visual maps that were used to trace both area boundaries and gradients across the areas. These data were then fixed and unchanged, and we make no claims about the nature of these maps in this paper, other than to treat them as the ground truth to be used as a benchmark in our analyses. The movie data, which were collected independently from the same infant in the same session, used the boundaries from the retinotopy task (in the case of homotopy) or were compared with the maps from the retinotopy task (in the case of ICA and SRM). In other words, the statement that “the data used to generate maps is identical to the data used to validate those maps” is incorrect because we generated the maps with a retinotopy task and validated the maps with the movie data. This means no double dipping occurred.

      Perhaps a cause of the reviewer’s interpretation is that the gradients used in the analysis are not clearly described. We now provide this additional description:

      “Using the same manually traced lines from the retinotopy task, we measured the intensity gradients in each component from the movie-watching data. We can then use the gradients of intensity in the retinotopy task-defined maps as a benchmark for comparison with the ICA-derived maps.” Pg. 10
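
To make the gradient comparison concrete, here is a hedged sketch. It assumes each traced line is stored as an ordered list of surface vertex indices and that similarity is the Pearson correlation of the two intensity profiles along each line, averaged over lines. The data layout, the function names, and the sign-invariant option (ICA component signs are arbitrary) are illustrative assumptions rather than a description of the released pipeline.

```python
import numpy as np

def line_profile(surface_map, line_vertices):
    """Intensity values of a map sampled along one traced line (ordered vertex indices)."""
    return surface_map[np.asarray(line_vertices)]

def gradient_similarity(task_map, component_map, lines, sign_invariant=True):
    """Mean correlation between task and component profiles across traced lines."""
    rs = []
    for line in lines:
        task = line_profile(task_map, line)
        comp = line_profile(component_map, line)
        rs.append(np.corrcoef(task, comp)[0, 1])
    mean_r = float(np.mean(rs))
    return abs(mean_r) if sign_invariant else mean_r

# Toy example: the component is a sign-flipped, rescaled copy of the task map.
task_map = np.arange(10, dtype=float)
component_map = -2.0 * task_map + 3.0
lines = [[0, 1, 2, 3], [5, 6, 7, 8, 9]]

similarity = gradient_similarity(task_map, component_map, lines)
signed = gradient_similarity(task_map, component_map, lines, sign_invariant=False)
```

The key property is that the lines come only from the task data, while the values being correlated against the benchmark come only from the movie-derived component.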

      Regarding the SRM analyses, we took great pains to avoid the possibility of data contamination. To emphasize how independent the SRM analysis is, the prediction of the retinotopic map from the test participant does not use their retinotopy data at all; in fact, the predicted maps could be made before that participant’s retinotopy data were ever collected. To make this prediction for a test participant, we need to learn the inversion of the SRM, but this only uses the movie data of the test participant. Hence, there is no double-dipping in the SRM analyses. We have elaborated on this point in the revision, and we remade the figure and its caption to clarify this point.

      We also have updated the description of these results to emphasize how double-dipping was avoided:

      “We then mapped the held-out participant's movie data into the learned shared space without changing the shared space (Figure 5c). In other words, the shared response model was learned and frozen before the held-out participant’s data was considered. This approach has been used and validated in prior SRM studies[45].” Pg. 14
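
The independence argument can be illustrated with a small NumPy sketch. It assumes the orthonormal-weights form of SRM (as in Chen et al., 2015), in which a held-out participant's weights are the orthogonal Procrustes solution given a frozen shared response. The function names and array shapes are hypothetical, and this is not the authors' released code.

```python
import numpy as np

def fit_held_out_weights(movie_data, shared_response):
    """Weights W (voxels x k, orthonormal columns) minimising ||X - W S||_F.

    movie_data X: (n_voxels, n_timepoints) from the held-out participant.
    shared_response S: (k, n_timepoints), learned from other participants and frozen.
    Only movie data enter this step; no retinotopy data are used.
    """
    u, _, vt = np.linalg.svd(movie_data @ shared_response.T, full_matrices=False)
    return u @ vt

def predict_map(train_weights, train_maps, test_weights):
    """Average the training participants' task maps in shared space, then
    project that average into the held-out participant's voxel space."""
    shared_maps = [w.T @ m for w, m in zip(train_weights, train_maps)]
    return test_weights @ np.mean(shared_maps, axis=0)

# Synthetic check: data generated from known orthonormal weights are recovered.
rng = np.random.default_rng(0)
k, n_time = 5, 100
shared = rng.normal(size=(k, n_time))
shared_map = rng.normal(size=k)  # a task map expressed in shared space

def random_orthonormal(n_voxels):
    q, _ = np.linalg.qr(rng.normal(size=(n_voxels, k)))
    return q

train_w = [random_orthonormal(40), random_orthonormal(60)]
train_maps = [w @ shared_map for w in train_w]

true_test_w = random_orthonormal(50)
test_w = fit_held_out_weights(true_test_w @ shared, shared)
predicted = predict_map(train_w, train_maps, test_w)
```

In this toy case `predicted` matches `true_test_w @ shared_map` even though the held-out participant's own map never enters the computation.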

      The reviewer suggests that manually choosing components from ICA is double-dipping. Although the reviewer is correct that the manual selection of components in ICA means that the components chosen ought to be good candidates, we are testing whether those choices were good by evaluating those components against the task-based maps that were not used for the ICA. Our statistical analyses evaluate whether the components chosen were better than the components that would have been chosen by random chance. Critically: all decisions about selecting the components happen before the components are compared to the retinotopic maps. Hence there is no double-dipping in the selection of components, as the choice of candidate ICA maps is not informed by the ground-truth retinotopic maps. We now clarify what the goal of this process is in the results:

      “Success in this process requires that 1) retinotopic organization accounts for sufficient variance in visual activity to be identified by ICA and 2) experimenters can accurately identify these components.” Pg. 10

      The reviewer also alludes to a concern that the researcher selecting the maps was not blind to the ground-truth retinotopic maps from participants and this could have influenced the results. In such a scenario, the researcher could have selected components that have the gradients of activity in the places that the infant has as ground truth. The researcher who made the selection of components (CTE) is one of the researchers who originally traced the areas in the participants approximately a year prior to the identification of ICs. The researcher selecting the components did not use the ground-truth retinotopic maps as reference, nor did they pay attention to the participant IDs when sorting the IC components. Indeed, they were not trying to find participant-specific maps per se, but rather aimed to find good candidate retinotopic maps in general. In the case of the newly added adult analyses, the ICs were selected before the retinotopic mapping was reviewed or traced; hence, no knowledge about the participant-specific ground truth could have influenced the selection of ICs. Even with this process in adults, we find results of comparable strength to those we found in infants, as shown below. Nonetheless, there is a possibility that this researcher’s previous experience of tracing the infant maps could have influenced their choice of components at the participant-specific level. If so, it was a small effect since the components the researcher selected were far from the best possible options (i.e., rankings of the selected components averaged in the 64th percentile for spatial frequency maps and the 68th percentile for meridian maps). We believe all reasonable steps were taken to mitigate bias in the selection of ICs.
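
The two quantities discussed above, the percentile ranking of a selected component and whether the selected components beat randomly chosen ones, can be sketched as follows. The scores stand for each component's correlation with the task-based map. The pooled random draw and the one-sided p-value are illustrative assumptions, and the statistical procedure used in the paper may differ.

```python
import numpy as np

def percentile_rank(selected_score, all_scores):
    """Percentage of components scoring at or below the selected component."""
    all_scores = np.asarray(all_scores)
    return 100.0 * np.mean(all_scores <= selected_score)

def chance_p_value(selected_scores, all_scores, n_perm=5000, seed=0):
    """One-sided p-value: how often does a random draw of the same number of
    components reach the mean score of the manually selected ones?"""
    rng = np.random.default_rng(seed)
    all_scores = np.asarray(all_scores)
    observed = np.mean(selected_scores)
    k = len(selected_scores)
    null = np.array([
        rng.choice(all_scores, size=k, replace=False).mean()
        for _ in range(n_perm)
    ])
    return (np.sum(null >= observed) + 1) / (n_perm + 1)

# Toy scores for 100 components, three of which were "selected".
scores = np.arange(100, dtype=float)
rank = percentile_rank(63.0, scores)
p_value = chance_p_value([95.0, 97.0, 99.0], scores)
```

A selection can be well above chance while still ranking far from the 100th percentile, which is the pattern the response describes.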

      Reviewer #3 (Public Review):

      The manuscript reports data collected in awake toddlers recording BOLD while watching videos. The authors analyse the BOLD time series using two different statistical approaches, both very complex but do not require any a priori determination of the movie features or contents to be associated with regressors. The two main messages are that 1) toddlers have occipital visual areas very similar to adults, given that an SRM model derived from adult BOLD is consistent with the infant brains as well; 2) the retinotopic organization and the spatial frequency selectivity of the occipital maps derived by applying correlation analysis are consistent with the maps obtained by standard and conventional mapping.

      Clearly, the data are important, and the author has achieved important and original results. However, the manuscript is totally unclear and very difficult to follow; the figures are not informative; the reader needs to trust the authors because no data to verify the output of the statistical analysis are presented (localization maps with proper statistics) nor so any validation of the statistical analysis provided. Indeed what I think that manuscript means, or better what I understood, may be very far from what the authors want to present, given how obscure the methods and the result presentation are.

      In the present form, this reviewer considers that the manuscript needs to be totally rewritten, the results presented each technique with appropriate validation or comparison that the reader can evaluate.

      We are grateful to the reviewer for the chance to improve the paper. We have broken their review into three parts: clarification of the methods, validation of the analyses, and enhancing the visualization.

      Clarification of the methods

      We acknowledge that the methods we employed are complex and uncommon in many fields of neuroimaging. That said, numerous papers have conducted these analyses on adults (Beckmann et al., 2005; Butt et al., 2015; Guntupalli et al., 2016; Haak & Beckmann, 2018; Knapen, 2021; Lu et al., 2017) and non-human primates (Arcaro & Livingstone, 2017; Moeller et al., 2009). We have redoubled our efforts in the revision to make the methods as clear as possible, expanding on the original text and providing intuitions where possible. These changes have been added throughout and are too numerous to repeat here, especially without context, but we hope that readers will have an easier time following the analyses now.

      Additionally, we updated Figures 3 and 5 in which the main ICA and SRM analyses are described. For instance, in Figure 3’s caption we now add details about how the gradient analyses were performed on the components: 

      “We used the same lines that were manually traced on the task-evoked map to assess the change in the component’s response. We found a monotonic trend within area from medial to lateral, just like we see in the ground truth.” Pg. 11

      Regarding Figure 5, we reconsidered the best way to explain the SRM analyses and decided it would be helpful to partition the diagram into steps, reflecting the analytic process. These updates have been added to Figure 5, and the caption has been updated accordingly.

      We hope that these changes have improved the clarity of the methods. For readers interested in learning more, we encourage them to either read the methods-focused papers that debut the analyses (e.g., Chen et al., 2015), read the papers applying the methods (e.g., Guntupalli et al., 2016), or read the annotated code we publicly release which implements these pipelines and can be used to replicate the findings.

      Validation of the analyses

      One of the requests the reviewer makes is to validate our analyses. Our initial approach was to lean on papers that have used these methods in adults or primates (e.g., Arcaro & Livingstone, 2017; Beckmann et al., 2005; Butt et al., 2015; Guntupalli et al., 2016; Haak & Beckmann, 2018; Knapen, 2021; Moeller et al., 2009) where the underlying organization and neurophysiology are established. However, we have made changes to these methods that differ from their original usage (e.g., we used SRM rather than hyperalignment, we use meridian mapping rather than traveling wave retinotopy, we use movie-watching data rather than rest). Hence, the specifics of our design and pipeline warrant validation.

      To add further validation, we have rerun the main analyses on an adult sample. We collected data from 8 adult participants who completed the same retinotopy task and a large subset of the movies that infants saw. These participants were run under maximally similar conditions to infants (i.e., scanned using the same parameters and without the top of the head-coil) and were preprocessed using the same pipeline. Given that the relationship between adult visual maps and movie-driven (or resting-state) analyses has been shown in many studies (Beckmann et al., 2005; Butt et al., 2015; Guntupalli et al., 2016; Haak & Beckmann, 2018; Knapen, 2021; Lu et al., 2017), these adult data serve as a validation of our analysis pipeline. These adult participants were included in the original manuscript; however, they were previously only used to support the SRM analyses (i.e., can adults be used to predict infant visual maps). The adult results are described before any results with infants, as a way to engender confidence. Moreover, we have provided new supplementary figures of the adult results that we hope will be integrated with the article when viewing it online, such that it will be easy to compare infant and adult results, as per the reviewer’s request.

      As per the figures and captions below, the analyses were all successful with the adult participants: 1) Homotopic correlations are higher than correlations between comparable areas in other streams or areas that are more distant within stream. 2) A multidimensional scaling depiction of the data shows that areas in the dorsal and ventral stream are dissimilar. 3) Using independent components analysis on the movie data, we identified components that are highly correlated with the retinotopy task-based spatial frequency and meridian maps. 4) Using shared response modeling on the movie data, we predicted maps that are highly correlated with the retinotopy task-based spatial frequency and meridian maps.
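
As an illustration of the first analysis in this list, the sketch below correlates area-averaged movie time courses across hemispheres and contrasts homotopic pairs (the same area in both hemispheres) with all other cross-hemisphere pairs. This is a simplification: the paper contrasts homotopic pairs with specific comparison sets (other-stream and more distant within-stream areas). The function names and synthetic data are assumptions.

```python
import numpy as np

def cross_hemisphere_correlations(left_ts, right_ts):
    """Correlate every left-hemisphere area with every right-hemisphere area.

    left_ts, right_ts: (n_areas, n_timepoints), areas listed in the same order.
    Returns an (n_areas, n_areas) matrix; the diagonal holds homotopic pairs.
    """
    n = left_ts.shape[0]
    full = np.corrcoef(np.vstack([left_ts, right_ts]))
    return full[:n, n:]

def homotopy_effect(cross_hemi):
    """Mean Fisher-z correlation of homotopic pairs minus that of all other pairs."""
    z = np.arctanh(np.clip(cross_hemi, -0.999999, 0.999999))
    off_diagonal = ~np.eye(len(z), dtype=bool)
    return np.diag(z).mean() - z[off_diagonal].mean()

# Synthetic demo: each area has its own signal, shared by both hemispheres.
rng = np.random.default_rng(0)
area_signal = rng.normal(size=(4, 300))
left = area_signal + 0.5 * rng.normal(size=area_signal.shape)
right = area_signal + 0.5 * rng.normal(size=area_signal.shape)

corr = cross_hemisphere_correlations(left, right)
effect = homotopy_effect(corr)
```

A positive `effect` indicates that an area's time course is more similar to its counterpart in the other hemisphere than to other areas there.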

      These supplementary analyses are underpowered for between-group comparisons, so we do not statistically compare the results between infants and adults. Nonetheless, the pattern of adult results is comparable overall to the infant results. 

      We believe these adult results provide a useful validation that the infant analyses we performed can recover fine-grained organization.

      Enhancing the visualization

      The reviewer raises an additional concern about the lack of visualization of the results. We recognize that the plots of the summary statistics do not provide information about the intermediate analyses. Indeed, we think the summary statistics can understate the degree of similarity between the components or predicted visual maps and the ground truth. Hence, we have added 6 new supplementary figures showing the intensity gradients for the following analyses: 1. spatial frequency prediction using ICA, 2. meridian prediction using ICA, 3. spatial frequency prediction using infant SRM, 4. meridian prediction using infant SRM, 5. spatial frequency prediction using adult SRM, and 6. meridian prediction using adult SRM.

      We hope that these visualizations are helpful. It is possible that the reviewer wishes us to also visually present the raw maps from the ICA and SRM, akin to what we show in Figure 3A and 3B. We believe this is out of scope of this paper: of the 1140 components that were identified by ICA, we selected 36 for spatial frequency and 17 for meridian maps. We also created 20 predicted maps for spatial frequency and 20 predicted meridian maps using SRM. This would result in the depiction of 93 subfigures, requiring at least 15 new full-page supplementary figures to display with adequate resolution. Instead, we encourage the reader to access this content themselves: we have made the code to recreate the analyses publicly available, as well as both the raw and preprocessed data for these analyses, including the data for each of these selected maps.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) As mentioned in the public review, the authors should consider incorporating relevant adult fMRI research into the Introduction and explain the importance of testing this question in infants.

      Our public response describes the citations to relevant adult research that we have added and the further motivation we have provided for the project.

      (2) The authors should conduct additional analyses to support their conclusion that movie data alone can generate accurate retinotopic maps (i.e., by comparing this approach to other available alternatives).

      We have clarified in our public response that we did not wish to conclude that movie data alone can generate accurate retinotopic maps, and have made substantial edits to the text to emphasize this. Because we do not make this claim, we do not think it is necessary to test it further.

      (3) The authors should re-do the homotopy analyses using movie-defined ROIs (i.e., by splitting the movie-viewing data into independent folds for functional ROI definition and analyses).

      As stated above, defining ROIs based on the movie content is not the intended goal of this project. Even if that were the general goal, we do not believe that it would be appropriate to run this specific analysis with the data we collected. Firstly, halving the data for ROI definition (e.g., using half the movie data to identify and trace areas, and then using those areas in a homotopy analysis run on the other half of the data) would qualitatively change the power of the analyses described here. Secondly, we would be unable to define areas beyond hV4/V3AB with confidence, since our retinotopic mapping only affords specification of early visual cortex. Thus we could not conduct the MDS analyses shown in Figure 2.

      (4) If the authors agree that a primary contribution of this study and paper is to showcase what is possible to do with a limited amount of movie-viewing data, then they should make it clearer, sooner, how much usable movie data they have from infants. They could also consider conducting additional analyses to determine the minimum amount of fMRI data necessary to reveal the same detailed characteristics of functional responses in the visual cortex.

      We agree it would be good to highlight the amount of movie data used. When the infant data is first introduced in the results section, we now state the durations:

      “All available movies from each session were included (Table S2), with an average duration of 540.7s (range: 186–1116s).” Pg. 5

      Additionally, we have added a homotopy analysis that describes the contribution of data quantity to the results observed. We compare the amount of data collected with the magnitude of same vs. different stream effect (Figure 1B) and within stream distance effect (Figure 1C). We find no effect of movie duration in the sample we tested, as reported below:

      “We found no evidence that the variability in movie duration per participant correlated with this difference [of same stream vs. different stream] (r=0.08, p=.700).” Pg. 6-7

      “There was no correlation between movie duration and the effect (Same > Adjacent: r=-0.01, p=.965, Adjacent > Distal: r=-0.09, p=.740).” Pg. 7

      (5) If any of the methodological approaches are novel, the authors should make this clear. In particular, has the approach of visually inspecting and categorizing components generated from ICA and movie data been done before, in adults/other contexts?

      The methods we employed are similar to others, as described in the public review. However, changes were necessary to apply them to infant samples. For instance, Guntupalli et al. (2016) used hyperalignment to predict the visual maps of adult participants, whereas we use SRM. SRM and hyperalignment have the same goal (finding a maximally aligned representation between participants based on brain function), but their implementations differ. The application of functional alignment to infants is novel, as is its use with movie data that are relatively short compared with standard adult data. Indeed, this is the most thorough demonstration that SRM, or any functional alignment procedure, can be usefully applied to infant data, awake or sleeping. We have clarified this point in the discussion:

      “This is initial evidence that functional alignment may be useful for enhancing signal quality, like it has in adults[27,32,33], or revealing changing function over development[45], which may prove especially useful for infant fMRI[52].” Pg. 21

      (6) The authors found that meridian maps were less identifiable from ICA and movie data and suggest that this may be because these maps are more susceptible to noise or gaze variability. If this is the case, you might predict that these maps are more identifiable in adult data. The authors could consider running additional analyses with their adult participants to better understand this result.

      As described in the manuscript, we hypothesize that meridian maps are more difficult to identify than spatial frequency maps because meridian maps are a less smooth, more fine-grained map than spatial frequency. Indeed, it has previously been reported (Moeller et al., 2009) that similar procedures can result in meridian maps that are constituted by multiple independent components (e.g., a component sensitive to horizontal orientations, and a separate component sensitive to vertical orientations). Nonetheless, we have now conducted the ICA procedure on adult participants and again find it is easier to identify spatial frequency components compared to meridian maps, as reported in the public review.

      Minor corrections:

      (1) Typo: Figure 3 title: "Example retintopic task vs. ICA-based spatial frequency maps.".

      Fixed

      (2) Given the age range of the participants, consider using "infants and toddlers"? (Not to diminish the results at all; on the contrary, I think it is perhaps even more impressive to obtain awake fMRI data from ~1-2-year-olds). Example: Figure 3 legend: "A) Spatial frequency map of a 17.1-month-old infant.".

      We agree with the reviewer that there is disagreement about the age range at which a child starts being considered a toddler. We have changed the terms in places where we refer to a toddler in particular (e.g., the figure caption the reviewer highlights) and added the phrase “infants and toddlers” in places where appropriate. Nonetheless, we have kept “infants” in some places, particularly those where we are comparing the sample to adults. Adding “and toddlers” could imply three samples being compared which would confuse the reader.

      (3) Figure 6 legend: The following text should be omitted as there is no bar plot in this figure: "The bar plot is the average across participants. The error bar is the standard error across participants.".

      Fixed

      (4) Table S1 legend: Missing first single quote: Runs'.

      Fixed

      Reviewer #2 (Recommendations For The Authors):

      I request that this paper cite more of the existing literature on the fMRI of human infants and toddlers using task-driven and resting-state data. For example, early studies by (first authors) Biagi, Dehaene-Lambertz, Cusack, and Fransson, and more recent studies by Chen, Cabral, Truzzi, Deen, and Kosakowski.

      We have added several new citations of recent task-based and resting state studies to the second sentence of the main text:

      “Despite the recent growth in infant fMRI[1-6], one of the most important obstacles facing this research is that infants are unable to maintain focus for long periods of time and struggle to complete traditional cognitive tasks[7].”

      Reviewer #3 (Recommendations For The Authors):

      In the following, I report some of my main perplexities, but many more may arise when the material is presented more clearly.

      The age of the children varies from 5 months to about 2 years. While the developmental literature suggests that between 1 and 2 years children have a visual system nearly adult-like, below that age some areas may be very immature. I would split the sample and perhaps attempt to validate the adult SRM model with the youngest children (and those can be called infants).

      We recognize the substantial age variability in our sample, which is why we report participant-specific data in our figures. While splitting up the data into age bins might reveal age effects, we do not think we can perform adequately powered null hypothesis testing of the age trend. In order to investigate the contribution of age, larger samples will be needed. That said, we can see from the data that we have reported that any effect of age is likely small. To elaborate: Figures 4 and 6 report the participant-specific data points and order the participants by age. There are no clear linear trends in these plots, suggesting that there are no strong age effects.

      More broadly, we do not think there is a principled way to divide the participants by age. The reviewer suggests that the visual system is immature before the first year of life and mature afterward; however, such claims are the exact motivation for the type of work we are doing here, and the verdict is still out. Indeed, the conclusion of our earlier work reporting retinotopy in infants (Ellis et al., 2021) suggests that the organization of the early visual cortex in infants as young as 5 months — the youngest infant in our sample — is surprisingly adult-like.

      The title cannot refer to infants given the age span.

      There is disagreement in the field about the age at which it is appropriate to refer to children as infants. In this paper, and in our prior work, we followed the practice of the most attended infant cognition conference and society, the International Congress of Infant Studies (ICIS), which considers infants as those aged between 0-3 years old, for the purposes of their conference. Indeed, we have never received this concern across dozens of prior reviews for previous papers covering a similar age range. That said, we understand the spirit of the reviewer’s comment and now refer to the sample as “infants and toddlers” and to older individuals in our sample as “toddlers” wherever it is appropriate (the younger individuals would fairly be considered “infants” under any definition).

      Figure 1 is clear and an interesting approach. Please also show the average correlation maps on the cortical surface.

      While we would like to create a figure as requested, we are unsure how to depict an area-by-area correlation map on the cortical surface. One option would be to generate a seed-based map in which we take an area and depict the correlation of that seed (e.g., vV1) with all other voxels. This approach would result in 8 maps for just the task-defined areas, and 17 maps for anatomically-defined areas. Hence, we believe this is out of scope of this paper, but an interested reader could easily generate these maps from the data we have released.
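
      A minimal sketch of the seed-based map described above, assuming the data are arranged as a voxels-by-timepoints array and that a boolean mask marks the seed area (e.g., vV1). The function name, array layout, and synthetic data are illustrative assumptions, not the released analysis code.

```python
import numpy as np

def seed_correlation_map(voxel_timeseries, seed_mask):
    """Pearson correlation of every voxel with the mean time course of a seed area.

    voxel_timeseries: array of shape (n_voxels, n_timepoints)
    seed_mask: boolean array of shape (n_voxels,) marking the seed area
    """
    seed = voxel_timeseries[seed_mask].mean(axis=0)
    centered = voxel_timeseries - voxel_timeseries.mean(axis=1, keepdims=True)
    seed_centered = seed - seed.mean()
    numerator = centered @ seed_centered
    denominator = np.sqrt((centered ** 2).sum(axis=1) * (seed_centered ** 2).sum())
    return numerator / denominator

# Synthetic example: the first 5 voxels share a signal and serve as the seed.
rng = np.random.default_rng(0)
n_voxels, n_timepoints = 20, 100
shared = rng.normal(size=n_timepoints)
data = rng.normal(scale=0.5, size=(n_voxels, n_timepoints))
data[:5] += shared
seed_mask = np.zeros(n_voxels, dtype=bool)
seed_mask[:5] = True

corr_map = seed_correlation_map(data, seed_mask)
```

      Repeating this for each area as the seed would yield one map per area, which is why the number of maps grows with the number of areas.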

      Figure 2 results are not easily interpretable. Ventral and dorsal V1-V3 areas represent upper or lower VF respectively. Higher dorsal and ventral areas represent both upper and lower VF, so we should predict an equal distance between the two streams. Again, how can we verify that it is not a result of some artifacts?

      In adults, visual areas differ in their functional response properties along multiple dimensions, including spatial coding. The dorsal/ventral stream hypothesis is derived from the idea that areas in each stream support different functions, independent of spatial coding. The MDS analysis did not attempt to isolate the specific contribution of spatial representations of each area but instead tested the similarity of function that is evoked in naturalistic viewing. Other covariance-based analyses specifically isolate the contribution of spatial representations (Haak et al., 2013); however, they use a much more constrained analysis than what was implemented here. The fact that we find broad differentiation of dorsal and ventral visual areas in infants is consistent with adults (Haak & Beckman, 2018) and neonate non-human primates (Arcaro & Livingstone, 2017). 

      Nonetheless, we recognize that we did not mention the differences in visual field properties across areas and what that means. If visual field properties alone drove the functional response then we would expect to see a clustering of areas based on the visual field they represent (e.g., hV4 and V3AB should have similar representations). Since we did not see that, and instead saw organization by visual stream, the result is interesting and thus warrants reporting. We now mention this difference in visual fields in the manuscript to highlight the surprising nature of the result.

      “This separation between streams is striking when considering that it happens despite differences in visual field representations across areas: while dorsal V1 and ventral V1 represent the lower and upper visual field, respectively, V3A/B and hV4 both have full visual field maps. These visual field representations can be detected in adults[41]; however, they are often not the primary driver of function[39]. We see that in infants too: hV4 and V3A/B represent the same visual space yet have distinct functional profiles.” Pg. 8

      The reviewer raises a concern that the MDS result may be spurious and caused by noise. Below, we present three reasons why we believe these results are not accounted for by artifacts but instead reflect real functional differentiation in the visual cortex. 

      (1) Figure 2 is a visualization of the similarity matrix presented in Figure S1. In Figure S1, we report the significance testing we performed to confirm that the patterns differentiating dorsal and ventral streams — as well as adjacent areas from distal areas — are statistically reliable across participants. If an artifact accounted for the result then it would have to be a kind of systematic noise that is consistent across participants.

      (2) One of the main sources of noise (both systematic and non-systematic) with infant fMRI is motion. Homotopy is a within-participant analysis that could be biased by motion. To assess whether motion accounts for the results, we took a conservative approach of regressing out the framewise motion (i.e., how much movement there is between fMRI volumes) from the comparisons of the functional activity in regions. Although the correlations numerically decreased with this procedure, they were qualitatively similar to the analysis that does not regress out motion:

      “Additionally, if we control for motion in the correlation between areas --- in case motion transients drive consistent activity across areas --- then the effects described here are negligibly different (Figure S5).” Pg. 7

      (3) We recognize that despite these analyses, it would be helpful to see what this pattern looks like in adults where we know more about the visual field properties and the function of dorsal and ventral streams. This has been done previously (e.g., Haak & Beckman, 2018), but we have now run those analyses on adults in our sample, as described in the public review. As with infants, there are reliable differences in the homotopy between streams (Figure S1). The MDS results show that the adult data was more complex than the infant data, since it was best described by 3 dimensions rather than 2. Nonetheless, there is a rotation of the MDS such that the structure of the ventral and dorsal streams is also dissociable. 
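
      The motion control in point (2) can be illustrated with a partial-correlation sketch: the framewise motion trace is regressed out of each area's time course before the two areas are correlated. This is a minimal sketch on synthetic data; the function names and the linear (intercept plus motion) model are assumptions rather than the exact procedure used in the manuscript.

```python
import numpy as np

def residualize(signal, confound):
    """Remove the linear contribution of a confound (plus an intercept) from a time series."""
    design = np.column_stack([np.ones_like(confound), confound])
    beta, *_ = np.linalg.lstsq(design, signal, rcond=None)
    return signal - design @ beta

def motion_controlled_correlation(area_a, area_b, framewise_motion):
    """Correlate two areas after regressing framewise motion out of both."""
    res_a = residualize(area_a, framewise_motion)
    res_b = residualize(area_b, framewise_motion)
    return np.corrcoef(res_a, res_b)[0, 1]

# Synthetic example: both areas carry a shared signal plus a motion artifact.
rng = np.random.default_rng(0)
n_volumes = 300
motion = np.abs(rng.normal(size=n_volumes))
shared = rng.normal(size=n_volumes)
area_a = shared + 2.0 * motion + rng.normal(scale=0.5, size=n_volumes)
area_b = shared + 2.0 * motion + rng.normal(scale=0.5, size=n_volumes)

raw_r = np.corrcoef(area_a, area_b)[0, 1]
controlled_r = motion_controlled_correlation(area_a, area_b, motion)
```

      In this toy case the correlation drops once motion is removed but stays well above zero, because the shared signal is independent of motion.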

      Figure 3 also raises several alternative interpretations. The spatial frequency component in B has strong activity ONLY at the extreme border of the VF and this is probably the origin of the strong correlation. I understand that it is only one subject, but this brings the need to show all subjects and to report the correlation. Also, it is important to show the putative average ICA for retinotopy and spatial frequencies across subjects and for adults. All methods should be validated on adults where we have clear data for retinotopy and spatial frequency.

      The reviewer notes that the component in Figure 3 shows a strong negative response in the periphery. It is often the case, as reported elsewhere (Moeller et al., 2009), that ICA extracts portions of visual maps. To make a full visual map would require combining components into a composite (e.g., a component that has a high response in the periphery and another component that has a high response in the fovea). If we were to claim that this component, or others like it, could replace the need for retinotopic mapping, then we would want to produce these composite maps; however, our conclusion in this project is that the topographic information of retinotopic maps manifests in individual components of ICA. For this purpose, the analysis we perform adequately assesses this topography.

      Regarding the request to show the results for all subjects, we address this in the public response and repeat it here briefly: we have added 6 new figures to show results akin to Figure 3C and D. It is impractical to show the equivalent of Figure 3A and B for all participants, yet we do release the data necessary to visualize these maps easily.

      Finally, the reviewer suggests that we validate the analyses on adult participants. As shown in Figure S3 and reported in the public response, we now run these analyses on adult participants and observe qualitatively similar results to infants.

      How much was the variation in the presumed spatial frequency map? Is it consistent with the acuity range? 5-month-old infants should have an acuity of around 10c/deg, depending on the mean luminance of the scene.

      The reviewer highlights an important weakness of conducting ICA: we cannot put units on the degree of variation we see in components. We now highlight this weakness in the discussion:

      “Another limitation is that ICA does not provide a scale to the variation: although we find a correlation between gradients of spatial frequency in the ground truth and the selected component, we cannot use the component alone to infer the spatial frequency selectivity of any part of cortex. In other words, we cannot infer units of spatial frequency sensitivity from the components alone.” Pg. 20

      Figure 5 pipeline is totally obscure. I presumed that I understood, but as it is it is useless. All methods should be clearly described, and the intermediate results should be illustrated in figures and appropriately discussed. Using such blind analyses in infants in principle may not be appropriate and this needs to be verified. Overall all these techniques rely on correlation activities that are all biased by head movement, eye movement, and probably the dummy sucking. All those movements need to be estimated and correlated with the variability of the results. It is a strong assumption that the techniques should work in infants, given the presence of movements.

      We recognize that the SRM methods are complex. Given this feedback, we remade Figure 5 with explicit steps for the process and updated the caption (as reported in the public review).

      Regarding the validation of these methods, we have added SRM analyses from adults and find comparable results. This means that using these methods on adults with comparable amounts of data as what we collected from infants can predict maps that are highly similar to the real maps. Even so, it is not a given that these methods are valid in infants. We present two considerations in this regard. 

      First, as part of the SRM analyses reported in the manuscript, we show that control analyses are significantly worse than the real analyses (indicated by the lines on Figure 6). To clarify the control analysis: we break the mapping (i.e., flip the order of the data so that it is backwards) between the test participant and the training participants used to create the SRM. The fact that this control analysis is significantly worse indicates that SRM is learning meaningful representations that matter for retinotopy. 

      Second, we believe that this paper is a validation of SRM for infants. Infant fMRI is a nascent field and SRM has the potential to increase the signal quality in this population. We hope that readers will see these analyses as a proof of concept that SRM can be used in their work with infants. We have stated this contribution in the paper now.

      “Additionally, we wish to test whether methods for functional alignment can be used with infants. Functional alignment finds a mapping between participants using functional activity -- rather than anatomy -- and in adults can improve signal-to-noise, enhance across participant prediction, and enable unique analyses[27,32-34].” Pg. 4

      “This is initial evidence that functional alignment may be useful for enhancing signal quality, like it has in adults[27,32,33], or revealing changing function over development[45].” Pg. 21
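
      The logic of the control analysis described above (reversing the time order of the held-out participant's data) can be illustrated without SRM itself. The sketch below is an assumption-laden toy: it compares how well a group response matches a test participant when the time courses are intact versus time-reversed. The function name and synthetic data are hypothetical and stand in for the full SRM pipeline.

```python
import numpy as np

def mean_voxelwise_correlation(a, b):
    """Average Pearson correlation between matching rows of two (voxels x time) arrays."""
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    numerator = (a * b).sum(axis=1)
    denominator = np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum(axis=1))
    return float(np.mean(numerator / denominator))

# Synthetic example: group and test data share a stimulus-driven component.
rng = np.random.default_rng(1)
n_voxels, n_timepoints = 50, 200
stimulus_driven = rng.normal(size=(n_voxels, n_timepoints))
group_response = stimulus_driven + rng.normal(scale=0.5, size=(n_voxels, n_timepoints))
test_response = stimulus_driven + rng.normal(scale=0.5, size=(n_voxels, n_timepoints))

# Intact alignment versus the control with the test data flipped in time.
intact = mean_voxelwise_correlation(group_response, test_response)
flipped = mean_voxelwise_correlation(group_response, test_response[:, ::-1])
```

      Flipping preserves each voxel's overall statistics but breaks its temporal correspondence with the group, so any advantage of the intact analysis reflects shared, time-locked responses.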

      Regarding the reviewer’s concern that motion may bias the results, we wish to emphasize the nature of the analyses being conducted here: we are using data from a group of participants to predict the neural responses in a held-out participant. For motion to explain consistency between participants, the motion would need to be time-locked across participants. Even if motion were time-locked during movie watching, motion would impair the formation of an adequate model that can contain retinotopic information. Thus, motion should only hurt the ability to find a shared response that can be used for predicting retinotopic maps. Hence, the results we observed are despite motion and other sources of noise.

      What is M??? is it simply the mean value??? If not, how it is estimated?

      M is an abbreviation for mean. We have now expanded the abbreviation the first time we use it.

      Figure 6 should be integrated with map activity where the individual area correlation should be illustrated. Probably fitting SMR adult works well for early cortical areas, but not for more ventral and associative, and the correlation should be evaluated for the different masks.

      With the addition of plots showing the gradients for each participant and each movie (Figures S10–S13) we hope we have addressed this concern. We additionally want to clarify that the regions we tested in the analysis in Figure 6 are only the early visual areas V1, V2, V3, V3A/B, and hV4. The adult validation analyses show that SRM works well for predicting the visual maps in these areas. Nonetheless, it is an interesting question for future research with more extensive retinotopic mapping in infants to see if SRM can predict maps beyond extrastriate cortex.

      Occipital masks have never been described or shown.

      The occipital mask is from the MNI probabilistic structural atlas (Mazziotta et al., 2001), as reported in the original version and is shared with the public data release. We have added the additional detail that the probabilistic atlas is thresholded at 0% in order to be liberally inclusive. 

      “We used the occipital mask from the MNI structural atlas[63] in standard space -- defined liberally to include any voxel with an above zero probability of being labelled as the occipital lobe -- and used the inverted transform to put it into native functional space.” Pg. 27–28
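
      The liberal thresholding described in the quoted passage amounts to keeping every voxel with a non-zero probability. A minimal sketch on a toy array follows; the function name and the probability values are illustrative assumptions, and loading the atlas and transforming it into native space are not shown.

```python
import numpy as np

def liberal_mask(probability_map, threshold=0.0):
    """Binary mask of voxels whose label probability exceeds the threshold."""
    return probability_map > threshold

# Toy "probabilistic atlas" slice with probabilities in percent.
prob = np.array([[0, 0, 5],
                 [0, 12, 60],
                 [1, 80, 100]], dtype=float)

mask = liberal_mask(prob)                  # any above-zero probability
strict = liberal_mask(prob, threshold=50)  # a more conservative alternative
```

      A threshold of zero keeps low-probability border voxels that a stricter cutoff would drop, which is the sense in which the mask is liberally inclusive.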

      Methods lack the main explanation of the procedures and software description.

      We hope that the additions we have made to address this reviewer’s concerns have provided better explanations for our procedures. Additionally, as part of the data and code release, we thoroughly explain all of the software needed to recreate the results we have observed here.

    1. In this study, we developed SCZ-specific brain assembloids that, we believe, may overcome many limitations in studying the pathogenesis of human SCZ, such as the lack of systems representing different stages of developing human brains with cellular complexity.

      I think this is one important crux of the study, and could lead to resources useful for the field.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1:

      (1) All analyses were performed on trial-averaged neural responses that were pooled across mice. Owing to differences between subjects in behavior, experimental preparation quality, and biological variability, it seems important to perform at least some analyses on individual analyses to assess how behavioral training might differently affect each animal.

      In order to image at a relatively fast rate (30 Hz) appropriate to the experimental conditions, we restricted our imaging to a relatively small field of view (412 × 412 µm with 512 × 512 pixels). This entails a smaller number of ROIs per animal, which can lead to an unbalanced distribution of cells responsive to different stimuli for individual fields-of-view. We used the common approach of pooling across animals (Homann et al., 2021; Kim et al., 2019) to overcome limitations imposed by sampling a smaller number of cells per animal. In response to this comment, we included supplemental analyses (Supp. Fig. 6) showing that representational drift (the analysis of which was not performed on trial-averaged data) looks substantially the same (albeit noisier) for individual animals as at the population level. Additional analyses (PE ratio, etc.) were difficult since the distribution of cells selective for individual stimuli is unbalanced between individual animals and few mice have multiple cells representing all of the different stimuli.

      (2) The correlation analyses presented in Figure 3 (labeled the second Figure 2 in the text) should be conducted on a single-animal basis. Studying population codes constructed by pooling across mice, particularly when there is no behavioral readout to assess whether learning has had similar effects on all animals, appears inappropriate to me. If the results in Figure 3 hold up on single animals, I think that is definitely an interesting result.

      We repeated the correlation analysis on individual mice and included the results in the supplement (Supp. Fig. 6). The overall result generally mirrors the result found by pooling across animals.

      (3) On Day 0 and Day 5, the reordered stimuli are presented in trial blocks where each image sequence is shown 100 times. Why wasn't the trial ordering randomized as was done in previous studies (e.g. Gavornik and Bear 2014)? Given this lack of reordering, did neurons show reduced predictive responses because the unexpected sequence was shown so many times in quick succession? This might change the results seen in Figure 2, as well as the decoder results where there is a neural encoding of sequence order (Figure 4). It would be interesting if the Figure 4 decoder stopped working when the higher-order block structure of the task was disrupted.

      Our work builds primarily on previous studies (Gavornik & Bear, 2014; Price et al., 2023) that demonstrated clear changes in neural responses over days while employing a similar block structure. Notably, Price et al. found that trial number (within a block) was not a significant factor in the generation of prediction-error responses which strongly suggests short-term plasticity does not play a significant role in shaping responses within the block structure. This finding is consistent with our previous LFP recordings which have not revealed any significant plasticity occurring within a training session, a conclusion bolstered by a collaborative work currently in press (Hosmane et al. 2024, Sleep) revealing the requirement for sleep in sequence plasticity expression.

      It is possible that layer 2/3 adapts to sequences more rapidly than layer 4/5. While manual inspection does not reveal an obvious difference between early and late blocks in this dataset, the n for this subset is too small to draw firm conclusions. It is our view that the block structure provides the strongest comparison to previous work, but we agree it would be interesting to randomize or fully interleave sequences in future studies to determine what effect, if any, short-term changes might have.

      (4) A primary advantage of using two-photon calcium imaging over other techniques like extracellular electrophysiology is that the same neurons can be tracked over many days. This is a standard approach that can be accomplished by using many software packages-including Suite2P (Pachitariu et al. 2017), which is what the authors already used for the rest of their data preprocessing. The authors of this paper did not appear to do this. Instead, it appears that different neurons were imaged on Day 0 (baseline) and Day 5 (test). This is a significant weakness of the current dataset.

      The hypothesis being tested was whether expectation violations, as described in Keller & Mrsic-Flogel 2018, exist under a multi-day sequence learning paradigm. For this, tracking cells across days is not necessary as our PE metric compared responses of individual neurons to multiple stimuli within a single session. Given the speed/FOV tradeoff discussed above, we wanted to consider all cells irrespective of whether they were visible/active or trackable across days, especially since we would expect cells that learn to signal prediction errors to be inactive on day 0 and not selected by our segmentation algorithm. Though we did not compare the responses of single cells before/after training, we did analyze cells from the same field of view on days 0 and 5 (see Supp. Fig. 1) and not distinct populations.

      Reviewer #2:

      (1) There appears to be some confusion regarding the conceptual framing of predictive coding.

      Assuming the mouse learns to expect the sequence ABCD, then ABBD does not probe just for negative prediction errors, and ACBD is not just for positive prediction errors. With ABBD, there is a combination of a negative prediction error for the missing C in the 3rd position, and a positive prediction error for B in the 3rd. Likewise, with ACBD, there is a negative prediction error for the missing B at 2nd and missing C at 3rd, and a positive prediction error for the C in 2nd and B in 3rd. Thus, the authors' experimental design does not have the power to isolate either negative or positive prediction errors. Moreover, looking at the raw data in Figure 2C, this does not look like an "omission" response to C, but more like a stronger response to a longer B. The pitch of the paper as investigating prediction error responses is probably not warranted - we see no way to align the authors' results with this interpretation.

      The reviewer has identified a real problem with the framing of “positive” and “negative” prediction errors in context of sensory stimuli where substitution simultaneously introduces unexpected “positive” violation and “negative” omission. Simply put, even if there are separate mechanisms to represent positive and negative errors, there may be no way to isolate the positive response experimentally since an unexpected input always replaces the unseen expected input. For example, had a cell fired solely to ACBD (and not during either ABCD or ABCD), then whether it was signaling the unexpected occurrence of C or the unexpected absence of B would be inherently ambiguous. In either case, such a cell would have been labeled as C-responsive, and its activity would have been elevated compared with ABCD and would have been included in our substitution-type analysis of prediction errors. We accept that there is some ambiguity regarding the description in this particular case, but overall, this cell’s activity pattern would have informed the PE analysis for which the result was essentially null for the substitution-type violation ACBD.

      Omission, in which the sensory input does not change, may experimentally isolate the negative response, though this is only true if there is a temporal expectation of when the change should have occurred. If A is predicting B in an ordinal sense but there is no expectation of when B will occur with respect to A, changing the duration of A would not be expected to produce an error signal since at any point in time B might still be coming and the expectation is not broken until something other than B occurs. With respect specifically to ABBD in our experiments, it is correct that the learned error responses take the form of stronger, sustained responses to B during the time C was expected. This is still in contrast to day 0, in which activation decays after a transient response to ABBD. The data show that responses during an omitted element are altered with training and take the form of elevated responses to ABBD on day 5. As we say in our discussion, this is somewhat ambiguous evidence of prediction errors since it emerges only with training and is generally consistent with the hypothesis being tested, though it takes a different form than we expected.

      (2) Related to the interpretation of the findings, just because something can be described as a prediction error does not mean it is computed in (or even is relevant to) the visual cortex. To the best of our knowledge, it is still unclear where in the visual stream the responses described here are computed. It is possible that this type of computation happens before the signals reach the visual cortex, similar to mechanisms predicting moving stimuli already in the retina (https://pubmed.ncbi.nlm.nih.gov/10192333/). This would also be consistent with the authors' finding (in previous work) that single-cell recordings in V1 exhibit weaker sequence violation responses than the author's earlier work using LFP recordings.

      Our work was aimed at testing the specific hypothesis that PE responses, at the very least, exist in L2/3—a hypothesis that is well-supported under different experimental paradigms (often multisensory mismatch). Our aim was to test this idea under a sequence learning paradigm and connect it with previously found PE responses in L4. We don’t claim that it is the only place in which prediction errors may be computed or useful, especially since (as you mentioned), there is evidence for such responses in layer 4. But it is fundamentally important to predictive processing that we determine whether PE responses can be found in layer 2/3 under this passive sequence learning paradigm, whether or not they reflect upstream processes, feedback from higher areas, or entirely local computations. Our aim was to establish some baseline evidence for or against predictive processing accounts of L2/3 activity during passive exposure to visual sequences.

      (3) Recording from the same neurons over the course of this paradigm is well within the technical standards of the field, and there is no reason not to do this. Given that the authors chose to record from different neurons, it is difficult to distinguish representational drift from drift in the population of neurons recorded.

      Our discussion of drift refers to changes occurring within a population of neurons over the course of a single imaging session. We have added clarifying language to the manuscript to make this clear. Changes to the population-level encoding of stimuli over days are treated separately and with different analytical tools. Regarding tracking single cells across days, please see the response to Reviewer #1, comment 4.

      (4) The block paradigm to test for prediction errors appears ill-chosen. Why not interleave oddball stimuli randomly in a sequence of normal stimuli? The concern is related to the question of how many repetitions it takes to learn a sequence. Can the mice not learn ACBD over 100x repetitions? The authors should definitely look at early vs. late responses in the oddball block. Also, the first few presentations after the block transition might be potentially interesting. The authors' analysis in the paper already strongly suggests that the mice learn rather rapidly. The authors conclude: "we expected ABCD would be more-or-less indistinguishable from ABBD and ACBD since A occurs first in each sequence and always preceded by a long (800 ms) gray period.

      This was not the case. Most often, the decoder correctly identified which sequence stimulus A came from." This would suggest that whatever learning/drift could happen within one block did indeed happen and responses to different sequences are harder to interpret.

      This work builds on previous studies that used a block structure to drive plasticity across days. We previously tested whether there are intra-block effects and found no indication of changes occurring within a block or within a session (please see the response to Reviewer #1, comment 3 for further discussion). Observed drift does complicate comparison between blocks. There is no indication in our data that this is a learned effect, though future experiments could test this directly.

      (5) Throughout the manuscript, many of the claims are not statistically tested, and where they are the tests do not appear to be hierarchical (https://pubmed.ncbi.nlm.nih.gov/24671065/), even though the data are likely nested.

      We have modified language throughout the manuscript to be more precise about our claims. We used pooled data between mice and common parametric statistics in line with published literature. The referenced paper offers a broad critique of this approach, arguing that it increases the possibility of type 1 errors, though it is not clear to us that our experimental design carries this risk, particularly since most of our results were negative. To address the specific concern, however, we performed a non-parametric hierarchical bootstrap analysis (https://pmc.ncbi.nlm.nih.gov/articles/PMC7906290/) that re-confirmed the statistical significance of our positive results (see Supplemental Figure 8).
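
      For readers unfamiliar with the approach, a hierarchical bootstrap for nested data (cells within mice) can be sketched as follows: resample mice with replacement, then resample cells within each selected mouse, and recompute the statistic each time. This is a minimal sketch on synthetic data; the function names, the choice of the mean as the statistic, and the group sizes are assumptions, not the analysis code behind Supplemental Figure 8.

```python
import numpy as np

def hierarchical_bootstrap_means(cells_by_mouse, n_boot=1000, seed=0):
    """Bootstrap distribution of the pooled mean, resampling mice and then cells."""
    rng = np.random.default_rng(seed)
    mice = [np.asarray(v) for v in cells_by_mouse.values()]
    boot_means = np.empty(n_boot)
    for i in range(n_boot):
        # Level 1: resample mice with replacement.
        chosen = rng.integers(0, len(mice), size=len(mice))
        # Level 2: resample cells with replacement within each chosen mouse.
        resampled = [rng.choice(mice[m], size=len(mice[m]), replace=True) for m in chosen]
        boot_means[i] = np.concatenate(resampled).mean()
    return boot_means

def prob_greater(boot_a, boot_b):
    """Fraction of bootstrap pairs in which the statistic for a exceeds that for b."""
    return float(np.mean(boot_a[:, None] > boot_b[None, :]))

# Synthetic example: 4 mice per day, 30 cells each, larger responses on day 5.
rng = np.random.default_rng(42)
day0 = {f"mouse{m}": rng.normal(0.0 + rng.normal(scale=0.2), 1.0, size=30) for m in range(4)}
day5 = {f"mouse{m}": rng.normal(1.5 + rng.normal(scale=0.2), 1.0, size=30) for m in range(4)}

boot_day0 = hierarchical_bootstrap_means(day0, seed=1)
boot_day5 = hierarchical_bootstrap_means(day5, seed=2)
p_day5_greater = prob_greater(boot_day5, boot_day0)
```

      Because whole mice are resampled at the first level, between-animal variability widens the bootstrap distribution, which is what guards against treating cells from the same animal as independent samples.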

      (6) The manuscript would greatly benefit from thorough proofreading (not just in regard to figure references).

      We apologize for the errors in the manuscript. We caught the issue and passed on a corrected draft, but apparently the uncorrected draft was sent for review. The re-written manuscript addresses all identified issues.

      (7) With a sequence of stimuli that are 250ms in length each, the use of GCaMP6s appears like a very poor choice.

      We started our experiments using GCaMP6f but ultimately switched to GCaMP6s due to its improved sensitivity, brightness, and accuracy in spike detection (Huang et al., 2021). When combined with deconvolution (Pachitariu et al., 2018; Pnevmatikakis et al., 2016), we found GCaMP6s provides the most complete and accurate view of spiking within 40ms time bins. The inherent limitations of calcium imaging are more likely to be addressed using electrophysiology rather than a faster sensor in future studies.

      (8) The data shown are unnecessarily selective. E.g. it would probably be interesting to see how the average population response evolves with days. The relevant question for most prediction error interpretations would be whether there are subpopulations of neurons that selectively respond to any of the oddballs. E.g. while the authors state they "did" not identify a separate population of omission-responsive neurons, they provide no evidence for this. However, it is unclear whether the block structure of the experiments allows the authors to analyze this.

      We concluded that there is no clear dedicated subpopulation of omission-responding cells by inspecting cells with large PE responses (i.e., ABBD, see supplemental figure 3). Out of the 107 B-responsive cells on day 5, only one appeared to fire exclusively during the omitted stimulus. Average traces for all B-responsive cells are included in the supplement and we have updated the manuscript accordingly. Similarly, a single C-responsive cell was found with an apparently unique substitution error profile (ABCD and ACBD, supplemental figure 4).

      Our primary concern was to make sure that days 0 and 5 had the highest quality fields-of-view. In work leading up to this study, there were concerns that imaging on all intermediate days resulted in a degradation of quality due to photobleaching. We agree that an analysis of intermediate days would be interesting, but it was excluded due to these concerns. 

      Reviewer #3:

      (1) Experimental design using a block structure. The use of a block structure on test days (0 and 5) in which sequences were presented in 100 repetition blocks leads to several potential confounds. First, there is the potential for plasticity within blocks, which could alter the responses and induce learned expectations. The ability of the authors to clearly distinguish blocks 1 and 2 on Day 0 with a decoder suggests this change over time may be meaningful.

      Repeating the experiments with fully interleaved sequences on test days would alleviate this concern. With the existing data, the authors should compare responses from the first trials in a block to the last trials in a block.

      This block design likely also accounts for the ability of a decoder to readily distinguish stimulus A in ABCD from A in ABBD. As all ABCD sequences were run in a contiguous block separate from ABBD, the recent history of experience is different for A stimuli in ABCD versus ABBD. Running fully interleaved sequences would also address this point, and would also potentially mitigate the impact of drift over blocks (discussed below).

      As described in other responses, the block structure was chosen to align more closely with previous studies. We take the overall point though, and future studies will employ the suggested randomized or interleaved structure in addition to block structures to investigate the effects of short-term plasticity.

      (2) The computation of prediction error differs significantly for omission as opposed to substitutions, in meaningful ways the authors do not address. For omission errors, PE compares the responses of B1 and B2 within ABBD blocks. These responses are measured from the same trial, within tens of milliseconds of each other. In contrast, substitution PE is computed by comparing C in ABCD to C in ACBD. As noted above, the block structure means that these C responses were recorded in different blocks, when the state of the brain could be different. This may account for the authors' detection of prediction error for omission but not substitution. To address this, the authors should calculate PE for omission using B responses from ABCD.

      We performed the suggested analysis (i.e., ABBD vs ABCD) prior to submission but omitted it from the draft for brevity (the effect was the same as with ABBD vs ABBD). We have added the results of standardizing with ABCD as supplementary figure 3.

      (3) The behavior of responses to B and C within the trained sequence ABCD differs considerably, yet is not addressed. Responses to B in ABCD potentiate from d0-> d5, yet responses to C in the same sequence go down. This suggests there may be some difference in either the representation of B vs C or position 2 vs 3 in the sequence that may also be contributing to the appearance of prediction errors in ABBD but not ACBD. The authors do not appear to consider this point, which could potentially impact their results. Presenting different stimuli for A,B,C,D across mice would help (in the current paper B is 75 deg and C is 165 deg in all cases). Additionally, other omissions or substitutions at different sequence positions should be tested (eg ABCC or ABDC).

      We appreciate the suggestion. Ideally, we could test many different variants, but practical concerns regarding the duration of the imaging sessions prevented us from testing other interesting variations (such as ABCC) in the current study. We are uncertain as to how we should interpret the overall depressed response to element C seen on day 5, but since the effect is shared in both ABCD and ACBD, we don’t think it affected our PE calculations. 

      (4) The authors' interpretation of their PCA results is flawed. The authors write "Experience simplifies activity in principal component space". This is untrue based on their data. The variance explained by the first set of PCs does not change with training, indicating that the data is not residing in a lower dimensional ("simpler") space. Instead, the authors show that the first 5 PCs better align with their a priori expectations of the stimulus structure, but that does not mean these PCs necessarily represent more information about the stimulus (and the fact that the authors fail to see an improvement in decoding performance argues against this case). Addressing such a question would be highly interesting, but is lacking in the current manuscript. Without such analysis, referring to the PCs after training as "highly discretized" and "untangled" are largely meaningless descriptions that lack analytical support.

      We meant the terms “simpler”, “highly-discretized”, and “untangled” as qualitative descriptions of changes in covariance structure that occurred despite the maintenance of overall dimensionality. As the reviewer notes, the obvious changes in PC space appear to have had practically no effect on decodability or dimensionality, and we found this surprising and worth describing.

      (5) The authors report that activity sparsifies, yet provide only the fraction of stimulus-selective cells. Given that cell detection was automated in a manner that takes into account neural activity (using Suite2p), it is difficult to interpret these results as presented. If the authors wish to claim sparsification, they need to provide evidence that the total number of ROIs drawn on each day (the denominator for sparseness in their calculation) is unbiased. Including more (or less) ROIs can dramatically change the calculated sparseness.

      The authors mention sparsification as contributing to coding efficiency but do not test this. Training a decoder on variously sized subsets of their data on days 0 and 5 would test whether redundant information is being eliminated in the network over training.

      First, we provide evidence for sparseness using a visual responsiveness metric in addition to stimulus-selectivity. Second, it is true that Suite2p’s segmentation is informed by activity and may therefore omit cells with very minimal activity. However, we detected a comparable number of cells on day 5 (n=1500) and day 0 (n=1368). We report that roughly half as many cells are stimulus-selective on day 5 compared with day 0. For that to have been a result of biased ROI segmentation, we would have needed to detect closer to 2600 cells on day 5 rather than 1500. Therefore, we consider any bias in the segmentation to have had little effect on the main findings.
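The arithmetic behind this argument can be sketched as follows. This is only an illustration under one reading of the response: that the fraction of stimulus-selective cells roughly halved, and that a pure segmentation bias would leave the number of selective cells unchanged while inflating the total ROI count. The function name and the two ratio values are assumptions for illustration; only the cell counts (1368 and 1500) come from the response.

```python
# Hypothetical helper: how many ROIs would be needed on day 5 for the drop in
# the selective fraction to be explained by a larger denominator alone?
def rois_needed_for_bias(n_day0: int, fraction_ratio: float) -> float:
    """n_day0: ROIs detected on day 0.
    fraction_ratio: (selective fraction on day 0) / (selective fraction on day 5).
    Assumes the absolute number of selective cells is unchanged between days."""
    return n_day0 * fraction_ratio


n_day0 = 1368  # from the response
n_day5 = 1500  # from the response

# "Roughly half as many" corresponds to a ratio near 2 (assumed values).
for ratio in (1.9, 2.0):
    needed = rois_needed_for_bias(n_day0, ratio)
    print(f"ratio {ratio}: about {round(needed)} ROIs needed, {n_day5} detected")
```

Under these assumptions the required ROI count is far above the 1500 actually detected, which is the point the response makes.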

      (6) The authors claim their results show representational drift, but this isn't supported in the data. Rather they show that there is some information in the structure of activity that allows a decoder to learn block ID. But this does not show whether the actual stimulus representations change, and could instead reflect an unrelated artifact that changes over time (responsivity, alertness, bleaching, etc). To actually assess representational drift, the authors should directly compare representations across blocks (one could train a decoder on block 1 and test on blocks 2-5). In the absence of this or other tests of representational drift over blocks, the authors should remove the statement that "These findings suggest that there is a measurable amount of representational drift".

      “To actually assess representational drift, the authors should directly compare representations across blocks (one could train a decoder on block 1 and test on blocks 2-5)”: This is the exact analysis that was performed. Additionally, our analysis of pairwise correlations directly measures representational drift.

      “But this does not show whether the actual stimulus representations change, and could instead reflect an unrelated artifact that changes over time (responsivity, alertness, bleaching, etc)”: We have repeated the decoder analysis using normalized population vectors (Supplementary Figure 5) which we believe directly addresses whether the observed drift is due to photobleaching or alertness that would affect the overall magnitudes of response vectors.

      Our analysis of block decoding reflects decoders trained on individual stimulus elements, and we show the average over all such decodings (we have clarified this in the text). For example, we trained a decoder on ABCD presentations from block 1 and tested only against ABCD from other blocks, which we believe is the test being suggested by the reviewer. Furthermore, we do show that representational similarity for all stimulus elements reduces gradually and more-or-less monotonically as the time between presentations increases. We believe this is a fairly straightforward test of representational drift as has been reported and used elsewhere (Deitch et al., 2021).
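As a rough illustration of the kind of test discussed above (train on block 1, test on later blocks, using normalized population vectors), the following is a minimal sketch on synthetic data. It is not the authors' pipeline: the nearest-centroid decoder, the cosine-similarity rule, the simulated gain loss across blocks, and all function names are assumptions chosen for illustration.

```python
import math
import random


def unit_normalize(v):
    """Scale a population vector to unit L2 norm (removes overall response magnitude)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fit_centroids(trials):
    """trials: list of (label, vector). Returns {label: unit-norm mean vector}."""
    by_label = {}
    for label, vec in trials:
        by_label.setdefault(label, []).append(unit_normalize(vec))
    return {
        label: unit_normalize([sum(col) / len(vs) for col in zip(*vs)])
        for label, vs in by_label.items()
    }


def predict(vec, centroids):
    """Pick the label whose centroid has the highest cosine similarity to vec."""
    v = unit_normalize(vec)
    return max(centroids, key=lambda k: sum(a * b for a, b in zip(v, centroids[k])))


def cross_block_accuracy(data, train_block=1):
    """data: list of (block, label, vector). Train on one block, test on each other block."""
    centroids = fit_centroids([(l, v) for b, l, v in data if b == train_block])
    hits, totals = {}, {}
    for b, l, v in data:
        if b == train_block:
            continue
        totals[b] = totals.get(b, 0) + 1
        hits[b] = hits.get(b, 0) + (predict(v, centroids) == l)
    return {b: hits[b] / totals[b] for b in totals}


def simulate(n_cells=40, n_blocks=5, n_trials=20, seed=0):
    """Synthetic responses: fixed tuning per stimulus, with a global gain loss per block
    (a stand-in for bleaching or reduced alertness, not real drift)."""
    rng = random.Random(seed)
    templates = {s: [rng.random() for _ in range(n_cells)] for s in "ABCD"}
    data = []
    for b in range(1, n_blocks + 1):
        gain = 1.0 - 0.1 * (b - 1)
        for s, t in templates.items():
            for _ in range(n_trials):
                data.append((b, s, [gain * x + rng.gauss(0, 0.05) for x in t]))
    return data


accuracy = cross_block_accuracy(simulate())
print(accuracy)
```

In this toy setting a pure change in response gain does not degrade cross-block decoding once vectors are normalized, so a drop in accuracy on real data would point to a change in the pattern of activity rather than its magnitude.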

      (7) The authors allude to "temporal echoes" in a subheading. This term is never defined, or substantiated with analysis, and should be removed.

      We hoped the term ‘temporal echo’ would be understood in the context of rebounding activity during gray periods as supported by analysis in figure 6a. We have eliminated the wording in the updated manuscript.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      1. General Statements [optional]

      We thank the reviewers for their insightful comments regarding our study and for appreciating the range of experiments used, the depth of our study and the significance of our work. We also thank reviewers with expertise in evolutionary biology for highlighting the need for precise wording in some parts of the manuscript and the need for balancing the various viewpoints on the current understanding of early metazoan evolution. A point-by-point response to each reviewer comment is given below. We believe that we can effectively address most reviewer comments in a revised version. The revised improved manuscript will be the first insightful study of intracellular signalling pathways in the context of early animal evolution. We thank the reviewer for noting that this study is highly impactful and can have a broader influence on the scientific community.

      2. Description of the planned revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: The researchers identified PIP4K (phosphatidylinositol 5 phosphate 4-kinase) as a lipid kinase that is specific to metazoans. In order to determine its conserved function across metazoans, they compared PIP4K activity in both early-branching metazoans and bilaterian animals. Biochemical assays demonstrated a conserved catalytic activity between the sponge Amphimedon queenslandica (AqPIP4K) and human PIP4K. In in-vivo experiments, AqPIP4K was found to rescue the reduced cell size, growth, and development phenotype in larvae of null mutant in Drosophila PIP4K. Based on these findings, the authors suggest that the function of PIP4K was established in early metazoans to facilitate intercellular communication. The experiments were well designed, and a range of biochemical, in vitro, and in vivo experiments were conducted.

      That being said, there are some questions that require further discussion before we can fully accept the author's conclusion of an evolutionarily conserved function of PIP4K across metazoans.

      Major comments:

      • The authors mentioned that PIP4K is metazoan-specific and involved in intercellular communication. How can we explain the presence of PIP4K in choanoflagellate genomes? Despite its high similarity with conserved domains and functionally important residues, experimental results with the PIP4K from Choanoflagellate (Monosiga brevicollis, MbPIP4K) such as Mass spectrometry-based kinase assay and mutant Drosophila PIP4K didn't show similar activity to sponge AqPIP4K. The authors suggested that "In the context of other ancient PIP4K it is possible that since choanoflagellates exist as both single-cell and a transient multicellular state and do not have the characteristics of metazoans, PIP4K does not play any important functional role in these." However, this explanation is not well justified; they need to provide a more detailed discussion on this.

      Response: PIP4K is found in the genome of the choanoflagellate, M. brevicollis. MbPIP4K has the requisite kinase domain and the critical residue in the activation loop (A381) required for PIP4K activity is also conserved with the Amphimedon enzyme. Despite this, MbPIP4K was unable to rescue the growth and cell size phenotype of dPIP4K mutants (dPIP4K29) unlike AqPIP4K.

      We have previously published a comparison of the in vitro activity versus in vivo function for the three PIP4K enzymes in the human genome (Mathre et.al PMID: 30718367). While all three human PIP4K isoforms can functionally rescue the Drosophila dPIP4K mutant, there is a nearly 104-fold difference for in vitro activity between them with PIP4K2C showing almost no in vitro activity. The difference in in vitro enzyme activity between MbPIP4K and AqPIP4K is similarly notable. We would however highlight that this is more likely a reflection of the limitations of the in vitro PIP4K activity.

      However, while AqPIP4K can rescue function in vivo (rescue of fly mutant phenotypes) MbPIP4K could not when expressed in fly cells. This must imply that there are differences in the polypeptide sequences of AqPIP4K and MbPIP4K that allow the former but not the latter to couple to the Insulin PI3K signalling pathway in fly cells. Given that Amphimedon and Choanoflagellates are separated by 100-150 Mya in evolution, this is possible. Our data on expression of AqPIP4K and MbPIP4K in fly S2 cells shows that they do not have equivalent localization (Fig 2C). What are the differences in the two polypeptides that lead to this? We will perform a multiple sequence alignment using PIP4K sequences from multiple choanoflagellates and sponges to identify these differences.

      We will include the results of this analysis and an appropriate discussion in the revised manuscript.

      • Likewise, the PIP4K gene has been identified in cnidarians, which are a sister group to bilaterian animals. However, the Cnidaria HvPIP4K showed no activity in biochemical or functional assays. In comparison to sponges, cnidarians are relatively complex organisms, and I believe that PIP4K is highly important for intercellular communication, as it is in bilaterians. The authors attempted to explain this by suggesting that "Based on theories of parallel evolution between cnidarians and sponges during early metazoan evolution, it is possible that the PIP4K gene was retained functional in one lineage and not in other." However, I am not convinced by this statement.

      Response: This is a really interesting and challenging question from the reviewer. We are aware that both sponges (Porifera) and Cnidaria are examples of primitive metazoans separated by 80-90 Mya of evolution, yet while AqPIP4K shows activity and can functionally rescue dPIP4K mutants, HvPIP4K cannot. What does this mean?

      A key difference between sponges and cnidarians is that while cnidarians have a simple “nerve-net” like nervous system, sponges do not have such a mode of communication. Therefore, it is possible that PIP4K, which we propose works in the context of hormone-based communication, is functionally important in sponges.

      We are of course aware and acknowledge that in a like for like experimental system (Drosophila cells) our data shows that the two proteins behave differently, be it in terms of in vitro activity or in vivo function. This must imply inherent differences in the two polypeptides.

      What we propose to do is to compare available PIP4K sequences from multiple Porifera and Cnidaria genomes and try and understand differences in the protein sequence that might explain differences in function. These results and their implications will be included in the revised manuscript.

      • Please provide details of the databases (Uniprot-KB, NCBI sequence database, Pfam) versions. After identifying the specific PIP4K protein in each species (e.g. AqPIP4K and HvPIP4K), have you considered performing a reciprocal blast against the human genome to see if you have a top hit to PIP4K? Hence, the main focus of the project is on PIP4K as a metazoan-specific protein. We need to include a wider representation of non-bilaterian animals, including multiple species from sponges, ctenophores, placozoans, and cnidarians. Additionally, please check if homologues of PIP4K are present in other unicellular holozoans besides choanoflagellates.

      Response: We will add the NCBI IDs for all the sequences. We carried out a reciprocal BLAST against the human proteome before classifying the selected sequences as PIP4K, and we will add these results to the supplementary material. We will add more species of sponges, ctenophores, placozoans, and cnidarians to our analysis of PIP4K sequences. We will also include an analysis of other unicellular holozoans where a genome sequence is available.

      • Authors suggested the identification of other components of the PI signaling pathway along with PIP4k in the sponge. What is the status of these PI signaling pathway genes in other non-bilaterians and choanoflagellates? Response: We will add the details of the same in the revised manuscript and agree that this will help enhance the interpretation of our results.

      • Phylogenetic tree of all PIP4K sequences (Figure 1C): How authors can be certain that the identified PIP4K sequences (e.g. AqPIP4K, HvPIP4K, and MbPIP4K) are indeed PIP4K, especially when there are several closely related proteins? It is important to conduct phylogenetic analysis alongside other PIP sequences (such as PI3K, PI4K, PIP5K, and PIP4K). If this analysis is carried out, the identified AqPIP4K, HvPIP4K, and MbPIP4K should be grouped together with human PIP4K in the same cluster.

      Response: As described in the methods, we have searched all the individual genomes analyzed for all PIK and PIPK enzyme sequences. We have marked the domains (using Pfam and Interpro) on these sequences and eliminated other PIK and PIPK sequences (such as PI3K, PI4K, PIP5K) and selected only PIP4K. To additionally confirm the distinction between PIP5K and PIP4K, we have manually inspected each sequence to establish the identity of the A381 amino acid residue in the activation loop. The identity of the amino acid at this position in the activation loop has been experimentally demonstrated to be an essential feature of PIP4K (Kunze et.al PMID: 11733501) and we have also confirmed this independently in a recent study (Ghosh et.al PMID: 37316298).

      We will perform the phylogenetic analysis of the phosphoinositide kinases in the format suggested by the reviewer and add it in the revision as a supporting evidence.
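The reciprocal BLAST check discussed in the comments above can be sketched as a reciprocal-best-hit filter. This is a minimal illustration, not the authors' actual procedure: it assumes the BLAST results have already been parsed into dictionaries of (subject, bit score) lists, and all sequence identifiers and scores below are made up.

```python
def best_hit(hits):
    """hits: list of (subject_id, bit_score). Returns the top-scoring subject id, or None."""
    return max(hits, key=lambda h: h[1])[0] if hits else None


def reciprocal_best_hits(forward, reverse):
    """forward: {human_query_id: [(target_id, score), ...]} from human vs. target proteome.
    reverse: {target_id: [(human_id, score), ...]} from target candidates vs. human proteome.
    Keeps a pair only if each sequence is the other's best hit."""
    pairs = []
    for query, hits in forward.items():
        candidate = best_hit(hits)
        if candidate is not None and best_hit(reverse.get(candidate, [])) == query:
            pairs.append((query, candidate))
    return pairs


# Hypothetical identifiers and scores, for illustration only.
forward = {"human_PIP4K2A": [("sponge_seq1", 410.0), ("sponge_seq2", 150.0)]}
reverse = {
    "sponge_seq1": [("human_PIP4K2A", 405.0), ("human_PIP5K1A", 120.0)],
    "sponge_seq2": [("human_PIP5K1A", 300.0), ("human_PIP4K2A", 140.0)],
}
rbh = reciprocal_best_hits(forward, reverse)
print(rbh)
```

A strict reciprocal-best-hit rule is more conservative than asking only whether the top human hit is any PIP4K isoform; with three human isoforms, a looser rule that accepts any PIP4K family member as the reverse top hit may be more appropriate.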

      Minor comments:

      • Line 157: Phylogenetic conservation of PIP4Ks: Please provide details about bootstrap analysis. Response: Will be added

      • Line 230: symbol correction 30 °C Response: Will be done

      • Line 429-430: "from early metazoans like Sponges, Cnidaria and Nematodes." Nematodes are not considered early metazoans. Response: Apologies for the typo. This will be corrected. We agree that nematodes are not early metazoans.

      • Line 477-478: "However, interestingly, MbPIP4K::GFP localizes only at the plasma membrane in S2 cells (Figure 2C)." This part was not further discussed. Can you please elaborate on why MbPIP4K::GFP localizes only at the plasma membrane in S2 cells? Response: We have discussed this point specifically in response to major comment by the reviewer and it will be addressed as described.

      • Line 598: "the earliest examples of metazoa, namely the coral A. queenslandica" A. queenslandica is a sponge, not coral. Response: Apologies for the error. We will correct it.

      • Line 602: "Amphimedon and human enzyme, although separated by 50Mya years of evolution" I think it's 500 million years ago, not 50 million years ago. Response: This typo will be corrected.

      • Line 612: "coordinated communication between the cells is the most likely function" the cell. Response: Will change the sentence accordingly

      • Line 614: "intracellular phosphoinositide signalling the identity of the hormone" missing full stop punctuation. Response: Will change the sentence accordingly

      • Line 802 - 804: "other by way of difference in colour. The sub clusters have been numbered (1- early metazoans, 2- Nematodes, 3- Arthropods, 4- Molluscs, 5- Vertebrates (isoform PIP4K2C), 6- Vertebrates (isoform PIP4K2A), 7- Vertebrates (isoform PIP4K2B)." In the Figure, I can't find numbers on the subclusters. Response: Will add the numbers in the figure.

      • Line 805- 807: "Phylogenetic analysis of selected PIP4K sequences from model organisms of interest. PIP4K from A. queenslandica has been marked in rectangular box." The rectangular box is missing in the figure. Response: Will change the figure accordingly

      • Figure 1C: full forms of species names are missing. Response: Will change the figure accordingly

      Reviewer #1 (Significance (Required)):

      The data is presented well, and the authors used a wide range of assays to support their conclusion. The study is highly impactful and can have a broader influence on the scientific community, particularly in evolutionary molecular biology, development, and biochemistry.

      The study provides interesting findings; however, the reasons for PIP4K not being functional in cnidarians as in sponges and why PIP4K is present in unicellular holozoans but not functional are unclear.

      We thank the reviewer for appreciating the significance and impact of our study. The very helpful questions raised by the reviewer will help enhance the quality of our study even further. We will make every effort to address these queries.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The manuscript by Krishnan et al. uses molecular phylogenetics, in vitro kinase assays, heterologous expression assays in Drosophila S2 cells and mutant complementation assays in yeast to study the evolution and function of putative PIP4 kinase genes from a sponge, a cnidarian and a choanoflagellate. Based on these experiments, the authors conclude that PIP4K is metazoan-specific and that the sponge PIP4K has conserved functions in selectively phosphorylating PI5P.

      The study is in principle of interest and it could all be valid data, but the large number of flaws in the data presentation and/or analysis just makes it hard to assess the quality and thus validity of the data and conclusions.

      We thank the reviewer for appreciating the potential interest in our findings of PIP4K function in early metazoans. We thank them for noting the need for correcting data presentation and these will be done in the revision.

      Major comments:

      Overall, the manuscript lacks scientific rigor in the analysis and representation of the results, and the validity of many of the conclusions is therefore difficult to assess.

      Major problems are:

      (i) The authors base their study on the evolution of PIP4K genes on a deeply flawed concept of animal evolution. On multiple occasions, including the title, the authors refer to extant species (e.g. Amphimedon) as 'early metazoan', 'regarded as the earliest evolved metazoan' (l. 46-7) or 'the earliest examples of metazoans' just to name a few. This reflects a 'ladder-like' view on evolution that suggests that extant sponges are identical to early 'steps' of animal evolution.

      We thank the reviewer, who is clearly vastly more experienced in the field of evolutionary biology, for pointing out the possibly imprecise or incorrect usage of the term “ancient metazoan”. As new entrants to this area of evolutionary biology, we have of course referred to the existing literature such as PMID: 20686567 to guide us. This paper describes the sequencing of the A. queenslandica genome. It is clear that there is perceived value in studying this sponge in the context of early animal evolution, although we are aware that there is a multitude of sponges and not all of them may be of value in the study of early animal evolution. We will peruse the literature more carefully and revise the manuscript to provide a more balanced view of this very interesting but unresolved area.

      Also, the author's interpretation that one cluster of genes 'contained the sequences from early metazoans like sponges, cnidaria and nematodes' is referring to an outdated idea of animal phylogeny where nematodes were thought to be ancestrally simple organisms grouped as 'Acoelomata'. This idea of animal phylogeny was however disproven by molecular phylogenetics since the 1990ies.

      Response: We are aware that the field of animal classification is undergoing continuous evolution. While earlier classifications may have been based on the presence or otherwise of a coelom and/or other anatomical features, we are aware of the use of molecular phylogenetics.

      The phylogeny presented in Fig 1C is based on the sequence relationships between the PIP4K sequences from various animal genomes. Any errors in the labelling of groups such as that highlighted by the reviewer will be revised or corrected after a careful consideration of extant views in the field, which are somewhat varied.

      (ii) The description of taxa in the phylogenetic tree in Fig. 1B lacks any understanding of phylogenetic relationships between animals and other eukaryotic groups. What kind of taxa are 'invertebrates' or 'parasites'? And why would 'invertebrates' exclude cnidarians and sponges? Also, why is the outgroup of opisthokonts named 'Eukaryota'?? Are not all organisms represented on the tree eukaryotes?

      Response: We apologize for this imprecision in labelling taxa. This will be corrected.

      (iii) The methods part lacks any information about the type of analysis (ML, Bayesian, Parsimony?) used to perform the phylogenetic analysis shown in Fig. 1C. Also, the authors mention three distinct clusters (l.428) that are not labelled in the figure.

      Response: We will update the methods to include the additional details requested by the reviewer. Fig 1C will be re-labelled.

      (iv) The validity of the Western Blot is difficult to assess as the authors have cut away the MW markers. Without, it is for example difficult to assess the size differences visible between Hydra and Monosiga PIP4K-GFP proteins on Fig. 2B. Also, it has become standard practice to show the whole Western blot as supplementary data in order to assess the correct size of the bands and the specificity of the antibody. This is also missing from this manuscript.

      Response: Cropped Western blots have been shown to facilitate figure preparation in the main manuscript. The complete uncropped Western blots, in all cases, will be shared as Source data, as is the standard practice for multiple journals in the Review Commons portfolio.

      (v) The authors claim that AqPIP4K was able to convert PI3P into PI with very low efficiency (Figure 2E), but without further label in the figure or explanation, it remains unclear how the authors come to this conclusion.

      Response: We regret the typo in line 500 of the manuscript, where we stated that “Further,……… was able to convert PI3P into PI with very low efficiency (Figure 2E).” What we intended to write was “Further,……… was able to convert PI3P into PI(3,4)P2 with very low efficiency (Figure 2E).” The efficiency of this reaction is very low, as reported by us (Ghosh et.al PMID: 31652444) and others (Zhang et.al PMID: 9211928). At the exposure of the TLC shown in Fig 2E, the PI(3,4)P2 spot cannot be seen; much longer exposures of the TLC plate would be needed to see it. This will be corrected in a revised version of the manuscript.

      (vi) The box plots in Fig. 3C and D lack error bars and thus seem to be consisting of only single data points without replicates. Also, Fig. 3C is a quantification of Fig. 3B but it remains unclear what has been quantified and how. It is also unclear how %PIP2 was determined.

      Response: For Fig 3C, colonies were counted in three replicates and the average was used to calculate the % growth for each genotype. We will include error bars and clarify this in the revised figure legend. For Fig 3D, the PIP2/PIP ratio was calculated from biological replicates and the average is represented in the graphs. The individual values can be provided as supplementary data.
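One way the replicate averages and error bars mentioned above could be computed is sketched below. The normalization to a reference genotype, the use of the standard error of the mean, the function name, and the colony counts are all assumptions for illustration; the response does not specify how % growth is defined.

```python
import statistics


def percent_growth(colony_counts, reference_counts):
    """Express replicate colony counts as % of the mean reference count.
    Returns (mean, standard error of the mean). Needs at least two replicates."""
    ref_mean = statistics.mean(reference_counts)
    values = [100.0 * c / ref_mean for c in colony_counts]
    sem = statistics.stdev(values) / len(values) ** 0.5
    return statistics.mean(values), sem


# Hypothetical counts from three replicates per genotype.
wild_type = [198, 205, 197]
mutant = [52, 47, 51]
mean_pct, sem_pct = percent_growth(mutant, wild_type)
print(f"{mean_pct:.1f} % growth, SEM {sem_pct:.1f}")
```

Plotting the individual replicate values alongside the mean and error bar would also address the reviewer's concern that the plots appear to show single data points.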

      (vii) Throughout Fig. 4, I do not understand the genotypes indicated on the x-axis of the plots and below the images. I read the figure legends and manuscript describing these results at least 3 times, but cannot figure out what it all means. On Fig. 4C, what is the wild-type situation?

      Response: We apologize for the lack of precision in labelling the figures versus the figure legends. This will be corrected in the revision:

      The genotypes are as follows

      • w1118 (control) * Act-GAL4. This has been referred to as wild type in the figure legend and called Act-Gal4 in Fig4 panels A-E
      • dPIP4K29 – This refers to the protein null strain of dPIP4K. This strain is the background in which all reconstitutions of PIP4K genes have been done.
      • AqPIP4K – PIP4K transgene from A. queenslandica.
      • AqPIP4KKD – Kinase-dead PIP4K transgene from A. queenslandica.

      In panels A, B, D and E, Act-GAL4: dPIP4K29 indicates the genetic background in which either AqPIP4K or AqPIP4KKD has been reconstituted.

      Reviewer #2 (Significance (Required)):

      If validated and put in the right phylogenetic context, the study is potentially contributing to expanding our knowledge on the evolution of metazoan-specific features, especially the evolution of proteins involved in cell-cell signalling and growth control. My field of expertise is broadly in evo-devo, molecular phylogenetics, developmental genetics and cell biology. The in vitro lipid analysis seems interesting and potentially valid but I do not have sufficient expertise to evaluate its validity.

      We thank the reviewer for appreciating the novelty of our contribution and its potential to contribute to understanding the evolution of metazoan specific signalling systems, once appropriate corrections have been made. We also appreciate their positive comment on our in vitro experimental analysis. This paper is a big effort to not only perform phylogenetic analysis but address the emerging interpretations experimentally as much as possible.


      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary In this manuscript, the authors investigate the evolutionary origins of metazoan Phosphatidylinositol phosphates (PIPs) signaling by elucidating the sequence and function of the PIP4K enzyme, which is crucial for converting PI5P to PI(4,5)P2 through phosphorylation. The authors have described PIP4K-like sequences distributed throughout metazoans and choanoflagellates through an extensive sequence screening. With in vitro and in vivo functional assays, the authors have shown that the sponge A. queenslandica PIP4K (AqPIP4K) is functionally similar to its human counterpart and highlight the major discovery of this study - that PIP4K protein function dates back to as early as sponges.

      We thank the reviewer for noting the major finding of our study and our efforts to experimentally validate, using multiple approaches, the findings of our detailed bioinformatics analysis of PIP4K gene distribution across the tree of life.

      Major comments

      There are two key limitations to this paper. Like the sponges, ctenophores are one of the earliest branching metazoans. They are not well addressed in the paper. Secondly, despite finding PIP4 homologs in choanoflagellates, the authors claim that PIP4 is metazoan-specific.

      We thank the reviewer for highlighting these two points; we recognize that both of these are important to address, to the extent that it is possible to do so. These will be addressed using the approaches detailed in the response to reviewer 1 comments.

      1. Line 46: A. queenslandica is the earliest branching metazoan. The phylogeny of sponges and ctenophores is not conclusively defined and hence, the statement must be rephrased. Despite the brief description of the evolution of metazoan lineage in the discussion section, ctenophores are missing from the phylogenetic tree. At least a sequence-level information PIP4K in ctenophores would strongly back the claims of the manuscript. Here is the link to the Mnemiopsis database. Response: We thank the reviewer for highlighting this point and pointing us to the Mnemiopsis database. We will most certainly analyse ctenophore genome sequences and add the ctenophore PIP4K sequence to the phylogeny, post analysis and the discussion will be modified to reflect the findings.

      Mentioning that choanoflagellates contain homologs of PIP4K contradicts the statement that PIP4K is metazoan-specific. As per Fig 1E., the domain organization of PIP4K is conserved among choanoflagellates and metazoans. What is the percent sequence similarity to the query? This could answer why it doesn't show activity in Drosophila rescues - the system might simply not be compatible with the choanoflagellate homolog. The same may apply to the cnidarian homolog HvPIP4K. Further evidence is needed before concluding that MbPIP4K doesn't phosphorylate PIP5. It is additionally fascinating that MbPIP4K localizes at the plasma membrane unlike other homologs - this function might be choano-specific. Overall, PIP4K's possible origin in the choanoflagellate-metazoan common ancestor backs the current research that choanoflagellates indeed hold clues to understanding metazoan evolution. Further research is necessary before concluding (as in line 648) in the discussions section, where it is mentioned that "PIP4K does not play any important functional role in choanos".

      Response: We thank the reviewer for highlighting the very interesting but incompletely understood facets of our study vis-à-vis choanoflagellates versus metazoans. The proposal for additional analysis is indeed interesting, and we will carry out these analyses and revise the text accordingly.
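
      As a rough illustration of the percent-similarity question raised above, the sketch below computes percent identity between two sequences that have already been aligned. The function name, the toy sequences, and the convention of skipping gapped columns are assumptions made for illustration only; they are not taken from the manuscript, and a real analysis would start from a proper multiple sequence alignment of the PIP4K homologs.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: gapped columns are ignored. Other conventions (e.g. dividing by
    the full alignment length) give different numbers.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, NOT real PIP4K sequences).
query = "MKT-LLA"
homolog = "MRTQLLA"
print(f"{percent_identity(query, homolog):.1f}% identity")
```

      Identity is only one measure; similarity scores that credit conservative substitutions would need a substitution matrix, which this sketch deliberately leaves out.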

      Minor Comments

      1. A detailed comparison of the sequence of the hydra PIP4K might help understand why it may not have worked like the sponge PIP4K. The discussion on the cnidarian PIP4K evolution is not convincing. It may not have worked because of it being expressed in a non-natural system. Structure prediction and comparison of proteins from different early branching animals should be used.

      Response: Thank you for these suggestions to understand why the cnidarian PIP4K may not have been functional. We will perform the suggested analysis and incorporate the data into the revision.

      Line 78 - Multicellularity evolved many times. Maybe say 'first evolved metazoans'.

      Response: Thank you for the suggestion.

      Line 598 A. queenslandica is not a coral, it's a sponge.

      Response: Text will be changed accordingly.

      Line 612 'thcells' → 'the cells'

      Response: Text will be changed.

      Line 623 - full stop missing after metazoans.

      Response: Text will be changed.

      Figure 1B - Classification should be consistent - C. elegans is a species name, whereas ctenophores and vertebrates belong to a different classification. Invertebrates is not a scientific group. The edges of the lines of the phylogenetic tree don't join and they need to be arranged correctly.

      Response: The names in the phylogeny will be changed to maintain uniformity. The representation of the phylogeny will be changed as mentioned.

      Figure 2B The full blot could be shown in the supplement.

      Response: Full blot will be provided as source data on resubmission or included as supplementary based on the destination journal’s specification.


      Optional

      1. Heterologous overexpression does not always provide the full picture of the gene functionality. To make claims on the evolution of function, testing gene functions in homologous systems can give a better picture. For example, performing in vitro kinase activity assays of MbPIP4K after overexpressing PIP4K in Monosiga brevicollis would be great. Data is also missing about the presence and function of ctenophore PIP4K. Overexpression of ctenophore-PIP4K in Drosophila for functional analyses could help in understanding the distribution/diversity of function of PIP4K in early animals.

      Response: We agree with the reviewer that heterologous expression may sometimes not replicate the biochemical environment of cells in the organism from which the expressed gene was originally derived. Yet heterologous expression experiments do sometimes provide insight into properties that depend solely on the polypeptide, with limited or no contribution from the cellular environment. In principle, expressing PIP4K in M. brevicollis cells and then performing kinase assays would be a very good idea. However, we would like to highlight that, to date, there has been only one study in which septins have been transfected into choanoflagellates and their localization observed. We are not set up to culture M. brevicollis and will be unable to do this for a revision of the current manuscript. However, we appreciate the importance of this experiment and will do this in collaboration with a choanoflagellate lab in a follow-up study to this one.

      Ctenophores, like cnidarians, have two main layers of cells that sandwich a middle layer of jelly-like material, whereas more complex animals have three main cell layers and no intermediate jelly-like layer. Hence ctenophores and cnidarians have traditionally been labelled diploblastic. Studies have shown that ctenophores and unicellular eukaryotes share ancestral metazoan patterns of chromosomal arrangements, whereas sponges, bilaterians, and cnidarians share derived chromosomal rearrangements. Conserved syntenic characters unite sponges with bilaterians, cnidarians, and placozoans in a monophyletic clade while ctenophores are excluded from this clustering, placing ctenophores as the sister group to all other animals. The ctenophore PIP4K sequence can be identified and compared, as discussed before, to the other PIP4K sequences used in this study.

      Reviewer #3 (Significance (Required)):

      Significance: This is the first study that addresses PIP signaling pathway in early metazoans. The findings of this manuscript contribute to the understanding of second-messenger signaling and its link with the origin and evolution of metazoan multicellularity. PIP signaling is crucial in different metazoan aspects such as cytoskeletal dynamics, neurotransmission, and vesicle trafficking, and hence, plays a critical role in metazoan multicellularity. Through this study, it was interesting to see that some components of the PIP signaling pathway are conserved in yeast, but some, such as the PIP4K protein evolved at the brink of metazoan evolution, highlighting the need for complexity in metazoans and their close relatives - the facultatively multicellular choanoflagellates. Since this is a crucial pathway in human biology and has medical significance due to its role in tumorigenesis and cancer cell migration, this study serves the audience in basic research such as evolutionary biology, and applied research such as human medicine. My field of expertise is molecular biology, cell biology and microbiology, with specific expertise on choanoflagellates. Therefore, it is exciting to see the homologs of PIP4K present in choanoflagellates.

      Evidence, Reproducibility, and clarity:

      The authors have made a clear case of why PIP4K needs to be studied. They have thoroughly mapped PIP4K throughout the tree of life. The results are clear and reproducible. With the findings of this study, they have linked the PIP signalling cascade and metazoan evolution. Using the heterologous expression of sponge A. queenslandica PIP4K, they have provided compelling evidence that AqPIP4K functions in PIP5 phosphorylation, as seen in humans and Drosophila. However, it was not convincing why the hydra PIP4K was not functional. It was also not convincing why the PIP4K is metazoan-only when there is a conserved sequence (with conserved domain structure) present in choanoflagellates.


      We thank the reviewer for appreciating the novelty and importance of our findings in multiple areas of basic biology related to early metazoans and basic biomedical sciences. We also note their comments on the clear and reproducible results presented. Points raised related to the lack of functionality of PIP4K from Hydra and choanoflagellates are noted and will be addressed as indicated in response to other reviewer comments.


      Experiments/Analysis to be done

      1. We will perform a multiple sequence alignment using PIP4K sequences from multiple choanoflagellates and sponges to identify these differences.
      2. We will compare available PIP4K sequences from multiple Porifera and Cnidaria genomes to identify differences in the protein sequence that might explain differences in function.
      3. We will add more species of sponges, ctenophores, placozoans, and cnidarians in our analysis of PIP4K sequences. We will also include an analysis of other unicellular holozoans where genome sequence is available.
      4. We will perform the phylogenetic analysis of the phosphoinositide kinases in the format suggested by the reviewer and add it in the revision as a supporting evidence.
      5. We will use structure prediction and comparison of proteins from different early branching animals.
      6. Uniformity of terminology and alignment with conventions in the field of animal taxonomy
      7. Add NCBI IDs of sequences, include more non-bilaterian animal sequences, and redo the phylogeny.
      8. Check for PI signalling genes in choanoflagellates
      9. More detailed description of phylogenetic analysis.
      10. Add complete Western blot as source data.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Please insert a point-by-point reply describing the revisions that were already carried out and included in the transferred manuscript. If no revisions have been carried out yet, please leave this section empty.


      4. Description of analyses that authors prefer not to carry out

      • Expression of PIP4K in choanoflagellates and in vitro kinase assays with lysates. It is beyond our technical ability to perform these experiments at this stage.
    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This work aims to understand the role of thalamus POm in dorsal lateral striatum (DLS) projection in learning a sensorimotor associative task. The authors first confirm that POm forms "en passant" synapses with some of the DLS neuronal subtypes. They then perform a go/no-go associative task that consists of the mouse learning to discriminate between two different textures and to associate one of them with an action. During this task, they either record the activity of the POm to DLS axons using endoscopy or silence their activity. They report that POm axons in the DLS are activated around the sensory stimulus but that the activity is not modulated by the reward. Last, they showed that silencing the POm axons at the level of DLS slows down learning the task.

      The authors show convincing evidence of projections from POm to DLS and that POm inputs to DLS code for whisking whatever the outcome of the task is. However, their results do not allow us to conclude if more neurons are recruited during the learning process or if the already activated fibres get activated more strongly. Last, because POm fibres in the DLS are also projecting to S1, silencing the POm fibres in the DLS could have affected inputs in S1 as well and therefore, the slowdown in acquiring the task is not necessarily specific to the POm to DLS pathway.

      We thank the reviewer for these constructive comments. The points are addressed below.  

      Strengths:

      One of the main strengths of the paper is to go from slice electrophysiology to behaviour to get an in-depth characterization of one pathway. The authors did a comprehensive description of the POm projections to the DLS using transgenic mice to unambiguously identify the DLS neuronal population. They also used a carefully designed sensorimotor association task, and they exploited the results in depth.

      It is a very nice effort to have measured the activity of the axons in the DLS not only after the mice have learned the task but throughout the learning process. It shows the progressive increase of activity of POm axons in the DLS, which could imply that there is a progressive strengthening of the pathway. The results show convincingly that POm axons in the DLS are not activated by the outcome of the task but by the whisker activity, and that this activity on average increases with learning.

      Weaknesses:

      One of the main targets of the striatum from thalamic input are the cholinergic neurons that weren't investigated here, is there information that could be provided?

      This is true of the parafascicular (Pf) thalamic nucleus, which has been well studied in this context. However, there is much less known about the striatal projections of other thalamic nuclei, including POm, and their inputs to cholinergic neurons. Anatomical tracing evidence from Klug et al. (2018), which mapped brain-wide inputs to striatal cholinergic (ChAT) interneurons, suggests that Pf provides the majority of thalamic innervation of striatal ChAT neurons compared to other thalamic nuclei. Many other thalamic nuclei, including POm, showed very little or no labeling, suggesting weak innervation of ChAT interneurons. However, it is possible that these thalamic nuclei, including POm, do provide functional innervation of ChAT interneurons that is not sufficiently assessed by anatomical tracing. Understanding the innervation patterns of POm-striatal projections beyond the three cell types we have studied here would be an important area of further study.

      It is interesting to know that the POm projects to all neuronal types in the DLS, but this information is not used further down the manuscript so the only take-home message of Figure 1 is that the axons that they image or silence in the DLS are indeed connected to DLS neurons and not just passing fibres. In this line, are these axons the same as the ones projecting to S1? If this is the case, why would we expect a different behaviour of the axon activity at the DLS level compared to S1?

      Tracing of single POm axons by Ohno et al. (2012) indicated that POm axons form a branched collateral that innervates striatum, while the main axon continues in the rostral-dorsal direction to innervate cortex. We think it is reasonable, based on the morphology, that our optogenetic suppression experiment restricted the suppression of glutamate release to this branch and avoided the other branches of the axon that project to cortex. However, testing this would require monitoring S1 activity during the POm-striatal axon suppression, which we did not do in this study.

      It is a very interesting question whether there could be different axon activity behavior in striatum versus S1. There is surprising evidence that POm synaptic terminals are different sizes in S1 and M1 and show different synaptic physiological properties depending on these cortical projection targets (Casas-Torremocha et al., 2022). Based on this, it is possible that POm-striatal synapses show distinct properties compared to cortex; however, this will need to be tested in future work.

      The authors used endoscopy to measure the POm axons in the DLS activity, which makes it impossible to know if the progressive increase of POm response is due to an increase of activity from each individual neuron or if new neurons are progressively recruited in the process.

      This is a good point. It would be necessary to perform chronic two-photon imaging of POm neurons (or chronic electrophysiological recordings) to determine whether the activity of individual neurons increased versus whether individual neuron activity levels remained similar but new neurons became active with learning. Even under baseline conditions, it is not known in detail what fraction of the population of POm neurons is active during sensory processing or behavior, highlighting how much is still to be discovered in this exciting area of neuroscience.

      The picture presented in Figure 4 of the stimulation site is slightly concerning as there are hardly any fibres in neocortical layer 1 while there seems to be quite a lot of them in layer 4, suggesting that the animal here was injected in the VB. This is especially striking as the implantation and projection sites presented in Figures 1 and 2 are very clean and consistent with POm injection.

      Although this image was selected to demonstrate the position of the POm injection site and optical fiber implant above striatal axons, the reviewer is correct that there appears to be mixed labeling of axons in L4 and L5a. In some cases, there was expression slightly outside the border of POm (see Fig. 1B, right), which might explain the cortical innervation pattern in this figure. While cortically bound VPM axons pass through the striatum, they do not form synaptic terminals until reaching the cortex (Hunnicutt et al., 2016). If, as may be the case, inhibitory opsins suppress release of neurotransmitter at synaptic terminals more effectively than action potential propagation in axons, it is likely that optogenetic suppression of POm-striatal terminals is more effective than suppression of action potentials in off-target-labelled VPM axons of passage. Ideally, we would compare effects of suppression of POm-striatal synapses with POm-cortical synapses and VPM-cortical synapses, but this was beyond the scope of the present study.

      Reviewer #2 (Public Review):

      Summary:

      Yonk and colleagues show that the posterior medial thalamus (POm), which is interconnected with sensory and motor systems, projects directly to major categories of neurons in the striatum, including direct and indirect pathway MSNs, and PV interneurons. Activity in POm-striatal neurons during a sensory-based learning task indicates a relationship between reward expectation and arousal. Inhibition of these neurons slows reaction to stimuli and overall learning. This circuit is positioned to feed salient event activation to the striatum to set the stage for effective learning and action selection.

      Strengths:

      The results are well presented and offer interesting insight into an understudied thalamostriatal circuit. In general, this work is important as part of a general need for an increased understanding of thalamostriatal circuits in complex learning and action selection processes, which have generally received less attention than corticostriatal systems.

      Weaknesses:

      There could be a stronger connection between the connectivity part of the data - showing that POm neurons contact D1, D2, and PV neurons in the striatum but with some different properties - and the functional side of the project. One wonders whether the POm neurons projecting to these subtypes of striatal neurons have unique signaling properties related to learning, or if there is a uniform, bulk signal sent to the striatum. This is not a weakness per se, as it's reasonable for these questions to be answered in future papers.

      We are very interested in understanding the potentially distinct learning-related synaptic and circuit changes that may occur at POm synapses with D1- and D2-SPNs, PV interneurons, and other striatal cell types. We agree that this would be an important topic for further investigation.

      All the in vivo activity-related conclusions stem from data from just 5 mice, which is a relatively small sample set. Optogenetic groups are also on the small side.

      We appreciate this point and agree that higher N can be important for observing robust effects. A factor of our experiments that helped reduce the number of animals used was the longitudinal design, with repeated measures in the same subjects. This allowed for the internal control of comparing learning effects in the same subject from naïve to expert stages and therefore increased robustness. Even with relatively small group sizes, results were statistically significant, suggesting that the use of more mice was unnecessary, which we considered consistent with best practice in the use of animals in research. We also note that our group sizes were consistent with other studies in the field.  

      Reviewer #3 (Public Review):

      Yonk and colleagues investigate the role of the thalamostriatal pathway. Specifically, they studied the interaction of the posterior thalamic nucleus (PO) and the dorsolateral striatum in the mouse. First, they characterize connectivity by recording DLS neurons in in-vitro slices and optogenetically activating PO terminals. PO is observed to establish depressing synapses onto D1 and D2 spiny neurons as well as PV neurons. Second, PO axons are imaged by fiber photometry in mice trained to discriminate textures. Initially, no trial-locked activity is observed, but as the mice learn, PO develops responses timed to the audio cue that marks the start of the trial and precedes touch. PO does not appear to encode the tactile stimulus type or outcome. Optogenetic suppression of PO terminals in striatum slows task acquisition. The authors conclude that PO provides a "behaviorally relevant arousal-related signal" and that this signal "primes" striatal circuitry for sensory processing.

      A great strength of this paper is its timeliness. Thalamostriatal processing has received almost no attention in the past, and the field has become very interested in the possible functions of PO. Additionally, the experiments exploit multiple cutting-edge techniques.

      There seem to be some technical/analytical weaknesses. The in vitro experiments appear to have some contamination of nearby thalamic nuclei by the virus delivering the opsin, which could change the interpretation. Some of the statistical analyses of these data also appear inappropriate. The correlative analysis of POm activity in vivo, licking, and pupil could be more convincingly done.

      The bigger weakness is conceptual - why should striatal circuitry need "priming" by the thalamus in order to process sensory stimuli? Why would such circuitry even be necessary? Why is a sensory signal from the cortex insufficient? Why should the animal more slowly learn the task? How does this fit with existing ideas of striatal plasticity? It is unclear from the experiments that the thalamostriatal pathway exists for priming sensory processing. In fact, the optogenetic suppression of the thalamostriatal pathway seems to speak against that idea.

      We thank the reviewer for these constructive comments. The points are addressed below.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Do POm neurons innervate CINs also? The connection between the PF thalamus and CINs is mentioned in a couple of places - one question is how unique are the input patterns for the POm versus adjacent sensorimotor thalamic regions, including the PF? This isn't a weakness per se but knowing the answer to that question would help in forming a more complete picture of how these different thalamostriatal circuits do or do not contribute uniquely to learning and action selection.

      Anatomical tracing evidence from Klug et al. (2018), which mapped brain-wide inputs to striatal cholinergic (ChAT) interneurons, suggests that Pf provides the majority of thalamic innervation of striatal ChAT neurons compared to other thalamic nuclei. Many other thalamic nuclei, including POm, showed very little or no labeling, suggesting weak innervation of ChAT interneurons. However, it is possible that these thalamic nuclei, including POm, do provide functional innervation of ChAT interneurons that is not sufficiently assessed by anatomical tracing.

      Another difference between Pf and other thalamic nuclei (likely including POm) comes from anatomical tracing evidence (Smith et al., 2014; PMID: 24523677) which indicates that Pf inputs form the majority of their synapses onto dendritic shafts of SPNs, while other thalamic nuclei form synapses onto dendritic spines. Understanding the innervation patterns of POm-striatal projections beyond the three cell types we have studied here, including ChAT neurons and subcellular localization, would be an important area of further study.

      It would be useful to know to what extent these POm-striatum neurons are activated generally during movement, versus this discrimination task specifically.

      We agree that distinguishing general movement-related activity from task-specific activity would be very useful. Earlier work (Petty et al., 2021) showed a close relationship between POm neuron activity, spontaneous (task-free) whisker movements, and pupil-indexed arousal in head-restrained mice. Oram et al. (2024; PMID: 39003286) recently recorded VPM and POm in freely moving mice during natural movements, finding that activity of both nuclei correlated with head and whisker movements. These studies indicate that POm is generally coactive with exploratory head and whisker movements.

      During task performance, the situation may change with training and attentional effects. For example, Petty and Bruno (2024) (https://elifesciences.org/reviewed-preprints/97188) showed that POm activity correlates more closely with task demands than tactile or visual stimulus modality. Our data indicate that POm axonal signals are increased at trial start during anticipation of tactile stimulus delivery and through the sensory discrimination period, then decrease to baseline levels during licking and water reward collection (Fig. 3). Results of Petty and Bruno (2024) together with ours suggest that POm is particularly active during the context of behaviorally relevant task performance. Thus, we think it is likely that, while pupil dilation indexes general movement and arousal, POm activity is more specific to movement and arousal associated with task engagement and behavioral performance. We have strengthened this point in the Discussion.

      Many of the data panels and text for legends/axes are quite small, and the stroke on line art is quite faint - overall figures could be improved from a readability standpoint.

      We thank the reviewer for their careful attention to the figures. 

      Reviewer #3 (Recommendations For The Authors):

      Major

      (1) Page 4, the Results regarding PSP and distance from injection site. The r-squared is the wrong thing to look at to test for a relationship. One should look at the p-value on the coefficient corresponding to the slope. The p-value is probably significant given the figures, in which case there may be a relationship contrary to what is stated. All the low r-squared value says is that, if there is a relationship, it does not explain a lot of the PSP variability.

      We thank the reviewer for alerting us to this oversight. We have included the p value (p = 0.0293) in the figure and legend, and indicated that the relationship is “small but significant”.
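
      The distinction raised in this point, between how much variance a fit explains (r-squared) and whether the slope differs from zero (its p-value), can be illustrated with a minimal sketch on synthetic data. Everything here is an assumption for illustration: the data are simulated, the function name is hypothetical, and the p-value uses a normal approximation to the t statistic, which is only reasonable for large samples. It does not reproduce the authors' analysis.

```python
import math
import random
from statistics import NormalDist


def slope_test(x, y):
    """Ordinary least-squares fit of y on x.

    Returns (slope, r_squared, p_value). The two-sided p-value for the slope
    uses a normal approximation to the t distribution (an assumption that is
    acceptable only when the sample is large).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    sse = syy - slope * sxy                 # residual sum of squares
    r_squared = 1.0 - sse / syy
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    z = slope / se_slope
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return slope, r_squared, p_value


# Synthetic data: a real but weak linear trend buried in large noise.
random.seed(0)
n = 500
xs = [i / n for i in range(n)]
ys = [1.0 * x + random.gauss(0.0, 1.0) for x in xs]

slope, r_squared, p_value = slope_test(xs, ys)
print(f"slope={slope:.3f}  r_squared={r_squared:.3f}  p={p_value:.2g}")
```

      With data like these, the fit explains only a small fraction of the variance while the slope is still clearly distinguishable from zero, which is the situation the reviewer describes.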

      (2) Figure 1B suggests that the virus injections extend beyond POm and into other thalamic structures. Do any of the results change if the injections contaminating other nuclei are excluded from the analysis? I am not suggesting the authors change the figures/analyses. I am simply suggesting they double-check.

      We selected for injections that were predominantly expressing in POm as determined by post-hoc histological analysis (see Fig. 1, right). As above, we think that axons of passage that do not form striatal synapses are less likely to be suppressed than axons with terminals; however, this would need to be determined in further experiments. Because the preponderance of expression is within POm, we think the results would be similar even with a stricter selection criterion. 

      (3) The authors conclude that POm and licking are not correlated (bottom of page 6 pertaining to Figures 3A-F). The danger of these analyses is that they assume that GCaMP8 is a perfect linear reporter of POm spikes. The reliability of GCaMP8 has been quantified in some cell types, but not thalamic neurons, which have relatively higher firing rates.

      The reviewer is correct that the relationship between GCaMP8 fluorescence changes and spiking has not been sufficiently characterized in thalamic neurons, and that this would be important to do.

      What if the indicator is simply saturated late into the trial (after the average reaction time)? It would look like there is no response and one would conclude no correlation, but there could be a very strong correlation.

      While saturation is worthy of concern, the signal dynamics here argue against this possibility. The signal increased in the early part of the trial and decreased by the end. If saturation were an issue, it would have been apparent during the initial increase. The decrease in amplitude at the end of the trial further indicates that the signal is not saturated, because it is moving away from its maximum and becoming less saturated.

      Also, what happens between trials? Are the correlations the same, stronger, weaker? Ideally, the authors would analyze the data during and between trials.

      Between trials the signal did not show further changes in baseline beyond what was displayed at the start and end of behavioral trials. There were no consistent increases or decreases in signals between trials, except perhaps during strong whisking bouts. This is anecdotal because we did not analyze between-trial data. However, it is interesting and important to note that signals increased dramatically in amplitude from naïve, early learning to expert behavioral performance (Fig. 3), highlighting that POm-axonal signals relate to behavioral engagement and performance rather than spontaneous behaviors.  

      (4) Axonal activity could also appear more correlated with the pupil than licking because pupil dynamics are slow like the dynamics of calcium indicators. These kernels could artificially inflate the correlation. Ideally, the authors could consider these temporal effects. Perhaps they could deconvolve the temporal profiles of calcium and pupil before correlating? Or equivalently incorporate the profiles into their analysis?

      We analyzed the lick probability histograms, which had a temporal profile similar to the calcium signals (Fig. 3D,E), ruling out concerns about the influence of slow temporal profiles on correlations. It is also worth noting that we observed changes in correlations between calcium signals and pupil with learning stage (Fig. 3I), even though the temporal profiles (signal dynamics) are not changing. Thus, temporal effects of the signals themselves are not the driver of correlations, but rather the changes in relative timing between calcium signals and pupil, as occur with learning.
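
      The general concern in this point, that slow kernels can inflate correlations, can be sketched with simulated data. The example below draws pairs of independent noise signals and compares their correlation before and after passing both through the same slow exponential filter. The filter, its time constant, and the trial counts are illustrative assumptions; this is not the authors' analysis and uses no data from the study.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def low_pass(signal, tau):
    """Causal exponential smoothing with time constant tau (in samples)."""
    alpha = math.exp(-1.0 / tau)
    out, state = [], 0.0
    for value in signal:
        state = alpha * state + (1.0 - alpha) * value
        out.append(state)
    return out


def mean_abs_corr(n_trials, n_samples, tau, rng):
    """Average |r| between independent noise pairs, raw and after smoothing."""
    raw_total = smooth_total = 0.0
    for _ in range(n_trials):
        a = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        raw_total += abs(pearson(a, b))
        smooth_total += abs(pearson(low_pass(a, tau), low_pass(b, tau)))
    return raw_total / n_trials, smooth_total / n_trials


rng = random.Random(1)
raw, smoothed = mean_abs_corr(n_trials=50, n_samples=500, tau=50.0, rng=rng)
print(f"mean |r| raw: {raw:.3f}   mean |r| after slow filtering: {smoothed:.3f}")
```

      Because smoothing reduces the effective number of independent samples, unrelated slow signals tend to show larger spurious correlations than unrelated fast ones. Deconvolution, or a null distribution built from shifted or shuffled trials, is a common way to account for this.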

      (5) The authors conclude that PO provides a "behaviorally relevant arousal-related signal" and that this signal "primes" striatal circuitry for sensory processing. The data here support the first part. It is not clear that the data support the second part, largely because it is vague what "priming" of sensory processing or "a key role in the initial stages of action selection" (p.9) even means here. Why would such circuitry even be necessary? Why is a sensory signal from the cortex insufficient? Why should the animal more slowly learn the task? How does this fit with existing ideas of striatal plasticity? Some conceptual proposals from the authors, even if speculative and not offered as a conclusion, would be helpful.

      We appreciate these good points and have added further consideration and revision of the concept of priming and potential roles in an extensively revised Discussion section.

      (6) The photometry shows that PO turns on about 2 seconds before the texture presentation. PO's activity seems locked to the auditory cue, not the texture (Figure 2). This means that the attempt to suppress the thalamostriatal pathway with JAWS (Figure 4) is rather late, isn't it? Some PO signals surely go through. This seems to contradict the idea of priming above. It would be good if the authors could factor this into their narrative. Perhaps labelling the time of the auditory cue in Figure 4C would also be helpful.

      The start of texture presentation (movement of the texture panel toward the mouse) and auditory cue occur at the same time. To clarify this, we added a label “start tone” in Figure 4C and also in Figure 2C.

      For optogenetic (JAWS) suppression, we intentionally chose a time window between start tone onset and texture presentation, because our photometry experiments showed that this was when the preponderance of the signal occurred. However, the reviewer is correct that our chosen optogenetic suppression (JAWS) onset occurs shortly after the photometry signal has already started, potentially leaving the early photometry signal un-suppressed. Our motivation for choosing a restricted time window surrounding the texture presentation time was 1) to minimize illumination and potential heating of brain tissue; 2) to target a time window that avoids the auditory cue but covers stimulus presentation. We did not want to extend the duration of the suppression to before the trial started, because this could produce task-non-specific effects, such as distraction or loss of attention before the start of the trial.

      Even if some signal were getting through before suppression, we don’t think this contradicts the possibility of ‘priming’, because the process underlying priming would still be disrupted even if not totally suppressed. This would alter the temporal relationship between POm-striatal inputs and further corticostriatal inputs (from S1 and M1 cortex, for example). We have included further consideration of these points and possible relation to the priming concept in the Discussion.

      Minor

      (1) Page 5, "the sensitivity metric is artificially increased". What do you mean "artificially"? The mice are discriminating better. It is true that either a change in HR or FAR can cause the sensitivity metric to change, but there is nothing artificial or misleading about this.

      We removed the word artificial and clarified our definition of behaviorally Expert in this context:

      “Mice were considered Expert once they had reached ≥ 0.80 Hit Rate and ≤ 0.30 FA Rate for two consecutive sessions in lieu of a strict sensitivity (d’) threshold; we found this definition more intuitive because d’ is enhanced as Hit Rate and FA Rate approach their extremes (0 or 1)”
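      The relationship between this criterion and d' can be sketched as follows. This is a minimal illustration that assumes the standard signal-detection definition d' = z(Hit Rate) − z(FA Rate); the function names `d_prime` and `is_expert` are hypothetical and are not taken from the manuscript.

```python
from statistics import NormalDist

def d_prime(hit_rate, fa_rate):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Rates must lie strictly between 0 and 1; inv_cdf raises otherwise.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)

def is_expert(sessions, min_hit=0.80, max_fa=0.30, n_consecutive=2):
    """Apply the Expert criterion quoted above.

    sessions: list of (hit_rate, fa_rate) tuples in chronological order.
    """
    run = 0
    for hit, fa in sessions:
        # Extend the run if this session meets both thresholds, else reset it.
        run = run + 1 if (hit >= min_hit and fa <= max_fa) else 0
        if run >= n_consecutive:
            return True
    return False
```

      For example, `d_prime(0.80, 0.30)` is about 1.37, whereas `d_prime(0.99, 0.30)` is about 2.85: moving the Hit Rate toward 1 increases d' steeply even though the FA Rate is unchanged, which is the behavior the quoted definition refers to.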

      (2) Page 7, "Upon segmentation (Figure S4G-J)". Do you mean "segregation by trial outcome"?

      Corrected.

      (3) Page 9, "POm projections may have discrete target-specific functions, such that POm-striatal inputs may play a distinct role in sensorimotor behavior compared to POm-cortical inputs". Would POm-cortical inputs not also be sensorimotor? The somatosensory cortex contains a lot of corticostriatal cells. It also has various direct and indirect links to the motor cortex as well.

      We have clarified the wording here to convey the possibility that POm signals could be received and processed differently by striatal versus cortical circuitry, and have moved this statement to later in the discussion for better elaboration.

      (4) The Methods state that male and female mice were used. Why not say how many of each and whether or not there are any sex-specific differences?

      We added the following information to the Methods:

      The numbers of male and female mice were as follows, by experiment type: 6 male, 4 female (electrophysiology); 3 male, 2 female (fiber photometry); 4 male, 5 female (optogenetics). Data were not analyzed for sex differences.

    1. Reviewer #1 (Public review):

      Summary:

      In this series of studies, Locantore et al. investigated the role of SST-expressing neurons in the entopeduncular nucleus (EPNSst+) in probabilistic switching tasks, a paradigm that requires continued learning to guide future actions. In prior work, this group had demonstrated EPNSst+ neurons co-release both glutamate and GABA and project to the lateral habenula (LHb), and LHb activity is also necessary for outcome evaluation necessary for performance in probabilistic decision-making tasks. Previous slice physiology works have shown that the balance of glutamate/GABA co-release is plastic, altering the net effect of EPN on downstream brain areas and neural circuit function. The authors used a combination of in vivo calcium monitoring with fiber photometry and computational modelling to demonstrate that EPNSst+ neural activity represents movement, choice direction and reward outcomes in their behavioral task. However, viral-genetic manipulations to synaptically silence these neurons or selectively eliminate glutamate release had no effect on behavioral performance in well-trained animals. The authors conclude that despite their representation of task variables, EPN Sst+ neuron synaptic output is dispensable for task performance.

      Strengths and Weaknesses:

      Overall, the manuscript is exceptionally scholarly, with a clear articulation of the scientific question and a discussion of the findings and their limitations. The analyses and interpretations are careful and rigorous. This review appreciates the thorough explanation of the behavioral modelling and GLM for deconvolving the photometry signal around behavioral events, and the transparency and thoroughness of the analyses in the supplemental figures. This extra care has the result of increasing the accessibility for non-experts, and bolsters confidence in the results. To bolster a reader's understanding of results, we suggest it would be interesting to see the same mouse represented across panels (i.e. Fig 1 F-J, Supp 1 F,K etc.), i.e. via inclusion of faint hash lines connecting individual data points across variables. Additionally, Fig 3E demonstrates that eliminating the 'reward' and 'choice and reward' terms from the GLM significantly worsens model performance; to demonstrate the magnitude of this effect, it would be interesting to include a reconstruction of the photometry signal after holding out both or one of these terms, alongside the 'original' and 'reconstructed' photometry traces in panel D. This would help give context for how the model performance degrades by exclusion of those key terms. Finally, the authors claimed calcium activity increased following ipsilateral movements. However, figure 3C clearly shows that both SXcontra and SXipsi increase beta coefficients. Instead, the choice direction may be represented in these neurons, given that beta coefficients increase following CXipsi and before SEipsi, presumably when animals make executive decisions. Could the authors clarify their interpretation on this point? Also, it is not clear if there is a photometry response related to motor parameters (i.e. head direction or locomotion, licking), which could change the interpretation of the reward outcome if it is related to a motor response; could the authors show photometry signal from representative 'high licking' or 'low licking' reward trials, or from spontaneous periods of high vs. low locomotor speeds (if the sessions are recorded) to otherwise clarify this point?

      There are a few limitations with the design and timing of the synaptic manipulations that would improve the manuscript if discussed or clarified. The authors take care to validate the intersectional genetic strategies: Tetanus Toxin virus (which eliminates synaptic vesicle fusion) or CRISPR editing of Slc17a6, which prevents glutamate loading into synaptic vesicles. The magnitude of effect in the slice physiology results is striking. However, this relies on co-infection of a second AAV to express channelrhodopsin for the purposes of validation, and it is surely the case that there will not be 100% overlap between the proportion of cells infected. Alternative means of glutamate packaging (other VGluT isoforms, other transporters etc) could also compensate for the partial absence of VGluT2, which should be discussed. The authors do not perform a complementary experiment to delete GABA release (i.e. via VGAT editing), which is understandable, given the absence of an effect with the pan-synaptic manipulation. A more significant concern is the timing of these manipulations as the authors acknowledge. The manipulations are all done in well-trained animals, who continue to perform during the length of viral expression. Moreover, after carefully showing that mice use different strategies on the 70/30 version vs the 90/10 version of the task, only performance on the 90/10 version is assessed after the manipulation. Together, the observation that EPNsst activity does not alter performance on a well-learned, 90/10 switching task decreases the impact of the findings, as this population may play a larger role during task acquisition or under more dynamic task conditions. Additional experiments could be done to strengthen the current evidence, although the limitation is transparently discussed by the authors.

      Finally, intersectional strategies target LHb-projecting neurons, although in the original characterization it is not entirely clear that the LHb is the only projection target of EPNsst neurons. A projection map would help clarify this point.

      Overall, the authors used a pertinent experimental paradigm and common cell-specific approaches to address a major gap in the field, which is the functional role of glutamate/GABA co-release from the major basal ganglia output nucleus in action selection and evaluation. The study is carefully conducted, their analyses are thorough, and the data are often convincing and thought-provoking. However, the limitations of their synaptic manipulations with respect to the behavioral assays reduce generalizability and to some extent the impact of their findings.

      Comments on the latest version:

      Specifically, they have included more thorough analyses to address several concerns related to interpreting activity patterns of EPNSst+ neurons. The authors clearly point out that calcium activity increased during ipsilateral movements, and the increase was statistically larger during the choice phase (Figure 2 supplement 1F-G), indicating that these neurons may represent movement and additional factors (e.g. executive decision-making). Correspondingly, we appreciate the thorough explanation of using a GLM to determine which behavioural variables contribute to observed physiological signals and the addition of the example reconstructed signal with direction and reward variables omitted in Figure 3 supplements 1 and 2.

      Although no new manipulation experiment is added to the manuscript, the authors respond to common critiques related to testing the behavioural effect after the manipulations in well-trained mice. The discussion related to technical limitations, possible compensatory mechanisms and alternative interpretations is thorough and overall satisfying. Based on the behaviour modeling results, the authors speculate that animals need to integrate more evidence from the past to guide choice in a more uncertain environment (70/30 version), instead of adopting a 'win-stay, lose-shift' strategy in the more deterministic 90/10 version. The authors expand the discussion, but the possibility that EPNSst+ neurons contribute to task performance in well-trained animals under uncertainty is not directly tested. Along with other alternative explanations discussed in the manuscript, we think the paper is valuable literature for future studies to understand the basal ganglia circuits in learning and decision-making.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this series of studies, Locantore et al. investigated the role of SST-expressing neurons in the entopeduncular nucleus (EPNSst+) in probabilistic switching tasks, a paradigm that requires continued learning to guide future actions. In prior work, this group had demonstrated EPNSst+ neurons co-release both glutamate and GABA and project to the lateral habenula (LHb), and LHb activity is also necessary for outcome evaluation necessary for performance in probabilistic decision-making tasks. Previous slice physiology works have shown that the balance of glutamate/GABA co-release is plastic, altering the net effect of EPN on downstream brain areas and neural circuit function. The authors used a combination of in vivo calcium monitoring with fiber photometry and computational modeling to demonstrate that EPNSst+ neural activity represents movement, choice direction, and reward outcomes in their behavioral task. However, viral-genetic manipulations to synaptically silence these neurons or selectively eliminate glutamate release had no effect on behavioral performance in well-trained animals. The authors conclude that despite their representation of task variables, EPN Sst+ neuron synaptic output is dispensable for task performance.

      Strengths and Weaknesses:

      Overall, the manuscript is exceptionally scholarly, with a clear articulation of the scientific question and a discussion of the findings and their limitations. The analyses and interpretations are careful and rigorous. This review appreciates the thorough explanation of the behavioral modeling and GLM for deconvolving the photometry signal around behavioral events, and the transparency and thoroughness of the analyses in the supplemental figures. This extra care has the result of increasing the accessibility for non-experts, and bolsters confidence in the results.

      (1) To bolster a reader's understanding of results, we suggest it would be interesting to see the same mouse represented across panels (i.e. Figures 1 F-J, Supplementary Figures 1 F, K, etc i.e via the inclusion of faint hash lines connecting individual data points across variables.

      Thank you for the suggestion. The same mouse is now represented in Fig 1 and Fig 1—Figure Supplement 1 as a darkened circle so it can be followed across different panels. Photometry from this mouse was used as sample data in Figure 2b and Figure 2—figure supplement 1a-b.

      (2) Additionally, Figure 3E demonstrates that eliminating the 'reward' and 'choice and reward' terms from the GLM significantly worsens model performance; to demonstrate the magnitude of this effect, it would be interesting to include a reconstruction of the photometry signal after holding out of both or one of these terms, alongside the 'original' and 'reconstructed' photometry traces in panel D. This would help give context for how the model performance degrades by exclusion of those key terms.

      We have now added analyses and reconstructed photometry signals from GLMs excluding important predictors in Figure 3—figure supplement 1 and 2. We use the model where both “Direction and reward” were omitted as predictors for the GLM and showed photometry reconstructions aligned to behavioral events used for the full model (Figure 3—figure supplement 1) and partial model (Figure 3—figure supplement 2) to compare model performance.  

      (3) Finally, the authors claimed calcium activity increased following ipsilateral movements. However, Figure 3C clearly shows that both SXcontra and SXipsi increase beta coefficients. Instead, the choice direction may be represented in these neurons, given that beta coefficients increase following CXipsi and before SEipsi, presumably when animals make executive decisions. Could the authors clarify their interpretation on this point?

      We observe that calcium activity increases during ipsilateral choices as the animal moves toward the ipsilateral side port (e.g. CX<sub>ipsi</sub> to SE<sub>ipsi</sub>; Fig 2C and Fig 3C). The animal also makes other ipsiversive movements not during the “choice” phase of a trial such as when it is returning to the center port following a contralateral choice (e.g. SX<sub>Contra</sub> to CE; Fig 2—figure supplement 1F and Fig 3C). We also observe an increase in calcium activity during these ipsiversive movements (e.g. SX<sub>Contra</sub> to CE), but they are not as large as those observed during the choice phase (Fig 2—figure supplement 1G). Therefore, during the choice phase of a trial, activity contains signals related to ipsilateral movement and additional factors (e.g. executive decision making).    

      (4) Also, it is not clear if there is a photometry response related to motor parameters (i.e. head direction or locomotion, licking), which could change the interpretation of the reward outcome if it is related to a motor response; could the authors show photometry signal from representative 'high licking' or 'low licking' reward trials, or from spontaneous periods of high vs. low locomotor speeds (if the sessions are recorded) to otherwise clarify this point?

      Unfortunately, neither licks nor locomotion were recorded during the behavioral sessions when photometry was recorded. In Figure 2—figure supplement 1a we now show individual trials sorted by trial duration (time elapsed between CE and SE) to illustrate the dynamics of the photometry signal on fast vs slow trials within a session.  

      (5) There are a few limitations with the design and timing of the synaptic manipulations that would improve the manuscript if discussed or clarified. The authors take care to validate the intersectional genetic strategies: Tetanus Toxin virus (which eliminates synaptic vesicle fusion) or CRISPR editing of Slc17a6, which prevents glutamate loading into synaptic vesicles. The magnitude of effect in the slice physiology results is striking. However, this relies on the co-infection of a second AAV to express channelrhodopsin for the purposes of validation, and it is surely the case that there will not be 100% overlap between the proportion of cells infected.

      For the Tet-tox experiments in Figure 4 we estimate approximately 70±15% of EP<sup>Sst+</sup> neurons expressed Tet-tox based on our histological counts and published stereological counts in EP (Miyamoto and Fukuda, 2015). It is true that channelrhodopsin expression will not overlap 100% with cells infected by the other virus; indeed, our in vitro synaptic physiology shows small residual postsynaptic currents following optogenetic stimulation, either from incomplete blockade of synaptic release or from neurons that expressed channelrhodopsin but not Tettx (Figure 4—figure supplement 1J-K). The same is shown for CRISPR-mediated deletion of Slc17a6 (Fig 5 – Fig supplement 1J-K).

      (6) Alternative means of glutamate packaging (other VGluT isoforms, other transporters, etc) could also compensate for the partial absence of VGluT2, which should be discussed.

      While single cell sequencing (Wallace et al, 2017) has shown that EP<sup>Sst+</sup> neurons do not express Slc17a7/8 (vGlut1 or vGlut3), it is possible that these genes could be upregulated following CRISPR-mediated deletion of Slc17a6. However, we do not see evidence of this with our in vitro synaptic physiology (EPSCs are significantly suppressed, Figure 5 – Fig supplement 1J-K) and therefore conclude that it is highly unlikely to occur to a significant degree in our experiments. This is now included in the Discussion.

      (7) The authors do not perform a complementary experiment to delete GABA release (i.e. via VGAT editing), which is understandable, given the absence of an effect with the pan-synaptic manipulation. A more significant concern is the timing of these manipulations as the authors acknowledge. The manipulations are all done in well-trained animals, who continue to perform during the length of viral expression. Moreover, after carefully showing that mice use different strategies on the 70/30 version vs the 90/10 version of the task, only performance on the 90/10 version is assessed after the manipulation. Together, the observation that EPNsst activity does not alter performance on a well-learned, 90/10 switching task decreases the impact of the findings, as this population may play a larger role during task acquisition or under more dynamic task conditions. Additional experiments could be done to strengthen the current evidence, although the limitation is transparently discussed by the authors.

      As mentioned above, it is possible that a requirement for EP<sup>Sst+</sup> neurons could be revealed if the experiment was conducted with different parameters (either different reward probabilities, fluctuating reward probabilities within a session, or withholding additional training during viral expression). It is difficult to predict which version of the task, if any, would be most likely to reveal a requirement for EP<sup>Sst+</sup> neurons based on our results. We favor testing for EP<sup>Sst+</sup> function using a new behavioral paradigm that allows us to carefully examine task learning following EP manipulations in an independent study.

      (8) Finally, intersectional strategies target LHb-projecting neurons, although in the original characterization, it is not entirely clear that the LHb is the only projection target of EPNsst neurons. A projection map would help clarify this point.

      In a previous study we confirmed that EP<sup>Sst+</sup> neurons project exclusively to the LHb using cell-type specific rabies infection and examining all reported downstream regions for axon collaterals (Wallace et al 2017, Suppl. Fig 6F-G). When EP<sup>Sst+</sup> neurons were labeled we did not observe axon collaterals in known targets of EP such as ventro-antero lateral thalamus, red nucleus, parafascicular nucleus of the thalamus, or the pedunculopontine tegmental nucleus, only in the LHb. Additionally, using single cell tracing techniques, others have shown EP neurons that exclusively project to the LHb (Parent et al, 2001).

      Overall, the authors used a pertinent experimental paradigm and common cell-specific approaches to address a major gap in the field, which is the functional role of glutamate/GABA co-release from the major basal ganglia output nucleus in action selection and evaluation. The study is carefully conducted, their analyses are thorough, and the data are often convincing and thought-provoking. However, the limitations of their synaptic manipulations with respect to the behavioral assays reduce generalizability and to some extent the impact of their findings.

      Reviewer #2 (Public Review):

      Summary:

      This paper aimed to determine the role EP sst+ neurons play in a probabilistic switching task.

      Strengths:

      The in vivo recording of the EP sst+ neuron activity in the task is one of the strongest parts of this paper. Previous work had recorded from the EP-LHb population in rodents and primates in head-fixed configurations; recording this population in a freely moving context is a valuable addition to these studies and has highlighted more clearly that these neurons respond both at the time of choice and outcome.

      The use of a refined intersectional technique to record specifically the EP sst+ neurons is also an important strength of the paper. This is because previous work has shown that there are two genetically different types of glutamatergic EP neurons that project to the LHb. Previous work had not distinguished between these types in their recordings so the current results showing that the bidirectional value signaling is present in the EP sst+ population is valuable.

      Weaknesses:

      (1) One of the main weaknesses of the paper is to do with how the effect of the EP sst+ neurons on the behavior was assessed.

      (a) All the manipulations (blocking synaptic release and blocking glutamatergic transmission) are chronic and more importantly the mice are given weeks of training after the manipulation before the behavioral effect is assessed. This means that as the authors point out in their discussion the mice will have time to adjust to the behavioral manipulation and compensate for the manipulations. The results do show that mice can adapt to these chronic manipulations and that the EP sst+ are not required to perform the task. What is unclear is whether the mice have compensated for the loss of EP sst+ neurons and whether they play a role in the task under normal conditions. Acute manipulations or chronic manipulations without additional training would be needed to assess this.

      Unfortunately, when mice are given a three week break from behavioral training (the time required to allow for adequate viral expression) behavioral performance on the task (p(highport), p(switch), trial number, trial time, etc.) is significantly degraded. Animals do eventually recover to previous performance levels, but this takes place during a 4-5 day “relearning” period. Here we sought to examine if EP<sup>Sst+</sup> neurons are required for continued task performance and chose to continue to train the animals following viral injection to avoid the “relearning” period that occurs following an extended break from behavioral training which may have made it difficult to interpret changes in behavioral performance due to the viral manipulation vs relearning.  

      Acute manipulations were not used because we planned to compare complete synaptic ablation (Tettx) and single neurotransmitter ablation (CRISPR Slc17a6) over similar time courses and we know of no acute manipulation that could achieve single neurotransmitter ablation. 

      (b) Another weakness is that the effect of the manipulations was assessed in the 90/10 contingency version of the task. Under these contingencies, mice integrate past outcomes over fewer trials to determine their choice and animals act closer to a simple win-stay, lose-switch strategy. Due to this, it is unclear if the EP sst+ neurons would play a role in the task when they must integrate over a larger number of conditions in the less deterministic 70/30 version of the task.

      It is possible that a requirement for EP<sup>Sst+</sup> neurons could be revealed if the experiment was conducted with different parameters (either different reward probabilities, fluctuating reward probabilities within a session, or withholding additional training during viral expression). It is difficult to predict which version of the task, if any, would be most likely to reveal a requirement for EP<sup>Sst+</sup> neurons based on our results. We favor testing for EP<sup>Sst+</sup> function using a new behavioral paradigm that allows us to carefully examine task learning following EP manipulations in an independent study.
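      To make the contrast between the two contingencies concrete, the long-run behavior of a pure win-stay, lose-shift agent can be worked out in closed form. This is a hedged sketch, not an analysis from the manuscript: it assumes fixed reward probabilities with no block switches, treats the agent as a two-state Markov chain, and the function name `wsls_stationary` is hypothetical.

```python
def wsls_stationary(p_high, p_low):
    """Long-run behavior of a pure win-stay, lose-shift agent.

    Assumes a two-port task whose reward probabilities never change
    (no block switches), which is a simplification of the real task.
    Returns (fraction of choices at the high port, switch probability).
    """
    # The agent leaves a port only after an unrewarded choice there.
    leave_high = 1.0 - p_high
    leave_low = 1.0 - p_low
    # Stationary distribution of the two-state chain (high port, low port).
    p_at_high = leave_low / (leave_high + leave_low)
    # A switch happens whenever the current choice goes unrewarded.
    p_switch = p_at_high * leave_high + (1.0 - p_at_high) * leave_low
    return p_at_high, p_switch

for p_high, p_low in [(0.9, 0.1), (0.7, 0.3)]:
    p_at_high, p_switch = wsls_stationary(p_high, p_low)
    print(f"{p_high:.0%}/{p_low:.0%}: "
          f"p(highport)={p_at_high:.2f}, p(switch)={p_switch:.2f}")
```

      Under these assumptions the agent chooses the high port on 90% of trials and switches on 18% of trials in the 90/10 condition, versus 70% and 42% in the 70/30 condition, which illustrates why a strategy that integrates over more past trials becomes more useful when outcomes are less deterministic.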

      The authors show an intriguing result that the EP sst+ neurons are excited when mice make an ipsilateral movement in the task either toward or away from the center port. This is referred to as a choice response, but it could be a movement response or related to the predicted value of a specific action. Recordings while mice perform movement outside the task or well-controlled value manipulations within the session would be needed to really refine what these responses are related to.

      If activity of EP<sup>Sst+</sup> neurons included a predicted value component, we would expect to see a change in activity during ipsilateral movements when the previous trial was rewarded vs unrewarded. This is examined in Fig 2—figure suppl. 2C, where we compare EP<sup>Sst+</sup> responses during ipsilateral trials when the previous trials were either rewarded (blue) or unrewarded (gray). We show that EP<sup>Sst+</sup> activity prior to side port entry (SE) is identical in these two trial types indicating that EP<sup>Sst+</sup> neurons do not show evidence of predicted value of an action in this context. Therefore, we conclude that increased EP<sup>Sst+</sup> activity during ipsilateral trials is primarily related to ipsilateral movement following CX (we call this the “choice” phase of the trial). We also show that other ipsiversive movements outside of the “choice” phase of a trial (such as the return to center port following a contralateral trial) show a smaller but significant increase in activity (Figure 2—figure supplement 1F-G). Therefore, whereas the activity observed during ipsilateral choice contains signals related to ipsilateral movement and additional factors, our data suggest that predicted value is not one of those factors. We will clarify this point and our definition of “choice” in the narrative.  

      (2) The authors conclude that they do not see any evidence for bidirectional prediction errors. It is not possible to conclude this. First, they see a large response in the EP sst+ neurons to the omission of an expected reward. This is what would be expected of a negative reward prediction error. There are much more specific, well-controlled tests that are commonplace in head-fixed and freely moving paradigms and that could be used to probe this. The authors do look at the effect of previous trials on the response and do not see strong consistent results, but this is not a strong formal test of what would be expected of a prediction error, either positive or negative. The other way they assess this is by looking at the size of the responses in different recording sessions with different reward contingencies. They claim that the size of the reward expectation and prediction error should scale with the different reward probabilities. If all the reward probabilities were present in the same session this should be true, as many others have shown for RPE. However, because these data were taken from different sessions, the responses are not expected to scale, since reward prediction errors have been shown to adaptively scale to cover the range of values on offer (Tobler et al., Science 2005). A better test of positive prediction error would be to give a larger-than-expected reward on a subset of trials. Either way, there is already evidence in their data that responses reflect a negative prediction error, and more specific tests would be needed to formally rule in or out prediction error coding, especially as previous primate and rodent recordings have shown that it is present.

      We do not conclude that we see no evidence for RPE and the reviewer is correct in stating that a large increase in EP<sup>Sst+</sup> activity following omission of an expected reward would be expected of a negative reward prediction error. However, this observation alone is not strong enough evidence that EP<sup>Sst+</sup> neurons signal RPE. When we looked for additional evidence of RPE within our experiments we did not find consistent demonstrations of its existence in our data. When performing photometry measurements of dopamine release in the striatum, RPE signals are readily observed with a task identical to ours using trial history as a modifier of reward prediction (Chantranupong, et al 2023). Of course, there could be a weaker more heterogeneous RPE signal in EP<sup>Sst+</sup> neurons that we cannot detect with our methods. As we state in the discussion, RPE signals may be present in a subset of individual neurons (as observed in Stephenson-Jones et al, 2016 and Hong and Hikosaka, 2008) which are below our detection threshold using fiber photometry. Additionally, Hong and Hikosaka, 2008 show that LHb-projecting GPi neurons show both positive and negative reward modulations which may obscure observation of RPE signals with photometry recordings that arise from population activity of genetically defined neurons.
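      How trial history can modify reward prediction, and hence the expected size of an RPE, can be illustrated with a textbook Rescorla-Wagner update. This is a sketch of the general idea only, not the analysis used in the manuscript or in the cited studies; the learning rate `alpha` and initial expectation `v0` are arbitrary illustrative values.

```python
def rescorla_wagner(rewards, alpha=0.3, v0=0.5):
    """Prediction error on each trial for a sequence of 0/1 outcomes.

    alpha (learning rate) and v0 (initial expectation) are illustrative
    values chosen for this sketch, not fitted parameters.
    """
    v = v0
    deltas = []
    for r in rewards:
        delta = r - v        # reward prediction error on this trial
        deltas.append(delta)
        v += alpha * delta   # update the expectation for the next trial
    return deltas

# Omission after a rewarded trial vs. omission after an unrewarded trial.
after_reward = rescorla_wagner([1, 0])[-1]
after_omission = rescorla_wagner([0, 0])[-1]
print(after_reward, after_omission)
```

      In this toy model the omission of reward produces a more negative prediction error when the preceding trial was rewarded than when it was unrewarded, which is the kind of history dependence that a population RPE signal would be expected to show.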

      (3) There are a lot of variables in the GLM that occur extremely close in time such as the entry and exit of a port. If two variables occur closely in time and are always correlated it will be difficult if not impossible for a regression model to assign weights accurately to each event. This is not a large issue, but it is misleading to have regression kernels for port entry and exits unless the authors can show these are separable due to behavioral jitter or a lack of correlation under specific conditions, which does not seem to be the case.

      It is true that two variables that are always correlated are redundant in a GLM. For example, center entry (CE) and center exit (CX) occur in quick succession in most trials and are highly correlated (Figure 1C). For this reason, when only one is removed as a predictor from the model but not the other, there is a very small change in the MSE of the fit (Figure 3E, -CE or -CX). However, when both are removed, model performance decreases further, indicating that center-port nose-pokes do contribute to model performance (Figure 3E, -CE/CX). Because reward is present or absent following side port entry, there is substantial behavioral jitter (due to water consumption in rewarded trials), so SE and SX are not always correlated; therefore the model performs worse when either is omitted alone, and worse still when both SE/SX are omitted together (Figure 3E, -SE/SX). We will update Figure 3 and the narrative to make this more explicit.
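      The logic of this hold-out comparison can be sketched on synthetic data. The example below is a simplified illustration under stated assumptions, not the authors' GLM: it uses lagged event-indicator regressors, a made-up response kernel, Gaussian noise, and hypothetical "entry" and "exit" events whose delay is either fixed at one bin or jittered from trial to trial.

```python
import numpy as np

def lagged_design(event_bins, n_bins, n_lags):
    """Event-kernel regressors: column k is 1 at (event bin + k), else 0."""
    X = np.zeros((n_bins, n_lags))
    for t in event_bins:
        for k in range(n_lags):
            if t + k < n_bins:
                X[t + k, k] = 1.0
    return X

def fit_mse(columns, y):
    """In-sample MSE of an ordinary least-squares fit with an intercept."""
    X = np.hstack([np.ones((y.size, 1))] + columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # tolerates collinear columns
    return float(np.mean((y - X @ beta) ** 2))

def holdout_mse(exit_delays, seed=0):
    """MSE of the full model and of models with entry/exit kernels held out."""
    rng = np.random.default_rng(seed)
    n_bins, n_lags = 4000, 10
    entry = np.arange(20, n_bins - 40, 40)        # one hypothetical trial per 40 bins
    exit_ = entry + exit_delays(rng, entry.size)  # exit follows entry by a delay
    kernel = np.sin(np.linspace(0, np.pi, n_lags + 2))[1:-1]  # made-up response shape
    X_in = lagged_design(entry, n_bins, n_lags)
    X_out = lagged_design(exit_, n_bins, n_lags)
    y = X_in @ kernel + X_out @ (0.5 * kernel) + 0.05 * rng.standard_normal(n_bins)
    return {
        "full": fit_mse([X_in, X_out], y),
        "-entry": fit_mse([X_out], y),
        "-exit": fit_mse([X_in], y),
        "-entry/exit": fit_mse([], y),
    }

fixed = holdout_mse(lambda rng, n: np.ones(n, dtype=int))      # exit always 1 bin later
jittered = holdout_mse(lambda rng, n: rng.integers(1, 15, n))  # variable dwell time
for name, mse in (("fixed delay", fixed), ("jittered delay", jittered)):
    print(name, {k: round(v, 4) for k, v in mse.items()})
```

      With a fixed delay, the entry and exit kernels span almost the same regressors, so dropping either one alone barely changes the MSE while dropping both degrades the fit substantially. With a jittered delay, the two kernels become separable and omitting either one alone produces a clear increase in MSE.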

      Reviewer #3 (Public Review):

      Summary:

      The authors find that Sst-EPN neurons, which project to the lateral habenula, encode information about response directionality (left vs right) and outcome (rewarded vs unrewarded). Surprisingly, impairment of vesicular signaling in these neurons onto their LHb targets did not impair probabilistic choice behavior.

      Strengths:

      Strengths of the current work include extremely detailed and thorough analysis of data at all levels, not only of the physiological data but also an uncommonly thorough analysis of behavioral response patterns.

      Weaknesses:

      Overall, I saw very few weaknesses, with only two issues, both of which should be possible to address without new experiments:

      (1) The authors note that the neural response difference between rewarded and unrewarded trials is not an RPE, as it is not affected by reward probability. However, the authors also show the neural difference is partly driven by the rapid motoric withdrawal from the port. Since there is also a response component that remains different apart from this motoric difference (Figure 2, Supplementary Figure 1E), it seems this is what needs to be analyzed with respect to reward probability, to truly determine whether there is no RPE component. Was this done?

      We thank the reviewer for this comment. We believe this is particularly important for unrewarded trials, as SE and SX occur in rapid succession. In Figure 2—figure supplement 2A-B we now show the photometry signal from Rewarded and Unrewarded ipsilateral trials aligned to SX for different reward probabilities. We quantify the signals for different reward probabilities during a 500 ms window immediately prior to SX but find no differences between groups.

      (2) The current study reaches very different conclusions than a 2016 study by Stephenson-Jones and colleagues despite using a similar behavioral task to study the same Sst-EPN-LHb circuit. This is potentially very interesting, and the new findings likely shed important light on how this circuit really works. Hence, I would have liked to hear more of the authors' thoughts about possible explanations of the differences. I acknowledge that a full answer might not be possible, but in-depth elaboration would help the reader put the current findings in the context of the earlier work, and give a better sense of what work still needs to be done in the future to fully understand this circuit.

      For example, the authors suggest that the Sst-EPN-LHb circuit might be involved in initial learning, but play less of a role in well-trained animals, thereby explaining the lack of observed behavioral effect. However, it is my understanding that the probabilistic switching task forces animals to continually update learned contingencies, rendering this explanation somewhat less persuasive, at least not without further elaboration (e.g. maybe the authors think it plays a role before the animals learn to switch?).

      Also, as I understand it, the 2016 study used manipulations that likely impaired phasic activity patterns, e.g. precisely timed optogenetic activation/inhibition, and/or deletion of GABA/glutamate receptors. In contrast, the current study's manipulations - blockade of vesicle release using tetanus toxin or deletion of VGlut2, would likely have blocked both phasic and tonic activity patterns. Do the authors think this factor, or any others they are aware of, could be relevant?

      We have added further discussion of the Stephenson-Jones, et al 2016 study as well as the Lazaridis, et al 2019 study which shows no effect of phasic stimulation of EP when specifically manipulating EP<sup>Sst+</sup> (vGat+/vGlut2+) neurons rather than vGlut2+ neurons as in the Stephenson-Jones study.  

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      In some places, there seems to be a mismatch between referenced figures and texts. For example:

      (1) The authors described that 'This increase in activity was seen for all three reward probabilities tested (90/10, 80/20, and 70/30) and occurred while the animal was engaged in ipsiversive movements as similar increases were observed following side exit (SX) on contralateral trials as the animal was moving from the contralateral side port back to the center port (Figure 2-Figure Supplement 1c)', but supplement 1c is not about calcium dynamics around the SX event. I presume they mean Figure 2-Figure Supplement 1d.

      Yes, this will be corrected in the revised manuscript.

      (2) The authors explained that increased EPSst+ neuronal activity following an unrewarded outcome was partially due to the rapid withdrawal of the animal's snout following an unrewarded outcome however, differences in rewarded and unrewarded trials were still distinguishable when signals were aligned to side port exit indicating that these increases in EPSst+ neuronal activity on unrewarded trials were a combination of outcome evaluation (unrewarded) and side port withdrawal occurring in quick succession (SX, Figure 2 - Figure Supplement 1d). I presume that they mean Figure 2 - Figure Supplement 1e.

      Yes, this will be corrected in the revised manuscript.

      Minor suggestions related to specific figure presentation are below:

      Figure 2 and supplement figures:

      (1) Figure 2B: the authors may consider presenting outcome-related signals recorded from all trials, including both ipsilateral and contralateral events, and align signals to SE when reward consumption presumably begins, rather than aligning to CE.

      We have added sample recordings from ipsilateral and contralateral trials and sorted them by trial duration to allow for clearer presentation of activity following CE and SE (Figure 2—figure supplement 1a-b).

      (2) The authors described that 'This increase in activity was seen for all three reward probabilities tested (90/10, 80/20, and 70/30) and occurred while the animal was engaged in ipsiversive movements as similar increases were observed following side exit (SX) on contralateral trials as the animal was moving from the contralateral side port back to the center port (Figure 2-Figure Supplement 1c)', but supplement 1c is not about calcium dynamics around the SX event. I presume they mean Figure 2-Figure Supplement 1d.

      Yes, this will be corrected in the revised manuscript.

      (3) The authors explained that increased EPSst+ neuronal activity following an unrewarded outcome was partially due to the rapid withdrawal of the animal's snout following an unrewarded outcome however, differences in rewarded and unrewarded trials were still distinguishable when signals were aligned to side port exit indicating that these increases in EPSst+ neuronal activity on unrewarded trials were a combination of outcome evaluation (unrewarded) and side port withdrawal occurring in quick succession (SX, Figure 2 -Figure Supplement 1d). I presume that they mean Figure 2 -Figure Supplement 1e.

      Yes, this will be corrected in the revised manuscript.

      Figure 3 and supplement figures:

      (1) Figure 3C-F: it is hard to compare the amplitude of calcium signals between different behaviour events without a uniform y-axis.

      The scale for the y-axis on Figure 3C-D is uniform for all panels. Figure 3E is also uniform for all boxplots. The reviewer may be referring to Figure 2C-F, but the y-axis for all of the photometry data is uniform for all panels and the horizontal line represents zero. The y-axis for the quantification on the right of each panel is scaled to the max/min for each comparison.

      (2) Figure 3E is difficult to follow. The authors explained that the 'SE' variable is generated by collapsing the ipsilateral and contralateral port entries, and hence the variable has no choice of direction information. I assumed that the 'SX', 'CE', and 'CX' variables are generated similarly. It is not clear if this is the case for the 'side', 'centre' and 'choice' variables. The authors explained that 'omitting center port entry/exit together or individually also resulted in decreased GLM performance but to a smaller degree than the omission of choice direction (Figure 3e, "-Center")'. My understanding is that they created the Centre variable by collapsing ipsilateral and contralateral centre port entry/exit together. The Centre variable should have no choice of direction information. How is the Center variable generated differently from omitting centre port entry/exit together? I would ask the authors to explain the model and different variables a bit more thoroughly in the text.

      We apologize for the confusion. All ten variables used to train the full GLM are listed in Fig. 3C. In Figure 3E variable(s) were omitted to test how they contributed to GLM performance (data labeled “None” is the full model with all variables). Omitted variables are now defined as follows: -Rew = Rew+Unrew removed, -Direction = Ipsi/Contra designation removed and collapsed into CE, CX, SE, SX, -Direction & Rew = Ipsi/Contra info removed from all variables + Rew/Unrew removed, -CE/CX = Ipsi/Contra CE and CX removed, -CE = Ipsi/contra CE removed, -CX = Ipsi/contra CX removed, -SE/SX = Ipsi/Contra SE and SX removed, -SE = Ipsi/contra SE removed, -SX = Ipsi/contra SX removed. This clarification has also been added to the Generalized Linear Model section of Materials and Methods.

      Figure 5 and supplement figures:

      There are no representative and summary figures showing the specificity and efficiency of oChief-tdTomato or Tetx-GFP expression. Body weight changes following virus injection are not well described.

      A representative image of Tettx GFP expression is shown in Fig. 4A and the percentage of infected EP<sup>Sst+</sup> neurons is described in the text (70±15.1% (mean±SD), 1070±230 neurons/animal, n=6 mice). Most oChief-tdTom animals were used for post-hoc electrophysiology experiments and careful quantification of viral expression was not possible. However, Slc17a6 deletion was confirmed in these animals (Fig. 5 – Fig supplement 1J-K) to confirm the manipulation was effective in the experimental group. A representative image of oChief-tdTom expression is shown in Fig. 5A.

      We now mention the body weight changes observed following Tettx injection in the narrative.

      Reviewer #2 (Recommendations For The Authors):

      (1) In the RFLR section you state that "this variable decays...", a variable can't decay only the value of a variable can change. Also, it is not mentioned what variable is being discussed. There are lots of variables in the model so this should be made clear.

      We now state, “This variable (β) changes over trials and is updated with new evidence from each new trial’s choice and outcome with an additional bias towards or away from its most recent choice (Figure 1-figure supplement 2A-C).”

      (2) I couldn't find in the results section, or the methods section the details for the Tet tx experiments, were mice trained and tested on 90/10 only? Were they trained while the virus was expressing etc? This should be added.

      In the methods section we state, “For experiments where we manipulated synaptic release in EP<sup>Sst+</sup> neurons (Figures 4-5) we trained mice (reward probabilities 90/10, no transparent barrier present) to the following criteria for the 5 days prior to virus injection: 1) p(highport) per session was greater than or equal to 0.80 with a variance less than 0.003, 2) p(switch) per session was less than or equal to 0.15 with a variance less than 0.001, 3) the p(left port) was between 0.45-0.55 with a variance less than 0.005, and 4) the animal performed at least 200 trials in a session. The mean and variance for these measurements were calculated across the five sessions immediately preceding surgery. The criteria were determined by comparing performance profiles in separate animals and chosen based on when animals first showed stable and plateaued behavioral performance. Following surgery, mice were allowed to recover for 3 days and then continued to train for 3 weeks during viral expression. Data collected during the 5 day pre-surgery period were then compared to data collected for 10 sessions following the 3 weeks allotted for viral expression (i.e. days 22-31 post-surgery).”
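
      The four training criteria quoted above can be written as a simple check. This is a minimal sketch, not the authors' code: the function name, the dictionary keys, the use of the sample variance, and the choice to apply each threshold to the mean across the five sessions are all assumptions made for illustration, since the quoted text does not specify them.

```python
from statistics import mean, variance

def meets_training_criteria(sessions):
    """Check the quoted pre-surgery criteria on five sessions.

    `sessions` is a list of dicts with hypothetical keys:
    p_high, p_switch, p_left (per-session probabilities) and n_trials.
    Assumption: thresholds apply to the across-session mean, and the
    variance is the sample variance across sessions.
    """
    if len(sessions) != 5:
        raise ValueError("expected the five sessions preceding surgery")

    def summary(key):
        values = [s[key] for s in sessions]
        return mean(values), variance(values)

    high_mean, high_var = summary("p_high")
    switch_mean, switch_var = summary("p_switch")
    left_mean, left_var = summary("p_left")

    return (
        high_mean >= 0.80 and high_var < 0.003          # criterion 1
        and switch_mean <= 0.15 and switch_var < 0.001  # criterion 2
        and 0.45 <= left_mean <= 0.55 and left_var < 0.005  # criterion 3
        and all(s["n_trials"] >= 200 for s in sessions)  # criterion 4
    )

# Illustrative, made-up sessions.
stable = [{"p_high": 0.85, "p_switch": 0.10, "p_left": 0.50, "n_trials": 250}] * 5
variable = [
    {"p_high": p, "p_switch": 0.10, "p_left": 0.50, "n_trials": 250}
    for p in (0.80, 0.95, 0.70, 0.95, 0.80)
]
print(meets_training_criteria(stable))    # stable performance
print(meets_training_criteria(variable))  # mean is high but variance is too large
```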

      Reviewer #3 (Recommendations For The Authors):

      (1) The kernel in Figure 3C shows an activation prior to CE on "contra" trials that is not apparent in Figure 2C which shows no activation prior to CE on either contra or ipsi trials. Given that movement directionality prior to CE is dictated by the choice on the PREVIOUS trial, is the "contra" condition in 3C actually based on the previous trial? If so, this should be clarified.

      On most “contra” trials the animal is making an ipsiversive movement just prior to CE as it returns to the center from the contralateral side-port (as most trials are no “switch” trials). Therefore, an increase in activity is expected and shown most clearly following SX for contralateral trials in Fig 2 –Fig suppl 1F. A significant increase in activity prior to CE on contra trials compared to ipsi trials can also be seen in Fig 2C; it's just not as large a change as the increase observed following CE for ipsi. trials. The comparison between activity observed during the two types of ipsiversive movements is now shown directly in Figure 2—figure supplement 1G.

      (2) Paragraph 7 of the discussion uses a phrase "by-in-large", which probably should be "by and large".

      Thank you for the correction.

      Editor's note:

      Should you choose to revise your manuscript, if you have not already done so, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.

      Readers would also benefit from coding individual data points by sex and noting N/sex.

      Sex breakdown has been added to figure legends for each experiment; full statistical reporting is now also included in the figure legends.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Bell et al. describes an analysis of the effects of removing one of two mutually exclusive splice exons at two distinct sites in the Drosophila CaV2 calcium channel Cacophony (Cac). The authors perform imaging and electrophysiology, along with some behavioral analysis of larval locomotion, to determine whether these alternatively spliced variants have the potential to diversify Cac function in presynaptic output at larval neuromuscular junctions. The authors provide valuable insights into how alternative splicing at two sites in the calcium channel alters its function.

      Strengths:

      The authors find that both of the second alternatively spliced exons (I-IIA and I-IIB) that are found in the intracellular loop between the first and second set of transmembrane domains can support Cac function. However, loss of the I-IIB isoform (predicted to alter potential beta subunit interactions) results in 50% fewer channels at active zones and a decrease in neurotransmitter release and the ability to support presynaptic homeostatic potentiation. Overall, the study provides new insights into Cac diversity at two alternatively spliced sites within the protein, adding to our understanding of how presynaptic calcium channel function can be regulated by splicing.

      Weaknesses:

      The authors find that one splice isoform (IS4B) in the first S4 voltage sensor is essential for the protein's function in promoting neurotransmitter release, while the other isoform (IS4A) is dispensable. The authors conclude that IS4B is required to localize Cac channels to active zones. However, I find it more likely that IS4B is required for channel stability and that its loss leads to the protein being degraded, rather than any effect on active zone localization. More analysis would be required to establish that as the mechanism for the unique requirement for IS4B.

      (1) We thank the reviewer for this important point. In fact, all three reviewers raised the same question, and the reviewing editor pointed out that caution or additional experiments were required to distinguish between IS4 splicing being important for cac channel localization versus channel stability/degradation. We provide multiple sets of experiments as well as text and figure revisions to strengthen our claim that the IS4B exon is required for cacophony channels to enter motoneuron presynaptic boutons and localize to active zones.

      a. If IS4B was indeed required for cac channel stability (and not for localization to active zones), IS4A channels should be unstable wherever they are. This is not the case because we have recorded somatodendritic cacophony currents from IS4A expressing adult motoneurons that were devoid of cac channels with the IS4B exon. Therefore, IS4A cac channels are not unstable but underlie somatodendritic voltage dependent calcium currents in these motoneurons. These new data are now shown in the revised figure 3C and referred to in the text on page 7, line 42 to page 8 line 9.

      b. Similarly, if IS4B was required for channel stability, IS4A channels should not be present anywhere in the nervous system. We tested this by immunohistochemistry for GFP tagged IS4A channels in the larval CNS. Although IS4A channels are sparsely expressed, which is consistent with low expression levels seen in the Western blots (Fig. 1E), there are always defined and reproducible patterns of IS4A label in the larval brain lobes as well as in the anterior part of the VNC. This again shows that the absence of IS4A from presynaptic active zones is not caused by channel instability, because the channel is expressed in other parts of the nervous system. These data are shown in the new supplementary figure 1 and referred to in the text on page 15, lines 3 to 8.

      c. As suggested in a similar context by reviewers 1 and 2, we now show enlargements of the presence of IS4B channels in presynaptic active zones as well as enlargements of the absence of IS4A channels in presynaptic active zones in the revised figures 2A-C and 3A. In these images, no IS4A label is detectable in active zones or anywhere else throughout the axon terminals, thus indicating that IS4B is required for expressing cac channels in the axon terminal boutons and localizing them to active zones. Text and figure legends have been adjusted accordingly.

      d. Related to this, reviewer 1 also recommended quantifying the IS4A and IS4B channel intensity and co-localization with the active zone marker brp (recommendation for authors). After following the reviewers’ suggestion to adjust the background values in IS4A and IS4B immunolabels to identical levels (revised Figs. 2A-C), it becomes obvious that IS4A channels are not detectable above background in presynaptic terminals or active zones, thus intensity is close to zero. We still calculated the Pearson’s co-localization coefficient for both IS4 variants with the active zone marker brp. For IS4B channels the Pearson’s correlation coefficient is control-like, just above 0.6, whereas for IS4A channels we do not find colocalization with brp (Pearson’s below 0.25). These new analyses are now shown in the revised figure 2D and referred to on page 6, lines 33 to 38.
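
      For readers unfamiliar with the measure, a pixel-wise Pearson's coefficient between two image channels can be computed as in the sketch below. This is a generic illustration under stated assumptions, not the analysis pipeline used in the manuscript: the function name, the optional region-of-interest mask, and the synthetic test images are hypothetical, and the software the authors used is not stated here.

```python
import numpy as np

def pearson_coefficient(channel_a, channel_b, mask=None):
    """Pixel-wise Pearson's correlation between two same-sized image channels.

    `mask` is an optional boolean array selecting a region of interest
    (an assumption for illustration; e.g. the bouton area).
    """
    a = np.asarray(channel_a, dtype=float)
    b = np.asarray(channel_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("channels must have the same shape")
    if mask is not None:
        mask = np.asarray(mask, dtype=bool)
        a, b = a[mask], b[mask]
    else:
        a, b = a.ravel(), b.ravel()
    # Subtract the mean intensity of each channel, then normalize.
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    if denom == 0:
        return float("nan")  # a constant channel has no defined correlation
    return float((a * b).sum() / denom)

# Synthetic example: one channel that tracks the marker, one that does not.
rng = np.random.default_rng(1)
marker = rng.random((64, 64))
tracking = 2.0 * marker + 1.0      # perfectly correlated with the marker
unrelated = rng.random((64, 64))   # independent noise

r_tracking = pearson_coefficient(marker, tracking)
r_unrelated = pearson_coefficient(marker, unrelated)
print(r_tracking, r_unrelated)
```

      Because the coefficient is computed after mean subtraction and normalization, it is insensitive to a uniform offset or gain in either channel, although it can still be affected by how the region of interest is chosen.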

      e. Consistent with our finding that IS4B is required for cac channel localization to presynaptic active zones, upon removal of IS4B we find no evoked synaptic transmission (Fig. 2 in initial submission, now Fig. 3B).

      Together these data are in line with a unique requirement of IS4B at presynaptic active zones (not excluding additional functions of IS4B), whereas IS4A containing cac isoforms are not found in presynaptic active zones and mediate different functions.

      Reviewer #2 (Public Review):

      This study by Bell et al. focuses on understanding the roles of two alternatively spliced exons in the single Drosophila Cav2 gene cac. The authors generate a series of cac alleles in which one or the other mutually exclusive exons are deleted to determine the functional consequences at the neuromuscular junction. They find alternative splicing at one exon encoding part of the voltage sensor impacts the activation voltage as well as localization to the active zone. In contrast, splicing at the second exon pair does not impact Cav2 channel localization, but it appears to determine the abundance of the channel at active zones.

      Together, the authors propose that alternative splicing at the Cac locus enables diversity in Cav2 function generated through isoform diversity generated at the single Cav2 alpha subunit gene encoded in Drosophila.

      Overall this is an excellent, rigorously validated study that defines unanticipated functions for alternative splicing in Cav2 channels. The authors have generated an important toolkit of mutually exclusive Cac splice isoforms that will be of broad utility for the field, and show convincing evidence for distinct consequences of alternative splicing of this single Cav2 channel at synapses. Importantly, the authors use electrophysiology and quantitative live sptPALM imaging to determine the impacts of Cac alternative splicing on synaptic function. There are some outstanding questions regarding the mechanisms underlying the changes in Cac localization and function, and some additional suggestions are listed below for the authors to consider in strengthening this study. Nonetheless, this is a compelling investigation of alternative splicing in Cav2 channels that should be of interest to many researchers.

      (2) We believe that the additional data on cac IS4A isoform localization and function as detailed above (response to public review 1) has strengthened the manuscript and answered some of the remaining questions the reviewer refers to. We are also grateful for the specific additional reviewer suggestions which we have addressed point-by-point and refer to below (section recommendations for authors).

      Reviewer #3 (Public Review):

      Summary:

      Bell and colleagues studied how different splice isoforms of voltage-gated CaV2 calcium channels affect channel expression, localization, function, synaptic transmission, and locomotor behavior at the larval Drosophila neuromuscular junction. They reveal that one mutually exclusive exon located in the fourth transmembrane domain encoding the voltage sensor is essential for calcium channel expression, function, active zone localization, and synaptic transmission. Furthermore, a second mutually exclusive exon residing in an intracellular loop containing the binding sites for Caβ and G-protein βγ subunits promotes the expression and synaptic localization of around ~50% of CaV2 channels, thereby contributing to ~50% of synaptic transmission. This isoform enhances release probability, as evident from increased short-term depression, is vital for homeostatic potentiation of neurotransmitter release induced by glutamate receptor impairment, and promotes locomotion. The roles of the two other tested isoforms remain less clear.

      Strengths:

      The study is based on solid data that was obtained with a diverse set of approaches. Moreover, it generated valuable transgenic flies that will facilitate future research on the role of calcium channel splice isoforms in neural function.

      Weaknesses:

      (1) Based on the data shown in Figures 2A-C, and 2H, it is difficult to judge the localization of the cac isoforms. Could they analyze cac localization with regard to Brp localization (similar to Figure 3; the term "co-localization" should be avoided for confocal data), as well as cac and Brp fluorescence intensity in the different genotypes for the experiments shown in Figure 2 and 3 (Brp intensity appears lower in the dI-IIA example shown in Figure 3G)? Furthermore, heterozygous dIS4B imaging data (Figure 2C) should be quantified and compared to heterozygous cacsfGFP/+.

      According to the reviewer’s suggestion, we have quantified cac localization relative to brp localization by computing the Pearson’s correlation coefficient for controls and IS4A as well as IS4B animals. These new data are shown in the revised Fig. 2D and referred to on page 6, lines 33-38. Furthermore, we now confirm control-like Pearson’s correlation coefficients for all exon out variants except ΔIS4B and show Pearson’s correlation coefficients for all genotypes side-by-side in the revised Fig. 4D (legend has been adjusted accordingly). In addition, in response to the recommendations to authors, we now provide selective enlargements for the co-labeling of Brp and each exon out variant in the revised figures 2-4. We have also adjusted the background in Fig. 2C (ΔIS4B) to match that in Figs. 2A and B (control and ΔIS4A). This allows a fair comparison of cac intensities following excision of IS4B versus excision of IS4A and control (see also Fig 3). Together, this demonstrates the absence of IS4A label in presynaptic active zones much more clearly. As suggested, we have also quantified brp puncta intensity on m6/7 across homozygous exon excision mutants and found no differences (this is now stated for IS4A/IS4B in the results text on page 6, lines 37/38 and for I-IIA/I-IIB on page 8, lines 42-44). We did not quantify the intensity of cacophony puncta upon excision of IS4B because the label revealed no significant difference from background (which can be seen much better in the images now), but the brp intensities remained control-like even upon excision of IS4B.

      (2) They conclude that I-II splicing is not required for cac localization (p. 13). However, cac channel number is reduced in dI-IIB. Could the channels be mis-localized (e.g., in the soma/axon)? What is their definition of localization? Could cac be also mis-localized in dIS4B? Furthermore, the Western Blots indicate a prominent decrease in cac levels in dIS4B/+ and dI-IIB (Figure 1D). How do the decreased protein levels seen in both genotypes fit to a "localization" defect? Could decreased cac expression levels explain the phenotypes alone?

      We have now precisely defined what we mean by cac localization, namely the selective label of cac channels in presynaptic active zones that are defined as brp puncta, but no cac label elsewhere in the presynaptic bouton (page 6, lines 18 to 20). On the level of CLSM microscopy this corresponds to overlapping cac puncta and brp puncta, but no cac label elsewhere in the bouton. Based on the additional analysis and data sets outlined in our response 1 (see above) we conclude that excision of IS4B does not cause channel mislocalization because we find reproducible expression patterns elsewhere in the nervous system as well as somatodendritic cac current in ΔIS4B (for detail see above). Therefore, the isoforms containing the mutually exclusive IS4A exon are expressed and mediate other functions, but cannot substitute IS4B containing isoforms at the presynaptic AZ. In fact, our Western blots are in line with reduced cac expression if all isoforms that mediate evoked release are missing, again indicating that the presynapse specific cac isoforms cannot be replaced by other cac isoforms. This is also in line with the sparse expression of IS4A throughout the CNS as seen in the new supplementary figure 1 (for detail see above).

      (3) Cac-IS4B is required for Cav2 expression, active zone localization, and synaptic transmission. Similarly, loss of cac-I-IIB reduces calcium channel expression and number. Hence, the major phenotype of the tested splice isoforms is the loss of/a reduction in Cav2 channel number. What is the physiological role of these isoforms? Is the idea that channel numbers can be regulated by splicing? Is there any data from other systems relating channel number regulation to splicing (vs. transcription or post-transcriptional regulation)?

      Our data are not consistent with the idea that splicing regulates channel numbers. Rather, splicing can be used to generate channels with specific properties that match the demand at the site of expression. For the IS4 exon pair we find differences in activation voltage between IS4A and IS4B channels (revised Fig. 3C), with IS4B being required for sustained HVA current. IS4A does not localize to presynaptic active zones at the NMJ and is only sparsely expressed elsewhere in the NS (new supplementary Fig. 1). By contrast, IS4B is abundantly expressed in many neuropils. Therefore, taking out IS4B takes out the more abundant IS4 isoform. This is consistent with different expression levels for IS4 isoforms that have different functions, but we do not find evidence for splicing regulating expression levels per se.

      Similarly, the I-II mutually exclusive exon pair differs markedly in the presence or absence of G-protein βγ binding sites that play a role in acute channel regulation as well as in the conservation of the sequence for β-subunit binding (see page 5, lines 9-17). Channel number reduction in active zones occurs specifically if expression of the cac channels with the G<sub>βγ</sub>-binding site as well as the more conserved β-subunit binding is prohibited by excision of the I-IIB exon (see Fig. 5F). Vice versa, excision of I-IIA does not result in reduced channel numbers. This scenario is consistent with the hypothesis that conserved β-subunit binding affects channel number in the active zone (see page 17, lines 3 to 6 and lines 33-36), but we have no evidence that I-II splicing per se affects channel number.

      (4) Although not supported by statistics, and as appreciated by the authors (p. 14), there is a slight increase in PSC amplitude in dIS4A mutants (Figure 2). Similarly, PSC amplitudes appear slightly larger (Figure 3J), and cac fluorescence intensity is slightly higher (Figure 3H) in dI-IIA mutants. Furthermore, cac intensity and PSC amplitude distributions appear larger in dI-IIA mutants (Figures 3H, J), suggesting a correlation between cac levels and release. Can they exclude that IS4A and/or I-IIA negatively regulate release? I suggest increasing the sample size for Canton S to assess whether dIS4A mutant PSCs differ from controls (Figure 2E). Experiments at lower extracellular calcium may help reveal potential increases in PSC amplitude in the two genotypes (but are not required). A potential increase in PSC amplitude in either isoform would be very interesting because it would suggest that cac splicing could negatively regulate release.

      There are several possibilities to explain this, but as none of the effects is statistically significant, we prefer to not investigate this in further depth. However, given that we cannot find IS4A in presynaptic active zones (revised figures 2C and 3A plus the new enlargements 2Ci and 3Ai, revised text page 6, lines 22 to 24 and 29 to 31, and page 7, second paragraph, same as public response 1D) IS4A channels cannot have a direct negative effect on release probability. Nonetheless, given that IS4A containing cac isoforms mediate functions in other neuronal compartments (see revised Fig. 3C) it may regulate release indirectly by affecting e.g. action potential shape. Moreover, in response to the more detailed suggestions to authors we provide new data that give additional insight.

      (5) They provide compelling evidence that IS4A is required for the amplitude of somatic sustained HVA calcium currents. However, the evidence for effects on biophysical properties and activation voltage (p. 13) is less convincing. Is the phenotype confined to the sustained phase, or are other aspects of the current also affected (Figure 2J)? Could they also show the quantification of further parameters, such as CaV2 peak current density, charge density, as well as inactivation kinetics for the two genotypes? I also suggest plotting peak-normalized HVA current density and conductance (G/Gmax) as a function of Vm. Could a decrease in current density due to decreased channel expression be the only phenotype? How would changes in the sustained phase translate into altered synaptic transmission in response to AP stimulation?

      Most importantly, sustained HVA current is abolished upon excision of IS4B (not IS4A, we think the reviewer accidentally mixed up the genotype) and presynaptic active zones at the NMJ contain only cac isoforms with the IS4B exon. This indicates that the cac isoforms that mediate evoked release encode HVA channels. The somatodendritic currents shown in the revised figure 3C (previously 2J) that remain upon excision of IS4B are mediated by IS4A-containing cac isoforms. Please note that these never localize to the presynaptic active zone, and thus do not contribute to evoked release. Therefore, the interpretation is that specifically sustained HVA current encoded by IS4B cac isoforms is required for synaptic transmission. Reduced cac current density due to decreased channel expression is not the cause for impaired evoked release upon IS4B excision, but instead, the cause is the absence of any cac channels in active zones. IS4B-containing cac isoforms encode sustained HVA current, and we speculate that this might be a well suited current to minimize cacophony channel inactivation in the presynaptic active zone. Given that HVA current shows fast voltage dependent activation and fast inactivation upon repolarization, it is useful at large intraburst firing frequencies as observed during crawling (Kadas et al., 2017) without excessive cac inactivation (see page 15, lines 16 to 20).

      However, we agree with the reviewer that a deeper electrophysiological analysis of splice isoform specific cac currents will be instructive. We have now added traces of control and ΔIS4B from a holding potential of -90 mV (revised Fig. 3C, bottom traces and revised text on page 7, line 43 to page 8, lines 1 to 10), and these are also consistent with IS4B mediating sustained HVA cac current. However, further analysis of activation and inactivation voltages and kinetics suffers from space clamp issues in recordings from the somata of such complex neurons (DLM motoneurons of the adult fly contain roughly 6000 µm of dendrites with over 4000 branches, Ryglewski et al., 2017, Neuron 93(3):632-645). Therefore, we will analyze the currents in a heterologous expression system and present these data to the scientific community as a separate study at a later time point.

      (6) Why was the STED data analysis confined to the same optical section, and not to max. intensity z-projections? How many and which optical sections were considered for each active zone? What were the criteria for choosing the optical sections? Was synapse orientation considered for the nearest neighbor Cac - Brp cluster distance analysis? How do the nearest-neighbor distances compare between "planar" and "side-view" Brp puncta?

      Maximum intensity z-projections would be imprecise because they can artificially suggest close proximity of label that is close by in x and y but far away in z. Therefore, the analysis was executed in xy-direction of various planes of entire 3D image stacks. We considered active zones of different orientations (Figs. 5C, D) to account for all planes. In fact, we searched the entire z-stacks until we found active zones of all orientations within the same boutons, as shown in figures 5C1-C6. The same active zone orientations were analyzed for all exon-out mutants with cac localization in active zones. The distance between cac and brp did not change if viewed from the side or any other orientation. We now explain this with more clarity in the results text on page 9, lines 23/24.
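
The projection argument can be illustrated with a small numerical sketch. The coordinates below are hypothetical values in arbitrary units, not measurements from the study; the function names are introduced here only for illustration. Two puncta that nearly coincide in x and y but sit in different optical sections appear almost superimposed once z is discarded, as a maximum intensity projection effectively does.

```python
import math

def distance_3d(p, q):
    """Euclidean distance between two (x, y, z) points."""
    return math.dist(p, q)

def distance_in_projection(p, q):
    """Distance after collapsing z, as a z-projection effectively does."""
    return math.dist(p[:2], q[:2])

# Hypothetical punctum centroids (arbitrary units), assumed for illustration only.
cac_punctum = (1.00, 1.00, 0.20)
brp_punctum = (1.05, 1.00, 1.20)

true_distance = distance_3d(cac_punctum, brp_punctum)
apparent_distance = distance_in_projection(cac_punctum, brp_punctum)

print(f"3D distance:        {true_distance:.3f}")
print(f"projected distance: {apparent_distance:.3f}")
```

In this toy case the projected distance is a small fraction of the true separation, which is the kind of artificial proximity that a plane-by-plane analysis avoids.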

      (7) Cac clusters localize to the Brp center (e.g., Liu et al., 2011). They conclude that Cav2 localization within Brp is not affected in the cac variants (p. 8). However, their analysis is not informative regarding a potential offset between the central cac cluster and the Brp "ring". Did they/could they analyze cac localization with regard to Brp ring center localization of planar synapses, as well as Brp-ring dimensions?

      In the top views (planar) we did not find any clear offset in cac orientation to brp between genotypes. In such planar synapses (top views, Fig. 5D, left row) we did not find any difference in Brp ring dimensions. We did not quantify brp ring dimensions rigorously, because this study focusses on cac splice isoform-specific localization and function. Possible effects of different cac isoforms on brp-ring dimensions or other aspects of scaffold structure are not central to our study, in particular given that brp puncta are clearly present even if cac is absent from the synapse (Fig. 3A), indicating that cac is not instructive for the formation of the brp scaffold.

      (8) Given the accelerated PSC decay/ decreased half width in dI-IIA (Fig. 5Q), I recommend reporting PSC charge in Figure 3, and PPR charge in Figures 5A-D. The charge-based PPRs of dI-IIA mutants likely resemble WT more closely than the amplitude-based PPR. In addition, miniature PSC decay kinetics should be reported, as they may contribute to altered decay kinetics. How could faster cac inactivation kinetics in response to single AP stimulation result in a decreased PSC half-width? Is there any evidence for an effect of calcium current inactivation on PSC kinetics? On a similar note, is there any evidence that AP waveform changes accelerate PSC kinetics? PSC decay kinetics are mainly determined by GluR decay kinetics/desensitization. The arguments supporting the role of cac splice isoforms in PSC kinetics outlined in the discussion section are not convincing and should be revised.

      We agree that reporting charge in figure 3 is informative and do so in the revised text. Since the result (no significant difference in the PSCs between CS, cac<sup>GFP</sup>, <sup>ΔI-IIA</sup>, and transheterozygous I-IIA/I-IIB, but significantly smaller values in ΔI-IIB) remained unchanged no matter whether charge or amplitude were analyzed, we decided to leave the figure as is and report the additional analysis in the text (page 8, lines 40 to 42). This way, both types of analysis are reported. Please note that EPSC amplitude is slightly but not significantly increased upon excision of I-IIA (Fig. 4J), whereas EPSC half amplitude width is significantly smaller (Fig. 5Q, now revised Fig 6R). Together, a tendency of increased EPSC amplitudes and smaller half amplitude width result in statistically insignificant changes in EPSC in ∆I-IIA (now discussed on page 15, lines 37 to 40). We also understand the reviewer’s concern attributing altered EPSC kinetics to presynaptic cac channel properties. We have toned down our interpretation in the discussion and list possible alterations in presynaptic AP shape or cac channel kinetics as alternative explanations (not conclusions; see revised discussion on page 15, line 40 to page 16, line 2). Moreover, we have quantified postsynaptic GluRIIA abundance to test whether altered PSC kinetics are caused by altered GluRIIA expression. In our opinion, the latter is more instructive than mini decay kinetic analysis because this depends strongly on the distance of the recording electrode to the actual site of transmission in these large muscle cells. Although we find no difference in GluRIIA expression levels we now clearly state that we cannot exclude other changes in GluR receptor fields, which of course, could also explain altered PSC kinetics. We have updated the discussion on page 16, lines 2/3 accordingly.

      (9) Paired-pulse ratios (PPRs): On how many sweeps are the PPRs based? In which sequence were the intervals applied? Are PPR values based on the average of the second over the first PSC amplitudes of all sweeps, or on the PPRs of each sweep and then averaged? The latter calculation may result in spurious facilitation, and thus to the large PPRs seen in dI-IIB mutants (Kim & Alger, 2001; doi: 10.1523/JNEUROSCI.21-24-09608.2001).

      We agree that the PP protocol and analyses had to be described more precisely in the methods and have done so on page 23, lines 31 to 37 in the methods. Mean PPR values are based on the PPRs of each sweep and then averaged. We are aware of the study of Kim and Alger 2001 and have re-analyzed the PP data in both ways outlined by the reviewer. We get identical results with either analysis method. Spurious facilitation is thus not an issue in our data. We now explain this in the methods section along with the PPR protocol. The large spread seen in dI-IIB is indeed caused by reduced calcium influx into active zones with fewer channels, as anticipated by the reviewer (see next point).
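
The two ways of computing a mean PPR that the reviewer distinguishes can be sketched as follows. The amplitudes are made-up numbers in arbitrary units, chosen so that the first response fluctuates strongly from sweep to sweep; they are not data from the study, and the function names are hypothetical.

```python
from statistics import mean

def ppr_mean_of_ratios(sweeps):
    """Compute the PPR for each sweep (second / first), then average the ratios."""
    return mean(second / first for first, second in sweeps)

def ppr_ratio_of_means(sweeps):
    """Average first and second amplitudes across sweeps, then take one ratio."""
    mean_first = mean(first for first, _ in sweeps)
    mean_second = mean(second for _, second in sweeps)
    return mean_second / mean_first

# Hypothetical (first PSC, second PSC) amplitudes per sweep, arbitrary units.
sweeps = [(10.0, 12.0), (2.0, 12.0), (20.0, 12.0)]

per_sweep = ppr_mean_of_ratios(sweeps)
pooled = ppr_ratio_of_means(sweeps)

print(f"mean of per-sweep ratios: {per_sweep:.3f}")
print(f"ratio of mean amplitudes: {pooled:.3f}")
```

With highly variable first responses, a single small first PSC dominates the per-sweep average and inflates it relative to the pooled ratio. When both calculations agree, as reported above, this kind of spurious facilitation is not driving the result.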

      (10) Could the dI-IIB phenotype be simply explained by a decrease in channel number/ release probability? To test this, I propose investigating PPRs and short-term dynamics during train stimulation at lower extracellular Ca2+ concentration in WT. The Ca2+ concentration could be titrated such that the first PSC amplitude is similar between WT and dI-IIB mutants. This experiment would test if the increased PPR/depression variability is a secondary consequence of a decrease in Ca2+ influx, or specific to the splice isoform.

      In fact, the interpretation that decreased PSC amplitude upon I-IIB excision is caused mainly by reduced channel number is precisely our interpretation (see discussion page 14, last paragraph to page 15, first paragraph in the original submission, now page 16, second paragraph). In addition, we are grateful for the reviewer’s suggestion to titrate the external calcium such that the first PSC amplitude matches in ∆I-IIB and control. This experiment tests whether altered short term plasticity is solely a function of altered channel number or whether additional causes, such as altered channel properties, also play into this. We titrated the first pulse amplitude in ∆I-IIB to match control and find that paired pulse ratio and the variance thereof are not different anymore. Therefore, the differences observed in identical external calcium can be fully explained by altered channel numbers. This additional dataset is shown in the revised figures 6D and E and referred to in the results section on page 10, lines 14 to 25 and the discussion on page 16, lines 36 to 38.

      (11) How were the depression kinetics analyzed? How many trains were used for each cell, and how do the tau values depend on the first PSC amplitude? Time constants in the range of a few (5-10) milliseconds are not informative for train stimulations with a frequency of 1 or 10 Hz (the unit is missing in Figure 5H). Also, the data shown in Figures 5E-K suggest slower time constants than 5-10 ms. Together, are the data indeed consistent with the idea that dI-IIB does not only affect cac channel number, but also PPR/depression variability (p. 9)?

      For each animal the amplitudes of all subsequent PSCs in each train were plotted over time and fitted with a single exponential. For depression at 1 and 10 Hz, we used one train per animal, and 5-6 animals per genotype (as reflected in the data points in Figs. 6I, M). This is now explained in more detail in the revised methods section (page 23, lines 39 to 41). The tau values are not affected by the amplitude of the first PSC. First, we carefully re-fitted new and previously presented depression data and find that the taus for depression at low stimulation frequencies (1 and 10 Hz) are not affected by exon excisions at the I-II site. We thank the reviewer for detecting our error in units and tau values in the previous figure panels 5H and L (this has now been corrected in the revised figure panels 6I and M). Given that PSC amplitude upon I-IIB excision is significantly smaller than in controls and following I-IIA excision, we suspected that the time course of depression at low stimulation frequency is not significantly affected by the amount of calcium influx during the first PSC. To further test this, we followed the reviewer’s suggestion and re-measured depression at 1 and 10 Hz for cac-GFP controls and for delta I-IIB in a higher external calcium concentration (1.8 mM), so that the first PSC was increased in amplitude in both genotypes (1.8 mM external calcium titrates the PSC amplitude in delta I-IIB to match that of controls measured in 0.5 mM external calcium, see revised Figs. 6H, L). Neither in control, nor in delta I-IIB did this affect the time course of synaptic depression (see revised Figs. 6I, M). This indicates that at low stimulation frequencies (1 and 10 Hz) the time course of depression is not affected by mean quantal content. This is consistent with the paired pulse ratio at 100 ms interpulse interval shown in figures 6A-D.
However, for synaptic depression at 1 Hz stimulation the variability of the data is higher for delta I-IIB (independent of external calcium concentration, see rev. Fig. 6I), which might also be due to reduced channel number in this genotype. Taken together, the data are in line with the idea that altered cac channel numbers in active zones are sufficient to explain all effects that we observe upon I-IIB excision on PPRs and synaptic depression at low stimulation frequencies. This is now clarified in the revised text on page 12, lines 3 to 7.
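
As a rough illustration of fitting a depression time course with a single exponential, the sketch below recovers tau from a synthetic 10 Hz train. It is not the fitting routine used for the figures, which is not specified beyond "a single exponential". It assumes the model A(t) = plateau + span * exp(-t / tau) with a known plateau and uses a log-linear least-squares fit; the amplitudes, the plateau, and the function name are hypothetical.

```python
import math

def fit_single_exponential(times, amplitudes, plateau):
    """Fit A(t) = plateau + span * exp(-t / tau) by linear regression on
    log(A - plateau). Requires every amplitude to exceed the plateau."""
    ys = [math.log(a - plateau) for a in amplitudes]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
             / sum((t - t_mean) ** 2 for t in times))
    intercept = y_mean - slope * t_mean
    return -1.0 / slope, math.exp(intercept)  # (tau, span)

# Synthetic, noise-free train: 10 stimuli at 10 Hz (times in seconds).
times = [i * 0.1 for i in range(10)]
true_tau, true_span, plateau = 0.25, 40.0, 60.0  # assumed values, arbitrary units
amplitudes = [plateau + true_span * math.exp(-t / true_tau) for t in times]

tau, span = fit_single_exponential(times, amplitudes, plateau)
print(f"fitted tau = {tau:.3f} s, fitted span = {span:.1f}")
```

Because tau is reported in the same unit as the time axis, expressing stimulus times in seconds yields tau in seconds; mixing up the time unit is one way in which implausibly small time constants can end up in a figure. For noisy recordings, a nonlinear fit that also estimates the plateau would be the more usual choice.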

      (12) The GFP-tagged I-IIA and mEOS4b-tagged I-IIB cac puncta shown in Figure 6N appear larger than the Brp puncta. Endogenously tagged cac puncta are typically smaller than Brp puncta (Gratz et al., 2019). Also, the I-IIA and I-IIB fluorescence sometimes appear to be partially non-overlapping. First, I suggest adding panels that show all three channels merged. Second, could they analyze the area and area overlap of I-IIA and I-IIB with regard to each other and to Brp, and compare it to cac-GFP? Any speculation as to how the different tags could affect localization? Finally, I recommend moving the dI-IIA and dI-IIB localization data shown in Figure 6N to an earlier figure (Figure 1 or Figure 3).

      We now show panels with the two I-II cac isoforms merged in the revised figure 7H (previously 6N). We also tested merging all three labels as suggested, but found this not instructive for the reader. We thank the reviewer for pointing out that the Brp puncta appeared smaller than the cac puncta in some panels. We carefully went through the data and found that the Brp puncta are not systematically smaller than the cac puncta. Please note that punctum size can appear quite differently, depending on different staining qualities as well as different laser intensities and different point spread in different imaging channels. The purpose of this figure was not to analyze punctum size and labeling intensity, but instead, to demonstrate that I-IIA and I-IIB are both present in most active zones, but some active zones show only I-IIB labeling, as quantified in figure 7I. We did not follow the suggestion to conduct additional co-localization analyses and compare it with cac-GFP controls, because Pearson co-localization coefficients for cac-GFP and all exon-out variants analyzed, including delta I-IIA and delta I-IIB are presented in the revised figure 4D. Moreover, delta I-IIA and delta I-IIB show similar Manders 1 and 2 co-localization coefficients with Brp (see Figs. 4E, F). We do not want to speculate whether the different tags have any effect on localization precision. Artificial differences in localization precision can also be suggested by different antibodies, but we know from our STED analyses with identical tags and antibodies for all isoforms that I-IIA and I-IIB co-localize identically with Brp (see Figs. 5A-E). Finally, we prefer to not move the figure because we believe it is informative to show our finding that active zones usually contain both splice I-II variants together with the finding that only I-IIB is required for PHP.
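
For readers less familiar with the co-localization measures mentioned here, the following sketch computes a Pearson coefficient and the two Manders coefficients on a toy pair of intensity profiles. The pixel values, the zero thresholds, and the function names are assumptions made for illustration; they do not reproduce the image analysis pipeline used for figures 4D-F.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equally long intensity lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def manders(a, b, threshold_a=0.0, threshold_b=0.0):
    """M1: fraction of channel-a intensity in pixels where b is above threshold.
    M2: fraction of channel-b intensity in pixels where a is above threshold."""
    m1 = sum(x for x, y in zip(a, b) if y > threshold_b) / sum(a)
    m2 = sum(y for x, y in zip(a, b) if x > threshold_a) / sum(b)
    return m1, m2

# Toy, flattened pixel intensities (hypothetical): one shared punctum,
# plus one Brp-only pixel that has no cac signal.
cac = [0, 0, 5, 9, 5, 0, 0, 0]
brp = [0, 0, 4, 8, 4, 0, 3, 0]

r = pearson(cac, brp)
m1, m2 = manders(cac, brp)
print(f"Pearson r = {r:.3f}, Manders M1 = {m1:.3f}, M2 = {m2:.3f}")
```

In this toy case all cac signal lies within Brp-positive pixels (M1 of 1), while part of the Brp signal has no cac partner (M2 below 1), which mirrors the asymmetry the two Manders coefficients are designed to capture.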

      Recommendations for the authors:

      Reviewing Editor Comments:

      We thank you for your submission. All three reviewers urge caution in interpreting the S4 splice variant playing a role specifically in Cac localization, as opposed to just leading to instability and degradation. There are other issues with the electrophysiological experiments, a need for improved imaging and analyses, and some areas of interpretation detailed in the reviews.

      We agree that additional data was required to conclude that IS4 splicing plays a specific role in cac channel localization and is not just leading to channel instability and degradation. As outlined in detail in our response to reviewer 1, comment 1, we conducted several sets of experiments to support our interpretation. First, electrophysiological experiments show that upon removal of IS4B, which eliminates synaptic transmission at the larval NMJ and cac positive label in presynaptic active zones, somatodendritic cac current is reliably recorded (new data in revised figure 3C). This is not in line with a channel instability or degradation effect, but instead with IS4B containing isoforms being required and sufficient for evoked release from NMJ motor terminals, whereas IS4A isoforms are not sufficient for evoked release from axon terminals, but IS4A isoforms alone can mediate a distinct component of somatodendritic calcium current. Second, immunohistochemical analyses reveal that IS4A, which is not present in NMJ presynaptic active zones, is expressed sparsely, but in reproducible patterns in the larval brain lobes and in specific regions of the anterior VNC parts (new supplementary figure 1). Again, the absence of an IS4A-containing cac isoform from presynaptic active zones but their simultaneous presence in other parts of the nervous system is in accord with isoform specific localization, but not with general channel isoform instability. Third, enlargements of NMJ boutons with brp positive presynaptic active zones confirm the absence of IS4A and the presence of IS4B in active zones (these enlargements are now shown in the revised figures 2A-C, 3A, and 4A-C). Fourth, as suggested we have quantified the Pearson co-localization of IS4 isoforms with Brp in presynaptic active zones (revised Fig. 2D). This confirms quantitatively similar co-localization of IS4B and control with Brp, but no co-localization of IS4A with Brp.
In fact, the labeling intensity of IS4A in presynaptic active zones is quantitatively not significantly different from background, no IS4A label is detected anywhere in the axon terminals at the NMJ, but we find IS4A label in the CNS. Together, these data strongly support our interpretation that the IS4 splice site plays a distinct role in cac channel localization. Figure legends as well as results and discussion section have been modified accordingly (the respective page and line numbers are listed in our point-by-point responses).

      In addition, we have carefully addressed all other public comments as well as all other recommendations for authors by providing multiple new data sets, new image analyses, and revising text. Addressing the insightful comments of all three reviewers and the reviewing editor has greatly helped to make the manuscript better.

      Reviewer #1 (Recommendations For The Authors):

      The conclusion that the IS4B exon controls Cac localization to active zones versus simply being required for channel abundance is not well supported. The authors need to either mention both possibilities or provide stronger support for the active zone localization model if they want to emphasize this point.

      We agree and have included several additional data sets as outlined in our response to point 1 of reviewer 1 and to the reviewing editor (see above). These new data strongly support our interpretation that the IS4B exon controls Cac localization to active zones and is not simply required for channel abundance. The additions to the figures and accompanying text (including the respective figure panel, page, and line numbers) are listed in the point-by-point responses to the reviewers’ public suggestions.

      Figure 2C staining for Cac localization in the delta 4B line is difficult to compare to the others, as the background staining is so high (muscles are green for example). As such, it is hard to determine whether the arrows in C are just background.

      We had over-emphasized the green label to show that there really is no cacophony label in active zones. However, we agree that this hampered image interpretation. Thus, we have adjusted brightness such that it matches the other genotypes (see new figure panel 2C, and figure 3A, bottom). Revising the figure as suggested by the reviewer shows much more clearly that IS4B puncta are detected exclusively in presynaptic active zones, whereas IS4A channels are not detectable in active zones or anywhere else in the axon terminal boutons. Quantification of IS4A label in brp positive active zones confirms that labeling intensity is not significantly above background (page 6, lines 29 to 31 and page 7, lines 19 to 21). Therefore, IS4A is not detectable in active zones at the NMJ.

      It seems more likely that the removal of the 4B exon simply destabilizes the protein and causes it to be degraded (as suggested by the Western), rather than mislocalizing it away from active zones. It's hard to imagine how some residue changes in the S4 voltage sensor would control active zone localization to begin with. The authors should note that the alternative explanation is that the protein is just degraded when the 4B exon is removed.

      Based on additional data and analyses, we disagree with the interpretation that removal of IS4B disrupts protein integrity and present multiple lines of evidence that support sparse expression of IS4A channels (ΔIS4B). As outlined in our response to reviewer 1 and to the reviewing editor, we show (1) in new immunohistochemical stainings (new supplementary figure 1) that upon removal of IS4B, sparse label is detectable in the VNC and the brain lobes (for detail see above). (2) In our new figure 3C, we show cacophony-mediated somatodendritic calcium currents recorded from adult flight motoneurons in a control situation and upon removal of IS4B that leaves only IS4A channels. This clearly demonstrates that IS4A underlies a substantial component of the HVA somatodendritic calcium current, although it is absent from axon terminals. This is in line with isoform specific functions at different locations, but not with IS4A instability/degradation. (3) We do not agree with the reviewer’s interpretation of the Western Blot data in figure 1E (formerly figure 1D). Together with our immunohistochemical data that show sparse cacophony IS4A expression, we think that the faint band upon removal of IS4B in a heterozygous background (that reduces labeled channels even further) reflects the sparseness of IS4A expression. This sparseness is not due to channel instability, but to IS4A functions that are less abundant than the ubiquitously expressed cac<sup>IS4B</sup> channels at presynaptic active zones of fast chemical synapses (see page 15, lines 24 to 29).

      If they really want to claim the 4B exon governs active zone localization, much higher quality imaging is required (with enlarged views of individual boutons and their AZs, rather than the low-quality full NMJ imaging provided). Similarly, higher resolution imaging of Cac localization at Muscle 12 (Figure 2H) boutons would be very useful, as the current images are blurry and hard to interpret. Figure 6N shows beautiful high-resolution Cac and Brp imaging in single boutons for the I-II exon manipulations - the authors should do the same for the 4B line. For all immuno in Figure 2, it is important to quantify Cac intensity as well. There is no quantification provided, just a sample image. The authors should provide quantification as they do for the delta I-II exons in Figure 3.

      We did as suggested and added figure panels to figure 2A-C and to new figures 3A (formerly part of figure 2) and 4A-C (formerly figure 3) showing magnified label at the NMJ AZs to better judge cacophony expression after exon excision. These data are now referred to in the results section on page 6, lines 22 to 24, page 7, lines 18 to 21 and page 8, lines 17/18.

      As suggested, we now also provide quantification of co-localization with brp puncta as Pearson’s correlation coefficient for control, IS4B, and IS4A in the new figure panel 2D (text on page 6, lines 34 to 38). This further underscores control-like active zone localization of IS4B but no significant active zone localization of IS4A. As suggested, we now also quantified the intensity of IS4B label in active zones, and it was not different from control (see revised figure 4H and text on page 8, lines 38/39). We did not quantify the intensity of IS4A label, because it was not above background (text, page 6, lines 30/31).

      Reviewer #2 (Recommendations For The Authors):

      (1a) Questions about the engineered Cac splice isoform alleles:

      The authors use CRISPR gene editing to selectively remove the entire alternatively spliced exons of interest. Do the authors know what happens to the cac transcript with the deleted exon? Is the deleted exon just skipped and spliced to the next exon? Or does the transcript instead undergo nonsense-mediated decay?

      We do not believe that there is nonsense-mediated mRNA decay, because for all exon excisions the respective mRNA and protein are made. Protein has been detected on the level of Western blotting and immunocytochemistry. Therefore, we are certain that the mRNA is viable for each exon excision (and we have confirmed this for low abundance cac protein isoforms by RT-PCR), but only subsets of cac isoforms can be made from mRNAs that are lacking specific exons. However, we cannot make any statements as to whether the lack of specific protein isoforms exerts feedback on mRNA stability, the rate of transcription and translation, or other unknown effects.

      (1b) While it is clear that the IS4 exons encode part of the voltage sensor in the first repeat, are there studies in Drosophila to support the putative Ca-beta and G-protein beta-gamma binding sites in the I-II loop? Or are these inferred from Mammalian studies?

      To the best of our knowledge, there are no studies in Drosophila that unambiguously show Caβ and Gβγ binding sites in the I-II loop of cacophony. However, sequence analysis strongly suggests that I-IIB contains both a Caβ and a Gβγ binding site (AID: α-interacting domain) because the binding motif QXXER is present. In mouse Ca<sub>v</sub>2.1 and Ca<sub>v</sub>2.2 channels the sequence is QQIER, while in Drosophila cacophony I-IIB it is QQLER. In the alternative I-IIA, this motif is not present, strongly suggesting that G<sub>βγ</sub> subunits cannot interact at the AID. However, as already suggested by Smith et al. (1998), based on sequence analysis, Ca<sub>β</sub> should still be able to bind, although possibly with a lower affinity. We agree that this information should be given to the reader and have revised the text accordingly on page 5, lines 9 to 17.
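
A motif check of this kind can be sketched with a regular expression. Only the pentapeptides QQIER and QQLER are taken from the text above; the flanking residues and the motif-free fragment are invented placeholders and do not represent real cacophony or Ca<sub>v</sub>2 sequences. The names `AID_MOTIF` and `find_motif` are likewise hypothetical.

```python
import re

# QXXER: glutamine, any two residues, then glutamate and arginine.
AID_MOTIF = re.compile(r"Q[A-Z]{2}ER")

def find_motif(sequence):
    """Return (start index, matched residues) for each QXXER occurrence."""
    return [(m.start(), m.group()) for m in AID_MOTIF.finditer(sequence)]

# Toy fragments: flanking residues are placeholders, not real sequence.
toy_fragments = {
    "mouse-like (QQIER)": "AAQQIERGG",
    "I-IIB-like (QQLER)": "AAQQLERGG",
    "motif-free placeholder": "AAQKLDRGG",
}

hits = {name: find_motif(seq) for name, seq in toy_fragments.items()}
for name, found in hits.items():
    print(f"{name}: {found if found else 'no QXXER motif'}")
```

Running the same pattern over full-length exon translations would be the natural next step, with the caveat that presence of the motif suggests, but does not demonstrate, a functional binding site.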

      (1c) The authors assert that splicing of Cav2/cac in flies is a means to encode diversity, as mammals obviously have 4 Cav2 genes vs 1 in flies. However, as the authors likely know, mammalian Cav2 channels also have various splice isoforms encoded in each of the 4 Cav2 genes. The authors should discuss in more detail what is known about the splicing of individual mammalian Cav2 channels and whether there are any homologous properties in mammalian channels controlled by alternative splicing.

      We agree and now provide a more comprehensive discussion of vertebrate Ca<sub>v</sub>2 splicing and its impact on channel function. In line with what we report in Drosophila, properties like G<sub>βγ</sub> binding and activation voltage can also be affected by alternative splicing in vertebrate Ca<sub>v</sub>2 channels, though the exon patterns are quite different from Drosophila. We integrated this part in the revised discussion (page 14, first paragraph). The respective text is below for the reviewer’s convenience:

      “However, alternative splicing increases functional diversity also in mammalian Ca<sub>v</sub>2 channels. Although the mutually exclusive splice site in the S4 segment of the first homologous repeat (IS4) is not present in vertebrate Ca<sub>v</sub> channels, alternative splicing in the extracellular linker region between S3 and S4 is at a position to potentially change voltage sensor properties (Bezanilla 2002). Alternative splice sites in rat Ca<sub>v</sub>2.1 exon 24 (homologous repeat III) and in exon 31 (homologous repeat IV) within the S3-S4 loop modulate channel pharmacology, such as differences in the sensitivity of Ca<sub>v</sub>2.1 to Agatoxin. Alternative splicing is thus a potential cause for the different pharmacological profiles of P- and Q-channels (both Ca<sub>v</sub>2.1; Bourinet et al. 1999). Moreover, the intracellular loop connecting homologous repeats I and II is encoded by 3-5 exons and provides strong interaction with G<sub>βγ</sub>-subunits (Herlitze et al. 1996). In Ca<sub>v</sub>2.1 channels, binding to G<sub>βγ</sub> subunits is potentially modulated by alternative splicing of exon 10 (Bourinet et al. 1999). Moreover, whole cell currents of splice forms α1A-a (no Valine at position 421) and α1A-b (with Valine) represent alternative variants for the I-II intracellular loop in rat Ca<sub>v</sub>2.1 and Ca<sub>v</sub>2.2 channels. While α1A-a exhibits fast inactivation and more negative activation, α1A-b has delayed inactivation and a positive shift in the IV-curve (Bourinet et al. 1999). This is phenotypically similar to what we find for the mutually exclusive exons at the IS4 site, in which IS4B mediates high voltage activated cacophony currents while IS4A channels activate at more negative potentials and show transient current (Fig. 3; see also Ryglewski et al. 2012). Furthermore, altered Ca<sub>β</sub> interactions have been shown for splice isoforms in loop III (Bourinet et al. 1999), similar to what we suspect for the I-II site in cacophony.
Finally, in mammalian VGCCs, the C-terminus presents a large splicing hub affecting channel function as well as coupling distance to other proteins. Taken together, Ca<sub>v</sub>2 channel diversity is greatly enhanced by alternative splicing also in vertebrates, but the specific two mutually exclusive exon pairs investigated here are not present in vertebrate Ca<sub>v</sub>2 genes.”

      (1d) In Figure 1, it would be helpful to see the entire cac genomic locus with all introns/exons and the 4 specific exons targeted for deletion.

      We agree and have changed figure 1 accordingly.

      (2a) Cav2.IS4B deletion alleles:

      More work is necessary to explain the localization of Cac controlled by the IS4B exon. First, can the authors determine whether actual Cac channels are present at NMJ boutons? The authors seem to indicate that in the IS4B deletion mutants, some Cac (GFP) signal remains in a diffuse pattern across NMJ boutons. However, from the imaging of wild-type Cac-GFP (and previous studies), there is no Cac signal outside of active zones defined by the BRP signal. It would benefit the study to a) take additional, higher resolution images of the remaining Cac signal at NMJs in IS4B deletion mutants, and b) comment on whether the apparent remaining signal in these mutants is only observed in the absence of IS4B-containing Cac channels, or if the IS4A-positive channels are normally observed (but perhaps mis-localized?).

      We have conducted additional analyses to show convincingly that IS4A channels (that remain upon IS4B deletion) are absent from presynaptic active zones. Please see also responses to reviewers 1 and 3. By adjusting the background values in CLSM images to identical values in control, delta IS4A, and delta IS4B, as well as by providing selective enlargements as suggested, the figure panels 2C, Ci and 3A now show much more clearly that upon deletion of IS4B no cac label remains in active zones or anywhere else in the axon terminal boutons (see text on page 6, lines 22 to 24). This is further confirmed by quantification showing that in IS4B deletion mutants cac labeling intensity in active zones is not above background (see text on page 6, lines 27 to 31). We never intended to indicate that there was cac signal outside of active zones defined by the brp signal, and we now carefully went through the text to not indicate this possibility unintentionally anywhere in the manuscript.

      (2b) Do the authors know whether any presynaptic Ca2+ influx is contributed by IS4A-positive Cac channels at boutons, given the potential diffuse localization? There are various approaches for doing presynaptic Ca2+ imaging that could provide insight into this question.

      We agree that this is an interesting question. However, based on the revisions made, we now show with more clarity that IS4A channels are absent from the presynaptic terminal at the NMJ. IS4A labeling intensities within active zones and anywhere else in the axon terminals are not different from background (see text on page 6, lines 27 to 31 and revised Figs. 2C, Ci, and 3A with new selective enlargements in response to comments of both other reviewers). This is in line with our finding that evoked synaptic transmission from NMJ axon terminals to muscle cells is mostly absent upon excision of IS4B (see Fig. 3B). The very small amplitude EPSC (below 5 % of the normal amplitude of evoked EPSCs) that can still be recorded in the absence of IS4B is similar to what is observed in cac null mutant junctions and is mediated by calcium influx through another voltage-gated calcium channel, a Ca<sub>v</sub>1 homolog named Dmca1D, as we have previously published (Krick et al., 2021, PNAS 118(28):e2106621118). Gathering additional support for the absence of IS4A from presynaptic terminals by calcium imaging experiments would suffer significantly from the presence of additional types of VGCCs in presynaptic terminals (for sure Dmca1D (Krick et al., 2021) and potentially also the Ca<sub>v</sub>3 homolog DmαG or Dm-α1T). Such experiments would require mosaic null mutants for cac and DmαG channels in a mosaic IS4B excision mutant, which, if feasible at all, would be very hard and time consuming to generate. In the light of the additional clarification that IS4A is not located in NMJ axon terminal boutons, as shown by additional labeling intensity analysis, revised figures with selective enlargement, and revised text, we feel confident to state that IS4A is not sufficient for evoked SV release.

      (2c) Mechanistically, how are amino acid changes in one of the voltage sensing domains in Cac related to trafficking/stabilization/localization of Cac to AZs?

      This is an exciting question that has occupied our discussions a lot. Some sorting mechanism must exist that recognizes the correct protein isoforms, just as sorting and transport mechanisms exist that transport other synaptic proteins to the synapse. We do not think that the few amino acid changes in the voltage sensor are directly involved in protein targeting. We rather believe that the cacophony variants that happen to contain this specific voltage sensor are selected for transport out to the synapse. There are cell biological possibilities to achieve this, but we have not further addressed potential mechanisms because we do not want to enter the realms of speculation.

      (3) How are auxiliary subunits impacted in the Cac isoform mutants?

      Recent work by Kate O'Connor-Giles has shown that both Stj and Ca-Beta subunits localize to active zones along with Cac at the Drosophila NMJ. Endogenously tagged Stj and CaBeta alleles are now available, so it would be of interest to determine if Stj and particular Cabeta levels or localization change in the various Cac isoform alleles. This would be particularly interesting given the putative binding site for Ca-beta encoded in the I-II linker.

      We agree that the synthesis of the work of Kate O'Connor-Giles' group and our study opens up new avenues to explore exciting hypotheses about differential coupling of specific cacophony splice isoforms with distinct accessory proteins such as Caβ and α<sub>2</sub>δ subunits. However, this requires numerous full sets of additional experiments and is beyond the scope of this study.

      (4a) Interpretation of short-term plasticity in the I-IIB exon deletion:

      The changes in short-term plasticity presented in Figure 5 are interpreted as an additional phenotype due to the loss of the I-IIB exon, but it seems this might be entirely explained simply due to the reduced Cac levels. Reduced Cac levels at active zones will obviously reduce Ca2+ influx and neurotransmitter release. This may be really the only phenotype/function of the I-IIB exon. Hence, to determine whether loss of the I-IIB exon encodes any functions in short-term plasticity, separate from reduced Cac levels, the authors should compare short-term plasticity in I-IIB loss alleles compared to wild type when starting EPSC amplitudes are equal (for example by reducing extracellular Ca2+ levels in wild type to achieve the same levels as in Cac I-IIB exon deleted alleles). Reduced release probability, simply by reduced Ca2+ influx (either by reduced Cac abundance or extracellular Ca2+) should result in more variability in transmission, so I am not sure there is any particular function of the I-IIB exon in maintaining transmission variability beyond controlling Cac abundance at active zones.

      For two reasons we are particularly grateful for this comment. First, it shows us that we needed to explain much more clearly that our interpretation is that changes in paired pulse ratios (PPRs) and in depression at low stimulation frequencies are a causal consequence of lower channel numbers upon I-IIB exon deletion, precisely as pointed out by the reviewer. We have carefully revised the text accordingly on page 10, lines 14-25, page 11, lines 3-7 and 22-28; page 16, lines 36-38. Second, the experiment suggested by the reviewer is superb to provide additional evidence that the cause of altered PPRs is in fact reduced channel number, but not altered channel properties. Accordingly, we have conducted additional TEVC recordings in elevated external calcium (1.8 mM) so that the single PSC amplitudes in I-IIB excision animals match those of controls in 0.5 mM extracellular calcium. This makes the amplitudes and the variance of PPR for all interpulse intervals tested control-like (see revised Figs. 6D, E). This strongly indicates that differences observed in PPRs as well as the variance thereof were caused by the amount of calcium influx during the first EPSC, and thus by different channel numbers in active zones.

      (4b) Another point about the data in Figure 5: If "behaviorally relevant" motor neuron stimulation and recordings are the goal, the authors should also record under physiological Ca2+ conditions (1.8 mM), rather than the highly reduced Ca2+ levels (0.5 mM) they are using in their protocols.

      Although we doubt that the effective extracellular calcium concentration that determines the electromotive force for calcium to enter the ensheathed motoneuron terminals in vivo during crawling is known, we followed the reviewer’s suggestion partly and have repeated the high frequency stimulation trains for ΔI-IIB in 1.8 mM calcium. As for short-term plasticity, this brings the charge conducted to values as observed in control and in ΔI-IIA in 0.5 mM calcium. Therefore, all differences observed in the previous figure 5 (now revised figure 6) can be attributed to different channel numbers in presynaptic active zones. This is now explained on page 11, lines 19-28. For controls, recordings at high frequency stimulation in higher external calcium (e.g. 2 mM) have previously been published and show significant synaptic depression (e.g. Krick et al., 2021, PNAS). Given that in the exon out variants we do not expect any differences except for those caused by different channel numbers, we did not repeat these experiments for control and ΔI-IIA.

      (5a) Mechanism of Cac's role in PHP :

      As the authors likely know, mutations in Cac were previously reported to disrupt PHP expression (see Frank et al., 2006 Neuron). Inexplicably, this finding and publication were not cited anywhere in this manuscript (this paper should also be cited when introducing PhTx, as it was the first to characterize PhTx as a means of acutely inducing PHP). In the Frank et al. paper (and in several subsequent studies), PHP was shown to be blocked in mutations in Cac, namely the CacS allele. This allele, like the I-IIB excision allele, reduces baseline transmission presumably due to reduced Ca2+ influx through Cac. The authors should at a minimum discuss these previous findings and how they relate to what they find in Figure 6 regarding the block in PHP in the Cac I-IIB excision allele.

      We thank the reviewer for pointing this out and apologize for this oversight. We agree that it is imperative to cite the 2006 paper by Frank et al. when introducing PhTx mediated PHP as well as when discussing the effects of cac mutants on PHP together with other published work. We have revised the text accordingly on page 12, lines 9-11 and 21-23 and on page 17, lines 29-33.

      In terms of data presentation in Fig. 6, as is typical in the field, the authors should normalize their mEPSC/QC data as a percentage of baseline (+PhTx/-PhTx). This makes it easier to see the reduction in mEPSC values (the "homeostatic pressure" on the system) and then the homeostatic enhancement in QC. Similarly, in Fig. 6M, the authors should show both mEPSC and QC as a percentage of baseline (wild type or non-GluRIIA mutant background).

      We agree and have changed figure presentation accordingly. Figure 7 (formerly figure 6) was updated as was the accompanying results text on page 12, lines 23-40.
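
      The normalization requested here (each measure expressed as a percentage of its own baseline) can be sketched as follows. This is a minimal illustration only: the function name and all numbers are hypothetical and are not taken from the study.

```python
from statistics import mean


def percent_of_baseline(treated, baseline):
    """Return the mean of `treated` as a percentage of the mean of `baseline`."""
    base = mean(baseline)
    if base == 0:
        raise ValueError("baseline mean must be non-zero")
    return 100.0 * mean(treated) / base


# Hypothetical example values (NOT data from the manuscript).
mepsc_minus_phtx = [1.0, 1.0, 1.0]   # mEPSC amplitude without PhTx (arbitrary units)
mepsc_plus_phtx = [0.5, 0.5, 0.5]    # mEPSC amplitude with PhTx
qc_minus_phtx = [40.0, 50.0, 60.0]   # quantal content without PhTx
qc_plus_phtx = [80.0, 100.0, 120.0]  # quantal content with PhTx

mepsc_pct = percent_of_baseline(mepsc_plus_phtx, mepsc_minus_phtx)
qc_pct = percent_of_baseline(qc_plus_phtx, qc_minus_phtx)
print(f"mEPSC: {mepsc_pct:.0f} % of baseline, QC: {qc_pct:.0f} % of baseline")
```

      Plotted this way, a drop of mEPSC below 100 % shows the homeostatic pressure, and a rise of quantal content above 100 % shows the compensatory enhancement, on the same unitless axis.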

      (6) Cac I-IIA and I-IIB excision allele colocalization at AZs:

      These are very nice and important experiments shown in Figures 6N and O, which I suggest the authors consider analyzing in further detail. Most significantly:

      (6i) The authors nicely show that most AZs have a mix of both Cac IIA and IIB isoforms. Using simple intensity analysis, can the authors say anything about whether there is a consistent stoichiometric ratio of IIA vs IIB at single AZs? It is difficult to extract actual numbers of IIA vs IIB at individual AZs without having both isoforms labeled mEOS4b, but as a rough estimate can the authors say whether the immunofluorescence intensity of IIA:IIB is similar across each AZ? Or is there broad heterogeneity, with some AZs having low vs high ratios of each isoform (as the authors suggest across proximal to distal NMJ AZs)?

      We agree and have conducted experiments and analyses to provide these data. We measured the cac puncta fluorescence intensities for heterozygous cac<sup>sfGFP</sup>/cac, cacI-IIA<sup>sfGFP</sup>/cacI-IIB, and cacI-IIB<sup>sfGFP</sup>/cacI-IIA animals. We preferred this strategy, because intensity was always measured from cac puncta with the same GFP tag. Next, we normalized all values to the intensities obtained in active zones from heterozygous cac<sup>sfGFP</sup>/cac controls and then plotted the intensities of I-IIA versus I-IIB containing active zones side by side. Across junctions and animals, we find a consistent ratio of 2:1 in the relative intensities of I-IIB and I-IIA, thus indicating on average roughly twice as many I-IIB as compared to I-IIA channels across active zones. This is consistent with the counts in our STED analysis (see Fig. 5F). These new data are shown in the new figure panel 7J and referred to on page 13, lines 10-16 in the revised text.
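
      The two-step analysis described above (normalize puncta intensities to the heterozygous tagged control, then compare the two isoform reporters) can be sketched as below. The function names and all intensity values are hypothetical placeholders chosen for illustration; they are not measurements from the study.

```python
from statistics import mean


def normalize_to_control(intensities, control_intensities):
    """Divide each punctum intensity by the mean control intensity."""
    ctrl = mean(control_intensities)
    if ctrl == 0:
        raise ValueError("control mean must be non-zero")
    return [x / ctrl for x in intensities]


def mean_ratio(numerator, denominator):
    """Ratio of the mean normalized intensities of two groups."""
    return mean(numerator) / mean(denominator)


# Hypothetical puncta intensities (arbitrary units, NOT data from the study).
control = [100.0, 100.0, 100.0]    # heterozygous tagged control
reports_i_iia = [30.0, 33.0, 36.0]  # puncta reporting I-IIA channels
reports_i_iib = [60.0, 66.0, 72.0]  # puncta reporting I-IIB channels

norm_iia = normalize_to_control(reports_i_iia, control)
norm_iib = normalize_to_control(reports_i_iib, control)
ratio_iib_to_iia = mean_ratio(norm_iib, norm_iia)
print(f"I-IIB : I-IIA intensity ratio = {ratio_iib_to_iia:.2f}")
```

      Because every group carries the same GFP tag, the ratio of normalized means can be read as a rough estimate of relative channel abundance, under the assumption that fluorescence scales linearly with channel number.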

      (6ii) Intensity analysis of Cac IIA vs IIB after PHP: Previous studies have shown Cac abundance increases at NMJ AZs after PHP. Can the authors determine whether both Cac IIA vs IIB isoforms increase after PHP or whether just one isoform is targeted for this enhancement?

      We already show that PHP is not possible in the absence of I-IIB channels (see figure 7). However, we agree that it is an interesting question to test whether I-IIA channels are added in the presence of I-IIB channels during PHP, but we consider this a detail beyond the scope of this study.

      Minor points:

      (1) Including line numbers in the manuscript would help to make reviewing easier.

      We agree and now provide line numbers.

      (2) Several typos (abstract "The By contrast", etc).

      We carefully double checked for typos.

      (3) Throughout the manuscript, the authors refer to Cac alleles and channels as "Cav2", which is unconventional in the field. Unless there is a compelling reason to deviate, I suggest the authors stick to referring to "Cac" (i.e. cacdIS4B, etc) rather than Cav2. The authors make clear in the introduction that Cac is the sole fly Cav2 channel, so there shouldn't be a need to constantly reinforce that cac=Cav2.

      We agree and have changed all fly Ca<sub>v</sub>2 references to cac.

      (4) In some figures/text the authors use "PSC" to refer to "postsynaptic current", while in others (i.e. Figure 6) they switch to the more conventional terms of mEPSC or EPSC. I suggest the authors stick to a common convention (mEPSC and EPSC).

      We have changed PSC to EPSC throughout.

      Reviewer #3 (Recommendations For The Authors):

      (1) The abstract could focus more on the results at the expense of the background.

      We agree and have deleted the second introductory background sentence and added information on PPRs and depression during low frequency stimulation.

      (2) What does "strict" active zone localization refer to? Could they please define the term strict?

      Strict active zone localization means that cac puncta are detected in active zones but no cac label above background is found anywhere else throughout the presynaptic terminal, now defined on page 6, lines 27-29.

      (3) Single boutons/zoomed versions of the confocal images shown in Figures 2A-C, 2H, and 3A-C would be very helpful.

      We have provided these panels as suggested (see above and revised figures 2-4). Figure 3 is now figure 4.

      (4) The authors cite Ghelani et al. (2023) for increased cac levels during homeostatic plasticity. I recommend citing earlier work making similar observations (Gratz et al., 2019; DOI: 10.1523/JNEUROSCI.3068-18.2019), and linking them to increased presynaptic calcium influx (Müller & Davis, 2012; DOI: 10.1016/j.cub.2012.04.018).

      We agree and have added Gratz et al. 2019 and Davis and Müller 2012 to the results section on page 12, lines 17/18 and lines 21-23, in the discussion on page 17, lines 29-33.

      (5) The data shown in Figure 3 does not directly support the conclusion of altered release probability in dI-IIB. I therefore suggest changing the legend's title.

      We have reworded to “Excisions at the I-II exon do not affect active zone cacophony localization but can alter cacsfGFP label intensity in active zones and PSC amplitude” as this is reflecting the data shown in the figure panels more directly.

      (6) It would be helpful to specify "adult flight muscle" in Figure 2J.

      We agree that it is helpful to specify in the figure (now revised figure 3C) that the voltage clamp recordings of somatodendritic calcium current were conducted in adult flight motoneurons and have revised the headline of figure panel 3C and the legend accordingly. Please note, these are not muscle cells but central neurons.

      (7) Do dIS4B/Cav2null MNs indeed show an inward or outward current at -90 to -70 mV/-40 and -50 mV, or is this an analysis artifact?

      No, this is due to baseline fluctuations as typical for voltage clamp in central neurons with more than 6000 µm dendritic length and more than 4000 dendritic branches.

      (8) Loss of several presynaptic proteins, including Brp (Kittel et al., 2006), and RBP (Liu et al., 2011), induce changes in GluR field size (without apparent changes in miniature amplitude). The statement regarding the Cav2 isoform and possible effects on GluR number (p. 8) should be revised accordingly.

      We understand and have done two things. First, we measured the intensity of GluRIIA immunolabel in ΔI-IIA, ΔI-IIB, and controls and found no differences. Second, we reworded the statement. It now reads on page 9, lines 1-6: “It seems unlikely that presynaptic cac channel isoform type affects glutamate receptor types or numbers, because the amplitude of spontaneous miniature postsynaptic currents (mEPSCs, Fig. 4K) and the labeling intensity of postsynaptic GluRIIA receptors are not significantly different between controls, I-IIA, and I-IIB junctions (see suppl. Fig. 2, p = 0.48, ordinary one-way ANOVA, mean and SD intensity values are 61.0 ± 6.9 (control), 55.8 ± 8.5 (∆I-IIA), 61.1 ± 17.3 (∆I-IIB)). However, we cannot exclude altered GluRIIB numbers and have not quantified GluR receptor field sizes.”

      (9) The statement relating miniature frequency to RRP size is unclear (p. 8). Is there any evidence for a correlation between miniature frequency to RRP size? Could the authors please clarify?

      We agree that this statement requires caution. Although there is some published evidence for a correlation of RRP size and mini frequency (Neuron, 2009 61(3):412-24. doi: 10.1016/j.neuron.2008.12.029 and Journal of Neuroscience 44 (18) e1253232024; doi: 10.1523/JNEUROSCI.1253-23.2024), which we now refer to on page 9, it is not clear whether this is true for all synapses and how linear such a relationship may be. Therefore, we have revised the text on page 9, lines 6-9. It now reads: “Similarly, the frequency of miniature postsynaptic currents (mEPSCs) remains unaltered. Since mEPSCs frequency has been related to RRP size at some synapses (Pan et al., 2009; Ralowicz et al., 2024) this indicates unaltered RRP size upon I-IIB excision, but we have not directly measured RRP size.”

      (10) Please define the "strict top view" of synapses (p. 8).

      Top view is what this reviewer referred to as “planar view” in the public review points 6 and 7. In our responses to these public review points we now also define “strict top view”, see page 9, lines 17-19.

      (11) Two papers are cited regarding a linear relationship between calcium channel number and release probability (p. 15). Many more papers could be cited to demonstrate a supralinear relationship (e.g., Dodge & Rahamimoff, 1967; Weyhersmüller et al., 2011 doi: 10.1523/JNEUROSCI.6698-10.2011). The data of the present study were collected at an extracellular calcium concentration of 0.5 mM, whereas Medeiros et al. (2023) used 1.5 mM. The relationship between calcium and release is supra-linear around 0.5 mM extracellular calcium (Weyhersmüller et al. 2011). This should be discussed/the statements be revised. Also, the reference to Medeiros et al. (2023) should be included in the reference list.

      We have now updated the Medeiros reference (updated version of that paper appeared in eLife in 2024) in the text and reference list. We agree that the relationship of the calcium concentration and P<sub>r</sub> can also be non-linear and refer to this on page 16, lines 26-32, but the point we want to make is to relate defined changes in calcium channel number (not calcium influx) as assessed by multiple methods (CLSM intensity measures and sptPALM channel counting) to release probability. We now also clearly state that we measured at 0.5 mM external calcium (page 16, lines 27/28) whereas Medeiros et al. 2024 measured at 1.5 mM calcium (page 16, lines 31/32).

      (12) Figure 6: Quantal content does not have any units - please remove "n vesicles".

      We have revised this figure in response to reviewer 2 (comment 5) and quantal content is now expressed as percent baseline, thus without units (see revised figure 7).

      (13) Figure 6C should be auto-scaled from zero.

      This has been fixed by revising that figure in response to reviewer 2 (comment 5).

      (14) The data supporting the statement on impaired motor behavior and reduced vitality of adult IS4A should be either shown, or the statement should be removed (p. 13). Any hypotheses as to why IS4A is important for behavior and or viability?

      As suggested, we have removed that statement.

      (15) They do not provide any data supporting the statement that changes in PSC decay kinetics "counteract" the increase in PSC amplitude (p. 14). The sentence should be changed accordingly.

      We agree and have toned down the statement. It now reads on page 16, lines 7-9: “During repetitive firing, the median increase of PSC amplitude by ~10 % is potentially counteracted by the significant decrease in PSC half amplitude width by ~25 %...”.

      (16) How do they explain the net locomotion speed increase in dI-IIA larvae? Although the overall charge transfer is not affected during the stimulus protocols used, could the accelerated PSC decay affect PSP summation (I would actually expect a decrease in summation/slower speed)? Independent of the voltage-clamp data, is muscle input resistance changed in dI-IIA mutants?

      Muscle input resistance is not altered in I-II mutants. We refer to potential causes of the locomotion effects of I-IIA excision in the discussion. On page 16, lines 12 to 21 it reads: “there is no difference in charge transfer from the motoneuron axon terminal to the postsynaptic muscle cell between ∆I-IIA and control. Surprisingly, crawling is significantly affected by the removal of I-IIA, in that the animals show a significantly increased mean crawling speed but no significant change in the number of stops. Given that the presynaptic function at the NMJ is not strongly altered upon I-IIA excision, and that I-IIA likely mediates also Ca<sub>v</sub>2 functions outside presynaptic AZs (see above) and in other neuron types than motoneurons, and that the muscle calcium current is mediated by Ca<sub>v</sub>1 and Ca<sub>v</sub>3, the effects of I-IIA excision of increasing crawling speed is unlikely caused by altered pre- or postsynaptic function at the NMJ. We judge it more likely that excision of I-IIA has multiple effects on sensory and pre-motor processing, but identification of these functions is beyond the scope of this study.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study analyzed biomarker data from 28 subjects with geographic atrophy (GA) in a Phase I/II clinical trial of PPY988, a subretinal AAV2 complement factor I (CFI) gene therapy, to evaluate pharmacokinetics and pharmacodynamics. Post-treatment, a 2-fold increase in the vitreous humor (VH) FI was observed, correlating with a reduction in FB breakdown product Ba but minimal changes in other complement factors. The aqueous humor (AH) was found to be an unreliable proxy for VH in assessing complement activation. In vitro assays showed that the increase in FI had a minor effect on the complement amplification loop compared to the more potent C3 inhibitor pegcetacoplan. These findings suggest that PPY988 may not provide enough FI protein to effectively modulate complement activation and slow GA progression, highlighting the need for a thorough biomarker review to determine optimal dosing in future studies.

      Strengths:

      This manuscript provides critical data on the efficacy of gene therapy for the eye, specifically introducing complement FI expression. It presents the results from a halted clinical trial, making sharing this data essential for understanding the outcomes of this gene therapy approach. The findings offer valuable insights and lessons for future gene therapy attempts in similar contexts.

      Weaknesses:

      No particular weaknesses. The study was carefully performed and limitations are discussed.

      I have just some concerns about the methodology used. The authors use the MILLIPLEX assays, which allow for multiplexed detection of complement proteins and they mention extensive validation. How are the measurements with this assay correlating with gold standard methods? Is the specificity and the expected normal ranges preserved with this assay? This also stands for the Olink assay. Some of the proteins are measured by both assay and/or by standard ELISA. How do these measurements correlate?

      The authors thank the reviewer for the positive response. Regarding the ELISA assays used to measure the array of complement proteins described, these were extensively validated for the following parameters: specificity, intra-assay and inter-assay precision, accuracy, stability, reference range, and parallelism. All assays were validated in plasma, vitreous and aqueous humour. Due to the limited volume and availability of ocular fluids from individuals in the study, validation in vitreous and aqueous matrices was performed using a pool of several samples from post-mortem donors. At the time this study was initiated, the Millipore Luminex complement panels and the Quidel C3a and Ba EIA were the most sensitive assays and the only commercially available options capable of measuring the proteins of interest in the context of limited vitreous and aqueous humor sample. The concentrations measured were observed at similar ranges as those published in the literature using assays in distinct patient populations e.g. in (Mandava et al, Invest Ophthalmol Vis Sci, 2020).

      Measurements from vitreous and aqueous from subject samples were deemed reportable if they were within the quantifiable ranges defined for these sample types during the validation (coefficient of variation of 20%, or 30% when results were below the lower limit of quantification but above limit of detection). Notably, given the limited amount of biomarker data due to small sample size, we share results from outlier biomarker measurements, to illustrate the heterogeneity in sample quality. We further publish plasma sample biomarker results in supplemental table 5 wherein complement protein concentrations can be observed and compared to normal ranges in the literature.
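
      One possible reading of the reportability rule above can be written as a small decision function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, the upper limit of quantification (`uloq`) is assumed as the top of the "quantifiable range", and the example limits are invented for illustration rather than taken from the assay validation.

```python
def is_reportable(value, cv_percent, lod, lloq, uloq):
    """Apply one interpretation of the reportability criteria.

    - Within the quantifiable range [lloq, uloq]: accept if CV <= 20 %.
    - Below lloq but above the limit of detection (lod): accept if CV <= 30 %.
    - Otherwise (at or below lod, or above uloq): not reportable.
    """
    if lloq <= value <= uloq:
        return cv_percent <= 20.0
    if lod < value < lloq:
        return cv_percent <= 30.0
    return False


# Hypothetical limits in ng/mL (assumptions, not values from the study).
LOD, LLOQ, ULOQ = 1.0, 5.0, 100.0

print(is_reportable(50.0, 15.0, LOD, LLOQ, ULOQ))  # in range, CV within 20 %
print(is_reportable(3.0, 25.0, LOD, LLOQ, ULOQ))   # below LLOQ, CV within 30 %
print(is_reportable(0.5, 10.0, LOD, LLOQ, ULOQ))   # below LOD
```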

      Adding confidence to the robustness of our assays was the observation that some of the complement proteins quantified by standard assay (e.g. plate and bead-based ELISAs) were also measured by the OLINK assay, and there was a general trend observed for positive correlation between results from both assays for FI levels post-treatment. However, we did not provide detailed correlative statistical analyses for further complement proteins as OLINK findings were deemed highly exploratory and hypothesis generating, and because the OLINK assay produced normalised results which are challenging to directly compare to ELISA results that were absolute.

      Reviewer #2 (Public Review):

      Summary:

      The results presented demonstrate that AAV2-CFI gene therapy delivers long-term and marginally higher FI protein in vitreous humor that results in a concomitant reduction in the FB activation product Ba. However, the lack of clinical efficacy in the phase I/II study, possibly due to lower in vitro potency when compared to currently approved pegcetacoplan, raises important considerations for the utility of this therapeutic approach. Despite the early termination of the PPY988 clinical development program, the study achieved significant milestones, including the implementation of subretinal gene therapy delivery in older adults, complement biomarker comparison between serial vitreous humor and aqueous humor samples and vitreous humor proteomic assessment via Olink.

      Strengths:

      Long-term augmentation of FI protein in vitreous humor over 96 weeks and reduction of FB breakdown product Ba in vitreous humor suggests modulation of the complement system. Developed a novel in vitro assay suggesting FI's ability to reduce C3 convertase activity is weaker than pegcetacoplan and FH and may suggest a higher dose of FI will be required for clinical efficacy. Warn of the poor correlation between vitreous humor and aqueous humor biomarkers and suggest aqueous humor may not be a reliable proxy for vitreous humor with regard to complement activation/inhibition studies.

      Weaknesses:

      The vitrectomy required for the subretinal route of administration causes a long-term loss of total protein and may influence the interpretation of complement biomarker results even with normalization. The modified in vitro assay of complement activation suggests a several hundred-fold increase in FI protein is required to significantly affect C3a levels. Interestingly, the in vitro assay demonstrates 100% inhibition of C3a with pegcetacoplan and FH therapeutics, but only a 50% reduction with FI even at the highest concentrations tested. This observation suggests FI may not be rate-limiting for negative complement regulation under the in vitro conditions tested and potentially in the eye. It is unclear if pharmacokinetic and pharmacodynamic properties in aqueous humor and vitreous humor compartments are reliable predictors of FI level/activity after subretinal delivery AAV2-CFI gene therapy.

      The authors thank the reviewer for the positive response and we agree that a limitation of the biomarker strategy for ocular gene therapy delivered to the retinal tissues is inferring PK/PD from vitreous and aqueous samples, which are the fluid sample compartments accessible from subjects available to measure molecular treatment response. We agree that these compartments may not accurately represent sub-retinal and tissue level complement turnover. In the discussion, line 508, we state: ‘Overall, the data suggests that fully functional FI is being secreted into the VH, but the regulatory effects on the level of Ba may be representative of convertase formation in the VH and not the macula retina/RPE nor the choroid. To validate this hypothesis, one approach would be to conduct vitreal sampling using an effective drug targeting C3 for GA in a larger cohort’.

      However, the observation of elevation of FI in VH (and AH) post treatment, and changes in levels of downstream complement proteins that align with prior knowledge of control of alternative pathway activation, is compelling evidence that these measurements reflect modest but direct consequences of an FI-gene therapy that was delivered to the subretinal space. We add to the discussion, line 479: ‘the findings of elevated FI in the VH after sub-retinally delivered CFI gene therapy and changes in complement pathway proteins post-treatment build confidence that VH matrix is at least partially reflecting the complement system at the retinal layers and treatment site, and is a valid biomarker for PK/PD insights in response to treatment.’

      Furthermore, the observation of moderately raised FI levels in modelled VH post treatment being insufficient to control CS activation in vitro accords with the lack of clinical response observed at phase II. We note that measuring FI and complement biomarkers in retinal tissues from treated eyes at post-mortem would be one way to explore the PK/PD effects from AAV2-FI gene therapy.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Hallam et al describes the analysis of various biomarkers in patients undergoing complement factor I supplementation treatment (PPY988 gene therapy) as part of the FOCUS Phase I/II clinical trial. The authors used validated methods (multiplexed assays and OLINK proteomics) for measuring multiple soluble complement proteins in the aqueous humour (AH) and vitreous humour (VH) of 28 patients over a series of time points, up to and including 96 weeks. Based on biomarker comparisons, the levels of FI synthesised by PPY988 were believed to be insufficient to achieve the desired level of complement inhibition. Subsequent comparative experiments showed that PPY988-delivered FI was much less efficacious than pegcetacoplan (FDA-approved complement inhibitor under the name SYFOVRE) when tested in an artificial VH matrix.

      Strengths:

      The manuscript is well written with data clearly presented and appropriate statistics used for the analysis itself. It's great to see data from real clinical samples that can help support future studies and therapeutic design. The identification that complement biomarker levels present in the AH do not represent the levels found in the VH is an important finding for the field, given the number of complement-targeting therapies in development and the desperate need for good biomarkers for target engagement. This study also provides a wealth of baseline complement protein measurements in both human AH and VH (and companion measurements in plasma) that will prove useful for future studies.

      Weaknesses:

      Perhaps the conclusions drawn regarding the lack of observed efficacy are not fully justified. The authors focus on the hypothesis that not enough FI was synthesised in these patients receiving the PPY988 gene therapy, suggesting a delivery/transduction/expression issue. But beyond rare CFI genetic variants, most genetic associations with AMD imply that it is a FI-cofactor disease. A hypothesis supported by the authors' own experiments when they supplement their artificial VH matrix with FH and achieve a significantly greater breakdown of C3b than achieved with PPY988 treatment alone. Justification around doubling FI levels driving complement turnover refers to studies conducted in blood, which has an entirely different complement protein profile than VH. In Supplemental Table 5 we see there is approx. 10-fold more FH than FI (533ug/ml vs 50ug/ml respectively) so increasing FI levels will have a direct effect. Yet in Supplemental Table 3 we see there is more FI than FH in VH (608ng/ml vs 466ng/ml respectively). Therefore, adding more FI without more co-factors would have a very limited effect. Surely this demonstrates that the study was delivering the wrong payload, i.e. FI, which hit a natural ceiling of endogenous co-factors within the eye?

      See response to reviewer 3’s review after reviewer 3 recommendations section below.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      The authors present strong evidence using validated complement biomarker assays and comprehensive proteomic profiling that support their findings. The presentation of complement biomarker data in vitreous humor and aqueous humor after FI augmentation is presented in a clear and concise format. The direct comparison of complement biomarkers in vitreous humor and aqueous humor from the same patients and demonstrating similarities and differences is important for the nascent complement gene therapy field. Developing a novel in vitro complement model and comparing pegcetacoplan, FH, and FI inhibitors provides the field with a valuable assay to benchmark other complement therapeutics. As currently designed, the in vitro assay supports why FI augmentation did not contribute to clinical success. It also suggests that non-physiological concentrations of FI protein (over 100 µg/mL) maximally inhibit C3a signal by ~50%, whereas both pegcetacoplan and FH reduce the signal by 100%. Does this suggest that CFI is not an appropriate therapeutic target to control complement overactivation in the eye?

      We agree with the reviewer that the new data from the novel in vitro assay coupled with the clinical findings from the phase II gene therapy trial does now suggest FI is less attractive as a therapeutic target for controlling complement activation in the retinal tissues of subjects with Geographic Atrophy.

      Reviewer #3 (Recommendations For The Authors):

      I think the authors have done a great job collecting and analysing these clinical samples and elucidating the baseline complement protein profile in both the AH and VH. I only have minimal suggested changes.

      Perhaps a more direct discussion around the limitations of adding more FI into environments where there is no excess of FI-cofactors present? And a discussion around the limitations of VH (and VA for that matter) biomarker sampling for a disease that primarily affects the neurosensory retina and outer blood/retinal barrier: perhaps the landscape of complement proteins is different yet again (although, admittedly, impossible to sample in a patient)? Finally, would it not have been better to perform complement activation experiments using the VH of treated patients directly rather than creating an artificial VH matrix (there may, or may not, be a couple of things in human VH that directly affect complement turnover...)?

      We thank the reviewer for the supportive comments. This study is the first to describe FI and FH levels and respective ratios in vitreous humour (plus aqueous and plasma) from GA subjects, before and after sub-retinal gene therapy. It is compelling to observe that in the VH the levels of FI are greater than FH, the primary fluid phase co-factor for FI enzymatic activity. This new information does indeed argue against further FI supplementation (using gene therapy) being of added benefit to controlling the complement system in the broader population of individuals with Geographic Atrophy. We note that at the start of the clinical development of GT005/PPY988 AAV2-FI gene therapy, there was limited information on FI and FH levels in AMD in ocular fluids to inform the pharmacodynamics of complement activation. Now, by running the FOCUS phase I clinical trial and measuring the complement biomarker data using validated assays, we have added to our understanding of the levels and ratio of FI to FH and other complement proteins in a larger number of GA subjects’ ocular samples. We report the levels of complement proteins measured in ocular and systemic samples, to show the ranges and also the differences in ratios between the different matrices.

      Regarding the statement that FI supplementation could likely be ineffective due to limited FH cofactor: FH is not the only co-factor that FI may partner with at cell surfaces to become enzymatically active. Others include MCP (CD46) and CR1 (CD35), although the latter is known to be of limited expression in the eye. As such, it is certainly true that other proteins may be present in the tissue, altering the kinetics of FI’s activity after sub-retinal gene therapy. In addition, the ratio between FI and FH detected in the VH may not be the same as in retinal tissue. We therefore agree that biomarkers in the VH may not fully reflect the disease processes and treatment response at the retinal cell layers, but the VH is the closest fluid available for sampling soluble proteins released from retinal tissue. The findings of elevated FI in the VH after sub-retinally delivered CFI gene therapy and changes in complement pathway proteins post-treatment build confidence that VH matrix is at least partially reflecting the complement system at the retinal layers and treatment site, and is a valid biomarker for PK/PD insights in response to treatment. We agree modelling different inhibitor effects on complement activation directly using subjects’ vitreous would be informative, but this was not possible due to the limitations of very small sample volume.

      We add several sentences to the discussion regarding the points above. Line 473: ‘Notably, that FI does not reduce C3a breakdown to baseline even at supermolecular concentrations suggests cofactor limitation that might be more pronounced in VH given FH is not in excess of FI as is the case in blood 27. Moreover, there are additional cell-bound cofactors for FI that may be present in retinal tissue that are not present in the VH and could further alter the kinetics of the assay, such as MCP (CD46) albeit with disease related changes observed 37. However, the findings of elevated FI in the VH after sub-retinally delivered CFI gene therapy and changes in complement pathway proteins post-treatment build confidence that VH matrix is at least partially reflecting the complement system at the retinal layers and treatment site, and is a valid biomarker for PK/PD insights in response to treatment.’

      Minor comments:

      Line 237: Missing parenthesis at the end of the sentence

      Manuscript updated.

      Line 435: Missing secondary parenthesis after .....Figure 3A)......

      Manuscript updated.

      Line 536: I don't think suggesting the addition of FHR proteins into the neurosensory retina/VH is such a good idea

      The reference to FHRs has been clarified in the manuscript, line 558. The authors note that FHR dimerization domains have been engineered to dimerize Factor H constructs increasing half-life and potency for drugs currently in development.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer # 1 (Public Review):

      Summary:

      In this preprint, the authors systematically and rigorously investigate how specific classes of residue mutations alter the critical temperature as a proxy for the driving forces for phase separation. The work is well executed, the manuscript well-written, and the results reasonable and insightful.

      Strengths:

      The introductory material does an excellent job of being precise in language and ideas while summarizing the state of the art. The simulation design, execution, and analysis are exceptional and set the standard for these types of large-scale simulation studies. The results, interpretations, and Discussion are largely nuanced, clear, and well-motivated.

      We thank the reviewer for their assessment of our work and for highlighting the key strengths of the paper.

      Weaknesses:

      This is not exactly a weakness, but I think it would future-proof the authors’ conclusions to clarify a few key caveats associated with this work. Most notably, given the underlying implementation of the Mpipi model, temperature dependencies for intermolecular interactions driven by solvent effects (e.g., hydrophobic effect and charge-mediated interactions facilitated by desolvation penalties) are not captured. This itself is not a “weakness” per se, but it means I would imagine CERTAIN types of features would not be well captured; notably, my expectation is that at higher temperatures, proline-rich sequences drive intermolecular interactions, but at lower temperatures, they do not. This is likely also true for the aliphatic residues, although these are found less frequently in IDRs. As such, it may be worth the authors explicitly discussing.

      We also thank the reviewer for pointing out that a more detailed discussion of the model limitations is needed. The original Mpipi model was designed to probe UCST-type transitions (that are associative in nature) of disordered sequences. The reviewer is correct, that in its current form, the model does not capture LCST-type transitions that depend on changes in solvation of hydrophobic residues with temperature. We have amended the discussion to highlight this fact.

      Similarly, prior work has established the importance of an alpha-helical region in TDP-43, as well as the role of aliphatic residues in driving TDP-43’s assembly (see Schmidt et al 2019). I recognize the authors have focussed here on a specific set of mutations, so it may be worth (in the Discussion) mentioning [1] what impact, if any, they expect transient or persistent secondary structure to have on their conclusions and [2] how they expect aliphatic residues to contribute. These can and probably should be speculative as opposed to definitive.

      Again - these are not raised as weaknesses in terms of this work, but the fact they are not discussed is a minor weakness, and the preprint’s use and impact would be improved on such a discussion.

      We agree with the reviewer that the effects of structural changes/propensities on these scaling behaviors would be an interesting and important angle to probe. We also comment on this in the discussion.

      Reviewer # 2 (Public Review):

      This is an interesting manuscript where a CA-only CG model (Mpipi) was used to examine the critical temperature (Tc) of phase separation of a set of 140 variants of prion-like low complexity domains (PLDs). The key result is that Tc of these PLDs seems to have a linear dependence on substitutions of various sticker and spacer residues. This is potentially useful for estimating the Tc shift when making novel mutations of a PLD. However, I have strong reservations about the significance of this observation as well as some aspects of the technical detail and writing of the manuscript.

      We thank the reviewer for their thoughtful and detailed feedback on the manuscript.

      (1) Writing of the manuscript: The manuscript can be significantly shortened with more concise discussions. The current text reads as very wordy in places. It even appears that the authors may be trying a bit too hard to make a big deal out of the observed linear dependence.

      The manuscript needs to be toned down to minimize self-promotion throughout the text. Some of the glaring examples include the wording “unprecedented”, “our research marks a significant milestone in the field of computational studies of protein phase behavior ..”, “Our work explores a new framework to describe, quantitatively, the phase behavior ...”, and others.

      We thank the reviewer for their suggestions on the writing of the manuscript. We understand the concern regarding the length and tone of the manuscript, and in response to their feedback, we have revised the language throughout the manuscript.

      There is really little need to emphasize the need to manage a large number of simulations for all 140 variants. Yes, some thought needs to go into designing and managing the jobs and organizing the data, but it is pretty standard in computational studies. For example, large-scale protein-ligand free energy calculations can require one to a few orders of magnitude more runs, and it is pretty routine.

      We fully agree with the reviewer that this aspect of the study is relatively standard in computational research and does not require special emphasis. In response, we have revised the manuscript to shorten the aforementioned section, focusing instead on the scientific insights gained from the simulations rather than the logistical challenges of managing them.

      When discussing the agreement with experimental results on Tm, it should be noted that the values of R > 0.93 and RMSD < 14 K are based on only 16 data points. I am not sure that one should refer to this as “extended validation”. It is more like a limited validation given the small data size.

      We thank the reviewer for their consideration of our validation set. Indeed, the agreement with experimental results is based on 16 data points, as this set represents the available published data at the time of writing of this manuscript. The term “extended validation” is used to signify that our current dataset builds upon previous validations (in Joseph, Reinhardt et al. Nat Comput. Sci. 2021), incorporating additional variants not previously examined. The metrics of r > 0.93 and a low RMSD indicate a strong agreement between the model and experiments, and an improvement with respect to other reported models. We are committed to continuing to validate our methods.
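
      One way to make the sample-size point concrete is to attach an approximate confidence interval to the reported correlation. The sketch below applies the standard Fisher z-transform to r = 0.93 with n = 16. The function name is illustrative, and the interval assumes roughly bivariate-normal data, which is an assumption here rather than something established in the manuscript.

```python
import math


def pearson_r_confidence_interval(r, n, z_crit=1.96):
    """Approximate CI for a Pearson correlation via the Fisher z-transform.

    r      : sample correlation coefficient (strictly between -1 and 1)
    n      : number of paired data points (must exceed 3)
    z_crit : normal quantile; 1.96 corresponds to a ~95% interval
    """
    if n <= 3:
        raise ValueError("the Fisher z-transform needs more than 3 data points")
    z = math.atanh(r)                # transform r to an approximately normal scale
    se = 1.0 / math.sqrt(n - 3)      # standard error on the transformed scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Values quoted in the exchange above: r = 0.93 from 16 data points.
low, high = pearson_r_confidence_interval(0.93, 16)
print(f"r = 0.93, n = 16 -> approximate 95% CI ({low:.2f}, {high:.2f})")
```

      With only 16 points the interval is noticeably wide at its lower end, so a high point estimate and a limited validation set are compatible statements.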

      Results of linear fitting shown in Eq 4-12 should be summarized in a single table instead of scattering across multiple pages.

      We considered the reviewer’s suggestion to compile all the laws into a single table. However, we believe it would be more effective for readers to reference each relationship directly where it is first discussed in the text. That said, we do include Table 1 in the original manuscript, which provides a summary of all the laws.

      The title may also be toned down a bit given the limited significance of the observed linear dependence.

      We respectfully disagree with the reviewer and believe that the current title accurately captures the scope of the manuscript.

      (2) Significance and reliability of Tc: Given the simplicity of Mpipi (a CA-only model that can only describe polymer chain dimension) and the low complexity nature of PLDs, the sequence composition itself is expected to be the key determinant of Tc. This is also reflected in various mean-field theories. It is well known that other factors will contribute, such as patterning (examined in this work as well), residual structures, and conformational preferences in dilute and dense phases. The observed roughly linear dependence is a nice confirmation but really unsurprising by itself. It appears how many of the constructs deviate from the expected linear dependence (e.g., Figure 4A) may be more interesting to explore.

      While linear dependencies in critical solution temperatures may appear expected for certain systems, for example, symmetric hard spheres, the heterogeneity of intrinsically disordered regions (IDRs), like prion-like domains (PLDs), makes this finding notable. The simplicity of our linear scaling law belies the underlying complexity of multivalent interactions and sequence-dependent behaviors in a certain sequence regime, which has not been quantitatively characterized in this manner before. Likewise, although linear dependencies may be expected in simplified models, the real-world applicability and empirical validation of these laws in biologically relevant systems are not guaranteed. Our chemically based model provides the robustness needed to do that. The linear relationship observed is significant because it provides a predictive framework for understanding how specific mutations affect a diverse set of PLDs. The framework presented can be extended to other protein families upon the application of a validated model, which might or might not yield linear relationships depending on the cooperative effects of their collective behavior. This extends beyond confirming known theories: it offers a practical tool for predicting phase behavior based on sequence composition.

      We agree with the reviewer that, while the overarching linear trend is clear, deviations from linearity observed in constructs like those in Figure 4A point to additional, and interesting, layers of complexity. These deviations offer interesting avenues for future research and suggest that while linearity might dominate PLD critical behavior, other factors may modulate this behavior under specific conditions.

      This is an excellent suggestion from the reviewer that, while it falls outside the scope of the current study, we are interested in exploring in the future.

      Finally, although the relationships are all linear, they have been normalized in different ways, and the strength of the study also lies in that. Instead of focusing solely on linearity, our study explores the physical mechanisms that underlie these relationships. This approach provides a more complete understanding of how sequence composition and the underlying chemistry of the mutated residues influence T<sub>c</sub>.

      The assumption that all systems investigated here belong to the same universality class as a 3D Ising model and the use of Eqn 20 and 21 to derive Tc is poorly justified. Several papers have discussed this issue, e.g., see Pappu Chem Rev 2023 and others. Muthukumar and coworkers further showed that the scaling of the relevant order parameters, including the conserved order parameter, does not follow the 3D Ising model. More appropriate theoretical models including various mean field theories can be used to derive binodal from their data, such as using Rohit Pappu’s FIREBALL toolset. Imposing the physics of the 3D Ising model as done in the current work creates challenges for equivalence relationships that are likely unjustified.

      We thank the reviewer for raising this point and for highlighting the FIREBALL toolset. Based on our understanding, FIREBALL is designed to fit phase diagrams using mean-field theories, such as Flory–Huggins and Gaussian Cluster Theory. Our experience with this toolset suggests that it places a higher weight on the dilute arm of the binodal. However, in our slab simulations, we observe greater uncertainty in the density of the dilute arm. This leads to only a moderate fit of the data to the mean-field theories employed in the toolset. While we agree that there is no reason to assume the phase behavior of these systems is fully captured by the 3D Ising model, we expect that such a model will describe the behavior near the critical point better than mean-field theories. Testing our results further with different critical exponents would be valuable in assessing how these predictions compare to a broader set of experimental data. Additionally, we have made the raw data points for the phase diagrams available on our GitHub, enabling practitioners to apply alternative fitting methods.

      While it has been a common practice to extract Tc when fitting the coexistence densities, it is not a parameter that is directly relevant physiologically. Instead, Csat would be much more relevant to think about if phase separation could occur in cells.

      While it is true that C<sub>sat</sub> is directly relevant to whether phase separation can occur in cells under physiological conditions, T<sub>c</sub> should not be dismissed as irrelevant. T<sub>c</sub> provides fundamental insights into the thermodynamics of phase separation, reflecting the overall stability and strength of interactions driving condensate formation. This stability is crucial for understanding how environmental factors, such as temperature or mutations, might affect phase behavior. In Figure 2C and D we compare experimental C<sub>sat</sub> values with our predicted T<sub>c</sub> from simulations. These quantities are roughly inversely proportional to each other and so we expect that, to a first approximation, the relationships recovered for T<sub>c</sub> should hold when considering C<sub>sat</sub> at a fixed temperature.

      Reviewer # 3 (Public Review):

      Summary:

      “Decoding Phase Separation of Prion-Like Domains through Data-Driven Scaling Laws” by Maristany et al. offers a significant contribution to the understanding of phase separation in prion-like domains (PLDs). The study investigates the phase separation behavior of PLDs, which are intrinsically disordered regions within proteins that have a propensity to undergo liquid-liquid phase separation (LLPS). This phenomenon is crucial in forming biomolecular condensates, which play essential roles in cellular organization and function. The authors employ a data-driven approach to establish predictive scaling laws that describe the phase behavior of these domains.

      Strengths:

      The study benefits from a robust dataset encompassing a wide range of PLDs, which enhances the generalizability of the findings. The authors’ meticulous curation and analysis of this data add to the study’s robustness. The scaling laws derived from the data provide predictive insights into the phase behavior of PLDs, which can be useful in the future for the design of synthetic biomolecular condensates.

      We thank the reviewer for highlighting the importance of our work and for their critical feedback.

      Weaknesses:

      While the data-driven approach is powerful, the study could benefit from more experimental validation. Experimental studies confirming the predictions of the scaling laws would strengthen the conclusions. For example, in Figure 1, the Tc of TDP-43 is below 300 K even though it can undergo LLPS under standard conditions. Figure 2 clearly highlights the quantitative accuracy of the model for hnRNPA1 PLD mutants, but its applicability to other systems such as TDP-43, FUS, TIA1, EWSR1, etc., may be questionable.

      In the manuscript, we have leveraged existing experimental data for the A1-LCD variants, extracting critical temperatures and saturation concentrations to compare with our model and scaling law predictions. We acknowledge that a larger set of experiments would be beneficial. By selecting sequences that are related, we hypothesize that the scaling laws described herein should remain robust. In the case of TDP-43, to our knowledge this protein does not phase separate on its own under standard conditions. In vitro experiments that report phase separation at/above 300 K involve either the use of crowding agents (such as dextran or PEG) or multicomponent mixtures that include RNA or other proteins. Therefore, our predictions for TDP-43 are consistent with experiments. In general, we hope that the scaling laws presented in our work will inspire other researchers to further test their validity.

      The authors may wish to consider checking if the scaling behavior is only observed for Tc or if other experimentally relevant quantities such as Csat also show similar behavior. Additionally, providing more intuitive explanations could make the findings more broadly accessible.

      In Figure 2C and D we compare experimental C<sub>sat</sub> values with our predicted T<sub>c</sub> from simulations. These quantities are roughly inversely proportional to each other and so we expect that, to a first approximation, the relationships recovered for T<sub>c</sub> should hold when considering C<sub>sat</sub> at a fixed temperature.

      The study focuses on a particular subset of intrinsically disordered regions. While this is necessary for depth, it may limit the applicability of the findings to other types of phase-separating biomolecules. The authors may wish to discuss why this is not a concern. Some statements in the paper may require careful evaluation for general applicability, and I encourage the authors to exercise caution while making general conclusions. For example, “Therefore, our results reveal that it is almost twice more destabilizing to mutate Arg to Lys than to replace Arg with any uncharged, non-aromatic amino acid...” This may not be true if the protein has a lot of negative charges.

      A significant number of proteins, in addition to those mentioned in the manuscript, that contain prion-like low complexity domains have been reported to exhibit phase separation behaviors and/or are constituents of condensates inside cells. We therefore expect these laws to be applicable to such systems and have further revised the text to emphasize this point. As the reviewer suggests, we have also clarified that the reported scaling of various mutations applies to these systems.

      I am surprised that a quarter of a million CPU hours are described as staggering in terms of computational requirements.

      We have removed the note on CPU hours from the manuscript. However, we would like to clarify that the amount of CPU hours was incorrectly reported. The correct estimate is 1.25 million hours, but this value was unfortunately misrepresented during the editing process. We thank the reviewer for catching this mistake on our part.

      Reviewer # 1 (Recommendations For The Authors):

      Some minor points here:

      “illustrating that IDPs indeed behave like a polymer in a good solvent [43].” Whether or not an IDP behaves as a polymer in a good solvent depends on the amino acid sequence - the referenced paper selected a set of sequences that do indeed appear on average to map to a good-solvent-like polymer, but lest we forget SAXS experiments require high protein concentrations and until the recent advent of SEC-SAXS, your protein essentially needed to be near infinitely soluble to be measured. As such, this paper’s conclusions are, apparently, ignorant of the limitations associated with the data they are describing, drawing sweeping generalizations that are clearly not supported by a multitude of studies in which sequence-dependencies have led to ensembles with a scaling exponent far below 0.59 (See Riback et al 2017, Peng et al 2019, Martin et al 2020, etc).

      We thank the reviewer for raising this point. To avoid making incorrect generalizations and potentially misleading readers, we have removed the quoted statement from our manuscript.

      As of right now, the sequences are provided in a convenient multiple-sequence alignment figure. However, it would be important also to provide all sequences in an Excel table to make it easy for folks to compare.

      In addition to the sequence alignment figure, we now provide all tested sequences in an Excel table format in the GitHub repository.

      Maybe I’m missing it, but it would be extremely valuable if the coexistence points plot in all the figures were provided as so-called source data; this could just be on the GitHub repository, but I’m envisaging a scenario where for each sequence you have a 4 column file where Col1=concentration and Col2=temperature, col3=fit concentration and col4=fit temperature, such that someone could plot col1 vs. col2 and col3 vs. col4 and reproduce the binodals in the various figures. Given the tremendous amount of work done to achieve binodals:

      The coexistence points used to plot the figures are now provided in the GitHub, in a format similar to that suggested by the reviewer.
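
      A minimal sketch of how such source data could be read back, assuming the four-column layout the reviewer describes (simulated concentration, simulated temperature, fit concentration, fit temperature). The function name, the whitespace-delimited format, and the sample values are hypothetical placeholders; the actual files in the repository may be organized differently.

```python
import io

import numpy as np


def load_binodal(source):
    """Read a 4-column coexistence file into simulated and fitted curves.

    Assumed (hypothetical) column order:
    col1 = concentration, col2 = temperature,
    col3 = fit concentration, col4 = fit temperature.
    Lines starting with '#' are treated as comments.
    """
    data = np.loadtxt(source, comments="#", ndmin=2)
    if data.shape[1] != 4:
        raise ValueError(f"expected 4 columns, found {data.shape[1]}")
    conc, temp, fit_conc, fit_temp = data.T
    return {"simulated": (conc, temp), "fit": (fit_conc, fit_temp)}


# Placeholder values for illustration only; these are not data from the study.
example = io.StringIO(
    "# conc temp fit_conc fit_temp\n"
    "0.10 280 0.11 280\n"
    "0.90 280 0.88 280\n"
    "0.20 300 0.21 300\n"
)
curves = load_binodal(example)
print(curves["simulated"][0], curves["simulated"][1])
```

      Plotting the first pair of arrays against each other, and then the second pair, would reproduce a binodal and its fit for one sequence.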

      It would be nice to visually show how finite size effects are considered/tested for (which they are very nicely) because I think this is something the simulation field should be thinking about more than they are.

      Thank you for highlighting this point. In our previous work (supporting information of the original Mpipi paper), we demonstrated a thorough approach by varying both the cross-sectional area of the box and the long axis while keeping the overall density constant. In this work, we verified that the cross-sectional area was larger than the average R<sub>g</sub> of the protein. We then maintained a fixed cross-sectional area to long-axis ratio, varying the number of proteins while keeping the overall density constant. We have updated Appendix 1–Figure 2 to clarify our procedure and revised the caption to better explain how we ensured the number of proteins was adequate.

      When explaining the law of rectilinear diameters, it would be good to explain where the 3.06 exponent comes from.

      Based on the reviewer’s suggestion, we have added to the text: “The constant 3.06 in the equation is a dimensionless empirical factor that was derived from simulations of the 3D Ising model.”
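
      To illustrate where the constant enters, the sketch below assumes the critical-point fit takes the form commonly used for slab simulations: (ρ_dense − ρ_dilute)^3.06 = d(1 − T/Tc), combined with the law of rectilinear diameters, (ρ_dense + ρ_dilute)/2 = ρ_c + s(Tc − T). Note that 1/3.06 ≈ 0.327, close to the order-parameter exponent of the 3D Ising model. The exact form of Eqs. 20 and 21 in the manuscript is assumed rather than quoted, the function name is illustrative, and the data are synthetic.

```python
import numpy as np

INV_BETA = 3.06  # constant from the response; 1/3.06 is close to the 3D Ising beta


def fit_critical_point(temps, rho_dense, rho_dilute):
    """Estimate (Tc, rho_c) from coexistence densities below the critical point."""
    temps = np.asarray(temps, dtype=float)
    rho_dense = np.asarray(rho_dense, dtype=float)
    rho_dilute = np.asarray(rho_dilute, dtype=float)

    # (rho_dense - rho_dilute)**3.06 = d - (d/Tc) * T is linear in T,
    # so Tc is the temperature at which the fitted line crosses zero.
    y = (rho_dense - rho_dilute) ** INV_BETA
    slope, intercept = np.polyfit(temps, y, 1)
    tc = -intercept / slope

    # Rectilinear diameters: the midpoint density is linear in T;
    # evaluating the fitted line at Tc gives the critical density.
    midpoint = 0.5 * (rho_dense + rho_dilute)
    m_slope, m_intercept = np.polyfit(temps, midpoint, 1)
    rho_c = m_slope * tc + m_intercept
    return tc, rho_c


# Synthetic coexistence data generated from known parameters (not study data).
true_tc, true_rho_c, d, s = 320.0, 0.40, 2.0, 0.001
T = np.array([280.0, 290.0, 300.0, 310.0])
gap = (d * (1.0 - T / true_tc)) ** (1.0 / INV_BETA)
mid = true_rho_c + s * (true_tc - T)
tc_est, rho_c_est = fit_critical_point(T, mid + gap / 2, mid - gap / 2)
print(tc_est, rho_c_est)
```

      Because both relations are linear in T after the transformation, two ordinary least-squares fits suffice. Swapping INV_BETA for a different exponent is a quick way to test how sensitive the extracted Tc is to the assumed universality class.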

      The NCPR scale in Figure 5 being viridis is not super intuitive and may benefit from being seismic or some other r-w-b colormap just to make it easier for a reader to map the color to meaning.

      We thank the reviewer for this suggestion and have replaced the scale with a r-w-b colormap.

      The “sticker and spacer” framework has received critiques recently given its perceived simplicity. However, this work seems to clearly illustrate that certain types of residues have a large effect on Tc when mutated, whereas others have a smaller effect. It may be worth re-phrasing the sticker-spacer introduction not as “everyone knows aromatic/arginine residues are stickers” but as “aromatic and arginine residues have been proposed to be stickers, yet other groups have argued all residues matter equally” and then go on to make the point that while a black-and-white delineation is probably not appropriate, based on the data, certain residues ARE demonstrably more impactful on Tc than others, which is the definition of stickers. With this in mind, it may be useful to separate out a sticker and a spacer distribution in Figure 1D, because the different distribution between the two residues types is not particularly obvious from the overlapping points.

      We have revised the introduction of the sticker–spacer model in the manuscript for clarity. As the reviewer suggests, we have also separated the sticker and spacer distribution, which is now summarized in new Appendix 0–figure 8.

      Reviewer # 3 (Recommendations For The Authors):

      Figure 2 clearly highlights the quantitative accuracy of the model for hnRNPA1 PLD mutants, but its applicability to other systems such as TDP-43, FUS, TIA1, EWSR1, etc., may be questionable. The following sentence may be revised to reflect this: “Our extended validation set confirms that the Mpipi potential can ...”

      Based on the reviewer’s suggestion, we have revised the text: “Our validation set, which expands the range of protein variants originally tested [32], highlights that the Mpipi potential can effectively capture the thermodynamic behavior of a wide range of hnRNPA1-PLD variants, and suggests that Mpipi is adequate for proteins with similar sequence compositions, as in the set of proteins analyzed in this study. In recent work by others [66], Mpipi was tested against experimental radius of gyration data for 137 disordered proteins and the model produced highly accurate results, which further suggests the applicability of the approach to a broad range of sequences.”

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      As a starting point, the authors discuss the so-called "additive partitioning" (AP) method proposed by Loreau & Hector in 2001. The AP is the result of a mathematical rearrangement of the definition of overyielding, written in terms of relative yields (RY) of species in mixtures relative to monocultures. One term, the so-called complementarity effect (CE), is proportional to the average RY deviations from the null expectations that plants of both species "do the same" in monocultures and mixtures. The other term, the selection effect (SE), captures how these RY deviations are related to monoculture productivity. Overall, CE measures whether relative biomass gains differ from zero when averaged across all community members, and SE, whether the "relative advantage" species have in the mixture, is related to their productivity. In extreme cases, when all species benefit, CE becomes positive.

      This is not true; positive CE does not require positive RY deviations of all species. CE is positive as long as the average RY deviation is greater than 0. In a 2-species mixture, for example, if the RY deviation of one species is -0.2 and that of the other species is +0.3, CE would still be positive. Positive CE can be associated with negative NE (net biodiversity effects) when more productive species have a negative RY deviation that is smaller in magnitude than the positive RY deviation of less productive species. Therefore, the suggestion by the reviewer “This is intuitively compatible with the idea that niche complementarity mitigates competition (CE>0)” is not correct.
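The arithmetic behind this reply can be checked with a minimal sketch of the additive partitioning as it is usually written (NE = CE + SE, with CE = N · mean(ΔRY) · mean(M) and SE = N · cov(ΔRY, M)). The RY deviations of -0.2 and +0.3 come from the reply above; the monoculture yields of 1000 and 100 are hypothetical values chosen only for illustration, and the function name is an assumption.

```python
def additive_partition(delta_ry, mono):
    """Split the net biodiversity effect (NE) into CE and SE.

    delta_ry: per-species deviation of observed from expected relative yield
    mono:     per-species monoculture yield (same order as delta_ry)
    """
    n = len(mono)
    mean_dry = sum(delta_ry) / n
    mean_m = sum(mono) / n
    # Complementarity effect: N * mean(dRY) * mean(M)
    ce = n * mean_dry * mean_m
    # Selection effect: N * population covariance between dRY and M
    cov = sum((d - mean_dry) * (m - mean_m) for d, m in zip(delta_ry, mono)) / n
    se = n * cov
    # Net effect computed directly as sum(dRY_i * M_i); equals CE + SE
    ne = sum(d * m for d, m in zip(delta_ry, mono))
    return ce, se, ne


# RY deviations from the reply; monoculture yields are hypothetical.
ce, se, ne = additive_partition([-0.2, 0.3], [1000.0, 100.0])
print(f"CE={ce:.1f} SE={se:.1f} NE={ne:.1f}")
```

With these illustrative numbers the average RY deviation is positive, so CE is positive, while the productive species loses more yield than the unproductive species gains, so NE is negative.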

      When large species have large relative productivity increases, SE becomes positive. This is intuitively compatible with the idea that niche complementarity mitigates competition (CE>0), or that competitively superior species dominate mixtures and thereby drive overyielding (SE>0).

      The use of the word “mitigate” indicates that the effects of niche complementarity and competition are in opposite directions, which is not true for biodiversity experiments based on a replacement design. We have explained this in detail in our first responses to reviewers.

      However, it is very important to understand that CE and SE capture the "statistical structure" of RY that underlies overyielding. Specifically, CE and SE are not the ultimate biological mechanisms that drive overyielding, and never were meant to be. CE also does not describe niche complementarity. Interpreting CE and SE as directly quantifying niche complementarity or resource competition, is simply wrong, although it sometimes is done. The criticism of the AP method thus in large part seems unwarranted. The alternative methods the authors discuss (lines 108-123) are based on very similar principles.

      Agree. However, if CE and SE are not meant to be biological mechanisms, as suggested by the reviewer, the argument “This is intuitively compatible with the idea that niche complementarity mitigates competition (CE>0), or that competitively superior species dominate mixtures and thereby drive overyielding (SE>0)” would be invalid.

      Lines 108-123 are not on our method.   

      The authors now set out to develop a method that aims at linking response patterns to "more true" biological mechanisms.

      Assuming that "competitive dominance" is key to understanding mixture productivity, because "competitive interactions are the predominant type of interspecific relationships in plants", the authors introduce "partial density" monocultures, i.e. monocultures that have the same planting density for a species as in a mixture. The idea is that using these partial density monocultures as a reference would allow for isolating the effect of competition by the surrounding "species matrix".

      The authors argue that "To separate effects of competitive interactions from those of other species interactions, we would need the hypothesis that constituent species share an identical niche but differ in growth and competitive ability (i.e., absence of positive/negative interactions)." - I think the term interaction is not correctly used here, because clearly competition is an interaction, but the point made here is that this would be a zero-sum game.

      We did not say that competition is not an interaction.

      The authors use the ratio of productivity of partial density and full-density monocultures, divided by planting density, as a measure of "competitive growth response" (abbreviated as MG). This is the extra growth a plant individual produces when intraspecific competition is reduced.

      Here, I see two issues: first, this rests on the assumption that there is only "one mode" of competition if two species use the same resources, which may not be true, because intraspecific and interspecific competition may differ. Of course, one can argue that then somehow "niches" are different, but such a niche definition would be very broad and go beyond the "resource set" perspective the authors adopt. Second, this value will heavily depend on timing and the relationship between maximum initial growth rates and competitive abilities at high stand densities.

      True. Research findings indicate that biodiversity effect detected with AP is not constant.    

      The authors then progress to define relative competitive ability (RC), and this time simply use monoculture biomass as a measure of competitive ability. To express this biomass in a standardized way, they express it as the difference from the mean of the other species and then divide by the maximum monoculture biomass of all species.
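As a rough illustration of the standardization just described, the sketch below follows the reviewer's verbal summary only: each species' monoculture biomass minus the mean of the other species, divided by the largest monoculture biomass. The manuscript's exact formula is not reproduced in this review, so the function name, the formula as coded, and the biomass values are all assumptions.

```python
def relative_competitive_ability(mono):
    """RC per species, following the verbal description in the review.

    mono: list of monoculture biomasses, one per species (at least two).
    """
    max_m = max(mono)
    rc = []
    for i, m in enumerate(mono):
        others = mono[:i] + mono[i + 1:]
        mean_others = sum(others) / len(others)
        # Difference from the mean of the other species, scaled by the maximum
        rc.append((m - mean_others) / max_m)
    return rc


# Hypothetical monoculture biomasses for three species.
rc = relative_competitive_ability([1000.0, 600.0, 200.0])
print(rc)
```

Because every value is divided by the single largest monoculture biomass, changing that one number rescales RC for all species.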

      I have two concerns here: first, if competitive ability is the capability of a species to preempt resources from a pool also accessed by another species, as the authors argued before, then this seems wrong because one would expect that a species can simply be more productive because it has a broader niche space that it exploits. This contradicts the very narrow perspective on competitive ability the authors have adopted. This also is difficult to reconcile with the idea that specialist species with a narrow niche would outcompete generalist species with a broad niche.

      Competitive ability is not necessarily associated with species niche space. Both generalist and specialist species can be more productive at a particular study site, as long as they are more capable of obtaining resources from a local pool. Remember, biodiversity experiments are conducted at a site of particular conditions, not across a range of species niche space at landscape level.

      Second, I am concerned by the mathematical form. Standardizing by the maximum makes the scaling dependent on a single value.

      As explained in lines 370-376, the mathematical form is a linear approximation, as the relationship between competitive growth responses and species relative competitive ability is generally unknown but would likely be nonlinear. Once the relationship is determined in future research, the scaling factor will not be needed.

      As a final step, the authors calculate a "competitive expectation" for a species' biomass in the mixture, by scaling deviations from the expected yield by the product MG ⨯ RC. This would mean a species does better in a mixture when (1) it benefits most from a conspecific density reduction, and (2) has a relatively high biomass.

      Put simply, the assumption would be that if a species is productive in monoculture (high RC), it effectively does not "see" the competitors and then grows like it would be the sole species in the community, i.e. like in the partial density monoculture.

      Overall, I am not very convinced by the proposed method.

      Comments on revised version:

      Only minimal changes were made to the manuscript, and they do not address the main points that were raised.

      Reviewer #2 (Public review):

      This manuscript by Tao et al. reports on an effort to better specify the underlying interactions driving the effects of biodiversity on productivity in biodiversity experiments. The authors are especially concerned with the potential for competitive interactions to drive positive biodiversity-ecosystem functioning relationships by driving down the biomass of subdominant species. The authors suggest a new partitioning schema that utilizes a suite of partial density treatments to capture so-called competitive ability. While I agree with the authors that understanding the underlying drivers of biodiversity-ecosystem functioning relationships is valuable - I am unsure of the added value of this specific approach for several reasons.

      No responses.

      Comments on revised version:

      The authors changed only one minor detail in response to the last round of reviews.

      Reviewer #3 (Public review):

      Summary:

      This manuscript claims to provide a new null hypothesis for testing the effects of biodiversity on ecosystem functioning. It reports that the strength of biodiversity effects changes when this different null hypothesis is used. This main result is rather inevitable. That is, one expects a different answer when using a different approach. The question then becomes whether the manuscript's null hypothesis is both new and an improvement on the null hypothesis that has been in use in recent decades.

      Our approach adopts two hypotheses: a null hypothesis, which is shared with the additive partitioning model, and a competitive hypothesis, which is new. The null hypothesis assumes that inter- and intra-species interactions are the same, while the competitive hypothesis assumes that species differ in competitive ability and growth rate. Our approach is therefore an extension of the current approach: it separates the effects of competitive interactions from those of other species interactions, which the current approach does not.

      Strengths:

      In general, I appreciate studies like this that question whether we have been doing it all wrong and I encourage consideration of new approaches.

      Weaknesses:

      Despite many sweeping critiques of previous studies and bold claims of novelty made throughout the manuscript, I was unable to find new insights. The manuscript fails to place the study in the context of the long history of literature on competition and biodiversity and ecosystem functioning.

      We have explained in our first responses that competition and biodiversity effects are studied in different experimental approaches, i.e., additive and replacement designs. Results from one approach are not compatible with those from the other. For example, competition effect with additive design is negative but generally positive with replacement design that is used extensively in biodiversity experiments. We have considered species competitive ability, density-growth relationship, and different effects of competitive interactions between additive and replacement design, while the current method does not reflect any of those.        

      The Introduction claims the new approach will address deficiencies of previous approaches, but after reading further I see no evidence that it addresses the limitations of previous approaches noted in the Introduction. Furthermore, the manuscript does not reproducibly describe the methods used to produce the results (e.g., in Table 1) and relies on simulations, claiming experimental data are not available when many experiments have already tested these ideas and not found support for them.

      We used simulation data, as partial density monocultures are generally not available in previous biodiversity experiments.

      Finally, it is unclear to me whether rejecting the 'new' null hypothesis presented in the manuscript would be of interest to ecologists, agronomists, conservationists, or others.

      Our null hypothesis is the same as the null hypothesis of the additive partitioning, which assumes that inter- and intra-species interactions are the same, while our competitive hypothesis assumes that species differ in competitive ability and growth rate. Rejecting the null hypothesis means that inter- and intra-species interactions are different, whereas rejecting the competitive hypothesis indicates the existence of positive/negative species interactions. This would be interesting to everyone.

      Comments on revised version:

      Please see review comments on the previous version of this manuscript. The authors have not revised their manuscript to address most of the issues previously raised by reviewers.

      No responses.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Do take reviews seriously. Even if you think the reviewers all are wrong and did not understand your work, then this seems to indicate that it was not clearly presented.

      Reviewer #2 (Recommendations for the authors):

      I can understand that the authors are perhaps frustrated with what they perceive as a basic misunderstanding of their goals and approach. This misunderstanding however, provides with it an opportunity to clarify. I believe that the authors have tried to clarify in rebutting our statements but would do better to clarify in the manuscript itself. If we reviewers, who are deeply invested in this field, don't understand the approach and its value, then it is likely that many readers will not as well.

      The additive partitioning has been publicly questioned at least several times since the conception of the method in 2001. Our work provides an alternative.

    1. Reviewer #2 (Public review):

      Summary:

      The authors present a paper that attempts to tackle an important question, with potential impact far beyond the field of animal behavior research: what are the relative contributions of innate personality traits versus early life experience on individual behavior in the wild? The study, performed on Egyptian fruit bats that are caught in the wild and later housed in an outdoor colony, is solidly executed, and benefits greatly from a unique setup in which controlled laboratory experiments are combined with monitoring of individuals as they undertake undirected, free exploration of their natural environment.

      The primary finding of the paper is that there is a strong effect of early life experience on behavior in the wild, where individual bats that were exposed to an enriched environment as juveniles later travelled farther and over greater distances when permitted to explore and forage ad libitum, as compared with individual bats who were subjected to a more impoverished environment. Meanwhile, no prominent effect of innate "personality", as assessed by indices of indoor foraging behavior early on, before the bats were exposed to the controlled environmental treatment, was observed on three metrics of outdoor foraging behavior. The authors conclude that the early environment plays a larger role than innate personality on the behavior of adult bats.

      Strengths:

      (1) Elegant design of experiments and impressive combination of methods<br /> Bats used in the experiment were taken from wild colonies in different geographical areas, but housed during the juvenile stage in a controlled indoor environment. Bats are tested on the same behavioral paradigm at multiple points in their development. Finally, the bats are monitored with GPS as they freely explore the area beyond the outdoor colony.

      (2) Development of a behavioral test that yields consistent results across time<br /> The multiple-foraging box paradigm, in which behavioral traits such as overall activity, levels of risk-taking, and exploratoriness can be evaluated, is creative and suggestive of behavioral paradigms other animal behavior researchers might be able to use. It is especially useful, given that it can be used to evaluate the activity of animals seemingly at most stages of life, and not just in adulthood.

      Weaknesses:

      (1) Robustness and validity of personality measures<br /> Coming up with robust measures of "personality" in non-human animals is tricky. While this paper represents an important attempt at a solution, some of the results obtained from the indoor foraging paradigm raise questions as to the reliability of this task for assessing "personality".

      (2) Insufficient exploitation of data<br /> Between the behavioral measures and the very multidimensional GPS data, the authors are in possession of a rich data set. However, I don't feel that this data has been adequately exploited for underlying patterns and relationships. For example, many more metrics could be extracted from the GPS data, which may then reveal correlations with early measures of personality or further underscore the role of the early environment. In addition, the possibility that these personality measures might in combination affect outdoor foraging is not explored.

      (3) Interpretation of statistical results and definition of statistical models<br /> Some statistical interpretations may not be entirely accurate, particularly in the case of multiple regression with generalized linear models. In addition, some effects which may be present in the data are dismissed as not significant on the basis of null hypothesis testing.

      Below I have organized the main points of critique by theme, and ordered subordinate points by order of importance:

      (1) Assessing personality metrics and the indoor paradigm: While I applaud this effort and think the metrics used are justified, I see a few issues in the results as they are currently presented:<br /> (a) [Major] I am somewhat concerned that here, the foraging box paradigm is being used for two somewhat conflicting purposes: (1) assessing innate personality and (2) measuring changes in personality as a result of experience. If the indoor foraging task is indeed meant to measure and reflect both at the same time, then perhaps this can be made more explicit throughout the manuscript. In this circumstance, I think the authors could place more emphasis on the fact that the task, at later trials/measurements, begins to take on the character of a "composite" measure of personality and experience.

      (b) [Major] Although you only refer to results obtained in trials 1 and 2 when trying to estimate "innate personality" effects, I am a little worried that the paradigm used to measure personality, i.e. the stable components of behavior, is itself affected by other factors such as age (in the case of activity, Fig. 1C3, S1C1-2), the environment (see data re trial 3), and experience outdoors (see data re trials 4/5).

      Ideally, a study that aims to disentangle the role of predisposition from early-life experience would have a metric for predisposition that is relatively unchanging for individuals, which can stand as a baseline against a separate metric that reflects behavioral differences accumulated as a result of experience.

      I would find it more convincing that the foraging box paradigm can be used to measure personality if it could be shown that young bats' behavior was consistent across retests in the box paradigm prior to any environmental exposure across many baseline trials (i.e. more than 2), and that these "initial settings" were constant for individuals. I think it would be important to show that personality is consistent across baseline trials 1 and 2. This could be done, for example, by reproducing the plots in Fig. 1C1-3 while plotting trial 1 against trial 2. (I would note here that if a significant, positive correlation were to be found (as I would expect) between the measures across trial 1 and 2, it is likely that we would see the "habituation effect" the authors refer to expressed as a steep positive slope on the correlation line (indicating that bold individuals on trial 1 are much bolder on trial 2).)

      (c) Related to the previous point, it was not clear to me why the data from trial 2 (the second baseline trial) was not presented in the main body of the paper, and only data from trial 1 was used as a baseline.

      In the supplementary figure and table, you show that the bats tended to exhibit more boldness and exploratory behavior, but fewer actions, in trial 2 as compared with trial 1. You explain that this may be due to habituation to the experimental setup, however, the precise motivation for excluding data from trial 2 from the primary analyses is not stated. I would strongly encourage the authors to include a comparison of the data between the baseline trials in their primary analysis (see above), combine the information from these trials to form a composite baseline against which further analyses are performed, or further justify the exclusion of data as a baseline.

      (2) Comparison of indoor behavioral measures and outdoor behavioral measures<br /> Regarding the final point in the results, correlation between indoor personality on Trial 4 and outdoor foraging behavior: It is not entirely clear to me what is being tested (neither the details of the tests nor the data or a figure are plotted). Given some of the strong trends in the data - namely, (1) how strongly early environment seems to affect outdoor behavior, (2) how strongly outdoor experience affects boldness, measured on indoor behavior (Fig. 1D) - I am not convinced that there is no relationship, as is stated here, between indoor and outdoor behavior. If this conclusion is made purely on the basis of a p-value, I would suggest revisiting this analysis.

      (3) Use of statistics/points regarding the generalized linear models<br /> While I think the implementation of the GLMM models is correct, I am not certain that the interpretation of the GLMM results is entirely correct for cases where multivariate regression has been performed (Tables 4s and S1, and possibly Table 3). (You do not present the exact equation you used for each model (this would be a helpful addition to the methods), therefore it is somewhat difficult to evaluate if the following critique properly applies, however...)

      The "estimate" for a fixed effect in a regression table gives the difference in the outcome variable for a 1 unit increase in the predictor variable (in the case of numeric predictors) or for each successive "level" or treatment (in the case of categorical variables), compared to the baseline, the intercept, which reflects the value of the outcome variable given by the combination of the first value/level of all predictors. Therefore, for example, in Table 4a - Time spend outside: the estimate for Bat sex: male indicates (I believe) the difference in time spent outside for an enriched male vs. an enriched female, not, as the authors seem to aim to explain, the effect of sex overall. Note that the interpretation of the first entry, Environmental condition: impoverished, is correct. I refer the authors to the section "Multiple treatments and interactions" on p. 11 of this guide to evaluating contrasts in G/LMMS: https://bbolker.github.io/mixedmodels-misc/notes/contrasts.pdf
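The point about treatment (dummy) coding can be illustrated with a small sketch. It assumes a model that includes an environment × sex interaction, with "enriched" and "female" as reference levels; in a purely additive model the sex coefficient would instead apply at both environment levels. The cell means are hypothetical and are not data from the study, and random effects are ignored.

```python
# Hypothetical cell means (e.g. hours spent outside); not data from the study.
cell_means = {
    ("enriched", "female"): 5.0,
    ("enriched", "male"): 6.0,
    ("impoverished", "female"): 3.0,
    ("impoverished", "male"): 2.5,
}


def treatment_coefficients(means):
    """Coefficients of a saturated 2x2 model under treatment coding.

    Reference levels: environment = enriched, sex = female.
    """
    ref = means[("enriched", "female")]
    b_env = means[("impoverished", "female")] - ref
    b_sex = means[("enriched", "male")] - ref
    b_int = means[("impoverished", "male")] - ref - b_env - b_sex
    return {
        "intercept": ref,
        "env_impoverished": b_env,
        "sex_male": b_sex,
        "interaction": b_int,
    }


def predict(coef, env, sex):
    """Linear predictor built from the dummy-coded design row."""
    x_env = 1.0 if env == "impoverished" else 0.0
    x_sex = 1.0 if sex == "male" else 0.0
    return (coef["intercept"]
            + coef["env_impoverished"] * x_env
            + coef["sex_male"] * x_sex
            + coef["interaction"] * x_env * x_sex)


coef = treatment_coefficients(cell_means)
# The "sex: male" estimate is the male-female difference among enriched bats only.
sex_effect_enriched = coef["sex_male"]
# Among impoverished bats the difference also includes the interaction term.
sex_effect_impoverished = coef["sex_male"] + coef["interaction"]
print(coef, sex_effect_enriched, sex_effect_impoverished)
```

In this toy case the tabulated "sex: male" estimate is positive even though males spend less time outside than females in the impoverished group, which is why the estimate should not be read as an overall sex effect when an interaction is in the model.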

  7. Dec 2024
    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review): 

      Summary: 

      The authors introduced their previous paper with the concise statement that "the relationships between lineage-specific attributes and genotypic differences of tumors are not understood" (Chen et al., JEM 2019, PMID: 30737256). For example, it is not clear why combined loss of RB1 and TP53 is required for tumorigenesis in SCLC or other aggressive neuroendocrine (NE) cancers, or why the oncogenic mutations in KRAS or EGFR that drive NSCLC tumorigenesis are found so infrequently in SCLC. This is the main question addressed by the previous and current papers. 

      One approach to this question is to identify a discrete set of genetic/biochemical manipulations that are sufficient to transform non-malignant human cells into SCLC-like tumors. One group reported the transformation of primary human bronchial epithelial cells into NE tumors through a complex lentiviral cocktail involving the inactivation of pRB and p53 and activation of AKT, cMYC, and BCL2 (PARCB) (Park et al., Science 2018, PMID: 30287662). The cocktail previously reported by Chen and colleagues to transform human pluripotent stem-cell (hPSC)-derived lung progenitors (LPs) into NE xenografts was more concise: DAPT to inactivate NOTCH signaling combined with shRNAs against RB1 and TP53. However, the resulting RP xenografts lacked important characteristics of SCLC. Unlike SCLC, these tumors proliferated slowly and did not metastasize, and although small subpopulations expressed MYC or MYCL, none expressed NEUROD1. 

      MYC is frequently amplified or expressed at high levels in SCLC, and here, the authors have tested whether inducible expression of MYC could increase the resemblance of their hPSC-derived NE tumors to SCLC. These RPM cells (or RPM T58A with stabilized cMYC) engrafted more consistently and grew more rapidly than RP cells, and unlike RP cells, formed liver metastases when injected into the renal capsule. Gene expression analyses revealed that RPM tumor subpopulations expressed NEUROD1, ASCL1, and/or YAP1. 

      The hPSC-derived RPM model is a major advance over the previous RP model. This may become a powerful tool for understanding SCLC tumorigenesis and progression and for discovering gene dependencies and molecular targets for novel therapies. However, the specific role of cMYC in this model needs to be clarified. 

      cMYC can drive proliferation, tumorigenesis, or apoptosis in a variety of lineages depending on concurrent mutations. For example, in the Park et al., study, normal human prostate cells could be reprogrammed to form adenocarcinoma-like tumors by activation of cMYC and AKT alone, without manipulation of TP53 or RB1. In their previous manuscript, the authors carefully showed the role of each molecular manipulation in NE tumorigenesis. DAPT was required for NE differentiation of LPs to PNECs, shRB1 was required for expansion of the PNECs, and shTP53 was required for xenograft formation. cMYC expression could influence each of these steps, and importantly, could render some steps dispensable. For example, shRB1 was previously necessary to expand the DAPT-induced PNECs, as neither shTP53 nor activation of KRAS or EGFR had any effect on this population, but perhaps cMYC overexpression could expand PNECs even in the presence of pRB, or even induce LPs to become PNECs without DAPT. Similarly, both shRB1 and shTP53 were necessary for xenograft formation, but maybe not if cMYC is overexpressed. If a molecular hallmark of SCLC, such as loss of RB1 or TP53, has become dispensable with the addition of cMYC, this information is critically important in interpreting this as a model of SCLC tumorigenesis.

      The reviewer’s suggestion may be possible; indeed, in a recent report from our group (Gardner EE, et al., Science 2024) we have shown, using genetically engineered mouse modeling coupled with lineage tracing, that the cMyc oncogene can selectively expand Ascl1+ PNECs in the lung.

      We agree with the reviewer that not having a better understanding of the individual components necessary and/or sufficient to transform hESC-derived LPs is an important shortcoming of this current work. However, we would like to stress three important points about the comments: 1) tumors were reviewed and the histological diagnoses were certified by a practicing pulmonary pathologist at WCM (our co-author, C. Zhang); 2) the observed transcriptional programs were consistent with primary human SCLC; and 3) RB1-proficient SCLC is now recognized as a rare presentation of SCLC (Febres-Aldana CA, et al., Clin. Can. Res. 2022. PMID: 35792876).

      To interpret the role of cMYC expression in hPSC-derived RPM tumors, we need to know what this manipulation does without manipulation of pRB, p53, or NOTCH, alone or in combination. Seven relevant combinations should be presented in this manuscript: (1) cMYC alone in LPs, (2) cMYC + DAPT, (3) cMYC + shRB1, (4) cMYC + DAPT + shRB1, (5) cMYC + shTP53, (6) cMYC + DAPT + shTP53, and (7) cMYC + shRB1 + shTP53. Wildtype cMYC is sufficient; further exploration with the T58A mutant would not be necessary. 

      We respectfully disagree that an interrogation of the differences between the phenotypes produced by wildtype and Myc(T58A) would not be informative. (Our view is confirmed by the second reviewer; see below.) It is well established that Myc gene or protein dosage can have profound effects on in vivo phenotypes (Murphy DJ, et al., Cancer Cell 2008. PMID: 19061836). The “RPM” model of variant SCLC developed by Trudy Oliver’s lab relied on the conditional T58A point mutant of cMyc, originally made by Rob Wechsler-Reya. While we do not discuss the differences between Myc and Myc(T58A), it is nonetheless important to present our results with both the WT and mutant MYC constructs, as we are aware of others actively investigating differences between them in GEMM models of SCLC tumor development.

      We agree with the reviewer about the virtues of trying to identify the effects of individual gene manipulations; indeed, our original paper (Chen et al., J. Expt. Med. 2019), describing the RUES2-derived model of SCLC, did just that, carefully dissecting events required to transform LPs towards a SCLC-like state. The central purpose of the current study was to determine the effects of adding cMyc on the behavior of weakly tumorigenic SCLC-like cells. Presenting data with these two alleles to seek effects of different doses of MYC protein seems reasonable.

      This reviewer considers that there should be a presentation of the effects of these combinations on LP differentiation to PNECs, expansion of PNECs as well as other lung cells, xenograft formation and histology, and xenograft growth rate and capacity for metastasis. If this could be clarified experimentally, and the results discussed in the context of other similar approaches such as the Park et al., paper, this study would be a major addition to the field.  

      Reviewer #2 (Public Review): 

      Summary: 

      Chen et al use human embryonic stem cells (ESCs) to determine the impact of wildtype MYC and a point mutant stable form of MYC (MYC-T58A) in the transformation of induced pulmonary neuroendocrine cells (PNEC) in the context of RB1/P53 (RP) loss (tumor suppressors that are nearly universally lost in small cell lung cancer (SCLC)). Upon transplant into immune-deficient mice, they find that RP-MYC and RP-MYC-T58A cells grow more rapidly, and are more likely to be metastatic when transplanted into the kidney capsule, than RP controls. Through single-cell RNA sequencing and immunostaining approaches, they find that these RPM tumors and their metastases express NEUROD1, which is a transcription factor whose expression marks a distinct molecular state of SCLC. While MYC is already known to promote aggressive NEUROD1+ SCLC in other models, these data demonstrate its capacity in a human setting that provides a rationale for further use of the ESC-based model going forward. Overall, these findings provide a minor advance over the previous characterization of this ESC-based model of SCLC published in Chen et al, J Exp Med, 2019. 

      We consider the findings more than a “minor” advance in the development of the model, since any useful model for SCLC would need to form aggressive and metastatic tumors.

      The major conclusion of the paper is generally well supported, but some minor conclusions are inadequate and require important controls and more careful analysis. 

      Strengths:

      (1) Both MYC and MYC-T58A yield similar results when RP-MYC and RP-MYCT58A PNEC ESCs are injected subcutaneously, or into the renal capsule, of immune-deficient mice, leading to the conclusion that MYC promotes faster growth and more metastases than RP controls. 

      (2) Consistent with numerous prior studies in mice with a neuroendocrine (NE) cell of origin (Mollaoglu et al, Cancer Cell, 2017; Ireland et al, Cancer Cell, 2020; Olsen et al, Genes Dev, 2021), MYC appears sufficient in the context of RB/P53 loss to induce the NEUROD1 state. Prior studies also show that MYC can convert human ASCL1+ neuroendocrine SCLC cell lines to a NEUROD1 state (Patel et al, Sci Advances, 2021); this study for the first time demonstrates that RB/P53/MYC from a human neuroendocrine cell of origin is sufficient to transform a NE state to aggressive NEUROD1+ SCLC. This finding provides a solid rationale for using the human ESC system to better understand the function of human oncogenes and tumor suppressors from a neuroendocrine origin. 

      Weaknesses:

      (1) There is a major concern about the conclusion that MYC "yields a larger neuroendocrine compartment" related to Figures 4C and 4G, which is inadequately supported and likely inaccurate. There is overwhelming published data that while MYC can promote NEUROD1, it also tends to correlate with reduced ASCL1 and reduced NE fate (Mollaoglu et al, Cancer Cell, 2017; Zhang et al, TLCR, 2018; Ireland et al, Cancer Cell, 2020; Patel et al, Sci Advances, 2021). Most importantly, there is a lack of in vivo RP tumor controls to make the proper comparison to judge MYC's impact on neuroendocrine identity. RPM tumors are largely neuroendocrine compared to in vitro conditions, but since RP control tumors (in vivo) are missing, it is impossible to determine whether MYC promotes more or less neuroendocrine fate than RP controls. It is not appropriate to compare RPM tumors to in vitro RP cells when it comes to cell fate. Upon inspection of the sample identity in S1B, the fibroblast and basal-like cells appear to only grow in vitro and are not well represented in vivo; it is, therefore, unclear whether these are transformed or even lack RB/P53 or express MYC. Indeed, a close inspection of Figure S1B shows that RPM tumor cells have little ASCL1 expression, consistent with lower NE fate than expected in control RP tumors. 

      We would like to clarify two points related to the conclusions that we draw about MYC’s ability to promote an increase in the neuroendocrine fraction in hESC-derived cultures: 1) The comparisons in Figure 4C were made between cells isolated in culture following the standard 50-day differentiation protocol, where, following generation of LPs around day 25, MYC was added to the other factors previously shown to enrich for a PNEC phenotype (shRB1, shTP53, and DAPT). Therefore, the argument that MYC increased the frequency of “neuroendocrine cells” (which we define by a gene expression signature) is a reasonable conclusion in the system we are using; and 2) following injection of these cells into immunocompromised mice, an ASCL1-low / NEUROD1-high presentation is noted (Supplemental Figures 1F-G). In the few metastases from which we were able to sequence bulk RNA, there is an even more pronounced increase in expression of NEUROD1 with a decrease in ASCL1.

      Some confusion may have arisen from our previous characterization of neuroendocrine (NE) cells using either ASCL1 or NEUROD1 as markers. To clarify, we have now designated cells positive for ASCL1 as classical NE cells and those positive for NEUROD1 as the NE variant. According to this revised classification, our findings indicate that MYC expression leads to an increase in the NEUROD1+ NE variant and a decrease in ASCL1+ classical NE cells. This adjustment is reflected in the Results section of the manuscript titled “Inoculation of the renal capsule facilitates metastasis of the RUES2-derived RPM tumors”.

      From the limited samples in hand, we compared the expression of ASCL1 and NEUROD1 in the weakly tumorigenic hESC RP cells after successful primary engraftment into immunocompromised mice. As expected, the RP tumors were distinguished by the lack of expression of NEUROD1, compared to levels observed in the RPM tumors.

      In addition, since MYC appears to require Notch signaling to induce  NE fate (cf Ireland et al), the presence of DAPT in culture could enrich for NE fate despite MYC's presence. It's important to clarify in the legend of Fig 4A which samples are used in the scRNA-seq data and whether they were derived from in vitro or in vivo conditions (as such, Supplementary Figure S1B should be provided in the main figure). Given their conclusion is confusing and challenges robustly supported data in other models, it is critical to resolve this issue properly. I suspect when properly resolved, MYC actually consistently does reduce NE fate compared to RP controls, even though tumors are still relatively NE compared to completely distinct cellular identities such as fibroblasts.

      We have clarified the source of tumor sequencing data and the platform (single cell or bulk) in Figure 4 and Supplemental Figure 1. To reiterate – the RNA sequencing results from paired metastatic and primary tumors from the RPM model are derived from bulk RNA;  the single cell RNA data in RP or RPM datasets are from cells in culture.  These distinctions are clarified in the legend to Supplemental Figure 1.

      (2) The rigor of the conclusions in Figure 1 would be strengthened by comparing an equivalent number of RP animals in the renal capsule assay, which is n = 6 compared to n = 11-14 in the MYC conditions.

      As we did not perform a power calculation to determine a sample size required to draw a level of statistical significance from our conclusions, this comment is not entirely accurate. Our statistical rigor was limited by the availability of samples from the RP tumor model.

      (3) Statistical analysis is not provided for Figures 2A-2B, and while the results are compelling, may be strengthened by additional samples due to the variability observed. 

      We acknowledge that the cohorts are relatively small but we have added statistical comparisons in Figure 2B. 

      (4a) Related to Figure 3, primary tumors and liver metastases from RPM or RPM-T58A-expressing cells express NEUROD1 by immunohistochemistry (IHC) but the putative negative controls (RP) are not shown, and there is no assessment of variability from tumor to tumor, ie, this is not quantified across multiple animals. 

      The results of H&E and IF staining for ASCL1, NEUROD1, CGRP, and CD56 in negative control (RP tumors) are presented in the updated Figure 3F-G.

      (4b) Relatedly, MYC has been shown to be able to push cells beyond NEUROD1 to a double-negative or YAP1+ state (Mollaoglu et al, Cancer Cell, 2017; Ireland et al, Cancer Cell, 2020), but the authors do not assess subtype markers by IHC. They do show subtype markers by mRNA levels in Fig 4B, and since there is expression of ASCL1, and potentially expression of YAP1 and POU2F3, it would be valuable to examine the protein levels by IHC in control RP vs. RPM samples.

      YAP1-positive SCLC is still somewhat controversial, so it is not clear what value staining for YAP1 offers beyond showing the well-established markers, ASCL1 and NEUROD1.

      (5) Given that MYC has been shown to function distinctly from MYCL in SCLC models, it would have raised the impact and value of the study if MYC was compared to MYCL or MYCL fusions in this context since generally, SCLC expresses a MYC family member. However, it is quite possible that the control RP cells do express MYCL, and as such, it would be useful to show. 

      We now include Supplemental Figure S2 to illustrate four important points raised by this reviewer and others: 1) expression of MYC family members in the merged dataset (RP and RPM) is low or undetectable in the basal/fibroblast cultures; 2) MYC does have a weak correlation with EGFP in the neuroendocrine cluster when either WT MYC or T58A MYC is overexpressed; 3) MYCL and MYCN are detectable, but at low levels compared to CMYC; and 4) expression of ASCL1 is anticorrelated with MYC expression across the merged single cell datasets using RP and RPM models.

      Reviewer #3 (Public Review): 

      Summary: 

      The authors continue their study of the experimental model of small cell lung cancer (SCLC) they created from human embryonic stem cells (hESCs) using a protocol for differentiating the hESCs into pulmonary lineages followed by NOTCH signaling inactivation with DAPT, and then knockdown of TP53 and RB1 (RP models) with DOX inducible shRNAs. To this published model, they now add DOX-controlled activation of expression of a MYC or T58A MYC transgenes (RPM and RPMT58A models) and study the impact of this on xenograft tumor growth and metastases. Their major findings are that the addition of MYC increased dramatically subcutaneous tumor growth and also the growth of tumors implanted into the renal capsule. In addition, they only found liver and occasional lung metastases with renal capsule implantation. Molecular studies including scRNAseq showed that tumor lines with MYC or T58A MYC led surprisingly to more neuroendocrine differentiation, and (not surprisingly) that MYC expression was most highly correlated with NEUROD1 expression. Of interest, many of the hESCs with RPM/RPMT58A expressed ASCL1. Of note, even in the renal capsule RPM/RPMT58A models only 6/12 and 4/9 mice developed metastases (mainly liver with one lung metastasis) and a few mice of each type did not even develop a renal sub capsule tumor. The authors start their Discussion by concluding: " In this report, we show that the addition of an efficiently expressed transgene encoding normal or mutant human cMYC can convert weakly tumorigenic human PNEC cells, derived from a human ESC line and depleted of tumor suppressors RB1 and TP53, into highly malignant, metastatic SCLC-like cancers after implantation into the renal capsule of immunodeficient mice.". 

      Strengths: 

      The in vivo study of a human preclinical model of SCLC demonstrates the important role of c-Myc in the development of a malignant phenotype and metastases. Also the role of c-Myc in selecting for expression of NEUROD1 lineage oncogene expression. 

      Weaknesses: 

      There are no data on results from an orthotopic (pulmonary) implantation on generation of metastases; no comparative study of other myc family members (MYCL, MYCN); no indication of analyses of other common metastatic sites found in SCLC (e.g. brain, adrenal gland, lymph nodes, bone marrow); no studies of response to standard platin-etoposide doublet chemotherapy; no data on the status of NEUROD1 and ASCL1 expression in the individual metastatic lesions they identified. 

      We have acknowledged from the outset that our study has significant limitations, as noted by this reviewer, and we explained in our initial letter of response why we need to present this limited, but still consequential, story at this time. 

      In particular, while we have attempted orthotopic transplantations of RPM tumor cells into NSG mice (by tail vein or intra-pulmonary injection, or intra-tracheal instillation of tumor cells), these methods were not successful in colonizing the lung. Additionally, we have compared the efficacy of platinum/etoposide to that of removing DOX in established RPM subcutaneous tumors, but we chose not to include these data as we lacked a chemotherapy-responsive tumor model, and thus could not say with confidence that the chemotherapeutic agents were active and that the RPM models were truly resistant to standard SCLC chemotherapy. In a discussion about other metastatic sites, we have now included the following text:

      “In animals administered DOX, histological examinations showed that approximately half developed metastases in distant organs, including the liver or lung (Figure 1D). No metastases were observed in the bone, brain, or lymph nodes. For a more detailed assessment, future studies could employ more sensitive imaging methods, such as luciferase imaging.”

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors): 

      Technical points related to Major Weakness #1: 

      For Figure 4: Cells were enriched for EGFP-high cells only, under the hypothesis that cells with lower EGFP may have silenced expression of the integrated vector. Since EGFP is expressed only in the shRB1 construct, selection for high EGFP may inadvertently alter/exclude heterogeneity within the transformed population for the other transgenes (shP53, shMYC/MYC-T58A). Can authors include data to show the expression of MYC/MYC T58A in EGFP-high v -med v -low cells? MYC levels may alter the NE differentiation status of tumor cells.

      Please now refer to Supplemental Figure S2.

      Related to the appropriateness of the methods for Figure 4C, the authors state, "We performed differential cluster abundance analysis after accounting for the fraction of cells that were EGFP+". If only EGFP+ cells were accounted for in the analysis for 4C, the majority of RP cells in the "Neuroendocrine differentiated" cluster would not be included in the analysis (according to EGFP expression in Fig S1A-B), and therefore inappropriately reduce NE identity compared to RPM samples that have higher levels of EGFP. 

      There is no consideration or analysis of cell cycling/proliferation until after the conclusion is stated. Yet, increased proliferation of MYC-high vs MYC-low cultures would enhance selection for more tumors (termed "NE-diff") than non-tumors (basal/fibroblast) in 2D cultures. 

      The expression of MYC itself isn't assessed for this analysis but assumed, and whether higher levels of MYC/MYC-T58A may be present in EGFP+ tumor cells that are in the NE-low populations isn't clear. Can MYC-T58A/HA also be included in the reference genome? 

      We did not include an HA tag in our reference transcriptome. For [some] answers to this and other related questions, please refer to Supplemental Figure S2.

      Reviewer #3 (Recommendations For The Authors): 

      (1) The experiments are all technically well done and clearly presented and represent a logical extension exploring the role of c-Myc in the hESC experimental model system. 

      We appreciate this supportive comment!

      (2) It is of great interest that both the initial RP model only forms "benign" tumors and that with the addition of a strong oncogene like c-myc, where expression is known to be associated with a very bad prognosis in SCLC, that while one gets tumor formation there are still occasional mice both for subcutaneous and renal capsule test sites that don't get tumors even with the injection of 500,000 RPM/RPMT58A cells. In addition, of the mice that do form tumors, only ~50% exhibit metastases from the renal sub-capsule site. The authors need to comment on this further in their Discussion. To me, this illustrates both how incredibly resistant/difficult it is to form metastases, thus indicating the need for other pathways to be activated to achieve such spread, and also represents an opportunity for further functional genomic tests using their preclinical model to systematically attack this problem. Obvious candidate genes are those recently identified in genetically engineered mouse models (GEMMs) related to neuronal behavior. In addition, we already know that full-fledged patient-derived SCLC when injected subcutaneously into immune-deprived mice don't exhibit metastases - thus, while the hESC RPM result is not surprising, it indicates to me the power of their model (logs less complicated genetically than a patient SCLC) to sort through a mechanism that would allow metastases to develop from subcutaneous sites. The authors can point these things out in their Discussion section to provide a "roadmap" for future research. 

      Although we remain mindful of the relatively small cohorts we have studied, the thrust of Reviewer #3’s comments is now included in the Discussion. And there is, of course, a lot more to do, and it has taken several years already to get to this point. Additional information about the prolonged gestation of this project and about the difficulties of doing more in the near future was described in our initial response to reviewers/Editor, included near the start of this letter.    

      (3) I will state the obvious that this paper would be much more valuable if they had compared and contrasted at least one of the myc family members (MYCL or MYCN) with the CMYC findings whatever the results would be. Most SCLC patients develop metastases, and most of their tumors don't express high levels of CMYC (and often use MYCL). In any event, as the authors Discuss, this will be an important next stage to test.

      We have acknowledged and explained the limitations of the work in several ways. Further, we were unaware of the relationship between metastases and the expression of MYC and MYCL1 noted by the reviewer; we will look for confirmation of this association in any future studies, although we have not encountered it in current literature.

      (4) Their assays for metastases involved looking for anatomically "gross" lesions. While that is fine, particularly given that the "gross" lesions they show in figures are actually pretty small, we still need to know if they performed straightforward autopsies on mice and looked for other well-known sites of metastases in SCLC patients besides liver and lung - namely lymph nodes, adrenal, bone marrow, and brain. I would guess these would probably not show metastatic growth but with the current report, we don't know if these were looked for or not. Again, while this could be a "negative" result, the paper's value would be increased by these simple data. Let's assume no metastases are seen, then the authors could further strengthen the case for the value of their hESC model in systematically exploring with functional genomics the requirements to achieve metastases to these other sites.

      We have included descriptions of what we found and didn’t find at other potential sites of metastasis in the results section, with the following sentences: 

      “In animals administered DOX, histological examinations showed that approximately half developed metastases in distant organs, including the liver or lung (Figure 1D). No metastases were observed in the bone, brain, or lymph nodes. For a more detailed assessment, future studies could employ more sensitive imaging methods, such as luciferase imaging.”

      (5) Related to this, we have no idea if the mice that developed liver metastases (or the one mouse with lung metastasis) had more than one metastatic site. They will know this and should report it. Again, my guess is that these were isolated metastases in each mouse. Again, they can indicate the value of their model in searching for programs that would increase the number of the various organs. 

      We appreciate the suggestion. We observed that one of the mice developed metastatic tumors in both the liver and lungs. This information has been incorporated into the Results section.

      (6) While renal capsule implantation for testing growth and metastatic behavior is reasonable and based on substantial literature using this site for implantation of patient tumor specimens, what would have increased the value of the paper is knowing the results from orthotopic (lung implantation). Whatever the results were (they occurred or did not occur) they will be important to know. I understand the "future experiments" argument, but in reading the manuscript this jumped out at me as an obvious thing for the authors to try. 

      We conducted orthotopic implantation several ways, including via intra-tracheal instillation of 0.5 million RP or RPM cells in PBS per mouse. However, none of the subjects (0/5 mice) developed tumor-like growths and the number of animals used was small. Further, this outcome could be attributed to biological or physical factors. For instance, the conducting airway is coated with secretory cells producing protective mucins and may not have retained the 0.5 million cells. This is one example that may have hindered effective colonization. Future adjustments, such as increasing the number of cells, embedding them in Matrigel, or damaging the airway to denude secretory cells and trigger regeneration might alter the outcomes. These ideas might guide future work to strengthen the utility of the models.

      (7) Another obvious piece of data that would have improved the value of this manuscript would be to know whether the RPM tumors responded to platin-etoposide chemotherapy. Such data was not presented in their first RP hESC notch inhibition paper (which we now know generated what the authors call "benign" tumors). While I realize chemotherapy responses represent other types of experiments, as the authors point out one of the main reasons they developed their new human model was for therapy testing. Two papers in and we are all still asking - does their model respond or not respond dramatically to platin-etoposide therapy? Whatever the results are they are a vital next step in considering the use of their model. 

      Please see the comments above regarding our decision not to include data from a clinical trial that lacked appropriate controls.

      (8) The finding of RPM cells that expressed NEUROD1, ASCL1, or both was interesting. From the way the data were presented, I don't have a clear idea which of these lineage oncogenes the metastatic lesions from ~11 different mice expressed. Whatever the result is it would be useful to know - all NEUROD1, some ASCL1, some mixed etc.

      Based on the bulk RNA-sequencing of a few metastatic sites (Figure 4H), what we can demonstrate is that all sites were NEUROD1-positive and expressed low or no detectable ASCL1.

      (9) While several H&E histologic images were presented, even when I enlarged them to 400% I couldn't clearly see most of them. For future reference, I think it would be important to have several high-quality images of the RP, RPM, RPMT58A subcutaneous tumors, sub-renal capsule tumors, and liver and lung metastatic lesions. If there is heterogeneity in the primary tumors or the metastases it would be important to show this. The quality of the images they have in the pdf file is suboptimal. If they have already provided higher-quality images - great. If not, I think in the long run as people come back to this paper, it will help both the field and the authors to have really great images of their tumors and metastases. 

      We have attempted to improve the quality of the embedded images. Digital resolution is a tradeoff with data size: higher-resolution images are always available upon request, but may not be suitable for generation of figures in a manuscript viewed online.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors revealed the cellular heterogeneity of companion cells (CCs) and demonstrated that the florigen gene FT is highly expressed in a specific subpopulation of these CCs in Arabidopsis. Through a thorough characterization of this subpopulation, they further identified NITRATE-INDUCIBLE GARP-TYPE TRANSCRIPTIONAL REPRESSOR 1 (NIGT1)-like transcription factors as potential new regulators of FT. Overall, these findings are intriguing and valuable, contributing significantly to our understanding of florigen and the photoperiodic flowering pathway. However, there is still room for improvement in the quality of the data and the depth of the analysis. I have several comments that may be beneficial for the authors.

      Strengths:

      The usage of snRNA-seq to characterize the FT-expressing companion cells (CCs) is very interesting and important. Two findings are novel: 1) Expression of FT in CCs is not uniform. Only a subcluster of CCs exhibits high expression level of FT. 2) Based on consensus binding motifs enriched in this subcluster, they further identify NITRATE-INDUCIBLE GARP-TYPE TRANSCRIPTIONAL REPRESSOR 1 (NIGT1)-like transcription factors as potential new regulators of FT.

      We are pleased to hear that reviewer 1 noted the novelty and importance of our work. As reviewer 1 mentioned, we are also excited about the identification of a subcluster of companion cells with very high FT expression. We believe that this work is an initial step to describe the molecular characteristics of these FT-expressing cells. We are also excited to share our new findings on NIGT1s as potential FT regulators. We think that this finding will attract a broader audience, as the molecular factor that coordinates plant nutrition status with flowering time remains largely unknown, even though the phenomenon itself is well known in plants.

      Weaknesses:

      (1) Title: "A florigen-expressing subpopulation of companion cells". It is a bit misleading. The conclusion here is that only a subset of companion cells exhibit high expression of FT, but this does not imply that other companion cells do not express it at all.

      We agree with this comment, as we did not intend to say that FT is not produced in companion cells other than the subpopulation we identified. We will revise the title to reflect this point more accurately.

      (2) Data quality: Authors opted for fluorescence-activated nuclei sorting (FANS) instead of traditional cell sorting method. What is the rationale behind this decision? Readers may wonder, especially given that RNA abundance in single nuclei is generally lower than that in single cells. This concern also applies to snRNA-seq data. Specifically, the number of genes captured was quite low, with a median of only 149 genes per nucleus. Additionally, the total number of nuclei analyzed was limited (1,173 for the pFT:NTF and 3,650 for the pSUC2:NTF). These factors suggest that the quality of the snRNA-seq data presented in this study is quite low. In this context, it becomes challenging for the reviewer to accurately assess whether this will impact the subsequent conclusions of the paper. Would it be possible to repeat this experiment and get more nuclei?

      We appreciate this comment; we realize that we did not clearly explain the rationale for using single-nucleus RNA sequencing (snRNA-seq) instead of single-cell RNA-seq (scRNA-seq). As reviewer 1 mentioned, RNA abundance in scRNA-seq is higher than in snRNA-seq. To conduct scRNA-seq using plant cells, protoplasting is a necessary step. However, in our study, protoplasting has many drawbacks for isolating our target cells from the phloem. It is technically challenging to efficiently isolate protoplasts of phloem companion cells, which are deeply embedded in plant tissues. Usually, protoplasting companion cells requires a minimum of several hours of enzymatic incubation, and the efficiency is still low. For our analysis, preserving the time-of-day information is also crucial. Therefore, we used a faster isolation method. In the revision, we will explain that we chose snRNA-seq because of these technical limitations.

      Here, reviewer 1 raised a concern about the quality of our snRNA-seq data, referring to the relatively low read counts per nucleus. Although we believe that shallow reads do not necessarily indicate low quality, and we are confident in the accuracy of our snRNA-seq data, as supported by the detailed follow-up experiments (e.g., imaging analysis in Fig. 4B), we agree that it is important to address this point in the revision and alleviate readers’ concerns regarding the data quality.

      (3) Another disappointment is that the authors did not utilize reporter genes to identify the specific locations of the FT-high expressing cells (cluster 7 cells) within the CC population in vivo. Are there any discernible patterns that can be observed?

      As we previously showed only limited spatial images of overlap between FT-expressing cells and other cluster 7 gene-expressing cells in Fig. 4B, this comment is understandable. To respond to it, we will include whole leaf images of FT- and cluster 7 gene-expressing cells to assess the spatial overlaps between FT and cluster 7 genes within a leaf.

      (4) The final disappointment is that the authors only compared FT expression between the nigtQ mutants and the wild type. Does this imply that the mutant does not have a flowering time defect particularly under high nitrogen conditions?

      To answer this question, we will include flowering time measurements of the nigtQ mutants grown on soil with sufficient nitrogen sources.

      Reviewer #2 (Public review):

      This manuscript submitted by Takagi et al. details the molecular characterization of the FT-expressing cell at a single-cell level. The authors examined what genes are expressed specifically in FT-expressing cells and other phloem companion cells by exploiting bulk nuclei and single-nuclei RNA-seq and transgenic analysis. The authors found the unique expression profile of FT-expressing cells at a single-cell level and identified new transcriptional repressors of FT such as NIGT1.2 and NIGT1.4.

      Although previous researchers have known that FT is expressed in phloem companion cells, they have tended to neglect the molecular characterization of the FT-expressing phloem companion cells. To understand how FT, which is expressed in tiny amounts in phloem companion cells that make up a very small portion of the leaf, can be a key molecule in the regulation of the critical developmental step of floral transition, it is important to understand the molecular features of FT-expressing cells in detail. In this regard, this manuscript provides insight into the understanding of detailed molecular characteristics of the FT-expressing cell. This endeavor will contribute to the research field of flowering time.

      We are grateful that reviewer 2 recognizes the importance of transcriptome profiling of FT-expressing cells at the single-cell level.

      Here are my comments on how to improve this manuscript.

      (1) The most noble finding of this manuscript is the identification of NIGT1.2 as the upstream regulator of FT-expressing cluster 7 gene expression. The flowering phenotypes of the nigtQ mutant and the transgenic plants in which NIGT1.2 was expressed under the SUC2 gene promoter support that NIGT1.2 functions as a floral repressor upstream of the FT gene. Nevertheless, the expression patterns of NIGT1.2 genes do not appear to have much overlap with those of NIGT1.2-downstream genes in the cluster 7 (Figs S14 and F3). An explanation for this should be provided in the discussion section.

      We agree with reviewer 2 that the spatial expression patterns of NIGT1.2 and cluster 7 genes do not overlap much, and that some discussion should be provided in the manuscript. Although we do not have a concrete explanation for this phenomenon, NIGT1.2 may suppress FT gene expression in non-cluster 7 cells to prevent the misexpression of FT. Another possible explanation is that NIGT1.2 negatively affects the formation of cluster 7 cells. If so, cells with high NIGT1.2 gene expression would rarely become cluster 7 cells. We will discuss this further in the discussion section of our revised manuscript.

      (2) To investigate gene expression in the nuclei of specific cell populations, the authors generated transgenic plants expressing a fusion gene encoding a Nuclear Targeting Fusion protein (NTF) under the control of various cell type-specific promoters. Since the public audience would not know about NTF without reading reference 16, some explanation of NTF is necessary in the manuscript. Please provide a schematic of constructs the authors used to make the transformants.

      As reviewer 2 pointed out, we lacked a clear explanation of why we used NTF in this study. NTF is a fusion protein that consists of a nuclear envelope targeting domain, GFP, and a biotin acceptor peptide. It was originally designed for the INTACT (isolation of nuclei tagged in specific cell types) method, which enables the isolation of bulk nuclei from specific tissues. Although our original intention was to profile the bulk transcriptome of mRNAs in nuclei of the FT-expressing cells using INTACT, we also utilized our NTF transgenic lines for snRNA-seq analysis. To explain NTF to readers, we will include a schematic diagram of NTF.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      We have carefully addressed all the reviewers' suggestions, and detailed responses are provided at the end of this letter. In summary:

      • We conducted two additional replicates of the study to obtain more robust and reliable data.

      • The Introduction has been revised for greater clarity and conciseness.

      • The Results section was shortened and reorganized to highlight the key findings more effectively.

      • The Discussion was modified according to the reviewers' suggestions, with a focus on reorganization and conciseness.

      We hope you find this revised version of the manuscript satisfactory.

      Reviewer #1 (Public Review):

      Summary:

      This study examines the role of host blood meal source, temperature, and photoperiod on the reproductive traits of Cx. quinquefasciatus, an important vector of numerous pathogens of medical importance. The host use pattern of Cx. quinquefasciatus is interesting in that it feeds on birds during spring and shifts to feeding on mammals towards fall. Various hypotheses have been proposed to explain the seasonal shift in host use in this species but have provided limited evidence. This study examines whether the shifting of host classes from birds to mammals towards autumn offers any reproductive advantages to Cx. quinquefasciatus in terms of enhanced fecundity, fertility, and hatchability of the offspring. The authors found no evidence of this, suggesting that alternate mechanisms may drive the seasonal shift in host use in Cx. quinquefasciatus.

      Strengths:

      Host blood meal source, temperature, and photoperiod were all examined together.

      Weaknesses:

      The study was conducted in laboratory conditions with a local population of Cx. quinquefasciatus from Argentina. I'm not sure if there is any evidence for a seasonal shift in the host use pattern in Cx. quinquefasciatus populations from the southern latitudes.

      Comments on the revision: 

      Overall, I am not quite convinced about the possible shift in host use in the Argentinian populations of Cx. quinquefasciatus. The evidence from the papers that the authors cite is not strong enough to derive this conclusion. Therefore, I think that the introduction and discussion parts where they talk about host shift in Cx. quinquefasciatus should be removed completely as it misleads the readers. I suggest limiting the manuscript to talking only about the effects of blood meal source and seasonality on the reproductive outcomes of Cx. quinquefasciatus.

      As mentioned in the previous revision, we agree with the reviewer's observation about the lack of evidence for a seasonal shift in the host use pattern in Cx. quinquefasciatus populations from Argentina. We have included this topic in the discussion.

      Additionally, we added a paragraph to the discussion section covering the limitations of our study and conclusions. One of them is that our results are based on experiments conducted under controlled conditions. Future studies are needed to determine whether the same trend is found in the field.

      Reviewer #1 (Recommendations for the authors): 

      Abstract

      Line 73: shift in feeding behavior

      Accepted as suggested. 

      Discussion

      Line 258: addressed that

      Accepted as suggested.

      Line 263: blood is nutritionally richer

      Accepted as suggested.

      Reviewer #2 (Public Review): 

      Summary:

      Conceptually, this study is interesting and is the first attempt to account for the potentially interactive effects of seasonality and blood source on mosquito fitness, which the authors frame as a possible explanation for previously observed host-switching of Culex quinquefasciatus from birds to mammals in the fall. The authors hypothesize that if changes in fitness by blood source change between seasons, higher fitness on birds in the summer and on mammals in the autumn could drive observed host switching. To test this, the authors fed individuals from a colony of Cx. quinquefasciatus on chickens (bird model) and mice (mammal model) and subjected each of these two groups to two different environmental conditions reflecting the high and low temperatures and photoperiod experienced in summer and autumn in Córdoba, Argentina (aka seasonality). They measured fecundity, fertility, and hatchability over two gonotrophic cycles. The authors then used a generalized linear model to evaluate the impact of host species, seasonality, and gonotrophic cycle on fecundity, fertility, and hatchability. The authors were trying to test their hypothesis by determining whether there was an interactive effect of season and host species on mosquito fitness. This is an interesting hypothesis; if it had been supported, it would provide support for a new mechanism driving host switching. While the authors did report an interactive impact of seasonality and host species, the directionality of the effect was the opposite from that hypothesized. The authors have done a very good job of addressing many of the reviewer concerns, with several exceptions that continue to cause concern about the conclusions of the study.

      Strengths:

      (1) Using a combination of laboratory feedings and incubators to simulate seasonal environmental conditions is a good, controlled way to assess the potentially interactive impact of host species and seasonality on the fitness of Culex quinquefasciatus in the lab.

      (2) The driving hypothesis is an interesting and creative way to think about a potential driver of host switching observed in the field. 

      (3) The manuscript has become a lot clearer and easier to read with the revisions - thank you to the authors for working hard to make many of the suggested changes. 

      Weaknesses:

      (1) The authors have decided not to follow the suggestion of conducting experimental replicates of the study. This is understandable given the significant investment of resources and time necessary, however, it leaves the study lacking support. Experimental replication is an important feature of a strong study and helps to provide confidence that the observed patterns are real and replicable. Without replication, I continue to lack confidence in the conclusions of the study. 

      We included replicates as suggested.  

      (2) The authors have included some additional discussion about the counterintuitive nature of their results, but the paragraph discussing this in the discussion was confusing. I believe that this should be revised. This is a key point of the paper and needs to be clear to the reader.

      Revised as suggested. 

      (3) There should be more discussion of the host switching observed in the two studies conducted in Argentina referenced by the authors. Since host switching is the foundation for the hypothesis tested in this paper, it is important to fully explain what is currently known in Argentina. 

      Accepted as suggested.

      (4) In some cases, the explanations of referenced papers are not entirely accurate. For example, when referencing Erram et al 2022, I think the authors misrepresented the paper's discussion regarding pre-diuresis- Erram et al. are suggesting that pre-diuresis might be the mechanism by which C. furens compensates for the lower nutritional value of avian blood, leading to no significant difference between avian/mammal blood on fecundity/fertility (rather than leading to higher fecundity on birds, as stated in this manuscript). The study performed by Erram et al. also didn't prove this phenomenon, they just suggest it as a possible mechanism to explain their results, so that should be made clear when referencing the paper. 

      Changed as suggested.

      (5) In some cases, the conclusions continue to be too strongly worded for the evidence available. For example, lines 322-324: I don't think the data is sufficient to conclude that a different physiological state is induced, nor that they are required to feed on a blood source that results in higher fitness. 

      The wording was modified as suggested to tie our discussion more closely to the results.

      (6) There is limited mention of the caveat that this experiment was performed with simulated seasonality that does not perfectly replicate seasonality in the field. I think this caveat should be discussed in the discussion (e.g. that humidity is held constant).

      This topic is now included in the discussion as suggested. 

      Reviewer #2 (Recommendations for the authors): 

      59-60: These terms should end with -phagic instead of -philic. These papers study blood feeding patterns, not preference. I understand that the Janssen paper calls it "mammalophilic" in its title, but this was an incorrect use of the term in that paper. There are some review papers that explain the difference in this terminology if it's helpful.

      Accepted as suggested. 

      73: edit to "in" feeding behavior 

      Accepted as suggested.

      77-78: Given that the premise of your study is based on the phenomenon of host switching, I suggest that you expand your discussion of these two papers. What did they observe? Which hosts did they switch from / to and how dramatic was the shift?

      Accepted as suggested. 

      79: replace acknowledged with experienced 

      Accepted as suggested.

      79-80: the way that this is written is misleading. It suggests that Spinsanti showed that seasonal variation in SLEV could be attributed to a host shift, which isn't true. This citation should come before the comma and then you should use more cautious language in the second half. E.g which MIGHT be possible to attribute to .... 

      Accepted as suggested.

      80-82: this is not convincing. Even if the Robin isn't in Argentina, Argentina does have migrating birds, so couldn't this be the case for other species of birds? Do any of the birds observed in previous blood meal analyses in Argentina migrate? If so, couldn't this hypothesis indeed play a role? 

      A paragraph about this topic was added to the discussion as suggested.

      90: hypotheses for what? The fall peak in cases? Or host switching? 

      Changed to be clearer.

      98: where was this mentioned before? I think "as mentioned before" can be removed. 

      Accepted as suggested.

      101: edit to "whether an interaction effect exists" 

      Accepted as suggested.

      104: edit to "We hypothesize that..." 

      Accepted as suggested.

      106: reported host USE changes, not host PREFERENCE changes, right? 

      All the terminology was changed to refer to host use pattern rather than preference, to avoid confusion.

      200: Briefly reading Carsey and Harden, it looks like the methodology was developed for social science. Is there anything you can cite to show this applied to other types of data? If not, I think this requires more explanation in your MS. 

      This was removed as replicates were included.

      237-239: I think it is best not to make a definitive statement about greater/higher if it isn't statistically significant; I suggest modifying the sentences to state that the differences you are listing were not significantly different up front rather than at the end, otherwise if people aren't reading carefully, they may get the wrong impression. 

      Accepted as suggested.

      245: you only use the term MS-I once before and I forgot what it meant since it wasn't repeated, so I had to search back through with command-F. I suggest writing this out rather than using the acronym. 

      Accepted as suggested.

      249: edit to: "an interaction exists between the effect of..." 

      Accepted as suggested.

      253-254: greater compared to what? 

      Changed for clarity.

      258-260: edit for grammar

      Accepted as suggested.

      260-262: edit for grammar; e.g. "However, this assumption lacks solid evidence; there is a scarcity of studies regarding nutritional quality of avian blood and its impact on mosquito fitness." 

      Accepted as suggested.

      263: edit: blood is nutritionally... 

      Accepted as suggested.

      264-267: This doesn't sound like an accurate interpretation of what the paper suggests regarding pre-diuresis in their discussion - they are suggesting that pre-diuresis might be the mechanism by which C. furens compensates for the lower nutritional value of avian blood, leading to no significant difference between avian/mammal blood on fecundity/fertility. They also don't show this, they just suggest it as a possible mechanism to explain their results. 

      This topic was removed given the restructuring of discussion.

      253-269: You should tie this paragraph back to your results to explicitly compare/contrast your findings with the previous literature. 

      Accepted as suggested.

      270-282: This paragraph would be a good place to explain the caveat of working in the laboratory - for example, humidity was the same across the two seasons which I'm guessing isn't the case in the field in Argentina. You can discuss what aspects of laboratory season simulation do not accurately replicate field conditions and how this can impact your findings. You said in your response to the reviewers that you weren't interested in measuring other variables (which is fair, and not expected!), but the beauty of the discussion section is to be able to think about how your experimental design might impact your results - one possibility is that your season simulation may not have produced the results produced by true seasonal shifts. 

      Accepted as suggested.

      279-281: You say your experiment was conducted within the optimal range, which would suggest that both summer and autumn were within that range, but then you only talk about summer as optimal in the following sentence. 

      Changed for clarity.

      281-282: You should clarify this sentence - state what the interaction has an effect on. 

      Accepted as suggested.

      283-291: I appreciate that your discussion now acknowledges the small sample size and the questions that remain unanswered due to the results being opposite to that of the hypothesis, but this paragraph lacks some details and in places doesn't make sense. 

      I think you need to emphasize which groups had small sample size and which conclusions that might impact. I also think you need to explain why the sample size was substantially smaller for some groups (e.g. did they refuse to feed on the mouse in the autumn?). I appreciate that sample sizes are hard to keep high across many groups and two gonotrophic periods, but unfortunately, that is why fitness experiments are so hard to do and by their nature, take a long time. I understand that other papers have even lower sample size, but I was not asked to review those papers and would have had the same critique of them. I don't believe that creating simulated data via a Monte Carlo approach can make up for generating real data. As I understand it from your explanation, you are parametrizing the Monte Carlo simulations with your original data, which was small to begin with for autumn mouse. Using this simulation doesn't seem like a satisfactory replacement for an experimental replicate in my opinion. I maintain that at least a second replicate is necessary to see whether the patterns that you have observed hold. 

      We performed a power analysis and added more replicates to address the issue of sample size. More on this critique has been added to the discussion. The simulation approach was removed entirely.

      Regarding the directionality of the interaction effect, I think this warrants more discussion. Lines 287-291 don't make sense to me. You suggest that feeding on birds in the autumn may confer a reproductive advantage when conditions are more challenging. But then why wouldn't they preferentially feed on birds in the autumn, rather than mammals? I suggest rewriting this paragraph to make it clearer. 

      Accepted as suggested.

      297: earlier mentioned treatments? Do you mean compared to the first gonotrophic cycle? This isn't clear. 

      Changed for clarity.

      302-303: Did you clarify whether you are allowed to reference unpublished data in eLife? 

      This was removed to follow the guidelines of eLife.

      316-317: "it becomes apparent" sounds awkward, I suggest rewording and also explaining how this conclusion was made. 

      Accepted as suggested.

      322-324: I think that this statement is too strongly worded. I don't think your data is sufficient to conclude that a different physiological state is induced, nor that they are required to feed on a blood source that results in higher fitness. Please modify this and make your conclusions more cautious and closely linked to what you actually demonstrated. 

      Accepted as suggested.

      325: change will perform to would have 

      Accepted as suggested.

      326: add to the sentence: "and vice versa in the summer" 

      Accepted as suggested.

      330: possible explanations, not explaining scenarios. 

      Accepted as suggested.

      517: I think you should repeat the abbreviation definitions in the caption to make it easier for readers, otherwise they have to flip back and forth which can be difficult depending on formatting.

      Accepted as suggested. 

      In general, I think that your captions need more information. I think the best captions explain the figure relatively thoroughly such that the reader can look at the figure and caption and understand without reading the paper in depth. (e.g. the statistical test used).

      Data availability: The eLife author instructions do say that data must be made available, so there should be a statement on data availability in your MS. I also suggest you make the code available.

      Accepted as suggested.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      BMP signaling is, arguably, best known for its role in the dorsoventral patterning, but not in nematodes, where it regulates body size. In their paper, Vora et al. analyze ChIP-Seq and RNA-Seq data to identify direct transcriptional targets of SMA-3 (Smad) and SMA-9 (Schnurri) and understand the respective roles of SMA-3 and SMA-9 in the nematode model Caenorhabditis elegans. The authors use publicly available SMA-3 and SMA-9 ChIP-Seq data, their own RNA-Seq data from SMA-3 and SMA-9 mutants, and bioinformatic analyses to identify the genes directly controlled by these two transcription factors (TFs) and find approximately 350 such targets for each. They show that all SMA-3-controlled targets are positively controlled by SMA-3 binding, while SMA-9-controlled targets can be either up or downregulated by SMA-9. 129 direct targets were shared by SMA-3 and SMA-9, and, curiously, the expression of 15 of them was activated by SMA-3 but repressed by SMA-9. Since genes responsible for cuticle collagen production were prominent among the SMA-3 targets, the authors focused on trying to understand the body size defect known to be elicited by the modulation of BMP signaling. Vora et al. provide compelling evidence that this defect is likely to be due to problems with the BMP signaling-dependent collagen secretion necessary for cuticle formation.

      We thank the reviewer for this supportive summary. We would like to clarify the status of the publicly available ChIP-seq data. We generated the GFP-tagged SMA-3 and SMA-9 strains and submitted them to be entered into the queue for ChIP-seq processing by the modENCODE (later modERN) consortium. Thus, the publicly available SMA-3 and SMA-9 ChIP-seq datasets used here were derived from our efforts. Due to the nature of the consortium’s funding, the data were required to be released publicly upon completion. Nevertheless, our current manuscript provides the first comprehensive analysis of these datasets. We have updated the text to clarify this point.

      Strengths:

      Vora et al. provide a valuable analysis of ChIP-Seq and RNA-Seq datasets, which will be very useful for the community. They also shed light on the mechanism of the BMP-dependent body size control by identifying SMA-3 target genes regulating cuticle collagen synthesis and by showing that downregulation of these genes affects body size in C. elegans.

      Weaknesses:

      (1) Although the analysis of the SMA-3 and SMA-9 ChIP-Seq and RNA-Seq data is extremely useful, the goal "to untangle the roles of Smad and Schnurri transcription factors in the developing C. elegans larva", has not been reached. While the role of SMA-3 as a transcriptional activator appears to be quite straightforward, the function of SMA-9 in the BMP signaling remains obscure. The authors write that in SMA-9 mutants, body size is affected, but they do not show any data on the mechanism of this effect.

      We thank the reviewer for directing our attention to the lack of clarity about SMA-9’s function. We have revised the text to highlight what this study and others demonstrate about SMA-9’s role in body size. Simply stated, SMA-9 is needed together with SMA-3 to promote the expression of genes involved in one-carbon metabolism, collagens, and chaperones, all of which are required for body size. SMA-3 has additional, SMA-9-independent transcriptional targets, including chaperones and ER secretion factors, that also contribute to body size. Finally, SMA-9 regulates additional targets independent of SMA-3 that likely have a minimal role in body size. We have adjusted Figure 5 with new graphs of the original data to make these points clearer.

      (2) The authors clearly show that both TFs can bind independently of each other, however, by using distances between SMA-3 and SMA-9 ChIP peaks, they claim that when the peaks are close these two TFs act as complexes. In the absence of proof that SMA-3 and SMA-9 physically interact (e.g. that they co-immunoprecipitate - as they do in Drosophila), this is an unfounded claim, which should either be experimentally substantiated or toned down.

      We acknowledge that we have not demonstrated a physical interaction between SMA-3 and SMA-9 through a co-immunoprecipitation, and we have indicated in the text that a formal biochemical demonstration would be required to make this point. Moreover, we toned down the text by stating that our results suggest that SMA-3 and SMA-9 frequently bind either as subunits in a complex or in close vicinity to each other along the DNA. As the reviewer has indicated, a physical interaction between Smads and Schnurris has been amply demonstrated in other systems. A limitation in these previous studies is that only a small number of target genes were analyzed. Our goal in this study was to determine how widespread this interaction is on a genomic scale. Our analyses demonstrate for the first time that a Schnurri transcription factor has significant numbers of both Smad-dependent and Smad-independent target genes. We have revised the text to clarify this point.
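As a purely illustrative aside, the kind of peak-proximity measurement discussed in this exchange can be sketched as follows. This is a minimal sketch and not the authors' pipeline: the function names, the peak coordinates, and the 500 bp threshold are all hypothetical, and proximity computed this way says nothing by itself about physical complex formation.

```python
from bisect import bisect_left


def nearest_distance(position, sorted_positions):
    """Distance from `position` to the closest entry of a non-empty sorted list."""
    i = bisect_left(sorted_positions, position)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = sorted_positions[max(i - 1, 0): i + 1]
    return min(abs(position - p) for p in candidates)


def fraction_close(peaks_a, peaks_b, max_distance):
    """Fraction of peaks in `peaks_a` lying within `max_distance` bp of a peak in `peaks_b`.

    Both arguments map a chromosome name to a list of peak summit positions.
    """
    close = 0
    total = 0
    for chrom, positions in peaks_a.items():
        others = sorted(peaks_b.get(chrom, []))
        for pos in positions:
            total += 1
            if others and nearest_distance(pos, others) <= max_distance:
                close += 1
    return close / total if total else 0.0


# Hypothetical summit coordinates, invented for illustration only.
sma3_peaks = {"I": [1000, 5000, 20000], "II": [300]}
sma9_peaks = {"I": [1050, 12000], "II": [5000]}

print(fraction_close(sma3_peaks, sma9_peaks, max_distance=500))
```

Repeating the same calculation against peak sets from unrelated transcription factors would give a baseline for judging whether an observed overlap is unusually high.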

      (3) The second part of the paper (the collagen story) is very loosely connected to the first part. dpy-11 encodes an enzyme important for cuticle development, and it is a differentially expressed direct target of SMA-3. dpy-11 can be bound by SMA-9, but it is not affected by this binding according to RNA-Seq. Thus, technically, this part of the paper does not require any information about SMA-9. However, this can likely be improved by addressing the function of the 15 genes, with the opposing mode of regulation by SMA-3 and SMA-9.

      We appreciate this suggestion and have clarified in the text how SMA-9 contributes to collagen organization and body size regulation.

      (4) The Discussion does not add much to the paper - it simply repeats the results in a more streamlined fashion.

      We thank the reviewer for this suggestion. We have added more context to the Discussion.

      Reviewer #2 (Public Review):

      In the present study, Vora et al. elucidated the transcription factors downstream of the BMP pathway components Smad and Schnurri in C. elegans and their effects on body size. Using a combination of a broad range of techniques, they compiled a comprehensive list of genome-wide downstream targets of the Smads SMA-3 and SMA-9. They found that both proteins have an overlapping spectrum of transcriptional target sites they control, but also unique ones. Thereby, they also identified genes involved in one-carbon metabolism or the endoplasmic reticulum (ER) secretory pathway. In an elaborate effort, the authors set out to characterize the effects of numerous of these targets on the regulation of body size in vivo as the BMP pathway is involved in this process. Using the reporter ROL-6::wrmScarlet, they further revealed that not only collagen production, as previously shown, but also collagen secretion into the cuticle is controlled by SMA-3 and SMA-9. The data presented by Vora et al. provide in-depth insight into the means by which the BMP pathway regulates body size, thus offering a whole new set of downstream mechanisms that are potentially interesting to a broad field of researchers.

      The paper is mostly well-researched, and the conclusions are comprehensive and supported by the data presented. However, certain aspects need clarification and potentially extended data.

      (1) The BMP pathway is active during development and growth. Thus, it is logical that the data shown in the study by Vora et al. is based on L2 worms. However, it raises the question of if and how the pattern of transcriptional targets of SMA-3 and SMA-9 changes with age or in the male tail, where the BMP pathway also has been shown to play a role. Is there any data to shed light on this matter or are there any speculations or hypotheses?

      We agree that these are intriguing questions, and we are interested in the roles of transcriptional targets at other developmental stages and in other physiological functions, but these analyses are beyond the scope of the current study.

      (2) As it was shown that SMA-3 and SMA-9 potentially act in a complex to regulate the transcription of several genes, it would be interesting to know whether the two interact with each other or if the cooperation is more indirect.

      A physical interaction between Smads and Schnurri has been amply demonstrated in other systems. Our goal in this study was not to validate this physical interaction, but to analyze functional interactions on a genome-wide scale.

      (3) It would help the understanding of the data even more if the authors could specifically state if there were collagens among the genes regulated by SMA-3 and SMA-9 and which.

      We thank the reviewer for this suggestion. col-94 and col-153 were identified as direct targets of both SMA-3 and SMA-9. We noted this in the Discussion.

      (4) The data on the role of SMA-3 and SMA-9 in the regulation of the secretion of collagens from the hypodermis is highly intriguing. The authors use ROL-6 as a reporter for the secretion of collagens. Is ROL-6 a target of SMA-9 or SMA-3? Even if this is not the case, the data would gain even more strength if a comparable quantification of the cuticular levels of ROL-6 were shown in Figure 6, and potentially a ratio of cuticular versus hypodermal levels. By that, the levels of secretion versus production can be better appreciated.

      We previously showed that rol-6 mRNA levels are reduced in dbl-1 mutants at L2, but the RNA-seq analysis did not detect a statistically significant change in rol-6 that would qualify it as a transcriptional target, and total protein levels are also not significantly reduced in mutants. We added this information to the text.

      (5) It is known that the BMP pathway controls several processes besides body size. The discussion would benefit from a broader overview of how the identified genes could contribute to body size. The focus of the study is on collagen production and secretion, but it would be interesting to have some insights into whether and how other identified proteins could play a role or whether they are likely to not be involved here (such as the ones normally associated with lipid metabolism, etc.).

      We have added more information to the Discussion.

      Reviewer #1 (Recommendations For The Authors):

      Figure 1 - Figure 3: The authors might want to think about condensing this into two figures.

      To avoid confusion with the different workflows, we prefer to keep these as three separate figures.

      Figure 1a-b: Measurement unit missing on X.

      We added the unit “bps” to these graphs.

      Line 244-246: The authors should stress in the Results that they analyzed publicly available ChIP-Seq data, which was not generated by them, - not just by providing a reference to Kudron et al., 2018. As far as I understood, ChIP was performed with an anti-GFP antibody. Please mention this, and specify the information about the vendor and the catalog number in the Methods.

      We would like to clarify the status of the publicly available ChIP-seq data. We generated the GFP-tagged SMA-3 and SMA-9 strains and submitted them to be entered into the queue for ChIP-seq processing by the modENCODE (later modERN) consortium. Thus, the publicly available SMA-3 and SMA-9 ChIP-seq datasets used here were derived from our efforts. Due to the nature of the consortium’s funding, the data were required to be released publicly upon completion. Nevertheless, our current manuscript provides the first comprehensive analysis of these datasets. We have clarified these issues in the text. We have also added information regarding the anti-GFP antibody to the Methods.

      Line 267-270: The authors should either provide experimental evidence that SMA-3 and SMA-9 form complexes or write something like "significant overlap between SMA-3 and SMA-9 peaks may indicate complex formation between these two transcription factors as shown in Drosophila" - but in the absence of proof, this must be a point for the Discussion, not for the Results. Moreover, similar behavior of fat-6 (overlapping ChIP peaks) and nhr-114 (non-overlapping ChIP peaks) in SMA-3 and SMA-9 mutants may be interpreted as a circumstantial argument against SMA-3/SMA-9 complex formation (see Lines 342-348). Importantly, since ChIP-Seq data are available for a wide array of C. elegans TFs, it would be very useful to have an estimate of whether SMA-3/SMA-9 peak overlap is significantly higher than the peak overlap between SMA-3 and several other TFs expressed at the same L2 stage.

      We have clarified our goals regarding SMA-3 and SMA-9 interactions and softened our conclusions by indicating in the text that a formal biochemical demonstration would be required to demonstrate a physical interaction. Moreover, we toned down the text by stating that our results suggest that SMA-3 and SMA-9 frequently bind either as subunits in a complex or in close vicinity to each other along the DNA. We have added an analysis of HOT sites to address overlap of binding with other transcription factors. We disagree with the interpretation that transcription factors with non-overlapping sites cannot act together to regulate gene expression; however, nhr-114 also has an overlapping SMA-3 and SMA-9 site, so this point becomes less relevant. We have clarified the categorization of nhr-114 in the text.

      Lines 272-292: The authors do not comment on the seemingly quite small overlap between the RNA-Seq and the ChIP-Seq dataset, but I think they should. They have 3205 SMA-3 ChIP peaks and 1867 SMA-3 DEGs, but the number of directly regulated targets is 367. It is important that the authors provide information on the number of genes to which their peaks have been assigned. Clearly, this will not be one gene per peak, but if it were, this would mean that just 11.5% of bound targets are really affected by the binding. The same number would be 4.7% for the SMA-9 peaks.

      We have added a discussion of the discrepancy between binding sites and DEGs. The high number of additional sites classified as non-functional could represent the detection of weak affinity targets that do not have an actual biological purpose. Alternatively, these sites could have an additional role in DBL-1 signaling besides transcriptional regulation of nearby genes, or they could be regulating the expression of target genes at a far enough distance to not be detected by our BETA analysis as per the constraints chosen for the analysis. The difference between total binding sites and those associated with changes in gene expression underscores the importance of combining RNA-seq with ChIP-seq to identify the most biologically relevant targets. And as the reviewer indicated, more than one gene can be assigned to a single neighboring peak.
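
The reviewer's 11.5% figure can be reproduced with a short calculation. This is only a worked check of the arithmetic in the comment above, under the reviewer's own simplifying assumption of one gene per peak; the variable names are illustrative.

```python
# Numbers quoted in the reviewer's comment for SMA-3.
sma3_chip_peaks = 3205      # SMA-3 ChIP-seq peaks
sma3_direct_targets = 367   # genes both bound and differentially expressed

# Assumption (stated by the reviewer): one gene per peak.
fraction_functional = sma3_direct_targets / sma3_chip_peaks

print(f"{fraction_functional:.1%}")  # 11.5%
```

Because more than one gene can be assigned to a single peak, this percentage is a rough estimate rather than an exact count of functional binding sites.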

      Lines 294-323: I feel like there is a terminology problem, which makes reading very difficult. The authors use "direct targets" as bound genes with significant expression change, but then run into a problem when the gene is bound by SMA-9 and SMA-3, but significant expression change is only associated with one of the two factors. I am not sure this is consistent with the idea of the SMA3/SMA9 complex. Also, different modalities of the SMA3 and SMA9 effect in 15 cases can be explained by co-factors. Reading would be also simplified if the order of the panels in Figure 3 were different. Currently, the authors start their explanation by referring to the shared SMA-3/SMA-9 targets (Figures 3c-d), and only later come to Figure 3b. In general, the authors should start with a clear explanation of what is on the figure (currently starting on Line 313), otherwise, it is unclear why, if the authors only discuss common targets, it is not just 114+15=129 targets, but more.

      We have re-ordered the columns in Figure 3 to match the order discussed in the text. We also incorporated more precise language about regulation by SMA-3 and/or SMA-9 in the text.

      Lines 325-355: The chapter has a rather unfortunate name "Mechanisms of integration of SMA-3 and SMA-9 function", although the authors do not provide any mechanism. Using 3 target genes, they show that if the regulatory modality of SMA-3 and SMA-9 is the same (2 examples), there is no difference in the expression of the targets, but if the modalities are opposing (1 example), SMA-9 repressive action is epistatic to the SMA-3 activating action. Can this be generalized? The authors should test all their 15 targets with opposite regulations. Moreover, it seems obvious to ask whether the intermediate phenotype of the double-mutants can be attributed to the action of these 15 genes activated by SMA-3 and repressed by SMA-9. I would suggest testing this by RNAi. I would also suggest renaming the chapter to something better reflecting its content.

      We have removed the word “mechanism” from the title of this section. We also performed additional RT-PCR experiments on another 5 targets with opposing directions of regulation. The results from these genes are consistent with the result from C54E4.5, demonstrating that the epistasis of sma-9 is generalizable.

      Figure 4b: Why was a two-way ANOVA performed here? With the small number of measurements, I would consider using a non-parametric test.

      These data are parametric and the distribution of the data is normal, so we chose to use a parametric test (ANOVA).

      Lines 354-355. The authors offer two suggestions for the mechanism of the epistatic action of SMA-9 on SMA-3 in the case of C54E4.5, but this is something for the Discussion. If they want to keep it in the Results they should address this experimentally by performing SMA-3 ChIP-seq in the SMA-9 mutants and SMA-9 ChIP-Seq in the SMA-3 mutants.

      We moved these models to the discussion as suggested.

      Lines 365-367: "We expect that clusters of genes involved in fatty acid metabolism and innate immunity mediate the physiological functions of BMP signaling in fat storage and pathogen resistance, respectively." - This is pretty confusing since the Authors claim in the previous sentence that regulation of immunity by SMA-9 is TGF-beta independent.

      Co-regulation of immunity by BMP signaling and SMA-9 is already known. The novel insight is that SMA-9 may have an additional independent role in immunity. We have clarified the language to address this confusion.

      Lines 377, and 380: Please explain in non-C. elegans-specific terminology, what rrf-3 and LON-2 are (e.g. write "glypican LON-2" instead of just "LON-2") and add relevant references.

      We added information on the proteins encoded by these genes.

      Lines 382-384: I am not sure what the Authors mean here by "more limiting".

      We substituted the phrase “might have a more prominent requirement in mediating the exaggerated growth defect of a lon-2 mutant”.

      Lines 388-392: I found this very confusing. What were these 36 genes? Were these direct targets of SMA-3, SMA-9, or both? Top 36 targets? 36 targets for which mutants are available?

      The new Figure 5 clarifies whether target genes are SMA-3-exclusive, SMA-9-exclusive, or co-regulated. The text was also updated for clarity.

      Line 397: This is the first time the authors mention dpy-11 but they do not say what it is until later, and they do not say whether it is a target of SMA3/SMA9. Checking Figure 3, I found that it is among the 238 genes bound by both but upregulated only by SMA3. The authors need to explicitly state this - from this point on, they have a section for which SMA-9 appears to be irrelevant.

      We added the molecular function of dpy-11 at its first mention. Furthermore, we included the hypothesis that SMA-3 may regulate collagen secretion independently of SMA-9. Our subsequent results with sma-9 mutants disprove this hypothesis.

      Line 402: Is ROL-6 a SMA-3/SMA-9 target or just a marker gene?

      We previously showed that rol-6 mRNA levels are reduced in dbl-1 mutants at L2, but RNA-seq analysis did not find enough of a statistically significant change in rol-6 to qualify it as a transcriptional target and total levels of protein are also not significantly reduced in mutants. We added this information in the text.

      Line 421: I am not sure what "more skeletonized" means.

      Replaced with “thinner and skeletonized”

      Figure 2b and 2d legends: "Non-target genes nevertheless showing differential expression are indicated with green squares." (l. 581-582 and again l. 588-589) I think should be "Non-direct target genes...".

      Changed to “non-direct target genes”

      Figure 7 legend: Please indicate the scale bar size in the legend.

      Indicated the scale bar size in the legend.

      Figure 7: The ER marker is referred to as "ssGFP::KDEL" (in the image and Line 700), however in the text it is called "KDEL::oxGFP" (Line 419). Please use consistent naming.

      We fixed the inconsistent naming.

      All the experiment suggestions made are optional and can, in principle, be ignored if the authors tone down their claims (for example, the SMA-3/SMA-9 complex formation).

      Reviewer #2 (Recommendations For The Authors):

      (1) As a control: Have the authors found the known regulated genes among the differentially regulated ones?

      Previously known target genes such as fat-6 and zip-10 were identified here. We have added this information in the text.

      (2) How many repetitions were performed in Figure 4b? I am wondering as the deviation for C54E4.5 is quite large and that makes me worry that the significant differences stated are not robust.

      There were two biologically independent collections from which three cDNA syntheses were analyzed using two technical replicates per point.

      (3) Lines 333-336: Can you really make this claim that the antagonistic effects seen in the regulation of body size can be correlated with some targets being regulated in the opposite direction? I would assume that the situation is far more complex as SMADs also regulate other processes.

      We agree with the reviewer that multiple models could explain this antagonism, and we have added distinct alternatives in the text.

      (4) Lines 367-369: Add the respective reference please.

      We have added the relevant references.

    1. Author response:

      The following is the authors’ response to the original reviews.

      The revised manuscript contains new results and additional text. Major revisions:

      (1) Additional simulations and analyses of networks with different biophysical parameters and with identical time constants for E and I neurons (Methods, Supplementary Fig. 5).

      (2) Additional simulations and analyses of networks with modifications of connectivity parameters to further analyze effects of E/I assemblies on manifold geometry (Supplementary Fig. 6).

      (3) Analysis of synaptic current components (Figure 3 D-F; to analyze mechanism of modest amplification in Tuned networks). 

      (4) More detailed explanation of pattern completion analysis (Results).

      (5) Analysis of classification performance of Scaled networks (Supplementary Fig.8).

      (6) Additional analysis (Figure 5D-F) and discussion (particularly section “Computational functions of networks with E/I assemblies”) of functional benefits of continuous representations in networks with E-I assemblies. 

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      Meissner-Bernard et al present a biologically constrained model of telencephalic area of adult zebrafish, a homologous area to the piriform cortex, and argue for the role of precisely balanced memory networks in olfactory processing. 

      This is interesting as it can add to recent evidence on the presence of functional subnetworks in multiple sensory cortices. It is also important in deviating from traditional accounts of memory systems as attractor networks. Evidence for attractor networks has been found in some systems, like in the head direction circuits in the flies. However, the presence of attractor dynamics in other modalities, like sensory systems, and their role in computation has been more contentious. This work contributes to this active line of research in experimental and computational neuroscience by suggesting that, rather than being represented in attractor networks and persistent activity, olfactory memories might be coded by balanced excitation-inhibitory subnetworks. 

      Strengths: 

      The main strength of the work is in: (1) direct link to biological parameters and measurements, (2) good controls and quantification of the results, and (3) comparison across multiple models. 

      (1) The authors have done a good job of gathering the current experimental information to inform a biological-constrained spiking model of the telencephalic area of adult zebrafish. The results are compared to previous experimental measurements to choose the right regimes of operation. 

      (2) Multiple quantification metrics and controls are used to support the main conclusions and to ensure that the key parameters are controlled for - e.g. when comparing across multiple models.

      (3) Four specific models (random, scaled I / attractor, and two variants of specific E-I networks - tuned I and tuned E+I) are compared with different metrics, helping to pinpoint which features emerge in which model. 

      Weaknesses: 

      Major problems with the work are: (1) mechanistic explanation of the results in specific E-I networks, (2) parameter exploration, and (3) the functional significance of the specific E-I model. 

      (1) The main problem with the paper is a lack of mechanistic analysis of the models. The models are treated like biological entities and only tested with different assays and metrics to describe their different features (e.g. different geometry of representation in Fig. 4). Given that all the key parameters of the models are known and can be changed (unlike biological networks), it is expected to provide a more analytical account of why specific networks show the reported results. For instance, what is the key mechanism for medium amplification in specific E/I network models (Fig. 3)? How does the specific geometry of representation/manifolds (in Fig. 4) emerge in terms of excitatory-inhibitory interactions, and what are the main mechanisms/parameters? Mechanistic account and analysis of these results are missing in the current version of the paper. 

      We agree that further mechanistic insights would be of interest and addressed this issue at different levels:

      (1) Biophysical parameters: to determine whether network behavior depends on specific choices of biophysical parameters in E and I neurons we equalized biophysical parameters across neuron types. The main observations are unchanged, suggesting that the observed effects depend primarily on network connectivity (see also response to comment [2]).

      (2) Mechanism of modest amplification in E/I assemblies: analyzing the different components of the synaptic currents demonstrates that the modest amplification of activity in Tuned networks results from an “imperfect” balance of recurrent excitation and inhibition within assemblies (see new Figures 3D-F and text p.7). Hence, E/I co-tuning substantially reduces the net amplification in Tuned networks as compared to Scaled networks, thus preventing discrete attractor dynamics and stabilizing network activity, but a modest amplification still occurs, consistent with biological observations.

      (3) Representational geometry: to obtain insights into the network mechanisms underlying effects of E/I assemblies on the geometry of population activity we tested the hypothesis that geometrical changes depend, at least in part, on the modest amplification of activity within E/I assemblies (see Supplementary Figure 6). We changed model parameters to either prevent the modest amplification in Tuned networks (increasing I-to-E connectivity within assemblies) or introduce a modest amplification in subsets of neurons by other mechanisms (concentration-dependent increase in the excitability of pseudo-assembly neurons; Scaled I networks with reduced connectivity within assemblies). Manipulations that introduced a modest, input-dependent amplification in neuronal subsets had geometrical effects similar to those observed in Tuned networks, whereas manipulations that prevented a modest amplification abolished these effects (Supplementary Figure 6). Note however that these manipulations generated different firing rate distributions. These results provide a starting point for more detailed analyses of the relationship between network connectivity and representational geometry (see p.12).

      In summary, our additional analyses indicate that effects of E/I assemblies on representational geometry depend primarily on network connectivity, rather than specific biophysical parameters, and that the resulting modest amplification of activity within assemblies makes an important contribution. Further analyses may reveal more specific relationships between E/I assemblies and representational geometry, but such analyses are beyond the scope of this study.

      (2) The second major issue with the study is a lack of systematic exploration and analysis of the parameter space. Some parameters are biologically constrained, but not all the parameters. For instance, it is not clear what the justification for the choice of synaptic time scales is (with E synaptic time constants being larger than inhibition: tau_syn_I = 10 ms, tau_syn_E = 30 ms). How would the results change if these - and other unconstrained - parameters were varied? It is important to show how the main results, especially the manifold localisation, would change by doing a systematic exploration of the key parameters and performing some sensitivity analysis. This would also help to see how robust the results are, which parameters are more important and which parameters are less relevant, and to shed light on the key mechanisms.  

      We thank the reviewer for raising this point. We chose a relatively slow time constant for excitatory synapses because experimental data indicate that excitatory synaptic currents in Dp and piriform cortex contain a prominent NMDA component. Nevertheless, to assess whether network behavior depends on specific choices of biophysical parameters in E and I neurons, we have performed additional simulations with equal synaptic time constants and equal biophysical parameters for all neurons. Each neuron also received the same number of inputs from each population (see revised Methods). Results were similar to those observed previously (Supplementary Fig.5 and p.9 of main text). We therefore conclude that the main effects observed in Tuned networks cannot be explained by differences in biophysical parameters between E and I neurons but are primarily a consequence of network connectivity.

      (3) It is not clear what the main functional advantage of the specific E-I network model is compared to random networks. In terms of activity, they show that specific E-I networks amplify the input more than random networks (Fig. 3). But when it comes to classification, the effect seems to be very small (Fig. 5c). Description of different geometry of representation and manifold localization in specific networks compared to random networks is good, but it is more of an illustration of different activity patterns than proving a functional benefit for the network. The reader is still left with the question of what major functional benefits (in terms of computational/biological processing) should be expected from these networks, if they are to be a good model for olfactory processing and learning. 

      One possibility for instance might be that the tasks used here are too easy to reveal the main benefits of the specific models - and more complex tasks would be needed to assess the functional enhancement (e.g. more noisy conditions or more combination of odours). It would be good to show this more clearly - or at least discuss it in relation to computation and function. 

      In the previous manuscript, the analysis of potential computational benefits other than pattern classification was limited and the discussion of this issue was condensed into a single itemized paragraph to avoid excessive speculation. Although a thorough analysis of potential computational benefits exceeds the scope of a single paper, we agree with the reviewer that this issue is of interest and therefore added additional analyses and discussion.

      In the initial manuscript we analyzed pattern classification primarily to investigate whether Tuned networks can support this function at all, given that they do not exhibit discrete attractor states. We found this to be the case, which we consider a first important result.

      Furthermore, we found that precise balance of E/I assemblies can protect networks against catastrophic firing rate instabilities when assemblies are added sequentially, as in continual learning. Results from these simulations are now described and discussed in more detail (see Results p.11 and Discussion p.13).

      In the revised manuscript, we now also examine additional potential benefits of Tuned networks and discuss them in more detail (see new Figure 5D-F and text p.11). One hypothesis is that continuous representations provide a distance metric between a given input and relevant (learned) stimuli. To address this hypothesis, we (1) performed regression analysis and (2) trained support vector machines (SVMs) to predict the concentration of a given odor in a mixture based on population activity. In both cases, Tuned E+I networks outperformed Scaled and rand networks in predicting the concentration of learned odors across a wide range of mixtures (Figure 5D-F). E/I assemblies therefore support the quantification of learned odors within mixtures or, more generally, assessments of how strongly a (potentially complex) input is related to relevant odors stored in memory. Such a metric assessment of stimulus quality is not well supported by discrete attractor networks because inputs are mapped onto discrete network states.
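
The intuition behind this argument can be sketched with a toy regression. The example below is not the analysis from the manuscript: the readout values and concentrations are invented, a single scalar readout stands in for population activity, and `r_squared` is a hypothetical helper. It only illustrates why a graded (continuous) readout can track concentration more closely than a readout that jumps between two discrete states.

```python
def r_squared(x, y):
    """Coefficient of determination of a simple linear regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # For one predictor, R^2 equals the squared Pearson correlation.
    return (sxy * sxy) / (sxx * syy)


# Invented odor concentrations in a mixture (arbitrary units).
concentration = [0.1, 0.2, 0.3, 0.4]

# Hypothetical scalar readouts of network activity for each mixture.
graded_readout = [1.0, 2.0, 3.0, 4.0]    # varies continuously with concentration
discrete_readout = [0.0, 0.0, 1.0, 1.0]  # snaps to one of two states

print(r_squared(graded_readout, concentration))
print(r_squared(discrete_readout, concentration))
```

With these toy numbers the graded readout explains essentially all of the variance in concentration, while the two-state readout explains less, because mixtures that fall into the same state become indistinguishable.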

      The observation that Tuned networks do not map inputs onto discrete outputs indicates that such networks do not classify inputs as distinct items. Nonetheless, the observed geometrical modifications of continuous representations support the classification of learned inputs or the assessment of metric relationships by hypothetical readout neurons. Geometrical modifications of odor representations may therefore serve as one of multiple steps in multi-layer computations for pattern classification (and/or other computations). In this scenario, the transformation of odor representations in Dp may be seen as related to transformations of representations between different layers in artificial networks, which collectively perform a given task (notwithstanding obvious structural and mechanistic differences between artificial and biological networks). In other words, geometrical transformations of representations in Tuned networks may overrepresent learned (relevant) information at the expense of other information and thereby support further learning processes in other brain areas. An obvious corollary of this scenario is that Dp does not perform odor classification per se based on inputs from the olfactory bulb but reformats representations of odor space based on experience to support computational tasks as part of a larger system. This scenario is now explicitly discussed (p.14).

      Reviewer #2 (Public Review): 

      Summary: 

      The authors conducted a comparative analysis of four networks, varying in the presence of excitatory assemblies and the architecture of inhibitory cell assembly connectivity. They found that co-tuned E-I assemblies provide network stability and a continuous representation of input patterns (on locally constrained manifolds), contrasting with networks with global inhibition that result in attractor networks. 

      Strengths: 

      The findings presented in this paper are very interesting and cutting-edge. The manuscript effectively conveys the message and presents a creative way to represent high-dimensional inputs and network responses. Particularly, the result regarding the projection of input patterns onto local manifolds and continuous representation of input/memory is very intriguing and novel. Both computational and experimental neuroscientists would find value in reading the paper. 

      Weaknesses: 

      that have continuous representations. This could also be shown in Figure 5B, along with the performance of the random and tuned E-I networks. The latter networks have the advantage of providing network stability compared to the Scaled I network, but at the cost of reduced network salience and, therefore, reduced input decodability. The authors may consider designing a decoder to quantify and compare the classification performance of all four networks. 

      We have now quantified classification by networks with discrete attractor dynamics (Scaled) along with other networks. However, because the neuronal covariance matrix for such networks is low rank and not invertible, pattern classification cannot be analyzed by QDA as in Figure 5B. We therefore classified patterns from the odor subspace by template matching, assigning test patterns to one of the four classes based on correlations (see Supplementary Figure 8). As expected, Scaled networks performed well, but they did not outperform Tuned networks. Moreover, the performance of Scaled networks, but not Tuned networks, depended on the order in which odors were presented to the network. This hysteresis effect is a direct consequence of persistent attractor states and decreased the general classification performance of Scaled networks (see Supplementary Figure 8 for details). These results confirm the prediction that networks with discrete attractor states can efficiently classify inputs, but also reveal disadvantages arising from attractor dynamics. Moreover, the results indicate that the classification performance of Tuned networks is also high under the given task conditions, which simulate a biologically realistic scenario.
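
Template matching based on correlations can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the four templates, the test pattern, and the helper names `pearson` and `classify` are hypothetical, and each pattern is a short list of invented firing rates.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length activity patterns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def classify(pattern, templates):
    """Assign a test pattern to the class whose template it correlates with most."""
    return max(templates, key=lambda label: pearson(pattern, templates[label]))


# Invented template patterns (one per odor class, six model neurons each).
templates = {
    "odor_A": [5, 1, 0, 2, 0, 1],
    "odor_B": [0, 4, 1, 0, 3, 0],
    "odor_C": [1, 0, 5, 1, 0, 2],
    "odor_D": [0, 2, 0, 4, 1, 5],
}

test_pattern = [4, 2, 1, 2, 0, 1]  # a noisy version of the odor_A template
print(classify(test_pattern, templates))  # odor_A
```

Unlike QDA, this procedure never inverts a covariance matrix, which is why it remains applicable when the covariance matrix is low rank.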

      We would also like to emphasize that classification may not be the only task, and perhaps not even a main task, of Dp/piriform cortex or other memory networks with E/I assemblies. Conceivably, other computations could include metric assessments of inputs relative to learned inputs or additional learning-related computations. Please see our response to comment (3) of reviewer 1 for a further discussion of this issue. 

      Networks featuring E/I assemblies could potentially represent multistable attractors by exploring the parameter space for their reciprocal connectivity and connectivity with the rest of the network. However, for co-tuned E-I networks, the scope for achieving multistability is relatively constrained compared to networks employing global or lateral inhibition between assemblies. It would be good if the authors mentioned this in the discussion. Also, the fact that reciprocal inhibition increases network stability has been shown before and should be cited in the statements addressing network stability (e.g., some of the citations in the manuscript, including Rost et al. 2018, Lagzi & Fairhall 2022, and Vogels et al. 2011 have shown this).  

      We thank the reviewer for this comment. We now explicitly discuss multistability (see p. 12) and refer to additional references in the statements addressing network stability.

      Providing raster plots of the pDp network for familiar and novel inputs would help with understanding the claims regarding continuous versus discrete representation of inputs, allowing readers to visualize the activity patterns of the four different networks. (similar to Figure 1B). 

      We thank the reviewer for this suggestion. We have added raster plots of responses to both familiar and novel inputs in the revised manuscript (Figure 2D and Supplementary Figure 4A).

      Reviewer #3 (Public Review): 

      Summary: 

      This work investigates the computational consequences of assemblies containing both excitatory and inhibitory neurons (E/I assembly) in a model with parameters constrained by experimental data from the telencephalic area Dp of zebrafish. The authors show how this precise E/I balance shapes the geometry of neuronal dynamics in comparison to unstructured networks and networks with more global inhibitory balance. Specifically, E/I assemblies lead to the activity being locally restricted onto manifolds - a dynamical structure in between high-dimensional representations in unstructured networks and discrete attractors in networks with global inhibitory balance. Furthermore, E/I assemblies lead to smoother representations of mixtures of stimuli while those stimuli can still be reliably classified, and allow for more robust learning of additional stimuli. 

      Strengths: 

      Since experimental studies do suggest that E/I balance is very precise and E/I assemblies exist, it is important to study the consequences of those connectivity structures on network dynamics. The authors convincingly show that E/I assemblies lead to different geometries of stimulus representation compared to unstructured networks and networks with global inhibition. This finding might open the door for future studies for exploring the functional advantage of these locally defined manifolds, and how other network properties allow to shape those manifolds. 

      The authors also make sure that their spiking model is well-constrained by experimental data from the zebrafish pDp. Both spontaneous and odor stimulus triggered spiking activity is within the range of experimental measurements. But the model is also general enough to be potentially applied to findings in other animal models and brain regions. 

      Weaknesses: 

      I find the point about pattern completion a bit confusing. In Fig. 3 the authors argue that only the Scaled I network can lead to pattern completion for morphed inputs since the output correlations are higher than the input correlations. For me, this sounds less like the network can perform pattern completion but it can nonlinearly increase the output correlations. Furthermore, in Suppl. Fig. 3 the authors show that activating half the assembly does lead to pattern completion in the sense that also non-activated assembly cells become highly active and that this pattern completion can be seen for Scaled I, Tuned E+I, and Tuned I networks. These two results seem a bit contradictory to me and require further clarification, and the authors might want to clarify how exactly they define pattern completion. 

      We believe that this comment concerns a semantic misunderstanding and apologize for any lack of clarity. We added a definition of pattern completion in the text: “…the retrieval of the whole memory from noisy or corrupted versions of the learned input.”. Pattern completion may be assessed using different procedures. In computational studies, it is often analyzed by delivering input to a subset of the assembly neurons which store a given memory (partial activation). Under these conditions, we find recruitment of the entire assembly in all structured networks, as demonstrated in Supplementary Figure 3. However, these conditions are unlikely to occur during odor presentation because the majority of neurons do not receive any input.

      Another more biologically motivated approach to assess pattern completion is to gradually modify a realistic odor input into a learned input, thereby gradually increasing the overlap between the two inputs. This approach had been used previously in experimental studies (references added to the text p.6). In the presence of assemblies, recurrent connectivity is expected to recruit assembly neurons (and thus retrieve the stored pattern) more efficiently as the learned pattern is approached. This should result in a nonlinear increase in the similarity between the evoked and the learned activity pattern. This signature was prominent in Scaled networks but not in Tuned or rand networks. Obviously, the underlying procedure is different from the partial activation of the assembly described above because input patterns target many neurons (including neurons outside assemblies) and exhibit a biologically realistic distribution of activity. However, this approach has also been referred to as “pattern completion” in the neuroscience literature, which may be the source of semantic confusion here. To clarify the difference between these approaches we have now revised the text and explicitly described each procedure in more detail (see p.6). 

      The authors argue that Tuned E+I networks have several advantages over Scaled I networks. While I agree with the authors that in some cases adding this localized E/I balance is beneficial, I believe that a more rigorous comparison between Tuned E+I networks and Scaled I networks is needed: quantification of variance (Fig. 4G) and angle distributions (Fig. 4H) should also be shown for the Scaled I network. Similarly in Fig. 5, what is the Mahalanobis distance for Scaled I networks and how well can the Scaled I network be classified compared to the Tuned E+I network? I suspect that the Scaled I network will actually be better at classifying odors compared to the E+I network. The authors might want to speculate about the benefit of having networks with both sources of inhibition (local and global) and hence being able to switch between locally defined manifolds and discrete attractor states. 

      We agree that a more rigorous comparison of Tuned and Scaled networks would be of interest. We have added the variance analysis (Fig. 4G) and angle distributions (Fig. 4H) for both Tuned I and Scaled networks. However, the Mahalanobis distances and Quadratic Discriminant Analysis cannot be applied to Scaled networks because their neuronal covariance matrix is low rank and not invertible. To nevertheless compare these networks, we performed template matching by assigning test patterns to one of the four odor classes based on correlations to template patterns (Supplementary Figure 8; see also response to the first comment of reviewer 2). Interestingly, Scaled networks performed well at classification but did not outperform Tuned networks, and exhibited disadvantages arising from attractor dynamics (Supplementary Figure 8; see also response to the first comment of reviewer 2). Furthermore, in further analyses we found that continuous representational manifolds support metric assessments of inputs relative to learned odors, which cannot be achieved by discrete representations. These results are now shown in Figure 5D-E and discussed explicitly in the text on p.11 (see also response to comment 3 of reviewer 1).
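
Why a low-rank covariance matrix rules out the Mahalanobis distance can be seen in a toy case. The sketch below uses three hypothetical neurons and invented trial data; `covariance` and `det3` are illustrative helpers. When every trial is a scaled copy of one pattern, as might be expected if activity collapses onto a single attractor state, the covariance matrix has rank one, its determinant is zero, and the inverse required by the Mahalanobis distance does not exist.

```python
from fractions import Fraction


def covariance(trials):
    """Sample covariance matrix (neurons x neurons) computed with exact fractions."""
    n, d = len(trials), len(trials[0])
    means = [Fraction(sum(t[i] for t in trials), n) for i in range(d)]
    return [
        [sum((t[i] - means[i]) * (t[j] - means[j]) for t in trials) / (n - 1)
         for j in range(d)]
        for i in range(d)
    ]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# Invented data: each row is one trial, each column one neuron.
attractor_like = [[1, 2, 3], [2, 4, 6], [3, 6, 9], [4, 8, 12]]  # scaled copies of one pattern
distributed = [[1, 0, 2], [0, 3, 1], [2, 1, 0], [1, 2, 3]]      # variability in all directions

print(det3(covariance(attractor_like)) == 0)  # True: singular, no inverse
print(det3(covariance(distributed)) == 0)     # False: invertible
```

Exact fractions are used here so that the zero determinant is not obscured by floating-point rounding; with real population data, the rank or condition number of the covariance matrix would be the more practical check.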

      We preferred not to add a sentence in the Discussion about benefits of networks having both sources of inhibition, as we find this a bit too speculative.

      At a few points in the manuscript, the authors use statements without actually providing evidence in terms of a Figure. Often the authors themselves acknowledge this, by adding the term "not shown" to the end of the sentence. I believe it will be helpful to the reader to be provided with figures or panels in support of the statements.  

      Thank you for this comment. We have provided additional data figures to support the following statements:

      “d<sub>M</sub> was again increased upon learning, particularly between learned odors and reference classes representing other odors (Supplementary Figure 9)”

      “decreasing amplification in assemblies of Scaled networks changed transformations towards the intermediate behavior, albeit with broader firing rate distributions than in Tuned networks (Supplementary Figure 6 B)”  

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Meissner-Bernard et al present a biologically constrained model of telencephalic area of adult zebrafish, a homologous area to the piriform cortex, and argue for the role of precisely balanced memory networks in olfactory processing. 

      This is interesting as it can add to recent evidence on the presence of functional subnetworks in multiple sensory cortices. It is also important in deviating from traditional accounts of memory systems as attractor networks. Evidence for attractor networks has been found in some systems, like in the head direction circuits in the flies. However, the presence of attractor dynamics in other modalities, like sensory systems, and their role in computation has been more contentious. This work contributes to this active line of research in experimental and computational neuroscience by suggesting that, rather than being represented in attractor networks and persistent activity, olfactory memories might be coded by balanced excitation-inhibitory subnetworks. 

      The paper is generally well-written, the figures are informative and of good quality, and multiple approaches and metrics have been used to test and support the main results of the paper. 

      The main strength of the work is in: (1) direct link to biological parameters and measurements, (2) good controls and quantification of the results, and (3) comparison across multiple models. 

      (1) The authors have done a good job of gathering the current experimental information to inform a biological-constrained spiking model of the telencephalic area of adult zebrafish. The results are compared to previous experimental measurements to choose the right regimes of operation. 

      (2) Multiple quantification metrics and controls are used to support the main conclusions and to ensure that the key parameters are controlled for - e.g. when comparing across multiple models.

      (3) Four specific models (random, scaled I / attractor, and two variants of specific E-I networks - tuned I and tuned E+I) are compared with different metrics, helping to pinpoint which features emerge in which model. 

      Major problems with the work are: (1) mechanistic explanation of the results in specific E-I networks, (2) parameter exploration, and (3) the functional significance of the specific E-I model. 

      (1) The main problem with the paper is a lack of mechanistic analysis of the models. The models are treated like biological entities and only tested with different assays and metrics to describe their different features (e.g. different geometry of representation in Fig. 4). Given that all the key parameters of the models are known and can be changed (unlike biological networks), it is expected to provide a more analytical account of why specific networks show the reported results. For instance, what is the key mechanism for medium amplification in specific E/I network models (Fig. 3)? How does the specific geometry of representation/manifolds (in Fig. 4) emerge in terms of excitatory-inhibitory interactions, and what are the main mechanisms/parameters? Mechanistic account and analysis of these results are missing in the current version of the paper. 

      Precise balancing of excitation and inhibition in subnetworks would lead to the cancellation of specific dynamical modes responsible for the amplification of responses (hence, deviating from the attractor dynamics with an unstable specific mode). What is the key difference in the specific E/I networks here (tuned I or/and tuned E+I) which make them stand between random and attractor networks? Excitatory and inhibitory neurons have different parameters in the model (Table 1). Time constants of inhibitory and excitatory synapses are also different (P. 13). Are these parameters causing networks to be effectively more excitation dominated (hence deviating from a random spectrum which would be expected from a precisely balanced E/I network, with exactly the same parameters of E and I neurons)? It is necessary to analyse the network models, describe the key mechanism for their amplification, and pinpoint the key differences between E and I neurons which are crucial for this. 

      To address these comments we performed additional simulations and analyses at different levels. Please see our reply to comment (1) of the public review (reviewer 1) for a detailed description. We thank the reviewer for these constructive comments.

      (2) The second major issue with the study is a lack of systematic exploration and analysis of the parameter space. Some parameters are biologically constrained, but not all the parameters. For instance, it is not clear what the justification for the choice of synaptic time scales is (with E synaptic time constants being larger than inhibition: tau_syn_I = 10 ms, tau_syn_E = 30 ms). How would the results change when varying these - and other unconstrained - parameters? It is important to show how the main results, especially the manifold localisation, would change by doing a systematic exploration of the key parameters and performing some sensitivity analysis. This would also help to see how robust the results are, which parameters are more important and which parameters are less relevant, and to shed light on the key mechanisms.  

      We thank the reviewer for this comment. We have now carried out additional simulations with equal time constants for all neurons. Please see our reply to the public review for more details (comment 2 of reviewer 1).

      (3) It is not clear what the main functional advantage of the specific E-I network model is compared to random networks. In terms of activity, they show that specific E-I networks amplify the input more than random networks (Fig. 3). But when it comes to classification, the effect seems to be very small (Fig. 5c). Description of different geometry of representation and manifold localization in specific networks compared to random networks is good, but it is more of an illustration of different activity patterns than proving a functional benefit for the network. The reader is still left with the question of what major functional benefits (in terms of computational/biological processing) should be expected from these networks, if they are to be a good model for olfactory processing and learning. 

      One possibility for instance might be that the tasks used here are too easy to reveal the main benefits of the specific models - and more complex tasks would be needed to assess the functional enhancement (e.g. more noisy conditions or more combination of odours). It would be good to show this more clearly - or at least discuss it in relation to computation and function.

      Please see our reply to the public review (comment 3 of reviewer 1).

      Specific comments: 

      Abstract: "resulting in continuous representations that reflected both relatedness of inputs and *an individual's experience*" 

      It didn't become apparent from the text or the model where the role of "individual's experience" component (or "internal representations" - in the next line) was introduced or shown (apart from a couple of lines in the Discussion) 

      We consider the scenario that assemblies are the outcome of an experience-dependent plasticity process. To clarify this, we have now made a small addition to the text: “Biological memory networks are thought to store information by experience-dependent changes in the synaptic connectivity between assemblies of neurons.”

      P. 2: "The resulting state of "precise" synaptic balance stabilizes firing rates because inhomogeneities or fluctuations in excitation are tracked by correlated inhibition" 

      It is not clear what the "inhomogeneities" specifically refers to - they can be temporal, or they can refer to the quenched noise of connectivity, for instance. Please clarify what you mean. 

      The statement has been modified to be more precise: “…“precise” synaptic balance stabilizes firing rates because inhomogeneities in excitation across the population or temporal variations in excitation are tracked by correlated inhibition…”.

      P. 3 (and Methods): When odour stimulus is simulated in the OB, the activity of a fraction of mitral cells is increased (10% to 15 Hz) - but also a fraction of mitral cells is suppressed (5% to 2 Hz). What is the biological motivation or reference for this? It is not provided. Is it needed for the results? Also, it is not explained how the suppressed 5% are chosen (e.g. randomly, without any relation to the increased cells?). 

      We thank the reviewer for this comment. These changes in activity directly reflect experimental observations. We apologize that we forgot to include the references reporting these observations (Friedrich and Laurent, 2001 and 2004); this is now fixed.

      In our simulation, OB neurons do not interact with each other, and the suppressed 5% were indeed randomly selected. We changed the text in Methods accordingly to read: “An additional 75 randomly selected mitral cells were inhibited” 

      P. 4, L. 1-2: "... sparsely connected integrate-and-fire neurons with conductance-based synapses (connection probability {less than or equal to}5%)." 

      Specify the connection probability of specific subtypes (EE, EI, IE, II).  

      We now refer to the Methods section, where this information can be found. 

      “... conductance-based synapses (connection probability ≤5%, Methods)”  

      P. 4, L. 6-7: "Population activity was odor-specific and activity patterns evoked by uncorrelated OB inputs remained uncorrelated in Dp (Figure 1H)" 

      What would happen to correlated OB inputs (e.g. as a result of mixture of two overlapping odours) in this baseline state of the network (before memories being introduced to it)? It would be good to know this, as it sheds light on the initial operating regime of the network in terms of E/I balance and decorrelation of inputs.  

      This information was present in the original manuscript (Figure 3), but we improved the writing to further clarify this issue: “(…) we morphed a novel odor into a learned odor (Figure 3A), or a learned odor into another learned odor (Supplementary Figure 3B), and quantified the similarity between morphed and learned odors by the Pearson correlation of the OB activity patterns (input correlation). We then compared input correlations to the corresponding pattern correlations among E neurons in Dp (output correlation). In rand networks, output correlations increased linearly with input correlations but did not exceed them (Figure 3B and Supplementary Figure 3B)”

      P. 4, L. 12-13: "Shuffling spike times of inhibitory neurons resulted in runaway activity with a probability of ~80%, .."

      Where is this shown? 

      (There are other occasions too in the paper where references to the supporting figures are missing). 

      We now provide the statistics: “Shuffling spike times of inhibitory neurons resulted in runaway activity with a probability of 0.79 ± 0.20”

      P. 4: "In each network, we created 15 assemblies representing uncorrelated odors. As a consequence, ~30% of E neurons were part of an assembly ..." 

      15 x 100 / 4000 = 37.5% - so it's closer to 40% than 30%. Unless there is some overlap? 

      Yes: despite odors being uncorrelated and connectivity being random, some neurons (6 % of E neurons) belong to more than one assembly.
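
      As a rough plausibility check of these numbers (a sketch under a simplifying assumption, not the procedure used to build the networks), one can treat each of the 15 assemblies as an independent random draw of 100 out of 4000 E neurons. The membership count of a neuron is then binomial, and the variable names below are hypothetical.

```python
n_e = 4000            # E neurons (value used in the reviewer's calculation)
assembly_size = 100
n_assemblies = 15

# Probability that a given neuron belongs to one particular assembly.
p = assembly_size / n_e

# Binomial probabilities of belonging to zero or exactly one assembly.
p_none = (1 - p) ** n_assemblies
p_one = n_assemblies * p * (1 - p) ** (n_assemblies - 1)

frac_any = 1 - p_none             # member of at least one assembly
frac_multi = 1 - p_none - p_one   # member of two or more assemblies
print(f"in >=1 assembly: {frac_any:.1%}, in >=2 assemblies: {frac_multi:.1%}")
```

      Under this independence assumption roughly a third of E neurons fall into at least one assembly and about one in twenty into more than one, which is in line with the ~30% and 6% quoted above; the exact values in the model depend on how assembly neurons were actually selected.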

      P. 4: "When a reached a critical value of ~6, networks became unstable and generated runaway activity (Figure 2B)." 

      Can this transition point be calculated or estimated from the network parameters, and linked to the underlying mechanisms causing it? 

      We thank the reviewer for this interesting question. The instability arises when inhibition fails to efficiently counterbalance the increased recurrent excitation within Dp. The transition point is difficult to estimate, as it can depend on several parameters, including the probability of E-to-E connections, their strength, assembly size, and others. We have therefore not attempted to estimate it analytically.

      P. 4: "Hence, non-specific scaling of inhibition resulted in a divergence of firing rates that exhausted the dynamic range of individual neurons in the population, implying that homeostatic global inhibition is insufficient to maintain a stable firing rate distribution." 

      I don't think this is justified based on the results and figures presented here (Fig. 2E) - the interpretation is a bit strong and biased towards the conclusions the authors want to draw. 

      To more clearly illustrate the finding that in Scaled networks, assembly neurons are highly active (close to maximal realistic firing rates) whereas non-assembly neurons are nearly silent we have now added Supplementary Fig. 2B. Moreover, we have toned down the text: “Hence, non-specific scaling of inhibition resulted in a large and biologically unrealistic divergence of firing rates (Supplementary Figure 2B) that nearly exhausted the dynamic range of individual neurons in the population, indicating that homeostatic global inhibition is insufficient to maintain a stable firing rate distribution”

      P. 5, third paragraph: Description of Figure 2I, inset is needed, either in the text or caption. 

      The inset is now referred to in the text: “we projected synaptic conductances of each neuron onto a line representing the E/I ratio expected in a balanced network (“balanced axis”) and onto an orthogonal line (“counter-balanced axis”; Figure 2I inset, Methods).”
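
      The projection can be sketched as follows. This is an illustrative example only: the function name, the argument `ei_ratio` (taken here as the inhibitory-to-excitatory conductance slope expected under balance), and the toy values are assumptions, and the Methods of the paper remain the reference for the actual definition.

```python
import numpy as np

def project_onto_balance_axes(g_exc, g_inh, ei_ratio):
    """Project (g_exc, g_inh) points onto a balanced and an orthogonal axis.

    ei_ratio: slope g_inh / g_exc expected in a balanced network (assumed known).
    Returns (projection on balanced axis, projection on counter-balanced axis).
    """
    # Unit vector along the line g_inh = ei_ratio * g_exc.
    balanced = np.array([1.0, ei_ratio])
    balanced /= np.linalg.norm(balanced)
    # Unit vector orthogonal to it (rotation by 90 degrees).
    counter = np.array([-balanced[1], balanced[0]])
    points = np.column_stack([g_exc, g_inh])
    return points @ balanced, points @ counter

# Toy example: three neurons whose conductances lie exactly on the balanced line.
g_exc = np.array([1.0, 2.0, 3.0])
g_inh = 2.0 * g_exc
along, across = project_onto_balance_axes(g_exc, g_inh, ei_ratio=2.0)
print(along, across)
```

      Neurons whose inhibition tracks excitation differ only along the first returned coordinate, while any deviation from balance shows up in the second.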

      P. 5, last paragraph: another example of writing about results without showing/referring to the corresponding figures: 

      "In rand networks, firing rates increased after stimulus onset and rapidly returned to a low baseline after stimulus offset. Correlations between activity patterns evoked by the same odor at different time points and in different trials were positive but substantially lower than unity, indicating high variability ..." 

      And the continuation with similar lack of references on P. 6: 

      "Scaled networks responded to learned odors with persistent firing of assembly neurons and high pattern correlations across trials and time, implying attractor dynamics (Hopfield, 1982; Khona and Fiete, 2022), whereas Tuned networks exhibited transient responses and modest pattern correlations similar to rand networks." 

      Please go through the Results and fix the references to the corresponding figures on all instances. 

      We thank the reviewer for pointing out these overlooked figure references, which are now fixed.

      P. 8: "These observations further support the conclusion that E/I assemblies locally constrain neuronal dynamics onto manifolds." 

      As discussed in the general major points, mechanistic explanation in terms of how the interaction of E/I dynamics leads to this is missing. 

      As discussed in the reply to the public review (comment 3 of reviewer 1), we have now provided more mechanistic analyses of our observations.

      P. 9: "Hence, E/I assemblies enhanced the classification of inputs related to learned patterns."

      The effect seems to be very small. Also, any explanation for why for low test-target correlation the effect is negative (random doing better than tuned E/I)? 

      The size of the effect (p<sub>learned</sub> – p<sub>novel</sub> = 0.074; difference of means; Figure 5C) may appear small in terms of absolute probability, but it is substantial relative to the maximum possible increase (1 – p<sub>novel</sub> = 0.133; Figure 5C). The fact that for low test-target correlations the effect is negative is a direct consequence of the positive effect for high test-target correlations and the presence of 2 learned odors in the 4-way forced choice task. 
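
      The relative size of the effect follows directly from the two numbers quoted above. The short calculation below is only a restatement of that arithmetic; the variable names are hypothetical.

```python
gain = 0.074        # p_learned - p_novel (Figure 5C)
headroom = 0.133    # 1 - p_novel, the largest possible increase (Figure 5C)

p_novel = 1 - headroom
p_learned = p_novel + gain

# Fraction of the available headroom that is recovered by learning.
relative_gain = gain / headroom
print(f"p_novel={p_novel:.3f}, p_learned={p_learned:.3f}, relative gain={relative_gain:.0%}")
```

      In other words, the improvement corresponds to somewhat more than half of the increase that is possible given the already high baseline performance.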

      P. 9: "In Scaled I networks, creating two additional memories resulted in a substantial increase in firing rates, particularly in response to the learned and related odors"

      Where is this shown? Please refer to the figure. 

      We thank the reviewer again for pointing this out. We forgot to include a reference to the relevant figure which has now been added in the revised manuscript (Figure 6C).

      P. 10: "The resulting Tuned networks reproduced additional experimental observations that were not used as constraints including irregular firing patterns, lower output than input correlations, and the absence of persistent activity" 

      It is difficult to present these as "additional experimental observations", as all of them are negative, and can exist in random networks too - hence cannot be used as biological evidence in favour of specific E/I networks when compared to random networks. 

      We agree with the reviewer that these additional experimental observations cannot be used as biological evidence favouring Tuned E+I networks over random networks. We here just wanted to point out that additional observations which we did not take into account to fit the model are not invalidating the existence of E-I assemblies in biological networks. As assemblies tend to result in persistent activity in other types of networks, we feel that this observation is worth pointing out.

      Methods: 

      P. 13: Describe the parameters of Eq. 2 after the equation. 

      Done.

      P. 13: "The time constants of inhibitory and excitatory synapses were 10 ms and 30 ms, respectively." 

      What is the (biological) justification for the choice of these parameters? 

      How would varying them affect the main results (e.g. local manifolds)? 

      We chose a relatively slow time constant for excitatory synapses because experimental data indicate that excitatory synaptic currents in Dp and piriform cortex contain a prominent NMDA component. We have now also simulated networks with equal time constants for excitatory and inhibitory synapses and equal biophysical parameters for excitatory and inhibitory neurons, which did not affect the main results (see also reply to the public review: comment 2 of reviewer 1).

      P. 14: "Care was also taken to ensure that the variation in the number of output connections was low across neurons"

      How exactly?

      More detailed explanations have now been added in the Methods section: “connections of a presynaptic neuron y to postsynaptic neurons x were randomly deleted when their total number exceeded the average number of output connections by ≥5%, or added when they were lower by ≥5%.”
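
      One way to implement such a rule is sketched below. It is a hypothetical illustration rather than the code used for the simulations: the function name, the boolean matrix layout `conn[x, y]`, and the choice to trim or fill each neuron only up to the edge of the ±5% band are assumptions, since the text does not state how many connections were deleted or added.

```python
import numpy as np

def equalize_out_degree(conn, tol=0.05, rng=None):
    """Limit the spread of out-degrees in a boolean connectivity matrix.

    conn[x, y] is True when presynaptic neuron y projects to postsynaptic x.
    Assumes the mean out-degree is large enough (>= 10) that the tolerance
    band contains at least one integer.
    """
    rng = np.random.default_rng() if rng is None else rng
    conn = conn.copy()
    mean_out = conn.sum(axis=0).mean()
    lo = int(np.ceil(mean_out * (1 - tol)))
    hi = int(np.floor(mean_out * (1 + tol)))
    for y in range(conn.shape[1]):
        targets = np.flatnonzero(conn[:, y])
        if targets.size > hi:
            # Too many outputs: randomly delete the excess connections.
            drop = rng.choice(targets, size=targets.size - hi, replace=False)
            conn[drop, y] = False
        elif targets.size < lo:
            # Too few outputs: randomly add connections, avoiding self-connections.
            free = np.flatnonzero(~conn[:, y])
            free = free[free != y]
            add = rng.choice(free, size=lo - targets.size, replace=False)
            conn[add, y] = True
    return conn

rng = np.random.default_rng(1)
n = 400
conn = rng.random((n, n)) < 0.05          # sparse random connectivity
np.fill_diagonal(conn, False)
balanced_conn = equalize_out_degree(conn, rng=rng)
out_deg = balanced_conn.sum(axis=0)
print(conn.sum(axis=0).std(), out_deg.std())
```

      The printed standard deviations show how much the spread of out-degrees shrinks; the mean out-degree is computed once before any edits, so the band refers to the original network.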

      Reviewer #2 (Recommendations For The Authors): 

      Congratulations on the great and interesting work! The results were nicely presented and the idea of continuous encoding on manifolds is very interesting. To improve the quality of the paper, in addition to the major points raised in the public review, here are some more detailed comments for the paper: 

      (1) Generally, citations have to improve. Spiking networks with excitatory assemblies and different architectures of inhibitory populations have been studied before, and the claim about improved network stability in co-tuned E-I networks has been made in the following papers that need to be correctly cited: 

      • Vogels TP, Sprekeler H, Zenke F, Clopath C, Gerstner W. 2011. Inhibitory Plasticity Balances Excitation and Inhibition in Sensory Pathways and Memory Networks. Science 334:1-7. doi:10.1126/science.1212991 (mentions that emerging precise balance on the synaptic weights can result in the overall network stability) 

      • Lagzi F, Bustos MC, Oswald AM, Doiron B. 2021. Assembly formation is stabilized by Parvalbumin neurons and accelerated by Somatostatin neurons. bioRxiv doi: https://doi.org/10.1101/2021.09.06.459211 (among other things, contrasts stability and competition which arises from multistable networks with global inhibition and reciprocal inhibition) 

      • Rost T, Deger M, Nawrot MP. 2018. Winnerless competition in clustered balanced networks: inhibitory assemblies do the trick. Biol Cybern 112:81-98. doi:10.1007/s00422-017-0737-7 (compares different architectures of inhibition and their effects on network dynamics) 

      • Lagzi F, Fairhall A. 2022. Tuned inhibitory firing rate and connection weights as emergent network properties. bioRxiv 2022.04.12.488114. doi:10.1101/2022.04.12.488114 (here, see the eigenvalue and UMAP analysis for a network with global inhibition and E/I assemblies) 

      Additionally, there are lots of pioneering work about tracking of excitatory synaptic inputs by inhibitory populations, that are missing in references. Also, experimental work that show existence of cell assemblies in the brain are largely missing. On the other hand, some references that do not fit the focus of the statements have been incorrectly cited. 

      The authors may consider referencing the following more pertinent studies on spiking networks to support the statement regarding attractor dynamics in the first paragraph in the Introduction (the current citations of Hopfield and Kohonen are for rate-based networks): 

      • Wong, K.-F., & Wang, X.-J. (2006). A recurrent network mechanism of time integration in perceptual decisions. Journal of Neuroscience, 26(4), 1314-1328. https://doi.org/10.1523/JNEUROSCI.3733-05.2006 

      • Wang, X.-J. (2008). Decision making in recurrent neuronal circuits. Neuron, 60(2), 215-234. https://doi.org/10.1016/j.neuron.2008.09.034  

      • F. Lagzi, & S. Rotter. (2015). Dynamics of competition between subnetworks of spiking neuronal networks in the balanced state. PloS One. 

      • Goldman-Rakic, P. S. (1995). Cellular basis of working memory. Neuron, 14(3), 477-485. 

      • Rost T, Deger M, Nawrot MP. 2018. Winnerless competition in clustered balanced networks: inhibitory assemblies do the trick. Biol Cybern 112:81-98. doi:10.1007/s00422-017-0737-7. 

      • Amit DJ, Tsodyks M (1991) Quantitative study of attractor neural network retrieving at low spike rates: I. substrate-spikes, rates and neuronal gain. Network 2:259-273. 

      • Mazzucato, L., Fontanini, A., & La Camera, G. (2015). Dynamics of Multistable States during Ongoing and Evoked Cortical Activity. Journal of Neuroscience, 35(21), 8214-8231. 

      We thank the reviewer for the reference suggestions. We have carefully reviewed the reference list and made the following changes, which we hope address the reviewer’s concerns:

      (1) We adjusted the references about network stability in co-tuned E-I networks.

      (2) We added the Lagzi & Rotter (2015), Amit et al. (1991), Mazzucato et al. (2015) and Goldman-Rakic (1995) papers in the Introduction as studies on attractor dynamics in spiking neural networks. We preferred to omit the two X.-J. Wang papers, as they describe attractors in decision making rather than memory processes.

      (3) We added the Ko et al. 2011 paper as experimental evidence for assemblies in the brain. In our view, there are few experimental studies showing the existence of cell assemblies in the brain, which we distinguish from cell ensembles, i.e., groups of coactive neurons. 

      (4) We also included Hennequin 2018, Brunel 2000, Lagzi et al. 2021 and Eckmann et al. 2024, which we had not cited in the initial manuscript.

      (5) We removed the Wiechert et al. 2010 reference as it does not support the statement about geometry-preserving transformation by random networks.

      (2) The gist of the paper is about how the architecture of inhibition (reciprocal vs. global in this case) can determine network stability and salient responses (related to multistable attractors and variations) for classification purposes. It would improve the narrative of the paper if this point is raised in the Introduction and Discussion section. Also see a relevant paper that addresses this point here: 

      Lagzi F, Bustos MC, Oswald AM, Doiron B. 2021. Assembly formation is stabilized by Parvalbumin neurons and accelerated by Somatostatin neurons. bioRxiv doi: https://doi.org/10.1101/2021.09.06.459211 

      Classification has long been proposed to be a function of piriform cortex and autoassociative memory networks in general, and we consider it important. However, the computational function of Dp or piriform cortex is still poorly understood, and we do not focus only on odor classification as a possibility. In fact, continuous representational manifolds also support other functions such as the quantification of distance relationships of an input to previously memorized stimuli, or multi-layer network computations (including classification). In the revised manuscript, we have performed additional analyses to explore these notions in more detail, as explained above (response to public reviews, comment 3 of reviewer 1). Furthermore, we have now expanded the discussion of potential computational functions of Tuned networks and explicitly discuss classification but also other potential functions. 

      (3) A plot for the values of the inhibitory conductances in Figure 1 would complete the analysis for that section. 

      In Figure 1, we decided to only show the conductances that we use to fit our model, namely the afferent and total synaptic conductances. As the values of the inhibitory conductances can be derived from panel E, we refrained from plotting them separately for the sake of simplicity. 

      (4) How did the authors calculate correlations between activity patterns as a function of time in Figure 2E, bottom row? Does the color represent correlation coefficient (which should not be time dependent) or is it a correlation function? This should be explained in the Methods section. 

      The color represents the Pearson correlation coefficient between activity patterns within a narrow time window (100 ms). We updated the Figure legend to clarify this: “Mean correlation between activity patterns evoked by a learned odor at different time points during odor presentation. Correlation coefficients were calculated between pairs of activity vectors composed of the mean firing rates of E neurons in 100 ms time bins. Activity vectors were taken from the same or different trials, except for the diagonal, where only patterns from different trials were considered.”
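
      The calculation described in the legend can be sketched as follows. This is a minimal illustration, not the original analysis code: the function name and the array layout `(n_trials, n_bins, n_neurons)` are assumptions, and every activity vector is assumed to have non-zero variance across neurons.

```python
import numpy as np

def time_by_time_correlation(rates):
    """Mean Pearson correlation between activity vectors at pairs of time bins.

    rates: (n_trials, n_bins, n_neurons) mean firing rates of E neurons per
    100 ms bin. Returns an (n_bins, n_bins) matrix.
    """
    n_trials, n_bins, n_neurons = rates.shape
    # Z-score each activity vector across neurons so dot products give Pearson r.
    z = rates - rates.mean(axis=2, keepdims=True)
    z /= z.std(axis=2, keepdims=True)
    # r[a, i, b, j]: correlation between trial a, bin i and trial b, bin j.
    r = np.einsum("ain,bjn->aibj", z, z) / n_neurons
    same_trial = np.eye(n_trials, dtype=bool)
    out = np.empty((n_bins, n_bins))
    for i in range(n_bins):
        for j in range(n_bins):
            pairs = r[:, i, :, j]                 # (n_trials, n_trials)
            # On the diagonal, use only pairs of patterns from different trials,
            # because a pattern correlated with itself trivially gives 1.
            out[i, j] = pairs[~same_trial].mean() if i == j else pairs.mean()
    return out
```

      Excluding same-trial pairs on the diagonal keeps the diagonal values comparable to the off-diagonal ones, so that values below unity reflect trial-to-trial variability rather than an artifact of self-correlation.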

      (5) Figure 3 needs more clarification (both in the main text and the figure caption). It is not clear what the axes are exactly, and why the network responses for familiar and novel inputs are different. The gray shaded area in panel B needs more explanation as well.  

      We thank the reviewer for the comment. We have improved Figure 3A, the figure caption, as well as the text (see p.6). We hope that the figure is now clearer.

      (6) The "scaled I" network, known for representing input patterns in discrete attractors, should exhibit clear separation between network responses in the 2D PC space in the PCA plots. However, Figure 4D and Figure 6D do not reflect this, as all network responses are overlapped. Can the authors explain the overlap in Figure 4D? 

      In Figure 4D, activity of Scaled networks is distributed between three subregions in state space that are separated by the first 2 PCs. Two of them indeed correspond to attractor states representing the two learned odors while the third represents inputs that are not associated with these attractor states. To clarify this, please see also the density plot in Figure 4E. The few datapoints between these three subregions are likely outliers generated by the sequential change in inputs, as described in Supplementary Figure 8C.

      (7) The reason for writing about the ISN networks is not clear. Co-tuned E-I assemblies do not necessarily have to operate in this regime. Also, the results of the paper do not rely on any of the properties of ISNs, but they are more general. Authors should either show the paradoxical effect associated with ISN (i.e., if increasing input to I neurons decreases their responses) or show ISN properties using stability analysis (see computational research conducted at the Allen Institute, namely Millman et al. 2020, eLife). Currently, the paper reads as if being in the ISN regime is a necessary requirement, which is not true. Also, the arguments do not connect with the rest of the paper and never show up again. Since we know it is not a requirement, there is no need to have those few sentences in the Results section. Also, the choice of alpha=5.0 is extreme, and therefore, it would help to judge the biological realism if the raster plots for Figs 2-6 are shown.

      We have toned down the part on ISN and reduced it to one sentence for readers who might be interested in knowing whether activity is inhibition-stabilized or not. We have also added the reference to the Tsodyks et al. 1997 paper from which we derive our stability analysis. The text now reads “Hence, pDp<sub>sim</sub> entered a balanced state during odor stimulation (Figure 1D, E) with recurrent input dominating over afferent input, as observed in pDp (Rupprecht and Friedrich, 2018). Shuffling spike times of inhibitory neurons resulted in runaway activity with a probability of 0.79 ± 0.20, demonstrating that activity was inhibition-stabilized (Sadeh and Clopath, 2020b, Tsodyks et al., 1997).”  

      We have now also added the raster plots as suggested by the reviewer (see Figure 2D, Supplementary Figure 1 G, Supplementary Figure 4). We thank the reviewer for this comment.

      (8) In the abstract, authors mention "fast pattern classification" and "continual learning," but in the paper, those issues have not been addressed. The study does not include any synaptic plasticity. 

      Concerning “continual learning”, we agree that we do not simulate the learning process itself. However, Figure 6 shows results of a simulation where two additional patterns were stored in a network that already contained assemblies representing other odors. We consider this a crude way of exploring the end result of a “continual learning” process. “Fast pattern classification” is mentioned because activity in balanced networks can follow fluctuating inputs with high temporal resolution, while networks with stable attractor states tend to be slow. This is likely to account for the occurrence of hysteresis effects in Scaled but not Tuned networks, as shown in Supplementary Fig. 8.

      (9) In the Introduction, the first sentence in the second paragraph reads: "... when neurons receive strong excitatory and inhibitory synaptic input ...". The word strong should be changed to "weak".

      Also, see the pioneering work of Brunel 2000. 

      In classical balanced networks, strong excitatory inputs are counterbalanced by strong inhibitory inputs, leading to a fluctuation-driven regime. We have added Brunel 2000.

      (10) In the second paragraph of the introduction, the authors refer to studies about structural co-tuning (e.g., where "precise" synaptic balance is mentioned, and Vogels et al. 2011 should be cited there) and functional co-tuning (which is, in fact, different than tracking of excitation by inhibition, but the authors refer to that as co-tuning). It makes it easier to understand which studies talk about structural co-tuning and which ones are about functional co-tuning. The paper by Znamenski 2018, which showed both structural and functional tuning in experiments, is missing here. 

      We added the citation to the now-published paper by Znamenskiy et al. (2024).

      (11) The third paragraph in the Introduction misses some references that address network dynamics that are shaped by the inhibitory architecture in E/I assemblies in spiking networks, like Rost et al 2018 and Lagzi et al 2021. 

      These references have been added.

      (12) The last sentence of the fourth paragraph in the Introduction implies that functional co-tuning is due to structural co-tuning, which is not necessarily true. While structural co-tuning results in functional co-tuning, functional co-tuning does not require structural co-tuning because it could arise from shared correlated input or heterogeneity in synaptic connections from E to I cells.  

      We generally agree with the reviewer, but we are uncertain which sentence the reviewer refers to.

      We assume the reviewer refers to the last sentence of the second (rather than the fourth paragraph), which explicitly mentions the “…structural basis of E/I co-tuning…”. If so, we consider this sentence still correct because the “structural basis” refers not specifically to E/I assemblies, but also includes any other connectivity that may produce co-tuning, including the connectivity underlying the alternative possibilities mentioned by the reviewer (shared correlated input or heterogeneity of synaptic connections).

      (13) In order to ensure that the comparison between network dynamics is legit, authors should mention up front that for all networks, the average firing rates for the excitatory cells were kept at 1 Hz, and the background input was identical for all E and I cells across different networks.

      We slightly revised the text to make this clearer: “We (…) uniformly scaled I-to-E connection weights by a factor of χ until E population firing rates in response to learned odors matched the corresponding firing rates in rand networks, i.e., 1 Hz”.

      (14) In the last paragraph on page 5, my understanding was that an individual odor could target different cells within an assembly in different trials to generate trial-to-trial variability. If this is correct, this needs to be mentioned clearly.

      This is not correct: an odor consists of 150 activated mitral cells with defined firing rates. As now mentioned in the Methods, “Spikes were then generated from a Poisson distribution, and this process was repeated to create trial-to-trial variability.”
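
      As an illustration of the procedure quoted above, here is a minimal sketch of how a fixed set of firing rates can yield different spike trains on every trial. It assumes a homogeneous Poisson process per mitral cell (sampled via exponential inter-spike intervals, which may differ from the authors' implementation) and a 2 s stimulus window; the rate value and the function names `poisson_spike_train` and `simulate_trials` are hypothetical.

```python
import random

def poisson_spike_train(rate_hz, duration_s, rng):
    """Return sorted spike times (s) of a homogeneous Poisson process.

    Inter-spike intervals of a Poisson process are exponentially
    distributed with mean 1 / rate_hz.
    """
    spikes = []
    if rate_hz <= 0:
        return spikes
    t = rng.expovariate(rate_hz)
    while t < duration_s:
        spikes.append(t)
        t += rng.expovariate(rate_hz)
    return spikes

def simulate_trials(rates_hz, duration_s, n_trials, seed=0):
    """Repeat spike generation with identical rates to obtain several trials."""
    rng = random.Random(seed)
    return [[poisson_spike_train(r, duration_s, rng) for r in rates_hz]
            for _ in range(n_trials)]

# Hypothetical odor: 150 activated mitral cells (the rate value is assumed).
rates = [15.0] * 150
trials = simulate_trials(rates, duration_s=2.0, n_trials=5)

# Same cells and same rates on every trial, but different spike counts.
counts = [sum(len(train) for train in trial) for trial in trials]
print(counts)
```

      In this sketch the identity of the activated cells and their rates never change; variability arises only from the stochastic spike generation.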

      (15) The last paragraph on page 6 mentions that the four OB activity patterns were uncorrelated but if they were designed as in Figure 4A, due to the existing overlap between the patterns, they cannot be uncorrelated.

      This appears to be a misunderstanding. We mention in the text (and show in Figure 4B) that the four odors which “… were assigned to the corners of a square…” are uncorrelated.  The intermediate odors are of course not uncorrelated. We slightly modified the corresponding paragraph (now on page 7) to clarify this: “The subspace consisted of a set of OB activity patterns representing four uncorrelated pure odors and mixtures of these pure odors. Pure odors were assigned to the corners of a square and mixtures were generated by selecting active mitral cells from each of the pure odors with probabilities depending on the relative distances from the corners (Figure 4A, Methods).”

      (16) The notion of "learned" and "novel" odors may be misleading as there was no plasticity in the network to acquire an input representation. It would be beneficial for the authors to clarify that by "learned," they imply the presence of the corresponding E assembly for the odor in the network, with the input solely impacting that assembly. Conversely, for "novel" inputs, the input does not target a predefined assembly. In Figure 2 and Figure 4, it would be especially helpful to have the spiking raster plots of some sample E and I cells.  

      As suggested by the reviewer, we have modified the existing spiking raster plots in Figure 2, such that they include examples of responses to both learned and novel odors. We added spiking raster plots showing responses of I neurons to the same odors in Supplementary Figure 1F, as well as spiking raster plots of E neurons in Supplementary Figure 4A. To clarify the usage of “learned” and “novel”, we have added a sentence in the Results section: “We thus refer to an odor as “learned” when a network contains a corresponding assembly, and as “novel” when no such assembly is present.”.

      (17) In the last paragraph of page 8, can the authors explain where the asymmetry comes from? 

      As mentioned in the text, the asymmetry comes from the difference in the covariance structure of different classes. To clarify, we have rephrased the sentence defining the Mahalanobis distance: 

      “This measure quantifies the distance between the pattern and the class center, taking into account covariation of neuronal activity within the class. In bidirectional comparisons between patterns from different classes, the mean dM may be asymmetric if neural covariance differs between classes.”
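
      The asymmetry can be illustrated with a small sketch. It assumes only two neurons and hand-picked activity patterns (all values hypothetical), and it uses each class center as the test pattern rather than averaging over many patterns as in the manuscript. The helper names are illustrative.

```python
import math

def mean_vec(points):
    """Class center of a list of 2-D activity patterns."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(2)]

def cov2(points):
    """Unbiased 2x2 covariance matrix of a list of 2-D patterns."""
    m = mean_vec(points)
    n = len(points)
    sxx = sum((p[0] - m[0]) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - m[1]) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]

def mahalanobis(v, points):
    """Distance from pattern v to the center of a class, scaled by that class's covariance."""
    m = mean_vec(points)
    (a, b), (c, d) = cov2(points)
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx, dy = v[0] - m[0], v[1] - m[1]
    q = dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return math.sqrt(q)

# Hypothetical classes: A is tightly clustered, B is broadly distributed.
class_a = [(-0.5, 0.0), (0.5, 0.0), (0.0, -0.5), (0.0, 0.5)]
class_b = [(2.0, 0.0), (6.0, 0.0), (4.0, -2.0), (4.0, 2.0)]

d_b_to_a = mahalanobis(mean_vec(class_b), class_a)  # reference class: A
d_a_to_b = mahalanobis(mean_vec(class_a), class_b)  # reference class: B
print(round(d_b_to_a, 2), round(d_a_to_b, 2))
```

      The Euclidean distance between the two centers is the same in both directions, but the distance measured against the tight class A is larger than the distance measured against the broad class B, because each comparison is scaled by the covariance of its reference class.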

      (18) The first paragraph of page 9: random networks are not expected to perform pattern classification, but just pattern representation. It would have been better if the authors compared Scaled I network with E/I co-tuned network. Regardless of the expected poorer performance of the E/I co-tuned networks, the result would have been interesting. 

      Please see our reply to the public review (reviewer 2).

      (19) Second paragraph on page 9, the authors should provide statistical significance test analysis for the statement "... was significantly higher ...". 

      We have performed a Wilcoxon signed-rank test, and reported the p-value in the revised manuscript (p < 0.01). 

      (20) The last sentence in the first paragraph on page 11 is not clear. What do the authors mean by "linearize input-output functions", and how does it support their claim? 

      We have now amended this sentence to clarify what we mean: “…linearize the relationship between the mean input and output firing rates of neuronal populations…”.

      (21) In the first sentence of the last paragraph on page 11, the authors mentioned “high variability”, but it is not clear compared with which of the other 3 networks they observed high variability.

      Structurally co-tuned E/I networks are expected to diminish network-level variability. 

      “High variability” refers to the variability of spike trains, which is now mentioned explicitly in the text. We hope this more precise statement clarifies this point.

      (22) Methods section, page 14: "firing rates decreased with a time constant of 1, 2 or 4 s". How did they decrease? Was it an implementation algorithm? The time scale of input presentation is 2 s and it overlaps with the decay time constant (particularly with the one with 4 s decrease).  

      Firing rates decreased exponentially. We have added this information in the Methods section.
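
      To make the relation between the decay time constants and the 2 s stimulus window concrete, the following sketch evaluates an exponential decay at the end of the window. It assumes decay toward a baseline rate (the `baseline_hz` parameter and the function name are hypothetical; the response does not specify the asymptote).

```python
import math

def decayed_rate(r0_hz, t_s, tau_s, baseline_hz=0.0):
    """Rate at time t_s for an exponential decay from r0_hz toward baseline_hz."""
    return baseline_hz + (r0_hz - baseline_hz) * math.exp(-t_s / tau_s)

# Fraction of the initial rate remaining after a 2 s stimulus window
# for the three time constants mentioned in the Methods.
for tau in (1.0, 2.0, 4.0):
    frac = decayed_rate(1.0, 2.0, tau)
    print(f"tau = {tau:.0f} s: {frac:.3f} of the initial rate remains after 2 s")
```

      Under these assumptions, roughly 14%, 37%, and 61% of the initial rate remain at the end of the window for time constants of 1, 2, and 4 s, respectively.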

      Reviewer #3 (Recommendations For The Authors): 

      In the following, I suggest minor corrections to each section which I believe can improve the manuscript. 

      - There was no github link to the code in the manuscript. The code should be made available with a link to github in the final manuscript. 

      The code can be found here: https://github.com/clairemb90/pDp-model. The link has been added in the Methods section.

      Figure 1: 

      - Fig. 1A: call it pDp not Dp. Please check if this name is consistent in every figure and the text. 

      Thank you for catching this. Now corrected in Figure 1, Figure 2 and in the text.

      - The authors write: "Hence, pDpsim entered an inhibition-stabilized balanced state (Sadeh and Clopath, 2020b) during odor stimulation (Figure 1D, E)." and then later "Shuffling spike times of inhibitory neurons resulted in runaway activity with a probability of ~80%, demonstrating that activity was indeed inhibition-stabilized. These results were robust against parameter variations (Methods)." I would suggest moving the second sentence before the first sentence, because the fact that the network is in the ISN regime follows from the shuffled spike timing result. 

      Also, I'd suggest showing this as a supplementary figure. 

      We thank the reviewer for this comment. We have removed “inhibition-stabilized” from the first sentence, as there is no strong evidence of this in Rupprecht and Friedrich, 2018, and removed “indeed” from the second sentence. We also provided more detailed statistics. The text now reads “Hence, pDpsim entered a balanced state during odor stimulation (Figure 1D, E) with recurrent input dominating over afferent input, as observed in pDp (Rupprecht and Friedrich, 2018). Shuffling spike times of inhibitory neurons resulted in runaway activity with a probability of 0.79 ± 0.20, demonstrating that activity was inhibition-stabilized (Sadeh and Clopath, 2020b).”

      Figure 2: 

      - "... Scaled I networks (Figure 2H." Missing ) 

      Corrected.

      - The authors write "Unlike in Scaled I networks, mean firing rates evoked by novel odors were indistinguishable from those evoked by learned odors and from mean firing rates in rand networks (Figure 2F)." 

      Why is this something you want to see? Isn't it that novel stimuli usually lead to high responses? Eg in the paper Schulz et al., 2021 (eLife) which is also cited by the authors it is shown that novel responses have high onset firing rates. I suggest clarifying this (same in the context of Fig. 3C). 

      In Dp and piriform cortex, firing rates evoked by learned odors are not substantially different from firing rates evoked by novel odors. While small differences between responses to learned versus novel odors cannot be excluded, substantial learning-related differences in firing rates, as observed in other brain areas, have not been described in Dp or piriform cortex. We added references in the last paragraph of p.5. Note that the paper by Schulz et al. (2021) models a different type of circuit.  

      - Fig. 2B: Indicate in figure caption that this is the case "Scaled I" 

      This is not exactly the case “Scaled I”, as the parameter χ (increased I to E strength) is set to 1.

      - Suppl Fig. 2I: Is E&F ever used in the manuscript? I couldn't find a reference. I suggest removing it if not needed. 

      Suppl. Fig 2I E&F is now Suppl Fig.1G&H. We now refer to it in the text: “Activity of networks with E assemblies could not be stabilized around 1 Hz by increasing connectivity from subsets of I neurons receiving dense feed-forward input from activated mitral cells (Supplementary Figure 1GH; Sadeh and Clopath, 2020).”

      Figure 3: 

      - As mentioned in my comment in the public review section, I find the arguments about pattern completion a little bit confusing. For me it's not clear why an increase of output correlations over input correlations is considered "pattern completion" (this is not to say that I don't find the nonlinear increase of output correlations interesting). For me, to test pattern completion with second-order statistics one would need to do a similar separation as in Suppl Fig. 3, ie measuring the pairwise correlation at cells in the assembly L that get direct input from L OB with cells in the assembly L that do not get direct input from OB. If the pairwise correlations of assembly cells which do not get direct input from OB increase in correlations, I would consider this as pattern completion (similar to the argument that increase in firing rate in cells which are not directly driven by OB are considered a sign of pattern completion). 

      Also, it now seems to me that there are contradictory results: in Fig. 3 only Scaled I can lead to pattern completion, while in the context of Suppl. Fig. 3 the authors write "We found that assemblies were recruited by partial inputs in all structured pDpsim networks (Scaled and Tuned) without a significant increase in the overall population activity (Supplementary Figure 3A)." I suggest clarifying what the authors exactly mean by pattern completion, why the increase of output correlations above input correlations can be considered as pattern completion, and why the results differ when looking at firing rates versus correlations.

      Please see our reply to the public review (reviewer 3).

      - I actually would suggest adding Suppl. Fig. 3 to the main figure. It shows a more intuitive form of pattern completion and in the text there is a lot of back and forth between Fig. 3 and Suppl. Fig. 3 

      We feel that the additional explanations and panels in Fig.3 should clarify this issue and therefore prefer to keep Supplementary Figure 3 as part of the Supplementary Figures for simplicity.  

      - In the whole section "We next explored effects of assemblies ... prevented strong recurrent amplification within E/I assemblies." the authors could provide a link to the respective panel in Fig. 2 after each statement. This would help the reader follow your arguments. 

      We thank the reviewer for pointing this out. The references to the appropriate panels have been added. 

      - Fig. 3A: I guess the x-axis has been shifted upwards? Should be at zero. 

      We have modified the x-axis to make it consistent with panels B and C.  

      - Fig. 3B: In the figure caption, the dotted line is described as the novel odor but it is actually the unit line. The dashed lines represent the reference to the novel odor. 

      Fixed.

      - Fig. 3C: The " is missing for Pseudo-Assembly N

      Fixed.

      - "...or a learned odor into another learned odor." Have here a ref to the Supplementary Figure 3B.

      Added.

      Figure 4:   

      - "This geometry was largely maintained in the output of rand networks, consistent with the notion that random networks tend to preserve similarity relationships between input patterns (Babadi and Sompolinsky, 2014; Marr, 1969; Schaffer et al., 2018; Wiechert et al., 2010)." I suggest adding here reference to Fig. 4D (left). 

      Added.

      - Please add a definition of E/I assemblies. How do the authors define E/I assemblies? I think they consider both, Tuned I and Tuned E+I as E/I assemblies? In Suppl. Fig. 2I E it looks like tuned feedforward input is defined as E/I assemblies. 

      We thank the reviewer for pointing this out. E/I assemblies are groups of E and I neurons with enhanced connectivity. In other words, in E/I assemblies, connectivity is enhanced not only between subsets of E neurons, but also between these E neurons and a subset of I neurons. This is now clarified in the text: “We first selected the 25 I neurons that received the largest number of connections from the 100 E neurons of an assembly. To generate E/I assemblies, the connectivity between these two sets of neurons was then enhanced by two procedures.”. We removed “E/I assemblies” in Suppl. Fig.2, where the term was not used correctly, and apologize for the confusion.

      - Suppl. Fig. 4: Could the authors please define what they mean by "Loadings" 

      The loadings indicate the contribution of each neuron to each principal component, see adjusted legend of Suppl. Fig. 4: “G. Loading plot: contribution of neurons to the first two PCs of a rand and a Tuned E+I network (Figure 4D).”

      - Fig. 4F: The authors might want to normalize the participation ratio by the number of neurons (see e.g. Dahmen et al., 2023 bioRxiv, "relative PR"), so the PR is bound between 0 and 1 and the dependence on N is removed. 

      We thank the reviewer for the suggestion, but we prefer to use the non-normalized PR as we find it more easily interpretable (e.g. number of attractor states in Scaled networks).
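
      For reference, the two quantities under discussion can be sketched as follows. This assumes the commonly used definition of the participation ratio in terms of the eigenvalues of the activity covariance matrix, PR = (Σλ)² / Σλ², and takes the “relative PR” to be PR divided by the number of neurons N; the eigenvalue spectra below are hypothetical.

```python
def participation_ratio(eigenvalues):
    """PR = (sum of eigenvalues)^2 / (sum of squared eigenvalues)."""
    total = sum(eigenvalues)
    return total ** 2 / sum(lam * lam for lam in eigenvalues)

def relative_participation_ratio(eigenvalues):
    """PR normalized by the number of dimensions, bounded between 0 and 1."""
    return participation_ratio(eigenvalues) / len(eigenvalues)

# Hypothetical spectra for N = 4 neurons.
uniform = [1.0, 1.0, 1.0, 1.0]   # variance spread evenly over all dimensions
single = [1.0, 0.0, 0.0, 0.0]    # variance confined to one dimension

print(participation_ratio(uniform), relative_participation_ratio(uniform))
print(participation_ratio(single), relative_participation_ratio(single))
```

      The non-normalized PR reads directly as an effective number of dimensions (4 and 1 in these examples), which is the interpretability argument made above; the normalized version removes the dependence on N.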

      - Fig. 4G&H: as mentioned in the public review, I'd add the case of Scaled I to be able to compare it to the Tuned E+I case. 

      As already mentioned in the public review, we thank the reviewer for this suggestion, which we have implemented.

      - Figure caption Fig. 4H "Similar results were obtained in the full-dimensional space." I suggest showing this as a supplemental panel. 

      Since this only adds little information, we have chosen not to include it as a supplemental panel to avoid overloading the paper with figures.

      Figure 5: 

      - As mentioned in the public review, I suggest that the authors add the Scaled I case to Fig. 5 (it's shown in all figures and also in Fig. 6 again). I guess for Scaled I the separation between L and M will be very good? 

      Please see our reply to the public review (reviewer 3).

      - Fig. 5A&B: I am a bit confused about which neurons are drawn to calculate the Mahalanobis distance. In Fig. 5A, the schematic indicates that the vector B from which the neurons are drawn is distinct from the distribution Q. For the example of odor L, the distribution Q consists of pure odor L with odors that have little mixtures with the other odors. But the vector v for odor L seems to be drawn only from odors that have slightly higher mixtures (as shown in the schematic in Fig. 5A). Is there a reason to choose the vector v from different odors than the distribution Q? 

      The distribution Q and the vector v consist of activity patterns across the same neurons in response to different odors. The reason to choose a different odor for v was to avoid having this test datapoint being included in the distribution Q. We also wanted Q to be the same for all test datapoints. 

      What does "drawn from whole population" mean? Does this mean that the vectors are drawn from any neuron in pDp? If yes, then I don't understand how the authors can distinguish between different odors (L,M,O,N) on the y-axis. Or does "whole population" mean that the vector is drawn across all assemblies as shown in the schematic in Fig. 5A and the case "neurons drawn from (pseudo-) assembly" means that the authors choose only one specific assembly? In any case, the description here is a bit confusing, I think it would help the reader to clarify those terms better.  

      Yes, “drawn from whole population” means that we randomly draw 80 neurons from the 4000 E neurons in pDp. The y-axis means that we use the activity patterns of these neurons evoked by one of the 4 odors (L, M, N, O) as reference. We have modified the Figure legend to clarify this: “d<sub>M</sub> was computed based on the activity patterns of 80 E neurons drawn from the four (pseudo-) assemblies (top) or from the whole population of 4000 E neurons (bottom). Average of 50 draws.”

      - Suppl Fig. 5A: In the schematic the distance is called d_E(\bar{Q},\bar{V}) while the colorbar has d_E(\bar{Q},\bar{Q}) with the Qs in different color. The green Q should be a V. 

      We thank the reviewer for spotting this mistake, it is now fixed.

      - Fig. 5: Could the authors comment on the fact that a random network seems to be very good in classifying patterns on its own? Maybe in the Discussion?

      The task shown in Figure 5 is a relatively easy one, a forced-choice between four classes which are uncorrelated. In Supplementary Figure 9, we now show classification for correlated classes, which is already much harder.

      Figure 6: 

      - Is the correlation induced by creating mixtures like in the other Figures? Please clarify how the correlations were induced. 

      We clarified this point in the Methods section: “The pixel at each vertex corresponded to one pure odor with 150 activated and 75 inhibited mitral cells (…) and the remaining pixels corresponded to mixtures. In the case of correlated pure odors (Figure 6), adjacent pure odors shared half of their activated and half of their inhibited cells.”. An explicit reference to the Methods section has also been added to the figure legend.

      - Fig. 6C (right): why don't we see the clear separation in PC space as shown in Fig. 4? Is this related to the existence of correlations? Please clarify. 

      Yes. The assemblies corresponding to the correlated odors X and Y overlap significantly, and therefore responses to these odors cannot be well separated, especially for Scaled networks. We added the overlap quantification in the Results section to make this clear: “These two additional assemblies had on average 16% of neurons in common due to the similarity of the odors.”

      - "Furthermore, in this regime of higher pattern similarity, dM was again increased upon learning, particularly between learned odors and reference classes representing other odors (not shown)." Please show this (maybe as a supplemental figure). 

      We now show the data in Supplementary Figure 9.

      Discussion: 

      - The authors write: "We found that transformations became more discrete map-like when amplification within assemblies was increased and precision of synaptic balance was reduced. Likewise, decreasing amplification in assemblies of Scaled networks changed transformations towards the intermediate behavior, albeit with broader firing rate distributions than in Tuned networks (not shown)." 

      Where do I see the first point? I guess when I compare in Fig. 4D the case of Scaled I vs Tuned E+I, but the sentence above sounds like the authors showed this in a more step-wise way eg by changing the strength of \alpha or \beta (as defined in Fig. 1). 

      Also I think if the authors want to make the point that decreasing amplification in assemblies changes transformation with a different rate distribution in scaled vs tuned networks, the authors should show it (eg adding a supplemental figure). 

      The first point is indeed supported by data from different figures. Please note that the revised manuscript now contains further simulations that reinforce this statement, particularly those shown in Supplementary Figure 6, and that this point is now discussed more extensively in the Discussion. We hope that these revisions clarify this general point.

      The data showing effects of decreasing amplification in assemblies are now shown in Supplementary Figure 6 (Scaled[adjust]).

      - I suggest adding the citation Znamenskiy et al., 2024 (Neuron; https://doi.org/10.1016/j.neuron.2023.12.013), which shows that excitatory and inhibitory (PV) neurons with functional similarities are indeed strongly connected in mouse V1, suggesting the existence of E/I assembly structure also in mammals.

      Done.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors intended to investigate the earliest mechanisms enabling self-prioritization, especially in the attention. Combining a temporal order judgement task with computational modelling based on the Theory of Visual Attention (TVA), the authors suggested that the shapes associated with the self can fundamentally alter the attentional selection of sensory information into awareness. This self-prioritization in attentional selection occurs automatically at early perceptual stages. Furthermore, the processing benefits obtained from attentional selection via self-relatedness and physical salience were separated from each other.

      Strengths:

      The manuscript is written in a way that is easy to follow. The methods of the paper are very clear and appropriate.

      Thank you for your valuable feedback and helpful suggestions. Please see specific answers below.

      Weaknesses:

      There are two main concerns:

      (1) The authors had a too strong pre-hypothesis that self-prioritization was associated with attention. They used the prior entry to consciousness (awareness) as an index of attention, which is not appropriate. There may be other processing that makes the stimulus prior to entry to consciousness (e.g. high arousal, high sensitivity), but not attention. The self-related/associated stimulus may be involved in such processing but not attention to make the stimulus easily caught. Perhaps the authors could include other methods such as EEG or MEG to answer this question.

      We found the possibility of other mechanisms to be responsible for “prior entry” interesting too, but believe there are solid grounds for the hypothesis that it is indicative of attention:

      First, prior entry has a long-standing history as an index of attention (e.g., Titchener, 1903; Shore et al., 2001; Yates and Nicholls, 2009; Olivers et al. 2011; see Spence & Parise, 2010, for a review). Of course, other factors (like the ones mentioned) can contribute to encoding speed. However, for the perceptual condition, we systematically varied a stimulus feature that is associated with selective attention (salience, see e.g. Wolfe, 2021) and kept other features that are known to be associated with other factors such as arousal and sensitivity constant across the two variants (e.g. clear over threshold visibility) or varied them between participants (e.g. the colours / shapes used).

      Second, in the social salience condition we used a manipulation that has repeatedly been used to establish social salience effects in other paradigms (e.g., Li et al., 2022; Liu & Sui, 2016; Scheller et al., 2024; Sui et al., 2015; see Humphreys & Sui, 2016, for a review). We assume that the reviewer’s comment suggests that changes in arousal or sensitivity may be responsible for social salience effects, specifically. We have several reasons to interpret the social salience effects as an alteration in attentional selection, rather than a result of arousal or sensitivity:

      Arousal and attention are closely linked. However, within the present model, arousal is more likely linked to the availability of processing resources (capacity parameter C). That is, enhanced arousal is typically not stimulus-specific, and is therefore unlikely to affect the *relative* advantage in processing weights/rates of the self-associated (vs other-associated) stimuli. Indeed, a recent study showed that arousal does not modulate the relative division of attentional resources (as modelled by the Theory of Visual Attention; Asgeirsson & Nieuwenhuis, 2017). As such, it is unlikely that arousal can explain the observed results in relative processing changes for the self and other identities.

      Further, there is little reason to assume that presenting a different shape enhances perceptual sensitivity. Firstly, all stimuli were presented well above threshold, which would shrink any effects that were resulting from increases in sensitivity alone. Secondly, shape-associations were counterbalanced across participants, reducing the possibility that specific features, present in the stimulus display, lead to the measurable change in processing rates as a result of enhanced shape-sensitivity.

      Taken together, both the wealth of literature that suggests prior entry to index attention and the specific design choices within our study strongly support the notion that the observed changes in processing rates are indicative of changes in attentional selection, rather than other mechanisms (e.g. arousal, sensitivity).

      (2) The authors suggested that there are two independent attention processes. I suspect that the brain needs two attention systems. Is there a probability that the social and perceptual (physical properties of the stimulus) salience fired the same attention processing through different processing?

      We appreciate this thought-provoking comment. We conceptualize attention as a process that can facilitate different levels of representation, rather than as separate systems tuned to specific types of information. Different forms of representation, such as the perceptual shape, or the associated social identity, may be impacted by the same attentional process at different levels of representation. Indeed, our findings suggest that both social and perceptual salience effects may result from the same attentional system, albeit at different levels of representation. This is further supported by the additivity of perceptual and social salience effects and the negative correlation of processing facilitations between perceptually and socially salient cues. These results may reflect a trade-off in how attentional resources are distributed between either perceptually or socially salient stimuli.

      Reviewer #2 (Public review):

      Summary:

      The main aim of this research was to explore whether and how self-associations (as opposed to other associations) bias early attentional selection, and whether this can explain well-known self-prioritization phenomena, such as the self-advantage in perceptual matching tasks. The authors adopted the Visual Attention Theory (VAT) by estimating VAT parameters using a hierarchical Bayesian model from the field of attention and applied it to investigate the mechanisms underlying self-prioritization. They also discussed the constraints on the self-prioritization effect in attentional selection. The key conclusions reported were:

      (1) Self-association enhances both attentional weights and processing capacity

      (2) Self-prioritization in attentional selection occurs automatically but diminishes when active social decoding is required, and

      (3) Social and perceptual salience capture attention through distinct mechanisms.

      Strengths:

      Transferring the Theory of Visual Attention parameters estimated by a hierarchical Bayesian model to investigate self-prioritization in attentional selection was a smart approach. This method provides a valuable tool for accessing the very early stages of self-processing, i.e., attention selection. The authors conclude that self-associations can bias visual attention by enhancing both attentional weights and processing capacity and that this process occurs automatically. These findings offer new insights into self-prioritization from the perspective of the early stage of attentional selection.

      Thank you for your valuable feedback and helpful suggestions. Please see specific answers below.

      Weaknesses:

      (1) The results are not convincing enough to definitively support their conclusions. This is due to inconsistent findings (e.g., the model selection suggested condition-specific c parameters, but the increase in processing capacity was only slight; the correlations between attentional selection bias and SPE were inconsistent across experiments), unexpected results (e.g., when examining the impact of social association on processing rates, the other-associated stimuli were processed faster after social association, while the self-associated stimuli were processed more slowly), and weak correlations between attentional bias and behavioral SPE, which were reported without any p-value corrections. Additionally, the reasons why the attentional bias of self-association occurs automatically but disappears during active social decoding remain difficult to explain. It is also possible that the self-association with shapes was not strong enough to demonstrate attention bias, rather than the automatic processes as the authors suggest. Although these inconsistencies and unexpected results were discussed, all were post hoc explanations. To convince readers, empirical evidence is needed to support these unexpected findings.

      Thank you for outlining the specific points that raise your concern. We were happy to address these points as follows:

      a. Replications and Consistency: In our study, we consistently observed trends (relative reduction in processing speed of the self-associated stimulus) in the social salience conditions across experiments. While Experiment 2 demonstrated a significant reduction in processing rate towards self-stimuli, there was a notable trend in Experiment 1 as well.

      b. Condition-specific parameters: The condition-specific C parameters, though presenting a small effect size, significantly improved model fit. Inspecting the HDI ranges of our estimated C parameters indicates a high probability (85-89%) that processing capacity increased due to social associations, suggesting that even small changes (~2 Hz) can hold meaningful implications within the context of attentional selection.

      Please also note that the main conclusions about relative salience (self/other, salient/non-salient) are based on the relative processing rates. Processing rates are the product of the processing capacity (condition- but not stimulus dependent) and the attentional weight (condition and stimulus dependent). The latter is crucial to judge the *relative* advantage of the salient stimulus. Hence, the self-/salient stimulus advantage that is reflected in the ‘processing rate difference’ is automatically also reflected in the relative attentional weights attributed to the self/other and salient/non-salient stimuli. As such, the overall results of an automatic relative advantage of self-associated stimuli hold, independently of the change in overall processing capacity.
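
      The relation between processing rate, capacity, and attentional weight described above can be written out as a short sketch. The symbols (v, C, w) and the normalisation of the two attentional weights to sum to one are assumptions that follow common TVA conventions; they are not formulas quoted from the manuscript.

      ```latex
      % Assumed notation (not taken from the manuscript):
      %   v   = processing rate of a stimulus (Hz)
      %   C_c = overall processing capacity in condition c (Hz)
      %   w   = relative attentional weight; two targets, weights sum to 1
      \begin{align}
      v_{\text{self}}  &= C_c \, w_{\text{self}}, \\
      v_{\text{other}} &= C_c \, (1 - w_{\text{self}}), \\
      v_{\text{self}} - v_{\text{other}} &= C_c \, (2\, w_{\text{self}} - 1).
      \end{align}
      ```

      Under these assumptions, and because C_c is positive, the sign of the processing rate difference is determined solely by whether w_self exceeds 0.5; a change in capacity only scales the size of the difference.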

      c. Correlations: Regarding the correlations the reviewer noted, we wish to clarify that these were exploratory, and not the primary focus of our research. The aim of these exploratory analyses was to gauge the contribution of attentional selection to matching-based SPEs. As SPEs measured via the matching task are typically based on multiple different levels of processing, the contribution of early attentional selection to their overall magnitude was unclear. Without being able to gauge the possible effect sizes, corrected analyses may prevent detecting small but meaningful effects. As such, the effect sizes reported allow future studies to estimate power a priori and conduct well-powered replications of such exploratory effects. Additionally, Bayes factors were provided to give an appreciation of the strength of the evidence, all suggesting at least moderate evidence in favour of a correlation. Lastly, please note that effects that were measured within individuals and tasks (processing rate increase in social and perceptual decision dimensions in the TOJ task) showed consistent patterns, suggesting that the modulations within tasks were highly predictive of each other, while the modulations between tasks were not as clearly linked. We will add this clarification to the revised manuscript.

      d. Unexpected results: The unexpected results concerning the processing rates of other-associated versus self-associated stimuli certainly warrant further discussion. We believe that the additional processing steps required for social judgments, reflected in longer reaction times, may explain the slower processing of self-associated stimuli in that dimension. We agree that not all findings will align with initial hypotheses, and this variability presents avenues for further research. We have added this to the discussion of social salience effects.

      e. Whether association strength can account for the findings: We appreciate the scepticism regarding the strength of self-association with shapes. However, our within-participant design and control matching task indicate that the relative processing advantage for self-associated stimuli holds across conditions. This makes the scenario that “the self-association with shapes was not strong enough to demonstrate attention bias” very unlikely. Firstly, the relative processing advantage of self-associated stimuli in the perceptual decision condition, and the absence of such advantage in the social decision condition, were evidenced in the same participants. Hence, the strength of association between shapes and social identities was the same for both conditions. However, we only find an advantage for the self-associated shape when participants make perceptual (shape) judgements. It is therefore highly unlikely that the “association strength” can account for the difference in the outcomes between the conditions in experiment 1. Also, note that the order in which these conditions were presented was counter-balanced across participants, reducing the possibility that the automatic self-advantage was merely a result of learning or fatigue. Secondly, all participants completed the standard matching task to ascertain that the association between shapes and identities did indeed lead to processing advantages (across different levels).

      In summary, we believe that the evidence we provide supports the final conclusions. We do, of course, welcome any further empirical evidence that could enhance our understanding of the contribution of different processing levels to the SPE and are committed to exploring these areas in future work.

      (2) The generalization of the findings needs further examination. The current results seem to rely heavily on the perceptual matching task. Whether this attentional selection mechanism of self-prioritization can be generalized to other stimuli, such as self-name, self-face, or other domains of self-association advantages, remains to be tested. In other words, more converging evidence is needed.

      The reviewer indicates that the current findings heavily rely on the perceptual matching task, and it would be more convincing to include other paradigm(s) and different types of stimuli. We are happy to address these points here: first, we specifically used a temporal order paradigm to tap into specific processes, rather than merely relying on the matching task. Attentional selection is, along with other processes, involved in matching, but the TOJ-TVA approach allows tapping into attentional selection specifically. Second, self-prioritization effects have been replicated across a wide range of stimuli (e.g. faces: Wozniak et al., 2018; names or owned objects: Scheller & Sui, 2022a; or even fully unfamiliar stimuli: Wozniak & Knoblich, 2019) and paradigms (e.g. matching task: Sui et al., 2012; cross-modal cue integration: e.g. Scheller & Sui, 2022b; Scheller et al., 2023; continuous flash suppression: Macrae et al., 2017; temporal order judgment: Constable et al., 2019; Truong et al., 2017). Using neutral geometric shapes, rather than faces and names, addresses a key challenge in self research: mitigating the influence of stimulus familiarity on results. In addition, these newly learned, simple stimuli can be combined with other paradigms, such as the TOJ paradigm in the current study, to investigate the broader impact of self-processing on perception and cognition.

      To the best of our knowledge, this is the first study showing evidence about the mechanisms that are involved in early attentional selection of socially salient stimuli. Future replications and extensions would certainly be useful, as with any experimental paradigm.

      (3) The comparison between the "social" and "perceptual" tasks remains debatable, as it is challenging to equate the levels of social salience and perceptual salience. In addition, these two tasks differ not only in terms of social decoding processes but also in other aspects such as task difficulty. Whether the observed differences between the tasks can definitively suggest the specificity of social decoding, as the authors claim, needs further confirmation.

      Equating the levels of social and perceptual salience is indeed challenging, but not an aim of the present study. Instead, the present study directly compares the mechanisms and effects of social and perceptual salience, specifically in Experiment 2. By manipulating perceptual salience (relative colour) and social salience (relative shape association) independently and jointly, and quantifying the effects on processing rates, our study allows us to directly delineate the contributions of each of these types of salience. The results suggest additive effects (see also Figure 7). The possibility remains that these effects are additive because of the use of different perceptual features, so it would be helpful for future studies to explore whether similar perceptual features lead to (supra-/sub-) additive effects. In either case, the study design allows a direct comparison of the effects and mechanisms of social and perceptual salience.

      Regarding the social and perceptual decision dimensions, they were not expected to be equated. Indeed, the social decision dimension requires additional retrieval of the associated identity, making it likely more challenging. This additional retrieval is also likely responsible for the slower responses towards the social association compared to the shape itself. However, the motivation to compare the effects of these two decisional dimensions lies in the assumption that the self needs to be task-relevant to induce self-prioritization effects, an assumption for which there is some evidence (e.g., Woźniak & Knoblich, 2022). That said, these studies typically used matching tasks and were powered to detect large effects only (e.g. f = 0.4, n = 18). Because removing the contribution of decisional processing levels (which interact with task-relevance) is likely to reduce the SPE, smaller self-prioritization effects that result from earlier processing levels may not be detected with sufficient statistical power. Targeting specific processing levels, especially those with relatively early contributions or small effect sizes, requires larger samples (here: n = 70) to provide sufficient power. Indeed, by contrasting the relative attentional selection effects in the present study we find that the self does not need to be task-relevant to produce self-prioritization effects. This is in line with recent findings of prior entry of self-faces (Jubile & Kumar, 2021).

      Reviewer #2 (Recommendations for the authors):

      Suggestions:

      (1) The research questions should be revised to better align with the conclusions. For example, Q2 is phrased as "Does self-relatedness bias attentional selection at the level of the perceptual feature representation (shape) or at the level of the associated identity (social association)," which is unclear in its reference to "levels." A more appropriate phrasing would be whether the self-association bias occurs automatically or whether it depends on explicit social decoding.

      Thank you for this suggestion – we have revised the phrasing accordingly: “Does self-relatedness bias attentional selection automatically or does it require explicit social decoding?”

      (2) After presenting the data, it would be helpful to include one or two sentences summarizing the conclusions drawn from the data and how they relate to the research questions. Currently, readers are left to guess whether the results are consistent with the hypotheses.

      Thank you for this suggestion, which we think will enhance the clarity of the manuscript – we have added summary sentences when presenting the results:<br /> “This cross-experimental parameter inspection revealed that participants exhibited an attentional selection bias towards socially associated information. Interestingly, enhanced processing speed was observed for other-associated rather than self-associated information, a pattern that diverged from our prediction.”

      (1) “Results from experiment 2 demonstrated a faster, more automatic attentional selection for self-associated information when the decision did not require explicit social decoding. When the social identity had to be judged, processing speed for self-associated information decreased. Contrary to the hypothesis that social decoding is necessary for self-prioritization to emerge, these findings suggest that attentional selection can operate automatically to prioritize self-associated information.”

      (2) “Taken together, as also confirmed in the cross-experimental analysis, attentional selection favoured the other-related information when social identity had to be judged. In contrast, perceptual salience, as predicted, led to increased processing speed for the more salient stimulus.”

      (3) The identity of the "other" used in the experiments is unclear, making it uncertain whether the results are self-specific. It would be beneficial to compare the self condition with a control condition, such as a close friend vs. an unfamiliar other. Alternatively, the results may reflect attentional bias for familiar vs. unfamiliar individuals rather than self-specific bias.

      Thank you for this comment. Firstly, we would like to clarify that we have provided participants with a description of who the “other” is (see methods: “At the beginning of this task, participants were told that one of the two geometric shapes that was used in the TOJ task has been assigned to them, and the other shape has been assigned to another participant in the experiment – someone they did not know, but who was of similar age and gender”). We aimed to make the ‘other’ as concrete as possible, while maintaining a ‘stranger’ identity.

      Secondly, this specification is in line with the vast majority of the literature, which typically measures the effects of self-prioritization relative to the association with an unfamiliar other (stranger), or an unfamiliar and familiar other (e.g. friend, family member). These studies find that processing advantages for friend-related stimuli (friend-associated stimuli being processed faster than stranger-associated stimuli) are likely mediated by self-extension, that is, an association of the friend with the self. As such, SPEs measured relative to familiar others are typically smaller in size (see, e.g., Sui et al., 2012). They are also less stable and more variable than the self-prioritization effects measured relative to a stranger (see Scheller & Sui, 2022 JEP:HPP). Importantly, this is driven by the variability of the friend-associated stimulus, rather than the self or other-associated stimulus (see Figure 4 in main text and S5 in supplementary material in Scheller & Sui, 2022: https://durham-repository.worktribe.com/output/1210478/the-power-of-the-self-anchoring-information-processing-across-contexts). Effectively, this would suggest that choosing a familiar other as a reference would not only (a) lead to a smaller effect size, but also (b) yield a less stable effect, which likely depends on the association the individual has with the familiar other. In contrast, by associating the other shape with another participant in this experiment, we not only provide participants with a concrete representation of a stranger, but also maximise our ability to detect true effects, as these are likely to be larger and more stable.

      (4) The key aspects of the procedure (e.g., the order of different conditions) and its rationale need to be clearly explained before or during the presentation of the results. Currently, readers are left to infer certain details.

      Thank you for pointing this out. The methods that provide these details are outlined at the end of the document; however, we agree it would be useful to bring some of these details forward. We have therefore revised the methods figure (Figure 3) to include an outline of the task type, order, and trial numbers. Task boxes are colour coded by the conditions that are listed in the results figures of the manuscript. We also added these details to the caption of Figure 3.

      “Task structures of Experiments 1 and 2. Both experiments started with a TOJ baseline task. In Experiment 1, only non-salient targets were presented, while in Experiment 2, perceptually salient and non-salient trials were included. These were presented in randomly intermixed order. Next, targets were associated with social identities. Associations were practiced using the matching task. Following association learning, which attaches social salience to the shapes, participants completed the same TOJ task as before. In Experiment 1, they completed one block using a social decision dimension, and one block using a perceptual decision dimension. The order of these blocks was counterbalanced across participants to reduce the influence of order effects in the results. In Experiment 2, perceptually salient and non-salient stimuli were presented in an intermixed fashion, and participants responded within the social decision dimension. Each task block was preceded by 8 (matching) to 14 (TOJ) practice trials.”

      (5) Certain imprecise terms used to describe the results, such as "slightly," "roughly," and "loosely," create confusion for the readers. The authors should take a clearer stance on the results and provide an explanation for why the data only "slightly," "roughly," or "loosely" support the findings.

      Thank you for highlighting this. We have provided more concrete wording and details throughout (e.g., “‘target shapes’ were 30% bigger than the ‘background shapes’”).

      Lastly, we have updated the formatting of the manuscript to provide higher fidelity figures, which were previously compromised by file conversion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This provocative manuscript presents valuable comparisons of the morphologies of Archaean bacterial microfossils to those of microbes transformed under environmental conditions that mimic those present on Earth during the same Eon, although the evidence in support of the conclusions is currently incomplete. The reasons include that taphonomy is not presently considered, and a greater diversity of experimental environmental conditions is not evaluated -- which is important because we ultimately do not know much about Earth's early environments. The authors may want to reframe their conclusions to reflect this work as a first step towards an interpretation of some microfossils as 'proto-cells,' and less so as providing strong support for this hypothesis.

      Regarding the taphonomic alterations: The editor and reviewers are correct in pointing out this issue. Taphonomic alteration of the microfossils attains special significance in the case of microorganisms, as they lack rigid structures and are prone to morphological alterations during or after their fossilization. We are acutely aware of this issue and have conducted long-term experiments (lasting two years) to observe how cells die, decay, and get preserved. A large section of the manuscript (pages 11 to 20) and a substantial portion of the supplementary information is dedicated to understanding the taphonomic alterations. To the best of our knowledge, these are among the longest experiments done to understand the taphonomic alterations of the cells within laboratory conditions. 

      Recent reports by Orange et al. (1,2) showed that under favorable environmental conditions, cells could be fossilized rather rapidly with little morphological modification. We observed a similar phenomenon in this work. Cells in our study underwent rapid encrustation with cations from the growth media. We have analyzed the morphological changes over a period of 18 months. After 18 months, the softer biofilms became entirely encrusted in salt and turned solid (Fig. ). Despite this transformation, morphologically intact cells could still be observed within these structures. This suggests that the cells inhabiting Archaean coastal marine environments could undergo rather rapid encrustation, and their morphological features could be preserved in the geological record with little taphonomic alteration.

      Regarding the environmental conditions: We are in total agreement with the reviewers that much is unknown about Archaean geology and its environmental conditions. Like the present-day Earth, Archaean Earth certainly had regions that greatly differed in their environmental conditions—volcanic freshwater ponds, brines, mildly halophilic coastal marine environments, and geothermal and hydrothermal vents, to name a few. Our experimental design focuses on one environment we have a relatively good understanding of rather than the rest of the planet, of which we know little. Below, we list our reasons for restricting to coastal marine environments and studying cells under mildly halophilic experimental conditions.  

      (1) Very little continental crust from the Hadean and early Archaean Eons exists on the present-day Earth. Much of our geochemical understanding of this time period was a result of studying the Pilbara Iron Formations and the Barberton Greenstone Belt. Geological investigations suggest that these sites were coastal marine environments. The salinity of coastal marine environments is higher than that of open oceans due to the greater water evaporation within these environments. Moreover, brines were discovered within pillow basalts within the Barberton Greenstone Belt, suggesting that the salinity within these sites was higher than or similar to that of marine environments.

      (2) We are not certain about the environmental conditions that could have supported the origin of life. However, all currently known Archaean microfossils were reported from coastal marine environments (3.8-2.4 Ga). This suggests that proto-life likely flourished in mildly halophilic environments, similar to the experimental conditions employed in our study.

      (3) The chemical analysis of Archaean microfossils also suggests that they lived in salt-rich environments, as most, if not all, microfossils are closely associated with, and often encrusted in, a thin layer of salt.

      However, we concur with the reviewers that our interpretations should be reassessed if Archaean microfossils that greatly differ from the currently known microfossils are to be discovered or if new microfossils are to be reported from environments other than coastal marine sites.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      Microfossils from the Paleoarchean Eon represent the oldest evidence of life, but their nature has been strongly debated among scientists. To resolve this, the authors reconstructed the lifecycles of Archaean organisms by transforming a Gram-positive bacterium into a primitive lipid vesicle-like state and simulating early Earth conditions. They successfully replicated all morphologies and life cycles of Archaean microfossils and studied cell degradation processes over several years, finding that encrustation with minerals like salt preserved these cells as fossilized organic carbon. Their findings suggest that microfossils from 3.8 to 2.5 billion years ago were likely liposome-like protocells with energy conservation pathways but without regulated morphology. 

      Strengths: 

      The authors have crafted a compelling narrative about the morphological similarities between microfossils from various sites and proliferating wall-deficient bacterial cells, providing detailed comparisons that have never been demonstrated in this detail before. The extensive number of supporting figures is impressive, highlighting numerous similarities. While conclusively proving that these microfossils are proliferating protocells morphologically akin to those studied here is challenging, we applaud this effort as the first detailed comparison between microfossils and morphologically primitive cells. 

      Weaknesses: 

      Although the species used in this study closely resembles the fossils morphologically, it would be beneficial to provide a clearer explanation for its selection. The literature indicates that many bacteria, if not all, can be rendered cell wall-deficient, making the rationale for choosing this specific species somewhat unclear. While this manuscript includes clear morphological comparisons, we believe the authors do not adequately address the limitations of using modern bacterial species in their study. All contemporary bacteria have undergone extensive evolutionary changes, developing complex and intertwined genetic pathways unlike those of early life forms. Consequently, comparing existing bacteria with fossilized life forms is largely hypothetical, a point that should be more thoroughly emphasized in the discussion. 

      Another weak aspect of the study is the absence of any quantitative data. While we understand that obtaining such data for microfossils may be challenging, it would be helpful to present the frequencies of different proliferative events observed in the bacterium used. Additionally, reflecting on the chemical factors in early life that might cause these distinct proliferation modes would provide valuable context. 

      Regarding our choice of using modern organisms or this particular bacterial species: 

      Based on current scientific knowledge, it is logical to infer that cellular life originated as protocells; nevertheless, there has been no direct geological evidence for the existence of such cells on early Earth. Hence, protocells remain an entirely theoretical concept. Moreover, protocells are considered to have been far more primitive than present-day cells. Surprisingly, this lack of sophistication was the biggest challenge in understanding protocells. Designing experiments in which cells are primitive (but not as primitive as non-living lipid vesicles) and still retain a functional resemblance to a living cell does pose some practical challenges. Laboratory experiments with substitute (proxy) protocells almost always come with some limitations. Although not a perfect proxy, we believe protocells and protoplasts share certain characteristics. Having said that, we would like to reemphasize that protoplasts are not protocells. Our reasons for using protoplasts as model organisms and working with this bacterial species (Exiguobacterium Strain-Molly) are based on several scientific and practical criteria listed below.

      (1) Irrespective of cell physiology and intracellular complexity, we believe that protoplasts and protocells share certain similarities in the biophysical properties of their cytoplasm. We explained our reasoning in the manuscript introduction and in our previous manuscripts (Kanaparthi et al., 2024 & Kanaparthi et al., 2023). In short, to be classified as a cell, even a protocell should possess minimal biosynthetic pathways, a physiological mechanism of harvesting free energy from its surroundings (energy-yielding pathways), and a means of replicating its genetic material and transferring it to the daughter cells. These minimal physiological processes could incorporate considerable cytoplasmic complexity. Hence, the biophysical properties of the protocell cytoplasm could have resembled those of the cytoplasm of protoplasts, irrespective of the genomic complexity.

      (2) Irrespective of their physiology, protoplasts exhibit several key similarities to protocells, such as their inherent inability to regulate their morphology or reproduction. This similarity was pointed out in previous studies (3). Despite possessing all the necessary genetic information, protoplasts undergo reproduction through simple physicochemical processes independent of canonical molecular biological processes. This method of reproduction is considered to have been erratic and rather primitive, akin to the theoretical propositions on protocells. Although protoplasts are fully evolved cells with considerable physiological complexity, the above-mentioned biophysical similarities suggest that the protoplast life cycle could morphologically resemble that of protocells (in no other aspect except for their morphology and reproduction).

      (3) Physiologically or genomically different species of Gram-positive protoplasts have been shown to exhibit similar morphologies. This suggests that when Gram-positive bacteria lose their cell wall and turn into protoplasts, they reproduce in a similar manner independent of physiological or genome-based differences. As morphology, and only morphology, is key to our study, intracellular complexity is not a key consideration, at least within the scope of this study.

      (4) This specific strain was isolated from submerged freshwater springs in the Dead Sea. This isolate and members of this bacterial genus are known to be well acclimatized to growing in a wide range of salt concentrations and in different salt species. This is important for our study (this and the previous manuscript), in which cells must be grown not only at high salt concentrations (1-15%) but also in different salts like NaCl, MgCl<sub>2</sub>, and KCl.

      (5) Our initial interest in this isolate was due to its ability to reduce iron at high salt concentrations. Most spherical microfossils are found in Archaean banded iron formations covered in pyrite, which suggests that these microfossils could have been reducing oxidized iron species like Fe(III). Nevertheless, over the course of our study, we realized the complexities of live cell staining and imaging under anoxic conditions. Given that the scope of the manuscript is restricted only to comparing the morphologies, not the physiology, we abandoned the idea of growing cells under anoxic conditions.

      Based on these observations, cell physiology may not be a key consideration, at least within the scope of studying microfossil morphology. However, we want to emphasize again that “We do not claim present-day protoplasts are protocells.”  

      Regarding the absence of quantitative data:

      We are unsure what the reviewer meant by the absence of quantitative data. Is it from the cell size/reproductive pathways perspective or from a microfossil/ecological perspective? At the risk of being portrayed in a bad light, we admit that we did not present quantitative data from either of these perspectives. In our defense, this was not due to our lack of effort but due to the practical limitations imposed by our model organism. 

      If the reviewer means the quantitative data regarding cell sizes and morphology: In our previous work, we studied the relationship between protoplast morphology, growth rate, and environmental conditions. In that study, we proposed that the growth rate is one factor that regulates protoplast morphology. Nevertheless, we did not observe uniformity in the sizes of the cells. This lack of uniformity was not just between the replicates but even among the cells grown within the same culture flask or the cells within the same microscopic field. Moreover, cells are often observed to be reproducing by forming either internal or external daughter cells, or by both of these processes at the same time. The size and morphological differences among cells within a growth stage could be explained by the physiological and growth rate heterogeneity among cells.

      Bacterial growth curves and their partition into different stages (lag, log & stationary), in general, represent the growth dynamics of an entire bacterial population. Nevertheless, averaging the data obscures the behavior of individual cells (4,5). It is known that genetically identical cells within a single bacterial population could exhibit considerable cell-to-cell variation in gene expression (6,7) and growth rates (8). The reason for such stochastic behavior among monoclonal cells has not been well understood. In the case of normal cells, morphological manifestation of these variations is restricted by a rigid cell wall. Given the absence of a cell wall in protoplasts, we assume such cell-to-cell variations in growth rate are manifested in cell morphology. This makes it challenging to quantitatively determine variations in cell sizes or the size increase in a statistically robust manner, even in monoclonal cells.

      This lack of uniformity in cell sizes should not be perceived as a limitation, as the same behavior is consistently observed among microfossils. Spherical microfossils of similar morphology but different sizes were reported from different microfossil sites (9,10). In this regard, both protoplasts and microfossils are very similar.

      If the reviewer means the quantitative data from an ecological perspective: 

      Based on the elemental composition and the isotopic signatures of the organic carbon, we can deduce whether these structures are of biological origin. However, any further interpretation of these data to assign these microfossils to a particular physiological group is fraught with errors. Hence, we refrain from making any inferences about the physiology and ecological function of these microfossils. This lack of clarity on the physiology of microfossils reduces the chance of quantitative studies on their ecological functions. Moreover, we would like to re-emphasize that the scope of this work is restricted to morphological comparison and is not targeted at understanding the ecological function of these microfossils. This narrow objective also limits the nature of the quantitative data we could present.

      Moreover, developing a quantitative understanding of some phenomena could be technically challenging. Many theories on the origin of life, like chemical evolution, started with the qualitative observation that lightning could mediate the synthesis of biologically relevant organic carbon. Our quantitative understanding of this process is still being explored and debated even to this day.     

      Reviewer #2 (Public Review): 

      Summary: 

      In summary, the manuscript describes life-cycle-related morphologies of primitive vesicle-like states (Em-P) produced in the laboratory from the Gram-positive bacterium (Exiguobacterium Strain-Molly) under assumed Archean environmental conditions. Em-P morphologies (life cycles) are controlled by the "native environment". In order to mimic Archean environmental conditions, soy broth supplemented with Dead Sea salt was used to cultivate Em-Ps. The manuscript compares Archean microfossils and biofilms from selected photos with those laboratory morphologies. The photos derive from publications on various stratigraphic sections of Paleo- to Neoarchean ages. Based on the similarity of morphologies of microfossils and Em-Ps, the manuscript concludes that all Archean microfossils are in fact not prokaryotes, but merely "sacks of cytoplasm". 

      Strengths: 

      The approach of the authors to recognize the possibility that "real" cells were not around in the Archean time is appealing. The manuscript reflects the very hard work by the authors composing the Em-Ps used for comparison and selecting the appropriate photo material of fossils. 

      Weaknesses: 

      While the basic idea is very interesting, the manuscript includes flaws and falls short in presenting supportive data. The manuscript makes too simplistic assumptions on the "Archean paleoenvironment". First, like in our modern world, the environmental conditions during the Archean time were not globally the same. Second, we do not know much about the Archean paleoenvironment due to the immense lack of rock records. More so, the Archean stratigraphic sections from where the fossil material derived record different paleoenvironments: shelf to tidal flat and lacustrine settings, so differences must have been significant. Finally, the Archean spanned 2.500 billion years and it is unlikely that environmental conditions remained the same. Diurnal or seasonal variations are not considered. Sediment types are not considered. Due to these reasons, the laboratory model of an Archean paleoenvironment and the life therein is too simplistic. Another aspect is that eucaryote cells are described from Archean rocks, so it seems unlikely that prokaryotes were not around at the same time. Considering other fossil evidence preserved in Archean rocks except for microfossils, the many early Archean microbialites that show baffling and trapping cannot be explained without the presence of "real cells". With respect to lithology: chert is a rock predominantly composed of silica, not salt. The formation of Em-Ps in the "salty" laboratory set-up seems therefore not a good fit to evaluate chert fossils. Formation of structures in sediment is one step. The second step is their preservation. However, the second aspect of taphonomy is largely excluded in the manuscript, and the role of fossilization (lithification) of Em-Ps is not discussed. This is important because Archean rock successions are known for their tectonic and hydrothermal overprint, as well as recrystallization over time. 
      Some of the comparisons of laboratory morphologies with fossil microfossils and biofilms are incorrect because scales differ by magnitudes. In general, one has to recognize that prokaryote cell morphologies do not offer many variations. It is possible to arrive at the morphologies described in various ways including abiotic ones. 

      Regarding the simplistic presumptions on the Archaean Eon environmental conditions, we provided a detailed explanation of this issue in our response to the eLife evaluation. In short, we agree with the reviewer that little is known about the Archaean Eon environmental conditions at a planetary scale. Hence, we restricted our study to one particular environment of which we had a comparatively good understanding. The Archaean Eon spanned 2.5 billion years. However, most of the microfossil sites we discussed in the manuscript are older than 3 billion years, with one exception (the 2.4-billion-year-old Turee Creek microfossils). We presume that conditions within this niche (coastal marine) environment could not have changed greatly until 2 Ga, after which there were major changes in the ocean salt composition and salinities.

      In the manuscript, we discussed extensively the reasons for restricting our study to these particular environmental conditions. Further explanations of these choices are presented in our response to the eLife evaluation (also see our previous manuscript). In short, the fact that all known microfossils are restricted to coastal marine environments justifies the experimental conditions employed in our study. Nevertheless, we agree with the reviewer that all lab-based studies involve some extent of simplification. This gap/mismatch is even wider when it comes to studies involving the origin of life or early life on Earth.

      We are not arguing that prokaryotes were not around at this time. The key message of the manuscript is that they were present but had not yet developed intracellular mechanisms to regulate their morphology, and so remained primitive in this aspect.

      The sizes of the microfossils and cells from our study were similar in most cases. However, we agree with the reviewer that they deviated considerably in some cases, for example, S70, S73, and S83. These size variations are limited to sedimentary structures like laminations rather than cells. These differences should be expected as we try to replicate, in a 2 ml volume chamber slide, the real-life morphologies of biofilms that could have extended over large swaths of natural environments. More specifically, in Fig. S70, there is a considerable size mismatch. But, in Fig. S73, the sizes were comparable between A & C (of course, the size of our reproduction did not match B). In the case of Fig. S83, we do not see a huge size mismatch.

      Reviewer #1 (Recommendations For The Authors): 

      We would like to provide several suggestions for changes in text and additions to data analysis. 

      39-41: It has been stated that reconstructing the lifecycle is the only way of understanding the nature of these microfossils. First of all, I would rephrase this to 'the most promising way', as there are always multiple approaches to comparing phenomena. 

      We agree with the reviewer's suggestion. The suggested changes have been made (line 41). 

      125: Please rephrase "under the environmental condition of early Earth" to "under experimental conditions possibly resembling the conditions of the Paleoarchean Eon". Now it sounds like the exact environmental conditions have been produced, which has already been debated in the discussion. 

      We agree with the reviewer's suggestion. The suggested changes have been made (line 127). 

      125: Please mention the fold change in size, the original size in numbers, and whether this change is statistically significant. 

      In the above sections of this document, we explained our reservations about presenting the exact number.

      128: Have you found a difference in the relative percentages of modes of reproduction? In other words, is there a difference in percentage between forming internal daughter cells or a string of external daughter cells? 

      We explained our reservations about presenting the exact number above. But this has been extensively discussed in our accompanying manuscript. We want to reemphasize that the scope of this manuscript is restricted to comparing morphologies rather than providing a mechanistic explanation of the reproduction process. 

      151: A similar model for endocytosis has already been described in proliferating wall-less cells (Kapteijn et al., 2023). In the discussion, please compare your results with the observations made in that paper. 

      This was an oversight on our part. The reference suggested by the reviewer has now been added (lines 154 & 155).

      163: Please use another word for uncanny. We suggest using 'strong resemblance'. 

      We changed this according to the reviewers' suggestion (line 168). 

      433: Please elaborate on why the results are not shown. This sounds like a statement that should be substantiated further. 

      To observe growth and simultaneously image the cells, we conducted these experiments in chamber slides (2 ml volume). Over time, we observed cells growing and breaking out of the salt crust (Fig. S86, S87 & Movie 22) and a gradual increase in the turbidity of the media. Although not quantitative, this is a qualitative indication of growth. We did not take precise measurements for several reasons. This sample is precious; it took us almost two years to solidify the biofilm completely, as shown in Fig. S84A. Hence, it was in limited supply, which prevented us from inoculating these salt crusts into large volumes of fresh media. Given the long period of starvation, these cells often exhibited a long lag phase (several days), and there wasn't enough volume to do OD measurements over time. 

      We also crushed the solidified biofilm with a sterile spatula before transferring it into the chamber slide with growth media. This resulted in debris in the form of small solid particles, which interfered with our OD measurements. These practical considerations made it challenging to determine the growth precisely. Despite these challenges, we measured an OD of 4 in some chamber slides after two weeks of incubation. Given that these measurements were done haphazardly, we chose not to present this data. 

      456: Could you please double-check whether the description is correct for the figure? 8C and 8D are part of Figure 8B, but this is stated otherwise in the description. 

      We thank the reviewer for pointing it out. It has now been rectified (lines 461-472).

      Reviewer #2 (Recommendations For The Authors): 

      We thank Reviewer #2 for carefully reading the manuscript and for providing such an elaborate list of questions. The revisions suggested have definitely improved the quality of the manuscript. Here, we would like to address some of the questions that come up repeatedly below. One frequently asked question concerns the letters denoting the individual figures within the images. For comparison purposes, we often reproduced previously published images. To maintain a consistent figure style, we often had to block the previous denotations with an opaque square and assign a new letter. 

      The second question that appeared repeatedly below is the missing scale bars in some of the images within a figure. We often did not include a scale bar in the images when this image is an enlarged section of another image within the same figure.     

      Title: Please consider being more precise in the title. Microfossils are only one fossil group of "oldest life". Perhaps better: "On the nature of some microfossils in Archean rocks". (see also Line 37).  

      Authors’ response: The title conveys a broader message without quantitative insinuations. If our manuscript had been titled "On the nature of all known Archaean microfossils,” we would have agreed with the reviewer's suggestion and changed it to "On the nature of some microfossils in Archean rocks". As it is not, we respectfully decline to make this modification.

      Abstract:  

      Line 41: "one way", not "the only way" 

      We agree with the reviewer’s comment, and necessary changes have been made (line 41).  

      Introduction: 

      Line 58f: "oldest sedimentary rock successions", not "oldest known rock formations". There are rocks of much older ages, but those are not well preserved due to metamorphic overprint, or the rocks are igneous to begin with. Minor issue: please note that "formations" are used as stratigraphic units, not so much to describe a rock succession in the field. 

      We agree with the reviewer’s comment and have made necessary changes (line 58).

      Line 67: Microfossils are widely accepted as evidence of life. Please rephrase. 

      We agree with the reviewer’s comment, and necessary changes have been made.

      Line 71 - 74: perhaps add a sentence of information here.

      We agree with the reviewer’s comment, and necessary changes have been made (line 71).

      Line 76: which "chemical and mineralogical considerations"? 

      This has been rephrased to “Apart from the chemical and δ<sup>13</sup>C-biomass composition” (line 76).

      Line 84ff: This is a somewhat sweeping statement. Please remember that there are microbialites in such rocks that require already a high level of biofilm organization. The existence of cyanobacteria-type microbes in the Archean is also increasingly considered. 

      We are aware of literature that labeled the clusters of Archaean microfossils as biofilms and layered structures as microbialites or stromatolite-like structures. However, the use of these terms is increasingly being discouraged. A more recent consensus among researchers suggests annotating these structures simply as sedimentary structures, or as microbially induced sedimentary structures (MISS). 

      We respectfully disagree with the reviewer’s comment that Archaean microfossils exhibit a high level of biofilm organization. We are not aware of any studies that have conducted such comprehensive research on the architecture of Archaean biofilms. We are not even certain if these clusters of Archaean cells could be labeled as biofilms in the true sense of the term. We presently lack an exact definition of a biofilm. In our study, we do see sedimentation of bacteria and their encapsulation in cell debris. From a broader perspective, any such aggregation of cells enclosed in cell debris could be annotated as a biofilm. However, more in-depth studies show that a biofilm is not a random but a highly organized structure. Different bacterial species have different biofilm architectures and chemical compositions. The multispecies biofilms in natural environments are even more complex. We do agree with the reviewer that these structures could broadly be labeled as biofilms, but we presently lack a good, if any, understanding of the Archaean biofilm architecture. 

      Regarding the annotation of microfossils as cyanobacteria, we respectfully disagree with the reviewer. This is not a new concept. Many of the Archaean microfossils were annotated as cyanobacteria at the time of their discovery. This annotation is not without controversy. With the advent of genome-based studies, researchers are increasingly moving away from this school of thought.  

      Line 101ff: The conditions on early Earth are unknown - there are many varying opinions. Perhaps simply state that this laboratory model simulates an Archean Earth environment of these conditions outlined. 

      This is a good idea. We thank the reviewer for this suggestion, and we made appropriate changes. 

      Line 112: manuscript to be replaced by "paper"? 

      This change has been made (line 114).

      Line 116: "spanned years" - how many years? 

      We now added the number of years in the brackets (line 118).

      Results: 

      Line 125: see comment for 101ff. 

      We made appropriate changes. 

      Figure 1: Caption: Please write out ICV the first time this abbreviation is used. Images: Note that some lettering appears to not fit their white labels underneath. (G, H, I, J0, and M). 

      We apologize; this was an oversight on our part. We now spell out ICV in full the first time this abbreviation is used. 

      We took these images from previously published work (references in the figure legend), so we must block out the previous figure captions. This is necessary to maintain a uniform style throughout the manuscript. 

      Line 152ff.: here would be a great opportunity to show in a graph the size variations of modern ICVs and to compare the variations with those in the fossil material. 

      In the above sections of this document, we explained our reservations about presenting the exact number.

      Line 159f.: Fig.1K - what is to see here? Maybe a close-up or - better - a small sketch would help? 

      Fig. 1K shows the surface depressions formed during vesicle formation. The surface characteristics of EM-P and microfossils are very similar.

      Line 161f.: reference?  

      The paragraph spanning lines 159 to 172 discusses the morphological similarities between EM-P and SPF microfossils. We rechecked reference no. 35 (Delarue 2019). This is the correct reference. If the reviewer meant the references to the figures, we do not see a mistake there either.

      Line 164ff.: A question may be asked, how many fossils of the Strelley Pool population would look similar to the "modeled" ones. Questions may rise in which way the environmental conditions control such morphology variations. Perhaps more details? 

      This relationship between the environmental conditions and the morphology is discussed extensively in our previous work (11).  

      Line 193: what is meant by "similar discontinuous distribution of organic carbon"?

      This statement highlights similarities between EM-P and microfossils. The distribution of cytoplasm within the cells is not uniform. There are regions with cytoplasm and regions devoid of it, which is quite unusual for bacteria. Some previous studies argued that this could indicate that these organic structures are of abiotic origin. Here, we show that EM-P-like cells could exhibit such a patchy distribution of cytoplasm within the cell.

      Line 218 - 291: The observations are very nice, however, the figures of fossil material in Figures 3 A, B, and C appear not to conform. Perhaps use D, E and I to K. Also, S48 does not show features as described here (see below).  

      We did not completely understand the reviewer’s question. As mentioned in the figure legend, both the microfossils and the cells exhibit strings with spherical daughter cells within them. Moreover, there are also other similarities, like the presence of hollow spherical structures devoid of organic carbon. We also found several mistakes in the Fig. S48 legend. We have rectified them, and we thank the reviewer for pointing them out.

      Line 293f: Title with "." at end?

      This change has been made.

      Line 298: predominantly in chert. In clastic material preservation of cells and pores is unlikely due to the common lack of in situ entombment by silica. 

      We rephrased this entire paragraph to better convey our message. Either way, we are not arguing that hollow pore spaces exist. As the reviewer mentioned, they will, of course, be filled up with silica. In this entire paragraph, we did not refer to hollow spaces. So, we are not entirely sure what the question was.     

      Line 324, 328-349: Please see below comments on the supplementary figures 51-62. Some of the interpretations of morphologies may be incorrect. 

      Please find our response to the reviewer’s comments on individual figures below.  

      Figure 5 A to D look interesting, however E to J appear to be unconvincing. What is the grey frame in D (not the white insert). 

      The grey color is just the background that was added during the 3D rendering process.  

      Figure 6 does not appear to be convincing. - Erase? 

      We did not understand the reviewer’s reservations regarding this figure. Images A-F within the figure show the gradual transformation of cells into honeycomb-like structures, and images G-J show such structures from the Archaean that are closely associated with microfossils. Moreover, we did not come up with this terminology (honeycomb-like). Previous manuscripts proposed it.  

      Line 379ff: S66 and 69, please see my comments below. Microfossils "were often discovered" in layers of organic carbon. 

      Please see our response below.   

      Line 393-403: Laminae? There are many ways to arrive at C-rich laminae, especially, if the material was compressed during burial. Basically, any type of biofilm would appear as laminae, if compressed. The appearance of thin layers is a mere coincidence. Note that the scale difference in S70, S73, as well as S83, is way too high (cm versus μm!) to allow any such sweeping conclusions. What are α- and β- laminations, the one described by Tice et al.? The arguments are not convincing.

      We propose that cells are compressed to form laminae. We addressed the question about the differences in scale above. Yes, we are referring to the α- and β-laminations described by Tice et al.

      Figure 7: This is an interesting figure, but what are the arguments for B and C, the fossil material, being a membrane? Debris cannot be distinguished with certainty at this scale in the insert of C. B could also be a shriveled-up set of trichomes.  

      We agree with the reviewer that debris cannot be definitely differentiated. Traditionally, annotations given to microfossil structures such as biofilm, intact cells, or laminations were all based on morphological similarities with existing structures observed in microorganisms. Given that the structures observed in our study are very similar to the microfossil structures, it is logical to make such inferences. Scales in A & B match perfectly well. The structure in C is much larger, but, as we mentioned in reply to one of the reviewer’s earlier questions, some of the structures from natural environments could not be reproduced at scale in lab experiments. Working in 2 ml chamber slides does impose some restrictions.

      Figure 8: The figure does not show any honeycomb patterns. The "gaps" in the Moodies laminae are known as lenticular particles in biofilms. They form by desiccated and shriveled-up biofilm that mineralizes in situ. Sometimes also entrapped gases induce precipitation. Note also that the modelled material shows a kind of skin around the blobs that are not present in the Moodies material.  

      We agree that entrapped gas bubbles could have formed lenticular gaps. In the manuscript, we did not discount this possibility. However, if that is the case, one should explain why we often find clumps of organic carbon within these gaps. As we presented a step-by-step transformation of parallel layers of cells into laminations, which also had similar lenticular gaps, we believe this is a more plausible way such structures could have formed. In the end, there could have been more than one way such structures could have been formed. 

      We do see the honeycomb pattern in the hollow gaps. Often, the 3D rendering of the STED images obscures some details. Hence, in the figure legend, we referred to the supplementary figures, which also show the sequence of steps involved in the formation of such a pattern.

      Line 405-417: During deposition of clastic sediment any hollow space would be compressed during burial and settling. It is rare that additional pore space (except between the grain-grain contacts) remains visible, especially after consolidation. The exception would be if very early silicification took place filling in any pore space. What about EPS being replaced by mineralic substance? The arguments are not convincing. 

      We are suggesting that EPS or cell debris is rapidly encrusted by cations from the surrounding environment and gets solidified into rigid structures. This makes it possible for the structures to be preserved in the fossil record. We believe that hollow structures like the lenticular gaps will be filled up with silica. 

      We do not agree with the reviewer’s comment that all biological structures will be compressed. If this is true, there should be no intact microfossils in the Archaean sedimentary structures, which is definitely not the case.      

      Line 419-430: Lithification takes place within the sediment and therefore is commonly controlled by the chemistry of pore water and chemical compounds that derive from the dissolution of minerals close by. Another aspect to consider is whether "desiccation cracks" on that small scale may be artefacts related to sample preparation (?).  

      We agree that desiccation cracks could have formed during the sample preparation for SEM imaging, as this involves drying the biofilms. However, we observed that the sample we used for SEM is a completely solidified biofilm (Fig. S84), so we expect little change in its morphology during drying. Moreover, visible cracks and pointy edges were also observed in wet samples, as shown in Fig. S87.        

      Line 432 - 439: Please see comments on the supplementary material below.

      Please find our response to the reviewer’s comments on individual figures below.  

      Discussion:  

      Line 477f: "all known microfossil morphologies" - is this a correct statement? Also, would the Archean world provide only one kind of "EM-P type"? Morphologies of prokaryote cells (spherical, rod-shaped, filamentous) in general are very simple, and any researcher of Precambrian material will appreciate the difficulties in concluding on taxonomy. There are papers that investigate putative microfossils in chert as features related to life cycles. Microfossil-papers commonly appear not to be controversial give and take some specific cases.  

      We made a mistake in using the term “all known microfossil morphologies.” We have now changed it to “all known spherical microfossils” (line 483). However, we do not agree with the statement that microfossil manuscripts tend not to be controversial. Assigning taxonomy to microfossils is anything but uncontroversial. This has been intensely debated among the scientific community.

      Line 494-496: This statement should be in the Introduction.

      We agree with the reviewer’s comment. In an earlier version of the manuscript this statement was in the introduction. To put this statement in its proper context, it needs to be associated with a discussion about the importance of morphology in the identification of microfossils. The present version of the manuscript does not permit moving an entire paragraph into the introduction. Hence, we think making this statement in the discussion section is appropriate. 

      Line 484ff. The discussion on biogenicity of microfossils is long-standing (e.g., biogenicity criteria by Buick 1990 and other papers), and nothing new. In paleontology, modern prokaryotes may serve as models but everyone working on Archean microfossils will agree that these cannot correspond to modern groups. An example is fossil "cyanobacteria" that is thought to have been around already in the early Archean. While morphologically very similar to modern cyanobacteria, their genetic information certainly differed - how much will perhaps remain undisclosed by material of that high age.  

      Yes, we agree with the reviewer that there has been a longstanding conflict on the topic of biogenicity of microfossils. However, we have never come across manuscripts suggesting that modern microorganisms should only be used as models. If at all, there have been numerous manuscripts suggesting that these microfossils represent cyanobacteria, streptomycetes, and methanotrophs. Regarding the annotation of microfossils as cyanobacteria, we addressed this issue in one of the previous questions raised by the reviewer.    

      Line 498ff: Can the variation of morphology and sizes of the EM-Ps be demonstrated statistically? Line 505ff are very speculative statements. Relabeling of what could be vesicles as "microfossils" appears inappropriate. Contrary to what is stated in the manuscript, the morphologies of the Dresser Formation vesicles do not resemble the S3 to S14 spheroids from the Strelley Pool, the Waterfall, and Mt Goldsworthy sites listed in the manuscript. The spindle-shaped vesicles in Wacey et al are not addressed by this manuscript. What roles in mineral and element composition would have played diagenetic alteration and the extreme hydrothermal overprint and weathering typical for Dresser material? S59, S60 do not show what is stated, and the material derives from the Barberton Greenstone Belt, not the Pilbara.

      Please see the comments below regarding the supplementary images. 

      We did not observe huge variations in the cell morphology. Morphologies, in most cases, were restricted to spherical cells with intracellular vesicles or filamentous extensions. Regarding the sizes of the cells, we see some variations. However, we are reluctant to provide exact numbers. We have presented our reasons above.

      We respectfully disagree with the reviewer’s comments. We see quite some similarities between the Dresser Formation microfossils and our cells. Beyond the similarities, we have provided the step-by-step transformation of cells that resulted in these morphologies. We fail to see what exactly the speculation is here. The argument that they should be classified as abiotic structures is based on the opinion that cells do not form such structures. We clearly show here that they can, and these biological structures resemble the Dresser Formation microfossils more closely than the abiotic structures do. 

      Regarding Figures S3-S14, we think they are morphologically very similar. Often, it is not just a matter of comparing both images or making exact reproductions (which is not possible). We should focus on reproducing the distinctive morphological features of these microfossils.

      We agree with the reviewer that we did not reproduce all the structures reported by Wacey’s original manuscript, such as spherical structures. We are currently preparing another manuscript to address the filamentous microfossils. These spindle-like structures will be addressed in this subsequent work. 

      We agree with the reviewer that we often have difficulties differentiating between cells and vesicles. This is not a problem in the early stages of growth. During the log phase, a significant volume of the cell consists of the cytoplasm, with hollow vesicles constituting only a minor volume (Fig. 1B or S1A). During the later growth stages (Fig. 1E&F or S11), cells were almost hollow, with numerous daughter cells within them. These cells often resemble hollow vesicles rather than cells. However, these are biologically formed structures, and one could argue that these vesicles are still alive as there is still a minimal amount of cytoplasm (Fig. S27). Hence, we should consider them as cells until they break apart to release daughter cells. 

      Regarding Figures S59 and S60, we did not claim either of these microfossils is from Pilbara Iron Formations. The legend of Figure S59 clearly states that these structures are from Buck Reef Chert, originally reported by Tice et al., 2006 (Figure 16 in the original manuscript). The legend of Figure S60 says these structures were originally reported by Barlow et al., 2018, from the Turee Creek Formation. 

      Line 546f and 552: The sites including microfossils in the Archean represent different paleoenvironments ranging from marine to terrestrial to lacustrine. References 6 and 66 are well-developed studies focusing on specific stratigraphic successions, but cannot include information covering other Archean worlds of the over 2.5 Ga years Archean time.  

      All the Archaean microfossils reported to date are from volcanic coastal marine environments. We are aware that there are rocky terrestrial environments, but no microfossils have been reported from these sites. We are unaware of any Archaean microfossils reported from freshwater environments. 

      Line 570ff: The statements may represent a hypothesis, but the data presented are too preliminary to substantiate the assumptions.

      We believe this is a correct inference from an evolutionary, genomic, and now from a morphological perspective. 

      Figures:  

      Please check all text and supplementary figures, whether scale bars are of different styles within the figure (minor quibble). 

      S3 (no scale in C, D); S4, S5: Note that scale bars are of different styles. 

      We believe we addressed this issue above. 

      S6 D: depressions here are well visible - perhaps exchange with a photo in the main text? Note that scale bars are of different styles.  

      We agree that depressions are well visible in E. The same image of EM-P cell in E is also present in Fig. 1D in the main text.   

      S7: Scale bars should all be of the same style, if anyhow possible. Scale in D? 

      We believe we addressed this issue above. 

      S9: F appears to be distorted. Is the fossil like this? The figure would need additional indicators (arrows) pointing toward what the reader needs to see - not clear in this version. More explanation in the figure caption could be offered. 

      We rechecked the figure from the original publication to check if by mistake the figure was distorted during the assembly of this image. We can assure you that this is not the case. We are not sure what further could be said in the figure legend.     

      S13: What is shown in the inserts of D and E that is also visible in A and B? Here a sketch of the steps would help. 

      We did not understand the question.  

      S14: Scale in A, B? 

      We believe we addressed this issue above. 

      S15: Scales in A, E, C, D 

      We believe we addressed this issue above. 

      S16: scales in D, E, G, H, I, J?  

      We believe we addressed this issue above. 

      S17: "I" appears squeezed, is that so? If morphology is an important message, perhaps reduce the entire figure so it fits the layout. Note that labels A, B, C, and D are displaced. 

      As shown in several subsequent figures, the hollow spherical vesicles are compressed first into honeycomb-like structures, and they often undergo further compression to form lamination-like structures. Such images often give the impression that the entire figure is squashed, but this is not the case. If one examines the figure closely, one can see perfectly spherical vesicles together with laterally squeezed structures. Regarding the figure labels, we addressed this issue above. 

      S18: The filamentous feature in C could also be the grain boundaries of the crystals. Can this be excluded as an interpretation? Are there microfossils with the cell membranes? That would be an excellent contribution to this figure. Note that scale bars are of different styles.

      If this were a one-off observation, we might have arrived at the reviewer's opinion. But spherical cells in a “string of beads” configuration have been reported too frequently, and from too many sites, to be discounted as mere interpretation.    

      S19: The morphologies in A - insert appear to be similar to E - insert in the lower left corner. The chain of cells in A may look similar to the morphologies in E - insert upper right of the image. B - what is to see here? D - the inclusions do not appear spherical (?). Does C look similar to the cluster with the arrow in the lower part of image E? Note that scale bars are of different styles (minor quibble). A, B, C, and D appear compressed. Perhaps reduce the size of the overall image?  

      The structures highlighted (yellow box) in C are similar to the highlighted regions in E: the agglomeration of hollow vesicles. It is hard to appreciate this similarity from one figure. The similarities are apparent when one views Movie 4 and Fig. S12, which clearly show the spherical daughter cells within the hollow vesicle. We have now added the movie reference to the figure legend.    

      S20: A appears not to contribute much. The lineations in B appear to be diagenetic. However, C is suitable. Perhaps use only C, D, E? 

      We believe too many unrecognizable structures are being labeled as diagenetic. Nevertheless, we do not subscribe to the notion that these are too lenient interpretations. These interpretations are justified as such structures have not been reported from live cells. This is the first study to report that cells could form such structures. As we have now reproduced these structures, an alternate interpretation that these are organic structures derived from microfossils should be entertained. 

      S 21: Note that scale bars are of different styles.  

      We believe we addressed this issue above. 

      S22: Perhaps add an arrow in F, where the cell opened, and add "see arrow" in the caption? Is this the same situation as shown in C (white arrow)? What is shown by the white arrow in A? Note that scale bars are of different styles.

      We made the necessary changes.  

      S23: In the caption and main text, please replace "&" with "and" (please check also the other figure captions, e.g. S24). Note that scale bars are of different styles. What is shown in F? A, D - what is shown here?

      We replaced “&” with “and.”  

      S24: Note that scale bars are of different styles. Note that Wacey et al. describe the vesicles as abiotic not as "microfossils"; please correct in figure caption [same also S26; 25; 28].

      We are aware of Prof. Dr. Wacey’s interpretations. We discuss them at length in the discussion section of our manuscript. Based on the similarities between the Dresser formation structures and structures formed by EM-P, we contest that these are abiotic structures.  

      S25: Appears compressed; note different scale bars. 

      We believe we addressed this issue above. 

      S28: The label in B is still in the upper right corner; scale in D? What is to see in rectangles (blue and red) in A, B? In fossil material, this could be anything. 

      These figures are taken from a previous manuscript cited in the figure legend. We could not erase or modify these figures.  

      S33: "L"ewis; G appears a bit too diffuse - erase? Note that scale bars are of different styles.

      We believe we addressed this issue above. 

      S34: This figure appears unconvincing. Erase? 

      There are considerable similarities between the microfossils and structures formed by EM-P. If the reviewer expands a bit on what he finds unconvincing, we can address his reservations.    

      S35: It would be more convincing to show only the morphological similarities between the cell clusters. B and C are too blurry to distinguish much. Scales in D to F and in sketches? A appears compressed (?). 

      We rechecked the original manuscript to see if image A was distorted while making this figure, but this is not the case. Regarding B & C, cells in this image are faint as they are hollow vesicles and, by nature, do not generate much contrast when imaged with a phase-contrast microscope. There are some limitations on how much we can improve the contrast. We have now added scale bars for D-I. Similarly, faint hollow vesicles can be seen in Fig. S21 C & D, and Fig. 3H.  

      S36: Very nice; in B no purple arrow is visible. Note that scale bars are of different styles. S37 and S36 are very much the same - fuse, perhaps?  

      We are sorry for the confusion. There are purple arrows in Fig. S37B-D. 

      S38: this is a more unconvincing figure - erase? 

      Unconvincing in what sense? There are considerable similarities between the microfossils and structures formed by EM-P. If the reviewer expands a bit on what he finds unconvincing, we can address his reservations.

      S39: white rectangle in A? Arrow in A? Note that scale bars are of different styles.

      These are some of the unavoidable remnants from the image from the original publication. 

      S40: in F: CM, V = ?; Note that scale bars are of different style. 

      It’s an oversight on our part. We have now added the definitions to the figure legend. We thank the reviewer for pointing it out.  

      S41: Rectangles in D, E, F, G can be deleted? Scales and labels missing in photos lower right. 

      Those rectangles are added by the image processing software to the 3D-rendered images. Regarding the missing scale bars in H & I, these panels are magnified regions of F. The scale bar is already present in F.   

      S42: appears compressed. G could be trimmed. Labels too small; scale in G? 

      This is a curled-up folded membrane. We needed to lower the resolution of some images to restrict the size of the supplement to journal size restrictions. It is not possible to present 85 figures in high resolution. But we assure you that the image is not laterally compressed in any manner.   

      S43: This figure appears to be unconvincing. Reducing to pairing B, C, D with L, K? Spherical inclusions in B? Scales in E to G? Similar in S44: A, B, E only? Note that scale bars are of different styles. 

      Figures I to K are important. They show not just the morphological similarities but also the sequence of steps through which such structures are formed. We addressed the issue of the scale bars above.  

      S45: A, B, and C appear to show live or subrecent material. How was this isolated of a rock? Note that scale bars are of different styles.  

      It is common to treat rocks with acids to dissolve them and then retrieve organic structures within them. This technique is becoming increasingly common. The procedure is quite extensively discussed in the original manuscript. We don’t see much difference in the scale bars of microfossils and EM-P cells; they are quite similar. 

      S46: A: what is to see here? Note that scale bars are of different styles. 

      There are considerable similarities between the folded fabric-like organic structures with spherical inclusions and structures formed by EM-P. If the reviewer expands a bit on what he finds unconvincing, we can address his reservations.    

      S47: Perhaps enlarge B and erase A. Note that scale bars are of different styles. 

      S48: Image B appears to show the fossil material - is the figure caption inconsistent? There are no aggregations visible in the boxes in A. H is described in the figure caption but missing in the figure. Overall, F and G do not appear to mirror anything in A to E (which may be fossil material?). 

      S51; S52 B, C, E; S53: these figures appear unconvincing - erase? 

      Unconvincing in what sense? The structures from our study are very similar to the microfossils.   

      S54: North "Pole; scale bars in A to C =? 

      These figures were borrowed from an earlier publication referenced in the figure legend. That is the reason for the differences in the styles of scale bars.  

      S55: D and E appear not to contribute anything. Perhaps add arrow(s) and more explanation? Check the spelling in the caption, please. 

      D & E show morphological similarities between cells from our study and microfossils (A).   

      S56: Hexagonal morphologies may also be a consequence of diagenesis. Overall, perhaps erase this figure?  

      I certainly agree that could be one of the reasons for the hexagonal morphologies. Such geometric polygonal morphologies have not been observed in living organisms. Nevertheless, as you can see from the figure, such morphologies could also be formed by living organisms. Hence, this alternate interpretation should not be discounted.   

      S57: The figure caption needs improvement. Please add more description. What show arrows in A, what are the numbers in A? What is the relation between the image attached to the right side of A? Is this a close-up? Note that scale bars are of different styles. 

      We expanded a bit on our original description of the figure. However, we request the reviewer to keep in mind that parts of the figure are taken from a previous publication. We are not at liberty to modify them, for example by removing the arrows. This imposes some constraints. 

      S58: There are no honeycomb-shaped features visible. What is to see here? Erase this figure? 

      Clearly, one can see spherical and polygonal shapes within the Archaean organic structures and mat-like structures formed by EM-P.  

      S59 and S60: What is to see here? - Erase? 

      Clearly, one can see spherical and polygonal shapes within the Archaean organic structures and mat-like structures formed by EM-P in Fig. S59. Further disintegration of these honeycomb-shaped mats into filamentous structures with spherical cells attached to them can be seen in both Archaean organic structures and structures formed by EM-P.   

      S61: This figure appears to be unconvincing. B and F may be a good pairing. Note that scale bars are of different styles.  

      There are considerable similarities between the microfossils and structures formed by EM-P. If the reviewer expands a bit on what he finds unconvincing, we might be able to address his reservations.     

      S62: This figure appears to be unconvincing - erase?

      There are considerable similarities between the microfossils and structures formed by EM-P. If the reviewer expands a bit on what he finds unconvincing, we might be able to address his reservations.     

      S66: This figure is unconvincing - erase? 

      There are considerable similarities between the microfossils and structures formed by EM-P. If the reviewer expands a bit on what he finds unconvincing, we might be able to address his reservations.    

      S68: Scale in B, D, and E? 

      Image B is just a magnified image of a small portion of image A. Hence, there is no need for an additional scale bar. The same is true for images D and E. 

      S69: This figure appears to be unconvincing, at least the fossil part. Filamentous features are visible in fossil material as well, but nothing else. 

      We are not sure what filamentous features the reviewer is referring to. Both the figures show morphologically similar spherical cells covered in membrane debris.    

      S70 [as well as S82]: Good thinking here, but scales differ by magnitudes (cm to μm). Erase this figure? Very similar to Figure S73: Insert in C has which scale in comparison to B? Note that scale bars are of different styles.  

      We realize the scale bars are of different sizes. In our defense, our experiments are conducted in chamber slides with a volume of 1 ml. We don’t have the luxury of doing these experiments on a scale similar to the natural environments. The size differences are to be expected. 

      S71: Scale in E? 

      Image E is just a magnified image of a small portion of image D. Hence, we believe a scale bar is unnecessary. 

      S72: Scale in insert?  

      The insert is just a magnified region of A & C.

      S75: This figure appears to be unconvincing. This is clastic sediment, not chert. Lenticular gaps would collapse during burial by subsequent sediment. - Erase? 

      Regarding the similarities, we see similar lenticular gaps within the parallel layers of organic carbon in both microfossils, and structures formed by EM-P.

      S76: A, C, D do not look similar to B - erase? Similar to S79, also with respect to the differences in scale. Erase? 

      Regarding the similarities, we see similar lenticular gaps within the parallel layers of organic carbon in both microfossils, and structures formed by EM-P. We believe we addressed the issue of scale bars above. 

      S80: A appears to be diagenetic, not primary. Erase? 

      These two structures share too many resemblances to ignore or to discount as merely diagenetic structures: raised filamentous structures originate out of parallel layers of organic carbon (laminations), with spherical cells within this filamentous organic carbon.  

      S85: What role would diagenesis play here? This figure appears unconvincing. Erase?

      We do believe that diagenesis plays a major role in microfossil preservation. However, we do not subscribe to the notion that we should by default assign diagenesis to all microfossil features. Our study shows that there could be an alternate explanation for some of the observations.  

      S86 and S87: These appear unconvincing. What is to see here? Erase? 

      These figures show the morphological similarities between these two structures: stellar-shaped organic structures with strings of spherical daughter cells growing out of them.  

      S88: Does this image suggest the preservation of "salt" in organic material once preserved in chert?  

      That is one inference we draw from this observation. Crystalline NaCl was previously reported from within the microfossil cells.    

      S89: What is to see here? Spherical phenomena in different materials? 

      At present, the presence of honeycomb-like structures is often considered to have been an indication of volcanic pumice. We meant to show that biofilms of living organisms could result in honeycomb-shaped patterns similar to volcanic pumice.

      References 

      Please check the spelling in the references. 

      We found a few references that required correction. We have now rectified them. 

      References  

      (1) Orange F, Westall F, Disnar JR, Prieur D, Bienvenu N, Le Romancer M, et al. Experimental silicification of the extremophilic Archaea Pyrococcus abyssi and Methanocaldococcus jannaschii: Applications in the search for evidence of life in early Earth and extraterrestrial rocks. Geobiology. 2009;7(4). 

      (2) Orange F, Disnar JR, Westall F, Prieur D, Baillif P. Metal cation binding by the hyperthermophilic microorganism, Archaea Methanocaldococcus jannaschii, and its effects on silicification. Palaeontology. 2011;54(5). 

      (3) Errington J. L-form bacteria, cell walls and the origins of life. Open Biol. 2013;3(1):120143. 

      (4) Cooper S. Distinguishing between linear and exponential cell growth during the division cycle: Single-cell studies, cell-culture studies, and the object of cell-cycle research. Theor Biol Med Model. 2006; 

      (5) Mitchison JM. Single cell studies of the cell cycle and some models. Theor Biol Med Model. 2005; 

      (6) Kærn M, Elston TC, Blake WJ, Collins JJ. Stochasticity in gene expression: From theories to phenotypes. Nat Rev Genet. 2005; 

      (7) Elowitz MB, Levine AJ, Siggia ED, Swain PS. Stochastic gene expression in a single cell. Science. 2002; 

      (8) Strovas TJ, Sauter LM, Guo X, Lidstrom ME. Cell-to-cell heterogeneity in growth rate and gene expression in Methylobacterium extorquens AM1. J Bacteriol. 2007; 

      (9) Knoll AH, Barghoorn ES. Archean microfossils showing cell division from the Swaziland System of South Africa. Science. 1977;198(4315):396–8. 

      (10) Sugitani K, Grey K, Allwood A, Nagaoka T, Mimura K, Minami M, et al. Diverse microstructures from Archaean chert from the Mount Goldsworthy–Mount Grant area, Pilbara Craton, Western Australia: microfossils, dubiofossils, or pseudofossils? Precambrian Res. 2007;158(3–4):228–62. 

      (11) Kanaparthi D, Lampe M, Krohn JH, Zhu B, Hildebrand F, Boesen T, et al. The reproduction process of Gram-positive protocells. Sci Rep. 2024 Mar 25;14(1):7075.

    1. Author response:

      Reviewer #1 (Public review):

      This manuscript presents an interesting exploration of the potential activation mechanisms of DLK following axonal injury. While the experiments are beautifully conducted and the data are solid, I feel that there is insufficient evidence to fully support the conclusions made by the authors.

      In this manuscript, the authors exclusively use the puc-lacZ reporter to determine the activation of DLK. This reporter has been shown to be induced when DLK is activated. However, there is insufficient evidence to confirm that the absence of reporter activation necessarily indicates that DLK is inactive. As with many MAP kinase pathways, the DLK pathway can be locally or globally activated in neurons, and the level of DLK activation may depend on the strength of the stimulation. This reporter might only reflect strong DLK activation and may not be turned on if DLK is weakly activated. The results presented in this manuscript support this interpretation. Strong stimulation, such as axotomy of all synaptic branches, caused robust DLK activation, as indicated by puc-lacZ expression. In contrast, weak stimulation, such as axotomy of some synaptic branches, resulted in weaker DLK activation, which did not induce the puc-lacZ reporter. This suggests that the strength of DLK activation depends on the severity of the injury rather than the presence of intact synapses. Given that this is a central conclusion of the study, it may be worthwhile to confirm this further. Alternatively, the authors may consider refining their conclusion to better align with the evidence presented.

      We wish to further clarify a striking aspect of puc-lacZ induction following injury: it is bimodal. It is either induced (in various injuries that remove all synaptic boutons), or not induced, including in injuries that spared only 1-2 remaining boutons. This was particularly evident for injuries that spared the NMJ on muscle 29, which is comprised of only a few boutons. In some instances, only a single bouton was evident on muscle 29. While our injuries varied enormously in the number of branches and boutons that were lost, we did not see a comparable variability in puc-lacZ induction.  In the revision we will include additional images to better demonstrate this observation.

      The reviewer (and others) fairly point out that our current study focuses on puc-lacZ as a reporter of Wnd signaling in the cell body. We consider this to be a downstream integration of events in axons that are more challenging to detect. It is striking that this integration appears strongly sensitized to the presence of spared synaptic boutons. Examination of Wnd’s activation in axons and synapses is a goal for our future work.

      As noted by the authors, DLK has been implicated in both axon regeneration and degeneration. Following axotomy, DLK activation can lead to the degeneration of distal axons, where synapses are located. This raises an important question: how is DLK activated in distal axons? The authors might consider discussing the significance of this "synapse connection-dependent" DLK activation in the broader context of DLK function and activation mechanisms.

      While it has been noted that inhibition of DLK can mildly delay Wallerian degeneration (Miller et al., 2009), this does not appear to be the case for retinal ganglion cell axons following optic nerve crush (Fernandes et al., 2014). It is also not the case for Drosophila motoneurons and NMJ terminals following peripheral nerve injury (Xiong et al., 2012; Xiong and Collins, 2012). Instead, overexpression of Wnd or activation of Wnd by a conditioning injury leads to an opposite phenotype - an increase in resiliency to Wallerian degeneration for axons that have been previously injured (Xiong et al., 2012; Xiong and Collins, 2012). The downstream outcome of Wnd activation is highly dependent on the context; it may be an integration of the outcomes of local Wnd/DLK activation in axons with downstream consequences of nuclear/cell body signaling. The current study suggests some rules for the cell body signaling; however, how Wnd is regulated at synapses and why it promotes degeneration in some circumstances but not others are important future questions.

      Regarding the reviewer’s suggestion, it is interesting to consider DLK’s potential contributions to the loss of NMJ synapses in a mouse model of ALS (Le Pichon et al., 2017; Wlaschin et al., 2023). Our findings suggest that the synaptic terminal is an important locus of DLK regulation, while dysfunction of NMJ terminals is an important feature of the ‘dying back’ hypothesis of disease etiology (Dadon-Nachum et al., 2011; Verma et al., 2022). We propose that the regulation of DLK at synaptic terminals is an important area for future study, and may reveal how DLK might be modulated to curtail disease progression. Of note, DLK inhibitors are in clinical trials (Katz et al., 2022; Le et al., 2023; Siu et al., 2018), but at least some have been paused due to safety concerns (Katz et al., 2022). Further understanding of the mechanisms that regulate DLK is needed to determine whether and how DLK and its downstream signaling can be tuned for therapeutic benefit.

      Reviewer #2 (Public review):

      Summary:

      The authors study a panel of sparsely labeled neuronal lines in Drosophila that each form multiple synapses. Critically, each axonal branch can be injured without affecting the others, allowing the authors to differentiate between injuries that affect all axonal branches versus those that do not, creating spared branches. Axonal injuries are known to cause Wnd (mammalian DLK)-dependent retrograde signals to the cell body, culminating in a transcriptional response. This work identifies a fascinating new phenomenon that this injury response is not all-or-none. If even a single branch remains uninjured, the injury signal is not activated in the cell body. The authors rule out that this could be due to changes in the abundance of Wnd (perhaps if incrementally activated at each injured branch) by Wnd, Hiw's known negative regulator. Thus there is both a yet-undiscovered mechanism to regulate Wnd signaling, and more broadly a mechanism by which the neuron can integrate the degree of injury it has sustained. It will now be important to tease apart the mechanism(s) of this fascinating phenomenon. But even absent a clear mechanism, this is a new biology that will inform the interpretation of injury signaling studies across species.

      Strengths:

      (1) A conceptually beautiful series of experiments that reveal a fascinating new phenomenon is described, with clear implications (as the authors discuss in their Discussion) for injury signaling in mammals.

      (2) Suggests a new mode of Wnd regulation, independent of Hiw.

      Weaknesses:

      (1) The use of a somatic transcriptional reporter for Wnd activity is powerful, however, the reporter indicates whether the transcriptional response was activated, not whether the injury signal was received. It remains possible that Wnd is still activated in the case of a spared branch, but that this activation is either local within the axons (impossible to determine in the absence of a local reporter) or that the retrograde signal was indeed generated but it was somehow insufficient to activate transcription when it entered the cell body. This is more of a mechanistic detail and should not detract from the overall importance of the study

      We agree. The puc-lacZ reporter tells us about signaling in the cell body, but whether and how Wnd is regulated in axons and synaptic branches, which we think occurs upstream of the cell body response, remains to be addressed in future studies.

      (2) That the protective effect of a spared branch is independent of Hiw, the known negative regulator of Wnd, is fascinating. But this leaves open a key question: what is the signal?

      This is indeed an important future question, and would still be a question even if Hiw were part of the protective mechanism by the spared synaptic branch. Our current hypothesis (outlined in Figure 4) is that regulation of Wnd is tied to the retrograde trafficking of a signaling organelle in axons. The Hiw-independent regulation complements other observations in the literature that multiple pathways regulate Wnd/DLK (Collins et al., 2006; Feoktistov and Herman, 2016; Klinedinst et al., 2013; Li et al., 2017; Russo and DiAntonio, 2019; Valakh et al., 2013). It is logical for this critical stress response pathway to have multiple modes of regulation that may act in parallel to tune and restrain its activation.

      Reviewer #3 (Public review):

      Summary:

      This manuscript seeks to understand how nerve injury-induced signaling to the nucleus is influenced, and it establishes a new location where these principles can be studied. By identifying and mapping specific bifurcated neuronal innervations in the Drosophila larvae, and using laser axotomy to localize the injury, the authors find that sparing a branch of a complex muscular innervation is enough to impair Wallenda-puc (analogous to DLK-JNK-cJun) signaling that is known to promote regeneration. It is only when all connections to the target are disconnected that cJun-transcriptional activation occurs.

      Overall, this is a thorough and well-performed investigation of the mechanism of spared-branch influence on axon injury signaling. The findings on control of wnd are important because this is a very widely used injury signaling pathway across species and injury models. The authors present detailed and carefully executed experiments to support their conclusions. Their effort to identify the control mechanism is admirable and will be of aid to the field as they continue to try to understand how to promote better regeneration of axons.

      Strengths:

      The paper does a very comprehensive job of investigating this phenomenon at multiple locations and through both pinpoint laser injury as well as larger crush models. They identify a non-hiw based restraint mechanism of the wnd-puc signaling axis that presumably originates from the spared terminal. They also present a large list of tests they performed to identify the actual restraint mechanism from the spared branch, which has ruled out many of the most likely explanations. This is an extremely important set of information to report, to guide future investigators in this and other model organisms on mechanisms by which regeneration signaling is controlled (or not).

      Weaknesses:

      The weakest data presented by this manuscript is the study of the actual amounts of Wallenda protein in the axon. The authors argue that increased Wnd protein is being anterogradely delivered from the soma, but no support for this is given. Whether this change is due to transcription/translation, protein stability, transport, or other means is not investigated in this work. However, because this point is not central to the arguments in the paper, it is only a minor critique.

      We agree and are glad that the reviewer considers this a minor critique; this is an area for future study. In Supplemental Figure 1 we present differences in the levels of an ectopically expressed GFP-Wnd-kinase-dead transgene, which is strikingly increased in axons that have received a full but not partial axotomy. We suspect this accumulation occurs downstream of the cell body response because of the timing. We observed the accumulations after 24 hours (Figure S1F) but not at early (1-4 hour) time points following axotomy (data not shown). Further study of the local regulation of Wnd protein and its kinase activity in axons is an important future direction.

      As far as the scope of impact: because the conclusions of the paper are focused on a single (albeit well-validated) reporter in different types of motor neurons, it is hard to determine whether the mechanism of spared branch inhibition of regeneration requires wnd-puc (DLK/cJun) signaling in all contexts (for example, sensory axons or interneurons). Is the nerve-muscle connection the rule or the exception in terms of regeneration program activation?

      DLK signaling is strongly activated in DRG sensory neurons following peripheral nerve injury (Shin et al., 2012), despite the fact that sensory neurons have bifurcated axons and their projections in the dorsal spinal cord are not directly damaged by injuries to the peripheral nerve. Therefore, it is unlikely that protection by a spared synapse is a universal rule for all neuron types. However, the molecular mechanisms that underlie this regulation may indeed be shared across different types of neurons but utilized in different ways. For instance, nerve growth factor withdrawal can lead to activation of DLK (Ghosh et al., 2011); however, neurotrophins and their receptors are regulated and implemented differently in different cell types. We suspect that the restraint of Wnd signaling by the spared synaptic branch shares a common underlying mechanism with the restraint of DLK signaling by neurotrophin signaling. Further elucidation of the molecular mechanism is an important next step towards addressing this question.

      Because changes in puc-lacZ intensity are the major readout, it would be helpful to better explain the significance of the amount of puc-lacZ in the nucleus with respect to the activation of regeneration. Is it known that scaling up the amount of puc-lacZ transcription scales functional responses (regeneration or others)? The alternative would be that only a small amount of puc-lacZ is sufficient to efficiently induce relevant pathways (threshold response).

      While induction of puc-lacZ expression correlates with Wnd-mediated phenotypes, including sprouting of injured axons (Xiong et al., 2010), protection from Wallerian degeneration (Xiong et al., 2012; Xiong and Collins, 2012) and synaptic overgrowth (Collins et al., 2006), we have not observed any correlation between the degree of puc-lacZ induction (e.g., modest, medium, or high) and the phenotypic outcomes (sprouting, overgrowth, etc.). Rather, there appears to be a striking all-or-none difference in whether puc-lacZ is induced or not. There may indeed be a threshold that can be restrained through multiple mechanisms. We posit in Figure 4 that restraint may take place in the cell body, where it can be influenced by the spared bifurcation.

      References Cited:

      Collins CA, Wairkar YP, Johnson SL, DiAntonio A. 2006. Highwire restrains synaptic growth by attenuating a MAP kinase signal. Neuron 51:57–69.

      Dadon-Nachum M, Melamed E, Offen D. 2011. The “dying-back” phenomenon of motor neurons in ALS. J Mol Neurosci 43:470–477.

      Feoktistov AI, Herman TG. 2016. Wallenda/DLK protein levels are temporally downregulated by Tramtrack69 to allow R7 growth cones to become stationary boutons. Development 143:2983–2993.

      Fernandes KA, Harder JM, John SW, Shrager P, Libby RT. 2014. DLK-dependent signaling is important for somal but not axonal degeneration of retinal ganglion cells following axonal injury. Neurobiol Dis 69:108–116.

      Ghosh AS, Wang B, Pozniak CD, Chen M, Watts RJ, Lewcock JW. 2011. DLK induces developmental neuronal degeneration via selective regulation of proapoptotic JNK activity. J Cell Biol 194:751–764.

      Hao Y, Frey E, Yoon C, Wong H, Nestorovski D, Holzman LB, Giger RJ, DiAntonio A, Collins C. 2016. An evolutionarily conserved mechanism for cAMP elicited axonal regeneration involves direct activation of the dual leucine zipper kinase DLK. Elife 5. doi:10.7554/eLife.14048

      Huntwork-Rodriguez S, Wang B, Watkins T, Ghosh AS, Pozniak CD, Bustos D, Newton K, Kirkpatrick DS, Lewcock JW. 2013. JNK-mediated phosphorylation of DLK suppresses its ubiquitination to promote neuronal apoptosis. J Cell Biol 202:747–763.

      Katz JS, Rothstein JD, Cudkowicz ME, Genge A, Oskarsson B, Hains AB, Chen C, Galanter J, Burgess BL, Cho W, Kerchner GA, Yeh FL, Ghosh AS, Cheeti S, Brooks L, Honigberg L, Couch JA, Rothenberg ME, Brunstein F, Sharma KR, van den Berg L, Berry JD, Glass JD. 2022. A Phase 1 study of GDC-0134, a dual leucine zipper kinase inhibitor, in ALS. Ann Clin Transl Neurol 9:50–66.

      Klinedinst S, Wang X, Xiong X, Haenfler JM, Collins CA. 2013. Independent pathways downstream of the Wnd/DLK MAPKKK regulate synaptic structure, axonal transport, and injury signaling. J Neurosci 33:12764–12778.

      Le K, Soth MJ, Cross JB, Liu G, Ray WJ, Ma J, Goodwani SG, Acton PJ, Buggia-Prevot V, Akkermans O, Barker J, Conner ML, Jiang Y, Liu Z, McEwan P, Warner-Schmidt J, Xu A, Zebisch M, Heijnen CJ, Abrahams B, Jones P. 2023. Discovery of IACS-52825, a potent and selective DLK inhibitor for treatment of chemotherapy-induced peripheral neuropathy. J Med Chem 66:9954–9971.

      Le Pichon CE, Meilandt WJ, Dominguez S, Solanoy H, Lin H, Ngu H, Gogineni A, Sengupta Ghosh A, Jiang Z, Lee S-H, Maloney J, Gandham VD, Pozniak CD, Wang B, Lee S, Siu M, Patel S, Modrusan Z, Liu X, Rudhard Y, Baca M, Gustafson A, Kaminker J, Carano RAD, Huang EJ, Foreman O, Weimer R, Scearce-Levie K, Lewcock JW. 2017. Loss of dual leucine zipper kinase signaling is protective in animal models of neurodegenerative disease. Sci Transl Med 9. doi:10.1126/scitranslmed.aag0394

      Li J, Zhang YV, Asghari Adib E, Stanchev DT, Xiong X, Klinedinst S, Soppina P, Jahn TR, Hume RI, Rasse TM, Collins CA. 2017. Restraint of presynaptic protein levels by Wnd/DLK signaling mediates synaptic defects associated with the kinesin-3 motor Unc-104. Elife 6. doi:10.7554/eLife.24271

      Miller BR, Press C, Daniels RW, Sasaki Y, Milbrandt J, DiAntonio A. 2009. A dual leucine kinase-dependent axon self-destruction program promotes Wallerian degeneration. Nat Neurosci 12:387–389.

      Nihalani D, Merritt S, Holzman LB. 2000. Identification of structural and functional domains in mixed lineage kinase dual leucine zipper-bearing kinase required for complex formation and stress-activated protein kinase activation. J Biol Chem 275:7273–7279.

      Russo A, DiAntonio A. 2019. Wnd/DLK is a critical target of FMRP responsible for neurodevelopmental and behavior defects in the Drosophila model of fragile X syndrome. Cell Rep 28:2581–2593.e5.

      Shin JE, Cho Y, Beirowski B, Milbrandt J, Cavalli V, DiAntonio A. 2012. Dual leucine zipper kinase is required for retrograde injury signaling and axonal regeneration. Neuron 74:1015–1022.

      Siu M, Sengupta Ghosh A, Lewcock JW. 2018. Dual Leucine Zipper Kinase Inhibitors for the Treatment of Neurodegeneration. J Med Chem 61:8078–8087.

      Valakh V, Walker LJ, Skeath JB, DiAntonio A. 2013. Loss of the spectraplakin short stop activates the DLK injury response pathway in Drosophila. J Neurosci 33:17863–17873.

      Verma S, Khurana S, Vats A, Sahu B, Ganguly NK, Chakraborti P, Gourie-Devi M, Taneja V. 2022. Neuromuscular junction dysfunction in amyotrophic lateral sclerosis. Mol Neurobiol 59:1502–1527.

      Wlaschin JJ, Donahue C, Gluski J, Osborne JF, Ramos LM, Silberberg H, Le Pichon CE. 2023. Promoting regeneration while blocking cell death preserves motor neuron function in a model of ALS. Brain 146:2016–2028.

      Xiong X, Collins CA. 2012. A conditioning lesion protects axons from degeneration via the Wallenda/DLK MAP kinase signaling cascade. J Neurosci 32:610–615.

      Xiong X, Hao Y, Sun K, Li J, Li X, Mishra B, Soppina P, Wu C, Hume RI, Collins CA. 2012. The Highwire ubiquitin ligase promotes axonal degeneration by tuning levels of Nmnat protein. PLoS Biol 10:e1001440.

      Xiong X, Wang X, Ewanek R, Bhat P, Diantonio A, Collins CA. 2010. Protein turnover of the Wallenda/DLK kinase regulates a retrograde response to axonal injury. J Cell Biol 191:211–223.

    1. Reviewer #1 (Public review):

      Summary:

      Zhang et al. addressed the question of whether advantageous and disadvantageous inequality aversion can be vicariously learned and generalized. Using an adapted version of the ultimatum game (UG), in three phases, participants first gave their own preference (baseline phase), then interacted with a "teacher" to learn their preference (learning phase), and finally were tested again on their own (transfer phase). The key measure is whether participants exhibited similar choice preferences (i.e., rejection rate and fairness rating) influenced by the learning phase, by contrasting their transfer phase and baseline phase. Through a series of statistical modeling and computational modeling, the authors reported that both advantageous and disadvantageous inequality aversion can indeed be learned (Study 1), and even be generalised (Study 2).

      Strengths:

      This study is very interesting: it directly adapts the lab's previous work on the observational learning of disadvantageous inequality aversion to test both advantageous and disadvantageous inequality aversion. The social transmission of action, emotion, and attitude has started to receive attention recently, so this research is timely. The use of computational modeling is mostly appropriate and well motivated. Study 2, which examined vicarious inequality aversion in conditions where feedback was never provided, is interesting and important for strengthening the reported effects. Both studies properly justify their sample sizes.

      Weaknesses:

      Despite the strengths, a few conceptual aspects and analytical decisions have to be explained, justified, or clarified.

      INTRODUCTION/CONCEPTUALIZATION

      (1) Two terms seem to be used interchangeably in this work, which they should not be: vicarious/observational learning vs preference learning. In vicarious learning, individuals observe others' actions (and optionally the consequences resulting directly from those actions), whereas in preference learning, individuals predict, or act on behalf of, the others' actions, and then receive feedback on whether that prediction was correct. For the current work, it seems that the experiment is more about preference learning and prediction, and less so about vicarious learning. The introduction and setup are framed heavily around vicarious learning, and later the terms vicarious learning and preference learning are rather mixed in the text. I suggest either toning down the focus on vicarious learning or discussing how the two differ. Some of the references here may be helpful: Charpentier et al., Neuron, 2020; Olsson et al., Nature Reviews Neuroscience, 2020; Zhang & Glascher, Science Advances, 2020.

      EXPERIMENTAL DESIGN

      (2) For each offer type, the experiment "added a uniformly distributed noise in the range of (-10, 10)". What does this look like in practice? Are the resulting offers integers only, such as 25:75, or can they include decimal points? More importantly, is it possible for either a 70:30 or a 90:10 offer, after the noise is added, to generate an 80:20 split shown to the participants? If so, in the later analyses, which condition did a trial showing the 80:20 split belong to: 70:30 or 90:10? And was such noise added only in the learning phase, or also in the baseline/transfer phases? This requires some clarification.
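
      The boundary case raised in point (2) can be checked with a short sketch. It assumes, purely for illustration, that the noise is an integer added to the first share of a split that sums to 100; the function names and the `inclusive` switch are hypothetical and do not describe the authors' actual procedure.

```python
def reachable_shares(base_share, max_noise=10, inclusive=True):
    """Integer first shares reachable from base_share after adding integer noise.

    inclusive=True allows noise of exactly -max_noise or +max_noise;
    inclusive=False mimics an open interval by excluding both endpoints.
    """
    bound = max_noise if inclusive else max_noise - 1
    return {base_share + n for n in range(-bound, bound + 1)}


def ambiguous_shares(base_a, base_b, **kwargs):
    """Shares that could have been generated from either of two offer types."""
    return reachable_shares(base_a, **kwargs) & reachable_shares(base_b, **kwargs)


# Compare the 70:30 and 90:10 offer types under both endpoint conventions.
overlap_inclusive = ambiguous_shares(70, 90, inclusive=True)
overlap_exclusive = ambiguous_shares(70, 90, inclusive=False)
```

      Under these assumptions, an 80:20 split is ambiguous only if noise of exactly -10 or +10 can be drawn; with noise restricted to the open interval, the two offer types cannot coincide.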

      (3) For the offer conditions (90:10, 70:30, 50:50, 30:70, 10:90) - are they randomized? If so, how is it done? Is it randomized within each participant, and/or also across participants (such that each participant experienced different trial sequences)? This is important, as the order especially for the learning phase can largely impact the preference learning of the participants.

      STATISTICAL ANALYSIS & COMPUTATIONAL MODELING

      (4) In Study 1, for the DI offer types (90:10, 70:30), the rejection rate for DI-AI averse looks consistently higher than that for DI averse (i.e., the blue line is above the yellow line). Is this significant? If so, why? Since this is a between-subject design, I would not anticipate such a result (especially at baseline). Also, for the LME results (e.g., Table S3), only interactions were reported, not the main effects.

      (5) I do not particularly find this analysis appealing: "we examined whether participants' changes in rejection rates between Transfer and Baseline, could be explained by the degree to which they vicariously learned, defined as the change in punishment rates between the first and last 5 trials of the Learning phase." Naturally, the participants' behavior in the first 5 trials in the learning phase will be similar to those in the baseline; and their behavior in the last 5 trials in the learning phase would echo those at the transfer phase. I think it would be stronger to link the preference learning results to the change between the baseline and transfer phase, eg, by looking at the difference between alpha (beta) at the end of the learning phase and the initial alpha (beta).
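
      The learning index quoted in point (5) can be written out as a short sketch. The sign convention (last window minus first window) and the coding of choices as 1 = punish and 0 = accept are assumptions made here for illustration; the function name is hypothetical.

```python
def learning_index(choices, window=5):
    """Change in punishment rate between the first and last `window` trials.

    `choices` is a sequence of 0/1 values for one participant's Learning
    phase (1 = punish, 0 = accept). Positive values mean more punishment
    at the end of the phase than at the start.
    """
    if len(choices) < 2 * window:
        raise ValueError("need at least two non-overlapping windows of trials")
    first_rate = sum(choices[:window]) / window
    last_rate = sum(choices[-window:]) / window
    return last_rate - first_rate


# Hypothetical participant: rarely punishes early, always punishes late.
example = [0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1]
index = learning_index(example)
```

      Because the first and last windows are likely to resemble Baseline and Transfer behavior respectively, an index of this kind is expected to correlate with the Transfer-minus-Baseline change almost by construction, which is the concern raised above.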

      (6) I wonder if data from the baseline and transfer phases can also be modeled, using a simple Fehr-Schmidt model. This way, the change in alpha/beta can also be examined between the baseline and transfer phase.
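
      For reference, the two-player Fehr-Schmidt utility mentioned in point (6) is own payoff minus alpha times disadvantageous inequity minus beta times advantageous inequity. The sketch below pairs it with a logistic choice rule; the inverse temperature `tau`, the zero payoff assigned to rejection, and the function names are illustrative assumptions, not the authors' fitted model.

```python
import math


def fehr_schmidt_utility(own, other, alpha, beta):
    """Fehr-Schmidt utility of accepting a split for a two-player game.

    alpha weights disadvantageous inequity (other gets more than own);
    beta weights advantageous inequity (own gets more than other).
    """
    disadvantageous = max(other - own, 0.0)
    advantageous = max(own - other, 0.0)
    return own - alpha * disadvantageous - beta * advantageous


def p_reject(own, other, alpha, beta, tau=1.0):
    """Probability of rejecting, assuming rejection yields utility 0."""
    u_accept = fehr_schmidt_utility(own, other, alpha, beta)
    return 1.0 / (1.0 + math.exp(tau * u_accept))


# A 90:10 offer seen from the responder's side: own = 10, other = 90.
u_di = fehr_schmidt_utility(10, 90, alpha=0.5, beta=0.25)
# A 10:90 offer seen from the responder's side: own = 90, other = 10.
u_ai = fehr_schmidt_utility(90, 10, alpha=0.5, beta=0.25)
```

      Fitting alpha and beta separately to Baseline and Transfer choices with such a model would give a direct estimate of how each parameter shifted across the Learning phase.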

      (7) I quite liked Study 2, which tests the generalization effect, and I expected to see adapted computational modeling that directly reflects this idea. Indeed, the authors wrote, "[...] given that this model [...] assumes the sort of generalization of preferences between offer types [...]". But where exactly did the preference learning model assume the generalization? In the methods, the modeling seems to be only about Study 1; did the authors adapt their model to accommodate Study 2? The authors also ran simulations for the learning phase in Study 2 (Figure 6); how were the preferences updated (if at all) for offers (90:10 and 10:90) where feedback was not given? Extending and unpacking the computational modeling results for Study 2 would be very helpful for the paper.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1

      General Comment. *Using ubiquitous and targeted heterologous expression of the honeybee venom peptide Apamin in Drosophila, the authors find that apamin has antimicrobial activity that is enhanced by membrane-tethering and dependent on the Drosophila pattern-recognition receptors PGRP-LA and PGRP-SC2. Expression of apamin in the Drosophila gut or ingestion of Apamin by honeybees has positive effects on gut health as shown by a number of metrics.*

      Answer: We thank the reviewer for their insightful comments. We agree that the findings of this study are significant and have broad implications for understanding the antimicrobial properties of apamin. As suggested, we have further delved into the molecular mechanisms underlying apamin's antimicrobial activity, providing additional details on its interactions with target bacteria. We have also expanded our discussion on the role of membrane-tethering in enhancing apamin's activity and its potential impact on its localization. We believe that these additional illustrations strengthen our conclusions and provide a more comprehensive understanding of apamin's biological functions.

      Major comments:

      Comment 1. *The key conclusions are convincing and largely supported by the data as shown. The data is presented clearly, save for some areas in the results where the authors should be more explicit about the methods that were used as they affect the reader's interpretation of the results (see minor comments).*

      Answer: We would like to express our gratitude to the reviewer for their constructive feedback and positive remarks regarding our manuscript. We are pleased to note that the reviewer found our key conclusions convincing and largely supported by the data presented. This affirmation encourages us as we strive to contribute meaningful insights to the field. We acknowledge the reviewer's suggestion to enhance clarity in certain areas of the Results section, particularly concerning the methods employed. We appreciate this guidance and have taken it into account. In our revised manuscript, we have made explicit revisions to ensure that the methodology is clearly articulated, thereby improving the reader's interpretation of our results. Thank you once again for your valuable feedback, which has undoubtedly strengthened our work. We look forward to your continued guidance as we finalize our manuscript.

      Comment 2. *If the authors wish to conclude that PGRP-LE and PGRP-LC are not required for the demonstrated functions of Apamin, the authors should do a double knock-down of PGRP-LC and LE together, as these pattern recognition receptors function partially redundantly in activation of the Imd pathway (e.g. doi: 10.1038/ni1356).*

      Answer: We appreciate the reviewer's suggestion to test whether PGRP-LE and PGRP-LC act redundantly to activate the Imd pathway or whether apamin acts independently of the Imd pathway. As the reviewer suggested, we conducted a double knockdown of PGRP-LE and PGRP-LC and found that apamin still suppresses bacterial infection. These data suggest that apamin's antimicrobial function does not depend on PGRP-LE or PGRP-LC, and they open new questions about apamin's unique function as an AMP. We added the new data in Fig. 5d and describe them in the main text as follows:

      "Knockdown of PGRP-LC or LE, as well as their combined knockdown, did not affect the antimicrobial efficacy of apamin (Fig. 5b-d), suggesting that the antimicrobial properties of apamin are independent of PGRP-LC and LE functions (Fig. 5a)."

      Comment 3. *The Introduction and Discussion would benefit from providing more context that helps the reader understand the significance of the research. Where is apamin expressed in the honeybee? Is it likely to be ingested and have effects on gut health in natural conditions? Do honeybees have homologs of PGRP-LA and PGRP-SC2? Do these findings translate to the honeybee system in any way or are they restricted to heterologous expression in Drosophila?*

      Answer: We thank the reviewer for the valuable suggestions. We agree that providing additional context on the natural role of apamin in honeybees and the relevance of our findings to the honeybee system is crucial.

      Natural expression and function of apamin: Although apamin is primarily known for its neurotoxic effects, studies have suggested that it may also play a role in antimicrobial defense. Its specific expression pattern in honeybees is not fully understood, but research on the biochemistry and pharmacology of apamin suggests that it is mainly expressed in venom sacs (Habermann, 1972; Schumacher et al, 1994; Inoue et al, 1987). We have outlined this information in the Introduction section as follows:

      "Apamin, an 18 amino acid peptide neurotoxin, is one of the bioactive components of bee venom, making up 2%-3% of its total dry weight, and is naturally expressed in bee venom sacs (Rietschoten et al, 1975; E.H., 1976; Son et al, 2007; Zhou et al, 2010; Habermann, 1972)."

      Potential for ingestion and gut effects: Although direct evidence for apamin ingestion and its impact on gut health in natural conditions is limited, it is plausible that honeybees could be exposed to apamin through various means, including foraging and social interactions. However, we focus mainly on deliberate (artificial) administration as the potential application method. We have included additional details regarding the function of apamin in the Introduction section as follows:

      "It is the smallest known neurotoxic polypeptide and exhibits elevated basicity and sulfur content, demonstrating prolonged action relative to other pharmacological agents influencing the central or peripheral nervous systems (Habermann, 1972)."

      Honeybee homologs of PGRPs: Concerning the honeybee PGRPs and their homologs in Drosophila, we have provided an explanation as follows:

      "While honeybees possess homologs of the PGRP family, including PGRP-LC and PGRP-S2, their specific roles in response to apamin and other antimicrobial peptides remain to be elucidated (Larsen et al, 2019a)."

      Relevance to the honeybee system: While our study primarily utilized Drosophila as a model system, the conserved nature of innate immune pathways suggests that the findings may have broader implications for honeybee health. Future studies aimed at directly investigating the effects of apamin in honeybees will be essential to fully understand its role in their physiology and behavior. We have incorporated these points into the Discussion sections to provide a more comprehensive and informative overview of our research as below:

      "In conclusion, it is important to note that much of our understanding of the honeybee immune system is derived from studies conducted on the Drosophila model, owing to the evolutionary proximity of these two species (Larsen et al, 2019b). This close relationship allows for valuable insights into immune mechanisms that are conserved across species (Evans et al, 2006; Morfin et al, 2021). Research has demonstrated that the fruit fly Drosophila melanogaster serves as an effective model for studying the effects of insecticides on honeybees, particularly in understanding the sub-lethal impacts of neonicotinoids, which are known to affect pollinators significantly (Tasman et al, 2021).

      By investigating the function of honeybee AMPs within the Drosophila platform, we can further enhance our knowledge of immune responses and their implications. Just as research on Drosophila has significantly advanced our understanding of human genetic diseases (Bellen et al, 2010; Casci & Pandey, 2015; Bier, 2005; Perrimon et al, 2016; Rieder & Larschan, 2014; Bilder et al, 2021), studying honeybee AMPs in this context holds the potential to uncover novel therapeutic avenues and deepen our comprehension of immune function across taxa."

      Comment 4. *It is surprising that there is no speculation or hypothesis provided about why PGRP-LA and -SC2 may enhance apamin activity whereas other components are nonessential. It was a significant part of the paper but receives almost no discussion.*

      Answer: We thank the reviewer for highlighting this important point. The specific mechanism by which PGRP-LA and PGRP-SC2 enhance apamin's activity is an intriguing question that warrants further investigation. Our findings indicate that both PGRP-LA and PGRP-SC2 are crucial for the antimicrobial action of apamin, as their knockdown abolishes this effect, suggesting a specific functional relationship between these peptidoglycan recognition proteins and apamin's mechanism of action in the gut environment.

      PGRP-LA is known to play a significant regulatory role as a positive regulator of immune responses, while PGRP-SC2 has been shown to promote gut immune homeostasis and prevent dysbiosis, which is essential for maintaining a balanced microbiome in Drosophila (Guo et al, 2014). The enhancement of apamin activity by these proteins could be attributed to their ability to modulate the immune response and facilitate a more effective antimicrobial environment, thereby allowing apamin to exert its effects more efficiently.

      Furthermore, our study aligns with previous research indicating that PGRP-SC2 can limit commensal dysbiosis and promote tissue homeostasis, which may enhance the overall efficacy of antimicrobial peptides like apamin in combating pathogenic bacteria (Guo et al, 2014). By leveraging the evolutionary insights gained from Drosophila, we can better understand how these mechanisms operate in honeybees, ultimately contributing to our knowledge of immune function across species. We have provided a detailed explanation of the potential roles of PGRP-LA and PGRP-SC2 in the action of apamin, as outlined below:

      "The PGRP-LA gene is located in a cluster with PGRP-LC and PGRP-LF, which encode a receptor and a negative regulator of the Imd pathway, respectively; structural predictions suggest that PGRP-LA may not directly bind to peptidoglycan, indicating a potential regulatory role for this PGRP in modulating immune responses (Gendrin et al, 2013). PGRP-SC2 possesses amidase activity, which means it can cleave the peptidoglycan layer of bacterial cell walls, rendering them susceptible to further degradation and ultimately leading to bacterial cell death. This amidase activity contributes to the insect's innate immune response by directly targeting and neutralizing bacterial threats (Takehana et al, 2002; Park et al, 2007; Paredes et al, 2011)."

      Comment 5. *Line 264: The fact that Rel knockdown did not impair antimicrobial activity of Apamin is a bit odd since upregulation of PGRP-SC2 upon infection is at least partially dependent on Rel (de Gregorio 2002, EMBO J), and the authors find that PGRP-SC2 is required for apamin activity. This is somewhat incongruous.*

      Answer: We thank the reviewer for highlighting this important point. The observation that Rel knockdown did not impair apamin's antimicrobial activity, despite its role in upregulating PGRP-SC2, is indeed intriguing.

      Several factors may contribute to this discrepancy:

      Redundancy in PGRP-SC2 regulation: It is possible that transcription factors other than Rel also regulate PGRP-SC2 expression. Therefore, even in the absence of Rel, sufficient levels of PGRP-SC2 may be maintained to support apamin's activity (Bischoff et al, 2006).

      Direct effects of apamin: Apamin may interact directly with bacterial cells or host immune cells, which could contribute to its antimicrobial activity even in the absence of optimal PGRP-SC2 levels.

      We cited the De Gregorio et al. (2002, EMBO J) paper and added an explanation of this result as follows:

      "It is known that the upregulation of PGRP-SC during infection is partially reliant on the Rel pathway (Gregorio et al, 2002). Our findings indicate that apamin can exert its antimicrobial activity independently of Rel's transcriptional activation function. This observation can be attributed to two key factors. First, there may be redundancy in the regulation of PGRP-SC2 expression, as other transcription factors could compensate for the absence of Rel, allowing sufficient levels of PGRP-SC2 to be maintained to support apamin's activity. Second, apamin may have direct interactions with bacterial cells or host immune cells, contributing to its antimicrobial effects even when optimal levels of PGRP-SC2 are not present. These mechanisms suggest that apamin can function effectively in the immune response, highlighting its potential as a versatile antimicrobial agent."

      Comment 6. *I cannot comment on the adequacy of the statistical analyses. Some recommendations to improve the methods:*

      *- Be specific about the kind of medium used to rear flies (provide or cite recipe). Different cornmeal-yeast media have very different compositions and can affect fly physiology and microbiome characteristics.*

      *- Specify flipping schedule (every 2-3 days?) - this also affects microbiome.*

      Answer: We thank the reviewer for their valuable comments. We agree that precise experimental details are crucial for reproducibility and accurate interpretation of results.

      To address the reviewer's specific concerns:

      Culture medium: We used a standard cornmeal-molasses-agar medium with the following recipe: agar 47 g, inactive yeast 65.5 g, corn flour 232.5 g, soy flour 30 g, molasses 350 ml, Tegosept solution 35 g, propionic acid 12.5 ml, and phosphoric acid 2.5 ml, with water added up to 5 L.

      Flipping schedule: Flies were flipped every 2-3 days to prevent overcrowding and maintain optimal culture conditions.

      We have included these details in the Methods section to enhance the clarity and reproducibility of our experiments.

      Minor comments:

      *- Line 90: Be specific about how the constructs differ from endogenous Melittin and Apamin. Do the endogenous versions have signal peptides?*

      Answer: The endogenous versions do not contain the signal peptide that we used. We have specified this in the manuscript to give readers a better understanding, as follows:

      "To assess the functionality of genetically encoded honeybee VPs in the Drosophila model, we developed UAS-Melittin and UAS-Apamin constructs that incorporate a previously characterized signal peptide at their N-termini (Choi et al, 2009), which the original AMP and VP sequences do not have (Fig. 1a)."

      - Line 92: What is 'broad expression'? Ubiquitous? Specify driver or extent of expression.

      Answer: We have added "by tub-GAL4 driver"

      *- Line 93: Was this oral or septic P. aeruginosa infection?*

      Answer: We have added "oral"

      *- Lines 97-98: Melittin expressed genetically did not show activity against the one pathogen that was tested; making a broad statement without qualification about activity seems excessive.*

      Answer: We have added "against P. aeruginosa"

      *- Line 105: Various Gal4 drivers that express in different tissues or a similar subset of tissues?*

      Answer: We used tub-GAL4 and da-GAL4 in this part of the screen; both drive ubiquitous expression. Daughterless (da) is involved in the transcriptional regulation of various processes, including oogenesis, neurogenesis, myogenesis, and cell proliferation, while tub-GAL4 drives expression throughout most tissues and cell types in the Drosophila body. We have added "various ubiquitously expressing".

      *- Line 134: Present as a commensal? Pathobiont? Pathogen?*

      Answer: Apibacter raozihei is generally considered a commensal bacterium in the honeybee gut. We have added "which is present as a commensal bacterium in the guts of" to the manuscript.

      *- Line 149: Are Cyanobacteria naturally present in gut microbiota? What are photosynthetic bacteria doing as part of a gut microbiome?*

      Answer: While cyanobacteria are not typically found in the gut, cyanobacterial 16S rRNA-like sequences have been previously detected in human gut samples, bovine rumen, termite gut, and other animal intestines, suggesting the presence of a non-photosynthetic cyanobacterial lineage in these aphotic environments (Hu & Rzymski, 2022; Hongoh et al, 2003).

      *- Line 171: Where is apamin endogenously expressed in the honeybee? Only in the venom gland? Or in gut cells as done here in Drosophila?*

      Answer: Although apamin is primarily known for its neurotoxic effects, studies have suggested that it may also play a role in antimicrobial defense. Its specific expression pattern in honeybees is not fully understood, but research on the biochemistry and pharmacology of apamin suggests that it is mainly expressed in venom sacs (Habermann, 1972; Schumacher et al, 1994; Inoue et al, 1987).

      *- Line 252: -LC and -LE work in a complementary/semi-redundant fashion, so single knockdown is not an effective method of indicating that they are not required for antimicrobial function.*

      Answer: We appreciate the reviewer's suggestion to test whether PGRP-LE and PGRP-LC act redundantly to activate the Imd pathway or whether apamin acts independently of the Imd pathway. As the reviewer suggested, we conducted a double knockdown of PGRP-LE and PGRP-LC and found that apamin still suppresses bacterial infection. These data suggest that apamin's antimicrobial function does not depend on PGRP-LE or PGRP-LC, and they open new questions about apamin's unique function as an AMP. We added the new data in Fig. 5d and describe them in the main text as follows:

      "Knockdown of PGRP-LC or LE, as well as their combined knockdown, did not affect the antimicrobial efficacy of apamin (Fig. 5b-d), suggesting that the antimicrobial properties of apamin are independent of PGRP-LC and LE functions (Fig. 5a)."

      *- Lines 279-283: The bacterial infections that expression of these AMPs were tested against should be mentioned in the text, as all bacteria are not equivalent.*

      Answer: We have added "P. aeruginosa".

      *- Line 296: Challenged with which bacteria?*

      Answer: We have added "P. aeruginosa".

      *- Line 328: Provide brief explanation of what Ttk depletion is for reader context.*

      Answer: We have added a short explanation as follows:

      "which refers to the reduction or elimination of a protein called TTK (Monopolar Spindle 1 Kinase) that plays a crucial role in cell division, specifically in ensuring accurate chromosome segregation during mitosis (Mason et al, 2017)."

      *- Line 719: This should say, '5 days after eclosion'.*

      Answer: Corrected

      *- General comment on figures: The little icons used to denote what the figure is depicting (gut health, climbing, aging, etc.) are very effective.*

      Answer: We thank the reviewer for their appreciation of the figures.

      *- General comment on figure titles: Use of the term 'infectious dose' throughout does not make sense. I think what the authors mean is 'pathogen load' as they are testing using CFUs. 'Infectious dose' should only be used to refer to the amount/OD of pathogen that was initially administered to establish an infection. Also, 'oral feeding' should be used throughout instead of 'orally feeding'. *

      Answer: We thank the reviewer for their insightful comment. We agree that the use of the term 'infectious dose' was inaccurate in certain contexts. We have revised the manuscript to use 'pathogen load' to refer to the number of CFUs administered or recovered, as this more accurately reflects the bacterial burden.

      We have also replaced 'orally feeding' with 'oral feeding' throughout the manuscript to improve clarity and consistency.

      We appreciate the reviewer's attention to detail and believe that these changes have significantly enhanced the clarity and accuracy of the manuscript.

      *- Figure 1O: Abrupt die-offs at 1000hrs and 2800hrs in the UAS-Melittin line suggest that lifespan experiment was only performed once and that die-offs may have been exacerbated due to infrequent flipping. This is perhaps not an issue as the lifespans appear to be quite different between the active line and control regardless. *

      Answer: We thank the reviewer for their careful observation. The abrupt die-offs in the UAS-Melittin line at 1000 hours and 2800 hours were unexpected. While we cannot definitively rule out the possibility that infrequent flipping might have contributed to these events, we believe that the overall lifespan difference between the experimental and control groups is substantial and likely reflects a genuine biological effect of Melittin overexpression.

      *- Figure 2F would be improved by putting the legend in the same descending order that the genotypes are displayed on the graph (tApamin infected, GFP infected, tApamin, GFP) *

      Answer: We have corrected this.

      *- Figure 3I: Unclear what small image inserted in the graph depicts. *

      Answer: This is an image of fly stem cells that is available for free licensing.

      *- Figures 3N and 3O are very low resolution, making it difficult to identify the differences that the authors intend to show. *

      Answer: We have utilized a higher resolution image and revised the figure accordingly.

      - Figure 4 title is confusing. Do the authors mean, "Locomotion of flies expressing neuronal Apamin, sleep in flies with ubiquitous expression of Apamin, and Smurf results induced by different types of stress."?

      Answer: Corrected as follows:

      "Locomotion of flies expressing neuronal tApaminDC, sleep in flies with ubiquitous expression of tApaminDC, and Smurf results induced by different types of stress."

      *- Figure 5: Some of these graphs are very cluttered and difficult to parse (particularly 5H). Suggest putting peptide sequences in figure title rather than underneath graphs to simplify and increase visual effectiveness. *

      Answer: We have improved the figure by moving the peptide sequences to the figure legend.

      *-Throughout: Methods section in particular could use a solid edit for grammar. Homogenize capitalization of "Gram-negative/-positive" and "gram-negative/-positive" *

      Answer: We have corrected these errors.

      *- Line 98: "an AMPs" should be "an AMP" *

      Answer: We have corrected the error.

      *- Line 119: Incorrect grammar. Suggest, "which did not affect the lifespan of female flies and had only a slight effect on male flies" *

      Answer: We have corrected the error.

      Reviewer #1 (Significance (Required)):

      *The paper reveals that apamin has antimicrobial properties. The intended significance seems to be an exploration of apamin for therapeutic potential in gut health, but this is not explicitly stated by the authors. The contribution mainly appears to be conceptual in nature. *

      *The findings appear to be in line with other recent in vitro results suggesting that apamin has antimicrobial properties (DOI: 10.9775/kvfd.2024.32125). *

      *Researchers interested in developing therapeutic applications for bee venom constituents or promoting gut health and microbiome balance will likely find this research of interest. *

      *My expertise is primarily in Drosophila molecular genetics and immunity. I have a broad understanding of Drosophila immune pathways, epithelial immunity, and infection dynamics. I do not feel qualified to comment on the statistics or data analysis aspects of this paper. *

      Answer: We sincerely appreciate the reviewer's positive feedback regarding our findings on the antimicrobial properties of apamin. We are grateful for the acknowledgment that our results align with recent in vitro studies, such as the one referenced (DOI: 10.9775/kvfd.2024.32125), which further supports the significance of our work. We have cited this paper in the Discussion section as below.

      "Our findings are consistent with recent in vitro studies demonstrating the antimicrobial and antibiofilm effects of apamin (AYDIN et al, 2024)."

      We recognize the reviewer's observation that our intended significance, specifically the exploration of apamin's therapeutic potential for gut health, was not explicitly stated in the original manuscript. To address this, we have revised the Introduction and Discussion sections to clearly articulate our aim of investigating apamin as a candidate for promoting gut health and microbiome balance. We believe this clarification will enhance the conceptual contribution of our study and its relevance to researchers interested in therapeutic applications of bee venom constituents.

      "Apamin shows promising therapeutic potential for enhancing bee gut health by exhibiting antimicrobial properties that can help maintain a balanced microbiome. Its ability to modulate immune responses and promote gut integrity, particularly in the presence of harmful bacteria, positions apamin as a valuable candidate for developing strategies aimed at improving gut health in honeybees."

      Additionally, we appreciate the reviewer's expertise in Drosophila molecular genetics and immunity, and we are grateful for their insights regarding the broader implications of our research. We will ensure that our manuscript reflects these considerations more explicitly.

      Thank you once again for your valuable feedback, which has helped us improve the clarity and impact of our work.

      Reviewer #2

      *Reviewer #2 (Evidence, reproducibility and clarity (Required)): *

      General Comment. *The reviewer would like to thank the authors for their contributions to the research of animal venoms and their therapeutic value. The manuscript is very well and clearly written. Additionally, the choice of using a model organism such as D. melanogaster in the context of venoms research strengthens the manuscript by providing evidence that is both robust and broadly applicable, thus enhancing the manuscript's scientific merit and relevance to the field. *

      Answer: We would like to express our sincere gratitude to the reviewer for the positive feedback and thoughtful comments regarding our manuscript. We are pleased to hear that the reviewer appreciates our contributions to the research on animal venoms and their therapeutic potential. The reviewer's acknowledgment of the clarity and quality of our writing is particularly encouraging, as we strive to communicate our findings effectively. Additionally, we are glad that the choice of Drosophila melanogaster as a model organism was recognized for its ability to strengthen our research by providing robust and broadly applicable evidence. This endorsement enhances the scientific merit and relevance of our work within the field. Thank you once again for the constructive feedback, which has been invaluable in refining our manuscript.

      Comment 1. *What is the significance of that the biological property of apamin is independent of its disulfide bonds? Does it suggest that the core functional parts of apamin might not entirely depend on its stabilized structure? Could it mean that modifications to the molecule that disrupt disulfide bonds wouldn't necessarily eliminate all of its activity, which could be important in designing analogs or derivatives of apamin for research or therapeutic purposes etc.? This sentence is written in the abstract which means that it should be a key finding, and it should be clear and a given to the reader. However, it is not the case, and it should be stated more clearly. *

      Answer: We thank the reviewer for this perceptive comment. The finding that apamin's biological activity does not require its disulfide bonds is notable because of its broader implications. It suggests that apamin's main functional determinants most likely reside in its polypeptide sequence rather than in its disulfide-stabilized tertiary structure (Habermann, 1972). This could guide the optimization of apamin-based drugs with altered disulfide bridges, potentially giving them higher activity or reduced toxicity. Such changes could also give apamin additional properties, such as improved stability, bioavailability, or selectivity, that make it more suitable for research and applied use. We have included an explanation of this in both the Results and Discussion sections.

      "This finding suggests that the core functional components of apamin may not be entirely reliant on its stabilized structure."

      "We discovered that apamin lacking the C-terminus retains its function as an antimicrobial agent, despite missing one of its two disulfide bridges. This finding suggests that the core functional components of apamin may not be entirely dependent on its stabilized structure, indicating that modifications to the molecule that disrupt these disulfide bonds could still maintain some level of activity. These insights are vital for designing analogs or derivatives of apamin, as they pave the way for developing new compounds that could retain therapeutic potential even without the native disulfide bond configuration (Habermann, 1972)."

      Comment 2. *The authors explained well the evolutionary proximity between apamin-producing honeybees and D. melanogaster to justify the choice of model organism, which we can all agree on for genetics and developmental biology studies. However, insect behaviors (sleep, locomotion, social interaction, etc.) are driven by ecological roles, evolutionary strategies, and social complexity. How much can you really tell about the role of apamin in the behavior of honeybees (highly social and colony-forming) by studying it in an insect (D. melanogaster) with completely different and divergent behavior (solitary and exhibiting only a few basic forms of social interaction)? *

      Answer: We appreciate the reviewer's insightful comment. While Drosophila melanogaster is an excellent model organism for investigating fundamental biological processes, we recognize the limitations of using it to fully comprehend the complex behavioral effects of apamin in honeybees. Nevertheless, our study establishes a foundational understanding of apamin's potential impact on behavior, including its effects on sleep and locomotion, which are core behavioral processes conserved across many organisms, including insects (Zimmerman et al, 2008).

      By employing Drosophila as a model, we were able to identify potential mechanisms of action for apamin, particularly regarding its effects on intestinal systems. Although honeybees and fruit flies exhibit ecological differences, there is substantial consensus and experimental evidence that many molecular pathways involved in immune responses are conserved between these species. Thus, while the interpretation of behavioral changes induced by apamin may be limited by the ecological and evolutionary divergence between honeybees and fruit flies, the molecular pathways governing the immune response in honeybees can be effectively studied using the Drosophila platform. This approach has previously revealed functions of genes related to human genetic diseases. We have clearly articulated this limitation and the advantages of using the fly model to study the honeybee immune system in the Discussion section as follows:

      "In conclusion, it is important to note that much of our understanding of the honeybee immune system is derived from studies conducted on the Drosophila model, owing to the evolutionary proximity of these two species (Larsen et al, 2019b). This close relationship allows for valuable insights into immune mechanisms that are conserved across species (Evans et al, 2006; Morfin et al, 2021). Research has demonstrated that the fruit fly Drosophila melanogaster serves as an effective model for studying the effects of insecticides on honeybees, particularly in understanding the sub-lethal impacts of neonicotinoids, which are known to affect pollinators significantly (Tasman et al, 2021).

      By investigating the function of honeybee AMPs within the Drosophila platform, we can further enhance our knowledge of immune responses and their implications. Just as research on Drosophila has significantly advanced our understanding of human genetic diseases (Bellen et al, 2010; Casci & Pandey, 2015; Bier, 2005; Perrimon et al, 2016; Rieder & Larschan, 2014; Bilder et al, 2021), studying honeybee AMPs in this context holds the potential to uncover novel therapeutic avenues and deepen our comprehension of immune function across taxa."

      Comment 3. *Please include the following references: *

      • Wehbe R, Frangieh J, Rima M, El Obeid D, Sabatier JM, Fajloun Z. Bee Venom: Overview of Main Compounds and Bioactivities for Therapeutic Interests. Molecules. 2019 Aug 19;24(16):2997.

      • Nader RA, Mackieh R, Wehbe R, El Obeid D, Sabatier JM, Fajloun Z. Beehive Products as Antibacterial Agents: A Review. Antibiotics. 2021; 10(6):717.

      Answer: We have incorporated the references mentioned above in appropriate sections of the manuscript. We appreciate the reviewer's suggestions.

      Reviewer #2 (Significance (Required)):

      The manuscript is very well and clearly written. Additionally, the choice of using a model organism such as D. melanogaster in the context of venoms research strengthens the manuscript by providing evidence that is both robust and broadly applicable, thus enhancing the manuscript's scientific merit and relevance to the field.

      Answer: We would like to express our sincere gratitude to the reviewer for their positive feedback regarding our manuscript. We are thrilled to hear that the clarity and quality of our writing were appreciated. Additionally, we are glad that the choice of Drosophila melanogaster as a model organism in our venoms research was recognized for its ability to provide robust and broadly applicable evidence. This endorsement underscores the scientific merit and relevance of our work within the field, and we appreciate the reviewer's acknowledgment of this important aspect. Thank you for your encouraging comments, which motivate us to continue exploring this vital area of research.

    1. Abstract. Background: The expanding availability of large-scale genomic data and the growing interest in uncovering gene-disease associations call for efficient tools to visualize and evaluate gene expression and genetic variation data. Methodology: Data collection involved filtering biomarkers related to multiple neurological diseases from the ClinGen database. We developed a comprehensive pipeline that was implemented as an interactive Shiny application and a standalone desktop application. Results: NeuroVar is a tool for visualizing genetic variation (single nucleotide polymorphisms and insertions/deletions) and gene expression profiles of biomarkers of neurological diseases. Conclusion: The tool provides a user-friendly graphical user interface to visualize genomic data and is freely accessible on the project’s GitHub repository (https://github.com/omicscodeathon/neurovar).

      This work has been published in GigaByte Journal under a CC-BY 4.0 license (https://doi.org/10.46471/gigabyte.143). These reviews are as follows.

      **Reviewer 1. Joost Wagenaar**

      Is there a clear statement of need explaining what problems the software is designed to solve and who the target audience is?

      Yes. There is a clear statement of need, but the audience is not very targeted. The investigators outline the need for tools to help users identify phenotypic subtypes of disease and describe how the tool would help with this. Although the investigators mention that the tool will allow users to analyze biomarker data, the scope of the types of analysis that can be performed is relatively small. I think that it would benefit the tool to better define the targeted users (clinicians, data scientists, enthusiasts?) and develop specifically towards a single audience.

      The tool leverages several existing R packages to run the analysis over the data and the provided tool can be described as a user-friendly wrapper around these libraries. The interface allows users to submit a file, and plot the results of the analysis within the app.

      As Open Source Software are there guidelines on how to contribute, report issues or seek support on the code?

      No. I did not see any guidelines for contributing to the project in the paper, or in the associated GitHub repository.

      Is the documentation provided clear and user friendly?

      Yes, the investigators did a great job providing documentation and installation instructions. [also video demo: https://youtu.be/cYZ8WOvabJs?si=DnxVuL65yr0wYYjq]

      Is there a clearly-stated list of dependencies, and is the core functionality of the software documented to a satisfactory level?

      Yes, the investigators provide a clearly-stated list of dependencies and instructions on how to install them prior to running the application.

      Is test data available, either included with the submission or openly available via cited third party sources (e.g. accession numbers, data DOIs)?

      Yes. The paper, and GitHub repository point to a public dataset that can be used to test the application.

      Are there (ideally real world) examples demonstrating use of the software?

      Yes. The investigators provide a video highlighting the use of the application and provide a use-case where they use the app to validate some existing knowledge.

      Is automated testing used or are there manual steps described so that the functionality of the software can be verified?

      No. The application is sufficiently small that no automated or manual testing would necessarily be required beyond validating that the application works.

      Additional Comments:

      The proposed application provides a nice tool that makes visualization and analysis of VCF data easier for users who are not comfortable working within R directly. It provides a nice demonstration of how the scientific community can wrap scientific tools into deployable applications that can be easily understood. A question remains about the target audience for an application like this, as most people who are interested in these types of analyses and visualizations are, in fact, familiar enough with R or other programming languages to directly leverage the libraries and plot the results.

      That said, as data integration and multi-omics visualization become more complex and the app provides more ways to visualize the data in meaningful ways, I do strongly believe that applications like this can provide a meaningful addition to the scientific tools that are available.

      Reviewer 2. Ruslan Rust

      Is the language of sufficient quality?

      Yes. The language of the document is of sufficient quality. I did not notice any major issues.

      Is there a clear statement of need explaining what problems the software is designed to solve and who the target audience is?

      Yes. The authors provide a statement of need. They mention the need for a specialized software tool to identify genes from transcriptomic data and genetic variations such as SNPs, specifically for neurological diseases. Perhaps the authors could expand in the introduction on how they chose the diseases; for example, stroke is not listed among the neurological diseases.

      Is the source code available, and has an appropriate Open Source Initiative license (https://opensource.org/licenses) been assigned to the code?

      Yes, the source code is available on GitHub at the following link: https://github.com/omicscodeathon/neurovar. Additionally, the authors deposited the source code and supplementary data in a permanent repository on Zenodo: https://zenodo.org/records/13375493. They also provided test data: https://zenodo.org/records/13375591. I was able to download and access the complete set of data.

      As Open Source Software are there guidelines on how to contribute, report issues or seek support on the code?

      No. I did not find any way to contribute, report issues, or seek support. I would recommend that the authors add this information to the GitHub README file.

      Is the code executable?

      Yes, I could execute the code using RStudio 4.3.3.

      Is installation/deployment sufficiently outlined in the paper and documentation, and does it proceed as outlined?

      Yes. I could follow the installation process, but perhaps the authors could add a few more details on how to download from GitHub, as some scientists may have trouble with it. An installation video (in addition to the video demonstration of the NeuroVar Shiny app) might also be helpful.

      Is the documentation provided clear and user friendly?

      Yes. The documentation is provided and is user friendly. I was able to install, test, and run the tool using RStudio. The authors may also consider offering a simple website link for the R Shiny tool, if possible; this may enable access for scientists who are not familiar with R. It is especially helpful that the authors provided a demonstration video, and I was able to reproduce the steps. However, I would recommend adding more information to the YouTube video; for example, a reference to the preprint/paper and a GitHub link would help connect the data. Perhaps the authors could also expand a bit on the possibilities for exporting data from their software and provide different formats (e.g., PDF, PNG, JPEG). I think it is important for many researchers to be able to export their outputs, e.g., the heatmaps.

      Is there a clearly-stated list of dependencies, and is the core functionality of the software documented to a satisfactory level?

      Yes, dependencies are listed in the manuscript and in the repository and are installed automatically. It worked for me with RStudio version 4.3.3.

      Is test data available, either included with the submission or openly available via cited third party sources (e.g. accession numbers, data DOIs)?

      Yes, the authors provide test data with this DOI: https://doi.org/10.5281/zenodo.13375590

      Are there (ideally real world) examples demonstrating use of the software?

      Yes, the authors use the example of epilepsy, focal epilepsy, and the gene of interest DEPDC5. I replicated their search and got the same results. However, I find that the labels for the gene’s transcript in Figure 1 could be clearer; for example, it is not clear to me what transcript start and end refer to. It might also be helpful if the authors provided an example dataset for the expression data that is loaded in the software by default. Furthermore, the authors present case study results using RNA-seq in ALS patients with mutations in the FUS, TARDBP, SOD1, and VCP genes.

      Is automated testing used or are there manual steps described so that the functionality of the software can be verified?

      No. Automated testing is not used, as far as I can assess.

      Additional Comments: The preprint version of this paper was also reviewed in ResearchHub: https://www.researchhub.com/paper/7381836/neurovar-an-open-source-tool-for-gene-expression-and-variation-data-visualization-for-biomarkers-of-neurological-diseases/reviews

      My expertise: I am an assistant professor in neuroscience and physiology at the University of Southern California and work on stem cell therapies for stroke. We are particularly interested in working with genomic data and the development of new biomarkers for stroke, AD, and other neurological diseases.

      Summary: The authors provide a software tool, NeuroVar, that helps visualize genetic variations and gene expression profiles of biomarkers in different neurological diseases.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      Reviewer #1 (public review):

      (1) The link between the background in the introduction and the actual study and findings is often tenuous or not clearly explained. A re-working of the intro to better set up and link to the study questions would be beneficial.

      We have rewritten the introduction of the manuscript and clearly stated the study questions we were aiming for:

      In paragraph 1, we state clearly that we need to study why the ADC type of cervical cancer is more aggressive (Lines 58-77).

      In paragraph 2, we state clearly that we need to find valuable biomarkers to help diagnose lymph node metastasis, which may compensate for the limitations of radiological imaging tools and reduce the rate of misdiagnosis (Lines 78-100).

      In paragraph 3, we state clearly that HPV-negative cases are a special group of cervical cancer and that we aim to study their cellular features (Lines 101-108).

      In paragraph 4, we state clearly that we need to decode the mode of cell-to-cell interaction in the tumor immune microenvironment of ADC using scRNA-seq (Lines 109-123).

      (2) For the sequencing, which kit was used on the Novaseq6000?

      For sequencing, we used the Chromium Controller and Chromium Single Cell 3’ Reagent Kits (v3 chemistry CG000183) on the Novaseq6000. We apologize for omitting this important detail and have added the information to the Methods section (Lines 196-197).

      (3) Additional details are needed for the analysis pipeline. How were batch effects identified/dealt with, what were the precise functions and settings for each step of the analysis, how was clustering performed and how were clusters validated etc. Currently, all that is given is software and sometimes function names which are entirely inadequate to be able to assess the validity of the analysis pipeline. This could alternatively be answered by providing annotated copies of the scripts used for analysis as a supplement.

      We apologize for the inadequate description of the data analysis process. We have added a new “data processing” subsection with more details to the Methods section (Lines 202-221). In addition, we have provided annotated copies of the scripts in the supplementary data as Supplementary Data 1.

      (4) For Cell type annotation, please provide the complete list of "selected gene markers" that were used for annotation.

      We have added the list of marker genes for cell type annotation to the revised manuscript as Supplementary Table 3.

      (5) No statistics are given for the claims on cell proportion differences throughout the paper (for cell types early, epithelial sub-clusters later, and immune cell subsets further on). This should be a multivariate analysis to account for ADC/SCC, HPV+/- and Early/Late stage.

      We apologize for the lack of statistics in our comparative analyses. In the revision, we have used statistical approaches to analyze the differences in each group comparison, and the corresponding figures have been revised accordingly.

      For example, Fig. 1F, Fig. 2D, Fig. 4E, Fig. 5D, and Fig. 6D have been re-analyzed to compare ADC/SCC; Supplementary Fig. 1A, Supplementary Fig. 2A, Supplementary Fig. 4A, Supplementary Fig. 5A, and Supplementary Fig. 6A have been re-analyzed to compare HPV+/HPV-; and Supplementary Fig. 1B, Supplementary Fig. 2B, Supplementary Fig. 4B, Supplementary Fig. 5B, and Supplementary Fig. 6B have been re-analyzed to compare Early/Late stage. All P values are listed in the figure legends.
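
      As an illustration of the kind of multivariate adjustment requested in point (5), the sketch below regresses a per-sample cell-type proportion on three binary covariates (ADC vs. SCC, HPV status, and stage) using ordinary least squares in plain Python. It is not the analysis performed in the manuscript: the sample values, the function names `solve_linear_system` and `fit_ols`, and the choice of a simple linear model are all assumptions made for illustration. A real analysis would use a statistics package that also reports standard errors and P values.

```python
# Hypothetical sketch: adjust a cell-type proportion for three binary covariates.
# All data and names here are illustrative assumptions, not from the manuscript.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the right-hand side is reduced alongside a.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares fit via the normal equations (X^T X) beta = X^T y.

    An intercept column is prepended to each covariate row.
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical samples: (is_adc, is_hpv_positive, is_late_stage), one per combination.
samples = [(adc, hpv, late) for adc in (0, 1) for hpv in (0, 1) for late in (0, 1)]
# Hypothetical proportions, constructed to be exactly linear in the covariates.
proportions = [0.10 + 0.05 * adc + 0.02 * hpv - 0.01 * late for adc, hpv, late in samples]

# coefficients = [intercept, ADC effect, HPV effect, late-stage effect]
coefficients = fit_ols(samples, proportions)
```

      Each coefficient estimates the difference in proportion associated with one covariate while holding the other two fixed, which is the sense in which the comparison is adjusted for ADC/SCC, HPV status, and stage together.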

      (6) The Y-axis label is missing from the proportion histograms in Figure 2D. In these same panels, the bars change widths on the right side. If these are exclusively in ADC, show it with a 0 bar for SCC, not doubling the width which visually makes them appear more important by taking up more area on the plot.

      We apologize for the imprecise presentation of the histograms in Fig. 2D. We have also revised other figures with similar mistakes, such as Fig. 1F and Fig. 5D. The varying bar widths resulted from the output style of the data processing; we have corrected all such cases throughout the manuscript, for example in Fig. 2D and Supplementary Fig. 2A-B.

      (7) Throughout the manuscript, informatic predictions (differentiation potential, malignancy score, stemness, and trajectory) are presented as though they're concrete facts rather than the predictions they are. Strong conclusions are drawn on the basis of these predictions which do not have adequate data to support. These conclusions which touch on essentially all of the major claims made in the manuscript would need functional data to validate, or the claims need to be very substantially softened as they lack concrete support. Indeed, the fact that most of the genes examined that were characteristic of a given cluster did not show the expected expression patterns in IHC highlights the fact that such predictions require validation to be able to draw proper inferences.

      Thank you for your insightful comments. As you noted, several conclusions were initially based on bioinformatic predictions. In the revised manuscript, we have therefore softened all relevant descriptions, particularly in the “epithelial cells” paragraph of the Results section, as well as the conclusions derived from bioinformatic predictions in other paragraphs throughout the manuscript. We hope the revised descriptions enhance the precision of our work.

      For example, in the paragraph “The sub-clusters of epithelial cells in ADC exhibit elevated stem-like features (from Line 353)”, many overly affirmative descriptions have been rewritten in Lines 353, 362, 371, 375, 379, 383, 390, and 392. In Lines 395 to 399, the conclusion has been revised to “The observation of cluster Epi_10_CYSTM1 and its possible specificity to ADC makes us question whether or not it may be related to the aggressiveness of ADC”, compared with the previous “This observation may partially indicate that high stemness cluster Epi_10_CYSTM1 is essential for ADC to present more aggressive features”. In Lines 400 to 408, the conclusions from the GO analyses have also been rewritten.

      In the paragraph “ADC-specific epithelial cluster-derived gene SLC26A3 is a potential prognostic marker for lymph node metastasis (from Line 422)”, many conclusions based on predictions have been revised, such as Line 424 - 428, Line 439 - 441, Line 451 - 453, Line 455 - 457, Line 458 - 459, Line 471 - 473, Line 478 - 481, Line 484 - 486, Line 489, etc.

      In the paragraph “Tumor associated neutrophils (TANs) surrounding ADC tumor area may contribute to the formation of a malignant microenvironment (from Line 536)”, we have changed the descriptions based on bioinformatic predictions, such as Line 560, Line 561, Line 565, Line 566, Line 572, Line 576 - 577, etc.

      In paragraph “Crosstalk among tumor cells, Tregs and neutrophils establishes the immunosuppressive TIME in ADC (from Line 601)”, we have corrected all the over-affirmative descriptions, such as Line 604, Line 612, Line 614, Line 626, Line 628 - 629, Line 641, Line 654 – 655, etc.

      All the changes have also been listed in Revision Notes in detail.

      (8) The cluster Epi_10_CYSTM1 which is the basis for much of the paper is present in a single individual (with a single cell coming from another person), and heavily unconnected from the rest of the epithelial populations. If so much emphasis is placed on it, the existence of this cluster as a true subset of cells requires validation.

      We appreciate this suggestion. We agree that the majority of Epi_10_CYSTM1 cells are derived from sample S7. The fact that we have detected this cluster in only one patient may be due to sampling differences and the inherent heterogeneity of tumor specimens. However, the relatively high number of cells in this cluster from one stage III patient suggests its presence in ADC patients and highlights its potential as a diagnostic marker for clinical staging. To further investigate whether this cluster is generally existing in ADC patients, we have identified and selected candidate genes, such as SLC26A3, ORM1, and ORM2, as representative markers of this cluster, which demonstrated high specificity (as shown in Fig. 3B). We then performed IHC staining on a total of 56 tissue samples, and the results showed positive expressions of these markers in the majority of stage IIIC tumor tissues, confirming the existence of this cell cluster (as shown in Supplementary Fig. 3E). In our revised manuscript, we have included an in-depth discussion of this issue in the seventh paragraph of the Discussion section (From Line 801).

      (9) Claims based on survival analysis of TCGA for Epi_10_CYSTM1 are based on a non-significant p-value, though there is a slight trend in that direction.

      Thank you for your insightful comment. From the data of TCGA survival analysis for Epi_10, we found a not-so-slight trend of difference between groups (with a small P value). As a result, we presented this data and hoped to add more strength to the clinical significance of this cluster. However, this indeed caused controversy because the P value is non-significant. As a result, we have already deleted this data in the revised manuscript.

      (10) The claim "The identification of Epi_10_CYSTM1 as the only cell cluster found in patients with stage IIICp raises the possibility that this cluster may be a potential marker to diagnose patients with lymph node metastasis." This is incorrect according to the sample distributions which clearly show cells from the patient who has EPI_10_CYSTM1 in multiple other clusters. This is then used as justification for SLC26A3 which appears to be associated with late stage, however, in the images SLC26A3 appears to be broadly expressed in later tumours rather than restricted to a minor subset as it should be if it were actually related to the EPI_10_CYSTM1 cluster.

      We are thankful for this question. The conclusion that “The identification of Epi_10_CYSTM1 as the only cell cluster found in patients with stage IIICp raises the possibility that this cluster may be a potential marker to diagnose patients with lymph node metastasis” was indeed worded too strongly given the sample distribution. We apologize for this and have corrected the description to “As one of stage IIIC-specific cell clusters, the cluster of Epi_10_CYSTM1, with its representative marker gene SLC26A3, presents potential diagnostic value to predict lymph node metastasis” in Line 478-481.

      However, based on our results, we do think this cluster is a potential diagnostic marker and the hypothesis is right. As for SLC26A3, we have specifically added a new paragraph (from Line 801 - 822) in Discussion section to discuss the rationality and necessity of selecting this gene as our central focus, and the reasons why SLC26A3 should be the representative of cluster Epi_10_CYSTM1. As you noted, SLC26A3 appears to be broadly expressed in later tumors rather than restricted to a minor subset in the images. We apologize for any misunderstanding caused. When presenting the IHC data, we only showed the strongly positive areas of each slide to emphasize the differences. In our revision, we have included whole slide scanning images of the IHC samples, clearly showing that SLC26A3 is restricted to a part of the tumors (Supplementary Fig.9).

      (11) The authors claim that cytotoxic T cells express KRT17, and KRT19. This likely represents a mis-clustering of epithelial cells.

      We apologize for using data without noticing the contamination of T cells with a few epithelial cells. We have re-performed quality control to exclude contamination and re-analyzed all T cell data. In the revised manuscript, we have therefore replaced the T cell data in both Fig. 4 and Supplementary Fig. 4 with the re-analyzed results.

      (12) Multiple claims are made for specific activities based on GO term biological process analysis which while not contradictory to the data, certainly are by no means the only explanation for it, nor directly supported.

      Our initial purpose was to use GO analysis as supports for our conclusions. However, we know these are only claims but not evidence, which is also the problem of our writing techniques as in question (7). Therefore, in our revised manuscript, we have already deleted GO data and descriptions in the paragraphs of “T cell (Fig.4)”(from Line 495) and “B/plasma cell (Fig.6)” (from Line 579), because the predictions are quite irrelevant to our conclusions.

      However, in the sections of “epithelial cell (Fig.2)” (from Line 352) and “neutrophils (Fig.5)” (from Line 536), we retained the GO data and rewrote the conclusions, because these analyses have provided us with valuable information regarding the role of specific cell clusters in ADC progression. Furthermore, our subsequent analyses, such as CellChat, have further validated the accuracy of the findings from the GO analysis. We do think this logically supports the whole storyline of the study.

      Reviewer #2 (public review):

      (1) I believe that many of the proposed conclusions are over-interpretations or unwarranted generalizations of the single-cell analysis. These conclusions are often based on populations in the scRNA-seq data that are described as enriched or specific to a given group of samples (eg. ADC). This conclusion is based on the percentage of cells in that population belonging to the given group; for example, a cluster of cells that dominantly come from ADC. The data includes multiple samples for each group, but statistical approaches are never used to demonstrate the reproducibility of these claims.

      We feel sorry that many of the conclusions have been written in an over-affirmative way but lack profound supporting evidences. In our revision, we have already optimized the writing techniques and re-written all conclusions or descriptions related to only bio-informatic predictions. Moreover, we have performed statistical re-analyses on all data and rearranged the related figures.

      For example, in Line 352, we have changed the sub-title “The sub-clusters of epithelial cells exhibit elevated stem-like features to promote the aggressiveness of ADC” into “The sub-clusters of epithelial cells in ADC exhibit elevated stem-like features”. In this paragraph, many over-affirmative descriptions such as “exclusively”, “significant”, “overwhelmingly”, “remarkably” have been deleted. From Line 486-493, the conclusion of “Moreover, SLC26A3 could be employed as a marker for the Epi_10_CYSTM1 cluster, aiding in the diagnosis of lymph node metastasis to prevent post-surgical upstaging in ADC patients in the future” has been changed to “our results propose that SLC26A3 might be considered as a diagnostic marker to predict lymph node metastasis in ADC patients”. Similar over-affirmative descriptions and conclusions have also been rewritten in the other paragraphs, as described in our response to question (7) above.

      (2) This leads to problematic conclusions. For example, the "ADC-specific" Epi_10_CYSTM1 cluster, which is a central focus of the paper, only contains cells from one of the 11 ADC samples and represents only a small fraction of the malignant cells from that sample (Sample 7, Figure 2A). Yet, this population is used to derive SLC26A3 as a potential biomarker. SLC26A3 transcripts were only detected in this small population of cells (none of the other ADC samples), which makes me question the specificity of the IHC staining on the validation cohort.

      We sincerely feel grateful for this question. This is a quite important question as it is also pointed out by reviewer#1 in question (8) above. In the revised manuscript, we have already optimized our descriptions and have added detailed explanation for the importance of SLC26A3 in the Discussion section  (from Line 802 - 823). We agree that the majority of Epi_10_CYSTM1 cells are derived from sample S7. The fact that we detected this cluster in only one patient may be due to sampling differences and the inherent heterogeneity of tumor specimens. However, the relatively high number of cells in this cluster from one stage III patient suggests its presence in ADC and highlights its potential as a diagnostic marker for staging ADC. To further investigate whether this cluster is generally present in ADC patients, we identified and selected candidate genes, such as SLC26A3, ORM1, and ORM2, as representative markers of this cluster, which demonstrated high specificity (as shown in Fig. 3B). We then performed IHC staining on 56 cases of tissue samples, and the results showed positive expression of these markers in the majority of stage III tumor tissues, confirming the existence of this cell cluster (as shown in Supplementary Fig. 3E). In our revised manuscript, we have included an in-depth discussion of this issue in the seventh paragraph of the Discussion section.

      (3) This is compounded by technical aspects of the analysis that hinder interpretation. For example, it is clear that the clustering does not perfectly segregate cell types. In Figures 2B and D, it is evident that C4 and C5 contain mixtures of cell type (eg. half of C4 is EPCAM+/CD3-, the other half EPCAM-/CD3+). These contaminations are carried forward into subclustering and are not addressed. Rather, it is claimed that there is a T cell population that is CD3- and EPCAM+, which does not seem likely.

      Thank you for your insightful comment. This important point is also raised by reviewer#1 above. In the revised manuscript, we have reanalyzed our scRNA-seq data and listed the canonical marker genes for cell type annotation. Most importantly, as for T cells and their sub-clustering, we have performed quality control and re-analyzed all data for T cells, with contamination excluded. In the revised manuscript, we have added the re-analyzed data for T cells in both Fig. 4 and Supplementary Fig. 4.

      Recommendations for the authors:

      Reviewer #1 (recommendations for the authors):

      The text would substantially benefit from an editorial revision of language usage.

      We sincerely feel grateful for this suggestion. In our revision, we have conducted language editing and carefully rewritten our manuscript. The changes have been clearly marked in the tracked version of the revised manuscript.

      Reviewer #2 (recommendations for the authors):

      (1) Use statistical approaches to claim enrichment/specificity of populations to given groups (ADC, HPV, etc). Analysis packages like Milo for differential abundance testing would be very helpful.

      We feel grateful for this suggestion. In our revision, we have performed statistical analyses for all groups of comparison data. Meanwhile, we have rearranged the figures based on these statistical results, for example, Fig. 1F, Fig. 2D, Fig. 4E, Fig. 5D, Fig. 6D, Supplementary Fig. 1A-B, Supplementary Fig. 2A-B, Supplementary Fig. 4A-B, Supplementary Fig. 5A-B, Supplementary Fig. 6A-B.
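      The reviewer's request amounts to treating each sample, rather than each cell, as the unit of replication. The sketch below is one generic way to do that, not the authors' analysis and not the Milo package: it takes the per-sample fraction of cells falling in a cluster and runs an exact permutation test on the difference in group means. The function name and the example fractions are hypothetical.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in group means.

    Each value is the fraction of one sample's cells assigned to a cluster,
    so samples (not cells) are the replicates being shuffled.
    """
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    total = 0
    extreme = 0
    # Enumerate every way of relabeling n_a of the pooled samples as group A.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical per-sample fractions, for illustration only.
consistent = exact_permutation_p([0.30, 0.28, 0.35], [0.05, 0.04, 0.06])
single_sample = exact_permutation_p([0.40, 0.0, 0.0], [0.0, 0.0, 0.0])
print(consistent, single_sample)
```

      In the second toy case the cluster comes from a single sample, and every relabeling gives the same absolute difference, so the test cannot separate the groups. This is the situation both reviewers describe for a cluster dominated by one patient.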

      (2) In the subclustering, consider a round of quality control to ensure that all cells are of the cell type they are claimed to be. Contaminant clusters/cells could be filtered out or reassigned. This could be supplemented with an automated annotation approach using cell-type references.

      We feel thankful for this suggestion. As a result, we have provided copies of scripts in the supplementary data to ensure the quality control of cell type annotation.

      (3) An explanation for why SLC26A3 is so rare in the scRNA-seq data, but seemingly common in the IHC staining would be helpful. I am concerned about the specificity of the stain.

      We apologize for the lack of an adequate explanation of SLC26A3 and cluster Epi_10_CYSTM1. This is a crucial question, as it has been raised above in question (8) of reviewer #1 and question (2) of reviewer #2 (public review section). In the revised manuscript, we have added an intensive discussion of this question in the seventh paragraph of the Discussion section (from Line 801 - 822). In fact, because of the heterogeneity among different individuals and different tumor regions even within one sample, Epi_10_CYSTM1 seemed to be derived from only one sample. However, the relatively high number of cells in this cluster from one late-stage (stage IIIC) patient suggests its presence in ADC and highlights its potential as a diagnostic marker for staging ADC. Furthermore, we have identified SLC26A3, ORM1 and ORM2 as specific markers of this cluster and performed IHC staining. With a positive expression of these markers, the existence of this cluster has been indirectly proved (as shown in Fig. 3B).

    1. Author response:

      The following is the authors’ response to the current reviews.

      The authors agree with the reviewers that future studies are needed to dissect the mechanisms of eIF3 binding to 3'UTRs and their impact on translation, and the impact of this binding on cellular fate.


      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable study reveals extensive binding of eukaryotic translation initiation factor 3 (eIF3) to the 3' untranslated regions (UTRs) of efficiently translated mRNAs in human pluripotent stem cell-derived neuronal progenitor cells. The authors provide solid evidence to support their conclusions, although this study may be enhanced by addressing potential biases of techniques employed to study eIF3:mRNA binding and providing additional mechanistic detail. This work will be of significant interest to researchers exploring post-transcriptional regulation of gene expression, including cellular, molecular, and developmental biologists, as well as biochemists.

      We thank the reviewers for their positive views of the results we present, along with the constructive feedback regarding the strengths and weaknesses of our manuscript, with which we generally agree. We acknowledge our results will require a deeper exploration of the molecular mechanisms behind eIF3 interactions with 3'-UTR termini and experiments to identify the molecular partners involved. Additionally, given that NPC differentiation toward mature neurons is a process that takes around 3 weeks, we recognize the importance of examining eIF3-mRNA interactions in NPCs that have undergone differentiation over longer periods than the 2-hr time point selected in this study. Finally, considering the molecular complexity of the 13-subunit human eIF3, we agree that a direct comparison between Quick-irCLIP and PAR-CLIP will be highly beneficial and will determine whether different UV crosslinking wavelengths report on different eIF3 molecular interactions. Additional comments are given below to the identified weaknesses.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors perform irCLIP of neuronal progenitor cells to profile eIF3-RNA interactions upon short-term neuronal differentiation. The data shows that eIF3 mostly interacts with 3'-UTRs - specifically, the poly-A signal. There appears to be a general correlation between eIF3 binding to 3'-UTRs and ribosome occupancy, which might suggest that eIF3 binding promotes protein

      Strengths:

      The study provides a wealth of new data on eIF3-mRNA interactions and points to the potential new concept that eIF3-mRNA interactions are polyadenylation-dependent and correlate with ribosome occupancy.

      Weaknesses:

      (1) A main limitation is the correlative nature of the study. Whereas the evidence that eIF3 interacts with 3'-UTRs is solid, the biological role of the interactions remains entirely unknown. Similarly, the claim that eIF3 interactions with 3'-UTR termini require polyadenylation but are independent of poly(A) binding proteins lacks support as it solely relies on the absence of observable eIF3 binding to poly-A (-) histone mRNAs and a seeming failure to detect PABP binding to eIF3 by co-immunoprecipitation and Western blotting. In contrast, LC-MS data in Supplementary File 1 show ready co-purification of eIF3 with PABP.

      We agree the molecular mechanisms underlying the crosslinking between eIF3 and the end of mRNA 3’-UTRs remain to be determined. We also agree that the lack of interaction seen between eIF3 and PABP in Westerns, even from HEK293T cells, is a puzzle. The low sequence coverage in the LC-MS data gave us pause about making a strong statement that these represent direct eIF3 interactions, given the similar background levels of some ribosomal proteins.

      (2) Another question concerns the relevance of the cellular model studied. irCLIP is performed on neuronal progenitor cells subjected to neuronal induction for 2 hours. This short-term induction leads to a very modest - perhaps 10% - and very transient 1-hour-long increase in translation, although this is not carefully quantified. The cellular phenotype also does not appear to change and calling the cells treated with differentiation media for 2 hours "differentiated NPCs" seems a bit misleading. Perhaps unsurprisingly, the minor "burst" of translation coincides with minor effects on eIF3-mRNA interactions most of which seem to be driven by mRNA levels. Based on the ~15-fold increase in ID2 mRNA coinciding with a ~5-fold increase in ribosome occupancy (RPF), ID2 TE actually goes down upon neuronal induction.

      We agree that it will be interesting to look at eIF3-mRNA interactions at longer time points after induction of NPC differentiation. However, the pattern of eIF3 crosslinking to the end of 3’-UTRs occurs in both time points reported here, which is likely to be the more general finding in what we present.
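      The reviewer's remark about ID2 follows from the usual definition of translational efficiency (TE) as ribosome-protected fragment (RPF) abundance divided by mRNA abundance. A minimal sketch of that arithmetic, using the approximate fold changes quoted in the comment (the function name is hypothetical):

```python
def te_fold_change(rpf_fold: float, mrna_fold: float) -> float:
    """Fold change in TE, taking TE = RPF / mRNA for a single transcript."""
    if mrna_fold <= 0:
        raise ValueError("mRNA fold change must be positive")
    return rpf_fold / mrna_fold


# Approximate values from the reviewer's comment: ~5-fold RPF, ~15-fold mRNA.
id2_te = te_fold_change(rpf_fold=5.0, mrna_fold=15.0)
print(f"ID2 TE fold change: {id2_te:.2f}")
```

      A value below 1 means that ribosome occupancy rose less than transcript abundance, so TE fell even though total RPF signal increased.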

      (3) The overlap in eIF3-mRNA interactions identified here and in the authors' previous reports is minimal. Some of the discrepancies may be related to the not well-justified approach for filtering data prior to assessing overlap. Still, the fundamentally different binding patterns - eIF3 mostly interacting with 5'-UTRs in the authors' previous report and other studies versus the strong preference for 3'-UTRs shown here - are striking. In the Discussion, it is speculated that the different methods used - PAR-CLIP versus irCLIP - lead to these fundamental differences. Unfortunately, this is not supported by any data, even though it would be very important for the translation field to learn whether different CLIP methodologies assess very different aspects of eIF3-mRNA interactions.

      We agree the more interesting aspect of what we observe is the difference in location of eIF3 crosslinking, i.e. the end of 3’-UTRs rather than 5’-UTRs or the pan-mRNA pattern we observed in T cells. The reviewer is right that it will be important in the future to compare PAR-CLIP and Quick-irCLIP side-by-side to begin to unravel the differences we observe with the two approaches.

      Reviewer #2 (Public review):

      Summary:

      The paper documents the role of eIF3 in translational control during neural progenitor cell (NPC) differentiation. eIF3 predominantly binds to the 3' UTR termini of mRNAs during NPC differentiation, adjacent to the poly(A) tails, and is associated with efficiently translated mRNAs, indicating a role for eIF3 in promoting translation.

      Strengths:

      The manuscript is strong in addressing molecular mechanisms by using a combination of next-generation sequencing and crosslinking techniques, thus providing a comprehensive dataset that supports the authors' claims. The manuscript is methodologically sound, with clear experimental designs.

      Weaknesses:

      (1) The study could benefit from further exploration into the molecular mechanisms by which eIF3 interacts with 3' UTR termini. While the correlation between eIF3 binding and high translation levels is established, the functionality of these interactions needs validation. The authors should consider including experiments that test whether eIF3 binding sites are necessary for increased translation efficiency using reporter constructs.

      We agree with the reviewer that the molecular mechanism by which eIF3 interacts with the 3’UTR termini remains unclear, along with its biological significance, i.e. how it contributes to translation levels. We think it could be useful to try reporters in, perhaps, HEK293T cells in the future to probe the mechanism in more detail.

      (2) The authors mention that the eIF3 3' UTR termini crosslinking pattern observed in their study was not reported in previous PAR-CLIP studies performed in HEK293T cells (Lee et al., 2015) and Jurkat cells (De Silva et al., 2021). They attribute this difference to the different UV wavelengths used in Quick-irCLIP (254 nm) and PAR-CLIP (365 nm with 4-thiouridine). While the explanation is plausible, it remains a caveat that different UV crosslinking methods may capture different eIF3 modules or binding sites, depending on the chemical propensities of the amino acid-nucleotide crosslinks at each wavelength. Without addressing this caveat in more detail, the authors cannot generalize their findings, and thus, the title of the paper, which suggests a broad role for eIF3, may be misleading. Previous studies have pointed to an enrichment of eIF3 binding at the 5' UTRs, and the divergence in results between studies needs to be more explicitly acknowledged.

      We agree with the reviewer that the two methods of crosslinking will require a more detailed head-to-head comparison in the future. However, we do think the title is justified by the fact that we see crosslinking to the termini of 3’-UTRs across thousands of transcripts in each condition. Furthermore, the 3’-UTR crosslinking is enriched on mRNAs with higher ribosome protected fragment counts (RPF) in differentiated cells, Figure 3F.

      (3) While the manuscript concludes that eIF3's interaction with 3' UTR termini is independent of poly(A)-binding proteins, transient or indirect interactions should be tested using assays such as PLA (Proximity Ligation Assay), which could provide more insights.

      This is a good idea, but would require a substantial effort better suited to a future publication. We think our observations are interesting enough to the field to stimulate future experimentation that we may or may not be most capable of doing in our lab.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript by Mestre-Fos and colleagues, authors have analyzed the involvement of eIF3 binding to mRNA during differentiation of neural progenitor cells (NPC). The authors bring a lot of interesting observations leading to a novel function for eIF3 at the 3'UTR.

      During the translational burst that occurs during NPC differentiation, analysis of eIF3-associated mRNA by Quick-irCLIP reveals the unexpected binding of this initiation factor at the 3'UTR of most mRNA. Further analysis of alternative polyadenylation by APAseq highlights the close proximity of the eIF3-crosslinking position and the poly(A) tail. Furthermore, this interaction is not detected in Poly(A)-less transcripts. Using Riboseq, the authors then attempted to correlate eIF3 binding with the translation efficacy of mRNA, which would suggest a common mechanism of translational control in these cells. These observations indicate that eIF3-binding at the 3'UTR of mRNA, near the poly(A) tail, may participate to the closed-loop model of mRNA translation, bridging 5' and 3', and allowing ribosomes recycling. However, authors failed to detect interactions of eIF3, with either PABP or Paip1 or 40S subunit proteins, which is quite unexpected.

      Strength:

      The well-written manuscript presents an attractive concept regarding the mechanism of eIF3 function at the 3'UTR. Most mRNA in NPC seems to have eIF3 binding at the 3'UTR and only a few at the 5'end where it's commonly thought to bind. In a previous study from the Cate lab, eIF3 was reported to bind to a small region of the 3'UTR of the TCRA and TCRB mRNA, which was responsible for their specific translational stimulation, during T cell activation. Surprisingly in this study, the eIF3 association with mRNA occurs near polyadenylation signals in NPC, independently of cell differentiation status. This compelling evidence suggests a general mechanism of translation control by eIF3 in NPC. This observation brings back the old concept of mRNA circularization with new arguments, independent of PABP and eIF4G interaction. Finally, the discussion adequately describes the potential technical limitations of the present study compared to previous ones by the same group, due to the use of Quick-irCLIP as opposed to the PAR-CLIP/thiouridine.  

      Weaknesses:

      (1) These data were obtained from an unusual cell type, limiting the generalizability of the model.

      We agree that unraveling the mechanism employed by eIF3 at the mRNA 3’-UTR termini might be better studied in a stable cell line rather than in primary cells.

      (2) This study lacks a clear explanation for the increased translation associated with NPC differentiation, as eIF3 binding is observed in both differentiated and undifferentiated NPC. For example, I find a kind of inconsistency between changes in Riboseq density (Figure 3B) and changes in protein synthesis (Figure 1D). Thus, the title overstates a modest correlation between eIF3 binding and important changes in protein synthesis.

      We thank the reviewer for this question. Riboseq data and RNASeq data are not on absolute scales when comparing across cell conditions. They are normalized internally, so increases in for example RPF in Figure 3B are relative to the bulk RPF in a given condition. By contrast, the changes in protein synthesis measured in Figure 1D is closer to an absolute measure of protein synthesis. 
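      The difference between internally normalized sequencing counts and an absolute readout can be illustrated with a toy calculation. The sketch below assumes a simple counts-per-million style normalization, which may differ from the authors' actual pipeline, and uses made-up counts with placeholder gene names.

```python
import math


def per_million(counts):
    """Scale raw counts so that each condition sums to one million."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return {gene: 1e6 * n / total for gene, n in counts.items()}


# Made-up RPF counts; gene names are placeholders, not data from the study.
undifferentiated = {"gene_a": 100.0, "gene_b": 300.0, "gene_c": 600.0}
# Suppose every gene doubles its ribosome footprints after induction.
differentiated = {gene: 2.0 * n for gene, n in undifferentiated.items()}

norm_before = per_million(undifferentiated)
norm_after = per_million(differentiated)

# Internally normalized values are identical, so the global doubling is invisible.
unchanged = all(math.isclose(norm_before[g], norm_after[g]) for g in norm_before)
# The ratio of raw totals still reflects it, as an absolute readout would.
global_fold = sum(differentiated.values()) / sum(undifferentiated.values())
print(unchanged, global_fold)
```

      Under this assumption, a uniform increase in translation leaves normalized RPF values untouched, which is why a relative measure and an absolute measure need not move together.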

      (3) This is illustrated by the candidate selection that supports this demonstration. Looking at Figure 3B, ID2, and SNAT2 mRNA are not part of the High TE transcripts (in red). In contrast, the increase in mRNA abundance could explain a proportionally increased association with eIF3 as well as with ribosomes. The example of increased protein abundance of these best candidates is overall weak and uncertain.

      We agree that using TE as the criterion for defining increased eIF3 association would not be correct. By “highly translated” we only mean to convey the extent of protein synthesis, i.e. increases in ribosome protected fragments (RPF), rather than the translational efficiency.

      (4) Despite several attempts (chemical and UV cross-linking) to identify eIF3 partners in NPC such as PABP, PAIP1, or proteins from the 40S, the authors could not provide any evidence for such a mechanism consistent with the closed-loop model. Overall, this rather descriptive study lacks mechanistic insight (eIF3 binding partners).

      We agree that it will be important to identify the molecular mechanism used by eIF3 to engage the termini of mRNA 3’-UTRs. Nevertheless, the identification of eIF3 crosslinking to that location in mRNAs is new, and we think will stimulate new experiments in the field.

      (5) Finally, the authors suspect a potential impact of technical improvement provided by Quick-irCLIP, that could have been addressed rather than discussed.

      We agree a side-by-side comparison of eIF3 crosslinks captured by PAR-CLIP versus Quick-irCLIP will be an important experiment to do. However, NPCs or other primary cells may not be the best system for the comparison. We think using an established cell line might be more informative, to control for effects such as 4-thiouridine toxicity.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The Western blot signals for SLC38A2 and ID2 are close to the membrane background and not very convincing. Size markers are missing.

      We agree these antibodies are not great. They are the best we could find, unfortunately. We have included originals of all western blots and gels as supplementary information. It’s important to note that the Riboseq data for ID2 and SLC38A2 are consistent with the western blots. See Figure 3C and Figure 3–figure supplement 3B.

      (2) Figure 1 - Figure Supplement 1 appears to present data from a single experiment. This is far less than ideal considering the minor differences measured.

      Thanks for the comment. This is a representative experiment showing the early time course. We have added a second experiment with two different treatments that show the same pattern in the puromycin assay, in Figure 1–figure supplement 1.

      (3) Figure 3F: One wonders what this would look like if TE was plotted instead of RPF. Figure 3 - Figure Supplement 4 seems to show something along those lines. However, the data are not mentioned in the main results section and are quite unclear. Why are data separated into TE high and low? Doesn't TE high in differentiated cells equal TE low in undifferentiated cells?

      This is an interesting question. Note that in Figure 3B, n=6300 genes show no change in TE upon differentiation, compared to a total of n=2127 that show a change in TE, with most of those changes not very large. We have now replotted Figure 3F comparing irCLIP read counts in 3’-UTRs to RPF read counts, which shows a significant positive correlation, regardless of whether we look at undifferentiated or differentiated NPCs (See Figure 3F and a new Figure 3– figure supplement 4A). We also compare irCLIP reads in 3’-UTRs to TE values, which show no correlation (See Figure 3G and Figure 3–figure supplement 4B).

      Figure 3-figure supplement 4 was actually a response to a previous round of review (at PLOS Biology) to a rather technical question from a reviewer. We think this figure and associated text should be removed. Instead, we now include supplementary tables with the processed RPF and TE values, for reference (Supplemental files 4-6). We omitted these in the original submission when they should have been included. We also abandoned comparing undifferentiated and differentiated NPCs, and instead look directly at irCLIP reads vs. RPFs or TE, regardless of NPC state, as noted above (Figure 3F, G, and Figure 3–figure supplement 4).

      (4) Figure 3C: The data should be plotted on the same y-axis scale. This would make a visual assessment of the differences in mRNA and RPF levels more intuitive.

      Thanks for this suggestion. We have rescaled the plots as requested.

      Reviewer #2 (Recommendations for the authors):

      (1) The quality of the Western blots in several figures is quite poor. Notably, Figure 1C seems to be a composite gel, as each blot appears to come from a different gel. Additionally, in Supplementary Figure 1A, there is only a single data point, yet the authors indicate that this image is representative of multiple assays. The lack of error bars in this figure raises a question vis-a-vis the reproducibility of the experiments.

      Thanks for the comments. We now include all the original gels as supplementary information. As noted above, the antibodies for ID2 and SLC38A2 are not great, we agree. And as we noted above, the Riboseq data for ID2 and SLC38A2 are consistent with the western blots.

      (2) For the top 500 targets of undifferentiated and differentiated NPCs in the Quick-irCLIP assay, the manuscript does not clarify how many targets are common and how many are unique to each condition. This information is important for understanding the extent of overlap and differentiation-specific interactions of eIF3 with mRNAs. Providing this data would strengthen the interpretation of the results.

      There are 449 of the top 500 hits in common between undifferentiated and differentiated NPCs. We have now added this information to the text, to add clarity. 

      (3) The manuscript does not provide detailed percentages or numbers regarding the overlap between iCLIP and APA-Seq peaks. Clarifying this overlap, particularly in terms of how many of the APA sites are also targets of eIF3, would bolster the understanding of how these two datasets converge to support the authors' conclusions.

      This is a difficult calculation to make, because APA-Seq reads are generally much longer than the Quick-irCLIP reads. This is why we focused instead on quantifying the percentage of Quick-irCLIP peaks (which are narrower) that overlap with predicted polyadenylation sequences, in Figure 2-figure supplement 1.
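      As an illustration of this kind of calculation, the following minimal Python sketch counts the fraction of narrow peaks that overlap at least one annotated site. It is not the pipeline used for Figure 2-figure supplement 1; the function name `fraction_overlapping`, the `window` parameter, and the toy coordinates are all hypothetical.

```python
from bisect import bisect_left


def fraction_overlapping(peaks, sites, window=0):
    """Fraction of peaks that contain, or lie within `window` nt of, at least one site.

    peaks: list of (chrom, start, end) half-open intervals
    sites: list of (chrom, position)
    """
    # Group site positions by chromosome and sort them for binary search.
    by_chrom = {}
    for chrom, pos in sites:
        by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()

    hits = 0
    for chrom, start, end in peaks:
        positions = by_chrom.get(chrom, [])
        # Index of the first site at or after the padded start of the peak.
        i = bisect_left(positions, start - window)
        # The peak is a hit if that site falls before the padded end.
        if i < len(positions) and positions[i] < end + window:
            hits += 1
    return hits / len(peaks) if peaks else 0.0


# Toy data (placeholders, not real peaks or polyadenylation sites).
peaks = [("chr1", 100, 150), ("chr1", 300, 340), ("chr2", 50, 90)]
sites = [("chr1", 120), ("chr1", 500), ("chr2", 95)]

strict = fraction_overlapping(peaks, sites)
padded = fraction_overlapping(peaks, sites, window=10)
```

      With the toy input, one of the three peaks contains a site; widening `window` to 10 nt brings a second peak within range.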

      Reviewer #3 (Recommendations for the authors):

      (1) Perform Quick-irCLIP in HEK293 cells to infer technical limitations and/or to generalize the model. The authors could then again compare eIF3 binding sites in Jurkat, HEK293, and NPC.

      This is an experiment we plan to do for a future publication, given that we would want to repeat both Quick-irCLIP and PAR-CLIP at the same time.

      (2) Select mRNA candidates with high or low TE changes and analyze eIF3 binding and RPF density and protein abundance along NPC differentiation to support the role of eIF3 binding in stimulating translation.

      We agree looking at time courses in more depth would be interesting. However, this would require substantial experimentation, which is better suited to a future study. Furthermore, now that we have moved away from comparing undifferentiated NPCs and differentiated NPCs when examining TE and RPF values (Figure 3 and Figure 3–figure supplement 4), we think the results now support a more general mechanism of translation reflected in the irCLIP 3’-UTR vs. RPF correlation, independent of NPC state.

      (3) Analyze the interaction of eIF3 with eIF4G and other known partners. This will really provide an improvement to the manuscript. The lack of interaction between eIF3 and the 40S is quite surprising.

      We agree more work needs to be done on the mechanistic side. These are experiments we think would be best to carry out in a stable cell line in the future, rather than primary cells.

      (4) Perform Oligo-dT pulldown (or cap column if possible) and analyze the relative association of PABP, eIF3, and eIF4F on mRNA in NPC versus HEK293. This will clarify whether this mechanism of mRNA translation is specific to NPC or not.

      Thanks for this suggestion. We are uncertain how it would be possible to deconvolute all the possible ways to interpret results from such an experiment. We agree thinking about ways to study the mechanism will keep us occupied for a while.

      (5) Citations in the text indicate the first author, whereas the references are numbered! 

      Our apologies for this oversight. This was a carryover from previous formatting, and has been fixed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      (1) In my opinion, the major weakness is the selection of IVs, the same IVs should be used for each exposure, especially when the outcomes (IA, SAH, and uIA) are closely related. The removal of IVs was inconsistent, for example, why was LPA rs10455872 removed for SAH but not for uIA? (significantly more IVs were used for uIA). The authors should provide more details for the justification of the removal of IVs other than only indicating "confounder" in supplementary tables. The authors should also perform additional analyses including all IVs and IVs from other PUFA GWAS.

      We apologize for our negligence. We reconducted the two-sample MR analysis after removing rs10455872 from the uIA instruments, which yielded unaltered ORs and 95% confidence intervals. The P-value was again not statistically significant. These results demonstrate the robustness of our MR analyses and indicate that this SNP does not influence the overall results. (See Figure 4)

      For SNP selection, we adhered rigorously to the established Mendelian randomization process for screening instrumental variables. "Confounder" means that the SNP is explicitly associated with a known factor that influences the outcome variable. Following the removal of such confounding SNPs, the analysis of heterogeneity and pleiotropy was repeated using radial MR, MR-PRESSO, IVW-radial and Egger-radial, with each iteration removing the corresponding outlying SNPs until all pleiotropy and heterogeneity had been eliminated. Therefore, the final set of SNPs is not identical for each group.
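      For reference, the quantities that such heterogeneity checks are typically built on can be written as follows. This is the standard first-order inverse-variance weighted (IVW) formulation, given here only as a sketch; the symbols are illustrative notation, and the response does not state which weighting scheme was used.

```latex
\[
\hat\beta_j = \frac{\hat\Gamma_j}{\hat\gamma_j}, \qquad
w_j = \frac{\hat\gamma_j^{2}}{\sigma_{Y_j}^{2}}, \qquad
\hat\beta_{\mathrm{IVW}} = \frac{\sum_j w_j \hat\beta_j}{\sum_j w_j}, \qquad
Q = \sum_j w_j \left(\hat\beta_j - \hat\beta_{\mathrm{IVW}}\right)^{2}
\]
```

      Here $\hat\gamma_j$ and $\hat\Gamma_j$ are the SNP-exposure and SNP-outcome associations for SNP $j$, and $\sigma_{Y_j}$ is the standard error of $\hat\Gamma_j$. A SNP with a large contribution to $Q$ is a candidate outlier, and removing such SNPs and recomputing corresponds to one pass of the iteration described above.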

      (2) In addition, it seems that the SNPs in the FADS locus were driving the MR association, while FADS is a very pleiotropic locus associated with many lipid traits, removing FADS could attenuate the MR effect. The authors should perform a sensitivity analysis to remove this locus.

      Thanks for the reviewer’s suggestion. In our revised manuscript, we reconducted the MR analysis of the positive results after removing FADS2 and the SNPs within 500 kb of the FADS2 locus. This analysis demonstrated that there was no significant causal association between PUFA and IA or aSAH. This result confirmed that the SNP rs174564 was a significant factor driving the causal association between PUFAs and CA. (See page 6, line 155-157 and Figure 8)
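      A locus-exclusion step of this kind can be sketched in a few lines of Python. The sketch below only illustrates the filtering logic; the helper `exclude_near_locus`, the SNP identifiers, and all coordinates are hypothetical placeholders rather than the instruments or FADS2 boundaries used in the manuscript.

```python
def exclude_near_locus(snps, chrom, locus_start, locus_end, flank=500_000):
    """Return the SNPs that lie outside the locus +/- `flank` bp on `chrom`."""
    lo = locus_start - flank
    hi = locus_end + flank
    return [
        snp for snp in snps
        if not (snp["chrom"] == chrom and lo <= snp["pos"] <= hi)
    ]


# Hypothetical instruments: IDs and positions are placeholders, not real SNPs.
instruments = [
    {"id": "snp_a", "chrom": "11", "pos": 1_200_000},  # inside the locus, excluded
    {"id": "snp_b", "chrom": "11", "pos": 1_650_000},  # 400 kb past the locus end, excluded
    {"id": "snp_c", "chrom": "11", "pos": 2_000_000},  # 750 kb past the locus end, kept
    {"id": "snp_d", "chrom": "6", "pos": 1_200_000},   # different chromosome, kept
]

# Placeholder locus boundaries (not the real FADS2 coordinates).
kept = exclude_near_locus(instruments, chrom="11", locus_start=1_150_000, locus_end=1_250_000)
kept_ids = [snp["id"] for snp in kept]
```

      The MR analysis would then be rerun on the remaining instruments only.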

      (3) Instead of removing multiple "confounder" IVs which I think may bias the MR results due to very closely related lipid traits, the authors should perform multivariable MR to identify independent effects of PUFAs to IA, conditioning on other PUFAs and/or other lipids.

      Thanks for the reviewer’s suggestion. In our revised manuscript, we employed MVMR, adjusting for HDL cholesterol, LDL cholesterol, total cholesterol and triglycerides, to remove bias from closely related lipid traits. The MVMR analysis reinforces the robustness of our conclusions. (See page 6, line 151-153 and Figure 5-7)

      (4) Colocalization was not well described, the authors should include the colocalization results for each locus in a supplementary table. They also mentioned "a large PP for H4 (PP.H4 above 0.75) strongly supports shared causal variants affecting both gene expression and phenotype". The authors should make sure that the colocalization was performed using the expression data of each gene or using the GWAS summary of each PUFA locus.

      We apologize for our negligence. We have added the detailed COLOC results for each locus in the supplementary table. (See supplementary table 6)

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) I suggest the authors consult Borges et al., 2022 (doi: 10.1186/s12916-022-02399-w) for PUFA IV selection, and perform sensitivity analysis based on Borges et al., 2022 IVs and another PUFA GWAS (such as J Kettunen et al., 2016, doi: 10.1038/ncomms11122).

      Thanks for the reviewer’s suggestion. In order to provide further evidence of the robustness of the results of our analyses, we conducted MVMR and a sensitivity analysis after excluding SNPs within 500 kb of the FADS2 locus, as recommended by Borges et al. (2022). (See page 6, line151-157 and Figure 5-8)

      In regard to the article by J. Kettunen et al. (2016), we found that the dataset from that article was insufficient in sample size and lacked the statistical power required for use in validation.

      (2) The authors justified that colocalization is to determine if "PUFAs are mediators in the hereditary causative route of cerebral aneurysm", which I don't think is the case.

      Colocalization is to determine whether an MR estimate is not confounded by LD.

      We apologize for our incorrect description. We have carefully modified the text in our revised manuscript, as follows: “There is consistent evidence that PUFAs have a beneficial causal effect on cerebral aneurysm. In order to determine an MR estimate is not confounded by LD, we used COLOC to identify shared causal SNP between PUFAs and cerebral aneurysms”. (See page 7-8, line 215-217)

      (3) Supplementary tables 2-4 were a bit confusing to me, I suggest the authors provide one supplementary table for each exposure.

      Thanks for the reviewer’s suggestion. Supplementary tables 2_1-2_5 show the exposure data for the five PUFAs associated with IA, supplementary tables 3_1-3_5 show the exposure data for the five PUFAs associated with aSAH, and supplementary tables 4_1-4_5 show the exposure data for the five PUFAs associated with UIA. Each exposure is represented by a distinct table.

      (4) Figure 1 legend: I can't find multivariable MR in the figure/method.

      We apologize for our negligence. In our revised manuscript, we have added the MVMR methodology. We have also modified Figure 1 and its legend. (See Figure 1, Figure 1 legend and page 6, line 151-153)

      (5) LOO analysis was mentioned in methods and results but I could not find the results for LOO.

      We apologize for our negligence. In our revised manuscript, we have described the results of the LOO, as follows: “The leave-one-out plot demonstrates that there is a potentially influential SNP (rs174564) driving the causal link between PUFA and cerebral aneurysm.” (See page 7, line 209-210)
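      To make the idea of a leave-one-out (LOO) analysis concrete, here is a minimal Python sketch that recomputes a first-order IVW estimate with each SNP dropped in turn. It is an illustration under stated assumptions, not the analysis behind the manuscript's plot: the function names, SNP labels, and summary statistics are all hypothetical.

```python
def ivw_estimate(instruments):
    """First-order IVW estimate from (beta_exposure, beta_outcome, se_outcome) tuples."""
    num = sum(bx * by / se ** 2 for bx, by, se in instruments)
    den = sum(bx ** 2 / se ** 2 for bx, by, se in instruments)
    return num / den


def leave_one_out(instruments_by_id):
    """Map each SNP id to the IVW estimate obtained when that SNP is dropped."""
    results = {}
    for left_out in instruments_by_id:
        rest = [v for k, v in instruments_by_id.items() if k != left_out]
        results[left_out] = ivw_estimate(rest)
    return results


# Toy summary statistics (placeholders): snp_c has a much larger ratio than the others.
toy = {
    "snp_a": (0.1, 0.05, 0.01),
    "snp_b": (0.2, 0.10, 0.01),
    "snp_c": (0.1, 0.20, 0.01),
}

full = ivw_estimate(list(toy.values()))
loo = leave_one_out(toy)
```

      In the toy data, dropping `snp_c` moves the estimate from about 0.75 to about 0.5, which is the kind of shift a leave-one-out plot makes visible for an influential SNP.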

      (6) Finally, the authors should proofread their manuscript as many sentences are difficult to read, such as:

      Line 183: "...MR methods revealed consistency", "However, there was no any causal relationship..."

      Line 200: "For achieve that..."

      We apologize for our incorrect description. We have modified these descriptions in our revised manuscript, as follows: “The results demonstrated consistency in the outcomes and directionality of the various MR methods employed” and “In order to determine an MR estimate is not confounded by LD, we used COLOC to identify shared causal SNP between PUFAs and cerebral aneurysms”. (See page 7, line 187-188 and line 215-217).

      Reviewer #2 (Recommendations For The Authors):

      (1) Are there any previous epidemiological studies on the association between PUFA and cerebral aneurysm? It will be helpful to introduce this background.

      Thanks for the reviewer’s suggestion. The epidemiology of PUFA in relation to aneurysms at other sites, such as the abdominal aorta, is described in the Introduction section. Although there is a paucity of large-scale multicenter clinical epidemiological studies examining the relationship between PUFAs and cerebral aneurysms, we are endeavoring to infer a prior association between PUFAs and cerebral aneurysms with the aid of Mendelian randomization analysis.

      (2) The authors performed a leave-one-out analysis but did not explain much about the results. The leave-one-out analysis seems to provide some evidence that some SNP is driving the results, like rs174564 in Supplementary Figure 5-1.

      We apologize for our negligence. In our revised manuscript, we have described the results of the leave-one-out analysis, as follows: “The leave-one-out plot demonstrates that there is a potentially influential SNP (rs174564) driving the causal link between PUFA and cerebral aneurysm”. (See page 7, line 209-214)

      (3) In the discussion (line 211), the authors mentioned omega-6 fatty acids increased the risk of IA and aSAH, omega-3 fatty acids decreased the risk for IA and aSAH, but omega-6 by omega-3 decreased the risk of IA and aSAH. This seems to be different from the figures.

      We apologize for our incorrect description. We have modified this description in our revised manuscript, as follows: “We demonstrated that the omega-3 fatty acids, DHA and, omega-3-pct causally decreased the risk for IA and aSAH. And omega-6 by omega-3 causally increased the risk of IA and aSAH”. (See page 8, line 228-230)

      Minor:

      (4) Some grammar errors need to be checked, such as:

      In line 200, "For achieve that, we tested for shared causative SNPs between PUFAs and cerebral aneurysm using COLOC".

      In line 123, "Fourth, to eliminate unclear, palindromic and associated with known confounding factors (body mass index (McDowell et 125 al., 2018), blood pressure (Sun et al., 2022), type 2 diabetes (Tian et al., 2022), high-density lipoprotein (Huang et al., 2018)) SNPs."

      We apologize for our incorrect description. We have modified these descriptions in our revised manuscript, as follows: “Fourth, remove SNPs that are obscure, palindromic, and linked to recognized confounding variables (body mass index (McDowell et al., 2018), blood pressure (Sun et al., 2022), type 2 diabetes (Tian et al., 2022), high-density lipoprotein (Huang et al., 2018))” and “In order to determine an MR estimate is not confounded by LD, we used COLOC to identify shared causal SNP between PUFAs and cerebral aneurysms”. (See page 5, line 124-127 and page 7, line 215-217)

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility, and clarity)

      This manuscript by Tsai et al. shows that phage resistance mutations (LPS truncation) confer a cost during interbacterial competition. The authors show that various phage resistant mutants of S. enterica are inhibited by E. cloacae in a contact-dependent manner (on a solid surface but not in liquid). Further experiments showed that this inhibition of S. enterica was mediated by T6SS in E. cloacae. The authors then dissect which parts of the LPS are required for resistance against T6SS attacks and show that a similar resistance is conferred against T6SS of B. thailandensis and C. rodentium. Moreover, the authors show that enzymatic degradation of LPS by a phage enzyme can also increase sensitivity to T6SS (including when such enzymes are on phage particles). Finally, the authors suggest that the change in the thickness of the LPS surface layer could be the reason for changes in T6SS susceptibility. Overall, the manuscript is very well-written. The experiments and controls are explained in sufficient detail and in a logical order. The figures are clear and easy to navigate. The findings are very interesting and important for the T6SS field but also for general understanding how different evolutionary pressures combine and influence each other. I believe that this manuscript will initiate further research in this direction.

      • We thank the reviewer for their positive remarks on our manuscript and the valuable suggestions for its improvement.

      Major comments

      The only major point that I would like to raise is that I am not generally convinced that the 2 nm difference in the thickness of LPS is the main reason for the observed differences in T6SS-mediated killing of S. enterica. Based on what we know about T6SS mode of action, we expect that it is potentially pushing effectors by up to several hundreds of nanometers. Therefore, the change in the LPS thickness by a few nanometers (as measured by AFM) seems insufficient to provide enough spacing between the attacker and the prey to significantly decrease T6SS effector delivery. While it is clear that understanding the exact reason for the LPS mediated resistance is beyond the scope of this manuscript, I would suggest that the authors consider the fact that T6SS is known to deliver proteins even to the cytoplasm of target gram-negative cells and discuss the mode of action of the machine in the context of their finding. If the T6SS was drawn to scale in the model figure, it would become apparent that 2 nm change in the distance between two cells has probably no major impact on killing by T6SS and the actual reason for the observed phenotype is likely more complicated than what is proposed.

      We appreciate the reviewer's comments and acknowledge that our manuscript leaves open questions regarding the exact mechanisms underlying LPS-mediated resistance. We have now moderated the Discussion in our revised manuscript to reflect the complexity of this phenomenon (Lines 410-423). Although we agree that the nanometer difference in LPS thickness may not fully explain the observed protective phenotype, we believe it remains a plausible contributing factor that is worth considering.

      To fully understand how LPS influences T6SS effector delivery, future studies will need to address key mechanistic questions regarding the T6SS injection process. For example, 1) how deeply does the T6SS apparatus penetrate the target Gram-negative cells during injection; 2) what is the magnitude of the injection force generated by the T6SS; and 3) does the structural integrity of the T6SS apparatus remain intact throughout and after contraction? While it is well documented that some T6SS effectors act in the cytosol of target cells, there is evidence to suggest that cytosolic effectors are initially delivered into the periplasm and subsequently translocated into the cytosol for intoxication [1,2]. Furthermore, although contraction of the T6SS apparatus occurs within milliseconds [3,4], this rapid action does not preclude the possibility that the injection force could be influenced by the thickness of the LPS layer. In addition, the stability of T6SS structural or delivered proteins (such as PAAR, VgrG, and Hcp) within the delivery complex might be compromised upon encountering physical barriers such as the LPS layer and the outer membrane of target cells. These potential interactions could affect the efficiency of effector delivery, leading to reduced competitiveness during interbacterial antagonism, as shown in our study.

      • We appreciate the reviewer's suggestions and acknowledge that the precise reasons for LPS-mediated resistance likely involve a combination of factors beyond those proposed here. We are actively pursuing these questions as part of an ongoing, long-term effort to better elucidate the mechanisms of T6SS action.

      Minor comments

      Specify which T6SS of B. thailandensis was tested.

      • We now cite studies by Schwarz, S., et al., 2010 [5] and LeRoux, M., et al., 2015 [6], from which we used the tssM (BTH_I2954) gene deletion strain abrogating T6SS-1 of B. thailandensis E264 (Line 234, Supplementary Table 1).

      Use a different naming of the two strains used in competition assays than "donor" and "recipient".

      • Thank you for this suggestion. In the revised manuscript, we have replaced the terms "donor" and "recipient" with "attacker" and "prey" for clarity. This change has been applied to the text (Lines 441, and 649-667) and to revised Figures 2c-h, Figures 3b, d, g, i, j, Figures 4f, g, Figures 5b, e, g, h, Supplementary Figures 3d-f, and Supplementary Figures 4b-d.

      Indicate in the material and methods ODs of bacterial mixtures used in the "Bacterial competition assays".

      • We apologize for this oversight. The ODs of bacterial mixtures used in the "Bacterial competition assays" have now been specified in the revised Methods section (Line 651).

      Reviewer #1 (Significance)

      This manuscript is interesting for researchers who study T6SS, phage predation and other evolutionary pressures shaping bacterial interactions. The work provides new and interesting insights. My expertise in LPS biology is limited.

      • We sincerely appreciate the reviewer's interest in and support of our study.

      Reviewer #2 (Evidence, reproducibility, and clarity)

      This work investigates the fitness trade-offs in Salmonella enterica resistant to phages. The authors performed co-culture experiments with S. enterica, E. coli, and E. cloacae and found that phage-resistant S. enterica strains displayed reduced fitness in the presence of E. cloacae. Further experiments demonstrated that phage-resistant S. enterica strains were more susceptible to the type VI secretion system (T6SS) of E. cloacae. The authors then examined the role of the O-antigen of lipopolysaccharide (LPS) in T6SS-mediated interbacterial antagonism. By constructing S. enterica mutants with varying O-antigen chain lengths, the authors demonstrated that the O-antigen protects S. enterica from T6SS attack. They then demonstrated that the O-antigen-deficient S. enterica, E. coli, and C. rodentium strains were more susceptible to T6SS attack by E. cloacae. Finally, the authors showed that phage tail spike proteins (TSPs) with endoglycosidase activity could cleave the bacterial O-antigen, thereby increasing susceptibility to T6SS attack.

      The study is well-designed and the experiments are well-executed. The findings are significant and have implications for the understanding of microbial community dynamics.

      • We thank the reviewer for their positive comments regarding our original submission.

      Major comments

      While the study elegantly demonstrates the link between phage resistance, LPS structure, and T6SS susceptibility, we must remember that these LPS-defective strains are likely at a significant disadvantage in real-world environments without the influence of competing bacteria. Whether it's the gut or external environments, Salmonella needs its LPS for protection against a myriad of host and environmental factors. It seems a bit redundant for T6SS mediated antagonism to select for LPS structures when those structures are essential for bacterial survival outside of this very specific context. It would benefit some discussion about the likelihood of these phage-resistant, LPS-defective strains actually persisting and competing effectively in a more natural setting.

      • We thank the reviewer for their insightful comments and appreciate the opportunity to clarify this point. We agree that LPS-defective bacterial strains face significant disadvantages in natural environments, where they must contend with various host and environmental stresses. Consequently, we did not intend to suggest that T6SS-mediated antagonism is the primary driving force in selecting specific LPS structures. Rather, our study highlights an additional role for LPS during interbacterial interactions, complementing its well-established functions. This notion aligns with the hypotheses proposed in prior studies [7-9].

      The reviewer's comments raise an intriguing question about the essentiality of LPS in Gram-negative bacteria under natural conditions. During our revision process, we identified several examples in the literature demonstrating that LPS may not always be indispensable. For instance, LPS-depleted Neisseria meningitidis strains with an early block in lipid A biosynthesis have been shown to remain viable [10,11]. These strains may possess adaptive advantages under specific circumstances [12]. Similarly, some pathogenic bacteria produce truncated LPS structures lacking O-antigen or introduce modified LPS to evade host immune responses [13]. Additionally, evolutionary pressures, such as phage predation, often drive mutations in O-antigen biosynthesis pathways, resulting in alterations to or an absence of O-antigen [14]. Furthermore, recent studies have also indicated that trade-offs between abiotic and biotic stresses can influence LPS integrity. For instance, LPS-deficient strains may exhibit selective advantages in extreme environments [15,16]. These findings underscore the context-dependent nature of LPS functionality and its potential dispensability in certain ecological niches.

      We sincerely appreciate the reviewer's thought-provoking comments. Our current study aims to provide evidence for the role of interbacterial antagonism as an additional factor influencing LPS integrity. However, we did not mean to overstate the contribution of this mechanism. Instead, we only seek to contribute to a broader understanding of the multifaceted functions of LPS in bacterial survival and adaptation. We have modified the Discussion in our revised manuscript to clarify this idea (Lines 453-466).

      Minor comments

      Figure 5 could be more effective if panels b and c are together

      • We appreciate this suggestion. We have revised the manuscript accordingly, so panels b and c have been combined in revised Figure 5, and the respective figure legends have been modified for improved clarity (Lines 810-823).

      69 Authors should define mucoid

      • The term "mucoid" has now been defined in the revised manuscript (Lines 69-70).

      155 Authors should explain that this result is expected since T6SS acts on solid surface while CDI works in liquid cultures

      • Thank you for this comment. Prior studies have demonstrated that while CDI-mediated antibacterial activity is less efficient in liquid environments, it can still occur on both solid surfaces and in liquid cultures, provided the competitors possess the necessary CdiA binding unit, such as BamA [17,18]. This understanding supports our initial hypothesis that T6SS and/or CDI contribute to the observed protective phenotype in S. enterica phage-resistant variants (Figure 2).

      Clarify what is meant by unicellular cultures. Should it be monocultures?

      • We apologize for this error and have now replaced "unicellular cultures" with "monocultures" in the revised manuscript (Lines 137, 180, and 258).

      618 add to the text how much dead phage was added per bacterial cell

      • Apologies for this oversight. The multiplicity of infection (MOI) describing the amount of inactivated phages used to treat bacterial cells has now been included in the revised Methods section (Line 661).

      364 references needed for "consistent with predictions for intact LPS structures "

      • We thank the reviewer for pointing out this omission. The relevant reference has now been added to the revised manuscript [19] (Line 368).

      Reviewer #2 (Significance)

      This study offers a new perspective on the interplay between phage resistance and bacterial fitness in the context of microbial communities. While the concept of fitness trade-offs associated with antibiotic resistance is well-established, the authors extend this paradigm to phage resistance. They demonstrate that phage-resistant Salmonella enterica strains exhibit reduced fitness in the presence of Enterobacter cloacae due to increased susceptibility to the type VI secretion system (T6SS). This finding is significant as it highlights the potential for interbacterial antagonism to shape the evolution of phage resistance. The authors further show that the O-antigen of lipopolysaccharide (LPS) plays a crucial role in protecting S. enterica from T6SS attack. This observation provides mechanistic insights into the fitness trade-offs associated with phage resistance.

      The study's strength lies in its elegant experimental design and the comprehensive analysis of the interplay between phage resistance, T6SS susceptibility, and O-antigen structure. The authors employ a combination of co-culture experiments, genetic manipulations, and structural analyses to dissect the underlying mechanisms. The findings are robust and have implications for understanding the evolution of bacterial communities in the presence of phages and competing bacterial species.

      This research will be of interest to a broad audience, including researchers in microbiology, synthetic biology, and microbial ecology. The findings have implications for understanding the evolution of phage resistance, and the dynamics of microbial communities. The study's insights into the role of the O-antigen in T6SS susceptibility could also inform the design of novel antimicrobial strategies.

      My expertise is microbial physiology

      • We thank the reviewer for their positive remarks and careful reading of our manuscript.

      Reviewer #3 (Evidence, reproducibility, and clarity)

      Tsai et al. describe LPS biosynthesis mutants arising in selection for phage resistance that increase susceptibility to T6SS-mediated interbacterial antagonism. Phage-derived LPS degrading enzymes also contribute to T6SS susceptibility, which may be due to weakening of the physical barrier of LPS. The mechanisms of this fitness trade-off are elucidated with well-executed and presented experiments.

      • We are grateful to the reviewer for their kind words and critical reading of the manuscript.

      Major comments

      No major critiques.

      Minor comments

      Others have described two T6SS in Enterobacter cloacae ATCC 13047 (PMID 33072020). Please clarify which of the two are inactivated by the tssM deletion in this study and either provide compelling evidence that both are inactive or change the text throughout to indicate T6SS-1 or T6SS-2 being inactivated.

      • We thank the reviewer for this comment. In our study, we refer to the work by Whitney, J., et al., 2014 [20], from which we used the tssM (ECL_01536) gene deletion strain in which T6SS-1 of E. cloacae ATCC 13047 is abrogated. Consistent with this detail, we have now clarified in the revised manuscript (Line 155, Supplementary Table 1) that T6SS-1 is inactivated. Moreover, the reference suggested by the reviewer provides additional evidence supporting that T6SS-1, but not T6SS-2, is involved in bacterial competition [21], which we also now specify in the revised manuscript.

      It seems the authors used EHEC EDL933, which has T6SS, in co-culture experiments (Figure 1C). Why do the authors think the S. enterica LPS mutants don't have a competitive disadvantage against EHEC? It seems to run counter to the conclusion that LPS is broadly protective against T6SS.

      • We thank the reviewer for raising this point. While it is true that EHEC O157:H7 strain EDL933 possesses a T6SS gene cluster in its genome, a prior study has shown that the T6SS in this strain appears to be inactivated under laboratory conditions, likely due to repression by the global regulator H-NS [22]. Consistent with these findings, our data indicate that the S. enterica LPS mutants did not exhibit a competitive disadvantage against EHEC EDL933. These results support the conclusion that, under the conditions tested, the truncated LPS in S. enterica does not affect its fitness against EHEC (Figure 1c), likely due to the inactivity of the EHEC T6SS [22].

      It's not clear if the only Felix O1 and P22 phage-resistant transposon hits were in LPS-related genes, or if that pattern was observed in a more complete transposon sequencing dataset and selected for further study. A complete list of the sequence-identified hits, including the non-LPS related variants, would help clarify this and provide a useful resource to the research community.

      • We thank the reviewer for the opportunity to clarify this point. For each phage, we initially isolated nine phage-resistant transposon variants, which were subsequently used for co-culture assays and transposon insertion site identification, as described in the original manuscript (Figure 1a and Supplementary Figure 2a). We agree with the reviewer that a broader screening approach could reveal non-LPS-related variants and provide a more comprehensive resource for the research community. To address this point, during the manuscript revision period, we followed the same procedure and isolated an additional nine phage-resistant variants for each phage (Supplementary Table 1). Interestingly, from this expanded isolation dataset, the transposon insertions were again found exclusively in LPS-related genes (Author Response Figure 1). We have now included this new dataset in the revised manuscript and believe it strengthens the robustness of our findings. This expanded data has been made available below for further reference.

      The fact that 8 of the 9 Felix O1 resistant variants all have transposon insertions in waaO should be stated in the results. The initial impression of showing R1-R9 is that 9 disrupted genes are being tested - in this case it's really only two. This is a minor critique because clean deletions by allelic exchange are shown for a more extensive set of genes anyway.

      • We thank the reviewer for this comment. As suggested, we have revised the Results section (Lines 126-131) to explicitly state that Felix O1-resistant variants harbor transposon insertions in only two genes (waaO and dagR), which were initially tested in the competition assay (Figure 2).

      The S. enterica serovar Typhimurium transposon mutagenesis library could benefit from clarification on details. The results section suggests use of a pre-existing "established" transposon library, but the methods and Figure 1 seem to indicate a new library was created based on prior methods. In either case, what is the genome coverage and redundancy of the library? If this is not known or saturation is not reached, the implications of potentially missing phage resistance genes with this approach should be discussed.

      • We thank the reviewer for the opportunity to clarify this point. For our study, we created a transposon library following previously established methods [23]. The library comprises approximately 12,000 variants, as noted in Figure 1a. While doing so provided substantial genome coverage, it did not achieve full saturation. We have now revised the Results section (Lines 93-94, and 115-117) to better describe the potential limitations of this approach, including by stating the possibility that some phage-resistance genes may have been missed during the screening.

      There is some variation in phenotype among the strains with transposon insertion into the same gene, such as P22 resistant strain R7 which macroscopically agglutinates while the other waaJ insertions R5 and R1 don't. Is this due to polar effects on waaO, or could it be genetic alterations at other sites driven by stringent phage selection?

      • We thank the reviewer for this comment. We also suspect that the variation in the macroscopically agglutinative phenotypes among P22-resistant strains, such as strain R7 compared to R5 and R1, may be caused by polar effects on waaO. Additionally, the possibility of genetic alterations at other loci driven by stringent phage selection cannot be excluded. To address this potential variability and ensure consistency, we used clean deletions of each LPS biogenesis gene in all subsequent experiments. This approach eliminates the confounding effects of polar mutations or secondary genetic alterations, thereby providing more robust and interpretable data.

      Figure S1- The graphs with 12 growth curves are difficult to decipher, and the error bars would suggest maybe there are subtle growth differences among the mutants. Quantifying curve parameter(s) and applying a statistical test may clarify. The CFU counts in panel D seem to be not in log scale. Likewise in Figure S3 panel A, the authors say there are no significant growth defects, but the growth curves are modestly right-shifted for several mutants. This is a point of precision rather than a major critique, because the reversal of competitive growth phenotypes by donor T6SS inactivation indicates the potential minor growth defects aren't playing a major role in competition.

      • We thank the reviewer for these suggestions and corrections. We have now revised the manuscript accordingly, including in Supplementary Figures 1 and 3. Quantitative analysis of growth curve parameters and statistical tests have been included below to clarify the observed differences (Author Response Figure 2). The slight right-shift of the growth curves for some mutants, as noted in Supplementary Figure 3, may be attributable to cell aggregation, as shown in Supplementary Figures 2e, f. The growth rate measurements were conducted in a 96-well plate with steady shaking at 200 rpm using a plate reader, which does not fully account for the aggregated cell phenotype. Despite these subtle growth differences, we agree with the reviewer that they do not appear to play a major role in the competitive growth phenotypes, as evidenced by the reversal of phenotypes upon donor T6SS inactivation (Supplementary Figure 3).

      Figure 3f - The authors say fepE is responsible for very long O-antigen chains, but it is not clear that the delta fepE LPS PAGE differs from wild type, which would fit with the lack of competitive disadvantage against E. cloacae in Figure 3g. The increased VL-modal O-antigen upon fepE overexpression in Figure 3h and increased protection in competition (Figure 3i) are convincing. Is there another pathway(s) compensating for fepE deletion?

      • We thank the reviewer for this thoughtful comment. We have repeated the experiment independently at least three times and consistently observed a reduction in the VL-modal O-antigen in the ∆fepE strain. To provide additional clarity, we have included supplementary LPS profiles and quantifications below (Author Response Figure 3). We currently do not have evidence from the literature or our experiments to identify an alternative pathway compensating for the deletion of fepE. Nonetheless, we acknowledge this as a possibility and appreciate the reviewer's insight into this topic.

      Lines 199-200 - I believe the conclusion from wzzB deletion would be that L-modal O-antigen is necessary for protection against T6SS, and not necessarily sufficient.

      • We thank the reviewer for pointing out this important distinction. The respective sentence has now been revised in the manuscript (Line 204).

      Do the environmentally isolated phages As2 and As4 encode TSP homologs?

      • We thank the reviewer for this question. We did not identify TSP homologs in the genome of As2 and As4 phages. The genome sequences of As1 to As4 have been uploaded to NCBI's BioProject resource under accession number PRJNA1199570 (Lines 535-544, 741-743).

      Reviewer #3 (Significance)

      This manuscript provides a substantial advance in the field's understanding of how phages affect bacterial community interactions. To my knowledge, it is the first to bring together phage and T6SS defense with a strong mechanistic link. It's a conceptual advance in this regard that will stimulate more thought and experimentation on the roles of phage in bacterial communities like gut and environmental microbiomes. The manuscript's strengths include rigorous overall design, clarity of the communication, and depth of mechanistic investigation, all the way down to atomic force microscopy measurements. There are some minor revisions suggested, but these are addressable with minimal/no additional experiments.

      As someone with expertise in bacterial secretion systems and interbacterial interactions, I think this work will be of interest to microbiologists generally, and specifically in the fields of phage biology, bacterial secretion systems, and microbiome research. While the phage virology components are straightforward and well described, I think a review from someone with more expertise in this specific area would be beneficial.

      • We thank the reviewer for their careful reading of our manuscript and for the suggestions to improve it.

      References

      • Whitney, J.C., Quentin, D., Sawai, S., LeRoux, M., Harding, B.N., Ledvina, H.E., Tran, B.Q., Robinson, H., Goo, Y.A., Goodlett, D.R., et al. (2015). An interbacterial NAD(P)(+) glycohydrolase toxin requires elongation factor Tu for delivery to target cells. Cell 163, 607-619. 10.1016/j.cell.2015.09.027.

      • Ali, J., Yu, M., Sung, L.K., Cheung, Y.W., and Lai, E.M. (2023). A glycine zipper motif is required for the translocation of a T6SS toxic effector into target cells. EMBO Rep 24, e56849. 10.15252/embr.202356849.
      • LeRoux, M., De Leon, J.A., Kuwada, N.J., Russell, A.B., Pinto-Santini, D., Hood, R.D., Agnello, D.M., Robertson, S.M., Wiggins, P.A., and Mougous, J.D. (2012). Quantitative single-cell characterization of bacterial interactions reveals type VI secretion is a double-edged sword. Proc Natl Acad Sci U S A 109, 19804-19809. 10.1073/pnas.1213963109.
      • Basler, M., Pilhofer, M., Henderson, G.P., Jensen, G.J., and Mekalanos, J.J. (2012). Type VI secretion requires a dynamic contractile phage tail-like structure. Nature 483, 182-186. 10.1038/nature10846.
      • Schwarz, S., West, T.E., Boyer, F., Chiang, W.C., Carl, M.A., Hood, R.D., Rohmer, L., Tolker-Nielsen, T., Skerrett, S.J., and Mougous, J.D. (2010). Burkholderia type VI secretion systems have distinct roles in eukaryotic and bacterial cell interactions. PLoS Pathog 6, e1001068. 10.1371/journal.ppat.1001068.
      • LeRoux, M., Kirkpatrick, R.L., Montauti, E.I., Tran, B.Q., Peterson, S.B., Harding, B.N., Whitney, J.C., Russell, A.B., Traxler, B., Goo, Y.A., et al. (2015). Kin cell lysis is a danger signal that activates antibacterial pathways of Pseudomonas aeruginosa. Elife 4. 10.7554/eLife.05701.
      • Hersch, S.J., Manera, K., and Dong, T.G. (2020). Defending against the Type Six Secretion System: beyond Immunity Genes. Cell Rep 33, 108259. 10.1016/j.celrep.2020.108259.
      • Unterweger, D., Kitaoka, M., Miyata, S.T., Bachmann, V., Brooks, T.M., Moloney, J., Sosa, O., Silva, D., Duran-Gonzalez, J., Provenzano, D., and Pukatzki, S. (2012). Constitutive type VI secretion system expression gives Vibrio cholerae intra- and interspecific competitive advantages. PLoS One 7, e48320. 10.1371/journal.pone.0048320.
      • Toska, J., Ho, B.T., and Mekalanos, J.J. (2018). Exopolysaccharide protects Vibrio cholerae from exogenous attacks by the type 6 secretion system. Proc Natl Acad Sci U S A 115, 7997-8002. 10.1073/pnas.1808469115.
      • Steeghs, L., den Hartog, R., den Boer, A., Zomer, B., Roholl, P., and van der Ley, P. (1998). Meningitis bacterium is viable without endotoxin. Nature 392, 449-450. 10.1038/33046.
      • Steeghs, L., de Cock, H., Evers, E., Zomer, B., Tommassen, J., and van der Ley, P. (2001). Outer membrane composition of a lipopolysaccharide-deficient Neisseria meningitidis mutant. EMBO J 20, 6937-6945. 10.1093/emboj/20.24.6937.
      • Fransen, F., Heckenberg, S.G., Hamstra, H.J., Feller, M., Boog, C.J., van Putten, J.P., van de Beek, D., van der Ende, A., and van der Ley, P. (2009). Naturally occurring lipid A mutants in Neisseria meningitidis from patients with invasive meningococcal disease are associated with reduced coagulopathy. PLoS Pathog 5, e1000396. 10.1371/journal.ppat.1000396.
      • Maldonado, R.F., Sa-Correia, I., and Valvano, M.A. (2016). Lipopolysaccharide modification in Gram-negative bacteria during chronic infection. FEMS Microbiol Rev 40, 480-493. 10.1093/femsre/fuw007.
      • Yu, J., Zhang, H., Ju, Z., Huang, J., Lin, C., Wu, J., Wu, Y., Sun, S., Wang, H., Hao, G., and Zhang, A. (2024). Increased mutations in lipopolysaccharide biosynthetic genes cause time-dependent development of phage resistance in Salmonella. Antimicrob Agents Chemother 68, e0059423. 10.1128/aac.00594-23.
      • Burmeister, A.R., Fortier, A., Roush, C., Lessing, A.J., Bender, R.G., Barahman, R., Grant, R., Chan, B.K., and Turner, P.E. (2020). Pleiotropy complicates a trade-off between phage resistance and antibiotic resistance. Proc Natl Acad Sci U S A 117, 11207-11216. 10.1073/pnas.1919888117.
      • Carretero-Ledesma, M., Garcia-Quintanilla, M., Martin-Pena, R., Pulido, M.R., Pachon, J., and McConnell, M.J. (2018). Phenotypic changes associated with Colistin resistance due to Lipopolysaccharide loss in Acinetobacter baumannii. Virulence 9, 930-942. 10.1080/21505594.2018.1460187.
      • Aoki, S.K., Pamma, R., Hernday, A.D., Bickham, J.E., Braaten, B.A., and Low, D.A. (2005). Contact-dependent inhibition of growth in Escherichia coli. Science 309, 1245-1248. 10.1126/science.1115109.
      • Aoki, S.K., Malinverni, J.C., Jacoby, K., Thomas, B., Pamma, R., Trinh, B.N., Remers, S., Webb, J., Braaten, B.A., Silhavy, T.J., and Low, D.A. (2008). Contact-dependent growth inhibition requires the essential outer membrane protein BamA (YaeT) as the receptor and the inner membrane transport protein AcrB. Mol Microbiol 70, 323-340. 10.1111/j.1365-2958.2008.06404.x.
      • Gao, Y., Widmalm, G., and Im, W. (2023). Modeling and Simulation of Bacterial Outer Membranes with Lipopolysaccharides and Capsular Polysaccharides. J Chem Inf Model 63, 1592-1601. 10.1021/acs.jcim.3c00072.
      • Whitney, J.C., Beck, C.M., Goo, Y.A., Russell, A.B., Harding, B.N., De Leon, J.A., Cunningham, D.A., Tran, B.Q., Low, D.A., Goodlett, D.R., et al. (2014). Genetically distinct pathways guide effector export through the type VI secretion system. Mol Microbiol 92, 529-542. 10.1111/mmi.12571.
      • Soria-Bustos, J., Ares, M.A., Gomez-Aldapa, C.A., Gonzalez, Y.M.J.A., Giron, J.A., and De la Cruz, M.A. (2020). Two Type VI Secretion Systems of Enterobacter cloacae Are Required for Bacterial Competition, Cell Adherence, and Intestinal Colonization. Front Microbiol 11, 560488. 10.3389/fmicb.2020.560488.
      • Wan, B., Zhang, Q., Ni, J., Li, S., Wen, D., Li, J., Xiao, H., He, P., Ou, H.Y., Tao, J., et al. (2017). Type VI secretion system contributes to Enterohemorrhagic Escherichia coli virulence by secreting catalase against host reactive oxygen species (ROS). PLoS Pathog 13, e1006246. 10.1371/journal.ppat.1006246.
      • Mandal, R.K., Jiang, T., and Kwon, Y.M. (2021). Genetic Determinants in Salmonella enterica Serotype Typhimurium Required for Overcoming In Vitro Stressors in the Mimicking Host Environment. Microbiol Spectr 9, e0015521. 10.1128/Spectrum.00155-21.
    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This work makes several contributions: (1) a method for the self-supervised segmentation of cells in 3D microscopy images, (2) a cell-segmented dataset comprising six volumes from a mesoSPIM sample of a mouse brain, and (3) a napari plugin to apply and train the proposed method.

      First, thanks for acknowledging our contributions of a new tool, new dataset, and new software.

      (1) Method

      This work presents itself as a generalizable method contribution with a wide scope: self-supervised 3D cell segmentation in microscopy images. My main critique is that there is almost no evidence for the proposed method to have that wide of a scope. Instead, the paper is more akin to a case report that shows that a particular self-supervised method is good enough to segment cells in two datasets with specific properties.

      We agree that we focus on light-sheet microscopy data; therefore, to narrow the scope, we have changed the title to “CellSeg3D: self-supervised 3D cell segmentation for light-sheet microscopy”.

      To support the claim that their method "address[es] the inherent complexity of quantifying cells in 3D volumes", the method should be evaluated in a comprehensive study including different kinds of light and electron microscopy images, different markers, and resolutions to cover the diversity of microscopy images that both title and abstract are alluding to.

      You have selectively dropped the last part of that sentence, which is key: “.... 3D volumes, often in cleared neural tissue”, which is what we tackle. The next sentence goes on to say: “We offer a new 3D mesoSPIM dataset and show that CellSeg3D can match state-of-the-art supervised methods.” Thus, we make it clear that our claims concern mesoSPIM and cleared-tissue data.

      The main dataset used here (a mesoSPIM dataset of a whole mouse brain) features well-isolated cells that are easily distinguishable from the background. Otsu thresholding followed by a connected component analysis already segments most of those cells correctly.

      This is not the case, as all the other leading methods we fairly benchmark cannot solve the task without deep learning (i.e., no method is at an F1-Score of 1).

      The proposed method relies on an intensity-based segmentation method (a soft version of a normalized cut) and has at least five free parameters (radius, intensity, and spatial sigma for SoftNCut, as well as a morphological closing radius, and a merge threshold for touching cells in the post-processing). Given the benefit of tweaking parameters (like thresholds, morphological operation radii, and expected object sizes), it would be illuminating to know how other non-learning-based methods will compare on this dataset, especially if given the same treatment of segmentation post-processing that the proposed method receives. After inspecting the WNet3D predictions (using the napari plugin) on the used datasets I find them almost identical to the raw intensity values, casting doubt as to whether the high segmentation accuracy is really due to the self-supervised learning or instead a function of the post-processing pipeline after thresholding.

      First, thanks for testing our tool; we are glad it works for you. The deep learning methods we use cannot “solve” this dataset, and we also have an F1-Score (Dice) of ~0.8 with our self-supervised method. We don’t see the value in applying non-learning methods; this is unnecessary and beyond the scope of this work.
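
      For concreteness, the non-learning baseline discussed in this exchange (Otsu thresholding followed by connected-component analysis) can be sketched as below. This is an illustrative sketch only, not code from CellSeg3D or from the benchmarks in the paper. It assumes NumPy, a 3D volume with at least two distinct intensities, and 6-connectivity; the function names `otsu_threshold`, `label_components`, and `segment` are hypothetical.

```python
from collections import deque

import numpy as np


def otsu_threshold(volume, bins=256):
    """Return the threshold that maximizes between-class variance (Otsu's method)."""
    hist, edges = np.histogram(volume, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    weight_bg = np.cumsum(hist)              # voxels in bins up to and including each bin
    weight_fg = weight_bg[-1] - weight_bg    # voxels in the bins above
    cum_intensity = np.cumsum(hist * centers)
    valid = (weight_bg > 0) & (weight_fg > 0)  # both classes must be non-empty
    mean_bg = cum_intensity[valid] / weight_bg[valid]
    mean_fg = (cum_intensity[-1] - cum_intensity[valid]) / weight_fg[valid]
    between_var = weight_bg[valid] * weight_fg[valid] * (mean_bg - mean_fg) ** 2
    return centers[valid][np.argmax(between_var)]


def label_components(mask):
    """Label 6-connected foreground components of a 3D boolean mask (breadth-first search)."""
    labels = np.zeros(mask.shape, dtype=np.int32)
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    current = 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue  # voxel already belongs to a labeled component
        current += 1
        labels[seed] = current
        queue = deque([seed])
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in offsets:
                n = (z + dz, y + dy, x + dx)
                inside = all(0 <= c < s for c, s in zip(n, mask.shape))
                if inside and mask[n] and not labels[n]:
                    labels[n] = current
                    queue.append(n)
    return labels, current


def segment(volume):
    """Threshold with Otsu, then label connected foreground components."""
    mask = volume > otsu_threshold(volume)
    return label_components(mask)


# Synthetic volume: dim background with two bright, well-separated "cells".
volume = np.full((8, 8, 8), 0.1)
volume[1:3, 1:3, 1:3] = 0.9
volume[5:7, 5:7, 5:7] = 0.9
labels, n_cells = segment(volume)
print(n_cells)
```

      A baseline like this has no way to split touching objects, so any comparison would also depend on the post-processing applied afterwards.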

      I suggest the following baselines be included to better understand how much of the segmentation accuracy is due to parameter tweaking on the considered datasets versus a novel method contribution:

      *  comparison to thresholding (with the same post-processing as the proposed method)

      *  comparison to a normalized cut segmentation (with the same post-processing as the proposed method)

      *  comparison to references 8 and 9.

      Refs. 8 and 9 either lack readily usable code (https://github.com/LiangHann/USAR) or have not shared code at all (https://github.com/Kaiseem/AD-GAN), so re-implementing them is well beyond the scope of this paper. We benchmarked Cellpose, StarDist, SegResNet, and a transformer, SwinUNetR. Moreover, models in the MONAI package can be used. Note that, to our knowledge, the transformer results are also a new contribution that the Reviewer does not acknowledge.

      I further strongly encourage the authors to discuss the limitations of their method. From what I understand, the proposed method works only on well-separated objects (due to the semantic segmentation bottleneck), is based on contrastive FG/BG intensity values (due to the SoftNCut loss), and requires tuning of a few parameters (which might be challenging if no ground-truth is available).

      We added text on limitations. Thanks for this suggestion.

      (2) Dataset

      I commend the authors for providing ground-truth labels for more than 2500 cells. I would appreciate it if the Methods section could mention how exactly the cells were labelled. I found a good overlap between the ground truth and Otsu thresholding of the intensity images. Was the ground truth generated by proofreading an initial automatic segmentation, or entirely done by hand? If the former, which method was used to generate the initial segmentation, and are there any concerns that the ground truth might be biased towards a given segmentation method?

      In the already submitted version, we have a 5-page DataSet card that fully answers your questions. They are ALL labeled by hand, without any semi-automatic process.

      In our main text we even stated “Using whole-brain data from mice we cropped small regions and human annotated in 3D 2,632 neurons that were endogenously labeled by TPH2-tdTomato” - clearly mentioning it is human-annotated.

      (3) Napari plugin

      The plugin is well-documented and works by following the installation instructions.

      Great, thanks for the positive feedback.

      However, I was not able to recreate the segmentations reported in the paper with the default settings for the pre-trained WNet3D: segments are generally too large and there are a lot of false positives. Both the prediction and the final instance segmentation also show substantial border artifacts, possibly due to a block-wise processing scheme.

      Your review here does not match your comments above; above you said it was working well, such that you doubt the GT is real and the data is too easy as it was perfectly easy to threshold with non-learning methods.

      You would need to share more details on what you tried. We suggest following our code; namely, we provide the full experimental code and processing for every figure, as was noted in our original submission: https://github.com/C-Achard/cellseg3d-figures.

      Reviewer #2 (Public Review):

      Summary:

      The authors propose a new method for self-supervised learning of 3d semantic segmentation for fluorescence microscopy. It is based on a WNet architecture (Encoder / Decoder using a UNet for each of these components) that reconstructs the image data after binarization in the bottleneck with a soft n-cuts clustering. They annotate a new dataset for nucleus segmentation in mesoSPIM imaging and train their model on this dataset. They create a napari plugin that provides access to this model and provides additional functionality for training of own models (both supervised and self-supervised), data labeling, and instance segmentation via post-processing of the semantic model predictions. This plugin also provides access to models trained on the contributed dataset in a supervised fashion.

      Strengths:

      (1) The idea behind the self-supervised learning loss is interesting.

      (2) The paper addresses an important challenge. Data annotation is very time-consuming for 3d microscopy data, so a self-supervised method that yields similar results to supervised segmentation would provide massive benefits.

      Thank you for highlighting the strengths of our work and new contributions.

      Weaknesses:

      The experiments presented by the authors do not adequately support the claims made in the paper. There are several shortcomings in the design of the experiment, presentation of the results, and reproducibility.

      We address your concerns and misunderstandings below.

      Major weaknesses:

      (1) The main experiments are conducted on the new mesoSPIM dataset, which contains quite small nuclei, much smaller than the pretraining datasets of CellPose and StarDist. I assume that this is one of the main reasons why these well-established methods don't work for this dataset.

      StarDist is not pretrained; we trained it from scratch, as we did for WNet3D. We retrained Cellpose and reported the results both with their pretrained model and our best-retrained model. This is documented in Figure 1 and Suppl. Figure 1. We also want to push back and say that they both work very well on this data. In fact, our main claim is not that we beat them; it is that we can match them with a self-supervised method.

      Limiting method comparison to only this dataset may create a misleading impression that CellSeg3D is superior for all kinds of 3D nucleus segmentation tasks, whereas this might only hold for small nuclei.

      The GT dataset we labeled has nuclei that are normal brain-cell sized. Moreover in Figure 2 we show very different samples with both dense and noisy (c-FOS) labeling.

      We also clearly do not claim this is superior for all tasks, from our text: “First, we benchmark our methods against Cellpose and StarDist, two leading supervised cell segmentation packages with user-friendly workflows, and show our methods match or outperform them in 3D instance segmentation on mesoSPIM-acquired volumes" – we explicitly do NOT claim beyond the scope of the benchmark. Moreover we state: "We found that WNet3D could be as good or better than the fully supervised models, especially in the low data regime, on this dataset at semantic and instance segmentation" – again noting on this dataset. Again, we only claimed we can be as good as these methods with an unsupervised approach, and in the low-GT data regime we can excel.

      Further, additional preprocessing of the mesoSPIM images may improve results for StarDist and CellPose (see the first point in minor weaknesses). Note: having a method that works better for small nuclei would be an important contribution. But I doubt that the claims hold for larger and or more crowded nuclei as the current version of the paper implies.

      Figure 2 benchmarks our method on larger and denser nuclei, but we do not intend to claim this is a universal tool. It was specifically designed for light-sheet (brain) data, and we have adjusted the title to be more clear. But we also show in Figure 2 it works well on more dense and noisy samples, hinting that it could be a promising approach. But we agree, as-is, it’s unlikely to be good for extremely dense samples like in electron microscopy, which we never claim it would be.

      With regard to preprocessing, we respectfully disagree. We trained StarDist (and asked the main developer of StarDist, Martin Weigert, to check our work; he is acknowledged in the paper) and it does very well. We also retrained and optimized Cellpose, and we show that it works as well as leading transformer- and CNN-based approaches. Again, we only claimed we can be as good as these methods with an unsupervised approach.

      The contribution of the paper would be much stronger if a **fair** comparison with StarDist / CellPose was also done on the additional datasets from Figure 2.

      We appreciate that more datasets would be ideal, but we always feel it’s best for the authors of tools to benchmark their own tools on data. We only compared others in Figure 1 to the new dataset we provide so people get a sense of the quality of the data too; there we did extensive searches for best parameters for those tools. So while we think it would be nice, we will leave it to those authors to be most fair. We also narrowed the scope of our claims to mesoSPIM data (added light-sheet to the title), which none of the other examples in Figure 2 are.

      (2) The experimental setup for the additional datasets seems to be unrealistic. In general, the description of these experiments is quite short and so the exact strategy is unclear from the text. However, you write the following: "The channel containing the foreground was then thresholded and the Voronoi-Otsu algorithm used to generate instance labels (for Platynereis data), with hyperparameters based on the Dice metric with the ground truth." I.e., the hyperparameters for the post-processing are found based on the ground truth. From the description it is unclear whether this is done a) on the part of the data that is then also used to compute metrics or b) on a separate validation split that is not used to compute metrics. If a) this is not a valid experimental setup and amounts to training on your test set. If b) this is ok from an experimental point of view, but likely still significantly overestimates the quality of predictions that can be achieved by manual tuning of these hyperparameters by a user that is not themselves a developer of this plugin or an absolute expert in classical image analysis, see also 3.

      We apologize for this confusion; we have now expanded the Methods to clarify that the setup is (b). You can also see exactly what we did in the figure notebook: https://c-achard.github.io/cellseg3d-figures/fig2-b-c-extra-datasets/self-supervised-extra.html#threshold-predictions.

      For clarity, we additionally link each individual notebook now in the Methods.
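
      As a generic illustration of setup (b), the sketch below tunes a post-processing threshold on a validation split only and then reports the Dice score on a separate held-out split. It is not the code from the notebooks linked above: the data are synthetic (background near 0.4 and foreground near 0.65, loosely mirroring the value range the reviewer describes), and the names `tune_threshold` and `evaluate` are hypothetical.

```python
import numpy as np


def dice(pred, truth):
    """Dice coefficient between two boolean masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    denom = pred.sum() + truth.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(pred, truth).sum() / denom


def tune_threshold(predictions, truths, candidates):
    """Pick the candidate with the best mean Dice, using validation volumes only."""
    scores = [
        np.mean([dice(p > t, g) for p, g in zip(predictions, truths)])
        for t in candidates
    ]
    return candidates[int(np.argmax(scores))]


def evaluate(predictions, truths, threshold):
    """Mean Dice at a fixed threshold; no tuning happens here."""
    return float(np.mean([dice(p > threshold, g) for p, g in zip(predictions, truths)]))


rng = np.random.default_rng(0)
truths = [rng.random((4, 8, 8)) > 0.7 for _ in range(6)]
# Synthetic "network outputs": foreground near 0.65, background near 0.4.
predictions = [np.where(g, 0.65, 0.4) + rng.normal(0, 0.01, g.shape) for g in truths]

# Split once, up front: the test volumes are never seen during tuning.
val_p, val_g = predictions[:3], truths[:3]
test_p, test_g = predictions[3:], truths[3:]

best = tune_threshold(val_p, val_g, candidates=[0.3, 0.5, 0.7])
test_dice = evaluate(test_p, test_g, best)
print(best, test_dice)
```

      The key design point is that `evaluate` receives a threshold chosen elsewhere, so the reported score cannot benefit from the test labels.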

      (3) I cannot reproduce any of the results using the plugin. I tried to reproduce some of the results from the paper qualitatively: First I downloaded one of the volumes from the mesoSPIM dataset (c5image) and applied the WNet3D to it. The prediction looks ok, however the value range is quite close (Average BG intensity ~0.4, FG intensity 0.6-0.7). I try to apply the instance segmentation using "Convert to instance labels" from "Utilities". Using "Voronoi-Otsu" does not work due to an error in pyClesperanto ("clGetPlatformIDs failed: PLATFORM_NOT_FOUND_KHR"). Segmentation via "Connected Components" and "Watershed" requires extensive manual tuning to get a somewhat decent result, which is still far from perfect.

      We are sorry to hear of the installation issue; pyClesperanto is a dependency that is required to reproduce the images (it sounds like you hit this issue: https://forum.image.sc/t/pyclesperanto-prototype-doesnt-work/45724). We have now explicitly added the fix to our docs: https://github.com/AdaptiveMotorControlLab/CellSeg3D/pull/90. We recommend checking the reproduction notebooks (which were linked in the initial submission): https://c-achard.github.io/cellseg3d-figures/intro.html.

      Then I tried to reproduce the results for the Mouse Skull Nuclei Dataset from EmbedSeg. The results look like a denoised version of the input image, not a semantic segmentation. I was skeptical from the beginning that the method would transfer without retraining, due to the very different morphology of nuclei (much larger and elongated). None of the available segmentation methods yield a good result, the best I can achieve is a strong over-segmentation with watersheds.

      We are surprised to hear this; did you follow the notebook below, which directly reproduces the steps used to create this figure? (This was linked in the preprint): https://c-achard.github.io/cellseg3d-figures/fig2-c-extra-datasets/self-supervised-extra.html

      We also expanded the methods to include the exact values from the notebook into the text.

      Minor weaknesses:

      (1) CellPose can work better if images are resized so that the median object size in new images matches the training data. For CellPose the cyto2 model should do this automatically. It would be important to report if this was done, and if not would be advisable to check if this can improve results.

      We reported this value in Figure 1 and found it to work poorly, which is why we retrained Cellpose and obtained good performance (also reported in Figure 1). Resizing GB- to TB-scale volumes for mesoSPIM data is otherwise not practical, so simply retraining seems the preferable option, which is what we did.

      (2) It is a bit confusing that F1-Score and Dice Score are used interchangeably to evaluate results. The dice score only evaluates semantic predictions, whereas F1-Score evaluates the actual instance segmentation results. I would advise to only use F1-Score, which is the more appropriate metric. For Figure 1f either the mean F1 score over thresholds or F1 @ 0.5 could be reported. Furthermore, I would advise adopting the recommendations on metric reporting from https://www.nature.com/articles/s41592-023-01942-8.

      We are using the common metrics in the field for instance and semantic segmentation, and report them in the methods. In Figure 2f we actually report the “Dice” as defined in StarDist (as we stated in the Methods). Note, their implementation is functionally equivalent to F1-Score of an IoU >= 0, so we simply changed this label in the figure now for clarity. We agree this clarifies for the expert readers what was done, and we expanded the methods to be more clear about metrics.

      We added a link to the paper you mention as well.
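
      To illustrate the distinction the reviewer draws between the two metrics, the sketch below computes a semantic Dice score and an instance-level F1-Score at an IoU threshold on a toy example where two touching cells are predicted as one merged object. This is a simplified illustration with greedy one-to-one matching, not the StarDist implementation or the evaluation code used in the paper; the function names are hypothetical.

```python
import numpy as np


def dice(pred_mask, true_mask):
    """Semantic Dice: overlap of foreground masks, ignoring object identity."""
    pred_mask, true_mask = pred_mask.astype(bool), true_mask.astype(bool)
    denom = pred_mask.sum() + true_mask.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(pred_mask, true_mask).sum() / denom


def instance_f1(pred_labels, true_labels, iou_threshold=0.5):
    """Instance F1: objects are matched one-to-one (greedily, best IoU first)."""
    pred_ids = [i for i in np.unique(pred_labels) if i != 0]
    true_ids = [i for i in np.unique(true_labels) if i != 0]
    pairs = []
    for p in pred_ids:
        p_mask = pred_labels == p
        for t in true_ids:
            t_mask = true_labels == t
            inter = np.logical_and(p_mask, t_mask).sum()
            if inter == 0:
                continue
            iou = inter / np.logical_or(p_mask, t_mask).sum()
            if iou >= iou_threshold:
                pairs.append((iou, p, t))
    pairs.sort(reverse=True)
    matched_pred, matched_true = set(), set()
    for _, p, t in pairs:
        if p not in matched_pred and t not in matched_true:
            matched_pred.add(p)
            matched_true.add(t)
    tp = len(matched_pred)
    fp = len(pred_ids) - tp   # predicted objects with no match
    fn = len(true_ids) - tp   # ground-truth objects that were missed
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2.0 * tp / denom


# Ground truth: two touching cells of 6 and 4 voxels.
true = np.zeros((1, 2, 5), dtype=int)
true[0, :, :3] = 1
true[0, :, 3:] = 2
# Prediction: the same foreground, but merged into a single object.
pred = (true > 0).astype(int)

semantic_dice = dice(pred > 0, true > 0)
f1_at_half = instance_f1(pred, true, iou_threshold=0.5)
print(semantic_dice, f1_at_half)
```

      In this toy case the foreground masks are identical, so the semantic Dice is perfect, while the instance-level score drops because one of the two cells has no matching prediction.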

      (3) A more conceptual limitation is that the (self-supervised) method is limited to intensity-based segmentation, and so will not be able to work for cases where structures cannot be distinguished based on intensity only. It is further unclear how well it can separate crowded nuclei. While some object separation can be achieved by morphological operations this is generally limited for crowded segmentation tasks and the main motivation behind the segmentation objective used in StarDist, CellPose, and other instance segmentation methods. This limitation is only superficially acknowledged in "Note that WNet3D uses brightness to detect objects [...]" but should be discussed in more depth. Note: this limitation does not mean at all that the underlying contribution is not significant, but I think it is important to address this in more detail so that potential users know where the method is applicable and where it isn't.

      We agree, and we added a new section specifically on limitations. Thanks for raising this good point. While self-supervision saves hundreds of hours of manual labor, it comes at the cost of a more limited range of regimes in which it works. This is why we do not claim that it should replace excellent methods like Cellpose or StarDist, but rather that it complements them and can be used on mesoSPIM samples, as we show here.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) One of the listed contributions is "adding the SoftNCuts loss". This is not true, reference 10 already introduced that loss.

      “Our changes include a conversion to a fully 3D architecture and adding the SoftNCuts loss” - we dropped the comma and added the word “AND” to note that we added the 3D version of the SoftNCuts loss TO the 3D architecture, which reference 10 did not do.

      (2) "Typically, these methods use a multi-step approach" to segment 3D from 2D: this is only true for CellPose, StarDist does real 3D.

      That is why we preface with “typically” which implies not always.

      (3) "see Methods, Figure 1c, c)" is missing an opening parenthesis.

      (4) K is not introduced in equation (1) (presumably the number of classes, which seems to be 2 for all experiments considered).

      k actually was introduced just below equation 1 as the number of classes. We added the note that k was set to 2.

      (5) X is not introduced in equation (2) (presumably the spatial position of a voxel).

      Sorry for this oversight. We added that $X$ is the spatial position of the voxel.
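      For orientation, the soft N-cut loss and its affinity weights are commonly written as below. This is the form from the original 2D W-Net formulation; the manuscript's equations (1) and (2) are not reproduced in this response and may use different notation. Here $K$ is the number of classes (set to 2 according to the response above), $p(u = A_k)$ is the predicted probability that voxel $u$ belongs to class $k$, $F$ is the voxel intensity, $X$ is the spatial position, and $\sigma_I$, $\sigma_X$ and the radius $r$ are hyperparameters.

```latex
\[
J_{\text{soft-Ncut}}(V, K) = K - \sum_{k=1}^{K}
\frac{\sum_{u \in V} p(u = A_k) \sum_{v \in V} w(u, v)\, p(v = A_k)}
     {\sum_{u \in V} p(u = A_k) \sum_{t \in V} w(u, t)}
\]
\[
w(u, v) =
\exp\!\left(-\frac{\lVert F(u) - F(v) \rVert_2^2}{\sigma_I^2}\right)
\cdot
\begin{cases}
\exp\!\left(-\dfrac{\lVert X(u) - X(v) \rVert_2^2}{\sigma_X^2}\right)
  & \text{if } \lVert X(u) - X(v) \rVert_2 < r \\[1ex]
0 & \text{otherwise}
\end{cases}
\]
```

      Because the weights depend only on intensity differences and spatial distance, this form also illustrates why the method is described as intensity-based in the limitations discussed above.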

      Reviewer #2 (Recommendations For The Authors):

      To improve the paper the weaknesses mentioned above should be addressed:

      (1) Compare to StarDist and/or CellPose on further datasets, esp. using pre-trained CellPose, to see if the claims of competitive performance with state-of-the-art approaches hold up for the case of different nucleus morphologies. The EmbedSeg datasets from Figure 2 c are well suited for this. In the current form, the claims are too broad and not supported if thorough experiments are performed on a single dataset with a very specific morphology. Note: even if the method is not fully competitive with CellPose / StarDist on these Datasets it holds merit since a segmentation method that works for small nuclei as in the mesoSPIM dataset and works self-supervised is very valuable.

      (2) Clarify how the best instance segmentation hyperparameters are found. If you indeed optimize these on the same part of the dataset used for evaluating metrics then the current experimental set-up is invalid. If this is not the case I would still rethink if this is a good way to report the results since it does not seem to reflect user experience. I found it not possible to find good hyperparameters for either of the two segmentation approaches I tried (see also next point) so I think these numbers are too optimistic.

      (3) Improve the instance segmentation part of the plugin: either provide troubleshooting for how to install pyClesperanto correctly to use the voronoi-based instance segmentation or implement it based on more standard functionality like skimage / scipy. Provide more guidance for finding good hyperparameters for the segmentation task.

      (4) Make sure image resizing is done correctly when using pre-trained CellPose models and report on this.

      (5) Report F1 Scores only (unless there is a compelling reason to also report Dice).

      (6) Address the limitations of the method in more detail.

      On a positive note: all data and code are available and easy to download/install. A minor comment: it would be very helpful to have line numbers for reviewing a revised version.

      All comments are also addressed in the public reviews.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable study provides in vivo evidence for the synchronization of projection neurons in the olfactory bulb at gamma frequency in an activity-dependent manner. This study uses optogenetics in combination with single-cell recordings to selectively activate sensory input channels within the olfactory bulb. The data are thoughtfully analyzed and presented; the evidence is solid, although some of the conclusions are only partially supported.

      We deeply thank all the reviewers for their time, effort, and insightful comments. Their revision led to a significant improvement of the paper.

      The reviewers suggested toning down our claim that we found a mechanism that synchronizes all odor-evoked MTC activities, as we do not directly show that. We concur and address this in our revised version to ensure a precise interpretation of our findings. In short, we state that we revealed a synchronization mechanism between two groups of active mitral and tufted cells (MTCs) and show that this synchronization is activity-dependent and distance-independent. This mechanism can enable the synchronization of all odor-activated MTCs.

      Another issue raised is the interpretation of the results obtained under Ketamine anesthesia. Ketamine is an antagonist of NMDA receptors, which play a crucial role in the MTC-GC reciprocal synapse. To address this, we include new analyses demonstrating that optogenetic activation of granule cells (GCs) can inhibit the recorded MTCs during baseline activity but does not substantially affect odor-evoked MTC firing rates. We show that this holds in both Ketamine-anesthetized and awake mice (Dalal & Haddad, 2022). This indicates that GC-MTC connections are functional even under Ketamine anesthesia; however, they do not exert substantial suppression on odor-evoked MTC responses. We added a paragraph to the discussion section on the potential influence of Ketamine anesthesia on GC-MTC synapses and its implications for our findings.

      Finally, we discuss several recent studies that are particularly relevant to our research and expand the discussion on our hypothesis that parvalbumin-positive cells in the olfactory bulb may serve as key mediators of the activity- and distance-dependent lateral inhibition observed in our findings.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Dalal and Haddad investigated how neurons in the olfactory bulb are synchronized in oscillatory rhythms at gamma frequency. Temporal coordination of action potentials fired by projection neurons can facilitate information transmission to downstream areas. In a previous paper (Dalal and Haddad 2022, https://doi.org/10.1016/j.celrep.2022.110693), the authors showed that gamma frequency synchronization of mitral/tufted cells (MTCs) in the olfactory bulb enhances the response in the piriform cortex. The present study builds on these findings and takes a closer look at how gamma synchronization is restricted to a specific subset of MTCs in the olfactory bulb. They combined odor and optogenetic stimulations in anesthetized mice with extracellular recordings.<br /> The main findings are that lateral synchronization of MTCs at gamma frequency is mediated by granule cells (GCs), independent of the spatial distance, and strongest for MTCs with firing rates close to 40 Hz. The authors conclude that this reveals a simple mechanism by which spatially distributed neurons can form a synchronized ensemble. In contrast to lateral synchronization, they found no evidence for the involvement of GCs in lateral inhibition of nearby MTCs.

      Strengths:

      Investigating the mechanisms of rhythmic synchronization in vivo is difficult because of experimental limitations for the readout and manipulation of neuronal populations at fast timescales. Using spatially patterned light stimulation of opsin-expressing neurons in combination with extracellular recordings is a nice approach. The paper provides evidence for an activity-dependent synchronization of MTCs in gamma frequency that is mediated by GCs.

      Weaknesses:

      An important weakness of the study is the lack of direct evidence for the main conclusion - the synchronization of MTCs in gamma frequency. The data shows that paired optogenetic stimulation of MTCs in different parts of the olfactory bulb increases the rhythmicity of individual MTCs (Figure 1) and that combined odor stimulation and GC stimulation increases rhythmicity and gamma phase locking of individual MTCs (Figure 4). However, a direct comparison of the firing of different MTCs is missing. This could be addressed with extracellular recordings at two different locations in the olfactory bulb. The minimum requirement to support this conclusion would be to show that the MTCs lock to the same phase of the gamma cycle. Also, showing the evoked gamma oscillations would help to interpret the data.

      We agree with the reviewer that direct evidence of mutual synchronization between multiple recorded MTCs has not been shown in our study. Our study only shows a mechanism that can enable this synchronization. We now state this clearly in the manuscript. We based this on previous studies that tested MTC spike synchronization. Specifically, Schoppa 2006 reported that electrical OSN stimulation evokes MTC spike synchronization in the gamma range, in-vitro. Kashiwadani et al., 1999 and Doucette et al., 2011 showed that odor-evoked MTC spike times are synchronized, in-vivo. Given these studies, we asked what underlying mechanism can support such synchronization. Our study demonstrates that activating a group of MTCs can entrain another MTC in an activity-dependent and distance-independent manner. We claim this could be the underlying mechanism for the odor-evoked synchronization demonstrated by these previous studies.

      To make sure this is clearly stated in the manuscript we changed the title to “Activity-dependent lateral inhibition enables the synchronization of active olfactory bulb projection neurons”, and rephrased a sentence in the abstract to “This lateral synchronization was particularly effective when the recorded MTC fired at the gamma rhythm”. To further clarify this point, we made several other changes throughout the results and the discussion section.
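      The reviewer's minimum requirement, that different MTCs lock to the same phase of the gamma cycle, could in principle be checked along the following lines. This is a minimal sketch under stated assumptions: the gamma phase at each spike time is assumed to have been extracted already (for example from a band-pass-filtered LFP), and the function names `phase_locking` and `circular_difference` are hypothetical, not taken from the authors' analysis code.

```python
import cmath
import math


def phase_locking(spike_phases):
    """Return (vector strength, preferred phase) for spike phases in radians.

    Vector strength is the length of the mean resultant vector:
    1.0 means all spikes occur at the same phase, 0.0 means no preference.
    """
    if not spike_phases:
        raise ValueError("need at least one spike phase")
    resultant = sum(cmath.exp(1j * p) for p in spike_phases) / len(spike_phases)
    return abs(resultant), cmath.phase(resultant)


def circular_difference(a, b):
    """Signed difference a - b between two angles, wrapped to [-pi, pi)."""
    return (a - b + math.pi) % (2 * math.pi) - math.pi


# Illustrative (made-up) phases for two units recorded at different sites.
unit_a = [0.4, 0.5, 0.6, 0.5]
unit_b = [0.5, 0.7, 0.6, 0.6]
strength_a, phase_a = phase_locking(unit_a)
strength_b, phase_b = phase_locking(unit_b)
print(strength_a, strength_b, circular_difference(phase_a, phase_b))
```

      Two units that are both strongly locked and have a preferred-phase difference close to zero would be consistent with mutual synchronization; strong locking alone, as measured for single units, would not establish it.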

      Another weakness is that all experiments are performed under anesthesia with ketamine/medetomidine. Ketamine is an antagonist of NMDA receptors and NMDA receptors are critically involved in the interactions of MTCs and GCs at the reciprocal synapses (see for example Lage-Rupprecht et al. 2020, https://doi.org/10.7554/eLife.63737; Egger and Kuner 2021, https://doi.org/10.1007/s00441-020-03402-7). This should be considered for the interpretation of the presented data.

      This issue has been raised by reviewers #1 and #2. We think, as reviewer #2 also acknowledged, that this issue does not compromise our results. However, to address this important point we added the section below to the Discussion:

      “Our experiments were performed under Ketamine anesthesia, an NMDA receptor antagonist that affects the reciprocal dendro-dendritic synapses between MTCs and GCs (Egger and Kuner, 2021; Lage-Rupprecht et al., 2020). Consistent with that, recent studies reported lower excitability of GC activity under anesthesia (Cazakoff et al., 2014; Kato et al., 2012).  This raises the concern that our result might not be valid in the awake state. We argue that this is unlikely. First, (Fukunaga et al., 2014) reported that GCs baseline activity in anesthetized and awake mice is similar, suggesting that MTC-GC synapses are functioning. Second, we show that light activation of GCL neurons strongly inhibits the MTC baseline activity (Figure 5) and increases MTC odor-evoked spike-LFP coupling in the gamma range (Figure 4). These experiments validate that GCL neurons can exert inhibition over MTCs in our experimental setup. Third, we have shown that light-activating all accessible GCL neurons has a minor effect on the MTC odor-evoked firing rates in an awake state (Dalal and Haddad, 2022), corroborating the finding that GCL neurons are unlikely to provide strong suppression to MTCs. Fourth, and most importantly, we showed that optogenetic stimulation of MTCs entrains other MTC spike times, which is achieved via the GCL neurons. This suggests that the lack of lateral suppression following MTC or GCL neuron opto-activation is not due to MTC-GC synapse blockage. That said, we cannot exclude the unlikely possibility that NMDA receptor blockage under anesthesia impairs MTC-to-MTC suppressive interactions but not the MTC-to-MTC mediated spike entrainment.”

      Figure 1A and D from Dalal & Haddad 2022 show the effect of GCL neurons opto-activation during odor stimulation on MTC firing rates in awake and anesthetized mice.

      Furthermore, the direct effect of optogenetic stimulation on GCs activity is not shown. This is particularly important because they use Gad2-cre mice with virus injection in the olfactory bulb and expression might not be restricted to granule cells and might not target all subtypes of granule cells (Wachowiak et al., 2013, https://doi.org/10.1523/JNEUROSCI.4824-12.2013). This should be considered for the interpretation of the data, particularly for the absence of an effect of GC stimulation on lateral inhibition.

      In this study we used Gad2-cre mice and the protocol for viral transfection of GCL neurons reported in Fukunaga et al., 2014. They reported that ‘more than 90% of Cre-expressing neurons in the GCL also expressed fluorescently tagged ArchT’. Consistently, when Fukunaga et al. expressed ChR2 in the GCL using the same viral infection as we used, they reported that “Light presentation in vivo resulted in rapid and strong depolarization of, and action potential (AP) discharges in, GCs (Fig. 3b), which in turn consistently and strongly hyperpolarized M/TCs (9 of 9 cells showed 100% AP suppression; Fig. 3c,d)”. This study shows clearly that this infection protocol is robust. Moreover, in new panels we added to the manuscript (Figure 5a-b), we show that optogenetic activation of GCL neurons strongly suppressed MTC activity during baseline conditions but not odor-evoked MTC responses. This is consistent with the reports by Fukunaga et al. and indicates that GCL neurons are functional, as they can suppress MTC baseline activity.

      Finally, since virus injection into the granule cell layer can target other GCL neuron types, we now refer to ‘GCL neurons’ in the text (as was done in Gschwend et al., 2015) instead of ‘GCs’. We replaced the image in Figure 4A to show that the expression of ChR2 is restricted to GCL neurons. That said, it is still possible that our protocol did not infect all GC subtypes. To address this, we added this line to the Discussion: “We also note that our viral transfection protocol in Gad2-Cre mice might not transfect all subtypes of GCs”

      Several conclusions are only supported by data from example neurons. The paper would benefit from a more detailed description of the analysis and the display of some additional analysis at the population level:

      - What were the criteria based on which the spots for light-activation were chosen from the receptive field map?

      To make this point clearer, we extended the explanation in the Methods on the selection criteria: “Spots were selected either randomly or manually. In the manual selection case, we selected spots that caused either significant or mild but insignificant inhibitory effect on the recorded MTC (e.g., local cold spots in the receptive-field map; see example in Figure 2a of example spots that were selected manually)”. We also added a reference in the text to the Methods: “see Methods for spots selection criteria”.

      - The absence of an effect on firing rate for paired stimulations is only shown for one example (Figure 1c). A quantification of the population level would be interesting.

      - Only one example neuron is shown to support the conclusion that "two different neural circuits mediate suppression and entrainment" in Figure 3. A population analysis would provide more evidence.

      Thank you very much for these comments. We added a population analysis in Figure 3. This analysis shows a dissociation between firing rate suppression and the entrainment groups (Figure 3c-d). This suggests that two different circuits mediate suppression and entrainment.

      - Only one example neuron is shown to illustrate the effect of GC stimulation on gamma rhythmicity of MTCs in Figures 4 f,g.

      In this figure, we show that the activation of subsets of GCL neurons elevated odor-evoked spike synchronization to the gamma rhythm. We thought it would be beneficial to demonstrate the change in spike entrainment following GCL neurons optogenetic activation regardless of the ongoing OB gamma oscillations, using the method presented by Fukunaga et al., 2014. However, this analysis requires that the neuron has a relatively high firing rate. As we describe in the figure legend of this panel, this neuron is probably a tufted cell based on the findings shown in Fukunaga et al., 2014 and Burton & Urban, 2021. Most of our recorded cells had a lower firing rate, which coincides with our typical recording depth, targeting mitral cells rather than tufted cells (~400µm deep). Since this analysis is shown only over a single neuron, we moved it to Supplementary Figure 4.

      - In Figure 5 and the corresponding text, "proximal" and "distal" GC activation are not clearly defined.

      We agree. Initially, we used these terms to refer to GC columns that include the recorded MTC (proximal) and columns that are away from it (distal). We decided that instead of using a coarse division, we would show the whole range of distances. We updated the analysis in Figure 5d to show the effect of GC optogenetic activation on MTC odor-evoked responses as a function of the distance from the recorded MTC.

      Reviewer #2 (Public Review):

      Summary

      This study provides a detailed analysis and dissociation between two effects of activation of lateral inhibitory circuits in the olfactory bulb on ongoing single mitral/tufted cell (MTC) spiking activity, namely enhanced synchronization in the gamma frequency range or lateral inhibition of firing rate.

      The authors use a clever combination of single-cell recordings, optogenetics with variable spatial stimulation of MTCs and sensory stimulation in vivo, and established mathematical methods to describe changes in autocorrelation/synchronization of a single MTC's spiking activity upon activation of lateral glomerular MTC ensembles. This assay is rounded off by a gain-of-function experiment in which the authors enhance granule cell (GC) excitation to establish a causal relation between GC activation and enhanced synchronization to gamma (they had used this manipulation in their previous paper Dalal & Haddad 2022, but use a smaller illumination spot here for spatially restricted activation).

      Strengths

      This study is of high interest for olfactory processing - since it shows directly that interactions between only two selected active receptor channels are sufficient to enhance the synchronization of single neurons to gamma in one channel (and thus by inference most likely in both). These interactions are distance-independent over many 100s of µms and thus can allow for non-topographical inhibitory action across the bulb, in contrast to the center-surround lateral inhibition known from other sensory modalities.

      In my view, parallels between vision and olfaction might have been overemphasized so far, since the combinatorial encoding of olfactory stimuli across the glomerular map might require different mechanisms of lateral interaction versus vision. This result is indicative of such a major difference.

      Such enhanced local synchronization was observed in a subset of activated channel pairs; in addition, the authors report another type of lateral interaction that does involve the reduction of firing rates, drops off with distance and most likely is caused by a different circuit-mediated by PV+ neurons (PVN; the evidence for which is circumstantial).

      Weaknesses/Room for improvement

      Thus this study is an impressive proof of concept that however does not yet allow for broad generalization. Therefore the framing of results should be slightly more careful in my opinion.

      We agree with the reviewer. We copy here our response to reviewer #1, who raised the same issue.

      We agree that direct evidence of mutual synchronization between multiple recorded MTCs has not been shown in our study. Our study only shows a mechanism that can enable this synchronization. We now state this clearly in the manuscript. We relied on previous studies that tested MTC spike synchronization. Specifically, Schoppa 2006 reported that electrical OSN stimulation evokes MTC spike synchronization in the gamma range, in-vitro. Kashiwadani et al., 1999 and Doucette et al., 2011 showed that odor-evoked MTC spike times are synchronized, in-vivo. Given these studies, we asked what underlying mechanism can support such synchronization. Our study demonstrates that activating a group of MTCs can entrain another MTC in an activity-dependent and distance-independent manner. We claim this could be the underlying mechanism for the odor-evoked synchronization demonstrated by these previous studies.

      To make sure this is clearly stated in the manuscript we changed the title to “Activity-dependent lateral inhibition enables the synchronization of active olfactory bulb projection neurons”, and rephrased a sentence in the abstract to “This lateral synchronization was particularly effective when the recorded MTC fired at the gamma rhythm”. To further clarify this point, we made several other changes throughout the results and the discussion section.

      Along this line, the conclusions regarding two different circuits underlying lateral inhibition vs enhanced synchronization are not quite justified by the data, e.g.

      (1) The authors mention that their granule cell stimulation results in a local cold spot (l. 527 ff) - how can they then be said not to be involved in the inhibition of firing rate (bullet point in Highlights)? Please elaborate further. In l.406 they also state that GCs can inhibit MTCs under certain conditions. The argument that this stimulation is not physiological makes sense, but still does not rule out anything. You might want to cite Aghvami et al 2022 on the very small amplitude of GC-mediated IPSPs, also McIntyre and Cleland 2015.

      We apologize for the lack of clarity. We reported that we found a local cold spot in the context of an additional experiment not presented in the manuscript and only described in the Methods section. Following the revision, we decided to add the analysis of this experiment to Figure 5. This experiment validated that optogenetic activation of GCs is potent and can affect the recorded MTC firing rates. This is particularly important as we performed all experiments under Ketamine anesthesia, and Ketamine is an NMDA receptor antagonist. In this experiment, we recorded the activity of MTCs at baseline conditions (without odor presentation) under optogenetic activation of GCs. We divided the OB surface into a grid and optogenetically activated GC columns in a random order, one light spot in each trial, using light patches of size 330 µm². We used the same light intensity as in the optogenetic GC activation during odor stimulation (reported in Figures 4-5). We show that the recorded MTC was strongly inhibited by GC light activation, mostly when activating GCs in its vicinity (within its column, i.e., local cold spot). This experiment validates that in our experimental setup, GCs can exert inhibition over MTCs at baseline conditions.

      (2) Even from the shown data, it appears that laterally increased synchronization might co-occur with lateral suppression (See also comment on Figures 1d,e and Figure S1c)

      We kindly note that the panels you referred to do not quantify the firing rate but the rhythmicity of MTC light-evoked responses. We should have explained these graphs better in the main text and not only in the Methods section. We added a panel to Supplementary Figure 1, which describes our analysis: in each of these examples, we performed a time-frequency Wavelet analysis over the average response of the neurons across trials (computed using a sliding Gaussian with a standard deviation of 2 ms). The results of the Wavelet analysis allowed us to visually capture the enhanced spike alignment across trials under paired activation as a function of the stimulus duration (as, for example, in Figure 1c, middle panel). The response amplitude to light stimulation did not change in this example (shown in Figure 1c, lower panel), and the spike entrainment increased following paired activation of MTCs.

      To address the relations between lateral suppression and synchronization at the population level, we added additional analyses to Figure 3c-d.
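      The wavelet analysis described in this response (a trial-averaged response smoothed with a sliding Gaussian, followed by a time-frequency decomposition) can be sketched as follows. Only the 2 ms Gaussian standard deviation comes from the response; the 1 kHz sampling rate, the complex Morlet wavelet, the six-cycle width, and the function names are assumptions made for illustration, and the code is written with the standard library only so that it is self-contained (in practice NumPy or a wavelet package would be the natural choice).

```python
import cmath
import math


def gaussian_smoothed_rate(trials, duration_s, fs=1000.0, sigma_s=0.002):
    """Trial-averaged firing rate (Hz), smoothed with a Gaussian kernel.

    trials: list of lists of spike times in seconds.
    sigma_s: kernel standard deviation (2 ms, as stated in the response).
    """
    n = int(round(duration_s * fs))
    counts = [0.0] * n
    for spikes in trials:
        for t in spikes:
            i = int(t * fs)
            if 0 <= i < n:
                counts[i] += 1.0
    rate = [c * fs / len(trials) for c in counts]

    half = int(math.ceil(4 * sigma_s * fs))
    kernel = [math.exp(-0.5 * (k / (sigma_s * fs)) ** 2)
              for k in range(-half, half + 1)]
    total = sum(kernel)
    kernel = [w / total for w in kernel]  # unit area keeps the mean rate

    smoothed = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < n:
                acc += w * rate[idx]
        smoothed.append(acc)
    return smoothed


def morlet_power(signal, freqs_hz, fs=1000.0, n_cycles=6.0):
    """Time-frequency power: one row per frequency, one column per sample.

    Uses complex Morlet wavelets (an assumed choice, not necessarily the
    wavelet family used by the authors).
    """
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]  # remove the DC component
    power = []
    for f in freqs_hz:
        sigma_t = n_cycles / (2 * math.pi * f)
        half = int(math.ceil(3 * sigma_t * fs))
        wavelet = [
            cmath.exp(2j * math.pi * f * k / fs)
            * math.exp(-0.5 * ((k / fs) / sigma_t) ** 2)
            for k in range(-half, half + 1)
        ]
        norm = sum(abs(w) for w in wavelet)
        row = []
        for i in range(n):
            acc = 0j
            for j, w in enumerate(wavelet):
                idx = i + j - half
                if 0 <= idx < n:
                    acc += x[idx] * w.conjugate()
            row.append(abs(acc / norm) ** 2)
        power.append(row)
    return power
```

      Because the decomposition is applied to the trial average, power in the gamma band rises only when spikes align across trials, which is why such a map can show increased entrainment even when the overall response amplitude is unchanged.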

      (3) There are no manipulations of PVN activity in this study, thus there is no direct evidence for the substrate of the second circuit.

      We completely agree with the reviewer. Using the current data, we can only claim that optogenetic activation of GCL neurons did not affect the MTC odor-evoked response. This finding is consistent with the loss-of-function experiment reported by Fukunaga et al., 2014, where GC suppression did not change odor-evoked responses in both anesthetized and awake mice. Therefore, we speculated that PVN might be a candidate OB interneuron to mediate lateral inhibition between MTCs. This hypothesis is based on their higher likelihood of interconnecting two MTCs compared with GCs (Burton, 2017). We elaborated on this in the discussion and made sure it is clearly stated as a hypothesis.

      (4) The manipulation of GC activity was performed in a transgenic line with viral transfection, which might result in a lower permeation of the population compared to the line used for optogenetic stimulation of MTCs.

      We used a previously validated protocol for optogenetic manipulation of GCs from Fukunaga et al., 2014 in order to minimize this caveat. As we cited previously from their paper, following the expression of ChR2 in the GCL, ‘Light presentation in vivo resulted in rapid and strong depolarization of, and action potential (AP) discharges in, GCs (Fig. 3b), which in turn consistently and strongly hyperpolarized M/TCs (9 of 9 cells showed 100% AP suppression; Fig. 3c,d)’. These results are consistent with the additional experiment we added to the manuscript, where optogenetic activation of GCL neurons strongly suppressed MTC activity during baseline conditions (without odor presentation). The high similarity between these two reports, in which, in the case of Fukunaga et al., GC activation was directly measured, suggests that lack of opsin expression or insufficient light intensity is unlikely to explain the lack of GCL neuron activation effect on lateral inhibition. Moreover, GCL neurons' optogenetic activation during odor stimulation increased MTC spike-LFP coupling in the gamma range. Therefore, the dissociation between the effects of GCL neurons on spike entrainment and lateral inhibition suggests that the lack of lateral inhibition following GC activation is unlikely due to low expression rates.

      In some instances, the authors tend to cite older literature - which was not yet aware of the prominent contribution of EPL neurons including PVN to recurrent and lateral inhibition of MT cells - as if roles that then were ascribed to granule cells for lack of better knowledge can still be unequivocally linked to granule cells now. For example, they should discuss Arevian et al (2006), Galan et al 2006, Giridhar et al., Yokoi et al. 1995, etc in the light of PVN action.

      Therefore it is also not quite justified to state that their result regarding the role of GCs specifically for synchronization, not suppression, is "in contrast to the field" (e.g. l.70 f., l.365, l. 400 ff).

      We changed several sentences in the discussion and introduction to explain that previous studies attributed lateral suppression to GCs because they were not aware of the prominent contribution of EPL neurons, as has been demonstrated by more recent studies (Burton 2024, Huang et al., 2016, Kato et al., 2013, and more).

      We also toned down the statement that these findings are in contrast to the field. Instead, we state that our findings support the claim that GCs are not involved in affecting MTC odor-evoked firing rate.

      Why did the authors choose to use the term "lateral suppression", often interchangeably with lateral inhibition? If this term is intended to specifically reflect reductions of firing rates, it might be useful to clearly define it at first use (and cite earlier literature on it) and then use it consistently throughout.

      We agree and have changed the manuscript accordingly. We added the following in the introduction: “We use this phrase here to refer to a process that suppresses the firing rate of the post-synaptic neuron.”

      A discussion of anesthesia effects is missing - e.g. GC activity is known to be reportedly stronger in awake mice (Kato et al). This is not a contentious point at all since the authors themselves show that additional excitation of GCs enhances synchrony, but it should be mentioned.

      We completely agree and added a paragraph to the Discussion in this regard. Please see also the response to reviewer #1, who made a similar suggestion.

      Some citations should be added, in particular relevant recent preprints - e.g. Peace et al. BioRxiv 2024, Burton et al. BioRxiv 2024 and the direct evidence for a glutamate-dependent release of GABA from GCs (Lage-Rupprecht et al. 2020).

      We thank the reviewer for pointing us to these relevant recent manuscripts. We have now cited Peace et al. when discussing the spatial range of inhibition and gamma synchronization in the OB, Lage-Rupprecht et al. in the context of the involvement of NMDA receptors in the MTC-GC reciprocal synapse, and Burton et al. when discussing the potential function of PV neurons.

      The introduction on the role of gamma oscillations in sensory systems (in particular vision) could be more elaborated.

      In our previous paper (Dalal & Haddad 2022), we included an extended introduction on the role of gamma oscillations in sensory processing, since that study focused on the effect of gamma synchronization on information transmission between brain regions. In the current study, we looked at gamma rhythms as a mechanism that can facilitate ensemble synchronization.

      Reviewer #3 (Public Review):

      Summary:

      This study by Dalal and Haddad analyzes two facets of cooperative recruitment of M/TCs as discerned through direct, ChR2-mediated spot stimulations:

      (1) mutual inhibition and

      (2) entrainment of action potential timing within the gamma frequency range.

      This investigation is conducted by contrasting the evoked activity elicited by a "central" stimulus spot, which induces an excitatory response alone, with that elicited when paired with stimulations of surrounding areas. Additionally, the effect of Gad2-expressing granule cells is examined.

      Based on the observed distance dependence and the impact of GC stimulations, the authors infer that mutual inhibition and gamma entrainment are mediated by distinct mechanisms.

      Strengths:

      The results presented in this study offer a nice in vivo validation of the significant in vitro findings previously reported by Arevian, Kapoor, and Urban in 2008. Additionally, the distance-dependent analysis provides some mechanistic insights.

      We thank the reviewer for his comments. Indeed, the current study provides an in-vivo replication of the in-vitro results reported in Arevian et al., 2008, and adds further insights by showing that lateral inhibition is distance-dependent. However, this is not the main focus of the current study. Following the findings reported by Dalal & Haddad 2022, the motivation for this study was to test the mechanism that allows co-activated MTCs to entrain their spike timing. By light-activating pairs of MTCs at varying distances, we detected a subset of pairs in which paired light-activation evoked activity-dependent lateral inhibition, as was reported by Arevian et al., 2008. Moreover, we think it is highly important to know that a previous in-vitro result is fully reproducible in-vivo.

      Weaknesses:

      The results largely reproduce previously reported findings, including those from the authors' own work, such as Dalal and Haddad (2022), where a key highlight was "Modulating GC activities dissociates MTCs odor-evoked gamma synchrony from firing rates." Some interpretations, particularly the claim regarding the distance independence of the entrainment effect, may be considered over-interpretations.

      We respectfully disagree with the reviewer. We think the current study extends rather than reproduces the findings reported in Dalal & Haddad 2022. The 2022 study mainly focused on the effect of OB gamma synchronization on odor representation in the piriform cortex. We bidirectionally modulated the level of MTC gamma synchronization and found that it had bidirectional effects on odor representation in one of the MTCs' downstream targets, the anterior piriform cortex. The current study, however, focuses on the question of how spatially distributed odor-activated MTCs can synchronize their spiking activity. Our current main finding is that paired activation of MTCs can enhance the spike entrainment of the recorded MTC in an activity-dependent and spatially independent manner. We suggest that this mechanism is mediated by GCL neurons.

      The reviewer did not explain why he/she thinks that the distance independence of the entrainment effect is an over-interpretation. However, to make our claim more precise, we added the following sentence to the corresponding results section: “Furthermore, within the distance range that we were able to measure, the increased phase-locking did not significantly correlate with the distance from the MTC.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor comments:

      (1) Line 17f: "This lateral synchronization was particularly effective when both MTCs fired at the gamma rhythm, ..."

      This sentence implies a direct comparison of the simultaneously recorded firing of MTCs but I could not find evidence for this in this manuscript. I would suggest to change this.

      We thank the reviewer. The sentence was changed to “This lateral synchronization was particularly effective when the recorded MTC fired at the gamma rhythm”.

      (2) Line 43f: A brief description of what glomeruli are could help to avoid confusion for readers less familiar with the OB. The phrasing of "activated glomeruli" and "each glomerulus innervates" are somewhat misleading given that they do not contain the cell bodies of the projection neurons.

      We edited this part of the introduction so that it briefly describes what glomeruli are: ‘Olfactory processing starts with the activity of odorant-activated olfactory sensory neurons. The axons of these sensory neurons terminate in one or two anatomical structures called glomeruli located on the surface of the olfactory bulb (OB). Each glomerulus is innervated by several mitral and tufted cells (MTCs), which then project the odor information to several cortical regions.’

      (3) Line 78ff: The text sounds as if glomeruli are activated by the light stimulation but ChR2 is expressed in MTCs, the postsynaptic component of the glomeruli. It would be clearer to refer to the stimulation as light activation of MTCs.

      We corrected this sentence to: ‘We first mapped each recorded cell's receptive field, i.e., the set of MTCs on the dorsal OB that affect its firing rates when they are light-stimulated.’

      (4) Line 90: It would be great to mention somewhere in this paragraph that you are analyzing single-unit data sorted from extracellular recordings with tungsten electrodes.

      We added that to the description of the experimental setup: ‘To investigate how MTCs interact, we expressed the light-gated channel rhodopsin (ChR2) exclusively in MTCs by crossing the Tbet-Cre and Ai32 mouse lines (Grobman et al., 2018; Haddad et al., 2013), and extracellularly recorded the spiking activity of MTCs in anesthetized mice during optogenetic stimulation using tungsten electrodes.’

      (5) Line 97: The term "delta entrainment" could be easily confused with the entrainment of MTCs to respiration in the delta frequency band. Maybe better to use a different term or stick to "change in entrainment" also used in the text.

      We completely agree. The term was changed to “change in entrainment” throughout the manuscript and figures.

      (6) Line 121f: "Light stimulation did not affect ..." . Should this be "Paired light stimulation did not affect ..."?

      Corrected, thank you.

      (7) Supplementary Figure 1a: The example is not very convincing. It looks a bit like a rhythmic bursting neuron mildly depending on the stimulation.

      This panel serves to present our light stimulation method. The potency of the light stimulation protocol can be seen in the receptive field maps.

      (8) Supplementary Figure 1c: Why is there no confidence interval for 'Paired'?

      This panel shows the power spectrum density of the average neuron response across trials computed over the entire stimulus window (100ms). We decided to remove this panel, as panel Figure 1d shows the evolution of the entrainment in time and, therefore, provides better insight into the effect.

      (9) Line 166f: "... across any light intensities". Maybe better "... for the four light intensities tested"?

      We agree and changed the text accordingly.

      (10) Figure 2f: It would be more intuitive to have the x-axis in the same orientation as in 2e.

      Corrected, thank you.

      (11) Figure 4a: The image in this panel is identical to Figure 1a in Dalal and Haddad 2022 in Cell reports just with a different intensity. The reuse of items and data from previous publications should be indicated somewhere but I could not find it.

      We apologize for this replication. We replaced it with a photo showing a larger portion of the OB, demonstrating the restricted viral expression within the GCL.

      (12) Line 408ff: A brief explanation for the hypothesis of EPL parvalbumin interneurons as the ones mediating lateral inhibition would be great.

      We agree. We added the following paragraph to the discussion section: “We speculate that MTC-to-MTC suppression is mediated by EPL neurons, most likely parvalbumin (PV) neurons. This hypothesis is based on their activity and connectivity properties with MTCs (Burton, 2017; Kato et al., 2013; Miyamichi et al., 2013; Burton, 2024). More studies are required to reveal how PV neurons affect MTC activity.”

      (13) Line 425ff: You show that only activity of high firing rate neurons is suppressed by lateral inhibition, whereas "low and noise MTC responses" are not affected. Wouldn't this rather support the conclusion that lateral inhibition prevents excess activity from the OB?

      We found that lateral inhibition was mainly effective when the postsynaptic neurons fired at ~30-80Hz in response to light stimulation. That is, it affects MTC firing in this “intermediate” range, and to a lesser extent when the MTCs have low or very high firing rates. To prevent excess activity, one would expect a mechanism that affects high firing rates more than medium ones. This was demonstrated in Kato 2013 for PV-MTC inhibition.

      (14) Line 387: "..., only ~20% of the tested MTC pairs exhibited significant lateral inhibition." This is higher than the 16% of neurons you reported to have lateral entrainment (line 100). Why do you consider the lateral inhibition as 'sparse' but the lateral entrainment as relevant?

      We apologize for this unclear statement. The papers we cited in this regard (Fantana et al., 2008; Lehmann et al., 2016; Pressler and Strowbridge, 2017) tested lateral inhibition when the recorded MTC was not active, which resulted in sparse MTC-MTC inhibition. We replicated these findings in our setup by systematically projecting light spots over the dorsal OB without simultaneously activating the recorded MTC, and found similarly sparse inhibition (data not shown). In this study, using a spike-triggered-average light stimulation protocol and paired activation of MTCs, we found higher rates of lateral inhibition, consistent with the reports of Isaacson and Strowbridge, 1998, and Urban and Sakmann, 2002. We changed this paragraph to the following:

      “We found that only ~20% of the tested MTC pairs exhibited significant lateral suppression. This rate is consistent with previous in-vitro studies that found lateral suppression between 10-20% of heterotypic MTC pairs (Isaacson and Strowbridge, 1998; Urban and Sakmann, 2002), and is higher compared to a case where the recorded MTC is not active (Lehmann et al., 2016).”

      Reviewer #2 (Recommendations For The Authors):

      Figure-by-figure comments:

      (1) Figures 1d,e: both these examples seem to show that the firing rate is decreased in the paired condition? From maxima at 110 to 58 Hz in d and 100 to 48 Hz in e. Please explain (see also comment on Figure S1c).

      Please see the response in the Public Review section, reviewer #2, bullet (2). We also added a panel to Supplementary Figure 1 to better explain this.

      (2) Figure 1 f The means and SEMs are hard to see. Why is the SEM bar plotted horizontally? Since this is a major finding of the paper, will there be a table provided that shows the distribution of ∆ shifts across animals?

      We apologize for the mistake. The horizontal bar marked the mean. Since the SEM is small, we corrected the graph so that the SEM is easier to see.

      (3) Figure 1g Showing the running average of data where there is almost none or no data points (beyond 50 Hz) seems not ideal. Is the enhanced entrainment around 40Hz significant? Perhaps the moving average should be replaced by binned data with indicated n?

      We prefer to show all data points instead of binning the data, so that the reader can see all of it. We agree that such a wide range on the x-axis is unnecessary. We shortened this graph to include only the firing rate range spanned by the data points.

      (4) Figure 1h Impressive result!

      Thank you!

      (5) Figure S1a: since the authors show the respiratory pattern here and there obviously was no alignment of light stimulation with inspiration, was there any correlation between the respiratory phase and efficiency of light stimulation with respect to lateral interactions?

      This is an interesting idea. In Haddad et al., 2013, Figure 7, the authors performed a similar analysis and showed that optogenetic activation of MTCs had a more pronounced effect on firing rate in the respiration phases in which the neuron fired less. However, we have not quantified the impact of lateral interactions with respect to the respiration phase. That being said, the data will be publicly available to test this question.

      (6) Figure S1c: Here the shift towards a lower firing rate seems to be obvious (see comment in Figures 1 d and e). Please also show the plot for Figure 1e.

      This panel shows the power spectrum density of the average neuron's response across trials computed over the entire stimulus window (100ms). We decided to remove this panel, as panel Figure 1d shows the evolution of the entrainment in time and, therefore, provides better insight into the effect.

      (7) Figure 2b: show the same plot also for pair 2? Why is it stated that there is no lateral suppression for lateral stimulation alone, if the MTC did not spike spontaneously in the first place and thus inhibition cannot be demonstrated?

      We use Figure 2b to demonstrate the effect of lateral inhibition, and in Figure 2c we detail the responses under each light intensity for both pairs. We think that showing the mean and SEM for one example is enough to give a sense of the effect, as in Figure 2c we show the average response across time together with a significance assessment for each pair (panels without a p-value have no significant difference between the conditions).

      However, we agree with the comment on this specific example and therefore deleted this sentence. At the population level, we found no inhibition when activating the lateral spots alone, regardless of firing rate (shown in Supplementary Figure 2a).

      (8) Figure 2d: why is there no distance-dependent color coding for the significant data points? Or, alternatively, since the distance plot is shown in 2e, perhaps drop this information altogether? Again, the moving average is problematic.

      Distance-dependent color coding is applied to all data points in this panel. Significant data points are shown in full circles and have distance-dependent color coding, which is mainly restricted to the lower part of the distance scale (cold colors).

      We used a moving average to relate to the similar result reported in Arevian 2008. In Figure 2e, the actual distance for each data point is indicated on the x-axis.

      (9) Figure 2f: the diagonal averaging method seems to neglect a lot of the data in Figure S2b, why not use radial coordinates for averaging?

      Thank you for the great suggestion. We now perform the averaging in radial coordinates, and the results are more robust and better summarize the entire data.
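      For readers unfamiliar with averaging in radial coordinates, the following is a minimal sketch and not the authors' analysis code. It assumes the data form a 2D map centered on a reference position and averages all entries that fall in the same ring of distances from that center. The function name `radial_average`, the `center` argument, and the `bin_width` parameter are hypothetical.

```python
import numpy as np

def radial_average(values, center, bin_width=1.0):
    """Average a 2D map in concentric rings around `center` (row, col).

    Illustrative only: ring k collects every entry whose distance from
    the center lies in [k * bin_width, (k + 1) * bin_width).
    NaN entries are ignored.
    """
    values = np.asarray(values, dtype=float)
    rows, cols = np.indices(values.shape)
    radius = np.hypot(rows - center[0], cols - center[1])
    ring = np.floor(radius / bin_width).astype(int)

    valid = ~np.isnan(values)
    sums = np.bincount(ring[valid], weights=values[valid])
    counts = np.bincount(ring[valid])

    # Rings without any valid entry yield NaN rather than raising.
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts
    ring_centers = (np.arange(len(counts)) + 0.5) * bin_width
    return ring_centers, means, counts
```

      Unlike averaging along a single diagonal, every entry of the map contributes to exactly one ring, so no data are discarded.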

      (10) Figure 3: These are interesting observations, but are there cumulative data on such types of pairs? Please describe and show, otherwise this can only be a supplemental observation. Regarding 3b was it always the lower light intensity that resulted in suppression and the higher in sync? Since Burton et al. 2024 have just shown that PVNs require very little input to fire!

      This figure shows several examples of entrainment and inhibition properties. As suggested, we added a population analysis (Figure 3c-d). This analysis compares the firing rate changes in pairs that evoked significant suppression or entrainment. First, we found only a few pairs in which paired activation evoked both spike entrainment and suppression. Second, the mean firing rate change of pairs that evoked significant entrainment (N=50, shown in Figure 1f in full circles) is significantly different from that of the pairs that evoked significant lateral inhibition (N=51, shown in Figure 2d in full circles).

      (11) Figure 4: This Figure and the corresponding section should be entitled "Additional GC activation... ", otherwise it might be confusing for the reader. A loss of function manipulation (local GC silencing) would be also great to have! You did this in the previous paper, why not here? Raw LFP data are not shown. In Figure 4e the reported odor response firing rate ranges only up to 40Hz, but the example in g shows a much higher frequency. Is the maximum in 4e significant? (same issue as for Figure 1g).

      We changed the phrase to ‘optogenetic GCL neurons activation’. Unfortunately, we haven’t performed experiments where we suppress GC columns. In the previous paper, we suppressed the activity of all accessible GCs, which resulted in reduced spike synchronization to the OB gamma oscillations. Silencing only the GC column is, we think, unlikely to have a substantial effect, especially if the GCs have low activity (but this needs to be tested). Furthermore, we added examples of raw LFP data for odor stimulation and odor combined with GCL column activation (see Supplementary Figure 4a).

      The instantaneous firing rate is high (~80Hz); however, the firing rate values we report in Figure 4e are averages within a window of 2 seconds (the odor duration is 1.5 seconds, and we extend the window to account for responses with a late return to baseline). The average firing rate of this example neuron in this window was 28Hz.
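      The gap between a high instantaneous rate and a much lower window-averaged rate can be illustrated with a short sketch. The spike train below is hypothetical and is not the recorded neuron; the helper names `mean_rate` and `peak_instantaneous_rate` are assumptions made for this example.

```python
import numpy as np

def mean_rate(spike_times, t_start, t_stop):
    """Spike count in [t_start, t_stop) divided by the window length (Hz)."""
    spike_times = np.asarray(spike_times, dtype=float)
    n = np.count_nonzero((spike_times >= t_start) & (spike_times < t_stop))
    return n / (t_stop - t_start)

def peak_instantaneous_rate(spike_times):
    """Inverse of the shortest inter-spike interval (Hz)."""
    isi = np.diff(np.sort(np.asarray(spike_times, dtype=float)))
    return 1.0 / isi.min() if isi.size else 0.0

# Hypothetical train: an 80 Hz burst for 0.5 s, then sparse 8 Hz firing.
burst = np.arange(40) / 80.0        # 40 spikes between 0 s and 0.5 s
tail = 0.5 + np.arange(12) / 8.0    # 12 spikes between 0.5 s and 2 s
spikes = np.concatenate([burst, tail])

window_rate = mean_rate(spikes, 0.0, 2.0)      # average over the 2 s window
peak_rate = peak_instantaneous_rate(spikes)    # set by the burst
```

      In this toy case the burst sets the peak rate, while the 2-second average is roughly a third of it, because most of the window contains sparse firing.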

      (12) Fig 5: what does "proximal" mean - does this mean stimulation of the GCs below the recorded MTC, that might actually belong to the same glomerular unit?

      Yes, by “proximal” we mean the activation of the GC in the column of the recorded MTC. However, we decided that instead of coarsely dividing the data into proximal and distal optogenetic activation of GCL neurons, we will show the data continuously to show that GC had no significant effect on MTC odor-evoked firing rates regardless of their location (Figure 5d).

      A comment on the title:

      Please tone it down: "Ensemble synchronization" is a hypothesis at this point, not directly shown in the paper. Also, the paper does not show lateral interactions between odor-activated neurons.

      We agree and have rephrased it to “Activity-dependent lateral inhibition enables the synchronization of active olfactory bulb projection neurons”.

      (1) Figure 1a, 2a scale bar missing.

      Corrected, thank you.

      (2) Figure 1 c is the "rebound" in the lateral stim trace (green) real or not significant?

      The activity during this rebound is not significantly different from the baseline activity before light stimulation.

      (3) Figure 2b legend: "lateral alone" instead of lateral?

      We appreciate the suggestion. For simplicity, we will keep it as “lateral”.

      (4) Figure 2c: some of the data plots seem to be breaking off, e.g. the blue line in the bottom third one.

      This line breaking is due to the lack of spikes in this period. The PSTHs used in all analyses result from the convolution of the spike train with a Gaussian window with a standard deviation of 50ms.
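      As an illustration of this kind of smoothing, the sketch below builds a PSTH by binning spikes and convolving the binned rate with a Gaussian kernel. It is not the authors' code. The 1 ms bin size, the truncation of the kernel at 4 standard deviations, and the function name `gaussian_psth` are assumptions; only the 50 ms standard deviation comes from the text above.

```python
import numpy as np

def gaussian_psth(trials, t_start, t_stop, sigma=0.050, dt=0.001):
    """Trial-averaged firing rate (Hz) smoothed with a Gaussian kernel.

    trials : list of arrays of spike times in seconds, one per trial.
    sigma  : kernel standard deviation in seconds (50 ms as in the text).
    dt     : bin width in seconds (assumed value).
    The window should be longer than the kernel (8 * sigma), otherwise
    np.convolve(mode="same") returns an array of kernel length.
    """
    n_bins = int(round((t_stop - t_start) / dt))
    edges = t_start + np.arange(n_bins + 1) * dt
    counts = np.zeros(n_bins)
    for spikes in trials:
        counts += np.histogram(spikes, bins=edges)[0]
    rate = counts / (len(trials) * dt)  # spikes per second in each bin

    # Gaussian kernel truncated at +/- 4 sigma and normalized to unit sum,
    # so smoothing preserves the total spike count away from the edges.
    half = int(round(4 * sigma / dt))
    kernel_t = np.arange(-half, half + 1) * dt
    kernel = np.exp(-0.5 * (kernel_t / sigma) ** 2)
    kernel /= kernel.sum()

    smoothed = np.convolve(rate, kernel, mode="same")
    centers = edges[:-1] + dt / 2
    return centers, smoothed
```

      With a truncated kernel, stretches of the window that are far from any spike evaluate to exactly zero.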

      (5) Figure 2f: Why is the x axis flopped vs 2d,e?

      This panel was mistakenly plotted that way, and was corrected.

      Comments on the text:

      Abstract - we had indicated suggestions by strike-throughs and color which are lost in the online submission system, please compare with your original text:

      Information in the brain is represented by the activity of neuronal ensembles. These ensembles are adaptive and dynamic, formed and truncated based on the animal`s experience. One mechanism by which spatially distributed neurons form an ensemble is via synchronization of their spiking activity in response to a sensory event. In the olfactory bulb, odor stimulation evokes rhythmic gamma activity in spatially distributed mitral and tufted cells (MTCs). This rhythmic activity is thought to enhance the relay of odor information to the downstream olfactory targets. However, how only specifically the odor-activated MTCs are synchronized is unknown. Here, we demonstrate that light optogenetic activation of activating one set of MTCs can gamma-entrain the spiking activity of another set. This lateral synchronization was particularly effective when both MTCs fired at the gamma rhythm, facilitating the synchronization of only the odor-activated MTCs. Furthermore, we show that lateral synchronization did not depend on the distance between the MTCs and is mediated by granule cells. In contrast, lateral inhibition between MTCs that reduced their firing rates was spatially restricted to adjacent MTCs and was not mediated by granule cells. Our findings reveal lead us to propose ? a simple yet robust mechanism by which spatially distributed neurons entrain each other's spiking activity to form an ensemble.

      Thank you. We adopted most of the changes and edited the abstract to reflect the reported results better.

      "both MTCs fired at the gamma rhythm"/this is at this point unwarranted since the mutual entrainment is not shown - tone down or present as hypothesis?

      We completely agree. This sentence was changed to “This lateral synchronization was particularly effective when the recorded MTC fired at the gamma rhythm, facilitating the synchronization of the active MTC”.

      l. 28: distance-independent instead of "spatially independent"?

      Corrected

      l. 46: are there inhibitory neurons in the ONL? Or which 6 layers are you referring to here?

      Corrected to “spanning all OB layers”.

      l. 49: "is mediated" => "likely to be mediated". Schoppa's work is in vitro and did not account for PVNs, see comment in Public Review.

      Corrected. Indeed, Schoppa's work was performed in-vitro. We cite it here since it showed that the synchronized firing of two MTC pairs depends on granule cells.

      l.52: "method"? rather "mechanism"? "specifically" instead of "only"?

      Corrected.

      l.52: perhaps more precise: a recent hypothesis is that GCs enable synchronization solely between odor-activated MTCs via an activity-dependent mechanism for GABA-release (Lage Rupprecht et al. 2020 - please cite the experimental paper here). Again, Galan has no direct evidence for GCs vs PVNs, see comment in Public Review.

      Thank you, we updated this sentence here and in the discussion and added the relevant citation.

      l. 66: spike timings instead of spike's timing?

      Corrected to spike timings

      l. 67 -71: this part could be dropped.

      We appreciate the suggestion; however, we think it is helpful for readers to see a brief summary of the main results before the results section.

      l. 76 mouse instead of mice.

      Corrected.

      l. 77: for clarification: " a single MTC"?

      In some cases, we recorded more than one cell simultaneously.

      l. 89: just use "hotspot".

      Corrected

      l. 97 instead of "change", "positive change" or "increase"?

      We left the word change, since we wanted to report that the change between hotspot alone and paired stimulation was significantly higher than zero.

      l. 104: the postsyn MTC's firing rate.

      Corrected to MTC instead of MTCs

      l.108: "distributed on the OB surface" sounds misleading, perhaps "across the glomerular map"?

      Corrected.

      l. 254: "which the MTCs form with each other"- perhaps "which interconnect MTCs".

      Corrected.

      l. 270 Additional GC activation.

      Corrected to ‘optogenetic activation of GCL neurons’

      l. 284 somewhat unclear - please expand.

      Corrected to ‘This measure minimizes the bias of the neuron's firing rate on the spike-LFP synchrony value’.

      l. 371: no odors in Schoppa et al.

      Corrected to ‘It has been shown that two active MTCs can synchronize their stimulus-evoked and odor-evoked spike timings’

      l. 406 ff. good point - but where is the transition? How does this observation rule out that GCs can mediate lateral suppression?

      This is an important question. We tested two setups of optogenetic GC activation: column activation (in this paper) and activation of all accessible GCs of the dorsal OB (Dalal & Haddad, 2022). Although the latter manipulation resulted in significant firing rate suppression, the suppression of MTCs was relatively small in anesthetized mice and even smaller in awake mice. Optogenetically activating GCs at baseline conditions resulted in a strong suppression of only the adjacent MTCs. Taken together, we think that GCs are capable of strongly inhibiting MTCs, but that this is not their main function in natural olfactory sensation.

      l. 422 ff: again, this is a hypothesis, please frame accordingly.

      Corrected to ‘Activity-dependent synchronization can enable the synchronization of odor-activated MTCs that are dispersed across the glomerular map’

      l. 551 typo.

      Corrected.

      l 556 ff: Figure 2 does not show odor responses.

      Corrected.

      l 582: Mix up of above/below and low/high?

      Corrected to ‘The values in the STA map that were above or below these high and low percentile thresholds’

      Reviewer #3 (Recommendations For The Authors):

      Line 76: "Ai39" should be corrected to "Ai32".

      Corrected. Thank you.

      Figure Legends: The legends should describe the results rather than interpret the data. For instance, the legends for Figures 1f, g, and h contain interpretations. The authors should review all legends and revise them accordingly.

      We appreciate the comment. However, we kindly disagree. We don’t see these opening sentences as interpretations but as guidance to the reader. For example, ‘Paired stimulation increases spikes’ temporal precision’ is not an interpretation; instead, it describes the finding presented in this panel. We think that legends that only repeat what can already be deduced from the graph are not helpful and, in many cases, obsolete. Explaining what we think this graph shows is common, and we prefer it as it helps the reader.

      For Figures 1d and e, it may be beneficial to add the spectrograms for the second stimulation alone.

      We show the stimulation of the hotspot alone and the stimulation of both spots. The spectrogram of the lateral stimulation alone does not show anything of importance.

      Figures 1a and 2a: Please add color bars so that readers can understand the meaning of the colors plotted.

      Color bars were added.

      Figure 3: The purpose of this figure is unclear. Why does the baseline firing rate for the paired activation differ? Is this an isolated observation, or is it observed in other units as well?

      This issue was also raised by reviewer #2. Our response to reviewer #2 is reproduced here:

      This figure shows several examples of entrainment and inhibition properties. As suggested, we added a population analysis (Figure 3c-d). This analysis compares the firing rate changes in pairs that evoked significant suppression or entrainment. First, we found only a few pairs in which paired activation evoked both spike entrainment and suppression. Second, the mean firing rate change of pairs that evoked significant entrainment (N=50, shown in Figure 1f in full circles) is significantly different from that of the pairs that evoked significant lateral inhibition (N=51, shown in Figure 2d in full circles).

      The data in Figures 4 and 5 seem to come from the same dataset as in Dalal and Haddad (2022) DOI: https://doi.org/10.1016/j.celrep.2022.110693. For example, the fluorescence image looks identical. If this is the case, the authors may want to state that the image and some of the data and analyses are reproduced.

      The recorded data shown in these figures are not reproduced from Dalal & Haddad 2022. We collected these data using GC-column activation, rather than light-activating the entire dorsal OB surface as was done in the 2022 paper.

      However, the histology image is the same and we now replaced it with a new image, which shows that the expression is restricted to the GCL.

      Figure 4d: the authors use the data plotted here to argue that the gamma entrainment is distance-independent. But there is a clear decrease over distance (e.g., delta PPC1 over 0.01 is not seen for distance beyond 1000 µm). The claim of distance independence may be an over-interpretation of the data. Peace et al. (2024) also claimed that coupling via gamma oscillations occurs over a large spatial extent.

      From a statistical point of view, we cannot state that there is a dependency on distance, as the correlation is not significant (P = 0.86). PPC1 values of 0.01 can be found at 0, 500, and 700 microns. Lower values are found at far distances, but this could result from the smaller number of data points. The reduced level of synchrony observed at distances above 1 mm could result from the reduced density of lateral interactions at these distances. That said, we rephrased the sentence as a more careful statement. Please see the rephrased sentence in the Public Review section.
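      One way to assess whether a measure such as the change in PPC1 depends on distance is a permutation test on the correlation coefficient. The response does not state which test produced the reported P value, so the sketch below is an assumed illustration rather than the authors' method; the function name `permutation_corr_test` and its parameters are hypothetical.

```python
import numpy as np

def permutation_corr_test(x, y, n_perm=10000, seed=0):
    """Two-sided permutation test for the Pearson correlation of x and y.

    Returns (r_observed, p_value). The null distribution is built by
    shuffling y, which breaks any relationship with x while keeping
    both marginal distributions unchanged.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r_obs = np.corrcoef(x, y)[0, 1]

    r_null = np.empty(n_perm)
    for i in range(n_perm):
        r_null[i] = np.corrcoef(x, rng.permutation(y))[0, 1]

    # Add-one correction keeps the estimated p-value strictly above zero.
    n_extreme = np.count_nonzero(np.abs(r_null) >= abs(r_obs))
    p_value = (n_extreme + 1) / (n_perm + 1)
    return r_obs, p_value
```

      A non-significant result from such a test means only that no dependence was detected within the sampled range, which matches the more careful wording adopted in the revised sentence.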

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Previous studies have shown that treatment with 17α-estradiol (a stereoisomer of the 17β-estradiol) extends lifespan in male mice but not in females. The current study by Li et al, aimed to identify cell-specific clusters and populations in the hypothalamus of aged male rats treated with 17α-estradiol (treated for 6 months). This study identifies genes and pathways affected by 17α-estradiol in the aged hypothalamus.

      Strengths:

      Using single-nucleus transcriptomic sequencing (snRNA-seq) on the hypothalamus from aged male rats treated with 17α-estradiol they show that 17α-estradiol significantly attenuated age-related increases in cellular metabolism, stress, and decreased synaptic activity in neurons.

      Thanks.

      Moreover, sc-analysis identified GnRH as one of the key mediators of 17α-estradiol's effects on energy homeostasis. Furthermore, they show that CRH neurons exhibited a senescent phenotype, suggesting a potential side effect of the 17α-estradiol. These conclusions are supported by supervised clustering by neuropeptides, hormones, and their receptors.

      Thanks.

      Weaknesses:

      However, the study has several limitations that reduce the strength of the key claims in the manuscript. In particular:

      (1) The study focused only on males and did not include comparisons with females. However, previous studies have shown that 17α-estradiol extends lifespan in a sex-specific manner in mice, affecting males but not females. Without the comparison with the female data, it's difficult to assess its relevance to the lifespan.

      This study was originally designed based on previous findings indicating that lifespan extension is only effective in males, leading to the exclusion of females from the analysis. The primary focus of our research was on the transcriptional changes and serum endocrine alterations induced by 17α-estradiol in aged males compared to untreated aged males. We believe that even in the absence of female subjects, the significant effects of 17α-estradiol on metabolism in the hypothalamus, synapses, and endocrine system remain evident, particularly regarding the expression levels of GnRH and testosterone. Notably, lower overall metabolism, increased synaptic activity, and elevated levels of GnRH and testosterone are strong indicators of health and well-being in males, supporting the validity of our primary conclusions. However, including female controls would enhance the depth of our findings. If female controls were incorporated, we propose redesigning the sample groups to include aged male control, aged female control, aged female treated, aged male treated, as well as young male control, young male treated, young female control, and young female treated. We regret that we cannot provide this data in the short term. Nevertheless, we believe this presents a valuable avenue for future research on this topic. In this study, we emphasize the role of 17α-estradiol in overall metabolism, synaptic function, GnRH, and testosterone in aged males and underscore the importance of supervised clustering of neuropeptide-secreting neurons in the hypothalamus.

      (2) It is not known whether 17α-estradiol leads to lifespan extension in male rats similar to male mice. Therefore, it is not possible to conclude that the observed effects in the hypothalamus, are linked to the lifespan extension.

      Thank you for the reminder. 17α-estradiol has been reported to extend lifespan in male rats, similar to male mice (PMID: 33289482). We have added this reference to the introduction in the new version.

      (3) The effect of 17α-estradiol on non-neuronal cells such as microglia and astrocytes is not well-described (Figure 1). Previous studies demonstrated that 17α-estradiol reduces microgliosis and astrogliosis in the hypothalamus of aged male mice. Current data suggest that the proportions of oligodendrocytes and microglia were increased by the drug treatment, while the proportion of astrocytes was decreased. These data might suggest possible species differences, differences in the treatment regimen, or differences in drug efficiency. This has to be discussed.

      We have reviewed reports describing changes in cell numbers following 17α-estradiol treatment in the brain, using the keywords "17α-estradiol," "17alpha-estradiol," and "microglia" or "astrocyte." Only a limited amount of data was obtained. We found one article indicating that 17α-estradiol treatment in Tg (AβPP(swe)/PS1(ΔE9)) model mice resulted in a decreased microglial cell number compared to the placebo (AβPP(swe)/PS1(ΔE9) mice), but this change was not significant when compared to the non-transgenic control (PMID: 21157032). The transgenic AβPP(swe)/PS1(ΔE9) mouse model may differ from our wild-type aging rat model in this context.

      Moreover, the calculation of cell numbers was based on visual observation under a microscope across several brain tissue slices. This traditional method often yields controversial results. For example, oligodendrocytes in the corpus callosum, fornix, and spinal cord have been reported to be 20-40% more numerous in males than in females based on microscopic observations (PMID: 16452667). In contrast, another study found no significant difference in the number of oligodendrocytes between sexes when using immunohistochemistry staining (PMID: 18709647). Such discrepancies arising from traditional observational methods are inevitable.

      We believe the data presented in this article are reliable because the cell number and cell ratio data were derived from high-throughput cell counting of the entire hypothalamus using single-cell suspension and droplet encapsulation (10x Genomics).

      (4) A more detailed analysis of glial cell types within the hypothalamus in response to drugs should be provided.

      We have provided additional enrichment analyses of genes differentially expressed between Y, O, and O.T in microglia and astrocytes in Figure 2—figure supplement 3. These supplemental data show that, unlike neurons, microglia (Micro) displayed lower levels of synapse-related cellular processes in O.T than in O.

      (5) The conclusion that CRH neurons are going into senescence is not clearly supported by the data. A more detailed analysis of the hypothalamus such as histological examination to assess cellular senescence markers in CRH neurons, is needed to support this claim.

      We agree that this claim was inappropriate, and we have changed "senescent phenotype" to "stressed phenotype" and "abnormal phenotype" in the Abstract and Results.

      Reviewer #2 (Public Review):

      Summary:

      Li et al. investigated the potential anti-ageing role of 17α-Estradiol on the hypothalamus of aged rats. To achieve this, they employed a very sophisticated method for single-cell genomic analysis that allowed them to analyze effects on various groups of neurons and non-neuronal cells. They were able to sub-categorize neurons according to their capacity to produce specific neurotransmitters, receptors, or hormones. They found that 17α-Estradiol treatment led to an improvement in several factors related to metabolism and synaptic transmission by bringing the expression levels of many of the genes of these pathways closer or to the same levels as those of young rats, reversing the ageing effect. Interestingly, among all neuronal groups, the proportion of Oxytocin-expressing neurons seems to be the one most significantly changing after treatment with 17α-Estradiol, suggesting an important role of these neurons in mediating its anti-ageing effects. This was also supported by an increase in circulating levels of oxytocin. It was also found that gene expression of corticotropin-releasing hormone neurons was significantly impacted by 17α-Estradiol even though it was not different between aged and young rats, suggesting that these neurons could be responsible for side effects related to this treatment. This article revealed some potential targets that should be further investigated in future studies regarding the role of 17α-Estradiol treatment in aged males.

      Strengths:

      (1) Single-nucleus mRNA sequencing is a very powerful method for gene expression analysis and clustering. The supervised clustering of neurons was very helpful in revealing otherwise invisible differences between neuronal groups and helped identify specific neuronal populations as targets.

      Thanks.

      (2) There is a variety of functions used that allow the differential analysis of a very complex type of data. This led to a better comparison between the different groups on many levels.

      Thanks.

      (3) There were some physiological parameters measured such as circulating hormone levels that helped the interpretation of the effects of the changes in hypothalamic gene expression.

      Thanks.

      Weaknesses

      (1) One main control group is missing from the study, the young males treated with 17α-Estradiol.

      Given that the treatment period lasts six months, which extends beyond the young male rats' age range, we aimed to investigate how 17α-Estradiol perturbs the normal aging process. Including data from young males could obscure the treatment's effects in aged males because of age-related effects, although similar effects in young and aged animals may exist. Long-term hormone treatment may also exert stronger developmental effects on young animals than on old ones. Consequently, we decided to exclude this group from our initial sample design. We apologize for this omission.

      (2) Even though the technical approach is a sophisticated one, analyzing the whole rat hypothalamus instead of specific nuclei or subregions makes the study weaker.

      The precise targets of 17α-Estradiol within the hypothalamus remain unresolved. Selecting a specific nucleus for study is challenging. The supervised clustering method described in this manuscript allows us to identify the more sensitive neuron subtypes influenced by 17α-Estradiol and aging across the entire hypothalamus, without the need to isolate specific nuclei in a disturbed hypothalamic environment.

      (3) Although the authors claim to have several findings, the data fail to support these claims. You may be referring to the claim of a senescent phenotype in Crh neurons induced by 17α-estradiol.

      Thanks. We have changed "senescent phenotype" to "stressed phenotype" or "abnormal phenotype" in the Abstract and Results to avoid this claim.

      (4) The study is about improving ageing but no physiological data from the study demonstrated such a claim with the exception of the testes histology which was not properly analyzed and was not even significantly different between the groups.

      The primary objective of this study is to elucidate the effects of 17α-Estradiol on the endocrine system in the aging hypothalamus; exploring anti-aging effects is not the main focus. From the characteristics of the aging hypothalamus, we know that down-regulated GnRH and testosterone levels, along with elevated mTOR signaling, are indicators of aging in these organs (PMID: 37886966, PMID: 37048056, PMID: 22884327). The contrasting signaling networks related to metabolism and synaptic processes significantly differentiate young and aging hypothalami, and 17α-Estradiol helps rebalance these networks, suggesting its potential anti-aging effects.

      (5) Overall, the study remains descriptive with no physiological data to demonstrate that any of the effects on hypothalamic gene expression are related to metabolic, synaptic, or other functions.

      The study focuses on investigating cellular responses and endocrine changes in the aging hypothalamus induced by 17α-estradiol, utilizing single-nucleus RNA sequencing (snRNA-seq) and a novel data mining methodology to analyze various neuron subtypes. It is important to note that exploring anti-aging effects is not the main aim of this study. Consequently, we have revised the claim in the abstract from “the effects of 17α-estradiol in anti-aging in neurons” to “the effects of 17α-estradiol on aging neurons.” We observed that the lower overall metabolism and increased expression levels of cellular processes in the synapses align with previously reported findings on 17α-estradiol. To address the lack of physiological data and the challenges in measuring multiple endocrine factors due to their volatile nature, we performed several bidirectional Mendelian randomization (MR) analyses of genome-wide association study (GWAS) data related to these serum endocrine factors to identify their mutual causal effects.

      Reviewing Editor Comment:

      Based on the Public Reviews and Recommendations for Authors, the Reviewers strongly recommend that revisions include an experimental demonstration of the physiological effects of the treatment on ageing in rats as well as the CRH-senescence link. Additional analysis of the glia would greatly strengthen the study, as would inclusion of females and young male controls. The important point was also raised that the work linking 17α-estradiol to lifespan was performed in mice, and the link with lifespan in rats is not known. Discussion of this point is recommended.

      We acknowledge that 17α-estradiol has been reported to extend lifespan in male rats, similar to findings in male mice (PMID: 33289482), and we have noted this in the Introduction. We apologize for not conducting further experiments to validate this point.

      Additionally, because we did not carry out further experiments to confirm a senescent phenotype, we have revised the description of the CRH neuron phenotype from “senescent phenotype” to “stressed phenotype”. To provide more clarity on the behavior of glial cells during treatment, we have included additional enrichment analysis data of differentially expressed genes among young (Y), old (O), and old treated (O.T) microglia and astrocytes in Figure 2—figure supplement 3. Notably, the behavior of microglia contrasts with that of total neurons concerning synapse-related cellular processes. We apologize for being unable to include female and young controls in this study.

      Reviewer #2 (Recommendations For The Authors)

      General comments:

      (1) The manuscript is very hard to read. Proofreading and editing by software or a professional seems necessary. The words "enhanced", "extensive" etc. are not always used in the right way.

      Thanks for the suggestion. We have proofread and edited the manuscript. The usage of the words "enhanced" and "extensive" was also revised in most sentences.

      (2) The numbers of animals and samples are not well explained. Is it 9 rats overall or per group? If there are 8 testes samples per group, should we assume that there were 4 rats per group? How was the pooling of the hypothalami done? Were all the hypothalami from each group pooled together? A small table with the animals per group and the samples would help.

      We appreciate your reminder regarding the initial mistake in our manuscript preparation. In the preliminary submission, we reported 9 rats based solely on sequencing data and data mining. The revised version (v1) now includes additional experimental data, with an effective total of 12 animals (4 per group). Unfortunately, we overlooked updating this information in the v1 submission. We have since added detailed information in the Materials and Methods sections: Animals, Treatment and Tissues, and snRNA-seq Data Processing, Batch Effect Correction, and Cell Subset Annotation.

      (3) The Clustering is wrong. There are genes in there that do not fall into any of the 3 categories: Neurotransmitters, Receptors, Hormones.

      We have changed the description to “Vast majority of these subtypes were clustered by neuropeptides, hormones, and their receptors within all the neurons”.

      (4) The coloring of groups in the graphs is inconsistent. It must be more homogeneous to make it easier to identify.

      We have changed the colors of groups in Fig. 1D to make the color of cell clusters consistent in Fig. 1A-D.

      (5) The groups c1-c4 are not well explained. How did the authors come up with these?

      We have added more descriptions of c1-c4 in materials and methods in the new version.

      (6) In most cases it's not clear if the authors are talking about cell numbers that express a certain mRNA, the level of expression of a certain mRNA, or both. They need to do a better job using more precise descriptions instead of using general terms such as "signatures", "expression profiles", "affected neurons" etc. It is very hard to understand if the number of neurons is compared between the groups or the gene expression.

      We have changed "signatures" to "gene signatures" to make the meaning more precise. "Affected neurons" was also changed to "sensitive neurons". We are sorry that we were not able to find a better alternative to "expression profiles".

      (7) Sometimes there are claims made without justification or a reference. For example, the claim about the senescence of CRH neurons due to the upregulation of mitochondrial genes and downregulation of adherens junction genes (lines 326-328) should be supported by a reference or own findings.

      The term "senescence" is not appropriate here. We have changed it to "stressed phenotype" or "aberrant changes" in the Abstract and Results.

      (8) Young males treated with Estradiol as a control group is necessary and it is missing.

      Your suggestion is appreciated; however, the treatment duration for aged rats (O.T) was set at 6 months, while the young rats were only 4 months old. This disparity makes it challenging to align treatment timelines for the young animals. The primary aim of this study is to investigate how 17α-estradiol perturbs the aging process, and any distinct age-related effects observed in young males might complicate our understanding of its role in aged males, although similar endocrine effects may exist in the young animals. Long-term hormone treatment may also exert stronger developmental effects on young animals than on old ones. Therefore, we decided to exclude the young treated samples from our initial study design. We apologize for any confusion this may have caused.

      Specific Comments:

      Line 28: "elevated stresses and decreased synaptic activity": Please make this clearer. Can't claim changes in synaptic activity by gene expression.

      We have changed it to "the expression level of pathways involved in synapse".

      Line 32: "increased Oxytocin": serum Oxytocin.

      We have added “serum”.

      Line 52 - 54: Any studies from rats?

      Thanks. 17α-estradiol has also been reported to have metabolic roles in rats similar to those in mice (PMID: 33289482), and we have added this study to the references. It is very useful for this manuscript.

      Line 62 - 65: It wasn't investigated thoroughly in this paper so why was it suggested in the introduction?

      We have deleted this sentence as suggested.

      Line 70: "synaptic activity" Same as line 28.

      We have changed it to "pathways involved in synaptic activity".

      Line 79: Why were aged rats caged alone and young by two? Could that introduce hypothalamic gene expression effects?

      The young males could be housed together peacefully, whereas aged males fight and must be housed individually.

      Lines 78, 99, 109-110: It is not clear how many animals per group were used and how many samples per group were used separately and/or grouped. Please be more specific.

      We have added this information to Materials and methods/Animals, treatment and tissues and Materials and methods/snRNA-seq data processing, batch effect correction, and cell subset annotation.

      Line 205: "in O" please add "versus young".

      We have changed it accordingly.

      Line 207: replace "were" with "was".

      Instead, we have changed "proportion" to "proportions".

      Line 208: replace "that" with "compared to" and after "in O.T." add "compared to?"

      We have changed it accordingly.

      Line 223: "O.T." compared to what? Figure?

      We have changed it accordingly.

      Line 227: Figure?

      We have added (Figure 1E) accordingly.

      Line 229: "synaptic activity" Same as line 28.

      We have revised it.

      Line 235: "synaptic activity" and "neuropeptide secretion" Same as line 28.

      We have revised it.

      Line 256: "interfered" please revise.

      We have changed it to "exerted".

      Line 263: "on the contrary" please revise.

      We have changed "on the contrary" to "opposite".

      Line 270: "conversed" did you mean "conserved"?

      We have changed "conversed" to "inversed".

      Line 296-298: Please explain. Why would these be side effects?

      This is hard to explain; therefore, we have deleted the words "side effects".

      Line 308: "synaptic activity" Same as line 28.

      We have changed it to "expression levels of synapse-related cellular processes".

      Line 314: "and sex hormone secretion and signaling" Isn't this expected?

      Yes, it is expected. We have added it to the sentence "and, as expected, sex hormone secretion and signaling".

      Line 325-328: Why is this senescence? Reference?

      We have added “potent” to it.

      Line 360-361: This doesn't show elevated synaptic activity.

      "elevated synaptic activity" was changed to "The elevated expression of synapse-related pathways"

      Line 363-364: "Unfortunately" is not a scientific expression and shows bias.

      We have changed it to "Notably".

      Line 376: Similar as above.

      Yes, we have changed it to "in contrast".

      Lines 382-385: This is speculation. Please move to discussion.

      Sorry for that. We consider the causal effects derived from the MR results to be evidence. We have therefore not changed it.

      Line 389: Please revise "hormone expressing".

      We have changed it accordingly.

      Line 401: Isn't this effect expected due to feedback inhibition of the biochemical pathway? Please comment.

      The binding capability of 17alpha-estradiol to estrogen receptors and its role in transcriptional activation remain core questions surrounded by controversy. Earlier studies suggest that 17alpha-estradiol exhibits at least 200 times less activity than 17beta-estradiol (PMID: 2249627, PMID: 16024755). However, recent data indicate that 17alpha-estradiol shows comparable genomic binding and transcriptional activation through estrogen receptor α (Esr1) to that of 17beta-estradiol (PMID: 33289482). Additionally, there is evidence that 17alpha-estradiol has anti-estrogenic effects in rats (PMID: 16042770). These findings imply possible feedback inhibition via estrogen receptors. Furthermore, 17alpha-estradiol likely differs from 17beta-estradiol due to its unique metabolic consequences and its potential to slow aging in males, an effect not attributed to 17beta-estradiol. For instance, neurons are also targets of 17alpha-estradiol, with Esr1 not being the sole target (PMID: 38776045). Nevertheless, the precise effective targets of 17alpha-estradiol are still unresolved.

      Line 409: This conclusion cannot be made because the effect is not statistically significant. Can say "trend" etc.

      Thanks for the recommendation. We have added "potential" in front of the conclusion.

      Line 426: "suggesting" please revise.

      Sorry, it is a verb.

      Lines 426-428: This is speculation. Please move to discussion.

      The elevated GnRH levels in O.T., observed through EIA analysis, suggest a deduction regarding the direct causal effects of 17alpha-estradiol on various endocrine factors related to feeding, energy homeostasis, reproduction, osmotic regulation, stress response, and neuronal plasticity through MR analysis. Thus, we have not amended our position. We apologize for any confusion.

      Lines 431-432: improved compared to what?

      The statement has been revised to "The most striking role of 17α-estradiol treatment revealed in this study showed that HPG axis was substantially improved in the levels of serum Gnrh and testosterone".

      Line 435: " Estrogen Receptor Antagonists". Please revise.

      Thanks for the recommendation. We have changed it to "estrogen receptor antagonists".

      Line 438: "Secrete". Please revise.

      Sorry, it is "secret".

      Lines 439-449: None of this has been demonstrated. Please remove these conclusions.

      These are not conclusions but rather intriguing topics for discussion. Given the role of 17alpha-estradiol in promoting testosterone and reducing estradiol levels in males, we believe it is worthwhile to explore the potential application of 17alpha-estradiol in increasing testosterone levels in aged males, particularly those with hypogonadism.

      Lines 450-457: No females were included in this study. Why? Also, why is this discussed? It is relevant but doesn't belong in this manuscript since it was not studied here.

      Testosterone levels are crucial for male health, while estradiol levels are essential for the health and fertility of females. Previous studies have demonstrated that 17α-estradiol does not contribute to lifespan extension in females. Given the effects of 17α-estradiol on males—specifically, its role in promoting testosterone and reducing estradiol levels—we believe it is important to discuss the potential sex-biased effects of 17α-estradiol, as this could inform future investigations. Therefore, we have chosen not to make changes to this section.

      Lines 458-459: This was not demonstrated in this article. Please remove.

      We have restricted the claim to "expression level of energy metabolism in hypothalamic neurons".

      Line 464: "Promoted lifespan extension" Not demonstrated. Please remove.

      The end of the sentence was revised to "which may be a contributing factor in promoting lifespan extension".

      Line 466: "Showed" No.

      The whole sentence was deleted in the new version.

      Line 483: "the sex-based effects". Not studied here.

      Since the changes in testosterone levels are significant in this dataset and this hormone has a sex-biased nature, we find it worthwhile to suggest this as a topic for future investigation. We have added "which needs further verification in the future" at the end of this sentence.

    1. Reviewer #1 (Public review):

      Summary:

      There is prior literature showing a robust relationship between sulcal interruptions in the posterior occipital temporal sulcus (pOTS) and reading ability. The goals of this study were to extend these findings to children examined longitudinally as they become better readers, and to examine the underlying white matter properties in individuals with and without pOTS sulcal interruptions. To do this, the authors collected longitudinal structural, diffusion, and behavioral data in 51 children (TP1 age 5.5, TP3 age 8.2 years).

      First, the authors found that the gyral gap was consistent across time within the subject. This is expected, as they state in the introduction that sulcal patterns are typically established in utero. Next, they found that children with an interrupted pOTS have higher reading scores (across a variety of measures) at timepoint (TP) 3 than children with continuous pOTS, and this was specific to the pOTS, as no associations emerged for the anterior OTS or MFS; this is again expected from prior literature. They then found that the binary presence of this gap, but not that of the anterior OTS or MFS, predicted TP3 reading performance. Further, they found that a subsample of the lowest readers at TP1 did not have differences in reading score by gyral gap, but that this difference emerged at TP3. Additionally, the gyral gap at TP1 explains a similar amount of variance in TP3 TOWRE reading skills as some behavioral measures at TP1. Examining underlying white matter in a smaller subset of children, the authors found higher MD in children with an interrupted pOTS vs. those with a continuous pOTS, which was contrary to their hypothesis, and higher local connectivity for interrupted, aligning with their hypothesis, but this difference was no longer present when accounting for TP3 reading scores. The authors conclude that structural properties, in this case, the gyral gap, may guide neural plasticity for reading.

      Strengths:

      This paper has an interesting set of longitudinal data to examine the perhaps changing relationship between sulcal interruptions in the pOTS with reading scores. I commend the authors on data collection and attention to detail in the anatomical analyses.

      Weaknesses:

      However, my enthusiasm was somewhat dampened after finding numerous prior publications on this very topic and I'm unclear as to how much more this paper adds to the current literature. Would we expect the existence of sulcal interruptions to be aligned with reading skills in older kids but not younger kids? Is the point to see if the interruptions exist prior to reading (but these children are not really prereaders)? What is the alternative: why would these interruptions not exist? After all, this anatomy is determined prenatally. Children who have pOTS interruptions at TP1 should also have these interruptions at TP3 (and indeed that is what the authors find). So how can this be the mechanism that drives plasticity? The authors also talk about the neuronal recycling hypothesis but their data cannot speak to this because they do not have fMRI data nor does their sample include only prereaders with no reading experience. The conclusions are overall overstated and not supported by the results. I think this paper could add interesting knowledge for the specific subfield of reading and the brain. However, the current state of the results, especially with the inclusion of so many trending results and the comparison of so many different processing pipelines and models, in addition to a conclusion that is not motivated by the work, makes it difficult to appreciate the paper.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews

      Reviewer #1: 

      Summary:

      In this study, Avila et al. tested the hypothesis that chronic pain states are associated with changes in the excitability of the medial prefrontal cortex (mPFC). The authors used the slope of the aperiodic component of the EEG power spectrum (= the aperiodic exponent) as a novel, non-invasive proxy for the cortical excitation-inhibition ratio. They performed source localization to estimate the EEG signals generated specifically by the mPFC. By pooling resting-state EEG recordings from three existing datasets, the authors were able to compare the aperiodic exponent in the mPFC and across the whole brain (at all modeled cortical sources) between 149 chronic pain patients and 115 healthy controls. Additionally, they assessed the relationship between the aperiodic exponent and pain intensity reported by the patients. To account for heterogeneity in pain etiology, the analysis was also performed separately for two patient subgroups with different chronic pain conditions (chronic back pain and chronic widespread pain). The study found robust evidence against differences in the aperiodic exponent in the mPFC between people with chronic pain and healthy participants, and no correlation was observed between the aperiodic exponent and pain intensity. These findings were consistent across different patient subgroups and were corroborated by the whole-brain analysis.
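
For readers unfamiliar with the measure, the idea of an aperiodic exponent can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it assumes the power spectrum follows a pure 1/f^χ law and estimates χ as the negated slope of a straight-line fit in log-log space. The function name, the 2-40 Hz fitting range, and the synthetic spectrum are all hypothetical choices made for the example; real EEG spectra also contain oscillatory peaks that dedicated fitting tools model separately.

```python
import numpy as np


def aperiodic_exponent(freqs, power, fmin=2.0, fmax=40.0):
    """Estimate the aperiodic exponent of a power spectrum.

    Fits log10(power) = offset - exponent * log10(freq) by least squares
    over the (hypothetical) frequency range [fmin, fmax].
    Returns (exponent, offset).
    """
    freqs = np.asarray(freqs, dtype=float)
    power = np.asarray(power, dtype=float)
    # Restrict the fit to the chosen frequency band.
    mask = (freqs >= fmin) & (freqs <= fmax)
    slope, offset = np.polyfit(np.log10(freqs[mask]), np.log10(power[mask]), 1)
    # The exponent is defined as the negated log-log slope.
    return -slope, offset


# Synthetic spectrum with a known exponent of 1.5 and offset log10(10) = 1.
freqs = np.linspace(1.0, 50.0, 200)
power = 10.0 / freqs ** 1.5
exponent, offset = aperiodic_exponent(freqs, power)
```

In this convention a larger exponent means a steeper spectrum, which is the quantity the study compares between groups.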

      Strengths:

      The study is based on sound scientific reasoning and rigorously employs suitable methods to test the hypothesis. It follows a pre-registered protocol, which greatly increases the transparency and, consequently, the credibility of the reported results. In addition to the planned steps, the authors used a multiverse analysis to ensure the robustness of the results across different methodological choices. I find this particularly interesting, as the EEG aperiodic exponent has only recently been linked to network excitability, and the most appropriate methods for its extraction and analysis are still being determined. The methods are clearly and comprehensively described, making this paper very useful for researchers planning similar studies. The results are convincing, and supported by informative figures, and the lack of the expected difference in mPFC excitability between the tested groups is thoroughly and constructively discussed.

      We are grateful for the appreciation of the strengths of our study.  

      Weaknesses:

      Firstly, although I appreciate the relatively large sample size, pooling data recorded by different researchers using different experimental protocols inevitably increases sample variability and may limit the availability of certain measures, as was the case here with the reports of pain intensity in the patient group. Secondly, the analysis heavily relies on the estimation of cortical sources, an approach that offers many advantages but may yield imprecise results, especially when default conduction models, source models, and electrode coordinates are used. In my opinion, this point should be discussed as well.

      We agree that the heterogeneous sample of people with chronic pain increases variability and limits the availability of clinical measures. We further agree on the limitations of source space analysis. Therefore, we have added these limitations to the discussion section.

      Reviewer #2: 

      Summary:

      This study evaluated the aperiodic component in the medial prefrontal cortex (mPFC) using resting-state EEG recordings from 149 individuals with chronic pain and 115 healthy participants. The findings showed no significant differences in the aperiodic component of the mPFC between the two groups, nor was there any correlation between the aperiodic component and pain intensity. These results were consistent across various chronic pain subtypes and were corroborated by whole-brain analyses. The study's robustness was further reinforced by preregistration and multiverse analyses, which accounted for a wide range of methodological choices.

      Strengths:

      This study was rigorously conducted, yielding clear and conclusive results. Furthermore, it adhered to stringent open and reproducible science practices, including preregistration, blinded data analysis, and Bayesian hypothesis testing. All data and code have been made openly available, underscoring the study's commitment to transparency and reproducibility.

      We appreciate the appraisal of the strengths of our study, highlighting our efforts in open and reproducible science practices.

      Weaknesses:

      The aperiodic exponent of the EEG power spectrum is often regarded as an indicator of the excitatory/inhibitory (E/I) balance. However, this measure may not be the most accurate or optimal for quantifying E/I balance, a limitation that the authors might consider addressing in the future.

      We are grateful for this suggestion and fully agree that the aperiodic component of the power spectrum is not necessarily the most optimal and accurate measure for quantifying E/I balance. We have now included this limitation in the discussion section.

      Recommendations for the authors

      Reviewer #1: 

      (1) In the Results section, it might be helpful to provide the mean values of the aperiodic exponent (before age correction) for all tested groups and subgroups. As this measure is still not widely used, providing these values would allow readers to better understand the normal range of the aperiodic exponent.

      We have added the mean values of the aperiodic exponent and their standard deviations (before age correction) to the manuscript's results section (pages 6 and 11).
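      For readers unfamiliar with the measure, the following is a minimal sketch of how an aperiodic exponent can be approximated. It assumes that power falls off as 1/f^χ, so the exponent χ is the negative slope of a straight line fitted to log power against log frequency. This is an illustrative simplification and not the authors' analysis pipeline, which is not described in this response; dedicated tools also model oscillatory peaks. The function name, the 2-40 Hz range, and the synthetic spectrum are all hypothetical.

```python
import math


def aperiodic_exponent(freqs, power):
    """Approximate the aperiodic exponent as the negative slope of an
    ordinary least-squares line fitted to log10(power) vs. log10(freq).

    Simplification: oscillatory peaks are neither modelled nor removed.
    """
    x = [math.log10(f) for f in freqs]
    y = [math.log10(p) for p in power]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return -slope


# Synthetic spectrum (hypothetical values): pure 1/f^1.5 decay, 2-40 Hz.
freqs = [float(f) for f in range(2, 41)]
power = [10.0 / f ** 1.5 for f in freqs]
exponent = aperiodic_exponent(freqs, power)
```

      On this noiseless synthetic spectrum the fit recovers the exponent used to generate it; a steeper spectrum gives a larger exponent.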

      (2) When reporting the aperiodic exponent across all cortical sources (Q3), I think it would be useful to include the raw values in Figure 6 in the main text rather than in the Supplementary Materials. At a glance, these plots seem to suggest that the aperiodic exponent differs between groups in the occipital and parietal regions, even though no tests were significant after correcting for multiple comparisons. Maybe this observation also deserves a mention in the text and possibly in the Discussion..?

      We have moved the report on the aperiodic exponent across all cortical sources from the Supplementary Material to the main text. It is now Fig. 7 of the main manuscript. Moreover, we agree that the plots suggest group differences in certain brain regions. However, according to our rigorous open and reproducible science practices and pre-registration, we prefer not to speculate on these non-significant findings. 

      (3) In the Methods section, when describing the participants, the authors state that "Gender was balanced across both groups...". It might be better to avoid referring to the datasets as "balanced," considering that the sample includes almost twice as many females as males.

      We have replaced the misleading statement with the more precise statement that “the gender ratio of both groups was similar.”

      (4) In the Methods section, when describing the source localization, I find it slightly confusing that the authors first mention the anterior cingulate cortex as a possible label included in the mPFC cortical parcels but then state that the version of the cortical atlas used did not contain such a label. It might be simpler not to mention the cingulate cortex at all.

      We have deleted the misleading sentence from the manuscript.  

      Reviewer #2: 

      (1) The aperiodic exponent of the EEG power spectrum is often considered an indicator of the excitatory/inhibitory (E/I) balance, but this measure can be susceptible to artifacts. It is important to acknowledge this limitation and consider exploring alternative measures to quantify the E/I ratio in future studies.

      We are grateful for this suggestion and fully agree that the aperiodic component of the power spectrum is not necessarily the most optimal and accurate measure for quantifying E/I balance. We have now included this limitation in the discussion section.

      (2) The study assumed a linear relationship between the E/I ratio (represented by the aperiodic exponent of the EEG power spectrum) and chronic pain. However, this assumption may not hold true in all cases, and this point could be discussed in the study.

      We fully agree that the relationship between the E/I ratio and chronic pain might not be a linear one and have added this point to the discussion section.

      (3) The aperiodic component was characterized in eyes-closed resting-state EEG recordings, although EEG data were collected in both eyes-closed and eyes-open conditions. The authors could also consider assessing the aperiodic component from EEG data with eyes open.

      We thank the reviewer for this suggestion. We have focused our analysis on eyes-closed recordings since these recordings are usually less contaminated by artifacts than eyes-open recordings. Moreover, in our current datasets, some participants were missing eyes-open recordings. We agree that performing similar analyses for the eyes-open recordings would also be interesting. However, adding these analyses would double the amount of data included in the manuscript, which would likely overload it. We have, therefore, now added a statement to the discussion that future studies should also analyze eyes-open EEG recordings.

      (4) The EEG power spectrum was calculated from signals after source reconstruction, a crucial step for targeting specific brain regions. However, this process can introduce potential signal distortions, such as variations in source waveforms depending on different regularization parameters. To ensure the robustness of the results, the authors could perform the same analysis at the sensor level, for example, using signals recorded at Fz.

      We agree on the potential shortcomings and limitations of source space analysis and have added this limitation to the discussion section.

      (5) It would be beneficial to present the raw EEG power spectrum averaged across subjects for each condition, along with the scalp distribution of the aperiodic exponent. This would enhance readers' understanding of the study and help demonstrate the quality of the data.

      We are grateful for this suggestion and added the power spectrum for each condition and the scalp distribution of the aperiodic exponent to the Supplementary Material.

      (6) Linear regression models were used to control for the influence of age on aperiodic exponents and pain intensity ratings. However, it is unclear why other relevant variables, such as gender and medication use, were not considered.

      We agree that the aperiodic exponent might be influenced by gender and medication. As these analyses had not been included in our pre-registered analysis plan, we have not performed them. Moreover, although we agree that gender might have an impact, we have not found any evidence for this so far. Regarding medication, we fully agree that medication can influence the measure. However, medication was very heterogeneous, including drugs with fundamentally different mechanisms of action. Thus, we do not see a robust way to appropriately analyze these effects with sufficient statistical power. We have now added this important point to the discussion section.
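      As an illustration of what controlling for age by linear regression can look like, the sketch below regresses a measure on a single covariate and keeps the residuals. It is an assumption about the general technique, not a reproduction of the pre-registered analysis; the function name and the example values are hypothetical and are not study data.

```python
def residualize(values, covariate):
    """Return the residuals of `values` after an ordinary least-squares
    regression on one covariate (with intercept)."""
    n = len(values)
    mean_c = sum(covariate) / n
    mean_v = sum(values) / n
    sxx = sum((c - mean_c) ** 2 for c in covariate)
    sxy = sum((c - mean_c) * (v - mean_v) for c, v in zip(covariate, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_c
    return [v - (intercept + slope * c) for v, c in zip(values, covariate)]


# Made-up illustrative numbers, not taken from the study.
ages = [25.0, 35.0, 45.0, 55.0, 65.0]
exponents = [1.60, 1.50, 1.45, 1.30, 1.25]
age_corrected = residualize(exponents, ages)
```

      By construction the residuals have zero mean and are uncorrelated with the covariate. Adjusting for further variables such as gender or medication would require a multiple-regression model, which this sketch does not cover.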

      (7) The authors may consider addressing or discussing the impact of inter-individual variability on the negative results, particularly given that the data were derived from multiple experiments.

      We agree that the heterogeneous sample of people with chronic pain increases variability and limits the availability of clinical measures. We have added this limitation to the discussion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In the first half of this study, Pham et al. investigate the regulation of TEAD via ubiquitination and PARylation, identifying an E3 ubiquitin ligase, RNF146, as a negative regulator of TEAD activity through an siRNA screen of ubiquitin-related genes in MCF7 cells. The study also finds that depletion of PARP1 reduced TEAD4 ubiquitination levels, suggesting a certain relationship between TEAD4 PARylation and ubiquitination which was also explored through an interesting D70A mutation. Pham et al. subsequently tested this regulation in D. melanogaster by introducing Hpo loss-of-function mutations and rescuing the overgrowth phenotype through RNF146 overexpression.

      In the second half of this study, Pham et al. designed and assayed several potential TEAD degraders with a heterobifunctional design, which they term TEAD-CIDE. Compounds D and E were found to effectively degrade pan-TEAD, an effect which could be disrupted by treatment with TEAD lipid pocket binders, proteasome inhibitors, or E1 inhibitors, demonstrating that the TEAD-CIDEs operate in a proteasome-dependent manner. These TEAD-CIDEs could reduce cell proliferation in OVCAR-8, a YAP-deficient cell line, but not SK-N-FI, a Hippo pathway independent cell line. Finally, this study also utilizes ATAC-seq on Compound D to identify reductions in chromatin accessibility at the regions enriched for TEAD DNA binding motifs.

      Strengths:

      The study provides compelling evidence that the E3 ubiquitin ligase RNF146 is a novel negative regulator of TEAD activity. The authors convincingly delineate the mechanism through multiple techniques and approaches. The authors also describe the development of heterobifunctional pan-degraders of TEAD, which could serve as valuable reagents to more deeply study TEAD biology.

      Weaknesses:

      The scope of this study is extremely broad. The first half of the paper highlights the mechanisms underlying TEAD degradation; however, the connection to the second half of the paper on small molecule degraders of TEAD is jarring, and it seems as though two separate stories were combined into this single massive study. In my opinion, the study would be stronger if it chose to focus on only one of these topics and instead went deeper.

      We thank the reviewer for the thoughtful feedback. In our view, the two parts of the paper are inherently related as they both focus on proteasome-mediated degradation of TEADs. We first demonstrated that TEAD can be turned over by the ubiquitin-proteasome system under endogenous conditions and identified a PARylation-dependent E3 ligase, RNF146, as a major regulator of TEAD stability. Intriguingly, we observed that the four TEAD paralogs show different levels of polyubiquitination, with some of them being highly stable in cells. These observations raised the question of whether the activity of the ubiquitin-proteasome system could be further enhanced pharmacologically to effectively target TEADs. We then tackled this question by providing a proof-of-concept demonstration that engineered heterobifunctional protein degraders can effectively degrade TEADs in cells and can be exploited as a therapeutic strategy for treating Hippo-dependent cancers.

      Additionally, the figure clarity needs to be substantially improved, as readability and interpretation were difficult in many panels. Lastly, there are numerous typos and poor grammar throughout the text that need to be addressed.

      We appreciate the suggestions from the reviewer and have updated the figures with high resolution images. We also corrected typos and grammatical errors in the text.

      Reviewer #2 (Public Review):

      The paper is made of two parts. One deals with RNF146, the other with the development of compounds that may cause TEAD degradation. The two parts are rather unrelated to each other.

      The main limit of this work is the lack of evidence that TEAD factors are in fact regulated by the proteasome and ubiquitylation under endogenous conditions. Also lacking is the demonstration that TEADs are labile proteins to the extent that such quantitative regulation at the level of stability can impact on YAP-TAZ biology. Without these two elements, the relevance and physiological significance of all these data is lacking.

      As for the development of new inhibitors of TEAD, this is potentially very interesting but underdeveloped in this manuscript. Irrespectively, if TEAD is stable, these molecules are likely lead compounds of interest. If TEAD is unstable, as entertained in the first part of the paper, then these molecules are likely marginal.

      We thank the reviewer for evaluating our manuscript. As the reviewer pointed out, the paper aimed to address 1) whether TEAD is regulated by the proteasome and ubiquitination under endogenous conditions, and 2) whether TEAD can be inhibited through pharmacologically induced degradation. First, we demonstrated that TEAD is ubiquitinated in cells and mapped the lysine residues that are poly-ubiquitinated (Fig. 1). Next, we identified RNF146 as a major E3 ligase that ubiquitinates TEADs and reduces their stability. Third, we show that RNF146-mediated TEAD ubiquitination is functionally important: RNF146 suppresses TEAD activity, and RNF146 genetically interacts with Hippo pathway components in fruit flies. Furthermore, as we showed in Fig. S2H, RNF146 does not affect TEAD1 and TEAD4 to the same extent. Across all four cell lines evaluated, TEAD1 is more stable than TEAD4, raising the question of whether more consistent degradation of different TEAD paralogues could be achieved. To this end, we demonstrated that while the TEAD family of proteins is labile under endogenous conditions, more complete degradation of the TEAD proteins could be achieved using a heterobifunctional CRBN degrader. We further characterized these TEAD degraders in a series of cellular and genomic assays to demonstrate their cellular activity, selectivity, and inhibitory effects against YAP/TAZ target genes. We believe these degrader compounds would be of great interest to the Hippo community. We have revised the main text to clarify these points.

      Here are a few other specific observations:

      (1) The effect of MG is shown in a convoluted way, by MS. What about endogenous TEAD protein stability?

      We thank the reviewer for the question. The MS experiment shown in Figure 1 is a standard KGG experiment, where we used MS to map ubiquitination sites on TEADs. The graphical representation of the process is included in Fig. 1C, and the details of the procedure are included in the Methods section. Fig. 1D shows the different KGG peptides detected with or without MG-132 treatment. Fig. 1E shows the quantified abundance of each of the peptides across the four conditions indicated at the bottom of the plot. Regarding endogenous TEAD stability, we conducted cycloheximide chase experiments to assess the stability of endogenously expressed TEAD isoforms upon RNF146 knockdown (Fig. S2G and S2H). Using isoform-specific antibodies, we demonstrated that siRNF146 significantly stabilized TEAD4 in multiple cell lines, including H226, PATU-8902, Detroit-562, and OVCAR-8 (Fig. S2G, S2H, and S2I), supporting the notion that RNF146 is a negative regulator of TEAD stability. Notably, the effect of siRNF146 on TEAD1 stability was less pronounced, and TEAD1 is more stable than TEAD4 across all four cell lines. These results are consistent with the lower level of ubiquitination of TEAD1 (Fig. 1A) and are corroborated by various biochemical, molecular, and genetic characterizations (Fig. 3A-C and S3E).

      (2) The relevance of siRNF on YAP target genes of Fig.2D is not statistically significant.

      We thank the reviewer for this comment. We have now removed the claim of statistical significance.

      (3) All assays are with protein overexpression and Ub-laddering

      We thank the reviewer for the comment. To examine the ubiquitination level of TEAD proteins, we adopted an in vivo ubiquitination assay as described in our Materials and Methods section. To our knowledge, this assay is very standard in the ubiquitination field. Furthermore, as mentioned above, we have included in our revised manuscript cycloheximide chase experiments to assess the stability of endogenously expressed TEAD isoforms upon RNF146 knockdown (Fig. S2G and S2H). In addition to the overexpression system, we also assessed endogenously expressed TEAD using isoform-specific antibodies. We demonstrated that siRNF146 firmly stabilized TEAD4 in multiple cell lines, including H226, PATU-8902, Detroit-562, and OVCAR-8 (Fig. S2G with quantification and t-test), supporting the notion that RNF146 is a negative regulator of TEAD stability.

      (4) An inconsistency exists on the only biological validation (only by overexpression) on the fly eye size. RNF gain in Fig4C is doing the opposite of what is expected from what is portrayed here as a YAP/TEAD inhibitor: RNF gain is shown to INCREASE eye size, phenocopying a Hippo loss of function phenotype. According to the model proposed, RNF addition should reduce eye size. The authors stated that " This is in contrast to the anti-growth effect of RNF-146 in the Hpo loss-of-function background and indicates RNF146 may regulate other genes/pathways controlling eye sizes besides its role as a negative regulator of Sd/yki activity". This raises questions on what the authors are really studying: why, according to the authors, these caveats should occur on the controls, and not when they study Hpo mutants?

      We thank the reviewer for the comment. We acknowledge the complexity of the fly phenotype compared to tumor growth. TEAD (Sd) is not the only substrate of RNF146 in the fly. For instance, RNF146 is known to positively regulate Wnt signaling by degrading Axin. Previous studies have shown that activation of the Wnt signaling pathway by removal of the negative regulator Axin from clones of cells results in an overgrowth phenotype (Legent and Treisman, 2008). The overgrowth phenotype that we observed when overexpressing RNF146 alone is therefore likely due to the role of RNF146 in regulating other signaling pathways. Importantly, we showed that upon Hippo loss of function, overexpression of RNF146 can rescue the Hippo overgrowth phenotype (Fig 4B). This differential outcome of RNF146 expression in wildtype versus Hippo-deficient flies indicates that the genetic interactions between RNF146 and Hippo pathway components altered the phenotypic outcome, and that the phenotype we observe with RNF146 overexpression in a Hippo loss-of-function background is not simply due to additive effects of functional loss of either component alone.

      Complementary to these overexpression data, we showed that knockdown of RNF146 increased the eye size further (Fig. S4A, B) in Hippo loss of function background, further supporting the role of RNF146 as a negative regulator of the overall pro-growth signals induced by yki upon Hippo loss of function.

      (5) The role of TEAD inactivation on YAP function is already well known. Disappointingly, no prior literature is cited. In any case, this is a mere control.

      We thank the reviewer for the suggestion. We have cited several published reviews that touch upon this aspect of the TEAD-YAP function, including Calses et al., 2019; Dey et al., 2020; Halder and Johnson, 2011; Wang et al., 2018. We are open to your suggestions on additional citations.

      (6) The second part of the paper on the Development and Screening of pan-TEAD lipid pocket degraders is interesting but unconnected to the above. The degradation pathway it involves has nothing to do with the enzyme described in the first figures.

      We thank the reviewer for the comment. We acknowledge that our paper broadly covers two aspects. We believe that they are inherently connected as they both address ubiquitin/proteasome-mediated TEAD degradation and the functional consequences of TEAD degradation. Given the increasing interest in targeting TEAD/YAP/TAZ in cancers, we think the pharmacological approaches to enhance TEAD degradation using orthogonal E3 ligases provide an important toolbox to understand how this pathway can be regulated under both physiological and pathological conditions. While RNF146 appears to be a major E3 ligase responsible for TEAD turnover under physiological conditions, we showed that the four TEAD paralogs have different poly-ubiquitination levels (Fig. 1A), and are differentially labile in cells (Fig. S2G-I). These observations raised the question of whether the activity of the ubiquitination-proteasome system could be further enhanced to allow more complete removal of TEADs. To this end, we demonstrated that E3 ligases that do not regulate TEAD under endogenous conditions can be leveraged pharmacologically to achieve deep TEAD degradation, thus providing a proof of concept that TEADs can be targeted simultaneously using such approaches. Finally, in addition to establishing the basic biological concept linking RNF146 to TEAD degradation, the compounds we engineered will serve as valuable chemical tools for future studies of TEAD biology and the Hippo pathway in cancers and beyond.

      (7) The role of CIDE on YAP accessibility to Chromatin is superficially executed. Key controls are missing along with the connection with mechanisms and prior knowledge of TEAD, YAP, chromatin, and other TEAD inhibitors, just to mention a few.

      We used ATAC-seq to assess chromatin accessibility, comparing cells treated with DMSO and two different concentrations of compound D. We acknowledge there are small molecule inhibitors of TEADs that can modulate accessibility of YAP binding sites. Potential mechanistic differences between TEAD degraders and TEAD small molecule inhibitors will be a future area of investigation.

      (8) The physiological relevance and the mechanistic interpretation of what should be in the ATAC seq in ovcar cells is missing.

      We showed in Fig. 7A-D the dose response of OVCAR cells to the TEAD degraders. As evident from those experiments, TEAD degraders inhibit the proliferation of OVCAR cells as expected from their dependencies on the TEAD/YAP/TAZ transcription complex. In the ATAC-seq experiment, we showed that the canonical TEAD/YAP/TAZ target genes ANKRD1 and CCN1 have reduced chromatin accessibility at their promoter/enhancer regions (Fig. 8C). By unbiased motif and pathway analyses, we show that TEAD binding sites and YAP signatures are most significantly downregulated in OVCAR-8 cells (Fig. 8D-E). These results are incorporated into the results section of the manuscript.

      Reviewer #3 (Public Review):

      Summary

      Pham, Pahuja, Hagenbeek, et al. have conducted a comprehensive range of assays to biochemically and genetically determine TEAD degradation through RNF146 ubiquitination. Additionally, they designed a PROTAC protein degrader system to regulate the Hippo pathway through TEAD degradation. Overall, the data appears robust. However, the manuscript lacks detailed methodological descriptions, which should be addressed and improved before publication. For instance, the methods used to analyze the K48 ubiquitination site on TEAD and the gene expression analysis of Hippo Signaling are unclear. Furthermore, the multiple proteomics, RNA-seq, and ATAC-seq data must be made publicly available upon publication to ensure reproducibility. Most of the main figures are of low resolution, which needs addressing.

      We thank the reviewer for evaluating our manuscript. All of the data will be uploaded to public databases. We apologize for the low figure resolution and have updated the figures in the revised manuscript. We also expanded the methods section with more details.

      Strengths:

      - A broad range of assays was used to robustly determine the role of RNF146 in TEAD degradation.

      - Development of novel PROTAC for degrading TEAD.

      Weaknesses:

      - An orthogonal approach is needed (e.g., PARP1 inhibitor) to demonstrate PARP1's dependency in TEAD ubiquitination.

      We thank the reviewer for the suggestion. We had attempted to assess the effect of PARP inhibitors (including veliparib and olaparib) on TEAD ubiquitination, but the data are relatively complex to interpret. Besides inhibiting PARP1/2 catalytic activities, these PARP inhibitors also trap PARP on chromatin. Hence, these inhibitors could induce other cellular changes in addition to inhibiting the catalytic activities of PARP1/2. Given these potential pitfalls, we decided not to include these inconclusive data. Even though the experiments with PARP inhibitors were inconclusive, our study shows, using an anti-PAR antibody, that TEAD2 and TEAD4 are PARylated in cells (Fig. 3B). Furthermore, we show that mutation of the D70 PARylation site to alanine largely abolished TEAD4 ubiquitination in cells, suggesting PARylation is important for TEAD4 ubiquitination. In addition, PARP1 depletion by siRNA and CRISPR guide RNA reduced TEAD2 and TEAD4 ubiquitination levels, indicating PARP1 is one of the PARPs responsible for TEAD PARylation in cells.

      - The data from Table 2 is unclear in illustrating the association of identified K48 ubiquitination with TEAD4, especially since the experiments were presumably to be conducted on whole cell lysates with KGG enrichment. This raises the possibility that the K48 ubiquitination could originate from other proteins. Alternatively, if the authors performed immunoprecipitation on TEAD followed by mass spectrometry, this should be explicitly described in the text and materials and methods section.

      We thank the reviewer for this question. The experiment was an IP-mass spectrometry study in a TEAD4 amplified cell line model (PATU-8902) after IP with a pan-TEAD antibody. Here, we observed K48 ubiquitin and other ubiquitin linkages as shown in the Supplementary Table S2 of the original submission. Although it is possible that the IP wash steps could be more stringent, we did enrich for TEAD protein prior to mass spectrometry. While the ubiquitin linkage signals may come mainly from TEAD protein (mainly TEAD4), we recognized that some signals may come from other proteins. Given the caveat, we have now removed the table from our paper and updated the text accordingly.

      - Figure 2D: The methodology for measuring the Hippo signature is unclear, as is the case for Figures 7E and F regarding the analysis of Hippo target genes.

      We apologize for the lack of clarity. In short, we developed the Hippo signature using machine learning and chemogenomics, as described previously (Pham et al. Cancer Discovery 2021). In the revised version of the manuscript, we added the methodology for measuring the Hippo signature and cited our previous publication where we developed the Hippo signature.

      - Figure S3F requires quantification with additional replicates for validation.

      We thank the reviewer for the suggestion. We added the quantification for the blot and indicated the replication in the figure legend. Note that Figure S3F is now S3G.

      - There is a misleading claim in the discussion stating "TEAD PARylation by PAR-family members (Figure 3)"; however, the demonstration is only for PARP1, which should be corrected.

      We apologize for the statement. We observed both PARP1 and PARP9 in our TEAD IP-mass spec (now Figure S3E), which suggests both PARP-family members could be involved. Nonetheless, we primarily focus on PARP1, which is widely expressed across cell line models and present in higher abundance. Thus, our study only experimentally validated PARP1's role in regulating TEAD.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      General comments:

      (1) Please provide a smoother transition and well-defined connection between the first and second parts of the manuscript. The manuscript reads as two papers that were combined into one, without much attempt to disguise the fact.

      We thank the reviewer for the suggestion. We have added a transition paragraph to connect the two parts more smoothly. We acknowledge that our paper broadly covers two aspects. However, they both touch upon TEAD ubiquitination and degradation. In the first part of the manuscript, we described TEAD biology and showed that TEADs are post-translationally modified and subsequently regulated through PARylation-dependent RNF146-mediated ubiquitination. In the second part, we highlighted our ability to leverage the PROTAC system to degrade labile oncogenic proteins such as TEADs. In addition to the biological concept, the compounds we engineered will serve as valuable chemical tools for future studies of TEAD biology and the Hippo pathway in cancers and beyond.

      (2) To confirm the proteasome mechanism of action, viability assays should be conducted with a CRBN KO.

      We thank the reviewer for the comment. In Figure 6E, we measured TEAD protein levels under CRBN knockdown and observed an expected change in TEAD stability. This observation and the other data presented in Figure 6 suggest that TEAD proteins are targeted for proteasomal degradation under compound D treatment.

      (3) As a control, sgPARP1 or PARP1 inhibitors should be used to confirm TEAD PARylation reduction.

      We thank the reviewer for the suggestion. We had attempted to assess the effect of PARP inhibitors (including veliparib and olaparib) on TEAD ubiquitination, but the data are relatively complex to interpret. Besides inhibiting PARP1/2 catalytic activities, PARP inhibitors also trap PARP on chromatin. Hence, these inhibitors could induce other cellular changes in addition to inhibiting the catalytic activities of PARP1/2. Given these pitfalls, we decided not to include these inconclusive data. Even though the experiments with PARP inhibitors were inconclusive, our study shows, using an anti-PAR antibody, that TEAD2 and TEAD4 are PARylated in cells (Fig. 3B). Furthermore, we show that mutation of the D70 PARylation site to alanine largely abolished TEAD4 ubiquitination in cells, suggesting PARylation is important for TEAD4 ubiquitination. In addition, PARP1 depletion by siRNA and CRISPR guide RNA reduced TEAD2 and TEAD4 ubiquitination levels, indicating PARP1 is one of the PARPs responsible for TEAD PARylation in cells.

      (4) MS data looks convincing but an FDR of 1% should be applied - this is the accepted standard in the proteomics field. Please research the data with the more stringent filter.

      We thank the reviewer for the suggestion. Our IP-MS experiment comparing siNTC versus siYAP1/WWTR1 in Patu-8902 cells did not have replicates and FDR could not be derived. Therefore, we listed the raw data in Supplemental Table 3 without showing statistics. To validate the putative interactions identified by IP-MS, we performed IP-Western experiments to confirm that TEAD4 interacts with PARP1 (Figure 3A). It is important to note that in addition to our report, the interaction between PARP1 and TEADs has been observed in other publications (Calses et al., 2023; Yang et al., 2017). We have included more details of the IP-MS experiment reported in Supplemental Table 3 in the revised manuscript and cited previous work reporting TEAD-PARP1 interaction.

      (5) Proofread the manuscript more thoroughly for typos and grammatical errors.

      We thank the reviewer for raising this issue and have addressed it in the revision.

      (6) Improve figure clarity (e.g., clearly labeling graph axes).

      We apologize for the oversight. The revised manuscript contains high resolution figures.

      Specific points:

      Generally, the manuscript could use additional proofreading for grammar and clarity. It would not be practical to list all, but some representative examples are listed below:

      Run-on: "They act through an event-driven mechanism instead of conventional occupancy-driven pharmacology, in addition, target protein degradation removes all functions of the target protein and may also lead to destabilization of entire multidomain protein complexes."

      Typo: "Compound D exhibits significant inhibition of cell proliferation and downstream signaling compared to compound A, a reversible TEAD lipid pocket binder that lack the ubiquitin ligase binding moiety."

      Typo: "Thus, we sought to deplete TEAD proteins by directly target them for ubiquitination and proteasomal degradation via pharmacologically inducing interactions between TEAD and other abundantly expressed and PARylation-independent E3 ligases."

      Typo: "Compound A is a close in analog of Compound B as described previously (Holden et al., 2020)."

      We have revised the manuscript and corrected the typos and grammatical errors listed above, as well as others throughout the text.

      Specific comments on the figures are listed below:

      Figure 2:

      • Figures 2B and 2C should be separated into separate panels for clarity.

      We have updated Figures 2B and 2C as suggested.

      • Figure 2C - "To further assess the function of RNF146, we depleted RNF146 by either sgRNA or siRNA." This should say either CRISPR-Cas9 KO or siRNA-mediated knockdown.

      We thank the reviewer for the suggestion. We revised the text to address this issue.

      • Figure 2D - y-axis is not labeled well/clearly. Additionally, there are different resolutions for the p-values on the graph (the top p-value is slightly clearer than the other two, suggesting either a different font was used or the value was pasted on top of a picture of the graph at a different resolution).

      We updated the figures according to the suggestions.

      • Figure S2A - "We identified three ubiquitin ligases - RNF146, TRAF3, and PH5A - as potential negative regulators for the Hippos pathway from the primary screen using the luciferase reporter." However, the siPHF5A data appears to decrease luciferase levels whereas siRNF146 and siTRAF3 increase it.

      We thank the reviewer for catching this error. We removed PH5A from this list.

      Figure 3:

      • Figure 3A - label more clearly. Is this an endogenous TEAD4 co-IP?

      We thank the reviewer for the suggestion. The experiment was an IP-mass spectrometry study in a TEAD4 amplified cell line model (PATU-8902) with pan-TEAD antibody. We have included the details in the figure legends. Figure 3A is now Figure S3E in the revised manuscript.

      • Figure 3C - why are the dark and light exposures not matching/corresponding? In the dark exposure, there are two particularly dark bands, the darkest of which is at the top of the gel. However, this darkest band disappears in the light exposure gel. Additionally, the last lane is marked as +TEAD2 and +TEAD4. Not sure if this is a typo, and meant to be only +TEAD4? Seems a bit strange to have a double TEAD lane.

      We thank the reviewer for this comment and apologize for the oversight. There was a typo in the label. The light exposure image was from a replicate run instead of the same run, therefore the lanes didn’t all match up. We have removed the light exposure panel to resolve the confusion. (Figure 3B).

      Figure 5:

      • Figure 5B - why is shTEAD1-4/Sucrose a much higher tumor volume than shNTC/Sucrose negative control? Additionally, should the legend say "sNTC/Sucrose" as it does or "shNTC/Sucrose"?

      The labels for shTEAD1-4/Sucrose and shNTC/Sucrose are correct. We do not understand why there is a slight increase in tumor volume for shTEAD1-4/Sucrose and suspect that is due to the considerable variation in the experiment. This slight change, however, doesn’t influence our observation of tumor regression in shTEAD1-4 under the Doxycycline treatment.

      "sNTC/Sucrose" is a typo. We apologize for the oversight and have revised the figure.

      • Figure 5E - cited in text after Figures 6 and 7.

      We have updated the text accordingly.

      Figure 6:

      • Figure 6B - it is very interesting how this clearly shows the Hook effect for Compound D, but it's a bit harder to see for compound E that the compound degrades pan-TEAD. Would it be possible to quantify the blots to reinforce claims about protein degradation here?

      We thank the reviewer for the question. There may seem to be some hook effect across the three concentrations of compound D treatment in Fig. 6B.  However, in Fig. 6C-E, we observed pretty consistent TEAD degradation levels across a variety of concentrations. In addition, these experiments have been repeated in multiple cell lines with consistent results. We respectfully argue that more detailed investigation of the hook effect is beyond the scope of our study.

      Figure 7:

      • Figure 7F - this heat map is extremely difficult to interpret. Are there any interesting clusters? What are the darker/lighter bands for Compound D compared to DMSO control?

      We thank the reviewer for the comment and apologize for the lack of information on the figure. These are genes from a Hippo signature derived from our earlier work (Pham et al. Cancer Discovery). As a result of degrading TEAD when treating the cells with Compound D, we observed an expected downregulation of most of these genes compared to compound A.

      Figure 8:

      • Figure 8B - these two pie charts are also difficult to interpret. Perhaps try to present the data in a form other than encircling pie charts?

      We thank the reviewer for the suggestion. However, this is a very descriptive pie chart, and we used this format to save space.

      • Figure 8C - what is GNE-6915? Is this Compound D?

      Yes, this is compound D. The text is updated accordingly.

      Reviewer #3 (Recommendations For The Authors):

      Figure 3A would benefit from explicitly stating the conditions within the figure, rather than referring to the legend. This clarity is also needed for Figure 8C, indicating whether the treatment was with compound D or GNE-6915.

      We thank the reviewer for the suggestion. We have added the details to the figures and made the suggested edits.

      Standardize the terms "ubiquitination" and "ubiquitylation" throughout the paper for consistency.

      We now use the term “ubiquitination” throughout the manuscript.

      The statement "In this study, we show that the activity of TEAD transcription factors can be post-transcriptionally regulated via the ubiquitin/proteasome system" should be corrected to "post-translationally regulated."

      We have updated the manuscript accordingly.

      There is an additional exclamation mark above Figure 5E that should be removed.

      We have revised Figure 5E.

    1. Reviewer #1 (Public review):

      The revision by Ruan et al clarifies several aspects of the original manuscript that were difficult to understand, and I think it presents some useful and interesting ideas. I understand that the authors are distinguishing their model from the standard Wright-Fisher model in that the population size is not imposed externally, but is instead a consequence of the stochastic reproduction scheme. Here, the authors chose a branching process but in principle any Markov chain can probably be used. Within this framework, the authors are particularly interested in cases where the variance in reproductive success changes through time, as explored by the DDH model, for example. They argue with some experimental results that there is a reason to believe that the variance in reproductive success does change over time.

      One of the key aspects of the original manuscript that I want to engage with is the DDH model. As the authors point out, their equations 5 and 6 are assumptions, and not derived from any principles. In essence, the authors are positing that the variance in reproductive success, given by 6, changes as a function of the current population size. There is nothing "inherent" to a negative binomial branching mechanism that results in this: in fact, the variance in offspring number could in principle be the same for all time. As relates to models that exist in the literature, I believe that this is the key difference: unlike Cannings models, the authors allow for a changing variance in reproduction through time.

      This is, of course, an interesting thing to consider, and I think that the situation the authors point out, in which drift is lower at small population sizes and larger at large population sizes, is not appreciated in the literature. However, I am not so sure that there is anything that needs to be resolved in Paradox 1. A very strong prediction of that model is that Ne and N could be inversely related, as shown by the blue line in Fig 3b. This suggests that you could see something very strange if you, for example, infer a population size history using a Wright-Fisher framework, because you would infer a population *decline* when there is in fact a population *expansion*. However, as far as I know there are very few "surprising population declines" found in empirical data. An obvious case where we know there is very rapid population growth is human populations; I don't think I've ever seen an inference of recent human demographic history from genetic data that suggests anything other than a massive population expansion. While I appreciate the authors' empirical data supporting their claim of Paradox 1 (more on the empirical data later), it's not clear to me that there's a "paradox" in the literature that needs explaining so much as this is a "words of caution about interpreting inferred effective population sizes". To be clear, I think those words of caution are important, and I had never considered that you might be so fundamentally misled as to infer decline when there is growth, but calling it a "paradox" seems to suggest that this is an outstanding problem in the literature, when in fact I think the authors are raising a *new* and important problem. Perhaps an interesting thing for the authors to do to raise the salience of this point would be to perform simulations under this model and then infer effective population sizes using e.g. dadi or psmc and show that you could identify a situation in which the true history is one of growth, but the best fit would be one of decline.

      The authors also highlight that their approach reflects a case where the population size is determined by the population dynamics themselves, as opposed to being imposed externally as is typical in Cannings models. I agree with the authors that this aspect of population regulation is understudied. Nonetheless, several manuscripts have dealt with the case of population genetic dynamics in populations of stochastically fluctuating size. For example, Kaj and Krone (2003) show that under pretty general conditions you get something very much like a standard coalescent; for example, combining their theorem 1 with their arguments on page 36 and 37, they find that exchangeable populations with stochastic population dynamics where the variance does not change with time still converge to exactly the coalescent you would expect from Cannings models. This is strongly suggestive that the authors' key result isn't about stochastic population dynamics per se, but instead related to arguing that variance in reproductive success could change through time. In fact, I believe that the result of Kaj and Krone (2003) is substantially more general than the models considered in this manuscript. That being said, I believe that the authors of this manuscript do a much better job of making the implications for evolutionary processes clear than Kaj and Krone, which is important---it's very difficult to understand from Kaj and Krone the conditions under which effective population sizes will be substantially impacted by stochastic population dynamics.

      I also find the authors' exposition on Paradox 3 to be somewhat strange. First of all, I'm not sure there's a paradox there at all? The authors claim that the lack of dependence of the fixation probability on Ne is a paradox, but this is ultimately not surprising---fixation of a positively selected allele depends mostly on escaping the boundary layer, which doesn't really depend on the population size (see Gillespie's book "The Causes of Molecular Evolution" for great exposition on boundary layer effects). Moreover, the authors *use a Cannings-style argument* to gain a good approximation of how the fixation probability changes when there is non-Poisson reproduction. So it's not clear that the WFH model is really doing a lot of work here. I suppose they raise the interesting point that the particularly simple form of p(fix) = 2s is due to the assumption that variance in offspring is equal to 1.
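      The dependence of p(fix) on offspring variance can be illustrated with a small numerical sketch. It is not taken from the manuscript and does not implement the WFH model; it assumes a single new mutant lineage that grows as a Galton-Watson branching process with mean offspring number 1 + s, and it computes the survival probability as one minus the smallest fixed point of the offspring probability generating function. The function names, the choice s = 0.01, and the negative binomial "size" values are all illustrative assumptions. Under these assumptions the Poisson case gives roughly 2s, and the negative binomial cases come out close to 2s divided by the offspring variance.

```python
import math


def poisson_pgf(z, mean):
    """Probability generating function of a Poisson offspring distribution."""
    return math.exp(mean * (z - 1.0))


def neg_binomial_pgf(z, mean, size):
    """PGF of a negative binomial offspring distribution.

    Parameterised by its mean and a 'size' (dispersion) parameter, so that
    the variance is mean + mean**2 / size.
    """
    return (1.0 + mean * (1.0 - z) / size) ** (-size)


def survival_probability(pgf, tol=1e-14, max_iter=1_000_000):
    """Probability that a lineage started by one individual never dies out.

    Iterating q <- pgf(q) from q = 0 converges to the extinction
    probability (the smallest fixed point of the PGF on [0, 1]).
    """
    q = 0.0
    for _ in range(max_iter):
        q_next = pgf(q)
        if abs(q_next - q) < tol:
            return 1.0 - q_next
        q = q_next
    return 1.0 - q


s = 0.01            # illustrative selective advantage (assumption)
mean = 1.0 + s      # mean offspring number of the mutant lineage

p_poisson = survival_probability(lambda z: poisson_pgf(z, mean))
print(f"Poisson: variance={mean:.3f}  p(fix)={p_poisson:.5f}  2s={2 * s:.5f}")

for size in (5.0, 1.0):  # smaller size means larger offspring variance
    variance = mean + mean ** 2 / size
    p_nb = survival_probability(lambda z: neg_binomial_pgf(z, mean, size))
    print(f"NegBin(size={size}): variance={variance:.3f}  "
          f"p(fix)={p_nb:.5f}  2s/V={2 * s / variance:.5f}")
```

      For size = 1 the negative binomial reduces to a geometric distribution, for which the survival probability is exactly 1 - 1/(1 + s), which gives a simple check on the iteration.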

      In addition, I raised some concerns about the analysis of empirical results on reproductive variance in my original review, and I don't believe that the authors responded to it at all. I'm not super worried about that analysis, but I think that the authors should probably respond to me.

      Overall, I feel like I now have a better understanding of this manuscript. However, I think it still presents its results too strongly: Paradox 1 contains important words of caution that reflect what I am confident is an underappreciated possibility, and Paradox 3 is, as far as I'm concerned, not a paradox at all. I have not addressed Paradox 2 very much because I think that another reviewer had solid and interesting comments on that front and I am leaving it to them. That being said, I do think Paradox 2 actually presents a deep problem in the literature and that the authors' argument may actually represent a path toward a solution.

      This manuscript can be a useful contribution to the literature, but as it's presented at the moment, I think most of it is worded too strongly and it continues to not engage appropriately with the literature. Theoretical advances are undoubtedly important, and I think the manuscript presents some interesting things to think about but ultimately needs to be better situated and several of the claims strongly toned down.

      References:<br /> Kaj, I., & Krone, S. M. (2003). The coalescent process in a population with stochastically varying size. Journal of Applied Probability, 40(1), 33-48.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      This manuscript explores the multiple cell types present in the wall of murine-collecting lymphatic vessels with the goal of identifying cells that initiate the autonomous action potentials and contractions needed to drive lymphatic pumping. Through the use of genetic models to delete individual genes or detect cytosolic calcium in specific cell types, the authors convincingly determine that lymphatic muscle cells are the origin of the action potential that triggers lymphatic contraction. 

      Strengths: 

      The experiments are rigorously performed, the data justify the conclusions, and the limitations of the study are appropriately discussed. 

      There is a need to identify therapeutic targets to improve lymphatic contraction and this work helps identify lymphatic muscle cells as potential cellular targets for intervention. 

      Weaknesses: 

      My only major comment would be that the manuscript provides a lot of rich information describing the cellular components of the muscular lymphatic vessel wall and that these data are not well represented by the title. The title (while currently accurate) could be tweaked to better represent all that is in this manuscript. Maybe something like

      "Characterization/Interrogation of the cellular components of murine collecting lymphatic vessels reveals that lymphatic muscle cells are the innate pacemaker cells regulating lymphatic contractions" or "Discovery/Confirmation of lymphatic muscle cells as innate pacemaker cells of lymphatic contraction through characterization of the cellular components of murine collecting lymphatic vessels". Potentially a cartoon summary figure of the components that make up the collecting lymphatic vessel wall could also be included. In my opinion, these changes will make this manuscript of more interest to a broader group of scientists. I have a few additional comments for consideration to improve the clarity and enhance the discussion of this work. 

      We agree with the reviewer that our original manuscript, and our resubmission even more so with the addition of the scRNAseq data, provides a significant amount of information regarding the composition of the lymphatic collecting vessel wall. We have changed our title to match one suggestion of the reviewer: “Characterization of the cellular components of murine collecting lymphatic vessels reveals that lymphatic muscle cells are the innate pacemaker cells regulating lymphatic contractions".

      Reviewer #2 (Public Review): 

      Summary: 

      This is a well-written manuscript describing studies directed at identifying the cell type responsible for pacemaking in murine-collecting lymphatics. Using state-of-the-art approaches, the authors identified a number of different cell types in the wall of these lymphatics and then using targeted expression of Channel Rhodopsin and GCaMP, the authors convincingly demonstrate that only activation of lymphatic muscle cells produces coordinated lymphatic contraction and that only lymphatic muscle cells display pressure-dependent Ca2+ transients as would be expected of a pacemaker in these lymphatics. 

      Strengths: 

      The use of a targeted expression of channel rhodopsin and GCaMP to test the hypothesis that lymphatic muscle cells serve as the pacemakers in murine lymphatic collecting vessels. 

      Weaknesses: 

      The only significant weakness was the lack of quantitative analysis of most of the imaging data shown in Figures 1-11. In particular, the colocalization analysis should be extended to show cells not expected to demonstrate colocalization as a negative control for the colocalization analysis that the authors present. 

      We understand the reviewer’s concern regarding the lack of a control for the colocalization analysis and that the colocalization analysis was limited to just one set of cell markers. We have now provided a colocalization analysis of Myh11 and PDGFRα, to serve as a co-localization negative control based on our RT-PCR and scRNASeq findings, which is incorporated into the current Supplemental figure 1. In regard to the staining pattern of other various marker combinations, the results were often quite clear with the representative images that two separate cell populations were being stained such as the case with labeling endothelial cells with CD31, macrophage labeling with the MacGreen mice, or hematopoietic cells with CD45. 

      During our lengthy rebuttal process we completed a single cell RNA sequence analysis using our isolated and cleaned mouse inguinal axillary lymphatic collecting vessels to aid in our characterization of the vessel wall and to more thoroughly answer these questions regarding colocalization in an arguably robust manner. The generation of our scRNAseq dataset, derived from isolated and cleaned mouse inguinal axillary collecting vessels from 10 mice, 5 males and 5 females, allowed us to profile over 2200 of the adventitial fibroblast like cells (AdvCs) we had identified in our original submission. Using this dataset, we were able to confirm co-expression of Cd34 and Pdgfrα in AdvCs and assess the co-expression of other genes of interest from our RT-PCR experiments and immunofluorescence experiments. This approach will also allow other lymphatic investigators to assess their genes of interest as our dataset is uploaded to the NIH Gene Expression Omnibus and will be uploaded to the Broad Institute Single Cell Portal upon publication.

      Here we show that the vast majority of non-muscle fibroblast like cells referred to as AdvCs were double positive for both CD34 and PDGFRα. We also show that the AdvCs that express commonly used pericyte markers Pdgfrb and Cspg4 also co-expressed Pdgfrα. Critically, this data also shows that the AdvCs that express genes linked with lymphatic contractile dysfunction (Ano1, Gjc1 or connexin 45, and Cacna1c “Cav1.2”) co-express Pdgfrα and would render these genes susceptible to Cre-mediated recombination using our Pdgfrα-CreER<sup>TM</sup> model.  
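      As a minimal sketch of how a co-expression fraction of this kind can be computed from per-cell counts, the snippet below counts, among cells expressing one gene, the share that also express a second gene. It is illustrative only and is not the analysis pipeline used in the study: the function name, the detection threshold of one count, and the toy count vectors are all assumptions made for the example.

```python
def coexpression_fraction(counts_a, counts_b, min_count=1):
    """Among cells expressing gene A, return the fraction also expressing gene B.

    counts_a and counts_b hold per-cell counts for the two genes, in the
    same cell order. A gene is called 'expressed' in a cell when its count
    is at least min_count (an assumed, adjustable threshold).
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both count vectors must cover the same cells")
    b_in_a_positive = [b for a, b in zip(counts_a, counts_b) if a >= min_count]
    if not b_in_a_positive:
        return float("nan")  # no cell expresses gene A
    both = sum(1 for b in b_in_a_positive if b >= min_count)
    return both / len(b_in_a_positive)


# Toy per-cell counts for six hypothetical cells (made-up numbers).
cd34_counts = [3, 0, 5, 2, 0, 1]
pdgfra_counts = [4, 1, 2, 0, 0, 6]

fraction = coexpression_fraction(cd34_counts, pdgfra_counts)
print(f"Fraction of Cd34+ cells that are also Pdgfra+: {fraction:.2f}")
```

      With real data the same calculation would be run per cluster on the normalised expression matrix, and the choice of detection threshold would need to be stated.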

      Reviewer #3 (Public Review): 

      Summary: 

      Zawieja et al. aimed to identify the pacemaker cells in the lymphatic collecting vessels. Authors have used various Cre-based expression systems and optogenetic tools to identify these cells. Their findings suggest these cells are lymphatic muscle cells that drive the pacemaker activity in the lymphatic collecting vessels. 

      Strengths: 

      The authors have used multiple approaches to test their hypothesis. Some findings are presented as qualitative images, while some quantitative measurements are provided.   

      Weaknesses: 

      -  More quantitative measurements. 

      -  Possible mechanisms associated with the pacemaker activity. 

      -  Membrane potential measurements. 

      We thank the reviewers for their concerns and have addressed them in the following manner. 

      - We added novel single cell RNA sequencing of isolated and cleaned inguinal axillary vessels from 10 mice (5 males and 5 females). This allowed us to quantify the number of AdvCs that coexpress CD34 and Pdgfrα as well as the number of cells co-expressing Pdgfrα and other markers.

      - We have added a negative control with quantification for the co-localization analysis assessing Myh11 and Pdgfrα. We have added a negative control with quantification for the ChR2-photo stimulated contraction experiments using Myh11CreERT2-ChR2 mice that were not injected with tamoxifen. 

      - We also used Biocytin-AF488 in our intracellular Vm electrodes to map the specific cells in which we recorded action potentials and in neighboring cells since Biocytin-AF488 is under 1 kDa and can pass through gap junctions. This approach independently labeled lymphatic muscle cells and their direct neighbors for 3 IALVs from 3 separate mice. 

      - We performed membrane potential recordings in isolated, pressurized (under isobaric conditions), and spontaneously contracting inguinal axillary lymphatic collecting vessels at different pressures. 

      - We also show that the pressure-frequency relationship is dependent on the slope of the diastolic depolarization as no other parameter was significantly altered in our study and the diastolic depolarization slope was highly correlated with contraction frequency. 

      We believe the addition of these novel data, controls, experiments, and quantifications have improved the manuscript in line with the reviewers’ suggestions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Lines 149-162: The authors rule out the methylene blue staining cells in the cLV wall as pacemakers because they don't form continuous longitudinal connections to drive propagation. Is it possible for a pacemaker cell to only initiate the contraction and then have the LMCs make the axial electrical connections to propagate the electrical wave? I am not trying to suggest the methylene blue cells are pacemakers, but I am not sure the lack of longitudinal (or radial) connectivity is sufficient evidence to rule out the possibility. This comment also is relevant to the 3 criteria for a pacemaker cell listed in the Discussion (Lines 413-417). 

      We agree with the reviewer’s broader point that a pacemaker cell may not require direct contact with other ‘pacemaker’ cells within the tissue as long as they are still within the same electrical syncytium. However, we do expect a continuous presence of a pacemaker cell type throughout the vessel wall length to account for the persistence of spontaneous contractile behavior despite vessel length, and the ability for contraction initiation to shift (Akl et al 2011, Castorena et al 2018 and Castorena et al 2022) and the occurrence of spontaneous action potentials. In Dirk van Helden’s seminal work in 1993 on lymphatic pacemaking, a major finding was that “SM of small lymphangions or that of short segments, cut from lymphangions of any length, behaved similarly”. We have adjusted our phrase regarding the requirement of a contiguous network and instead suggest a continuous presence along the vessel network and integrated into the electrical syncytium. 

      Methylene blue is an alkaline stain that will stain acidic structures and historically methylene blue is noted to stain Interstitial cells of Cajal in the gastrointestinal tract which typically exist as a network of cells (Huizinga et al 1993 and Berezin 1988). No such network was readily apparent in our methylene blue staining nor did the stained cells have a similar morphology to the ICCs of the gastrointestinal tract. Further, methylene blue staining is not limited to ICCs or pacemaker cells at large as it has been used to kill cancer cells. Within the small intestine methylene blue was noted to also stain macrophage like cells (Mikkelsen et al 1988), and we too draw parallels between the macrophage morphology observed with Macgreen mice and methylene-blue stained cells. The specific structure for the ICC affinity for methylene blue is not well described and while the innate cytotoxicity of methylene blue and light has been used to kill ICCs and impair slow wave generation, the lack of specificity of this method leaves much to be desired. What is clear is that the ICC network highlighted by methylene blue in the gut is absent in lymphatic collecting vessels.

      In Figure 15/Video12, is it possible that the cells that are showing intracellular Ca2+ in diastole are the cells that reach a threshold membrane potential that then trigger the rest of the LMCs? As the authors have shown heterogeneity in the LMCs surface markers, is it possible that the cells with Ca2+ activity during diastole are identifiable by a distinct molecular phenotype? Or is the thought that these cells are randomly active in diastole? Some discussion/speculation about this seems appropriate. 

      We are in agreement with the reviewer’s conclusion that there is heterogeneity in the LMCs as it pertains to the calcium oscillations in diastole, either under normal buffer conditions or when L-type channels are inhibited with nifedipine. We also note significant heterogeneity in the gene expression noted within the four LMC subclusters (0-3), though we did not see significant differences in either Ip3R1 or Ano1 expression. However, subcluster “0” had increased expression of Itprid2, also known as KRas-induced actin-interacting protein (KRAP) which is thought to tether, and thus immobilize, IP3 receptors to the actin cortex beneath the cell membrane. KRAP has been recently proposed to be a critical player in IP3 receptor “licensing” which allows IP3 receptors to release calcium (Vorontsova et al., 2022). However, whether a similar requirement for IP3R licensing exists in all cells or specifically in LMCs is unknown. It is quite clear that there are specific release sites within the cell, and this topic is currently under further investigation for a separate manuscript. We would like to note that there is yet to be a clear consensus on whether IP3R licensing is required as many of these studies are performed in cultured cells and this mechanism has only recently been described. 

      Healthy lymphatic collecting vessels typically have a single pacemaker driving a coordinated propagated contraction in ex vivo isobaric myograph studies (Castorena-Gonzalez et al., 2018), which is typically at either end of the cannulated vessel. We believe that this is due to the lack of a bordering cell in one direction and allows charge to accumulate and voltage to reach threshold at these sites preferentially. We have tried to image calcium at the pacemaking pole of the vessel to observe the specific Ca<sup>2+</sup> transients at these sites though invariably the act of imaging GCaMP6f results in the pacemaker activity initiating from the other pole of the vessel. It is our opinion that the fact that LMCs are heterogeneous in their Ca<sup>2+</sup> transients is a feature of the system as it permits a wider range of depolarization signals, and thus allows finer control of the pacing as different physical/pressure or signaling stimuli are encountered. Likely, the cells with the higher propensity of Ca<sup>2+</sup> transients act as the contraction initiation site in vivo, though it must also be noted that the LMC density decreases around lymphatic valve sites. In fact, in guinea pig collecting vessels there are very few LMCs at the valves which can render them electrically uncoupled or poorly coupled (Van Helden, 1993). Thus, valve sites in which there is greater electrical resistance due to lower LMC-LMC coupling may allow for charge accumulation in the LMCs at the valve site, similar to the artificial condition achieved in our myograph preparations with two cut ends, and allow them to reach threshold first and drive coordination at the valve sites.

      An additional description of what the PTCL analysis is meant to represent physiologically would be helpful for readers. 

      We have better described the conversion of the calcium signals into “particles” for analysis at first mention in the methods and results section and have included the requisite reference to this specific methodology in Line 429-30. 

      A description of how DMAX is experimentally determined is needed. 

      We have adjusted our methods section to describe DMAX in line 774-775.

      “with Ca<sup>2+</sup>-free Krebs buffer (3mM EGTA) and diameter at each pressure recorded under passive conditions (DMAX).”

      I think the vessels referred to as popliteal lymphatic vessels are actually saphenous lymphatic vessels (afferent to the popliteal lymph node). Please clarify. 

      Indeed, some of the vessels used in this study are the afferents to the single popliteal node. They travel with the caudal branch of the saphenous vein, but have routinely been described as popliteal vessels, as opposed to saphenous lymphatic vessels, by the lymphatic field at large (Tilney 1971 PMCID: PMC1270981, Liao 2015 PMID: 25512945). To move away from this nomenclature would likely add to confusion although we agree that the lymphatic field may need to improve or correct the vessel naming paradigm to match the vascular pairs they follow.

      Reviewer #2 (Recommendations For The Authors): 

      Lines 214-215 - can you cite a reference for the observation that rhythmic contractions don't require the presence of valves? 

      We have added the reference. In Dr. Van Helden’s seminal work on the topic in 1993, “Vessel segments were then cut from selected small lymphangions (length 300-500 um) by cutting at the valves.” Additionally, work by Dr Anatoliy Gashev utilized sections of lymphatic vessels that lacked valves to study orthograde and retrograde shear sensitivity (Gashev et al., 2002).

      Lines 224-230 - It would have been nice to see colocalization analysis for all cell types so that "negative" results could be compared with the "positives" that you report. This would help bolster evidence of your ability to separate cell types. 

      We understand the reviewer’s sentiment and agree. We have now added a “negative control” colocalization staining and analysis for PDGFRα and Myh11 which has been added to the current SuppFigure 1. We stained 3 IALVs from 3 separate mice with PDGFRα and Myh11 and performed confocal microscopy. We ran the FIJI BIOP-JACOP colocalization plugin as before and observed very little colocalization of the two signals. Additionally, we have also added a coexpression assessment for CD34 and PDGFRα and other genes using our scRNAseq dataset.  
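      To make concrete what a colocalization "negative control" looks like numerically, the following sketch computes two standard colocalization measures, the Pearson correlation coefficient and a Manders-style overlap fraction, on toy pixel intensities. It is a hedged illustration in plain Python and does not reproduce the BIOP-JACOP plugin or its thresholding; the function names, the zero threshold, and the intensity values are assumptions chosen so that the two channels label separate pixels, as expected for two distinct cell populations.

```python
import math


def pearson_coefficient(channel_a, channel_b):
    """Pearson correlation between two equally sized lists of pixel intensities."""
    n = len(channel_a)
    mean_a = sum(channel_a) / n
    mean_b = sum(channel_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)
    return cov / math.sqrt(var_a * var_b)


def manders_fraction(channel_a, channel_b, threshold_b=0):
    """Fraction of total channel-A intensity found in pixels where channel B
    exceeds threshold_b (a Manders-style overlap coefficient)."""
    total_a = sum(channel_a)
    overlapping_a = sum(a for a, b in zip(channel_a, channel_b) if b > threshold_b)
    return overlapping_a / total_a


# Toy, background-subtracted intensities for six pixels (made-up numbers).
# The two channels light up different pixels, as for two separate cell types.
marker_1 = [10, 12, 0, 0, 11, 0]
marker_2 = [0, 0, 9, 8, 0, 10]

r_negative = pearson_coefficient(marker_1, marker_2)
m1_negative = manders_fraction(marker_1, marker_2)
r_positive = pearson_coefficient(marker_1, marker_1)  # a channel against itself

print(f"separate populations: Pearson r = {r_negative:.2f}, M1 = {m1_negative:.2f}")
print(f"identical signals:    Pearson r = {r_positive:.2f}")
```

      A negative Pearson coefficient with an overlap fraction near zero is the pattern expected for markers of distinct populations, whereas co-expressed markers would give values approaching 1 for both measures.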

      line 293 - Should read "Cx45 in..." 

      This has been corrected. 

      “The expression of the genes critically involved in cLV function—Cav1.2, Ano1, and Cx45—in the PdgfrαCreER<sup>TM</sup>-ROSA26mTmG purified cells and scRNAseq data prompted us to generate PdgfrαCreER<sup>TM</sup>-Ano1<sup>fl/fl</sup>, PdgfrαCreER<sup>TM</sup>-Cx45<sup>fl/fl</sup>, and PdgfrαCreER<sup>TM</sup>-Cav1.2<sup>fl/fl</sup> mice for contractile tests.”

      lines 470-473 - A reference for this statement should be cited. 

      We have added the reference. In Dr. Van Helden’s seminal work on the topic in 1993, “Vessel segments were then cut from selected small lymphangions (length 300-500 um) by cutting at the valves.” Additionally, work by Dr Anatoliy Gashev utilized sections of lymphatic vessels that lacked valves to study orthograde and retrograde shear sensitivity (Gashev et al., 2002).

      Lines 483-487 - References should be cited for these statements. 

      We have narrowed and clarified this statement and supported it with the necessary citations. 

      “Of course, mesenchymal stromal cells (Andrzejewska et al., 2019) and fibroblasts (Muhl et al., 2020; Buechler et al., 2021; Forte et al., 2022) are present, and it remains controversial to what extent telocytes are distinct from or are components/subtypes of either cell type (Clayton et al., 2022). Telocytes are not monolithic in their expression patterns, displaying both organ directed transcriptional patterns as well as intra-organ heterogeneity (Lendahl et al., 2022) as readily demonstrated by recent single cell RNA sequencing studies that provided immense detail about the subtypes and activation spectrum within these cells and their plasticity (Luo et al., 2022).”

      Lines 584-585 - Missing a reference citation. 

      Thank you for catching this error. The correct citation is Boedtkjer et al., 2013, and it is now properly cited.

      Line 638 - "these this" should read "this" 

      Thank you for catching this error. This particular sentence was removed in light of the addition of the scRNAseq data.

      Reviewer #3 (Recommendations For The Authors): 

      This manuscript from Zawieja et al. explored an interesting hypothesis about the pacemaker cells in lymphatic collecting vessels. Many aspects of lymphatic collecting vessels are still under investigation; hence this work provides timely knowledge about the lymphatic muscle cells as a pacemaker. Although it is an important topic of the investigation, the data provided do not support the overall goal of the manuscript. Many figures (Figure 1-5) provide quantitative estimation and the description provided in the results section might only be useful for a restricted audience, but not to the broader audience. Some of the figures are very condensed with multiple imaging panels and it is hard to follow the differences in qualitative analysis. Overall, this manuscript can be improved by more streamlined description/writing and figure arrangements (some of the figures/panels can be moved to the supplementary figures). 

      We disagree with the notion that the original data did not support the goal of the manuscript, which was to identify and test putative pacemaker cell types. Nonetheless, we have added ample novel data to the manuscript, including membrane potential recordings and scRNAseq, to further support our conclusion that the pacemaker cell is an LMC. We believe the scRNAseq data will also greatly enhance the appeal of the manuscript to a broader audience, and we have renamed the manuscript in line with the wealth of data we collected on the components of the vessel wall as we tested for putative pacemaker cells.

      As requested, we have moved many figures to the supplement to allow readers to focus more on the more critical experiments.

      A few other points that need to be addressed: 

      (1) Authors used immunofluorescence-based differences in various cell types in the collecting vessels. Initially, they chose ICLC, pericytes, and lymphatic muscle cells. But then they started following adventitial cells and endothelial cells. It is not clear from the description, why these other cells could be possibly involved in the pacemaker activity. It will be easier to follow if authors provide a graphical abstract or summary figure about their hypothesis and what is known from their and others' work. 

      We would like to clarify that we used the endothelial cells as controls to ensure that what we observed via immunofluorescence and FACS-RT-PCR was a cell type separate from both the lymphatic muscle and the lymphatic endothelial cells of the vessel wall. Staining for the endothelium also allowed us to assess where these PDGFRα+CD34+ cells reside in the vessel wall. We started with a wide range of markers that are conventionally used for targeting specific cell types, but as expected those markers are not always 100% specific. Specifically, we focused on CD34, Kit, and Vimentin, as those were the markers previously reported for the non-muscle cells in the lymphatic collecting vessel wall. We found that CD34 and PDGFRα labeled the same cell type. As no CD34Cre mouse was available at the time, we instead utilized the inducible PDGFRαCreERTM. We are unsure how well an abstract figure would condense the conclusions from the experiments listed here, but if it is absolutely required for publication we can attempt to highlight the representative cell populations identified on the vessel wall.

      (2) Authors used many acronyms in the manuscript without defining them (when they appeared for the first time). Please follow the convention. 

      We have checked the manuscript and made several corrections regarding the use of abbreviations.

      (3) How specific PDGFR-alpha as a marker of the pericytes? It can also label the mesenchymal cells. Why did the author choose PDGFR-alpha over beta for their Cre-based expression approach? 

      We tried to assess whether a pericyte-like cell was present in or along the wall using PDGFRbeta (Pdgfrβ). Pdgfrβ is commonly used to identify pericytes (Winkler et al., 2010), while in contrast Pdgfrα is a known fibroblast marker (Lendahl et al., 2022). PdgfrβCreER<sup>T2</sup> resulted in recombination in both LMCs and AdvCs, preventing it from being a discriminating marker for our study, whereas Myh11CreER<sup>T2</sup> and PDGFRαCreER<sup>TM</sup> were specific, at least to cell type, based on our FACS-RT-PCR and staining. As the scRNAseq data in Figure 5 show, there was no cell cluster for which Pdgfrβ was specific, in contrast to PDGFRα and Myh11. In Figure 6 we show the expression of another commonly used pericyte marker, NG2 (Cspg4), in our scRNAseq dataset, which was also observed in both LMCs and AdvCs. Lastly, MCAM (Figure 6) can also be a marker for pericytes, though we see expression of this marker only in the LMCs and LECs. Notably, almost all of the AdvCs express PDGFRα, rendering the PDGFRαCreER<sup>TM</sup> a powerful tool to study this population of cells on the vessel wall, including those that were PDGFRα+Cspg4+ or PDGFRα+Pdgfrβ+.

      We were reliant on PDGFRαCreER<sup>TM</sup> as that was the only available PDGFRα Cre model at the time. Note we used PdgfrβCreER<sup>T2</sup> and Ng2Cre in our study but found that both Cre models recombined both LMCs and AdvCs.

      (4) Please include appropriate references for all the labeling markers (PDGFR-alpha, beta, and myc11 etc.) that are used in this manuscript. 

      We have added multiple references to the manuscript to support the use of these common cell “specific” markers as of course each marker is limited in some capacity to fully or specifically label a single population of cells (Muhl et al., 2020).

      (5) One of the criteria for the pacemaker cells is depolarization-induced propagated contractions. Authors have used optogenetics-induced depolarization to test this phenomenon. Please include negative controls for these experiments. 

      We have now added negative controls to this experiment, which were non-induced (no tamoxifen) Myh11CreER<sup>T2</sup>-Chr2 popliteal vessels. These data have been added to Figure 8.

      (6) What are the resting membrane potentials of Lymphatic muscle cells? The authors should provide some details about this in the manuscript. 

      We agree with the reviewer and have added membrane potential recordings (Figure 13) at different pressures, and we filled our recording electrode with the cell-labeling molecule BiocytinAF488 to highlight the cells exhibiting action potentials, which were the LMCs. Lymphatic resting membrane potential is dynamic in pressurized vessels, which appears to be a critical difference between this approach and pinned-out vessels or those on wire myographs, likely due to improper stretch or damage to the vessel wall in the latter. In mesenteric lymphatic vessels isolated from rats, the minimum membrane potential achieved during repolarization typically ranges from -45 to -50 mV, while IALVs from mice are typically around -40 mV, though IALVs have a notably higher contraction frequency. Critically, the novel membrane potential recordings in IALVs at different pressures show that the diastolic depolarization rate is the critical factor driving the pressure-dependent frequency.

      (7) In the discussion, the authors discussed SR Ca2+ cycling in Pacemaking, but the relevant data are not included in this manuscript, but a manuscript from JGP (in revision) is cross-referenced. 

      As discussed above, we have recently published our work in which we studied IALVs from Myh11CreERT2-Ip3r1fl/fl (Ip3r1ismKO) and Myh11CreERT2-Ip3r1fl/fl-Ip3r2fl/fl-Ip3r3fl/fl mice (Zawieja et al., 2023). Deletion of Ip3r1 from LMCs recapitulated the dramatic reduction in frequency we previously published in Myh11CreERT2-Ano1fl/fl mice and the loss of pressure-dependent chronotropy. Furthermore, in that manuscript we also showed that the diastolic calcium transients are nearly completely lost in IALVs from Myh11CreERT2-Ip3r1fl/fl knockout mice. There was no difference in contractile function between IALVs from single Ip3r1 knockout and triple Ip3r1-3 knockout mice, suggesting that it is Ip3r1 that is required for the diastolic calcium oscillations. Further, in the presence of 1 µM nifedipine there were still no calcium oscillations in the Myh11CreERT2-Ip3r1fl/fl LMCs. These findings provide further support for our interpretation that the pacemaking is of myogenic origin.

      Andrzejewska, A., B. Lukomska, and M. Janowski. 2019. Concise Review: Mesenchymal Stem Cells: From Roots to Boost. Stem Cells. 37:855-864.

      Buechler, M.B., R.N. Pradhan, A.T. Krishnamurty, C. Cox, A.K. Calviello, A.W. Wang, Y.A. Yang, L. Tam, R. Caothien, M. Roose-Girma, Z. Modrusan, J.R. Arron, R. Bourgon, S. Muller, and S.J. Turley. 2021. Cross-tissue organization of the fibroblast lineage. Nature. 593:575-579.

      Castorena-Gonzalez, J.A., S.D. Zawieja, M. Li, R.S. Srinivasan, A.M. Simon, C. de Wit, R. de la Torre, L.A. Martinez-Lemus, G.W. Hennig, and M.J. Davis. 2018. Mechanisms of Connexin-Related Lymphedema. Circ Res. 123:964-985.

      Clayton, D.R., W.G. Ruiz, M.G. Dalghi, N. Montalbetti, M.D. Carattino, and G. Apodaca. 2022. Studies of ultrastructure, gene expression, and marker analysis reveal that mouse bladder PDGFRA(+) interstitial cells are fibroblasts. Am J Physiol Renal Physiol. 323:F299-F321.

      Forte, E., M. Ramialison, H.T. Nim, M. Mara, J.Y. Li, R. Cohn, S.L. Daigle, S. Boyd, E.G. Stanley, A.G. Elefanty, J.T. Hinson, M.W. Costa, N.A. Rosenthal, and M.B. Furtado. 2022. Adult mouse fibroblasts retain organ-specific transcriptomic identity. Elife. 11.

      Gashev, A.A., M.J. Davis, and D.C. Zawieja. 2002. Inhibition of the active lymph pump by flow in rat mesenteric lymphatics and thoracic duct. J Physiol. 540:1023-1037.

      Lendahl, U., L. Muhl, and C. Betsholtz. 2022. Identification, discrimination and heterogeneity of fibroblasts. Nat Commun. 13:3409.

      Luo, H., X. Xia, L.B. Huang, H. An, M. Cao, G.D. Kim, H.N. Chen, W.H. Zhang, Y. Shu, X. Kong, Z. Ren, P.H. Li, Y. Liu, H. Tang, R. Sun, C. Li, B. Bai, W. Jia, Y. Liu, W. Zhang, L. Yang, Y. Peng, L. Dai, H. Hu, Y. Jiang, Y. Hu, J. Zhu, H. Jiang, Z. Li, C. Caulin, J. Park, and H. Xu. 2022. Pan-cancer single-cell analysis reveals the heterogeneity and plasticity of cancer-associated fibroblasts in the tumor microenvironment. Nat Commun. 13:6619.

      Muhl, L., G. Genove, S. Leptidis, J. Liu, L. He, G. Mocci, Y. Sun, S. Gustafsson, B. Buyandelger, I.V. Chivukula, A. Segerstolpe, E. Raschperger, E.M. Hansson, J.L.M. Bjorkegren, X.R. Peng, M. Vanlandewijck, U. Lendahl, and C. Betsholtz. 2020. Single-cell analysis uncovers fibroblast heterogeneity and criteria for fibroblast and mural cell identification and discrimination. Nat Commun. 11:3953.

      Van Helden, D.F. 1993. Pacemaker potentials in lymphatic smooth muscle of the guinea-pig mesentery. J Physiol. 471:465-479.

      Vorontsova, I., J.T. Lock, and I. Parker. 2022. KRAP is required for diffuse and punctate IP(3)-mediated Ca(2+) liberation and determines the number of functional IP(3)R channels within clusters. Cell Calcium. 107:102638.

      Winkler, E.A., R.D. Bell, and B.V. Zlokovic. 2010. Pericyte-specific expression of PDGF beta receptor in mouse models with normal and deficient PDGF beta receptor signaling. Mol Neurodegener. 5:32.

      Zawieja, S.D., G.A. Pea, S.E. Broyhill, A. Patro, K.H. Bromert, M. Li, C.E. Norton, J.A. Castorena-Gonzalez, E.J. Hancock, C.D. Bertram, and M.J. Davis. 2023. IP3R1 underlies diastolic ANO1 activation and pressure-dependent chronotropy in lymphatic collecting vessels. J Gen Physiol. 155.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Horizontal gene transfer is the transmission of genetic material between organisms through ways other than reproduction. Frequent in prokaryotes, this mode of genetic exchange is scarcer in eukaryotes, especially in multicellular eukaryotes. Furthermore, the mechanisms involved in eukaryotic HGT are unknown. This article by Banerjee et al. claims that HGT occurs massively between cells of multicellular organisms. According to this study, the cell free chromatin particles (cfChPs) that are massively released by dying cells are incorporated in the nucleus of neighboring cells. These cfChPs are frequently rearranged and amplified to form concatemers, they are made of open chromatin, expressed, and capable of producing proteins. Furthermore, the study also suggests that cfChPs transmit transposable elements (TEs) between cells on a regular basis, and that these TEs can transpose, multiply, and invade receiving cells. These conclusions are based on a series of experiments consisting in releasing cfChPs isolated from various human sera into the culture medium of mouse cells, and using FISH and immunofluorescence to monitor the state and fate of cfChPs after several passages of the mouse cell line.

      Strengths:

      The results presented in this study are interesting because they may reveal unsuspected properties of some cell types that may be able to internalize free-circulating chromatin, leading to its chromosomal incorporation, expression, and unleashing of TEs. The authors propose that this phenomenon may have profound impacts in terms of diseases and genome evolution. They even suggest that this could occur in germ cells, leading to within-organism HGT with long-term consequences.

      Weaknesses:

      The claims of massive HGT between cells through internalization of cfChPs are not well supported because they are only based on evidence from one type of methodological approach: immunofluorescence and fluorescent in situ hybridization (FISH) using protein antibodies and DNA probes. Yet, such strong claims require validation by at least one, but preferably multiple, additional orthogonal approaches. This includes, for example, whole genome sequencing (to validate concatemerization, integration in receiving cells, transposition in receiving cells), RNA-seq (to validate expression), ChiP-seq (to validate chromatin state).

      We agree with the reviewer’s suggestions. We propose to use RNA-seq using an orthogonal platform as a solution. This will allow us to answer multiple questions viz. validation of expression of human DNA in mouse cells, obtaining a detailed insight into genes and pathways driven by human cfChPs and enable us to identify chimeric human and mouse transcripts.

      Another weakness of this study is that it is performed only in one receiving cell type (NIH3T3 mouse cells). Thus, rather than a general phenomenon occurring on a massive scale in every multicellular organism, it could merely reflect aberrant properties of a cell line that for some reason became permeable to exogenous cfChPs. This begs the question of the relevance of this study for living organisms.

      We agree with the reviewer’s suggestion. We propose to show horizontal transfer of cfChPs using four different cell-lines representing four different species.

      Should HGT through internalization of circulating chromatin occur on a massive scale, as claimed in this study, and as illustrated by the many FISH foci observed in Fig 3 for example, one would expect that the level of somatic mosaicism may be so high that it would prevent assembling a contiguous genome for a given organism. Yet, telomere-to-telomere genomes have been produced for many eukaryote species, calling into question the conclusions of this study.

      The reviewer is right in expecting that the level of somatic mosaicism may be so high that it would prevent assembling a contiguous genome. This is indeed the case, and we find that beyond ~250 passages the cfChPs-treated NIH3T3 cells begin to die out, apparently because their genomes have become too unstable for survival. This point will be highlighted in the revised version. It is likely that cell death resulting from large-scale HGT creates a vicious cycle of more cell death induced by cfChPs, thereby helping to explain the massive daily turnover of cells in the body (10<sup>9</sup> – 10<sup>12</sup> cells per day).

      Reviewer #2 (Public review):

      I must note that my comments pertain to the evolutionary interpretations rather than the study's technical results. The techniques appear to be appropriately applied and interpreted, but I do not feel sufficiently qualified to assess this aspect of the work in detail.

      I was repeatedly puzzled by the use of the term "function." Part of the issue may stem from slightly different interpretations of this word in different fields. In my understanding, "function" should denote not just what a structure does, but what it has been selected for. In this context, where it is unclear if cfChPs have been selected for in any way, the use of this term seems questionable.

      We think this is a matter of semantics. We have used the term “function” since cfChPs that enter the cell are biologically active; they transcribe, translate, synthesize proteins, and proliferate. We therefore feel that the term “function” is not inappropriate.

      Similarly, the term "predatory genome," used in the title and throughout the paper, appears ambiguous and unjustified. At this stage, I am unconvinced that cfChPs provide any evolutionary advantage to the genome. It is entirely possible that these structures have no function whatsoever and could simply be byproducts of other processes. The findings presented in this study do not rule out this neutral hypothesis. Alternatively, some particular components of the genome could be driving the process and may have been selected to do so. This brings us to the hypothesis that cfChPs could serve as vehicles for transposable elements. While speculative, this idea seems to be compatible with the study's findings and merits further exploration.

      We take the reviewer’s point. We will replace the term “predatory genome” with a more neutral and factual term “supernumerary genome” in the title and throughout the manuscript in the revised version.

      I also found some elements of the discussion unclear and speculative, particularly the final section on the evolution of mammals. If the intention is simply to highlight the evolutionary impact of horizontal transfer of transposable elements (e.g., as a source of new mutations), this should be explicitly stated. In any case, this part of the discussion requires further clarification and justification.

      We propose to revise the “discussion” section taking into account the issues raised by the reviewer and highlight the potential role of cfChPs in evolution by acting as vehicles of transposable elements.  

      In summary, this study presents important new findings on the behavior of cfChPs when introduced into a foreign cellular context. However, it overextends its evolutionary interpretations, often in an unclear and speculative manner. The concept of the "predatory genome" should be better defined and justified or removed altogether. Conversely, the suggestion that cfChPs may function at the level of transposable elements (rather than the entire genome or organism) could be given more emphasis.

      Our responses to this paragraph are given in the two above sections.

    1. We belong to the community. It is not the tailor alone who is the ninth part of a man; it is as much the preacher, and the merchant, and the farmer. Where is this division of labor to end? And what object does it finally serve? No doubt another may also think for me; but it is not therefore desirable that he should do so to the exclusion of my thinking for myself.

      Thoreau is saying that in today's society, because we divide all our labor and rely on specialists, there are so many basic things that none of us actually know how to do. What example(s) can you think of?

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      This study presents useful insights into the in vivo dynamics of insulin-producing cells (IPCs), key cells regulating energy homeostasis across the animal kingdom. The authors provide compelling evidence using adult Drosophila melanogaster that IPCs, unlike neighboring DH44 cells, do not respond to glucose directly, but that glucose can indirectly regulate IPC activity after ingestion supporting an incretin-like mechanism in flies, similar to mammals. The authors link the decreased activity of IPCs to hyperactivity observed in starved flies, a locomotive behavior aimed at increasing food search. 

      Furthermore, there is supporting evidence in the paper that IPCs receive inhibitory inputs from Dh44 neurons, which are linked to increased locomotor activity. However, although the electrophysiological data underlying the dynamics of IPCs in vivo is compelling, the link between IPCs and other potential elements of the circuitry (e.g. octopaminergic neurons) regulating locomotive behaviors is not clear and would benefit from more rigorous approaches. 

      This paper is of interest to cell biologists and electrophysiologists, and in particular to scientists aiming to understand circuit dynamics pertaining to internal state-linked behaviors competing with the feeding state, shown here to be primarily controlled by the IPCs. 

      Strengths: 

      (1) By using whole-cell patch clamp recording, the authors convincingly showed the activity pattern of IPCs and neighboring DH44 neurons under different feeding states. 

      (2) The paper provides compelling evidence that IPCs are not directly and acutely activated by glucose, but rather through a post-ingestive incretin-like mechanism. In addition, the authors show that Dh44 neurons located adjacent to the IPCs respond to bath application of glucose contrary to the IPCs. 

      (3) The paper provides useful data on the firing pattern of 2 key cell populations regulating food-related brain function and behavior, IPCs and Dh44 neurons, results which are useful to understand their in vivo function. 

      Weaknesses: 

      (1) The term nutritional state generally refers to the nutrients which are beneficial to the animal. In Figure 1, the authors showed that IPCs respond to glucose but not proteins. To validate the term nutritional state the authors could test the effect of a non-nutritive sugar (e.g. D-arabinose or L-Glucose) on the post-ingestive physiological responses of the IPCs.

      We thank the referee for this insightful comment. Following their suggestion, we included two new experimental data sets, which we added to Figure 1. First, we show that IPCs do not respond to the non-nutritive sugar D-arabinose (Figure 1H). Second, to further expand this data set and our conclusions, we show that IPCs do respond to fructose, a second nutritive sugar in addition to glucose (Figure 1H). Together, these data sets permit the conclusion that IPCs are sensitive to the ingestion of nutritive sugars and do not respond to the ingestion of non-nutritive sugars or high-protein diets. Thus, we validate the term nutritional state.

      (2) It is difficult to grasp the main message from the figures in the result section as some figures have several results subsections referring to different points the authors want to make. The key results of a figure will be easier to understand if they are summarized in one section of the results. Alternatively, a figure can be split into 2 figures if there are several key messages in those figures, e.g. Figures 2 and 3.  

      We appreciate this suggestion and have made several changes to our manuscript to add more clarity. Among other things, we have changed the order of data presentation in Figure 2, as suggested by the referee below, where we now start with the IPC activation data rather than the OAN activation. We also swapped the order of data presentation and split Figure S1 into Figures S1 & S2. Moreover, we re-arranged the panel order in supplementary figure S4. This significantly improved the flow of the results section. Since the figures the referee refers to contain comparative data, for example between diets (Figure 1) or neuron types (Figure 2), we prefer to keep these data sets together. However, we have carefully revised the results section to more clearly relate our statements to individual figure panels.

      (3) The prime investigation of the paper is about the physiological response and locomotive behavioral readout linked to IPCs. The authors do not show a link between OANs and IPCs in terms of functional or behavioral readouts. In Figure 2 the authors first start with stating a link between OAN neurons and locomotion changes resulting from internal feeding states. The flow of the paper would be better if the authors focused on the effect of optogenetic activation of IPCs under different feeding states and their impact on fly locomotion. If the experiments done on optogenetic activation of OANs were to validate the experimental approach the data on OAN neurons is better suited for the supplement without the need of a subsection in the result section on the OANs.  

      We agree with the reviewer’s suggestion and switched the order of the figure panels and text to aid the flow of the manuscript. We now show and discuss the IPC activation data first (Figure 2C-H) and OAN activation afterwards (Figure 2I-K). We did keep the OAN data in the main document, though, since that facilitates comparisons between the small effects of IPC activation and the large, well-established effects of OAN activation.

      (4) Figure 2F shows that optogenetic activation of IPCs in fed flies does not influence their locomotor output. In the text, the conclusion linked to Figure 2F-H states that IPC activation reduces starvation-induced hyperactivity which is a statement more suited to Figure 2I-K. 

      We edited the text accordingly.

      (5) The authors show activation of Dh44 neurons leads to hyperpolarisation of the IPCs. What is the functional link between non-PI Dh44 neurons and the IPCs? Do IPCs express DH44R or is DH44 required for this effect on IPCs? Investigating a potential synaptic or peptidergic link between DH44 neurons and IPCs and its effect on behavior would benefit the paper, as it is so far not well connected. 

      Although we have not performed any experiments dedicated to investigating the functional link between DH44Ns outside the PI and the IPCs in this study, there are two lines of evidence supporting that this connection is relatively direct. First, IPCs do express DH44R1 & R2, as we show in a parallel study in eLife (Held M, et al. ‘Aminergic and peptidergic modulation of Insulin-Producing Cells in Drosophila’. eLife. 2024;13. doi:10.7554/ELIFE.99548.1). Second, we performed functional connectivity experiments using a Leucokinin (LK) driver line in that paper. This driver line labels two pairs of non-PI DH44Ns in the VNC, which are DH44 and LK positive (Zandawala et al 2018). Activating that line leads to inhibition of IPCs, similar to the effect we observed here for DH44N activation. These two lines of evidence suggest that there could be a direct peptidergic connection between DH44+ neurons and IPCs. We have added a paragraph mentioning these experiments to our discussion:

      ‘Notably, the DH44<sup>PI</sup>Ns express the DH44 peptide, as confirmed by anti-DH44 stainings(100). This also applies to a large fraction of neurons labelled in the broad DH44 driver line(100). However, a subset of neurons labelled in the broad line did not exhibit DH44 immunoreactivity(100), and might therefore not actually express the DH44 peptide. Hence, the inhibition of IPCs could be driven by neurons in the DH44 driver line that do not express DH44. A strong candidate for the inhibition are LK and DH44-positive neurons, which are labelled by the broad line(76). In a parallel study, we showed that LK-expressing neurons strongly inhibit IPCs(30), similar to the broad DH44 line used here. Furthermore, evidence from single-nucleus transcriptomic analysis shows that IPCs express DH44-R1 and DH44-R2 receptors(30). Therefore, it is possible that DH44Ns communicate with IPCs through a direct peptidergic connection. Notably, the inhibitory effect of non-PI DH44Ns on IPCs was very strong and fast, suggesting that a connection via classical synapses is more likely. Regardless, our results show that the glucose sensing DH44<sup>PI</sup>Ns and IPCs act independently of each other.’

      Reviewer #2 (Public Review): 

      Summary: 

      In this study, Bisen et al. characterized the state-dependency of insulin-producing cells in the brain of *Drosophila melanogaster*. They successfully established that IPC activity is modulated by the nutritional state and age of the animal. Interestingly, they demonstrate that IPCs respond to the ingestion of glucose, rather than to perfusion with it, an observation reminiscent of the incretin effect in mammals. The study is well conducted and presented and the experimental data convincingly support the claims made. 

      Strengths: 

      The study makes great use of the tools available in *Drosophila* research, demonstrating the effect that starvation and subsequent refeeding have on the physiological activity of IPCs as well as on the behavior of flies to then establish causal links by making use of optogenetic tools. 

      It is particularly nice to see how the authors put their findings in context to published research and use for example TDC2 neuron activation or DH44 activity to establish baselines to relate their data to. 

      Weaknesses: 

      I find the inability of SD to rescue the IPC starvation effect in Figure 1G&H surprising, given that the fully fed flies were raised and kept on that exact diet. Did the authors try to refeed flies with SD for longer than 24 hours? I understand that at some point the age effect would also kick in and counteract potential IPC activity rescue. I think the manuscript would benefit if the authors could indicate the exact age of the SD refed flies and expand a bit on the discussion of that point.  

      We have expanded the first paragraph of our discussion to tackle these questions, in particular the potential effect of aging, as suggested by the referee. We now also indicate the exact age of the flies. Moreover, we have conducted additional experiments in which we added either glucose or arabinose to our standard diet (Figure 1H). As we would have expected based on our hypothesis that the glucose concentration in our standard diet was too low to cause an increase in IPC activity after starvation, we find that feeding standard diet plus glucose increases IPC activity to the same level as glucose only, and that adding arabinose to the standard diet does not lead to increased IPC activity after starvation (Figure 1H).

      The incretin-like effect is exciting and it will be interesting in the future to find out what might be the signal mediating this effect. It is interesting that IPCs in explants seem to be responsive to glucose. I think it would help if the authors could briefly discuss possible sources for the different findings between these in fact very different preparations. Could the absence of the inhibitory DH44 feedback in the *ex-vivo* recordings for example play a role? 

      We thank the referee for this interesting point and expanded our discussion accordingly. We included that, in particular in brain explants without a VNC, the inhibitory connection we describe might be absent, as the referee suggested: ‘Previous ex vivo studies suggested that IPCs, like pancreatic beta cells, sense glucose cell-autonomously(23,24). Consistent with this, we observed an increase in IPC activity after the ingestion of glucose (Figure 2B). However, IPC activity did not increase during the perfusion of glucose directly over the brain. Importantly, the fly preparations were kept alive for several hours allowing the glucose-rich saline to enter circulation and reach all body parts. Several factors may explain the difference between ex vivo and in vivo preparations. First, in ex vivo studies, certain regulatory feedback mechanisms present in vivo could be absent. For example, the strong inhibitory input IPCs receive from DH44Ns we found would likely be absent in brain explants without a VNC. A lack of inhibitory feedback might allow for more direct glucose sensing by IPCs ex vivo, whereas in vivo, the IPC response could be suppressed by more complex systemic feedback. Second, we attempted to use the intracellular saline formulation employed in a previous ex vivo study(44). However, we observed that IPCs depolarized quickly using this saline, leading to unstable recordings that did not meet our quality standards for in vivo experiments. Another possible explanation for the lack of an effect of glucose might have been that the dominant circulating sugar in flies is trehalose(70,71), which is derived from glucose. When we extended our experiments, we found that trehalose perfusion did not affect IPC activity either, strengthening the idea that IPCs do not directly sense changes in hemolymph sugar levels. Therefore, our findings suggest that, similar to mammals, IPC activity and hence, insulin release, is not simply modulated by hemolymph sugar concentration in Drosophila.’ 

      The incretin-like effect the authors observed seems to start only after 5h which seems longer than in mammals where, as far as I know, insulin peaks around 1h. Do the authors have ideas on how this timescale relates to ingestion and glucose dynamics in flies? 

      We have now included the following section in the discussion to explicitly address the question of different activity dynamics in flies and mammals, but also the limitations of our electrophysiological approach in this regard: ‘We observed that IPC activity increased over a timescale of hours, which is longer compared to the fast insulin response in mammals, where insulin typically peaks within an hour of feeding(97). In flies, insulin levels rise within minutes of refeeding, followed by a drop after 30 min(20). Our experimental techniques limit our ability to capture these fast initial dynamics, since the preparation for intracellular recordings requires tens of minutes, so that we typically recorded IPC activity at least 20 min after the last food ingestion. Notably, studies in fasted mammals have shown that insulin peaks within minutes of refeeding, followed by a rapid decline, with levels stabilizing as feeding continues(98,99). We speculate a similar dynamic could be present in flies, but with our approach, we capture the steady-state reached tens of minutes after food ingestion rather than a potential initial peak.’ 

      The authors mention "a decrease in the FV of IPC-activated starved flies even before the first optogenetic stimulation (Figure 2I),". Could this be addressed by running an experiment in darkness, only using the IR illumination of their behavioral assay? 

      We thank the referee for pointing out this unexpected result. We discuss this in more detail in the new version of our manuscript and expand on the reasons for not performing these optogenetic activation experiments in the dark: First, the red LED required to activate CsChrimson triggers strong startle responses in dark-adapted flies, which mask other behavioral effects, in particular subtle ones such as those observed for IPCs. The startle response is much reduced when performing experiments under low background light conditions. Second, flies, at least in our hands, do not exhibit robust foraging behavior or starvation-induced hyperactivity in the dark, which is critical for our behavioral experiments. However, we also explain in our discussion that we believe the effect of background illumination is relatively small, since flies expressing CsChrimson in OANs or DH44Ns show comparable activity levels to controls. Hence, a part of this effect is likely attributable to leak currents induced by CsChrimson expression. We would like to point out though that we are careful in our description of the IPC effect on behavior, and focus on the fact that it is considerably smaller than the effects of other modulatory neurons (DH44Ns and OANs).

      The authors show an inhibitory effect of DH44 neuron activation on IPC activity. They further demonstrate that DH44PI neurons are not the ones driving this and thus conclude that "...IPCs are inhibited by DH44Ns outside the PI.". As the authors mentioned the broad expression of the DH44-Gal4 line, can they be sure that the cells labeled outside the PI are actually DH44+? If so they should state this more clearly, if not they should adapt the discussion accordingly.   

      We have substantially added to our discussion of this point, according to the referee’s great suggestion. In short, the broad line includes neurons that are DH44 positive and neurons that are not: ‘Notably, the DH44<sup>PI</sup>Ns express the DH44 peptide, as confirmed by anti-DH44 stainings(100). This also applies to a large fraction of neurons labelled in the broad DH44 driver line(100). However, a subset of neurons labelled in the broad line did not exhibit DH44 immunoreactivity(100), and might therefore not actually express the DH44 peptide. Hence, the inhibition of IPCs could be driven by neurons in the DH44 driver line that do not express DH44.’

      Reviewer #3 (Public Review): 

      Although insulin release is essential in the control of metabolism, adjusted to nutritional state, and plays major roles in normal brain function as well as in aging and disease, our knowledge about the activity of insulin-producing (and releasing) cells (IPCs) in vivo is limited. 

      In this technically demanding study, IPC activity is studied in the Drosophila model system by fine in vivo patch clamp recordings with parallel behavioral analyses and optogenetic manipulation. 

      The data indicate that IPC activity is increased with a slow time course after feeding a high-glucose diet. By contrast, IPC activity is not directly affected by increasing blood glucose levels. This is reminiscent of the incretin effect known from vertebrates and points to a conserved mechanism in insulin production and release upon sugar feeding. 

      Moreover, the data confirm earlier studies that nutritional state strongly affects locomotion. Surprisingly, IPC activity makes only a negligible contribution to this. Instead, other modulatory neurons that are directly sensitive to blood glucose levels strongly affect modulation. Together, these data indicate a network of multiple parallel and interacting neuronal layers to orchestrate the physiological, metabolic, and behavioral responses to nutritional state. Together with the data from a previous study, this work sets the stage to dissect the architecture and function of this network. 

      Strengths: 

      State-of-the-art current clamp in situ patch clamp recordings in behaving animals are a demanding but powerful method to provide novel insight into the interplay of nutritional state, IPC activity, and locomotion. The patch clamp recordings and the parallel behavioral analyses are of high quality, as are the optogenetic manipulations. The data showing that starvation silences IPC activity in young flies (younger than 1 week) are compelling. The evidence for the claim that locomotor activity is not increased upon IPC activity but upon the activity of other blood glucose-sensitive modulatory neurons (Dh44) is strong. The study provides a great system to experimentally dissect the interplay of insulin production and release with metabolism, physiology, and behavior. 

      Weaknesses: 

      Neither the mechanisms underlying the incretin effect, nor the network to orchestrate physiological, metabolic, and behavioral responses to nutritional state have been fully uncovered. Without additional controls, some of the conclusions would require significant downtoning. Controls are required to exclude the possibility that IPCs sense other blood sugars than glucose. The claim that IPC activity is controlled by the nutritional state would require that starvation-induced IPC silencing in young animals can be recovered by feeding a normal diet. At current firing in starvation, silenced IPCs can only be induced by feeding a high-glucose diet that lacks other important ingredients and reduces vitality. Therefore, feasible controls are needed to exclude that diet-induced increases in IPC firing rate are caused by stress rather than nutritional changes in normal ranges. The finding that refeeding starved flies with a standard diet had no effect on IPC activity but a strong effect on the locomotor activity of starved flies contradicts the statement that locomotor activity is affected by the same dietary manipulations that affect IPC activity. The compelling finding that starvation induces IPC firing would benefit from determining the time course of the effect. The finding that IPCs are not active in fed animals older than 1 week is surprising and should be further validated. 

      We thank the referee for the thoughtful and constructive criticism of our experiments and conclusions. Below, we lay out how we tackled the individual points raised by the referee.

      (1) ‘Controls are required to exclude the possibility that IPCs sense other blood sugars than glucose.’  

      To address this point, we conducted experiments in which we perfused trehalose (Figure 3B), the main circulating hemolymph sugar in Drosophila and other insects. Our results clearly show that trehalose does not affect IPC activity upon perfusion, confirming our statements that IPCs do not sense key blood sugars directly.

      (2) ‘Feasible controls are needed to exclude that diet-induced increases in IPC firing rate are caused by stress rather than nutritional changes in normal ranges’. 

      We agree with the referee that this point was not completely fleshed out in our first submission. We have now performed additional experiments in which we added glucose (and fructose) to our standard diet (Figure 1H). Flies feeding on this diet received all necessary nutrients but still experienced high concentrations of sugars. The effects of high glucose in a standard diet background were indistinguishable from those of high glucose in agarose, confirming that the IPCs respond to sugar rather than stress. Another important observation in this context is that IPCs in flies kept on a high protein diet exhibited much lower spike rates than IPCs in flies kept on the high glucose diet, even though the flies on the high protein diet had a much shorter lifespan and therefore, presumably, experienced much higher stress levels (Figure 1H, Figure S1). These observations underline that stress is certainly not the primary factor here.

      (3) ‘The finding that refeeding starved flies with a standard diet had no effect on IPC activity but a strong effect on the locomotor activity of starved flies contradicts the statement that locomotor activity is affected by the same dietary manipulations that affect IPC activity.’

      We have revised the respective section of the results and discussion accordingly and are more careful and clearer in our interpretation of this behavioral dataset: ‘These results show that the locomotor activity was affected by the same dietary manipulations that had strong effects on IPC activity. However, IPC activity changes alone cannot explain the modulation of starvation-induced hyperactivity. On the one hand, high-glucose diets which drove the highest activity in IPCs were not sufficient to reduce locomotor activity back to baseline levels. On the other hand, refeeding flies with SD did not revert the effects of starvation on IPC activity (Figure 1H), but it was sufficient to reduce the locomotor activity below baseline levels (Figure 2B). This suggests that the modulation of starvation-induced hyperactivity is achieved by multiple modulatory systems acting in parallel.’

      (4) ‘The compelling finding that starvation induces IPC firing would benefit from determining the time course of the effect.’

      We followed the referee’s excellent suggestion and determined the time course of the starvation effect in three timesteps, similar to the experiments we did for refeeding (Figure 1G). In addition, we now also quantify the number of active IPCs (i.e., IPCs that fired at least one action potential during our five-minute analysis window), which further illustrates the dynamics of the starvation and refeeding effects. We find that the starvation effect is graded, and that IPC activity decreases with increasing starvation duration.
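
      The "active IPC" criterion described above (at least one action potential within the analysis window) can be illustrated with a minimal sketch. The function name and the spike counts are hypothetical placeholders chosen for illustration; they are not the authors' analysis code or data.

```python
def active_fraction(spike_counts):
    """Return the fraction of cells that fired at least one spike.

    spike_counts: number of action potentials per recorded cell within
    one analysis window (five minutes in the response above).
    """
    if not spike_counts:
        raise ValueError("need at least one recorded cell")
    active = sum(1 for n in spike_counts if n >= 1)
    return active / len(spike_counts)


# Hypothetical spike counts for six cells (illustrative values, not real data).
example_counts = [0, 0, 12, 3, 0, 1]
print(active_fraction(example_counts))
```

      Counting active cells complements the mean spike rate, because a mean alone cannot distinguish a few highly active cells from many weakly active ones.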

      (5) ‘The finding that IPCs are not active in fed animals older than 1 week is surprising and should be further validated.’

      To address the referee’s comment, we have added 14 new IPC recordings from flies in the 6–26-day range, such that we now have recordings from 9-14 IPCs for each age range (Figure S2B). They confirmed our previous analysis and strengthened the finding that IPC activity dramatically decreases after 8 days (on our standard diet). The total number of IPCs in this supplementary dataset was thus increased from 34 to 48.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      (1) Do IPCs respond to glucose specifically after ingestion or generally to any other nutritive sugars? To tackle this question the IPC responses in starved flies can be recorded after refeeding flies with other nutritive sugars (fructose, sucrose). 

      To address this important question, we have performed additional experiments in which we refed starved flies with fructose, as a nutritive sugar, and arabinose, as a non-nutritive sugar. As expected, IPCs responded to fructose but not to arabinose, indicating that they respond to nutritive sugars in general. We describe and discuss these key results in the new version of our manuscript.

      (2) In Figure 2, the x and y axes are not annotated on all subfigures, which might help improve clarity. 

      We have annotated the subfigures as requested.

      (3) In the discussion on page 9 ("...we observed an increase in IPC activity after the ingestion of glucose (Figure 2B)."), the authors refer to Figure 2B instead of 3C.

      We have fixed this oversight.

      Reviewer #2 (Recommendations For The Authors): 

      Introduction 

      I think it could be helpful for the reader if you would briefly state the number of IPCs and whether you are targeting all of them with Dilp2-Gal4. 

      We included the numbers according to the suggestion. 14 IPCs are labeled in the driver line, and this is the number of IPCs commonly assumed to be present in the PI.

      Figures 

      In some Figures (for example 1D & E) the authors state the number of IPCs recorded (N) but not the number of animals used (n). This should be stated as the data from within an animal are dependent and might give insights about IPC heterogeneity. 

      We have compiled tables for the supplementary material (Tables S5 & S6) in which we state the number of IPCs and DH44<sup>PI</sup>Ns recorded and the number of different flies for each figure panel. We have recorded an average of 1.4 IPCs per fly (217 IPCs from 160 flies). We therefore expect the bias introduced by individual flies to be rather small. However, in our parallel study, we specifically investigate the heterogeneity of IPCs by maximizing the number of IPCs recorded per fly (Held M, et al. ‘Aminergic and peptidergic modulation of Insulin-Producing Cells in Drosophila’. eLife. 2024;13. doi:10.7554/ELIFE.99548.1). In the case of DH44<sup>PI</sup>Ns, we recorded 24 neurons in 21 flies – 1.1 neurons per fly.
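
      The per-fly averages quoted above follow directly from the stated totals. A short sketch of the arithmetic, using only the counts given in the response (the function name is a hypothetical helper):

```python
def cells_per_fly(n_cells, n_flies):
    """Mean number of recorded cells per fly."""
    if n_flies <= 0:
        raise ValueError("number of flies must be positive")
    return n_cells / n_flies


ipc_per_fly = cells_per_fly(217, 160)   # IPCs: 217 cells from 160 flies
dh44_per_fly = cells_per_fly(24, 21)    # DH44 PI neurons: 24 cells from 21 flies

# Rounded to one decimal place, as reported in the response.
print(f"{ipc_per_fly:.1f} IPCs per fly")
print(f"{dh44_per_fly:.1f} DH44 PI neurons per fly")
```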

      - Figure 3D: There is some white visible among the cell bodies in the overlay. I assume this comes from projecting across layers rather than indicating DH44 - IPC overlap? It would help to explicitly state that. 

      We have added a statement to the results section, in which we explain that most of the white is due to overlap in the z-projection rather than overlap in the driver lines. However, there are a few cases (typically one to two cells per brain), in which neurons labeled by the DH44 line also stain positive for Dilp2, indicating they express both neuropeptides. We have added this information to the manuscript:  

      Results: ‘DH44<sup>PI</sup>Ns are anatomically similar to IPCs, and their cell bodies are located directly adjacent to those of IPCs in the PI, making them an ideal positive control for our experiments (Figure 3D). A small subset of DH44<sup>PI</sup>Ns also expresses Dilp2(75), and our immunostainings confirmed colocalization of Dilp2 and DH44 in a single neuron (Figure 3D, white arrow).’

      In figure caption: ‘UAS-myr-GFP was expressed under a DH44-GAL4 driver to label DH44 neurons. GFP was enhanced with anti-GFP (green), brain neuropils were stained with anti-nc82 (cyan), and IPCs were labelled using a Dilp2 antibody (magenta). White arrow indicates Dilp2 and DH44-GAL4 positive neuron. The other white regions in the image result from an overlap in z-projections between the two channels, rather than from antibody colocalization.’

      - Figure 4I: One might get the impression that the fast onset peak of activity precedes the stimulation onset, using a thinner line width might help avoid that. 

      This effect is due to a combination of using relatively heavy lines for clear visibility of the data and a gentle smoothing step (a 2s median filter, which corresponds to less than 1% of the 300s stimulation window) in our analysis of the behavioral data. However, inspection of the raw data clearly shows increases in velocity after the onset of the optogenetic activation. We clarified this in the figure caption: ‘Average FV across all DH44N activation trials based on two independent replications of the experiment in I. Note that the peak in average FV lies within the first frame of the stimulation window.’
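
      The smoothing step mentioned above can be sketched as a centered rolling median. This is a minimal illustration only: the window length in frames depends on the camera frame rate, which is not stated here, so the three-frame window and the toy trace below are assumptions, not the authors' settings or data.

```python
from statistics import median


def rolling_median(values, window):
    """Centered rolling median; the window shrinks at the trace edges."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number of frames")
    half = window // 2
    return [
        median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


# Filter length relative to the stimulation window, from the values in the text.
FILTER_S = 2.0
STIM_S = 300.0
filter_fraction = FILTER_S / STIM_S  # about 0.0067, i.e. below 1%

# Toy velocity trace with a single-frame outlier (hypothetical values).
trace = [1.0, 1.0, 9.0, 1.0, 1.0, 1.0, 1.0]
smoothed = rolling_median(trace, 3)  # 3 frames is an assumed window length
print(filter_fraction, smoothed)
```

      A median filter is a common choice for this kind of trace because it suppresses isolated outliers without averaging them into neighboring frames.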

      - S3 panel letters do not match references in the text.

      We fixed this oversight.

      Formatting 

      - Page 10: The paragraphs on the bottom of the page got switched around.

      This has been fixed.

      - Page 14: The first paragraph after the header "Free-walking assay" seems to be coming from elsewhere. 

      We apologize for this slightly embarrassing mistake. We used our related bioRxiv preprint (Held et al.) as a template for formatting this paper, and accidentally left this part of the methods section in the manuscript. We have fixed this error in our resubmission.

      Reviewer #3 (Recommendations For The Authors): 

      Major suggestions: 

      (1) The data show convincingly that IPC activity is decreased by starvation during the first week of adult life (Figures 1C and D). However, the conclusion that IPC activity is controlled by the nutritional state requires additional care. First, refeeding starved adult animals with a normal diet does not bring back normal IPC firing rates (Figure 1H). Therefore, IPC activity does not strictly follow changes in nutritional state, but IPCs are silenced by starvation. Second, from the second week of adult life on, IPCs are silent anyway, and thus unlikely responsive to changes in the nutritional state anymore (which might be different on a different standard diet?) The only effect of feeding on IPC activity is observed upon feeding starved, young animals with high glucose for 12-24 hrs (Figure 1G). However, it is not clear whether increased IPC firing is caused by the effects of high glucose on the nutritional state in a normal range, or because of diet-induced stress (the diet also severely shortens lifespan, Figure 1S). Does high glucose also increase IPC firing rate in young, fed animals? These would have strongly increased glucose concentrations but not suffer the stress of not getting any other nutrients. Such experiments would be required to make the statement that glucose feeding increases IPC firing rate. 

      We have performed several experiments to address this criticism. First, we performed a time course analysis of the starvation effect. We show that the IPC activity reduction is graded, and that IPC activity declines already after two hours of starvation, a timepoint at which stress levels should still be relatively low (Figure 1G). Second, we refed flies with high glucose concentrations added to the standard diet (Figure 1H). This minimized any potential stress responses due to a lack of nutrients. Third, we now show that IPCs specifically respond to nutritive sugars (glucose and fructose), but not to non-nutritive sugars (arabinose, Figure 1H). We believe that these data sets, in addition to the graded refeeding effect, make a strong case for the nutritional state-dependent modulation of IPCs. 

      (2) The testing of locomotor activity is well done, nicely recapitulates starvation-induced increases in locomotion, and adds interesting novel findings on refeeding with high glucose versus high protein diet. However, the statement that locomotor activity was affected by the same dietary manipulations that had strong effects on IPC activity does not reflect the data presented. Refeeding starved flies with a standard diet had no effect on IPC activity (Figure 1H) but a strong effect on locomotor activity of starved flies (a strong reduction, even stronger than high glucose diet, Figure 2B). 

      We have revised the respective section of the results and discussion accordingly and are more careful and clearer in our interpretation of this behavioral dataset: ‘These results show that the locomotor activity was affected by the same dietary manipulations that had strong effects on IPC activity. However, IPC activity changes alone cannot explain the modulation of starvation-induced hyperactivity. On the one hand, high-glucose diets which drove the highest activity in IPCs were not sufficient to reduce locomotor activity back to baseline levels. On the other hand, refeeding flies with SD did not revert the effects of starvation on IPC activity (Figure 1H), but it was sufficient to reduce the locomotor activity below baseline levels (Figure 2B). This suggests that the modulation of starvation-induced hyperactivity is achieved by multiple modulatory systems acting in parallel.’

      Related to points 1 and 2, a key statement that the results establish that IPC activity is controlled by the nutritional state requires care. What the data convincingly show is that IPC activity is near zero upon starvation. 

      As described above, we have added several extensive data sets (fructose feeding, arabinose feeding, trehalose perfusion, starvation time course) to show that we indeed observe a nutritional state-dependent modulation of IPCs, and we describe these new results in the results and discussion.

      (3) The time course of nutritional state-dependent changes of IPC activity is claimed to be slow, several hours to days. Unless I have missed a figure, the underlying data are not presented (only for high glucose diet). It would be great if this could also be shown for a standard diet with higher glucose concentrations than the one used so that it rescues starvation-induced IPC silencing without shortening lifespan (if this is feasible?). The data showing starvation-induced IPC silencing are convincing, but, unless I have missed it, the time course has not been determined. It would be very nice to actually show this. Have different starvation times been tested in relation to IPC firing rate, and if yes, with what time resolution? Does IPC activity change already after 0.5 or 1 or a few hours of starvation? If starvation can silence IPCs faster than assumed, the near-zero IPC activity in animals older than a week could very well be caused by longer time intervals between meals. 

      We have performed experiments to address both important points raised by the referee here. 1) We have added high glucose concentrations to our standard diet, and show that it has the same effect – a significant increase in IPC activity – as the high glucose diet (Figure 1H). 2) We have analyzed the time course of IPC activity reduction in response to starvation (Figure 1G). Indeed, we find that a few hours of starvation start reducing IPC activity. We discuss the possibility that reduced IPC activity in older flies could be due to reduced food intake: ‘One of our experiments demonstrated that IPC activity was heavily diminished in flies older than 10 days (Figure S2B). A possible explanation could be that flies feed less as they age. However, this only holds true for flies older than 14 days(86). Therefore, reduced IPC activity in 10-11 day old flies is unlikely to result from reduced food intake and likely involves inhibition of insulin signaling.’

      (4) The data on the proposed incretin effect are of high importance in potentially highlighting a highly conserved link between glucose ingestion and insulin release. An important control would be to test different sugars, such as trehalose, an important blood sugar of flies. If glucose is converted into trehalose and this is what IPCs sense, then perfusion of glucose has no effect. The fact that the fantastic experiments show that the DH44 neurons are sensitive to glucose perfusion does not rule out that IPCs sense a different sugar. This would be very different from the incretin effect that requires additional hormones. In addition, as mentioned above, controls are required to show that high glucose affects IPCs as a nutrient and not as a stressor (see point 1), for example refeeding with a standard diet that contains a higher glucose concentration but does not reduce lifespan. Another great control to solidify the exciting claim on the incretin effect would be to knock out candidate Drosophila incretin hormones and test whether a high glucose diet stops increasing the IPC firing rate (although simpler controls might also do the job). 

      We have performed the two key experiments suggested by the referee. 1) We perfused trehalose as the primary blood sugar of flies and showed that IPCs do not respond to trehalose perfusion (Figure 3B & C). This further strengthens the finding that IPC activity in flies shows an incretin-like effect. 2) We have added high concentrations of glucose to our standard diet to provide flies with a full diet that contains high glucose concentrations. IPC activity in these flies was indistinguishable from the activity in flies which consumed pure glucose diets. In contrast, IPC activity in flies kept on a high protein diet, which dramatically reduced lifespan, was very low. These results clearly show that higher IPC activity is not due to increased stress levels, but a function of nutritive sugar ingestion. We further validated this hypothesis by refeeding flies with fructose as a nutritive sugar, which increased IPC activity, and arabinose as a non-nutritive sugar, which did not affect IPC activity (Figure 1H).

      Another point that might be relevant to this discussion is that IPC activity is almost entirely shut down during flight in Drosophila (which we showed in Liessem et al. 2023, Current Biology 33 (3), 449-463. e5). Several ‘stress hormones’ are released during flight, including octopamine. The fact that IPC activity is low in flying flies, starved flies, and flies kept on a pure protein diet (which all experience high stress levels), to us, very clearly suggests that stress is not the predominant factor here. We would also like to point out that, while the lifespan was reduced in flies kept on pure glucose diets, survival rates were at 100% until day 14, and we carried out our experiments on day 2 after starvation. Hence, these flies might not (yet) experience particularly high stress levels.

      (5) The discussion relates the absence of IPC firing in animals older than 1 week to aging. However, given that the flies fed on a normal diet show the typical lifespan for Drosophila, a 10-day-old fly is still in its youth. Maybe flies at 10 days eat simply less and thus IPC spiking goes down as in starved flies, especially because the standard diet used contains low glucose. Do IPCs also become silent after a week if the animals are fed with a standard diet that contains a higher glucose concentration? Without additional controls, this part of the discussion is pretty speculative and should be revised. 

      We agree with the reviewer that it is not clear whether reduced IPC activity is a direct result of physiological changes that occur with aging, or an indirect effect of reduced food intake, which also occurs during aging. In both cases, in our view, it would be an age-related effect. Since this is a minor point of our manuscript, we decided not to perform additional experiments, other than significantly increasing the sample size for the aging data set already presented to shore up our findings (Figure S2B). We have, however, revisited the discussion of this point according to the referee’s suggestion: ‘One of our experiments demonstrated that IPC activity was heavily diminished in flies older than 10 days (Figure S2B). A possible explanation could be that flies feed less as they age. However, this only holds true for flies older than 14 days(85). Therefore, reduced IPC activity in 10-11 day old flies is unlikely to result from reduced food intake and likely involves inhibition of insulin signaling.’

      Other suggestions: 

      (6) For the mixed effects of octopamine and tyramine on larval locomotion that are referred to, it might be interesting to also look at Schützler et al 2019, PNAS because it shows that starvation activates TBH so that the octopamine to tyramine ratio is increased. 

      We refer to Schützler et al. in the following paragraph of our discussion: ‘This intermittent locomotor arrest has been previously described in adult flies and is thought to be mediated by ventral unpaired median OANs, which have been suggested to suppress long-distance foraging behavior(69). Since these are not the only neurons we activate in the TDC2 line, we speculate that the stopping phenotype could also result from concerted effects of octopamine and tyramine modulating muscle contractions(65-67) and motor neuron excitability(68), as previously described in Drosophila larvae, or from OANs interfering with pattern generating networks in the ventral nerve cord (VNC) during longer activation(69).’  

      (7) The reference list requires care. For example, reference 43 is identical to 67, reference 66 gives no information on incretin-like hormones in Drosophila as stated in the text 

      We carefully double-checked our reference list and corrected the mistakes mentioned.

    1. Author response:

      Reviewer #1 (Public review):

      I did not follow the logic behind including spindle amplitude in the meta-analysis. This is not a measure of SO-spindle coupling (which is the focus of the review), unless the authors were restricting their analysis to the amplitude of coupled spindles only. It doesn't sound like this is the case though. The effect of spindle amplitude on memory consolidation has been reviewed in another recent meta-analysis (Kumral et al, 2023, Neuropsychologia). As this isn't a measure of coupling, it wasn't clear why this measure was included in the present meta-analysis. You could easily make the argument that other spindle measures (e.g., density, oscillatory frequency) could also have been included, but that seems to take away from the overall goal of the paper which was to assess coupling.

      Indeed, spindle amplitude refers to all spindle events rather than only coupled spindles. This choice was made because we recognized the challenge of obtaining relevant data from each study—only 4 out of the 23 included studies performed their analyses after separating coupled and uncoupled spindles. This inconsistency strengthens the urgency and importance of this meta-analysis to standardize the methods and measures used for future analysis on SO-SP coupling and beyond. We agree that focusing on the amplitude of coupled spindles would better reveal their relations with coupling, and we will discuss this limitation in the manuscript.

      Nevertheless, we believe including spindle amplitude in our study remains valuable, as it served several purposes. First, SO-SP coupling involves the modulation between spindle amplitude and slow oscillation phase. Different studies have reported conflicting conclusions regarding how spindle amplitude was related to coupling – some found significant correlations (e.g., Baena et al., 2023), while others did not (e.g., Roebber et al., 2022). This discrepancy highlights an indirect but potentially crucial insight into the role of spindle amplitude in coupling dynamics. Second, in studies related to SO-SP coupling, spindle amplitude is one of the most frequently reported measures along with other coupling measures that significantly correlated with oversleep memory improvements (e.g. Kurz et al., 2023; Ladenbauer et al., 2021; Niknazar et al., 2015), so we believe that including this measure allows a more comprehensive review of the existing literature on SO-SP coupling. Third, incorporating spindle amplitude allows for a direct comparison between the measurement of coupling and individual events alone in their contribution to memory consolidation – a question that has been extensively explored in recent research (e.g., Hahn et al., 2020; Helfrich et al., 2019; Niethard et al., 2018; Weiner et al., 2023). Finally, spindle amplitude was identified as a key moderator for memory consolidation in Kumral et al.'s (2023) meta-analysis. By including it in our analysis, we sought to replicate their findings within a broader framework and introduce conceptual overlaps with existing reviews. Therefore, although we were not able to selectively include coupled spindles, there is still a unique relation between spindle amplitude and SO-SP coupling that other spindle measures do not have. 

      Originally, we also intended to include coupling density or counts in the analysis, which seems more relevant to the coupling metrics. However, the lack of uniformity in methods used to measure coupling density posed a significant limitation. We hope that our study will encourage consistent reporting of all relevant parameters in future research, enabling future meta-analyses to incorporate these measures comprehensively. We will add this discussion to the manuscript in the revised version to further clarify these points.

      References:

      Roebber, J. K., Lewis, P. A., Crunelli, V., Navarrete, M. & Hamandi, K. Effects of anti-seizure medication on sleep spindles and slow waves in drug-resistant epilepsy. Brain Sci. 12, 1288 (2022). https://doi.org/10.3390/brainsci12101288

      All other citations were referenced in the manuscript.

      At the end of the first paragraph of section 3.1 (page 13), the authors suggest their results "... further emphasise the role of coupling compared to isolated oscillation events in memory consolidation". This had me wondering how many studies actually test this. For example, in a hierarchical regression model, would coupled spindles explain significantly more variance than uncoupled spindles? We already know that spindle activity, independent of whether they are coupled or not, predicts memory consolidation (e.g., Kumral meta-analysis). Is the variance in overnight memory consolidation fully explained by just the coupled events? If both overall spindle density and coupling measures show an equal association with consolidation, then we couldn't conclude that coupling compared to isolated events is more important.

      While primary coupling measurements, including coupling phase and strength, showed strong evidence for their associations with memory consolidation, measures of spindles, including spindle amplitude, exhibited only limited evidence (or a “non-significant” effect) for their association with consolidation. These results are consistent with multiple empirical studies using different techniques (e.g., Hahn et al., 2020; Helfrich et al., 2019; Niethard et al., 2018; Weiner et al., 2023), which reported that coupling metrics are more robust predictors of consolidation and synaptic plasticity than spindle or slow oscillation metrics alone. However, we agree with the reviewer that we did not directly separate the effect between coupled and uncoupled spindles, and a more precise comparison would involve contrasting the “coupling of oscillation events” with “individual oscillation events” rather than coupling versus isolated events.

      We recognized that Kumral and colleagues’ meta-analysis reported a moderate association between spindle measures and memory consolidation (e.g., for spindle amplitude-memory association they reported an effect size of approximately r = 0.30). However, one of the advantages of our study is that we actively cooperated with the authors to obtain a large number of unreported and insignificant data relevant to our analysis, as well as separated data that were originally reported under mixed conditions. This approach decreases the risk of false positives and selective reporting of results, making the effect size more likely to approach the true value. In contrast, we found only a weak effect size of r = 0.07 with minimal evidence for spindle amplitude-memory relation. However, we agree with the reviewer that using a more conservative term in this context would be a better choice since we did not measure all relevant spindle metrics including the density.

      To improve clarity in our manuscript, we will revise the statement to: “Together with other studies included in the review, our results suggest a crucial role of coupling but did not support the role of spindle events alone in memory consolidation,” and provide relevant references. We believe this can more accurately reflect our findings and the existing literature to address the reviewer’s concern.

      It was very interesting to see that the relationship between the fast spindle coupling phase and overnight consolidation was strongest in the frontal electrodes. Given this, I wonder why memory promoting fast spindles shows a centro-parietal topography? Surely it would be more adaptive for fast spindles to be maximally expressed in frontal sites. Would a participant who shows a more frontal topography of fast spindles have better overnight consolidation than someone with a more canonical centro-parietal topography? Similarly, slow spindles would then be perfectly suited for memory consolidation given their frontal distribution, yet they seem less important for memory.

      Regarding the topography of fast spindles and their relationship to memory consolidation, we agree this is an intriguing issue, and we have already made significant progress on this topic in our ongoing work. We share a few relevant observations. First, there are significant discrepancies in the definition of “slow spindle” in the field. Some studies defined slow spindles as 9-12 Hz (e.g., Mölle et al., 2011; Kurz et al., 2021), while others performed the event detection within a range of 11-13/14 Hz (e.g., Barakat et al., 2011; D'Atri et al., 2018). Compounding this issue, individual differences in spindle frequency are often overlooked, leading to challenges in reliably distinguishing between slow and fast spindles. Some studies have reported difficulty in clearly separating the two types of spindles altogether (e.g., Hahn et al., 2020). Moreover, a critical factor often ignored in past research is the traveling nature of both slow oscillations and spindles across the cortex, where spindles are coupled with significantly different phases of slow oscillations (see Figure 5). We believe that studying coupling in the context of the movement of these waves will help explain the observed frontal relationship with consolidation. We will address this in our revised manuscript.

      The authors rightly note the issues with multiple comparisons in sleep physiology and memory studies. Multiple comparison issues arise in two ways in this literature. First are comparisons across multiple electrodes (many studies now use high-density systems with 64+ channels). Second are multiple comparisons across different outcome variables (at least 3 ways to quantify coupling (phase, consistency, occurrence) x 2 spindle types (fast, slow)). Can the authors make some recommendations here in terms of how to move the field forward, as this issue has been raised numerous times before (e.g., Mantua 2018, Sleep; Cox & Fell 2020, Sleep Medicine Reviews for just a couple of examples). Should researchers just be focusing on the coupling phase? Or should researchers always report all three metrics of coupling, and correct for multiple comparisons? I think the use of pre-registration would be beneficial here, and perhaps could be noted by the authors in the final paragraph of section 3.5, where they discuss open research practices.

      There are indeed multiple methods that we can discuss, including cluster-based and non-parametric methods, etc., to correct for multiple comparisons in EEG data with spatiotemporal structures. In addition, encouraging the reporting of all tested but insignificant results, at least in supplementary materials, is an important practice that helps readers understand the findings with reduced bias. We agree with the reviewer’s suggestions and will add more information in section 3.5 to advocate for a standardized “template” used to analyze and report effect size in future research.

      We advocate for the standardization of reporting all three coupling metrics– phase, consistency, and occurrence. Each coupling metric captures distinct properties of the coupling process and may interact with one another (Weiner et al., 2023). Therefore, we believe it is essential to report all three metrics to comprehensively explore their different roles in the “how, what, and where” of long-distance communication and consolidation of memory. As we advance toward a deeper understanding of the relationship between memory and sleep, we hope this work establishes a standard for the standardization, transparency, and replication of relevant studies.

      Reviewer #2 (Public review):

      Regarding the Moderator of Age: Although the authors discuss the limited studies on the analysis of children and elders regarding age as a moderator, the figure shows a significant gap between the ages of 40 and 60. Furthermore, there are only a few studies involving participants over the age of 60. Given the wide distribution of effect sizes from studies with participants younger than 40, did the authors test whether removing studies involving participants over 60 would still reveal a moderator effect?

      We agree that there is an age gap between younger and older adults, as current studies often focus on contrasting newly matured and fully aged populations to amplify the effect, while neglecting the gradual changes in memory consolidation mechanisms across the aging spectrum. We suggest that a non-linear analysis of age effects would be highly valuable, particularly when additional child and older adult data become available.

      In response to the reviewer’s suggestion, we re-tested the moderation effect of age after excluding effect sizes from older adults. The results revealed a decrease in the strength of evidence for the phase-memory association due to increased variability, but were consistent for all other coupling parameters. The mean estimations also remained consistent (coupling phase-memory relation: -0.005 [-0.013, 0.004], BF10 = 5.51, the strength of evidence reduced from strong to moderate; coupling strength-memory relation: -0.005 [-0.015, 0.008], BF10 = 4.05, the strength of evidence remained moderate). These findings align with prior research, which typically observed a weak coupling-memory relationship in older adults during aging (Ladenbauer et al., 2021; Weiner et al., 2023) but not during development (Hahn et al., 2020; Kurz et al., 2021; Kurz et al., 2023). Therefore, this result is not surprising to us, and there are still observable moderate patterns in the data. We will report these additional results in the revised manuscript, and note that “the moderator effect of age becomes less pronounced during development after excluding the older adult data”. We believe the original findings including the older adult group remain meaningful after cautious interpretation, given that the older adult data were derived from multiple studies and different groups.

      Reviewer #3 (Public review):

      First, the authors conclude that "SO-SP coupling should be considered as a general physiological mechanism for memory consolidation". However, the reported effect sizes are smaller than what is typically considered a "small effect" (0.10)

      While we acknowledge the concern about the small effect sizes reported in our study, it is important to contextualize these findings within the field of neuroscience, particularly memory research. Even in individual studies, small effect sizes are not uncommon due to the inherent complexity of the mechanisms involved and the multitude of confounding variables. This is an important factor to be considered in meta-analyses where we synthesize data from diverse populations and experimental conditions. For example, the relationship between SO-slow SP coupling and memory consolidation in older adults is expected to be insignificant.

      As Funder and Ozer (2019) concluded in their highly cited paper, an effect size of r = 0.3 in psychological and related fields should be considered large, with r = 0.4 or greater likely representing an overestimation and rarely found in a large sample or in a replication. Therefore, we believe r = 0.1 should not be considered as a lower bound of the small effect. Bakker et al. (2019) also advocate for a contextual interpretation of the effect size. This is particularly important in meta-analyses, where the results are less prone to overestimation compared to individual studies, and we cooperated with all authors to include a large number of unreported and non-significant results. In this context, small correlations may contain substantial meaningful information to interpret. Although we agree that effect sizes reported in our study are indeed small at the overall level, they reflect a rigorous analysis that incorporates robust evidence across different levels of moderators. Our moderator analyses underscore the dynamic nature of coupling-memory relationships, with certain subgroups demonstrating much stronger and more meaningful effects, especially after excluding slow spindles and older adults. For example, both the coupling phase and strength of frontal fast spindles with slow oscillations exhibited "moderate-to-large" correlations with the consolidation of different types of memory, especially in young adults, with r values ranging from 0.18 to 0.32 (see Tables S9.1-9.4). We will add more discussion about the influence of moderators on the dynamics of coupling-memory associations. In addition, we will update the conclusion to “SO-fast SP coupling should be considered as a general physiological mechanism for memory consolidation”.

      References:

      Funder, D. C. & Ozer, D. J. Evaluating effect size in psychological research: sense and nonsense. Adv. Methods Pract. Psychol. Sci. 2, 156–168 (2019). https://doi.org/10.1177/2515245919847202

      Bakker, A. et al. Beyond small, medium, or large: Points of consideration when interpreting effect sizes. Educ. Stud. Math. 102, 1–8 (2019). https://doi.org/10.1007/s10649-019-09908-4

      Second, the study implements state-of-the-art Bayesian statistics. While some might see this as a strength, I would argue that it is the greatest weakness of the manuscript. A classical meta-analysis is relatively easy to understand, even for readers with only a limited background in statistics. A Bayesian analysis, on the other hand, introduces a number of subjective choices that render it much less transparent.

      This kind of analysis seems not to be made to be intelligible to the average reader. It follows a recent trend of using more and more opaque methods. Where we had to trust published results a decade ago because the data were not openly available, today we must trust the results because the methods can no longer be understood with reasonable effort.

      This becomes obvious in the forest plots. It is not immediately apparent to the reader how the distributions for each study represent the reported effect sizes (gray dots). Presumably, they depend on the Bayesian priors used for the analysis. The use of these priors makes the analyses unnecessarily opaque, eventually leading the reader to question how much of the findings depend on subjective analysis choices (which might be answered by an additional analysis in the supplementary information).

      We appreciate the reviewer for sharing this viewpoint and we value the opportunity to clarify some key points. To address the concern about clarity, we will include a sub-section in the methods section explaining how to interpret Bayesian statistics including priors, posteriors, and Bayes factors, making our results more accessible to those less familiar with this approach.

      On the use of Bayesian models, we believe there may have been a misunderstanding. Bayesian methods, far from being "opaque" or overly complex, are increasingly valued for their ability to provide nuanced, accurate, and transparent inferences (Sutton & Abrams, 2001; Hackenberger, 2020; van de Schoot et al., 2021; Smith et al., 1995; Kruschke & Liddell, 2018). They had been applied in more than 1,200 meta-analyses as of 2020 (Hackenberger, 2020). In our study, we used priors that assume no effect (mean set to 0, which aligns with the null) while allowing for a wide range of variation to account for large uncertainties. This approach reduces the risk of overestimation or false positives and demonstrates much-improved performance over traditional methods in handling variability (Williams et al., 2018; Kruschke & Liddell, 2018). Sensitivity analyses reported in the supplemental material (Tables S9.1-9.4) confirmed the robustness of our choice of priors: our results did not vary when different priors were set.

      As Kruschke and Liddell (2018) described, “shrinkage (pulling extreme estimates closer to group averages) helps prevent false alarms caused by random conspiracies of rogue outlying data,” a well-known advantage of Bayesian over traditional approaches. This explains the observed differences between the distributions and grey dots in the forest plots. Unlike p-values, which can be overestimated with a large sample size and underestimated with a small sample size, Bayesian methods make assumptions explicit, enabling others to challenge or refine them, an approach aligned with open science principles (van de Schoot et al., 2021). For example, a credible interval in a Bayesian model can be interpreted as “there is a 95% probability that the parameter lies within the interval,” while a confidence interval in a frequentist model means “in repeated experiments, 95% of the confidence intervals will contain the true value.” We believe the former is much more straightforward and convincing for readers to interpret. We will ensure our justification for using Bayesian models is more clearly presented in the manuscript.
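
      The shrinkage described above can be illustrated with a minimal sketch. It assumes a simple normal-normal model in which the pooled mean and the between-study SD are fixed at hypothetical values; in a full Bayesian meta-analysis both would be estimated under priors, so this is not the model fitted in the manuscript. The function name `shrink` and all numbers are illustrative assumptions.

```python
def shrink(estimate, se, pooled_mean, tau):
    """Posterior mean and SD of one study's true effect (normal-normal model).

    estimate, se     : observed effect size and its standard error
    pooled_mean, tau : assumed mean and SD of the distribution of true effects
    """
    # Weight on the study's own estimate: close to 1 for precise studies,
    # close to 0 for noisy ones.
    weight = tau**2 / (tau**2 + se**2)
    post_mean = weight * estimate + (1 - weight) * pooled_mean
    post_sd = (weight * se**2) ** 0.5
    return post_mean, post_sd


# Hypothetical inputs, not values taken from the meta-analysis.
POOLED_MEAN, TAU = 0.10, 0.10
noisy_study = shrink(estimate=0.60, se=0.25, pooled_mean=POOLED_MEAN, tau=TAU)
precise_study = shrink(estimate=0.20, se=0.05, pooled_mean=POOLED_MEAN, tau=TAU)
print("noisy study   (mean, sd):", noisy_study)
print("precise study (mean, sd):", precise_study)
```

      The imprecise study is pulled most of the way toward the pooled mean, while the precise study barely moves, which is the pattern a reader would expect between a reported effect size and the centre of its posterior distribution in a forest plot.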

      We acknowledge that even with these justifications, different researchers may still differ in their preferences for Bayesian or frequentist models. To make our reporting more transparent, we have also reported the traditional frequentist meta-analysis results in Supplemental Material 10 to justify the robustness of our analysis, which suggested non-significant differences between Bayesian and frequentist models. We will include clearer references in the next version of the manuscript to direct readers to the figures that report the statistics provided by traditional models.

      References:

      Hackenberger, B.K. Bayesian meta-analysis now—let's do it. Croat. Med. J. 61, 564–568 (2020). https://doi.org/10.3325/cmj.2020.61.564

      Sutton, A.J. & Abrams, K.R. Bayesian methods in meta-analysis and evidence synthesis. Stat. Methods Med. Res. 10, 277–303 (2001). https://doi.org/10.1177/096228020101000404

      Williams, D.R., Rast, P. & Bürkner, P.C. Bayesian meta-analysis with weakly informative prior distributions. PsyArXiv (2018). https://doi.org/10.31234/osf.io/9n4zp

      van de Schoot, R., Depaoli, S., King, R. et al. Bayesian statistics and modelling. Nat Rev Methods Primers 1, 1 (2021). https://doi.org/10.1038/s43586-020-00001-2

      Smith, T.C., Spiegelhalter, D.J. & Thomas, A. Bayesian approaches to random-effects meta-analysis: a comparative study. Stat. Med. 14, 2685–2699 (1995). https://doi.org/10.1002/sim.4780142408

      Kruschke, J.K. & Liddell, T.M. The Bayesian New Statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective. Psychon. Bull. Rev. 25, 178–206 (2018). https://doi.org/10.3758/s13423-016-1221-4

      However, most of the methods are not described in sufficient detail for the reader to understand the proceedings. It might be evident for an expert in Bayesian statistics what a "prior sensitivity test" and a "posterior predictive check" are, but I suppose most readers would wish for a more detailed description. However, using a "Markov chain Monte Carlo (MCMC) method with the no-U-turn Hamiltonian Monte Carlo (HMC) sampler" and checking its convergence "through graphical posterior predictive checks, trace plots, and the Gelman and Rubin Diagnostic", which should then result in something resembling "a uniformly undulating wave with high overlap between chains" is surely something only rocket scientists understand. Whether this was done correctly in the present study cannot be ascertained because it is only mentioned in the methods and no corresponding results are provided. 

      We appreciate the reviewer’s concerns about accessibility and potential complexity in our descriptions of Bayesian methods. Our decision to provide a detailed account serves to enhance transparency and guide readers interested in replicating our study. We acknowledge that some terms may initially seem overwhelming. These steps, such as checking the MCMC chain convergence and robustness checks, are standard practices in Bayesian research and are analogous to “linearity”, “normality” and “equal variance” checks in frequentist analysis. We have provided exemplary plots in the supplemental material and will add more details to explain the interpretation of these convergence checks. We hope this will help address any concerns about methodological rigor.
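
      As an illustration of one of the convergence checks mentioned above, the following sketch computes the classic Gelman and Rubin diagnostic (R-hat) from several chains. It is a simplified stand-in rather than the code used for the manuscript: the "chains" are independent random draws instead of real, autocorrelated MCMC output, and the function name and simulated values are assumptions.

```python
import random
import statistics


def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for equal-length chains."""
    n = len(chains[0])
    # W: average within-chain variance.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B: between-chain variance, scaled by the chain length.
    between = n * statistics.variance([statistics.fmean(c) for c in chains])
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


rng = random.Random(0)
# Four chains sampling the same distribution (what convergence looks like).
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# Four chains stuck around different values (what non-convergence looks like).
stuck = [[rng.gauss(offset, 1.0) for _ in range(1000)]
         for offset in (0.0, 2.0, 4.0, 6.0)]

rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
print("R-hat, overlapping chains:", round(rhat_mixed, 3))
print("R-hat, separated chains:  ", round(rhat_stuck, 3))
```

      Values close to 1 indicate that the chains agree with one another, the numerical counterpart of trace plots with high overlap between chains; values well above 1 indicate that the chains have not settled on the same distribution.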

      In one point the method might not be sufficiently justified. The method used to transform circular-linear r (actually, all references cited by the authors for circular statistics use r² because there can be no negative values) into "Z_r", seems partially plausible and might be correct under the H0. However, Figure 12.3 seems to show that under the alternative Hypothesis H1, the assumptions are not accurate (peak Z_r=~0.70 for r=0.65). I am therefore, based on the presented evidence, unsure whether this transformation is valid. Also, saying that Z_r=-1 represents the null hypothesis and Z_r=1 the alternative hypothesis can be misinterpreted, since Z_r=0 also represents the null hypothesis and is not half way between H0 and H1.

      First, we realized that in the titles of Figures 12.2 and 12.3, “true r = 0.35” and “true r = 0.65” should be corrected to “true Z_r”. The method we used here is to first generate an underlying population that has null (0), moderate (0.35), or large (0.65) Z_r correlations, then test whether the sampling distribution drawn from these populations followed a normal distribution across varying sample sizes. Nevertheless, the reviewer correctly noticed discrepancies between the reported true Z_r and its sampling distribution peak. This discrepancy arises because, when generating large population data, achieving exact values close to a strong correlation like Z_r = 0.65 is unlikely. We loop through simulations to generate population data and ensure their Z_r values fall within a threshold. For moderate effect sizes (e.g., Z_r = 0.35), this is straightforward using a narrow range (0.345 < Z_r < 0.355). However, for larger effect sizes like Z_r = 0.65, a wider range (0.6 < Z_r < 0.7) is required. Therefore, the population we used to draw the sample sometimes has a Z_r that deviates slightly from 0.65. This remains reasonable since the main point of this analysis is to ensure that a large Z_r still has a normal sampling distribution, rather than to achieve Z_r = 0.65 exactly.

      We acknowledge that this variability of the range used was not clearly explained and it is not accurate to report “true Z_r = 0.65”. In the revised version, we will address this issue by adding vertical lines to each subplot to indicate the Z_r of the population we used to draw samples, making it easier to check if it aligns with the sampling peak. In addition, we will revise the title to “Sampling distributions of Z_r drawn from strong correlations (Z_r = 0.6-0.7)”. We confirmed that population Z_r and the peak of their sampling distribution remain consistent under both H0 and H1 in all sample sizes with n > 25, and we hope this explanation can fully resolve your concern.

      We agree with the reviewer that claiming Z_r = -1 represents the null hypothesis is not accurate. The circlin Z_r = 0 is more closely analogous to Pearson’s r = 0, since both represent the mean drawn from the population under the null hypothesis. In contrast, the mean effect size under the null will be positive in the raw circlin r, which is one of the important reasons for the transformation. To provide a more accurate interpretation, we will update Table 6 to describe the following strength levels of evidence: no effect (r < 0), null (r = 0), small (r = 0.1), moderate (r = 0.3), and large (r = 0.5).
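
      The point that the raw circular-linear r has a positive mean under the null can be checked with a short simulation. The sketch below assumes the standard textbook form of the circular-linear correlation (built from the Pearson correlations of the linear variable with the cosine and sine of the angle); the sample size and number of simulations are arbitrary assumptions. It does not reproduce the Z_r transformation itself, which is defined in the manuscript rather than in this response.

```python
import math
import random


def pearson(a, b):
    """Plain Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def circ_lin_corr(theta, x):
    """Circular-linear correlation between angles (radians) and a linear variable."""
    c = [math.cos(t) for t in theta]
    s = [math.sin(t) for t in theta]
    rxc, rxs, rcs = pearson(x, c), pearson(x, s), pearson(c, s)
    r2 = (rxc**2 + rxs**2 - 2 * rxc * rxs * rcs) / (1 - rcs**2)
    return math.sqrt(max(r2, 0.0))  # guard against tiny negative rounding errors


rng = random.Random(1)
N_OBS, N_SIMS = 30, 2000  # illustrative choices
null_r = []
for _ in range(N_SIMS):
    # Phase and memory score are generated independently: the null is true.
    theta = [rng.uniform(0.0, 2 * math.pi) for _ in range(N_OBS)]
    x = [rng.gauss(0.0, 1.0) for _ in range(N_OBS)]
    null_r.append(circ_lin_corr(theta, x))

mean_null_r = sum(null_r) / N_SIMS
print("mean circular-linear r under the null:", round(mean_null_r, 3))
```

      Because the coefficient cannot be negative, its average over null samples sits clearly above zero, unlike Pearson's r, whose null distribution is centred on zero.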

    1. Author response:

      (1) Controls for the genetic background are incomplete, leaving open the possibility that the observed oviposition timing defects may be due not to targeted knockdown of the period (per) gene but to the GAL4, Gal80, and UAS transgenes themselves. To resolve this issue the authors should determine the egg-laying rhythms of the relevant controls (GAL4/+, UAS-RNAi/+, etc); this only needs to be done for those genotypes that produced an arrhythmic egg-laying rhythm.

      We agree with this objection, and in the corrected version we plan to provide the assessment of the egg laying rhythms for the missing GAL4 controls as recommended only for Figure 3.

      (2) Reliance on a single genetic tool to generate targeted disruption of clock function leaves the study vulnerable to associated false positive and false negative effects: a) The per RNAi transgene used may only cause partial knockdown of gene function, as suggested by the persistent rhythmicity observed when per RNAi was targeted to all clock neurons. This could indicate that the results in Fig 2C-H underestimate the phenotypes of targeted disruption of clock function. b) Use of a single per RNAi transgene makes it difficult to rule out that off-target effects contributed significantly to the observed phenotypes. We suggest that the authors repeat the critical experiments using a separate UAS-RNAi line (for period or for a different clock gene), or, better yet, use the dominant negative UAS-cycle transgene produced by the Hardin lab (https://doi.org/10.1038/22566).

      We have recently acquired mutant flies with a dominant negative-cycle transgene (UAS-cycDN, Tanoue et al. 2004), and we plan to repeat our experiments with these mutants, in order to confirm our results.

      (3) The egg-laying profiles obtained show clear damping/decaying trends which necessitates careful trend removal from the data to make any sense of the rhythm. Further, the detrending approach used by the authors is not tested for artefacts introduced by the 24h moving average used.

      In the revised version we will show that the detrending approach used does not introduce any artefacts. The analysis of numerical simulations with an aperiodic stochastic signal superposed to a decaying signal shows that the detrending method used does not result in a spurious periodic signal. Furthermore, we can show that when the underlying signal is rhythmic, the correct period is obtained even when the moving average is a few hours larger or smaller than 24 h.
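
      A minimal sketch of this kind of check is given below. It is an illustration under stated assumptions, not the analysis from the manuscript: the series is synthetic (an exponential decay plus a 24 h rhythm plus Gaussian noise, sampled every 2 h), a plain Fourier-style periodogram is swapped in for the Lomb-Scargle periodogram, and the function names and parameter values are hypothetical.

```python
import math
import random


def moving_average_detrend(y, dt_h, span_h=24.0):
    """Subtract a centered moving average spanning `span_h` hours.

    The two end points of each window get half weight, so a rhythm whose
    period equals the span averages out exactly; windows are truncated
    at the edges of the record.
    """
    half = int(round(span_h / dt_h / 2))
    detrended = []
    for i in range(len(y)):
        total = weight_sum = 0.0
        for j in range(i - half, i + half + 1):
            if 0 <= j < len(y):
                w = 0.5 if abs(j - i) == half else 1.0
                total += w * y[j]
                weight_sum += w
        detrended.append(y[i] - total / weight_sum)
    return detrended


def peak_period(y, dt_h, periods):
    """Trial period (in hours) with the largest periodogram power."""
    mean = sum(y) / len(y)
    best_period, best_power = None, -1.0
    for p in periods:
        omega = 2 * math.pi / p
        c = sum((v - mean) * math.cos(omega * i * dt_h) for i, v in enumerate(y))
        s = sum((v - mean) * math.sin(omega * i * dt_h) for i, v in enumerate(y))
        power = c * c + s * s
        if power > best_power:
            best_period, best_power = p, power
    return best_period


rng = random.Random(2)
DT_H = 2.0
times = [i * DT_H for i in range(60)]  # five days of synthetic data
series = [20 * math.exp(-t / 72) + 3 * math.cos(2 * math.pi * t / 24)
          + rng.gauss(0.0, 0.5) for t in times]

# Repeat the detrending with windows a few hours shorter and longer than 24 h.
peaks = {span: peak_period(moving_average_detrend(series, DT_H, span),
                           DT_H, range(16, 33))
         for span in (20.0, 24.0, 28.0)}
print(peaks)
```

      Replacing the cosine term with purely aperiodic noise gives the complementary test: whether any peak in the 16-32 h band survives detrending when no rhythm was put in.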

      (4) According to the authors the oviposition device cannot sample at a resolution finer than 4 hours, which will compel any experimenter to record egg laying for longer durations to have a suitably long time series which could be useful for circadian analyses.

      We apologize for not being clear enough. The device can in principle sample at any desired resolution. Notice, however, that the variable we are analyzing (number of eggs laid by a single female) has only a few possible values, which is one of the features that render the assessment of rhythmicity a particularly difficult task. If egg laying is sampled more often (say, at 2 h intervals) more time points will be available, but the range of values at each time point will be much smaller. We will show an example where we compare both rates (2 h and 4 h). Even though the 2 h sampling reveals the rhythmicity of the time series, the significance of the peaks obtained is less than when sampling at 4 h intervals. We have found that 4 h sampling seems to provide the best compromise between frequency of the sampling and discreteness of the variable.

      On the other hand, it is important to stress that sampling frequency and longer durations are not very correlated (see e.g. Cohen et al. Journal of Theoretical Biology 314, pp 182 [2012]). It has been shown that the best way to make accurate predictions of the period of a rhythmic signal is to have a series spanning many cycles, irrespective of the sampling frequency. In other words, it is not true that with a 2h sampling it would be possible to analyze shorter series than with 4h sampling. Unfortunately, egg laying records are usually less than 5 cycles long, which is one of the reasons for the difficulties in the assessment of their rhythmicity.

      (5) Despite reducing the interference caused by manually measuring egg-laying, the rhythm does not improve the signal quality such that enough individual rhythmic flies could be included in the analysis methods used. The authors devise a workaround by combining both strongly and weakly rhythmic (LSpower > 0.2 but less than LSpower at p < 0.05) data series into an averaged time series, which is then tested for the presence of a 16-32h "circadian" rhythm. This approach loses valuable information about the phase and period present in the individual mated females, and instead assumes that all flies have a similar period and phase in their "signal" component while the distribution of the "noise" component varies amongst them. This assumption has not yet been tested rigorously and the evidence suggests a lot more variability in the inter-fly period for the egg-laying rhythm.

      The assumption is difficult to test rigorously, since for individual flies the records seem to be so noisy that no information can be extracted. As shown in the paper, it is even very difficult to assess the presence of rhythmicity at the individual level. We consider that the appearance of a rhythm after averaging several records shows the presence of this rhythm at the individual level. But it could be argued that the presence of rhythmicity in the average record could be due to only a few (or even a single) rhythmic individuals. In order to show that this is probably not the case, in the revised version we will show that, when the individuals that are rhythmic are left out, the average of the remaining flies still shows a rhythm (albeit a weaker one, as was to be expected).

      Regarding our assumption that all flies have the “same” period, the results in Fig. 1F cannot really rule out this possibility, because with so few cycles, the determination of the period is not very accurate (see e.g. Cohen et al. Journal of Theoretical Biology 314, pp 182 [2012]). In our case, the error for the period is related to the width of the corresponding peak in the periodogram, which is typically 4 h. In any case, in the revised version we will try to show, by using numerical simulations, that when the individual periods are not the same, but are distributed approximately as in Fig. 1F, the average series is still rhythmic with the correct period.
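
      The kind of simulation described above can be sketched as follows. Every number here is a hypothetical assumption rather than a value from the manuscript: 30 flies, 4 h bins over five days, individual periods drawn from a normal distribution around 24 h (standing in for the distribution in Fig. 1F), a common starting phase, and Poisson-distributed egg counts. A plain Fourier-style periodogram is again swapped in for the Lomb-Scargle periodogram.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson count (Knuth's multiplication method)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def peak_period(y, dt_h, periods):
    """Trial period (in hours) with the largest periodogram power."""
    mean = sum(y) / len(y)
    best_period, best_power = None, -1.0
    for p in periods:
        omega = 2 * math.pi / p
        c = sum((v - mean) * math.cos(omega * i * dt_h) for i, v in enumerate(y))
        s = sum((v - mean) * math.sin(omega * i * dt_h) for i, v in enumerate(y))
        power = c * c + s * s
        if power > best_power:
            best_period, best_power = p, power
    return best_period


rng = random.Random(3)
DT_H, N_BINS, N_FLIES = 4.0, 30, 30  # hypothetical design
records = []
for _ in range(N_FLIES):
    period = rng.gauss(24.0, 1.0)  # each fly gets its own period
    records.append([
        poisson(rng, 4.0 * (1 + 0.6 * math.cos(2 * math.pi * i * DT_H / period)))
        for i in range(N_BINS)
    ])

# Average the noisy individual records bin by bin, then look for a rhythm.
average = [sum(r[i] for r in records) / N_FLIES for i in range(N_BINS)]
avg_peak = peak_period(average, DT_H, range(16, 33))
print("peak period of the averaged record (h):", avg_peak)
```

      Widening the spread of individual periods makes the flies drift out of phase sooner, so the rhythm in the average damps faster; this is the regime where the assumption of a shared period matters most.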

      (6) This variability could also depend on the genotype being tested, as the authors themselves observe between their Canton-S and YW wild-type controls for which their egg-laying profiles show clearly different dynamics. Interestingly, the averaged records for these genotypes are not distinguishable but are reflected in the different proportions of rhythmic flies observed. Unfortunately, the authors also do not provide further data on these averaged profiles, as they did for the wild-type controls in Figure 1, when they discuss their clock circuit manipulations using perRNAi. These profiles could have been included in Supplementary figures, where they would have helped the reader decide for themselves what might have been the reason for the loss of power in the LS periodogram for some of these experimental lines.

      Even though we think that the individual records are in general too noisy to be really informative, we will provide all the individual egg profiles in the Supplementary Material of the revised version, in order to let readers check this for themselves.

      (7) By selecting 'the best egg layers' for inclusion in the oviposition analyses an inadvertent bias may be introduced and the results of the assays may not be representative of the whole population.

      We agree that this may introduce some bias in the results. But in our opinion this bias is very difficult to avoid, since for females that lay very few eggs, rhythmicity can even be difficult to define (some females can spend a whole day without laying a single egg). On the other hand, even when the results may not be representative of the whole population, they would be representative of the flies that lay most of the eggs in a population, which seems to be very relevant in ecological terms.

      (8) An approach that measures rhythmicity for groups of individual records rather than separate individual records is vulnerable to outliers in the data, such as the inclusion of a single anomalous individual record. Additionally, the number of individual records that are included in a group may become a somewhat arbitrary determinant for the observed level of rhythmicity. Therefore, the experimental data used to map the clock neurons responsible for oviposition rhythms would be more convincing if presented alongside individual fly statistics, in the same format as used for Figure 1.

      The question of possible rhythmic outliers has been addressed above, in question 5, where we discuss why we think that such outliers are not “determinant for the observed level of rhythmicity”. As also mentioned above, even though we think that they are too noisy to be informative, we plan to include all individual profiles in the Supplementary Material.

      (9) The features in the experimental periodogram data in Figures 3B and D are consistent with weakened complex rhythmicity rather than arrhythmicity. The inclusion of more individual records in the groups might have provided the added statistical power to demonstrate this. Graphs similar to those in 1G and 1I, might have better illustrated qualitative and quantitative aspects of the oviposition rhythms upon per knockdown via MB122B and Mai179; Pdf-Gal80.

      We assume that the features mentioned refer to the appearance in the periodograms of two small peaks under the significance lines. We are aware that in studies of the rhythmicity of locomotor activity such features are usually interpreted as “complex rhythms”, i.e. as evidence of the existence of two different mechanisms producing two different rhythms in the same individual. In our case, however, at least two other possibilities should be taken into account. Since the periodograms we show assess the rhythmicity of the average time series of several individuals, the two small peaks could correspond to the periods of two different subpopulations. Another possibility could be that such peaks are simply an artifact of the method in the analysis of time series that consist of very few cycles (as explained above) and also few points per cycle. A cursory examination of the individual profiles, which will be provided in the new version, does not seem to support either of the first two possibilities mentioned. On the other hand, we will show evidence that the analysis of perfectly random series sometimes results in periodograms with some small peaks.

    1. Author response:

      Reviewer #1 (Public review):

      This study is focused on a population of neurons in the mouse parasubthalamic nucleus (pSTN) that express Tachykinin1 (Tac1). This gene has been used before to target pSTN for functional circuit studies because it is fairly selective for pSTN in this region, though it targets only a subset of pSTN neurons. Prior work has shown that activity in these neurons can impact motivated behaviors, including feeding and drinking behaviors, and that their activity is associated with aversion or avoidance behaviors. While not breaking much new ground, this study adds to that work by making use of a 2-way active avoidance assay, where a CS predicts a US (footshock), that the mice can escape. Using fiber photometry, the authors show convincing evidence that Tac1 neurons in pSTN increase their activity in response to a US footshock, and that after some pairings the neurons will start responding to the CS too, though to a lesser extent than the US. Their most important data shows that either ablation or optogenetic inhibition of these cells can hugely block the active avoidance (escape) behavior, suggesting these neurons are key for the performance of this task, which they interpret as key for learning the task (but see more below). They show that optogenetic stimulation is aversive in a real-time place assay, and when paired with footshock can enhance active avoidance behavior. Finally, they show that Tac1 pSTN axons in PVT recapitulate these effects while showing that axons in CEA or PBN may only recapitulate some of these effects (more below). Overall I think the data is solid and shows that the activity of Tac1 pSTN neurons in the 2 way active avoidance task is causally related to avoidance behavior in the direction that would be predicted by recent literature. However, I think the authors overstate the conclusions in the title, abstract, and text. I do not think the data make a strong case for a role for these cells in learning, at least in any classical sense, as used in the title and abstract and elsewhere. Also, the statement in the abstract that the pSTN mediates its effects 'differentially' through its downstream targets is not convincingly supported by data.

      We are very pleased that Reviewer 1 thought our data is solid.

      Major concerns:

      (1) The authors infer that the activity in the Tac1 pSTN neurons is necessary for aversive or avoidance 'learning'. But this is not well defined, what exactly does that mean and what types of evidence would support or falsify such a hypothesis? Moreover, the authors show convincingly, and in line with prior reports, that these cells are activated by aversive stimuli (here footshock), and that activation of these cells is sufficient to induce avoidance behavior. Because manipulation of these cells can serve as a primary negative reinforcer, it becomes even more challenging and important to explain how experiments that manipulate these cells while measuring behavior/performance can discriminate between changes in: (1) primary aversion, (2) motivation to avoid, (3) associative learning, or (4) memory/retrieval. The authors seem to favor #3, but they don't make a clear case for this point of view or else what they mean by 'avoidance learning'. In my opinion, the data do not well discriminate between possibilities 1 through 3. The authors should clarify their logic and temper their conclusions throughout.

      Thank you, Reviewer 1, for the insightful suggestions. Our fiber photometry data show that CS-evoked calcium signals in PSTN Tac1+ neurons increase significantly in late trials relative to early trials (Figure 1H-K). Together with our optogenetic inhibition experiments during the CS (Figure 2N-Q), these results indicate that the activity of PSTN Tac1+ neurons is modulated by learning and is required for active avoidance learning. Moreover, PSTN Tac1+ neurons are activated by footshock, and activation of these cells is sufficient to induce avoidance behavior. These findings demonstrate that PSTN Tac1+ neurons encode aversive information. Together, our current data support the view that PSTN Tac1+ neurons encode both the aversive event and its predicting cue. We will clarify our conclusions in the revised manuscript.

      (2) Abstract line 37 is not well supported. The authors focus mostly on pSTN projections to PVT and show that the measurements or manipulation of these axons recapitulates the effects seen with pSTN cell bodies. The authors do fewer studies of axons in CeA and PBN, but do find that they can recapitulate the effects with opsin inhibition, but detect no effects with opsin stimulation. However, the lack of effect with opsin stimulation in Figure S7a-e proves very little on its own. It could be technical, due to inadequate expression or functional efficacy. It is not supported by histological and functional evidence that the manipulation was effective. Overall, I can only conclude that the projections to these regions might be very similar (based on the inhibition data), or might be a little different. The data are thus inadequate to support the authors' claim that the pSTN mediates learning differentially through its downstream targets.

      In the revised version of the manuscript, we will provide more histological and functional evidence for the PSTN-to-CeA and PSTN-to-PBN circuits to support our conclusion on the functional roles of these downstream targets. Consistent with our anterograde tracing experiment showing that the PSTN densely projects to the CeA and PBN (Figure S6), the optogenetic activation and inhibition experiments showed dense axonal terminals from the PSTN in the CeA and PBN, and these data will be included in the revised manuscript. In addition, we will further examine these circuits by investigating the functional roles of CeA-projecting or PBN-projecting PSTN neurons during the 2-way active avoidance task.

      Other concerns:

      (3) Line 93 is not adequately supported by data in Figure 1b. Additional data is needed that shows expression across cases, including any spread that may be visible when zooming out from pSTN. Additional methods are needed to indicate what exclusion criteria were applied and how many mice were excluded. These data could help support the statement on line 93 that expression was largely restricted within pSTN.

      In the revised version of the manuscript, we will provide larger example images containing the pSTN and its adjacent areas to demonstrate that the viral expression is well restricted to this brain area. Moreover, we will provide detailed information on the exclusion criteria and the number of mice excluded in the Methods section.

      (4) From the results and methods it is not clear where the GFP signal would come from in the mice expressing Casp3 for the ablation studies. It is therefore not clear if the absence of GFP should be taken as evidence of cell loss. For example, it is not clear if multiple vectors were used, if volumes and titers were carefully matched between control groups, or if competition/occlusion between AAVs could be ruled out. It is also not clear how this was quantified, that is how many sections/subjects and how counting was done. It is not clear how long was waited between the AAV infusion, behavior, and euthanasia, perhaps especially important for the ablation done after avoidance learning occurred.

      I totally agree with Reviewer 1’s concerns. We will perform immunohistochemistry or in situ hybridization for Tachykinin-1 itself and then measure colocalization of GFP with Tachykinin-1 inside and outside of the PSTN, and the degree of absence of Tachykinin-1 in Casp-ablated mice. In addition, we will provide more detailed experimental information in the revised manuscript.

      (5) The authors should consider showing individual measurements and not just mean/sem wherever feasible, for example, to support the statement on line 141 that 'all ablated mice showed...'.

      Thank you Reviewer 1 for this suggestion. We will re-plot the data as individual measurements in the revised manuscript.

      (6) S3 is an important control for interpreting data in Figure 2d-i. Something similar is needed to support the inferences made in 2j-u. The very strong effect showing a lack of active avoidance in response to CS or the US when pSTN Tac1 neurons are inhibited during CS or during US suggests that something gross may be going on, such as a gross motor or sensory response that supersedes the effect of footshock. The authors do not comment on whether there are any gross behavioral responses to the inhibition, but an experiment as in S3 is needed, for example, to show that behavior is intact during pSTN inhibition if delivered after the mice already learned to associate CS with US.

      Thank you, Reviewer 1, for this insightful suggestion. During the review process, we performed an experiment analogous to that in Figure S3. We measured the behavioral responses during pSTN optogenetic inhibition after the mice had already learned to associate the CS with the US and found that most GtACR-expressing mice showed unaffected avoidance learning. These data will be included in the revised manuscript.

      (7) The authors use 100 shocks of 0.8 mA for 7 days. I think this is quite strong and in the pSTN inhibition experiments it seems to be functionally 'inescapable' and could thus produce behaviors similar to 'learned helplessness'. Can the authors consider whether this might contribute to the striking findings they observed in their opsin inhibition assays?

      I agree with Reviewer 1’s comment on the striking findings in the optogenetic inhibition results. Indeed, based on the results on days 1 and 2, optogenetic inhibition of PSTN Tac1+ neurons significantly blocked the behavioral performance of GtACR-expressing animals during the 2-way active avoidance task. To examine whether the effect of optogenetic inhibition of these neurons could decline with prolonged training, we conducted an additional 5 days of training. We will add a discussion of this point to the revised manuscript.

      (8) The description of the experiment in S5 is inadequate. What are the adjacent areas? Where do the authors see spread? The use of the word 'case' in figure S5 implies an individual case, but the legend says 5 mice were used for 'case 1' and 3 mice were used for 'case 2'. The use of the word 'off-target' in the figure implies that the expression was off the intended target. But the text of results and methods implies it was intentional targeting of unnamed and unshown adjacent regions. This should be clarified.

      We will add histological images and clarify these points in the revised manuscript. The purpose of this experiment is to illustrate that even a slight spread of ChR2 virus into Tac1+ neurons in the areas adjacent to the PSTN did not result in behavioral changes, which indirectly supports the conclusion that the main behavioral function is caused by PSTN Tac1+ neurons rather than by neurons in neighboring areas. Because Tac1+ neurons are sparsely present outside the PSTN, it is quite difficult to completely restrict the viral expression to the PSTN from anterior to posterior. Thus, we will provide detailed information on the exclusion criteria and the number of mice excluded in the Methods section.

      (9) The authors suggest the CPA study is divergent from Serra et al 2023. Though I think this could be due to how the conditioning was done, it would be helpful for the authors to include less processed data. This would aid in possible interpretations for any divergences across studies. Can the authors include raw data (in seconds of time spent) in each compartment for each group across baseline and test days?

      We will follow Reviewer 1’s suggestion to include raw data (in seconds of time spent) in each compartment for each group across baseline and test days in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Hu et al. presents a clearly-designed examination of the role of tachykinin1-expressing neurons in the parasubthalamic nucleus of the lateral posterior hypothalamus (PTSN) in active avoidance learning. These glutamatergic neurons have previously been implicated in responding to negative stimuli. This manuscript expands the current understanding of PTSNTac1 neurons in learned responses to threats by showing their role in encoding and mediating the active avoidance response. The authors first use bulk fiber photometry imaging to show the encoding of the active avoidance procedure, followed by cell-type specific manipulations of PTSNTac1 neurons during active avoidance. Finally, they show that encoding and mediation of active avoidance in a downstream target of PTSNTac1 neurons, the PVT/intermediodorsal nuclei of the dorsal thalamus (IMD), has the same effect as what was discovered in the cell body. This contrasts with other output regions of the PTSN, such as the PBN and CeA, which were not found to promote active avoidance learning. The experiments presented were well-designed to support the conclusions of the authors, however, the manuscript is missing several key control experiments and supplemental information to support their main findings.

      Strengths:

      The manuscript provides information on a brain region and downstream target that mediates active avoidance learning. The manuscript provides valuable information via necessity and sufficiency experiments to show the role of the population of interest (PTSNTac1 neurons) in active avoidance learning. The authors also performed most behavior experiments in male and female mice, with adequate power to address potential sex differences in the control of active avoidance by PTSNTac1 neurons. Finally, the manuscript provides valuable information about the specificity of the PTSNTac1 downstream target in regulating active avoidance learning, identifying the PVT/intermediodorsal nuclei of the dorsal thalamus as the key target and ruling out the PBN and CeA.

      We highly appreciate that Reviewer 2 thought that our experiments presented were well-designed to support the conclusions and provided valuable information in several aspects.

      Weaknesses:

      However, several main conclusions of the paper must be interpreted carefully due to missing or inadequate control experiments and histological verification.

      (1) Inadequate presentation of viral localization. The authors state that expression was "largely restricted within PSTN" however there is no quantification of the amount of viral expression beyond the target region. Given that Tac1 is expressed in neighboring regions, it is critical to show the viral expression and fiber implant location data for all animals included in the figures. Furthermore, criteria for inclusion and exclusion based on mistargeting should be delineated. This should also be clearly outlined for the experiments in Figure S5, where "behavioral effects of activation of sparsely Tac1-expressing neurons in two adjacent areas of PSTN" was tested but the location of viral expression in those cases is unclear.

      This point is similar to questions 3 and 8 of Reviewer 1. We will provide the viral expression and fiber implant location data for all animals included in the figures, as well as histological images for Figure S5, in the revised manuscript. Moreover, we will provide detailed information on the exclusion criteria and the number of mice excluded in the Methods section.

      (2) Lack of motion artifact correction with isosbestic signal for GCaMP recordings. It is appreciated that the authors included a separate EGFP-expressing group to compare to the GCaMP-expressing group, however, additional explanation is required for the methods used to analyze the raw fluorescent signal. Namely, were fluorescent signals isosbestic-corrected prior to calculating ΔF/F? If no isosbestic signal was used to correct motion artifacts within a recording session, additional explanation is needed to explain how this was addressed. The lack of motion artifacts in the EGFP signal in a separate cohort is inadequate to answer this caveat as motion artifacts are within-animal.

      We will follow Reviewer 2’s suggestion and apply isosbestic correction to the fluorescent signals prior to calculating ΔF/F. We will re-plot the related figures and add this information to the revised manuscript.
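      For readers unfamiliar with the procedure, one common form of isosbestic correction can be sketched as below. This is a generic illustration under stated assumptions, not the authors' analysis pipeline: the control (isosbestic) channel is fitted to the calcium-dependent channel by ordinary least squares, and ΔF/F is taken as the deviation of the signal from that fit. The function names and the synthetic traces are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit y ~ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def isosbestic_dff(signal, control):
    """Return dF/F = (signal - fitted_control) / fitted_control per sample."""
    slope, intercept = fit_line(control, signal)
    fitted = [slope * c + intercept for c in control]
    return [(s - f) / f for s, f in zip(signal, fitted)]


# Synthetic example: a shared "motion" fluctuation appears in both channels,
# and a brief calcium transient appears only in the signal channel.
control = [1.0 + 0.1 * math.sin(0.3 * i) for i in range(200)]
signal = [2.0 * c + 0.5 for c in control]
for i in range(100, 111):
    signal[i] += 0.5  # hypothetical calcium transient
dff = isosbestic_dff(signal, control)
print("dF/F at the transient:", round(dff[105], 3))
```

      Because the shared fluctuation is captured by the fitted control trace, it is largely removed from ΔF/F, while the transient that is present only in the signal channel remains. This is a within-animal correction, which is the point raised by the reviewer.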

      (3) Missing control experiment demonstrating intact locomotor performance in caspase ablation experiments. The authors use caspase ablation of PTSNTac1 neurons prior to active avoidance learning to appraise the necessity of this cell population. However, a control experiment showing intact locomotor ability in ablated mice was not performed.

      We will follow Reviewer 2’s suggestion to perform a control experiment showing intact locomotor ability in caspase 3-ablated mice and will include this data in the revised manuscript.

      (4) Missing control experiment demonstrating [lack of] valence with PTSN silencing manipulations. The authors performed real-time and conditioned place preference experiments for ChR2-expressing mice (Fig 3M) and found stimulation to be negatively-valenced and generate an aversive memory, respectively. Absent this control experiment with silencing, an alternative conclusion remains possible that optogenetic silencing via GtACR2 created nonspecific location preferences in the active avoidance apparatus, confounding the interpretation of those results.

      Thank you, Reviewer 2, for this useful suggestion. We will examine the valence of PSTN silencing manipulations by using an RTPP test and add these data to the revised manuscript.

      (5) Incomplete analysis of sex differences. Data in female mice is conspicuously missing from inhibition experiments. The rationale for exclusion from this dataset would be useful for the interpretation of the other noted sex differences.

      Thank you Reviewer 2 for this useful suggestion. During the review process, we have performed ablation and inhibition experiments in females, demonstrating similar behavioral effects as those in males. We will add these data in the revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      This study by Hu et al. examined the role of tachykinin1 (Tac1)-expressing neurons in the parasubthalamic nucleus (PSTH) in active avoidance of electric shocks. Bulk recording of PSTH Tac1 neurons or axons of these neurons in PVT showed activation by a shock-predicting tone and by the shock itself. Ablation of these neurons or optogenetic manipulation of these neurons or their projection to PVT suggests the causality of this pathway with the learning of active avoidance.

      Strengths:

      This work found an understudied pathway potentially important for active avoidance of electric shocks. Experiments were thoroughly done and the presentation is clear. The amount of discussion and references are appropriate.

      We are very pleased to have Reviewer 3’s positive comments on the manuscript.

      Weaknesses:

      Critical control experiments are missing for most experiments, and statistical tests are not clear or not appropriate in most parts. Details are shown below.

      (1) There are some control experiments missing. Notably, optogenetic manipulation is not verified in any experiments. It is important to verify whether neural activation with optogenetic activation is at the physiological level or supra-physiological level, and whether optogenetic inhibition does not cause unwanted activity patterns such as rebound activation at the critical time window.

      Thank you Reviewer 3 for this useful suggestion. We will perform in vitro slice recording experiments to verify optogenetic manipulations and add this line of evidence in the revised manuscript.

      (2) Neural ablation with caspase was confirmed by GFP expression. However, from the present description, a different virus to express EITHER caspase or GFP was injected, and then the numbers of GFP-expressing neurons were compared. It is not clear how this can detect ablation.

      This point is similar to question 4 of Reviewer 1. We will perform immunohistochemistry or in situ hybridization for Tachykinin-1 itself and then measure colocalization of GFP with Tachykinin-1 inside and outside of the PSTN, and the degree of absence of Tachykinin-1 in Casp-ablated mice. In addition, we will provide more detailed experimental information in the revised manuscript.

      (3) In many places, statistical approaches are not clear from the present figures, figure legends, and Methods. It seems that most statistics were performed by pooling trials, but it is not described, or multiple "n" are described. For example, it is explicitly mentioned in Figure 4H, "n = 3 mice, n = 213 avoidance trials and n = 87 failure trials". The authors should not pool trials, but should perform across-animal tests in this and other figures, and "n" should be clearly described in each plot.

      We have provided all statistical information in Supplementary Table 1. In the revised manuscript, we will perform across-animal tests, re-plot the figures and provide clear statistical information.

      (4) It is also unclear how the test types were selected. For example, in Figure 1K and O with similar datasets, one is examined by a paired test and the other is by an unpaired test. Since each animal has both early vs late trials, and avoidance vs failure trials, paired tests across animals should be performed for both.

      Following Reviewer 3’s suggestion, we will perform across-animal tests. In the first version of our manuscript, for the fiber photometry experiments, we pooled the trial data of each animal and performed statistical tests across trials. Because avoidance and failure trials were different trials, we selected an unpaired test for this kind of dataset.
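      To make the distinction between pooling trials and testing across animals concrete, a minimal sketch of an across-animal paired comparison is given below. It is illustrative only and is not the authors' analysis: each animal contributes one mean per condition, and the paired t statistic is computed on the per-animal differences, so that n is the number of animals rather than the number of trials. The data layout, the condition names and the numbers are hypothetical.

```python
import math
import statistics


def paired_t_across_animals(data, cond_a, cond_b):
    """Paired t statistic on per-animal means (cond_b minus cond_a).

    `data` maps an animal ID to a dict of condition -> list of trial values.
    Returns (t_statistic, degrees_of_freedom).
    """
    diffs = []
    for trials in data.values():
        mean_a = statistics.mean(trials[cond_a])  # one value per animal
        mean_b = statistics.mean(trials[cond_b])
        diffs.append(mean_b - mean_a)
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Hypothetical dF/F values; trial counts may differ between animals.
example = {
    "mouse1": {"early": [1.0, 1.0], "late": [2.0, 2.0]},
    "mouse2": {"early": [0.0, 2.0], "late": [3.0, 3.0]},
    "mouse3": {"early": [1.0], "late": [4.0]},
}
t_stat, dof = paired_t_across_animals(example, "early", "late")
print("t =", round(t_stat, 3), "with", dof, "degrees of freedom")
```

      With this layout, unequal numbers of avoidance and failure trials per animal no longer prevent pairing, since the pairing is done at the level of animals rather than individual trials.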

      (5) It is also strange to show violin plots for only 6 animals. They should instead show each dot for each animal, connected with a line to show consistent increases of activity in late vs early trials and avoidance vs failure trials.

      As noted in our response to question 4 of Reviewer 3, we pooled the trial data of each animal and performed statistical tests across trials. We will perform across-animal tests and re-plot the figures, connecting the data points of each animal with a line to show consistent increases of activity in late vs early trials and avoidance vs failure trials.

      (6) To tell specificity in avoidance learning, it is better to show escape in the current trials with optogenetic manipulation.

      Thank you Reviewer 3 for this useful suggestion. We will follow this suggestion and add this analysis in the revised manuscript.

      (7) For place aversion, % time decrease across days was tested. It is better to show the original number before normalization, as well.

      As with question 9 of Reviewer 1, we will show the original numbers before normalization in the revised manuscript.

      (8) For anatomical results in Figure S6, it is important to show images with lower magnification, too.

      We will follow this suggestion and provide histological images with lower magnification in the revised manuscript.

      (9) Inactivation of either pathway from PSTH to PBN or to CeA also inhibits active avoidance, but the authors conclude that these effects are "partial" compared to the inactivation of PSTH to PVT. It is not clear how the effects were compared since the effects of PSTH-CeA inactivation are quite strong, comparable to PSTH-PVT inactivation by eye. They should quantify the effects to conclude the difference.

      We will quantify the effects of different downstream targets of the PSTN to make a precise conclusion.

      (10) Supplementary table 1: as mentioned above, n for statistical tests should be clearer.

      As mentioned above, we will perform across-animal tests and provide clear statistical information in the figure legends and supplementary table 1.

    1. Reviewer #2 (Public review):

      In this version of the manuscript, the author clarified many details and rewrote some sections. This substantially improved the readability of the paper. I also recognize that the author spent substantial effort in the Appendix to answer the potential questions.

      Unfortunately, I am not currently convinced by the theory proposed in this paper. In the next section, I will first recap the logic of the author and explain why I am not convinced. Although the theory fits many experimental results, other theories on overflow metabolism are also supported by experiments. Hence, I do not think based on experimental data we could rule in or rule out different theories.

      Recap: To explain the origin of overflow metabolism, the author uses the following logic:

      (1) There is substantial variability in single-cell growth rate.
      (2) The fluxes (J_r^E) and (J_f^E) are coupled with growth rate by Eq. 3.
      (3) Since growth rate varies from cell to cell, the fluxes (J_r^E) and (J_f^E) also vary.
      (4) The variability of the above fluxes creates a threshold-analog relation, and hence overflow metabolism.

      My opinion:

      Logic steps (2) and (3) have caveats. The variability of growth rate has large components of cellular noise and external noise. Therefore, the variability of growth rate is far from 100% correlated with the variability of the fluxes (J_r^E) and (J_f^E) at the single-cell level. Single-cell growth rate is a complex, multivariate functional, including (J_r^E) and (J_f^E) but also many other variables. My feeling is that the correlation could be too low to support the logic here.

      One example: ribosomal concentration is known to be an important factor of growth rate in bulk culture. However, the "growth law" from bulk culture cannot directly translate into the growth law at single-cell level [Ref1,2]. This is likely because other factors (such as cell aging or other multi-stability of cellular states) are involved.

      Therefore, I think using Eq. 3 to invert the distribution of growth rate into the distribution of (J_r^E) and (J_f^E) is inapplicable, due to the potentially low correlation at the single-cell level. It may show partial correlations, but these may not be strong enough to support the claim and create fermentation at the macroscopic scale.

      Overall, if we track the logic flow, this theory implies that overflow metabolism originates from cell-to-cell variability in the k_cat of catalytic enzymes. That is, the author proposes that overflow metabolism happens macroscopically as if it were some "aberrant activation of fermentation pathway" at the single-cell level, due to some unknown partial correlation with growth rate variability.

      Compared with other theories, this theory does not involve any regulatory mechanism and can be regarded as a "neutral theory". I am looking forward to seeing single-cell experiments in the future that provide evidence about this theory.

      [Ref1] https://www.biorxiv.org/content/10.1101/2024.04.19.590370v2
      [Ref2] https://www.biorxiv.org/content/10.1101/2024.10.08.617237v2

    1. Reviewer #1 (Public review):

      Summary:

      In this lovely paper, McDermott and colleagues tackle an enduring puzzle in the cognitive neuroscience of perceptual prediction. Though many scientists agree that top-down predictions shape perception, previous studies have yielded incompatible results - with studies showing 'sharpened' representations of expected signals, and others showing a 'dampening' of predictable signals to relatively enhance surprising prediction errors. To deepen the paradox further, it seems like there are good reasons that we would want to see both influences on perception in different contexts.

      Here, the authors aim to test one possible resolution to this 'paradox' - the opposing process theory (OPT). This theory makes distinct predictions about how the time course of 'sharpening' and 'dampening' effects should unfold. The researchers present a clever twist on a leading-trailing perceptual prediction paradigm, using AI to generate a large dataset of test and training stimuli so that it is possible to form expectations about certain categories without repeating any particular stimuli. This provides a powerful way of distinguishing expectation effects from repetition effects - a perennial problem in this line of work.

      Using EEG decoding, the researchers find evidence to support the OPT. Namely, they find that neural encoding of expected events is superior in earlier time ranges (sharpening-like) followed by a relative advantage for unexpected events in later time ranges (dampening-like). On top of this, the authors also show that these two separate influences may emerge differently in different phases of learning - with superior decoding of surprising prediction errors being found more in early phases of the task, and enhanced decoding of predicted events being found in the later phases of the experiment.

      Strengths:

      As noted above, a major strength of this work lies in important experimental design choices. Alongside removing any possible influence of repetition suppression mechanisms in this task, the experiment also allows us to see how effects emerge in 'real-time' as agents learn to make predictions. This contrasts with many other studies in this area - where researchers 'over-train' expectations into observers to create the strongest possible effects or rely on prior knowledge that was likely to be crystallised outside the lab.

      Weaknesses:

      This study reveals a great deal about how certain neural representations are altered by expectation and learning on shorter and longer timescales, so I am loath to describe certain limitations as 'weaknesses'. But one limitation inherent in this experimental design is that, by focusing on implicit, task-irrelevant predictions, there is not much opportunity to connect the predictive influences seen at the neural level to the perceptual performance itself (e.g., how participants make perceptual decisions about expected or unexpected events, or how these events are detected or appear).

      The behavioural data that is displayed (from a post-recording behavioural session) shows that these predictions do influence perceptual choice - leading to faster reaction times when expectations are valid. In broad strokes, such a result seems consistent with a 'sharpening' view of perceptual prediction, and with the fact that sharpening effects are found in the study to be larger at the end of the task than at the beginning. But it strikes me that the strongest test of the relevance of these (very interesting) EEG findings would be some evidence that the neural effects relate to behavioural influences (e.g., are participants actually more behaviourally sensitive to invalid signals in earlier phases of the experiment, given that this is where the neural effects show the most 'dampening', a.k.a. prediction error advantage?).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Overall authors’ response

      We would like to thank the 3 reviewers for a thorough critique of our manuscript, and for acknowledging the novelty and importance of our studies, in particular the relevance to collagen-related pathologies such as idiopathic pulmonary fibrosis and chronic skin wounds. We appreciate that there are shortcomings in these studies, as highlighted by reviewers; we have rewritten parts of our manuscript to clarify any misunderstandings, and conducted additional experiments to address concerns raised by reviewers (please see below red text within each response), which have been incorporated into our revised manuscript (modified text highlighted in yellow in revised manuscript). We believe that the revision has made our manuscript stronger in support of our original conclusions.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors describe that the endocytic pathway is crucial for ColI fibrillogenesis. ColI is endocytosed by fibroblasts, prior to exocytosis and formation of fibrils, which can include a mixture of endogenous/nascent ColI chains and exogenous ColI. ColI uptake and fibrillogenesis are regulated by circadian rhythm as described by the authors in 2020, thanks to the dependence of this pathway on circadian-clock-regulated protein VPS33B. Cells are capable of forming fibrils with recently endocytosed ColI when nascent chains are not available. Previously identified VPS33B is demonstrated not to have a role in endocytosis of ColI, but to play a role in fibril formation, which the authors demonstrate by showing the loss of fibril formation in VPS33B KO, and an excess of insoluble fibrils - alongside a decrease in soluble ColI secretion - in VPS33B overexpression conditions. A VPS33B binding protein VIPAS39 is also shown to be required for fibrillogenesis and to colocalise with ColI. The authors thus conclude that ColI is internalised into endosomal structures within the cell, and that ColI, VPS33B, and VIPAS39 are co-trafficked to the site of fibrillogenesis, where along with ITGA11, which by mass spectrometric analysis is shown to be regulated by VPS33B levels, ColI fibrils are formed. Interestingly, in involved human skin sections from idiopathic pulmonary fibrosis (IPF) patients, ITGA11 and VPS33B expression is increased compared to healthy tissue, while in patient-derived fibroblasts, uptake of fluorescently-labelled ColI is also increased. This suggests that there may be a significant contribution of endocytosis-dependent fibrillogenesis in the formation of fibrotic and chronic wound-healing diseases in humans.

      Strengths: 

      This is an interesting paper that contributes an exciting novel understanding of the formation of fibrotic disease, which despite its high occurrence, still has no robust therapeutic options. The precise mechanisms of fibrillogenesis are also not well understood, so a study devoted to this complex and key mechanism is well appreciated. The dependence of fibrillogenesis on VPS33B and VIPAS39 is convincing and robust, while the distinction between soluble ColI secretion and insoluble fibrillar ColI is interesting and informative.

      Weaknesses: 

      There are a number of limitations to this study in its current state. Inhibition of ColI uptake is performed using Dyngo4a, which although proposed as an inhibitor of Clathrin-dependent endocytosis is known to be quite nonspecific. This may not be a problem however, as the endocytic mechanism for ColI also does not seem to be well defined in the literature; in fact, the principal mechanism described in the papers referred to by the authors is that of phagocytosis.

      We thank the reviewer for pointing this out. Macropinocytosis or phagocytosis could be modelled using high molecular weight dextran, and we have used fluorescently-labelled dextran to investigate potential co-localisation with exogenous collagen to investigate the involvement of these mechanisms in addition to endocytosis, and showed very little co-localisation (revised Figure S2B, lines 123-126). Further, we have performed a competition experiment where unlabelled collagen was added in excess at the same time as labelled collagen and showed that excess unlabelled collagen led to a retention of labelled collagen at the cell periphery (revised Figure S2C, lines 126-129). This suggests that collagen-I uptake utilises a different pathway from dextran (i.e. fluid-phase endocytosis) and is a receptor-mediated process.

      It would be interesting to explore this important part of the mechanism further, especially in relation to the intracellular destination of ColI.

      We agree with the reviewer that the intracellular destination of ColI is very interesting, and this is what the Chang lab is currently investigating, although we believe these research findings fall outside the scope of the revised manuscript. However, we have included additional immunofluorescence data to support that collagen is indeed taken up into endosomal compartments using GFP-tagged Rab5 constructs (revised Figure 1D, Figure S6A).

      The circadian regulation does not appear as robust as the authors' last paper, however, there could be a larger lag between endocytosis of ColI and realisation of fibrils.

      The authors state that the endocytic pathway is the mechanism of trafficking and that they show ColI, VPS33B, and VIPAS39 are co-trafficked. However, the only link that is put forward to the endosomes is rather tenuously through VPS33B/VIPAS39.

      We would like to clarify that we meant the post-Golgi compartment. We did not mean VPS33B/VIPAS39 as an endosome marker; however, as we see collagen entering the cell in intracellular compartments and then being recycled, we take it that, by convention, the endosome would be involved. This is further supported by the colocalisation we see with the classic endosome marker Rab5.

      There is no direct demonstration of ColI localisation to endosomes (ie. immunofluorescence), and this is overstated throughout the text.

      We appreciate the comment and have modified overstatements in the revised manuscript as appropriate. As stated above, we have included additional immunofluorescence data to support that collagen is indeed taken up into endosomal compartments.

      Demonstrating the intracellular trafficking and localisation of ColI, and its actual relationship to VPS33B and VIPAS39, followed by ITGA11, would broaden the relevance of this paper significantly to incorporate the field of protein trafficking. Finally, the "self-formation" of ColI fibrils is discussed in relation to the literature and the concentration of fluorescently-tagged ColI; however, as the key message of the paper is the fibrillogenesis from exocytosed ColI, I do not feel that this has been demonstrated beyond doubt. Specific inhibition of intracellular trafficking steps, or following the progressive formation of ColI fibrils over time by immunofluorescence, would demonstrate without any further doubt that ColI must be endocytosed first, to form fibrils as a secondary step, rather than externally-added ColI being incorporated directly into fibrils, independent of cellular uptake.

      We appreciate the concern raised here. This is precisely why we trypsinised and replated cells as part of the workflow, so we can make sure that there is no residual exogenous collagen which is not endocytosed being incorporated onto pre-existing fibrils. We have new data using flow imaging, which show that cells that don’t endocytose exogenous collagen have an accumulation of said collagen at the periphery of the cells, which is greatly reduced after trypsinisation. These new data are in a more detailed methodology-based study which is under preparation, which will allow future studies to further dissect the collagen intracellular trafficking process, and thus are not included in the revised manuscript.

      Reviewer #2 (Public Review): 

      Summary: 

      In this manuscript, the authors describe a mechanism, by which fluorescently-labelled Collagen type I is taken up by cells via endocytosis and then incorporated into newly synthesized fibers via an ITGA11 and VPS33B-dependent mechanism. The authors claim the existence of this collagen recycling mechanism and link it to fibrotic diseases such as IPF and chronic wounds.

      Strengths: 

      The manuscript is well-written, and experimentally contains a broad variety of assays to support its conclusions. Also, the authors added data of IPF patient-derived fibroblasts, patient-derived lung samples, and patient-derived samples of chronic wounds that highlight a potential in vivo disease correlation of their findings.

      The authors were also analyzing the membrane topology of VPS33B and could unravel a likely 'hairpin' like conformation in the ER membrane. 

      Weaknesses: 

      Experimental evidence is missing that supports the non-degradative endocytosis of the labeled collagen.

      We thank the reviewer for raising this. We would like to clarify that we do not think that all endocytosed collagen-I is recycled, but rather that it is sorted in the endosome, which determines the fate of endocytosed collagen. Interestingly, results from Kadler’s group have shown that blocking lysosome function (through chloroquine and bafilomycin) significantly reduced endogenous collagen fibril formation (https://www.biorxiv.org/content/10.1101/2024.05.09.593302v1), suggesting a non-degradative role for the lysosome in fibrillogenesis.

      The authors show and mention in the text that the endocytosis inhibitor Dyngo®4a shows an effect on collagen secretion. It is not clear to me how specific this readout is if the inhibitor affects more than endocytosis. This issue was unfortunately not further discussed.

      We thank the reviewer for this comment and have included a discussion of the specificity of Dyngo4a (revised manuscript lines 383-392). The Ponceau stain suggests that Dyngo4a treatment did not affect global secretion and thus the effects are specific to collagen-I (Fig 2B).

      The authors use commercial rat tail collagen, it is unclear to me which state the collagen is in when it's endocytosed. Is it fully assembled as collagen fiber or are those single heterotrimers or homotrimers?

      We apologise for the confusion and will clarify in our revision. These would be single helical trimers from acid-extracted rat tail collagen. We have performed additional light scattering and CD spectra to confirm the molecular weight and helicity, and confirm that adding fluorescent tags did not alter the readout. We have included this in the revised manuscript (revised Figure S1A-C, manuscript lines 82-86).    

      The Cy-labeled collagen is clearly incorporated into new fibers, but I'm not sure whether the collagen is needed to be endocytosed to be incorporated into the fibers or if that is happening in the extracellular space mediated by the cells.

      We appreciate the concern raised here, which was also raised by reviewer 1. As answered above, this is why we trypsinised and replated cells as part of the workflow, so we can make sure that there is no residual exogenous collagen being incorporated onto pre-existing fibrils. We also have new data using flow imaging, which show that cells that don’t endocytose exogenous collagen have an accumulation of said collagen at the periphery of the cells, which is greatly reduced after trypsinisation. These new data are in a methodology-based manuscript which is under preparation, and thus will not be included in the revised manuscript.

      In general for the collagen blots, due to the lack of molecular weight markers, what chain/form of collagen type I are you showing here?

      Apologies for the lack of molecular weight markers; this was an oversight on our part, and the markers have been included in the revised figures.

      Besides the VPS33B siRNA transfected cells the authors also use CRISPR/Cas9-generated KO. The KO cells do not seem to be a clean system, as there is still a lot of mRNA produced. Were the clones sequenced to verify the KO on a genomic level?

      Yes, the clones were verified and used in our previous paper on circadian control of collagen homeostasis. There are instances where despite knockout at the protein level, mRNA is still persistent; however these transcripts are likely then directed to degradation through nonsense-mediated mRNA decay. To fully understand this mechanism is beyond the scope of this paper. 

      For the siRNA transfection, a control blot for efficiency would be great to estimate the effect size. To me it is not clear where the endocytosed collagen and VPS33B eventually meet in the cells and whether they interact. Or is ITGA11 required to mediate this process, in case VPS33B is not reaching the lumen?

      This is an interesting question. We have conducted experiments with Col1-GFP11-containing conditioned media incubated with VPS33b-barrel in the revised paper, which showed that they interact within the cell and not at the cell periphery (revised Figure 6G, lines 293-296), again highlighting that VPS33b is not involved in the endocytosis step but interacts with endocytosed collagen-I intracellularly. We have attempted colocalisation studies using the split GFP approach with VPS33B and ITGA11 to investigate where they interact, but as the ITGA11 construct we used did not localise to the cell surface as expected, we are not confident that this system is appropriate for investigating how/if VPS33B interacts with ITGA11, and there are simply no good antibodies for VPS33B staining.

      The authors show an upregulation of ITGA11 and VPS33B in IPF patients-derived fibroblasts, which can be correlated to an increased level of ColI uptake, however, it is not clear whether this increased uptake in those cells is due to the elevated levels of VPS33B and/or ITGA11.

      We would like to clarify here that we do not think collagen-I uptake is due to VPS33B and/or ITGA11, as siITGA11 and VPS33B in fibroblasts showed no consistent changes in uptake as determined by flow cytometry, which was included in the original manuscript (now revised Figure 6H, 7I). VPS33B and ITGA11 are involved in the ‘outward’ arm of recycled collagen-I, i.e. directing it to the fibrillogenesis route. We agree that the inclusion of additional functional studies using IPF patient-derived fibroblasts would add to the manuscript, and have performed siRNA against VPS33B and ITGA11 on IPF fibroblasts, and demonstrated a late of endocytic recycling events (revised Figure 8D, S6B, lines 351-353).

      Reviewer #3 (Public Review): 

      Summary: 

      Chang et al. investigated the mechanisms governing collagen fibrillogenesis, firstly demonstrating that cells within tail tendons are able to take up exogenous collagen and use this to synthesize new collagen-1 fibrils. Using an endocytic inhibitor, the authors next showed that endocytosis was required for collagen fibrillogenesis and that this process occurs in a circadian rhythmic manner. Using knockdown and overexpression assays, it was then demonstrated that collagen fibril formation is controlled by vacuolar protein sorting 33b (VPS33b), and this VPS33b-dependent fibrillogenesis is mediated via Integrin alpha-11 (ITGA11). Finally, the authors demonstrated increased expression of VPS33b and ITGA11 at the gene level in fibroblasts from patients with idiopathic pulmonary fibrosis (IPF), and greater expression of these proteins in both lung samples from IPF patients and in chronic skin wounds, indicating that endocytic recycling is disrupted in fibrotic diseases.

      Strengths: 

      The authors have performed a comprehensive functional analysis of the regulators of endocytic recycling of collagen, providing compelling evidence that VPS33b and ITGA11 are crucial regulators of this process. 

      Weaknesses: 

      Throughout the study, several different cell types have been used (immortalised tail tendon fibroblasts, NIH3T3 cells, and HEK293T cells). In general, it is not clear which cells have been used for a particular experiment, and the rationale for using these different cell types is not explained. In addition, some experimental details are missing from the methods.

      We thank the reviewer for pointing out the lack of clarity, and have filled in missing information in the methods. HEK293T cells were used for virus production for the VPSoe system, and we have clarified the cell types used in figure legends (predominantly iTTF). We have also provided justification when NIH3T3 cells were used (revised lines 290-291).    

      There is also a lack of functional studies in patient-derived IPF fibroblasts which means the link between endocytic recycling of collagen and the role of VPS33b and ITGA11 cannot be fully established.

      We thank the reviewer for this comment, which was also raised by reviewer 2 above. We agree that the inclusion of additional functional studies using IPF patient-derived fibroblasts would add to the manuscript and have performed siRNA against VPS33B and ITGA11 on IPF fibroblasts, and demonstrated a late of endocytic recycling events (revised Figure 8D, S6B, lines 351-353).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      The authors inhibit Clathrin-dependent endocytosis with Dyngo4a. It is well known that this inhibitor is not highly specific for this pathway. It is also not explained why the authors only inhibit the Clathrin uptake pathway, and not pinocytosis or Clathrin-independent endocytosis too. The authors refer to papers that describe pinocytosis for collagen endocytosis.

      We thank the reviewer for raising this question. Based on the fact that inhibition of the clathrin-dependent pathway does not completely abrogate endocytosis of collagen-I, we anticipate that other pathways are involved in mediating collagen-I uptake, although additional data suggest that this is unlikely to be through fluid-phase endocytosis and that uptake is receptor-mediated (revised Figure S2B, C).

      Where does the ColI go in the cell? Depending on the uptake pathway, it is likely to pass through endocytic carriers to endosomes, where it may be recycled to the PM or degraded. From the start, the authors describe the ColI as being in vesicular structures, however, the imaging data that this is based on is not co-labelled with anything to determine the potential structure/localisation. This is not done at any point in the paper, until IF is shown of ColI with VIPAS39, however without the relevant controls, this IF is unconvincing, as the general pattern of ColI and VIPAS39 as an endosomal marker are not classically recognisable. Additionally, VPS33B is described as a late endosome/lysosome marker, which would have different connotations on ColI trafficking or destination than other types of endosomes.

      We thank the reviewer for pointing out the weaknesses in our original IF. We have included new confocal images showing labelled collagen co-localisation with GFP-tagged Rab5 through transient transfection, which is a more traditional endosome marker (revised Figure 1D, Figure S6A).  

      We are currently characterising the compartments to which ColI is trafficked, which is being prepared as part of a methodology-based manuscript. We believe that this characterisation would be too detailed to be included in a revised version of this manuscript. The Kadler lab also have data suggesting that the lysosome is involved in collagen fibrillogenesis instead of its canonical degradation function, which is in another submitted manuscript (https://www.researchsquare.com/article/rs-1336021/v1). It was not included in this manuscript due to our focus (i.e. endocytic-recycling).

      In Figure 5H, the pattern of Cy5-ColI staining looks like it could even be ER/Golgi in the VPSKO zoom panel, but in the absence of co-labelling, we cannot conclude anything. In order for the authors to conclude that ColI is within the endosomes, co-labelled IF should be performed to demonstrate ColI-endosomal colocalization. Likewise for the role of VPS33B in ColI fibrillogenesis: dependence of the process is demonstrated, but the relationship is not defined. This could be clarified using IF. This would also support the authors' statements of co-trafficking between ColI, VPS33B, and VIPAS39, which as the paper stands, is not demonstrated.

      We would like to clarify that our hypothesis is that the endosome controls how collagen is being deposited outside the cell, i.e. whether it’s protomeric secretion or fibrillogenesis, and that the decision of whether an endocytosed collagen is recycled or degraded lies in this compartment. The reviewer is correct that it may not be just the endosome that endocytosed collagen-I ends up in, as we have new data suggesting involvement of other intracellular compartments, although the detailed mechanism is beyond the scope of this manuscript. Nonetheless, we have included new data showing co-localisation of endocytosed collagen with Rab5 in this revised manuscript (revised Figure 1D, Figure S6A).

      The basis of this paper is that endocytosis of ColI must occur before re-exocytosis as fibrillar ColI. The authors show this through pulse-chase experiments, with a trypsinisation step to remove any externally bound ColI. The authors also show nice time progression by flow cytometry, but it would truly demonstrate this point if they showed 0 timepoint, or low timepoint of IF to show progressive lengthening of ColI fibrils. This is used early on in Figure 1D, although the presentation here is not very clear. This is especially important as the authors address the self-seeding capabilities of Collagen in cell-free conditions in Figure 1F.

      We would like to thank the reviewer for this suggestion. From previous endogenously tagged collagen data, we know that the appearance of collagen fibrils is rather rapid, thus it may not be a gradual lengthening as expected, but rather a depletion of endocytosed collagen in the initial seeding/growth step (please see https://www.researchsquare.com/article/rs-1336021/v1). We have included an image of replated fibroblasts after 18 hours showing no appearance of extracellular collagen, endogenous or otherwise (revised Figure S2A, line 110).

      Finally, although the involvement of ITGA11 is interesting, it is not well described, and its role is not well demonstrated. This could likely be clarified by an additional introduction to ITGA11 and its role in collagen exocytosis/fibrillogenesis.

      We would like to thank the reviewer for pointing this out and have included additional sentences to specifically introduce ITGA11 and its role in fibrillogenesis (see lines 320, 321; 446-450).  

      Specific points: 

      Line 73: You haven't compared reuse vs production, so you can't say that reuse is central rather than production. They may be both as important or production still may be the most crucial, maybe it depends on cell/collagen type. Using the ColI KD or CHX to block nascent synthesis, you could directly compare the impact of both.

      We would like to clarify that we are not referring to reuse/recycling here. We meant that production of collagen (i.e. single hetero/homotrimer molecules within the cell) is not as crucial as the utilisation (i.e. are these being secreted as protomers, or assembled into fibrils) of these building blocks by the cells, which was supported by our finding that production (as suggested by mRNA levels) in IPF fibroblasts is similar to that in control fibroblasts (now revised Figure 8A). We have conducted ColI siRNA to block nascent synthesis in the original manuscript and showed that fibroblasts can efficiently make new fibrils by recycling exogenous collagen (Figure 3B, C), although we appreciate that siRNA may not completely inhibit endogenous production. Thus, we have also included new data using collagen-I knockout cells to support our hypothesis that without endogenous production, fibroblasts can still effectively make collagen fibrils if they can reuse what is available in the extracellular space (revised Figure 4, Figure S3C, D; lines 178-199).

      Lines 83-87: The rationale for this experiment is not clear. Cy3-ColI is added, taken up into cells, and incorporated into fibrils coming from cells. 5FAM-ColI is added at a later stage, then at 2 days (when incorporation is demonstrated in Fig 1B), it is also incorporated into cells as expected. Why does this comment on ColI not being degraded any more than Cy3-ColI alone?

      We believe that the pulse chase experiment using the differently tagged collagen demonstrated a dimension of dynamics that is not demonstrated with Cy3-ColI alone. In this case, Cy3-ColI was initially added, and removed after 3 days; 5FAM-ColI is then added and incubated for 2 more days. Thus after 5 days since the initial pulse, the Cy3-ColI persisted and was not degraded. We would like to apologise for causing this confusion, and have clarified in the revised manuscript (lines 542-549; Figure S1D figure legend).  

      Figure 1A: I would like to see a negative control: either dark colI or no Cy3-Col, or timescale. Is B quantified from these images?

      We thank the reviewer for this comment. We have added the no-collagen control image in our revision (revised Figure S1D). 1B is not quantified from the ex vivo tendon experiments, but rather the in vitro cell culture experiments (i.e. those from 1D-1F, although they are all from independent experiments).

      Figure 1B: in iTTF cells (immortalised tendon cells) Corrected to max: What does that mean?

      As there are variations between individual experiments (e.g. changes in the amount of collagen added due to pipetting) we have normalised to the maximum value obtained in each individual experiment so that we can display all biological repeats within the same graph.
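
      As an illustration only (a minimal sketch, not the analysis code used in the study), the per-experiment normalisation described above could be written as follows. The function name `normalise_to_max` and the example readings are hypothetical.

```python
def normalise_to_max(values):
    """Scale one experiment's measurements so that its largest value becomes 1.0."""
    peak = max(values)
    if peak <= 0:
        raise ValueError("expected at least one positive measurement")
    return [v / peak for v in values]


# Hypothetical raw readings (arbitrary units) from three independent repeats.
# The absolute scale differs between repeats, e.g. because of pipetting variation.
experiments = {
    "repeat_1": [10.0, 20.0, 40.0],
    "repeat_2": [5.0, 15.0, 25.0],
    "repeat_3": [30.0, 60.0, 90.0],
}

# Each repeat is normalised to its own maximum, so all repeats share a 0-1 axis.
normalised = {name: normalise_to_max(vals) for name, vals in experiments.items()}
```

      After this step every repeat peaks at 1.0, so the repeats can be overlaid on one graph; the trade-off is that differences in absolute signal between repeats are no longer visible.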

      Figure 1C: You can't say ColI is in vesicular structures from this, they are spots, yes, but that could also be in Golgi/ER (unlikely to be cytosolic but not impossible).

      We appreciate this comment and have changed the wording accordingly to call them intracellular/punctate structures.

      Figure 1D: Not the best presentation: The cell mask has structures: what are these? It's not clear if this is a single cell, would be better with a defined marker (endocytic marker, lysosome etc). Instead of a low-resolution 3D view, it would be clearer with normal confocal XY and zooms of "vesicular structures" using appropriate markers as 3D reconstructions I think it could be removed.

      This is a single cell and the cell mask is staining the plasma membrane. We didn’t use a defined marker as we wanted to visualise the whole intracellular compartment. We appreciate that further proof is needed to verify the location of the endocytosed collagen, and have included additional confocal imaging data to support the localisation of collagen into Rab5-positive intracellular compartments (revised Figure 1D, Figure S6B).

      Figure 1 E/F: Cy3 is only visible in extracellular structure, not also intracellular. Why? Would be useful to see the time points of incorporation at the end of the pulse, then at an early point into the chase, to demonstrate 1) Cy3-ColI uptake into cells and progressive incorporation rather than potential direct binding of ColI-Cy3 to ECM, or other non-specific factors. Showing the image at 0t would demonstrate an absence of external labelled colI and therefore its appearance later could be presumed that it had been internalised before.

      As the cells were trypsinized and replated after one hour of labelled collagen feeding to ensure we are only tracking endocytosed collagen, t=0 in this case would be cells that are unattached. We have included t=18hr images post replating instead to show the baseline level of collagen (revised Figure S2A, line 110).

      Figure S1A: yellow box: doesn't show only Cy3-ColI, there is red and yellow in the central cell, and large yellow blobs in the cell above. These images do not support this claim, including the Fiber Zoom box. They should also be shown in single channels to demonstrate the authors' points better.

      Apologies for the confusion – this is to show that newly added 5FAM-ColI also co-localises with previously endocytosed Cy3-ColI, i.e. the Cy3-ColI is persisting rather than being degraded.

      Line 92: endocytosed into distinct structures: These images are very vague, but I don't think you can call them distinct structures, all you can say from this is that they are spots.

      We have changed the wording to ‘distinct puncta’.  

      It is not clear why the authors use Cy3, Cy5, and 5FAM labelled colI. A brief explanation would be useful.

      Apologies for the confusion, we initially included our justification (to show that the fluorescence labels do not change the way collagen is internalised) but removed it in the final manuscript due to length. We have added the justification (revised line 101-102).   

      Figure 1F: It would be useful to see a quantification of the Cy3 channel here: I agree with the conclusions, and find the 0.5 ug/ml condition more convincing than 0.1 actually, although there is some faint Cy3 in cell-free samples there seems to be quite a big increase in the presence of cells, and this would look more convincing if quantified.

      We thank the reviewer for this suggestion and have included quantification in the revised manuscript (revised Figure 1G-I).  

      Figure 2B: Dyng is not an abbreviation of Dyng. Standardise Dyng/Dyngo/Dyngo4a. WB is soluble colI and represents little (if any) insoluble col. IF is more or less the other way round. How do they compare this?

      Thank you for pointing out the inconsistencies; we have corrected these in the revised manuscript. We took the conditioned media from the same experiment in which cells were fixed for IF and carried out Western blot analyses. The IF showed some collagen still present, albeit significantly reduced. This is in agreement with the Western blot results (i.e. Dyngo4a inhibits deposition of both soluble and insoluble forms of collagen).  

      Figure 2C: not an image series. Quant: no cells/independent exps and STATS?

      Apologies for the missing experimental details in the figure legends; they should say ‘representative of N=3 experiments’. We are not sure what the reviewer meant by Figure 2C not being an image series, as we meant it to be an image series of the individual fluorescence channels. We have changed this terminology to avoid confusion, and have included statistical analyses in the methods section. The statistical analyses of the fibril quantification are shown next to the fluorescence images.  

      Figures 2D/E: The authors show that internalised ColI peaks at 20h and decreases to 60h, Fibers peak at 40h. How is this measured? ECM removed? Why would there be less in the cells, degradation? Whats the synchronisation?

      We apologise for omitting the synchronisation method from the methods section, and have included it in our revised manuscript (revised lines 542-544). Synchronisation is through dexamethasone addition (and removal after 1hr incubation) as standard. The internalised Col-I is measured using Cy3-ColI, so the cells would contain both nascent and external collagen. Total intracellular collagen at the different time points would likely be higher than represented as a result, but here we are demonstrating that internalisation is a rhythmic event using the external labelled collagen. Fibers are measured using standard IF and then fibril counting.  

      Please note that we are only overlaying the two graphs to form our hypothesis that endocytosis may be used for accumulation of collagen protomers that then allows for efficient fibrillogenesis. They are not directly comparable as the quantification are of different things (internalised Cy3-ColI, total collagen fibrils). We have clarified this in our discussion (revised lines 399-401).  

      Discussion: Where does the ColI go? Solubilised? Degraded? Taken up by other cells? 

      The inverse correlation is not very tight. In fact, at 38h where fiber count peaks, Cy3-ColI also peaks (esp in normalised data, Figure S2D).

      We thank the reviewer for this comment and have reworded our main text to reflect this, and included additional discussion in our revised manuscript (revised lines 401-404).  

      Line 123: What is the turnover rate of Fibrils? Don't know for how long the transcription has been done, or when this would affect the fibril number. You have the quant for Fn1, where is the quant for ColI?

      We have included the quantification of collagen-I in original Figure 2A. We appreciate that it might cause confusion in Figure 2C (as we co-stained ColI and Fn1 in the same experiment), so we have removed the collagen-I panel from the revised Figure 2C. We know from previous results that the number of fibrils fluctuates over a 24-hour period, although the turnover of any one specific fibril is unlikely to be 24 hours (https://www.biorxiv.org/content/10.1101/331496v2).

      Line 124: no accumulation of col in extracellular space, but you don't know how much endogenous colI (or other endogenous ECM proteins) they're taking up as it isn't measured here. If the author wants to comment on this, should use either exogenous col to monitor take up and resection or block transcription/translation to show fibril formation endo/exocytosis independent of endogenous synthesis.

      This experiment has been done in the original manuscript – siCol1a1 experiment was done with two rounds of siRNA, first round is normal transfection followed by reverse transfection onto fresh coverslips (this will ensure no prior ECM is being deposited, see Figure 3). However we appreciate that there may still be low levels of endogenous collagen-I, and thus have included new data using collagen-I knock-out fibroblasts to strengthen our findings (revised Figure 4).  

      Line 142: Why is fibronectin synthesis also decreased in Col KD? This is clear in the image but no explanation/reference is given.

      Due to the dynamic and complex nature of ECM, it is unsurprising if there is a knock-on effect when knocking down one matrix protein. However, we have quantified the amount of fibronectin fibrils deposited by scr and siCol1a1 fibroblasts, and showed that there was in fact no significant change between the two treatments (revised Figure 3A).

      Figure 3A: Need labels for which colour/protein is shown. Needs quantifying, especially as the Fn1 decrease is not so obvious here, it is consistent between Figure 3A and 2C?

      We have provided quantification in the revision (revised Figure 3A). Figure 3A and 2C are two separate experiments (one is Dyngo treatment and one is siCol1a1), and neither showed significant changes in fibronectin fibril areas.   

      Figure 3B: Line 151: the text states that "The observation of fibrillar Cy3 signals in siCol1a1 cells showed that the cells can repurpose collagen into fibrils without the requirement for intrinsic collagen-I production (red arrow Figure 3B), however, there is clearly endogenous colI here too (along the fiber and also strongly at each end). Does the ColI antibody recognise the exogenous ColI?

      In our hands the ColI antibody does not recognise exogenous ColI, as the cell-free Cy3-ColI images were also stained with ColI antibody to ensure the two experimental conditions were treated exactly the same.

      This conclusion could only be made in the true absence of collagen: either in knock-out cells, or where collagen production/trafficking has been blocked (ie knockout of ColI chaperone or ERES block), or in a cell type that produces collagens but not ColI. Alternatively, if there are any fibrils seen that are completely negative, they should be shown in the figure and quantified (number of Cy3-ColI+-ColI+ vs Cy3-ColI+-ColI-).

      We thank the reviewer for this suggestion. We have included new data from collagen knock-out fibroblasts in this revision (revised Figure 4).  

      Figure S4A: the quality of this blot isn't very high, the result is not very clear and the high intensity (unspecific?) band below confounds the interpretation. In the author's previous paper (NCB 2020) the blots for VPS33B were much clearer, as is Fig S4D. It would be nice to include a clearer blot, maybe from the other repeats.

      This is the only blot that we used to select which knockout clones to use for our previous paper, which is why the quality is not as high. Knockout clones were all verified with additional western blots, and we do not think that endogenous VPS33b is expressed at high levels (also verified by MS analyses).  Fig S4D is overexpression of VPS33b, which is much easier to detect.  

      Figure S4D: This blot is much clearer, it would be useful to include a high gain to show the VPS33B band in CT to be able to understand the true increase.

      From the qPCR data one can see that the increase at the mRNA level is 20+ fold; we have always had problems trying to detect endogenous VPS33b using western blot or mass spectrometry analysis.  

      Figure 4A: The fibrils here in the CT are not obvious, and the difference between CT and KOs is not appreciable. Would this be clearer shown at a lower magnification, with zooms where needed? Or immunogold labelling/CLEM to label the ColI?

      It is not trivial to carry out immunogold labelling/CLEM. These are cell-derived matrices in culture and thus lower magnification may not show as many collagen fibrils as one would expect. We are not confident that lower magnification will provide more information as the characteristic D-banded collagen pattern will be lost.  

      Line 167/Figure 4B: It looks like there is more internal ColI in KO, but the images are not good enough to tell. This could be better shown by flow cytometry.

      We have previously seen that VPSKO leads to accumulation of collagen-I in intracellular puncta (NCB2020), which is also seen here. Flow cytometry data for internalisation of external collagen is already included in original Figure 5G (revised Figure 6H).  

      Again you mention intercellular vesicles, but based on these images, it is not possible to conclude this. These large spots could be aggregation elsewhere in the cell. Specific localisation should be shown by co-labelled IF/confocal, or it could be nicely shown by EM + fluorescent element (CLEM / Immunogold), or these statements removed from the text.

      We appreciate that the term ‘vesicles’ is very defined in the trafficking field, and have changed it to ‘intracellular compartments’.  

      Line 173-174 / Figure 4E: Why do you think the matrix mass is not increased in VPSoe by the approach shown in E when there is seemingly a huge increase by IF? E must also measure other ECM matrix proteins, which do you expect to be secreted by these cells? Could this confound the data if they too are affected by VPSoe?

      IF is showing specifically collagen-I. Hydroxyproline detects multiple collagens, and shows a trend of increase (although not significant due to one outlier). Matrix mass is a very generic measurement of total ECM deposited based on decellularized ECM weight. The reviewer is correct that VPSoe may also affect other ECM deposition; however, here we are focussing specifically on its effect on collagen-I. How VPSoe changes other types of ECM deposition could be addressed in future studies and is not within the scope of this manuscript.   

      Are the results in E paired?

      Individual values for control and VPSoe in each separate experiment are paired.  

      Figure 4F: Is quantification from IF shown in D? Specify which kind of microscopy it is based on.

      Quantification is based on fibril counting using standard fluorescence microscopy, as used in our previous paper. D is independent of F, as F is specifically looking at synchronised circadian effects, and D (and elsewhere) we are looking at global collagen deposition effects, irrespective of what time of day the cells are in.  

      Figure S5F: What do the yellow/red spots in the blots represent?

      We apologise for the initial unclear description of what the yellow/magenta circles depict in relation to the phosphoimages of the radiolabelled cell-free translation products displayed in Supplementary Figure 5, panels F, G and I. These circles indicate non-glycosylated (yellow) and N-glycosylated (magenta) species respectively, as is now clearly described in the revised manuscript.

      Figure 5 title: You can't conclude this from these images, need confocal and PM or cytosolic marker.

      We have changed the title to ‘VPS33B co-trafficks with collagen-I’. There is no good commercial VPS33b antibody for immunofluorescence staining, which is why we used the split GFP approach in this paper, and the images were acquired using confocal imaging (Olympus SpinSR system).  

      Figure 5E: The authors describe that ColI is in endosomes throughout most of the paper, and this is based on the involvement of VPS33B in the colI pathway. VPS33B is thought to be at the late endosome/lysosome. However, these images do not look like classic endosomes or lysosomes, or other normal organelle IF phenotypes. The fluorescent intensity looks saturated, and it is difficult to conclude anything from these images. It is unclear where in the cell the largest blob in the zoom would be localised and in which cell. I would suggest that this image is replaced and proper controls included (IgG controls and single channels) as well as using different markers for other potential intracellular structures.

      We appreciate the reviewer's comment with regard to the classification of VPS33b localisation in the endosome compartment. We did not mean to use VPS33b as an endosome marker, as the focus of our studies is the function of VPS33b in directing endogenous or exogenous collagen to fibrillogenesis. With live imaging we could see endocytosed collagen moving in intracellular compartments, and have conducted additional staining to show co-localisation with Rab5 (revised Figure 1), which we take to indicate, through convention, that it is occupying an endosome compartment. We have included single channel images in the revised manuscript (revised Figure 6E).

      Line 255/ Figure 5G: no consistent change in uptake. Why are the results so varied in the KO and oe, here and in Fig 4C/E? N=4, what does that mean? 4 cells? 4 independent exps?

      In all cases, “N” represents independent biological experiments in this manuscript. Thus “N=4” in this case is 4 independent biological experiments, with at least 10,000 cells analysed per experiment. 

      We don’t know why there is a variation in response, however that is also why we concluded that it is unlikely that VPS33B is directly involved with collagen uptake. We have changed 5G (now revised Figure 5H) to a paired line graph for better representation.  

      Figure 5H shows the uptake of Cy5ColI. At this resolution, VP2ko looks like the col is ER, in one of the cells in the zoom, it looks like it is at Golgi. I think that the uptake route of ColI needs to be better defined, as there is no way to tell here where the colI goes. ColI being recycled/degraded would be most likely. But this figure looks like that might not be the case. It is also not clear where the zooms come from, they should be indicated with dashed boxes in the lower mag image

      We thank the reviewer for this comment, and agree that we need to define the uptake route of ColI. This is currently being assembled as a methodology manuscript, and how ColI is being recycled/degraded is one major research area of the Chang lab. 

      We have added dashed boxes in the lower mag images to indicate where the zooms are derived from. We would also like to thank the reviewer for pointing this out, as we realised we had accidentally cropped the VPSko image to a slightly different area, and have now corrected this.  

      Line 257: Based on this data, it could be trafficking through the cell as well as into the extracellular space.

      We think that VPS33B is involved in trafficking collagen through the cell to plasma membrane but not secreted, as based on our split-GFP experiment we never observed extracellular GFP signal, which suggests VPS33b is not deposited extracellularly.

      Line 259: "highlighting the role in recycling col to fibril formation sites" is an overstatement based on the data shown here, there is no data on colI trafficking or its regulation

      We respectfully disagree that we have not shown data on col-I trafficking or regulation by VPS33b – split GFP highlighted cotrafficking to the plasma membrane, and we have shown a clear relationship between VPS33b and collagen-I fibril formation, with minimal changes to collagen-I mRNA levels. We acknowledge that we have not shown specifically the location of VPS33b at fibrillogenic sites and have modified this statement in revised manuscript (revised line 302).  

      Line 262: "Having identified VPS33B as specifically driving collagen-I fibril formation" is also an overstatement.

      We refer here to the data showing that VPS33b is not controlling collagen-I secretion (as demonstrated by the CM westerns) but specifically fibrillogenesis. We have clarified this in the revised text (revised line 304).  

      Line 286: It would be useful to have a brief intro to PLOD3.

      We have included a brief intro to PLOD3 in the introduction, as well as the results highlighted by the reviewer, in our revised manuscript (revised line 54-58).  

      Line 289/290: There could be other explanations for disruption to exo-endocytosis when disrupting col trafficking. Is VPS33B controlling exocytosis in general? Why should it be specific to col? Likewise with siITGA11 KD? Hypothesis for ITGA11 and fibrillogenesis?

      The relationship between ITGA11 and collagen fibrillogenesis is currently in a manuscript by Donald Gullberg and Cedric Zeltz, under revision at Matrix Biology (see reference 63 in revised manuscript). We do not think that VPS33b is controlling exocytosis in general, which is supported by the minimal change in ponceau stain of the western blots in the manuscript. Previously it has been shown that VPS33B co-trafficks with PLOD3, a collagen-I modifier.  

      Figure 6I: Why only quant Scr + siITGA11, not in VPSoe? It looks like there is still an increase in intracellular or fibril formation in VPSoe + siITGA11, which would be a key result to discuss.

      We would like to clarify that 6I (now revised Figure 7I) is on the endocytosis of exogenous collagen-I, not quantification of Figure 6H.  

      Line 307: Discuss fibrillogenic sites, what are they?

      As we have not shown direct evidence of VPS33B delivering endocytosed collagen at the site of fibrillogenesis, we have decided to alter the text to avoid overstatement, as suggested from previous reviewers’ comments.  

      Figure 8: What does pentachrome label?

      Pentachrome staining allows for simultaneous staining of multiple species: collagen in red, sulphated mucopolysaccharides in violet, red blood cells in yellow, muscle in orange, nuclei in green.

      Line 326: "In this study we have identified the endosome as a major protagonist in..." This is an overstatement and cant be drawn from this data.

      We have modified this statement to “In this study we have identified an endocytic recycling mechanism for type I collagen fibrillogenesis that is under circadian regulation”

      Line 330/331: "Collagen-I co-traffics with VPS33B in a VIPAS-containing endosomal compartment that directs collagen-I to sites of fibril assembly," This is also an overstatement that cannot be drawn from this data.

      We have modified this statement to “Collagen-I co-traffics with VPS33B to the plasma membrane for fibrillogenesis”.  

      Line 340: again, the demonstration of the involvement of the endocytic pathway is very limited.

      We have provided new evidence in the revised manuscript that support the involvement of classical endosomal compartments.  

      Line 366: You cant conclude this, you have not manipulated these proteins to show a functional effect or modulation of fibrillogenesis, it could still be a secondary effect.

      We have provided new evidence in the revised manuscript that supports this conclusion. 

      Line 569: "Unless otherwise stated, incubation and washes were done at room temperature." Which incubations? Specify if this is just post-fixation during the EM prep or during cell culture.

      This is specific to the EM preparation and we have clarified in the revised manuscript (revised line 663).  

      Small text alterations:

      Overall we would like to thank the reviewer for highlighting these errors and mistakes in our manuscript, and have corrected them in our revised manuscript.  

      Figure 1E: Fluoro image series? This is only one image.

      We wrote this to mean single channel images, we have corrected the terminology.  

      Line 111: Ref for Dyngo4a?

      We have included this in the revised manuscript  

      Line 121: introduction/abbreviation definition for Fn1? Instead it is on Line 140.

      Thank you for highlighting this, we have corrected this in revised manuscript.  

      Figure S2C: Alignment of labels cleaves x-axis.

      We thank the reviewer for catching this and have corrected this with our revised manuscript.  

      Figure S4F and G should be inverted to mention sequentially in the text.

      We thank the reviewer for catching this and have corrected this in our revised manuscript.  

      Line 182: Figure 4J should be G.

      We thank the reviewer for catching this and have corrected this in our revised manuscript.

      Line 209: typo: N-glycosylated.

      We have corrected this typo in our revised manuscript.

      Fig 6E: Very big as a figure element compared to others.

      We have made this smaller in the revised manuscript to fit better with rest of the figure.  

      Line 313: Figure 7E not F.

      Thank you for spotting this, we have corrected it.  

      Line 555: Typo: Scraped.

      We have corrected this typo in our revised manuscript.

      Line 562: missing )

      We have corrected this typo in our revised manuscript.

      Standardise

      We thank the reviewer for spotting the mistakes below and have corrected in our revised manuscript.  

      Legends: Include numbers of repeats and STATs throughout. 

      Terminology: Dyng etc. 

      Scale bars: some included as editable lines, some with size on top, small/large etc.

      In certain cases we have positioned the scale bars in different regions of the figures to ensure no obscuring of the images.

      VPS33b v B. 

      Reviewer #2 (Recommendations For The Authors):  

      The authors can improve the experimental part of the manuscript the following: 

      -  For all the western blots please include molecular weight markers.

      We thank the reviewer for noticing this omission and have included molecular weight markers in the revised manuscript.  

      - Performing immunofluorescence and western blot analysis of endocytosed collagen -/+ inhibitors for lysosomal degradation (BafA1 or E64d+PepstatinA) in order to exclude endocytosis for degradation.

      We thank the reviewer for this comment, another paper from the lab has identified lysosome to be involved in collagen fibrillogenesis (https://www.biorxiv.org/content/10.1101/2024.05.09.593302v1), thus  

      - Figure out how Dyngo4a is affecting Col1 secretion in the first place? Does it interfere with the secretory pathway. Alternatively, use a different model to block endocytosis (e.g. siRNA Dynamin).

      We thank the reviewer for raising this. The Dyngo CM blot for total ponceau stain (revised Figure 2B) showed minimal changes, which suggest that global secretion is not affected.  

      - Further characterization of the VPS33B / collagen vesicles by immunofluorescence containing markers for early, late, and recycling endosomes. Block endocytic recycling by depletion of either Rabs or e.g. EHD1.

      There is no good VPS33b antibody for staining. We have included images of GFP-tagged Rab5 co-localisation with labelled collagen-I (revised Figure 1D, Figure S6B).  

      - Further clarify the status of the VPS33B knockouts e.g. by sequencing. also provide a readout of the siRNA KD, besides the mRNA levels, since there the difference is not striking.

      The knockout cell lines were characterised previously in our 2020 paper, which is referred to in our revised manuscript. We have always had issues detecting endogenous VPS33b due to reagent limitations, which is why we resorted to mRNA as the key readout.  

      - Doing siRNA knockdowns and endocytosis inhibition in the IPF fibroblasts to further strengthen the link between elevated expression of VPS33B/ ITGA11 and increased collagen uptake.

      We thank the reviewer for suggesting these experiments. Due to limitations of the patient-derived fibroblasts (cell numbers and passage numbers) we had to prioritise experiments, and thus have performed siRNA against VPS33B and ITGA11 in the IPF fibroblasts. We showed that in both cases the amount of recycled labelled-collagen in collagen fibrils is significantly reduced (revised Figure 8D).  

      Reviewer #3 (Recommendations For The Authors): 

      Major points 

      (1) Choice of cells: Please provide a rationale for why each cell line was used, and make sure that it is clear throughout the manuscript which cell line was used for each particular experiment. The HEK293T cell line is also missing from the reagent table.

      We thank the reviewer for pointing out this omission, and have clarified in our revised manuscript which cell lines were used in each experiment. We used HEK293T to generate lentiviruses as described in the methods section.  

      (2) Missing information from methods. Experimental details are missing from the methods in several places, making it difficult for someone to replicate an experiment. For example, no details are given in the methods describing the explant culture of murine tail tendons (described in results lines 78-100), and there are no details on how the skin samples were obtained or stained. Further, no ethical approval details are provided for the use of human skin tissue.

      We apologise for leaving out the ethical approval details and skin sample collection; this was an oversight and will be included in the revised manuscript. We have also included the method for how murine tail tendons were cultured ex vivo (revised lines 527-531, 546-553).  

      (3) Functional studies in patient-derived cells. To fully establish the role of VPS33b and ITGA11 in fibrotic diseases, functional studies including the knockdown/overexpression of these genes could be performed to establish if the same response is seen as in non-diseased cells.

      We agree that this will add much to the paper, and have performed siRNA against VPS33B and ITGA11 in the IPF fibroblasts. We showed that in both cases the amount of recycled labelled-collagen in collagen fibrils is significantly reduced (revised Figure 8D).

      Minor Points

      We thank the reviewer for pointing out these mistakes, and have corrected and included additional details in the revised manuscript.  

      (1) Lines 51-52. Wording of this sentence is unclear, please rephrase. 

      (2) Line 182. Should this be Fig 4G rather than J? 

      (3) Line 209. Correct spelling of glycosylated. 

      (4) Line 463. Incomplete brackets and details missing? 

      (5) Line 590. Correct tense - was rather than are. 

      (6) Line 593. Specify centrifugation speed. 

      (7) Line 619. Nuclei rather than nucleus. 

      (8) Ln 650. Statistical analysis - was normality tested? 

      (9) Figure 1e - Difficult to read labels for coll/DAPI.

Type","field_key":"conversion_type","id":286,"beforeField":"","afterField":"","value":"","label_pos":"above","parentType":"hidden","element_templates":["hidden","input"],"old_classname":"","wrap_template":"wrap-no-label"},{"objectType":"Field","objectDomain":"fields","editActive":false,"order":6,"idAttribute":"id","type":"hidden","label":"Hidden","key":"mailchimp_internal_source","default":"Bigthink.com signup form: Inline SWAB Prompt","admin_label":"","wrap_styles_border":"","wrap_styles_width":"","wrap_styles_margin":"","wrap_styles_padding":"","wrap_styles_float":"","wrap_styles_show_advanced_css":0,"label_styles_border":"","label_styles_width":"","label_styles_font-size":"","label_styles_margin":"","label_styles_padding":"","label_styles_float":"","label_styles_show_advanced_css":0,"element_styles_border":"","element_styles_width":"","element_styles_font-size":"","element_styles_margin":"","element_styles_padding":"","element_styles_float":"","element_styles_show_advanced_css":0,"cellcid":"c24709","manual_key":1,"drawerDisabled":false,"field_label":"Hidden","field_key":"mailchimp_internal_source","id":289,"beforeField":"","afterField":"","value":"Bigthink.com signup form: Inline SWAB 
Prompt","label_pos":"above","parentType":"hidden","element_templates":["hidden","input"],"old_classname":"","wrap_template":"wrap-no-label"},{"objectType":"Field","objectDomain":"fields","editActive":false,"order":7,"idAttribute":"id","label":"Hidden","type":"hidden","cellcid":"c24711","key":"rejoiner_preference_tags","default":"big-think-smarter","admin_label":"","wrap_styles_border":"","wrap_styles_width":"","wrap_styles_margin":"","wrap_styles_padding":"","wrap_styles_float":"","wrap_styles_show_advanced_css":0,"label_styles_border":"","label_styles_width":"","label_styles_font-size":"","label_styles_margin":"","label_styles_padding":"","label_styles_float":"","label_styles_show_advanced_css":0,"element_styles_border":"","element_styles_width":"","element_styles_font-size":"","element_styles_margin":"","element_styles_padding":"","element_styles_float":"","element_styles_show_advanced_css":0,"manual_key":1,"drawerDisabled":false,"field_label":"Hidden","field_key":"rejoiner_preference_tags","id":285,"beforeField":"","afterField":"","value":"big-think-smarter","label_pos":"above","parentType":"hidden","element_templates":["hidden","input"],"old_classname":"","wrap_template":"wrap-no-label"},{"objectType":"Field","objectDomain":"fields","editActive":false,"order":8,"idAttribute":"id","label":"Hidden","type":"hidden","cellcid":"c24713","key":"mailchimp_list_action","default":"subscribe","admin_label":"","wrap_styles_border":"","wrap_styles_width":"","wrap_styles_margin":"","wrap_styles_padding":"","wrap_styles_float":"","wrap_styles_show_advanced_css":0,"label_styles_border":"","label_styles_width":"","label_styles_font-size":"","label_styles_margin":"","label_styles_padding":"","label_styles_float":"","label_styles_show_advanced_css":0,"element_styles_border":"","element_styles_width":"","element_styles_font-size":"","element_styles_margin":"","element_styles_padding":"","element_styles_float":"","element_styles_show_advanced_css":0,"manual_key":1,"drawerDisabled":"
","field_label":"Hidden","field_key":"mailchimp_list_action","id":288,"beforeField":"","afterField":"","value":"subscribe","label_pos":"above","parentType":"hidden","element_templates":["hidden","input"],"old_classname":"","wrap_template":"wrap-no-label"}];nfForms.push(form); <nf-fields></nf-fields> <nf-cells></nf-cells> Gratitude is an affirmation of goodness, according to Dr. Robert Emmons. Photo by stockfour on Shutterstock Dr. Robert Emmons is known as the “world’s leading scientific expert on gratitude.” He is a psychology profession from the University of California, Davis and also the founding editor-in-chief of the Journal of Positive Psychology. Emmons has dedicated his life to better understanding what role gratitude and thankfulness play, not just in our lives, but in our mental and physical health as well.Featured VideosThe video player is currently playing an ad. You can skip the ad in 5 sec with a mouse or keyboard 1/100:243 powerful mind states: Flow state, good anxiety, and Zen Buddhism Skip Ad Continue watching3 powerful mind states: Flow state, good anxiety, and Zen Buddhismafter the adVisit Advertiser websiteGO TO PAGE.cnx-non-linear-ad-container .cnx-ad-bid-slot{position:absolute;top:0;left:0;grid-area:adslot;opacity:0;background:none;width:100%;height:100%}.cnx-non-linear-ad-container .cnx-ad-bid-slot.cnx-ad-bid-slot-selected{opacity:1;z-index:10}.cnx-non-linear-ad-container .cnx-ad-slot{display:flex;position:absolute;top:0;left:0;justify-content:center;align-items:center;width:100%;height:100%;overflow:hidden}.cnx-non-linear-ad-container .cnx-ad-slot video,.cnx-non-linear-ad-container video.cnx-ad-slot{background-color:unset}.cnx-ad-container .cnx-ad-bid-slot{position:absolute;top:0;left:0;grid-area:adslot;opacity:0;background:#f4f4f4;width:100%;height:100%}.cnx-ad-container .cnx-ad-bid-slot.cnx-ad-bid-slot-selected{opacity:1;z-index:10}.cnx-ad-container 
.cnx-ad-slot{display:flex;position:absolute;top:0;left:0;justify-content:center;align-items:center;width:100%;height:100%;overflow:hidden}.cnx-ad-container .cnx-ad-slot div{background-color:transparent !important}.cnx-ad-container .cnx-ad-slot iframe{box-sizing:border-box;border:3px solid #ffffff !important;color-scheme:none}.cnx-ad-container .cnx-ad-slot iframe:not([id]){border:none !important}.cnx-ad-container .cnx-ad-slot-video-type iframe{border:none !important}.cnx-ad-container .cnx-ad-slot video,.cnx-ad-container video.cnx-ad-slot{background-color:#f4f4f4} Many people are in need of motivation to practice gratitude for the good things in life, especially during a pandemic when stress-levels are at an all-time high.

      I feel like during the pandemic many people didn't have any motivation to do anything, and it made many people's lives hard.

    1. this is the time it will finally happen, this is the technology that will finally deliver the hammer blow to human labor. And yet, it never happens.

      Not to be the exact kind of person this is referring to, but to return to the discussion on AI in creative spaces, we can see signs of this happening. I used the Coca-Cola AI ad as an example earlier, but I've seen a handful of others in the past too, usually from smaller companies or things that look like they may just be scams. One recent example I can vividly remember was an ad for Tyson chicken nuggets I saw on Instagram, which looked exactly like it was made with AI. I've seen people try to push AI as a new cost-effective way to generate things without needing to have people do it instead. There is an argument to be made that AI is taking jobs. Even if it is not at the level where everyone is out of a job and we all die and are homeless, there is something happening here. I do not think it can be ignored.

    2. AI will not destroy the world, and in fact may save it.

      I understand that moral panics are a thing, and that they lead to severe overreactions towards anything new, treating it as if it is a blight upon humanity. But doesn't this almost feel like exactly that, just on the opposite side? There is 100% good that can come from AI; we just need to make sure to use it correctly and not for unethical purposes. But to act like this is THE SAVIOR THAT WILL BRING US ALL HOPE is a little much.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      "Neural noise", here operationalized as an imbalance between excitatory and inhibitory neural activity, has been posited as a core cause of developmental dyslexia, a prevalent learning disability that impacts reading accuracy and fluency. This study is the first to systematically evaluate the neural noise hypothesis of dyslexia. Neural noise was measured using neurophysiological (electroencephalography [EEG]) and neurochemical (magnetic resonance spectroscopy [MRS]) in adolescents and young adults with and without dyslexia. The authors did not find evidence of elevated neural noise in the dyslexia group from EEG or MRS measures, and Bayes factors generally informed against including the grouping factor in the models. Although the comparisons between groups with and without dyslexia did not support the neural noise hypothesis, a mediation model that quantified phonological processing and reading abilities continuously revealed that EEG beta power in the left superior temporal sulcus was positively associated with reading ability via phonological awareness. This finding lends support for analysis of associations between neural excitatory/inhibitory factors and reading ability along a continuum, rather than as with a case/control approach, and indicates the relevance of phonological awareness as an intermediate trait that may provide a more proximal link between neurobiology and reading ability. Further research is needed across developmental stages and over a broader set of brain regions to more comprehensively assess the neural noise hypothesis of dyslexia, and alternative neurobiological mechanisms of this disorder should be explored.

      Strengths:

      The inclusion of multiple methods of assessing neural noise (neurophysiological and neurochemical) is a major advantage of this paper. MRS at 7T confers an advantage of more accurately distinguishing and quantifying glutamate, which is a primary target of this study. In addition, the subject-specific functional localization of the MRS acquisition is an innovative approach. MRS acquisition and processing details are noted in the supplementary materials according to the experts' consensus-recommended checklist (https://doi.org/10.1002/nbm.4484). Commenting on the rigor of the EEG methods is beyond my expertise as a reviewer.

      Participants recruited for this study included those with a clinical diagnosis of dyslexia, which strengthens confidence in the accuracy of the diagnosis. The assessment of reading and language abilities during the study further confirms the persistently poorer performance of the dyslexia group compared to the control group.

      The correlational analysis and mediation analysis provide complementary information to the main case-control analyses, and the examination of associations between EEG and MRS measures of neural noise is novel and interesting.

      The authors follow good practice for open science, including data and code sharing. They also apply statistical rigor, using Bayes Factors to support conclusions of null evidence rather than relying only on non-significant findings. In the discussion, they acknowledge the limitations and generalizability of the evidence and provide directions for future research on this topic.

      Weaknesses:

      Though the methods employed in the paper are generally strong, there are certain aspects that are not clearly described in the Materials & Methods section, such as a description of the statistical analyses used for hypothesis testing.

      Thank you for pointing this out. A description of the statistical models used in the analyses of EEG biomarkers has been added to the Materials and Methods:

      “First, exponent and offset values were averaged across all electrodes and analyzed using a 2x2 repeated measures ANOVA with group (dyslexic, control) as a between-subjects factor and condition (resting state, language task) as a within-subjects factor. Age was included in the analyses as a covariate due to the correlation between variables. Next, exponent and offset values were averaged across electrodes corresponding to the left (F7, FT7, FC5) and right inferior frontal gyrus (F8, FT8, FC6), and to the left (T7, TP7, TP9) and right superior temporal sulcus (T8, TP8, TP10). The electrodes were selected based on the analyses outlined by Giacometti and colleagues (2014) and Scrivener and Reader (2022). For these analyses, a 2x2x2x2 repeated measures ANOVA with age as a covariate was conducted with group (dyslexic, control) as a between-subjects factor and condition (resting state, language task), hemisphere (left, right), and region (frontal, temporal) as within-subjects factors. Results for the alpha and beta bands were calculated for the same clusters of frontal and temporal electrodes and analyzed with a similar 2x2x2x2 repeated measures ANOVA; however, for these analyses, age was not included as a covariate due to a lack of significant correlations.”
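
      The electrode averaging described above can be illustrated with a short sketch. This is not the authors' analysis code: the function name `cluster_means` and the input format (a mapping from electrode label to one participant's exponent or offset value) are assumptions made for illustration. Only the electrode groupings come from the text.

```python
import numpy as np

# Electrode clusters as listed in the quoted Methods text.
# Keys are (hemisphere, region); "frontal" stands for the inferior frontal gyrus
# and "temporal" for the superior temporal sulcus.
CLUSTERS = {
    ("left", "frontal"): ["F7", "FT7", "FC5"],
    ("right", "frontal"): ["F8", "FT8", "FC6"],
    ("left", "temporal"): ["T7", "TP7", "TP9"],
    ("right", "temporal"): ["T8", "TP8", "TP10"],
}


def cluster_means(values_by_electrode):
    """Average one measure (e.g. the aperiodic exponent) within each cluster.

    values_by_electrode: hypothetical dict mapping electrode label -> value
    for a single participant and condition.
    """
    return {
        key: float(np.mean([values_by_electrode[name] for name in names]))
        for key, names in CLUSTERS.items()
    }
```

      Each participant and condition would yield four cluster values, which correspond to the hemisphere-by-region cells of the 2x2x2x2 repeated measures ANOVA.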

      We also expanded the description of the statistical models used in the analyses of MRS biomarkers:

      “To analyze the metabolite results, separate univariate ANCOVAs were conducted for Glu, GABA+, Glu/GABA+ ratio and Glu/GABA+ imbalance measures with group (control, dyslexic) as a between-subjects factor and voxel gray matter volume (GMV) as a covariate. Additionally, for the Glu analysis, age was included as a covariate due to a correlation between variables. Both frequentist and Bayesian statistics were calculated. Glu/GABA+ imbalance measure was calculated as the square root of the absolute residual value of a linear relationship between Glu and GABA+ (McKeon et al., 2024).”
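
      As a rough illustration of the imbalance measure defined above, the sketch below fits a straight line across participants and takes the square root of each participant's absolute residual. It is not the authors' code. The direction of the fit (Glu regressed on GABA+) and the function name `glu_gaba_imbalance` are assumptions; the text only specifies "a linear relationship between Glu and GABA+".

```python
import numpy as np


def glu_gaba_imbalance(glu, gaba):
    """Square root of the absolute residual from a linear fit across participants.

    Assumption: Glu is regressed on GABA+ with ordinary least squares; the
    quoted text does not state which variable is the predictor.
    """
    glu = np.asarray(glu, dtype=float)
    gaba = np.asarray(gaba, dtype=float)
    # Ordinary least-squares line: glu ~ slope * gaba + intercept
    slope, intercept = np.polyfit(gaba, glu, deg=1)
    residuals = glu - (slope * gaba + intercept)
    return np.sqrt(np.abs(residuals))
```

      Participants whose Glu and GABA+ values lie close to the group-level line receive values near zero, while larger departures in either direction give larger values.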

      With regard to metabolite quantification, it is unclear why the authors chose to analyze and report metabolite values in terms of creatine ratios rather than quantification based on a water reference given that the MRS acquisition appears to support using a water reference.

      We have decided to use the ratio of Glu and GABA to total creatine (tCr), as this is still a common practice in MRS studies at 7T (e.g., Nandi et al., 2022; Smith et al., 2021). This approach normalizes the signal, reducing the impact of intensity variations across different regions and tissue compositions. Additionally, total creatine concentration is considered relatively stable across different brain regions, which is particularly important in our study, where a functional localizer was used to establish the left STS region individually. Our decision was further influenced by previous studies on dyslexia (Del Tufo et al., 2018; Pugh et al., 2014) which have reported creatine ratios and included GM volume as a covariate in their models, thus providing comparability. It is now indicated in the Results:

      “For comparability with previous studies in dyslexia (Del Tufo et al., 2018; Pugh et al., 2014) we report Glu and GABA as a ratio to total creatine (tCr).”

      and in the Method sections:

      “Glu and GABA+ concentrations were expressed as a ratio to total-creatine (tCr; Creatine + Phosphocreatine) following previous MRS studies in dyslexia (Del Tufo et al., 2018; Pugh et al., 2014).”

      We did not estimate absolute concentrations using water signals as a reference, as this would require accounting for water relaxation times, which may vary across our age range. Nevertheless, our dataset has been made publicly available for future researchers to calculate and compare absolute values.

      Del Tufo, S. N., Frost, S. J., Hoeft, F., Cutting, L. E., Molfese, P. J., Mason, G. F., Rothman, D. L., Fulbright, R. K., & Pugh, K. R. (2018). Neurochemistry Predicts Convergence of Written and Spoken Language: A Proton Magnetic Resonance Spectroscopy Study of Cross-Modal Language Integration. Frontiers in Psychology, 9, 1507. https://doi.org/10.3389/fpsyg.2018.01507

      Nandi, T., Puonti, O., Clarke, W. T., Nettekoven, C., Barron, H. C., Kolasinski, J., Hanayik, T., Hinson, E. L., Berrington, A., Bachtiar, V., Johnstone, A., Winkler, A. M., Thielscher, A., Johansen-Berg, H., & Stagg, C. J. (2022). tDCS induced GABA change is associated with the simulated electric field in M1, an effect mediated by grey matter volume in the MRS voxel. Brain Stimulation, 15(5), 1153–1162. https://doi.org/10.1016/j.brs.2022.07.049

      Pugh, K. R., Frost, S. J., Rothman, D. L., Hoeft, F., Del Tufo, S. N., Mason, G. F., Molfese, P. J., Mencl, W. E., Grigorenko, E. L., Landi, N., Preston, J. L., Jacobsen, L., Seidenberg, M. S., & Fulbright, R. K. (2014). Glutamate and choline levels predict individual differences in reading ability in emergent readers. Journal of Neuroscience, 34(11), 4082–4089. https://doi.org/10.1523/JNEUROSCI.3907-13.2014

      Smith, G. S., Oeltzschner, G., Gould, N. F., Leoutsakos, J. S., Nassery, N., Joo, J. H., Kraut, M. A., Edden, R. A. E., Barker, P. B., Wijtenburg, S. A., Rowland, L. M., & Workman, C. I. (2021). Neurotransmitters and Neurometabolites in Late-Life Depression: A Preliminary Magnetic Resonance Spectroscopy Study at 7T. Journal of Affective Disorders, 279, 417–425. https://doi.org/10.1016/j.jad.2020.10.011

      GABA is typically quantified using J-editing sequences at lower field strengths (~3T), and there is some evidence that the GABA signal can be reliably measured at 7T without editing. However, the authors should discuss potential limitations, such as the reliability of Glu and GABA measurements with short-TE semi-laser at 7T.

      In addition, MRS measurements of GABA are known to be influenced by macromolecules, and GABA is often denoted as GABA+ to indicate that other compounds contribute to the measured signal, especially at a short TE and in the absence of symmetric spectral editing.

      A general discussion of the strengths and limitations of unedited Glu and GABA quantification at 7T is warranted given the interest of this work to researchers who may not be experts in MRS.

      While we agree with the Reviewer that at 3T, it is recommended to use J-edited MRS to measure GABA (Mullins et al., 2014), the better spectral resolution at 7T allows for more reliable results for both metabolites using moderate echo-time, non-edited MRS (Finkelman et al., 2022). In this study, we used a short echo time (TE), which is optimal for Glu but not ideal for GABA, as it interferes with other signals. We are grateful to the Reviewer for suggesting the addition of a short paragraph to the Discussion, describing the practicalities of 3T and 7T MRS and changing the abbreviation to GABA+ to inform readers of possible macromolecule contamination:

      “We chose ultra-high-field MRS to improve data quality (Özütemiz et al., 2023), as the increased sensitivity and spectral resolution at 7T allows for better separation of overlapping metabolites compared to lower field strengths. Additionally, 7T provides a higher signal-to-noise ratio (SNR), improving the reliability of metabolite measurements and enabling the detection of small changes in Glu and GABA concentrations. Despite these theoretical advantages, several practical obstacles should be considered, such as susceptibility artifacts and inhomogeneities at higher field strengths that can impact data quality. Interestingly, actual methodological comparisons (Pradhan et al., 2015; Terpstra et al., 2016) show only a slight practical advantage of 7T single-voxel MRS compared to optimized 3T acquisition. For example, fitting quality yielded reduced estimates of variance in concentration of Glu in 7T (CRLB) and slightly improved reproducibility levels for Glu and GABA (at both fields below 5%). Choosing the appropriate MRS sequence involves a trade-off between the accuracy of Glu and GABA measurements, as different sequences are recommended for each metabolite. J-edited MRS is recommended for measuring GABA, particularly with 3T scanners (Mullins et al., 2014). However, at 7T, more reliable results can be obtained using moderate echo-time, non-edited MRS (Finkelman et al., 2022). We have opted for a short-echo-time sequence, which is optimal for measuring Glu. However, this approach results in macromolecule contamination of the GABA signal (referred to as GABA+).”

      Finkelman, T., Furman-Haran, E., Paz, R., & Tal, A. (2022). Quantifying the excitatory-inhibitory balance: A comparison of SemiLASER and MEGA-SemiLASER for simultaneously measuring GABA and glutamate at 7T. NeuroImage, 247, 118810. https://doi.org/10.1016/j.neuroimage.2021.118810

      Mullins, P. G., McGonigle, D. J., O'Gorman, R. L., Puts, N. A., Vidyasagar, R., Evans, C. J., Cardiff Symposium on MRS of GABA, & Edden, R. A. (2014). Current practice in the use of MEGA-PRESS spectroscopy for the detection of GABA. NeuroImage, 86, 43–52. https://doi.org/10.1016/j.neuroimage.2012.12.004

      Özütemiz, C., White, M., Elvendahl, W., Eryaman, Y., Marjańska, M., Metzger, G. J., Patriat, R., Kulesa, J., Harel, N., Watanabe, Y., Grant, A., Genovese, G., & Cayci, Z. (2023). Use of a Commercial 7-T MRI Scanner for Clinical Brain Imaging: Indications, Protocols, Challenges, and Solutions-A Single-Center Experience. AJR. American Journal of Roentgenology, 221(6), 788–804. https://doi.org/10.2214/AJR.23.29342

      Pradhan, S., Bonekamp, S., Gillen, J. S., Rowland, L. M., Wijtenburg, S. A., Edden, R. A., & Barker, P. B. (2015). Comparison of single voxel brain MRS AT 3T and 7T using 32-channel head coils. Magnetic Resonance Imaging, 33(8), 1013–1018. https://doi.org/10.1016/j.mri.2015.06.003

      Terpstra, M., Cheong, I., Lyu, T., Deelchand, D. K., Emir, U. E., Bednařík, P., Eberly, L. E., & Öz, G. (2016). Test-retest reproducibility of neurochemical profiles with short-echo, single-voxel MR spectroscopy at 3T and 7T. Magnetic Resonance in Medicine, 76(4), 1083–1091. https://doi.org/10.1002/mrm.26022

      Further, the single MRS voxel location is a limitation of the study as neurochemistry can vary regionally within individuals, and the putative excitatory/inhibitory imbalance in dyslexia may appear in regions outside the left temporal cortex (e.g., network-wide or in frontal regions involved in top-down executive processes). While the functional localization of the MRS voxel is a novelty and a potential advantage, it is unclear whether voxel placement based on left-lateralized reading-related neural activity may bias the experiment to be more sensitive to small, activity-related fluctuations in neurotransmitters in the CON group vs. the DYS group who may have developed an altered, compensatory reading strategy.

      We agree that including only one region of interest for the MRS measurements is a potential limitation of our study, and we have now added this information to the Discussion:

      “Moreover, since the MRS data was collected only from the left STS, it is plausible that other areas might be associated with differences in Glu or GABA concentrations in dyslexia.”

      However, differences in Glu and GABA concentrations in this region were directly predicted by the neural noise hypothesis of dyslexia. We acknowledge that this information was missing in the previous version of the manuscript. It is now included in the Results:

      “Moreover, the neural noise hypothesis of dyslexia identifies perisylvian areas as being affected by increased glutamatergic signaling, and directly predicts associations between Glu and GABA levels in the superior temporal regions and phonological skills (Hancock et al., 2017).”

      as well as in the Discussion:

      “Nevertheless, the neural noise hypothesis predicted increased glutamatergic signaling in perisylvian regions, specifically in the left superior temporal cortex (Hancock et al., 2017).”

      Figure 1 contains a lot of information, and it may be helpful to split it into 2 figures (EEG vs. MRS) so that the plots could be made larger and the reader could more easily digest the information.

      (a) I would also recommend displaying separate metabolite fit plots for each group, since the current presentation in panel F makes it appear that the MRS data is examined by testing differences between groups across the full spectrum (where the lines diverge), which really isn't the case.

      (b) The GABA peak is not visible in the spectrum, and Glutamate and GABA both have multiple peaks that should be shown on the spectrum. This may be best achieved by displaying the individual metabolite sub-spectra below the full spectrum

      Thank you for these suggestions. We have split the information into two Figures following the Reviewer’s recommendations.

      It is not clear why the 3T structural images were used for segmentation and calculation of tissue fraction if 7T structural images were also acquired (which would presumably have higher resolution).

      Generally, T1-weighted images from the 7T scanner exhibit more artifacts than those from the 3T scanner due to higher magnetic field inhomogeneity. These artifacts are especially pronounced in regions near air-tissue interfaces, such as the temporal lobes. Therefore, we chose the 3T structural images for segmentation and tissue fraction calculations and clarified this in the Method section:

      “Voxel segmentation was performed on structural images from a 3T scanner, coregistered to 7T structural images in SPM12, as the latter exhibited excessive artifacts and intensity bias in the temporal regions”.

      The basis set includes a large number of metabolites (27), including many low-concentration metabolites/compounds (e.g., bHG, bHB, Citrate, Threonine, ethanol) that are typically only included in studies targeting specific metabolites in disease/pathology. Please justify the inclusion of this maximal set of metabolites in the basis set, given that the inclusion of overlapping low-concentration metabolites may influence metabolite measurements of interest (https://doi.org/10.1002/mrm.10246).

      There is still no consensus in the MR community on which metabolites should be included in the model of human cerebral 1H-MR spectra. Typically, only major contributors such as NAA, Cr, Cho, Lac, mI, and possibly Glx are evaluated. Some studies also include additional metabolites like Ace, Ala, Asp, GABA, Glc, Gly, sI, NAAG, and Tau. In this study, as in a few others, further metabolites such as PCh, GPC, PCr, GSH, PE, and Thr were introduced and this approach seems suitable for high-field spectra (Hofmann et al., 2002).

      Hofmann, L., Slotboom, J., Jung, B., Maloca, P., Boesch, C., & Kreis, R. (2002). Quantitative 1H-magnetic resonance spectroscopy of human brain: Influence of composition and parameterization of the basis set in linear combination model-fitting. Magnetic Resonance in Medicine, 48(3), 440–453. https://doi.org/10.1002/mrm.10246

      Please provide a figure indicating the localization of the MRS voxel for a sample subject.

      A figure indicating the localization of the MRS voxel for a sample subject was added to the MRS checklist.

      It would be helpful to include Table S1 in the main article.

      Table S1 from the Supplementary Material has now been added to the main manuscript as Table 1 in the Results section.

      Please report descriptive statistics for EEG and MRS measures in Table S1.

      We have added a new Table S1 in the Supplementary Material, providing descriptive statistics for EEG and MRS E/I balance measures, presented separately for the dyslexic and control groups.

      I recommend avoiding using the terms "direct" and "indirect" to contrast MRS and EEG measures of E/I balance. Both of these measures are imperfect and it is misleading to say that MRS is a "direct" measure of neurotransmitters. There is also ambiguity in what is meant by "direct": in contrast to EEG, MRS does not measure neural activity and does not provide high-resolution temporal information, so in a sense, it is less direct.

      Thank you for this suggestion. We have replaced the terms 'direct' and 'indirect' biomarkers with 'MRS' and 'EEG' biomarkers throughout the text.

      There are many cases throughout the results in which Bayes and frequentist stats seem to contradict each other in terms of significance and what should be included in the models, especially with regard to the interaction effects (the Bayes factors appear to favor non-significant interactions). I think this is worth considering and describing to offer more clarity for the readers.

      We agree that a discussion of the divergent results between Bayesian and frequentist models was missing in the previous version of the manuscript. To provide greater clarity for the readers, we have conducted follow-up Bayesian t-tests in every case where the results indicated the inclusion of non-significant interactions with the effect of group in the model. These additional analyses have been performed for the exponent, offset, as well as for beta bandwidth in the Supplementary Material. We have also added a paragraph addressing these discrepancies in the Discussion:

      “Remarkably, in some models, results from Bayesian and frequentist statistics yielded divergent conclusions regarding the inclusion of non-significant effects. This was observed in more complex ANOVA models, whereas no such discrepancies appeared in t-tests or correlations. Given reports of high variability in Bayesian ANOVA estimates across repeated runs of the same analysis (Pfister, 2021), these results should be interpreted with caution. Therefore, following the recommendation to simplify complex models into Bayesian t-tests for more reliable estimates (Pfister, 2021), we conducted follow-up Bayesian t-tests in every case that favored the inclusion of non-significant interactions with the group factor. These analyses provided further evidence for the lack of differences between the dyslexic and control groups. Another source of discrepancy between the two methods may stem from the inclusion of interactions between covariates and within-subject effects in frequentist ANOVA, which were not included in Bayesian ANOVA to adhere to the recommendation for simpler Bayesian models (Pfister, 2021).”

      Pfister, R. (2021). Variability of Bayes factor estimates in Bayesian analysis of variance. The Quantitative Methods for Psychology, 17(1), 40-45. doi:10.20982/tqmp.17.1.p040

      It would be helpful to indicate whether participants in the DYS group had a history of reading intervention/remediation. In addition to showing that the DYS group performed lower than the CON group on reading assessments as a whole and given their age, was the performance on the reading assessments at an individual level considered for inclusion in the study? (i.e., were participants' persistent poor reading abilities confirmed with the research assessments?)

      We were unable to assess individual reading skills due to the lack of standardized diagnostic norms for adult dyslexia in Poland. Therefore, participants in the dyslexic group were recruited based on a previous clinical diagnosis of dyslexia, and reading and reading-related tasks were used for group-level comparisons only. This information has been added to the Methods section:

      “Since there are no standardized diagnostic norms for dyslexia in adults in Poland, individuals were assigned to the dyslexic group based on a past diagnosis of dyslexia.”

      Unfortunately, we did not collect information about participants' history of reading intervention or remediation. In this context, we acknowledge that including a sample of adult participants is a potential limitation of our study; this limitation is already noted in the Discussion.

      Regarding the fMRI task, please indicate whether the participants whose threshold and/or contrast was changed for localization were from the DYS or CON group.

      This information has been added to the Methods section:

      “For 6 participants (DYS n = 2, CON n = 4), the threshold was lowered to p < .05 uncorrected, while for another 6 participants (DYS n = 3, CON n = 3) the contrast from the auditory run was changed to auditory words versus fixation cross due to a lack of activation for other contrasts.”

      Reviewer #2 (Public Review):

      Summary:

      This study utilized two complementary techniques (EEG and 7T MRI/MRS) to directly test a theory of dyslexia: the neural noise hypothesis. The authors report finding no evidence to support an excitatory/inhibitory balance, as quantified by beta in EEG and Glutamate/GABA ratio in MRS. This is important work and speaks to one potential mechanism by which increased neural noise may occur in dyslexia.

      Strengths:

      This is a well-conceived study with in-depth analyses and publicly available data for independent review. The authors provide transparency with their statistics and display the raw data points along with the averages in figures for review and interpretation. The data suggest that an E/I balance issue may not underlie deficits in dyslexia and is a meaningful and needed test of a possible mechanism for increased neural noise.

      Weaknesses:

      The researchers did not include a visual print task in the EEG task, which limits analysis of reading-specific regions such as the visual word form area, which is a commonly hypoactivated region in dyslexia. This region is a common one of interest in dyslexia, yet the researchers measured the I/E balance in only one region of interest, specific to the language network.

      We agree with the Reviewer that including different tasks for the EEG biomarkers assessment would be valuable. However, this limitation was already addressed in the Discussion:

      “Importantly, our study focused on adolescents and young adults, and the EEG recordings were conducted during rest and a spoken language task. These factors may limit the generalizability of our results. Future research should include younger populations and incorporate a broader array of tasks, such as reading and phonological processing, to provide a more comprehensive evaluation of the E/I balance hypothesis.”

      Further, this work does not consider prior studies reporting neural inconsistency; a potential consequence of increased neural noise, which has been reported in several studies and linked with candidate-dyslexia gene variants (e.g., Centanni et al., 2018, 2022; Hornickel & Kraus, 2013; Neef et al., 2017). While E/I imbalance may not be a cause of increased neural noise, other potential mechanisms remain and should be discussed.

      Thank you for referring us to other works reporting neural variability in dyslexia. We agree that a broader context regarding sources of reduced neural synchronization, beyond E/I imbalance, was missing in the previous version of the manuscript. We have now included these references in the Discussion:

      “Furthermore, although our results do not support the idea of E/I balance alterations as a source of neural noise in dyslexia, they do not preclude other mechanisms leading to less synchronous neural firing posited by the hypothesis. In this context, there is evidence showing increased trial-to-trial inconsistency of neural responses in individuals with dyslexia (Centanni et al., 2022) or poor readers (Hornickel and Kraus, 2013) and its associations with specific dyslexia risk genes (Centanni et al., 2018; Neef et al., 2017). At the same time, the observed trial-to-trial inconsistency was either present only in a subset of participants (Centanni et al., 2018), limited to some experimental conditions (Centanni et al., 2022), or specific brain regions – e.g., brainstem in Hornickel and Kraus (2013), left auditory cortex in Centanni et al. (2018), or left supramarginal gyrus in Centanni et al. (2022).”

      A better description of the exponent and offset components is needed at the beginning of the results, given that the methods are presented in detail at the end. I also do not see a clear description of these components in the methods.

      A description of the aperiodic components is now included in the Results:

      “In the initial step of the analysis, we analyzed the aperiodic (exponent and offset) components of the EEG spectrum. The exponent reflects the steepness of the EEG power spectrum, with a higher exponent indicating a steeper signal; while the offset represents a uniform shift in power across frequencies, with a higher offset indicating greater power across the entire EEG spectrum (Donoghue et al., 2020).”

      as well as in the Materials and Methods:

      “Two broadband aperiodic parameters were extracted: the exponent, which quantifies the steepness of the EEG power spectrum, and the offset, which indicates the signal’s power across the entire frequency spectrum.”
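
      As a minimal illustration of these two parameters, the sketch below fits the aperiodic model log10(power) = offset - exponent * log10(frequency) to a synthetic spectrum with an ordinary least-squares line. This is not the fitting procedure used in the manuscript; the function name `fit_aperiodic`, the 1-43 Hz range, and the synthetic values are assumptions chosen only for illustration.

```python
import numpy as np

def fit_aperiodic(freqs, power):
    """Least-squares fit of log10(power) = offset - exponent * log10(freqs).

    Illustrative only: a plain straight-line fit in log-log space,
    with no modelling of oscillatory peaks.
    """
    log_f = np.log10(freqs)
    log_p = np.log10(power)
    slope, intercept = np.polyfit(log_f, log_p, 1)
    exponent = -slope      # steeper spectrum -> larger exponent
    offset = intercept     # uniform shift of log-power across frequencies
    return exponent, offset

# Synthetic, noise-free 1/f-like spectrum (hypothetical values).
freqs = np.arange(1.0, 44.0)
true_exponent, true_offset = 1.5, 2.0
power = 10.0 ** true_offset / freqs ** true_exponent

exponent, offset = fit_aperiodic(freqs, power)
```

      In this sketch a larger `exponent` corresponds to a steeper spectrum, and a larger `offset` shifts the whole spectrum upward, matching the verbal definitions above.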

      Reviewer #3 (Public Review):

      Summary:

      This study by Glica and colleagues utilized EEG (i.e., Beta power, Gamma power, and aperiodic activity) and 7T MRS (i.e., MRS IE ratio, IE balance) to reevaluate the neural noise hypothesis in Dyslexia. Supported by Bayesian statistics, their results show solid 'no evidence' of EI balance differences between groups, challenging the neural noise hypothesis. The work will be of broad interest to neuroscientists, and educational and clinical psychologists.

      Strengths:

      Combining EEG and 7T MRS, this study utilized both the indirect (i.e., Beta power, Gamma power, and aperiodic activity) and direct (i.e., MRS IE ratio, IE balance) measures to reevaluate the neural noise hypothesis in Dyslexia.

      Weaknesses:

      The authors may need to provide more data to assess the quality of the MRS data.

      We have addressed the Reviewer's specific recommendations below, providing more data on the quality of the MRS data.

      The authors may need to explain how the number of subjects is determined in the MRS section.

      We have clarified the MRS sample description in the Results section:

      “Due to financial and logistical constraints, 59 out of the 120 recruited subjects, selected progressively as the study unfolded, were examined with MRS. Subjects were matched by age and sex between the dyslexic and control groups. Due to technical issues and to prevent delays and discomfort for the participants, we collected 54 complete sessions. Additionally, four datasets were excluded based on our quality control criteria, and three GABA+ estimates exceeded the selected CRLB threshold. Ultimately, we report 50 estimates for Glu (21 participants with dyslexia) and 47 for GABA+ and Glu/GABA+ ratios (20 participants with dyslexia).”

      Is there a reason why theta and gamma peaks were not observed in the majority of participants? What are the possible reasons that likely caused the discrepancy between this study and previously reported relevant studies?

      We have now added a discussion about the absence of oscillatory peaks in the theta and gamma bands to the Discussion section:

      “We could not perform analyses for the gamma oscillations since in the majority of participants the gamma peak was not detected above the aperiodic component. Due to the 1/f properties of the EEG spectrum, both aperiodic and periodic components should be disentangled to analyze ‘true’ gamma oscillations; however, this approach is not typically recognized in electrophysiology research (Hudson and Jones, 2022). Indeed, previous studies that analyzed gamma activity in dyslexia (Babiloni et al., 2012; Lasnick et al., 2023; Rufener and Zaehle, 2021) did not separate the background aperiodic activity. For the same reason, we could not analyze results for the theta band, which often does not meet the criteria for an oscillatory component manifested as a peak in the power spectrum (Klimesch, 1999). Moreover, results from a study investigating developmental changes in both periodic and aperiodic components suggest that theta oscillations in older participants are mostly observed in frontal midline electrodes (Cellier et al., 2021), which were not analyzed in the current study.”

      Hudson, M. R., & Jones, N. C. (2022). Deciphering the code: Identifying true gamma neural oscillations. Experimental Neurology, 357, 114205. https://doi.org/10.1016/j.expneurol.2022.114205

      Klimesch, W. (1999). EEG alpha and theta oscillations reflect cognitive and memory performance: A review and analysis. Brain Research Reviews, 29(2-3), 169-195. https://doi.org/10.1016/S0165-0173(98)00056-3

      Based on Figure 1F, the quality of the MRS data may be contaminated by the lipid signal, especially for the DYS group. To better evaluate the MRS data, especially the GABA measurements, the authors need to show:

      (a) the placement of the MRS voxel on the anatomical images;

      Averaged MRS voxel placement was already presented in Figure 1 (now Figure 2) in the manuscript. Now, we have also added exemplary single-subject images to the MRS checklist in the Supplement.

      (b) Glu and GABA model functions

      We have now provided more meaningful Glu and GABA indications in Figure 2.

      (c) CRLB for GABA

      We have added respective estimates to the Supplement:

      %CRLB of Glu: mean = 2.96, SD = 0.79

      %CRLB of GABA: mean = 10.59, SD = 2.76

      %CRLB of NAA: mean = 1.76, SD = 0.46

      Further, the authors added voxel's gray matter volume as a covariate when performing separate ANCOVAs. The authors may need to use alpha correction or 1-fCSF correction to corroborate these results.

      We chose to use the ratio of Glu and GABA to total creatine (tCr), as this remains a common practice in MRS studies at 7T (e.g., Nandi et al., 2022; Smith et al., 2021). This decision was also influenced by previous dyslexia studies (Del Tufo et al., 2018; Pugh et al., 2014) and is now clarified in the Results and Methods sections.

      Regarding alpha correction, a recent paper (García-Pérez, 2023) recommends: 'In general, avoid corrections for multiple testing if statistical claims are to be made for each individual test, in the absence of an omnibus null hypothesis.' Since we report null findings, further alpha correction would not significantly impact the results.

      García-Pérez, M. A. (2023). Use and misuse of corrections for multiple testing. Methods in Psychology, 8, 100120. https://doi.org/10.1016/j.metip.2023.100120

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, Zhang et al. presented an electrophysiology method to identify the layers of macaque visual cortex with a high-density Neuropixels 1.0 electrode. They found several electrophysiology signal profiles for high-resolution laminar discrimination and described a set of signal metrics for fine cortical layer identification.

      Strengths:

      There are two major strengths. One is the use of high-density electrodes. The Neuropixels 1.0 probe has electrodes at 20 um spacing, which can provide high resolution for cortical laminar identification. The second strength is the analysis. They found multiple electrophysiology signal profiles which can be used for laminar discrimination. Using this new method, they could identify the thinnest layer in macaque V1. The data support their conclusion.

      Weaknesses:

      While this electrophysiology strategy is much easier to perform even in awake animals compared to histological staining methods, it provides an indirect estimation of cortical layers. A parallel histological study can provide a direct matching between the electrode signal features and cortical laminar locations. However, there are technical challenges; for example, distortions in both electrode penetration and tissue preparation may prevent a precise matching between electrode locations and cortical layers. In this case, additional microwire electrodes bound to the Neuropixels probe could be used to inject current and mark the locations of different depths in the cortical tissue after recording.

      While we agree that it would be helpful to adopt a more direct method for linking laminar changes observed with electrophysiology to anatomical layers observed in postmortem histology, we do not believe that the approach suggested by the reviewer would be particularly helpful. The approach suggested involves making lesions, which are known to be quite variable in size, asymmetric in shape, and do not have a predictable geometry relative to the location of the electrode tip. In contrast, our electrophysiology measures have identified clear boundaries which precisely match the known widths and relative positions of all the layers of V1, including layer 4A, which is only 50 microns thick, much smaller than the resolution of lesion methods.

      Reviewer #2 (Public Review):

      Summary:

      This paper documents an attempt to accurately determine the locations and boundaries of the anatomically and functionally defined layers in macaque primary visual cortex using voltage signals recorded from a high-density electrode array that spans the full depth of cortex with contacts at 20 um spacing. First, the authors attempt to use current source density (CSD) analysis to determine layer locations, but they report a striking failure because the results vary greatly from one electrode penetration to the next and because the spatial resolution of the underlying local field potential (LFP) signal is coarse compared to the electrical contact spacing. The authors thus turn to examining higher frequency signals related to action potentials and provide evidence that these signals reflect changes in neuronal size and packing density, response latency and visual selectivity.

      Strengths:

      There is a lot of nice data to look at in this paper that shows interesting quantities as a function of depth in V1. Bringing all of these together offers the reader a rich data set: CSD, action potential shape, response power and coherence spectrum, and post-stimulus time response traces. Furthermore, data are displayed as a function of eye (dominant or non-dominant) and for achromatic and cone-isolating stimuli.

      This paper takes a strong stand in pointing out weaknesses in the ability of CSD analysis to make consistent determinations about cortical layering in V1. Many researchers have found CSD to be problematic, and the observations here may be important to motivate other researchers to carry out rigorous comparisons and publish their results, even if they reflect negatively on the value of CSD analysis.

      The paper provides a thoughtful, practical and comprehensive recipe for assigning traditional cortical layers based on easily-computed metrics from electrophysiological recordings in V1, and this is likely to be useful for electrophysiologists who are now more frequently using high-density electrode arrays.

      Weaknesses:

      Much effort is spent pointing out features that are well known, for example, the latency difference associated with different retinogeniculate pathways, the activity level differences associated with input layers, and the action potential shape differences associated with white vs. gray matter. These have been used for decades as indicators of depth and location of recordings in visual cortex as electrodes were carefully advanced. High density electrodes allow this type of data to now be collected in parallel, but at discrete, regular sampling points. Rather than showing examples of what is already accepted, the emphasis should be placed on developing a rigorous analysis of how variable vs. reproducible are quantitative metrics of these features across penetrations, as a function of distance or functional domain, and from animal to animal. Ultimately, a more quantitative approach to the question of consistency is needed to assess the value of the methods proposed here.

      We thank the reviewer for suggesting the addition of quantitative metrics to allow more substantive comparisons between various measures within and between penetrations. We have added quantification and describe this in the context of more specific comments made by this reviewer. We have retained descriptions of metrics that are well established because they provide an important validation of our approaches and laminar assignments.

      Another important piece of information for assessing the ability to determine layers from spiking activity is to carry out post-mortem histological processing so that the layer determination made in this paper could be compared to anatomical layering.

      We are not aware of any approach that would provide such information at sufficient resolution. For example, it is well known that electrolytic lesions often do not match the locations expected from electrophysiological changes observed with single electrodes. As noted above, our observation that the laminar changes in electrophysiology precisely match the known widths and relative positions of all the layers of V1, including layer 4A, provides confidence in our laminar assignments.

      On line 162, the text states that there is a clear lack of consistency across penetrations, but why should there be consistency: how far apart in the cortex were the penetrations? How long were the electrodes allowed to settle before recording, how much damage was done to tissue during insertion? Do you have data taken over time - how consistent is the pattern across several hours, and how long was the time between the collection of the penetrations shown here?

      Answers to most of these questions can be found within the manuscript text. We have added text describing the distance between electrode penetrations (at least 1 mm, typically far more) and added a figure which shows a map of the penetration locations. The Methods section describes the electrode penetration methods used to minimize damage, as well as the settling times of penetrations. Data are provided regarding changes in recordings over time (see Methods, Drift Correction). The stimuli used to generate the data described are presented within a total of 30 minutes or less, minimizing any changes that might occur due to electrode drift. There is a minimum of 3 hours between different penetrations from the same animal.

      The impact of the paper is lessened because it emphasizes consistency but not in a consistent manner. Some demonstrations of consistency are shown for CSDs, but not quantified. Figure 4A is used to make a point about consistency in cell density, but across animals, whereas the previous text was pointing out inconsistency across penetrations. What if you took a 40 or 60 um column of tissue and computed cell density, then you would be comparing consistency across potentially similar scales. Overall, it is not clear how all of these different metrics compare quantitatively to each other in terms of consistency.

      As noted above, we have now added quantitative comparisons of consistency between different metrics. It is unclear why the reviewer felt that we use Figure 4A to describe consistency. That figure was a photograph from a previous publication simply showing the known differences in neuron density that are used to define layers in anatomical studies. This was intended to introduce the reader to known laminar differences. At any rate, we have been unable to contact the previous publishers of that work to obtain permission to use the figure. So we have removed that figure as it is unnecessary to illustrate the known differences in cell density that are used to define layers. We have kept the citation so that interested readers can refer to the publication.

      In many places, the text makes assertions that A is a consistent indicator of B, but then there appear to be clear counterexamples in the data shown in the figures. There is some sense that the reasoning is relying too much on examples, and not enough on statistical quantities.

      Without reference to specific examples we are not able to address this point.

      Overall

      Overall, this paper makes a solid argument in favor of using action potentials and stimulus driven responses, instead of CSD measurements, to assign cortical layers to electrode contacts in V1. It is nice to look at the data in this paper and to read the authors' highly educated interpretation and speculation about how useful such measurements will be in general to make layer assignments. It is easy to agree with much of what they say, and to hope that in the future there will be reliable, quantitative methods to make meaningful segmentations of neurons in terms of their differentiated roles in cortical computation. How much this will end up corresponding to the canonical layer numbering that has been used for many decades now remains unclear.

      Reviewer #3 (Public Review):

      Summary:

      Zhang et al. explored strategies for aligning electrophysiological recordings from high-density laminar electrode arrays (Neuropixels) with the pattern of lamination across cortical depth in macaque primary visual cortex (V1), with the goal of improving the spatial resolution of layer identification based on electrophysiological signals alone. The authors compare the current commonly used standard in the field - current source density (CSD) analysis - with a new set of measures largely derived from action potential (AP) frequency band signals. Individual AP band measures provide distinct cues about different landmarks or potential laminar boundaries, and together they are used to subdivide the spatial extent of array recordings into discrete layers, including the very thin layer 4A, a level of resolution unavailable when relying on CSD analysis alone for laminar identification. The authors compare the widths of the resulting subdivisions with previously reported anatomical measurements as evidence that layers have been accurately identified. This is a bit circular, given that they also use these anatomical measurements as guidelines limiting the boundary assignments; however, the strategy is overall sensible and the electrophysiological signatures used to identify layers are generally convincing. Furthermore, by varying the pattern of visual stimulation to target chromatically sensitive inputs known to be partially segregated by layer in V1, they show localized response patterns that lend confidence to their identification of particular sublayers.

      The authors compellingly demonstrate the insufficiency of CSD analysis for precisely identifying fine laminar structure, and in some cases its limited accuracy at identifying coarse structure. CSD analysis produced inconsistent results across array penetrations and across visual stimulus conditions and was not improved in spatial resolution by sampling at high density with Neuropixels probes. Instead, in order to generate a typical, informative pattern of current sources and sinks across layers, the LFP signals from the Neuropixels arrays required spatial smoothing or subsampling to approximately match the coarser (50-100 µm) spacing of other laminar arrays. Even with smoothing, the resulting CSDs in some cases predicted laminar boundaries that were inconsistent with boundaries estimated using other measures and/or unlikely given the typical sizes of individual layers in macaque V1. This point alone provides an important insight for others seeking to link their own laminar array recordings to cortical layers.
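
      For readers less familiar with the method, the sketch below shows the standard second-spatial-derivative estimate of CSD, CSD(z) ≈ -σ [φ(z+h) - 2φ(z) + φ(z-h)] / h², and how subsampling contacts coarsens the effective spacing h. It is not the authors' pipeline: the function name `csd_second_derivative`, the single-column contact layout, the unit conductivity, and the synthetic quadratic LFP are assumptions made only for illustration.

```python
import numpy as np

def csd_second_derivative(lfp, spacing_um, step=1, sigma=1.0):
    """Estimate CSD from LFP (contacts x time) as -sigma * d2(phi)/dz2.

    step > 1 keeps every `step`-th contact, so the effective spacing
    becomes spacing_um * step (e.g. 20 um * 5 = 100 um).
    sigma is a hypothetical conductivity in arbitrary units.
    """
    sub = np.asarray(lfp, dtype=float)[::step]
    h = spacing_um * step
    # Three-point second difference along the depth axis.
    second_diff = sub[2:] - 2.0 * sub[1:-1] + sub[:-2]
    return -sigma * second_diff / h ** 2

# Synthetic check: a quadratic potential has a constant second derivative (2),
# so the estimate should be -2 * sigma at every interior contact.
depth_um = np.arange(0, 2000, 20.0)               # 100 hypothetical contacts
lfp = (depth_um ** 2)[:, None] * np.ones((1, 5))  # contacts x 5 time samples

csd_fine = csd_second_derivative(lfp, spacing_um=20.0)            # 20 um spacing
csd_coarse = csd_second_derivative(lfp, spacing_um=20.0, step=5)  # 100 um spacing
```

      On real recordings the two spacings would generally give different profiles, because the second difference amplifies contact-to-contact noise at fine spacing; the synthetic example only verifies that the estimator itself is consistent across spacings.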

      They next offer a set of measures based on analysis of AP band signals. These measures include analyses of the density, average signal spread, and spike waveforms of single- and multi-units identified through spike sorting, as well as analyses of AP band power spectra and local coherence profiles across recording depth. The power spectrum measures in particular yield compact peaks at particular depths, albeit with some variation across penetrations, whereas the waveform measures most convincingly identified the layer 6-white matter transition. In general, some of the new measures yield inconsistent patterns across penetrations, and some of the authors' explanations of these analyses draw intriguing but rather speculative connections to properties of anatomy and/or responsivity. However, taken as a group, the set of AP band analyses appear sufficient to determine the layer 6-white matter transition with precision and to delineate intermediate transition points likely to correspond to actual layer boundaries.

      Strengths:

      The authors convincingly demonstrate the potential to resolve putative laminar boundaries using only electrophysiological recordings from Neuropixels arrays. This is particularly useful given that histological information is often unavailable for chronic recordings. They make a clear case that CSD analysis is insufficient to resolve the lamination pattern with the desired precision and offer a thoughtful set of alternative analyses, along with an order in which to consider multiple cues in order to facilitate others' adoption of the strategy. The widths of the resulting layers bear a sensible resemblance to the expected widths identified by prior anatomical measurements, and at least in some cases there are satisfying signatures of chromatic visual sensitivity and latency differences across layers that are predicted by the known connectivity of the corresponding layers. Thus, the proposed analytical toolkit appears to work well for macaque V1 and has strong potential to generalize to use in other cortical regions, though area-targeted selection of stimuli may be required.

      Weaknesses:

      The waveform measures, and in particular the unit density distribution, are likely to be sensitive to the criteria used for spike sorting, which differ widely among experimenters/groups, and this may limit the usefulness of this particular measure for others in the community. The analysis of detected unit density yields fluctuations across cortical depth which the authors attribute to variations in neural density across layers; however, these patterns seemed particularly variable across penetrations and did not consistently yield peaks at depths that should have high neuronal density, such as layer 2. Therefore, this measure has limited interpretability.

      While we agree that our electrophysiological measure of unit density does not strictly reflect anatomical neuronal density, we would like to remind the reader that we use this measure only to roughly estimate the correspondence between changes in density and likely layer assignments. We rely on other measures (e.g. AP power, AP power changes in response to visual stimuli) that have sharp borders and more clear transitions to assign laminar boundaries. Further, as noted in the reviewer’s list of strengths, the laminar assignments made with these measures are cross validated by differences in response latencies and sensitivity to different types of stimuli that are observed at different electrode depths.

      More generally, although the sizes of identified layers comport with typical sizes identified anatomically, a more powerful confirmation would be a direct per-penetration comparison with histologically identified boundaries. Ultimately, the absence of this type of independent confirmation limits the strength of their claim that veridical laminar boundaries can be identified from electrophysiological signals alone.

      As we have noted in response to similar comments from other reviewers, we are not aware of a method that would make this possible with sufficient resolution.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      The reviewers have indicated that their assessment would potentially be stronger if their advice for quantitative, statistically validated comparisons was followed, for example, to demonstrate variability or consistency of certain measures that are currently only asserted. Also, if available, some histological confirmation would be beneficial. It was requested that the use and modification of the layering from Balaram & Kaas is addressed, as well as dealing with inconsistencies in the scale bars on those figures. There are two figure permission issues that need to be resolved prior to publication: Balaram & Kaas 2014 in Fig 1A, Kelly & Hawken 2017 in Fig. 4A.

      Please see detailed responses to reviewer comments below. We have added new supplemental figures to quantitatively compare variability among metrics. As noted above, the suggested addition of data linking the electrophysiology directly to anatomical observations of laminar borders from the same electrode penetration is not feasible. The figure reused in Figure 1A is from an open-access (CC BY) publication (Balaram & Kaas 2014). After reexamining the figure in the original study, we found that the inferred scale bar would give an obviously inaccurate result, so we decided to remove the scale bar from Figure 1A. We have not received a reply from Springer Nature regarding permission for Figure 4A, so we decided to remove the reused figure (Kelly & Hawken 2017) from our article.

      Reviewer #1 (Recommendations For The Authors):

      Figure 4A has a different scale to Figure 4B-4F. It is better to add dashed lines to indicate the relationship between the cortical layers or overall range from Figure 4A to the corresponding layers in 4B to 4F.

      The reused figure in Figure 4A has been removed due to a permission issue. See also the comments above.

      Reviewer #2 (Recommendations For The Authors):

      General comments

      This paper demonstrates that voltage signals in frequency bands higher than those used for LFP/CSD analysis can be used from high-density electrical contact recording to generate a map of cortical layering in macaque V1 at a higher spatial resolution than previously attained.

      My main concern is that much of this paper seems to show that properties of voltage signals recorded by electrodes change with depth in V1. This of course is well known and has been mapped by many who have advanced a single electrode micron-by-micron through the cortex, listening and recording as they go. Figure 4 shows that spike shapes can give a clear indication of GM to WM borders, and this is certainly true and well known. Figures 5 and 6 show that activity level on electrodes can indicate layers related to LGN input, and this is known. Figure 7 shows that latencies vary with layer, and this is certainly true as we know. A main point seems to be that CSD is highly inconsistent. This is important to know if CSD is simply never going to be a good measure for layering in V1, but it would require quantification and statistics to make a fair comparison.

      We are glad that the reviewer recognizes that changes in electrical signals across layers are well known and are expected to have particular traits that change across layers. We do not claim to have discovered anything unexpected or unknown. Instead, we introduce quantitative measures that are sensitive to these known differences (historically, often just heard with an audio monitor, e.g. “LGN axon hash”). While the primary aim of this paper is not to show that Neuropixels probes can record voltage signal properties that could not previously be recorded with a single electrode, we would like to point out that multi-electrode arrays have a very different sampling bias and also allow comparisons of simultaneous recordings across contacts with known fixed distances between them. For example, our measure of “unit spread” could not be estimated with a single electrode.

      We’ve added Figure S3 to show a quantitative comparison of variation between CSD and AP metrics. These figures add support to our prior, more anecdotal descriptions showing that CSDs are inconsistent and lack the resolution needed to identify thin layers.

      Some things are not explained very clearly. Like achromatic regions, and eye dominance - these are not quantified, and we don't know if they are mutually consistent - are achromatic/chromatic the same when tested through separate eyes? How consistent are these basic definitions? How definitive are they?

      The quantitative definitions of achromatic region/COFD and eye dominance column can be found in our previous paper (Li et al., 2022), cited in this article. The main aim of this study is to develop a strategy for accurately identifying layers; the more detailed functional analysis will be described in future publications.

      Specific comments

      The abstract refers to CSD analysis and CSD signals. Can you be more precise - do you aim to say that LFP signals in certain frequency bands are already known to lack spatial localization, or are you claiming to be showing that LFP signals lack spatial resolution? A major point of the results appears to be lack of consistency of CSD, but I do not see that in the Abstract. The first sentence in the abstract appears to be questionable based on the results shown here for V1.

      We have updated the Abstract to minimize confusion and misunderstanding.

      Scale bar on Fig 1A implies that layers 2-5 are nearly 3 mm thick. Can you explain this thickness? Other figures here suggest layers 1-6 is less than 2 mm thick. Note, in a paper by the same authors (Balaram et al) the scale bar (100 um, Figure 4) on similar macaque tissue suggests that the cortex is much thinner than this. Perhaps neither is correct, but you should attempt to determine an approximately accurate scale. The text defines granular as Layer 4, but the scale bar in A implies layer 4 is 1 mm thick, but this does not match the ~0.5 mm thickness consistent with Figure 1E, F. The text states that L4A is less than 100 um thick, but the markings and scale bar in Figure 1A suggest that it could be more than 100 um thick.

      We thank the reviewer for pointing out that there are clearly errors in the scale bars used in these previously published figures from another group. In the original Figure 1 (Balaram & Kaas 2014), histological slices were all scaled to one of the samples (chimpanzee) without a scale bar. After reexamining the scale bar that we derived from Figure 2 of the original study, we found the same problem. Since the relative widths of layers are more important than their absolute widths in our study, we decided to remove the scale bar that we had derived and added to Figure 1A.

      Line 157. Fix "The most commonly visual stimulus"

      Text has been changed

      Line 161. Fix "through dominate eye"

      Text has been changed

      Line 166. Please specify if the methods established and validated below are histological, or tell something about their nature here.

      The Abstract and Introduction already described the nature of our methods

      Line 184. Text is mixing 'dominant' and 'dominate', the former is better.

      Text has been changed accordingly

      Line 188. Can you clarify "beyond the time before a new stimulus transition". Are you generally referring to the fact that neuronal responses outlast the time between changes in the stimulus?

      That is correct. We are referring to the fact that neuronal responses outlast the time between changes in the stimulus. We have edited the text for clarity.

      Line 196. Fix "dominate eye" in two places.

      Text has been changed

      Line 196. The text seems to imply it is striking to find different response patterns for the two eyes, but given the OD columns, why should this be surprising?

      Since we did not find a systematic comparison of CSD depth profiles for dominant/non-dominant eyes, or for black/white stimuli, in past studies, we simply describe what we saw in our data. The rationale for testing each eye is that LGN projections from the two eyes are known to remain segregated in the direct input layers of V1, so comparing CSDs from the two eyes could potentially help identify input layers, such as L4C. Here we provide evidence that CSD profiles from the two eyes deviate from naive expectations. For example, CSDs from black stimuli show less variation between the two eyes, whereas CSDs from white stimuli can range from similar profiles to drastically different ones across eyes.

      Line 198. Text like, "The most consistent..." is stating overall conclusions drawn by the authors before pointing the reader specifically to the evidence or the quantification that supports the statement.

      We’ve adjusted the text to point to Figure S2, where the depth profiles of all penetrations are visualized, and to a newly added Figure S3, where the coefficients of variation for several metric profiles are shown.
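
      As an illustration of the kind of comparison described here, the sketch below computes a coefficient of variation (standard deviation divided by mean) at each depth bin across penetrations. It is not the analysis code behind Figure S3: the function name `cv_profile`, the example values, and the assumption that all profiles are already aligned to a common normalized depth axis are hypothetical.

```python
from statistics import mean, stdev

def cv_profile(profiles):
    """Coefficient of variation per depth bin.

    profiles: one list per penetration, all aligned to the same depth bins
    (an assumption of this sketch). Returns stdev/mean for each bin.
    """
    return [stdev(column) / mean(column) for column in zip(*profiles)]

# Hypothetical metric values: two penetrations, two depth bins.
example = [[1.0, 2.0],
           [3.0, 2.0]]
cv = cv_profile(example)  # first bin varies across penetrations, second does not
```

      Note that this ratio is only well behaved for metrics with a mean clearly away from zero; for a signed quantity such as CSD, a different normalization would probably be needed.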

      Line 200. "white stimulus is more variable" - the text does not tell us where/how this is supported with quantitative analysis/statistics.

      We’ve adjusted the text to point to Figures S2 and S3.

      The metric in 4B is not explained, the text mentions the plot but the reader is unable to make any judgement without knowledge of the method, nor any estimate of error bars.

      The figure is first mentioned in the Unit Density section, and the text in that section already describes the definitions of neuron density and unit density. We’ve also modified the text to point to the Methods section for details.

      Line 236. The text states the peak corresponds to L4C, but does not explain how the layer lines were determined.

      As described early in the CSD section, all layer boundaries are determined following the guide, which lays out the strategy for drawing borders by combining all metrics.

      At Line 296 the spike metrics section ends without providing a clear quantification of how useful the metrics will be. It is clear that the GM to WM boundary can be identified, but that can be found with single electrodes as well, as neurophysiologists get to see/hear the change in waveform as the electrode is advanced in even finer spatial increments than the 20 um spacing of the contacts here.

      The aim of this study is to develop an approach for accurately delineating all layers simultaneously. The metrics we explored are estimates of well-known properties, so they can provide support for the correctness we hope to achieve. Here we first demonstrate their usefulness and later show the average across penetrations (Figure 9C-F). We are less concerned with quantifying how different factors affect the precision and consistency of these metrics, or how useful a single metric is, than with whether, as described in the guide section, we can delineate all layers given all metrics.

      Line 302-306. Why this statement is made here is unclear, it interrupts the flow for a reason that perhaps will be explained later.

      This statement notes the insensitivity of this measure to temporal differences, introducing the value of incorporating a measure of how AP power changes over time in the next section of the manuscript.

      Line 311. What is the reason to speculate about no canceling because of temporal overlap? Are you assuming a very sparse multi unit firing rate such that collisions do not happen?

      Here we describe a simple theoretical model in which spike waveforms only add without cancelling, so that the power would be proportional to the number of spikes. In reality, spike waveforms sometimes cancel, causing the theoretical relationship to deteriorate to some degree.
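
      The additive model can be illustrated with a toy calculation. The sketch below is not the generative model from the manuscript: the two-sample biphasic waveform, the trace length, and the function name `signal_power` are hypothetical, chosen only to show that power scales with spike count when waveforms do not overlap and falls short of that when they do.

```python
def signal_power(starts, waveform, n_samples):
    """Mean squared amplitude of a trace built by adding `waveform` at each start index."""
    trace = [0.0] * n_samples
    for s in starts:
        for k, w in enumerate(waveform):
            trace[s + k] += w
    return sum(v * v for v in trace) / n_samples

waveform = [1.0, -1.0]   # hypothetical biphasic spike, two samples long
n = 100                  # hypothetical trace length in samples

p1 = signal_power([0], waveform, n)               # one spike
p4 = signal_power([0, 10, 20, 30], waveform, n)   # four non-overlapping spikes
p_overlap = signal_power([0, 1], waveform, n)     # two spikes offset by one sample
```

      With non-overlapping spikes, `p4` is four times `p1`. In the overlapping case the negative phase of the first spike cancels the positive phase of the second, so `p_overlap` is less than twice `p1`.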

      Lines 327-346. There is a considerable amount of speculation and arguing based on particular examples and there is a lack of quantification. Neuron density is mentioned, but not firing rate. would responses from fewer neurons with higher firing rate not be similar to more neurons with lower firing rates?

      According to the theoretical model we described, power is proportional to the number of spikes, which in turn depends on both neuron density and firing rate. So fewer neurons with a higher firing rate would generate similar power to more neurons with a lower firing rate. We’ve expanded the explanation of the model and added Figure S4 showing the depth profile of firing rate. The text has also been adjusted to point to Figures S2 and S3 for quantitative comparisons of variability.

      Line 348 states there is a precise link between properties and cortical layers, but the manuscript has not, up to this point, shown how that link was determined or quantified it.

      Given our generative model of power and the similarity between the depth profile of firing rate and the depth profile of neuron density (Figure S4), the depth profile of power can be used to approximate the depth profile of neuron density, which is known to be closely correlated with cortical layering.

      Line 350. What is meant by "stochastic variability"?

      The text essentially says distances from electrode contact to nearby cell bodies were random, so closer cells have higher spike amplitudes and in turn result in higher power on a channel.

      The figures showing the two metrics, Pf and Cf, should be shown for the same data sets. The markings indicate that Fig 5 and Fig 6 show results from non-overlapping data sets. This does not build confidence about the results in the paper.

      Here we use typical profiles to demonstrate the characteristics of the power spectrum/coherence spectrum because of the variation across penetrations. We show later, in the guide section, all metrics for one penetration (another two cases in supplemental figures) and how to combine all metrics to derive layer delineations.

      Line 375 the statement is somewhat vague, "there are nevertheless sometimes cases where they can resolve uncertainties," can you please provide some quantitative support?

      We provided 3 examples in Figure 6, and more examples are shown in Figure 8, Figure S5, S6.

      Line 379. I believe the change you want to describe here is a change associated with a transition in the visual stimulus. It would be good to clarify this in the first several sentences here. Baseline can mean different things. I got the impression that your stimuli flip between states at a rate fast enough that signals do not really have time to return to a baseline.

      We rephrased the sentence to describe the metric more precisely. A pair of uniform colors flipping in 1.5 second intervals is usually long enough for spiking activities to decay to a saturated level.

      This section (379 - 398) continues a qualitative show-and-tell feel. There appears to be a lot of variability across the examples in Figure 7. How could you try to quantify this variability versus the variability in LFP? And, in this section overall, the text and figure legend don't really describe what the baseline is.

      Text adjustments are made to briefly describe the baseline window and point to the Method section where definitions are described in detail. We’ve added Figure S3 together with Figure S2 to address the variability across penetrations, stimuli, and metrics.

      Line 405 - 415. The discussion here does not consider that layers may not have well defined boundaries, the text gives the impression that there is some ultimate ground truth to which the metrics are being compared, but that may not be accurate.

      Except for a few layers/sublayers, such as L2, L3A, L3B, most layer boundaries of neocortex are well defined (Figure 1A) and histological staining of neurons/density and correlated changes in chemical content show very sharp transitions. The best of these staining methods is cytochrome oxidase, which shows sharp borders at the top and bottom of layer 4A, top and bottom of layer 4C, and the layer 5/6 border. There is also a sharp transition in neuronal cell body size and density at the top and bottom of layer 4Cb. The definition and delineation of all possible layers are constantly being refined, especially by accumulated knowledge of genetic markers of different cell types and connection patterns. In our study, we develop metrics to estimate well known anatomical and functional properties of different layers. We have also discussed layer boundaries that have been ambiguous to date and explained the reason and criteria to resolve them.

      Line 423. The text references Figure 1A in stating that relative thickness and position is crucial, but Figure 1A does not provide that information and does not explain how it might be determined, or how much of a consensus there is. Also, the text does not consider that the electrode may go through the cortex at oblique angles, and not the same angle in each layer, and the relative thickness may not be a dependable reference.

      There are numerous studies that describe criteria to delineate cortical layers, the referenced article (Balaram & Kaas 2014) is used here as an example. We are not aware of any publication that has systematically compared the relative thickness of layers across the V1 surface of a given animal or across animals. Nevertheless, it is clear from the literature that there is considerable similarity across animals. Accordingly, we cannot know what the source of variability in overall cortical thickness in our samples is, but we do see considerable consistency in the relative thickness of the layers we infer from our measures. We illustrate the differences that we see across penetrations and consider likely causes, such as the extent to which the coverslip pressing down on the cortex might differentially compress the cortex at different locations within the chamber.

      Deviation of the probe angle from the surface normal will not change the relative thickness of the layers, and the rigid linear probe is unlikely to bend in the cortex.
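
      The geometric argument can be sketched as follows, under the simplifying assumptions (ours, for illustration) that the layers are flat and parallel and that the probe is straight: every layer's apparent thickness along the probe is its true thickness divided by the cosine of the tilt angle, so the common factor cancels in the fractions. The layer thicknesses and the 30-degree tilt are hypothetical.

```python
import math

def apparent_thickness(true_thickness_um, tilt_deg):
    """Thickness of each layer measured along a straight probe tilted from the surface normal."""
    scale = 1.0 / math.cos(math.radians(tilt_deg))
    return [t * scale for t in true_thickness_um]

def fractions(thickness):
    """Each layer's share of the total thickness."""
    total = sum(thickness)
    return [t / total for t in thickness]

true_um = [100.0, 300.0, 600.0]                 # hypothetical layer thicknesses
tilted_um = apparent_thickness(true_um, 30.0)   # hypothetical 30-degree tilt

relative_true = fractions(true_um)
relative_tilted = fractions(tilted_um)
```

      The absolute values in `tilted_um` are larger than in `true_um`, while `relative_true` and `relative_tilted` agree. The equivalence would not hold if the angle between probe and layers changed from one layer to the next, for example in strongly curved cortex.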

      Line 433. The term "Coherence" is used, clarify if this is your Cf from Figure 6. The text states, "marked decrease at the bottom of layer 6". Please clarify this, I do not see that in Figure 6.

      Text has been adjusted.

      In Figure 6, the locations of the lines between L1 and 2 do not seem to be consistent with respect to the subtle changes in light blue shading, across all three examples, yet the text on line 436 states that there is a clear transition.

      We feel that the language used accurately reflects what is shown in the figure. While the transition is not sharp, it is clear that there is a transition. This transition is not used to define this laminar border. We have edited the text to clarify that the L1/2 border is better defined based on the change in AP power which shows a sharp transition (Figure 7). 

      The text states that the boundary is also "always clear" from metrics... and cites Figure 5, but I do not see that this boundary is clear for all three examples in Figure 5.

      Text has been adjusted.

      Line 438. The text states that "it is not unusual for unit density to fall to zero below the L1/2 border (Figure 8E)", but surprisingly, the line in Figure 8 E does not even cover the indicated boundary between L1 and L2.

      At this point, the number of statements in the text that do not clearly and precisely correlate to the data in the figures is worrisome, and I think you could lose the confidence of readers at this point.

      We do not see any inconsistency between what is stated in our text and what is noted by the reviewer. The termination of the blue line corresponds to the location where no units are detected. This is the location where “unit density falls to zero”. In this example, no units were resolved through spike sorting until ~100 µm beneath the L1/L2 boundary, which corresponds to a unit density of exactly zero (Figure 8E). That there are electrical signals in this region is clear from the AP power change (Figure 8C), which also shows the location of the L1/L2 border.

      Line 448. Text states that the 6A/B border is defined by a sharp boundary in AP power, but Figure 8A "AP power spectrum" does not show a sharp change at the A/B line. There is a peak in this metric in the middle to upper middle of 6A, but nothing so sharp to define a boundary between distinct layers, at least for penetration A2.

      Text has been adjusted.

      In Figure 8, the layer labels are not clear, whereas they are reasonably clear in the other figures.

      This is a technical problem: the vector graphics were not properly converted during PDF generation. We will upload high-quality vector graphics for each figure when we finalize the version of record.

      The text emphasizes differences in L4B and L4C with respect to average power and coherence, but the transition seems a bit gradual from layer 3B to 4C in some examples in Figure 6. And in Figure 5, A3, there doesn't appear to be any particular transition along the line between 4B and 4C.

      In this guide section, we pointed out early that some metrics are good for some boundaries and variation exists between penetrations. We’ve expanded text emphasizing the importance of timing differences in DP/P for differentiating sublayers in L4. Lastly, in case of several unresolvable boundaries given all the metrics, the prior knowledge of relative thickness should be used.

      Line 466 provides prescriptions in absolute linear distances, but this is unwise given that cortex may be crossed at oblique angles by electrodes, particularly for parts of V1 that are not on the surface of the brain. Other parts of the text have emphasized relative measurements.

      Text has been changed to use relative measurements.

      Line 507. The text says 9C and 4A are a good match, but the match does not look that good (4A has substantial dips at 0.5 and 0.75, and substantial peaks), and there is no quantification of fit. The error bars on 9C do not help show the variability across penetrations, they appear to be SEM, which shows that error bars get smaller as you average more data. It would seem more important to understand what is the variance in the density from one penetration to the next compared to the variance in density across layers.

      We have replaced “good match” with “roughly corresponds”. We note that we do not use unit density as a metric for identification of laminar borders and instead show that the expected locations of layers with higher neuronal density correspond to the locations where there are similar changes in unit density. It should be noted that Figure 9C is an average across many penetrations, so it should not be expected to show transitions as sharp as those in individual penetrations. Because of the figure permission issue, we have removed Figure 4A and changed the text accordingly.

      Figure 9C-F show a lot of variability in the individual curves (dim gray lines) compared to the overall average. Does this show that these metrics are not reliable indicators at the level of single penetration, but show some trends across larger averages?

      In the beginning of the guide, we emphasized that all metrics should be combined for individual penetration, because some metrics are only reliable for delineating certain layer boundaries and the quality of data for the various measures varies between penetrations. The penetration average serves the same purpose explained in the previous question as an indicator that our layer delineation was not far off.

      The discussion mentions improvements in layer identification made here. Did this work check the assignments for these penetration against assignments made based on some form of ground truth? Previous methods would advance electrodes steadily, and make lesions, and carry out histology. Is there any way to tell how this method would compare to that?

      Even electrolytic lesions do not necessarily reveal ground truth and can be quite misleading, and their resolution is limited by lesion size. Lesions are typically variable in size, asymmetric, and variable in shape and position relative to the location of the electrode tip, likely affected by the quality and location of electrical grounding and by variations in current flow due to the locations of blood vessels. A review of the published literature with electrode lesions shows that electrophysiological transitions are likely a far more accurate indicator of recording locations than post-mortem histology from electrolytic lesions. It is extremely rare for the locations of lesions to be precisely aligned to expected laminar transitions. See, for example, Chatterjee et al. (Nature 2004). Also see several manuscripts from the Shapley lab. The lone exception of which we are aware is Blasdel and Fitzpatrick (1984), in which consistently small and round lesions were produced, and even these would be too large (~100 microns) to accurately identify layers if it were not for the fact that the electrode penetrations were very long and tangential to the cortical layers.

      Reviewer #3 (Recommendations For The Authors):

      - The authors say (lines 360-362) that "Assuming spikes of a neuron spread to at least two adjacent recording channels, then the coherence between the two channels would be directly proportional to number of spikes, independent of spike amplitude." Has this been demonstrated? Very large amplitude spikes should show up on more channels than small amplitude spikes. Do waveform amplitudes and unit densities from the spike waveform analyses show consistent relationships to the power and/or coherence distributions over depth across penetrations?

      This part of the manuscript provides a theoretical rationale for what might be expected to affect the measures that we have derived. That is why we begin by stating that we are making an assumption. The answers to the reviewer’s questions are not known and have not been demonstrated. By beginning with this theoretical preface, we can point to cases where the data match these expectations as well as other cases where the data differ from the theoretical expectations.

      Coherence, by definition, is a normalized metric that is insensitive to amplitude. Spike amplitude mainly depends on how close the signal source is to the electrode, and spike spread mainly depends on cell body size and shape for a given distance to the electrode. Therefore, a very large spike amplitude could stem from a small cell very close to the electrode, but this would result in a small spike spread, especially for axonal spikes (Figure 4B, red spike). Spike amplitudes are, on average, higher in L4C, which matches the expectation that higher cell density results, on average, in cell bodies closer to the electrode (Figure S4A). Nonetheless, the high density of small cell bodies in L4C results in a small spike spread (Figure 9D).
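
      The amplitude insensitivity of coherence can be checked numerically. The sketch below uses the standard magnitude-squared coherence, estimated at a single DFT bin by averaging over non-overlapping segments; it is not the estimator used in the manuscript, and the signals, segment length, and bin index are hypothetical.

```python
import cmath
import math
import random

def dft_bin(segment, k):
    """Single DFT coefficient (bin k) of a real-valued segment."""
    n = len(segment)
    return sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(segment))

def coherence(x, y, seg_len, k):
    """Magnitude-squared coherence at DFT bin k, averaged over non-overlapping segments."""
    sxy, sxx, syy = 0j, 0.0, 0.0
    for start in range(0, len(x) - seg_len + 1, seg_len):
        X = dft_bin(x[start:start + seg_len], k)
        Y = dft_bin(y[start:start + seg_len], k)
        sxy += X * Y.conjugate()   # cross-spectrum
        sxx += abs(X) ** 2         # auto-spectrum of x
        syy += abs(Y) ** 2         # auto-spectrum of y
    return abs(sxy) ** 2 / (sxx * syy)

rng = random.Random(0)
shared = [rng.gauss(0, 1) for _ in range(512)]    # component common to both channels
x = [s + rng.gauss(0, 1) for s in shared]         # channel 1: shared + private noise
y = [s + rng.gauss(0, 1) for s in shared]         # channel 2: shared + private noise

c_same = coherence(x, y, 64, 5)
c_scaled = coherence(x, [10 * v for v in y], 64, 5)   # channel 2 amplified tenfold
```

      Scaling one channel multiplies the cross-spectrum and that channel's auto-spectrum by matching factors, so `c_same` and `c_scaled` agree up to floating-point rounding.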

      - I suggest clarifying what is defined as the baseline window for the ΔP/P measure - is it the entire 10-150 ms response window used for the power spectrum analysis?

      Text adjustments are made in the Methods where the time windows are defined at the beginning of the CSD section. Only temporal change metrics (ΔCSD and ΔP/P) use the baseline window ([-40, 10]ms). The other two spectrum metrics (Power and Coherence) use the response window ([10, 150]ms).
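
      For readers who want a concrete picture of how a baseline window enters such a metric, the following sketch computes a fractional power change relative to the mean power in the [-40, 10] ms window. The exact definition is given in the manuscript's Methods and is not reproduced here; the formula (P(t) - P_baseline) / P_baseline, the half-open treatment of the window, the function name, and the step-like example data are assumptions made for illustration.

```python
def delta_p_over_p(times_ms, power, baseline=(-40.0, 10.0)):
    """Fractional change in power relative to the mean power in the baseline window.

    The window is treated as half-open, [start, end), which is an assumption of this sketch.
    """
    in_baseline = [p for t, p in zip(times_ms, power) if baseline[0] <= t < baseline[1]]
    p0 = sum(in_baseline) / len(in_baseline)
    return [(p - p0) / p0 for p in power]

# Hypothetical power time course sampled every 10 ms: a step increase at 10 ms.
times = list(range(-40, 150, 10))
power = [2.0 if t < 10 else 3.0 for t in times]
dpp = delta_p_over_p(times, power)
```

      In this toy example the metric is zero throughout the baseline window and 0.5 afterwards, that is, a 50% increase over baseline.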

      - Firing rate differs by cell type and, on average, differs by layer in V1. Many layer 2/3 neurons, for example, have low maximum firing rates when driven with optimized achromatic grating stimuli. To the extent that the generative models explaining the sources of power and coherence signals rely on the assumption that firing rates are matched across cortical depth, these models may be inaccurate. This assumption is declared only subtly, and late in the paper, but it is relevant to earlier claims.

      Text adjustments have been made to explicitly describe the possibility that an uneven depth profile of firing rate could counteract the depth profile of neuron density, resulting in a distorted or even flat depth profile of power/coherence that deviates substantially from the depth profile of neuron density. In a newly added Figure S4, we first show the average firing rate profile during a set of stimuli (uniform color, static/drifting, achromatic/chromatic gratings), then specifically the PSTHs for the same stimuli shown in this study. It can be seen that layers receiving direct LGN inputs tend to fire at a higher rate (L4C, L6A). Firing rates in the PSTHs either roughly match across layers or are much higher in the densely packed layers. Therefore, the depth profile of firing rate reinforces rather than counteracts that of neuron density, enhancing the utility of the power/coherence profile for identifying correct layer boundaries.

      - Given the acute preparation used for recordings, I wonder whether tissue is available for histological evaluation. Although the layers identified are generally appropriate in relative size, it would be particularly compelling if the authors could demonstrate that the fraction of the cortical thickness occupied by each layer corresponded to the proportion occupied by that layer along the probe trajectory in histological sections. This would lend strength to the claim that these analyses can be used to identify layers in the absence of histology. Furthermore, variations in apparent cortical thickness could arise from different degrees of deviation from surface normal approach angles, which might be apparent by evaluation of histological material. I would add that variation in thickness on the scale shown in Fig. S4 is more likely to have an explanation of this kind.

      To serve other purposes unrelated to this study (identification of CO blobs), we cut the postmortem tissue in horizontal slices, so the suggested histological comparison cannot be made. The cortical thickness measured in this study is affected not only by the angle deviation from the surface normal but also by swelling and compression of the cortex. Nevertheless, evaluating the absolute thickness of cortex is not the main purpose of this study.

      Text and figure suggestions:

      - Fig 1A has been modified from Balaram & Kaas (2014) to revert to the Brodmann nomenclature scheme they argue against using in that paper; I wonder if they would object to this modification without explanation. Related, in the main text the authors initially refer to layers using Brodmann's labels with a secondary scheme (Hassler's) in parentheses and later drop the parenthetical labels; these conventions are not described or explained. Readers less familiar with the multiple nomenclature schemes for monkey V1 layers might be confused by the multiple labels without context, and could benefit from a brief description of the convention the authors have adopted.

      Throughout our article, we use only Brodmann’s naming convention because it has historically been adopted for Old World monkeys, which we use in our study, whereas Hassler’s naming convention is more commonly used for New World monkeys. Different naming conventions do not change our results, and it is beyond the scope of our study to discuss which nomenclature is more appropriate.

      - References to "dominate eye" throughout the text and figure legends should be replaced with "dominant eye."

      It has been changed throughout the article.

      - It is a bit odd to duplicate the same example in Fig. 2C and 2E. Perhaps a unique example would be a better use of the space.

      Here we first demonstrate the filtering effect, then compare profiles across different penetrations. The same example bridges the transition allowing side-by-side comparison.

      - The legend for Fig. 3 might be clearer if it simply listed the stimulus transitions for each column left to right, i.e. "black to white (non-dominant eye), white to black (non-dominant eye), black to white (dominant eye), ..."

      We feel that the icons are helpful. Here we want to show the stimulus colors directly to readers.

      - The misalignment between Fig. 4A vs. 4B-F, combined with the very small font size of the layer labels in Fig. 4B-F, make the visual comparison difficult. In Figs. 7 and 8, layer labels (and most labels in general) are much too small and/or low resolution to read easily. Overall, I would recommend increasing font size of labels in figures throughout the paper.

      The reused figure in Figure 4A has been removed due to a permission issue. Font sizes have been adjusted.

      - Line 591 "using of high-density probes" should be "using high-density probes"

      Text has been changed accordingly

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This study investigates what happens to the stimulus-driven responses of V4 neurons when an item is held in working memory. Monkeys are trained to perform memory-guided saccades: they must remember the location of a visual cue and then, after a delay, make an eye movement to the remembered location. In addition, a background stimulus (a grating) is presented that varies in contrast and orientation across trials. This stimulus serves to probe the V4 responses, is present throughout the trial, and is task-irrelevant. Using this design, the authors report memory-driven changes in the LFP power spectrum, changes in synchronization between the V4 spikes and the ongoing LFP, and no significant changes in firing rate.

      Strengths:

      (1) The logic of the experiment is nicely laid out.

      (2) The presentation is clear and concise.

      (3) The analyses are thorough, careful, and yield unambiguous results.

      (4) Together, the recording and inactivation data demonstrate quite convincingly that the signal stored in FEF is communicated to V4 and that, under the current experimental conditions, the impact from FEF manifests as variations in the timing of the stimulus-evoked V4 spikes and not in the intensity of the evoked activity (i.e., firing rate).

      Weaknesses:

      I think there are two limitations of the study that are important for evaluating the potential functional implications of the data. If these were acknowledged and discussed, it would be easier to situate these results in the broader context of the topic, and their importance would be conveyed more fairly and transparently.

      (1) While it may be true that no firing rate modulations were observed in this case, this may have been because the probe stimuli in the task were behaviorally irrelevant; if anything, they might have served as distracters to the monkey's actual task (the MGS). From this perspective, the lack of rate modulation could simply mean that the monkeys were successful in attending the relevant cue and shielding their performance from the potentially distracting effect of the background gratings. Had the visual probes been in some way behaviorally relevant and/or spatially localized (instead of full field), the data might have looked very different.

      Any task design involves tradeoffs; if the visual stimulus was behaviorally relevant, then any observed neurophysiological changes would be more confounded by possible attentional effects. We cannot exclude the possibility that a different task or different stimuli would produce different results; we ourselves have reported firing rate enhancements for other types of visual probes during an MGS task (Merrikhi et al. 2017). We have added an acknowledgement of these limitations in the discussion section (lines 311-319). At minimum, our results show a dissociation between the top-down modulation of phase coding, which is enhanced during WM even for these task-irrelevant stimuli, and rate coding. Establishing whether and how this phase coding is related to perception and behavior will be an important direction for future work.

      With this in mind, it would be prudent to dial down the tone of the conclusions, which stretch well beyond the current experimental conditions (see recommendations).

      We have edited the title (removing the word ‘primarily’) and key sentences throughout to tone down the conclusions, generally to state that the importance of a phase code in WM modulations is *possible* given the observed results, rather than certain (see abstract line 27, introduction lines 58-60, results line 215, conclusion lines 294-295).

      (2) Another point worth discussing is that although the FEF delay-period activity corresponds to a remembered location, it can also be interpreted as an attended location, or as a motor plan for the upcoming eye movement. These are overlapping constructs that are difficult to disentangle, but it would be important to mention them given prior studies of attentional or saccade-related modulation in V4. The firing rate modulations reported in some of those cases provide a stark contrast with the findings here, and I again suspect that the differences may be due at least in part to the differing experimental conditions, rather than a drastically different encoding mode or functional linkage between FEF and V4.

      We have added a paragraph to the discussion section addressing links to attention and motor planning (lines 301-322), and specifically acknowledging the inherent difficulties of fully dissociating these effects when interpreting our results (lines 311-319).

      Reviewer #2 (Public review):

      Summary:

      It is generally believed that higher-order areas in the prefrontal cortex guide selection during working memory and attention through signals that selectively recruit neuronal populations in sensory areas that encode the relevant feature. In this work, Parto-Dezfouli and colleagues tested how these prefrontal signals influence activity in visual area V4 using a spatial working memory task. They recorded neuronal activity from visual area V4 and found that information about visual features at the behaviorally relevant part of space during the memory period is carried in a spatially selective manner in the timing of spikes relative to a beta oscillation (phase coding) rather than in the average firing rate (rate code). The authors further tested whether there is a causal link between prefrontal input and the phase encoding of visual information during the memory period. They found that indeed inactivation of the frontal eye fields, a prefrontal area known to send spatial signals to V4, decreased beta oscillatory activity in V4 and information about the visual features. The authors went one step further to develop a neural model that replicated the experimental findings and suggested that changes in the average firing rate of individual neurons might be a result of small changes in the exact beta oscillation frequency within V4. These data provide important new insights into the possible mechanisms through which top-down signals can influence activity in hierarchically lower sensory areas and can therefore have a significant impact on the Systems, Cognitive, and Computational Neuroscience fields.

      Strengths:

      This is a well-written paper with a well-thought-out experimental design. The authors used a smart variation of the memory-guided saccade task to assess how information about the visual features of stimuli is encoded during the memory period. By using a grating of various contrasts and orientations as the background the authors ensured that bottom-up visual input would drive responses in visual area V4 in the delay period, something that is not commonly done in experimental settings in the same task. Moreover, one of the major strengths of the study is the use of different approaches including analysis of electrophysiological data using advanced computational methods of analysis, manipulation of activity through inactivation of the prefrontal cortex to establish causality of top-down signals on local activity signatures (beta oscillations, spike locking and information carried) as well as computational neuronal modeling. This has helped extend an observation into a possible mechanism well supported by the results.

      Weaknesses:

      Although the authors provide support for their conclusions from different approaches, I found that the selection of some of the analyses and statistical assessments made it harder for the reader to follow the comparison between a rate code and a phase code. Specifically, the authors wish to assess whether stimulus information is carried selectively for the relevant position through a firing rate or a phase code. Results for the rate code are shown in Figures 1B-G and for the phase code are shown in Figure 2. Whereas an F-statistic is shown over time in Figure 1F (and Figure S1) no such analysis is shown for LFP power. Similarly, following FEF inactivation there is no data on how that influences V4 firing rates and information carried by firing rates in the two conditions (for positions inside and outside the V4 RF). In the same vein, no data are shown on how the inactivation affects beta phase coding in the OUT condition.

      We plan to incorporate statistical analysis of this point in the revised version.

      Moreover, some of the statistical assessments could be carried out differently including all conditions to provide more insight into mechanisms. For example, a two-way ANOVA followed by post hoc tests could be employed to include comparisons across both spatial (IN, OUT) and visual feature conditions (see results in Figures 2D, S4, etc.). Figure 2D suggests that the absence of selectivity in the OUT condition (no significant difference between high and low contrast stimuli) is mainly due to an increase in slope in the OUT condition for the low contrast stimulus compared to that for the same stimulus in the IN condition. If this turns out to be true it would provide important information that the authors should address.

      We plan to incorporate statistical analysis of this point in the revised version.
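
      As an illustration of the analysis the reviewer suggests, the following is a minimal sketch of a balanced two-way ANOVA with location (IN, OUT) and contrast (high, low) as factors. It is not the authors' analysis: the function names, the cell means, the noise level, and the sample size are all hypothetical, and the synthetic data merely mimic the pattern the reviewer describes (selectivity in the IN condition, none in the OUT condition). Only F statistics are computed; p-values and post hoc tests would require an F distribution from a statistics library.

```python
import random
from statistics import mean


def two_way_anova(cells):
    """F statistics for a balanced two-way design.

    cells maps (location, contrast) -> list of observations.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    if any(len(v) != n for v in cells.values()):
        raise ValueError("balanced design required")
    na, nb = len(a_levels), len(b_levels)

    # Grand, marginal, and per-cell means.
    grand = mean(x for v in cells.values() for x in v)
    a_mean = {a: mean(x for b in b_levels for x in cells[(a, b)]) for a in a_levels}
    b_mean = {b: mean(x for a in a_levels for x in cells[(a, b)]) for b in b_levels}
    cell_mean = {k: mean(v) for k, v in cells.items()}

    # Sums of squares for the two main effects, the interaction, and the error.
    ss_a = n * nb * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = n * na * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = n * sum(
        (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
        for a in a_levels
        for b in b_levels
    )
    ss_err = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)
    ms_err = ss_err / (na * nb * (n - 1))

    return {
        "F_location": ss_a / (na - 1) / ms_err,
        "F_contrast": ss_b / (nb - 1) / ms_err,
        "F_interaction": ss_ab / ((na - 1) * (nb - 1)) / ms_err,
    }


def synthetic_cells(n=30, noise=0.2, seed=1):
    """Hypothetical data: contrast matters IN the RF but not OUT of it."""
    rng = random.Random(seed)
    true_mean = {
        ("IN", "high"): 1.0,
        ("IN", "low"): 0.0,
        ("OUT", "high"): 0.5,
        ("OUT", "low"): 0.5,
    }
    return {k: [rng.gauss(m, noise) for _ in range(n)] for k, m in true_mean.items()}
```

      In this toy design, a large interaction F relative to the location main effect is what would indicate that contrast selectivity differs between the IN and OUT conditions.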

      There are also a few conceptual gaps that leave the reader wondering whether the results and conclusion are general enough. Specifically,

      (1) the authors used microstimulation in the FEF to determine RFs. It is thus possible that the FEF sites that were inactivated were largely more motor-related. Given that beta oscillations and motor preparatory activity have been found to be correlated and motor sites show increased beta oscillatory activity in the delay period, it is possible that the effect of FEF inactivation on V4 beta oscillations is due to inactivation of the main source of beta activity. Had the authors inactivated sites with a preponderance of visual neurons in the FEF would the results be different?

      We do not believe this to be likely based on what is known anatomically and functionally about this circuitry. Anatomically, the projections from FEF to V4 arise primarily from the supragranular layers, not layers which contain the highest proportion of motor activity (Barone et al. 2000, Pouget et al. 2009, Markov et al. 2013). Functionally, based on electrical identification of V4-projecting FEF neurons, we know that FEF to V4 projections are predominantly characterized by delay rather than motor activity (Merrikhi et al. 2017). We have now tried to emphasize these points when we introduce the inactivation experiments (lines 180-182).

      Experimentally, the spread of the pharmacological effect with our infusion system is quite large relative to any clustering of visual vs. motor neurons within the FEF, with behavioral consequences of inactivation spreading to cover a substantial portion of the visual hemifield (e.g., Noudoost et al. 2014, Clark et al. 2014), and so our manipulation lacks the spatial resolution to selectively target motor vs. other FEF neurons.

      (2) Somewhat related to this point and given the prominence of low-frequency activity in deeper layers of the visual cortex according to some previous studies, it is not clear where the authors' V4 recordings were located. The authors report that they do have data from linear arrays, so it should be possible to address this.

      Unfortunately, our chamber placement for V4 has produced linear array penetration angles that do not reliably allow identification of cortical layers. We are aware of previous results showing layer-specific effects of attention in V4 (e.g., Pettine et al. 2019, Buffalo et al. 2011), and it would indeed be interesting to determine whether our observed WM-driven changes follow similar patterns. We may be able to analyze a subset of the data with current source density analysis to look for layer-specific effects in the future, but are not able to provide any information at this time.

      (3) The authors suggest that a change in the exact frequency of oscillation underlies the increase in firing rate for different stimulus features. However, the shift in frequency is prominent for contrast but not for orientation, something that raises questions about the general applicability of this observation for different visual features.

      We plan to incorporate statistical analysis of this point in the revised version.

      (4) One of the major points of the study is the primacy of the phase code over the rate code during the delay period. Specifically, here it is shown that information about the visual features of a stimulus carried by the rate code is similar for relevant and irrelevant locations during the delay period. This contrasts with what several studies have shown for attention in which case information carried in firing rates about stimuli in the attended location is enhanced relative to that for stimuli in the unattended location. If we are to understand how top-down signals work in cognitive functions it is inevitable to compare working memory with attention. The possible source of this difference is not clear and is not discussed. The reader is left wondering whether perhaps a different measure or analysis (e.g. a percent explained variance analysis) might reveal differences during the delay period for different visual features across the two spatial conditions.

      We have added discussion regarding the relationship of these results to previous findings during attention in the discussion section (lines 301-322).

      The use of the memory-guided saccade task has certain disadvantages in the context of this study. Although delay activity is interpreted as memory activity by the authors, it is in principle possible that it reflects preparation for the upcoming saccade, spatial attention (particularly since there is a stimulus in the RF), etc. This could potentially change the conclusion and perspective.

      We have added a new discussion paragraph addressing the relationship to attention and motor planning (lines 301-322). We have also moderated the language used to describe our conclusions throughout the manuscript in light of this ambiguity.

      For the position outside the V4 RF, there is a decrease in both beta oscillations and the clustering of spikes at a specific phase. It is therefore possible that the decrease in information about the stimuli features is a byproduct of the decrease in beta power and phase locking. Decreased oscillatory activity and phase locking can result in less reliable estimates of phase, which could decrease the mutual information estimates.

      We plan to incorporate statistical analysis of this point in the revised version.
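
      The reviewer's concern can be illustrated with a toy simulation; this is a sketch under stated assumptions, not the authors' method. Spike phases are drawn from a von Mises distribution whose mean depends on the stimulus, while the concentration parameter kappa stands in for the strength of phase locking. The preferred phases, spike count, and number of phase bins are hypothetical, and the mutual information is a simple plug-in estimate, which is biased upward for finite samples.

```python
import math
import random
from collections import Counter


def mutual_information(pairs):
    """Plug-in estimate of mutual information (bits) between two discrete variables."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    # Sum of p(x, y) * log2( p(x, y) / (p(x) * p(y)) ) over observed pairs.
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def simulate_phase_mi(kappa, n_spikes=4000, n_bins=8, seed=0):
    """MI between stimulus identity and binned spike phase for a given locking strength."""
    rng = random.Random(seed)
    # Hypothetical preferred phases for two stimuli, half a cycle apart.
    preferred = {0: math.pi / 2, 1: 3 * math.pi / 2}
    pairs = []
    for _ in range(n_spikes):
        stim = rng.randrange(2)
        # vonmisesvariate returns an angle in [0, 2*pi).
        phase = rng.vonmisesvariate(preferred[stim], kappa)
        phase_bin = int(phase / (2 * math.pi) * n_bins) % n_bins
        pairs.append((stim, phase_bin))
    return mutual_information(pairs)
```

      Comparing `simulate_phase_mi(4.0)` with `simulate_phase_mi(0.5)` shows that the estimate drops when locking is weaker even though the stimulus-dependent preferred phases are unchanged, which is why a control for phase-locking strength is relevant when interpreting the OUT condition.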

      The authors propose that coherent oscillations could be the mechanism through which the prefrontal cortex influences beta activity in V4. I assume they mean coherent oscillations between the prefrontal cortex and V4. Given that they do have simultaneous recordings from the two areas they could test this hypothesis on their own data, however, they do not provide any results on that.

      This paper only includes inactivation data. We are working on analyzing the simultaneous recording data for a future publication.

      The authors make a strong point about the relevance of changes in the oscillation frequency and how this may result in an increase in firing rate although it could also be the reverse - an increase in firing rate leading to an increase in the frequency peak. It is not clear at all how these changes in frequency could come about. A more nuanced discussion based on both experimental and modeling data is necessary to appreciate the source and role (if any) of this observation.

      As the reviewer notes, it is difficult to determine whether the frequency changes drive the rate changes, vice versa, or whether both are generated in parallel by a common source. We have adjusted our language to reflect this (lines 277-278). Future modeling work may be able to shed more light on the causal relationships between various neural signatures.

      Reviewer #3 (Public review):

      Summary:

      In this report, the authors test the necessity of prefrontal cortex (specifically, FEF) activity in driving changes in oscillatory power, spike rate, and spike timing of extrastriate visual cortex neurons during a visual-spatial working memory (WM) task. The authors recorded LFP and spikes in V4 while macaques remembered a single spatial location over a delay period during which task-irrelevant background gratings were displayed on the screen with varying orientation and contrast. V4 oscillations (in the beta range) scaled with WM maintenance, and the information encoded by spike timing relative to beta band LFP about the task-irrelevant background orientation depended on remembered location. They also compared recorded signals in V4 with and without muscimol inactivation of FEF, demonstrating the importance of FEF input for WM-induced changes in oscillatory amplitude, phase coding, and information encoded about background orientations. Finally, they built a network model that can account for some of these results. Together, these results show that FEF provides meaningful input to the visual cortex that is used to alter neural activity and that these signals can impact information coding of task-irrelevant information during a WM delay.

      Strengths:

      (1) Elegant and robust experiment that allows for clear tests for the necessity of FEF activity in WM-induced changes in V4 activity.

      (2) Comprehensive and broad analyses of interactions between LFP and spike timing provide compelling evidence for FEF-modulated phase coding of task-irrelevant stimuli at remembered location.

      (3) Convincing modeling efforts.

      Weaknesses:

      (1) 0% contrast background data (standard memory-guided saccade task) are not reported in the manuscript. While these data cannot be used to consider information content of spike rate/time about task-irrelevant background stimuli, this condition is still informative as a 'baseline' (and a more typical example of a WM task).

      We plan to incorporate statistical analysis of this point in the revised version.

      (2) Throughout the manuscript, the primary measurements of neural coding pertain to task-irrelevant stimuli (the orientation/contrast of the background, which is unrelated to the animal's task to remember a spatial location). The remembered location impacts the coding of these stimulus variables, but it's unclear how this relates to WM representations themselves.

      Indeed, here we have focused on how maintaining spatial WM impacts visual processing of incoming sensory information, rather than on how the spatial WM signal itself is represented and maintained. Behaviorally, this impact on visual signals could be related to the effects of the content of WM on perception and reaction times (e.g., Soto et al. 2008, Awh et al. 1998, Teng et al. 2019), but no such link to behavior is shown in our data.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review):

      Summary:

      The study by Nelson et al. is focused on formation of the Drosophila Posterior Signaling Center (PSC) which ultimately acts as a niche to support hematopoietic stem cells of the lymph gland (LG). Using a combination of genetics and live imaging, the authors show that PSC cells migrate as a tight collective and associate with multiple tissues during a trajectory that positions them at the posterior of the LG.

      This is an important study that identifies Slit-Robo signaling as a regulator of PSC morphogenesis, and highlights the complex relationship of interacting cell types - PSC, visceral mesoderm (VM) and cardioblasts (CBs) - in coordinated development of these three tissues during organ development. However, one point requiring clarification is the idea that PSC cells exhibit a collective cell migration; it is not clear that the cells are migrating rather than being pushed to a more dorsal position through dorsal closure and/or other similar large scale embryo movement. This does not detract from the very interesting analysis of PSC morphogenesis as presented.

      This Public Review by Reviewer #1 is identical to their original Public Review, thus we are unsure whether Reviewer #1 assessed the revised version of our manuscript, and whether they read our responses to their original Public Review. Below we summarize our original responses to the weaknesses listed for the first version of our manuscript.

      Strengths:

      • Using expression of Hid or Grim to ablate associated tissues, they find evidence that the VM and CB of the dorsal vessel affect PSC migration/morphology whereas the alary muscles do not. Slit is expressed by both VM and CBs, and therefore Slit-Robo signaling was investigated as PSCs express Robo.

      • Using a combination of approaches, the authors convincingly demonstrate that Slit expression in the CBs and VM acts to support PSC positioning. A strength is the ability to knockdown slit levels in particular tissue types using the Gal4 system and RNAi.

      • In the analysis of robo mutants, however, the PSC positioning phenotype is weaker in the individual mutants (robo1 and robo2), with only the double mutant (robo1,robo2) exhibiting a phenotype comparable to the slit RNAi. The authors make a reasonable argument that Slit-Robo signaling has an intrinsic effect, likely acting within PSCs, because PSCs show a phenotype even when CBs do not (Fig 4G).

      • New insight into dorsal vessel formation by VM is presented in Fig 4A,B, as loss of the VM can affect dorsal vessel morphogenesis. This result additionally points to the VM as important.

      Weaknesses:

      • The authors are cautioned to temper the result that Slit-Robo signaling is intrinsic to PSC since loss of robo may affect other cell types (besides CBs and PSCs) to indirectly affect PSC migration/morphogenesis. In fact, in the robo2, robo1 mutant, the VM appears to be incorrectly positioned (Fig. 4G).

      We maintain our conclusion, and we point out that the Reviewer stated, “The authors make a reasonable argument that Slit-Robo signaling has an intrinsic effect, likely acting within PSCs”. We have already added a statement to the Discussion reminding the reader of the possibility of secondary defects (“Finally, it is possible that PSC cells do not intrinsically require Robo activation, but rather CB-independent PSC mis-positioning in sli or robo mutants could be a secondary defect caused by compromised Slit-Robo signaling in some other tissue.”).

      • If possible, the authors should use RNAi to knockdown Robo1 and Robo2 levels specifically in the PSCs if a Gal4 is available; might Antp.Gal4 (Fig 1K) be useful? Even if knockdown is achieved in PSCs+CBs, this would be a better/complementary experiment to support the approach outlined in Fig 4D.

      As described in our first response, use of Antp-GAL4 with RNAi would be no better than a whole animal double Robo mutant.

      • Movies are hard to interpret, as it seems unclear that the PSCs actively migrate rather than being pushed/moved indirectly due to association with VM and CBs/dorsal vessel.

      The Vm does not directly contact the PSC, so it cannot be physically pushing the PSC. In their original review, Reviewer #3 expressed similar concerns (Weaknesses #1 and #2), and upon their review of our revised manuscript they determined that we had addressed these concerns.

      Reviewer #2 (Public review):

      The paper by Nelson KA, et al. explored the collective migration, coalescence and positioning of the posterior signaling center (PSC) cells in the Drosophila embryo. With live imaging, the authors observed the dynamic progress of PSC migration. Throughout this process, visceral mesoderm (VM), alary muscles (AMs) and cardioblasts (CBs) are in proximity to the PSC. Genetic ablation of these tissues reveals the requirement for VM and CBs, but not AMs, in this process. Genetic manipulations further demonstrated that Slit-Robo signaling was critical during PSC migration and positioning. While the genetic mechanisms of positioning the PSC were explored in much detail, including using live imaging, the functional consequence of mispositioning or (partial) absence of PSC cells has not been addressed, but would much increase the relevance of their findings. A few additional issues need to be addressed as well in this otherwise well-done study.

      Previous major points:

      (1) The only readout in their experiments is the relative correctness of PSC positioning. Importantly, what is the functional consequence if PSC is not properly positioned? This would be particularly important with robo-sli manipulations, where the PSC is present but some cells are misplaced. What is the consequence? Are the LGs affected, like specification of their cell types, structure and function? To address this for at least the robo-slit requirement in the PSC, it may be important to manipulate them directly in the PSC with a split Gal4 system, using Antp and Odd promoters.

      We state in our original response that exploring the functional consequences of PSC mis-positioning was outside the scope of this study. Given that the necessary cis-regulatory modules have not been identified at Antp or Odd, creating a split-GAL4 with ‘Antp and Odd promoters’ cannot be accomplished in a reasonable time frame, as we previously detailed in our original response.

      (2) The dense, parallel-aligned fibers in the lower part of Figure 1J seem to be visceral mesoderm, but those further up (dorsally) may be epidermis. Is it possible that the PSC migrates together with the epidermis? This should be addressed.

      This was directly addressed by the additional data included in our revision. When epidermal closure is stalled, the PSC is able to migrate past the stalled leading edge, closer to the midline.

      (3) Although the authors described the standards of assessing PSC positioning as "normal" or "abnormal", it is rather subtle at times and variable in the mutant or KD/OE examples. The criteria should be more clearly delineated and analyzed double-blind, also since this is the only readout. Further examples of abnormal positioning in supplementary figures would also help.

      We addressed this comment in detail in our original response. Briefly, double-blinding was oftentimes not possible due to the obviousness of the genotype in the image. The criteria we outline for normal PSC positioning are as comprehensive as possible given the subtlety and variability of mis-positioning phenotypes. Two of the authors independently analyzed the relatively large sets of samples and arrived at the same conclusions.

      (4) The Discussion is very lengthy and should be shortened.

      We shortened the Discussion in the revised version.

      Comments on revised version:

      Although the authors have responded to my concerns as they deemed suitable, these concerns still stand for the revised version.

      Given our responses above and the lack of detail in this comment, we are unsure why the Reviewer is still concerned.

      Reviewer #3 (Public review):

      Summary:

      This work is a detailed and thorough analysis of the morphogenesis of the posterior signaling center (PSC), a hematopoietic niche in the Drosophila larva. Live imaging is performed from the stage of PSC determination until the appearance of a compact lymph gland and PSC in the stage 16 embryo. This analysis is combined with genetic studies that clarify the involvement of adjacent tissue, including the visceral mesoderm, alary muscle, and cardioblasts/dorsal vessel. Lastly, the Slit/Robo signaling system is clearly implicated in the normal formation of the PSC.

      Strengths:

      The data are clearly presented and well documented, and fully support the conclusions drawn from the different experiments.

      The authors have addressed all of my previous comments, in particular concerning the role of epidermal cell rearrangements during dorsal closure as a possible force acting on the movement of PSC cells. The authors have clarified their definition of "collective migration" as it applies to the movement of PSC. The revised paper will make an important contribution to our understanding of the mechanisms driving morphogenesis.

      We are appreciative of the time spent by the Reviewer reading our responses and assessing the revision.

      ---------

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study by Nelson et al. is focused on the formation of the Drosophila Posterior Signaling Center (PSC) which ultimately acts as a niche to support hematopoietic stem cells of the lymph gland (LG). Using a combination of genetics and live imaging, the authors show that PSC cells migrate as a tight collective and associate with multiple tissues during a trajectory that positions them at the posterior of the LG.

      This is an important study that identifies Slit-Robo signaling as a regulator of PSC morphogenesis, and highlights the complex relationship of interacting cell types - PSC, visceral mesoderm (VM), and cardioblasts (CBs) - in the coordinated development of these three tissues during organ development. However, one point requiring clarification is the idea that PSC cells exhibit a collective cell migration; it is not clear that the cells are migrating rather than being pushed to a more dorsal position through dorsal closure and/or other similar large-scale embryo movement. This does not detract from the very interesting analysis of PSC morphogenesis as presented.

      Since each referee asked for clarification concerning collective cell migration, we present a combined response further below, placed after the comments from Reviewer #3.

      Strengths:

      (1) Using the expression of Hid or Grim to ablate associated tissues, they find evidence that the VM and CB of the dorsal vessel affect PSC migration/morphology whereas the alary muscles do not. Slit is expressed by both VM and CBs, and therefore Slit-Robo signaling was investigated as PSCs express Robo.

      (2) Using a combination of approaches, the authors convincingly demonstrate that Slit expression in the CBs and VM acts to support PSC positioning. A strength is the ability to knockdown slit levels in particular tissue types using the Gal4 system and RNAi.

      (3) In the analysis of robo mutants, however, the PSC positioning phenotype is weaker in the individual mutants (robo1 and robo2), with only the double mutant (robo1,robo2) exhibiting a phenotype comparable to the slit RNAi. The authors make a reasonable argument that Slit-Robo signaling has an intrinsic effect, likely acting within PSCs because PSCs show a phenotype even when CBs do not (Figure 4G).

      (4) New insight into dorsal vessel formation by VM is presented in Figure 4A, B, as loss of the VM can affect dorsal vessel morphogenesis. This result additionally points to the VM as important.

      Weaknesses:

      (1) The authors are cautioned to temper the result that Slit-Robo signaling is intrinsic to PSC since the loss of robo may affect other cell types (besides CBs and PSCs) to indirectly affect PSC migration/morphogenesis. In fact, in the robo2, robo1 mutant, the VM appears to be incorrectly positioned (Figure 4G).

      We have reexamined our wording in the relevant Results section and, given that this referee agrees that we, “make a reasonable argument that Slit-Robo signaling has an intrinsic effect, likely acting within PSCs because PSCs show a phenotype even when CBs do not (Figure 4G)”, it was not clear how we might temper our conclusions more. Given that PSC cells express Robo1 and Robo2, and that the Vm does not contact the PSC, our ‘reasonable argument’ appears fair and parsimonious. Since we agree with the referee that a reader should be made as aware as possible of alternatives, we will add a comment to the Discussion, reminding the reader of the possibility of a secondary defect.

      (2) If possible, the authors should use RNAi to knockdown Robo1 and Robo2 levels specifically in the PSCs if a Gal4 is available; might Antp.Gal4 (Fig 1K) be useful? Even if knockdown is achieved in PSCs+CBs, this would be a better/complementary experiment to support the approach outlined in Figure 4D.

      While we agree that simultaneous PSC-specific knockdown of Robo1 and Robo2 would be ideal, this is not possible. First, the most effective UAS-RNAi transgenes (that is, those in a Valium 20 backbone) are both integrated at the same chromosomal position; these cannot be simultaneously crossed with a GAL4 transgenic line to attempt double knockdown. Additionally, as with all RNAi approaches that must rely on efficient knockdown over the rapid embryonic period, even having facile access to the above does not ensure that the RNAi approach will cause depletion as effective as the genetic null condition that we use. Second, as the referee concedes, there is no embryonic PSC-specific GAL4. The proposed use of Antp-GAL4 would cause knockdown in many tissues (PSC, CB, Vm, epidermis and amnioserosa). This would lead to a reservation similar to that caused by our use of the straight genetic double mutant, as regards a potential indirect requirement for Robo function.

      (3) Movies are hard to interpret, as it seems unclear that the PSCs actively migrate rather than being pushed/moved indirectly due to association with VM and CBs/dorsal vessel.

      First, the Vm does not directly contact the PSC, so it cannot be pushing the PSC dorsally. We will re-examine our text to be certain to make this clear. Second, in our analysis of bin mutants, which lack Vm, LGs and PSCs are able to reach the dorsal midline region in the absence of Vm. Finally, please see our response to Reviewer #3, point 2, for why we maintain that PSC cells are “migrating” even though some PSC cells are attached to CBs.

      Reviewer #2 (Public Review):

      The paper by Nelson KA, et al. explored the collective migration, coalescence, and positioning of the posterior signaling center (PSC) cells in the Drosophila embryo. With live imaging, the authors observed the dynamic progress of PSC migration. Throughout this process, visceral mesoderm (VM), alary muscles (AMs), and cardioblasts (CBs) are in proximity to the PSC. Genetic ablation of these tissues reveals the requirement for VM and CBs, but not AMs, in this process. Genetic manipulations further demonstrated that Slit-Robo signaling was critical during PSC migration and positioning. While the genetic mechanisms of positioning the PSC were explored in much detail, including using live imaging, the functional consequence of mispositioning or (partial) absence of PSC cells has not been addressed, but would much increase the relevance of their findings. A few additional issues need to be addressed as well in this otherwise well-done study.

      Major points:

      (1) The only readout in their experiments is the relative correctness of PSC positioning. Importantly, what is the functional consequence if PSC is not properly positioned? This would be particularly important with robo-slit manipulations, where the PSC is present but some cells are misplaced. What is the consequence? Are the LGs affected, like the specification of their cell types, structure, and function? To address this for at least the robo-slit requirement in the PSC, it may be important to manipulate them directly in the PSC with a split Gal4 system, using Antp and Odd promoters.

      We agree that the functional consequence of PSC mis-positioning is important and a relevant question to eventually address. However, virtually all markers and reagents used to assess the effect of the PSC on progenitor cells and their differentiated descendants are restricted to analyses carried out on the third larval instar - some three days after the experiments reported here. Most of the manipulated conditions in our work are no longer viable at this phase and, thus, addressing the functional consequences of a malformed PSC will require the field to develop new tools. 

      As we noted in the Introduction, the consistency with which the wildtype PSC forms as a coalesced collective at the posterior of the LG strongly suggests importance of its specific positioning and shape, as has now been found for other niches (citations in manuscript). Additionally, in the Discussion we mention the existence of a gap junction-dependent calcium signaling network in the PSC that is important for progenitor maintenance. Without continuity of this network amongst all PSC cells (under conditions of PSC mis-positioning), we strongly anticipate that the balance of progenitors to differentiated hemocytes will be mis-managed, either constitutively, and / or under immune challenge conditions. 

      Finally, to our knowledge, the tools do not exist to build a “split Gal4 system using Antp and Odd promoters”. The expression pattern observed using the genomic Antp-GAL4 line must be driven by endogenous enhancers–none of which have been defined by the field, and thus cannot be used in constructing second order drivers. Similarly, for odd skipped, in the embryo the extant Odd-GAL4 driver expresses only in the epidermis, with no expression in the embryonic LG. Thus, the cis regulatory element controlling Odd expression in the embryonic LG is unknown. In the future, the discovery of an embryonic PSC-specific driver will aid in addressing the specific functional consequences of PSC mis-positioning.

      (2) The dense, parallel-aligned fibers in part of Figure 1J seem to be visceral mesoderm, but further up (dorsally) they may be epidermis. Is it possible that the PSC migrates together with the epidermis? This should be addressed.

      See response to Reviewer #3.

      (3) Although the authors described the standards of assessing PSC positioning as "normal" or "abnormal", it is rather subtle at times and variable in the mutant or KD/OE examples. The criteria should be more clearly delineated and analyzed double-blind, also since this is the only readout. Further examples of abnormal positioning in supplementary figures would also help.

      We appreciate the Reviewer’s concern and acknowledge that the phenotypes we observed were indeed variable, and, at times subtle. As we show and discuss in the paper, our results revealed that the signaling requirements for proper PSC positioning are complex; this was favorably commented upon by Reviewer #1 (“...highlights the complex relationship of interacting cell types - PSC, visceral mesoderm (VM), and cardioblasts (CBs) - in the coordinated development of these three tissues during organ development.…”). We suspect the phenotypic variability is attributable to any number of biological differences such as heterogeneity of PSC cells and an accompanying difference in the timing of their competence to receive and respond to Slit-Robo signaling, the timing of release of Slit from CBs and Vm, number of cells in a given PSC, which PSC cells in the cluster respond to too little or too much signaling, and/or typical variability between organisms. Furthermore, PSC positioning analyses were conducted by two of the authors, who independently came to the same conclusions. For many of the manipulations double blinding was not possible since the genotype of the embryo was discernible due to the obvious phenotype of the manipulated tissue.

      (4) The Discussion is very lengthy and should be shortened.

      We will re-examine the prose and emphasize more conciseness, while maintaining clarity for the reader.

      Reviewer #3 (Public Review):

      Summary:

      This work is a detailed and thorough analysis of the morphogenesis of the posterior signaling center (PSC), a hematopoietic niche in the Drosophila larva. Live imaging is performed from the stage of PSC determination until the appearance of a compact lymph gland and PSC in the stage 16 embryo. This analysis is combined with genetic studies that clarify the involvement of adjacent tissue, including the visceral mesoderm, alary muscle, and cardioblasts/dorsal vessels. Lastly, the Slit/Robo signaling system is clearly implicated in the normal formation of the PSC.

      Strengths:

      The data are clearly presented, well documented, and fully support the conclusions drawn from the different experiments. The manuscript differs in character from the mainstay of "big data" papers (for example, no sets of single-cell RNAseq data of, for instance, PSC cells with more or less Slit input, are offered), but what it lacks in this regard, it makes up in carefully planned and executed visualizations and genetic manipulations.

      Weaknesses:

      A few suggestions concerning improvement of the way the story is told and contextualized.

      (1) The minute cluster of PSC progenitors (5 or so cells per side) is embedded (as known before and shown nicely in this study) in other "migrating" cell pools, like the cardioblasts, pericardial cells, lymph gland progenitors, alary muscle progenitors. These all appear to move more or less synchronously. What should also be mentioned is another tissue, the dorsal epidermis, which also "moves" (better: stretches?) towards the dorsal midline during dorsal closure. Would it be reasonable to speculate (based on previously published data) that without the force of dorsal closure, operating in the epidermis, at least the lateral>medial component of the "migration" of the PSC (and neighboring tissues) would be missing? If dorsal closure is blocked, do essential components of PSC and lymph gland morphogenesis (except for the coming-together of the left and right halves) still occur? Are there any published data on this?

      Each of the Reviewers is interested in our response to this very relevant question, and, thus, we will address the issue en bloc here. First, we will add a Supplementary Figure showing that LG and CBs are still able to progress medially towards the dorsal midline when dorsal closure stalls. This rules out any major effect for the most prominent “large-scale embryo cell sheet movement” in positioning the PSC. Second, published work by Haack et al. and Balaghi et al. shows that CBs and leading edge epidermal cells are independently migratory, and we will add this context to the manuscript for the reader.

      (2) Along similar lines: the process of PSC formation is characterized as "migration". To be fair: the authors bring up the possibility that some of the phenotypes they observe could be "passive"/secondary: "Thus, it became important to test whether all PSC phenotypes might be 'passive', explained by PSC attachment to a malforming dorsal vessel. Alternatively, the PSC defects could reflect a requirement for Robo activation directly in PSC cells." And the issue is resolved satisfactorily. But more generally, "cell migration" implies active displacement (by cytoskeletal forces) of cells relative to a substrate or to their neighbors (like for example migration of hemocytes). This to me doesn't seem really clearly to happen here for the dorsal mesodermal structures. Couldn't one rather characterize the assembly of PSC, lymph gland, pericardial cells, and dorsal vessel in terms of differential adhesion, on top of a more general adhesion of cells to each other and the epidermis, and then dorsal closure as a driving force for cell displacement? The authors should bring in the published literature to provide a background that does (or does not) justify the term "migration".

      Before addressing this specifically, we remind readers of our response above that states the rationale ruling out large, embryo-scale movements, such as epidermal dorsal closure, in driving PSC positioning. So, how are PSC cells arriving at their reproducible position? This manuscript reports the first live-imaging of the PSC as it comes to be positioned in the embryo. We interpret these movies to suggest strongly that these cells are a ‘collective’ that migrates. Neither the data, nor we, are asserting that each PSC cell is ‘individually’ migrating to its final position. Rather, our data suggest that the PSC migrates as a collective. The most paradigmatic example of directed, collective cell migration, is of Drosophila ovarian border cells. That cell cluster is surrounded at all times by other cells (nurse cells, in that case), and for the collective to traverse through the tissue, the process requires constant remodeling of associations amongst the migrating cells in the collective (the border cells), as well as between cells in the collective and those outside of it (the nurse cells). In fact, the nurse cells are considered the substrate upon which border cells migrate. Note also that in collective border cell migration cells within the collective can switch neighbors, suggesting dynamic changes to cell associations and adhesions. 

      In our analysis, the PSC cells exhibit qualities reminiscent of the border cells, and thus we infer that the PSC constitutes a migratory cell collective.  We also show in Figure 1H that PSC cells exhibit cellular extensions, and thus have a very active, intrinsic actin-based cytoskeleton. In fact, in Figure 1I, we point out that PSC cells shift position within the collective, which is not only a direct feature of migration, but also occurs within the border cell collective as that collective migrates. Additionally, the fact that the lateral-most PSC cells shift position in the collective while remaining a part of the collective–and they do this while executing net directional movement–makes a strong argument that the PSC is migratory, as no cell types other than PSCs are contacting the surfaces of those shifting PSC cells. Lastly, the Reviewer’s supposition that, rather than migration, dorsal mesoderm structures form via “differential adhesion, on top of a more general adhesion of cells to each other” is, actually, precisely an inherent aspect of collective cell migration as summarized above for the ovarian border collective.

      In our resubmission we will adjust text citing the existing literature to better put into context the reasoning for why PSC formation based on our data is an example of collective cell migration.

      (3) That brings up the mechanistic centerpiece of this story, the Slit/Robo system. First: I suggest adding more detailed data from the study by Morin-Poulard et al 2016, in the Introduction, since these authors had already implicated Slit-Robo in PSC function and offered a concrete molecular mechanism: "vascular cells produce Slit that activates Robo receptors in the PSC. Robo activation controls proliferation and clustering of PSC cells by regulating Myc, and small GTPase and DE-cadherin activity, respectively". As stated in the Discussion: the mechanism of Slit/Robo action on the PSC in the embryo is likely different, since DE-cadherin is not expressed in the embryonic PSC; however, it may not be THAT different: it could also act on adhesion between PSC cells themselves and their neighbors. What are other adhesion proteins that appear in the late lateral mesodermal structures?

      Could DN-cadherin or Fasciclins be involved?

      We agree with the Reviewer that Slit-Robo signaling likely acts in part on the PSC by affecting PSC cell adhesion to each other and/or to CBs (lines 428-435). As stated in the Discussion, we do not observe Fasciclin III expression in the PSC until late stages when the PSC has already been positioned, suggesting that Fasciclin III is not an active player in PSC formation. Assessing whether the PSC expresses any other of the suite of potential cell adhesion molecules such as DN-Cadherin or other Fasciclins, and then studying their potential involvement in the Slit-Robo pathway in PSC cells, would be part of a follow-up study.

      Recommendations for the authors:

      Reviewing Editor Comments:

      The authors are encouraged to address several key issues and provide more explicit clarification when interpreting the behavior of the PSC cells as "migration." It is recommended that the authors engage with all reviewers' comments and refine the text based on the feedback they find valuable.

      Reviewer #1 (Recommendations For The Authors):

      Major concerns:

      (1) Is it possible to assay robo1 and/or robo2 RNAi in a tissue-specific manner to further explore an intrinsic role in the PSC? Might the VM indirectly affect PSCs in a CB-independent manner? How does this affect the interpretation of results in Figure 4?

      See also our response to Reviewer #1, Public review weaknesses #2.

      Though we agree with the Reviewer that this is the better experiment to test for an intrinsic role for Robo in the PSC, this experiment is not possible at this time. As we noted in the manuscript, we do not yet have an embryonic PSC-specific GAL4, though we have been putting efforts towards identifying/developing such a tool. The Antp-GAL4 driver we used in this study will drive not only in both PSCs and CBs, but also in Vm, epidermis, and amnioserosa, as well as other tissues. The other available embryonic PSC drivers are not specific to the PSC and will drive expression in CBs and Vm, at minimum. This, combined with the reality that RNAi can be ineffective in embryonic tissues, resulted in our use of whole organism mutants to best address this question. 

      We acknowledge that it is possible the Vm indirectly affects the PSC in a CB-independent manner in the double Robo mutant, and we added a statement to the Discussion reiterating this point. However, because the PSC expresses Robo1 and Robo2, we maintain that the simplest interpretation of the results in Figure 4 is that PSC cells require intrinsic Robo signaling. And, as we state in the manuscript, it is possible that Slit signals directly from Vm to Robo on the PSC.

      (2) As this is the first study to be presenting PSC formation as involving collective cell migration, can the authors provide experimental evidence and rationale for this categorization?

      We have added our rationale to the Results section in the revision.

      See also our response to Reviewer #3, Public review weakness #2.

      (3) The Slit staining presented in Fig 3 W', Z' should be quantified. Furthermore, what is the VM phenotype when Robo1 is overexpressed? Is there a VM-specific phenotype and could this indirect effect cause the PSC to misform/mismigrate?

      We didn’t quantify Slit levels in the Vm-specific Robo overexpression condition because there was a visually striking difference compared to controls (increased intensity and specific localization to Vm membranes), and the manipulation resulted in a PSC phenotype. Thus, the evidence we show appears sufficient to strongly suggest that our genetic manipulation resulted in successful trapping of Slit on the Vm.

      As to a Vm phenotype when Robo1 is overexpressed Vm-specifically: we know Vm is present, but we haven’t performed an in-depth phenotypic analysis. In the manuscript we show that this manipulation at least affects organization of PSC-adjacent CBs, which we go on to show is correlated with mis-positioned PSCs. Thus, the PSC phenotype in this condition is not solely due to a Vm-specific phenotype.

      Minor concerns/suggestions:

      (1) I might have missed it but where are the Movies referenced in the text? Are legends provided for the videos? It is important that this is included in the final version (or more clearly presented if I missed it).

      We thank the Reviewer for pointing this out; we now direct the reader to the movies at appropriate places within the text.

      (2) In Figure 5, it might be helpful to add a third column to A in which the PSCs are pseudo-colored and thus highlighted because it is difficult to discern the white (not pink) PSCs...

      We appreciate the suggestion and now include these panels as Figure 5A’’ in the revision.

      (3) If I am following correctly, the lost PSC cells in Figure 5 don't move. Doesn't this suggest that what is critical is that the PSCs attach to the VM and/or CBs, and not necessarily that they are an actively migrating cell type? They "move" but might be passively carried.

      See also the response to Reviewer #3, Public reviews weaknesses #2.

      The Reviewer is correct that the PSC cells in Fig. 5 don’t move very much, but we interpret this differently from the Reviewer. After detachment of the cells in question they undergo dramatic shape changes, indicating active cytoskeletal remodeling, so the molecular machinery needed for migration appears to remain intact. Thus, we suggest that this observation actually emphasizes our finding that collectivity is needed for the migration. Given the consistency of PSC coalescence/collectivity and the intricate regulation that controls it, we believe it to be an integral part of PSC identity. When PSC cells become detached, they likely lose an aspect of their identity. In various manipulations we’ve noted instances of severely dispersed PSC cells expressing very low levels of identity markers Antp or Odd. Cells in such cases are likely compromised for their function, and this can include, for example, whether they can properly sense cues for migration.

      Reviewer #2 (Recommendations For The Authors):

      Minor points:

      (1) The expression pattern of Antp-Gal4 > myrGFP in the whole embryo should be shown to better demonstrate the overlap with Odd. How does it compare with Antp-Gal4 > CD8::GFP?

      We do not understand the question posed. We are not suggesting that Antp and Odd overlap in all cells, nor even many cells. It has been demonstrated by the field that co-expression among mesodermal cells, in the position where LG cells are specified, is a marker for the PSC. We have not thoroughly investigated all reporter lines for the GAL4 drivers used by the field.

      (2) Does Tincdelta4-Gal4 not at all express in the PSC? This should be verified.

      This question appears to refer to depletion of Slit by RNAi or cell killing driven by tinCΔ4-GAL4. TinCΔ4-GAL4 is expressed in CBs and in precisely 1 embryonic PSC cell. First, Slit isn’t expressed by any PSC cells to our eye, so any PSC mis-positioning observed upon tinCΔ4>Sli RNAi implicates CB involvement in PSC positioning. In designing tests for CB involvement, we were unable to identify any mutant known to lack CBs (or have fewer CBs) that didn’t also affect specification of the LG/PSC. The cell killing approach seemed best.  It is possible that, in this scenario, perhaps ablation of a single, key PSC cell could affect final positioning of the other PSCs, but we think that less likely than a role for CBs. We also retain our original conclusion due to the fact that we often find mis-positioned PSC cells adjacent to mis-positioned CBs, including in the panel representing the CB ablation experiment, Figure 2S.  

      (3) Line 212: The data provide evidence that Vm is necessary, but clearly not sufficient, as CBs are also necessary.

      We see how this wording was misleading and have adjusted the text accordingly.

      (4) The CBs are not visible in Figure 3B.

      We are unsure what the Reviewer is referring to, as we are certain that the signal between the blue outlines is indeed Slit expression in CBs.

      Reviewer #3 (Recommendations For The Authors):

      One minor mistake (I believe): in line 229 it should say "3C and 3D"

      We have corrected this error.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews

      Reviewer #1 (Public Review): 

      (1) Although the theory is based on memory, it also is based on spatially-selective cells.

      Not all cells in the hippocampus fulfill the criteria of place/HD/border/grid cells, and play a role in memory. E.g., the Tonegawa and Buzsáki labs' work does not focus on only those cells, and there are certainly a lot of non-pure spatial cells in monkeys (Martinez-Trujillo) and humans (iEEG). Does the author mainly focus on saying that "spatial cells" are memory, but does not account for non-spatial memory cells? This seems to be an incomplete account of memory - which is fine, but the way the model is set up suggests that *all* memory is place (what/where) and non-spatial attributes ("grid") - but cells that don't fulfil these criteria in MTL (Diehl et al., 2017, Neuron; non-grid cells; Schaeffer et al., 2022, ICML; Luo et al., 2024, bioRxiv) certainly contribute to memory, and even navigation. This is also related to the question of whether these cell definitions matter at all (Luo et al., 2024). The authors note "However, this memory conjunction view of the MTL must be reconciled with the rodent electrophysiology finding that most cells in MTL appear to have receptive fields related to some aspect of spatial navigation (Boccara et al., 2010; Grieves & Jeffery, 2017). The paucity of non-spatial cells in MTL could be explained if grid cells have been mischaracterized as spatial." Is the author mainly talking about rodent work?

      There is a new section in the introduction that deals with these issues, titled ‘Why Model the Rodent Navigation Literature with a Memory Model?’ That section reads:

      “Spatial navigation is inherently a memory problem – learning the spatial arrangement of a new enclosure requires memory for the conjunction of what and where. This has long been realized and in the introduction to ‘Hippocampus as a Cognitive Map’, O’Keefe and Nadel (1978) wrote “We shall argue that the hippocampus is the core of a neural memory system providing an objective spatial framework within which the items and events of an organism's experience are located and interrelated” (emphasis added). Furthermore, in the last chapter of their book, they extended cognitive map theory to human memory for non-spatial characteristics. However, in the decades since the development of cognitive map theory, the rodent spatial navigation and human memory literatures have progressed somewhat independently.

      The ideas proposed in this model are an attempt to reunify these literatures by returning to the original claim that spatial navigation is inherently a memory problem. The goal of the current study is to explain the rodent spatial navigation literature using a memory model that has the potential to also explain the human memory literature. In contrast, most grid cell models (Bellmund et al., 2016; Bush et al., 2015; Castro & Aguiar, 2014; Hasselmo, 2009; Mhatre et al., 2012; Solstad et al., 2006; Sorscher et al., 2023; Stepanyuk, 2015; Widloski & Fiete, 2014) are domain specific models of spatial navigation and as such, they do not lend themselves to explanations of human memory. Thus, the reason to prefer this model is parsimony. Rather than needing to develop a theory of memory that is separate from a theory of spatial navigation, it might be possible to address both literatures with a unified account.

      This study does not attempt to falsify other theories of grid cells. Instead, this model reaches a radically different interpretation regarding the function of grid cells; an interpretation that emerges from viewing spatial navigation as a memory problem. All other grid cell models assume that an entorhinal grid cell displaying a spatially arranged grid of firing fields serves the function of spatial coding (i.e., spatial grid cells exist to support a spatial metric). In contrast, the proposed memory model of grid cells assumes that the hexagonal tiling reflects the need to keep memories separate from each other to minimize confusion and confabulation – the grid pattern is the byproduct of pattern separation between memories rather than the basis of a spatial code. 

      It is now understood that grid-like firing fields can occur for non-spatial two-dimensional spaces. For instance, human entorhinal cortex exhibits grid-like responses to video morph trajectories in a two-dimensional bird neck-length versus bird leg-length space (Constantinescu et al., 2016). As a general theory of learning and memory, the proposed memory model of grid cells is easily extended to explain these results (e.g., relabeling the border cell inputs in the model as neck-length and leg-length inputs). However, there are other grid cell models that can explain both spatial grid cells as well as non-spatial grid-like responses (Mok & Love, 2019; Rodríguez-Domínguez & Caplan, 2019; Stachenfeld et al., 2017; Wei et al., 2015). Similar to this memory model of grid cells, these models are also positioned to explain both the rodent spatial navigation and human memory literatures. Nevertheless, there is a key difference between this model and other grid cell models that generalize to non-spatial representations. Specifically, these other models assume that grid cells exhibiting spatial receptive fields serve the function of identifying positions in the environment (i.e., their function is spatial). As such, these models do not explain why most of the input to rodent hippocampus appears to be spatial (Boccara et al., 2010; Diehl et al., 2017; Grieves & Jeffery, 2017). This memory model of grid cells provides an answer to the apparent paucity of non-spatial cell types in rodent MTL by proposing that grid cells with spatial receptive fields have been misclassified as spatial (they are what cells rather than where cells) and that place cells are fundamentally memory cells that conjoin what and where.”

      (2) Related to the last point, how about non-grid multi-field mEC cells? In theory, these also should be the same; but the author only presents perfect-look grid cells. In empirical work, clearly, this is not the case, and many mEC cells are multi-field non-grid cells (Diehl et al., 2017). Does the model find these cells? Do they play a different role? As noted by the author "Because the non-spatial attributes are constant throughout the two-dimensional surface, this results in an array of discrete memory locations that are approximately hexagonal (as explained in the Model Methods, an "online" memory consolidation process employing pattern separation rapidly turns an approximately hexagonal array into one that is precisely hexagonal). " If they are indeed all precisely hexagonal, does that mean the model doesn't have non-grid spatial cells? 

      Grid cells with irregular firing fields are now considered in the Discussion, in the following paragraphs:

      “According to this model, hexagonally arranged grid cells should be the exception rather than the rule when considering more naturalistic environments. In a more ecologically valid situation, such as with landmarks, varied sounds, food sources, threats, and interactions with conspecifics, there may still be remembered locations where events occurred or remembered properties can be found, but because the non-spatial properties are non-uniform in the environment, the arrangement of memory feedback will be irregular, reflecting the varied nature of the environment. This may explain the finding that even in a situation where there are regular hexagonal grid cells, there are often irregular non-grid cells that have a reliable multi-location firing field, but the arrangement of the firing fields is irregular (Diehl et al., 2017). For instance, even when navigating in an enclosure that has uniform properties as dictated by experimental procedures, there may be other properties that were not well-controlled (e.g., a view of exterior lighting in some locations but not others), and these uncontrolled properties may produce an irregular grid (i.e., because the uncontrolled properties are reliably associated with some locations but not others, hippocampal memory feedback triggers retrieval of those properties in the associated locations).

      In this memory model, there are other situations in which an irregular but reliable multilocation grid may occur, even when everything is well controlled. In the reported simulations, when the hippocampal place cells were based on variation in X/Y (as defined by Border cells), nothing else changed as a function of location, and the model rapidly produced a precise hexagonal arrangement of hippocampal place cell memories. When head direction was included (i.e., real-world variation in X, Y, and head direction), the model still produced a hexagonal arrangement as per face-centered cubic packing of memories, but this precise arrangement was slower to emerge, with place cells continuing to shift their positions until the borders of the enclosure were sufficiently well learned from multiple viewpoints. If there is real-world variation in four or more dimensions, as is likely the case in a more ecologically valid situation, it will be even harder for place cell memories to settle on a precise regular lattice. Furthermore, in the case of four dimensions, mathematicians studying the “sphere packing problem” recently concluded that densest packing is irregular (Campos et al., 2023). This may explain why the multifield grid cells for freely flying bats have a systematic minimum distance between firing fields, but their arrangement is globally irregular (Ginosar et al., 2021). Assuming that the memories encoded by a bat include not just the three real-world dimensions of variation, but also head direction, the grid will likely be irregular even under optimal conditions of laboratory control.”
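      The packing argument in the quoted paragraphs can be made concrete with the standard packing densities for equal circles and spheres. The sketch below is only an illustration of why a hexagonal arrangement is the natural endpoint when memories are kept maximally separated in two dimensions, and why face-centered cubic packing plays the same role in three dimensions. It is not part of the model code, and the function name is hypothetical.

```python
import math

def packing_densities():
    """Return the fraction of space covered by equal circles or spheres
    in three classical regular arrangements (standard geometric results)."""
    return {
        # 2D: circles on a square lattice
        "square_2d": math.pi / 4,
        # 2D: circles on a hexagonal lattice (densest possible in the plane)
        "hexagonal_2d": math.pi / (2 * math.sqrt(3)),
        # 3D: spheres in face-centered cubic packing (densest possible in 3D)
        "fcc_3d": math.pi / (3 * math.sqrt(2)),
    }

densities = packing_densities()
for name, value in densities.items():
    print(f"{name}: {value:.4f}")
```

      The hexagonal lattice covers about 91% of the plane compared with about 79% for a square lattice, so for a fixed minimum separation between memories it fits more of them into the same enclosure.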

      (3) Theoretical reasons for why the model is put together this way, and why grid cells must be coding a non-spatial attribute: Is this account more data-driven (fits the data so formulated this way), or is it theoretical - there is a reason why place, border, grid cells are formulated to be like this. For example, is it an efficient way to code these variables? It can be both, like how the BVC model makes theoretical sense that you can use boundaries to determine a specific location (and so place cell), but also works (creates realistic place cells). 

      The motivation for this model is now articulated in the new section, quoted above, titled ‘Why Model the Rodent Navigation Literature with a Memory Model?’ Regarding the assumption that border cells provide a spatial metric, this assumption is made for the same reasons as in the BVC model. Regarding this, the text said: “These assumptions regarding border cells are based on the boundary vector cell (BVC) model of Barry et al. (2006). As in the BVC model, combinations of border cells encode where each memory occurred in the real-world X/Y plane.” A new sentence is added to model methods, stating: “This assumption is made because border cells provide an efficient representation of Euclidean space (e.g., if the animal knows how far it is from different walls of the enclosure, this already available information can be used to calculate location).”
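      As a minimal sketch of the point about border cells, the following shows how distances to the walls of a rectangular enclosure already determine a Euclidean position. The rectangular geometry, the south-west origin, and all function and parameter names are assumptions made for illustration; this is not the BVC model or the model's simulation code.

```python
import math

def locate_from_walls(d_west, d_east, d_south, d_north):
    """Recover position and enclosure size from distances to four walls.

    Assumes a rectangular enclosure with its origin at the south-west corner.
    Returns (x, y, width, height).
    """
    width = d_west + d_east      # opposing wall distances sum to the enclosure width
    height = d_south + d_north   # and likewise for the height
    return d_west, d_south, width, height

def distance_between(a, b):
    """Euclidean distance between two (x, y) positions."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Two hypothetical positions in a 1 x 1 enclosure, described only by wall distances.
x1, y1, _, _ = locate_from_walls(0.3, 0.7, 0.2, 0.8)
x2, y2, _, _ = locate_from_walls(0.6, 0.4, 0.6, 0.4)
print(distance_between((x1, y1), (x2, y2)))
```

      Because the wall distances directly give coordinates, any two remembered locations can be compared with an ordinary Euclidean distance, which is the sense in which the representation is efficient.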

      But in this case, the purpose of grid cell coding a non-spatial attribute, and having some kind of system where it doesn't fire at all locations seems a little arbitrary. If it's not encoding a spatial attribute, it doesn't have to have a spatial field. For example, it could fire in the whole arena - which some cells do (and don't pass the criteria of spatial cells as they are not spatially "selective" to another location, related to above).  

      Some cells have a constant high firing rate, but they are the exception rather than the rule. More typically, cells habituate in the presence of ongoing excitatory drive and by doing so become sensitive to fluctuations in excitatory drive. Habituation is advantageous both in terms of metabolic cost and in terms of function (i.e., sensitivity to change). This is now explained in the following paragraph:

      “In theory, a cell representing a non-spatial attribute found at all locations of an enclosure (aka, a grid cell in the context of this model), could fire constantly within the enclosure. However, in practice, cells habituate and rapidly reduce their firing rate by an order of magnitude when their preferred stimulus is presented without cessation (Abbott et al., 1997; Tsodyks & Markram, 1997). After habituation, the firing rate of the cell fluctuates with minor variation in the strength of the excitatory drive. In other words, habituation allows the cell to become sensitive to changes in the excitatory drive (Huber & O’Reilly, 2003). Thus, if there is stronger top-down memory feedback in some locations as compared to others, the cell will fire at a higher rate in those remembered locations rather than in all locations even though the attribute is found at all locations. In brief, when faced with constant excitatory drive, the cell accommodates and becomes sensitive to change in the magnitude of the excitatory drive. In the model simulation, this dynamic adaptation is captured by supposing that cells fire 5% of the time on average across the simulation, regardless of their excitatory inputs.”
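      One way to picture the 5% firing assumption is as a threshold placed at the top 5% of the cell's excitatory drive, so that firing tracks fluctuations in drive rather than its overall level. The sketch below is a hypothetical illustration under that assumption; the function name, the integer-valued drive, and the percentile rule are not taken from the model code.

```python
import random


def fires_five_percent(drive, target_rate=0.05):
    """Return a 0/1 firing sequence that is active on about target_rate of steps.

    Hypothetical stand-in for the model's adaptation rule.
    """
    n_active = max(1, round(target_rate * len(drive)))
    # Threshold is the n_active-th largest drive value, so only the
    # strongest fluctuations in excitatory drive produce firing.
    threshold = sorted(drive, reverse=True)[n_active - 1]
    return [1 if d >= threshold else 0 for d in drive]


random.seed(0)
# Distinct integer drive values avoid ties at the threshold.
weak = random.sample(range(10**6), 1000)
# Same fluctuations riding on a much larger constant excitatory drive.
strong = [d + 10**7 for d in weak]

weak_spikes = fires_five_percent(weak)
strong_spikes = fires_five_percent(strong)
```

      Under this rule, adding a large constant drive leaves the firing pattern unchanged: the cell fires at the same time steps in both cases, which is the sense in which it is sensitive to change rather than to absolute input.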

      (4) Why are grid cells given such a large role for encoding non-spatial attributes? If anything, shouldn't it be lateral EC or perirhinal cortex? Of course, they both could, but there is less reason to think this, at least for rodent mEC.  

      This is a good point and the following paragraph has been added to the introduction to explain that lateral EC is likely part of the explanation. But even when including lateral EC, it still appears that most of the input to hippocampus is spatial.

      “One possible answer to the apparent lack of non-spatial cells in MTL is to highlight the role of the lateral entorhinal cortex (LEC) as the source of non-spatial what information for memory encoding (Deshmukh & Knierim, 2011). LEC can be contrasted with mEC, which appears to only provide where information (Boccara et al., 2010a; Diehl et al., 2017). Although it is generally true that LEC is involved in non-spatial processing, there is evidence that LEC provides some forms of spatial information (Knierim et al., 2014). The kind of non-spatial information provided by LEC appears to be in relation to objects (Connor & Knierim, 2017; Wilson et al., 2013). However, in a typical rodent spatial navigation study there are no objects within the enclosure. Thus, although the distinction between mEC and LEC is likely part of the explanation, it is still the case that rodent entorhinal input to hippocampus appears to heavily favor spatial information.”

      (5) Clarification: why do place cells and grid cells differ in terms of stability in the model? Place cells are not stable initially but grid cells come out immediately. They seem directly connected so a bit unclear why; especially if place cell feedback leads to grid cell fields. There is an explanation in the text - based on grid cells coding the on-average memories, but these should be based on place cell inputs as well. So how is it that place fields are unstable then grid fields do not move at all? I wonder if a set of images or videos (gifs) showing the differences in spatial learning would be nice and clarify this point.  

      In this revision, I provide a new video focused on learning of place cell memories that include head direction. This second video is in relation to the results reported in Figure 9. The short answer is that the grid fields for the non-spatial cell are based on the average across several view-dependent memories (i.e., across several place cells that have head direction sensitivity) and the average is reliable even if the place cells are unstable. The text of this explanation now reads:

      “Why was the grid immediately apparent for the non-spatial attribute cell whereas the grid took considerable prior experience for the head direction cells? The answer relates to memory consolidation and the shifting nature of the hippocampal place cells. Head direction cells only produced a reliable grid once the hippocampal place cells (aka, memory cells) assumed stable locations. During the first few sessions, the hippocampal place cells were shifting their positions owing to pattern separation and consolidation. But once the place cells stabilized, they provided reliable top-down memory feedback to the head direction cells in some places but not others, thus producing a reliable grid arrangement to the firing maps of the head direction cells. In other words, for the head direction cells, the grid only appeared once the place cells stabilized. This slow stabilization of place fields is a known property (Bostock et al., 1991; Frank et al., 2004).

      In the simulation, the place cells did not stabilize until a sufficient number of place cells were created (Figure 9C). Specifically, these additional memories were located immediately outside the enclosure, around all borders (Figure 9D). These “outside the box” memories served to constrain the interior place cells, locking them in position despite ongoing consolidation. This dynamic can be seen in a movie showing a representative simulation. The movie shows the positions of the head direction sensitive place cells during initial learning, and then during additional sessions of prior experience as the movie speeds up (see link in Figure 9 caption).

      Why did the non-spatial grid cell (k) produce a grid immediately, before the place cells stabilized? As discussed in relation to Figure 8, the non-spatial grid cell is the projection through the 3D volume of real-world coordinates that includes X, Y, and head direction. Each grid field of a non-spatial grid cell reflects feedback from several place cells that each have a different head direction sensitivity (see for instance the allocentric pairs of memories illustrated in Figure 8C and 8D). Thus, each grid field is the average across several memories that entail different viewpoints and this averaging across memories provides stability even if the individual memories are not yet stable. This average of unstable memories produces a blurry sort of grid pattern without any prior experience.

      A final piece of the puzzle relies on the same mechanism that caused the grid pattern to align with the borders as reported in the results of Figures 6 and 7. Specifically, there are some “sticky” locations with ongoing consolidation because the connection weights are bounded. Because weights cannot go below their minimum or above their maximum, it is slightly more difficult for consolidation to push or pull connection weights over the peak value or under the minimum value of the tuning curve. Thus, the place cells tend to linger in locations that correspond to the peak or trough of a border cell. There are multiple peak and trough locations but for the parameter values in this simulation, the grid pattern seen in Figure 9C shows the set of peak/trough locations that satisfy the desired spacing between memories. Thus, the average across memories shows a reliable grid field at these locations even though the memories are unstable.”

      (6) Other predictions. Clearly, the model makes many interesting (and quite specific!) predictions. But does it make some known simple predictions? 

      • More place cells at rewarded (or more visited) locations. Some empirical researchers seem to think this is not as obvious as it seems (e.g., Duvelle et al., 2019; JoN; Nyberg et al., 2021, Neuron Review).  

      • Grid cell field moves toward reward (Butler et al., 2019; Boccara et al., 2019).  

      • Grid cells deform in trapezoid (Krupic et al., 2015) and change in environments like mazes (Derdikman et al., 2014).  

      Thank you for these suggestions and I have added the following paragraph to the discussion:

      “In terms of the animal’s internal state, all locations in the enclosure may be viewed as equally aversive and unrewarding, which is a memorable characteristic of the enclosure. Reward, or lack thereof, is arguably one of the most important non-spatial characteristics and application of this model to reward might explain the existence of goal-related activity in place cells (Hok et al., 2007; although see Duvelle et al., 2019), reflecting the need to remember rewarding locations for goal-directed behavior. Furthermore, if place cell memories for a rewarding location activate entorhinal grid cells, this may explain the finding that grid cells remap in an enclosure with a rewarded location such that firing fields are attracted to that location (Boccara et al., 2019; Butler et al., 2019). Studies that introduce reward into the enclosure are an important first step in terms of examining what happens to grid cells when the animal is placed in a more varied environment.”

      Regarding the changes in shape of the environment, this was discussed in the section of the paper that reads “As seen in Figure 12, because all but one of the place cells was exterior when the simulated animal was constrained to a narrow passage, the hippocampal place cell memories were no longer arranged in a hexagonal grid. This disruption of the grid array for narrow passages might explain the finding that the grid pattern (of grid cells) is disrupted in the thin corner of a trapezoid (Krupic et al., 2015) and disrupted when a previously open enclosure is converted to a hairpin maze by insertion of additional walls within the enclosure (Derdikman et al., 2009).” This particular section of the paper now appears in the Appendix and Figure 12 is now Appendix Figure 2.

      Reviewer #2 (Public Review): 

      The manuscript describes a new framework for thinking about the place and grid cell system in the hippocampus and entorhinal cortex in which these cells are fundamentally involved in supporting non-spatial information coding. If this framework were shown to be correct, it could have high impact because it would suggest a completely new way of thinking about the mammalian memory system in which this system is non-spatial. Although this idea is intriguing and thought-provoking, a very significant caveat is that the paper does not provide evidence that specifically supports its framework and rules out the alternate interpretations. Thus, although the work provides interesting new ideas, it leaves the reader with more questions than answers because it does not rule out any earlier ideas. 

      Basically, the strongest claim in the paper, that grid cells are inherently non-spatial, cannot be specifically evaluated versus existing frameworks on the basis of the evidence that is shown here. If, for example, the author had provided behavioral experiments showing that human memory encoding/retrieval performance shifts in relation to the predictions of the model following changes in the environment, it would have been potentially exciting because it could potentially support the author's reconceptualization of this system. But in its current form, the paper merely shows that a new type of model is capable of explaining the existing findings. There is not adequate data or results to show that the new model is a significantly better fit to the data compared to earlier models, which limits the impact of the work. In fact, there are some key data points in which the earlier models seem to better fit the data.  

      Overall, I would be more convinced that the findings from the paper are impactful if the author showed specific animal memory behavioral results that were only supported by their memory model but not by a purely spatial model. Perhaps the author could run new experiments to show that there are specific patterns of human or animal behavior that are only explained by their memory model and not by earlier models. But in its current form, I cannot rule out the existing frameworks and I believe some of the claims in this regard are overstated. 

      As previously detailed in Box 1 and as explained in the text in several places, the model provides an explanation of several findings that remain unexplained by other theories (see “Results Uniquely Explained by the Memory Model”). But more generally this is a good point, and the initial draft failed to fully articulate why a researcher might choose this model to guide future empirical investigations. A new section in the introduction, titled ‘Why Model the Rodent Navigation Literature with a Memory Model?’, deals with these issues. That section reads:

      “Spatial navigation is inherently a memory problem – learning the spatial arrangement of a new enclosure requires memory for the conjunction of what and where. This has long been realized and in the introduction to ‘Hippocampus as a Cognitive Map’, O’Keefe and Nadel (1978) wrote “We shall argue that the hippocampus is the core of a neural memory system providing an objective spatial framework within which the items and events of an organism's experience are located and interrelated” (emphasis added). Furthermore, in the last chapter of their book, they extended cognitive map theory to human memory for non-spatial characteristics. However, in the decades since the development of cognitive map theory, the rodent spatial navigation and human memory literatures have progressed somewhat independently.

      The ideas proposed in this model are an attempt to reunify these literatures by returning to the original claim that spatial navigation is inherently a memory problem. The goal of the current study is to explain the rodent spatial navigation literature using a memory model that has the potential to also explain the human memory literature. In contrast, most grid cell models (Bellmund et al., 2016; Bush et al., 2015; Castro & Aguiar, 2014; Hasselmo, 2009; Mhatre et al., 2012; Solstad et al., 2006; Sorscher et al., 2023; Stepanyuk, 2015; Widloski & Fiete, 2014) are domain specific models of spatial navigation and as such, they do not lend themselves to explanations of human memory. Thus, the reason to prefer this model is parsimony. Rather than needing to develop a theory of memory that is separate from a theory of spatial navigation, it might be possible to address both literatures with a unified account.

      This study does not attempt to falsify other theories of grid cells. Instead, this model reaches a radically different interpretation regarding the function of grid cells; an interpretation that emerges from viewing spatial navigation as a memory problem. All other grid cell models assume that an entorhinal grid cell displaying a spatially arranged grid of firing fields serves the function of spatial coding (i.e., spatial grid cells exist to support a spatial metric). In contrast, the proposed memory model of grid cells assumes that the hexagonal tiling reflects the need to keep memories separate from each other to minimize confusion and confabulation – the grid pattern is the byproduct of pattern separation between memories rather than the basis of a spatial code. 

      It is now understood that grid-like firing fields can occur for non-spatial two-dimensional spaces. For instance, human entorhinal cortex exhibits grid-like responses to video morph trajectories in a two-dimensional bird neck-length versus bird leg-length space (Constantinescu et al., 2016). As a general theory of learning and memory, the proposed memory model of grid cells is easily extended to explain these results (e.g., relabeling the border cell inputs in the model as neck-length and leg-length inputs). However, there are other grid cell models that can explain both spatial grid cells as well as non-spatial grid-like responses (Mok & Love, 2019; Rodríguez-Domínguez & Caplan, 2019; Stachenfeld et al., 2017; Wei et al., 2015). Similar to this memory model of grid cells, these models are also positioned to explain both the rodent spatial navigation and human memory literatures. Nevertheless, there is a key difference between this model and other grid cell models that generalize to non-spatial representations. Specifically, these other models assume that grid cells exhibiting spatial receptive fields serve the function of identifying positions in the environment (i.e., their function is spatial). As such, these models do not explain why most of the input to rodent hippocampus appears to be spatial (Boccara et al., 2010; Diehl et al., 2017; Grieves & Jeffery, 2017). This memory model of grid cells provides an answer to the apparent paucity of non-spatial cell types in rodent MTL by proposing that grid cells with spatial receptive fields have been misclassified as spatial (they are what cells rather than where cells) and that place cells are fundamentally memory cells that conjoin what and where.”

      - The paper does not fully take into account all the findings regarding grid cells, some of which very clearly show spatial processing in this system. For example, findings on grid-by-direction cells (e.g., Sargolini et al. 2006) would seem to suggest that the entorhinal grid system is very specifically spatial and related to path integration. Why would grid-by-direction cells be present and intertwined with grid cells in the author's memory-related reconceptualization? It seems to me that the existence of grid-by-direction cells is strong evidence that at least part of this network is specifically spatial.

      Grid-by-direction cells (i.e., head direction conjunctive grid cells) were a key part of the reported results. These grid cells naturally arise in the model as the animal forms memories (aka, hippocampal place cells) that conjoin location (as defined by border cells), head direction at the time of memory formation, and one or more non-spatial properties found at that location. In this revision, I have attempted to better explain how including head direction in hippocampal memories naturally gives rise to these cell types. The introduction to the head direction module simulations now reads:

      “According to this memory model of spatial navigation, place cells are the conjunction of location, as defined by border cells, and one or more properties that are remembered to exist at that location. Such memories could, for instance, allow an animal to remember the location of a food cache (Payne et al., 2021). The next set of simulations investigates behavior of the model when one of the to-be-remembered properties is head direction at the time when the memory was formed (e.g., the direction of a pathway leading to a food cache). Indicating that head direction is an important part of place cell representations, early work on place cells in mazes found strong sensitivity to head direction, such that the place field is found in one direction of travel but not the other (McNaughton et al., 1983; Muller et al., 1994). Place cells can exhibit a less extreme version of head direction sensitivity in open field recordings (Rubin et al., 2014), but the nature of the sensitivity is more complicated, depending on location of the animal relative to the place field center (Jercog et al., 2019).

      It is possible that some place cell memories do not receive head direction input, as was the case for the simulations reported in Figures 6/7 – in those simulations, place cells were entirely insensitive to head direction, owing to a lack of input from head direction cells. However, removal of head direction input to hippocampus affects place cell responses (Calton et al., 2003) and grid cell responses (Winter et al., 2015), suggesting that head direction is a key component of the circuit. Furthermore, if place cells represent episodic memories, it seems natural that they should include head direction (i.e., viewpoint at the time of memory formation).

      In the simulations reported next, head direction is simply another property that is conjoined in a hippocampal place cell memory. In this case, a head direction cell should become a head direction conjunctive grid cell (i.e., a grid cell, but only when the animal is heading in a particular direction), owing to memory feedback from the hexagonal array of hippocampal place cell memories. When including head direction, the real-world dimensions of variation are across three dimensions (X, Y, and head direction) rather than two, and consolidation will cause the place cells to arrange in a three-dimensional volume. The simulation reported below demonstrates that this situation provides a “grid module”.”

      - I am also concerned that the paper does not do enough to address findings regarding how the elliptical shape of grid fields shifts when boundaries of an environment compress in one direction or change shape/angles (Lever et al., & Krupic et al). Those studies show compression in grid fields based on boundary position, and I don't see how the authors' model would explain these findings.  

      This finding was covered in the original submission: “For instance, perhaps one egocentric/allocentric pair of mEC grid modules is based on head direction (viewpoint) in remembered positions relative to the enclosure borders whereas a different egocentric/allocentric pair is based on head direction in remembered positions relative to landmarks exterior to the enclosure. This might explain why a deformation of the enclosure (moving in one of the walls to form a rectangle rather than a square) caused some of the grid modules but not others to undergo a deformation of the grid pattern in response to the deformation of the enclosure wall (see also Barry et al., 2007). More specifically, if there is one set of non-orthogonal dimensions for enclosure borders and the movement of one wall is too modest to cause global remapping, this would deform the grid modules based on the enclosure border cells. At the same time, if other grid modules are based on exterior properties (e.g., perhaps border cells in relation to the experimental room rather than the enclosure), then those grid modules would be unperturbed by moving the enclosure wall.”

      I apologize for being unclear in describing how the model might explain this result. The paragraph has been rewritten and now reads:

      “Consider the possibility that one mEC grid module is based on head direction (viewpoint) in remembered positions relative to the enclosure borders (e.g., learning the properties of the enclosure, such as the metal surface) while a different grid module is based on head direction in remembered positions relative to landmarks exterior to the enclosure (e.g., learning the properties of the experimental room, such as the sound of electronics that the animal is subject to at all locations). This might explain why a deformation of the enclosure (moving one of the walls to form a rectangle rather than a square) caused some of the grid modules but not others to undergo a deformation of the grid pattern in response to the deformation of the enclosure wall (see also Barry et al., 2007). More specifically, suppose that the movement of one wall is modest and after moving the wall, the animal views the enclosure as being the same enclosure, albeit slightly modified (e.g., when a home is partially renovated, it is still considered the same home). In this case, the set of non-orthogonal dimensions associated with enclosure borders would still be associated with the now-changed borders and any memories in reference to this border-determined space would adjust their positions accordingly in real-world coordinates (i.e., the place cells would subtly shift their positions owing to this deformation of the borders, producing a corresponding deformation of the grid). At the same time, there may be other sets of memories that are in relation to dimensions exterior to the enclosure. Because these exterior properties are unchanged, any place cells and grid cells associated with the exterior-oriented memories would be unchanged by moving the enclosure wall.”

      - Are findings regarding speed modulation of grid cells problematic for the paper's memory results? 

      - A further issue is that the paper does not seem to adequately address developmental findings related to the timecourses of the emergence of different cell types. In their simulation, researchers demonstrate the immediate emergence of grid fields in a novel environment, while noting that the stabilization of place cell positions takes time. However, these simulation findings contradict previous empirical developmental studies (Langston et al., 2010). Those studies showed that head direction cells show the earliest development of spatial response, followed by the appearance of place cells at a similar developmental stage. In contrast, grid cells emerge later in this developmental sequence. The gradual improvement in spatial stability in firing patterns likely plays a crucial role in the developmental trajectory of grid cells. Contrary to the model simulation, grid cells emerge later than place cells and head direction cells, yet they also hold significance in spatial mapping. 

      - The model simulations suggest that certain grid patterns are acquired more gradually than others. For instance, egocentric grid cells require the stabilization of place cell memories amidst ongoing consolidation, while allocentric grid cells tend to reflect average place field positions. However, these findings seemingly conflict with empirical studies, particularly those on the conjunctive representation of distance and direction in the earliest grid cells. Previous studies show no significant differences were found in grid cells and grid cells with directional correlates across these age groups, relative to adults (Wills et al., 2012). This indicates that the combined representation of distance and direction in single mEC cells is present from the earliest ages at which grid cells emerge. 

      These are good points and they have been addressed in a new section of the introduction titled ‘The Scope of the Proposed Model’. That section reads:

      “The reported simulations explain why most mEC cell types in the rodent literature appear to be spatial (Boccara et al., 2010; Diehl et al., 2017; Grieves & Jeffery, 2017). Assuming that rodents can form non-spatial memories, rodent hippocampus must receive non-spatial input from entorhinal cortex. These simulations suggest that characterization of the rodent mEC as primarily spatial might be incorrect if most grid cells (except perhaps head direction conjunctive grid cells) have been mischaracterized as spatial. Other literatures with other species find non-spatial representations in MTL (Gulli et al., 2020; Quiroga et al., 2005; Wixted et al., 2014) and non-spatial hippocampal memory encoding has been found in rodents (Liu et al., 2012; McEchron & Disterhoft, 1999). The proposed memory model is compatible with these results – the ideas contained in this model could be applied to non-spatial memory representations. However, surveys of cell types in rodent entorhinal cortex seem to indicate that most cells are spatial (Boccara et al., 2010; Diehl et al., 2017; Grieves & Jeffery, 2017). How can the rodent hippocampus encode non-spatial memories if most of its input is spatial? The goal of the reported simulations is to explain the apparent paucity of non-spatial cells in rodent entorhinal cortex by proposing that grid cells have been misclassified as spatial (see also Luo et al., 2024).

      Given the simplicity of the proposed model, there are important findings that the model cannot address -- it is not that the model makes the wrong predictions but rather that it makes no predictions. The role of running speed (Kraus et al., 2015) is one such variable for which the model makes no predictions. Similarly, because the model is a rate-coded model rather than a model of oscillating spiking neurons, it makes no predictions regarding theta oscillations (Buzsáki & Moser, 2013). The model is an account of learning and memory for an adult animal, and it makes no predictions regarding the developmental (Langston et al., 2010; Muessig et al., 2015; Wills et al., 2012) or evolutionary (Rodríguez et al., 2002) time course of different cell types. This model contains several purely spatial representations such as border cells, head direction cells, and head direction conjunctive grid cells and it may be that these purely spatial cell types emerged first, followed by the evolution and/or development of non-spatial cell types. However, this does not invalidate the model. Instead, this is a model for an adult animal that has both episodic memory capabilities and spatial navigation capabilities, irrespective of the order in which these capabilities emerged.

      This model has the potential to explain context effects in memory (Godden & Baddeley, 1975; Gulli et al., 2020; Howard et al., 2005). According to this model, different grid cells represent different non-spatial characteristics and place cells represent the combination of these “context” factors and location. In the simulation, just one grid cell is simulated but the same results would emerge when simulating hundreds of different non-spatial inputs provided that all of the simulated non-spatial inputs exist throughout the recording session. However, there is evidence that hippocampus can explicitly represent the passage of time (Eichenbaum, 2014), and time is assuredly an important factor in defining episodic memory (Bright et al., 2020). Thus, although the current model addresses unique combinations of what and where, it is left to future work to incorporate representations of when in the memory model.”

      Reviewer #3 (Public Review): 

      A crucial assumption of the model is that the content of experience must be constant in space. It's difficult to imagine a real-world example that satisfies this assumption. Odors and sounds are used as examples. While they are often more spatially diffuse than objects on the ground, odors and sounds have sources that are readily detectable. Animals can easily navigate to a food source or to a vocalizing conspecific. This assumption is especially problematic because it predicts that all grid cells should become silent when their preferred non-spatial attribute (e.g. a specific odor) is missing. I'm not aware of any experimental data showing that grid cells become silent. On the contrary, grid cells are known to remain active across all contexts that have been tested, including across sleep/wake states. Unlike place cells, grid cells do not seem to turn off. Since grid cells are active in all contexts, their preferred attribute must also be present in all contexts, and therefore they would not convey any information about the specific content of an experience.  

      These are good points and in this revision I have attempted to explain that there is a great deal of contextual similarity across all recording sessions. One paragraph in the discussion now reads:

      “In a typical rodent spatial navigation study, the non-spatial attributes are well-controlled, existing at all locations regardless of the enclosure used during testing (hence, a grid cell in one enclosure will be a grid cell in a different enclosure). Because labs adopt standard procedures, the surfaces, odors (e.g., from cleaning), external lighting, time of day, human handler, electronic apparatus, hunger/thirst state, etc. might be the same for all recording sessions. Additionally, the animal is not allowed to interact with other animals during recording and this isolation may be an unusual and highly salient property of all recording sessions. Notably, the animal is always attached to wires during recording. The internal state of the animal (fear, aloneness, the noise of electronics, etc.) is likely similar across all recording situations and attributes of this internal state are likely represented in the hippocampus and entorhinal input to hippocampus. According to this model, hippocampal place cells are “marking” all locations in the enclosure as places where these things tend to happen.”

      The proposed novelty of this theory is that other models all assume that grid cells encode space. This isn't quite true of models based on continuous attractor networks, the discussion of which is notably absent. More specifically, these models focus on the importance of intrinsic dynamics within the entorhinal cortex in generating the grid pattern. While this firing pattern is aligned to space during navigation and therefore can be used as a representation of that space, the neural dynamics are preserved even during sleep. Similarly, it is because the grid pattern does not strictly encode physical space that grid-like signals are also observed in relation to other two-dimensional continuous variables.

      These models were briefly discussed in the general discussion section and in this revision they are further discussed in the introduction in a new section, titled ‘Why Model the Rodent Navigation Literature with a Memory Model?’ That section reads:

      “Spatial navigation is inherently a memory problem – learning the spatial arrangement of a new enclosure requires memory for the conjunction of what and where. This has long been realized and in the introduction to ‘Hippocampus as a Cognitive Map’, O’Keefe and Nadel (1978) wrote “We shall argue that the hippocampus is the core of a neural memory system providing an objective spatial framework within which the items and events of an organism's experience are located and interrelated” (emphasis added). Furthermore, in the last chapter of their book, they extended cognitive map theory to human memory for non-spatial characteristics. However, in the decades since the development of cognitive map theory, the rodent spatial navigation and human memory literatures have progressed somewhat independently.

      The ideas proposed in this model are an attempt to reunify these literatures by returning to the original claim that spatial navigation is inherently a memory problem. The goal of the current study is to explain the rodent spatial navigation literature using a memory model that has the potential to also explain the human memory literature. In contrast, most grid cell models (Bellmund et al., 2016; Bush et al., 2015; Castro & Aguiar, 2014; Hasselmo, 2009; Mhatre et al., 2012; Solstad et al., 2006; Sorscher et al., 2023; Stepanyuk, 2015; Widloski & Fiete, 2014) are domain specific models of spatial navigation and as such, they do not lend themselves to explanations of human memory. Thus, the reason to prefer this model is parsimony. Rather than needing to develop a theory of memory that is separate from a theory of spatial navigation, it might be possible to address both literatures with a unified account.

      This study does not attempt to falsify other theories of grid cells. Instead, this model reaches a radically different interpretation regarding the function of grid cells; an interpretation that emerges from viewing spatial navigation as a memory problem. All other grid cell models assume that an entorhinal grid cell displaying a spatially arranged grid of firing fields serves the function of spatial coding (i.e., spatial grid cells exist to support a spatial metric). In contrast, the proposed memory model of grid cells assumes that the hexagonal tiling reflects the need to keep memories separate from each other to minimize confusion and confabulation – the grid pattern is the byproduct of pattern separation between memories rather than the basis of a spatial code. 

      It is now understood that grid-like firing fields can occur for non-spatial two dimensional spaces. For instance, human entorhinal cortex exhibits grid-like responses to video morph trajectories in a two-dimensional bird neck-length versus bird leg-length space (Constantinescu et al., 2016). As a general theory of learning and memory, the proposed memory model of grid cells is easily extended to explain these results (e.g., relabeling the border cell inputs in the model as neck-length and leg-length inputs). However, there are other grid cell models that can explain both spatial grid cells as well as non-spatial grid-like responses (Mok & Love, 2019; Rodríguez-Domínguez & Caplan, 2019; Stachenfeld et al., 2017; Wei et al., 2015). Similar to this memory model of grid cells, these models are also positioned to explain both the rodent spatial navigation and human memory literatures. Nevertheless, there is a key difference between this model and other grid cell models that generalize to non-spatial representations. Specifically, these other models assume that grid cells exhibiting spatial receptive fields serve the function of identifying positions in the environment (i.e., their function is spatial). As such, these models do not explain why most of the input to rodent hippocampus appears to be spatial (Boccara et al., 2010; Diehl et al., 2017; Grieves & Jeffery, 2017). This memory model of grid cells provides an answer to the apparent paucity of nonspatial cell types in rodent MTL by proposing that grid cells with spatial receptive fields have been misclassified as spatial (they are what cells rather than where cells) and that place cells are fundamentally memory cells that conjoin what and where.”

      The use of border cells or boundary vector cells as the main (or only) source of spatial information in the hippocampus is not well supported by experimental data. Border cells in the entorhinal cortex are not active in the center of an environment. Boundary-vector cells can fire farther away from the walls but are not found in the entorhinal cortex. They are located in the subiculum, a major output of the hippocampus. While the entorhinal-hippocampal circuit is a loop, the route from boundary-vector cells to place cells is much less clear than from grid cells. Moreover, both border cells and boundary-vector cells (which are conflated in this paper) comprise a small population of neurons compared to grid cells.

      AUTHOR RESPONSE: The model can be built without assuming between-border cells (early simulations with the model did not make this assumption). Regarding this issue, the text reads “Unlike the BVC model, the boundary cell representation is sparsely populated using a basis set of three cells for each of the three dimensions (i.e., 9 cells in total), such that for each of the three non-orthogonal orientations, one cell captures one border, another the opposite border, and the third cell captures positions between the opposing borders (Solstad et al., 2008). However, this is not a core assumption, and it is possible to configure the model with border cell configurations that contain two opponent border cells per dimension, without needing to assume that any cells prefer positions between the borders (with the current parameters, the model predicts there will be two border cells for each between-border cell). Similarly, it is possible to configure the model with more than 3 cells for each dimension (i.e., multiple cells representing positions between the borders).” The Solstad paper found a few cells that responded in positions between borders, but perhaps not as many as 1 out of 3 cells, such as this particular model simulation predicts. If the paucity of between-border cells is a crucial data point, the model can be reconfigured with opponent-border cells without any between-border cells. The reason that 3 border cells were used rather than 2 opponent border cells was for simplicity. Because 3 head direction cells were used to capture the face-centered cubic packing of memories, the simulation also used 3 border cells per dimension to allow a common linear sum metric when conjoining dimensions to form memories. If the border dimensions used 2 cells while head direction used 3 cells, a dimensional weighting scheme would be needed to allow this mixing of “apples and oranges” in terms of distances in the 3D space that includes head direction.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Specific questions/clarifications:  

      (1) Assumption of population-based vs single unit link to biological cells: At the start, the author assumes that each unit here can be associated with a population: "the simulated activation values can be thought of as proportional to the average firing rate of an ensemble of neurons with similar inputs and outputs (O'Reilly & Munakata, 2000)." But is a 'grid cell' found here a single cell or an average of many cells? Does this mean the model assumes many cells that have different fields that are averaged, which become a grid-like unit in the model? But in biology, these are single cells? Or does it mean a grid response is an average of the place cell inputs? 

      I apologize for being unclear about this. The grid cells in the model are equivalent to real single cells except that the simulation uses a rate-coded cell rather than a spiking cell. The averaging that was mentioned in the paper is across identically behaving spiking cells rather than across cells with different grid field arrangements. To better explain this, I have added the following text:

      “For instance, consider a set of several thousand spiking grid cells that are identical in terms of their firing fields. At any moment, some of these identically-behaving cells will produce an action potential while others do not (i.e., the cells are not perfectly synchronized), but a snapshot of their behavior can be extracted by calculating average firing rate across the ensemble. The simulated cells in the model represent this average firing rate of identically-behaving ensembles of spiking neurons.” 

      This is a mathematical short-cut to avoid simulating many spiking neurons. Because this model was compared to real spike rate maps, this real-valued average firing rate is down-sampled to produce spikes by finding the locations that produced the top 5% of real-valued activation values across the simulation.
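
      The thresholding step described above can be sketched as follows. This is only an illustration under stated assumptions: the toy rate map, the grid size, and the helper name `spike_locations` are hypothetical and are not taken from the paper's simulation code. The only detail drawn from the text is that the locations with the top 5% of real-valued activations are treated as spikes.

```python
import math

def spike_locations(rate_map, top_fraction=0.05):
    """Return the (row, col) locations whose activation lies in the top fraction.

    Hypothetical helper: the paper only states that the top 5% of
    real-valued activations are converted to spikes.
    """
    values = sorted((v for row in rate_map for v in row), reverse=True)
    n_keep = max(1, math.ceil(top_fraction * len(values)))
    threshold = values[n_keep - 1]  # smallest value that still counts as a spike
    return {
        (r, c)
        for r, row in enumerate(rate_map)
        for c, v in enumerate(row)
        if v >= threshold
    }

# Toy 20 x 20 rate map with periodic bumps (made-up data for illustration only).
SIZE = 20
rate_map = [
    [math.cos(2 * math.pi * r / 10) + math.cos(2 * math.pi * c / 10) for c in range(SIZE)]
    for r in range(SIZE)
]

spikes = spike_locations(rate_map)
```

      In this toy map, the peaks of the bumps and their immediate neighbors along each axis survive the threshold, while troughs such as location (5, 5) do not.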

      (2) It is not clear to me why they are circular border cells/basis sets.  

      In the initial submission, there was a brief paragraph describing this assumption. In this revision, that paragraph has been expanded and modified for greater clarity. It now reads:

      “Because head direction is necessarily a circular dimension, it was assumed that all dimensions are circular (a circular dimension is approximately linear for nearby locations). This assumption of circular dimensions was made to keep the model relatively simple, making it easier to combine dimensions and allowing application of the same processes for all dimensions. For instance, the model requires a weight normalization process to ensure that the pattern of weights for each dimension corresponds to a possible input value along that dimension. However, the normalization for a linear dimension is necessarily different than for a circular dimension. Because the neural tuning functions were assumed to be sine waves, normalization requires that the sum of squared weights add up to a constant value. For a linear dimension, this sum of squares rule only applies to the subset of cells that are relevant to a particular value along the dimension whereas for a circular dimension, this sum of squares rule is over the entire set of cells that represent the dimension (i.e., weight normalization is easier to implement with circular dimensions). Although all dimensions were assumed to be circular for reasons of mathematical convenience and parsimony, circular dimensions may relate to the finding that human observers have difficulty re-orienting themselves in a room depending on the degree of rotational symmetry of the room (Kelly et al., 2008). In addition, this simplifying assumption allows the model to capture the finding that the population of grid cells lies on a torus (Gardner et al., 2022), although I note that the model was developed before this result was known.”
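
      The sum-of-squares property for a circular dimension can be checked numerically. The sketch below assumes three cosine-tuned cells with evenly spaced preferred values (0°, 120°, 240°); the spacing, the unit amplitude, and the function names are assumptions made for illustration rather than the paper's exact parameterization.

```python
import math

# Assumed preferred values: three cells evenly spaced around the circle.
PREFERRED = [0.0, 2 * math.pi / 3, 4 * math.pi / 3]

def tuning(theta):
    """Activation of each sinusoidally tuned cell at position theta (radians)."""
    return [math.cos(theta - p) for p in PREFERRED]

def sum_of_squares(theta):
    """Sum of squared activations over the whole basis set."""
    return sum(a * a for a in tuning(theta))

# Evaluate at every degree around the circle.
values = [sum_of_squares(2 * math.pi * k / 360) for k in range(360)]
```

      Under these assumptions the sum equals 3/2 at every position, so a single normalization constant applies to the entire set of cells, which is the convenience the quoted passage attributes to circular dimensions.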

      (3) Why is it 3 components? I realise that the number doesn't matter too much, but I believe more is better, so is it just for simplicity? 

      In this revision, additional text has been added to explain this assumption: “To keep the model simple, the same number of cells was assumed for all dimensions and all dimensions were assumed to be circular (head direction is necessarily circular and because one dimension needed to be circular, all dimensions were assumed to be circular). Three cells per dimension was chosen because this provides a sparse population code of each dimension, with few border cells responding between borders, while allowing three separate phases of grid cells within a grid cell module (in the model, a grid cell module arises from combination of a third dimension, such as head direction, with the real-world X/Y dimensions defined by border cells).”

      As a reminder, the text explaining the sparse coding of border cells reads: “However, this is not a core assumption, and it is possible to configure the model with border cell configurations that contain two opponent border cells per dimension, without needing to assume that any cells prefer positions between the borders (with the current parameters, the model predicts there will be two border cells for each between-border cell). Similarly, it is possible to configure the model with more than 3 cells for each dimension (i.e., multiple cells representing positions between the borders).”

      The model can work with just two opponent cells or with more than three cells per basis set. In different simulations, I have explored these possibilities. Three was chosen because it is a convenient way to highlight the face-centered cubic packing of memories that tends to occur (FCP produces 3 alternating layers of hexagonally arranged firing fields). Thus, each of the three head direction cells captures a different layer of the FCP arrangement. A more realistic simulation might combine 6 different head direction cells tiling the head direction dimension with opponent border cells (just 2 cells for each border dimension). Such a combination would produce responses at borders, but no responses between borders and, at the same time, the head direction cells would still reveal the FCP arrangement. However, it is not easy to find the right parameters for such a mix-and-match simulation in which different dimensions have different numbers of tuning functions (e.g., some dimensions having 2 cells while others have 3 or 6 and some dimensions being linear while others are circular). When all of the dimensions are of the same type, the simple sum that arises from multiplying the input by the weight values gives rise to Euclidean distance (see Figure 3B). With a mix-and-match model of different dimension-types, it should be possible to adjust the sum to nevertheless produce a monotonic function with Euclidean distance although I leave this to future work. To keep things simple, I assumed that all dimensions are of the same type (circular, with 3 cells per dimension).

      (4) Confusion due to the border cells/box was unclear to me. "If the period of the circular border cells was the same as the width of the box, then a memory pushed outside the box on one side would appear on the opposite side of the box, in which case the partial grid field on one side should match up with its remainder on the other side. This would entail complete confusion between opposite sides of the box, and the representation of the box would be a torus (donut-shaped) rather than a flat two-dimensional surface. To reduce confusion ..." Is this confusion of the model? Of the animal?  

      This would be confusion of the animal (e.g., a memory field overlapping with one border would also appear at the opposite border in the corresponding location). At one point in model development, I made the assumption that one side of the box wraps to the other side, and I asked Trygve Solstad to run some analyses of real data to see if cells actually wrap around in this manner. He did not find any evidence of this, and so I decided to include outside-the-box representational area which, as it turned out, allowed the model to capture other behaviors as detailed in the paper.

      This section of the paper now reads:

      “The cosine tuning curves of the simulated border cells represent distance from the border on both sides of the border (i.e., firing rate increases as the animal approaches the border from either the inside or the outside of the enclosure). Experimental procedures do not allow the animal to experience locations immediately outside the enclosure, but these locations remain an important part of the hypothetical representation, particularly when considering the modification of memories through consolidation (i.e., a memory created inside the enclosure might be moved to a location outside the enclosure). This symmetry about the border cell’s preferred location is needed to maintain an unbiased representation, with a constant sum of squares for the border cell inputs (see methods section). Rather than using linear dimensions, all dimensions were assumed to be circular to keep the model relatively simple. This assumption was made because head direction is necessarily a circular dimension and by having all dimensions be circular, it is easy to combine dimensions in a consistent manner to produce multidimensional hippocampal place cell memories. Thus, the border cells define a torus (or more accurately a three-torus) of possible locations. This provides a hypothetical space of locations that could be represented.

      In light of the assumption to represent border cells with a circular dimension, when a memory is pushed outside the East wall of the enclosure, it would necessarily be moved to the West wall of the enclosure if the period of the circular dimension was equal to the width of the enclosure. If this were true, then the partial grid field on one side of the enclosure would match up with its remainder on the other side. Such a situation would cause the animal to become completely confused regarding opposite sides of the enclosure (a location on the West wall would be indistinguishable from the corresponding location on the East wall). To reduce confusion between opposite sides of the enclosure, the width of the enclosure in which the animal navigated (Figure 5) was assumed to be half as wide as the full period of the border cells. In other words, although the space of possible representations was a three-torus, it was assumed that the real-world two-dimensional enclosure encompassed a section of the torus (e.g., a square piece of tape stuck onto the surface of a donut). The torus is better thought of as a “playing field” in which different sizes and shapes of enclosure can be represented (i.e., different sizes and shapes of tape placed on the donut). Furthermore, this assumption provides representational space that is outside the box without such locations wrapping around to the opposite side of the box.”
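
      The wrap-around argument can be illustrated with a small numerical sketch. It assumes a single circular border dimension coded by three cosine-tuned cells with evenly spaced preferred phases; the function names, the unit enclosure width, and the phase spacing are hypothetical choices for illustration, not the paper's implementation.

```python
import math

# Assumed preferred phases for the three cells coding one circular dimension.
PREFERRED = [0.0, 2 * math.pi / 3, 4 * math.pi / 3]

def border_inputs(x, period):
    """Cosine-tuned border cell activations at position x for a given period."""
    phase = 2 * math.pi * x / period
    return [math.cos(phase - p) for p in PREFERRED]

def distance(a, b):
    """Euclidean distance between two activation vectors."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

WIDTH = 1.0                 # enclosure width (arbitrary units)
west, east = 0.0, WIDTH     # positions of the two opposing walls

# Period equal to the enclosure width: the two walls receive identical input.
same_period = distance(border_inputs(west, WIDTH), border_inputs(east, WIDTH))

# Period twice the enclosure width: the two walls receive opposite input.
double_period = distance(border_inputs(west, 2 * WIDTH), border_inputs(east, 2 * WIDTH))
```

      With the period equal to the width, the two walls are indistinguishable (distance of essentially zero). With the period doubled, the walls sit half a cycle apart, which is the largest separation this code allows.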

      (5) Figure 3 - This result seems to be related to whether you use Euclidean or city-block distance. If you use Euclidean distances in two dimensions wouldn't this work out fine?  

      Euclidean distance was the metric used in the analysis of the two-dimensional simulation, but this did not work out. To make this clear, I have changed the label on the x-axes to read “Euclidean distance” for both the two- and three-dimensional simulations. The two-dimensional simulation produced city block behavior rather than Euclidean behavior because memory retrieval is the sum of the two dimensions, as is standard in neural networks, rather than the Euclidean distance formula, which would require that memory retrieval be the square root of the sum of squares of the two dimensions. One way to address this problem with the two-dimensional simulation would be to use a specific Euclidean-mimicking activation function rather than a simple sum of dimensions. The very first model I developed used such an activation function as applied to opponent border cells with just two dimensions (so 4 cells in total – left/right and top/down). This produced Euclidean behavior, but the activation function was implausible and did not generalize to simulations that also included head direction. In contrast, with three non-orthogonal dimensions, the simple sum of dimensions is approximately Euclidean.
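
      A simplified stand-in can show why a plain sum over dimensions depends on direction with two orthogonal axes but much less so with three non-orthogonal axes. The sketch below is not the paper's simulation: it assumes that each dimension contributes the cosine of the displacement projected onto that dimension's axis, and it places the three axes 60° apart in the plane. The names `retrieval` and `directional_spread` and the test radius of 1.5 radians are illustrative choices.

```python
import math

def retrieval(dx, dy, axis_angles):
    """Simple sum over dimensions of the cosine of the projected displacement."""
    return sum(math.cos(dx * math.cos(a) + dy * math.sin(a)) for a in axis_angles)

def directional_spread(radius, axis_angles, n_dirs=360):
    """Range of retrieval strength over all directions at a fixed Euclidean distance.

    A spread of zero would mean retrieval depends on Euclidean distance alone.
    """
    vals = []
    for k in range(n_dirs):
        t = 2 * math.pi * k / n_dirs
        vals.append(retrieval(radius * math.cos(t), radius * math.sin(t), axis_angles))
    return max(vals) - min(vals)

TWO_ORTHOGONAL = [0.0, math.pi / 2]                 # assumed X and Y axes
THREE_AT_60 = [0.0, math.pi / 3, 2 * math.pi / 3]   # assumed non-orthogonal axes

spread_two = directional_spread(1.5, TWO_ORTHOGONAL)
spread_three = directional_spread(1.5, THREE_AT_60)
```

      In this toy setting, the directional spread at a fixed Euclidean distance is more than an order of magnitude smaller for the three-axis sum than for the two-axis sum, consistent with the statement that the simple sum is approximately Euclidean only in the non-orthogonal three-dimension case.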

      (6) Final sentence of the Discussion: "However, unlike the present model, these models still assume that entorhinal grid cells represent space rather than a non-spatial attribute." I am not sure if the authors of the cited papers will agree with this. They consider the spatial cases, but most argue they can treat non-spatial features as well. What the author might mean is that they assume non-spatial features are in some metric space that, in a way, is spatial. However, I am not sure if the author would argue that non-spatial features cannot be encoded metrically (e.g., Euclidean distance based on the similarity of odours). 

      In this section, when referring to “entorhinal grid cells” I was specifically referring to traditional grid cells in a rodent spatial navigation experiment. I did not mean to imply that these other theories cannot explain nonspatial grid fields, such as in the two-dimensional bird space grid cells found with humans. The way in which the proposed memory model and these other models differ is in terms of what they assume regarding the function of grid cells that exhibit spatial grid fields. In this revision, I have changed this text to read:

      “These models can capture some of the grid cell results presented in the current simulations, including extension to non-spatial grid-like responses (e.g., grid fields that cover a two-dimensional neck/leg length bird space). Furthermore, these models may be able to explain memory phenomena similar to the model proposed in this study. However, unlike the proposed model, these models assume that the function of entorhinal grid cells that exhibit spatial X/Y grid fields during navigation is to represent space. In contrast, the memory model proposed in this study assumes that the function of spatial X/Y grid cells is to represent a non-spatial attribute; the only reason they exhibit a spatial X/Y grid is because memories of that non-spatial attribute are arranged in a hexagonal grid owing to the uncluttered/unvarying nature of the enclosure. Thus, these models do not explain why most of the input to rodent hippocampus appears to be spatial (Boccara et al., 2010b; Diehl et al., 2017; Grieves & Jeffery, 2017) whereas the proposed model can explain this situation as reflecting the misclassification of grid cells with a spatial arrangement as providing spatial input to hippocampus.”

      (7) It would be interesting to see videos/gifs of the model learning, and an idea of how many steps of trials it takes (is it capturing real-time rodent cell firing whilst foraging, or is it more abstracted, taking more trials). 

      The short answer is “yes”, the model is capturing real-time rodent cell firing while foraging. This is particularly true when simulating place cell memories in the absence of head direction information, as was shown in a video provided in the initial submission in relation to Figure 4. In this revision, I have provided a second video of learning when simulating place cell memories that include head direction. This second video is in relation to the results reported in Figure 9. This shows that even when learning a three-dimensional real-world space (X, Y, and head direction), the model rapidly produces an on-average hexagonal arrangement of place cell memories owing to the slight tendency of the place cell memories to linger in some locations as compared to others during consolidation. More specifically, they are more likely to linger in the locations that are the intersections of the peaks and/or troughs of the border cells and it is this tendency that supports the immediate appearance of grid cells. However, because the place cell memories are still shifting, head direction conjunctive grid cells are slower to emerge (the head direction conjunctive grid cells require stabilization of the place cells). The video then speeds up the learning process to show how place cells eventually stabilize after sufficient learning of the borders of the enclosure from different head/view directions.

      (8) One question is whether all the results have to be presented in the main text. It was difficult to see which key predictions fit the data and do so better than a spatial/navigation account. 

      Thank you for this suggestion. To make the paper more readable and easier for different readers with different interests to choose different aspects of the results to read, the second half of the results have been put in an appendix. More specifically, the second half of the results concerned place cells rather than grid cells. Thus, in this revision, the main text concerns grid cell results and the appendix concerns place cell results.

      Reviewer #3 (Recommendations For The Authors):  

      The title could usefully be shortened to focus on the main argument that observed firing patterns could be consistent with mapping memories instead of space. It's a stretch to argue that memory is the primary role when no such data is presented (i.e., there is no comparison of competing models). 

      This is a good point (I do not present evidence that conclusively indicates the function of MTL). The original title was chosen to make clear how this account is a radical departure from other accounts of grid cells. The revised title highlights that: 1) a memory model can also explain rodent single cell recording data during navigation; and 2) grid cells may be non-spatial. The revised title is: “A Memory Model of Rodent Spatial Navigation: Place Cells are Memories Arranged in a Grid and Grid Cells are Non-spatial”

      When arguing that the main role of the hippocampus is memory, I strongly suggest engaging with the work of people like Howard Eichenbaum who spent the better part of their career arguing the same (e.g. DOI:10.1152/jn.00005.2017.)  

      Thank you for pointing out this important oversight. Early in the introduction, I now write: “The proposal that hippocampus represents the multimodal conjunctions that define an episode is not new (Marr et al., 1991; Sutherland & Rudy, 1989) and neither is the proposal that hippocampal memory supports spatial/navigation ability (Eichenbaum, 2017). This view of the hippocampus is consistent with “feature in place” results (O’Keefe & Krupic, 2021) in which hippocampal cells respond to the conjunction of a non-spatial attribute affixed to a specific location, rather than responding more generically to any instance of a non-spatial attribute. In other words, the what/where conjunction is unique. Furthermore, the uniqueness of the what/where conjunction may be the fundamental building block of spatial memory and navigation. In reviewing the hippocampal literature, Howard Eichenbaum (2017) concludes that ‘the hippocampal system is not dedicated to spatial cognition and navigation, but organizes experiences in memory, for which spatial mapping and navigation are both a metaphor for and a prominent application of relational memory organization.’”

      With a focus on episodic memory, there should be a mention of the temporal component of memory. While it may rightfully be beyond the scope of this model, it's confusing to omit time completely from the discussion. 

      This issue and several others are now addressed in a new section in the introduction titled ‘The Scope of the Proposed Model’. That section reads:

      “The reported simulations explain why most mEC cell types in the rodent literature appear to be spatial (Boccara et al., 2010; Diehl et al., 2017; Grieves & Jeffery, 2017). Assuming that rodents can form non-spatial memories, rodent hippocampus must receive non-spatial input from entorhinal cortex. These simulations suggest that characterization of the rodent mEC cortex as primarily spatial might be incorrect if most grid cells (except perhaps head direction conjunctive grid cells) have been mischaracterized as spatial. Other literatures with other species find non-spatial representations in MTL (Gulli et al., 2020; Quiroga et al., 2005; Wixted et al., 2014) and non-spatial hippocampal memory encoding has been found in rodents (Liu et al., 2012; McEchron & Disterhoft, 1999). The proposed memory model is compatible with these results – the ideas contained in this model could be applied to nonspatial memory representations. However, surveys of cell types in rodent entorhinal cortex seem to indicate that most cells are spatial (Boccara et al., 2010; Diehl et al., 2017; Grieves & Jeffery, 2017). How can the rodent hippocampus encode nonspatial memories if most of its input is spatial? The goal of the reported simulations is to explain the apparent paucity of non-spatial cells in rodent entorhinal cortex by proposing that grid cells have been misclassified as spatial (see also Luo et al., 2024).

      Given the simplicity of the proposed model, there are important findings that the model cannot address -- it is not that the model makes the wrong predictions but rather that it makes no predictions. The role of running speed (Kraus et al., 2015) is one such variable for which the model makes no predictions. Similarly, because the model is a rate-coded model rather than a model of oscillating spiking neurons, it makes no predictions regarding theta oscillations (Buzsáki & Moser, 2013). The model is an account of learning and memory for an adult animal, and it makes no predictions regarding the developmental (Langston et al., 2010; Muessig et al., 2015; Wills et al., 2012) or evolutionary (Rodríguez et al., 2002) time course of different cell types. This model contains several purely spatial representations such as border cells, head direction cells, and head direction conjunctive grid cells and it may be that these purely spatial cell types emerged first, followed by the evolution and/or development of non-spatial cell types. However, this does not invalidate the model. Instead, this is a model for an adult animal that has both episodic memory capabilities and spatial navigation capabilities, irrespective of the order in which these capabilities emerged.

      This model has the potential to explain context effects in memory (Godden & Baddeley, 1975; Gulli et al., 2020; Howard et al., 2005). According to this model, different grid cells represent different non-spatial characteristics and place cells represent the combination of these “context” factors and location. In the simulation, just one grid cell is simulated but the same results would emerge when simulating hundreds of different non-spatial inputs provided that all of the simulated non-spatial inputs exist throughout the recording session. However, there is evidence that hippocampus can explicitly represent the passage of time (Eichenbaum, 2014), and time is assuredly an important factor in defining episodic memory (Bright et al., 2020). Thus, although the current model addresses unique combinations of what and where, it is left to future work to incorporate representations of when in the memory model.”

      I recommend explaining the motivation of the theory in more detail in the introduction. It reads as "what if it's like this?" It would be helpful to instead highlight the limitations of current theories and argue why this theory is either a better fit for the data or is logically simpler. 

      This issue and several others are now addressed in the new section in the introduction titled ‘Why Model the Rodent Navigation Literature with a Memory Model?’, which I quoted above in response to the public reviews.

      It's worth considering shortening the results section to include only those that most convincingly support the main claim. The manuscript is quite long and appears to lack focus at times. 

      Thank you for this suggestion. To make the paper more readable and easier for different readers with different interests to choose different aspects of the results to read, the second half of the results have been put in an appendix. More specifically, the second half of the results concerned place cells rather than grid cells. Thus, in this revision, the main text concerns grid cell results and the appendix concerns place cell results.

      The discussion of path dependence on the formation of the grid pattern is important but only briefly discussed. It may be useful to add simulations testing whether different paths (not random walks) produce distorted grid patterns. 

      The short answer is that the path doesn’t affect things in general. The consolidation rule ensures equally spaced memories even if, for instance, one side of the enclosure is explored much more than the other side. As just one example, I have run simulations with a radial arm maze, and even though the animal is constrained to run only on the maze arms, the memories still arrange hexagonally as memories become pushed outside the arms. Rather than adding additional simulations to study, I now briefly describe this in the model methods:

      “Of note, the ability of the model to produce grid cell responses does not depend on this decision to simulate an animal taking a random walk – the same results emerge if the animal is more systematic in its path. All that matters for producing grid cell responses is that the animal visits all locations and that the animal takes on different head directions for the same location in the case of simulations that also include head direction as an input to hippocampal place cells.”

      I struggle to understand in Figure 3 why retrieval strength ought to scale monotonically with Euclidean distance, and why that justifies a more complex model (three non-orthogonal dimensions). 

      The introduction to this section now reads: “Animals can plan novel straight line paths to reach a known position and evidence suggests they do so by learning Euclidean representations of space (Cheng & Gallistel, 2014; Normand & Boesch, 2009; Wilkie, 1989). Thus, it was assumed that hippocampal place cells represent positions in Euclidean space (as opposed to non-Euclidean space, such as occurs with a city-block metric).”

      p.17 "although the representational space is a torus (or more specifically a three-torus), it is assumed that the real-world two-dimensional surface is only a section of the torus (e.g., a square piece of tape stuck onto the surface of a donut)." I fail to understand how the real-world surface is only a part of the torus. In the existing theoretical and experimental work on toroidal topology of grid cell activity, the torus represents a very small fraction of the real world, and repeating activity on the toroidal manifold is a crucial feature of how it maps 2D space in a regular manner. Why then here do you want the torus to be larger than the real world? 

      This section has been rewritten to better explain these assumptions. The relevant paragraphs now read:

      “The cosine tuning curves of the simulated border cells represent distance from the border on both sides of the border (i.e., firing rate increases as the animal approaches the border from either the inside or the outside of the enclosure). Experimental procedures do not allow the animal to experience locations immediately outside the enclosure, but these locations remain an important part of the hypothetical representation, particularly when considering the modification of memories through consolidation (i.e., a memory created inside the enclosure might be moved to a location outside the enclosure). This symmetry about the border cell’s preferred location is needed to maintain an unbiased representation, with a constant sum of squares for the border cell inputs (see methods section). Rather than using linear dimensions, all dimensions were assumed to be circular to keep the model relatively simple. This assumption was made because head direction is necessarily a circular dimension and by having all dimensions be circular, it is easy to combine dimensions in a consistent manner to produce multidimensional hippocampal place cell memories. Thus, the border cells define a torus (or more accurately a three-torus) of possible locations. This provides a hypothetical space of locations that could be represented.

      In light of the assumption to represent border cells with a circular dimension, when a memory is pushed outside the East wall of the enclosure, it would necessarily be moved to the West wall of the enclosure if the period of the circular dimension was equal to the width of the enclosure. If this were true, then the partial grid field on one side of the enclosure would match up with its remainder on the other side. Such a situation would cause the animal to become completely confused regarding opposite sides of the enclosure (a location on the West wall would be indistinguishable from the corresponding location on the East wall). To reduce confusion between opposite sides of the enclosure, the width of the enclosure in which the animal navigated (Figure 5) was assumed to be half as wide as the full period of the border cells. In other words, although the space of possible representations was a three-torus, it was assumed that the real-world two-dimensional enclosure encompassed a section of the torus (e.g., a square piece of tape stuck onto the surface of a donut). The torus is better thought of as a “playing field” in which different sizes and shapes of enclosure can be represented (i.e., different sizes and shapes of tape placed on the donut). Furthermore, this assumption provides representational space that is outside the box without such locations wrapping around to the opposite side of the box.”
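
The aliasing argument in the quoted passage can be illustrated with a minimal sketch. This is not the manuscript's simulation code: the function names, the use of four evenly spaced preferred positions, and the scaling of firing rates to the range [0, 1] are all assumptions made only for illustration. The sketch compares the population response at the West and East walls when the circular period equals the enclosure width and when the period is twice the enclosure width.

```python
import math


def border_cell_rate(x, preferred, period):
    """Cosine tuning on a circular dimension, scaled to [0, 1] (assumed scaling)."""
    return 0.5 * (1.0 + math.cos(2.0 * math.pi * (x - preferred) / period))


def representation(x, period, n_cells=4):
    """Rates of a hypothetical population with evenly spaced preferred positions."""
    return [border_cell_rate(x, k * period / n_cells, period) for k in range(n_cells)]


def distance(a, b):
    """Euclidean distance between two population response vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


width = 1.0              # enclosure width in arbitrary units
west, east = 0.0, width  # positions of the two opposing walls

# Period equal to the enclosure width: the two walls differ by one full cycle,
# so their population responses are identical (up to floating-point error).
d_same = distance(representation(west, width), representation(east, width))

# Period twice the enclosure width: the walls are half a cycle apart,
# so their population responses are clearly distinguishable.
d_double = distance(representation(west, 2 * width), representation(east, 2 * width))

print(f"period = width:     wall-to-wall distance = {d_same:.3f}")
print(f"period = 2 * width: wall-to-wall distance = {d_double:.3f}")
```

Under these assumptions, the first distance is effectively zero (the walls are confusable) and the second is well above zero, which is the motivation given in the passage for making the enclosure half as wide as the full period.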

      p.28 "More specifically, egocentric grid cells (e.g., head direction conjunctive grid cells) require stabilization of the place cell memories in the face of ongoing consolidation whereas allocentric grid cells reflect on-average place field positions." and p.32 "if place cells represent episodic memories, it seems natural that they should include head direction (an egocentric viewpoint)." But the head direction signal is not egocentric, it is allocentric. I'm unsure whether this is a typo or a potentially more serious conceptual misunderstanding. 

      Any reference to egocentric has been removed in this revision. In the initial submission, when I used egocentric, I was referring to memories that depended on the head direction of the animal at the time of memory formation. I was using “egocentric” in relation to whether the memory was related to the animal’s personal bodily experience at the time of memory formation. But I concede that this is confusing since the ego/allo distinction is typically used to differentiate angular directions that are relative to the person (left/right) versus earth (East/West). Instead, throughout the manuscript I now refer to these as view-dependent memories since head direction would entail having a different view of the environment at the time of memory formation. I still refer to the stacking of multiple view-dependent memories on the same X/Y location as being the development of an allocentric representation however, since this can be thought of as one way to learn a cognitive map of the enclosure that is view independent.

      p.37 "But if the border cells had changed their alignment with the new enclosure (e.g., if the E border dimension aligned with the North-South borders), then the place cells would have appeared to undergo global remapping as their positions rotated by 90 degrees and the grid pattern would have also rotated." But this would not be interpreted as global remapping by standard analyses of place and grid cell responses. A coherent rotation of firing patterns is not interpreted as remapping. 

      This sentence now reads: “But if the border cells had changed their alignment with the new enclosure (e.g., if the E border dimension aligned with the North-South borders), then the place cells would remain in their same positions relative to the now-rotated borders (i.e., no remapping relative to the enclosure) and the corresponding grid cells would also retain their same alignment relative to the enclosure.”

      p.37 "this is more accurately described as partial remapping (nearly all place fields were unaffected)." If nearly all place fields were unaffected, this should be interpreted as a stable map. Partial remapping is a mix of stability, rate remapping, and global remapping within a population of place cells. 

      This sentence has been removed.

      p.40 "The dependence of grid cell responses on memory may help explain why grid cells have been found for bats crawling on a two-dimensional surface (Yartsev et al., 2011), but three-dimensional grid cells have never been observed for flying bats." This is not true. Ginosar et al. (2021) observed 3D grid cells in flying bats.  

      Thank you for highlighting this issue. In the initial submission I was using “grid cell” to mean a cell that produced a precise hexagonal grid, which is not the case for the 3D grid cells in bats. In this revision, I now discuss grid cells that produce irregular grid fields, writing:

      “According to this model, hexagonally arranged grid cells should be the exception rather than the rule when considering more naturalistic environments. In a more ecologically valid situation, such as with landmarks, varied sounds, food sources, threats, and interactions with conspecifics, there may still be remembered locations where events occurred or remembered properties can be found, but because the non-spatial properties are non-uniform in the environment, the arrangement of memory feedback will be irregular, reflecting the varied nature of the environment. This may explain the finding that even in a situation where there are regular hexagonal grid cells, there are often irregular non-grid cells that have a reliable multi-location firing field, but the arrangement of the firing fields is irregular (Diehl et al., 2017). For instance, even when navigating in an enclosure that has uniform properties as dictated by experimental procedures, there may be other properties that were not well-controlled (e.g., a view of exterior lighting in some locations but not others), and these uncontrolled properties may produce an irregular grid (i.e., because the uncontrolled properties are reliably associated with some locations but not others, hippocampal memory feedback triggers retrieval of those properties in the associated locations).

      In this memory model, there are other situations in which an irregular but reliable multi-location grid may occur, even when everything is well controlled. In the reported simulations, when the hippocampal place cells were based on variation in X/Y (as defined by Border cells), nothing else changed as a function of location, and the model rapidly produced a precise hexagonal arrangement of hippocampal place cell memories. When head direction was included (i.e., real-world variation in X, Y, and head direction), the model still produced a hexagonal arrangement as per face centered cubic packing of memories, but this precise arrangement was slower to emerge, with place cells continuing to shift their positions until the borders of the enclosure were sufficiently well learned from multiple viewpoints. If there is real-world variation in four or more dimensions, as is likely the case in a more ecologically valid situation, it will be even harder for place cell memories to settle on a precise regular lattice. Furthermore, in the case of four dimensions, mathematicians studying the “sphere packing problem” recently concluded that densest packing is irregular (Campos et al., 2023). This may explain why the multifield grid cells for freely flying bats have a systematic minimum distance between firing fields, but their arrangement is globally irregular (Ginosar et al., 2021). Assuming that the memories encoded by a bat include not just the three real-world dimensions of variation, but also head direction, the grid will likely be irregular even under optimal conditions of laboratory control.”

      Multiple typos are found on page 25, end of paragraph 3: "More specifically, if there is one set of non-orthogonal dimensions for enclosure borders and the movement of one wall is too modest as to cause avoid global remapping, this would deform the grid modules based the enclosure border cells."

      As detailed above in the response to the public reviews, this paragraph has been rewritten.

    1. According to all known laws of aviation,

      there is no way a bee should be able to fly.

      Its wings are too small to get its fat little body off the ground.

      The bee, of course, flies anyway

      because bees don't care what humans think is impossible.

      Yellow, black. Yellow, black. Yellow, black. Yellow, black.

      Ooh, black and yellow! Let's shake it up a little.

      Barry! Breakfast is ready!

      Coming!

      Hang on a second.

      Hello?

      Barry?

      Adam?

      Can you believe this is happening?

      I can't. I'll pick you up.

      Looking sharp.

      Use the stairs. Your father paid good money for those.

      Sorry. I'm excited.

      Here's the graduate. We're very proud of you, son.

      A perfect report card, all B's.

      Very proud.

      Ma! I got a thing going here.

      You got lint on your fuzz.

      Ow! That's me!

      Wave to us! We'll be in row 118,000.

      Bye!

      Barry, I told you, stop flying in the house!

      Hey, Adam.

      Hey, Barry.

      Is that fuzz gel?

      A little. Special day, graduation.

      Never thought I'd make it.

      Three days grade school, three days high school.

      Those were awkward.

      Three days college. I'm glad I took a day and hitchhiked around the hive.

      You did come back different.

      Hi, Barry.

      Artie, growing a mustache? Looks good.

      Hear about Frankie?

      Yeah.

      You going to the funeral?

      No, I'm not going.

      Everybody knows, sting someone, you die.

      Don't waste it on a squirrel. Such a hothead.

      I guess he could have just gotten out of the way.

      I love this incorporating an amusement park into our day.

      That's why we don't need vacations.

      Boy, quite a bit of pomp… under the circumstances.

      Well, Adam, today we are men.

      We are!

      Bee-men.

      Amen!

      Hallelujah!

      Students, faculty, distinguished bees,

      please welcome Dean Buzzwell.

      Welcome, New Hive City graduating class of…

      …9:15.

      That concludes our ceremonies.

      And begins your career at Honex Industries!

      Will we pick our job today?

      I heard it's just orientation.

      Heads up! Here we go.

      Keep your hands and antennas inside the tram at all times.

      Wonder what it'll be like? A little scary. Welcome to Honex, a division of Honesco

      and a part of the Hexagon Group.

      This is it!

      Wow.

      Wow.

      We know that you, as a bee, have worked your whole life

      to get to the point where you can work for your whole life.

      Honey begins when our valiant Pollen Jocks bring the nectar to the hive.

      Our top-secret formula

      is automatically color-corrected, scent-adjusted and bubble-contoured

      into this soothing sweet syrup

      with its distinctive golden glow you know as…

      Honey!

      That girl was hot.

      She's my cousin!

      She is?

      Yes, we're all cousins.

      Right. You're right.

      At Honex, we constantly strive

      to improve every aspect of bee existence.

      These bees are stress-testing a new helmet technology.

      What do you think he makes? Not enough. Here we have our latest advancement, the Krelman.

      What does that do? Catches that little strand of honey that hangs after you pour it. Saves us millions.

      Can anyone work on the Krelman?

      Of course. Most bee jobs are small ones. But bees know

      that every small job, if it's done well, means a lot.

      But choose carefully

      because you'll stay in the job you pick for the rest of your life.

      The same job the rest of your life? I didn't know that.

      What's the difference?

      You'll be happy to know that bees, as a species, haven't had one day off

      in 27 million years.

      So you'll just work us to death?

      We'll sure try.

      Wow! That blew my mind!

      "What's the difference?" How can you say that?

      One job forever? That's an insane choice to have to make.

      I'm relieved. Now we only have to make one decision in life.

      But, Adam, how could they never have told us that?

      Why would you question anything? We're bees.

      We're the most perfectly functioning society on Earth.

      You ever think maybe things work a little too well here?

      Like what? Give me one example.

      I don't know. But you know what I'm talking about.

      Please clear the gate. Royal Nectar Force on approach.

      Wait a second. Check it out.

      Hey, those are Pollen Jocks! Wow. I've never seen them this close.

      They know what it's like outside the hive.

      Yeah, but some don't come back.

      Hey, Jocks! Hi, Jocks! You guys did great!

      You're monsters! You're sky freaks! I love it! I love it!

      I wonder where they were. I don't know. Their day's not planned.

      Outside the hive, flying who knows where, doing who knows what.

      You can't just decide to be a Pollen Jock. You have to be bred for that.

      Right.

      Look. That's more pollen than you and I will see in a lifetime.

      It's just a status symbol. Bees make too much of it.

      Perhaps. Unless you're wearing it and the ladies see you wearing it.

      Those ladies? Aren't they our cousins too?

      Distant. Distant.

      Look at these two.

      Couple of Hive Harrys. Let's have fun with them. It must be dangerous being a Pollen Jock.

      Yeah. Once a bear pinned me against a mushroom!

      He had a paw on my throat, and with the other, he was slapping me!

      Oh, my! I never thought I'd knock him out. What were you doing during this?

      Trying to alert the authorities.

      I can autograph that.

      A little gusty out there today, wasn't it, comrades?

      Yeah. Gusty.

      We're hitting a sunflower patch six miles from here tomorrow.

      Six miles, huh? Barry! A puddle jump for us, but maybe you're not up for it.

      Maybe I am. You are not! We're going 0900 at J-Gate.

      What do you think, buzzy-boy? Are you bee enough?

      I might be. It all depends on what 0900 means.

      Hey, Honex!

      Dad, you surprised me.

      You decide what you're interested in?

      Well, there's a lot of choices. But you only get one. Do you ever get bored doing the same job every day?

      Son, let me tell you about stirring.

      You grab that stick, and you just move it around, and you stir it around.

      You get yourself into a rhythm. It's a beautiful thing.

      You know, Dad, the more I think about it,

      maybe the honey field just isn't right for me.

      You were thinking of what, making balloon animals?

      That's a bad job for a guy with a stinger.

      Janet, your son's not sure he wants to go into honey!

      Barry, you are so funny sometimes. I'm not trying to be funny. You're not funny! You're going into honey. Our son, the stirrer!

      You're gonna be a stirrer? No one's listening to me! Wait till you see the sticks I have.

      I could say anything right now. I'm gonna get an ant tattoo!

      Let's open some honey and celebrate!

      Maybe I'll pierce my thorax. Shave my antennae.

      Shack up with a grasshopper. Get a gold tooth and call everybody "dawg"!

      I'm so proud.

      We're starting work today! Today's the day. Come on! All the good jobs will be gone.

      Yeah, right.

      Pollen counting, stunt bee, pouring, stirrer, front desk, hair removal…

      Is it still available? Hang on. Two left! One of them's yours! Congratulations! Step to the side.

      What'd you get? Picking crud out. Stellar! Wow!

      Couple of newbies?

      Yes, sir! Our first day! We are ready!

      Make your choice.

      You want to go first? No, you go. Oh, my. What's available?

      Restroom attendant's open, not for the reason you think.

      Any chance of getting the Krelman? Sure, you're on. I'm sorry, the Krelman just closed out.

      Wax monkey's always open.

      The Krelman opened up again.

      What happened?

      A bee died. Makes an opening. See? He's dead. Another dead one.

      Deady. Deadified. Two more dead.

      Dead from the neck up. Dead from the neck down. That's life!

      Oh, this is so hard!

      Heating, cooling, stunt bee, pourer, stirrer,

      humming, inspector number seven, lint coordinator, stripe supervisor,

      mite wrangler. Barry, what do you think I should… Barry?

      Barry!

      All right, we've got the sunflower patch in quadrant nine…

      What happened to you? Where are you?

      I'm going out.

      Out? Out where?

      Out there.

      Oh, no!

      I have to, before I go to work for the rest of my life.

      You're gonna die! You're crazy! Hello?

      Another call coming in.

      If anyone's feeling brave, there's a Korean deli on 83rd

      that gets their roses today.

      Hey, guys.

      Look at that. Isn't that the kid we saw yesterday? Hold it, son, flight deck's restricted.

      It's OK, Lou. We're gonna take him up.

      Really? Feeling lucky, are you?

      Sign here, here. Just initial that.

      Thank you. OK. You got a rain advisory today,

      and as you all know, bees cannot fly in rain.

      So be careful. As always, watch your brooms,

      hockey sticks, dogs, birds, bears and bats.

      Also, I got a couple of reports of root beer being poured on us.

      Murphy's in a home because of it, babbling like a cicada!

      That's awful. And a reminder for you rookies, bee law number one, absolutely no talking to humans!

      All right, launch positions!

      Buzz, buzz, buzz, buzz! Buzz, buzz, buzz, buzz! Buzz, buzz, buzz, buzz!

      Black and yellow!

      Hello!

      You ready for this, hot shot?

      Yeah. Yeah, bring it on.

      Wind, check.

      Antennae, check.

      Nectar pack, check.

      Wings, check.

      Stinger, check.

      Scared out of my shorts, check.

      OK, ladies,

      let's move it out!

      Pound those petunias, you striped stem-suckers!

      All of you, drain those flowers!

      Wow! I'm out!

      I can't believe I'm out!

      So blue.

      I feel so fast and free!

      Box kite!

      Wow!

      Flowers!

      This is Blue Leader. We have roses visual.

      Bring it around 30 degrees and hold.

      Roses!

      30 degrees, roger. Bringing it around.

      Stand to the side, kid. It's got a bit of a kick.

      That is one nectar collector!

      Ever see pollination up close? No, sir. I pick up some pollen here, sprinkle it over here. Maybe a dash over there,

      a pinch on that one. See that? It's a little bit of magic.

      That's amazing. Why do we do that?

      That's pollen power. More pollen, more flowers, more nectar, more honey for us.

      Cool.

      I'm picking up a lot of bright yellow. Could be daisies. Don't we need those?

      Copy that visual.

      Wait. One of these flowers seems to be on the move.

      Say again? You're reporting a moving flower?

      Affirmative.

      That was on the line!

      This is the coolest. What is it?

      I don't know, but I'm loving this color.

      It smells good. Not like a flower, but I like it.

      Yeah, fuzzy.

      Chemical-y.

      Careful, guys. It's a little grabby.

      My sweet lord of bees!

      Candy-brain, get off there!

      Problem!

      Guys! This could be bad. Affirmative.

      Very close.

      Gonna hurt.

      Mama's little boy.

      You are way out of position, rookie!

      Coming in at you like a missile!

      Help me!

      I don't think these are flowers.

      Should we tell him? I think he knows. What is this?!

      Match point!

      You can start packing up, honey, because you're about to eat it!

      Yowser!

      Gross.

      There's a bee in the car!

      Do something!

      I'm driving!

      Hi, bee.

      He's back here!

      He's going to sting me!

      Nobody move. If you don't move, he won't sting you. Freeze!

      He blinked!

      Spray him, Granny!

      What are you doing?!

      Wow… the tension level out here is unbelievable.

      I gotta get home.

      Can't fly in rain.

      Can't fly in rain.

      Can't fly in rain.

      Mayday! Mayday! Bee going down!

      Ken, could you close the window please?

      Ken, could you close the window please?

      Check out my new resume. I made it into a fold-out brochure.

      You see? Folds out.

      Oh, no. More humans. I don't need this.

      What was that?

      Maybe this time. This time. This time. This time! This time! This…

      Drapes!

      That is diabolical.

      It's fantastic. It's got all my special skills, even my top-ten favorite movies.

      What's number one? Star Wars?

      Nah, I don't go for that…

      …kind of stuff.

      No wonder we shouldn't talk to them. They're out of their minds.

      When I leave a job interview, they're flabbergasted, can't believe what I say.

      There's the sun. Maybe that's a way out.

      I don't remember the sun having a big 75 on it.

      I predicted global warming.

      I could feel it getting hotter. At first I thought it was just me.

      Wait! Stop! Bee!

      Stand back. These are winter boots.

      Wait!

      Don't kill him!

      You know I'm allergic to them! This thing could kill me!

      Why does his life have less value than yours?

      Why does his life have any less value than mine? Is that your statement?

      I'm just saying all life has value. You don't know what he's capable of feeling.

      My brochure!

      There you go, little guy.

      I'm not scared of him. It's an allergic thing.

      Put that on your resume brochure.

      My whole face could puff up.

      Make it one of your special skills.

      Knocking someone out is also a special skill.

      Right. Bye, Vanessa. Thanks.

      Vanessa, next week? Yogurt night?

      Sure, Ken. You know, whatever.

      You could put carob chips on there.

      Bye.

      Supposed to be less calories.

      Bye.

      I gotta say something.

      She saved my life. I gotta say something.

      All right, here it goes.

      Nah.

      What would I say?

      I could really get in trouble.

      It's a bee law. You're not supposed to talk to a human.

      I can't believe I'm doing this.

      I've got to.

      Oh, I can't do it. Come on!

      No. Yes. No.

      Do it. I can't.

      How should I start it? "You like jazz?" No, that's no good.

      Here she comes! Speak, you fool!

      Hi!

      I'm sorry.

      You're talking. Yes, I know. You're talking!

      I'm so sorry.

      No, it's OK. It's fine. I know I'm dreaming.

      But I don't recall going to bed.

      Well, I'm sure this is very disconcerting.

      This is a bit of a surprise to me. I mean, you're a bee!

      I am. And I'm not supposed to be doing this,

      but they were all trying to kill me.

      And if it wasn't for you…

      I had to thank you. It's just how I was raised.

      That was a little weird.

      I'm talking with a bee. Yeah. I'm talking to a bee. And the bee is talking to me!

      I just want to say I'm grateful. I'll leave now.

      Wait! How did you learn to do that? What? The talking thing.

      Same way you did, I guess. "Mama, Dada, honey." You pick it up.

      That's very funny. Yeah. Bees are funny. If we didn't laugh, we'd cry with what we have to deal with.

      Anyway…

      Can I…

      …get you something?

      Like what? I don't know. I mean… I don't know. Coffee?

      I don't want to put you out.

      It's no trouble. It takes two minutes.

      It's just coffee.

      I hate to impose.

      Don't be ridiculous!

      Actually, I would love a cup.

      Hey, you want rum cake?

      I shouldn't.

      Have some.

      No, I can't.

      Come on!

      I'm trying to lose a couple micrograms.

      Where? These stripes don't help. You look great!

      I don't know if you know anything about fashion.

      Are you all right?

      No.

      He's making the tie in the cab as they're flying up Madison.

      He finally gets there.

      He runs up the steps into the church. The wedding is on.

      And he says, "Watermelon? I thought you said Guatemalan.

      Why would I marry a watermelon?"

      Is that a bee joke?

      That's the kind of stuff we do.

      Yeah, different.

      So, what are you gonna do, Barry?

      About work? I don't know.

      I want to do my part for the hive, but I can't do it the way they want.

      I know how you feel.

      You do? Sure. My parents wanted me to be a lawyer or a doctor, but I wanted to be a florist.

      Really? My only interest is flowers. Our new queen was just elected with that same campaign slogan.

      Anyway, if you look…

      There's my hive right there. See it?

      You're in Sheep Meadow!

      Yes! I'm right off the Turtle Pond!

      No way! I know that area. I lost a toe ring there once.

      Why do girls put rings on their toes?

      Why not?

      It's like putting a hat on your knee.

      Maybe I'll try that.

      You all right, ma'am?

      Oh, yeah. Fine.

      Just having two cups of coffee!

      Anyway, this has been great. Thanks for the coffee.

      Yeah, it's no trouble.

      Sorry I couldn't finish it. If I did, I'd be up the rest of my life.

      Are you…?

      Can I take a piece of this with me?

      Sure! Here, have a crumb.

      Thanks! Yeah. All right. Well, then… I guess I'll see you around.

      Or not.

      OK, Barry.

      And thank you so much again… for before.

      Oh, that? That was nothing.

      Well, not nothing, but… Anyway…

      This can't possibly work.

      He's all set to go. We may as well try it.

      OK, Dave, pull the chute.

      Sounds amazing. It was amazing! It was the scariest, happiest moment of my life.

      Humans! I can't believe you were with humans!

      Giant, scary humans! What were they like?

      Huge and crazy. They talk crazy.

      They eat crazy giant things. They drive crazy.

      Do they try and kill you, like on TV?

      Some of them. But some of them don't.

      How'd you get back?

      Poodle.

      You did it, and I'm glad. You saw whatever you wanted to see.

      You had your "experience." Now you can pick out your job and be normal.

      Well… Well? Well, I met someone.

      You did? Was she Bee-ish?

      A wasp?! Your parents will kill you!

      No, no, no, not a wasp.

      Spider?

      I'm not attracted to spiders.

      I know it's the hottest thing, with the eight legs and all.

      I can't get by that face.

      So who is she?

      She's… human.

      No, no. That's a bee law. You wouldn't break a bee law.

      Her name's Vanessa. Oh, boy. She's so nice. And she's a florist!

      Oh, no! You're dating a human florist!

      We're not dating.

      You're flying outside the hive, talking to humans that attack our homes

      with power washers and M-80s! One-eighth a stick of dynamite!

      She saved my life! And she understands me.

      This is over!

      Eat this.

      This is not over! What was that?

      They call it a crumb. It was so stingin' stripey! And that's not what they eat. That's what falls off what they eat!

      You know what a Cinnabon is? No. It's bread and cinnamon and frosting. They heat it up…

      Sit down!

      …really hot!

      Listen to me! We are not them! We're us. There's us and there's them!

      Yes, but who can deny the heart that is yearning?

      There's no yearning. Stop yearning. Listen to me!

      You have got to start thinking bee, my friend. Thinking bee!

      Thinking bee. Thinking bee. Thinking bee! Thinking bee! Thinking bee! Thinking bee!

      There he is. He's in the pool.

      You know what your problem is, Barry?

      I gotta start thinking bee?

      How much longer will this go on?

      It's been three days! Why aren't you working?

      I've got a lot of big life decisions to think about.

      What life? You have no life! You have no job. You're barely a bee!

      Would it kill you to make a little honey?

      Barry, come out. Your father's talking to you.

      Martin, would you talk to him?

      Barry, I'm talking to you!

      You coming?

      Got everything?

      All set!

      Go ahead. I'll catch up.

      Don't be too long.

      Watch this!

      Vanessa!

      We're still here. I told you not to yell at him. He doesn't respond to yelling!

      Then why yell at me? Because you don't listen! I'm not listening to this.

      Sorry, I've gotta go.

      Where are you going? I'm meeting a friend. A girl? Is this why you can't decide?

      Bye.

      I just hope she's Bee-ish.

      They have a huge parade of flowers every year in Pasadena?

      To be in the Tournament of Roses, that's every florist's dream!

      Up on a float, surrounded by flowers, crowds cheering.

      A tournament. Do the roses compete in athletic events?

      No. All right, I've got one. How come you don't fly everywhere?

      It's exhausting. Why don't you run everywhere? It's faster.

      Yeah, OK, I see, I see. All right, your turn.

      TiVo. You can just freeze live TV? That's insane!

      You don't have that?

      We have Hivo, but it's a disease. It's a horrible, horrible disease.

      Oh, my.

      Dumb bees!

      You must want to sting all those jerks.

      We try not to sting. It's usually fatal for us.

      So you have to watch your temper.

      Very carefully. You kick a wall, take a walk,

      write an angry letter and throw it out. Work through it like any emotion:

      Anger, jealousy, lust.

      Oh, my goodness! Are you OK?

      Yeah.

      What is wrong with you?! It's a bug. He's not bothering anybody. Get out of here, you creep!

      What was that? A Pic 'N' Save circular?

      Yeah, it was. How did you know?

      It felt like about 10 pages. Seventy-five is pretty much our limit.

      You've really got that down to a science.

      I lost a cousin to Italian Vogue. I'll bet. What in the name of Mighty Hercules is this?

      How did this get here? Cute Bee, Golden Blossom,

      Ray Liotta Private Select?

      Is he that actor?

      I never heard of him.

      Why is this here?

      For people. We eat it.

      You don't have enough food of your own?

      Well, yes.

      How do you get it?

      Bees make it.

      I know who makes it!

      And it's hard to make it!

      There's heating, cooling, stirring. You need a whole Krelman thing!

      It's organic. It's our-ganic! It's just honey, Barry.

      Just what?!

      Bees don't know about this! This is stealing! A lot of stealing!

      You've taken our homes, schools, hospitals! This is all we have!

      And it's on sale?! I'm getting to the bottom of this.

      I'm getting to the bottom of all of this!

      Hey, Hector.

      You almost done? Almost. He is here. I sense it.

      Well, I guess I'll go home now

      and just leave this nice honey out, with no one around.

      You're busted, box boy!

      I knew I heard something. So you can talk!

      I can talk. And now you'll start talking!

      Where you getting the sweet stuff? Who's your supplier?

      I don't understand. I thought we were friends.

      The last thing we want to do is upset bees!

      You're too late! It's ours now!

      You, sir, have crossed the wrong sword!

      You, sir, will be lunch for my iguana, Ignacio!

      Where is the honey coming from?

      Tell me where!

      Honey Farms! It comes from Honey Farms!

      Crazy person!

      What horrible thing has happened here?

      These faces, they never knew what hit them. And now

      they're on the road to nowhere!

      Just keep still.

      What? You're not dead?

      Do I look dead? They will wipe anything that moves. Where you headed?

      To Honey Farms. I am onto something huge here.

      I'm going to Alaska. Moose blood, crazy stuff. Blows your head off!

      I'm going to Tacoma.

      And you? He really is dead. All right.

      Uh-oh!

      What is that?!

      Oh, no!

      A wiper! Triple blade!

      Triple blade?

      Jump on! It's your only chance, bee!

      Why does everything have to be so doggone clean?!

      How much do you people need to see?!

      Open your eyes! Stick your head out the window!

      From NPR News in Washington, I'm Carl Kasell.

      But don't kill no more bugs!

      Bee!

      Moose blood guy!!

      You hear something?

      Like what?

      Like tiny screaming.

      Turn off the radio.

      Whassup, bee boy?

      Hey, Blood.

      Just a row of honey jars, as far as the eye could see.

      Wow!

      I assume wherever this truck goes is where they're getting it.

      I mean, that honey's ours.

      Bees hang tight. We're all jammed in. It's a close community.

      Not us, man. We on our own. Every mosquito on his own.

      What if you get in trouble? You a mosquito, you in trouble. Nobody likes us. They just smack. See a mosquito, smack, smack!

      At least you're out in the world. You must meet girls.

      Mosquito girls try to trade up, get with a moth, dragonfly.

      Mosquito girl don't want no mosquito.

      You got to be kidding me!

      Mooseblood's about to leave the building! So long, bee!

      Hey, guys! Mooseblood! I knew I'd catch y'all down here. Did you bring your crazy straw?

      We throw it in jars, slap a label on it, and it's pretty much pure profit.

      What is this place?

      A bee's got a brain the size of a pinhead.

      They are pinheads!

      Pinhead.

      Check out the new smoker. Oh, sweet. That's the one you want. The Thomas 3000!

      Smoker?

      Ninety puffs a minute, semi-automatic. Twice the nicotine, all the tar.

      A couple breaths of this knocks them right out.

      They make the honey, and we make the money.

      "They make the honey, and we make the money"?

      Oh, my!

      What's going on? Are you OK?

      Yeah. It doesn't last too long.

      Do you know you're in a fake hive with fake walls?

      Our queen was moved here. We had no choice.

      This is your queen? That's a man in women's clothes!

      That's a drag queen!

      What is this?

      Oh, no!

      There's hundreds of them!

      Bee honey.

      Our honey is being brazenly stolen on a massive scale!

      This is worse than anything bears have done! I intend to do something.

      Oh, Barry, stop.

      Who told you humans are taking our honey? That's a rumor.

      Do these look like rumors?

      That's a conspiracy theory. These are obviously doctored photos.

      How did you get mixed up in this?

      He's been talking to humans.

      What? Talking to humans?! He has a human girlfriend. And they make out!

      Make out? Barry!

      We do not.

      You wish you could. Whose side are you on? The bees!

      I dated a cricket once in San Antonio. Those crazy legs kept me up all night.

      Barry, this is what you want to do with your life?

      I want to do it for all our lives. Nobody works harder than bees!

      Dad, I remember you coming home so overworked

      your hands were still stirring. You couldn't stop.

      I remember that.

      What right do they have to our honey?

      We live on two cups a year. They put it in lip balm for no reason whatsoever!

      Even if it's true, what can one bee do?

      Sting them where it really hurts.

      In the face! The eye!

      That would hurt. No. Up the nose? That's a killer.

      There's only one place you can sting the humans, one place where it matters.

      Hive at Five, the hive's only full-hour action news source.

      No more bee beards!

      With Bob Bumble at the anchor desk.

      Weather with Storm Stinger.

      Sports with Buzz Larvi.

      And Jeanette Chung.

      Good evening. I'm Bob Bumble. And I'm Jeanette Chung. A tri-county bee, Barry Benson,

      intends to sue the human race for stealing our honey,

      packaging it and profiting from it illegally!

      Tomorrow night on Bee Larry King,

      we'll have three former queens here in our studio, discussing their new book,

      Classy Ladies, out this week on Hexagon.

      Tonight we're talking to Barry Benson.

      Did you ever think, "I'm a kid from the hive. I can't do this"?

      Bees have never been afraid to change the world.

      What about Bee Columbus? Bee Gandhi? Bejesus?

      Where I'm from, we'd never sue humans.

      We were thinking of stickball or candy stores.

      How old are you?

      The bee community is supporting you in this case,

      which will be the trial of the bee century.

      You know, they have a Larry King in the human world too.

      It's a common name. Next week…

      He looks like you and has a show and suspenders and colored dots…

      Next week…

      Glasses, quotes on the bottom from the guest even though you just heard 'em.

      Bear Week next week! They're scary, hairy and here live.

      Always leans forward, pointy shoulders, squinty eyes, very Jewish.

      In tennis, you attack at the point of weakness!

      It was my grandmother, Ken. She's 81.

      Honey, her backhand's a joke! I'm not gonna take advantage of that?

      Quiet, please. Actual work going on here.

      Is that that same bee? Yes, it is! I'm helping him sue the human race.

      Hello. Hello, bee. This is Ken.

      Yeah, I remember you. Timberland, size ten and a half. Vibram sole, I believe.

      Why does he talk again?

      Listen, you better go 'cause we're really busy working.

      But it's our yogurt night!

      Bye-bye.

      Why is yogurt night so difficult?!

      You poor thing. You two have been at this for hours!

      Yes, and Adam here has been a huge help.

      Frosting… How many sugars? Just one. I try not to use the competition.

      So why are you helping me?

      Bees have good qualities.

      And it takes my mind off the shop.

      Instead of flowers, people are giving balloon bouquets now.

      Those are great, if you're three.

      And artificial flowers.

      Oh, those just get me psychotic! Yeah, me too. Bent stingers, pointless pollination.

      Bees must hate those fake things!

      Nothing worse than a daffodil that's had work done.

      Maybe this could make up for it a little bit.

      This lawsuit's a pretty big deal. I guess. You sure you want to go through with it?

      Am I sure? When I'm done with the humans, they won't be able

      to say, "Honey, I'm home," without paying a royalty!

      It's an incredible scene here in downtown Manhattan,

      where the world anxiously waits, because for the first time in history,

      we will hear for ourselves if a honeybee can actually speak.

      What have we gotten into here, Barry?

      It's pretty big, isn't it?

      I can't believe how many humans don't work during the day.

      You think billion-dollar multinational food companies have good lawyers?

      Everybody needs to stay behind the barricade.

      What's the matter? I don't know, I just got a chill. Well, if it isn't the bee team.

      You boys work on this?

      All rise! The Honorable Judge Bumbleton presiding.

      All right. Case number 4475,

      Superior Court of New York, Barry Bee Benson v. the Honey Industry

      is now in session.

      Mr. Montgomery, you're representing the five food companies collectively?

      A privilege.

      Mr. Benson… you're representing all the bees of the world?

      I'm kidding. Yes, Your Honor, we're ready to proceed.

      Mr. Montgomery, your opening statement, please.

      Ladies and gentlemen of the jury,

      my grandmother was a simple woman.

      Born on a farm, she believed it was man's divine right

      to benefit from the bounty of nature God put before us.

      If we lived in the topsy-turvy world Mr. Benson imagines,

      just think of what would it mean.

      I would have to negotiate with the silkworm

      for the elastic in my britches!

      Talking bee!

      How do we know this isn't some sort of

      holographic motion-picture-capture Hollywood wizardry?

      They could be using laser beams!

      Robotics! Ventriloquism! Cloning! For all we know,

      he could be on steroids!

      Mr. Benson?

      Ladies and gentlemen, there's no trickery here.

      I'm just an ordinary bee. Honey's pretty important to me.

      It's important to all bees. We invented it!

      We make it. And we protect it with our lives.

      Unfortunately, there are some people in this room

      who think they can take it from us

      'cause we're the little guys! I'm hoping that, after this is all over,

      you'll see how, by taking our honey, you not only take everything we have

      but everything we are!

      I wish he'd dress like that all the time. So nice!

      Call your first witness.

      So, Mr. Klauss Vanderhayden of Honey Farms, big company you have.

      I suppose so.

      I see you also own Honeyburton and Honron!

      Yes, they provide beekeepers for our farms.

      Beekeeper. I find that to be a very disturbing term.

      I don't imagine you employ any bee-free-ers, do you?

      No.

      I couldn't hear you.

      No.

      No.

      Because you don't free bees. You keep bees. Not only that,

      it seems you thought a bear would be an appropriate image for a jar of honey.

      They're very lovable creatures.

      Yogi Bear, Fozzie Bear, Build-A-Bear.

      You mean like this?

      Bears kill bees!

      How'd you like his head crashing through your living room?!

      Biting into your couch! Spitting out your throw pillows!

      OK, that's enough. Take him away.

      So, Mr. Sting, thank you for being here. Your name intrigues me.

      Where have I heard it before? I was with a band called The Police. But you've never been a police officer, have you?

      No, I haven't.

      No, you haven't. And so here we have yet another example

      of bee culture casually stolen by a human

      for nothing more than a prance-about stage name.

      Oh, please.

      Have you ever been stung, Mr. Sting?

      Because I'm feeling a little stung, Sting.

      Or should I say… Mr. Gordon M. Sumner!

      That's not his real name?! You idiots!

      Mr. Liotta, first, belated congratulations on

      your Emmy win for a guest spot on ER in 2005.

      Thank you. Thank you.

      I see from your resume that you're devilishly handsome

      with a churning inner turmoil that's ready to blow.

      I enjoy what I do. Is that a crime?

      Not yet it isn't. But is this what it's come to for you?

      Exploiting tiny, helpless bees so you don't

      have to rehearse your part and learn your lines, sir?

      Watch it, Benson! I could blow right now!

      This isn't a goodfella. This is a badfella!

      Why doesn't someone just step on this creep, and we can all go home?!

      Order in this court! You're all thinking it! Order! Order, I say!

      Say it! Mr. Liotta, please sit down! I think it was awfully nice of that bear to pitch in like that.

      I think the jury's on our side.

      Are we doing everything right, legally?

      I'm a florist.

      Right. Well, here's to a great team.

      To a great team!

      Well, hello.

      Ken! Hello. I didn't think you were coming.

      No, I was just late. I tried to call, but… the battery.

      I didn't want all this to go to waste, so I called Barry. Luckily, he was free.

      Oh, that was lucky.

      There's a little left. I could heat it up.

      Yeah, heat it up, sure, whatever.

      So I hear you're quite a tennis player.

      I'm not much for the game myself. The ball's a little grabby.

      That's where I usually sit. Right… there.

      Ken, Barry was looking at your resume,

      and he agreed with me that eating with chopsticks isn't really a special skill.

      You think I don't see what you're doing?

      I know how hard it is to find the right job. We have that in common.

      Do we?

      Bees have 100 percent employment, but we do jobs like taking the crud out.

      That's just what I was thinking about doing.

      Ken, I let Barry borrow your razor for his fuzz. I hope that was all right.

      I'm going to drain the old stinger.

      Yeah, you do that.

      Look at that.

      You know, I've just about had it

      with your little mind games.

      What's that? Italian Vogue. Mamma mia, that's a lot of pages.

      A lot of ads.

      Remember what Van said, why is your life more valuable than mine?

      Funny, I just can't seem to recall that!

      I think something stinks in here!

      I love the smell of flowers.

      How do you like the smell of flames?!

      Not as much.

      Water bug! Not taking sides!

      Ken, I'm wearing a Chapstick hat! This is pathetic!

      I've got issues!

      Well, well, well, a royal flush!

      You're bluffing. Am I? Surf's up, dude!

      Poo water!

      That bowl is gnarly.

      Except for those dirty yellow rings!

      Kenneth! What are you doing?!

      You know, I don't even like honey! I don't eat it!

      We need to talk!

      He's just a little bee!

      And he happens to be the nicest bee I've met in a long time!

      Long time? What are you talking about?! Are there other bugs in your life?

      No, but there are other things bugging me in life. And you're one of them!

      Fine! Talking bees, no yogurt night…

      My nerves are fried from riding on this emotional roller coaster!

      Goodbye, Ken.

      And for your information,

      I prefer sugar-free, artificial sweeteners made by man!

      I'm sorry about all that.

      I know it's got an aftertaste! I like it!

      I always felt there was some kind of barrier between Ken and me.

      I couldn't overcome it. Oh, well.

      Are you OK for the trial?

      I believe Mr. Montgomery is about out of ideas.

      We would like to call Mr. Barry Benson Bee to the stand.

      Good idea! You can really see why he's considered one of the best lawyers…

      Yeah.

      Layton, you've gotta weave some magic

      with this jury, or it's gonna be all over.

      Don't worry. The only thing I have to do to turn this jury around

      is to remind them of what they don't like about bees.

      You got the tweezers? Are you allergic? Only to losing, son. Only to losing.

      Mr. Benson Bee, I'll ask you what I think we'd all like to know.

      What exactly is your relationship

      to that woman?

      We're friends.

      Good friends? Yes. How good? Do you live together?

      Wait a minute…

      Are you her little…

      …bedbug?

      I've seen a bee documentary or two. From what I understand,

      doesn't your queen give birth to all the bee children?

      Yeah, but…

      So those aren't your real parents!

      Oh, Barry…

      Yes, they are!

      Hold me back!

      You're an illegitimate bee, aren't you, Benson?

      He's denouncing bees!

      Don't y'all date your cousins?

      Objection! I'm going to pincushion this guy! Adam, don't! It's what he wants!

      Oh, I'm hit!!

      Oh, lordy, I am hit!

      Order! Order!

      The venom! The venom is coursing through my veins!

      I have been felled by a winged beast of destruction!

      You see? You can't treat them like equals! They're striped savages!

      Stinging's the only thing they know! It's their way!

      Adam, stay with me. I can't feel my legs. What angel of mercy will come forward to suck the poison

      from my heaving buttocks?

      I will have order in this court. Order!

      Order, please!

      The case of the honeybees versus the human race

      took a pointed turn against the bees

      yesterday when one of their legal team stung Layton T. Montgomery.

      Hey, buddy.

      Hey.

      Is there much pain?

      Yeah.

      I…

      I blew the whole case, didn't I?

      It doesn't matter. What matters is you're alive. You could have died.

      I'd be better off dead. Look at me.

      They got it from the cafeteria downstairs, in a tuna sandwich.

      Look, there's a little celery still on it.

      What was it like to sting someone?

      I can't explain it. It was all…

      All adrenaline and then… and then ecstasy!

      All right.

      You think it was all a trap?

      Of course. I'm sorry. I flew us right into this.

      What were we thinking? Look at us. We're just a couple of bugs in this world.

      What will the humans do to us if they win?

      I don't know.

      I hear they put the roaches in motels. That doesn't sound so bad.

      Adam, they check in, but they don't check out!

      Oh, my.

      Could you get a nurse to close that window?

      Why? The smoke. Bees don't smoke.

      Right. Bees don't smoke.

      Bees don't smoke! But some bees are smoking.

      That's it! That's our case!

      It is? It's not over?

      Get dressed. I've gotta go somewhere.

      Get back to the court and stall. Stall any way you can.

      And assuming you've done step correctly, you're ready for the tub.

      Mr. Flayman.

      Yes? Yes, Your Honor!

      Where is the rest of your team?

      Well, Your Honor, it's interesting.

      Bees are trained to fly haphazardly,

      and as a result, we don't make very good time.

      I actually heard a funny story about…

      Your Honor, haven't these ridiculous bugs

      taken up enough of this court's valuable time?

      How much longer will we allow these absurd shenanigans to go on?

      They have presented no compelling evidence to support their charges

      against my clients, who run legitimate businesses.

      I move for a complete dismissal of this entire case!

      Mr. Flayman, I'm afraid I'm going

      to have to consider Mr. Montgomery's motion.

      But you can't! We have a terrific case.

      Where is your proof? Where is the evidence?

      Show me the smoking gun!

      Hold it, Your Honor! You want a smoking gun?

      Here is your smoking gun.

      What is that?

      It's a bee smoker!

      What, this? This harmless little contraption?

      This couldn't hurt a fly, let alone a bee.

      Look at what has happened

      to bees who have never been asked, "Smoking or non?"

      Is this what nature intended for us?

      To be forcibly addicted to smoke machines

      and man-made wooden slat work camps?

      Living out our lives as honey slaves to the white man?

      What are we gonna do? He's playing the species card. Ladies and gentlemen, please, free these bees!

      Free the bees! Free the bees!

      Free the bees!

      Free the bees! Free the bees!

      The court finds in favor of the bees!

      Vanessa, we won!

      I knew you could do it! High-five!

      Sorry.

      I'm OK! You know what this means?

      All the honey will finally belong to the bees.

      Now we won't have to work so hard all the time.

      This is an unholy perversion of the balance of nature, Benson.

      You'll regret this.

      Barry, how much honey is out there?

      All right. One at a time.

      Barry, who are you wearing?

      My sweater is Ralph Lauren, and I have no pants.

      What if Montgomery's right? What do you mean? We've been living the bee way a long time, 27 million years.

      Congratulations on your victory. What will you demand as a settlement?

      First, we'll demand a complete shutdown of all bee work camps.

      Then we want back the honey that was ours to begin with,

      every last drop.

      We demand an end to the glorification of the bear as anything more

      than a filthy, smelly, bad-breath stink machine.

      We're all aware of what they do in the woods.

      Wait for my signal.

      Take him out.

      He'll have nauseous for a few hours, then he'll be fine.

      And we will no longer tolerate bee-negative nicknames…

      But it's just a prance-about stage name!

      …unnecessary inclusion of honey in bogus health products

      and la-dee-da human tea-time snack garnishments.

      Can't breathe.

      Bring it in, boys!

      Hold it right there! Good.

      Tap it.

      Mr. Buzzwell, we just passed three cups, and there's gallons more coming!

      I think we need to shut down! Shut down? We've never shut down. Shut down honey production!

      Stop making honey!

      Turn your key, sir!

      What do we do now?

      Cannonball!

      We're shutting honey production!

      Mission abort.

      Aborting pollination and nectar detail. Returning to base.

      Adam, you wouldn't believe how much honey was out there.

      Oh, yeah?

      What's going on? Where is everybody?

      Are they out celebrating? They're home. They don't know what to do. Laying out, sleeping in.

      I heard your Uncle Carl was on his way to San Antonio with a cricket.

      At least we got our honey back.

      Sometimes I think, so what if humans liked our honey? Who wouldn't?

      It's the greatest thing in the world! I was excited to be part of making it.

      This was my new desk. This was my new job. I wanted to do it really well.

      And now…

      Now I can't.

      I don't understand why they're not happy.

      I thought their lives would be better!

      They're doing nothing. It's amazing. Honey really changes people.

      You don't have any idea what's going on, do you?

      What did you want to show me? This. What happened here?

      That is not the half of it.

      Oh, no. Oh, my.

      They're all wilting.

      Doesn't look very good, does it?

      No.

      And whose fault do you think that is?

      You know, I'm gonna guess bees.

      Bees?

      Specifically, me.

      I didn't think bees not needing to make honey would affect all these things.

      It's not just flowers. Fruits, vegetables, they all need bees.

      That's our whole SAT test right there.

      Take away produce, that affects the entire animal kingdom.

      And then, of course…

      The human species?

      So if there's no more pollination,

      it could all just go south here, couldn't it?

      I know this is also partly my fault.

      How about a suicide pact?

      How do we do it?

      I'll sting you, you step on me. That just kills you twice. Right, right.

      Listen, Barry… sorry, but I gotta get going.

      I had to open my mouth and talk.

      Vanessa?

      Vanessa? Why are you leaving? Where are you going?

      To the final Tournament of Roses parade in Pasadena.

      They've moved it to this weekend because all the flowers are dying.

      It's the last chance I'll ever have to see it.

      Vanessa, I just wanna say I'm sorry. I never meant it to turn out like this.

      I know. Me neither.

      Tournament of Roses. Roses can't do sports.

      Wait a minute. Roses. Roses?

      Roses!

      Vanessa!

      Roses?!

      Barry?

      Roses are flowers! Yes, they are. Flowers, bees, pollen!

      I know. That's why this is the last parade.

      Maybe not. Could you ask him to slow down?

      Could you slow down?

      Barry!

      OK, I made a huge mistake. This is a total disaster, all my fault.

      Yes, it kind of is.

      I've ruined the planet. I wanted to help you

      with the flower shop. I've made it worse.

      Actually, it's completely closed down.

      I thought maybe you were remodeling.

      But I have another idea, and it's greater than my previous ideas combined.

      I don't want to hear it!

      All right, they have the roses, the roses have the pollen.

      I know every bee, plant and flower bud in this park.

      All we gotta do is get what they've got back here with what we've got.

      Bees.

      Park.

      Pollen!

      Flowers.

      Repollination!

      Across the nation!

      Tournament of Roses, Pasadena, California.

      They've got nothing but flowers, floats and cotton candy.

      Security will be tight.

      I have an idea.

      Vanessa Bloome, FTD.

      Official floral business. It's real.

      Sorry, ma'am. Nice brooch.

      Thank you. It was a gift.

      Once inside, we just pick the right float.

      How about The Princess and the Pea?

      I could be the princess, and you could be the pea!

      Yes, I got it.

      Where should I sit?

      What are you?

      I believe I'm the pea.

      The pea?

      It goes under the mattresses.

      Not in this fairy tale, sweetheart. I'm getting the marshal. You do that! This whole parade is a fiasco!

      Let's see what this baby'll do.

      Hey, what are you doing?!

      Then all we do is blend in with traffic…

      …without arousing suspicion.

      Once at the airport, there's no stopping us.

      Stop! Security.

      You and your insect pack your float? Yes. Has it been in your possession the entire time?

      Would you remove your shoes?

      Remove your stinger. It's part of me. I know. Just having some fun. Enjoy your flight.

      Then if we're lucky, we'll have just enough pollen to do the job.

      Can you believe how lucky we are? We have just enough pollen to do the job!

      I think this is gonna work.

      It's got to work.

      Attention, passengers, this is Captain Scott.

      We have a bit of bad weather in New York.

      It looks like we'll experience a couple hours delay.

      Barry, these are cut flowers with no water. They'll never make it.

      I gotta get up there and talk to them.

      Be careful.

      Can I get help with the Sky Mall magazine?

      I'd like to order the talking inflatable nose and ear hair trimmer.

      Captain, I'm in a real situation.

      What'd you say, Hal? Nothing. Bee!

      Don't freak out! My entire species…

      What are you doing?

      Wait a minute! I'm an attorney! Who's an attorney? Don't move.

      Oh, Barry.

      Good afternoon, passengers. This is your captain.

      Would a Miss Vanessa Bloome in 24B please report to the cockpit?

      And please hurry!

      What happened here?

      There was a DustBuster, a toupee, a life raft exploded.

      One's bald, one's in a boat, they're both unconscious!

      Is that another bee joke? No! No one's flying the plane!

      This is JFK control tower, Flight 356. What's your status?

      This is Vanessa Bloome. I'm a florist from New York.

      Where's the pilot?

      He's unconscious, and so is the copilot.

      Not good. Does anyone onboard have flight experience?

      As a matter of fact, there is.

      Who's that? Barry Benson. From the honey trial?! Oh, great.

      Vanessa, this is nothing more than a big metal bee.

      It's got giant wings, huge engines.

      I can't fly a plane.

      Why not? Isn't John Travolta a pilot? Yes. How hard could it be?

      Wait, Barry! We're headed into some lightning.

      This is Bob Bumble. We have some late-breaking news from JFK Airport,

      where a suspenseful scene is developing.

      Barry Benson, fresh from his legal victory…

      That's Barry!

      …is attempting to land a plane, loaded with people, flowers

      and an incapacitated flight crew.

      Flowers?!

      We have a storm in the area and two individuals at the controls

      with absolutely no flight experience.

      Just a minute. There's a bee on that plane.

      I'm quite familiar with Mr. Benson and his no-account compadres.

      They've done enough damage.

      But isn't he your only hope?

      Technically, a bee shouldn't be able to fly at all.

      Their wings are too small…

      Haven't we heard this a million times?

      "The surface area of the wings and body mass make no sense."

      Get this on the air!

      Got it.

      Stand by.

      We're going live.

      The way we work may be a mystery to you.

      Making honey takes a lot of bees doing a lot of small jobs.

      But let me tell you about a small job.

      If you do it well, it makes a big difference.

      More than we realized. To us, to everyone.

      That's why I want to get bees back to working together.

      That's the bee way! We're not made of Jell-O.

      We get behind a fellow.

      Black and yellow! Hello! Left, right, down, hover.

      Hover? Forget hover. This isn't so hard. Beep-beep! Beep-beep!

      Barry, what happened?!

      Wait, I think we were on autopilot the whole time.

      That may have been helping me. And now we're not! So it turns out I cannot fly a plane.

      All of you, let's get behind this fellow! Move it out!

      Move out!

      Our only chance is if I do what I'd do, you copy me with the wings of the plane!

      Don't have to yell.

      I'm not yelling! We're in a lot of trouble.

      It's very hard to concentrate with that panicky tone in your voice!

      It's not a tone. I'm panicking!

      I can't do this!

      Vanessa, pull yourself together. You have to snap out of it!

      You snap out of it.

      You snap out of it.

      You snap out of it!

      You snap out of it!

      You snap out of it!

      You snap out of it!

      You snap out of it!

      You snap out of it!

      Hold it!

      Why? Come on, it's my turn.

      How is the plane flying?

      I don't know.

      Hello?

      Benson, got any flowers for a happy occasion in there?

      The Pollen Jocks!

      They do get behind a fellow.

      Black and yellow. Hello. All right, let's drop this tin can on the blacktop.

      Where? I can't see anything. Can you?

      No, nothing. It's all cloudy.

      Come on. You got to think bee, Barry.

      Thinking bee. Thinking bee. Thinking bee! Thinking bee! Thinking bee!

      Wait a minute. I think I'm feeling something.

      What? I don't know. It's strong, pulling me. Like a 27-million-year-old instinct.

      Bring the nose down.

      Thinking bee! Thinking bee! Thinking bee!

      What in the world is on the tarmac? Get some lights on that! Thinking bee! Thinking bee! Thinking bee!

      Vanessa, aim for the flower. OK. Cut the engines. We're going in on bee power. Ready, boys?

      Affirmative!

      Good. Good. Easy, now. That's it.

      Land on that flower!

      Ready? Full reverse!

      Spin it around!

      Not that flower! The other one!

      Which one?

      That flower.

      I'm aiming at the flower!

      That's a fat guy in a flowered shirt. I mean the giant pulsating flower

      made of millions of bees!

      Pull forward. Nose down. Tail up.

      Rotate around it.

      This is insane, Barry! This's the only way I know how to fly. Am I koo-koo-kachoo, or is this plane flying in an insect-like pattern?

      Get your nose in there. Don't be afraid. Smell it. Full reverse!

      Just drop it. Be a part of it.

      Aim for the center!

      Now drop it in! Drop it in, woman!

      Come on, already.

      Barry, we did it! You taught me how to fly!

      Yes. No high-five! Right. Barry, it worked! Did you see the giant flower?

      What giant flower? Where? Of course I saw the flower! That was genius!

      Thank you. But we're not done yet. Listen, everyone!

      This runway is covered with the last pollen

      from the last flowers available anywhere on Earth.

      That means this is our last chance.

      We're the only ones who make honey, pollinate flowers and dress like this.

      If we're gonna survive as a species, this is our moment! What do you say?

      Are we going to be bees, or just Museum of Natural History keychains?

      We're bees!

      Keychain!

      Then follow me! Except Keychain.

      Hold on, Barry. Here.

      You've earned this.

      Yeah!

      I'm a Pollen Jock! And it's a perfect fit. All I gotta do are the sleeves.

      Oh, yeah.

      That's our Barry.

      Mom! The bees are back!

      If anybody needs to make a call, now's the time.

      I got a feeling we'll be working late tonight!

      Here's your change. Have a great afternoon! Can I help who's next?

      Would you like some honey with that? It is bee-approved. Don't forget these.

      Milk, cream, cheese, it's all me. And I don't see a nickel!

      Sometimes I just feel like a piece of meat!

      I had no idea.

      Barry, I'm sorry. Have you got a moment?

      Would you excuse me? My mosquito associate will help you.

      Sorry I'm late.

      He's a lawyer too?

      I was already a blood-sucking parasite. All I needed was a briefcase.

      Have a great afternoon!

      Barry, I just got this huge tulip order, and I can't get them anywhere.

      No problem, Vannie. Just leave it to me.

      You're a lifesaver, Barry. Can I help who's next?

      All right, scramble, jocks! It's time to fly.

      Thank you, Barry!

      That bee is living my life!

      Let it go, Kenny.

      When will this nightmare end?!

      Let it all go.

      Beautiful day to fly.

      Sure is.

      Between you and me, I was dying to get out of that office.

      You have got to start thinking bee, my friend.

      Thinking bee! Me? Hold it. Let's just stop for a second. Hold it.

      I'm sorry. I'm sorry, everyone. Can we stop here?

      I'm not making a major life decision during a production number!

      All right. Take ten, everybody. Wrap it up, guys.

      I had virtually no rehearsal for that.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The work by Chuong et al. provides important new insights into the contribution of different molecular mechanisms in the dynamics of CNV formation. It will be of interest to anyone curious about genome architecture and evolution from yeast biologists to cancer researchers studying genome rearrangements.

      Thank you for recognizing the broad significance of our study.

      Strengths:

      Their results are especially striking in that the "simplest" mechanism of GAP1 amplification, non-allelic homologous recombination between the flanking Ty-LTR elements, is not the most common route taken by the cells, emphasizing the importance of experimentally testing what might seem on the surface to be obvious answers. One of the important developments of their work is the use of their neural network simulation-based inference (nnSBI) model to derive rates of amplicon formation and their fitness effects.

      We agree with this assessment: the results of our study challenge the intuition that the simplest path to structural variation is the most likely, and they reveal the great diversity of mechanisms that can lead to large-scale changes in the genome.

      Weaknesses:

      The manuscript reads as though two different people wrote two different sections of the manuscript - an experimental evolutionist and a computational scientist. If the goal is to reach both groups of readers, there needs to be more explanation of both types of work. I found the computational sections to be particularly dense but even the experimental sections need clearer explanations and more specific examples of the rearrangements found. I will point out these areas in the detailed remarks to the authors. While I have no reason to question their conclusions, I couldn't independently verify the results that ODIRA was the majority mechanism since the sequence of amplified clones was not made available during the review. I've encouraged the authors to include specific, detailed sequence information for both ODIRA events as well as the specific clones where GAP1 was amplified but the flanking gene GFP was not.

      We have revised the manuscript to expand explanations of both the experimental and computational aspects of our study and to provide additional information for the reader. In doing so, we have edited the text to improve readability. We have made all raw data publicly available through the NCBI short read archive (SRA) and are hosting all sequence data for easy visualization in JBrowse using a public server.

      Reviewer #2 (Public Review):

      Summary:

      This study examines how local DNA features around the amino acid permease gene GAP1 influence adaptation to glutamine-limited conditions through changes in GAP1 Copy Number Variation (CNV). The study is well motivated by the observation of numerous CNVs documented in many organisms, but difficulty in distinguishing the mechanisms by which they are formed, and whether or how local genomic elements influence their formation. The main finding is convincing and is that a nearby Autonomous Replicating Sequence (ARS) influences the formation of GAP1 CNVs and this is consistent with a predominant mechanism of Origin Dependent Inverted Repeat Amplification (ODIRA). These results along with finding and characterizing other mechanisms of GAP1 CNV formation will be of general interest to those studying CNVs in natural systems, experimental evolution, and in tumor evolution. While the results are limited to a single CNV of interest (GAP1), the carefully controlled experimental design and quantification of CNV formation will provide a useful guide to studying other CNVs and CNVs in other organisms.

      Thank you for this positive assessment of our study.

      Strengths:

      The study was designed to examine the effects of two flanking genomic features next to GAP1 on CNV formation and adaptation during experimental evolution. This was accomplished by removing two Long Terminal Repeats (LTRs), removing a downstream ARS, and removing both LTRs and the ARS. Although there was some heterogeneity among replicates, later shown to include the size and breakpoints of the CNV and the presence of an unmarked CNV, both marker-assisted tracking of CNV formation and modeling of CNV rate and fitness effects showed that deletion of the ARS caused a clear difference compared to the control and the LTR deletion.

      The consequence of deletion of local features (LTR and ARS) was quantified by genome sequencing of adaptive clones to identify the CNV size, copy number and infer the mechanism of CNV formation. This greatly added value to the study as it showed that i) ODIRA was the most common mechanism but ODIRA is enhanced by a local ARS, ii) non-allelic homologous recombination (NAHR) is also used but depends on LTRs, and iii) de novo insertion of transposable elements mediate NAHR in strains with both ARS and LTR deletions. Together, these results show how local features influence the mechanism of CNV formation, but also how alternative mechanisms can substitute when primary ones are unavailable.

      We agree with this assessment.

      Weaknesses:

      The CNV mutation rate and its effect on fitness are hard to disentangle. The frequency of the amplified GFP provides information about mutation rate differences as well as fitness differences. The data and analysis show that each evolved population has multiple GAP1 CNV lineages within it, with some being unmarked by GFP. Thus, estimates of CNV fitness are more of a composite view of all CNV amplifications increasing in frequency during adaptation. Another unknown but potential complication is whether the local (ARS, LTR) deletions influence GAP1 expression and thus the fitness gain of GAP1 CNVs. The neural network simulation-based inference does a good job at estimating both mutation rates and fitness effects, while also accounting for unmarked CNVs. However, the model does not account for the population heterogeneity of CNVs and their fitness effects. Despite these limitations of distinguishing mutation rate and fitness differences, the authors' conclusions are well supported in that the LTR and ARS deletions have a clear impact on the CNV-mediated evolutionary outcome and the mechanism of CNV formation.

      While it is true that the inferred mutation rate and fitness effect are negatively correlated, as in other studies (Gitschlag et al., 2023; Caspi et al., 2023; Avecilla et al., 2022), our modeling approach does generate an estimate of each parameter that is best explained by the data. By reporting the confidence intervals (i.e. the 95% HDI) we define the set of parameter values that are consistent with the data. It is true that our model doesn't explicitly account for population heterogeneity; rather, following Hegreness et al. (2006), we employ a single effective fitness effect and mutation rate for all GAP1 CNVs. It is interesting to consider whether the ARS and LTR affect GAP1 expression; however, we have no evidence that this is the case.
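      For readers unfamiliar with the term, a highest density interval (HDI) can be computed from posterior samples as the narrowest window that covers the requested share of the sorted samples. The sketch below illustrates that idea only; the function name `hdi` and the toy inputs are assumptions made for this example and are not taken from the authors' inference code.

```python
import math


def hdi(samples, mass=0.95):
    """Return (low, high): the narrowest interval covering `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")
    # Number of consecutive sorted samples the interval must contain.
    k = max(1, math.ceil(mass * n))
    best_lo, best_hi = xs[0], xs[k - 1]
    # Slide a window of k samples and keep the narrowest one.
    for i in range(1, n - k + 1):
        lo, hi = xs[i], xs[i + k - 1]
        if hi - lo < best_hi - best_lo:
            best_lo, best_hi = lo, hi
    return best_lo, best_hi
```

      Unlike an equal-tailed interval, this window shifts toward wherever the samples are densest, which matters when a posterior is skewed.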

      Reviewer #3 (Public Review):

      Summary:

      The authors present an elegant and detailed investigation into the role of cis-elements, and therefore the underlying mechanisms, in gene dosage increase. Their most significant finding is that in their system copy number increase frequently occurs by what they call replication errors that result from the origin of replication firing.

      The authors somewhat quantitatively determine the effect of the presence of a proximal origin of replication or LTR on the different CNV scenarios.

      Strengths:

      (1) A clever and elegant experimental design.

      (2) A quantitative determination of the effect of a proximal origin of replication or LTR on the different CNV scenarios. Measuring directly the contribution of two competing elements.

      (3) ODIRA can occur by firing of a distal ARS element.

      (4) Re-insertion of Ty elements is interesting.

      We agree that these are interesting and novel findings from our study.

      Weaknesses:

      (1) Overall, the research does not considerably advance the current knowledge. The research does not investigate the maximum distance between ARS elements at which ODIRA can occur. This is an important point since ODIRA was previously described. A considerable contribution to the field would be to understand under what conditions ODIRA wins out over NAHR.

      We agree that these are important questions and they are ones that we are pursuing in future studies.

      (2) The title and some sentences in the abstract give a wrong impression of the generality and the novelty of the observations presented. Below are some examples of much earlier work that dealt with mechanisms of CNV and got different conclusions. The Lobachev lab (Cell 2006) published a different scenario years ago, with a very different mechanism (hair-pin capped breaks). The Argueso lab found something different (NAHR) (Genetics 2013).

      In fact, the CUP1 system presents a good example of this point. The Houseley group showed a complex replication transcription-based mechanism (NAR 2022, cited), the Argueso group showed Ty-based amplification and the Resnick group showed aneuploidy-based amplification. While aneuploidy is a minor factor here the numerous works in Candida albicans, Cryptococcus neoformans, and Yeast suggest otherwise (Selmecki et al Science 2006, Yona et al PNAS 2013, Yang et al Microbiology Spectrum 2021).

      As the reviewer points out there have been several important published studies investigating mechanisms by which structural variation is generated. It is important to note that we are explicitly looking at CNVs in the context of adaptive evolution and the role of genomic features that enable different mechanisms of CNV formation. To emphasize this point, we have changed the title of our manuscript to “Template switching during DNA replication is a prevalent source of adaptive gene amplification”. Aneuploidy is indeed a mechanism of adaptive gene amplification in our current and previously reported studies. We have expanded our discussion to place our study in the context of previous studies reporting mechanisms of gene amplification.

      (3) The authors added a mathematical model to their experimental data. For me, it was very difficult to understand the contribution of the model to the research. I anticipated, for example, that the model would make predictions that would be tested experimentally. For example, " ARSΔ and ALLΔ are predicted to be almost eliminated by generation 116, as the average predicted WT proportion is 0.998 and 0.999" But to my understanding without testing the model.

      In our previous publication (Avecilla et al. 2022, PLoS Biology) we experimentally validated the use of nnSBI to infer evolutionary parameters. In this study, we have extended our modeling framework to quantify differences between genotypes, which was not previously possible. Our results reveal that the local ARS has a key role in the overall supply rate of CNVs at this locus.

      Recommendations for the authors:

      We have addressed all public reviews and recommendations.

      Reviewer #1 (Recommendations For The Authors):

      Specific comments about the work are covered in the order of appearance in the text or Figures. I apologize in advance for the number of comments. They are made out of curiosity, enthusiasm for the research, and a desire to help highlight the most interesting aspects of this work.

      We are grateful for the thoughtful comments that have helped us to significantly improve our manuscript.

      (1) I would appreciate the inclusion of several references to the work on the ODIRA model.

      a) Page 3 last paragraph: "(2) DNA replication-based mechanisms (Harel et al., 2015; Hastings, Lupski, et al., 2009; Malhotra & Sebat, 2012; Pös et al., 2021; Zhang, Gu, et al., 2009; Brewer et al., 2011)" (Addition of Brewer et al., 2011).

      We have added all suggested references.

      b) Page 4 top: (Brewer et al., 2011; Brewer et al., 2015; Martin et al., 2024). (Addition of Brewer et al., 2011).

      We have added all suggested references.

      c) Page 14 top: "Recent work has proposed that ODIRA CNVs are a major mechanism of CNVs in human genomes (Brewer et al., 2015; Martin et al., 2024; Brewer et al., 2024)." Brewer et al., 2024 focuses specifically on ODIRA and human CNVs. (Addition of Brewer et al., 2024).

      We have added all suggested references.

      (2) Page 6, third paragraph: I was surprised that a single inoculating strain was used to establish the replicate chemostats because of the possibility of non-independence of the resulting GAP1 CNVs. A nnSBI model was used to correct for this possibility later in the paper. It seems like it could have been avoided by a simple change in protocol to inoculate each chemostat with an independent inoculum. Was there a reason that the replicate chemostats were not conducted as independent events? Establishing the presence of 'founder' GAP1 CNVs without GFP seems rather secondary to the point of the paper (examining the CNVs that arise during evolution) and I would recommend it being moved to the supplement.

      As is typical in microbial experimental evolution studies, we aimed to start with genetically identical homogenous populations and observe the emergence and selection of de novo variation. Therefore, we founded independent populations from a single inoculum. However, this study, and our prior work using lineage tracking barcodes, has clearly demonstrated that CNVs generated during the initial growth of the culture used for the inoculum contribute to the adaptation dynamics in all derived populations. This unanticipated result suggests that the reviewer’s suggestion is a valid one: independent populations should be derived from independent inocula, and this will be our standard practice in future studies.

      We believe that our results, presented in Figure 2, establishing the presence of pre-existing GAP1 CNVs without the GFP are important as they highlight a previously unknown limitation of using CNV reporters to track gene copy number. However, we subsequently show that this class of variant (CNVs that are not detected by the reporter system) can be incorporated into our modeling framework, enabling estimation of evolutionary parameters, which we believe is an important finding warranting inclusion in the main text.

      (3) Page 7 first full paragraph: "Finally, we also observe a significant delay (ANOVA, p = 0.00833) in the generation at which the CNV frequency reaches equilibrium in ARS∆ (~generation 112) compared to WT (pairwise t-test, adjusted p = 0.05) . . .". Is the delay in reaching a plateau in Figure 1E just a consequence of the later appearance of CNVs or do the authors believe there are two separate events responsible for this delay? E.g. if the authors think that the delay in reaching a plateau is related to lower selection coefficients of the CNVs that do arise compared to the CNVs of other strains, then this should be explicitly discussed.

      We believe that the delay in reaching equilibrium is a consequence of both a lower CNV formation rate and reduced selection coefficients. Lower values for the fitness coefficient and formation rate in ARS∆ explain both the delay in CNV appearance and CNV equilibrium as shown by the predicted dynamics (Figure S3B). We have added an explicit discussion of the effect of the ARS on CNV dynamics in paragraph 2 of the Discussion section, starting at line 456.

      (4) Page 7: Incorporating pre-existing CNVs into an evolutionary model: The rationale for how you are able to discount the formation rate of GFP-free CNVs (C-) in your model isn't clear to me. How are you able to assume that these C- events don't form after timepoint 0? Why do you assume a starting population of C- events but not a starting population of C+ events?

      We explored the possibility of modeling the formation of C- CNVs (amplifications of GAP1 without amplification of the reporter) during the evolution experiment. However, because the rate at which C- events occur is slower than the rate at which C+ events occur (GAP1 amplifications with amplification of the reporter), we found that the effect was negligible. Importantly, the simple model is sufficient to describe the observed dynamics and thus we do not include these possible rare events.
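      The structure described in this reply (C+ CNVs forming throughout the experiment, C- CNVs present only as a starting sub-population) can be illustrated with a toy deterministic model. This is a simplified sketch written for this note, not the model used in the study: the function name, the shared selection coefficient for C+ and C-, the per-generation update order, and the parameter values are all assumptions chosen for illustration.

```python
def simulate_cnv_dynamics(delta_c, s_c, c_minus_0, generations):
    """Toy deterministic dynamics for three genotype frequencies.

    delta_c    -- per-generation rate at which wild type (WT) converts to C+
    s_c        -- selection coefficient assumed shared by C+ and C-
    c_minus_0  -- starting frequency of pre-existing, unreported C- CNVs
    Returns a list of (wt, c_plus, c_minus) tuples, one per generation.
    """
    wt, c_plus, c_minus = 1.0 - c_minus_0, 0.0, c_minus_0
    trajectory = [(wt, c_plus, c_minus)]
    for _ in range(generations):
        # Mutation: only C+ forms de novo; no new C- arise after time 0.
        new_cnv = delta_c * wt
        wt -= new_cnv
        c_plus += new_cnv
        # Selection: CNV genotypes have relative fitness 1 + s_c.
        w_plus = c_plus * (1.0 + s_c)
        w_minus = c_minus * (1.0 + s_c)
        total = wt + w_plus + w_minus
        wt, c_plus, c_minus = wt / total, w_plus / total, w_minus / total
        trajectory.append((wt, c_plus, c_minus))
    return trajectory
```

      In this toy setting the reporter-visible CNV proportion is `c_plus`, while the total CNV proportion is `c_plus + c_minus`, so the reported proportion can never exceed the total.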

      (5) Figure 1:

      (a) Panel B: Please put the tRNAs on the line diagrams of the four strains. I first interpreted ALLΔ as missing the tRNAs, too.

      Thank you for this suggestion. We added tRNAs to all diagrams to provide additional detail about the structure of the GAP1 locus.

      (b) Panels C, D, and E: the dark shade of the colored boxplots obscures the individual points. I recommend reducing the opacity of the box or choosing a lighter shade so that the individual points are visible on top of the box. Is the percent increase in CNVs per generation (Panel D) based on the slopes of the curves in panel B? By eye the slopes of ARS∆ and ALL∆ appear at least as steep as those of wild type and LTR∆.

      Thank you for this suggestion. We have now made the individual points visible on top of the boxplots in Figures 1C, 1D, and 1E. The lines in Figure 1B show the median value across populations per time point whereas each point in Figure 1D is the slope from linear regression using values from individual populations (data from individual populations are shown in Figure 3C).
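      The distinction drawn here, a median across populations at each time point versus a regression slope fitted to each population separately, can be sketched as follows. The helper names and the example numbers are hypothetical and only illustrate the two summaries; they are not the authors' analysis code or data.

```python
from statistics import mean, median


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx


def per_population_slopes(generations, populations):
    """One slope per population (the kind of point a per-population boxplot shows)."""
    return {name: ols_slope(generations, values)
            for name, values in populations.items()}


def median_trajectory(populations):
    """Median across populations at each time point (a summary line)."""
    return [median(values) for values in zip(*populations.values())]
```

      Because the median line pools populations at each time point, its apparent steepness need not match the distribution of individual slopes, which is why the two panels can look different by eye.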

      (6) Figure 2:

      (a) Panel A: Please remind the readers what FSC-A is measuring and label the different groups of cells in each sample. Are we supposed to assume the upper scatter in generation 8 is the pre-existing CNV variants? Are the three species at generation 50 due to 1, 2, and 3 copies of GFP? Is the new species in generation 137 further amplification of the locus? And if so, how many copies does it represent? I find it fascinating that what I assume is the 2-copy CNV (presumably a direct oriented amplicon produced by NAHR) at 50 generations is lost (out-competed by a potential inverted triplication) at later times, but I didn't find any mention of this phenomenon in the text. What do the different mutant strains look like over the same time course? Please supply supplemental figures with the flow cytometry gating and vertically aligned histograms of the GFP signal so that the peaks are more easily compared. And provide this information for each of the altered strains in supplementary materials.

      Thank you for these useful suggestions. We have added a gating legend to the figure to clearly indicate the copy number for each subpopulation. We have edited the caption and main text to explain forward scatter (FSC-A). Raw flow cytometry plots are now provided as Supplementary Figure 2 and distributions of cell-size normalized GFP signal are provided in Supplementary Figure 3. Although our primary objective with Figure 2A was to show the persistence of the 1-copy GFP population, the reviewer is correct that we did not highlight interesting aspects of the CNV dynamics. We have added text starting at line 251 to point out these features of the data.

      (b) Panel B: It would help to label the different colored boxes inside cells in Figure 2B - it took me a while to identify the white box as an unrelated adaptive mutation elsewhere in the genome. The linear arrangement of these small colored blocks seems to indicate their structural arrangement. Is that the case? And are they inverted or direct amplicons? Perhaps the authors are being agnostic at this point but it would be better if each of the blocks were separate. If there are other mutations that can explain these GFP-non-amplified survivors, were they identified in your whole genome sequencing?

      We have now included a complete legend for Figure 2B indicating that the white box reflects other beneficial mutations. We have separated this class of beneficial mutation from the GAP1 and reporter elements to reflect that they are not linked. We did not identify additional beneficial mutations but plan to pursue this question in a future project.

      (c) Panel C: Are the two sets of lines mislabeled? One would expect the "reported" CNV proportions to be lower than the total CNV proportions, not the other way around. Maybe the labels "total CNVs" and "reported CNVs" are unclear to me and I am misunderstanding what "reported" refers to. Please clarify.

      Thank you for identifying this mistake. The lines were mislabeled and have now been corrected in the revised version.

      (7) Figure 3:

      (a) A fuller discussion of panels A and B is needed. The results of panel A in particular seem like an excellent opportunity for connecting the computation to the biology. Can the authors speculate on why the ALL∆ strain has a higher CNV formation rate (𝛿c) than the ARS∆ strain? I would think that taking away one means of amplification would decrease CNV formation. Likewise, could the authors discuss why the selection coefficient (sc) for the LTR∆ strain would be the same as for the wild type? Overall, I would like to see more discussion about what these differences in formation rates and selection coefficients could mean for the types of amplicons arising in the chemostats. (In panel B I don't see the shaded area referred to in the figure legend.) A side-by-side comparison of the data in Panel A with the data shown in Supplemental Figure S3A would be instructive.

      Thank you for raising these points. We have added substantial text to the manuscript to address these findings. Starting at line 456 we state:

      “The lower CNV formation rate in the LTR∆ could be a closer approximation of ODIRA formation rates at this locus as ODIRA CNVs are the predominant CNV mechanism in the LTR∆ strain (Figure 4F). Furthermore, the low formation rates in the LTR∆ relative to WT might suggest that the presence of the flanking long terminal repeats may increase the rate of ODIRA formation through an otherwise unknown combinatorial effect of DNA replication across these flanking LTRs and template switching at the GAP1 locus. ARS∆ has the lowest CNV formation rate and it could be an approximation of the rates of NAHR between flanking LTRs and ODIRA at distal origins. We find that the ALL∆ has a higher CNV formation rate than the ARS∆, even though three elements are deleted instead of one. One explanation for this is that the deletion of the flanking LTRs in ALL∆ gives opportunity for novel transposon insertions and subsequent LTR NAHR. Indeed, we find an enrichment of novel transposon-insertions in the ALL∆ (Figure 4F) and subsequent CNV formation through recombination of the Ty1-associated repeats (Figure 4H, ALL∆). Both events, transposon insertion followed by LTR NAHR, would have to occur quickly at a rate that explains our estimated CNV rate in ALL∆. While remarkable, increased transposon activity has been associated with nutrient stress (Curcio & Garfinkel, 1999; Lesage & Todeschini, 2005; Todeschini et al., 2005) and is therefore a feasible explanation for the CNV rate estimated in the ALL∆. Additionally, ARS∆ clones rely more on LTR NAHR to form CNVs (Figure 4F). The prevalence of ODIRA in ARS∆ and ALL∆ is similar. LTR NAHR usually occurs after double strand breaks at the long terminal repeats to give rise to CNVs (Argueso et al., 2008). Because we use haploid cells, such double strand break and homology-mediated repair would have to occur during S-phase after DNA replication with a sister chromatid repair template to form tandem duplications. Therefore the dependency on LTR NAHR to form CNVs and the spatial (breaks at LTR sequences) and temporal (S-phase) constraints could explain the lower formation rate in ARS∆.”

      In addition, we added a discussion of the different selection coefficients estimated and how the simulated competitions help us understand the decreased selection coefficients in the architecture mutants. In newly added text starting at line 479 we state:

      “The genomic elements have clear effects on the evolutionary dynamics in simulated competitive fitness experiments. The similar selection coefficients in WT and LTR∆ suggest that CNV clones formed in these background strains are similar. Indeed, the predominant CNV mechanism in both is ODIRA followed by LTR NAHR (Figure 4F). While LTR NAHR is abolished in the LTR∆, it seems that CNVs formed by ODIRA allow adaptation to glutamine-limitation similar to WT. The lower selection coefficients in ARS∆ and ALL∆ suggest that GAP1 CNVs formed in these strains have some cost. In a competition, they would get outcompeted by CNV alleles in the WT and LTR∆ background.”

      (b) The data shown in panel C seems redundant to what is shown more clearly in Supplemental Figure S3B. It seems to me the more important comparison to make in panel C would be the overlay of the predicted data to the median proportion of cells obtained from the experimental data (Figure 1B). Also, overlays of the cultures from each strain could be added to S3A. It is difficult to see the variation within each strain when the data from all four strains are superimposed as they are in Figure 3C.

      We agree and have edited Figure 3C to incorporate these suggestions and more clearly convey the intra- and interstrain variation.

      (8) Figure 4:

      (a) Panels A, B, and C are nice summaries and certainly helpful for understanding panel E, but it would be instructive to see some actual rearrangements of the ODIRA events, the NAHR, and the transposon-mediated rearrangements. It isn't clear to me what these last events look like. A figure that shows the specific architecture of example clones for each category would be helpful. I am also having a hard time reconciling ODIRA events with a copy number of 2. Are these rearrangements free isochromosomes with amplification to the telomere or are they secondary rearrangements like those described in Brewer et al., 2024? And what about the non-aneuploid rearrangement that includes the centromere? Is it a dicentric?

      We have now added more detailed depictions of CNVs in Figure 4A and provide links to visualize the alignment files. We have added additional discussion starting at line 397 of the non-canonical ODIRA events and putative neochromosome amplicons with reference to Brewer et al 2024. Starting at line 397 we state:

      “Surprisingly, we found CNVs with breakpoints consistent with ODIRA that contained only 2 copies of the amplified region, whereas ODIRA typically generates a triplication. In the absence of additional data, we cannot rule out inaccuracy in our read-depth estimates of copy numbers for these clones (i.e., they have 3 copies). An alternate explanation is a secondary rearrangement of an original inverted triplication resulting in a duplication (Brewer et al., 2024); however, we did not detect evidence for secondary rearrangements in the sequencing data. A third alternate explanation is that a duplication was formed by hairpin-capped double-strand break repair (Narayanan et al., 2006). Notably, we found 3 additional ODIRA clones that end in native telomeres, each of which had amplified 3 copies. In these clones the other breakpoint contains the centromere, indicating the entire right arm of chromosome XI was amplified 3 times via ODIRA, each generating supernumerary chromosomes. Thus, ODIRA can result in amplifications of large genomic regions, ranging from segmental amplifications to supernumerary chromosomes.”

      (b) In Panel B the violin plots appear to indicate that there are two size categories for amplicons in the ARS∆ strain. Do clones from these different sub-populations share a common CNV architecture?

      Thank you for making this point. (Please note that the violin plots are now Figure 4E) We added a short discussion and Supplementary Figure 14. In line 432, we state:

      “In ARS∆, we find two CNV length groups (Figure 4E) that correspond with two different CNV mechanisms (Supplementary Figure 14). 100% of smaller CNVs (6-8kb) (Supplementary Figure 14) correspond with a mechanism of NAHR between LTRs flanking the GAP1 gene (Figure 4H, ARS∆, bottom left green points). Larger CNVs (8kb-200kb) (Supplementary Figure 14) correspond with other mechanisms that tend to produce larger CNVs, including ODIRA and NAHR between one local and one distal LTR element (Figure 4H).”

      (c) Panels D and E: There is great information in these two panels but I find the color keys confusing. There doesn't seem to be any reason for the strain color key in panel E. I am assuming that the key should go with Panel D. Is there some way to indicate in Panel D which events are in which CNV category? It is cumbersome to find that information from Panel E. Perhaps the color-coding from Panel E could be applied to the row labels in Panel D. Being able to link amplicon to the mechanism of CNV formation is especially important for seeing which ODIRA events contain an origin.

      Thank you for these suggestions. We now indicate the mechanism of CNV formation using a consistent color coding in panels G and H (previously panels D and E).

      (d) Panel E: I don't understand the two axes in Panel E. If both axes are log scales, why is the origin 0 for the X-axis and 1 for the Y-axis? And why are the focal amplicons (most of which are recombination events between the two LTRs) scattered in both X and Y coordinates? Shouldn't they form a single point? The same for the recombinants with distal LTRs. Also, orange and red (ODIRA and complex CNVs, respectively) are very hard to distinguish. All of these data need to be presented in a spreadsheet identifying each clone's strain ID, chemostat number, GAP1 and GFP copy numbers, sequence across the junction, and their coordinates. The SRA project (PRJNA1016460) for the sequence data was not found in SRA. Will this data be available to easily look at read depth across chromosome XI for all of the sequenced strains - perhaps as .bam files?

      Thank you for calling these issues with data visualization to our attention. Indeed, the focal amplifications do form around a single point. We originally had jittered the data to show each individual focal amplification but agree that this is confusing. We now overlay the individual points and have altered opacity to enable visualization of individual values. The suggested table of clone data is provided in Supplementary File 2 and the SRA project is now publicly available. Moreover, we are providing all alignment (.bam) files, split, and discordant read depth profiles for each CNV strain and their corresponding ancestor aligned to our custom reference genomes in a public jbrowse server at:

      https://jbrowse.bio.nyu.edu/gresham/?data=data/ee_gap1_arch_muts for WT strains, https://jbrowse.bio.nyu.edu/gresham/LTRKO_clones for LTR∆ strains, https://jbrowse.bio.nyu.edu/gresham/ARSKO_clones for ARS∆ strains, https://jbrowse.bio.nyu.edu/gresham/ALLKO_clones for ALL∆ strains.

      (e) Supplementary Table 1 and Supplementary Figure S2: Please indicate which rearrangements (of the 8 reported in Figure S2A) were identified in each of the clones described in the table. If each of the 8 amplicons is identified by a letter, then this information could be added as a column in the table. I am assuming that each of the eight rearrangements was found in more than one chemostat. Showing these data is crucial for establishing the possibility that they were preexisting at the time of chemostat inoculation. The other possibility is that the clones with amplified GAP1 but a single copy of GFP could have been created by a secondary rearrangement in the outgrowth of the clones that originally had amplified both genes to the same extent. What is the structure of these amplicons? Is there a common junction between GAP1 and GFP? I couldn't find these data in the paper. A suggestion for Supplemental Figure S2A - include a zoomed-in inset for the GAP1 GFP region for each of the 8 read-depth plots. It is hard to see the exact location of GFP and GAP1 across all 8 tracks without getting out a ruler. Were these sequences aligned to your custom reference genome or the reference genome without GFP? If they were aligned to the custom reference that includes the GFP reporter, the reader could visually confirm the absence of GFP amplification.

      Thank you for these suggestions. We edited Supplementary Table 1 and Supplementary Figure 1A as requested. We now provide the precise CNV breakpoints in the GFP-GAP1 region (Supplementary Figure 1B), displaying both genome read depth and split read depth tracks. These sequences were aligned to the custom reference containing the GFP reporter, which is now clearer in the figure and caption text in line 1226.

      The clones in this figure were sampled from the five different chemostats and we have clarified this in the edited table and text at line 210. We did not detect the same CNV allele in different chemostats and therefore we do not have evidence to support GAP1 amplification without the GFP reporter pre-existing at time of inoculation. We are not able to definitively distinguish whether the amplicons were pre-existing at the time of inoculation or occurred after as we do not have barcoded lineages. We isolated clones carrying this class of amplification from the 1-GFP-copy subfraction late in the experimental evolution (generation 165-182). Given that the alleles appear to differ between populations we think the most parsimonious explanation is that these amplifications occurred after chemostat inoculation but early in the evolution experiment. We explicitly state this in the text starting in line 219.

      (9) Page 8-9: I am sorry to say that I can't evaluate the "HDI of posterior distributions". It is out of my competency range. So I am not sure what this analysis is adding to the paper. The same goes for the rest of the supplementary figures.

      HDI is a measure of certainty in an estimate, similar to a confidence interval. We state this in the text in line 276. With the editing of the text, we hope the modeling and its supplementary figures are now clearer.
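
      As a rough illustration of the quantity discussed here, the sketch below computes a highest density interval from posterior samples as the narrowest window containing a given fraction of them. It is a generic sample-based recipe, not the procedure used in the manuscript; the function name `hdi` and the default mass of 0.95 are assumptions made for this example.

```python
import math

def hdi(samples, mass=0.95):
    """Return the narrowest interval that contains `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = math.ceil(mass * n)  # number of samples the interval must cover
    if not 1 <= k <= n:
        raise ValueError("mass must be in (0, 1] and samples must be non-empty")
    # Slide a window of k consecutive sorted samples and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]

# Hypothetical, right-skewed posterior samples for illustration only.
low, high = hdi([0, 0, 0, 0, 1, 2, 3, 10], mass=0.5)
print(low, high)
```

      A narrow interval indicates that the posterior mass is concentrated, which is the sense in which the HDI reflects certainty in the estimate.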

      (10) Page 9 top: Deletion of the ARS appears to lower the fitness of the amplified GAP1 variants. Can the authors speculate on why the ARS deletion would reduce fitness? Did they consult published replication profiles to determine the size of the origin-free gap that could result from the deletion of this mid-S phase origin? Could it explain the delay in the appearance of GAP1 amplicons in the ARS-deletion strains and be responsible for their reduced selection coefficients? Did you examine the growth properties of the starting strain or any of the amplified GAP1 derivatives? Perhaps this consideration could contribute to the discussion. Could there be a bit fuller discussion on the interaction between CNV length differences as shown in Figure 4A and differences in selection coefficient as determined by the nnSBI?

      Thank you for raising this point. We have now added text to our discussion of the reduced fitness in ARS∆ in relation to DNA replication starting on line 359:

      “ARS1116 is a major origin (McGuffee et al., 2013) and ODIRA CNVs found around this origin corroborate its activity. GAP1 is highly transcribed in glutamine-limited chemostats (Airoldi et al., 2016). Head-on transcription-replication collisions at this locus may be contributing to the higher CNV formation rate in wild type and LTR∆. Elimination of the local ARS could result in fewer transcription-replication collisions and the slower CNV formation rates estimated. Once formed, they get outcompeted by faster-forming CNVs and thus in theory are less fit than CNVs in other strain backgrounds. These simulated competitions further suggest that the ARS is a more important contributor to adaptive evolution mediated by GAP1 CNVs.”

      We examined replication profiles in McGuffee et al. Mol Cell. 2013 but could not determine the size of the origin-free gap. ARS1116 and its neighboring ARSs, ARS1118 downstream and ARS1115 upstream are efficient firing origins (Supplement 1 of McGuffee et al. 2013) and therefore the gap is likely to be minimal. The dynamics of the distal firing ARS elements involved in creating ODIRA CNVs might explain the reduced fitness, but further experiments would be required to address this. Regarding growth properties, the growth rate at steady-state in the chemostat is the same as the dilution rate regardless of strain background. Because we had the same dilution rate for each chemostat, the ARS∆ populations would have the same replication rate as the other three strains even if there may be replication rate differences in bulk culture growth. Finally, we found no significant interaction between CNV length and selection coefficients and we state this in line 359.
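
      The steady-state statement can be made explicit with the standard chemostat mass balance. The symbols below are introduced only for this illustration and do not appear in the manuscript: N is cell density, μ the specific growth rate, and D the dilution rate, assuming a well-mixed vessel.

```latex
\frac{dN}{dt} = (\mu - D)\,N,
\qquad
\frac{dN}{dt} = 0 \ \text{with}\ N > 0 \;\Longrightarrow\; \mu = D
```

      Under this assumption, any population that persists at steady state grows at the dilution rate, regardless of strain background.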

      (11) Page 10: WT competition simulations: It may help to explicitly state that the competition modeling approach was experimentally validated in Avecilla 2022 as opposed to just citing the paper. I found the results much more convincing after reading Avecilla 2022, but I imagine many readers may skip that.

      We added a sentence at line 249 to state that the nnSBI method was experimentally validated in Avecilla et al. 2022.

      Reviewer #2 (Recommendations For The Authors):

      (1) Figure 2: says reported CNV proportions (dashed). This may be a typo since I think the GFP reported should be solid, not dashed. Also, (C) isn't bold.

      Thank you for identifying these mistakes. We have corrected the figure’s caption in line 1157.

      (2) "compared to 898/345 clones" Does this refer to transposition/clone? Seems more natural to compare clones with transpositions to a total number of clones. This could be clarified.

      We rephrased the sentence (lines 519-520) to clarify that in their study Hays et al. 2023 found 898 novel Ty insertions across 345 nitrogen-evolved clones. As a result of this high rate of transposition, some clones are expected to have multiple Ty insertions.

      (3) The methods state that Kan replaces the Nat cassette that was used to make the deletions. It should be made more clear whether Kan is present and where Kan is with respect to GFP and GAP1.

      Thank you for pointing this out. To clarify we added the following sentence to the methods starting in line 567:

      “The CNV reporter is 3.1 kb and located 1117 nucleotides upstream of the GAP1 coding sequence. It consists of, in the following order, an ACT1 promoter, mCitrine (GFP) coding sequence, ADH1 terminator, and kanamycin cassette under control of a TEF promoter and terminator.”

      Additionally in line 571 we clarify the drug resistance of the genomic architecture ∆ strains that are kanamycin(+) and nourseothricin(-).

      Reviewer #3 (Recommendations For The Authors):

      (1) The major advancement of the manuscript is stated in the title "DNA replication errors are a major source of adaptive gene amplification" First, in my humble opinion the term replication errors is not quite right; the term template switching is more accurate. In that regard, recently a paper was published just on this topic (Martin et al Plos Genetics, 2024).

      We have changed the title to “Template-switching during DNA replication is a prevalent source of adaptive gene amplification”. We cite Martin et al Plos Genetics 2024 throughout the main text in lines 93, 126, 159, 502, 555.

      (2) I find the statement "We find that 49% of all GAP1 CNVs are mediated by the DNA replication-based mechanism Origin Dependent Inverted Repeat Amplification (ODIRA) regardless of background strain." Somewhat misleading, there were considerable differences between the strains. If I am not mistaken the range was 20-80%.

      Thank you for pointing this out. Indeed, the range was 26-80% across the four strains. We updated this sentence in the abstract at line 40, and in the main text at line 141 to clearly state the range.

      (3) In their attempt to fill the gap of knowledge regarding the fitness effect of the adaptive CNV the authors use a mathematical model. As an experimental biologist, I found the description lacking. It is hard for me to evaluate the contribution of the model to understanding the results and I think the authors could improve this part.

      We have edited the text regarding the modeling and associated results and hope that it is now more clear. The mathematical model describes the experiment in a simplified manner. We use it to predict the outcomes of additional experiments without additional experimental work. For example, we used it to simulate a competition between two strains, predict the total proportion of GAP1 CNVs, and predict the relative genetic diversity.

      (4) Experiments the authors may want to consider to increase the novelty of their work:

      a) Place the GAP1 gene right in the middle of the two most distant ARS elements and test the mechanism of CNV.

      Thank you for this proposed experiment. It is beyond the scope of this paper and will be pursued in future studies.

      b) The finding of de-novo Ty element insertion is interesting. What happens if the overdose strain of Jef Boeke is used (Retrotransposon overdose and genome integrity, PNAS 2009) or in contrast, a reverse transcriptase deficient strain?

      We agree. Our study has revealed a critical role for novel Ty insertion in mediating CNVs. The suggested experiments as well as using strains that lack Ty sequences will be very interesting to explore in followup studies.

      c) The genomic analyses were based on single colony isolates. To my understanding, the CNV events are identified at least partly by split reads. Therefore, each event may have a "signature" that is unique and can be concluded from single reads and not necessarily from the assembled genome. If true, a distinction between the scenarios could be achieved if bulk cultures are sequenced with enough depth. Thus, a truly dynamic and quantitative determination of the different events, rate of appearance, and disappearance can be made.

      Thank you for this suggestion, which is a good idea but not currently feasible for several reasons. First, although split reads are a powerful way to detect CNV breakpoints, we have found that they are rare in clonal samples even at high coverage (21-153X, median 78.5X), with only 3-30 split reads (median 14) detected. These observations are from a total of 23 breakpoints across 16 sequenced clones. Thus, when sequencing heterogeneous cultures, in which different CNVs only comprise a fraction of the population, our ability to detect single CNV alleles by split reads and quantify their frequency is limited. Given our observations, with a median of 14 split reads when sequencing to 78.5X genome-wide read coverage, it is possible we may be able to detect an individual CNV allele once it makes up (14/78.5) 17% of the population. However, our previous study has shown that there are tens to hundreds of unique CNV alleles initially, and thus this would only be feasible at very late timepoints. Second, recurrent CNVs may occur independently at the same exact location, such as LTR NAHR. Thus, unique signatures may not be obtained even if they are independent events. Third, it would not be appropriate to pursue this analysis with our current dataset, as we lack lineage tracking barcodes to validate the results.
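
      The back-of-envelope estimate above can be reproduced in a few lines. This is only a sketch of the ratio quoted in the response (median split reads divided by median coverage); the function name is hypothetical, and the calculation ignores sampling noise and any minimum read count a caller might require.

```python
def split_read_fraction(split_reads, coverage):
    """Ratio of split reads to read depth, used here as a rough detection threshold."""
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return split_reads / coverage

# Median values reported in the response: 14 split reads at 78.5X coverage.
fraction = split_read_fraction(14, 78.5)
print(f"{fraction:.1%}")
```

      The ratio comes out just under 0.18, in line with the rough figure quoted above.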

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The authors sometimes seem to equivocate on to what extent they view their model as a neural (as opposed to merely behavioral) description. For example, they introduce their paper by citing work that views heterogeneity in strategy as the result of "relatively independent, separable circuits that are conceptualized as supporting distinct strategies, each potentially competing for control." The HMM, of course, also relates to internal states of the animal. Therefore, the reader might come away with the impression that the MoA-HMM is literally trying to model dynamic, competing controllers in the brain (e.g. basal ganglia vs. frontal cortex), as opposed to giving a descriptive account of their emergent behavior. If the former is really the intended interpretation, the authors should say more about how they think the weighting/arbitration mechanism between alternative strategies is implemented, and how it can be modulated over time. If not, they should make this clearer.

      The MoA-HMM is meant to be descriptive in identifying behaviorally distinct strategies. Our intention in connecting it with a “mixture-of-strategies” view of the brain is that the results of the MoA-HMM could be indicative of an underlying arbitration process, without modeling that process per se, and can be used to test neural hypotheses driven by this idea. We’ve added additional clarification in the discussion to highlight this point.

      Explicitly, we added the following sentence in the discussion: “For example, while the MoA-HMM itself is a descriptive model of behavior and is not explicitly modeling an underlying arbitration of controllers in the brain, the resulting behavioral states may be indicative of underlying neural processes and help identify times when different neural controllers are prevailing”

      Second, while the authors demonstrate that model recovery recapitulates the weight dynamics and action values (Fig. 3), the actual parameters that are recovered are less precise (Fig. 3 Supplement 1). The authors should comment on how this might affect their later inferences from behavioral data. Furthermore, it would be better to quantify using the R^2 score between simulated and recovered, rather than the Pearson correlation (r), which doesn't enforce unity slope and zero intercept (i.e. the line that is plotted), and so will tend to exaggerate the strength of parameter recovery.

      In the methods section, we noted that the interaction between parameters can cause the recovery of randomly drawn parameter sets to fail, as seen in Figure 3 Supplement 1. This is because there are parameter regimes (specifically when a softmax temperature is near zero) that cause choices to be random, so that other parameters no longer matter. To address this, we included a second supplemental figure, Figure 3 Supplement 2, where we recovered model parameters from data simulated solely from models inferred from the behavioral data. Recovery of these models is much more precise, which lends credibility to our later inferences from the behavioral data.

      To make this point clearer, we changed the reference to Figure 3 Supplements 1 & 2 to: “(Figure 3 – figure supplement 1 for recovery of randomized parameters with noted limitations, and figure supplement 2 for recovery of models fit to real data)” We additionally added the following to the Figure 3 Supplement 1 caption: “Due to the interaction between different model parameters (e.g. a small 𝛽 weight will affect the recoverability of the agent’s learning rate 𝛼), a number of “failures” can be seen.”
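
      The recoverability issue can be illustrated with a toy softmax. The sketch below assumes a single agent whose values are scaled by a weight 𝛽 before a softmax; this is a simplification for illustration, not the full MoA-HMM, and the value vectors are hypothetical stand-ins for what different learning rates 𝛼 might produce.

```python
import math

def choice_probs(values, beta):
    """Softmax over action values scaled by the weight beta."""
    scaled = [beta * v for v in values]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical values, e.g. as learned under a small vs. a large learning rate.
q_small_alpha = [0.2, 0.1]
q_large_alpha = [0.9, 0.1]

for beta in (0.0, 5.0):
    print(beta, choice_probs(q_small_alpha, beta), choice_probs(q_large_alpha, beta))
```

      With 𝛽 at zero both value sets give identical, uniform choice probabilities, so the data carry no information about 𝛼; with a larger 𝛽 the two cases become distinguishable.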

      Furthermore, we added an R^2 score that enforces unity slope and zero intercept alongside the Pearson correlation coefficient for more comprehensive metrics of recovery. The R^2 scores are plotted on both Figure 3 Supplements 1 & 2 as “R2”, and the following text was added in both captions: “"r" is the Pearson's correlation coefficient between the simulated and recovered parameters, and "R2" is the coefficient of determination, R2, calculating how well the simulated parameters predict the recovered parameters.”
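
      To make the distinction between the two metrics concrete, the following sketch computes both on hypothetical data. Here `r2_identity` treats the simulated parameters as predictions of the recovered ones, which amounts to scoring against the identity line; the function names and example values are assumptions for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def r2_identity(simulated, recovered):
    """Coefficient of determination with the simulated values as predictions."""
    mean_rec = sum(recovered) / len(recovered)
    ss_res = sum((r - s) ** 2 for s, r in zip(simulated, recovered))
    ss_tot = sum((r - mean_rec) ** 2 for r in recovered)
    return 1.0 - ss_res / ss_tot

# Hypothetical recovery that is perfectly linear but biased (slope 2, offset 0.3).
simulated = [0.1, 0.2, 0.3, 0.4, 0.5]
recovered = [2 * s + 0.3 for s in simulated]
print(pearson_r(simulated, recovered), r2_identity(simulated, recovered))
```

      In this example the Pearson correlation is essentially 1 while the R2 score is negative, which is why the latter is the stricter measure of parameter recovery.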

      Finally, the authors are very aware of the difficulties associated with long-timescale (minutes) correlations with neural activity, including both satiety and electrode drift, so they do attempt to control for this using a third-order polynomial as a time regressor as well as interaction terms (Fig. 7 Supplement 1). However, on net there does not appear to be any significant difference between the permutation-corrected CPDs computed for states 2 and 3 across all neurons (Fig. 7D). This stands in contrast to the claim that "the modulation of the reward effect can also be seen between states 2 and 3 - state 2, on average, sees a higher modulation to reward that lasts significantly longer than modulation in state 3," which might be true for the neuron in Fig. 7C, but is never quantified. Thus, while I am convinced state modulation exists for model-based (MBr) outcome value (Fig. 7A-B), I'm not convinced that these more gradual shifts can be isolated by the MoA-HMM model, which is important to keep in mind for anyone looking to apply this model to their own data.

      We agree with the reviewers that our initial test of CPD significance was not sufficient to support the claims we made about state differences, especially for Figure 7D. To address this, we updated the significance test and indicators in Figure 7B,D to instead signify when there is a significant difference between state CPDs. This updated test supports a small, but significant difference in early post-outcome reward modulation between states 2 and 3.

      We clarified and updated the significance test in the methods with the following text:

      “A CPD (for a particular predictor in a particular state in a particular time bin) was considered significant if that CPD computed using the true dataset was greater than 95% of corresponding CPDs (same predictor, same state, same time bin) computed using these permuted sessions. For display, we subtract the average permuted session CPD from the true CPD in order to allow meaningful comparison to 0.

      To test whether neural coding of a particular predictor in a particular time bin significantly differed according to HMM state, we used a similar test. For each CPD that was significant according to the above test, we computed the difference between that CPD and the CPD for the same predictor and time bin in the other HMM states. We compare this difference to the corresponding differences in the circularly permuted sessions (same predictor, time bin, and pair of HMM states). We consider this difference to be significant if the difference in the true dataset is greater than 95% of the CPD differences computed from the permuted sessions.”
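
      The comparison step of this test can be sketched as follows. The CPD computation itself is not shown; the functions below only illustrate circular shifting of a per-trial sequence, the "greater than 95% of permuted values" criterion, and the subtraction of the permuted mean used for display. All names and numbers are hypothetical.

```python
def circular_shift(labels, offset):
    """Rotate a per-trial sequence by `offset` positions, wrapping around the end."""
    offset %= len(labels)
    return labels[-offset:] + labels[:-offset] if offset else list(labels)

def exceeds_null(true_value, null_values, level=0.95):
    """True if the observed value is greater than `level` of the permuted values."""
    n_below = sum(1 for v in null_values if true_value > v)
    return n_below / len(null_values) >= level

def permutation_corrected(true_value, null_values):
    """Observed value minus the mean of the permuted values, for comparison to 0."""
    return true_value - sum(null_values) / len(null_values)

# Hypothetical CPDs for one predictor and time bin in two HMM states.
null_state_a = [0.010, 0.012, 0.008, 0.011, 0.009]
null_state_b = [0.009, 0.011, 0.010, 0.008, 0.012]
true_a, true_b = 0.050, 0.020

# Test of a single CPD, then of the difference between states.
print(exceeds_null(true_a, null_state_a))
null_diffs = [a - b for a, b in zip(null_state_a, null_state_b)]
print(exceeds_null(true_a - true_b, null_diffs))
```

      The same criterion is applied twice: once to a CPD against its permuted counterparts, and once to a between-state difference against the corresponding permuted differences.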

      We updated the significance indicators above the panels in Figure 7B,D (colored points) to refer to significant differences between states, with additional text to the left of each row of points to specify the tested state and which states it is significantly greater than. We updated the figure caption for both B and D to reflect these changes.

      We also changed text in the results to focus on significant differences between states. Specifically, we replaced the sentence “Looking at the CPD of expected outcome value split by state (Figure 7B) reveals that the trend from the example neuron is consistent across the population of OFC units, where state 2 shows the greatest CPD.” with the sentence “Looking at the CPD of expected outcome value split by state (Figure 7B) reveals that the trend from the example neuron is consistent across the population of OFC units, where state 2 has a significantly greater CPD than states 1 and 3.”

      We also replaced the sentence “Suggestively, the modulation of the reward effect can also be seen between states 2 and 3 – state 2, on average, sees a higher modulation to reward that lasts significantly longer than modulation in state 3.” with the sentence “Additionally, the modulation of the reward effect can also be seen between states 2 and 3 — immediately after outcome, we see a small but significantly higher modulation to reward during state 2 than during state 3.”

      Reviewer #2 (Public Review):

      There were a lot of typos and some figures were mis-referenced in the text and figure legends.

      We apologize for the numerous typos and errors in the text and are grateful for the assistance in identifying many of them. We have taken another thorough pass through the manuscript to address those identified by the reviewer as well as fix additional errors. To reduce redundancy, we’ll address all typo- and error-related suggestions from both reviewers here.

      ● We fixed all Figure 1 references. We additionally reversed the introduction order of the agents in Figure 1 and in the results section “Reinforcement learning in the rat two-step task”, where we introduce both model-free agents before both model-based agents. This is to make the model-based choice agent description (which references the model-free choice agent in the statement “That is, like MFc, this agent tends to repeat or switch choices regardless of reward”) come after introducing the model-free choice agent.

      ● We fixed all Figure 4 references.

      ● We fixed all Figure 6 references and fixed the panel references in the figure caption to match the figure labeling: Starting with panel B, the reference to (i) was removed, and the reference to (ii) was updated to C. The previous reference to C was updated to D.

      ● All line-numbered suggestions were addressed.

      ● The text “(move to supplement?)” was removed from the methods heading, and the mistaken reference to Q_MBr was fixed.

      ● We removed all “SR” acronyms from the statistics as it was an artifact from an earlier draft.

      ● We homogenized notation in Figure 2, replacing all “c” variable references with “y”, as well as homogenized notation of β.

      ● We replaced many uses of the word “action” with the word “choice” for consistency throughout the manuscript.

      ● We addressed many additional minor errors.

      Reviewer #1 (Recommendations For The Authors):

      (1) Could the authors comment on why the cross-validated accuracy continues to increase, albeit non-significantly, after four states, as opposed to decreasing (as I would naively expect would be the result due to overfitting)?

      Due to the large number of trials and sessions obtained from each rat (often >100 sessions with >200 trials per session) and the limited number of training iterations (capped at 300 iterations), it is not guaranteed that the cross-validated accuracy would decrease over the range of states we included in Figure 4, especially given that the total number of parameters in the largest model shown (7 states, 95 parameters) is far smaller than the number of observations. Since we’re mainly interested in using this tool to identify interpretable, consistent structure across animals, we did not focus on interpreting the regime of larger models.

      (2) It seems like the model was refit multiple times with different priors ("Estimation of Population Prior"), each derived from the previous step of fitting. I'm not very familiar with fitting these kinds of models. Is this standard practice? It gives off the feeling of double-dipping. It would be helpful if the authors could cite some relevant literature here or further justify their choices.

      We adopted a “one-step” hierarchical approach, where we estimate the population prior a single time on (nearly) unconstrained model fits, and use it for a second, final round of model fits which were used for analysis. Since the prior is only estimated once, in practice there isn’t risk of converging on an overly constrained prior. This is a somewhat simplified approach motivated by analogy to the first step of EM fit in a hierarchical model, in which population- and subject-level parameters are iteratively re-estimated in terms of one another until convergence (Huys et al., 2012; Daw 2010). We have clarified this approach in the methods with citations by adding the following paragraph:

      “Hierarchical modeling gives a better estimate of how model parameters can vary within a population by additionally inferring the population distribution over which individuals are likely drawn (Daw, 2011). This type of modeling, however, is notoriously difficult in HMMs; therefore, as a compromise, we adopt a “one-step” hierarchical model, where we estimate population parameters from “unconstrained” fits on the data, which are then used as a prior to regularize the final model fits. This approach is motivated by analogy to the first step of EM fit in a hierarchical model, in which population- and subject-level parameters are iteratively re-estimated in terms of one another until convergence (Daw, 2011; Huys et al., 2012). It is important to emphasize, since we aren’t inferring the population distributions directly, that we only estimate the population prior a single time on the “unconstrained” fits as follows.”
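
      A minimal sketch of the “one-step” idea is given below. It assumes, purely for illustration, an independent Gaussian prior per parameter estimated from the unconstrained fits; the actual form of the prior and the fitting procedure are described in the methods and are not reproduced here. All function and parameter names are hypothetical.

```python
import math
import statistics

def estimate_population_prior(unconstrained_fits):
    """Estimate a (mean, sd) pair per parameter from per-subject unconstrained fits."""
    names = unconstrained_fits[0].keys()
    prior = {}
    for name in names:
        values = [fit[name] for fit in unconstrained_fits]
        prior[name] = (statistics.mean(values), statistics.stdev(values))
    return prior

def log_prior(params, prior):
    """Log density of the parameters under the assumed Gaussian prior."""
    total = 0.0
    for name, value in params.items():
        mu, sd = prior[name]
        total += -0.5 * ((value - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))
    return total

def regularized_objective(log_likelihood, params, prior):
    """Objective for the final fits: data log-likelihood plus the fixed log prior."""
    return log_likelihood + log_prior(params, prior)

# Hypothetical unconstrained estimates of a learning rate for three subjects.
fits = [{"alpha": 0.2}, {"alpha": 0.4}, {"alpha": 0.6}]
prior = estimate_population_prior(fits)  # estimated once, then held fixed
print(prior, log_prior({"alpha": 0.4}, prior), log_prior({"alpha": 0.9}, prior))
```

      The key design point mirrored here is that the prior is computed a single time and then held fixed, rather than being re-estimated after each round of fitting.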

      Reviewer #2 (Recommendations For The Authors):

      Figure 3a.iii: Did the model capture the transition probabilities correctly as well?

      We have updated Figure 3E to include additional panels (iii) and (iv) to show the recovered initial state probabilities and transition matrix.

      For Figure 6, panel B makes it look like there is a larger influence of state on ITI rate after omission, in both the top and bottom plots. However, the violin plots in panel C show a different pattern, where state has a greater effect on ITIs following rewarded trials. Is it that the example in panel B is not representative of the population, or am I misinterpreting?

      We thank the reviewer for catching this issue, as the colors were erroneously flipped in panel C. We have fixed this figure by ensuring that the colors match the trial type (reward or omission). Additionally, we updated the colors in B and C that correspond to reward (previously gray, now blue) and omission (previously gold, now red) trials to match the color scheme used in Figure 1. We also inverted the corresponding line styles (reward changed to solid, omission changed to dashed) to match the convention used in Figure 7. To differentiate from the reward/omission color changes, we additionally changed the colors in Figure 6D and Figure 7 Supplement 1, where the color for “time” was changed from blue to gray, and the color for “state” was changed from red to gold.

      For figure 4B right, I am confused. The legend says that this is the change in model performance relative to a model with one fewer state. But the y-axis says it's the change from the single-state model. Please clarify.

      The plot shows the increase in performance relative to the single-state model, while the significance tests compare models with consecutive numbers of states. We updated the significance indicators on the plot to identify more clearly that adjacent models are being compared (with the exception of the 2-state model, which is being compared to 0). We updated the Figure 4B caption text for the left panel to state: “Change in normalized, cross-validated likelihood when adding additional hidden states into the MoA-HMM, relative to the single-state model. Significant changes are computed with respect to models with one fewer state (e.g. 2-state vs 1-state, 3-state vs 2-state).”
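
      The distinction between the plotted quantity and the tested quantity can be made concrete with a short sketch. The array layout, the function name `likelihood_changes`, and the numbers are hypothetical; only the two definitions (change relative to the single-state model, and change relative to the model with one fewer state) come from the caption text.

```python
import numpy as np

def likelihood_changes(cv_ll):
    """cv_ll: array (n_subjects, n_models); column k holds the (k+1)-state model."""
    cv_ll = np.asarray(cv_ll, dtype=float)
    # Plotted quantity: change relative to the single-state model (column 0).
    rel_single = cv_ll[:, 1:] - cv_ll[:, [0]]
    # Tested quantity: change relative to the model with one fewer state.
    rel_prev = np.diff(cv_ll, axis=1)
    return rel_single, rel_prev

# Hypothetical normalized cross-validated likelihoods for 2 subjects and 1 to 4 states.
cv_ll = np.array([[0.50, 0.55, 0.57, 0.57],
                  [0.48, 0.54, 0.55, 0.56]])
rel_single, rel_prev = likelihood_changes(cv_ll)
```

      For the 2-state model the two quantities coincide, since the model with one fewer state is the single-state baseline; for larger models the plotted value is the running sum of the adjacent differences.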

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      (1) Gap of knowledge:

      From the introduction, I got the impression that the manuscript tries to answer the question of whether homeostatic structural plasticity is functionally redundant to synaptic scaling. However, the importance of this question needs to be worked out better. Also, I think it is hard to tackle this question with the shown experiments as one would have to block all other redundant mechanisms and see whether HSP functionally replaces them.

      We appreciate the reviewer’s valuable feedback regarding the relationship between homeostatic structural plasticity (HSP) and synaptic scaling. The main objective of our study is indeed to investigate whether structural plasticity is homeostatically regulated, and if so, whether it acts as a redundant or heterogeneous mechanism in relation to synaptic scaling, which is widely recognized as a primary homeostatic process.

      In our revised introduction, we have clarified this central question and its significance. Specifically, we explored why experimentally observed changes in spine density, a measure of structural plasticity, do not exhibit the same homeostatic characteristics as changes in spine head size, which reflects synaptic scaling, particularly under conditions of activity blockade.

      We hypothesized two key points:

      (1) Structural plasticity may not follow a monotonically activity-dependent rule as strictly as synaptic scaling.

      (2) The observed changes in spine density may be influenced by the simultaneous modulation of spine size, suggesting that structural plasticity and synaptic scaling interact within the same biological system.

      Both hypotheses were tested through a combination of experimental observations and systematic computer simulations. Our conclusions demonstrate that spine-number-based structural plasticity follows a biphasic activity-dependent rule. While it largely overlaps with synaptic scaling under typical conditions, it exhibits heterogeneity under extreme conditions, such as activity silencing. Furthermore, our simulations revealed that both mechanisms can compete and complement each other within neural networks.

      We believe that these results offer a nuanced understanding of the interaction between structural plasticity and synaptic scaling, highlighting their redundancy under most conditions but also their heterogeneity under specific circumstances. Blocking all other redundant mechanisms, as suggested, would provide a more reductionist view, which may not capture the complexity and interplay of these processes in a physiological setting. Our approach reflects this complexity, providing insight into how these mechanisms operate together in a naturalistic context.

      We have revised the introduction to better convey these points and emphasize the significance of this question for understanding the dynamics of homeostatic regulation in neural networks.

      Similarly, the simulations do not really tackle redundancy as, e.g. network growth cannot be achieved by scaling alone.

      We appreciate the reviewer’s comment regarding synaptic scaling's limitations in achieving network growth. We would like to clarify that we did not intend to suggest that structural plasticity and synaptic scaling are fully redundant. In fact, it is well established in the literature that structural plasticity plays a dominant role during development, particularly in network growth, which synaptic scaling alone cannot achieve.

      The primary objective of our study was to investigate the interaction between structural plasticity and synaptic scaling under conditions of activity perturbation, rather than during network growth or development. To avoid any confusion regarding developmental processes, we chose to grow the network using only structural plasticity in our simulations. Synaptic scaling was then introduced (or not) during the phase of activity deprivation to specifically examine its role in regulating homeostasis under these conditions.

      We have revised the corresponding sections of the manuscript to clarify this distinction, and we have ensured that the simulations reflect our focus on activity perturbation rather than network development. This distinction should help readers avoid conflating developmental processes with the specific goals of our study.

      Instead, the section on "Integral feedback mechanisms" (L112-129) contains a much better description of the actual goals of the paper than is given in the introduction. Moreover, this section does not seem to include any new results (at least the Ca-dependent structural plasticity and synaptic scaling rules seem to be very common for me). I, therefore, suggest fusing this paragraph in the introduction to obtain a clearer and better understandable gap of knowledge, which is addressed by the paper.

      We agree that the "Integral feedback control" section provides key information relevant to both the Introduction and Methodology. It outlines the theoretical framework and serves as a basis for the experimental design.

      To better reflect this, we have revised the Introduction to include the gap in knowledge. However, we opted to retain the section in the Results, slightly modified, to set the context for the first experiment.

      Along this line, as it seems a central point of the manuscript to distinguish the controller dependencies on Calcium, the different dependencies (working models) should be described in more detail. Also, the description of the inconsistencies of the previous results on HSP can be moved from the discussion (l419-l441) to the introduction.

      We have revised the manuscript to place less emphasis on the controller models while retaining the core principles of control theory. The description of the HSP model has been moved to the Introduction, as suggested, while the detailed history remains in the Discussion to maintain the manuscript's consistency.

      Systematic text revision: Regarding comment (1), we thank the reviewer for suggesting the text reorganization. We have adjusted several parts in the introduction, M&M section, and results section to increase clarity.

      (2) Pharmacological Choice:

      It should be discussed why NBQX is used to induce the homeostatic effect instead of TTX. As there are studies showing that it might block homeostatic rewiring (doi.org/10.1073/pnas.0501881102) as well as synaptic scaling (10.1523/JNEUROSCI.3753-08.2009), it seems unclear whether the observed effects are actually corresponding to those in other publications.

      The rationale for using NBQX in our experiments, rather than TTX, is detailed in the public response. We selected NBQX based on specific experimental motivations relevant to our study’s objectives, while acknowledging the potential differences in effects compared to other studies.

      Local text revision: We added one paragraph in the discussion section to explain the idea better.

      (3) Model-Experiment Connection:

      The paper combines simulations with experimental work, which is very good. However, in my opinion, the only connection between the two parts is that the experiments suggest a non-monotonic dependency between firing rate and synapse density (i.e. the biphasic dependency). The rest of the experimental results seem to be neglected in the modeling part. It is not even shown that the model reproduces the experiments. Instead, the model is tested in different situations and paradigms (blocking AMPARs in the whole culture vs network growth or silencing a sub-population). I think it would make the paper stronger and more consequential when a reproduction of the experiment by the model is demonstrated (with analogue analyses).

      The experimental results serve three main purposes. First, as the reviewer noted, the spine analysis was conducted to inform the biphasic rule. Second, spine size analysis was performed to replicate published findings and confirm our modeling results, showing that activity deprivation leads to fewer synapses with larger sizes or higher weights. Third, the correlation analysis of spine density and size across dendritic segments suggested a hybrid combination of two types of plasticity across different neurons.

      While we addressed these aspects in the Results and Discussion sections, the collective presentation in Fig. 2 may have caused some confusion. To improve clarity, we have now split the experimental results, presenting them alongside the relevant modeling data in Fig. 2, Fig. 8, and Fig. 9.

      Also, there are a few more mismatches between the experiment and the model that you will want to discuss:

      • The size-dependent homeostatic effect (l154ff, Fig2F) is not reflected by the used scaling model.

      We revised Fig 8 and the corresponding text to explain how the scaling model reflects such an effect.

      • The model assumes reduced Ca levels. Yet, the experimental protocol blocks AMPARs, which are to my knowledge not the primary source of Ca influx, but rather the NMDARs.

      The model is based on neural activity, with calcium concentration serving as an internal integral signal of the firing rate, allowing for integral control. While calcium plays a critical role in homeostasis, we caution against drawing a strict correspondence between the model's calcium dynamics and the experimental protocol, as calcium can be sourced from multiple pathways in neurons beyond AMPARs, such as NMDARs, voltage-gated calcium channels, and intracellular stores. Also, our recent work demonstrated that under baseline conditions, the majority of AMPARs are not Ca2+-permeable, i.e., they are GluA2-containing rather than GluA2-lacking (Kleidonas et al., 2023).

      Improving the calcium dynamics, including secondary calcium release and calcium stores, is part of our future plan to refine the HSP model and address experimental findings that are not fully explained by the current model.

      • The model further assumes silencing by input removal, whereas the recurrent connections stay intact. Wouldn't this rather correspond to a deafferentation experiment, where connections to another brain area are cut?

      Thank you for pointing this out. The modeling section was not intended to directly replicate the tissue culture experiments but rather to provide insights into a broader range of scenarios, including pharmacological treatments, deafferentation, lesions, and even monocular deprivation.

      Systematic text revision: Regarding comment (3), the goal of our modeling work went beyond reproducing the experiments. So that the experimental results better serve their purposes in the present study (to inform, confirm, and inspire the modeling), we have systematically rearranged the experimental and modeling results to link them more closely.

      (4) Is the recurrent component too weak?

      Your results show that HSP does not restore activity after silencing (deafferentation), whereas you discuss that earlier models did achieve this by active neighbors in a spatially organized network. However, the silenced neurons in your simulations also receive inputs through the "recurrent" connections from their neighbors (at least shortly after silencing). Therefore, given the recurrent input is strong enough, they should be able to recover in a similar way as the spatially organized ones. As a consequence, I obtained the impression that, in your model networks, activity is strongly driven by external stimulation and less by recurrent connections. I understand that this is important to achieve silencing through removing the Poisson stimulation. Yet, this fact may be responsible for the failure to restore activity such that presented effects are only applicable for networks that are strongly driven by external inputs, but not for strongly recurrent networks, which would severely limit the generality of the results. As a consequence, the paper would benefit from a systematic analysis of the trade-off between recurrent strength and input strength. Maybe, different constant negative currents could be injected in all neurons, such that HSP creates more recurrent synapses in the network.

      We appreciate this insight. However, increasing recurrent input strength is beyond the scope of the current study, as it would fundamentally alter the predefined network dynamics of the Brunel network used. As noted in the manuscript, complete isolation or cell death is not always the outcome after input deprivation, lesion, or stroke, which cannot be fully explained by the Gaussian HSP rule alone. Butz and colleagues offered a solution using growth rules that maximized recurrent input, and we recognize the importance of their work.

      That said, we approached the issue from a different angle, emphasizing the role of synaptic scaling in recurrence rather than relying solely on recurrent input strength. In biological networks, external inputs may vary, recurrency can be weak or strong, and synaptic scaling can dominate. Our model offers a complementary hypothesis, suggesting that these factors, in combination, contribute to the diverse and sometimes contradictory results found in the literature, rather than posing a strict constraint on network topology.

      Local text revision: We emphasized these points in the Discussion section again.

      (5) Missing conclusions / experimental predictions

      As already described, the modelling work is not reproducing the presented or previous experimental data. Hence, the goal of modelling should be to derive a more general understanding and make experimental predictions. Yet, the conclusions in the discussion stay superficial and vague and there are no specific experimental predictions derived from the model results.

      For example, the authors report that the recovery of activity in silenced cultures is observed in a previously spatially structured model but not in theirs -- at least with slow or no scaling. Yet it is left to the reader to think about whether the current model is an improvement to the previous one, how they could be experimentally distinguished, or to which experimental findings they relate or compare, which I would expect at this point. I would advise reworking the discussion and thoroughly working out which new insights the modelling part of the study has generated (not to be confused with the assumptions of the model aka the biphasic plasticity rule) and relating them to experimental pre- and postdiction.

      We recognize the reviewer’s concern, which is closely related to comment (4). We have addressed these points by reorganizing the text to better clarify the purpose of our experimental work and its connection to the modeling results.

      Specifically, we have reworked the discussion to highlight the new insights gained from the modeling, and how these can inform experimental predictions and interpretations. This includes distinguishing our model from previous ones and providing clearer connections to experimental findings.

      Systematic text revision: Most of the comments raised here, on combining experimental and modeling results and on developing the story, likely also reflect the expectations and concerns of a broader readership, so we have adjusted the text in the Results and Discussion sections accordingly to make our points clear.

      Suggestions for minor changes:

      Fig 1I: Please check the graph and make it more self-explaining. For example, mark the "setpoint" activity (in my opinion, both curves should be at baseline there. In that case, however, I do not see the biphasic behavior anymore). Maybe the table and the graph can be aligned along the activity axis? Also: synaptic inhibition should be increased and not decreased, right?

      Local text and figure revision: We assume the reviewer meant Fig. 2I. We have improved the visualization to avoid confusion.

      L74-81: I would reverse the order of associative and homeostatic plasticity in this paragraph.

      Local text and figure revision: We have fine-tuned the order in the first and second paragraphs to match the readers' expectations.

      L74-75: Provide references for such theories.

      Local text and figure revision: fixed.

      L84-86: Please provide a reference for the claim that negative feedback, redundancy, and heterogeneity contribute to robustness.

      Local text and figure revision: fixed.

      L 95-97: I think the heterogeneity aspect needs to be worked out a bit better. Do you mean that the described mechanisms contribute to firing rate homeostasis in a different mixture for each neuron (as shown assumed in the last figure)?

      Local text and figure revision: The term heterogeneity is used in the manuscript for two major different settings: (1) heterogeneity in terms of control theory and (2) different combinations of HSP and SS rules. We have named the second condition as diversity to avoid confusion.

      L 132: The question of linearity has not been posed so far. Also, I think "monotonous" would be a much better term than linear (as a test for linearity would require more than 2 datapoints).

      Local text and figure revision: We agree that ‘linear’ is not a good term. We replaced it with ‘monotonic’ throughout the manuscript.

      Fig2 Bii: The data for 50 µM is clearly not Gaussian.

      We did not imply that the 50 µM condition is Gaussian. Instead, we noted that the non-linearity observed in both the 200 nM and 50 µM data suggests a non-monotonic growth rule rather than a linear one. We applied the Gaussian rule because it has been extensively studied in previous simulations, allowing us to benchmark our findings against those results.

      Fig2 D, E inset: The point at time 0 does not convey any information and could be left out.

      The time zero data is included to demonstrate that the three groups have a similar baseline, ensuring that any observed differences are due to the treatment and not pre-existing biases in the grouping.

      L 178: As the Gaussian rule drops below zero above the upper set-point again, it is rather tri-phasic than bi-phasic.

      We intended to convey that inhibition results in either spine growth or deletion, reflecting a bi-phasic response rather than a true tri-phasic one.
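
      The exchange above is easier to follow with the shape of a Gaussian growth rule in hand. The sketch below uses a Gaussian parametrization that is common in earlier HSP modeling work; it is an assumption here, since this response does not reproduce the manuscript's equation, and the names `eta` (lower setpoint), `eps` (upper setpoint), and `nu` (maximal rate) are illustrative.

```python
import numpy as np

def gaussian_growth_rate(ca, eta=0.2, eps=1.0, nu=1.0):
    """Rate of synaptic-element change as a function of calcium concentration.

    Assumed form: nu * (2 * exp(-((ca - xi) / zeta)**2) - 1), with xi and zeta
    chosen so that the rate is exactly zero at ca == eta and ca == eps.
    """
    xi = 0.5 * (eta + eps)                             # center of the Gaussian
    zeta = (eps - eta) / (2.0 * np.sqrt(np.log(2.0)))  # width giving zero crossings at eta, eps
    return nu * (2.0 * np.exp(-((ca - xi) / zeta) ** 2) - 1.0)

# Rates below the lower setpoint, at the center, and above the upper setpoint.
rates = gaussian_growth_rate(np.array([0.0, 0.6, 1.5]))
```

      Under this form the rate is negative below `eta`, positive between `eta` and `eps`, and negative again above `eps`. That gives three calcium ranges, which matches the reviewer's reading, but only two kinds of outcome (growth or deletion), which is the sense of "bi-phasic" used in the response.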

      Fig 6A: You may want to mark the eta variables in the curves.

      Local text and figure revision: fixed.

      Fig 6E: The curve of the S population extending to the next panel looks a bit messy.

      We retained the curve extension to visually convey the impression of excessive network activity.

      L272: It needs to be better described/motivated how protocol 1 and 2 are supposed to study the role of recurrent connection as well as what kind of biological situation this may be.

      Local text and figure revision: The corresponding text has been adjusted to avoid confusion.

      L 272: It is not clear how faster simulation leads to less recurrent connectivity, when the stimulation protocol and the rates stay the same and the algorithm compensates for the timestep properly. Maybe you rather want to say that you silence 10x longer and stimulate 10x longer?

      Local text revision: The corresponding text has been adjusted to avoid confusion.

      L. 302: "reactivate"?

      Local text revision: fixed.

      L 322f: I would suggest showing the connectivity matrix for a time-point with restored activity as well.

      Local text and figure revision: fixed.

      Fig 8A: The use of the morphological reconstructions is a bit misleading as the model uses point neuron.

      Local text revision: Now after reorganization, it is in Fig.9. We kept the reconstruction figure for motivational purposes, suggesting how to understand the meaning of the combinations in more biologically realistic scenarios. The corresponding text has been adjusted to avoid confusion.

      Fig 8E-F: the y axis should be in the same orientation as in panel D.

      Local text and figure revision: Good idea and fixed in the new Fig. 9.

      Fig. 8F: The results here look a little bit random. Maybe more runs with the same parameters would smooth out the contours or reveal a phase transition.

      Local text and figure revision: Thank you for the suggestion. We conducted an additional ten random trials to average the traces and heatmaps, improving the clarity of the results now presented in Fig. 9.

      L411: Note that there are earlier HSP models by Damasch and van Ooyen & van Pelt, that might be worth discussing here.

      Local text revision: fixed.

      L416 "beyond synaptic scaling" reference needed.

      Local text revision: fixed.

      L419: The biphasic rule was suggested by Butz already.

      Local text revision: We adjusted the text to emphasize our contribution in suggesting/confirming the biphasic rule based on direct experimental observations.

      L 419-44: Most of this is actually state of the art and may be better placed in the introduction to justify the use of NBQX as a competitive blocker.

      Local text revision: We adjusted the text in the introduction and Discussion sections to cover the raised points.

      L487: In my opinion, although scaling adapts the weights quickly, the information about deviating firing rate is still stored in the calcium signal such that it will also give rise to structural changes (although they may be small when the rate is low). Thus, I think that fast scaling does not abolish structural changes.

      Local text revision: We adjusted the text to account for other factors that could lead to the same or opposite conclusions.

      L502f: Sentence unclear. Do you mean Ca is an integrated (low-pass filtered) version of the firing rate?

      Yes.

      L504: What is the cumulative temporal effect of error in estimating firing rates?

      We were referring to the potential instability in numeric simulations if the firing rate is not tracked by an integral signal (calcium concentration) but is instead estimated through average spike counts over time. In our model, calcium serves as a proxy for the firing rate to guide homeostatic structural plasticity. The intake and decay constants are set to minimize the accumulation of errors over time, making long-term error accumulation unlikely. In any case, this is not intended to be a precise measure of the firing rate but rather a smooth guide for homeostatic control.

      Local text revision: We rewrote the section so as not to cause extra concerns.
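
      The idea that calcium acts as a low-pass-filtered proxy for the firing rate can be sketched as a leaky integrator of the spike train. This is a minimal illustration, not the manuscript's implementation: the function name `calcium_trace`, the decay constant `tau`, the intake per spike `beta`, and all numerical values are assumptions.

```python
import numpy as np

def calcium_trace(spike_counts, dt, tau=10.0, beta=0.1):
    """Leaky integration of spikes: exponential decay plus a jump of beta per spike."""
    decay = np.exp(-dt / tau)
    ca = np.zeros(len(spike_counts))
    level = 0.0
    for i, n_spikes in enumerate(spike_counts):
        level = level * decay + beta * n_spikes
        ca[i] = level
    return ca

# Hypothetical Poisson spike train: 5 Hz for 500 s in 10 ms bins.
rng = np.random.default_rng(1)
dt, rate = 0.01, 5.0
spikes = rng.poisson(rate * dt, size=50_000)
ca = calcium_trace(spikes, dt)
# After the initial transient, the mean level is close to beta * tau * rate.
steady_mean = ca[10_000:].mean()
```

      Because the trace decays continuously, an individual spike contributes for only a few multiples of `tau`, so the signal tracks the recent firing rate smoothly without requiring an explicit spike-count average.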

      L505: Which two rules are meant here? Ca- and firing rate based or HSP and scaling?

      Local text revision: The two rules are the HSP rule and the HSS rule. We have adjusted the text to improve clarity.

      L505ff: I did not really understand the control theoretic view here and Supp Fig 5 is not self-explaining enough to help. In my view, scaling is a proportional controller for the calcium level (the setpoint is defined for calcium and not firing rate). Also, all of the HSP rules contain neither an integral nor a derivative of the error and are thus nonlinear but proportional controllers to a first approximation. If this part is supposed to stay in the manuscript, the supporting information should contain a more detailed mathematical explanation. Relevant previous work on homeostatic control by synaptic scaling and homeostatic rewiring, e.g. doi: 10.23919/ECC54610.2021.9655157, should be discussed.

      Local text revision: We have updated the last paragraph to increase clarity. The HSP and HSS rules are proportional and integral for neural activity, as neural firing rate homeostasis is the meaningful goal. However, it is also correct that the integral component is gone if we view calcium concentration as the goal or setpoint. This paper is discussed and cited in a paragraph above this one.

      Reviewer #2 (Recommendations For The Authors):

      I have some additional suggestions and questions for the authors, which I am presenting following the order of the figures.

      Fig 1A: I'm a little bit puzzled by the timescales between Hebbian and homeostatic plasticity; a wealth of data suggests that Hebbian plasticity acts on a faster timescale than homeostatic plasticity, while Aii-Aiii implies the opposite. In lesion-induced degeneration, for instance, which is mentioned later by the authors, spine loss has been suggested to be Hebbian (LTD) while the subsequent recovery is homeostatic. Additionally, it will not be clear to the reader if the same stimulus could induce Hebbian and homeostatic plasticity, or why; the rest of the manuscript seems to imply that any stimulus could and would trigger homeostatic plasticity, which is not the case. Finally, there should be a mention somewhere that Hebbian structural plasticity also exists.

      Local text and figure revision: We thank the reviewer for pointing out the time scale issue, which was not explicitly considered here and is now updated.

      Fig. 2Bii: There is no significant difference at 200 nM NBQX for sEPSC amplitude, contrary to what is stated in the text (line 136). Which one is it?

      Local text revision: We thank the reviewer for pointing out the mistake. We have inspected the original statistical file and corrected the text.

      Fig. 2F: The description of Fig. 2F in the text confused me for the longest time. I am still unsure why 200 nM NBQX is described as leading to a general size increase when it follows the control line so closely, crosses 0 at the same point, and is even below the control line for the largest spine sizes. Similarly, 50 µM NBQX neatly overlaps with the control condition except for the smallest and largest spines, so the "shrinkage of middle-sized spines" doesn't seem different from the control condition. I also couldn't find any data supporting the statement that 50 µM NBQX increased only the size of "a small subset of large spines". Maybe the authors could clarify this section? I would also suggest adding statistics between the treatments at each spine size bin to support the claims, as they are central to the rest of the paper.

      Importantly, there is no description of the normalization nor the quantification of the difference between days in the methods; I am assuming post-pre for the difference and (post-pre)/pre for the normalization, but this should be much more detailed in the methodology. I was happy to see the baseline raw spine sizes in Supplementary Fig. 1, and would also suggest adding the raw spine sizes after treatment for comparison.

      Local text and figure revision: We have adjusted the text and figure to improve clarity.
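
      For reference, the two quantities the reviewer assumes (post minus pre for the difference, and that difference divided by pre for the normalization) can be written out as follows. These definitions are the reviewer's assumption and are not confirmed in this response; the function name `spine_size_change` and the values are hypothetical.

```python
import numpy as np

def spine_size_change(pre, post):
    """Per-spine change between imaging days, under the reviewer's assumed definitions."""
    pre = np.asarray(pre, dtype=float)
    post = np.asarray(post, dtype=float)
    diff = post - pre   # raw difference: post - pre
    norm = diff / pre   # normalized difference: (post - pre) / pre
    return diff, norm

# Hypothetical spine sizes (arbitrary units) before and after treatment.
diff, norm = spine_size_change([1.0, 2.0, 4.0], [1.5, 2.0, 3.0])
```

      With the normalized measure, the same absolute change counts for more in a small spine than in a large one, which matters when changes are compared across spine-size bins.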

      Fig. 2G/S2A: a scale for the label sizes would be helpful. I would also like to have the same correlation for 50 µM NBQX treatment and the control condition (at least in the supplementary figures).

      Local text and figure revision: We have adjusted the text and figure to improve clarity.

      Fig. 2I: I might be missing something, but why is the activity line flat when there are changes in spine density and size?

      Local text and figure revision: We have adjusted the text and figure to improve clarity.

      Fig. 3C-D: they are referenced in the text as Fig. 1C-D (lines 188-194).

      Local text revision: fixed.

      Fig. 5: it is interesting that the biphasic model captures both spine loss and recovery, fitting well with lesion-induced degeneration and recovery. Does this mean that the model captures other types of plasticity, or does it suggest to the authors that both steps are homeostatic?

      Indeed, the biphasic HSP rule captures two types of activity dependence. The pioneering work by Gallinaro and Rotter (2018) also demonstrated that the HSP rule, even in its monotonic/linear form, exhibits associative properties, which are typically associated with Hebbian plasticity.

      Fig. 6A: This figure requires a more detailed legend - what are the various insets? Does the top right graph only have one curve because they are overlapping and the growth rules are the same for axons and dendrites?

      Local text revision: fixed.

      Fig. 6E: There is usually an overshoot when a stimulus is removed, in this case at the end of the silencing period (as shown in Fig. 1Aiii). Is there a reason why this is not recapitulated here? It shouldn't be as extreme as in the right panel so there should be no degeneration.

      We agree that removing the stimulus would typically trigger an opposite homeostatic process. However, in this protocol, we aimed to emphasize the role of recurrency by presenting extreme cases to illustrate potential scenarios for the readers.

      Local text revision: We revised this paragraph to walk the readers through the rationale better.

      Fig. 6: the authors mention distance-dependent connectivity (line 268), but I couldn't find any data related to that statement. I was particularly curious about that aspect, so I would like to know what this statement is based on, especially as they touch again on the role of morphology in Fig. 8, and distance-dependent connectivity is more prominent in the discussion. On a similar note, would the authors have data from other layers of CA1 that would show similar or other rules? Please note that I am not asking to include these data in the present paper - I am just curious if these data exist (or if the experiments are considered).

      Such an extensive dataset is included and thoroughly investigated in another study that has just been published (Lenz et al., 2023). We updated the reference in the revised text.

      Fig. 7E top: the scalebar is missing.

      Local text revision: fixed.

      Fig. 8A: do the colors have meaning? If yes, please state them. Also indicate that the left two neurons are pyramidal cells from CA1 and the right neurons are granule cells from the dentate gyrus.

      Local text revision: fixed.

      Line 302: "reactive" should be "reactivate".

      Local text revision: fixed.

    1. Authors’ Response (6 May 2024)

      GENERAL ASSESSMENT

      He et al. explore the structure and mechanisms of human mitochondrial RNA splicing 2 protein (MRS2). MRS2 is a mitochondrial ion channel that was thought to form Mg2+-selective channels based on its homology to the CorA family of prokaryotic Mg2+ channels. Here, the authors used an innovative biochemical strategy to express MRS2 and perform single particle reconstructions of MRS2 in the absence and presence of key divalent cations. They obtained high resolution reconstructions of pentameric MRS2 and identified the divalent binding sites, some of which appear to be different from the prokaryotic counterparts. In addition, they showed that the structures of MRS2 appear to be more stable than CorA, exhibiting consistent features across different conditions, including in the presence of EDTA, Mg2+, and Ca2+. They further investigated electrophysiological characteristics of a mutant MRS2 channel and propose that it acts like a Ca2+-regulated, cation-selective, Mg2+-permeable channel, in contrast to the better characterized CorA channel, which is Mg2+-regulated and has a higher selectivity for Mg2+. This is an important study with interesting structural observations and an innovative hypothesis on function. We suggest that a more careful interpretation of the functional data and their relevance to MRS2 function in mitochondria would increase the overall value of the work.

      We would like to thank the colleagues from Biophysics Colab for reviewing our manuscript. We have revised our initial manuscript incorporating these recommendations and the reviewers’ comments from the publishing journal. We will also acknowledge Biophysics Colab in the published version of this work.

      RECOMMENDATIONS

      Essential revisions:

      1.    Because R332 lines the channel pore, one would predict that neutralization of its positive charge would have an effect on ion permeation characteristics – either single channel conductance or relative permeabilities of different ions. Thus, it is unclear whether ion selectivity of the R332S mutation (probed in, for example, Fig. 4) is representative of WT MRS2. Ideally, selectivity would have been measured on the WT channel. If the authors performed similar experiments with R332D (if it expresses), would the observations be at least qualitatively similar?

      This is an excellent point. Indeed, it is possible that the R332S mutation affects the ion selectivity of MRS2. To test this, we have examined the ion permeation properties of the wild-type channel, MRS2WT. While MRS2WT conducted no detectable Mg2+ currents, its Na+ currents could be detected as shown in the original Figure 4a. MRS2WT still showed no anomalous mole fraction effect (AMFE), as the Na+ currents were unaffected by 100-µM Mg2+ (see new Extended Data Fig. 7a in the revised manuscript). Therefore, the lack of divalent cation selectivity of MRS2 was not artificially caused by the R332S background. We are in the process of mutating R332 to a wide range of other amino acids to better link the side-chain chemistry to MRS2 function. This will be an important future direction.

      Similarly, if the corresponding site in TmCorA (S) is mutated to R, would it behave like MRS2? Such data would increase confidence in the conclusions regarding selectivity. In addition, measuring relative permeabilities of ions would be significantly more informative than current magnitudes. If measurement of relative permeabilities is not feasible due to low current amplitudes, it would be important for the authors to tone down their conclusions on selectivity.

      Our results above have now demonstrated that R332 does not contribute to the ion selectivity of MRS2. Therefore, it is unlikely that mutating the corresponding residue of R332 in TmCorA (S284) to Arg would create profound effects on the ion selectivity of CorA. It should also be noted that the selectivity filter of CorA has been identified as the ‘GMN’ motif, which is far away from S284. However, we agree that S284R likely reduces the CorA conductance, and plan to test this mutant in future work.

      We are unable to measure the permeability ratio, as we have not established patch-clamp recordings of MRS2. This is certainly an important future direction. However, the lack of anomalous mole fraction effect (AMFE) indicates that MRS2 lacks the molecular property that confers divalent cation selectivity to CorA, and accordingly it is reasonable to conclude that MRS2 is a non-selective cation channel.
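      For readers unfamiliar with the measurement the reviewers request, the following is a minimal sketch of how a permeability ratio is usually derived from a reversal-potential shift. It is not an analysis performed in this work. It assumes bi-ionic conditions with two monovalent cations at equal external concentration, and the function name `permeability_ratio` and the 22 °C default are illustrative choices. Divalent ions such as Mg2+ require an extended form of the Goldman-Hodgkin-Katz equation that is not shown here.

```python
import math

R = 8.314462618   # gas constant, J mol^-1 K^-1
F = 96485.33212   # Faraday constant, C mol^-1


def permeability_ratio(delta_e_rev_mv, temp_c=22.0):
    """Return P_X / P_Na for monovalent cations under bi-ionic conditions.

    delta_e_rev_mv is the shift in reversal potential (mV) observed when
    external Na+ is replaced by X+ at the same concentration:
        E_rev(X) - E_rev(Na) = (RT/F) * ln(P_X / P_Na)
    """
    rt_over_f_mv = 1000.0 * R * (temp_c + 273.15) / F  # about 25 mV near room temperature
    return math.exp(delta_e_rev_mv / rt_over_f_mv)


# No shift in reversal potential means the two ions are equally permeant.
print(permeability_ratio(0.0))
```

      A positive shift gives a ratio above 1 (X+ more permeant than Na+) and a negative shift gives a ratio below 1.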

      A related technical consideration: from the description of the experiments, summarized in the bar graph in Fig. 4a (right), it’s not clear which/how many measurements were done on the same oocyte. It might be useful to mention that because oocyte-to-oocyte variability is a very important factor which can sometimes obfuscate observations and their interpretation. For all electrophysiological observations, it would be very useful to clarify whether the error bars are standard deviations (sd) or standard errors of the mean (sem). Because the replicates for the different measurements are highly variable – ranging between 6 and 34 – it might be more appropriate to compare sd instead of sem.

      Recordings from the same oocyte would only be counted as a single data point. We appreciate the reviewer's concern about using SD vs. SEM. However, we are comparing drastic differences. For example, we can detect Mg2+ currents with the R332S mutation but not with MRS2WT, and we can see AMFE in CorA but not in MRS2. These major effects are unlikely to be affected by whether we present the data with SEM or SD.
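      The reviewers' point about replicate numbers can be illustrated with a short sketch. It uses only the relation SEM = SD / sqrt(n) and the range of n quoted above (6 to 34); the SD value of 1.0 and the helper names are hypothetical, not data from this study.

```python
import math
import statistics


def sd_and_sem(values):
    """Return (sample SD, SEM) for a list of per-oocyte measurements."""
    sd = statistics.stdev(values)
    return sd, sd / math.sqrt(len(values))


def sem_from_sd(sd, n):
    """SEM for a given sample SD and number of oocytes n."""
    return sd / math.sqrt(n)


# Identical spread (hypothetical SD = 1.0) at the two extremes of n quoted above.
sem_small_n = sem_from_sd(1.0, 6)
sem_large_n = sem_from_sd(1.0, 34)
ratio = sem_small_n / sem_large_n  # equals sqrt(34 / 6)
print(f"SEM at n=6 is {ratio:.2f} times the SEM at n=34 for the same SD")
```

      With equal underlying variability, the error bar for n = 6 is therefore more than twice as long as that for n = 34 when SEM is plotted, whereas SD bars would be the same length.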

      2.    Recordings to examine currents at more hyperpolarizing potentials are essential for drawing conclusions about the function of MRS2 in mitochondria. The voltage at which the oocytes are clamped in all electrophysiological measurements (-60 mV) might be very different from the voltage at which MRS2 operates in a native environment. If MRS2 is susceptible to voltage-dependent block by the permeant divalents (Ca2+/Mg2+), their presence could influence currents observed at hyperpolarized potentials.

      We have now recorded MRS2WT at -120 mV. No Mg2+ currents or AMFE were observed, as in our recordings at -60 mV.

      3.    P4, “In the divalent-free MRS2EDTA structure, discernible ion densities are absent in the central pore.” Because the map was generated by imposing C5 symmetry during processing (with the pore located at the central symmetry axis) and the buffer contained NaCl (which is known to permeate MRS2), we would expect the maps to show some density for ions in addition to noise generated during data processing. Although these maps were not available for this review, inspection of related maps for MRS2 (EMDB-41628 and EMDB-35631) indeed show density within the pore in the presence of NaCl and EDTA. Also, the symmetrical diamond-shaped density (either from ions or noise) shown in Extended Data Fig. 5 has the characteristics of being enhanced during processing with imposed C5 symmetry. It would be important for the authors to clarify how they drew conclusions about the absence or presence of ion densities along the pore in the different maps they refined. Showing density at equivalent positions within the pore for their different structures would be a nice addition to Ext. Data Fig. 5.

      This is an excellent point. Assignment of ions in cryo-EM density maps is indeed challenging because of noise, especially at the symmetry axis. We have carefully examined these densities and the chemical environment nearby to assign these ions. We have now included density maps at these equivalent positions in the revised manuscript.

      4.    The currents shown in recordings from oocytes were at negative voltages and were elicited by replacement of NMDG with smaller monovalent or divalent cations. For these currents to be rigorously attributed to MRS2, it would have been important to perform the experiments in parallel with control oocytes not expressing the protein (either injected with water or uninjected). However, we appreciate that this would require considerable effort to address in retrospect. One solution would be for the authors to identify a few key conditions, perhaps those shown in Fig. 3, and repeat them with appropriate controls to allow comparison of the data in a bar chart or related graph. The data shown in Fig. 4a for the WT protein could be considered a reasonable control in such experiments, so perhaps the authors could point this out to the reader?

      In the new Extended Data Fig. 7a in the revised manuscript, we provided data showing that uninjected oocytes, or oocytes expressing the mitochondrial calcium uniporter, showed no Mg2+ currents, suggesting that the observed Mg2+ currents were mediated by MRS2. Additionally, we could inhibit these currents with cobalt hexammine (Fig. 3c-d), or drastically reduce the currents with MRS2 mutations (Fig. 5a-b). These observations all support the conclusion that we are observing MRS2 currents.

      Optional suggestions:

      1.    In several of the 2D class averages, particularly in Extended Fig. 1a, MRS2 seems to be located off-center, almost at the edge of the micelle. With a relatively small transmembrane core, it is possible that MRS2 is “freely diffusing” in the micelle, in which the lateral pressure that the transmembrane domains are subject to is quite different from the scenario where the protein is more at the micelle center. Would this observation have any bearing on the function/reconstruction of MRS2, particularly given that limited structural changes are observed in the transmembrane segments between divalent free and with-divalent conditions? The 2D classes are likely from an early stage of reconstruction. It might be worthwhile to show 2D classes of the final set of particles used for the reconstruction.

      This is an interesting point. It may influence the conformations of channels that are very sensitive to their surrounding environments. For MRS2, we do not think that the off-center location in detergent micelles significantly changes its structure. We have since also determined the cryo-EM structure of MRS2 in lipid nanodiscs, which is identical to the structure in detergents.

      2.    It is interesting that, in the reconstructions with Ca2+, the peripheral domains become more heterogenous than in Mg2+ or EDTA (Extended Fig. 1). How does this region of the map compare with the location of divalent site 3?

      The divalent site 3 is not located within the peripheral domains. The cryo-EM densities, as shown in Extended Data Fig. 5, are well defined near site 3 in both Ca2+ and Mg2+ conditions.

      3.    Would Ca2+ (but not Mg2+) binding make this region more dynamic and could that have any mechanistic significance?

      This is a very interesting point. We did not see apparent structural changes in Ca2+ vs Mg2+ conditions and hypothesize that Ca2+ regulation may arise from differences in structural dynamics. We have been using other biophysical techniques such as high-speed atomic force microscopy to investigate these differences.

      4.    Does the Ca2+ reconstruction (Extended Fig. 1) have a preferred orientation? The elevation/azimuth plots show an asymmetry (along the elevation) which might have appeared from some kind of bias. It is not clear if the authors have tried to address this, say by rebalancing 2D classes. 3D FSC curves might help test/address this bias.

      In general, particles in Ca2+ conditions are more prone to aggregation and appear to have some degree of preferred orientation.

      5.    While the structural difference between MRS2 and TmCorA at the level of the α/β domain is clear in Extended Fig. 3, it may be worthwhile to compare them in the context of the pentamer. Particularly, does the difference alter the interfaces? Are the surface electrostatic properties of the domain similar? Considering that these domains mediate divalent regulation, comparison of these properties might help readers better appreciate similarities/differences in their structural attributes.

      These are very good points to add to structural comparison between MRS2 and TmCorA.

      6.    With respect to Fig. 1D, do the authors observe any side portals for ion entry/exit into the pore? The soluble domains of MRS2 seem to form a highly electronegative cavity for ion translocation. Are there any single channel conductance measurements of MRS2 that would argue for the importance of these electronegative surfaces?

      No apparent side portals for ion entry were observed. We have not done single-channel recordings yet. Since CorA single-channel recordings have not been reported to date, we speculate that MRS2/CorA might be too slow to produce detectable single-channel currents.

      7.    It doesn’t appear that any approximations of the relative affinity of MRS2 for Mg2+/Ca2+ (e.g. EC50 measurements) are available at this point. It might therefore have been better for the Mg2+ reconstructions to include EGTA in the buffers to sequester Ca2+, given that conventional filter papers used during plunge-freezing are fabricated with ash containing a lot of Ca2+/Zn2+. This would have helped to at least partially address questions about whether the observed divalent densities truly correspond to the ions used during cryoEM sample preparation. If the authors are not able to do Mg2+ reconstructions with EGTA in the buffer, it would be of benefit to at least comment on this issue in the discussion of the results.

      This is a very good control experiment to validate Mg2+ binding. Given that the MRS2 structure in 10 mM EDTA (without added divalent ions) is essentially the same as that in high Mg2+ or Ca2+, we would expect that MRS2 structures in Mg2+ & EGTA conditions are likely the same as in other conditions.

      8.    What is the orientation of MRS2 when it is in the plasma membrane? If the orientation is such that the regulatory domains face the cytosol, inside-out patches would be more informative, appropriate and reliable for addressing the mechanistic questions that the authors are exploring. The authors should comment on whether or not the orientation is known.

      The orientation of MRS2 in oocyte membranes is currently unknown. It will be interesting to determine the orientation in the future.

      9.    P4, “additional unique Mg2+ binding site (site3)”. In Fig. 2, it would be beneficial to label and specify the distances between the binding residues and the ions, along with elucidating the nature of the interactions they form.

      This is a good point. However, we do not want to overinterpret the structure to specify how the ion is coordinated by side-chain atoms because of the limited resolution.

      10. P9, in the discussion about structural dynamics. Drawing conclusions about the rigidity of MRS2's structure may be premature at this stage. Since the MRS2 structures are pentameric, the unique feature of asymmetrical particles can potentially be averaged by the features of symmetrical particles, particularly when a substantial number of symmetrical particles are present. This can pose a challenge in isolating and distinguishing asymmetrical structure from the overall dataset, even when applying C1 symmetry during the data processing. It would be helpful to employ techniques such as 3D Variability Analysis from CryoSPARC or subtracting the density of a monomer for focused 3D classification that might provide more insights into the structural dynamics of MRS2.

      To better investigate the structural dynamics of MRS2, we plan to apply more appropriate biophysical methods such as high-speed atomic force microscopy.

      (This is a response to peer review conducted by Biophysics Colab on version 1 of this preprint.)

    1. Author response:

      Reviewer #1 (Public review):

      The significance of the target molecule and mechanisms may help in understanding the molecular mechanisms of metformin.

      We greatly appreciate the reviewer’s insightful comment regarding the significance of the target molecule and its mechanisms in understanding the molecular actions of metformin. ATP5I is responsible for the dimerization of the F1F0-ATPase (1-3). Hence, we propose conducting BN-PAGE followed by a western blot probing for the β-subunit of the F1 domain of F1F0-ATP synthase to investigate whether metformin affects its dimerization. This will provide more direct evidence of the on-target action of metformin on ATP5I. Due to the high abundance of F1F0-ATP synthase in cells and the slow entry of metformin into mitochondria, we plan to perform long-term treatments (3 and 6 days) with high concentrations of metformin (10 mM) to enhance the likelihood of detecting subtle yet biologically relevant shifts in the monomer and dimer populations. Prolonged exposure is expected to reveal the cumulative effects of metformin on the F1F0-ATP synthase dimer/monomer ratio. We do not expect metformin to fully mimic the effect on dimerization seen in ATP5I KO cells, but we think it will be important to report to what extent this ratio is affected.

      Reviewer #2 (Public review):

      (1) The interpretation of the cellular co-localization of the biotin-biguanide conjugate with TOMM20 (Figure 1-D) as mitochondrial "accumulation" of the conjugate is overstated because it cannot exclude binding of the conjugate to the mitochondrial membrane. It would have been more convincing if additional incubations with the biotin-biguanide conjugate in combination with metformin had shown that metformin is competitive with the biotin-conjugate.

      We appreciate the reviewer’s insightful comment and agree that the resolution provided by fluorescence microscopy makes it challenging to pinpoint the specific mitochondrial compartment where the biotin-biguanide conjugate localizes, even with additional markers such as TOMM20 antibodies for the inner mitochondrial membrane. While it remains a possibility that the conjugate binds to the mitochondrial surface, another plausible explanation is that the biotin moiety may facilitate entry into mitochondria through a biotin-specific transporter, adding further mechanistic intricacies. Furthermore, while a competition assay with metformin might help investigate interactions with mitochondrial targets and transporters (OCT family), it would not compete for biotin-mediated transport. Thus, while we acknowledge the reviewer’s suggestion, we believe such an experiment may not provide conclusive evidence regarding the conjugate’s mitochondrial localization or mechanism of entry. Instead, we will revise the manuscript to more accurately describe the findings as "mitochondrial association" rather than "mitochondrial accumulation," ensuring that our interpretation remains consistent with the resolution and limitations of the data presented.

      (2) The manuscript reports the identification of 69 proteins by mass spectrometry of the pull-down assay of which 31 proteins were eluted by metformin. However, no Mass Spectrometry data is presented of the peptides identified. The methodology does not state the minimum number of peptides (1, 2?) that were used for the identification of the 31/69 proteins.

      Concerning the mass spectrometry results, our intention was to provide a comprehensive table summarizing these findings in a separate data sheet, as part of the data availability section. To address the reviewer’s comment and ensure full transparency, we will include this table as supplementary material in the revised manuscript. Additionally, we will update the methodology section to explicitly state these criteria and ensure clarity regarding the identification process.

      (3) The validation of ATP5I was based on the use of recombinant protein (which was 90% pure) for the SPR and the use of a single antibody to ATP5I. The validity of the immunoblotting rests on the assumption that there is no "non-specific" immunoactivity in the relevant mol wt range. Information on the validation of the antibody would be helpful.

      Regarding the recombinant protein used for SPR, its purity was evaluated using a Coomassie-stained gel. For the antibody used in immunoblotting, its specificity was validated through knockout cell lines, ensuring minimal concerns about non-specific immunoactivity within the relevant molecular weight range. Unfortunately, the KO data appear in the paper after the first immunoblots are presented. In the revised manuscript, we will clearly outline these validation steps in the methods section and add manufacturer documentation for the antibody we used.

      (4) Knock-out of ATP5I markedly compromised the NAD/NADH ratio (Fig.3A) and cell proliferation (Figure 3D). These effects may be associated with decreased mitochondrial membrane potential which could explain the low efficacy of metformin (and most of the data in Figures 3-5). This possibility should be discussed. Effects of [metformin] on the NAD/NADH ratio in control cells and ATP5I-KO would have been helpful because the metformin data on cell growth is normalized as fold change relative to control, whereas the NAD/NADH ratio would represent a direct absolute measurement enabling comparison of the absolute effect in control cells with ATP5I KO.

      The mitochondrial membrane potential depends on a functional electron transport chain, which drives proton pumping from the matrix to the intermembrane space. Metformin can decrease the mitochondrial membrane potential, and this is usually explained as a consequence of complex I inhibition (4). It has been reported that metformin requires this membrane potential to accumulate in mitochondria, so the actions of metformin are self-limiting due to this requirement. The reviewer is right that ATP5I KO cells could be resistant to metformin because they may have a lower membrane potential. We do not believe this to be the case because the response to phenformin, another biguanide that can enter mitochondria through the membrane without the need for the OCT transporters (5), is also affected in ATP5I KO cells. Of note, compensatory mechanisms such as enhanced glycolysis, as observed in ATP5I-KO cells (elevated ECAR and increased sensitivity to 2-D-deoxyglucose), and the ATPase activity of F1F0-ATP synthase could potentially help maintain membrane potential, suggesting that this might not be an issue in the ATP5I KO cells. We will discuss these possibilities in the revised manuscript.

      Nevertheless, to experimentally address this point, we propose measuring mitochondrial membrane potential using tetramethylrhodamine methyl ester (TMRE) and ATP levels using luciferase-based assays (CellTiter-Glo) in ATP5I-KO cells.

      Measuring the NAD+/NADH ratio in both control and KO cells may not be very helpful because this ratio can be corrected by LDH, which is induced as part of the glycolytic adaptation that occurs after inhibition of respiration. Since our KO cells have already been propagated for several passages, the extent of this adaptation is likely different from that in metformin-treated cells. As we mentioned in answering Reviewer 1, we will provide a more direct measurement of metformin acting on ATP5I: the levels of F1F0-ATPase dimers and monomers.

      (5) Figure-6 CRISPR/Cas9 KO at 16mM metformin in comparison with 70nM rotenone and 2 micromolar oligomycin (in serum-containing medium). The rationale for the use of such a high concentration of metformin has not been explained. In liver cells metformin concentrations above 1mM cause severe ATP depletion, whereas therapeutic (micromolar) concentrations have minimal effects on cellular ATP status. The 16mM concentration is ~2 orders of magnitude higher than therapeutic concentrations and likely linked to compromised energy status. The stronger inhibition of cell proliferation by 16mM metformin compared with rotenone or oligomycin raises the issue of whether the changes in gene expression may be linked to the greater inhibition of mitochondrial metabolism. Validation of the cellular ATP status and NAD/NADH with metformin as compared with the two inhibitors could help the interpretation of this data.

      To address the reviewer’s final comment, we would like to clarify the rationale behind our experimental approach. NALM-6 cells are very glycolytic, have low respiration rates, and weak dependence on ATP5I (DepMap score: -0.47) (6). The concentration of 16 mM metformin was chosen based on the IC50 for this cell line. This approach aligns with our focus on the anticancer mechanism of action rather than the antidiabetic effects of metformin. Both ATP status and NAD+/NADH ratios will depend on the extent of the compensatory glycolysis. On the other hand, our genetic screening evaluates cell proliferation as an integration of all metabolic activities required for the process. This unbiased screening revealed a common pathway affected by metformin and oligomycin that is different from the pathway affected by rotenone, which is consistent with the finding that metformin acts on the F1F0-ATPase.

      Reviewer #3 (Public review):

      (1) Most of the data are based on measurements of the oxygen consumption rate (OCR) and extracellular acidification rate (ECAR) measured by the Seahorse analyser in control and ATP5I KO cells. However, these measurements are conducted by a single injection of a biguanide, followed over time and presented as fold change. By doing so, the individual information on the effect of metformin and derivatives on control and KO cells is lost. In addition, the usual measurement of OCR is coupled with certain inhibitors and uncouplers, such as oligomycin, FCCP, and Antimycin A/rotenone, to understand the contribution of individual complexes to respiration. Since biguanides and ATP5I KO affect protein levels of components of complex I and IV, it would be informative to measure their individual contributions/effects in the Seahorse. To further strengthen the data, it would be helpful to obtain measurements of actual ATP levels in these cells, as this would explain the activation of AMPK.

      We appreciate the reviewer’s observations regarding the Seahorse measurements and acknowledge the potential limitations of presenting the data as fold change. Due to experimental challenges in maintaining KP-4 and ATP5I-KO cells with sufficient nutrients, caused by their rapid glucose uptake and subsequent lactate production, it was more practical to present the Seahorse results in this format. Using inhibitors at each time point during the Seahorse experiment was not feasible, as the delay between inhibitor injections and the corresponding changes in oxygen consumption rate (OCR) and extracellular acidification rate (ECAR) would introduce variability and complicate the interpretation of dynamic responses. Nevertheless, we recognize the importance of understanding the contributions of specific respiratory complexes to OCR and ECAR. To address this, we will include a representative figure showcasing a typical Seahorse analysis, highlighting ATP turnover and proton leak after oligomycin addition, maximal respiration with FCCP, and disruption with rotenone and antimycin A. While these experiments are inherently complex due to the metabolic demands of ATP5I-KO cells, this approach will provide a clearer breakdown of mitochondrial activity. Furthermore, as mentioned in our response to Reviewer 2, we will measure ATP levels using a luciferase-based assay (CellTiter-Glo) in both control and ATP5I-KO cells to better explain AMPK activation. This will provide additional context to strengthen the interpretation of mitochondrial function and metabolic compensation mechanisms in these cells.

      (2) The authors report on alterations in mitochondrial morphology upon ATP5I KO, which is measured by subjective quantifications of filamentous versus puncta structures. Fiji offers great tools to quantify the mitochondrial network unbiasedly and with more accuracy using deconvolution and skeletonization of the mitochondria, providing the opportunity to measure length, shape, and number quantitatively. This will help to better understand whether mitochondria are really fragmented upon ATP5I KO and rescued by its re-introduction.

      Concerning the analysis of mitochondrial morphology, we acknowledge the potential benefits of using Fiji and additional plugins such as MiNA for more accurate and unbiased quantification. Indeed, this approach could provide stronger evidence for mitochondrial fragmentation upon ATP5I-KO and its potential rescue by ATP5I reintroduction. We will consider integrating this methodology into our analysis to enhance the precision and robustness of our findings.

      (3) Finally, the authors report in the last part of the paper a genetic CRISPR/Cas9 KO screen in NALM-6 cells cultured with high amounts of metformin to identify potential new mediators of metformin action. It is difficult to connect that to the rest of the paper because a) different concentrations of metformin are used and b) the metabolic effects on energy consumption are not defined. They argue about the molecular function of the obtained hits based on literature and on a comparison of the pattern of genetic alterations based on treatments with known inhibitors such as oligomycin and rotenone. However, a direct connection is not provided, thus the interpretation at the end of the results that "the OMA1-DEL1-HRI pathway mediates the antiproliferative activity of both biguanides and the F1ATPase inhibitor oligomycin" while increasing glycolysis, needs to be toned down. This is an interesting observation, but no causality is provided. In general, this part stands alone and needs to be better connected to the rest of the paper.

      NALM-6 cells are very glycolytic, have low respiration rates, and weak dependence on ATP5I (6), forcing us to use higher concentrations of metformin to inhibit their growth. Recent results show that metformin targets PEN2 in the cytosol to increase AMPK activity, controlling both the glucose-lowering and the life span extension abilities of metformin (7). This work raises the question of whether the antiproliferative and anticancer effects of metformin are due to a mitochondrial activity or are controlled by this new pathway of AMPK activation. Hence, the genetic screening was performed to find, in an unbiased manner, how metformin works. The results provide compelling evidence for mitochondria, and in particular the ATP synthase, as potential targets of metformin and a foundation for future studies. We will revise the text and abstract to better reflect the exploratory nature of this finding and ensure clarity.

      (1) Paumard, P. et al. Two ATP synthases can be linked through subunits i in the inner mitochondrial membrane of Saccharomyces cerevisiae. Biochemistry 41, 10390-10396 (2002). https://doi.org/10.1021/bi025923g

      (2) Paumard, P. et al. The ATP synthase is involved in generating mitochondrial cristae morphology. EMBO J 21, 221-230 (2002). https://doi.org/10.1093/emboj/21.3.221

      (3) Habersetzer, J. et al. ATP synthase oligomerization: from the enzyme models to the mitochondrial morphology. Int J Biochem Cell Biol 45, 99-105 (2013). https://doi.org/10.1016/j.biocel.2012.05.017

      (4) Xian, H. et al. Metformin inhibition of mitochondrial ATP and DNA synthesis abrogates NLRP3 inflammasome activation and pulmonary inflammation. Immunity 54, 1463-1477 e1411 (2021). https://doi.org/10.1016/j.immuni.2021.05.004

      (5) Hawley, S. A. et al. Use of cells expressing gamma subunit variants to identify diverse mechanisms of AMPK activation. Cell metabolism 11, 554-565 (2010). https://doi.org/10.1016/j.cmet.2010.04.001

      (6) Hlozkova, K. et al. Metabolic profile of leukemia cells influences treatment efficacy of L-asparaginase. BMC Cancer 20, 526 (2020). https://doi.org/10.1186/s12885-020-07020-y

      (7) Ma, T. et al. Low-dose metformin targets the lysosomal AMPK pathway through PEN2. Nature 603, 159-165 (2022). https://doi.org/10.1038/s41586-022-04431-8

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Sun et al. are interested in how experience can shape the brain and specifically investigate the plasticity of the Toll-6 receptor-expressing dopaminergic neurons (DANs). To learn more about the role of Toll-6 in the DANs, the authors examine the expression of the Toll-6 receptor ligand, DNT-2. They show that DNT-2 expressing cells connect with DANs and that loss of function of DNT-2 in these cells reduces the number of PAM DANs, while overexpression causes alterations in dendrite complexity. Finally, the authors show that alterations in the levels of DNT-2 and Toll-6 can impact DAN-driven behaviors such as climbing, arena locomotion, and learning and long-term memory.

      Strengths:

      The authors methodically test which neurotransmitters are expressed by the 4 prominent DNT-2 expressing neurons and show that they are glutamatergic. They also use Trans-Tango and Bac-TRACE to examine the connectivity of the DNT-2 neurons to the dopaminergic circuit and show that DNT-2 neurons receive dopaminergic inputs and output to a variety of neurons including MB Kenyon cells, DAL neurons, and possibly DANs.

      We are very pleased that Reviewer 1 found our connectivity analysis a strength.

      Weaknesses:

      (1) To identify the DNT-2 neurons, the authors use CRISPR to generate a new DNT-2-GAL4. They note that they identified at least 12 DNT-2+ neurons. In Supplementary Figure 1A, the DNT-2-GAL4 driver was used to express a UAS-histoneYFP nuclear marker. From these figures, it looks like DNT-2-GAL4 is labeling more than 12 neurons. Is there glial expression?

      Indeed, we claimed that DNT-2 is expressed in at least 12 neurons (see line 141, page 6 of the original manuscript), which means that more than 12 could be found. The membrane-tethered reporters we used (UAS-FlyBow1.1, UASmcD8-RFP, UAS-MCFO, as well as UAS-DenMark:UASsyd-1GFP) gave a consistent and reproducible pattern. However, with DNT-2GAL4>UAS-Histone-YFP more nuclei were detected that were not revealed by the other reporters. We have also found with other GAL4 lines that the patterns produced by different reporters can vary. This could be due to the signal strength (e.g. His-YFP is very strong) and perdurance of the reporter (e.g. the turnover of His-YFP may be slower than that of the other fusion proteins).

      We did not test for glial expression, as it was not directly related to the question addressed in this work.

      (2) In Figure 2C the authors show that DNT-2 upregulation leads to an increase in TH levels using q-RT-PCR from whole heads. However, in Figure 3H they also show that DNT-2 overexpression also causes an increase in the number of TH neurons. It is unclear whether TH RNA increases due to expression/cell or the number of TH neurons in the head.

      Figure 3H shows that over-expression of DNT-2FL increased the number of Dcp1+ apoptotic cells in the brain, but not significantly (p=0.0939). The ability of full-length neurotrophins to induce apoptosis, and of cleaved neurotrophins to promote cell survival, is well documented in mammals. We had previously shown that DNT-2 is naturally cleaved, and that over-expression of DNT-2 does not induce apoptosis in the various contexts tested before (McIlroy et al 2013 Nature Neuroscience; Foldi et al 2017 J Cell Biol; Ulian-Benitez et al 2017 PLoS Genetics). Similarly, throughout this work we did not find DNT-2FL to induce apoptosis.

      Instead, in Figure 3G we show that over-expression of DNT-2FL causes a statistically significant increase in the number of TH+ cells. This is an important finding that supports the plastic regulation of PAM cell number. We thank the Reviewer for highlighting this point, as we had forgotten to add the significance star in the graph. In this context, we cannot rule out the possibility that the increase in TH mRNA observed when we over-express DNT-2FL is due to an increase in cell number rather than to increased expression per cell. Unfortunately, it is not possible for us to separate these two processes at this time. Either way, the result would still be the same: an increase in dopamine production when DNT-2 levels rise.

      We have now edited the abstract (lines 38-39), adding that “By contrast, over-expressed DNT-2 increased DAN cell number,…”, as well as the main text in Results (page 10, lines 259-265) and the Discussion section (page 15, lines 391 and 393-396).

      (3) DNT-2 is also known as Spz5 and has been shown to activate Toll-6 receptors in glia (McLaughlin et al., 2019), resulting in the phagocytosis of apoptotic neurons. In addition, the knockdown of DNT-2/Spz5 throughout development causes an increase in apoptotic debris in the brain, which can lead to neurodegeneration. Indeed Figure 3H shows that an adult specific knockdown of DNT-2 using DNT2-GAL4 causes an increase in Dcp1 signal in many neurons and not just TH neurons.

      Indeed, we did find Dcp1+ TH-negative cells too (although not widely throughout the brain), but this is not shown in the images of Figure 3H, where we showed only TH+ Dcp1+ cells.

      That is not surprising, as DNT-2 neurons have large arborisations that can reach a wide range of targets; DNT-2 is secreted, and could reach beyond its immediate targets; Toll-6 is expressed in a vast number of cells in the brain; and DNT-2 can also bind promiscuously to at least Toll-7 and other Keks, which are also expressed in the adult brain (Foldi et al 2017 J Cell Biology; Ulian-Benitez et al 2017 PLoS Genetics; Li et al 2020 eLife). Together with the findings by McLaughlin et al 2019, our findings further support the notion that DNT-2 is a neuroprotective factor in the adult brain. It will be interesting to find out what other neuron types DNT-2 maintains.

      We have made some edits on these points in page 10 lines 259-265.

      We would like to thank Reviewer 1 for their positive comments on our work and their interesting and valuable feedback.

      Reviewer #2 (Public review):

      This paper examines how structural plasticity in neural circuits, particularly in dopaminergic systems, is regulated by Drosophila neurotrophin-2 (DNT-2) and its receptors, Toll-6 and Kek-6. The authors show that these molecules are critical for modulating circuit structure and dopaminergic neuron survival, synaptogenesis, and connectivity. They show that loss of DNT-2 or Toll-6 function leads to loss of dopaminergic neurons, dendritic arborization, and synaptic impairment, whereas overexpression of DNT-2 increases dendritic complexity and synaptogenesis. In addition, DNT-2 and Toll-6 modulate dopamine-dependent behaviors, including locomotion and long-term memory, suggesting a link between DNT-2 signaling, structural plasticity, and behavior.

      A major strength of this study is the impressive cellular resolution achieved. By focusing on specific dopaminergic neurons, such as the PAM and PPL1 clusters, and using a range of molecular markers, the authors were able to clearly visualize intricate details of synapse formation, dendritic complexity, and axonal targeting within defined circuits. Given the critical role of dopaminergic pathways in learning and memory, this approach provides a good opportunity to explore the role of DNT-2, Toll-6, and Kek-6 in experience-dependent structural plasticity. However, despite the promise in the abstract and introduction of the paper, the study falls short of establishing a direct causal link between neurotrophin signaling and experience-induced plasticity.

      Simply put, this study does not provide strong evidence that experience-induced structural plasticity requires DNT-2 signaling. To support this idea, it would be necessary to observe experience-induced structural changes and demonstrate that downregulation of DNT-2 signaling prevents these changes. The closest attempt to address this in this study was the artificial activation of DNT-2 neurons using TrpA1, which resulted in overgrowth of axonal arbors and an increase in synaptic sites in both DNT-2 and PAM neurons. However, this activation method is quite artificial, and the authors did not test whether the observed structural changes were dependent on DNT-2 signaling. Although they also showed that overexpression of DNT-2FL in DNT-2 neurons promotes synaptogenesis, this phenotype was not fully consistent with the TrpA1 activation results (Figures 5C and D).

      In conclusion, this study demonstrates that DNT-2 and its receptors play a role in regulating the structure of dopaminergic circuits in the adult fly brain. However, it does not provide convincing evidence for a causal link between DNT-2 signaling and experience-dependent structural plasticity within these circuits.

      We would like to thank Reviewer 2 for their very positive assessment of our approach to investigate structural circuit plasticity. We are delighted that this Reviewer found our cellular resolution impressive. We are also very pleased that Reviewer 2 found that our work demonstrates that DNT-2 and its receptors regulate the structure of dopaminergic circuits in the adult fly brain. This is already a very important finding that contributes to demonstrating that, rather than being hardwired, the adult fly brain is plastic, like the mammalian brain. Furthermore, it is remarkable that this involves a neurotrophin functioning via Toll and kinase-less Trks, opening an opportunity to explore whether such a mechanism could also operate in the human brain.

      We are very pleased that this Reviewer acknowledges that this work provides a good opportunity to explore the role of DNT-2, Toll-6, and Kek-6 in experience-dependent structural plasticity. We provide a molecular mechanism and proof of principle, and we demonstrate a direct link between the function of DNT-2 and its receptors in circuit plasticity. We also showed a link of DNT-2 to neuronal activity, as neuronal activity increased the production of DNT-2GFP, induced the cleavage of DNT-2 and a feedback loop between DNT-2 and dopamine, and both neuronal activity and increased DNT-2 levels promoted synaptogenesis.

      As the Reviewer acknowledges, this approach provides a good opportunity to explore the role of DNT-2, Toll-6, and Kek-6 in experience-dependent structural plasticity. Finding the direct link in response to lived experience is a big task, beyond the scope of this manuscript, and we will be testing this in future projects. Nevertheless, it is important to place our findings within this context together with the link to mammalian neurotrophins (as explained in the discussion), as it is here that the findings have deep and impactful implications.

      To accommodate the criticism of this Reviewer, we have now toned down our narrative. This does not diminish the importance of the findings; rather, it makes the argument more stringent. Please see edits in: Abstract page 2 lines 42-44; and Discussion page 22 line 586, which were the only points where a direct claim had been made.

      We would like to thank Reviewer 2 for the positive and thoughtful evaluation of our work, and for their feedback.

      Reviewer #3 (Public review):

      Summary:

      The authors used the model organism Drosophila melanogaster to show that the neurotrophin Toll-6 and its ligands, DNT-2 and kek-6, play a role in maintaining the number of dopaminergic neurons and modulating their synaptic connectivity. This supports previous findings on the structural plasticity of dopaminergic neurons and suggests a molecular mechanism underlying this plasticity.

      Strengths:

      The experiments are overall very well designed and conclusive. Methods are in general state-of-the-art, the sample sizes are sufficient, the statistical analyses are sound, and all necessary controls are in place. The data interpretation is straightforward, and the relevant literature is taken into consideration. Overall, the manuscript is solid and presents novel, interesting, and important findings.

      We are delighted that Reviewer 3 found our work solid, novel, interesting and with important findings. We are also very pleased that this Reviewer found that all necessary controls have been carried out.

      Weaknesses:

      There are three technical weaknesses that could perhaps be improved.

      First, the model of reciprocal, inhibitory feedback loops (Figure 2F) is speculative. On the one hand, glutamate can act in flies as an excitatory or inhibitory transmitter (line 157), and either situation can be the case here. On the other hand, it is not clear how an increase or decrease in cAMP level translates into transmitter release. One can only conclude that two types of neurons potentially influence each other.

      Thank you for pointing out that glutamate can be inhibitory. In response, we have removed the word ‘excitatory’ from the only point it had been used in the text: page 7 line 167.

      In mammals, the neurotrophin BDNF has an important function in glutamatergic synapses, thus we were intrigued by a potential evolutionary conservation. Our evidence that DNT-2A neurons could be excitatory is indirect, yet supportive: exciting DNT-2 neurons with optogenetics resulted in an increase in GCaMP in PAMs (data not shown); over-expression of DNT-2 in DNT-2 neurons increased TH mRNA levels; optogenetic activation of DNT-2 neurons results in the Dop2R-dependent downregulation of cAMP levels in DNT-2 neurons. Dop2R signals in response to dopamine, which would be released only if dopaminergic neurons had been excited. Accordingly, glutamate released from DNT-2 neurons would have been rather unlikely to inhibit DANs.

      cAMP is a second messenger that enables the activation of PKA. PKA phosphorylates many target proteins, amongst which are various channels. This includes the voltage gated calcium channels located at the synapse, whose phosphorylation increases their opening probability. Other targets regulate synaptic vesicle release. Thus, a rise in cAMP could facilitate neurotransmitter release, and a downregulation would have the opposite effect. Other targets of PKA include CREB, leading to changes in gene expression. Conceivably, a decrease in PKA activity could result in the downregulation of DNT-2 expression in DNT-2 neurons. This negative feedback loop would restore the homeostatic relationship between DNT-2 and dopamine levels.

      We agree with this Reviewer that, although our qRT-PCR data show that over-expression of DNT-2 increases TH mRNA levels, this does not demonstrate that the increase originates from PAM neurons. Similarly, although our EPAC data imply that dopamine must be released from DANs and received by DNT-2 neurons to explain those data, the evidence did not include direct visualisation of dopamine release in response to DNT-2 neuron activation. To accommodate these criticisms, we have edited the summary Figure 2E, adding question marks to indicate inference points, and the text on page 9, line 221.

      Our data indeed demonstrate that DNT-2 and PAM neurons do influence each other, not merely potentially. We have provided data showing that: DNT-2 and PAM neurons are connected through circuitry; the DNT-2 receptors Toll-6 and kek-6 are expressed in DANs, including in PAMs; alterations in the levels of DNT-2 (both loss and gain of function) and loss of function for the DNT-2 receptors Toll-6 and Kek-6 alter PAM cell number, PAM dendritic complexity and synaptogenesis in PAMs; and alterations in the levels of DNT-2, Toll-6 and kek-6 in adult flies alter the dopamine-dependent behaviours of climbing, locomotion in an arena, and learning and long-term memory. These data firmly demonstrate that the two neuron types, DNT-2 and PAMs, influence each other.

      We have also shown that over-expression of DNT-2 in DNT-2 neurons increases TH mRNA levels, whereas activation of DNT-2 neurons decreases cAMP levels in DNT-2 neurons in a dopamine/Dop2R-dependent manner. These data show a functional interaction between DNT-2 and PAM neurons.

      Second, the quantification of bouton volumes (no y-axis label in Figure 5 C and D!) and dendrite complexity are not convincingly laid out. Here, the reader expects fine-grained anatomical characterizations of the structures under investigation, and a method to precisely quantify the lengths and branching patterns of individual dendritic arborizations as well as the volume of individual axonal boutons.

      Figures 5C and D do contain Y-axis labels; all graphs in the main manuscript and in the supplementary files contain Y-axis labels.

      In fact, we did use a method to precisely quantify the lengths and branching patterns of individual dendritic arborisations, the volume of individual boutons and bouton number. These analyses were carried out using Imaris software. For dendritic branching patterns, the “Filament Autodetect” function was used. Here, dendrites were analysed by semi-automatically tracing each dendrite branch (i.e. with manual correction of segmentation errors) to reconstruct the segmented dendrite in volume. From this segmented dendrite, Imaris provides measurements of total dendrite volume, number and length of dendrite branches, terminal points, etc. For bouton size and number, we used the Imaris “Spot” function. Here, a threshold is set to exclude small dots (e.g. of background) that do not correspond to synapses/boutons. All samples and genotypes are treated with the same threshold, thus the analysis is objective and large sample sizes can be analysed effectively. We had already provided a description of the use of Imaris in the methods section.

      We have now expanded the protocol on how we use Imaris to analyse dendrites and synapses, in the Materials and Methods section, page 28 lines 756-768 and page 29 lines 778-799.

      Third, Figure 1C shows two neurons with the goal of demonstrating between-neuron variability. It is not convincingly demonstrated that the two neurons are actually of the very same type of neuron in different flies or two completely different neurons.

      We thank Reviewer 3 for raising this interesting point. It is not possible to prove which of the four DNT-2A neurons per hemibrain, which we visualised with DNT-2>MCFO, were the same neurons in every individual brain we looked at. This is because in every brain we have looked at, the somas of the neurons were not located in exactly the same position. Furthermore, the arborisation patterns are also different and unique for each individual brain. Thus, there is natural variability in the position of the somas and in the arborisation patterns. Such variability presumably results from the combination of developmental and activity-dependent plasticity. Importantly, for every staining we carried out using DNT-2GAL4 and various membrane reporters and MCFO clones, we never found two identical DNT-2 neuron profiles.

      To increase the evidence in support of this point, we have now expanded Figure 1, adding one more image of DNT-2>FlyBow (Figure 1A) and two more images of DNT-2>MCFO (Figure 1D). In total, seven images in Figure 1 and two further images in Figure 5A demonstrate the variability of DNT-2 neurons.

      We would like to thank Reviewer 3 for the very positive evaluation of our work and the interesting and valuable feedback.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      In the fly list, several fly lines are missing references and sources. 

      Apologies for this oversight; this has now been corrected.

      We thank Reviewer 1 for their effort and time to scrutinise our work, and for their very positive and helpful feedback.

      Reviewer #2 (Recommendations for the authors):

      (1) Here I provide some more specific comments that I hope will help the authors further improve the study.

      (2) L148: "single neuron clones revealed variability in the DNT-2A". How do the authors know that they are labeling the same subtype of DNT-2A neurons? 

      There are four anterior DNT-2A cells per hemibrain that project from the SOG area to the SMP. It is not possible to verify that we are looking at exactly the same neuron every time, because the exact position of the somas and the arborisation patterns vary from brain to brain. We know this from two sources of data: (1) when using DNT-2GAL4 to visualise the expression of membrane reporters (e.g. UAS-FlyBow, UAS-mCD8-GFP, UAS-CD8-RFP) no brain ever showed a pattern identical to that of another brain, neither in the exact position of the somas nor in the exact arborisation patterns. (2) When we generated DNT-2>MCFO clones to visualise 1-2 cells at a time, no single-neuron or 2-neuron clones ever showed an identical pattern. The most parsimonious interpretation is that the exact location of the somas and the exact arborisation patterns vary across individual flies. Developmental variability in neuronal patterns has also been reported by Linneweber et al (2020) Science.

      To make our evidence more compelling, and in response to this Reviewer’s query, we have now added further images. Please find in revised Figure 1 A,B three examples of three different brains expressing DNT-2>FlyBow1.1. In Figure 1D, two more examples (altogether 4) of DNT-2>MCFO clones. Here it is clear to see that no neuron shape is identical to that of others, demonstrating variability in individual fly brains. We now show four images in Figure 1 and two more in Figure 5A that demonstrate the variability of DNT-2A neurons.

      (3) Figure 1E: Are all DNT-2A neurons positive for vGlut and Dop2R? This figure shows only two DNT-2A neurons. 

      Yes, all four DNT-2A neurons per hemibrain are vGlut positive and we have now added more images to Supplementary Figure S1A (right), also showing that presynaptic DNT-2A endings at SMP also coincide with a vGlut+ domain (Figure S1A left).

      Yes, all four DNT-2A neurons per hemibrain are Dop2R positive and we have now added more images to Supplementary Figure S1B.

      (4) L156: Glutamate is generally considered to be inhibitory in the adult fly brain. More evidence is needed before the authors can claim that "DNT-2A neurons are excitatory glutamatergic neurons". 

      Thank you for pointing this out. Although our data do not conclusively demonstrate it, they are consistent with DNT-2A neurons being excitatory. BDNF is most commonly released from glutamatergic neurons in mammals, its release is activity-dependent and leads to formation and stabilisation of synapses. The phenotypes we have observed are consistent with this and reveal functional evolutionary conservation: (1) exciting DNT-2 neurons with TrpA1 results in increased production and cleavage of DNT-2GFP and de novo synaptogenesis; (2) over-expression of DNT-2 in the adult induces de novo synaptogenesis; (3) down-regulation or loss of DNT-2 and its receptors Toll-6 and Kek-6 impair synaptogenesis. Furthermore, we show that DNT-2 dependent synaptogenesis is between DNT-2 and dopaminergic neurons, which are involved in the control of locomotion, reward learning and long-term memory, and dopamine itself is required for such behaviour. Consistent with this, we found that: (1) over-expression of DNT-2 increases TH mRNA levels, which would lead to the up-regulation of dopamine production; (2) exciting DNT-2 neurons increases locomotion speed in an arena; (3) knock-down of DNT-2 and its receptors decreases locomotion, whereas over-expression of DNT-2 increases locomotion; (4) over-expression of DNT-2 increases learning and long-term memory. Finally, in a previous version in bioRxiv, we also showed using optogenetics and calcium imaging that exciting DNT-2 neurons induced GCaMP signalling in their output PAM neurons, and in this version we show that exciting DNT-2 neurons regulates cAMP in DNT-2 neurons via dopamine-release dependent feedback. Altogether, the most parsimonious interpretation of these data is that vGlut+ DNT-2 neurons are excitatory.

      In any case, to address this reviewer’s point, we have now removed the word ‘excitatory’ from page 7 line 167.

      (5) Figure 1H, I: A more detailed description of the Toll-6 and Kek-6 expressing neurons will be helpful. Are they expressed in specific types of PAM and PPL1 DANs? The legend in Figure S2 mentions labeling in γ2α′1 zones, but it seems to be more than that.

      This information had already been provided; presumably this Reviewer overlooked it. It was described in great detail by comparing our microscopy data with the single cell RNA-seq data available through Fly Cell Atlas (https://flycellatlas.org) and Scope (https://scope.aertslab.org/#/b77838f4-af3c-4c37-8dd9-cf7a41e4b034/*/welcome).

      Please see our previously submitted Table S1 “Expression of Tolls, keks and Toll downstream adaptors in cells related to DNT-2A neurons”.

      (6) Figure S3 should be controls for Figure 2A. It is incorrectly labeled as controls for Figure 3A. 

      Thank you for pointing out this typo, this has now been corrected.

      (7) L197: The authors state, "This showed that DNT-2 could stimulate dopamine production in neighboring DANs". However, the results do not fully support this conclusion because the experiments measure overall TH levels in the brain, not specifically in neighboring DANs. The observed effect could be indirect via other neurons. 

      Indeed, we have now edited the text to: “This showed that DNT-2 could stimulate dopamine production”: page 8 line 208.

      (8) Figure 3: If Toll-6 is expressed in specific subtypes of PAM DANs, are they the dying cells when Toll-6 was knocked down? I think the paper will be significantly improved if the authors provide a more in-depth analysis of the phenotype. Also, permissive temperature controls are missing for the experiments in (E)-(H). Permissive controls are essential to confirm that the observed effects are due to adult-specific RNAi knockdown.

      Current tools do not enable us to visualise Toll-6+ neurons at the same time as manipulating DNT-2 neurons and at the same time as monitoring Dcp1. Stainings with Dcp1 in the adult brain are not trivial. Thus, we cannot guarantee this. However, Toll-6 is the preferential receptor for DNT-2, and given that apoptosis increases when we knock-down DNT-2, the most parsimonious interpretation is that the dying cells bear the DNT-2 receptor Toll-6. Even if DNT-2 can promiscuously bind other Toll receptors, the simplest way to interpret these data remains that DNT-2 promotes cell survival by signalling via its receptors, as no other possible route is known to date. This would be consistent with all other data in this figure.

      We thank this Reviewer for the feedback on the controls. Unfortunately, these are not trivial experiments; they require considerable time, effort, dedication and skill. This manuscript has already taken 5 years of daily hard work. We no longer have the staff (i.e. the first author has left the lab) or the resources to address this point.

      (9) Figure 4B: This phenotype in DNT-2 mutants is very striking. Did the neurons still survive and did their axonal innervation in the lobes remain intact?

      Homozygous DNT-2 mutants are viable and have impaired climbing, as we had already shown in Figure 7C.

      (10) L261: The authors mention that "PAM-β2β′2 neurons express Toll-6 (Table S1)". However, I cannot find this information in Table S1. 

      Unfortunately, I cannot identify the source of that statement at present and the first author has left the lab. In any case, although the fact that knocking down Toll-6 in these neurons causes a phenotype implies that they express Toll-6, it does not directly prove it. We have now corrected this to: “PAM-b2b'2 neuron dendrites overlap axonal DNT2 projections”, page 11 line 280.

      (11) Figure 4C, D: What about their synaptogenesis? Do they agree with the result in Figure 4B? 

      This was not tested at the time. Unfortunately, these are not trivial experiments and require considerable time, effort, dedication and skill. Addressing this point experimentally is not possible for us at this point. In any case, given the evidence we already provide, it is highly unlikely they would alter the interpretation of our findings and the value of the discoveries already provided.

      (12) L270: The authors state: "To ask whether DNT-2 might affect axonal terminals, we tested PPL1 axons." However, it is unclear why the focus was shifted to PPL1 neurons when similar analyses could have been performed on PAM DANs for consistency. In addition, it would be beneficial to assess dendritic arbor complexity and synaptogenesis in PPL1-γ1-pedc neurons to provide a more comprehensive comparison between PPL1 and PAM DANs. Performing parallel analyses on both neuron types would strengthen the study by providing insight into the generality and specificity of DNT-2 in different dopaminergic circuits. 

      The question we addressed with Figure 4 was whether DNT-2 and its receptors could modify axons, dendrites and synapses, i.e. all features of neuronal plasticity. The reason we used PPL1-g1-pedc to analyse axonal terminals was their morphology, which offered a clearer opportunity to visualise axonal endings than PAMs did. An exhaustive analysis of PPL1-g1-pedc is beyond the scope of this work and not the central focus.

      (13) Figure 4G lacks a permissive temperature control, which is essential to confirm that the observed effects are due to adult-specific RNAi knockdown. 

      We thank this Reviewer for this feedback, which we will bear in mind for future projects.

      (14) Figure 5A requires quantification and statistical comparison.

      We thank this Reviewer for this feedback. We did consider this, but the data are too variable to quantify and we decided it was best to present them simply as an observation, interesting nonetheless. This is also consistent with the data in Figure 1, which we have now expanded in this revision, which show the natural variability in DNT-2 neurons.

      (15) Figure 5B: Many green signals in the control image are not labeled as PSDs, raising concerns about the accuracy of the image analysis methods used for synapse identification. While I trust that the authors have validated their analysis approach, it would strengthen the study if they provided a clearer description or evidence of the validation process. 

      This was done using the Imaris “Spot” function, in volume. A threshold is set to exclude spots due to GFP background and select only synaptic spots. The selection of spots and quantification are done automatically by Imaris. All spots below the threshold are excluded, regardless of genotype and experimental conditions, rendering the analysis objective. We have now provided a detailed description of the protocol in the Materials and Methods section: page 29 lines 778-799.

      (16) Figure 5C lacks genotype controls (i.e., DNT2-GAL4-only and UAS-TrpA1-only). These controls are essential because elevated temperatures alone, without activation of DNT2 neurons, could potentially increase Syt-GCaMP production, leading to an increase in the number of Syt+ synapses. Including these controls would help ensure that the observed effects are truly due to the activation of DNT2 neurons and not temperature-related artifacts. 

      We thank this Reviewer for this feedback, which we will bear in mind for future projects.

      (17) L314-316: The authors state, "Here, the coincidence of... revealed that newly formed synapses were stable." I think this statement needs to be toned down because there is no evidence that these pre- and post-synaptic sites are functionally connected. 

      The Reviewer is correct that our data did not visualise together, in the same preparation and specimen, both pre- and post-synaptic sites. Still, given that PAMs have already been proved by others to be required for locomotion, learning and long-term memory, our data strongly suggest that synapses between them at the SMP are functionally connected.

      Nevertheless, as we do not provide direct cellular evidence, we have now edited the text to tone down this claim: “Here, the coincidence of increased pre-synaptic Syt-GFP from PAMs and post-synaptic Homer-GFP from DNT-2 neurons at SMP suggests that newly formed synapses could be stable”, page 13 line 351.

      (18) Figure 5D lacks permissive temperature controls. Also, the DNT-2FL overexpression phenotypes are different from the TrpA1 activation phenotypes. The authors may want to discuss this discrepancy. 

      Regarding the controls, these are not appropriate for this data set. These data were all taken at a constant temperature of 25°C with no shifts, and therefore do not require a permissive temperature control. We thank this Reviewer for drawing our attention to the fact that we made a mistake drawing the diagram, which we have now corrected in Figure 5D.

      Regarding the discrepancy, this had already been discussed in the Discussion section of the previously submitted version, page 19 Line 509-526. Presumably this Reviewer missed this before.

      (19) Figure 6A, B lack permissive temperature controls. These controls are important if the authors want to claim that the behavioral defects are due to adult-specific manipulations. In addition, there is no statistical difference between the PAM-GAL4 control and the RNAi knockdown group. The authors should be careful when stating that climbing was reduced in the RNAi knockdown flies (L341-342). 

      We thank this Reviewer for this feedback, which we will bear in mind for future projects.

      Point taken, but climbing of the tubGAL80ts, PAM>Toll-6RNAi flies was significantly different from that of the UAS-Toll-6RNAi/+ control.

      (20) Figure 6C: It seems that the DAN-GAL4 only control (the second group) also rescued the climbing defect. The authors may want to clarify this point. 

      The phenotype for this genotype was very variable, but certainly very distinct from that of flies over-expressing Toll-6[CY].

      We thank Reviewer 2 for their very thorough analysis of our paper that has helped improve the work.

      Reviewer #3 (Recommendations for the authors): 

      Overall, the manuscript reports highly interesting and mostly very convincing experiments. 

      We are very grateful to this Reviewer for their very positive evaluation of our work.

      Based on my comments under the heading "public review", I would like to suggest three possible improvements. 

      First, the quantification of structural plasticity at the sub-cellular level should be explained in more detail and potentially improved. For example, 3D reconstructions of individual neurons and quantification of the structure of boutons and dendrites could be undertaken. At present, it is not clear how bouton volumes are actually recorded accurately. 

      Thank you for the feedback. The analyses of dendrites and synapses were carried out in 3D-volumes using the Imaris “Filament” module and “Spot function”, respectively. Dendrites are analysed semi-automatically, ie correcting potential branching errors of Imaris, and synapses are counted automatically, after setting appropriate thresholds. Details have now been expanded in the Materials and Methods section: page 28 lines 756-768 and page 29 lines 780-799.

      We would also like to thank Imaris for enabling and facilitating our remote working using their software during the Covid-19 pandemic, post-pandemic lockdowns and lab restrictions that spanned for over a year.

      Second, the variability between DNT-2A-positive neurons with increasing sample size compared to a control (DNT-2A-negative neurons) should be demonstrated. Figure 2C does currently not present convincing evidence of increased structural variability. 

      It is unclear what data the Reviewer refers to. Figure 2C shows qRT-PCR data, and it does not show structural variability, which instead is shown with microscopy. If it is the BacTrace data in Figure 2B, the controls had been provided and the data were unambiguous. If Reviewer means Figure 1C, it is unclear why DNT-2GAL4-negative flies are needed when the aim was to visualise normal (not genetically manipulated) DNT-2 neurons. Thus, unfortunately we do not understand what the point is here.

      The observation that DNT-2 neurons are very variable, naturally, is highly interesting, and presumably this is what drew the attention of Reviewer 3. We agree that showing further data in support of this is interesting and valuable. Thus, in response to this Reviewer’s comment we have now increased the number of images that demonstrate variability of DNT-2 neurons:

      (1) We have added an extra image, altogether providing three images in new Figure 1A showing three different individual brains stained with DNT-2GAL4>UAS-FlyBow1.1. These show common morphology and features, but different location of the somas and distinct detailed arborisation patterns. Two more images using DNT-2GAL4 are provided in Figure 5A.

      (2) We have now added two further MCFO images, altogether showing four examples where the somas are not always in the same location and the axons arborise consistently at the SMP, but the detailed projections are not identical: new Figure 1D.

      These data compellingly show natural variability in DNT-2 neuron morphology.

      Third, I propose to simplify the feedback model (Figure 2F) to be less speculative. 

      Indeed, some details in Figure 2F are speculative as we did not measure real dopamine levels. Accordingly, we have now edited this diagram, adding question marks to indicate speculative inference, to distinguish from the arrows that are grounded on the data we provide.

      Accordingly, we have also edited the text in:

      - page 9, line 221: “Altogether, this shows that DNT-2 up-regulated TH levels (Figure 2E), and presumably via dopamine release, this inhibited cAMP in DNT-2A neurons (Figure 2F)”.

      - page 20, line 515: “Importantly, we showed that activating DNT-2 neurons increased the levels and cleavage of DNT-2, up-regulated DNT-2 increased TH expression, and this initial amplification resulted in the inhibition of cAMP signalling via the dopamine receptor Dop2R in DNT-2 neurons.”

      As minor points: 

      (1) Appetitive olfactory learning is based on Tempel et al., (1983); Proc Natl Acad Sci U S A. 1983 Mar;80(5):1482-6. doi: 10.1073/pnas.80.5.1482. This paper should perhaps be cited. 

      Thank you for bringing this to our attention, we have now added this reference to page 14 line 394.

      (2) Line 34: I would add ..."ligand for Toll-6 AND KEK-6,". 

      Indeed, thank you, now corrected.

      (3) Line 39: DNT-2-POSITIVE NEURONS. 

      Now corrected, thank you.

      (4) The levels of TH mRNA were quantified. Why not TH or dopamine directly using antibodies, ELISA, or HPLC? After all, later it is explicitly written that DNT modulates dopamine levels (line 481)! 

      We thank this Reviewer for this suggestion. We did try with HPLC once, but the results were inconclusive and optimising this would have required unaffordable effort by us and our collaborators. Part of this work spanned over the pandemic and subsequent lockdowns and lab restrictions to 30% then 50% lab capacity that continued for one year, making experimental work extremely challenging. Although we were unable to carry out all the ideal experiments, the DNT-2-dependent increase in TH mRNA coupled with the EPAC-Dop2R data provided solid evidence of a DNT-2-dopamine link.

      (5) Line 271: The PPL1-g1-pedc neuron has mainly (but not exclusively) a function in short-term memory! 

      They do, but others have also shown that PPL1-g1-pedc neurons have a gating function in long-term memory (Placais et al 2012; Placais et al 2017; Huang et al 2024) and are required for long-term memory (Adel and Griffith 2020; Boto et al 2020).

      (6) Line 401: Reward learning requires PAM neurons. PPL1 neurons are required for aversive learning. 

      Indeed, PPL1 neurons are required for aversive learning, but they also have a gating function in long-term memory common for both reward and aversive learning (Adel and Griffith, 2020 Neurosci Bull; Placais et al, 2012 Nature Neuroscience; Placais et al 2017 Nature Communications; Huang et al 2024 Nature).

      Overall, the manuscript presents extremely interesting, novel results, and I congratulate the authors on their findings. 

      We would like to thank this Reviewer for taking the time to scrutinise our work, for their helpful feedback that has helped us improve it, and for their interest and their positive and kind words.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Manuscript number: RC-2024-02465

      Corresponding author(s): Saravanan, Palani

      1. General Statements

      We would like to thank the Review Commons Team for handling our manuscript and the Reviewers for their constructive feedback and suggestions. In our revised manuscript, we have addressed and incorporated all the major suggestions of the reviewers, and we have also added new significant data on the role of Tropomyosin in regulation of endocytosis through its control over actin monomer pool maintenance and actin network homeostasis. We believe that with all these additions, our study has significantly gained in quality, strength of conclusions made, and scope for future work.

      2. Point-by-point description of the revisions

      Reviewer #1

      Evidence, reproducibility and clarity

      There are 2 Major issues -

      Having an -ala-ser- linker between the GFP and tropomyosin mimics acetylation. This is not the case, and more likely this linker acts as a spacer that allows tropomyosin polymers to form on the actin, and without it there is steric hindrance. A similar result would be seen with a simple flexible uncharged linker. It has been shown in a number of labs that the GFP itself masks the effect of the charge on the amino terminal methionine. This is consistent with NMR, crystallographic and cryo structural studies. Biochemical studies should be presented to demonstrate the impact of the linker before the stated conclusions, which provide the basis of a major part of this study, can be made.

      Response: We would like to clarify that all mNG-Tpm constructs used in our study contain a 40 amino-acid (aa) flexible linker between the N-terminal mNG fluorescent protein and the Tpm protein as per our earlier published study (Hatano et al., 2022). During initial optimization, we also experimented with linker length, and the 40aa linker works optimally for clear visualization of Tpm on actin cable structures in budding yeast, fission yeast (both S. pombe and S. japonicus), and mammalian cells (Hatano et al., 2022). These constructs have also been used since in other studies (Wirshing et al., 2023; Wirshing and Goode, 2024) and currently represent the best possible strategy to visualize Tpm isoforms in live cells. In our study, we characterized these proteins for functionality and found that both mNG-Tpm1 and mNG-Tpm2 were functional and can rescue the synthetic lethality observed in ∆tpm1∆tpm2 cells. During our study, we observed that mNG-Tpm1 expression from a single-copy integration vector did not restore full length actin cables in ∆tpm1 cells (Fig. 1B, 1C). We hypothesized that this could be a result of reduced binding affinity of the tagged tropomyosin due to lack of normal N-terminal acetylation, which stabilizes the N-terminus. The 40aa linker is unstructured and may not be able to neutralize the charge on the N-terminal methionine; thus, we inserted an -Ala-Ser- dipeptide, which has been routinely used in in vitro biochemical studies to stabilize the N-terminal helix and impart a similar effect as the N-terminal acetylation (Alioto et al., 2016; Palani et al., 2019; Christensen et al., 2017) by restoring normal binding affinity of Tpm to F-actin (Monteiro et al., 1994; Greenfield et al., 1994). We observed that addition of the -Ala-Ser- dipeptide to the mNG-Tpm fusion did indeed restore full length actin cables when expressed in ∆tpm1 cells, performing significantly better in our in vivo experiments (Fig. 1B, 1C). 
We agree with the reviewer that the -AS- dipeptide addition may not mimic N-terminal acetylation structurally, but as per previous studies, it may stabilize the N-terminus of Tpm and allow normal head-to-tail dimer formation (Greenfield et al., 1994; Monteiro et al., 1994; Frye et al., 2010). We have discussed this in our new Discussion section (Lines 350-372). Since the addition of the -AS- dipeptide was referred to as "acetyl-mimic (am)" in a previous study (Alioto et al., 2016), we continued to use the same nomenclature in our study. Now, as per your suggestions and to be more accurate, we have renamed "mNG-amTpm" constructs as "mNG-ASTpm" throughout the study so as not to confuse or claim that -AS- addition mimics acetylation. In any case, we have not seen any other ill effect of -AS- dipeptide introduction in addition to our 40 amino acid linker, suggesting that it can also be considered part of the linker. Although we agree with the reviewer that biochemical characterization of the effect of the linker would be important, we strongly believe that it is currently outside the scope of this study and should be taken up in future work with these proteins. Our study has mainly aimed to understand the functionality and utility of these mNG-Tpm fusion proteins for cell biological experiments in vivo, which was not done earlier in any other model system.

      My major issue however is making the conclusions stated here, using an amino-terminal fluorescent protein tag that is likely to impact any type of isoform selection at the end of the actin polymer. Carboxyl terminal tagging may have a reduced effect, but modifying the ends of the tropomyosin, which are integral in stabilising end to end interactions with itself on the actin filament, never mind any section systems that may/may not be present in the cell, is not appropriate.

      Response: We agree with the reviewer that N-terminal tagging of tropomyosin may have effects on its function, but these constructs represent the only fluorescently tagged functional tropomyosin constructs available currently while C-terminal fusions are either non-functional (we were unable to construct strains with endogenous Tpm1 gene fused C-terminally to GFP) or do not localize clearly to actin structures (See Figure R1 showing endogenous C-terminally tagged Tpm2-yeGFP that shows almost no localization to actin cables). To our knowledge, our study represents a first effort to understand the question of spatial sorting of Tpm isoforms, Tpm1 and Tpm2, in S. cerevisiae and any future developments with better visualization strategies for Tpm isoforms without compromising native N-terminal modifications and function will help improve our understanding of these proteins in vivo. We have also discussed these possibilities in our new Discussion section (Lines 391-396).

      Significance

      This paper explores the role of formin in determining the localisation of different tropomyosins to different actin polymers and cellular locations within budding yeast. Previous studies have indicated a role for the actin nucleating proteins in recruiting different forms of tropomyosin within fission yeast. In mammalian cells there is variation in the role of formins in affecting tropomyosin localisation - variation between cell types. There is also evidence that other actin binding proteins, and tropomyosin abundance play roles in regulating the tropomyosin-actin association according to cell type. Biochemical studies have previously been undertaken using budding yeast and fission yeast showing that the core actin polymerisation domain of formins does not interact with tropomyosin directly. The significance of this study, given the above, and the concerns raised is not clear to this reviewer.

      Response: Our study explores multiple facets of Tropomyosin (Tpm) biology. The lack of functional tagged Tpm has been a major bottleneck in understanding Tpm isoform diversity and function across eukaryotes. In our study, we characterize the first functional tagged Tpm proteins (Fig. 1, Fig. S1) and use them to answer long-standing questions about localization and spatial sorting of Tpm isoforms in the model organism S. cerevisiae (Fig. 2, Fig. 3, Fig. S2, Fig. S3). We also discover that the dual Tpm isoforms, Tpm1 and Tpm2, are functionally redundant for actin cable organization and function, while having gained divergent functions in Retrograde Actin Cable Flow (RACF) (Fig. 4, Fig. 5A-D, Fig. S4, Fig. S5, Fig. S6). We have now added new data on role of global Tpm levels controlling endocytosis via maintenance of normal linear-to-branched actin network homeostasis in S. cerevisiae (Fig. 5E-G). We respectfully differ with the reviewer on their assessment of our study and request the reviewer to read our revised manuscript which discusses the significance, limitations, and future perspectives of our study in detail.

      Reviewer #2

      Evidence, reproducibility and clarity

      This manuscript by Dhar, Bagyashree, Palani and colleagues examines the function of the two tropomyosins, Tpm1 and Tpm2, in the budding yeast S. cerevisiae. Previous work had shown that deletion of tpm1 and tpm2 causes synthetic lethality, indicating overlapping function, but also proposed that the two tropomyosins have distinct functions, based on the observation that strong overexpression of Tpm2 causes defects in bud placement and fails to rescue tpm1∆ phenotypes (Drees et al, JCB 1995). The manuscript first describes very functional mNeonGreen tagged versions of Tpm1 and Tpm2, where an alanine-serine dipeptide is inserted before the first methionine to mimic acetylation. It then proposes that Tpm1 and Tpm2 exhibit indistinguishable localization and that low level overexpression (?) of Tpm2 can replace Tpm1 for stabilization of actin cables and cell polarization, suggesting almost completely redundant functions. They also propose a specific function of Tpm2 in regulating retrograde actin cable flow.

      Overall, the data are very clean, well presented and quantified, but in several places are not fully convincing of the claims. Because the claims that Tpm1 and Tpm2 have largely overlapping function and localization are in contradiction to previous publication in S. cerevisiae and also different from data published in other organisms, it is important to consolidate them. There are fairly simple experiments that should be done to consolidate the claims of indistinguishable localization, and levels of expression, for which the authors have excellent reagents at their disposal.

      1. Functionality of the acetyl-mimic tagged tropomyosin constructs: The overall very good functionality of the tagged Tpm constructs is convincing, but the authors should be more accurate in their description, as their data show that they are not perfectly functional. For instance, the use of "completely functional" in the discussion is excessive. In the results, the statement that mNG-Tpm1 expression restores normal growth (page 3, line 69) is inaccurate. Fig S1C shows that tpm1∆ cells expressing mNG-Tpm1 grow more slowly than WT cells. (The next part of the same sentence, stating it only partially restores length of actin cables should cite only Fig S1E, not S1F.) Similarly, the growth curve in Fig S1C suggests that mNG-amTpm1, while better than mNG-Tpm1 does not fully restore the growth defect observed in tpm1∆ (in contrast to what is stated on p. 4 line 81). A more stringent test of functionality would be to probe whether mNG-amTpm1 can rescue the synthetic lethality of the tpm1∆ tpm2∆ double mutant, which would also allow to test the functionality of mNG-amTpm2.

      Response: We would like to thank the reviewer for their feedback and suggestions. Based on the suggestions, we have now more accurately described the growth rescue observed by expression of mNG-ASTpm1 in ∆tpm1 cells in the revised text. We have also removed the use of "completely functional" to describe mNG-Tpm functionality and corrected any errors in Figure citations in the revised manuscript.

      As per the reviewer's suggestion, we have now tested rescue of synthetic lethality of ∆tpm1∆tpm2 cells by expression of all mNG-Tpm variants and we find that all of them are capable of restoring the viability of ∆tpm1∆tpm2 cells when expressed under their native promoters via a high-copy plasmid (pRS425) (Fig. S1E) but only mNG-Tpm1 and mNG-ASTpm1 restored viability of ∆tpm1∆tpm2 cells when expressed under their native promoters via an integration plasmid (pRS305) (Fig. S1F). These results clearly suggest that while both mNG-Tpm1 and mNG-Tpm2 constructs are functional, Tpm1 tolerates the presence of the N-terminal fluorescent tag better than Tpm2. These observations now enhance our understanding of the functionality of these mNG-Tpm fusion proteins and will be a useful resource for their usage and experimental design in future studies in vivo.


      It would also be nice to comment on whether the mNG-amTpm constructs are really mimicking acetylation. Given the Ala-Ser peptide ahead of the starting Met is linked N-terminally to mNG, it is not immediately clear it will have the same effect as a free acetyl group decorating the N-terminal Met.

      Response: We agree with the reviewer's observation and for the sake of clarity and accuracy, we have now renamed "mNG-amTpm" with "mNG-ASTpm". The use of -AS- dipeptide is very routine in studies with Tpm (Alioto et al., 2016; Palani et al., 2019; Christensen et al., 2017) and its addition restores normal binding affinities to Tpm proteins purified from E. coli (Monteiro et al., 1994). We agree with the reviewer that the -AS- dipeptide addition may not mimic N-terminal acetylation structurally but as per previous studies, it may help neutralize the impact of a freely protonated Met on the alpha-helical structure and stabilize the N-terminus helix of Tpm and allow normal head-to-tail dimer formation (Monteiro et al., 1994; Frye et al., 2010; Greenfield et al., 1994). Consistent with this, we also observe a highly significant improvement in actin cable length when expressing mNG-ASTpm as compared to mNG-Tpm in ∆tpm1 cells, suggesting an improvement in function probably due to increased binding affinity (Fig. 1B, 1C). We have also discussed this in our answer to Question 1 of Reviewer 1 and the revised manuscript (Lines 350-372).

      Localization of Tpm1 and Tpm2: Given the claimed full functionality of mNG-amTpm constructs and the conclusion from this section of the paper that relative local concentrations may be the major factor in determining tropomyosin localization to actin filament networks, I am concerned that the analysis of localization was done in strains expressing the mNG-amTpm construct in addition to the endogenous untagged genes. (This is not expressly stated in the manuscript, but it is my understanding from reading the strain list.) This means that there is a roughly two-fold overexpression of either tropomyosin, which may affect localization. A comparison of localization in strains where the tagged copy is the sole Tpm1 (respectively Tpm2) source would be much more conclusive. This is important as the results are making a claim in opposition to previous work and observation in other organisms.

      Response: We thank the reviewer for this observation and their suggestions. We agree that relative concentrations of functional Tpm1 and Tpm2 in cells may influence the extent of their localizations. As per the reviewer's suggestion, we have now conducted our quantitative analysis in cells lacking endogenous Tpm1 and only expressing mNG-ASTpm1 from an integrated plasmid copy at the leu2 locus and the data is presented in new Figure S3. We compared Tpm-bound cable length (Fig. S3A, S3B) and Tpm-bound cable number (Fig. S3A, S3C) along with actin cable length (Fig. S3D, S3E) and actin cable number (Fig. S3D, S3F) in wildtype, ∆bnr1, and ∆bni1 cells. Our analysis revealed that mNG-ASTpm1 localized to actin cable structures in wildtype, ∆bnr1, and ∆bni1 cells, and that the decrease observed in Tpm-bound cable length and number upon loss of either Bnr1 or Bni1 was accompanied by a corresponding decrease in actin cable length and number. Thus, this analysis reached the same conclusion as our earlier analysis (Fig. 2) that mNG-ASTpm1 does not show preference between Bnr1 and Bni1-made actin cables. mNG-ASTpm2 did not restore functionality, when expressed as single integrated copy, in ∆tpm1∆tpm2 cells (new results in Fig. S1E, S1F, S5A); thus, we could not conduct a similar analysis for mNG-ASTpm2. This suggests that use of mNG-ASTpm2 would be more meaningful in the presence of endogenous Tpm2 as previously done in Fig. 2D-F.

      We have now also performed additional yeast mating experiments with cells lacking bnr1 gene and expressing either mNG-ASTpm1 or mNG-ASTpm2 and the data is shown in new Figure 3. From these observations, we observe that both mNG-ASTpm1 and mNG-ASTpm2 localize to the mating fusion focus in a Bnr1-independent manner (Fig. 3B, 3D) and suggests that they bind to Bni1-made actin cables that are involved in polarized growth of the mating projection. These results also add strength to our conclusion that Tpm1 and Tpm2 localize to actin cables irrespective of which formin nucleates them. Overall, these new results highlight and reiterate our model of formin-isoform independent binding of Tpm1 and Tpm2 in S. cerevisiae.

      In fact, although the authors conclude that the tropomyosins do not exhibit preference for certain actin structures, in the images shown in Fig 2A and 2D, there seems to be a clear bias for Tpm1 to decorate cables preferentially in the bud, while Tpm2 appears to decorate them more in the mother cell. Is that a bias of these chosen images, or does this reflect a more general trend? A quantification of relative fluorescence levels in bud/mother may be indicative.

      Response: We thank the reviewer for pointing this out. Our data and analysis do not suggest that Tpm1 and Tpm2 show any preference for decoration of cables in either mother or bud compartment. As per the reviewer's suggestion, we have now quantified the ratio of mean mNG fluorescence in the bud to the mother (Bud/Mother) and the data is shown in Figure S2G. The bud-to-mother ratio was similar for mNG-ASTpm1 and mNG-ASTpm2 in wildtype cells, and the ratio increased in ∆bnr1 cells and decreased in ∆bni1 cells for both mNG-ASTpm1 and mNG-ASTpm2 (Fig. S2G). This is consistent with the decreased actin cable signal in the mother compartment in ∆bnr1 cells and decreased actin cable signal in the bud compartment in ∆bni1 cells (Fig. S2A-D). Thus, our new analysis shows that both mNG-ASTpm1 and mNG-ASTpm2 have similar changes in their concentration (mean fluorescence) upon loss of either formin, Bnr1 or Bni1, and show similar ratios in wildtype cells as well, suggesting no preference for binding to actin cables in either bud or mother compartment. The preference inferred by the reviewer seems to be a bias of the current representative images and thus, we have replaced the images in Fig. 2A, 2D to more accurately represent the population.
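
      The bud-to-mother ratio described above can be illustrated with a short Python sketch. This is only an assumed form of the calculation (mean fluorescence in the bud divided by mean fluorescence in the mother); the function name `bud_to_mother_ratio` and the pixel values are hypothetical and do not come from the manuscript.

```python
from statistics import mean


def bud_to_mother_ratio(bud_pixels, mother_pixels):
    """Return mean bud fluorescence divided by mean mother fluorescence."""
    mother_mean = mean(mother_pixels)
    if mother_mean == 0:
        raise ValueError("mean mother fluorescence is zero; ratio undefined")
    return mean(bud_pixels) / mother_mean


# Hypothetical background-subtracted pixel intensities for one cell.
bud = [120, 130, 110]
mother = [60, 55, 65]

ratio = bud_to_mother_ratio(bud, mother)
print(ratio)
```

      Under this definition, a ratio above 1 means the signal is more concentrated in the bud, so a loss of signal in the mother compartment raises the ratio and a loss of signal in the bud lowers it.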

      The difficulty in preserving mNG-amTpm after fixation means that authors could not quantify relative Tpm/actin cable directly in single fixed cells. Did they try to label actin cables with Lifeact instead of using phalloidin, and thus perform the analysis in live cells?

      Response: We did not use LifeAct for our analysis as LifeAct is known to cause expression-dependent artefacts in cells (Courtemanche et al., 2016; Flores et al., 2019; Xu and Du, 2021) and it also competes with proteins that regulate normal cable organization like cofilin. Use of LifeAct would necessitate standardization of expression to avoid such artefacts in vivo. Also, phalloidin staining provides the best staining of actin cables and allows for better quantitative results in our experiments. The use of LifeAct along with mNG-Tpm would also require optimization with red fluorescent proteins, which usually tend to have lower brightness and photostability. However, during the revision of our study, a new study from Prof. Goode's lab has developed and optimized expression of new LifeAct-3xmNeonGreen constructs for use in S. cerevisiae (Wirshing and Goode, 2024). Thus, a similar strategy of using tandem copies of bright and photostable red fluorescent proteins can be explored for use in combination with mNG-Tpm in future studies.

      Complementation of tpm1∆ by Tpm2:

      I am confused about the quantification of Tpm2 expression by RT-PCR shown in Fig S3F. This figure shows that tpm2 mRNA expression levels are identical in cells with an empty plasmid or with a tpm2-encoding plasmid. In both strains (which lack tpm1), as well as in the WT control, one tpm2 copy is in the genome, but only one strain has a second tpm2 copy expressed from a centromeric plasmid, yet the results of the RT-PCR are not significantly different. (If anything, the levels are lower in the tpm2 plasmid-containing strain.) The methods state that the primers were chosen in the gene, so likely do not distinguish the genomic from the plasmid allele. However, the text claims a 1-fold increase in expression, and functional experiments show a near-complete rescue of the tpm1∆ phenotype. This is surprising and confusing and should be resolved to understand whether higher levels of Tpm2 are really the cause of the observed phenotypic rescue.

      The authors could for instance probe for protein levels. I believe they have specific nanobodies against tropomyosin. If not, they could use expression of functional mNG-amTpm2 to rescue tpm1∆. Here, the expression of the protein can be directly visualized.

      Response: We thank the reviewer for pointing this out. We would like to clarify that in our RT-qPCR experiments, the primers were chosen within the Tpm1 and Tpm2 genes and do not distinguish between transcripts from the endogenous or plasmid copy. We have now mentioned this in the Materials and Methods section of the revised manuscript. The measurements therefore represent a relative estimate of the total mRNA of these genes present in cells. We were consistently able to detect a ~19-fold increase in Tpm2 total mRNA levels as compared to wildtype and ∆tpm1 cells (Fig. S4D) when tpm2 was expressed from a high-copy plasmid (pRS425). This increase in Tpm2 mRNA levels was accompanied by a rescue in growth (Fig. S4A) and actin cable organization (Fig. S4B) of ∆tpm1 cells containing pRS425-ptpm2TPM2. When tpm2 was expressed from a low-copy number centromeric plasmid (pRS316), we detected a ~2-fold increase in Tpm2 transcript levels when using the tpm1 promoter and no significant change was detected when using the tpm2 promoter (Fig. S4E). We have made sure that these results are accurately described in the revised manuscript.

      As per the reviewer's suggestion, we have now conducted a more extensive analysis to ascertain the expression levels of Tpm2 in our experiments, and the data are now presented in the new Figure S5. We used mNG-ASTpm1 and mNG-ASTpm2 to rescue growth of ∆tpm1 (Fig. S5A) and correlated growth rescue with protein levels using quantified fluorescence intensity (Fig. S5B, S5C) and western blotting (anti-mNG) (Fig. S5D, S5E). We find that ∆tpm1 cells containing pRS425-ptpm1mNG-ASTpm1 had the highest protein level, followed by pRS425-ptpm2 mNG-ASTpm2 and pRS305-ptpm1mNG-ASTpm1, and the lowest protein levels were found in ∆tpm1 cells containing pRS305-ptpm2 mNG-ASTpm2, in both fluorescence intensity and western blotting quantifications (Fig. S5C, S5E). Surprisingly, we were not able to detect any protein in ∆tpm1 cells containing pRS305-ptpm2 mNG-ASTpm2 by western blotting (Fig. S5D), which was also accompanied by a lack of growth rescue (Fig. S5A). This is most likely due to weak expression from the native Tpm2 promoter, which is consistent with previous literature (Drees et al., 1995). Taken together, these data clearly show that the rescue observed in ∆tpm1 cells is caused by increased expression of mNG-ASTpm2 in cells and support our conclusion that an increase in Tpm2 expression leads to restoration of normal growth and actin cables in ∆tpm1 cells.

      __ Specific function of Tpm2:__

      The data about the retrograde actin flow is interpreted as a specific function of Tpm2, but there is no evidence that Tpm1 does not also share this function. To reach this conclusion one would have to investigate retrograde actin flow in tpm1∆ (difficult as cables are weak) or for instance test whether Tpm1 expression restores normal retrograde flow to tpm2∆ cells.

      Response: __We agree with the reviewer and, as per the reviewer's suggestion, we have performed another experiment that includes wildtype cells and ∆tpm2 cells containing an empty pRS316 vector, pRS316-ptpm2TPM1, or pRS316-ptpm1TPM1. We find that the RACF rate increased in ∆tpm2 cells as compared to wildtype and was restored to wildtype levels by exogenous expression of Tpm2 but not Tpm1 (Fig. S6E, S6F). Since actin cables were not detectable in ∆tpm1 cells, we measured RACF rates in ∆tpm1 cells expressing Tpm1 or Tpm2 from a plasmid copy, which restored actin cables as shown previously in __Fig. 5A-C. We observed that RACF rates were similar to wildtype in ∆tpm1 cells expressing either Tpm1 or Tpm2 (Fig. S6E, S6F), suggesting that Tpm1 is not involved in RACF regulation. Taken together, these results suggest a specific role for Tpm2, but not Tpm1, in RACF regulation in S. cerevisiae, consistent with previous literature (Huckaba et al., 2006).

      Minor comments: __1.__ The growth of tpm1∆ with empty plasmid in Fig S3A is strangely strong (different from other figures).

      Response: __ We thank the reviewer for pointing this out. We have now repeated the drop test multiple times (__Fig. R2), but we see similar growth rates as the drop test already presented in Fig. S4A. __At this point, it would be difficult to ascertain the basis of this difference observed at 23°C and 30°C, but a recent study that links leucine levels to actin cable stability (Sing et al., 2022) might explain the faster growth of these ∆tpm1 cells containing a leu2 gene carrying high-copy plasmid. However, there is no effect on growth rate at 37°C which is consistent with other spot assays shown in __Fig. S1D, S4F, S5A.


      Significance

      I am a cell biologist with expertise in both yeast and actin cytoskeleton.

      The question of how tropomyosin localizes to specific actin networks is still open and a current avenue of study. Studies in other organisms have shown that different tropomyosin isoforms, or their acetylated vs non-acetylated versions, localize to distinct actin structures. Proposed mechanisms include competition with other ABPs and preference imposed by the formin nucleator. The current study re-examines the function and localization of the two tropomyosin proteins from the budding yeast and reaches the conclusion that they co-decorate all formin-assembled structures and also share most functions, leading to the simple conclusion that the more important contribution of Tpm1 is simply linked to its higher expression. Once consolidated, the study will appeal to researchers working on the actin cytoskeleton.

      We thank the reviewer for their positive assessment of our work and the constructive feedback that has greatly improved the quality of our study. After addressing the points raised by the reviewer, we believe that our study has significantly gained in consolidating the major conclusions of our work.

      **Referees cross-commenting**

      Having read the other reviewers' comments, I do agree with reviewer 1 that it is not clear whether the Ala-Ser linker really mimics acetylation. I am less convinced than reviewer 3 that the key conclusions of the study are well supported, notably the issue of Tpm2 expression levels is not convincing to me.

      Response: __We acknowledge the reviewer's point about the effect of Ala-Ser dipeptide and would request the reviewer to refer to our response to Reviewer 1 (Question 1) for a more detailed discussion on this. We have also extensively addressed the question of Tpm2 expression levels as suggested by the reviewer (new data in __Figure S5) which has further strengthened the conclusions of our study.

      __Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary:__ The study presents the first fully functional fluorescently tagged Tpm proteins, enabling detailed probing of Tpm isoform localization and functions in live cells. The authors created a modified fusion protein, mNG-amTpm, which mimicked native N-terminal acetylation and restored both normal growth and full-length actin cables in yeast cells lacking native Tpm proteins, demonstrating the constructs' full functionality. They also show that Tpm1 and Tpm2 do not have a preference for actin cables nucleated by different formins (Bnr1 and Bni1). Contrary to previous reports, the study found that overexpressing Tpm2 in Δtpm1 cells could restore growth rates and actin cable formation. Furthermore, it is shown that despite its evolutionary divergence, Tpm2 retains actin-protective functions and can compensate for the loss of Tpm1, contributing to cellular robustness.

      Major and Minor Comments: 1. The key conclusions of this paper are convincing. However, I suggest that more detail be provided regarding the image analysis used in this study. Specifically, since threshold settings can impact the quality of the generated data and, therefore, its interpretation, it would be useful to see a representative example of the quantification methods used for actin cable length/number (as in refs. 80 and 81) and mitochondria morphology. These could be presented as Supplemental Figures. Additionally, it would help to interpret the results if the authors could be more specific about the statistical tests that were used.

      Response: __We agree with the reviewer's suggestions and have now updated our Materials and Methods section to describe the image analysis pipelines used in more detail. We have also added examples of quantification procedure for actin cable length/number and mitochondrial morphology as an additional Supplementary __Figure S7. Briefly, the following pipelines were used:

      • Actin cable length and number analysis: This was done exactly as described in McInally et al., 2021 and McInally et al., 2022. Actin cables were manually traced in Fiji as shown in __ S7A__, and then the trace files for each cell were run through a Python script (adapted from McInally et al., 2022) that outputs mean actin cable length and number per cell.
      • Mitochondria morphology: The Mitochondria Analyzer plug-in in Fiji was used to segment out the mitochondrial fragments. The parameters used for 2D segmentation of mitochondria were first optimized using "2D Threshold Optimize" to find the most accurate segmentation, and then the same parameters were run on all images. After segmentation of the mitochondrial network, measurements of fragment number were done using the "Analyze Particles" function in Fiji. An example of the overall process is shown in __ S7B.__ As per the reviewer's suggestion, we have now included the description of the statistical test used in the Figure Legends of each Figure in the revised manuscript. We have used one-way ANOVA with Tukey's multiple comparisons test, the Kruskal-Wallis test with Dunn's multiple comparisons, and the unpaired two-tailed t-test, using the in-built functions in GraphPad Prism (v.6.04).
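      The per-cell summary step for actin cables described above can be sketched roughly as follows. This is only an illustration and not the script adapted from McInally et al., 2022: it assumes each manually traced cable is available as a list of (x, y) points in consistent units, and the function names `cable_length` and `summarize_cell`, as well as the example coordinates, are hypothetical.

```python
import math
from statistics import mean

def cable_length(points):
    """Sum of straight-line segment lengths along one traced cable (a polyline)."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def summarize_cell(cables):
    """Return (number of cables, mean cable length) for one cell.

    `cables` is a list of polylines; each polyline is a list of (x, y) tuples.
    An untraced cell is reported as (0, 0.0).
    """
    lengths = [cable_length(c) for c in cables]
    if not lengths:
        return 0, 0.0
    return len(lengths), mean(lengths)

# Hypothetical traces for a single cell (not data from the study)
example_cell = [
    [(0.0, 0.0), (3.0, 4.0)],              # one straight segment
    [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)],  # two unit-length segments
]
n_cables, mean_length = summarize_cell(example_cell)
```

      Summing segment lengths along a polyline is one simple way to measure a traced cable; the published pipeline may handle scaling and file parsing differently.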

      **Referees cross-commenting**

      I agree with both reviewers 1 and 2 regarding the issues with the Ala-Ser acetylation mimic and Tpm2 expression levels, respectively. I think the authors should be more careful in how they frame the results, but I consider that these issues do not invalidate the main conclusions of this study.

      Response: __We acknowledge the reviewer's concern about the Ala-Ser dipeptide and would request them to refer to our earlier discussion on this in response to Reviewer 1 (Question 1) and Reviewer 2 (Question 2). We would also request the reviewer to refer to our answer to Reviewer 2 (Question 6) where we have extensively addressed the question of Tpm2 expression levels and their effect on rescue of ∆tpm1 cells. This data is now presented as new __Figure S5 in our revised manuscript.

      Reviewer#3 (Significance (Required)):

      The finding that Tpm2 can compensate for the loss of Tpm1, restoring actin cable organization and normal growth rates, challenges previous assumptions about the non-redundant functions of these isoforms in Saccharomyces cerevisiae (ref. 16). It also supports a concentration-dependent and formin-independent localization of Tpm isoforms to actin cables in this species. The development of fully functional fluorescently tagged Tpm proteins is a significant methodological advancement. This advancement overcomes previous visualization challenges and allows for accurate in vivo studies of Tpm function and regulation in S. cerevisiae.

      The findings will be of particular interest to researchers in the field of cellular and molecular biology who study actin cytoskeleton dynamics. Additionally, it will be relevant for those utilizing advanced microscopy and live-cell imaging techniques.

      As a researcher, my experience lies in cytoskeleton dynamics and protein interactions, though I do not have specific experience related to tropomyosin. I use different yeast species as models and routinely employ live-cell imaging as a tool.

      We thank the reviewer for their positive outlook and assessment of our study. We have incorporated all their suggestions, and we are confident that the revised manuscript has significantly improved in quality due to these additions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      The work is important and of potential value to areas other than the bone field because it supports a role and mechanism for beta-catenin that is novel and unusual. The findings are significant in that they support the presence of another anabolic pathway in bone that can be productively targeted for therapeutic goals. The data for the most part are convincing. The work could be strengthened by better characterizing the osteoclast KO of Malat1 related to the Lys cre model and by including biochemical markers of bone turnover from the mice.

      We thank the editors and reviewers for their time and their positive and insightful comments. We are pleased that the editors and reviewers were very enthusiastic, as stated in their Strength comments. We have performed experiments and addressed all of the points raised by the reviewers. We have revised the manuscript accordingly and the reviewers’ points are specifically addressed below. 

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary

      The authors were trying to discover a novel bone remodeling network system. They found that a lncRNA, Malat1, plays a central role in the remodeling by binding to β-catenin and functioning through the β-catenin-OPG/Jagged1 pathway in osteoblasts and chondrocytes. In addition, Malat1 significantly promotes bone regeneration in fracture healing in vivo. Their findings suggest a new concept of Malat1 function in the skeletal system. One significantly different finding between this manuscript and the competing paper pertains to the role of Malat1 in osteoclast lineage, specifically, whether Malat1 functions intrinsically in osteoclast lineage or not.

      Strengths:

      This study provides strong genetic evidence demonstrating that Malat1 acts intrinsically in osteoblasts while suppressing osteoclastogenesis in a non-autonomous manner, whereas the other group did not utilize relevant conditional knockout mice. As shown in the results, Malat1 knockout mouse exhibited abnormal bone remodeling and turnover. Furthermore, they elucidated molecular function of Malat1, which is sufficient to understand the phenotype in vivo.

      We are grateful to the reviewer for highlighting the novelty, strengths and significance of our work.

      Weaknesses:

      Discussing differences between previous paper and their status would be highly informative and beneficial for the field, as it would elucidate the solid underlying mechanisms.

      These points have been fully addressed in the point-to-point response below.

      Reviewer #2 (Public Review):

      Summary:

      The authors investigated the roles of lncRNA Malat1 in bone homeostasis, which was initially believed to be non-functional for physiology. They found that both Malat1 KO and conditional KO in osteoblast lineage exhibit significant osteoporosis due to decreased osteoblast bone formation and increased osteoclast resorption. More interestingly, they found that deletion of Malat1 in osteoclast lineage cells does not affect osteoclast differentiation and function. Mechanistically, they found that Malat1 acts as a co-activator of β-catenin, directly regulating osteoblast activity and indirectly regulating osteoclast activity via mediating OPG, but not RANKL, expression in osteoblasts and chondrocytes. Their discoveries establish a previously unrecognized paradigm model of Malat1 function in the skeletal system, providing novel mechanistic insights into how a lncRNA integrates cellular crosstalk and molecular networks to fine-tune tissue homeostasis and remodeling.

      Strengths:

      The authors generated global and conditional KO mice in osteoblast and osteoclast lineage cells and carefully analyzed the role of Malat1 with both in vivo and in vitro systems. The conclusion of this paper is mostly well supported by data.

      We are grateful to the reviewer for highlighting the novelty, strengths and significance of our work.

      Weaknesses:

      More objective biological and biochemical analyses are required.

      These points have been fully addressed in the point-to-point response below.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Qin and colleagues study the role of Malat1 in bone biology. This topic is interesting given the role of lncRNAs in multiple physiologic processes. A previous study (PMID 38493144) suggested a role for Malat1 in osteoclast maturation. However, the role of this lncRNA in osteoblast biology was previously not explored. Here, the authors note osteopenia with increased bone resorption in mice lacking Malat1 globally and in osteoblast lineage cells. At the mechanistic level, the authors suggest that Malat1 controls beta-catenin activity. These results advance the field regarding the role of this lncRNA in bone biology.

      Strengths:

      The manuscript is well-written and data are presented in a clear and easily understandable manner. The bone phenotype of osteoblast-specific Malat1 knockout mice is of high interest. The role of Malat1 in controlling beta-catenin activity and OPG expression is interesting and novel.

      We are grateful to the reviewer for highlighting the novelty, strengths and significance of our work.

      Weaknesses:

      The lack of a bone phenotype when Malat1 is deleted with LysM-Cre is of interest given the previous report suggesting a role for this lncRNA in osteoclasts. However, to interpret the findings here, the authors should investigate the deletion efficiency of Malat1 in osteoclast lineage cells in their model. The data in the fracture model in Figure 8 seems incomplete in the absence of a more complete characterization of callus histology and a thorough time course. The role of Malat1 and OPG in chondrocytes is unclear since the osteocalcin-Cre mice (which should retain normal Malat1 levels in chondrocytes) have similar bone loss as the global mutants.

      These points have been fully addressed in the point-to-point response below.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      There are several suggestions for improving the manuscript, and we hope that you will review the recommendations carefully and make changes to the paper to address the concerns raised. Suggestions have been made to better characterize the osteoclast KO of Malat1 related to the Lys cre model as well as suggestions to include biochemical markers of bone turnover from your mice.

      These points have been fully addressed in the point-to-point response below.

      Reviewer #1 (Recommendations For The Authors):

      (1) Replicate numbers in Figure 3 should be noted.

      We thank the reviewer for this point. The experiments in Fig. 3 have been replicated three times, which is now noted in the figure legend.

      (2) It is novel to identify OPG expression in chondrocytes. More discussion is expected.

      Yes, a paragraph regarding this point has been added to the Discussion section.  

      Reviewer #2 (Recommendations For The Authors):

      (1) It is better to show serum osteoblast bone formation marker and osteoclast resorption marker, such as P1NP and CTx, in both Malat1 KO and osteoblast conditional KO mice.

      We thank the reviewer for this important point. Since CTx values are often influenced by food intake, we measured serum TRAP levels, which also reflect changes in osteoclastic bone resorption. We have observed that the serum osteoblastic bone formation marker P1NP was decreased, while osteoclastic bone resorption marker TRAP was increased, in both Malat1<sup>-/-</sup> and Malat1<sup>ΔOcn</sup> mice. These changes in serum biochemical markers of bone turnover are consistent with the bone phenotype caused by Malat1 deficiency. The new data are shown in Fig. 1i, Fig. 2e, and Fig. 5b.

      (2) in vitro osteoblast differentiation assay is required to further confirm Malat1 regulates osteoblast differentiation.

      We thank the reviewer for this suggestion. As recommended, we have performed in vitro osteoblast differentiation multiple times using calvarial cells, a commonly used system in the field. However, we observed considerable variability in the culture results across different experimental batches, whether conducted by different scientists or the same individual. This variability is likely due to differences in the purity of the cultured cells, as the literature shows that the current culture system in the field contains a mixture of tissue cells, including not only osteoblasts but also other cells, such as stromal and hematopoietic lineage cells (DOI: 10.1002/jbmr.4052). We hope to test osteoblast differentiation using a purer culture system once it becomes available in the field. In contrast, our in vivo data, indicated by multiple parameters, show consistent osteoblast and bone formation phenotypes across a large number of mice. Therefore, the in vivo results in our study strongly support our conclusion regarding Malat1's role in osteoblastic bone formation.

      (3) The authors found that Malat1 regulates osteoclast activity through OPG expression not only in osteoblasts, but also in chondrocytes, and concluded that chondrocytes are involved in the crosstalk with osteoclast lineage cells in marrow. This is a very novel finding. Do the authors have any in vivo data to support this point, such as deleting Malat1 in chondrocyte lineage cells with chondrocyte-specific Cre?

      We appreciate the reviewer for highlighting our novel findings and providing valuable suggestions. Given the considerable time required to generate chondrocyte-specific conditional KO mice, we plan to thoroughly investigate the crosstalk between chondrocytes and osteoclasts via Malat1 in vivo in our next project.

      Reviewer #3 (Recommendations For The Authors):

      (1) Ideally would show male and female data side by side in the main text figures

      We thank the reviewer for this suggestion. The male and female data are now displayed side by side in Fig. 1b. 

      (2) The sample size for the in vivo datasets is quite large. A power calculation should be provided to better understand how the authors decided to analyze so many mice.

      Due to staff turnover during the pandemic, the first authors and several co-authors were involved in breeding the mice and collecting and analyzing bone samples. To avoid bias in sample selection, we pooled all the samples, resulting in a highly consistent phenotype across mice. This robust approach further strengthens our conclusion. 

      (3) The candidate gene approach to look at beta-catenin is a bit random; it would be ideal to assess Malat1 binding proteins in osteoblasts in an unbiased way. Also, does Malat1 bind β-catenin in other cell types? The importance of this point is further underscored by ref 47 which indicates that Malat binds TEAD3.

      As β-catenin is a key regulator in osteoblasts, we believe that studying the interaction between β-catenin and Malat1 is not random. Instead, this approach is well-founded and based on established knowledge in the field (as discussed below). In parallel, we are investigating genome-wide Malat1-bound targets beyond β-catenin, which will be reported in future studies. 

      More detailed points have been discussed in the manuscript: 

      Given that we identified Malat1 as a critical regulator in osteoblasts, we sought to investigate the mechanisms underlying the regulation of osteoblastic bone formation by Malat1. β-catenin is a central transcriptional factor in canonical Wnt signaling pathway, and plays an important role in positively regulating osteoblast differentiation and function (28-33). Upon stimulation, most notably from canonical Wnt ligands, β-catenin is stabilized and translocates into the nucleus, where it interacts with coactivators to activate target gene transcription. Previous reports observed a link between Malat1 and β-catenin signaling pathway in cancers (34,35), but the underlying molecular mechanisms in terms of how Malat1 interacts with β-catenin and regulates its nuclear retention and transcriptional activity are unclear. 

      Ref47 tested Malat1 binding to Tead3 in osteoclasts. However, a key difference between our findings and those of Ref47 is that both our in vitro and in vivo data, using myeloid osteoclast-specific conditional Malat1 KO mice, do not support an intrinsically significant role for Malat1 in osteoclasts.

      (4) The statement on page 6 concluding that Malat acts as a scaffold to tether β-catenin in the nucleus is not supported by data in Fig 3d demonstrating that β-catenin nuclear translocation in response to Wnt3a is similar in control and Malat-deficient cells.

      The experiment in Fig. 3d is not designed to demonstrate Malat1 and β-catenin binding, but it is essential as the result rules out the possibility that Malat1 may affect β-catenin nuclear translocation. Moreover, we have utilized two robust approaches, CHIRP and RIP, to demonstrate that Malat1 acts as a scaffold to tether β-catenin in the nucleus (Fig. 3a, b, c, Supplementary Fig. 3). 

      (5) Figure 4e: can the authors show Malat deletion efficiency in the LysM-Cre model? This is important in light of the negative data in this figure and ref 47 which claims an osteoclast intrinsic role for Malat

      We thank the reviewer for this suggestion. The deletion efficiency of Malat1 in the LysM-Cre mice is very high (>90%). This data is now presented in Fig. 4e. 

      (6) Figure 5: since the magnitude of the effects on osteoclasts at the histology level are mild, it would be nice to also look at serum markers of bone resorption (CTX)

      The magnitude of osteoclast changes at the histological level in Fig. 5 is not mild in our view, as we observe 25-30% changes with statistical significance in the osteoclast parameters of Malat1<sup>ΔOcn</sup> mice. Since CTx values are often influenced by food intake, we measured serum TRAP levels, which reflect changes in osteoclastic bone resorption. As shown in Fig. 5b, serum TRAP levels are significantly elevated in Malat1<sup>ΔOcn</sup> mice compared to control mice.

      (7) Data showing chondrocytic expression of OPG is not as novel as the authors claim. Should think about growth plate versus articular sources of OPG. Growth plate chondrocytes express OPG to regulate osteoclasts in the primary spongiosa which resorb mineralized cartilage.

      In the present study, we do not focus on comparing the sources of OPG from the chondrocytes in the growth plate versus articular cartilage. The novelty of our work lies in the discovery that Malat1 links chondrocyte and osteoclast activities through the β-catenin-OPG/Jagged1 axis. This Malat1-β-catenin-OPG/Jagged1 axis represents a novel mechanism regulating the crosstalk between chondrocytes and osteoclasts. 

      (8) The relevance of the chondrocyte role of Malat is unclear since the bone phenotype in global and osteocalcin-Cre mice is similar.

      Bone mass was decreased by 20% in Malat1<sup>ΔOcn</sup> mice, while a 30% reduction was observed in global KO (Malat1<sup>-/-</sup>) mice. This difference indicates potential contributions from other cell types, such as chondrocytes, and our results in Fig. 6 further support the impact of chondrocytes in Malat1's regulation of bone mass. We plan to thoroughly investigate the crosstalk between chondrocytes and osteoclasts via Malat1 in vivo in our next project.

      (9) Fracture data in Figure 8 seems incomplete, it would be ideal to support micro CT with histology and look at multiple time points.

      We thank the reviewer for this suggestion. We have performed histological analysis of our samples, and found that Malat1 promotes bone healing in the fracture model (Fig. 8f), which is consistent with our μCT data.

    1. But let's be more careful. Position is actually a relative quantity. Really, we should only ever write the position of two points relative to each other. We'll use e.g. \({}^Ap^C\) to denote the position of C measured from A. The left superscript looks mighty strange, but we'll see that it pays off once we start transforming points.

      \(\renewcommand{\vec}[1]{\overrightarrow{#1}}\)While I appreciate the introduction of monogram notation and the algebraic rules given in section 3.3 below, I believe a small minority of readers (myself included) would appreciate a reference to a more formal introduction (e.g. an introduction that relates points to the concepts of vector spaces).

      As far as I understand, the references provided in the introduction above do not formalize the notions given here, but instead develop them in more detail.

      To clarify, the concept of a "point" (and its relationship to vectors in a vector space) is not rigorously defined here. Similarly, the addition operation of points "\({}^Ap^B_F + {}^Bp^C_F = {}^Ap^C_F.\)" given in \((1)\) came out of thin air. These are just two of the several concepts presented on this page that I think would benefit from more formal grounding.

      Because of this, I looked around for a more formal overview and found [1] to be very helpful ([1] is freely available here from my end). In [1], the concept of an affine space is introduced (see also here).

      After introducing the concept of an affine space, the author then shows that the addition operation "\({}^Ap^B_F + {}^Bp^C_F = {}^Ap^C_F\)" given in \((1)\) is nothing but Chasles' relation, which the author formally proves.
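      In sketch form, identifying \({}^Ap^B\) with the vector \(\overrightarrow{AB}\) from point \(A\) to point \(B\), and dropping the expressed-in frame \(F\) for brevity (all quantities are assumed to be expressed in the same frame), the argument needs only two affine-space axioms: the action of vectors on points is associative, and the vector taking one point to another is unique. This is a paraphrase under those assumptions rather than a quotation of the proof in [1]:

      \[
      C = B + \overrightarrow{BC}
        = \left(A + \overrightarrow{AB}\right) + \overrightarrow{BC}
        = A + \left(\overrightarrow{AB} + \overrightarrow{BC}\right)
      \quad\Longrightarrow\quad
      \overrightarrow{AB} + \overrightarrow{BC} = \overrightarrow{AC}.
      \]

      The implication holds because \(\overrightarrow{AC}\) is, by definition, the unique vector \(v\) satisfying \(A + v = C\).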

      Additionally, the author explains affine combinations, affine maps, and other useful concepts (all of these concepts explain where the rest of the algebraic rules in section 3.3 came from). I believe this reference presents a different perspective that some readers may find helpful, and perhaps it can be mentioned somewhere near here.

      References

      [1] J. Gallier, “Basics of Affine Geometry,” in Geometric Methods and Applications: For Computer Science and Engineering, J. Gallier, Ed., New York, NY: Springer, 2011, pp. 7–63. doi: 10.1007/978-1-4419-9961-0_2.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment 

      This study presents valuable experimental and numerical results on the motility of a magnetotactic bacterium living in sedimentary environments, particularly in environments of varying magnetic field strengths. The evidence supporting the claims of the authors is solid, although the statistical significance comparing experiments with the numerical work is weak. The study will be of interest to biophysicists interested in bacterial motility. 

      We thank the reviewers and editors for their careful reading and the constructive comments. With respect to the statement about weak statistical significance, we think that this statement mixes two separate issues, the significance of the difference between experiments at 0 and 50µT and the comparison of experiments with simulations. We have amended our manuscript to address both points as described below. The difference between the experiments at 0 and 50µT is indeed significant, and the discrepancy between experiments and simulations can be explained by unavoidable differences in the way we quantify bacterial throughput.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors present experimental and numerical results on the motility of Magnetospirillum gryphiswaldense MSR-1, a magnetotactic bacterium living in sedimentary environments. The authors manufactured microfluidic chips containing three-dimensional obstacles of irregular shape that match the statistical features of the grains observed in the sediment via microcomputer tomography. The bacteria are furthermore subject to an external magnetic field, whose intensity can be varied. The key quantity measured in the experiments is the throughput ratio, defined as the ratio between the number of bacteria that reach the end of the microfluidic channel and the number of bacteria entering it. The main result is that the throughput ratio is non-monotonic and exhibits a maximum at magnetic field strength comparable with Earth's magnetic field. The authors rationalize the throughput suppression at large magnetic fields by quantifying the number of bacteria trapped in corners between grains.
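      The throughput ratio defined above amounts to a simple quotient. A minimal sketch, assuming the two counts are already available; the function name `throughput_ratio` and the example counts are hypothetical and are not data from the study:

```python
def throughput_ratio(n_exited, n_entered):
    """Fraction of the bacteria entering the channel that reach its end."""
    if n_entered <= 0:
        raise ValueError("n_entered must be positive")
    return n_exited / n_entered

# Hypothetical counts for one run at a single field strength
ratio = throughput_ratio(n_exited=12, n_entered=48)
```

      Comparing this ratio across field strengths is what would reveal the non-monotonic trend described in the summary.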

      Strengths: 

      While magnetotactic bacteria's general motility in bulk has been characterized, we know much less about their dynamics in a realistic setting, such as a disordered porous material. The micro-computer tomography of sediments and their artificial reconstruction in a microfluidic channel is a powerful method that establishes the rigorous methodology of this work. This technique can give access to further characterization of microbial motility. The coupling of experiments and computer simulations lends considerable strength to the claims of the authors, because the model parameters (with one exception) are directly measured in the experiments. 

      Weaknesses: 

      The main weakness of the manuscript pertains to the discussion of the statistical significance of the experimental throughput ratio. Especially when comparing results at zero and 50 micro Tesla. The simulations seem to predict a stronger effect than seen in the experiments. The authors do not address this discrepancy. 

      We thank the reviewer for their positive assessment and the detailed constructive remarks. 

      The increase in bacterial throughput between 0 and 50 µT is indeed more pronounced in the simulations than in the experiments, partly because there is considerably more variability in the experimental data. We did two things to address this issue: (1) We performed an additional statistical test addressing the difference between the experimental results at 0 and 50 µT. Indeed, the difference is only weakly significant (in contrast to the difference between either of them and 500 µT). The increase is however consistent with the observation in the absence of obstacles in the channel, where we see a monotonic increase from 0 to 500 µT (Supp. Figure S5). We have added the test results in the caption of Fig. 3. (2) To address the difference between simulations and experiments, we added a section in Methods on how we determine the throughput and a short discussion in the Results section. The key points are that the initial condition is different in simulations and experiments and that the throughput is therefore quantified differently. This difference is due to experimental limitations: we cannot track bacteria through the whole channel, and we did not want to push them into the channel with fluid flow, so as to avoid effects of flow on the results. As a consequence, bacteria continue to enter the IN region of the channel from the inlet during the experiment, while in the simulation, they all start at the beginning of the channel simultaneously. We expect this to mostly affect the case with diffusive transport (B=0).

      Reviewer #2 (Public Review): 

      Summary: 

      simulation study of magnetotactic bacteria in microfluidic channels containing sediment-mimicking obstacles. The obstacles were produced based on micro-computer tomography reconstructions of bacteria-rich sediment samples. The swimming of bacteria through these channels is found experimentally to display the highest throughput for physiological magnetic fields. Computer simulations of active Brownian particles, parameterized based on experimental trajectories are used to quantify the swimming throughput in detail. Similar behavior as in experiments is obtained, but also considerable variability between different channel geometries. Swimming at strong field is impeded by the trapping of bacteria in corners, while at weak fields the direction of motion is almost random. The trapping effect is confirmed in the experiments, as well as the escape of bacteria with reducing field strength. 

      Strengths: 

      This is a very careful and detailed study, which draws its main strength from the fruitful combination of the construction of novel microfluidic devices, their use in motility experiments, and simulations of active Brownian particles adapted to the experiment. Based on their results, the authors hypothesize that magnetotactic bacteria may have evolved to produce magnetic properties that are adapted to the geomagnetic field in order to balance movement and orientation in such crowded environments. They provide strong arguments in favor of such a hypothesis. 

      Weaknesses: 

      Some of the issues touched upon here have been studied also in other articles. It would be good to extend the list of references accordingly and discuss the relation briefly in the text. 

      We thank the reviewer for the constructive comments. We answer to the point concerning previous literature in the response to the recommendations below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Here follows a list of points the authors should address. 

      (1) Are additional experiments feasible to decrease the statistical noise present in Fig. 3c? At the very least, the authors should discuss the statistical significance of the results at 50 muT vis-a-vis 0 T. 

      See our response to Strengths/Weaknesses above

      (2) The experimental setup is not immediately clear. I think that adding a panel from Fig. S1 (or a sketch thereof) would help clarify, especially in relation to the entry zone and end zone. 

      We are not sure what you mean. Fig. 3A already contains exactly such a panel. We have however added another supplementary figure that shows an additional detailed view of the setup (Fig. S3). In addition, we revised several figures: We have replaced Fig. S1 with a better version and exchanged the schematic view of the obstacle channel in Fig. 1, removing the additional inlets that were not used in this study (also in Fig. 3A). Instead, we added a comment in Methods explaining their presence. Hopefully this makes the setup clear.

      (3) It should be also stated that there is no external flow imposed on the channel. 

      We have added such a statement in the description of the experiment (in section 2.2, Swimming of magnetotactic bacteria through sediment-mimicking obstacle channels).

      (4) Fig. 3c and Fig. 6c are seemingly showing the same quantity (or closely related ones). The authors should use the same symbol and give an explicit mathematical definition. 

      The two quantities are not exactly the same, as we cannot directly quantify the flux of bacteria through the channel in our experiments. On the one hand, we cannot track bacteria through the whole channel, on the other hand, the initial conditions are not exactly the same as in the simulations. In the simulations all bacteria start at the same time at the entrance to the channel. In the experiments, they enter from the inlet and do so at different times (pushing them in with fluid flow would be possible, but carries the risk of perturbing the results due to induced flow through the channel). We have added a new section in the Methods section that explains this difference and describes the procedure used to obtain the throughput from the experiments in detail. We have also added a corresponding comment in the Result section, where the simulations are compared with the experiments. 

      Minor issues: 

      - Figures have different styles that should be unified. For example, the panel labels sometimes have round brackets and sometimes they don't.

      See above

      - Page 6, (muCT) should have the Greek letter mu 

      Thanks, corrected.

      - Fig. 3a is not very clear; see my point 2 above. 

      See above

      Reviewer #2 (Recommendations For The Authors): 

      I have only a few comments and questions, which the authors should address: 

      (1) The observed exponential dependence of decay time on the "well" depth could be related to the exponential density distribution of active particles in a gravitational field, which has been derived previously. Might be interesting to discuss such a possible connection. 

      Thank you for the suggestion, the two cases are indeed somewhat analogous with behaviors reminiscent of thermal processes with an effective temperature. Such a description is however not generally possible (even for sedimentation, only some features are described). We plan to address in future work whether it can be made more quantitative in our case of escape from the corner traps. We have included a short discussion of the analogy in the section on trapping and escape. 

      (2) The authors should consider the following relevant references, and discuss them briefly in their manuscript:

      - Sedimentation, trapping, and rectification of dilute bacteria J Tailleur, ME Cates EPL 86, 60002 (2009) 

      - Human spermatozoa migration in microchannels reveals boundary-following navigation P Denissenko, V Kantsler, DJ Smith, J Kirkman-Brown Proc. Natl. Acad. Sci. USA 109, 8007-8010 (2012) 

      - Wall accumulation of self-propelled spheres J Elgeti, G Gompper Europhysics Letters 101, 48003 (2013) 

      - Wall entrapment of peritrichous bacteria: a mesoscale hydrodynamics simulation study SM Mousavi, G Gompper, RG Winkler Soft Matter 16 (20), 4866-4875 (2020)

      - A Geometric Criterion for the Optimal Spreading of Active Polymers in Porous Media C Kurzthaler, S Mandal, T Bhattacharjee, H Löwen, SS Datta, HA Stone Nat. Commun. 12, 7088 (2021)

      - Run-to-Tumble Variability Controls the Surface Residence Times of E. coli Bacteria G Junot, T Darnige, A Lindner, VA Martinez, J Arlt, A Dawson, WCK Poon, H Auradou, E Clement Phys. Rev. Lett. 128, 248101 (2022)

      - Dynamics and phase separation of active Brownian particles on curved surfaces and in porous media P Iyer, RG Winkler, DA Fedosov, G Gompper Phys. Rev. Research 5, 033054 (2023) 

      We agree that there is a lot of literature on these aspects, specifically interaction of self-propelled objects with walls and motion of swimmers through porous media. We have slightly extended our overview of previous literature in the introduction and included most of these references.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1: 

      (1) Their results with human macrophages suggest that there are differences between murine and human macrophages in inflammasome-mediated restriction of STm growth. For example, Thurston et al. showed that in murine macrophages inflammasome activation controls the replication of mutant STm that aberrantly invades the cytosol, but only slightly limits replication of WT STm. In contrast, here the authors found that primed human macrophages rely on caspase-1, gasdermin D and ninjurin-1 to restrict WT STm. I wonder if the priming of the human macrophages in this study could account for the differences in these studies. Along those lines, do the authors see the same results presented in this study in the absence of priming the macrophages with Pam3CSK4? I think that determining whether the control of intracellular STm replication is dependent on priming is very important.

      We thank the Reviewer for their careful attention to our manuscript and for their thoughtful comments. We have addressed this question about the impact of priming by repeating the bacterial intracellular burden assays in unprimed WT and CASP1-/- THP-1 cells. We have added additional figures to the manuscript to address this: Figure 1 – Figure Supplement 3. Under unprimed conditions, CASP1-/- cells still harbored significantly higher bacterial burdens at 6 hpi and a significant fold-increase in bacterial CFUs compared to WT cells. These results suggest that the caspase-1-mediated restriction of intracellular Salmonella replication in human macrophages is independent of priming. 

      (2) Another difference with the Thurston et al. paper is the way that the STm inoculum was prepared - stationary phase bacteria that were opsonized. Could this also account for differences between the two studies rather than differences between murine and human macrophages in inflammasome-dependent control of STm?

      We thank the Reviewer for this excellent suggestion. To address this possibility, we repeated the bacterial intracellular burden assays in WT and CASP1-/- THP-1 cells using stationary phase bacteria. We infected WT and CASP1-/- THP-1 cells with stationary phase Salmonella, and we subsequently assayed for intracellular bacterial burdens. These data have now been added to the manuscript in Figure 1 – Figure Supplement 4. Interestingly, we did not observe any fold-change in the bacterial colony forming units in both the WT and CASP1-/- THP-1 cells for the stationary phase Salmonella. These data indicate that by 6 hours postinfection, Salmonella do not replicate efficiently in human macrophages unless grown under SPI-1-inducing conditions. Furthermore, these results suggest that differences in how the Salmonella inoculum is prepared may contribute to the discrepancies between our study and previous studies, as noted by the Reviewer. 

      (3) The authors show that the pore-forming proteins GSDMD and Ninj1 contribute to control of STm replication in human macrophages. Is it possible that leakage of gentamicin from the media contributes to this control?

      Response: We thank the Reviewer for their insightful comment. We have addressed this question on the impact of gentamicin by repeating the bacterial intracellular burden assays using a lower concentration of gentamicin in combination with extensively washing the cells with RPMI media to remove the gentamicin. WT and CASP1-/- THP-1 cells were infected with WT Salmonella. Then, at 30 minutes post-infection, cells were treated with 25 μg/ml of gentamicin to kill any extracellular bacteria. At 1 hour post-infection (hpi), the cells were washed a total of five times with fresh RPMI to remove the gentamicin, and then the media was replaced with fresh media containing no gentamicin. In parallel, we also treated cells with 100 μg/ml of gentamicin at 30 minutes post-infection, washed the cells five times with fresh RPMI at 1 hpi to remove the gentamicin, and then replaced the media with fresh media containing 10 μg/ml of gentamicin. These data have now been included in the manuscript as Figure 1 – Figure Supplement 5. We observed similar levels in the intracellular bacterial burdens at 1 hpi and 6 hpi and a fold-increase in bacterial colony forming units in CASP1-/- cells compared to WT cells across both gentamicin conditions, suggesting that gentamicin does not contribute to the intracellular control of Salmonella replication in human macrophages. Of note, we also tried repeating the bacterial intracellular burden assays without gentamicin, using only washes to remove extracellular bacteria at 1 hpi; however, under these experimental conditions, we observed high levels of extracellular Salmonella. Therefore, we relied on using a lower concentration of gentamicin to kill extracellular Salmonella in conjunction with extensive washing to remove the gentamicin for the remainder of the infection.

      (4) One major question that remains to be answered is whether casp-1 plays a direct role in the intracellular localization of STm. If the authors quantify the percentage of vacuolar vs. cytosolic bacteria at early time points in WT and casp-1 KO macrophages, would that be the same in the presence and absence of casp-1? If so, then this would suggest that there is a basal level of bacterial-dependent lysis of the SCV and in WT macrophages the presence of cytosolic PAMPS trigger cell death and bacteria can't replicate in the cytosol. However, in the inflammasome KO macrophages, the host cell remains alive and bacteria can replicate in the cytosol.

      We thank this Reviewer for raising this important point. We have addressed this experimentally by quantifying the percentage of vacuolar vs. cytosolic Salmonella at 2 hpi in WT, NAIP-/-, and CASP1-/- THP-1 cells using a chloroquine (CHQ) resistance assay. These data have now been included in the manuscript in the new Figure 5A. The original subfigures of Figure 5 have consequently been rearranged. We did not observe any significant differences in vacuolar and cytosolic bacterial burdens at this early time point in WT, NAIP-/-, and CASP1-/- THP-1 cells. As noted by the Reviewer, these results suggest that the basal level of bacterial-dependent lysis of the SCV in human macrophages is not dependent on caspase-1 or NAIP.

      Reviewer #3: 

      (1) The main weaknesses of the study are the inherent limitations of tissue culture models. For example, to study interaction of Salmonella with host cells in vitro, it is necessary to kill extracellular bacteria using gentamicin. However, since Salmonella-induced macrophage cell death damages the cytosolic membrane, gentamicin can reach intracellular bacteria and contribute to changes in CFU observed in tissue culture models (major point 1). This can result in tissue culture "artefacts" (i.e., observations/conclusions that cannot be recapitulated in vivo). For example, intracellular replication of Salmonella in murine macrophages requires T3SS-2 in vitro, but T3SS-2 is dispensable for replication in macrophages of the spleen in vivo (Grant et al., 2012).  

      We thank the Reviewer for their helpful comments and insightful suggestions. We have addressed some of the concerns about gentamicin in our response to Reviewer #1 above. To address the Reviewer’s concerns further, we have included language to acknowledge the limitations of our study based on the artefacts of tissue culture models in our Discussion section: “In this study, we utilized tissue culture models to examine intracellular Salmonella replication in human macrophages. These in vitro systems allow for precise control of experimental conditions and, therefore, serve as powerful tools to interrogate the molecular mechanisms underlying inflammasome responses and Salmonella replication in both immortalized and primary human cells. Still, there are limitations of tissue culture models, as they lack the inherent complexity of tissues and organs in vivo. To assess whether our findings reflect Salmonella dynamics in the mammalian host, it will be important to complement our studies and extend the implications of our work using approaches that model more complex systems, such as organoids or organ explant models co-cultured with immune cells, and in vivo techniques, such as humanized mouse models.”

      (2) In Figure 1: are increased CFU in WT vs CASP1-deficient THP-1 cells due to Caspase 1 restricting intracellular replication or due to Caspase-1 causing pore formation to allow gentamicin to enter the cytosol thereby restricting bacterial replication? The same question arises about Caspase-4 in Figure 2, where differences in CFU are observed only at 24h when differences in cell death also become apparent. The idea that gentamicin entering the cytosol through pores is responsible for controlling intracellular Salmonella replication is also consistent with the finding that GSDMD-mediated pore formation is required for restricting intracellular Salmonella replication (Figure 3). Similarly, the finding that inflammasome responses primarily control Salmonella replication in the cytosol could be explained by an intact SCV membrane protecting Salmonella from gentamicin (Figure 5). 

      We thank the Reviewer for highlighting this important point regarding gentamicin.

      We have addressed this question in our response above to Reviewer #1 and in Figure 1 – Figure Supplement 5. We observed caspase-1-mediated restriction of Salmonella in human macrophages even when cells were treated with a lower concentration of gentamicin (25 μg/ml) for 30 minutes and then extensively washed with RPMI media to remove any gentamicin for the remainder of the infection. These data suggest that gentamicin is likely not responsible for controlling intracellular Salmonella in human macrophages.

    1. Who Can Name the Bigger Number? by Scott Aaronson

      In an old joke, two noblemen vie to name the bigger number. The first, after ruminating for hours, triumphantly announces "Eighty-three!" The second, mightily impressed, replies "You win." A biggest number contest is clearly pointless when the contestants take turns. But what if the contestants write down their numbers simultaneously, neither aware of the other’s? To introduce a talk on "Big Numbers," I invite two audience volunteers to try exactly this. I tell them the rules: You have fifteen seconds. Using standard math notation, English words, or both, name a single whole number—not an infinity—on a blank index card. Be precise enough for any reasonable modern mathematician to determine exactly what number you’ve named, by consulting only your card and, if necessary, the published literature. So contestants can’t say "the number of sand grains in the Sahara," because sand drifts in and out of the Sahara regularly. Nor can they say "my opponent’s number plus one," or "the biggest number anyone’s ever thought of plus one"—again, these are ill-defined, given what our reasonable mathematician has available. Within the rules, the contestant who names the bigger number wins. Are you ready? Get set. Go. The contest’s results are never quite what I’d hope. Once, a seventh-grade boy filled his card with a string of successive 9’s. Like many other big-number tyros, he sought to maximize his number by stuffing a 9 into every place value. Had he chosen easy-to-write 1’s rather than curvaceous 9’s, his number could have been millions of times bigger. He still would have been decimated, though, by the girl he was up against, who wrote a string of 9’s followed by the superscript 999. Aha! An exponential: a number multiplied by itself 999 times.
Noticing this innovation, I declared the girl’s victory without bothering to count the 9’s on the cards. And yet the girl’s number could have been much bigger still, had she stacked the mighty exponential more than once. Take 9^(9^9), for example. This behemoth, equal to 9^387,420,489, has 369,693,100 digits. By comparison, the number of elementary particles in the observable universe has a meager 85 digits, give or take. Three 9’s, when stacked exponentially, already lift us incomprehensibly beyond all the matter we can observe—by a factor of about 10^369,693,015. And we’ve said nothing of or . Place value, exponentials, stacked exponentials: each can express boundlessly big numbers, and in this sense they’re all equivalent. But the notational systems differ dramatically in the numbers they can express concisely. That’s what the fifteen-second time limit illustrates. It takes the same amount of time to write 9999, 9999, and —yet the first number is quotidian, the second astronomical, and the third hyper-mega astronomical. The key to the biggest number contest is not swift penmanship, but rather a potent paradigm for concisely capturing the gargantuan. Such paradigms are historical rarities. We find a flurry in antiquity, another flurry in the twentieth century, and nothing much in between. But when a new way to express big numbers concisely does emerge, it’s often a byproduct of a major scientific revolution: systematized mathematics, formal logic, computer science. Revolutions this momentous, as any Kuhnian could tell you, only happen under the right social conditions. Thus is the story of big numbers a story of human progress. And herein lies a parallel with another mathematical story. In his remarkable and underappreciated book A History of π, Petr Beckmann argues that the ratio of circumference to diameter is "a quaint little mirror of the history of man."
In the rare societies where science and reason found refuge—the early Athens of Anaxagoras and Hippias, the Alexandria of Eratosthenes and Euclid, the seventeenth-century England of Newton and Wallis—mathematicians made tremendous strides in calculating π. In Rome and medieval Europe, by contrast, knowledge of π stagnated. Crude approximations such as the Babylonians’ 25/8 held sway. This same pattern holds, I think, for big numbers. Curiosity and openness lead to fascination with big numbers, and to the buoyant view that no quantity, whether of the number of stars in the galaxy or the number of possible bridge hands, is too immense for the mind to enumerate. Conversely, ignorance and irrationality lead to fatalism concerning big numbers. Historian Ilan Vardi cites the ancient Greek term sand-hundred, colloquially meaning zillion; as well as a passage from Pindar’s Olympic Ode II asserting that "sand escapes counting."

But sand doesn’t escape counting, as Archimedes recognized in the third century B.C. Here’s how he began The Sand-Reckoner, a sort of pop-science article addressed to the King of Syracuse: There are some ... who think that the number of the sand is infinite in multitude ... again there are some who, without regarding it as infinite, yet think that no number has been named which is great enough to exceed its multitude ... But I will try to show you [numbers that] exceed not only the number of the mass of sand equal in magnitude to the earth ... but also that of a mass equal in magnitude to the universe. This Archimedes proceeded to do, essentially by using the ancient Greek term myriad, meaning ten thousand, as a base for exponentials. Adopting a prescient cosmological model of Aristarchus, in which the "sphere of the fixed stars" is vastly greater than the sphere in which the Earth revolves around the sun, Archimedes obtained an upper bound of 10^63 on the number of sand grains needed to fill the universe.
(Supposedly 10^63 is the biggest number with a lexicographically standard American name: vigintillion. But the staid vigintillion had better keep vigil lest it be encroached upon by the more whimsically-named googol, or 10^100, and googolplex, or 10^(10^100).) Vast though it was, of course, 10^63 wasn’t to be enshrined as the all-time biggest number. Six centuries later, Diophantus developed a simpler notation for exponentials, allowing him to surpass . Then, in the Middle Ages, the rise of Arabic numerals and place value made it easy to stack exponentials higher still. But Archimedes’ paradigm for expressing big numbers wasn’t fundamentally surpassed until the twentieth century. And even today, exponentials dominate popular discussion of the immense. Consider, for example, the oft-repeated legend of the Grand Vizier in Persia who invented chess. The King, so the legend goes, was delighted with the new game, and invited the Vizier to name his own reward. The Vizier replied that, being a modest man, he desired only one grain of wheat on the first square of a chessboard, two grains on the second, four on the third, and so on, with twice as many grains on each square as on the last. The innumerate King agreed, not realizing that the total number of grains on all 64 squares would be 2^64-1, or 18.4 quintillion—equivalent to the world’s present wheat production for 150 years. Fittingly, this same exponential growth is what makes chess itself so difficult. There are only about 35 legal choices for each chess move, but the choices multiply exponentially to yield something like 10^50 possible board positions—too many for even a computer to search exhaustively. That’s why it took until 1997 for a computer, Deep Blue, to defeat the human world chess champion. And in Go, which has a 19-by-19 board and over 10^150 possible positions, even an amateur human can still rout the world’s top-ranked computer programs. Exponential growth plagues computers in other guises as well.
The traveling salesman problem asks for the shortest route connecting a set of cities, given the distances between each pair of cities. The rub is that the number of possible routes grows exponentially with the number of cities. When there are, say, a hundred cities, there are about 10^158 possible routes, and, although various shortcuts are possible, no known computer algorithm is fundamentally better than checking each route one by one. The traveling salesman problem belongs to a class called NP-complete, which includes hundreds of other problems of practical interest. (NP stands for the technical term ‘Nondeterministic Polynomial-Time.’) It’s known that if there’s an efficient algorithm for any NP-complete problem, then there are efficient algorithms for all of them. Here ‘efficient’ means using an amount of time proportional to at most the problem size raised to some fixed power—for example, the number of cities cubed. It’s conjectured, however, that no efficient algorithm for NP-complete problems exists. Proving this conjecture, called P ≠ NP, has been a great unsolved problem of computer science for thirty years. Although computers will probably never solve NP-complete problems efficiently, there’s more hope for another grail of computer science: replicating human intelligence. The human brain has roughly a hundred billion neurons linked by a hundred trillion synapses. And though the function of an individual neuron is only partially understood, it’s thought that each neuron fires electrical impulses according to relatively simple rules up to a thousand times each second. So what we have is a highly interconnected computer capable of maybe 10^14 operations per second; by comparison, the world’s fastest parallel supercomputer, the 9200-Pentium Pro teraflops machine at Sandia National Labs, can perform 10^12 operations per second. Contrary to popular belief, gray mush is not only hard-wired for intelligence: it surpasses silicon even in raw computational power.
But this is unlikely to remain true for long. The reason is Moore’s Law, which, in its 1990’s formulation, states that the amount of information storable on a silicon chip grows exponentially, doubling roughly once every two years. Moore’s Law will eventually play out, as microchip components reach the atomic scale and conventional lithography falters. But radical new technologies, such as optical computers, DNA computers, or even quantum computers, could conceivably usurp silicon’s place. Exponential growth in computing power can’t continue forever, but it may continue long enough for computers—at least in processing power—to surpass human brains. To prognosticators of artificial intelligence, Moore’s Law is a glorious herald of exponential growth. But exponentials have a drearier side as well. The human population recently passed six billion and is doubling about once every forty years. At this exponential rate, if an average person weighs seventy kilograms, then by the year 3750 the entire Earth will be composed of human flesh. But before you invest in deodorant, realize that the population will stop increasing long before this—either because of famine, epidemic disease, global warming, mass species extinctions, unbreathable air, or, entering the speculative realm, birth control. It’s not hard to fathom why physicist Albert Bartlett asserted "the greatest shortcoming of the human race" to be "our inability to understand the exponential function." Or why Carl Sagan advised us to "never underestimate an exponential." In his book Billions & Billions, Sagan gave some other depressing consequences of exponential growth. At an inflation rate of five percent a year, a dollar is worth only thirty-seven cents after twenty years. If a uranium nucleus emits two neutrons, both of which collide with other uranium nuclei, causing them to emit two neutrons, and so forth—well, did I mention nuclear holocaust as a possible end to population growth? 
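Several of the figures quoted in this section can be checked with exact integer arithmetic. The short Python sketch below is an illustration only and not part of the essay: the function names are invented for this example, and the route count assumes that the figure quoted for a hundred cities refers to the 100! possible orderings of the cities.

```python
import math


def chessboard_grains(squares: int = 64) -> int:
    """Total grains when square k (counting from 0) holds 2**k grains."""
    return sum(2 ** k for k in range(squares))


def dollar_after_inflation(rate: float, years: int) -> float:
    """Purchasing power of one dollar after `years` of compounding inflation."""
    return 1.0 / (1.0 + rate) ** years


def digit_count(n: int) -> int:
    """Number of decimal digits of a positive integer."""
    return len(str(n))


if __name__ == "__main__":
    # Vizier's reward: equals 2**64 - 1, roughly 1.8 * 10**19 grains.
    print(chessboard_grains())
    # Sagan's example: five percent inflation over twenty years.
    print(dollar_after_inflation(0.05, 20))
    # Orderings of a hundred cities (assumed reading of the route count).
    print(digit_count(math.factorial(100)))
```

The doubling sum is written out term by term rather than as `2**64 - 1` so that the closed form can be compared against it.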
Exponentials are familiar, relevant, intimately connected to the physical world and to human hopes and fears. Using the notational systems I’ll discuss next, we can concisely name numbers that make exponentials picayune by comparison, that subjectively speaking exceed as much as the latter exceeds 9. But these new systems may seem more abstruse than exponentials. In his essay "On Number Numbness," Douglas Hofstadter leads his readers to the precipice of these systems, but then avers: If we were to continue our discussion just one zillisecond longer, we would find ourselves smack-dab in the middle of the theory of recursive functions and algorithmic complexity, and that would be too abstract. So let’s drop the topic right here. But to drop the topic is to forfeit, not only the biggest number contest, but any hope of understanding how stronger paradigms lead to vaster numbers. And so we arrive in the early twentieth century, when a school of mathematicians called the formalists sought to place all of mathematics on a rigorous axiomatic basis. A key question for the formalists was what the word ‘computable’ means. That is, how do we tell whether a sequence of numbers can be listed by a definite, mechanical procedure? Some mathematicians thought that ‘computable’ coincided with a technical notion called ‘primitive recursive.’ But in 1928 Wilhelm Ackermann disproved them by constructing a sequence of numbers that’s clearly computable, yet grows too quickly to be primitive recursive. Ackermann’s idea was to create an endless procession of arithmetic operations, each more powerful than the last. First comes addition. Second comes multiplication, which we can think of as repeated addition: for example, 5×3 means 5 added to itself 3 times, or 5+5+5 = 15. Third comes exponentiation, which we can think of as repeated multiplication. Fourth comes ... what? Well, we have to invent a weird new operation, for repeated exponentiation.
The mathematician Rudy Rucker calls it ‘tetration.’ For example, ‘5 tetrated to the 3’ means 5 raised to its own power 3 times, or 5^(5^5), a number with 2,185 digits. We can go on. Fifth comes repeated tetration: shall we call it ‘pentation’? Sixth comes repeated pentation: ‘hexation’? The operations continue infinitely, with each one standing on its predecessor to peer even higher into the firmament of big numbers. If each operation were a candy flavor, then the Ackermann sequence would be the sampler pack, mixing one number of each flavor. First in the sequence is 1+1, or (don’t hold your breath) 2. Second is 2×2, or 4. Third is 3 raised to the 3rd power, or 27. Hey, these numbers aren’t so big! Fee. Fi. Fo. Fum. Fourth is 4 tetrated to the 4, or 4^(4^(4^4)), which has 10^154 digits. If you’re planning to write this number out, better start now. Fifth is 5 pentated to the 5, or a tower 5^5^...^5 with ‘5 pentated to the 4’ numerals in the stack. This number is too colossal to describe in any ordinary terms. And the numbers just get bigger from there. Wielding the Ackermann sequence, we can clobber unschooled opponents in the biggest-number contest. But we need to be careful, since there are several definitions of the Ackermann sequence, not all identical. Under the fifteen-second time limit, here’s what I might write to avoid ambiguity: A(111)—Ackermann seq—A(1)=1+1, A(2)=2×2, A(3)=3^3, etc Recondite as it seems, the Ackermann sequence does have some applications. A problem in an area called Ramsey theory asks for the minimum dimension of a hypercube satisfying a certain property. The true dimension is thought to be 6, but the lowest dimension anyone’s been able to prove is so huge that it can only be expressed using the same ‘weird arithmetic’ that underlies the Ackermann sequence. Indeed, the Guinness Book of World Records once listed this dimension as the biggest number ever used in a mathematical proof. 
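The procession of operations described above can be sketched in a few lines. This is a minimal illustration, not a canonical definition: the names `hyper` and `ackermann_term` are hypothetical, level 1 is taken to be addition, and the n-th term of the sequence is taken to be n combined with itself under the n-th operation, as in the essay. Only the first three terms are small enough to evaluate; the fourth is already far out of reach.

```python
def hyper(level, a, b):
    """Apply the level-th operation to a and b.

    level 1 = addition, 2 = multiplication, 3 = exponentiation,
    4 = tetration, 5 = pentation, and so on.
    """
    if level == 1:
        return a + b
    if level == 2:
        return a * b
    if level == 3:
        return a ** b
    # Level 4 and above: repeat the previous operation, working from the
    # top of the "tower" downward, so that 5 tetrated to the 3 is 5^(5^5).
    result = a
    for _ in range(b - 1):
        result = hyper(level - 1, a, result)
    return result


def ackermann_term(n):
    """n-th term of the sequence as the essay defines it: n (op n) n."""
    return hyper(n, n, n)


if __name__ == "__main__":
    print([ackermann_term(n) for n in (1, 2, 3)])
    print(len(str(hyper(4, 5, 3))))  # digit count of 5 tetrated to the 3
```

Calling `ackermann_term(4)` is not advisable: the result has roughly 10^154 digits, so no machine could store it.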
(Another contender for the title once was Skewes’ number, about 10^(10^(10^34)), which arises in the study of how prime numbers are distributed. The famous mathematician G. H. Hardy quipped that Skewes’ was "the largest number which has ever served any definite purpose in mathematics.") What’s more, Ackermann’s briskly-rising cavalcade performs an occasional cameo in computer science. For example, in the analysis of a data structure called ‘Union-Find,’ a term gets multiplied by the inverse of the Ackermann sequence—meaning, for each whole number X, the first number N such that the Nth Ackermann number is bigger than X. The inverse grows as slowly as Ackermann’s original sequence grows quickly; for all practical purposes, the inverse is at most 4. ¨ Ackermann numbers are pretty big, but they’re not yet big enough. The quest for still bigger numbers takes us back to the formalists. After Ackermann demonstrated that ‘primitive recursive’ isn’t what we mean by ‘computable,’ the question still stood: what do we mean by ‘computable’? In 1936, Alonzo Church and Alan Turing independently answered this question. While Church answered using a logical formalism called the lambda calculus, Turing answered using an idealized computing machine—the Turing machine—that, in essence, is equivalent to every Compaq, Dell, Macintosh, and Cray in the modern world. Turing’s paper describing his machine, "On Computable Numbers," is rightly celebrated as the founding document of computer science. "Computing," said Turing, is normally done by writing certain symbols on paper. We may suppose this paper to be divided into squares like a child’s arithmetic book. In elementary arithmetic the 2-dimensional character of the paper is sometimes used. But such use is always avoidable, and I think it will be agreed that the two-dimensional character of paper is no essential of computation. I assume then that the computation is carried out on one-dimensional paper, on a tape divided into squares. 
Turing continued to explicate his machine using ingenious reasoning from first principles. The tape, said Turing, extends infinitely in both directions, since a theoretical machine ought not be constrained by physical limits on resources. Furthermore, there’s a symbol written on each square of the tape, like the ‘1’s and ‘0’s in a modern computer’s memory. But how are the symbols manipulated? Well, there’s a ‘tape head’ moving back and forth along the tape, examining one square at a time, writing and erasing symbols according to definite rules. The rules are the tape head’s program: change them, and you change what the tape head does. Turing’s august insight was that we can program the tape head to carry out any computation. Turing machines can add, multiply, extract cube roots, sort, search, spell-check, parse, play Tic-Tac-Toe, list the Ackermann sequence. If we represented keyboard input, monitor output, and so forth as symbols on the tape, we could even run Windows on a Turing machine. But there’s a problem. Set a tape head loose on a sequence of symbols, and it might stop eventually, or it might run forever—like the fabled programmer who gets stuck in the shower because the instructions on the shampoo bottle read "lather, rinse, repeat." If the machine’s going to run forever, it’d be nice to know this in advance, so that we don’t spend an eternity waiting for it to finish. But how can we determine, in a finite amount of time, whether something will go on endlessly? If you bet a friend that your watch will never stop ticking, when could you declare victory? But maybe there’s some ingenious program that can examine other programs and tell us, infallibly, whether they’ll ever stop running. We just haven’t thought of it yet. Nope. Turing proved that this problem, called the Halting Problem, is unsolvable by Turing machines. The proof is a beautiful example of self-reference. 
It formalizes an old argument about why you can never have perfect introspection: because if you could, then you could determine what you were going to do ten seconds from now, and then do something else. Turing imagined that there was a special machine that could solve the Halting Problem. Then he showed how we could have this machine analyze itself, in such a way that it has to halt if it runs forever, and run forever if it halts. Like a hound that finally catches its tail and devours itself, the mythical machine vanishes in a fury of contradiction. (That’s the sort of thing you don’t say in a research paper.) ¨ "Very nice," you say (or perhaps you say, "not nice at all"). "But what does all this have to do with big numbers?" Aha! The connection wasn’t published until May of 1962. Then, in the Bell System Technical Journal, nestled between pragmatically-minded papers on "Multiport Structures" and "Waveguide Pressure Seals," appeared the modestly titled "On Non-Computable Functions" by Tibor Rado. In this paper, Rado introduced the biggest numbers anyone had ever imagined. His idea was simple. Just as we can classify words by how many letters they contain, we can classify Turing machines by how many rules they have in the tape head. Some machines have only one rule, others have two rules, still others have three rules, and so on. But for each fixed whole number N, just as there are only finitely many distinct words with N letters, so too are there only finitely many distinct machines with N rules. Among these machines, some halt and others run forever when started on a blank tape. Of the ones that halt, asked Rado, what’s the maximum number of steps that any machine takes before it halts? (Actually, Rado asked mainly about the maximum number of symbols any machine can write on the tape before halting. But the maximum number of steps, which Rado called S(n), has the same basic properties and is easier to reason about.) 
Rado called this maximum the Nth "Busy Beaver" number. (Ah yes, the early 1960’s were a more innocent age.) He visualized each Turing machine as a beaver bustling busily along the tape, writing and erasing symbols. The challenge, then, is to find the busiest beaver with exactly N rules, albeit not an infinitely busy one. We can interpret this challenge as one of finding the "most complicated" computer program N bits long: the one that does the most amount of stuff, but not an infinite amount. Now, suppose we knew the Nth Busy Beaver number, which we’ll call BB(N). Then we could decide whether any Turing machine with N rules halts on a blank tape. We’d just have to run the machine: if it halts, fine; but if it doesn’t halt within BB(N) steps, then we know it never will halt, since BB(N) is the maximum number of steps it could make before halting. Similarly, if you knew that all mortals died before age 200, then if Sally lived to be 200, you could conclude that Sally was immortal. So no Turing machine can list the Busy Beaver numbers—for if it could, it could solve the Halting Problem, which we already know is impossible. But here’s a curious fact. Suppose we could name a number greater than the Nth Busy Beaver number BB(N). Call this number D for dam, since like a beaver dam, it’s a roof for the Busy Beaver below. With D in hand, computing BB(N) itself becomes easy: we just need to simulate all the Turing machines with N rules. The ones that haven’t halted within D steps—the ones that bash through the dam’s roof—never will halt. So we can list exactly which machines halt, and among these, the maximum number of steps that any machine takes before it halts is BB(N). Conclusion? The sequence of Busy Beaver numbers, BB(1), BB(2), and so on, grows faster than any computable sequence. Faster than exponentials, stacked exponentials, the Ackermann sequence, you name it. 
Because if a Turing machine could compute a sequence that grows faster than Busy Beaver, then it could use that sequence to obtain the D‘s—the beaver dams. And with those D’s, it could list the Busy Beaver numbers, which (sound familiar?) we already know is impossible. The Busy Beaver sequence is non-computable, solely because it grows stupendously fast—too fast for any computer to keep up with it, even in principle. This means that no computer program could list all the Busy Beavers one by one. It doesn’t mean that specific Busy Beavers need remain eternally unknowable. And in fact, pinning them down has been a computer science pastime ever since Rado published his article. It’s easy to verify that BB(1), the first Busy Beaver number, is 1. That’s because if a one-rule Turing machine doesn’t halt after the very first step, it’ll just keep moving along the tape endlessly. There’s no room for any more complex behavior. With two rules we can do more, and a little grunt work will ascertain that BB(2) is 6. Six steps. What about the third Busy Beaver? In 1965 Rado, together with Shen Lin, proved that BB(3) is 21. The task was an arduous one, requiring human analysis of many machines to prove that they don’t halt—since, remember, there’s no algorithm for listing the Busy Beaver numbers. Next, in 1983, Allan Brady proved that BB(4) is 107. Unimpressed so far? Well, as with the Ackermann sequence, don’t be fooled by the first few numbers. In 1984, A.K. Dewdney devoted a Scientific American column to Busy Beavers, which inspired amateur mathematician George Uhing to build a special-purpose device for simulating Turing machines. The device, which cost Uhing less than $100, found a five-rule machine that runs for 2,133,492 steps before halting—establishing that BB(5) must be at least as high. Then, in 1989, Heiner Marxen and Jürgen Buntrock discovered that BB(5) is at least 47,176,870. 
To this day, BB(5) hasn’t been pinned down precisely, and it could turn out to be much higher still. As for BB(6), Marxen and Buntrock set another record in 1997 by proving that it’s at least 8,690,333,381,690,951. A formidable accomplishment, yet Marxen, Buntrock, and the other Busy Beaver hunters are merely wading along the shores of the unknowable. Humanity may never know the value of BB(6) for certain, let alone that of BB(7) or any higher number in the sequence. Indeed, already the top five and six-rule contenders elude us: we can’t explain how they ‘work’ in human terms. If creativity imbues their design, it’s not because humans put it there. One way to understand this is that even small Turing machines can encode profound mathematical problems. Take Goldbach’s conjecture, that every even number 4 or higher is a sum of two prime numbers: 10=7+3, 18=13+5. The conjecture has resisted proof since 1742. Yet we could design a Turing machine with, oh, let’s say 100 rules, that tests each even number to see whether it’s a sum of two primes, and halts when and if it finds a counterexample to the conjecture. Then knowing BB(100), we could in principle run this machine for BB(100) steps, decide whether it halts, and thereby resolve Goldbach’s conjecture. We need not venture far in the sequence to enter the lair of basilisks. But as Rado stressed, even if we can’t list the Busy Beaver numbers, they’re perfectly well-defined mathematically. If you ever challenge a friend to the biggest number contest, I suggest you write something like this: BB(11111)—Busy Beaver shift #—1, 6, 21, etc If your friend doesn’t know about Turing machines or anything similar, but only about, say, Ackermann numbers, then you’ll win the contest. You’ll still win even if you grant your friend a handicap, and allow him the entire lifetime of the universe to write his number. The key to the biggest number contest is a potent paradigm, and Turing’s theory of computation is potent indeed. 
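The "dam" argument can be tried out directly on the smallest cases. The sketch below makes several assumptions: a "rule" is modeled as one machine state with two symbols (0 and 1) on the tape, the halting transition counts as a step, and the step cap `dam` stands in for a number D known to exceed BB(N). All names are hypothetical. With those conventions, simulating every machine up to the cap and taking the longest halting run should reproduce the values 1 and 6 quoted above.

```python
from itertools import product

HALT = -1  # marker for the halting transition


def run(machine, max_steps):
    """Run a machine on a blank tape; return steps taken, or None if it hits the cap."""
    tape = {}  # sparse tape, blank squares read as 0
    pos, state = 0, 0
    for step in range(1, max_steps + 1):
        write, move, nxt = machine[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += move
        if nxt == HALT:
            return step  # the halting transition is counted as a step
        state = nxt
    return None  # did not halt within the cap


def busy_beaver_steps(n_states, dam):
    """Longest halting run among all n_states machines, given a cap dam > BB(n_states)."""
    keys = [(s, b) for s in range(n_states) for b in (0, 1)]
    # Each transition chooses: symbol to write, direction to move, next state (or HALT).
    actions = list(product((0, 1), (-1, 1), list(range(n_states)) + [HALT]))
    best = 0
    for choice in product(actions, repeat=len(keys)):
        steps = run(dict(zip(keys, choice)), dam)
        if steps is not None:
            best = max(best, steps)
    return best


# A well-known two-state machine that runs for six steps before halting.
CHAMPION_2 = {
    (0, 0): (1, 1, 1),
    (0, 1): (1, -1, 1),
    (1, 0): (1, -1, 0),
    (1, 1): (1, 1, HALT),
}

if __name__ == "__main__":
    print(run(CHAMPION_2, 100), busy_beaver_steps(1, 100), busy_beaver_steps(2, 100))
```

The catch is the one the essay describes: the cap of 100 works here only because BB(2) is already known to be smaller. For larger N nobody can supply such a cap by computation alone, which is exactly why the sequence is non-computable.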
¨ But what if your friend knows about Turing machines as well? Is there a notational system for big numbers more powerful than even Busy Beavers? Suppose we could endow a Turing machine with a magical ability to solve the Halting Problem. What would we get? We’d get a ‘super Turing machine’: one with abilities beyond those of any ordinary machine. But now, how hard is it to decide whether a super machine halts? Hmm. It turns out that not even super machines can solve this ‘super Halting Problem’, for the same reason that ordinary machines can’t solve the ordinary Halting Problem. To solve the Halting Problem for super machines, we’d need an even more powerful machine: a ‘super duper machine.’ And to solve the Halting Problem for super duper machines, we’d need a ‘super duper pooper machine.’ And so on endlessly. This infinite hierarchy of ever more powerful machines was formalized by the logician Stephen Kleene in 1943 (although he didn’t use the term ‘super duper pooper’). Imagine a novel, which is imbedded in a longer novel, which itself is imbedded in an even longer novel, and so on ad infinitum. Within each novel, the characters can debate the literary merits of any of the sub-novels. But, by analogy with classes of machines that can’t analyze themselves, the characters can never critique the novel that they themselves are in. (This, I think, jibes with our ordinary experience of novels.) To fully understand some reality, we need to go outside of that reality. This is the essence of Kleene’s hierarchy: that to solve the Halting Problem for some class of machines, we need a yet more powerful class of machines. And there’s no escape. Suppose a Turing machine had a magical ability to solve the Halting Problem, and the super Halting Problem, and the super duper Halting Problem, and the super duper pooper Halting Problem, and so on endlessly. Surely this would be the Queen of Turing machines? Not quite. 
As soon as we want to decide whether a ‘Queen of Turing machines’ halts, we need a still more powerful machine: an ‘Empress of Turing machines.’ And Kleene’s hierarchy continues. But how’s this relevant to big numbers? Well, each level of Kleene’s hierarchy generates a faster-growing Busy Beaver sequence than do all the previous levels. Indeed, each level’s sequence grows so rapidly that it can only be computed by a higher level. For example, define BB2(N) to be the maximum number of steps a super machine with N rules can make before halting. If this super Busy Beaver sequence were computable by super machines, then those machines could solve the super Halting Problem, which we know is impossible. So the super Busy Beaver numbers grow too rapidly to be computed, even if we could compute the ordinary Busy Beaver numbers. You might think that now, in the biggest-number contest, you could obliterate even an opponent who uses the Busy Beaver sequence by writing something like this: BB2(11111). But not quite. The problem is that I’ve never seen these "higher-level Busy Beavers" defined anywhere, probably because, to people who know computability theory, they’re a fairly obvious extension of the ordinary Busy Beaver numbers. So our reasonable modern mathematician wouldn’t know what number you were naming. If you want to use higher-level Busy Beavers in the biggest number contest, here’s what I suggest. First, publish a paper formalizing the concept in some obscure, low-prestige journal. Then, during the contest, cite the paper on your index card. To exceed higher-level Busy Beavers, we’d presumably need some new computational model surpassing even Turing machines. I can’t imagine what such a model would look like. Yet somehow I doubt that the story of notational systems for big numbers is over. Perhaps someday humans will be able concisely to name numbers that make Busy Beaver 100 seem as puerile and amusingly small as our nobleman’s eighty-three. 
Or if we’ll never name such numbers, perhaps other civilizations will. Is a biggest number contest afoot throughout the galaxy? ¨ You might wonder why we can’t transcend the whole parade of paradigms, and name numbers by a system that encompasses and surpasses them all. Suppose you wrote the following in the biggest number contest: The biggest whole number nameable with 1,000 characters of English text Surely this number exists. Using 1,000 characters, we can name only finitely many numbers, and among these numbers there has to be a biggest. And yet we’ve made no reference to how the number’s named. The English text could invoke Ackermann numbers, or Busy Beavers, or higher-level Busy Beavers, or even some yet more sweeping concept that nobody’s thought of yet. So unless our opponent uses the same ploy, we’ve got him licked. What a brilliant idea! Why didn’t we think of this earlier? Unfortunately it doesn’t work. We might as well have written One plus the biggest whole number nameable with 1,000 characters of English text This number takes at least 1,001 characters to name. Yet we’ve just named it with only 80 characters! Like a snake that swallows itself whole, our colossal number dissolves in a tumult of contradiction. What gives? The paradox I’ve just described was first published by Bertrand Russell, who attributed it to a librarian named G. G. Berry. The Berry Paradox arises not from mathematics, but from the ambiguity inherent in the English language. There’s no surefire way to convert an English phrase into the number it names (or to decide whether it names a number at all), which is why I invoked a "reasonable modern mathematician" in the rules for the biggest number contest. To circumvent the Berry Paradox, we need to name numbers using a precise, mathematical notational system, such as Turing machines—which is exactly the idea behind the Busy Beaver sequence. 
So in short, there’s no wily language trick by which to surpass Archimedes, Ackermann, Turing, and Rado, no royal road to big numbers. You might also wonder why we can’t use infinity in the contest. The answer is, for the same reason why we can’t use a rocket car in a bike race. Infinity is fascinating and elegant, but it’s not a whole number. Nor can we ‘subtract from infinity’ to yield a whole number. Infinity minus 17 is still infinity, whereas infinity minus infinity is undefined: it could be 0, 38, or even infinity again. Actually I should speak of infinities, plural. For in the late nineteenth century, Georg Cantor proved that there are different levels of infinity: for example, the infinity of points on a line is greater than the infinity of whole numbers. What’s more, just as there’s no biggest number, so too is there no biggest infinity. But the quest for big infinities is more abstruse than the quest for big numbers. And it involves, not a succession of paradigms, but essentially one: Cantor’s. ¨ So here we are, at the frontier of big number knowledge. As Euclid’s disciple supposedly asked, "what is the use of all this?" We’ve seen that progress in notational systems for big numbers mirrors progress in broader realms: mathematics, logic, computer science. And yet, though a mirror reflects reality, it doesn’t necessarily influence it. Even within mathematics, big numbers are often considered trivialities, their study an idle amusement with no broader implications. I want to argue a contrary view: that understanding big numbers is a key to understanding the world. Imagine trying to explain the Turing machine to Archimedes. The genius of Syracuse listens patiently as you discuss the papyrus tape extending infinitely in both directions, the time steps, states, input and output sequences. At last he explodes. "Foolishness!" he declares (or the ancient Greek equivalent). "All you’ve given me is an elaborate definition, with no value outside of itself." 
How do you respond? Archimedes has never heard of computers, those cantankerous devices that, twenty-three centuries from his time, will transact the world’s affairs. So you can’t claim practical application. Nor can you appeal to Hilbert and the formalist program, since Archimedes hasn’t heard of those either. But then it hits you: the Busy Beaver sequence. You define the sequence for Archimedes, convince him that BB(1000) is more than his 10^63 grains of sand filling the universe, more even than 10^63 raised to its own power 10^63 times. You defy him to name a bigger number without invoking Turing machines or some equivalent. And as he ponders this challenge, the power of the Turing machine concept dawns on him. Though his intuition may never apprehend the Busy Beaver numbers, his reason compels him to acknowledge their immensity. Big numbers have a way of imbuing abstract notions with reality. Indeed, one could define science as reason’s attempt to compensate for our inability to perceive big numbers. If we could run at 280,000,000 meters per second, there’d be no need for a special theory of relativity: it’d be obvious to everyone that the faster we go, the heavier and squatter we get, and the faster time elapses in the rest of the world. If we could live for 70,000,000 years, there’d be no theory of evolution, and certainly no creationism: we could watch speciation and adaptation with our eyes, instead of painstakingly reconstructing events from fossils and DNA. If we could bake bread at 20,000,000 degrees Kelvin, nuclear fusion would be not the esoteric domain of physicists but ordinary household knowledge. But we can’t do any of these things, and so we have science, to deduce about the gargantuan what we, with our infinitesimal faculties, will never sense. If people fear big numbers, is it any wonder that they fear science as well and turn for solace to the comforting smallness of mysticism? But do people fear big numbers? Certainly they do. 
I’ve met people who don’t know the difference between a million and a billion, and don’t care. We play a lottery with ‘six ways to win!,’ overlooking the twenty million ways to lose. We yawn at six billion tons of carbon dioxide released into the atmosphere each year, and speak of ‘sustainable development’ in the jaws of exponential growth. Such cases, it seems to me, transcend arithmetical ignorance and represent a basic unwillingness to grapple with the immense. Whence the cowering before big numbers, then? Does it have a biological origin? In 1999, a group led by neuropsychologist Stanislas Dehaene reported evidence in Science that two separate brain systems contribute to mathematical thinking. The group trained Russian-English bilinguals to solve a set of problems, including two-digit addition, base-eight addition, cube roots, and logarithms. Some subjects were trained in Russian, others in English. When the subjects were then asked to solve problems approximately—to choose the closer of two estimates—they performed equally well in both languages. But when asked to solve problems exactly, they performed better in the language of their training. What’s more, brain-imaging evidence showed that the subjects’ parietal lobes, involved in spatial reasoning, were more active during approximation problems; while the left inferior frontal lobes, involved in verbal reasoning, were more active during exact calculation problems. Studies of patients with brain lesions paint the same picture: those with parietal lesions sometimes can’t decide whether 9 is closer to 10 or to 5, but remember the multiplication table; whereas those with left-hemispheric lesions sometimes can’t decide whether 2+2 is 3 or 4, but know that the answer is closer to 3 than to 9. Dehaene et al. conjecture that humans represent numbers in two ways. For approximate reckoning we use a ‘mental number line,’ which evolved long ago and which we likely share with other animals. 
But for exact computation we use numerical symbols, which evolved recently and which, being language-dependent, are unique to humans. This hypothesis neatly explains the experiment’s findings: the reason subjects performed better in the language of their training for exact computation but not for approximation problems is that the former call upon the verbally-oriented left inferior frontal lobes, and the latter upon the spatially-oriented parietal lobes. If Dehaene et al.’s hypothesis is correct, then which representation do we use for big numbers? Surely the symbolic one—for nobody’s mental number line could be long enough to contain , 5 pentated to the 5, or BB(1000). And here, I suspect, is the problem. When thinking about 3, 4, or 7, we’re guided by our spatial intuition, honed over millions of years of perceiving 3 gazelles, 4 mates, 7 members of a hostile clan. But when thinking about BB(1000), we have only language, that evolutionary neophyte, to rely upon. The usual neural pathways for representing numbers lead to dead ends. And this, perhaps, is why people are afraid of big numbers. Could early intervention mitigate our big number phobia? What if second-grade math teachers took an hour-long hiatus from stultifying busywork to ask their students, "How do you name really, really big numbers?" And then told them about exponentials and stacked exponentials, tetration and the Ackermann sequence, maybe even Busy Beavers: a cornucopia of numbers vaster than any they’d ever conceived, and ideas stretching the bounds of their imaginations. Who can name the bigger number? Whoever has the deeper paradigm. Are you ready? Get set. Go. References Petr Beckmann, A History of Pi, Golem Press, 1971. Allan H. Brady, "The Determination of the Value of Rado’s Noncomputable Function Sigma(k) for Four-State Turing Machines," Mathematics of Computation, vol. 40, no. 162, April 1983, pp 647- 665. Gregory J. Chaitin, "The Berry Paradox," Complexity, vol. 1, no. 1, 1995, pp. 
26- 30. At http://www.umcs.maine.edu/~chaitin/unm2.html. A.K. Dewdney, The New Turing Omnibus: 66 Excursions in Computer Science, W.H. Freeman, 1993. S. Dehaene and E. Spelke and P. Pinel and R. Stanescu and S. Tsivkin, "Sources of Mathematical Thinking: Behavioral and Brain-Imaging Evidence," Science, vol. 284, no. 5416, May 7, 1999, pp. 970- 974. Douglas Hofstadter, Metamagical Themas: Questing for the Essence of Mind and Pattern, Basic Books, 1985. Chapter 6, "On Number Numbness," pp. 115- 135. Robert Kanigel, The Man Who Knew Infinity: A Life of the Genius Ramanujan, Washington Square Press, 1991. Stephen C. Kleene, "Recursive predicates and quantifiers," Transactions of the American Mathematical Society, vol. 53, 1943, pp. 41- 74. Donald E. Knuth, Selected Papers on Computer Science, CSLI Publications, 1996. Chapter 2, "Mathematics and Computer Science: Coping with Finiteness," pp. 31- 57. Dexter C. Kozen, Automata and Computability, Springer-Verlag, 1997. ———, The Design and Analysis of Algorithms, Springer-Verlag, 1991. Shen Lin and Tibor Rado, "Computer studies of Turing machine problems," Journal of the Association for Computing Machinery, vol. 12, no. 2, April 1965, pp. 196- 212. Heiner Marxen, Busy Beaver, at http://www.drb.insel.de/~heiner/BB/. ——— and Jürgen Buntrock, "Attacking the Busy Beaver 5," Bulletin of the European Association for Theoretical Computer Science, no. 40, February 1990, pp. 247- 251. Tibor Rado, "On Non-Computable Functions," Bell System Technical Journal, vol. XLI, no. 2, May 1962, pp. 877- 884. Rudy Rucker, Infinity and the Mind, Princeton University Press, 1995. Carl Sagan, Billions & Billions, Random House, 1997. Michael Somos, "Busy Beaver Turing Machine." At http://grail.cba.csuohio.edu/~somos/bb.html. Alan Turing, "On computable numbers, with an application to the Entscheidungsproblem," Proceedings of the London Mathematical Society, Series 2, vol. 42, pp. 230- 265, 1936. 
Reprinted in Martin Davis (ed.), The Undecidable, Raven, 1965. Ilan Vardi, "Archimedes, the Sand Reckoner," at http://www.ihes.fr/~ilan/sand_reckoner.ps. Eric W. Weisstein, CRC Concise Encyclopedia of Mathematics, CRC Press, 1999. Entry on "Large Number" at http://www.treasure-troves.com/math/LargeNumber.html.

      What even is the largest number that has real-world use? What would be the point of bigger numbers if we can't use the big numbers we have now for real-world calculations?

    1. Author response:

      Reviewer #1 (Public review):

      Li et al. investigate Ca2+ signaling in T. gondii and argue that Ca2+ tunnels through the ER to other organelles to fuel multiple aspects of T. gondii biology. They focus in particular on TgSERCA as the presumed primary mechanism for ER Ca2+ filling. However, when TgSERCA was knocked out, a Ca2+ release in response to TG was still present.

      Note that we did not knockout SERCA as it is an essential gene so it would not be possible to isolate parasites that do not express SERCA. We created conditional mutants that downregulate the expression of SERCA and some activity is present in the mutant after 24 h of ATc treatment.

      Overall, the Ca2+ signaling data do not support the conclusion of Ca2+ tunneling through the ER to other organelles; in fact, they argue for direct Ca2+ uptake from the cytosol.

      The authors show EM membrane contact sites between the ER and other organelles, so Ca2+ released by the ER could presumably be taken up by other organelles but that is not ER Ca2+ tunneling.

      They clearly show that SERCA is required for T. gondii function.

      Overall, the data presented do not fully support the conclusions reached.

      We agree that the data does not support Ca2+ tunneling as defined and characterized in mammalian cells. In response to this comment, we modified the title and the text accordingly.

      However, we think that the study shows far more than just the role of SERCA in T. gondii functions. We argue that the study shows that the ER (through the activity of the SERCA pump) sequesters and re-distributes calcium to other organelles following influx through the PM. The experiments show that the ER is able to take calcium from the cytosol as it enters the parasite through SERCA activity, and this activity is important for the transition of the parasite between various extracellular calcium exposures. We believe that the role of the ER in redistributing calcium following exposure to physiological levels of extracellular calcium is demonstrated in the experiments shown in Figs 1H-I, 4G-H, and 5G-K. There are no previous T. gondii studies that address the question of how intracellular stores are filled with calcium, which are essential for the continuation of the lytic cycle, meaning they are essential for the parasitism of T. gondii.

      Data argue for direct Ca2+ uptake from the cytosol

      The ER most likely takes up calcium from the cytosol following its entry through the PM and redistributes it to the other organelles. We will delete the word “tunneling” and replace it with “transfer” and “re-distribution”, as these terms better represent our results.

      What we think is re-distribution is shown in Figure 1H and I, in which the calcium release after GPN and nigericin is enhanced after TG addition. Of note, there is no experimental evidence that supports the regulation of calcium entry by store depletion (PMID: 24867952), and we do not think that the enhanced response is due to calcium entry.

      Figure 4G and H show that knocking down SERCA significantly reduces the response to GPN. Fig 5I shows that the mitochondrial calcium uptake is reduced after the addition of GPN in the knockdown mutant. Fig 2B shows that SERCA can take up calcium at 55 nM calcium while mitochondrial uptake needs higher concentrations (Fig 5B-C). However, higher calcium concentrations could be reached at the microdomains formed around MCS between the ER and mitochondrion. Figure 5E shows that the mitochondrion is not responsive to an increase of cytosolic calcium. This is also shown for the apicoplast in Fig. 7E and F of the Li et al, Nat Commun 2021 paper.

      Reviewer #2 (Public review):

      The role of the endoplasmic reticulum (ER) calcium pump TgSERCA in sequestering and redistributing calcium to other intracellular organelles following influx at the plasma membrane.

      The fact that T. gondii transitions through life cycle stages within and exterior to the host cells, with very different exposures to calcium, adds significance to the current investigation of the role of the ER in redistributing calcium following exposure to physiological levels of extracellular calcium.

      They also use a conditional knockout of TgSERCA to investigate its role in ER calcium store-filling and the ability of other subcellular organelles to sequester and release calcium. These knockout experiments provide important evidence that ER calcium uptake plays a significant role in maintaining the filling state of other intracellular compartments.

      We thank the reviewer.

      While it is clearly demonstrated, and not surprising, that the addition of 1.8 mM extracellular CaCl2 to intact T. gondii parasites preincubated with EGTA leads to an increase in cytosolic calcium and subsequent enhanced loading of the ER and other intracellular compartments, there is a caveat to the quantitation of these increases in calcium loading. The authors rely on the amplitude of cytosolic free calcium increases in response to thapsigargin, GPN, nigericin, and CCCP, all measured with fura2. This likely overestimates the changes in calcium pool sizes because the buffering of free calcium in the cytosol is nonlinear, and fura2 (with a Kd of 100-200 nM) is a substantial, if not predominant, cytosolic calcium buffer. Indeed, the increases in signal noise at higher cytosolic calcium levels (e.g. peak calcium in Figure 1C) are indicative of fura2 ratio calculations approaching saturation of the indicator dye.

      We agree about the limitations of using Fura2, but according to the literature (PMID:3838314, fig. 3) Fura2 is suitable for measurements between 100 nM and 1 mM calcium. The responses in our experiments were within its linear range, and the experiments with the SERCA mutant and mitochondrial GCaMPs support the conclusions of our work.

      We agree that the experiment shown in Fig 1C shows a response close to the limit of the linear range of Fura2 and we can provide a more representative trace in the final article. We can include new quantifications and comparisons.
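
      For illustration only, the saturation concern discussed above can be sketched with the standard ratiometric calibration equation for fura-2, [Ca2+] = Kd × β × (R − Rmin) / (Rmax − R). The calibration constants used below (`r_min`, `r_max`, `beta`, `kd_nm`) and the function names are assumed placeholder values chosen for this sketch; they are not values or code from the study.

```python
def fura2_calcium(ratio, r_min, r_max, beta, kd_nm):
    """Convert a fura-2 340/380 ratio to free [Ca2+] in nM.

    Standard ratiometric calibration: Kd * beta * (R - Rmin) / (Rmax - R).
    All calibration constants are hypothetical inputs supplied by the caller.
    """
    if not (r_min < ratio < r_max):
        raise ValueError("ratio must lie strictly between r_min and r_max")
    return kd_nm * beta * (ratio - r_min) / (r_max - ratio)


def ratio_from_calcium(ca_nm, r_min, r_max, beta, kd_nm):
    """Inverse of fura2_calcium: the ratio expected at a given free [Ca2+]."""
    k = kd_nm * beta
    return (r_min * k + r_max * ca_nm) / (k + ca_nm)


if __name__ == "__main__":
    # Placeholder calibration constants (assumptions, not measured values).
    params = dict(r_min=0.3, r_max=6.0, beta=5.0, kd_nm=150.0)
    for ca in (50.0, 200.0, 1000.0, 5000.0):
        r = ratio_from_calcium(ca, **params)
        # Change in [Ca2+] implied by a fixed 0.05 step in the ratio.
        step = fura2_calcium(min(r + 0.05, params["r_max"] - 1e-6), **params) - ca
        print(f"[Ca2+] = {ca:7.1f} nM  ratio = {r:.3f}  d[Ca2+] per 0.05 ratio = {step:.1f} nM")
```

      Because (Rmax − R) appears in the denominator, the same small fluctuation in the ratio maps onto a progressively larger change in the computed calcium concentration as R approaches Rmax, which is one way noise can grow at high calcium levels.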

      Another caveat, not addressed, is that loading of fura2/AM can result in compartmentalized fura2, which might modify free calcium levels and calcium storage capacity in intracellular organelles.

      We are aware of this issue, and because of that we have modified our protocol to minimize compartmentalization. We load cells for 26 min at room temperature, keep them on ice, and do not use them for longer than 2-3 hours, because beyond that we do see evidence of compartmentalization. One sign of compartmentalization is an increase in the resting calcium concentration.

      The finding that the SERCA inhibitor cyclopiazonic acid (CPA) only mobilizes a fraction of the thapsigargin-sensitive calcium stores in T. gondii coincides with previously published work in another apicomplexan parasite, P. falciparum, showing that thapsigargin mobilizes calcium from both CPA-sensitive and CPA-insensitive calcium pools (Borges-Pereira et al., 2020, DOI: 10.1074/jbc.RA120.014906). It would be valuable to determine whether this reflects the off-target effects of thapsigargin or the differential sensitivity of TgSERCA to the two inhibitors.

      This is an interesting observation, and we will discuss the result considering the Plasmodium study and include the citation. We will add inhibition curves using the MagFluo protocol and compare CPA and TG.

      Figure S1 suggests differential sensitivity, and it shows that thapsigargin mobilizes calcium from both CPA-sensitive and CPA-insensitive calcium pools in T. gondii. Also important is that we used 1 µM TG as we are aware that TG has shown off-target effects at higher concentrations. 

      The authors interpret the residual calcium mobilization response to Zaprinast observed after ATc knockdown of TgSERCA (Figures 4E, 4F) as indicative of a target calcium pool in addition to the ER. While this may well be correct, it appears from the description of this experiment that it was carried out using the same conditions as Figure 4A where TgSERCA activity was only reduced by about 50%.

      We partially agree. As pointed out by the reviewer, knockdown of TgSERCA by only 50% means that the ER could still be targeted by zaprinast, and that there is no evidence of another target calcium pool. From the MagFluo4 experiment (although we are aware that the fluorescence of MagFluo4 is not linear with calcium), there is SERCA activity after 24 h of ATc treatment. However, when adding zaprinast after TG we see a significant release of calcium, which is true for both wild type and conditional knockdowns. Because of this result we proposed that there could be another large neutral calcium pool in addition to the one mobilized by TG. We will address these possibilities in the discussion and interpretation of the result.

      The data in Figures 4A vs 4G and Figures 4B vs 4H indicate that the size of the response to GPN is similar to that with thapsigargin in both the presence and absence of extracellular calcium. This raises the question of whether GPN is only releasing calcium from acidic compartments or whether it acts on the ER calcium stores, as previously suggested by Atakpa et al. 2019 DOI: 10.1242/jcs.223883. Nonetheless, Figure 1H shows that there is a robust calcium response to GPN after the addition of thapsigargin.

      The results of the experiments did not exclude the possibility that GPN can also mobilize some calcium from the ER besides acidic organelles. We do not have any evidence to support that GPN can mobilize calcium from the ER either. Based on our unpublished work, we think GPN mainly releases calcium from the PLVAC. We will include the mentioned citation and discuss the result considering the possibility that GPN may be acting on the ER.

      An important advance in the current work is the use of state-of-the-art approaches with targeted genetically encoded calcium indicators (GECIs) to monitor calcium in important subcellular compartments. The authors have previously done this with the apicoplast, but now add the mitochondria to their repertoire. Despite the absence of a canonical mitochondrial calcium uniporter (MCU) in the Toxoplasma genome, the authors demonstrate the ability of T. gondii mitochondria to accumulate calcium, albeit at high calcium concentrations. Although the calcium concentrations here are higher than needed for mammalian mitochondrial calcium uptake, there too calcium uptake requires calcium levels higher than those typically attained in the bulk cytosolic compartment. And just like in mammalian mitochondria, the current work shows that ER calcium release can elicit mitochondrial calcium loading even when other sources of elevated cytosolic calcium are ineffective, suggesting a role for ER-mitochondrial membrane contact sites. With these new tools in hand, it will be of great value to elucidate the bioenergetics and transport pathways associated with mitochondrial calcium accumulation in T. gondii.

      We thank this reviewer for his/her positive comment. Studies of the bioenergetics and transport pathways associated with mitochondrial calcium accumulation are part of our future plans.

      The current studies of calcium pools and their interactions with the ER and dependence on SERCA activity in T. gondii are complemented by super-resolution microscopy and electron microscopy that do indeed demonstrate the presence of close appositions between the ER and other organelles (see also videos). Thus, the work presented provides good evidence for the ER acting as the orchestrating organelle delivering calcium to other subcellular compartments through contact sites in T. gondii, as has become increasingly clear from work in other organisms.

      Thank you

      Reviewer #3 (Public review):

      This manuscript describes an investigation of how intracellular calcium stores are regulated and provides evidence that is in line with the role of the SERCA-Ca2+-ATPase in this important homeostasis pathway. Calcium uptake by mitochondria is further investigated and the authors suggest that ER-mitochondria membrane contact sites may be involved in mediating this, as demonstrated in other organisms.

      The significance of the findings is in shedding light on key elements within the mechanism of calcium storage and regulation/homeostasis in the medically important parasite Toxoplasma gondii whose ability to infect and cause disease critically relies on calcium signalling. An important strength is that despite its importance, calcium homeostasis in Toxoplasma is understudied and not well understood.

      We agree with the reviewer. Thank you

      A difficulty in the field, and a weakness of the work, is that following calcium in the cell is technically challenging and thus requires reliance on artificial conditions. In this context, the main weakness of the manuscript is the extrapolation of data. The language used could be more careful, especially considering that the way to measure the ER calcium is highly artificial - for example utilising permeabilization and over-loading the experiment with calcium. Measures are also indirect - for example, when the response to ionomycin treatment was not fully in line with the suggested model the authors hypothesise that the result is likely affected by other storage, but there is no direct support for that.

      The MagFluo protocol has been amply used in mammalian cells, DT40 cells and other cells for the characterization of the IP3 receptor response to IP3. We will include and discuss more citations in the revised article. The scheme at the top of the figure shows the protocol used. There is no overloading with calcium because the cells are permeabilized and the concentrations of calcium used are physiological; all experiments were performed at 220 nM calcium, which is within the cytosolic levels tolerated by cells. The experiment was done with permeabilized cells because permeabilization allows the indicator to become diluted, allows the substrate MgATP to reach the membrane of the ER, and allows for exposure to precise concentrations of calcium. MagFluo4 loading is intended for its compartmentalization to all intracellular compartments, and the uptake stimulated by MgATP occurs exclusively in the compartment occupied by SERCA. IO is an ionophore that causes calcium release from other stores in addition to the ER, and it is expected that this will result in a larger release. We must clarify that the experiment shown in Fig. 2 was done to characterize the activity of SERCA and was not aimed at the characterization of the role of SERCA in the parasite. We will explain this result better in the revised version of the article.

      Below we provide some suggestions to improve controls, however, even with those included, we would still be in favour of revising the language and trying to avoid making strong and definitive conclusions. For example, in the discussion perhaps replace "showed" with "provide evidence that are consistent with..."; replace or remove words like "efficiently" and "impressive"; revise the definitive language used in the last few lines of the abstract (lines 13-17); etc. Importantly we recommend reconsidering whether the data is sufficiently direct and unambiguous to justify the model proposed in Figure 7 (we are in favour of removing this figure at this early point of our understanding of the calcium dynamic between organelles in Toxoplasma).

      We thank the reviewer for the suggestions and will modify the language as suggested.

      Fig 7 is only a model and, like all models, could be incorrect. However, considering this reviewer’s criticism, we will replace the model with a simpler one that is less speculative.

      Another important weakness is poor referencing of previous work in the field. Lines 248-250 read almost as if the authors originally hypothesised the idea that calcium is shuttled between ER and mitochondria via membrane contact sites (MCS) - but there is extensive literature on other eukaryotes which should be first cited and discussed in this context. Likewise, the discussion of MCS in Toxoplasma does not include the body of work already published on this parasite by several groups. It is informative to discuss observations in light of what is already known.

      We added a citation following the sentence mentioned by the reviewer in lines 248-250 (corrected preprint) and will include more in the revised article. We cite several pertinent articles that describe MCS in Toxoplasma (lines 378-380, very few actually). We will make sure not to miss any new articles that could have been recently published. Note that our work is not about describing the presence of MCSs. We are showing transfer of calcium between the ER and mitochondria and we present evidence that supports that it happens through MCSs.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      - Summary: 

      Recordings were made from the dentate nucleus of two monkeys during a decision-making task. Correlates of stimulus position and stimulus information were found to varying degrees in the neuronal activities. 

      We agree with this summary.

      - Strengths: 

      A difficult decision-making task was examined in two monkeys.

      We agree with this statement.

      - Weaknesses: 

      One of the monkeys did not fully learn the task. The manuscript lacked a coherent hypothesis to be tested, and no attempt was made to consider the possibility that this part of the brain may have little to do with the task that was being studied. 

      We understand the reviewer's concern. It is correct that one of the monkeys (Mi) did not perform at a high level, but it should be noted that both monkeys performed significantly above chance level. Therefore, we would argue that both monkeys in fact did learn the task but Mi’s performance was suboptimal. This difference in the performance levels gave us a rare opportunity to dive deeper into the reasons why some animals perform better than others, and we show that Mi (the lower performing monkey) paid more attention to the outcome of the previous trial – this is evident from our behavioural and decoding models.

      We tested the overall hypothesis that neurons of the dentate nucleus can dynamically modulate their activity during a visual attention task, comprising not only sensorimotor but also cognitive attentional components. Many neurons in the dentate are multimodal (Figure 3C-D), as had been theorized. One of the specific hypotheses that we tested is that the dentate cells can be direction-selective for both the sensorimotor and cognitive component. Given that many of the recorded cells showed direction-selectivity in their firing rate modulation for gap directions and/or stimulus directions, we provide strong evidence that this hypothesis is correct. We have now spelled out this hypothesis more explicitly in the introduction of the revised version. We now also explain better why we tested this specific hypothesis. Indeed, earlier studies in primates such as those by Herzfeld and colleagues (2018, Nat. Neuro.) and van Es and colleagues (2019, Current Biol) have indicated that direction-selectivity of cerebellar activity may occur in various sensorimotor domains.

      We also appreciate the comment of this Reviewer that in our original submission we did not show our attempt to consider the possibility that this part of the brain may have little to do with the task that was being studied. We in fact did consider this possibility in that we successfully injected 3 ml of muscimol (5 μg/ml, Sigma Aldrich) into the dentate nucleus in vivo in one of the monkeys (Mo). This application resulted in a reduction of more than 10% in correct responses of the covert attention task after 45 minutes, whereas the performance remained the same following saline injections. Unfortunately, due to the timing of the experiments and Covid19-related laboratory restrictions we were unable to perform these experiments in the other monkey or repeat them in Mo. We aim to replicate this in future experiments and publish it when we have full datasets of at least two monkeys available. For this paper we have prioritized our tracing experiments, highlighting the connections of the dentate nucleus with attention related areas in brainstem and cortex in both monkeys, following perfusion.

      - Perhaps the large differences in performance between the two subjects can be used as a way to interpret the neural data's relationship to behavior, as it provided a source of variance. This is what we would hypothesize if we believed that this area of the brain is playing a significant role in the task. If one animal learns much more poorly, and this region of the brain is important for that behavior, then shouldn't there be clear, interpretable differences in the neural data? 

      We thank the Reviewer for this comment. We have added a new Supplementary Figure 2, in which we present the data for both monkeys separately in the revised manuscript. Comparing the two datasets, however, we see more commonalities related to the significant learning in both monkeys than differences that might be related to their different levels of learning. We have therefore decided to show the different datasets transparently in the new Supplementary Figure 2, but to stay on the conservative side in our interpretations.

      - How should we look for these differences? A number of recent papers in mice have uncovered a large body of data showing that during the deliberation period, when the animal is interpreting a sensory stimulus (often using the whisker system), there is ramping activity in a principal component space among neurons that contribute to the decision. This ramping activity is present (in the PCA space) in the motor areas of the cortex, as well as in the medial and lateral cerebellar nuclei. Perhaps a similar computational approach would benefit the current manuscript. 

      We also appreciate this point. We have done the principal component analysis accordingly, and we indeed do find the ramping activity in several components of the dentate activity of both monkeys (Mi and Mo). We have now added a new Supplementary Figure 3 with the first three components of both correct and incorrect trials for Mi and Mo, highlighting their potential contribution.
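
      As a generic illustration of the kind of analysis discussed here (not the analysis code used for Supplementary Figure 3), the sketch below projects a synthetic population of ramping neurons onto its first principal component. The simulated firing rates, the noise level, the function names and the use of power iteration are all assumptions made for this example.

```python
import random


def simulate_ramping_population(n_neurons=20, n_bins=50, seed=0):
    """Synthetic firing rates (neurons x time bins): baseline + linear ramp + noise."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_neurons):
        baseline = rng.uniform(20.0, 40.0)  # spikes/s, arbitrary
        gain = rng.uniform(-10.0, 10.0)     # ramp up (>0) or ramp down (<0)
        rates.append([baseline + gain * t / (n_bins - 1) + rng.gauss(0.0, 1.0)
                      for t in range(n_bins)])
    return rates


def first_pc_trajectory(rates, n_iter=200, seed=1):
    """Project population activity onto its first principal component.

    Each time bin is one observation and each neuron one variable. The leading
    eigenvector of the neuron-by-neuron covariance is found by power iteration.
    """
    n_neurons, n_bins = len(rates), len(rates[0])
    means = [sum(row) / n_bins for row in rates]
    centered = [[r - m for r in row] for row, m in zip(rates, means)]
    cov = [[sum(a * b for a, b in zip(ri, rj)) / (n_bins - 1) for rj in centered]
           for ri in centered]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n_neurons)]
    for _ in range(n_iter):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return [sum(v[i] * centered[i][t] for i in range(n_neurons))
            for t in range(n_bins)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


if __name__ == "__main__":
    trajectory = first_pc_trajectory(simulate_ramping_population())
    # The sign of a principal component is arbitrary, so report |r|.
    print("abs correlation of PC1 with time:",
          abs(pearson(trajectory, list(range(len(trajectory))))))
```

      In such a sketch, a first component whose projection rises or falls steadily across the deliberation period is what would be read as ramping activity; with real recordings the rates would be trial-averaged and aligned to a task event before this step.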

      - What is the hypothesis that is being tested? That is, what do you think might be the function of this region of the cerebellum in this task? It seems to me that we are not entirely in the dark, as previous literature on mice decision-making tasks has produced a reasonable framework: the deliberation period coincides with ramping activity in many regions of the frontal lobe and the cerebellum. Indeed, the ramp in the cerebellum appears to be a necessary condition for the ramp to be present in the frontal lobe. Thus, we should see such ramping activity in this task in the dentate. When the monkey makes the wrong choice, the ramp should predict it. If you don't see the ramping activity, then it is possible that the hypothesis is wrong, or that you are not recording from the right place. 

      It is indeed one of our specific hypotheses that the dentate cells can be direction-selective for the preparing cognitive component and/or sensorimotor response. We provide evidence that this hypothesis may be correct when we analyze the regular time response curves (see Figure 2 and the new Supplementary Figure 2 where the data of both monkeys are now presented separately). Moreover, we have now verified this by analysing the ramping curves of PCA space (new Supplementary Figure 3) and firing frequency of DN neurons that modulated upon presentation of the C-stimulus (new Supplementary Figure 4). These figures and findings are now referred to in the main text.

      - As this is a difficult task that depends on the ability of the animals to understand the meaning of the cues, it is quite concerning that one of the monkeys performed poorly, particularly in the early sessions. Notably, the disparity between the two subjects is rather large: one monkey at the start of the recordings achieved a performance that was much better than the second monkey did at the end of the recording sessions. You highlighted the differences in performance in Figure 1D and mentioned that you started recording once the animals reached 60% performance. However, this did not make sense to me as the performance of Mi even after the final day of recording did not reach the performance of Mo on the first day of recording. Thus, in contrast to Mo, Mi appeared to be not ready for the task when the recording began.

      We understand this point. However, please note that the learning performance of the monkeys concerned retraining sessions after they had had several weeks of vacation. So, even though it is correct that one of the two monkeys had a very good consolidation and started already at a relatively high level on the first retraining session, the other one also started and ended at a level above chance level (the y-axis starts at 0.5). We now highlight this point better in the Results section.

      - One objective of having two monkeys is to illustrate that what is true in one animal is also true in the other. In some figures, you show that the neural data are significantly different, while in others you combine them into one. Thus, are you confident that the neural data across the animals should be combined, as you have done in Figure 2? Perhaps you can use the large differences in performance as a source of variance to find meaning in the neural data. 

      This is a valid question; as highlighted above, we have now addressed this point in the new Supplementary Figure 2, where the data for both monkeys are presented separately. Given the sample sizes and level of variances, it is in general difficult to draw conclusions about the potential differences and contributions, but the data are sufficiently transparent to observe common trends. With regard to linking differences in the neural data to the differences in performance level, please also consider Figure 4, the new Supplementary Figure 3 (with the ramping PCA component) and new Supplementary Figure 4 (with the additional analysis of the ramping activity of DN neurons that modulated upon presentation of the C-stimulus), which suggest that the ramping stage of Mo starts before that of Mi. This difference highlights the possibility that injecting accelerations of the simple spike modulations of Purkinje cells in the cerebellar hemispheres into the complex of cerebellar nuclei may be instrumental in improving the performance of responses to covert attention, akin to what has been shown for the impact of Purkinje cells of the vestibulocerebellum on eye movement responses to vestibular stimulation (De Zeeuw et al. 1995, J Neurophysiol). This possibility is now also raised in the Discussion.

      - How do we know that these neurons, or even this region of the brain, contribute to this task? When a new task is introduced, the contributions of the region of the brain that is being studied are usually established via some form of manipulation. This question is particularly relevant here because the two subjects differed markedly in their performance, yet in Figure 3 you find that a similar percentage of neurons are responding to the various elements of the task.

      We appreciate this question. As highlighted above, we are refraining from showing our muscimol manipulation (3 ml of 5 μg/ml muscimol, Sigma Aldrich), as it only concerns 1 successful dataset and 1 control experiment. We hope to replicate this reversible lesion experiment in the future and publish it when we have full new datasets of at least two monkeys available. As explained above, for this paper we have sacrificed both monkeys following a timed perfusion, so as to have similar survival times for the transport of the neuro-anatomical tracer involved.  

      - Behavior in both animals was better when the gap direction was up/down vs. left/right. Is this difference in behavior encoded during the time that the animal is making a decision? Are the dentate neurons better at differentiating the direction of the cue when the gap direction is up/down vs. left/right? 

      These data have now been included in the new Supplementary Figure 2; we did not observe any significant differences in this respect.

      Reviewer #2:

      - The authors trained monkeys to discriminate peripheral visual cues and associate them with planning future saccades of an indicated direction. At the same time, the authors recorded single-unit neural activity in the cerebellar dentate nucleus. They demonstrated that substantial fractions of DN cells exhibited sustained modulation of spike rates spanning task epochs and carrying information about stimulus, response, and trial outcome. Finally, tracer injections demonstrated this region of the DN projects to a large number of targets including several known to interconnect the visual attention network. The data compellingly demonstrate the authors' central claims, and the analyses are well-suited to support the conclusions. Importantly, the study demonstrates that DN cells convey many motor and nonmotor variables related to task execution, event sequencing, visual attention, and arguably decision-making/working memory. 

      We thank the Reviewer for this positive and constructive feedback.

      - The study is solid and I do not have major concerns, but only points for possible improvement. 

      We thank the Reviewer for this positive feedback.

      - A key feature of this data is the extended changes/ramps in DN output across epochs (Figure 2). Crudely, this presents a challenge for the view that DN output mainly drives motor effectors, as the saccade itself lasts only a tiny fraction of the overall task. Some discussion of this dichotomy in thinking about the function(s) of the cerebellum, vis a vis the multifarious DN targets the authors demonstrate here, etc., would be helpful. 

      We agree with the Reviewer and we have expanded our Discussion on this point, also now highlighting the outcome of the new PCA analysis recommended by Reviewer 1 (see the new Supplementary Figure 3).

      - A high-level suggestion on the data: the presentation of the data focuses (sensibly) on the representation of the stimulus and response epochs (Figures 2-3). Yet, the authors then show that from decoding, it is, in fact, a trial outcome that is best represented in the population (Figure 4). While there is nothing 'wrong' with this, it reads slightly incongruously, and the reader does a bit of a "double take" back to the previous figures to see if they missed examples of the trial-outcome signals, but the previous presentations only show correct trials. Consider adding somewhere in the first 3 main figures some neural data showing comparisons with incorrect trials. This way, the reader develops prior expectations for the outcome decoding result and frame of reference for interpreting it. On a related note, the text contains an earlier introduction of this issue (p24 last sentence) and p25 paragraph 1 cites Figure 3D and 3E for signals "related to the absence of reward" - but the caption says this includes only correct trials? 

      We thank the Reviewer for bringing up these points. We have addressed the textual suggestions. Moreover, we have done the PCA analysis suggested by Reviewer 1 for both the correct and incorrect trials (see Supplementary material).

      - P29: The discrepancy in retrograde labeling between monkeys (2 orders of magnitude): I realize the authors can't really do anything about this, but the difference is large enough to warrant concerns in the interpretation (how did the tracer spread over the drastically larger area? Isotropically? Could it cross more "hard boundaries" and incorporate qualitatively different inputs/outputs?). A small discussion of possible caveats in interpreting the outcomes would be helpful. 

      We fully agree with this comment. As highlighted in the text, in both monkeys we first identified the optimal points for injection in the dentate nucleus electrophysiologically and we used the same pump with the same settings to carry out the injections, but even so the differences are substantial. We suspect that the larger injection might have been caused by an air bubble trapped in the syringe or a deviation in the stock solution, but we can never be sure of that. We have added a potential explanation for the caveat that might have played a role.

      - And a list of quick points: 

      We have addressed all points listed below; we want to thank the Reviewer for bringing them up.

      P3 paragraph 2 needs comma "in daily life,". 

      P4 paragraph 2 "C-gap" terminology not previously defined. 

      P4 paragraph 2 "animals employed different behavioral strategies". Grammatically, you should probably say "each animal employed a different behavioral strategy," but also scientifically the paragraph doesn't connect this claim to anything about the DN (whereas, e.g., the abstract does make this connection clear). 

      P5 paragraph 1 "theca" should be "the". 

      P6 paragraph 1 problem with ignashenkova citation insert. 

      P10 paragraph 1 I think the spike rate "difference between highest and lowest" is not exactly the same as "variance," you might want to change the terminology. 

      P10 paragraph 1 should probably say "To determine if a cell preferentially modulated". 

      P10 paragraph 1 last sentence the last clause could be clearer. 

      P17 paragraph 2 should be something like "as well as those by Carpenter and..."? 

      P20 caption: consider "...directionality in the task: only one C-stim...". 

      P20 caption: consider "to the left and right in the [L/R] task...to the top/bottom in the [U/D] task". 

      Fig1E and S1 - is there a physical meaning of the "weight" unit, and if none, can this be transformed into a more meaningful unit? 

      P21 paragraph 1 consider "activity was recorded for 304 DN neurons...". 

      P21 paragraph 1 "correlations with the temporal windows" it's not clear how activity can "correlate" with a time window, consider rephrasing (activity levels changed during these time epochs, depending on stimulus identity). 

      P21 paragraph 1 should be "by comparing the number of spikes in a bin...". 

      P22 paragraph 2 "when we aligned the neurons to the time of maximum change" needs clarification. The maximum change of what? And per neuron? Across the population? 

      P22 paragraph 2 "than that of the facilitating" should be "than did the facilitating units". 

      P24 paragraph 1 needs a comma and rewording "Within each direction, trials are sorted by the time of saccade onset". 

      P24 paragraph 1 should probably say "Same as in G, but for suppressed cells". 

      P24 paragraph 2 should say "more than one task event" not "events". 

      P24 paragraph 2 needs a comma "To fully characterize the neural responses, we fitted". 

      P25 paragraph 1 should probably say "we sampled from similar populations of DN". 

      P34 paragraph 3 consider rephrasing the sentence that contains both "dissociation" and "dissociate". 

      P37 last line: consider "coordination of cerebellum and cerebral cortex *in* higher order mental..."? 

      P38 paragraph 1 citation needed for "kinematics of goal-directed hand actions of others"? 

      P38 paragraph 1 commas probably not needed "map visual input, from high-level visual regions, onto..." 
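The P10 point above, that the "difference between highest and lowest" spike rate is not the same as "variance", can be illustrated with a minimal sketch. The spike rates below are invented for illustration and are not taken from the manuscript; `spike_rate_range` is a hypothetical helper name.

```python
from statistics import pvariance


def spike_rate_range(rates):
    """Return the difference between the highest and lowest spike rate."""
    return max(rates) - min(rates)


# Two made-up sets of spike rates (spikes/s) with identical extremes.
rates_a = [2.0, 10.0, 10.0, 10.0, 18.0]
rates_b = [2.0, 2.0, 10.0, 18.0, 18.0]

range_a = spike_rate_range(rates_a)
range_b = spike_rate_range(rates_b)

# Population variance uses every value, not just the two extremes.
var_a = pvariance(rates_a)
var_b = pvariance(rates_b)

print(range_a, range_b)  # same range for both sets
print(var_a, var_b)      # different variances
```

Both sets share the same range, yet their variances differ, which is why the two terms are not interchangeable.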

      References

      - Herzfeld D.J., Kojima Y, Soetedjo R, Shadmehr R (2018) Encoding of error and learning to correct that error by the Purkinje cells of the cerebellum. Nat Neurosci 21:736–743.

      - van Es DM, van der Zwaag W, Knapen T (2019) Topographic maps of visual space in the human cerebellum. Curr Biol 29(10):1689–1694.e3.

      - De Zeeuw CI, Wylie DR, Stahl JS, Simpson JI. (1995) Phase relations of Purkinje cells in the rabbit flocculus during compensatory eye movements. J Neurophysiol. Nov;74(5):2051-64. doi: 10.1152/jn.1995.74.5.2051.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Cesar, Santos & Cogni use a meta-analysis to report on the direction and magnitude of three fundamental fitness components in defensive symbioses. Specifically, the work focuses on interactions between three arthropod host families (Aphididae, Culicidae, Drosophilidae, and others) and common bacterial endosymbionts (Wolbachia, Serratia, Hamiltonella, Spiroplasma, Rickettsia, Regiella X-type and Arsenophonus). The results of the overall analysis confirm common assumptions and previous work on such fitness components, showing that defensive symbionts provide strong protection to hosts and cause detectable costs to both hosts and the enemy. The analysis provides insight into the extent of the cost/benefit tradeoff for hosts, reporting that the cost is six times lower than the protective effect. The confirmation that natural enemies attacking hosts infected with symbionts have a reduction in their fitness is also an interesting one, as this shows that the majority of defensive symbionts provide protection by resisting enemy infection, as opposed to tolerating it. This finding has important consequences for evolutionary counter-responses in the enemy species. Of course, this result has less relevance for certain types of enemies (such as parasitoids) where successful infection is dependent upon host killing.

      Interesting results also emerge from the subgroup analysis. For the full dataset, both natural and introduced symbionts were similarly effective in positively influencing the fitness of hosts. However, in the Wolbachia-specific analysis, the artificially introduced symbionts caused costs to the hosts where the natural strain did not. These findings have potentially important ramifications for schemes that use endosymbionts for biocontrol or vector competence, suggesting that (in some cases) natural strains may be the more stable choice for deploying (as they are associated with lower costs).

      The analysis draws from an impressively large dataset, but the interpretation of the full impact of the results would be helped by greater detail on the species/strain level systems included, the data extraction approach, and inclusion criteria. Accounting for phylogenetic nonindependence and alternative coding of one of the moderator variables could also strengthen the biological relevance of the models. Suggestions and thoughts are outlined below.

      We sincerely thank Reviewer #1 for the time and effort dedicated to reviewing our manuscript. The suggestions provided are highly constructive and will greatly assist us in improving both our analyses and the manuscript overall.

      Strengths & Potential Improvements:

      An impressively large number of effect sizes (3000) from only 226 studies is collected, robustly confirming common assumptions on the magnitude of fundamental fitness components. However, the paper would benefit from a clear breakdown in the main text of the specificities of each system included (e.g. a table at the host species/symbiont strain level, where it is possible). Currently, there is not enough detail for those who want a deep dive to understand what data was extracted for the analysis from these 226 studies, or those who want to understand the underlying diversity in the dataset.

      We thank the reviewer for the suggestion, and we will add this information to our revised manuscript.

      Currently, when the 'natural enemy group' is tested as a moderator it is coded broadly by type of organism (e.g. virus, bacterium, fungus, parasitoid). But this doesn't adequately capture the mode of killing/fitness reduction by the enemy, which would be the much more biologically relevant categorisation for your questions. For example, parasitoid infection is dependent upon host death (thus host fecundity is not relevant, because the host either survived or did not). Among bacterial and viral antagonists there is scope for both fecundity and survival to be affected. This in turn may be a very influential factor for the outcome. You could consider recoding this enemy moderator.

      We agree, and we will implement this in the analysis for our revised manuscript.

      The analysis is restricted to arthropod hosts and defensive symbionts that are also classed as endosymbionts. This focus should be made clear early on in the paper, as there are many systems (that are classed by many as defensive symbioses) that are not part of the analysis.

      We agree, and we will implement this in our revised manuscript.

      There is fairly minimalistic testing of moderators/sub-groups (which probably has its statistical strengths) but perhaps there are also some missed opportunities for testing other ecological contributors to variance, including coinfection (although perhaps limited by power) and other approaches to coding enemy group (as detailed above).

      We agree, and we will implement this in the analysis for our revised manuscript.

      Looking at the overview of systems included, there's likely a high degree of phylogenetic non-independence in the dataset. Where it is possible, using phylogenetically controlled models could strengthen this analysis.

      We thank the reviewer for the suggestion. We will explore the possibility of using phylogenetically controlled models in our analyses, although we recognize the challenges associated with their implementation, particularly in the case of the natural enemies, given the great diversity of distantly related groups included in our study: viruses, bacteria, fungi, protozoans, nematodes and parasitoid wasps.

      Looking at your included systems (Table S5), you might be able to test the effect of coinfection on the 3 variables of interest. For example, it would be particularly important to see if the effects of two symbionts are additive or not.

      We agree, and we will implement this in the analysis for our revised manuscript.

      No code for the analysis is provided for review at this stage and full details of the dataset are also not available. This slightly limits the ability to assess the full scope and robustness of the study. It would be helpful to have an extensive table in the supplementary detailing (minimum) the reference, study, experiment, host species, symbiont strain, and a description of the exact data extraction source (e.g.table/figure/in text), and method of extraction.

      The code for the analysis and the full raw data with the suggested information are available at https://github.com/cassiasqr/MetaSymbiont (The link is available at the end of the manuscript).

      Reviewer #2 (Public review):

      Summary:

      In this exciting study, Cesar and co-authors perform a meta-analysis on the influence of arthropod symbionts on the fitness of their hosts when they are exposed or not to natural enemies. These so-called defensive symbionts are increasingly recognized as key elements in arthropod survival against natural enemies, with effects that ripple through entire terrestrial ecosystems. The topic is timely, the approach is sound, and the manuscript is well-written. I believe this manuscript will attract the attention of entomologists and of microbiologists interested in symbiosis. This study builds on a previous meta-analysis that I was involved in, which was based on phloem-feeding insects. This novel data set is much larger and includes flies (including the model system Drosophila) and mosquitoes (a group of high medical interest). While the previous meta-analysis considered only parasitoids as natural enemies, this study also includes fungi, bacteria, and viruses.

      Strengths:

      The authors compile a very large dataset and provide a broad quantitative overview of the effects of defensive symbionts in insects. By measuring symbiont effects in the presence and absence of natural enemies, the authors are able to infer whether a trade-off between defense and the costs of mutualism in the absence of enemy pressure exists. Defensive symbioses are an important research topic that had its initial "momentum" a decade ago, so the timing for such a systematic review is very appropriate.

      We sincerely thank Reviewer #2 for dedicating their time and effort to reviewing our manuscript. The suggestions are very insightful and will significantly contribute to improving our manuscript.

      Weaknesses:

      I think the manuscript could be improved by clarifying several sections, particularly the introduction and methods. The introduction section is too specific and heavily reliant on particular examples. In my view, the theoretical background of the study could be made clearer, and the knowledge gap identified more explicitly. A focus on how widespread defensive symbioses are, along with a brief, up-to-date review of the groups possessing such symbionts, would help. This lack of focus is also observed in the methods section, where more details are needed in many instances to better understand how data was collected and analyzed. Regarding the analyses, the multi-level analysis contains many moderators, but it's unclear why these moderators were included. While this may seem a minor issue, it highlights a disconnection between the analyses, the conceptual background, and the hypotheses tested. 

      We thank the reviewer for the suggestions, and we will try to make the introduction and the methods section clearer. 

      Another important weakness is that the analyses are too general, and much information remains hidden. For instance, readers cannot easily identify which species of symbionts are studied (and the effects they have), or which natural enemies are involved. Although this information is found in the supplementary material, including it in the main body would significantly improve the manuscript.

      We agree, and we will implement this in our revised manuscript.

    1. Even the popular socialites Kim and Khloe Kardashian have utilized Photoshop to post edited selfies for their Instagram accounts.

      Kendall Jenner has somewhat recently been called out for photoshopping photos of herself on red carpets. The differences are shown in the video below, but either way, she edits the photos that she herself posts on her Instagram, while the ones produced by the paparazzi show the real photo. I think this just proves that no matter how thin someone is, they will have insecurities that they want to hide. But at what point do we, as a society, realize that social media culture is toxic and do something about it? I am worried that that day may never come.

      https://www.tiktok.com/t/ZTYfDtq8Q/

    1. Reviewer #1 (Public review):

      Summary:

      This study by Fuqua et al. studies the emergence of sigma70 promoters in bacterial genomes. While there have been several studies to explore how mutations lead to promoter activity, this is the first to explore this phenomenon in a wide variety of backgrounds, which notably contain a diverse assortment of local sigma70 motifs in variable configurations. By exploring how mutations affect promoter activity in such diverse backgrounds, they are able to identify a variety of anecdotal examples of gain/loss of promoter activity and propose several mechanisms for how these mutations are interacting within the local motif landscape. Ultimately, they show how different sequences have different probabilities of gaining/losing promoter activity and may do so through a variety of mechanisms.

      Major strengths and weaknesses of the methods and results:

      This study uses Sort-Seq to characterize promoter activity, which has been adopted by multiple groups and shown to be robust. Furthermore, they use a slightly altered protocol which allows measurements of bi-directional promoter activity. This combined with their pooling strategy allows them to characterize expression of many different backgrounds in both directions in extremely high-throughput which is impressive! A second key approach this study relies on is the identification of promoter motifs using position weight matrices (PWMs). While these methods are prone to false positives, the authors implement a systematic approach which is standard in the field. However, drawing these types of binary definitions (is this a motif? yes/no) should always come with the caveat that gene expression is a quantitative trait that we oversimplify when drawing boundaries.

      Their approach to randomly mutagenize promoters allowed them to find many examples of different types of evolutions that may occur to increase or decrease promoter activity. They have supported these with validations in more controlled backgrounds which convincingly support their proposed mechanisms for promoter evolution.

      An appraisal of whether the authors achieved their aims, and whether the results support their conclusions:

      The authors express a key finding that the specific landscape of promoter motifs in a sequence affect the likelihood that local mutations create or destroy regulatory elements. The authors have described many examples, including several that are non-obvious, and show convincingly that different sequence backgrounds have different probabilities for gaining or losing promoter activity. This overarching conclusion is supported by trend and mechanistic data which show differences in probabilities of evolving promoters, as well as the mechanisms underlying these evolutions. Furthermore, these mutations are well described and presented, showing the strength of emergent promoter motifs and their specific spacings from existing motifs within the sequence.

      Impact of the work on the field, and the utility of the methods and data to the community:

      This study enhances our understanding of the diverse mechanisms by which promoters can evolve or devolve, potentially improving models that predict mutational outcomes. While this study reveals complex mutational patterns, modeling them could significantly advance our ability to predict bacterial evolutionary trajectories and interpret genomes, bringing us closer to that goal.

      Recent work in the field of bacterial gene regulation has raised interest in bidirectional promoter regions. While the authors do not discuss how mutations that raise expression in one direction may affect another, they have created an expansive dataset which may enable other groups to study this interesting phenomenon. Also, their variation of the Sort-Seq protocol will be a valuable example for other groups who may be interested in studying bidirectional expression. Lastly, this study may be of interest to groups studying eukaryotic regulation as it can inform how the evolution of transcription factor binding sites influences short-range interactions with local regulatory elements.

      Any additional context to understand the significance of the work:

      Predicting whether a sequence drives promoter activity is a challenging task. By learning the types of mutations that create or destroy promoters, this study provides valuable insights for computational models aimed at predicting promoter activity.

      Comments on revised version:

      I am satisfied with the extensive changes made by the author. This manuscript is excellent.

      I very much like the change in figures to incorporate the sequence information. It is great to see clear representations of the emergent sigma70 motifs and their spacing relative to existing motifs. This addition significantly improves the clarity of the findings.

      The validation of mutations on a clean background is well-executed, and the results are convincing. I appreciate the effort put into validating their results. The additional analyses that include TGn and UP-element motifs are also well done and highly relevant, as these elements are known to compensate for weaker or absent -35 sequences.

      Most or all perceived inconsistencies from the previous version have been resolved. While I don't think the fluorescence threshold of 1.5 a.u. for promoter activity is justified, the authors do acknowledge this shortcoming, and even empirically-derived thresholds are still technically arbitrary.

      I particularly enjoyed Figure 1E, thank you for entertaining my analysis request! Also, the H-NS story is a nice addition showing how transcription factors influence this evolution.

      Overall, this revised manuscript is an excellent contribution to the field, and I have no further recommendations for improvement.

    1. Reviewer #1 (Public review):

      Summary:

      This study from Abssy et al. aims to determine if different non-invasive peripheral stimulation techniques - such as magnetic and electrical stimulations - may influence pain intensity, unpleasantness, and secondary hyperalgesia using a 4-arm parallel-group study. They observed no effect on pain intensity and unpleasantness. Also, they reported that only the TENS (electrical stimulation) did not impact secondary hyperalgesia. They hypothesized that the effects were probably due to the sound emitted by RPMS (magnetic stimulation). In a follow-up study, they tried to determine if covering the sound of RPMS would abolish the effect on secondary hyperalgesia using a single-arm design. They observed no effect of RPMS.

      Strengths:

      (1) The research team recruited a relatively large sample size for this type of study.

      (2) The phasic heat pain protocol appears rigorous and well-described.

      (3) The Figures are helpful in facilitating the understanding of the study design and results.

      (4) The statistical analyses appear sound.

      Weaknesses:

      (1) The proposed design is not sufficient to answer the research question. The rationale of the study proposed in the introduction is that auditory stimulation may explain the analgesic effects of RPMS. To answer this question, the authors should have used a factorial design using 4 groups (active RPMS + sound; active RPMS + no sound; sham RPMS + sound; sham RPMS + no sound). Using this design, it would have been possible to determine if the sound, the afferent stimulation, or both are necessary to produce analgesia. Rather, they tested two types of RPMS (iTBS, cTBS) without real rationale, one electrical stimulation and a placebo.

      (2) There are multiple ways that the current design could have introduced biases. The study was not randomized but pseudo-randomised. What does that mean? Was there allocation concealment? Were the assessor and data analyst blinded to group allocation? Was an intention-to-treat analysis performed? Were the participants adequately blinded (was it measured)?

      (3) The TENS parameters used were not optimal and are not those commonly used in clinical practice. This could have explained the lack of TENS effects. The lack of TENS effects has not been discussed and it is concerning. If TENS had been effective (as expected), the story about the auditory effects would not have been presented as the primary mechanisms underlying the current results.

      (4) No primary outcome has been identified. It is important to mention that the interpretation of results is based on the presence of only one statistically significant result. Pain intensity and pain unpleasantness are not affected. This was not properly addressed in the Discussion. What does it mean that secondary hyperalgesia is affected but not pain?

      (5) The use of secondary hyperalgesia as a variable requires further clarification. How is it possible to measure secondary hyperalgesia if there is no lesioned tissue? If heat creates secondary hyperalgesia without lesion, what does that mean physiologically? Is it a valid and reliable "pain" variable?

      (6) The follow-up study has been designed to cover the RPMS sound using pink noise. However, the pink noise was also present during the PHP measurement. How can we determine whether the absence of change is due to the pink noise during the RPMS or the presence of pink noise during PHP? I don't think this is possible to discriminate.
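The 2 × 2 factorial design suggested in point (1) can be sketched as follows. This is only an illustration of how such a design separates the contribution of stimulation from that of sound; the cell means are invented placeholders, not results from the study, and `factorial_effects` is a hypothetical helper.

```python
from itertools import product

# The four arms named in point (1): stimulation (active/sham) x sound (on/off).
stimulation = ["active RPMS", "sham RPMS"]
sound = ["sound", "no sound"]
arms = [f"{s} + {a}" for s, a in product(stimulation, sound)]


def factorial_effects(means):
    """Compute main effects and interaction from four cell means.

    `means` maps (stimulation_active, sound_on) -> mean outcome.
    """
    a_s, a_ns = means[(True, True)], means[(True, False)]
    s_s, s_ns = means[(False, True)], means[(False, False)]
    stim_effect = (a_s + a_ns) / 2 - (s_s + s_ns) / 2
    sound_effect = (a_s + s_s) / 2 - (a_ns + s_ns) / 2
    interaction = (a_s - a_ns) - (s_s - s_ns)
    return stim_effect, sound_effect, interaction


# Invented cell means (arbitrary units) for a scenario in which only
# the sound changes the outcome.
example_means = {
    (True, True): 3.0,
    (True, False): 5.0,
    (False, True): 3.0,
    (False, False): 5.0,
}
stim_effect, sound_effect, interaction = factorial_effects(example_means)
print(arms)
print(stim_effect, sound_effect, interaction)
```

In this invented scenario the stimulation main effect and the interaction are zero while the sound main effect is not, which is the pattern that would implicate sound alone.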

      Appraisal:

      (7) Despite all these potential issues, authors interpret their data with high confidence and with several overstatements in the Title, Abstract, and Discussion. The results do not support their conclusions. The fact that auditory stimulation may produce an analgesic effect is a hypothesis, but the current study cannot ascertain it.

    2. Author response:

      Reviewer 1 (Public Review)

      (1) The proposed design is not sufficient to answer the research question. The rationale of the study proposed in the introduction is that auditory stimulation may explain the analgesic effects of RPMS. To answer this question, the authors should have used a factorial design using 4 groups (active RPMS + sound; active RPMS + no sound; sham RPMS + sound; sham RPMS + no sound). Using this design, it would have been possible to determine if the sound, the afferent stimulation, or both are necessary to produce analgesia. Rather, they tested two types of RPMS (iTBS, cTBS) without real rationale, one electrical stimulation and a placebo.

      We will clarify that the study design employed was originally designed to determine whether iTBS or cTBS would be more effective to reduce pain. We included TENS as a positive control, and sham as a negative control. We were indeed surprised by the findings, and present them herein. Future RCTs should be performed to reproduce these findings.

      (2) There are multiple ways that the current design could have introduced biases. The study was not randomized but pseudo-randomised. What does that mean? Was there allocation concealment? Were the assessor and data analyst blinded to group allocation? Was an intention-to-treat analysis performed? Were the participants adequately blinded (was it measured)?

      This study was not designed as an RCT, but rather as an experimental study. The study was pseudo-randomized to ensure that the groups had equal allocation and distribution of sexes.

      The groups were blinded to the other stimulations (they were not informed of the various arms of the study, through different consent forms).

      It was not possible to blind the experimenter as the iTBS and cTBS protocols are very different: iTBS has multiple bursts separated by brief intervals, whereas cTBS is continuous. The data were masked for analysis, and only unblinded at the final stage. We will update the manuscript to reflect these changes.

      (3) The TENS parameters used were not optimal and are not those commonly used in clinical practice. This could have explained the lack of TENS effects. The lack of TENS effects has not been discussed and it is concerning. If TENS had been effective (as expected), the story about the auditory effects would not have been presented as the primary mechanisms underlying the current results.

      We acknowledge that this is a limitation of the study. A future study should address this. However, we will not remove the arm for transparency.

      (4) No primary outcome has been identified. It is important to mention that the interpretation of results is based on the presence of only one statistically significant result. Pain intensity and pain unpleasantness are not affected. This was not properly addressed in the Discussion. What does it mean that secondary hyperalgesia is affected but not pain?

      We reiterate that this study was not designed as an RCT, but rather as an experimental study. The primary outcome measures were measures of pain sensitivity (pain intensity NRS, pain unpleasantness NRS, and secondary hyperalgesia). We will clarify this in the revised manuscript.

      We will now include discussion of the effects being solely on secondary hyperalgesia, and not on pain intensity and unpleasantness.

      (5a) The use of the secondary hyperalgesia variable is concerning. How is it possible to measure secondary hyperalgesia if there is no lesioned tissue?

      Secondary hyperalgesia refers to hyperalgesia assessed in an area adjacent to or remote from the site of stimulation. In general, it is not required to lesion a tissue to activate the nociceptive system or to induce pain. We have cited other studies that have employed secondary hyperalgesia as a pain outcome measure without inducing a lesion.

      Hyperalgesia reflects increased pain on suprathreshold stimulation. To assess it, one measures the subjective response to a painful (i.e., suprathreshold) stimulus, then applies a conditioning stimulation (e.g., heat), and measures the subjective response to the same original stimulus again. If the response after conditioning is higher than the baseline measure, hyperalgesia has been induced.
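The baseline-versus-post-conditioning comparison described above can be written as a minimal sketch. The function name and the ratings are hypothetical and serve only to make the decision rule explicit; they do not reproduce the authors' analysis.

```python
def hyperalgesia_induced(baseline_rating, post_conditioning_rating):
    """Return True if the same stimulus is rated as more painful
    after conditioning than at baseline."""
    return post_conditioning_rating > baseline_rating


# Invented ratings on a 0-10 numerical rating scale.
increased = hyperalgesia_induced(4.0, 6.0)   # rating rose after conditioning
unchanged = hyperalgesia_induced(5.0, 5.0)   # rating did not change
print(increased, unchanged)
```

When the test stimulus is delivered adjacent to or remote from the conditioned site, the same comparison describes secondary hyperalgesia.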

      (5b) If heat creates secondary hyperalgesia without lesion, what does that mean physiologically?

      Secondary hyperalgesia is normally interpreted as a perceptual correlate of central sensitization.

      (5c) Is it a valid and reliable "pain" variable?

      Yes and yes. A noxious heat stimulus can reliably elicit secondary hyperalgesia (see section 3.2 from Quesada et al. 2021). We also cite several studies that have used secondary hyperalgesia as an outcome measure of central sensitization in pain.

      (6) The follow-up study has been designed to cover the RPMS sound using pink noise. However, the pink noise was also present during the PHP measurement. How can we determine whether the absence of change is due to the pink noise during the RPMS or the presence of pink noise during PHP? I don't think this is possible to discriminate.

      We will add a third study that performs the control analysis with the sound of the rPMS masked, but no pink noise otherwise. The study will be performed in two groups: one with pink noise, and one without pink noise.

      Appraisal

      (7) Despite all these potential issues, authors interpret their data with high confidence and with several overstatements in the Title, Abstract, and Discussion. The results do not support their conclusions. The fact that auditory stimulation may produce an analgesic effect is a hypothesis, but the current study cannot ascertain it.

      We believe that the chief concern with the interpretation stems from concerns about the second study. The proposed third experiment will address these concerns.

      Reviewer 2 (Public Review):

      (1) My biggest concern in this paper is that the stimulation protocols are not applied after pain was induced in the subjects, but before. This is not bad in itself, but as the paper presents the stimulations as potential "treatments" it generates a severe mismatch between the objective, context (introduction), and impact (discussion) presented for the experiments, and how they are actually designed. This adds to the fact that healthy volunteers are used here to generate a study with low translational capability, that aims to be translational and provide an indication for clinics (maybe this is why the reduction in pain intensity caused by PMS when applied in patients, reported in references [29, 35 and 39], is not observed here).

      We will reframe these as prophylaxis, rather than treatment. This study was an experimental study originally designed to determine which stimulation parameters (cTBS or iTBS) would be better suited to modulate pain. We performed the study in healthy individuals undergoing acute pain, akin to a person undergoing a painful procedure, which could lead to central sensitization and pain persistence (e.g., post-surgical pain). However, before testing this in individuals undergoing actual procedures, it is essential to determine efficacy in people before translation.

      Khan et al [29] is a case study with neuropathic pain, whereas our study uses a nociceptive pain model. Lim et al [35] employed 10 sessions of rPMS stimulation in patients with acute low back pain. Similar to our study, the change in VAS driven by rPMS was no different than the sham stimulation. We notice that there is no reference 39, and will correct this.

      (2) TENS treatment duration is simply too short (90s) to be considered a therapeutic TENS intervention. I get that this duration was chosen to match the one of PMS, but TENS is never applied like this in the clinics, in which the duration varies from 10 minutes to an hour (or more). This specific study comparing different durations recommends 40 minutes for knee osteoarthritis pain relief (PMID: 12691335). Under these conditions, this stimulation is more similar to a sham TENS than to a real TENS treatment: I would suggest interpreting it as such. As the paper is right now, it could give the impression that PMS could produce clinical effects not observed in TENS, but while the PMS application resembles a clinical one, the TENS application does not (due to its extremely short duration). As an example, giving paracetamol at a dose 10 times below its effective dose is a placebo, not a paracetamol treatment.

      We acknowledge that this is a limitation, and will address this in the Discussion of the revised manuscript.

      (3) This study measured pain, not central sensitization. Specifically, the effects refer to the area of secondary hyperalgesia. The IASP definition for central sensitization is "Increased responsiveness of nociceptive neurons in the central nervous system to their normal or subthreshold afferent input." (PMID: 32694387). No neuronal results are reported in this article. Therefore, central sensitization is not measured here, and we do not know if it is reduced by sound. This frontally clashes with the title of the article and with many interpretations of the results. For a deep review on this topic, I recommend PMID: 39278607 and the short article PMID: 30416715.

      It is widely accepted that central sensitization is the neurophysiological basis of secondary hyperalgesia (see PMID: 11313449; PMID: 10581220).

      The reviewer is conflating secondary hyperalgesia due to central sensitization and chronic pain. Whether chronic pain is driven or maintained by central sensitization is not the goal of our study. However, there is ample evidence that nociceptive drive can induce plasticity in the CNS, which alters pain sensitivity, and that these changes facilitate pain.

      (4a) There is no mention of blinding/masking/concealing in this manuscript. Was the therapist blind to whether they applied one protocol, another, or a placebo? Were the evaluators blind, as this can heavily influence their measurements? And the volunteers? Was allocation concealed? Was this blinding measured afterwards? Blinding is, together with randomization, the most important methodological feature for those interventional studies. For example, not introducing blinding and concealing directly makes a study lose 4 out of 10 points in the PEDro scale, failing to fulfill criteria 3, 5, 6, and 7 (https://pedro.org.au/english/resources/pedro-scale/).

      This study was not designed as an RCT, but rather as an experimental study. The study was pseudo-randomized to ensure that the groups had equal allocation and distribution of sexes.

      The groups were blinded to the other stimulations (they were not informed of the various arms of the study, through different consent forms). However, blinding was not measured afterwards (again, this was not meant to be an RCT).

      It was not possible to blind the experimenter, as the iTBS and cTBS protocols are very different (iTBS has multiple bursts separated by brief intervals, whereas cTBS is continuous). The data were masked for analysis, and only unblinded at the final stage. We will update the manuscript to reflect these changes.

      (4b) Continuing with methodological considerations, the dropout percentage is high (18% for the first and 25% for the second study), both above the 15% cutoff for criterion 8 of the PEDro, losing another point.

      In Study 1, only 2 participants withdrew after feeling the heat, 2 were lost to follow-up, and 2 had incomplete data. That totals 6/123 (about 5%) in Study 1. In Study 2, none of the participants who met the inclusion/exclusion criteria and were ‘allocated’ to the study were excluded (0% dropout/data loss).

      We are unsure how to address this point, as we had clear inclusion/exclusion criteria, and these could only be measured after consenting. As this is an experimental study performed on healthy individuals in a university setting, we are not able to collect any study related data prior to consent.

      We openly reported individuals who did not meet the criteria, and thus were excluded. These criteria are a combination of what is required to collect good quality data, and what we are ethically permitted to do. We understand that in an interventional trial, >15% dropout due to intolerance or adverse events would indeed be concerning.

      (5) Data reporting and statistical treatment can be improved, as only differences are reported and regression to the mean is not accounted for in this study. Moreover, baseline levels for the dependent variables (control session) are not accessible for evaluation and they are not compared statistically, making it impossible to know if the groups were similar at baseline. This will imply failing criterion 3 of the PEDro, for a total of 2/10 points.

      This only concerns study 1, as study 2 is a within subject study design. Study 1 provides the raw data in Figure 4. We will provide the raw data for each of the primary outcome measures in a supplemental table in the revision.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This manuscript from Schwintek and coworkers describes a system in which gas flow across a small channel (10^-4-10^-3 m scale) enables the accumulation of reactants and convective flow. The authors go on to show that this can be used to perform PCR as a model of prebiotic replication.

      Strengths:

      The manuscript nicely extends the authors' prior work in thermophoresis and convection to gas flows. The demonstration of nucleic acid replication is an exciting one, and an enzyme-catalyzed proof-of-concept is a great first step towards a novel geochemical scenario for prebiotic replication reactions and other prebiotic chemistry.

      The manuscript nicely combines theory and experiment, which generally agree well with one another, and it convincingly shows that accumulation can be achieved with gas flows and that it can also be utilized in the same system for what one hopes is a precursor to a model prebiotic reaction. This continues efforts from Braun and Mast over the last 10-15 years extending a phenomenon that was appreciated by physicists and perhaps underappreciated in prebiotic chemistry to increasingly chemically relevant systems and, here, a pilot experiment with a simple biochemical system as a prebiotic model.

      I think this is exciting work and will be of broad interest to the prebiotic chemistry community.

      Weaknesses:

      The manuscript states: "The micro scale gas-water evaporation interface consisted of a 1.5 mm wide and 250 µm thick channel that carried an upward pure water flow of 4 nl/s ≈ 10 µm/s perpendicular to an air flow of about 250 ml/min ≈ 10 m/s." This was a bit confusing on first read because Figure 2 appears to show a larger channel - based on the scale bar, it appears to be about 2 mm across on the short axis and 5 mm across on the long axis. From reading the methods, one understands the thickness is associated with the Teflon, but the 1.5 mm dimension is still a bit confusing (and what is the dimension in the long axis?) It is a little hard to tell which portion (perhaps all?) of the image is the channel. This is because discontinuities are present on the left and right sides of the experimental panels (consistent with the image showing material beyond the channel), but not the simulated panels. Based on the authors' description of the apparatus (sapphire/CNC machined Teflon/sapphire) it sounds like the geometry is well-known to them. Clarifying what is going on here (and perhaps supplying the source images for the machined Teflon) would be helpful.

      We understand. We will update the figures to better show the dimensions of the experimental chamber. We will also add a more complete figure in the supplementary information. Part of the complexity of the chamber, however, stems from the fact that the same chamber design has also been used to create defined temperature gradients, which are not needed here; the chamber is therefore more complex than this experiment requires.

      We added the scheme of the whole PTFE chip to Figure 2 in the top left corner, indicating the ROI shown in the fluorescence micrographs. Additionally, the channel walls are now clearly indicated by white dotted lines. The dimensions of the setup are now shown more clearly by indicating the total width of the channel, its height up to the gas flux channel, and its depth. We changed the figure caption accordingly, and it now reads: “[…] The PTFE chip cutout in the top left corner shows the ROI used for the micrographs. The color scale is equal for both simulation and experiment and Channel dimensions are 4 x 1.5 x 0.25 mm as indicated. Dotted lines visualize the location of the channel walls. […]”
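      As a quick consistency check on the flow values quoted by the reviewer, the volumetric water flow of 4 nl/s can be converted into a mean velocity using the 1.5 mm x 0.25 mm channel cross-section. The following is a minimal sketch: the function name is illustrative, and a uniform (plug) flow profile across the channel is assumed.

```python
def mean_velocity_um_per_s(flow_nl_per_s: float, width_mm: float, depth_mm: float) -> float:
    """Mean flow velocity (µm/s) from a volumetric flow and a rectangular cross-section.

    Assumes a uniform (plug) flow profile, which is a simplification.
    """
    flow_mm3_per_s = flow_nl_per_s * 1e-3   # 1 nl = 1e-3 mm^3
    area_mm2 = width_mm * depth_mm          # rectangular channel cross-section
    return flow_mm3_per_s / area_mm2 * 1e3  # mm/s -> µm/s


# Values quoted in the text: 4 nl/s through a 1.5 mm x 0.25 mm channel.
water_velocity = mean_velocity_um_per_s(4.0, 1.5, 0.25)
print(f"{water_velocity:.1f} µm/s")
```

      The result is close to the approximately 10 µm/s stated in the manuscript. The same conversion could be applied to the gas flow if the cross-section of the gas channel were known.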

      The data shown in Figure 2d nicely shows nonrandom residuals (for experimental values vs. simulated) that are most pronounced at t~12 m and t~40-60m. It seems like this is (1) because some symmetry-breaking occurs that isn't accounted for by the model, and perhaps (2) because of the fact that these data are n=1. I think discussing what's going on with (1) would greatly improve the paper, and performing additional replicates to address (2) would be very informative and enhance the paper. Perhaps the negative and positive residuals would change sign in some, but not all, additional replicates?

      To address this, we will show two more replicates of the experiment and include them in Figure 2.

      We are seeing two effects when we compare fluorescence measurements of the experiments.

      Firstly, degassing of water causes the formation of air-bubbles, which are then transported upwards to the interface, disrupting fluorescence measurements. This, however, mostly occurs in experiments with elevated temperatures for PCR reactions, such as displayed in Figure 4.

      Secondly, due to the high surface tension of water, the interface is quite flexible. As the inflow and evaporation work to balance each other, the shape of the interface adjusts, leading to alterations in the circular flow fields below.

      Thus the conditions, while overall being in steady state, show some fluctuations. The strong dependence on interface shape is also seen in the simulation. Modeling a dynamic interface shape is difficult, however, so we kept a single fixed geometry. Here again, the added movies of two more experiments should clarify this issue.

      We performed three more replicates of the experiment and included the averaged data points together with their respective standard deviation as error bars in Figure 2d. Additionally, the videos of each individual repeat are now added to the supplementary files for the reader to better understand where the strong fluctuations around half an hour come from. The Figure caption was adjusted to: “[…] The maximum relative concentration of DNA increased within an hour to ~30 X the initial concentration, with the trend following the simulation. Error bars are the standard deviation from four independent measurements. […]”

      The main text was also changed to better explain how the fluctuations impact the measurements: “[…] Water continuously evaporated at the interface, but nucleic acids remained in the aqueous phase accumulating near the interface. They could only escape downward either by diffusion or by the vortex induced by the gas flowing across the interface, pushing the molecules back deeper into the bulk (See the flow lines in Fig. 2(b) taken from the simulation). As the gas flow continuously removed excess vapor, the evaporation rate remained constant. Thus, except for fluctuations, a stable interface shape should be expected. However, due to the high surface tension of water, the interface is very flexible. As the inflow and evaporation work to balance each other, the shape of the interface adjusts, likely in response to small fluctuations in gas pressure and spatial variations in water surface tension. This leads to alterations in the circular flow fields below (Supplementary Movie 2).

      As these fluctuations are difficult to simulate, we decided to stick with one interface shape, matching evaporation and inflow speeds. The evaporation rate at the interface was therefore set to be proportional to the vapor concentration gradient and varied spatially along the interface between 5 and 10.5 µm/s (See Suppl. Fig. VI.1(d)). Using the known diffusion coefficient of 95 µm²/s for the 63mer [9], the simulation closely matched the experimental results. In both cases, DNA accumulated in regions with circular flow patterns driven by the gas flux (Fig. 2(b), right panel).

      5 minutes after starting the experiment, the maximum DNA accumulation was 3-fold, while after one hour of evaporation, around 30-fold accumulation was observed. Due to molecules residing in very shallow volumes when directly at the interface, the fluorescence signal can vary drastically compared to measurements deeper in the bulk. This can be seen in the fluctuations between independent measurements (See Supplementary Movies 2b,2b,2c), especially around 0.5 h shown in Figure 2(d). The simulated maximum accumulation followed the experimental results and starts saturating after about one hour (Fig. 2(d)). […]”
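      To give a rough sense of scale for the accumulation described in the quoted passage, one can compare the upward flow speed at the interface with the diffusion coefficient of the 63mer. In a one-dimensional advection-diffusion balance, the characteristic thickness of the accumulation layer is D/v. The sketch below uses only the values quoted above (D = 95 µm²/s, v between 5 and 10.5 µm/s); the one-dimensional picture is an assumption made here for illustration and does not replace the authors' two-dimensional simulation, which also includes the gas-driven vortex.

```python
D_63MER = 95.0  # µm^2/s, diffusion coefficient of the 63mer quoted in the text


def accumulation_length_um(v_um_per_s: float, d_um2_per_s: float = D_63MER) -> float:
    """Characteristic length D/v (µm) over which back-diffusion balances upward advection.

    1D estimate only; ignores the circular flow driven by the gas flux.
    """
    return d_um2_per_s / v_um_per_s


# Range of evaporation-driven speeds along the interface quoted in the text.
for v in (5.0, 10.5):
    print(f"v = {v:4.1f} µm/s -> D/v = {accumulation_length_um(v):.1f} µm")
```

      Under these assumptions, the layer is on the order of 10 to 20 µm thick, which is consistent with the remark that molecules directly at the interface reside in very shallow volumes.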

      The authors will most likely be familiar with the work of Victor Ugaz and colleagues, in which they demonstrated Rayleigh-Bénard-driven PCR in convection cells (10.1126/science.298.5594.793, 10.1002/anie.200700306). Not including some discussion of this work is an unfortunate oversight, and addressing it would significantly improve the manuscript and provide some valuable context to readers. Something of particular interest would be their observation that wide circular cells gave chaotic temperature profiles relative to narrow ones and that these improved PCR amplification (10.1002/anie.201004217). I think contextualizing the results shown here in light of this paper would be helpful.

      Thanks for pointing this out and reminding us. We apologize for the omission. We agree that the chaotic trajectories within Rayleigh-Bénard convection cells lead to temperature oscillations similar to the salt variations in our gas-flux system. Although the convection-driven PCR in Rayleigh-Bénard cells is not isothermal like our system, it provides a useful point of comparison and context for understanding environments that can support full replication cycles. We will add a section comparing the approaches and giving some context on the history of convective PCR and how it relates to the new isothermal implementation.

      We added a main text paragraph after the last paragraph in section “Strand Separation Dynamics”: “[…] Rayleigh-Bénard convection cells generate similar patterns to those seen in Fig. 3(c). The oscillations in salt concentration resemble the temperature fluctuations observed in convection-based PCR reactions from earlier studies [32,33], which showed that chaotic temperature variations, compared to periodic ones, enhanced the efficiency of the PCR reaction. […]”

      Again, it appears n=1 is shown for Figure 4a-c - the source of the title claim of the paper - and showing some replicates and perhaps discussing them in the context of prior work would enhance the manuscript.

      We appreciate the reviewer for bringing this to our attention. We will now include the two additional repeats for the data shown in Figure 4c, while the repeats of the PAGE measurements are already displayed in Supplementary Fig. IX.2. Initially, we chose not to show the repeats in Figure 4c due to the dynamic and variable nature of the system. These variations are primarily caused by differences at the water-air interface, attributed to the high surface tension of water. Additionally, the stochastic formation of air bubbles in the inflow—despite our best efforts to avoid them—led to fluctuations in the fluorescence measurements across experiments. These bubbles cause a significant drop in fluorescence in a region of interest (ROI) until the area is refilled with the sample.

      Unlike our RNA-focused experiments, PCR requires high temperatures and degassing a PCR master mix effectively is challenging in this context. While we believe our chamber design is sufficiently gas-tight to prevent air from diffusing in, the high surface-to-volume ratio in microfluidics makes degassing highly effective, particularly at elevated temperatures. We anticipate that switching to RNA experiments at lower temperatures will mitigate this issue, which is also relevant in a prebiotic context.

      The reviewer’s comments are valid and prompt us to fully display these aspects of the system. We will now include these repeats in Figure 4c to give readers a deeper understanding of the experiment's dynamics. Additionally, we will provide videos of all three repeats, allowing readers to better grasp the nature of the fluctuations in SYBR Green fluorescence depicted in Figure 4c.

      The data from the triplicates are now added to Figure 4c, showing how air bubbles, forming through degassing at the high temperatures required for Taq polymerase, disrupt the measurement, as they momentarily dry off the channel and stop the reaction until the channel fills again. Figure caption has been adapted and now reads: “[…] Dotted lines show the data from independent repeats. Air bubbles formed through degassing can momentarily disrupt the reaction. […]”

      We additionally changed the main text to explain the experimental difficulties to the reader: “[…] In other repetitions of the reaction, this increase was sometimes even observed earlier, around the one-hour mark (dotted lines). However, air bubbles nucleated by degassing events rise and temporarily dry out the channel, interrupting the reaction until the liquid refills the channel (Supplementary Movies 4, 4b, 4c & 5). Despite our best efforts, we were unable to fully prevent this, especially given the high temperatures required for Taq polymerase activity. In an identical setting when the gas- and water flux were switched off, no fluorescence increase was found (See Fig. 4(c) red lines). Fluorescence variations are additionally caused by fluctuations in the position of the gas-water interface, as discussed earlier. […]”

      I think some caution is warranted in interpreting the PCR results because a primer-dimer would be of essentially the same length as the product. It appears as though the experiment has worked as described, but it's very difficult to be certain of this given this limitation. Doing the PCR with a significantly longer amplicon would be ideal, or alternately discussing this possible limitation would be helpful to the readers in managing expectations.

      This is a good point and should be discussed more in the manuscript. Our gel electrophoresis is capable of distinguishing between the replicated product and primer dimers. We know this because we optimized the primer and template sequences to minimize primer dimers and to make them distinguishable from the desired 61mer product. Moreover, none of the experiments performed without an added template strand showed any band in the vicinity of the product band after 4h of reaction, in contrast to the experiments with template, which is a strong argument against the presence of primer dimers.

      We added a main text section explaining this to the reader: “[…]Suppl. Fig. IX.2 shows all independent repeats of the corresponding experiments. No product was detected in any of these cases, ruling out reaction limitations such as primer dimer formation. Primer dimers would form even in the absence of a template strand and would be identifiable through gel electrophoresis. As Taq polymerase requires a significant overlap between the two dimers to bind, this would result in a shorter product compared to the 61mer used here.  […]”

      Reviewer #2 (Public review):

      Schwintek et al. investigated whether a geological setting of a rock pore with water inflow on one end and gas passing over the opening of the pore on the other end could create a non-equilibrium system that sustains nucleic acid reactions under mild conditions. The evaporation of water as the gas passes over it concentrates the solutes at the boundary of evaporation, while the gas flux induces momentum transfer that creates currents in the water that push the concentrated molecules back into the bulk solution. This leads to the creation of steady-state regions of differential salt and macromolecule concentrations that can be used to manipulate nucleic acids. First, the authors showed that fluorescent bead behavior in this system closely matched their fluid dynamic simulations. With that validation in hand, the authors next showed that fluorescently labeled DNA behaved according to their theory as well. Using these insights, the authors performed a FRET experiment that clearly demonstrated the hybridization of two DNA strands as they passed through the high Mg++ concentration zone, and, conversely, the dissociation of the strands as they passed through the low Mg++ concentration zone. This isothermal hybridization and dissociation of DNA strands allowed the authors to perform an isothermal DNA amplification using a DNA polymerase enzyme. Crucially, the isothermal DNA amplification required the presence of the gas flux and could not be recapitulated using a system that was at equilibrium. These experiments advance our understanding of the geological settings that could support nucleic acid reactions that were key to the origin of life.

      The presented data compellingly supports the conclusions made by the authors. To increase the relevance of the work for the origin of life field, the following experiments are suggested:

      (1) While the central premise of this work is that RNA degradation presents a risk for strand separation strategies relying on elevated temperatures, all of the work is performed using DNA as the nucleic acid model. I understand the convenience of using DNA, especially in the latter replication experiment, but I think that at least the FRET experiments could be performed using RNA instead of DNA.

      We understand the request only partially. The modification introduced by the two dye molecules in the FRET probe, which allow us to probe salt concentrations by melting, is of course much larger than the change of the backbone from RNA to DNA. This is why we used the much more stable DNA construct, which is also manufactured at lower cost and in much higher purity, including with the modifications. We think the melting temperature characteristics of RNA and DNA in this range are sufficiently well known that we can use DNA instead of RNA for probing the salt concentration in our flow cycling.

      Only under extreme conditions of pH and salt, and especially under alkaline conditions, is RNA degradation through transesterification several orders of magnitude faster than the spontaneous degradative mechanisms acting upon DNA [Li, Y., & Breaker, R. R. (1999). Kinetics of RNA degradation by specific base catalysis of transesterification involving the 2′-hydroxyl group. Journal of the American Chemical Society, 121(23), 5364-5372.]. The work presented in this article is, however, focused on the hybridization dynamics of nucleic acids. Here, RNA and DNA share similar properties regarding the formation of double strands and their respective melting temperatures. While RNA has been shown to form more stable duplex structures exhibiting higher melting temperatures compared to DNA [Dimitrov, R. A., & Zuker, M. (2004). Prediction of hybridization and melting for double-stranded nucleic acids. Biophysical Journal, 87(1), 215-226.], the general impact of changes in salt, temperature and pH [Mariani, A., Bonfio, C., Johnson, C. M., & Sutherland, J. D. (2018). pH-Driven RNA strand separation under prebiotically plausible conditions. Biochemistry, 57(45), 6382-6386.] on the respective melting temperatures follows the same trend for both nucleic acid types. The diffusive properties of RNA and DNA are also very similar [Baaske, P., Weinert, F. M., Duhr, S., Lemke, K. H., Russell, M. J., & Braun, D. (2007). Extreme accumulation of nucleotides in simulated hydrothermal pore systems. Proceedings of the National Academy of Sciences, 104(22), 9346-9351.].

      Since this work is a proof of principle for the discussed environment being able to host nucleic acid replication, we aimed to avoid second order effects such as degradation by hydrolysis by using DNA as a proxy polymer. This enabled us to focus on the physical effects of the environment on local salt and nucleic acid concentration. The experiments performed with FRET are used to visualize local salt concentration changes and their impact on the melting temperature of dissolved nucleic acids.  While performing these experiments with RNA would without doubt cover a broader application within the field of origin of life, we aimed at a step-by-step / proof of principle approach, especially since the environmental phenomena studied here have not been previously investigated in the OOL context. Incorporating RNA-related complexity into this system should however be addressed in future studies. This will likely require modifications to the experimental boundary conditions, such as adjusting pH, temperature, and salt concentration, to account for the greater duplex stability of RNA. For instance, lowering the pH would reduce the RNA melting temperature [Ianeselli, A., Atienza, M., Kudella, P. W., Gerland, U., Mast, C. B., & Braun, D. (2022). Water cycles in a Hadean CO2 atmosphere drive the evolution of long DNA. Nature Physics, 18(5), 579-585.].

      (2) Additionally, showing that RNA does not degrade under the conditions employed by the authors (I am particularly worried about the high Mg++ zones created by the flux) would further strengthen the already very strong and compelling work.

      Based on literature values for hydrolysis rates of RNA [Li, Y., & Breaker, R. R. (1999). Kinetics of RNA degradation by specific base catalysis of transesterification involving the 2′-hydroxyl group. Journal of the American Chemical Society, 121(23), 5364-5372.], we estimate RNA to have a half-life of multiple months under the deployed conditions in the FRET experiment (high concentration zones contain <1 mM of Mg2+). Additionally, dsRNA is multiple orders of magnitude more stable than ssRNA with regard to degradation through hydrolysis [Zhang, K., Hodge, J., Chatterjee, A., Moon, T. S., & Parker, K. M. (2021). Duplex structure of double-stranded RNA provides stability against hydrolysis relative to single-stranded RNA. Environmental Science & Technology, 55(12), 8045-8053.], improving RNA stability especially in zones of high FRET signal. Furthermore, at the neutral pH deployed in this work, RNA does not readily degrade. In previous work from our lab [Salditt, A., Karr, L., Salibi, E., Le Vay, K., Braun, D., & Mutschler, H. (2023). Ribozyme-mediated RNA synthesis and replication in a model Hadean microenvironment. Nature Communications, 14(1), 1495.], we showed that the lifetime of RNA under conditions reaching 40 mM Mg2+ at the air-water interface at 45°C was sufficient to support ribozymatically mediated ligation reactions in experiments lasting multiple hours.

      With that in mind, gaining insight into the median Mg2+ concentration across multiple averaged nucleic acid trajectories in our system (see Fig. 3c&d) and numerically convoluting this with hydrolysis dynamics from literature would be highly valuable. We anticipate that longer residence times in trajectories distant from the interface will improve RNA stability compared to a system with uniformly high Mg2+ concentrations.

      We added a new Supplementary section for this. We used the trace from Figure 3(c) and calculated the hydrolysis rate for each timestep using literature values for RNA [Li, Y., & Breaker, R. R. (1999). Kinetics of RNA degradation by specific base catalysis of transesterification involving the 2′-hydroxyl group. Journal of the American Chemical Society, 121(23), 5364-5372.]. We conclude that the conditions deployed for the experiment are not harsh on RNA, with hydrolysis rates on the order of 10^-6 per minute. The figure below (also now in the supplementary information) shows the hydrolysis of RNA under the conditions of the experiment in Figure 3. RNA is not expected to hydrolyze under these conditions on the timescales in which a replication reaction would occur. With a half-life of around 83 days, even a prebiotically plausible, very slow replication reaction would not be constrained by hydrolysis in this scenario.

      We now reference this section of the supplementary information in the main text: “[…] In the experimental conditions used here, RNA would also not readily degrade, even if the strand enters the high salt regimes (See Suppl. Sec. IX). Using literature values for hydrolysis rates under the deployed conditions, we estimate dissolved RNA to have a half-life of around 83 days. […]”
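      The relation between a hydrolysis rate in the 10^-6 per minute regime and a half-life of roughly 83 days can be sketched as follows. This is a minimal illustration of first-order decay with a time-varying rate, not the authors' supplementary calculation: the rate trace below consists of hypothetical placeholder values chosen only to lie in the stated regime, and the function names are invented for this example.

```python
import math

MIN_PER_DAY = 24 * 60


def half_life_days(k_per_min: float) -> float:
    """Half-life (days) of first-order decay with rate constant k (1/min)."""
    return math.log(2) / k_per_min / MIN_PER_DAY


def surviving_fraction(rates_per_min, dt_min: float) -> float:
    """Fraction of intact RNA after a piecewise-constant rate trace: exp(-sum(k_i * dt))."""
    return math.exp(-sum(k * dt_min for k in rates_per_min))


# Hypothetical rate trace (1/min) along one trajectory. These are NOT data from
# Figure 3(c); they are placeholders in the ~1e-6 per minute regime.
trace = [2.0e-6, 9.0e-6, 6.0e-6, 6.2e-6]
mean_rate = sum(trace) / len(trace)

print(f"effective half-life: {half_life_days(mean_rate):.0f} days")
print(f"intact after 4 h:    {surviving_fraction(trace, dt_min=60.0):.4f}")
```

      With a mean rate of about 5.8 x 10^-6 per minute, the effective half-life comes out at roughly 83 days, matching the order of magnitude quoted above.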

      (3) Finally, I am curious whether the authors have considered designing a simulation or experiment that uses the imidazole- or 2′,3′-cyclic phosphate-activated ribonucleotides. For instance, a fully paired RNA duplex and a fluorescently-labeled primer could be incubated in the presence of activated ribonucleotides +/- flux and subsequently analyzed by gel electrophoresis to determine how much primer extension has occurred. The reason for this suggestion is that, due to the slow kinetics of chemical primer extension, the reannealing of the fully complementary strands as they pass through the high Mg++ zone, which is required for primer extension, may outcompete the primer extension reaction. In the case of the DNA polymerase, the enzymatic catalysis likely outcompetes the reannealing, but this may not recapitulate the uncatalyzed chemical reaction.

      This is certainly on our to-do list for future experiments in this setting. Our current focus is on templated ligation rather than templated polymerization, and we are working hard to implement an RNA-only, enzyme-free ligation chain reaction, based on more optimized parameters for the templated ligation from 2’3’-cyclic phosphate activation that was just published [High-Fidelity RNA Copying via 2′,3′-Cyclic Phosphate Ligation, Adriana C. Serrão, Sreekar Wunnava, Avinash V. Dass, Lennard Ufer, Philipp Schwintek, Christof B. Mast, and Dieter Braun, JACS doi.org/10.1021/jacs.3c10813 (2024)]. We would, however, first try this at an air-water interface, which was shown to work with RNA in a temperature gradient [Ribozyme-mediated RNA synthesis and replication in a model Hadean microenvironment, Annalena Salditt, Leonie Karr, Elia Salibi, Kristian Le Vay, Dieter Braun & Hannes Mutschler, Nature Communications doi.org/10.1038/s41467-023-37206-4 (2023)], before making the jump to the isothermal setting we describe here. We understand the question, but in the past it has proven good practice to first get to know the setting with PCR and then move to RNA.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Could the authors comment on the likelihood of the geological environments where the water inflow velocity equals the evaporation velocity?

      This is an important point to mention in the manuscript, thank you for pointing that out. To produce a defined experiment, we pushed the water out with a syringe pump, regulated so that the flow rate matched the evaporation. We imagine that a real system will self-regulate the inflow of the water column, for instance through a more complex geometry of the gas flow, automatically matching the evaporation with the reflow of water. The interface would either move closer to the gas flux or recede, depending on whether the inflow exceeds or falls short of the evaporation rate. As the interface moves closer, evaporation speeds up, while moving away slows it down. This dynamic process stabilizes the system, with surface tension ultimately fixing the interface in place.

      We have already seen a bit of this dynamic in the experiments, but have so far not found a good geometry within our 2-dimensional, constant-thickness design to make it work for a longer time. Very likely a 3-dimensional reservoir of water with lower frictional forces would be able to do this, but it would require a full redesign with multi-thickness microfluidics. The more we think about it, the more we envisage implementing the next version of the experiment with a real porous volcanic rock inside a humidity chamber that simulates a full 6h prebiotic day. We would then lose much of the reproducibility of the experiment, but would likely gain a mechanism by which recondensation of water as dew on a cold morning refills the water reservoirs in the rocks. Apologies for digressing towards future experiments.

      We added a paragraph after the second paragraph in Results and Discussion.

      It now reads: […] For a real early Earth environment we envision a system that self-regulates the water column's inflow by automatically balancing evaporation with capillary flows. The interface adjusts its position relative to the gas flux, moving closer if the inflow is less than the evaporation rate, or receding if it exceeds it. When the interface nears the gas flux, evaporation accelerates, while moving it away slows evaporation. This dynamic process stabilizes the system, with surface tension ultimately fixing the interface's position. […]

      (2) Could the authors speculate on using gases other than ambient air to provide the flux and possibly even chemical energy? For example, using carbonyl sulfide or vaporized methyl isocyanide could drive amino acid and nucleotide activation, respectively, at the gas-water interface.

      This is an interesting prospect for future work with this system. We have also thought about introducing ammonia for pH control and possible reactions. We were amazed in the past that having CO2 instead of air had a profound impact on the replication and the strand separation [Water cycles in a Hadean CO2 atmosphere drive the evolution of long DNA, Alan Ianeselli, Miguel Atienza, Patrick Kudella, Ulrich Gerland, Christof Mast & Dieter Braun, Nature Physics doi.org/10.1038/s41567-022-01516-z (2022)]. So going further in this direction absolutely makes sense. As the gas acts mostly on the length-selectively accumulated molecules at the interface, only the selected molecules will be affected, which adds to the selection pressure of early evolutionary scenarios.

      Of course, in the manuscript, we use ambient air as a proxy for any gas, focusing primarily on the energy introduced through momentum transfer and evaporation. We speculate that soluble gases could establish chemical gradients, such as pH or redox potential, from the bulk solution to the interface, similar to the Mg2+ accumulation shown in Figure 3c. The nature of these gradients would depend on each gas's solubility and diffusivity. We have already observed such effects in thermal gradients [Keil, L. M., Möller, F. M., Kieß, M., Kudella, P. W., & Mast, C. B. (2017). Proton gradients and pH oscillations emerge from heat flow at the microscale. Nature Communications, 8(1), 1897.], and finding similar behavior in an isothermal environment would be a significant discovery.

      Added a paragraph in the Conclusion to showcase this: […] Furthermore we expect that other gases, such as CO2, could establish chemical gradients in this environment. Such gradients have been observed in thermal gradients before [23] and finding similar behaviour in an isothermal environment would be a significant discovery. […]

      (3) Line 162: Instead of "risk," I suggest using "rate".

      Thanks for pointing this out! Will be changed.

      Fixed.

      (4) Using FRET of a DNA duplex as an indicator of salt concentration is a decent proxy, but a more direct measurement of salt concentration would provide further merit to the explicit statement that it is the salt concentration that is changing in the system and not another hidden parameter.

      Directly observing salt concentration using microscopy is a difficult task. While there are dyes that change their fluorescence depending on the local Na+ or Mg2+ concentration, they do not operate differentially, i.e., they do not allow a ratio to be formed between two color channels. Only with such a ratio would we avoid artifacts from the dye molecules themselves being accumulated by the non-equilibrium setting. We were able to do this for pH in the past, but did not find comparable optical salt sensors. This is the reason we ended up with a FRET pair, with the advantage that we actually probe the strand separation that we are interested in anyhow. Using a differential salt dye in future work would, however, without a doubt enhance the understanding of not only this system, but also our thermal gradient environments.
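      The advantage of a two-channel ratio can be made concrete with an idealized sketch. It assumes a single-channel dye whose signal is simply the local dye concentration times a salt-dependent response, and a FRET pair whose donor and acceptor signals split a concentration-proportional total according to the FRET efficiency. Crosstalk, quantum yields, and detector response are ignored, and all names and numbers are hypothetical.

```python
def single_channel_signal(dye_concentration, salt_response):
    """Intensity of a one-color salt dye: accumulation of the dye and the
    salt response are multiplied together and cannot be separated."""
    return dye_concentration * salt_response


def ratiometric_signal(probe_concentration, fret_efficiency):
    """Acceptor fraction of a two-color FRET probe.

    Both channels scale with the local probe concentration, so the
    concentration cancels in the ratio and only the FRET efficiency remains.
    """
    donor = probe_concentration * (1.0 - fret_efficiency)
    acceptor = probe_concentration * fret_efficiency
    return acceptor / (donor + acceptor)


if __name__ == "__main__":
    # Same probe state (0.4), but 50-fold accumulation at the interface.
    for concentration in (1.0, 50.0):
        print(concentration,
              single_channel_signal(concentration, 0.4),
              ratiometric_signal(concentration, 0.4))
```

      In this idealization the single-channel readout rises with accumulation even though the probe state is unchanged, whereas the ratio stays the same.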

      (5) Figure 3a: Could the authors add information on "Dried DNA" to the caption? I am assuming this is the DNA that dried off on the sides of the vessel but cannot be sure.

      Thanks to the reviewer for pointing this out. This is correct and we will describe this better in the revised manuscript.

      Added a sentence in the caption to address this: […] Fluctuations in interface position can dry and redissolve DNA repeatedly (see “Dried DNA” in right panel). […]

      (6) Figure 4b and c: How reproducible is this data? Have the authors performed this reaction multiple independent times? If so, this data should be added to the manuscript.

      The gel electrophoresis was performed in triplicate and the data are shown in full in the Supplementary Information. The data in c are hard to reproduce, as the interface is not static and ROI measurements are therefore difficult to average over repeats. Including the data from the independent repeats will, however, give the reader insight into some of the experimental difficulties, such as air bubbles that form from degassing as the liquid heats up and travel upwards to the interface, disrupting the ongoing fluorescence measurements.

      This was also pointed out by reviewer 1 and addressed there.

      (7) Line 256: "shielding from harmful UV" statement only applies to RNA oligomers as UV light may actually be beneficial for earlier steps during ribonucleoside synthesis. I suggest rephrasing to "shielding nucleic acid oligomers from UV damage.".

      Will be adjusted as mentioned.

      Fixed.

      (8) The final paragraph in the Results and Discussion section would flow better if placed in the Conclusion section.

      This is a good point and we will merge results and discussion closer together.

      Fixed.

      (9) Line 262, "...of early Life" is slightly overstating the conclusions of the study. I suggest rephrasing to "...of nucleic acids that could have supported early life."

      This is a fair comment. We thank the reviewer for their detailed analysis of the manuscript!

      Changed the phrase to: […]In this work we investigated a prebiotically plausible and abundant geological environment to support the replication of nucleic acids. […]

      (10) In references, some of the journal names are in sentence case while others are in title case (see references 23 and 26 for example).

      Thanks - this will be fixed.

      Fixed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This study provides compelling evidence that RAR, rather than its obligate dimerization partner RXR, is functionally limiting for chromatin binding. This manuscript provides a paradigm for how to dissect the complicated regulatory networks formed by dimerizing transcription factor families.

      Dahal and colleagues use advanced SMT techniques to revisit the role of RXR in DNA-binding of the type-2 nuclear receptor (T2NR) RAR. The dominant consensus model for regulated DNA binding of T2NRs posits that they compete for a limited pool of RXR to form an obligate T2NR-RXR dimer. Using advanced SMT and proximity-assisted photoactivation technologies, Dahal et al. now test the effect of manipulating the endogenous pool size of RAR and RXR on heterodimerization and DNA-binding in live U2OS cells. Surprisingly, it turns out that RAR, rather than RXR, is functionally limiting for heterodimerization and chromatin binding. By inference, the relative pool size of various T2NRs expressed in a given cell, rather than RXR, is likely to determine chromatin binding and transcriptional output.

      The conclusions of this study are well supported by the experimental results and provide unexpected novel insights into the functioning of the clinically important class of T2NR TFs. Moreover, the presented results show how the use of novel technologies can put long-standing theories on how transcription factors work upside down. This manuscript provides a paradigm for how to further dissect the complicated regulatory networks formed by T2NRs or other dimerizing TFs. I found this to be a complete story that does not require additional experimental work. However, I do have some suggestions for the authors to consider.

      Reviewer #1 (Recommendations For The Authors):

      (1) Does the increased chromatin binding measured when the RAR levels are increased reflect a higher occupancy of a similar set of loci, or are additional loci bound? The authors could discuss this issue in the context of the published literature. Obviously, this could be addressed experimentally by ChIP-seq or a similar analysis, but this would extend beyond the main topic of this manuscript.

      We attempted to explore this experimentally using ChIP-seq with multiple RAR- and RXR-specific antibodies. Unfortunately, our results were inconclusive, as the antibody enrichment relative to the IgG control was insufficient for reliable interpretation. Specifically, our ChIP-seq enrichment levels were only around 1.5-fold, while the accepted standard for meaningful ChIP enrichment is typically at least 2-fold. Due to these technical limitations, we decided to defer these experiments for now.

      However, we agree with the reviewer that understanding whether the increased chromatin binding of RAR reflects higher occupancy at the same set of loci or binding to additional loci is a key question. In similar experiments involving the transcription factor TFEB (Esbin et al., 2024, Genes Dev, doi: 10.1101/gad.351633.124), where an increase in the SMT bound fraction occurred, both scenarios, higher occupancy at known loci and binding to additional loci, were observed by ChIP-seq. Addressing this intriguing possibility in future studies focused on RAR and RXR would therefore be interesting.

      (2) The results presented suggest convincingly that endogenous RXR is normally in excess to its binding partners (in U2OS cells). This point could be strengthened further by reducing RXR levels, e.g., by knocking out 1 allele or the use of shRNAs (although the latter method might be too hard to control). Overexpression of another T2NR might also help determine the buffer capacity of RXR.

      We appreciate the reviewer’s acknowledgment that our results convincingly demonstrate that endogenous RXR is typically in excess relative to its binding partners in U2OS cells. We agree that this conclusion could be further reinforced by experiments such as overexpression of another T2NR to test RXR's buffering capacity. We are actively pursuing follow-up experiments involving overexpression of additional T2NRs to address this question in more detail. These studies are ongoing, and we plan to explore the buffer capacity of RXR more extensively in a future manuscript.

      (3) The ~10% difference in fbound of RAR and RXR (in Figs 1 and 2), while they should be 1:1 dimers, is explained by invoking the expression of RXR isoforms. Can the authors be more specific concerning the nature of these isoforms?

      We have provided detailed information about the different T2NRs expressed in U2OS cells according to the Expression Atlas and the Human Protein Atlas Database in Supplementary Table S1. Table S1 specifically shows that both the RXRα and RXRβ isoforms are expressed in U2OS cells. Additionally, the caption of Table S1 explicitly notes the presence of the RXRβ isoform in U2OS cells. In the main text, we reference Table S1 when discussing the 10% difference in fbound between RARα and RXRα, and we now suggest that the expression of RXRβ likely accounts for the observed discrepancy.

      Reviewer #2 (Public Review):

      Summary:

      In the manuscript "Surprising Features of Nuclear Receptor Interaction Networks Revealed by Live Cell Single Molecule Imaging", Dahal et al combine fast single molecule tracking (SMT) with proximity-assisted photoactivation (PAPA) to study the interaction between RARa and RXRa. The prevalent model in the nuclear receptor field suggests that type II nuclear receptors compete for a limiting pool of their partner RXRa. Contrary to this, the authors find that over-expression of RARa but not RXRa increases the fraction of RXRa molecules bound to chromatin, which leads them to conclude that the limiting factor is the abundance of RARa and not RXRa. The authors also perform experiments with a known RARa agonist, all trans retinoic acid (atRA) which has little effect on the bound fraction. Using PAPA, they show that chromatin binding increases upon dimerization of RARa and RXRa.

      Strengths:

      In my view, the biggest strength of this study is the use of endogenously tagged RARa and RXRa cell lines. As the authors point out, most previous studies used either in vitro assays or over-expression. I commend the authors on the generation of single-cell clones of knock-in RARa-Halo and Halo-RXRa. The authors then carefully measure the abundance of each protein using FACS, which is very helpful when comparing across conditions. The manuscript is generally well written and figures are easy to follow. The consistent color-scheme used throughout the manuscript is very helpful.

      Weaknesses:

      (1) Agonist treatment:

      The authors test the effect of all trans retinoic acid (atRA) on the bound fraction of RARa and RXRa and find that "These results are consistent with the classic model in which dimerization and chromatin binding of T2NRs are ligand independent." However, all the agonist treatments are done in media containing FBS. FBS is not chemically defined and has been found to have between 10 and 50 nM atRA (see references in PMID 32359651 for example). The addition of 1 nM or 100 nM atRA is unlikely to result in a strong effect since the medium already contains comparable or higher levels of agonist. To test their hypothesis of ligand-independent dimerization, the authors should deplete the media of atRA by growing the cells in a medium containing charcoal-stripped FBS for at least 24 hours before adding agonist.

      We acknowledge the reviewer's concern regarding the presence of atRA in FBS and agree that it may introduce baseline levels of agonist. However, in our experiments, both the 1 nM and 100 nM atRA treatments resulted in observable changes in RAR expression levels (Figure S3C). Additionally, the luciferase assays demonstrated that 100 nM atRA significantly increased retinoic acid-responsive promoter activity (Figure S1C). Given these clear responses to atRA, we believe the observed lack of effect on the chromatin-bound fraction cannot be attributed to the presence of comparable or higher levels of atRA in the FBS, as the reviewer suggests. Moreover, since our results align with the established literature and do not impact the core findings of our study, we decided not to pursue the suggested experiments with charcoal-stripped FBS in this manuscript.  

      (2) Photobleaching and its effect on bound fraction measurements:

      The authors discard the first 500 to 1000 frames due to the high localization density in the initial frames. This will preferentially discard bound molecules that will bleach in the initial frames of the movie and lead to an over-estimation of the unbound fraction.

      For experiments with over-expression of RAR-Halo and Halo-RXR, the authors state that the cells were pre-bleached and that these frames were used to calculate the mean intensity of the nuclei. When pre-bleaching, bound molecules will preferentially bleach before the diffusing population. This will again lead to an over-representation of the unbound fraction since this is the population that will remain relatively unaffected by the pre-bleaching. Indeed, the bound fraction for over-expressed RARa and RXRa is significantly lower than that for the corresponding knock in lines. To confirm whether this is a biological result, I suggest that the authors either reduce the amount of dye they use so that this pre-bleaching is not necessary or use the direct reactivation strategy they use for their PAPA experiments to eliminate the pre-bleaching step.

      As for the measurement of the nuclear intensity, since the authors have access to multiple HaloTag dyes, they can saturate the HaloTagged proteins with a high concentration of JF646 or JFX650 to measure the mean intensity of the protein while still using the PA-JFX549 for SMT. Together, these will eliminate the need to prebleach or discard any frames.

      The Janelia Fluor dyes used in our experiments are known for their high photostability (Grimm et al., 2021, JACS Au, doi: 10.1021/jacsau.1c00006). During the initial 80 ms imaging to calculate the mean nuclear intensity, the laser power was kept at very low intensity (~3%) for a brief duration (~10 seconds), in contrast to the high intensity (~100%) used during the tracking experiments, which span around 3 minutes. This low-power illumination does not induce significant photobleaching but merely puts the dyes in a temporary dark state. Therefore, this pre-bleaching step closely resembles the direct reactivation strategy employed in our PAPA experiments.

      To further address the reviewer's concern, we performed a frame cut-off analysis for our SMT movies of endogenous RARα-Halo and over-expressed RARα-Halo (Figure S9B). The analysis shows no significant change in the bound fraction of either endogenous or over-expressed RARα-Halo when discarding the initial 1000 frames. Based on these results, we conclude that the pre-bleaching does not lead to an overestimation of the unbound fraction, and that our experimental approach is robust.
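      The logic of a frame cut-off analysis can be sketched in a few lines. This is a simplified illustration rather than the analysis used for Figure S9B: it assumes each trajectory has already been reduced to a hypothetical pair of (first frame, apparent diffusion coefficient in µm²/s) and classifies it with a hard threshold, whereas the manuscript infers the bound fraction with a state array model. The data values are invented.

```python
def bound_fraction_after_cutoff(trajectories, first_frame_kept,
                                bound_cutoff=0.15):
    """Fraction of retained trajectories classified as bound.

    trajectories     : iterable of (first_frame, diffusion_coefficient)
    first_frame_kept : trajectories starting before this frame are discarded
    bound_cutoff     : threshold in um^2/s below which a trajectory is bound
    """
    kept = [d for start, d in trajectories if start >= first_frame_kept]
    if not kept:
        return float("nan")
    return sum(d < bound_cutoff for d in kept) / len(kept)


if __name__ == "__main__":
    # Invented example data: (first frame, diffusion coefficient).
    example = [(0, 0.05), (10, 2.0), (600, 0.03), (700, 5.0),
               (1200, 0.08), (1500, 3.0), (1800, 0.10), (2500, 8.0)]
    for cutoff_frame in (0, 500, 1000):
        print(cutoff_frame, bound_fraction_after_cutoff(example, cutoff_frame))
```

      If discarding early frames preferentially removed bound molecules, the value returned for later cut-off frames would drop; a flat curve argues against such a bias.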

      (3) Heterogeneous expression of the SNAP fusion proteins:

      The cell lines expressing SNAP tagged transgenes shown in Fig S6 have very heterogeneous expression of the SNAP proteins. While the bulk measurements done by Western blotting are useful, while doing single-cell experiments (especially with small numbers - ~20 - of cells), it is important to control for expression levels. Since these transgenic stable lines were not FACS sorted, it would be helpful for the reader to know the spread in the distribution of mean intensities of the SNAP proteins for the cells that the SMT data are presented for. This step is crucial while claiming the absence of an effect upon over-expression and can easily be done with a SNAPTag ligand such as SF650 using the procedure outlined for the over-expressed HaloTag proteins.

      We agree with the reviewer that there is heterogeneity in SNAP protein expression across the transgenic lines. In response to the reviewer’s suggestion, we performed the proposed experiment to assess the distribution of mean intensities for two key experimental conditions: Halo-RXRα with overexpressed RARα-SNAP and Halo-RXRα with overexpressed RARαRR-SNAP. These results again confirm that the increase in chromatin-bound fraction of Halo-RXRα is observed only in the presence of RARα capable of heterodimerizing with RXRα, supporting our main conclusion (Figure S9).

      For these experiments, we followed the same labelling procedure described in the methods section for tracking endogenous Halo-tagged proteins alongside transgenic SNAP proteins. As shown in Figure S9, for ~70 cell nuclei, the distribution of mean intensities is similar for both conditions, with the bound fraction of Halo-RXRα significantly increasing in the presence of RARα-SNAP compared to RARαRR-SNAP. This analysis underscores that the observed effects are indeed due to the functional differences between the two RARα variants rather than variability in expression levels.

      (4) Definition of bound molecules:

      The authors state that molecules with a diffusion coefficient less than 0.15 um2/s are considered bound and those between 1-15 um2/s are considered unbound. Clarification is needed on how this threshold was determined. In previous publications using saSPT, the authors have used a cutoff of 0.1 um2/s (for example, PMID 36066004, 36322456). Do the results rely on a specific cutoff? A diffusion coefficient by itself is only a useful measure of normal diffusion. Bound molecules are unlikely to be undergoing Brownian motion, but the state array method implemented here does not seem to account for non-normal diffusive modes. How valid is this assumption here?

      We acknowledge the inconsistency in the diffusion coefficient thresholds for defining the chromatin-bound fraction used across our group’s publications. The choice of threshold or cutoff (0.1 µm²/s vs 0.15 µm²/s) is largely arbitrary and does not significantly impact the results. To validate this, we tested the effect of different cutoffs on fbound (%) for endogenously expressed Halo-tagged RARα and RXRα (Figure S10). As shown in Figure S10, there was no substantial difference in fbound (%) calculated using a 0.1 µm²/s versus 0.15 µm²/s cutoff (e.g., RARα clone c156: 47±1% vs 49±1%; RXRα clone D6: 34±1% vs 35±1%). 

      Since we have consistently applied the 0.15 µm²/s cutoff throughout this manuscript across all experimental conditions, the comparative analysis of fbound (%) remains valid. While we agree that a Brownian diffusion model may not fully capture the motion of bound molecules, our state array model accounts for localization error, which likely incorporates some of the chromatin motion features. Moreover, the distinction between bound (<0.15 µm²/s) and unbound (1-15 µm²/s) populations is sufficiently large that using a normal diffusion model is reasonable for our analysis.
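      The insensitivity to the cutoff can be illustrated with a synthetic diffusion spectrum. The sketch below does not use the manuscript's data or the saSPT package: it builds a hypothetical occupancy distribution on a log-spaced grid of diffusion coefficients, with an invented bound peak near 0.03 µm²/s and an invented mobile peak near 3 µm²/s, and sums the occupancy below each cutoff.

```python
import numpy as np


def fraction_bound(diff_coefs, occupancies, cutoff):
    """Share of total occupancy at diffusion coefficients below `cutoff`."""
    diff_coefs = np.asarray(diff_coefs, dtype=float)
    occupancies = np.asarray(occupancies, dtype=float)
    return occupancies[diff_coefs < cutoff].sum() / occupancies.sum()


def example_spectrum():
    """Hypothetical two-peak spectrum on a log-spaced grid (um^2/s)."""
    grid = np.logspace(-2, 2, 100)
    log_d = np.log10(grid)
    bound_peak = 0.45 * np.exp(-0.5 * ((log_d - np.log10(0.03)) / 0.15) ** 2)
    mobile_peak = 0.55 * np.exp(-0.5 * ((log_d - np.log10(3.0)) / 0.25) ** 2)
    return grid, bound_peak + mobile_peak


if __name__ == "__main__":
    grid, spectrum = example_spectrum()
    for cutoff in (0.1, 0.15):
        print(cutoff, round(fraction_bound(grid, spectrum, cutoff), 4))
```

      Because almost no occupancy lies between the two candidate cutoffs in a well-separated spectrum like this one, the computed fraction barely changes, which is the situation described for Figure S10.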

      (5) Movies:

      Since this is an imaging manuscript, I request the authors to provide representative movies for all the presented conditions. This is an essential component for a reader to evaluate the data and for them to benchmark their own images if they are to try to reproduce these findings.

      We have now included representative movies for all the SMT experimental conditions presented in the manuscript. Please see the Data Availability section of the manuscript.

      (6) Definition of an ROI:

      The authors state that "ROI of random size but with maximum possible area was selected to fit into the interior of the nuclei" while imaging. However, the readout speed of the Andor iXon Ultra 897 depends on the size of the defined ROI. If the ROI was variable for every movie, how do the authors ensure the same sampling rate?

      We used the frame transfer mode on the Andor iXon Ultra 897 camera for our acquisitions, which allows for fast frame rate measurements without altering the exposure time between frames. Additionally, we verified the metadata of all our movies to ensure a consistent frame interval of 7.4 ms across all conditions. This confirms that the sampling rate was maintained uniformly, despite the variability in ROI size. 
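      A metadata check of this kind could look like the following minimal sketch. It assumes the frame interval of each movie has already been read from the acquisition metadata into a dictionary; the movie names, the tolerance, and the helper function are hypothetical, and reading the metadata from the actual file format is not shown.

```python
def movies_with_inconsistent_interval(intervals_ms, expected_ms=7.4,
                                      tolerance_ms=0.05):
    """Return the names of movies whose frame interval deviates from
    `expected_ms` by more than `tolerance_ms` (all values in ms)."""
    return [name for name, interval in intervals_ms.items()
            if abs(interval - expected_ms) > tolerance_ms]


if __name__ == "__main__":
    # Invented metadata values for illustration only.
    intervals = {"movie_01": 7.4, "movie_02": 7.4, "movie_03": 9.1}
    print(movies_with_inconsistent_interval(intervals))
```

      An empty list would indicate that every movie was sampled at the expected interval regardless of its ROI size.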

      Reviewer #2 (Recommendations For The Authors):

      (1) 'Hoechst' is mis-spelled.

      We have now corrected this typo in the manuscript.

      (2) Cos7 appears in several places throughout the text. I assume this is a typo. If so, please correct it. If not, please explain if some experiments were done in Cos7 cells and kindly provide a justification for that.

      The use of Cos7 cells is intentional and not a typo. Cos7 cells have been previously utilized in studies investigating the interaction between T2NRs (Kliewer et al., 1992, Nature, doi: 10.1038/355446a0). In our study, due to technical issues with antibodies for coIP in U2OS cells, we initially used Cos7 cells for control experiments to verify that Halo-tagging of RARα and RXRα did not disrupt their interaction, by transiently expressing the constructs in Cos7 cells. Following these control experiments, we confirmed the direct interaction of endogenously expressed RAR and RXR in U2OS cells with their respective binding partners using the SMT-PAPA assay. Since these results confirmed that Halo-tagging did not interfere with RAR-RXR interactions, we chose not to repeat the coIP experiments in U2OS cells.

      Reviewer #3 (Public Review):

      Summary:

      This study aims to investigate the stoichiometric effect between core factors and partners forming the heterodimeric transcription factor network in living cells at endogenous expression levels. Using state-of-the-art single-molecule analysis techniques, the authors tracked individual RARα and RXRα molecules labeled by HALO-tag knock-in. They discovered an asymmetric response to the overexpression of counter-partners. Specifically, the fact that an increase in RARα did not lead to an increase in RXRα chromatin binding is incompatible with the previous competitive core model. Furthermore, by using a technique that visualizes only molecules proximal to partners, they directly linked transcription factor heterodimerization to chromatin binding.

      Strengths:

      The carefully designed experiments, from knock-in cell constructions to single-molecule imaging analysis, strengthen the evidence of the stoichiometric perturbation response of endogenous proteins. The novel finding that RXR, previously thought to be a target of competition among partners, is in excess provides new insight into key factors in dimerization network regulation. By combining the cutting-edge single-molecule imaging analysis with the technique for detecting interactions developed by the authors' group, they have directly illustrated the relationship between the physical interactions of dimeric transcription factors and chromatin binding. This has enabled interaction analysis in live cells that was challenging in single-molecule imaging, proving it is a powerful tool for studying endogenous proteins.

      Weaknesses:

      As the authors have mentioned, they have not investigated the effects of other T2NRs or RXR isoforms. These invisible factors leave room for interpretation regarding the origin of chromatin binding of endogenous proteins (Recommendations 4). In the PAPA experiments, overexpressed factors are visualized, but changes in chromatin binding of endogenous proteins due to interactions with the overexpressed proteins have not been investigated. This might be tested by reversing the fluorescent ligands for the Sender and Receiver. Additionally, the PAPA experiments are likely to be strengthened by control experiments (Recommendations 5).

      We agree that this would be an interesting experiment. However, there are three technical challenges that complicate its implementation: First, as demonstrated in our original PAPA paper, dark state formation is less efficient when dyes are conjugated to Halo compared to SNAPf, making the reverse configuration less optimal. Second, SNAPf-tagged proteins have slower labeling kinetics than Halo-tagged proteins, often resulting in under-labeling of SNAPf. Third, our SNAPf transgenes were integrated polyclonally. Since background PAPA scales with the concentration of the sender-labeled protein, variable concentrations of the sender-labeled SNAPf proteins would introduce significant variability, complicating the interpretation of the background PAPA signal. Due to these concerns, we believe that performing reciprocal measurements with reversed fluorescent ligands may not yield reliable results.

      Reviewer #3 (Recommendations For The Authors):

      (1) The term "Surprising features" in the title is ambiguous and may force readers to search for what it specifically refers to. Including a word that evokes specific features might be helpful.

      Our findings contradict previous work, which suggested that chromatin binding of T2NRs is regulated by competition for a limited pool of RXR. In contrast, we found that RAR expression can limit RXR chromatin binding, but not the other way around, which challenges the existing model. This unexpected result is what we refer to as a "surprising feature" in our title, and we believe it accurately reflects the novel insights our study provides. We also think that this is clearly conveyed in our manuscript abstract, supporting the use of "Surprising features" in the title. 

      (2) p.3, line 11 - The threshold of 0.15 μm2s-1 seems to be a crucial value directly linked to the value of fbound. What is the rationale for choosing this specific value? If consistent conclusions can be obtained using threshold values that are similar but different, it would strengthen the robustness of the results.

      Please refer to our response to Reviewer #2’s Public Review point 4. The threshold choice is arbitrary and doesn’t affect the overall conclusions. To test this, we compared fbound (%) values calculated using both 0.1 μm²s-1 and 0.15 μm²s-1 cutoffs. For example, with endogenously expressed Halo-tagged RARα (clone c156), we observed fbound values of 47±1% vs 49±1%, and for RXRα (clone D6), 34±1% vs 35±1%, respectively (Figure S10). Since we have consistently applied the 0.15 μm²s-1 cutoff across all experimental conditions in this manuscript, the comparisons of fbound (%) between different conditions are robust and valid.

      (3) p.4, line 13 - "the fbound of endogenous RARα-Halo (47±1%) was largely unchanged upon expression of SNAP (47±1%)" part of the sentence is not surprising. It would make more sense if it were expressed as "the fbound of endogenous RARα-Halo (47±1%) was largely unchanged upon expression of RXRα-SNAP (49±1%), consistent with the control SNAP (47±1%).".

      We understand how the original phrasing may be confusing to the readers and have restructured the sentence as suggested by the reviewer for clarity.

      (4) p.6, line 26 - The discussion that "most chromatin binding of endogenous RXRα in U2OS cells depends on heterodimerization partners other than RARα" seems to contradict the top right figure in Figure 4. If that's the case, the binding partner for the bound red molecule might be yellow rather than blue. Given a decrease in the number of RARα molecules with an unchanged binding ratio, the total number of binding molecules has decreased. Could it be interpreted that the potential reduction in RXRα chromatin binding, accompanying the decrease in binding RARα, is compensated for by other partners?

      We agree with the reviewer that both the yellow and blue molecules in Figure 4 represent T2NRs that can heterodimerize with RXR. For simplicity, we chose to omit the depiction of RXR dimerization with other T2NRs (represented in yellow) in Figure 4. We have now included a note in the figure caption to clarify this. We plan to follow up on the buffer capacity of RXR with other T2NRs in a separate manuscript and will discuss this aspect in more detail once we have data from those experiments.

      (5) Fig. 3 - I expected that DR localizations always appear more frequently than PAPA localizations by the difference in the number of distal molecules. Why does the linear line for SNAP-RXRα in Fig. 3 B have a slope exceeding 1? Also, although the sublinearity is attributed to binding saturation, is there any possibility that this sublinearity originates from the PAPA system like the saturation of PAPA reactivation? Control samples like Halo-SNAPf-3xNLS might address these concerns.

      The number of DR and PAPA localizations depends on the arbitrarily chosen intensity and duration of green and violet light pulses. For any given protein pair, different experimental settings can result in PAPA localizations being greater than, less than, or equal to the number of DR localizations. Therefore, the informative metric is not the absolute number of DR and PAPA localizations, but rather how the ratio of PAPA to DR localizations changes between different conditions—such as between interacting pairs and non-interacting controls.

      Regarding the sublinearity, we agree that it is essential to consider whether the observed sublinearity might stem from saturation of the PAPA signal. We know of two ways in which this could occur:

      First, PAPA can be saturated as the duration of the green light pulse increases and dark-state complexes are depleted. However, this cannot explain the nonlinearity that we observe, because the duration of the green light pulse is constant, and thus the probability that a given complex is reactivated by PAPA is also constant. Likewise, holding the violet pulse duration constant yields a constant probability that a given molecule is reactivated by DR. PAPA localizations are expected to scale linearly with the number of complexes, while DR localizations are expected to scale linearly with the total number of molecules. Sublinear scaling of PAPA localizations with DR localizations thus implies that the number of complexes scales sublinearly with the total concentration of the protein.
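
      The scaling argument above can be written out as a short sketch. The symbols are introduced here for illustration only and are not the authors' notation: $N_{\mathrm{tot}}$ is the total number of labeled molecules, $C$ is the number of complexes, and $p_{\mathrm{DR}}$ and $p_{\mathrm{PAPA}}$ are the per-molecule and per-complex reactivation probabilities, assumed constant because the violet and green pulse durations are held fixed.

```latex
\begin{align}
N_{\mathrm{DR}} &= p_{\mathrm{DR}}\, N_{\mathrm{tot}}, \\
N_{\mathrm{PAPA}} &= p_{\mathrm{PAPA}}\, C, \\
N_{\mathrm{PAPA}} \propto N_{\mathrm{DR}}^{\alpha}
  \;&\Longrightarrow\; C \propto N_{\mathrm{tot}}^{\alpha}
  \quad (\alpha < 1 \text{ for sublinear scaling}).
\end{align}
```

      Under these assumptions, any exponent $\alpha < 1$ measured between PAPA and DR localizations carries over directly to the relationship between complexes and total protein.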

      Second, saturation could occur if PAPA localizations are undercounted compared to DR localizations. While this is a valid concern, we consider it unlikely in this case because 1) our localization density is below the level at which our tracking algorithm typically undercounts localizations, and 2) we observe sublinearity for RXR → RAR PAPA even though the number of PAPA localizations is lower than the DR localizations; undercounting due to excessive localization density would be expected to introduce the opposite bias in this case.

      (6) Fig. 4 - The differences between A, B, and C on the right side of the model are subtle, making it difficult to discern where to see. Emphasizing the difference in molecule numbers or grouping free molecules at the top might help clarify these distinctions.

      We appreciate the reviewer’s feedback. In response, we have revised Figure 4 by grouping the free molecules on the top right side for panels A, B and C, as suggested.

      (7) While the main results are obtained through single-molecule imaging, no single-molecule fluorescence images or trajectory plots are provided. Even just for representative conditions, these could serve as a guide for readers trying to reproduce the experiments with different custom-built microscope setups. Also, considering data availability, depositing the source data might be necessary, at least for the diffusion spectra.

      We have now included representative movies for all the presented SMT conditions as source data. Please see the Data Availability section of the manuscript.

      (8) Tick lines are not visible on many of the graph axes. 

      We have revised the figures to ensure that the tick lines are now clearly visible on all graph axes.

      (9) Inconsistencies in the formatting are present in the methods, such as "hrs" vs. "hours", spacing between numbers and units, and "MgCl2". "u" should be "μ" and "x" should be "×". 

      We have corrected the formatting errors.

      (10) Table S4, rows 16 and 17 - Are "RAR"s typos for "RXR"s? 

      We have corrected this in the manuscript.

      (11) p.10~12 - Are three "Hoestch"s typos for "Hoechst"s? 

      This is now corrected in the manuscript.

      (12) p.11, line 17 - According to the referenced paper, the abbreviation should be "HILO" in all capital letters, not "HiLO". 

      This is now corrected in the manuscript.

      (13) "%" on p.3, line 18, and "." on p.6, line 27 are missing. 

      The missing “%” and “.” have now been added.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Yao S. and colleagues aims to monitor the potential autosomal regulatory role of the master regulator of X chromosome inactivation, the Xist long non-coding RNA. It has recently become apparent that in the human system, Xist RNA can not only spread in cis on the future inactive X chromosome but also reach some autosomal regions where it recruits transcriptional repression and Polycomb marking. Previous work has also reported that Xist RNA can show a diffused signal in some biological contexts in FISH experiments.

      In this study, the authors investigate whether Xist represses autosomal loci in differentiating female mouse embryonic stem cells (ESCs) and somatic mouse embryonic fibroblasts (MEFs). They perform a time course of ESC differentiation followed by Capture Hybridization of Associated RNA Targets (CHART) on both female and male ESCs, as well as pulldowns with sense oligos for Xist. The authors also examine transcriptional activity through RNA-seq and integrate this data with prior ChIP-seq experiments. Additional experiments were conducted in MEFs and Xist-ΔB repeat mutants, the latter of which fail to recruit Polycomb repressors.

      Based on this experimental design, the authors make several bold claims:

      (1) Xist binds to about a hundred specific autosomal regions.

      (2) This binding is specific to promoter regions rather than broad spreading.

      (3) Xist autosomal signal is inversely correlated with PRC1/2 marks but positively correlated with transcription.

      (4) Xist targeting results in the attenuation of transcription at autosomal regions.

      (5) The B-repeat region is important for autosomal Xist binding and gene repression.

      (6) Xist binding to autosomal regions also occurs in somatic cells but does not lead to gene repression.

      Together, these claims suggest that Xist might play a role in modulating the expression of autosomal genes in specific developmental and cellular contexts in mice.

      Strengths:

      This paper deals with an interesting hypothesis that Xist ncRNA can also function at autosomal loci.

      Weaknesses:

      The claims reported in this paper are largely unsubstantiated by the data, with multiple misinterpretations, lacking controls, and inadequate statistics. Fundamental flaws in the experimental design/analysis preclude the validity of the findings. Major concerns are listed below:

      (1) The entire paper is based on the CHART observation that Xist is specifically targeted to autosomal promoters. Overall, the data analysis is flawed and does not support such conclusions. Importantly the sense WT and the 0h controls are not used, nor are the biological replicates.

      We respectfully disagree with Rev1 but nevertheless thank the reviewer for making some suggestions that helped to strengthen our manuscript.  We have provided new experiments and analyses in the revised manuscript. Please see responses below.

      Rev1 seems to have missed or misunderstood some key experiments. In fact, the sense WT and 0h controls were shown. Furthermore, we included at least two biological replicates for each experiment.

      We used both male ES cells (which do not express Xist) and sense probes as key negative controls, as outlined in Figure S1. Crucially, we only analyzed peaks that were reproducible between biological replicates. The Xist CHART peaks in differentiating female ES cells were significantly enriched above the “background” defined by the sense probe and male controls. Specifically, in comparison to undifferentiated female ES cells (day 0) where both X chromosomes are active and Xist is not induced, Xist CHART robustly pulled down the X chromosome during cell differentiation (day 4, day 7, and day 14). In contrast, male ES cells showed no significant pull-down of the X chromosome, and the sense group also exhibited markedly reduced binding (new Figure S1B). Furthermore, Principal Component Analysis (PCA) of CHART-seq reads (day 4 as an example), including Xist, sense, and input samples from WT and ΔRepB female cells, further confirmed that the sense probe CHART was clearly distinguishable from Xist CHART signals. Please see revised Figure S1C. Together, these findings underscore the specificity and robustness of our CHART results.

      Data is typically visualized without quantification, and when quantified, control loci/gene sets are erroneously selected. Firstly, CHART validation on the X in FigS1 is misleading and not based on any quantifications (e.g., see the scale on Kdm6a (0-190) compared to Cdkl5 (0-40)). If scaled appropriately, there is Xist signal on the escapee. 

      Rev1 may have misread the presented data. In the example raised by Rev1, Fig. S1 is inherently quantitative: e.g., a ratio is a number in Fig. S1A (now Fig. S1B) and all gene tracks in Fig. 1B-E are shown with scales. We showed X-linked genes in Fig. S1 (now Fig. S2) as a control to demonstrate that the CHART worked and that Xist accumulated over time from day 0 to day 14. Our new Figure 1B demonstrates the Xist accumulation in graph format. 

      Our paper focuses on Xist autosomal binding sites. Thus, the X-linked examples were placed in the supplement. Escapee genes do in fact accumulate Xist at their promoter regions and this finding is consistent with data published by Simon et al. (2013, Nature). It was therefore not desirable in this paper to reanalyze X-linked genes, including escapees. Nevertheless, to address the reviewer’s concerns, we present new data in new Figure S3A. Here we analyzed the density of Xist binding across X-linked genes, including both active and inactive genes, as well as escapee genes. From this quantitative analysis, it should be clear that escapees do bind Xist. However, from the metagene plots in Figure S3B, we confirm the previous conclusion that escapees bind Xist at high levels just upstream of the promoter and that there is a depletion of Xist in the escapee gene body, consistent with a barrier preventing Xist from moving into the active gene. 

      All X-linked loci should have been quantified and classified based on escape status; sense control should also be quantified, and biological replicates should be shown separately. 

      Please see above response.

      Additionally, in the revised manuscript, we examined the Irreproducible Discovery Rate (IDR) to validate the reproducibility of peaks between the two replicates, and we included a representative example from female WT ES cells at day 4 (revised Figure S4A). The results showed a strong correlation between the replicates, with an IDR threshold of 0.05 (red point > 0.05). As described in the Methods section, to ensure reliable and robust peak identification, we performed peak calling (MACS2) separately on each replicate, and then used bedtools intersect to identify peaks that overlapped between the two replicates. This stringent process, including strict q-value settings in MACS2, ensures the reliability and reproducibility of the peaks presented in this study.
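
      For readers unfamiliar with the replicate-overlap step described above, the following is a minimal Python sketch of the idea: keep replicate-1 peaks that overlap any replicate-2 peak by at least one base. It is an illustration only, not the authors' pipeline; the function name `reproducible_peaks` and the toy coordinates are hypothetical, and the exact MACS2 and bedtools options used in the study are not stated here.

```python
from typing import Dict, List, Tuple

# A peak is (chromosome, start, end) with half-open coordinates, as in BED files.
Peak = Tuple[str, int, int]


def reproducible_peaks(rep1: List[Peak], rep2: List[Peak]) -> List[Peak]:
    """Return the rep1 peaks that overlap at least one rep2 peak by >= 1 bp."""
    # Group replicate-2 intervals by chromosome for lookup.
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in rep2:
        by_chrom.setdefault(chrom, []).append((start, end))

    kept: List[Peak] = []
    for chrom, start, end in rep1:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(start < e and s < end for s, e in by_chrom.get(chrom, [])):
            kept.append((chrom, start, end))
    return kept


# Toy example with made-up coordinates.
rep1_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
rep2_peaks = [("chr1", 150, 250), ("chr2", 80, 120)]
shared = reproducible_peaks(rep1_peaks, rep2_peaks)
```

      In this toy case only the first chr1 peak is retained; the chr2 peaks merely touch at position 80 and therefore do not overlap under half-open coordinates.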

      Secondly, and most importantly, Figure 1 does not convincingly show specific Xist autosomal binding. Panel A quantification is on extremely variable y-scales and actually shows that Xist is recruited globally to nearly all autosomal genes, likely indicating an unspecific signal. Again, the sense and 0h controls should have been quantified along with biological replicates. 

      Figure 1 shows heatmaps and corresponding metagenes for d0, d4, d7, and d14 female ES cells. Two biological replicates are analyzed. In our revised manuscript, we have used Pearson and Spearman correlation coefficients to measure the strength and direction of a relationship between two biological replicates and shown that the two replicates have high reproducibility (new Figure S1A). On d0, the Xist coverage on autosomes and X chromosome is low, but there is a clear increase on d4, d7, and d14, particularly at the TSS of autosomal genes, as shown by the metagene plots in Figure 1A-B and the CHART density maps in new Figure 1E-F. We also show relative depletion of Xist signals in the male and sense negative controls.
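
      As an illustration of the replicate comparison mentioned above, the sketch below computes Pearson and Spearman coefficients for two vectors of per-bin coverage using only the Python standard library. The input values are invented for demonstration, the helper names are hypothetical, and the binning and normalization actually used for Figure S1A are not specified here.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up per-bin coverage for two replicates (monotonic but not linear).
rep1_cov = [1.0, 2.0, 3.0, 4.0, 5.0]
rep2_cov = [1.0, 4.0, 9.0, 16.0, 25.0]
r_pearson = pearson(rep1_cov, rep2_cov)
r_spearman = spearman(rep1_cov, rep2_cov)
```

      Reporting both coefficients is useful because Pearson is sensitive to a few very high-coverage bins, whereas Spearman depends only on rank order.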

      Upon inspecting genome browser tracks of all regions reported in the manuscript (Rbm14, Srp9, Brf1, Cand2, Thra, Kmt2c, Kmt2e, Stau2, and Bcl7b), the signal is unspecific on all sites with the possible exception of Kmt2e. On all other loci, there is either a strong signal in the 0h ESC controls or more signal in some of the sense controls. This implies that peak calling is picking up false positive regions. How many peaks would have been picked up if the sense or the 0h controls were used for peak calling? It is likely that there would be a lot since there are also possible "peaks" (e.g., Fzd9) in control tracks. 

      The analysis cannot be performed by visual inspection. A statistical analysis must be performed to call signal above noise. This is why we performed peak-calling on two biological replicates and identified overlapping peaks using bedtools intersect to improve reliability. Significant peaks are noted as black bars under each track. As mentioned above, for our analysis, we focused on the top 100 peaks based on peak scores to ensure robustness. Xist has significantly higher signal compared to the sense probe in the Xist-autosomal peak regions (revised Figure 1E-F). Additionally, we conducted peak calling on undifferentiated ES cells (d0) and detected a significantly higher number of peaks (~600) compared to the differentiated states (d4 or d7) (~100).

      Single-cell sequencing studies have shown that about 2% of undifferentiated mESCs express detectable Xist (Pacini et al., Nat Commun, 2021). The Xist peaks in “day 0” cells may be due to the differentiating population.

      Further inspection of the data was not possible as the authors did not provide access to the raw fastq files. When inspecting results from past published experiments {Engreitz, 2013 #1839} reported regions were not bound by Xist. 

      On the contrary, we deposited the raw data files to GEO prior to the submission of the paper and included the reviewer link to access them. As of August 24, 2024, GEO publicly released these files, allowing for full inspection of the data. 

      Regarding the Engreitz publication, it is not recommended to compare our current study to their analysis for the crucial reason that the Engreitz study was not conducted under physiological conditions. The authors overexpressed the Xist gene in male ES cells. Because Xist RNA can silence genes in male cells as well, this ectopic overexpression normally leads to cell death — thus forcing examination of effects in a narrow time window before Xist can fully spread and act across the genome. Comparing our experiments (endogenous Xist expression in female ES cells) to the ectopic overexpression in male ES cells of Engreitz et al. should therefore not be undertaken.

      Thirdly, contrary to the authors' claim, deleting the B repeat does not lead to a loss of autosomal signal. Indeed, comparing Fig1A and Fig2B side by side clearly shows no difference in the autosomal signal, likely because the autosomal signal is CHART background. Properly quantifying the signal with separate replicates as well as the sense and 0h controls is vital. Overall current data together with published results indicate that CHART peak calling on autosomes is due to technical noise or artefacts.

      In our revised manuscript, we have included the quantitative results as mentioned above in the main and supplementary figures (new Figure 1E-F, Figure 2E-F, and S3A). The data clearly show an enrichment in the Xist CHART samples in differentiating female ES cells.

      We believe the reviewer may be comparing the original Figure 1A and Figure 2A (not Figure 2B). As mentioned above, the analysis cannot be performed by visual inspection. Please see new Figure 2E and 2F. From these data, it should be clear that deleting RepB causes a decrease in Xist targeting to autosomal loci.

      (2) The RNA-seq analysis is also flawed and precludes strong statements. Firstly, the analysis frequently lacks statistical analysis (Fig3B, FigS2B-C) and is often based on visualizations (Fig 3D-G) without quantifications. Day 4 B-repeat deletion does not lead to a significant change in the expression of genes close to Xist signal (Fig3H, d14 does not fully show). 

      Please see new revised Figure 3B and Figures S2B-C (now revised as Figures S6A and S6B). 

      Secondly, for all transcriptional analysis, it is important to show autosomal non-target genes, which is not always done. 

      In the revised manuscript, we included non-target genes for each analysis (new Figure 4E-F, 5D and 5F, 7C and 7E, S7F, S8).

      Indeed, both males and B repeat deletion will lead to transcriptional changes on autosomes as a secondary effect from different X inactivation status. The control set, if used, is inappropriate as it compares one randomly selected set of ~100 genes. This introduces sampling error and compares different classes of genes. Since Xist signal targets more active genes, it is important to always compare autosomal target genes to all other autosomal genes with similar basal expression patterns.

      Please see new Figure S8. We included 100 randomly selected non-target sites on autosomes for this comparative analysis. For consistency, we applied the same flanking regions (10 kb) in the analysis of both target and non-target genes. We believe that this selection method for nontargets is appropriate for two reasons: first, it allows us to control for Xist binding and non-binding; second, it ensures a similar number of genes in both groups, providing a robust foundation for statistical analysis. 
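
      The control selection described above can be sketched as follows. This is a hypothetical illustration rather than the authors' code: the gene identifiers, the helper names `sample_non_targets` and `flank`, and the fixed random seed are all assumptions, and the sketch does not match controls on basal expression.

```python
import random
from typing import Iterable, List, Tuple


def flank(start: int, end: int, flank_bp: int = 10_000) -> Tuple[int, int]:
    """Extend an interval by flank_bp on each side, clipped at position 0."""
    return max(0, start - flank_bp), end + flank_bp


def sample_non_targets(
    all_genes: Iterable[str],
    target_genes: Iterable[str],
    n: int = 100,
    seed: int = 0,
) -> List[str]:
    """Draw n genes at random from the autosomal genes that are not targets."""
    targets = set(target_genes)
    # Sort the pool so the draw is reproducible for a given seed.
    pool = sorted(g for g in all_genes if g not in targets)
    rng = random.Random(seed)
    return rng.sample(pool, min(n, len(pool)))


# Toy gene universe with placeholder identifiers.
universe = [f"gene{i}" for i in range(500)]
targets = universe[:100]
controls = sample_non_targets(universe, targets, n=100, seed=0)
window = flank(5_000, 6_000)
```

      Fixing the seed makes a single draw reproducible; repeating the draw with several seeds would indicate how sensitive a comparison is to the particular control set chosen.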

      (3) The ChIP-seq analysis also has some problems. The authors claim that there is no positive correlation between genes close to Xist autosomal binding (10kb) compared to those 50kb away (Fig 3C, S2D); however, this analysis is based entirely on metagene visualization. Signal within the Xist binding sites should be quantified (not genes close by) and compared to other types of genomic loci and promoters. Focusing on the 50kb group only as controls is misleading.

      We believe the reviewer may have misunderstood our conclusions. As stated in the paper, we observed lower coverage of the histone marks H3K27me3 and H2AK119ub, associated with PRC2 and PRC1, respectively. Our conclusions regarding PRC1/2 support the RNA-seq results, indicating that Xist tends to bind to actively expressed genes. In other words, these genes exhibit lower levels of PRC-mediated silencing signals. This observation underscores the relationship between Xist binding and gene activity, highlighting that Xist preferentially associates with regions that are less subject to silencing by polycomb repressive complexes.

      Secondly, the authors only look at PRC mark signal upon differentiation; what about the 0h timepoint, i.e., is there pre-marking? 

      Day 0 is not an appropriate timepoint for this analysis because Xist is not yet induced. There is also a small fraction of cells (<5%) that spontaneously differentiate and start to undergo XCI. Because of these reasons, the day 0 timepoint is considered somewhat heterogeneous and it would be difficult to make conclusions regarding Xist peaks in these samples.

      Most worryingly, the data analysis is not consistent between figures (see Fig3C vs 5H-I). In Fig5, the group of Xist targets was chosen as those within 100kb of Xist binding, which would encompass all the control regions from Fig3C. In this analysis, the authors report that there is Xist-dependent H3K27me3 deposition, and in fact, here the Xist autosomal targets have more of it than the controls. Overall, all of this analysis is misleading, and clear conclusions cannot be made.

      We believe that the reviewer may have also misunderstood the analysis in Figure 5. Figure 5 shows the effect of the Xist inhibitor, X1, on H3K27me3 and gene expression. X1 reduces PRC2 targeting and gene silencing, consistent with X1’s effect on RepA as published in Aguilar et al. 2022. 

      All in all, because the fundamental observation is not robust (see point 1), all subsequent analyses are also affected. There are also multiple other inconsistencies within the analysis; however, they have not been included here for brevity.

      We again respectfully disagree with Rev1 but thank the reviewer for making suggestions that helped to strengthen our manuscript.  We believe that the revised manuscript with new analyses is improved in part because of the reviewer’s critical comments.

      Reviewer #2 (Public review):

      Summary:

      To follow up on recent reports of Xist-autosome interaction, the authors examine female (and male transgenic) mESCs and MEFs by CHARTseq. Upon finding that only 10% of reads map to X, they sought to identify reproducible alternative sites of Xist-binding, and identify ~100 autosomal Xist-binding sites and show a transient impact on expression.

      Strengths:

      The authors address a topical and interesting question with a series of models including developmental timepoints and utilize unbiased approaches (CHARTseq, RNAseq). For the CHARTseq they have controls of both sense probes and male cells; and indeed do detect considerable background with their controls. The use of deletions emphasizes that intact functional Xist is involved. The use of 'metagene' plots provides a visual summation of genic impact.

      Reviewer 2 has made some excellent suggestions. We have revised the manuscript accordingly and are grateful to the reviewer for the recommendations.

      Weaknesses:

      Overall, the result presentation has many 'sample' gene presentations (in contrast to the stronger 'metagene' summation of all genes). The manuscript often relies on discussion of prior X chromosomal studies, while the data generated would allow assessment of the X within this study to confirm concordance with prior results using the current methodology/cell lines. 

      Many of the 'follow-up' analyses are in fact reprocessing and comparison of published datasets. The figure legends are limited, and sample size and/or source of control is not always clear. While similar numbers of autosomal Xist-binding sites were often observed, the presented data did not clarify how many were consistent across time-points/cell types. While there were multiple time points/lines assessed, only 2 replicates were generally done.

      We apologize for the deficiencies in the legend.  The revised manuscript has corrected them.

      We generated many new datasets with deep sequencing, with at least two biological replicates for each. Such experiments are extremely expensive by nature. Thus, two biological replicates are typically considered acceptable.

      Additionally, we performed reanalysis of published datasets to test whether, in the hands of other investigators, cell lines expressing Xist also supported autosomal targeting. Figure 4 is a case in point. Here we examined Tg1 and Tg2, which respond to doxycycline to overexpress Xist from an ectopic site. Transcriptomic analysis showed significant downregulation of autosomal Xist targets, as exemplified by Rbm14 and Bcl7b (new Figure 4C, S9B). In contrast, non-targets of Xist such as Stau1 did not demonstrate significant changes in gene expression (new Figure 4E and 4G). Looking across all autosomal target genes, we observed a significant decrease in mean expression in the Xist overexpressing cell lines (new Figure 4D). The fact that the autosomal changes were also observed in datasets generated by other investigators greatly strengthens our conclusions. 

      Aim achievement:

      The authors do identify autosomal sites with enrichment of chromatin marks and evidence of silencing. More details regarding sample size and controls (both treatment, and most importantly choice of 'non-targets' - discussed in comments to authors) are required to determine if the results support the conclusions.

      Specific scenarios for which I am concerned about the strength of evidence underlying the conclusion:

      I found the conclusion "Thus, RepB is required not only for Xist to localize to the X-chromosome but also for its localization to the ~100 autosomal genes" (p5) to be in contrast to the statement 2 lines prior: "A similar number of Xist peaks across autosomes in ΔRepB cells was observed and the autosomal targets remained similar". Some quantitative statistics would assist in determining impact, both on autosomes and also X; perhaps similar to the quintile analysis done for expression.

      We have added the Xist coverage panels for days 4 and 7 in the identified Xist-autosomal peak regions (new Figure 1E-F, Figure 2E-F), as mentioned above. The results clearly demonstrate that the deletion of RepB decreases Xist binding to autosomes. Also, we showed that ΔRepB increased X-linked gene expression in our revised Figure 3D. 

      It is stated that there is a significant suppression of X-linked genes with the autosomal transgenes; however, only an example is shown in Figure 4B. To support this statement, a full X chromosomal geneset should be shown in panels F and G, which should also list the number of replicates. 

      Please see new Figure 4B.

      As these are hybrid cells, perhaps allelic suppression could be monitored? Is Med14 usually subject to X inactivation in the Ctrl cells, and is the expression reduced from both X chromosomes or preferentially the active (or inactive) X chromosome?

      If Rev2 is referring to Figure 4, the dataset used in Figure 4 comes from another research group and was previously published (Loda, A. et al. Nat Commun, 2017).

      If Rev2 is referring to our ES cells, they are N2 cell lines.  The X chromosomes are fully hybridized (Cas/Mus), but the autosomes are not fully hybridized (Ogawa et al., Science, 2008). Med14 is subject to XCI and is expressed from the Xa, silenced on the Xi. 

      The expression change for autosomes after transgene induction is barely significant; and it was not clear what was used as the Ctrl? This is a critical comparator as doxycycline alone can change expression patterns.

      We agree that there was a modest change in expression after transgene induction, but it is a significant change. Again, the dataset is from a published study where the authors generated doxycycline-responsive Xist transgenes (see above). The control in this case is Dox-treated wildtype cells. We now clarify these points.

      In the discussion there is the statement. "Genetic analysis coupled to transcriptomic analysis showed that Xist down-regulates the target autosomal genes without silencing them. This effect leads to clear sex difference - where female cells express the ~100 or so autosomal genes at a lower level than male cells (Figure 7H)." This sweeping statement fails to include that in MEFs there is no significant expression difference, in transgenics only borderline significance, and at d14 no significant expression difference. The down-regulation overall seems to be transient during development while targeting is ongoing?

      Indeed, the Xist effects on autosomes seem to occur during cell differentiation in ES cells. While there is no apparent effect in MEFs, we cannot exclude effects on other somatic cells. Regardless of whether the effects are in early development or throughout life, the sex differences may have life-long effects in mammals. The study conducted in human cells by the Plath lab also concluded that the differences primarily affect stem cells.

      Finally, I would have liked to see discussion of the consistency of the identified genes to support the conclusion that the autosomal sites are not merely the results of Xist diffusion.

      We address this in the third paragraph of the Discussion. Our main argument is that if autosomal binding were caused by diffusion, then RepB deletion or X1 treatment would have led to increased binding at autosomal sites, as Xist would bind less to the X chromosome. However, as demonstrated in our study, both treatments resulted in reduced Xist binding on both the X chromosome and autosomes. This finding suggests that the binding is specific and reliant on Xist's RepA and RepB domains, rather than being a passive diffusion process.

      To examine overlap between the conditions (days of differentiation and WT/RepB cells), we generated Venn Diagrams as now shown in Figure S4E.

      The impact of Xist on autosomes is important for consideration of impact of changes in Xist expression with disease (notably cancers). Knowing the targets (if consistent) would enable assessment of such impact.

      We thank Rev2 for the very helpful review and for the forward-looking experiments. Indeed, the physiological changes brought on by autosomal targeting will be of future interest.

      Reviewer #3 (Public review):

      Summary:

      Yao et al use CHART to identify chromatin associated with Xist in female mouse ESCs, and, as control, male ESCs at various timepoints of differentiation. Besides binding of Xist to X chromosome regions they found significant binding to autosomes, concentrating mostly on promoter regions of around 100 autosomal genes, as elucidated by MACS. The authors went on to show that the RepB repeat is mostly responsible for these autosomal interactions using a female ESC line in which RepB is deleted. Evidence is provided that Xist interacts with active autosomal genes containing lower coverage of repressive marks H3K27me3 and H2AK119ub and that RepB dependent Xist binding leads to dampening of expression, but not silencing of autosomal genes. These results were confirmed by overexpression studies using transgenic ESCs with doxycycline-inducible Xist as well as via a small molecule inhibitor of Xist (X1), inducing/inhibiting the dampening of autosomal genes, respectively. Finally, using MEFs and Xist mutants RepB or RepE the authors provide evidence that Xist is bound to autosomal genes in cells after the XCI process but appears not to affect gene expression. The data presented appear generally clear and consistent and indicate some differences between human and mouse autosomal regulation by Xist. Thus, these results are timely and should be published.

      We thank Rev3 for the positive remarks and great suggestions.  We have amended the manuscript per below. 

      Strengths:

      Regulation of autosomal gene expression by Xist is a "big deal" as misregulation of this lncRNA causes developmental defects and human disease. Moreover, this finding may explain sex-specific developmental differences between the sexes. The results in this manuscript identify specific mouse autosomal genes bound by Xist and decipher critical Xist regions that mediate this binding and gene dampening. The methods used in this study are appropriate, and the overall data presented appear convincing and are consistent, indicating some differences between human and mouse autosomal regulation by Xist.

      Weaknesses:

      (1) The figure legends and/or descriptions of data are often very short lacking detail, and this unnecessarily impedes the reading of the manuscript, in particular the figures would benefit not only from more detailed descriptions/explanations of what has been done but also what is shown. 

      We have included more detailed descriptions in the figure legends and throughout the manuscript.

      This will facilitate the reading and overall comprehension by the reader. One out of many examples: In Fig S1B in the CHART data at d4 and d7 there is not only signal in female WT Xist antisense but also in female sense control. For a reader that is not an expert in XCI it would be helpful to point out in the legend that this signal corresponds to the lncRNA Tsix (I suppose), that is transcribed on the other strand.

      We thank the reviewer for this excellent point.  We have amended the Results section accordingly.

      (2) Different scales are used in the lower panels of Figures 1A and 2A, which makes it difficult to directly compare signals between the different differentiation stages.

      We have included a figure combining all timepoints — d0, d4, d7, and d14 WT female Xist CHART signals  — on the X chromosome and autosomes to support our thesis. Please see new Figure 1B.

      (3) In this study some of the findings on mouse cells contrast previously published results in human ESCs: 1) Xist binding occurs preferentially to promoters in mice, not in human. 2) Binding of Xist is mostly detected in polycomb-depleted regions in mice but there is a positive correlation between Xist RNA and PRC2 marks in human ESCs. These differences are surprising but may be very interesting and relevant. While I am aware that this might be a difficult task, it would be helpful to experimentally address this issue in order to distinguish whether species specific and/or methodological differences between the studies are responsible for these differences.

      Indeed, our findings in mouse cells contrast with those observed in humans. As discussed in the manuscript, this discrepancy may be attributed to factors such as cell type, differentiation methods, and the Xist pull-down technique employed (our CHART method utilizes a 20 nt oligo library, whereas RAP uses long oligos). We agree that future work should investigate the underlying causes of these differences between mouse and human systems.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      For Figure 2: labelling ∆B on the panel A timeline (e.g. d0-∆B) would make the results clearer for the audience. Panel B makes most sense beside panel E of Figure 1, so combine here and skip in Figure 1?

      We have modified Figure 2A and thank Rev2 for this suggestion. As for the embedded tables: since we performed peak calling for WT and ∆B separately, we believe that showing both the peak numbers and their corresponding peak patterns provides a clearer representation of the data.

      I agree that at day 7 there appears to be a difference in X; but by day 14 this looks much more minimal - is it just time-shifted rather than altered? Perhaps this could be discussed. Autosomal binding sites show no change in number.

      Day 7 exhibits the strongest Xist binding on the X chromosome, consistent with the de novo establishment phase of XCI when Xist is expressed at the highest levels (300 copies/cell during de novo XCI versus ~100 copies/cell during maintenance; Sunwoo et al., 2015). Per our RNA-seq analysis here, we also observed the highest Xist expression on day 7 and reduced levels on day 14 (Fig. S5A). This expression difference explains the reduced Xist CHART levels on day 14 compared to day 7.

      While the X has previously been examined, it would seem beneficial to conduct the same expression analyses (Figure 3) for the X (perhaps supplemental), as the authors have the data 'in hand'. I feel comparison to X in the main figure for panels A and B would fit, while a similar analysis for the X for panel C could be supplemental, presumably supporting the published data to which this data is currently compared. 

      This is a good suggestion. Please find the new data in Figures 2E-F and 3D, which demonstrate that the RepB deletion inhibits Xist binding on the X chromosome, resulting in increased X-linked gene expression, as previously mentioned. Since Xist binds across the X chromosome, we did not perform peak calling as we did for the autosomes. Therefore, applying a similar analysis as in Figures 3A-B may not be appropriate in this case.

      Such a direct comparison to X-data from the same study would be important. For panel H: How many replicates (2)? This should be in the legend. What is the change in median expression? Again, a supplemental figure showing impact on X-linked targets would be useful. Do male and female ESCs show an expression difference prior to differentiation (ie d0)? The data underlying this Figure should be in one of the supplementary tables, showing the full statistical tests and average change. The supplementary tables 8-12 list the WT target genes, not expression differences with the deletion. Again, given that the difference appears transient, might the ∆B cells be altered in rate of differentiation?

      Panel H (revised Figure 3G) includes two replicates, and this has been added to the legends. We have provided a figure demonstrating that the RepB deletion increases the expression levels of X-linked genes on days 4, 7, and 14 (revised Figure 3D). Male and female ESCs show differences in the expression of X-linked genes, as both X chromosomes are active in females at this stage prior to differentiation (revised Figure S5C).

      A supplementary table with statistical tests and average change information has been included in our revised version (Table S11).

      On the other hand, these Xist-autosomal target genes displayed no significant differences between WT male, female, or ∆B female cells on day 0 — prior to onset of XCI and Xist expression. Please see new Figure 3H. 

      As for whether ∆B cells are altered in their rate of differentiation, the analysis by Colognori et al. 2019 indicates that ∆B cells differentiate similarly to WT cells. (In Figure 6 of Colognori et al. 2019, autosomal genes were expressed similarly in WT and ∆B cells, whereas XCI was affected only in ∆B cells.)

      We have also modified the legends for our supplementary tables.

      Why were the transgene lines examined upon neuronal differentiation rather than the same approach as in Figures 1-3? I would have thought neuronal differentiation might be more similar to d14, where limited changes remain? Could the authors clarify and discuss?

      We apologize for the confusion. The Tg lines in Figure 4 came from a previously published study. We reanalyzed published datasets because we wanted to test whether, in the hands of other investigators, cell lines expressing Xist also supported autosomal targeting. Here we examined Tg1 and Tg2, which respond to doxycycline to overexpress Xist from an ectopic site. Transcriptomic analysis showed significant downregulation of autosomal Xist targets, as exemplified by Bcl7b and Rbm14 (Figure 4C and S9B). In contrast, non-targets of Xist such as Stau1 did not demonstrate significant changes in gene expression (Figure 4E and 4F). Looking across all autosomal target genes, we observed a significant decrease in mean expression in the Xist-overexpressing cell lines (Figure 4D). The fact that the autosomal changes were also observed in datasets generated by other investigators greatly strengthens our conclusions. We have clarified this in the Results section.

      Figure 5 - the legend should specify the number of replicates and clarify the blue/green (intuitive, but not specified). Are the 'target' / 'non-target' genes from d4 Chart (but the RNA from d5)? How are 'non-targets' defined - do they match the 'targets' in certain criteria (expression level, chromatin features, GC content)? Do they change per differentiation protocol?

      We have modified the legends to clarify that the 'target' and 'non-target' genes are derived from the day 4 CHART-seq data, while the RNA data is from day 5, as that study sequenced day 5 and not day 4. Non-targets were randomly chosen based on (i) the absence of Xist binding and (ii) similar expression levels. Please see revised Figure S8.
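
      As an illustration only, the two selection criteria described above (no Xist binding, similar expression level) could be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' actual procedure: the function name, the `tolerance` cutoff, the one-to-one matching, and the gene names in the usage note are all hypothetical.

```python
import random


def pick_matched_non_targets(expression, bound_genes, targets, tolerance=0.25, seed=0):
    """For each target gene, randomly pick one unbound gene with similar expression.

    expression  : dict mapping gene name -> expression value (e.g. log2 TPM)
    bound_genes : set of genes with any Xist binding (excluded from the pool)
    targets     : list of target genes to match
    tolerance   : maximum absolute expression difference (hypothetical cutoff)
    """
    rng = random.Random(seed)
    # Criterion (i): only genes without Xist binding are eligible.
    pool = {g for g in expression if g not in bound_genes}
    chosen = []
    for target in targets:
        # Criterion (ii): keep candidates whose expression is close to the target's.
        candidates = sorted(
            g for g in pool if abs(expression[g] - expression[target]) <= tolerance
        )
        if not candidates:
            continue  # no suitable match; skip rather than relax the criterion
        pick = rng.choice(candidates)
        chosen.append(pick)
        pool.discard(pick)  # sample without replacement
    return chosen
```

      With toy values such as `{"T1": 5.0, "T2": 2.0, "A": 5.1, "B": 4.9, "C": 2.1, "D": 9.0}` and `{"T1", "T2"}` marked as bound, the function returns one of A or B as the match for T1 and C as the match for T2; D is never chosen because its expression is too different.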

      It would be helpful to compare Xist expression levels across the various models, and the MEF model could be better described - are they polyploid as often happens?

      We have included the Xist expression levels of ES cells and MEF cells in the revised version (revised Figure S5A, 6D). The transformed MEFs are indeed tetraploid, as is typical.

      For 6A to be informative, one needs to know % mapping to X in ES timeline, which is in supplemental, so perhaps 6A should also be supplemental?

      We have moved 6A to the supplemental figure.

      It is odd that ∆B seems to have had more impact in MEFs, and I would like more discussion - but I also think I am missing something: "We observed that Xist signals were more substantially reduced on both the Xi and autosomal regions in ΔRepE MEFs compared to ΔRepB cells", yet in lower panel 6 G it looks like ∆B is LOWER than ∆E? Am I misinterpreting?

      We apologize for the confusing writing.  The revised text now reads:  “To investigate, we utilized a deletion of Xist’s Repeat E (∆RepE), which was previously demonstrated to severely abrogate localization of Xist to the Xi 41,42. We reasoned that the severe loss of Xist binding might unmask a transcriptomic difference. As expected, we observed that Xist signals were somewhat more reduced on the Xi in ΔRepE MEFs compared to ΔRepB cells (Figure 6E-6F). Despite this reduction, peak coverages in autosomal target genes did not increase in ΔRepE MEFs (Figure 6E-6F). However, there was an overall decrease in the number of significant autosomal peaks in ∆RepE MEFs relative to WT cells (Figure 6A). Regardless, we observed no significant transcriptomic differences in ∆RepE MEFs relative to WT MEFs (Figure 7A-7E). Additionally, further examination of RNA sequencing data from male and female MEF cells in two published studies 43,44 corroborated that the expression levels of these autosomal Xist targets did not exhibit significant changes (Figure 7F and 7G). Altogether, the analysis in MEFs demonstrates that Xist continues to bind autosomal genes in post-XCI somatic cells. However, autosomal binding of Xist in post-XCI cells does not overtly impact expression of the associated autosomal genes. Nonetheless, we cannot exclude more subtle changes that do not meet the significance cut-off.”

      Overall, I would like to see how consistent these autosomal peaks are - I shudder to suggest Venn diagrams, but something to show whether there are day/lineage specific peaks and/or ∆repeat B/E resistant peaks. 

      We now present Venn diagrams comparing MEF, ES_d4, and ES_d7, showing approximately 50% overlap between MEF and ES cells (revised Figure S10B). This may be expected, as each timepoint is a different developmental stage of XCI, with expected gene expression differences.
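
      For readers who want to quantify such an overlap, a minimal sketch is given below. It assumes, for simplicity, that each dataset has already been reduced to a set of target gene names; comparing raw peak intervals would require interval intersection instead. The function name and the gene identifiers in the usage note are hypothetical.

```python
def overlap_summary(set_a, set_b):
    """Summarize the overlap between two collections of target genes."""
    a, b = set(set_a), set(set_b)
    shared = a & b
    return {
        "shared": len(shared),      # genes found in both datasets
        "only_a": len(a - b),       # genes unique to the first dataset
        "only_b": len(b - a),       # genes unique to the second dataset
        "frac_of_a": len(shared) / len(a) if a else 0.0,
        "frac_of_b": len(shared) / len(b) if b else 0.0,
    }
```

      For example, `overlap_summary({"g1", "g2", "g3", "g4"}, {"g3", "g4", "g5", "g6"})` reports two shared genes, that is, half of each set.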

      Very minor comments:

      It would be easier if the supplemental tables were tabs in 1 file!

      We will defer to the editor on how best to format the supplemental tables.

      Similar to the text, could gene names be included in the supplemental?

      We have provided gene names in the supplemental files.

      Figure 3 legend: should 'representing' be representative?

      We have modified it.

      "Xist patterns identified in human cells" p 5; it is challenging to follow human versus mouse, so specify or ensure correct use of XIST/Xist

      Indeed, we edited the manuscript accordingly.

      Gene names should be italicized.

      We have italicized gene names in our manuscript.

      Ref. 38 lacks details (...).

      We have updated the reference.

      Peak-like characters - perhaps characteristics? P8

      We have modified this.

      Reviewer #3 (Recommendations for the authors):

      On page 6, the 6th sentence in the first paragraph needs correction. "Consistent with Xist's behavior on the X chromosome."

      We have modified the sentence. Thank you.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The study by Longhurst et al. investigates the mechanisms of chemoresistance and chemosensitivity towards three compounds that inhibit cell cycle progression: camptothecin, colchicine, and palbociclib. Genome-wide genetic screens were conducted using the HAP1 Cas9 cell line, revealing compound-specific and shared pathways of resistance and sensitivity. The researchers then focused on novel mechanisms that confer resistance to palbociclib, identifying PRC2.1. Genetic and pharmacological disruption of PRC2.1 function, but not related PRC2.2, leads to resistance to palbociclib. The researchers then show that disruption of PRC2.1 function (for example, by MTF2 deletion), results in locus-specific changes in H3K27 methylation and increases in D-type cyclin expression. It is suggested that increased expression of D-type cyclins results in palbociclib resistance.

      Strengths:

      The results of this study are interesting and contribute insights into the molecular mechanisms of CDK4/6 inhibitors. Importantly, while CDK4/6 inhibitors are effective in the clinic, tumour recurrence is very high due to acquired resistance.

      Weaknesses:

      A key resistance mechanism is Rb loss, so it is important to understand if resistance conferred by PRC2.1 loss is mediated by Rb, and whether restoration of PRC2.1 function in Rb-deplete cells results in renewed palbociclib sensitivity. It is also important to understand the clinical implications of the results presented. The inclusion of these data would significantly improve the paper. However, besides some presentation issues and typos as described below, it is my opinion that the results are robust and of broad interest.

      Major questions:

      (1) Is the resistance to CDK4/6 inhibition conferred by mutation of MTF2 mediated by Rb?

      (2) Are mutations in PRC2.1 found in genetic analyses of tumour samples in patients with acquired resistance?

      We thank the reviewer for their editing and experimental suggestions, and have integrated their responses into our re-submitted manuscript.

      We also agree that understanding the role of RB1 in the proposed palbociclib resistance mechanism is of particular interest. However, as there are three RB proteins expressed in human cells, this is a technically difficult question to probe genetically. Despite this technical challenge, we have provided multiple lines of evidence in our resubmitted manuscript that the resistance to palbociclib observed in our PRC2.1-deficient cells is mediated through the canonical CDK4/6-RB1 pathway. First, disruption of RB1 in HAP1 cells results in palbociclib resistance to a level comparable to PRC2.1 disruption (Fig. 4E). Second, inactivation of SUZ12 or MTF2 increases the number of cells entering S-phase in palbociclib treatment (Fig. 4G) with no increase in basal rates of apoptosis (Fig. S2D), suggesting that any proliferation advantage observed in PRC2.1-defective cells is due to resistance to palbociclib-induced cell cycle arrest. Third, we show that overexpression of CCND1 and CCND2 is sufficient to drive resistance to palbociclib in wild-type HAP1 cells (Fig. S5F). And finally, increased levels of CCND1 and CCND2 observed in cells lacking PRC2.1 activity result in higher CDK4/6 activity as measured by RB1 phosphorylation, despite palbociclib blockade (Fig. 6F). All these lines of evidence strongly suggest that MTF2-containing PRC2.1 regulates G1 progression through the canonical CDK4/6-RB1 pathway by repressing CCND1 and CCND2 expression.

      Whether or not MTF2 deletion leads to palbociclib resistance in clinical samples is also a question of particular interest. Currently, we are unaware of any reports that specifically mention MTF2 deletion as leading to palbociclib resistance, and we were unable to find an example in our own cancer database review. However, we have included references to other examples of MTF2 mutation resulting in chemotherapeutic resistance in our discussion. Additionally, although MTF2 is rarely observed to be mutated in cancers (Ngubo et al. 2023), it is highly differentially expressed, and investigating decreased MTF2 transcription in palbociclib-resistant tumors, though challenging, might prove fruitful. As mechanisms of palbociclib resistance are an area of active investigation, we speculate that future studies might uncover additional examples of MTF2 mediating resistance to this clinically important chemotherapeutic.

      Reviewer #2 (Public Review):

      Summary:

      Longhurst et al. assessed cell cycle regulators using a chemogenetic CRISPR-Cas9 screen in haploid human cell line HAP1. Besides known cell cycle regulators they identified the PRC2.1 subcomplex to be specifically involved in G1 progression, given that the absence of members of the complex makes the cells resistant to Palbociclib. They further showed that in HAP1 cells the PRC2.1, but not the PRC2.2 complex is important to repress the cyclins CCND1 and CCND2. This can explain the enhanced resistance to Palbociclib, a CDK4/6 inhibitor, after PRC2.1 deletion.

      Strengths:

      The initial CRISPR screen is very interesting because it uses three distinct chemicals that disturb the cell cycle at various stages. This screen mostly identified known cell cycle regulators, which demonstrates the validity of the approach. The results can be used as a resource for future research.

      The most interesting outcome of the experiment is the finding that knockouts of the PRC2.1 complex make the cell resistant to Palbociclib. In a further experiment, the authors focused on MTF2 and JARID2 as the main components of PRC2.1 and PRC2.2, respectively. Via extensive analyses, including genome-wide experiments, they confirmed that MTF2 is particularly important to repress the cyclins CCND1 and CCND2. The absence of MTF2 therefore leads to increased expression of these genes, sufficient to make the cell resistant to palbociclib. This result will likely be of wide interest to the community.

      Weaknesses:

      The main weakness of the manuscript is that the experiments were performed in only one cell line. To draw more general conclusions, it would be essential to confirm some of the results in other cell lines.

      In addition, some of the findings, such as the results from the CRISPR screen as well as the stronger impact of the MTF2 KO on H3K27me3 and gene expression (compared to JARID2 KO), are not unexpected, given that similar results were already obtained before by other labs.

      We thank the reviewer for their suggestions and we believe that we have addressed their main concern about the generality of the MTF2 regulation of D-type cyclin expression in our resubmitted manuscript. We have now shown through shRNA knockdown that MTF2 represses CCND1 in two additional cell lines, the breast cancer MDA-MB-231 and immortalized monkey COS7 cell line (Fig. 6E). However, it is important to note that MTF2 did not control CCND1 expression in every cell line tested (Fig. 6D), underscoring the context-dependent nature of this regulation. Future studies will illuminate the cell or tumor types in which this regulation is observed.

      Additionally, while MTF2 has previously been shown to exert a greater effect on H3K27me3 levels in some circumstances (Loh et al. 2021, Rothberg et al. 2018), a number of notable reports in ES cell lines have concluded that PRC2 localization and H3K27me3 at the majority of genomic sites are dependent on both PRC2.1 and PRC2.2 activity (Healy et al. 2019, Højfeldt et al. 2019, Perino et al. 2020, Oksuz et al. 2018). Therefore, we think it is important to highlight the greater dependence on MTF2 for promoter proximal H3K27me3 levels in our transformed cell line context.  

      Reviewer #3 (Public Review):

      This study begins with a chemogenetic screen to discover previously unrecognized regulators of the cell cycle. Using a CRISPR-Cas9 library in HAP1 cells and an assay that scores cell fitness, the authors identify genes that sensitize or desensitize cells to the presence of palbociclib, colchicine, and camptothecin. These three drugs inhibit proliferation through different mechanisms, and with each treatment, expected and unexpected pathways were found to affect drug sensitivity. The authors focus the rest of the experiments and analysis on the polycomb complex PRC2, as the deletion of several of its subunits in the screen conferred palbociclib resistance. The authors find that PRC2, specifically a complex dependent on the MTF2 subunit, methylates histone 3 lysine 27 (H3K27) in promoters of genes associated with various processes including cell-cycle control. Further experiments demonstrate that Cyclin D expression increases upon loss of PRC2 subunits, providing a potential mechanism for palbociclib resistance.

      The strengths of the paper are the design and execution of the chemogenetic screen, which provides a wealth of potentially useful information. The data convincingly demonstrate in the HAP1 cell line that the MTF2-PRC2 complex sustains the effects of palbociclib (Figure 4), methylates H3K27 in CpG-rich promoters (Figure 5), and represses Cyclin D expression (Figure 6). These results could be of great interest to those studying cell-cycle control, resistance mechanisms to therapeutic cell-cycle inhibitors, and chromatin regulation and gene expression.

      There are several weaknesses that limit the overall quality and potential impact of the study. First, none of the results from the colchicine and camptothecin screens (Figures 1 and 2) are experimentally validated, which lessens the rigor of those data and conclusions. Second, all experiments validating and further exploring results from the palbociclib screen are restricted to the Hap1 cell line, so the reproducibility and generality of the results are not established. While it is reasonable to perform the initial screen to generate hypotheses in the Hap1 line, other cancer and non-transformed lines should be used to test further the validity of conclusions from data in Figures 4-6. Third, conclusions drawn from data in Figures 3D and 4D are not fully supported by the experimental design or results. Finally, there have been other similar chemogenetic screens performed with palbociclib, most notably the study described by Chaikovsky et al. (PMID: 33854239). Results here should be compared and contrasted to other similar studies.

      We thank the reviewer for their suggestions regarding our manuscript. While the genes recovered as mediating cellular responses to camptothecin and colchicine were never confirmed following our chemogenetic screens, we felt our primary findings were in the area of palbociclib resistance and decided to focus our follow-up investigations on those genes. We included the results of the camptothecin and colchicine chemogenetic screens as confirmation of the specificity of PRC2 mutation resulting in resistance to palbociclib (Fig. 4C) and for others in the community to use as a resource for future investigations. We have also clarified our results for Figure 3D and 4D in our revised manuscript, as well as included additional plots of these results (Fig. S1D-S1F). And, with our resubmitted manuscript, we believe we have addressed their concern of the generality of our results by demonstrating our primary finding that MTF2 regulates D-type cyclins in additional cell lines other than HAP1. We feel these results indicate that while not “general”, there are additional cellular contexts in which our main result holds true. In line with this, and to address how our chemogenetic screens fit into the landscape of previous studies, including Chaikovsky et al., we have included the following lines in our discussion:  “Additionally, other chemogenetic screens utilizing palbociclib and have not identified that inactivation of PRC2 components as either enhancing or reducing palbociclib-induced proliferation defects, suggesting that PRC2 mutation is neutral in the cell lines studied. These observations not only underscore the context-dependent ramifications of mutation of these PRC2 complex members, but also may help inform the context in which CDK4/6 inhibitors are most efficacious.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) "We found that only thirteen and twenty genes resulted in sensitivity or resistance, respectively, in every conditions tested and were deemed non-specific and excluded from any further analysis (see Table S2)." It's unclear to me why these genes were deemed 'nonspecific'. Are these genes functionally important for the general exclusion of xenobiotic molecules?

      By this, we simply meant that these effects were not specific to one condition. Such genes could affect drug half-life or a general stress response, but are less likely to have functions directly tied to the pathway targeted by a drug than are genes whose loss affects only one condition.  

      (2) "Given that increased CCND1 levels is sufficient to drive increased CDK4/6 kinase activity, upregulation of these D-type cyclins is likely to be a significant contributor to the palbociclib resistance in MTF2∆ cells." It's unclear to me what is the basis for this statement. This is only true if there is free CDK4/6. If CDK4/6 is already fully occupied by D-type cyclins, then increased CCND1 levels would not be expected to have an effect. 

      While we anticipated that increased levels of CCND1 would result in more CDK4/6-D-type cyclin association, we now demonstrate in the new Figure S5F that there is more CCND1 in complex with CDK6 in both SUZ12∆ and MTF2∆ cell lines. Furthermore, we are able to show in Figure S5G that overexpression of D-type cyclins results in resistance to palbociclib-induced proliferation defects in HAP1 cells.

      (3) The description of the results is very confusing in places, especially regarding "resistance" versus "sensitivity" genes. For example: "CCNE1, CDK6, CDK2, CCND2 and CCND1, all of which are integral to promoting the G1/S phase transition, ranked as the 2nd, 24th, 27th, 29th and 46th most important genes for palbociclib resistance, respectively (Figures 1F and 1G). CCND1 and CCND2 bind either CDK4 or CDK6, the molecular targets of palbociclib, whereas CDK2 and CCNE1 form a related CDK kinase that promotes the G1/S transition.

      Similarly, cells with sgRNAs targeting RB1, whose phosphorylation by CDK4/6 is a critical step in G1 progression, displayed substantial resistance to palbociclib." My reading of this paragraph suggests that disruption of the CDK6 locus is associated with palbociclib resistance - surely this is a typo and instead should have been sensitivity? Please explain.

      We thank the reviewer for pointing this out and have corrected this typo.

      (4) "Sensitivity to palbociclib was enhanced in cells expressing sgRNAs targeting H4 acetylation, positive regulators of Pol II transcription, and regulators of the DNA Damage Response pathway (Figures 3A and 3B), although this sensitivity was much weaker than that seen with DNA damaging agents. This observation is consistent with long-term treatment with palbociclib inducing DNA damage, as has been suggested by a number of recent publications 65,66." This is also consistent with recent work on Cdk7 inhibitors (Wilson et al. Mol Cell 2023), as Cdk7 inhibition is expected to affect both CDK1/2/4/6 activities and Pol II transcription.

      We thank the reviewer for bringing this observation to our attention and we have added this citation to this passage in our manuscript.

      (5) Figure 3D - would it not make sense to plot the data such that palbo concentration is on the x-axis? It is also difficult to interpret since the data are normalized to starting "% proliferation" at the indicated palbo treatment, when it is likely that % proliferation changes significantly with palbo concentration. Indeed, this is the graphing format used for a later figure (Figure 4D). The data with rotenone suggests palbo antagonizes rotenone-mediated reduction in proliferation. But it's unclear to me whether the graph shows the converse - that rotenone treatment modulates palbo-induced cell cycle arrest.

      This reviewer is correct that increasing doses of palbociclib in the absence of oxidative phosphorylation inhibitors do indeed have an effect on proliferation. However, it is helpful to normalize proliferation values to each initial dose of palbociclib and then compare this to the different oxidative phosphorylation inhibitor treatment combinations. To illustrate that the oxidative phosphorylation inhibitors do indeed antagonize palbociclib-induced proliferation defects, we have now included the data graphed as each oxidative phosphorylation inhibitor vs palbociclib as Supplemental Figures S1D-S1F.
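
      The normalization described above can be sketched as follows. This is a hedged illustration rather than the analysis code used for the figures: it assumes raw proliferation values are keyed by a (palbociclib dose, inhibitor dose) pair and that an inhibitor dose of 0.0 marks the palbociclib-only baseline. The function name and the numbers in the usage note are hypothetical.

```python
def normalize_to_palbociclib_only(proliferation):
    """Express each value as a percentage of the palbociclib-only baseline.

    proliferation : dict mapping (palbo_dose, inhibitor_dose) -> raw proliferation.
    The baseline for each palbociclib dose is the entry with inhibitor_dose == 0.0.
    """
    normalized = {}
    for (palbo, inhibitor), value in proliferation.items():
        baseline = proliferation[(palbo, 0.0)]  # same palbociclib dose, no inhibitor
        normalized[(palbo, inhibitor)] = 100.0 * value / baseline
    return normalized
```

      With made-up raw values of 100 and 50 (no palbociclib, without and with inhibitor) and 40 and 30 (with palbociclib, without and with inhibitor), the inhibitor leaves 50% of baseline proliferation without palbociclib but 75% with it, which is the kind of antagonism this normalization is meant to make visible.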

      • The highest concentration of GSK126 tested (5µM) does not appear to confer resistance, but perhaps this is due to off-target effects or cytotoxicity?

      We agree with the reviewer that the highest dose of GSK126 does not confer resistance at low doses of palbociclib. However, at higher doses of palbociclib it does appear to have this effect. We have included a statement in our results section to address this reviewer’s observations.

      • Disruption of Emi1 leads to resistance (Figure 1F, FZR1), yet overexpression induces resistance (Mouery et al. bioRxiv 2023). Explain.

      We do not understand why EMI1 responds in this way, and therefore we cannot comment on this in the text. 

      Typos/stylistic comments:

      • Typo "However, the net result of these opposing effects on cell cycle progression, and the contribution of the individual subcomplexes to this regulation, rained unclear."

      We thank the reviewer for pointing this out, and we have corrected it.  

      • Use of the word "growth" - I think the authors should be more precise. Is "proliferation" meant here?

      We thank the reviewer for pointing this out, and we have corrected it.

      • In Figure 4G, two of the panels have 8.42%. Is this correct, or may it be a copy/paste error?

      This was an error, but is no longer relevant as we have reconducted and reanalyzed this experiment.

      Reviewer #2 (Recommendations For The Authors):

      Major Points

      (1) Some of the conclusions should be confirmed in additional cell lines. I would suggest testing the resistance to Palbociclib in several additional cell lines, where MTF2 and JARID2 are deleted. If the conclusion can be generalized, one would expect that the differential role of MTF2 versus JARID2 can be confirmed in more cell lines.

      While the PRC2.1-dependent repression of D-type cyclins does not appear to be general, we have now demonstrated in Figures S5E and 6F that there are multiple different cellular contexts in which our observations are consistent. Specifically, we demonstrate that GSK126 causes upregulation of CCND1 in both immortalized nontumor cells (COS7 cells) and in the breast cancer cell line MDA-MB-231. Moreover, in both cases we showed that this effect is PRC2.1-dependent, as shRNA knockdown of MTF2 increases expression of CCND1.

      (2) In addition, it may be attractive to make use of publicly available RNA-seq data of MTF2 and JARID2 knockout/down cells, to investigate the generality of the finding that PRC2.1 regulates CCND1 and CCND2.

      While it would be useful to address this issue, Figure S5E demonstrates that the repression of D-type cyclin expression by PRC2.1 is context dependent. Furthermore, prior to identifying the lines shown in Figures 6F and S5E, we were not aware of which lines to focus our investigations on. However, we have now demonstrated a few cellular contexts in which either chemical inhibition of PRC2 or knockdown of MTF2 results in de-repression of CCND1 expression.

      (3) At a bare minimum the authors should strongly discuss the limitations of the study, and tone down the conclusions.

      We would agree with this based upon the data in the original submitted manuscript, however, now that we have shown that this effect is more general, this is less critical. That said, we do not see this effect in all cell lines, and we have made this apparent in the final version of the manuscript.

      Minor point

      (1) In my view, Figures 1-3 should be shortened to the most essential points, and some data/figures should be moved to the supplementary figures. Especially the STING gene-network graphs are in my view not particularly meaningful.

      While we understand the opinion of this reviewer, we feel that these data will be of significant interest to some readers.  

      (2) Figure 6E and 6F/G appear to be largely redundant. This can perhaps be made more concise.

      This has been addressed in the new version of Figure 6.

      (3) Figure 5D should be enlarged. 

      We thank the reviewer for this suggestion and have enlarged the image.

      Reviewer #3 (Recommendations For The Authors):

      The manuscript could be edited to improve clarity. In several places, the scientific logic motivating an experiment is confusing, and there are several hypotheses and conclusions that seem opposite from what the data are suggesting. Some aspects of the figures were also unclear. Specific examples include the following:

      (1) Last sentence of abstract : "Our results demonstrate a role for PRC2.1, but not PRC2.2, in promoting G1 progression." Data show that knockout of PRC2.1 components promotes G1 progression through upregulation of CycD, so the conclusion here is the opposite.

      We thank the reviewer for catching this error. We have now changed this to “in antagonizing G1 progression”.

      (2) In the second paragraph of the results, CCNE1, CDK2, etc are described as scoring high for palbociclib resistance, but those genes scored as sensitizing. Also, in that paragraph, it is described that a drug is sensitizing cells to loss of a gene, which seems like incorrect logic. It should be clarified that knock-out of a gene either sensitizes or desensitizes cells to the drug.

      We thank the reviewer for catching this error. We have now corrected it.  

      (3) In the motivation for the experiment in Figure 3D, it is written: "we asked whether chemical inhibition of oxidative phosphorylation could rescue sensitivity to palbociclib". Considering that knock-out of genes that mediate oxidative phosphorylation confer resistance to palbociclib, it is confusing why it was expected that chemical inhibitors would restore sensitivity.

      We are sorry if the original wording was confusing. We have now changed this to “combined inhibition of oxidative phosphorylation and CDK4/6 activity mutually rescue the proliferation defect imposed by agents targeting the other process”.  

      (4) If the intention of Figure 3D is to test the hypothesis that chemical inhibition of oxidative phosphorylation modulates sensitivity to palbociclib, the clarity of Figure 3D would be improved if data were shown such that palbociclib concentration is on the x-axis and the different curves are different drug concentrations.

      It appears that there is some mutual suppression, in which inhibition of each process partly rescues cells from inhibition of the other. In fact, with these drugs, the stronger of the two effects is the rescue of mitochondrial poisons by palbociclib. We have now discussed this in the text.

      (5) The authors should check the units on the x-axis in Figure 4D, should they be log[uM Palbo] or log [nM Palbo]?

      We thank the reviewer for catching this error. We have now corrected it.

      (6) It should be clarified which data are summarized in the graph to the right in Figure 4G, are these experiments with palbociclib?

      This is currently included in the figure legends.

      (7) The text suggests that the control CCNE1 knockout is shown in Figure 4E, but those data are missing.

      This has been corrected in Figure 4E.

      Several conclusions are not well supported by the data and should be revised or more data and analysis should be added.

      (1) The titular conclusion that the "PRC2.1 Subcomplex Opposes G1 Progression through Regulation of CCND1 and CCND2" has only been demonstrated in the context of a Cdk4/6 inhibitor in HAP1 cells. There is little evidence that this claim is broadly applicable. For example, data in Figure 4G show small and not demonstrably significant differences in G1 and S phase populations in the mock experiments. Also, experiments in other cells are needed to support the rigor and generality of the conclusion.

      Our chemogenetic screen and competitive proliferation assay data in Figures 4A, 4C and 4E support the conclusion that PRC2.1 and PRC2.2 play opposing roles in G1 progression. Furthermore, we have repeated the initial BrdU incorporation experiments shown in Figure 4G and have been able to demonstrate that JARID2∆ cells do indeed display a significant decrease in cells entering S-phase when treated with palbociclib. Most importantly, in Figures 6D and 6E we show additional cell lines where this is the case. Therefore, we feel that this title is valid in the current version of the manuscript, where we have shown it to be the case in multiple tumor-derived human cell lines as well as immortalized non-human primate cells.

      (2) It is unclear how the data in Figure 3D support the conclusion that the administered inhibitors of oxidative phosphorylation influence response to palbociclib.

      As noted in the response to point 4, we have now discussed this mutual rescue more thoroughly in the text.  

      (3) In Figure 4D, the IC50 values should be calculated and statistical significance based on biological replicates should be determined. Also, the conclusion that "increasing doses of GSK126 withstood palbociclib-induced growth suppression" is overstated, as ultimately all drug conditions succumb to palbociclib suppression of proliferation, although there may be differences in sensitivity.

      We have now included a statistical analysis of each data point in Figure 4D.

      Editorial comments:

      (1) The title does not seem to optimally capture the content of the paper. Please consider changing it, e.g. focusing on palbociclib resistance. 

      While we used this particular drug to make the original observation, we feel it is more general to discuss the underlying biology (cyclin gene control) than the pharmacological methodology. Moreover, we have now extended our findings about the regulation of D-type cyclins by PRC2.1 to several cell lines, derived from both cancers and primary cells, reinforcing the fact that this effect is observed more broadly.

      (2) Please indicate the biological system (haploid human HAP1 cells) in either title or abstract.

      The abstract now indicates that we have observed this in CML, breast cancer and immortalized primary cells.

    1. Author response:

      We are submitting a revised manuscript with major additions that address the main concerns in the initial reviews. At the highest level, this revision provides i) orthogonal biochemical measurements that yield concrete evidence of lysosomal protein aggregates, and ii) a plausible mechanism linking lysosomal lipid handling and protein aggregation through disruption of ESCRT function. We believe these additions significantly improve the completeness of this study and the conclusions that can be drawn from the data.

      Below are more specific highlights of the additions in this revision:

      -       We included orthogonal techniques (thioflavin-T staining and Lyso-IP followed by differential extraction) and confirmed the accumulation of RIPA-insoluble protein aggregates at the lysosomes in cells under lipid perturbation (Figure 3).

      -       We performed TMT-Proteomics and identified accumulation of insoluble ESCRT components at the lysosomes under lipid perturbation (Figure 4). Two new authors involved in this effort are added onto the manuscript.

      -       The ESCRT result prompted us to revisit lysosomal membrane integrity. With improved imaging conditions and analysis we were able to see increased membrane permeabilization under lipid perturbation. VPS4A overexpression partially rescued this phenotype, suggesting that lipid accumulation impairs ESCRT disassembly (Figure 5).

      -       Together, the results suggest that lipid perturbation impairs ESCRT function, compromising both lysosomal membrane repair and microautophagy, resulting in the accumulation of endogenous protein aggregates at the lysosomes (Graphical Abstract).

      Reviewer #1 (Recommendations For The Authors):

      (1) Perhaps the most prominent limitation of this work is the unilateral focus on native cells (i.e. cells under no endogenous or exogenous stress) as the model for protein aggregate formation. Furthermore, although the ProteoStat stain has been utilized by many investigators before, the sole reliance on this stain as the read-out for their assays is concerning. To compound the concern, the ProteoStat-positive puncta co-localize with lysosomal markers, which was surprising even to the authors. All in all, it behooves the authors to test proteostasis in multiple parallel ways to actually define what they are studying. How is it possible that protein aggregates under native conditions are only co-localized with lysosomes? Are we really studying protein aggregates which should predominantly be cytoplasmic insoluble aggregates?

      (a) They need to get away from a simple stain like ProteoStat and conduct co-stainings with other markers such as poly-ubiquitin antibodies and other chaperones to define what and where else exactly are these aggregates.

      Co-staining with poly-ubiquitin was included in the original manuscript. We added orthogonal staining with another widely used amyloid dye, Thioflavin-T, and provided fine-grained quantification of lysosomal vs cytosolic localization of various signals (Figures S4A-C & 3A-B).

      (b) They need to do Immunoblots with and without triton insolubility to see if these aggregates are insoluble as most would predict. They can do lysosomal isolation vs cytoplasmic to see if the insoluble aggregates are really lysosomal.

      We performed Lyso-IP followed by differential detergent extraction to confirm the accumulation of insoluble proteins at the lysosomes (Figure 3C). Proteomic analysis identified some of these insoluble proteins as ESCRT subunits (Figure 4).

      (c) They should compare aggregate formation in the native state versus cells with lysosomal inhibition via Bafilomycin or chloroquine versus cells with proteosomal inhibition. The lysosomal inhibition experiments are particularly informative given the lysosomal relevance they have uncovered.

      We included other small-molecule inhibitors at different time points to compare the effect of different modes of proteostasis challenge (Figure S4A-D). Together with the ESCRT finding, our results suggest a role for microautophagy in our system, and provide a model of how ProteoStat- and/or ubiquitin-positive substrates become partitioned between the cytoplasm and lysosomes under different perturbations.

      (d) Many protein aggregates which are too bulky for proteosome degradation will traditionally be dealt with by aggrephagy. Why is this not observed?

      Knockdown of core macroautophagy components did not impact ProteoStat intensity in our CRISPRi screen, suggesting that basal macroautophagy plays a negligible role in clearing endogenous amyloid-like structures in our experimental system. We provide an alternative model in which these aggregates instead arrive at the lysosomes via microautophagy.

      (2) After addressing #1, they can validate if the genes they identified by CRISPR screens are also important in modulation of protein aggregate burden in other systems. For example, if they inhibit lysosomes by Bafilo or Chloroquine to obtain protein aggregates and then Knockdown the identified genes in the CRISPR screens, will they get the same results?

      We addressed the effect of different modes of proteostasis challenge as recommended above. Deacidifying the lysosomes alone causes intense protein aggregation (Figure S4A-D) and eventually cell death, and was thus not combined with other perturbations.

      (3) They identify lysosomal lipid metabolism genes/pathways as the culprit for inducing proteostasis. In particular sphingolipid and cholesteryl ester species appear to be operational here. However, there are no specific lipids species or specific lipid metabolism gene that is causative. Rather, you have to knockdown entire processes to have an effect. This suggests that the focus on lysosome health (i.e. permeability, proteolysis, etc) is rudimentary. When you have to knockdown entire classes of lipids, this would indicate more broad effects on cellular lipids (including membrane lipids beyond the lysosome) and related cellular health?

      We included data on the effect of knocking down MYLIP, PSAP, and as a comparison PSMD2 on the growth rate of K562 cells (Figure S5A). MYLIP and PSAP KDs, which cause predominantly an accumulation of lipids, do not impede cell growth. Increasing lipid uptake by MYLIP KD increases cell proliferation under our culture conditions, suggesting a general negative impact on cell health was not required for the association between lipid levels and protein aggregates.

      (a) They conduct a superficial methyl-beta-cyclodextrin experiment with equivocal results. The use of MBCD for different time-courses to deplete various membrane cholesterol pools including the plasma membrane pool is important to ascertain what aspect of the cellular cholesterol is affecting proteostasis. MBCD +/- cholesterol reintroduction time-courses for rescue will also be key to determine the culprit cellular cholesterol pool.

      The MBCD / Filipin experiment helped us determine that ProteoStat doesn’t directly stain cholesterol, nor any major plasma membrane components. Free cholesterol was implicated in neither the screen nor the lipidomics and was not the subject of targeted experiments.

      (b) The same concept can be applied to sphingolipids. There are sphingolipids in abundance in multiple membrane compartments. Which ones are causal here? More nuanced evaluation of this with sphingolipid staining/tracking can be conducted.

      We attempted experiments where sphingolipids were added back to cells grown in FBS-depleted media. Nevertheless, we were not able to consistently deliver these lipid species, and doing so while ensuring the correct subcellular localization at physiologically relevant levels would require substantial methods development.

      (c) As part of this, are lipid rafts and/or caveolae being affected by the perturbations in cholesterol and sphingolipids? Lipid rafts are highly enriched in these 2 lipids which could link to their proteostasis observation.

      Indeed, ceramides released from SM hydrolysis are proposed to self-assemble into microdomains with negative curvature that can promote the formation of intralumenal vesicles (Alonso and Goñi, 2018; Niekamp et al., 2022). We propose that SM accumulation may hinder this process by counteracting the negative membrane curvature, thereby impeding microautophagy.

      (d) How about ER membrane lipids? The UPR and subsequent effects on proteostasis are intricately involved with ER lipid bilayer composition.

      We did not perform lipidomics on ER membranes in this study, though we note that at steady state, sphingolipids and cholesterol esters are not expected to be enriched at the ER (Ikonen and Zhou, 2021). We checked whether lipid-related genetic perturbations induced the UPR in published perturb-seq data in K562 cells. Neither MYLIP nor PSAP knockdown induced a UPR.

      In conclusion, the manuscript is interesting but the excitement over a link between lysosome-related lipid metabolism and proteostasis needs to be tamped until a more robust experimental approach is employed to generate supportive and corroborating results.

      Reviewer #2 (Recommendations For The Authors):

      - The paper has a number of grammatically awkward sentences. Editing these would enhance clarity.

      - It is important to show the co-localization of aggregates with the lysosome. This is shown in supplements but should be in a main figure. Here the authors cite previous work indicating that ProteoStat puncta co-localize with ubiquitinated proteins and state that they do not see this, then essentially just move on. Is there an explanation for this discrepancy and can it be resolved? What do they think is really going on? What happens to levels of ubiquitinated proteins when lipid metabolism is perturbed as in these experiments?

      We have included the lipid-induced lysosomal protein aggregation data in the main text (Figure 3A-B), and provided fine-grained quantification of the cytosolic-vs-lysosomal ProteoStat / Ub / ThT signals under different aggregate-inducing conditions (Figure S4A-D). We discuss these results in the main text and propose a model involving ESCRT-mediated microautophagy. This is supported further by the LysoIP-proteomics and LMP analysis.

      - Please add an indicator of amino acid numbers to Fig. 3C.

      These annotations are now included (now Figure S3C).

      - The legend for 3D is mislabelled.

      We have corrected the legend (now Figure S3D).

      Reviewer #3 (Recommendations For The Authors):

      Protein homeostasis and lipid homeostasis are both important for maintaining cellular functions. However, the crosstalk remains largely unknown. The manuscript entitled "Impairment of lipid homoeostasis causes accumulation of protein aggregates in the lysosome" deals with this interesting topic. An important link between lysosomal protein aggregation and sphingolipid/cholesterol ester metabolism was discovered. The topic, belonging to the Cell Biology domain, also falls within the aims and scope of eLife. Here are the revisions I recommend:

      (1) From lipidomics analysis, a remarkable correlation between levels of sphingomyelin and cholesterol ester and ProteoStat staining was found. Could the authors explain how sphingomyelin and cholesterol ester are quantified? The two lipids are not included as internal standards from the lipidomics experiment.

      Sphingomyelin and cholesterol ester internal standards are included in the Avanti 330707 SPLASH® LIPIDOMIX® Mass Spec Standard, which was supplied at 3% v/v to the MeOH/H2O cell lysis buffer. We have amended the Methods section to clarify this.

      (2) Could the authors perhaps delete Figure 1B and show it in Figure 2A only? There is no need to show the same figure two times. The thresholds of both False Discovery Rate and Median Enrichment need to be added. From Figure 2A, the lysosomal hydrolases (GBA, LIPA, GALC) seem to be located in a statistically insignificant region. Based on previous studies, GBA could have an effect on sphingolipid levels, so how can it be explained that sphingomyelin was highly correlated with ProteoStat staining?

      We have combined the two volcano plots into a single figure (now Figure 1D), and added a line to help visualize the gene effects while considering the combined contribution of FDR and enrichment. Individual lysosomal hydrolases indeed have insignificant effects on ProteoStat and this is discussed in the main text as having relatively constrained impacts on the general lipidome. For example, while GBA and GALC KDs can lead to accumulation of their immediate substrates (glucosylceramide and galactosylceramide, respectively), they do not directly impinge on sphingomyelin.

      (3) The authors show the correlation between ProteoStat staining and different lipids/lipid classes in Figure 3B and Figure S3A. It is not necessary to show the correlation with individual lipids (such as sphingomyelin(d18:1/24:0) and cholesterol ester(18:2)). The correlation with the full collection of lipid classes would be more representative, which is only listed in Figure 3B and Figure S3A. It is suggested to add information on how many individual lipids in each class are used for the correlation analysis. Replacing Figure 3A with Figure S3A, and moving Figure 3A to the supplementary figures, is suggested.

      We decided to retain the correlation of two individual lipids (a sphingomyelin and a cholesterol ester species) with ProteoStat as examples to illustrate with clarity how we obtained the class-wide comparison. The number of individual lipids included in each class for correlation analysis is now included in Figures 2F and S3A.

      (4) The authors state that lipid uptake and metabolism modulate proteostasis. However, only cholesterol and LDL were tested. It would be more precise to state as cholesterol uptake and metabolism modulate proteostasis. In addition, sphingolipids and cholesterol esters accumulate with increased lysosomal protein aggregation. It would be interesting to see the effects of sphingolipids uptake, since sphingolipids are correlated with proteostasis better than cholesterol.

      We attempted to add back specific sphingolipids to assess sufficiency. However, we found it challenging to ensure that these lipids were distributed to the correct subcellular locations at physiologically relevant levels. Without this crucial information, it was difficult to draw any conclusions about the sufficiency of the sphingolipids we tested to impair proteostasis.

      Alonso A, Goñi FM. 2018. The Physical Properties of Ceramides in Membranes. Annu Rev Biophys 47:633–654. doi:10.1146/annurev-biophys-070317-033309

      Ikonen E, Zhou X. 2021. Cholesterol transport between cellular membranes: A balancing act between interconnected lipid fluxes. Dev Cell 56:1430–1436. doi:10.1016/j.devcel.2021.04.025

      Niekamp P, Scharte F, Sokoya T, Vittadello L, Kim Y, Deng Y, Südhoff E, Hilderink A, Imlau M, Clarke CJ, Hensel M, Burd CG, Holthuis JCM. 2022. Ca2+-activated sphingomyelin scrambling and turnover mediate ESCRT-independent lysosomal repair. Nat Commun 13:1875. doi:10.1038/s41467-022-29481-4

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      1. Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The authors present the use of previously identified biosensors in a single-molecule concentration regime to address lipid effector recruitment. Using controlled and careful single-cell based analysis, the study investigates how expression of the commonly used PIP3 sensor based on Akt-PH domain interferes with the native detection of PIP3. Predominantly live-cell fluorescence microscopy coupled to image analysis drives their studies.

      Conceptually, this manuscript carefully and quantitatively describes the influence of lipid biosensor overexpression and presents a means to overcome the inherent and long-recognized problems therein. This solution, namely employing low expression of the lipid biosensor, should be generally applicable. The work is of general interest to cell biologists focused on answering questions at membranes and organelles, including especially those interested in lipid-mediated signaling transductions.

      Reviewer 1 Major:

      #1.1 The terminology "single molecule biosensor" is not really appropriate. A protein is not "single-molecule". An enzyme does not "single molecule". Better is biosensors at single-molecule expression levels. In most cases, this should be changed. Single-molecule vs single-cell vs. bulk measurements are often poorly defined in quantifications and conflating these does not help the case, which is already supported by generally clear data.

      We appreciate the reviewer’s thoughtful critique of our grammatically incorrect use of jargon; we saw this as soon as they mentioned it! We have amended the manuscript where appropriate as detailed:

      • Title is now changed to “Lipid Biosensors Expressed at Single Molecule Levels Mitigates Inhibition of Endogenous Effector Proteins”
      • Last paragraph of the introduction on p. 2 now reads “As well as alleviating inhibition of PI3K signaling, biosensors expressed at these low levels show improved dynamic range and report more accurate kinetics than their over-expressed counterparts.”
      • The title of the results section on p. 6 is now: Mitigating PIP3 competition using biosensors expressed at single molecule levels
      • Last paragraph of the results section on p. 6 now reads: “this showed that when expressed at single molecule levels, the biosensor has substantially better dynamic range”.

      #1.2 Figure 1D-F, images not as clearly describing quantitation as one would hope. Untransfected cells in 1E should demonstrate more translocated Akt-pS473 than transfected, but it is difficult for this reviewer to find. Consider inset images in addition to the wider field. Consider also moving the "negative" data of Fig 1B-C to Supplement.

      We regret not making this figure easier to interpret; we have substantially updated the figure, as comprehensively detailed in our point-by-point response to reviewer 2’s point 2.3. To specifically address this reviewer’s concerns:

      The older figure used non-confocal, low-resolution images that were used for quantification. Such an approach was employed to enable fluorescence from the entire cellular volume to be captured, which produces more robust quantification. However, to the reviewer’s point, it is not possible to see the translocation of PH-AKT1 nor translocated AKT-pS473 in these images. Fortunately, we had in parallel captured high-resolution confocal images for some experiments. These are now shown in Fig 1D-E, which clearly show translocated AKT-pS473 and PH-AKT-EGFP.

      #1.3 The cell line being used is not clearly specified after the initial development of the NG1 followed by CRISPRed NG2 onto Akt. For example, for the Figure 3C experiments, the text states "complete ablation of endogenous AKT1-NG2" but this information is not apparent from the figure legend or figure. Throughout the cell line used and the aspects transfected need to be made explicitly clear.

      We are grateful to the reviewer for highlighting this ambiguity. We have now defined the gene-edited cells used throughout as “AKT1-NG2 cells” and expressly used this term when referring to experiments in figures 2-5.

      #1.4 Fig. 5 shows single cells. It is therefore unclear if broken promoters have resulted in decreased expression. This point is important because the expression plasmids should be made publicly available, and for their use to be understood properly, this must be clarified. The details of the plasmids are unclear. Perhaps listed in the table? - unclear. This aspect would be important for the field to effectively use the reagents.

      Thank you for drawing our attention to the lack of adequate detail here. We have now updated the results text to expressly reference Morita et al., 2022 where the origins of the truncated CMV promoters are detailed. We have also updated the plasmids table 1 to add pertinent details for these constructs: pCMVd3 plasmids are based on the pEGFP-C1 backbone, with the CMV promoter truncated to remove 18 of the 26 putative transcription factor binding sites in the human Cytomegalovirus Major Intermediate Enhancer/Promoter (pCMV∆3 as described in Morita et al., 2012). The full sequences will be deposited with the plasmids on Addgene.

      We did not perform a formal comparison of full vs truncated promoters. Our only observation is that the truncated promoters greatly help in increasing the number of expressing cells presenting single-molecule resolvable expression levels (though the approach can still work with full promoters).

      #1.5 This manuscript speculates several times that with more abundant PIs like PI45P2, the observed saturation effect is probably not happening. This should be removed. While the back of envelope calculations may reflect an ideal scenario, the heterogeneity of distribution and multiple key cellular structures involved would seem to corral increased PI45P2 levels in certain regions. These factors amid multivalency and electrostatic mechanisms of lipid effector recruitment (e.g. MARCKS) suggest that speculation may be too strong. Moreover, Maib et al JCB 2024 demonstrated PI4P probe overexpression could directly mask the ability to detect PI4P post-fixation - not fully, but partially. Repeating the titration experiments of this manuscript for multiple PIs is entirely beyond the scope of reasonable, and hence, such experiments are not requested, in favor of adopting more conscientious speculation.

      The reviewer’s point is well taken. Whilst we still believe the overall argument for lipids is sound (for example, PS or cholesterol are far too abundant for any expressed, stoichiometric binding protein to bind the majority of the population), even abundant phosphoinositides like PI4P and PI(4,5)P2 are an edge case. We have therefore updated the first paragraph of the introduction on p. 1 to be less explicit: One of the most prominent is the fact that lipid engagement by a biosensor occludes the lipid’s headgroup, blocking its interaction with proteins that mediate biological function. It follows that large fractions of lipid may be effectively outcompeted by the biosensor, inhibiting the associated physiology. We have argued that, in most cases, this is unlikely because the total number of lipid molecules outnumbers expressed biosensors by one to two orders of magnitude (Wills et al., 2018). However, for less abundant lipids, total molecule copy numbers may be in the order of tens to hundreds of thousands, making competition by biosensors a real possibility.

      We also removed the explicit discussion of PI(4,5)P2 from the introduction, and focus now solely on the PI3K lipids.
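      The copy-number argument quoted above can be illustrated with a rough bound. The following is only a minimal sketch, assuming strict 1:1 stoichiometric binding and that every biosensor molecule is lipid-bound (a worst case); the function name and all copy numbers are hypothetical placeholders chosen for illustration, not measurements from the manuscript.

```python
def max_fraction_occupied(biosensor_copies: int, lipid_copies: int) -> float:
    """Upper bound on the fraction of lipid molecules a biosensor can occupy.

    Assumes 1:1 binding and that every biosensor molecule is bound,
    so the bound can never exceed 1.0.
    """
    if lipid_copies <= 0:
        raise ValueError("lipid_copies must be positive")
    if biosensor_copies < 0:
        raise ValueError("biosensor_copies must be non-negative")
    return min(1.0, biosensor_copies / lipid_copies)


# Hypothetical copy numbers per cell, for illustration only.
OVEREXPRESSED_BIOSENSOR = 100_000
SINGLE_MOLECULE_BIOSENSOR = 100
ABUNDANT_LIPID = 10_000_000  # two orders of magnitude above the biosensor
SCARCE_LIPID = 50_000        # "tens to hundreds of thousands" regime

abundant_case = max_fraction_occupied(OVEREXPRESSED_BIOSENSOR, ABUNDANT_LIPID)
scarce_case = max_fraction_occupied(OVEREXPRESSED_BIOSENSOR, SCARCE_LIPID)
low_expression_case = max_fraction_occupied(SINGLE_MOLECULE_BIOSENSOR, SCARCE_LIPID)
```

      Under these assumed numbers, an over-expressed biosensor can occupy at most a small fraction of an abundant lipid but could in principle occupy all of a scarce one, whereas expression at single-molecule levels keeps the bound small even for the scarce lipid.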

      Reviewer 1 Minor:

      #1.6 Schematics throughout need simplification, enabling their enlargement.

      We have now enlarged the size of all schematics.

      #1.7 Numerous spelling (Fig. 4 schemas) and capitalizations need fixing.

      Thank you for drawing our attention to these. We have thoroughly proof-read the figure panels and corrected errors.

      #1.8 Pg 1 Famous is not appropriate wording

      We respectfully beg to differ with the reviewer here. We believe it is perfectly accurate to state that PIP3 is a second messenger molecule that is known about by many people; we see this as the dictionary definition of the word “famous”.

      #1.9 Fig. 1A statistical testing of microscopy quantifications absent (generally, throughout) and should be included.

      This was indeed an oversight on our part. We have now added appropriate multiple comparisons tests to the data presented in figures 1F, 3F, 4C, 4F and 5C.

      #1.10 Fig.1. In a transient transfection, the protein expression is not uniform. Please explain how you normalized the quantification.

      We hope this is now clarified by the expanded “Image Analysis” part of the methods section on pp. 10-11 (relevant sentence is underlined): For immunofluorescence, we identified individual cells by auto thresholding the DAPI channel using the “Huang” method, followed by the Watershed function to segment bunched cells that appeared to touch. We then used the Voronoi function to generate boundary lines for the segmentation of the cells. To identify cytoplasm, auto thresholding of the CellMask channel using the “Huang” function was employed, with the cells segmented by adding the nuclear Voronoi boundaries. The “analyze particles” function was then used to identify individual cellular ROIs that were greater than 10 µm² and were not touching the image periphery. These ROIs were used to measure the raw 12-bit intensity of the EGFP and AKT-pS473 channels. A cutoff of EGFP > 100 was used to define EGFP-positive cells, since this value was greater than the mean ± 3 standard deviations of the non-transfected cells’ EGFP intensity. Background intensity of AKT-pS473 was estimated from control cells subject to immunofluorescence in the absence of AKT-pS473 antibody; this value was subtracted from the measured values of all other conditions.
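      The per-cell gating and background subtraction described in the quoted methods could be sketched roughly as follows. This is an illustrative sketch only, not the analysis code used for the manuscript (the segmentation itself is described above in terms of image-analysis functions and is not reproduced here); the function names and all intensity values are hypothetical, and the fixed cutoff of 100 is taken from the quoted text.

```python
from statistics import mean, stdev


def egfp_upper_bound(untransfected_egfp, n_sd=3.0):
    """Mean + n_sd sample standard deviations of EGFP intensity
    measured in non-transfected cells."""
    return mean(untransfected_egfp) + n_sd * stdev(untransfected_egfp)


def gate_and_subtract(cells, egfp_cutoff, ps473_background):
    """Split per-cell (EGFP, AKT-pS473) intensities by the EGFP cutoff and
    return background-subtracted AKT-pS473 values for each group."""
    groups = {"egfp_positive": [], "egfp_negative": []}
    for egfp, ps473 in cells:
        key = "egfp_positive" if egfp > egfp_cutoff else "egfp_negative"
        groups[key].append(ps473 - ps473_background)
    return groups


# Hypothetical raw 12-bit intensities, for illustration only.
untransfected = [40, 52, 47, 61, 45, 55]
bound = egfp_upper_bound(untransfected)  # should sit below the chosen cutoff

EGFP_CUTOFF = 100        # cutoff stated in the methods text
PS473_BACKGROUND = 120   # hypothetical no-primary-antibody control value
cells = [(35, 820), (48, 790), (950, 410), (1800, 300)]
gated = gate_and_subtract(cells, EGFP_CUTOFF, PS473_BACKGROUND)
```

      Checking that the mean + 3 SD bound of the non-transfected population falls below the chosen cutoff is the sanity check implied by the methods text; the per-group values can then be passed to whatever statistical comparison is appropriate.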

      #1.11 Fig. 1D. EGFP expression levels increased with EGF stimulation. How is this possible?

      The apparent difference arose by chance: the field originally chosen for the EGF-stimulated EGFP cells happened to contain 5 strongly expressing cells. The high-resolution images in the new figure 1 were selected to be more representative.

      #1.12 Fig. 1D. The images have pS473 whereas the y-axis label on box plots has p473. Can these box plots be labelled separately for consistency?

      Thank you. This has now been corrected in the revised Figure 1.

      #1.13 Fig.1. T308 phosphorylation is mentioned in Figure 1, but only pS473 data is shown.

      Both T308 and S473 phosphorylation are indicative of AKT activation. However, antibodies suitable for immunofluorescence are only available for pS473, hence why our experiments are restricted to this moiety.

      #1.14 Fig.1 legend. 'Over-expression of PH-AKT is hypothesised to outcompete the endogenous AKT's PH domain'. Why do you need to state a hypothesis in the legend?

      We included this statement for the benefit of the casual reader – i.e. one who looks at the pictures, but doesn’t read the main text!

      #1.15 Fig.1E You stated that the PH-AKT R25C-EGFP is stimulated by EGF addition. However, the GFP signal looks the same in both unstimulated and stimulated. Could you please clarify? Are you sure that the stimulation worked?

      We have clarified the second paragraph of the results section “Inhibition of AKT activation by PIP3 biosensor” on p. 4 as follows: In the non PIP3 binding PH-AKT1R25C-EGFP positive cells, we still observed an increase in pS473 intensity.

      The revised figure 1 images also show that PH-AKT1R25C does not translocate to the membrane with EGF stimulation.

      #1.16 You mention...that the AKT enzyme is activated by PDK1 and TORC2, which phosphorylate at residues T308 and S473, respectively. Phosphorylation is also known to occur on T450 at c-tail. Does this phosphorylation also contribute to its activation?

      Yes and no. Threonine 450 phosphorylation is thought to occur co-translationally and is important for AKT stability (see Truebestein et al as cited in the manuscript). It is not really relevant in the context of T308 and S473, which are phosphorylated acutely to activate the protein.

      #1.17 Fig. 1 scale bar in all images equivalent?

      We have now added scale bars to panels in both figure 1D and E to clarify.

      #1.18 Pg. 1 paragraph 1 "we have argued..." vs. paragraph 3 "...consider that an..." feels like arguing with themselves.

      We believe the re-write we have done in response to major point #1.5 clarifies this point also.

      #1.19 Pg. 1 para 3 what is RFC score - must explain

      We have now defined this more clearly in the third paragraph of the introduction on p. 1: PH domain containing PIP3 effector proteins can be predicted based on sequence comparison to known PIP3 effectors vs non effectors using a recursive functional classification matrix for each amino acid (Park et al., 2008).

      #1.20 Discussion of numbers of PIP3 vs. effectors etc may not be appropriate for the introduction, as the points made by these calculations are already made in the previous paragraphs. May fit better in pg 6 Mitigating PIP3 titration... with an accompanying schematic.

      Respectfully, we prefer to keep this discussion of molecular concentrations, as this adds details and specifics to the pathway that is core to the paper.

      #1.21 Pg 2 "a neonGreen" not well defined, needs accurate description.

      We have clarified this in the sentence in the first paragraph of the results section “Genomic tagging of AKT1…” on p. 4, which includes the citation to the full description of the tag: To that end, we used gene editing to incorporate a bright, photostable neonGreen fluorescent protein to the C-terminus of AKT1 via gene editing using a split fluorescent protein approach (Kamiyama et al., 2016).

      #1.22 Fig 2C should give a unstimulated trajectory of puncta/100 um2 to compare with the stimulated

      Unfortunately, we did not record a full 5.5-minute video-rate time-lapse with unstimulated cells. However, we do not believe this control is essential for this experiment, since this example data is included to illustrate (1) the problem of photobleaching, which is clear in the 30-s pre-stimulus and (2) the variability in the raw molecule counts.

      #1.23 Fig 2C and F and G should be systematized for easier comparison. E.g. min vs seconds, 0 timepoint of EGF/rapa addition

      We have adjusted figure 2C to be consistent with 2F and G.

      #1.24 Pg 5 "...and calibrated them..." unclear what is being calibrated, as the text later states that the histograms are fit to monomer/dimer/multimer model resulting in 98.1% in monomer. Minor point.

      We have clarified this point in the second paragraph of the results section “Genomic tagging of AKT1…” on p. 4 as follows: We analyzed the intensity of these spots and compared them to intensity distributions from a known monomeric protein localized to the plasma membrane (PM) and expressed at single molecule levels

      #1.25 Explain why baselines in Fig2CFG are different

      We did not comment on figure 2C; it is a single cell measurement, as opposed to the mean of 20 cells reported in F. However, we do now clarify the difference between figure 2F and G at the very end of the “Genomic tagging of AKT1…” results section on p. 4: Notably, baseline AKT-NG2 localization increased from ~5 to ~15 per 100 µm2 in iSH2 cells, perhaps because the iSH2 construct does not contain the inhibitory SH2 domains of p85 regulatory subunits, producing higher basal PI3K activity.

      #1.26 Fig. 2 has quantification with images; Fig. 3 has it separate. Make consistent.

      We sometimes combine images with quantification, and other times separate the panel containing graphs. This is done deliberately, depending on whether the reader is directed to both together, or whether we consider the data separately in the results section.

      #1.27 Fig. 3B comes before images? Where are the images? Also, y-axis = Intensity (a.u.). Is intensity just full image field? Or per cell? All very unclear.

      We have modified both the graph y-axis label and the figure legend to clarify: (C) TIRF imaging of AKT1-NG2 cells from (B) stimulated with 10 ng/ml EGF

      #1.28 Fig. 3C missing images

      We believe the reviewer is referring to the mCherry channel for the “0 ng cDNA” condition. These images are missing because they do not exist. Since these cells were transfected with pUC19, there was no mCherry fluorescence to image.

      #1.29 Fig 3 C needs brightness/contrast adjusted as images are nearly entirely black (zero values).

      We believe the addition of insets addresses this concern. Regarding the reviewer’s specific suggestion, we found that further increases in brightness and contrast bring up the camera noise, which then occludes the signal from single molecules, such as those found after EGF stimulation of the 0 ng condition.

      #1.30 Fig 3C needs scale bar systemization

      We believe that the incorporation of scaled 6 µm insets addresses this point.

      #1.31 Fig 4 needs 4 panels A-D

      We have now added these individual panel labels to figure 4.

      #1.32 Pg 6 5-OH phosphatases needs reference

      We have added a citation to Trésaugues at the very end of the “Sequestration of PIP3 by lipid biosensors” results section on p. 6, which describes the activity of the whole 5-OH phosphatase family against PIP3, not just the SHIP phosphatases.

      #1.33 Fig 5B, make images bigger

      Again, we trust that the addition of insets to all single molecule images has addressed this point.

      Reviewer 1 Referees cross-commenting

      I have read the other reviews and find them entirely reasonable. My impression is we landed on similar general content that needs work, none of which is out of line. The importance and care taken in the author's work is uniformly lauded.

      We agree. At the risk of resorting to alliteration, we have been delighted to receive a trio of clear, concise and consistent comments on the manuscript! We believe it is now much improved.

      Reviewer #1 (Significance (Required)):

      This manuscript clearly and reasonably demonstrates that the commonly used PIP3 sensor can be titrated to low concentrations, at which it does not interfere with Akt translocation and activation. This work is a good technical reference for the field. Signal transduction and membrane biologists should be especially interested in the data. The reviewer/s have core expertise in phosphoinositides, protein biochemistry, cell biology, and membrane biophysics.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The authors characterize the inhibition of lipid second messenger mediated cell signaling through lipid biosensors that outcompete endogenous effector proteins. This is a very important study, as it quantitatively assesses an issue that many people suspected to exist, yet never properly characterized. This paper is therefore as much a service to the community as a research study in its own right and should be published without undue delay. I am glad that the authors decided to carry out this study & really appreciate their work.

      I do however, have a number of suggestions that I think will make the manuscript stronger and can be readily implemented, mostly by reformulating and/or re-analysis of existing datasets. I've structured my comments by the datasets in the respective figures to follow the logic of the paper.

      Reviewer 2 Major:

      #2.1 Throughout the manuscript, statistical tests are missing, e.g. in figures 1C-F. This must be amended in the revised version. The authors are making a very quantitative point about buffering, data should be treated accordingly.

      We have now added appropriate multiple comparisons tests to figures 1F, 3F, 4C, 4F and 5C.

      #2.2 I do not think that "PIP3 titration" is the best term to describe the observed effect. "Titration" usually implies the controlled modulation of a concentration, e. g. in analytical chemistry. I think either "competitive binding of PIP3" or "buffering of free PIP3" are more adequate.

      This point is well taken. We have now replaced the word “titration” throughout, replacing it with either “competitive binding” or “sequestration”.

      #2.3 Specific comments: Figure 1

      #2.3a Why are data in 1D-F shown as median, with interquartile ranges and 10-90 percentile distance when everything else in the paper is mean +/- se? There might be a good reason for it, but I did not find it mentioned anywhere

      For consistency’s sake, we have changed figure 1F to show a bar graph, though as noted in the figure legend: Graphs show medians ± 95% confidence interval of the median from 82-160 cells pooled from three experiments (medians are reported since the data are not normally distributed).

      #2.3b The authors should test, whether the difference between the +EGF conditions in 1D (EGFP) and 1F (PH-AktR25C-EGFP) is indeed statistically significant. If this observation holds up, what does it mean? Is the mutant still competing with endogenous Akt despite the much-reduced binding affinity? The authors should discuss.

      We have re-analyzed the data in figure 1, with the quantitative data presented in figure 1F combined with statistical analysis. The new data show no significant effect of the PH-AKT1R25C mutant in either the resting or the EGF-stimulated condition.

      These results are also described in the second paragraph of the first results section on pp. 3-4: This analysis showed that the R25C mutant had no substantial effect on pS473 levels, whereas wild-type PH-AKT greatly inhibited pS473 staining in EGF-stimulated cells as well as reducing basal levels in serum starved cells (Fig. 1F).

      #2.3c How were biosensor/GFP positive cells chosen? Did the authors choose a defined fluorescence intensity cut-off? I think that a pure manual selection is problematic from a methodological point of view as this may introduce biases. Since the authors use Fiji, they can also simply use the "Analyze particles" function, which allows to automatically segment cells from a thresholded image. By choosing the same threshold for all images, it would be ensured that all images are treated exactly the same way.

      We had initially opted for manual outlining of cells since automatic segmentation of irregularly-shaped HEK293a cells is imperfect. However, we agree with André that this opens the possibility of bias. We have therefore re-run the analysis with an automated segmentation and thresholding approach, as suggested. This is detailed in the second paragraph of the first results section on pp. 3-4: In parallel, we imaged cells with a low resolution 0.75 NA air objective to capture fluorescence from the cells’ entire volume, then quantified these images using an automatically determined threshold for GFP-positive cells (see Materials and Methods). This analysis showed that the R25C mutant had no substantial effect on pS473 levels, whereas wild-type PH-AKT greatly inhibited pS473 staining in EGF-stimulated cells as well as reducing basal levels in serum starved cells (Fig. 1F).

      Further detail is provided in the first paragraph of the “Image analysis” subsection of the methods on pp. 10-11: For immunofluorescence, we identified individual cells by auto thresholding the DAPI channel using the “Huang” method, followed by the Watershed function to segment bunched cells that appeared to touch. We then used the Voronoi function to generate boundary lines for the segmentation of the cells’ cytoplasm. To identify cytoplasm, auto thresholding of the CellMask channel using the “Huang” function was employed, with the images segmented by adding the nuclear Voronoi boundaries. The “analyze particles” function was then used to identify individual cellular ROIs that were greater than 10 µm2 and were not touching the image periphery. These ROIs were used to measure the raw 12-bit intensity of the EGFP and AKT-pS473 channels. A cutoff of EGFP > 100 was used to define EGFP-positive cells, since this value was greater than the mean ± 3 standard deviations of the untransfected cells’ EGFP intensity. Background intensity of AKT-pS473 was estimated from control cells subject to immunofluorescence with the AKT-pS473 antibody omitted; this value was subtracted from the measured values of all other conditions.
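
      The thresholding logic quoted above can be sketched in a few lines. This is only an illustration of the arithmetic, not the authors' Fiji workflow: the function names, the per-cell dictionaries and all intensity values are hypothetical, and the cutoff is computed here as mean + 3 standard deviations of untransfected cells (the quoted methods use a fixed cutoff of 100 chosen to exceed that bound).

```python
from statistics import mean, stdev


def egfp_cutoff(untransfected, n_sd=3.0):
    """Return mean + n_sd sample standard deviations of untransfected-cell EGFP intensity."""
    return mean(untransfected) + n_sd * stdev(untransfected)


def classify_and_correct(cells, cutoff, ps473_background):
    """Keep EGFP-positive cells and subtract the no-antibody background from pS473."""
    return [
        {"egfp": c["egfp"], "ps473": c["ps473"] - ps473_background}
        for c in cells
        if c["egfp"] > cutoff
    ]


# Hypothetical per-cell ROI intensities (raw 12-bit values), for illustration only.
untransfected_egfp = [40, 55, 48, 62, 51, 45, 58, 50]
cells = [
    {"egfp": 52, "ps473": 900},    # below the cutoff: treated as EGFP-negative
    {"egfp": 480, "ps473": 310},
    {"egfp": 1500, "ps473": 260},
]

cutoff = egfp_cutoff(untransfected_egfp)
positive = classify_and_correct(cells, cutoff, ps473_background=120)
print(f"cutoff = {cutoff:.1f}; EGFP-positive cells kept: {len(positive)}")
```

      Applying one cutoff and one background value to every image is what makes the conditions directly comparable, which is the point the reviewer raised about manual selection.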

      #2.3d I am missing a statement in the methods section that all images were acquired using the same settings.

      This was indeed an important oversight on our part – thanks for spotting the omission of this crucial detail. This is now included at the end of the “Immunofluorescence” section of the Methods on pp. 9-10: Identical laser excitation power, scan speeds and photomultiplier gains were used across experiments to enable direct comparison.

      #2.3e I recommend that the authors include a single cell correlation plot of EGFP fluorescence intensity vs AktpS473 intensity in Figure 1 D-F. This should be rather informative & make the concentration dependence clear.

      We did not observe a strong correlation between PH-AKT1-EGFP intensity and pS473 staining, likely driven by both the imprecision of the cell segmentation and the fact that very low concentrations of PH domain effectively inhibit endogenous AKT1 (as we show in the later figures with the more precise, live cell AKT-NG2 recruitment experiments: see response to #2.5).

      #2.3f I further recommend that the authors look at alterations of baseline Akt activity in the presence of the biosensor. In the images it looks like there might be an effect, but this is then lost in the analysis due to the normalization.

      As covered in our response to #2.3b, there is indeed an inhibition of baseline pS473 in PH-AKT1-EGFP expressing cells, now explicitly quantified and documented in results.

      #2.3g Please include zoomed image insets in Fig. 1D-F, in the current magnification one needs to zoom in quite a bit to see the effect in the raw data. It is a clear effect, but having a zoomed version would make for much easier reading.

      We now include high-resolution confocal images instead of low power, low NA volumes as shown in the last version of the manuscript, which we believe addresses this point and also reviewer #1.2.

      #2.3h Up to the authors: I wonder whether it is possible to extract an IC50 value for the competitive inhibition of Akt by the respective biosensors. The transient expression gives the authors access to a wide range of expression levels at the single cell level, which could be quantified by counterstaining with an EGFP-nanobody at a different color (since the EGFP fluorophore went through the fixation process, it is likely unsuitable for quantification) and microscope calibration. Activity could be quantified as the ratio of observed and expected Akt-pS473 fluorescence (derived from the mean FI per cell from the EGFP control). This is not strictly necessary, but would be a beautiful quantitative experiment, give an easy-to-understand number & make the paper much stronger.

      This is a great suggestion, but does not produce precise enough data to work out, as we detail in response to #2.3e. From our data in new figure 3F and figure 5, it seems we have not explored the appropriate expression range to see intermediate levels of inhibition necessary to estimate IC50. This would be a cool experiment though!

      #2.4 Specific comments: Figure 2. Overall, compelling data. However, 25 molecules/100 um^2 at maximal recruitment feels low. Assuming a total cell surface area of appr. 2000 um^2 per cell and taking a baseline of 5 molecules/100 um^2 into account, this would mean that only about 400 copies of Akt are recruited in response to a pretty robust stimulus. Is it possible that the association reaction of the split GFP is not complete under these conditions? I think that a direct measurement of intracellular endogenous Akt concentration is required to put these numbers into context.

      This is an excellent point that we had missed. We now specifically address this point in the third paragraph of the “Genomic tagging of AKT…” section on p. 4: Accumulation of AKT-NG2 was ~25 molecules per 100 µm2, which assuming a surface area of ~1,500 µm2 per cell corresponds to ~375 molecules total. It should be noted that tagging likely only occurred at a single allele in each cell, and the population still exhibited expression of non-edited AKT1 (Fig. 2B). Given that HEK293 are known to be pseudotriploid (Bylund et al., 2004), the true number of AKT1 molecules would be at least 1,125. However, given an estimated total copy number of 23,000 AKT1 in these cells (Cho et al., 2022), this is still only about 5%. However, we do not interpret these raw numbers due to uncertainties in the efficiency of NG2 complementation under these conditions, as well as potential for reduced expression from the edited allele.
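
      The estimate quoted above follows from three multiplications, reproduced here as a check. The input numbers are those given in the quoted passage; the variable and function names are ours, and the calculation assumes, as the passage does, that one of three alleles is tagged and that all alleles are expressed equally.

```python
def recruited_molecules(density_per_100um2, surface_area_um2):
    """Molecules at the membrane for a given density (per 100 um^2) and cell surface area."""
    return density_per_100um2 * surface_area_um2 / 100


density_per_100um2 = 25      # AKT-NG2 accumulation, molecules per 100 um^2
surface_area_um2 = 1500      # assumed surface area per cell
alleles = 3                  # HEK293 treated as pseudotriploid, one allele tagged
total_akt1_copies = 23000    # estimated AKT1 copy number per cell

tagged = recruited_molecules(density_per_100um2, surface_area_um2)
all_alleles = tagged * alleles
fraction = all_alleles / total_akt1_copies

print(f"tagged molecules recruited: {tagged:.0f}")
print(f"scaled to all alleles:      {all_alleles:.0f}")
print(f"fraction of total AKT1:     {fraction:.1%}")
```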

      We also removed the specific comment on molecule density from the abstract.

      #2.5 Specific comments: Figure 3 I think that the classification by plasmid dose does not make a lot of sense, as the resulting expression levels are rather similar. I suggest to pool all traces and calculate mean curves by actual expression levels using a binning approach (e.g. 0-50 au, 50-100 au and so on in raw intensity from Figure 3b). If there is an effect in the realized concentration regime, this should pick it up.

      This is an excellent suggestion, and we have done just that: thank you! The data is now included as a new panel Fig. 3F. The result is described in the results section, “Sequestration of PIP3 by lipid biosensors”, end of the first paragraph on pp. 4-6: To observe the concentration-dependence of AKT1-PH-mCherry inhibition, we pooled the single cell data from these experiments and split transfected cells into cohorts based on raw expression level (excitation and gain were consistent between experiments, allowing direct comparison). This analysis showed profound inhibition of AKT1-NG2 recruitment at all expression levels, with a slightly reduced effect only visible in the lowest expressing cohort (Fig. 2F).
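
      A binning analysis of this kind can be sketched as follows. The 50 a.u. bin width is taken from the reviewer's example; the cohort boundaries actually used for Fig. 3F are not stated in this response, and the function names and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def cohort_label(intensity, bin_width=50):
    """Return the (lower, upper) bounds of the expression bin containing `intensity`."""
    lower = int(intensity // bin_width) * bin_width
    return (lower, lower + bin_width)


def mean_response_by_cohort(cells, bin_width=50):
    """Pool (expression, response) pairs from all cells and average the response per bin."""
    groups = defaultdict(list)
    for expression, response in cells:
        groups[cohort_label(expression, bin_width)].append(response)
    return {label: mean(values) for label, values in sorted(groups.items())}


# Hypothetical single-cell data: (raw mCherry intensity in a.u., AKT1-NG2 recruitment).
pooled_cells = [(12, 18.0), (35, 14.0), (60, 6.0), (95, 5.0), (140, 4.5)]
for (low, high), response in mean_response_by_cohort(pooled_cells).items():
    print(f"{low:>4}-{high:<4} a.u.: mean recruitment {response:.2f}")
```

      Pooling across plasmid doses before binning only makes sense because, as the quoted text notes, excitation and gain were held constant between experiments.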

      #2.6 Specific comments: Figure 5 These are very interesting data, in particular with regard to the underlying PIP3 dynamics. I agree with the conclusion of the authors that shielding of PIP3 from degradation is the likely culprit. What I would like to see here is actual kinetic fits - and different terms. On- and off-rate imply biosensor binding, but these are likely rather fast and not on the minute-timescale. The detected processes are much more likely to reflect production and degradation of PIP3 and that should be reflected in the terminology. For the fit: I think that a simple rate law for subsequent reactions ([PIP3] = C(e^(-k1 t) - e^(-k2 t))) will give good results and yield effective rate constants for PIP3 generation and degradation. This implies the quasi-steady state assumption for biosensor binding and implies that [PIP3] is proportional to the biosensor bound [PIP3], but these are reasonable assumptions to make.

      This is an excellent suggestion, which we have added. Specifically, fits are now present on Figs. 5G and 5I; we describe these in the last paragraph of results on p. 8: Normalizing data from both expression modes to their maximum response (Fig. 5G) and fitting kinetic profiles for cooperative synthesis and degradation reactions revealed the rate of synthesis is remarkably similar: 1.09 min–1 (95% C.I. 1.02-1.17) for single molecule expression vs 1.02 min–1 (95% C.I. 0.98-1.06) for over-expression. On the other hand, degradation slowed with over expression from 0.34 min–1 (95% C.I. 0.24-0.58) to 0.13 min–1 (95% C.I. 0.12-0.15). This is expected, since synthesis of PIP3 molecules would not be prevented by biosensor. On the other hand, PIP3 degradation could be slowed by the over-expressed biosensor competing with PTEN and 5-OH phosphatases that degrade PIP3. An even more exaggerated result is achieved with the cPHx1 PI(3,4)P2 biosensor; this shows an increase in fold-change over baseline of 600% for single molecule expression levels, compared to only 100% in over-expressed cells (Fig. 5H). Again, the degradation rate of the signal is substantially slowed by the over-expressed sensor, reducing from 0.27 min–1 (95% C.I. 0.22-0.39) to 0.16 min–1 (95% C.I. 0.14-0.19), whereas synthesis remains only minorly impacted, changing from 0.61 min–1 (95% C.I. 0.57-0.64) to 0.54 min–1 (95% C.I. 0.52-0.56) with over-expression (Fig. 5I). Collectively, these data show that single molecule based PI3K biosensors show improved dynamic range and kinetic fidelity compared to the same sensors over-expressed.
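
      For comparison, the simpler rate law the reviewer proposed in #2.6 can be evaluated directly. The sketch below reads that expression as a difference of two decaying exponentials, C(exp(-k1 t) - exp(-k2 t)), which is our assumption about the intended signs. It is not the cooperative model used for the fits reported above, and the rate constants are illustrative values rather than fitted ones.

```python
import math


def sequential_signal(t, c, k1, k2):
    """Two-exponential rate law for consecutive reactions; positive when k2 > k1 and c > 0."""
    return c * (math.exp(-k1 * t) - math.exp(-k2 * t))


def peak_time(k1, k2):
    """Time of maximum signal, found by setting the time derivative to zero (k1 != k2)."""
    return math.log(k2 / k1) / (k2 - k1)


# Illustrative rate constants in min^-1 (hypothetical, not taken from the fits).
k_slow, k_fast = 0.3, 1.0
t_peak = peak_time(k_slow, k_fast)
print(f"peak at {t_peak:.2f} min, signal {sequential_signal(t_peak, 1.0, k_slow, k_fast):.3f}")

# Halving the slower constant delays the peak and prolongs the decay phase.
print(f"peak with slower decay: {peak_time(k_slow / 2, k_fast):.2f} min")
```

      In this form the slower of the two constants governs the decay of the signal, which mirrors the authors' observation that over-expression mainly changes the degradation phase.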

      Details of the fits are given in a new methods section on p. 11:

      Fitting of reaction kinetics

      Curve fitting was performed in Graphpad Prism 9 or later. For the data presented in Figs. 5G and 5I, both synthesis and degradation phases displayed clear “s” shaped profiles not well fit by simple first order kinetics. Since activation of the PI3K pathway involves many multiplicative interactions between adapters and allosteric activation of the enzymes themselves, we assumed cooperativity and fit reactions with the two phase reaction as follows:

      Where Ft denotes ∆Ft/∆FMAX, nsyn and ndeg are the Hill coefficients of the respective synthesis and degradation reactions, and the rate constants for the reactions are derived from ksyn = 1/τsyn and kdeg = 1/τdeg.

      André Nadler

      Reviewer #2 (Significance (Required)):

      This is an important paper, analyses the effects of over-expressed lipid biosensors on cell signalling in some detail and will be of significant interest to a broad readership.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      This is essentially a methods paper in which the authors provide a detailed and highly quantitative analysis of the potentially deleterious effects of expressing phosphoinositide-binding domains as biosensors. Specifically, they study the effects on PIP3 signalling, using biosensors that are widely used in the field.

      They show that the most-commonly used method of expressing PIP3 biosensors using transient transfection with viral promotors has clear deleterious effects on downstream signalling due to out-competing the endogenous effectors. Importantly, they also describe a new approach to overcome this by developing new plasmids and methodology to express these reporters at low levels.

      Reviewer 3 Major comments:

      The work in this paper is thorough and very nicely done. I particularly appreciate the efforts to quantitate or estimate actual numbers and densities of molecules, which significantly strengthen their arguments. The data are excellent and strongly support all their conclusions. I would therefore be happy to see this work published in its current form.

      Reviewer 3 Minor comments:

      I only have some minor and optional suggestions for improvement.

      #3.1 In figure 1D-F they show that PH-Akt-EGFP expression can suppress downstream Akt activation by quantifying P-Akt signal by microscopy. In these panels they say they selectively measure this in GFP-expressing cells, but it is not clear how they define which cells are expressing GFP - was a threshold used? Also, it would be nice to also measure both PH-Akt-GFP and P-Akt staining by flow cytometry to look for a correlation. Is there a threshold of biosensor expression that blocks downstream signalling, or is there a linear relationship? This might help specifically measure how much biosensor is too much.

      This is an important comment, also raised by reviewer 2. We provide a detailed explanation and outline revisions that address this in our response to reviewer #2.3c; essentially, we replaced the analysis with an automated segmentation and quantification, estimating GFP-positive cells from a fraction of non transfected cells. We have not performed a FACS analysis, but as we note in our response to #2.3e and #2.3h, the correlation between EGFP and pAKT staining is imprecise in these experiments. The new Fig. 3C does address this point for AKT1-NG2 recruitment, as described in our response to #2.5.

      #3.2 Some of their microscopy images (e.g. Fig 1D-F, Fig 5) are very small and would benefit from a zoom box - especially when they are trying to demonstrate single molecule detection.

      This is a fair point raised by all of the reviewers in one form or another. We have added zoomed insets to all of the single molecule images in Figs 2-5, and added higher magnification, confocal section images to Fig. 1.

      Reviewer #3 (Significance (Required)):

      This is both a methods paper and cautionary tale for cell biologists working in this field. Whilst everyone who uses these probes should be aware of the potential risk of biosensors titrating out effectors, this is often not sufficiently acknowledged. This paper is a very nice and clear demonstration of these risks, exemplified with probably the most highly-used biosensor and key downstream signalling pathway.

      Whilst the concepts presented are not especially novel, this paper nonetheless makes an important contribution to the community and hopefully will make others more cautious in how they use these biosensors.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We want to thank both reviewers for their thorough and constructive review of our manuscript. Below, we have re-iterated their comments followed by an explanation of how we have revised the manuscript to address this.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      This manuscript presented by Segeren et al. applied an interesting HRASG12V inducible cell model to study the mechanism of cellular resistance to replication stress inducing agents. They also employed a novel reversible fixation technique which allows them to FAC sort cells according to their replication stress levels before applying single cell sequencing analysis to the same cell populations. By comparing cells with low levels of replication stress to cells with high levels of replication stress, they found that reduction in gene expression of FOXM1 target genes potentially protects cells against replication stress induced by CHK1i plus gemcitabine combination. Overall, this is a very interesting study. However, the following points should be addressed prior to publication:

      Major: 1. Figure 3E and 3F showed two lists of differentially expressed genes in γH2AX low cells. However, instead of arbitrarily extracting the FOXM1 target genes and TP53 targeted genes, it would be appreciated if the authors could perform an unbiased and unsupervised gene set enrichment analysis such as Enrichr.

      As recommended, we performed an enrichment analysis using Enrichr to identify transcriptional programs associated with the genes that were downregulated in the γH2AX-low cells. FOXM1 appeared as a prominent hit in different databases (both experimental and computational). We have included the lists of differentially expressed genes as an additional supplemental table (Table S1) and have included the Enrichr results as Table S3 (i.e. CHEA and ENCODE). We have described our results in lines 198-200 of the revised manuscript.

      2. At the experiment design stage, the authors also included HRASG12V status as a test condition because they previously found that HRASG12V mutation induces basal level replication stress and they would like to include this condition to study the adaptation to replication stress (line 110). However, the difference in HRASG12V negative and HRASG12V positive cells was not followed up in the later part of the paper. Can they show lists of differentially expressed genes identified under HRASG12V negative conditions as well (in the same format of Figure 3E and 3F) and comment on the differences as well?

      In the original manuscript, we included heatmaps of differentially expressed genes in the control cells in Figure S2. For improved clarity, we have modified this figure so that the heatmaps are labeled "Control cells". In the revised manuscript, we have also included Table S2, which lists the differentially expressed genes between γH2AX-low and γH2AX-high control cells, and Table S3, which lists the Enrichr results obtained based on these gene lists.

      We observed FOXM1 target genes in both the control and HRASG12V cells. Thus, the mechanism we identify does not appear to be specific to oncogenic Ras expression. We discuss this in lines 221-225. Because there were no other notable differences between the gene sets, we do not focus on this in the manuscript.

      3. In line 194 and in Figure S2B, the authors claimed that ANLN, HMGB2, CENPE, MKI67, and UBE2C demonstrated co-expression, but other genes displaying similar correlation scores were not commented on (such as F3, CYR61, CTGF, etc). To avoid being biased at the analysis stage, the authors should define clearly what the cut-off of correlation score is and why only co-expression of ANLN, HMGB2, CENPE, MKI67, and UBE2C were mentioned.

      As suggested, we now explain in the revised manuscript that we focused on gene clusters consisting of at least 3 genes, each of which had a correlation coefficient greater than or equal to 0.4 with at least one other gene within the cluster. This cutoff is typically defined as representing a "moderate to good" correlation in biological data (Overholser, Sowinski, 2008). To make clear which clusters of correlated genes passed these criteria, we have also highlighted these genes in Figure S3B. This returned the cluster we had already identified as FOXM1 targets and, as the reviewer rightly spotted, a larger cluster which included F3, CYR61, CTGF, SERPINE1, ANKRD1, KRTAP2-3, UGCG, and AMOTL. Our Enrichr analysis did not identify any putative transcription factors linking the genes in this larger cluster. We are still interested in identifying the putative transcriptional regulation mechanism linking these genes in future studies, but this is beyond the scope of the current manuscript. We have described these observations in lines 211-218.
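
      One way to apply the stated criteria (a correlation coefficient of at least 0.4 with at least one other cluster member, and at least 3 genes per cluster) is to link genes whose pairwise correlation passes the cutoff and keep the connected groups that are large enough. This is a sketch under that assumption; the response does not say how clusters were actually delineated, and the gene labels and correlation values below are placeholders.

```python
def correlation_clusters(genes, pairwise_corr, cutoff=0.4, min_size=3):
    """Group genes linked by correlations >= cutoff; return groups with >= min_size members."""
    neighbours = {gene: set() for gene in genes}
    for (a, b), r in pairwise_corr.items():
        if a != b and r >= cutoff:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen, clusters = set(), []
    for gene in genes:
        if gene in seen:
            continue
        # Depth-first walk collects every gene reachable through passing correlations.
        stack, component = [gene], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        if len(component) >= min_size:
            clusters.append(sorted(component))
    return clusters


# Placeholder genes and correlation coefficients, for illustration only.
genes = ["geneA", "geneB", "geneC", "geneD", "geneE"]
pairwise_corr = {
    ("geneA", "geneB"): 0.50,
    ("geneB", "geneC"): 0.45,
    ("geneD", "geneE"): 0.90,   # strongly correlated pair, but only 2 genes
    ("geneA", "geneD"): 0.10,
}
print(correlation_clusters(genes, pairwise_corr))
```

      Note that under this reading two genes in the same cluster need not be directly correlated with each other, only connected through other members.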

      1. In line 215, instead of validating CENPE, UBE2C, HMGB2, ANLN, and MKI67 individually, the authors decided to validate FOXM1 instead, because they believe all the aforementioned genes are targets of FOXM1, therefore, validating FOXM1 alone would suffice. Again, this makes the validation process also biased. CENPE, UBE2C, HMGB2, ANLN, and MKI67 should be validated individually because they might sensitize cells to replication stress via different mechanisms. Besides, if all these genes were identified together because they are FOXM1 target genes, why did the authors not identify FOXM1 itself as a differentially expressed gene from the single cell sequencing? The sequencing only analyzed the S/G2/M cells, expression of FOXM1 should be detected easily.

      We agree with the reviewer that the omission of individual FOXM1 target genes from the validation process gives a biased impression. We therefore ordered siRNAs against CENPE, UBE2C, HMGB2, ANLN, and MKI67. As with the other DE genes in the original mini-screen, we first knocked down these genes using the siRNA Smartpools (pools of 4 individual siRNAs against each gene). Here, we observed a decrease in γH2AX signal in drug-treated cells transfected with all 5 Smartpools compared to drug-treated cells transfected with control siRNA. We next moved on to the deconvolution step of the screen, where we transfected cells with 4 individual siRNAs against each gene. Here, we observed inconsistent effects of ANLN, CENPE, and HMGB2 when comparing the individual siRNAs, all of which produce efficient knockdown of their target genes. Interestingly, for both MKI67 and UBE2C, each of the 4 individual siRNAs similarly decreased γH2AX signal, though the decrease was not as strong as that observed when FOXM1 is knocked down. Understanding the exact mechanism by which MKI67 and UBE2C reduce replication stress is beyond the scope of this paper, but we hypothesize that, as with FOXM1, it is likely linked to their role in promoting progression through the cell cycle. These results are shown in Figure S5, and we mention these remarkable findings in the revised abstract and discuss them in the light of the recent literature in the Discussion section (lines 275-286).

      We also addressed the comment about FOXM1 not being changed in the single cell RNA-seq analysis. We could indeed readily detect FOXM1 expression in our single-cell RNA sequencing data. Its expression did not differ significantly between cells sorted according to γH2AX level (Figure 4C). Because FOXM1 is highly regulated post-translationally, we hypothesized that increased replication stress correlates with an increase in the (active) protein rather than in transcript levels. This was indeed the case, and we further explain our experiment to test this hypothesis in response to Point #6 (results are displayed in Figure 4D and described in lines 201-209).

      1. As pointed out by the author in the Discussion, single cell sequencing is not good at differentiating the causes from the consequences. The author tried to validate many of the differentially expressed genes in γH2Ax low cells. However, the fact that only FOXM1 knockdown passed the validation and deconvolution pointed out that the great majority of the identified genes are not the cause of the sensitivity change to replication stress inducing agents but likely the consequences. Therefore, in Figure S2C and S2D, it would be better that the authors could just name the genes as 'downregulated genes' in Figure S2C and 'upregulated genes' in Figure S2D. Taking into consideration that the expression change in the great majority of these genes are just consequences of sensitivity change to replication stress, defining them as 'potentially sensitizing' genes and 'potentially conferring resistance' genes is rather misleading.

      We agree that the way we originally labeled these plots may have been misleading. We have renamed them to "Downregulated in yH2AXlow" and "Upregulated in yH2AXlow", as recommended by the reviewer.

      1. To better prove that FOXM1 is the leading cause of the sensitivity to CHK1i+Gemcitabine induced replication stress, can the authors show the FOXM1 expression status in the tolerant cell population identified in Figure 1B (lowest panel)? Alternatively, can they plot FOXM1 expression level in the same tSNE plots shown in Figure 3B to 3D to see whether some of the γH2Ax low populations also show reduced FOXM1 expression?

      FOXM1 expression levels were not increased in gH2AXhigh versus gH2AXlow HRASG12V cells in the single cell RNA-sequencing data (Figure 4C in revised manuscript). However, as mentioned in our answer to point #4, we performed an additional experiment, which showed a strong positive correlation between phospho-FOXM1 and γH2AX (as measured by flow cytometry) in S-phase cells (Figure 4D). This indicates that the active form of FOXM1 indeed increases as yH2AX levels increase, consistent with the observed increase in FOXM1 target genes. These results are described in lines 201-209.

      1. Clonogenic survival assay in Figure 4D was not quantified properly in Figure 4E. To rule out the siFOXM1 mediated growth/survival defects and to only focus on the siFOXM1 mediated resistance to CHK1i+Gemcitabine, the survival rate (intensity percent in this case) of CHK1i+Gemcitabine treated condition should be normalized against the survival rate of the Vehicle condition. E.g., the intensity percent of the siSCRAMBLE after treatment should be divided by the intensity percent of the untreated siSCRAMBLE; the intensity percent of the si#1 after treatment should be divided by the intensity percent of the untreated si#1, and so on. If the authors would like to show siFOXM1 induced growth/survival defects, they can still present the left part of the Figure 4E (the Vehicle group).

      Originally, we chose to show the absolute IntensityPercent for all groups, without normalizing to the untreated group, because we wanted to also highlight the FOXM1-mediated changes in growth. We agree that normalizing the IntensityPercent of the drug-treated group to the vehicle group better highlights the siFOXM1-mediated resistance. We have therefore re-analyzed the data and presented it this way in Figure 5E (described in lines 293-295). We moved our original Figure 4E to a new supplemental figure (Figure S4B) to still point out the effects of siFOXM1 on cell growth in untreated cells.
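
      The normalization requested by the reviewer divides, for each siRNA condition, the intensity percent of the drug-treated wells by that of the matching vehicle wells. A minimal sketch of that arithmetic follows; the function name `normalize_to_vehicle`, the condition labels, and the numbers are hypothetical and are not taken from Figure 5E.

      ```python
      def normalize_to_vehicle(intensity):
          """Divide each condition's treated intensity percent by its own vehicle value.

          intensity maps a condition name to {"vehicle": x, "treated": y}.
          Returns a dict mapping each condition to the ratio y / x.
          """
          normalized = {}
          for condition, values in intensity.items():
              vehicle = values["vehicle"]
              if vehicle <= 0:
                  # A zero or negative vehicle signal cannot serve as a reference.
                  raise ValueError(f"vehicle intensity must be positive for {condition}")
              normalized[condition] = values["treated"] / vehicle
          return normalized


      # Hypothetical illustrative values, not measured data.
      example = {
          "siSCRAMBLE": {"vehicle": 80.0, "treated": 20.0},
          "siFOXM1_1": {"vehicle": 40.0, "treated": 30.0},
      }
      relative_survival = normalize_to_vehicle(example)
      print(relative_survival)
      ```

      In this toy case the knockdown lowers growth in vehicle (40 versus 80) yet shows higher relative survival after treatment (0.75 versus 0.25), which is the distinction the normalized plot is meant to expose.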

      Minor:

      1. In line 176, the author claimed that 'Interestingly, rare cells treated with CHK1i + gemcitabine are located within the untreated cell cluster (Fig. 3C)'. However, it is not as obvious where these cells are in the plot, especially to people who are new to tSNE plots. It would be appreciated if the authors could label these cells by circling them with red lines and make the point stronger.

      Rather than circling these points (we thought this would make the plot too "busy"), we have created an inset that zooms in on the region where we see the treated cells within the untreated cell cluster. Within the inset, we use arrows to point out the cells we are referring to. This can be seen in our updated Figure 3C.

      1. In Figure S2B, it will be ideal to label clearly which genes are upregulated and which are downregulated.

      On the x-axis of the heatmap, we have drawn lines to separate the downregulated and upregulated genes.

      1. In line 50, the word 'multifaced' needs to be corrected to 'multifaceted'.

      Thank you for catching this, we have fixed it.

      1. It is unclear what 'underly drug resistance' means in line 150.

      We have reworded this sentence so that it is clearer. It now reads as follows: "we aimed to identify gene-expression programs that mediate the low level of RS in a subset of cells, which could potentially mediate drug resistance". This change is in line 155.

      1. It is advised that the phrase 'cell cycle position' could be changed to 'cell cycle phase' or 'cell cycle stage'.

      We purposefully used the phrase "cell cycle position" because we wanted to emphasize gradient-like progress through the cell cycle rather than a discrete transition from one phase to the next. We have reworded the text slightly to now say "position within S-phase" (lines 163, 187, 191, 208), since all the cells we are interested in are in S phase, but some are further through S phase than others.

      1. In line 185, the word 'in' after 'within' can be removed.

      Thank you for catching this, we have fixed it.

      1. In line 194, 'Among genes downregulated in γH2AXlow cells, the expression of ANLN, HMGB2, CENPE, MKI67 and UBE2C correlated' is missing an 'are' in front of the word 'correlated'.

      Thank you for catching this, we have fixed it.

      1. In line 239, Fig.SC3 should be Fig. S3C.

      Thank you for catching this, we have fixed it.

      1. FOXM1 is known as a crucial gene for G2/M transition. Therefore, FOXM1 knockdown cells are expected to be mostly arrested at the G2/M interface. Therefore, in line 244, it is incorrect to say stronger FOXM1 knockdown induced a 'lower proportion of cells in G2 phase'. In fact, as shown in Figure 4C, cells are accumulating in G2 phase (peaking around 11M on the DAPI axis) and depleted from G1 phase (peaking around 7M).

      We have reworded this to say that there is "a higher proportion of cells in S-phase and a less distinct G2 peak" (lines 270-271). The DAPI profiles of the scrambled, siFOXM1 #1, and siFOXM1 #2 conditions all show an S-phase "valley" between a G1 and G2 peak (the valley sits at about 8M-9M). In the siFOXM1 #3 and siFOXM1 #4 conditions, we no longer see this valley, and we therefore interpret this as cells still being in S-phase. If they had progressed from S-phase into G2 phase, we would expect to again see this "valley" to the left of a clear G2 peak. In the figure below, we overlaid DNA content histograms of the different FOXM1-targeting siRNAs with the scrambled siRNA to demonstrate this point more clearly.

      Reviewer #1 (Significance (Required)):

      Advance: The study reported a novel reversible fixation technique which can lead to potentially good citations. However, the findings from the single cell sequencing alone fell short in novelty to reach high impact because FOXM1 has been reported to impact on cellular sensitivity to CHK1 inhibition mediated replication stress (PMC7970065). Moreover, the study did not provide mechanistic explanation to the observed phenotype but only validated the finding from the sequencing, and the gene of focus (FOXM1) was not originally identified from the sequencing, slightly undermining the paper's foundation. To make it a better paper, the authors need to be less biased when it comes to data analysis and interpretation.

      Audience: People who are interested in basic research in cell cycle, DNA damage, cancer, chemotherapy would be interested.

      My expertise: Cancer, DNA damage, cell cycle

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary:

      Replication stress activates ATR and CHEK1 kinases as part of the intra S-phase DNA damage response. CHEK1 kinase inhibitors (CHK1i) have been shown to induce an accumulation of unresolved replication stress and widespread DNA damage and cell death caused by replication catastrophe, and are therefore under clinical evaluation. At the same time, CHEK1 inhibition results in the activation of CDK1 and FOXM1 and premature expression of G2/M genes (Saldivar et al., 2018 Science). FOXM1-driven premature mitosis has been shown to be required for the replication catastrophe and CHK1i sensitivity (Branigan et al., 2021 Cell Rep.). In this study, Segeren and colleagues set out to investigate the mechanisms of replication stress tolerance. They used CHK1i inhibitors in combination with the DNA-damaging chemotherapeutic agent Gemcitabine and oncogenic HRASG12V expression to increase replication stress. The authors utilized an intriguing setup of combined immunofluorescence staining followed by single cell RNA-seq analysis to overcome limitations of bulk cell analyses. In particular, the authors sought to identify genes that are differentially regulated in replication stress-tolerant cells compared to sensitive cells. However, even single cell analyses can be confounded by differences in cell cycle distribution. To mitigate this, the authors selected mid S-phase cells for their analysis. While this may not have completely eliminated minor differences in cell cycle progression, the authors identified FOXM1-regulated G2/M cell cycle genes, among others, that were down-regulated in the tolerant cells. When the authors followed up on the effect of these genes on replication stress tolerance, they identified FOXM1 knockdown as the only robust mediator of replication stress tolerance.

      Major comments:

      The authors observed that cell cycle distribution could be a major confounding factor in their single cell analysis and attempted to reduce this variation by selecting mid S-phase cells based on the DAPI signal. The authors then chose to compare gH2AXlow and gH2AXhigh subpopulations of RPE-HRASG12V cells because their "DAPI signal was comparable" (line 181-184). However, their data show that these subpopulations also show differences in their DAPI signal distribution, with gH2AXlow cells tending to have lower DAPI signals than gH2AXhigh cells (Supplementary Figure 2A). Thus, the major confounding factor that the authors sought to remove seems to have prevailed and it remains possible that the difference in cell cycle gene expression is merely due to differences in cell cycle progression of the individual cells. Given that DAPI information seem to be readily available for the individual cells, the authors should normalize their analysis to the DAPI signal to remove this potential confounding effect or clearly state this potential limitation.

      We agree that it is very challenging to fully disentangle the influence of cell cycle distribution on our analysis. Indeed, the γH2AXlow HRASG12V cells have slightly reduced median DNA content compared to γH2AXmid and γH2AXhigh. However, this was not the case in the RPE control cells, and we still found that FOXM1 target genes were strongly enriched in the γH2AXhigh cells (Fig S2C and Table S4). Therefore, it is highly unlikely that bias in S-phase position distributions explains our results. Nevertheless, to be transparent about this, we write the following in the Results on lines 192-193: "The other groups all showed similar DAPI intensities, although gH2AXlow RPE-HRASG12V cells showed a slight but statistically significant reduction compared to their gH2AXhigh counterparts (Fig. S2A)".

      In our subsequent experiments to assess the relationship between phospho-FOXM1 (representing the transcriptionally active protein) and γH2AX, we observed that though there was a strong correlation between pFOXM1 and γH2AX, there was no correlation between phospho-FOXM1 and DAPI (Figure 4D-E). We therefore would like to point out that although our readout for replication stress inevitably increases as cells progress through DNA replication, heterogeneity in phospho-FOXM1 levels cannot be explained by position in S-phase. These results are described in lines 203-209.

      Finally, we do not think it would be statistically appropriate to use the DAPI signal (generated by fluorescence intensity as measured by the flow cytometer) as a normalization factor for our gene expression data.

      Minor comments:

      The findings of Saldivar et al., 2018 Science and Branigan et al., 2021 Cell Rep. should be mentioned in the introduction.

      As recommended, we mentioned both these papers in the introduction. In line 62, we cite the Branigan paper as showing that modulation of cell cycle regulators is a strategy used by cancer cells to resist replication stress. In lines 63-65, we reference them as follows: "The RS response is tightly linked with cell cycle progression, as multiple intra S-phase checkpoint kinases play a role in curtailing proteins involved in the S-G2 transition (Branigan et al., 2021, Saldivar et al., 2018)."

      The authors conclude that "cell cycle position can be a major confounding factor when evaluating the transcriptomic response to RS." It should be noted that stochastic differences in the cell cycle distribution of bulk cells are perhaps the best-known confounder in single cell analyses (see, for example, Buettner et al., 2015 Nat. Biotechnol.).

      We chose to reference the Buettner paper to justify our decision to select only cycling cells in our scRNA seq approach. Our reference to the paper, and to the fact that cell cycle distribution is a major confounder in single cell analysis, is in lines 138-140.

      Supplementary Figure 2A: The median should be added to the violin plots.

      As suggested, we have added medians to the violin plots. In addition, we added details on statistical analysis.

      The statement "Differential expression analysis revealed 19 genes that were significantly downregulated in gH2AXlow RPE-HRASG12V cells, suggesting that elevated levels of these genes are correlated with sensitivity to RS-inducing drugs" refers to Figure 3E and Table S1. However, Table S1 lists the "key resources" and does not seem to be related to this statement. A table showing log2fold-changes and FDR values should be added and referenced here.

      We have generated tables with the fold change values of differentially expressed genes between the yH2AX low and yH2AX high cells. These are found in Table S1 (for HRAS G12V cells) and Table S2 (for Control cells) in the supplementary file of the revised manuscript. The "key resources" has been moved to Table S5.

      The statement "Remarkably, Braningan and co-workers observed no effect of full FOXM1 deletion on cell cycle progression" seems somewhat inconsistent with what has been stated and assessed in that study. The authors may want to replace "progression" with "distribution". A reduction in proliferation is commonly observed when FOXM1 levels are reduced.

      In addition, the authors may want to consider that their addition of HRASG12V and Gemcitabine may contribute to a more substantial S phase checkpoint response.

      We agree with the reviewer that a reduction in proliferation is commonly observed when FOXM1 levels are reduced (Barger et al., 2021, Cheng et al., 2022, Yang et al., 2015, Wu et al., 2010), but in Branigan et al., they see no decrease in proliferation with knockout of FOXM1. They state "There were no apparent differences in the growth rate of the LIN54 and FOXM1 KO versus EV cells over 10 days (Figure 1G)". Though they do not elaborate on why they see this unexpected response, we suspect a permanent full knockout of FOXM1 could cause compensatory adaptation in their cell lines. In our experiments, we perform transient knockdowns, so cells may not have the time to adapt to the loss of FOXM1 and obtain compensatory mechanisms that would allow them to continue cycling as rapidly as control cells treated with non-targeting siRNA.

      However, we decided to remove this from the Discussion section, as it seemed to interrupt the discussion about the potential mechanisms underlying protection against DNA damage by FOXM1 depletion.

      The statement that "the mechanism by which high FOXM1 activity is a prerequisite to accumulate DNA damage in S-phase during CHK1 inhibition remains to be uncovered" seems to neglect that premature mitosis has been suggested as a mechanistic cause (Branigan et al., 2021 Cell Rep.). It would be helpful if the authors could elaborate on this.

      In our discussion, we do already emphasize the described role of FOXM1 in promoting premature mitosis (lines 330-337), but we argue that in our experimental conditions we are observing another - previously undescribed- role for FOXM1 in promoting replication stress during S phase. We previously observed with live cell imaging that CHK1i + gemcitabine does not cause premature mitosis in RPE-HRASG12V cells (published in Segeren et al. Oncogene 2022, Figure 5). Instead, these cells typically showed a cell cycle exit from G2. This makes it highly unlikely that premature mitosis is the reason why these cells would accumulate excessive DNA damage. We realize now that it was an important omission not to elaborate on this and have added this clarification to the Discussion (lines 341-345 in revised manuscript). In addition, we have removed a few lines of less important text (about the lack of direct effect of FOXM1 KO in the Branigan paper; see answer to previous point) to improve clarity and readability.

      Reviewer #2 (Significance (Required)):

      General assessment: The strength of the study is the intriguing methodology of combined immunofluorescence followed by single cell RNA-seq. The limitations are that this methodology does not seem to fully solve the stated problems. In addition, the study is essentially limited to confirming previous findings.

      Advance: The study strengthens current knowledge but provides essentially no advance. The authors confirm existing knowledge with an additional approach. While this is not an advance in itself, it is important to the community.

      Audience: I felt that the study would appeal to a basic science audience. In particular, the CHK1i and intra S-phase checkpoint areas, with limited interest beyond that.

      My relevant expertise lies in transcriptomics, gene regulation and the cell cycle.

      Reference list

      Barger, C.J., Chee, L., Albahrani, M., Munoz-Trujillo, C., Boghean, L., Branick, C., Odunsi, K., Drapkin, R., Zou, L. & Karpf, A.R. 2021, "Co-regulation and function of FOXM1/RHNO1 bidirectional genes in cancer", eLife, vol. 10, pp. 10.7554/eLife.55070.

      Branigan, T.B., Kozono, D., Schade, A.E., Deraska, P., Rivas, H.G., Sambel, L., Reavis, H.D., Shapiro, G.I., D'Andrea, A.D. & DeCaprio, J.A. 2021, "MMB-FOXM1-driven premature mitosis is required for CHK1 inhibitor sensitivity", Cell reports, vol. 34, no. 9, pp. 108808.

      Cheng, Y., Sun, F., Thornton, K., Jing, X., Dong, J., Yun, G., Pisano, M., Zhan, F., Kim, S.H., Katzenellenbogen, J.A., Katzenellenbogen, B.S., Hari, P. & Janz, S. 2022, "FOXM1 regulates glycolysis and energy production in multiple myeloma", Oncogene, vol. 41, no. 32, pp. 3899-3911.

      Overholser, B.R. & Sowinski, K.M. 2008, "Biostatistics primer: part 2", Nutrition in clinical practice : official publication of the American Society for Parenteral and Enteral Nutrition, vol. 23, no. 1, pp. 76-84.

      Saldivar, J.C., Hamperl, S., Bocek, M.J., Chung, M., Bass, T.E., Cisneros-Soberanis, F., Samejima, K., Xie, L., Paulson, J.R., Earnshaw, W.C., Cortez, D., Meyer, T. & Cimprich, K.A. 2018, "An intrinsic S/G(2) checkpoint enforced by ATR", Science (New York, N.Y.), vol. 361, no. 6404, pp. 806-810.

      Segeren, H.A., van Liere, E.A., Riemers, F.M., de Bruin, A. & Westendorp, B. 2022, "Oncogenic RAS sensitizes cells to drug-induced replication stress via transcriptional silencing of P53", Oncogene, vol. 41, no. 19, pp. 2719-2733.

      Wu, Q., Liu, C., Tai, M., Liu, D., Lei, L., Wang, R., Tian, M. & Lu, Y. 2010, "Knockdown of FoxM1 by siRNA interference decreases cell proliferation, induces cell cycle arrest and inhibits cell invasion in MHCC-97H cells in vitro", Acta Pharmacologica Sinica, vol. 31, no. 3, pp. 361-366.

      Yang, K., Jiang, L., Hu, Y., Yu, J., Chen, H., Yao, Y. & Zhu, X. 2015, "Short hairpin RNA- mediated gene knockdown of FOXM1 inhibits the proliferation and metastasis of human colon cancer cells through reversal of epithelial-to-mesenchymal transformation", Journal of experimental & clinical cancer research : CR, vol. 34, no. 1, pp. 40-1.

      We want to thank both reviewers for their thorough and constructive review of our manuscript. Below, we have re-iterated their comments followed by an explanation of how we have revised the manuscript to address this.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The work from Petazzi et al. aimed at identifying novel factors supporting the differentiation of human hematopoietic progenitors from induced pluripotent stem cells (iPSCs). The authors developed an inducible CRISPR-mediated activation strategy (iCRISPRa) to test the impact of newly identified candidate factors on the generation of hematopoietic progenitors in vitro. They first compared previously published transcriptomic data of iPSC-derived hemato-endothelial populations with cells isolated ex vivo from the aorta-gonad-mesonephros (AGM) region of the human embryo and they identified 9 transcription factors expressed in the aortic hemogenic endothelium that were poorly expressed in the in vitro differentiated cells. They then tested the activation of these candidate factors in an iPSC-based culture system supporting the differentiation of hematopoietic progenitors in vitro. They found that the IGF binding protein 2 (IGFBP2) was the most upregulated gene in arterial endothelium after activation and they demonstrated that IGFBP2 promotes the generation of functional hematopoietic progenitors in vitro.

      Strengths:

      The authors developed an extremely useful doxycycline-inducible system to activate the expression of specific candidate genes in human iPSC. This approach allows us to simultaneously test the impact of 9 different transcription factors on in vitro differentiation of hematopoietic cells, and the system appears to be very versatile and applicable to a broad variety of studies.

      The system was extensively validated for the expression of 1 transcription factor (RUNX1) in both HeLa cells and human iPSC, and a detailed characterization of this test experiment was provided.

      The authors exhaustively demonstrated the role of IGFBP2 in promoting the generation of functional hematopoietic progenitors in vitro from iPSCs. Even though the use of IGFBP2-interacting proteins IGF1 and IGF2 has been previously reported in human iPSC-derived hematopoietic differentiation in vitro (Ditadi and Sturgeon, Methods 2016; Ng et al., Nature Biotechnology 2016), and IGFBP-2 itself has been shown to promote adult HSC expansion ex vivo (Zhang et al., Blood 2008), its role in supporting in vitro hematopoiesis was demonstrated here for the first time.

      Weaknesses:

      Although the authors performed a very thorough characterization of the system in proof-of-principle experiments activating a single transcription factor, the data provided when 9 independent factors were used is not sufficient to fully validate the experimental strategy. Indeed, in the current version of the manuscript, it is not clear whether the results presented in both the scRNAseq analysis and the functional assays are the consequence of the simultaneous activation of all 9 TF or just a subset of them. This is essential to establish whether all the proposed factors play a role during embryonic hematopoiesis, and a more complete analysis of the scRNAseq dataset could help clarify this aspect.

      Similarly, the data presented in the manuscript are not sufficient to clarify at what stage of the endothelial-to-hematopoietic transition (EHT) the TF activation has an impact. Indeed, even though the overall increase of functional hematopoietic progenitors is fully demonstrated, the assays proposed in the manuscript do not clarify whether this is due to a specific effect at the endothelial level or to an increased proliferation rate of the generated hematopoietic progenitors. Similar conclusions can be applied to the functional validation of IGFBP2 in vitro.

      The overall conclusions are sometimes vague and not always supported by the data. For instance, the authors state that the CRISPR activation strategy resulted in transcriptional remodeling and a steer in cell identity, but they do not specify which cell types are involved and at what level of the EHT process this is happening. In the discussion, the authors also claim that they provided evidence to support that RUNX1T1 could regulate IGFBP2 expression. However, this is exclusively based on the enrichment of RUNX1T1 gRNA in cells expressing higher levels of IGFBP2 and it does not demonstrate any direct or indirect association of the two factors.

      We thank the reviewer for the positive comments about the importance of our work and have now addressed the points raised as weaknesses by performing additional analysis and experiments, adding a new schematic of the mechanism, and rewording our claims.

      We have clarified the different effects mediated by the activation and the IGFBP2 addition in a summary section at the end of the results and added Figure 6, showing this in visual form. We have also clearly stated the limitations related to the correlation between RUNX1T1 and IGFBP2 in the discussion and toned down our claims regarding this throughout the entire paper. We have also reworded the text to clarify the specific cell types identified in the sequencing data that we refer to.

      Reviewer #2 (Public Review):

      To enable robust production of hematopoietic progenitors in-vitro, Petazzi et al examined the role of transcription factors in the arterial hemogenic endothelium. They use IGFBP2 as a candidate gene to increase the directed differentiation of iPSCs into hematopoietic progenitors. They have established a novel induced-CRISPR mediated activation strategy to drive the expression of multiple endogenous transcription factors and show enhanced production of hematopoietic progenitors through expansion of the arterial endothelial cells. Further, upregulation of IGFBP2 in the arterial cells facilitates the metabolic switch from glycolysis to oxidative phosphorylation, inducing hematopoietic differentiation. While the overall study and resources generated are good, assertions in the manuscript are not entirely supported by the experimental data and some claims need further experimental validation.

      We thank the reviewer for the positive comments. We have provided new data and analysis to make sure that all our assertions are clearly supported, and we have reworded those where limitations were identified by the reviewers.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      The assessment could change from "incomplete" to "solid" if the authors: i) improve data analysis (for both scRNAseq and functional assays) by providing additional information that could strengthen their conclusions, as suggested in the specific comments by both reviewers; ii) either provide new functional evidence supporting their mechanistic conclusion or alternatively tone down the claims that are not fully supported by data and acknowledge the limitations raised by reviewers in the discussion; (iii) the issue of paracrine signaling to expand only hematopoietic progenitors needs to be addressed.

      We have now improved the data analysis and provided additional functional tests to strengthen our conclusions and toned down those that were identified by the reviewers as not supported enough and included a discussion on these limitations. We have also reworded the section about the paracrine signaling throughout the paper.

      Reviewer #1 (Recommendations For The Authors):

      Figure 1 contains exclusively published data. It might be more appropriate to use it as a supplementary figure or as part of a more exhaustive figure (maybe combining Figures 1 and 2 together?).

      Figure 1 contained novel bioinformatic analyses that represent the base of our research and it has a different content and focus to figure 2, which is already a large figure. We therefore believe it is better to keep it as a separate figure, containing a new panel now too. 

      It seems there is an issue with Figure S3 labelling:

      • In line 112, Figure S2A-B does not display genomic PCR and sequencing results;

      • In line 123, Figure S3D-E does not show viability and proliferation data;

      • In line 127, Figure S3G does not show mCherry expression in response to DOX;

      We apologise for the confusion with the numbering; we have now labelled the figures correctly.

      It would be more informative to include gates and frequency on flow cytometry plots in Figure S3, to be able to evaluate the extent of the reduction in mCherry expression.

      We have now included the gating and frequency of mCherry-expressing cells in Supplementary Figure 3D.

      It is not clear from the text and figures whether the SB treatment was maintained throughout the hematopoietic differentiation protocol (line 122):

      • If so, it would be important to confirm that HDAC treatment does not affect EHT cultures

      • If not, can the authors provide some evidence that transgene silencing is not occurring during hematopoietic differentiation?

      We have clarified that we decided to treat the cells with SB exclusively in maintenance conditions because HDACs have been shown to be essential for the EHT (lines 138-142). We have now also included additional data showing the high expression of the mCherry tag reporting the iSAM expression on day 8 (Supplementary Figure 4F).

      Can the authors provide a simple diagram summarizing the experimental strategy for each differentiation experiment in the respective supplementary figure? For instance, at what stage of the protocol was DOX added in Figure 3? Or at what stage IGFBP2 was added in Figure 5? It would be a very useful addition to the interpretation of the results.

      We have now included three schematics for all the experiments in the manuscript in Supplementary Figure 4A-C.

      In Figure 3, the authors should provide more detailed information about the data filtering of the scRNAseq experiment, and more specifically:

      • How many cells were included in the analysis for each library after QC and filtering?

      • How "cells in which the gRNAs expression was detected" were selected? Do they include only cells showing expression of gRNAs for all 9 TF?

      This information is now included in the method section, lines 773-781; the detailed code is available at the GitHub link provided in the same section. We have filtered the cells expressing one gRNA for the non-targeting gRNA (iSAM_NT) control and more than one for the iSAM_AGM sample.
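
      The filtering rule described above can be sketched as follows. This is only an illustration and not the authors' code, which is available in their GitHub repository. The function names, the gRNA labels, and the detection threshold `MIN_UMI` are hypothetical, and reading "one gRNA" as "at least one detected gRNA" is an assumption.

```python
from typing import Dict, List

# Hypothetical detection threshold: a gRNA counts as "detected" in a cell
# when it has at least this many UMIs. The real cut-off is not stated here.
MIN_UMI = 1


def detected_grnas(grna_umis: Dict[str, int], min_umi: int = MIN_UMI) -> List[str]:
    """Return the sorted names of the gRNAs detected in one cell."""
    return sorted(name for name, umis in grna_umis.items() if umis >= min_umi)


def keep_cell(sample: str, grna_umis: Dict[str, int]) -> bool:
    """Apply the per-sample rule: >= 1 gRNA for iSAM_NT, > 1 for iSAM_AGM."""
    n_detected = len(detected_grnas(grna_umis))
    if sample == "iSAM_NT":
        # Assumption: the control is kept when its non-targeting gRNA is detected.
        return n_detected >= 1
    if sample == "iSAM_AGM":
        # The activated sample is kept only when more than one gRNA is detected.
        return n_detected > 1
    raise ValueError(f"unknown sample: {sample}")


# Toy cells with made-up gRNA labels and UMI counts.
cells = [
    ("iSAM_NT", {"gRNA_NT": 3}),
    ("iSAM_NT", {"gRNA_NT": 0}),
    ("iSAM_AGM", {"gRNA_1": 2, "gRNA_2": 0}),
    ("iSAM_AGM", {"gRNA_1": 2, "gRNA_2": 5}),
]
kept = [keep_cell(sample, umis) for sample, umis in cells]
```

      Under these assumptions, a cell from the activated sample with a single detected gRNA is excluded, whereas the same cell would pass the control rule.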

      In Figure 3A, it is not clear whether the expression of the 9 factors is consistently detected in all cells or just a subset of them, and the heatmap in Figure 3A does not provide this information. It would be more accurate to provide expression on a per-cell basis, for instance, as a violin plot displaying single dots representing each cell. 

      We have now included this violin plot in Supplementary Figure 4G as requested. However, this visualisation is difficult to interpret because the expression of some of the target genes seems variable in both experimental and control conditions. We had anticipated this, which is why we included the three different controls. For this reason, we chose to show the normalised expression, which takes all the different variables into account (Figure 3A).

      In Figure 3B-C, it seems that clusters EHT1 and EHT2 do not express endothelial markers anymore. Are these fully differentiated hematopoietic cells rather than cells undergoing EHT? In general, it would be quite important to provide evidence of expressed marker genes characterizing each cluster (eg. heatmap summarizing top DEG in the supplementary figure?). 

      We have now provided a spreadsheet containing the clusters’ markers that we used (Supplementary Table 1) and a heatmap in Figure 3E. Furthermore, we have now edited Figure 3C to include pan-endothelial markers (PECAM1 and CDH5). These data show that the EHT1 and EHT2 clusters both express endothelial markers, which are progressively downregulated, as expected during the endothelial-to-hematopoietic transition. We have also included and discussed this in the manuscript (lines 192-195) and added a schematic of the mechanism in Figure 6.

      In Figure 3E, displaying the proportion of clusters within each sample/library would be a more accurate way of comparing the cell types present in each library (removing potential bias introduced by loading different numbers of cells in each sample).

      We have now included the requested data in Supplementary Figure 4I, which again confirms the expansion of arterial cells in the activated sample.
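
      The per-library proportions requested by the reviewer can be illustrated with a minimal sketch. The function name, library labels, and cell counts below are made up for illustration and do not come from the manuscript.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def cluster_proportions(cells: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Return, for each library, the fraction of its cells in each cluster.

    `cells` holds one (library, cluster) pair per cell.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for library, cluster in cells:
        counts[library][cluster] += 1
    return {
        library: {cluster: n / sum(per_cluster.values())
                  for cluster, n in per_cluster.items()}
        for library, per_cluster in counts.items()
    }


# Toy data: library "B" was loaded with more cells than library "A".
toy_cells = (
    [("A", "arterial")] * 2 + [("A", "EHT1")] * 2
    + [("B", "arterial")] * 5 + [("B", "EHT1")] * 5
)
proportions = cluster_proportions(toy_cells)
```

      In this toy case the raw arterial counts differ (2 versus 5), but the within-library proportion is 0.5 in both, which is why proportions remove the bias from loading different numbers of cells.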

      In Figure 3G, by plating 20,000 total CD34+, the assay does not account for potential differences in sample composition. It is then hard to discriminate between the increased number of progenitors in the input or an enhanced ability of HE to undergo EHT. This is an important aspect to consider to precisely identify at what level the activation of the 9 factors is acting. A proper quantification of flow cytometry data summarizing the % of progenitors, arterial cells, etc. would be useful to interpret these results.

      We have reworded lines 204-205. We are very much aware that the CD34+ cell population consists of a range of cells across the EHT process, which is precisely why we carried out the single-cell sequencing analyses. We purposely tested the effect of the observed changes in composition using colony assays.

      In Figure 3G, it seems that NT cells w/o DOX have very little CFU potential (if any). Can the authors provide an explanation for this?

      We think that the limited CFU potential is due to the extensive genetic manipulation and selection that the cells underwent during the derivation of all the iSAM lines, but this did not prevent us from observing an effect of gene activation on CFU numbers. This is one of the primary reasons that we then validated our overall findings using the parental iPSC line in the control condition and with the addition of IGFBP2. We show that the parental iPSC line gives rise to hematopoietic progenitors at expected levels, both immunophenotypically (Figure 4D) and functionally (Figure 4B, left column).

      Figure 4A shows an upregulation of IGFBP2 in arterial cells as a result of TF activation. However, from the data presented here, it is not possible to evaluate whether this is specific to the arterial cluster, or it is a common effect shared by all cell types regardless of their identity. 

      Data has now been included in Supplementary Figure 4H, which shows that all the cells show an increase in IGFBP2, but arterial cells show the highest increase. We have now edited the text to reflect this, in lines 228-230.

      In Figure 5A-B only a minority of arterial cells express RUNX1 in response to IGFBP2 treatment. Is this sufficient to explain the very significant increase in the generation of functional hematopoietic progenitors described in Figure 4? Quantification and statistical analysis of RUNX1 upregulation would strengthen this conclusion.

      We have now provided the statistical analysis showing significant upregulation of RUNX1 upon IGFBP2 addition. The p values are now provided in the figure 5 legend.

      In Figure 5 the authors conclude that IGFBP2 remodels the metabolic profile of endothelial cells. However, it is not clear which cell types and clusters were included in the analysis of Figure 5C-G. Is the switch from Glycolysis to Oxidative Phosphorylation specific to endothelial cells? Or it is a more general effect on the entire culture, including hematopoietic cells? 

      We based this conclusion on the fact that the single-cell RNAseq allows us to verify that the metabolic differences are obtained in the endothelial cells. Given that we sorted the adherent cells, the majority of these are endothelial cells, as shown in Figure 5A. The Seahorse pipeline includes a number of washing steps, so the analyses are performed on the adherent compartment, which we know consists primarily of endothelial cells. We cannot exclude some contamination from non-endothelial cells, but we note that the metabolic changes were initially identified in endothelial cells in the single-cell sequencing data. Taken together, we believe that this implies that the metabolic changes are specific to this population. We have clarified this in line 317.

      In the discussion, the authors conclude that they "provide evidence to support the hypothesis that RUNX1T1 could regulate IGFBP2 expression". To further support this conclusion, the authors could provide a correlation analysis of the expression of the two genes in the cell type of interest. 

      Following the observation of high IGFBP2 expression across clusters, we have now reworded this sentence in lines 382-385. We attempted the correlation analysis, but we believe it is not appropriate given the detection level of the gRNA. We have now included this as a limitation in the discussion (lines 416-427) and toned down the conclusions we drew about RUNX1T1 throughout the manuscript.

      As mentioned by the authors, IGFBP2 binds IGF1 and IGF2 modulating their function. Both IGF1 (http://dx.doi.org/10.1016/j.ymeth.2015.10.001) and IGF2 (doi:10.1038/nbt.3702) have been used in iPSC differentiation into definitive hematopoietic cells. It would be relevant to discuss/reference this in the discussion.

      We have now included the suggested reference in the section where we discuss the role of IGFBP2 in binding IGF1 and IGF2.

      Reviewer #2 (Recommendations For The Authors):

      (1) Figure 1 compares the transcriptome of human AGM and in-vitro derived hemogenic endothelial cells (HECs). It is not clear why only the genes downregulated in the latter were chosen. Are there any significantly upregulated genes, knockdown/knockout which could also serve a similar purpose? Single-cell transcriptome database analysis is very preliminary. A detailed panel with differences in cluster properties of HECs between the two systems should be provided. A heatmap of all differentially expressed genes between the two samples must be generated, along with a logical explanation for choosing the given set of genes. 

      We have now included another panel in figure 1 to better clarify the logic behind the strategy used to identify our target genes (Figure 1A).

      (2) Figure 2 - a panel describing the workflow of gRNA design and targeting for the 9 candidate genes, along with lentiviral packaging and transduction would make it easier to follow. 

      We have now included three schematics for all the experiments in the manuscript in supplementary figure 4 A-C. 

      (3) Figure 3- to assess the effect of arterial cell expansion on the emergence of hematopoietic progenitors, CD34+ Dll4+ cells should be sorted for OP9 co-culture assay.

      Using only CD34+ cells does not answer the question raised. Also, the CFU assay performed does not fully support the claim of enhanced hematopoietic differentiation since only CFU-E and CFU-GM colonies are increased in Dox-treated samples, with no effect on other colony types. OP9 co-culture assay with these cells would be required to strengthen this claim. 

      We would like to clarify that the effect of the activated cells in the methylcellulose assay was not limited to CFU-E, as the reviewer reported; it also affected CFU-GM and CFU-M.

      We have now performed additional experiments in which we sorted the CD34+ compartment into DLL4- and DLL4+ fractions (Supplementary Figure 5D-E), which we discuss in lines 250-258.

      (4) In Figure 3F, there appears to be a lot of variation in the DLL4% fold change values for DOX treated iSAM_AGM sample, which weakens the claim of increased arterial expansion. Can the authors explain the probable reason? It is suggested that the two other controls (iSAM_+DOX and iSAM_-DOX) should be included in this analysis. It is imperative to also show % populations rather than just fold change to gain confidence.

      We agree that there is a lot of variability. That is because differentiation happens in 3D in embryoid bodies, which contain many different cell types that differentiate in different proportions across independent experiments. We have now included the raw data in Supplementary Figure 4 D, with additional statistical analysis to show the expansion of arterial cells including also the suggested additional controls.

      (5) How does activation of these target genes cause increased arterialization? Is the emergence of non-HE populations suppressed? Or is it specific to the HE? The data on this should be clarified and also discussed.

      We have provided additional data clarifying the connection between increased arterialisation and hemogenic potential. We showed that the activation induces increased arterialisation and that IGFBP2 acts by supporting the acquisition of hemogenic potential. We have discussed this in lines 326-348 and provided a new figure to explain this in detail (Figure 6).

      (6) Considering that IGFBP2 was chosen from the activated target gene(s) cluster, can the authors explain why the reduced CFU-M phenomenon observed in Figure 3G does not appear in the MethoCult assay for IGFBP2 treated cells (Figure 4B)?

      The difference could be explained by the fact that in Figure 3G, the cells underwent activation of multiple genes, while in Figure 4B, they were only exposed to IGFBP2. Our results show that IGFBP2 could at least partially explain the phenotype that we see with the activation, but we believe that during the activation experiments, there might be other signals available that might not be induced by IGFBP2 alone. We have also added a summary section and a figure to clarify the different mechanisms of action of the gene activation and IGFBP2.

      (7) Figure 4- while the experiments conducted support the role of IGFBP2 in increasing hematopoietic output, there is no experimental evidence to prove its function through paracrine signalling in HECs. The authors need to provide some evidence of how IGFBP2 supplementation specifically expands only the hematopoietic progenitors. Experimental strategies involving specifically targeting IGFBP2 in hemogenic/arterial endothelial cells are required to prove its cell type specific function. Additionally, assessing the in vivo functional potential of the hematopoietic cells generated in the presence of IGFBP2, by bone-marrow transplantation of CD34+ CD43+ cells, is essential. 

      The role of IGFBP2 in the context of HSC production and expansion was not the topic of our research, and we have not claimed that IGFBP2 affects the long-term repopulating capacity of HSPCs. Therefore, we believe that the requested experiments are not required to support the specific claims that we do make. We have now provided more experiments and bioinformatic analysis that support the role of IGFBP2 in inducing the progression of EHT from arterial cells to hemogenic endothelium, and to avoid misunderstandings, we have toned down our claims by editing the text regarding its paracrine effects.

      (8) Figure 4C-D -It is recommended to plot % populations along with fold change value. As this is a key finding, it is important to perform flow cytometry for additional hematopoietic markers- CD144, CD235a and CD41a to demonstrate whether this strategy can also expand erythroid-megakaryocyte progenitors.

      Figure 4C already shows the percentage values; we have now added the percentages for Figure 4D in Supplementary Figure 5C. We have also performed the additional analysis as requested and added the resulting data to Supplementary Figure 5D.

      (9) In Figure 5, analysis showing the frequency of cells constituting different clusters, between untreated and IGFBP2-treated samples in the single-cell transcriptome analysis is essential. Additional experiments are required to validate the function of IGFBP2 through modulation of metabolic activity. Inhibition of oxidative phosphorylation in the IGFBP2-treated cells should reduce the hematopoietic output. Authors should consider doing these experiments to provide a stronger mechanistic insight into IGFBP2-mediated regulation of hematopoietic emergence.

      We have now included the requested cluster composition in Supplementary Figure 5F. We decided not to include further tests on the metabolic effect of IGFBP2, as we have already discussed other papers that showed, using selective inhibitors, that the EHT coincides with a glycolysis-to-OxPhos switch.

      (10) It is very striking to see that IGFBP2 supplementation changes the transcriptional profile of developing hematopoietic cells by increasing transcription of OXPHOS-related genes with concomitant reduction of glycolytic signatures, particularly at Day 13. However, the mitochondrial ATP rate measurements do not seem convincing. The bioenergetic profiles show that when mitochondrial inhibitors are added, both groups exhibit decreased OCR values and, on the other hand, higher ECAR. This indicates that both groups have the capability to utilize OXPHOS or glycolysis and may only differ in their basal respiration rates.

      Differences in proliferation rate can cause basal respiration to change. There is no information on how the bioenergetic profile was normalized (cell no./protein amount). Given that IGFBP2 has been shown to increase proliferation, it is very likely that the cells treated with IGFBP2 proliferated faster and therefore have higher OCR. The data needs to be normalized appropriately to negate this possibility.

      We have previously tested whether IGFBP2 causes an increase in proliferation by analysing the cell cycle of cells treated with it, as we initially thought this could be a mechanism of action. We have now provided the quantification of the cell cycle in the cells treated with IGFBP2, which shows no effect on the cell cycle (Supplementary Figure 4E). Following this analysis, we decided to plate the same number of cells and check their density under the microscope before running the experiment; each experiment was done in triplicate for each condition. We have now added this information to the method section, lines 806-813. We did not comment on the basal difference, which we agree might be due to several factors; we only compared the difference in response to the inhibitors, which isn’t affected by the basal level but exclusively by their Δ values. We have also included the formulas used to calculate the ATP production rate.
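
      The idea of comparing the response to the inhibitors rather than the absolute rates can be illustrated with a minimal sketch. It is not the ATP production rate formula used in the manuscript. The function name and all OCR readings are made up, and the sketch assumes that the basal difference between groups is a constant additive offset, which is an assumption of this example only.

```python
from statistics import mean
from typing import Sequence


def inhibitor_response(before: Sequence[float], after: Sequence[float]) -> float:
    """Change in the mean rate after inhibitor injection (after minus before)."""
    return mean(after) - mean(before)


# Hypothetical OCR readings (arbitrary units) for two groups, in triplicate.
# The second group is the first one shifted by a constant offset of +20.
control_before, control_after = [100.0, 102.0, 98.0], [40.0, 42.0, 38.0]
shifted_before = [x + 20.0 for x in control_before]
shifted_after = [x + 20.0 for x in control_after]

delta_control = inhibitor_response(control_before, control_after)
delta_shifted = inhibitor_response(shifted_before, shifted_after)
```

      With an additive offset the two responses are identical even though the basal rates differ. A multiplicative difference between groups would not cancel in this way, so the sketch covers the additive case only.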

      Overall, it appears that IGFBP2 does not seem to primarily cause metabolic changes, but simply accelerates the metabolic dependency on OXPHOS. Hence, the term 'metabolic remodelling' must be avoided unless IGFBP2 depletion/loss of function analysis is shown.

      We thank the reviewer for suggesting how to interpret the data about the dependency on OXPHOS. We have now changed the conclusions and claims about the effect of IGFBP2. We have also included a cell cycle analysis of the hematopoietic cells derived upon IGFBP2 addition to show that they do not differ in proliferation in a way that could cause the increase in colony formation we observed. Regarding the assay, we plated the same number of cells for each group to make sure we were comparing the same number of cells, which we also checked under the microscope before the test, and we eliminated the suspension cells during the washes that preceded the measurement. The reviewer is correct in indicating that there is a basal difference in the OCR and ECAR values; however, the IGFBP2-treated group is lower at the start, not higher, which would not conceal higher proliferation. Finally, the ATP production rate is calculated from the variation of OCR and ECAR upon the addition of inhibitors, which normalizes for the basal differences.