  1. Last 7 days
    1. eLife Assessment

      This valuable biomechanical analysis of kangaroo kinematics and kinetics across a range of hopping speeds and masses is a step towards understanding a long-standing problem in locomotion biomechanics: the mechanism for how kangaroos, unlike other mammals, can increase hopping speed without a concomitant increase in metabolic cost. The authors convincingly demonstrate that changes in kangaroo posture with speed increase tendon stress/strain and hence elastic energy storage/return. This greater tendon elastic energy storage/return may counteract the increased cost of generating muscular force at faster speeds and thus allows for the invariance in metabolic cost. This methodologically impressive study sets the stage for further work to investigate the relation of hopping speed to metabolic cost more definitively.

    2. Reviewer #1 (Public review):

      Summary:

      The study explored the biomechanics of kangaroo hopping across both speed and animal size to try and explain the unique and remarkable energetics of kangaroo locomotion.

      Strengths:

      Brings kangaroo locomotion biomechanics into the 21st century. Remarkably difficult project to accomplish. Excellent attention to detail. Clear writing and figures.

      General Comments

      This is a very impressive tour de force by an all-star collaborative team of researchers. The study represents a tremendous leap forward (pun intended) in terms of our understanding of kangaroo locomotion. Some might wonder why such an unusual species is of much interest. But, in my opinion, the classic study by Dawson and Taylor in 1973 of kangaroos launched the modern era of running biomechanics/energetics and applies to varying degrees to all animals that use bouncing gaits (running, trotting, galloping and of course hopping). The puzzling metabolic energetics findings of Dawson & Taylor (little if any increase in metabolic power despite increasing forward speed) remain a giant unsolved problem in comparative locomotor biomechanics and energetics. It is our "dark matter problem".

      This study is certainly a hop towards solving the problem. The study clearly shows that the ankle and to a lesser extent the mtp joint are where the action is. They show in great detail by how much and by what means the ankle joint tendons experience increased stress at faster forward speeds. Since these were zoo animals, direct measures were not feasible, but the conclusion that the tendons are storing and returning more elastic energy per hop at faster speeds is solid. The conclusion that net muscle work per hop changes little from slow to fast forward speeds is also solid. Doing less muscle work can only be good if one is trying to minimize metabolic energy consumption. However, to achieve the greater tendon stresses, there must be greater muscle forces. Unless one is willing to reject the premise of the cost of generating force hypothesis, that is an important issue to confront. Further, the present data support the Kram & Dawson finding of decreased contact times at faster forward speeds. Kram & Taylor and subsequent applications of (and challenges to) their approach support the idea that shorter contact times (tc) require recruiting more expensive muscle fibers and hence greater metabolic costs. The present authors have clarified that this study has still not tied up the metabolic energetics across speed problem and they now point out how the group is now uniquely and enviably poised to explore the problem more using a dynamic SIMM model that incorporates muscle energetics.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The study explored the biomechanics of kangaroo hopping across both speed and animal size to try and explain the unique and remarkable energetics of kangaroo locomotion.

      Strengths:

      The study brings kangaroo locomotion biomechanics into the 21st century. It is a remarkably difficult project to accomplish. There is excellent attention to detail, supported by clear writing and figures.

      Weaknesses:

      The authors oversell their findings, but the mystery still persists. 

      The manuscript lacks a big-picture summary with pointers to how one might resolve the big question.

      General Comments

      This is a very impressive tour de force by an all-star collaborative team of researchers. The study represents a tremendous leap forward (pun intended) in terms of our understanding of kangaroo locomotion. Some might wonder why such an unusual species is of much interest. But, in my opinion, the classic study by Dawson and Taylor in 1973 of kangaroos launched the modern era of running biomechanics/energetics and applies to varying degrees to all animals that use bouncing gaits (running, trotting, galloping and of course hopping). The puzzling metabolic energetics findings of Dawson & Taylor (little if any increase in metabolic power despite increasing forward speed) remain a giant unsolved problem in comparative locomotor biomechanics and energetics. It is our "dark matter problem".

      Thank you for the kind words.

      This study is certainly a hop towards solving the problem. But, the title of the paper overpromises and the authors present little attempt to provide an overview of the remaining big issues. 

      We have modified the title to reflect this comment: “Postural adaptations may contribute to the unique locomotor energetics seen in hopping kangaroos”.

      The study clearly shows that the ankle and to a lesser extent the mtp joint are where the action is. They clearly show in great detail by how much and by what means the ankle joint tendons experience increased stress at faster forward speeds.

      Since these were zoo animals, direct measures were not feasible, but the conclusion that the tendons are storing and returning more elastic energy per hop at faster speeds is solid. The conclusion that net muscle work per hop changes little from slow to fast forward speeds is also solid. 

      Doing less muscle work can only be good if one is trying to minimize metabolic energy consumption. However, to achieve greater tendon stresses, there must be greater muscle forces. Unless one is willing to reject the premise of the cost of generating force hypothesis, that is an important issue to confront. Further, the present data support the Kram & Dawson finding of decreased contact times at faster forward speeds. Kram & Taylor and subsequent applications of (and challenges to) their approach support the idea that shorter contact times (tc) require recruiting more expensive muscle fibers and hence greater metabolic costs. Therefore, I think that it is incumbent on the present authors to clarify that this study has still not tied up the metabolic energetics across speed problem and placed a bow atop the package.

      Fortunately, I am confident that the impressive collective brain power that comprises this author list can craft a paragraph or two that summarizes these ideas and points out how the group is now uniquely and enviably poised to explore the problem more using a dynamic SIMM model that incorporates muscle energetics (perhaps à la Umberger et al.). Or perhaps they have other ideas about how they can really solve the problem.

      You have raised important points; thank you for this feedback. We have added a limitations and considerations section to the discussion which highlights that there are still unanswered questions (Lines 311-328).

      Considerations and limitations

      “First, we believe it is more likely that the changes in moment arms and EMA can be attributed to speed rather than body mass, given the marked changes in joint angles and ankle height observed at faster hopping speeds. However, our sample included a relatively narrow range of body masses (13.7 to 26.6 kg) compared to the potential range (up to 80 kg), limiting our ability to entirely isolate the effects of speed from those of mass. Future work should examine a broader range of body sizes. Second, kangaroos studied here only hopped at relatively slow speeds, which bounds our estimates of EMA and tendon stress to a less critical region. As such, we were unable to assess tendon stress at fast speeds, where increased forces would reduce tendon safety factors closer to failure. A different experimental or modelling approach may be needed, as kangaroos in enclosures seem unwilling to hop faster over force plates. Finally, we did not determine whether the EMA of proximal hindlimb joints (which are more difficult to track via surface motion capture markers) remained constant with speed. Although the hip and knee contribute substantially less work than the ankle joint (Fig. 4), the majority of kangaroo skeletal muscle is located around these proximal joints. A change in EMA at the hip or knee could influence a larger muscle mass than at the ankle, potentially counteracting or enhancing energy savings in the ankle extensor muscle-tendon units. Further research is needed to understand how posture and muscles throughout the whole body contribute to kangaroo energetics.”

      Additionally, we added a line “Peak GRF also naturally increased with speed together with shorter ground contact durations (Fig. 2b, Suppl. Fig 1b)” (line 238) to highlight that we are not proposing that changes in EMA alone explain the full increase in tendon stress. Both GRF and EMA contribute substantially (almost equally) to stress, and we now give more equal discussion to both. For instance, we now also evaluate how much each contributes: “If peak GRF were constant but EMA changed from the average value of a slow hop to a fast hop, then stress would increase 18%, whereas if EMA remained constant and GRF varied by the same principles, then stress would only increase by 12%. Thus, changing posture and decreasing ground contact duration both appear to influence tendon stress for kangaroos, at least for the range of speeds we examined” (Line 245-249)
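
      The "hold one factor constant" comparison quoted above can be illustrated with a minimal sketch. It assumes the simple relation tendon force = GRF / EMA (with EMA = r / R) and stress = force / tendon cross-sectional area. The function names and all numbers are hypothetical and chosen only for illustration; they are not the study's data and are not intended to reproduce the reported 18% and 12%.

```python
# Illustrative sketch only: every value below is hypothetical, not the study's data.

def tendon_stress(grf, ema, tendon_area):
    """Tendon stress under the assumed relation: force = GRF / EMA, stress = force / area."""
    return grf / ema / tendon_area


def percent_change(new, old):
    """Percentage change of `new` relative to `old`."""
    return 100.0 * (new - old) / old


# Assumed slow-hop and fast-hop averages (arbitrary units for GRF, dimensionless EMA).
slow = {"grf": 500.0, "ema": 0.50}
fast = {"grf": 600.0, "ema": 0.40}
area = 1.0e-4  # assumed tendon cross-sectional area; it cancels in the percentages

baseline = tendon_stress(slow["grf"], slow["ema"], area)
ema_only = tendon_stress(slow["grf"], fast["ema"], area)  # GRF held at the slow value
grf_only = tendon_stress(fast["grf"], slow["ema"], area)  # EMA held at the slow value
both = tendon_stress(fast["grf"], fast["ema"], area)

ema_effect = percent_change(ema_only, baseline)
grf_effect = percent_change(grf_only, baseline)
combined_effect = percent_change(both, baseline)

print(f"EMA change alone: {ema_effect:.1f}% more stress")
print(f"GRF change alone: {grf_effect:.1f}% more stress")
print(f"Both together:    {combined_effect:.1f}% more stress")
```

      Because stress depends on the ratio GRF / EMA, the two effects multiply rather than add, so the combined increase exceeds the sum of the individual increases in this sketch.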

      We have added a paragraph in the discussion acknowledging that the cost of generating force problem is not resolved by our work, concluding that “This mechanism may help explain why hopping macropods do not follow the energetic trends observed in other species (Dawson and Taylor 1973, Baudinette et al. 1992, Kram and Dawson 1998), but it does not fully resolve the cost of generating force conundrum” (Lines 274-276).

      I have a few issues with the other half of this study (i.e. animal size effects). I would enjoy reading a new paragraph by these authors in the Discussion that considers the evolutionary origins and implications of such small safety factors. Surely, it would need to be speculative, but that's OK.

      We appreciate this comment from the reviewer. However, we could not extend the study to discuss animal size effects because, as we now note in the results: “The range of body masses may not be sufficient to detect an effect of mass on ankle moment in addition to the effect of speed.” (Line 193)

      Reviewer #2 (Public Review):

      Summary

      This is a fascinating topic that has intrigued scientists for decades. I applaud the authors for trying to tackle this enigma. In this manuscript, the authors primarily measured hopping biomechanics data from kangaroos and performed inverse dynamics. 

      While these biomechanical analyses were thorough and impressively incorporated collected anatomical data and an OpenSim model, I'm afraid that they did not satisfactorily address how kangaroos can hop faster and not consume more metabolic energy, unique from other animals. Noticeably, the authors did not collect metabolic data nor did they model metabolic rates using their modelling framework. Instead, they performed a somewhat traditional inverse dynamics analysis from multiple animals hopping at a self-selected speed.

      In the current study, we aimed to provide a joint-level explanation for the increases of tendon stress that are likely linked to metabolic energy consumption.

      We have now included a limitations section in the manuscript (See response to Rev 1). We plan to expand upon muscle level energetics in the future with a more detailed musculoskeletal model.

      Within these analyses, the authors largely focused on ankle EMA, discussing its potential importance (because it affects tendon stress, which affects tendon strain energy, which affects muscle mechanics) on the metabolic cost of hopping. However, EMA was roughly estimated (CoP was fixed to the foot, not measured) and did not detectably associate with hopping speed (see results). Yet, the authors interpret their EMA findings as though it systematically related with speed to explain their theory on how metabolic cost is unique in kangaroos vs. other animals.

      As noted in our methods, EMA was not calculated from a fixed centre of pressure (CoP). We did fix the medial-lateral position, because both feet contacted the force plate together, but the anterior-posterior movement of the CoP was recorded by the force plate and thus allowed to move. We report the movement (or lack of movement) in our results. The anterior-posterior axis is the most relevant to lengthening or shortening the distance of the ‘out-lever’ R, and thereby EMA. It is necessary to assume a fixed medial-lateral position because a single force trace and CoP is recorded when two feet land on the force plate. The medial-lateral forces on each foot cancel out, so there is no overall medial-lateral movement if the forces are symmetrical (e.g. if the kangaroo is hopping in a straight path and one foot is not in front of the other). We only used symmetrical trials so that the anterior-posterior movement of the CoP would be reliable. We have now added additional details into the text to clarify this.

      Indeed, the relationship between R and speed (and therefore EMA and speed) was not significant. However, the significant change in ankle height with speed, combined with no systematic change in CoP at midstance, demonstrates that R would be greater at faster speeds. If we consider the nonsignificant relationship between R and speed to indicate that there is no change in R, then these two results conflict. We could not find a flaw in our methods, so instead concluded that the nonsignificant relationship between R and speed may be due to a small change in R being undetectable in our data. Taking both results into account, we believe it is more likely that there is a non-detectable change in R, rather than no change in R with speed, but we presented both results for transparency. We have added an additional section into the results to make this clearer (Lines 177-185): “If we consider the nonsignificant relationship between R (and EMA) and speed to indicate that there is no change in R, then it conflicts with the ankle height and CoP result. Taking both into account, we think it is more likely that there is a small, but important, change in R, rather than no change in R with speed. It may be undetectable because we expect small effect sizes compared to the measurement range and measurement error (Suppl. Fig. 3h), or be obscured by a similar change in R with body mass. R is highly dependent on the length of the metatarsal segment, which is longer in larger kangaroos (1 kg BM corresponded to ~1% longer segment, P<0.001, R²=0.449). If R does indeed increase with speed, both R and r will tend to decrease EMA at faster speeds.”

      These speed vs. biomechanics relationships were limited by comparisons across different animals hopping at different speeds and could have been strengthened using a repeated measures design.

      There is significant variation in speed within individuals, not just between individuals. The preferred speed of kangaroos is 2-4.5 m/s, but most individuals showed a wide speed range within this. Eight of our 16 kangaroos had a maximum speed that was 1-2 m/s faster than their slowest trial. Repeated measures of these eight individuals comprise 78 out of the 100 trials. It would be ideal to collect data across the full range of speeds for all individuals, but it is not feasible in this type of experimental setting. Interference with animals, such as chasing, is dangerous to kangaroos as they are prone to adverse reactions to stress. We have now added additional information about the chosen hopping speeds into the results and methods sections to clarify this: “The kangaroos elected to hop between 1.99 and 4.48 m s⁻¹, with a range of speeds and number of trials for each individual (Suppl. Fig. 9).” (Lines 381-382)

      There are also multiple inconsistencies between the authors' theory on how mechanics affect energetics and the cited literature, which leaves me somewhat confused and wanting more clarification and information on how mechanics and energetics relate.

      We thank the reviewer for this comment. Upon rereading, we now understand the reviewer's position and have made substantial revisions to the introduction and discussion (see comments below).

      My apologies for the less-than-favorable review, I think that this is a neat biomechanics study - but am unsure if it adds much to the literature on the topic of kangaroo hopping energetics in its current form.

      Again we thank the reviewer for their time and appreciate their efforts to strengthen our manuscript.

      Reviewer #3 (Public Review):

      Summary:

      The goal of this study is to understand how, unlike other mammals, kangaroos are able to increase hopping speed without a concomitant increase in metabolic cost. They use a biomechanical analysis of kangaroo hopping data across a range of speeds to investigate how posture, effective mechanical advantage, and tendon stress vary with speed and mass. The main finding is that a change in posture leads to increasing effective mechanical advantage with speed, which ultimately increases tendon elastic energy storage and returns via greater tendon strain. Thus kangaroos may be able to conserve energy with increasing speed by flexing more, which increases tendon strain.

      Strengths:

      The approach and effort invested into collecting this valuable dataset of kangaroo locomotion is impressive. The dataset alone is a valuable contribution.

      Thank you!

      Weaknesses:

      Despite these strengths, I have concerns regarding the strength of the results and the overall clarity of the paper and methods used (which likely influences how convincingly the main results come across).

      (1) The paper seems to hinge on the finding that EMA decreases with increasing speed and that this contributes significantly to greater tendon strain estimated with increasing speed. It is very difficult to be convinced by this result for a number of reasons:

      It appears that kangaroos hopped at their preferred speed. Thus the variability observed is across individuals not within. Is this large enough of a range (either within or across subjects) to make conclusions about the effect of speed, without results being susceptible to differences between subjects? 

      Apologies, this was not clear in the manuscript. Kangaroos hopping at their preferred speed means we did not chase or startle them into high speeds, to comply with ethics and enclosure limitations. Thus we did not record a wide range of speeds within the bounds of what kangaroos are capable of in the wild (up to 12 m/s), but for the range we did measure (~2-4.5 m/s), there is a large amount of variation in hopping speed within each individual kangaroo. Out of 16 individuals, eight individuals had a difference of 1-2 m/s between their slowest and fastest trials, and these kangaroos accounted for 78 out of 100 trials. Of the remainder, six individuals had three or fewer trials each, and two individuals had highly repeatable speeds (3 out of 4, and 6 out of 7 trials were within 0.5 m/s). We have now removed the terminology “preferred speed”, e.g. line 115. We have added additional information about the chosen hopping speeds into the results and methods, including an appendix figure: “The kangaroos elected to hop between 1.99 and 4.48 m s⁻¹, with a range of speeds and number of trials for each individual (Suppl. Fig. 9).” (Lines 381-382)
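
      A within-individual summary of this kind (speed span per animal, and the share of trials contributed by animals with a wide span) could be tabulated along the following lines. This is a minimal sketch: the record layout, the individual IDs, the speeds, and the 1 m/s threshold variable are assumptions made for illustration and are not taken from the study's dataset.

```python
from collections import defaultdict

# Hypothetical (individual_id, speed in m/s) trial records; not the study's data.
trials = [
    ("K01", 2.1), ("K01", 3.4), ("K01", 2.8),
    ("K02", 3.0), ("K02", 3.2),
    ("K03", 2.5), ("K03", 4.1), ("K03", 3.3), ("K03", 3.9),
]


def speed_range_by_individual(records):
    """Return {individual: (number of trials, fastest minus slowest speed)}."""
    speeds = defaultdict(list)
    for individual, speed in records:
        speeds[individual].append(speed)
    return {ind: (len(v), max(v) - min(v)) for ind, v in speeds.items()}


summary = speed_range_by_individual(trials)

# Individuals whose fastest trial was at least 1 m/s faster than their slowest.
min_span = 1.0  # assumed threshold, m/s
wide = sorted(ind for ind, (n, span) in summary.items() if span >= min_span)

# Fraction of all trials contributed by those individuals.
share = sum(summary[ind][0] for ind in wide) / len(trials)

for ind, (n, span) in sorted(summary.items()):
    print(f"{ind}: {n} trials, speed span {span:.2f} m/s")
print(f"Wide-span individuals: {wide}, share of trials: {share:.2f}")
```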

      In the literature cited, what was the range of speeds measured, and was it within or between subjects?

      For other literature, to our knowledge the highest speed measured is ~9.5 m/s (see Suppl. Fig. 1b) and there were multiple measures for several individuals (see methods, Kram & Dawson 1998).

      Assuming that there is a compelling relationship between EMA and velocity, how reasonable is it to extrapolate to the conclusion that this increases tendon strain and ultimately saves metabolic cost?  They correlate EMA with tendon strain, but this would still not suggest a causal relationship (incidentally the p-value for the correlation is not reported). 

      The functions that underpin these results (e.g. moment = GRF*R) come from physical mechanics and geometry, rather than statistical correlations. Additionally, a p-value is not appropriate in the relationship between EMA and stress (rather than strain) because the relationship does not appear to be linear. We have made it clearer in the discussion that we are not proposing that the entire change in stress is caused by changes in EMA, but that the increase in GRF that naturally occurs with speed will also explain some of the increase in stress, along with other potential mechanisms. The discussion has been extensively revised to reflect this.

      Tendon strain could be increasing with ground reaction force, independent of EMA. Even if there is a correlation between strain and EMA, is it not a mathematical necessity in their model that all else being equal, tendon stress will increase as EMA decreases? I may be missing something, but nonetheless, it would be helpful for the authors to clarify the strength of the evidence supporting their conclusions.

      Yes, GRF also contributes to the increase in tendon stress in the mechanism we propose (Suppl. Fig. 8), see the formulas in Fig 6, and we have made this clearer in the revised discussion (see above comment).  You are correct that mathematically stress is inversely proportional to EMA, which can be observed in Fig. 7a, and we did find that EMA decreases. 

      The statistical approach is not well-described. It is not clear what the form of the statistical model used was and whether the analysis treated each trial individually or grouped trials by the kangaroo. There is also no mention of how many trials per kangaroo, or the range of speeds (or masses) tested. 

      The methods include the statistical model with the variables that we used, as well as the kangaroo masses (13.7 to 26.6 kg, mean: 20.9 ± 3.4 kg). We did not have sufficient within-individual sample size to use a linear mixed-effects model including subject as a random factor, thus all trials were treated individually. We have included this information in the results section.
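
      Treating each trial as an independent observation, without a random effect for subject, corresponds to an ordinary least-squares fit with one row per trial. The sketch below is only an illustration under assumptions: the additive form (response = b0 + b1·speed + b2·mass) is assumed rather than taken from this response, the helper names are hypothetical, and the data are synthetic.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(speeds, masses, response):
    """Fit response = b0 + b1*speed + b2*mass via the normal equations.

    Every trial is one row; no grouping by individual (assumed model form).
    """
    rows = [[1.0, s, bm] for s, bm in zip(speeds, masses)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, response)) for i in range(3)]
    return solve_linear_system(xtx, xty)


# Synthetic trials (hypothetical): speed in m/s, body mass in kg.
speeds = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
masses = [15.0, 22.0, 18.0, 25.0, 20.0, 17.0]
# Noise-free response built from known coefficients so the fit can be checked.
response = [1.0 + 2.0 * s + 0.5 * bm for s, bm in zip(speeds, masses)]

b0, b1, b2 = fit_ols(speeds, masses, response)
print(f"intercept={b0:.3f}, speed slope={b1:.3f}, mass slope={b2:.3f}")
```

      Including mass alongside speed in one model is what lets the speed coefficient be read as an effect at a given body mass, which matches the stated reason for including mass. A mixed-effects model would additionally account for repeated trials from the same animal.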

      We have now moved the range of speeds from the supplementary material to the results and figure captions. We have added information on the number of trials per kangaroo to the methods, and added Suppl. Fig. 9 showing the distribution of speeds per kangaroo.

      We did not group the data e.g. by using an average speed per individual for all their trials, or by comparing fast to slow groups for statistical analysis (the latter was only for display purposes in our figures, which we have now made clearer in the methods statistics section). 

      Related to this, there is no mention of how different speeds were obtained. It seems that kangaroos hopped at a self-selected pace, thus it appears that not much variation was observed. I appreciate the difficulty of conducting these experiments in a controlled manner, but this doesn’t exempt the authors from providing the details of their approach.

      Apologies, this was not clear in the manuscript. Kangaroos hopping at their preferred speed means we did not chase or startle them into high speeds to comply with ethics and enclosure limitations. Thus we did not record a wide range of speeds within the bounds of what kangaroos are capable of in the wild (up to 12 m/s). We have now removed the terminology “preferred speed” e.g. line 115. We have added additional information about the chosen hopping speeds into the results and methods, including an appendix figure (see above comment). (Line 381-382)

      Some figures (Figure 2 for example) present means for one of three speeds, yet the speeds are not reported (except in the legend) nor how these bins were determined, nor how many trials or kangaroos fit in each bin. A similar comment applies to the mass categories. It would be more convincing if the authors plotted the main metrics vs. speed to illustrate the significant trends they are reporting.

      Thank you for this comment. The bins are used only for display purposes and not within the statistical analysis. We have clarified this in the revised manuscript: “The data was grouped into body mass (small 17.6±2.96 kg, medium 21.5±0.74 kg, large 24.0±1.46 kg) and speed (slow 2.52±0.25 m s⁻¹, medium 3.11±0.16 m s⁻¹, fast 3.79±0.27 m s⁻¹) subsets for display purposes only”. (Lines 495-497)
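
      Display-only subsets reported as mean ± SD can be produced with a few lines of code. In the sketch below the binning rule (three equal-count groups by rank) is an assumption, since the response does not state how the bin boundaries were chosen; the function name and speeds are hypothetical.

```python
from statistics import mean, stdev


def tercile_bins(values, labels=("slow", "medium", "fast")):
    """Split values into three equal-count groups by rank (assumed binning rule)."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = [0, n // 3, 2 * n // 3, n]
    return {label: ordered[cuts[i]:cuts[i + 1]] for i, label in enumerate(labels)}


# Hypothetical trial speeds in m/s; not the study's data.
speeds = [2.0, 2.3, 2.6, 2.9, 3.0, 3.2, 3.5, 3.8, 4.4]

bins = tercile_bins(speeds)
# Mean and sample standard deviation per bin, for figure labels only.
summary = {label: (mean(v), stdev(v)) for label, v in bins.items()}

for label, (avg, sd) in summary.items():
    print(f"{label}: {avg:.2f} ± {sd:.2f} m/s (n={len(bins[label])})")
```

      Keeping the bins out of the statistical model, as the response describes, avoids discarding within-bin variation in speed; the bins serve only to make the plotted curves readable.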

      (2) The significance of the effects of mass is not clear. The introduction and abstract suggest that the paper is focused on the effect of speed, yet the effects of mass are reported throughout as well, without a clear understanding of the significance. This weakness is further exaggerated by the fact that the details of the subject masses are not reported.

      Indeed, the primary aim of our study was to explore the influence of speed, given the uncoupling of energy from hopping speed in kangaroos. We included mass to ensure that the effects of speed were not driven by body mass (i.e.: that larger kangaroos hopped faster). Subject masses were reported in the first paragraph of the methods, albeit some were estimated as outlined in the same paragraph.

      (3) The paper needs to be significantly re-written to better incorporate the methods into the results section. Since the results come before the methods, some of the methods must necessarily be described such that the study can be understood at some level without turning to the dedicated methods section. As written, it is very difficult to understand the basis of the approach, analysis, and metrics without turning to the methods.

      Placing the methods after the discussion is a requirement of the journal. We have incorporated some methods in the results where necessary, without being too repetitive or disruptive, e.g. in the Fig. 1 caption, and by specifying that we are only analysing EMA for the ankle joint.

      Reviewing Editor (Recommendations For The Authors):

      Below is a list of specific recommendations that the authors could address to improve the eLife assessment:

      (1) Based on the data presented and the fact that metabolic energy was not measured, the authors should temper their conclusions and statements throughout the manuscript regarding the link between speed and metabolic energy savings. We recommend adding text to the discussion summarizing the strengths and limitations of the evidence provided and suggesting future steps to more conclusively answer this mystery.

      There is a significant body of work linking metabolic energy savings to measured increases in tendon stress in macropods. However, the purpose of this paper was to address the unanswered questions about why tendon stress increases. We found that stress did not only increase due to GRF increasing with speed as expected, but also due to novel postural changes which decreased EMA. In the revised manuscript, we have tempered our conclusions to make it clearer that it is not just EMA affecting stress, and added limitations throughout the manuscript (see response to Rev 1). 

      (2) To provide stronger evidence of a link between speed, mechanics, and metabolic savings the authors can consider estimating metabolic energy expenditure from their OpenSim model. This is one suggestion, but the authors likely have other, possibly better ideas. Such a model should also be able to explain why the metabolic rate increases with speed during uphill hopping.

      Extending the model to provide direct metabolic cost estimates will be the goal of a future paper; however, the model does not have the detailed muscle characteristics needed to do this in the formulation presented here. It would be a very large undertaking which is beyond the scope of the current manuscript. As per the comment above, the results of this paper are not reliant on metabolic performance.

      (3) The authors attempt to relate the newly quantified hopping biomechanics to previously published metabolic data. However, all reviewers agree that the logic in many instances is not clear or contradictory. Could one potential explanation be that at slow speeds, forces and tendon strain are small, and thus muscle fascicle work is high? Then, with faster speeds, even though the cost of generating isometric force increases, this is offset by the reduction in the metabolic cost of muscular work. The paper could provide stronger support for their hypotheses with a much clearer explanation of how the kinematics relate to the mechanics and ultimately energy savings.

      In response to the reviewers' comments, we have substantially modified the discussion to provide a clearer rationale.

      (4) The methods and the effort expended to collect these data are impressive, but there are a number of underlying assumptions made that undermine the conclusions. This is due partly to the methods used, but also the paper's incomplete description of their methods. We provide a few examples below:

      It would be helpful if the authors could speak to the effect of the limited speeds tested and between-animal comparisons on the ability to draw strong conclusions from the present dataset.

      Throughout the discussion, the authors highlight the relationship between EMA and speed. However, this is misleading since there was no significant effect of speed on EMA. Speed only affected the muscle moment arm, r. At minimum, this should be clarified and the effect on EMA not be overstated. Additionally, the resulting implications on their ability to confidently say something about the effect of speed on muscle stress should be discussed. 

      We have now provided additional details in response to these concerns (see responses above). For instance, we added a supplementary figure showing the speed distribution per individual. The primary reviewer concern (that each kangaroo travelled at a single speed) was due to a miscommunication around the terminology “preferred”, which has now been corrected.

      We now elaborate in the results why we are not very concerned that EMA is insignificant. The statistical insignificance of EMA is ultimately due to the insignificance of the direct measurement of R, however, we now better explain in the results why we believe that this statistical insignificance is due to error/noise of the measurement which is relatively large compared to the effect size. Indirect indications of how R may increase with speed (via ankle height from the ground) are statistically significant. Lines 177-185. 

      We consider this worth reporting because, for instance, an 18% change in EMA will be undetectable by measurement, but corresponds to an 18% change in tendon stress which is measurable and physiologically significant (safety factor would decrease from 2 to 1.67).  We presented both significant and insignificant results for transparency. 

      We have also discussed this within a revised limitations section of the manuscript (Lines 311-328).

      Reviewer #1 (Recommendations For The Authors):

      Title: I would cut the first half of the title. At least hedge it a bit. "Clues" instead of "Unlocking the secrets".

      We have revised the title to: “Postural adaptations may contribute to the unique locomotor energetics seen in hopping kangaroos”

      In my comments, ... typically indicates a stylistic change suggested to the text.

      Overall, the paper covers speed and size. Unfortunately, the authors were not 100% consistent in the order of presenting size then speed, or speed then size. Just choose one and stick with it.

      We have attempted to keep the order of presenting size and speed consistent; however, in several cases this would reduce the readability of the manuscript, so the order may vary.

      One must admit that there is a lot of vertical scatter in almost all of the plots. I understand that these animals were not in a lab on a treadmill at a controlled speed and the animals wear fur coats so marker placements vary/move etc. But the spread is quite striking, e.g. Figure 5a the span at one speed is almost 10x. Can the authors address this somewhere? Limitations section?

      The variation seen likely results from attempting to display data in a 2D format, when it is in fact the result of multiple variables, including speed, mass, stride frequency and subject specific lengths. Slight variations in these would be expected to produce some noise around the mean, and I think it’s important to consider this while showing the more dominant effects. 

      In many locations in the manuscript, the term "work" is used, but rarely if ever specified that this is the work "per hop". The big question revolves around the rate of metabolic energy consumption (i.e. energy per time or average metabolic power), one must not forget that hop frequency changes somewhat across speed, so work per hop is not the final calculation.

      Thank you for this comment. We have now explicitly stated work per hop in figure captions and in the results (line 208). The change in stride frequency at this range of speeds is very small, particularly compared to the variance in stride frequency (Suppl. Fig. 1d), which is consistent with other researchers who found that stride frequency was constant or near constant in macropods at analogous speeds (e.g. Dawson and Taylor 1973, Baudinette et al. 1987). 

      Line 61 ....is likely related.

      Added “likely” (line 59)

      Line 86 I think the Allen reference is incomplete. Wasn't it in J Exp Biology?

      Thank you. Changed. 

      Line 122 ... at faster speeds and in larger individuals.

      Changed: “We hypothesised that (i) the hindlimb would be more crouched at faster speeds, primarily due to the distal hindlimb joints (ankle and metatarsophalangeal), independent of changes with body mass” (Line 121-122).

      Line 124 I found this confusing. Try to re-word so that you explain you mean more work done by the tendons and less by the ankle musculature.

      Amended: “changes in moment arms resulting from the change in posture would contribute to the increase in tendon stress with speed, and may thereby contribute to energetic savings by increasing the amount of positive and negative work done by the ankle without requiring additional muscle work” (Line 123)

      Line 129 hopefully "braking" not "breaking"!

      Thank you. Fixed. (Line 130)

      Line 129 specify fore-aft horizontal force.

      Added "fore-aft" to "negative fore-aft horizontal component" (Line 130-131)

      Line 130 add something like "of course" or "naturally" since if there is zero fore-aft force, the GRF vector of course must be vertical. 

      Added "naturally" (Line 132)

      Line 138 clarify that this section is all stance phase. I don't recall reading any swing phase data.

      Changed to: "Kangaroo hindlimb stance phase kinematics varied…" (Line 141)

      Line 143 and elsewhere. I found the use of dorsiflexion and plantarflexion confusing. In Figure 3, I see the ankle never flexing more than 90 degrees. So, the ankle joint is always in something of a flexed position, though of course it flexes and extends during contact. I urge the authors to simplify to flexion/extension and drop the plantar/dorsi.

      We have edited this section to describe both movements as greater extension (plantarflexion). (Line 147). We have further clarified this in the figure caption for figure 3.  

      Line 147 ...changes were…

      Fixed, line 150

      Line 155 I'm a bit confused here. Are the authors calculating some sort of overall EMA or are they saying all of the individual joint EMAs all decreased?

      Thank you, we clarified that it is at the ankle. Line 158

      Line 158 since kangaroos hop and are thus positioned high and low throughout the stance phase, try to avoid using "high" and "low" for describing variables, e.g. GRF or other variables. Just use "greater/greatest" etc.

      Thanks for this suggestion. We have changed "higher" into "greater" where appropriate throughout the manuscript e.g. line 161

      Lines 162 and 168 same comment here about "r" and "R". Do you mean ankle or all joints?

      Clarified that it is the gastrocnemius and plantaris r, and the R to the ankle. (Lines 164-165)

      Line 173 really, ankle height?

      Added: ankle height is "vertical distance from the ground". Line 177

      Line 177 is this just the ankle r?

      Added "of the ankle" line 158 and “Achilles” line 187 

      Line 183 same idea, which tendon/tendons are you talking about here?

      Added "Achilles" to be more clear (Line 187)

      Line 195 substitute "converted" for "transferred".

      Done (Line 210)

      Line 223 why so vague? i.e. why use "may"? Believe in your data. ...stress was also modulated by changes....

      Changed "may" to "is"

      Line 229 smaller ankle EMA (especially since you earlier talked about ankle "height").

      Changed “lower” to “smaller” Line 254

      Line 236 ...and return elastic energy…

      Added "elastic" line 262

      Line 244 IMPORTANT: Need to explain this better! I think you are saying that the net work at the ankle is staying the same across speed, BUT it is the tendons that are storing and returning that work, it's not that the muscles are doing a lot of negative/positive work.

      Changed: “The consistent net work observed among all speeds suggests the ankle extensor muscle-tendon units are performing similar amounts of ankle work independent of speed, which would predominantly be done by the tendon.” (Line 270-272)

      Line 258-261 I think here is where you are over-selling the data/story. Although you do say "a" mechanism (and not "the" mechanism, you still need to deal with the cost of generating more force and generating that force faster.

      We removed this sentence and replaced it with a discussion of the cost of generating force hypothesis, and alternative scenarios for how force and metabolics could be uncoupled.

      Line 278 "the" tendon? Which tendon?

      Added "Achilles"

      Line 289. I don't think one can project into the past.

      Changed “projected” to "estimated"

      Line 303 no problem, but I've never seen a paper in biology where the authors admit they don't know what species they were studying!

      Can’t be helped unfortunately. It is an old dataset and there aren’t photos of every kangaroo. Fortunately, from the grey and red kangaroos we can distinguish between, we know there are no discernible species effects on the data. 

      Lines 304-306 I'm not clear here. Did you use vertical impulse (and aerial time) to calculate body weight? Or did you somehow use the braking/propulsive impulse to calculate mass? I would have just put some apples on the force plate and waited for them to stop for a snack.

      Stationary weights were recorded for some kangaroos which did stand on the force plate long enough, but unfortunately not all of them were willing to do so. In those cases, yes, we used impulse from steady-speed trials to estimate mass. We cross-checked by estimated mass from segment lengths (as size and mass are correlated). This is outlined in the first paragraph of the methods.
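      A minimal sketch of the impulse-based estimate described above, assuming steady-speed hopping so that the vertical ground reaction force averaged over a full hop cycle equals body weight. The function name, sampling setup and half-sine force profile are illustrative assumptions, not the authors' processing pipeline.

```python
import numpy as np

G = 9.81  # gravitational acceleration (m/s^2)


def mass_from_vertical_impulse(fz, dt, stride_time):
    """Estimate body mass (kg) from the vertical force over one hop cycle.

    fz          -- vertical ground reaction force samples during stance (N)
    dt          -- sampling interval (s)
    stride_time -- duration of the full hop cycle, stance plus aerial (s)

    Assumes steady-speed hopping: no net change in vertical momentum over
    the cycle, so the stance impulse equals m * g * stride_time.
    """
    impulse = float(np.sum(np.asarray(fz, dtype=float)) * dt)  # rectangle rule, N s
    return impulse / (G * stride_time)


# Hypothetical example: a half-sine force pulse constructed for a 40 kg animal.
contact_time, stride_time, n = 0.2, 0.5, 2000
dt = contact_time / n
t = np.arange(n) * dt
peak = 40.0 * G * stride_time * np.pi / (2.0 * contact_time)
fz = peak * np.sin(np.pi * t / contact_time)
estimated_mass = mass_from_vertical_impulse(fz, dt, stride_time)
print(f"estimated mass: {estimated_mass:.2f} kg")
```

      Because the estimate relies on the trial being truly steady-speed, any net acceleration or deceleration in the trial would bias the result, which is presumably why a cross-check against segment lengths is useful.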

      Lines 367 & 401 When you use the word "scaled" do you mean you assumed geometric similarity?

      No, rather than geometric scaling, we allowed scaling to individual dimensions by using the markers at midstance for measurements. We have amended the paragraph to clarify that the shape of the kangaroo changes and that mass distribution was preserved during the shape change (line 441-446).

      Lines 381-82 specify "joint work"

      Added "joint work"  (Line 457)

      Figure 1 is gorgeous. Why not add the CF equation to the left panel of the caption?

      We decided to keep the information in the figure caption. “Total leg length was calculated as the sum of the segment lengths (solid black lines) in the hindlimb and compared to the pelvis-to-toe distance (dashed line) to calculate the crouch factor”

      Figure 2 specify Horizontal fore-aft.

      Done

      Figure 3g I'd prefer the same Min. Max Flexion vertical axis labels as you use for hip & knee.

      While we appreciate the reviewer trying to increase the clarity of this figure, we have left it as plantar/dorsi flexion since these are recognised biomechanical terms. To avoid confusion, we have further defined these in the figure caption “For (f-g), increased plantarflexion represents a decrease in joint flexion, while increased dorsiflexion represents increased flexion of the joint.”

      Figure 4. I like it and I think that you scaled all panels the same, i.e. 400 W is represented by the same vertical distance in all panels. But if that's true, please state so in the Caption. It's remarkable how little work occurs at the hip and knee despite the relatively huge muscles there.

      It is true that the y axes are all at the same scale. We have added this to the caption.

      Figure 5 Caption should specify "work per hop".

      Added

      Figure 7 is another beauty.

      Thank you!

      Supplementary Figure 3 is this all ANKLE? Please specify.

      Clarified that it is the gastrocnemius and plantaris r, and the R to the ankle.

      Reviewer #2 (Recommendations For The Authors):

      To 'unlock the secrets of kangaroo locomotor energetics' I expected the authors to measure the secretive outcome variable, metabolic rate using laboratory measures. Rather, the authors relied on reviewing historic metabolic data and collecting biomechanics data across different animals, which limits the conclusions of this manuscript.

      We have revised the title to make it clearer that we are investigating a subset of the energetics problem, specifically posture: “Postural adaptations may contribute to the unique locomotor energetics seen in hopping kangaroos.” We have also substantially modified the discussion to temper the conclusions from the paper.

      After reading the hypothesis, why do the authors hypothesize about joint flexion and not EMA? Because the following hypothesis discusses the implications of moment arms on tendon stress, EMA predictions are more relevant (and much more discussed throughout the manuscript).

      Ankle and MTP angles are the primary drivers of changes in r, R & thus, EMA. We used a two-part hypothesis to capture this. We have rephrased the hypotheses: “We hypothesised that (i) the hindlimb would be more crouched at faster speeds, primarily due to the distal hindlimb joints (ankle and metatarsophalangeal), independent of changes with body mass, and (ii) changes in moment arms resulting from the change in posture would contribute to the increase in tendon stress with speed, and may thereby contribute to energetic savings by increasing the amount of positive and negative work done by the ankle without requiring additional muscle work.”

      If there were no detectable effects of speed on EMA, are kangaroos mechanically like other animals (Biewener Science 89 & JAP 04) who don't vary EMA across speeds? Despite no detectable effects, the authors state [lines 228-229] "we found larger and faster kangaroos were more crouched, leading to lower ankle EMA". Can the authors explain this inconsistency? Lines 236 "Kangaroos appear to use changes in posture and EMA". I interpret the paper as EMA does not change across speed.

      Apologies, we did not sufficiently explain this originally. We now explain in the results our reasoning behind our belief that EMA and R may change with speed. “If we consider the nonsignificant relationship between R (and EMA) and speed to indicate that there is no change in R, then it conflicts with the ankle height and CoP result. Taking both into account, we think it is more likely that there is a small, but important, change in R, rather than no change in R with speed. It may be undetectable because we expect small effect sizes compared to the measurement range and measurement error (Suppl. Fig. 3h), or be obscured by a similar change in R with body mass. R is highly dependent on the length of the metatarsal segment, which is longer in larger kangaroos (1 kg BM corresponded to ~1% longer segment, P<0.001, R²=0.449). If R does indeed increase with speed, both R and r will tend to decrease EMA at faster speeds.” (Line 177-185)

      Lines 335-339: "We assumed the force was applied along phalanx IV and that there was no medial or lateral movement of the centre of pressure (CoP)". I'm confused, did the authors not measure CoP location with respect to the kangaroo limb? If not, this simple estimation undermines primary results (EMA analyses).

      We have changed "The anterior or posterior movement of the CoP was recorded by the force plate" to read: "The fore-aft movement of the CoP was recorded by the force plate within the motion capture coordinate system" (Line 406-407) and added more justification for fixing the CoP movement in the other axis: “It was necessary to assume the CoP was fixed in the medial-lateral axis because when two feet land on the force plate, the lateral forces on each foot are not recorded, and indeed cancel if the forces are symmetrical (i.e. if the kangaroo is hopping in a straight path and one foot is not in front of the other). We only used symmetrical trials to ensure reliable measures of the anterior-posterior movement of the CoP.” (Line 408-413)

      The introduction makes many assertions about the generalities of locomotion and the relationship between mechanics and energetics. I'm afraid that the authors are selectively choosing references without thoroughly evaluating alternative theories. For example, Taylor, Kram, & others have multiple papers suggesting that decreasing EMA and increasing muscle force (and active muscle volume) increase metabolic costs during terrestrial locomotion. Rather, the authors suggest that decreasing EMA and increasingly high muscle force at faster speeds don't affect energetics unless muscle work increases substantially (paragraph 2)? If I am following correctly, does this theory conflict with active muscle volume ideas that are peppered throughout this manuscript?

      Yes, as you point out, the same mechanism does lead to different results in kangaroos vs humans, for instance, but this is not a contradiction. In all species, decreasing EMA gives the muscles less efficient leverage, so the muscle-tendon unit is required to produce more force to balance the joint moment. As a consequence, human muscles activate a greater volume in order for the muscle-tendon unit to increase muscle work and produce enough force. We are proposing that in kangaroos, the increase in work is done by the Achilles tendon rather than the muscles. Previous research suggests that macropod ankle muscles contract isometrically or that the fibres do not shorten more at faster speeds, i.e. muscle work does not increase with speed. Instead, the additional force seems to come from the tendon storing and subsequently returning more strain energy (indicated by higher stress). We found that the increase in tendon stress comes from higher ground force at faster speeds, and from the kangaroo adopting a more crouched posture, which increases tendon stress compared to an upright posture for a given speed (think of this as increasing the tendon’s stress capacity). We have substantially revised the discussion to highlight this.
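      The leverage argument above can be sketched as a quasi-static moment balance about the ankle, assuming EMA = r/R (muscle moment arm over ground reaction force moment arm) and that the Achilles tendon carries the whole extensor force. All numerical values and function names below are hypothetical and chosen only for illustration; segment inertia and weight are ignored.

```python
def effective_mechanical_advantage(r, R):
    """EMA = muscle moment arm r / ground reaction force moment arm R."""
    return r / R


def tendon_force(grf, r, R):
    """Force (N) needed to balance the GRF moment at the joint: F * r = grf * R."""
    return grf * R / r


def tendon_stress_mpa(grf, r, R, tendon_area_mm2):
    """Tendon stress in MPa (N/mm^2) for a given tendon cross-sectional area."""
    return tendon_force(grf, r, R) / tendon_area_mm2


# Hypothetical values: identical ground force, two postures.
grf = 1000.0   # N, illustrative only
area = 100.0   # mm^2, illustrative only
upright = tendon_stress_mpa(grf, r=0.050, R=0.200, tendon_area_mm2=area)
crouched = tendon_stress_mpa(grf, r=0.045, R=0.220, tendon_area_mm2=area)
print(f"upright: {upright:.1f} MPa, crouched: {crouched:.1f} MPa")
```

      In this simplified picture, a smaller r and a larger R both lower EMA and raise tendon stress without any change in ground force, which is the sense in which posture alone can modulate stress.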

      Similarly, does increased gross or net tendon mechanical energy storage & return improve hopping energetics? Would more tendon stress and strain energy storage with a given hysteresis value also dissipate more mechanical energy, requiring leg muscles to produce more net work? Does net or gross muscle work drive metabolic energy consumption?

      Based on the cost of generating force hypothesis, we think that gross muscle work would be linked to driving metabolic energy consumption. Our idea here is that the total body work is a product of the work done by the tendon and the muscle combined. If the tendon has the potential to do more work, then the total work can increase without muscle work needing to increase.

      The results interpret speed effects on biomechanics, but each kangaroo was only collected at 1 speed. Are inter-animal comparisons enough to satisfy this investigation?

      We have added a figure (Suppl Fig 9) to demonstrate the distribution of speed and number of trials per kangaroo. We have also removed "preferred" from the manuscript as this seems to cause confusion. Most kangaroos travelled at a range of “casual” speeds.

      Abstract: Can the authors more fully connect the concept of tendon stress and low metabolic rates during hopping across speeds? Surely, tendon mechanics don't directly drive the metabolic cost of hopping, but they affect muscle mechanics to affect energetics.

      Amended to: "This phenomenon may be related to greater elastic energy savings due to increasing tendon stress; however, the mechanisms which enable the rise in stress, without additional muscle work remain poorly understood." (Lines 25-27).

      The topic sentence in lines 61-63 may be misleading. The ensuing paragraph does not substantiate the topic sentence stating that ankle MTUs decouple speeds and energetics.

      We added "likely" to soften the statement. (Line 59)

      Lines 84-86: In humans, does more limb flexion and worse EMA necessitate greater active muscle volume? What about muscle contractile dynamics - See recent papers by Sawicki & colleagues that include Hill-type muscle mechanics in active muscle volume estimates.

      Added: “Smaller EMA requires greater muscle force to produce a given force on the ground, thereby demanding a greater volume of active muscle, and presumably greater metabolic rates than larger EMA for the same physiology”. (Line 80-82)

      Lines 106: can you give the context of what normal tendon safety factors are?

      Good idea. Added: "far lower than the typical safety factor of four to eight for mammalian tendons (Ker et al. 1988)." Line 106-107

      I thought EMA was relatively stable across speeds as per Biewener [Science & JAP '04]. However the authors gave an example of an elephant to suggest that it is typically inversely related to speed. Can the authors please explain the disconnect and the most appropriate explanation in this paragraph?

      Knee EMA in particular changed with speed in Biewener 2004. What is “typical” probably depends on the group of animals studied; e.g., cursorial quadrupedal mammals generally seem to maintain constant EMA, but other groups do not.

      These cases are presented to show a range of consequences for changing EMA (usually with mass, but sometimes with speed). We have made several adjustments to the paragraph to make this clearer. Lines 85-93.

      The results depend on the modeled internal moment arm (r). How confident are the authors in their little r prediction? Considering complications of joint mechanics in vivo including muscle bulging. Holzer et al. '20 Sci Rep demonstrated that different models of the human Achilles tendon moment arm predict vastly different relationships between the moment arm and joint angle.

      Our values for r and EMA closely align with previous papers which measured/calculated these values in kangaroos, such as Kram 1998, and thus we are confident in our interpretation.

      This is a misleading results sentence: Small decreases in EMA correspond to a nontrivial increase in tendon stress, for instance, reducing EMA from 0.242 (mean minimum EMA of the slow group) to 0.206 (mean minimum EMA of the fast group) was associated with an ~18% increase in tendon stress. The authors could alternatively say that a ~15% decrease in EMA was associated with an ~18% increase in tendon stress, which seems pretty comparable.

      Thank you for pointing this out, it is important that it is made clearer. Although the change in relative magnitude is approximately the same (as it should be), this does not detract from the importance. The "small decrease in EMA" is referring to the absolute values, particularly in respect to the measurement error/noise. The difference is small enough to have been undetectable with other methods used in previous studies. We have amended the sentence to clarify this.

      It now reads: “Subtle decreases in EMA which may have been undetected in previous studies correspond to discernible increases in tendon stress. For instance, reducing EMA from 0.242 (mean minimum EMA of the slow group) to 0.206 (mean minimum EMA of the fast group) was associated with an increase in tendon stress from ~50 MPa to ~60 MPa, decreasing safety factor from 2 to 1.67 (where 1 indicates failure), which is both measurable and physiologically significant.” (Line 195-200)
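      The figures quoted above can be checked with a few lines of arithmetic. This sketch assumes tendon stress scales inversely with EMA at a fixed ground reaction force, and infers a tendon strength of 100 MPa from the stated safety factor of 2 at ~50 MPa; both are assumptions made here for illustration rather than statements from the manuscript.

```python
# Values quoted in the response.
ema_slow, ema_fast = 0.242, 0.206                # mean minimum EMA, slow and fast groups
stress_slow_mpa, stress_fast_mpa = 50.0, 60.0    # approximate tendon stresses (MPa)

# Relative decrease in EMA between the two groups.
ema_decrease = (ema_slow - ema_fast) / ema_slow

# Relative stress increase expected if stress is proportional to 1/EMA (assumption).
stress_increase_if_inverse = ema_slow / ema_fast - 1.0

# Tendon strength implied by "safety factor 2 at ~50 MPa" (assumption).
implied_strength_mpa = 2.0 * stress_slow_mpa
safety_factor_fast = implied_strength_mpa / stress_fast_mpa

print(f"EMA decrease: {ema_decrease:.1%}")
print(f"stress increase if stress ~ 1/EMA: {stress_increase_if_inverse:.1%}")
print(f"safety factor at {stress_fast_mpa:.0f} MPa: {safety_factor_fast:.2f}")
```

      Under these assumptions the ~15% fall in EMA and the ~18% rise in stress are two views of the same ratio, consistent with the reviewer's remark that the relative changes are comparable.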

      Lines 243-245: "The consistent net work observed among all speeds suggests the ankle extensors are performing similar amounts of ankle work independent of speed." If this is true, and presumably there is greater limb work performed on the center of mass at faster speeds (Donelan, Kram, Kuo), do more proximal leg joints increase work and energy consumption at faster speeds?

      The skin over the proximal leg joints (knee and hip) moves too much to get reliable measures of EMA from the ratio of moment arms. This will be pursued in future work when all muscles are incorporated in the model so knee and hip EMA can be determined from muscle force.

      We have added a limitations and considerations paragraph to the manuscript: “Finally, we did not determine whether the EMA of proximal hindlimb joints (which are more difficult to track via surface motion capture markers) remained constant with speed. Although the hip and knee contribute substantially less work than the ankle joint (Fig. 4), the majority of kangaroo skeletal muscle is located around these proximal joints. A change in EMA at the hip or knee could influence a larger muscle mass than at the ankle, potentially counteracting or enhancing energy savings in the ankle extensor muscle-tendon units. Further research is needed to understand how posture and muscles throughout the whole body contribute to kangaroo energetics.” (Line 321-328)

      Lines 245-246: "Previous studies using sonomicrometry have shown that the muscles of tammar wallabies do not shorten considerably during hops, but rather act near-isometrically as a strut" Which muscles? All muscles? Extensors at a single joint?

      Added "gastrocnemius and plantaris" Line 164-165

      Lines 249-254: "The cost of generating force hypothesis suggests that faster movement speeds require greater rates of muscle force development, and in turn greater cross-bridge cycling rates, driving up metabolic costs (Taylor et al. 1980, Kram and Taylor 1990). The ability for the ankle extensor muscle fibres to remain isometric and produce similar amounts of work at all speeds may help explain why hopping macropods do not follow the energetic trends observed in quadrupedal species." These sentences confuse me. Kram & Taylor's cost of force-generating hypothesis assumes that producing the same average force over shorter contact times increases metabolic rate. How does 'similar muscle work' across all speeds explain the ability of macropods to use unique energetic trends in the cost of force-generating hypothesis context?

      Thank you for highlighting this confusion. We have substantially revised the discussion to clarify where the mechanisms presented deviate from the cost of generating force hypothesis. Lines 270-309

      Reviewer #3 (Recommendations For The Authors):

      In addition to the points described in the public review, I have additional, related, specific comments:

      (1) Results: Please refer to the hypotheses in the results, and relate the findings back to the hypotheses.

      We now relate the findings back to the hypotheses 

      Line 142 “In partial support of hypothesis (i), greater masses and faster speeds were associated with more crouched hindlimb postures (Fig. 3a,c).”.

      Lines 205-206: “The increase in tendon stress with speed, facilitated in part by the change in moment arms by the shift in posture, may explain changes in ankle work (c.f. Hypothesis (ii)).” 

      (2) Results: please provide the main statistical results either in-line or in a table in the main text.

      We (the co-authors) have discussed this at length, and have agreed that the manuscript is far more readable in the format whereby most statistics lie within the supplementary tables, otherwise a reader is met with a wall of statistics. We only include values in the main text when the magnitude is relevant to the arguments presented in the results and discussion.

      (3) Line 140: Describe how 'crouched' was defined.

      We have now added a brief definition of ‘Crouch factor’ after the figure caption. (Line 143) (Fig. 3a,c; where crouch factor is the ratio of total limb length to pelvis to toe distance).
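      A minimal sketch of the crouch factor as defined above (total limb length divided by pelvis-to-toe distance), assuming an ordered chain of marker positions from pelvis to toe. The function name and the marker coordinates are hypothetical; the authors' marker set and segment definitions may differ.

```python
import numpy as np


def crouch_factor(markers):
    """Ratio of summed segment lengths to straight-line pelvis-to-toe distance.

    markers -- ordered (n, 2) or (n, 3) array of marker positions running
               from the pelvis (first row) to the toe (last row).
    A value of 1.0 means a fully straight limb; larger values mean more crouch.
    """
    pts = np.asarray(markers, dtype=float)
    segment_lengths = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    total_leg_length = segment_lengths.sum()
    pelvis_to_toe = np.linalg.norm(pts[-1] - pts[0])
    return total_leg_length / pelvis_to_toe


# Hypothetical 2D marker chains (metres), for illustration only.
straight_limb = [(0.0, 1.0), (0.0, 0.6), (0.0, 0.3), (0.0, 0.0)]
flexed_limb = [(0.0, 0.0), (3.0, 4.0), (6.0, 0.0)]
print(crouch_factor(straight_limb), crouch_factor(flexed_limb))
```

      Because the same segment lengths appear in the numerator for every posture of a given animal, the ratio isolates posture from size, which fits its use alongside body mass in the analysis.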

      (4) Line 162: This seems to be a main finding and should be a figure in the main text not supplemental. Additionally, Supplementary Figures 3a and b do not show this finding convincingly. There should be a figure plotting r vs speed and r vs mass.

      The combination of r and R is represented in the EMA plot in the main text. The r and R plots are relegated to the supplementary material because the main text is already very crowded. Thank you for the suggestion for the figure plotting r and R versus speed; this is now included as Suppl. Fig. 3h.

      (5) Line 166: Supplementary Figure 3g does not show the range of dorsiflexion angles as a function of speed. It shows r vs dorsiflexion angle. Please correct.

      Thanks for noticing this, it was supposed to reference Fig 3g rather than Suppl Fig 3g in the sentence regarding speed. We have fixed this, Line 170. 

      We have added a reference to Suppl Fig 3 on Line 169 as this shows where the peak in r with ankle angle occurs (114.4 degrees).

      (6) Line 184: Where are the statistical results for this statement?

      The relationship between stress and EMA does not appear to be linear, thus we only present R² for the power relationship rather than a p-value.
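      One common way to obtain an R² for a power relationship is a linear fit in log-log space; the sketch below illustrates that approach only and is not necessarily the fitting procedure the authors used. The function name and the synthetic data are hypothetical, and the R² reported here is computed on the log-transformed values.

```python
import numpy as np


def power_law_fit(x, y):
    """Fit y = a * x**b by least squares on log-transformed data.

    Returns (a, b, r_squared), where r_squared is computed in log space.
    """
    log_x = np.log(np.asarray(x, dtype=float))
    log_y = np.log(np.asarray(y, dtype=float))
    b, log_a = np.polyfit(log_x, log_y, 1)        # slope, intercept
    predicted = log_a + b * log_x
    ss_res = np.sum((log_y - predicted) ** 2)
    ss_tot = np.sum((log_y - log_y.mean()) ** 2)
    return float(np.exp(log_a)), float(b), float(1.0 - ss_res / ss_tot)


# Synthetic, noise-free example (not data from the study): y = 12 / x.
ema = np.array([0.20, 0.22, 0.24, 0.26])
stress = 12.0 * ema ** -1.0
a, b, r2 = power_law_fit(ema, stress)
print(a, b, r2)
```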

      (7) Line 192: The authors should explain how joint work and power relate/support the overall hypotheses. This section also refers to Figures 4 and 5 even though Figures 6 and 7 have already been described. Please reorganize.

      We have added a sentence at the end of the work and power section to mention hypothesis (ii) and lead into the discussion where it is elaborated upon. 

      “The increase in positive and negative ankle work may be due to the increase in tendon stress rather than additional muscle work.” Line 219-220 We have rearranged the figure order.

      (8) The statistics are not reported in the main text, but in the supplementary tables. If a result is reported in the main text, please report either in-line or with a table in the main text.

      We leave most statistics in the supplementary tables to preserve the readability of the manuscript. We only include values in the main text when the magnitude is relevant to the arguments raised in the results and discussion.

    1. eLife Assessment

      This important paper employs multiple experimental approaches and presents evidence that changes in membrane voltage directly affect ERK signaling to regulate cell division. This result is relevant because it supports an ion channel-independent pathway by which changes in membrane voltage can affect cell growth. The evidence now presented is solid and the data support the conclusions. This paper should be of interest to a broad readership in the areas of cell and developmental biology and electrophysiology.

    2. Reviewer #1 (Public review):

      This is a contribution to the field of developmental bioelectricity. How do changes of resting potential at the cell membrane affect downstream processes? Zhou et al. reported in 2015 that phosphatidylserine and K-Ras cluster upon plasma membrane depolarization and that voltage-dependent ERK activation occurs when constitutively active K-RasG12V mutants are overexpressed. In this paper, the authors advance the knowledge of this phenomenon by showing that membrane depolarization up-regulates mitosis and that this process is dependent on voltage-dependent activation of ERK. ERK activity's voltage-dependence is derived from changes in the dynamics of phosphatidylserine in the plasma membrane and not by extracellular calcium dynamics. This paper reports an interesting and important finding. It is somewhat derivative of Zhou et al., 2015 (https://www.science.org/doi/full/10.1126/science.aaa5619). The main novelty seems to be that they find quantitatively different conclusions upon conducting similar experiments, albeit with a different cell line (U2OS) than those used by Zhou et al. Sasaki et al. do show that increased K+ levels increase proliferation, which Zhou et al. did not look at. The data presented in this paper are a useful contribution to a field often lacking such data.

    3. Reviewer #2 (Public review):

      Sasaki et al. use a combination of live-cell biosensors and patch-clamp electrophysiology to investigate the effect of membrane potential on the ERK MAPK signaling pathway, and probe associated effects on proliferation. This is an effect that has long been proposed, but a convincing demonstration has remained elusive, because it is difficult to perturb membrane potential without disturbing other aspects of cell physiology in complex ways. The time-resolved measurements here are a nice contribution to this question, and the perforated patch clamp experiments with an ERK biosensor are fantastic - they come closer to addressing the above difficulty of perturbing voltage than any prior work. It would have been difficult to obtain these observations with any other combination of tools.

      Comments on previous revisions:

      The authors have done a good job addressing the comments on the previous submission.

    4. Reviewer #3 (Public review):

      Summary:

      This paper demonstrates that membrane depolarization induces a small increase in cell entry into mitosis. Based on previous work from another lab, the authors propose that ERK activation might be involved. They show convincingly using a combination of assays that ERK is activated by membrane depolarization. They show this is Ca2+ independent and is a result of activation of the whole K-Ras/ERK cascade which results from changed dynamics of phosphatidylserine in the plasma membrane that activates K-Ras. Although the activation of the Ras/ERK pathway by membrane depolarization is not new, linking it to an increase in cell proliferation is novel.

      Strengths:

      A major strength of the study is the use of different techniques - live imaging with ERK reporters, as well as Western blotting to demonstrate ERK activation as well as different methods for inducing membrane depolarization. They also use a number of different cell lines. Via Western blotting the authors are also able to show that the whole MAPK cascade is activated.

      Weaknesses:

      In the previous round of revisions, the authors addressed the issues with Figure 1, and the data presented are much clearer. The authors did also attempt to pinpoint when in the cell cycle ERK is having its activity, but unfortunately, this was not conclusive.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      This is a contribution to the field of developmental bioelectricity. How do changes of resting potential at the cell membrane affect downstream processes? Zhou et al. reported in 2015 that phosphatidylserine and K-Ras cluster upon plasma membrane depolarization and that voltage-dependent ERK activation occurs when constitutively active K-RasG12V mutants are overexpressed. In this paper, the authors advance the knowledge of this phenomenon by showing that membrane depolarization up-regulates mitosis and that this process is dependent on voltage-dependent activation of ERK. ERK activity's voltage-dependence is derived from changes in the dynamics of phosphatidylserine in the plasma membrane and not by extracellular calcium dynamics. This paper reports an interesting and important finding. It is somewhat derivative of Zhou et al., 2015 (https://www.science.org/doi/full/10.1126/science.aaa5619). The main novelty seems to be that they find quantitatively different conclusions upon conducting similar experiments, albeit with a different cell line (U2OS) than those used by Zhou et al. Sasaki et al. do show that increased K+ levels increase proliferation, which Zhou et al. did not look at. The data presented in this paper are a useful contribution to a field often lacking such data.

      Strengths:

      Bioelectricity is an important field for areas of cell, developmental, and evolutionary biology, as well as for biomedicine. Confirmation of ERK as a transduction mechanism and a characterization of the molecular details involved in the control of cell proliferation are interesting and impactful.

      Weaknesses:

      The authors lean heavily on the assumption that the Nernst equation is an accurate predictor of membrane potential based on K+ level. This is a large oversimplification that undermines the authors' conclusions, most glaringly in Figure 2C. The authors' conclusions should be weakened to reflect that the activity of voltage-gated ion channels and homeostatic compensation are unaccounted for.

      We appreciate the reviewer’s thoughtful comment regarding our reliance on the Nernst equation to estimate membrane potential. We agree that the Nernst equation is a simplification and does not account for the activity of other ions, voltage-gated channels, or homeostatic compensation mechanisms. To address this concern, we conducted electrophysiological experiments in which the membrane potential was directly controlled using the perforated patch-clamp technique (Fig. 3). Under these conditions, we also monitored the membrane potential and confirmed that there was negligible drift within 20 minutes of perfusion with 145 mM K⁺ (only a 1–5 mV change). These results suggest that the influence of voltage-gated channels and homeostatic compensation is minimal in our experimental setup. We revised the manuscript to clarify these limitations and to present our conclusions more cautiously in light of this point.

      “A potential limitation of extracellular K⁺-based approaches is their reliance on the Nernst equation to estimate membrane potential, which oversimplifies the actual situation by neglecting voltage-gated ion channel activity and compensatory mechanisms. To directly address this concern, we measured membrane potential using the perforated patch-clamp technique and confirmed that the potential was stable during perfusion with 145 mM K⁺ (only a 1–5 mV drift within 20 min). Moreover, we used a voltage clamp to precisely control the membrane potential and demonstrated that ERK activity was directly regulated by the voltage itself, excluding the influence of other secondary factors. An additional strength of electrophysiology is its ability to examine the effects of repolarization, which is difficult to assess with conventional perfusion-based methods owing to slow solution exchange.”
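
      For readers unfamiliar with the estimate under discussion, the following is a minimal sketch of the Nernst calculation for K+, not the authors' code. The intracellular K+ concentration (140 mM) and the temperature (37 °C) are illustrative assumptions that do not come from the manuscript; the extracellular concentrations are the ones mentioned in the reviews (5, 15, 30, and 145 mM). As the reviewer notes, the real membrane potential also depends on other ions and channel activity, so these values are only the idealized K+ equilibrium potentials.

```python
import math

# Physical constants (SI units)
R = 8.314462618      # gas constant, J / (mol K)
F = 96485.33212      # Faraday constant, C / mol


def nernst_potential_mv(k_out_mm, k_in_mm=140.0, temp_c=37.0, valence=1):
    """Idealized Nernst equilibrium potential in millivolts.

    k_in_mm (140 mM) and temp_c (37 C) are assumed illustrative defaults,
    not values reported in the manuscript.
    """
    temp_k = temp_c + 273.15
    return 1000.0 * (R * temp_k) / (valence * F) * math.log(k_out_mm / k_in_mm)


if __name__ == "__main__":
    # Extracellular K+ concentrations mentioned in the reviews
    for k_out in (5.0, 15.0, 30.0, 145.0):
        print(f"[K+]out = {k_out:5.1f} mM -> E_K = {nernst_potential_mv(k_out):6.1f} mV")
```

      Under these assumptions the predicted potential rises monotonically with extracellular K+ and approaches 0 mV when the extracellular and intracellular concentrations are equal, which is why a direct patch-clamp measurement is a useful check on the estimate.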

      There are grammatical errors throughout the paper (e.g., line 99: "This kinetics" should be "these kinetics").

      We thank the reviewer for pointing out the grammatical errors. We carefully revised the entire manuscript.

      Line 71: Zhou et al. use BHK, N2A, PSA-3 cells, this paper uses U2OS (osteosarcoma) cells. Could that explain the differences in bioelectric properties that they describe? In general, there should be more discussion of the choice of cell line. Why were U2OS cells chosen? What are the implications of the fact that these are cancer cells, and bone cancer cells in particular? Does this paper provide specific insights for bone cancers? And crucially, how applicable are findings from these cells to other contexts?

      We thank the reviewer for this valuable comment regarding the choice of cell line. We selected U2OS cells primarily because they are well suited for live-cell FRET imaging. We did not use BHK, N2A, or PSA-3 cells, and therefore it is difficult for us to provide a clear comparison with the specific bioelectric properties reported in Zhou et al. Nevertheless, we agree that cancer cell lines, including U2OS, may exhibit bioelectric properties that differ from those of non-cancerous cells. While this could be a potential limitation, we are inclined to consider voltage-dependent ERK activation to be a fundamental and generalizable phenomenon, not restricted to osteosarcoma cells. The key components of this pathway—phosphatidylserine, Ras, MAPK (including ERK)—are expressed in essentially all mammalian cells. In support of our view, we observed voltage-dependent ERK activation not only in U2OS cells but also in HeLa, HEK293, and A431 cells. These results strongly suggest that the mechanism we describe is not cell-type specific but rather a universal feature of mammalian cells. In the revised Discussion, we expanded our rationale for choosing U2OS cells, while addressing the potential implications of using a cancer-derived cell line.

      “In this study, we primarily used U2OS cells because their flat morphology makes them suitable for live-cell FRET imaging. Although cancer cell lines, including U2OS, may display bioelectric properties that differ from those of noncancerous cells, our findings raise the possibility that voltage-dependent ERK activation is a fundamental and broadly applicable phenomenon rather than a feature specific to osteosarcoma cells. This conclusion is supported by the fact that essential components of this pathway, namely phosphatidylserine, Ras, and MAPK (including ERK), are ubiquitously expressed in mammalian cells. Consistent with this finding, we observed voltage-dependent ERK activation across multiple cell lines: U2OS, HeLa, HEK293, and A431 cells (Fig.S2). These observations indicate that the mechanism we describe is not cell-type-restricted, but rather a universal property of mammalian cells.”

      Line 115: The authors use EGF to calibrate 'maximal' ERK stimulation. Is this level near saturation? Either way is fine, but it would be useful to clarify.

      We thank the reviewer for raising this important point. The YFP/CFP ratio obtained after EGF stimulation is generally considered to represent saturation levels detectable by EKAREV imaging. However, we acknowledge that it remains uncertain whether 10 ng/mL EGF induces the absolute maximal ERK activity in all contexts. To clarify this point, we revised the Results text as follows:

      “To normalize variation among cells, cells were stimulated with EGF (10 ng/mL) at the end of the experiment, which presumably yielded a near-saturated YFP/CFP value (ERK activity). This value was used to determine the maximum ERK activity in each cell”

      Line 121: Starting line 121 the authors say "Of note, U2OS cells expressed wild-type K-Ras but not an active mutant of K-Ras, which means voltage dependent ERK activation occurs not only in tumor cells but also in normal cells". Given that U2OS cells are bone sarcoma cells, is it appropriate to refer to these as 'normal' cells in contrast to 'tumor' cells?

      We thank the reviewer for pointing this out. We agree that it is not appropriate to contrast U2OS cells with “normal” cells, since they are sarcoma-derived. To address this point, we revised the sentence to weaken the claim and avoid the misleading terminology.

      “Importantly, as U2OS cells express wild-type K-Ras rather than an oncogenic mutant (16), our results raise the possibility that voltage-dependent ERK activation may also occur in non-transformed cells.”

      Line 101: These normalizations seem reasonable, the conclusions sufficiently supported and the requisite assumptions clearly presented. Because the dish-to-dish and cell-to-cell variation may reflect biologically relevant phenomena it would be ideal if non-normalized data could be added in supplemental data where feasible.

      We thank the reviewer for this helpful suggestion. As recommended, we added representative non-normalized data in the Supplemental Figure S1, which illustrates the non-normalized variation across cells and dishes.

      Figure 2C is listed as Figure 2D in the text

      There is no Figure 2F (Referenced in line 148)

      We thank the reviewer for pointing out these errors. The incorrect figure citations were corrected.

      Reviewer #2 (Public review):

      Sasaki et al. use a combination of live-cell biosensors and patch-clamp electrophysiology to investigate the effect of membrane potential on the ERK MAPK signaling pathway, and probe associated effects on proliferation. This is an effect that has long been proposed, but a convincing demonstration has remained elusive, because it is difficult to perturb membrane potential without disturbing other aspects of cell physiology in complex ways. The time-resolved measurements here are a nice contribution to this question, and the perforated patch clamp experiments with an ERK biosensor are fantastic - they come closer to addressing the above difficulty of perturbing voltage than any prior work. It would have been difficult to obtain these observations with any other combination of tools.

      However, there are still some concerns as detailed in specific comments below:

      Specific comments:

      (1) All the observations of ERK activation, by both high extracellular K+ and voltage clamp, could be explained by cell volume increase (more discussion in subsequent comments). There is a substantial literature on ERK activation by hypotonic cell swelling (e.g. https://doi.org/10.1042/bj3090013, https://doi.org/10.1002/j.1460-2075.1996.tb00938.x, among others). Here are some possible observations that could demonstrate that ERK activation by volume change is distinct from the effects reported here:

      (i) Does hypotonic shock activate ERK in U2OS cells?

      (ii) Can hypotonic shock activate ERK even after PS depletion, whereas extracellular K+ cannot?

      (iii) Does high extracellular K+ change cell volume in U2OS cells, measured via an accurate method such as fluorescence exclusion microscopy?

      (iv) It would be helpful to check the osmolality of all the extracellular solutions, even though they were nominally targeted to be iso-osmotic.

      (2) Some more details about the experimental design and the results are needed from Figure 1:

      (i) For how long are the cells serum-starved? From the Methods section, it seems like the G1 release in different K+ concentration is done without serum, is this correct? Is the prior thymidine treatment also performed in the absence of serum?

      (ii) There is a question of whether depolarization constitutes a physiologically relevant mechanism to regulate proliferation, and how depolarization interacts with other extracellular signals that might be present in an in vivo context. Does depolarization only promote proliferation after extended serum starvation (in what is presumably a stressed cell state)? What fraction of total cells are observed to be mitotic (without normalization), and how does this compare to the proliferation of these cells growing in serum-supplemented media? Can K+ concentration tune proliferation rate even in serum-supplemented media?

      (3) In Figure 2, there are some possible concerns with the perfusion experiment:

      (i) Is the buffer static in the period before perfusion with high K+, or is it perfused? This is not clear from the Methods. If it is static, how does the ERK activity change when perfused with 5 mM K+? In other words, how much of the response is due to flow/media exchange versus change in K+ concentration?

      (ii) Why do there appear to be population-average decreases in ERK activity in the period before perfusion with high K+ (especially in contrast to Fig. 3)? The imaging period does not seem frequent enough for photobleaching to be significant.

      (4) Figure 3 contains important results on couplings between membrane potential and MAPK signaling. However, there are a few concerns:

      (i) Does cell volume change upon voltage clamping? Previous authors have shown that depolarizing voltage clamp can cause cells to swell, at least in the whole-cell configuration: https://www.cell.com/biophysj/fulltext/S0006-3495(18)30441-7 . Could it be possible that the clamping protocol induces changes in ERK signaling due to changes in cell volume, and not by an independent mechanism?

      (ii) Does the -80 mV clamp begin at time 0 minutes? If so, one might expect a transient decrease in sensor FRET ratio, depending on the original resting potential of the cells. Typical estimates for resting potential in HEK293 cells range from -40 mV to -15 mV, which would reach the range that induces an ERK response by depolarizing clamp in Fig. 3B. What are the resting potentials of the cells before they are clamped to -80 mV, and why do we not see this downward transient?

      (5) The activation of ERK by perforated voltage clamp and by high extracellular K+ are each convincing, but it is unclear whether they need to act purely through the same mechanism - while additional extracellular K+ does depolarize the cell, it could also be affecting function of voltage-independent transporters and cell volume regulatory mechanisms on the timescales studied. To more strongly show this, the following should be done with the HEK cells where there is already voltage clamp data:

      (i) Measure resting potential using the perforated patch in zero-current configuration in the high K+ medium. Ideally this should be done in the time window after high K+ addition where ERK activation is observed (10-20 minutes) to minimize the possibility of drift due to changes in transporter and channel activity due to post-translational regulation.

      (ii) Measure YFP/CFP ratio of the HEK cells in the high K+ medium (in contrast to the U2OS cells from Fig. 2 where there is no patch data).

      (iii) The assertion that high K+ is equivalent to changes in Vmem for ERK signaling would be supported if the YFP/CFP change from K+ addition is comparable to that induced by voltage clamp to the same potential. This would be particularly convincing if the experiment could be done with each of the 15 mM, 30 mM, and 145 mM conditions.

      (6) Line 170: "ERK activity was reduced with a fast time course (within 1 minute) after repolarization to -80 mV." I don't see this in the data: in Fig. 3C, it looks like ERK remains elevated for > 10 min after the electrical stimulus has returned to -80 mV

      Comments on revisions:

      The authors have done a good job addressing the comments on the previous submission.

      Reviewer #3 (Public review):

      Summary:

      This paper demonstrates that membrane depolarization induces a small increase in cell entry into mitosis. Based on previous work from another lab, the authors propose that ERK activation might be involved. They show convincingly using a combination of assays that ERK is activated by membrane depolarization. They show this is Ca2+ independent and is a result of activation of the whole K-Ras/ERK cascade which results from changed dynamics of phosphatidylserine in the plasma membrane that activates K-Ras. Although the activation of the Ras/ERK pathway by membrane depolarization is not new, linking it to an increase in cell proliferation is novel.

      Strengths

      A major strength of the study is the use of different techniques - live imaging with ERK reporters, as well as Western blotting to demonstrate ERK activation as well as different methods for inducing membrane depolarization. They also use a number of different cell lines. Via Western blotting the authors are also able to show that the whole MAPK cascade is activated.

      Weaknesses

      A weakness of the study is the data in Figure 1 showing that membrane depolarization results in an increase of cells entering mitosis. There are very few cells entering mitosis in their sample in any condition. This should be done with many more cells to increase the confidence in the results. The study also lacks a mechanistic link between ERK activation by membrane depolarization and increased cell proliferation.

      The authors did achieve their aims with the caveat that the cell proliferation results could be strengthened. The results, for the most part, support the conclusions.

      This work suggests that alterations in membrane potential may have more physiological functions than action potential in the neural system as it has an effect on intracellular signalling and potentially cell proliferation.

      In the revised manuscript, the authors have now addressed the issues with Figure 1, and the data presented are much clearer. They did also attempt to pinpoint when in the cell cycle ERK is having its activity, but unfortunately, this was not conclusive.

      Reviewer #2 (Recommendations for the authors):

      Small issues:

      Fig. 1A. Please add a mark on the timeline showing when the K+ concentration is changed. Also, please add a time axis that matches the time axis in (C), so readers can know when in C the medium was changed.

      1B caption: unclear what "the images were 20 min before and after cytokinesis" means, given that the images go from -30 min to +20 min. Maybe the authors mean, "the indicated times are measured relative to cytokinesis."

      Thank you for bringing to our attention these points, which could confuse readers. We revised the figure legend.

      Line 214: nonoclusters --> nanoclusters

      Line 475: 10 mm -> 10 µm

      Corrected.

  2. Oct 2025
    1. eLife Assessment

      This high-N, multi-task study offers a comprehensive examination of rhythmicity in behavioral performance during listening. It presents a valuable set of findings that reveal task- and ear-specific effects, challenging the notion of a universal rhythmicity in auditory perception. The evidence is solid and the work is likely to be of significant interest to behavioral and cognitive scientists focused on perception and neural oscillations.

    2. Reviewer #1 (Public review):

      Summary:

      This paper presents results from four independent experiments, each of them testing for rhythmicity in auditory perception. The authors report rhythmic fluctuations in discrimination performance at frequencies between 2 and 6 Hz. The exact frequency depends on the ear and experimental paradigm, although some frequencies seem to be more common than others.

      Strengths:

      The first sentence in the abstract describes the state of the art perfectly: "Numerous studies advocate for a rhythmic mode of perception; however, the evidence in the context of auditory perception remains inconsistent". This is precisely why the data from the present study is so valuable. This is probably the study with the highest sample size (total of > 100 in 4 experiments) in the field. The analysis is very thorough and transparent, due to the comparison of several statistical approaches and simulations of their sensitivity. Each of the experiments differs from the others in a clearly defined experimental parameter, and the authors test how this impacts auditory rhythmicity, measured in pitch discrimination performance (accuracy, sensitivity, bias) of a target presented at various delays after noise onset.

      Weaknesses:

      The authors find that the frequency in auditory perception changes between experiments. Possible reasons for such differences are described, but they remain difficult to interpret, as it is unclear whether they merely reflect some natural variability (independent of experimental parameters) or are indeed driven by the specific experimental paradigm (and therefore replicable).

      Therefore, it remains to be shown whether there is any systematic pattern in the results that allows conclusions about the roles of different frequencies.

    3. Reviewer #2 (Public review):

      Summary:

      The current study aims to shed light on why previous work on perceptual rhythmicity has led to inconsistent results. They propose that the differences may stem from conceptual and methodological issues. In a series of experiments, the current study reports perceptual rhythmicity in different frequency bands that differ between different ear stimulations and behavioral measures. The study suggests challenges regarding the idea of universal perceptual rhythmicity in hearing.

      Strengths:

      The study aims to address differences observed in previous studies about perceptual rhythmicity. This is important and timely because the existing literature provides quite inconsistent findings. Several experiments were conducted to assess perceptual rhythmicity in hearing from different angles. The authors use sophisticated approaches to address the research questions. The manuscript has greatly improved after the revision.

      Weaknesses:

      Additional variance: In several experiments, a fixation cross preceded - at a variable interval - the onset of the background noise that aimed to reset the phase of an ongoing oscillation. There is the chance that the fixation cross also resets the phase, potentially adding variance to the data. In addition, the authors used an adaptive procedure during the experimental blocks such that the stimulus intensity was adjusted throughout. There is good reason for doing so, but it means that correctly identified/discriminated targets will on average have a greater stimulus intensity. This may add variance to the data. These two aspects may potentially contribute to the observation of weak perceptual rhythmicity.

      Figures: The text in Figures 4 and 6 is small. I think readers would benefit from a larger font size. Moreover, Figure 1A is not very intuitive. Perhaps it could be made clearer. The new Figure 5 was not discussed in the text. I wonder whether analyses with traditional t-tests could be placed in the supplements.

      50% significant samples: The authors consider 50% of significant bootstrapped samples robust. For example: "This revealed that the above‐mentioned effects prevail for at least 50% of the simulated experiments, corroborating their robustness within the participant sample". Many of the effects have even lower than 50% of significant samples. It is a matter of opinion what counts as robust, but I think this, combined with the overall variable nature of the effects across frequency bands, ears, etc., leaves the impression that the effects are not very robust. I think the authors state it correctly in the last sentence of the first paragraph of the discussion: "At the same time the prevalence of significant effects in random samples of participants were mostly below 50%, raising questions as to the ubiquity of such effects." I think the authors should update the abstract in this regard so that readers who only read the abstract do not get the wrong impression about the robustness of the effects. It is not clear to me that, if the same study (using the same conditions) were done in a different lab, the results would come out similarly to those reported here.

    4. Reviewer #3 (Public review):

      Summary:

      The finding of rhythmic activity in the brain has for a long time engendered the theory of rhythmic modes of perception, that humans might oscillate between improved and worse perception depending on states of our internal systems. However, experiments looking for such modes have resulted in conflicting findings, particularly in those where the stimulus itself is not rhythmic. This paper seeks to take a comprehensive look at the effect and various experimental parameters which might generate these competing findings: in particular, the presentation of the stimulus to one ear or the other, the relevance of motor involvement, attentional demands, and memory, each of which is revealed to affect the consistency of this rhythmicity.

      The need the paper attempts to resolve is a critical one for the field. However, as presented, I remain unconvinced that the data would not be better interpreted as showing no consistent rhythmic mode effect.

      Strengths:

      The paper is strong in its experimental protocol and its comprehensive analysis, which seeks to compare effects across several analysis types and slight experiment changes to investigate which parameters could affect the presence or absence of an effect of rhythmicity. The prescribed nature of its hypotheses and the manner in which it sets out to test them are very clear, which allows for a straightforward assessment of its results.

      Weaknesses:

      The papers cited to justify a rhythmic mode are largely based on the processing of rhythmic stimuli. The authors assume the rhythmic mode to be the general default, but it's not so clear to me why this would be so. The task design seems better suited to a continuous vigilance mode task.

      Secondly, the analysis to detect a "rhythmic mode" assumes a total phase reset at noise onset, which is highly implausible given standard nonlinear dynamical analysis of oscillator performance. It's not clear that a rhythmic mode (should it be applied in this task) would indeed generate a consistent phase as the analysis searches for.

      Thirdly, the number of statistical tests used here makes trusting any single effect quite difficult, and very few of the effects replicate more than once. I think the data would be better interpreted as not confirming evidence for rhythmic mode processing in the ears.

      Comments on revised version:

      No further comments. The paper has many of the same issues that I expressed in the initial review, but I don't think they can be addressed without a replication study, which I appreciate is not always plausible.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This paper presents results from four independent experiments, each of which tests for rhythmicity in auditory perception. The authors report rhythmic fluctuations in discrimination performance at frequencies between 2 and 6 Hz. The exact frequency depends on the ear and experimental paradigm, although some frequencies seem to be more common than others.

      Strengths:

      The first sentence in the abstract describes the state of the art perfectly: "Numerous studies advocate for a rhythmic mode of perception; however, the evidence in the context of auditory perception remains inconsistent". This is precisely why the data from the present study is so valuable. This is probably the study with the highest sample size (total of > 100 in 4 experiments) in the field. The analysis is very thorough and transparent, due to the comparison of several statistical approaches and simulations of their sensitivity. Each of the experiments differs from the others in a clearly defined experimental parameter, and the authors test how this impacts auditory rhythmicity, measured in pitch discrimination performance (accuracy, sensitivity, bias) of a target presented at various delays after noise onset.

      Weaknesses:

      (1) The authors find that the frequency of auditory perception changes between experiments. I think they could exploit differences between experiments better to interpret and understand the obtained results. These differences are very well described in the Introduction, but don't seem to be used for the interpretation of results. For instance, what does it mean if perceptual frequency changes from between- to within-trial pitch discrimination? Why did the authors choose this experimental manipulation? Based on differences between experiments, is there any systematic pattern in the results that allows conclusions about the roles of different frequencies? I think the Discussion would benefit from an extension to cover this aspect.

      We believe that interpreting these differences remains difficult and a precise, detailed (and possibly mechanistic) interpretation is beyond the goal of the present study. The main goal of this study was to explore the consistency and variability of effects across variations of the experimental design and samples of participants. Interpreting specific effects, e.g. at particular frequencies, would make sense mostly if differences between experiments have been confirmed in a separate reproduction. Still, we do provide specific arguments for why differences in the outcome between different experiments, e.g. with and without explicit trial initialization by the participants, could be expected. See lines 91ff in the introduction and 786ff in the discussion.

      (2) The Results give the impression of clear-cut differences in relevant frequencies between experiments (e.g., 2 Hz in Experiment 1, 6 Hz in Exp 2, etc), but they might not be so different. For instance, a 6 Hz effect is also visible in Experiment 1, but it just does not reach conventional significance. The average across the three experiments is therefore very useful, and also seems to suggest that differences between experiments are not very pronounced (otherwise the average would not produce clear peaks in the spectrum). I suggest making this point clearer in the text.

      We have revised the conclusions to note that the present data do not support clear-cut differences between experiments. For this reason we also refrain from detailed interpretations of specific effects, as suggested by this reviewer in point 1 above.

      (3) I struggle to understand the hypothesis that rhythmic sampling differs between ears. In most everyday scenarios, the same sounds arrive at both ears, and the time difference between the two is too small to play a role for the frequencies tested. If both ears operate at different frequencies, the effects of the rhythm on overall perception would then often cancel out. But if this is the case, why would the two ears have different rhythms to begin with? This could be described in more detail.

      This hypothesis was not invented by us, but in essence put forward in previous work. The study by Ho et al. CurrBiol 2017 has reported rhythmic effects at different frequencies in the left and right ears, and we here tried to reproduce these effects. One could speculate about an ear-difference based on studies reporting a right-ear advantage in specific listening tasks, and the idea that different time scales of rhythmic brain activity may specifically prevail in the left and right cortical hemispheres; hence it does not seem improbable that there could be rhythmic effects in both ears at different frequencies. We note this in the introduction, l. 65ff.

      Reviewer #2 (Public review):

      Summary:

      The current study aims to shed light on why previous work on perceptual rhythmicity has led to inconsistent results. They propose that the differences may stem from conceptual and methodological issues. In a series of experiments, the current study reports perceptual rhythmicity in different frequency bands that differ between different ear stimulations and behavioral measures.

      The study suggests challenges regarding the idea of universal perceptual rhythmicity in hearing.

      Strengths:

      The study aims to address differences observed in previous studies about perceptual rhythmicity. This is important and timely because the existing literature provides quite inconsistent findings. Several experiments were conducted to assess perceptual rhythmicity in hearing from different angles. The authors use sophisticated approaches to address the research questions.

      Weaknesses:

      (1) Conceptual concerns:

      The authors place their research in the context of a rhythmic mode of perception. They also discuss continuous vs rhythmic mode processing. Their study further follows a design that seems to be based on paradigms that assume a recent phase in neural oscillations that subsequently influence perception (e.g., Fiebelkorn et al.; Landau & Fries). In my view, these are different facets in the neural oscillation research space that require a bit more nuanced separation. Continuous mode processing is associated with vigilance tasks (work by Schroeder and Lakatos; reduction of low-frequency oscillations and sustained gamma activity), whereas the authors of this study seem to link it to hearing tasks specifically (e.g., line 694). Rhythmic mode processing is associated with rhythmic stimulation by which neural oscillations entrain and influence perception (also, Schroeder and Lakatos; greater low-frequency fluctuations and more rhythmic gamma activity). The current study mirrors the continuous rather than the rhythmic mode (i.e., there was no rhythmic stimulation), but even the former seems not fully fitting, because trials are 1.8 s short and do not really reflect a vigilance task. Finally, previous paradigms on phase-resetting reflect more closely the design of the current study (i.e., different times of a target stimulus relative to the reset of an oscillation). This is the work by Fiebelkorn et al., Landau & Fries, and others, which do not seem to be cited here, which I find surprising. Moreover, the authors would want to discuss the role of the background noise in resetting the phase of an oscillation, and the role of the fixation cross also possibly resetting the phase of an oscillation. Regardless, the conceptual mixture of all these facets makes interpretations really challenging. The phase-reset nature of the paradigm is not (or not well) explained, and the discussion mixes the different concepts and approaches. I recommend that the authors frame their work more clearly in the context of these different concepts (affecting large portions of the manuscript).

      Indeed, the paradigms used here and in many similar previous studies incorporate an aspect of phase-resetting, as the presentation of a background noise may effectively reset ongoing auditory cortical processes. Studies trying to probe for rhythmicity in auditory perception in the absence of any background noise have not shown any effect (Zoefel and Heil, 2013), perhaps because the necessary rhythmic processes along auditory pathways are only engaged when some sound is present. We now discuss these points, and also acknowledge the mentioned studies in the visual system; l. 57.

      (2) Methodological concerns:

      The authors use a relatively unorthodox approach to statistical testing. I understand that they try to capture and characterize the sensitivity of the different analysis approaches to rhythmic behavioral effects. However, it is a bit unclear what meaningful effects are in the study. For example, the bootstrapping approach that identifies the percentage of significant variations of sample selections is rather descriptive (Figures 5-7). The authors seem to suggest that 50% of the samples are meaningful (given the dashed line in the figure), even though this is rarely reached in any of the analyses. Perhaps >80% of samples should show a significant effect to be meaningful (at least to my subjective mind). To me, the low percentage rather suggests that there is not too much meaningful rhythmicity present. 

      We note that there is no clear consensus on what fraction of experiments should be expected or how this way of quantifying effects should be precisely valued (l. 441ff). However, we now also clearly acknowledge in the discussion that the effective prevalence is not very high (l. 663).

      I suggest that the authors also present more traditional, perhaps multi-level, analyses: Calculation of spectra, binning, or single-trial analysis for each participant and condition, and the respective calculation of the surrogate data analysis, and then comparison of the surrogate data to the original data on the second (participant) level using t-tests. I also thought the statistical approach undertaken here could have been a bit more clearly/didactically described as well.

      We realize that our description of the methods was possibly not fully clear. We do follow the strategy suggested by this reviewer, but rather than comparing actual and surrogate data with a parametric t-test, we compare them using a non-parametric percentile-based approach. This has the advantage of not making specific (and possibly unwarranted) assumptions about the distribution of the data. We have revised the methods to clarify this, l. 332ff.

      The authors used an adaptive procedure during the experimental blocks such that the stimulus intensity was adjusted throughout. In practice, this can be a disadvantage relative to keeping the intensity constant throughout, because, on average, correct trials will be associated with a higher intensity than incorrect trials, potentially making observations of perceptual rhythmicity more challenging. The authors would want to discuss this potential issue. Intensity adjustments could perhaps contribute to the observed rhythmicity effects. Perhaps the rhythmicity of the stimulus intensity could be analyzed as well. In any case, the adaptive procedure may add variance to the data.

      We have added an analysis of task difficulty to the results (new section “Effects of adaptive task difficulty”) to address this. Overall, we do not find systematic changes in task difficulty across participants for most of the experiments, but one certainly cannot rule out that this aspect of the design also affects the outcomes. Importantly, we relied on an adaptive task difficulty to actually (or hopefully) reduce variance in the data, by keeping the task difficulty around a certain level. Given the large number of trials collected, not using such an adaptive procedure may result in performance levels around chance or near ceiling, which would make it impossible to detect rhythmic variations in behavior.

      Additional methodological concerns relate to Figure 8. Figures 8A and C seem to indicate that a baseline correction for a very short time window was calculated (I could not find anything about this in the methods section). The data seem very variable and artificially constrained in the baseline time window. It was unclear what the reader might take from Figure 8.

      This figure was intended mostly for illustration of the eye tracking data, but we agree that there is no specific key insight to be taken from this. We removed this. 

      Motivation and discussion of eye-movement/pupillometry and motor activity: The dual task paradigm of Experiment 4 and the reasons for assessing eye metrics in the current study could have been better motivated. The experiment somehow does not fit in very well. There is recent evidence that eye movements decrease during effortful tasks (e.g., Contadini-Wright et al. 2023 J Neurosci; Herrmann & Ryan 2024 J Cog Neurosci), which appears to contradict the results presented in the current study. Moreover, by appealing to active sensing frameworks, the authors suggest that active movements can facilitate listening outcomes (line 677; they should provide a reference for this claim), but it is unclear how this would relate to eye movements. Certainly, a person may move their head closer to a sound source in the presence of competing sound to increase the signal-to-noise ratio, but this is not really the active movements that are measured here. A more detailed discussion may be important. The authors further frame the difference between Experiments 1 and 2 as being related to participants' motor activity. However, there are other factors that could explain differences between experiments. Self-paced trials give participants the opportunity to rest more (inter-trial durations were likely longer in Experiment 2), perhaps affecting attentional engagement. I think a more nuanced discussion may be warranted.

      We expanded the motivation of why self-pacing trials may effectively alter how rhythmic processes affect perception, and now also allude to attention and expectation related effects (l. 786ff). Regarding eye movements we now discuss the results in the light of the previously mentioned studies, but again refrain from a very detailed and mechanistic interpretation (l. 782).

      Discussion:

      The main data in Figure 3 showed little rhythmicity. The authors seem to gloss over this fact by simply stating that the same phase is not necessary for their statistical analysis. Previous work, however, showed rhythmicity in the across-participant average (e.g., Fiebelkorn's and similar work). Moreover, one would expect that some of the effects in the low-frequency band (e.g., 2-4 Hz) are somewhat similar across participants. Conduction delays in the auditory system are much smaller than the 0.25-0.5 s associated with 2-4 Hz. The authors would want to discuss why different participants would express so vastly different phases that the across-participant average does not show any rhythmicity, and what this would mean neurophysiologically.

      We now discuss the assumptions and implications of similar or distinct phases of rhythmic processes within and between participants (l. 695ff). In particular, we note that different origins of the underlying neurophysiological processes may eventually suggest that such assumptions are or are not warranted.

      An additional point that may require more nuanced discussion is related to the rhythmicity of response bias versus sensitivity. The authors could discuss what the rhythmicity of these different measures in different frequency bands means, with respect to underlying neural oscillations.

      We expanded the discussion to interpret what rhythmic changes in each of the behavioral metrics could imply (l. 706ff).

      Figures:

      Much of the text in the figures seems really small. Perhaps the authors would want to ensure it is readable even for those with low vision abilities. Moreover, Figure 1A is not as intuitive as it could be and may perhaps be made clearer. I also suggest the authors discuss a bit more the potential monaural vs binaural issues, because the perceptual rhythmicity is much slower than any conduction delays in the auditory system that could lead to interference.

      We tried to improve the font sizes where possible, and discuss the potential monaural origins as suggested by other reviewers. 

      Reviewer #3 (Public review):

      Summary:

      The finding of rhythmic activity in the brain has, for a long time, engendered the theory of rhythmic modes of perception, that humans might oscillate between improved and worse perception depending on states of our internal systems. However, experiments looking for such modes have resulted in conflicting findings, particularly in those where the stimulus itself is not rhythmic. This paper seeks to take a comprehensive look at the effect and various experimental parameters which might generate these competing findings: in particular, the presentation of the stimulus to one ear or the other, the relevance of motor involvement, attentional demands, and memory, each of which is revealed to affect the consistency of this rhythmicity.

      The need the paper attempts to resolve is a critical one for the field. However, as presented, I remain unconvinced that the data would not be better interpreted as showing no consistent rhythmic mode effect. It lacks a conceptual framework to understand why effects might be consistent in each ear but at different frequencies and only for some tasks with slight variants, some affecting sensitivity and some affecting bias.

      Strengths:

      The paper is strong in its experimental protocol and its comprehensive analysis, which seeks to compare effects across several analysis types and slight experiment changes to investigate which parameters could affect the presence or absence of an effect of rhythmicity. The prescribed nature of its hypotheses and its manner of setting out to test them is very clear, which allows for a straightforward assessment of its results.

      Weaknesses:

      There is a weakness throughout the paper in terms of establishing a conceptual framework both for the source of "rhythmic modes" and for the interpretation of the results. Before understanding the data on this matter, it would be useful to discuss why one would posit such a theory to begin with. From a perceptual side, rhythmic modes of processing in the absence of rhythmic stimuli would not appear to provide any benefit to processing. From a biological or homeostatic argument, it's unclear why we would expect such fluctuations to occur in such a narrow-band way when neither the stimulus nor the neurobiological circuits require it.

      We believe that the framework for why there may be rhythmic activity along auditory pathways that shapes behavioral outcomes has been laid out in many previous studies, prominently here (Schroeder et al., 2008; Schroeder and Lakatos, 2009; Obleser and Kayser, 2019). Many of the relevant studies are cited in the introduction, which is already rather long given the many points covered in this study. 

      Secondly, for the analysis to detect a "rhythmic mode", it must assume that the phase of fluctuations across an experiment (i.e., whether fluctuations are in an up-state or down-state at onset) is constant at stimulus onset, whereas most oscillations do not have such a total phase-reset as a result of input. Therefore, some theoretical positing of what kind of mechanism could generate this fluctuation is critical toward understanding whether the analysis is well-suited to the studied mechanism.

      In line with this and previous comments (by reviewer 2) we have expanded the discussion to consider the issue of phase alignment (l. 695ff). 

      Thirdly, an interpretation of why we should expect left and right ears to have distinct frequency ranges of fluctuations is required. There are a large number of statistical tests in this paper, and it's not clear how multiple comparisons are controlled for, apart from experiment 4 (which specifies B&H false discovery rate). As such, one critical method to identify whether the results are not the result of noise or sample-specific biases is the plausibility of the finding. On its face, maintaining distinct frequencies of perception in each ear does not fit an obvious conceptual framework.

      Again this point was also noted by another reviewer and we expanded the introduction and discussion in this regard (l. 65ff).

      Reviewer #1 (Recommendations for the authors):

      (1) An update of the AR-surrogate method has recently been published (https://doi.org/10.1101/2024.08.22.609278). I appreciate that this is a lot of work, and it is of course up to the authors, but given the higher sensitivity of this method, it might be worth applying it to the four datasets described here.

      Reading this article, we note that our implementation of the AR-surrogate method was essentially as suggested here, and not as implemented by Brookshire. In fact, we had not realized that Brookshire had apparently computed the spectrum based on the group-average data. As explained in the Methods section, which we have now clarified further, we compute for each participant the actual spectrum of this participant’s data and a set of surrogate spectra. We then take the group average of both, and compute the p-value of the actual group average based on its percentile within the distribution of surrogate averages. This second step differs from Harris & Beale, who used a one-sided t-test. The latter is most likely not appropriate in a strict statistical sense, but possibly more powerful for detecting true results compared to the percentile-based approach that we used (see l. 332ff).
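
      For illustration only, the group-level percentile test described in this response can be sketched as below. This is not the authors' code: the function name, the input layout, and the "+1" correction on the p-value (a common convention for Monte Carlo tests) are assumptions, and the per-participant spectra and AR surrogates are taken as already computed.

```python
import random


def group_percentile_pvalues(actual, surrogates):
    """Percentile-based p-value per frequency for a group-average spectrum.

    actual:     actual[p][f]        spectrum of participant p at frequency f
    surrogates: surrogates[p][k][f] k-th surrogate spectrum of participant p
    (hypothetical layout, chosen for this sketch)
    """
    n_part = len(actual)
    n_freq = len(actual[0])
    n_surr = len(surrogates[0])

    # Group average of the actual spectra.
    actual_mean = [sum(actual[p][f] for p in range(n_part)) / n_part
                   for f in range(n_freq)]

    # Group average of the k-th surrogate across participants, for every k.
    surr_means = [[sum(surrogates[p][k][f] for p in range(n_part)) / n_part
                   for f in range(n_freq)]
                  for k in range(n_surr)]

    pvals = []
    for f in range(n_freq):
        # Count surrogate group averages at least as large as the actual one.
        n_ge = sum(1 for k in range(n_surr)
                   if surr_means[k][f] >= actual_mean[f])
        # Assumed convention: +1 in numerator and denominator avoids p = 0.
        pvals.append((n_ge + 1) / (n_surr + 1))
    return pvals


if __name__ == "__main__":
    rng = random.Random(0)
    n_part, n_surr, n_freq = 10, 199, 5
    # Synthetic stand-ins for real spectra: uniform noise in [0, 1).
    surrogates = [[[rng.random() for _ in range(n_freq)]
                   for _ in range(n_surr)] for _ in range(n_part)]
    actual = [[rng.random() for _ in range(n_freq)] for _ in range(n_part)]
    for p in range(n_part):
        actual[p][2] += 1.0  # inject a peak at the third frequency
    print(group_percentile_pvalues(actual, surrogates))
```

      Unlike a one-sided t-test across participants, this comparison makes no assumption about the shape of the null distribution; its resolution is limited by the number of surrogates (here the smallest attainable p-value is 1/200).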

      (2) When results for the four experiments are reported, a reminder for the reader of how these experiments differ from each other would be useful.

      We have added this in the Results section.

      "considerable prevalence of differences around 4Hz, with dual‐task requirements leading to stronger rhythmicity in perceptual sensitivity". There is a striking similarity to recently published data (https://doi.org/10.1101/2024.08.10.607439) demonstrating a 4-Hz rhythm in auditory divided attention (rather than between modalities as in the present case). This could be a useful addition to the paragraph.

      We have added a reference to this preprint, and additional previous work pointing in the same direction mentioned in there.  

      (3) There are two typos in the Introduction: "related by different from the question", and below, there is one "presented" too much.

      These have been fixed.

      Reviewer #3 (Recommendations for the authors):

      My major suggestion is that these results must be replicated in a new sample. I understand this is not simple to do and not always possible, but at this point, no effect is replicated from one experiment to the next, despite very small changes in protocol (especially experiment 1 vs 2). It's therefore very difficult to justify explaining the different effects as real as opposed to random effects of this particular sample. While the bootstrapping effects show the level of consistency of the effect within the sample studied, it can not be a substitute for a true replication of the results in a new sample.

      We agree that only an independent replication can demonstrate the robustness of the results. We do consider experiment 1 a replication test of Ho et al. CurrBiol 2017, which yields different results from those reported there. But more importantly, we consider the analysis of ‘reproducibility’ by simulating participant samples a key novelty of the present work, and want to emphasize this over the within-study replication of the same experiment. In fact, in light of the present interpretation of the data, even a within-study replication would most likely not offer a clear-cut answer.

      As I said in the public review, the interpretation of the results, and of why perceptual cycles in arhythmic stimuli could be a plausible theory to begin with, is lacking. A conceptual framework would vastly improve the impact and understanding of the results.

      We tried to strengthen the conceptual framework in the introduction. We believe that this is in large part provided by previous work, and the aim of the present study was to explore the robustness of effects and not to suggest and discover novel effects.

      Minor comments:

      (1) The authors adapt the difficulty as a function of performance, which seems to me a strange choice for an experiment that is analyzing the differences in performance across the experiment. Could you add a sentence to discuss the motivation for this choice?

      We now mention the rationale in the Methods section and in a new section of the Results. There we also provide additional analyses on this parameter.

      (2) The choice to plot the p-values as opposed to the values of the actual analysis feels ill-advised to me. It invites comparison across analyses that isn't necessarily fair. It would be more informative to plot the respective analysis outputs (spectral power, regression, or delta R2) and highlight the windows of significance and their overlap across analyses. In my opinion, this would be more fair and accurate depiction of the analyses as they are meant to be used.

      We do disagree. As explained in the Methods (l. 374ff): “(Showing p-values) … allows presenting the results on a scale that can be directly compared between analysis approaches, metrics, frequencies and analyses focusing on individual ears or the combined data. Each approach has a different statistical sensitivity, and the underlying effect sizes (e.g. spectral power) vary with frequency for both the actual data and null distribution. As a result, the effect size reaching statistical significance varies with frequency, metrics and analyses.” 

      The fact that the level of power (or R2 or whatever metric we consider) required to reach significance differs between analyses (one ear, both ears), metrics (d-prime, bias, RT) and analysis approaches makes showing the results difficult, as we would need a separate panel for each of those. This would multiply the number of panels required e.g. for Figure 4 by 3, making it a figure with 81 axes. Also, neither the original quantities of each analysis (e.g. spectral power) nor the p-values that we show constitute a proper measure of effect size in a statistical sense. In that sense, neither of these is truly ideal for comparing between analyses, metrics etc.

      We do agree, though, that many readers may want to see the original quantification and thresholds for statistical significance. We now show these in an exemplary manner for the Binned analysis of Experiment 1, which provides a positive result and also is an attempt to replicate the findings by Ho et al 2017. This is shown in new Figure 5.

      (3) Typo in line 555 (+ should be plus minus).

      (4) Typo in line 572: "Comparison of 572 blocks with minus dual task those without"

      (5) Typo in line 616: remove "one".

      (6) Line 666 refers to effects in alpha band activity, but it's unclear what the relationship is to the authors' findings, which peak around 6 Hz, lower than alpha (~10 Hz).

      (7) Line 688 typo, remove "amount of".

      These points have been addressed.  

      (8) Oculomotor effect that drives greater rhythmicity at 3-4 Hz. Did the authors analyze the eye movements to see if saccades were also occurring at this rate? It would be useful to know if the 3-4 Hz effect is driven by "internal circuitry" in the auditory system or by the typical rate of eye movement.

      A preliminary analysis of eye movement data was in previous Figure 8, which was removed on the recommendation of another reviewer. This showed that the average saccade rate is about 0.01 saccades per trial per time bin, amounting on average to less than one detected saccade per trial. Hence rhythmicity in saccades is unlikely to explain rhythmicity in behavioral data at the scale of 3-4 Hz. We now note this in the Results.

      Obleser J, Kayser C (2019) Neural Entrainment and Attentional Selection in the Listening Brain. Trends Cogn Sci 23:913-926.

      Schroeder CE, Lakatos P (2009) Low-frequency neuronal oscillations as instruments of sensory selection. Trends Neurosci 32:9-18.

      Schroeder CE, Lakatos P, Kajikawa Y, Partan S, Puce A (2008) Neuronal oscillations and visual amplification of speech. Trends Cogn Sci 12:106-113.

      Zoefel B, Heil P (2013) Detection of Near-Threshold Sounds is Independent of EEG Phase in Common Frequency Bands. Front Psychol 4:262.

    1. eLife Assessment

      The reviewers have found that this manuscript is a valuable contribution, and the evidence in support of its conclusions is mostly solid. It provides novel insights and raises interesting possibilities about the functions of an understudied histone modification within the nucleosome core; however, the data are mostly descriptive and correlative, and although this has value, it is not totally persuasive. Short of additional non-genomic experiments, a more detailed analysis of the genomic data and perhaps additional data would strengthen the conclusions. The manuscript crucially needs further antibody validation to raise confidence in the data.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigate the role of H3K115ac in mouse embryonic stem cells. They report that H3K115ac localizes to regions enriched for fragile nucleosomes, CpG islands, and enhancers, and that it correlates with transcriptional activity. These findings suggest a potential role for this globular domain modification in nucleosome dynamics and gene regulation. If robust, these observations would expand our understanding of how non-tail histone modifications contribute to chromatin accessibility and transcriptional control.

      Strengths:

      (1) The study addresses a histone PTM in the globular domain, which is relatively unexplored compared to tail modifications.

      (2) The implication of a histone PTM in fragile nucleosome localization is novel and, if substantiated, could represent a significant advance for the field.

      Weaknesses:

      (1) The absence of replicate paired-end datasets limits confidence in peak localization.

      (2) The analyses are primarily correlative, making it difficult to fully assess robustness or to support strong mechanistic conclusions.

      (3) Some claims (e.g., specificity for CpG islands, "dynamic" regulation during differentiation) are not fully supported by the analyses as presented.

      (4) Overall, the study introduces an intriguing new angle on globular PTMs, but additional rigor and mechanistic evidence are needed to substantiate the conclusions.

    3. Reviewer #2 (Public review):

      Summary:

      Kumar et al. aimed to assess the role of the understudied H3K115 acetylation mark, which is located in the nucleosomal core. To this end, the authors performed ChIP-seq experiments of H3K115ac in mouse embryonic stem cells as well as during differentiation into neuronal progenitor cells. Subsequent bioinformatic analyses revealed an association of H3K115ac with fragile nucleosomes at CpG island promoters, as well as with enhancers and CTCF binding sites. This is an interesting study, which provides important novel insights into the potential function of H3K115ac. However, the study is mainly descriptive, and functional experiments are missing.

      Strengths:

      (1) The authors present the first genome-wide profiling of H3K115ac and link this poorly characterized modification to fragile nucleosomes, CpG island promoters, enhancers, and CTCF binding sites.

      (2) The study provides a valuable descriptive resource and raises intriguing hypotheses about the role of H3K115ac in chromatin regulation.

      (3) The breadth of the bioinformatic analyses adds to the value of the dataset.

      Weaknesses:

      (1) I am not fully convinced about the specificity of the antibody. Although the experiment in Figure S1A shows a specific binding to H3K115ac-modified peptides compared to unmodified peptides, the authors do not show any experiment that shows that the antibody does not bind to unrelated proteins. Thus, a Western of a nuclear extract or the chromatin fraction would be critical to show. Also, peptide competition using the H3K115ac peptide to block the antibody may be good to further support the specificity of the antibody. Also, I don't understand the experiment in Figure S1B. What does it tell us when the H3K115ac histone mark itself is missing? The KLF4 promoter does not appear to be a suitable positive control, given that hundreds of proteins/histone modifications are likely present at this region.

      It is important to clearly demonstrate that the antibody exclusively recognizes H3K115ac, given that the conclusion of the manuscript strongly depends on the reliability of the obtained ChIP-Seq data.

      (2) The association of H3K115ac with fragile nucleosomes is based on MNase sensitivity and fragment length, which are indirect methods and can have technical bias. Experiments that support that the H3K115ac-modified nucleosomes are indeed more fragile are missing.

      (3) The comparison of H3K115ac with H3K122ac and H3K64ac relies on publicly available datasets. Since the authors argue that these marks are distinct, data generated under identical experimental conditions would be more convincing. At a minimum, the limitations of using external datasets should be discussed.

      (4) The enrichment of H3K115ac at enhancers and CTCF binding sites is notable but remains descriptive. It would be interesting to clarify whether H3K115ac actively influences transcription factor/CTCF binding or is a downstream correlate.

      (5) No information is provided about how H3K115ac may be deposited/removed. Without this information, it is difficult to place this modification into established chromatin regulatory pathways.

      At the very least, the authors should acknowledge these limitations and provide additional validation of antibody specificity.

    4. Reviewer #3 (Public review):

      Summary:

      Kumar et al. examine the H3K115 epigenetic mark located on the lateral surface of the histone core domain and present evidence that it may serve as a marker enriched at transcription start sites (TSSs) of active CpG island promoters and at polycomb-repressed promoters. They also note that the H3K115ac mark is enriched on fragile nucleosomes within nucleosome-depleted regions, on active enhancers, and at CTCF-bound sites. They propose that these observations suggest that H3K115ac contributes to nucleosome destabilization and so may serve as a marker of functionally important regulatory elements in mammalian genomes.

      Strengths:

      The authors present novel observations suggesting that acetylation of a histone residue in a core (versus on a histone tail) domain may serve a functional role in promoting transcription, in CpG islands and polycomb-repressed promoters. They present a solid amount of confirmatory in silico data using appropriate methodology that supports the idea that the H3K115ac mark may function to destabilize nucleosomes and contribute to regulating ESC differentiation.

      Weaknesses:

      Additional experiments to confirm antibody specificity are needed. The authors use synthetic peptides for other markers (e.g., H3K122) to support the claim that the antibody is specific, but ChIP-ChIP assays are performed under cross-linked, non-denatured conditions, which preserve structure and epitope accessibility differently than synthetic peptides used for dot blots. Does the antibody give a single band in western blots of histones, and can the H3K115ac peptide block western and immunofluorescence signals of the antibody? Given that the antibody is a rabbit polyclonal, specificity is not a trivial consideration.

    1. eLife Assessment

      This important study establishes bathy phytochromes, a unique class of bacterial photoreceptors that respond to near-infrared light (NIR), as versatile tools for bacterial optogenetics. NIR light is a key control signal in optogenetics due to its deep tissue penetration and the ability to combine with existing red- and blue-light sensitive systems, but thus far, NIR-activated proteins have been poorly characterized. The strength of evidence is convincing, with comprehensive in vitro characterization, modular design strategies, and validation across different hosts, supporting the versatility and potential for these tools in biotechnological applications. This study should advance the fields of optogenetics and photobiology and inspire future work.

    2. Reviewer #1 (Public review):

      Summary:

      This is an interesting study characterizing and engineering so-called bathy phytochromes, i.e. those that respond to near infrared (NIR) light in the ground state, for optogenetic control of bacterial gene expression. Previously, the authors have developed a structure-guided approach to functionally link several light responsive protein domains to the signaling domain of the histidine kinase FixL, which ultimately controls gene expression. Here, the authors use the same strategy to link bathy phytochrome light responsive domains to FixL, resulting in sensors of NIR light. Interestingly, they also link these bathy phytochrome light sensing domains to signaling domains from the tetrathionate-sensing SHK TtrS and the toluene-sensing SHK TodS, demonstrating generality of their protein engineering approach more broadly across bacterial two-component systems.

      This is an exciting result that should inspire future bacterial sensor design. The authors go on to leverage this result to develop what is, to my knowledge, the first system for orthogonally controlling the expression of two separate genes in the same cell with NIR and Red light, a valuable contribution to the field.

      Finally, the authors reveal new details of the pH-dependent photocycle of bathy phytochromes and demonstrate their sensors work in the gut- and plant-relevant strains E. coli Nissle 1917 and A. tumefaciens.

      Strengths:

      The experiments are well founded, well executed, and rigorous.

      The manuscript is clearly written.

      The sensors developed exhibit large responses to light, making them valuable tools for optogenetic applications.

      This study is a valuable contribution to photobiology and optogenetics.

      Weaknesses:

      As the authors note, the sensors are relatively insensitive to NIR light due to the rapid dark reversion process in bathy phytochromes. Though NIR light is generally non-phototoxic, one would expect this characteristic to be a limitation in some downstream applications where light intensities are not high (e.g. in vivo).

      Though they can be multiplexed with Red light sensors, these bathy phytochrome NIR sensors are more difficult to multiplex with other commonly used light sensors (e.g. blue) due to the broad light responsivity of the Pfr state. This challenge may be overcome by careful dosing of blue light, as the authors discuss, but other bacterial NIR sensing systems with less cross-talk may be preferred in some applications.

      Comments on revisions:

      My concerns have been addressed.

    3. Reviewer #2 (Public review):

      In this manuscript, Meier et al. engineer a new class of light-regulated two-component systems. These systems are built using bathy-bacteriophytochromes that respond to near-infrared (NIR) light. Through a combination of genetic engineering and systematic linker optimization, the authors generate bacterial strains capable of selective and tunable gene expression in response to NIR stimulation. Overall, these results are an interesting expansion of the optogenetic toolkit into the NIR range. The cross-species functionality of the system, modularity, and orthogonality have the potential to make these tools useful for a range of applications.

      Strengths:

      (1) The authors introduce a novel class of near-infrared light-responsive two-component systems in bacteria, expanding the optogenetic toolbox into this spectral range.

      (2) Through engineering and linker optimization, the authors achieve specific and tunable gene expression, with minimal cross-activation from red light in some cases.

      (3) The authors show that the engineered systems function robustly in multiple bacterial strains, including laboratory E. coli, the probiotic E. coli Nissle 1917, and Agrobacterium tumefaciens.

      (4) The combination of orthogonal two-component systems can allow for simultaneous and independent control of multiple gene expression pathways using different wavelengths of light.

      (5) The authors explore the photophysical properties of the photosensors, investigating how environmental factors such as pH influence light sensitivity.

      Comments on revisions:

      The authors have addressed all my prior concerns.

    4. Reviewer #3 (Public review):

      Summary:

      This paper by Meier et al. introduces a new optogenetic module for regulation of bacterial gene expression based on "bathy-BphP" proteins. Their paper begins with a careful characterization of kinetics and pH dependence of a few family members, followed by extensive engineering to produce infrared-regulated transcriptional systems based on the authors' previous design of the pDusk and pDERusk systems, and closing with characterization of the systems in bacterial species relevant for biotechnology.

      Strengths:

      The paper is important from the perspective of fundamental protein characterization, since bathy-BphPs are relatively poorly characterized compared to their phytochrome and cyanobacteriochrome cousins. It is also important from a technology development perspective: the optogenetic toolbox currently lacks infrared-stimulated transcriptional systems. Infrared light offers two major advantages: it can be multiplexed with additional tools, and it can penetrate into deep tissues with ease relative to the more widely used blue light activated systems. The experiments are performed carefully and the manuscript is well written.

      Weaknesses:

      Some of the light-inducible responses described in this compelling paper are complex and difficult to rationalize, such as the dependence of light responses on linker length and differences in responses observed from the bathy-BphPs in isolation versus strains in which they are multiplexed. Nevertheless, the authors should be commended for carrying out rigorous experiments and reporting these results accurately. These are minor weaknesses in an overall very strong paper.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This is an interesting study characterizing and engineering so-called bathy phytochromes, i.e., those that respond to near infrared (NIR) light in the ground state, for optogenetic control of bacterial gene expression. Previously, the authors have developed a structure-guided approach to functionally link several light-responsive protein domains to the signaling domain of the histidine kinase FixL, which ultimately controls gene expression. Here, the authors use the same strategy to link bathy phytochrome light-responsive domains to FixL, resulting in sensors of NIR light. Interestingly, they also link these bathy phytochrome light-sensing domains to signaling domains from the tetrathionate-sensing SHK TtrS and the toluene-sensing SHK TodS, demonstrating the generality of their protein engineering approach more broadly across bacterial two-component systems.

      This is an exciting result that should inspire future bacterial sensor design. They go on to leverage this result to develop what is, to my knowledge, the first system for orthogonally controlling the expression of two separate genes in the same cell with NIR and Red light, a valuable contribution to the field.

      Finally, the authors reveal new details of the pH-dependent photocycle of bathy phytochromes and demonstrate that their sensors work in the gut- and plant-relevant strains E. coli Nissle 1917 and A. tumefaciens.

      Strengths:

      (1) The experiments are well-founded, well-executed, and rigorous.

      (2) The manuscript is clearly written.

      (3) The sensors developed exhibit large responses to light, making them valuable tools for optogenetic applications.

      (4) This study is a valuable contribution to photobiology and optogenetics.

      We thank the reviewer for the positive verdict on our manuscript.

      Weaknesses:

      (1) As the authors note, the sensors are relatively insensitive to NIR light due to the rapid dark reversion process in bathy phytochromes. Though NIR light is generally non-phototoxic, one would expect this characteristic to be a limitation in some downstream applications where light intensities are not high (e.g., in vivo).

      We principally concur with this reviewer’s assessment that delivery of light (of any color) into living tissue can be severely limited by absorption, reflection, and scattering. That notwithstanding, at least two considerations suggest that in-vivo deployment of the pNIRusk setups we presently advance may be feasible.

      First, while the pNIRusk setups are indeed less light-sensitive than, e.g., our earlier red-light-responsive pREDusk and pDERusk setups (see Meier et al. Nat Commun 2024), we note that the overall light fluences required for triggering them are in the range of tens of µW per cm². By contrast, optogenetic experiments in vivo, in particular in the neurosciences, often employ light area intensities on the order of mW per cm² and above. Put another way, compared to the optogenetic tools used in these experiments, the pNIRusk setups are actually quite sensitive to light.

      Second, sensitivity to NIR light brings the advantage of superior tissue penetration; see the data reported by Weissleder (Nat Biotech 2001) and Ash et al. (Lasers Med Sci 2017), both of which are cited in our manuscript. Based on these data, the intensity of blue light (450 nm) falls off 5-10 times more strongly with penetration depth than that of NIR light (800 nm).

      We have added a brief treatment of these aspects in the Discussion section.
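
      The attenuation argument above can be made concrete with a minimal sketch. It assumes simple exponential (Beer-Lambert-like) decay of intensity with depth, which real, heterogeneous tissue only approximates. The function name and the NIR attenuation coefficient are hypothetical, illustrative values and are not taken from the manuscript or the cited papers; only the 5-10-fold ratio between blue and NIR attenuation comes from the text.

```python
import math

def transmitted_fraction(mu_per_mm: float, depth_mm: float) -> float:
    """Fraction of incident intensity left at a given depth, assuming exponential decay."""
    return math.exp(-mu_per_mm * depth_mm)

# Hypothetical effective attenuation coefficient for NIR light (~800 nm).
# Illustrative only; not a measured value.
MU_NIR = 0.2  # per mm

# Assumption: blue light (~450 nm) is attenuated 5 to 10 times as strongly as NIR.
MU_BLUE_LOW = 5 * MU_NIR
MU_BLUE_HIGH = 10 * MU_NIR

for depth in (1.0, 2.0, 5.0):
    nir = transmitted_fraction(MU_NIR, depth)
    blue_best = transmitted_fraction(MU_BLUE_LOW, depth)
    blue_worst = transmitted_fraction(MU_BLUE_HIGH, depth)
    print(f"depth {depth:.0f} mm: NIR {nir:.3f}, blue {blue_worst:.2e} to {blue_best:.2e}")
```

      Under this assumption, a fixed ratio of attenuation coefficients translates into a gap in transmitted intensity that widens rapidly with depth, which is the qualitative point of the argument.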

      (2) Though they can be multiplexed with Red light sensors, these bathy phytochrome NIR sensors are more difficult to multiplex with other commonly used light sensors (e.g., blue) due to the broad light responsivity of the Pfr state. This challenge may be overcome by careful dosing of blue light, as the authors discuss, but other bacterial NIR sensing systems with less cross-talk may be preferred in some applications.

      The reviewer is correct in noting that, at least to a certain extent, the pNIRusk systems also respond to blue light owing to their Soret absorbance bands (see Fig. 1). That said, we note two points:

      First, a given photoreceptor that preferentially responds to certain wavelengths, e.g., 700 nm in the case of conventional bacterial phytochromes (BphP), generally absorbs shorter wavelengths to some degree as well. Absorption of these shorter wavelengths suffices for driving electronic and/or vibronic transitions of the chromophore to higher energy levels which often give rise to productive photochemistry and downstream signal transduction. Put another way, a certain response of sensory photoreceptors to shorter wavelengths is hence fully expected and indeed experimentally borne out, as for instance shown by Ochoa-Fernandez et al. in the so-called PULSE setup (Nat Meth 2020, doi: 10.1038/s41592-020-0868-y).

      Second, known BphPs share similar Pr and Pfr absorbance spectra. We therefore expect other BphP-based optogenetic setups to also respond to blue light to some degree. Currently, there are insufficient data to gauge whether individual BphPs systematically differ in their relative sensitivity to blue compared to red or NIR light. Arguably, pertinent experiments may be an interesting subject for future study.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Meier et al. engineer a new class of light-regulated two-component systems. These systems are built using bathy-bacteriophytochromes that respond to near-infrared (NIR) light. Through a combination of genetic engineering and systematic linker optimization, the authors generate bacterial strains capable of selective and tunable gene expression in response to NIR stimulation. Overall, these results are an interesting expansion of the optogenetic toolkit into the NIR range. The cross-species functionality of the system, modularity, and orthogonality have the potential to make these tools useful for a range of applications.

      Strengths:

      (1) The authors introduce a novel class of near-infrared light-responsive two-component systems in bacteria, expanding the optogenetic toolbox into this spectral range.

      (2) Through engineering and linker optimization, the authors achieve specific and tunable gene expression, with minimal cross-activation from red light in some cases.

      (3) The authors show that the engineered systems function robustly in multiple bacterial strains, including laboratory E. coli, the probiotic E. coli Nissle 1917, and Agrobacterium tumefaciens.

      (4) The combination of orthogonal two-component systems can allow for simultaneous and independent control of multiple gene expression pathways using different wavelengths of light.

      (5) The authors explore the photophysical properties of the photosensors, investigating how environmental factors such as pH influence light sensitivity.

      Weaknesses:

      (1) The expression of multi-gene operons and fluorescent reporters could impose a metabolic burden. The authors should present data comparing optical density for growth curves of engineered strains versus the corresponding empty-vector control to provide insight into the burden and overall impact of the system on host viability and growth.

      In response to this comment, we have recorded growth kinetics of bacteria harboring the pNIRusk-DsRed plasmids or empty vectors under both inducing (i.e., under NIR light) and non-inducing conditions (i.e., darkness). We did not observe systematic differences in the growth kinetics between the different cultures, thus suggesting that under the conditions tested there is no adverse effect on cell viability.

      We include the new data in Suppl. Fig. 5c-d and refer to them in the main text.

      (2) The manuscript consistently presents normalized fluorescence values, but the method of normalization is not clear (Figure 2 caption describes normalizing to the maximal fluorescence, but the maximum fluorescence of what?). The authors should provide a more detailed explanation of how the raw fluorescence data were processed. In addition, or potentially in exchange for the current presentation, the authors should include the raw fluorescence values in supplementary materials to help readers assess the actual magnitude of the reported responses.

      We appreciate this valid comment and have altered the representation of the fluorescence data. All values for a given fluorescent protein (i.e., either DsRed or YPet) across all systems are now normalized to a single reference value, thus enabling direct comparison between experiments.
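
      A minimal sketch of this kind of normalization is given below. The readings, the reference value, and the function name are hypothetical and serve only to illustrate the principle; how the reference value was actually chosen is not specified here.

```python
def normalize_to_reference(readings, reference):
    """Divide every raw reading by one shared reference value."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return {name: value / reference for name, value in readings.items()}

# Hypothetical raw DsRed readings (arbitrary units) from two separate experiments.
experiment_a = {"dark": 120.0, "NIR": 2400.0}
experiment_b = {"dark": 300.0, "NIR": 900.0}

# Assumption: one reference value per fluorescent protein, shared by all experiments.
DSRED_REFERENCE = 3000.0

norm_a = normalize_to_reference(experiment_a, DSRED_REFERENCE)
norm_b = normalize_to_reference(experiment_b, DSRED_REFERENCE)
print(norm_a)
print(norm_b)
```

      Because both experiments share the same denominator, the normalized values keep their relative magnitudes and can be compared directly, which per-panel normalization to each panel's own maximum would not allow.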

      (3) Related to the prior point, it would be useful to have a positive control for fluorescence that could be used to compare results across different figure panels.

      As all data are now normalized to the same reference value, direct comparison across all figures is enabled.

      (4) Real-time gene expression data are not presented in the current manuscript, but it would be helpful to include a time-course for some of the key designs to help readers assess the speed of response to NIR light.

      In response to this comment, we include in the revised manuscript induction kinetics of bacterial cultures bearing pNIRusk upon transfer to inducing NIR-light conditions. To this end, aliquots were taken at discrete timepoints, transcriptionally and translationally arrested, and analyzed for optical density and DsRed reporter fluorescence after allowing for chromophore maturation.

      We include the new data in Suppl. Fig. 5e and refer to them in the manuscript.

      Moreover, we note that the experiments in Agrobacterium tumefaciens used a luciferase reporter thus enabling the continuous monitoring of the light-induced expression kinetics. These data (unchanged in revision) are to be found in Suppl. Fig. 9.

      Reviewer #3 (Public review):

      Summary:

      This paper by Meier et al. introduces a new optogenetic module for the regulation of bacterial gene expression based on "bathy-BphP" proteins. Their paper begins with a careful characterization of kinetics and pH dependence of a few family members, followed by extensive engineering to produce infrared-regulated transcriptional systems based on the authors' previous design of the pDusk and pDERusk systems, and closing with characterization of the systems in bacterial species relevant for biotechnology.

      Strengths:

      The paper is important from the perspective of fundamental protein characterization, since bathy-BphPs are relatively poorly characterized compared to their phytochrome and cyanobacteriochrome cousins. It is also important from a technology development perspective: the optogenetic toolbox currently lacks infrared-stimulated transcriptional systems. Infrared light offers two major advantages: it can be multiplexed with additional tools, and it can penetrate into deep tissues with ease relative to the more widely used blue light-activated systems. The experiments are performed carefully, and the manuscript is well written.

      Weaknesses:

      My major criticism is that some information is difficult to obtain, and some data is presented with limited interpretation, making it difficult to obtain intuition for why certain responses are observed. For example, the changes in red/infrared responses across different figures and cellular contexts are reported but not rationalized. Extensive experiments with variable linker sequences were performed, but the rationale for linker choices was not clearly explained. These are minor weaknesses in an overall very strong paper.

      We are grateful for the positive take on our manuscript.

      Reviewer #1 (Recommendations for the authors):

      (1) As eLife is a broad audience journal, please define the Soret and Q-bands (line 125).

      We concur and have added labels in Fig. 1a that designate the Soret and Q bands.

      (2) The initial (0) Ac design in Figure 2b is activated by NIR and Red light, albeit modestly. The authors state that this construct shows "constant reporter fluorescence, largely independent of illumination" (line 167). This language should be changed to reflect the fact that this Ac construct responds to both of these wavelengths.

      Agreed. We have amended the text accordingly.

      (3) pNIRusk Ac 0 appears to show a greater light response than pNIRusk Av -5. However, the authors claim that the former is not light-responsive and the latter is. This conclusion should be explained or changed.

      The assignment of pNIRusk Av-5 as light-responsive is based on the relative difference in reporter fluorescence between darkness and illumination with either red or NIR light. Although the overall fluorescence is much lower in Av-5 than for Av-0, the relative change upon illumination is much more pronounced. We add a statement to this effect to the text.

      (4) The authors state that "when combining DmDERusk-Str-YPet with AvTod+21-DsRed expression rose under red and NIR light, respectively, whereas the joint application of both light colors induced both reporter genes" (lines 258-261). In contrast, Figure 3c shows that application of both wavelengths of light results in exclusive activation of YPet expression. It appears the description of the data is wrong and must be corrected. That said, this error does not impact their conclusion that two separate target genes can be independently activated by NIR and red light.

      We thank the reviewer for catching this error which we have corrected in the revised manuscript.

      (5) Line 278: I don't agree with the authors' blanket statement that the use of upconversion nanoparticles is a "grave" limitation for NIR-light mediated activation of bacterial gene expression in vivo. The authors should either expound on the severity of the limitation or use more moderate language.

      We have replaced the word ‘grave’ by ‘potential’ and thereby toned down our wording.

      Reviewer #2 (Recommendations for the authors):

      (1) Please include a discussion on the expected depth penetration of different light wavelengths. This is most relevant in the context of the discussion about how these NIR systems could be used with living therapeutics.

      Given the heterogeneity of biological tissue, it is challenging to state precise penetration depths for different wavelengths of light. That said, blue light for instance is typically attenuated by biological tissue around 5 to 10 times as strongly as near-infrared light is.

      We have expanded the Discussion chapter to cover these aspects.

      (2) It would be helpful for Figure 2C (or supplementary) to also include the response to blue light stimulation.

      We agree and have acquired pertinent data for the blue-light response. The new data are included in an updated Fig. 2c. Data acquired at varying NIR-light intensities, originally included in Fig. 2c, have been moved to Suppl. Fig. 5a-b.

      (3) In Figure 4A, data on the response of E. coli Nissle to blue and red light are missing. Including this would help identify whether the reduced sensitivity to non-NIR wavelengths observed in the E. coli lab strain is preserved in the probiotic background.

      In response to this comment, we have acquired pertinent data on E. coli Nissle. While the results were overall similar to those in the laboratory strain, the response to blue and NIR light was even lower in the Nissle bacteria, which stands to benefit optogenetic applications.

      We have updated Fig. 4a accordingly. For clarity, we only show the data for AvNIRusk in the main paper but have relegated the data on AcNIRusk to Suppl. Fig. 8. (Note that this has necessitated a renumbering of the subsequent Suppl. Figs.)

      (4) On many of the figures, there are thin gray lines that appear between the panels that it would be nice to eliminate because, in some cases, they cut through words and numbers.

      The grey lines likely arose from embedding the figures into the text document. In the typeset manuscript, which has become available on the eLife webpage in the meantime, there are no such lines. That said, we will carefully check throughout the submission/publishing/proofing process lest these lines reappear.

      (5) Page 7, line 155: "As not least seen" typo or awkward phrasing.

      We have restructured the sentence and thereby hopefully clarified the unclear phrasing.

      (6) Page 7, line 167: It does not appear to be the case that the initial pNIRusk designs show constant fluorescence that is largely independent of illumination. AcNIRusk shows an almost twofold change from dark to NIR. Reword this to avoid confusion.

      We concur with this comment, similar to reviewer #1’s remark, and have adjusted the text accordingly.

      (7) Page 8, line 174: Related to the previous point, AvNIRusk has one design that is very minimally light switchable (-5), so stating that six light switchable designs have been identified is also confusing.

      As stated in our response to reviewer #1 above, the assignment of AvNIRusk-5 as light-switchable is based on the relative fluorescence change upon illumination. We have added an explanation to the text.

      (8) Page 10, line 228-229: I was not able to find the data showing that expression levels were higher for the DmTtr systems than the pREDusk and pNIRusk setups. This may be an issue related to the normalization point. It was not clear to me how to compare these values.

      We apologize for the initially unclear representation of the data. In response to this reviewer’s general comments above, we have now normalized all fluorescence values to a single reference value, thus allowing their direct comparison.

      (9) Page 12, line 264: "finer-grained expression control can be exerted..." Either show data or adjust the language so that it is clear this is a prediction.

      True, we have replaced ‘can’ by ‘could’.

      (10) Page 25, line 590: CmpX13 cells have a reference that is given later, but it should be added where it first appears.

      Agreed, we have added the reference in the indicated place.

      (11) Page 25, line 592: define LB/Kan.

      We had already defined this abbreviation further up but, for clarity, we have added it again in the indicated position.

      (12) Page 40, line 946: "normalized by" rather than "to".

      We have implemented the requested change in the indicated and several other positions of the manuscript.

      (13) Figures 2C, 3C, and similar plots in the supplementary material would benefit from having a legend for the colors.

      We agree and have added pertinent legends to the corresponding main and supplementary figures.

      (14) As a reader, I had some trouble following all the acronyms. This is at the authors' discretion, but I would eliminate ones that are not strictly essential (e.g. MTP for microtiter plate; I was unable to identify what "MCS" meant; look for other opportunities to remove acronyms).

      In the revised manuscript, we have defined the abbreviation ‘MCS’ (for ‘multiple-cloning site’) upon first occurrence. We have decided to retain the abbreviation ‘MTP’ in the text.

      (15) Could the authors briefly speculate on why A. tumefaciens activation with red light might occur?

      While we can but speculate as to the underlying reasons for the divergent red-light response in A. tumefaciens, we discuss possible scenarios below.

      Commonly, two-component systems (TCS) exhibit highly cooperative and steep responses to signal. As a consequence, even small differences in the intracellular amounts of phosphorylated and unphosphorylated response regulator (RR) can give rise to significantly changed gene-expression output. Put another way, the gene-expression output need not scale linearly with the extent of RR phosphorylation but, rather, is expected to show nonlinear dependence with pronounced thresholding effects.

      Differences in the pertinent RR levels can for instance arise from variations in the expression levels of the pNIRusk system components between E. coli and A. tumefaciens. Moreover, the two bacteria greatly differ in their two-component-system (TCS) repertoire. Although TCSs are commonly well insulated from each other, cross-talk with endogenous TCSs, even if limited, may cause changes in the levels of phosphorylated RR and hence gene-expression output. In a similar vein, the RR can also be phosphorylated and dephosphorylated non-enzymatically, e.g., by reaction with high-energy anhydrides (such as acetyl phosphate) and hydrolysis, respectively. Other potential origins for the divergent red-light response include differences in the strength of the promoters driving expression of the pNIRusk system components and the fluorescent/luminescent reporters, respectively.
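
      The thresholding argument can be illustrated with a minimal sketch that models gene-expression output as a Hill function of the phosphorylated RR fraction. The Hill form, the half-maximal point, and the Hill coefficients are assumptions chosen for illustration; they are not fitted to, or taken from, the data discussed here.

```python
def hill_output(x: float, k: float, n: float) -> float:
    """Normalized output of a Hill-type response: x**n / (k**n + x**n)."""
    return x**n / (k**n + x**n)

K_HALF = 0.5                        # hypothetical half-maximal phosphorylated RR fraction
low_rr, high_rr = 0.4, 0.6          # a small, hypothetical shift in phosphorylated RR

for n in (1, 4):                    # n = 1: non-cooperative; n = 4: cooperative (assumed)
    low_out = hill_output(low_rr, K_HALF, n)
    high_out = hill_output(high_rr, K_HALF, n)
    print(f"n = {n}: output {low_out:.2f} -> {high_out:.2f} (fold change {high_out / low_out:.2f})")
```

      In this toy model, the same modest shift in phosphorylated RR yields a much larger change in output when the response is cooperative, consistent with the notion that small host-dependent differences in RR levels could produce divergent light responses.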

      (16) It would be helpful for the authors to briefly explain why they needed to switch to luminescence from fluorescence for the A. tumefaciens studies.

      While there was no strict necessity to switch from the fluorescence-based system used in E. coli to a luminescence-based system in A. tumefaciens, we opted for luminescence based on prior experience with other Alphaproteobacteria (e.g., 10.1128/mSystems.00893-21), where luminescence offered significant advantages. Specifically, it provides essentially background-free signal detection and greater sensitivity for monitoring gene expression. In addition, as demonstrated in Suppl. Fig. 9c and d, the luminescence system enables real-time tracking of gene expression dynamics, which further supported its use in our experimental setup (see our response to reviewer #2’s general comments).

      (17) This is a very minor comment that the authors can take or leave, but I got hung up on the word "implement" when it appeared a few times in the manuscript because I tended to read it as "put a plan into place" rather than its other meaning.

      In the abstract, we have replaced one instance of the word ‘implement’ by ‘instrument’.

      (18) The authors should include the relevant constructs on AddGene or another public strain-sharing service.

      We whole-heartedly subscribe to the idea of freely sharing research materials with fellow scientists. Therefore, we had already deposited the most relevant AvNIRusk in Addgene, even prior to the initial submission of the manuscript (accession number 235084). In the meantime, we have released the deposition, and the plasmid has been obtainable from Addgene since May 15th of this year.

      Reviewer #3 (Recommendations for the authors):

      Suggestion for improvement:

      This paper relies heavily on variations in linker sequences to shift responses. I am familiar with prior work from the Moglich lab in which helical linkers were employed to shift responses in synthetic two-component systems, with interesting periodicity in responses with every 7 residues (as expected for an alpha helix) and inversion of responses at smaller linker shifts. There is no mention in this paper whether their current engineering follows a similar rationale, what types of linkers are employed (e.g. flexible vs helical), and whether there is an interpretation for how linker lengths alter responses. Can you explain what classes of linker sequences are used throughout Figures 2 and 3, and whether length or periodicity affects the outcome? This would be very helpful for readers who are new to this approach, or if the rationale here differs from the authors' prior work.

      The PATCHY approach employed at present followed a rationale closely similar to that of our previous studies. That is, linkers were extended/shortened and varied in their sequence by recombining different fragments of the natural linkers of the parental receptors, i.e., the bacteriophytochrome and the FixL sensor histidine kinase, respectively. We have added a statement to this effect in the text and a reference to Suppl. Fig. 3 which illustrates the principal approach.

      Compared to our earlier studies, we isolated fewer receptor variants supporting light-regulated responses, despite covering a larger sequence space. Owing to the sparsity of the light-regulated variants, an interpretation of the linker properties and their correlation with light-regulated activity is challenging. Although doubtless unsatisfying from a mechanistic viewpoint, we therefore refrain from a pertinent discussion which would be premature and speculative at this point. As the reviewer raises a valid and important point, we have expanded the text by referring to our earlier studies and the observed dependence of functional properties on linker composition.

      It is sometimes difficult to intuit or rationalize the differences in red/IR sensitivity across closely related variants. An important example appears in Figure 3C vs 3B. I think the AvTod+21 in 3B should be the equivalent to the DsRed response in the second column of 3C (AvTod+21 + DmDERusk), except, of course, that the bacteria in 3C carry an additional plasmid for the DERusk system. However, in 3B, the response to red light is substantial - ~50% as strong as that for IR, whereas in 3C, red light elicits no response at all. What is the difference? The reason this is important is that the AvTod+21 and DmDERusk represent the best "orthogonal" red and infrared light responses, but this is not at all obvious from 3B, where AvTod+21 still causes a substantial (and for orthogonality, undesirable) response under red light. Perhaps subtle differences in expression level due to plasmid changes cause these differences in light responses? Could the authors test how the expression level affects these responses? The paper would be greatly improved if observations of the diverse red/IR responses could be rationalized by some design criteria.

      As noted above in our response to reviewer #2, we have now normalized all fluorescence readings to joint reference values, thus allowing a better comparison across experiments.

      The reviewer is correct in noting that upon multiplexing, the individual plasmid systems support lower fluorescence levels than when used in isolation. We speculate that the combination of two plasmids may affect their copy numbers (despite the use of different resistance markers and origins of replication) and hence their performance. Likewise, the cellular metabolism may be affected when multiple plasmids are combined. These aspects may well account for the absent red-light response in AvTod+21 in the multiplexing experiments which is – indeed – unexpected. As, at present, we cannot provide a clear rationalization for this effect, we recommend verifying the performance of the plasmid setups when multiplexing.

      The paper uses "red" and "infrared" to refer to ~624 nm and ~800 nm light, respectively. I wonder whether it might be possible to shift these peak wavelengths to obtain even better separation for the multiplexing experiments. Perhaps shifting the specific red wavelength could result in better separation between DERusk and AvTod systems, for example? Could the authors comment on this (maybe based on action spectra of their previously developed tools) or perhaps test a few additional stimulation wavelengths?

      The choice of illumination wavelengths used in these experiments is dictated by the LED setups available for illumination of microtiter plates. On the one hand, we are using an SMD (surface-mount device) three-color LED with a fixed wavelength of the red channel around 624 nm (see Hennemann et al., 2018). On the other hand, we are deploying a custom-built device with LEDs emitting at around 800 nm (see Stüven et al., 2019 and this work). Adjusting these wavelengths is therefore challenging, although without doubt potentially interesting.

      To address this reviewer comment, we have added a statement to the text that the excitation wavelengths may be varied to improve multiplexed applications.

      Additional minor comments:

      (1) Figure 2C: It would be very helpful to place a legend on the figure panel for what the colors indicate, since they are unique to this panel and non-intuitive.

      This comment coincides with one by reviewer #2, and we have added pertinent legends to this and related supplementary figures.

      (2) Figure 3C: it is not obvious which system uses DsRed and which uses YPet in each combination, since the text indicates that all combinations were cloned, and this is not clearly described in the legend. Is it always the first construct in the figure legend listed for DsRed and the second for YPet?

      For clarification, we have revised the x-axis labels in Fig. 3C. (And yes, it is as this reviewer surmises: the first of the two constructs harbored DsRed and the second one YPet.)

    1. eLife Assessment

      This important manuscript presents a thorough analysis of the evolution of Major Histocompatibility Complex gene families across Primates. A key strength of this analysis is the use of state-of-the-art phylogenetic methods to estimate rates of gene gain and loss, accounting for the notorious difficulty of properly assembling MHC genomic regions. Overall, the evidence for the authors' conclusions (that there is considerable diversity in how MHC diversity is deployed across species) is compelling.

    2. Joint Public Review:

      Summary:

      The Major Histocompatibility Complex (MHC) region is a collection of numerous genes involved in both innate and adaptive immunity. MHC genes are famed for their role in rapid evolution and extensive polymorphism in a variety of vertebrates. This paper presents a summary of gene-level gain and loss of orthologs and paralogs within MHC across the diversity of primates, using publicly available data.

      Strengths:

      This paper provides a strong case that MHC genes are rapidly gained (by paralog duplication) and lost over millions of years of macroevolution. The authors are able to identify MHC loci by homology across species, and from this infer gene duplications and losses using phylogenetic analyses. There is a remarkable amount of genic turnover, summarized in Figure 6 and Figure 7, either of which might be a future textbook figure of immune gene family evolution. The authors draw on state-of-the-art phylogenetic methods, and their inferences are robust.

      Editorial note:

      The authors have responded to the previous reviews and the Assessment was updated without involving the reviewers again.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The Major Histocompatibility Complex (MHC) region is a collection of numerous genes involved in both innate and adaptive immunity. MHC genes are famed for their role in rapid evolution and extensive polymorphism in a variety of vertebrates. This paper presents a summary of gene-level gain and loss of orthologs and paralogs within MHC across the diversity of primates, using publicly available data.

      Strengths:

      This paper provides a strong case that MHC genes are rapidly gained (by paralog duplication) and lost over millions of years of macroevolution. The authors are able to identify MHC loci by homology across species, and from this infer gene duplications and losses using phylogenetic analyses. There is a remarkable amount of genic turnover, summarized in Figure 6 and Figure 7, either of which might be a future textbook figure of immune gene family evolution. The authors draw on state-of-the-art phylogenetic methods, and their inferences are robust insofar as the data might be complete enough to draw such conclusions.

      Weaknesses:

      One concern about the present work is that it relies on public databases to draw inferences about gene loss, which is potentially risky if the publicly available sequence data are incomplete. To say, for example, that a particular MHC gene copy is absent in a taxon (e.g., Class I locus F absent in Guenons according to Figure 1), we need to trust that its absence from the available databases is an accurate reflection of its absence in the genome of the actual organisms. This may be a safe assumption, but it rests on the completeness of genome assembly (and gene annotations?) or people uploading relevant data. This reviewer would have been far more comfortable had the authors engaged in some active spot-checking, doing the lab work to try to confirm absences at least for some loci and some species. Without this, a reader is left to wonder whether gene loss is simply reflecting imperfect databases, which then undercuts confidence in estimates of rates of gene loss.

      Indeed, just because a locus has not been confirmed in a species does not necessarily mean that it is absent. As we explain in the Figure 1 caption, only a few species have had their genomes extensively studied (gray background), and only for these species does the absence of a point in this figure mean that a locus is absent. The white background rows represent species that are not extensively studied, and we point out that the absence of a point does not mean that a locus is absent from the species, but rather that it remains undiscovered. We have also added a parenthetical to the text to explain this (line 156): “Only species with rows highlighted in gray have had their MHC regions extensively studied (and thus only for these rows is the absence of a gene symbol meaningful).”

      While we agree that spot-checking may be a helpful next step, one of the goals of this manuscript is to collect and synthesize the enormous volume of MHC evolution research in the primates, which will serve as a jumping-off point for other researchers to perform important wet lab work.

      Some context is useful for comparing rates of gene turnover in MHC to other loci. Changing gene copy numbers, duplications, and loss of duplicates are common, it seems, across many loci and many organisms; is MHC exceptional in this regard, or merely behaving like any moderately large gene family? I would very much have liked to see comparable analyses done for other gene families (immune, like TLRs, or non-immune), and quantitative comparisons of evolutionary rates between MHC versus other genes. Does MHC gene composition evolve any faster than a random gene family? At present readers may be tempted to infer this, but evidence is not provided.

      Our companion paper (Fortier and Pritchard, 2025) demonstrates that the MHC is a unique locus in many regards, such as its evidence for deep balancing selection and its excess of disease associations. Thus, we expect that it is evolving faster than any random gene family. It would be interesting to repeat this analysis for other gene families, but that is outside of the scope of this project. Additionally, allele databases for other gene families are not nearly as developed, but as more alleles become available for other polymorphic families, a comparable analysis could become possible.

      We have added a paragraph to the discussion (lines 530-546) to clarify that we do not know for certain whether the MHC gene family is evolving rapidly compared to other gene families.

      While on the topic of making comparisons, the authors make a few statements about relative rates. For instance, lines 447-8 compare gene topology of classical versus non-classical genes; and line 450 states that classical genes experience more turnover. But there are no quantitative values given to these rates to provide numerical comparisons, nor confidence intervals provided (these are needed, given that they are estimates), nor formal statistical comparisons to confirm our confidence that rates differ between types of genes.

      More broadly, the paper uses sophisticated phylogenetic methods, but without taking advantage of macroevolutionary comparative methods that allow model-based estimation of macroevolutionary rates. I found the lack of quantitative measurements of rates of gene gain/loss to be a weakness of the present version of the paper, and something that should be readily remedied. When claiming that MHC Class I genes "turn over rapidly" (line 476) - what does rapidly mean? How rapidly? How does that compare to rates of genetic turnover at other families? Quantitative statements should be supported by quantitative estimates (and their confidence intervals).

      These statements refer to qualitative observations, so we cannot provide numerical values. We simply conclude that certain gene groups evolve faster or slower based on the species and genes present in each clade. It is difficult to provide estimates because of the incomplete sampling of genes that survived to the present day. In addition, the presence or absence of various orthologs in different species still needs to be confirmed, at which point it might be useful to be more quantitative. We have also added a paragraph to the discussion to address this concern and advocate for similar analyses of other gene families in the future when more data is available (lines 530-546).

      The authors refer to 'shared function of the MHC across species' (e.g. line 22); while this is likely true, they are not here presenting any functional data to confirm this, nor can they rule out neofunctionalization or subfunctionalization of gene duplicates. There is evidence in other vertebrates (e.g., cod) of MHC evolving appreciably altered functions, so one may not safely assume the function of a locus is static over long macroevolutionary periods, although that would be a plausible assumption at first glance.

      Indeed, we cannot assume that the function of a locus is static across time, especially for the MHC region. In our research, we read hundreds of papers that each focused on a small number of species or genes and gathered some information about them, sometimes based on functional experiments and sometimes on measures such as dN/dS. These provide some indication of a gene’s broad classification in a species or clade, even if the evidence is preliminary. Where possible, we used this preliminary evidence to give genes descriptors “classical,” “non-classical,” “dual characteristics,” “pseudogene,” “fixed”, or “unfixed.” Sometimes multiple individuals and haplotypes were analyzed, so we could even assign a minimum number of gene copies present in a species. We have aggregated all of these references into Supplementary Table 1 (for Class I/Figure 1) and Supplementary Table 2 (for Class II/Figure 2) along with specific details about which data points in these figures that each reference supports. We realize that many of these classifications are based on a small number of individuals or indirect measures, so they may change in the future as more functional data is generated.

      Reviewer #2 (Public review):

      Summary:

      The authors aim to provide a comprehensive understanding of the evolutionary history of the Major Histocompatibility Complex (MHC) gene family across primate species. Specifically, they sought to:

      (1) Analyze the evolutionary patterns of MHC genes and pseudogenes across the entire primate order, spanning 60 million years of evolution.

      (2) Build gene and allele trees to compare the evolutionary rates of MHC Class I and Class II genes, with a focus on identifying which genes have evolved rapidly and which have remained stable.

      (3) Investigate the role of often-overlooked pseudogenes in reconstructing evolutionary events, especially within the Class I region.

      (4) Highlight how different primate species use varied MHC genes, haplotypes, and genetic variation to mount successful immune responses, despite the shared function of the MHC across species.

      (5) Fill gaps in the current understanding of MHC evolution by taking a broader, multi-species perspective using (a) phylogenomic analytical computing methods such as Beast2, Geneconv, BLAST, and the much larger computing capacities that have been developed and made available to researchers over the past few decades, (b) literature review for gene content and arrangement, and genomic rearrangements via haplotype comparisons.

      (6) The authors' overall conclusions based on their analyses and results are that 'different species employ different genes, haplotypes, and patterns of variation to achieve a successful immune response'.

      Strengths:

      Essentially, much of the information presented in this paper is already well-known in the MHC field of genomic and genetic research, with few new conclusions and with insufficient respect to past studies. Nevertheless, while MHC evolution is a well-studied area, this paper potentially adds some originality through its comprehensive, cross-species evolutionary analysis of primates, focus on pseudogenes and the modern, large-scale methods employed. Its originality lies in its broad evolutionary scope of the primate order among mammals with solid methodological and phylogenetic analyses.

      The main strengths of this study are the use of large publicly available databases for primate MHC sequences, the intensive computing involved, the phylogenetic tool Beast2 to create multigene Bayesian phylogenetic trees using sequences from all genes and species, separated into Class I and Class II groups to provide a backbone of broad relationships to investigate subtrees, and the presentation of various subtrees as species and gene trees in an attempt to elucidate the unique gene duplications within the different species. The study provides some additional insights with summaries of MHC reference genomes and haplotypes in the context of a literature review to identify the gene content and haplotypes known to be present in different primate species. The phylogenetic overlays or ideograms (Figures 6 and 7) in part show the complexity of the evolution and organisation of the primate MHC genes via the orthologous and paralogous gene and species pathways progressively from the poorly-studied NWM, across a few moderately studied ape species, to the better-studied human MHC genes and haplotypes.

      Weaknesses:

      The title 'The Primate Major Histocompatibility Complex: An Illustrative Example of Gene Family Evolution' suggests that the paper will explore how the Major Histocompatibility Complex (MHC) in primates serves as a model for understanding gene family evolution. The term 'Illustrative Example' in the title would be appropriate if the paper aimed to use the primate Major Histocompatibility Complex (MHC) as a clear and representative case to demonstrate broader principles of gene family evolution. That is, the MHC gene family is not just one instance of gene family evolution but serves as a well-studied, insightful example that can highlight key mechanisms and concepts applicable to other gene families. However, this is not the case: this paper only covers specific details of primate MHC evolution without drawing broader lessons for any other gene families. So, the term 'Illustrative Example' is too broad or generalizing. In this case, a term like 'Case Study' or simply 'Example' would be more suitable. Perhaps, 'An Example of Gene Family Diversity' would be more precise. Also, an explanation or 'reminder' is suggested that this study is not about the origins of the MHC genes from the earliest jawed vertebrates per se (~600 mya), but it is an extension within a subspecies set that has emerged relatively late (~60 mya) in the evolutionary divergent pathways of the MHC genes, systems, and various vertebrate species.

      Thank you for your input on the title; we have changed it to “A case study of gene family evolution” instead.

      Thank you also for pointing out the potential confusion about the time span of our study. We have added “Having originated in the jawed vertebrates,” to a sentence in the introduction (lines 38-39). We have also added the sentence “Here, we focus on the primates, spanning approximately 60 million years within the over 500-million-year evolution of the family \citep{Flajnik2010}.” to be more explicit about the context for our work (lines 59-61).

      Phylogenomics. Particular weaknesses in this study are the limitations and problems associated with providing phylogenetic gene and species trees to try and solve the complex issue of the molecular mechanisms involved with imperfect gene duplications, losses, and rearrangements in a complex genomic region such as the MHC that is involved in various effects on the response and regulation of the immune system. A particular deficiency is drawing conclusions based on a single exon of the genes. Different exons present different trees. Which are the more reliable? Why were introns not included in the analyses? The authors attempt to overcome these limitations by including genomic haplotype analysis, duplication models, and the supporting or contradictory information available in previous publications. They succeed in part with this multidiscipline approach, but much is missed because of biased literature selection. The authors should include a paragraph about the benefits and limitations of the software that they have chosen for their analysis, and perhaps suggest some alternative tools that they might have tried comparatively. How were problems with Bayesian phylogeny such as computational intensity, choosing probabilities, choosing particular exons for analysis, assumptions of evolutionary models, rates of evolution, systemic bias, and absence of structural and functional information addressed and controlled for in this study?

      We agree that different exons have different trees, which is exactly why we repeated our analysis for each exon in order to compare and contrast them. In particular, the exons encoding the binding site of the resulting protein (exons 2 and 3 for Class I and exon 2 for Class II) show evidence for trans-species polymorphism and gene conversion. These phenomena lead to trees that do not follow the species tree and are fascinating in and of themselves, which we explore in detail in our companion paper (Fortier and Pritchard, 2025). Meanwhile, the non-peptide-binding extracellular-domain-encoding exon (exon 4 for Class I and exon 3 for Class II) is comparably sized to the binding-site-encoding exons and provides an interesting functional contrast. As this exon is likely less affected by trans-species polymorphism, gene conversion, and convergent evolution, we present results from it most often in the main text, though we occasionally touch on differences between the exons. See lines 191-196, 223-226, and 407-414 for some examples of how we discuss the exons in the text. Additionally, all trees from all of these exons can be found in the supplement. 

      We agree that introns would be valuable to study in this context. Even though the non-binding-site-encoding exons are probably *less* affected by trans-species polymorphism, gene conversion, and convergent evolution, they are still functional. The introns, however, experience much more relaxed selection, if any, and comparing their trees to those for the exons would be valuable and illuminating. We did not generate intron trees for two reasons. Most importantly, there is a dearth of data available for the introns; in the databases we used, there was often intron data available only for human, chimpanzee, and sometimes macaque, and only for a small subset of the genes. This limitation is at odds with the comprehensive, many-gene-many-species approach which we feel is the main novelty of this work. Secondly, the introns that *are* available are difficult to align. Even aligning the exons across such a highly-diverged set of genes and pseudogenes was difficult and required manual effort. The introns proved even more difficult to try to align across genes. In the future, when more intron data is available and sufficient effort is put into aligning them, it will be possible and desirable to do a comparable analysis. We also added a sentence to the “Data” section to briefly explain why we did not include introns (lines 134-135).

      We explain our Bayesian phylogenetics approach in detail in the Methods (lines 650-725), including our assumptions and our solutions to challenges specific to this application. For further explanation of the method itself, we suggest reading the original BEAST and BEAST2 papers (Drummond & Rambaut (2007), Drummond et al. (2012), Bouckaert et al. (2014), and Bouckaert et al. (2019)). Known structural and functional information helped us validate the alignments we used in this study, but the fact that such information is not fully known for every gene and species should not affect the method itself.

      Gene families as haplotypes. In the Introduction, the MHC is referred to as a 'gene family', and in paragraph 2, it is described as being united by the 'MHC fold', despite exhibiting 'very diverse functions'. However, the MHC region is more accurately described as a multigene region containing diverse, haplotype-specific Conserved Polymorphic Sequences, many of which are likely to be regulatory rather than protein-coding. These regulatory elements are essential for controlling the expression of multiple MHC-related products, such as TNF and complement proteins, a relationship demonstrated over 30 years ago. Non-MHC fold loci such as TNF, complement, POU5F1, lncRNA, TRIM genes, LTA, LTB, NFkBIL1, etc, are present across all MHC haplotypes and play significant roles in regulation. Evolutionary selection must act on genotypes, considering both paternal and maternal haplotypes, rather than on individual genes alone. While it is valuable to compile databases for public use, their utility is diminished if they perpetuate outdated theories like the 'birth-and-death model'. The inclusion of prior information or assumptions used in a statistical or computational model, typically in Bayesian analysis, is commendable, but they should be based on genotypic data rather than older models. A more robust approach would consider the imperfect duplication of segments, the history of their conservation, and the functional differences in inheritance patterns. Additionally, the MHC should be examined as a genomic region, with ancestral haplotypes and sequence changes or rearrangements serving as key indicators of human evolution after the 'Out of Africa' migration, and with disease susceptibility providing a measurable outcome. There are more than 7000 different HLA-B and -C alleles at each locus, which suggests that there are many thousands of human HLA haplotypes to study. In this regard, the studies by Dawkins et al (1999 Immunol Rev 167,275), Shiina et al. (2006 Genetics 173,1555) on human MHC gene diversity and disease hitchhiking (haplotypes), and Sznarkowska et al. (2020 Cancers 12,1155) on the complex regulatory networks governing MHC expression, both in terms of immune transcription factor binding sites and regulatory non-coding RNAs, should be examined in greater detail, particularly in the context of MHC gene allelic diversity and locus organization in humans and other primates.

      Thank you for these comments. To clarify that the MHC “region” is different from (and contains) the MHC “gene family” as we describe it, we changed a sentence in the abstract (lines 8-10) from “One large gene family that has experienced rapid evolution is the Major Histocompatibility Complex (MHC), whose proteins serve critical roles in innate and adaptive immunity.” to “One large gene family that has experienced rapid evolution lies within the Major Histocompatibility Complex (MHC), whose proteins serve critical roles in innate and adaptive immunity.” We know that the region is complex and contains many other genes and regulatory sequences; Figure 1 of our companion paper (Fortier and Pritchard, 2025) depicts these in order to show the reader that the MHC genes we focus on are just one part of the entire region.

      We love the suggestion to look at the many thousands of alleles present at each of the classical loci. This is the focus of our companion paper (Fortier and Pritchard, 2025) which explores variation at the allele level. In the current paper, we look mainly at the differences between genes and the use of different genes in different species.

      Diversifying and/or concerted evolution. Both this and past studies highlight that the diversifying selection or balancing selection model is the dominant force in MHC evolution. This is primarily because the extreme polymorphism observed in MHC genes is advantageous for populations in terms of pathogen defence. Diversification increases the range of peptides that can be presented to T cells, enhancing the immune response. The peptide-binding regions of MHC genes are highly variable, and this variability is maintained through selection for immune function, especially in the face of rapidly evolving pathogens. In contrast, concerted evolution, which typically involves the homogenization of gene duplicates through processes like gene conversion or unequal crossing-over, seems to play a minimal role in MHC evolution. Although gene duplication events have occurred in the MHC region leading to the expansion of gene families, the resulting paralogs often undergo divergent evolution rather than being kept similar or homozygous by concerted evolution. Therefore, unlike gene families such as ribosomal RNA genes or histone genes, where concerted evolution leads to highly similar copies, MHC genes display much higher levels of allelic and functional diversification. Each MHC gene copy tends to evolve independently after duplication, acquiring unique polymorphisms that enhance the repertoire of antigen presentation, rather than undergoing homogenization through gene conversion. Also, in some populations with high polymorphism or genetic drift, allele frequencies may become similar over time without the influence of gene conversion. This similarity can be mistaken for gene conversion when it is simply due to neutral evolution or drift, particularly in small populations or bottlenecked species. Moreover, gene conversion might contribute to greater diversity by creating hybrids or mosaics between different MHC genes. In this regard, can the authors indicate what percentage of the gene numbers in their study have been homogenised by gene conversion compared to those that have been diversified by gene conversion?

      We appreciate the summary, and we feel we have appropriately discussed both gene conversion and diversifying selection in the context of the MHC genes. Because we cannot know for sure when and where gene conversion has occurred, we cannot quantify percentages of genes that have been homogenized or diversified.  

      Duplication models. The phylogenetic overlays or ideograms (Figures 6 and 7) show considerable imperfect multigene duplications, losses, and rearrangements, but the paper's Discussion provides no in-depth consideration of the various multigenic models or mechanisms that can be used to explain the occurrence of such events. How do their duplication models compare to those proposed by others? For example, their text simply says on line 292, 'the proposed series of events is not always consistent with phylogenetic data'. How, why, when? Duplication models for the generation and extension of the human MHC class I genes as duplicons (extended gene or segmental genomic structures) by parsimonious imperfect tandem duplications with deletions and rearrangements in the alpha, beta, and kappa blocks were already formulated in the late 1990s and extended to the rhesus macaque in 2004 based on genomic haplotypic sequences. These studies were based on genomic sequences (genes, pseudogenes, retroelements), dot plot matrix comparisons, and phylogenetic analyses of gene and retroelement sequences using computer programs. It already was noted or proposed in these earlier 1999 studies that (1) the ancestor of HLA-P(90)/-T(16)/W(80) represented an old lineage separate from the other HLA class I genes in the alpha block, (2) HLA-U(21) is a duplicated fragment of HLA-A, (3) HLA-F and HLA-V(75) are among the earliest (progenitor) genes or outgroups within the alpha block, (4) distinct Alu and L1 retroelement sequences adjoining HLA-L(30), and HLA-N genomic segments (duplicons) in the kappa block are closely related to those in the HLA-B and HLA-C in the beta block; suggesting an inverted duplication and transposition of the HLA genes and retroelements between the beta and kappa regions. None of these prior human studies were referenced by Fortier and Pritchard in their paper. How does their human MHC class I gene duplication model (Fig. 6) such as gene duplication numbers and turnovers differ from those previously proposed and described by Kulski et al (1997 JME 45,599), (1999 JME 49,84), (2000 JME 50,510), Dawkins et al (1999 Immunol Rev 167,275), and Gaudieri et al (1999 GR 9,541)? Is this a case of reinventing the wheel?

      Figures 6 and 7 are intended to synthesize and reconcile past findings and our own trees, so they do not strictly adhere to the findings of any particular study and cannot fully match all studies. In the supplement, Figure 6 - figure supplement 1 and Figure 7 - figure supplement 1 duly credit all of the past work that went into making these trees. Most previous papers focus on just one aspect of these trees, such as haplotypes within a species, a specific gene or allelic lineage relationship, or the branching pattern of particular gene groups. We believe it was necessary to bring all of these pieces of evidence together. Even among papers with the same focus (to understand the block duplications that generated the current physical layout of the MHC), results differ. For example, Geraghty (1992), Hughes (1995), Kulski (2004)/Kulski (2005), and Shiina (1999) all disagree on the exact branching order of the genes MHC-W, -P, and -T, and of MHC-G, -J, and -K. While the Kulski studies you pointed out were very thorough for their era, they still only relied on data from three species and one haplotype per species. Our work is not intended to replace or discredit these past works, but simply to build upon them with a larger set of species and sequences. We hope the hypotheses we propose in Figures 6 and 7 can help unify existing research and provide a more easily accessible jumping-off-point for future work.

      Results. The results are presented as new findings, whereas most if not all of the results' significance and importance already have been discussed in various other publications. Therefore, the authors might do better to combine the results and discussion into a single section with appropriate citations to previously published findings presented among their results for comparison. Do the trees and subsets differ from previous publications, albeit that they might have fewer comparative examples and samples than the present preprint? Alternatively, the results and discussion could be combined and presented as a review of the field, which would make more sense and be more honest than the current format of essentially rehashing old data.

      In starting this project, we found that a large barrier to entry to this field of study is the immense amount of published literature over 30+ years. It is both time-consuming and confusing to read up on the many nuances of the MHC genes, their changing names, and their evolution, making it difficult to start new, innovative projects. We acknowledge that while our results are not entirely novel, the main advantage of our work is that it provides a thorough, comprehensive starting point for others to learn about the MHC quickly and dive into new research. We feel that we have appropriately cited past literature in the main text, appendices, and supplement, so that readers may dive into a particular area with ease.

      Minor corrections:

      (1) Abstract, line 19: 'modern methods'. Too general. What modern methods?

      To keep the abstract brief, the methods are introduced in the main text when each becomes relevant as well as in the methods section.

      (2) Abstract, line 25: 'look into [primate] MHC evolution.' The analysis is on the primate MHC genes, not on the entire vertebrate MHC evolution with a gene collection from sharks to humans. The non-primate MHC genes are often differently organised and structurally evolved in comparison to primate MHC.

      Thank you! We have added the word “primate” to the abstract (line 25).

      (3) Introduction, line 113. 'In a companion paper (Fortier and Pritchard, 2024)' This paper appears to be unpublished. If it's unpublished, it should not be referenced.

      This paper is undergoing the eLife editorial process at the same time; it will have a proper citation in the final version.

      (4) Figures 1 and 2. Use the term 'gene symbols' (circle, square, triangle, inverted triangle, diamond) or 'gene markers' instead of 'points'. 'Asterisks "within symbols" indicate new information.'

      Thank you, the word “symbol” is much clearer! We have changed “points” to “symbols” in the captions for Figure 1, Figure 1 - figure supplement 1, Figure 2, and Figure 2 - figure supplement 1. We also changed this in the text (lines 157-158 and 170).

      (5) Figures. A variety of colours have been applied for visualisation. However, some coloured texts are so light in colour that they are difficult to read against a white background. Could darker colours or black be used for all or most texts?

      With such a large number of genes and species to handle in this work, it was nearly impossible to choose a set of colors that were distinct enough from each other. We decided to prioritize consistency (across this paper, its supplement, and our companion paper) as well as at-a-glance grouping of similar sequences. Unfortunately, this means we had to sacrifice readability on a white background, but readers may turn to the supplement if they need to access specific sequence names.

      (6) Results, line 135. '(Fortier and Pritchard, 2024)' This paper appears to be unpublished. If it's unpublished, it should not be referenced.

      Repeat of (3). This paper is undergoing the eLife editorial process at the same time; it will have a proper citation in the final version.

      (7) Results, lines 152 to 153, 164, 165, etc. 'Points with an asterisk'. Use the term 'gene symbols' (circle, square, triangle, inverted triangle, diamond) or 'gene markers' instead of 'points'. A point is a small dot such as those used in data points for plotting graphs .... The figures are so small that the asterisks in the circles, squares, triangles, etc, look like points (dots) and the points/asterisks terminology that is used is very confusing visually.

      Repeat of (4). Thank you, the word “symbol” is much clearer! We have changed “points” to “symbols” in the captions for Figure 1, Figure 1 - figure supplement 1, Figure 2, and Figure 2 - figure supplement 1. We also changed this in the text (lines 157-158 and 170).

      (8) Line 178 (BEA, 2024) is not listed alphabetically in the References.

      Thank you for catching this! This reference maps to the first bibliography entry, “SUMMARIZING POSTERIOR TREES.” We are unsure how to cite a webpage that has no explicit author within the eLife Overleaf template, so we will consult with the editor.

      (9) Lines 188-190. 'NWM MHC-G does not group with ape/OWM MHC-G, instead falling outside of the clade containing ape/OWM MHC-A, -G, -J and -K.' This is not surprising given that MHC-A, -G, -J, and -K are paralogs of each other and that some of them, especially in NWM have diverged over time from the paralogs and/or orthologs and might be closer to one paralog than another and not be an actual ortholog of OWM, apes or humans.

      We included this sentence to clarify the relationships between genes and to help describe what is happening in Figure 6. Figure 6 - figure supplement 1 includes all of the references that go into such a statement and Appendix 3 details our reasoning for this and other statements.

      (10) Line 249. Gene conversion: This is recombination between two different genes where a portion of the genes are exchanged with one another so that different portions of the gene can group within one or other of the two gene clades. Alternatively, the gene has been annotated incorrectly if the gene does not group within either of the two alternative clades. Another possibility is that one or two nucleotide mutations have occurred without a recombination resulting in a mistaken interpretation or conclusion of a recombination event. What measures are taken to avoid false-positive conclusions? How many MHC gene conversion (recombination) events have occurred according to the authors' estimates?

      All of these possibilities are certainly valid. We used the program GENECONV to infer gene conversion events, but there is considerable uncertainty owing to the ages of the genes and the inevitable point mutations that have occurred post-event. Gene conversion was not the focus of our paper, so we did our best to acknowledge it (and the resulting differences between trees from different exons) without spending too much time diving into it. A list of inferred gene conversion events can be found in Figure 3 - source data 1 and Figure 4 - source data 1.

      (11) Lines 284-286. 'The Class I MHC region is further divided into three polymorphic blocks-alpha, beta, and kappa blocks-that each contains MHC genes but are separated by well-conserved non-MHC genes.' The MHC class I region was first designated into conserved polymorphic duplication blocks, alpha and beta by Dawkins et al (1999 Immunol Rev 167,275), and kappa by Kulski et al (2002 Immunol Rev 190,95), and should be acknowledged (cited) accordingly.

      Thank you for catching this! We have added these citations (lines 302-303)!

      (12) Lines 285-286. 'The majority of the Class I genes are located in the alpha-block, which in humans includes 12 MHC genes and pseudogenes.' This is not strictly correct for many other species, because the majority of class I genes might be in the beta block of new and old-world monkeys, and the authors haven't provided respective counts of duplication numbers to show otherwise. The alpha block in some non-primate mammalian species such as pigs, rats, and mice has no MHC class I genes or only a few. Most MHC class I genes in non-primate mammalian species are found in other regions. For example, see Ando et al (2005 Immunogenetics 57,864) for the pig alpha, beta, and kappa regions in the MHC class I region. There are no pig MHC genes in the alpha block.

      Yes, which is exactly why we use the phrase “in humans” in that particular sentence. The arrangement of the MHC in several other primate reference genomes is shown in Figure 1 - figure supplement 2.

      (13) Line 297 to 299. 'The alpha-block also contains a large number of repetitive elements and gene fragments belonging to other gene families, and their specific repeating pattern in humans led to the conclusion that the region was formed by successive block duplications (Shiina et al., 1999).' There are different models for successive block duplications in the alpha block and some are more parsimonious based on imperfect multigenic segmental duplications (Kulski et al 1999, 2000) than others (Shiina et al., 1999). In this regard, Kulski et al (1999, 2000) also used duplicated repetitive elements neighbouring MHC genes to support their phylogenetic analyses and multigenic segmental duplication models. For comparison, can the authors indicate how many duplications and deletions they have in their models for each species?

      We have added citations to this sentence to show that there are different published models to describe the successive block duplications (line 307). Our models in Figure 6 and Figure 7 are meant to aggregate past work and integrate our own, and thus they were not built strictly by parsimony. References can be found in Figure 6 - figure supplement 1 and Figure 7 - figure supplement 1.

      (14) Lines 315-315. 'Ours is the first work to show that MHC-U is actually an MHC-A-related gene fragment.' This sentence should be deleted. Other researchers had already inferred that MHC-U is actually an MHC-A-related gene fragment more than 25 years ago (Kulski et al 1999, 2000) when the MHC-U was originally named MHC-21.

      While these works certainly describe MHC-U/MHC-21 as a fragment in the 𝛼-block, any relation to MHC-A was by association only and very few species/haplotypes were examined. So although the idea is not wholly novel, we provide convincing evidence that not only is MHC-U related to MHC-A by sequence, but also that it is a very recent partial duplicate of MHC-A. We show this with Bayesian phylogenetic trees as well as an analysis of haplotypes across many more species than were included in those papers.  

      (15) Lines 361-362. 'Notably, our work has revealed that MHC-V is an old fragment.' This is not a new finding or hypothesis. Previous phylogenetic analysis and gene duplication modelling had already inferred HLA-V (formerly HLA-75) to be an old fragment (Kulski et al 1999, 2000).

      By “old,” we mean older than previous hypotheses suggest. Previous work has proposed that MHC-V and -P were duplicated together, with MHC-V deriving from an MHC-A/H/V ancestral gene and MHC-P deriving from an MHC-W/T/P ancestral gene (Kulski (2005), Shiina (1999)). However, our analysis (Figure 5A) shows that MHC-V sequences form a monophyletic clade outside of the MHC-W/P/T group of genes as well as outside of the MHC-A/B/C/E/F/G/J/K/L group of genes, which is not consistent with MHC-A and -V being closely related. Thus, we conclude that MHC-V split off earlier than the differentiation of these other gene groups and is thus older than previously thought. We explain this in the text as well (lines 317-327) and in Appendix 3.  

      (16) Line 431-433. 'the Class II genes have been largely stable across the mammals, although we do see some lineage-specific expansions and contractions (Figure 2 and Figure 2-gure Supplement 2).' Please provide one or two references to support this statement. Is 'gure' a typo?

      We corrected this typo, thank you! This conclusion is simply drawn from the data presented in Figure 2 and Figure 2 - figure supplement 2. The data itself comes from a variety of sources, which are already included in the supplement as Figure 2 - source data 1.

      (17) Line 437. 'We discovered far more "specific" events in Class I, while "broad-scale" events were predominant in Class II.' Please define the difference between 'specific' and 'broad-scale'.

      These terms are defined in the previous sentence (lines 466-469).

      (18) Lines 450-451. 'This shows that classical genes experience more turnover and are more often affected by long-term balancing selection or convergent evolution.' Is balancing selection a form of divergent evolution that is different from convergent evolution? Please explain in more detail how and why balancing selection or convergent evolution affects classical and nonclassical genes differently.

      Balancing selection acts to keep alleles at moderate frequencies, preventing any from fixing in the population. In contrast, convergent evolution describes sequences or traits becoming similar over time even though they are not similar by descent. While we cannot know exactly what selective forces have occurred in the past, we observe different patterns in the trees for each type of gene. In Figures 1 and 2, viewers can see at first glance that the nonclassical genes (which are named throughout the text and thoroughly described in Appendix 3) appear to be longer-lived than the classical genes. In addition, lines 204-222 and 475-488 describe topological differences in the BEAST2 trees of these two types of genes. However, we acknowledge that it could be helpful to have additional, complementary information about the classical vs. non-classical genes. Thus, we have added a sentence and reference to our companion paper (Fortier and Pritchard, 2025), which focuses on long-term balancing selection and draws further contrast between classical and non-classical genes. In lines 481-484, we added “We further explore the differences between classical and non-classical genes in our companion paper, finding ancient trans-species polymorphism at the classical genes but not at the non-classical genes \citep{Fortier2025b}.”

      References

      Some references in the supplementary materials such as Alvarez (1997), Daza-Vamenta (2004), Rojo (2005), Aarnink (2014), Kulski (2022), and others are missing from the Reference list. Please check that all the references in the text and the supplementary materials are listed correctly and alphabetically.

      We will make sure that these all show up properly in the proof.

      Reviewer #3 (Public review):

      Summary:

      The article provides the most comprehensive overview of primate MHC class I and class II genes to date, combining published data with an exploration of the available genome assemblies in a coherent phylogenetic framework and formulating new hypotheses about the evolution of the primate MHC genomic region.

      Strengths:

      I think this is a solid piece of work that will be the reference for years to come, at least until population-scale haplotype-resolved whole-genome resequencing of any mammalian species becomes standard. The work is timely because there is an obvious need to move beyond short amplicon-based polymorphism surveys and classical comparative genomic studies. The paper is data-rich and the approach taken by the authors, i.e. an integrative phylogeny of all MHC genes within a given class across species and the inclusion of often ignored pseudogenes, makes a lot of sense. The focus on primates is a good idea because of the wealth of genomic and, in some cases, functional data, and the relatively densely populated phylogenetic tree facilitates the reconstruction of rapid evolutionary events, providing insights into the mechanisms of MHC evolution. Appendices 1-2 may seem unusual at first glance, but I found them helpful in distilling the information that the authors consider essential, thus reducing the need for the reader to wade through a vast amount of literature. Appendix 3 is an extremely valuable companion in navigating the maze of primate MHC genes and associated terminology.

      Weaknesses:

      I have not identified major weaknesses and my comments are mostly requests for clarification and justification of some methodological choices.

      Thank you so much for your kind and supportive review!

      Reviewer #1 (Recommendations for the authors):

      (1) Line 151: How is 'extensively studied' defined?

      Extensively studied is not a strict definition, but a few organisms clearly stand apart from the rest in terms of how thoroughly their MHC regions have been studied. For example, the macaque is a model organism, and individuals from many different species and populations have had their MHC regions fully sequenced. This is in contrast to the gibbon, for example, in which there is some experimental evidence for the presence of certain genes, but no MHC region has been fully sequenced from these animals.

      (2) Can you clarify how 'classical' and 'non-classical' MHC genes are being determined in your analysis?

      Classical genes are those whose protein products perform antigen presentation to T cells and are directly involved in adaptive immunity, while non-classical genes are those whose protein products do not do this. For example, these non-classical genes might code for proteins that interact with receptors on Natural Killer cells and influence innate immunity. The roles of these proteins are not necessarily conserved between closely related species, and experimental evidence is needed to evaluate this. However, in the absence of such evidence, wherever possible we have provided our best guess as to the roles of the orthologous genes in other species, presented in Figure 1 - source data 1 and Figure 2 - source data 1. This is based on whatever evidence is available at the moment, sometimes experimental but typically based on dN/dS ratios and other indirect measures.

      (3) I find the overall tone of the paper to be very descriptive, and at times meandering and repetitive, with a lot of similar kinds of statements being repeated about gene gain/loss. This is perhaps inevitable because a single question is being asked of each of many subsets of MHC gene types, and even exons within gene types, so there is a lot of repetition in content with a slightly different focus each time. This does not help the reader stay focused or keep track. I found myself wishing for a clearly defined question or hypothesis, or some rate parameter in need of estimation. I would encourage the authors to tighten up their phrasing, or consider streamlining the results with some better signposting to organize ideas within the results.

      We totally understand your critique, as we talk about a wide range of specific genes and gene groups in this paper. To improve readability, we have added many more signposting phrases and sentences:

      “Aside from MHC-DRB, …” (line 173)

      “Now that we had a better picture of the landscape of MHC genes present in different primates, we wanted to understand the genes’ relationships. Treating Class I, Class IIA, and Class IIB separately, ...” (lines 179-180)

      “We focus first on the Class I genes.” (line 191)

      “... for visualization purposes…” (line 195)

      “We find that sequences do not always assort by locus, as would be expected for a typical gene.” (lines 196-197)

      “... rather than being directly orthologous to the ape/OWM MHC-G genes.” (lines 201-202)

      “Appendix 3 explains each of these genes in detail, including previous work and findings from this study.” (lines 202-203)

      “... (but not with NWM) …” (line 208)

      “While genes such as MHC-F have trees which closely match the overall species tree, other genes show markedly different patterns, …” (lines 212-213)

      “Thus, while some MHC-G duplications appear to have occurred prior to speciation events within the NWM, others are species-specific.” (lines 218-219)

      “... indicating rapid evolution of many of the Class I genes” (lines 220-221)

      “Now turning to the Class II genes, …” (line 223)

      “(see Appendix 2 for details on allele nomenclature)” (line 238)

      “(e.g. MHC-DRB1 or -DRB2)” (line 254)

      “...  meaning their names reflect previously-observed functional similarity more than evolutionary relatedness.” (lines 257-258)

      “(see Appendix 3 for more detail)” (line 311)

      “(a 5'-end fragment)” (line 324)

      “Therefore, we support past work that has deemed MHC-V an old fragment.” (lines 326-327)

      “We next focus on MHC-U, a previously-uncharacterized fragment pseudogene containing only exon 3.” (lines 328-329)

      “However, it is present on both chimpanzee haplotypes and nearly all human haplotypes, and we know that these haplotypes diverged earlier---in the ancestor of human and gorilla. Therefore, ...” (lines 331-333)

      “Ours is the first work to show that MHC-U is actually an MHC-A-related gene fragment and that it likely originated in the human-gorilla ancestor.” (lines 334-336)

      “These pieces of evidence suggest that MHC-K and -KL duplicated in the ancestor of the apes.” (lines 341-342)

      “Another large group of related pseudogenes in the Class I $\alpha$-block includes MHC-W, -P, and -T (see Appendix 3 for more detail).” (lines 349-350)

      “...to form the current physical arrangement” (line 354)

      “Thus, we next focus on the behavior of this subgroup in the trees.” (line 358)

      “(see Appendix 3 for further explanation).” (line 369)

      “Thus, for the first time we show that there must have been three distinct MHC-W-like genes in the ape/OWM ancestor.” (lines 369-371)

      “... and thus not included in the previous analysis. ” (lines 376-377)

      “MHC-Y has also been identified in gorillas (Gogo-Y) (Hans et al., 2017), so we anticipate that Gogo-OLI will soon be confirmed. This evidence suggests that the MHC-Y and -OLI-containing haplotype is at least as old as the human-gorilla split. Our study is the first to place MHC-OLI in the overall story of MHC haplotype evolution” (lines 381-384)

      “Appendix 3 explains the pieces of evidence leading to all of these conclusions (and more!) in more detail.” (lines 395-396)

      “However, looking at this exon alone does not give us a complete picture.” (lines 410-411)

      “...instead of with other ape/OWM sequences, …” (lines 413-414)

      “Figure 7 shows plausible steps that might have generated the current haplotypes and patterns of variation that we see in present-day primates. However, some species are poorly represented in the data, so the relationships between their genes and haplotypes are somewhat unclear.” (lines 427-429)

      “(and more-diverged)” (line 473)

      “(of both classes)” (line 476)

      “..., although the classes differ in their rate of evolution.” (lines 487-488)

      “Including these pseudogenes in our trees helped us construct a new model of $\alpha$-block haplotype evolution.” (lines 517-518)

      (4) Line 480-82: "Notably...." why is this notable? Don't merely state that something is notable, explain what makes it especially worth drawing the reader's attention to: in what way is it particularly significant or surprising?

      We have changed the text from “Notably” to “In particular” (line 390) so that readers are expecting us to list some specific findings. Similarly, we changed “Notably” to “Specifically” (line 515).

      (5) The end of the discussion is weak: "provide context" is too vague and not a strong statement of something that we learned that we didn't know before, or its importance. This is followed by "This work will provide a jumping-off point for further exploration..." such as? What questions does this paper raise that merit further work?

      We have made this paragraph more specific and added some possible future research directions. It now reads “By treating the MHC genes as a gene family and including more data than ever before, this work enhances our understanding of the evolutionary history of this remarkable region. Our extensive set of trees incorporating classical genes, non-classical genes, pseudogenes, gene fragments, and alleles of medical interest across a wide range of species will provide context for future evolutionary, genomic, disease, and immunologic studies. For example, this work provides a jumping-off-point for further exploration of the evolutionary processes affecting different subsets of the gene family and the nuances of immune system function in different species. This study also provides a necessary framework for understanding the evolution of particular allelic lineages within specific MHC genes, which we explore further in our companion paper \citep{Fortier2025b}. Both studies shed light on MHC gene family evolutionary dynamics and bring us closer to understanding the evolutionary tradeoffs involved in MHC disease associations.” (lines 576-586)

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 1 et seq. Classifying genes as having 'classical', 'non-classical' and 'dual' properties is notoriously difficult in non-model organisms due to the lack of relevant information. As you have characterised a number of genes for the first time in this paper and could not rely entirely on published classifications, please indicate the criteria you used for classification.

      The roles of these proteins are not necessarily conserved between closely related species, and experimental evidence is needed to evaluate this. However, in the absence of such evidence, wherever possible we have provided our best guess as to the roles of the orthologous genes in other species, presented in Figure 1 - source data 1 and Figure 2 - source data 1. This is based on whatever evidence is available at the moment, sometimes experimental but typically based on dN/dS ratios and other indirect measures.

      (2) Line 61 It's important to mention that classical MHC molecules present antigenic peptides to T cells with variable alphabeta T cell receptors, as non-classical MHC molecules may interact with other T cell subsets/types.

      Thank you for pointing this out; we have updated the text to make this clearer (lines 63-65). We changed “‘Classical’ MHC molecules perform antigen presentation to T cells---a key part of adaptive immunity---while ‘non-classical’ molecules have niche immune roles.” to “‘Classical’ MHC molecules perform antigen presentation to T cells with variable alphabeta TCRs---a key part of adaptive immunity---while ‘non-classical’ molecules have niche immune roles.”

      (3) Perhaps it's worth mentioning in the introduction that you are deliberately excluding highly divergent non-classical MHC molecules such as CD1.

      Thank you, it’s worth clarifying exactly what molecules we are discussing. We have added a sentence to the introduction (lines 38-43): “Having originated in the jawed vertebrates, this group of genes is now involved in diverse functions including lipid metabolism, iron uptake regulation, and immune system function (proteins such as zinc-𝛼2-glycoprotein (ZAG), human hemochromatosis protein (HFE), MHC class I chain–related proteins (MICA, MICB), and the CD1 family) \citep{Hansen2007,Kupfermann1999,Kaufman2022,Adams2013}. However, here we focus on…”

      (4) Line 94-105 This material presents results, it could be moved to the results section as it now somewhat disrupts the flow.

      We feel it is important to include a “teaser” of the results in the introduction, which can be slightly more detailed than that in the abstract.

      (5) Line 118-131 This opening section of the results sets the stage for the whole presentation and contains important information that I feel needs to be expanded to include an overview and justification of your methodological choices. As the M&M section is at the end of the MS (and contains limited justification), some information on two aspects is needed here for the benefit of the reader. First, as far as I understand, all phylogenetic inferences were based entirely on DNA sequences of individual (in some cases concatenated) exons. It would be useful for the reader to explain why you've chosen to rely on DNA rather than protein sequences, even though some of the genes you include in the phylogenetic analysis are highly divergent. Second, a reader might wonder how the "maximum clade credibility tree" from the Bayesian analysis compares to commonly seen trees with bootstrap support or posterior probability values assigned to particular clades. Personally, I think that the authors' approach to identifying and presenting representative trees is reasonable (although one might wonder why "Maximum clade credibility tree" and not "Maximum credibility tree" https://www.beast2.org/summarizing-posterior-trees/), since they are working with a large number of short, sometimes divergent and sometimes rather similar sequences - in such cases, a requirement for strict clade support could result in trees composed largely of polytomies. However, I feel it's necessary to be explicit about this and to acknowledge that the relationships represented by fully resolved bifurcating representative trees and interpreted in the study may not actually be highly supported in the sense that many readers might expect. In other words, the reader should be aware from the outset of what the phylogenies that are so central to the paper represent.

      We chose to rely on DNA rather than protein sequences because convergent evolution is likely to happen in regions that code for extremely important functions such as adaptive and innate immunity. Convergent evolution acts upon proteins while trans-species polymorphism retains ancient nucleotide variation, so studying the DNA sequence can help tease apart convergent evolution from trans-species polymorphism.
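
      The reasoning above can be illustrated with a minimal sketch. The two sequences below are invented toy data, not MHC sequences, and the helper names (`translate`, `identity`) are hypothetical. The sketch only shows that two sequences can encode an identical peptide while differing at many nucleotide positions, which is the kind of signal a protein-only comparison would miss.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate an in-frame DNA string into a one-letter protein string."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def identity(a, b):
    """Fraction of aligned positions at which two equal-length strings match."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical toy sequences: same peptide, different codon usage.
SEQ_A = "GCTCGTTCA"
SEQ_B = "GCCAGAAGC"

protein_identity = identity(translate(SEQ_A), translate(SEQ_B))
nucleotide_identity = identity(SEQ_A, SEQ_B)
print(translate(SEQ_A), translate(SEQ_B), protein_identity, nucleotide_identity)
```

      In this toy case the proteins are identical while only a third of the nucleotide positions match. Under the argument above, sequences shared by descent would be expected to match at synonymous sites as well, whereas convergence at the protein level need not preserve them.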

      As for the “maximum clade credibility tree”, this is a matter of confusing nomenclature. In the online reference guide (https://www.beast2.org/summarizing-posterior-trees/), the tree with the maximum product of the posterior clade probabilities is called the “maximum clade credibility tree” while the tree that has the maximum sum of posterior clade probabilities is called the “Maximum credibility tree”. The “Maximum credibility tree” (referring to the sum) appears to have only been named in this way in the first version of TreeAnnotator. However, the version of TreeAnnotator that I used lists the options “maximum clade credibility tree” and “maximum sum of clade probabilities”. So the context suggests that the “maximum clade credibility tree” option is actually maximizing the product. This “maximum clade credibility tree” is the setting I used for this project (in TreeAnnotator version 2.6.3).
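
      The two summary criteria discussed above (maximizing the product versus the sum of posterior clade probabilities) can be contrasted with a small sketch. This is a toy illustration under stated assumptions, not TreeAnnotator's implementation: trees are reduced to sets of clades, branch lengths and node heights are ignored, and the four-taxon posterior sample and all function names are hypothetical.

```python
from collections import Counter
from math import log


def clade(*taxa):
    """A clade is the set of taxa below an internal node."""
    return frozenset(taxa)


def tree(*clades):
    """A rooted topology, reduced to its set of non-trivial clades."""
    return frozenset(clades)


# Hypothetical posterior sample of 20 rooted trees on taxa A, B, C, D.
T_AC_ACD = tree(clade("A", "C"), clade("A", "C", "D"))
T_AB_ABC = tree(clade("A", "B"), clade("A", "B", "C"))
T_AB_ABD = tree(clade("A", "B"), clade("A", "B", "D"))
T_AB_CD = tree(clade("A", "B"), clade("C", "D"))
SAMPLE = [T_AC_ACD] * 8 + [T_AB_ABC] * 5 + [T_AB_ABD] * 4 + [T_AB_CD] * 3


def clade_posteriors(trees):
    """Posterior clade probability = fraction of sampled trees with the clade."""
    counts = Counter(c for t in trees for c in t)
    return {c: n / len(trees) for c, n in counts.items()}


def log_product_score(t, post):
    """Log of the product of posterior clade probabilities."""
    return sum(log(post[c]) for c in t)


def sum_score(t, post):
    """Sum of posterior clade probabilities."""
    return sum(post[c] for c in t)


def best_tree(trees, score):
    """Return the sampled topology with the highest score."""
    post = clade_posteriors(trees)
    candidates = list(dict.fromkeys(trees))  # unique trees, stable order
    return max(candidates, key=lambda t: score(t, post))


by_product = best_tree(SAMPLE, log_product_score)
by_sum = best_tree(SAMPLE, sum_score)
print(by_product == T_AC_ACD, by_sum == T_AB_ABC)
```

      In this toy sample the clade {A, B} appears in 60% of trees but is split across three topologies, so the sum criterion selects {A, B}, {A, B, C} while the product criterion selects {A, C}, {A, C, D}. The two options can therefore return different summary trees from the same posterior sample.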

      We agree that readers may not fully grasp what the collapsed trees represent upon first read. We have added a sentence to the beginning of the results (line 188-190) to make this more explicit.

      (6) Line 224, you're referring to the DPB1*09 lineage, not the DRB1*09 lineage.

      Indeed! We have corrected these typos.

      (7) Line 409, why "Differences between MHC subfamilies" and not "Differences between MHC classes"?

      We chose the word “subfamilies” because we discuss the difference between classical and non-classical genes in addition to differences between Class I and Class II genes.

      (8) Line 529-544 This might work better as a table.

      We agree! This information is now presented as Table 1.

      (9) Line 547 MHC-DRB9 appears out of the blue here - please say why you are singling it out.

      Great point! We added a paragraph (lines 614-623) to explain why this was necessary.

      (10) Line 550-551 Even though you've screened the hits manually, it would be helpful to outline your criteria for this search.

      Thank you! We’ve added a couple of sentences to explain how we did this (lines 607-610).

      (11) Line 556-580 please provide nucleotide alignments as supplementary data so that the reader can get an idea of the actual divergence of the sequences that have been aligned together.

      Thank you! We’ve added nucleotide alignments as supplementary files.

      (12) Line 651-652 Why "Maximum clade credibility tree" and not "Maximum credibility tree"? 

      Repeat of (5). This is a matter of confusing nomenclature. In the online reference guide (https://www.beast2.org/summarizing-posterior-trees/), the tree with the maximum product of the posterior clade probabilities is called the “maximum credibility tree” while the tree that has the maximum sum of posterior clade probabilities is called the “Maximum credibility tree”. The “Maximum credibility tree” (referring to the sum) appears to have only been named in this way in the first version of TreeAnnotator. However, the version of TreeAnnotator that I used lists the options “maximum clade credibility tree” and “maximum sum of clade probabilities”. So the context suggests that the “maximum clade credibility tree” option is actually maximizing the product. This “maximum clade credibility tree” is the setting I used for this project (in TreeAnnotator version 2.6.3).

      (13) In the appendices, links to references do not work as expected.

      We will make sure these work properly when we receive the proofs.

    1. eLife Assessment

      This study provides important insights into how researchers can use perceptual metamers to formally explore the limits of visual representations at different processing stages. The framework is compelling and the data largely support the claims, subject to minor caveats.

    2. Reviewer #1 (Public review):

      This is an interesting study on the nature of representations across the visual field. The question of how peripheral vision differs from foveal vision is a fascinating and important one. The majority of our visual field is extra-foveal yet our sensory and perceptual capabilities decline in pronounced and well-documented ways away from the fovea. Part of the decline is thought to be due to spatial averaging ('pooling') of features. Here, the authors contrast two models of such feature pooling with human judgments of image content. They use much larger visual stimuli than in most previous studies, and some sophisticated image synthesis methods to tease apart the prediction of the distinct models.

      More importantly, in so doing, the researchers thoroughly explore the general approach of probing visual representations through metamers: stimuli that are physically distinct but perceptually indistinguishable. The work is embedded within a rigorous and general mathematical framework for expressing equivalence classes of images and how visual representations influence these. They describe how image-computable models can be used to make predictions about metamers, which can then be compared to make inferences about the underlying sensory representations. The main merit of the work lies in providing a formal framework for reasoning about metamers and their implications, for comparing models of sensory processing in terms of the metamers that they predict, and for mapping such models onto physiology. Importantly, they also consider the limits of what can be inferred about sensory processing from metamers derived from different models.

      Overall, the work is of a very high standard and represents a significant advance over our current understanding of perceptual representations of image structure at different locations across the visual field. The authors do a good job of capturing the limits of their approach and I particularly appreciated the detailed and thoughtful Discussion section and the suggestion to extend the metamer-based approach described in the MS with observer models. The work will have an impact on researchers studying many different aspects of visual function including texture perception, crowding, natural image statistics and the physiology of low- and mid-level vision.

      The main weaknesses of the original submission relate to the writing. A clearer motivation could have been provided for the specific models that they consider, and the text could have been written in a more didactic and easy to follow manner. The authors could also have been more explicit about the assumptions that they make.

      Comments following re-submission:

      Overall, I think the authors have done a satisfactory job of addressing most of the points I raised.

      There's one final issue which I think still needs better discussion.

      I think reviewer 2 articulated better than I have the point I was concerned about: the relationship between JNDs and metamers as depicted in the schematics and indeed in the whole conceptualization.

      I think the issue here is that there seems to be a conflating of two concepts, 'subthreshold' and 'metamer', and I'm not convinced it is entirely unproblematic. It's true that two stimuli that cannot be discriminated from one another due to the physical differences being too small to detect reliably by the visual system are a form of metamer in the strict definition 'physically different, but perceptually the same'.

      However, I don't think this is the scientifically substantial notion of metamer that enabled insights into trichromacy. That form of metamerism is due to the principle of univariance in feature encoding, and involves conditions in which physically very different stimuli are mapped to one and the same point in sensory encoding space whether or not there is any noise in the system. When I say 'physically very different' I mean different by a large enough amount that they would be far above threshold, potentially orders of magnitude larger than a JND if the system's noise properties were identical but the system used a different sensory basis set to measure them. This seems to be a very different kind of 'physically different, but perceptually the same'.
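
      A minimal sketch of this distinction, assuming a hypothetical noise-free linear encoder with three channels (by analogy with trichromacy); the encoder, the stimulus dimensionality, and the perturbation sizes are all made up for illustration and are not taken from the manuscript. A large perturbation along the encoder's null space leaves the response unchanged, whereas a tiny perturbation along a measured direction changes it slightly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear encoder: 3 channels measuring a 10-dimensional stimulus.
encoder = rng.normal(size=(3, 10))

# Rows of vt beyond the encoder's rank span its null space.
_, singular_values, vt = np.linalg.svd(encoder)
null_direction = vt[3]       # invisible to the encoder
measured_direction = vt[0]   # most strongly measured direction

stimulus_a = rng.normal(size=10)

# Univariance-style metamer: physically far away, identical encoding.
stimulus_b = stimulus_a + 100.0 * null_direction

# Small-difference case: physically close, encoding differs slightly.
stimulus_c = stimulus_a + 1e-3 * measured_direction


def response_distance(x, y):
    """Distance between two stimuli in the encoder's response space."""
    return np.linalg.norm(encoder @ x - encoder @ y)


print(np.linalg.norm(stimulus_a - stimulus_b), response_distance(stimulus_a, stimulus_b))
print(np.linalg.norm(stimulus_a - stimulus_c), response_distance(stimulus_a, stimulus_c))
```

      Whether the second pair is discriminable would depend on the noise in the system; the first pair is indistinguishable to this encoder regardless of noise.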

      I do think the notion of metamerism can obviously be very usefully extended beyond photoreceptors and photon absorptions. In the interesting case of texture metamers, what I think is meant is that stimuli would be discriminable if scrutinised in the fovea, but because they have the same statistics they are treated as equivalent. I think the discussion of this could still be more clearly articulated in the manuscript. It would benefit from a more thorough discussion of the difference between metamerism and subthreshold, especially in the context of the Voronoi diagrams at the beginning.

      It needs to be made clear to the reader why it is that two stimuli that are physically similar (e.g., just spanning one of the edges in the diagram) can be discriminable, while at the same time, two stimuli that are very different (e.g., at opposite ends of a cell) can't.

      Do the cells include BOTH those sets of stimuli that cannot be discriminated just because of internal noise AND those that can't be discriminated because they are projected to literally the same point in the sensory encoding space? What are the strengths and limits of models that involve the strict binarization of sensory representations, and how can they be integrated with models dealing with continuous differences? These seem like important background concepts that ought to be included in either the introduction or discussion sections. In this context it might also be helpful to refer to the notion of 'visual equivalence' as described by:

      Ramanarayanan, G., Ferwerda, J., Walter, B., & Bala, K. (2007). Visual equivalence: towards a new standard for image fidelity. ACM Transactions on Graphics (TOG), 26(3), 76-es.

      Other than that, I congratulate the authors on a very interesting study, and look forward to reading the final version.

    3. Reviewer #2 (Public review):

      Summary:

      The authors have improved clarity overall and have spoken to most of the issues raised by the reviewers. There are still two outstanding problems however, where issues raised during the review were inappropriately dismissed in the manuscript. These should be explicitly addressed as limitations to the results presented (no eye tracking), and early pilot experiments that informed the experiments as presented (pink noise) rather than brushed off as 'unnecessary' and 'would be uninformative'.

      Eye tracking:

      It is generally accepted that experiments testing stimuli presented at specific locations in peripheral vision require eye tracking to ensure that the stimulus is presented as expected, in particular, in the correct location. As I stated in the previous round of review, while a stimulus presentation time of 200ms does help eliminate some saccades, it does not eliminate the possibility that subjects were not fixating well during stimulus onset. I am also unclear what the authors mean by 'trained observer' in this context, though the authors state that an author subject in a different portion of the paper is an 'expert observer'. Does this mean the 'trained observers' are non-expert recruited subjects? Given the conditions tested differ from previous work (Freeman & Simoncelli, 2011) *these differences are a main contribution of the paper!* which DID include eye tracking in a subset of subjects, it is entirely possible to get similar results to this work in the context of non eye-tracking controlled stimulus presentation. The reasons now in the manuscript are not reasons that make eye tracking 'considered unnecessary'.

      I appreciate that the authors now state the lack of eye tracking explicitly, but believe the paper needs to at least state that this is a limitation of the results reported; describing eye tracking as 'considered unnecessary' is neither reasonable nor the norm in this subfield.

      N=1: The authors now state clearly the limitations of a single subject in the manuscript, and state the expertise level of this subject.

      Large number of trials: The authors now address this and include an enumeration of the large number of trials.

      Simple Models / Physiology comparison: I support the choice to reduce claims regarding tight connections to physiology, and appreciate the explanation of the luminance model.

      Previous Work: I appreciate the authors' changes to the introduction, both in discussing previous work and citation fixes.

      Blurred White, Pink Noise: While the authors now address pink noise, the explanation for such stimuli being expected to be uninformative is confusing to me. The manuscript now first states that pink noise is a natural choice, then claims it would be uninformative, while also stating in the rebuttal (not the manuscript) that they tried it and it indeed reduced the artifacts they note. The logic of the experiments indeed relies on finding the smallest critical scaling value, which is measured by subjects determining if a synthesis is similar or different to a target or second synth. A synthesis free from artifacts would surely affect the subjects' responses and the smallest critical scaling measured.

      The statement that the authors experimented with pink noise early on and found this able to address the artifacts should be stated in the manuscript itself, not just in the rebuttal, and the blanket statement that this experiment would be 'uninformative' is incorrect. Surely this early pilot the authors mention in the rebuttal was informative to designing the experiments that appear in the final paper, and would be an informative experiment to include.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This is an interesting study of the nature of representations across the visual field. The question of how peripheral vision differs from foveal vision is a fascinating and important one. The majority of our visual field is extra-foveal yet our sensory and perceptual capabilities decline in pronounced and well-documented ways away from the fovea. Part of the decline is thought to be due to spatial averaging (’pooling’) of features. Here, the authors contrast two models of such feature pooling with human judgments of image content. They use much larger visual stimuli than in most previous studies, and some sophisticated image synthesis methods to tease apart the prediction of the distinct models.

      More importantly, in so doing, the researchers thoroughly explore the general approach of probing visual representations through metamers: stimuli that are physically distinct but perceptually indistinguishable. The work is embedded within a rigorous and general mathematical framework for expressing equivalence classes of images and how visual representations influence these. They describe how image-computable models can be used to make predictions about metamers, which can then be compared to make inferences about the underlying sensory representations. The main merit of the work lies in providing a formal framework for reasoning about metamers and their implications, for comparing models of sensory processing in terms of the metamers that they predict, and for mapping such models onto physiology. Importantly, they also consider the limits of what can be inferred about sensory processing from metamers derived from different models.

      Overall, the work is of a very high standard and represents a significant advance over our current understanding of perceptual representations of image structure at different locations across the visual field. The authors do a good job of capturing the limits of their approach and I particularly appreciated the detailed and thoughtful Discussion section and the suggestion to extend the metamer-based approach described in the MS with observer models. The work will have an impact on researchers studying many different aspects of visual function including texture perception, crowding, natural image statistics, and the physiology of low- and mid-level vision.

      The main weaknesses of the original submission relate to the writing. A clearer motivation could have been provided for the specific models that they consider, and the text could have been written in a more didactic and easy-to-follow manner. The authors could also have been more explicit about the assumptions that they make.

      Thank you for the summary. We appreciate the positives noted above. We address the weaknesses point by point below.

      Reviewer #2 (Public Review):

      Summary

      This paper expands on the literature on spatial metamers, evaluating different aspects of spatial metamers including the effect of different models and initialization conditions, as well as the relationship between metamers of the human visual system and metamers for a model. The authors conduct psychophysics experiments testing variations of metamer synthesis parameters including type of target image, scaling factor, and initialization parameters, and also compare two different metamer models (luminance vs energy). An additional contribution is doing this for a field of view larger than has been explored previously.

      General Comments

      Overall, this paper addresses some important outstanding questions regarding comparing original to synthesized images in metamer experiments and begins to explore the effect of noise vs image seed on the resulting syntheses. While the paper tests some model classes that could be better motivated, and the results are not particularly groundbreaking, the contributions are convincing and undoubtedly important to the field. The paper includes an interesting Voronoi-like schematic of how to think about perceptual metamers, which I found helpful, but for which I do have some questions and suggestions. I also have some major concerns regarding incomplete psychophysical methodology including lack of eye-tracking, results inferred from a single subject, and a huge number of trials. I have only minor typographical criticisms and suggestions to improve clarity. The authors also use very good data reproducibility practices.

      Thank you for the summary. We appreciate the positives noted above. We address the weaknesses point by point below.

      Specific Comments

      Experimental Setup

      Firstly, the experiments do not appear to utilize an eye tracker to monitor fixation. Without eye tracking or another manipulation to ensure fixation, we cannot ensure the subjects were fixating the center of the image, and viewing the metamer as intended. While the short stimulus time (200ms) can help minimize eye movements, this does not guarantee that subjects began the trial with correct fixation, especially in such a long experiment. While Covid-19 did at one point limit in-person eye-tracked experiments, the paper reports no such restrictions that would have made the addition of eye-tracking impossible. While such a large-scale experiment may be difficult to repeat with the addition of eye tracking, the paper would be greatly improved with, at a minimum, an explanation as to why eye tracking was not included.

      Addressed on pg. 25, starting on line 658.

      Secondly, many of the comparisons later in the paper (Figures 9, 10) are made from a single subject. N=1 is not typically accepted as sufficient to draw conclusions in such a psychophysics experiment. Again, if there were restrictions limiting this it should be discussed. Also (P11), is subject sub-00 an author? Other expert? A naive subject? The subject’s expertise in viewing metamers will likely affect their performance.

      Addressed on pg. 14, starting on line 308.

      Finally, the number of trials per subject is quite large. 13,000 over 9 sessions is much larger than most human experiments in this area. The reason for this should be justified.

      In general, we needed a large number of trials to fit full psychometric functions for stimuli derived for both models, with both types of comparison, both initializations, and over many target images. We could have eliminated some of these, but feel that having a consistent dataset across all these conditions is a strength of the paper.

      In addition to the sentence on pg. 14, line 318, a full enumeration of trials is now described on pg. 23, starting on line 580.

      Model

      For the main experiment, the authors compare the results of two models: a ’luminance model’ that spatially pools mean luminance values, and an ’energy model’ that spatially pools energy calculated from a multi-scale pyramid decomposition. They show that these models create metamers that result in different thresholds for human performance, and therefore different critical scaling parameters, with the basic luminance pooling model producing a scaling factor 1/4 that of the energy model. While this is certain to be true, due to the luminance model being so much simpler, the motivation for the simple luminance-based model as a comparison is unclear.

      The use of simple models is now addressed on pg. 3, starting on line 98, as well as the sentence starting on pg. 4 line 148: the luminance model is intended as the simplest possible pooling model.
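
      To make the idea of a "simplest possible pooling model" concrete, here is a deliberately simplified one-dimensional sketch. It is not the authors' implementation: the box-shaped, non-overlapping windows, the eccentricity range, and the two scaling values are assumptions chosen only for illustration. Window width grows in proportion to eccentricity, and any two stimuli with the same mean luminance in every window are metamers for this toy model.

```python
import numpy as np


def window_edges(min_ecc, max_ecc, scaling):
    """Edges of 1-D windows whose width is scaling * inner-edge eccentricity."""
    edges = [min_ecc]
    while edges[-1] < max_ecc:
        edges.append(edges[-1] * (1.0 + scaling))
    edges[-1] = max_ecc  # clip the outermost window to the stimulus extent
    return np.array(edges)


def window_index(ecc, edges):
    """Index of the window containing each eccentricity sample."""
    return np.digitize(ecc, edges[1:-1])


def pooled_luminance(luminance, ecc, edges):
    """Mean luminance within each pooling window."""
    idx = window_index(ecc, edges)
    return np.array([luminance[idx == k].mean() for k in range(len(edges) - 1)])


rng = np.random.default_rng(0)
ecc = np.linspace(0.5, 20.0, 2000)      # eccentricity of each sample (made up)
target = rng.uniform(size=ecc.size)     # toy "target" luminance profile

edges = window_edges(0.5, 20.0, scaling=0.25)
idx = window_index(ecc, edges)

# Build a toy model metamer by shuffling samples within each window:
# the stimulus changes, but every pooled mean is preserved.
metamer = target.copy()
for k in np.unique(idx):
    metamer[idx == k] = rng.permutation(target[idx == k])

print(np.allclose(pooled_luminance(target, ecc, edges),
                  pooled_luminance(metamer, ecc, edges)))

# A smaller scaling value yields more, smaller windows (more constraints).
print(len(window_edges(0.5, 20.0, 0.0625)) - 1, len(edges) - 1)
```

      In this toy setting, reducing the scaling value tightens the constraints on the synthesized stimulus, which is the sense in which a smaller critical scaling corresponds to a more demanding match.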

      The authors claim that this luminance model captures the response of retinal ganglion cells, often modeled as a center-surround operation (Rodieck, 1964). I am unclear in what aspect(s) the authors claim these center-surround neurons mimic a simple mean luminance, especially in the context of evidence supporting a much more complex role of RGCs in vision (Atick & Redlich, 1992). Why do the authors not compare the energy model to a model that captures center-surround responses instead? Do the authors mean to claim that the luminance model captures only the pooling aspects of an RGC model? This is particularly confusing as Figures 6 and 9 show the luminance and energy models for original vs synth aligning with the scaling of Midget and Parasol RGCs, respectively. These claims should be more clearly stated, and citations included to motivate this. Similarly, with the energy model, the physiological evidence is very loosely connected to the model discussed.

      We have removed the bars showing potential scaling values measured by electrophysiology in the primate visual system and attempted to clarify our language around the relationship between these models and physiology. Our metamer models are only loosely connected to the physiology, and we’ve decided in revision not to imply any direct connection between the model parameters and physiological measurements. The models should instead be understood as loosely inspired by physiology, but not as a tool to localize the representation (as was done in the Freeman paper).

      The physiological scaling values are still used as the mean of the priors on the critical scaling value for model fitting, as described on pg. 27, starting on line 698.

      Prior Work:

      While the explorations in this paper clearly have value, it does not present any particularly groundbreaking results, and those reported are consistent with previous literature. The explorations around critical eccentricity measurement have been done for texture models (Figure 11) in multiple papers (Freeman 2011, Wallis, 2019, Balas 2009). In particular, Freeman 2011 demonstrated that simpler models, representing measurements presumed to occur earlier in visual processing need smaller pooling regions to achieve metamerism. This work’s measurements for the simpler models tested here are consistent with those results, though the model details are different. In addition, Brown, 2023 (which is miscited) also used an extended field of view (though not as large as in this work). Both Brown 2023, and Wallis 2019 performed an exploration of the effect of the target image. Also, much of the more recent previous work uses color images, while the authors’ exploration is only done for greyscale.

      We were pleased to find consistency of our results with previous studies, given the (many) differences in stimuli and experimental conditions (especially viewing angle), while also extending to new results with the luminance model, and the effects of initialization. Note that only one of the previous studies (Freeman and Simoncelli, 2011) used a pooled spectral energy model. Moreover, of the previous studies, only one (Brown et al., 2023) used color images (we have corrected that citation - thanks for catching the error).

      Discussion of Prior Work:

      The prior work on testing metamerism between original vs. synthesized and synthesized vs. synthesized images is presented in a misleading way. Wallis et al.’s prior work on this should not be a minor remark in the post-experiment discussion. Rather, it was surely a motivation for the experiment. The text should make this clear; a discussion of Wallis et al. should appear at the start of that section. The authors similarly cite much of the most relevant literature in this area as a minor remark at the end of the introduction (P3L72).

      The large differences we observed between comparison types (original vs synthesized, compared to synthesized vs synthesized) surprised us. Understanding such differences was not a primary motivation for the work, but it is certainly an important component of our results. In the introduction, we thought it best to lay out the basic logic of the metamer paradigm for foveated vision before mentioning the complications that are introduced in both the Wallis and Brown papers (paragraph beginning p. 3, line 109). Our results confirm and bolster the results of both of those earlier works, which are now discussed more fully in the Introduction (lines 109 and following).

      White Noise: The authors make an analogy to the inability of humans to distinguish samples of white noise. It is unclear, however, that human difficulty distinguishing samples of white noise is a perceptual issue. It could instead be due to cognitive/memory limitations. If one concentrates on an individual patch one can usually tell apart two samples. Support for these difficulties emerging from perceptual limitations should be provided, or the possibility of these limitations being more cognitive should be discussed, or a different analogy employed.

      We now note the possibility of cognitive limits on pg. 8, starting on line 243, as well as pg. 22, line 571. The ability of observers to distinguish samples of white noise is highly dependent on display conditions. A small patch of noise (i.e., large pixels, not too many) can be distinguished, but a larger patch cannot, especially when presented in the periphery. This is more generally true for textures (as shown in Ziemba and Simoncelli (2021)). Samples of white noise at the resolution used in our study are indistinguishable.

      Relatedly, in Figure 14, the authors do not explain why the white noise seeds would be more likely to produce syntheses that end up in different human equivalence classes.

      In Figure 14, we claim that white noise seeds are more likely to end up in the same human equivalence classes than natural image seeds. The explanation as to why we think this may be the case is now addressed on pg. 19, starting on line 423.

      It would be nice to see the effect of pink noise seeds, which mirror the power spectrum of natural images, but do not contain the same structure as natural images - this may address the artifacts noted in Figure 9b.

      The lack of pink noise seeds is now addressed on pg. 19, starting on line 429.

      Finally, the authors note high-frequency artifacts in Figure 4 & P5L135, that remain after syntheses from the luminance model. They hypothesize that this is due to a lack of constraints on frequencies above that defined by the pooling region size. Could these be addressed with a white noise image seed that is pre-blurred with a low pass filter removing the frequencies above the spatial frequency constrained at the given eccentricity?

      The explanation for this is similar to the lack of pink noise seeds in the previous point: the goal of metamer synthesis is model testing, and so for a given model, we want to find model metamers that result in the smallest possible critical scaling value. Taking white noise seed images and blurring them will almost certainly remove the high frequencies visible in luminance metamers in Figure 4 and thus result in a larger critical scaling value, as the reviewer points out. However, the logic of the experiments requires finding the smallest critical scaling value, and so these model metamers would be uninformative. In an early stage of the project, we did indeed synthesize model metamers using pink noise seeds, and observed that the high-frequency artifacts were less prominent.

      Schematic of metamerism: Figures 1, 2, 12, and 13 show a visual schematic of the state space of images, and their relationship to both model and human metamers. This is depicted as a Voronoi diagram, with individual images near the center of each shape, and other images that fall at different locations within the same cell producing the same human visual system response. I felt this conceptualization was helpful. However, implicitly it seems to make a distinction between metamerism and JND (just noticeable difference). I felt this would be better made explicit. In the case of JND, neighboring points, despite having different visual system responses, might not be distinguishable to a human observer.

      Thanks for noting this – in general, metamers are subthreshold, and for the purpose of the diagram, we had to discretize the space showing metameric regions (Voronoi regions) around a set of stimuli. We’ve rewritten the captions to explain this better. We address the binary subthreshold nature of the metamer paradigm in the discussion section (pg. 19, line 438).

      In these diagrams and throughout the paper, the phrase ’visual stimulus’ rather than ’image’ would improve clarity, because the location of the stimulus in relation to the fovea matters whereas the image can be interpreted as the pixels displayed on the computer.

      We agree and have tried to make this change, describing this choice on pg. 3 line 73.

      Other

      The authors show good reproducibility practices with links to relevant code, datasets, and figures.

      Reviewer #1 (Recommendations For The Authors):

      In its current form, I found the introduction to be too cursory. I felt that the article would benefit from a clearer motivation for the two models that are considered as the reader is left unclear why these particular models are of special scientific significance. The luminance model is intended to capture some aspects of retinal ganglion cells response characteristics and the spectral energy model is intended to capture some aspects of the primary visual cortex. However, one can easily imagine models that include the pooling of other kinds of features, and it would be helpful to get an idea of why these are not considered. Which aspects of processing in the retina and V1 are being considered and which are being left out, and why? Why not consider representations that capture even higher-order statistical structure than those covered by the spectral energy model (or even semantics)? I think a bit of rewriting with this in mind could improve the introduction.

      Along similar lines, I would have appreciated having the logic of the study explained more explicitly and didactically: which overarching research question is being asked, how it is operationalised in the models and experiments, and what are the predictions of the different models. Figures 2 and 3 are certainly helpful, but I felt further explanations would have made it easier for the reader to follow. Throughout, the writing could be improved by a careful re-reading with a view to making it easier to understand. For example, where results are presented, a sentence or two expanding on the implications would be helpful.

      I think the authors could also be more explicit about the assumptions they make. While these are obviously (tacitly) included in the description of the models themselves, it would be helpful to state them more openly. To give one example, when introducing the notion of critical scaling, on p.6 the authors state as if it is a self-evident fact that "metamers can be achieved with windows whose size is matched to that of the underlying visual neurons". This presumably is true only under particular conditions, or when specific assumptions about readout from populations of neurons are invoked. It would be good to identify and state such assumptions more directly (this is partly covered in the Discussion section ’The linking proposition underlying the metamer paradigm’, but this should be anticipated or moved earlier in the text).

      We agree that our introduction was too cursory and have reworked it. We have also backed off of the direct comparison to physiology and clarified that we chose these two as the simplest possible pooling models. We have also added sentences at the end of each result section attempting to summarize the implication (before discussing them fully in the discussion). Hopefully the logic and assumptions are now clearer.

      There are also some findings that warrant a more extensive discussion. For example, what is the broader implication of the finding that original vs. synthesised and synthesised vs. synthesised comparisons exhibit very different scaling values? Does this tell us something about internal visual representations, or is it simply capturing something about the stimuli?

      We believe this difference is a result of the stimuli that are used in the experiment and thus the synthesis procedure itself, which interacts with the model’s pooled image feature. We have attempted to update the relevant figures and discussions to clarify this, in the sections starting on pg 17 line 396 and pg. 19 line 417.

      At some points in the paper, a third model (’texture model’) creeps into the discussion, without much explanation. I assume that this refers to models that consider joint (rather than marginal) statistics of wavelet responses, as in the famous Portilla & Simoncelli texture model. However, it would be helpful to the reader if the authors could explain this.

      Addressed on pg. 3, starting on line 94.

      Minor corrections.

      Caption of Figure 3: ’top’ and ’bottom’ should be ’left’ and ’right’

      Line 177: ’smallest tested scaling values tested’. Remove one instance of ’tested’

      Line 212: ’the images-specific psychometric functions’ -> ’image-specific’

      Line 215: ’cloud-like pink noise’. It’s not literally pink noise, so I would drop this.

      Line 236: ’Importantly, these results cannot be predicted from the model, which gives no specific insight as to why some pairs are more discriminable than others’. The authors should specify what we do learn from the model if it fails to provide insight into why some image pairs are more discriminable than others.

      Figure 9: it might be helpful to include small insets with the ’highway’ and ’tiles’ source images to aid the reader in understanding how the images in 9B were generated.

      Table 1 placement should be after it is first referred to on line 258.

      In the Discussion section "Why does critical scaling depend on the comparison being performed", it would be helpful to consider the case where the two model metamers *are* distinguishable from each other even though each is indistinguishable from the target image. I would assume that this is possible (e.g., if the target image is at the midpoint between the two model images in image space and each of the stimuli is just below 1 JND away from the target). Or is this not possible for some reason?

      Regarding line 236: this specific line has been removed, and the discussion about this issue has all been consolidated in the final section of the discussion, starting on pg. 19 line 438.

      Regarding the final comment: this is addressed in the paragraph starting on pg. 16 line 386. To expand upon that: the situation laid out by the reviewer is not possible in our conceptualization, in which metamerism is transitive and image discriminability is binary. In order to investigate situations like the one laid out by the reviewer, one needs models whose representations have metric properties, i.e., which allow you to measure and reason about perceptual distance, which we refer to in the paragraph starting on pg. 20 line 460. We also note that this situation has not been observed in this or any other pooling model metamer study that we are aware of. All other minor changes have been addressed.

      Reviewer #2 (Recommendations For The Authors):

      Original image T should be marked in the Voronoi diagrams.

      Brown et al. is miscited as 2021; it should be ACM Transactions on Applied Perception 2023.

      Figure 3 caption: models are left and right, not top and bottom.

      Thanks, all of the above have been addressed.

      References

      Brown R, Dutell V, Walter B, Rosenholtz R, Shirley P, McGuire M, Luebke D. Efficient Dataflow Modeling of Peripheral Encoding in the Human Visual System. ACM Transactions on Applied Perception. 2023 Jan; 20(1):1–22. http://dx.doi.org/10.1145/3564605, doi: 10.1145/3564605.

      Freeman J, Simoncelli EP. Metamers of the ventral stream. Nature Neuroscience. 2011 Aug; 14(9):1195–1201. doi: 10.1038/nn.2889.

      Ziemba CM, Simoncelli EP. Opposing Effects of Selectivity and Invariance in Peripheral Vision. Nature Communications. 2021 Jul; 12(1). https://doi.org/10.1038/s41467-021-24880-5, doi: 10.1038/s41467-021-24880-5.

    1. eLife Assessment

      This study presents a valuable finding relating to how the state of arousal is represented within the superior colliculus, a principal visuo-oculomotor structure. The main conclusion that the representation of arousal is segregated, and thus influences visual activity but not motor output, is incompletely supported by the evidence, but could be stronger if a specific concern relating to an alternative explanation for the dichotomy was addressed. The work will be of interest to sensory, motor, and cognitive neuroscientists.

    2. Reviewer #1 (Public review):

      Summary:

      Johnston and Smith used linear electrode arrays to record from small populations of neurons in the superior colliculus (SC) of monkeys performing a memory-guided saccade (MGS) task. Dimensionality reduction (PCA) was used to reveal low-dimensional subspaces of population activity reflecting the slow drift of neuronal signals during the delay period across a recording session (similar to what they reported for parts of cortex: Cowley et al., 2020). This SC drift was correlated with a similar slow-drift subspace recorded from the prefrontal cortex, and both slow-drift subspaces tended to be associated with changes in arousal (pupil size). These relationships were driven primarily by neurons in superficial layers of the SC, where saccade sensitivity/selectivity is typically reduced. Accordingly, delay-period modulations of both spiking activity and pupil size were independent of saccade-related activity, which was most prevalent in deeper layers of the SC. The authors suggest that these findings provide evidence of a separation of arousal- and motor-related signals. The analysis techniques expand upon the group's previous work and provide useful insight into the power of large-scale neural recordings paired with dimensionality reduction. This is particularly important with the advent of recording technologies which allow for the measurement of spiking activity across hundreds of neurons simultaneously. Together, these results provide a useful framework for comparing how different populations encode signals related to cognition, arousal, and motor output in potentially different subspaces.

      Comments on revised manuscript:

      The authors have done a very good job of responding to all of the reviewers' concerns.

    3. Reviewer #2 (Public review):

      Summary:

      Neurons in motor-related areas have increasingly been shown to also carry other, non-motoric signals. This creates a problem of avoidance of interference between the motor and non-motor-related signals. This is a significant problem that likely affects many brain areas. The specific example studied here is interference between saccade-related activity and slow-changing arousal signals in the superior colliculus. The authors identify neuronal activity related to saccades and arousal. Identifying saccade-related activity is straightforward, but arousal-related activity is harder to identify. The authors first identify a potential neuronal correlate of arousal using PCA to identify a component in the population activity corresponding to slow drift over the recording session. Next, they link this component to arousal by showing that the component is present across different brain areas (SC and PFC), and that it is correlated with pupil size, an external marker of arousal. Having identified an arousal-related component in SC, the authors show next that SC neurons with strong motor-related activity are less strongly affected by this arousal component (both SC and PFC). Lastly, they show that SC population activity patterns related to saccades and pupil size form orthogonal subspaces in the SC population.

      Strengths:

      A great strength of this research is the clear description of the problem, its relationship with the performed analysis and the interpretation of the results. The paper is very well written and easy to follow. An additional strength is the use of fairly sophisticated analysis using population activity.

      Weaknesses:

      (1) The greatest weakness in the present research is the fact that arousal is a functionally less important non-motoric variable. The authors themselves introduce the problem with a discussion of attention, which is without any doubt the most important cognitive process that needs to be functionally isolated from oculomotor processes. Given this introduction, one cannot help but wonder, why the authors did not design an experiment, in which spatial attention and oculomotor control are differentiated. Absent such an experiment, the authors should spend more time on explaining the importance of arousal and how it could interfere with oculomotor behavior.

      (2) In this context, it is particularly puzzling that one actually would expect effects of arousal on oculomotor behavior. Specifically, saccade reaction time, accuracy, and speed could be influenced by arousal. The authors should include an analysis of such effects. They should also discuss the absence or presence of such effects and how they affect their other results.

      (3) The authors use the analysis shown in Figure 6D to argue that across recording sessions the activity components capturing variance in pupil size and saccade tuning are uncorrelated. However, the distribution (green) seems to be non-uniform with a peak at very low and very high correlation specifically. The authors should test if such an interpretation is correct. If yes, where are the low and high correlations respectively? Are there potentially two functional areas in SC?

      Comments on revised manuscript:

      I remain somewhat concerned that the authors jump immediately into an analysis of the 'arousal-related' effects on SC activity. Before that, I would like to see a more detailed discussion justifying the use of pupil size alone (i.e., w/o other indicators such as RT) as indicative of fluctuations in general arousal that are causal to concomitant changes in SC activity. Instead, in its current form, the authors find changes in SC activity and describe them immediately as 'arousal-related'.

      Other than this conceptual issue, I do not have major problems with the analysis per se.

    4. Reviewer #3 (Public review):

      Summary:

      This study looked at slow changes in neuronal activity (on the order of minutes to hours) in the superior colliculus (SC) and prefrontal cortex (PFC) of two monkeys. They found that SC activity shows slow drift in neuronal activity like in the cortex. They then computed a motor index in SC neurons. By definition, this index is low if the neuron has stronger visual responses than motor responses, and it is high if the neuron has weaker visual responses and stronger motor responses. The authors found that the slow drift in neuronal activity was more prevalent in the low motor index SC neurons and less prevalent in the high motor index neurons. In addition, the authors measured pupil diameter and found it to correlate with slow drifts in neuronal activity, but only in the neurons with lower motor index of the SC. They concluded that arousal signals affecting slow drifts in neuronal modulations are brain-wide. They also concluded that these signals are not present in the deepest SC layers, and they interpreted this to mean that this minimizes the impact of arousal on unwanted eye movements.

      Strengths:

      The paper is clear and well-written.

      Showing slow drifts in the SC activity is important to demonstrate that cortical slow drifts could be brain-wide.

      Weaknesses:

      The authors find that the SC cells with the low motor index are modulated by pupil diameter. However, this could be completely independent of an "arousal signal". These cells have substantial visual sensitivity. If the pupil diameter changes, then their activity should be influenced since the monkey is watching a luminous display. So, in this regard, the fact that they do not see "an arousal signal" in the most motor neurons (through the pupil diameter analyses) is not evidence that the arousal signal is filtered out from the motor neurons. It could simply be that these neurons do not get affected by the pupil diameter because they do not have visual sensitivity. So, even with the pupil data, it is still a bit tricky for me to interpret that arousal signals are excluded from the "output layers" of the SC.

      Of course, the general conclusion is that the motor neurons will not have the arousal signal. It's just the interpretation that is different in the sense that the lack of the arousal signal is due to a lack of visual sensitivity in the motor neurons.

      I think that it is important to consider the alternative caveat of different amounts of light entering the system. Changes in light level caused by pupil diameter variations can be quite large. Please also note that I do not mean the luminance transient associated with the target onset. I mean the luminance of the gray display. It is a source of light. If the pupil diameter changes, then the amount of light reaching the visually sensitive neurons also changes.

      Comments on revised manuscript:

      The authors have addressed my first primary comment. For the light comment, I'm still not sure they addressed it. At the very least, they should explicitly state the possibility that the amount of light entering from the gray background can matter greatly, and it is not resolved by simply changing the analysis interval to the baseline pre-stimulus epoch. I provide more clear details below:

      In line 194 of the redlined version of the article (in the Introduction), the citation to Baumann et al., PNAS, 2023 is missing near the citation of Jagadisan and Gandhi, 2022. Besides replicating Jagadisan and Gandhi, 2022, this other study actually showed that the subspaces for the visual and motor epochs are orthogonal to each other.

      Line 683 (and around) of the redlined version of the article (in the Results): I'm very confused here. When I mentioned visual modulation by changed pupil diameter, I did not mean the transient changes associated with the brief onset of the cue in the memory-guided saccade task. I meant the gray background of the display itself. This is a strong source of light. If the pupil diameter changes across trials, then the amount of light entering the eye from the gray background also changes. Thus, visually-responsive neurons will have a different amount of light driving them. This will also happen in the baseline interval containing only a fixation spot. The arguments made by the authors here do not address this point at all. So, please modify the text to explicitly state the possibility that the global luminance of the display (as filtered by the pupil diameter) alters the amount of light driving the visually-responsive neurons and could contribute to the higher effects seen in the more visual neurons.

      The figures (everywhere, including the responses to reviewers) are very low resolution and all equations in methods are missing.

      I'm very confused by Fig. 2 - supplement 2. Panel B shows a firing rate burst aligned to *microsaccade* onset. Does that mean you were in the foveal SC? That is, how can neurons have a motor burst to the target of the memory-guided saccade and also for microsaccades? And which microsaccade directions caused such a burst? And what does it mean to compute the motor index and spike count for microsaccades in panel C? If you were in the proper SC location for the saccade target, then shouldn't you *not* get any microsaccade-related burst at all? This is very confusing to me and needs to be clarified.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) The authors make fairly strong claims that "arousal-related fluctuations are isolated from neurons in the deep layers of the SC" (emphasis added). This conclusion is based on comparisons between a "slow drift axis", a low-dimensional representation of neuronal drift, and other measures of arousal (Figures 2C, 3) and motor output sensitivity (Figures 2B, 3B). However, the metrics used to compare the slow-drift axis and motor activity were computed during separate task epochs: the delay period (600-1100 ms) and a perisaccade epoch (25 ms before and after saccade initiation), respectively. As the authors reference, deep-layer SC neurons are typically active only around the time of a saccade. Therefore, it is not clear if the lack of arousal-related modulations reported for deep-layer SC neurons is because those neurons are truly insensitive to those modulations, or if the modulations were not apparent because they were assessed in an epoch in which the neurons were not active. A potentially more valuable comparison would be to calculate a slow-drift axis aligned to saccade onset. 

      The reviewer makes an important point that the calculation of an axis can depend critically on the time window of neuronal response. On consideration, we find that the slow drift axis is less sensitive to this issue because it is calculated on time-averaged activity over multiple trials. In previous work we found that slow drift calculated on the stimulus-evoked response in V4 was very well aligned to slow drift calculated on pre-stimulus spontaneous activity (Cowley et al, Neuron, 2020, Supplemental Figure 3A and 3B). To address this issue in the present data, we compared the axis computed for an example session for neural activity during the delay period and neural activity aligned to saccade onset. As shown in the new Figure 2 – figure supplement 1 in the revised manuscript, we found a similar lack of arousal-related modulations for deep-layer SC neurons when slow drift was computed using the saccade epoch (25ms before to 25ms after the onset of the saccade). Figure 2 – figure supplement 1A shows loadings for the SC slow drift axis when it was computed using spiking responses during the delay period (as in the main manuscript analysis). In contrast, Figure 2 – figure supplement 1B shows loadings from the same session when the SC slow drift axis was computed using spiking responses during the saccade epoch. The plots are highly similar and in both cases the loadings were weaker for neurons recorded from channels at the bottom of the probe which have a higher motor index. Finally, we found that projections onto the SC slow drift axis for this session were strongly correlated when the slow drift axis was computed using spiking responses during the delay period and the saccade epoch (r = 0.66, p < 0.001, Figure 1C). Taken together, these results suggest that arousal-related modulations are less evident in deep-layer SC neurons irrespective of whether slow drift was computed during the delay or saccade epoch (see also Public Reviews, Reviewer 1, Point 2).
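
      The epoch comparison described above can be sketched as follows. This is a minimal illustration on simulated data, not the authors' analysis code: it assumes that slow drift is approximated by the first principal component of spike counts averaged over consecutive blocks of trials, and the function names (slow_drift_axis, epoch_agreement, simulate_epoch), block size, and simulation parameters are all hypothetical.

```python
import numpy as np


def slow_drift_axis(counts, bin_size):
    """Return (axis, centered_binned) for a trials x neurons spike-count matrix.

    Assumption: slow drift is approximated as the first principal component
    of responses averaged over consecutive blocks of `bin_size` trials.
    """
    n_bins = counts.shape[0] // bin_size
    # Average over blocks of trials to suppress fast trial-to-trial noise.
    binned = counts[: n_bins * bin_size].reshape(n_bins, bin_size, -1).mean(axis=1)
    centered = binned - binned.mean(axis=0)
    # Rows of vt are principal axes; the first one captures the most variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0], centered


def epoch_agreement(counts_a, counts_b, bin_size=50):
    """Absolute correlation between slow-drift projections from two epochs."""
    axis_a, binned_a = slow_drift_axis(counts_a, bin_size)
    axis_b, binned_b = slow_drift_axis(counts_b, bin_size)
    r = np.corrcoef(binned_a @ axis_a, binned_b @ axis_b)[0, 1]
    return abs(r)  # the sign of a principal component is arbitrary


def simulate_epoch(drift, loadings, baseline, gain, rng):
    """Hypothetical generator: Poisson counts driven by one shared slow latent."""
    rates = baseline + gain * np.outer(drift, loadings)
    return rng.poisson(np.clip(rates, 0.1, None))


rng = np.random.default_rng(0)
drift = np.sin(np.linspace(0, 3 * np.pi, 1000))  # slow latent over 1000 trials
loadings = rng.normal(size=30)                   # one loading per neuron
delay_counts = simulate_epoch(drift, loadings, baseline=10, gain=3, rng=rng)
saccade_counts = simulate_epoch(drift, loadings, baseline=20, gain=5, rng=rng)
agreement = epoch_agreement(delay_counts, saccade_counts)
```

      In this toy setting the two epochs share one latent, so the projections agree closely even though baseline rates and gains differ. With real recordings the agreement would depend on how much of the slow variance is common to both epochs.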

      (2) More generally, arousal-related signals may persist throughout multiple different epochs of the task. It would be worthwhile to determine whether similar "slow-drift" dynamics are observed for baseline, sensory-evoked, and saccade-related activity. Although it may not be possible to examine pupil responses during a saccade, there may be systematic relationships between baseline and evoked responses. 

      Similar to the point above, slow drift dynamics tend to be similar across different response epochs because they are averaged across many trials and seem to tap into responsivity trends that are robust across epochs. As shown in Author response image 1 below, and in Figure 2 – figure supplement 1 in the revised manuscript, similar dynamics were observed when the SC slow drift axis was computed using spiking responses during the baseline, delay, visual and saccade epochs. We did not investigate differences between baseline and evoked pupil responses in the current paper. However, these effects were characterized in one of our previous papers that focused exclusively on the relationship between slow drift and eye-related metrics (Johnston et al., 2022, Cereb. Cortex, Figure 6). In this previous work, we found a negative correlation between baseline and evoked pupil size. Both variables were significantly correlated with slow drift, the only difference being the sign of the correlation.

      Author response image 1.

      (A-C) Dynamics of slow drift for three example sessions when the SC slow drift axis was computed using spiking responses during the baseline, delay, visual and saccade epochs. Baseline = 100ms before the onset of the target stimulus; Delay = 600 to 1100ms after the offset of the target stimulus; Stim = 25ms to 125ms after the onset of the target stimulus; Sac = 25ms before to 25ms after the onset of the saccade.

      Johnston R, Snyder AC, Khanna SB, Issar D, Smith MA (2022) The eyes reflect an internal cognitive state hidden in the population activity of cortical neurons. Cereb Cortex 32:3331–3346.

      (3) The relationships between changes in SC activity and pupil size are quite small (Figures 2C & 5C). Although the distribution across sessions (Figure 2C) is greater than chance, they are nearly 1/4 of the size compared to the PFC-SC axis comparisons. Likewise, the distribution of r² values relating pupil size and spiking activity directly (Figure 5) is quite low. We remain skeptical that these drifts are truly due to arousal and cannot be accounted for by other factors. For example, does the relationship persist if accounting for a very simple, monotonic (e.g., linear) drift in pupil size and overall firing rate over the course of an individual session? 

      Firstly, it is important to note that the strength of the relationship between projections onto the SC slow drift axis and pupil size (r² = 0.06) is within the range reported by Joshi et al. (2016, Neuron, Figure 3). They investigated the median variance explained between the spiking responses of individual SC neurons and pupil size and found it to be approximately 0.02 across sessions. Secondly, our statistical approach of testing the actual distribution of r² values against a shuffled distribution was specifically designed to rule out the possibility that the relationship between SC spiking responses and pupil size occurred due to linear drifts. The shuffled distribution in Figure 2C of the main manuscript represents the variance that can be explained by one session’s slow drift correlated with another session’s pupil, which would contain effects that occurred due to linear drifts alone. That the actual proportion of variance explained was significantly greater than this distribution suggests that the relationship between projections onto the SC slow drift axis and pupil size reflects changes in arousal rather than other factors related to linear drifts.

      Joshi S, Li Y, Kalwani RM, Gold JI (2016) Relationships between Pupil Diameter and Neuronal Activity in the Locus Coeruleus, Colliculi, and Cingulate Cortex. Neuron 89:221–234.
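
      The session-shuffle control described in the response above can be illustrated with a short sketch. It assumes each session yields one slow-drift time series and one pupil time series, and that mismatched sessions are truncated to a common length before correlating; the truncation rule, the function names (r_squared, matched_and_shuffled_r2, simulate_session), and the simulated data are hypothetical and are not taken from the manuscript.

```python
import numpy as np


def r_squared(x, y):
    """Variance explained between two time series, truncated to a common length."""
    n = min(len(x), len(y))  # sessions differ in duration
    return np.corrcoef(x[:n], y[:n])[0, 1] ** 2


def matched_and_shuffled_r2(drifts, pupils):
    """r^2 for same-session pairs and for every mismatched-session pair."""
    matched = np.array([r_squared(d, p) for d, p in zip(drifts, pupils)])
    shuffled = np.array([
        r_squared(drifts[i], pupils[j])
        for i in range(len(drifts))
        for j in range(len(pupils))
        if i != j
    ])
    return matched, shuffled


def simulate_session(n_bins, rng):
    """Hypothetical session: a linear trend common to all sessions plus a
    session-specific slow fluctuation shared by neural drift and pupil."""
    trend = np.linspace(0.0, 1.0, n_bins)
    walk = np.cumsum(rng.normal(size=n_bins))
    walk = (walk - walk.mean()) / walk.std()
    drift = 0.5 * trend + walk + 0.2 * rng.normal(size=n_bins)
    pupil = 0.5 * trend + walk + 0.2 * rng.normal(size=n_bins)
    return drift, pupil


rng = np.random.default_rng(1)
sessions = [simulate_session(int(n), rng) for n in rng.integers(30, 51, size=8)]
drifts, pupils = zip(*sessions)
matched, shuffled = matched_and_shuffled_r2(drifts, pupils)
```

      Because every simulated session contains the same linear trend, the mismatched pairs retain whatever r² a monotonic drift alone would produce. Any excess in the matched pairs is then attributable to the session-specific component, which is the logic of the control.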

      (4) It is not clear how the final analysis (Figure 6) contributes to the authors' conclusions. The authors perform PCA on: (i) residual spiking responses during the delay period binned according to pupil size, and (ii) spiking responses in the saccade epoch binned according to target location (i.e., the saccade tuning curve). The corresponding PCs are the spike-pupil axis and the saccade tuning axis, respectively. Unsurprisingly, the spike-pupil axis that captures variance associated with arousal (and removes variance associated with saccade direction) was not correlated with a saccade-tuning axis that captures variance associated with saccade direction and omits arousal. Had these measures been related it would imply a unique association between a neuron's preferred saccade direction and pupil control, which seems unlikely. The separation of these axes thus seems trivial and does not provide evidence of a "mechanism...in the SC to prevent arousal-related signals interfering with the motor output." It remains unknown whether, for example, arousal-related signals may impact trial-by-trial changes in neuronal gain near the time of a saccade, or alter saccade dynamics such as acceleration, precision, and reaction time. 

      The reviewer makes a good point, and we agree that more evidence is needed to determine if the separation of the pupil size axis and saccade tuning axis is the mechanism through which cognitive and arousal-related signals can be intermixed in the SC. In the revised manuscript (lines 679-682), we have raised this as a possible explanation that necessitates further study rather than stating definitively that it is the exact mechanism through which these signals are kept separate. Our analysis here is similar to the one from Smoulder et al (2024, Neuron, Fig. 2F), in which the interactions between reward signals and target tuning in M1 were examined (and found to be orthogonal). While we agree with the reviewer that it may seem “trivial” for these axes to be orthogonal, it does not have to be so. If, for example, neural tuning curves shifted with changes in pupil size through gain changes that revealed tuning or affected tuning curve shape, there could be projections of the pupil axis onto the target tuning axis. Thus, while we agree with the reviewer that it appears sensible for these two axes to be orthogonal, our result is nonetheless a novel finding. We have edited the text in our revised manuscript, however, to make sure the nuance of this point is conveyed to the reader.

      Smoulder AL, Marino PJ, Oby ER, Snyder SE, Miyata H, Pavlovsky NP, Bishop WE, Yu BM, Chase SM, Batista AP. A neural basis of choking under pressure. Neuron. 2024 Oct 23;112(20):3424-33.
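
      The point that the two axes need not be orthogonal can be made concrete with a toy example. The sketch below assumes each axis is the first principal component of a conditions-by-neurons matrix (pupil-size bins or target locations) and measures alignment as the absolute cosine between axes. The alignment metric, the function names (first_pc, axis_alignment, simulate_axes), and the simulated loadings are hypothetical and do not reproduce the analysis in Figure 6.

```python
import numpy as np


def first_pc(matrix):
    """First principal axis of a conditions x neurons matrix."""
    centered = matrix - matrix.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]


def axis_alignment(axis_a, axis_b):
    """Absolute cosine between two axes (0 = orthogonal, 1 = identical)."""
    cos = axis_a @ axis_b / (np.linalg.norm(axis_a) * np.linalg.norm(axis_b))
    return abs(cos)


def simulate_axes(pupil_loadings, tuning_loadings, rng, noise=0.1):
    """Hypothetical population: responses binned by pupil size and by target."""
    pupil_bins = np.linspace(-1.0, 1.0, 10)                    # 10 pupil-size bins
    target_drive = np.cos(np.deg2rad(np.arange(0, 360, 45)))   # 8 target locations
    by_pupil = np.outer(pupil_bins, pupil_loadings)
    by_target = np.outer(target_drive, tuning_loadings)
    by_pupil = by_pupil + noise * rng.normal(size=by_pupil.shape)
    by_target = by_target + noise * rng.normal(size=by_target.shape)
    return first_pc(by_pupil), first_pc(by_target)


rng = np.random.default_rng(2)
n_neurons = 50
tuning = rng.normal(size=n_neurons)
independent = rng.normal(size=n_neurons)

# Case 1: pupil-related loadings unrelated to saccade tuning loadings.
separate = axis_alignment(*simulate_axes(independent, tuning, rng))
# Case 2: pupil-related loadings identical to saccade tuning loadings.
shared = axis_alignment(*simulate_axes(tuning, tuning, rng))
```

      In the first case the alignment is close to zero, and in the second it is close to one. Which regime a real population occupies is therefore an empirical question rather than a consequence of how the axes are constructed.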

      Reviewer #2 (Public Review):

      (1) The greatest weakness in the present research is the fact that arousal is a functionally less important non-motoric variable. The authors themselves introduce the problem with a discussion of attention, which is without any doubt the most important cognitive process that needs to be functionally isolated from oculomotor processes. Given this introduction, one cannot help but wonder, why the authors did not design an experiment, in which spatial attention and oculomotor control are differentiated. Absent such an experiment, the authors should spend more time explaining the importance of arousal and how it could interfere with oculomotor behavior. 

      Although attention does represent an important cognitive process, we did not design an experiment in which attention and oculomotor control are differentiated because attention does not appear to be related to slow drift. In our first paper that reported on this phenomenon, we investigated the effects of spatial attention on slow fluctuations in neural activity by cueing the monkeys to attend to a stimulus in the left or right visual field in a block-wise manner. Each block lasted ~20 minutes and we found that slow drift did not covary with the timing of cued blocks (see Figure 4A, Cowley et al., 2020, Neuron). Furthermore, there is a large body of work showing that arousal also impacts motor behavior leading to changes in a range of eye-related metrics (e.g., pupil size, microsaccade rate and saccadic reaction time - for review, see Di Stasi et al. 2013, Neurosci. Biobehav. Rev.). We also note that the terms attention and arousal are often used in nonspecific and overlapping ways in the literature, adding to some potential confusion here. Nonetheless, pupil-linked arousal is an important variable that impacts motor performance. This has now been stated clearly in the Introduction of the revised manuscript (lines 108-114) to address the reviewer’s concerns and highlight the importance of studying how precise fixation and eye movements are maintained even in the presence of signals related to ongoing changes in brain state. 

      Cowley BR, Snyder AC, Acar K, Williamson RC, Yu BM, Smith MA (2020) Slow Drift of Neural Activity as a Signature of Impulsivity in Macaque Visual and Prefrontal Cortex. Neuron 108:551-567.e8.

      (2) In this context, it is particularly puzzling that one actually would expect effects of arousal on oculomotor behavior. Specifically, saccade reaction time, accuracy, and speed could be influenced by arousal. The authors should include an analysis of such effects. They should also discuss the absence or presence of such effects and how they affect their other results. 

      As described above, several studies across species have demonstrated that arousal impacts motor behavior e.g., saccade reaction time, saccade velocity and microsaccade rate (for review, see Di Stasi et al. 2013, Neurosci. Biobehav. Rev.). This has been clarified in the Introduction of the revised manuscript to address the reviewer's concerns (lines 108-114). Our prior work (Johnston et al, Cerebral Cortex, 2022) shows that slow drift impacts several types of oculomotor behavior. Overall, these studies highlight the impact of arousal on eye movements as a robust effect, and support the present investigation into arousal and oculomotor control signals. While we agree reaction time, accuracy, and speed all can be influenced by arousal depending on task demands, the present study is focused on the connection between slow fluctuations in neural activity, linked to arousal, and different subpopulations of SC neurons. 

      Di Stasi LL, Catena A, Cañas JJ, Macknik SL, Martinez-Conde S (2013) Saccadic velocity as an arousal index in naturalistic tasks. Neurosci Biobehav Rev 37:968–975.

      Johnston R, Snyder AC, Khanna SB, Issar D, Smith MA (2022) The eyes reflect an internal cognitive state hidden in the population activity of cortical neurons. Cereb Cortex 32:3331–3346.

      (3) The authors use the analysis shown in Figure 6D to argue that across recording sessions the activity components capturing variance in pupil size and saccade tuning are uncorrelated. However, the distribution (green) seems to be non-uniform with a peak at very low and very high correlation specifically. The authors should test if such an interpretation is correct. If yes, where are the low and high correlations respectively? Are there potentially two functional areas in SC? 

      We agree with the reviewer that our actual data distribution was non-uniform. We examined individual sessions with high and low variance explained and did not find notable differences. One source of this variation has to do with session length. Longer sessions in principle should have a chance distribution of variance explained closer to zero because they contained more time bins. Given that we had no specific hypothesis for a non-uniform distribution, we have simply displayed the full distribution of values in our figure and the statistical result of a comparison to a shuffled distribution.

      Reviewer #3 (Public Review):

      (1) However, I am concerned about two main points: First, the authors repeatedly say that the "output" layers of the SC are the ones with the highest motor indices. This might not necessarily be accurate. For example, current thresholds for evoking saccades are lowest in the intermediate layers, and Mohler & Wurtz 1972 suggested that the output of the SC might be in the intermediate layers. Also, even if it were true that the high motor index neurons are the output, they are very few in the authors' data (this is also true in a lot of other labs, where it is less likely to see purely motor neurons in the SC). So, this makes one wonder if the electrode channels were simply too deep and already out of the SC? In other words, it seems important to show distributions of encountered neurons (regardless of the motor index) across depth, in order to better know how to interpret the tails of the distributions in the motor index histogram and in the other panels of Figure Supplement 1. I elaborate more on these points in the detailed comments below. 

      The reviewer makes a good point about the efferent signals from SC. It is true that electrical thresholds are often lowest in intermediate layers, though deep layers do project to the oculomotor nuclei (Sparks, 1986; Sparks & Hartwich-Young, 1989) and often intermediate and deep layers are considered to function together to control eye movements (Wurtz & Albano, 1980). As suggested by the reviewer, we have edited the text throughout the manuscript to say that slow drift was less evident in SC neurons with a higher motor index, as well as included the above references and points about the intermediate and deep layers (Lines 73-81). Aside from the question of which layers of the SC function as the “motor output”, the reviewer raises a separate and important question: are our deep recordings still in SC? Here, we can say definitively that they are. We removed neurons if they did not exhibit elevated (above baseline) firing rates during the visual or saccade epochs of the MGS task (see Methods section on “Exclusion criteria”). All included neurons possessed a visual, visuomotor or motor response, consistent with the response properties of neurons in the SC. In addition, we found a number of neurons well above the bottom of the probe with strong motor responses and minimal loadings onto the slow drift axis (see Figure 2 – figure supplement 1A), consistent with the reviewer’s comment that intermediate layer neurons are tuned for movement and play a role in saccade production.

      Mohler CW, Wurtz RH. Organization of monkey superior colliculus: intermediate layer cells discharging before eye movements. Journal of neurophysiology. 1976 Jul 1;39(4):722-44.

      Sparks DL. Translation of sensory signals into commands for control of saccadic eye movements: role of primate superior colliculus. Physiol Rev. 1986 Jan;66(1):118-71. doi: 10.1152/physrev.1986.66.1.118. PMID: 3511480.

      Sparks DL, Hartwich-Young R. The deep layers of the superior colliculus. Reviews of oculomotor research. 1989 Jan 1;3:213-55.

      Wurtz RH, Albano JE. Visual-motor function of the primate superior colliculus. Annu Rev Neurosci. 1980;3:189-226. doi: 10.1146/annurev.ne.03.030180.001201. PMID: 6774653.

      (2) Second, the authors find that the SC cells with a low motor index are modulated by pupil diameter. However, this could be completely independent of an "arousal signal". These cells have substantial visual responses. If the pupil diameter changes, then their activity should be influenced since the monkey is watching a luminous display. So, in this regard, the fact that they do not see "an arousal signal" in most motor neurons (through the pupil diameter analyses) is not evidence that the arousal signal is filtered out from the motor neurons. It could simply be that these neurons simply do not get affected by the pupil diameter because they do not have visual sensitivity. So, even with the pupil data, it is still a bit tricky for me to interpret that arousal signals are excluded from the "output layers" of the SC. 

      The reviewer makes an important point about the SC’s visual responses. Neurons with a low motor index are, conversely, likely to have a stronger visual response index. However, we do not believe that changes in luminance can explain why the correlation between SC spiking response and pupil size is weaker for neurons with a lower motor index. Firstly, the changes in pupil size observed in the current paper and our previous work are slow and occur on a timescale of minutes (Cowley et al., 2020, Neuron) and are correlated with eye movement measures such as reaction time and microsaccade rate (Johnston et al., 2022, Cerebral Cortex). This is in stark contrast to luminance-evoked changes in pupil size that occur on a timescale of less than a second. Secondly, as shown in the new Figure 5 – figure supplement 1 in the revised manuscript, very similar results were found when SC spiking responses were correlated with pupil size during the baseline period, when only the fixation point was on the screen. Although the luminance of the small peripheral target stimulus can result in small luminance-evoked changes in pupil size, no changes in luminance occurred during the baseline period, which was defined as the 100ms before the onset of the target stimulus. In Figure 2 – figure supplement 1 and Author response image 1 above, we show that slow drift is the same whether calculated on the baseline response, delay period, or peri-saccadic epoch. Thus, the measurement of slow drift is insensitive to the precise timing of the selection of both the window for the spiking response and the window for the pupil measurement. If luminance were the explanation for the slow changes in firing observed in visually responsive SC neurons, it would require those neurons to exhibit robust, sustained tuned responses to the small changes in retinal illuminance induced by the relatively small fluctuations in pupil size we observed from minute to minute. We are aware of no reports of such behavior in visually-responsive neurons in SC. We have included these analyses and this reasoning in the revised manuscript on lines 478-495.
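
      To make the kind of analysis discussed in this reply concrete, the sketch below projects population activity onto a single dominant axis and correlates that projection with pupil size. It is only an illustration on synthetic data: it assumes the slow drift axis can be approximated by the first principal component of mean-subtracted responses, and the function name, array shapes, and noise levels are hypothetical rather than taken from the manuscript.

```python
import numpy as np

def slow_drift_projection(residuals):
    """Project responses onto their first principal component.

    residuals: array of shape (n_time_bins, n_neurons).
    Returns the unit-length axis (n_neurons,) and the projection (n_time_bins,).
    Assumption: the first PC stands in for the slow drift axis.
    """
    centered = residuals - residuals.mean(axis=0, keepdims=True)
    # Rows of vt are principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axis = vt[0]
    return axis, centered @ axis

# Synthetic data (made up for illustration): one slow signal shared by all neurons.
rng = np.random.default_rng(0)
n_bins, n_neurons = 200, 12
drift = np.linspace(-1.0, 1.0, n_bins)
weights = rng.uniform(0.5, 1.5, n_neurons)
residuals = np.outer(drift, weights) + 0.1 * rng.standard_normal((n_bins, n_neurons))
pupil = drift + 0.1 * rng.standard_normal(n_bins)

axis, projection = slow_drift_projection(residuals)
# The sign of a principal component is arbitrary, so only |r| is meaningful here.
r = np.corrcoef(projection, pupil)[0, 1]
```

      Repeating the same computation with responses taken from a different trial epoch (for example a baseline window) would be one way to check whether the correlation depends on the choice of window.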

      Reviewer #1 (Recommendations for the author):

      (1) It would be useful to provide line numbers in subsequent manuscripts for reviewers.

      Line numbers have been added in the revised version of the manuscript.

      (2) Page #6; last sentence: "...even impact processing at the early to mid stages of the visuomotor transformation, without leading to unwanted changes in motor output." I do not believe the authors have provided evidence that arousal levels were not associated with changes in motor output.

      As suggested by Reviewer 3 (see Public Reviews, Reviewer 3, Point 2), we have edited the text throughout the manuscript to say that slow drift was less evident in SC neurons with a higher motor index. This sentence in the revised manuscript now reads:

      “This provides a potential mechanism through which signals related to cognition and arousal can exist in the SC, and even impact processing at the early to mid stages of the visuomotor transformation, without leading to unwanted changes in SC neurons that are linked to saccade execution.”

      (3) Page #8; last paragraph: Although deep-layer SC neurons may not have been obtained during every recording session, a summary of the motor index scores observed along the probe across sessions would be useful to confirm their assumptions. 

      See Author response image 2 below, which shows the motor index of each recorded SC neuron on the x-axis and session number on the y-axis. The points are colored according to the squared factor loading, which represents the variance explained between the response of a neuron and the slow drift axis (see Figure 3B of the main manuscript). You can see from this plot that neurons with a stronger component loading (shown in teal to yellow) typically have a lower motor index, whereas the opposite is true for neurons with a weaker component loading (shown in dark blue).

      Author response image 2.

      Scatter plot showing the motor index of each recorded neuron along with the session number in which it was recorded. The points are colored according to the squared factor loading for each neuron along the slow drift axis. Note that loadings above 0.5 (33 data points in total) have been thresholded at 0.5 so that we could effectively use the color range to show all of the slow drift axis loadings.
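
      The thresholding described in this caption can be sketched as a simple clipping step applied before values are mapped to colors. This is a minimal illustration, assuming the 0.5 ceiling is applied to the squared loadings; the function name and the example values are hypothetical.

```python
import numpy as np

def clip_squared_loadings(loadings, ceiling=0.5):
    """Square factor loadings and cap them at `ceiling` for color mapping.

    Capping keeps a few large values from compressing the color range
    used by the remaining points.
    """
    squared = np.asarray(loadings, dtype=float) ** 2
    return np.minimum(squared, ceiling)

# Made-up loadings for three neurons (not data from the study).
example_loadings = np.array([0.1, -0.6, 0.9])
colour_values = clip_squared_loadings(example_loadings)
```

      The clipped values could then be passed as the color argument of a scatter plot, with the motor index on the x-axis and the session number on the y-axis.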

      (4) Page #10; first paragraph: The authors should state the time window of the delay period used, since it may be distinct from the pupil analysis (first 200ms of delay). 

      This has been stated in the revised version of the manuscript. The sentence now reads:

      “We first asked if arousal-related fluctuations are present in the SC. As in previous studies that recorded from neurons in the cortex (Cowley et al., 2020), we found that the mean spiking responses of individual SC neurons during the delay period (chosen at random on each trial from a uniform distribution spanning 600-1100ms, see Methods) fluctuated over the course of a session while the monkeys performed the MGS task (Figure 2A, left).”

      (5) Page #10; second paragraph: Extra period at the end of a sentence: " most variance in the data..". 

      Fixed in the revised version of the manuscript.

      (6) Page #12: "between projections onto the SC slow drift axis and mean pupil size during the first 200ms of the delay period when a task-related pupil response could be observed." What criteria was used to determine whether a task-related pupil response was observed? 

      This was chosen based on the results of a previous study in our lab that used the same memory-guided saccade task to investigate the relationship between slow drift and changes in baseline and evoked pupil size (see Johnston et al., 2022, Cereb. Cortex, Figure 6B). The period was chosen based on plotting the average pupil size aligned on different trial epochs. As we show in Figure 5-figure supplement 3 above, the pupil interactions with slow drift did not depend on the particular time window of the pupil we chose.

      (7) Page #14; Figure 2A: The axes for the individual channels are strangely floating and quite different from all other figures. Please label the channel in the figure legend that was used as an example of the projected values onto the slow drift axis.

      The figure has been changed in the revised version of the manuscript so that the tick mark denoting zero residual spikes per second is on the top layer of each plot. A scale bar was chosen instead of individual axes to reduce clutter in the figure as it was used to demonstrate how slow drift was computed. Residual spiking responses from all neurons were projected on the slow drift axis to generate the scatter plot in the bottom right-hand corner of Figure 2A. There is no single neuron to label.

      (8) Page #16: "These results demonstrate that even though arousal-related fluctuations are present in the SC, they are isolated from deep-layer neurons that elicit a strong saccadic response and presumably reside closer to the motor output." In line with our major comments, lack of arousal-related activity during the delay period is meaningless for deep-layer SC neurons that are generally inactive during this time. It does not imply that there is no arousal signal! 

      Addressed in Public Reviews, Reviewer 1, Point 1 & 2. We found a similar lack of arousal-related modulations reported for deep-layer SC neurons when slow drift was computed using the saccade epoch (Figure 1 above). In addition, similar dynamics were observed when the SC slow drift axis was computed using spiking responses during the baseline, delay, visual and saccade period (Figure 2).

      (9) Page #18: "These findings provide additional support for the hypothesis that arousal-related fluctuations are isolated from neurons in the deep layers of the SC." The same criticism from above applies.

      Addressed in Public Reviews, Reviewer 1, Point 1 & 2.

      (10) Page #20; paragraph 3: "Taken together, the findings outlined above..." Would be useful to be more specific when referring to "activity"; e.g., "...these neurons did not exhibit large fluctuations in delay-period activity over time".

      This sentence has been changed in the revised manuscript in light of the reviewer’s comments. It now reads:

      “In addition to being more weakly correlated with pupil size, the spiking responses of these neurons did not exhibit large fluctuations over time (Figure 2), and when considering the neuronal population as a whole, explained less variance in the slow drift axis when it was computed using population activity in the SC (Figure 3) and PFC (Figure 4).”

      Reviewer #3 (Recommendations for the author):

      The paper is clear and well-written. However, I am concerned about two main points: 

      (1) First, the authors repeatedly say that the "output" layers of the SC are the ones with the highest motor indices. This might not necessarily be accurate. For example, current thresholds for evoking saccades are lowest in the intermediate layers, and Mohler & Wurtz 1972 suggested that the output of the SC might be in the intermediate layers. Also, even if it were true that the high motor index neurons are the output, they are very few in the authors' data (this is also true in a lot of other labs, where it is less likely to see purely motor neurons in the SC). So, this makes one wonder if the electrode channels were simply too deep and already out of the SC. In other words, it seems important to show distributions of encountered neurons (regardless of motor index) across depth, in order to better know how to interpret the tails of the distributions in the motor index histogram and in the other panels of the figure supplement 1. I elaborate more on these points in the detailed comments below. 

      Addressed in Public Reviews, Reviewer 3, Point 1.

      (2) Second, the authors find that the SC cells with a low motor index are modulated by pupil diameter. However, this could be completely independent of an "arousal signal". These cells have substantial visual responses. If the pupil diameter changes, then their activity should be influenced since the monkey is watching a luminous display. So, in this regard, the fact that they do not see "an arousal signal" in most motor neurons (through the pupil diameter analyses) is not evidence that the arousal signal is filtered out from the motor neurons. It could simply be that these neurons simply do not get affected by the pupil diameter because they do not have visual sensitivity. So, even with the pupil data, it is still a bit tricky for me to interpret that arousal signals are excluded from the "output layers" of the SC. 

      Addressed in Public Reviews, Reviewer 3, Point 2.

      (3) I think that a remedy to the first point above is to change the text to make it a bit more descriptive and less interpretive. For example, just say that the slow drifts were less evident among the neurons with high motor index. 

      We thank the reviewer for this suggestion (see Public Reviews, Reviewer 3, Point 1).

      (4) For the second point, I think that it is important to consider the alternative caveat of different amounts of light entering the system. Changes in light level caused by pupil diameter variations can be quite large. 

      We thank the reviewer for this suggestion (see Public Reviews, Reviewer 3, Point 2).

      (5) Line 31: I'm a bit underwhelmed by this kind of statement. i.e. we already know that cognitive processes and brain states do alter eye movements, so why is it "critical" that high precision fixation and eye movements are maintained? And, isn't the next sentence already nulling this idea of criticality because it does show that the brain state alters the SC neurons? In fact, cognitive processes are already known to be most prevalent in the intermediate and deep layers of the SC. 

      It seems clear that while cognitive state does affect eye movements, it is desirable to have some separation between cognitive state and eye movement control. Covert attention, for instance, is precisely a situation where eye movement control is maintained to avoid overt saccades to the attended stimulus, and yet there are clear indications of attention’s impact on microsaccades and fixation. We stand by our statement that an important goal of vision is to have precise fixation and movements of the eye, and yet at the same time the eyes are subject to numerous influences by cognitive state.

      (6) Line 65: it is better to clarify that these are "functional layers" because there are actually more anatomical layers. 

      We have edited this sentence in the revised version of the manuscript so that it now reads:

      “The role of these projections in the visuomotor transformation depends on the functional layer of the SC in which they terminate”.

      (7) Line 73: this makes it sound like only the deepest layers are topographically organized, which is not true. Also, as early as Mohler & Wurtz, 1972, it was suggested that the intermediate layers have the biggest impacts downstream of the SC. This is also consistent with electrical microstimulation current thresholds for evoking saccades from the SC. 

      We have addressed the reviewers’ comments about the intermediate layers having the biggest impact downstream of the SC in Public Reviews, Reviewer 3, Point 1. Furthermore, line 73 has been changed in the revised manuscript so that it now reads:

      “As is the case for neurons in the superficial and intermediate layers, they [SC motor neurons] form a topographically organized map of visual space (White et al. 2017; Robinson 1972; Katnani and Gandhi 2011)”.  

      (8) Line 100: there is an analogous literature regarding the question of why unwanted muscle contractions do not happen. Specifically, in the context of why SC visual bursts do not automatically cause saccades (which is a similar problem to the ones you mention about cognitive signals interfering by generating unwanted eye movements), both Jagadisan & Gandhi, Curr Bio, 2022 and Baumann et al, PNAS, 2023 also showed that SC population activity not only has different temporal structure (Jagadisan & Gandhi) but also occupy different subspaces (Baumann et al) under these two different conditions (visual burst versus saccade burst). This is conceptually similar to the idea that you are mentioning here with respect to arousal. So, it is worth it to mention these studies here and again in the discussion. 

      We are grateful to the reviewer for these suggestions and have included text in the Introduction (Lines 125-128) and Discussion (Lines 678-682) of the revised manuscript along with the references cited above.

      (9) Line 147: as mentioned above, it is now generally accepted that there are quite a few "pure" motor neurons in the SC. This is consistent with what you find. E.g. Baumann et al., 2023. And, again see Mohler and Wurtz in the 1970's. So, I wonder how useful it is to go too much into this idea of the deeper motor neurons (e.g. the correlations in the other panels of the Figure 1 supplement). 

      This is related to the reviewer’s comment that the output of the SC might be in the intermediate layers. This concern has been addressed in Public Reviews, Reviewer 3, Point 1.

      (10) Figure 1 should say where the RF was for the shown spike rasters. i.e. were these the same saccade target across trials? And where was that location relative to the RF? It would help also in the text to say whether the saccade was always to the RF center or whether you were randomizing the target location. 

      We centered the array of saccade targets using the microstimulation-evoked eye movement for SC (see Methods section “Memory-guided saccade task”) to find the evoked eccentricity, and then used saccade targets with equal spacing of 45 degrees starting at zero (rightward saccade target). We did not do extensive RF mapping beyond this microstimulation centering. In Figure 1, the spike rasters are shown for a target that was visually identified to be within the neuron’s RF based on assessing responses to all 8 target angles. We have added information about this to the figure caption.
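
      The target layout described in this reply (targets spaced 45 degrees apart, starting at zero for a rightward saccade, all at the evoked eccentricity) can be sketched as follows. The function name and the 10-degree eccentricity are hypothetical placeholders, since the actual eccentricity was set per session by microstimulation.

```python
import math

def saccade_targets(eccentricity_deg, n_targets=8):
    """Return (x, y) target positions in degrees of visual angle.

    Targets are equally spaced in direction, starting at 0 degrees
    (rightward) and proceeding counterclockwise.
    """
    step = 360.0 / n_targets  # 45 degrees when n_targets is 8
    targets = []
    for i in range(n_targets):
        angle = math.radians(i * step)
        targets.append((eccentricity_deg * math.cos(angle),
                        eccentricity_deg * math.sin(angle)))
    return targets

# Placeholder eccentricity of 10 degrees (an assumption for illustration).
targets = saccade_targets(10.0)
```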

      (11) Line 218: but were there changes in the eye movement statistics? For example, the slow drift eye movements during fixation? Or even the microsaccades? 

      Addressed in Public Reviews, Reviewer 2, Point 2.  

      (12) Line 248: shuffling what exactly? I think that more explanation would be needed here. 

      Addressed in Public Reviews, Reviewer 1, Point 3.  

      (13) Line 263: but isn't this reflecting a sensory transient in the pupil diameter, since the target just disappeared? 

      Addressed in Public Reviews, Reviewer 3, Point 2.  

      (14) Line 271: I suspect that slow drift eye movements (in between microsaccades) would show higher correlations. Not sure how well you can analyze those with a video-based eye tracker. 

      We agree that fixational drift would be a worthwhile metric, but it is not one we have focused on here and to our knowledge does require higher precision tracking. 

      (15) Line 286: again, see above about similar demonstrations with respect to the visual and motor burst intervals, which clearly cause the same problem (even stronger) as the one studied here. 

      See reply, including Figure 2.

      (16) Line 330: again, I'm not sure deeper necessarily automatically means closer to the output. For example, current thresholds for evoked saccades grow higher as you go deeper. Maybe the authors can ask their colleague Neeraj Gandhi about this point specifically, just to be safe. Maybe the safest would be to remain descriptive about the data, and just say something like: arousal-related fluctuations were absent in our deepest recorded sites. 

      Addressed in Public Reviews, Reviewer 3, Point 1.

      (17) Line 332: likewise, statements like this one here would be qualified if the output was the intermediate layers......anyway if I understand what I read so far in the paper, the signal will be anyway orthogonal to the motor burst population subspace. So, maybe there's no need to emphasize that it goes away in the very deepest layers. 

      See reply above, Public Reviews, Reviewer 1, Point 4.

      (18) Figure 3A: related to the above, I think one issue could be that the deeper contacts might already be out of the SC. Maybe some cell count distribution from each channel should help in this regard. i.e. were you finding way fewer saccade-related neurons in the deepest channels (even though the few that you found were with high motor index)? If so, then wouldn't this just mean that the channel was too deep? I think there needs to be an analysis like this, to convince readers that the channels were still in the SC. Ideally, electrical stimulation current thresholds for evoking saccades at different depths would be tested, but I understand that this can be difficult at this stage. 

      Addressed in Public Reviews, Reviewer 3, Point 1.

      (19) I keep repeating this because in general, cognitive effects are stronger in the intermediate/deeper layers than in the superficial layers. If these interfere with eye movements like arousal, then why should arousal be different?

      Few studies have investigated the effects of attention on “pure” movement SC neurons that only discharge during a saccade. One study, which we cited in Introduction (Ignashchenkova et al., 2004, Nat. Neurosci.), found significant differences in spiking responses between trials with and without attentional cueing for visual and visuomotor neurons. No significant difference was found for motor neurons, consistent with our hypothesis that signals related to cognition and arousal are kept separate from saccade-related signals in the SC.

      (20) The problem with Figure 5 and its related text is that the neurons with low motor index are additionally visual. So, of course, they can be modulated if the pupil diameter changes!

      Addressed in Public Reviews, Reviewer 3, Point 2.  

      (21) I had a hard time understanding Figure 6. 

      See reply above, Public Reviews, Reviewer 1, Point 4.

      (22) Line 586: these cells have more visual responses and will be affected by the amount of light entering the eye. 

      Addressed in Public Reviews, Reviewer 3, Point 2.

    1. eLife Assessment

      This important study presents a novel approach to enhance the therapeutic potential of mesenchymal stromal cells (MSCs) by genetically modifying their glycogen synthesis pathway, resulting in increased glycogen accumulation and improved cell survival under starvation conditions, particularly in the context of experimental pulmonary fibrosis. The methods and findings are generally solid and could be strengthened in the future by investigating the kinetics of persistence, the immunomodulatory effects, and the underlying improved mechanism of action of MSCs in this pulmonary fibrosis model. If confirmed, this approach could suggest potential methods to improve the therapeutic functionality of MSCs in cell therapy strategies.

    2. Reviewer #1 (Public review):

      Summary:

      This study provides the first evidence that glucose availability, previously shown to support cell survival in other models, is also a key determinant for post-implantation MSC survival in the specific context of pulmonary fibrosis. To address glucose depletion in this context, the authors propose an original, elegant, and rational strategy: enhancing intracellular glycogen stores to provide transplanted MSCs with an internal energy reserve. This approach aims to prolong their viability and therapeutic functionality after implantation.

      Strengths:

      The efficacy of this metabolic engineering strategy is robustly demonstrated both in vitro and in an orthotopic mouse model of pulmonary fibrosis.

    3. Reviewer #2 (Public review):

      Summary:

      In this article, the authors investigate enhancing the therapeutic and regenerative properties of mesenchymal stem cells (MSCs) through genetic modification, specifically by overexpressing genes involved in the glycogen synthesis pathway. By creating a non-phosphorylatable mutant form of glycogen synthase (GYSmut), the authors successfully increased glycogen accumulation in MSCs, leading to significantly improved cell survival under starvation conditions. The study highlights the potential of glycogen engineering to improve MSC function, especially in inflammatory or energy-deficient environments. However, critical gaps in the study's design, including the lack of validation of key findings, limited differentiation assessments, and missing data on MSC-GYSmut resistance to reactive oxygen species (ROS), necessitate further exploration.

      Strengths:

      (1) Novel Approach: The study introduces an innovative method of enhancing MSC function by manipulating glycogen metabolism.

      (2) Increased Glycogen Storage: The genetic modification of GYS1, resulting in GYSmut, significantly increased glycogen accumulation, leading to improved MSC survival under starvation, which has strong implications for enhancing MSC therapeutic properties in energy-deficient environments.

      (3) Potential Therapeutic Impact: The findings suggest significant therapeutic potential for MSCs in conditions that require improved survival, persistence, and immunomodulation, especially in inflammatory or energy-limited settings.

      (4) In Vivo Validation: The in vivo murine model of pulmonary fibrosis demonstrated the improved survival and persistence of MSC-GYSmut, supporting the translational potential of the approach.

      Weaknesses:

      (1) Lack of Differentiation Assessments: The study did not evaluate key MSC differentiation pathways, including chondrogenic and osteogenic differentiation. The absence of analysis of classical MSC surface markers and multipotency limits the understanding of the full potential of MSC-GYSmut.

      (2) Missing Validation of RNA Sequencing Data: Although RNA sequencing data revealed promising transcriptomic changes in chondrogenesis and metabolic pathways, these findings were not experimentally validated, limiting confidence.

      (3) Lack of ROS Resistance Analysis: Resistance to reactive oxygen species (ROS), an important feature for MSCs under regenerative conditions, was not assessed, leaving out a critical aspect of MSC function.

      (4) Limited Exploration of Immunosuppressive Properties: The study did not address the immunosuppressive functions of MSC-GYSmut, which are critical for MSC-based therapies in clinical settings.

      Conclusion:

      The study presents an exciting new direction for enhancing MSC function through glycogen metabolism engineering. While the results show promise, key experiments and validations are missing, and several areas, such as differentiation capacity, ROS resistance, and immunosuppressive properties, require further investigation. Addressing these gaps would solidify the conclusions and strengthen the potential clinical applications of MSC-GYSmut in regenerative medicine.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review)

      (1) Glycogen biosynthesis typically involves several enzymes. In this context, could the authors comment on the effect of overexpressing a single enzyme - especially a mutant version - on the structure or quality of the glycogen synthesized?

      While quantitative molecular weight analysis of synthesized glycogen was not performed, we documented changes in glycogen particle morphology. GYSmut overexpression resulted in significantly enlarged singular glycogen granules, suggesting potential high molecular mass, while GYS-GYG co-overexpression in MSCs (GYG being the essential enzyme for glycogen synthesis initiation) produced a diffuse glycogen distribution pattern rather than particulate structures. We have incorporated this result as new Figure S2C.

      These results suggest that overexpression of specific glycogen-metabolizing enzymes significantly influences glycogen structure. Consequently, targeted modulation of glycogen architecture and properties through key enzymes represents a potential avenue for future investigation.

      (2) Regarding the in vitro starvation experiments (Figure 2C), what oxygen conditions (pO₂) were used? Are these conditions physiologically relevant and representative of the in vivo lung microenvironment?

      Our in vitro starvation experiments (Figure 3C) were conducted under normoxic conditions (21% oxygen). The oxygen concentration in human lungs is physiologically lower than atmospheric levels, with healthy individuals exhaling air containing approximately 16% oxygen (Thalakkotur Lazar Mathew, Diagnostics 2015). To our knowledge, direct measurements of alveolar oxygen concentration in pulmonary fibrosis are rare. Therefore, to evaluate the performance of GYSmut under hypoxic conditions, in the revised manuscript, Figure S2 has been augmented to include assessment of cell performance under combined hypoxia (oxygen concentration < 5%) and nutrient deprivation stress, which further corroborates the superiority of the GYSmut group over the control under different oxygen concentrations.

      (3) In the in vitro model, how many hours does it take for the intracellular glycogen reserve to be completely depleted under starvation conditions?

      While quantitative cell viability data were recorded up to 72 hours post-implantation (Fig 3C), we observed cell viability at approximately 96 hours. We noticed that the presence of glycogen particles exhibited a correlation with sustained cell viability. However, reliable quantitative assessment of glycogen became increasingly challenging upon significant depletion of viable cells, thereby limiting our measurements during later time points.

      (4) For the in vivo model, is there a quantitative analysis of the survival kinetics of the transplanted cells over time for each group? This would help to better assess the role and duration of glycogen stores as an energy buffer after implantation.

      We tracked the in vivo distribution and persistence of implanted MSCs using enzymatic activity quantification assays (using Gluc luciferase assay) and live animal imaging (using Akaluc luciferase). The revised manuscript includes quantitative analysis of the in vivo fluorescence imaging data, which has been supplemented as Figure S4. Glycogen-engineered MSCs and control cells were quantitatively assessed at three discrete time points post-implantation. This quantification revealed a transient divergence in cell viability between the experimental and control groups around day 7. However, fluorescence in both cohorts subsequently declined to similar levels over the extended observation period.

      (5) Finally, the study was performed in male mice only. Could sex differences exist in the efficacy or metabolism of the engineered MSCs? It would be helpful to discuss whether the approach could be expected to be similarly effective in female subjects.

      We appreciate the reviewer’s important question regarding potential sex differences. Our study used male mice based on three key considerations: 1) Clinical Relevance: Idiopathic pulmonary fibrosis (IPF) shows significant male predominance, with diagnosis rates 3.5-fold higher in men (37.8% vs 10.6%, p<0.0001) and greater diagnostic confidence (Assayag et al., Thorax 2020). 2) Model Consistency: The bleomycin model (our chosen method) demonstrates more consistent fibrotic responses in male mice (Gul et al., BMC Pulm Med 2023). 3) Biological Rationale: Estrogen’s protective effects in females may confound therapeutic assessments (cited in Assayag et al.).

      We fully acknowledge this limitation and will include female subjects in subsequent translational studies. The therapeutic principle should theoretically apply to both sexes, but we agree this requires experimental validation.

      (6) The number of mice for each group and time point should be specified.

      The manuscript text has been revised to enhance clarity, and the number of mice for each group and time point has been specified (line 170 to 182).

      Reviewer #2 (Public Review):

      (4) Inconsistencies in In Vivo Data: There is a discrepancy between the number of animals shown in the figures and the graph (three individuals vs. five animals), as well as missing details on how luciferase signal intensity was quantified, requiring further clarification.

      To assess MSC survival in vivo, we employed two strategies utilizing distinct luciferases optimized for specific detection modalities. MSC viability was quantified ex vivo through Gaussia luciferase (Gluc) activity, leveraging its high sensitivity and established commercial assay kits (n = 3 mice per group per time point). For non-invasive longitudinal tracking within living animals, MSC distribution and viability were monitored via in vivo bioluminescence imaging using Akaluc luciferase, selected for its superior tissue penetration and sensitivity in situ (n = 5 mice per group). The manuscript text has been revised to enhance clarity, and the experimental protocols for luciferase signal detection and quantification have been added to Methods.

      (1) (2) (3) (5):

      We fully agree that further investigation into the functional consequences of glycogen engineering in MSCs – encompassing core cellular functions, immunomodulatory properties, and associated signaling pathways – is important to fully elucidate the underlying mechanisms. Cellular metabolism is intrinsically intertwined with diverse physiological processes. Consequently, we believe that glycogen engineering exerts multifaceted effects on MSCs, likely extending beyond the modulation of any single specific pathway. Studying the metabolic perturbation induced by such engineering approaches in mammalian cells represents an interesting field. The exploration of these aspects remains a long-term research objective within our group.

      Reviewer #2 (Recommendations for the authors):

      (6) Clarification of Data in the Murine Model:

      In Figure 4B, there is a discrepancy between the number of animals shown in the image (five) and those represented in the graph (three). This discrepancy needs clarification. Additionally, the study lacks information regarding the intensity of the signal in the luciferase assays. It is unclear how luciferase expression in the mice was quantified, and providing this detail would enhance the understanding of the data presented.

      We sincerely appreciate these valuable suggestions. We have revised the relevant text for greater clarity. Figure 4B and Figure 4C present results from two distinct experimental approaches, each employing different luciferase reporters and measurement methodologies, and different numbers of mice were used in these two experiments.

      Quantitative data derived from the in vivo bioluminescence imaging have been supplemented as Figure S4. The experimental protocols for luciferase signal detection and quantification have been added to Methods.

      To other recommendations of reviewer 2:

      We sincerely appreciate your valuable insights, which demonstrate your deep expertise. We fully agree that beyond nutrient availability, factors such as reactive oxygen species (ROS) and the immune microenvironment are also critical limitations affecting the survival and therapeutic efficacy of implanted MSCs.

      We propose that glycogen engineering exerts broad effects on MSCs. These effects manifest as changes in multiple cellular characteristics, including proliferation, differentiation, surface marker expression, antioxidant capacity, and immunomodulatory activity – all crucial factors for the therapeutic purpose of MSCs.

      We believe these changes likely involve complex networks of interconnected regulatory factors. The underlying mechanisms might be clarified through proteomic and metabolomic profiling.

      However, comprehensively investigating these interconnected aspects requires significant time and resources. Some components of this research extend beyond the current scope of our project. Nevertheless, exploring these mechanisms remains an important objective, and we will actively work to investigate them further in our ongoing studies.

    1. eLife Assessment

      This important manuscript evaluates how sample size and demographic balance of reference cohorts affect the reliability of normative models. The evidence supporting the conclusions is convincing, although some additional analysis and clarifications could improve the generalisability of the conclusions. This work will be of interest to clinicians and scientists working with normative models.

    2. Reviewer #1 (Public review):

      Summary:

      Overall, this is a well-designed and carefully executed study that delivers clear and actionable guidance on the sample size and representative demographic requirements for robust normative modelling in neuroimaging. The central claims are convincingly supported.

      Strengths:

      The study has multiple strengths. First, it offers a comprehensive and methodologically rigorous analysis of sample size and age distribution, supported by multiple complementary fit indices. Second, the learning-curve results are compelling and reproducible and will be of immediate utility to researchers planning normative modelling projects. Third, the study includes both replication in an independent dataset and an adaptive transfer analysis from UK Biobank, highlighting both the robustness of the results and the practical advantages of transfer learning for smaller clinical cohorts. Finally, the clinical validation ties the methodological work back to clinical application.

      Weaknesses:

      There are two minor points for consideration:

      (1) Calibration of percentile estimates could be shown for the main evaluation (similar to that done in Figure 4E). Because the clinical utility of normative models often hinges on identifying individuals outside the 5th or 95th percentiles, readers would benefit from visual overlays of model-derived percentile curves on the curves from the full training data and simple reporting of the proportion of healthy controls falling outside these bounds for the main analyses (i.e., 2.1. Model fit evaluation).

      (2) The larger negative effect of left-skewed sampling likely reflects a mismatch between the younger training set and the older test set; accounting explicitly for this mismatch would make the conclusions more generalisable.

    3. Reviewer #2 (Public review):

      Summary:

      The authors test how sample size and demographic balance of reference cohorts affect the reliability of normative models in ageing and Alzheimer's disease. Using OASIS-3 and replicating in AIBL, they change age and sex distributions and number of samples and show that age alignment is more important than overall sample size. They also demonstrate that models adapted from a large dataset (UK Biobank) can achieve stable performance with fewer samples. The results suggest that moderately sized but demographically well-balanced cohorts can provide robust performance.

      Strengths:

      The study is thorough and systematic, varying sample size, age, and sex distributions in a controlled way. Results are replicated in two independent datasets with relatively large sample sizes, thereby strengthening confidence in the findings. The analyses are clearly presented and use widely applied evaluation metrics. Clinical validation (outlier detection, classification) adds relevance beyond technical benchmarks. The comparison between within-cohort training and adaptation from a large dataset is valuable for real-world applications.

      The work convincingly shows that age alignment is crucial and that adapted models can reach good performance with fewer samples. However, some dataset-specific patterns (noted above) should be acknowledged more directly, and the practical guidance could be sharper.

      Weaknesses:

      The paper uses a simple regression framework, which is understandable for scalability, but limits generalization to multi-site settings where a hierarchical approach could better account for site differences. This limitation is acknowledged; a brief sensitivity analysis (or a clearer discussion) would help readers weigh trade-offs. Other than that, there are some points that are not fully explained in the paper:

      (1) The replication in AIBL does not fully match the OASIS results. In AIBL, left-skewed age sampling converges with other strategies as sample size grows, unlike in OASIS. This suggests that skew effects depend on where variability lies across the age span.

      (2) Sex imbalance effects are difficult to interpret, since sex is included only as a fixed effect, and residual age differences may drive some errors.

      (3) In Figure 3, performance drops around n≈300 across conditions. This consistent pattern raises the question of sensitivity to individual samples or sub-sampling strategy.

      (4) The total outlier count (tOC) analysis is interesting but hard to generalize. For example, in AIBL, left-skew sometimes performs slightly better despite a weaker model fit. Clearer guidance on how to weigh model fit versus outlier detection would strengthen the practical message.

      (5) The suggested plateau at n≈200 seems context-dependent. It may be better to frame sample size targets in relation to coverage across age bins rather than as an absolute number.

    4. Author response

      We would like to thank the editors and two reviewers for the assessment and the constructive feedback on our manuscript, “Toward Robust Neuroanatomical Normative Models: Influence of Sample Size and Covariates Distributions”. We appreciate the thorough reviews and believe the constructive suggestions will substantially strengthen the clarity and quality of our work. We plan to submit a revised version of the manuscript and a full point-by-point response addressing both the public reviews and the recommendations to the authors. 

      Reviewer 1. 

      In revision, we plan to address the reviewer’s comments by: (i) strengthening the interpretation of model fit through reporting the proportion of healthy controls within and outside the extreme percentile bounds; (ii) adding age-resolved overlays of model-derived percentile curves compared to those from the full reference cohort for key sample sizes and regions; (iii) quantifying age-distribution alignment between the train and test sets; and (iv) summarizing model performance as a joint function of age-distribution alignment and sample size.

      Reviewer 2. 

      In the revised manuscript, we will (i) expand the Discussion to more clearly outline the trade-offs between simple regression frameworks and hierarchical models for normative modeling (e.g., scalability, handling of multi-site variation, computational considerations), and discuss alternative approaches and harmonization as important directions for multi-site settings; (ii) contextualize OASIS-3 vs AIBL differences by quantifying train–test age-alignment across sampling strategies and emphasize that skewness should be interpreted relative to the target cohort’s alignment rather than absolute numbers; (iii) reassess sex-imbalance effects by reporting expected age distributions per condition and re-evaluate sex effects while controlling for age; (iv) investigate the apparent dip at n≈300 by increasing sub-sampling seeds, testing neighboring sample sizes, and using an alternative age-binning scheme to clarify the observed artifact; (v) clarify potential divergence between tOC separation and global fit under discrepancies in demographic distributions and relate tOC to age-alignment distance; (vi) reframe the sample-size guidance in terms of distributional alignment rather than an absolute n.
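
      As an illustration only, one simple way to quantify train–test age alignment is the overlap between the binned age distributions of the two sets. The sketch below is not the authors' method; the function name `age_bin_overlap`, the bin edges, and the example ages are hypothetical, and the overlap coefficient is just one of several plausible alignment measures.

```python
def age_bin_overlap(train_ages, test_ages, bin_edges):
    """Overlap coefficient between two binned age distributions.

    Returns a value in [0, 1]: 1 means identical bin proportions,
    0 means the two samples share no age bin. Ages outside the
    bin range are ignored. (Hypothetical helper, for illustration.)
    """
    n_bins = len(bin_edges) - 1

    def proportions(ages):
        counts = [0] * n_bins
        for age in ages:
            for i in range(n_bins):
                is_last = i == n_bins - 1
                in_bin = bin_edges[i] <= age < bin_edges[i + 1]
                # The final bin is closed on the right so the upper edge counts.
                if in_bin or (is_last and age == bin_edges[-1]):
                    counts[i] += 1
                    break
        total = sum(counts)
        return [c / total for c in counts] if total else [0.0] * n_bins

    p = proportions(train_ages)
    q = proportions(test_ages)
    return sum(min(a, b) for a, b in zip(p, q))


# Hypothetical example: a younger (left-skewed) training set vs an older test set.
edges = [40, 55, 65, 75, 95]
train = [45, 48, 50, 52, 58, 60, 66]
test = [60, 68, 70, 72, 78, 80, 85]
alignment = age_bin_overlap(train, test, edges)
```

      A value near 1 would indicate well-aligned age coverage, so a score of this kind could be reported alongside sample size when comparing sampling strategies.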

    1. eLife Assessment

      This important paper reports the development of proteins and small molecules that induce degradation of a clinically-relevant oncogenic transcription factor, LMO2. The findings provide a proof of concept that PROTAC-type chemicals can be developed against intrinsically disordered proteins. The methods provide a blueprint for rational design of PROTACs starting from intracellular antibody paratopes. Overall, the paper is supported by solid evidence and will be of interest to chemical biologists and cancer pharmacologists.

    2. Reviewer #2 (Public review):

      Summary:

      Sereesongsaeng et al. aimed to develop degraders for LMO2, an intrinsically disordered transcription factor activated by chromosomal translocation in T-ALL. The authors first focused on developing biodegraders, which are fusions of an anti-LMO2 intracellular domain antibody (iDAb) with cereblon. Following demonstrations of degradation and collateral degradation of associated proteins with biodegraders, the authors proceeded to develop PROTACs using antibody paratopes (Abd) that recruit VHL (Abd-VHL) or cereblon (Abd-CRBN). The authors show dose-dependent degradation of LMO2 in LMO2+ T-ALL cell lines, as well as concomitant dose-dependent degradation of associated bHLH proteins in the DNA-binding complex. LMO2 degradation via Abd-VHL was also determined to inhibit proliferation and induce apoptosis in LMO2+ T-ALL cell lines.

      Strengths:

      The topic of degrader development for intrinsically disordered proteins is of high interest and the authors aimed to tackle a difficult drug target. The authors evaluated methods including the development of biodegraders, as well as PROTACs that recruit two different E3 ligases. The study includes important chemical control experiments, as well as proteomic profiling to evaluate selectivity.

      Weaknesses:

      Several weaknesses remain in this study:

      (1) The overall degradation achieved is not highly potent (although important proof-of-concept);

      (2) The mechanism of collateral degradation is not completely addressed. The authors acknowledge possible explanations, which would require mutagenesis and structural studies to further dissect;

      (3) The proteomics experiments do not detect LMO2, which the authors attribute to its size, making it difficult to interpret.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary:

      The authors describe the degradation of an intrinsically disordered transcription factor (LMO2) via PROTACs (VHL and CRBN) in T-ALL cells. Given the challenges of drugging transcription factors, I find the work solid and a significant scientific contribution to the field. 

      Strengths: 

      (1) Validation of LMO2 degradation by starting with biodegraders, then progressing to chemical degraders. 

      (2) Interrogation of the biology and downstream pathways upon LMO2 degradation (collateral degradation). 

      (3) Cell line models that are dependent/overexpression of LMO2 vs LMO2 null cell lines. 

      (4) CRBN and VHL-derived PROTACs were synthesized and evaluated. 

      Weaknesses: 

      (1) The conventional method used to characterize PROTACs in the literature is to calculate the DC50 and Dmax of the degraders; I did not find this information in the manuscript. 

      As noted in the reply to the referee’s point 4 below, our first-generation compounds are not highly potent. The DC<sub>50</sub> values have been computed from the Western blot data shown in Fig. 2. Supplementary Fig. S3 of the revised version shows the quantified Western blot data from a time course of treating KOPT-K1 cells with either Abd-CRBN or Abd-VHL (the 24-hour blot data are shown in Figure 2, G and E, and the quantified data from each 24-hour treatment are shown in Supplementary Fig. S3). From these data, the DC<sub>50</sub> values (9 μM for Abd-CRBN and 15 μM for Abd-VHL) are included in the main text and in the legend of Supplementary Fig. S3.

      In addition, the loss of signal of the LMO2-Rluc reporter protein from PROTAC-treated cells shown in Fig. 2M has been used to calculate a half-point of degradation; although this is not strictly a DC<sub>50</sub>, as it measures a reporter protein, it yielded values of 10 μM for Abd-CRBN and 9 μM for Abd-VHL. 
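
      For readers unfamiliar with how a half-point of degradation can be read off dose-response data, a minimal sketch follows. It assumes log-linear interpolation between the two doses that bracket 50% remaining signal; this is an illustrative choice, not necessarily the procedure used in the study, and the function name `half_point` and the example values are hypothetical.

```python
import math


def half_point(concentrations_um, fraction_remaining, target=0.5):
    """Estimate the dose at which the remaining signal crosses `target`.

    Interpolates linearly on a log10 concentration axis between the two
    adjacent doses that bracket the target. Concentrations must be > 0.
    Returns None if the target is never bracketed by the data.
    """
    points = sorted(zip(concentrations_um, fraction_remaining))
    for (c_lo, f_lo), (c_hi, f_hi) in zip(points, points[1:]):
        if f_lo >= target >= f_hi and f_lo != f_hi:
            # Fractional position of the target between the two measurements.
            t = (f_lo - target) / (f_lo - f_hi)
            log_c = math.log10(c_lo) + t * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None


# Hypothetical normalized band intensities (fraction of DMSO control).
doses_um = [1, 5, 10, 20]
remaining = [0.95, 0.70, 0.45, 0.20]
estimate_um = half_point(doses_um, remaining)
```

      With densely sampled doses, a fitted four-parameter logistic curve would usually be preferred; interpolation is shown here only because it needs no curve-fitting library.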

      (2) The proteomics data is not very convincing, and it is not clear why LMO2 does not show in the volcano plot (were higher concentrations of the PROTAC tested? and why only VHL was tested and not CRBN-based PROTAC?).

      Due to the relatively small size of the LMO2 protein, it is challenging to produce enough unique peptides for reliable identification, especially to distinguish some proteins in the LMO2 complex.  

      (3) The correlation between degradation potency and cell growth is not well-established (compare Figure 4C: P12-Ichikawa blots show great degradation at 24 and 48 hrs, but it is unclear if the cell growth in this cell line is any better than in PF-382 or MOLT-16) - Can the authors comment on the correlation between degradation and cell growth?  

      In this study (Fig. 4) we did not aim to compare the effect of LMO2 loss on cell growth among LMO2-positive cells. Rather, we aimed to evaluate the importance of LMO2 for cell growth in LMO2-expressing T-ALL cells compared to non-expressing cells and to correlate the loss of the protein with this effect on cell growth. In addition, treatment with the LMO2 compounds did not show an effect on LMO2-negative cells until at least 48 hours of treatment, indicating low toxicity of our PROTAC compounds and providing a correlation between LMO2 loss and cell growth. 

      (4) The PROTACs are not very potent (double-digit micromolar range?) - can the authors elaborate on any challenges in the optimization of the degradation potency? 

      The Abd methodology, which uses intracellular domain antibodies to screen for compounds that bind to intrinsically disordered proteins such as the LMO2 transcription factor, offers a tractable approach to hard drug targets but, in so doing, creates challenges for improving potency that differ from those for targets with available structural data. LMO2 is an intrinsically disordered protein, for which soluble recombinant protein is not readily available to identify the binding pocket of compounds. The potency has so far been optimized solely based on the different moieties substituted in cell-based SAR studies (http://advances.sciencemag.org/cgi/content/full/7/15/eabg1950/DC1) and all new compounds were tested with BRET assays. Thus, optimization of the degradation potency (including properties such as improved solubility) for the LMO2-binding compounds currently relies on chemical modification of the three areas of the compounds indicated in Fig. 2 B,C.  

      (5) The authors mentioned trying six iDAb-E3 ligase proteins; I would recommend listing the E3 ligases tried and commenting on the results in the main text. 

      The six chimaeric iDAb-E3 ligase proteins involved one anti-LMO2 iDAb and three different E3 ligases, each fused at either the N- or the C-terminus of the VH (giving six protein formats). These six fusion proteins were described in the text referring to the degrader studies described in Supplementary Fig. 1. 

      Reviewer #2 (Public review): 

      Summary: 

      Sereesongsaeng et al. aimed to develop degraders for LMO2, an intrinsically disordered transcription factor activated by chromosomal translocation in T-ALL. The authors first focused on developing biodegraders, which are fusions of an anti-LMO2 intracellular domain antibody (iDAb) with cereblon. Following demonstrations of degradation and collateral degradation of associated proteins with biodegraders, the authors proceeded to develop PROTACs using antibody paratopes (Abd) that recruit VHL (Abd-VHL) or cereblon (Abd-CRBN). The authors show dose-dependent degradation of LMO2 in LMO2+ T-ALL cell lines, as well as concomitant dose-dependent degradation of associated bHLH proteins in the DNA-binding complex. LMO2 degradation via Abd-VHL was also determined to inhibit proliferation and induce apoptosis in LMO2+ T-ALL cell lines. 

      Strengths: 

      The topic of degrader development for intrinsically disordered proteins is of high interest, and the authors aimed to tackle a difficult drug target. The authors evaluated methods, including the development of biodegraders, as well as PROTACs that recruit two different E3 ligases. The study includes important chemical control experiments, as well as proteomic profiling to evaluate selectivity. 

      Weaknesses: 

      The overall degradation is relatively weak, and the mechanism of potential collateral degradation is not thoroughly evaluated.

      The purpose of the study was to evaluate the effects of LMO2 degraders. The mechanism of the observed collateral degradation could not be investigated directly within the scope of our study. In the main text, we discussed two possible, not mutually exclusive, explanations. One is that our work (and previously published, cited work) indicates that the DNA-binding bHLH proteins have relatively short half-lives (Supplementary Fig. S12) and may therefore be subject to normal turnover when the LMO2, which is in the complex, turns over. Further, the known structure of the LMO2-bHLH interactions (from Omari et al, doi: 10.1016/j.celrep.2013.06.008) was also examined for the location of lysines in the TAL1 & E47 partners (Supplementary Fig. S11). It is possible that their local association with the LMO2-E3-ligase complex created by the PROTAC interaction could cause their concurrent degradation. Mutagenesis and structural analysis would be needed to establish this point.

      In addition, experiments comparing the authors' prior work with their anti-LMO2 iDAb or Abl-L are lacking, which would improve our understanding of the potential advantages of a degrader strategy for LMO2.  

      A major motivation behind developing the Antibody-derived (Abd) method to select compounds, which are surrogates of the antibody paratope, is that using iDAbs directly as inhibitors requires the development of delivery technologies for these macromolecules, either as protein directly or as vectors or mRNA for their expression. Ultimately, high-affinity anti-LMO2 iDAbs should be used directly as tractable inhibitors when delivery methods are developed. In the meantime, Abd compounds were envisaged as surrogates suitable for development into reagents, and potentially drugs, by medicinal chemistry. We evaluated selected first-generation LMO2-binding Abd compounds previously, finding that they interfere with the LMO2-iDAb BRET signal to an EC<sub>max</sub> of about 50%, but these compounds do not have the potency to affect the interaction of LMO2 with a non-mutated iDAb (nM affinity). These data indicated that efficacy improvement for the PROTACs was needed. In addition, in the current study, we observed viability effects in T-ALL lines at high concentrations (20 μM) irrespective of LMO2 expression (Supplementary Fig. S 2A, B). These data indicated that efficacy improvement was needed and that converting the compounds to degraders (PROTACs) could potentially add to in-cell potency. By adding the E3 ligase ligands, we found that toxicity to non-LMO2-expressing Jurkat cells was significantly reduced (Supplementary Fig. S 2E, F). 

      Reviewer #2 (Recommendations for the authors): 

      Suggestions for additional experiments: 

      (1) The data presented is primarily focused on demonstrating targeted degradation of LMO2, with a focus on phenotypes such as proliferation and apoptosis. In this manuscript, there are limited comparative evaluations of anti-LMO2 iDAb or Abl-L to show the potential benefits of a degrader approach to their previously described work, as well as why targeted degradation is in fact, advantageous. For example, the authors' previous work has shown that anti-LMO2 iDAb inhibits tumor growth in a mouse transplantation model. Comparisons in vitro would be supportive of the importance of continued degrader optimization/development.  

      We have previously shown that an anti-LMO2 scFv inhibits tumour growth in a mouse model, but this work used an expressed scFv antibody that binds to LMO2 in the nM range. The Abd compounds have much lower potency than the antibody and, because recombinant LMO2 is difficult to work with, we could only evaluate interactions of compounds with LMO2 in cell-based assays like BRET (LMO2-iDAb BRET). In this cell-based assay, the first-generation Abd compounds do not have sufficient potency to block the LMO2-iDAb interaction unless the affinity of the iDAb is reduced to sub-μM. The justification for proceeding with the degrader approach rather than just using protein-protein interaction (PPI) inhibition was based largely on the low potency of the first-generation PPI compounds in cell assays and on the expectation that combining protein degradation with PPI inhibition would enhance efficacy.

      In addition, the viability experiments are also very short-term; is there a reason why the authors did not carry out these experiments for 3-5 days to fully understand the impacts on proliferation? 

      In Supplementary Fig. S5, we did show assays up to 3 days. In KOPT-K1 (LMO2+), the LMO2 levels were reduced during the time course of this assay (from a single compound dose at time zero) (Supplementary Fig S 5A, B). We also show CellTitreGlo assays up to 3 days and, with these second-generation compounds, we observed sustained effects on KOPT-K1 (LMO2+) but low non-DMSO toxicity in Jurkat (LMO2-) (revised version Supplementary Fig S5 C, D).

      (2) The potential mechanism of collateral degradation is interesting and important in evaluating the on-target responses and consequences of degrading LMO2. At this time, the data supporting collateral degradation is limited and would be strengthened by showing that it is not due to a change in mRNA levels and not due to complex dissociation. Overall, the kinetics and depth of loss of complex members such as E47 in Figure 3 appear more substantial than LMO2 itself, and as presented, collateral degradation is not effectively demonstrated. In addition, to aid in the readers' assessments, additional background and references around the roles of TAL1 and E47 would be helpful. For example, structurally, where do they (and other associated proteins that are not degraded) fit in the complex? 

      We have responded above in relation to the Public Review Comments and note that a structure of the complex was in the submitted version (now revised version Supplementary Fig. S11). 

      (3) In Figure 1A, the blots show decreased levels of endogenous CRBN with iDAB-CRBN. Is this a known consequence of this approach in these cell lines? Does the partial recovery of endogenous CRBN in KOPTK1 cells have any indication of iDAB-CRBN levels? 

      We cannot be sure why the endogenous level of CRBN decreases in doxycycline-treated cells. It has been shown (DOI:10.1371/journal.pone.0064561) that doxycycline (and its derivatives) used in inducible expression systems, such as the lentivirus we used, affects gene expression patterns and can either increase or decrease expression. Although the published study did not examine CRBN expression, this effect might explain why CRBN expression decreases on doxycycline addition and remains at the same level thereafter. 

      (4) In Figure S7, the authors do not fully explain the results and why there is minimal rescue with epoxomicin (S7A) or MLN4924 (S7J). This could indicate an alternative mechanism of degradation and loss at play, given the lack of rescue. Can the authors comment on this discrepancy, and have they looked at autophagy inhibitors or other agents to achieve the chemical rescue? 

      In experiments such as those in revised version Supplementary Fig. S6, we used KOPT-K1 cells with a single concentration of the inhibitors, and the cells may be less susceptible to epoxomicin (0.8 μM), but lenalidomide and free thalidomide restored the LMO2 levels fully. In the main text Fig. 3D, we also showed that including epoxomicin and thalidomide with the Abd-CRBN in KOPT-K1 and CCRF-CEM restores LMO2 levels, supporting the conclusion that the main mechanism of degradation is through the ubiquitin-proteasome route.

      (5) For the proteomics data, it would be helpful to have the proteins in yellow highlighted to have them noted in 5D and 5E. In addition, can the authors comment on why LMO2 or their collateral targets are not confirmed in the table? Furthermore, 5C is difficult to interpret; if there are no significantly changing proteins in the Jurkat cells, why are there pathways that are identified? 

      As mentioned in reply to referee 1, due to the relatively small size of the LMO2 protein, it is challenging to produce enough unique peptides for reliable identification, especially to distinguish some proteins in the LMO2 complex where expression levels are low.

    1. eLife Assessment

      This valuable study concerns a model for transgenerational epigenetic inheritance, the learned avoidance by C. elegans of the PA14 pathogenic strain of Pseudomonas aeruginosa. A recent study questioned whether transgenerational inheritance in this paradigm lacks robustness. The authors of this study have worked independently of the group that reported the original phenomenon and also independently of the group that challenged the original report. With solid data, this study independently validates findings previously reported by the Murphy group, confirming that the paradigm is reproducible elsewhere. The reviewers also appreciated the information on reagent sources used by different groups. The present study is therefore of broad interest to anyone studying genetics, epigenetics, or learned behavior.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript addresses the discordant reports of the Murphy (Moore et al., 2019; Kaletsky et al., 2020; Sengupta et al., 2024) and Hunter (Gainey et al., 2025) groups on the existence (or robustness) of transgenerational epigenetic inheritance (TEI) controlling learned avoidance of C. elegans to Pseudomonas aeruginosa. Several papers from Colleen Murphy's group describe and characterize C. elegans transgenerational inheritance of avoidance behaviour. In the hands of the Murphy group, the learned avoidance is maintained for up to four generations; however, Gainey et al. (2025) reported an inability to observe inheritance of learned avoidance beyond the F1 generation. Of note, Gainey et al. used a modified assay to measure avoidance, rather than the standard assay used by the Murphy lab. A response from the Murphy group suggested that procedural differences explained the inability of Gainey et al. (2025) to observe TEI. They found two sources of variability that could explain the discrepancy between studies: the modified avoidance assay and bacterial growth conditions (Kaletsky et al., 2025). The standard avoidance assay uses azide as a paralytic to capture worms in their initial decision, while the assay used by the Hunter group does not capture the worm's initial decision but rather uses cold to capture the location of the population at one point in time.

      In this short report, Akinosho, Alexander, and colleagues provide independent validation of transgenerational epigenetic inheritance (TEI) of learned avoidance to P. aeruginosa as described by the Murphy group by demonstrating learned avoidance in the F2 generation. These experiments used the protocol described by the Murphy group, demonstrating reproducibility and robustness.

      Strengths:

      Despite the extensive analyses carried out by the Murphy lab, doubt may remain for those who have not read the publications or for those who are unfamiliar with the data, which is why this report from the Vidal-Gadea group is so important. The observation that learned avoidance was maintained in the F2 generation provides independent confirmation of transgenerational inheritance that is consistent with reports from the Murphy group. It is of note that Akinosho, Alexander et al. used the standard avoidance assay that incorporates azide, and followed the protocol described by the Murphy lab, demonstrating that the data from the Moore and Kaletsky publications are reproducible, in contrast to what has been asserted by the Hunter group.

      Comments on revised version:

      I am happy with the responses to reviews.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript "Independent validation of transgenerational inheritance of learned pathogen avoidance in C. elegans" by Akinosho and Vidal-Gadea offers evidence that learned avoidance of the pathogen PA14 can be inherited for at least two generations. In spite of initial preference for the pathogen when exposed in a 'training session', 24 hours of feeding on this pathogen evoked avoidance. The data are robust, replicated in 4 trials, and the authors note that diminished avoidance is inherited in generations F1 and F2.

      Strengths:

      These results contrast with those reported by Gainey et al., who only observed intergenerational inheritance for a single generation. Although the authors' study does not explain why Gainey et al. failed to reproduce the Murphy lab results, one possibility is that a difference in a media ingredient could be responsible.

      Comments on revised version:

      The responses to the reviewer comments appear reasonable for the most part.

    4. Reviewer #3 (Public review):

      Summary:

      This short paper aims to provide an independent validation of the transgenerational inheritance of learned behaviour (avoidance) that has been published by the Murphy lab. The robustness of the phenotype has been questioned by the Hunter lab. In this paper, the authors present one figure showing that transgenerational inheritance can be replicated in their hands. Overall, it helps to shed some light on a controversial topic.

      Strengths:

      The authors clearly outline their methods, particularly regarding the choice of assay, so that attempting to reproduce the results should be straightforward. It is nice to see these results repeated in an independent laboratory.

      Comments on revised version:

      I'm happy with the response to reviewers.

    5. Author response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Confirmation of daf-7::GFP data and inheritance beyond F2

      Reviewer suggested confirming daf-7::GFP molecular marker data and testing inheritance beyond the F2 generation to further strengthen the findings.

      We agree these experiments would provide valuable mechanistic insights into the molecular basis of transgenerational inheritance. However, our study was specifically designed as a reproducibility study focusing on the central controversy regarding F2 inheritance (Gainey et al. vs. Murphy lab findings). The daf-7::GFP molecular marker experiments, while important for understanding mechanisms, represent a different research question requiring extensive additional resources and expertise beyond the scope of this validation study. Our primary goal was to provide independent confirmation of the disputed F2 inheritance using standardized behavioral assays. It is our hope that future work will pursue these important mechanistic validations.

      "Exhaustive attempts" language

      Reviewer disagreed with characterizing Gainey et al.'s efforts as "exhaustive attempts" since they modified the original protocol.

      We revised this statement in the Results and Discussion to more accurately reflect the experimental situation: "In contrast, Gainey et al. (2025), representing the Hunter group, reported that while parental and F1 avoidance behaviors were evident, transgenerational inheritance was not reliably observed beyond the F1 generation under their experimental conditions."

      Importance of sodium azide

      Reviewer suggested including more discussion about the recent findings on the importance of sodium azide in the assay, referencing the Murphy group's response paper.

      We have prominently highlighted the critical role of sodium azide in our Introduction with strengthened language that emphasizes its importance for resolving the scientific controversy: "Critically, Kaletsky et al. (2025) demonstrated that omission of sodium azide during scoring can completely abolish detection of inherited avoidance, revealing that this key methodological difference may explain the conflicting results between laboratories. The use of sodium azide to immobilize worms at the moment of initial bacterial choice appears essential for capturing the inherited behavioral response. These findings highlight how seemingly minor methodological variations can dramatically impact detection of transgenerational inheritance and underscore the need for independent replication using standardized protocols."

      Protocol fidelity statement

      Reviewer requested a more direct statement clarifying that we followed the Murphy group protocol, noting that we made some modifications.

      We followed the core Murphy lab protocol with two evidence-based optimizations that preserve the essential experimental elements: 1) We used 400 mM sodium azide instead of 1 M based on preliminary data showing the higher concentration caused premature paralysis before worms could make behavioral choices, and 2) We used liquid NGM buffer instead of M9 to maintain chemical consistency with the solid NGM plates used for worm culture, minimizing potential osmotic stress. These modifications improved experimental reliability while maintaining the critical components: sodium azide immobilization, bacterial lawn density standardization (OD600 = 1.0), and synchronized scoring conditions that are essential for detecting inherited avoidance.

      Overstated dilution claim

      Reviewer noted that the statement about "gradual decrease" in avoidance strength was overstated and didn't reflect the actual data presented in the manuscript.

      We removed this statement.

      Environmental variables phrasing

      Reviewer found the sentence about environmental variables unclear, noting that Gainey et al. didn't actually acknowledge variability but saw it as indicating error or stochastic processes.

      We refined this statement for greater precision and clarity: "This underscores the assay's sensitivity to environmental variables, such as synchronization method and bacterial lawn density. This highlights the importance of consistency across experimental setups and support the view that context-dependent variation may underlie previously reported discrepancies."

      Reviewer #2 (Public Review):

      Reagent sourcing

      Reviewer suggested listing the sources of media ingredients with company names and catalog numbers, as this might be important for reproducibility.

      To ensure complete reproducibility, we created a comprehensive Table S3 listing all reagents, suppliers, and catalog numbers used in our experiments. This detailed information enables exact replication of our experimental conditions and addresses potential variability that might arise from different reagent sources between laboratories.

      Reviewer #3 (Public Review):

      Raw data transparency

      Reviewer noted that while a spreadsheet with choice assay results was provided, the individual raw data from assays was not included, which would be helpful for assessing sample sizes.

      We now provide complete experimental transparency through Table S2, which contains individual choice indices from all 138 assays conducted across four independent trials. This comprehensive dataset allows full assessment of our experimental outcomes, statistical robustness, and reproducibility while enabling other researchers to perform independent statistical analyses.

      F1/F2 assay disparity

      Reviewer questioned whether the higher number of F2 assays compared to F1 represented truly independent assays, asking if multiple F2 assays were performed from offspring of one F1 plate (which would not represent independent assays).

      We clarified this important statistical consideration in Methods (Transgenerational Testing): "Each behavioral assay was conducted using animals from a biologically independent growth plate. While F2 plates were derived from pooled embryos from multiple F1 parents, each assay represents an independent biological replicate with no reuse of animals across assays. F2 assays (n=45) exceeded F1 assays (n=20) due to PA14-induced fecundity reduction in trained worms, limiting the number of viable F1 progeny. The higher number of F2 assays reflects the greater reproductive success of healthy F1 animals and provides additional statistical power for population-level behavioral comparisons." We also enhanced our Controls section to clarify that "Our experimental design employed population-level comparisons across generations using unpaired statistical analyses, with no attempt to track individual lineages across generations."

      Methodological variations overstatement

      Reviewer felt the Introduction overstated the findings by suggesting the authors "address potential methodological variations," when they only used one assay setup throughout.

      We have corrected the Introduction to accurately reflect our study design and scope: "Here, we adapted the protocol established by the Murphy group, maintaining the critical use of sodium azide to paralyze worms at the time of choice, to test whether parental exposure to PA14 elicits consistent avoidance in subsequent generations. Our study specifically focuses on the transmission of learned avoidance through the F2 generation, beyond the intergenerational (F1) effect, because this is where divergence between published studies begins."

      Reviewer #1 (Recommendations for the authors):

      Worm numbers

      Reviewer noted that information about the number of worms used should be included in the training and choice assay methods section rather than separated.

      We clarified worm numbers and sample sizes in the Methods (Controls and Additional Considerations): "Each individual assay averaged 62 ± 43 animals (range: 15-150 worms per assay), with a total of 138 assays conducted across four independent experimental trials. The variation in worm numbers per assay reflects natural variation in worm recovery and immobilization efficiency during choice assays. We conducted an average of 8.5 assays per condition during each of the four replicates."

      Figure 1 legend and consistency

      Reviewer identified several issues: inconsistent terminology ("treated" vs "trained"), incorrect statistical test naming, missing p-value annotations, and need for consistency between figure and legend.

      We have systematically addressed all figure consistency and statistical annotation issues:

      Replaced inconsistent "treated" terminology with "trained" throughout

      Corrected the statistical test description to accurately reflect our analysis: "Kruskal-Wallis oneway ANOVA followed by Dunn's post hoc" which properly corresponds to the statistical tests detailed in Table S1

      Added explicit p-value annotations in the figure legend: "*p<0.05, **p<0.01 means and SEM shown (see Table S1 for statistics and Table S2 for raw data)"

      Ensured consistent terminology between figure and legend

      NGM vs. M9 buffer

      Reviewer questioned whether we used NGM buffer or M9 buffer for washing steps, noting that NGM isn't usually referred to as "buffer."

      We have prominently featured and thoroughly clarified our rationale for using liquid NGM buffer in the Methods (Synchronization of Worms section). The explanation now appears upfront in the methods: "We used liquid NGM buffer instead of M9 buffer (as specified in the original Murphy protocol) to maintain chemical consistency with the solid NGM culture plates. This modification minimizes potential osmotic stress since liquid NGM matches the pH (6.0) and ionic composition of the growth medium, whereas M9 buffer has a different pH (7.0) and ionic profile." We provide detailed chemical differences and explain that this modification maintains consistency with culture conditions while preserving essential experimental procedures.

      Grammar/typos

      Reviewer noted that the manuscript needed thorough proofreading to address grammatical errors and typographical mistakes.

      We have conducted comprehensive proofreading and editing throughout the manuscript to resolve grammatical and typographical errors. Specific improvements include: clarified sentence structure in the Introduction and Results sections, corrected technical terminology consistency, improved figure legend clarity, and enhanced overall readability while maintaining scientific precision.

      Sodium azide concentration

      Reviewer noted that our sodium azide concentration differed from the Moore paper and requested comment on this difference.

      We have included explicit justification for our sodium azide concentration choice in the Methods (Training and Choice Assay): "We used 400 mM sodium azide rather than the 1 M concentration reported by Moore et al. (2019) because preliminary trials showed that higher concentrations caused premature paralysis before worms could reach either bacterial spot, potentially biasing choice measurements. The 400 mM concentration provided sufficient immobilization while preserving the behavioral choice window."

      Reviewer #2 (Recommendations for the authors):

      Comparative reagent analysis

      Reviewer suggested creating a supplemental table comparing reagent sources between our study, Gainey et al., and Murphy et al., proposing that media ingredient differences might explain the discrepancies.

      While direct reagent comparison between laboratories was beyond the scope of this validation study, we recognize this as an important consideration for understanding experimental variability. Our comprehensive reagent sourcing information (Table S3) provides the foundation for future comparative studies. We encourage collaborative efforts to systematically compare reagent sources across laboratories, as media component differences could contribute to the experimental variability observed between research groups. Such analyses would be valuable for establishing standardized protocols across the field.

      Conclusion

      We hope that these revisions satisfactorily address the reviewers’ concerns. We believe these improvements significantly strengthened the manuscript's contribution to resolving this important scientific controversy.

      We thank the reviewers again for their invaluable insights and constructive feedback, which have substantially improved the quality and impact of our work.

    1. eLife Assessment

      This paper presents a valuable theory and analysis of the role of neurogenesis and inhibitory plasticity in the drift of neural representations in the olfactory system. For one of the findings, regarding the impact of neurogenesis on the drift, the evidence remains incomplete. The reason lies in the differences in variability/drift of the mitral/tufted cell responses observed in the model compared to experimental observations, where these responses remain stable over extended time scales.

    2. Reviewer #1 (Public review):

      Summary:

      The authors build a network model of the olfactory bulb and the piriform cortex and use it to run simulations and test their hypotheses. Given the model's settings, the authors observe drift across days in the responses to the same odors of both the mitral/tufted cells, as well as of piriform cortex neurons. When representing the M/T and PCx responses within a lower-dimensional space, the apparent drift is more prominent in the PCx, while the M/T responses appear in comparison more stable. The authors further note that introducing spike-time dependent plasticity (STDP) at bulb synapses involving abGCs slows down the drift in the PCx representations, and further link this to the observation that repeated exposure to the same odorant slows down drift in the piriform cortex.

      The model is clearly explained and relies on several assumptions and observations:

      (1) Random projections of MTC from the olfactory bulb to the piriform cortex, random intra-piriform connectivity, and random piriform to bulb connectivity.

      (2) Higher dimensionality of piriform cortex representations compared to M/T responses, which enables superior decoding of odor identity in the piriform cortex.

      (3) Spike time-dependent plasticity (STDP) at synapses involving the abGCs.

      The authors address an open topical problem, and the model is elegant in its simplicity. I have, however, several major concerns with the hypotheses underlying the model and with its biological plausibility.

      Concerns:

      (1) In their model, the authors propose that MTC remain stable at the population level, despite changes in individual MTC responses.

      The authors cite several experimental studies to support their claims that individual MTC responses to the same odors change (some increase, some decrease) across days. Interpreting the results of these studies must, however, take into account the variability of M/T responses across odor presentation repeats within the same session vs. across sessions. In the Shani-Narkiss et al., Frontiers in Neural Circuits, 2023 study referenced, a large fraction of the variability across days in M/T responses is also observed across repeats to the same odorant in the same session (Shani-Narkiss et al., Figure 4), while the authors have M/T responses in the same session that are highly reproducible. This is an important point to consider and address, since it constrains how much of the variability in M/T responses can be attributed to adult neurogenesis in the olfactory bulb versus to other networks' inhibitory mechanisms, which do not rely on neurogenesis. In the authors' model, the variability in M/T responses observed across days emerges as a result of adult-born neurogenesis, which does not need to be the main source of variability observed in imaging experiments (Shani-Narkiss et al., Figure 4).

      Another study (Kato et al., Neuron, 2012, Figure 4) reported that mitral cell responses to odors experienced repeatedly across 7 days tend to sparsen and decrease in amplitude systematically, while mitral cell responses to the same odor on day 1 vs. day 7 when the odor is not presented repeatedly in between seem less affected (although the authors also reported a decrease in the CI for this condition). As such, Kato et al. mostly report decreases in mitral cell odor responses with repeated odor exposure at both the individual and population level, and not so much increases and decreases in the individual mitral cell responses, and stability at the population level.

      (2) In Figure 1, a set of GCs is killed off, and new GCs are integrated into the network as abGCs. Following the elimination of 10% of GCs in the network, new cells are added and randomly assigned synaptic weights between these abGCs and MTC, GCs, SACs, and top-down projections from PCx. This is done for 11 days, during which time all GCs have gone through adult neurogenesis.

      Is the authors' assumption here that across the 11 days, all GCs are being replaced? This seems to depart from the known biology of the olfactory bulb granule cells, i.e., GCs survive for a large fraction of the animal's life.
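
The arithmetic behind this question can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a fixed 10% of GCs is replaced per day, and the function names are hypothetical. It contrasts two sampling schemes that the description above does not distinguish. If the daily 10% is drawn at random from all GCs (so that abGCs can be replaced again), roughly 31% of the original cells would still be present after 11 days. If it is drawn only from cells not yet replaced, the whole population turns over in 10 days.

```python
import math


def fraction_never_replaced(daily_percent: int, days: int) -> float:
    """Expected fraction of original cells left when a random daily_percent
    of ALL cells is replaced each day (cells may be replaced repeatedly)."""
    return (1.0 - daily_percent / 100.0) ** days


def days_to_full_turnover(daily_percent: int) -> int:
    """Days until every original cell is replaced when the daily batch is
    drawn only from cells that have not been replaced yet."""
    return math.ceil(100 / daily_percent)


if __name__ == "__main__":
    # Assumed values taken from the description above: 10% per day, 11 days.
    print(f"random with repeats: {fraction_never_replaced(10, 11):.3f} of original GCs remain")
    print(f"no repeats: full turnover after {days_to_full_turnover(10)} days")
```

Under either scheme the implied turnover is much faster than the long GC survival noted above, which is the point of the question.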

      (3) The authors' model relies on several key assumptions: random projections of MTC from the olfactory bulb to the piriform cortex, random intra-piriform connectivity, and random piriform to bulb connectivity. These assumptions are not necessarily accurate, as recent work revealed structure in the projections from the olfactory bulb to the piriform cortex and structure within the piriform cortex connectivity itself (Fink et al., bioRxiv, 2025; Chae et al., Cell, 2022; Zeppilli et al., eLife, 2021).

      How do the results of the model relating adult neurogenesis in the bulb to drift in the piriform cortex representations change when considering an alternative scenario in which the olfactory bulb to piriform and intra-piriform connectivity is not fully distributed and indistinguishable from random, but rather is structured?

      (4) I didn't understand the logic of the low-dimensional space analysis for M/T cells and piriform cortex neurons (Figures 2 & 3). In the authors' model, the full-ensemble M/T responses are reorganized over time, presumably due to the adult-born neurogenesis. Analyzing a lower-dimensional projection of the ensemble trajectories reveals a lower degree of re-organization. This is the same for the piriform cortex, but relatively, the piriform ensembles displayed in a low-dimensional embedding appear to drift more compared to the M/T ensembles.

      This analysis triggers a few questions: which representation is relevant for the brain function - the high or the low-dimensional projection? What fraction of response variance is included in the low-dimensional space analysis? How did the authors decide the low-dimensional cut-off? Why does STDP cause more drift in piriform cortex ensembles vs. M/T ensembles? Is this because of the assumed higher dimensionality of the piriform cortex representations compared to the mitral cells?

      (5) Could the authors comment on whether STDP at abGC synapses and its impact on decreasing drift represent a new insight, and also put it into context? Several studies (e.g., Lledo, Murthy, Komiyama groups) reported that abGCs integrate into the network in an activity-dependent manner, and not randomly, and as such stabilize the active neuronal responses, which is consistent with the authors' report.

      Relatedly, I couldn't find in the manuscript which synapses involving abGCs the authors focus on, or what the relative contribution is of the various plastic synapses shown in the cartoon in Figure 4 A1 (circles and triangles).

      (6) The study would be strengthened, in my opinion, by including specific testable predictions that the authors' models make, which can be further food for thought for experimentalists. How does suppression of adult-born neurogenesis in the OB impact the stability of mitral cell odor responses? How about piriform cortex ensembles?

    3. Reviewer #2 (Public review):

      Summary:

      The authors address a critical problem in olfactory coding. It has long been known that adult neurogenesis, specifically in the form of adult-born granule cells that embed into the existing inhibitory networks in the olfactory bulb, can potentially alter the responses of Mitral/Tufted neurons that project activity to the Piriform Cortex and to other areas of the brain. Fundamentally, it would seem that these granule cells could alter the stability of neural codes in the OB over time. The authors develop a spiking network model to explore how stability can be achieved both in the OB over time and in the PC, which receives inputs. The model recapitulates published activity recordings of M/T cells and shows how activity in different M/T cells from the same glomerulus shifts over time in ways that, in spite of the shift, preserve population/glomerular level codes. However, these different M/T cells fan out onto different pyramidal cells of the PC, which gives rise to instability at that level. STDP, then, is necessary to maintain stability at the PC level as long as odor environments remain constant. These results may also apply to a similar neurogenesis-based change in the Dentate Gyrus, which generates instability in CA1/3 regions of the hippocampus.

      Strengths:

      A robust network model that untangles important, seemingly contradictory mechanisms that underlie olfactory coding.

      Weaknesses:

      The work is a significant contribution to understanding olfactory coding. But the manuscript would benefit from a brief discussion of why neurogenesis occurs in the first place - e.g., injury, ongoing needs for plasticity, and adapting to turnover of ORNs. There is literature on this topic. It seems counterintuitive to have a process in the MOB (and for that matter in the DG) that potentially disrupts the ability to generate stable codes both in the MOB and PC, and in particular a disruption that requires two different mechanisms - multiple M/T cells per glomerulus in the MOB and STDP in the PC - to counteract.

      Given that neurogenesis has an important function, and a mechanism is in place to compensate for it in the MOB, why would it then be disrupted in fan-out projections to the PC? The answer may lie in the need for fan-out projections so that pyramidal neurons in the PC can combinatorially represent many different inputs from the MOB. So something like STDP would be needed to maintain stability in the face of the need for this coding strategy.

      This kind of discussion, or something like it, would help readers understand why these mechanisms occur in the first place. It is interesting that PC stability requires that odor environments be stable, and that this stability drives PC representational stability. This result suggests experimental work to test this hypothesis. As such, it is a novel outcome of the research.

    4. Reviewer #3 (Public review):

      Summary

      The authors set out to explore the potential relationship between adult neurogenesis of inhibitory granule cells in the olfactory bulb and cumulative changes over days in odor-evoked spiking activity (representational drift) in the olfactory stream. They developed a richly detailed spiking neuronal network model based on Izhikevich (2003), allowing them to capture the diversity of spiking behaviors of multiple neuron types within the olfactory system. This model recapitulates the circuit organization of both the main olfactory bulb (MOB) and the piriform cortex (PCx), including connections between the two (both feedforward and corticofugal). Adult neurogenesis was captured by shuffling the weights of the model's granule cells, preserving the distribution of synaptic weights. Shuffling of granule cell connectivity resulted in cumulative changes in stimulus-evoked spiking of the model's M/T cells. Individual M/T cell tuning changed with time, and ensemble correlations dropped sharply over the temporal interval examined (long enough that almost all granule cells in the model had shuffled their weights). Interestingly, these changes in responsiveness did not disrupt low-dimensional stability of olfactory representations: when projected into a low-dimensional subspace, population vector correlations in this subspace remained elevated across the temporal interval examined. Importantly, in the model's downstream piriform layer, this was not the case. There, shuffled GC connectivity in the bulb resulted in a complete shift in piriform odor coding, including for low-dimensional projections. This is in contrast to what the model exhibited in the M/T input layer. Interestingly, these changes in PCx extended to the geometrical structure of the odor representations themselves. Finally, the authors examined the effect of experience on representational drift. Using an STDP rule, they allowed the inputs to and outputs from adult-born granule cells to change during repeated presentations of the same odor. This stabilized stimulus-evoked activity in the model's piriform layer.
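
For readers unfamiliar with the Izhikevich (2003) formulation mentioned in this summary, a minimal single-neuron sketch may help. It is not the authors' network: it simulates one isolated cell with simple Euler integration. The parameter values a, b, c, d are the standard "regular spiking" values from the original Izhikevich paper. The function name, time step, and input currents are assumptions chosen for illustration.

```python
def simulate_izhikevich(current, t_max_ms=1000.0, dt_ms=0.5,
                        a=0.02, b=0.2, c=-65.0, d=8.0):
    """Simulate one Izhikevich neuron under constant input current.

    Returns the list of spike times in ms. Defaults are the regular-spiking
    parameters; everything else is an illustrative assumption.
    """
    v = c          # membrane potential (mV)
    u = b * v      # recovery variable
    spike_times = []
    for step in range(int(t_max_ms / dt_ms)):
        if v >= 30.0:              # spike cutoff: record the spike, then reset
            spike_times.append(step * dt_ms)
            v = c
            u += d
        dv = 0.04 * v * v + 5.0 * v + 140.0 - u + current
        du = a * (b * v - u)
        v += dt_ms * dv            # forward Euler update
        u += dt_ms * du
    return spike_times


if __name__ == "__main__":
    for current in (0.0, 10.0):
        n_spikes = len(simulate_izhikevich(current))
        print(f"I = {current}: {n_spikes} spikes in 1 s")
```

Changing the four parameters reproduces other firing patterns (bursting, fast spiking, and so on), which is presumably what allows a single formulation to cover the multiple neuron types described above.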

      Strengths

      This paper suggests a link between adult neurogenesis in the olfactory bulb and representational drift in the piriform cortex. Using an elegant spiking network that faithfully recapitulates the basic physiological properties of the olfactory stream, the authors tackle a question of longstanding interest in a creative and interesting manner. As a purely theoretical study of drift, this paper presents important insights: synaptic turnover of recurrent inhibitory input can destabilize stimulus-evoked activity, but only to a degree, as representations in the bulb (the model's recurrent input layer) retain their basic geometrical form. However, this destabilized input results in profound drift in the model's second (piriform) layer, where both the tuning of individual neurons and the layer's overall functional geometry are restructured. This is a useful and important idea in the drift field, and to my knowledge, it is novel. The bulb is not the only setting where inhibitory synapses exhibit turnover (whether through neurogenesis or synaptic dynamics), and so this exploration of the consequences of such plasticity on drift is valuable. The authors also elegantly explore a potential mechanism to stabilize representations through experience, using an STDP rule specific to the inhibitory neurons in the input layer. This has an interesting parallel with other recent theoretical work on drift in the piriform (Morales et al., 2025 PNAS), in which STDP in the piriform layer was also shown to stabilize stimulus representations there. It is fascinating to see that this same rule also stabilizes piriform representations when implemented in the bulb's granule cells.

      The authors also provide a thoughtful discussion regarding the differential roles of mitral and tufted cells in drift in piriform and AON and the potential roles of neurogenesis in archicortex.

      In general, this paper puts an important and much-needed spotlight on the role of neurogenesis and inhibitory plasticity in drift. In this light, it is a valuable and exciting contribution to the drift conversation.

      Weaknesses

      I have one major, general concern that I think must be addressed to permit proper interpretation of the results.

      I worry that the authors' model may confuse thinking on drift in the olfactory system, because of differences in the behavior of their model from known features of the olfactory bulb. In their model, the tuning of individual bulbar neurons drifts over time. This is inconsistent with the experimental literature on the stability of odor-evoked activity in the olfactory bulb.

      In a foundational paper, Bhalla & Bower (1997) recorded from mitral and tufted cells in the olfactory bulb of freely moving rats and measured the odor tuning of well-isolated single units across a five-day interval. They found that the tuning of a single cell was quite variable within a day, across trials, but that this variability did not increase with time. Indeed, their measure of response similarity was equivalent within and across days. In what now reads as a prescient anticipation of the drift phenomenon, Bhalla and Bower concluded: "it is clear, at least over five days, that the cell is bounded in how it can respond. If this were not the case, we would expect a continual increase in relative response variability over multiple days (the equivalent of response drift). Instead, the degree of variability in the responses of single cells is stable over the length of time we have recorded." Thus, even at the level of single cells, this early paper argues that the bulb is stable.

      This basic result has since been replicated by several groups. Kato et al. (2012) used chronic two-photon calcium imaging of mitral cells in awake, head-fixed mice and likewise found that, while odor responses could be modulated by recent experience (odor exposure leading to transient adaptation), the underlying tuning of individual cells remained stable. While experience altered mitral cell odor responses, those responses recovered to their original form at the level of the single neuron, maintaining tuning over extended periods (two months). More recently, the Mizrahi lab (Shani-Narkiss et al., 2023) extended chronic imaging to six months, reporting that single-cell odor tuning curves remained highly similar over this period. These studies reinforce Bhalla and Bower's original conclusion: despite trial-to-trial variability, olfactory bulb neurons maintain stable odor tuning across extended timescales, with plasticity emerging primarily in response to experience. (The Yamada et al., 2017 paper, which the authors here cite, is not an appropriate comparison. In Yamada, mice were exposed daily to odor. Therefore, the changes observed in Yamada are a function of odor experience, not of time alone. Yamada does not include data in which the tuning of bulb neurons is measured in the absence of intervening experience.)

      Therefore, a model that relies on instability in the tuning of bulbar neurons risks giving the incorrect impression that the bulb drifts over time. This difference should be explicitly addressed by the authors to avoid any potential confusion. Perhaps the best course of action would be to fit their model to Mizrahi's data, should this data be available, and see if, when constrained by empirical observation, the model still produces drift in piriform. If so, this would dramatically strengthen the paper. If this is not feasible, then I suggest being very explicit about this difference between the behavior of the model and what has been shown empirically. I appreciate that in the data there is modest drift (e.g., Shani-Narkiss' Figure 8C), but the changes reported there really are modest compared to what is exhibited by the model. A compromise would be to simply apply these metrics to the model and match the model's similarity to the Shani-Narkiss data. Then the authors could ask what effect this has on drift in piriform.

      The risk here is that people will conclude from this paper that drift in piriform may simply be inherited from instability in the bulb. This view is inconsistent with what has been documented empirically, and so great care is warranted to avoid conveying that impression to the community.

      Major comments (all related to the above point)

      (1) Lines 146-168: The authors find in their model that "individual M/T cells changed their responses to the same odor across days due to adult-neurogenesis, with some cells decreasing the firing rate responses (Fig.2A1 top) while other cells increased the magnitude of their responses (Fig. 2A2 bottom, Fig. S2)." They also report a significant decrease in the "full ensemble correlation" in their model over time. They claim that these changes in individual cell tuning are "similar to what has been observed by others using calcium imaging of M/T cell activity (Kato et al., 2012 and Yamada et al., 2017)" and that the decrease in full ensemble correlation is "consistent with experimental observations (Yamada et al., 2017)." However, the conditions of the Kato and Yamada experiments that demonstrate response change are not comparable here, as odors were presented daily to the animals in these experiments. Therefore, the changes in odor tuning found in the Kato and Yamada papers (Kato Figure 4D; Yamada Figure 3E) are a function of accumulated experience with odor. This distinction is crucial because experience-induced changes reflect an underlying learning process, whereas changes that simply accumulate over time are more consistent with drift. The conditions of their model are more similar to those employed in other experiments described in Kato et al. 2012 (Figure 6C) as well as Shani-Narkiss et al. (2023), in which bulb tuning is measured not as a function of intervening experience, but rather as a function of time (Kato's "recovery" experiment). What is found in Kato is that even across two months, the tuning of individual mitral cells is stable. What alters tuning is experience with odor, the core finding of both the Kato et al., 2012 paper and also Yamada et al., 2017. It is crucial that this is clarified in the text.

      (2) The authors show that in a reduced-space correlation metric, the correlation of low-dimensional trajectories "remained high across all days"..."consistent with a recent experimental study" (Shani-Narkiss et al., 2023). It is true that in the Shani-Narkiss paper, a consistent low-dimensional response is found across days (t-SNE analysis in Shani-Narkiss Figure 7B). However, the key difference between the Shani-Narkiss data and the results reported here is that Shani-Narkiss also observed relative stability in the native space (Shani-Narkiss Figure 8). They conclude that they "find a relatively stable response of single neurons to odors in either awake or anesthetized states and a relatively stable representation of odors by the MC population as a whole (Figures 6-8; Bhalla and Bower, 1997)." This should be better clarified in the text.

      (3) In the discussion, the authors state that "In the MOB, individual M/T cells exhibited variable odor responses akin to gain control, altering their firing rate magnitudes over time. This is consistent with earlier experimental studies using calcium-imaging." (L314-6). Again, I disagree that these data are consistent with what has been published thus far. Changes in gain would have resulted in increased variability across days in the Bhalla data. Moreover, changes in gain would be captured by Kato's change index ("To quantify the changes in mitral cell responses, we calculated the change index (CI) for each responsive mitral cell-odor pair on each trial (trial X) of a given day as (response on trial X - the initial response on day 1)/(response on trial X + the initial response on day 1). Thus, CI ranges from −1 to 1, where a value of −1 represents a complete loss of response, 1 represents the emergence of a new response, and 0 represents no change." Kato et al.). This index will capture changes in gain. However, as shown in Figure 4D (red traces), Figure 6C (Recovery and Odor set B during odor set A experience and vice versa), the change index is either zero or near zero. If the authors wish to claim that their model is consistent with these data, they should also compute Kato's change index for M/T odor-cell pairs in their model and show that it also remains at 0 over time, absent experience.
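
      To make the requested check in comment (3) concrete, the following is a minimal sketch of the change index as quoted from Kato et al., applied to model output. The function name, the example response values, and the handling of a zero denominator are assumptions for illustration only; they are not taken from the manuscript or from Kato et al. The sketch also assumes non-negative response magnitudes, so that the index stays within −1 to 1.

```python
def change_index(response, initial_response):
    """Change index as quoted above: (R_x - R_1) / (R_x + R_1).

    response:         response magnitude on trial X (assumed non-negative)
    initial_response: initial response magnitude on day 1 (assumed non-negative)
    """
    denominator = response + initial_response
    if denominator == 0:
        # No response on either occasion: the ratio is undefined.
        # Returning 0.0 ("no change") is an assumption made for this sketch.
        return 0.0
    return (response - initial_response) / denominator


# Hypothetical response magnitudes for one model M/T cell-odor pair.
initial = 2.0
later_sessions = [2.0, 1.9, 2.1]
indices = [change_index(r, initial) for r in later_sessions]
```

      Under this definition, a model whose M/T tuning is stable in the absence of experience should yield indices that stay close to 0 across days, whereas a sustained change in gain would move them away from 0.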

    1. eLife Assessment

      This valuable study compares auditory cortex responses to sounds and cochlear implant stimulation measured with surface electrode grids in rats. Beyond the reduced frequency resolution of cochlear implants observed previously, this study suggests key discrepancies between neuronal representations of cochlear stimulations and natural sounds. However, the evidence for this potentially interesting result is incomplete because there is a lack of evidence for the effectiveness of the comparison method. This study is of interest to researchers in the auditory neuroscience field and clinicians implementing treatments with cochlear implants.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript addresses an important question: whether cortical population codes for cochlear-implant (CI) stimulation resemble those for natural acoustic input or constitute a qualitatively different representation. The authors record intracranial EEG (µECoG) responses to pure tones in normal-hearing rats and to single-channel CI pulses in bilaterally deafened, acutely implanted rats, analysing the data with ERP/high-gamma measures, tensor component analysis (TCA), and information-theoretic decoding. Across several readouts, the acoustic condition supports better single-trial stimulus classification than the CI condition. However, stronger decoding does not, on its own, establish that the acoustic responses instantiate a "richer" cortical code, and the evidence for orderly spatial organisation is not compelling for CI, and is also less evident than expected for normal-hearing, given prior knowledge. The overall narrative is interesting, but at present, the conclusions outpace the data because of statistical, methodological, and presentation issues.

      Strengths:

      The study poses a timely, clinically relevant question with clear implications for CI strategy. The analytical toolkit is appropriate: µECoG captures mesoscale patterns; TCA offers a transparent separation of spatial and temporal structure; and mutual-information decoding provides an interpretable measure of single-trial discriminability. Within-subject recordings in a subset of animals, in principle, help isolate modality effects from inter-animal variability. Where analyses are most direct, the acoustic condition yields higher single-trial decoding accuracy, which is a meaningful and clearly presented result.

      Weaknesses:

      Several limitations constrain how far the conclusions can be taken. Parts of the statistical treatment do not match the data structure: some comparisons mix paired and unpaired animals but are analysed as fully paired, raising concerns about misestimated uncertainty. Methodological reporting is incomplete in places; essential parameters for both acoustic and electrical stimulation, as well as objective verification of implantation and deafening, are not described with sufficient detail to support confident interpretation or replication. Figure-level clarity also undermines the message. In Figure 2, non-significant slopes for CI, repeated identification of a single "best channel," mismatched axes, and unclear distinctions between example and averaged panels make the assertion of spatial organisation unconvincing; importantly, the normal-hearing panels also do not display tonotopy as clearly as expected, which weakens the key contrast the paper seeks to establish. Finally, the decoding claims would be strengthened by simple internal controls, such as within-modality train/test splits and decoding on raw ERP/high-gamma features to demonstrate that poor cross-modal transfer reflects genuine differences in the underlying responses rather than limitations of the modelling pipeline.

    3. Reviewer #2 (Public review):

      Summary:

      This article reports measurements of iEEG signals on the rat auditory cortex during cochlear implant or sound stimulation in separate groups of rats. The observations indicate some spatial organization of responses to cochlear implant stimuli, but one that is very different from that of responses to sounds.

      Strengths:

      The study includes interesting analyses of the sound and cochlear implant representation structure based on decoders.

      Weaknesses:

      The observation that responses to cochlear implant stimulation are spatially organized is not new (e.g., Adenis et al. 2024).

      The claim that spatial and temporal dimensions contribute information about the sound is also not new; there is a large literature on this topic. Moreover, the results shown here are extremely weak. They show similar levels of information in the spatial and temporal dimensions, and no synergy between the two dimensions. This is, however, likely the consequence of high measurement noise leading to poor accuracy in the information estimates, as the authors state.

      The main claim of the study - the mismatch between cochlear implant and sound representation - is not supported. The responses to each modality are measured in different animals. The authors do not show that they actually can compare representations across animals (e.g., for the same sounds). Without this positive control, there is no reason to think that it is possible to decode from one animal with a decoder trained on another, and the negative result shown by the authors is therefore not surprising.

    4. Reviewer #3 (Public review):

      Summary:

      Through micro-electroencephalography, Hight and colleagues studied how the auditory cortex in its ensemble responds to cochlear implant stimulation compared to the classic pure tones. Taking advantage of a double-implanted rat model (Micro-ECoG and Cochlear Implant), they tracked and analyzed changes happening in the temporal and spatial aspects of the cortical evoked responses in both normal hearing and cochlear-implanted animals. After establishing that single-trial responses were sufficient to encode the stimuli's properties, the authors then explored several decoder architectures to study the cortex's ability to encode each stimulus modality in a similar or different manner. They conclude that a) intracranial EEG evoked responses can be accurately recorded and did not differ between normal hearing and cochlear-implanted rats; b) although coarsely spatially organized, CI-evoked responses had higher trial-by-trial variability than pure tones; c) stimulus identity is independently represented by temporal and spatial aspects of cortical representations and can be accurately decoded by various means from single trials; and d) a decoder trained on pure tones cannot decode CI-stimulus identity accurately.

      Strengths:

      The model combining micro-eCoG and cochlear implantation and the methodology to extract both the Event Related Potentials (ERPs) and High-Gammas (HGs) is very well designed and appropriately analyzed. Likewise, the PCA-LDA and TCA-LDA are powerful tools that take full advantage of the information provided by the cortical ensembles.

      The overall structure of the paper, with a paced and exhaustive progress through each step and evolution of the decoder, is very appreciable and easy to follow. The exploration of single-trial encoding and stimulus identity through temporal and spatial domains is providing new avenues to characterize the cortical responses to CI stimulations and their central representation. The fact that single trials suffice to decode the stimulus identity regardless of their modality is of great interest and noteworthy. Although the authors confirm that iEEG remains difficult to transpose in the clinic, the insights provided by the study confirm the potential benefit of using central decoders to help in clinic settings.

      Weaknesses:

      The conclusion of the paper, especially the concept of distinct cortical encoding for each modality, is unfortunately only partially supported by the results, as the authors did not adequately consider fundamental limitations of CI-related stimulation.

      First, the reviewer assumed that the authors stimulated in a Monopolar mode, which, albeit being clinically relevant, notoriously generates a high current spread in rodent models. Second, comparing the averaged BF maps for iEEG (Figure 2A, C), BFs ranged from 4 to 16kHz with a predominance of 4kHz BFs. The lack of BFs at higher frequencies hints at a potential location mismatch between the frequency range sampled at the level of the cortex (low to medium frequencies) and the frequency range covered by the CI inserted mostly in the first turn-and-a-half of the cochlea (high to medium frequencies). Looking at Figure 2F (and to some extent 2A), most of the CI electrodes elicited responses around the 4kHz regions, and averaged maps show a predominance of CI-3-4 across the cortex (Figure 2C, H) from areas with 4kHz BF to areas with 16kHz BF. It is doubtful that CI-3-4 are located near the 4kHz region based on Müller's work (1991) on the frequency representation in the rat cochlea.

      Taken together with the flat Pearson correlations, the decoder examples showing a strong ability to identify CI-4 and CI-3, and Figure 8D, E presenting a strong prediction of 4kHz and 8kHz for all the CI electrodes when using a pure-tone-trained decoder, it is possible that current spread ended up indiscriminately stimulating higher turns of the cochlea or even the modiolus in a non-specific manner, greatly reducing (or smearing) the place-coding/frequency resolution of each electrode, which in turn could explain the coarse topographic (or coarsely tonotopic according to the manuscript) organization of the cortical responses. Thus, the conclusion that there are distinct encodings for each modality is biased, as it might not account for monopolar smearing. To that end, and since it is the study's main message and title, it would have benefited from having a subgroup of animals using bipolar stimulations (or any focused strategy since they provide reduced current spread) to compare the spatial organization of iEEG responses and the performances of the different decoders to dismiss current spread and strengthen their conclusion.

      Nevertheless, the reviewer wants to reiterate that the study proposed by Hight et al. is well constructed, relevant to the field, and that the overall proposal of improving patient performances and helping their adaptation in the first months of CI use by studying central responses should be pursued as it might help establish new guidelines or create new clinical tools.

    1. eLife Assessment

      This important study uses simultaneous EEG and fMRI recordings to shed light on the relationship between alpha and gamma oscillations and specific cortical layers. The sophisticated methodology provides solid evidence for correlations between oscillatory power and the strength and contents of fMRI signals in different cortical layers, though some caveats remain. This paper will be of interest to neuroscientists studying the role and mechanisms of alpha and gamma oscillations.

    2. Reviewer #1 (Public review):

      In this manuscript, Clausner and colleagues use simultaneous EEG and fMRI recordings to clarify how visual brain rhythms emerge across layers of early visual cortex. They report that gamma activity correlates positively with feature-specific fMRI signals in superficial and deep layers. By contrast, alpha activity generally correlated negatively with fMRI signals, with two higher frequencies within the alpha reflecting feature-specific fMRI signals. This feature-specific alpha code indicates an active role of alpha oscillations in visual feature coding, providing compelling evidence that the functions of alpha oscillations go beyond cortical idling or feature-unspecific suppression.

      The study is very interesting and timely. Methodologically, it is state-of-the-art. The findings on a more active role of alpha activity that goes beyond the classical idling or suppression accounts are in line with recent findings and theories. In sum, this paper makes a very nice contribution. I still have a few comments that I outline below, regarding the data visualization, some methodological aspects, and a couple of theoretical points.

      (1) The authors put a lot of effort into the figure design. For instance, I really like Figure 1, which conveys a lot of information in a nice way. Figures 3 and 4, however, seem overengineered, and it takes a lot of time to distill the contents from them. The fact that they have a supplementary figure explaining the composition of these figures already indicates that the authors realized this is not particularly intuitive. First of all, the ordering of the conditions is not really intuitive. Second, the indication of significance through saturation does not really work; I have a hard time discerning the more and less saturated colors. And finally, the white dots do not really help either. I don't fully understand why they are placed where they are placed (e.g., in Figure 3). My suggestion would be to get rid of one of the factors (I think the voxel selection threshold could go: the authors could run with one of the stricter ones, and the rest could go into the supplement?) and then turn this into a few line plots. That would be so much easier to digest.

      (2) The division between high- and low-frequency alpha in the feature-specific signal correspondence is very interesting. I am wondering whether there is an opposite effect in the feature-unspecific signal correspondence. Would the high-frequency alpha show less of a feature-unspecific correlation with the BOLD?

      (3) In the discussion (line 330 onwards), the authors mention that low-frequency alpha is predominantly related to superficial layers, referencing Figure 4A. I have a hard time appreciating this pattern there. Can the authors provide some more information on where to look?

      (4) How did the authors deal with the signal-to-noise ratio (SNR) across layers, where the presence of larger draining veins typically increases BOLD (and thereby SNR) in superficial layers? This may explain the pattern of feature-unspecific effects in the alpha (Figure 3). Can the authors perform some type of SNR estimate (e.g., split-half reliability of voxel activations or similar) across layers to check whether SNR plays a role in this general pattern?

      (5) The GLM used for modelling the fMRI data included lots of regressors, and the scanning was intermittent. How much data was available in the end for sensibly estimating the baseline? This was not really clear to me from the methods (or I might have missed it). This seems relevant here, as the sign of the beta estimates plays a major role in interpreting the results here.

      (6) Some recent research suggests that gamma activity, much in contrast to the prevailing view of the mechanism for feedforward information propagation, relates to the feedback process (e.g., Vinck et al., 2025, TiCS). This view kind of fits with the localization of gamma to the deep layer here?

      (7) Another recent review (Stecher et al., 2025, TiNS) discusses feature-specific codes in visual alpha rhythms quite a bit, and it might be worth discussing how your results align with the results reported there.
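
      As an illustration of the SNR check suggested in comment (4), the following is a minimal sketch of a split-half reliability estimate for one layer. It assumes per-trial voxel activation patterns are available as equal-length lists; the function names, the odd/even split, and the use of the Spearman-Brown correction are choices made for this sketch, not a description of the authors' pipeline.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant pattern")
    return sxy / math.sqrt(sxx * syy)


def split_half_reliability(trials):
    """Split-half reliability of voxel patterns for one layer.

    trials: list of per-trial voxel activation patterns (hypothetical input).
    Odd- and even-numbered trials are averaged separately, the two mean
    patterns are correlated, and the Spearman-Brown correction is applied.
    """
    half_a = trials[0::2]
    half_b = trials[1::2]
    mean_a = [sum(col) / len(col) for col in zip(*half_a)]
    mean_b = [sum(col) / len(col) for col in zip(*half_b)]
    r = pearson(mean_a, mean_b)
    if r <= 0:
        # Spearman-Brown is only meaningful for positive correlations;
        # returning the raw value here is an assumption of this sketch.
        return r
    return 2 * r / (1 + r)
```

      Computing this value separately for deep, middle, and superficial layers would indicate whether reliability, and by extension SNR, increases towards the cortical surface in the same way as the feature-unspecific alpha effects.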

    3. Reviewer #2 (Public review):

      The authors address a long-standing controversy regarding the functional role of neural oscillations in cortical computations and layer-specific signalling. Several studies have implicated gamma oscillations in bottom-up processing, while lower-frequency oscillations have been associated with top-down signalling. Therefore, the question the authors investigate is both timely and theoretically relevant, contributing to our understanding of feedforward and feedback communication in the brain. This paper presents a novel and complicated data acquisition technique, the application of simultaneous EEG and fMRI, to benefit from both temporal and spatial resolution. A sophisticated data analysis method was executed in order to understand the underlying neural activity during a visual oddball task. Figures are well-designed and appropriately represent the results, which seem to support the overall conclusions. However, some of the claims (particularly those regarding the contribution of gamma oscillations) feel somewhat overstated, as the results offer indeed some significant evidence, but most seem more like a suggestive trend. Nonetheless, the paper is well-written, addresses a relevant and timely research question, introduces a novel and elegant analysis approach, and presents interesting findings. Further investigation will be important to strengthen and expand upon these insights.

      One of the main strengths of the paper lies in the use of a well-established and straightforward experimental paradigm (the visual oddball task). As a result, the behavioural effects reported were largely expected and reassuring to see replicated. The acquisition technique used is very novel, and while this may introduce challenges for data analysis, the authors appear to have addressed these appropriately.

      Later findings are very interesting, and mainly in line with our current understanding of feedback and feedforward signalling. However, the layer weight calculation is lacking in the manuscript. While it is discussed in the methods, it would help to briefly explain in the results how these weights are calculated, so that the reader can better follow what is being interpreted.

      Line 104 states there is one virtual channel per hemisphere for low and high frequencies. It may be helpful to include the number of channels (n=4) in the results section, as specified in the methods. Also, this raises the question of whether a single virtual channel (i.e., voxel) provides sufficient information for reproducibility.

      One area that would benefit from further clarification is the interpretation of gamma oscillations. The evidence for gamma involvement in the observed effects appears somewhat limited. For example, no significant gamma-related clusters were found for the feature-unspecific BOLD signal (Figure 2). Significant effects emerged only when the analysis was restricted to positively responding voxels, and even then, only for the contrast between EEG-coherent and EEG-incoherent conditions in the feature-specific BOLD response. It remains unclear how to interpret this selective emergence of gamma-related effects. Given previous literature linking gamma to feedforward processing, one might expect more robust involvement in broader, feature-unspecific contrasts. The current discussion presents the gamma-related findings with some confidence, and the manuscript would benefit from a more nuanced reflection on why these effects may not have appeared more broadly. The explanation provided in line 230, that restricting the analysis to positively responding voxels may have increased the SNR, is reasonable, but it may not fully account for the absence of gamma effects in V1's feature-unspecific response. Including the actual beta values from Figure 4 in the legend or main text would also help readers better assess the strength and specificity of the reported effects.

      Relating to behavioural findings for underlying neural activity, could the authors test on a trial-by-trial basis how behavioural performance relates to the BOLD signal / oscillatory activity change? Line 305 states that "Since behavioural performance in the present study was consistently high at 94% on average and participants were instructed to respond quickly to potential oddball stimuli, a higher alpha frequency might reflect a more successful stimulus encoding and hence faster and more accurate behavioural performance." Also, this might help to relate the findings to the lower vs upper alpha functionality difference.

      In Figure 4, the EEG alpha specificity plot shows relatively large error bars, and there is visible overlap between the lower and upper alpha in both congruent and incongruent conditions. While upper alpha shows a positive slope across conditions and lower alpha remains flat, the interaction appears to be driven by the change from congruent to incongruent in upper alpha. It is worth clarifying whether the simple effects (e.g., lower vs upper within each condition) were tested, given the visual similarity at the incongruent condition. Overall, the significant interaction (p < 0.001, FDR-corrected) is consistent with diverging trends, but a breakdown of simple effects would help interpret the result more clearly. Was there a significant difference between lower and upper alpha in congruent or incongruent conditions?

      Overall, this study provides a valuable contribution to the literature on oscillatory dynamics and laminar fMRI, though some interpretations would benefit from further clarification or qualification.

    4. Reviewer #3 (Public review):

      Summary:

      Clausner et al. investigate the relationship between cortical oscillations in the alpha and gamma bands and the feature-specific and feature-unspecific BOLD signals across cortical layers. Using a well-designed stimulus and GLM, they show a method by which different BOLD signals can be differentiated and investigated alongside multiple cortical oscillatory frequencies. In addition to the previously reported positive relationship between gamma and BOLD signals in superficial layers, they show a relationship between gamma and feature-specific BOLD in the deeper layers. Alpha-band power is shown to have a negative relationship with the negative BOLD response for both feature-specific and feature-unspecific contrasts. When separated into lower (8-10Hz) and upper (11-13Hz) alpha oscillations, they show that higher frequency alpha showed a significantly stronger negative relationship with congruency, and can therefore be interpreted as more feature-specific than lower frequency alpha.

      Strengths:

      The use of interleaved EEG-fMRI has provided a rich dataset that can be used to evaluate the relationship of cortical layer BOLD signals with multiple EEG frequencies. The EEG data were of sufficient quality to see the modulation of both alpha-band and gamma-band oscillations in the group mean VE-channel TFS. The good EEG data quality is backed up with a highly technical analysis pipeline that ultimately enables the interpretation of the cortical layer relationship of the BOLD signal with a range of frequencies in the alpha and gamma bands. The stimulus design allowed for the generation of multiple contrasts for the BOLD signal and the alpha/gamma oscillations in the GLM analysis. Feature-specific and unspecific BOLD contrasts are used with congruently or incongruently selected EEG power regressors to delineate between local and global alpha modulations. A transparent approach is used for the selection of voxels contributing to the final layer profiles, for which statistical analysis is comprehensive but uses an alternative statistical test, which I have not seen in previous layer-fMRI literature.

      A significant negative relationship between alpha-band power and the BOLD signal was seen in congruently (EEGco) selected voxels (predominantly in superficial layers) and in feature-contrast (EEGco-inco) selected (superficial and deep layers). When separated into lower (8-10Hz) and upper (11-13Hz) alpha oscillations, they show that higher frequency alpha showed a significantly stronger negative relationship with congruency than lower frequency alpha. This is interpreted as a frequency dissociation in the alpha-BOLD relationship, with upper frequency alpha being feature-specific and lower frequency alpha corresponding to general modulation. These results are a valuable addition to the current literature and improve our current understanding of the role of cortical alpha oscillations.

      There is not much work in the literature on the relationship between alpha power and the negative BOLD response (NBR), so the data provided here are particularly valuable. The negative relationship between the NBR and alpha power shown here suggests that there is a reduction in alpha power, linked to locally reduced BOLD activity, which is in line with the previously hypothesized inhibitory nature of alpha.

      Weaknesses:

      It is not entirely clear how the draining vein effect seen in GE-BOLD layer-fMRI data has been accounted for in the analysis. For the contrast of congruent-incongruent, it is assumed that the underlying draining effect will be the same for both conditions, and so should be cancelled out. However, for the other contrasts, it is unclear how the final layer profiles aren't confounded by the bias in BOLD signal towards the superficial layers. Many of the profiles in Figure 3 and Figure 4A show an increased negative correlation between alpha power and the BOLD signal towards the superficial layers.

      When investigating whether low alpha (8-10 Hz) and high alpha (11-13 Hz) are two different sources of alpha, it would be beneficial to show whether this effect is only seen at the group level or can also be seen in single subjects. Inter-subject variability in peak alpha power could result in some subjects having a single low alpha peak and some a single high alpha peak rather than two peaks from different sources.

      The figure layout used to present the main findings throughout is an innovative way to present so much information, but it is difficult to decipher the main findings described in the text. The readability would be improved if the example (Appendix 0 - Figure 1) in the supplementary material is included as a second panel inside Figure 3, or, if this is not possible, the example (Appendix 0 - Figure 1) should be clearly referred to in the figure caption.

    1. eLife Assessment

      This important study shows that visual search for upright and rotated objects is affected by rotating participants in a VR and gravitational reference frame. However, the evidence supporting this conclusion is incomplete, given the authors' use of normalized response time and the assumption that object recognition across rotations requires mental rotation.

    2. Reviewer #1 (Public review):

      Summary:

      The current study sought to understand which reference frames humans use when doing visual search in naturalistic conditions. To this end, they had participants do a visual search task in a VR environment while manipulating factors such as object orientation, body orientation, gravitational cues, and visual context (where the ground is). They generally found that all cues contributed to participants' performance, but visual context and gravitational cues impacted performance the most, suggesting that participants represent space in an allocentric reference frame during visual search.

      Strengths:

      The study is valuable in that it sheds light on which cues participants use during visual search. Moreover, I appreciate the use of VR and precise psychophysical predictions (e.g., slope vs. intercept) to dissociate between possible reference frames.

      Weaknesses:

      It's not clear what the implications of the study are beyond visual search. Moreover, I have some concerns about the interpretation of Experiment 1, which relies on an incorrect interpretation of mental rotation. Thus, most of the conclusions rely on Experiment 2, which has a small sample size (n = 10). Finally, the statistical analyses could be strengthened with measures of effect size and non-parametric statistics.

    3. Reviewer #2 (Public review):

      Summary:

      This paper addresses an interesting issue: how is the search for a visual target affected by its orientation (and the viewer's) relative to other items in the scene and gravity? The paper describes a series of visual search tasks, using recognizable targets (e.g., a cat) positioned within a natural scene. Reaction times and accuracy at determining whether the target was present or absent, trial-to-trial, were measured as the target's orientation, that of the context, and of the viewer themselves (via rotation in a flight simulator) were manipulated. The paper concludes that search is substantially affected by these manipulations, primarily by the reference frame of gravity, then visual context, followed by the egocentric reference frame.

      Strengths:

      This work is on an interesting topic, and benefits from using natural stimuli in VR / flight simulator to change participants' POV and body position.

      Weaknesses:

      There are several areas of weakness that I feel should be addressed.

      (1) The literature review/introduction seems to be lacking in some areas. The authors, when contemplating the behavioral consequences of searching for a 'rotated' target, immediately frame the problem as one of rotation, per se (i.e., contrasting only rotation-based explanations; "what rotates and in which 'reference frame[s]' in order to allow for successful search?"). For a reader not already committed to this framing, many natural questions arise that are worth addressing.

      1a) Why do we need to appeal to rotation at all as opposed to, say, familiarity? A rotated cat is less familiar than a typically oriented one. This is a long-standing literature (e.g., Wang, Cavanagh, and Green (1994)), of course, with a lot to unpack.

      1b) What are the triggers for the 'corrective' rotation that presumably brings reference frames back into alignment? What if the rotation had not been so obvious (i.e., for a target that may not have a typical orientation, like a hand, a ball, or a learned nonsense object) or the background had not had such a clear orientation (like a cluttered non-naturalistic background; a naturalistic backdrop viewed from an unfamiliar POV, e.g., from above; or a naturalistic background in which not all of the elements were rotated)? What, ultimately, is rotated? The entire visual field? Does that mean that searching for multiple targets at different angles of rotation would interfere with one another?

      1c) Relatedly, what is the process by which the visual system comes to know the 'correct' rotation? (Or, alternatively, is 'triggered to realize' that there is a rotation in play?) Is this something that needs to be learned? Is it only learned developmentally, through exposure to gravity? Could it be learned in the context of an experiment that starts with unfamiliar stimuli?

      1d) Why the appeal to natural images? I appreciate any time a study can be moved from potentially too stripped-down laboratory conditions to more naturalistic ones, but is this necessary in the present case? Would the pattern of results have been different if these were typical laboratory 'visual search' displays of disconnected object arrays?

      1e) How should we reconcile rotation-based theories of 'rotated-object' search with visual search results from zero gravity environments (e.g., for a review, see Leone (1998))?

      1f) How should we reconcile the current manipulations with other viewpoint-perspective manipulations (e.g., Zhang & Pan (2022))?

      (2) The presentation/interpretation of results would benefit from more elaboration and justification.

      2a) All of the current interpretations rely on just the RT data. First, the RT results should also be presented in natural units (i.e., seconds/ms), not normalized. As well, results should be shown as violin plots or something similar that captures distribution - a lot of important information is lost when just presenting one 'average' dot across participants. More fundamentally, I think we need to have a better accounting for performance (percent correct or d') to help contextualize the RT results. We should at least be offered some visualization (Heitz, 2014) of the speed accuracy trade-off for each of the conditions. Following this, the authors should more critically evaluate how any substantial SAT trends could affect the interpretation of results.

      2b) Unless I am missing something, the interpretation of the pattern of results (both qualitatively and quantitatively in their 'relative weight' analysis) relies on how they draw their contrasts. For instance, the authors contrast the two 'gravitational' conditions (target 0 deg versus target 90 deg) as if this were a change in a single variable/factor. But there are other ways to understand these manipulations that would affect contrasts. For instance, if one considers whether the target was 'consistent' (i.e., typically oriented) with respect to the context, egocentric, and gravitational frames, then the 'gravitational 0 deg' condition is consistent with context, egocentric view, but inconsistent with gravity. And, the 'gravitational 90 deg' condition, then, is inconsistent with context, egocentric view, but consistent with gravity. Seen this way, this is not a change in one variable, but three. The same is true of the baseline 0 deg versus baseline 90 deg condition, where again we have a change in all three target-consistency variables. The 'one variable' manipulations then would be: 1) baseline 0 versus visual context 0 (i.e., a change only in the context variable); 2) baseline 0 versus egocentric 0 (a change only in the egocentric variable); and 3) baseline 0 versus gravitational 0 (a change only in the gravitational variable). Other contrasts (e.g., gravitational 90 versus context 90) would showcase a change in two variables (in this case, a change in both context and gravity). My larger point is, again, unless I am really missing something, that the choice of how to contrast the manipulations will affect the 'pattern' of results and thereby the interpretation. If the authors agree, this needs to be acknowledged, plausible alternative schemes discussed, and the ultimate choice of scheme defended as the most valid.

      2c) Even with this 'relative weight' interpretation, there are still some patterns of results that seem hard to account for. Primarily, the egocentric condition seems hard to account for under any scheme, and the authors need to spend more time discussing/reconciling those results.

      2d) Some results are just deeply counterintuitive, and so the reader will crave further discussion. Most saliently for me, based on the results of Experiment 2 (specifically, the fact that gravitational 90 had better performance than gravitational 0), designers of cockpits should have all gauges/displays rotate counter to the airplane so that they are always consistent with gravity, not the pilot. Is this indeed a fair implication of the results?

      2e) I really craved some 'control conditions' here to help frame the current results. In keeping with the rhetorical questions posed above in 1a/b/c/d, if/when the authors engage with revisions to this paper, I would encourage the inclusion of at least some new empirical results. For me the most critical would be to repeat some core conditions, but with a symmetric target (e.g. a ball) since that would seem to be the only way (given the current design) to tease out nuisance confounding factors such as, say, the general effect of performing search while sideways. Put another way, the authors would have to assume here that search (non-normalized RTs and search performance) for a ball-target in the baseline condition would be identical to that in the gravitational condition.
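
      The accuracy measures requested in comment 2a can be sketched as follows. This is a minimal illustration, not the authors' analysis: it computes d' and the response criterion from target-present/target-absent counts using only the Python standard library. The function name, the log-linear correction, and all trial counts are assumptions made for the example.

```python
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion) from present/absent trial counts."""
    # Log-linear correction (an assumption of this sketch): adding 0.5 to each
    # count keeps the rates strictly between 0 and 1, where inv_cdf is defined.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


# Hypothetical counts per condition: (hits, misses, false alarms, correct rejections).
conditions = {
    "baseline_0": (90, 10, 10, 90),
    "gravitational_0": (80, 20, 15, 85),
}

for name, counts in conditions.items():
    d_prime, criterion = sdt_measures(*counts)
    print(f"{name}: d' = {d_prime:.2f}, criterion = {criterion:.2f}")
```

      Reporting d' next to the raw response times for each condition would make it easier to judge whether a faster condition is also a less accurate one, which is the speed-accuracy concern raised in 2a.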

    4. Reviewer #3 (Public review):

      The study tested how people search for objects in natural scenes using virtual reality. Participants had to find targets among other objects, shown upright or tilted. The main results showed that upright objects were found faster and more accurately. When the scene or body was rotated, performance changed, showing that people use cues from the environment and gravity to guide search.

      The manuscript is clearly written and well designed, but there are some aspects related to methods and analyses that would benefit from stronger support.

      First, the sample size is not justified with a power analysis, nor is it explained how it was determined. This is an important point to ensure robustness and replicability.

      Second, the reaction time data were processed using different procedures, such as the use of the median to exclude outliers and an ad hoc cut-off of 50 ms. These choices are not sufficiently supported by a theoretical rationale, and could appear as post-hoc decisions.

      Third, the mixed-model analyses are overall well-conducted; however, the specification of the random structure deserves further consideration. The authors included random intercepts for participants and object categories, which is appropriate. However, they did not include random slopes (e.g., for orientation or set size), meaning that variability in these effects across participants was not modelled. This simplification can make the models more stable, but it departs from the maximal random structure recommended by Barr et al. (2013). The authors do not explicitly justify this choice, and a reviewer may question why participant-specific variability in orientation effects, for example, was not allowed. Given the modest sample sizes (20 in Experiment 1 and 10 in Experiment 2), convergence problems with more complex models are likely. Nonetheless, ignoring random slopes can, in principle, inflate Type I error rates, so this issue should at least be acknowledged and discussed.
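
      The contrast drawn in the third point can be written out explicitly. The notation below is an assumption introduced for illustration (participant i, object category j, trial t, and a single predictor x such as target orientation); it is not taken from the manuscript. The first equation is the random-intercepts-only structure described above, and the second adds a by-participant random slope in the spirit of Barr et al. (2013).

```latex
% Random intercepts only (participants u_{0i}, object categories w_{0j})
\mathrm{RT}_{ijt} = \beta_0 + \beta_1 x_{ijt} + u_{0i} + w_{0j} + \varepsilon_{ijt}

% Adding a by-participant random slope u_{1i} for the predictor x
\mathrm{RT}_{ijt} = \beta_0 + (\beta_1 + u_{1i})\, x_{ijt} + u_{0i} + w_{0j} + \varepsilon_{ijt},
\qquad
\begin{pmatrix} u_{0i} \\ u_{1i} \end{pmatrix}
\sim \mathcal{N}\!\left(\mathbf{0},\,
\begin{pmatrix} \sigma_{0}^{2} & \rho\,\sigma_{0}\sigma_{1} \\ \rho\,\sigma_{0}\sigma_{1} & \sigma_{1}^{2} \end{pmatrix}\right)
```

      In the first model, any between-participant variation in the effect of x is absorbed into the residual, which is why omitting the slope term can make the test of the fixed effect anti-conservative.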

    1. eLife Assessment

      This study provides important results with regard to the ongoing debate of the relationship between internalizing psychopathology and learning under uncertainty. The methods and analyses are solid, and the results are backed by a large sample size, yet the study could still benefit from a more detailed discussion about the difference in experimental design and analysis compared to previous studies. If these concerns are addressed, this study would be of interest to researchers in clinical and computational psychiatry for the behavioral markers of psychopathological symptoms.

    2. Reviewer #1 (Public review):

      The authors conducted a series of experiments using two established decision-making tasks to clarify the relationship between internalizing psychopathology (anxiety and depression) and adaptive learning in uncertain and volatile environments. While prior literature has reported links between internalizing symptoms - particularly trait anxiety - and maladaptive increases in learning rates or impaired adjustment of learning rates, findings have been inconsistent. To address this, the authors designed a comprehensive set of eight experiments that systematically varied task conditions. They also employed a bifactor analysis approach to more precisely capture the variance associated with internalizing symptoms across anxiety and depression. Across these experiments, they found no consistent relationship between internalizing symptoms and learning rates or task performance, concluding that this purported hallmark feature may be more subtle than previously assumed.

      Strengths:

      (1) A major strength of the paper lies in its impressive collection of eight experiments, which systematically manipulated task conditions such as outcome type, variability, volatility, and training. These were conducted both online and in laboratory settings. Given that trial conditions can drive or obscure observed effects, this careful, systematic approach enables a robust assessment of behavior. The consistency of findings across online and lab samples further strengthens the conclusions.

      (2) The analyses are impressively thorough, combining model-agnostic measures, extensive computational modeling (e.g., Bayesian, Rescorla-Wagner, Volatile Kalman Filter), and assessments of reliability. This rigor contributes meaningfully to broader methodological discussions in computational psychiatry, particularly concerning measurement reliability.

      (3) The study also employed two well-established, validated computational tasks: a game-based predictive inference task and a binary probabilistic reversal learning task. This choice ensures comparability with prior work and provides a valuable cross-paradigm perspective for examining learning processes.

      (4) I also appreciate the open availability of the analysis code that will contribute substantially to the field using similar tasks.
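
      For readers less familiar with the models named in strength (2), the following is a minimal sketch of a Rescorla-Wagner (delta-rule) update together with one common model-agnostic definition of a trial-wise learning rate, namely the size of the belief update divided by the prediction error. The function names and example values are hypothetical, and the code is not drawn from the authors' released analysis code.

```python
def rescorla_wagner(outcomes, alpha, v0=0.0):
    """Return the sequence of value estimates under a fixed learning rate alpha."""
    value = v0
    estimates = []
    for outcome in outcomes:
        # Delta rule: move the estimate toward the outcome by a fraction alpha
        # of the prediction error.
        value += alpha * (outcome - value)
        estimates.append(value)
    return estimates


def trialwise_learning_rate(prev_prediction, new_prediction, outcome):
    """Model-agnostic learning rate: belief update divided by prediction error."""
    error = outcome - prev_prediction
    if error == 0:
        return None  # undefined when there was no prediction error
    return (new_prediction - prev_prediction) / error


# Hypothetical example: two rewarded trials with alpha = 0.5.
print(rescorla_wagner([1, 1], alpha=0.5))
# Hypothetical example: prediction moved from 100 to 110 after an outcome at 120.
print(trialwise_learning_rate(100, 110, 120))
```

      A learner that adapts to volatility would show larger trial-wise learning rates shortly after a change point than during stable periods, whereas a fixed alpha cannot capture that adjustment.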

      Weaknesses:

      (1) While the overall sample size (N = 820 across eight experiments) is commendable, the number of participants per experiment is relatively modest, especially in light of the inherent variability in online testing and the typically small effect sizes in correlations with mental health traits (e.g., r = 0.1-0.2). The authors briefly acknowledge that any true effects are likely small; however, the rationale behind the sample sizes selected for each experiment is unclear. This is especially important given that previous studies using the predictive inference task (e.g., Seow & Gillan, 2020, N > 400; Loosen et al., 2024, N > 200) have reported non-significant associations between trait anxiety symptoms and learning rates.

      (2) The motivation for focusing on the predictive inference task is also somewhat puzzling, given that no cited study has reported associations between trait anxiety and parameters of this task. While this is mitigated by the inclusion of a probabilistic reversal learning task, which has a stronger track record in detecting such effects, the study misses an opportunity to examine whether individual differences in learning-related measures correlate across the two tasks, which could clarify whether they tap into shared constructs.

      (3) The parameterization of the tasks, particularly the use of high standard deviations (SDs) of 20 and 30 for outcome distributions and hazard rates of 0.1 and 0.16, warrants further justification. Are these hazard rates sufficiently distinct? Might the wide SDs reduce sensitivity to volatility changes? Prior studies of the circle version of this predictive inference task (e.g., Vaghi et al., 2019; Seow & Gillan, 2020; Marzuki et al., 2022; Loosen et al., 2024; Hoven et al., 2024) typically used SDs around 12. Indeed, the Supplementary Materials suggest that variability manipulations did not seem to substantially affect learning rates (Figure S5)-calling into question whether the task manipulations achieved their intended cognitive effects.

      (4) Relatedly, while the predictive inference task showed good reliability, the reversal learning task exhibited only "poor-to-moderate" reliability in its learning-rate estimates. Given that previous findings linking anxiety to learning rates have often relied on this task, these reliability issues raise concerns about the robustness and generalizability of conclusions drawn from it.

      (5) As the authors note, the study relies on a subclinical sample. This limits the generalizability of the findings to individuals with diagnosed disorders. A growing body of research suggests that relationships between cognition and symptomatology can differ meaningfully between general population samples and clinical groups. For example, Hoven et al. (2024) found differing results in the predictive inference task when comparing OCD patients, healthy controls, and high- vs. low-symptom subgroups.

      (6) Finally, the operationalization of internalizing symptoms in this study appears to focus on anxiety and depression. However, obsessive-compulsive disorder is also generally considered an internalizing disorder, which presents a gap in the current cited literature of the paper, particularly when there have been numerous studies with the predictive inference task and OCD/compulsivity (e.g., Vaghi et al., 2019; Seow & Gillan, 2020; Marzuki et al., 2022; Loosen et al., 2024; Hoven et al., 2024), rather than trait anxiety per se.
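
      The sample-size concern in weakness (1) can be made concrete with a standard approximation. The sketch below uses the Fisher z transformation to estimate how many participants are needed to detect a correlation of a given size; the two-sided alpha of 0.05 and power of 0.80 are assumptions chosen for illustration, and the function name is hypothetical.

```python
from math import atanh, ceil
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate N needed to detect correlation r (two-sided test, Fisher z)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z(power)           # quantile corresponding to the desired power
    # The standard error of Fisher's z is 1 / sqrt(N - 3), hence the + 3.
    return ceil(((z_alpha + z_beta) / atanh(abs(r))) ** 2 + 3)


for r in (0.1, 0.2):
    print(f"r = {r}: N of about {n_for_correlation(r)}")
```

      Under these assumptions, an effect near r = 0.1 requires several hundred participants within a single experiment, which is the scale at which the per-experiment samples here should be judged.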

      Overall:

      Despite the named limitations, the authors have done very impressive work in rigorously examining the relationship between anxiety/internalizing symptoms and learning rates in commonly used decision-making tasks under uncertainty. Their conclusion is well supported by the consistency of their null findings across diverse task conditions, though its generalizability may be limited by some features of the task design and its sample. This study provides strong evidence that will guide future research, whether by shifting the focus to dysfunctions with larger effect sizes or by extending investigations to clinical populations.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, the authors recruited a large sample of participants to complete two well-established paradigms: the predictive inference task and the volatile reversal learning task. With this dataset, they not only replicated several classical findings on uncertainty-based learning from previous research but also demonstrated that individual differences in learning behavior are not systematically associated with internalizing psychopathology. These results provide valuable large-scale evidence for this line of research.

      Strengths:

      (1) Use of two different tasks.

      (2) Recruitment of a large sample of participants.

      (3) Inclusion of multiple experiments with different conditions, demonstrating strong scientific rigor.

      Weaknesses:

      Below are questions rather than 'weaknesses':

      (1) This study uses a large human sample, which is a clear strength. However, was the study preregistered? It would also be useful to report a power analysis to justify the sample size.

      (2) Previous studies have tested two core hypotheses: (a) that internalizing psychopathology is associated with overall higher learning rates, and (b) that it is associated with learning rate adaptation. In the first experiment, the findings seem to disconfirm only the first hypothesis. I found it unclear how, in the predator task, participants were expected to adjust their learning rate to adapt to volatility. Could the authors clarify this point?

      (3) According to the Supplementary Information, Model 13 showed the best fit, yet the authors selected Model 12 due to the larger parameter variance in Model 13. What would the results of Model 13 look like? Furthermore, do Models 12 and 13 correspond to the optimal models identified by Gagne et al. (2020)? Please clarify.

      (4) In the Discussion, the authors addressed both task reliability and parameter reliability. However, the term reliability seems to be used differently in these two contexts. For example, good parameter recovery indicates strong reliability in one sense, but can we then directly equate this with parameter reliability? It would be helpful to define more precisely what is meant by reliability in each case.

      (5) The Discussion also raises the possibility that limited reliability may represent a broader challenge facing the interdisciplinary field of computational psychiatry. What, in the authors' view, are the key future directions for the field to mitigate this issue?

    1. eLife Assessment

      This valuable study describes MerQuaCo, a computational and automatic quality control tool for spatial transcriptomics datasets. The authors have collected a remarkable number of tissues to construct the main algorithm. The compelling strength of the evidence is demonstrated through a combination of empirical observations, automated computational approaches, and validation against existing software packages. MerQuaCo will interest researchers who routinely perform spatial transcriptomic imaging (especially MERSCOPE), as it provides an imperfection detector and quality control measures for reliable and reproducible downstream analysis.

    2. Reviewer #1 (Public review):

      Summary:

      The authors present MerQuaCo, a computational tool that fills a critical gap in the field of spatial transcriptomics: the absence of standardized quality control (QC) tools for image-based datasets. Spatial transcriptomics is an emerging field where datasets are often imperfect, and current practices lack systematic methods to quantify and address these imperfections. MerQuaCo offers an objective and reproducible framework to evaluate issues like data loss, transcript detection variability, and efficiency differences across imaging planes.

      Strengths:

      (1) The study draws on an impressive dataset comprising 641 mouse brain sections collected on the Vizgen MERSCOPE platform over two years. This scale ensures that the documented imperfections are not isolated or anecdotal but represent systemic challenges in spatial transcriptomics. The variability observed across this large dataset underscores the importance of using sufficiently large sample sizes when benchmarking different image-based spatial technologies. Smaller datasets risk producing misleading results by over-representing unusually successful or unsuccessful experiments. This comprehensive dataset not only highlights systemic challenges in spatial transcriptomics but also provides a robust foundation for evaluating MerQuaCo's metrics. The study sets a valuable precedent for future quality assessment and benchmarking efforts as the field continues to evolve.

      (2) MerQuaCo introduces thoughtful metrics and filters that address a wide range of quality control needs. These include pixel classification, transcript density, and detection efficiency across both x-y axes (periodicity) and z-planes (p6/p0 ratio). The tool also effectively quantifies data loss due to dropped images, providing tangible metrics for researchers to evaluate and standardize their data. Additionally, the authors' decision to include examples of imperfections detectable by visual inspection but not flagged by MerQuaCo reflects a transparent and balanced assessment of the tool's current capabilities.

      Weaknesses:

      (1) The study focuses on cell-type label changes as the main downstream impact of imperfections. Broadening the scope to explore expression response changes of downstream analyses would offer a more complete picture of the biological consequences of these imperfections and enhance the utility of the tool.

      (2) While the manuscript identifies and quantifies imperfections effectively, it does not propose post-imaging data processing solutions to correct these issues, aside from the exclusion of problematic sections or transcript species. While this is understandable given the study is aimed at the highest quality atlas effort, many researchers don't need that level of quality to compare groups. It would be important to include discussion points as to how those cut-offs should be decided for a specific study.

      (3) Although the authors demonstrate the applicability of MerQuaCo on a large MERFISH dataset and on a limited number of sections from other platforms, it would be helpful to describe the limits of its generalizability.

    3. Reviewer #2 (Public review):

      Summary:

      The authors present MerQuaCo, a computational tool for quality control in image-based spatial transcriptomics, especially MERSCOPE. They assessed MerQuaCo on 641 slides produced at their institute in terms of the ratio of imperfection, transcript density, and variations of quality by different planes (x-axis).

      Strengths:

      This looks to be valuable work that can serve as a good guideline for quality control in future spatial transcriptomics studies. A well-controlled spatial transcriptomics dataset is also important for downstream analysis.

      Weaknesses:

      The results section needs to be more structured.

    4. Reviewer #3 (Public review):

      Summary:

      MerQuaCo is an open-source computational tool developed for quality control in image-based spatial transcriptomics data, with a primary focus on data generated by the Vizgen MERSCOPE platform. The authors analyzed a substantial dataset of 641 fresh-frozen adult mouse brain sections to identify and quantify common imperfections, aiming to replace manual quality assessment with an automated, objective approach, providing standardized data integrity measures for spatial transcriptomics experiments.

      Strengths:

      The manuscript's strengths lie in its timely utility, rigorous empirical validation, and practical contributions to methodology and biological discovery in spatial transcriptomics.

      Weaknesses:

      While MerQuaCo demonstrates utility in large datasets and cross-platform potential, its generalizability and validation require expansion, particularly for non-MERSCOPE platforms and real-world biological impact.

    1. eLife Assessment

      This study provides a valuable contribution to spatial transcriptomics by introducing MerQuaCo, a computational tool for standardizing quality control in image-based spatial transcriptomics datasets. The tool addresses the lack of consensus in the field and provides robust metrics to identify and quantify common imperfections in datasets. The work is supported by an impressive dataset and compelling analyses, and will be of significant interest to researchers focused on data reproducibility and downstream analysis reliability in spatial transcriptomics.

    2. Reviewer #1 (Public review):

      The authors present MerQuaCo, a computational tool that fills a critical gap in the field of spatial transcriptomics: the absence of standardized quality control (QC) tools for image-based datasets. Spatial transcriptomics is an emerging field where datasets are often imperfect, and current practices lack systematic methods to quantify and address these imperfections. MerQuaCo offers an objective and reproducible framework to evaluate issues like data loss, transcript detection variability, and efficiency differences across imaging planes.

      Strengths

      (1) The study draws on an impressive dataset comprising 641 mouse brain sections collected on the Vizgen MERSCOPE platform over two years. This scale ensures that the documented imperfections are not isolated or anecdotal but represent systemic challenges in spatial transcriptomics. The variability observed across this large dataset underscores the importance of using sufficiently large sample sizes when benchmarking different image-based spatial technologies. Smaller datasets risk producing misleading results by over-representing unusually successful or unsuccessful experiments. This comprehensive dataset not only highlights systemic challenges in spatial transcriptomics but also provides a robust foundation for evaluating MerQuaCo's metrics. The study sets a valuable precedent for future quality assessment and benchmarking efforts as the field continues to evolve.

      (2) MerQuaCo introduces thoughtful metrics and filters that address a wide range of quality control needs. These include pixel classification, transcript density, and detection efficiency across both x-y axes (periodicity) and z-planes (p6/p0 ratio). The tool also effectively quantifies data loss due to dropped images, providing tangible metrics for researchers to evaluate and standardize their data. Additionally, the authors' decision to include examples of imperfections detectable by visual inspection but not flagged by MerQuaCo reflects a transparent and balanced assessment of the tool's current capabilities.

      Comments on revisions:

      All previous concerns have been fully addressed. The revised manuscript presents a robust, well-documented, and user-friendly tool for quality control in image-based spatial transcriptomics, a rapidly advancing area where objective assessment tools are urgently needed.

    3. Reviewer #3 (Public review):

      Summary:

      MerQuaCo is an open-source computational tool developed for quality control in image-based spatial transcriptomics data, with a primary focus on data generated by the Vizgen MERSCOPE platform. The authors analyzed a substantial dataset of 641 fresh-frozen adult mouse brain sections to identify and quantify common imperfections, aiming to replace manual quality assessment with an automated, objective approach, providing standardized data integrity measures for spatial transcriptomics experiments.

      Strengths:

      The manuscript's strengths lie in its timely utility, rigorous empirical validation, and practical contributions to methodology and biological discovery in spatial transcriptomics.

      Weaknesses:

      While MerQuaCo demonstrates utility in large datasets and cross-platform potential, its generalizability and validation are currently limited by the availability of sufficient datasets from non-MERSCOPE platforms and non-brain tissues. The evaluation of data imperfections' impact on downstream analyses beyond cell typing (e.g., differential expression, spatial statistics, and cell-cell interactions) is also constrained by space and scope. However, these represent valuable directions for future work as more datasets become available.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      The authors present MerQuaCo, a computational tool that fills a critical gap in the field of spatial transcriptomics: the absence of standardized quality control (QC) tools for image-based datasets. Spatial transcriptomics is an emerging field where datasets are often imperfect, and current practices lack systematic methods to quantify and address these imperfections. MerQuaCo offers an objective and reproducible framework to evaluate issues like data loss, transcript detection variability, and efficiency differences across imaging planes.

      Strengths:

      (1) The study draws on an impressive dataset comprising 641 mouse brain sections collected on the Vizgen MERSCOPE platform over two years. This scale ensures that the documented imperfections are not isolated or anecdotal but represent systemic challenges in spatial transcriptomics. The variability observed across this large dataset underscores the importance of using sufficiently large sample sizes when benchmarking different image-based spatial technologies. Smaller datasets risk producing misleading results by over-representing unusually successful or unsuccessful experiments. This comprehensive dataset not only highlights systemic challenges in spatial transcriptomics but also provides a robust foundation for evaluating MerQuaCo's metrics. The study sets a valuable precedent for future quality assessment and benchmarking efforts as the field continues to evolve.

      (2) MerQuaCo introduces thoughtful metrics and filters that address a wide range of quality control needs. These include pixel classification, transcript density, and detection efficiency across both x-y axes (periodicity) and z-planes (p6/p0 ratio). The tool also effectively quantifies data loss due to dropped images, providing tangible metrics for researchers to evaluate and standardize their data. Additionally, the authors' decision to include examples of imperfections detectable by visual inspection but not flagged by MerQuaCo reflects a transparent and balanced assessment of the tool's current capabilities.

      Weaknesses:

      (1) The study focuses on cell-type label changes as the main downstream impact of imperfections. Broadening the scope to explore expression response changes of downstream analyses would offer a more complete picture of the biological consequences of these imperfections and enhance the utility of the tool.

      Here, we focused on the consequences of imperfections on cell-type labels, one common use for spatial transcriptomics datasets. Spatial datasets are used for so many other purposes that there are almost endless ways in which imperfections could impact downstream analyses. It is difficult to see how we might broaden the scope to include more downstream effects, while providing enough analysis to derive meaningful conclusions, all within the scope of a single paper. Existing studies bring some insight into the impact of imperfections and we expect future studies will extend our understanding of consequences in other biological contexts.

      (2) While the manuscript identifies and quantifies imperfections effectively, it does not propose post-imaging data processing solutions to correct these issues, aside from the exclusion of problematic sections or transcript species. While this is understandable given the study is aimed at the highest quality atlas effort, many researchers don't need that level of quality to compare groups. It would be important to include discussion points as to how those cut-offs should be decided for a specific study.

      Studies differ greatly in their aims and, as a result, the impact of imperfections in the underlying data will differ also, preventing us from offering meaningful guidance on how cut-offs might best be identified. Rather, our aim with MerQuaCo was to provide researchers with tools to generate information on their spatial datasets, to facilitate downstream decisions on data inclusion and cut-offs.

      (3) Although the authors demonstrate the applicability of MerQuaCo on a large MERFISH dataset and on a limited number of sections from other platforms, it would be helpful to describe the limits of its generalizability.

      In Figure 9, we addressed the limitations and generalizability of MerQuaCo as best we could with the available datasets. Gaining deep insight into the limitations and generalizability of MerQuaCo would require application to multiple large datasets and, to the best of our knowledge, these datasets are not available.

      Reviewer #2 (Public review):

      The authors present MerQuaCo, a computational tool for quality control in image-based spatial transcriptomics, especially MERSCOPE. They assessed MerQuaCo on 641 slides produced in their institute in terms of the ratio of imperfection, transcript density, and variations of quality by different planes (x-axis).

      Strengths:

      This looks to be a valuable work that can be a good guideline of quality control in future spatial transcriptomics. A well-controlled spatial transcriptomics dataset is also important for the downstream analysis.

      Weaknesses:

      The results section needs to be more structured.

      We have split the ‘Transcript density’ subsection of the results into 3 new subsections.

      Reviewer #3 (Public review):

      MerQuaCo is an open-source computational tool developed for quality control in image-based spatial transcriptomics data, with a primary focus on data generated by the Vizgen MERSCOPE platform. The authors analyzed a substantial dataset of 641 fresh-frozen adult mouse brain sections to identify and quantify common imperfections, aiming to replace manual quality assessment with an automated, objective approach, providing standardized data integrity measures for spatial transcriptomics experiments.

      Strengths:

      The manuscript's strengths lie in its timely utility, rigorous empirical validation, and practical contributions to methodology and biological discovery in spatial transcriptomics.

      Weaknesses:

      While MerQuaCo demonstrates utility in large datasets and cross-platform potential, its generalizability and validation require expansion, particularly for non-MERSCOPE platforms and real-world biological impact.

      We agree that there is value in expanding our analyses to non-MERSCOPE platforms, to tissues other than brain, and to analyses other than cell typing. The limiting factor in all these directions is the availability of large enough datasets to probe the limits of MerQuaCo. We look forward to a future in which more datasets are available and it’s possible to extend our analyses.

      Reviewer #1 (Recommendation for the Author):

      (1) To better capture the downstream impacts of imperfections, consider extending the analysis to additional metrics, such as specificity variation across cell types, gene coexpression, or spatial gene patterning. This would deepen insights into how these imperfections shape biological interpretations and further demonstrate the versatility of MerQuaCo.

      These are compelling ideas, but we are unable to study so many possible downstream impacts in sufficient depth in a single study. Insights into these topics will likely come from future studies.

      (2) In Figure 7 legend, panel label (D) is repeated thus panels E-F are mislabelled. 

      We have corrected this error.

      (3) Ensure that the image quality is high for the figures. 

      We will upload Illustrator files, ensuring that images are at full resolution.

      Reviewer #2 (Recommendation for the Author):

      (1) A result subsection "Transcript density" looks too long. Please provide a subsection heading for each figure. 

      We have split this section into 3 with new subheadings.

      (2) The result subsection title "Transcript density" sounds ambiguous. Please provide a detailed title describing what information this subsection contains. 

      We have renamed this section ‘Differences in transcript density between MERSCOPE experiments’.

      Minor: 

      (1) There is no explanation of the black and grey bars in Figure 2A.

      We have added information to the figure legend, identifying the datasets underlying the grey and black bars.

      (2) In the abstract, the phrase "High-dimension" should be "High-dimensional". 

      We have changed ‘high-dimension’ to ‘high-dimensional’.

      (3) In the abstract, "Spatial results" is an unclear expression. What does it stand for? 

      We have replaced the term ‘spatial results’ with ‘the outputs of spatial transcriptomics platforms’.

      Reviewer #3 (Recommendation for the Author):

      (1) While the tool claims broad applicability, validation is heavily centered on MERSCOPE data, with limited testing on other platforms. The authors should expand validation to include more diverse platforms and add a small analysis of non-brain tissue. If broader validation isn't feasible, modify the title and abstract to reflect the focus on the mouse brain explicitly.

      We agree that expansion to other platforms is desirable, but to the best of our knowledge sufficient datasets from other platforms are not available. In the abstract, we state that ‘… we describe imperfections in a dataset of 641 fresh-frozen adult mouse brain sections collected using the Vizgen MERSCOPE.’

      (2) The impact of data imperfections on downstream analysis needs a more comprehensive evaluation. The authors should expand beyond cluster label changes to include a) differential expression analysis with simulated imperfections, b) impact on spatial statistics and pattern detection, and c) effects on cell-cell interactions. 

      Each of these ideas could support a substantial study. We are unable to do them justice in the limited space available as an addition to the current study.

      (3) The pixel classification workflow and validation process need more detailed documentation. 

      The methods and results together describe the workflow and validation in depth. We are unclear what details are missing.

      (4) The manuscript lacks comparison to existing QC pipelines such as Squidpy and Giotto. The authors should benchmark MerQuaCo against them and provide integration options with popular spatial analysis tools with clear documentation.

      To the best of our knowledge, Squidpy and Giotto lack QC benchmarks, certainly of the parameters characterized by MerQuaCo. Direct comparison isn’t possible.

    1. eLife Assessment

      This important study shows how the relative importance of inter-species interactions in microbiomes can be inferred from empirical species abundance data. The methods based on statistical physics of disordered systems are compelling and rigorous, and allow for distinguishing healthy and non-healthy human gut microbiomes via differences in their inter-species interaction patterns. This work should be of broad interest to researchers in microbial ecology and theoretical biophysics.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors develop a novel method to infer ecologically informative parameters across healthy and diseased states of the gut microbiota, although the method is generalizable to other datasets of species abundances. The authors leverage techniques from the theoretical physics of disordered systems to infer different parameters (the mean and standard deviation of the strength of bacterial interspecies interactions, a bacterial immigration rate, and the strength of demographic noise) that describe the statistics of microbiota samples from two groups: one for healthy subjects and another for subjects with chronic inflammation syndromes. To do this, the authors simulate communities with a modified version of the Generalized Lotka-Volterra model and randomly generated interactions, and then use a moment-matching algorithm to find sets of parameters that better reproduce the data for species abundances. They find that these parameters are different for the healthy and diseased microbiota groups. The results suggest, for example, that bacterial interaction strengths, relative to noise and immigration, play a more dominant role in microbiota dynamics in diseased states than in healthy states.

      We think that this manuscript brings an important contribution that will be of interest in the areas of statistical physics, (microbiota) ecology and (biological) data science. The evidence of their results is solid and the work improves the state-of-the-art in terms of methods.

      Strengths:

      • Using a fairly generic ecological model, the method can identify the change in the relative importance of different ecological forces (distribution of interspecies interactions, demographic noise and immigration) in different sample groups. The authors focus on the case of the human gut microbiota, showing that the data is consistent with a higher influence of species interactions (relative to demographic noise and immigration) in a disease microbiota state than in healthy ones.

      • The method is novel, original and it improves the state-of-the-art methodology for the inference of ecologically-relevant parameters. The analysis provides solid evidence on the conclusions.

      Weaknesses:

      • As a proof of concept for a new inference method, this text maintains a technical focus, which may require some familiarity with statistical physics. Nevertheless, the authors' clear introduction of key mathematical terms and their interpretations, along with a clear discussion of the ecological implications, make the results accessible and easy to follow.
    3. Reviewer #2 (Public review):

      Summary:

      This valuable work aims to infer, from microbiome data, microbial species interaction patterns associated with healthy and unhealthy human gut microbiomes. Using solid techniques from statistical physics, the authors propose that healthy and unhealthy microbiome interaction patterns substantially differ. Unhealthy microbiomes are closer to instability and single-strain dominance; whereas healthy microbiomes showcase near-neutral dynamics, mostly driven by demographic noise and immigration.

      Strengths:

      This is a well-written article, relatively easy to follow and transparent despite the high degree of technicality of the underlying theory. The authors provide a powerful inferring procedure, which bypasses the issue of having only compositional data. This work shows that embracing the complexity of microbial systems can be used to our advantage, instead of being an insurmountable obstacle. This is a powerful counterpoint to the classic reductionist view that pushes researchers to study much simpler systems, and only hope to one day scale up their findings.

      Weaknesses:

      As acknowledged by the authors themselves, this is only a proof of concept. Further research is needed to better understand the dynamical nature of gut microbiomes. The authors do however point at ways in which species abundance distributions could be better reproduced by dynamical models. They also suggest that their work could explain prior empirical findings invoking the "Anna Karenina principle", where healthy microbiomes resemble one another, but disease states tend to all differ.

    4. Reviewer #3 (Public review):

      Summary:

      I found the manuscript to be well-written. I have a few questions regarding the model, though the bulk of my comments are requests to provide definitions and additional clarity. There are concepts and approaches used in this manuscript that are clear boons for understanding the ecology of microbiomes but are rarely considered by researchers approaching the manuscript from a traditional biology background. The authors have clearly considered this in their writing of S1 and S2, so addressing these comments should be straightforward. The methods section is particularly informative and well-written, with sufficient explanations of each step of the derivation that should be informative to researchers in the microbial life sciences that are not well-versed with physics-inspired approaches to ecology dynamics.

      Strengths:

      The modeling efforts of this study primarily rely on a disordered form of the generalized Lotka-Volterra (gLV) model. This model can be appropriate for investigating certain systems and the authors are clear about when and how more mechanistic models (i.e., consumer-resource) can lead to gLV. Phenomenological models such as this have been found to be highly useful for investigating the ecology of microbiomes, so this modeling choice seems justified, and the limitations are laid out.

      Weaknesses:

      The authors use metagenomic data of diseased and healthy patients that was first processed in Pasqualini et al. (2024). The use of metagenomic data leads me into a question regarding the role of sampling effort (i.e., read counts) in shaping model parameters such as $h$. This parameter is equal to the average of 1/# species across samples because the data are compositional in nature. My understanding is that $h$ was calculated using total abundances (i.e., read counts). The number of observed species is strongly influenced by sampling effort and the authors addressed this point in their revised manuscript.
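      To make the definition of $h$ discussed above concrete, here is a minimal sketch, assuming only what the comment states: $h$ is the average over samples of one divided by the number of observed species. The function name and the toy count table are hypothetical and are not taken from the authors' pipeline; treating a species as "observed" when its read count is greater than zero is also an assumption.

```python
def mean_inverse_richness(counts):
    """Average of 1/S over samples, where S is the number of species
    with a nonzero read count in that sample (an assumed definition
    of "observed")."""
    if not counts:
        raise ValueError("need at least one sample")
    inverse_richness = []
    for sample in counts:
        # Richness: how many species were detected in this sample.
        richness = sum(1 for c in sample if c > 0)
        if richness == 0:
            raise ValueError("sample with no observed species")
        inverse_richness.append(1.0 / richness)
    return sum(inverse_richness) / len(inverse_richness)


# Hypothetical toy table: rows are samples, columns are species read counts.
toy_counts = [
    [5, 0, 3],  # 2 observed species
    [1, 1, 1],  # 3 observed species
]
h = mean_inverse_richness(toy_counts)  # (1/2 + 1/3) / 2
```

      Because richness enters only through the count of nonzero entries, a species missed at low read depth lowers the richness of that sample and raises its contribution to $h$, which is the sampling-effort dependence raised in this comment.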

      However, the role of sampling effort can depend on the type of data and my instinct about the role that sampling effort plays in species detection is primarily based on 16S data. The dependency between these two variables may be less severe for the authors' metagenomic pipeline. This potential discrepancy raises a broader issue regarding the investigation of microbial macroecological patterns and the inference of ecological parameters. Often microbial macroecology researchers rely on 16S rRNA amplicon data because that type of data is abundant and comparatively low-cost. Some in microbiology and bioinformatics are increasingly pushing researchers to choose metagenomics over 16S. Sometimes this choice is valid (discovery of new MAGs, investigation of allele frequency changes within species, etc.), sometimes it is driven by the false equivalence "more data = better". The outcome though is that we have a body of more-or-less established microbial macroecological patterns which rest on 16S data and are now slowly incorporating results from metagenomics. To my knowledge there has not been a systematic evaluation of the macroecological patterns that do and do not vary by one's choice in 16S vs. metagenomics. Several of the authors in this manuscript have previously compared the MAD shape for 16S and metagenomic datasets in Pasqualini et al. (2024), but moving forward a more comprehensive study seems necessary. These points were addressed by the authors in their revised manuscript.

      Final review: The authors addressed all comments and I have no additional comments.

      References

      Pasqualini, Jacopo, et al. "Emergent ecological patterns and modelling of gut microbiomes in health and in disease." PLOS Computational Biology 20.9 (2024): e1012482.

    1. eLife Assessment

      This manuscript characterizes a mutated clone of RNA polymerase I in yeast, referred to as SuperPol, to understand the mechanisms of RNA polymerase I elongation and termination. The authors present convincing evidence that demonstrates the existence of premature termination in Pol I transcription. Overall, the characterization of this RNA pol I offers important insights into the regulation of ribosomal RNA transcription and its potential application in cancer pharmacology.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      The study characterises an RNA polymerase (Pol) I mutant (RPA135-F301S) named SuperPol. This mutant was previously shown to increase yeast ribosomal RNA (rRNA) production by Transcription Run-On (TRO). In this work, the authors confirm this mutation increases rRNA transcription using a slight variation of the TRO method, Transcriptional Monitoring Assay (TMA), which also allows the analysis of partially degraded RNA molecules. The authors show a reduction of abortive rRNA transcription in cells expressing the SuperPol mutant and a modest occupancy decrease at the 5' region of the rRNA genes compared to WT Pol I. These results suggest that the SuperPol mutant displays a lower frequency of premature termination. Using in vitro assays, the authors found that the mutation induces an enhanced elongation speed and a lower cleavage activity on mismatched nucleotides at the 3' end of the RNA. Finally, SuperPol mutant was found to be less sensitive to BMH-21, a DNA intercalating agent that blocks Pol I transcription and triggers the degradation of the Pol I subunit, Rpa190. Compared to WT Pol I, short BMH-21 treatment has little effect on SuperPol transcription activity, and consequently, SuperPol mutation decreases cell sensitivity to BMH-21.

      Significance:

      The work further characterises a single amino acid mutation of one of the largest yeast Pol I subunits (RPA135-F301S). While this mutation was previously shown to increase rRNA synthesis, the current work expands the SuperPol mutant characterisation, providing details of how RPA135-F301S modifies the enzymatic properties of yeast Pol I. In addition, their findings suggest that yeast Pol I transcription can be subjected to premature termination in vivo. The molecular basis and potential regulatory functions of this phenomenon could be explored in additional studies.

      Our understanding of rRNA transcription is limited, and the findings of this work may be interesting to the transcription community. Moreover, targeting Pol I activity is an open strategy for cancer treatment. Thus, the resistance of SuperPol mutant to BMH-21 might also be of interest to a broader community, although these findings are yet to be confirmed in human Pol I and with more specific Pol I inhibitors in future.

      Comments on revision:

      The authors' response addressed all the points I raised adequately.

    3. Reviewer #2 (Public review):

      Summary:

      This article presents a study on a mutant form of RNA polymerase I (RNAPI) in yeast, referred to as SuperPol, which demonstrates increased rRNA production compared to the wild-type enzyme. While rRNA production levels are elevated in the mutant, RNAPI occupancy as detected by CRAC is reduced at the 5' end of rDNA transcription units. The authors interpret these findings by proposing that the wild-type RNAPI pauses in the external transcribed spacer (ETS), leading to premature transcription termination (PTT) and degradation of truncated rRNAs by the RNA exosome (Rrp6). They further show that SuperPol's enhanced activity is linked to a lower frequency of PTT events, likely due to altered elongation dynamics and reduced RNA cleavage activity, as supported by both in vivo and in vitro data.

      The study also examines the impact of BMH-21, a drug known to inhibit Pol I elongation, and shows that SuperPol is less sensitive to this drug, as demonstrated through genetic, biochemical, and in vivo approaches. The authors show that BMH-21 treatment induces premature termination in wild-type Pol I, but only to a lesser extent in SuperPol. They suggest that BMH-21 promotes termination by targeting paused Pol I complexes and propose that PTT is an important regulatory mechanism for rRNA production in yeast.

      The data presented are of high quality and support the notion that 1) premature transcription termination occurs at the 5' end of rDNA transcription units; 2) SuperPol has an increased elongation rate with reduced premature termination; and 3) BMH-21 promotes both pausing and termination. The authors employ several complementary methods, including in vitro transcription assays. These results are significant and of interest for a broad audience.

      Adding experiments in different growth conditions to support the claim of regulation by PTT (as the authors propose) will also be an important addition. The revisions further support the claim, in particular the notion that the increased elongation rate of SuperPol occurs at the expense of fidelity.

      Significance:

      These results are significant and of interest for a basic research audience.

    4. Reviewer #3 (Public review):

      In the manuscript "Ribosomal RNA synthesis by RNA polymerase I is regulated by premature termination of transcription", Azouzi and co-authors investigate the regulatory mechanisms of ribosomal RNA (rRNA) transcription by RNA Polymerase I (RNAPI) in the budding yeast S. cerevisiae. They follow up on exploring the molecular basis of a mutant allele of the second-largest subunit of RNAPI, RPA135-F301S, also dubbed SuperPol, that they had previously reported (Darrière et al, 2019), and which was shown to rescue Rpa49-linked growth defects, possibly by increasing rRNA production.

      Through a combination of genomic and in vitro approaches, the authors test the hypothesis that RNAPI activity could be subjected to a premature transcription termination (PTT) mechanism, akin to what is observed for RNA Polymerase II (RNAPII). The authors demonstrate that SuperPol's increased processivity "desensitizes" RNAPI to abortive transcription cycles at the expense of decreased fidelity. In agreement, SuperPol is shown to be resistant to BMH-21, a drug previously shown to impair RNAPI elongation.

      Overall, this work expands the mechanistic understanding of the early dynamics of RNAPI transcription. The presented results are of interest for researchers studying transcription regulation, particularly those interested in RNAPI's transcription mechanisms and fidelity.

      Strengths:

      Overall, the experiments are performed with rigor and include the appropriate controls and statistical analyses. Conclusions are drawn from appropriate experiments. Both the figures and the text present the data clearly. The Materials and Methods section is detailed enough.

      Weaknesses:

      The biological significance of this phenomenon remains unaddressed and thus unclear. The lack of experiments to test a specific regulatory function (such as UTP-A loading checkpoint or other mechanisms) limits these termination events to possibly abortive actions of unclear significance.

      Comments on revised version:

      I appreciated the additional experiments and the other changes made by the authors in the revised version.

    5. Author response:

      The following is the authors’ response to the original reviews

      General Statements:

      In our manuscript, we demonstrate for the first time that RNA Polymerase I (Pol I) can prematurely release nascent transcripts at the 5' end of ribosomal DNA transcription units in vivo. This achievement was made possible by comparing wild-type Pol I with a mutant form of Pol I, hereafter called SuperPol, previously isolated in our lab (Darrière et al., 2019). By combining in vivo analysis of rRNA synthesis (using pulse-labelling of nascent transcript and cross-linking of nascent transcript - CRAC) with in vitro analysis, we could show that SuperPol reduced premature transcript release due to altered elongation dynamics and reduced RNA cleavage activity. Such premature release could reflect regulatory mechanisms controlling rRNA synthesis. Importantly, this increased processivity of SuperPol is correlated with resistance to BMH-21, a novel anticancer drug that inhibits Pol I, showing the relevance of targeting Pol I during transcriptional pauses to kill cancer cells. This work offers critical insights into Pol I dynamics, rRNA transcription regulation, and implications for cancer therapeutics.

      We sincerely thank the three reviewers for their insightful comments and recognition of the strengths and weaknesses of our study. Their acknowledgment of our rigorous methodology, the relevance of our findings on rRNA transcription regulation, and the significant enzymatic properties of the SuperPol mutant is highly appreciated. We are particularly grateful for their appreciation of the potential scientific impact of this work. Additionally, we value the reviewer’s suggestion that this article could address a broad scientific community, including in transcription biology and cancer therapy research. These encouraging remarks motivate us to refine and expand upon our findings further.

      All three reviewers acknowledged the increased processivity of SuperPol compared to its wild-type counterpart. However, two out of three question our claim that premature termination of transcription can regulate ribosomal RNA transcription. This conclusion is based on the SuperPol mutant increasing rRNA production. Proving that modulation of early transcription termination is used to regulate rRNA production under physiological conditions is beyond the scope of this study. Therefore, we propose to change the title of this manuscript to focus on what we have unambiguously demonstrated:

      “Ribosomal RNA synthesis by RNA polymerase I is subjected to premature termination of transcription”.

      Reviewer 1's main criticism centers on the use of the CRAC technique in our study. While we address this point in detail below, we would like to emphasize that, although we agree with the reviewer’s comments regarding its application to Pol II studies, by limiting contamination with mature rRNA, CRAC remains the only suitable method for studying Pol I elongation over the entire transcription unit. All other methods are massively contaminated with fragments of mature RNA, which prevents any quantitative analysis of read distribution within rDNA. This perspective is widely accepted within the Pol I research community, as CRAC provides a robust approach to capturing transcriptional dynamics specific to Pol I activity.

      We hope that these findings will resonate with the readership of your journal and contribute significantly to advancing discussions in transcription biology and related fields.

      Description of the planned revisions:

      Despite numerous text modifications (see below), we agree that one major point of discussion is the consequence of increased processivity in the SuperPol mutant on the “quality” of produced rRNA. Reviewer 3 suggested comparisons with other processive alleles, such as the rpb1-E1103G mutant of the RNAPII subunit (Malagon et al., 2006). This comparison has already been addressed by the Schneider lab (Viktorovskaya OV, Cell Rep., 2013 - PMID: 23994471), which explored Pol II (rpb1-E1103G) and Pol I (rpa190-E1224G). The rpa190-E1224G mutant revealed enhanced pausing in vitro, highlighting key differences between Pol I and Pol II catalytic rate-limiting steps (see David Schneider's review on this topic for further details).

      Reviewers 2 and 3 suggested that a decreased efficiency of cleavage upon backtracking might imply an increased error rate in SuperPol compared to the wild-type enzyme. Pol I mutants with decreased rRNA cleavage have been characterized previously, and resulted in an increased error rate. We have already started to address this point. Preliminary results from in vitro experiments suggest that SuperPol mutants exhibit an elevated error rate during transcription. However, these findings remain preliminary and require further experimental validation to confirm their reproducibility and robustness. We propose to consolidate these data and incorporate them into the manuscript to address this question comprehensively. This could provide valuable insights into the mechanistic differences between SuperPol and the wild-type enzyme. SuperPol is the first Pol I mutant described with an increased processivity in vitro and in vivo, and we agree that this might be at the cost of a decreased fidelity.

      Regulatory aspect of the process:

      To address the reviewer’s remarks, we propose to test our model by performing experiments that would evaluate PTT levels in Pol I mutants or under different growth conditions. These experiments would provide crucial data to support our model, which suggests that PTT is a regulatory element of Pol I transcription. By demonstrating how PTT varies with environmental factors, we aim to strengthen the hypothesis that premature termination plays an important role in regulating Pol I activity.

      We propose revising the title and conclusions of the manuscript. The updated version will better reflect the study's focus and temper claims regarding the regulatory aspects of termination events, while maintaining the value of our proposed model.

      Description of the revisions that have already been incorporated in the transferred manuscript:

      Some very important modifications have now been incorporated:

      Statistical Analyses and CRAC Replicates:

      Unlike reviewers 2 and 3, reviewer 1 suggests that we did not analyze the results statistically. In fact, the CRAC analyses were conducted in biological triplicate, ensuring robustness and reproducibility. The statistical analyses are presented in Figure 2C, which highlights significant findings supporting the fact that WT Pol I and SuperPol distribution profiles are different. The CRAC replicates exhibit a high correlation, and we confirmed a significant effect in each region of interest (5’ETS, 18S.2, 25S.1 and 3’ ETS, Figure 1) to confirm consistency across experiments. Finally, we took care not to overinterpret the results, maintaining a rigorous and cautious approach in our analysis to ensure accurate conclusions.

      CRAC vs. Net-seq:

      Reviewer 1 asks us to comment on the differences between CRAC and Net-seq. Both methods complement each other but serve different purposes depending on the biological question and the context of transcription analysis. Net-seq was originally designed for Pol II analysis. It captures nascent RNAs but does not eliminate mature ribosomal RNAs (rRNAs), leading to high levels of contamination. While this is manageable for Pol II analysis (in silico elimination of reads corresponding to rRNAs), it poses a significant problem for Pol I due to the dominance of rRNAs (60% of total RNAs in yeast), which share sequences with nascent Pol I transcripts. As a result, large Net-seq peaks are observed at mature rRNA extremities (Clarke 2018, Jacobs 2022). This limits the interpretation of the results to the short-lived pre-rRNA species. In contrast, CRAC has been specifically adapted by the laboratory of David Tollervey to map Pol I distribution while minimizing contamination from mature rRNAs (the CRAC protocol used exclusively recovers RNAs with 3′ hydroxyl groups that represent endogenous 3′ ends of nascent transcripts, thus removing RNAs with 3’-phosphate, found in mature rRNAs). This makes CRAC more suitable for studying Pol I transcription, including polymerase pausing and distribution along rDNA, providing a quantitative dataset for the entire rDNA gene.

      CRAC vs. Other Methods:

      Reviewer 1 suggests using GRO-seq or TT-seq, but the experiments in Figure 2 aim to assess the distribution profile of Pol I along the rDNA, which requires a method optimized for this specific purpose. While GRO-seq and TT-seq are excellent for measuring RNA synthesis and cotranscriptional processing, they rely on Sarkosyl treatment to permeabilize cellular and nuclear membranes. Sarkosyl is known to artificially induce polymerase pausing and to inhibit RNase activities that are involved in the process. To avoid these artifacts, we used CRAC analysis, which is a direct and fully in vivo approach. In a CRAC experiment, cells are grown exponentially in rich media and arrested via rapid cross-linking, providing precise and artifact-free data on Pol I activity and pausing.

      Pol I ChIP Signal Comparison:

      The ChIP experiments previously published in Darrière et al. lack the statistical depth and resolution offered by our CRAC analyses. The detailed results obtained through CRAC would have been impossible to detect using classical ChIP. The current study provides a more refined and precise understanding of Pol I distribution and dynamics, highlighting the advantages of CRAC over traditional methods in addressing these complex transcriptional processes.

      BMH-21 Effects:

      As highlighted by Reviewer 1, the effects of BMH-21 observed in our study differ slightly from those reported in earlier work (Ref Schneider 2022), likely due to variations in experimental conditions, such as methodologies (CRAC vs. Net-seq), as discussed earlier. We also identified variations in the response to BMH-21 treatment associated with differences in cell growth phases and/or cell density. These factors likely contribute to the observed discrepancies, offering a potential explanation for the variations between our findings and those reported in previous studies. In our approach, we prioritized reproducibility by carefully controlling BMH-21 experimental conditions to mitigate these factors. These variables can significantly influence results, potentially leading to subtle discrepancies. Nevertheless, the overall conclusions regarding BMH-21's effects on WT Pol I are largely consistent across studies, with differences primarily observed at the nucleotide resolution. This is a strength of our CRAC-based analysis, which provides precise insights into Pol I activity.

      We will address these nuances in the revised manuscript to clarify how such differences may impact results and provide context for interpreting our findings in light of previous studies.

      Minor points:

      Reviewer #1:

      In general, the writing style is not clear, and there are some word mistakes or poor descriptions of the results, for example: 

      On page 14: "SuperPol accumulation is decreased (compared to Pol I)". 

      On page 16: "Compared to WT Pol I, the cumulative distribution of SuperPol is indeed shifted on the right of the graph." 

      We clarified and improved the overall writing style in accordance with the reviewer's comment.

      There are also issues with the literature, for example: Turowski et al, 2020a and Turowski et al, 2020b are the same article (preprint and peer-reviewed). Is there any reason to include both references? Please, double-check the references.  

      This was corrected in this version of the manuscript.

      In the manuscript, 5S rRNA is mentioned as an internal control for TMA normalisation. Why are Figure 1C data normalised to 18S rRNA instead of 5S rRNA? 

      Data are effectively normalized relative to the 5S rRNA, but the value for the 18S rRNA is arbitrarily set to 100%.

      Figure 4 should be a supplementary figure, and Figure 7D doesn't have a y-axis labelling. 

      The presence of all Pol I specific subunits (Rpa12, Rpa34 and Rpa49) is crucial for the enzymatic activity we performed. In the absence of these subunits (which can vary depending on the purification batch), Pol I pausing, cleavage and elongation are known to be affected. To strengthen our conclusion, we really wanted to show the subunit composition of the purified enzyme. This important control should be shown, but can indeed be shown in a supplementary figure if desired.

      The y-axis in Figure 7D is now correctly labelled.

      In Figure 7C, BMH-21 treatment causes the accumulation of ~140bp rRNA transcripts only in SuperPol-expressing cells that are Rrp6-sensitive (line 6 vs line 8), suggesting that BMH-21 treatment does affect SuperPol. Could the author comment on the interpretation of this result? 

      The 140 nt product is a degradation fragment resulting from trimming, which explains its lower accumulation in the absence of Rrp6. BMH-21 significantly affects WT Pol I transcription but also has a mild effect on SuperPol transcription. As a result, the 140 nt product accumulates under these conditions.

      Reviewer #2:

      pp. 14-15: The authors note local differences in peak detection in the 5'-ETS among replicates, preventing a nucleotide-resolution analysis of pausing sites. Still, they report consistent global differences between wild-type and SuperPol CRAC signals in the 5'ETS (and other regions of the rDNA). These global differences are clear in the quantification shown in Figures 2B-C. A simpler statement might be less confusing, avoiding references to a "first and second set of replicates" 

      As suggested by the reviewer, the statement has been simplified in this version of the manuscript.

      Figures 2A and 2C: Based on these data and quantification, it appears that SuperPol signals in the body and 3' end of the rDNA unit are higher than those in the wild type. This finding supports the conclusion that reduced pausing (and termination) in the 5'ETS leads to an increased Pol I signal downstream. Since the average increase in the SuperPol signal is distributed over a larger region, this might also explain why even a relatively modest decrease in 5'ETS pausing results in higher rRNA production. This point merits discussion by the authors. 

      We agree that this is a very important discussion of our results. Transcription is a very dynamic process in which paused polymerases are easily detected using the CRAC assay. Elongating polymerases are distributed over a much larger gene body, and even a small amount of polymerase detected in the gene body can represent a very large amount of rRNA synthesis. This point is of paramount importance and, as suggested by the reviewer, is now discussed in detail.

      A decreased efficiency of cleavage upon backtracking might imply an increased error rate in SuperPol compared to the wild-type enzyme. Have the authors observed any evidence supporting this possibility? 

      The reviewer suggested that a decreased efficiency of cleavage upon backtracking might imply an increased error rate in SuperPol compared to the wild-type enzyme. We thank Reviewer #2 for raising this, as in our opinion it is an important point that should be added to the manuscript. We have now included new data (panels 5G, 5H and 5I) in the manuscript showing that SuperPol in vitro exhibits an increased error rate compared to the WT enzyme. From these results obtained in vitro, we concluded that SuperPol shows reduced nascent transcript cleavage, associated with more efficient transcript elongation, but to the detriment of transcriptional fidelity.

      pp. 15 and 22: Premature transcription termination as a regulator of gene expression is well-documented in yeast, with significant contributions from the Corden, Brow, Libri, and Tollervey labs. These studies should be referenced along with relevant bacterial and mammalian research. 

      As suggested by the reviewer, we now reference these studies.

      p. 23: "SuperPol and Rpa190-KR have a synergistic effect on BMH-21 resistance." A citation should be added for this statement. 

      This statement is based on unpublished data from our lab. KR and SuperPol are the only two known mutants resistant to BMH-21. We observed that resistance between both alleles is synergistic, with a much higher resistance to BMH-21 in the double mutant than in each single mutant (data not shown). Comparing their resistance mechanisms is a very important point, and we could provide these data upon request. This was added to the statement.

      p. 23: "The released of the premature transcript" - this phrase contains a typo 

      This is now corrected.

      Reviewer #3:

      Figure 1B: it would be opportune to separate the technique's schematic representation from the actual data. Concerning the data, would the authors consider adding an experiment with rrp6D cells? Some RNAs could be degraded even in such short period of time, as even stated by the authors, so maybe an exosome depleted background could provide a more complete picture. Could also the authors explain why the increase is only observed at the level of 18S and 25S? To further prove the robustness of the Pol I TMA method could be good to add already characterized mutations or other drugs to show that the technique can readily detect also well-known and expected changes. 

      The precise objective of this experiment is to avoid the use of the Rrp6 mutant. Under these conditions, we prevent the accumulation of transcripts that would result from a maturation defect. While it is possible to conduct the experiment with the Rrp6 mutant, it would be impossible to draw reliable conclusions due to this artificial accumulation of transcripts.

      Figure 1C: the NTS1 probe signal is missing (it is referenced in Figure 1A but not listed in the Methods section or the oligo table). If this probe was unused, please correct Figure 1A accordingly. 

      We corrected Figure 1A.  

      Figure 2A: the RNAPI occupancy map by CRAC is hard to interpret. The red color (SuperPol) is stacked on top of the blue line, and we are not able to observe the signal of the WT for most of the position along the rDNA unit. It would be preferable to use some kind of opacity that allows to visualize both curves. Moreover, the analysis of the behavior of the polymerase is always restricted to the 5'ETS region in the rest of the manuscript. We are thus not able to observe whether termination events also occur in other regions of the rDNA unit. A Northern blot analysis displaying higher sizes would provide a more complete picture. 

      We addressed this point to make the figure more visually informative. In Northern Blot analysis, we use a TSS (Transcription Start Site) probe, which detects only transcripts containing the 5' extremity. Due to co-transcriptional processing, most of the rRNA undergoing transcription lacks its 5' extremity and is not detectable using this technique. We have the data, but it does not show any difference between Pol I and SuperPol. This information could be included in the supplementary data if asked.

      "Importantly, despite some local variations, we could reproducibly observe an increased occupancy of WT Pol I in 5'-ETS compared to SuperPol (Figure 1C)." should be Figure 2C. 

      Thanks for pointing out this mistake. It has been corrected.

      Figure 3D: most of the difference in the cumulative proportion of CRAC reads is observed in the region ~750 to 3000. In line with my previous point, I think it would be worth exploring also termination events beyond the 5'-ETS region. 

      We agree that such an analysis would have been interesting. However, with the exception of the pre-rRNA starting at the transcription start site (TSS) studied here, any cleaved rRNA at its 5' end could result from premature termination and/or abnormal processing events. Exploring the production of other abnormal rRNAs produced by premature termination is a project in itself, beyond this initial work aimed at demonstrating the existence of premature termination events in ribosomal RNA production.

      Figure 4: should probably be provided as supplementary material. 

      As mentioned earlier (see comments above), the presence of all Pol I specific subunits (Rpa12, Rpa34 and Rpa49) is crucial for the enzymatic activity we performed. This important control should be shown, but can indeed be shown in a supplementary figure if desired.

      "While the growth of cells expressing SuperPol appeared unaffected, the fitness of WT cells was severely reduced under the same conditions." I think the growth of cells expressing SuperPol is slightly affected. 

      We agree with this comment and we modified the text accordingly.

      Figure 7D: the legend of the y-axis is missing as well as the title of the plot. 

      Legend of the y-axis and title of the plot are now present.

      The statements concerning BMH-21, SuperPol and Rpa190-KR in the Discussion section should be removed, or data should be provided.

      This was discussed previously. See comment above.

      Some references are missing from the Bibliography, for example Merkl et al., 2020; Pilsl et al., 2016a, 2016b. 

      The bibliography is now fixed.

      Description of analyses that authors prefer not to carry out:

      Does the SuperPol mutant produce more functional rRNAs?

      We agree with Reviewer 1 that this point requires clarification. In cells expressing SuperPol, a higher steady-state level of (pre-)rRNAs is only observed in the absence of the degradation machinery, suggesting that overproduced rRNAs are rapidly eliminated. We know that (pre-)rRNAs are unable to accumulate in the absence of ribosomal proteins and/or assembly factors (AFs). Consequently, overproducing rRNAs would not be sufficient to increase ribosome content. This specific point is being further addressed in our lab but is beyond the scope of this article.

      Is premature termination coupled with rRNA processing?

      We appreciate the reviewer’s insightful comments. The suggested experiments regarding the UTP-A complex's regulatory potential are valuable and ongoing in our lab, but they extend beyond the scope of this study and are not suitable for inclusion in the current manuscript.

    1. eLife Assessment

      In this study, the authors offer a theoretical explanation for the emergence of nematic bundles in the actin cortex, carrying implications for the assembly of actomyosin stress fibers. As such, the study is a valuable contribution to the field of actomyosin organisation in the actin cortex. The theoretical work is solid and provides a rigorous theoretical framework to study active self-organisation in actomyosin systems, including qualitative comparison with experimental observations.

    2. Reviewer #1 (Public review):

      Summary:

      In this article, Mirza et al developed a continuum active gel model of the actomyosin cytoskeleton that accounts for nematic order and density variations in actomyosin. Using this model, they identify the requirements for the formation of dense nematic structures. In particular, they show that self-organization into nematic bundles requires both flow-induced alignment and active tension anisotropy in the system. By varying model parameters that control active tension and nematic alignment, the authors show that their model reproduces a rich variety of actomyosin structures, including tactoids, fibres, asters as well as crystalline networks. Additionally, discrete simulations are employed to calculate the activity parameters in the continuum model, providing a microscopic perspective on the conditions driving the formation of fibrillar patterns.

      Strengths:

      The strength of the work lies in its delineation of the parameter ranges that generate distinct types of nematic organization within actomyosin networks. The authors pinpoint the physical mechanisms behind the formation of fibrillar patterns, which may offer valuable insights into stress fiber assembly. Another strength of the work is connecting activity parameters in the continuum theory with microscopic simulations.

      Weaknesses:

      This paper is a very difficult read for nonspecialists, especially if you are not well-versed in continuum hydrodynamic theories. Efforts should be made to connect various elements of theory with biological mechanisms, which is mostly lacking in this paper. The comparison with experiments is predominantly qualitative. It is unclear if the theory is suited for in vitro or in vivo actomyosin systems. The justification for various model assumptions, especially concerning their applicability to actomyosin networks, requires a more thorough examination. The classification of different structures demands further justification. For example, the rationale behind categorizing structures as sarcomeric remains unclear when nematic order is perpendicular to the axis of the bands. Sarcomeres traditionally exhibit a specific ordering of actin filaments with alternating polarity patterns. Similarly, the criteria for distinguishing between contractile and extensile structures need clarification, as one would expect extensile structures to be under tension contrary to the authors' claim. Additionally, it's unclear if the model's predictions for fiber dynamics align with observations in cells, as stress fibers exhibit a high degree of dynamism and tend to coalesce with neighboring fibers during their assembly phase. Finally, it seems that the microscopic model is unable to recapitulate the density patterns predicted by the continuum theory, raising questions about the suitability of the simulation model.

    3. Reviewer #2 (Public review):

      Summary:

      The article by Waleed et al discusses the self-organization of actin cytoskeleton using the theory of active nematics. Linear stability analysis of the governing equations and computer simulations show that the system is unstable to density fluctuations and self-organized structures can emerge.

      Strengths:

      (i) Analytical calculations complemented with simulations (ii) Theory for cytoskeletal network

      Weaknesses:

      Not placed in the context or literature on active nematics.

      Comments on revised version:

      The authors have satisfactorily responded to the comments.

    4. Reviewer #3 (Public review):

      The manuscript "Theory of active self-organization of dense nematic structures in the actin cytoskeleton" analyses self-organized pattern formation within a two-dimensional nematic liquid crystal theory and uses microscopic simulations to test the plausibility of some of the conclusions drawn from that analysis. After performing an analytic linear stability analysis that indicates the possibility of patterning instabilities, the authors perform fully non-linear numerical simulations and identify the emergence of stripe-like patterning when anisotropic active stresses are present. Following a range of qualitative numerical observations on how parameter changes affect these patterns, the authors identify, besides isotropic and nematic stress, also active self-alignment as an important ingredient to form the observed patterns. Finally, microscopic simulations are used to test the plausibility of some of the most crucial assumptions underlying continuum simulations.

      The paper is well written, figures are mostly clear, and the theoretical analysis presented in both, main text and supplement, is rigorous. Mechano-chemical coupling has emerged in recent years as a crucial element of cell cortex and tissue organization and it is plausible to think that both, isotropic and anisotropic active stresses, are present within such effectively compressible structures. Even though not explicitly stated this way by the authors, I would argue that combining these two is one of the key ingredients that distinguishes this theoretical paper from similar ones.

      The diversity of patterning processes experimentally observed and theoretically described is nicely elaborated on in the introduction of the paper. The theory development and discussion of the continuum model itself is also well-embedded in a review of the relevant broad literature on active liquid crystals and active nematics, which includes plenty of previous results by the authors themselves. Interestingly, several of the patterns identified in the present work, such as 2D hexagonal and pulsatory patterns (Kumar et al, PRL, 2014), as well as contractile patches (Mietke et al, PRL 2019) have been observed previously in different, but related, active isotropic fluid models. In light of this crowded literature, the authors do a good job in delineating key results obtained in the present manuscript from existing work.

      The results of numerical simulations are well-presented. The discussion of numerical observations is comprehensive, but also at many times qualitative. Some of the observations resonate with recent discussions in the field, for example the observation of effectively extensile dynamics in a contractile system, which is interesting and reminiscent of ambiguities about extensile/contractile properties discussed in recent preprints (Nejad et al, Nat Comm 2024). It is convincingly concluded that, besides nematic stress on top of isotropic one, active self-alignment is a key ingredient to produce the observed patterns.

      The authors must be complimented for trying to gain further mechanistic insights into their conclusions using microscopic filament simulations that were diligently performed. It is rightfully stated that these simulations only provide plausibility tests about key assumptions underlying the hydrodynamic theory. Within this scope, I would say the authors are successful. At the same time, it leaves open questions that could have been discussed more carefully. For example, I wonder what can be said about the regime \kappa>0 microscopically, in which the continuum theory does also predict the formation of stripe patterns? How does the spatial inhomogeneous organization the continuum theory predicts fit in the presented, microscopic picture and vice versa? The authors clearly explain the scope and limitations of the microscopic model, which suggests that questions like these will be interesting directions of future investigations.

      Overall, the paper represents a valuable contribution to the field of active matter that should provide a fruitful basis to develop new hypotheses about the dynamic self-organisation and mechanics of dense filamentous bundles in biological systems.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      In this study, the authors offer a theoretical explanation for the emergence of nematic bundles in the actin cortex, carrying implications for the assembly of actomyosin stress fibers. As such, the study is a valuable contribution to the field of actomyosin organization in the actin cortex. While the theoretical work is solid, experimental evidence in support of the model assumptions remains incomplete. The presentation could be improved to enhance accessibility for readers without a strong background in hydrodynamic and nematic theories.

      To address the weaknesses identified in this assessment, we have expanded the motivation and description of the theoretical model, specifically insisting on the experimental evidence supporting its rationale and assumptions. These changes in the revised manuscript are implemented in the two first paragraphs of Section “Theoretical model” and in a more detailed description and justification of the different mathematical terms that appear in that section. We have made an effort to map in our narrative different terms to mechanistic processes in the actomyosin network. Even if the nature of the manuscript is inevitably theoretical, we think that the revised manuscript will be more accessible to a broader spectrum of readers.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this article, Mirza et al developed a continuum active gel model of the actomyosin cytoskeleton that accounts for nematic order and density variations in actomyosin. Using this model, they identify the requirements for the formation of dense nematic structures. In particular, they show that self-organization into nematic bundles requires both flow-induced alignment and active tension anisotropy in the system. By varying model parameters that control active tension and nematic alignment, the authors show that their model reproduces a rich variety of actomyosin structures, including tactoids, fibres, asters as well as crystalline networks. Additionally, discrete simulations are employed to calculate the activity parameters in the continuum model, providing a microscopic perspective on the conditions driving the formation of fibrillar patterns.

      Strengths:

      The strength of the work lies in its delineation of the parameter ranges that generate distinct types of nematic organization within actomyosin networks. The authors pinpoint the physical mechanisms behind the formation of fibrillar patterns, which may offer valuable insights into stress fiber assembly. Another strength of the work is connecting activity parameters in the continuum theory with microscopic simulations.

      We thank the referee for these comments.

      Weaknesses:

      (A) This paper is a very difficult read for nonspecialists, especially if you are not well-versed in continuum hydrodynamic theories. Efforts should be made to connect various elements of theory with biological mechanisms, which is mostly lacking in this paper. The comparison with experiments is predominantly qualitative.

      We understand the point of the referee. While it is unavoidable to present the continuum hydrodynamic theory behind our results, we have made an effort in the revised manuscript to (1) motivate the essential features required from a theoretical model of the actomyosin cytoskeleton capable of describing its nematic self organization (two first paragraphs of Section “Theoretical model”), and to (2) explicitly explain the physical meaning of each of the mathematical terms in the theory, and when appropriate, relate them to molecular mechanisms in the cytoskeleton. We hope that the revised manuscript addresses the concern of the referee.

      Regarding the comparison with experiments, they are indeed qualitative because the main point of the paper is to establish a physical basis for the self-organization of dense nematic structures in actomyosin gels. Somewhat surprisingly, we argue that a compelling mechanism explaining the tendency of actomyosin gels to form patterns of dense nematic bundles has been lacking. As we review in the introduction, these patterns are qualitatively diverse across cell types and organisms in terms of geometry and dynamics, and for this reason, our goal is to show that the same material in different parameter regimes can exhibit such qualitative diversity. A quantitative comparison is difficult for several reasons. First, many of the parameters in our theory have not been measured and are expected to vary wildly between cell types. In fact, estimates in the literature often rely on comparison with hydrodynamic models such as ours. For this reason, we chose to delineate regimes leading to qualitatively different emerging architectures and dynamics. Second, the patterns of nematic bundles found across cell types depend on the interaction between (1) the intrinsic tendency of actomyosin gels to form such structures studied here and (2) other elements of the cellular context. For instance, polymerization and retrograde flow from the lamellipodium, the physical barrier of the nucleus, and the interaction with the focal adhesion machinery are essential to understand the emergence of stress fibers in adherent cells. Cell shape and curvature anisotropy control the orientation of actin bundles in parallel patterns in the wings and trachea of insects. Nuclear positions guide the actin bundles organizing the cellularization of Sphaeroforma arctica [11]. Here, we focus on establishing that actomyosin gels have an intrinsic ability to self organize into dense nematic bundles, and leave how this property enables the morphogenesis of specific structures for future work. 
      We have emphasized this point in the revised Conclusions section.

      (B) It is unclear if the theory is suited for in vitro or in vivo actomyosin systems. The justification for various model assumptions, especially concerning their applicability to actomyosin networks, requires a more thorough examination.

      We thank the referee for this comment. Our theory is applicable to actomyosin gels originating from living cells. To our knowledge, the ability of reconstituted actomyosin gels from purified proteins to sustain the kind of contractile dynamical steady-states observed in living cells is very limited. In the revised manuscript, we cite a very recent preprint presenting very exciting but partial results in this direction [49]. Instead, reconstituted in vitro systems encapsulating actomyosin cell extracts robustly recapitulate contractile steady-states. This point has been clarified in the first paragraph of Section “Theoretical model”.

      (C) The classification of different structures demands further justification. For example, the rationale behind categorizing structures as sarcomeric remains unclear when nematic order is perpendicular to the axis of the bands. Sarcomeres traditionally exhibit a specific ordering of actin filaments with alternating polarity patterns.

      We agree with the referee and in the revised manuscript we have avoided the term “sarcomeric” because it refers to very specific organizations in cells. What we previously called “sarcomeric patterns”, where bands of high density exhibit nematic order perpendicular to the axis of the bands, is not a structure observed to our knowledge in cells. It is introduced to delimit the relevant region in parameter space. In the revised manuscript, we refer to this pattern as “banded pattern with perpendicular nematic organization” or “banded pattern” in short.

      (D) Similarly, the criteria for distinguishing between contractile and extensile structures need clarification, as one would expect extensile structures to be under tension contrary to the authors' claim.

      We thank the referee for raising this point, which was not sufficiently clarified in the original manuscript. We first note that in incompressible active nematic models, active tension is deviatoric (traceless and anisotropic) because an isotropic component would simply get absorbed by the pressure field enforcing incompressibility. Being compressible, our model admits an active tension tensor with deviatoric and isotropic components. We always consider a contractile (positive) isotropic component of active tension, but the deviatoric component can be either contractile (𝜅 > 0) or extensile (𝜅 < 0), where we follow the common terminology according to which in contractile/extensile active nematics the active stress is proportional to q with a positive/negative proportionality constant [see e.g. https://doi.org/10.1038/s41467-018-05666-8]. Furthermore, as clarified in the revised manuscript, total active stresses accounting for the deviatoric and isotropic components are always contractile (positive) in all directions, as enforced by the condition |𝜅| < 1.

      For fibrillar patterns, we need 𝜅 < 0, and therefore active stresses are larger perpendicular to the nematic direction. This means that the anisotropic component of the active tension is extensile, although, accounting for the isotropic component, total active tension is contractile (see Fig. 1c). This is now clarified in the text following Eq. 7 and in Fig. 1.
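
      As a hedged illustration of the sign argument above, the following sketch writes the active tension in an assumed form, which is not necessarily the manuscript's Eq. 7. The symbols $\lambda > 0$ (isotropic activity), $\rho$ (density) and $s \ge 0$ (magnitude of the eigenvalues of the traceless nematic tensor $\mathbf{q}$, whose value depends on how $\mathbf{q}$ is normalized) are assumptions introduced here for illustration only.

```latex
% Assumed form (illustrative only): isotropic part plus deviatoric part
\boldsymbol{\sigma}^{\mathrm{act}} = \lambda \rho \left( \mathbf{I} + \kappa\, \mathbf{q} \right),
\qquad \operatorname{tr}\mathbf{q} = 0 .

% In 2D, q has eigenvalues +s along the nematic direction and -s perpendicular to it, so
\sigma^{\mathrm{act}}_{\parallel} = \lambda \rho \,(1 + \kappa s),
\qquad
\sigma^{\mathrm{act}}_{\perp} = \lambda \rho \,(1 - \kappa s),
\qquad
\sigma^{\mathrm{act}}_{\parallel} - \sigma^{\mathrm{act}}_{\perp} = 2 \lambda \rho\, \kappa s .

% kappa < 0  =>  sigma_perp > sigma_parallel (extensile deviatoric part).
% Both components stay positive (contractile) if and only if |kappa| s < 1,
% which |kappa| < 1 guarantees whenever q is normalized so that s <= 1.
```

      Under these assumptions, the deviatoric part alone decides which direction carries more active tension, while the isotropic part keeps the total positive in every direction.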

      However, following fibrillar pattern formation and as a result of the interplay between active and viscous stresses, the total stress can be larger along the emergent dense nematic structures (“contractile structures”) or perpendicular to them (“extensile structures”). To clarify this point, in the revised Fig. 4 and the text referring to it, we have expanded our explanation and plotted the difference between the total stress component parallel to the nematic direction (𝜎∥) and the component perpendicular to the nematic direction (𝜎⊥), with contractile structures satisfying 𝜎∥ − 𝜎⊥ > 0 and extensile structures satisfying 𝜎∥ − 𝜎⊥ < 0. See lines 280 to 303. This is consistent with the common notion of contractile/extensile systems in incompressible nematic systems [see e.g. https://doi.org/10.1038/s41467-018-05666-8].
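
      The criterion above can be sketched numerically. The snippet below is a minimal illustration, not the authors' code: the function names `stress_anisotropy` and `classify_structure`, the tolerance `tol`, and the example tensors are all hypothetical. It projects a 2D total stress tensor onto the nematic direction and its perpendicular, then applies the sign test on 𝜎∥ − 𝜎⊥.

```python
import numpy as np

def stress_anisotropy(stress, director):
    """Return (sigma_par, sigma_perp, sigma_par - sigma_perp) for a 2D stress tensor."""
    sigma = np.asarray(stress, dtype=float)
    n = np.asarray(director, dtype=float)
    n = n / np.linalg.norm(n)          # unit vector along the nematic direction
    m = np.array([-n[1], n[0]])        # unit vector perpendicular to it
    sigma_par = float(n @ sigma @ n)   # stress component along the director
    sigma_perp = float(m @ sigma @ m)  # stress component across the director
    return sigma_par, sigma_perp, sigma_par - sigma_perp

def classify_structure(stress, director, tol=1e-12):
    """Label a structure by the sign of sigma_par - sigma_perp (hypothetical helper)."""
    diff = stress_anisotropy(stress, director)[2]
    if diff > tol:
        return "contractile"
    if diff < -tol:
        return "extensile"
    return "neutral"

if __name__ == "__main__":
    # Made-up stress tensor: larger tension along x than along y.
    sigma = [[2.0, 0.0], [0.0, 1.0]]
    print(classify_structure(sigma, [1.0, 0.0]))  # director along x
    print(classify_structure(sigma, [0.0, 1.0]))  # director along y
```

      Note that the same stress tensor is labelled differently depending on the local nematic direction, which is why the classification has to be evaluated on the emergent structures rather than read off from the sign of 𝜅 alone.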

      (E) Additionally, it's unclear if the model's predictions for fiber dynamics align with observations in cells, as stress fibers exhibit a high degree of dynamism and tend to coalesce with neighboring fibers during their assembly phase.

      In the present work, we focus on the self-organization of a periodic patch of actomyosin gel. However, in adherent cells boundary conditions play an essential role, as discussed in our response to comment (A) by this referee. In ongoing work, we are using the present model to study the dynamics of assembly and reconfiguration of dense nematic structures in domains with boundary conditions mimicking those in adherent cells, possibly interacting with the adhesion machinery, and we find dynamical interactions such as those suggested by the referee. As an example, we show a video of a simulation where, at the edge of the circular domain, there is an actin influx modeling the lamellipodium, and in four small regions friction is higher, simulating focal adhesions. Under these boundary conditions, the model presented in the paper exhibits the kind of dynamical reorganizations alluded to by the referee.

      Author response video 1.

      We would like to note, however, that the prominent stress fibers in cells adhered to stiff substrates, so abundantly reported in the literature, are not the only instance of dense nematic actin bundles. In the present manuscript, we emphasize the relation of the predicted organizations with those found in different in vivo contexts not related to stress fibers, such as the aligned patterns of bundles in insects (trachea, scales in butterfly wings), in hydra, or in reproductive organs of C. elegans; the highly dynamical network of bundles observed in C. elegans early embryos; or the labyrinth patterns of micro-ridges in the apical surface of epidermal cells in fish.

      (F) Finally, it seems that the microscopic model is unable to recapitulate the density patterns predicted by the continuum theory, raising questions about the suitability of the simulation model.

      We thank the referee for raising this question, which needs further clarification. The goal of the microscopic model is not to reproduce the self-organized patterns predicted by the active gel theory. The microscopic model lacks essential ingredients, notably a realistic description of hydrodynamics and turnover. Our goal with the agent-based simulations is to extract the relation between nematic order and active stresses for a small homogeneous sample of the network. This small domain is meant to represent the homogeneous active gel prior to pattern formation, and it allows us to substantiate key assumptions of the continuum model leading to pattern formation, notably the dependence of isotropic and deviatoric components of the active stress on density and nematic order (Eq. 7) and the active generalized stress promoting ordering.

      We should mention that reproducing the range of out-of-equilibrium mesoscale architectures predicted by our active gel model with agent-based simulations seems at present not possible, or at least significantly beyond the state-of-the-art. To our knowledge, these models have not been able to reproduce the heterogeneous nonequilibrium contractile states involving sustained self-reinforcing flows underlying the pattern formation mechanism studied in our work. The scope of the discrete network simulations has been clarified in lines 340 to 349 in the revised manuscript.

      While agent-based cytoskeletal simulations are very attractive because they directly connect with molecular mechanisms, active gel continuum models are better suited to describe out-of-equilibrium emergent hydrodynamics at a mesoscale. We believe that these two complementary modeling frameworks are rather disconnected in the literature, and for this reason, we have attempted to substantiate some aspects of our continuum modeling with discrete simulations. We have emphasized the complementarity of the two approaches in the conclusions.

      Reviewer #1 (Recommendations For The Authors):

      Questions on the theory:

      Does rho describe the density of actin or myosin? The authors say that they are modeling actomyosin material as a whole, but the actin and myosin should be modeled separately. Along similar lines, does Q define the ordering of actin or myosin?

      Active gel models of the actomyosin cytoskeleton have been formulated with independent densities for actin and for myosin or using a single density field, implicitly assuming a fixed stoichiometry. Super-resolution imaging of the actomyosin cytoskeleton also suggests that in principle it makes sense to consider different nematic fields for actin and for myosin filaments. In the revised manuscript, we now explicitly mention that our density and nematic field are effective descriptions of the entire actomyosin gel (lines 82-84).

      A more detailed model would entail additional material parameters, not available experimentally, which may help reproduce specific experiments but would make the systematic study of the different behaviors much more difficult. Our approach has been to keep the model minimal while meeting the fundamental requirements outlined in the first paragraphs of Section “Theoretical model”.

      Should the active stress depend on material density? It seems strange (from Eq. 3) that active stress could be non-zero even where density is zero, since sigma_act does not depend on rho.

      Yes, active stress is assumed to be proportional to density. Eq. 3 in the original manuscript was misleading (it was multiplied by rho in Eq. 2). In the revised manuscript, we have explained the theoretical model in a bit more detail, clarifying this point.

      The authors should clearly explain their rationale for retaining certain types of nonlinear terms while ignoring others in theory. For instance, the nonlinearities in the equations of motion are sometimes quadratic in the fields, while there are also some cubic terms. Please remark up to what order in the fields the various interactions are modeled.

      We thank the referee for raising this point. The nonlinearities in the theory are easily explained on the basis of a small number of choices. We have added a new paragraph towards the end of Section “Theoretical model” (lines 145 to 152) providing a rationale for the origin and underlying assumptions leading to different nonlinearities.

      To connect with experiments and the biological context, please explain the biological origin of various terms in the model: (1) L-dependent terms in Eq. 2 and 4, (2) Flow-alignment of nematic order and experimental evidence in support of it, (3) density-dependent susceptibility terms in Eq. 4

      (1) Unfortunately, the L-dependent terms are very bulky, but are very standard in nematic theories. The best way to understand their physical significance is through the expression of the nematic free-energy, which is now given and explained in the revised manuscript (Eq. 3). The resulting complicated expressions for the molecular field and the nematic stress (Eqs. 4 and 5) are mathematical consequences of the choice of nematic free energy. In the revised manuscript, we also attempt to provide a basis for these terms in the context of the actin cytoskeleton. (2) To our knowledge, the best reference supporting this term from experiments is Reymann et al, eLife (2016). In the revised manuscript, we have provided a physical interpretation. (3) We have expanded the motivation and plausible microscopic justification of this term.

      There are different 'activity' terms in the model. Their biophysical origin is not made clear. For example, the authors should make clear if these activities arise from filament or motor activity. Relatedly, the authors should provide a comprehensive discussion of the signs of the different active parameters and their physical interpretations.

      In an active gel model, activity parameters are phenomenological and how they map to molecular mechanisms is not precisely known, although conventionally contractile active tension is ascribed to the mechanical transduction of chemical power by myosin motors. The fact is that, besides myosin activity, there are many nonequilibrium processes in the actomyosin cytoskeleton that may lead to active stresses including (de)polymerization of filaments or (un)binding of crosslinkers. In the revised manuscript, we have added sentences illustrating how different terms may result from microscopic mechanisms, but providing a precise mapping between our model and nonequilibrium dynamics of proteins is beyond the scope of our work, although our discrete network simulations address this issue to a certain degree.

      Following the suggestion of the referee, our description of the theory now discusses much more extensively the signs of activity parameters and their physical interpretations, e.g. the text following Eq. 7.

      Throughout the paper, various activity terms are varied independently of each other. Is that a reasonable assumption given that activities should depend on ATP and are thus not independent of one another?

      We agree that, ultimately, all active processes depend on the conversion of chemical energy into mechanical energy. However, recent work has highlighted how active tension also depends on the microscopic architecture of the network controlled by multiple regulators of the actomyosin cytoskeleton (e.g. Chugh et al, Nat Cell Biol, 2017). It is reasonable to expect that, for a given rate of ATP consumption, chemical power will be converted into mechanical power in different ways depending on the micro-architecture of the cytoskeleton, e.g. the stoichiometry of filaments, crosslinkers, myosins, or the length distribution of filaments (very long filaments crosslinked by myosins may be difficult to reorient but may contract efficiently).

      We have added a paragraph in Section “Theoretical model” with a discussion, lines 153 to 156.

      Sarcomeres are muscle fibers that exhibit alternating polarity pattern. Such patterning is not evident in what the authors call 'sarcomeres' in Fig. 2. I believe the authors should revise their terminology and not loosely interpret existing classifications in the field.

      We thank the referee for raising this point. We have changed the terminology.

      Fig 2a: Is the cartoon for filament alignment incorrect for kappa>0?

      The cartoon is correct. In the revised manuscript we have explained more clearly the physical meaning of kappa in the text following Eq. 7. In the caption of Fig. 1 and of Fig. 2a, we have also clarified that when the absolute value of kappa is <1, then active tension is positive in all directions.

      Within the section "Requirements for fibrillar and banded patterns", it will be useful to show the figures for varying the different active parameters in the main figures.

      We have followed the referee’s suggestion and moved Supp. Fig. 1 of the original manuscript to the main figures.

      How do the authors decide if bundles are contractile or extensile? Why are contractile bundles under tension while extensile bundles are under compression? I would expect the opposite.

      We agree that this point deserves a more detailed explanation. In the revised manuscript and in the new Figure 4, we further develop this point. The fibrillar pattern forms when kappa<0. We further assume that -1<kappa<0, so that active tension is positive in all directions. In this regime, the deviatoric (anisotropic) part of active tension is extensile. However, following pattern formation and because of the interplay between active and viscous stresses, the total stress in the emerging bundles may become extensile or contractile, depending on whether the largest component of stress is perpendicular or along the bundle axis. This is now presented in the updated figure, with new panels presenting maps of the total tension. The text discussing this point has been rewritten and we hope that the new version is much clearer (lines 280 to 303).

      A contractile bundle tends to shorten, but it cannot do so because of boundary conditions or the interaction with other bundles. As a result, it is under tension. Conversely, an extensile bundle tries to elongate but, being constrained, it becomes compressed. As an analogy, consider the cortex of a suspended cell. The cortex is contractile, but it cannot contract because of volume regulation in the cell, which is typically pressurized. As a result, tension in the cortex is positive, as shown by Laplace’s law [10.1016/j.tcb.2020.03.005]. We have tried to clarify this point in the revised manuscript.
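
      For reference, the cortex analogy can be written out for an idealized spherical cell. The symbols below ($R$ for the cell radius, $T$ for the cortical tension, $\Delta P$ for the pressure excess inside the cell) are introduced here for illustration and are assumptions, not notation from the manuscript:

```latex
\Delta P \;=\; P_{\mathrm{in}} - P_{\mathrm{out}} \;=\; \frac{2T}{R}
\qquad\Longrightarrow\qquad
T \;=\; \frac{\Delta P \, R}{2} \;>\; 0
\quad \text{whenever } \Delta P > 0 .
```

      In this idealization, a pressurized cell ($\Delta P > 0$) necessarily carries positive cortical tension, even though the contractile cortex is prevented from shortening.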

      Can the authors reproduce alternating density patterns using the cytosim simulations? This is an important step in establishing the correspondence between the continuum theory and the agent-based model.

      We have addressed this point in our response to public comment (F) of this referee.

      The authors do not provide code or data.

      The finite element code, with an input file required to run a representative simulation in the paper, is now made available, see Ref. [74].

      The customizations of Cytosim needed to account for nematic order in our discrete network simulations are available, see Ref. [98].

      Reviewer #2 (Public Review):

      Summary:

      The article by Waleed et al discusses the self organization of actin cytoskeleton using the theory of active nematics. Linear stability analysis of the governing equations and computer simulations show that the system is unstable to density fluctuations and self organized structures can emerge. While the context is interesting, I am not sure whether the physics is new. Hence I have reservations about recommending this article.

      We thank the referee for these comments. In the revised manuscript, we have highlighted the novelty, particularly in the last paragraph of the introduction, the first two paragraphs of Section “Theoretical model”, and in the conclusions. Despite a very large literature on theoretical models of stress fibers, actin rings, and active nematics, we argue that the active self-organization of dense nematic structures from an isotropic and low-density gel has not been compellingly explained so far. Many models assume from the outset the presence of actin bundles, or explain their formation using localized activity gradients. The literature on active nematics has extensively studied symmetry breaking and self-organization. However, most of these works assume initial orientational order. Only a few works study the emergence of nematic order from a uniform isotropic state, but they consider dry systems lacking hydrodynamic interactions or incompressible and density-independent systems [37,38]. Yet, pattern formation in actomyosin gels is characterized by large density variations, and by highly compressible flows, which coordinate in a mechanism relying on an advective instability and self-reinforcing flows.

      Our theoretical model is not particularly novel, and as we mention in the manuscript, it can be particularized to different models used in the literature. However, we argue that it has the right minimal features to capture nematic self-organization in actomyosin gels. To our knowledge, no previous study explains the emergence of dense and nematic structures from a low-density isotropic gel as a result of activity and involving the advective instability typical of symmetry-breaking and patterning in the actomyosin cytoskeleton. These are important qualitative features of our results that resonate with a large experimental record, and as such, we believe that our work provides a new and compelling mechanism relying on self-organization to explain the prominence and diversity of patterns involving dense nematic bundles in the actomyosin cytoskeleton across species.

      Strengths:

      (i) Analytical calculations complemented with simulations (ii) Theory for cytoskeletal network

      Weaknesses:

      Not placed in the context or literature on active nematics.

      We agree with the referee that this was a weakness of the original manuscript. In the revised manuscript, within reasonable space constraints given the size and dynamism of the field of active nematics, we have placed our work in the context of this field (end of introduction and first two paragraphs of Section “Theoretical model”). The published version of our companion manuscript [45] also contributes to providing a clear context to our theoretical model within the field.

      Reviewer #2 (Recommendations For The Authors):

      The article by Waleed et al discusses the self organization of actin cytoskeleton using the theory of active nematics. Linear stability analysis of the governing equations and computer simulations show that the system is unstable to density fluctuations and self organized structures can emerge. While the context is interesting, I am not sure whether the physics is new. Hence I have reservations about recommending this article. I explain my questions comments below.

      We have responded to this comment above.

      (i) Active nematics including density variations have been dealt with quite extensively in the literature. For example, the works of Sriram Ramaswamy have dealt with this system including linear stability analysis, simulations etc. In what way is the present work different from the system that they have considered?

      (ii) Active flows leading to self-organization have been a topic of discussion in many works. For example: (i) Annual Review of Fluid Mechanics, Vol. 43:637-659, 2010, https://doi.org/10.1146/annurev-fluid-121108-145434 (ii) S Santhosh, MR Nejad, A Doostmohammadi, JM Yeomans, SP Thampi, Journal of Statistical Physics 180, 699-709 (iii) M. G. Giordano, F. Bonelli, L. N. Carenza, G. Gonnella and G. Negro, Europhysics Letters, Volume 133, Number 5. In what way is this work different from any of these?

      (iii) I am confused about the models used in the paper. There is significant literature from Prof. Mike Cates group, Prof. Julia Yeomans group, Prof. Marchetti's group who all use similar governing equations. In the present paper, I find it hard to understand whether the model used is similar to the existing ones in literature or are there significant differences. It should be clarified.

      Response to (i), (ii) and (iii).

      We completely agree with this referee (and also the previous referee) that the contextualization of our work in the field of active nematics was very insufficient. In the revised manuscript, the last paragraph of the introduction and the first two paragraphs of Section “Theoretical model” now address this point. In short, previous active nematic models predicting patterns with density variations have been either for dry active matter (disregarding hydrodynamic interactions), or for suspensions of active particles moving in an incompressible flow. None of these previous works predict nematic pattern formation as a result of activity relying on the advective instability and self-reinforcing compressible flows, leading to high density and high order bundles surrounded by an isotropic low density phase. Yet, these are fundamental features observed in actomyosin gels. Many works deal with symmetry-breaking of a system with pre-existing order, but very few address how order emerges actively from an isotropic state. We thank the referee for pointing to the paper by Santhosh et al, which nicely makes this argument and is now cited. Our mechanism is fundamentally different from that in Santhosh et al, whose model is incompressible and ignores density variations.

      We hope that the revised manuscript addresses this important concern.

      (iv) Below Eqn 6, it starts by saying that the “...origin..is clear...” It's not. I don't understand the physical origin of the instability, and this should be clarified, maybe with some illustrations.

      We apologize for this unfortunate sentence, which we have rewritten in the revised manuscript (lines 181 to 185).

      Reviewer #3 (Public Review):

      The manuscript "Theory of active self-organization of dense nematic structures in the actin cytoskeleton" analyzes self-organized pattern formation within a two-dimensional nematic liquid crystal theory and uses microscopic simulations to test the plausibility of some of the conclusions drawn from that analysis. After performing an analytic linear stability analysis that indicates the possibility of patterning instabilities, the authors perform fully non-linear numerical simulations and identify the emergence of stripe-like patterning when anisotropic active stresses are present. Following a range of qualitative numerical observations on how parameter changes affect these patterns, the authors identify, besides isotropic and nematic stress, also active self-alignment as an important ingredient to form the observed patterns. Finally, microscopic simulations are used to test the plausibility of some of the conclusions drawn from continuum simulations.

      The paper is well written, figures are mostly clear and the theoretical analysis presented in both main text and supplement is rigorous. Mechano-chemical coupling has emerged in recent years as a crucial element of cell cortex and tissue organization and it is plausible to think that both isotropic and anisotropic active stresses are present within such effectively compressible structures. Even though not yet stated this way by the authors, I would argue that combining these two is one of the key ingredients that distinguishes this theoretical paper from similar ones. The diversity of patterning processes experimentally observed is nicely elaborated on in the introduction of the paper, though other closely related previous work could also have been included in these references (see below for examples).

      We thank the referee for these comments and for the suggestion to emphasize the interplay of isotropic and anisotropic active tension, which is possible only in a compressible gel, as mentioned in the revised manuscript. We have emphasized this point in different places in the revised manuscript. We thank the referee for the suggestions to better connect with existing literature.

      To introduce the continuum model, the authors exclusively cite their own, unpublished pre-print, even though the final equations take the same form as previously derived and used by other groups working in the field of active hydrodynamics (a certainly incomplete list: Marenduzzo et al (PRL, 2007), Salbreux et al (PRL, 2009, cited elsewhere in the paper), Jülicher et al (Rep Prog Phys, 2018), Giomi (PRX, 2015),...). To make better contact with the broad active liquid crystal community and to delineate the present work more compellingly from existing results, it would be helpful to include a more comprehensive discussion of the background of the existing theoretical understanding on active nematics. In fact, I found it often agrees nicely with the observations made in the present work, an opportunity to consolidate the results that is sometimes currently missed out on. For example, it is known that self-organised active isotropic fluids form in 2D hexagonal and pulsatory patterns (Kumar et al, PRL, 2014), as well as contractile patches (Mietke et al, PRL 2019), just as shown and discussed in Fig. 2. It is also known that extensile nematics, \kappa<0 here, draw in material laterally of the nematic axis and expel it along the nematic axis (the other way around for \kappa>0, see e.g. Doostmohammadi et al, Nat Comm, 2018 "Active Nematics" for a review that makes this point), consistent with all relative nematic director/flow orientations shown in Figs. 2 and 3 of the present work.

      We thank the referee for these suggestions. Indeed, in the original submission we had outsourced much of the justification of the model and the relevant literature to a related pre-print, but this is not reasonable. The companion publication has now been accepted in the New Journal of Physics, with significant changes to better connect the work to the field of active nematics. A preprint reflecting those changes is available in Ref. [64], but we hope to reference the published paper that will come out soon.

      In the revised manuscript, we have significantly rewritten the Section “Theoretical model” to frame the continuum model in the context of the field of active nematics. While our model and results have commonalities with previous work, there are also important differences. We have highlighted the novelty of the present work along with the relation with previous studies and theoretical models in the last paragraph of the introduction and the first two paragraphs of Section “Theoretical model”. Furthermore, as suggested by the referee, we have made an effort to connect our results with previous work by Kumar, Mietke, Doostmohammadi and others.

      Regarding the last point alluded to by the referee (“extensile nematics, \kappa<0 here, draw in material laterally of the nematic axis and expel it along the nematic axis”), the picture raised by the referee would be nuanced for our compressible system as compared to the incompressible systems discussed in that reference. As we have elaborated in our response to point (D) of Referee #1, our systems are overall contractile (with positive active tension in all directions), but the deviatoric component of the active tension can be either extensile or contractile. In our “extensile” models (left in Fig. 2c), material is drawn in laterally to the nematic axis but it is not expelled along this axis. Instead, it is “expelled” by turnover. In the revised manuscript, we have added a comment about this.

      The results of numerical simulations are well-presented. Large parts of the discussion of numerical observations - specifically around Fig. 3 - are qualitative and it is not clear why the analysis is restricted to \kappa<0. Some of the observations resonate with recent discussions in the field, for example the observation of effectively extensile dynamics in a contractile system is interesting and reminiscent of ambiguities about extensile/contractile properties discussed in recent preprints (https://arxiv.org/abs/2309.04224). It is convincingly concluded that, besides nematic stress on top of isotropic one, active self-alignment is a key ingredient to produce the observed patterns.

      We thank the referee for these comments. We are reluctant to extend the detailed analysis of emergent architectures and dynamics to the case \kappa > 0 as it leads to architectures not observed, to our knowledge, in actin networks. In the revised manuscript, we have expanded and clarified the characterization of emergent contractile/extensile networks by reporting the relative magnitude of stress along and perpendicular to the nematic direction. Our revised manuscript clearly shows that even though all of our simulations describe locally contractile systems with extensile anisotropic active tension, the emergent meso-structures can be either extensile or contractile, with the extensile ones exhibiting the usual bend-type instability (a secondary instability in our system) described classically for extensile active nematic systems. We have rewritten the text discussing this (lines 280 to 303), where we have placed these results in the context of recent work reporting the nontrivial relation between the contractility/extensibility of the local units vs the nematic pattern.

      I compliment the authors for trying to gain further mechanistic insights into this conclusion with microscopic filament simulations that are diligently performed. It is rightfully stated that these simulations only provide plausibility tests and, within this scope, I would say the authors are successful. At the same time, it leaves open questions that could have been discussed more carefully. For example, I wonder what can be said about the regime \kappa>0 (which is dropped ad-hoc from Fig. 3 onward) microscopically, in which the continuum theory does also predict the formation of stripe patterns - besides the short comment at the very end? How does the spatially inhomogeneous organization the continuum theory predicts fit in the presented, microscopic picture and vice versa?

      We thank the referee for this compliment. We think that the point raised by the referee is very interesting. It is reasonable to expect that the sign of \kappa may not be a constant but rather depend on S and \rho. Indeed, for a sparse network with low order, the progressive bundling by crosslinkers acting on nearby filaments is likely to produce a large active stress perpendicular to the nematic direction, whereas in a dense and highly ordered region, myosin motors are more likely to effectively contract along the nematic direction whereas there is little room for additional lateral contraction by additional bundling. As discussed in our response to referee #1, we believe that studying the formation of patterns using the discrete network simulations is far beyond the scope of our work. We discuss in lines 332 to 341, as well as in the last paragraph of the conclusions, the scope and limitations of our discrete network simulations.

      Overall, the paper represents a valuable contribution to the field of active matter and, if strengthened further, might provide a fruitful basis to develop new hypotheses about the dynamic self-organisation of dense filamentous bundles in biological systems.

      Reviewer #3 (Recommendations For The Authors):

      • The statement "the porous actin cytoskeleton is not a nematic liquid-crystal because it can adopt extended isotropic/low-order phases" is difficult to understand and should be clarified, as the next paragraph starts formulating a nematic active liquid crystal theory. Do the authors mean a crystal that "Tends to be in a disordered phase?", according to its equilibrium properties? It would still be a "nematic liquid crystal", only its ground state is not a nematic phase.

      We agree with the referee, and we hope that changes in the introduction and in Section “Theoretical model” address this comment.

      • I could not find what Frank energy is precisely used, that would be helpful information.

      In the revised manuscript, we have provided the expression for the nematic free energy in Eq. 3.

      • The significance of the green/purple arrows in the Fig. 2a sketch is unclear. There are green arrows also in b,c; do they represent the same quantity? From the simulation images it is overall very difficult to see how the flows are oriented near the high-density regions (i.e. if they are towards / away from the strip).

      We thank the referee for bringing this up. The color coding of the sketches was confusing. The modified figures (Fig. 1(c) and Fig. 2(a)) now present a clearer and unified representation of anisotropic tension. The green arrows in Fig. 2(c) represent the out-of-equilibrium flows in the steady state. We agree that the zoom is insufficient to resolve the flow structure. For this reason, in the revised Fig. 2, we have added additional panels showing the flow with higher resolution.

      • It is currently unclear how the linear stability results - beyond identification of the parameter \delta - inform any of the remaining manuscript. Quantitative comparisons of the various length scales seen in simulated patterns (e.g. Fig. 2b, 3c etc) with linear predictions and known characteristic length scales would be instructive mechanistically, would make the overall presentation more compelling and probes limitations of linear results.

      In the revised manuscript, we have provided further information so that the readers can appreciate the predictions and limitations of the linear stability results. We have added a sentence and a figure to show that, in addition to the critical activity, the linear theory provides a good prediction of the wavelength of the pattern. See lines 199 to 201.

      • It is not clear what is meant by "[bundle-formation] requires that active tension perpendicular to nematic orientation is larger than along this direction", and therefore also not why that would be "counter-intuitive". If interpreted naively, I would say that a large tension brings in more filaments into the bundle, so that may well be an obviously helpful feature for bundle formation and maintenance. In any case, it would be helpful if clarity is improved throughout when arguments about "directions of tensions" are made.

      We have significantly rewritten the first paragraphs of section “Microscopic origin…” to clarify this point (lines 330 to 339). This rewritten text, along with other changes in the manuscript such as the explanation of Eq. 7 or the discussion about the stress anisotropy in the new version of Fig. 4 (see lines 280 to 303), provides a better explanation of this important point.

      • All density color bars: Shouldn't they rather be labelled \rho/\rho_0?

      Yes! We have corrected this typo.

      • Scalar product missing in caption definition of order parameter Fig. 2

      We have corrected this typo.

      • Fig. 3a: I suggest to put the expression for q0 in the caption

      We have replaced q_0 with S_0 and clarified its meaning in the caption of what is now Fig. 4.

      • Paragraph on bottom right of page 6 should several times probably refer to Fig. 3c(...), instead of Fig. 3b

      We have corrected this typo.

    1. eLife Assessment

      The study is useful for advancing spatial transcriptomics through its novel regression-based linear model (glmSMA) that integrates single-cell RNA-seq with spatial reference atlases, and its methodological framework is convincing. The approach demonstrates notable utility by enabling higher-resolution cell mapping across multiple biological systems and spatial platforms compared to existing tools.

    2. Reviewer #1 (Public review):

      Liu et al. present glmSMA, a network-regularized linear model that integrates single-cell RNA-seq data with spatial transcriptomics, enabling high-resolution mapping of cellular locations across diverse datasets. Its dual regularization framework (L1 for sparsity and generalized L2 via a graph Laplacian for spatial smoothness) demonstrates robust performance of their model. It offers novel tools for spatial biology, despite some gaps in fully addressing spatial communication.

      The study presents a clear methodological framework that balances sparsity and smoothness, with parameter guidelines for different tissue contexts. It is commendable for its application to multiple spatial omics platforms, including both sequencing-based and imaging-based data, with results that can be generalized across both structured and less-structured tissues. After revision, there is a more transparent discussion of assumptions, including the correlation between expression and physical distance, and how performance may vary by tissue heterogeneity.

      Limitations are modest - the spatial communication application is mentioned but not fully developed, and resolution reporting is primarily qualitative, which may limit direct comparability between datasets. The imaging-based validation is currently limited to simulated or lower-plex data, and expansion to high-plex datasets would further support platform versatility, although this is not essential to the core claims.

      Overall, the manuscript delivers on its main objective, which is to present and validate a practical, flexible, and accurate framework for spatial mapping. The methods are clearly described, and the resource will be useful for researchers seeking to integrate single-cell and spatial datasets in diverse biological contexts.

    3. Reviewer #2 (Public review):

      Summary:

      The author proposes a novel method for mapping single-cell data to specific locations with higher resolution than several existing tools.

      Strengths:

      The spatial mapping tests were conducted on various tissues, including the mouse cortex, human PDAC, and intestinal villus.

      Comments on revised version:

      The authors have sufficiently addressed all of my comments.

    4. Reviewer #3 (Public review):

      Summary:

      The authors have provided a thorough and constructive response to the comments. They effectively addressed concerns regarding the dependence on marker gene selection by detailing the incorporation of multiple feature selection strategies, such as highly variable genes and spatially informative markers (e.g., via Moran's I), which enhance glmSMA's robustness even when using gene-limited reference atlases.

      Furthermore, the authors thoughtfully acknowledged the assumption underlying glmSMA-that transcriptionally similar cells are spatially proximal-and discussed both its limitations and empirical robustness in heterogeneous tissues such as human PDAC. Their use of real-world, heterogeneous datasets to validate this assumption demonstrates the method's practical utility and adaptability.

      Overall, the response appropriately contextualizes the limitations while reinforcing the generalizability and performance of glmSMA. The authors' clarifications and experimental justifications strengthen the manuscript and address the reviewer's concerns in a scientifically sound and transparent manner.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Liu et al. present glmSMA, a network-regularized linear model that integrates single-cell RNA-seq data with spatial transcriptomics, enabling high-resolution mapping of cellular locations across diverse datasets. Its dual regularization framework (L1 for sparsity and generalized L2 via a graph Laplacian for spatial smoothness) demonstrates robust performance of their model and offers novel tools for spatial biology, despite some gaps in fully addressing spatial communication.

      Overall, the manuscript is commendable for its comprehensive benchmarking across different spatial omics platforms and its novel application of regularized linear models for cell mapping. I think this manuscript can be improved by addressing method assumptions, expanding the discussion on feature dependence and cell type-specific biases, and clarifying the mechanism of spatial communication.

      The conclusions of this paper are mostly well supported by data, but some aspects of model development and performance evaluation need to be clarified and extended.

      We are thankful for the positive comments and have made changes following the reviewer's advice, as detailed below.

      (1) What were the assumptions made behind the model? One of them could be the linear relationship between cellular gene expression and spatial location. In complex biological tissues, non-linear relationships could be present, and this would also vary across organ systems and species. Similarly, with regularization parameters, they can be tuned to balance sparsity and smoothness adequately but may not hold uniformly across different tissue types or data quality levels. The model also seems to assume independent errors with normal distribution and linear additive effects - a simplification that may overlook overdispersion or heteroscedasticity commonly observed in RNA-seq data.

      Thank you for this comment. We acknowledge that the non-linear relationships can be present in complex tissues and may not be fully captured by a linear model. 

      Our choice of a linear model was guided by an investigation of the relationship in the current datasets, which include intestinal villus, mouse brain, and fly embryo. There is a linear correlation between expression distance and physical distance [Nitzan et al]. Within a given anatomical structure, cells in closer proximity exhibit more similar expression patterns (Fig. 3c). In tissues where non-linear relationships are more prevalent, such as the human PDAC sample, our mapping results remain robust. We acknowledge that we have not yet tested our algorithm in highly heterogeneous regions like the liver, and we plan to include such analyses in future work if necessary.
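
      As an illustration of the kind of check described above, the following is a minimal sketch on synthetic data, not the authors' code. The function names and the toy inputs are assumptions made for the example. It computes the Pearson correlation between pairwise expression distances and pairwise physical distances.

```python
import numpy as np

def pairwise_distances(x):
    """Euclidean distance between every pair of rows (upper triangle only)."""
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    iu = np.triu_indices(x.shape[0], k=1)
    return dist[iu]

def expression_physical_correlation(expr, coords):
    """Pearson correlation between expression distance and physical distance."""
    d_expr = pairwise_distances(expr)
    d_phys = pairwise_distances(coords)
    return float(np.corrcoef(d_expr, d_phys)[0, 1])

# Hypothetical toy data: expression is a noisy linear function of position.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 1, size=(200, 2))      # 200 cells, 2-D positions
weights = rng.normal(size=(2, 30))             # 30 synthetic genes
expr = coords @ weights + 0.1 * rng.normal(size=(200, 30))

r = expression_physical_correlation(expr, coords)
print(f"correlation between expression and physical distance: {r:.2f}")
```

      A value of r close to 1 on real data would support the linear assumption; a weak correlation would suggest that a linear model may be less appropriate for that tissue.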

      Regarding the regularization parameters, we agree that the balance between sparsity and smoothness is sensitive to tissue-specific variation and data quality. In our current implementation, we explored a range of values to find robust defaults. Supplementary Figure 7 illustrates the regularization path for cell assignment in the fly embryo.  

      The choice of L1 and L2 regularization parameters is crucial for balancing sparsity and smoothness in spatial mapping. 

      For Structured Tissues (brain):

      Moderate L1 to ensure cells are localized.

      Small to moderate L2 to maintain local smoothness without blurring distinct regions.

      For Less Structured (PDAC):

      Slightly lower L1 to allow cells to be associated with multiple regions if boundaries are ambiguous.

      Higher L2 to stabilize mappings in noisy or mixed regions.
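
      To make the roles of the two penalties concrete, here is a minimal sketch of a network-regularized least-squares fit, assuming an objective of the form 0.5*||y - Xw||^2 + lam1*||w||_1 + 0.5*lam2*w^T L w, where L is a graph Laplacian over spatial spots. The exact objective, the solver (a plain proximal-gradient loop) and all names below are assumptions for illustration and may differ from the glmSMA implementation.

```python
import numpy as np

def graph_laplacian(adjacency):
    """Unnormalized graph Laplacian L = D - A of a symmetric adjacency matrix."""
    return np.diag(adjacency.sum(axis=1)) - adjacency

def soft_threshold(v, t):
    """Proximal operator of the L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fit_network_regularized(X, y, L, lam1, lam2, n_iter=500):
    """Proximal-gradient (ISTA) minimization of
    0.5*||y - X w||^2 + lam1*||w||_1 + 0.5*lam2 * w^T L w."""
    H = X.T @ X + lam2 * L                      # Hessian of the smooth part
    step = 1.0 / np.linalg.eigvalsh(H)[-1]      # 1 / largest eigenvalue
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = H @ w - X.T @ y
        w = soft_threshold(w - step * grad, step * lam1)
    return w

# Hypothetical toy problem: 20 genes, 5 spots arranged on a chain.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))                    # genes x spots reference atlas
y = X[:, 2].copy()                              # a cell that matches spot 2
A = np.diag(np.ones(4), 1)
A = A + A.T                                     # chain adjacency between spots
L = graph_laplacian(A)

w_fit = fit_network_regularized(X, y, L, lam1=0.1, lam2=0.1)
w_zero = fit_network_regularized(X, y, L, lam1=1e6, lam2=0.1)
```

      In this sketch, raising lam1 drives more spot weights to exactly zero (tighter localization), while raising lam2 pulls the weights of neighboring spots towards each other (smoother, broader assignments), which matches the qualitative guidance given for structured and less structured tissues.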

      (2) The performance of glmSMA is likely sensitive to the number and quality of features used. With too few features, the model may struggle to anchor cells correctly due to insufficient discriminatory power, whereas too many features could lead to overfitting unless appropriately regularized. The manuscript briefly acknowledges this issue, but further systematic evaluation of how varying feature numbers affect mapping accuracy would strengthen the claims, particularly in settings where marker gene availability is limited. A simple way to show some of this would be testing on multiple spatial omics (imaging-based) platforms with varying panel sizes and organ systems. Related to this, based on the figures, it also seems like the performance varies by cell type. What are the factors that contribute to this? Variability in expression levels, RNA quantity/quality? Biases in the panel? Personally, I am also curious how this model can be used similarly/differently if we have a FISH-based, high-plex reference atlas. Additional explanation around these points would be helpful for the readers.

      Thank you for this thoughtful comment. The performance of our method is indeed sensitive to the number and quality of selected features. To optimize feature selection, we employed multiple strategies, including Moran’s I statistic, identification of highly variable genes, and the Seurat pipeline to detect anchor genes linking the spatial transcriptomics data with the reference atlas. The number of selected markers depends on the quality of the data. For high-quality datasets, fewer than 100 markers are typically sufficient for prediction. To select marker genes, we applied the following optional strategies:

      (1) Identifying highly variable genes (HVGs).

      (2) Calculating Moran’s I scores for all genes to assess spatial autocorrelation.

      (3) Generating anchor genes based on the integration of the reference atlas and scRNA-seq data using Seurat.
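
      Strategy (2) can be sketched as follows. This is a minimal illustration of Moran's I on a toy grid, assuming a binary k-nearest-neighbour spatial weight matrix; the weighting scheme, the function names and the synthetic genes are assumptions and are not taken from the manuscript.

```python
import numpy as np

def knn_weights(coords, k=4):
    """Binary k-nearest-neighbour spatial weight matrix (assumed scheme)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # a spot is not its own neighbour
    idx = np.argsort(d, axis=1)[:, :k]
    w = np.zeros_like(d)
    w[np.arange(len(coords))[:, None], idx] = 1.0
    return w

def morans_i(values, weights):
    """Moran's I = (N / W) * sum_ij w_ij z_i z_j / sum_i z_i^2, z = x - mean(x)."""
    z = values - values.mean()
    denom = (z ** 2).sum()
    total_w = weights.sum()
    if denom == 0 or total_w == 0:
        return 0.0
    return float(len(values) / total_w * (z @ weights @ z) / denom)

# Hypothetical 10 x 10 grid of spots with one smooth and one random gene.
coords = np.array([(i, j) for i in range(10) for j in range(10)], dtype=float)
W = knn_weights(coords, k=4)
rng = np.random.default_rng(0)
smooth_gene = coords[:, 0]                      # expression follows a gradient
noise_gene = rng.normal(size=len(coords))       # no spatial structure

i_smooth = morans_i(smooth_gene, W)
i_noise = morans_i(noise_gene, W)
```

      Genes could then be ranked by their score and the top-scoring ones kept as spatially informative markers; the cutoff is a user choice.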

      We evaluated our method across diverse tissue types and platforms—including Slide-seq, 10x Visium, and Virtual-FISH—which represent both sequencing-based and imaging-based spatial transcriptomics technologies. Our model consistently achieved strong performance across these settings. It's worth noting that the performance of other methods, such as CellTrek [Wei et al] and novoSpaRc [Nitzan et al], also depends heavily on feature selection. In particular, performance degrades substantially when fewer features are used. For fair comparison across different methods, the same set of marker genes was used. Under this condition, our method outperformed the others based on KL divergence (Fig. 2b, Fig. 5g). 

      To assess the effect of marker gene quantity, we randomly selected subsets of 2,000, 1,500, 1,000, 700, 500, and 200 markers from the original set. As the number of markers decreases, mapping performance declines, which is expected due to the reduction in available spatial information. This result underscores the general dependence of spatial mapping accuracy on both the number and quality of informative marker genes (Supplementary Fig. 10).

      We do not believe that the observed performance is directly influenced by cell type composition. Major cell types are typically well-defined, and rare cell types comprise only a small fraction of the dataset. For these rare populations, a single misclassification can disproportionately impact metrics like KL divergence due to small sample size. However, this does not necessarily indicate a systematic cell type–specific bias in the mapping. We incorporated a high-resolution Slide-seq dataset from the mouse hippocampus to evaluate the influence of cell type composition on the algorithm’s performance [Stickels et al., 2020]. Most cell types within the CA1, CA2, CA3, and DG regions were accurately mapped to their original anatomical locations (Fig. 5e, f, g).

      (3) Application 3 (spatial communication) in the graphical abstract appears relatively underdeveloped. While it is clear that the model infers spatial proximities, further explanation of how these mappings translate into insights into cell-cell communication networks would enhance the biological relevance of the findings.

      Thank you for this valuable feedback. We agree that further elaboration on the connection between spatial proximity and cell–cell communication would enhance the biological interpretation of our results. While our current model focuses on inferring spatial relationships, we may provide cell–cell communication analyses in the future.

      (4) What is the final resolution of the model outputs? I am assuming this is dictated by the granularity of the reference atlas and the imposed sparsity via the L1 norm, but if there are clear examples that would be good. In figures (or maybe in practice too), cells seem to be assigned to small, contiguous patches rather than pinpoint single-cell locations, which is a pragmatic compromise given the inherent limitations of current spatial transcriptomics technologies. Clarification on the precise spatial scale (e.g., pixel or micrometer resolution) and any post-mapping refinement steps would be beneficial for the users to make informed decisions on the right bioinformatic tools to use.

      Thank you for the comment. For each cell, our algorithm generates a probability vector that indicates its likely spatial assignment along with coordinate information. In our framework, each cell is mapped to one or more spatial spots with associated probabilities. Depending on the amount of regularization through L1 and L2 norms, a cell may be localized to a small patch or distributed over a broader domain (Supplementary Fig. 5 & 7). For the 10x Visium data, we applied a repelling algorithm to enhance visualization [Wei et al]. If a cell’s original location is already occupied, it is reassigned to a nearby neighborhood to avoid overlap. The users can also see the entire regularization path by varying the penalty terms. 
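
      The output format and the overlap-avoidance step described above can be illustrated with a small sketch. This is an assumption-laden toy, not the repelling algorithm of Wei et al. or the authors' code: fitted weights are clipped and normalized into a probability vector, and a cell whose preferred spot is already taken is moved to the nearest free spot.

```python
import numpy as np

def to_probabilities(weights):
    """Clip negative weights to zero and normalize to a probability vector."""
    w = np.clip(np.asarray(weights, dtype=float), 0.0, None)
    total = w.sum()
    return w / total if total > 0 else np.full(w.shape, 1.0 / w.size)

def assign_without_overlap(prob_matrix, coords):
    """Greedy assignment: each cell takes its most probable spot;
    if that spot is occupied, it takes the nearest free spot instead."""
    n_cells, n_spots = prob_matrix.shape
    free = np.ones(n_spots, dtype=bool)
    assignment = np.full(n_cells, -1)           # -1 marks an unassigned cell
    for cell in range(n_cells):
        best = int(np.argmax(prob_matrix[cell]))
        if not free[best]:
            if not free.any():
                break                           # more cells than spots
            d = np.linalg.norm(coords - coords[best], axis=1)
            d[~free] = np.inf                   # ignore occupied spots
            best = int(np.argmin(d))
        assignment[cell] = best
        free[best] = False
    return assignment

# Hypothetical example: two cells that both prefer spot 0.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]])
probs = np.array([[0.8, 0.1, 0.1],
                  [0.7, 0.2, 0.1]])
assignment = assign_without_overlap(probs, coords)
p = to_probabilities([2.0, -1.0, 2.0])
```

      Here the second cell is displaced to spot 1, the closest unoccupied neighbour of its preferred spot.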

      Nitzan, M. et al. (2019) ‘Gene expression cartography’, Nature, 576(7785), pp. 132–137. doi:10.1038/s41586-019-1773-3.

      Wei, R. et al. (2022) ‘Spatial charting of single-cell transcriptomes in tissues’, Nature Biotechnology, 40(8), pp. 1190–1199. doi:10.1038/s41587-022-01233-1.

      Stickels, R.R. et al. (2020) ‘Highly sensitive spatial transcriptomics at near-cellular resolution with Slide-seqV2’, Nature Biotechnology, 39(3), pp. 313–319. doi:10.1038/s41587-020-0739-1.

      Reviewer #2 (Public review):

      Summary:

      The author proposes a novel method for mapping single-cell data to specific locations with higher resolution than several existing tools.

      Strengths:

      The spatial mapping tests were conducted on various tissues, including the mouse cortex, human PDAC, and intestinal villus.

      Weakness:

      (1) Although the researchers claim that glmSMA seamlessly accommodates both sequencing-based and image-based spatial transcriptomics (ST) data, their testing primarily focused on sequencing-based ST data, such as Visium and Slide-seq. To demonstrate its versatility for spatial analysis, the authors should extend their evaluation to imaging-based spatial data.

      Thank you for the comment. We have tested our algorithm on the virtual FISH dataset from the fly embryo, which serves as an example of image-based spatial omics data (Fig. 4c). However, such datasets often contain a limited number of available genes. To address this, we will conduct additional testing on image-based data if needed. The Allen Brain Atlas provides high-quality ISH data, and we can select specific brain regions from this resource to further evaluate our algorithm if necessary [Lein et al]. Currently, we plan to focus more on the 10x Visium platform, as it supports whole-transcriptome profiling and offers a wide range of tissue samples for analysis.

      (2) The definition of "ground truth" for spatial distribution is unclear. A more detailed explanation is needed on how the "ground truth" was established for each spatial dataset and how it was utilized for comparison with the predicted distribution generated by various spatial mapping tools.

      Thank you for the comment. To clarify how ground truth is defined across different tissues, we provided the following details. Direct ground truth for cell locations is often unavailable in scRNA-seq data due to experimental constraints. To address this, we adopted alternative strategies for estimating ground truth in each dataset:

      10x Visium Data: We used the cell type distribution derived from spatial transcriptomics (ST) data as a proxy for ground truth. We then computed the KL divergence between this distribution and our model's predictions for performance assessment.

      Slide-seq Data: We validated predictions by comparing the expression of marker genes between the reconstructed and original spatial data.

      Fly Embryo Data: We used predicted cell locations from novoSpaRc as a reference for evaluating our algorithm.

      These strategies allowed us to evaluate model performance even in the absence of direct cell location data. In addition, we can apply multiple evaluation strategies within a single dataset.
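
      The KL-divergence comparison used for the 10x Visium data can be sketched as below. This is a minimal illustration under stated assumptions: the counts are invented, the small pseudocount and the direction of the divergence (reference relative to prediction) are choices made for the example, and the function name is hypothetical.

```python
import numpy as np

def kl_divergence(reference_counts, predicted_counts, eps=1e-9):
    """KL(reference || predicted) between two cell-type count vectors.
    A small pseudocount avoids division by zero for empty categories."""
    p = np.asarray(reference_counts, dtype=float) + eps
    q = np.asarray(predicted_counts, dtype=float) + eps
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))

# Hypothetical counts of three cell types within one anatomical region.
reference = [50, 30, 20]     # proxy ground truth from the spatial data
predicted = [45, 35, 20]     # counts after mapping single cells to the region

kl_same = kl_divergence(reference, reference)
kl_diff = kl_divergence(reference, predicted)
```

      Lower values indicate closer agreement between the predicted and reference cell-type distributions, with zero meaning identical proportions.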

      (3) In the analysis of spatial mapping results using intestinal villus tissue, only Figure 3d supports their findings. The researchers should consider adding supplemental figures illustrating the spatial distribution of single cells in comparison to the ground truth distribution to enhance the clarity and robustness of their investigation.

      Thank you for the comment. In the intestinal dataset, only six large domains were defined. As a result, the task for this dataset is relatively simple—each cell only needs to be assigned to one of the six domains. As the intestinal villus is a relatively simple tissue, most existing algorithms performed well on it. For this reason, we did not initially provide extensive details in the main text.

      (4) The spatial mapping tests were conducted on various tissues, including the mouse cortex, human PDAC, and intestinal villus. However, the original anatomical regions are not displayed, making it difficult to directly compare them with the predicted mapping results. Providing ground truth distributions for each tested tissue would enhance clarity and facilitate interpretation. For instance, in Figure 2a and Supplementary Figures 1 and 2, only the predicted mapping results are shown without the corresponding original spatial distribution of regions in the mouse cortex. Additionally, in Figure 3c, four anatomical regions are displayed, but it is unclear whether the figure represents the original spatial regions or those predicted by glmSMA. The authors are encouraged to clarify this by incorporating ground truth distributions for each tissue.

      Thank you for the comment. To improve visualization, we have included anatomical structures alongside the mapping results in the revised version, wherever such structures are available (e.g., mouse brain cortex, human PDAC sample, etc.). Major cell type assignments for the PDAC samples, along with anatomical structures, are shown in Supplementary Figure 9. Most of these cell types were correctly mapped to their corresponding anatomical regions.

      (5) The cell assignment results from the mouse hippocampus (Supplementary Figure 6) lack a corresponding ground truth distribution for comparison. DG and CA cells were evaluated solely based on the gene expression of specific marker genes. Additional analyses are needed to further validate the robustness of glmSMA's mapping performance on Slide-seq data from the mouse hippocampus.

      Thank you for the comment. The ground truth for DG and CA cells was not available. To better evaluate the model's performance, we computed the KL divergence between the original and predicted cell type distributions, following the same approach used for the 10x Visium dataset. We identified a higher-quality dataset for the mouse hippocampus and used it to evaluate our algorithm. Additionally, we employed KL divergence as an alternative strategy to validate and benchmark our results (Fig. 5e, f, g). Most CA cells, including CA1, CA2, and CA3 principal cells, were correctly assigned back to the CA region. Dentate principal cells were accurately mapped to the DG region (Fig. 5e, f).

      (6) The tested spatial datasets primarily consist of highly structured tissues with well-defined anatomical regions, such as the brain and intestinal villus. In other tissues, such as liver, anatomical regions are not distinctly separated. Further evaluation of such tissues would help determine the method's broader applicability.

      Thank you for the insightful comment. We agree that many spatial datasets used in our study are from tissues with well-defined anatomical regions. To address the applicability of glmSMA in tissues without clearly separated anatomical structures, we applied glmSMA to the Drosophila embryo, which represents a tissue with relatively continuous spatial patterns and lacks well-demarcated anatomical boundaries compared to organs like the brain or intestinal villus.

      Despite this less structured spatial organization, glmSMA demonstrated robust performance in the fly embryo, accurately mapping cells to their correct spatial spots based on gene expression profiles. This result indicates that glmSMA is not strictly limited to highly structured tissues and can generalize to tissues with more continuous or gradient-like spatial architectures. These results suggest that glmSMA has broader applicability beyond highly compartmentalized tissues.

      Lein, E. et al. (2007) ‘Genome-wide atlas of gene expression in the adult mouse brain’, Nature, 445, pp. 168–176. doi:10.1038/nature05453.

      Reviewer #3 (Public review):

      The authors aim to develop glmSMA, a network-regularized linear model that accurately infers spatial gene expression patterns by integrating single-cell RNA sequencing data with spatial transcriptomics reference atlases. Their goal is to reconstruct the spatial organization of individual cells within tissues, overcoming the limitations of existing methods that either lack spatial resolution or sensitivity.

      Strengths:

      (1) Comprehensive Benchmarking:

      Compared against CellTrek and novoSpaRc, glmSMA consistently achieved lower Kullback-Leibler divergence (KL divergence) scores, indicating better cell assignment accuracy.

      Outperformed CellTrek in mouse cortex mapping (90% accuracy vs. CellTrek's 60%) and provided more spatially coherent distributions.

      (2) Experimental Validation with Multiple Real-World Datasets:

      The study used multiple biological systems (mouse brain, Drosophila embryo, human PDAC, intestinal villus) to demonstrate generalizability.

      Validation through correlation analyses, Pearson's coefficient, and KL divergence support the accuracy of glmSMA's predictions.

      We thank reviewer #3 for their positive feedback and thoughtful recommendations.

      Weaknesses:

      (1) The accuracy of glmSMA depends on the selection of marker genes, which might be limited by current FISH-based reference atlases.

      We agree that the accuracy of glmSMA is influenced by the selection of marker genes, and that current FISH-based reference atlases may offer a limited gene set. To address this, we incorporate multiple feature selection strategies, including highly variable genes and spatially informative genes (e.g., via Moran’s I), to optimize performance within the available gene space. As more comprehensive reference atlases become available, we expect the model’s accuracy to improve further.

      (2) glmSMA operates under the assumption that cells with similar gene expression profiles are likely to be physically close to each other in space, which may not be true in various heterogeneous environments.

      Thank you for raising this important point. We agree that glmSMA operates under the assumption that cells with similar gene expression profiles tend to be spatially proximal, and this assumption may not strictly hold in highly heterogeneous tissues where spatial organization is less coupled to transcriptional similarity.

      To address this concern, we specifically tested glmSMA on human PDAC samples, which represent moderately heterogeneous environments characterized by complex tumor microenvironments, including a mixture of ductal cells, cancer cells, stromal cells, and other components. Despite this heterogeneity, glmSMA successfully mapped major cell types to their expected anatomical regions, demonstrating that the method is robust even in the presence of substantial cellular diversity and spatial complexity.

      This result suggests that while glmSMA relies on the assumption of spatial-transcriptomic correlation, the method can tolerate a reasonable degree of spatial heterogeneity without a significant loss of performance. Nevertheless, we acknowledge that in extremely disorganized or highly mixed tissues where transcriptional similarity is decoupled from spatial proximity, the performance may be affected.

    1. eLife Assessment

      This study provides a fundamental advancement in our understanding of trabecular meshwork cell diversity and its role in eye pressure regulation and glaucoma using multimodal single-cell analysis, spatial validation, and functional testing that go beyond the current state-of-the-art. The study demonstrates that mitochondrial dysfunction, specifically in one of three distinct cell subtypes (TM3), contributes to elevated IOP in a genetic mouse model of glaucoma carrying a mutation in the transcription factor Lmx1b. While the identification of TM3 cells as metabolically specialized is compelling, there is somewhat limited evidence linking mitochondrial dysfunction to the Lmx1b mutation in TM3 cells.

    2. Reviewer #1 (Public review):

      Summary:

      This study provides a comprehensive single-cell and multiomic characterization of trabecular meshwork (TM) cells in the mouse eye, a structure critical to intraocular pressure (IOP) regulation and glaucoma pathogenesis. Using scRNA-seq, snATAC-seq, immunofluorescence, and in situ hybridization, the authors identify three transcriptionally and spatially distinct TM cell subtypes. The study further demonstrates that mitochondrial dysfunction specifically in one subtype (TM3) contributes to elevated IOP in a genetic mouse model of glaucoma carrying a mutation in the transcription factor Lmx1b. Importantly, treatment with nicotinamide (vitamin B3), known to support mitochondrial health, prevents IOP elevation in this model. The authors also link their findings to human datasets, suggesting the existence of analogous TM3-like cells with potential relevance to human glaucoma.

      Strengths:

      The study is methodologically rigorous, integrating single-cell transcriptomic and chromatin accessibility profiling with spatial validation and in vivo functional testing. The identification of TM subtypes is consistent across mouse strains and institutions, providing robust evidence of conserved TM cell heterogeneity. The use of a glaucoma model to show subtype-specific vulnerability-combined with a therapeutic intervention-gives the study strong mechanistic and translational significance. The inclusion of chromatin accessibility data adds further depth by implicating active transcription factors such as LMX1B, a gene known to be associated with glaucoma risk. The integration with human single-cell datasets enhances the potential relevance of the findings to human disease.

      Weaknesses:

      Although the LMX1B transcription factor is implicated as a key regulator in TM3 cells, its role in directly controlling mitochondrial gene expression is not fully explored. Additional analysis of motif accessibility or binding enrichment near relevant target genes could substantiate this mechanistic link. The therapeutic effect of vitamin B3 is clearly demonstrated phenotypically, but the underlying cellular and molecular mechanisms remain somewhat underdeveloped-for instance, changes in mitochondrial function, oxidative stress markers, or NAD+ levels are not directly measured. While the human relevance of TM3 cells is suggested through marker overlap, more quantitative approaches, such as cell identity mapping or gene signature scoring in human datasets, would strengthen the translational connection.

      Overall, this is a compelling and carefully executed study that offers significant advances in our understanding of TM cell biology and its role in glaucoma. The integration of multimodal data, disease modeling, and therapeutic testing represents a valuable contribution to the field. With additional mechanistic depth, the study has the potential to become a foundational resource for future research into IOP regulation and glaucoma treatment.

    3. Reviewer #3 (Public review):

      Summary:

      In this study, the authors perform multimodal single-cell transcriptomic and epigenomic profiling of 9,394 mouse TM cells, identifying three transcriptionally distinct TM subtypes with validated molecular signatures. TM1 cells are enriched for extracellular matrix genes, TM2 for secreted ligands supporting Schlemm's canal, and TM3 for contractile and mitochondrial/metabolic functions. The transcription factor LMX1B, previously linked to glaucoma, shows the highest expression in TM3 cells and appears to regulate mitochondrial pathways. In Lmx1bV265D mutant mice, TM3 cells exhibit transcriptional signs of mitochondrial dysfunction associated with elevated IOP. Notably, vitamin B3 treatment significantly mitigates IOP elevation, suggesting a potential therapeutic avenue.

      This is an excellent and collaborative study involving investigators from two institutions, offering the most detailed single-cell transcriptomic and epigenetic profiling of the mouse limbal tissues, including both TM and Schlemm's canal (SC), from wild-type and Lmx1bV265D mutant mice. The study defines three TM subtypes and characterizes their distinct molecular signatures, associated pathways, and transcriptional regulators. The authors also compare their dataset with previously published murine and human studies, including those by Van Zyl et al., providing valuable cross-species insights.

      Strengths:

      (1) Comprehensive dataset with high single-cell resolution

      (2) Use of multiple bioinformatic and cross-comparative approaches

      (3) Integration of 3D imaging of TM and SC for anatomical context

      (4) Convincing identification and validation of three TM subtypes using molecular markers.

      Weaknesses:

      (1) Insufficient evidence linking mitochondrial dysfunction to TM3 cells in Lmx1bV265D mice: While the identification of TM3 cells as metabolically specialized and Lmx1b-enriched is compelling, the proposed link between Lmx1b mutation and mitochondrial dysfunction remains underdeveloped. It is unclear whether mitochondrial defects are a primary consequence of Lmx1b-mediated transcriptional dysregulation or a secondary response to elevated IOP. Although the authors have responded to this, the manuscript is not sufficiently altered to address these points. I would like to suggest that the authors tone down the mitochondrial connection with Lmx1b in the title and abstract, clearly discuss that these events are associated, and note that future work is needed to dissect the role of mitochondria in this pathway.

      Furthermore, the protective effects of nicotinamide (NAM) are interpreted as evidence of mitochondrial involvement, but no direct mitochondrial measurements (e.g., immunostaining, electron microscopy, OCR assays) are provided. It is essential to validate mitochondrial dysfunction in TM3 cells using in vivo functional assays to support the central conclusion of the paper. Without this, the claim that mitochondrial dysfunction drives IOP elevation in Lmx1bV265D mice remains speculative. Alternatively, authors should consider revising their claims that mitochondrial dysfunction in these mice is a central driver of TM dysfunction.

      (2) Mechanism of NAM-mediated protection is unclear: The manuscript states that NAM treatment prevents IOP elevation in Lmx1bV265D mice via metabolic support, yet no data are shown to confirm that NAM specifically rescues mitochondrial function. Do NAM-treated TM3 cells show improved mitochondrial integrity? Are reactive oxygen species (ROS) reduced? Does NAM also protect RGCs from glaucomatous damage? Addressing these points would clarify whether the therapeutic effects of NAM are indeed mitochondrial.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This study provides a comprehensive single-cell and multiomic characterization of trabecular meshwork (TM) cells in the mouse eye, a structure critical to intraocular pressure (IOP) regulation and glaucoma pathogenesis. Using scRNA-seq, snATAC-seq, immunofluorescence, and in situ hybridization, the authors identify three transcriptionally and spatially distinct TM cell subtypes. The study further demonstrates that mitochondrial dysfunction, specifically in one subtype (TM3), contributes to elevated IOP in a genetic mouse model of glaucoma carrying a mutation in the transcription factor Lmx1b. Importantly, treatment with nicotinamide (vitamin B3), known to support mitochondrial health, prevents IOP elevation in this model. The authors also link their findings to human datasets, suggesting the existence of analogous TM3-like cells with potential relevance to human glaucoma.

      Strengths:

      The study is methodologically rigorous, integrating single-cell transcriptomic and chromatin accessibility profiling with spatial validation and in vivo functional testing. The identification of TM subtypes is consistent across mouse strains and institutions, providing robust evidence of conserved TM cell heterogeneity. The use of a glaucoma model to show subtype-specific vulnerability, combined with a therapeutic intervention, gives the study strong mechanistic and translational significance. The inclusion of chromatin accessibility data adds further depth by implicating active transcription factors such as LMX1B, a gene known to be associated with glaucoma risk. The integration with human single-cell datasets enhances the potential relevance of the findings to human disease.

      We thank the reviewers for their thorough reading of our manuscript and helpful comments.

      Weaknesses:

      (1) Although the LMX1B transcription factor is implicated as a key regulator in TM3 cells, its role in directly controlling mitochondrial gene expression is not fully explored. Additional analysis of motif accessibility or binding enrichment near relevant target genes could substantiate this mechanistic link. 

      We show that the Lmx1b mutation induces mitochondrial dysfunction with mitochondrial gene expression changes, but we agree with the referee that we do not show direct regulation of mitochondrial genes by LMX1B. Emerging data suggest that LMX1B regulates the expression of mitochondrial genes in other cell types [1, 2], making the direct link reasonable. Future work that is beyond the scope of the current paper will focus on sequencing cells at earlier timepoints to help distinguish gene expression changes associated with the V265D mutation from those secondary to ongoing disease and elevated IOP. Additional studies, including ATAC-seq at more ages, ChIP-seq and/or Cut and Run/Tag (in TM cells), will be necessary to directly investigate LMX1B target genes.

      As we studied adult mice, mitochondrial gene expression changes could be secondary to other disease-induced stresses. Because we did not intend to say we have shown a direct link, we have now added a sentence to the discussion to ensure clarity.

      Lines 932-934: “Although our studies show a clear effect of the Lmx1b mutation on mitochondria, future studies are needed to determine if LMX1B directly modulates mitochondrial genes in V265D mutant TM cells”

      (2) The therapeutic effect of vitamin B3 is clearly demonstrated phenotypically, but the underlying cellular and molecular mechanisms remain somewhat underdeveloped - for instance, changes in mitochondrial function, oxidative stress markers, or NAD+ levels are not directly measured. 

      We agree that further experiments towards a fuller mechanistic understanding of vitamin B3’s therapeutic effects are needed. Such experiments are planned but are beyond the scope of this paper, which is already very large (7 Figures and 16 Supplemental Figures).

      (3) While the human relevance of TM3 cells is suggested through marker overlap, more quantitative approaches, such as cell identity mapping or gene signature scoring in human datasets, would strengthen the translational connection.

      We appreciate the reviewer’s suggestion and agree that additional quantitative analyses will further strengthen the translational relevance of TM3 cells. It is not yet clear if humans have a direct TM3 counterpart or if TM cell roles are compartmentalized differently between human cell types. We are currently limited in our ability to perform these comparative analyses. Specifically, we were unable to obtain permission to use the underlying dataset from Patel et al., and our access to the Van Zyl et al. dataset was through the Single Cell Portal, which does not support more complex analyses (e.g., cell identity mapping or gene signature scoring). Differences between the human studies themselves also affect these comparisons. Future work aimed at resolving differences and standardizing human TM cell annotations, as well as cross-species comparisons, is needed (working groups exist, and this ongoing effort supports 3 human TM cell subtypes, as also reported by Van Zyl). This is beyond what we are currently able to do for this paper. We present a comprehensive assessment using readily available published resources.
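
      As a rough illustration of what the gene signature scoring mentioned above involves, the sketch below scores one cell as the mean expression of a marker set minus the mean expression of all measured genes. This is a simplified stand-in under stated assumptions, not the method of any specific package or of this study; the function name, gene names, and expression values are all hypothetical.

```python
from statistics import mean


def signature_score(cell_expression: dict[str, float], signature: list[str]) -> float:
    """Mean expression of signature genes minus the mean of all measured genes.

    Hypothetical helper for illustration only. Signature genes that were not
    measured in the cell are ignored.
    """
    present = [gene for gene in signature if gene in cell_expression]
    if not present:
        raise ValueError("none of the signature genes were measured in this cell")
    signature_mean = mean(cell_expression[gene] for gene in present)
    background_mean = mean(cell_expression.values())
    return signature_mean - background_mean


# Hypothetical normalized expression values for a single cell.
example_cell = {"GeneA": 4.0, "GeneB": 2.0, "GeneC": 0.0, "GeneD": 0.0}
example_score = signature_score(example_cell, ["GeneA", "GeneB"])
```

      A positive score indicates that the marker set is expressed above the cell's overall average; published tools typically refine this idea with expression-matched control gene sets.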

      Reviewer #2 (Public review):

      Summary:

      This elegant study by Tolman and colleagues provides fundamental findings that substantially advance our knowledge of the major cell types within the limbus of the mouse eye, focusing on the aqueous humor outflow pathway. The authors used single-cell and single-nuclei RNAseq to very clearly identify 3 subtypes of the trabecular meshwork (TM) cells in the mouse eye, with each subtype having unique markers and proposed functions. The U. Columbia results are strengthened by an independent replication in a different mouse strain at a separate laboratory (Duke). Bioinformatics analyses of these expression data were used to identify cellular compartments, molecular functions, and biological processes. Although there were some common pathways among the 3 subtypes of TM cells (e.g., ECM metabolism), there also were distinct functions. For example:

      TM1 cell expression supports heavy engagement in ECM metabolism and structure, as well as TGFb2 signaling.

      TM2 cells were enriched in laminin and pathways involved in phagocytosis, lysosomal function, and antigen expression, as well as End3/VEGF/angiopoietin signaling.

      TM3 cells were enriched in actin binding and mitochondrial metabolism.

      They used high-resolution immunostaining and in situ hybridization to show that these 3 TM subtypes express distinct markers and occupy distinct locations within the TM tissue. The authors compared their expression data with other published scRNAseq studies of the mouse as well as the human aqueous outflow pathway. They used ATAC-seq to map open chromatin regions in order to predict transcription factor binding sites. Their results were also evaluated in the context of human IOP and glaucoma risk alleles from published GWAS data, with interesting and meaningful correlations. Although not discussed in their manuscript, their expression data support other signaling pathways/ proteins/ genes that have been implicated in glaucoma, including: TGFb2, BMP signaling (including involvement of ID proteins), MYOC, actin cytoskeleton (CLANs), WNT signaling, etc.

      In addition to these very impressive data, the authors used scRNAseq to examine changes in TM cell gene expression in the mouse glaucoma model of mutant Lmx1b-induced ocular hypertension. In man, LMX1B is associated with Nail-Patella syndrome, which can include the development of glaucoma, demonstrating the clinical relevance of this mouse model. Among the gene expression changes detected, TM3 cells had altered expression of genes associated with mitochondrial metabolism. The authors used their previous experience using nicotinamide to metabolically protect DBA/2J mice from glaucomatous damage, and they hypothesized that nicotinamide supplementation of mutant Lmx1b mice would help restore normal mitochondrial metabolism in the TM and prevent Lmx1b-mediated ocular hypertension. Adding nicotinamide to the drinking water significantly prevented Lmx1b mutant mice from developing high intraocular pressure. This is a laudable example of dissecting the molecular pathogenic mechanisms responsible for a disease (glaucoma) and then discovering and testing a potential therapy that directly intervenes in the disease process and thereby protects from the disease.

      Strengths:

      There are numerous strengths in this comprehensive study including:

      Deep scRNA sequencing that was confirmed by an independent dataset in another mouse strain at another university.

      Identification and validation of molecular markers for each mouse TM cell subset along with localization of these subsets within the mouse aqueous outflow pathway.

      Rigorous bioinformatics analysis of these data as well as comparison of the current data with previously published mouse and human scRNAseq data.

      Correlating their current data with GWAS glaucoma and IOP "hits".

      Discovering gene expression changes in the 3 TM subgroups in the mouse mutant Lmx1b model of glaucoma.

      Further pursuing the indication of dysfunctional mitochondrial metabolism in TM3 cells from Lmx1b mutant mice to test the efficacy of dietary supplementation with nicotinamide. The authors nicely demonstrate the disease modifying efficacy of nicotinamide in preventing IOP elevation in these Lmx1b mutant mice, preventing the development of glaucoma. These results have clinical implications for new glaucoma therapies.

      We thank the reviewer for these generous and thoughtful comments on the strengths of this study.

      Weaknesses:

      (1) Occasional over-interpretation of data. The authors have used changes in gene expression (RNAseq) to implicate functions and signaling pathways. For example: they have not directly measured "changes in metabolism", "mitochondrial dysfunction" or "activity of Lmx1b".

      We thank the reviewer for this feedback. We did not intend to overstate and agree. Our gene expression changes support, but do not by themselves prove, metabolic disturbances. We had felt that this was obvious and did not want to clutter the text. We have revised the manuscript to clarify that our conclusions about metabolic changes and LMX1B activity are based on gene expression patterns rather than direct functional assays and have added EM data (see below under “Recommendations for the authors”).

      We have also added the following to the results:

      Lines 715-721: “Although the documented gene expression changes strongly suggest metabolic and mitochondrial dysfunction, they do not directly prove it. Using electron microscopy to directly evaluate mitochondria in the TM, we found a reduction in total mitochondria number per cell in mutants (P = 0.015, Figure 6G). In addition, mitochondria in mutants had increased area and reduced cristae (inner membrane folds) in mutants consistent with mitochondrial swelling and metabolic dysfunction (all P < 0.001 compared to WT, Figure 6G-H).”

      More detailed EM and metabolic studies are underway but are beyond the scope of this paper.

      (2) In their very thorough data set, there is enrichment of or changes in gene expression that support other pathways that have been previously reported to be associated with glaucoma (such as TGFb2, BMP signaling, actin cytoskeletal organization (CLANs), WNT signaling, ossification, etc.). Not discussing these appears to be a lost opportunity to further enhance the significance of this work.

      We appreciate the reviewer’s suggestions for enhancing the relevance of our work. We had not initially discussed this due to length concerns. We have now incorporated some of this information into the manuscript (see below under “Recommendations for the authors”).

      Reviewer #3 (Public review):

      Summary: In this study, the authors perform multimodal single-cell transcriptomic and epigenomic profiling of 9,394 mouse TM cells, identifying three transcriptionally distinct TM subtypes with validated molecular signatures. TM1 cells are enriched for extracellular matrix genes, TM2 for secreted ligands supporting Schlemm's canal, and TM3 for contractile and mitochondrial/metabolic functions. The transcription factor LMX1B, previously linked to glaucoma, shows the highest expression in TM3 cells and appears to regulate mitochondrial pathways. In Lmx1bV265D mutant mice, TM3 cells exhibit transcriptional signs of mitochondrial dysfunction associated with elevated IOP. Notably, vitamin B3 treatment significantly mitigates IOP elevation, suggesting a potential therapeutic avenue.

      This is an excellent and collaborative study involving investigators from two institutions, offering the most detailed single-cell transcriptomic and epigenetic profiling of the mouse limbal tissues, including both TM and Schlemm's canal (SC), from wild-type and Lmx1bV265D mutant mice. The study defines three TM subtypes and characterizes their distinct molecular signatures, associated pathways, and transcriptional regulators. The authors also compare their dataset with previously published murine and human studies, including those by Van Zyl et al., providing valuable cross-species insights.

      Strengths: 

      (1) Comprehensive dataset with high single-cell resolution

      (2) Use of multiple bioinformatic and cross-comparative approaches

      (3) Integration of 3D imaging of TM and SC for anatomical context

      (4) Convincing identification and validation of three TM subtypes using molecular markers.

      We thank the reviewer for their comments on the strengths of this study.

      Weaknesses:

      (1) Insufficient evidence linking mitochondrial dysfunction to TM3 cells in Lmx1bV265D mice: While the identification of TM3 cells as metabolically specialized and Lmx1b-enriched is compelling, the proposed link between Lmx1b mutation and mitochondrial dysfunction remains underdeveloped. It is unclear whether mitochondrial defects are a primary consequence of Lmx1b-mediated transcriptional dysregulation or a secondary response to elevated IOP. Additional evidence is needed to clarify whether Lmx1b directly regulates mitochondrial genes (e.g., via ChIP-seq, motif analysis, or ATAC-seq), or whether mitochondrial changes are downstream effects.

      We agree and refer the reviewer to our responses to the other referees including Reviewer 1, Comment 1 and Reviewer 2 comments 1 and 17. As noted there, these mechanistic questions are the focus of ongoing and future studies. We have revised the text where appropriate to ensure it accurately reflects the scope of our current data.

      (2) Furthermore, the protective effects of nicotinamide (NAM) are interpreted as evidence of mitochondrial involvement, but no direct mitochondrial measurements (e.g., immunostaining, electron microscopy, OCR assays) are provided. It is essential to validate mitochondrial dysfunction in TM3 cells using in vivo functional assays to support the central conclusion of the paper. Without this, the claim that mitochondrial dysfunction drives IOP elevation in Lmx1bV265D mice remains speculative. Alternatively, authors should consider revising their claims that mitochondrial dysfunction in these mice is a central driver of TM dysfunction.

      We again refer the reviewer to our other responses, including Reviewer 1, Comment 1 and Reviewer 2 comments 1 and 17.

      (3) Mechanism of NAM-mediated protection is unclear: The manuscript states that NAM treatment prevents IOP elevation in Lmx1bV265D mice via metabolic support, yet no data are shown to confirm that NAM specifically rescues mitochondrial function. Do NAM-treated TM3 cells show improved mitochondrial integrity? Are reactive oxygen species (ROS) reduced? Does NAM also protect RGCs from glaucomatous damage? Addressing these points would clarify whether the therapeutic effects of NAM are indeed mitochondrial.

      We refer the reviewer to our response to Reviewer 1, Comment 2.

      (4) Lack of direct evidence that LMX1B regulates mitochondrial genes: While transcriptomic and motif accessibility analyses suggest that LMX1B is enriched in TM3 cells and may influence mitochondrial function, no mechanistic data are provided to demonstrate direct regulation of mitochondrial genes. Including ChIP-seq data, motif enrichment at mitochondrial gene loci, or perturbation studies (e.g., Lmx1b knockout or overexpression in TM3 cells) would greatly strengthen this central claim.

      We refer the reviewer to our response to Reviewer 1, Comment 1.

      (5) Focus on LMX1B in Fig. 5F lacks broader context: Figure 5F shows that several transcription factors (TFs), including Tcf21, Foxs1, Arid3b, Myc, Gli2, Patz1, Plag1, Npas2, Nr1h4, and Nfatc2, exhibit stronger positive correlations or motif accessibility changes than LMX1B. Yet the manuscript focuses almost exclusively on LMX1B. The rationale for this focus should be clarified, especially given LMX1B's relatively lower ranking in the correlation analysis. Were the functions of these other highly ranked TFs examined or considered in the context of TM biology or glaucoma? Discussing their potential roles would enhance the interpretation of the transcriptional regulatory landscape and demonstrate the broader relevance of the findings.

      Our analysis (Figure 5F) indicates that Lmx1b is the transcription factor most strongly associated with its predicted target gene expression across all TM cells, as reflected by its highest value along the X-axis. While other transcription factors exhibit greater motif accessibility (Y-axis), this likely reflects their broader expression across TM subtypes. In contrast, Lmx1b is minimally expressed in TM1 and TM2 cells, which may account for its lower motif accessibility overall (motifs not accessible in cells where Lmx1b is not / minimally expressed).

      Our emphasis on LMX1B is further supported by its direct genetic association with glaucoma. In contrast, the other transcription factors lack clear links to glaucoma and are supported primarily by indirect evidence. Nonetheless, we agree that the transcription factors highlighted in our analysis are promising candidates for future investigation. However, to maintain focus on the central narrative of this study, we have chosen not to include an extended discussion of these additional genes.

      (6) In the abstract, the authors report 9,394 wild-type TM cell transcriptomes, but the number of Lmx1bV265D/+ TM cell transcriptomes analyzed is not provided. This information is essential for evaluating the comparative analysis and should be clearly stated in the Abstract and again in the main text (e.g., lines 121-123). Including both wild-type and mutant cell counts will help readers assess the balance and robustness of the dataset.

      We thank the reviewer for noticing this oversight and have added this value to the abstract and results section. 

      Lines 41 and 696: 2,491 mutant TM cells.  

      (7) Did the authors monitor mouse weight or other health parameters to assess potential systemic effects of treatment? It is known that the taste of compounds in drinking water can alter fluid or food intake, which may influence general health. Also, do Lmx1bV265D/+ mice exhibit non-ocular phenotypes, and if so, does nicotinamide confer protection in those tissues as well? Additionally, given that nicotinamide dosing started at postnatal day 2, please state how long the mice were treated with water containing nicotinamide, after how many days or weeks IOP was reduced, and how long the decrease in IOP was sustained.

      Water intake was monitored in both treatment groups, and dosing was based on the average volume consumed by adult mice (lines 1017–1018). Young pups do not drink water, so the drug is largely delivered through their mothers’ milk until weaning, and we therefore do not know an accurate dose for young pups. Mouse health was assessed throughout the experiment through regular monitoring of body weight and general condition.

      Depending on genetic context, Lmx1b mutations can cause kidney disease and impact other systems. Non-ocular phenotypes were not the focus of this study and were not characterized.

      We added a comment to the method to clarify the NAM treatment timeline. NAM was administered continuously in the drinking water starting at P2 and maintained throughout the experiment. IOP was measured beginning at 2 months and then at monthly time points. NAM lessened IOP at 2 and 3 months. We terminated IOP assessment at 3 months.

      Lines 1028-1029: “Treatment was started at postnatal day 2 and continued throughout the experiment.”

      (8) While the IOP reduction observed in NAM-treated Lmx1bV265D/+ mice appears statistically significant, it is unclear whether this reflects meaningful biological protection. Several untreated mice exhibit very high IOP values, which may skew the analysis. The authors should report the mean values for IOP in both untreated and NAM-treated groups to clarify the magnitude and variability of the response.

      We have added supplemental table 7 with the statistical information. Regarding the high IOP values observed in a subset of untreated V265D mutant mice, we consistently detect individual mutant eyes with IOPs exceeding 30 mmHg across independent cohorts and time points [3-5]. It is important to note that IOP is subject to fluctuation and in disease states such as glaucoma, circadian rhythms can be disrupted with stochastic and episodic IOP spikes throughout the day. This may be occurring in those untreated mice. This is also why we strive to use sample sizes of 40 or more. Additionally, we observe that some mutant eyes with IOPs measured within the normal range have anterior chamber deepening (ACD) - a persistent anatomical change associated with sustained or recurrent high IOP that stretches the cornea and may posteriorly displace the lens. This suggests mutant mice experience transient IOP elevations that are not always captured at a single time point due to the stochastic nature of these fluctuations. To account for this, we include ACD as an additional readout alongside IOP measurements. The reduction in ACD observed in NAM-treated mice provides independent evidence supporting the biological relevance of NAM-mediated IOP reduction.   

      (9) Additionally, since NAM has been shown to protect RGCs in other glaucoma models directly, the authors should assess whether RGCs are preserved in NAM-treated Lmx1b V265D/+ mice. Demonstrating RGC protection would support a synergistic effect of NAM through both IOP reduction and direct neuroprotection, strengthening the translational relevance of the treatment.

      We again thank the referee. We note the possibility of dual IOP protection and neuroprotection in the manuscript (lines 961–963). The goal of the present study, however, was to determine mechanisms underlying IOP elevation in patients with LMX1B variants. Therefore, we limited our focus to IOP elevation (LMX1B is expressed in the TM but not RGCs). Studies of the RGCs and optic nerve in V265D mutant mice treated with NAM take considerable effort but are underway. They will be reported in a subsequent manuscript. Initial data support protection, but that is a work in progress.  

      Additionally, we recently reported a pattern of IOP protection with pyruvate similar to that reported here [4]. In those experiments we analyzed the optic nerve, because the focus of that study was the assessment of pyruvate as a resilience factor against high genetic risk of glaucoma. In that case, there was statistically significant protection from glaucomatous optic nerve damage, again arguing for translational relevance with a possible synergistic effect through both IOP reduction and direct neuroprotection.

      (10) Can the authors add any other functional validation studies to explore the pathways enriched in each of the TM1, TM2, and TM3 subtypes, in addition to the IHC/IF/RNAscope validation?

      We agree with the reviewer on the importance of further functional validation of pathways active in TM cell subtypes that influence IOP. However, comprehensive investigation of the pathways active in these subtypes needs to be done in future studies. It is beyond the scope of this already large paper.

      (11) The authors should include a representative image of the limbal dissection. While Figure S1 provides a schematic, mouse eyes are very small, and dissecting unfixed limbal tissue is technically challenging. It is also difficult to reconcile the claim that the majority of cells in the limbal region are TM and endothelium. As shown in Figure S6, DAPI staining suggests a much higher abundance of scleral cells compared to TM cells within the limbal strip. Additional clarification or visual evidence would help validate the dissection strategy and cellular composition of the captured region.

      We appreciate the reviewer’s suggestion and have added additional images to Figure S1 to show our limbal strip dissection. However, we clarify that we do not intend to suggest that TM and endothelial cells are the most abundant populations in these dissected strips.  When we say “are enriched for drainage tissues” we mean in comparison to dissecting the anterior segment as a whole. We have clarified this in the text. In fact, epithelial cells (primarily from the cornea) constituted the largest cluster in our dataset (Figure 1A). Additionally, to avoid misinterpretation, we generally refrain from drawing conclusions about the relative abundance of cell types based on sequencing data. Single-cell and single nucleus RNA sequencing results are sensitive to technical factors that alter cell proportions depending on exact methodological details. In our study, TM cells comprised 24.4% of the single-cell dataset and 11.8% of the single-nucleus dataset, illustrating the impact of methodological variability. 

      Lines 163-164: “Individual eyes were dissected to isolate a strip of limbal tissue, which is enriched for TM cells in comparison to dissecting the anterior segment as a whole.”

      Reviewer #1 (Recommendations for the authors):

      To enhance the reproducibility and transparency of the findings presented in this study, we strongly recommend that the authors make all analysis scripts and computational tools publicly available.

      We agree with the reviewer’s emphasis on transparency and are currently building a GitHub page to share our scripts. However, we did not develop any new tools for this study. All tools that we used are publicly available and provided in our methods section. All data will be available as raw data and through the Broad Institute’s Single Cell Portal.

      Reviewer #2 (Recommendations for the authors):

      The authors are to be commended for a well-written presentation of high-quality data, their comparisons of datasets (other mouse and human scRNAseq data), correlation with clinical glaucoma risk alleles, and curative therapy for the mouse model of Lmx1b glaucoma. There are several minor suggestions that the authors might consider to further improve their manuscript:

      (1) Lines 42-43: Although their data strongly support the role of mitochondrial dysfunction in Lmx1b glaucoma, they might want to soften their conclusion "supports a primary role of mitochondrial dysfunction within TM3 cells initiating the IOP elevation that causes glaucoma".

      With the inclusion of EM data supporting mitochondrial dysfunction in Lmx1b mutant TM cells, we have revised this sentence to more accurately reflect our findings.

      Lines 42-44 (previously lines 42-43): “Mitochondria in TM cells of V265D/+ mice are swollen with a reduced cristae area, further supporting a role for mitochondrial dysfunction in the initiation of IOP elevation in these mice.”

      (2) Figure 1: Why is the shape of the "TM containing" cluster in 1A so different than the cluster shown in 1B?

      We isolated cells from the 'TM-containing' cluster and performed unbiased reclustering, which alters their positioning in UMAP space. The figure legend has been updated to clarify this point.

      Lines 143-144: “A separate UMAP representation of the trabecular meshwork (TM) containing cluster following subclustering.”

      (3) Line 160: change "data was" to "data were"

      Corrected

      (4) S4 Fig C: Please comment on why the Columbia and Duke heatmaps for TM3 are not as congruent as the heatmaps for TM1 and TM2.

      We cannot definitively determine the reason for this. However, differences in tissue processing techniques between the Columbia and Duke preparations may contribute. Such variations have been shown to affect cellular transcriptomes in certain contexts. It is possible that TM3 cells are more susceptible to these effects than others. We have added a statement addressing this point to the figure legend.

      Lines 238-240: “Because tissue processing techniques can alter gene expression [52], the heatmap variation between institutes likely reflects differences in processing techniques (Methods) and suggests that TM3 cells are more susceptible to these effects than other cell types.”

      (5) S9 Fig: It is very difficult to see any staining for TM1 CHIL1 (2nd panel), TM2 End3 (2nd panel), and TM3 Lypd1 (both panels)

      We apologize for the difficulty in visualizing these panels. To improve clarity, we have increased the brightness of all relevant marker signals, within standard bounds, to facilitate easier interpretation.

      (6) Line 380: "are significantly higher"; since statistical analysis was not reported, please do not use "significantly"

      Done

      (7) The authors should consider discussing several of their findings that agree with published literature. For example:

      Figure 3B: "Wnt protein binding" (PMID: 18274669), "TGFb binding" (numerous references), "integrin binding" (work of Donna Peters), "actin binding"/"actin filament binding"/"actin filament bundle" (CLANs references)

      S10 Fig c: "ossification" (work of Torretta Borres)

      S11 Fig A: ID2/ID3 (PMID: 33938911); (B) BMP4 (PMID: 17325163)

      S12 Fig A: MYOC in TM1 cells (numerous references)

      We appreciate the reviewer’s diligent review and comments regarding these pathways. We have added a comment to the discussion regarding the agreement of these pathways.

      Lines 855-858: In addition, the expression of genes that we document generally agrees with the literature. For example, the following genes and signaling molecules have been reported in TM cells, WNT signaling [78], TGF-β signaling [79-85], integrin binding [86-88], actin cytoskeletal networks [89], calcification genes [90, 91], and Myocilin [91-94].

      (8) Line 541: was confocal microscopy used to measure the "3D shapes" of nuclei or was this done with a single image to determine sphericity?

      This analysis was performed using confocal microscopy and 3D reconstructed models of the TM nuclei. We have added text to clarify this in the figure legend.

      Lines 553-556: “To rigorously assess whether TM1 nuclei are more spherical, we analyzed their reconstructed 3D shapes from whole mounts images by confocal microscopy, comparing them to TM3 nuclei using the ‘Sphericity’ tool in Imaris.”
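
      As a rough illustration of what a sphericity score measures, the sketch below computes the standard Wadell sphericity from a reconstructed object's volume and surface area. It assumes the Imaris tool follows this conventional definition; the function name and the example values are hypothetical and are not taken from the study.

```python
import math


def sphericity(volume, surface_area):
    """Wadell sphericity: surface area of a sphere with the same volume,
    divided by the object's actual surface area. 1.0 means a perfect sphere."""
    if volume <= 0 or surface_area <= 0:
        raise ValueError("volume and surface_area must be positive")
    return math.pi ** (1 / 3) * (6 * volume) ** (2 / 3) / surface_area


# Hypothetical shapes (not measurements from the study).
radius = 2.0
sphere_score = sphericity(4 / 3 * math.pi * radius ** 3, 4 * math.pi * radius ** 2)
cube_score = sphericity(1.0, 6.0)  # unit cube: volume 1, surface area 6
```

      Under this definition a sphere scores 1 and any less compact shape scores lower, so a higher mean score for one population of nuclei indicates rounder nuclei.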

      (9) Line 545: please add a closing parenthesis after "scoring 1"

      Done

      (10) S15 Fig: (A) There does not appear to be "good agreement" (line 653) between the datasets for TM1. (C) please provide a better explanation on how to interpret these "Confusion Matrix" results.

      We understand the referee's concern. The patterns likely appear different because of limited sampling in the snRNA-seq data. Based on our results, TM1 seems particularly susceptible, possibly because these cells do not tolerate the isolation process as well. Although our experience makes us confident that TM1 shows good agreement between the two techniques, we have changed the wording in the text to “generally” to reflect this nuance.

      Lines 633-635 (previously line 653): The generated clusters and their marker genes generally agreed with our scRNA-seq analyses (Fig 5A-B, S15A Fig).

      We have also added clarification on how to interpret the Confusion Matrix.

      Lines 669-672: “Colors indicate the fraction of cells identified in each ATAC cluster (row) which are also identified in each RNA cell type (columns), where darker colors represent stronger correspondence between RNA and ATAC clusters.”
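
      The legend's description corresponds to a row-normalized confusion matrix. A minimal sketch, assuming each cell carries one ATAC cluster label and one RNA cell-type label (the function name and toy labels below are hypothetical, not data from the study):

```python
from collections import Counter, defaultdict


def row_normalized_confusion(atac_clusters, rna_types):
    """For each ATAC cluster (row), return the fraction of its cells
    assigned to each RNA cell type (column). Each row sums to 1."""
    if len(atac_clusters) != len(rna_types):
        raise ValueError("one label pair per cell is required")
    counts = defaultdict(Counter)
    for atac, rna in zip(atac_clusters, rna_types):
        counts[atac][rna] += 1
    columns = sorted(set(rna_types))
    matrix = {}
    for atac, row in counts.items():
        total = sum(row.values())
        matrix[atac] = {rna: row[rna] / total for rna in columns}
    return matrix


# Toy labels (hypothetical): four cells in ATAC cluster A1, two in A2.
atac = ["A1", "A1", "A1", "A1", "A2", "A2"]
rna = ["TM1", "TM1", "TM1", "TM2", "TM2", "TM2"]
matrix = row_normalized_confusion(atac, rna)
```

      In this toy case three quarters of the A1 cells map to TM1, so that cell of the heatmap would be drawn darker than the A1/TM2 cell.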

      (11) Line 676: The transition from discussing the sc/snRNAseq data to the work in Lmx1b mutant mice is quite abrupt and could use a better transition to introduce this metabolism work.

      We have revised this transition for improved flow but prefer to keep all transitions brief due to the paper's length.

      Lines 691-694 (previously line 676): To evaluate the utility of our new TM cell atlas, we used it to examine how Lmx1b mutations affect the TM cell transcriptome and to identify potential mechanisms underlying IOP elevation. We selected LMX1B because it causes IOP elevation and glaucoma in humans and was identified as a highly active transcription factor in our TM cell dataset.

      (12) Lines 696-697: It appears counter-intuitive that upregulation of ubiquitin pathways would lead to proteostasis (proteasome protein degradation requires ubiquitination).

      We have clarified that the protein tagging pathway was significantly upregulated. However, the polyubiquitin precursor gene itself was downregulated. In general, the statistical significance of the protein tagging pathway suggests perturbation of the system that tags proteins for degradation. We have clarified this in the text.

      Lines 711-714 (previously lines 696-697): “In addition, mutant TM3 cells showed an upregulation of protein tagging genes. However, there is a downregulation of the polyubiquitin precursor gene (Ubb, P = 4.5E-30), indicating a general dysregulation of pathways that tag proteins for degradation.”

      (13) Line 715: Please justify why "perturbed metabolism" was chosen to pursue vs the other differentially expressed pathways

      We chose to narrow our focus to TM3 cells because of their enrichment for Lmx1b expression. Most pathways identified in our analysis of TM3 cells implicate mitochondrial metabolism. Therefore, we chose to further explore this avenue. We clarified in the text that perturbed metabolism was the strongest gene expression signature.

      Lines 753-754 (previously line 715): “Our findings most strongly implicate perturbed metabolism within TM3 cells as responsible for IOP elevation in an Lmx1b glaucoma model.”

      (14) Line 759: The authors clearly demonstrate that Lmx1b is most expressed in TM3 cells; however, they did not demonstrate that "Lmx1b was most active"

      ATAC analysis showed that Lmx1b was most active in TM cells overall. We inferred its activity in TM3 because Lmx1b is most enriched in that subtype. This has been clarified in the text.

      Lines 799-800 (previously line 759): “More specifically, we demonstrate that Lmx1b is the most active TM cell TF and is enriched in TM3 cells,…”

      (15) Lines 830-835: Please include references documenting increased TGFβ2 concentrations in POAG aqueous humor and TM, effects of TGFβ2 on TM ECM deposition, and TGFβ2 induced ocular hypertension ex vivo and in vivo.

      Done.

      (16) Line 875: The authors provide no direct evidence for enhanced "oxidative stress" in Lmx1b TM3 cells

      The mitochondrial abnormalities and changed pathways support oxidative stress, but we have not directly tested this. Experiments are currently underway to evaluate its role, but these additional analyses are beyond the scope of this paper. We removed oxidative stress from the sentence.

      Lines 920-922 (previously line 875): “Importantly, in heterozygous mutant V265D/+ mice, TM3 cells had pronounced gene expression changes that implicate mitochondrial dysfunction, but that were absent or much lower in other cells including TM1 and TM2.”

      (17) Line 880: Similarly, the authors have not directly assessed effects on metabolism in TM3 cells; they only have shown changes in the expression of mitochondrial genes that may affect metabolism

      We have no way of specifically isolating TM3 cells to test this. Future work is underway to test this more broadly in isolated TM cells but is beyond the scope of this already large paper. Considering our gene expression data and the addition of supporting EM data, we have qualified the text.

      Lines 930-931 (previously 880): “Our data extend these published findings by showing that inheritance of a single dominant mutation in Lmx1b similarly affects mitochondria in TM cells.”

      (18) Line 892: What markers were used to detect "cell stress"?

      We have revised the text. Although our RNA data show stress gene changes, characterization of these markers is beyond the scope of the current study and will be included in a subsequent paper.

      Lines 945-948 (previously line 892): “However, these processes were not limited to TM3 cells or even to cell types that express detectable Lmx1b, suggesting that they are secondary damaging processes that are subsequent to the initiating, Lmx1b-induced perturbations in TM3 cells.”

      Additional author-driven change

      While revising and reviewing our data, we identified a coding error that resulted in the WT and V265D mutant group labels being switched in Figure 6. Importantly, the significance of the differentially expressed genes (DEGs), the implicated biological pathways, and the interpretation of pathway directionality in the manuscript remain accurate. The only issue was the incorrect labeling in the figure. We have corrected the labels in Figure 6 to accurately reflect the data. As noted above, all data and code will be made available to ensure full reproducibility of our results.

      References

      (1) Doucet-Beaupre H, Gilbert C, Profes MS, Chabrat A, Pacelli C, Giguere N, et al. Lmx1a and Lmx1b regulate mitochondrial functions and survival of adult midbrain dopaminergic neurons. Proc Natl Acad Sci U S A. 2016;113(30):E4387-96. Epub 2016/07/14. doi: 10.1073/pnas.1520387113. PubMed PMID: 27407143; PubMed Central PMCID: PMC4968767.

      (2) Jimenez-Moreno N, Kollareddy M, Stathakos P, Moss JJ, Anton Z, Shoemark DK, et al. ATG8-dependent LMX1B-autophagy crosstalk shapes human midbrain dopaminergic neuronal resilience. J Cell Biol. 2023;222(5). Epub 2023/04/05. doi: 10.1083/jcb.201910133. PubMed PMID: 37014324; PubMed Central PMCID: PMC10075225.

      (3) Cross SH, Macalinao DG, McKie L, Rose L, Kearney AL, Rainger J, et al. A dominant-negative mutation of mouse Lmx1b causes glaucoma and is semi-lethal via LDB1-mediated dimerization [corrected]. PLoS Genet. 2014;10(5):e1004359. Epub 2014/05/09. doi: 10.1371/journal.pgen.1004359. PubMed PMID: 24809698; PubMed Central PMCID: PMC4014447.

      (4) Li K, Tolman N, Segre AV, Stuart KV, Zeleznik OA, Vallabh NA, et al. Pyruvate and related energetic metabolites modulate resilience against high genetic risk for glaucoma. Elife. 2025;14. Epub 2025/04/24. doi: 10.7554/eLife.105576. PubMed PMID: 40272416; PubMed Central PMCID: PMC12021409.

      (5) Tolman NG, Balasubramanian R, Macalinao DG, Kearney AL, MacNicoll KH, Montgomery CL, et al. Genetic background modifies vulnerability to glaucoma-related phenotypes in Lmx1b mutant mice. Dis Model Mech. 2021;14(2). Epub 2021/01/20. doi: 10.1242/dmm.046953. PubMed PMID: 33462143; PubMed Central PMCID: PMC7903917.

    1. eLife Assessment

      This useful study investigates how intrinsically disordered domains can interact to dictate the sub-cellular localization of a major innate immune sensor termed cGAS. The data from various cellular and biochemical assays are mostly solid, but the main conclusions from these experiments need to be validated further. This paper is relevant to immunologists, especially those interested in cytosolic DNA-sensing pathways.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript by the Yin group presents interesting findings that organelle-tethered intrinsically disordered "MEMCA" scaffolds, as exemplified by ZDHHC18 at the Golgi and MARCH8 at endosomes, enhance the engagement of cGAS with organelle-proximal condensates, thereby sequestering cGAS from cytosolic DNA sensing and negatively regulating innate immunity.

      Strengths:

      These findings suggest a previously unrecognized mechanism by which Golgi/endosomal IDR scaffolds modulate cGAS activity, with implications for antiviral defense and tumor immunology. The study is conceptually intriguing and potentially impactful.

      Weaknesses:

      While the manuscript addresses a novel aspect of cGAS regulation, additional mechanistic insights and targeted validations are needed to ensure robustness:

      (1) How do ZDHHC18/MARCH8 enhance cGAS engagement? Do they act as bridges to form a ternary, membrane-tethered cGAS-DNA-MEMCA complex, or alter cGAS condensate properties allosterically?

      (2) Is organelle cGAS capture selective? For instance, can other palmitoyltransferases/E3 ligases be substituted for ZDHHC18/MARCH8?

      (3) Why does membrane association suppress cGAS enzymatic activity, given that dsDNA still resides in cGAS condensates?

    3. Reviewer #2 (Public review):

      Summary:

      The authors found that cGAS, a DNA sensor, relocalizes to organelle membranes (ER, Golgi, endosomes) upon DNA stimulation, revealing spatial regulation of its activity. ZDHHC18 and MARCH8 recruit cGAS to Golgi/endosomes via intrinsically disordered regions (IDRs), driving phase-separated condensates. This sequestration of cGAS-dsDNA complexes suppresses innate immune signaling, uncovering a novel regulatory mechanism.

      Strengths:

      The work overall is very interesting. The authors provided molecular and biochemical evidence.

      Weaknesses:

      Overall, the work is very interesting. However, the quality of some of the data does need to be improved, and more experiments need to be performed.

      The following points need to be addressed:

      (1) In Figure S7, no direct binding between cGAS and MARCH8 or ZD18 IDR is observed, and the interaction only occurs after DNA stimulation. However, Figure 5 shows cGAS recruitment to ZD18 or MARCH8 IDR droplets, suggesting direct interactions. This apparent discrepancy should be clarified.

      (2) The authors propose that recruiting cGAS to organelle membranes reduces its activity, as demonstrated by the FKBP experiment. However, ZD18 and MARCH8 also post-translationally modify cGAS. Do both mechanisms contribute to this effect, and can the authors test this?

      (3) To demonstrate the functional importance of MEMCA, the authors should test IFN production or STING activation in cells.

      (4) Does the IDR of MARCH8 or ZD18 influence the interaction between cGAS and DNA?

      (5) Which region of cGAS does the IDR of MARCH8 or ZD18 interact with: the cGAS-CD or the cGAS-N-terminus?

      (6) The in vitro LLPS experiments with cGAS, DNA, and ZD18/MARCH8 should be conducted under physiological conditions.

    4. Reviewer #3 (Public review):

      Summary:

      In this study by Shi et al., the authors evaluate if cGAS is recruited to the membranes of intracellular organelles. Using a combination of biochemical fractionation and imaging techniques, the authors propose that upon recognition of DNA, cGAS translocates to various subcellular locations, including the Golgi, endoplasmic reticulum, and endosomes. Mechanistically, the authors propose that upon localizing to the Golgi or endosome, cGAS binding to MARCH8 and ZDHHC18 prevents cGAS activity by incorporating cGAS and dsDNA into biomolecular condensates. However, in its current form, the study does not directly address this question.

      Strengths:

      The question of evaluating cGAS sub-cellular localization as a mechanism for controlling activity is interesting, and there is some evidence that cGAS is localized to sub-cellular organelle membranes.

      Weaknesses:

      (1) The well-established nuclear localization of cGAS is not adequately addressed in the cell lines used and is inconsistent with the findings.

      (2) Previous studies have shown that ZDHHC18 and MARCH8 control cGAS activity, which detracts somewhat from the novelty.

      (3) There is considerable inconsistency in the cell lines and artificial expression systems used across the study.

      (4) A key element missing is showing that in the absence of ZDHHC18 or MARCH8, the loss of endogenous cGAS localization to the various sub-cellular organelles increases cGAMP synthesis and downstream STING activation in primary cells. There is an over-reliance on artificial expression systems. An important experiment to validate the hypothesis would be to evaluate endogenous cGAS localization in MARCH8- and ZDHHC18-deficient primary cells. Further, there should be evaluation of endogenous STING responses in MARCH8- and ZDHHC18-deficient primary cells in tandem with the localization studies.

      (5) There are a large number of grammatical errors throughout the manuscript which should be addressed.

    5. Author response:

      Below we outline our provisional responses to the major points raised in the public reviews, and our planned revisions:

      (1) Mechanistic model of how ZDHHC18/MARCH8 engage the cGAS–DNA condensate (Reviewers #1 & #2)

      We will add a dedicated subsection and a working-model figure describing our current view: IDRs of ZDHHC18 (Golgi) and MARCH8 (endosomes) engage pre-formed cGAS–DNA condensates at organelle membranes, and thereby tune cGAS activity through PTMs. We will explicitly discuss bridge-like versus allosteric modes by performing additional LLPS experiments (e.g., FRAP assays) to detect any IDR-driven changes in condensate properties, and explain how these scenarios fit our data.

      (2) Selectivity beyond ZDHHC18/MARCH8 (Reviewer #1)

      We will expand the text to explain existing evidence indicating that, in addition to ZDHHC18 or MARCH8, other post-translational modification (PTM) enzymes and/or membrane-associated scaffolds may also modulate cGAS. We will summarize our current datasets that support this possibility and outline how this selectivity relates to organelle identity.

      (3) Why membrane association suppresses cGAS activity (Reviewer #1)

      We will provide a concise mechanistic rationale—integrating our published work—to explain how membrane-proximal sequestration can limit cGAS catalysis despite cGAS–DNA coexistence within condensates. Specifically, we will discuss (i) IDR-dependent changes in condensate properties, and (ii) PTMs by ZDHHC18/MARCH8 that allosterically reduce catalytic efficiency; we will clearly cross-reference our prior publications that bear on these points.

      (4) Reconciling Fig. S7 (DNA-dependent binding) with Fig. 5 (recruitment to IDR droplets) (Reviewer #2)

      We will add text to clarify experimental context and readouts to prove that there is no real contradiction between Fig. S7 and Fig. 5. In the experiment shown in Fig. 5, PEG (a macromolecular crowding agent) was added to the system, which facilitates the formation of IDR phase-separated droplets. Under these conditions, cGAS partitions into the IDR condensates, leading to the observed recruitment. In contrast, Fig. S7 examines the direct physical interaction between cGAS and the IDRs using biochemical pull-down assays and shows that no direct interaction occurs in the absence of DNA. These two results reflect different experimental contexts and are therefore not mutually exclusive.

      (5) Planned additional tests to address specificity and mechanism (Reviewer #2)

      DNA pull-down: to test whether IDRs alter cGAS–DNA affinity, we will compare cGAS binding to DNA with/without MEMCA IDRs (and with charged-residue mutants).

      Domain mapping: to determine which region of cGAS engages MEMCA IDRs, we will map binding using cGAS N-terminus/core-domain truncations and key surface mutants.

      Physiological in vitro LLPS: we will repeat cGAS–DNA–IDR LLPS assays under physiological buffer conditions and report partition coefficients, FRAP, and phase diagrams to ensure physiological relevance.

      (6) Image clarity and data presentation (Reviewer #2):

      We will improve image resolution, add zoomed-in insets with organelle markers, and provide images with a clearer Cy5-ISD signal.

      (7) Nuclear localization of cGAS and system considerations (Reviewer #3)

      We will explicitly document the nuclear signal of cGAS observed in our confocal experiments and detail the cell lines and expression systems used. We will also clarify cGAS nuclear localization in the cell lines used.

      (8) Endogenous validation and cell line consistency (Reviewer #3):

      We will perform experiments in primary cells (knockout macrophages) to address the concern of relying on overexpression.

      (9) Language and grammar (Reviewer #3):

      We will thoroughly revise the manuscript for grammar and clarity.

      Together, these planned revisions will strengthen the mechanistic basis of our findings and provide direct evidence for the physiological role of organelle-tethered IDRs in regulating cGAS activity.

    1. eLife Assessment

      Ruppert et al. investigated how activation of thermogenesis by cold exposure (CE) and methionine restriction (MetR) impacts health and leads to weight loss in mice. The authors provided valuable datasets showing that the responses to MetR and CE are tissue-specific, while MetR and CE affect beige adipose similarly. Although the study is descriptive, the data analyses are solid, with well-supported conclusions drawn from the findings.

    2. Reviewer #1 (Public review):

      Summary:

      Activation of thermogenesis by cold exposure and dietary protein restriction are two lifestyle changes that impact health in humans and lead to weight loss in model organisms - here, in mice. How these affect liver and adipose tissues has not been thoroughly investigated side by side. In mice, the authors show that the responses to methionine restriction and cold exposure are tissue-specific, while the effects on beige adipose are somewhat similar.

      Strengths:

      The strength of the work is the comparative approach, using transcriptomics and bioinformatic analyses to investigate the tissue-specific impact. The work was performed in mouse models and is state-of-the-art. This represents an important resource for researchers in the field of protein restriction and thermogenesis.

      Weaknesses:

      The findings are descriptive, and the conclusions remain associative. The work is limited to mouse physiology, and the human implications have not been investigated yet.

    3. Reviewer #2 (Public review):

      Summary:

      This study provides a library of RNA sequencing analysis from brown fat, liver, and white fat of mice treated with two stressors - cold challenge and methionine restriction - alone and in combination (interaction between diet and temperature). They characterize the physiologic response of the mice to the stressors, including effects on weight, food intake, and metabolism. This paper provides evidence that while both stressors increase energy expenditure, there are complex tissue-specific responses in gene expression, with additive, synergistic, and antagonistic responses seen in different tissues.

      Strengths:

      The study design and implementation are solid and well-controlled. Their writing is clear and concise. The authors do an admirable job of distilling the complex transcriptome data into digestible information for presentation in the paper. Most importantly, they do not overreach in their interpretation of their genomic data, keeping their conclusions appropriately tied to the data presented. The discussion is well thought out and addresses some interesting points raised by their results.

      Weaknesses:

      The major weakness of the paper is the almost complete reliance on RNA sequencing data, but it is presented as a transcriptomic resource.

    4. Reviewer #3 (Public review):

      Summary:

      Ruppert et al. present a well-designed 2×2 factorial study directly comparing methionine restriction (MetR) and cold exposure (CE) across liver, iBAT, iWAT, and eWAT, integrating physiology with tissue-resolved RNA-seq. This approach allows a rigorous assessment of where dietary and environmental stimuli act additively, synergistically, or antagonistically. Physiologically, MetR progressively increases energy expenditure (EE) at 22 °C and lowers RER, indicating a lipid utilization bias. By contrast, a 24-hour 4 °C challenge elevates EE across all groups and eliminates MetR-Ctrl differences. Notably, changes in food intake and activity do not explain the MetR effect at room temperature.

      Strengths:

      The data convincingly support the central claim: MetR enhances EE and shifts fuel preference to lipids at thermoneutrality, while CE drives robust EE increases regardless of diet and attenuates MetR-driven differences. Transcriptomic analysis reveals tissue-specific responses, with additive signatures in iWAT and CE-dominant effects in iBAT. The inclusion of explicit diet×temperature interaction modeling and GSEA provides a valuable transcriptomic resource for the field.
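
      To make the diet×temperature interaction concrete, the sketch below computes the generic 2×2 interaction contrast from four group means on a log scale. This is an illustration of the general idea only, not the authors' modeling pipeline; the function name and the numbers are hypothetical.

```python
def diet_temperature_interaction(ctrl_rt, metr_rt, ctrl_cold, metr_cold):
    """Return the diet effect at each temperature and their difference.

    Inputs are group means on a log scale (e.g. log2 expression).
    An interaction near zero means the two factors act additively.
    """
    diet_effect_rt = metr_rt - ctrl_rt
    diet_effect_cold = metr_cold - ctrl_cold
    return {
        "diet_effect_rt": diet_effect_rt,
        "diet_effect_cold": diet_effect_cold,
        "interaction": diet_effect_cold - diet_effect_rt,
    }


# Hypothetical log2 means: same diet effect at both temperatures (additive).
additive = diet_temperature_interaction(5.0, 6.0, 7.0, 8.0)
# Hypothetical log2 means: larger diet effect in the cold (non-additive).
non_additive = diet_temperature_interaction(5.0, 6.0, 7.0, 9.0)
```

      A nonzero interaction in the same direction as the main effects would be read as synergy, and one in the opposite direction as antagonism; in practice the contrast would be tested statistically rather than read off the means.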

      Weaknesses:

      Limitations include the short intervention windows (7 d MetR, 24 h CE), use of male-only cohorts, and reliance on transcriptomics without complementary proteomic, metabolomic, or functional validation. Greater mechanistic depth, especially at the level of WAT thermogenic function, would strengthen the conclusions.

    1. eLife Assessment

      This interesting study adapts machine learning tools to analyze movements of a chromatin locus in living cells in response to serum starvation. The machine learning approach developed is useful, the experiments are well controlled, and the data are solid. The study would be greatly strengthened by testing key predictions made using perturbation experiments. This work will be of interest to those studying chromosome biology and gene expression patterns.

    2. Reviewer #1 (Public review):

      Summary:

      Redchuk et al. explore the dynamic properties of chromatin upon serum starvation using machine learning approaches. They use CRISPR-tagging to visualize a region on chromosome 1 in human cells and show that in their system, chromosome 1, but not the previously reported chromosomes 10, 13, and X, undergoes a change in radial position upon serum starvation. Live cell imaging showed a position change towards the periphery after serum starvation. They then apply a machine learning algorithm for the analysis of the imaging data, which reveals changes in nuclear area during serum starvation and longer displacements of the chromosome 1 locus near the nuclear periphery. Differential behavior of homologues is also reported.

      Strengths:

      (1) The study of chromatin dynamics is an interesting and important area of research.

      (2) The use of machine learning approaches to analyze live cell imaging data is timely.

      (3) With serum starvation, the authors use a simple, well-controllable model system.

      Weaknesses:

      (1) This study only provides limited new insight into chromatin dynamics.

      (2) It was not immediately evident what the use of machine learning approaches added to this study. It appears that the main conclusions could have been reached by conventional analysis.

      (3) There are several specific technical points:

      a) It was not clear what the CRISPR-Sirius probes actually labelled. The chromosome 1 sgRNA sequence is provided, but I could not find information as to which region(s) of the chromosome are actually labelled (size, location, etc.).

      b) The authors visualize a relatively small region of chromosome 1 but make conclusions regarding the entire chromosome. Additional probes on the same chromosome should be used.

      Related to this point, the discussion of why the authors are unable to reproduce the prior findings of relocation of chromosomes 10, 13, and X is not satisfying. It would be worth comparing the FISH-based painting of entire chromosomes, which generated the results suggesting relocation of these chromosomes, with the point-labelling method used here.

      c) The study lacks controls. Since in their hands chromosomes 10, 13, and X do not change position, they should be used as a negative control in all experiments demonstrating a shift in the location of chromosome 1.

      d) I did not find information about the spatial or temporal resolution of the imaging modality. This is important to assess whether the observed changes in position, relative to time, are meaningful.

      e) The authors analyze surprisingly early timepoints (up to 40 minutes) of serum starvation. Would these results look different if longer serum starvation timepoints of several hours were analyzed?

      f) The authors can do a better job of explaining what the biological meaning of the various parameters (DistR, TDist, etc.) they measure is.

      g) I did not understand the reasoning for the authors' conclusion of differential behavior of homologues. Please explain this better, or ideally use more direct labeling methods that identify the individual homologues.

      h) In many figures, statistical analysis of the data is missing, including, but not limited to, Figures 1B, C, G, Figures 4, 5, 6.

      i) No information is provided throughout the manuscript as to how many cells were analyzed in each experiment. This should be indicated in every figure legend.

    3. Reviewer #2 (Public review):

      Summary:

      The study demonstrates that CRISPR-Sirius provides a powerful approach to investigating chromosome dynamics in living cells during environmental stress. By focusing on serum starvation, the authors show that this process induces global nuclear changes, including a reduction in nuclear area and increased morphological dynamism, while at the same time driving specific reorganization of chromosome 1. Chromosome 1 relocates toward the nuclear periphery and displays distinctive patterns of motion, maintaining overall motility but punctuated by occasional long-distance displacements, particularly near the nuclear envelope. Importantly, the analysis reveals that homologous copies of chromosome 1 do not behave uniformly: peripheral loci become more mobile and responsive to starvation, whereas central homologs remain comparatively stable, often associated with nucleolar subcompartments. By integrating live imaging with machine learning and explainable AI analysis, the study highlights the complexity of nuclear organization and provides valuable insights into how chromosome-specific and locus-specific responses to stress are orchestrated within the three-dimensional nuclear landscape.

      Strengths:

      The study uses live-cell imaging to investigate the dynamics of loci during starvation. Live-cell tracking and data interpretation are carried out using machine learning and AI models, which is a major strength.

      Weaknesses:

      The manuscript is at times difficult to follow, partly because the methodological descriptions are highly specialized, especially for non-expert biologists. In addition, the observations are not tested for a mechanistic basis. Experiments that could provide deeper insights are missing, for example, why chromosome 1 moves, why the peripheral homologue dislocates, or why a "long jump" is observed at the periphery even though the speed of the loci does not change. It is also unclear whether a displacement of 0.5 μm is functionally meaningful.

    1. eLife Assessment

      This study characterises motor and somatosensory cortex neural activity during naturalistic eating and drinking tongue movement in nonhuman primates. The data, which include electrophysiology, three-dimensional tracking of tongue movements, and nerve block manipulations, are valuable to neuroscientists and neural engineers interested in tongue use. Although the current analyses provide a solid description of single neuron activity in these areas, both the population level analyses and the characterisation of activity changes following nerve block could be improved.

    2. Reviewer #1 (Public review):

      Summary:

      Hosack and Arce-McShane investigate how the 3D movement direction of the tongue is represented in the orofacial part of the sensory-motor cortex and how this representation changes with the loss of oral sensation. They examine the firing patterns of neurons in the orofacial parts of the primary motor cortex (MIo) and somatosensory cortex (SIo) in non-human primates (NHPs) during drinking and feeding tasks. While recording neural activity, they also tracked the kinematics of tongue movement using biplanar video-radiography of markers implanted in the tongue. Their findings indicate that many units in both MIo and SIo are directionally tuned during the drinking task. However, during the feeding task, directional tuning was more frequent in MIo units and less prominent in SIo units. Additionally, in some recording sessions, they blocked sensory feedback using bilateral nerve block injections, which seemed to result in fewer directionally tuned units and changes in the overall distribution of the preferred direction of the units.

      Strengths:

      The most significant strength of this paper lies in its unique combination of experimental tools. The authors used a video-radiography method to capture 3D kinematics of the tongue movement during two behavioral tasks while simultaneously recording activity from two brain areas. This specific dataset and experimental setup hold great potential for future research on the understudied orofacial segment of the sensory-motor area.

      Weaknesses:

      A substantial portion of the paper is dedicated to establishing directional tuning in individual neurons, followed by an analysis of how this tuning changes when sensory feedback is blocked. While such characterizations are valuable, particularly in less-studied motor cortical areas and behaviors, the discrepancies in tuning changes across the two NHPs, coupled with the overall exploratory nature of the study, render the interpretation of these subtle differences somewhat speculative. At the population level, both decoding analyses and state space trajectories from factor analysis indicate that movement direction (or spout location) is robustly represented. However, as with the single-cell findings, the nuanced differences in neural trajectories across reach directions and between baseline and sensory-block conditions remain largely descriptive. To move beyond this, model-based or hypothesis-driven approaches are needed to uncover mechanistic links between neural state space dynamics and behavior.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript by Hosack and Arce-McShane examines the directional tuning of neurons in macaque primary motor (MIo) and somatosensory (SIo) cortex. The neural basis of tongue control is far less studied than, for example, forelimb movements, partly because the tongue's kinematics and kinetics are difficult to measure. A major technical advantage of this study is using biplanar video-radiography, processed with modern motion tracking analysis software, to track the movement of the tongue inside the oral cavity. Compared to prior work, the behaviors are more naturalistic (feeding and licking water from one of three spouts), although the animals were still head-fixed.

      The study's main findings are that:

      • A majority of neurons in MIo and a (somewhat smaller) percentage of SIo modulated their firing rates during tongue movements, with different modulation depending on the direction of movement (i.e., exhibited directional tuning). Examining the statistics of tuning across neurons, there was anisotropy (e.g., more neurons preferring anterior movement) and a lateral bias in which tongue direction neurons preferred that was consistent with the innervation patterns of tongue control muscles (although with some inconsistency between monkeys).

      • Consistent with this encoding, tongue position could be decoded with moderate accuracy even from small ensembles of ~28 neurons.

      • There were differences observed in the proportion and extent of directional tuning between the feeding and licking behaviors, with stronger tuning overall during feeding. This potentially suggests behavioral context-dependent encoding.

      • The authors then went one step further and used a bilateral nerve block to the sensory inputs (trigeminal nerve) from the tongue. This impaired the precision of tongue movements and resulted in an apparent reduction and change in neural tuning in MIo and SIo.

      Strengths:

      The data are difficult to obtain and appear to have been rigorously measured, and provide a valuable contribution to this under-explored subfield of sensorimotor neuroscience. The analyses adopt well-established methods especially from the arm motor control literature, and represent a natural starting point for characterizing tongue 3D direction tuning.

      Weaknesses:

      There are alternative explanations for some of the interpretations, but those interpretations are described in a way that clearly distinguishes results from interpretations, and readers can make their own assessments. Some of these limitations are described in more detail below.

      One weakness of the current study is that there is substantial variability in some of the results between monkeys, including the tuning characteristics of primary somatosensory cortex neurons during drinking, and the effect of nerve block on tongue movements and the associated changes in single neuron tuning.

      This study focuses on describing directional tuning using the preferred direction (PD) / cosine tuning model popularized by Georgopoulos and colleagues for understanding neural control of arm reaching in the 1980s. This is a reasonable starting point and a decent first order description of neural tuning. However, the arm motor control field has moved far past that viewpoint, and in some ways an over-fixation on static representational encoding models and PDs held that field back for many years. The manuscript would benefit from drawing the readers' attention (perhaps in the Discussion) to the fact that PDs are a very simple starting point for characterizing how cortical activity relates to kinematics, but that there is likely much richer population-level dynamical structure and that a more mechanistic, control-focused analytical framework may be fruitful. A good review of this evolution in the arm field can be found in Vyas S, Golub MD, Sussillo D, Shenoy K. 2020. Computation Through Neural Population Dynamics. Annual Review of Neuroscience. 43(1):249-75. A revised version of the manuscript incorporates more population-level analyses, but with inconsistent use of quantifications/statistics and without sufficient contextualization of what the reader is to make of these results.

      The described changes in tuning after nerve block could also be explained by changes in kinematics between these conditions, which temper the interpretation of these interesting results.

      I am not convinced of the claim that tongue directional encoding fundamentally changes between drinking and feeding given the dramatically different kinematics and the involvement of other body parts like the jaw (e.g., the reference to Laurence-Chasen et al. 2023 just shows that there is tongue information independent of jaw kinematics, not that jaw movements don't affect these neurons' activities). I also find the nerve block results inconsistent (more tuning in one monkey, less in the other?) and difficult to really learn something fundamental from, besides that neural activity and behavior both change - in various ways - after nerve block (not at all surprising but still good to see measurements of).

      The manuscript states that "Our results suggest that the somatosensory cortex may be less involved than the motor areas during feeding, possibly because it is a more ingrained and stereotyped behavior as opposed to tongue protrusion or drinking tasks". An alternative explanation may be more statistical/technical in nature: that during feeding, there will be more variability in exactly what somatosensation afferent signals are being received from trial to trial (because slight differences in kinematics can have large differences in exactly where the tongue is and the where/when/how of what parts of it are touching other parts of the oral cavity)? This variability could "smear out" the apparent tuning using these types of trial-averaged analyses. Given how important proprioception and somatosensation are for not biting the tongue or choking, the speculation that somatosensory cortical activity is suppressed during feedback is very counter-intuitive to this reviewer. In the revised manuscript the authors note these potential confounds and other limitations in the Discussion.

    4. Reviewer #3 (Public review):

      Summary

      In this study, the authors aim to uncover how 3D tongue direction is represented in the Motor (M1o) and Somatosensory (S1o) cortex. In non-human primates implanted with chronic electrode arrays, they use X-ray based imaging to track the kinematics of the tongue and jaw as the animal is either chewing food or licking from a spout. They then correlate the tongue kinematics with the recorded neural activity. They perform both single-unit and population level analyses during feeding and licking. Then, they recharacterize the tuning properties after bilateral lidocaine injections in the two sensory branches of the trigeminal nerve. They report that their nerve block causes a reorganization of the tuning properties and population trajectories. Overall, this paper concludes that M1o and S1o both contain representations of the tongue direction, but their numbers, their tuning properties and susceptibility to perturbed sensory input are different.

      Strengths

      The major strengths of this paper are in the state-of-the-art experimental methods employed to collect the electrophysiological and kinematic data. In the revision, the single-unit analyses of tuning direction are robustly characterized. The differences in neural correlations across behaviors, regions and perturbations are robust. In addition to the substantial amount of largely descriptive analyses, this paper makes two convincing arguments: 1) the single-neuron correlates for feeding and licking in OSMCx are different and can't be simply explained by different kinematics, and 2) blocking sensory input alters the neural processing during orofacial behaviors. The evidence for these claims is solid.

      Weaknesses

      The main weakness of this paper is in providing an account for these differences to get some insight into neural mechanisms. For example, while the authors show changes in neural tuning and different 'neural trajectory' shapes during feeding and drinking, their analyses of these differences are descriptive and provide limited insight into the underlying neural computations.

    5. Author response:

      The following is the authors’ response to the current reviews.

      We have significant concerns about the eLife assessment and the reviews. The reviewers acknowledged substantial strengths in our work:

      • Reviewer 3 noted that “the single-unit analyses of tuning direction are robustly characterized”, “the differences in neural correlations across behaviors, regions and perturbations are robust”, and “The evidence for these claims is solid.”

      • Reviewer 2 stated that “the manuscript has been improved” with “new analyses [that] provide improved rigor”.

      Despite these, the final eLife assessment inexplicably downplayed the significance of the findings and strength of evidence.

      Broader Impact and Significance. The findings, not only the data, have theoretical and/or practical implications extending well beyond a single subfield relevant to:

      1. behavioral neuroscientists studying sensorimotor integration

      2. systems and theoretical neuroscientists

      3. neural and biomechanical engineers working on brain-computer interfaces for speech or oral or limb prosthetics

      4. soft robotics researchers

      5. comparative motor control researchers

      6. clinicians involved in the evaluation and rehabilitation of orolingual function (e.g., after stroke or glossectomy, dysphagia)

      Given this broad relevance, we question why the significance was characterized as merely "useful" rather than "important."

      Dismissive Tone Toward Descriptive Research. Some reviews displayed a dismissive or skeptical tone toward the findings and their significance, even when methods were solid and support for the claims was strong. They critiqued the “descriptive nature” of our study, faulting the lack of mechanistic explanation. However, in poorly understood fields such as orofacial sensorimotor control, descriptive studies provide the empirical foundation for mechanistic studies. Rich descriptive data generate testable hypotheses that drive mechanistic discoveries forward, while mechanistic studies conducted without this groundwork often pursue precise answers to poorly formulated questions.

      Specific Issues with Reviews:

      1. Significant omission in study description:

      The eLife Assessment’s second sentence states: “The data, which include both electrophysiology and nerve block manipulations, will be of value to neuroscientists and neural engineers interested in tongue use.”

      This description omits our simultaneously recorded high-resolution 3D kinematics data—a significant oversight given that combining high-density electrophysiological recording from multiple cortical regions with high-resolution 3D tongue kinematics during naturalistic behaviors in non-human primates represents one of our study's key strengths. Currently, only two research labs in the US possess this capability.

      2. Overemphasis on the “smaller” and “inconsistent” findings

      While we acknowledge some inconsistent findings between animals, the reviews overemphasized these inconsistencies in ways that cast unwarranted doubt on our more significant and consistent results.

      a. Reviewer 1: “[...] the discrepancies in tuning changes across the two NHPs, coupled with the overall exploratory nature of the study, render the interpretation of these subtle differences somewhat speculative.” “[...] in some recording sessions, they blocked sensory feedback using bilateral nerve block injections, which seemed to result in fewer directionally tuned units and changes in the overall distribution of the preferred direction of the units.”

      The skeptical tone of the critique is in opposition to Reviewer 3’s statement that “The evidence for these claims is solid”. In this statement, the reviewer characterized our findings as “somewhat speculative”, seemingly overlooking robust and consistent changes we documented:

      • “Following nerve block, MIo and SIo showed significant decreases in the proportion of directionally modulated neurons across both tasks (Fig. 10A; Chi-square, MIo: p <0.001, SIo: p < 0.05).”

      • “Nerve block significantly altered PD distributions during both tasks. During feeding, MIo neurons in both subjects exhibited a significant clockwise shift in mean PD toward the center (0°), resulting in more uniform distributions (Fig. 11A; circular k-test, p < 0.01).”

      These results were obtained through careful subsampling of trials with similar kinematics for both feeding and drinking tasks, ensuring that the tuning changes in the nerve block experiments could not be attributed to differing kinematics.

      b. Reviewer 2: “One weakness of the current study is that there is substantial variability in results between monkeys.”

      This vague critique, without specifying which results showed “substantial variability”, reads as though most findings were inconsistent, unfairly casting doubt on our study’s validity.

      3. Inaccurate statements in the Reviewers’ summaries

      Several reviewer statements contain factual inaccuracies:

      a. Reviewer 2: “A majority of neurons in MIo and a (somewhat smaller) percentage of SIo modulated their firing rates during tongue movements, with different modulation depending on the direction of movement (i.e., exhibited directional tuning).”

      Reviewer 2's characterization of directional tuning misrepresents our findings. We reported substantial differences in the proportion of directionally tuned neurons between MIo and SIo during the feeding task but a smaller difference in the drinking task:

      • “The proportion of directionally tuned neurons [...] differed significantly between MIo and SIo during the feeding task in both subjects (Chi-square, p < 0.001). In rostral and caudal MIo, 80% of neurons were modulated to 3D direction (bootstrap, p < 0.05, Fig. 3B, left), compared to 52% in areas 1/2 and 3a/3b.”

      • “During drinking, the proportion of directionally modulated neurons was more similar between regions (69% in MIo vs. 60% in SIo: Chi-square, p > 0.05, Fig. 3B right).”

      b. Reviewer 2: “There were differences observed in the proportion and extent of directional tuning between the feeding and licking behaviors, with stronger tuning overall during licking.”

      Reviewer 2's claim about task differences directly contradicts our findings. We consistently reported stronger tuning in feeding compared to drinking across multiple measures:

      • “The proportion of directionally tuned neurons was higher in the feeding vs. drinking task (Chi-square, p < 0.05, feeding: 72%, drinking: 66%)”;

      • “Cumulative explained variance for the first three factors was higher in feeding (MIo: 82%, SIo: 81%) than in drinking (MIo: 74%, SIo: 63%)”;

      • “Decoding using LSTM showed consistently higher accuracies in feeding compared to drinking regardless of the length of intervals used ..., behavioral window .., and directional angles ...”

      These results were also summarized in the Discussion.

      c. Reviewer 1: In Figure 12, factor 2 and 3 are plotted against each other? and factor 1 is left out?

      Reviewer 1’s observation about Figure 12 is incorrect. Factor 1 was included: Top subplots (feeding) show Factor 1 vs 3 (MIo) and Factor 1 vs 2 (SIo) while the bottom subplots (drinking) show Factor 2 vs 3 (MIo) and Factor 1 vs 2 (SIo). We plotted the two latent factors with highest explained variance for clarity, though all 20 factors were included in intertrajectory distance calculations.

      4. Framing and interpretive over-scrutiny

      Several critiques targeted framing rather than methodological rigor and emphasized that interpretations were speculative even when appropriately hedged:

      a. Reviewer 2: “A revised version of the manuscript incorporates more population-level analyses, but with inconsistent use of quantifications/statistics and without sufficient contextualization of what the reader is to make of these results.”

      Reviewer 2 mentioned "inconsistent use of quantifications/statistics" without specifying which analyses were problematic or updating their summary to include our additional population-level findings.

      b. Reviewer 2: “The described changes in tuning after nerve block could also be explained by changes in kinematics between these conditions, which temper the interpretation of these interesting results”

      Despite our addressing kinematic concerns through subsampled data analysis, Reviewer 2 remained unsatisfied, contrasting sharply with Reviewer 3's assessment that our arguments were "convincing" with "solid" evidence.

      c. Reviewer 2: “I am not convinced of the claim that tongue directional encoding fundamentally changes between drinking and feeding given the dramatically different kinematics and the involvement of other body parts like the jaw”

      Reviewer 2 expressed skepticism about fundamental encoding differences between tasks, despite our comprehensive controls including subsampled data with similar kinematics and multiple verification analyses (equal neuron numbers, stable neurons, various interval lengths, behavioral windows, and directional angles).

      Without describing why these analyses were insufficient, this criticism goes beyond methods or statistics. It casts doubt and challenges whether the conclusions are even worth drawing despite careful experimental controls.

      d. Reviewer 2: “The manuscript states that "An alternative explanation be more statistical/technical in nature: that during feeding, there will be more variability in exactly what somatosensation afferent signals are being received from trial to trial (because slight differences in kinematics can have large differences in exactly where the tongue is and the where/when/how of what parts of it are touching other parts of the oral cavity)? This variability could "smear out" the apparent tuning using these types of trial-averaged analyses. Given how important proprioception and somatosensation are for not biting the tongue or choking, the speculation that somatosensory cortical activity is suppressed during feedback is very counter-intuitive to this reviewer".

      By not updating this section, Reviewer 2 failed to acknowledge our responsive revisions, including Fano factor analysis showing higher variability in SIo during feeding versus drinking, and our updated discussion addressing their concerns about trial-to-trial variability: “Varying tongue shape, tongue’s contact with varying bolus properties (size and texture) and other oral structures (palate, teeth) may weaken the directional signal contained in SIo activity. Thus, small differences in tongue kinematics might create large differences in sensory signals across trials. When looking at trial-averaged signals, this natural variability could make the neural response patterns appear less precise or specific than they are. These are consistent with our findings that for both tasks, spiking variability was higher in SIo.”

      Authors’ Response to Recommendations for the authors:

      We thank the editors and the reviewers for their helpful comments. We have provided a response to reviewers’ recommendations and made some revisions on the manuscript. 

      Reviewer #1 (Recommendations for the authors): 

      In the newly added population factor analysis, several methodological decisions remain unclear to me:

      In Figure 7, why do the authors compare the mean distance between conditions in the latent spaces of MIo and SIo? Since these latent spaces are derived separately, they exist on different scales (with MIo appearing roughly four times larger than SIo), and this discrepancy is reflected in the reported mean distances (Figure 7, inset plots). Wouldn't this undermine a direct comparison?

      Thank you for this helpful feedback. The reviewer is correct that the latent spaces are derived separately for MIo and SIo, thus they exist on different scales as we have noted in the caption of Figure 7: “Axes for SIo are 1/4 scale of MIo.” 

      To allow for a direct comparison between MIo and SIo, we corrected the analysis by comparing their normalized mean inter-trajectory distances obtained by first calculating the geometric index (GI) of the inter-trajectory distances, d, between each pair of population trajectories per region as: $\mathrm{GI} = (d_1 - d_2)/(d_1 + d_2)$. We then performed the statistics on the GIs and found a significant difference between mean inter-trajectory distances in MIo vs. SIo. We performed the same analysis comparing the distance travelled between MIo and SIo trajectories by getting the normalized difference in distances travelled and still found a significant difference in both tasks. We have updated the results and figure inset to reflect these changes.
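
      The normalization described above can be illustrated with a minimal sketch. The function name `geometric_index` and the toy distances are assumptions for illustration only, not the authors' code or data; the point is simply that the index is unchanged when both distances are rescaled by the same factor, which is why it removes the scale difference between separately derived latent spaces.

```python
def geometric_index(d1: float, d2: float) -> float:
    """Normalized contrast (d1 - d2) / (d1 + d2) of two non-negative distances.

    The result lies in [-1, 1] and does not depend on the overall scale
    of the latent space in which d1 and d2 were measured.
    """
    total = d1 + d2
    if total == 0:
        raise ValueError("both distances are zero; the index is undefined")
    return (d1 - d2) / total


# Toy values (hypothetical): the same pair of distances in a space that is
# four times larger gives the same index.
gi_small = geometric_index(3.0, 1.0)
gi_large = geometric_index(12.0, 4.0)
print(gi_small, gi_large)
```

      Because the index is a ratio, a fourfold difference in axis scale between two regions cancels out before any statistics are run on it.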

      In Figure 12, unlike Figure 7 which shows three latent dimensions, only two factors are plotted. While the methods section describes a procedure for selecting the optimal number of latent factors, Figure 7 - figure supplement 3 shows that variance explained continues to increase up to about five latent dimensions across all areas. Why, then, are fewer dimensions shown?

      Thank you for the opportunity to clarify the figure. The m obtained from the 3-fold cross-validation varied for the full sample and was 20 factors for the subsample. We clarify that all statistical analyses were done using 20 latent factors. Using the full sample of neurons, the first 3 factors explained 81% of variance in feeding data compared to 71% in drinking data. When extended to 5 factors, feeding maintained its advantage with 91% variance explained versus 82% for drinking. Because feeding showed higher variance explained than drinking across 3 or 5 factors, only three factors were shown in Figure 7 for better visualization. We added this clarification to the Methods and Results.

      Figure 12 shows the differences in the neural trajectories between the control and nerve block conditions. The control vs. nerve block comparison complicated the visualization of the results. Thus, we plotted only the two latent factors with the highest separation between population trajectories. This was clarified in the Methods and caption of Figure 12.

      In Figure 12, factor 2 and 3 are plotted against each other? and factor 1 is left out?

      This observation is incorrect; Factor 1 was included: Top subplots (feeding) show Factor 1 vs 3 (MIo) and Factor 1 vs 2 (SIo) while the bottom subplots (drinking) show Factor 2 vs 3 (MIo) and Factor 1 vs 2 (SIo).  We have clarified this in the Methods and caption of Figure 12.

      Finally, why are factor analysis results shown only for monkey R? 

      Factor analysis was performed on both animals, but the results were shown only for monkey R to decrease the number of figures in the manuscript. Figure 7- figure supplement 1 shows the data for both monkeys. Here are the equivalent Figure 7 plots for monkey Y.

      Author response image 1.

      Reviewer #2 (Recommendations for the authors): 

      Overall, the manuscript has been improved. 

      New analyses provide improved rigor (as just one example, organizing the feeding data into three-category split to better match the three-direction drinking data decoding analysis and also matching the neuron counts).

      The updated nerve block change method (using an equal number of trials with a similar left-right angle of movement in the last 100 ms of the tongue trajectory) somewhat reduces my concern that kinematic differences could account for the neural changes, but on the other hand the neural analyses use 250 ms (meaning that the neural differences could be related to behavioral differences earlier in the trial). Why not subselect to trials with similar trajectories throughout the whole movement (or at least show that as an additional analysis, albeit one with lower trial counts)?

      As the reviewer pointed out, selecting similar trajectories throughout the whole movement would result in lower trial counts that lead to poor statistical power. We think that the 100 ms prior to maximum tongue protrusion is a more important movement segment to control for similar kinematics between the control and nerve block conditions since this represents the subject’s intended movement endpoint. 
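
      To make the trial-matching idea concrete, here is a minimal sketch of one way to select equal numbers of control and nerve-block trials with similar left-right angles. The greedy pairing, the function name `match_trials_by_angle`, the 5-degree tolerance, and the toy angles are all assumptions; the manuscript's actual subsampling procedure is not described here and may differ.

```python
def match_trials_by_angle(control_angles, block_angles, tolerance_deg=5.0):
    """Greedily pair control and nerve-block trials with similar angles.

    Each trial is used at most once. Returns (control_index, block_index)
    pairs, so both conditions contribute the same number of trials.
    Greedy matching is simple but not guaranteed to maximize the pair count.
    """
    block_order = sorted(range(len(block_angles)), key=lambda j: block_angles[j])
    control_order = sorted(range(len(control_angles)), key=lambda i: control_angles[i])
    used = set()
    pairs = []
    for i in control_order:
        best = None  # (angle difference, block index) of closest unused trial
        for j in block_order:
            if j in used:
                continue
            diff = abs(control_angles[i] - block_angles[j])
            if diff <= tolerance_deg and (best is None or diff < best[0]):
                best = (diff, j)
        if best is not None:
            used.add(best[1])
            pairs.append((i, best[1]))
    return pairs


# Toy angles in degrees (hypothetical values, not data from the study).
control = [-10.0, 0.0, 12.0, 40.0]
nerve_block = [1.0, -9.0, 30.0]
print(match_trials_by_angle(control, nerve_block))
```

      Tightening the tolerance or extending the match to the whole trajectory shrinks the number of retained pairs, which is the trade-off with statistical power raised in the response above.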

      A lot of the Results seemed like a list of measurements without sufficient hand-holding or guide-posting to explain what the take-away for the reader should be. Just one example to make concrete this broadly-applicable feedback: "Cumulative explained variance for the first three factors was higher in feeding (MIo: 82%, SIo: 81%) than in drinking (MIo: 74%, SIo: 63%) when all neurons were used for the factor analysis (Fig. 7)": why should we care about 3 factors specifically? Does this mean that in feeding, the neural dimensionality is lower (since 3 factors explain more of it)? Does that mean feeding is a "simpler" behavior (which is counter-intuitive and does not conform to the authors' comments about the higher complexity of feeding). And from later in that paragraph: what are we to make of the differences in neural trajectory distances (aside from quantifying using a different metric the same larger changes in firing rates that could just as well be quantified as statistics across single-neuron PETHs)?

      Thank you for the feedback on the writing style. We have made some revisions to describe the takeaway for the reader. That fewer latent factors explain 80% of the variance in the feeding data means that the underlying network activity is relatively simple despite apparent complexity. When neural population trajectories are farther away from each other in state space, it means that the patterns of activity across tongue directions are more distinct and separable, thus, less likely to be confused with each other. This signifies that neural representations of 3D tongue directions are more robust. When there is better neural discrimination and more reliable information processing, it is easier for downstream brain regions to distinguish between different tongue directions.  
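
      The notion of trajectories being "farther away from each other in state space" can be sketched as a Euclidean distance over m latent factors. Averaging the distance across time points, the function name `mean_intertrajectory_distance`, and the toy trajectories are assumptions made for illustration; they are not taken from the manuscript.

```python
import math


def mean_intertrajectory_distance(traj_a, traj_b):
    """Mean Euclidean distance between two latent trajectories.

    Each trajectory is a list of time points; each time point is a list of
    m latent factor values. Both trajectories must have the same length.
    """
    if len(traj_a) != len(traj_b) or not traj_a:
        raise ValueError("trajectories must be non-empty and equally long")
    distances = [math.dist(a, b) for a, b in zip(traj_a, traj_b)]
    return sum(distances) / len(distances)


# Toy example with m = 2 factors and two time points (hypothetical values).
direction_left = [[0.0, 0.0], [1.0, 0.0]]
direction_right = [[3.0, 4.0], [1.0, 0.0]]
print(mean_intertrajectory_distance(direction_left, direction_right))
```

      A larger value means that population activity for the two directions occupies more separable regions of the latent space.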

      The addition of more population-level analyses is nice as it provides a more efficient summary of the neural measurements. However, it's a surface-level dive into these methods; ultimately the goal of ensemble "computation through dynamics" analyses is to discover simpler structure / organizational principles at the ensemble level (i.e., show things not evident from single neurons), rather than just using them as a way to summarize data. For instance, here neural rotations are remarked upon in the Results, without referencing influential prior work describing such rotations and why neural circuits may use this computational motif to separate out conditions and shape muscle activity-generating readouts (Churchland et al. Nature 2012 and subsequent theoretical iterations including Russo et al.). That said, the Russo et al. tangling study was well-referenced and the present tangling results were effectively contextualized with respect to that paper in terms of the interpretation. I wish more of the results were interpreted with comparable depth.

      Speaking of Russo et al: the authors note qualitative differences in tangling between brain areas, but do not actually quantify tangling in either. These observations would be stronger if quantified and accompanied with statistics.

      Contrary to the reviewer’s critique, we did frame these results in the context of structure/organizational principles at the ensemble level. We had already cited prior work of Churchland et al., 2012; Michaels et al., 2016 and Russo et al., 2018. In the Discussion, Differences across behaviors, we wrote: “In contrast, MIo trajectories in drinking exhibited a consistent rotational direction regardless of spout location (Fig. 7). This may reflect a predominant non-directional information such as condition-independent time-varying spiking activity during drinking (Kaufman et al., 2016; Kobak et al., 2016; Arce-McShane et al., 2023).”

      Minor suggestions: 

      Some typos, e.g. 

      • no opening parenthesis in "We quantified directional differences in population activity by calculating the Euclidean distance over m latent factors)"

      • missing space in "independent neurons(Santhanam et al., 2009;..."); 

      • missing closing parentheses in "followed by the Posterior Inferior (Figure 3 - figure supplement 1."

      There is a one-page long paragraph in the Discussion. Please consider breaking up the text into more paragraphs each organized around one key idea to aid readability.

      Thank you, we have corrected these typos.

      Could it be that the Kaufman et al 2013 reference was intended to be Kaufman et al 2015 eNeuro (the condition-invariant signal paper)?

      Thank you, we have corrected this reference.

      At the end of the Clinical Implications subsection of the Discussion, the authors note the growing field of brain-computer interfaces with references for motor read-out or sensory write-in of hand motor/sensory cortices, respectively. Given that this study looks at orofacial cortices, an even more clinically relevant development is the more recent progress in speech BCIs (two recent reviews: https://www.nature.com/articles/s41583-024-00819-9, https://www.annualreviews.org/content/journals/10.1146/annurev-bioeng-110122012818), many of which record from human ventral motor cortex, and aspirations towards FES-like approaches for orofacial movements (e.g., https://link.springer.com/article/10.1186/s12984-023-01272-y).

      Thank you, we have included these references.

      Reviewer #3 (Recommendations for the authors): 

      Major Suggestions 

      (1) For the factor analysis of feeding vs licking, it appears that the factors were calculated separately for the two behaviors. It could be informative to calculate the factors under both conditions and project the neural data for the two behaviors into that space. The overlap/separations of the subspace could be informative. 

      We clarify that we performed a factor analysis that included both feeding and licking for MIo, as stated in the Results: “To control for factors such as different neurons and kinematics that might influence the results, we performed factor analysis on stable neurons across both tasks using all trials (Fig. 7- figure supplement 2A) and using trials with similar kinematics (Fig. 7- figure supplement 2B).” We have revised the manuscript to reflect this more clearly.

      (2) For the LSTM, the factor analyses, and the decoding, it is unclear whether the firing rates are mean-subtracted and normalized (the Methods section was a little unclear). Typically, papers in the field either z-score the data or do a softmax.

      The firing rates were z-scored for the LSTM and KNN. For the factor analysis, the spike counts were not z-scored, but the results were normalized. We clarified this in the Methods section.
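      For readers unfamiliar with the normalization mentioned above, z-scoring can be sketched as follows. This is a minimal illustration and not the authors' code; the function name `zscore` and the example rates are hypothetical, and the population standard deviation is assumed.

```python
from statistics import mean, pstdev

def zscore(rates):
    """Z-score one neuron's binned firing rates: subtract the mean, divide by the s.d."""
    mu = mean(rates)
    sigma = pstdev(rates)  # population s.d. (assumption; the source does not specify)
    if sigma == 0:
        # A neuron with a constant rate has no variance; map it to zeros.
        return [0.0 for _ in rates]
    return [(r - mu) / sigma for r in rates]

rates_hz = [2.0, 4.0, 6.0, 8.0]  # hypothetical binned rates for one neuron
z = zscore(rates_hz)
```

      After this step each neuron has zero mean and unit variance, so high-firing neurons do not dominate a decoder simply because of their scale.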

      Minor: 

      Page 1: Abstract- '... how OSMCx contributes to...' 

      Since there are no direct causal manipulations of OSMCx in this manuscript, this study doesn't directly study the OSMCx's contribution to movement - I would recommend rewording this sentence.

      Similarly, Page 2: 'OSMCx plays an important role in coordination...' the citations in this paragraph are correlative, and do not demonstrate a causal role.

      There are similar usages of 'OSMCx coordinates...' in other places e.g. Page 8. 

      Thank you, we revised these sentences.

      Page 7: the LSTM here has 400 units, which is a very large network and contains >12000 parameters. Networks of this size are prone to memorization; it would be wise to test the R-squared of the validation set against a shuffled dataset to see if the network is actually working as intended.

      Thank you for bringing up this important point of verifying that the network is learning meaningful patterns versus memorizing. Considering the size of our training samples, the ratio of samples to parameters is appropriate and thus the risk of memorization is low. Indeed, validation tests and cross-validation performed indicated expected network behavior and the R squared values obtained here were similar to those reported in our previous paper (Laurence-Chasen et al., 2023).
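      The shuffle control suggested by the reviewer could look roughly like the sketch below. It is an assumption-laden illustration, not an analysis reported by the authors: it shuffles held-out targets against fixed predictions, whereas a stricter control would retrain the network on shuffled labels. All names and data are hypothetical.

```python
import random

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mu = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mu) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def shuffle_p_value(y_true, y_pred, n_shuffles=1000, seed=0):
    """Return the observed R^2 and the fraction of shuffles that match or beat it."""
    rng = random.Random(seed)
    observed = r_squared(y_true, y_pred)
    shuffled = list(y_true)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break the pairing between targets and predictions
        if r_squared(shuffled, y_pred) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_shuffles + 1)

y_true = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]  # hypothetical held-out kinematics
y_pred = [0.1, 0.9, 2.2, 2.8, 4.1, 5.2, 5.9, 7.1]  # hypothetical decoder output
observed_r2, p = shuffle_p_value(y_true, y_pred)
```

      If the observed R-squared sits well above the shuffled distribution, the decoder is tracking the real pairing between neural activity and kinematics rather than fitting noise.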


      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In their paper, Hosack and Arce-McShane investigate how the 3D movement direction of the tongue is represented in the orofacial part of the sensory-motor cortex and how this representation changes with the loss of oral sensation. They examine the firing patterns of neurons in the orofacial parts of the primary motor cortex (MIo) and somatosensory cortex (SIo) in non-human primates (NHPs) during drinking and feeding tasks. While recording neural activity, they also tracked the kinematics of tongue movement using biplanar videoradiography of markers implanted in the tongue. Their findings indicate that most units in both MIo and SIo are directionally tuned during the drinking task. However, during the feeding task, directional tuning was more frequent in MIo units and less prominent in SIo units. Additionally, in some recording sessions, they blocked sensory feedback using bilateral nerve block injections, which resulted in fewer directionally tuned units and changes in the overall distribution of the preferred direction of the units.

      Strengths:

      The most significant strength of this paper lies in its unique combination of experimental tools. The authors utilized a video-radiography method to capture 3D kinematics of the tongue movement during two behavioral tasks while simultaneously recording activity from two brain areas. Moreover, they employed a nerve-blocking procedure to halt sensory feedback. This specific dataset and experimental setup hold great potential for future research on the understudied orofacial segment of the sensory-motor area.

      Weaknesses:

      Aside from the last part of the Results section, the majority of the analyses in this paper are focused on single units. I understand the need to characterize the number of single units that directly code for external variables like movement direction, especially for less-studied areas like the orofacial part of the sensory-motor cortex. However, as a field, our decade-long experience in the arm region of sensory-motor cortices suggests that many of the idiosyncratic behaviors of single units can be better understood when the neural activity is studied at the level of the state space of the population. By doing so, for the arm region, we were able to explain why units have "mixed selectivity" for external variables, why the tuning of units changes in the planning and execution phase of the movement, why activity in the planning phase does not lead to undesired muscle activity, etc. See (Gallego et al. 2017; Vyas et al. 2020; Churchland and Shenoy 2024) for a review. Therefore, I believe investigating the dynamics of the population activity in orofacial regions can similarly help the reader go beyond the peculiarities of single units and in a broader view, inform us if the same principles found in the arm region can be generalized to other segments of sensory-motor cortex.

      We thank and agree with the reviewer on the value of information gained from studying population activity. We also appreciate that population analyses have led to the understanding that individual neurons have “mixed selectivity”. We have shown previously that OSMCx neurons exhibit mixed selectivity in their population activity and clear separation between latent factors associated with gape and bite force levels (Arce-McShane FI, Sessle BJ, Ram Y, Ross CF, Hatsopoulos NG (2023) Multiple regions of primate orofacial sensorimotor cortex encode bite force and gape. Front Systems Neurosci. doi: 10.3389/fnsys.2023.1213279. PMID: 37808467 PMCID: 10556252), and chew-side and food types (Li Z & Arce-McShane FI (2023). Cortical representation of mastication in the primate orofacial sensorimotor cortex. Program No. NANO06.05. 2023 Neuroscience Meeting Planner. Washington, D.C.: Society for Neuroscience, 2023. Online.). 

      The primary goal of this paper was to characterize single units in the orofacial region and to do a follow-up paper on population activity. In the revised manuscript, we have now incorporated the results of population-level analyses. The combined results of the single unit and population analyses provide a deeper understanding of the cortical representation of 3D direction of tongue movements during natural feeding and drinking behaviors. 

      Further, for the nerve-blocking experiments, the authors demonstrate that the lack of sensory feedback severely alters how the movement is executed at the level of behavior and neural activity. However, I had a hard time interpreting these results since any change in neural activity after blocking the orofacial nerves could be due to either the lack of the sensory signal or, as the authors suggest, due to the NHPs executing a different movement to compensate for the lack of sensory information or the combination of both of these factors. Hence, it would be helpful to know if the authors have any hint in the data that can tease apart these factors. For example, analyzing a subset of nerve-blocked trials that have similar kinematics to the control.

      Thank you for bringing up this important point. We agree with the reviewer that any change in the neural activity may be attributed to lack of sensory signal or to compensatory changes or a combination of these factors. To tease apart these factors, we sampled an equal number of trials with similar kinematics for both control and nerve block feeding sessions. We added a clarifying description of this approach in the Results section of the revised manuscript: “To confirm this effect was not merely due to altered kinematics, we conducted parallel analyses using carefully subsampled trials with matched kinematic profiles from both control and nerve-blocked conditions.”

      Furthermore, we ran additional analysis for the drinking datasets by subsampling a similar distribution of drinking movements from each condition. We compared the directional tuning across an equal number of trials with a similar left-right angle of movement in the last 100 ms of the tongue trajectory, nearest the spout. These analyses that control for similar kinematics showed that there was still a decrease in the proportion of directionally modulated neurons with nerve block compared to the control. This confirms that the results may be attributed to the lack of tactile information. These are now integrated in the revised paper under Methods section: Directional tuning of single neurons, as well as Results section: Effects of nerve block: Decreased directional tuning of MIo and SIo neurons and Figure 10 – figure supplement 1.
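      The kinematic matching described in this response can be sketched as a binned subsampling step. The sketch below is one possible reading and not the authors' procedure: the 10-degree bin width, the function name `match_trials`, and the example angles are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def match_trials(control_angles, block_angles, bin_width_deg=10.0, seed=0):
    """Return trial indices per condition with equal counts in every angle bin."""
    rng = random.Random(seed)

    def bin_trials(angles):
        bins = defaultdict(list)
        for idx, angle in enumerate(angles):
            bins[int(angle // bin_width_deg)].append(idx)
        return bins

    control_bins = bin_trials(control_angles)
    block_bins = bin_trials(block_angles)
    keep_control, keep_block = [], []
    # Only bins that contain trials from both conditions can be matched.
    for b in sorted(set(control_bins) & set(block_bins)):
        n = min(len(control_bins[b]), len(block_bins[b]))
        keep_control += sorted(rng.sample(control_bins[b], n))
        keep_block += sorted(rng.sample(block_bins[b], n))
    return keep_control, keep_block

# Hypothetical left-right angles (degrees) of the final 100 ms of each trajectory.
control = [-25.0, -22.0, -3.0, 1.0, 4.0, 21.0, 24.0, 28.0]
block = [-24.0, 2.0, 3.0, 6.0, 8.0, 40.0]
keep_c, keep_b = match_trials(control, block)
```

      Tuning analyses run on `keep_c` and `keep_b` then compare conditions with the same number of trials and a similar angle distribution, so a drop in tuning is harder to attribute to altered movement statistics alone.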

      Reviewer #2 (Public review):

      Summary:

      This manuscript by Hosack and Arce-McShane examines the directional tuning of neurons in macaque primary motor (MIo) and somatosensory (SIo) cortex. The neural basis of tongue control is far less studied than, for example, forelimb movements, partly because the tongue's kinematics and kinetics are difficult to measure. A major technical advantage of this study is using biplanar video-radiography, processed with modern motion tracking analysis software, to track the movement of the tongue inside the oral cavity. Compared to prior work, the behaviors are more naturalistic (feeding and licking water from one of three spouts), although the animals were still head-fixed.

      The study's main findings are that:

      • A majority of neurons in MIo and a (somewhat smaller) percentage of SIo modulated their firing rates during tongue movements, with different modulations depending on the direction of movement (i.e., exhibited directional tuning). Examining the statistics of tuning across neurons, there was anisotropy (e.g., more neurons preferring anterior movement) and a lateral bias in which tongue direction neurons preferred that was consistent with the innervation patterns of tongue control muscles (although with some inconsistency between monkeys).

      • Consistent with this encoding, tongue position could be decoded with moderate accuracy even from small ensembles of ~28 neurons.

      • There were differences observed in the proportion and extent of directional tuning between the feeding and licking behaviors, with stronger tuning overall during licking. This potentially suggests behavioral context-dependent encoding.

      • The authors then went one step further and used a bilateral nerve block to the sensory inputs (trigeminal nerve) from the tongue. This impaired the precision of tongue movements and resulted in an apparent reduction and change in neural tuning in MIo and SIo.

      Strengths:

      The data are difficult to obtain and appear to have been rigorously measured, and provide a valuable contribution to this under-explored subfield of sensorimotor neuroscience. The analyses adopt well-established methods, especially from the arm motor control literature, and represent a natural starting point for characterizing tongue 3D direction tuning.

      Weaknesses:

      There are alternative explanations for some of the interpretations, but those interpretations are described in a way that clearly distinguishes results from interpretations, and readers can make their own assessments. Some of these limitations are described in more detail below.

      One weakness of the current study is that there is substantial variability in results between monkeys, and that only one session of data per monkey/condition is analyzed (8 sessions total). This raises the concern that the results could be idiosyncratic. The Methods mention that other datasets were collected, but not analyzed because the imaging pre-processing is very labor-intensive. While I recognize that time is precious, I do think in this case the manuscript would be substantially strengthened by showing that the results are similar on other sessions.

      We acknowledge the reviewer’s concern about inter-subject variability. Animal feeding and drinking behaviors are quite stable across sessions; thus, we do not think that additional sessions will address the concern that the results could be idiosyncratic. Each of the eight datasets analyzed here has sufficient neural and kinematic data to capture neural and behavioral patterns. Nevertheless, we performed some of the analyses on a second feeding dataset from Monkey R. The results from analyses on a subset of this data were consistent across datasets; for example, (1) similar proportions of directionally tuned neurons, (2) similar distances between population trajectories (t-test p > 0.9), and (3) a consistently smaller distance between Anterior-Posterior pairs than others in MIo (t-test p < 0.05) but not SIo (p > 0.1).

      This study focuses on describing directional tuning using the preferred direction (PD) / cosine tuning model popularized by Georgopoulos and colleagues for understanding neural control of arm reaching in the 1980s. This is a reasonable starting point and a decent first-order description of neural tuning. However, the arm motor control field has moved far past that viewpoint, and in some ways, an over-fixation on static representational encoding models and PDs held that field back for many years. The manuscript benefits from drawing the readers' attention (perhaps in the Discussion) to the fact that PDs are a very simple starting point for characterizing how cortical activity relates to kinematics, but that there is likely much richer population-level dynamical structure and that a more mechanistic, control-focused analytical framework may be fruitful. A good review of this evolution in the arm field can be found in Vyas S, Golub MD, Sussillo D, Shenoy K. 2020. Computation Through Neural Population Dynamics. Annual Review of Neuroscience. 43(1):249-75.

      Thank you for highlighting this important point. Research on orofacial movements hasn't progressed at the same pace as limb movement studies. Our manuscript focused specifically on characterizing the 3D directional tuning properties of individual neurons in the orofacial area—an analysis that has not been conducted previously for orofacial sensorimotor control. While we initially prioritized this individual neuron analysis, we recognize the value of broader population-level insights.

      Based on your helpful feedback, we have incorporated additional population analyses to provide a more comprehensive picture of orofacial sensorimotor control and expanded our discussion section. We appreciate your expertise in pushing our work to be more thorough and aligned with current neuroscience approaches.

      Can the authors explain (or at least speculate) why there was such a large difference in behavioral effect due to nerve block between the two monkeys (Figure 7)?

      We acknowledge this as a variable inherent to this type of experimentation. Previous studies have found large kinematic variation in the effect of oral nerve block as well as in the following compensatory strategies between subjects. Each animal’s biology and response to perturbation vary naturally. Indeed, our subjects exhibited different feeding behavior even in the absence of nerve block perturbation (see Figure 2 in Laurence-Chasen et al., 2022). This is why each individual serves as its own control.

      Do the analyses showing a decrease in tuning after nerve block take into account the changes (and sometimes reduction in variability) of the kinematics between these conditions? In other words, if you subsampled trials to have similar distributions of kinematics between Control and Block conditions, does the effect hold true? The extreme scenario to illustrate my concern is that if Block conditions resulted in all identical movements (which of course they don't), the tuning analysis would find no tuned neurons. The lack of change in decoding accuracy is another yellow flag that there may be a methodological explanation for the decreased tuning result.

      Thank you for bringing up this point. We accounted for the changes in the variability of the kinematics between the control and nerve block conditions in the feeding dataset where we sampled an equal number of trials with similar kinematics for both control and nerve block. However, we did not control for similar kinematics in the drinking task. In the revised manuscript, we have clarified this and performed similar analysis for the drinking task. We sampled a similar distribution of drinking movements from each condition. We compared the neural data from an equal number of trials with a similar left-right angle of movement in the last 100 ms of the tongue trajectory, nearest the spout. There was a decrease in the percentage of neurons that were directionally modulated (between 30 and 80%) with nerve block compared to the control. These results have been included in the revised paper under Methods section: Directional tuning of single neurons, as well as Results section: Effects of nerve block: Decreased directionality of MIo and SIo neurons.

      While the results from decoding using KNN did not show significant differences between decoding accuracies in control vs. nerve block conditions, the results from the additional factor analysis and decoding using LSTM were consistent with the decrease in directional tuning at the level of individual neurons.  

      The manuscript states that "Our results suggest that the somatosensory cortex may be less involved than the motor areas during feeding, possibly because it is a more ingrained and stereotyped behavior as opposed to tongue protrusion or drinking tasks". Could an alternative explanation be more statistical/technical in nature: that during feeding, there will be more variability in exactly what somatosensory afferent signals are being received from trial to trial (because slight differences in kinematics can have large differences in exactly where the tongue is and the where/when/how of what parts of it are touching other parts of the oral cavity)? This variability could "smear out" the apparent tuning using these types of trial-averaged analyses. Given how important proprioception and somatosensation are for not biting the tongue or choking, the speculation that somatosensory cortical activity is suppressed during feedback is very counter-intuitive to this reviewer.

      Thank you for bringing up this point. We have now incorporated this in our revised Discussion (see Comparison between MIo and SIo). We agree with the reviewer that trial-by-trial variability in the afferent signals may account for the lower directional signal in SIo during feeding than in drinking. Indeed, SIo’s mean-matched Fano factor in feeding was significantly higher than that in drinking (Author response image 1). Moreover, the results of the additional population and decoding analyses also support this.

      Author response image 1.

      Comparison of mean-matched Fano factor between SIo neurons during feeding and drinking control tasks across both subjects (Wilcoxon rank sum test, p < 0.001).

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors aim to uncover how 3D tongue direction is represented in the Motor (M1o) and Somatosensory (S1o) cortex. In non-human primates implanted with chronic electrode arrays, they use X-ray-based imaging to track the kinematics of the tongue and jaw as the animal is either chewing food or licking from a spout. They then correlate the tongue kinematics with the recorded neural activity. Using linear regressions, they characterize the tuning properties and distributions of the recorded population during feeding and licking. Then, they recharacterize the tuning properties after bilateral lidocaine injections in the two sensory branches of the trigeminal nerve. They report that their nerve block causes a reorganization of the tuning properties. Overall, this paper concludes that M1o and S1o both contain representations of the tongue direction, but their numbers, their tuning properties, and susceptibility to perturbed sensory input are different.

      Strengths:

      The major strengths of this paper are in the state-of-the-art experimental methods employed to collect the electrophysiological and kinematic data.

      Weaknesses:

      However, this paper has a number of weaknesses in the analysis of this data.

      It is unclear how reliable the neural responses are to the stimuli. The trial-by-trial variability of the neural firing rates is not reported. Thus, it is unclear if the methods used for establishing that a neuron is modulated and tuned to a direction are susceptible to spurious correlations. The authors do not use shuffling or bootstrapping tests to determine the robustness of their fits or determining the 'preferred direction' of the neurons. This weakness colors the rest of the paper.

      Thank you for raising these points. We have performed the following additional analyses: (1) We have added analyses to ensure that the results could not be explained by neural variability. To show the trial-by-trial variability of the neural firing rates, we have calculated the Fano factor (mean overall = 1.34747; control = 1.46471; nerve block = 1.23023). The distribution was similar across directions, suggesting that responses of MIo and SIo neurons to varying 3D directions were reliable. (2) We have used a bootstrap procedure to ensure that directional tuning cannot be explained by mere chance. (3) To test the robustness of our PDs we also performed a bootstrap test, which yielded the same results for >90% of neurons, and a multiple linear regression test for fit to a cosine-tuning function. In the revised manuscript, the Methods and Results sections have been updated to include these analyses.  

      Author response image 2.

      Comparison of Fano Factor across directions for MIo and SIo Feeding Control (Kruskal-Wallis, p > 0.7).
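      As background for the variability measure reported above, the plain Fano factor is the variance of spike counts across trials divided by their mean. The sketch below illustrates that ratio only; it is not the authors' code and does not implement the mean-matched variant mentioned earlier in the response. The spike counts and direction labels are hypothetical.

```python
from statistics import mean, variance

def fano_factor(spike_counts):
    """Variance-to-mean ratio of spike counts across repeated trials."""
    mu = mean(spike_counts)
    if mu == 0:
        raise ValueError("Fano factor is undefined for a silent neuron")
    return variance(spike_counts) / mu  # sample variance (n - 1 denominator)

# Hypothetical spike counts in a fixed window, grouped by movement direction.
counts_by_direction = {
    "anterior": [4, 6, 5, 7, 3],
    "posterior": [2, 3, 1, 2, 2],
}
fano = {d: fano_factor(c) for d, c in counts_by_direction.items()}
```

      A Poisson process gives a Fano factor of 1, so values near 1 across directions would indicate comparable trial-to-trial reliability regardless of movement direction.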

      The authors compare the tuning properties during feeding to those during licking but only focus on the tongue-tip. However, the two behaviors are different also in their engagement of the jaw muscles. Thus many of the differences observed between the two 'tasks' might have very little to do with an alteration in the properties of the neural code - and more to do with the differences in the movements involved.

      Using the tongue tip for the kinematic analysis of tongue directional movements was a deliberate choice as the anterior region of the tongue is highly mobile and sensitive due to a higher density of mechanoreceptors. The tongue tip is the first region that touches the spout in the drinking task and moves the food into the oral cavity for chewing and subsequent swallowing. 

      We agree with the reviewer that the jaw muscles are engaged differently in feeding vs. drinking (see Fig. 2). For example, a wider variety of jaw movements along the three axes are observed in feeding compared to the smaller amplitude and mostly vertical jaw movements in drinking. Also, the tongue movements are very different between the two behaviors. In feeding, the tongue moves in varied directions to position the food between left-right tooth rows during chewing, whereas in the drinking task, the tongue moves to discrete locations to receive the juice reward. Moreover, the tongue-jaw coordination differs between tasks; maximum tongue protrusion coincides with maximum gape in drinking but with minimum gape in the feeding behavior. Thus, the different tongue and jaw movements required in each behavior may account for some of the differences observed in the directional tuning properties of individual neurons and population activity. These points have been included in the revised Discussion.

      Author response image 3.

      Tongue tip position (mm) and jaw pitch (degrees) during feeding (left) and drinking (right) behaviors. The most protruded tongue position coincides with minimum gape (jaw pitch at 0°) during feeding but with maximum gape during drinking.

      Many of the neurons are likely correlated with both jaw movements and tongue movements - this complicates the interpretations and raises the possibility that the differences in tuning properties across tasks are trivial.

      We thank the reviewer for raising this important point. In fact, we verified in a previous study whether the correlation between the tongue and jaw kinematics might explain differences in the encoding of tongue kinematics and shape in MIo (see Supplementary Fig. 4 in Laurence-Chasen et al., 2023): “Through iterative sampling of sub-regions of the test trials, we found that correlation of tongue kinematic variables with mandibular motion does not account for decoding accuracy. Even at times where tongue motion was completely un-correlated with the jaw, decoding accuracy could be quite high.” 

      The results obtained from population analyses showing distinct properties of population trajectories in feeding vs. drinking behaviors provide strong support to the interpretation that directional information varies between these behaviors.

      The population analyses for decoding are rudimentary and provide very coarse estimates (left, center, or right), it is also unclear what the major takeaways from the population decoding analyses are. The reduced classification accuracy could very well be a consequence of linear models being unable to account for the complexity of feeding movements, while the licking movements are 'simpler' and thus are better accounted for.

      We thank the reviewer for raising this point. The population decoding analyses provide additional insight on the directional information in population activity, as well as a point of comparison with the results of numerous decoding studies on the arm region of the sensorimotor cortex. In the revised version, we have included the results from decoding tongue direction using a long short-term memory (LSTM) network for sequence-to-sequence decoding. These results differed from the KNN results, indicating that a linear model such as KNN was better for drinking and that a non-linear and continuous decoder was better suited for feeding. These results have been included in the revised manuscript.

      The nature of the nerve block and what sensory pathways are being affected is unclear - the trigeminal nerve contains many different sensory afferents - is there a characterization of how effectively the nerve impulses are being blocked? Have the authors confirmed or characterized the strength of their inactivation or block? I was unable to find any electrophysiological evidence characterizing the perturbation.

      The strength of the nerve block is characterized by a decrease in the baseline firing rate of SIo neurons, as shown in Supplementary Figure 6 of “Loss of oral sensation impairs feeding performance and consistency of tongue–jaw coordination” (Laurence-Chasen et al., 2022).

      Overall, while this paper provides a descriptive account of the observed neural correlations and their alteration by perturbation, a synthesis of the observed changes and some insight into neural processing of tongue kinematics would strengthen this paper.

      We thank the reviewer for this suggestion. We have revised the Discussion to provide a synthesis of the results and insights into the neural processing of tongue kinematics.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The procedure for anesthesia explained in the method section was not clear to me. The following information was missing: what drug/dose was used? How long the animal was under anesthesia? How long after the recovery the experiments were done?

      The animals were fully sedated with ketamine (100 mg/ml, 10 mg/kg) for less than 30 minutes, and all of the data was collected within 90 minutes after the nerve block was administered.

      (2) In Figure 10, panels A and B are very close together, it was not at first clear whether the text "Monkey R, Monkey Y" belongs to panel A or B.

      We have separated the two panels further in the revised figure.

      (3) I found Figure 11 very busy and hard to interpret. Separating monkeys, fitting the line for each condition, or using a bar plot can help with the readability of the figure.

      Thank you for the suggestion. We agree with you and have reworked this figure. To simplify it we have shown the mean accuracy across iterations.

      (4) I found the laterality discussions like "This signifies that there are more neurons in the left hemisphere contributes toward one direction of tongue movement, suggesting that there is some laterality in the PDs of OSMCx neurons that varies between individuals" a bit of an over-interpretation of the data, given the low n value and the dissimilarity in how strongly the nerve blocking altered the monkeys' behavior.

      Thank you for sharing this viewpoint. We do think that laterality is a good point of comparison with studies on M1 neurons in the arm/hand region. In our study, we found that the peak of the PD distribution coincides with leftward tongue movements in feeding. The distribution of PDs provides insight into how tongue muscles are coordinated during movement. Intrinsic and extrinsic tongue muscles are involved in shaping the tongue (e.g., elongation, broadening) and positioning the tongue (e.g., protrusion/retraction, elevation/depression), respectively. These muscles receive bilateral motor innervation except for genioglossus. Straight tongue protrusion requires the balanced action of the right and left genioglossi while the lateral protrusion involves primarily the contralateral genioglossus. Given this unilateral innervation pattern, we hypothesized that left MIo/SIo neurons would preferentially respond to leftward tongue movements, corresponding to right genioglossus activation. 

      Reviewer #2 (Recommendations for the authors):

      Is the observation that tuning peaks occur most frequently toward the anterior and superior directions consistent with the statistics of the movements the tongue typically makes? This could be analogous to anisotropies previously reported in the arm literature, e.g., Lillicrap TP, Scott SH. 2013. Preference Distributions of Primary Motor Cortex Neurons Reflect Control Solutions Optimized for Limb Biomechanics. Neuron. 77(1):168-79.

      Thank you for bringing our attention to analogous findings by Lillicrap & Scott, 2013. Indeed, we do observe the highest number of movements in the Anterior Superior directions, followed by the Posterior Inferior. This does align with the distribution of tuning peaks that we observed. Author response image 4 shows the proportions of observed movements in each group of directions across all feeding datasets. We have incorporated this data in the Results section: Neuronal modulation patterns differ between MIo and SIo, as well as added this point in the Discussion.

      Author response image 4.

      Proportion of feeding trials in each group of directions. Error bars represent ±1 standard deviation across datasets (n = 4).

      "The Euclidean distance was used to identify nearest neighbors, and the number of nearest neighbors used was K = 7. This K value was determined after testing different Ks which yielded comparable results." In general, it's a decoding best practice to tune hyperparameters (like K) on fully held-out data from the data used for evaluation. Otherwise, this tends to slightly inflate performance because one picks the hyperparameter that happened to give the best result. It sounds like that held-out validation set wasn't used here. I don't think that's going to change the results much at all (especially given the "comparable results" comment), but providing this suggestion for the future. If the authors replicate results on other datasets, I suggest they keep K = 7 to lock in the method.

      K = 7 was chosen based on the size of our smallest training dataset (n = 55). The purpose of testing different K values was not to select which value gave the best result, but to demonstrate that similar K values did not affect the results significantly. We tested the different K values on a subset of the feeding data, but that data was not fully held-out from the training set. We will keep your suggestion in mind for future analysis.
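
      The held-out validation practice raised in this exchange can be sketched as follows. This is a minimal illustration, not the authors' decoding pipeline: the synthetic two-feature data, the 60/20/20 split, the candidate K values, and every function name are assumptions introduced here. K is chosen on a validation set, and accuracy is then reported once on a test set that played no part in that choice.

```python
import math
import random
from collections import Counter


def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k):
    """Fraction of `test` samples whose label is predicted correctly from `train`."""
    hits = sum(knn_predict(train, x, y_k) == y for (x, y), y_k in ((s, k) for s in test))
    return hits / len(test)


def split_three_ways(samples, seed=0):
    """Shuffle and split into train (60%), validation (20%), and test (20%) sets."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train = int(0.6 * len(shuffled))
    n_val = int(0.2 * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def make_toy_data(n_per_class=40, seed=1):
    """Synthetic two-feature vectors for two hypothetical direction classes (illustrative only)."""
    rng = random.Random(seed)
    data = []
    for label, centre in (("anterior", (2.0, 0.0)), ("posterior", (-2.0, 0.0))):
        for _ in range(n_per_class):
            point = (rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0))
            data.append((point, label))
    return data


samples = make_toy_data()
train, val, test = split_three_ways(samples)

# Choose K using the validation set only; the test set is never consulted here.
candidate_ks = [3, 5, 7, 9]
best_k = max(candidate_ks, key=lambda k: accuracy(train, val, k))

# Report performance once, on data that played no part in choosing K.
test_accuracy = accuracy(train, test, best_k)
print(best_k, round(test_accuracy, 3))
```

      Fixing K in advance (for example K = 7, as the reviewer suggests for future datasets) removes the selection step entirely, in which case the validation split is no longer needed.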

      The smoothing applied to Figure 2 PSTHs appears perhaps excessive (i.e., it may be obscuring interesting finer-grained details of these fast movements). Can the authors reduce the 50 ms Gaussian smoothing (I assume this is the s.d.)? A value of ~25 ms is often used in studying arm kinematics. It also looks like the movement-related modulation may not be finished in these 200 ms / 500 ms windows. I suggest extending the shown time window. It would also be helpful to show some trial-averaged behavior (e.g. speed or % displacement from start) under or behind the PSTHs, to give a sense of what phase of the movement the neural activity corresponds to.

      Thank you for the suggestion. We have taken your suggestions into consideration and modified Figure 2 accordingly. We decreased the Gaussian kernel to 25 ms and extended the time window shown. The trial-averaged anterior/posterior displacement was also added to the drinking PSTHs.
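
      To illustrate why the kernel width matters, here is a minimal sketch of Gaussian smoothing of a binned spike train. It assumes 1 ms bins and a kernel truncated at ±3 s.d.; the function names and the single-spike example are hypothetical and are not taken from the authors' analysis code. A narrower kernel (25 ms s.d.) keeps a sharper, taller peak than a wider one (50 ms s.d.), which is the reviewer's concern about fine temporal detail.

```python
import math


def gaussian_kernel(sd_ms, bin_ms=1.0, half_width_sd=3.0):
    """Return a Gaussian kernel sampled every `bin_ms`, normalised to sum to 1."""
    half = int(round(half_width_sd * sd_ms / bin_ms))
    weights = [math.exp(-0.5 * ((i * bin_ms) / sd_ms) ** 2)
               for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(counts, kernel):
    """Convolve `counts` with `kernel`, renormalising near the edges
    so that edge bins are not biased toward zero."""
    half = len(kernel) // 2
    out = []
    for t in range(len(counts)):
        acc, norm = 0.0, 0.0
        for j, w in enumerate(kernel):
            idx = t + j - half
            if 0 <= idx < len(counts):
                acc += w * counts[idx]
                norm += w
        out.append(acc / norm)
    return out


# Hypothetical example: one spike at 100 ms in a 201 ms window of 1 ms bins.
spikes = [0.0] * 201
spikes[100] = 1.0

narrow = smooth(spikes, gaussian_kernel(sd_ms=25.0))
wide = smooth(spikes, gaussian_kernel(sd_ms=50.0))

# The narrower kernel preserves a taller, more localised peak.
print(max(narrow) > max(wide))
```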

      Reviewer #3 (Recommendations for the authors):

      The major consideration here is that the data reported for feeding appears to be very similar to that reported in a previous study:

      "Robust cortical encoding of 3D tongue shape during feeding in macaques"

      Are the neurons reported here the same as the ones used in this previous paper? It is deeply concerning that this is not reported anywhere in the methods section.

      These are the same neurons as in our previous paper, though here we include several additional datasets of the nerve block and drinking sessions. We have now included this in the methods section.

      Second, I strongly recommend that the authors consider a thorough rewrite of this manuscript and improve the presentation of the figures. As written, it was not easy to follow the paper, the logic of the experiments, or the specific data being presented in the figures.

      Thank you for this suggestion. We have done an extensive rewrite of the manuscript and revision of the figures.

      A few recommendations:

      (1) Please structure your results sections and use descriptive topic sentences to focus the reader. In the current version, it is unclear what the major point being conveyed for each analysis is.

      Thank you for this suggestion. We have added topic sentences to begin each section of the Results.

      (2) Please show raster plots for at least a few example neurons so that the readers have a sense of what the neural responses look like across trials. Is all of Figure 2 one example neuron or are they different neurons? Error bars for PETH would be useful to show the reliability and robustness of the tuning.

      Figure 2 shows different neurons, one from MIo and one from SIo for each task. There is shading showing ±1 standard error around the line for each direction, however this was a bit difficult to see. In addition to the other changes we have made to these figures, we made the lines smaller and darkened the error bar shading to accentuate this. We also added raster plots corresponding to the same neurons represented in Figure 2 as a supplement.

      (3) Since there are only two data points, I am not sure I understand why the authors have bar graphs and error bars for graphs such as Figure 3B, Figure 5B, etc. How can one have an error bar and means with just 2 data points?

      Those bars represent the standard error of the proportion. We have changed the y-axis label on these figures to make this clearer.
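
      For readers unfamiliar with the quantity, the standard error of a proportion can be computed from a single count, which is why an error bar can exist even when only two subjects are plotted. The sketch below uses the usual binomial formula sqrt(p(1 - p)/n); the counts are hypothetical and the function name is introduced here for illustration only.

```python
import math


def proportion_se(successes, total):
    """Return (p, SE) where p = successes/total and SE = sqrt(p * (1 - p) / total)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    return p, math.sqrt(p * (1.0 - p) / total)


# Hypothetical example: 30 of 120 recorded neurons are directionally tuned.
p, se = proportion_se(30, 120)
print(round(p, 3), round(se, 4))
```

      The error bar here reflects sampling uncertainty in the proportion within one subject's neuronal population, not variability across subjects.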

      (4) Results in Figure 6 could be due to differential placement of the electrodes across the animals. How is this being accounted for?

      Yes, this is a possibility which we have mentioned in the discussion. Even with careful placement there is no guarantee to capture a set of neurons with the exact same function in two subjects, as every individual is different. Rather we focus on analyses of data within the same animal. The purpose of Figure 6 is to show the difference between MIo and SIo, and between the two tasks, within the same subject. The more salient result from calculating the preferred direction is that there is a change in the distribution between control and nerve block within the same exact population. Discussions relating to the comparison between individuals are speculative and cannot be confirmed without the inclusion of many more subjects.

      (5) For Figure 7, I would recommend showing the results of the Sham injection in the same figure instead of a supplement.

      Thank you for the suggestion, we have added these results to the figure.

      (6) I think the effects of the sensory block on the tongue kinematics are underexplored in Figure 7 and Figure 8. The authors could explore the deficits in tongue shape, and the temporal components of the trajectory.

      Some of these effects on feeding have been explored in a previous paper, Laurence-Chasen et al., 2022. We performed some additional analyses on changes to kinematics during drinking, including the number of licks per 10-second trial and the length of individual licks. The results of these are included below. We also calculated the difference in the speed of tongue movement during drinking, which generally decreased and exhibited an increase in variance with nerve block (F-test, p < 0.001). However, we have not included these figures in the main paper as they do not inform us about directionality.

      Author response image 5.

      Left halves of hemi-violins (black) are control and right halves (red) are nerve block for an individual. Horizontal black lines represent the mean and horizontal red lines the median. Results of two-tailed t-test and f-test are indicated by asterisks and crosses, respectively: *,† p < 0.05; **,†† p < 0.01; ***,††† p < 0.001.

      (9) In Figures 9 and 10, are the same neurons being recorded before and after the nerve block? It is unclear if the overall "population" properties are different, or if the properties of individual neurons are changing due to the nerve block.

      Yes, the same neurons are being recorded before and after nerve block. Specifically, Figure 9B shows that the properties of many individual neurons do change due to the nerve block. Differences in the overall population response may be attributed to some of the units having reduced/no activity during the nerve block session.

      Additionally, I recommend that the authors improve their introduction and provide more context to their discussion. Please elaborate on what you think are the main conceptual advances in your study, and place them in the context of the existing literature. By my count, there are 26 citations in this paper, 4 of which are self-citations - clearly, this can be improved upon.

      Thank you for this suggestion. We have done an extensive rewrite of the Introduction and Discussion. We now discuss the main conceptual advances in our study and place them in the context of the existing literature.

    1. eLife Assessment

      The authors describe an interesting approach to studying the dynamics and function of membrane proteins in different lipid environments. The important findings have theoretical and practical implications beyond the study of EGFR to all membrane signalling proteins. The evidence supporting the conclusions is convincing, based on the use of a nanodisk system to study membrane proteins in vitro, combined with state-of-the-art single-molecule FRET. The work will be of broad interest to cell biologists and biochemists.

    2. Reviewer #1 (Public review):

      Summary:

      This work addresses a key question in cell signalling: how does the membrane composition affect the behaviour of a membrane signalling protein? Understanding this is important, not just to understand basic biological function but because membrane composition is highly altered in diseases such as cancer and neurodegenerative disease. Although parts of this question have been addressed on fragments of the target membrane protein, EGFR, used here, Srinivasan et al. harness a unique tool, membrane nanodisks, which allow them to probe full-length EGFR in vitro in great detail with cutting-edge fluorescent tools. They find interesting impacts on EGFR conformation in differently charged and fluid membranes, explaining previously identified signalling phenotypes.

      Strengths:

      The nanodisk system enables full-length EGFR to be studied in vitro and in a membrane with varying lipid and cholesterol concentrations. The authors combine this with single-molecule FRET utilising multiple pairs of fluorophores at different places on the protein to probe different conformational changes in response to EGF binding under different anionic lipid and cholesterol concentrations. They further support their findings using molecular dynamics simulations, which help uncover the full atomistic detail of the conformations they observe.

      Weaknesses:

      Much of the interpretation of the results comes down to a bimodal model of an 'open' and 'closed' state between the intracellular tail of the protein and the membrane. Some of the data looks like a bimodal model is appropriate, but its use is not sufficiently justified (statistically or otherwise) in this work in its current form. The experiments with varying cholesterol in particular appear to suggest an alternate model with longer fluorescent lifetimes. More justification of these interpretations of the central experiment of this work would strengthen the paper.

    3. Reviewer #2 (Public review):

      Summary:

      Nanodiscs and synthesized EGFR are co-assembled directly in cell-free reactions. Nanodiscs containing membranes with different lipid compositions are obtained by providing liposomes with corresponding lipid mixtures in the reaction. The authors focus on the effects of lipid charge and fluidity on EGFR activity.

      Strengths:

      The authors implement a variety of complementary techniques to analyze data and to verify results. They further provide a new pipeline to study lipid effects on membrane protein function.

      Weaknesses:

      Due to the relative novelty of the approach, a number of concerns remain.

      (1) I am a little skeptical about the good correlation of the nanodisc compositions with the liposome compositions. I would rather have expected a kind of clustering of individual lipid types in the liposome membrane, in particular of cholesterol. This should then result in an uneven distribution upon nanodisc assembly, i.e., in a notable variation of lipid composition in the individual nanodiscs. Could this be ruled out by the implemented assays, or can just the overall lipid composition of the complete nanodisc fraction be analyzed?

      (2) Both templates have been added simultaneously, with a 100-fold excess of the EGFR template. Was this the result of optimization? How is the kinetics of protein production? As EGFR is in far excess, a significant precipitation, at least in the early period of the reaction, due to limiting nanodiscs, should be expected. How is the oligomeric form of the inserted EGFR? Have multiple insertions into one nanodisc been observed?

      (3) The IMAC purification does not discriminate between EGFR-filled and empty nanodiscs. Does the TEM study give any information about the composition of the particles (empty, EGFR monomers, or EGFR oligomers)? Normalizing the measured fluorescence, i.e., the total amount of solubilized receptor, with the total protein concentration of the samples could give some data on the stoichiometry of EGFR and nanodiscs.

      (4) The authors generally assume a 100% functional folding of EGFR in all analyzed environments. While this could be the case, with some other membrane proteins, it was shown that only a fraction of the nanodisc solubilized particles are in functional conformation. Furthermore, the percentage of solubilized and folded membrane protein may change with the membrane composition of the supplied nanodiscs, while non-charged lipids mostly gave rather poor sample quality. The authors normalize the ATP binding to the total amount of detectable EGFR, and variations are interpreted as suppression of activity. Would the presence of unfolded EGFR fractions in some samples with no access to ATP binding be an alternative interpretation?

    1. eLife Assessment

      In this valuable study, through carefully executed and rigorously controlled experiments, the authors challenged a previously reported role of the Death Receptor 6 (DR6/Tnfrsf21) in Wallerian degeneration (WD). Using two DR6 knockout mouse lines and multiple WD assays, both in vitro and in vivo, the authors provided convincing evidence that loss of DR6 in mice does not protect peripheral axons from WD after injury. Questions remain about whether this conclusion is generalizable to CNS axonal degeneration in disease models such as ALS, AD, and prion diseases. In addition, the authors need to provide information about the sex, age, and genetic background of their animal studies to allow readers to better assess the basis for inconsistencies from previous reports on the protective effects of DR6.

    2. Reviewer #1 (Public review):

      Summary:

      The authors show that genetic deletion of the orphan tumor necrosis factor receptor DR6 in mice does not protect peripheral axons against degeneration after axotomy. Similarly, Schwann cells in DR6 mutant mice react to axotomy similarly to wild-type controls. These negative results are important because previous work has indicated that loss or inhibition of DR6 is protective in disease models and also against Wallerian degeneration of axons following injury. This carefully executed counterexample is important for the field to consider.

      Strengths:

      A strength of the paper is the use of two independent mouse strains that knock out DR6 in slightly different ways. The authors confirm that DR6 mRNA is absent in these models (western blots for DR6 protein are less convincingly null, but given the absence of mRNA, this is likely an issue of antibody specificity). One of the DR6 knockout strains used is the same strain used in a previous paper examining the effects of DR6 on Wallerian degeneration.

      The authors use a series of established assays to evaluate axon degeneration, including light and electron microscopy on nerve histological samples and cultured dorsal root ganglion neurons in which axons are mechanically severed and degeneration is scored in time-lapse microscopy. These assays consistently show a lack of effect of loss of DR6 on Wallerian degeneration in both mouse strains examined.

      Therefore, in the specific context of these experiments, the authors' data support their conclusion that loss of DR6 does not protect against Wallerian degeneration.

      Weaknesses:

      The major weaknesses of this paper include the tone of correcting previously erroneous results and the lack of reporting on important details around animal experiments that would help determine whether the results here really are discordant with previous studies, and if so, why.

      The authors do not report the genetic strain background of the mice used, the sex distributions of their experimental cohorts, or the age of the mice at the time the experiments were performed. All of these are important variables.

      The DR6 knockout strain reported in Gamage et al. (2017) was on a C57BL/6.129S segregating background. Gamage et al. reported that loss of DR6 protected axons from Wallerian degeneration for up to 4 weeks, but importantly, in only 38.5% (5 out of 13) of the mice they examined. In the present paper, the authors speculate on possible causes for differences between the lack of effect seen here and the effects reported in Gamage et al., including possible spontaneous background mutations, epigenetic changes, genetic modifiers, neuroinflammation, and environmental differences. A likely explanation of the incomplete penetrance reported by Gamage et al. is the segregating genetic background and the presence of modifier loci between C57BL/6 and 129S. The authors do not report the genetic background of the mice used in this study, other than to note that the knockout strain was provided by the group in Gamage et al. However, if, for example, that mutation has been made congenic on C57BL/6 in the intervening years, this would be important to know. One could also argue that the results presented here are consistent with 8 out of 13 mice presented in Gamage et al.

      Age is also an important variable. The protective effects of the spontaneous WldS mutation decrease with age, for example. It is unclear whether the possible protective effects of DR6 also change with age; perhaps this could explain the variable response seen in Gamage et al. and the lack of response seen here.

      It is unclear if sex is a factor, but this is part of why it should be reported.

      The authors also state that they do not see differences in the Schwann cell response to injury in the absence of DR6 that were reported in Gamage et al., but this is not an accurate comparison. In Gamage et al., they examined Schwann cells around axons that were protected from degeneration 2 and 4 weeks post-injury. Those axons had much thinner myelin, in contrast to axons protected by WldS or loss of Sarm1, where the myelin thickness remained relatively normal. Thus, Gamage et al. concluded that the protection of axons from degeneration and the preservation of Schwann cell myelin thickness are separate processes. Here, since no axon protection was seen, the same analysis cannot be done, and we can only say that when axons degenerate, the Schwann cells respond the same whether DR6 is expressed or not.

      The authors also take issue with Colombo et al. (2018), where it was reported that there is an increase in axon diameter and a change in the g-ratio (axon diameter to fiber diameter - the axon + myelin) in peripheral nerves in DR6 knockout mice. This change resulted in a small population of abnormally large axons that had thinner myelin than one would expect for their size. The change in g-ratio was specific to these axons and driven by the increased axon diameter, not decreased myelin thickness, although those two factors are normally loosely correlated. Here, the authors report no changes in axon size or g-ratio, but this could also be due to how the distribution of axon sizes was binned for analysis, and looking at individual data points in supplemental figure 3A, there are axons in the DR6 knockout mice that are larger than any axons in wild type. Thus, this discrepancy may be down to specifics and how statistics were performed or how histograms were binned, but it is unclear if the results presented here are dramatically at odds with the results in Colombo et al. (2018).
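
      The point that a g-ratio shift can be driven by axon diameter alone, with myelin thickness unchanged, can be made concrete with a short calculation. This sketch uses the definition given above (axon diameter divided by fiber diameter) and assumes a circular cross-section, so that fiber diameter equals axon diameter plus twice the myelin thickness. The numerical values are hypothetical and are not measurements from either study.

```python
def g_ratio(axon_diameter_um, myelin_thickness_um):
    """g-ratio = axon diameter / fiber diameter, with
    fiber diameter = axon diameter + 2 * myelin thickness (circular cross-section assumed)."""
    if axon_diameter_um <= 0 or myelin_thickness_um < 0:
        raise ValueError("axon diameter must be positive and myelin thickness non-negative")
    fiber_diameter_um = axon_diameter_um + 2.0 * myelin_thickness_um
    return axon_diameter_um / fiber_diameter_um


# Hypothetical axons with the same 1 um myelin sheath but different calibres.
typical = g_ratio(4.0, 1.0)   # 4 / 6
enlarged = g_ratio(8.0, 1.0)  # 8 / 10

# A larger axon with unchanged myelin thickness has a higher g-ratio.
print(round(typical, 3), round(enlarged, 3))
```

      Because a small number of enlarged axons can shift only the upper tail of the distribution, whether such a change is detected may depend on how axon diameters are binned, as noted above.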

      Finally, it is important to note that previously reported effects of DR6 inhibition, such as protection of cultured cortical neurons from beta-amyloid toxicity, are not necessarily the same as Wallerian degeneration of axons distal to an injury studied here. The negative results presented here, showing that loss of DR6 is not protective against Wallerian degeneration induced by injury, are important given the interest in DR6 as a therapeutic target, but they are specific to these mice and this mechanism of induced axon degeneration. The extent to which these findings contradict previous work is difficult to assess due to the lack of detail in describing the mouse experiments, and care should be taken in attempting to extrapolate these results to other disease contexts, such as ALS or Alzheimer's disease.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript by Beirowski, Huang, and Babetto revisits the proposed role of Death Receptor 6 (DR6/Tnfrsf21) in Wallerian degeneration (WD). A prior study (Gamage et al., 2017) suggested that DR6 deletion delays axon degeneration and alters Schwann cell responses following peripheral nerve injury. Here, the authors comprehensively test this claim using two DR6 knockout mouse models (the line used in the earlier report plus a CMV-Cre derived floxed knockout line) and multiple WD assays in vivo and in vitro, aligned with three positive controls: Sarm1, WldS, and Phr1/Mycbp2 mutants. Contrary to the prior findings, they find no evidence that DR6 deletion affects axon degeneration kinetics or Schwann cell dynamics (assessed by cJun expression or [intact+degenerating] myelin abundance after injury) during WD. Importantly, in DRG explant assays, neurites from DR6-deficient mice degenerated at rates indistinguishable from controls. The authors conclude that DR6 is dispensable for WD, and that previously reported protective effects may have been due to confounding factors such as genetic background or spontaneous mutations.

      Strengths:

      The authors employ two independently generated DR6 knockout models, one overlapping with the previously published study, and confirm loss of DR6 expression by qPCR and Western blotting.

      Multiple complementary readouts of WD are applied (structural, ultrastructural, molecular, and functional), providing a robust test of the hypothesis.

      Comparisons are drawn with established positive controls (WldS, SARM1, Phr1/Mycbp2 mutants), reinforcing the validity of the assays.

      By directly addressing an influential but inconsistent prior report, the manuscript clarifies the role of DR6 and prevents potential misdirection of therapeutic strategies aimed at modulating WD in the PNS. The discussion thoughtfully considers possible explanations for the earlier results, including colony-specific second-site mutations that could explain the incomplete penetrance of the earlier reported phenotype of only 36%.

      Weaknesses:

      (1) The study focuses on peripheral nerves. The manuscript frequently refers to CNS studies to argue for consistency with their findings. It would be more accurate to frame PNS/CNS similarities as reminiscences rather than as consistencies (e.g., line 205ff in the Discussion).

      (2) The DRG explant assays are convincing, though the slight acceleration of degeneration in the DR6 floxed/Cre condition is intriguing (Figure 4E). Could the authors clarify whether this is statistically robust or biologically meaningful?

      (3) In the summary (line 43), the authors refer to Hu et al. (2013) (reference 5) as the study that previously reported AxD delay and SC response alteration after injury. However, this study did not investigate the PNS, and I believe the authors intended to reference Gamage et al. (2017) (reference 10) at this point.

      (4) In line 74ff of the results section, the authors claim that developmental myelination is not altered in DR6 mutants at postnatal day 1. However, the variability in Figure S2 appears substantial, and the group size seems underpowered to support this claim. Colombo et al. (2018) (reference 11) reported accelerated myelination at P1, but this study likewise appears underpowered. Possible reasons for these discrepancies and the large variability could be that only a defined cross-sectional area was quantified, rather than the entire nerve cross-section.

      (5) The authors stress the data of Gamage et al. (2017) on altered SC responses in DR6 mutants after injury. They employed cJun quantification to show that SC reprogramming after injury is not altered in DR6 mutants. This approach is valid and the conclusion trustworthy. Here, the addition of data showing the combined abundance of intact and degenerated myelin does not add much insight. However, Gamage et al. (2017) reported altered myelin thickness in a subset of axons at 14 days after injury, which is considerably later than the time points analyzed in the present study. While, in the Reviewer's view, the thin myelin observed by Gamage et al. in fact resembles remyelination, the authors may wish to highlight the difference in the time points analyzed.

    4. Reviewer #3 (Public review):

      Summary:

      The authors revisit the role of DR6 in axon degeneration following physical injury (Wallerian degeneration), examining both its effects on axons and its role in regulating the Schwann cell response to injury. Surprisingly, and in contrast to previous studies, they find that DR6 deletion does not delay the rate of axon degeneration after injury, suggesting that DR6 is not a mediator of this process.

      Overall, this is a valuable study. As the authors note, the current literature on DR6 is inconsistent, and these results provide useful new data and clarification. This work will help other researchers interpret their own data and re-evaluate studies related to DR6 and axon degeneration.

      Strengths:

      (1) The use of two independent DR6 knockout mouse models strengthens the conclusions, particularly when reporting the absence of a phenotype.

      (2) The focus on early time points after injury addresses a key limitation of previous studies. This approach reduces the risk of missing subtle protective phenotypes and avoids confounding results with regenerating axons at later time points after axotomy.

      Weaknesses:

      (1) The study would benefit from including an additional experimental paradigm in which DR6 deficiency is expected to have a protective effect, to increase confidence in the experimental models, and to better contextualize the findings within different pathways of axon degeneration. For example, DR6 deletion has been shown in more than one study to be partially axon protective in the NGF deprivation model in DRGs in vitro. Incorporating such an experiment could be straightforward and would strengthen the paper, especially if some of the neuroprotective effects previously reported are confirmed.

      (2) The quality of some figures could be improved, particularly the EM images in Figure 2. As presented, they make it difficult to discern subtle differences.

    1. eLife Assessment

      In their study, Brown et al. provide an important advance in understanding the architecture of the mycobacterial outer membrane. Using all-atom simulations of model mycomembranes, the work reports compelling structural insights into how α-mycolic acids and outer leaflet lipids (PDIM and PAT) shape membrane organisation. The work revealed membrane heterogeneity with ordered inner leaflets and disordered outer leaflets that provide a molecular explanation for the resilience of the mycobacterial envelope.

    2. Reviewer #1 (Public review):

      Disclaimer:

      This reviewer is not an expert on MD simulations but has a basic understanding of the findings reported and is well-versed with mycobacterial lipids.

      Summary:

      In this manuscript titled "Dynamic Architecture of Mycobacterial Outer Membranes Revealed by All-Atom Simulations", Brown et al. describe outcomes of all-atom simulation of a model outer membrane of mycobacteria. This compelling study provided three key insights:

      (1) The likely conformation of the unusually long chain alpha-branched beta-methoxy fatty acids, mycolic acids in the mycomembrane, to be the extended U or Z type rather than the compacted W-type.

      (2) Outer leaflet lipids such as PDIM and PAT provide regional vertical heterogeneity and disorder in the mycomembrane that is otherwise prevented in a mycolic acid-only bilayer.

      (3) Removal of specific lipid classes from the symmetric membrane systems leads to significant changes in membrane thickness and resilience to high temperatures.

      Strengths:

      The authors take a step-wise approach in building the complexity of the membrane and highlight the limitations of each of the approaches. A case in point is the use of supraphysiological temperature of 333 K or even higher temperatures for some of the simulations. Overall, this is a very important piece of work for the mycobacterial field, and will help in the development of membrane-disrupting small molecules and provide important insights for lipid-lipid interactions in the mycomembrane.

      Weaknesses:

      (1) The authors used alpha-mycolic acids only for their models. The ratios of alpha, keto, and methoxy-mycolic acids are known in the literature, and it may be worth including these in their model. Future studies can be aimed at addressing changes in the dynamic behavior of the MOM by altering this ratio, but the inclusion of all three forms in the current model will be important and may alter the other major findings of the current study.

      (2) The findings from the 14 different symmetric membrane systems developed with the removal of one complex lipid at a time are very interesting but have not been analysed/discussed at length in the current manuscript. I find many interesting insights from Figures S3 and S5, which I find missing in the manuscript. These are as follows:

      a) Loss of PDIM resulted in reduced membrane thickness. This is a very important finding given that loss of PDIM can be a spontaneous phenomenon in Mtb cultures in vitro and that this is driven by increased nutrient uptake by PDIM-deficient bacilli (Domenech and Reed, 2009 Microbiology). While the latter is explained by the enhanced solute uptake by several PE/PPE transporter systems in the absence of PDIM (Wang et al, Science 2020), the findings presented by Brown et al could be very important in this context. A discussion on these aspects would be beneficial for the mycobacterial community.

      b) I find it interesting that loss of PAT or DAT does not change membrane thickness (Figure S3). While both PAT and PDIM can migrate to the interleaflet space, loss of PDIM and PAT has a different impact on membrane thickness. It is worth explaining what the likely interactions are that shape membrane thickness in the case of the modelled MOM.

      c) Figure S5: Is the presence of SGL driving PDIM and PAT to migrate to the inter-leaflet space? Again, a discussion on major lipid-lipid interactions driving these lipid migrations across the membrane thickness would be useful.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript reports all-atom molecular dynamics simulations on the outer membrane of Mycobacterium tuberculosis. This is the first all-atom MD simulation of the MTb outer membrane and complements the earlier studies, which used coarse-grained simulation.

      Strengths:

      The simulation of the outer membrane consisting of heterogeneous lipids is a challenging task, and the current work is technically very sound.

      The observation about membrane heterogeneity and ordered inner leaflets vs disordered outer leaflets is a novel result from the study. This work will also help other groups to work on all-atom models of the mycobacterial outer membrane for drug transport, etc.

      Weaknesses:

      Beyond a challenging simulation study, the current manuscript only provides qualitative explanations on the unusual membrane structure of MTb and does not demonstrate any practical utility of the all-atom membrane simulation. It will be difficult for the general biology community to appreciate the significance of the work, based on the manuscript in its current form, because of the high content of technical details and limited evidence on the utility of the work.

      Major Points:

      (1) The simulations by Basu et al. (Phys Chem Chem Phys 2024) studied drug transport through mycolic acid monolayers. Since the authors of the current study have all-atom models of the MTb outer membrane, they should carry out drug transport simulations and compare them to the outer membranes of other bacteria through which drugs can permeate. In the current manuscript, this is only discussed in lines 388-392. Can the disruption of MA cyclopropanation be simulated to show its effect on membrane structure?

      (2) In line 277, the authors mention 6 simulations that mimic lipid knockout strains. The results of these simulations, specifically the outcomes of in silico knockout of lipids, are not described in detail.

      (3) Figure 5 shows PDIM and PAT-driven lipid redistribution, which is a significant novel observation from the study. However, comparison of 3B and 3D shows that at 313K, the movement of the PDIM head group is much less. Since MD simulations are sensitive to random initial seeds, repeated simulations with different random seeds and initial structures may be necessary.

      (4) As per Figure 1, in the initial structure, the head group of PAT should be on the membrane surface, similar to TDM and TMM, while PDIM is placed towards the interior of the outer membrane. However, Figure 5 shows that at t=0, PAT has the same Z position as PDIM. It will be necessary to provide Z-position figures for TMM and TDM to understand the difference. Does it really depend on the chemical structure of the lipid moiety, or on the initial position of the lipid in the bilayer at the beginning of the simulation?

      Minor Point:

      In view of the complexity of the system undertaken for the study, the manuscript in its current form may not be informative for readers who are not experts in molecular simulations.

    1. eLife Assessment

      This important study uses innovative microfluidics-based single-cell imaging to monitor replicative lifespan, protein localization, and intracellular iron levels in aging yeast cells. The evidence for the proposed role of Ssd1 and reduced nutrients for lifespan through limiting iron uptake is convincing, even though some mechanistic details remain unclear. This work will be of interest to cell biologists working on aging and iron metabolism.

    2. Reviewer #1 (Public review):

      Summary:

      Overexpression of the mRNA-binding protein Ssd1 was shown before to expand the replicative lifespan of yeast cells, whereas ssd1 deletion had the opposite effect. Here, the authors provide evidence that Ssd1 acts via sequestration of mRNAs of the Aft1/2-dependent iron regulon. This restricts activation of the regulon and limits accumulation of Fe2+ inside cells, thereby likely lowering oxidative damage. The effects of Ssd1 overexpression and calorie restriction on lifespan are epistatic, suggesting that they might act through the same pathway.

      Strengths:

      The study is well-designed and involves analysis of single yeast cells during replicative aging. The findings are well displayed and largely support the derived model, which also has implications for the lifespan of other organisms, including humans.

      Weaknesses:

      The model is largely supported by the findings, but they remain largely correlative. Whether the knockout of ssd1 shortens lifespan by increasing intracellular Fe2+ levels has not been tested. The finding that increased Ssd1 levels form condensates in a cell-cycle-dependent manner is interesting, yet the role of the condensates in lifespan expansion remains untested and unlinked.

    3. Reviewer #2 (Public review):

      This manuscript describes the use of a powerful technique called microfluidics to elucidate the mechanisms explaining how overexpression (OE) of Ssd1 and caloric restriction (CR) in yeast extend replicative lifespan (RLS). Microfluidics measures RLS by trapping cells in chambers mounted to a slide. The chambers hold the mother cell but allow daughters to escape. The slide, with many chambers, is recorded during the entire process, roughly 72 hours, with the video monitored afterwards to count how many daughters each of the trapped mothers produces. The power of the method is what can be done with it. For example, the entire process can be viewed by fluorescence so that GFP and mCherry-tagged proteins can be followed as cells age. The budding yeast is the only model where bona fide replicative aging can be measured, and microfluidics is the only system that allows protein localization and levels to be measured in a single cell while aging. The authors do a wonderful job of showing what this combination of tools can do.

      The authors had previously shown that Ssd1, an mRNA-binding protein, extends RLS when overexpressed. This was attributed to Ssd1 sequestering away specific mRNAs under stress, likely leading to reduced ribosomal function. It remained completely unknown how Ssd1 OE extended RLS. The authors observed that overexpressed, but not normally expressed, Ssd1 formed cytoplasmic condensates during mitosis that are resolved by cytokinesis. When the condensates fail to be resolved at the end of mitosis, this signals death.

      It has become clear in the literature that iron accumulation increases with age within the cell. The transcriptional programs that activate the iron regulon also become elevated in aging cells. This is thought to be due to impaired mitochondrial function in aging cells, with increased iron accumulation as an attempt at restoring mitochondrial activity. The authors show that Ssd1 OE and CR both reduce the expression of the iron regulon. The data presented indicate that iron accumulation shortens RLS: deletion of iron regulon components extends RLS, and adding iron to WT cells decreases RLS, but not when Ssd1 is overexpressed or when cells are calorically restricted. Interestingly, iron chelation using BPS has no impact on WT RLS, but decreases the elevated RLS in CR cells and cells overexpressing Ssd1. It was not initially clear why iron chelation would inhibit the extended lifespan seen with CR and Ssd1 OE. This was addressed by an experiment where it was shown that the iron regulon is induced (FIT2 induction) when iron is chelated. Thus, the detrimental effects of induction of the iron regulon by BPS and iron accumulation on RLS cannot be tempered by Ssd1 OE and CR once turned on.

      I did not find any weaknesses to be addressed in this paper. The draft was well-written, and the extensive experimentation was well-designed, performed, and controlled. However, I did make minor comments that I recommend the authors address:

      (1) Why would BPS not reduce RLS in WT cells? The authors could test whether OE of FIT2 reduces RLS in WT cells.

      (2) The authors should add a brief explanation for why the GDP1 promoter was chosen for Ssd1 OE.

      (3) On page 12, growth to saturation was described as glucose starvation. This is more accurately described as nutrient deprivation. Referring to it as glucose starvation is akin to CR, which growing to saturation is not. Ssd1 OE formed condensates upon saturation but not in CR. Why do the authors think Ssd1 OE did not form condensates upon CR? Too mild a stress?

      (4) The authors conclude that the main mechanism for RLS extension in CR and Ssd1 OE is the inhibition of the iron regulon in aging cells. The data certainly supports this. However, this may be an overstatement as other mutations block CR, such as mutations that impair respiration. The authors do note that induction of the iron regulon in aging cells could be a response to impaired mitochondrial function. Thus, it seems that the main goal of CR and Ssd1 OE may be to restore mitochondrial function in aging cells, one way being inactivation of the iron regulon. A discussion of how other mutations impact CR would be of benefit.

      (5) The cell cycle regulation of Ssd1 OE condensates is very interesting. There does not appear to be literature linking Ssd1 with proteasome-dependent protein turnover. Many proteins involved in cell cycle regulation and genome stability are regulated through ubiquitination. It is not necessary to do anything here about it, but it would be interesting to address how Ssd1 condensates may be regulated with such precision.

      (6) While reading the draft, I kept asking myself what the relevance to human biology was. I was very impressed with the extensive literature review at the end of the discussion, going over how well conserved this strategy is in yeast with humans. I suggest referring to this earlier, perhaps even in the abstract. This would nail down how relevant this model is for understanding human longevity regulation.

      In conclusion, I enjoyed reading this manuscript, describing how Ssd1 OE and CR lead to RLS increases, using different mechanisms. However, since the 2 strategies appear to be using redundant mechanisms, I was surprised that synergism was not observed.

    4. Reviewer #3 (Public review):

      In this paper, the authors investigate how the RNA-binding protein Ssd1 and calorie restriction (CR) influence yeast replicative lifespan, with a particular focus on age-dependent iron uptake and activation of the iron regulon. For this, they use microfluidics-based single-cell imaging to monitor replicative lifespan, protein localization, and intracellular iron levels across aging cells. They show that both Ssd1 overexpression and CR act through a shared pathway to prevent the nuclear translocation of the iron-regulon regulator Aft1 and the subsequent induction of high-affinity iron transporters. As a result, these interventions block the age-related accumulation of intracellular free iron, which otherwise shortens lifespan. Genetic and chemical epistasis experiments further demonstrate that suppression of iron regulon activation is the key mechanism by which Ssd1 and CR promote replicative longevity.

      Overall, the paper is technically rigorous, and the main conclusions are supported by a substantial body of experimental data. The microfluidics-based assays in particular provide compelling single-cell evidence for the dynamics of Ssd1 condensates and iron homeostasis.

      My main concern, however, is that the central reasoning of the paper, that Ssd1 overexpression and CR prevent the activation of the iron regulon, appears to be contradicted by previous findings, and the authors may actually be misrepresenting these studies, unless I am mistaken. In the manuscript, the authors state on two occasions:

      "Intriguingly, transcripts that had altered abundance in CR vs control media and in SSD1 vs ssd1∆ yeast included the FIT1, FIT2, FIT3, and ARN1 genes of the iron regulon (8)"

      "Ssd1 and CR both reduce the levels of mRNAs of genes within the iron regulon: FIT1, FIT2, FIT3 and ARN1 (8)"

      However, reference (8) by Kaeberlein et al. actually says the opposite:

      "Using RNA derived from three independent experiments, a total of 97 genes were observed to undergo a change in expression >1.5-fold in SSD1-V cells relative to ssd1-d cells (supplemental Table 1 at http://www.genetics.org/supplemental/). Of these 97 genes, only 6 underwent similar transcriptional changes in calorically restricted cells (Table 2). This is only slightly greater than the number of genes expected to overlap between the SSD1-V and CR datasets by chance and is in contrast to the highly significant overlap in transcriptional changes observed between CR and HAP4 overexpression (Lin et al. 2002) or between CR and high external osmolarity (Kaeberlein et al. 2002). Intriguingly, of the 6 genes that show similar transcriptional changes in calorically restricted cells and SSD1-V cells, 4 are involved in iron-siderochrome transport: FIT1, FIT2, FIT3, and ARN1 (supplemental Table 1 at http://www.genetics.org/supplemental/)."

      Although the phrasing might be ambiguous at first reading, this interpretation is confirmed upon reviewing Matt Kaeberlein's PhD thesis: https://dspace.mit.edu/handle/1721.1/8318 (page 264 and so on).

      Moreover, consistent with this, activation of the iron regulon during calorie restriction (or the diauxic shift) has also been observed in two other articles:

      https://doi.org/10.1016/S1016-8478(23)13999-9

      https://doi.org/10.1074/jbc.M307447200

      Taken together, these contradictory data might blur the proposed model and make it unclear how to reconcile the results.

    1. eLife Assessment

      This study provides valuable insights into the influence of sex on bile acid metabolism and the risk of hepatocellular carcinoma (HCC). The data to support that there are inter-relationships between sex, bile acids, and HCC in mice are convincing, although this is a largely descriptive study. Future studies are needed to understand the interaction of sex hormones, bile acids, and chronic liver diseases and cancer at a mechanistic level. Also, there is not enough evidence to determine the clinical significance of the findings given the differences in bile acid composition between mice and men.

    2. Reviewer #1 (Public review):

      Liver cancer shows a higher incidence in males than in females, with incompletely understood causes. This study utilized a mouse model that lacks the bile acid feedback mechanisms (FXR/SHP DKO mice) to study how dysregulation of bile acid homeostasis and high circulating bile acids may underlie the gender-dependent prevalence and prognosis of HCC. By transcriptomics analysis comparing male and female mice, unique sets of gene signatures were identified and correlated with HCC outcomes in human patients. The study showed that the ovariectomy procedure increased HCC incidence in female FXR/SHP DKO mice that were otherwise resistant to age-dependent HCC development, and that removing bile acids by blocking intestinal bile acid absorption reduced HCC progression in FXR/SHP DKO mice. Based on these findings, the authors suggest that gender-dependent bile acid metabolism may play a role in the male-dominant HCC incidence, and that reducing bile acid levels and signaling may be beneficial in HCC treatment. This study includes many strengths:

      (1) Chronic liver diseases often precede the development of liver and bile duct cancer. Advanced chronic liver diseases are often associated with dysregulation of bile acid homeostasis and cholestasis. This study takes advantage of a unique FXR/SHP DKO model that develops high organ bile acid exposure and spontaneous age-dependent HCC in males but not females to identify unique HCC-associated gene signatures. The study showed that the unique gene signature in female DKO mice that had lower HCC incidence also correlated with lower-grade HCC and better survival in human HCC patients.

      (2) The study also suggests that differentially regulated bile acid signaling or gender-dependent response to altered bile acids may contribute to gender-dependent susceptibility to HCC development and/or progression.

      (3) The sex-dependent differences in bile acid-mediated pathology clearly exist but are still not fully understood at the mechanistic level. Female mice have been shown to be more sensitive to bile acid toxicity in a few cholestasis models, while this study showed a male dominance of bile acid promotion of HCC. This study used ovariectomy to demonstrate that female hormones are possible underlying factors. Future studies are needed to understand the interaction of sex hormones, bile acids, and chronic liver diseases and cancer.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      Liver cancer shows a higher incidence in males than in females, with incompletely understood causes. This study utilized a mouse model that lacks the bile acid feedback mechanisms (FXR/SHP DKO mice) to study how dysregulation of bile acid homeostasis and high circulating bile acids may underlie the gender-dependent prevalence and prognosis of HCC. By transcriptomics analysis comparing male and female mice, unique sets of gene signatures were identified and correlated with HCC outcomes in human patients. The study showed that the ovariectomy procedure increased HCC incidence in female FXR/SHP DKO mice that were otherwise resistant to age-dependent HCC development, and that removing bile acids by blocking intestinal bile acid absorption reduced HCC progression in FXR/SHP DKO mice. Based on these findings, the authors suggest that gender-dependent bile acid metabolism may play a role in the male-dominant HCC incidence, and that reducing bile acid levels and signaling may be beneficial in HCC treatment.

      Strengths:

      (1) Chronic liver diseases often precede the development of liver and bile duct cancer. Advanced chronic liver diseases are often associated with dysregulation of bile acid homeostasis and cholestasis. This study takes advantage of a unique FXR/SHP DKO model that develops high organ bile acid exposure and spontaneous age-dependent HCC in males but not females to identify unique HCC-associated gene signatures. The study showed that the unique gene signature in female DKO mice that had lower HCC incidence also correlated with lower-grade HCC and better survival in human HCC patients.

      (2) The study also suggests that differentially regulated bile acid signaling or gender-dependent response to altered bile acids may contribute to gender-dependent susceptibility to HCC development and/or progression.

      (3) The sex-dependent differences in bile acid-mediated pathology clearly exist but are still not fully understood at the mechanistic level. Female mice have been shown to be more sensitive to bile acid toxicity in a few cholestasis models, while this study showed a male dominance of bile acid promotion of HCC. This study used ovariectomy to demonstrate that female hormones are possible underlying factors. Future studies are needed to understand the interaction of sex hormones, bile acids, and chronic liver diseases and cancer.

      We thank Reviewer 1 for their positive and thorough assessment of our manuscript.

      Weaknesses:

      (1) HCC shows heterogeneity, and it is unclear what tissues (tumor or normal) were used from the DKO mice and human HCC gene expression dataset to obtain the gene signature, and how the authors reconcile these gene signatures with HCC prognosis.

      Mouse studies: Aged DKO mice develop aggressive tumors (major and minor nodules; see Figure 1), and the entire liver is burdened with multiple tumor nodules. It is technically challenging to demarcate the tumor boundaries, as most of the surrounding tissues do not display normal tissue architecture. Therefore, livers from age- and sex-matched wild-type C57/BL6 mice were used as control tissue. All the mice were inbred in our facility. Spatial transcriptomics and longitudinal studies are ongoing to collect tumors at earlier time points wherein we can differentiate tumor and non-tumor tissue.

      Human studies: We mined five separate clinical data sets. The human HCC gene expression data comprised samples from the (i) National Cancer Institute (NCI) cohort (GEO accession numbers, GSE1898 and GSE4024) and (ii) Korea, (iii) Samsung, (iv) Modena, and (v) Fudan cohorts, as previously described (GEO accession numbers, GSE14520, GSE16757, GSE43619, GSE36376, and GSE54236). We have added a new Supplemental Table 4 giving details of these datasets. Depending on the cohort, these are primarily HCC samples (surgical resections of HCC) and control samples, with some tumors and paired non-tumor tissues.

      (2) The authors identified a unique set of gene expression signatures that are linked to HCC patient outcomes, but analysis of these gene sets to understand the causes of cancer promotion is still lacking. The studies of urea cycle metabolism and estrogen signaling were preliminary and inconclusive. These mechanistic aspects may be followed up in revision or future studies.

      We agree. Experiments to establish HCC causality and promotion are complex, given the heterogeneous nature of liver cancer. Moreover, the length of time (12 months) needed for cancer to develop spontaneously in this DKO mouse model makes it challenging. As mentioned by the reviewer, mechanistic studies are ongoing, and longitudinal time course experiments are actively being pursued to delineate causality. Having said that, we mined the TCGA LIHC (The Cancer Genome Atlas Liver Hepatocellular Carcinoma) database to examine the expression of the individual urea cycle genes and found them suppressed in liver tumorigenesis (new Supplementary Figure 4). We also evaluated whether estrogen receptor (Er) targets altered in DKO females (DKO_Estrogen) correlate with overall survival in HCC (new Supplementary Figure 6). We note that Er expression per se is reduced in males and females upon liver tumorigenesis. Also, the DKO_Estrogen signature correlated positively with better overall survival (new Supplementary Figure 6). These findings further bolster the relevance of urea cycle metabolism and estrogen signaling during HCC.

      (3) While high levels of bile acids are convincingly shown to promote HCC progression, their role in HCC initiation is not established. The DKO model may be limited to conditions of extremely high levels of organ bile acid exposure. The DKO mice do not model the human population of HCC patients with various etiology and shared liver pathology (i.e. cirrhosis). Therefore, high circulating bile acids may not fully explain the male prevalence of HCC incidence.

      We agree with this comment: our studies do not show that bile acids can initiate HCC, and bile acids may act as one of the many factors that contribute to the high male prevalence of HCC. This is exactly the reason why we do not write about HCC initiation anywhere in the manuscript. To clarify further, in the revised discussion of the manuscript, we have added a sentence to highlight this aspect, “while this study demonstrates bile acids promote HCC progression it does not investigate or provide evidence if excess bile acids are sufficient for HCC initiation.”

      (4) The authors showed lower circulating bile acids and increased fecal bile acid excretion in female mice and hypothesized that this may be a mechanism underlying the lower bile acid exposure that contributed to lower HCC incidence in female DKO mice. Additional analysis of organ bile acids within the enterohepatic circulation may be performed because a more accurate interpretation of the circulating bile acids and fecal bile acids can be made in reference to organ bile acids and total bile acid pool changes in these mice.

      As shown in this manuscript, we provide BA compositional analyses from the liver, serum, urine, and feces (Figures 5 and 6, new Supplementary Figure 8, Supplementary Tables 4 and 5). Unfortunately, we did not collect the intestinal tissue or gallbladders for BA analysis in this study. Separate cohorts of mice are being aged for future BA analyses from different organs within the enterohepatic loop. We thank you for this suggestion. Nevertheless, we have previously measured and reported BA values to be elevated in the intestines and the gallbladder of young DKO mice (PMC3007143).

      Reviewer #2 (Public review):

      Weaknesses:

      (1) The translational value to human HCC is not so strong yet. Authors show that there is a correlation between the female-selective gene signature and low-grade tumors and better survival in HCC patients overall. However, these data do not show whether this signature is more highly correlated with female tumor burden and survival. In other words, whether the mechanisms of female protection may be similar between humans and mice. In that respect, it would also be good to elaborate on whether women have higher fecal BA excretion and lower serum BA concentration.

      The reviewer poses an interesting question to test if the DKO female-specific signatures are altered differently in male vs. female HCC samples. As we found the urea cycle and estrogen signaling to be protective and enriched in our mouse model, we tested their expression pattern using the TCGA-LIHC RNA-seq data. We found urea cycle genes and Er transcripts broadly reduced in tumor samples irrespective of the sex (new Supplementary Figure 4 and Supplementary Figure 6), indicating that these pathways are compromised upon tumorigenesis even in the female livers. 

      While prior studies have shown (i) a smaller BA pool w synthesis in men than women (PMID: 22003820), we did not find a study that systematically investigated BA excretion between the sexes in HCC context. The reviewer is spot on in suggesting BA analysis from HCC and unaffected human fecal samples from both sexes. Designing and performing such studies in the future will provide concrete proof of whether BA excretion protects female livers from developing liver cancer. We thank you for these suggestions.

      (2) The authors should perform a thorough spelling and grammar check.

      We apologize for the typos, which have been fixed, and as suggested by the reviewer, we have performed a grammar check.

      (3) There are quite some errors and inaccuracies in the result section, figures, and legends. The authors should correct this.

      We apologize for the inadvertent errors in the manuscript, and we have clarified these inaccuracies in the revised version. Thank you.

      Reviewer #1 (Recommendations for the authors):

      (1) Figures 1A-F: This statement of altered liver steatosis needs to be further supported by measurement of liver triglycerides. Lower magnification images of Sirius red stain should be shown for better evaluation of liver fibrosis.

      Unfortunately, we did not measure liver triglycerides, and the Sirius red-stained samples have faded, so lower magnification images are unavailable at this juncture. We have modified our results accordingly.

      We did not take the gross picture of WT female and DKO female livers in the same frame as shown below. Since the manuscript is focused on male and female differences in liver cancer incidence, we provided DKO male and female liver images as Figure 1D in the paper.

      Author response image 1.

      Gross liver images of year-old WT and DKO mice, which show prominent hepatocarcinogenesis in DKO male mice.

      (2) Can the authors clarify if the gene transcriptomics was performed with normal or tumor tissues of DKO mice?

      Gene transcriptomics were performed with the tumor tissue of DKO mice. We have previously published data from younger non-tumor-bearing DKO male mice (PMCID: PMC3007143).

      (3) Supplementary Figure 3C. Could the authors confirm if this is F vs M or just DKO female since it does not seem to match the result description in the main text? It is better practice to indicate the sub-panels of the Supplementary Figures in the main text while describing the results.

      As the reviewer correctly points out, Supplementary Figure 3C is the DKO F vs M signature, not the DKO_female signature, and this has been clarified in the text. We have also included DKO_F data now to reduce the confusion.

      (4) Figure 3 legend: The data presented are not well explained in the legend, especially the labeling and what is being presented and compared.

      As suggested by the reviewer, we have modified the legend accordingly.

      (5) Supplementary Table 4 does not contain total serum bile acid as described in the main text.

      We agree with the reviewer. We provided primary and secondary BA concentrations (Supplementary Table 4, currently Supplementary Table 5 in the revised version; rows 20 and 21) but not their added total. We have modified the text accordingly.

      (6) Method section: many experiments lack descriptions of details.

      We have added details to the animal experimental design and ER ChIP-PCR, included schematics of experiments within the main and supplemental figures, and expanded the metabolomics and BA analysis descriptions.

      Reviewer #2 (Recommendations For The Authors):

      General:

      (1) The authors are advised to do a thorough grammar and spelling check.

      We have performed a spelling and grammar check, as suggested, using the online platform Grammarly. Thank you.

      Results:

      (1) Figure 1:

      The authors should show in Figure 1D female WT and female DKO liver.

      See Author response image 1, added in our response to point 1 of Reviewer 1’s comments.

      In the Figure legend, (A-E) should be replaced by (A+D). 

      Thank you. We have modified it accordingly.

      The authors do not refer to 1J in the text, please add this reference.

      Thank you for pointing this out. We have referenced 1J in the text.

      The description of 1H does not elaborate on the sex differences in ALT/AST levels, as this is the focus of the manuscript.

      We have added a sentence to show that the injury markers are higher in DKO males, which is consistent with an advanced disease. Thanks.

      The authors should use the correct nomenclature in Figure 1I/1J (gene vs protein and capitals vs non-capitals).

      Figures 1I and 1J show gene expression of Fxr and Shp, and hence we used the non-capitalized, italicized nomenclature. Thanks.

      (2) Figure 2:

      The x-axis length is different in Figures 2A and 2B. Please correct to visualize the differences between males and females better.

      The x-axis length has been fixed as suggested. Thanks.

      (3) Figure 3:

      The authors should elaborate on how the patients were assigned to each gene signature. This is not fully clear.

      The gene sets obtained from the WT and DKO mice were used. The process used is shown as a schematic in Supplemental Fig 2C, and the gene list is included in an Excel sheet as Supplemental Table 1.

      We are curious how these data (F3A-C) would look when separating male and female human patients.

      We performed an overall survival analysis with subgroups of patients and provide it below. We segregated the HCC cohort data by sex and age (>55 yr, since we assumed 55 as the age of menopause) and evaluated the DKO gene signature. Similar to the original Figure 3, we find that, irrespective of sex and age, the DKO F vs M gene signature corresponds with better overall survival in men and in women. These findings align with the combined overall survival analysis shown in the original Figure 3 of the manuscript, and therefore we did not modify it. If deemed necessary, we are happy to include the figure below in the main manuscript.

      Author response image 2.

      Correlation of gene signatures obtained from WT and DKO mouse model with the survival data of HCC patients segregated by age and sex. The Kaplan Meier Survival graphs were generated based on WT and DKO transcriptome changes using five HCC clinical cohorts. Analysis of OS (Overall Survival) in patients ((A) Men and (B) Women) using the gene signatures representative of either male WT or male DKO, female WT or female DKO, and unique changes observed in female DKO mice but not in male DKO mice.

      What was used as the control signature in Figure 3C? Please specify this.

      For Figure 3C, we compared the DKO_M signature to the DKO F vs M signature. These genes are listed in an Excel sheet (Supplementary Table 1).

      The authors claim that DKO female mice display chronic cholestasis, similar to their male counterparts. Please refer to previous work or show the data.

      Elevated serum BA levels in DKO females are reported in Supplementary Table 5, and we find comparable hepatic BA composition in Figure 5F.

      (4) Figure 4: Labels for the x-axis are missing in Figure 4C. Please add legends or labels to the bars.

      The x-axis label is included at the top: Serum BAs in (M).

      In Figure 4I, the percentage of input is quite low. An IgG control would show whether recruitment of ERalpha to the shown loci is significant above background levels. Also, ChIP on the OVX liver could serve as a negative control.

      We did use IgG as control pull down and the signals above this background were considered. We have not performed this in OVX, which would be an excellent negative control for future studies. Thank You.

      The results and legends refer to ChIP-qPCR, while methods only mention ChIP-seq.Please adapt.

      We sincerely apologize for the mistake. We used published ChIP-seq data to identify putative binding sites and then performed ChIP-PCR to validate them. We have clarified and rectified this error. Thank you.

      Significance indications in the figure legend do not correspond with significance indications in the figure. Please explain the used significance symbols in the figure in the legend.

      Thank you. The significance symbols in the legend now match those in the figure.

      (5) Figure 5:

      Authors claim lowered total serum BA in females compared to males, and reference to Supplementary Table 4. However, these data are not provided, only percentages and ratios are displayed.

      In the revised version, this has become Table 5. See response to the same concern noted by Reviewer 1, Point 5 above.

      Figure 5D: Are sulphated BA also elevated in WT females? Please provide these data.

      There is no significant urinary excretion of BAs in WT control animals. We have previously measured and found none. But under cholestatic conditions BAs are observed in urine. Therefore, sulphated BA levels were found only in the DKO mice. 

      Figure 5H: Is the fecal BA excretion in WT females also proportionally higher than in males? Please provide these data.

      We were unable to perform untargeted metabolomics profiling of WT fecal samples. When we measured BAs in the feces, very low concentrations were present, as expected, irrespective of sex (~0.01 M), and we did not find any sex difference. Also, prior studies in the 129SVJ strain showed comparable fecal excretion (PMC150802). We did not find any clinical studies that compared fecal BA between the sexes.

      (6) Figure 6:

      References in the text of the result section to Figure 6 are wrong. The authors should change this.

      Thank you. This has been rectified.

      Significance indications in the legend do not correspond with significance indications in the figure. Please explain the used significance symbols in the figure in the legend.

      Thank you. The significance symbols in the legend now match those in the figure.

      (7) Supplemental Figure 3:

      Please adapt the title of this figure; the sentence is incorrect. The description of this figure is very poor.

      We have modified the legend and the title of Supplemental Figure 3 to make them more appropriate. Thanks.

      Please explain what the blue and red dots represent.

      Each blue or yellow dot indicates the Bayesian probability generated by our BCCP model.

      What are the bold horizontal lines representing? Why are there no dots in some box plots? Please elaborate.

      The box represents the interquartile range (IQR), encompassing the middle 50% of the data. The bottom and top edges correspond to the 25th and 75th percentiles, respectively, while the bold horizontal line indicates the median value.

      The absence of visible dots in certain categories—particularly in higher CLIP and TNM stages—is due to the small number of patients, all of whom had similar Bayesian prediction probabilities. As these values cluster tightly around the median, the individual dots may be overlapped and hidden behind the median line.
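
      The box-plot elements described above can be sketched in a few lines of Python. This is a minimal illustration using the standard library's `statistics.quantiles`; the helper name `box_stats` and the example probabilities are hypothetical and are not taken from the BCCP model output.

```python
import statistics


def box_stats(values):
    """Return the quartiles that define a box plot (needs at least 2 values)."""
    # method="inclusive" treats the data as the full population, so the
    # quartiles never fall outside the observed minimum and maximum.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}


# Hypothetical well-spread group: the box is clearly visible.
spread = box_stats([0.1, 0.2, 0.3, 0.4, 0.5])

# Hypothetical small group with near-identical probabilities: the box
# collapses onto the median line, so individual dots can be hidden behind it.
clustered = box_stats([0.90, 0.90, 0.90, 0.90])
```

      In the second case the interquartile range is essentially zero, which matches the explanation that points in small, tightly clustered groups overlap the median line.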

      The figure is not visually easy to understand, please reconsider the representation.  

      We hope that the modified figure legends, which now explain the lines and points in the graphs, increase the clarity of the figure and make it acceptable.

      Please add the DKO_female signature plot.

      We have added this graph to Supplemental Figure 3.

      (8) Supplemental 4A:

      Fold change at Z-score is missing. This should be added.

      Thank you. We have added this information.

      (9) Supplemental 5:

      The scale bar is missing. This should be included.

      The figure is now Supplemental Figure 8, and the scale bar has been added.

      Methods:

      (1) Did the authors use ChIP-sequencing or ChIP-qPCR? Please describe the correct method.

      We apologize for the error. We used ChIP-PCR and have corrected the Methods accordingly; see also our response to the Figure 4 query.

      (2) It is unclear how the mouse model was generated. Please refer to earlier publications.

      The mice were generated in house at UIUC, and we have added this sentence to the Methods section. The original reference has been cited in the text (PMCID: PMC3007143).

      Discussion:

      (1) The authors claim in the discussion: 'consistently higher recruitment of ER to the classical BA synthetic genes ...' This is not shown in Figure 4I, only ER recruitment to Cyp7a1 is significantly higher in females. Please rephrase.

      We agree and have modified the sentence. Cyp7A1 accounts for ~75% of BA synthesis and is a rate-limiting gene in the classical BA synthesis pathway.

      (2) The authors could make their statements stronger if they could elaborate on whether women have more fecal BA excretion, and if there are differences in serum BA concentration in HCC between male and female patients. 

      Unfortunately, we were unable to find clinical studies with appropriate controls which examined and reported serum BA in HCC in a sex specific manner.

      In addition, to understand whether the female-specific protections in humans are similar to mice, it would be nice to show correlations of the female-specific mouse signature with male and female liver signatures.

      At this time, we do not have sufficiently large control or precancerous early-stage patient datasets from both sexes to make such comparisons. Nevertheless, these sex-specific signatures have translational relevance. Figure 2 included in the reviewer response shows that the DKO male signature correlates with poor overall survival in males, whereas neither the DKO male nor the DKO female signature predicts outcome in females. In contrast, the DKO female-specific gene signature (DKOFvsM) correlates with better overall survival in both men and women.

      (3) The authors state in the discussion: 'Currently we do not know how to reconcile this data other than indicating a potential ER independent mechanism.' We do not understand the reasoning behind this statement. Please clarify.

      We find that increased Erα expression in DKO coincides with CA-mediated suppression of BA synthesis genes in the absence of Fxr and Shp. But we also noticed that in OVX DKO mice, Erα expression is blunted, and so is basal BA synthesis gene expression. Putting together these data, it is intriguing that Erα expression correlates both positively and negatively with BA synthesis genes. To reconcile these contrasting results, we have written the following sentence in the discussion.

      “These findings suggest Erα expression is linked to both positive and negative regulation of BA synthesis genes. But we do not know how ER elicits these differential effects on BA synthesis.”

    1. eLife Assessment

      To understand the assembly and fragmentation of cellular aggregates, this manuscript investigates cyanobacterial aggregates in a laboratory setting. This investigation of the conditions and mechanisms behind aggregation is an important contribution as it yields basic understanding of natural processes and offers potential strategies for control. The combination of computational and experimental investigations in this manuscript provides solid support for the role of shear on aggregation and fragmentation. However, the role of extracellular matrix, with possibly a strong effect on aggregation, is not adequately studied.

    2. Reviewer #1 (Public review):

      Sinzato et al. investigated how shear flow in a rheological chamber affects the assembly and fragmentation of cyanobacterial aggregates, with the goal of understanding how such aggregates might form naturally, and/or be destroyed industrially. The authors used a combination of experiments and models to show that cyanobacterial colonies can be difficult to fragment with fluid flows. Additionally, they provide biophysical support for the idea that such aggregates likely form primarily when cells stay together after cell division, rather than coming together from disparate paths.

      This work has significant relevance to the field, both practically and naturally. Combatting or preventing toxic cyanobacterial blooms is an active area of environmental research that offers a practical backbone for this manuscript's ideas. Additionally, the formation and behavior of cellular aggregates in general is of widespread interest in many fields, including marine and freshwater ecology, healthcare and antibiotic resistance research, biophysics, and microbial evolution. In this field, there are still outstanding questions regarding how microbial aggregates form into communities, including if and how they come together from separate places. Therefore, I believe that researchers from many distinct fields would find interest in the topic of this paper, and particularly Figure 5, in which a phase space that is meant to represent the different modes of aggregate formation and destruction is suggested, dependent on properties of the fluid flow and particle concentration.

      Altogether, the authors were successful in their investigation, and I find their claims to be justified. In particular, the authors achieve strong results from their experiments. Below, I outline key claims of the paper and indicate the level to which they were supported by their data.

      • Their first major claim is that fluid flows alone must be quite strong in order to fragment the cyanobacterial aggregates they have studied. With their rheological chamber, they explicitly show that energy dissipation rates must exceed "natural" conditions by multiple orders of magnitude in order to fragment lab strain colonies, and even higher to disrupt natural strains sampled from a nearby freshwater lake. This claim is well-supported by their experiments and data.

      • The authors then claim that the fragmentation of aggregates due to fluid flows occurs primarily through erosion of small pieces from larger aggregates. Because their experimental setup does not allow them to directly observe this process (for example, by watching one aggregate break into pieces), they rely on indirect methods to support the claim. Overall, the experimental evidence is generally supportive, but the models leave some gaps. I describe this conclusion in more detail below.

      • The strongest evidence for the erosion-dominated process comes from the authors' measurements of transfer of biomass between large and small size classes, as in Figure 2E and Figure 2D. The authors claim that only the erosion model can reproduce this kind of biomass transfer. However, it also seems that the idealized erosion model alone is not fully sufficient to capture the observed behavior. In Figure 2D, there remains a gap between their experiment and the prediction of the erosion model, which grows larger over time (Supplemental Figure S9). While the authors suggest that the erosion model is better than the equal-fragmentation model, it is also true that tracking the mean size (Figure 2B) or small size distribution (Figure S6) cannot distinguish between these models.

      • Taken altogether, the experimental evidence favors an erosion-dominated process. However, a few minor questions remain regarding the models. Why does the equal-fragmentation model predict no biomass transfer between size classes? To what extent, quantitatively, does the erosion model outperform the equal fragments model at capturing the biomass size distributions? Finally, why does the idealized erosion fail to capture the size distribution at late stages in Supplemental Figure S9 - would this discrepancy be resolved if the authors considered individual colony variances in cell adhesion (for instance, as hypothesized by the authors in lines 133-137)? I do not believe these questions curb the other results of the paper.

      • Their third major claim is that fluid flows only weakly cause cells to collide and adhere in a "coming together" process of aggregate formation. They test this claim in Figure 3, where they suspend single cells in their test chamber and stir them at moderate intensity, monitoring their size histogram. They show that the size histogram changes only slightly, indicating that aggregation is, by-and-large, not occurring at a high rate. Therefore, they lend support to the idea that cell aggregation likely does not initiate group formation in toxic cyanobacterial blooms. Additionally, they show that the median size of large colonies also does not change at moderate turbulent intensities. These results agree with previous studies (their own citation 25) indicating that aggregates in toxic blooms are clonal in nature. This is an important result, and well-supported by their data, but only for this specific particle concentration and stirring intensity. Later, in Figure 5 they show a much broader range of particle concentrations and energy dissipation rates that they leave untested. However, they refer to other literature that does test these regions of the phase map.

      • The fourth major result of the manuscript is displayed in Equation 8 and Figure 5, where the authors derive an expression for the ratio between the rate of increase of a colony due to aggregation vs. the rate due to cell division. They then plot this line on a phase map, altering two physical parameters (concentration and fluid turbulence) to show under what conditions aggregation vs. cell division are more important for group formation. Because these results are derived from relatively simple biophysical considerations, they have the potential to be quite powerful and useful, and represent a significant conceptual advance. By combining their experiments with discussions of other experimental investigations of scum formation in cyanobacterial blooms, the authors have investigated the two most relevant zones of this map for the present study (Zones II and III), and have made a strong contribution to the literature in regards to artificial mixing to disrupt cyanobacterial blooms.

      Other notes:

      The authors rely heavily on size distributions to make the claims of their paper. I was pleased to find the calibration histograms in Supplemental Figure S8, which provide information as to how and why they made corrections to the histograms they observed. From these calibration histograms, it seems that larger colonies are more accurately measured in the cone-and-plate shear setup, while smaller colonies can be missed, presumably due to resolution issues.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, the authors investigate the role of fluid flow in shaping the colony size of a freshwater cyanobacterium Microcystis. To do so, they have created a novel assay by combining a rheometer with a bright field microscope. This allows them to exert precise shear forces on cyanobacterial cultures and field samples, and then quantify the effect of these shear forces on the colony size distribution. Shear force can affect the colony size in two ways: reducing size by fragmentation and increasing size by aggregation. They find limited aggregation at low shear rates, but high shear forces can create erosion-type fragmentation: colonies do not break in large pieces, but many small colonies are sheared off the large colonies. Overall, bacterial colonies from field samples seem to be more inert to shear than laboratory cultures, which the authors explain in terms of enhanced intercellular adhesion mediated by secreted polysaccharides.

      Strengths:

      • This study is timely, as cyanobacterial blooms are an increasing problem in freshwater lakes. They are expected to increase in frequency and severity because of rising temperatures, and it is worthwhile learning how these blooms are formed. More generally, how physical aspects such as flow and shear influence colony formation is often overlooked, at least in part because of experimental challenges. Therefore, the method developed by the authors is useful and innovative, and I expect applications beyond the system presented here.

      • A strong feature of this paper is the highly quantitative approach, combining theory with experiments, and the combination of laboratory experiments and field samples.

      Weaknesses:

      • The introduction in particular seems to imply that shear force is a very important parameter controlling colony formation. However, if one looks at the results, this effect is overall rather modest, especially considering the shear forces that these bacterial colonies may experience in lakes. The main conclusion seems to be that bacterial adhesion, not shear, is the most important factor in determining colony size. The writing could have done more justice to the fact that the importance of adhesion had been described elsewhere. This being said, the same method can be used to investigate systems where shear forces are biologically more relevant.
    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      (1) Their first major claim is that fluid flows alone must be quite strong in order to fragment the cyanobacterial aggregates they have studied. With their rheological chamber, they explicitly show that energy dissipation rates must exceed "natural" conditions by multiple orders of magnitude in order to fragment lab strain colonies, and even higher to disrupt natural strains sampled from a nearby freshwater lake. This claim is well-supported by their experiments and data.

      We thank the reviewer for this positive comment. We fully agree, as our fragmentation experiments on division-formed colonies clearly demonstrate their strong mechanical resistance in naturally occurring flows.

      (2) The authors then claim that the fragmentation of aggregates due to fluid flows occurs through erosion of small pieces. Because their experimental setup does not allow them to explicitly observe this process (for example, by watching one aggregate break into pieces), they implement an idealized model to show that the nature of the changes to the size histogram agrees with an erosion process. However, in Figure 2C there is a noticeable gap between their experiment and the prediction of their model. Additionally, in a similar experiment shown in Figure S6, the experiment cannot distinguish between an idealized erosion model and an alternative, an idealized binary fission model where aggregates split into equal halves. For these reasons, this claim is weakened.

      The two idealized models of colony fragmentation, namely erosion of single cells and fragmentation into equal sizes (or binary fission), lead to distinguishable final size distributions. We believe that our experiments for division-formed colonies support the hypothesis of the erosion mechanism. Specifically, Figure 2E shows that colony fragmentation resulted in a decrease of large colonies and a strong increase of single cells and dimers (two cells). In our view, the strong increase of single cells and dimers provides quite convincing (but indirect) evidence supporting the erosion mechanism. This is described on lines 112-121. To further address the reviewer’s concern, we have included in the revised version of Figure 2 (panels B and D) a direct comparison between these two fragmentation models for large division-formed colonies fragmented at a high dissipation rate of ε = 5.8 m<sup>2</sup>/s<sup>3</sup>. Furthermore, we have included the new Supplementary Figure S9, which details the model predictions for the colony size distribution at various time points.

      The ideal equal fragments model (i.e., where every fracture event produces two identical fragments with half the original biovolume) does not capture the biovolume transfer from large colonies to single cells, as observed for the experimental results in panel D of Figure 2 and panel E of Figure S9. In contrast, the erosion model, in panel D of Figure 2 and panel D of Figure S9, provides a good prediction of the experimental results within the experimental uncertainty. The different fragmentation models are discussed in lines 226-228 of the revised manuscript and lines 865-873 of the SI.
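
      The qualitative difference between the two idealized models can be seen in a toy simulation. The Python sketch below is a deliberately simplified illustration, not the population-balance model of the manuscript: it assumes that biovolume is proportional to cell number, that each fragmentation event acts on the largest colony, and that "small colonies" means one or two cells. All function names are hypothetical.

```python
def erode(colonies, events):
    """Erosion: each event detaches one single cell from the largest colony."""
    colonies = list(colonies)
    for _ in range(events):
        i = max(range(len(colonies)), key=lambda k: colonies[k])
        if colonies[i] <= 1:
            break  # nothing left to erode
        colonies[i] -= 1
        colonies.append(1)
    return colonies


def split_equal(colonies, events):
    """Equal fragments: each event splits the largest colony into two halves."""
    colonies = list(colonies)
    for _ in range(events):
        i = max(range(len(colonies)), key=lambda k: colonies[k])
        n = colonies[i]
        if n <= 1:
            break  # single cells cannot split further
        colonies[i] = n // 2
        colonies.append(n - n // 2)
    return colonies


def small_fraction(colonies, max_cells=2):
    """Fraction of total cells held in colonies of at most max_cells cells."""
    return sum(n for n in colonies if n <= max_cells) / sum(colonies)


# One colony of 64 cells subjected to the same number of events in each model.
eroded = erode([64], events=15)
halved = split_equal([64], events=15)
```

      After 15 events, erosion has already moved 15 of the 64 cells (about 23%) into the small class, whereas equal splitting has produced sixteen colonies of four cells and moved none. This is the sense in which only an erosion-type process transfers biovolume directly from large colonies to single cells and dimers.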

      (3) Their third major claim is that fluid flows only weakly cause cells to collide and adhere in a "coming together" process of aggregate formation. They test this claim in Figure 3, where they suspend single cells in their test chamber and stir them at moderate intensity, monitoring their size histogram. They show that the size histogram changes only slightly, indicating that aggregation is, by and large, not occurring at a high rate. Therefore, they lend support to the idea that cell aggregation likely does not initiate group formation in toxic cyanobacterial blooms. Additionally, they show that the median size of large colonies also does not change at moderate turbulent intensities. These results agree with previous studies (their own citation 25) indicating that aggregates in toxic blooms are clonal in nature. This is an important result and well-supported by their data, but only for this specific particle concentration and stirring intensity. Later, in Figure 5 they show a much broader range of particle concentrations and energy dissipation rates that they leave untested.

      We thank the reviewer for this positive comment. We agree that our experimental results show clear evidence that aggregated colonies have a weaker structure in comparison to division-formed colonies, thus supporting the hypothesis that clonal expansion is the main mechanism for colony formation under most natural settings. The range of energy dissipation rates of our experimental setup covers almost entirely the region for which aggregated and division-formed colonies differ in their fragmentation behavior (Zone III of Figure 5). Within this zone, aggregated colonies are fragmented and only the division-formed colonies are able to withstand the hydrodynamic stresses. Furthermore, we show that this fragmentation behavior has a low sensitivity to the total biovolume fraction, as displayed in the Supplementary Figures S2 and S4 and discussed in lines 151-154 and 160-163. We agree that our cone-and-plate setup covers a limited parameter range, and we have added a detailed discussion of these limitations in the revised manuscript, under section Materials and Methods in lines 462-473.

      (4) The fourth major result of the manuscript is displayed in Equation 8 and Figure 5, where the authors derive an expression for the ratio between the rate of increase of a colony due to aggregation vs. the rate due to cell division. They then plot this line on a phase map, altering two physical parameters (concentration and fluid turbulence) to show under what conditions aggregation vs. cell division are more important for group formation. Because these results are derived from relatively simple biophysical considerations, they have the potential to be quite powerful and useful and represent a significant conceptual advance. However, there is a region of this phase map that the authors have left untested experimentally. The lowest energy dissipation rate that the authors tested in their experiment seemed to be ε ≈ 10<sup>-2</sup> m<sup>2</sup>/s<sup>3</sup>, and the highest particle concentration they tested was 5 × 10<sup>-4</sup>, which means that the authors never tested Zone II of their phase map. Since this seems to be an important zone for toxic blooms (i.e. the "scum formation" zone), it seems the authors have missed an important opportunity to investigate this regime of high particle concentrations and relatively weak turbulent mixing.

      We agree with the reviewer that Zone (II) of Figure 5 is of great importance to dense bloom formation under wind mixing and that this parameter range was not covered by our experiments using a cone-and-plate shear flow. The measuring range of our device was motivated by engineering applications such as artificial mixing of eutrophic lakes using bubble plumes, as well as preliminary experiments which demonstrated that high levels of dissipation rate were required to achieve fragmentation. The range of dissipation rates that can be achieved by the cone-and-plate setup is limited at the lower end by the accumulation of colonies near the stagnation point at the conical tip and at the upper end by the spillage of fluid out of the chamber. We now discuss this measuring range in lines 462-473 of the revised manuscript.

      Although our setup does not cover Zone (II), we now refer to recent results in the literature for evidence of aggregation dominance in Zone (II). The experimental study of Wu et al. (2024) (reference number 64 of the revised manuscript) investigated the formation of Microcystis surface scum layers in wind-mixed mesocosms. Their study identified aggregation of colonies in the scum layer, resulting in increases of colony size at rates faster than cell division. These results agree with our model, and the parameter range investigated falls within Zone II. We have included in the revised version, lines 328-337, a detailed discussion elucidating the parameter range covered in our experiments and the findings of Wu et al. (2024).

      Other items that could use more clarity:

      (5) The authors rely heavily on size distributions to make the claims of their paper. Yet, how they generated those size distributions is not clearly shown in the text. Of primary concern, the authors used a correction function (Equation S1) to estimate the counts of different size classes in their image analysis pipeline. Yet, it is unclear how well this correction function actually performs, what kinds of errors it might produce, and how well it mapped to the calibration dataset the authors used to find the fit parameters.

      We agree with the reviewer that more details of the correction function should be included. We have included in the revised version of the Supporting Information, in lines 785-796, a more detailed explanation of the correction function. Furthermore, a direct comparison of raw and corrected histograms of the size distribution and its associated uncertainty is presented in the new Supplementary Figure S8.

      (6) Second, in their models they use a fractal dimension to estimate the number of cells in the group from the group radius, but the agreement between this fractal dimension fit and the data is not shown, so it is not clear how good an approximation this fractal dimension provides. This is especially important for their later derivation of the "aggregation-to-cell division" ratio (Equation 8).

      We agree with the reviewer that more details on the estimation of fractal dimension are needed. The revised version, under Materials and Methods in lines 508-515, now includes the detailed estimation procedure, the number of colonies analysed, and the associated uncertainty.
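
      For readers unfamiliar with this kind of estimate, the Python sketch below shows one common way to obtain a fractal dimension: assume the power law N ≈ k (R/a)^D between cell number N, colony radius R, and single-cell radius a, and fit D as the slope of a log-log regression. The power-law form, the function name, and the synthetic data are assumptions made for illustration; the procedure in the revised manuscript may differ.

```python
import math


def fit_fractal_dimension(radii, cell_counts, cell_radius):
    """Least-squares fit of log(N) = log(k) + D * log(R / a).

    Returns (D, k): the fitted fractal dimension and prefactor.
    """
    xs = [math.log(r / cell_radius) for r in radii]
    ys = [math.log(n) for n in cell_counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dimension = sxy / sxx
    prefactor = math.exp(mean_y - dimension * mean_x)
    return dimension, prefactor


# Synthetic check: data generated with D = 2.2 and k = 1 should be recovered.
a = 2.5  # hypothetical single-cell radius, arbitrary units
radii = [5.0, 10.0, 20.0, 40.0]
counts = [(r / a) ** 2.2 for r in radii]
D, k = fit_fractal_dimension(radii, counts, a)
```

      With real measurements, the scatter of the points around the fitted line gives the uncertainty in D that the reviewer asks about.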

      Reviewer #1 (Recommendations For The Authors):

      In light of the weak evidence for claim #2 outlined above, I believe the paper would benefit from a more explicit comparison in Figure 2C of the two models - idealized erosion, and idealized binary fission. With such a comparison, the authors would have stronger footing to claim that one process is more important than the other.

      As mentioned in our answer above to comment #2 of the public review, we have included in the revised version of Figure 2 (panels B and D) a direct comparison between the erosion and equal fragments (binary fission) models for large division-formed colonies fragmented under ε = 5.8 m<sup>2</sup>/s<sup>3</sup>. The comparison is further detailed in the new Supplementary Figure S9 for representative time points. Only the erosion model can recover the biovolume transfer from large colonies to single cells, as observed for the experimental results in Figure 2D and further detailed in Figure S9D. We believe that the revised version of Figure 2 and the new Supplementary Figure S9 provide strong evidence in support of the erosion fragmentation model.

      Would the authors comment on their chosen range of experimental dissipation rates? For instance, was their goal more to investigate industrial/engineering applications where the goal is to disrupt the cyanobacteria, but not really typical natural conditions under which the groups might form?

      The range of experimental dissipation rates was chosen to cover engineering applications such as artificial mixing of eutrophic lakes using bubble plumes. We have now clarified in the Introduction, on lines 37-39, that artificial mixing has been successfully applied in several lakes to suppress cyanobacterial blooms. Furthermore, we have now clarified in the caption of Figure 5 that the bars on the right side indicate typical values of dissipation rates induced by natural wind-mixing, bubble plumes in artificially mixed lakes, and laboratory-scale experiments such as cone-and-plate systems and stirred tanks. The dissipation rates induced by the bubble plumes in artificially mixed lakes could potentially fragment aggregated cyanobacterial colonies and thus disrupt bloom formation. However, our preliminary experiments demonstrated that high levels of dissipation rate were required to achieve fragmentation; therefore, we focused on the upper range of values (0.01 to 10 m<sup>2</sup>/s<sup>3</sup>).

      The dissipation rates generated by the cone-and-plate approach are indeed higher than the dissipation rates under typical natural conditions in lakes. We have now added a detailed discussion of the range of dissipation rates generated by the cone-and-plate approach in the revised manuscript, under section Materials and Methods in lines 462-473, where we also explain that these values are higher than the natural dissipation rates generated by wind action in lakes. However, the more generic insights obtained by our study, shown in Figure 5, are relevant for dissipation rates of natural lakes (e.g., Zone II). Therefore, in our discussion of Figure 5 we have now included the recent findings of Wu et al. (2024) (reference number [64] of the revised manuscript), who studied bloom formation of Microcystis in mesocosm experiments at dissipation rates representative of natural conditions; see also our reply to the next comment.

      The authors should consider testing the space of Zone II on their phase map, for instance at very high particle concentrations and even lower rotational speeds, in order to show that their derivations match experiments.

      Good point. As mentioned in our answer above to comment #4 of the public review, Zone II lies beyond the measuring range of our experimental setup. Instead, we refer to the recent study of Wu et al. (2024) (reference number [64] of the revised manuscript) which demonstrated that dense scum layers of Microcystis colonies are aggregation-dominated. These mesocosm experiments agree with our model predictions and their parameter range falls within Zone II. We have included in the revised version, lines 328-337, a detailed discussion where we elucidate the parameter range covered in our experiments and compare our predictions for Zone II with the recent findings of Wu et al. (2024).

      The authors should show their calibration data and fit for the correction function of equation S1. Additionally, you may consider showing "raw" and "corrected" histograms of the size distribution, to demonstrate exactly what corrections are made.

      As mentioned in our answer above to comment #5 of the public review, we have included in the revised version of the Supporting Information the new Supplementary Figure S8, which shows the raw and adjusted histograms of the size distribution, including the associated uncertainties. Furthermore, the correction function is now explained in detail in the new Supporting Information Text in lines 785-796.

      The authors might consider commenting on Figure S3 a bit more in the main text. Even at very high dissipation rates, the cyanobacterial groups don't plummet to size 1, but stay in an equilibrium around 10-20x the diameter of a single cell. What might this mean for industrial applications trying to break up the groups?

      We agree with the reviewer that further discussion of Figure S3, panels E and F, is warranted. In the revised version of the manuscript, under section Fragmentation of Microcystis colonies occurs through erosion in lines 133-137, we have now included a discussion of this figure. Figure S3F shows that more than 90% of the total biovolume ends up in the category “small colonies” (mostly single cells and dimers); hence, most of the initially large colonies do fragment to single cells or dimers. Only about 5-10% of the biovolume remains as “large colonies” of 10-20 cells. Although it is challenging to draw definitive conclusions about the behavior of these remaining large colonies, as they account for only a minor fraction of the suspension, one hypothesis is that variability in mechanical properties between colonies results in a subset of colonies exhibiting exceptional resistance even to very high dissipation rates (see lines 133-137).

      Minor comments:

      Typo Caption of Figure 2: Should read [m^2/s^3] for units

      Thanks for catching this typo. The units in the caption of Figure 2 have been corrected to [m^2/s^3].

      There is no Equation 10 in Materials and Methods as indicated in the rheology section.

      We thank the reviewer for pointing out the lack of clarity in this algebraic manipulation. In fact, the yield stress has to be substituted in the current Equation 11 (previously Eq.10), from which the critical dissipation rate must be substituted in Equation 3. The result is the critical colony size (l* = 2.8) mentioned in line 243 of the revised manuscript. The correct equation numbers and algebraic substitutions are now indicated in lines 241-243 of the revised version of the manuscript.

      Reviewer #2 (Public review):

      Especially the introduction seems to imply that shear force is a very important parameter controlling colony formation. However, if one looks at the results this effect is overall rather modest, especially considering the shear forces that these bacterial colonies may experience in lakes. The main conclusion seems that not shear but bacterial adhesion is the most important factor in determining colony size. As the importance of adhesion had been described elsewhere, it is not clear what this study reveals about cyanobacterial colonies that was not known before.

      We would like to emphasize several key findings that our study reveals about the impacts of fluid flow on cyanobacterial colonies:

      (I) Quantification of mechanical strength in cyanobacterial colonies: Our results demonstrate the high mechanical strength of cyanobacterial colonies, as evidenced by the requirement of high shear rates to achieve fragmentation. This is new knowledge, that was not known before for cyanobacterial colonies. To this end, our study highlights the resilience of these colonies against naturally occurring flows and bridges the gap between theoretical assumptions about colony strength and experimentally measured mechanical properties.

      (II) The discovery that the mechanical strength of colonies differs between colonies formed by cell division and colonies formed by aggregation. This is again new knowledge, that was not known before for cyanobacterial colonies.

      (III) Validation of a hypothesis regarding colony formation: Using a fluid-mechanical approach, we confirm the findings of recent genetic studies (references 25 and 67 of the revised version of the manuscript) which indicated that colony formation occurs predominantly via cell division rather than cell aggregation under natural conditions (except in very dense blooms).

      (IV) Practical guidelines for cyanobacterial bloom control: Our findings provide valuable insights into the design of artificial mixing systems applied in several lakes. Artificial mixing of lakes is based on fundamentals of fluid flow, aiming at preventing aggregation of buoyant cyanobacteria in scum layers at the water surface. Our results show that the dissipation rates generated by bubble plumes in artificially mixed lakes can fragment cyanobacterial colonies formed by aggregation, but are not intense enough to cause fragmentation of division-formed colonies (see Figure 5 and lines 348-360).

      The agreement between model and experiments is impressive, but the role of the fit parameters in achieving this agreement needs to be further clarified.

      The influence of the fit parameters (namely the stickiness α1 and the pairs of colony strength parameters S1, q1 and S2, q2) is discussed in the sections Dynamical changes in colony size modelled by a two-category distribution in lines 247-253 and Materials and Methods in lines 559-565. We kept the discussion concise to maintain readability. However, we agree with the reviewer that additional details about the importance of the fit parameters and the sensitivity of the results to these parameters could be beneficial. In the revised version of the section Materials and Methods in lines 560-563, we have included a detailed discussion of the fit parameters.

      The article may not be very accessible for readers with a biology background. Overall, the presentation of the material can be improved by better describing their new method.

      We apologize for the limited readability of the description of the experimental setup and model used. In the revised version of the manuscript and the SI, we have detailed further the new methods presented here. The modifications include a detailed description of the operating range of the cone-and-plate shear setup (subsection Cone-and-plate shear of the section Materials and Methods, in lines 462-473). Furthermore, we think that incorporation of the recent experimental results of Wu et al. (2024), on lines 331-337 of the manuscript, will appeal to readers with a biology background. Their mesocosm experiments support our model prediction that aggregation is the dominant mechanism for colony formation in region (II) of Figure 5.

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors seem too modest in claiming technological advance. They should describe the technological advance of combining microscopy with rheometry, in such a way that this invites others to apply this or similar approaches on biological samples. Even though I feel that the advancement of knowledge of this system by their method is relatively modest, there may be more advances in other systems.

      We appreciate the positive view of the reviewer towards the importance of this technology and we agree that its advantages should be advertised to researchers investigating similar systems. We have now given more attention to the technological advance of combining microscopic imaging with rheometry in the final paragraph of the Conclusions (lines 386-400), where we now also briefly discuss an interesting recent study of marine snow (Song et al. 2023, Song and Rau 2022, reference numbers 70 and 71 of the revised manuscript), which used a similar combination of microscopy and rheometry as in our study. Furthermore, in the Methods section, we now briefly explain how the rheometry can be adjusted to investigate other systems (lines 474-480).

      (2) It seems reasonable -also based on what we already know about these aggregates - to assume that the main difference in shear sensitivity between field samples and cultures lies in the production of extracellular polysaccharide substance (EPS). To go beyond what is already known, the study could try to provide more direct and quantitative evidence for EPS involvement. For example, using a chemical quantification of EPS levels, or perturbing EPS levels using digestive enzymes.

      We agree with the reviewer that further characterization of the EPS is highly relevant to understand the mechanical strength of colonies. However, we believe that chemical quantification and/or degradation of EPS lies beyond the scope of our article and should be addressed by future studies.

      (3) Assuming EPS is indeed the reason for the differences in shear resistance: the authors speculate the reason why the field samples have more EPS lies in chemical composition (Calcium/nitrogen levels). In addition, there could be grazing that is known to promote aggregation (possibly increasing EPS), or just inherent genetic differences between strains. I am not necessarily expecting the authors to explore this direction experimentally, but it seems certainly feasible and would make the final result less speculative.

      We agree with the reviewer that there are more biotic and abiotic factors that can influence EPS amount and composition. The influence of grazing and other relevant factors on cell adhesion is discussed in references [26-29], cited in our introduction in lines 50-53. As discussed in our answer to recommendation #2, we believe that a quantitative investigation of these various factors is beyond the scope of this work and should be addressed in future studies.

      (4) A cool finding seems to be the critical relative diameter (Fig 2E), a colony size that seems invariant under shear. I was slightly surprised that the authors seem to take little effort to understand this critical diameter mechanistically (for example by predicting it, or experimentally perturbing it). Again, not a necessary requirement, but this is where the study could harness its technological advantage to provide a more quantitative understanding of something that goes beyond the existing knowledge of the system.

      We apologize to the reviewer if our descriptions and discussions of Figure 2 were unclear. One of the key conclusions from our experiments is that the critical relative diameter depends on the dissipation rate, as shown in Figure 2F. This dependence is also incorporated into the model through the constitutive equation (2). Furthermore, we expect the mechanical resistance of colonies, quantified by the critical relative diameter, to be affected by other biotic and abiotic factors that influence EPS amount and composition.

      (5) The jump from 0.019 to 1.1 m²/s³ seems large. What was the reason for not exploring intermediate values? The authors should also define low, modest and intense dissipation rates more clearly. Currently, they seem somewhat arbitrarily defined, i.e. 0.019 m²/s³ is described as low (methods) and moderate (results). In Fig 2, the authors further talk about low dissipation rates without a quantitative description.

      We thank the reviewer for pointing out the lack of clarity in the choice of parameter range and the nomenclature. Regarding the former, the suspension of division-formed colonies of Microcystis strain V163 displayed negligible fragmentation for dissipation rates between 0.019 and 1.1 m²/s³, as seen in Figures S2A and S3A. Due to the low sensitivity of the fragmentation results in this region, we do not expect a change in behavior for intermediate values. Regarding the nomenclature, we have corrected the inconsistencies throughout the text. We have chosen to name the dissipation rate values as follows: low for values typical of wind mixing, moderate for values typical of the core of bubble plumes, and intense for values typical of propellers. Whenever mentioned in the text, the numerical value of the dissipation rate is also included to avoid doubt.

      (6) The structure and narrative of the paper can be improved. The article first describes all lab culture experiments and then the model, while the first figure already shows model fits. Perhaps it would be better to first describe the aggregation experiments, to constrain the appropriate terms of the model, and then move to fragmentation.

      We appreciate the recommendation of the reviewer regarding the structure. We have chosen to describe first the fragmentation experiments (Fig. 2), as these can be understood without introducing the aggregation effects. In contrast, the steady state results in the aggregation experiments (Fig. 3) come from the balance between aggregation and fragmentation. Therefore, we judged the current order to be more appropriate. The model fits are combined with the experimental results in Figures 2 and 3 to have a concise display. We have ensured that all the concepts required to understand each figure panel are explained prior to their discussion.

      (7) The number of data points that go into the histogram needs to be indicated. The main reason is that the authors report the distribution in terms of the biovolume fraction, suggesting the numerical counts are converted into volume. This to me seems like the most sensible parameter, but I could not find how this conversion is calculated (my apologies if I missed it). This seems especially relevant because a single large colony can impact this histogram quite considerably.

      We apologize for the lack of clarity in the calibration and conversion steps of the size distribution. As discussed above in the answer to comment #5 of the reviewer #1, more details of the calibration process have been added to the revised version of the Supporting Information Text in lines 785-796. Furthermore, the new Supplementary Figure S8 presents examples of the raw and adjusted size distribution, including the total number of counted colonies per histogram and the associated uncertainties in the concentration and biovolume distributions.
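
      The reviewer's point that a single large colony can dominate a biovolume-weighted histogram can be illustrated with a minimal sketch. It is not the conversion used in the manuscript, which is described in the Supporting Information and not reproduced here. The sketch assumes spherical colonies with volume V = (π/6)·d³; the function name, bin edges and diameters are hypothetical values chosen only for illustration.

```python
import math


def biovolume_fractions(diameters_um, bin_edges_um):
    """Return (count_fraction, biovolume_fraction) per size bin.

    Assumption (illustrative only): each colony is a sphere of
    diameter d, so its volume is pi/6 * d**3.
    Bins are half-open [lo, hi), except the last, which includes hi.
    """
    n_bins = len(bin_edges_um) - 1
    counts = [0] * n_bins
    volumes = [0.0] * n_bins

    for d in diameters_um:
        for i in range(n_bins):
            lo, hi = bin_edges_um[i], bin_edges_um[i + 1]
            is_last = i == n_bins - 1
            if lo <= d < hi or (is_last and d == hi):
                counts[i] += 1
                volumes[i] += math.pi / 6.0 * d ** 3
                break

    total_count = sum(counts)
    total_volume = sum(volumes)
    if total_count == 0:
        raise ValueError("no colony falls inside the given bins")

    count_frac = [c / total_count for c in counts]
    vol_frac = [v / total_volume for v in volumes]
    return count_frac, vol_frac


# Hypothetical sample: 99 small colonies and one large colony.
diameters = [5.0] * 99 + [100.0]
edges = [0.0, 10.0, 50.0, 200.0]
count_frac, vol_frac = biovolume_fractions(diameters, edges)

print("count fractions:    ", [round(f, 3) for f in count_frac])
print("biovolume fractions:", [round(f, 3) for f in vol_frac])
```

      In this made-up sample the single large colony is 1% of the counts but carries most of the biovolume, because volume scales with the cube of the diameter. This is why reporting the number of colonies per histogram alongside the biovolume fractions is informative.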

      (8) Over the timescales measured here, colonies could start sinking (or floating), possibly in a size-dependent manner, that could lead to a bias due to boundary effects. Did the authors consider this potential artifact?

      The sinking or floating of colonies is a relevant process which was taken into account in the choice of our parameter range for the dissipation rate. The minimum dissipation rate used in our experiments ensures that the upward inertial velocity near stagnation is sufficient to counteract the sedimentation of colonies. A detailed discussion of the choice of the parameter range is now included in the revised version of the Materials and Methods in lines 462-473.

      (9) "On the one hand, sequencing of the genetic diversity within Microcystis colonies supports the hypothesis that colony formation under natural conditions is primarily driven by cell division [25]. On the other hand, cell aggregation can occur on a shorter time scale and may offer improved protection against high grazing pressure [26]." This appears somewhat constructed, as what is described as "on the other hand" is not evidence against the genetic diversity.

      We agree that the suggested dichotomy in this text appeared somewhat constructed, and we have now removed the wording “on the one hand” and “on the other hand”. The studies from reference [25] demonstrated that the genetic diversity between independent Microcystis colonies is much greater than the diversity within colonies. If cell aggregation was the dominant mechanism, a similar genetic diversity would be observed between and within colonies, which contrasts the findings from reference [25]. We have adjusted the text in the revised manuscript, in lines 46-54, to clarify this point.

      (10) The phase diagram seems largely based on extrapolations that are made outside of the measurement regime (e.g. dark red bars indicating the dissipation rate, Fig 5 - by the way 1 this color scheme could use some better contrast, by the way 2 Fig S7 suggests a wider dissipation rate range as indicated in Fig 5, why?). Hence there seems to be the need to more clearly lineate experimental results, simulations, and extrapolations in the phase diagram.

      We agree with the reviewer that further clarification should be given about the parameter range covered in our experiments and apologize for the lack of readability in the color scheme of Fig 5. In lines 329-337, 346-347, 353-355, we have highlighted the parameter range covered by our experiments as well as the range covered by previous studies of wind-mixed mesocosms (namely reference [64] of the revised manuscript). Regarding the color scheme of Figure 5, we have modified the legend of the figure to improve readability. The color contrast was increased and leader lines were added to connect the colored bars with the respective label.

      (11) Unfortunately, the manuscript did not contain line numbers.

      We apologize to the reviewer for the lack of line numbers in our initial version. The revised version of the manuscript now contains line numbers, both in the main text and the supporting information.

      (12) Fig 2D. Caption is too minimal. Y-axis could better be named "Fraction of colonies" as both small and large colonies are plotted.

      The caption for Figure 2D was extended to better describe the plot. We have kept the y-axis label as “Fraction of small colonies”, since this is the quantity displayed by the three curves in the plot.

      (13) An inset should have axis labels.

      All the insets in our plots display the same variables as their respective plots. In order to keep the plots light and preserve readability, we therefore prefer to present the axis labels only along the x-axis and y-axis of the main plots, which implies by convention that the same axis labels also apply to the insets. To the best of our knowledge, this is a common approach.

      (14) Page 5, first words. Likely Fig 3A, not 2A was meant.

      We thank the reviewer for pointing out this readability issue. We intend to compare both Figures 2A and 3A. The text of the revised manuscript, in lines 146-148, has been adjusted with the correct figure numbers.

      (15) Introduction, second last paragraph, third last line. "suspension leaded to a broad distribution" I assume you meant "... led to a ..."

      We thank the reviewer for pointing out this typo. It has been corrected (line 122).

    1. eLife Assessment

      This important and creative study finds that the uplift of the Qinghai-Tibet Plateau - via its resultant monsoon system rather than solely its high elevation - has shifted avian migratory directions from a latitudinal to a longitudinal orientation. The authors have expanded and clarified their lines of evidence (including an enlarged tracking set and explicit caveats on species-level eBird inference), such that the central claims are now solid. The conclusions - that monsoon dynamics, rather than elevation per se, are most consistent with observed longitudinal reorientation - illustrates how large, community-sourced and climate-model datasets can inform continent-scale shifts in migratory behavior over time that complement traditional approaches.

    2. Joint Public Review:

      The study assesses how the rise of the Qinghai-Tibet Plateau affected patterns of bird migration between their breeding and wintering sites.

      This is an interesting topic and a novel theme. The visualisations and presentation are to a very high standard. The Introduction is very well-written and introduces the main concepts well, with a clear logical structure and good use of the literature. The Methods are detailed and well-described, and written in such a fashion that they are transparent and repeatable.

      Editorial note: These latest revisions are minor in the sense that they expand on the dataset but do not change the primary results.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      The authors have done a good job of responding to the reviewer's comments, and the paper is now much improved.

      Again, we thank the reviewer for positive comments during review.

      Reviewer #2 (Public review):

      I would like to thank the authors for the revision and the input they invested in this study.

      We are grateful for your thoughtful feedback and enthusiasm, which help us improve our manuscript.

      With the revised text of the study, my earlier criticism holds, and your arguments about the counterfactual approach are irrelevant to that. The recent rise of the counterfactual approach might likely mirror the fact that there are too many scientists behind their computers, and few go into the field to collect in situ data. Studies like the one presented here are a good intellectual exercise but the real impact is questionable. 

      We understand your concern about the relevance of the counterfactual approach used in our study. Our intent in using a counterfactual scenario (reconstructing migration patterns assuming pre-uplift conditions on the QTP) was to isolate the potential influence of the plateau’s geological history on current migration routes. A similar approach has been widely used to estimate how biogeographic barriers facilitated the divergent vertebrate communities across the world (e.g., Williams et al. 2024). We agree that such an approach must be used carefully. In the revision, we have explicitly clarified why this counterfactual comparison is useful: namely, it provides a theoretical baseline to test how much the QTP’s uplift (and the associated monsoon system) might have redirected migration paths (Gilbert and Lambert 2010, Sanmartín 2012, Bull et al. 2021). We acknowledge that the counterfactual results are theoretical and have explicitly emphasised the assumptions involved (i.e., species–environment relationships hold between pre- and post-uplift environments) in the main text (Lines 91-98). Nonetheless, we defend the approach as a valuable study design: it helps generate testable hypotheses about migration (for instance, that the plateau’s monsoon-driven climate, rather than just its elevation, introduces an east–west shift en route).

      References:

      Bull, J. W., N. Strange, R. J. Smith, and A. Gordon. 2021. Reconciling multiple counterfactuals when evaluating biodiversity conservation impact in social-ecological systems. Conservation Biology 35:510-521.

      Gilbert, D., and D. Lambert. 2010. Counterfactual geographies: worlds that might have been. Journal of Historical Geography 36:245-252.

      Sanmartín, I. 2012. Historical Biogeography: Evolution in Time and Space. Evolution: Education and Outreach 5:555-568.

      Williams, P. J., E. F. Zipkin, and J. F. Brodie. 2024. Deep biogeographic barriers explain divergent global vertebrate communities. Nature Communications 15:2457.

      All your main conclusions are inferred from published studies on 7! bird species. In addition, spatial sampling in those seven species was not ideal in relation to your target questions. Thus, no matter how fancy your findings look, the basic fact remains that your input data were for 7 bird species only! Your conclusion, “our study provides a novel understanding of how QTP shapes migration patterns of birds” is simply overstretching.

      We appreciate the reviewer’s comment here. We would like to clarify that our conclusions regarding longitudinal shifts in migratory distributions are based on distribution models derived from eBird data of 50 species, not merely on migration tracks from seven species. These species-level spatiotemporal models allow us to infer large-scale biogeographic patterns across the Qinghai-Tibet Plateau (QTP).

      The original seven tracking species were used specifically for analysing the relationship between migration directions (azimuths) and environmental variables, offering independent support for the patterns revealed in the eBird-based distribution models. Recognising the reviewer’s concern about sample size and coverage, we have now expanded this part by incorporating migration tracks from 12 additional species, derived through georeferenced digitisation of published migratory maps. Importantly, this expansion did not change our conclusions, i.e., the monsoons, rather than the high elevations, play a prominent role in shaping the current migration direction of birds in the QTP. While the overall conclusion remains unchanged, the expanded dataset led to slight changes in the difference between spring and autumn migration. We have updated Figure 2 and the corresponding results and conclusions throughout the manuscript. We have also clarified in the Discussion that regions of the QTP with relatively less data might lead to underestimation of some migration routes, to make sure readers are aware of these data limitations (Lines 211-218).
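
      For readers unfamiliar with migration azimuths, the sketch below shows one common way to obtain a direction from two consecutive tracking fixes: the initial great-circle bearing. It is offered only as an illustration under stated assumptions. The response does not specify how azimuths were computed in the study, and the function names, the east-west index and the coordinates are hypothetical.

```python
import math


def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2.

    Inputs are in decimal degrees. The result is in degrees clockwise
    from north, in the range [0, 360).
    """
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dlon = math.radians(lon2 - lon1)

    x = math.sin(dlon) * math.cos(phi2)
    y = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(x, y)) % 360.0


def east_west_component(bearing_deg):
    """Hypothetical index: 0 for a due north/south heading, 1 for due east/west."""
    return abs(math.sin(math.radians(bearing_deg)))


# Made-up fixes, not real tracking data.
north = initial_bearing_deg(0.0, 0.0, 10.0, 0.0)   # heading due north
east = initial_bearing_deg(0.0, 0.0, 0.0, 10.0)    # heading due east
west = initial_bearing_deg(0.0, 0.0, 0.0, -10.0)   # heading due west

print(north, east, west)
print(east_west_component(north), east_west_component(east))
```

      A per-segment bearing of this kind could then be related to environmental variables along the route; how the study actually did this is described in the manuscript, not here.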

      The way you respond to my criticism on L 81-93 is something different than what you admit in the rebuttal letter. The text of the ms is silent about the drawbacks and instead highlights your perspective. I understand you; you are trying to sell the story in a nice wrapper. In the rebuttal you state: “we assume species' responses to environments are conservative and their evolution should not discount our findings.” But I do not see that clearly stated in the main text.

      Thanks, as suggested we have clearly stated the assumptions of niche conservatism in the Introduction (Lines 91-98).

      In your rebuttal, you respond to my criticism of "No matter how good the data eBird provides is, you do not know population-specific connections between wintering and breeding sites" when you responded: ... "we can track the movement of species every week, and capture the breeding and wintering areas for specific populations" I am having a feeling that you either play with words with me or do not understand that from eBird data nobody will be ever able to estimate population-specific teleconnections between breeding and wintering areas. It is simply impossible as you do not track individuals. eBird gives you a global picture per species but not for particular populations. You cannot resolve this critical drawback of your study. 

      We agree that inferring population-specific migratory connections (teleconnections) from eBird data is challenging and inherently limited. eBird provides occurrence records for species, but it generally cannot distinguish which breeding population an individual bird came from or exactly where it goes for winter. Our objective is not to determine one-to-one migratory links between specific populations, but to identify general broad-scale directional shifts when birds cross the QTP during their migration. We regret any confusion caused by our earlier wording. To make this clearer, we have now emphasised that our interest is in migratory directions and their environmental correlates, rather than population assignments. We have also rephrased the relevant text to explicitly clarify that our study operates at the species level and at large spatial scales (Lines 253–257). We illustrate how the distributions of eBird observations and GPS tracking data of four species can differ from each other whilst showing similar migration patterns (Figure S10). We have also explicitly stated in the Discussion that confirming population connectivity would require targeted tracking or genetic studies, and that our eBird-based analysis could only suggest plausible routes and region-to-region linkages (Lines 200-202).

      I am sorry that you invested so much energy into this study, but I see it as a very limited contribution to understanding the role of a major barrier in shaping migration.

      We thank the reviewer for the honest assessment and understand the concern regarding the scope of our contribution. Our intention was not to provide an exhaustive account of all aspects of the QTP as a migratory barrier, but to address a specific and underexplored question: how the uplift of the plateau and the resulting monsoon system may have influenced the orientation of avian migration routes. By integrating both satellite tracking and community-contributed data, we have explored how the uplift of the QTP could shape avian migration across the area. We believe our findings provide important insights into how birds balance their responses to large-scale climate change and geological barriers, which yields the most comprehensive picture to date of how the QTP uplift has shaped migratory patterns of birds. We have also discussed the study’s limitations, including the small number of tracking species (Lines 205-218), the use of occurrence data as a proxy for breeding and wintering regions (Lines 200-202), the uneven sampling coverage in the QTP (Lines 202-205) and the assumptions behind the counterfactual scenario (Lines 91-98). This ensures that readers understand the context and constraints of our findings.

      My modest suggestion for you is: go into the field. Ideally use bird radars along the plateau to document whether the birds shift the directions when facing the barrier.

      We thank the reviewer for this suggestion. We agree that radar holds promise for understanding certain aspects of bird migration, particularly for detecting flight intensity, altitudes, and timing. However, radar systems currently struggle to resolve migration at the level of species, populations, or individuals, which are central to questions of migratory connectivity and route selection. Most radar signals cannot distinguish between species in mixed flocks, nor can they link breeding and wintering sites for tracked individuals. In addition, the spatial coverage of radar installations remains limited, especially across remote and high-elevation regions like the Qinghai-Tibet Plateau, where infrastructure and continuous power supply are still logistically prohibitive.

      The eBird dataset used in our study is itself a form of field-based observation, contributed by tens of thousands of birdwatchers across continents, including the QTP region (Figure S11). While eBird cannot provide individual-level tracking, it captures spatiotemporal patterns of occurrence at broad scales, making it a valuable complement to satellite tracking data. We would also emphasise that our team has extensive field experience in the Qinghai-Tibet Plateau (about twenty years), including multi-year expeditions to deploy satellite tags and observe migration at stopover sites.

      We agree that more direct tracking (e.g. GPS tagging) would be an ideal way to validate migration pathways and population connectivity. Using the satellite-tracking data, we have shown that most tracking species shifted their migration direction when facing the QTP (Figure S6). In this revision, as stated, we managed to add 12 more species with satellite tracking routes. We have also noted that future studies should build on our findings by using dedicated tracking of more individual birds and monitoring of migration over the QTP. We have cited recent advances in these techniques and suggested that incorporating more tracking data could further test the hypotheses generated by our work (Lines 205-218).

      Reviewer #2 (Recommendations for the authors):

      L55 "an important animal movement behaviour is.." Is there any unimportant animal movement? I mean this sentence is floppy, empty.

      We used this sentence to introduce migration. We have removed “important” to reduce ambiguous phrasing.

      L 152-154 This sentence is full of nonsense or your misinterpretation. First of all, the issue of inflexible initiation of migration was related to long-distance migrants only! The way you present it mixes apples and oranges (long- and short-distance migrants). It is not "owing to insufficient responses" but due to inherited patterns of when to take off, photoperiod and local conditions.

      We stated that this claim is invoked for long-distance migrants before this sentence and have rewritten the sentence to highlight that this interpretation is for long-distance migrants. 

      L 158 what is a migration circle? I do not know such a term.

      We have amended it to “annual migration cycle”, which is a more common way to describe the yearly round-trip journey of birds between breeding and wintering grounds.

      L 193 The way you present and mix capital and income breeding theory with your simulation study is quite tricky and super speculative.

      We thank the reviewer for raising this important concern. We have presented this idea as an inference rather than a conclusion: “This pattern could be consistent with a ‘capital breeding’ strategy, where birds rely on endogenous reserved energy gained prior to reproduction, rather than an ‘income’ strategy where birds ingest nutrients mainly collected during the period of reproductive activity. This corroborates studies on breeding strategies of migratory birds in Asian flyways. However, we note that this interpretation would require further study.” By adding this caution, we made it clear that we are not asserting this link as proven fact, only suggesting it as one possible explanation. We have also double-checked that the rest of the discussion around this point is framed appropriately. Moreover, to help illustrate why we raised this ecological interpretation, we would also draw attention to examples of satellite tracking points from several species (e.g., Beijing Swift, Demoiselle Crane) in the following, which show obvious shifts in migratory direction near the QTP region. These turning points suggest potential behavioral responses to environmental constraints, such as climatic corridors or energy availability, which could help motivate our discussion of possible capital breeding strategies in these species.

    1. eLife Assessment

      This important work has the potential to expand the repertoire of transgenic animals for systems neuroscience investigations across multiple fields. The generation of new reagents has the potential to open new directions in experimental design, and the Cas9-based approach for generating mice may provide additional benefits compared to existing BAC transgenic mouse lines. However, whereas some of the imaging data are compelling, quantitative analysis of transgene fidelity is incomplete, as it relies on a qualitative description of reporter XFP expression at low magnification, with some electrophysiological characterization.

    2. Reviewer #1 (Public review):

      Summary:

      I read with much attention the manuscript titled "Generation of knock-in Cre and FlpO mouse lines for precise targeting of striatal projection neurons and dopaminergic neurons" in which the authors reveal five transgenic lines to target diverse neuronal populations of the basal ganglia. In addition, the authors also provide some assessments of the functionality of the lines.

      Strengths:

      Knockin lines made readily available through Jackson. Lines show specific expression.

      Weaknesses:

      Although I have no doubt these knock-in lines will be broadly used by researchers in the field, I find the scientific advances of the study and the breadth of the resource provided quite limited. This is partly because 4 of these lines have been generated by other laboratories. For instance, there are already two other Dat-FlpO lines generated (JAX#: 033673 and 035436), with one of them already characterized (PMID: 33979604). Similarly, Drd1-Cre and Adora2a-Cre have been used abundantly since they were generated over a decade ago, and a novel Drd1-FlpO line has been characterized thoroughly recently (PMID: 38965445). Indeed, some of these lines were BAC transgenic, and I agree with the authors that there is a sound rationale for generating knock-in mice; however, the authors should then demonstrate if/how their new drivers are superior. Overall, the valuable resource generated by the authors would benefit from additional quantification and validation.

    3. Reviewer #2 (Public review):

      Summary:

      The authors report the generation and validation of new knock-in mouse lines enabling precise targeting of basal ganglia projection neurons and midbrain dopamine neurons. By inserting recombinase sequences at endogenous loci, they provide tools that improve on older BAC-based models, with the additional benefit that all lines are openly available through Jackson Laboratories. This work is timely, fills a longstanding gap for the community, and will support both basic circuit mapping and disease-related research.

      Strengths:

      The major strength of this study is the provision of new genetic resources that will be widely used by the basal ganglia and dopamine research communities. Anatomical and electrophysiological data indicate appropriate expression and preserved intrinsic properties. The Flp lines, in particular, show labeling largely confined to basal ganglia circuits, making them especially attractive for circuit-based studies. A further strength is the use of a T2A-recombinase insertion at the native gene stop codon, which preserves endogenous regulation and maintains near-physiological expression of Adora2a, Drd1a, and DAT. The availability of both Cre and Flp versions enables powerful intersectional strategies, and open distribution through Jackson Laboratories ensures broad accessibility and long-term value.

      Weaknesses:

      The major limitation is the discrepancy between Cre and Flp lines, with Cre generally driving broader expression than Flp. This raises concerns about anatomical fidelity that require validation at the cellular level. For the DAT-FlpO line, efficiency remains insufficiently quantified, and higher-resolution co-labeling with TH immunostaining is needed. Electrophysiological comparisons between Cre and Flp versions are also incomplete; current data suggest potential physiological differences, which warrant additional statistical testing and, at a minimum, explicit discussion in the manuscript.

    4. Reviewer #3 (Public review):

      Summary:

      Using the latest knock-in technology, the authors generated a set of five mouse lines with expression of recombinases in striatal projection neurons and dopaminergic neurons for public use. They rigorously characterize the expression of the recombinases by intersectional crossing with reporter lines to demonstrate that these lines are faithful, and they perform electrophysiological experiments in slices to provide evidence that the respective neurons show the expected features in these assays.

      Strengths:

      The characterization of the new mouse lines is exceptional, and these will be widely used by the community. The mouse lines are openly available for the community to use.

      Weaknesses:

      No weaknesses were identified by this Reviewer.

    5. Author response:

      We thank all three reviewers for their thoughtful and constructive evaluations of our manuscript, “Generation of knock-in Cre and FlpO mouse lines for precise targeting of striatal projection neurons and dopaminergic neurons.” We are encouraged that the reviewers recognize the value, specificity, and utility of these new lines for the basal ganglia and dopamine research communities. Below, we summarize our planned revisions and clarifications in response to the reviewers’ comments.

      (1) Novelty and comparison with existing lines

      We appreciate Reviewer 1’s point regarding the existence of previously generated Cre and Flp lines targeting similar neuronal populations. Our project was initiated six years ago, and during the course of generating and characterizing all five lines, we became aware that similar individual lines have since been developed by other groups. Nevertheless, our study provides a coordinated and independently validated set of lines created using a standardized knock-in (KI) strategy and distributed through Jackson Laboratories for unrestricted community use. Importantly, whereas previous BAC transgenic approaches rely on random insertion, which can lead to position effects and ectopic expression, our design places the recombinase coding sequence immediately downstream of the endogenous stop codon using a self-cleaving T2A peptide. This ensures expression under native promoter and regulatory control, preserving physiological gene regulation.

      To address the Reviewers’ points, we will (i) expand the Introduction and Discussion to clarify the rationale and advantages of endogenous promoter–driven recombinase expression over BAC-based systems, emphasizing that our lines provide a uniform, promoter-controlled, and publicly accessible toolkit for the community, and (ii) explore including a comparative table summarizing differences in construct design, expression fidelity, and recombination efficiency across published lines (e.g., PMID 33979604, 38965445).

      (2) Quantification, validation, and comparison of Cre vs FlpO

      We agree with Reviewers 1 and 2 that further quantification and discussion of Cre versus FlpO fidelity will strengthen the manuscript. The observed difference in expression breadth between Cre and FlpO lines likely reflects a fundamental property of the recombinases themselves rather than a discrepancy in targeting. Cre recombinase is significantly more enzymatically efficient than FlpO, meaning that even very low endogenous levels of gene expression (e.g., Drd1a or Adora2a) can drive Cre-dependent recombination, whereas FlpO requires higher expression thresholds. Consequently, reporter-based readouts will inherently appear broader for Cre lines, despite both being driven by the same endogenous promoters.

      To address these points, we will (i) provide quantitative co-labeling analyses for the DAT-FlpO line with TH immunostaining to assess efficiency and specificity, (ii) clarify in the Results and Discussion that differences between Cre and FlpO expression patterns largely stem from differences in recombinase kinetics and sensitivity, not mismatched promoter activity, and (iii) include representative high-resolution images and relevant statistics in the revised figures. Importantly, we would like to note that RNAscope may not be an ideal validation approach in this context, as in situ transcript detection cannot capture the enzymatic threshold differences that determine reporter recombination and thus will not help address observed differences between Cre and FlpO lines. Finally, we are actively performing electrophysiological comparisons between Cre and FlpO lines to rigorously quantify potential physiological differences between them. Updated analyses will be incorporated as available or described as ongoing future work.

      (3) Discussion of scope and interpretation

      We appreciate the reviewers’ suggestions to better contextualize the scope of this resource. We will revise the Discussion to (i) highlight that the Cre–FlpO pairings enable powerful intersectional and cross-line strategies for dissecting basal ganglia and midbrain circuitry, and (ii) clarify that our goal was to generate a rigorously validated foundational resource, with detailed functional comparisons and manipulation studies to be explored in subsequent work.

      In summary, we thank the reviewers for their insightful feedback. The planned revisions and clarifications will underscore the unique strengths of our knock-in design, explore potential Cre–FlpO differences, and highlight the value of this standardized and accessible toolkit for the neuroscience community.

    1. eLife Assessment

      This important study resolves the structure of one missing piece of the eukaryotic DNA replication fork, the leading strand clamp loader. Convincing electron microscopy data provides a strong basis for analyzing differences and similarities with other RFC complexes. The evidence to support the specific role of the β-hairpin is incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      The authors report the structure of the human CTF18-RFC complex bound to PCNA. Similar structures (and more) have been reported by the O'Donnell and Li labs. This study should add to our understanding of CTF18-RFC in DNA replication and clamp loaders in general. However, there are numerous major issues that I recommend the authors fix.

      Strengths:

      The structures reported are strong and useful for comparison with other clamp loader structures that have been reported lately.

      Comments on revisions:

      The revised manuscript is greatly improved. The comparison with hRFC and the addition of direct PCNA loading data from the Hedglin group are particular highlights. I think this is a strong addition to the literature.

      I only have minor comments on the revised manuscript.

      (1) The clamp loading kinetic data in Figure 6 would be more easily interpreted if the three graphs all had the same x axes, and if addition of RFC was t=0 rather than t=60 sec.

      (2) The authors' statement that "CTF18-RFC displayed a slightly faster rate than RFC" seems to me a bit misleading, even though this is technically correct. The two loaders have indistinguishable rate constants for the fast phase, and RFC is a bit slower than CTF18-RFC in the slow phase. However, the data also show that RFC is overall more efficient than CTF18-RFC at loading PCNA because much more flux goes through the fast phase (rel amplitudes 0.73 vs 0.36). Because the slow phase represents such a reduced fraction of loading events, the slight reduction in rate constant for the slow phase doesn't impact RFC's overall loading. And because the majority of loading events are in the fast phase, RFC has a faster halftime than CTF18-RFC. (Is it known what the different phases correspond to? If it is known, it might be interesting to discuss.)

      (3) AAA+ is an acronym for "ATPases Associated with diverse cellular Activities" rather than "Adenosine Triphosphatase Associated".
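
      The amplitude-weighting argument in comment (2) can be illustrated with a short numerical sketch. It assumes the FRET traces follow a normalized double-exponential rise and that the quoted relative amplitudes (0.73 and 0.36) are the fast-phase fractions, with the remainder in the slow phase. The rate constants below are hypothetical placeholders chosen only to mirror the qualitative description (equal fast-phase constants, a slightly slower slow phase for RFC); they are not values from the manuscript.

```python
import math


def fraction_loaded(t, a_fast, k_fast, k_slow):
    """Normalized double-exponential rise; the slow phase carries 1 - a_fast."""
    a_slow = 1.0 - a_fast
    return 1.0 - a_fast * math.exp(-k_fast * t) - a_slow * math.exp(-k_slow * t)


def half_time(a_fast, k_fast, k_slow, tol=1e-9):
    """Time at which half of the total signal is reached, found by bisection."""
    lo, hi = 0.0, 1.0
    # Expand the bracket until the curve has passed the halfway point.
    while fraction_loaded(hi, a_fast, k_fast, k_slow) < 0.5:
        hi *= 2.0
    # The curve rises monotonically, so bisection converges on the crossing.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fraction_loaded(mid, a_fast, k_fast, k_slow) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Fast-phase amplitudes are taken from the review; rate constants (s^-1) are
# HYPOTHETICAL and only illustrate the argument.
t_half_rfc = half_time(a_fast=0.73, k_fast=1.0, k_slow=0.08)
t_half_ctf18 = half_time(a_fast=0.36, k_fast=1.0, k_slow=0.10)

print(f"RFC half-time:       {t_half_rfc:.2f} s")
print(f"CTF18-RFC half-time: {t_half_ctf18:.2f} s")
```

      Under these assumptions, the loader with the larger fast-phase amplitude reaches half-maximal signal sooner even though its slow-phase rate constant is smaller, which is why comparing individual rate constants alone can mislead.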

    3. Reviewer #2 (Public review):

      Summary

      Briola and co-authors have performed a structural analysis of the human CTF18 clamp loader bound to PCNA. The authors purified the complexes and formed a complex in solution. They used cryo-EM to determine the structure to high resolution. The complex assumed an auto-inhibited conformation, where DNA binding is blocked, which is of regulatory importance and suggests that additional factors could be required to support PCNA loading on DNA. The authors carefully analysed the structure and compared it to RFC and related structures.

      Strength & Weakness

      Their overall analysis is of high quality, and they identified, among other things, a human-specific beta-hairpin in Ctf18 that flexibly tethers Ctf18 to Rfc2-5. Indeed, deletion of the beta-hairpin resulted in reduced complex stability and a reduction in a primer extension assay with Pol ε. Moreover, the authors identify that the Ctf18 ATP-binding domain assumes a more flexible organisation.

      The data are discussed accurately and relevantly, which provides an important framework for rationalising the results.

      All in all, this is a high-quality manuscript that identifies a key intermediate in CTF18-dependent clamp loading.

      Comments on revisions:

      The authors have done a nice job with the revision.

    4. Reviewer #3 (Public review):

      Summary:

      CTF18-RFC is an alternative eukaryotic PCNA sliding clamp loader which is thought to specialize in loading PCNA on the leading strand. Eukaryotic clamp loaders (RFC complexes) have an interchangeable large subunit which is responsible for their specialized functions. The authors show that the CTF18 large subunit has several features responsible for its weaker PCNA loading activity, and that the resulting weakened stability of the complex is compensated by a novel beta hairpin backside hook. The authors show this hook is required for the optimal stability and activity of the complex.

      Relevance:

      The structural findings are important for understanding RFC enzymology and novel ways that the widespread class of AAA ATPases can be adapted to specialized functions. A better understanding of CTF18-RFC function will also provide clarity into aspects of DNA replication, cohesion establishment and the DNA damage response.

      Strengths:

      The cryo-EM structures are of high quality enabling accurate modelling of the complex and providing a strong basis for analyzing differences and similarities with other RFC complexes.

      Weaknesses:

      The manuscript would have benefited from a more detailed biochemical analysis using mutagenesis and assays to tease apart the differences with the canonical RFC complex. Analysis of the FRET assay could be improved.

      Overall appraisal:

      Overall, the work presented here is solid and important. The data is mostly sufficient to support the stated conclusions.

      Comments on revisions:

      While the authors addressed my previous specific concerns, they have now added a new experiment which raises new concerns.

      The FRET clamp loading experiments (Fig. 6) appear to be overfitted so that the fitted values are unlikely to be robust and it is difficult to know what they mean, and this is not explained in this manuscript. Specifically, the contribution of two exponentials is floated in each experiment. By eye, CTF18-RFC looks much slower than RFC1-RFC (as also shown previously in the literature) but the kinetic constants and text suggest it is faster. This is because the contribution of the fast exponential is substantially decreased, and the rate constants then compensate for this. There is a similar change in contribution of the slow and fast rates between WT CTF18 and the variant (where the data curves look the same) and this has been balanced out by a change in the rate constants, which is then interpreted as a defect. I doubt the data are strong enough to confidently fit all these co-dependent parameters, especially for CTF18, where a fast initial phase is not visible. I would recommend either removing this figure or doing a more careful and thorough analysis.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      The authors report the structure of the human CTF18-RFC complex bound to PCNA. Similar structures (and more) have been reported by the O'Donnell and Li labs. This study should add to our understanding of CTF18-RFC in DNA replication and clamp loaders in general. However, there are numerous major issues that I recommend the authors fix. 

      Strengths: 

      The structures reported are strong and useful for comparison with other clamp loader structures that have been reported lately. 

      Weaknesses: 

      The structures don't show how CTF18-RFC opens or loads PCNA. There are recent structures from other groups that do examine these steps in more detail, although this does not really dampen this reviewer's enthusiasm. It does mean that the authors should spend their time investigating aspects of CTF18-RFC function that were overlooked or not explored in detail in the competing papers. The paper poorly describes the interactions of CTF18-RFC with PCNA and the ATPase active sites, which are the main interest points. The nomenclature choices made by the authors make the manuscript very difficult to read. 

      Reviewer #2 (Public review): 

      Summary 

      Briola and co-authors have performed a structural analysis of the human CTF18 clamp loader bound to PCNA. The authors purified the complexes and formed a complex in solution. They used cryo-EM to determine the structure to high resolution. The complex assumed an auto-inhibited conformation, where DNA binding is blocked, which is of regulatory importance and suggests that additional factors could be required to support PCNA loading on DNA. The authors carefully analysed the structure and compared it to RFC and related structures. 

      Strength & Weakness 

      Their overall analysis is of high quality, and they identified, among other things, a human-specific beta-hairpin in Ctf18 that flexibly tethers Ctf18 to Rfc2-5. Indeed, deletion of the beta-hairpin resulted in reduced complex stability and a reduction in a primer extension assay with Pol ε. This is potentially very interesting, although some more work is needed on the quantification. Moreover, the authors argue that the Ctf18 ATP-binding domain assumes a more flexible organisation, but their visual representation could be improved. 

      The data are discussed accurately and relevantly, which provides an important framework for rationalising the results. 

      All in all, this is a high-quality manuscript that identifies a key intermediate in CTF18-dependent clamp loading.

      Reviewer #3 (Public review): 

      Summary: 

      CTF18-RFC is an alternative eukaryotic PCNA sliding clamp loader that is thought to specialize in loading PCNA on the leading strand. Eukaryotic clamp loaders (RFC complexes) have an interchangeable large subunit that is responsible for their specialized functions. The authors show that the CTF18 large subunit has several features responsible for its weaker PCNA loading activity and that the resulting weakened stability of the complex is compensated by a novel beta hairpin backside hook. The authors show this hook is required for the optimal stability and activity of the complex. 

      Relevance: 

      The structural findings are important for understanding RFC enzymology and novel ways that the widespread class of AAA ATPases can be adapted to specialized functions. A better understanding of CTF18-RFC function will also provide clarity into aspects of DNA replication, cohesion establishment, and the DNA damage response. 

      Strengths: 

      The cryo-EM structures are of high quality enabling accurate modelling of the complex and providing a strong basis for analyzing differences and similarities with other RFC complexes. 

      Weaknesses: 

      The manuscript would have benefitted from more detailed biochemical analysis to tease apart the differences with the canonical RFC complex. 

      I'm not aware of using Mg depletion to trap active states of AAA ATPases. Perhaps the authors could provide a reference to successful examples of this and explain why they chose not to use the more standard practice in the field of using ATP analogues to increase the lifespan of reaction intermediates. 

      Overall appraisal: 

      Overall the work presented here is solid and important. The data is sufficient to support the stated conclusions and so I do not suggest any additional experiments. 

      Reviewer #1 (Recommendations for the authors): 

      We thank the reviewer for their positive comments and for their thorough review. All raised points have been addressed below.

      Major points 

      (1) The nomenclature used in the paper is very confusing and sometimes incorrect. The authors refer to CTF18 protein as "Ctf18", and the entire CTF18-RFC complex as "CTF18". This results in massive confusion because it is hard to ascertain whether the authors are discussing the individual subunits or the entire complex. Because these are human proteins, each protein name should be fully capitalized (i.e. CTF18, RFC4 etc). The full complex should be referred to more clearly with the designation CTF18-RFC or CTF18-RLC (RFC-like complex). Also, because the yeast and human clamp loader complexes use the same nomenclature for different subunits, it would be best for the authors to use the "A, B, C, D, E subunit" nomenclature that has been standard in the field for the past 20 years. Finally, the authors try to distinguish PCNA subunits by labeling them "PCNA2" or "PCNA1" (see Page 8 lines 180,181 for an example). This is confusing because the names of the RFC subunits have similar formats (RFC2, RFC3, RFC4, etc). In the case of RFC this denotes unique genes, whereas PCNA is a homotrimer. Could the authors think of another way to denote the different subunits, such as super/subscript? PCNA-I, PCNA-II, PCNA-III? 

      We thank the reviewer for pointing out the confusing nomenclature. Following the referee's suggestion, we now refer to the full CTF18 complex as “CTF18-RFC”. We prefer keeping the nomenclature used for CTF18-RFC subunits as RFC2, RFC3 etc., as recently used in Yuan et al, Science, 2024. However, we followed the referee’s suggestion for PCNA subunits, now referred to as PCNA-I, PCNA-II and PCNA-III.

      (2) I believe that the authors are over-interpreting their data in Figure 1. The claim that "less sharp definition" of the map corresponding to the AAA+ domain of Ctf18 supports a relatively high mobility of this subunit is largely unsubstantiated. There are several reasons why one could get varying resolution in a cryo-EM reconstruction, such as compositional heterogeneity, preferred orientation artifacts, or how the complex interacts with the air-water interface. If other data were presented that showed this subunit is flexible, this evidence would support that data but cannot alone serve as justification for subunit mobility. Along these lines, how was the buried surface area (2300 vs 1400 Å²) calculated? Is this the total surface area or only the buried surface area involving the AAA+ domains? It is surprising that these numbers are so different considering that the subunits and complexes look so similar (Figures 1c and 2b).

      We respectfully disagree with the suggestion that our interpretation of local flexibility in the AAA+ domain of Ctf18 is overreaching. Several lines of evidence support this interpretation. First, compositional heterogeneity is unlikely, as the A′ domain of Ctf18 is well-resolved and forms stable interactions with RFC3, indicating that Ctf18 is consistently incorporated into the complex. Second, preferred orientation artifacts are excluded, as the particle distribution shows excellent angular coverage (Fig. S9a). Third, we now include a 3D variability analysis (3DVA; Supplementary Video 1), which reveals local conformational heterogeneity centered around the AAA+ domain of Ctf18, consistent with intrinsic flexibility.

      Regarding the buried surface area values, the reported numbers refer specifically to the interfaces between the AAA+ domain of Ctf18 and RFC2, and are derived from buried surface area calculations performed with PISA. The smaller interface (~1400 Ų) compared to RFC1–RFC2 (~2300 Ų) reflects low sequence identity (~26%) and divergent structural features, including the absence of conserved elements such as the canonical PIP-box in Ctf18. We have clarified and expanded this explanation in the revised manuscript (Page 7).

      (3) The authors very briefly discuss interactions with PCNA and how the CTF18-RFC complex differs from the RFC complex. This is amongst the most interesting results from their work, but also not well-developed. Moreover, Figure 3D describing these interactions is extremely unclear. I feel like this observation had potential to be interesting, but is largely ignored by the authors. 

      We thank the referee for pointing this out. We have expanded the section describing the interactions of CTF18-RFC and PCNA (Page 9 in the new manuscript), and made a new panel figure with further details (Fig. 3D).  

      (4) The authors make the observation that key ATP-binding residues in RFC4 are displaced and incompatible with nucleotide binding in their CTF18-RFC structure compared to the hRFC structure. This should be a main-text figure showing these displacements and how it is incompatible with ATP binding. Again, this is likely an interesting finding that is largely glossed over by the authors. 

      We now discuss this feature in detail (Page 11 in the new manuscript), and added two figure insets (Fig. 4c) describing the incompatibility of RFC4 with nucleotide binding.

      (5) The authors claim that the work of another group (citation 50) "validate(s) our predictions regarding the significant similarities between CTF18-RFC and canonical RFC in loading PCNA onto a ss/dsDNA junction." However, as far as this reviewer can tell the work in citation 50 was posted online before the first draft of this manuscript appeared on biorxiv, so it is dubious to claim that these were "predictions." 

      We agree with the referee about this claim. We have now revised the text as follows:

      “While our work was being finalized, several cryo-EM structures of human CTF18-RFC bound to PCNA and primer/template DNA were reported by another group (He et al, PNAS, 2024). These findings are consistent with the distinct features of CTF18-RFC observed in our structures and independently support the notion of significant mechanistic similarity between CTF18-RFC and canonical RFC in loading PCNA onto a ss/dsDNA junction”.

      (6) The authors use a primer extension assay to test the effects of truncating the N-terminal beta hairpin of CTF18. However, this assay is only a proxy for loading efficiency and the observed effects of the mutation are rather subtle. The authors could test their hypothesis more clearly if they performed an ATPase assay or even better a clamp loading assay.

      We thank the referee for this valuable suggestion. In response, we have performed clamp loading assays comparing the activities of human RFC, wild-type CTF18-RFC, and the β-hairpin–truncated CTF18-RFC mutant. The results, now presented in Fig. 6 and Table 1 of the revised manuscript, clearly show that truncation of the N-terminal β-hairpin results in a slower rate of PCNA loading. We propose that this reduced loading rate likely contributes to the diminished Pol ε–mediated DNA synthesis observed in the primer extension assays.

      Minor points 

      (1) Page 3 line 53 the introduction suggests that ATP hydrolysis prompts clamp closure. While this may be the case, to my knowledge all recent structural work shows that closure can occur without ATP hydrolysis. It may be better to rephrase it to highlight that under normal loading conditions, ATP hydrolysis occurs before clamp closure. 

      The text now reads (Page 3): 

      “DNA binding prompts the closure of the clamp and hydrolysis of ATP induces the concurrent disassembly of the closed clamp loader from the sliding clamp-DNA complex, completing the cycle necessary for the engagement of the replicative polymerases to start DNA synthesis.”

      (2) Page 3 line 60, I do not see how the employment of alternative loaders highlights the specificity of the loading mechanism - would it not be possible for multiple loaders to have promiscuous clamp loading? 

      We thank the referee for this comment. The text now reads (Page 3):

      “However, eukaryotes also employ alternative loaders (20), including CTF18-RFC (6, 21-24), which likely use a conserved loading mechanism but are functionally specialized through specific protein interactions and context-dependent roles in DNA replication.”

      (3) Page 4 line 75 could you please cite a study that shows Ctf8 and Dcc1 bind to the Ctf18 C-terminus and that a long linker is predicted to be flexible? 

      Two references have been added (Stokes et al, NAR, 2020 and Grabarczyk et al, Structure, 2018)

      (4) Figure 2A has the N-terminal region of Ctf18 as bound to RFC3 but should likely be labeled as bound to RFC5. This caused significant confusion while trying to parse this figure. Further, the inclusion of "X" as a sequence - does this refer to a sequence that was not buildable in the cryo-EM map? I would be surprised that density immediately after the conserved DEXX box motif is unbuildable. If this is the case, it should be clearly stated in the figure legend that "X" denotes an unbuildable sequence. For the conserved beta-hairpin in the sequence, could the authors superimpose the AlphaFold prediction onto their structure? It would be more informative than just looking at the sequence. 

      We apologize for this confusion. The error in Figure 2A has been corrected. The figure caption now explicitly states that “X” refers to amino acid residues in the sequence that were not modelled. A superposition of the cryo-EM model of the N-terminal β-hairpin in human Ctf18 and AlphaFold predictions for this feature in Drosophila and yeast Ctf18 is now presented in Figure 2A.

      (5) Page 8 line 168, the use of the term "RFC5" here feels improper, since the "C" subunit is not RFC5 in all lower eukaryotes (see comment above about nomenclature). For instance, in S cerevisiae, the C subunit is RFC3. I would expect this interaction to be maintained in all C subunits, not all RFC5 subunits. 

      The text now reads (Page 8):

      “Therefore, lower eukaryotes may use a similar β-hairpin motif to bind the corresponding subunit of the RFC-module complex (RFC5 in human, Rfc3 in S. cerevisiae), emphasizing its importance.”

      (6) Page 10 line 228, the authors claim that hydrolysis is dispensable at the Ctf18/RFC2 interface based on evidence from RFC1/RFC2 interface, by analogy that this is the "A/B" interface in both loaders. However, the wording makes it sound as if the cited data were collected while studying Ctf18 loaders. The authors should clarify this point. 

      The text has been modified as follows (Page 11): 

      “Prior research has indicated that hydrolysis at the large subunit/RFC2 interface is not essential for clamp loading by various loaders (48-51), while the others are critical for the clamp-loading activity of eukaryotic RFCs.”

      (7) Page 11 line 243/244 the authors introduce the separation pin. Could they clarify whether Ctf18 contains any aromatic residues in this structural motif that would suggest it serves the same functional purpose? Also, the authors highlight this is similar to yeast RFC, which makes it sound like this is not conserved in human RFC, but the structural motif is also conserved in human RFC. 

      We thank the reviewer for this helpful comment. We have clarified in the revised text (Page 12) that the separation pin is conserved not only in yeast RFC but also in human RFC, and now note that human Ctf18 also harbors aromatic residues at the corresponding positions. This observation is supported by the new panel in Figure 4e.

      Minutia 

      (1) Page 2 line 37 please remove the word "and" before PCNA. 

      This has been corrected.

      (2) Please define AAA+ and update the language to clarify that not all pentameric AAA+ ATPases are clamp loaders. 

      AAA+ has now been defined (Page 3).

      (3) Page 4 line 86 Given the relatively weak interaction of Pol ε. 

      This has been corrected.

      (4) Page 8 line 204 the authors likely mean "leucine" and not "lysine". 

      We thank the reviewer for catching this. The error has been corrected.

      (5) Page 14 line 300, the authors claim that CTF18 utilizes three subunits but then list four. 

      We have corrected this.

      Reviewer #2 (Recommendations for the authors): 

      We thank the reviewer for their positive comments and valuable suggestions. The points raised by the referee have been addressed below.

      Major point: 

      (1) Please quantify Figure 6 and S9 from 3 independent repeats and determine the standard deviation to show the variability of the Ctf18 beta hairpin deletion.  The authors suggest that a suboptimal Ctf18 complex interaction with PCNA impacts the stability of the complex, but do not test this hypothesis. Could the suboptimal PIP motif in Ctf18 be changed to an improved motif and the impact tested in the primer extension assay? Although not essential, it would be a nice way to explore the mechanism. 

      We thank the reviewer for the suggestion. However, we note that Figure 6b (now 7b) already presents the quantification of the primer extension assay from three independent replicates, with error bars showing standard deviations, and includes the calculated rate of product accumulation. These data clearly indicate a 42% reduction in primer synthesis rate upon deletion of the Ctf18 β-hairpin.

      We agree that we do not provide direct evidence of impaired complex stability upon deletion of the Ctf18 β-hairpin. However, the 2D classification of the cryo-EM dataset (Figure S9) shows a marked reduction in the number of particles corresponding to intact CTF18-RFC–PCNA complexes in the β-hairpin deletion sample, with the majority of particles corresponding to free PCNA. This contrasts with the wild-type dataset, where complex particles are predominant. These findings indirectly suggest that deletion of the β-hairpin compromises the stability or assembly of the clamp-loader–clamp complex.

      We thank the reviewer for the valuable suggestion to mutate the weak PIP-box of Ctf18. While this is an interesting direction, we instead sought to directly test the mechanism by performing quantitative clamp loading assays. These assays revealed a significant reduction in the rate of PCNA loading by the CTF18<sup>Δ165–194</sup>-RFC mutant (Figure 6), supporting the conclusion that the β-hairpin contributes to productive PCNA loading. This loading delay likely underlies the reduced rate of primer extension observed in the Pol ε assay (Figure 7), consistent with impaired formation of processive polymerase–clamp complexes.

      (2) I did not see the method describing how the 2D classes were quantified to evaluate the impact of the Ctf18 beta hairpin deletion on complex formation. Please add the relevant information. 

      The relevant information has been added to the Methods section:

      “For quantification of complex stability, the number of particles contributing to each 2D class was extracted from the classification metadata (Datasets 1 and 3). All classes showing isolated PCNA rings were summed and compared to the total number of particles in classes representing intact CTF18-RFC–PCNA complexes. This analysis was performed for both wild-type and β-hairpin deletion mutant datasets. Notably, no 2D classes corresponding to free PCNA were observed in the wild-type dataset, whereas in the mutant dataset, a substantial fraction of particles corresponded to isolated PCNA, suggesting reduced stability of the mutant complex.”
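      The per-class particle tally described in the quoted Methods text can be sketched as follows. This is only an illustration under stated assumptions: the function name, the class labels ("free_pcna", "complex") and the dictionary layout are hypothetical and are not taken from the authors' processing pipeline or from any cryo-EM package's metadata format.

```python
def free_pcna_fraction(class_counts, class_labels):
    """Fraction of particles in free-PCNA classes among free-PCNA plus complex classes.

    class_counts: dict mapping a 2D class id to its particle count.
    class_labels: dict mapping a 2D class id to "free_pcna", "complex",
                  or any other label (e.g. "junk"), which is ignored.
    Both structures are hypothetical stand-ins for classification metadata.
    """
    totals = {"free_pcna": 0, "complex": 0}
    for class_id, n_particles in class_counts.items():
        label = class_labels.get(class_id)
        if label in totals:
            totals[label] += n_particles
    considered = totals["free_pcna"] + totals["complex"]
    if considered == 0:
        raise ValueError("no particles assigned to free-PCNA or complex classes")
    return totals["free_pcna"] / considered


# Made-up counts, for illustration only (not data from the study).
counts = {1: 100, 2: 300, 3: 50}
labels = {1: "free_pcna", 2: "complex", 3: "junk"}
print(free_pcna_fraction(counts, labels))
```

      Applying the same function to the wild-type and mutant datasets would give two fractions that can be compared directly; a wild-type dataset with no free-PCNA classes would return 0.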

      Minor point: 

      (1) Page 2, line 25. Detail what type of mobility is referred to. Do you mean flexibility in the EM-map? 

      We have clarified this. The text now reads:

      “The unique RFC1 (Ctf18) large subunit of CTF18-RFC, which based on the cryo-EM map shows high relative flexibility, is anchored to PCNA through an atypical low-affinity PIP box”

      (2) Page 4, line 82. Please introduce CMGE, or at least state what the abbreviation stands for. 

      This has been addressed.

      (3) Page 4, line 89. Specify that the architecture of the HUMAN CTF18-RFC module is not known, as the yeast one has been published. 

      At the time our study was initiated, the architecture of the human CTF18-RFC module was unknown. A structure of the human complex was published by another group during the final stages of our work and is now properly acknowledged in the Discussion.

      (4) Page 6. Is it possible to illustrate why the autoinhibited state cannot bind to DNA? A visual representation would be nice. 

      We thank the reviewer for this suggestion. Figure 4b in the original manuscript already illustrates why the autoinhibited, overtwisted conformation of the CTF18-RFC pentamer cannot accommodate DNA. In this state, the inner chamber of the loader is sterically occluded, precluding the binding of duplex DNA.

      Reviewer #3 (Recommendations for the authors): 

      We thank Reviewer #3 for their constructive feedback and positive overall assessment of our work.

      We also thank the reviewer for their remarks on the use of Mg depletion to halt hydrolysis. Magnesium is an essential cofactor for ATP hydrolysis, and its depletion is expected to effectively prevent catalysis by destabilizing the transition state, possibly more completely than the use of slowly hydrolysable analogues such as ATPγS. We have recently employed Mg<sup>2+</sup> depletion to successfully trap a pre-hydrolytic intermediate in a replicative AAA+ helicase engaged in DNA unwinding (Shahid et al., Nature, 2025). This precedent supports the rationale for our choice, and the reference has now been included in the revised manuscript.

      I think the authors deposited the FSC curve for the +Mg structure in the -Mg structure PDB/EMDB entry according to the validation report. 

      We thank the reviewer for their careful inspection of the deposition materials. The discrepancy in the deposited FSC curve has now been corrected, and the appropriate FSC curves have been assigned to the correct PDB/EMDB entries.

    1. eLife Assessment

      This important study reports on the redundant roles of the decapping activators Edc3 and Scd6 in orchestrating post-transcriptional programs to modulate metabolic responses to nutrients in yeast. The authors employed mutagenesis studies in conjunction with a battery of transcriptome-wide analyses to provide convincing evidence supporting their conclusions. Considering the broad implications of post-transcriptional regulation of gene expression, this study will be of interest across a variety of biomedical disciplines ranging from biochemistry and molecular and cellular biology to those specializing in studying various pathologies.

    2. Reviewer #1 (Public review):

      Summary:

      mRNA decapping and decay factors play critical roles in post-transcriptionally regulating gene expression. Here, Kumar and colleagues investigate how deleting two yeast decapping enhancer proteins (Edc3 and Scd6), either alone or in tandem, affects the transcriptome. Using RNA-Seq, CAGE-Seq and ribosome profiling, they conclude that these factors generally act in a redundant fashion, with a mutant lacking both proteins showing an increased abundance of select mRNAs. As these upregulated transcripts are also upregulated in mutants lacking the decapping enzyme, Dcp2, and show no increases in transcription of their cognate genes, the authors conclude that this is at the level of mRNA decapping and decay. This was further supported by CAGE-Seq analyses carried out in WT cells and the scd6∆edc3∆ double mutant. Their ribosome profiling data also lead them to conclude that Scd6 and Edc3 display functional redundancy and cooperativity with Dhh1/Pat1 in repressing the translation of specific transcripts. Finally, as their data suggest that Scd6 and Edc3 repress mRNAs coding for proteins involved in cellular respiration, as well as proteins involved in the catabolism of alternative carbon sources, they go on to show that these decapping activators play a role in repressing oxidative phosphorylation.

      Strengths:

      Overall, this manuscript is well-written and contains a large amount of compelling high-quality data and analyses. At its core, it helps to shed light on the overlapping roles Edc3 and Scd6 have in sculpting the yeast transcriptome.

      Weaknesses:

      While not essential, it would be interesting if the authors carried out add-back experiments to determine which domain within Scd6/Edc3 plays a critical role in enforcing the regulation that they see. Their double mutant now puts them in a perfect position to carry out such experiments.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript by Kumar and Zhang presents compelling evidence that the decapping activators Edc3 and Scd6 show a high degree of redundancy that is revealed only when both are deleted in a double mutant. In addition, the authors provide strong evidence for their role in regulating starvation-induced pathways, as shown by measurements of mitochondrial membrane potential, metabolomics and analysis of the flux of Krebs cycle intermediates.

      Strengths:

      Kumar, Zhang et al provide multiple sources of evidence for the direct mechanism of Edc3 and Scd6 by using and comparing different approaches such as mRNA-seq, ribosome occupancies and translational efficiencies. Through extensive analysis, the authors show that this complex can also regulate genes outside the Environmental Stress Response (non-iESR) that are significantly up-regulated in all three mutants. Remarkably, the gene ontology analysis of these non-iESR genes identifies enrichment for mitochondrial proteins that are implicated in the Krebs cycle. Overall, this study adds novel mechanistic insight into how nutrients control gene expression by modulating decapping and translational repression.

      Weaknesses:

      The authors show very nicely that growth phenotypes from scd6Δedc3∆ can be rescued by transformation of EDC3 (pLfz614-7) or SCD6 (pLfz615-5). Future work could make use of these rescue strategies, for example as a platform to further characterise protein-protein interactions between Edc3, Scd6 and Dhh1.

    4. Reviewer #3 (Public review):

      Summary:

      In this paper, Kumar et al investigated the role of two decapping activators, Edc3 and Scd6, in regulating mRNA decay and translation in yeast. Using a variety of approaches including RNA-seq, ribosome profiling, proteomics, polysome analysis, and metabolomics, the authors demonstrate that whereas single deletions of Edc3 or Scd6 have modest effects, the double mutant leads to increased abundance of mRNAs, many of which overlap with those targeted by the decapping activators Dhh1 and Pat1. The data suggest that Edc3 and Scd6 function redundantly to recruit Dhh1 to the Dcp2 decapping complex, thereby promoting mRNA turnover and translational repression. The authors show that these factors cooperate with Dhh1/Pat1 to repress transcripts involved in respiration, mitochondrial function, and alternative carbon source utilization, linking post-transcriptional regulation to nutrient responses. The study establishes Edc3 and Scd6 as important but redundant regulators that fine-tune gene expression and metabolic adaptation in response to nutrient availability.

      Strengths:

      The paper has several strengths, including the comprehensive approach taken by the authors using multiple experimental techniques (RNA-seq, ribosome profiling, Western blotting, TMT-MS, polysome profiling, and metabolomics) to provide multiple lines of evidence to support their conclusions. The authors demonstrate clear redundancy of the factors by using single and double mutants for Edc3 and Scd6 and their global approach enables an understanding of these factors' roles across the yeast transcriptome. The work connects post-transcriptional processes to nutrient-dependent gene regulation, providing insights into how cells adapt to changes in their environment. The authors demonstrate the redundant roles of Edc3 and Scd6 in mRNA decapping and translation repression. Their RNA-seq and ribosome profiling results convincingly show that many mRNAs are derepressed only in the double mutants, confirming their hypothesis of redundancy. Furthermore, the functional cooperation between Edc3/Scd6 and Dhh1/Pat1 in regulating specific metabolic pathways, including mitochondrial function and carbon source utilization, is supported by the metabolomic data.

      Weaknesses:

      The study uses indirect evidence to support claims about the effect on mRNA stability rather than directly measuring mRNA stability. However, the combination of Pol II occupancy and RNA abundance measurements is consistent with the claims regarding mRNA stability. The addition of new experiments in the revision co-IPing Dhh1 and Dcp2 strengthens the argument that Edc3 and Scd6 recruit these factors.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Strengths: 

      Overall, this manuscript is well-written and contains a large amount of high-quality data and analyses. At its core, it helps to shed light on the overlapping roles of Edc3 and Scd6 in sculpting the yeast transcriptome. 

      Weaknesses: 

      (1) While the data presented makes conclusions about mRNA stability based on corresponding ChIP-Seq analyses and analyzing other mutants (e.g. Dcp2 knockout), at no point is mRNA stability actually ever directly assessed. This direct assessment, even for select transcripts, would further strengthen their conclusions. 

      We appreciate the reviewer’s concern but wish to emphasize that we conducted ChIP-Seq analysis of RNA Polymerase II occupancies in the CDSs of all genes, known to be a reliable indicator of transcription rate, and found only small increases in Pol II occupancies that cannot account for the increased transcript levels of the cohort of mRNAs up-regulated in the scd6∆edc3∆ double mutant (Fig. 3E). This provides strong evidence that increased transcription is not the main driver of increased mRNA abundance in this mutant. Bolstering this conclusion, we showed that the Hap2/Hap3/Hap4/Hap5 complex of transcription factors responsible for induction of Ox. Phos. genes was not activated in scd6Δedc3Δ cells in glucose medium (Fig. 6F(ii)); nor was the Adr1 activator of CCR genes activated (Fig. S9C(i)), ruling out transcriptional induction of their target genes in glucose-replete scd6Δedc3Δ cells and instead favoring reduced degradation as the mechanism underlying derepression of Ox. Phos. and CCR gene transcripts in this mutant. In Fig. 3B, we further showed that the majority of mRNAs up-regulated in the scd6Δedc3Δ double mutant are also derepressed by dcp2Δ, and in Fig. 3D that the mRNAs up-regulated in scd6∆edc3∆ cells exhibit a higher than average codon protection index (CPI), indicating a heightened involvement of decapping and co-translational degradation by Xrn1 in their decay. To provide additional support for our conclusion, we have conducted new experiments to measure the abundance of capped mRNAs genome-wide by CAGE sequencing of total mRNA in both WT and scd6∆edc3∆ cells. As established previously, normalizing CAGE TPMs to total mRNA TPMs determined by RNA-Seq, dubbed the C/T ratio, provides a reliable measure of the capped proportion of each transcript. The new data presented in Fig. 3C indicate that the mRNAs up-regulated in the scd6∆edc3∆ mutant have significantly lower than average C/T ratios in WT cells, whereas the C/T ratios for the down-regulated transcripts are higher than average, and that these differences between the two groups and all expressed mRNAs are diminished in the scd6∆edc3∆ double mutant. These are the results expected if the up-regulated mRNAs are selectively targeted for decapping in WT cells dependent on Edc3/Scd6, whereas the down-regulated mRNAs are targeted by Edc3/Scd6 less than the average transcript. In the original version of the paper, we came to the same conclusion by analyzing our previous CAGE data for the dhh1∆ mutant for the same transcripts dysregulated in scd6∆edc3∆ cells, now presented as supportive data in Fig. S3F. Finally, we added the fact that among all four Dhh1 target mRNAs examined in the previous study of He et al. (2022) and found here to be up-regulated selectively in the scd6∆edc3∆ double mutant (Fig. S10), two of them (SDS23 and HXT6) were shown directly to have longer half-lives in dhh1∆ vs. WT cells by He et al. (2018). Hence, the combined evidence is compelling that selective up-regulation of particular mRNAs in the scd6∆edc3∆ mutant results from diminished decapping/decay rather than enhanced transcription; and we feel that the additional supporting evidence that would be provided by measuring half-lives of a small group of up-regulated transcripts would not justify the considerable effort required to do so. Moreover, the standard approach for such experiments of impairing transcription with an inhibitor of Pol II or a Pol II Ts<sup>-</sup> mutation has been criticized because of the known buffering (suppression) of mRNA decay rates in response to impaired transcription.
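      The C/T ratio described in this response (CAGE TPM divided by RNA-Seq TPM for each transcript) can be sketched as below. This is a minimal illustration, not the authors' pipeline: the function names, the dictionary inputs and the `min_rna_tpm` expression cutoff are assumptions introduced here.

```python
from statistics import median


def ct_ratios(cage_tpm, rna_tpm, min_rna_tpm=1.0):
    """Return the C/T ratio (CAGE TPM / RNA-Seq TPM) per transcript.

    cage_tpm, rna_tpm: dicts mapping a gene name to its TPM value.
    min_rna_tpm: hypothetical cutoff that skips weakly expressed transcripts,
                 whose ratios would be dominated by noise.
    """
    ratios = {}
    for gene, total in rna_tpm.items():
        if total < min_rna_tpm or gene not in cage_tpm:
            continue
        ratios[gene] = cage_tpm[gene] / total
    return ratios


def median_ct(ratios, genes):
    """Median C/T ratio over the subset of `genes` that has a ratio."""
    values = [ratios[g] for g in genes if g in ratios]
    if not values:
        raise ValueError("none of the requested genes has a C/T ratio")
    return median(values)


# Made-up TPM values, for illustration only (not data from the study).
cage = {"A": 5.0, "B": 20.0, "C": 1.0}
rna = {"A": 10.0, "B": 20.0, "C": 0.5}
ratios = ct_ratios(cage, rna)          # "C" is dropped by the expression cutoff
print(ratios, median_ct(ratios, ["A", "B"]))
```

      In this framing, a group of transcripts preferentially targeted for decapping would show a lower median C/T ratio than all expressed mRNAs in WT cells, and that gap would shrink in the mutant.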

      (2) Scd6 and Edc3 show a high level of functional redundancy, as demonstrated by the double mutant. As these proteins form complexes with other decapping factors/activators, I'm curious if depleting both proteins in the double mutant destabilizes any of these other factors. Have the authors ever assessed the levels of other key decapping factors in the double mutants (i.e. Dhh1, Pat1, Dcp2...etc)? I wonder if depleting both proteins leads to a general destabilization of key complexes. It would also be interesting to see if depleting Edc3 or Scd6 leads to a concomitant increase in the other protein as a compensatory mechanism. 

      We thank the reviewer for this insight. Examining our Ribo-Seq and TMT-MS data revealed that Dhh1 expression and steady-state abundance are increased ~2-fold in the scd6∆edc3∆ strain, indicating that the up-regulation of many of the same mRNAs by scd6∆edc3∆ and dhh1∆ does not result indirectly from reduced levels of Dhh1 in the scd6∆edc3∆ mutant. The predicted increase in Dhh1 expression might signify a compensatory response to the absence of Scd6/Edc3. We also observed an ~40% reduction in Dcp2 translation (RPFs) and mRNA abundance in the scd6∆edc3∆ strain, which might contribute to the up-regulation of mRNAs dysregulated in this mutant. However, our new immunoblot analyses revealed no significant reduction in steady-state Dcp2 levels in scd6∆edc3∆ cells (Input lanes in Figs. 3F and S4C(i)-(ii)). Moreover, our previous finding that the majority of mRNAs subject to NMD, up-regulated by both upf1∆ and dcp2∆, are not up-regulated by scd6∆edc3∆ implies that Dcp2 abundance in scd6∆edc3∆ cells is adequate for normal levels of NMD and favors a direct role for Scd6/Edc3 in accelerating degradation of most transcripts up-regulated in this mutant. We have added these points to the DISCUSSION.

      (3) While not essential, it would be interesting if the authors carried out add-back experiments to determine which domain within Scd6/Edc3 plays a critical role in enforcing the regulation that they see. Their double mutant now puts them in a perfect position to carry out such experiments. 

      We agree with the reviewer that our scd6∆edc3∆ strain provides an opportunity to dissect the Scd6 and Edc3 proteins to determine which domains and motifs of each protein are most critically required for their functions in activating mRNA decay. However, if conducted thoroughly, this would entail an extensive analysis requiring a combination of genetics, biochemistry and genomics.  Considering the large amount of data already presented in 43 and 34 panels of main and supplementary figures, respectively, we feel that these additional experiments would be conducted more appropriately as a stand-alone follow-up study.

      Reviewer #2 (Public review): 

      Weaknesses: 

      The authors show very nicely in Figure S1A that growth phenotypes from scd6Δedc3∆ can be rescued by transformation of EDC3 (pLfz614-7) or SCD6 (pLfz615-5). The manuscript might benefit from using these rescue strategies in the analysis performed (e.g. RNA-seq, ribosome occupancies, and translational efficiencies). Also, these rescue assays could provide a good platform to further characterise the protein-protein interactions between Edc3, Scd6, and Dhh1. 

      We responded to this point immediately above in responding to Rev. #1.

      Reviewer #3 (Public review): 

      Weaknesses: 

      The limitations of the study include the use of indirect evidence to support claims that Edc3 and Scd6 recruit Dhh1 to the Dcp2 complex, which is inferred from correlations in mRNA abundance and ribosome profiling data rather than direct biochemical evidence. 

      While the reviewer makes a valid point, it is important to note that the greater correlations between effects of scd6∆edc3∆ with those conferred by dhh1∆ vs. pat1∆ also extended to changes in metabolites (Fig. 7A-C). To provide more direct evidence that Edc3 and Scd6 recruit Dhh1 to the Dcp2 complex, we have now conducted co-immunoprecipitation experiments (presented in new Figs. 3F and S5) demonstrating that association of Dhh1 with Dcp2 is diminished in the scd6∆edc3∆ double mutant but not in either scd6∆ or edc3∆ single mutant, thus providing biochemical support for our proposal.

      Also, there is limited exploration of other signals as the study is focused on glucose availability, and it is unclear whether the findings would apply broadly across different environmental stresses or metabolic pathways. Nonetheless, the study provides new insights into how mRNA decapping and degradation are tightly linked to metabolic regulation and nutrient responses in yeast. The RNA-seq and ribosome profiling datasets are valuable resources for the scientific community, providing quantitative information on the role of decapping activators in mRNA stability and translation control. 

      While not disputing the facts of this comment, we think it is unjustified to label as a weakness that our study focused on glucose-grown cells considering the large amount of new data and insights made possible by our multi-omics approach, presented in >70 separate figure panels and nine supplementary datafiles, which the reviewer has characterized as being valuable to the scientific community.  Parallel studies in non-preferred carbon or nitrogen sources are underway and represent large-scale investigations in their own right, for which the current dataset in glucose-replete cells provides the critical reference condition.

      Reviewer #1 (Recommendations for the authors): 

      The authors made a note that a set of 37 mRNAs is repressed exclusively by Edc3 with little contribution by Scd6, a list that includes the RPS28B mRNA. Edc3 has been previously reported to promote the decay of this mRNA in a deadenylation-independent fashion by binding to an element in its 3'UTR (PMIDs 15225544, 24492965). Can the authors comment on whether Edc3 may be binding to similar elements in the 3'UTRs of these transcripts in their shortlist? This could be an interesting topic matter for discussion as well. 

      While an interesting idea, this seems unlikely because the 3’UTR sequence in RPS28B mRNA was shown to bind Rps28 protein itself to confer heightened decapping and decay dependent on Edc3 in a negative autoregulatory loop that exerts tight control over Rps28 protein levels. It would be surprising if Edc3-mediated repression of the other 36 mRNAs involved Rps28, as none of them encode cytoplasmic ribosomal proteins. Nevertheless, we searched for a conserved motif among the 3’UTRs of the 37 mRNAs using the MEME suite and found enrichment for motifs identified for RNA binding proteins Hrp1 and Nab2 and two novel motifs, but none of these motifs could be recognized within the Rps28 autoregulatory loop. We have chosen not to comment on these findings in the revised manuscript to avoid lengthening it unnecessarily with inconclusive observations.

      Reviewer #2 (Recommendations for the authors): 

      The authors show very nicely in Figure S1A that growth phenotypes from scd6Δedc3∆ can be rescued by the transformation of EDC3 (pLfz614-7) or SCD6 (pLfz615-5). The manuscript might benefit from using these rescue strategies on the analysis performed (e.g. RNA-seq, ribosome occupancies, and translational efficiencies); or expressing truncated mutants of EDC3 (pLfz614-7) or SCD6 (pLfz615-5), to show that they can act as dominant negative competitors, either on the binding to Dhh1 and Dcp2. 

      We addressed this comment above in our response to this Reviewer.

      Reviewer #3 (Recommendations for the authors): 

      (1) Labels such as "mRNA_up_s6,e3" are not defined in figures or the text. I suggest clearer sample labeling throughout. 

      The labels had been defined at first mention in the RESULTS but are now indicated there more explicitly, as well as in the legend to Fig. 1.

      (2) In Figure 1D it is surprising that the mRNA profile has a peak in the 5' UTR. I would expect to see such a peak in ribosome footprinting data. Is it possible these are incorrectly labeled?

      The figure is correctly labeled. Generally, one does not expect to see RPFs in the 5’UTR region unless there is an efficiently translated uORF, which appears not to be the case for MDH2.

      In general, the information in this panel and C is inadequate. None of the numbers are clearly explained in the figure legend or in the figure. 

      We had cited the legend to Fig. S3C for details of all such gene browser images but have now inserted this information into the Fig. 1D legend, at the first occurrence of such data in the regular figures. 

      (3) Figures 1C and 1D are in the wrong order.

      Corrected.

      (4) Figure 2D is a very complicated Venn Diagram. I suggest using UpSet plots as an alternative to Venn diagrams to more clearly convey overlaps between sets.  

      We provided additional explanatory text in the Fig. 2D legend to facilitate understanding.

      (5) The use of the same color scheme to represent different sets in panels of the same figure is a source of confusion. E.g. the cyan in Figures 2A, 2D, and 2E indicates unrelated categories, but one would think they are related.

      The use of the same cyan color in these three figure panels actually does designate results for the same set of 591 mRNAs up-regulated in the three mutants.  The application of the color schemes is now mentioned explicitly in Figs. 1, 2, and S3.

      (6) Reporting of p-values = 0 in figures is not useful.

      Corrected.

      (7) The whole manuscript is extremely long which reduces the overall impact. For example, the introduction is six pages long. I suggest reducing redundant text and being more concise to enhance readability. 

      We tried to streamline the text wherever possible, in particular shortening the Introduction by two pages.

      (8) Many abbreviations are used throughout the text that are not introduced the first time they are used. 

      Corrected throughout.

      (9) The ERCC normalization is unclear. Were the spike-ins added before cell lysis to allow estimation of per-cell RNA counts or to the extracted RNA? If added to extracted RNA rather than cells it is not clear to me how the claim can be made regarding increased mRNA abundance in the mutants. 

      We thank the reviewer for this comment. As we explained in the Methods, 2.4 µl of 1:100 diluted ERCC RNA Spike-In Control Mix 1 was added to 1.2 µg of each total RNA sample prior to cDNA library preparation. Because the majority of total RNA is rRNA, this normalization yields the abundance of each mRNA relative to rRNA. Owing to repression of rESR mRNAs encoding ribosomal proteins and biogenesis factors in the scd6∆edc3∆ strain (Fig. S3D), the ribosome content per cell is expected to be reduced in this mutant vs. WT. We showed previously that the isogenic dcp2∆ mutant, which elicits an ESR response of similar magnitude, showed a 30% reduction in bulk ribosomal subunits per cell compared to the same WT strain examined here (Vijjamarri, 2023). Assuming a similar reduction in ribosome abundance in the scd6∆edc3∆ mutant, the changes in mRNA per cell conferred by the scd6∆edc3∆ mutation are expected to be 0.7-fold of the ERCC-normalized values given in Fig. 3E, yielding fold-changes of 2.00 and 0.62 for the mRNA_up and mRNA_dn groups, respectively, which still differ substantially from the corresponding changes in normalized Rpb1 occupancies of 1.2 and 0.93, respectively. We have added this new analysis to the text of RESULTS.
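      The per-cell correction described in this response amounts to scaling each ERCC-normalized (rRNA-relative) fold change by the assumed relative ribosome content of the mutant, then comparing the result with the change in Pol II occupancy. A minimal sketch follows; the function names are hypothetical, the 0.7 factor is the assumption stated in the response (by analogy with dcp2∆), and the example input to the first function is made up rather than taken from Fig. 3E.

```python
def per_cell_fold_change(ercc_normalized_fc, relative_ribosome_content=0.7):
    """Convert an rRNA-relative mRNA fold change into a per-cell estimate.

    relative_ribosome_content: mutant/WT ribosome content per cell; 0.7
    corresponds to the 30% reduction assumed in the response.
    """
    return ercc_normalized_fc * relative_ribosome_content


def fold_over_transcription(mrna_fc, pol2_occupancy_fc):
    """Ratio of the mRNA fold change to the Pol II occupancy fold change.

    A value far from 1 indicates a change in mRNA abundance that the
    change in transcription alone does not account for.
    """
    return mrna_fc / pol2_occupancy_fc


# Hypothetical ERCC-normalized input, for illustration only.
print(per_cell_fold_change(2.0))

# Corrected fold changes and Rpb1 occupancy changes quoted in the response.
print(fold_over_transcription(2.00, 1.2))   # mRNA_up group
print(fold_over_transcription(0.62, 0.93))  # mRNA_dn group
```

      Treating the ratio in the second function as a readout of post-transcriptional change is a simplification introduced here for illustration; the response itself only contrasts the two sets of fold changes.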

      (10) The use of the terms "up-regulated" and "derepressed" throughout is confusing. Both refer to observed increased abundance of mRNAs, but they imply different causes which are never clearly defined. 

      We changed all occurrences of “derepressed” to “up-regulated”.

    1. eLife Assessment

      This manuscript revisits the well-studied KdpFABC potassium transport system from bacteria with a convincing set of new higher resolution structures, a protein expression strategy that permits purification of the active wildtype protein, and solid insight obtained from mutagenesis and activity assays. The thorough and thoughtful mechanistic analyses make this a valuable contribution to the membrane transport field.

    2. Reviewer #2 (Public review):

      Summary:

      The paper describes the high-resolution structure of KdpFABC, a bacterial pump regulating intracellular potassium concentrations. The pump consists of a subunit with an overall structure similar to that of a canonical potassium channel and a subunit with a structure similar to a canonical ATP-driven ion pump. The ions enter through the channel subunit and then traverse the subunit interface via a long channel that lies parallel to the membrane to enter the pump, followed by their release into the cytoplasm.

      The work builds on the previous structural and mechanistic studies from the authors' and other labs. While the overall architecture and mechanism have already been established, a detailed understanding was lacking. The study provides a 2.1 Å resolution structure of the E1-P state of the transport cycle, which precedes the transition to the E2 state, assumed to be the rate-limiting step. It clearly shows a single K+ ion in the selectivity filter of the channel and in the canonical ion binding site in the pump, resolving how ions bind to these key regions of the transporter. It also resolves the details of water molecules filling the tunnel that connects the subunits, suggesting that K+ ions move through the tunnel transiently without occupying well-defined binding sites. The authors further propose how the ions are released into the cytoplasm in the E2 state. The authors support the structural findings through mutagenesis and measurements of ATPase activity and ion transport by surface-supported membrane (SSM) electrophysiology.

    3. Reviewer #3 (Public review):

      Summary:

      By expressing protein in a strain that is unable to phosphorylate KdpFABC, the authors achieve structures of the active wildtype protein, capturing a new intermediate state, in which the terminal phosphoryl group of ATP has been transferred to a nearby Asp, and ADP remains covalently bound. The manuscript examines the coupling of potassium transport and ATP hydrolysis by a comprehensive set of mutants. The most interesting proposal revolves around the proposed binding site for K+ as it exits the channel near T75. Nearby mutations to charged residues cause interesting phenotypes, such as constitutive uncoupled ATPase activity, leading to a model in which lysine residues can occupy/compete with K+ for binding sites along the transport pathway.

      Strengths:

      The high resolution (2.1 Å) of the current structure is impressive, and allows many new densities in the potassium transport pathway to be resolved. The authors are judicious about assigning these as potassium ions or water molecules, and explain their structural interpretations clearly. In addition to the nice structural work, the mechanistic work is thorough. A series of thoughtful experiments involving ATP hydrolysis/transport coupling under various pH and potassium concentrations bolsters the structural interpretations and lends convincing support to the mechanistic proposal. The SSME experiments are generally rigorous.

      Weaknesses:

      The present SSME experiments do not support quantitative comparisons of different mutants, as in Figures 4D and 5E. Only qualitative inferences can be drawn among different mutant constructs.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      This study on potassium ion transport by the protein complex KdpFABC from E. coli reveals a 2.1 Å cryo-EM structure of the nanodisc-embedded transporter under turnover conditions. The results confirm that K+ ions pass through a previously identified tunnel that connects the channel-like subunit with the P-type ATPase-type subunit. 

      Strengths: 

      The excellent resolution of the structure and the thorough analysis of mutants using ATPase and ion transport measurements help to strengthen new and previous interpretations. The evidence supporting the conclusions is solid, including biochemical assays and analysis of mutants. The work will be of interest to the membrane transporter and channel communities and to microbiologists interested in osmoregulation and potassium homeostasis. 

      Weaknesses: 

      There is insufficient credit and citation of previous work. 

      The manuscript has been thoroughly revised with special attention to acknowledging all past work relevant to the study.

      Reviewer #2 (Public review): 

      Summary: 

      The paper describes the high-resolution structure of KdpFABC, a bacterial pump regulating intracellular potassium concentrations. The pump consists of a subunit with an overall structure similar to that of a canonical potassium channel and a subunit with a structure similar to a canonical ATP-driven ion pump. The ions enter through the channel subunit and then traverse the subunit interface via a long channel that lies parallel to the membrane to enter the pump, followed by their release into the cytoplasm. 

      Strengths: 

      The work builds on the previous structural and mechanistic studies from the authors' and other labs. While the overall architecture and mechanism have already been established, a detailed understanding was lacking. The study provides a 2.1 Å resolution structure of the E1-P state of the transport cycle, which precedes the transition to the E2 state, assumed to be the rate-limiting step. It clearly shows a single K+ ion in the selectivity filter of the channel and in the canonical ion binding site in the pump, resolving how ions bind to these key regions of the transporter. It also resolves the details of water molecules filling the tunnel that connects the subunits, suggesting that K+ ions move through the tunnel transiently without occupying well-defined binding sites. The authors further propose how the ions are released into the cytoplasm in the E2 state. The authors support the structural findings through mutagenesis and measurements of ATPase activity and ion transport by surface-supported membrane (SSM) electrophysiology. 

      Weaknesses: 

      While the results are overall compelling, several aspects of the work raised questions. First, the authors determined the structure of the pump in nanodiscs under turnover conditions and observed several structural classes, including E1-P, which is detailed in the paper. Two other structural classes were identified, including one corresponding to E2. It is unclear why they are not described in the paper. Notably, the paper considers in some detail what might occur during the E1-P to E2 state transition, but does not describe the 3.1 Å resolution map for the E2 state that has already been obtained. Does the map support the proposed structural changes? 

      As was seen in previous work by Silberberg et al. (2022), imaging KdpFABC under turnover conditions can produce multiple enzymatic states. We focus on the E1~P state and associated biophysical analyses to provide a clear and concise story that is focused on the conduction pathway for K<sup>+</sup> ions. We continue to work with the cryo-EM data as well as other supporting methodologies and datasets with the goal of producing an additional manuscript that will describe other conformations. The class of particles producing the 3.1 Å structure shown in Fig. 1 – figure suppl. 2 is heterogeneous and thus requires further classification to elucidate conformational changes, as is apparent from the downstream processing of the E1 classes also shown in that figure. We therefore cannot derive any conclusions about the configuration of side chains at the CBS based on this structure. Nevertheless, two previous structures of the E2·Pi state (7BGY and 7BH2, which were stabilized by MgF<sub>4</sub> and BeF<sub>x</sub>, respectively) show the structural change that is described in the paragraph discussing D583A. Given the consistency and relatively high resolution (2.9 and 3.0 Å, respectively) of these two independent structures, we believe that they provide strong support for our proposal that Lys586 acts as a built-in counter ion.

      The paper relies on the quantitative activity comparisons between mutants measured using SSM electrophysiology. Such comparisons are notoriously tricky due to variability between SSM chips and reconstitution efficiencies. The authors should include raw traces for all experiments in the supplementary materials, explain how the replicates were performed, and describe the reproducibility of the results. Related to this point above, size exclusion chromatography profiles and reconstitution efficiencies for mutants should be shown to facilitate comparison between measured activities. For example, could it be that the inactive V496R mutant is misfolded and unstable? 

      Similarly, are the reduced activities of V496W and V496H (and many other mutants) due to changes in the tunnel or poor biochemical properties of these variants? Without these data, the validity of the ion transport measurements is difficult to assess. 

      To address this concern, we have generated a series of supplementary figures for Figs. 2, 4, 5, and 6, which show all of the raw traces underlying our SSME data (Figure 2 - figure supplements 2-4, Figure 4 - figure supplement 1, Figure 5 - figure supplement 3, Figure 6 - figure supplement 2). We have also included further detail about the experimental protocols, including number and type of replicates, in an expanded "Activity Assays" section of Methods.

      In addition, we have included SEC profiles for each of the V496 mutants, which show that they are all well behaved in detergent solution prior to reconstitution (Fig. 4 - figure supplement 1). We are not able to directly document reconstitution efficiencies as it is not practical to separate proteoliposomes from unincorporated protein prior to preparing the sensors used for SSME. Binding currents are seen for several of the inactive mutants (e.g., Q116R in Rb and NH<sub>4</sub> in Fig. 2 - figure supplement 3 and V496R in Fig. 4 - figure supplement 1), which demonstrate that protein is indeed present in the corresponding proteoliposomes even though no sustained transport current is observed.

      The authors propose that the tunnel connecting the subunits is filled with water and lacks potassium ions. This is an important mechanistic point that has been debated in the field. It would be interesting to calculate the volume of the tunnel and estimate the number of ions that might be expected in it, given their concentration in bulk. It may also be helpful to provide additional discussion on whether some of the observed densities correspond to bound ions with low occupancy.  

      As suggested, we calculated the internal volume of the tunnel within KdpA (from the S4 K<sup>+</sup> site to the KdpA/KdpB subunit interface) based on the profile derived from Caver. Based on this volume (4.9 x 10<sup>-25</sup> L), a single K<sup>+</sup> ion within this cavity would correspond to 3.4 M, which is near saturation for a solution of KCl. We added this information together with an acknowledgment of low-occupancy K<sup>+</sup> to the fourth paragraph of the Discussion:

      " Fourth, based on the volume of the cavity in KdpA, a single K<sup>+</sup> ion would correspond to a concentration of 3.4 M, suggesting that multiple ions would exceed the solubility limit especially in the absence of counterions. Finally, map densities within the tunnel were either of comparable strength or weaker than surrounding side chain atoms, unlike at S3 and canonical binding sites. Although it is possible that weaker density could represent low occupancy K<sup>+</sup> ions, we favor a mechanism whereby individual K<sup>+</sup> ions occupy the tunnel transiently as they transit between the selectivity filter and the canonical binding site."

      In order to make this analysis, we developed a python script to calculate the volume of the tunnel as defined by the Caver software (this software is available via github.com/dls4n/tunnel). In turn, this enabled us to distinguish water molecules that were actually in the tunnel rather than bound more deeply within the structure of KdpA. As a result, we updated the water distribution plot in Fig. 4b. Notably, the 17 water molecules within this cavity would correspond to 57.8 M, which is reasonably near the expected 55 M for an aqueous solution.
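
      The concentrations quoted in this response follow from a simple unit conversion, sketched below in Python. This is not the authors' tunnel script; it only converts a particle count in the stated tunnel volume (4.9 x 10<sup>-25</sup> L) into a molar concentration, and the function and variable names are illustrative.

```python
AVOGADRO = 6.02214076e23  # particles per mole (exact SI value)


def molar_concentration(n_particles, volume_litres):
    """Concentration in mol/L of n particles confined to the given volume."""
    return n_particles / (AVOGADRO * volume_litres)


TUNNEL_VOLUME_L = 4.9e-25  # tunnel volume within KdpA, as quoted in the response

k_conc = molar_concentration(1, TUNNEL_VOLUME_L)       # a single K+ ion
water_conc = molar_concentration(17, TUNNEL_VOLUME_L)  # the 17 modeled waters

print(f"1 K+ ion:  {k_conc:.1f} M")
print(f"17 waters: {water_conc:.1f} M")
```

      With the unrounded volume this gives about 3.4 M for one ion and about 57.6 M for 17 waters. The quoted 57.8 M matches 17 × 3.4 M, so the small difference appears to come from rounding the single-particle value first.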

      Reviewer #3 (Public review): 

      Summary: 

      By expressing protein in a strain that is unable to phosphorylate KdpFABC, the authors achieve structures of the active wild-type protein, capturing a new intermediate state, in which the terminal phosphoryl group of ATP has been transferred to a nearby Asp, and ADP remains covalently bound. The manuscript examines the coupling of potassium transport and ATP hydrolysis by a comprehensive set of mutants. The most interesting proposal revolves around the proposed binding site for K+ as it exits the channel near T75. Nearby mutations to charged residues cause interesting phenotypes, such as constitutive uncoupled ATPase activity, leading to a model in which lysine residues can occupy/compete with K+ for binding sites along the transport pathway. 

      Strengths:  

      Although this structure is not so different from previous structures, its high resolution (2.1 Å) is impressive and allows the resolution of many new densities in the potassium transport pathway. The authors are judicious about assigning these as potassium ions or water molecules, and explain their structural interpretations clearly. In addition to the nice structural work, the mechanistic work is thorough. A series of thoughtful experiments involving ATP hydrolysis/transport coupling under various pH and potassium concentrations bolsters the structural interpretations and lends convincing support to the mechanistic proposal. 

      Weaknesses: 

      The structures are supported by solid-supported membrane electrophysiology. These data exhibit some weaknesses, including a lack of information to assess the rigor and reproducibility (i.e., the number of replicates, the number of sensors used, controls to assess proteoliposome reconstitution efficiency, and the stability of proteoliposome adsorption to the sensor). 

      To address this concern, we have generated a series of supplementary figures for Figs. 2, 4, 5, and 6, which show all of the raw traces underlying our SSME data (Figure 2 - figure supplements 2-4, Figure 4 - figure supplement 1, Figure 5 - figure supplement 3, Figure 6 - figure supplement 2). We have also included further detail about the experimental protocols, including number and type of replicates, in the "Activity Assays" section of Methods.

      Reviewing Editor Comments

      After discussing the evaluations, the Reviewers and Reviewing Editor have identified the following essential revisions that would need to be addressed to improve the eLife assessment:

      (1) Work from others in the field should be adequately described and acknowledged: 

      (a) Page 2: " A series of X-ray and cryo-EM structures of KdpFABC from E. coli have led to proposals of a novel transport mechanism befitting the unprecedented partnership of these two superfamilies within a single protein complex." 

      The authors must give credit where credit is due (namely, the Haenelt/Paulino groups having discovered the transport pathway). Why don't they cite Stock et al., where this pathway was described first? The Stokes group proposed an entirely different pathway initially. 

      Explicit reference to this work has been added as follows:

      “A series of X-ray and cryo-EM structures of KdpFABC from E. coli (Huang et al., 2017; Silberberg et al., 2022, 2021; Stock et al., 2018; Sweet et al., 2021) indicate a novel transport mechanism befitting the unprecedented partnership of these two superfamilies within a single protein complex. As first proposed by Stock et al. (Stock et al., 2018), there is now a consensus that K<sup>+</sup> enters the complex from the extracellular side of the membrane through the selectivity filter of KdpA, but is blocked from crossing the membrane.”

      (b) Page 4 " As a result, many previous structures (Huang et al., 2017; Silberberg et al., 2021; Stock et al., 2018; Sweet et al., 2021) feature the S162A mutation to avoid inhibition rather than the fully WT protein used for the current work." 

      This is not correct. At least the work by Huang et al 2017 and Stock et al 2021 was done without the mutation. This is why the structures also captured the off-cycle state when no E2 inhibitor was used. But in Silberberg et al 2022 the mutant was used, but this is not mentioned 

      The Q116R mutant was used by Huang et al., but indeed not used for the Stock et al paper. We have replaced the sentence in the manuscript with the following:

      “Use of the KdpD knockout strain allowed us to produce WT and mutant protein free from Ser162 phosphorylation.”

      (c) Page 4: " In the paper, we report on the most highly populated state (44% of particles)". Exactly the same was also seen in detergent solution, which should be mentioned. 

      Reference to the Silberberg 2022 paper, where E1~P was the most highly populated state, has been added. The percentage of particles was removed as we are still processing data from the other states, which we hope will be described in a future manuscript.

      (d) Page 7 "Asp583 and Lys586 are two conserved residues on M5 that have previously been shown......indicating that this particular mutation interfered with energy coupling."  The lack of discussion of the Haenelt/Paulino 2021 paper, where they have analyzed the coupling in detail and described a proximal binding site where K+ is coordinated by D583 and the neighbouring Phe is very concerning. 

      To correct this oversight, we made the following changes to the text: 

      On pg. 7 in the Results section, we refer to the 2005 paper from Bramkamp & Altendorf:

      “Consistent with earlier work on this mutant (Bramkamp and Altendorf, 2005), the D583A mutant displayed substantial ATPase activity (30% of WT) but no transport, indicating that this particular mutation interfered with energy coupling.”

      At the end of pg. 10 in the Discussion, we revised the paragraph discussing D583 and Lys586 to explicitly refer to the mechanism of transport described in the 2021 paper from Silberberg et al, including proximal and distal binding sites as well as uncoupling due to the D583A mutation.

      “Similar to the Glu370/Arg493 charge pair in KdpA, Asp583 and Lys586 are the only charged residues in the membrane core of KdpB. Although they are not seen to interact directly in our structure, they coordinate accessory waters associated with the canonical binding site. Previous molecular dynamics simulations (Silberberg et al., 2021) indicate that Asp583 couples with Phe232 to form a “proximal binding site” for K<sup>+</sup> ions. Based on these simulations, these authors proposed a mechanism whereby neutralization of this site either by ion binding or by D583A substitution served to stimulate ATPase activity. Indeed, earlier work on D583A (Bramkamp and Altendorf, 2005) as well as current data demonstrate uncoupling, in which K<sup>+</sup>-independent ATPase activity was observed even though transport was abolished. A plausible explanation for this stimulation is seen in the behavior of Lys586 in previous structures of the E2·Pi state (7BGY and 7BH2) (Sweet et al., 2021). In these structures, M5 undergoes a conformational change that pushes the side chain of Lys586 into the CBS. As a consequence of the D583A mutation, this Lys could be freed to act as a built-in counter ion as in related P-type ATPases ZntA (Wang et al., 2014) and AHA2 (Pedersen et al., 2007). In regard to the proximal binding site and the partnering “distal binding site” on the KdpA-side of the subunit interface, our structure does not show densities at either site and thus does not provide any support for the related mechanism. In any case, in the WT complex it seems likely that Asp583 exerts allosteric control over Lys586 and ensures that its movement into the binding site is coordinated with the transition from E1~P to E2·Pi, thus leading to displacement of K<sup>+</sup> from the CBS and release to the cytoplasm.”

      (e) Page 8 " The intersubunit tunnel is arguably one of the most intriguing elements of the KdpFABC complex. Although it has been postulated to conduct K+, experimental evidence has been lacking. " 

      Incorrect, see Silberberg 2021. 

      On this point, we beg to differ. Although this 2021 paper shows densities in experimental cryo-EM maps and effects of mutations to residues at the KdpA and KdpB interface, the intra-tunnel transport mechanism is based on computational analysis (MD simulations) and not experimental evidence. We softened the statement to read as follows:

      “Although it has been postulated to conduct K<sup>+</sup>, direct experimental evidence has been hard to come by.”

      (f) In this context, also f232 is not mentioned anywhere in the text, although depicted in almost all figures. 

      Phe232 is shown as a point of reference for the KdpA/KdpB subunit interface. We added a reference to Phe232 in the Results section labeled “Intersubunit tunnel” as well as the paragraph in the Discussion addressed in point d) above.

      " These densities, which we have modeled as water, are most prevalent near the vestibule, which is the wider part of the tunnel, but then disappear completely at the subunit interface near Phe232, which is the narrowest part of the tunnel and also distinctly hydrophobic (Fig. 4)."

      " Previous molecular dynamics simulations (Silberberg et al., 2021) indicate that Asp583 couples with Phe232 to form a “proximal binding site” for K<sup>+</sup> ions."

      (g) Page 2 "Later, it was recognized that KdpA belongs to the Superfamily of K+ Transporters (SKT superfamily), which also includes bona fide K+ channels such as KcsA, TrkH and KtrB (Durell et al., 2000). " 

      KcsA is not a member of the SKT superfamily. 

      Thanks. This is correct, although the SKT superfamily is believed to have evolved from KcsA. KcsA has been removed from the sentence and a reference added to a review of the SKT superfamily:

      “which also includes bona fide K<sup>+</sup> channels such as TrkH and KtrB (Diskowski et al., 2015; Durell et al., 2000).”

      (2) Two other structural classes were identified, including one corresponding to E2. It is unclear why they are not described in the paper. Notably, the paper considers in some detail what might occur during the E1-P to E2 state transition, but does not describe the 3.1 Å resolution map for the E2 state that has already been obtained. Does the map support the proposed structural changes? 

      As was seen in previous work by Silberberg et al. (2022), imaging KdpFABC under turnover conditions can produce multiple enzymatic states. We focus on the E1~P state and associated biophysical analyses to provide a clear and concise story. We continue to work with the cryo-EM data as well as other supporting methodologies and datasets with the goal of producing an additional manuscript that will describe other conformations. The class of particles producing the 3.1 Å structure shown in Fig. 1 – figure suppl. 2 is heterogeneous and thus requires further classification to elucidate conformational changes, as is apparent from the downstream processing of the E1 classes also shown in that figure. We therefore cannot derive any conclusions about the configuration of side chains at the CBS based on this structure. Nevertheless, two previous structures of the E2·Pi state (7BGY and 7BH2, which were stabilized by MgF<sub>4</sub> and BeF<sub>x</sub>, respectively) show the structural change that is described in the paragraph discussing D583A. Given the consistency and relatively high resolution (2.9 and 3.0 Å, respectively) of these two independent structures, we believe that they provide strong support for our proposal that Lys586 acts as a built-in counter ion.

      (3) The paper relies on the quantitative activity comparisons between mutants measured using SSM electrophysiology. Such comparisons are notoriously tricky due to variability between SSM chips and reconstitution efficiencies. The authors should include raw traces for all experiments in the supplementary materials, explain how the replicates were performed, and describe the reproducibility of the results. 

      To address this concern, we have generated supplementary figures for Figs. 2, 4, 5, and 6, which show all of the raw traces underlying our SSME data (Figure 2 - figure supplements 2-4, Figure 4 - figure supplement 1, Figure 5 - figure supplement 3, Figure 6 - figure supplement 2). We have also added a detailed description of replicates, sensor stability and the experimental protocols in the "Activity Assays" section of Methods. In addition, we have highlighted observations of pre-steady state binding currents that were seen for some mutants (e.g., Q116R assayed with Rb<sup>+</sup>, NH<sub>4</sub><sup>+</sup> and Na<sup>+</sup>), in which an initial, transient current response was observed without an ensuing transport current. The depiction of these raw data has allowed us to explain our use of the current response at 1.25 s, after decay of this binding current, as a measure of transport rate. This approach is consistent with recommendations by the manufacturer, as documented in their 2023 publication (Bazzone et al., https://doi.org/10.3389/fphys.2023.1058583).

      (4) Related to this point above, size exclusion chromatography profiles and reconstitution efficiencies for mutants should be shown to facilitate comparison between measured activities. For example, could it be that the inactive V496R mutant is misfolded and unstable? Similarly, are the reduced activities of V496W and V496H (and many other mutants) due to changes in the tunnel or poor biochemical properties of these variants? Without these data, the validity of the ion transport measurements is difficult to assess. 

      We have included SEC profiles for each of the V496 mutants, which show that they are all well behaved in detergent solution prior to reconstitution (Fig. 4 - figure supplement 1). We are not able to directly document reconstitution efficiencies as it is not practical to separate proteoliposomes from unincorporated protein prior to preparing the sensors used for SSME. Binding currents are seen for several of the inactive mutants (e.g., Q116R in Rb and NH<sub>4</sub> in Fig. 2 - figure supplement 3 and V496R in Fig. 4 - figure supplement 1), which demonstrate that protein is indeed present in the corresponding proteoliposomes even though no sustained transport current is observed.

      (5) What are the different lines in Figure 1 - Supplement 1, panel G? 

      This panel depicted a series of SSME traces as an example of the raw data, but has been removed from the revised version given the inclusion of all the raw traces. These new figures include a legend explaining the conditions for each trace.

      (6) How was the 44% population of the single-occupancy E1 state estimated? It does not correspond to the number of particles in Figure 1 - Supplement 2. 

      The calculation of 44% for the E1~P state was premature, given that we are still analyzing the data from the turnover conditions. The revised manuscript simply states that E1~P represented the largest population of particles, which is consistent with this state preceding the rate-limiting step of the Post-Albers cycle. Reference is made to the Silberberg 2022 paper, which made a similar observation in a detergent-solubilized sample.

      (7) The text states that Km for Q116E is "<10 uM". However, the fitted value is 90 µM in Figure 2e. 

      This was a typographical error. The text now states that Km for Q116E is <100 µM.

      (8) The Km values for Rb, NH4, and Na in Figures 2g and h, and Na in Figure 2i do not make sense. They should be removed. 

      The values for Km were determined by fitting the Michaelis-Menten equation to the data as detailed in the Methods section. Although the curves visually appear rather flat relative to other ions, the fits generated respectable confidence limits and are therefore defensible in a statistical context. Furthermore, the curves that are shown are based on those values of Km and it would be inappropriate not to cite them.
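
      For readers unfamiliar with this kind of fit, the sketch below shows one way to fit the Michaelis-Menten equation, v = Vmax·[S]/(Km + [S]), by least squares in plain Python. It is not the procedure from the authors' Methods: the golden-section search, the function names, and the synthetic data are all assumptions made for illustration, and the sketch does not compute confidence limits.

```python
import math


def michaelis_menten(s, vmax, km):
    """Rate predicted by the Michaelis-Menten equation at substrate concentration s."""
    return vmax * s / (km + s)


def best_vmax(conc, rate, km):
    """Least-squares Vmax for a fixed Km (closed form: the model is linear in Vmax)."""
    f = [s / (km + s) for s in conc]
    return sum(r * fi for r, fi in zip(rate, f)) / sum(fi * fi for fi in f)


def sse(conc, rate, km):
    """Sum of squared residuals at a given Km, using the best Vmax for that Km."""
    vmax = best_vmax(conc, rate, km)
    return sum((r - michaelis_menten(s, vmax, km)) ** 2 for s, r in zip(conc, rate))


def fit_michaelis_menten(conc, rate, km_lo=1e-3, km_hi=1e3, iters=200):
    """Golden-section search over log10(Km); returns (vmax, km)."""
    phi = (math.sqrt(5) - 1) / 2
    a, b = math.log10(km_lo), math.log10(km_hi)
    for _ in range(iters):
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if sse(conc, rate, 10 ** c) < sse(conc, rate, 10 ** d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    km = 10 ** ((a + b) / 2)
    return best_vmax(conc, rate, km), km


# Synthetic, noise-free data made up for illustration (Vmax = 10, Km = 0.5).
conc = [0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0]
rate = [michaelis_menten(s, 10.0, 0.5) for s in conc]
vmax, km = fit_michaelis_menten(conc, rate)
print(f"Vmax = {vmax:.3f}, Km = {km:.3f}")
```

      When a measured curve is nearly flat, the residual changes little over a wide range of Km, which is why the confidence limits on the fitted value matter when judging whether a reported Km is meaningful.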

      (9) Figure 3 would benefit from a slice through the protein to orient the viewer. 

      Thanks for the suggestion. We have added panels to Figs. 3, 5 and 6 in an effort to orient the reader to the site that is depicted.

      (10) The differences between R493E, Q, and M do not appear to be significant. 

      The y-axis is logarithmic, which makes a visual comparison difficult. To alleviate this, P values were calculated by one-way ANOVA and the results are indicated in Fig. 3c and 3d. They show that all of the Arg493 mutations have Km significantly higher than WT. Differences between R493E and R493Q, and between R493Q and R493M, are not significant at the p<0.01 level, while the difference between R493E and R493M is highly significant (p<0.001). The associated text on pg. 6 has been slightly modified as follows:

      “Changes to Arg493 generally increase Km (lower apparent affinity) without affecting Vmax, with Met substitution having greater effect than charge reversal (R493E).”

      (11) Page 5, paragraph 2. Q116R and G232D don't seem like the world's most intuitive mutations. It appears there is a historical reason for looking at these. Could the rationale be explained in the text? (Why R and D specifically?) 

      These mutations have historical significance, having been generated by random mutagenesis during early characterization of the Kdp system by Epstein and colleagues. A sentence containing relevant references has been added to this paragraph to provide this context:

      “Specifically, Q116R and G232D substitutions were initially discovered by random mutagenesis during early characterization of the Kdp system (Buurman et al., 1995; Epstein et al., 1978) and have featured in many follow-up studies (Dorus et al., 2001; Schrader et al., 2000; Silberberg et al., 2021; Sweet et al., 2020; van der Laan et al., 2002).”

      Below are the recommendations from each of the reviewers, some of which were not included as essential revisions, but that can also be helpful to further strengthen the manuscript. 

      Reviewer #1 (Recommendations for the authors): 

      It is essential that the authors correct their selective, incomplete, and in places inappropriate references to work from others in the field. 

      Specific points: 

      (1) Page 2: " A series of X-ray and cryo-EM structures of KdpFABC from E. coli have led to proposals of a novel transport mechanism befitting the unprecedented partnership of these two superfamilies within a single protein complex." 

      The authors must give credit where credit is due (namely, the Haenelt/Paulino groups having discovered the transport pathway). Why don't they cite Stock et al., where this pathway was described first? The Stokes group proposed an entirely different pathway initially. 

      (2) Page 4 " As a result, many previous structures (Huang et al., 2017; Silberberg et al., 2021; Stock et al., 2018; Sweet et al., 2021) feature the S162A mutation to avoid inhibition rather than the fully WT protein used for the current work." 

      This is not correct. At least the work by Huang et al 2017 and Stock et al 2021 was done without the mutation. This is why the structures also captured the off-cycle state when no E2 inhibitor was used. But in Silberberg et al 2022 the mutant was used, but this is not mentioned 

      (3) Page 4: " In the paper, we report on the most highly populated state (44% of particles)". Exactly the same was also seen in detergent solution, which should be mentioned. 

      (4) Page 7 "Asp583 and Lys586 are two conserved residues on M5 that have previously been shown......indicating that this particular mutation interfered with energy coupling."  The lack of discussion of the Haenelt/Paulino 2021 paper, where they have analyzed the coupling in detail and described a proximal binding site where K+ is coordinated by D583 and the neighbouring Phe is very concerning. 

      (5) Page 8 " The intersubunit tunnel is arguably one of the most intriguing elements of the KdpFABC complex. Although it has been postulated to conduct K+, experimental evidence has been lacking. " 

      Incorrect, see Silberberg 2021. 

      (6) In this context, also f232 is not mentioned anywhere in the text, although depicted in almost all figures. 

      References have been added to address all of these points. See item 1) under Reviewing Editor’s Comments above.

      Other points: 

      (7) Page 2 "Later, it was recognized that KdpA belongs to the Superfamily of K+ Transporters (SKT superfamily), which also includes bona fide K+ channels such as KcsA, TrkH and KtrB (Durell et al., 2000). " 

      KcsA is not a member of the SKT superfamily. 

      KcsA has been removed from the sentence and a reference added to a review of the SKT family:

      “which also includes bona fide K<sup>+</sup> channels such as TrkH and KtrB (Diskowski et al., 2015; Durell et al., 2000).”

      (8) Page 9 " Our demonstration of coupled transport of NH4+ and Rb+ G232D not only confirms that the selectivity filter governs ion selection, but that the pump subunit, KdpB, is relatively promiscuous."  Check grammar. 

      This sentence has been updated as follows:

      “Our observation that G232D is capable of coupled transport for NH<sub>4</sub><sup>+</sup> and Rb<sup>+</sup> confirms not only that the selectivity filter governs ion selection, but that the pump subunit, KdpB, is relatively promiscuous.”

      Reviewer #2 (Recommendations for the authors): 

      (1) From an editorial point of view, I suggest a few changes to enhance readability and clarity for non-specialists. A description of the overall transport cycle at the start of the paper (perhaps as a supplementary figure) could help put the work into perspective for general readers who may not be familiar with P-type ATPase mechanisms. It is unclear what "single" and "double" occupancy refer to in the structural classes description. Why is only one structural class described in detail? I would suggest moving the discussion of what is going on with the N-terminus of KdpB to the Results section, where it is described, and shortening the corresponding paragraph in the Discussion. I would furthermore suggest adding a figure that illustrates the proposed regulatory role of the terminus and how phosphorylation might affect it. Otherwise, this section of the results reads very hollow. 

      A diagram showing the Post-Albers cycle is shown as part of Fig. 1 and is described at the end of the second paragraph. This sentence only mentioned KdpB, which may have caused confusion. We therefore changed the sentence to read as follows:

      “Like other P-type ATPases, KdpFABC employs the Post-Albers reaction cycle (Fig. 1) involving two main conformations (E1 and E2) and their phosphorylated states (E1~P and E2-P) to drive transport (Albers, 1967; Post et al., 1969).”

      Single and double occupancy was meant to refer to the number of KdpFABC complexes residing in a nanodisc. This can be seen in the class averages in Fig. 1 - figure supplement 2. The legends to Fig. 1 figure supplements 1 and 2 have been revised to explain this observation more explicitly:

      "Slight asymmetry of the main peak is consistent with a subpopulation of nanodiscs containing two KdpFABC complexes (Fig. 1 - figure supplement 2)."

      and

      "A subset of these particles were further classified to generate four main classes representing nanodiscs with a single copy of KdpFABC in either E1 or E2 conformations, nanodiscs with two copies of KdpFABC which were mainly E1 conformation, and junk."

      As stated above, the class of particles producing the 3.1 Å structure shown in Fig. 1 – figure suppl. 2 is heterogeneous and requires further classification to elucidate conformational changes, as is apparent from the downstream processing of the E1 classes also shown in that figure. We continue to analyze the cryo-EM data and aim to produce a second manuscript that will include descriptions of other conformations together with the additional biophysical analysis related to their function.

      With regard to the N-terminus, we have gone on to generate a truncation of residues 2-9 in KdpB. After expression and purification, this construct remained coupled with ATPase and transport activities similar to WT, which makes proposals of a regulatory effect less compelling. Because of the novelty of observing the N-terminus and the possibility that it plays a subtle role in the kinetics of the cycle not revealed under the current assay conditions, we have retained a brief discussion of this structural observation, but moved it into the Results section as suggested.

      "Given the regulatory roles played by N- and C-termini of a variety of other P-type ATPases (Bitter et al., 2022; Cali et al., 2017; Lev et al., 2023; Timcenko et al., 2019; Zhao et al., 2021), we generated a construct in which residues 2-9 of the N-terminus of KdpB were truncated. However, ATPase and transport activities remained coupled at levels similar to WT, indicating that any functional role of the N-terminus is relatively subtle and not manifested under current assay conditions."

      (2) The wording "exceedingly strong densities" seems ambiguous. 

      We have changed this to “strong” in the Abstract and "exceptionally strong" in the Discussion. The precise values for these densities are shown in density histograms in Fig. 2 – figure supplement 1 and Fig. 5 – figure supplement 2. In the text, the densities are described as follows:

      Results sections describing the selectivity filter:

      "In fact, this S3 site contains the strongest densities in the entire map, measuring 7.9x higher than the threshold used for Fig. 2a (Fig. 2 – figure suppl. 1a)."

      Results section describing the CBS:

      "Given that this is the strongest density in KdpB, measuring 5.6x higher than the map densities shown in Fig. 5 (Fig. 5 – figure suppl 2b), we have modeled it as K<sup>+</sup>."

      (3) What are the different lines in Figure 1 - Supplement 1, panel G? 

      This panel depicted a series of SSME traces as an example of the raw data, but has been removed from the revised version given the inclusion of all the raw traces. These new figures include a legend explaining the conditions for each trace.

      (4) How was the 44 % population of the single-occupancy E1 state estimated (it does not correspond to the number of particles in Figure 1 - Supplement 2)? 

      The calculation of 44% for the E1~P state was premature, given that we are still analyzing the data from the turnover conditions. We will consider citing an updated value in a future publication once this analysis is complete. The revised manuscript simply states that E1~P represented the largest population of particles, which is consistent with this state preceding the rate limiting step of the Post-Albers cycle. Reference was made to the Silberberg 2022 paper, where a similar observation was made.

      (5) Panel 1d is called out of order after panel 1e. Please label Ser 162 in the panel. 

      The order of these panels has been switched and Ser162 has been labelled as suggested.

      (6) Several panels in Figure 1- Supplement 1 are neither referenced nor described. 

      This figure supplement is referred to multiple times in the Results and the Methods sections of the text as well as in the figure legends. Although each panel is not individually referenced, all of this information is relevant at different points in the manuscript and is explained in the legend.

      (7) Is the coordinating geometry for the S3 site consistent with what was previously observed for KcsA and relatives? 

      The general arrangement of carbonyl atoms in the S3 site is the same in KcsA and KdpA, described by the MacKinnon group as a square antiprism. However, KcsA has strict four-fold symmetry and KdpA does not. As a result, there are small discrepancies between the coordinating geometries in the two structures. This point was made graphically in our original report on the X-ray structure of KdpFABC (Huang et al., 2017, Extended Data Fig. 3), though the positions of the carbonyls are more accurately determined in the current structure due to increased resolution. We added a sentence to the Selectivity Filter section of the Results stating the following:

      "This coordination geometry is also consistent with that seen in the K<sup>+</sup> channel KcsA, though the strict four-fold symmetry of that homo-tetramer produces a more regular structure, as indicated by the smaller variance in liganding distance (2.77 Å with s.d. 0.075 Å in 1K4C) and as depicted by Huang et al. in Extended Data Fig. 3 (Huang et al., 2017)."

      (8) Label G232D in Figure 2a. 

      G232 is out of the plane shown in Fig. 2a. However, we have added a label for Cys344 to help identify the selectivity filter strands that are shown. Note, however, that G232 is visible and labeled in Fig. 2 - figure suppl. 1. This has now been noted in the legend for Fig. 2.

      (9) The text states that Km for Q116E is "<10 uM". However, the fitted value is 90 µM in Figure 2e. 

      This was a typographical error. The text now states that Km for Q116E is <100 µM.

      (10) The Km values for Rb, NH4, and Na in Figures 2g and h, and Na in Figure 2i do not make sense. They should be removed. 

      The values for Km were determined by fitting the Michaelis-Menten equation to the data as detailed in the Methods section. Although the curves visually appear rather flat relative to other ions, the fitting generated respectable confidence limits and the values are therefore defensible in a statistical context. Furthermore, the curves that are shown are based on those values of Km and it would be inappropriate not to cite them.
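
      As a rough illustration of this kind of fit (not the authors' procedure, which is described in their Methods), the sketch below fits v = Vmax·[S]/(Km + [S]) by least squares in plain Python. The function names, the search bracket, and the synthetic noise-free data are all hypothetical and chosen only for the example.

```python
import math

def mm_rate(s, vmax, km):
    """Michaelis-Menten rate: v = Vmax * S / (Km + S)."""
    return vmax * s / (km + s)

def best_vmax(conc, rates, km):
    """For a fixed Km the model is linear in Vmax, so the
    least-squares Vmax has a closed form."""
    f = [s / (km + s) for s in conc]
    return sum(r * fi for r, fi in zip(rates, f)) / sum(fi * fi for fi in f)

def sse(conc, rates, km):
    """Sum of squared residuals at the best Vmax for this Km."""
    vmax = best_vmax(conc, rates, km)
    return sum((r - mm_rate(s, vmax, km)) ** 2 for s, r in zip(conc, rates))

def fit_michaelis_menten(conc, rates, km_lo=1e-3, km_hi=1e5, iters=100):
    """Golden-section search over log10(Km).
    Assumes the error surface has a single minimum inside the bracket."""
    phi = (math.sqrt(5) - 1) / 2
    a, b = math.log10(km_lo), math.log10(km_hi)
    c, d = b - phi * (b - a), a + phi * (b - a)
    for _ in range(iters):
        if sse(conc, rates, 10 ** c) < sse(conc, rates, 10 ** d):
            b, d = d, c
            c = b - phi * (b - a)
        else:
            a, c = c, d
            d = a + phi * (b - a)
    km = 10 ** ((a + b) / 2)
    return best_vmax(conc, rates, km), km

# Synthetic, noise-free data generated with Vmax = 2.0 and Km = 90.0
# (arbitrary units; these numbers are not taken from the manuscript).
conc = [1, 3, 10, 30, 100, 300, 1000]
rates = [mm_rate(s, 2.0, 90.0) for s in conc]
vmax_fit, km_fit = fit_michaelis_menten(conc, rates)
```

      With real, noisy data a flat-looking curve can still yield a well-defined Km, but only if the tested concentrations bracket it; confidence limits would have to come from a separate step (for example, bootstrapping) that is not shown here.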

      (11) Figure 3 would benefit from a slice through the protein to orient the viewer. 

      Thank you for the suggestion. We have added panels to Figs. 3, 5 and 6 in an effort to orient the reader to the site that is depicted.

      (12) The differences between R493E, Q, and M do not appear to be significant. 

      The y-axis is logarithmic, which makes a visual comparison difficult. To alleviate this, P values were calculated based on one-way ANOVA and the results are indicated in Fig. 3c and 3d. They show that all of the Arg493 mutations have Km significantly higher than WT. Differences between R493E and R493Q and between R493Q and R493M are not significant at the p<0.01 level, while the difference between R493E and R493M is highly significant (p<0.001). The associated text on pg. 6 has been slightly modified as follows:

      “Changes to Arg493 generally increase Km (lower apparent affinity) without affecting Vmax, with Met substitution having greater effect than charge reversal (R493E).”
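
      For readers unfamiliar with the test named above, the following minimal sketch computes the one-way ANOVA F statistic in plain Python. The function name and the example numbers are hypothetical and are not the authors' measurements; obtaining a P value additionally requires the F distribution (for instance via `scipy.stats.f_oneway`), which this sketch omits.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of sample groups."""
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    grand_mean = sum(all_values) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Made-up replicate values for three hypothetical constructs.
example_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(example_groups)
```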

      Reviewer #3 (Recommendations for the authors): 

      Overall, the text was very clear, experiments were rationalized well, and conclusions were justified. A few small comments: 

      (1) Page 5, paragraph 2. Q116R and G232D don't seem like the world's most intuitive mutations. It appears there is a historical reason for looking at these. Could the rationale be explained in the text? (Why R and D specifically?) 

      These mutations are of historical importance, having been generated by random mutagenesis during early characterization of the Kdp system. A sentence containing relevant references has been added to this paragraph to provide this information as context:

      “Specifically, Q116R and G232D substitutions were initially discovered by random mutagenesis during early characterization of the Kdp system (Buurman et al., 1995; Epstein et al., 1978) and have featured in many follow-up studies (Dorus et al., 2001; Schrader et al., 2000; Silberberg et al., 2021; Sweet et al., 2020; van der Laan et al., 2002).”

      (2) Typo: page 14, "diluted" 

      This typo has been corrected.

      (3) The Methods section for SSM electrophysiology could use some additional description of how the data/statistics were collected. How many replicates? Were all replicates from a single sensor/ were multiple sensors examined? Were controls done to test whether the same number of liposomes remain absorbed by the sensor over the length of the experiment? 

      We have extended our description of experimental protocols in the "Activity Assays" section of Methods. This includes the number and type of replicates as well as a discussion of binding currents that were seen for some mutants. Furthermore, a new series of supplementary figures for Figs. 2, 4, 5, and 6 show all of the raw traces for the SSME measurements (Figure 2 - figure supplements 2-4, Figure 4 - figure supplement 1, Figure 5 - figure supplement 3, Figure 6 - figure supplement 2).

      We have included SEC profiles for each of the V496 mutants, which show that they are all well behaved in detergent solution prior to reconstitution (Fig. 4 - figure supplement 1). We are not able to directly document reconstitution efficiencies as it is not practical to separate proteoliposomes from unincorporated protein prior to preparing the sensors used for SSME. Binding currents are seen for several of the inactive mutants (e.g., Q116R in Rb and NH<sub>4</sub> in Fig. 2 - figure supplement 3 and V496R in Fig. 4 - figure supplement 1), which demonstrate that protein is indeed present in the corresponding proteoliposomes even though no sustained transport current is observed.

    1. eLife Assessment

      This is a methodologically rich manuscript that is important for revealing the center-surround inhibition profile of expectation in orientation space. The analyses are compelling in validating the critical role of predictive coding feedback. The findings provide novel insights into how expectation optimizes perception via enhancement and suppression.

    2. Reviewer #1 (Public review):

      Summary:

      The authors tested two competing mechanisms of expectation: (1) a sharpening model that suppresses unexpected information via center-surround inhibition; (2) a cancellation model that predicts a monotonic gradient response profile. Using two psychophysical experiments manipulating feature space distance between expected and unexpected stimuli, the results consistently supported the sharpening model. Computational modeling further showed that expectation effects were explained by either sharpened tuning curves or tuning shifts. Finally, convolutional neural network simulations revealed that feedback connections critically mediate the observed center-surround inhibition.

      Strengths:

      The manuscript provides compelling and convergent evidence from both psychophysical experiments and computational modeling to robustly support the sharpening model of expectation, demonstrating clear center-surround inhibition of unexpected information.

      Comments on revisions:

      I appreciate the authors' thoughtful revisions. I have no further comments.

    3. Reviewer #2 (Public review):

      Summary:

      This is a compelling and methodologically rich manuscript. The authors used a variety of methods, including psychophysics, computational modeling, and artificial neural networks, to reveal a non-monotonic, center-surround "Mexican-hat" profile of expectation in orientation space. Their data convincingly extend analogous findings in attention and working memory, and the modeling nicely teases apart sharpening vs. shift mechanisms.

      Strengths:

      The findings are novel and important in elucidating the potential neural mechanisms by which expectation shapes perception. The authors conducted a series of well-designed psychophysical experiments to carefully examine the profile of expectation's modulation. Computational modeling also provides further insights, linking the neural mechanisms of expectation to behavioral results.

      Comments on revisions:

      I think the authors did a great job in addressing my previous comments. I have no further comments.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #2 (Public review):

      (1)  The sharpening model of expectation can predict surround suppression. The authors could further clarify how the cancellation model predicts a monotonic profile of expectation (Figure 1C) with the highest response at the expected orientation, while the cancellation model suggests a suppression of neurons tuned toward the expected stimulus.

      We thank the reviewer for the comment. We would like to emphasize that as the expected signal is suppressed, the relative weight or salience of unexpected inputs increases. We have clarified this interpretation in the manuscript as follows:

      “Here, given these two mechanisms making opposite predictions about how expectation changes the neural responses of unexpected stimuli, thereby displaying different profiles of expectation, we speculated that if expectation operates by the sharpening model with suppressing unexpected information, we should observe an inhibitory zone surrounding the focus of expectation, and its profile then should display as a center-surround inhibition (Fig. 1c, left). If, however, expectation operates as suggested by the cancelation model with highlighting unexpected information, the inhibitory zone surrounding the focus of expectation should be eliminated, and the profile should instead display a monotonic gradient (Fig. 1c, right).”

      (2) I'm a bit concerned about whether the profile solely arises from modulation of expectation. The two auditory cues are each associated with a fixed orientation, which may be confounded by other cognitive processes like visual working memory or attention (which I think the authors also discussed). Although the authors tried to use SFD task to render orientation task-irrelevant, luminance edges (i.e., orientation) and spatial frequency in gratings are highly intertwined and orientation of the gratings may help recall the first grating's SF (fixed at 0.9 c/°), especially given the first and second grating's orientations are not very different (4.8°).

      We agree that dissociating expectation from attention and other top-down processes remains a key challenge in visual expectation research (see Summerfield & Egner, 2009; Summerfield & de Lange, 2014; de Lange et al., 2018). As is generally acknowledged, expectation reflects the probability of a sensory event, while selective attention relates to its behavioral relevance. To minimize attentional influences, our task design ensured that grating orientation was not task-relevant: on each trial, participants discriminated either orientation or spatial frequency difference, such that orientation itself did not require attentional allocation, a point already discussed in the manuscript.

      Regarding visual working memory, we argue that even if participants recalled the first grating’s spatial frequency in the SFD task, they were not required to retain its precise spatial frequency (or orientation), as their task was simply to judge whether the second grating appeared denser or sparser. In other words, orientation (or spatial frequency) itself was not task-relevant. Moreover, although not included in the manuscript, we conducted a post-experiment debriefing in which participants were asked whether they noticed any association between the auditory tone and the grating orientation. None of the participants reported this relationship correctly, suggesting that the tone-orientation mapping remained implicit and was unlikely to be driven by strategic attention or memory.

      However, we acknowledge that certain confounding processes such as statistical learning or implicit mapping acquisition cannot be fully ruled out given the current paradigm. Future studies using methods with higher temporal resolution (e.g., EEG/MEG) may help to dissociate these mechanisms more precisely.

      (3) For each of the expected orientations (20° or 70°), the unexpected ones are linearly separable (i.e., all unexpected ones lie on one side of the expected angle). This might further encourage people to shift their attended or expected orientation, according to the optimal tuning hypothesis. Would this provide an alternative explanation to the tuning shift that the authors found?

      We thank the reviewer for pointing out the relevance of the optimal tuning hypothesis. We acknowledge that the optimal tuning theory (Navalpakkam & Itti, 2007) is an important framework, particularly in visual search paradigms, where attentional templates may shift away from non-target features to enhance discriminability.

      In our task, this hypothesis would predict a shift of expectation toward <20° in E20° trials and >70° in E70° trials, given that all unexpected orientations lie on one side of the expected angle. Importantly, the optimal tuning hypothesis predicts such shifts not only in Δ20°, Δ25°, and Δ30° trials but also in the Δ0° trials. In this regard, the observed shift in Δ20° and Δ30° (Experiment 2) and Δ25° (Experiment 3) trials is broadly consistent with the predictions of the optimal tuning account. However, we did not observe a corresponding shift away from nontarget features in the Δ0° condition, suggesting limited behavioral evidence for optimal tuning effects under our current task settings.

      It is important to note that most previous studies supporting optimal tuning (e.g., Navalpakkam & Itti, 2007; Scolari & Serences, 2009; Geng, DiQuattro, & Helm, 2017; Yu & Geng, 2019) have used visual search paradigms that differ from our design in several critical ways, including the number of stimuli presented, their spatial arrangement (eccentricity), task demands, and so on. Therefore, it is difficult to determine whether the optimal tuning hypothesis could serve as an alternative explanation within the context of our current study. We agree that future studies could further examine how such task parameters influence the presence or absence of optimal tuning.

      (4) It is great that the authors conducted computational modeling to elucidate the potential neuronal mechanisms of expectation. But I think the sharpening hypothesis (e.g., reviewed in de Lange, Heilbron & Kok, 2018) focuses on the neural population level, i.e., narrowing of population tuning profile, while the authors conducted the sharpening at the neuronal tuning level. However, the sharpening of population does not necessarily rely on the sharpening of individual neuronal tuning. For example, neuronal gain modulation can also account for such population sharpening. I think similar logic applies to the orientation adjustment experiment. The behavioral level shift does not necessarily suggest a similar shift at the neuronal level. I would recommend that the authors comment on this.

      We thank the reviewer for this to-the-point comment. As de Lange et al. (2018) noted, “there is not always a direct correspondence between neural-level and voxel-level selectivity patterns.” That is, neuronal tuning, population-level tuning, voxel-level selectivity, and behavioral adaptive outcomes may reflect different underlying mechanisms and do not necessarily align in a one-to-one fashion. We fully acknowledge that population-level tuning effects may also result from various neuronal mechanisms such as gain modulation (for review, see Salinas & Thier, 2000), shifts in preferred orientation (Ringach et al., 1997; Jeyabalaratnam et al., 2013), asymmetric broadening of tuning curves (Schumacher et al., 2022), or tuning curve sharpening (Ringach et al., 1997; Schoups et al., 2001).

      In our modeling, we implemented sharpening and shifts of neuronal tuning curves as a conceptual model simplification, intended to explore potential mechanisms underlying expectation-related center-surround suppression effects. While sharpening-based accounts (e.g., Kok et al. 2012) have often been emphasized, we stress that other mechanisms, such as gain modulation or tuning shifts, may also contribute. Our goal is not to provide a definitive account, but to highlight such plausible mechanisms and encourage future investigation. We have revised the Discussion to emphasize that multiple mechanisms may underlie the observed effects.

      “We note that our implementation of sharpening and shifts at the neuronal level serves as a conceptual model simplification, as population-level tuning, voxel-level selectivity, and behavioral adaptive outcomes may reflect different underlying neuronal mechanisms and do not necessarily align in a one-to-one fashion. Here, we stress that other potential mechanisms beyond sharpening, such as tuning shifts, may also contribute to visual expectation.” 
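
      To make the distinction between these candidate mechanisms concrete, the sketch below treats a single orientation channel as a Gaussian tuning curve and changes one parameter at a time. This is an illustrative simplification with hypothetical function names and parameter values (it ignores the circularity of orientation, for instance) and is not the model reported in the manuscript.

```python
import math

def tuning(theta, amplitude=1.0, center=0.0, width=15.0):
    """Gaussian tuning curve over orientation offset theta (degrees)."""
    return amplitude * math.exp(-0.5 * ((theta - center) / width) ** 2)

offsets = [0, 10, 20, 30, 40]  # distance from the channel's preferred orientation

baseline = [tuning(t) for t in offsets]
sharpened = [tuning(t, width=10.0) for t in offsets]    # narrower width, same peak
shifted = [tuning(t, center=5.0) for t in offsets]      # preferred orientation moved
gained = [tuning(t, amplitude=1.3) for t in offsets]    # multiplicative gain

def modulation(modified, reference):
    """Pointwise response change relative to the reference curve."""
    return [m - r for m, r in zip(modified, reference)]

sharpen_effect = modulation(sharpened, baseline)
shift_effect = modulation(shifted, baseline)
gain_effect = modulation(gained, baseline)
```

      In this toy setting, sharpening leaves the peak response unchanged and lowers the flanks, gain scales every response by the same factor, and a shift raises responses on one side of the peak while lowering them on the other, which is why the three can in principle be told apart from a full response profile.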

      (5) If the orientation adjustment experiment suggests that both sharpening and shifting are present at the same time, have the authors tried combining both in their computational model?

      We agree with the reviewer that it is necessary to consider the combined model. Accordingly, we implemented a computational model incorporating sharpening of the expected orientation channel together with shifting of the unexpected orientation channels. This model successfully captured the sharpening of the expected-orientation channel and the shift of the unexpected-orientation channels (Supplementary Fig. 3). For the expected orientation (Δ0°), results showed that the amplitude change was significantly higher than zero on both OD (t(23) = 2.582, p = 0.017, Cohen’s d = 0.527) and SFD (t(23) = 2.078, p = 0.049, Cohen’s d = 0.424) tasks (Supplementary Fig. 3e, vertical stripes); the width change was significantly lower than zero on both OD (t(23) = -2.438, p = 0.023, Cohen’s d = 0.498) and SFD (t(23) = -2.578, p = 0.017, Cohen’s d = 0.526) tasks (Supplementary Fig. 3e, diagonal stripes). For unexpected orientations (Δ10°-Δ40°), however, the amplitude and width changes did not differ significantly from zero on either OD (amplitude change: t(23) = 0.443, p = 0.662, Cohen’s d = 0.091; width change: t(23) = -1.819, p = 0.082, Cohen’s d = 0.371) or SFD (amplitude change: t(23) = 1.130, p = 0.270, Cohen’s d = 0.231; width change: t(23) = -1.710, p = 0.101, Cohen’s d = 0.349) tasks (Supplementary Fig. 3f). Meanwhile, the location shift was significantly different from zero for unexpected orientations (Δ10°-Δ40°; OD task: t(23) = 3.611, p = 0.001, Cohen’s d = 0.737; SFD task: t(23) = 2.418, p = 0.024, Cohen’s d = 0.493; Supplementary Fig. 3g). These results provided further evidence that tuning sharpening and tuning shift jointly contribute to center-surround inhibition in expectation.
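
      The statistics above follow the usual pattern for a one-sample t-test against zero, where Cohen's d is the mean divided by the sample standard deviation and therefore equals t/√n. The sketch below, with hypothetical function names and made-up sample values, shows that relationship; the reported pairs are consistent with n = 24 (df = 23).

```python
import math
import statistics

def one_sample_t_and_d(values, mu0=0.0):
    """One-sample t statistic and Cohen's d against the reference value mu0."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)            # sample SD (n - 1 denominator)
    d = (mean - mu0) / sd
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, d

def d_from_t(t, n):
    """Recover Cohen's d from a one-sample t statistic and the sample size."""
    return t / math.sqrt(n)

# Made-up parameter changes for six hypothetical participants.
sample = [0.12, -0.03, 0.20, 0.08, 0.15, 0.01]
t_stat, d_stat = one_sample_t_and_d(sample)

# Consistency check on two reported pairs, assuming n = 24.
d_amplitude_od = d_from_t(2.582, 24)
d_shift_od = d_from_t(3.611, 24)
```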

      Reviewer #1 (Recommendations for the authors):

      (1) A direct comparison between tasks (baseline vs. expectation conditions) would have strengthened the findings. Specifically, contrasting performance in the orientation discrimination task with the spatial frequency discrimination task could have provided clearer evidence that participants actually used the auditory cues to attend to the expected orientation. This comparison would be particularly important for validating cue manipulation in the orientation discrimination task.

      We agree that a direct comparison between the orientation discrimination (OD) and spatial frequency discrimination (SFD) tasks could further clarify how expectation (auditory cues) differentially modulates orientation relevance. However, the primary goal of the current study was to examine expectation effects within each task separately and to demonstrate that such effects are independent of attentional modulation driven by the task-relevance of orientation.

      In addition, the OD and SFD tasks differ not only in the task-relevant feature (orientation vs. spatial frequency), but also in stimulus properties and difficulty (for example, the arbitrary choice of 20–70° as the orientation range and ~0.9 cycles/° as the spatial frequency setting). A direct comparison could therefore introduce confounding factors unrelated to expectation.

      Importantly, previous studies (e.g., Kok et al., 2012, 2017; Aitken et al., 2020) and our current results show that participants performed significantly better when the auditory cue matched the expected orientation, supporting the validity of our expectation manipulation.

      (2) An interesting consideration is why the center-surround inhibition profile of expectation was independent of the task-relevance of orientation. Previous studies (e.g., Kok et al., 2012) have found that orientation discrimination patterns differ depending on whether orientation is task-relevant or irrelevant. This could be useful to discuss the possible discrepancies.

      We thank the reviewer for this inspiring comment. Kok et al. (2012) showed that both orientation and contrast tasks elicited similar fMRI decoding results, regardless of task relevance, suggesting neural mechanisms of expectation operate independently of whether orientation is task-relevant. Behaviorally, they reported better performance for expected versus unexpected trials in the orientation task (3.4° vs. 3.8°, t(17) = 2.8, p = 0.013), and a marginal trend (although not significant) in the contrast task (4.3% vs. 5.0%, t(17) = 1.9, p = 0.075). If any differences between the two tasks exist, they may lie in the correlation between behavioral and fMRI effects, a question that goes beyond the scope of the current study. Therefore, it is hard to strongly conclude that orientation discrimination patterns differ depending on whether orientation is task-relevant or irrelevant in their paper.

      Our study differs from theirs in at least two important ways, which may account for the clearer expectation facilitatory effect we observed in the expectation (Δ0°) condition. First, in our study, the orientation-irrelevant task involved spatial frequency discrimination (SFD) rather than contrast discrimination. Compared to contrast, spatial frequency has been shown to exhibit a clear cueing effect, as reported in Fang & Liu (2019). Second, our design included a baseline condition, which was absent in their study. We computed discrimination sensitivity (DS) to quantify how much the discrimination threshold (DT) changed relative to baseline. By using this baseline-referenced approach, we observed a significant facilitatory expectation effect in the Δ0° condition, an effect that shifted from marginal significance in their orientation-irrelevant task to clear significance in our study.

      (3) The authors might consider briefly explaining how the orientation adjustment paradigm used in this study is particularly effective for examining the potential co-existence of tuning sharpening and tuning shift computations, and how this approach complements traditional orientation discrimination tasks in characterizing expectation-related mechanisms.

      We thank the reviewer for this valuable suggestion. We agree that further clarification is needed to better connect the two experiments. To explain this, we have elaborated further in the manuscript.

      “To further explore the co-existence of both Tuning sharpening and Tuning shift computations in center-surround inhibition profile of expectation, participants were asked to perform a classic orientation adjustment experiment. Unlike profile experiment (discrimination tasks), the adjustment experiment provides a direct, trial-by-trial measure of participants’ perceived orientation, capturing the full distribution of responses. This enables the construction of orientation-specific tuning curves, allowing us to detect both tuning sharpening and tuning shifts, thereby offering a more nuanced understanding of the computational mechanisms underlying expectation.”

      (4) These interesting findings raise important questions about their relationship to existing hybrid models of attentional modulation. Could the authors discuss how their results might align with or extend previous work demonstrating combined feature-similarity gain and surround suppression effects for orientation (e.g., Fang & Liu, 2019)? Could a hybrid model potentially provide a better account of these data than the pure surround suppression model?

      We thank the reviewer for this valuable comment. We agree that the hybrid model should be mentioned in the manuscript, and we have elaborated further in the Discussion.

      “For example, within the orientation space, the inhibitory zone was about 20°, 45°, and 54° for expectation evident here, feature-based attention[21], and visual perceptual learning[35], respectively; within the feature-based attention, it was about 30° and 45° in color [77] and motion direction [53] spaces, respectively. These variations hint at the exciting possibility that the width of the inhibitory surround may flexibly adapt to stimulus context and task demands, ultimately facilitating our perception and behavior in a changing environment. This principle is consistent with the hybrid model of feature-based attention [53,54,75], where attention is deployed adaptively to prioritize task-relevant information through feature-similarity gain which filters out the most distinctive distractors, and surround suppression which inhibits similar and confusable ones, thereby jointly shaping the attentional tuning profile.”

      (5) On page 19, there appears to be a missing symbol in the description of the Tuning Sharpening model. The text states: 'the tuning width of each channel's tuning function is parameterized by ??', where the question marks seem to indicate a missing parameter symbol.

      We appreciate the reviewer’s careful attention. Yes, the "σ" is missing, which was likely caused by a formatting issue. We have corrected it.

    1. eLife Assessment

      This important study reports the results of efforts to replicate two phenomena of significant interest to early-career scientists and scientific policymakers: the Matthew effect and the early-career setback effect. Several previous studies of these effects have focused on early-career researchers with grant proposals that fell just below or just above a funding threshold. Those just above the threshold were more likely to be successful when they applied for funding later in the career (an example of the well-known Matthew effect), while those just below were more likely to go on to have stronger publication records (the early-career setback effect). In this study the Matthew effect was found to be robust across funders, and to generalize from those close to the funding threshold to the whole population. The early-career setback effect was not robust across funders and did not generalize to the whole population. The evidence reported is convincing.

    2. Reviewer #1 (Public review):

      Summary:

      The authors performed a multi-funder study to determine if the Matthew effect and early-career setback effect were reproducible across funding programs and processes. The authors extended the analysis of these effects to all applicants and compared the results to the prior studies that only looked at near-hit/near-miss applicants to determine if the effects were generalizable to the whole applicant pool. Further, the authors included new models that also account for researcher behavior and their overall likelihood to reapply for later funding and how this behavior may resolve what appears to be a paradox between the Matthew effect and the early-career setback effect.

      Strengths:

      Figure 4 shows that the "Post (late) MFCR" is the same for the funded and unfunded groups, indicating that the impact of early career funding (at least, in terms of citation metrics) is transient in researchers' overall careers. This finding should encourage researchers to persevere when needed and suggests that long-term success is attainable.

      The inclusion of the collider bias in the models to account for researcher behavioral responses is a key strength of the paper and enhances the analysis and nuanced discussion of the results.

      Weaknesses:

      The discussion of limitations is thorough and points to the need for additional studies. One limitation that is acknowledged is that the authors only looked at applicants who reapplied for funding at the same funder. Given that the authors had the names and affiliations of the applicants from all of the funders, it would be helpful to understand why they were not able to look at applicants across their full data set. Was the limitation technical or a result of the study design? What would have to change to enable this broader analysis?

      In Section 4.1, the authors make a statement that the "between MFCR" difference was seen at 5 years, but not at 10 years, and so the authors chose to use the 5-year period for the presentation of their results. It would be helpful to also see the 10-year analysis and have further justification from the authors on why they selected to look at the 5-year period and how their conclusions might or might not change if they consider the longer time period.

      The discussion could also include that many funders require novel research directions as a condition of receiving an early-career award. For those who receive these awards, they must establish the new research program, begin publishing, and they may initially see a lower citation rate until the impact of the research is more broadly recognized. Are there ways to explore how these time lags impact the "Between MFCR" on those who were funded more so than those who were not funded?

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript evaluates the generalizability of two phenomena of great interest to early-career scientists and scientific policymakers. These phenomena describe how early funding success can promote future funding success (the Matthew Effect) and how initially unsuccessful applicants may later succeed (the early-career setback effect). Given the often-normative aspirations of science-of-science studies, the manuscript represents a much-needed and highly significant effort, as it allows a broader audience to assess whether they should reconsider their behavior or policies.

      Strengths:

      The evidence provided by the authors for the generalizability of the Matthew Effect is very strong and convincing. The manuscript addresses an important topic of practical concern to early-career scientists and scientific policymakers.

      Weaknesses:

      If I am correctly interpreting S11 and S12, the statements on the early-career setback effect could be stronger and more direct. The argument in the main text relies on assumptions and simulations to suggest that observations of the early-career setback effect may depend on reapplications. In contrast, S11 and S12 appear to provide more direct evidence against its generalizability, showing that the effect seems to exist in, and be driven by, only one of the six funding agencies considered (FWF). This narrow replication may not be obvious to readers ("the early-career setback effect also replicates, but is not robust across funders").

      I would also suggest that the authors provide a more nuanced discussion of the limitations of their Bayesian model. While the model seems appropriate for accounting for major factors, it appears to exclude others, such as the emergence of new scientific fields or the strategic reorientation of funders toward such fields.

    4. Reviewer #3 (Public review):

      Summary:

      This paper investigates the Matthew effect, where early success in funding peer review can translate into potentially unwarranted later success. It also investigated the previously found "setback" effect for those who narrowly miss out on funding.

      Strengths:

      The study used data from six funding agencies, which increases the generalisability, and was able to link bibliographic data for around 95% of applicants. The authors nicely illustrate how the previously found "setback" effect for near-miss applicants could be a collider bias due to those who chose to apply sometime later. This is a good explanation for the counter-intuitive effect and is nicely shown in Figure 5.
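
      The collider-bias explanation can be illustrated with a toy simulation. The sketch below is not the authors' Bayesian model; the reapplication probabilities, the quality threshold of 0.5, and all names are hypothetical choices made only for illustration. Funding near the threshold is assigned by coin flip, so it carries no information about quality, yet conditioning on reapplication makes the unfunded group look stronger.

```python
import random
import statistics

def simulate(n=100_000, seed=0):
    """Toy model: near the threshold, funding is a coin flip unrelated to quality."""
    rng = random.Random(seed)
    funded_q, unfunded_q = [], []      # latent quality of those who reapply
    for _ in range(n):
        quality = rng.gauss(0.0, 1.0)  # latent quality on a hypothetical scale
        funded = rng.random() < 0.5    # near-hit vs near-miss, as good as random
        # Assumption: funded applicants reapply regardless of quality, whereas
        # unfunded applicants mostly reapply only when their quality is high.
        if funded:
            p_reapply = 0.8
        else:
            p_reapply = 0.8 if quality > 0.5 else 0.2
        if rng.random() < p_reapply:
            (funded_q if funded else unfunded_q).append(quality)
    return statistics.mean(funded_q), statistics.mean(unfunded_q)

funded_mean, unfunded_mean = simulate()
print(f"mean quality among reapplicants: "
      f"funded={funded_mean:.2f}, unfunded={unfunded_mean:.2f}")
```

      Under these assumptions, the unfunded reapplicants have higher average quality than the funded ones even though funding itself had no causal effect, which is the pattern a selection-on-reapplication account would predict.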

      Weaknesses:

      Most of the methods were clearly presented, but I have a few questions and comments, as outlined below.

      In Figure 4(a) why are the "post" means much lower than the "pre"? This contradicts the expected research trajectory of researchers. Or is this simply due to less follow-up time? But doesn't the field citation ratio control for follow-up time?

      The choice of the log-normal distribution for latent quality was not entirely clear to me. This would create some skew, rather than a symmetric distribution, which may be reasonable but log-normal distributions can have a very long tail which might not mimic reality, as I would not expect a small number of researchers to be extremely above the crowd. However, then the skew was potentially dampened by using percentile scores. Some further reasoning and plots of the priors would help.

      Can the authors confirm the results of Figure S9 which show no visible effect of altering the standard deviation for the review parameter or the mean citations? Is this just because the prior for quality is dominated by the data? Could it be that the width of the distribution for quality does not matter, as it's the relative difference/ranking that counts? So the beta in equation 6 changes to adjust to the different quality scale?

      The contrary result for the FWF is not explained (Table S3). Does this funder have different rules around re-applicants or many other competing funders?

      The outlined qualitative research sounds worthwhile. Another potential mechanism (based on anecdote) is that some researchers react irrationally to rejection or acceptance, tending to think that the whole agency likes or hates their work based on one experience. Many researchers do not appreciate that it was a somewhat random selection of reviewers who viewed their work, and it will unlikely be the same reviewers next time.

      "A key implication is the importance of encouraging promising, but initially unsuccessful applicants to reapply." Yes, a policy implication is to give people multiple chances to be lucky, perhaps by giving fewer grants to more people, which could be achieved by shortening the funding period (e.g., 4-year fellowships instead of 5 years). However, this will have some costs, as applicants would need to spend more time on applications and suffer the increased stress of shorter-term contracts. Bridge grants are potentially an ideal half-way house between many short-term and few long-term awards. Giving more grants to fewer people is supported by this analysis showing diminishing returns in research outputs with more funding, DOI: 10.1371/journal.pone.0065263.

      Making more room for re-applicants also made me wonder if there should be an upper cap on funding, potentially for people who have been incredibly successful. Of course, funders generally want to award successful researchers, but people who've won over some limit, for example $50 million, could likely be expected to win funding from other sources such as philanthropy and business. Graded caps could occur by career stage.

    1. eLife Assessment

      This important research addresses the effects of subjective control and task difficulty on experienced stress using a novel behavioral task administered on the same day in two large online samples. Convincing evidence is provided, establishing the internal and external task validity of the task, as well as a relationship between the sense of control and task difficulty, with individual differences in relevant mental health constructs. Evidence for the specificity of the link between control and stress would be more substantial if the design had not conflated control and reward rate. This work will be of interest to psychologists and clinicians studying the concepts of controllability, stress, and psychopathology.

    2. Reviewer #1 (Public review):

      Summary:

      This work investigated how the sense of control influences perceptions of stress. In a novel "Wheel Stopping" task, the authors used task variations in difficulty and controllability to measure and manipulate perceived control in two large cohorts of online participants. The authors first demonstrate that their behavioral task exhibits good internal consistency and external validity, indicating that perceived control during the task is linked to relevant measures of anxiety, depression, and locus of control. Most importantly, manipulating controllability in the task resulted in reduced subjective stress, demonstrating a direct impact of control on stress perception. However, this work has some minor limitations due to the design of the stressor manipulations/measurements and the necessary logistics associated with online versus in-person stress studies.

      Nevertheless, this research adds to our understanding of when and how control can influence the effects of stress and has particular relevance for mental health interventions.

      Strengths:

      The primary strength of this research is the development of a unique and clever task design that can reliably and validly elicit variations in beliefs about control. Impressively, higher subjective control in the task was associated with decreased psychopathology measures such as anxiety and depression in a non-clinical sample of participants. In addition, the authors found that lower control and higher task difficulty led to higher perceived stress, suggesting that the task can reliably manipulate perceptions of stress. Prior tasks have not included both controllability and difficulty in this manner and have not directly tested the direct influence of these factors on incidental stress, making this work both novel and important for the field.

      Weaknesses:

      One minor weakness of this research is the validity of the online stress measurements and manipulations. In this study, the authors measure subjective stress via self-report both during the task and after either a Trier Social Stress Test (high-stress condition) or a memory test (low-stress condition). One concern is that these stress manipulations were really "threats" of stress, where participants never had to complete the stress tasks (i.e., recording a speech for judgment). While this is not unusual for an in-lab study and can reliably elicit substantial stress/anxiety, in an online study, there is a possibility for communication between participants (via online forums dedicated to such communication), which could weaken the stress effects. That said, the authors did find sensible increases and decreases in perceived stress between relevant time points; however, future work could improve upon this design by including more comprehensive stress manipulations and by measuring implicit physiological signs of stress.

      Comments on revisions:

      I appreciate the authors' responses to my comments and concerns. I have decided not to make changes to my public review, as I believe it remains relevant and fair after revisions.

    3. Reviewer #2 (Public review):

      Summary:

      The authors have developed a behavioral paradigm to experimentally manipulate the sense of control experienced by participants by varying the level of difficulty in a wheel-stopping task. In the first study, this manipulation is tested by administering the task in a factorial design with two levels of controllability and two levels of stressor intensity to a large number of participants online, while simultaneously recording subjective ratings of perceived control, anxiety, and stress. In a second study, the authors employed the wheel stopping task to induce a high sense of controllability and investigate whether this manipulation buffers the response to a subsequent stress induction when compared to a neutral task, such as watching pleasant videos.

      Strengths:

      (1) The authors validate a method to manipulate stress.

      (2) The authors use an experimental manipulation to induce an enhanced sense of controllability to test its impact on the response to stress induction.

      (3) The studies involved big sample sizes.

      Weaknesses:

      (1) The study was not preregistered.

      (2) The control manipulation is conflated with task difficulty and, therefore, the reward rate. In the revised version of the manuscript, the authors perform statistical analysis to demonstrate that the relationship between perceived level of control and subjective stress remains robust after the inclusion of win rate in the model. This analysis strengthens the authors' claims, but the evidence would be more substantial if the design did not conflate reward rate and control. The authors properly discuss this issue in the revised manuscript.

      This study will be of interest to psychologists and cognitive scientists who are interested in understanding how controllability and its subjective perception influence how people respond to stress exposure. The demonstration that an increased sense of control buffers/protects against subsequent stress is important and may trigger further studies to characterize this phenomenon better. However, beyond the highlighted weaknesses, the current study only studied the effect of stress induction consequent to the performance of the WS task on the same day, and its generalizability is not warranted.

    4. Reviewer #3 (Public review):

      Summary:

      This is an interesting investigation on the benefits of perceiving control and its impact on the subjective experience of stress. To assess the subjective sense of control, the authors introduce a novel wheel stopping (WS) task where control is manipulated via size and speed to induce conditions of low and high control. The authors demonstrate that the subjective sense of control is associated with experienced subjective stress and individual differences related to mental health measures. In a second experiment, they further demonstrate that an increased sense of control buffers subjective stress induced by a Trier Social Stress manipulation, more so than a typical stress-buffering mechanism of watching neutral/calming videos.

      Strengths:

      Several strengths of the manuscript can be highlighted. For instance, the paper introduces a new paradigm and a clever manipulation to test a significant and important question. Additionally, it is a well-powered investigation that allows for confidence in replicability and demonstrates both high internal consistency and high external validity, along with an interesting set of individual difference analyses. Finally, the results are quite interesting and support prior literature, while also making a significant contribution to the field in understanding the benefits of perceiving control.

      Weaknesses:

      The authors have addressed all my queries, and I believe the revised paper has been improved and will make an important contribution to the literature.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Summary:

      This work investigated how the sense of control influences perceptions of stress. In a novel "Wheel Stopping" task, the authors used task variations in difficulty and controllability to measure and manipulate perceived control in two large cohorts of online participants. The authors first show that their behavioral task has good internal consistency and external validity, showing that perceived control during the task was linked to relevant measures of anxiety, depression, and locus of control. Most importantly, manipulating controllability in the task led to reduced subjective stress, showing a direct impact of control on stress perception. However, this work has minor limitations due to the design of the stressor manipulations/measurements and the necessary logistics associated with online versus in-person stress studies.

      Nevertheless, this research adds to our understanding of when and how control can influence the effects of stress and is particularly relevant to mental health interventions.

      We thank the reviewer for their clear and accurate summary of the findings. 

      Strengths:

      The primary strength of this research is the development of a unique and clever task design that can reliably and validly elicit variations in beliefs about control. Impressively, higher subjective control in the task was associated with decreased psychopathology measures such as anxiety and depression in a non-clinical sample of participants. In addition, the authors found that lower control and higher difficulty in the task led to higher perceived stress, suggesting that the task can reliably manipulate perceptions of stress. Prior tasks have not included both controllability and difficulty in this manner and have not directly tested the direct influence of these factors on incidental stress, making this work both novel and important for the field.

      We thank the reviewer for their positive comments.

      Weaknesses:

      One minor weakness of this research is the validity of the online stress measurements and manipulations. In this study, the authors measure subjective stress via self-report both during the task and also after either a Trier Social Stress Test (high-stress condition) or a memory test (low-stress condition). One concern is that these stress manipulations were really "threats" of stress, where participants never had to complete the stress tasks (i.e., recording a speech for judgment). While this is not unusual for an in-lab study and can reliably elicit substantial stress/anxiety, in an online study, there is a possibility for communication between participants (via online forums dedicated to such communication), which could weaken the stress effects. That said, the authors did find sensible increases and decreases of perceived stress between relevant time points, but future work could improve upon this design by including more complete stress manipulations and measuring implicit physiological signs of stress.

      We thank the reviewer for urging us to expand on this point. The reviewer is right that stress was merely anticipatory and is in that sense different to the canonical TSST. However, there are ample demonstrations that such anticipatory stress inductions are effective at reliably eliciting physiological and psychological stress responses (e.g. Nasso et al., 2019; Schlatter et al., 2021; Steinbeis et al., 2015). Further, there is evidence that online versions of the TSST are also effective (DuPont et al., 2022; Meier et al., 2022), including evidence that the speech preparation phase conducted online was related to increases in heart rate and blood pressure (DuPont et al., 2022). Importantly, and as the reviewer notes in relation to our study specifically, the anticipatory TSST had a significant impact on subjective stress in the expected direction demonstrating that it was effective at eliciting subjective stress. We have elaborated further on this in our manuscript (pages 8 and 9) as follows: 

      “Prior research has found TSST anticipation to elicit both psychological and physiological stress responses [37-39], suggesting that the task anticipation would be a valid stress induction despite participants not performing the speech task. Moreover, prior research has validated the use of remote TSST in online settings [40, 41], including evidence that the speech preparation phase (online) was related to increased heart rate and blood pressure compared to controls [40].”

      Reviewer #2 (Public review):

      Summary:

      The authors have developed a behavioral paradigm to experimentally manipulate the sense of control experienced by the participants by changing the level of difficulty of a wheel-stopping task. In the first study, this manipulation is tested by administering the task in a factorial design with two levels of controllability and two levels of stressor intensity to a large number of participants online while simultaneously recording subjective ratings on perceived control, anxiety, and stress. In the second study, the authors used the wheel-stopping task to induce a high sense of controllability and test whether this manipulation buffers the response to a subsequent stress induction when compared to a neutral task, like looking at pleasant videos.

      We thank the reviewer for their accurate summary.

      Strengths:

      (1) The authors validate a method to manipulate stress.

      (2) The authors use an experimental manipulation to induce an enhanced sense of controllability to test its impact on the response to stress induction.

      (3) The studies involved big sample sizes.

      We thank the reviewer for noting these positive aspects of our study. 

      Weaknesses:

      (1) The study was not preregistered.

      This is correct.

      (2) The control manipulation is conflated with task difficulty, and, therefore the reward rate. Although the authors acknowledge this limitation at the end of the discussion, it is a very important limitation, and its implications are not properly discussed. The discussion states that this is a common limitation with previous studies of control but omits that many studies have controlled for it using yoking.

      We agree that these are very important issues to consider in the interpretation of our findings. It is important to note that, while our task design does not separate these constructs, we are able to do so in our statistical analyses. For example, our measure of perceived difficulty was included in analyses assessing the fluctuations in stress and control in which subjective control still had a unique effect on the experience of stress over and above perceived difficulty, suggesting that subjective control explains variance in stress beyond what is accounted for by perceived difficulty. Similarly, we have also included additional analyses in which we include the win rate (i.e. percentage of trials won) as a covariate when assessing the relationship between subjective control, perceived difficulty and subjective stress, in which subjective control and perceived difficulty still uniquely predict subjective stress when controlling for win rate. This suggests that there is unique variance in subjective control, separate from perceived task difficulty and win rate, that is relevant to stress. We have included these analyses (page 16 of manuscript) as follows:

      “To further isolate the relationship between subjective control and stress separate from perceived task difficulty or objective task performance, we also included the overall win rate (percentage of trials won during the WS task) in the models. In Study 1, lower feelings of control were related to higher levels of subjective stress (β= -0.12, p<.001) even when controlling for both  win rate (β= -0.06, p=.220) and perceived task difficulty (β= 0.37, p<.001, Table S10). This also replicated in Study 2, where lower subjective control was associated with higher feelings of stress (β= -0.32, p<.001) when controlling for perceived task difficulty (β= 0.31, p<.001) and win rate (β= -0.11, p=.428, Table S11). This suggests that there is unique variance in subjective feelings of control, separate from task performance, relevant to subjective stress.”

      As well as expanding on this in the Discussion (pages 27 and 28) as follows:

      “While our task design does not separate control from obtained reward, we are able to do so in the statistical analyses. Like with perceived difficulty, we statistically accounted for reward rate and showed that the relationship between subjective control and stress was not accounted for by reward rate, for example. Similarly, participants received feedback after every trial, and thus feedback valence may contribute to stress perception. However, given that overall win rate (which captures the feedback received during the task) did not predict stress over and above perceived difficulty or subjective control, it suggests that feedback is unlikely to relate to stress over and above difficulty. Future work will need to disentangle this further to rule out such potential confounds.”

      Further, in terms of the wider literature on these issues, we have added more to this point in our discussion, especially in relation to previous literature that also varies control by reward rate (e.g. Dorfman & Gershman, 2019, who use a reward rate of 80% in high control conditions and 50% in low control conditions). This can be found in the manuscript on page 27 as follows: 

      “Previous research typically accounts for different outcomes (e.g. punishment) by yoking controllable and uncontrollable conditions [3] though other work has manipulated the controllability of rewards by changing the reward rate [for example 30] where a decoy stimulus is rewarded 50% of the time in the low control condition but 80% in the high control condition).”

      (3) The methods are not always clear enough, and it is difficult to know whether all the manipulations are done within-subjects or some key manipulations are done between subjects.

      We have added more information in the methods section (page 8) clarifying within-subject manipulations (WS task parameters) and between-subject manipulations (stressor intensity task, WS task version in Study 1, and WS task/video task in Study 2). Additionally, as recommended by Reviewer 1, we have provided more information in the methods section and Table S3 regarding the details of on-screen written feedback provided to participants after each trial of the WS Task.

      (4) The analysis of internal consistency is based on splitting the data into odd/even sliders. This choice of data parcellation may cause missed drifts in task performance due to learning, practice effects, or tiredness, thus potentially inflating internal consistency.

      We agree that this can indeed be an issue, though drift is likely to be present in any task, including even in mood during resting state (Jangraw et al., 2023). To respond to this specific point, we parcellated the timepoints into a 1st/2nd half split and report the ICC in the supplementary information. While values are lower, indeed likely due to systematic drifts in task performance as participants learn to perform the task (especially for Study 2 since the order of parameters was designed to get easier throughout the experiment), the ICC values are still high. Control sliders: Study 1 = 0.82, Study 2 = 0.68; Difficulty sliders: Study 1 = 0.84, Study 2 = 0.57; Stress sliders: Study 1 = 0.45, Study 2 = 0.71. As seen, the lowest ICC is for stress sliders in Study 1. This may be because the first 3 sliders (included in the 1st half split) were all related to the stress task (initial, post-stress, task, post-debrief) and the final 4 sliders (in the 2nd half split) were the three sliders during the WS task and shortly afterwards.
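
      The difference between the two parcellations can be sketched on simulated data. The example below is only an illustration under stated assumptions: the ratings are synthetic, each hypothetical participant has their own linear drift across eight sliders, and the ICC is computed as a two-way consistency ICC (ICC(3,1)-style), which may differ from the ICC variant used in the manuscript.

```python
import random
import statistics

def icc_consistency(half_a, half_b):
    """Two-way consistency ICC (ICC(3,1)-style) for two half-scores per participant."""
    n, k = len(half_a), 2
    rows = list(zip(half_a, half_b))
    grand = statistics.mean(half_a + half_b)
    row_means = [statistics.mean(r) for r in rows]
    col_means = [statistics.mean(half_a), statistics.mean(half_b)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between halves
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ms_rows = ss_rows / (n - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)

def simulate_ratings(n=500, n_sliders=8, seed=1):
    """Synthetic slider ratings: stable trait + participant-specific drift + noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        trait = rng.gauss(50, 10)
        slope = rng.gauss(0, 2)  # hypothetical drift per slider (learning, fatigue)
        data.append([trait + slope * t + rng.gauss(0, 5) for t in range(n_sliders)])
    return data

ratings = simulate_ratings()
odd = [statistics.mean(r[0::2]) for r in ratings]     # 1st, 3rd, 5th, 7th sliders
even = [statistics.mean(r[1::2]) for r in ratings]    # 2nd, 4th, 6th, 8th sliders
first = [statistics.mean(r[:4]) for r in ratings]     # first half of the session
second = [statistics.mean(r[4:]) for r in ratings]    # second half of the session

icc_odd_even = icc_consistency(odd, even)
icc_halves = icc_consistency(first, second)
print(f"odd/even ICC = {icc_odd_even:.2f}, first/second-half ICC = {icc_halves:.2f}")
```

      In this simulation the odd/even split spreads each participant's drift almost evenly across both halves, so it yields the higher ICC, while the first/second-half split exposes the drift and yields a lower value.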

      (5) Study 2 manipulates the effect of domain (win versus loss WS task), but the interaction of this factor with stressor intensity is not included in the analysis.

      We agree that this would be a valuable analysis to include. We have run additional analyses (section Sensitivity and Exploratory Analyses, pages 24 and 25), testing the interaction of Domain (win or loss) with stressor intensity (and time) when predicting the stress buffering and stress relief effects. This revealed no significant main effects of domain or interactions including domain, suggesting that domain did not impact the stress induction or relief differently depending on whether it was followed by the high or low stressor intensity condition. While the control by time interaction (our main effect of interest) still held for stress induction in this more complex model, the control by time interaction did not hold for the stress relief. However, this more complex model did not provide a better fit for the data, motivating us to continue to draw conclusions from the original model specification with domain as a covariate (rather than an interaction).

      We outline these analyses on page 24 of the manuscript, as follows:

      “Third, we included the interaction of domain with stressor intensity and with time, to test whether the win or loss domain in the WS task significantly impacted stress induction or stress relief differently depending on stressor intensity. There were no significant effects or interactions of domain (Table S14) for stress induction or stress relief, and the main effect of interest (the interaction between time and control) still held for the stress induction (β= 10.20, SE=4.99, p=.041, Table S14), though was no longer significant for the stress relief (β= 6.72, SE=4.28, p=.117, Table S14). This more complex model did not significantly improve model fit (χ²(3)= 1.46, p=.691) compared to our original specification (with domain as a covariate rather than an interaction) and had slightly worse fit (higher AIC and BIC) than the original model (AIC = 5477.2 versus 5472.7, BIC = 5538.5 versus 5520.8).”
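
      As a rough arithmetic check on the model comparison quoted above, the sketch below recomputes the p-value of the likelihood-ratio statistic from the closed-form upper tail of a chi-square distribution with 3 degrees of freedom, together with the AIC and BIC differences. The function name `chi2_sf_df3` is ours, and the only inputs are the figures in the quoted passage; this is not the authors' analysis code.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Values reported in the quoted passage.
p_value = chi2_sf_df3(1.46)      # likelihood-ratio test, chi-square(3) = 1.46
delta_aic = 5477.2 - 5472.7      # complex model minus original model
delta_bic = 5538.5 - 5520.8

print(f"p = {p_value:.3f}, dAIC = {delta_aic:.1f}, dBIC = {delta_bic:.1f}")
```

      The recomputed p-value agrees with the reported p=.691 to rounding, and both information criteria are higher for the more complex model (by 4.5 and 17.7 points), which is consistent with retaining the original specification.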

      This study will be of interest to psychologists and cognitive scientists interested in understanding how controllability and its subjective perception impact how people respond to stress exposure. Demonstrating that an increased sense of control buffers/protects against subsequent stress is important and may trigger further studies to characterize this phenomenon better. However, beyond the highlighted weaknesses, the current study only studied the effect of stress induction consecutive to the performance of the WS task on the same day and its generalizability is not warranted.

      We thank the reviewer for this assessment and agree that we cannot assume these findings would generalise to more prolonged effects on stress responses.

      Reviewer #3 (Public review):

      Summary:

      This is an interesting investigation of the benefits of perceiving control and its impact on the subjective experience of stress. To assess a subjective sense of control, the authors introduce a novel wheel-stopping (WS) task where control is manipulated via size and speed to induce low and high control conditions. The authors demonstrate that the subjective sense of control is associated with experienced subjective stress and individual differences related to mental health measures. In a second experiment, they further show that an increased sense of control buffers subjective stress induced by a trier social stress manipulation, more so than a more typical stress buffering mechanism of watching neutral/calming videos.

      We agree with this accurate summary of our study. 

      Strengths:

      There are several strengths to the manuscript that can be highlighted. For instance, the paper introduces a new paradigm and a clever manipulation to test an important and significant question. Additionally, it is a well-powered investigation that allows for confidence in replicability and the ability to show both high internal consistency and high external validity with an interesting set of individual difference analyses. Finally, the results are quite interesting and support prior literature while also providing a significant contribution to the field with respect to understanding the benefits of perceiving control.

      We thank the reviewer for this positive assessment. 

      Weaknesses:

      There are also some questions that, if addressed, could help our readership.

      (1) A key manipulation was the high-intensity stressor (Anticipatory TSST signal), which was measured via subjective ratings recorded on a sliding scale at different intervals during testing. Typically, the TSST conducted in the lab is associated with increases in cortisol assessments and physiological responses (e.g., skin conductance and heart rate). The current study is limited to subjective measures of stress, given the online nature of the study. Since TSST online may also yield psychologically different results than in the lab (i.e., presumably in a comfortable environment, not facing a panel of judges), it would be helpful for the authors to briefly discuss how the subjective results compare with other examples from the literature (either online or in the lab). The question is whether the experienced stress was sufficiently stressful given that it was online and measured via subjective reports. The control condition (low intensity via reading recipes) is helpful, but the low-intensity stress does not seem to differ from baseline readings at the beginning of the experiment.

      We agree that it would be helpful to expand on this further. Similar to the comment made by Reviewer 1, we wish to point out that there are ample demonstrations that such anticipatory stress inductions are effective at reliably eliciting physiological and psychological stress responses (e.g. Nasso et al., 2019; Schlatter et al., 2021; Steinbeis et al., 2015). Further, there is evidence that online versions of the TSST are also effective (DuPont et al., 2022; Meier et al., 2022), including evidence that the speech preparation phase conducted online was related to increases in heart rate and blood pressure (DuPont et al., 2022). We have elaborated further on this in our manuscript on pages 8 and 9 as follows:

      “Prior research has found TSST anticipation to elicit both psychological and physiological stress responses [37-39], suggesting that the task anticipation would be a valid stress induction despite participants not performing the speech task. Moreover, prior research has validated the use of remote TSST in online settings [40, 41], including evidence that the speech preparation phase (online) was related to increased heart rate and blood pressure compared to controls [40].”

      (2) The neutral videos represent an important condition to contrast with WS, but it raises two questions. First, the conditions are quite different in terms of experience, and it is interesting to consider what another more active (but not controlled per se) condition would be in comparison to the WS performance. That is, there is no instrumental action during the neutral video viewing (even passive ratings about the video), and the active demands could be an important component of the ability to mitigate stress. Second, the subjective ratings of the stress of the neutral video appear equivalent to the win condition. Would it have been useful to have a high arousal video (akin to the loss condition) to test the idea that experience of control will buffer against stress? That way, the subjective stress experience of stress would start at equivalent points after WS3.

      We agree with the reviewer that this is an important issue to clarify. In our deliberations when designing this study, we considered that any task with action-outcome contingencies would have a degree of controllability. To better distinguish experiences of control (WS task) from an experience of no/neutral control (i.e., neither high nor low controllability), we decided to use a task in which no actions were required during the task itself. Importantly, however, there was an active demand, and concentration was still required in order to perform the attention checks regarding the content of the videos and ratings of the videos.

      Thank you for the suggestion of having a high arousal video condition. This would indeed be interesting to test how experiencing ‘neutral’ control and high(er) stress levels preceding the stressor task influences stress buffering and stress relief, and we have included this suggestion for future research in the discussion section (page 28) as below:

      “Another avenue for future research would be to test how control buffers against stress when compared to a neutral control scenario of higher stress levels, akin to the loss domain in the WS Task, given that participants found the video condition generally relaxing. However, given that we found no differences dependent on domain for the stress induction in the WS Task conditions, it is possible that different versions of a neutral control condition would not impact the stress induction.”

      (3) For the stress relief analysis, the authors included time points 2 and 3 (after the stressor and debrief) but not a baseline reading before stress. Given the potential baseline differences across conditions, can this decision be justified in the manuscript?

      We thank the reviewer for raising this. Regarding the stress relief analyses (timepoints 2 and 3) and not including timepoint 1 (after the WS/video task) stress in the model, we have added to the manuscript that there was no significant difference in stress ratings between the high control and neutral control (collapsed across stress and domain) at timepoint 1 (hence why we do not think it’s necessary to include in the stress relief model). Nevertheless, we have now included a sensitivity analysis to test the Timepoint*Control interaction of stress relief when including timepoint 1 stress as a covariate. The timepoint by control interaction still holds, suggesting that the initial stress level prior to the stress induction does not impact our results of interest. The details of this analysis are included in the Sensitivity and Exploratory Analyses section on page 24:

      “Although there were no significant differences between control groups in subjective stress immediately after the WS/video task (t(175.6)=1.17, p=.244), we included participants’ stress level after the WS/video task as a covariate in the stress relief analyses (Table S12). The results revealed a main effect of initial stress (β= 0.643, SE=0.040, p<.001, Table S12) on the stress relief after the stressor debrief. Compared to excluding initial stress as in the original analyses (Table 4), there was now no longer a main effect of domain (β= 0.236, SE=2.60, p=.093, Table S12), but the inference of all other effects remained the same. Importantly, there was still a significant time by control interaction (β= 9.65, SE=3.74, p=.010, Table S12) showing that the decrease in stress after the debrief was greater in the highly controllable WS condition than the neutral control video condition, even when accounting for the initial stress level.”

      (4) Is the increased control experience during the losses condition more valuable in mitigating experienced stress than the win condition?

      We agree that this would be helpful to clarify. To test whether the loss domain was more valuable at mitigating experiences of stress than the win condition, we ran additional analyses with just the high control condition (WS task) to test for a Domain*Time interaction. This revealed no significant Domain*Time interaction, suggesting that the stress buffering or stress relief effect was not dependent on domain in the high control conditions. These analyses are outlined in the Sensitivity and Exploratory Analyses section on page 25:

      “Finally, to test whether the loss domain was more valuable at mitigating experiences of stress than the win condition, we ran additional analyses with just the high control condition (WS task) for the stress induction and stress relief to test for an interaction of domain and time. For the stress induction, there was no significant two-way interaction of domain and time (β= -1.45, SE=4.80, p=.763), nor a significant three-way interaction of domain by time by stressor intensity (β= -3.96, SE=6.74, p=.557, Table S15), suggesting that there were no differences in the stress induction dependent on domain. Similarly for the stress relief, there was no significant two-way interaction of domain and time (β= -5.92, SE=4.42, p=.182), nor a significant three-way interaction of domain by time by stressor intensity (β= 8.86, SE=6.21, p=.154, Table S15), suggesting that there were no differences in the stress relief dependent on the WS Task domain.”

      (5) The subjective measure of control ("how in control do you feel right now") tends to follow a successful or failed attempt at the WS task. How much is the experience of control mediated by the degree of experienced success/schedule of reinforcement? Is it an assessment of control or, an evaluation of how well they are doing and/or resolution of uncertainty? An interesting paper by Cockburn et al. 2014 highlights the potential for positive prediction errors to enhance the desire for control.

      We thank the reviewer for this comment. Similar to comments regarding reward rate, our task does not allow us to fully separate control from success/reinforcement because of the manipulation of difficulty. However, we did undertake sensitivity analyses and the inclusion of overall win rate accounted for limited variance when predicting stress over and above subjective control and difficulty (page 16). 

      “To further isolate the relationship between subjective control and stress separate from perceived task difficulty or objective task performance, we also included the overall win rate (percentage of trials won during the WS task) in the models. In Study 1, lower feelings of control were related to higher levels of subjective stress (β= -0.12, p<.001) even when controlling for both win rate (β= -0.06, p=.220) and perceived task difficulty (β= 0.37, p<.001, Table S10). This also replicated in Study 2, where lower subjective control was associated with higher feelings of stress (β= -0.32, p<.001) when controlling for perceived task difficulty (β= 0.31, p<.001) and win rate (β= -0.11, p=.428, Table S11). This suggests that there is unique variance in subjective feelings of control, separate from task performance, relevant to subjective stress.”

      (6) While the authors do a very good job in their inclusion and synthesis of the relevant literature, they could also amplify some discussion in specific areas. For example, operationalizing task controllability via task difficulty is an interesting approach. It would be useful to discuss their approach (along with any others in the literature that have used it) and compare it to other typically used paradigms measuring control via presence or absence of choice, as mentioned by the authors briefly in the introduction.

      We are delighted to expand on this particular point and have done so in the Discussion on page 27:

      “Previous research typically accounts for different outcomes (e.g. punishment) by yoking controllable and uncontrollable conditions [3], though other work has manipulated the controllability of rewards by changing the reward rate (for example [30], where a decoy stimulus is rewarded 50% of the time in the low control condition but 80% in the high control condition). While our task design does not separate control from obtained reward, we are able to do so in the statistical analyses.”

      (7) The paper is well-written. However, it would be useful to expand on Figure 1 to include a) separate figures for study 1 (currently not included) and 2, and b) a timeline that includes the measurements of subjective stress (incorporated in Figure 1). It would also be helpful to include Figure S4 in the manuscript.

      We have expanded Figure 1 to include both Studies 1 and 2 and a timeline of when subjective stress was assessed throughout the experiment as well as adding Figure S4 to the main manuscript (now top panel within Figure 4). 

      Reviewer #1 (Recommendations for the authors):

      (1) Study 2 shows a greater decrease in subjective stress after the high-control task manipulation than after the pleasant video. One possible confound is whether the amount of time to complete the WS task and the video differ. It could be helpful to look at the average completion time for the WS task and compare that to the length of the videos. Alternatively, in future studies, control for this by dynamically adjusting the video play length to each participant based on how long they took to complete the WS task.

      This is an interesting suggestion. As a result, we have included the time taken as a covariate in the stress induction and stress relief analyses to ensure that any differences in time between the WS task and video task were not accounting for any of the stress induction or relief analyses. Controlling for the total time taken did not impact the stress induction or relief results. This is included in the Sensitivity and Exploratory Analyses section on page 24:

      “Our second sensitivity analysis was conducted because the experiment took longer to complete for the video condition (mean = 54.3 minutes, SD = 12.4 minutes) than the WS task condition (mean = 39.7 minutes, SD = 12.8 minutes, t(186.19)=-9.32, p<.001). We therefore included the total time (in ms) as a covariate in the stress induction and stress relief analyses for Study 2. This showed that accounting for total time did not change the results of interest (Table S13), further highlighting that the time by control interactions were robust.”

      (2) Because participants received feedback about their success/failure in the WS task, a confounding factor could be that they received positive feedback on highly controllable trials and negative feedback on low control trials (and/or highly difficult trials). This would suggest that it is not controllability per se that contributes to stress perception but rather feedback valence. The authors show that this is a likely factor in their results in Study 2, which shows significant effects of the loss domain on perceived control and stress. Was a similar analysis done in Study 1? Do participants receive feedback in Study 1? It would be helpful to include this information somewhere in the manuscript. I would be curious to know whether *any* feedback at all influences controllability/stress perceptions.

      We thank the reviewer for this interesting suggestion. It is an interesting question as to whether feedback valence is related to stress in Study 1, and we have added this point to the Discussion on pages 27 and 28. To speak to this point, when we include the overall win rate (which captures the subsequent feedback received) when predicting subjective stress, win rate is not a significant predictor of stress over and above perceived difficulty and subjective control, suggesting that overall feedback valence may not be related to stress in Study 1. We take this as evidence that feedback may not be as important in terms of accounting for the relationship between stress and control. However, we unfortunately do not have any data in which there was no feedback provided to speak to this conclusively. This would be an interesting future study. The excerpt below is added to pages 27 and 28 of the discussion section:

      “Like with perceived difficulty, we statistically accounted for reward rate and showed that the relationship between subjective control and stress was not accounted for by reward rate, for example. Similarly, participants received feedback after every trial, and thus feedback valence may contribute to stress perception. However, given that overall win rate (which captures the feedback received during the task) did not predict stress over and above perceived difficulty or subjective control, it suggests that feedback is unlikely to relate to stress over and above difficulty. Future work will need to disentangle this further to rule out such potential confounds.”

      To respond specifically to the reviewer’s question about the feedback given to participants, written feedback was provided on screen on a trial-by-trial basis in Study 1 as well (i.e. in both studies). We have clarified this in the manuscript on page 8 and provided additional details in Table S3:

      “After each trial, participants were shown written feedback on screen as to whether the segment had successfully stopped on the red zone (or not), and the associated reward (or lack of). See Table S3 for details.”

      (3) I'm not sure how to interpret the fact that in Figure S1, the BICs are all essentially the same. Does this mean that you don't really need all of these varying aspects of the task to achieve the same effects? Could the task be made simpler?

      The similarity of BIC values suggests that a simpler WS task would have produced a worse account of the data approximately in keeping with the extent to which it is a simpler model. Here, the BIC scores for the models are similar, suggesting that adding these parameters adds explanatory power in keeping with what would have been expected from adding a parameter, but not more. We do note that the BIC is a relatively strict and conservative comparison. The fact that the most complex model overall narrowly improves parsimony, combined with the interpretable parameter values and the prior expectations given the task setup, led us to focus on this most complex model.
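
      The sense in which an added parameter can contribute explanatory power "in keeping with what would have been expected" follows from the standard definition BIC = k·ln(n) − 2·ln(L̂). The sketch below is illustrative only: the sample size, parameter count, and log-likelihood are hypothetical and are not taken from Figure S1.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L-hat)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Hypothetical numbers, chosen only for illustration.
n_obs = 500
base = bic(log_likelihood=-2700.0, n_params=5, n_obs=n_obs)

# One extra parameter leaves the BIC unchanged exactly when it raises
# the log-likelihood by ln(n) / 2; a smaller gain makes the BIC worse.
break_even_gain = math.log(n_obs) / 2
richer = bic(log_likelihood=-2700.0 + break_even_gain, n_params=6, n_obs=n_obs)

print(f"break-even gain = {break_even_gain:.2f}, dBIC = {richer - base:.2e}")
```

      Near-identical BIC values across nested models therefore indicate that each added parameter improves the fit by roughly the amount the penalty charges for it.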

      (4) A minor point, but the authors refer to their sample as "neurotypical." Were they assessed for prior/current psychopathology/medications? If not, I might use a different term here (perhaps "non-clinical sample"), since some prior work has shown that online samples actually have higher instances of psychopathology compared to community samples.

      We have changed the phrasing of ‘neurotypical’ to a ‘non-clinical sample’ as recommended.

      Reviewer #2 (Recommendations for the authors):

      Figure S4 is very informative and could be presented in the main text.

      We have expanded Figure 1 to include both Studies 1 and 2 and a timeline of when subjective stress was assessed throughout the experiment as well as adding Figure S4 to the main manuscript (top panel of Figure 4). 

      References:

      Dorfman, H. M., & Gershman, S. J. (2019). Controllability governs the balance between Pavlovian and instrumental action selection. Nature Communications, 10(1), 5826. https://doi.org/10.1038/s41467-019-13737-7

      DuPont, C. M., Pressman, S. D., Reed, R. G., Manuck, S. B., Marsland, A. L., & Gianaros, P. J. (2022). An online Trier social stress paradigm to evoke affective and cardiovascular responses. Psychophysiology, 59(10), e14067. https://doi.org/10.1111/psyp.14067

      Jangraw, D. C., Keren, H., Sun, H., Bedder, R. L., Rutledge, R. B., Pereira, F., Thomas, A. G., Pine, D. S., Zheng, C., Nielson, D. M., & Stringaris, A. (2023). A highly replicable decline in mood during rest and simple tasks. Nature Human Behaviour, 7(4), 596–610. https://doi.org/10.1038/s41562-023-01519-7

      Meier, M., Haub, K., Schramm, M.-L., Hamma, M., Bentele, U. U., Dimitroff, S. J., Gärtner, R., Denk, B. F., Benz, A. B. E., Unternaehrer, E., & Pruessner, J. C. (2022). Validation of an online version of the trier social stress test in adult men and women. Psychoneuroendocrinology, 142, 105818. https://doi.org/10.1016/j.psyneuen.2022.105818

      Nasso, S., Vanderhasselt, M.-A., Demeyer, I., & De Raedt, R. (2019). Autonomic regulation in response to stress: The influence of anticipatory emotion regulation strategies and trait rumination. Emotion, 19(3), 443–454. https://doi.org/10.1037/emo0000448

      Schlatter, S., Schmidt, L., Lilot, M., Guillot, A., & Debarnot, U. (2021). Implementing biofeedback as a proactive coping strategy: Psychological and physiological effects on anticipatory stress. Behaviour Research and Therapy, 140, 103834. https://doi.org/10.1016/j.brat.2021.103834

      Steinbeis, N., Engert, V., Linz, R., & Singer, T. (2015). The effects of stress and affiliation on social decision-making: Investigating the tend-and-befriend pattern. Psychoneuroendocrinology, 62, 138–148. https://doi.org/10.1016/j.psyneuen.2015.08.003

    1. eLife Assessment

      This important study addresses the timely and interesting question of how itaconate generation emerged in evolution, using taxonomic analysis of the gene and enzyme cis-aconitate decarboxylase (CAD). The authors provide solid evidence identifying three CAD branches in metazoans and showing that the early metazoan paleo-form indeed generates itaconate and is already linked to innate immunity. They further provide limited evidence suggesting that taxonomic differences in subcellular localisation of this enzyme may allow for innate immune signalling without compromising cellular energetics. The implications of the study will be of high interest to the field of innate host defence and immunometabolism.

    2. Reviewer #1 (Public review):

      Summary:

      The taxonomic analysis of IRG1 evolution is compelling and fills an important gap in the literature. However, the experimental evidence for IRG1 localization requires greater detail and confirmation.

      Strengths:

      The phylogenetic analysis of IRG1 evolution fills an important gap in the literature. The identification of independent acquisition of metazoan and fungal IRG1 from prokaryotic sources is novel, and the observation that human IRG1 lost mitochondrial matrix localization is particularly interesting, with potentially significant implications for the study of itaconate biology.

      Weaknesses:

      The protease protection assay was conducted with MTS-IRG1 but not with wild-type IRG1, which should also be tested. Moreover, no complementary methods, such as microscopy, were employed to validate localization. Beyond humans, the structure and localization of mouse IRG1, highly relevant given the widespread use of the mouse as a model for IRG1 functional studies, are not addressed. Finally, if itaconate is indeed synthesized outside the mitochondrial matrix to safeguard metabolic activity, it is not discussed how this reconciles with its reported inhibitory effect on SDH.

    3. Reviewer #2 (Public review):

      Summary:

      The authors are trying to explain how the metabolite itaconate evolved, since although it's involved in host defense, it can also limit mitochondrial function. They are trying to probe the trade-off between these two functions.

      Strengths:

      The evolutionary aspect is novel; this is the first time to my knowledge that the evolution of IRG1 has been analysed, and there are interesting findings here. The key finding appears to be that subcellular localisation is an important aspect, allowing host defense in some organisms without compromising bioenergetics. This is an interesting finding in the context of immunometabolism, although it needs extra analysis.

      Weaknesses:

      The work concerning sub-mitochondrial localisation is confusing and needs better analysis.

    4. Reviewer #3 (Public review):

      Summary:

      IRG1 is highly expressed in activated human and mouse myeloid cells. It encodes the mitochondrial enzyme cis-aconitate decarboxylase 1 (ACOD1) that generates itaconate. Itaconate has anti-microbial activity and has immunoregulatory effects by interfering with cellular metabolism, signaling to cytokine production, and multiple other processes.

      The authors perform a phylogenetic analysis of IRG1 to obtain insight into the evolution of itaconate biosynthesis. Combining BLAST with human IRG1 and a MmgE/Ptrp domain search, they find CAD in all domains of life, but the presence of IRG1 homologs is patchy in eukaryotes, indicating that itaconate biosynthesis is not essential. The phylogenetic analysis showed a more distant relationship of fungal and metazoan CAD/IRG1 to many prokaryotic sequences, suggesting independent acquisition of these metazoan and fungal CAD genes. In metazoans, three subbranches of paleo-IRG1 (in mollusks/early chordates) and two paralogous vertebrate forms (IRG1 and IRG1-like) were identified, with the latter derived from paleo-IRG1, and by genome duplication. While most jawed vertebrates have both IRG1 and IRG1L, metatherian and eutherian mammals have lost IRG1L and contain only IRG1.

      Interestingly, sequence analysis of both paralogues showed that many IRG1L genes contain an N-terminal mitochondrial targeting sequence (MTS) that is absent from most IRG1 sequences. A limited proteolysis assay of submitochondrial localization confirmed that zebrafish IRG1L is only sensitive to proteases in the presence of high Triton X-100, indicative of association with the mitochondrial matrix. In contrast, a recent paper from the Galan lab (Lian 2023 Nature Microbiology) reported that human IRG1 is not localized to the mitochondrial matrix, although enriched in mitochondria. Here, the authors generated a matrix-targeted human IRG1 by adding the N-terminal MTS and found that it localizes to the matrix based on a limited proteolysis assay. The loss of MTS-containing IRG1L from most mammals appears, therefore, to indicate that itaconate generation is directed to the cytoplasm, potentially reducing inhibition of TCA cycle activity in the mitochondria.

      Next, the authors confirmed that the recombinant IRG1L protein has CAD activity in vitro. The last part of the manuscript addresses the expression of paleo-IRG1 in oysters and amphioxus. The authors found high mRNA levels in oyster hemocytes, which were further increased by poly(I:C); the same was the case in amphioxus tissues after feeding of LPS or poly(I:C), indicating a role for paleo-IRG1/itaconate in early metazoan innate immunity.

      Strengths

      (1) Phylogenetic perspective largely lacking so far in the IRG1/itaconate field.

      (2) Manuscript clearly written and understandable across disciplines.

      (3) Phylogenetic analyses complemented by biochemical and gene expression analyses to link to function.

      (4) Lack of MTS in IRG1 and change in localization from mitochondria, highly relevant antimicrobial and cellular effects of itaconate.

      Weaknesses:

      (1) Biochemical and functional analysis of different CAD mRNA and proteins lacks depth.

      (2) The submitochondrial localization assay lacks a native human IRG1 control.

      (3) CAD activity shown for IRG1L but not paleo-IRG1.

      (4) Itaconate production by early metazoans after PAMP stimulation?

      (5) No measurement of energy metabolism (trade-offs?).

      I acknowledge that some of these limitations are inevitable because the range of detailed experimental analysis is necessarily limited. However, some of these data would be important to support central claims of the manuscript (further discussed below).

    5. Author response:

      Reviewer #1 (Public review): 

      Summary: 

      The taxonomic analysis of IRG1 evolution is compelling and fills an important gap in the literature. However, the experimental evidence for IRG1 localization requires greater detail and confirmation. 

      Strengths: 

      The phylogenetic analysis of IRG1 evolution fills an important gap in the literature. The identification of independent acquisition of metazoan and fungal IRG1 from prokaryotic sources is novel, and the observation that human IRG1 lost mitochondrial matrix localization is particularly interesting, with potentially significant implications for the study of itaconate biology. 

      We thank the reviewer for appreciating the novelty of our study in exploring IRG1 evolution.  

      Weaknesses: 

      The protease protection assay was conducted with MTS-IRG1 but not with wild-type IRG1, which should also be tested. Moreover, no complementary methods, such as microscopy, were employed to validate localization. Beyond humans, the structure and localization of mouse IRG1, highly relevant given the widespread use of the mouse as a model for IRG1 functional studies, are not addressed. 

      Regarding submitochondrial localization of IRG1, we want to draw attention to the published data that a protease protection assay for wild-type mammalian IRG1 has been performed by Lian et al. 2023 (Extended Data Fig. 4), which convincingly demonstrated an outer-mitochondrial membrane localization of endogenous mouse IRG1 in mouse DC2.4 cells upon LPS stimulation that induces IRG1 expression. 

      Regarding complementary microscopy evidence, the same paper performed two-color DNA-PAINT super-resolution imaging to demonstrate an enrichment of IRG1 at mitochondria and a lack of co-localization with the inner membrane/matrix marker Cox IV.

      Given the direct visualization of sub-mitochondrial localization, we consider applying super-resolution microscopy to revisit the sub-mitochondrial localization of different IRG1 constructs in the study.

      Reference:

      Lian H, Park D, Chen M, Schueder F, Lara-Tejero M, Liu J, Galán JE. Parkinson's disease kinase LRRK2 coordinates a cell-intrinsic itaconate-dependent defence pathway against intracellular Salmonella. Nat Microbiol. 2023 Oct;8(10):1880-1895. doi: 10.1038/s41564-023-01459-y. Epub 2023 Aug 28. PMID: 37640963; PMCID: PMC10962312.

      Finally, if itaconate is indeed synthesized outside the mitochondrial matrix to safeguard metabolic activity, it is not discussed how this reconciles with its reported inhibitory effect on SDH.

      We thank the reviewer for raising this excellent point. Indeed, itaconate has been proposed to inhibit matrix SDH, thereby exerting an anti-inflammatory function (Lampropoulou, Cell Metab 2016). While the mitochondrial transport of itaconate has not been fully characterized in vivo or in cells, a specific itaconate transport activity has been shown for the mitochondrial 2-oxoglutarate transporter OGC using an in vitro proteoliposome system (Mills et al. Nature 2018).

      We plan to discuss this important point on mitochondrial itaconate transport in the revision. 

      Reference: 

      Lampropoulou V, Sergushichev A, Bambouskova M, Nair S, Vincent EE, Loginicheva E, Cervantes-Barragan L, Ma X, Huang SC, Griss T, Weinheimer CJ, Khader S, Randolph GJ, Pearce EJ, Jones RG, Diwan A, Diamond MS, Artyomov MN. Itaconate Links Inhibition of Succinate Dehydrogenase with Macrophage Metabolic Remodeling and Regulation of Inflammation. Cell Metab. 2016 Jul 12;24(1):158-66. doi: 10.1016/j.cmet.2016.06.004. Epub 2016 Jun 30. PMID: 27374498; PMCID: PMC5108454.  

      Mills EL, Ryan DG, Prag HA, Dikovskaya D, Menon D, Zaslona Z, Jedrychowski MP, Costa ASH, Higgins M, Hams E, Szpyt J, Runtsch MC, King MS, McGouran JF, Fischer R, Kessler BM, McGettrick AF, Hughes MM, Carroll RG, Booty LM, Knatko EV, Meakin PJ, Ashford MLJ, Modis LK, Brunori G, Sévin DC, Fallon PG, Caldwell ST, Kunji ERS, Chouchani ET, Frezza C, Dinkova-Kostova AT, Hartley RC, Murphy MP, O'Neill LA. Itaconate is an anti-inflammatory metabolite that activates Nrf2 via alkylation of KEAP1. Nature. 2018 Apr 5;556(7699):113-117. doi: 10.1038/nature25986. Epub 2018 Mar 28. PMID: 29590092; PMCID: PMC6047741.

      Reviewer #2 (Public review): 

      Summary: 

      The authors are trying to explain how the metabolite itaconate evolved, since although it's involved in host defense, it can also limit mitochondrial function. They are trying to probe the trade-off between these two functions.

      Strengths: 

      The evolutionary aspect is novel; this is the first time to my knowledge that the evolution of IRG1 has been analysed, and there are interesting findings here. The key finding appears to be that subcellular localisation is an important aspect, allowing host defense in some organisms without compromising bioenergetics. This is an interesting finding in the context of immunometabolism, although it needs extra analysis.

      Weaknesses: 

      The work concerning sub-mitochondrial localisation is confusing and needs better analysis. 

      We thank the reviewer for the constructive feedback. As in our response to reviewer 1, we want to draw attention to the published data in which the outer mitochondrial membrane localization of IRG1 has been demonstrated by protease protection assay and explored using super-resolution imaging by Lian et al. 2023 (Extended Data Fig. 4). Given the direct visualization of sub-mitochondrial localization by super-resolution imaging, we plan to revisit and to apply the method to different IRG1 constructs used in the paper.

      Reviewer #3 (Public review): 

      Summary: 

      IRG1 is highly expressed in activated human and mouse myeloid cells. It encodes the mitochondrial enzyme cis-aconitate decarboxylase 1 (ACOD1) that generates itaconate. Itaconate has anti-microbial activity and acts immunoregulatory by interfering with cellular metabolism, signaling to cytokine production, and multiple other processes. 

      The authors perform a phylogenetic analysis of IRG1 to obtain insight into the evolution of itaconate biosynthesis. Combining BLAST with human IRG1 and a MmgE/Ptrp domain search, they find CAD in all domains of life, but the presence of IRG1 homologs is patchy in eukaryotes, indicating that itaconate biosynthesis is not essential. The phylogenetic analysis showed a more distant relationship of fungal and metazoan CAD/IRG1 to many prokaryotic sequences, suggesting independent acquisition of these metazoan and fungal CAD genes. In metazoans, three subbranches of paleo-IRG1 (in mollusks/early chordates) and two paralogous vertebrate forms (IRG1 and IRG1-like) were identified, with the latter derived from paleo-IRG1, and by genome duplication. While most jawed vertebrates have both IRG1 and IRG1L, metatherian and eutherian mammals have lost IRG1L and contain only IRG1. 

      Interestingly, sequence analysis of both paralogues showed that many IRG1L genes contain an N-terminal mitochondrial targeting sequence (MTS) that is absent from most IRG1 sequences. Limited proteolysis of submitochondrial localization confirmed that zebrafish IRG1L is only sensitive to proteases in the presence of high Triton X-100, indicative of association with mitochondrial matrix. In contrast, a recent paper from the Galan lab (Lian 2023 Nature Microbiology) reported that human IRG1 is not localized to the mitochondrial matrix, although enriched in mitochondria. Here, the authors generated a matrix-targeted human IRG1 by adding the N-terminal MTS and found that it localizes to the matrix based on a limited proteolysis assay. The loss of MTS-containing IRG1L from most mammals appears, therefore, to indicate that itaconate generation is directed to the cytoplasm, potentially reducing inhibition of TCA cycle activity in the mitochondria.

      Next, the authors confirmed that the recombinant IRG1L protein has CAD activity in vitro. The last part of the manuscript addresses the expression of paleo-IRG1 in oysters and amphioxus, where they found high mRNA levels in oyster hemocytes which was further increased by poly(I:C), which was also the case in amphioxus tissues after feeding of LPS or poly(I:C), indicating a role for paleo-IRG1/itaconate in early metazoan innate immunity. 

      Strengths 

      (1) Phylogenetic perspective largely lacking so far in the IRG1/itaconate field. 

      (2) Manuscript clearly written and understandable across disciplines. 

      (3) Phylogenetic analyses complemented by biochemical and gene expression analyses to link to function. 

      (4) Lack of MTS in IRG1 and change in localization from mitochondria, highly relevant antimicrobial and cellular effects of itaconate.

      We thank the reviewer for the positive comments on the strengths.

      Weaknesses: 

      (1) Biochemical and functional analysis of different CAD mRNA and proteins lacks depth.

      We plan to explore two types of experiments:

      First, we plan to purify different CAD recombinant proteins; and if successful, we will test their in vitro enzymatic activity in synthesizing itaconate. The positive data will also answer question (3) below.

      Second, we plan to measure itaconate level in oyster hemocytes after PAMP stimulation, to demonstrate an in vivo itaconate production activity by paleo-IRG1. The data will also address question (4) below. 

      (2) The submitochondrial localization assay lacks a native human IRG1 control. 

      As in our response to reviewer 1, we believe Lian et al. 2023 provided strong evidence supporting an outer mitochondrial membrane localization of wild-type, endogenous mouse IRG1. Given the direct visualization using super-resolution imaging, we plan to revisit submitochondrial localization of different IRG1 constructs using super-resolution imaging.

      (3) CAD activity shown for IRG1L but not paleo-IRG1. 

      We plan to purify different CAD recombinant proteins; and if successful, we will test their in vitro enzymatic activity in producing itaconate.

      (4) Itaconate production by early metazoans after PAMP stimulation? 

      We plan to measure itaconate level in oyster hemocytes after PAMP stimulation, to demonstrate an in vivo itaconate production activity by paleo-IRG1.

      (5) No measurement of energy metabolism (trade-offs?).

      Because PAMP signaling might trigger other downstream effects that also impair mitochondrial function, for instance nitric oxide that inhibits complex IV, we plan to avoid PAMP conditions and directly test the effect of itaconate production. We plan to compare the impact on mitochondrial bioenergetics, if the same CAD enzymes (thus with the same activity) can be expressed at the same level intra-mitochondrially and extramitochondrially, for instance in the case of MTS-hACOD1 and hACOD1.

    1. eLife Assessment

      This work provides a valuable comparison of sentence structure representations in the human brain and state-of-the-art Large Language Models (LLMs). Based on solid analysis of 7T fMRI data, it systematically identifies sentences in which LLMs underperform relative to models that explicitly code for syntactic structure. The study will be of significant interest to both cognitive neuroscientists and artificial intelligence researchers.

    2. Reviewer #1 (Public review):

      Summary:

      This paper investigates whether transformer-based models can represent sentence-level semantics in a human-like way. The authors designed a set of 108 sentences specifically to dissociate lexical semantics from sentence-level information and collected 7T fMRI data from 30 participants reading these sentences. They conducted representational similarity analysis (RSA) comparing brain data and model representations, as well as the human behavioral ratings. It is found that transformer-based models match brain representation better than a static word embedding baseline, which ignores word order, but fall short of models that encode the structural relations between words. The main contributions of this paper are:

      (1) The construction of a sentence set that disentangles sentence structure from word meaning.

      (2) A comprehensive comparison of neural sentence representations (via fMRI), human behavior, and multiple computational models at the sentence level.

      Strengths:

      (1) The paper evaluates a wide variety of models, including layer-wise analysis for transformers and region-wise analysis in the human brain.

      (2) The stimulus design allows precise dissociation between lexical and sentence-level semantics. The RSA-based approach is empirically sound and intuitive.

      (3) The constructed sentences, along with the fMRI and behavioral data, represent a valuable resource for studying sentence representation.

      Weaknesses:

      (1) The rationale behind averaging sentence embeddings across multiple transformer models (with different architectures and training objectives) is unclear. These transformer-based models have different training paradigms and model architectures, which may result in misaligned semantic spaces. The averaging operation may dilute the distinct sentence representations learned by each model, potentially weakening the overall semantic encoding for sentences. Please clarify this choice or cite supporting methodology.

      (2) All structure-sensitive models discussed incorporate semantics to some extent. Including a purely syntactic baseline, such as a model based on context-free grammar, would help confirm the importance of syntactic structures.

      (3) In Figure 2, human behavioral judgments show weak correlations with neural data, and even fall below those of computational models, suggesting the behavioral judgments may not reflect the sentence structures in a brain-like way. This discrepancy between behavioral and neural data should be clarified, as it affects the interpretation of the results.

      (4) To better contextualize model and neural performance, sentence similarity should be anchored to a notion of semantic "ground truth", such as the matrix shown in Figure 1a. Comparing this reference with human judgments, brain responses, and model similarities would help establish an upper bound.

      (5) The structure of this paper is confusing. For instance, Figure 5 is cited early but appears much later. Reordering sections and figures would enhance readability.

      (6) While the analysis is broad and comprehensive, it lacks depth in some respects. For instance, it remains unclear what specific insights are gained from comparing across brain regions (e.g., whole brain, language network, and other subregions). Similarly, the results of simple-average and group-average RSA appear quite similar and may not advance the interpretation.

      (7) While explaining the grid-like pattern due to sentence length is important, this part feels somewhat disconnected from the central question of this paper (word order). It might be better placed in supplementary material.

    3. Reviewer #2 (Public review):

      Summary:

      The paper used fMRI data collected while participants read a set of sentences. The sentences are designed to disentangle syntax from meaning. RSA was performed using voxel activations and a variety of language models. The results show that transformers are inferior to models with explicit syntactic representation in terms of matching brain representations.

      Strengths:

      (1) The study controls for some variables that allow for an investigation of sentence structure in the brain. This controlled setting has an advantage over naturalistic stimuli in targeting more specific linguistic phenomena.

      (2) The study combines fMRI data with behavioral similarity ratings and a variety of language models (static, transformers, graph-based models).

      Weaknesses:

      (1) The stimuli are not fully controlled for lexical content across conditions. Residual lexical differences between sentences could still influence both brain and model similarity patterns. To more cleanly isolate syntactic effects, it would be useful to systematically vary only a single structural element while keeping all other lexical content constant (e.g., the boy kicked the ball / the ball kicked the boy). It would be better to engage more with the minimal pair paradigm, which is widely used in large language model probing research.

      (2) The comparisons are done across fundamentally different model types, including static embeddings, graph-based parsers, and transformers. The inherent differences in dimensionality and training objectives might make the conclusion drawn from RSA inconclusive. Transformer embeddings typically occupy much higher-dimensional, anisotropic representational spaces, and their similarity structure may reflect richer, more heterogeneous information than models explicitly encoding semantic roles. A lower RSA correlation in this study does not necessarily imply that transformers fail to encode syntactic information; rather, they may represent additional aspects of meaning or context that diverge from the narrow structural contrasts probed here.

      (3) The interpretation of the RSA correlation largely depends on the understanding of models. The authors suggest that because hybrid models correlate better than transformers, this implies that transformers are inferior at representing syntax. However, this is not a direct test of syntactic ability. Transformers may encode syntactic information, but it may not be expressed in a way that aligns with the RSA paradigm or the chosen stimuli. RSA does not reveal what the model encodes, and the models might achieve a good correlation for non-syntactic reasons (e.g., length of sentence, orthographic similarity, lexical features).

    4. Reviewer #3 (Public review):

      Summary:

      Large Language Models have revolutionized Artificial Intelligence and can now match or surpass human language abilities on many tasks. This has fueled interest in cognitive neuroscience in exposing representational similarities between Language Models and brain recordings of language comprehension. The current study breaks from this mold by: (1) Systematically identifying sentence structures for which brain and Large Language Model representations diverge. (2) Demonstrating that brain representations for these sentences can be better accounted for by a model structured by the semantic roles of words in the sentence. As such, the study may now fuel interest in characterizing how Large Language Models and brain representations differ, which may prompt new, more brain-like language models.

      Strengths:

      (1) This study presents a bold and solid challenge to a literature trend that has touted similarities between Transformer models and human cognition based on representational correlations with brain activity. This challenge is substantiated by identifying sentences for which brain and model representations of sentences diverge and explaining those divergences using models structured by semantic roles/syntax.

      (2) This study conducts a rigorous pre-registered analysis of a comprehensive selection of the state-of-the-art Large Language Models, on a controlled sentence comprehension fMRI dataset. The analysis is conducted within a Representation Similarity framework to support similarity comparisons between graph structures and brain activity without needing to vectorize graphs. Transformer models are predicted and shown to diverge from brain representations on subsets of sentences with similar word-level content but different sentence structures.

      (3) The study introduces a 7T fMRI sentence comprehension dataset and accompanying human sentence similarity ratings, which may be a fruitful resource for developing more human-like language models. Unlike other model-based sentence datasets, the relation between grammatical structure and word-level content is controlled, and subsets of sentences for which models and brains diverge are identified.

      Weaknesses:

      (1) The interpretation of findings is nuanced. Although Transformers underperform as brain models on the critical subsets of controlled sentences, a Transformer outperforms all other models when evaluated on the union of all sentences when both word-level content and structure vary. Transformers also yield equivalent or better models of human behavioral data. Thus, although Transformers have demonstrable flaws as human models, which are pinpointed here, in the general case, (some) Transformers are more human-like than the other models considered.

      (2) There may be confounds between the critical sentence structure manipulations and visual representations of sentence stimuli. This is inconvenient because activation in brain regions that process semantics tends to partially correlate with visual cortex representations, and computational models tend to reflect the number of words/tokens/elements in sentences. Although the study commendably controls for confounds associated with sentence length, there could still be residual effects that remain. For instance, the Graph model correlates most strongly with the visual cortex despite these sentence length controls.

      (3) Sentence similarity computations are emphasized as the basis for unifying comparative analyses of graph structures and vector data. A strength of this approach is that correlation is not always the ideal similarity metric. However, a weakness is that similarity computations are not unified across models. This has practical consequences here because different similarity metrics applied to the same model produce positive or negative correlations with brain data.

    5. Author response:

      We thank the reviewers for their insightful comments on our manuscript. Here we briefly highlight our responses to several issues raised by reviewers, and also provide a summary of planned changes to be made with the next draft.

      Reviewer 1:

      (1) The reviewer questions the rationale for averaging sentence embeddings across different models. However, our method involves computing correlations separately for each model, then averaging the correlations. We also report model correlations for each model separately in Fig S2. We will clarify this in our revised manuscript.

      (2) We agree with the reviewer that including a context-free grammar model as a comparison would be informative. We will incorporate this in the revised manuscript.

      (3) The reviewer raises questions about the low correlation between behavioural and brain similarities. While the behavioural judgements are made by different participants and involve a different task than the neuroimaging results, nonetheless we agree the difference is surprising and warrants more detailed consideration. We will provide additional discussion of the relationship between behavioural judgements and brain data in the revised manuscript.

      (4) The reviewer suggests contrasting our models with a ‘semantic ground truth’, as in our design matrix shown in Fig 1. While our design matrix served as the basis for constructing a set of stimuli with systematic modifications, we respectfully suggest that it should not be regarded as a ‘semantic ground truth’. In particular, sentence pairs within each category will not have the same degrees of semantic similarity since the words and context differ across sentences in a graded manner. Furthermore, while we anticipated ‘different’ sentence pairs would be less similar than ‘swapped’ sentence pairs, and that within each of the six block diagonals the ‘modified’ or ‘substituted’ sentence pairs would be the most similar, we did not have any prediction about the magnitude of these differences. Our goal was to construct a set of sentence pairs which spanned a range of semantic similarities, and allowed for dissociation between lexical similarity and overall similarity in meaning. The design matrix is not intended to represent a ‘ground truth’ that human judgements or brain representations would be expected to conform with.

      (5) In the revised draft we will modify the location of Fig. 5 so that it flows better with the text.

      (6) We agree that the discussion of the differences between brain regions could be expanded. We will include this in the revised version of our manuscript. The reviewer questions our inclusion of the simple-average and group-average RSA analysis as they show similar results. We included both analyses in line with our preregistration, and also because we believe the fact that two distinct approaches to analyzing the data yield similar results strengthens our conclusions.

      (7) We believe that the grid-like pattern in the RSA results is an important unexpected finding that warrants discussion in the main manuscript.
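
      The distinction drawn in response (1) above, between averaging embeddings and averaging per-model correlations, can be sketched as follows. This is a minimal illustration only, not the analysis code of the study: the function names, the use of 1 - Pearson r as the dissimilarity measure, and the use of a Spearman rank correlation between dissimilarity vectors are all assumptions made for the example.

```python
import numpy as np

def rdm(features):
    """Upper-triangle dissimilarities (1 - Pearson r) between rows (sentences)."""
    sim = np.corrcoef(features)
    iu = np.triu_indices(sim.shape[0], k=1)
    return 1.0 - sim[iu]

def rank(x):
    """Simple ranks without tie handling (assumed adequate for continuous values)."""
    order = np.argsort(x)
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(len(x))
    return ranks

def spearman(a, b):
    """Spearman correlation computed as the Pearson correlation of ranks."""
    return float(np.corrcoef(rank(a), rank(b))[0, 1])

def mean_model_brain_correlation(model_embeddings, brain_patterns):
    """Correlate each model's RDM with the brain RDM, then average the correlations.

    model_embeddings: list of (n_sentences, n_features) arrays, one per model;
    the feature dimension may differ between models.
    """
    brain_rdm = rdm(brain_patterns)
    per_model = [spearman(rdm(emb), brain_rdm) for emb in model_embeddings]
    return float(np.mean(per_model)), per_model
```

      Because each model is compared with the brain data in its own representational space, the models never need to share a dimensionality or an aligned semantic space; only the resulting correlation values are averaged.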

      Reviewer 2:

      (1) The reviewer argues that our stimuli do not fully control for lexical content across conditions, and that a more appropriate paradigm may be to utilise minimal pairs in which only a single variable of interest (such as sentence structure) is modified. We agree that most of our sentence pairs do not constitute minimal pairs, however this was not our objective. Our study design aimed to synthesise traditional minimal pair approaches with more recent research paradigms using naturalistic stimuli. As such, we selected stimuli which are more complex and contain more variable features than traditional minimal pair studies, but which also are tailored to highlight differences which are of particular theoretical interest. Because we are interested in comparing the effects of multiple sentence elements and semantic roles, a systematic pairwise comparison of minimal pairs is not necessarily optimal. Instead, we designed our stimuli to leverage the advantage of fMRI in that we can measure the brain representations corresponding to each sentence, and hence can conduct a full series of pairwise comparisons of sentence representations. Most of these comparisons will not be between minimal pairs, but we selected sentences so as to provide a range of semantic similarities (low to high), while also providing for semantic contrasts of theoretical interest (such as the ‘swapped’ and ‘substituted’ sentence pairs). We do not claim this approach to be universally superior to a minimal pair approach, but we do believe our novel approach provides additional insights and a new perspective on semantic representation relative to minimal pair studies. We will add additional detail in the revised manuscript providing additional explanation for how stimuli were chosen, and contrasting this with minimal pair approaches.

      (2) The reviewer notes that low RSA correlations do not imply that transformers fail to encode syntactic information. We acknowledge this in our discussion (page 10), where we also highlight that our focus is not on whether transformers encode such information, but rather what transformer representations can tell us about how sentence structure is represented in the brain. Our results indicate that transformer embeddings do not have the same geometric properties as brain representations of sentence meaning, at least for certain types of sentences where lexical information is insufficient to determine overall meaning. The reviewer also notes that transformer embeddings are highly anisotropic, however we adjust for this by normalising each feature as discussed on page 14. Finally, the reviewer notes that the transformers we examine differ in architecture and training objectives. This is not critical for our study because we are not seeking to determine which architecture or training objectives are best. Our goal is simply to compare a range of approaches and see which, if any, have similar sentence representations to those formed by the brain. In fact, our results indicate that architecture and training regime make relatively little difference for our stimuli.

      (3) The reviewer argues that RSA correlations do not measure the extent to which a model encodes syntactic information. This is very similar to the previous point. We do not claim that our results show that transformers do not encode syntactic information. Rather, our claim is that sentence embeddings derived from transformers have different geometric properties to brain representations, and that brain representations are better described by models explicitly representing key semantic roles. From this we conclude that, at least for the sentences we present, the brain is highly sensitive to semantic roles in a way that transformer representations are not (at least to the same extent). We also respectfully disagree with the reviewer’s suggestions that sentence length and orthographic or lexical similarities may drive model correlations with brain activity. As we discuss on page 19, we explicitly control for differences in sentence length when computing correlations. Our process for constructing our sentence set also controls for lexical similarity by generating pairs of sentences with all or mostly the same words but different orderings. We did not explicitly address orthographic similarity, but this will be strongly correlated with lexical similarity.
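
      The two adjustments mentioned in responses (2) and (3) above, normalising each embedding feature and controlling for sentence length when computing correlations, could be implemented along the following lines. This is a hedged sketch under stated assumptions: the response does not specify the exact procedures, so the z-scoring of features and the regression-based partial correlation shown here, along with all function and variable names, are illustrative choices rather than the authors' method.

```python
import numpy as np

def zscore_features(embeddings):
    """Z-score each feature (column) across sentences to reduce anisotropy."""
    mu = embeddings.mean(axis=0)
    sd = embeddings.std(axis=0)
    sd[sd == 0] = 1.0  # leave constant features at zero instead of dividing by 0
    return (embeddings - mu) / sd

def residualize(y, covariate):
    """Remove the linear effect of a covariate (plus intercept) from y."""
    X = np.column_stack([np.ones_like(covariate, dtype=float), covariate])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def partial_correlation(model_sim, brain_sim, length_diff):
    """Correlate model and brain pairwise similarities after regressing out
    a per-pair sentence-length difference from both."""
    r_model = residualize(model_sim, length_diff)
    r_brain = residualize(brain_sim, length_diff)
    return float(np.corrcoef(r_model, r_brain)[0, 1])
```

      If two similarity vectors agree only because both track sentence length, the partial correlation falls towards zero while the raw correlation stays high, which is the pattern such a control is meant to expose.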

      Reviewer 3:

      (1) The reviewer emphasises the need for nuance in our conclusions, given that some of the transformers achieve higher correlations when assessed over the full set of sentences. We agree with this comment, and will modify the discussion section in the revised manuscript to address this point. Having said that, we would like to note one of the disadvantages of transformers as a model of mind or brain representations is that they are largely a ‘black box’ whose workings are poorly understood. One advantage of hybrid models like our simple semantic role model is that they can be much easier to interpret, thereby enabling them to be used to determine which features are most important for brain representations of sentence meaning, and what mechanisms are used to combine individual words into a full sentence. Given their relative simplicity and interpretability, we believe hybrid models have considerable value as scientific tools, even in cases where they achieve comparable correlations to transformers. We will highlight this issue more clearly in our revised manuscript.

      (2) The reviewer notes that despite our existing controls, residual confounds of sentence length may remain. We agree that this is a potential issue, and will add discussion to the revised manuscript. We also will present further supplementary analyses which we believe indicate that sentence length effects do not drive our main results. At the same time, we believe the fact that our results are robust to simultaneously controlling for sentence length and the ‘minimum length effect’ (Fig. S5) indicates they are not primarily driven by sentence length effects.

      (3) The reviewer notes that the method for computing similarities differs between the vector-based (mean and transformer) models, and the hybrid and syntax-based models, thereby potentially adding an additional confound to our results. We agree that this is a potential limitation, and our correlations should always be understood as applying to a model paired with a similarity metric. However, we believe that this is mostly unavoidable when comparing different formalisms. An alternative approach of first embedding a graph into a vector and then training an encoding model on the graph embeddings has a similar limitation of being dependent not just on the graph representation, but also on the way it was embedded into a vector and the way the encoding model was trained. Arguably this process is more opaque than similarity methods, since it is unclear to what extent the graph embeddings preserve the logic and properties of a graph-based representation. Further, it is not clear whether there is any single method which can overcome the difficulty of comparing distinct formalisms for representing semantics. The reviewer also highlights how the correlations measured for the syntax model differ greatly depending on whether the Smatch or WWLK similarity metrics are used. We believe this highlights the need for careful examination of commonly used graph similarity metrics, as has been noted in previous research. We will include additional discussion of this issue in our revised manuscript.

    1. eLife Assessment

      This useful study introduces a computational pipeline for designing RNA in situ fluorescence hybridization probes that could improve the sensitivity and specificity of RNA detection in cells. While the approach is novel and the preliminary data suggestive, the evidence supporting a clear advantage over existing probe design strategies is incomplete. The work will be of interest to researchers developing or using molecular tools for imaging RNA in cells.

    2. Reviewer #1 (Public review):

      The authors describe a new computational pipeline designed to identify smFISH probes with improved RNA detection compared to preexisting approaches. smFISH is a powerful and relatively straightforward technique to detect single RNAs in cells at subcellular resolution, which is critical for understanding gene expression regulation at the RNA level. However, existing methods for designing smFISH oligos suffer from several limitations, including off-target binding that produces high background signals, as well as a restricted number of probes that are sufficiently specific to target shorter-than-average mRNAs. To address these challenges, the authors developed TrueProbes, a computational method that aims to minimize off-target-mediated background fluorescence.

      Overall, the study addresses a technically relevant problem. If improved, this would allow researchers to study gene expression regulation more effectively using single-molecule FISH. However, based on the current presentation of data, it is not yet clear that TrueProbes offers significant advantages over preexisting pipelines. In the following section, I describe some concerns, which should be adequately addressed.

      Major Comments:

      (1) The manuscript currently presents only one example in which different pipelines were tested to generate probes (targeting ARF4). While the images suggest that both TrueProbes and Stellaris outperform the other pipelines, the comparison is potentially misleading because the number of probes used differs substantially. I recommend that the authors include at least three independent examples in which an equal number of probes are designed across pipelines, so that signal-to-noise can be assessed in a controlled and comparable way. This would allow the probe number to be held constant while directly evaluating performance.

      (2) It is also unclear how many biological replicates were performed for the ARF4 experiments. If only a single replicate was included, it is difficult to conclude that TrueProbes consistently outperforms other pipelines in a robust and reproducible manner. I suggest the authors include data from at least three biological replicates with appropriate statistical analysis, and ideally extend this to additional smFISH targets as outlined in Comment 1.

      (3) No controls are presented to demonstrate that the TrueProbes-designed smFISH spots are specifically detecting ARF4. The current experiment primarily measures signal-to-noise, but it remains possible that some detected spots do not correspond to ARF4 mRNAs. Since one of the major criteria used by TrueProbes is to limit cross-hybridization, the authors should perform ARF4 knockdown experiments and demonstrate that nearly all ARF4 smFISH signal is lost. A similar approach should be applied to the additional examples recommended in Comment 1.

      (4) In the limitations of the study, the authors note that "RNA secondary and tertiary structures are not included, which may lead to inaccuracies if binding sites are structurally occluded." However, I am not convinced that this is a true limitation, since formamide in the smFISH protocol should denature secondary structures and allow oligo access to the RNA. I recommend that the authors comment on this point and clarify whether secondary structure poses a practical limitation in smFISH probe design.

      (5) The authors also correctly acknowledge in their limitations that "RNA-protein interactions, which can modulate accessibility of the transcript, are not modeled." I suggest referencing relevant studies on this issue, particularly Buxbaum et al. (2014, Science), which would provide important context.

    3. Reviewer #2 (Public review):

      Summary:

      Hughes et al. present a new single-molecule RNA fluorescence in situ hybridization (smFISH) probe design software, termed "TrueProbes" in this manuscript. They claim that all existing smFISH (and variants) probe design software packages have limitations that ultimately impact experimental performance. The authors claim to address the majority of these limitations in TrueProbes by introducing multiple computational steps to ensure high-quality probe design. The manuscript's goal is clear, and the authors provide some evidence by designing and targeting one gene. Overall, the manuscript lacks rigorous evidence to support the claims, does not demonstrate its suitability for a variety of smFISH-type experiments, and some of the provided quantification data are unclear. While TrueProbes clearly has potential, more data is required, or the authors should tone down the claims.

      Strengths:

      (1) The problem is well-articulated in the abstract and the introduction.

      (2) Figures 3 and 4 follow a consistent color scheme where each probe design method has its own color, which helps the reader visually compare methods.

      (3) The authors compared multiple probe design software packages both computationally and experimentally.

      (4) TrueProbes does produce visually and quantitatively better results when compared to 2 of the 4 existing smFISH probe design packages (Paintshop and MERFISH panel designer).

      (5) The authors introduce a comprehensive steady-state thermodynamic model to help optimally guide probe design.

      Weaknesses:

      (1) The abstract describes the problem well and introduces the solution (the TrueProbes software), but fails to provide specific ways in which the TrueProbes software performs better. The authors state that "...[TrueProbes] consistently outperformed alternatives across multiple computational metrics and experimental validation assays", but specific, quantitative evidence of improved performance would strengthen the statement.

      (2) The text claims that TrueProbes outperforms all other probe design software, but Figure 3 indicates that TrueProbes has neither the greatest number of on-target binding nor the lowest number of off-target binding. The data in Figure 3 does not support the claims made in the text. Specifically, the authors claim that "RNA FISH Experimental Results Demonstrate that Off Target and Binding Affinity Inclusive Probe Design Improve RNA FISH Signal Discrimination" (lines 217-218). However, despite their claim that Stellaris and Oligostan-HT produce more off-target probes when evaluated with the TrueProbes framework, the experiment results are nearly identical. The authors should consider modifying their claims or performing new experiments that more clearly demonstrate their claims.

      (3) The bar graphs in Figure 3 do not seem to agree with the probability graphs in Figure 4. For example, Figure 3 indicates that Stellaris probes have higher off-target binding than TrueProbes; however, in Figure 4, their probability graphs lie almost on top of each other.

      (4) The authors performed validation for only one gene (ARF4), because "...it had the highest gene expression (in TPM units) and the fewest isoforms among all candidate genes for the Jurkat cell line" (lines 176-177). While the results do look good, this is a minimal use case and does not really showcase the power of their method. One experiment that could be helpful would be two-color (or more) smFISH in tissue, where the chances for off-target binding contributing to higher errors are much greater than in an adherent cell line.

      (5) A common strategy for both smFISH and highly multiplexed methods is to use secondary DNA oligos with dye molecules instead of direct conjugation. Given that this is a primary design goal of PaintSHOP and the Zhuang lab's MERFISH probe design code, it would be helpful to demonstrate that TrueProbes can design a two-layer probe strategy for high-quality RNA-FISH labeling.

      (6) The authors claim, "For every probe set, TrueProbes can simulate expected smRNA FISH outcomes including optimal probe, RNA, and salt concentrations and optionally account for probe secondary structure, hybridization temperature, multiple targets, fluorophore choice, DNA, nascent RNA, and photon count statistics (Figures S2A, S2B). The model can be used to generate predictions for temperature and cell line sensitivity, multi-target discrimination, multiple fluorophore colocalization; when provided transcript expression levels and probe/background intensity, it can start to generate predictions for spot intensity, background, signal to noise ratio, and false negative rates (Figure S2C)." (lines 156-163). Figure S2 is a flow chart and does not provide evidence for any of these items. The authors should provide evidence for these claims, either as a figure or an example script in their software repository. If that is not possible, then it should be removed.

      (7) All thermodynamic equations are performed at steady state. The authors do not justify this assumption, and there is no discussion of the potential impacts of either low molecule numbers or violations of the well-mixed assumption. Can the authors please include a discussion on the potential impacts of non-steady-state dynamics?

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript introduces a new platform termed "TrueProbes" for designing mRNA FISH probes. In comparison to existing design strategies, the authors incorporate a comprehensive thermodynamic and kinetic model to account for probe states that may contribute to nonspecific background. The authors validate their design pipeline using Jurkat cells and provide evidence of improved probe performance.

      Strengths:

      A notable strength of TrueProbes is the consideration of genome-wide binding affinities, which aims to minimize off-target signals. The work will be of interest to researchers employing mRNA FISH in certain human cell lines.

      Weaknesses:

      However, in my view, the experimental validation is not sufficient to justify the broad claims of the platform. Given the number of assumptions in the model, additional experimental comparisons across probe design methods, ideally targeting transcripts with different expression levels, would be necessary to establish the general superiority of this approach.

    5. Author response:

      Reviewer #1 (Public Review):

      The authors describe a new computational pipeline designed to identify smFISH probes with improved RNA detection compared to preexisting approaches. smFISH is a powerful and relatively straightforward technique to detect single RNAs in cells at subcellular resolution, which is critical for understanding gene expression regulation at the RNA level. However, existing methods for designing smFISH oligos suffer from several limitations, including off-target binding that produces high background signals, as well as a restricted number of probes that are sufficiently specific to target shorter-than-average mRNAs. To address these challenges, the authors developed TrueProbes, a computational method that aims to minimize off-target-mediated background fluorescence.

      Overall, the study addresses a technically relevant problem. If improved, this would allow researchers to study gene expression regulation more effectively using single-molecule FISH. However, based on the current presentation of data, it is not yet clear that TrueProbes offers significant advantages over preexisting pipelines. In the following section, I describe some concerns, which should be adequately addressed.

      Major Comments:

      (1) The manuscript currently presents only one example in which different pipelines were tested to generate probes (targeting ARF4). While the images suggest that both TrueProbes and Stellaris outperform the other pipelines, the comparison is potentially misleading because the number of probes used differs substantially. I recommend that the authors include at least three independent examples in which an equal number of probes are designed across pipelines, so that signal-to-noise can be assessed in a controlled and comparable way. This would allow the probe number to be held constant while directly evaluating performance.

      This is an important observation. We have already addressed this issue in Figures 3E-G and Supplementary Figure 4E-G, where we plotted the number of OFF-targets for each ON-target probe. If we select longer genes to ensure an equal number of designed probes with strong signals, we will still end up with the same number of ON-target probes. Consequently, Figures 3B-D and 3E-G would show similar trends, albeit with different values on the y-axis. Additionally, we will conduct an analysis using Stellaris at its highest probe design stringency setting to compare the software under its strictest design conditions. Additional experiments are outside the scope of the current manuscript.

      (2) It is also unclear how many biological replicates were performed for the ARF4 experiments. If only a single replicate was included, it is difficult to conclude that TrueProbes consistently outperforms other pipelines in a robust and reproducible manner. I suggest the authors include data from at least three biological replicates with appropriate statistical analysis, and ideally extend this to additional smFISH targets as outlined in Comment 1.

      Three biological replicates were utilized for the ARF4 experiments. As stated in the original submission, the average data from all three replicates is presented in Figure 4, while the data for each individual replicate can be found in Figure S5. Statistical analyses were conducted for both the pooled data in Figure 4 and the individual data in Figure S5. The results of all statistical calculations are detailed in Supplemental Table 1. We will update the text to clearly indicate the number of biological replicates and the outcomes of the statistical analysis.

      (3) No controls are presented to demonstrate that the TrueProbes-designed smFISH spots are specifically detecting ARF4. The current experiment primarily measures signal-to-noise, but it remains possible that some detected spots do not correspond to ARF4 mRNAs. Since one of the major criteria used by TrueProbes is to limit cross-hybridization, the authors should perform ARF4 knockdown experiments and demonstrate that nearly all ARF4 smFISH signal is lost. A similar approach should be applied to the additional examples recommended in Comment 1.

      Thank you for your suggestion. Currently, we lack the expertise in our lab to conduct such experiments, so they are beyond the scope of this manuscript. However, we will create additional supplementary figures to demonstrate that the likelihood of false positives is low, based on the assumption that current publicly available BLAST algorithms, genome annotations, and reference transcription expression data are accurate.

      We will include a comparison in our supplementary materials showing the off-target RNA that can bind the highest number of probes simultaneously for each software. Additionally, we will perform a correlation analysis to illustrate the relationship between spot intensity for different software and the number of probes they design. This will help us estimate how the number of probes bound to RNA correlates with expected spot intensity ranges.

      Using this information, along with autofluorescence background intensity measurements from no-probe controls, we will estimate the minimum number of probes that need to bind to targets to be detected as single spots. If this minimum is higher than the maximum number of simultaneous off-target probe bindings, we anticipate that the detected spot signal will primarily reflect ARF4 rather than other transcripts.
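
      The comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' analysis: the function names, the per-probe intensity input, and the detection rule (signal must exceed the background by `n_sd` standard deviations) are all assumptions introduced here.

```python
import math


def min_probes_for_detection(per_probe_intensity, background_sd, n_sd=3.0):
    """Smallest integer number of bound probes whose summed signal exceeds
    n_sd standard deviations of the no-probe background.

    Assumption (not from the manuscript): spot signal scales linearly with
    the number of bound probes, and a spot counts as detectable once its
    signal above the mean background is strictly greater than
    n_sd * background_sd.
    """
    if per_probe_intensity <= 0 or background_sd < 0:
        raise ValueError("per_probe_intensity must be > 0 and background_sd >= 0")
    return math.floor(n_sd * background_sd / per_probe_intensity) + 1


def off_target_spots_unlikely(min_probes, max_simultaneous_off_target):
    """True when no off-target RNA can bind enough probes to reach the
    detection minimum (hypothetical decision rule)."""
    return min_probes > max_simultaneous_off_target
```

      Under these assumptions, detected spots would be expected to reflect the intended target only when `off_target_spots_unlikely` returns `True` for the probe set in question.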

      (4) In the limitations of the study, the authors note that "RNA secondary and tertiary structures are not included, which may lead to inaccuracies if binding sites are structurally occluded." However, I am not convinced that this is a true limitation, since formamide in the smFISH protocol should denature secondary structures and allow oligo access to the RNA. I recommend that the authors comment on this point and clarify whether secondary structure poses a practical limitation in smFISH probe design.

      Thank you for pointing this out. We will revise the manuscript to clarify: "We did not include RNA secondary and tertiary structures in the model because the use of formamide in RNA-FISH experiments denatures these structures, allowing oligonucleotides to access the RNA."

      (5) The authors also correctly acknowledge in their limitations that "RNA-protein interactions, which can modulate accessibility of the transcript, are not modeled." I suggest referencing relevant studies on this issue, particularly Buxbaum et al. (2014, Science), which would provide important context.

      Thank you for highlighting the literature that supports this limitation. We will include Buxbaum et al. (2014, Science) and additional studies that discuss how RNA-protein interactions can affect RNA-FISH experiments.

      Reviewer #2 (Public review):

      Summary:

      Hughes et al. present a new single-molecule RNA fluorescence in situ hybridization (smFISH) probe design software, termed "TrueProbes" in this manuscript. They claim that all existing smFISH (and variants) probe design software packages have limitations that ultimately impact experimental performance. The authors claim to address the majority of these limitations in TrueProbes by introducing multiple computational steps to ensure high-quality probe design. The manuscript's goal is clear, and the authors provide some evidence by designing and targeting one gene. Overall, the manuscript lacks rigorous evidence to support the claims, does not demonstrate its suitability for a variety of smFISH-type experiments, and some of the provided quantification data are unclear. While TrueProbes clearly has potential, more data is required, or the authors should tone down the claims.

      We appreciate the reviewer’s thoughtful feedback. We will revise the text to ensure that all claims are backed by computational or experimental evidence. For claims that do not have supporting results, we will relocate them to the discussion section as potential future extensions. Since our probe design is open access, both we and the community can further develop our codes as needed.

      Strengths:

      (1) The problem is well-articulated in the abstract and the introduction.

      (2) Figures 3 and 4 follow a consistent color scheme where each probe design method has its own color, which helps the reader visually compare methods.

      (3) The authors compared multiple probe design software packages both computationally and experimentally.

      (4) TrueProbes does produce visually and quantitatively better results when compared to 2 of the 4 existing smFISH probe design packages (Paintshop and MERFISH panel designer).

      (5) The authors introduce a comprehensive steady-state thermodynamic model to help optimally guide probe design.

      We would like to thank the reviewer for pointing out the strengths of the manuscript.

      Weaknesses:

      (1) The abstract describes the problem well and introduces the solution (the TrueProbes software), but fails to provide specific ways in which the TrueProbes software performs better. The authors state that "...[TrueProbes] consistently outperformed alternatives across multiple computational metrics and experimental validation assays", but specific, quantitative evidence of improved performance would strengthen the statement.

      Thank you for acknowledging the clarity of the abstract and introduction. We will revise the abstract to provide more specific details on how TrueProbes outperforms other software. Additionally, we will include specific computational and experimental metrics that demonstrate TrueProbes' improved performance compared to other software.

      (2) The text claims that TrueProbes outperforms all other probe design software, but Figure 3 indicates that TrueProbes has neither the greatest number of on-target binding nor the lowest number of off-target binding. The data in Figure 3 does not support the claims made in the text. Specifically, the authors claim that "RNA FISH Experimental Results Demonstrate that Off Target and Binding Affinity Inclusive Probe Design Improve RNA FISH Signal Discrimination" (lines 217-218). However, despite their claim that Stellaris and Oligostan-HT produce more off-target probes when evaluated with the TrueProbes framework, the experiment results are nearly identical. The authors should consider modifying their claims or performing new experiments that more clearly demonstrate their claims.

      In Figure 3, we aim to convey two main points. 

      The first point is to compare the number of ON-target probes designed by each software using their most stringent design criteria (Figure 3A). Currently, we are using a medium strict design criterion for Stellaris (level 3). As shown in the new supplementary figure XX, when we apply the most stringent design criteria for Stellaris (level 5), the number of ON-target probes decreases to XX probes. This clearly indicates that, based on theoretical calculations, TrueProbes can design more probes than any of its competitors.

      The second point is to compare the number of OFF-targets produced by each probe design. To illustrate this, we used two different metrics. In Figures 3B-D, we compare the total number of probes bound to OFF-target RNA. However, since each software generates a different number of ON-target probes, the number of OFF-targets may vary simply due to the differences in ON-target probe counts. Therefore, we introduced a second metric to compare OFF-targets. In Figures 3E-G, we present the number of OFF-targets normalized by the number of ON-targets. Using this metric, TrueProbes shows the lowest number of OFF-targets. We will update the manuscript to clarify this point.
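
      The difference between the two metrics can be illustrated with a minimal Python sketch. The function name and the numbers in the usage note are hypothetical and are not taken from the manuscript or from any of the compared software packages.

```python
def off_targets_per_on_target_probe(off_target_count, on_target_probe_count):
    """Normalize a total OFF-target count by the number of ON-target probes.

    Hypothetical helper: it only mirrors the idea of dividing the
    OFF-target total by the ON-target probe count so that probe sets of
    different sizes can be compared on a per-probe basis.
    """
    if on_target_probe_count <= 0:
        raise ValueError("on_target_probe_count must be positive")
    return off_target_count / on_target_probe_count


# Made-up example: design A has more OFF-targets in total, but fewer per probe.
design_a = off_targets_per_on_target_probe(120, 40)
design_b = off_targets_per_on_target_probe(90, 15)
```

      In this made-up example, design A has the larger total but the smaller normalized value, which is why the ranking of probe sets can change between the two metrics.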

      Regarding the experiments and their comparison to theoretical calculations: The theoretical calculations consider only the reference DNA and RNA genomes along with the oligonucleotide sequences for the probes. We then use a thermodynamic model to identify ON- and OFF-targets. Thus, these theoretical calculations represent an upper bound on the maximum possible number of ON-targets and the minimum number of OFF-targets. All other design software evaluated in this manuscript relies on the same or less reference data and makes certain assumptions. None of these methods quantitatively compare their computational designs with experimental results; they simply design probes based on unverified assumptions, conduct experiments, and present spot data to conclude that their probe designs are effective.

      We will update the manuscript to clarify the goals of the theoretical model and its relationship to the experiments. Future work will be necessary to enhance our theoretical model to fully account for additional aspects of RNA-FISH experiments (e.g., formaldehyde crosslinking, hybridization conditions, washing steps) to better predict the experimental data shown in Figure 4. We will also adjust our claims to accurately reflect the current capabilities of our theoretical framework and its relation to experimental outcomes.

      (3) The bar graphs in Figure 3 do not seem to agree with the probability graphs in Figure 4. For example, Figure 3 indicates that Stellaris probes have higher off-target binding than TrueProbes; however, in Figure 4, their probability graphs lie almost on top of each other.

      The predictions in Figure 3 regarding the number of probe off-target binding events, based on reference gene expression data, do not necessarily encompass all the information required to predict RNA-FISH signal intensity. Therefore, these predictions should not be expected to translate directly into the experimental results shown in Figure 4, particularly concerning the background signal.

      While our software aims to minimize off-target probe binding, this does not automatically lead to a reduction in off-target background signal. Numerous other factors influence the spot background and overall signal-to-noise ratio (SNR) performance, beyond just probe-target binding interactions. Although we strive to minimize off-target background through probe binding, this approach is not designed to directly predict the SNR. Extending the computational analysis of probe binding dynamics to RNA-FISH signal intensity dynamics is beyond the scope of this study.

      We have revised our text to clearly separate computational results from experimental results into two distinct sections. We will use different terminology to describe the outcomes of computational performance versus experimental performance, reducing potential confusion between these two aspects. Additionally, we will clarify our conceptual overview in Figure 1 regarding traditional probe design limitations related to sensitivity and specificity. We will specify how the signal from the number of probes bound to ON-target RNA, relative to those bound to OFF-targets and cellular autofluorescence, translates—either linearly or non-linearly—into the signal-to-noise ratio.

      (4) The authors performed validation for only one gene (ARF4), because "...it had the highest gene expression (in TPM units) and the fewest isoforms among all candidate genes for the Jurkat cell line" (lines 176-177). While the results do look good, this is a minimal use case and does not really showcase the power of their method. One experiment that could be helpful would be two-color (or more) smFISH in tissue, where the chances for off-target binding contributing to higher errors are much greater than in an adherent cell line.

      Thank you for highlighting these valuable experiments. Currently, our lab lacks the expertise to generate tissue samples beyond culturing cells. Additionally, implementing a two-color probe design in tissues containing different cell types with unknown expression levels presents further challenges. Due to these limitations, designing and conducting two-color experiments in tissue samples is beyond the scope of the current manuscript, but we plan to pursue this in the future.

      (5) A common strategy for both smFISH and highly multiplexed methods is to use secondary DNA oligos with dye molecules instead of direct conjugation. Given that this is a primary design goal of PaintSHOP and the Zhuang lab's MERFISH probe design code, it would be helpful to demonstrate that TrueProbes can design a two-layer probe strategy for high-quality RNA-FISH labeling.

      Thank you for bringing this to our attention. TrueProbes is currently designed and tested specifically for primary smRNA-FISH probes. Our focus is on demonstrating a new approach to designing these probes without the added complexities of secondary probes and multiplexing. Future work will expand on this foundation to incorporate secondary probe detection and transcript multiplexing.

      (6) The authors claim, "For every probe set, TrueProbes can simulate expected smRNA FISH outcomes including optimal probe, RNA, and salt concentrations and optionally account for probe secondary structure, hybridization temperature, multiple targets, fluorophore choice, DNA, nascent RNA, and photon count statistics (Figures S2A, S2B). The model can be used to generate predictions for temperature and cell line sensitivity, multi-target discrimination, multiple fluorophore colocalization; when provided transcript expression levels and probe/background intensity, it can start to generate predictions for spot intensity, background, signal to noise ratio, and false negative rates (Figure S2C)." (lines 156-163). Figure S2 is a flow chart and does not provide evidence for any of these items. The authors should provide evidence for these claims, either as a figure or an example script in their software repository. If that is not possible, then it should be removed.

      The supplemental information of the article will be updated to include figures that illustrate predictions for each capability currently offered by TrueProbes, along with the scripts used to generate these predictions. Any capabilities that do not have corresponding scripts will be removed from this section and instead referred to as potential improvements or future additions to the TrueProbes framework in the discussion section.

      (7) All thermodynamic equations are performed at steady state. The authors do not justify this assumption, and there is no discussion of the potential impacts of either low molecule numbers or violations of the well-mixed assumption. Can the authors please include a discussion on the potential impacts of non-steady-state dynamics?

      Thermodynamic equations are calculated at steady state because RNA-FISH hybridization reactions typically last from eight to twenty hours. This duration allows probes adequate time to localize to their targets and reach binding equilibrium, based on current estimates of DNA oligonucleotide association and dissociation rate constants. We will address the potential violation of the well-mixed assumption in the assumptions and limitations section, specifically discussing how RNA localization can affect the spatial distribution of both on-target and off-target probes within cells, which may disrupt the well-mixed condition.

      Low molecule numbers are not a significant concern, as probe DNA oligonucleotide concentrations in RNA-FISH protocols are much higher than the number of transcripts present in cells, by several orders of magnitude.

      The assumptions and limitations section will be revised to clearly state: “Probe hybridization reactions were computed at steady state because most RNA-FISH protocols utilize probe hybridization incubation steps lasting over eight hours, which should provide sufficient time to reach equilibrium based on current estimates of forward and reverse reaction rate constants. Predictions from the equilibrium model may be less accurate for RNA-FISH experiments with shorter hybridization times, where non-steady state dynamics can result in different transient outcomes depending on the duration of hybridization.”
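
      The time-scale argument above can be made concrete with a simple pseudo-first-order binding sketch in Python. This is not the TrueProbes model: it assumes a single probe and a single target site, probe in large excess over target (so the free probe concentration stays constant), and well-mixed conditions. The function names are hypothetical, and any rate constants passed in are user-supplied assumptions; hybridization in fixed cells may be slower than in solution.

```python
import math


def fraction_bound(t_s, probe_conc_M, k_on, k_off):
    """Fraction of target sites bound after t_s seconds, starting unbound.

    Pseudo-first-order kinetics with probe in excess:
        k_obs = k_on * [probe] + k_off
        f(t)  = f_eq * (1 - exp(-k_obs * t)),  f_eq = k_on * [probe] / k_obs
    """
    k_obs = k_on * probe_conc_M + k_off
    f_eq = k_on * probe_conc_M / k_obs
    return f_eq * (1.0 - math.exp(-k_obs * t_s))


def time_to_fraction_of_equilibrium(frac, probe_conc_M, k_on, k_off):
    """Seconds needed to reach a given fraction (0 < frac < 1) of the
    equilibrium occupancy under the same assumptions."""
    if not 0.0 < frac < 1.0:
        raise ValueError("frac must be strictly between 0 and 1")
    k_obs = k_on * probe_conc_M + k_off
    return -math.log(1.0 - frac) / k_obs
```

      Comparing `time_to_fraction_of_equilibrium(0.99, ...)` with the planned incubation time gives a rough indication of whether the steady-state assumption is reasonable for a particular protocol.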

      Reviewer #3 (Public review):

      Summary:

      This manuscript introduces a new platform termed "TrueProbes" for designing mRNA FISH probes. In comparison to existing design strategies, the authors incorporate a comprehensive thermodynamic and kinetic model to account for probe states that may contribute to nonspecific background. The authors validate their design pipeline using Jurkat cells and provide evidence of improved probe performance.

      Strengths:

      A notable strength of TrueProbes is the consideration of genome-wide binding affinities, which aims to minimize off-target signals. The work will be of interest to researchers employing mRNA FISH in certain human cell lines.

      Weaknesses:

      However, in my view, the experimental validation is not sufficient to justify the broad claims of the platform. Given the number of assumptions in the model, additional experimental comparisons across probe design methods, ideally targeting transcripts with different expression levels, would be necessary to establish the general superiority of this approach.

      We will revise our text to make our claims more specific and clearer, avoiding overgeneralizations and ensuring that all claims are adequately supported by the data we present.

    1. eLife Assessment

      This set of experiments provides a valuable finding regarding the need for prior inhibitory training to recruit the infralimbic cortex in extinction learning. The multiple clever behavioral designs supply converging lines of evidence in a compelling manner, but several issues, such as the group sizes and appropriate analysis of data, render the overall strength of support incomplete. With these issues resolved, this manuscript will be of interest to behavioral neuroscientists, especially those interested in learning & memory and/or cortical function.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript reports a series of experiments designed to test whether optogenetic activation of infralimbic (IL) neurons facilitates extinction retrieval and whether this depends on animals' prior experience. In Experiment 1, rats underwent fear conditioning followed by either one or two extinction sessions, with IL stimulation given during the second extinction; stimulation facilitated extinction retrieval only in rats with prior extinction experience. Experiments 2 and 3 examined whether backward conditioning (CS presented after the US) could establish inhibitory properties that allowed IL stimulation to enhance extinction, and whether this effect was specific to the same stimulus or generalized to different stimuli. Experiments 5 - 7 extended this approach to appetitive learning: rats received backward or forward appetitive conditioning followed by extinction, and then fear conditioning, to determine whether IL stimulation could enhance extinction in contexts beyond aversive learning and across conditioning sequences. Across studies, the key claim is that IL activation facilitates extinction retrieval only when animals possess a prior inhibitory memory, and that this effect generalizes across aversive and appetitive paradigms.

      Strengths:

      (1) The design attempts to dissect the role of IL activity as a function of prior learning, which is conceptually valuable.

      (2) The experimental design of probing different inhibitory learning approaches to probe how IL activation facilitates extinction learning was creative and innovative.

      Weaknesses:

      (1) Non-specific manipulation.

      ChR2 was expressed in IL without distinction between glutamatergic and GABAergic populations. Without knowing the relative contribution of these cell types or the percentage of neurons affected, the circuit-level interpretation of the results is unclear.

      (2) Extinction retrieval test conflates processes

      The retrieval test included 8 tones. Averaging across this many tone presentations conflates extinction retrieval/expression (early tones) with further extinction learning (later tones). A more appropriate analysis would focus on the first 2-4 tones to capture retrieval only. As currently presented, the data do not isolate extinction retrieval.

      (3) Under-sampling and poor group matching.

      Sample sizes appear small, which may explain why groups are not well matched in several figures (e.g., 2b, 3b, 6b, 6c) and why there are several instances of unexpected interactions (protocol, virus, and period). This baseline mismatch raises concerns about the reliability of group differences.

      (4) Incomplete presentation of conditioning data.

      Figure 3 only shows a single conditioning session despite five days of training. Without the full dataset, it is difficult to evaluate learning dynamics or whether groups were equivalent before testing.

      (5) Interpretation stronger than evidence.

      The authors conclude that IL activation facilitates extinction retrieval only when an inhibitory memory has been formed. However, given the caveats above, the data are insufficient to support such a strong mechanistic claim. The results could reflect non-specific facilitation or disruption of behavior by broad prefrontal activation. Moreover, there is compelling evidence that optogenetic activation of IL during fear extinction does facilitate subsequent extinction retrieval without prior extinction training (Do-Monte et al 2015, Chen et al 2021), which the authors do not directly test in this study.

      Impact:

      The role of IL in extinction retrieval remains a central question in the fear learning literature. However, because the test used conflates extinction retrieval with new learning and the manipulations lack cell-type specificity, the evidence presented here does not convincingly support the main claims. The study highlights the need for more precise manipulations and more rigorous behavioral testing to resolve this issue.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors examine the mechanisms by which stimulation of the infralimbic cortex (IL) facilitates the retention and retrieval of inhibitory memories. Previous work has shown that optogenetic stimulation of the IL suppresses freezing during extinction but does not improve extinction recall when extinction memory is probed one day later. When stimulation occurs during a second extinction session (following a prior stimulation-free extinction session), freezing is suppressed during the second extinction as well as during the tone test the following day. The current study was designed to further explore the facilitatory role of the IL in inhibitory learning and memory recall. The authors conducted a series of experiments to determine whether recruitment of IL extends to other forms of inhibitory learning (e.g., backward conditioning) and to inhibitory learning involving appetitive conditioning. Further, they assessed whether their effects could be explained by stimulus familiarity. The results of their experiments show that backward conditioning, another form of inhibitory learning, also enabled IL stimulation to enhance fear extinction. This phenomenon was not specific to aversive learning, as backward appetitive conditioning similarly allowed IL stimulation to facilitate extinction of aversive memories. Finally, the authors ruled out the possibility that IL facilitated extinction merely because of prior experience with the stimulus (e.g., reducing the novelty of the stimulus). These findings significantly advance our understanding of the contribution of IL to inhibitory learning. Namely, they show that the IL is recruited during various forms of inhibitory learning, and its involvement is independent of the motivational value associated with the unconditioned stimulus.

      Strengths:

      (1) Transparency about the inclusion of both sexes and the representation of data from both sexes in figures.

      (2) Very clear representation of groups and experimental design for each figure.

      (3) The authors were very rigorous in determining the neurobehavioral basis for the effects of IL stimulation on extinction. They considered multiple interpretations and designed experiments to address these possible accounts of their data.

      (4) The rationale for and the design of the experiments in this manuscript are clearly based on a wealth of knowledge about learning theory. The authors leveraged this expertise to narrow down how the IL encodes and retrieves inhibitory memories.

      Weaknesses:

      (1) In Experiment 1, although not statistically significant, it does appear as though the stimulation groups (OFF and ON) differ during Extinction 1. It seems like this may be due to a difference between these groups after the first forward conditioning. Could the authors have prevented this potential group difference in Extinction 1 by re-balancing group assignment after the first forward conditioning session to minimize the differences in fear acquisition (the authors do report a marginally significant effect between the groups that would undergo one vs. two extinction sessions in their freezing during the first conditioning session)?

      (2) Across all experiments (except for Experiment 1), the authors state that freezing during the initial conditioning increased across "days". The figures that correspond to this text, however, show that freezing changes across trials. In the methods, the authors report that backward conditioning occurred over 5 days. It would be helpful to understand how these data were analyzed and collated to create the final figures. Was the freezing averaged across the five days for each trial for analyses and figures?

      (3) In Experiment 3, the authors report a significant Protocol X Virus interaction. It would be useful if the authors could conduct post-hoc analyses to determine the source of this interaction. Inspection of Figure 4B suggests that freezing during the two different variants of backward conditioning differs between the virus groups. Did the authors expect to see a difference in backward conditioning depending on the stimulus used in the conditioning procedure (light vs. tone)? The authors don't really address this confounding interaction, but I do think a discussion is warranted.

      (4) In this same experiment, the authors state that freezing decreased during extinction; however, freezing in the Diff-EYFP group at the start of extinction (first bin of trials) doesn't look appreciably different than their freezing at the end of the session. Did this group actually extinguish their fear? Freezing on the tone test day also does not look too different from freezing during the last block of extinction trials.

      (5) The Discussion explored the outcomes of the experiments in detail, but it would be useful for the authors to discuss the implications of their findings for our understanding of circuits in which the IL is embedded that are involved in inhibitory learning and memory. It would also be useful for the authors to acknowledge in the Discussion that although they did not have the statistical power to detect sex differences, future work is needed to explore whether IL functions similarly in both sexes.

    4. Reviewer #3 (Public review):

      Summary:

      This is a really nice manuscript with different lines of evidence to show that the IL encodes inhibitory memories that can then be manipulated by optogenetic stimulation of these neurons during extinction. The behavioral designs are excellent, with converging evidence using extinction/re-extinction, backwards/forwards aversive conditioning, and backwards appetitive/forwards aversive conditioning. Additional factors, such as nonassociative effects of the CS or US, are also considered, and the authors evaluate the inhibitory properties of the CS with tests of conditioned inhibition.

      Strengths:

      The experimental designs are very rigorous with an unusual level of behavioral sophistication.

      Weaknesses:

      (1) More justification for parametric choices (number of days of backwards vs forwards conditioning) could be provided.

      (2) The current discussion could be condensed and could focus on broader implications for the literature.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      The manuscript reports a series of experiments designed to test whether optogenetic activation of infralimbic (IL) neurons facilitates extinction retrieval and whether this depends on animals' prior experience. In Experiment 1, rats underwent fear conditioning followed by either one or two extinction sessions, with IL stimulation given during the second extinction; stimulation facilitated extinction retrieval only in rats with prior extinction experience. Experiments 2 and 3 examined whether backward conditioning (CS presented after the US) could establish inhibitory properties that allowed IL stimulation to enhance extinction, and whether this effect was specific to the same stimulus or generalized to different stimuli. Experiments 5 - 7 extended this approach to appetitive learning: rats received backward or forward appetitive conditioning followed by extinction, and then fear conditioning, to determine whether IL stimulation could enhance extinction in contexts beyond aversive learning and across conditioning sequences. Across studies, the key claim is that IL activation facilitates extinction retrieval only when animals possess a prior inhibitory memory, and that this effect generalizes across aversive and appetitive paradigms.

      Strengths:

      (1) The design attempts to dissect the role of IL activity as a function of prior learning, which is conceptually valuable.

      We thank the Reviewer for their positive assessment.

      (2) The experimental design of probing different inhibitory learning approaches to probe how IL activation facilitates extinction learning was creative and innovative.

      We thank the Reviewer for their positive assessment.

      Weaknesses:

      (1) Non-specific manipulation.

      ChR2 was expressed in IL without distinction between glutamatergic and GABAergic populations. Without knowing the relative contribution of these cell types or the percentage of neurons affected, the circuit-level interpretation of the results is unclear.

      ChR2 was intentionally expressed in the infralimbic cortex (IL) without distinction between local neuronal populations for two reasons. First, this manuscript aimed to uncover some of the features characterizing the encoding of inhibitory memories in the IL, and this encoding likely engages interactions among various neuronal populations within the IL. Second, the hypotheses tested in the manuscript derived from findings that indiscriminately stimulated the IL using the GABA-A receptor antagonist picrotoxin, which is best mimicked by the approach taken. We agree that it is also important to determine the respective contributions of distinct IL neuronal populations to inhibitory encoding; however, the global approach implemented in the present experiments represents a necessary initial step. This rationale will be incorporated into the revised manuscript, which will also make reference to the need to identify the relative contributions of the various neuronal populations within the IL.

      (2) Extinction retrieval test conflates processes

      The retrieval test included 8 tones. Averaging across this many tone presentations conflates extinction retrieval/expression (early tones) with further extinction learning (later tones). A more appropriate analysis would focus on the first 2-4 tones to capture retrieval only. As currently presented, the data do not isolate extinction retrieval.

      It is unclear when retrieval of what has been learned across extinction ceases and additional extinction learning occurs. In fact, it is only the first stimulus presentation that unequivocally permits a distinction between retrieval and additional extinction learning, as the conditions for this additional learning have not been fulfilled at that presentation. However, confining evidence for retrieval to the first stimulus presentation introduces concerns that other factors could influence performance. For instance, processing of the stimulus present at the start of the session may differ from that present at the end of the previous session, thereby affecting what is retrieved. Such differences between the stimuli present at the start and end of an extinction session have been long recognized as a potential explanation for spontaneous recovery (Estes, 1955). More importantly, whether the test data presented confound retrieval and additional extinction learning or not, the interpretation remains the same with respect to the effects of a prior history of inhibitory learning on enabling the facilitative effects of IL stimulation. Finally, it is unclear how these facilitative effects could occur in the absence of the subjects retrieving the extinction memory formed under the stimulation. Nevertheless, the revised manuscript will provide the trial-by-trial performance during the post-extinction retrieval tests and discuss this issue.

      (3) Under-sampling and poor group matching.

      Sample sizes appear small, which may explain why groups are not well matched in several figures (e.g., 2b, 3b, 6b, 6c) and why there are several instances of unexpected interactions (protocol, virus, and period). This baseline mismatch raises concerns about the reliability of group differences.

      Efforts were made to match group performance upon completion of each training stage and before IL stimulation. Unfortunately, these efforts were not completely successful due to exclusions following post-mortem analyses. However, we acknowledge that the unexpected interactions deserve further discussion, and this will be incorporated into the revised manuscript (see also comment from Reviewer 2). Although we cannot exclude that sample sizes may have contributed to some of these interactions, we remain confident about the reliability of the main findings reported, especially given their replication across the various protocols. Overall, the manuscript provides evidence that IL stimulation does not facilitate brief extinction in the absence of prior inhibitory experience in five different experiments, replicating previous findings (Lingawi et al., 2018; Lingawi et al., 2017). It also replicates these previous findings by showing that prior experience with either fear or appetitive extinction enables IL stimulation to facilitate subsequent fear extinction. Furthermore, the facilitative effects of such stimulation following fear or appetitive backward conditioning are replicated in the present manuscript.  

      (4) Incomplete presentation of conditioning data.

      Figure 3 only shows a single conditioning session despite five days of training. Without the full dataset, it is difficult to evaluate learning dynamics or whether groups were equivalent before testing.

      We apologize, as we incorrectly labeled the X axis for the backward conditioning data set in Figures 3B, 4B, 4D and 5B. It should have indicated “Days” instead of “Trials”. This error will be corrected in the revised manuscript.

      (5) Interpretation stronger than evidence.

      The authors conclude that IL activation facilitates extinction retrieval only when an inhibitory memory has been formed. However, given the caveats above, the data are insufficient to support such a strong mechanistic claim. The results could reflect non-specific facilitation or disruption of behavior by broad prefrontal activation. Moreover, there is compelling evidence that optogenetic activation of IL during fear extinction does facilitate subsequent extinction retrieval without prior extinction training (Do-Monte et al 2015, Chen et al 2021), which the authors do not directly test in this study.

      As noted above, the revised manuscript will show that the interpretations of the main findings stand whether or not the test data confound retrieval with additional extinction learning. The revised manuscript will also clarify the plotting of the data for the backward conditioning stages. We do agree that further discussion of the unexpected interactions is necessary, and this will also be incorporated into the revised manuscript. However, the various replications of the core findings provide strong evidence for their reliability and the interpretations advanced in the original manuscript. The proposal that the results reflect non-specific facilitation or disruption of behavior seems highly unlikely. Indeed, the present experiments and previous findings (Lingawi et al., 2018; Lingawi et al., 2017) provide multiple demonstrations that IL stimulation fails to produce any facilitation in the absence of prior inhibitory experience with the target stimulus. Although these demonstrations appear inconsistent with previous studies (Do-Monte et al., 2015; Chen et al., 2021), this inconsistency is likely explained by the fact that these studies manipulated activity in specific IL neuronal populations. Previous work has already revealed differences between manipulations targeting discrete IL neuronal populations as opposed to general IL activity (Kim et al., 2016). Importantly, as previously noted, the present manuscript aimed to generally explore inhibitory encoding in the IL that, as we will acknowledge, is likely to engage several neuronal populations within the IL. Adequate statements on these matters will be included in the revised manuscript.

      Impact:

      The role of IL in extinction retrieval remains a central question in the fear learning literature. However, because the test used conflates extinction retrieval with new learning and the manipulations lack cell-type specificity, the evidence presented here does not convincingly support the main claims. The study highlights the need for more precise manipulations and more rigorous behavioral testing to resolve this issue.

      As noted in our responses, the interpretations of the data presented remain identical whether the test data conflate extinction retrieval with additional extinction learning or not. Although we agree that it is important to establish the role of specific IL neuronal populations in extinction learning, this was beyond the scope of the manuscript and the findings reported remain valuable to our understanding of inhibitory encoding within the IL.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors examine the mechanisms by which stimulation of the infralimbic cortex (IL) facilitates the retention and retrieval of inhibitory memories. Previous work has shown that optogenetic stimulation of the IL suppresses freezing during extinction but does not improve extinction recall when extinction memory is probed one day later. When stimulation occurs during a second extinction session (following a prior stimulation-free extinction session), freezing is suppressed during the second extinction as well as during the tone test the following day. The current study was designed to further explore the facilitatory role of the IL in inhibitory learning and memory recall. The authors conducted a series of experiments to determine whether recruitment of IL extends to other forms of inhibitory learning (e.g., backward conditioning) and to inhibitory learning involving appetitive conditioning. Further, they assessed whether their effects could be explained by stimulus familiarity. The results of their experiments show that backward conditioning, another form of inhibitory learning, also enabled IL stimulation to enhance fear extinction. This phenomenon was not specific to aversive learning, as backward appetitive conditioning similarly allowed IL stimulation to facilitate extinction of aversive memories. Finally, the authors ruled out the possibility that IL facilitated extinction merely because of prior experience with the stimulus (e.g., reducing the novelty of the stimulus). These findings significantly advance our understanding of the contribution of IL to inhibitory learning. Namely, they show that the IL is recruited during various forms of inhibitory learning, and its involvement is independent of the motivational value associated with the unconditioned stimulus.

      Strengths:

      (1) Transparency about the inclusion of both sexes and the representation of data from both sexes in figures.

      We thank the Reviewer for their positive assessment.

      (2) Very clear representation of groups and experimental design for each figure.

      We thank the Reviewer for their positive assessment.

      (3) The authors were very rigorous in determining the neurobehavioral basis for the effects of IL stimulation on extinction. They considered multiple interpretations and designed experiments to address these possible accounts of their data.

      We thank the Reviewer for their positive assessment.

      (4) The rationale for and the design of the experiments in this manuscript are clearly based on a wealth of knowledge about learning theory. The authors leveraged this expertise to narrow down how the IL encodes and retrieves inhibitory memories.

      We thank the Reviewer for their positive assessment.

      Weaknesses:

      (1) In Experiment 1, although not statistically significant, it does appear as though the stimulation groups (OFF and ON) differ during Extinction 1. It seems like this may be due to a difference between these groups after the first forward conditioning. Could the authors have prevented this potential group difference in Extinction 1 by re-balancing group assignment after the first forward conditioning session to minimize the differences in fear acquisition (the authors do report a marginally significant effect between the groups that would undergo one vs. two extinction sessions in their freezing during the first conditioning session)?

      As noted (see response to Reviewer 1), efforts were made daily to match group performance across the training stages, but these efforts were ultimately hampered by the necessary exclusions following post-mortem analyses. This will be made explicit in the revised manuscript. Regarding freezing during Extinction 1, as noted by the Reviewer, the difference, which was not statistically significant, was absent across trials during the subsequent forward fear conditioning stage. Likewise, the protocol difference observed during the initial forward fear conditioning was absent in subsequent stages. We are therefore confident that these initial differences (significant or not) did not impact the main findings at test. Importantly, these findings replicate previous work using identical protocols in which no differences were present during the training stages. These considerations will be addressed in the revised manuscript.

      (2) Across all experiments (except for Experiment 1), the authors state that freezing during the initial conditioning increased across "days". The figures that correspond to this text, however, show that freezing changes across trials. In the methods, the authors report that backward conditioning occurred over 5 days. It would be helpful to understand how these data were analyzed and collated to create the final figures. Was the freezing averaged across the five days for each trial for analyses and figures?

      We apologize; as noted above, we incorrectly labeled the X axis for the backward conditioning data sets in Figures 3B, 4B, 4D and 5B. It should have indicated “Days” instead of “Trials”. The data shown in these Figures use the average of all trials on a given day. This will be clarified in the methods section of the revised manuscript. The labeling errors on the Figures will be corrected.

      (3) In Experiment 3, the authors report a significant Protocol X Virus interaction. It would be useful if the authors could conduct post-hoc analyses to determine the source of this interaction. Inspection of Figure 4B suggests that freezing during the two different variants of backward conditioning differs between the virus groups. Did the authors expect to see a difference in backward conditioning depending on the stimulus used in the conditioning procedure (light vs. tone)? The authors don't really address this confounding interaction, but I do think a discussion is warranted.

      We agree with the Reviewer that further discussion of the Protocol x Virus interaction that emerged during the backward conditioning and forward conditioning stages of Experiment 3 is warranted. This will be provided in the revised manuscript. Briefly, during both stages, follow-up analyses did not reveal any differences (main effects or interactions) between the two groups trained with the light stimulus (Diff-EYFP and Diff-ChR2). By contrast, the ChR2 group trained with the tone (Back-ChR2) froze more overall than the EYFP group (Back-EYFP), but there were no other significant differences between the two groups. Based on these analyses, the Protocol x Virus interaction appears to be driven by greater freezing in the ChR2 group trained with the tone rather than a difference in the backward conditioning performance based on stimulus identity. Consistent with this, the statistical analyses did not reveal a main effect of Protocol during either the backward conditioning stage or the stimulus trials during the forward conditioning stage. Nevertheless, during this latter stage, a main effect of Protocol emerged during baseline performance, but once again, this seems to be driven by the Back-ChR2 group. Critically, it is unclear how greater stimulus freezing in the Back-ChR2 group during forward conditioning would lead to lower freezing during the post-extinction retrieval test.  

      (4) In this same experiment, the authors state that freezing decreased during extinction; however, freezing in the Diff-EYFP group at the start of extinction (first bin of trials) doesn't look appreciably different than their freezing at the end of the session. Did this group actually extinguish their fear? Freezing on the tone test day also does not look too different from freezing during the last block of extinction trials.

      We confirm that overall, there was a significant decline in freezing across the extinction session shown in Figure 4B. The Reviewer is correct to point out that this decline was modest (if not negligible) in the Diff-EYFP group, which was receiving its first inhibitory training with the target tone stimulus. It is worth noting that across all experiments, most groups that did not receive infralimbic stimulation displayed a modest decline in freezing during the extinction session since it was relatively brief, involving only 6 or 8 tone-alone presentations. This was intentional, as we aimed for the brief extinction session to generate minimal inhibitory learning and thereby to detect any facilitatory effect of infralimbic stimulation. This issue will be clarified and explained in the revised version of the manuscript.

      (5) The Discussion explored the outcomes of the experiments in detail, but it would be useful for the authors to discuss the implications of their findings for our understanding of circuits in which the IL is embedded that are involved in inhibitory learning and memory. It would also be useful for the authors to acknowledge in the Discussion that although they did not have the statistical power to detect sex differences, future work is needed to explore whether IL functions similarly in both sexes.

      In line with the Reviewer’s suggestion (see also Reviewer 3), the revised manuscript will include a discussion of the broader implications of the findings regarding inhibitory brain circuitry and will acknowledge the need to further explore sex differences and IL functions.

      Reviewer #3 (Public review):

      Summary:

      This is a really nice manuscript with different lines of evidence to show that the IL encodes inhibitory memories that can then be manipulated by optogenetic stimulation of these neurons during extinction. The behavioral designs are excellent, with converging evidence using extinction/re-extinction, backwards/forwards aversive conditioning, and backwards appetitive/forwards aversive conditioning. Additional factors, such as nonassociative effects of the CS or US, are also considered, and the authors evaluate the inhibitory properties of the CS with tests of conditioned inhibition.

      Strengths:

      The experimental designs are very rigorous with an unusual level of behavioral sophistication.

      We thank the Reviewer for their positive assessment.

      Weaknesses:

      (1) More justification for parametric choices (number of days of backwards vs forwards conditioning) could be provided.

      All experimental parameters were based on previously published experiments showing the capacity of the backward conditioning protocols to generate inhibitory learning and the forward conditioning protocols to produce excitatory learning. Although this was mentioned in the methods section, we acknowledge that further explanation is required to justify the need for multiple days of backward training. This will be provided in the revised manuscript.

      (2) The current discussion could be condensed and could focus on broader implications for the literature.

      The revised manuscript will condense the discussion and focus on broader implications for the literature.

      References

      Chen, Y.-H., Wu, J.-L., Hu, N.-Y., Zhuang, J.-P., Li, W.-P., Zhang, S.-R., Li, X.-W., Yang, J.-M., & Gao, T.-M. (2021). Distinct projections from the infralimbic cortex exert opposing effects in modulating anxiety and fear. J Clin Invest, 131(14), e145692. https://doi.org/10.1172/JCI145692

      Do-Monte, F. H., Manzano-Nieves, G., Quiñones-Laracuente, K., Ramos-Medina, L., & Quirk, G. J. (2015). Revisiting the role of infralimbic cortex in fear extinction with optogenetics. J Neurosci, 35(8), 3607-3615. https://doi.org/10.1523/JNEUROSCI.3137-14.2015

      Estes, W. K. (1955). Statistical theory of spontaneous recovery and regression. Psychol Rev, 62(3), 145-154. https://doi.org/10.1037/h0048509

      Kim, H.-S., Cho, H.-Y., Augustine, G. J., & Han, J.-H. (2016). Selective Control of Fear Expression by Optogenetic Manipulation of Infralimbic Cortex after Extinction. Neuropsychopharmacology, 41(5), 1261-1273. https://doi.org/10.1038/npp.2015.276

      Lingawi, N. W., Holmes, N. M., Westbrook, R. F., & Laurent, V. (2018). The infralimbic cortex encodes inhibition irrespective of motivational significance. Neurobiol Learn Mem, 150, 64-74. https://doi.org/10.1016/j.nlm.2018.03.001

      Lingawi, N. W., Westbrook, R. F., & Laurent, V. (2017). Extinction and Latent Inhibition Involve a Similar Form of Inhibitory Learning that is Stored in and Retrieved from the Infralimbic Cortex. Cereb Cortex, 27(12), 5547-5556. https://doi.org/10.1093/cercor/bhw322

    1. eLife Assessment

      This important study presents convincing findings on creating an exhaustive library of new enhancer-AAVs targeting astrocytes and oligodendrocytes with high potential for both basic and translational work, which will be of value to a large and growing community. However, the outdated description of glial biology in the Introduction, the overstated claims of utility in the Conclusion, and the loose stringency in the criteria used to assemble the library diminish the strengths of the claims. The work will be of interest to neuroscientists working on glial cell biology.

    2. Reviewer #1 (Public review):

      The goal of this study was to generate a library of new enhancer-driven AAVs in order to selectively and efficiently target astrocytes and oligodendrocytes in rodents. The implied criteria are that such viral vectors should have high specificity for the intended cell type and effectively express in all astrocytes/oligos in the brain or, alternatively, be specific for defined brain regions, layers, or subtypes of astrocytes/oligos. In addition, they should be compatible with intravenous retro-orbital delivery to facilitate experimentation and brain-wide targeting (i.e., show organ specificity and high efficiency in the brain). Ideally, these new AAVs would also maintain their characteristics across disease contexts and show applicability in non-human primates. Tools with such characteristics are generally lacking in studying glial cells and would be extremely useful to scale up and accelerate glial research, allowing targeting of astrocytes/oligos with distinct molecular identity and intersectional strategies.

      At present, however, none of the enhancer-AAVs presented in the study seems to meet this combination of criteria, at least not with the level of stringency typically expected in the field. The main reason is that, in its current form, the study does not present one candidate AAV iteratively improved to meet all these criteria; instead, it presents a catalogue of new AAVs with various degrees of specificity, completeness, and mixed characteristics. Therefore, their utility should be interpreted cautiously. Moreover, the way specificity and completeness are intermixed in the analysis makes it difficult to evaluate the actual utility of any given AAV. The study might have been strengthened by focusing on a small set of the most promising candidates (e.g., AiE0890m_3x2C) and validating them thoroughly for expression specificity, completeness, effective cargo expression, ability to allow specific pan-astrocyte or astrocyte-subtype targeting in vivo, and preserved properties in NHPs and in disease, as this would encourage their adoption by the community. Currently, too many AAVs are assessed inconsistently against the desired criteria, with none being evaluated through and through.

      The impact of the catalogue is also greatly diminished by the fact that a suite of AAVs with outstanding specificity and efficiency is already available for the study of astrocytes (e.g., 4x6T AAVs) and was not utilized as a standard to benchmark the new library, making it difficult to appreciate the relative benefits of the new AAVs. The inclusion of expression data in NHPs is very significant, but benchmarking against established AAVs would also be needed to fully appreciate their value.

      Importantly, readers should also be aware that the study seems noticeably limited in its literacy with glial biology. The introduction and discussion frame the field in a way that seems outdated, creating the impression that the diverse roles of glia in health and disease have not yet been studied, which may inadvertently be perceived as dismissive and stigmatizing.

      In summary, the paper introduces potentially useful viral tools and lays the foundations for future multiplexed targeting of distinct glial cell subpopulations in rodents and in non-human primates, which are extremely important directions. Some of the regionally restricted or even sparsely expressed AAVs may prove valuable in enabling subpopulation-specific targeting or molecular profiling strategies, but currently lack full benchmarking. At present, the promises over the utility of the new tools seem overstated, and the library may not yet represent an actionable resource for targeting astrocytes and oligodendrocytes.

    3. Reviewer #2 (Public review):

      Enhancer elements are regulatory DNA sequences that are capable of driving specific expression patterns. As these elements are generally short and context-independent, enhancers can be used in expression vectors (e.g., packaged in an adeno-associated virus, AAV) to limit expression to target cell populations. This approach was identified as a major strategy for cell-type-specific manipulation in the brain and has been pursued by both standard research studies as well as large-scale efforts led by the BRAIN Initiative. This manuscript describes a major effort to generate enhancer-AAVs targeting astrocytes and oligodendrocytes orchestrated by a large research team led by the Allen Institute for Brain Science. This manuscript parallels other recent publications describing sets of enhancer-AAVs, following rigorous, similar methods with relatively broad testing and application.

      To identify and screen candidate enhancers, the scientists prioritized candidates via analysis of single-nucleus accessibility and methylation datasets (i.e., snATAC-seq) and tested them in mice. The scientists prioritized candidate enhancers that exhibited specificity of accessibility in the target cell type. Following selection, the scientists cloned the candidate sequences into AAV vectors with a minimal promoter and reporter gene, packaged the virus, delivered it to the mouse brain, and screened for activity based on reporter expression. Candidates that passed initial screening were further characterized via imaging and sorting, followed by single-cell RNA-seq. This process had around a 50% success rate and yielded 25 astrocyte and 21 oligodendrocyte enhancer-AAVs with the targeted cell-type-specific expression patterns.

      The scientists went on to test for subtype-specific activity patterns, finding wide diversity in astrocyte activities across sub-populations and conversely, homogenous oligodendrocyte activation. They optimized a few of these via concatenating the enhancer core sequence to increase expression levels of the reporter gene and showed strong specificity and completeness of cell targeting for a set of these enhancer-AAVs. Following characterization and validation, they then deployed these enhancer-AAVs in a number of demonstration applications to show the utility for basic and translational science. All the constructs developed here are available for public use via Addgene, ensuring that these new tools can be used by other researchers.

      There really are no obvious weaknesses in the work presented here, from the generation of the enhancer-AAVs to use in sophisticated validation studies. The enhancer-AAV testing is rigorous and provides critical information necessary for other scientists to select and use these constructs. The applications demonstrate the power of enhancer-AAV approaches. The toolbox presented here may not enable specific targeting of all relevant cellular subtypes or activity states for astrocytes and oligodendrocytes, and future work will be needed to fully understand the activity of the enhancers, identity of the target cell types, and context-dependent utility of these constructs. However, the set of enhancer-AAVs developed here should be transformative for researchers working on accessing and manipulating these cell types and have a major impact on the field.

    1. eLife Assessment

      This paper presents a collection of analyses relating structure and function in the whole-brain Drosophila EM connectome and whole-brain calcium imaging data. The linkage of detailed anatomical structure with population activity is of broad interest in circuit neuroscience in light of increasingly detailed brain maps, but the analysis methods used made the evidence incomplete. The conclusions are useful for specific network observations, but a more thorough analysis of the anatomical and functional data is needed to support the overall claims.

    2. Reviewer #1 (Public review):

      Summary:

      In this paper, the authors analyze connectome data from Drosophila and compare the physical wiring with functional connectivity estimated from calcium imaging data. They quantify structure-function relationships as a correlation of the two connectivity modalities. They report correlations roughly comparable to what has been described in the literature on sc/fc relationships in mammalian connectome data at the meso-scale. They then repeat their analysis, focusing on segregated versus unsegregated synapses. They derive separate connectomes using one or the other class of synapse. They show differential contributions to the sc/fc relationships by segregated versus unsegregated synapses.

      Strengths:

      There is a nice synthesis of multimodal imaging data (Ca and EM data from flies and meso-scale data from human and marmoset).

      Weaknesses:

      (1) The paper is written in an unusual way. The introduction intermingles results with background, making it hard to figure out what precisely is being tested.

      (2) There are also major methodological gaps. Though the mammalian connectomes are used as a point of reference, no descriptions of their origins or processing are included.

      (3) A major weakness stems from the actual calculation of the sc/fc correlation. In general, SC is sparse. In the case of the EM connectomes, it is *exceptionally* sparse (most neural elements are not connected to one another). The authors calculated sc/fc coupling by correlating the off-diagonal elements of the sc matrix (the logarithm of its edge weights) with those of the fc matrix. The logarithmic transformation yields a value of negative infinity for all zero entries. The authors simply impute these elements with 0. This makes no sense and, depending on whether these zero elements are distributed systematically or uniformly at random, could either inflate or deflate the sc/fc correlations. Care must be taken here.

      (4) Further, in constructing the segregated versus unsegregated connectomes, they use absolute thresholds for collecting synapses. It is unclear, however, whether similar numbers of synapses were included in both matrices. If the number is different, that might explain the differential relationship with fc; one matrix has more non-zero entries (and as noted earlier, those zero entries are problematic).

      (5) There was also considerable text (in the results) describing the processing of the Ca data. In this section, the authors frequently refer to some pipelines as "better" or "worse" (more or less effective). But it is not clear what measures they adopted to assess the effectiveness of a pipeline.
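
      To make point (3) concrete, the following is a minimal Python sketch, not the authors' pipeline. The function name `scfc_correlations` and its inputs are hypothetical. It computes the sc/fc correlation twice on the same pair of matrices: once with the log of every zero edge imputed as 0, and once restricted to node pairs that are actually connected. A large gap between the two values would indicate that the imputation, rather than the edge weights, is driving the reported coupling.

```python
import numpy as np

def scfc_correlations(sc, fc):
    """Return (r_imputed, r_connected) for square matrices sc and fc.

    sc: non-negative structural weights (e.g. synapse counts).
    fc: functional connectivity values for the same node pairs.
    Illustrative only; names and conventions are assumptions.
    """
    sc = np.asarray(sc, dtype=float)
    fc = np.asarray(fc, dtype=float)

    # Use off-diagonal entries only, as described in the review.
    off_diag = ~np.eye(sc.shape[0], dtype=bool)
    s, f = sc[off_diag], fc[off_diag]

    connected = s > 0
    log_s = np.zeros_like(s)                  # log(0) = -inf is replaced by 0 here
    log_s[connected] = np.log(s[connected])   # note: weight 1 also maps to 0

    # Variant 1: all pairs, zero edges imputed as 0.
    r_imputed = np.corrcoef(log_s, f)[0, 1]

    # Variant 2: connected pairs only (NaN if fewer than two distinct weights).
    r_connected = np.corrcoef(log_s[connected], f[connected])[0, 1]
    return r_imputed, r_connected
```

      Reporting both values side by side, or using a rank-based measure on connected pairs only, would be one way to show how sensitive the result is to the treatment of zero entries.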

    3. Reviewer #2 (Public review):

      Summary:

      Okuno et al. investigate the structure-function relationship in the fruit fly Drosophila melanogaster. To do so, they combine published data from two recent synapse-level connectomes ("hemibrain" and "FlyWire") with a dataset comprising functional whole-brain calcium imaging and behavioural data. First, they investigate the applicability of fMRI pre-processing techniques on data from calcium imaging. They then cross-correlate this pre-processed functional data with structural data extracted from the connectomes, including a comparison to humans. The authors proceed to compare the two connectomes and find significant differences, which they attribute to differences in the accuracy of the synapse detections. Next, they present a novel algorithm to quantify whether neurons are segregated (pre- and postsynapses are spatially separate) or unsegregated (pre- and postsynapses are mixed). Using this approach, they find that unsegregated neurons may contribute more to function than segregated neurons. Applying a general linear model to the functional dataset suggests that activity in two brain areas (Wedge and AVLP) is suppressed during walking. The authors identify a GABAergic neuron in the connectome that could be responsible for this effect and suggest it may provide feedback to the fly's "compass" in the central complex.

      Strengths:

      The study tackles a relevant question in connectomics by exploring the relationship between structural and functional connectivity in the Drosophila brain. The authors apply a range of established and adapted analytical methods, including fMRI-style preprocessing and a novel synaptic segregation index. The effort to integrate multiple datasets and to compare across species reflects a broad and methodical approach.

      Weaknesses:

      The manuscript would benefit from a clearer overarching narrative to unify the various analyses, which currently appear somewhat disjointed. While the technical methods are extensive, the writing is often convoluted and lacks crucial details, making it difficult to follow the logic and interpret key findings. Additionally, the conclusions are relatively incremental and lack a compelling conceptual advance, limiting the overall impact of the work.

      (1) The introduction currently contains a number of findings and conclusions that would be better placed in the results and discussion to clearly delineate past findings from new results and speculations.

      (2) The narrative would benefit greatly from some clear statements along the lines of "we wanted to find out X, therefore we did Y".

      (3) More consistent terminology would be helpful. For example, the connectomes are currently referred to as either "hemibrain", "FlyEM", "whole-brain", or "FlyWire".

      (4) The abstract claims "a new, more robust method to quantify the degree of pre- and post-synaptic segregation". However, the study fails to provide evidence that this method is indeed more robust than existing methods.

      (5) The authors define unsegregated neurons as having mixed pre- and postsynapses in the same space. However, this ignores the neurons' topology: a neuron can exhibit a clearly defined dendrite with (mostly) postsynapses and a clearly defined axon with (mostly) presynapses, which then occupy the same space. This is different from genuinely unsegregated neurons with no distinct dendritic and axonal compartments, such as CT1.

      (6) It is not entirely clear where the marmoset dataset originates from. Was it generated for this study? If not, why is there a note in the Ethics Declaration?

      (7) On the differences between hemibrain and FlyWire: What is the "18.8 million post-synapses" for FlyWire referring to? The (thresholded) FlyWire synapse table has 130M connections (=postsynapses). Subsetting that synapse cloud to the hemibrain volume still gives ~47M synapses. Further subsetting to only connections between proofread neurons inside the hemibrain volume gives 19.4M - perhaps the authors did something like that? Similarly, the hemibrain synapse table contains 64M postsynapses. Do the 21M "FlyEM" post-synapses refer to proofread neurons only? If the authors indeed used only (post-)synapses from proofread neurons, they need to make that explicit in results and methods, and account for differences in reconstruction status when making any comparisons. For example, the mushroom body in the hemibrain got a lot more attention than in FlyWire, which would explain the differences reported here. For that reason, connection weights are often expressed as, e.g., a fraction of the target's inputs instead of the total number of synapses when comparing connectivity across connectomic datasets. Furthermore, in Figure 3b, it looks like the FlyWire synapse cloud was not trimmed to the exact hemibrain boundaries: for example, the trimmed FlyWire synapse cloud seems to extend further into the optic lobes than the hemibrain volume does.
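
      As an illustration of the normalization mentioned in point (7), the sketch below converts raw synapse counts into the fraction of each target neuron's total input. It is a minimal example under stated assumptions: the function name `input_fractions` is hypothetical, and rows are taken to be presynaptic neurons and columns postsynaptic neurons. If one dataset recovers uniformly fewer synapses than another, the raw counts differ but the input fractions do not; non-uniform differences in reconstruction status are not corrected by this step.

```python
import numpy as np

def input_fractions(weights):
    """Normalize a connectivity matrix by each target's total input.

    weights[i, j] is assumed to be the synapse count from presynaptic
    neuron i onto postsynaptic neuron j (an illustrative convention).
    Columns with no input are returned as all zeros.
    """
    w = np.asarray(weights, dtype=float)
    col_totals = w.sum(axis=0, keepdims=True)   # total input per target
    return np.divide(w, col_totals,
                     out=np.zeros_like(w),
                     where=col_totals > 0)       # skip division for empty columns
```

      Comparing `input_fractions` of the two connectomes, rather than the raw counts, separates differences in relative wiring from differences in overall synapse recovery.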

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Okuno et al. re-analyze whole-brain imaging data collected in another paper (Brezovec et al., 2024) in the context of the two currently available Drosophila connectome datasets: the partial "FlyEM" (hemibrain) dataset (Scheffer et al., 2020) and the whole-brain "FlyWire" dataset (Dorkenwald et al., 2024). They apply existing fMRI signal processing algorithms to the fly imaging data and compute function-structure correlations across a variety of post-processing parameters (noise reduction methods, ROI size), demonstrating an inverse relationship between ROI size and FC-SC correlation. The authors go on to look at structural connectivity amongst more polarized or less polarized neurons, and suggest that stronger FC-SC correlations are driven by more polarized neurons.

      Strengths:

      (1) The result that larger mesoscale ROIs have a higher correlation with structural data is interesting. This has been previously discussed in Drosophila in Turner et al., 2021, but here it is quantified more extensively.

      (2) The quantification of neuron polarization (PPSSI) as applied to these structural data is a promising approach for quantifying differences in spatial synapse distribution.

      Weaknesses:

      One should not score noise/nuisance removal methods solely by their impact on FC-SC correlation values, because we do not know a priori that direct structural connections correspond with strong functional correlations. In fact, work in C. elegans, where we have access to both a connectome and neuron-resolution functional data, suggests that this relationship is weak (Yemini et al., 2021; Randi et al., 2023). Similarly, I don't think it's appropriate to tune the confidence scores on the EM datasets using FC-SC correlations as an output metric.

      Any discussion of FC-SC comparisons should include an analysis of excitatory/inhibitory neurotransmitters, which are available in the fly connectome dataset. However, here the authors do not perform any analyses with neurotransmitter information.

      Comparisons between fly and human MRI data are also premature here. Firstly, the fly connectomes, which are derived from neuron-scale EM reconstructions, are a qualitatively different kind of data from human connectomes, which are derived from DSI imaging of large-scale tracts. Likewise, calcium data and fMRI data are very different functional data acquisition methods; the fact that similar processing steps can be used on time-series data does not make them surprisingly similar, and does not, in my view, constitute evidence of "similar design concepts."

      The comparison of FlyEM/FlyWire connectomes concludes that differences are more likely a result of data processing than of inter-individual variability. If this is the case, the title should not claim that the manuscript covers individual variability.

      The analysis of the wedge-AVLP neuron strikes me as highly speculative, given that the alignment precision between the connectome and the functional data is around 5 microns (Brezovec et al., PNAS 2024).